graphql-engine/server/src-lib/Hasura/Server/Compression.hs
Brandon Simmons 746cae5f4b benchmarks: Add Accept-Encoding: gzip headers (NO REGRESSION)
PR-URL: https://github.com/hasura/graphql-engine-mono/pull/5622
GitOrigin-RevId: 431f7f382573737af9b85966a58c5b262f8cb377
2022-08-25 06:43:18 +00:00

77 lines
2.5 KiB
Haskell

module Hasura.Server.Compression
( compressResponse,
CompressionType (..),
compressionTypeToTxt,
)
where
import Codec.Compression.GZip qualified as GZ
import Data.ByteString.Lazy qualified as BL
import Data.Text qualified as T
import Hasura.Prelude
import Hasura.Server.Utils (gzipHeader)
import Network.HTTP.Types.Header qualified as NH
data CompressionType
= CTGZip
deriving (Show, Eq)
compressionTypeToTxt :: CompressionType -> Text
compressionTypeToTxt CTGZip = "gzip"
-- | Maybe compress the response body
compressResponse ::
NH.RequestHeaders ->
BL.ByteString ->
(BL.ByteString, Maybe NH.Header, Maybe CompressionType)
compressResponse reqHeaders unCompressedResp =
let compressionTypeM = getAcceptedCompression reqHeaders
appendCompressionType (res, headerM) = (res, headerM, compressionTypeM)
gzipCompressionParams =
-- See Note [Compression ratios]
GZ.defaultCompressParams {GZ.compressLevel = GZ.compressionLevel 1}
in appendCompressionType $ case compressionTypeM of
Just CTGZip -> (GZ.compressWith gzipCompressionParams unCompressedResp, Just gzipHeader)
Nothing -> (unCompressedResp, Nothing)
-- | Which, if any, compressed encodings can the client accept?
--
-- https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept-Encoding
getAcceptedCompression :: NH.RequestHeaders -> Maybe CompressionType
getAcceptedCompression reqHeaders
| "gzip" `elem` acceptEncodingVals = Just CTGZip
| otherwise = Nothing
where
acceptEncodingVals =
concatMap (splitHeaderVal . snd) $
filter (\h -> fst h == NH.hAcceptEncoding) reqHeaders
splitHeaderVal bs = map T.strip $ T.splitOn "," $ bsToTxt bs
{-
Note [Compression ratios]
~~~~~~~~~~~~~~~~~~~~~~~~~
I did some measurements of compression ratios at `gzip -1` (libc) of some
randomly generated json, real json datasets, and output from our benchmarked
chinook queries:
2552/6131 = 0.41
4666/8718 = 0.53
13921/27131 = 0.51
5895/8879 = 0.66 <----- completely random strings
8634/28261 = 0.30
70422/372466 = 0.18
200/600 = 0.33 <----| from chinook graphql benchmarks
3000/33000 = 0.09 <----|
13000/190000 = 0.07 <----'
Given these numbers I would suggest using a rule-of-thumb expected compression
ratio between 2:1 and 10:1, depending on what being conservative means in the
context.
I didn't test higher compression levels much, but `gzip -4` for the most part
resulted in less than 10% smaller output on random json, and ~30% on our highly
compressible benchmark output.
-}