* pass request body to logging context in all cases
* add message size logging on the websocket API
this is required by graphql-engine-pro/#416
* message size logging on websocket API
As we need to log all messages received/sent by the websocket server,
it makes sense to log them as part of the websocket server event logs.
Previously, received messages were logged inside the onMessage handler,
and sent messages were logged only for "data" messages (as a server event log).
* fix review comments
Co-authored-by: Phil Freeman <phil@hasura.io>
* server: add logging for action handlers
* add changelog entry
* change action-handler log type from internal to non-internal
* fix action-handler-log name
* Propagate the trace context to event triggers
* Handle missing trace and span IDs
* Store trace context as one LOCAL
* Add migrations
* Documentation
* Include the request ID as trace metadata
* changelog
* Fix warnings
* Respond to code review suggestions
* Respond to code review
* Undo changelog
* Update CHANGELOG.md
* Typo
Co-authored-by: Vamshi Surabhi <0x777@users.noreply.github.com>
* server: log request/response sizes for event triggers
event triggers (and scheduled triggers) now have request/response size
in their logs.
* add changelog entry
These must have gotten messed up during a refactor. As a consequence
almost all samples received so far fall into the single erroneous 0 to
1K seconds (originally supposed to be 1ms?) bucket.
I also re-thought what the numbers should be, but these are still
arbitrary and might want adjusting in the future.
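
To illustrate why a mis-scaled first bucket swallows nearly every sample, here is a minimal Python sketch. The bucket bounds and latencies below are made-up values for illustration only; they are not the numbers used in the server, and `bucket_counts` is a hypothetical helper.

```python
from bisect import bisect_left
from collections import Counter

def bucket_counts(samples, upper_bounds):
    """Count samples per bucket; bucket i holds values <= upper_bounds[i].

    Samples above the last bound land in a final overflow bucket.
    """
    counts = Counter(bisect_left(upper_bounds, s) for s in samples)
    return [counts.get(i, 0) for i in range(len(upper_bounds) + 1)]

# Hypothetical request latencies, in seconds.
samples = [0.0004, 0.003, 0.02, 0.15, 0.9, 2.5]

# Erroneous bounds: the first bucket spans 0 to 1000 seconds.
broken = [1000, 10000, 100000]
# Illustrative corrected bounds, starting at 1 ms.
fixed = [0.001, 0.01, 0.1, 1, 10]

broken_counts = bucket_counts(samples, broken)
fixed_counts = bucket_counts(samples, fixed)
print(broken_counts)  # every sample ends up in the first bucket
print(fixed_counts)   # samples spread across the buckets
```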
* Pass environment variables around as a data structure, via @sordina
* Resolving build error
* Adding Environment passing note to changelog
* Removing references to ILTPollerLog as this seems to have been reintroduced from a bad merge
* removing commented-out imports
* Language pragmas already set by project
* Linking async thread
* Apply suggestions from code review
Use `runQueryTx` instead of `runLazyTx` for queries.
* remove the non-user facing entry in the changelog
Co-authored-by: Phil Freeman <paf31@cantab.net>
Co-authored-by: Phil Freeman <phil@hasura.io>
Co-authored-by: Vamshi Surabhi <0x777@users.noreply.github.com>
The current idle GC settings seem never to cause idle GC to trigger.
The changes here at least help memory usage to look more reasonable when
running certain benchmarks, and speculatively could partially fix some
memory leaks users have reported.
See ourIdleGC for details.
Referencing canonical memory issue #3388
https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/runtime_control.html#rts-flag---disable-delayed-os-memory-return
Referencing canonical memory issue #3388
This is a bit of a mystery. It didn't seem to have any effect in early
repros we had. But now, running an introspection query benchmark I see:
Running 400 concurrent connections:
before this change: max residency ~450M
after: ~140M
No difference in latency was observed.
...BUT: if I give graphql-engine a warmup of 10 requests with 1
connection (i.e. no concurrency): I see both have a max residency of
~140M (i.e. the flag doesn't help)
...also interestingly: a single warmup request doesn't seem to have
any effect (ending RES is still high), 2 requests gets max RES down to
~180M.
I suspect many concurrent connections are spraying pinned data over a
bunch of blocks which are then not released to the OS barring memory
pressure. Whatever this is, it may be thread-local or "per-capability" in
some sense...
This adds a server flag, --pg-connection-options, that can be used to set the PostgreSQL connection parameter extra_float_digits. Setting it is needed to avoid loss of data on older versions of PostgreSQL, which have odd default behavior when returning float values. (fixes #5092)
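
The kind of data loss involved can be sketched in Python without a database. This assumes the server renders double-precision values as text with 15 significant digits by default; `render` is a hypothetical helper standing in for that text rendering, not anything from PostgreSQL or graphql-engine.

```python
def render(value, significant_digits):
    """Format a float as text with a fixed number of significant digits."""
    return f"{value:.{significant_digits}g}"

x = 0.1 + 0.2  # stored as 0.30000000000000004 in a 64-bit float

short = render(x, 15)  # 15-digit text rendering (assumed default on older servers)
full = render(x, 17)   # 17 digits are enough to round-trip any 64-bit float

lossy = float(short) != x  # parsing the short text gives a different value
exact = float(full) == x   # parsing the full text recovers the original
print(short, lossy)
print(full, exact)
```

With the shorter rendering, a client that parses the text gets a different number than the one stored, which is why the extra digits matter.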
This reduces memory consumption for new idle subscriptions significantly
(see linked ticket).
The hypothesis is: we fork a lot of threads per websocket, and some of
these use slightly more than the initial 1K stack size, so the first
overflow balloons to 32K, when significantly less is required.
However: running with `+RTS -K1K -xc` did not seem to show evidence of
any overflows! So it's a mystery why this improves things.
GHC should probably also be doubling the stack buffer at each overflow
or doing something even smarter; the knobs we have aren't so helpful.
Fix the introspection query failing with a `type info not found for xxxx` error message when multiple actions are defined that reuse the same PG scalars.
* Benchmark GraphQL queries using wrk
* fix console assets dir
* Store wrk parameters as well
* Add details about storing results in Readme
* Remove files in bench-wrk while computing server shasum
* Instead of just getting maximum throughput per query per version,
create plots using wrk2 for a given set of requests per second.
The maximum throughput is used to see what values of requests per second are feasible.
* Add id for version dropdown
* Allow specifying env and args for GraphQL Engine
1) Arguments defined after -- will be applied as arguments to Hasura GraphQL Engine
2) Script will also pass the environment variables to Hasura GraphQL Engine instances
Hasura GraphQL Engine can be run with the given environment variables and arguments as follows
$ export HASURA_GRAPHQL_...=....
$ python3 hge_wrk_bench.py -- --hge_arg1 val1 --hge_arg2 val2 ...
* Use matplotlib instead of plotly for figures
* Show throughput graph also.
It may be useful for checking performance regressions across versions
* Support storing results in s3
Use --upload-root-uri 's3://bucket/path' to upload results inside the
given path. When specified, the results will be uploaded to the bucket,
including latencies, latency histogram, and the test setup info.
The s3 credentials should be provided as given in AWS boto3 documentation.
* Allow specifying a name for the test scenario
* Fix open latency uri bug
* Update wrk docker image
* Keep ylim a little higher than maximum so that the throughput plot is clearly visible
* Show throughput plots for multiple queries at the same time
* 1) Adjust size of dropdowns
2) Make label for requests/sec invisible when plot type is throughput
* 1) Adding boto3 to requirements.txt
2) Removing CPU Key print line
3) Adding info about the tests that will be run with wrk2
* Docker builder for wrk-websocket-server
* Make it optional to setup remote graphql-engine
* Listen on all interfaces and enable ping thread
* Add bench_scripts to wrk-websocket-server docker
* Use 127.0.0.1 instead of 'localhost' to address local hge
For some reason it seems wrk was hanging trying to resolve 'localhost'.
ping was able to resolve it fine from the same container, so I'm not sure what the
deal was. Probably some local misconfiguration on my machine, but maybe
this change will also help others.
* Store latency samples in subdirectory, server_shasum just once at start, additional docs
* Add a note on running the benchmarks in the simplest way
* Add a new section on how to run benchmarks on a new linux hosted instance
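
The `--` convention for forwarding arguments and environment variables to the engine can be sketched as follows. This is a minimal illustration, not the code in `hge_wrk_bench.py`: `split_args`, `engine_env`, the `--name` flag, and the prefix filter are all assumptions made for the example.

```python
def split_args(argv):
    """Split argv at the first '--' into (script args, args forwarded to the engine)."""
    if "--" in argv:
        i = argv.index("--")
        return argv[:i], argv[i + 1:]
    return argv, []

def engine_env(environ, prefix="HASURA_GRAPHQL_"):
    """Select the variables handed to the engine process.

    Filtering by prefix is an assumption of this sketch; the real script
    may simply pass its whole environment through.
    """
    return {k: v for k, v in environ.items() if k.startswith(prefix)}

script_args, hge_args = split_args(["--name", "demo", "--", "--hge_arg1", "val1"])
env = engine_env({"HASURA_GRAPHQL_ENABLE_CONSOLE": "true", "HOME": "/root"})
print(script_args, hge_args, env)
```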
Co-authored-by: Nizar Malangadan <nizar-m@users.noreply.github.com>
Co-authored-by: Brandon Simmons <brandon.m.simmons@gmail.com>
Co-authored-by: Karthikeyan Chinnakonda <karthikeyan@hasura.io>
Co-authored-by: Brandon Simmons <brandon@hasura.io>
Co-authored-by: Vamshi Surabhi <0x777@users.noreply.github.com>
* new typeclass to abstract the logic of QueryLog-ing
* abstract the logic of logging websocket-server logs
introduce a MonadWSLog typeclass
* move catalog initialization to init step
expose a helper function to migrate catalog
create schema cache in initialiseCtx
* expose various modules and functions for pro
* generalize PGExecCtx to support specialized functions for various operations
* fix tests compilation
* allow customising PGExecCtx when starting the web server
* fix relay introspection failing if any views exist, fix #5020
* reduce base64 encoded node id length, close #5037
* make node field type non-nullable in an edge
* more relay tests with permissions & complete restructure of test yaml files
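
As a rough illustration of how the shape of the encoded payload affects base64 node id length, here is a small Python sketch. Both payload shapes are hypothetical and chosen only to show the effect; they are not claimed to match the server's actual node id format.

```python
import base64
import json

def encode_node_id(payload):
    """Base64-encode a compact JSON rendering of the payload."""
    raw = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")

# Hypothetical verbose encoding: a keyed object.
verbose = encode_node_id(
    {"table": {"schema": "public", "name": "author"}, "columns": {"id": 1}}
)
# Hypothetical compact encoding: a positional array with a leading version tag.
compact = encode_node_id([1, "public", "author", 1])

print(len(verbose), len(compact))
```

Dropping the repeated keys shortens the JSON, and since base64 output grows linearly with its input, the id shrinks proportionally.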
Co-authored-by: Aravind <aravindkp@outlook.in>
Co-authored-by: Vamshi Surabhi <0x777@users.noreply.github.com>
The bulk of changes here is some shifting of code around and a little
parameterizing of functions for easier testing.
Also: comments, some renaming for clarity/less-chance-for-misuse.
Store the admin secret only as a hash to prevent leaking the secret
inadvertently, and to prevent timing attacks on the secret.
NOTE: best practice for stored user passwords is a function with a
tunable cost like bcrypt, but our threat model is quite different (even
if we thought we could reasonably protect the secret from an attacker
who could read arbitrary regions of memory), and bcrypt is far too slow
(by design) to perform on each request. We'd have to rely on our
(technically savvy) users to choose high entropy passwords in any case.
Referencing #4736
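
The idea of keeping only a hash and comparing in constant time can be sketched in Python as below. This is an illustration under assumptions, not the server's implementation: the choice of SHA-256 and the names `hash_secret` and `AdminSecretCheck` are hypothetical.

```python
import hashlib
import hmac

def hash_secret(secret: str) -> bytes:
    """Fast cryptographic hash of the secret (SHA-256 is an assumption here)."""
    return hashlib.sha256(secret.encode("utf-8")).digest()

class AdminSecretCheck:
    """Holds only the digest of the configured secret, never the plaintext."""

    def __init__(self, secret: str):
        self._digest = hash_secret(secret)

    def matches(self, candidate: str) -> bool:
        # compare_digest takes time independent of where the inputs differ,
        # which avoids leaking information through response timing.
        return hmac.compare_digest(self._digest, hash_secret(candidate))

check = AdminSecretCheck("s3cret")
print(check.matches("s3cret"), check.matches("wrong"))
```

Hashing the candidate before comparing also means both sides always have the same length, so the comparison does not reveal the secret's length.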