601 Commits

Author SHA1 Message Date
Phil Freeman
c93a00b327
Merge branch 'master' into event-trigger-lock-timeout 2020-07-29 13:27:07 -07:00
Anon Ray
046a783a14
server: pass http and websocket request to logging context (#5470)
* pass request body to logging context in all cases

* add message size logging on the websocket API

  this is required by graphql-engine-pro/#416

* message size logging on websocket API

  As we need to log all messages received/sent by the websocket server,
  it makes sense to log them as part of the websocket server event logs.
  Previously, messages received were logged inside the onMessage handler,
  and messages sent were logged only for "data" messages (as a server event log)

* fix review comments

Co-authored-by: Phil Freeman <phil@hasura.io>
2020-07-29 20:18:36 +05:30
Anon Ray
6c7e63791f
server: add logging for action handlers (#5471)
* server: add logging for action handlers

* add changelog entry

* change action-handler log type from internal to non-internal

* fix action-handler-log name
2020-07-29 19:00:29 +05:30
Phil Freeman
ca190ef59a
Merge branch 'master' into event-trigger-lock-timeout 2020-07-28 16:12:13 -07:00
Phil Freeman
02ddcb34a0
Apply suggestions from code review
Co-authored-by: Brandon Simmons <brandon.m.simmons@gmail.com>
2020-07-28 16:11:51 -07:00
Phil Freeman
fe70d9fbe8 Add a 30 minute timeout for event trigger locks 2020-07-28 13:45:20 -07:00
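A minimal sketch of what a 30 minute lock timeout could mean for event delivery, assuming each event records when it was locked. Only the 30 minute figure comes from the commit title; the function name `is_available` and the `locked_at` field are hypothetical and are not taken from the graphql-engine source.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# The 30 minute value comes from the commit title; the rest is illustrative.
LOCK_TIMEOUT = timedelta(minutes=30)


def is_available(locked_at: Optional[datetime], now: datetime) -> bool:
    """Return True if an event may be picked up for delivery.

    An event is available when it was never locked, or when its lock is
    older than LOCK_TIMEOUT (for example because the worker holding it died).
    """
    if locked_at is None:
        return True
    return now - locked_at > LOCK_TIMEOUT


now = datetime(2020, 7, 28, 12, 0, tzinfo=timezone.utc)
print(is_available(None, now))                         # never locked
print(is_available(now - timedelta(minutes=5), now))   # lock still fresh
print(is_available(now - timedelta(minutes=31), now))  # lock expired
```

With a timeout like this, a lock that is never released does not block the event forever; it only delays redelivery by the timeout.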
Phil Freeman
df51a8eb18
Attach request ID as tracing metadata (#5456)
* Propagate the trace context to event triggers

* Handle missing trace and span IDs

* Store trace context as one LOCAL

* Add migrations

* Documentation

* Include the request ID as trace metadata

* changelog

* Fix warnings

* Respond to code review suggestions

* Respond to code review

* Undo changelog

* Update CHANGELOG.md

* Typo

Co-authored-by: Vamshi Surabhi <0x777@users.noreply.github.com>
2020-07-28 13:06:54 -07:00
Naveen Naidu
664e9df9c6
Tracing: Simplify HTTP traced request (#5451)
Remove the Inversion of Control (SuspendRequest) and simplify
the tracing of HTTP Requests.

Co-authored-by: Phil Freeman <phil@hasura.io>
2020-07-28 11:51:56 -07:00
Anon Ray
434c78267c
server: log request/response sizes for event triggers (#5463)
* server: log request/response sizes for event triggers

  event triggers (and scheduled triggers) now have request/response size
  in their logs.

* add changelog entry
2020-07-28 10:52:44 -07:00
Phil Freeman
0ae5384115
Propagate the trace context to event triggers (#5409)
* Propagate the trace context to event triggers

* Handle missing trace and span IDs

* Store trace context as one LOCAL

* Add migrations

* Documentation

* changelog

* Fix warnings

* Respond to code review suggestions

* Respond to code review

* Undo changelog

* Update CHANGELOG.md

Co-authored-by: Vamshi Surabhi <0x777@users.noreply.github.com>
2020-07-23 13:39:26 -07:00
Brandon Simmons
2eab6a89aa Fix latency buckets for telemetry data
These must have gotten messed up during a refactor. As a consequence
almost all samples received so far fall into the single erroneous 0 to
1K seconds (originally supposed to be 1ms?) bucket.

I also re-thought what the numbers should be, but these are still
arbitrary and might want adjusting in the future.
2020-07-22 12:29:38 -04:00
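To illustrate the kind of bucketing bug described in the commit above, here is a small sketch. The bucket boundaries below are made up for the example (the commit does not list the real ones), and `bucket_for` is a hypothetical helper, not the function used in the server.

```python
from bisect import bisect_right
from typing import Sequence

# Hypothetical bucket lower bounds in seconds: 0, 1ms, 10ms, 100ms, 1s, 10s.
GOOD_BOUNDS = [0.0, 0.001, 0.01, 0.1, 1.0, 10.0]

# The same idea with a unit slip: the first bucket spans 0 to 1000 seconds
# where 0 to 1ms was probably intended.
BAD_BOUNDS = [0.0, 1000.0]


def bucket_for(latency_seconds: float, bounds: Sequence[float]) -> float:
    """Return the lower bound of the bucket that contains the latency."""
    if latency_seconds < 0:
        raise ValueError("latency must be non-negative")
    # bisect_right finds the first bound strictly greater than the sample,
    # so the bucket we want is the one just before it.
    index = bisect_right(bounds, latency_seconds) - 1
    return bounds[index]


samples = [0.0005, 0.004, 0.25, 3.0]
print([bucket_for(s, GOOD_BOUNDS) for s in samples])
print([bucket_for(s, BAD_BOUNDS) for s in samples])
```

With the mistaken bounds every realistic sample lands in the first bucket, which matches the symptom the commit message reports.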
Aravind
d8481c3a1c
tag release v1.3.0 (#5423) 2020-07-20 20:38:00 +05:30
Tirumarai Selvan
709460b9ce
update pg-client (#5421) 2020-07-20 13:45:15 +05:30
Anon Ray
1eb36bbbb3
server: refactor 'pollQuery' to have a hook to process 'PollDetails' (#5391)
Co-authored-by: Vamshi Surabhi <0x777@users.noreply.github.com>
2020-07-16 18:49:42 +05:30
Phil Freeman
0dddbe9e9d
Add MonadTrace and MonadExecuteQuery abstractions (#5383)
Co-authored-by: Vamshi Surabhi <0x777@users.noreply.github.com>
2020-07-15 16:10:48 +05:30
Soham Chowdhury
dbdf81b26d
docs: note libkrb5-dev dep on Debian, update GHC version (#5377)
* docs: note libkrb5-dev dep on Debian, update GHC version

* docs: note openssl/libssl-dev requirements on Debian
2020-07-15 11:03:33 +05:30
Lyndon Maydwell
24592a516b
Pass environment variables around as a data structure, via @sordina (#5374)
* Pass environment variables around as a data structure, via @sordina

* Resolving build error

* Adding Environment passing note to changelog

* Removing references to ILTPollerLog as this seems to have been reintroduced from a bad merge

* removing commented-out imports

* Language pragmas already set by project

* Linking async thread

* Apply suggestions from code review

Use `runQueryTx` instead of `runLazyTx` for queries.

* remove the non-user facing entry in the changelog

Co-authored-by: Phil Freeman <paf31@cantab.net>
Co-authored-by: Phil Freeman <phil@hasura.io>
Co-authored-by: Vamshi Surabhi <0x777@users.noreply.github.com>
2020-07-14 12:00:58 -07:00
Brandon Simmons
66551acac4 Replace idle GC with a custom GC thread
The current idle GC settings seem never to cause idle GC to trigger.
The changes here at least help memory usage to look more reasonable when
running certain benchmarks, and speculatively could partially fix some
memory leaks users have reported.

See ourIdleGC for details.

Referencing canonical memory issue #3388
2020-07-14 11:54:24 +05:30
Phil Freeman
505ac06d9e
Expose all modules in Cabal file (#5371) 2020-07-14 11:26:53 +05:30
Karthikeyan Chinnakonda
0ef52292b5
server: call the webhook asynchronously in event triggers (#5352)
* server: call the webhook asynchronously in event triggers
2020-07-10 22:17:05 +05:30
Brandon Simmons
6d235be29c Add --disable-delayed-os-memory-return to default rtsopts
https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/runtime_control.html#rts-flag---disable-delayed-os-memory-return

Referencing canonical memory issue #3388

This is a bit of a mystery. It didn't seem to have any effect in early
repros we had. But now, running an introspection query benchmark I see:

  Running 400 concurrent connections:
    before this change: max residency ~450M
    after: ~140M
    No difference in latency was observed.

  ...BUT: if I give graphql-engine a warmup of 10 requests with 1
  connection (i.e. no concurrency): I see both have a max residency of
  ~140M (i.e. the flag doesn't help)

  ...also interestingly: a single warmup request doesn't seem to have
  any effect (ending RES is still high), 2 requests gets max RES down to
  ~180M.

I suspect many concurrent connections are spraying pinned data over a
bunch of blocks which are then not released to the OS barring memory
pressure. Whatever this is is maybe thread-local or "per-capability" in
some sense...
2020-07-06 11:54:18 -04:00
Brandon Simmons
5cd30e073a
Disable downgrade test for now. Closes #5273 (#5291) 2020-07-06 10:08:26 +05:30
Aravind
7ae3e55a88
tag release v1.3.0-beta.4 (#5281) 2020-07-04 10:15:37 +05:30
Rakesh Emmadi
2fe353a294
allow array relation connection fields regardless of aggregation permission & change relay endpoint to '/v1beta1/relay' (fix #5218) (#5257)
* fix error when array relation connections are queried, fix #5218

* change relay endpoint to '/v1beta1/relay'

* Update CHANGELOG.md

Co-authored-by: Tirumarai Selvan <tiru@hasura.io>
2020-07-03 12:00:35 +05:30
Karthikeyan Chinnakonda
9ef6de5113
server: include additional fields in scheduled trigger webhook payload (#5262)
* include scheduled triggers metadata in the webhook body

Co-authored-by: Tirumarai Selvan <tiru@hasura.io>
2020-07-03 06:25:07 +05:30
Brandon Simmons
9e423a3c55 Fix buggy parsing of new --conn-lifetime flag in 2b0e3774 2020-07-02 13:27:46 -04:00
Karthikeyan Chinnakonda
97b1155bf8
server: unlock scheduled events on graceful shutdown (#4928) 2020-07-02 17:27:09 +05:30
Vamshi Surabhi
9ccbd1c0f6
bump pg-client-hs version (fixes a build issue on some environments) (#5267) 2020-07-02 12:11:08 +05:30
Vamshi Surabhi
cfffade115 do not use prepared statements for mutations 2020-07-02 10:48:35 +05:30
Brandon Simmons
2b0e3774a3
5087 libpq pool leak (#5089)
Shrink libpq buffers to 1MB before returning connection to pool. Closes #5087

See: https://github.com/hasura/pg-client-hs/pull/19

Also related: #3388 #4077
2020-07-01 09:23:10 +05:30
Auke Booij
bc3d735bf3
server/docs: add instructions to fix loss of float precision in PostgreSQL <= 11 (#5187)
This adds a server flag, --pg-connection-options, that can be used to set a PostgreSQL connection parameter, extra_float_digits, that needs to be used to avoid loss of data on older versions of PostgreSQL, which have odd default behavior when returning float values. (fixes #5092)
2020-06-30 10:39:25 +02:00
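The precision loss mentioned in the commit above can be reproduced without a database. The sketch below assumes, as I understand the older PostgreSQL default, that float8 values are printed with 15 significant digits unless extra_float_digits is raised, and that 17 digits are enough to round-trip a double. It only mimics the text formatting in Python; it does not talk to PostgreSQL, and `render` is a hypothetical helper.

```python
def render(value: float, significant_digits: int) -> str:
    """Format a float with a fixed number of significant digits."""
    return format(value, f".{significant_digits}g")


x = 0.1 + 0.2

short = render(x, 15)  # assumed to resemble the older default output
full = render(x, 17)   # assumed to resemble output with extra digits enabled

print(short, float(short) == x)
print(full, float(full) == x)
```

The 15 digit rendering parses back to a different double than the one that was stored, while the 17 digit rendering parses back to the same one.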
Brandon Simmons
1dbe5cd1ab
Merge branch 'master' into 5190-improve-websockets-idle-memory-v2-only-kc8 2020-06-26 15:02:49 -04:00
Aravind
a8cc3605b9
tag release v1.3.0-beta.3 (#5207) 2020-06-26 06:58:01 +05:30
Brandon Simmons
f9b5b8382c Lower stack chunk size in RTS to reduce thread STACK memory (closes #5190)
This reduces memory consumption for new idle subscriptions significantly
(see linked ticket).

The hypothesis is: we fork a lot of threads per websocket, and some of
these use slightly more than the initial 1K stack size, so the first
overflow balloons to 32K, when significantly less is required.

However: running with `+RTS -K1K -xc` did not seem to show evidence of
any overflows! So it's a mystery why this improves things.

GHC should probably also be doubling the stack buffer at each overflow
or doing something even smarter; the knobs we have aren't so helpful.
2020-06-25 14:42:51 -04:00
Aravind Shankar
8092e04838
tag release v1.3.0-beta.2 (#5121) 2020-06-25 11:54:16 +05:30
Rakesh Emmadi
8b49f472a2
fix postgres query error for object relationship with permission limit, fix #5148 (#5177)
Co-authored-by: Tirumarai Selvan <tiru@hasura.io>
2020-06-25 09:03:37 +05:30
Rakesh Emmadi
c7ffd882d0
fix relay introspection when remote relationships are defined, fix #5144 (#5145)
Co-authored-by: Tirumarai Selvan <tiru@hasura.io>
2020-06-24 19:25:50 +05:30
Brandon Simmons
2e84a729e2
Fix erroneously committed cabal.project.dev change from f8a731 (#5183)
Co-authored-by: Tirumarai Selvan <tiru@hasura.io>
2020-06-23 22:20:25 +05:30
Karthikeyan Chinnakonda
6a58c144f5
server: fix updating of headers behaviour in the update cron trigger API and create future events immediately (#5151)
* server: fix bug to update headers in an existing cron trigger and create future events

Co-authored-by: Tirumarai Selvan <tiru@hasura.io>
2020-06-23 20:51:34 +05:30
Rakesh Emmadi
ea23571049
fix introspection failing when multiple actions are defined with PG scalar types (fix #5166) (#5173)
The introspection query fails with a `type info not found for xxxx` error message if multiple actions are defined with reused PG scalars. This commit fixes that.
2020-06-23 15:35:54 +05:30
nizar-m
f8a7312a30
Regression benchmarks setup (#3310)
* Benchmark GraphQL queries using wrk

* fix console assets dir

* Store wrk parameters as well

* Add details about storing results in Readme

* Remove files in bench-wrk while computing server shasum

* Instead of just getting maximum throughput per query per version,
create plots using wrk2 for a given set of requests per second.
The maximum throughput is used to see what values of requests per second are feasible.

* Add id for version dropdown

* Allow specifying env and args for GraphQL Engine

1) Arguments defined after -- will be applied as arguments to Hasura GraphQL Engine
2) Script will also pass the environmental variables to Hasura GraphQL Engine instances

Hasura GraphQL engine can be run with the given environmental variables and arguments as follows

$ export HASURA_GRAPHQL_...=....
$ python3 hge_wrk_bench.py -- --hge_arg1 val1 --hge_arg2 val2 ...

* Use matplotlib instead of plotly for figures

* Show throughput graph also.

It may be useful in checking performance regression across versions

* Support storing results in s3

Use --upload-root-uri 's3://bucket/path' to upload results inside the
given path. When specified, the results will be uploaded to the bucket,
including latencies, latency histogram, and the test setup info.
The s3 credentials should be provided as given in AWS boto3 documentation.

* Allow specifying a name for the test scenario

* Fix open latency uri bug

* Update wrk docker image

* Keep ylim a little higher than maximum so that the throughput plot is clearly visible

* Show throughput plots for multiple queries at the same time

* 1) Adjust size of dropdowns
2) Make label for requests/sec invisible when plot type is throughput

* 1) Adding boto3 to requirements.txt
2) Removing CPU Key print line
3) Adding info about the tests that will be run with wrk2

* Docker builder for wrk-websocket-server

* Make it optional to setup remote graphql-engine

* Listen on all interfaces and enable ping thread

* Add bench_scripts to wrk-websocket-server docker

* Use 127.0.0.1 instead of 'localhost' to address local hge

For some reason it seems wrk was hanging trying to resolve 'localhost'.
ping worked fine from the same container, so I'm not sure what the
deal was. Probably some local misconfiguration on my machine, but maybe
this change will also help others.

* Store latency samples in subdirectory, server_shasum just once at start, additional docs

* Add a note on running the benchmarks in the simplest way

* Add a new section on how to run benchmarks on a new linux hosted instance

Co-authored-by: Nizar Malangadan <nizar-m@users.noreply.github.com>
Co-authored-by: Brandon Simmons <brandon.m.simmons@gmail.com>
Co-authored-by: Karthikeyan Chinnakonda <karthikeyan@hasura.io>
Co-authored-by: Brandon Simmons <brandon@hasura.io>
Co-authored-by: Vamshi Surabhi <0x777@users.noreply.github.com>
2020-06-19 22:40:17 +05:30
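The benchmark commit above says that arguments after `--` are passed on to Hasura GraphQL Engine. A minimal sketch of that convention follows, assuming a simple split of the argument list at the first `--`. The function `split_args` is hypothetical and is not taken from hge_wrk_bench.py; the flag names mirror the placeholders in the commit message.

```python
from typing import List, Tuple


def split_args(argv: List[str]) -> Tuple[List[str], List[str]]:
    """Split arguments at the first '--'.

    Everything before '--' belongs to the benchmark script itself;
    everything after it is forwarded untouched to the engine.
    """
    if "--" not in argv:
        return list(argv), []
    separator = argv.index("--")
    return argv[:separator], argv[separator + 1:]


script_args, engine_args = split_args(
    ["--", "--hge_arg1", "val1", "--hge_arg2", "val2"]
)
print(script_args)
print(engine_args)
```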
Anon Ray
a7a60c2dfe
server: changes catalog initialization and logging for pro customization (#5139)
* new typeclass to abstract the logic of QueryLog-ing

* abstract the logic of logging websocket-server logs

  introduce a MonadWSLog typeclass

* move catalog initialization to init step

  expose a helper function to migrate catalog
  create schema cache in initialiseCtx

* expose various modules and functions for pro
2020-06-19 12:12:32 +05:30
Karthikeyan Chinnakonda
d064959ac3
server: drop catalog dependencies when parent column/table is dropped containing a remote relationship (#5119) 2020-06-17 13:18:31 +05:30
Vamshi Surabhi
6fc404329a
generalize query execution logic on Postgres (#5110)
* generalize PGExecCtx to support specialized functions for various operations

* fix tests compilation

* allow customising PGExecCtx when starting the web server
2020-06-16 23:14:59 +05:30
Anon Ray
0cf4cbc5c6
server: refactor GQL execution check and config API (#5094)
Co-authored-by: Vamshi Surabhi <vamshi@hasura.io>
Co-authored-by: Vamshi Surabhi <0x777@users.noreply.github.com>
2020-06-16 20:53:06 +05:30
Rakesh Emmadi
4e229dc568
relay fixes (#5013)
* fix relay introspection failing if any views exist, fix #5020

* reduce base64 encoded node id length, close #5037

* make node field type non-nullable in an edge

* more relay tests with permissions & complete restructure of test yaml files

Co-authored-by: Aravind <aravindkp@outlook.in>
Co-authored-by: Vamshi Surabhi <0x777@users.noreply.github.com>
2020-06-16 19:55:49 +05:30
Brandon Simmons
cf8cf4f5aa
Merge branch 'master' into 4736-security-testing 2020-06-09 10:50:35 -04:00
Karthikeyan Chinnakonda
b782986e48
fix bug which arose when renaming a column/table that was used to create a remote relationship (#5005) 2020-06-09 19:59:39 +05:30
Brandon Simmons
5e37350561 Refactor and unit test authentication code paths (closes #4736)
The bulk of changes here is some shifting of code around and a little
parameterizing of functions for easier testing.

Also: comments, some renaming for clarity/less-chance-for-misuse.
2020-06-08 13:10:58 -04:00
Brandon Simmons
d747bc1148 Tighten up handling of admin secret, more docs
Store the admin secret only as a hash to prevent leaking the secret
inadvertently, and to prevent timing attacks on the secret.

NOTE: best practice for stored user passwords is a function with a
tunable cost like bcrypt, but our threat model is quite different (even
if we thought we could reasonably protect the secret from an attacker
who could read arbitrary regions of memory), and bcrypt is far too slow
(by design) to perform on each request. We'd have to rely on our
(technically savvy) users to choose high entropy passwords in any case.

Referencing #4736
2020-06-08 13:09:25 -04:00
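A rough sketch of the approach described in the commit above: keep only a hash of the admin secret in memory and compare hashes in constant time. The use of SHA-256 and the class name `AdminSecretCheck` are assumptions for illustration; the commit message does not say which hash function the server uses.

```python
import hashlib
import hmac


def hash_secret(secret: str) -> bytes:
    """Hash a secret with SHA-256 (an assumed choice of hash function)."""
    return hashlib.sha256(secret.encode("utf-8")).digest()


class AdminSecretCheck:
    """Holds only the hash of the configured secret, never the secret itself."""

    def __init__(self, configured_secret: str) -> None:
        self._secret_hash = hash_secret(configured_secret)

    def matches(self, candidate: str) -> bool:
        # compare_digest takes time independent of where the inputs differ,
        # which avoids leaking information through response timing.
        return hmac.compare_digest(self._secret_hash, hash_secret(candidate))


check = AdminSecretCheck("a-high-entropy-admin-secret")
print(check.matches("a-high-entropy-admin-secret"))
print(check.matches("wrong-guess"))
```

A fast hash is cheap enough to run on every request, which fits the trade-off the commit message makes against a tunable-cost function like bcrypt.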