mirror of https://github.com/hasura/graphql-engine.git synced 2025-01-07 08:13:18 +03:00

Phil Freeman 317193df10 [PACHA-24] Metrics in the SQL layer (#1029)

### What

- Adds datafusion row metrics to our NDC query and aggregate nodes, for explain output
- Aggregates all datafusion metrics in the trace attributes:
  - `rows_processed`, i.e. total number of rows considered over all execution plan nodes
  - `elapsed_compute`, i.e. CPU time spent in _processing_ data (not fetching it)
- Adds the explain output to the `create_logical_plan` span.

E.g. a query we don't push down to NDC:

```sql
SELECT COUNT(42 * invoiceId) AS odd_count FROM InvoiceLine;
```

Attributes:

```text
rows_processed: 2242
total_rows: 1
elapsed_compute: 417
logical_plan:
  Projection: count(Int64(42) * InvoiceLine.invoiceId) AS odd_count
    Aggregate: groupBy=[[]], aggr=[[count(Int64(42) * InvoiceLine.invoiceId)]]
      TableScan: InvoiceLine
```

The metrics clearly indicate that the cost in terms of rows processed per row returned (2242 / 1) is very high in this case. The logical plan makes it clear why this was the case: we failed to push down the aggregate node.

### How

V3_GIT_ORIGIN_REV_ID: c26cce9adab9d0feb0a7d2873a3eea38542564a0

2024-08-29 00:56:46 +00:00

SQL Interface

An experimental SQL interface over OpenDD models. For now it is mostly targeted at AI use cases, because GenAI models are better at generating SQL queries than GraphQL queries.

This is implemented using the Apache DataFusion query engine: the SQL metadata that DataFusion needs is derived from the Open DDS metadata. As the implementation currently stands, once we get a `LogicalPlan` from DataFusion, we replace each `TableScan` with an NDC query to the underlying connector. A rudimentary optimizer pushes projections down into the OpenDD query so that we don't fetch all the columns of a collection.
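
The two rewrites described above can be sketched on a toy plan tree. This is only an illustration under stated assumptions: the `Plan` enum and the functions `push_down_projection` and `replace_table_scans` are hypothetical names invented for this example, and they use neither DataFusion's real `LogicalPlan` type nor the engine's actual code.

```rust
/// A toy logical plan. Hypothetical types for illustration only;
/// these are NOT DataFusion's `LogicalPlan` or the engine's own nodes.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    /// Scan of a model; `columns: None` means "fetch every column".
    TableScan { table: String, columns: Option<Vec<String>> },
    /// Keep only the named columns of the input.
    Projection { columns: Vec<String>, input: Box<Plan> },
    /// Stand-in for the query sent to the connector after rewriting;
    /// `fields: None` means "request every field".
    NdcQuery { collection: String, fields: Option<Vec<String>> },
}

/// If a projection sits directly on top of an unrestricted scan,
/// fold the column list into the scan and drop the projection node.
fn push_down_projection(plan: Plan) -> Plan {
    match plan {
        Plan::Projection { columns, input } => match *input {
            Plan::TableScan { table, columns: None } => Plan::TableScan {
                table,
                columns: Some(columns),
            },
            // Anything else: keep the projection and recurse into the input.
            other => Plan::Projection {
                columns,
                input: Box::new(push_down_projection(other)),
            },
        },
        other => other,
    }
}

/// Replace every table scan with a connector query node,
/// carrying over whatever column restriction the scan has.
fn replace_table_scans(plan: Plan) -> Plan {
    match plan {
        Plan::TableScan { table, columns } => Plan::NdcQuery {
            collection: table,
            fields: columns,
        },
        Plan::Projection { columns, input } => Plan::Projection {
            columns,
            input: Box::new(replace_table_scans(*input)),
        },
        other => other,
    }
}
```

In this sketch, a `Projection` of `invoiceId` over a scan of `InvoiceLine` becomes a single `NdcQuery` that requests only `invoiceId`, while a bare scan becomes an `NdcQuery` with `fields: None`, i.e. the "fetch all columns" case the optimizer is meant to avoid.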