mirror of
https://github.com/hasura/graphql-engine.git
synced 2025-01-07 08:13:18 +03:00
317193df10
### What

- Adds datafusion row metrics to our NDC query and aggregate nodes, for explain output
- Aggregates all datafusion metrics in the trace attributes:
  - `rows_processed`, i.e. total number of rows considered over all execution plan nodes
  - `elapsed_compute`, i.e. CPU time spent in _processing_ data (not fetching it)
- Adds the explain output to the `create_logical_plan` span.

E.g. a query we don't push down to NDC:

```sql
SELECT COUNT(42 * invoiceId) AS odd_count FROM InvoiceLine;
```

Attributes:

```text
rows_processed: 2242
total_rows: 1
elapsed_compute: 417
logical_plan:
Projection: count(Int64(42) * InvoiceLine.invoiceId) AS odd_count
  Aggregate: groupBy=[[]], aggr=[[count(Int64(42) * InvoiceLine.invoiceId)]]
    TableScan: InvoiceLine
```

The metrics clearly indicate that the cost in terms of rows processed per row returned (2242 / 1) is very high in this case. The logical plan makes it clear why this was the case: we failed to push down the aggregate node.

V3_GIT_ORIGIN_REV_ID: c26cce9adab9d0feb0a7d2873a3eea38542564a0
# SQL Interface
An experimental SQL interface over OpenDD models. For now it is mostly targeted at AI use cases, since GenAI models are better at generating SQL queries than GraphQL queries.
This is implemented using the Apache DataFusion query engine by deriving the SQL
metadata for DataFusion from Open DDS metadata. As the implementation currently
stands, once we get a `LogicalPlan` from DataFusion, we replace each `TableScan`
with an NDC query to the underlying connector. There is a rudimentary optimizer
that pushes projections down to the OpenDD query so that we don't fetch all the
columns of a collection.
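
The rewrite described above can be illustrated with a minimal, self-contained sketch. It does not use DataFusion's actual types: the `Plan` enum, the `NdcQuery` node, the `all_columns` lookup, and the `invoiceLineId` and `unitPrice` column names are all hypothetical stand-ins chosen for illustration. The real implementation works on DataFusion's `LogicalPlan` and is likely more involved.

```rust
// Toy stand-ins for a logical plan. These are NOT DataFusion's types.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    /// Scan of a model; `columns: None` means "fetch every column".
    TableScan { table: String, columns: Option<Vec<String>> },
    /// Keep only the named columns of the input.
    Projection { columns: Vec<String>, input: Box<Plan> },
    /// Hypothetical node standing for a query sent to the connector.
    NdcQuery { collection: String, fields: Vec<String> },
}

/// Hypothetical metadata lookup: every column a model exposes.
fn all_columns(table: &str) -> Vec<String> {
    match table {
        "InvoiceLine" => vec!["invoiceLineId", "invoiceId", "unitPrice"],
        _ => vec![],
    }
    .into_iter()
    .map(String::from)
    .collect()
}

/// Replace every scan with an `NdcQuery`. A projection sitting directly on
/// top of a scan is pushed down, so only the projected columns are requested.
fn rewrite(plan: Plan) -> Plan {
    match plan {
        Plan::Projection { columns, input } => match *input {
            // Pushdown: the projection's column list becomes the query's fields.
            Plan::TableScan { table, .. } => Plan::NdcQuery {
                collection: table,
                fields: columns,
            },
            // Anything else: keep the projection and rewrite below it.
            other => Plan::Projection {
                columns,
                input: Box::new(rewrite(other)),
            },
        },
        // A bare scan has nothing to narrow it, so all columns are fetched.
        Plan::TableScan { table, columns } => {
            let fields = columns.unwrap_or_else(|| all_columns(&table));
            Plan::NdcQuery { collection: table, fields }
        }
        // Already a connector query: leave it untouched.
        ndc @ Plan::NdcQuery { .. } => ndc,
    }
}
```

In this sketch, calling `rewrite` on a `Projection` of `invoiceId` over a scan of `InvoiceLine` yields a single `NdcQuery` that requests only `invoiceId`, whereas a bare scan becomes a query for every column the lookup reports.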