
@bvolpato bvolpato commented Oct 31, 2025

Warning

This is still a work in progress; I am still working on some test cases.

Description

A new rule PushProjectionThroughJoinIntoTableScan was introduced to push projections that appear above a join down to the table scans on either side of the join. This optimization is particularly beneficial for cross-connector joins where the join itself cannot be pushed down to the connectors.

The rule applies when:

  • All projection expressions are deterministic.
  • Each projection expression references columns from only one side of the join.
  • For outer joins, projections on the non-preserved (null-supplying) side are not pushed down, because the join can emit NULL-extended rows for which a pushed projection would never be evaluated.
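The applicability conditions above can be modeled as a small classification step. This is a simplified sketch, not Trino's API: `ProjectionSides`, `JoinKind`, `Side`, `Projection`, and `classify` are hypothetical names, symbols are plain strings, and an expression is reduced to the set of symbols it references plus a determinism flag.

```java
import java.util.Set;

public class ProjectionSides {
    // Hypothetical join kinds; a side is "preserved" if all of its rows survive the join
    enum JoinKind { INNER, LEFT, RIGHT, FULL }

    // Where a projection can be evaluated after the rewrite
    enum Side { LEFT, RIGHT, NOT_PUSHABLE }

    // Hypothetical stand-in for an expression: the symbols it reads and whether it is deterministic
    record Projection(Set<String> referencedSymbols, boolean deterministic) {}

    static Side classify(Projection p, Set<String> leftSymbols, Set<String> rightSymbols, JoinKind kind) {
        // Non-deterministic expressions, and constants that reference no column, stay above the join
        if (!p.deterministic() || p.referencedSymbols().isEmpty()) {
            return Side.NOT_PUSHABLE;
        }
        boolean onlyLeft = leftSymbols.containsAll(p.referencedSymbols());
        boolean onlyRight = rightSymbols.containsAll(p.referencedSymbols());
        // Push only onto a side that the join never null-extends
        if (onlyLeft && (kind == JoinKind.INNER || kind == JoinKind.LEFT)) {
            return Side.LEFT;
        }
        if (onlyRight && (kind == JoinKind.INNER || kind == JoinKind.RIGHT)) {
            return Side.RIGHT;
        }
        // Cross-side expressions, or expressions on a non-preserved side
        return Side.NOT_PUSHABLE;
    }
}
```

Treating constants as not pushable is a choice made for this sketch; the description above does not say how the rule handles them.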

This transformation creates new ProjectNodes on each side of the join and, if some expressions could not be pushed down, keeps a ProjectNode above the join for them. The rule also preserves, through identity projections on the respective sides, the symbols required by the join criteria and filter as well as the original project's output symbols.
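The split described in this paragraph can be sketched as follows. This is a hypothetical, simplified model rather than the rule's implementation: `SplitAssignments`, `Expr`, `Split`, and `split` are illustrative names, the join is assumed to be inner, and every expression is assumed to have already passed the determinism check.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class SplitAssignments {
    // Hypothetical expression model: printable text plus the symbols it references
    record Expr(String text, Set<String> symbols) {
        static Expr ref(String symbol) {
            return new Expr(symbol, Set.of(symbol));
        }

        // An identity assignment is just a reference to a single symbol
        boolean isIdentity() {
            return symbols.size() == 1 && symbols.contains(text);
        }
    }

    // Assignments for the new left child, the new right child, and the project kept above the join
    record Split(Map<String, Expr> left, Map<String, Expr> right, Map<String, Expr> top) {}

    // joinSymbols: symbols used by the join criteria and filter (assumes an inner join)
    static Split split(Map<String, Expr> assignments, Set<String> leftSymbols,
                       Set<String> rightSymbols, Set<String> joinSymbols) {
        Map<String, Expr> left = new LinkedHashMap<>();
        Map<String, Expr> right = new LinkedHashMap<>();
        Map<String, Expr> top = new LinkedHashMap<>();
        for (Map.Entry<String, Expr> entry : assignments.entrySet()) {
            String output = entry.getKey();
            Expr expr = entry.getValue();
            Set<String> refs = expr.symbols();
            boolean pushable = !expr.isIdentity() && !refs.isEmpty();
            if (pushable && leftSymbols.containsAll(refs)) {
                left.put(output, expr);            // computed below the join
                top.put(output, Expr.ref(output)); // passed through above it
            } else if (pushable && rightSymbols.containsAll(refs)) {
                right.put(output, expr);
                top.put(output, Expr.ref(output));
            } else {
                top.put(output, expr);             // cannot be pushed; stays above the join
            }
        }
        // Keep the symbols needed by the join and by the expressions that remain on top
        for (String symbol : joinSymbols) {
            addIdentity(symbol, leftSymbols, left);
            addIdentity(symbol, rightSymbols, right);
        }
        for (Expr expr : top.values()) {
            for (String symbol : expr.symbols()) {
                addIdentity(symbol, leftSymbols, left);
                addIdentity(symbol, rightSymbols, right);
            }
        }
        return new Split(left, right, top);
    }

    private static void addIdentity(String symbol, Set<String> sideSymbols, Map<String, Expr> side) {
        if (sideSymbols.contains(symbol)) {
            side.putIfAbsent(symbol, Expr.ref(symbol));
        }
    }
}
```

For `x := f(a)` and `y := g(b)` over a join on `a = b`, this yields a left child `{x := f(a), a}`, a right child `{y := g(b), b}`, and identity assignments `{x, y}` above the join.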

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

## Section
* Fix some things. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Oct 31, 2025

sourcery-ai bot commented Oct 31, 2025

Reviewer's Guide

Introduces a new optimizer rule that pushes deterministic, single-side Project expressions through a Join into the underlying TableScans (especially useful for cross-connector joins), updates the planner to register the rule, and adds/adjusts tests to validate and reflect the new behavior.

Sequence diagram for projection pushdown through join during planning

```mermaid
sequenceDiagram
participant "Planner"
participant "PushProjectionThroughJoinIntoTableScan"
participant "ProjectNode"
participant "JoinNode"
participant "TableScan (Left)"
participant "TableScan (Right)"

"Planner"->>"PushProjectionThroughJoinIntoTableScan": Apply rule to ProjectNode above JoinNode
"PushProjectionThroughJoinIntoTableScan"->>"ProjectNode": Inspect assignments
"PushProjectionThroughJoinIntoTableScan"->>"JoinNode": Inspect join type and criteria
alt Projections are deterministic and reference only one side
    "PushProjectionThroughJoinIntoTableScan"->>"TableScan (Left)": Push left-side projections
    "PushProjectionThroughJoinIntoTableScan"->>"TableScan (Right)": Push right-side projections
end
"PushProjectionThroughJoinIntoTableScan"->>"JoinNode": Create new JoinNode with projected children
alt Remaining projections
    "PushProjectionThroughJoinIntoTableScan"->>"ProjectNode": Add remaining projections above join
end
"PushProjectionThroughJoinIntoTableScan"-->>"Planner": Return transformed plan
```

Class diagram for the new PushProjectionThroughJoinIntoTableScan rule

```mermaid
classDiagram
class PushProjectionThroughJoinIntoTableScan {
  +apply(ProjectNode, Captures, Context) Result
  +getPattern() Pattern<ProjectNode>
}
PushProjectionThroughJoinIntoTableScan --|> Rule
class Rule {
}
class ProjectNode {
}
class JoinNode {
}
class Assignments {
}
class PlanNode {
}
PushProjectionThroughJoinIntoTableScan o-- ProjectNode
PushProjectionThroughJoinIntoTableScan o-- JoinNode
PushProjectionThroughJoinIntoTableScan o-- Assignments
PushProjectionThroughJoinIntoTableScan o-- PlanNode
```

Class diagram for updated PlanOptimizers registration

```mermaid
classDiagram
class PlanOptimizers {
  +PlanOptimizers(...)
}
class PushProjectionThroughJoinIntoTableScan {
}
PlanOptimizers o-- PushProjectionThroughJoinIntoTableScan
```

File-Level Changes

Implement the PushProjectionThroughJoinIntoTableScan optimization rule and register it
  • Define a rule matching Project over Join, filtering on determinism and non-identity assignments
  • Separate projections by join side, push non-identity expressions down, add identity symbols for join criteria and outputs
  • Reconstruct child ProjectNodes, rebuild the JoinNode, and optionally add a top-level Project to preserve output shape
  • Register the new rule in PlanOptimizers alongside other projection pushdown rules
  Files:
    core/trino-main/src/main/java/io/trino/sql/planner/iterative/rule/PushProjectionThroughJoinIntoTableScan.java
    core/trino-main/src/main/java/io/trino/sql/planner/PlanOptimizers.java

Add comprehensive tests for the new PushProjectionThroughJoinIntoTableScan rule
  • Create TestPushProjectionThroughJoinIntoTableScan with cases where the rule should fire (inner join, single-side projections)
  • Include negative tests for cross-side projections, outer/full joins, identity-only, and non-deterministic expressions
  • Leverage RuleTester with a mock connector to verify rule firing and non-firing scenarios
  Files:
    core/trino-main/src/test/java/io/trino/sql/planner/iterative/rule/TestPushProjectionThroughJoinIntoTableScan.java

Update existing planner tests to reflect new projection pushdown behavior
  • Adjust TestDereferencePushDown to expect pushed-down strictProject under the join’s left child
  • Update TestLogicalPlanner’s expected plan patterns to include an explicit project before unnest in the first branch
  Files:
    core/trino-main/src/test/java/io/trino/sql/planner/TestDereferencePushDown.java
    core/trino-main/src/test/java/io/trino/sql/planner/TestLogicalPlanner.java


@bvolpato-dd bvolpato-dd force-pushed the bvolpato/push-projection-through-join-into-tablescan branch from 7d6fbf9 to 503519c November 3, 2025 02:32
@github-actions github-actions bot added iceberg Iceberg connector delta-lake Delta Lake connector mongodb MongoDB connector elasticsearch Elasticsearch connector opensearch OpenSearch connector labels Nov 3, 2025
@bvolpato-dd bvolpato-dd force-pushed the bvolpato/push-projection-through-join-into-tablescan branch from 503519c to af993f1 November 3, 2025 02:32
```java
 * expression references columns from only one side of the join - For outer joins, projections on
 * the non-preserved side are not pushed (to maintain correctness)
 */
public class PushProjectionThroughJoinIntoTableScan
```
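The correctness caveat in this Javadoc can be seen on a two-row example. The sketch below is a hypothetical in-memory simulation of `LEFT JOIN ... ON a = b` (not Trino code; `OuterJoinPushdownDemo` and its records are illustrative names) that evaluates the deterministic expression `coalesce(b, 0)` either above the join or pushed onto the non-preserved right side.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

public class OuterJoinPushdownDemo {
    // Output row of LEFT JOIN ... ON a = b, carrying the projected value coalesce(b, 0)
    record Row(int a, Integer projected) {}

    // Right-side row after a pushed-down Project(projected := coalesce(b, 0), b)
    record RightRow(int b, Integer projected) {}

    // coalesce(b, 0): deterministic, but maps NULL to a non-NULL value
    static final UnaryOperator<Integer> COALESCE_ZERO = b -> b == null ? 0 : b;

    // Original plan: project evaluated above the join, so unmatched rows see b = NULL
    static List<Row> projectAboveJoin(List<Integer> left, List<Integer> right) {
        List<Row> out = new ArrayList<>();
        for (int a : left) {
            boolean matched = false;
            for (int b : right) {
                if (a == b) {
                    out.add(new Row(a, COALESCE_ZERO.apply(b)));
                    matched = true;
                }
            }
            if (!matched) {
                out.add(new Row(a, COALESCE_ZERO.apply(null)));
            }
        }
        return out;
    }

    // Rewritten plan: project pushed onto the right (non-preserved) side
    static List<Row> projectBelowJoin(List<Integer> left, List<Integer> right) {
        List<RightRow> projectedRight = new ArrayList<>();
        for (int b : right) {
            projectedRight.add(new RightRow(b, COALESCE_ZERO.apply(b)));
        }
        List<Row> out = new ArrayList<>();
        for (int a : left) {
            boolean matched = false;
            for (RightRow r : projectedRight) {
                if (a == r.b()) {
                    out.add(new Row(a, r.projected()));
                    matched = true;
                }
            }
            if (!matched) {
                // The join null-extends this row; the pushed projection never ran for it
                out.add(new Row(a, null));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> left = List.of(1, 2);
        List<Integer> right = List.of(1);
        System.out.println(projectAboveJoin(left, right)); // [Row[a=1, projected=1], Row[a=2, projected=0]]
        System.out.println(projectBelowJoin(left, right)); // [Row[a=1, projected=1], Row[a=2, projected=null]]
    }
}
```

The unmatched row `a = 2` gets `0` in the original plan but `NULL` after pushdown, which is why projections on the null-supplying side must stay above an outer join.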

This is more of a "push projection through join", right? "TableScan" in the name seems a bit redundant here.

Comment on lines +59 to +64
```java
 * Project(x, y) -- identity projections
 *   Join(a = b)
 *     Project(x := f(a), a) -- pushed down
 *       TableScan(a, ...)
 *     Project(y := g(b), b) -- pushed down
 *       TableScan(b, ...)
```

But if the join filters the data, then these projections would be redundant, right?
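The trade-off raised in this comment can be made concrete by counting evaluations. The sketch below is a hypothetical in-memory simulation (not Trino code; `ProjectionCostDemo`, `f`, and `Result` are illustrative names) of an inner join on `a = b` in which only 2 of 10 left rows have a match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class ProjectionCostDemo {
    // Number of evaluations of f and number of rows produced by the join
    record Result(int evaluations, int outputRows) {}

    private static int evaluations;

    // Stand-in for a deterministic but expensive expression f(a)
    static int f(int a) {
        evaluations++;
        return a * 2;
    }

    // Original plan: Project(x := f(a)) above Join(a = b); f runs once per joined row
    static Result projectAboveJoin(List<Integer> left, Set<Integer> rightKeys) {
        evaluations = 0;
        int rows = 0;
        for (int a : left) {
            if (rightKeys.contains(a)) {
                f(a);
                rows++;
            }
        }
        return new Result(evaluations, rows);
    }

    // Pushed plan: Project(x := f(a), a) below the join; f runs once per left input row
    static Result projectBelowJoin(List<Integer> left, Set<Integer> rightKeys) {
        evaluations = 0;
        List<int[]> projected = new ArrayList<>();
        for (int a : left) {
            projected.add(new int[] {a, f(a)});
        }
        int rows = 0;
        for (int[] row : projected) {
            if (rightKeys.contains(row[0])) {
                rows++;
            }
        }
        return new Result(evaluations, rows);
    }

    public static void main(String[] args) {
        List<Integer> left = List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        Set<Integer> rightKeys = Set.of(1, 2);
        System.out.println(projectAboveJoin(left, rightKeys)); // Result[evaluations=2, outputRows=2]
        System.out.println(projectBelowJoin(left, rightKeys)); // Result[evaluations=10, outputRows=2]
    }
}
```

The right side is modeled as a set of distinct keys, so each left row matches at most once. If a left row matched several right rows, the projection above the join would run once per output row, and pushing it below the join could reduce work instead.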
