
Commit 4e0fb05

InSubquery Expression
1 parent 4ad8c8e commit 4e0fb05

10 files changed

+135
-45
lines changed

docs/bloom-filter-join/index.md

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -7,7 +7,7 @@
77

88
Bloom Filter Join uses [BloomFilter](BloomFilter.md)s as runtime filters when [spark.sql.optimizer.runtime.bloomFilter.enabled](../configuration-properties.md#spark.sql.optimizer.runtime.bloomFilter.enabled) configuration property is enabled.
99

10-
Bloom Filter Join uses [InjectRuntimeFilter](../logical-optimizations/InjectRuntimeFilter.md) logical optimization to inject up to [spark.sql.optimizer.runtimeFilter.number.threshold](../configuration-properties.md#spark.sql.optimizer.runtimeFilter.number.threshold) filters ([BloomFilter](BloomFilter.md)s or `InSubquery`s).
10+
Bloom Filter Join uses [InjectRuntimeFilter](../logical-optimizations/InjectRuntimeFilter.md) logical optimization to inject up to [spark.sql.optimizer.runtimeFilter.number.threshold](../configuration-properties.md#spark.sql.optimizer.runtimeFilter.number.threshold) filters ([BloomFilter](BloomFilter.md)s or [InSubquery](../expressions/InSubquery.md)s).
1111

1212
??? note "SPARK-32268"
1313
Bloom Filter Join was introduced in [SPARK-32268]({{ spark.jira }}/SPARK-32268).

docs/expressions/InSubquery.md

Lines changed: 76 additions & 0 deletions
@@ -0,0 +1,76 @@
1+
---
2+
title: InSubquery
3+
---
4+
5+
# InSubquery Expression
6+
7+
`InSubquery` is a [Predicate](Predicate.md) that represents the following [IN](../sql/AstBuilder.md#withPredicate) SQL predicate in a logical query plan:
8+
9+
```sql
10+
NOT? IN '(' query ')'
11+
```
12+
13+
`InSubquery` can also be used internally for other use cases (e.g., [Runtime Filtering](../runtime-filtering/index.md), [Dynamic Partition Pruning](../dynamic-partition-pruning/index.md)).
14+
15+
## Creating Instance
16+
17+
`InSubquery` takes the following to be created:
18+
19+
* <span id="values"> Values ([Expression](Expression.md)s)
20+
* <span id="query"> [ListQuery](ListQuery.md)
21+
22+
`InSubquery` is created when:
23+
24+
* [InjectRuntimeFilter](../logical-optimizations/InjectRuntimeFilter.md) logical optimization is executed (and requested to [injectInSubqueryFilter](../logical-optimizations/InjectRuntimeFilter.md#injectInSubqueryFilter))
25+
* `AstBuilder` is requested to [withPredicate](../sql/AstBuilder.md#withPredicate) (for `NOT? IN '(' query ')'` SQL predicate)
26+
* [PlanDynamicPruningFilters](../physical-optimizations/PlanDynamicPruningFilters.md) physical optimization is executed (with [spark.sql.optimizer.dynamicPartitionPruning.enabled](../configuration-properties.md#spark.sql.optimizer.dynamicPartitionPruning.enabled) enabled)
27+
* `RowLevelOperationRuntimeGroupFiltering` logical optimization is executed
28+
29+
## Unevaluable { #Unevaluable }
30+
31+
`InSubquery` is an [Unevaluable](Unevaluable.md) expression.
32+
33+
`InSubquery` can be converted to a [Join](../logical-operators/Join.md) operator at logical optimization using [RewritePredicateSubquery](../logical-optimizations/RewritePredicateSubquery.md):
34+
35+
* [Left-Semi Join](../logical-operators/Join.md) unless it is a `NOT IN` that becomes a [Left-Anti Join](../logical-operators/Join.md) (among the other _less_ important cases)
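
The semi-join and anti-join readings of `IN` and `NOT IN` can be illustrated with a minimal plain-Scala sketch. It does not use Spark: `InSubqueryAsJoin`, `Order`, `leftSemi` and `leftAnti` are hypothetical names made up for this illustration, and the anti-join part assumes that neither side contains `NULL`s (with `NULL`s, SQL's `NOT IN` follows three-valued logic that this model ignores).

```scala
object InSubqueryAsJoin {
  // Hypothetical row type for the outer (left) side of the query.
  final case class Order(id: Int, customerId: Int)

  // `customerId IN (subquery)` modeled as a left-semi join:
  // a left row is kept (at most once) when its key has a match on the right.
  def leftSemi(orders: Seq[Order], customerIds: Seq[Int]): Seq[Order] = {
    val keys = customerIds.toSet
    orders.filter(o => keys.contains(o.customerId))
  }

  // `customerId NOT IN (subquery)` modeled as a left-anti join:
  // a left row is kept when its key has no match on the right.
  // Assumption: no NULLs on either side.
  def leftAnti(orders: Seq[Order], customerIds: Seq[Int]): Seq[Order] = {
    val keys = customerIds.toSet
    orders.filterNot(o => keys.contains(o.customerId))
  }
}
```

Duplicate keys on the subquery side do not duplicate the matching left rows, which is what distinguishes a semi join from an inner join.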
36+
37+
`InSubquery` can also be converted to [InSubqueryExec](InSubqueryExec.md) expression (over a [SubqueryExec](../physical-operators/SubqueryExec.md)) in [PlanSubqueries](../physical-optimizations/PlanSubqueries.md) physical optimization.
38+
39+
## Logical Analysis
40+
41+
`InSubquery` is resolved using the following logical analysis rules:
42+
43+
* [ResolveSubquery](../logical-analysis-rules/ResolveSubquery.md)
44+
* `InConversion`
45+
46+
## Logical Optimization
47+
48+
`InSubquery` is optimized using the following logical optimizations:
49+
50+
* [NullPropagation](../logical-optimizations/NullPropagation.md) (so `null` values give `null` results)
51+
* [InjectRuntimeFilter](../logical-optimizations/InjectRuntimeFilter.md)
52+
* [RewritePredicateSubquery](../logical-optimizations/RewritePredicateSubquery.md)
53+
54+
## Physical Optimization
55+
56+
`InSubquery` is optimized using the following physical optimizations:
57+
58+
* [PlanSubqueries](../physical-optimizations/PlanSubqueries.md)
59+
* [InsertAdaptiveSparkPlan](../physical-optimizations/InsertAdaptiveSparkPlan.md)
60+
* [PlanAdaptiveSubqueries](../physical-optimizations/PlanAdaptiveSubqueries.md)
61+
62+
## Catalyst DSL
63+
64+
`InSubquery` can be created with the [in](#in) operator of the [Catalyst DSL](../catalyst-dsl/index.md) (via `ImplicitOperators`).
65+
66+
## nodePatterns { #nodePatterns }
67+
68+
??? note "TreeNode"
69+
70+
```scala
71+
nodePatterns: Seq[TreePattern]
72+
```
73+
74+
`nodePatterns` is part of the [TreeNode](../catalyst/TreeNode.md#nodePatterns) abstraction.
75+
76+
`nodePatterns` is [IN_SUBQUERY](../catalyst/TreePattern.md#IN_SUBQUERY).

docs/expressions/InSubqueryExec.md

Lines changed: 41 additions & 30 deletions
@@ -1,6 +1,10 @@
1-
# InSubqueryExec
1+
---
2+
title: InSubqueryExec
3+
---
24

3-
`InSubqueryExec` is a [ExecSubqueryExpression](ExecSubqueryExpression.md) that represents `InSubquery` and [DynamicPruningSubquery](DynamicPruningSubquery.md) expressions at execution time.
5+
# InSubqueryExec Expression
6+
7+
`InSubqueryExec` is an [ExecSubqueryExpression](ExecSubqueryExpression.md) that represents [InSubquery](InSubquery.md) and [DynamicPruningSubquery](DynamicPruningSubquery.md) expressions at execution time.
48

59
## Creating Instance
610

@@ -13,13 +17,12 @@
1317

1418
`InSubqueryExec` is created when:
1519

16-
* [PlanSubqueries](../physical-optimizations/PlanSubqueries.md) physical optimization is executed (and plans `InSubquery` expressions)
17-
* [PlanAdaptiveSubqueries](../physical-optimizations/PlanAdaptiveSubqueries.md) physical optimization is executed (and plans `InSubquery` expressions)
20+
* [PlanSubqueries](../physical-optimizations/PlanSubqueries.md) physical optimization is executed (and plans [InSubquery](InSubquery.md) expressions)
21+
* [PlanAdaptiveSubqueries](../physical-optimizations/PlanAdaptiveSubqueries.md) physical optimization is executed (and plans [InSubquery](InSubquery.md) expressions)
1822
* [PlanDynamicPruningFilters](../physical-optimizations/PlanDynamicPruningFilters.md) physical optimization is executed (and plans [DynamicPruningSubquery](DynamicPruningSubquery.md) expressions)
1923

20-
## Broadcasted Result
24+
## Broadcasted Result { #resultBroadcast }
2125

22-
<span id="resultBroadcast">
2326
```scala
2427
resultBroadcast: Broadcast[Array[Any]]
2528
```
@@ -28,12 +31,16 @@ resultBroadcast: Broadcast[Array[Any]]
2831

2932
`resultBroadcast` is updated when `InSubqueryExec` is requested to [update the collected result](#updateResult).
3033

31-
## <span id="eval"> Interpreted Expression Evaluation
34+
## Interpreted Expression Evaluation { #eval }
3235

33-
```scala
34-
eval(
35-
input: InternalRow): Any
36-
```
36+
??? note "Expression"
37+
38+
```scala
39+
eval(
40+
input: InternalRow): Any
41+
```
42+
43+
`eval` is part of the [Expression](Expression.md#eval) abstraction.
3744

3845
`eval` [prepareResult](#prepareResult).
3946

@@ -44,45 +51,45 @@ eval(
4451
* `null` for `null` evaluation result
4552
* `true` when the [result](#result) contains the evaluation result or `false`
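
The three outcomes listed above can be modeled in a few lines of plain Scala. This is only a sketch of the described behavior, not Spark's code: `InSubqueryEvalSketch` and `evalIn` are hypothetical names, and `Option[Boolean]` stands in for SQL's three-valued result (with `None` playing the role of `null`).

```scala
object InSubqueryEvalSketch {
  // value:  the child expression evaluated for the current row (None models null)
  // result: the values collected from the subquery
  def evalIn(value: Option[Any], result: Array[Any]): Option[Boolean] =
    // null in => null out; otherwise a plain membership test
    value.map(v => result.contains(v))
}
```

For example, with `Array[Any](1, 2, 3)` as the collected result, `evalIn(None, ...)` gives `None`, `evalIn(Some(2), ...)` gives `Some(true)`, and `evalIn(Some(7), ...)` gives `Some(false)`.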
4653

47-
`eval` is part of the [Expression](Expression.md#eval) abstraction.
54+
## Code-Generated Expression Evaluation { #doGenCode }
4855

49-
## Code-Generated Expression Evaluation
56+
??? note "Expression"
5057

51-
<span id="doGenCode">
52-
```scala
53-
doGenCode(
54-
ctx: CodegenContext,
55-
ev: ExprCode): ExprCode
56-
```
58+
```scala
59+
doGenCode(
60+
ctx: CodegenContext,
61+
ev: ExprCode): ExprCode
62+
```
63+
64+
`doGenCode` is part of the [Expression](Expression.md#doGenCode) abstraction.
5765

5866
`doGenCode` [prepareResult](#prepareResult).
5967

6068
`doGenCode` creates an [InSet](InSet.md) expression (with the [child](#child) expression and [result](#result)) and requests it to [doGenCode](Expression.md#doGenCode).
6169

62-
`doGenCode` is part of the [Expression](Expression.md#doGenCode) abstraction.
70+
## Updating Result { #updateResult }
6371

64-
## Updating Result
72+
??? note "ExecSubqueryExpression"
6573

66-
<span id="updateResult">
67-
```scala
68-
updateResult(): Unit
69-
```
74+
```scala
75+
updateResult(): Unit
76+
```
77+
78+
`updateResult` is part of the [ExecSubqueryExpression](ExecSubqueryExpression.md#updateResult) abstraction.
7079

7180
`updateResult` requests the [BaseSubqueryExec](#plan) to [executeCollect](../physical-operators/SparkPlan.md#executeCollect).
7281

7382
`updateResult` uses the collected result to update the [result](#result) and [resultBroadcast](#resultBroadcast) registries.
7483

75-
`updateResult` is part of the [ExecSubqueryExpression](ExecSubqueryExpression.md#updateResult) abstraction.
76-
77-
## <span id="result"> result Registry
84+
## result
7885

7986
```scala
8087
result: Array[Any]
8188
```
8289

8390
`result`...FIXME
8491

85-
## <span id="prepareResult"> prepareResult
92+
## prepareResult { #prepareResult }
8693

8794
```scala
8895
prepareResult(): Unit
@@ -96,4 +103,8 @@ prepareResult(): Unit
96103
[this] has not finished
97104
```
98105

99-
`prepareResult` is used when `InSubqueryExec` expression is evaluated ([interpreted](#eval) or [code-generated](#doGenCode)).
106+
---
107+
108+
`prepareResult` is used when:
109+
110+
* `InSubqueryExec` expression is evaluated ([interpreted](#eval) or [code-generated](#doGenCode)).

docs/logical-analysis-rules/ResolveSubquery.md

Lines changed: 2 additions & 2 deletions
@@ -11,7 +11,7 @@
1111

1212
* [ScalarSubquery](../expressions/ExecSubqueryExpression-ScalarSubquery.md)
1313
* [Exists](../expressions/Exists.md)
14-
* [ListQuery](../expressions/ListQuery.md) (in `InSubquery` expressions)
14+
* [ListQuery](../expressions/ListQuery.md) (in [InSubquery](../expressions/InSubquery.md) expressions)
1515

1616
`ResolveSubquery` is part of [Resolution](../Analyzer.md#Resolution) rule batch of the [Logical Analyzer](../Analyzer.md).
1717

@@ -79,7 +79,7 @@ resolveSubQueries(
7979

8080
* [ScalarSubquery](../expressions/ExecSubqueryExpression-ScalarSubquery.md)
8181
* [Exists](../expressions/Exists.md)
82-
* [ListQuery](../expressions/ListQuery.md) (in `InSubquery` expressions)
82+
* [ListQuery](../expressions/ListQuery.md) (in [InSubquery](../expressions/InSubquery.md) expressions)
8383

8484
`resolveSubQueries` is used when `ResolveSubquery` is [executed](#apply).
8585

docs/logical-optimizations/InjectRuntimeFilter.md

Lines changed: 2 additions & 2 deletions
@@ -126,9 +126,9 @@ Property | Value
126126
Unless the `Aggregate` logical operator [canBroadcastBySize](../JoinSelectionHelper.md#canBroadcastBySize), `injectInSubqueryFilter` returns the given `filterApplicationSidePlan` logical plan (and basically throws away all the work so far).
127127

128128
!!! note
129-
`injectInSubqueryFilter` skips the `InSubquery` filter if the size of the `Aggregate` is beyond [broadcast join threshold](../JoinSelectionHelper.md#canBroadcastBySize) and the semi-join will be a shuffle join, which is not worthwhile.
129+
`injectInSubqueryFilter` skips the [InSubquery](../expressions/InSubquery.md) filter if the size of the `Aggregate` is beyond the [broadcast join threshold](../JoinSelectionHelper.md#canBroadcastBySize), since the semi-join would then be a shuffle join, which is not worthwhile.
130130

131-
`injectInSubqueryFilter` creates an `InSubquery` logical operator with the following:
131+
`injectInSubqueryFilter` creates an [InSubquery](../expressions/InSubquery.md) expression with the following:
132132

133133
* The given `filterApplicationSideExp` (possibly [mayWrapWithHash](#mayWrapWithHash))
134134
* [ListQuery](../expressions/ListQuery.md) expression with the `Aggregate`

docs/physical-operators/SparkPlan.md

Lines changed: 1 addition & 1 deletion
@@ -520,7 +520,7 @@ executeCollect(): Array[InternalRow]
520520

521521
* `SparkPlan` is requested to [executeCollectPublic](#executeCollectPublic)
522522

523-
* `ScalarSubquery` and `InSubquery` plan expressions are requested to `updateResult`
523+
* [ScalarSubquery](../expressions/ScalarSubquery.md) and [InSubqueryExec](../expressions/InSubqueryExec.md) plan expressions are requested to `updateResult`
524524

525525
## <span id="outputPartitioning"> Output Data Partitioning Requirements
526526

docs/physical-optimizations/InsertAdaptiveSparkPlan.md

Lines changed: 1 addition & 1 deletion
@@ -112,7 +112,7 @@ buildSubqueryMap(
112112
plan: SparkPlan): Map[Long, SubqueryExec]
113113
```
114114

115-
`buildSubqueryMap` finds [ScalarSubquery](../expressions/ScalarSubquery) and [ListQuery](../expressions/ListQuery.md) (in `InSubquery`) expressions (unique by expression ID to reuse the execution plan from another sub-query) in the given [physical query plan](../physical-operators/SparkPlan.md).
115+
`buildSubqueryMap` finds [ScalarSubquery](../expressions/ScalarSubquery.md) and [ListQuery](../expressions/ListQuery.md) (in [InSubquery](../expressions/InSubquery.md)) expressions (unique by expression ID to reuse the execution plan from another sub-query) in the given [physical query plan](../physical-operators/SparkPlan.md).
116116

117117
For every `ScalarSubquery` and `ListQuery` expression, `buildSubqueryMap` [compileSubquery](#compileSubquery), [verifyAdaptivePlan](#verifyAdaptivePlan) and registers a new [SubqueryExec](../physical-operators/SubqueryExec.md) operator.
118118

docs/runtime-filtering/index.md

Lines changed: 2 additions & 2 deletions
@@ -1,5 +1,5 @@
11
# Runtime Filtering
22

3-
**Runtime Filtering** is an optimization of join queries by pre-filtering one side of a join using [Bloom Filter](../bloom-filter-join/index.md) or `InSubquery` predicate based on the values from the other side of the join.
3+
**Runtime Filtering** is an optimization of join queries by pre-filtering one side of a join using [Bloom Filter](../bloom-filter-join/index.md) or [InSubquery](../expressions/InSubquery.md) predicate based on the values from the other side of the join.
44

5-
Runtime Filtering uses [InjectRuntimeFilter](../logical-optimizations/InjectRuntimeFilter.md) logical optimization to inject either [Bloom Filter](../bloom-filter-join/index.md) or `InSubquery` predicate based on [spark.sql.optimizer.runtime.bloomFilter.enabled](../configuration-properties.md#spark.sql.optimizer.runtime.bloomFilter.enabled) configuration property.
5+
Runtime Filtering uses [InjectRuntimeFilter](../logical-optimizations/InjectRuntimeFilter.md) logical optimization to inject either [Bloom Filter](../bloom-filter-join/index.md) or [InSubquery](../expressions/InSubquery.md) predicate based on [spark.sql.optimizer.runtime.bloomFilter.enabled](../configuration-properties.md#spark.sql.optimizer.runtime.bloomFilter.enabled) configuration property.
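
The idea of pre-filtering one side of a join with the values from the other side can be sketched in plain Scala. This is a conceptual model only and does not touch Spark: `RuntimeFilterSketch`, `Sale`, `Item`, `creationSideKeys`, `preFilter` and `join` are hypothetical names, and an exact `Set` of join keys stands in for the `InSubquery` flavor of the filter (a Bloom filter would be an approximate variant of the same membership test).

```scala
object RuntimeFilterSketch {
  // Hypothetical row types for the two sides of the join.
  final case class Sale(itemId: Int, amount: Double)
  final case class Item(itemId: Int, category: String)

  // Join keys collected from the small, already-filtered side of the join.
  def creationSideKeys(items: Seq[Item], category: String): Set[Int] =
    items.filter(_.category == category).map(_.itemId).toSet

  // Pre-filter the other side so that fewer rows reach the join.
  def preFilter(sales: Seq[Sale], keys: Set[Int]): Seq[Sale] =
    sales.filter(s => keys.contains(s.itemId))

  // A plain inner join on itemId.
  def join(sales: Seq[Sale], items: Seq[Item]): Seq[(Sale, Item)] =
    for (s <- sales; i <- items if s.itemId == i.itemId) yield (s, i)
}
```

Joining the pre-filtered sales with the filtered items gives the same rows as joining all the sales, so the filter only removes work and does not change the result.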

docs/sql/AstBuilder.md

Lines changed: 8 additions & 6 deletions
@@ -763,7 +763,7 @@ Creates an [UnresolvedHaving](../logical-operators/UnresolvedHaving.md) for the
763763
HAVING booleanExpression
764764
```
765765

766-
### <span id="withHints"> withHints
766+
### withHints { #withHints }
767767

768768
Adds an [UnresolvedHint](../logical-operators/UnresolvedHint.md) for `/*+ hint */` in `SELECT` queries.
769769

@@ -826,17 +826,19 @@ For regular `SELECT` (no `TRANSFORM`, `MAP` or `REDUCE` qualifiers), `withQueryS
826826

827827
1. [UnresolvedHint](#withHints) unary logical operator (if used in the parsed SQL text)
828828

829-
### withPredicate
829+
### withPredicate { #withPredicate }
830830

831-
* `NOT? IN '(' query ')'` adds an [In](../expressions/In.md) predicate expression with a [ListQuery](../expressions/ListQuery.md) subquery expression
831+
Creates an [InSubquery](../expressions/InSubquery.md) over a [ListQuery](../expressions/ListQuery.md) (possibly "inverted" using `Not` unary expression) for the following SQL predicate:
832832

833-
* `NOT? IN '(' expression (',' expression)* ')'` adds an [In](../expressions/In.md) predicate expression
833+
```sql
834+
NOT? IN '(' query ')'
835+
```
834836

835-
### <span id="withPivot"> withPivot
837+
### withPivot { #withPivot }
836838

837839
Creates a [Pivot](../logical-operators/Pivot.md) unary logical operator for the following SQL clause:
838840

839-
```text
841+
```sql
840842
PIVOT '(' aggregates FOR pivotColumn IN '(' pivotValue (',' pivotValue)* ')' ')'
841843
```
842844

mkdocs.yml

Lines changed: 1 addition & 0 deletions
@@ -329,6 +329,7 @@ nav:
329329
- In: expressions/In.md
330330
- Inline: expressions/Inline.md
331331
- InSet: expressions/InSet.md
332+
- InSubquery: expressions/InSubquery.md
332333
- InSubqueryExec: expressions/InSubqueryExec.md
333334
- InterpretedProjection: expressions/InterpretedProjection.md
334335
- JsonToStructs: expressions/JsonToStructs.md
