docs/bloom-filter-join/index.md: 1 addition & 1 deletion
@@ -7,7 +7,7 @@
 
 Bloom Filter Join uses [BloomFilter](BloomFilter.md)s as runtime filters when [spark.sql.optimizer.runtime.bloomFilter.enabled](../configuration-properties.md#spark.sql.optimizer.runtime.bloomFilter.enabled) configuration property is enabled.
 
-Bloom Filter Join uses [InjectRuntimeFilter](../logical-optimizations/InjectRuntimeFilter.md) logical optimization to inject up to [spark.sql.optimizer.runtimeFilter.number.threshold](../configuration-properties.md#spark.sql.optimizer.runtimeFilter.number.threshold) filters ([BloomFilter](BloomFilter.md)s or `InSubquery`s).
+Bloom Filter Join uses [InjectRuntimeFilter](../logical-optimizations/InjectRuntimeFilter.md) logical optimization to inject up to [spark.sql.optimizer.runtimeFilter.number.threshold](../configuration-properties.md#spark.sql.optimizer.runtimeFilter.number.threshold) filters ([BloomFilter](BloomFilter.md)s or [InSubquery](../expressions/InSubquery.md)s).
 
 ??? note "SPARK-32268"
     Bloom Filter Join was introduced in [SPARK-32268]({{ spark.jira }}/SPARK-32268).
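
The runtime-filter idea in this hunk can be illustrated without Spark. The following is a minimal sketch in plain Scala, not Spark's `BloomFilter` and not what `InjectRuntimeFilter` generates: `ToyBloomFilter`, `RuntimeFilterSketch`, `preFilter`, and the bit and hash counts are all hypothetical names and values chosen for illustration. It builds a Bloom filter from the join keys of one side and uses it to pre-filter the keys of the other side.

```scala
import java.util.BitSet
import scala.util.hashing.MurmurHash3

// Toy Bloom filter (hypothetical; not Spark's BloomFilter implementation).
final class ToyBloomFilter(numBits: Int, numHashes: Int) {
  private val bits = new BitSet(numBits)

  // Derive the i-th bit position of a key by using i as the hash seed.
  private def position(key: Long, i: Int): Int =
    Math.floorMod(MurmurHash3.stringHash(key.toString, i), numBits)

  def put(key: Long): Unit =
    (0 until numHashes).foreach(i => bits.set(position(key, i)))

  // May return true for a key that was never added (false positive),
  // but never returns false for a key that was added.
  def mightContain(key: Long): Boolean =
    (0 until numHashes).forall(i => bits.get(position(key, i)))
}

object RuntimeFilterSketch {
  // Build the filter from one side's join keys, then drop keys of the
  // other side that definitely have no match before the join runs.
  def preFilter(applicationSideKeys: Seq[Long], creationSideKeys: Seq[Long]): Seq[Long] = {
    val filter = new ToyBloomFilter(numBits = 1024, numHashes = 3)
    creationSideKeys.foreach(filter.put)
    applicationSideKeys.filter(filter.mightContain)
  }
}
```

Because a Bloom filter can report false positives, the pre-filtered side may still contain non-matching keys; the join itself remains responsible for the exact result.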
`InSubquery` is a [Predicate](Predicate.md) that represents the following [IN](../sql/AstBuilder.md#withPredicate) SQL predicate in a logical query plan:
+
+```sql
+NOT? IN '(' query ')'
+```
+
+`InSubquery` can also be used internally for other use cases (e.g., [Runtime Filtering](../runtime-filtering/index.md), [Dynamic Partition Pruning](../dynamic-partition-pruning/index.md)).
* [InjectRuntimeFilter](../logical-optimizations/InjectRuntimeFilter.md) logical optimization is executed (and [injectInSubqueryFilter](../logical-optimizations/InjectRuntimeFilter.md#injectInSubqueryFilter))
+* `AstBuilder` is requested to [withPredicate](../sql/AstBuilder.md#withPredicate) (for `NOT? IN '(' query ')'` SQL predicate)
+* [PlanDynamicPruningFilters](../physical-optimizations/PlanDynamicPruningFilters.md) physical optimization is executed (with [spark.sql.optimizer.dynamicPartitionPruning.enabled](../configuration-properties.md#spark.sql.optimizer.dynamicPartitionPruning.enabled) enabled)
+* `RowLevelOperationRuntimeGroupFiltering` logical optimization is executed
+
+## Unevaluable { #Unevaluable }
+
+`InSubquery` is an [Unevaluable](Unevaluable.md) expression.
+
+`InSubquery` can be converted to a [Join](../logical-operators/Join.md) operator at logical optimization using [RewritePredicateSubquery](../logical-optimizations/RewritePredicateSubquery.md):
+
+* [Left-Semi Join](../logical-operators/Join.md) unless it is a `NOT IN` that becomes a [Left-Anti Join](../logical-operators/Join.md) (among the other _less_ important cases)
+
+`InSubquery` can also be converted to [InSubqueryExec](InSubqueryExec.md) expression (over a [SubqueryExec](../physical-operators/SubqueryExec.md)) in [PlanSubqueries](../physical-optimizations/PlanSubqueries.md) physical optimization.
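
The semi-join and anti-join reading of `IN` and `NOT IN` described above can be sketched on plain Scala collections. This is an illustration of the join semantics only, not of `RewritePredicateSubquery` itself: `InSubquerySketch`, `Row`, `leftSemi`, and `leftAnti` are hypothetical names, and the sketch deliberately ignores `NULL` values, which real `NOT IN` handling has to account for.

```scala
object InSubquerySketch {
  final case class Row(id: Int, name: String)

  // WHERE id IN (subquery): keep the left rows that have a match
  // in the subquery result (left-semi join semantics).
  def leftSemi(left: Seq[Row], subquery: Set[Int]): Seq[Row] =
    left.filter(row => subquery.contains(row.id))

  // WHERE id NOT IN (subquery): keep the left rows that have no match
  // (left-anti join semantics). NULL handling is ignored in this sketch.
  def leftAnti(left: Seq[Row], subquery: Set[Int]): Seq[Row] =
    left.filterNot(row => subquery.contains(row.id))
}
```

In both cases only columns of the left side survive, which is what distinguishes these joins from an inner join on the same keys.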
+
+## Logical Analysis
+
+`InSubquery` is resolved using the following logical analysis rules:
`InSubqueryExec` is a [ExecSubqueryExpression](ExecSubqueryExpression.md) that represents `InSubquery` and [DynamicPruningSubquery](DynamicPruningSubquery.md) expressions at execution time.
+# InSubqueryExec Expression
+
+`InSubqueryExec` is an [ExecSubqueryExpression](ExecSubqueryExpression.md) that represents [InSubquery](InSubquery.md) and [DynamicPruningSubquery](DynamicPruningSubquery.md) expressions at execution time.
 
 ## Creating Instance
 
@@ -13,13 +17,12 @@
 
 `InSubqueryExec` is created when:
 
-* [PlanSubqueries](../physical-optimizations/PlanSubqueries.md) physical optimization is executed (and plans `InSubquery` expressions)
-* [PlanAdaptiveSubqueries](../physical-optimizations/PlanAdaptiveSubqueries.md) physical optimization is executed (and plans `InSubquery` expressions)
+* [PlanSubqueries](../physical-optimizations/PlanSubqueries.md) physical optimization is executed (and plans [InSubquery](InSubquery.md) expressions)
+* [PlanAdaptiveSubqueries](../physical-optimizations/PlanAdaptiveSubqueries.md) physical optimization is executed (and plans [InSubquery](InSubquery.md) expressions)
 * [PlanDynamicPruningFilters](../physical-optimizations/PlanDynamicPruningFilters.md) physical optimization is executed (and plans [DynamicPruningSubquery](DynamicPruningSubquery.md) expressions)
`doGenCode` is part of the [Expression](Expression.md#doGenCode) abstraction.
 
 `doGenCode` [prepareResult](#prepareResult).
 
 `doGenCode` creates a [InSet](InSet.md) expression (with the [child](#child) expression and [result](#result)) and requests it to [doGenCode](Expression.md#doGenCode).
 
-`doGenCode` is part of the [Expression](Expression.md#doGenCode) abstraction.
+## Updating Result { #updateResult }
 
-## Updating Result
+??? note "ExecSubqueryExpression"
 
-<span id="updateResult">
-```scala
-updateResult(): Unit
-```
+    ```scala
+    updateResult(): Unit
+    ```
+
+    `updateResult` is part of the [ExecSubqueryExpression](ExecSubqueryExpression.md#updateResult) abstraction.
 
 `updateResult` requests the [BaseSubqueryExec](#plan) to [executeCollect](../physical-operators/SparkPlan.md#executeCollect).
 
 `updateResult` uses the collected result to update the [result](#result) and [resultBroadcast](#resultBroadcast) registries.
 
-`updateResult` is part of the [ExecSubqueryExpression](ExecSubqueryExpression.md#updateResult) abstraction.
-
-## <span id="result"> result Registry
+## result
 
 ```scala
 result: Array[Any]
 ```
 
 `result`...FIXME
 
-## <span id="prepareResult"> prepareResult
+## prepareResult { #prepareResult }
 
 ```scala
 prepareResult(): Unit
@@ -96,4 +103,8 @@ prepareResult(): Unit
 [this] has not finished
 ```
 
-`prepareResult` is used when `InSubqueryExec` expression is evaluated ([interpreted](#eval) or [code-generated](#doGenCode)).
+---
+
+`prepareResult` is used when:
+
+* `InSubqueryExec` expression is evaluated ([interpreted](#eval) or [code-generated](#doGenCode)).
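
The relationship between `updateResult`, `prepareResult`, and evaluation described in this file can be sketched as a small state holder. This is a hedged illustration in plain Scala rather than the actual `InSubqueryExec`: `ToyInSubqueryExec` and its `collect` parameter are hypothetical, the `resultBroadcast` registry is left out, and the membership check stands in for the `InSet` expression mentioned above.

```scala
// Hypothetical stand-in for InSubqueryExec; method names mirror the page,
// the bodies are illustrative only.
final class ToyInSubqueryExec(collect: () => Array[Any]) {
  @volatile private var result: Array[Any] = null

  // Mirrors updateResult: run the subquery and keep the collected values.
  def updateResult(): Unit = {
    result = collect()
  }

  // Mirrors prepareResult: fail fast if the subquery has not been collected yet.
  private def prepareResult(): Unit =
    require(result != null, s"$this has not finished")

  // Interpreted evaluation: membership test against the collected values.
  def eval(value: Any): Boolean = {
    prepareResult()
    result.contains(value)
  }
}
```

Evaluating before `updateResult` has run fails the requirement, which matches the "has not finished" message shown in the hunk above.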
docs/logical-optimizations/InjectRuntimeFilter.md: 2 additions & 2 deletions
@@ -126,9 +126,9 @@ Property | Value
 Unless the `Aggregate` logical operator [canBroadcastBySize](../JoinSelectionHelper.md#canBroadcastBySize), `injectInSubqueryFilter` returns the given `filterApplicationSidePlan` logical plan (and basically throws away all the work so far).
 
 !!! note
-    `injectInSubqueryFilter` skips the `InSubquery` filter if the size of the `Aggregate` is beyond [broadcast join threshold](../JoinSelectionHelper.md#canBroadcastBySize) and the semi-join will be a shuffle join, which is not worthwhile.
+    `injectInSubqueryFilter` skips the [InSubquery](../expressions/InSubquery.md) filter if the size of the `Aggregate` is beyond [broadcast join threshold](../JoinSelectionHelper.md#canBroadcastBySize) and the semi-join will be a shuffle join, which is not worthwhile.
 
-`injectInSubqueryFilter` creates an `InSubquery` logical operator with the following:
+`injectInSubqueryFilter` creates an [InSubquery](../expressions/InSubquery.md) expression with the following:
 
 * The given `filterApplicationSideExp` (possibly [mayWrapWithHash](#mayWrapWithHash))
 * [ListQuery](../expressions/ListQuery.md) expression with the `Aggregate`
docs/physical-optimizations/InsertAdaptiveSparkPlan.md: 1 addition & 1 deletion
@@ -112,7 +112,7 @@ buildSubqueryMap(
 plan: SparkPlan): Map[Long, SubqueryExec]
 ```
 
-`buildSubqueryMap` finds [ScalarSubquery](../expressions/ScalarSubquery) and [ListQuery](../expressions/ListQuery.md) (in `InSubquery`) expressions (unique by expression ID to reuse the execution plan from another sub-query) in the given [physical query plan](../physical-operators/SparkPlan.md).
+`buildSubqueryMap` finds [ScalarSubquery](../expressions/ScalarSubquery) and [ListQuery](../expressions/ListQuery.md) (in [InSubquery](../expressions/InSubquery.md)) expressions (unique by expression ID to reuse the execution plan from another sub-query) in the given [physical query plan](../physical-operators/SparkPlan.md).
 
 For every `ScalarSubquery` and `ListQuery` expressions, `buildSubqueryMap` [compileSubquery](#compileSubquery), [verifyAdaptivePlan](#verifyAdaptivePlan) and registers a new [SubqueryExec](../physical-operators/SubqueryExec.md) operator.
**Runtime Filtering** is an optimization of join queries by pre-filtering one side of a join using [Bloom Filter](../bloom-filter-join/index.md) or `InSubquery` predicate based on the values from the other side of the join.
+**Runtime Filtering** is an optimization of join queries by pre-filtering one side of a join using [Bloom Filter](../bloom-filter-join/index.md) or [InSubquery](../expressions/InSubquery.md) predicate based on the values from the other side of the join.
 
-Runtime Filtering uses [InjectRuntimeFilter](../logical-optimizations/InjectRuntimeFilter.md) logical optimization to inject either [Bloom Filter](../bloom-filter-join/index.md) or `InSubquery` predicate based on [spark.sql.optimizer.runtime.bloomFilter.enabled](../configuration-properties.md#spark.sql.optimizer.runtime.bloomFilter.enabled) configuration property.
+Runtime Filtering uses [InjectRuntimeFilter](../logical-optimizations/InjectRuntimeFilter.md) logical optimization to inject either [Bloom Filter](../bloom-filter-join/index.md) or [InSubquery](../expressions/InSubquery.md) predicate based on [spark.sql.optimizer.runtime.bloomFilter.enabled](../configuration-properties.md#spark.sql.optimizer.runtime.bloomFilter.enabled) configuration property.