* `AggUtils` is used to [planAggregateWithoutDistinct](#planAggregateWithoutDistinct), [planAggregateWithOneDistinct](#planAggregateWithOneDistinct), and `planStreamingAggregation`
67
+
* `AggUtils` is used to [createStreamingAggregate](#createStreamingAggregate), [planAggregateWithoutDistinct](#planAggregateWithoutDistinct), [planAggregateWithOneDistinct](#planAggregateWithOneDistinct)
68
+
69
+
## <span id="planStreamingAggregation"> Planning Execution of Streaming Aggregation
* `StatefulAggregationStrategy` ([Spark Structured Streaming]({{ book.structured_streaming }}/StatefulAggregationStrategy)) execution planning strategy is requested to plan a logical plan of a streaming aggregation (a streaming query with [Aggregate](logical-operators/Aggregate.md) operator)
`createStreamingAggregate` [creates an aggregate physical operator](#createAggregate) (with the `isStreaming` flag enabled).
102
+
103
+
!!! note
104
+
    `createStreamingAggregate` is exactly [createAggregate](#createAggregate) with the `isStreaming` flag enabled.
105
+
106
+
---
107
+
108
+
`createStreamingAggregate` is used when:
109
+
110
+
* `AggUtils` is requested to plan a [regular](#planStreamingAggregation) and [session-windowed](#planStreamingAggregationForSession) streaming aggregation
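
The relationship between `createStreamingAggregate` and `createAggregate` can be pictured with a small Python model. This is only a sketch: the function names mirror the ones in the text, but the parameters and the returned dictionary are hypothetical stand-ins, not Spark's actual Scala signatures.

```python
from functools import partial


def create_aggregate(grouping_expressions, aggregate_expressions, is_streaming=False):
    """Hypothetical stand-in for AggUtils.createAggregate.

    Returns a plain dict describing the operator rather than a real
    physical operator; only the is_streaming flag matters in this sketch.
    """
    return {
        "grouping": list(grouping_expressions),
        "aggregates": list(aggregate_expressions),
        "is_streaming": is_streaming,
    }


# createStreamingAggregate modelled as createAggregate with the flag pinned to True.
create_streaming_aggregate = partial(create_aggregate, is_streaming=True)

batch = create_aggregate(["id"], ["count(1)"])
streaming = create_streaming_aggregate(["id"], ["count(1)"])
```

In this model the two results differ only in the `is_streaming` entry, which is the point the note makes.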
docs/ObjectAggregationIterator.md (1 addition, 1 deletion)
@@ -16,7 +16,7 @@
16
16
* <span id="newMutableProjection"> Function to create a new `MutableProjection` given expressions and attributes (`(Seq[Expression], Seq[Attribute]) => MutableProjection`)
17
17
* <span id="originalInputAttributes"> Original Input [Attribute](expressions/Attribute.md)s
docs/configuration-properties.md (8 additions, 0 deletions)
@@ -50,6 +50,14 @@ Since: `3.2.0`
50
50
51
51
Use [SQLConf.ADAPTIVE_CUSTOM_COST_EVALUATOR_CLASS](SQLConf.md#ADAPTIVE_CUSTOM_COST_EVALUATOR_CLASS) method to access the property (in a type-safe way).
**(internal)** The number of rows in an in-memory hash map (that stores aggregation buffers) before [ObjectHashAggregateExec](physical-operators/ObjectHashAggregateExec.md) (or, more precisely, [ObjectAggregationIterator](ObjectAggregationIterator.md#processInputs)) falls back to sort-based aggregation
56
+
57
+
Default: `128`
58
+
59
+
Use [SQLConf.objectAggSortBasedFallbackThreshold](SQLConf.md#objectAggSortBasedFallbackThreshold) for the current value
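
The fallback described for this property can be illustrated with a simplified Python model. It assumes a plain per-key count as the aggregation and uses a dict and an in-memory sort in place of Spark's aggregation buffers and sorter; `aggregate_counts` and its parameters are hypothetical names, not part of Spark.

```python
from itertools import groupby
from operator import itemgetter


def aggregate_counts(keys, fallback_threshold=128):
    """Count occurrences per key; returns (counts, fell_back).

    Hash-based aggregation runs until the map holds `fallback_threshold`
    entries. If input remains at that point, the partial results and the
    remaining rows are sorted by key and merged (sort-based aggregation).
    """
    keys = iter(keys)
    hash_map = {}
    for key in keys:
        hash_map[key] = hash_map.get(key, 0) + 1
        if len(hash_map) >= fallback_threshold:
            break

    remaining = list(keys)
    if not remaining:
        # Input ended before (or exactly when) the threshold was hit: no fallback.
        return hash_map, False

    # Fallback: treat partial buffers and remaining rows as (key, count) pairs,
    # sort them by key, and merge each run of equal keys.
    entries = sorted(list(hash_map.items()) + [(key, 1) for key in remaining])
    merged = {
        key: sum(count for _, count in group)
        for key, group in groupby(entries, key=itemgetter(0))
    }
    return merged, True


small, small_fell_back = aggregate_counts(["a", "b", "a"])
large, large_fell_back = aggregate_counts(["a", "b", "c", "a", "b"], fallback_threshold=2)
```

With the default threshold of `128`, the three-row input stays hash-based. Lowering the threshold to `2` forces the second call onto the sort-based path while producing the same kind of result.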
`supportsAggregate` is enabled (`true`) when there is a `TypedImperativeAggregate` aggregate function among the [AggregateFunction](../expressions/AggregateFunction.md)s of the given [AggregateExpression](../expressions/AggregateExpression.md)s.
15
-
16
-
`supportsAggregate` is used when:
5
+
`ObjectHashAggregateExec` uses [ObjectAggregationIterator](../ObjectAggregationIterator.md) for [aggregation](#doExecute) (one per partition).
17
6
18
-
* `AggUtils` utility is used to [select an aggregate physical operator](../AggUtils.md#createAggregate)
7
+

19
8
20
9
## Creating Instance
21
10
22
11
`ObjectHashAggregateExec` takes the following to be created:
23
12
24
-
* <span id="requiredChildDistributionExpressions"> (optional) Required Child Distribution [Expression](../expressions/Expression.md)s
13
+
* <span id="requiredChildDistributionExpressions"> Required Child Distribution [Expression](../expressions/Expression.md)s
14
+
* [isStreaming](#isStreaming) flag
15
+
* <span id="numShufflePartitions"> Number of Shuffle Partitions (always `None`)
* `AggUtils` utility is used to [create a physical operator for aggregation](../AggUtils.md#createAggregate)
25
+
* `AggUtils` is requested to [create a physical operator for aggregation](../AggUtils.md#createAggregate)
26
+
27
+
### <span id="isStreaming"> isStreaming Flag
28
+
29
+
`ObjectHashAggregateExec` is given the `isStreaming` flag when [created](#creating-instance).
30
+
31
+
`isStreaming` is always `false` except when `AggUtils` is requested to [create a streaming aggregate physical operator](../AggUtils.md#createStreamingAggregate).
35
32
36
33
## <span id="metrics"> Performance Metrics
37
34
38
-
Key | Name (in web UI)
39
-
----------------|--------------------------
40
-
numOutputRows | number of output rows
41
-
aggTime | time in aggregation build
35
+
### <span id="aggTime"> time in aggregation build
36
+
37
+
The time taken by [doExecute](#doExecute) to process a single partition.
38
+
39
+
### <span id="numOutputRows"> number of output rows
40
+
41
+
* `1` when there are no input rows in a partition and no [groupingExpressions](#groupingExpressions).
42
+
* Used to create an [ObjectAggregationIterator](../ObjectAggregationIterator.md#numOutputRows).
43
+
44
+
### <span id="numTasksFallBacked"> number of sort fallback tasks
45
+
46
+
Used to create an [ObjectAggregationIterator](../ObjectAggregationIterator.md#numTasksFallBacked).
47
+
48
+
### <span id="spillSize"> spill size
49
+
50
+
Used to create an [ObjectAggregationIterator](../ObjectAggregationIterator.md#spillSize).
`doExecute` is part of the [SparkPlan](SparkPlan.md#doExecute) abstraction.
50
59
51
-
`doExecute` uses [ObjectAggregationIterator](../ObjectAggregationIterator.md) for aggregation (one per partition).
60
+
---
61
+
62
+
`doExecute` requests the [child physical operator](#child) to [execute](SparkPlan.md#execute) (and generate an `RDD[InternalRow]`), and then uses `mapPartitionsWithIndexInternal` on that RDD to process the partitions.
63
+
64
+
!!! note
65
+
    `doExecute` adds a new `MapPartitionsRDD` ([Spark Core]({{ book.spark_core }}/rdd/MapPartitionsRDD)) to the RDD lineage.
66
+
67
+
For no input records (in a partition) and non-empty [groupingExpressions](#groupingExpressions), `doExecute` returns an empty `Iterator`.
68
+
69
+
Otherwise, `doExecute` creates an [ObjectAggregationIterator](../ObjectAggregationIterator.md).
52
70
53
-
`doExecute`...FIXME
71
+
For no input records (in a partition) and no [groupingExpressions](#groupingExpressions), `doExecute` increments the [numOutputRows](#numOutputRows) metric (so it's just `1`) and requests the `ObjectAggregationIterator` for [outputForEmptyGroupingKeyWithoutInput](../ObjectAggregationIterator.md#outputForEmptyGroupingKeyWithoutInput).
72
+
73
+
Otherwise, `doExecute` returns the `ObjectAggregationIterator`.
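
The per-partition branching of `doExecute` can be sketched as a small Python model. It assumes rows are dicts, grouping expressions are column names, and the only aggregate is a count; `aggregate_partition`, `_count_by_key` and the `metrics` dict are hypothetical names used for illustration, not Spark's API.

```python
from itertools import chain

_NO_ROW = object()  # sentinel marking an empty partition


def aggregate_partition(rows, grouping_columns, metrics):
    """Model of the three outcomes for a single partition."""
    rows = iter(rows)
    first = next(rows, _NO_ROW)
    has_input = first is not _NO_ROW

    if not has_input and grouping_columns:
        # Grouped aggregation over an empty partition produces nothing.
        return iter(())

    if not has_input:
        # Global aggregation over an empty partition still produces one row.
        metrics["numOutputRows"] += 1
        return iter([{"count": 0}])

    # Regular case: hand the (re-assembled) input to the aggregation iterator.
    return _count_by_key(chain([first], rows), grouping_columns, metrics)


def _count_by_key(rows, grouping_columns, metrics):
    """Toy aggregation iterator: counts rows per grouping key."""
    counts = {}
    for row in rows:
        key = tuple(row[column] for column in grouping_columns)
        counts[key] = counts.get(key, 0) + 1
    for key, count in counts.items():
        metrics["numOutputRows"] += 1
        yield {**dict(zip(grouping_columns, key)), "count": count}


empty_grouped_metrics = {"numOutputRows": 0}
empty_grouped = list(aggregate_partition([], ["k"], empty_grouped_metrics))

empty_global_metrics = {"numOutputRows": 0}
empty_global = list(aggregate_partition([], [], empty_global_metrics))

grouped_metrics = {"numOutputRows": 0}
grouped = list(aggregate_partition([{"k": "a"}, {"k": "a"}, {"k": "b"}], ["k"], grouped_metrics))
```

The model checks for input before building anything, whereas the text describes the iterator being created first. The observable results for the three cases are the same.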
`supportsAggregate` is enabled (`true`) when there is a `TypedImperativeAggregate` aggregate function among the [AggregateFunction](../expressions/AggregateFunction.md)s of the given [AggregateExpression](../expressions/AggregateExpression.md)s.
83
+
84
+
---
85
+
86
+
`supportsAggregate` is used when:
87
+
88
+
* `AggUtils` utility is used to [select an aggregate physical operator](../AggUtils.md#createAggregate)
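
The `supportsAggregate` check amounts to an "is there at least one" test over the aggregate functions. A minimal Python sketch follows; the classes are toy stand-ins that only borrow names from Spark's expression hierarchy, and `supports_aggregate` is a hypothetical Python rendering of the Scala method.

```python
from dataclasses import dataclass


class AggregateFunction:
    """Toy base class for aggregate functions."""


class TypedImperativeAggregate(AggregateFunction):
    """Toy marker for aggregates that keep an arbitrary object as their buffer."""


class Sum(AggregateFunction):
    """Toy aggregate that is not a TypedImperativeAggregate."""


class CollectList(TypedImperativeAggregate):
    """Toy aggregate that is a TypedImperativeAggregate."""


@dataclass
class AggregateExpression:
    aggregate_function: AggregateFunction


def supports_aggregate(aggregate_expressions):
    """True when any aggregate function is a TypedImperativeAggregate."""
    return any(
        isinstance(expression.aggregate_function, TypedImperativeAggregate)
        for expression in aggregate_expressions
    )


only_sum = supports_aggregate([AggregateExpression(Sum())])
with_collect = supports_aggregate(
    [AggregateExpression(Sum()), AggregateExpression(CollectList())]
)
```

A single `TypedImperativeAggregate` among the expressions is enough for the check to pass; an empty list of expressions does not pass.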