Skip to content

Commit f8f109d

Browse files
ObjectAggregationIterator and ObjectHashAggregateExec
1 parent 658b37c commit f8f109d

18 files changed

+184
-120
lines changed

docs/ObjectAggregationIterator.md

Lines changed: 0 additions & 24 deletions
This file was deleted.

docs/SortBasedAggregationIterator.md

Lines changed: 0 additions & 22 deletions
This file was deleted.

docs/UnsafeRow.md

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -122,7 +122,7 @@ Object getBaseObject()
122122

123123
* `UnsafeWriter` is requested to `write` an `UnsafeRow`
124124
* `UnsafeExternalRowSorter` is requested to `insertRow` an `UnsafeRow`
125-
* `UnsafeFixedWidthAggregationMap` is requested to [getAggregationBufferFromUnsafeRow](UnsafeFixedWidthAggregationMap.md#getAggregationBufferFromUnsafeRow)
125+
* `UnsafeFixedWidthAggregationMap` is requested to [getAggregationBufferFromUnsafeRow](physical-operators/UnsafeFixedWidthAggregationMap.md#getAggregationBufferFromUnsafeRow)
126126
* `UnsafeKVExternalSorter` is requested to `insertKV`
127127
* `ExternalAppendOnlyUnsafeRowArray` is requested to [add an UnsafeRow](ExternalAppendOnlyUnsafeRowArray.md#add)
128128
* `UnsafeHashedRelation` is requested to [get](physical-operators/UnsafeHashedRelation.md#get), [getValue](physical-operators/UnsafeHashedRelation.md#getValue), [getWithKeyIndex](physical-operators/UnsafeHashedRelation.md#getWithKeyIndex), [getValueWithKeyIndex](physical-operators/UnsafeHashedRelation.md#getValueWithKeyIndex), [apply](physical-operators/UnsafeHashedRelation.md#apply)
@@ -179,8 +179,8 @@ void copyFrom(
179179

180180
`copyFrom` is used when:
181181

182-
* `ObjectAggregationIterator` is requested to [processInputs](ObjectAggregationIterator.md#processInputs) (using `SortBasedAggregator`)
183-
* `TungstenAggregationIterator` is requested to [produce the next UnsafeRow](TungstenAggregationIterator.md#next) and [outputForEmptyGroupingKeyWithoutInput](TungstenAggregationIterator.md#outputForEmptyGroupingKeyWithoutInput)
182+
* `ObjectAggregationIterator` is requested to [processInputs](physical-operators/ObjectAggregationIterator.md#processInputs) (using `SortBasedAggregator`)
183+
* `TungstenAggregationIterator` is requested to [produce the next UnsafeRow](physical-operators/TungstenAggregationIterator.md#next) and [outputForEmptyGroupingKeyWithoutInput](physical-operators/TungstenAggregationIterator.md#outputForEmptyGroupingKeyWithoutInput)
184184

185185
## Deserializing UnsafeRow
186186

docs/configuration-properties.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -52,7 +52,7 @@ Use [SQLConf.ADAPTIVE_CUSTOM_COST_EVALUATOR_CLASS](SQLConf.md#ADAPTIVE_CUSTOM_CO
5252

5353
## <span id="spark.sql.objectHashAggregate.sortBased.fallbackThreshold"> spark.sql.objectHashAggregate.sortBased.fallbackThreshold
5454

55-
**(internal)** The number of rows of an in-memory hash map (to store aggregation buffer) before [ObjectHashAggregateExec](physical-operators/ObjectHashAggregateExec.md) ([ObjectAggregationIterator](ObjectAggregationIterator.md#processInputs) precisely) falls back to sort-based aggregation
55+
**(internal)** The number of rows of an in-memory hash map (to store aggregation buffer) before [ObjectHashAggregateExec](physical-operators/ObjectHashAggregateExec.md) ([ObjectAggregationIterator](physical-operators/ObjectAggregationIterator.md#processInputs) precisely) falls back to sort-based aggregation
5656

5757
Default: `128`
5858

docs/expressions/AggregateExpression.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -20,7 +20,7 @@
2020

2121
* For `PartialMerge` or `Final` modes, the input to the [AggregateFunction](#aggregateFunction) is [immutable input aggregation buffers](AggregateFunction.md#inputAggBufferAttributes), and the actual children of the `AggregateFunction` is not used
2222

23-
* [AggregateExpression](../AggregationIterator.md#aggregateExpressions)s of a [AggregationIterator](../AggregationIterator.md) cannot have more than 2 distinct modes nor the modes be among `Partial` and `PartialMerge` or `Final` and `Complete` mode pairs
23+
* [AggregateExpression](../physical-operators/AggregationIterator.md#aggregateExpressions)s of a [AggregationIterator](../physical-operators/AggregationIterator.md) cannot have more than 2 distinct modes nor the modes be among `Partial` and `PartialMerge` or `Final` and `Complete` mode pairs
2424

2525
* `Partial` and `Complete` or `PartialMerge` and `Final` pairs are supported
2626

docs/expressions/DeclarativeAggregate.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -16,7 +16,7 @@ Used when:
1616

1717
* `EliminateAggregateFilter` logical optimization is executed
1818
* `AggregatingAccumulator` utility is used to create an `AggregatingAccumulator`
19-
* `AggregationIterator` is requested for the [generateResultProjection](../AggregationIterator.md#generateResultProjection)
19+
* `AggregationIterator` is requested for the [generateResultProjection](../physical-operators/AggregationIterator.md#generateResultProjection)
2020
* `HashAggregateExec` physical operator is requested to [doProduceWithoutKeys](../physical-operators/HashAggregateExec.md#doProduceWithoutKeys) and [generateResultFunction](../physical-operators/HashAggregateExec.md#generateResultFunction)
2121
* `AggregateProcessor` is [created](../window-functions/AggregateProcessor.md#apply)
2222

@@ -32,7 +32,7 @@ Used when:
3232

3333
* `EliminateAggregateFilter` logical optimization is executed
3434
* `AggregatingAccumulator` utility is used to create an `AggregatingAccumulator`
35-
* `AggregationIterator` is [created](../AggregationIterator.md#expressionAggInitialProjection)
35+
* `AggregationIterator` is [created](../physical-operators/AggregationIterator.md#expressionAggInitialProjection)
3636
* `HashAggregateExec` physical operator is requested to [doProduceWithoutKeys](../physical-operators/HashAggregateExec.md#doProduceWithoutKeys), [createHashMap](../physical-operators/HashAggregateExec.md#createHashMap) and [getEmptyAggregationBuffer](../physical-operators/HashAggregateExec.md#getEmptyAggregationBuffer)
3737
* `HashMapGenerator` is created
3838
* `AggregateProcessor` is [created](../window-functions/AggregateProcessor.md#apply)

docs/expressions/ImperativeAggregate.md

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -15,9 +15,9 @@ Used when:
1515

1616
* `EliminateAggregateFilter` logical optimization is executed
1717
* `AggregatingAccumulator` is requested to `createBuffer`
18-
* `AggregationIterator` is requested to [initializeBuffer](../AggregationIterator.md#initializeBuffer)
19-
* `ObjectAggregationIterator` is requested to [initAggregationBuffer](../ObjectAggregationIterator.md#initAggregationBuffer)
20-
* `TungstenAggregationIterator` is requested to [createNewAggregationBuffer](../TungstenAggregationIterator.md#createNewAggregationBuffer)
18+
* `AggregationIterator` is requested to [initializeBuffer](../physical-operators/AggregationIterator.md#initializeBuffer)
19+
* `ObjectAggregationIterator` is requested to [initAggregationBuffer](../physical-operators/ObjectAggregationIterator.md#initAggregationBuffer)
20+
* `TungstenAggregationIterator` is requested to [createNewAggregationBuffer](../physical-operators/TungstenAggregationIterator.md#createNewAggregationBuffer)
2121
* `AggregateProcessor` is requested to [initialize](../window-functions/AggregateProcessor.md#initialize)
2222

2323
### <span id="merge"> merge
@@ -31,7 +31,7 @@ merge(
3131
Used when:
3232

3333
* `AggregatingAccumulator` is requested to `merge`
34-
* `AggregationIterator` is requested to [generateProcessRow](../AggregationIterator.md#generateProcessRow)
34+
* `AggregationIterator` is requested to [generateProcessRow](../physical-operators/AggregationIterator.md#generateProcessRow)
3535

3636
### <span id="update"> update
3737

@@ -44,7 +44,7 @@ update(
4444
Used when:
4545

4646
* `AggregatingAccumulator` is requested to `add` an `InternalRow`
47-
* `AggregationIterator` is requested to [generateProcessRow](../AggregationIterator.md#generateProcessRow)
47+
* `AggregationIterator` is requested to [generateProcessRow](../physical-operators/AggregationIterator.md#generateProcessRow)
4848
* `AggregateProcessor` is requested to [update](../window-functions/AggregateProcessor.md#update)
4949

5050
## Implementations

docs/AggregationIterator.md renamed to docs/physical-operators/AggregationIterator.md

Lines changed: 6 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
# AggregationIterators
22

3-
`AggregationIterator` is an [abstraction](#contract) of [aggregation iterators](#implementations) of [UnsafeRow](UnsafeRow.md)s.
3+
`AggregationIterator` is an [abstraction](#contract) of [aggregation iterators](#implementations) of [UnsafeRow](../UnsafeRow.md)s.
44

55
```scala
66
abstract class AggregationIterator(...)
@@ -22,12 +22,12 @@ From [scala.collection.Iterator]({{ scala.api }}/scala/collection/Iterator.html)
2222
`AggregationIterator` takes the following to be created:
2323

2424
* <span id="partIndex"> Partition ID
25-
* <span id="groupingExpressions"> Grouping [NamedExpression](expressions/NamedExpression.md)s
26-
* <span id="inputAttributes"> Input [Attribute](expressions/Attribute.md)s
27-
* <span id="aggregateExpressions"> [AggregateExpression](expressions/AggregateExpression.md)s
28-
* <span id="aggregateAttributes"> Aggregate [Attribute](expressions/Attribute.md)s
25+
* <span id="groupingExpressions"> Grouping [NamedExpression](../expressions/NamedExpression.md)s
26+
* <span id="inputAttributes"> Input [Attribute](../expressions/Attribute.md)s
27+
* <span id="aggregateExpressions"> [AggregateExpression](../expressions/AggregateExpression.md)s
28+
* <span id="aggregateAttributes"> Aggregate [Attribute](../expressions/Attribute.md)s
2929
* <span id="initialInputBufferOffset"> Initial input buffer offset
30-
* <span id="resultExpressions"> Result [NamedExpression](expressions/NamedExpression.md)s
30+
* <span id="resultExpressions"> Result [NamedExpression](../expressions/NamedExpression.md)s
3131
* <span id="newMutableProjection"> Function to create a new `MutableProjection` given expressions and attributes (`(Seq[Expression], Seq[Attribute]) => MutableProjection`)
3232

3333
??? note "Abstract Class"

docs/physical-operators/HashAggregateExec.md

Lines changed: 6 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -13,10 +13,10 @@
1313

1414
`HashAggregateExec` supports [Java code generation](CodegenSupport.md) (aka _codegen_).
1515

16-
`HashAggregateExec` uses [TungstenAggregationIterator](../TungstenAggregationIterator.md) (to iterate over `UnsafeRows` in partitions) when [executed](#doExecute).
16+
`HashAggregateExec` uses [TungstenAggregationIterator](TungstenAggregationIterator.md) (to iterate over `UnsafeRows` in partitions) when [executed](#doExecute).
1717

1818
!!! note
19-
`HashAggregateExec` uses `TungstenAggregationIterator` that can (theoretically) [switch to a sort-based aggregation when the hash-based approach is unable to acquire enough memory](../TungstenAggregationIterator.md#switchToSortBasedAggregation).
19+
`HashAggregateExec` uses `TungstenAggregationIterator` that can (theoretically) [switch to a sort-based aggregation when the hash-based approach is unable to acquire enough memory](TungstenAggregationIterator.md#switchToSortBasedAggregation).
2020

2121
See [testFallbackStartsAt](#testFallbackStartsAt) internal property and [spark.sql.TungstenAggregate.testFallbackStartsAt](../configuration-properties.md#spark.sql.TungstenAggregate.testFallbackStartsAt) configuration property.
2222

@@ -33,7 +33,7 @@ supportsAggregate(
3333
aggregateBufferAttributes: Seq[Attribute]): Boolean
3434
```
3535

36-
`supportsAggregate` [checks support for aggregation](../UnsafeFixedWidthAggregationMap.md#supportsAggregationBufferSchema) given the aggregation buffer [Attribute](../expressions/Attribute.md)s.
36+
`supportsAggregate` [checks support for aggregation](UnsafeFixedWidthAggregationMap.md#supportsAggregationBufferSchema) given the aggregation buffer [Attribute](../expressions/Attribute.md)s.
3737

3838
`supportsAggregate` is used when:
3939

@@ -228,10 +228,10 @@ In the end, `doExecute` calculates the <<aggTime, aggTime>> metric and returns a
228228

229229
* A single-element `Iterator[UnsafeRow]` with the <<TungstenAggregationIterator.md#outputForEmptyGroupingKeyWithoutInput, single UnsafeRow>>
230230

231-
* The [TungstenAggregationIterator](../TungstenAggregationIterator.md)
231+
* The [TungstenAggregationIterator](TungstenAggregationIterator.md)
232232

233233
!!! note
234-
The [numOutputRows](#numOutputRows), [peakMemory](#peakMemory), [spillSize](#spillSize) and [avgHashProbe](#avgHashProbe) metrics are used exclusively to create the [TungstenAggregationIterator](../TungstenAggregationIterator.md).
234+
The [numOutputRows](#numOutputRows), [peakMemory](#peakMemory), [spillSize](#spillSize) and [avgHashProbe](#avgHashProbe) metrics are used exclusively to create the [TungstenAggregationIterator](TungstenAggregationIterator.md).
235235

236236
!!! note
237237
`doExecute` (by `RDD.mapPartitionsWithIndex` transformation) adds a new `MapPartitionsRDD` to the RDD lineage. Use `RDD.toDebugString` to see the additional `MapPartitionsRDD`.
@@ -352,7 +352,7 @@ finishAggregate(
352352
createHashMap(): UnsafeFixedWidthAggregationMap
353353
```
354354

355-
`createHashMap` creates a [UnsafeFixedWidthAggregationMap](../UnsafeFixedWidthAggregationMap.md) (with the <<getEmptyAggregationBuffer, empty aggregation buffer>>, the <<bufferSchema, bufferSchema>>, the <<groupingKeySchema, groupingKeySchema>>, the current `TaskMemoryManager`, `1024 * 16` initial capacity and the page size of the `TaskMemoryManager`)
355+
`createHashMap` creates a [UnsafeFixedWidthAggregationMap](UnsafeFixedWidthAggregationMap.md) (with the <<getEmptyAggregationBuffer, empty aggregation buffer>>, the <<bufferSchema, bufferSchema>>, the <<groupingKeySchema, groupingKeySchema>>, the current `TaskMemoryManager`, `1024 * 16` initial capacity and the page size of the `TaskMemoryManager`)
356356

357357
## Internal Properties
358358

0 commit comments

Comments
 (0)