Commit a55d79b

Basic Aggregation, groupByKey and FlatMapGroupsWithState
1 parent f8f109d commit a55d79b

25 files changed: +438 -635 lines changed

docs/CheckAnalysis.md

Lines changed: 1 addition & 1 deletion
@@ -69,7 +69,7 @@ In the end, `checkAnalysis` [marks the entire logical plan as analyzed](logical-
 * `CheckAnalysis` is requested to [checkSubqueryExpression](#checkSubqueryExpression)
 * Catalyst DSL's [analyze](catalyst-dsl/DslLogicalPlan.md#analyze) operator is used
 * `ExpressionEncoder` is requested to [resolveAndBind](ExpressionEncoder.md#resolveAndBind)
-* [RelationalGroupedDataset.as](RelationalGroupedDataset.md#as) operator is used
+* [RelationalGroupedDataset.as](basic-aggregation/RelationalGroupedDataset.md#as) operator is used

 ## <span id="checkShowPartitions"> checkShowPartitions

docs/Column.md

Lines changed: 2 additions & 2 deletions
@@ -307,7 +307,7 @@ generateAlias(e: Expression): String
 `generateAlias` is used when:

 * `Column` is requested to <<named, named>>
-* `RelationalGroupedDataset` is requested to [alias](RelationalGroupedDataset.md#alias)
+* `RelationalGroupedDataset` is requested to [alias](basic-aggregation/RelationalGroupedDataset.md#alias)

 === [[named]] `named` Method

@@ -321,4 +321,4 @@ named: NamedExpression
 `named` is used when the following operators are used:

 * [Dataset.select](spark-sql-dataset-operators.md#select)
-* [KeyValueGroupedDataset.agg](KeyValueGroupedDataset.md#agg)
+* [KeyValueGroupedDataset.agg](basic-aggregation/KeyValueGroupedDataset.md#agg)

docs/Dataset-untyped-transformations.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 # Dataset API &mdash; Untyped Transformations

-**Untyped transformations** are part of the Dataset API for transforming a `Dataset` to a [DataFrame](DataFrame.md), a [Column](Column.md), a [RelationalGroupedDataset](RelationalGroupedDataset.md), a [DataFrameNaFunctions](spark-sql-DataFrameNaFunctions.md) or a [DataFrameStatFunctions](spark-sql-DataFrameStatFunctions.md) (and hence _untyped_).
+**Untyped transformations** are part of the Dataset API for transforming a `Dataset` to a [DataFrame](DataFrame.md), a [Column](Column.md), a [RelationalGroupedDataset](basic-aggregation/RelationalGroupedDataset.md), a [DataFrameNaFunctions](spark-sql-DataFrameNaFunctions.md) or a [DataFrameStatFunctions](spark-sql-DataFrameStatFunctions.md) (and hence _untyped_).

 !!! note
 Untyped transformations are the methods in the `Dataset` Scala class that are grouped in `untypedrel` group name, i.e. `@group untypedrel`.

docs/Dataset.md

Lines changed: 2 additions & 2 deletions
@@ -35,7 +35,7 @@ When created, `Dataset` requests [QueryExecution](#queryExecution) to [assert an

 * [Dataset.select](spark-sql-Dataset-typed-transformations.md#select), [Dataset.randomSplit](spark-sql-Dataset-typed-transformations.md#randomSplit) and [Dataset.mapPartitions](spark-sql-Dataset-typed-transformations.md#mapPartitions) typed transformations are used

-* [KeyValueGroupedDataset.agg](KeyValueGroupedDataset.md#agg) operator is used (that requests `KeyValueGroupedDataset` to [aggUntyped](KeyValueGroupedDataset.md#aggUntyped))
+* [KeyValueGroupedDataset.agg](basic-aggregation/KeyValueGroupedDataset.md#agg) operator is used (that requests `KeyValueGroupedDataset` to [aggUntyped](basic-aggregation/KeyValueGroupedDataset.md#aggUntyped))

 * [SparkSession.emptyDataset](SparkSession.md#emptyDataset) and [SparkSession.range](SparkSession.md#range) operators are used

@@ -550,7 +550,7 @@ Internally, `ofRows` SessionState.md#executePlan[prepares the input `logicalPlan

 * `Dataset` is requested to execute <<checkpoint, checkpoint>>, `mapPartitionsInR`, <<withPlan, untyped transformations>> and <<withSetOperator, set-based typed transformations>>

-* `RelationalGroupedDataset` is requested to [create a DataFrame from aggregate expressions](RelationalGroupedDataset.md#toDF), `flatMapGroupsInR` and `flatMapGroupsInPandas`
+* `RelationalGroupedDataset` is requested to [create a DataFrame from aggregate expressions](basic-aggregation/RelationalGroupedDataset.md#toDF), `flatMapGroupsInR` and `flatMapGroupsInPandas`

 * `SparkSession` is requested to <<SparkSession.md#baseRelationToDataFrame, create a DataFrame from a BaseRelation>>, <<SparkSession.md#createDataFrame, createDataFrame>>, <<SparkSession.md#internalCreateDataFrame, internalCreateDataFrame>>, <<SparkSession.md#sql, sql>> and <<SparkSession.md#table, table>>

docs/ExpressionEncoder.md

Lines changed: 1 addition & 1 deletion
@@ -95,7 +95,7 @@ tuple(
 `tuple` is used when:

 * `Dataset` is requested to [selectUntyped](Dataset.md#selectUntyped), [select](Dataset.md#select), [joinWith](Dataset.md#joinWith)
-* `KeyValueGroupedDataset` is requested to [aggUntyped](KeyValueGroupedDataset.md#aggUntyped)
+* `KeyValueGroupedDataset` is requested to [aggUntyped](basic-aggregation/KeyValueGroupedDataset.md#aggUntyped)
 * `Encoders` utility is used to [tuple](Encoders.md#tuple)
 * `ReduceAggregator` is requested for `bufferEncoder`

docs/KeyValueGroupedDataset.md

Lines changed: 0 additions & 90 deletions
This file was deleted.

docs/QueryExecution.md

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@ val qe = new QueryExecution(sparkSession, plan)
 `QueryExecution` is created when:

 * [Dataset.ofRows](Dataset.md#ofRows) and [Dataset.selectUntyped](Dataset.md#selectUntyped) are executed
-* `KeyValueGroupedDataset` is requested to [aggUntyped](KeyValueGroupedDataset.md#aggUntyped)
+* `KeyValueGroupedDataset` is requested to [aggUntyped](basic-aggregation/KeyValueGroupedDataset.md#aggUntyped)
 * `CommandUtils` utility is requested to [computeColumnStats](CommandUtils.md#computeColumnStats) and [computePercentiles](CommandUtils.md#computePercentiles)
 * `BaseSessionStateBuilder` is requested to [create a QueryExecution for a LogicalPlan](BaseSessionStateBuilder.md#createQueryExecution)
