Commit 08dbad7

WriteToDataSourceV2(Exec) migrates to Structured Streaming
1 parent e419e10 commit 08dbad7

8 files changed: 15 additions, 91 deletions

docs/execution-planning-strategies/DataSourceV2Strategy.md

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ Logical Operator | Physical Operator
 [DataSourceV2ScanRelation](../logical-operators/DataSourceV2ScanRelation.md) with [V1Scan](../connector/V1Scan.md) | [RowDataSourceScanExec](../physical-operators/RowDataSourceScanExec.md)
 [DataSourceV2ScanRelation](../logical-operators/DataSourceV2ScanRelation.md) | [BatchScanExec](../physical-operators/BatchScanExec.md)
 `StreamingDataSourceV2Relation` |
-[WriteToDataSourceV2](../logical-operators/WriteToDataSourceV2.md) | [WriteToDataSourceV2Exec](../physical-operators/WriteToDataSourceV2Exec.md)
+`WriteToDataSourceV2` ([Spark Structured Streaming]({{ book.structured_streaming }}/logical-operators/WriteToDataSourceV2)) | `WriteToDataSourceV2Exec` ([Spark Structured Streaming]({{ book.structured_streaming }}/physical-operators/WriteToDataSourceV2Exec))
 [CreateTableAsSelect](../logical-operators/CreateTableAsSelect.md) | `AtomicCreateTableAsSelectExec` or [CreateTableAsSelectExec](../physical-operators/CreateTableAsSelectExec.md)
 `RefreshTable` | `RefreshTableExec`
 `ReplaceTable` | `AtomicReplaceTableExec` or `ReplaceTableExec`

docs/logical-operators/AppendData.md

Lines changed: 0 additions & 3 deletions
@@ -2,9 +2,6 @@

 `AppendData` is a [V2WriteCommand](V2WriteCommand.md) that represents appending data (the result of executing a [structured query](#query)) to a [table](#table) (with the [columns matching](#isByName) by [name](#byName) or [position](#byPosition)).

-!!! note
-`AppendData` has replaced the deprecated [WriteToDataSourceV2](WriteToDataSourceV2.md) logical operator.
-
 ## Creating Instance

 `AppendData` takes the following to be created:

docs/logical-operators/LogicalPlan.md

Lines changed: 1 addition & 3 deletions
@@ -27,13 +27,11 @@ Logical operators with a single [child](../catalyst/TreeNode.md#children) logica
 * [CreateTable](CreateTable.md)
 * [IgnoreCachedData](IgnoreCachedData.md)
 * [NamedRelation](NamedRelation.md)
-* `ObjectProducer`
 * [ParsedStatement](ParsedStatement.md)
 * [SupportsSubquery](SupportsSubquery.md)
-* `Union`
 * [V2CreateTablePlan](V2CreateTablePlan.md)
 * [View](View.md)
-* [WriteToDataSourceV2](WriteToDataSourceV2.md)
+* _others_

 ## <span id="statsCache"> Statistics Cache

docs/logical-operators/WriteToDataSourceV2.md

Lines changed: 0 additions & 23 deletions
This file was deleted.

docs/new-and-noteworthy/datasource-v2.md

Lines changed: 1 addition & 7 deletions
@@ -24,13 +24,7 @@ When executed, `DataSourceV2ScanExec` physical operator creates a [DataSourceRDD

 ## Data Writing

-DataSource V2 uses [WriteToDataSourceV2](../logical-operators/WriteToDataSourceV2.md) and [AppendData](../logical-operators/AppendData.md) logical operators to represent data writing (over a [DataSourceV2Relation](../logical-operators/DataSourceV2Relation.md) logical operator). As of Spark SQL 2.4.0, `WriteToDataSourceV2` operator was deprecated for the more specific `AppendData` operator (compare _"data writing"_ to _"data append"_ which is certainly more specific).
-
-NOTE: One of the differences between `WriteToDataSourceV2` and `AppendData` logical operators is that the former (`WriteToDataSourceV2`) uses [DataSourceWriter](../logical-operators/WriteToDataSourceV2.md#writer) directly while the latter (`AppendData`) uses [DataSourceV2Relation](../logical-operators/AppendData.md#table) to [get the DataSourceWriter from](../logical-operators/DataSourceV2Relation.md#newWriter).
-
-[WriteToDataSourceV2](../logical-operators/WriteToDataSourceV2.md) and [AppendData](../logical-operators/AppendData.md) (with [DataSourceV2Relation](../logical-operators/DataSourceV2Relation.md)) logical operators are planned as (_translated to_) a [WriteToDataSourceV2Exec](../physical-operators/WriteToDataSourceV2Exec.md) physical operator.
-
-When executed, `WriteToDataSourceV2Exec` physical operator...FIXME
+DataSource V2 uses `WriteToDataSourceV2` ([Spark Structured Streaming]({{ book.structured_streaming }}/logical-operators/WriteToDataSourceV2)) and [AppendData](../logical-operators/AppendData.md) logical operators to represent data writing (over a [DataSourceV2Relation](../logical-operators/DataSourceV2Relation.md) logical operator). As of Spark SQL 2.4.0, `WriteToDataSourceV2` operator was deprecated for the more specific `AppendData` operator (compare _"data writing"_ to _"data append"_ which is certainly more specific).

 ## <span id="filter-pushdown"> Filter Pushdown Performance Optimization

docs/physical-operators/V2TableWriteExec.md

Lines changed: 12 additions & 6 deletions
@@ -16,7 +16,7 @@ query: SparkPlan

 * [TableWriteExecHelper](TableWriteExecHelper.md)
 * [V2ExistingTableWriteExec](V2ExistingTableWriteExec.md)
-* [WriteToDataSourceV2Exec](WriteToDataSourceV2Exec.md)
+* `WriteToDataSourceV2Exec` ([Spark Structured Streaming]({{ book.structured_streaming }}/physical-operators/WriteToDataSourceV2Exec))

 ## <span id="writeWithV2"> writeWithV2

@@ -25,9 +25,9 @@ writeWithV2(
 batchWrite: BatchWrite): Seq[InternalRow]
 ```

-`writeWithV2` requests the [physical query plan](#query) to [execute](SparkPlan.md#execute) (that gives a `RDD[InternalRow]`).
+`writeWithV2` requests the [physical query plan](#query) to [execute](SparkPlan.md#execute) (and produce a `RDD[InternalRow]`).

-`writeWithV2` requests the given `BatchWrite` for a [DataWriterFactory](../connector/BatchWrite.md#createBatchWriterFactory).
+`writeWithV2` requests the given [BatchWrite](../connector/BatchWrite.md) to [create a DataWriterFactory](../connector/BatchWrite.md#createBatchWriterFactory) (with the number of partitions of the `RDD`)

 `writeWithV2` prints out the following INFO message to the logs:

@@ -51,10 +51,16 @@ Data source write support [batchWrite] is committing.
 Data source write support [batchWrite] committed.
 ```

-In the end, `writeWithV2` returns no `InternalRow`s.
+In the end, `writeWithV2` returns an empty collection (of `InternalRow`s).
+
+---

 `writeWithV2` is used when:

-* `WriteToDataSourceV2Exec` is [executed](WriteToDataSourceV2Exec.md#run)
-* `V2ExistingTableWriteExec` is [executed](V2ExistingTableWriteExec.md#run)
 * `TableWriteExecHelper` is requested to [writeToTable](TableWriteExecHelper.md#writeToTable)
+* `V2ExistingTableWriteExec` is [executed](V2ExistingTableWriteExec.md#run)
+* `WriteToDataSourceV2Exec` ([Spark Structured Streaming]({{ book.structured_streaming }}/physical-operators/WriteToDataSourceV2Exec)) is executed
+
+## Logging
+
+`V2TableWriteExec` is a Scala trait and logging is configured using the logger of the [implementations](#implementations).
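
The `writeWithV2` flow described in the `V2TableWriteExec.md` changes above can be sketched without Spark as follows. This is only an illustration under stated assumptions: every type here (`Row`, `CommitMessage`, `ToyDataWriter`, `ToyWriterFactory`, `ToyBatchWrite`) is a hypothetical stand-in rather than the real connector API, the partitions are plain collections instead of an `RDD[InternalRow]`, and logging and abort handling are left out. It shows the order of steps only: create a writer factory for the number of partitions, write every partition, commit, and return an empty collection.

```scala
// Toy model of the writeWithV2 flow. No Spark dependency:
// all types below are hypothetical stand-ins, not the real connector API.
object WriteWithV2Sketch {
  type Row = Seq[Any] // stand-in for InternalRow

  // Stand-in for the message a writer reports after a successful write
  final case class CommitMessage(partitionId: Int, rowCount: Int)

  // Stand-in for a per-partition data writer: it only counts rows
  final class ToyDataWriter(partitionId: Int) {
    private var count = 0
    def write(row: Row): Unit = count += 1
    def commit(): CommitMessage = CommitMessage(partitionId, count)
  }

  // Stand-in for a writer factory created for a known number of partitions
  final class ToyWriterFactory(val numPartitions: Int) {
    def createWriter(partitionId: Int): ToyDataWriter = new ToyDataWriter(partitionId)
  }

  // Stand-in for a batch write: creates the factory and records the commit
  final class ToyBatchWrite {
    var committed: Seq[CommitMessage] = Seq.empty
    def createBatchWriterFactory(numPartitions: Int): ToyWriterFactory =
      new ToyWriterFactory(numPartitions)
    def commit(messages: Seq[CommitMessage]): Unit = committed = messages
  }

  // partitions plays the role of the executed physical query plan
  def writeWithV2(partitions: Seq[Seq[Row]], batchWrite: ToyBatchWrite): Seq[Row] = {
    // 1. Create the writer factory with the number of partitions
    val factory = batchWrite.createBatchWriterFactory(partitions.size)
    // 2. Write every partition with its own writer and collect commit messages
    val messages = partitions.zipWithIndex.map { case (rows, partitionId) =>
      val writer = factory.createWriter(partitionId)
      rows.foreach(row => writer.write(row))
      writer.commit()
    }
    // 3. Commit the whole write
    batchWrite.commit(messages)
    // 4. In the end, return an empty collection of rows
    Seq.empty
  }
}
```

Calling `WriteWithV2Sketch.writeWithV2` with two partitions produces two commit messages on the `ToyBatchWrite` and an empty result, which mirrors the "returns an empty collection" wording above.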

docs/physical-operators/WriteToDataSourceV2Exec.md

Lines changed: 0 additions & 46 deletions
This file was deleted.

mkdocs.yml

Lines changed: 0 additions & 2 deletions
@@ -312,7 +312,6 @@ nav:
 - Window: logical-operators/Window.md
 - WithCTE: logical-operators/WithCTE.md
 - WithWindowDefinition: logical-operators/WithWindowDefinition.md
-- WriteToDataSourceV2: logical-operators/WriteToDataSourceV2.md
 - SparkSession Registries:
 - Catalog:
 - Catalog: Catalog.md
@@ -531,7 +530,6 @@ nav:
 - WindowExec:
 - WindowExec: physical-operators/WindowExec.md
 - WindowExecBase: physical-operators/WindowExecBase.md
-- WriteToDataSourceV2Exec: physical-operators/WriteToDataSourceV2Exec.md
 - Distribution and Partitioning:
 - Distribution: physical-operators/Distribution.md
 - Partitioning: physical-operators/Partitioning.md
