Skip to content

Commit 3d5627b

Browse files
[DP] GraphElementRegistry and the pipeline graph elements
1 parent 4741467 commit 3d5627b

File tree

4 files changed

+146
-1
lines changed

4 files changed

+146
-1
lines changed
Lines changed: 61 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,61 @@
1+
# GraphElementRegistry
2+
3+
`GraphElementRegistry` is an [abstraction](#contract) of [graph element registries](#implementations).
4+
5+
## Contract
6+
7+
### register_dataset { #register_dataset }
8+
9+
```py
10+
register_dataset(
11+
self,
12+
dataset: Dataset,
13+
) -> None
14+
```
15+
16+
See:
17+
18+
* [SparkConnectGraphElementRegistry](SparkConnectGraphElementRegistry.md#register_dataset)
19+
20+
Used when:
21+
22+
* [@create_streaming_table](./index.md#create_streaming_table), [@table](./index.md#table), [@materialized_view](./index.md#materialized_view), [@temporary_view](./index.md#temporary_view) decorators are used
23+
24+
### register_flow { #register_flow }
25+
26+
```py
27+
register_flow(
28+
self,
29+
flow: Flow,
30+
) -> None
31+
```
32+
33+
See:
34+
35+
* [SparkConnectGraphElementRegistry](SparkConnectGraphElementRegistry.md#register_flow)
36+
37+
Used when:
38+
39+
* [@append_flow](./index.md#append_flow), [@table](./index.md#table), [@materialized_view](./index.md#materialized_view), [@temporary_view](./index.md#temporary_view) decorators are used
40+
41+
### register_sql { #register_sql }
42+
43+
```py
44+
register_sql(
45+
self,
46+
sql_text: str,
47+
file_path: Path,
48+
) -> None
49+
```
50+
51+
See:
52+
53+
* [SparkConnectGraphElementRegistry](SparkConnectGraphElementRegistry.md#register_sql)
54+
55+
Used when:
56+
57+
* `pyspark.pipelines.cli` is requested to [register_definitions](#register_definitions)
58+
59+
## Implementations
60+
61+
* [SparkConnectGraphElementRegistry](SparkConnectGraphElementRegistry.md)
Lines changed: 51 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,51 @@
1+
# MaterializedView
2+
3+
`MaterializedView` is a `Table` that represents a materialized view in a pipeline dataflow graph.
4+
5+
`MaterializedView` is created using [@materialized_view](#materialized_view) decorator.
6+
7+
`MaterializedView` is a Python class.
8+
9+
## materialized_view
10+
11+
```py
12+
materialized_view(
13+
query_function: Optional[QueryFunction] = None,
14+
*,
15+
name: Optional[str] = None,
16+
comment: Optional[str] = None,
17+
spark_conf: Optional[Dict[str, str]] = None,
18+
table_properties: Optional[Dict[str, str]] = None,
19+
partition_cols: Optional[List[str]] = None,
20+
schema: Optional[Union[StructType, str]] = None,
21+
format: Optional[str] = None,
22+
) -> Union[Callable[[QueryFunction], None], None]
23+
```
24+
25+
`materialized_view` uses `query_function` for the parameters unless they are specified explicitly.
26+
27+
`materialized_view` uses the name of the decorated function as the name of the materialized view unless specified explicitly.
28+
29+
`materialized_view` makes sure that [GraphElementRegistry](GraphElementRegistry.md) has been set (using `graph_element_registration_context` context manager).
30+
31+
??? note "Demo"
32+
33+
```py
34+
from pyspark.pipelines.graph_element_registry import (
35+
graph_element_registration_context,
36+
get_active_graph_element_registry,
37+
)
38+
from pyspark.pipelines.spark_connect_graph_element_registry import (
39+
SparkConnectGraphElementRegistry,
40+
)
41+
42+
dataflow_graph_id = "demo_dataflow_graph_id"
43+
registry = SparkConnectGraphElementRegistry(spark, dataflow_graph_id)
44+
with graph_element_registration_context(registry):
45+
graph_registry = get_active_graph_element_registry()
46+
assert graph_registry == registry
47+
```
48+
49+
`materialized_view` creates a new `MaterializedView` and requests the `GraphElementRegistry` to [register_dataset](GraphElementRegistry.md#register_dataset) it.
50+
51+
`materialized_view` creates a new `Flow` and requests the `GraphElementRegistry` to [register_flow](GraphElementRegistry.md#register_flow) it.
Lines changed: 31 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,31 @@
1+
# SparkConnectGraphElementRegistry
2+
3+
`SparkConnectGraphElementRegistry` is a [GraphElementRegistry](GraphElementRegistry.md).
4+
5+
## Creating Instance
6+
7+
`SparkConnectGraphElementRegistry` takes the following to be created:
8+
9+
* <span id="spark"> `SparkSession` (`SparkConnectClient`)
10+
* <span id="dataflow_graph_id"> Dataflow Graph ID
11+
12+
`SparkConnectGraphElementRegistry` is created when:
13+
14+
* `pyspark.pipelines.cli` is requested to [run](#run)
15+
16+
## register_dataset { #register_dataset }
17+
18+
??? note "GraphElementRegistry"
19+
20+
```py
21+
register_dataset(
22+
self,
23+
dataset: Dataset,
24+
) -> None
25+
```
26+
27+
`register_dataset` is part of the [GraphElementRegistry](GraphElementRegistry.md#register_dataset) abstraction.
28+
29+
`register_dataset` makes sure that the given `Dataset` is either `MaterializedView`, `StreamingTable` or `TemporaryView`.
30+
31+
`register_dataset` requests this [SparkSession](#spark) to [execute](#execute_command) a `PipelineCommand.DefineDataset`.

docs/declarative-pipelines/index.md

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -44,7 +44,7 @@ As of this [Commit 6ab0df9]({{ spark.commit }}/6ab0df9287c5a9ce49769612c2bb0a1da
4444
from pyspark import pipelines as dp
4545
```
4646

47-
## Python Decorators for Datasets and Flows { #python-decorators }
47+
## Python Decorators for Tables and Flows { #python-decorators }
4848

4949
Declarative Pipelines uses the following [Python decorators](https://peps.python.org/pep-0318/) to describe tables and views:
5050

@@ -73,6 +73,8 @@ from pyspark import pipelines as dp
7373

7474
### @dp.materialized_view { #materialized_view }
7575

76+
Creates a [MaterializedView](MaterializedView.md) (for a table whose contents are defined to be the result of a query).
77+
7678
### @dp.table { #table }
7779

7880
### @dp.temporary_view { #temporary_view }

0 commit comments

Comments
 (0)