Commit d8166f2

docs(logical): Add docs on creating logical datasets and bulk relationship removal (#15029)
1 parent 4ae6437 commit d8166f2

1 file changed (+70, -2 lines)


docs/features/feature-guides/logical-models/overview.md

@@ -38,14 +38,74 @@ Columns on the logical parent and physical children can be linked as well:

## Creating Logical Models

Logical models are created like any other DataHub dataset. We recommend using the Python SDK.

:::note Logical Model Platform
All DataHub datasets require a platform, which represents where the dataset exists. If your logical models are stored in a system your users are familiar with, we recommend creating a custom platform for that system and providing a custom icon. Otherwise, we recommend using the `logical` platform, which has a special default icon.
:::

### Create Dataset in "logical" Platform

```python
from datahub.sdk import DataHubClient, Dataset

client = DataHubClient.from_env()

# Placeholders: replace with your logical model's name and description
logical_model_name = "<logical_model_name>"
logical_model_description = "<logical_model_description>"

dataset = Dataset(
    platform="logical",
    name=logical_model_name,
    description=logical_model_description,
    schema=[
        # Tuples of (field name / field path, data type, description)
        (
            "zipcode",
            "varchar(50)",
            "This is the zipcode of the address. Specified using extended form and limited to addresses in the United States",
        ),
        ("street", "varchar(100)", "Street corresponding to the address"),
        ("date_column", "date", "Date of the last sale of this property"),
    ],
)
client.entities.upsert(dataset)
```

### Create Dataset in Custom Platform

```python
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.metadata.schema_classes import DataPlatformInfoClass, PlatformTypeClass
from datahub.metadata.urns import DataPlatformUrn
from datahub.sdk import DataHubClient, Dataset

client = DataHubClient.from_env()

# Create a custom platform with a custom logo
urn = DataPlatformUrn("<platformName>").urn()
aspect = DataPlatformInfoClass(
    name="<platformName>",
    type=PlatformTypeClass.OTHERS,
    datasetNameDelimiter=".",
    logoUrl="<url>",
)
client._graph.emit(MetadataChangeProposalWrapper(entityUrn=urn, aspect=aspect))

# Create a dataset in the custom platform
dataset = Dataset(
    platform="<platformName>",
    name="<logical_model_name>",
    description="<logical_model_description>",
    # Add the remaining schema tuples as in the "logical" example above
    schema=[("street", "varchar(100)", "Street corresponding to the address")],
)
client.entities.upsert(dataset)
```

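Both examples create datasets that are addressed by URN, and the linking endpoints below expect those URNs (`<logical_model_urn>`, `<physical_child_urn>`). A minimal sketch of the standard dataset URN format, assuming the default `PROD` environment and no platform instance; the helper name and the `address` model are hypothetical, and with the SDK, `str(dataset.urn)` should give the same string:

```python
def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Build a dataset URN string, assuming no platform instance."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


# Hypothetical logical model named "address", in the "logical" platform
print(dataset_urn("logical", "address"))
# urn:li:dataset:(urn:li:dataPlatform:logical,address,PROD)
```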
## Linking Logical Models

At its core, the logical -> physical relationship is created by the [`LogicalParent`](../../../generated/metamodel/entities/dataset.md#logicalparent) aspect. To link columns, this aspect must also be created on each child schema field entity. However, for ease of use, we recommend the OpenAPI endpoint, which handles both the dataset-level and the column-level links.

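Each column of a physical child is its own schema field entity, identified by a URN that wraps the dataset URN and the field path. A sketch, assuming DataHub's `urn:li:schemaField:(<dataset_urn>,<field_path>)` format, that lists the schema field entities which would each need a `LogicalParent` aspect when columns are linked by hand (the helper name, table, and columns are hypothetical):

```python
def schema_field_urn(dataset_urn: str, field_path: str) -> str:
    """Build a schema field URN from its parent dataset URN and field path."""
    return f"urn:li:schemaField:({dataset_urn},{field_path})"


# Hypothetical physical child table and the columns to link
child_urn = "urn:li:dataset:(urn:li:dataPlatform:snowflake,db.schema.address,PROD)"
child_columns = ["zip_code", "street_name"]

# Each of these schema field entities would carry its own LogicalParent aspect
field_urns = [schema_field_urn(child_urn, column) for column in child_columns]
for field_urn in field_urns:
    print(field_urn)
```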

### OpenAPI

The OpenAPI endpoint links a single physical child to its logical model and, if columns are specified, also creates the column-level relationships between them.

```shell
-curl -X POST 'http://localhost:8080/openapi/v3/entity/logical/<physical_child_urn>/relationship/physicalInstanceOf/<logical_model_urn>' \
+curl -X POST 'http://localhost:8080/openapi/v3/logical/<physical_child_urn>/relationship/physicalInstanceOf/<logical_model_urn>' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
@@ -55,14 +115,22 @@ curl -X POST 'http://localhost:8080/openapi/v3/entity/logical/<physical_child_ur
}'
```

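When linking many pairs from a script, the same POST can be built with only the Python standard library. This is a sketch, not an official client: the function name, the example column pair, and the optional bearer-token header are assumptions. Because the request body above is abridged, the body here follows the Python SDK example on this page, mapping each logical (parent) column name to its physical (child) column name.

```python
import json
import urllib.request
from typing import Dict, Optional


def build_link_request(
    server: str,
    child_urn: str,
    parent_urn: str,
    column_mapping: Dict[str, str],
    token: Optional[str] = None,
) -> urllib.request.Request:
    """Build (without sending) the POST linking a physical child to its logical model."""
    url = (
        f"{server}/openapi/v3/logical/{child_urn}"
        f"/relationship/physicalInstanceOf/{parent_urn}"
    )
    headers = {"accept": "application/json", "Content-Type": "application/json"}
    if token:
        # Assumption: the server expects a DataHub access token as a bearer token
        headers["Authorization"] = f"Bearer {token}"
    # Body: logical (parent) column name -> physical (child) column name
    body = json.dumps(column_mapping).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


req = build_link_request(
    "http://localhost:8080",
    "<physical_child_urn>",
    "<logical_model_urn>",
    {"zipcode": "zip_code"},  # hypothetical column pair
)
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```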

A physical child's `physicalInstanceOf` relationship can also be removed (as of DataHub Cloud v0.3.15). Only the child's URN is needed:

```shell
curl -X DELETE 'http://localhost:8080/openapi/v3/logical/<physical_child_urn>/relationship/physicalInstanceOf' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json'
```

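The request above detaches one physical child. To detach several, a minimal client-side sketch builds one DELETE per child URN with the standard library; it sends nothing until you uncomment the `urlopen` call. The function name, the example URNs, and the bearer-token header are assumptions, and this loops over single requests rather than calling any batch endpoint.

```python
import urllib.request
from typing import Iterable, List, Optional


def build_unlink_requests(
    server: str, child_urns: Iterable[str], token: Optional[str] = None
) -> List[urllib.request.Request]:
    """Build one DELETE per physical child, removing its physicalInstanceOf relationship."""
    headers = {"accept": "application/json", "Content-Type": "application/json"}
    if token:
        # Assumption: bearer-token authentication with a DataHub access token
        headers["Authorization"] = f"Bearer {token}"
    return [
        urllib.request.Request(
            f"{server}/openapi/v3/logical/{child_urn}/relationship/physicalInstanceOf",
            headers=headers,
            method="DELETE",
        )
        for child_urn in child_urns
    ]


# Hypothetical physical children to detach from their logical model
unlink = build_unlink_requests(
    "http://localhost:8080",
    [
        "urn:li:dataset:(urn:li:dataPlatform:snowflake,db.schema.address,PROD)",
        "urn:li:dataset:(urn:li:dataPlatform:postgres,db.public.address,PROD)",
    ],
)
for req in unlink:
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req)  # uncomment to send the request
```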
### Python SDK

The Python SDK can also call the same endpoint:

```python
from datahub.sdk import DataHubClient

client = DataHubClient.from_env()
child_urn = "<physical_child_urn>"  # placeholder
parent_urn = "<logical_model_urn>"  # placeholder
# `columns`: your column pairs, each with a logical `parent_name` and a physical `child_name`
-url = f"{client._graph.config.server}/openapi/v3/entity/logical/{child_urn}/relationship/physicalInstanceOf/{parent_urn}"
+url = f"{client._graph.config.server}/openapi/v3/logical/{child_urn}/relationship/physicalInstanceOf/{parent_urn}"
client._graph._post_generic(url, {column.parent_name: column.child_name for column in columns})
```

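The `columns` iterable above only needs `parent_name` and `child_name` attributes. One way to supply it, using a hypothetical `ColumnPair` named tuple and hypothetical column names, along with the payload the example posts:

```python
from typing import NamedTuple


class ColumnPair(NamedTuple):
    """Hypothetical container: a logical (parent) column and its physical (child) column."""

    parent_name: str
    child_name: str


columns = [ColumnPair("zipcode", "zip_code"), ColumnPair("street", "street_name")]

# Same payload as the SDK example: logical column name -> physical column name
payload = {column.parent_name: column.child_name for column in columns}
print(payload)
```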
