Commit 3546bc4

Layout NG: Folder structure and naming things, focusing on ETL and CDC
- Dissolve individual pages in category section `etl`, relocating them into dedicated items within the backbone section `integrate` instead.
- Relocated items: Azure Functions, Apache Iceberg, InfluxDB, MongoDB, MySQL and MariaDB, RisingWave, Streamsets.
- Dissolve weird page toc assembly on ETL and CDC category index pages, using `toctree` only for now.
1 parent 99f70bd commit 3546bc4

21 files changed

+502
-411
lines changed

docs/_include/links.md

Lines changed: 6 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -27,6 +27,7 @@
2727
[HNSW paper]: https://arxiv.org/pdf/1603.09320
2828
[HoloViews]: https://www.holoviews.org/
2929
[Indexing, Columnar Storage, and Aggregations]: https://cratedb.com/product/features/indexing-columnar-storage-aggregations
30+
[InfluxDB]: https://github.com/influxdata/influxdb
3031
[inverted index]: https://en.wikipedia.org/wiki/Inverted_index
3132
[JOIN]: inv:crate-reference#sql_joins
3233
[JSON Database]: https://cratedb.com/solutions/json-database
@@ -38,9 +39,12 @@
3839
[langchain-rag-sql-binder]: https://mybinder.org/v2/gh/crate/cratedb-examples/main?labpath=topic%2Fmachine-learning%2Flangchain%2Fcratedb-vectorstore-rag-openai-sql.ipynb
3940
[langchain-rag-sql-colab]: https://colab.research.google.com/github/crate/cratedb-examples/blob/main/topic/machine-learning/langchain/cratedb-vectorstore-rag-openai-sql.ipynb
4041
[langchain-rag-sql-github]: https://github.com/crate/cratedb-examples/blob/main/topic/machine-learning/langchain/cratedb-vectorstore-rag-openai-sql.ipynb
41-
[MongoDB CDC Relay]: https://cratedb-toolkit.readthedocs.io/io/mongodb/cdc.html
42+
[MongoDB]: https://www.mongodb.com/docs/manual/
43+
[MongoDB Atlas]: https://www.mongodb.com/docs/atlas/
44+
[MongoDB CDC Relay]: inv:ctk:*:label#mongodb-cdc-relay
4245
[MongoDB Change Streams]: https://www.mongodb.com/docs/manual/changeStreams/
43-
[MongoDB Table Loader]: https://cratedb-toolkit.readthedocs.io/io/mongodb/loader.html
46+
[MongoDB collections and databases]: https://www.mongodb.com/docs/php-library/current/databases-collections/
47+
[MongoDB Table Loader]: inv:ctk:*:label#mongodb-loader
4448
[Multi-model Database]: https://cratedb.com/solutions/multi-model-database
4549
[nearest neighbor search]: https://en.wikipedia.org/wiki/Nearest_neighbor_search
4650
[Nested Data Structure]: https://cratedb.com/product/features/nested-data-structure

docs/ingest/cdc/index.md

Lines changed: 5 additions & 80 deletions
@@ -5,9 +5,8 @@
55
:::
66

77
:::{div}
8-
You have a variety of options to connect and integrate with 3rd-party
8+
CrateDB provides a variety of options to connect and integrate with third-party
99
CDC applications, mostly using [CrateDB's PostgreSQL interface].
10-
1110
CrateDB also provides a few native adapter components that can be used
1211
to leverage its advanced features.
1312

@@ -17,83 +16,9 @@ to use them optimally.
1716
Please also have a look at support for [generic ETL](#etl) solutions.
1817
:::
1918

20-
(cdc-dms)=
21-
## AWS DMS
22-
23-
:::{div}
24-
[AWS Database Migration Service (AWS DMS)] is a managed migration and replication
25-
service that helps move your database and analytics workloads between different
26-
kinds of databases quickly, securely, and with minimal downtime and zero data
27-
loss.
28-
29-
AWS DMS supports migration between 20-plus database and analytics engines, either
30-
on-premises, or per EC2 instance databases. Supported data migration sources are:
31-
Amazon Aurora, Amazon DocumentDB, Amazon S3, IBM DB2, MariaDB, Azure SQL Database,
32-
Microsoft SQL Server, MongoDB, MySQL, Oracle, PostgreSQL, SAP ASE.
33-
34-
The [AWS DMS Integration with CrateDB] uses Amazon Kinesis Data Streams as
35-
a DMS target, combined with a CrateDB-specific downstream processor element.
36-
37-
CrateDB provides two variants how to conduct data migrations using AWS DMS.
38-
Either use it standalone / on your own premises, or use it in a completely
39-
managed environment with services of AWS and CrateDB Cloud.
40-
AWS DMS supports both `full-load` and `cdc` operation modes, often used in
41-
combination with each other (`full-load-and-cdc`).
42-
:::
43-
44-
(cdc-kinesis)=
45-
## AWS Kinesis
46-
You can use Amazon Kinesis Data Streams to collect and process large streams of data
47-
records in real time. A typical Kinesis Data Streams application reads data from a
48-
data stream as data records.
49-
50-
As such, a common application is to relay DynamoDB table change stream events to a
51-
Kinesis Stream, and consume that from an adapter to a consolidation database.
52-
:::{div}
53-
- About: [Amazon Kinesis Data Streams]
54-
- See: [](#cdc-dynamodb)
55-
:::
56-
57-
## Debezium
58-
19+
- {ref}`aws-dms`
20+
- {ref}`aws-dynamodb`
21+
- {ref}`aws-kinesis`
5922
- {ref}`debezium`
60-
61-
(cdc-dynamodb)=
62-
## DynamoDB
63-
:::{div}
64-
Support for loading DynamoDB tables into CrateDB (full-load), as well as
65-
[Amazon DynamoDB Streams] and [Amazon Kinesis Data Streams],
66-
to relay CDC events from DynamoDB into CrateDB.
67-
68-
- [DynamoDB Table Loader]
69-
- [DynamoDB CDC Relay]
70-
71-
If you are looking into serverless replication using AWS Lambda:
72-
- [DynamoDB CDC Relay with AWS Lambda]
73-
- Blog: [Replicating CDC events from DynamoDB to CrateDB]
74-
:::
75-
76-
## MongoDB
77-
:::{div}
78-
Support for loading MongoDB collections and databases into CrateDB (full-load),
79-
and [MongoDB Change Streams], to relay CDC events from MongoDB into CrateDB.
80-
81-
- [MongoDB Table Loader]
82-
- [MongoDB CDC Relay]
83-
:::
84-
85-
## StreamSets
86-
87-
The [StreamSets Data Collector] is a lightweight and powerful engine that
88-
allows you to build streaming, batch and change-data-capture (CDC) pipelines
89-
that can ingest and transform data from a variety of different sources.
90-
91-
StreamSets Data Collector Engine makes it easy to run data pipelines from Kafka,
92-
Oracle, Salesforce, JDBC, Hive, and more to Snowflake, Databricks, S3, ADLS, Kafka
93-
and more. Data Collector Engine runs on-premises or any cloud, wherever your data
94-
lives.
95-
23+
- {ref}`mongodb`
9624
- {ref}`streamsets`
97-
98-
99-
[StreamSets Data Collector]: https://www.softwareag.com/en_corporate/platform/integration-apis/data-collector-engine.html

docs/ingest/etl/index.md

Lines changed: 11 additions & 136 deletions
@@ -7,161 +7,36 @@
77
:::
88

99
:::{div}
10-
You have a variety of options to connect and integrate with 3rd-party
10+
CrateDB provides a variety of options to connect and integrate with third-party
1111
ETL applications, mostly using [CrateDB's PostgreSQL interface].
12-
:::
12+
CrateDB also provides a few native adapter components that can be used
13+
to leverage its advanced features.
1314

1415
This documentation section lists corresponding ETL applications and
1516
frameworks which can be used together with CrateDB, and outlines how
1617
to use them optimally.
1718
Please also have a look at support for [](#cdc) solutions.
19+
:::
1820

1921

20-
## Apache Airflow / Astronomer
21-
2222
- {ref}`apache-airflow`
23-
24-
## Apache Flink
25-
2623
- {ref}`apache-flink`
27-
28-
## Apache Hop
29-
3024
- {ref}`apache-hop`
31-
32-
## Apache Iceberg / RisingWave
33-
:::{div}
34-
- {ref}`iceberg-risingwave`
35-
:::
36-
37-
```{toctree}
38-
:hidden:
39-
40-
iceberg-risingwave
41-
```
42-
43-
## Apache Kafka
44-
25+
- {ref}`apache-iceberg`
4526
- {ref}`apache-kafka`
46-
47-
## Apache NiFi
48-
4927
- {ref}`apache-nifi`
50-
51-
## AWS DMS
52-
53-
:::{div}
54-
[AWS Database Migration Service (AWS DMS)] is a managed migration and replication
55-
service that helps move your database and analytics workloads between different
56-
kinds of databases quickly, securely, and with minimal downtime and zero data
57-
loss. It supports migration between 20-plus database and analytics engines.
58-
59-
AWS DMS supports migration between 20-plus database and analytics engines, either
60-
on-premises, or per EC2 instance databases. Supported data migration sources are:
61-
Amazon Aurora, Amazon DocumentDB, Amazon S3, IBM DB2, MariaDB, Azure SQL Database,
62-
Microsoft SQL Server, MongoDB, MySQL, Oracle, PostgreSQL, SAP ASE.
63-
64-
The [AWS DMS Integration with CrateDB] uses Amazon Kinesis Data Streams as
65-
a DMS target, combined with a CrateDB-specific downstream processor element.
66-
67-
CrateDB provides two variants how to conduct data migrations using AWS DMS.
68-
Either use it standalone / on your own premises, or use it in a completely
69-
managed environment with services of AWS and CrateDB Cloud.
70-
:::
71-
72-
73-
## AWS Kinesis
74-
75-
Amazon Kinesis Data Streams is a serverless streaming data service that
76-
simplifies the capture, processing, and storage of data streams at any
77-
scale, such as application logs, website clickstreams, and IoT telemetry
78-
data, for machine learning (ML), analytics, and other applications.
79-
:::{div}
80-
The [DynamoDB CDC Relay] pipeline uses Amazon Kinesis to relay a table
81-
change stream from a DynamoDB table into a CrateDB table, see also
82-
[DynamoDB CDC](#cdc-dynamodb).
83-
:::
84-
85-
86-
## Azure Functions
87-
28+
- {ref}`aws-dms`
29+
- {ref}`aws-dynamodb`
30+
- {ref}`aws-kinesis`
8831
- {ref}`azure-functions`
89-
90-
```{toctree}
91-
:hidden:
92-
93-
azure-functions
94-
```
95-
96-
97-
## dbt
98-
9932
- {ref}`dbt`
100-
101-
## DynamoDB
102-
:::{div}
103-
- [DynamoDB Table Loader]
104-
- [DynamoDB CDC Relay]
105-
:::
106-
107-
108-
## Estuary
109-
11033
- {ref}`estuary`
111-
112-
## InfluxDB
113-
114-
- {ref}`integrate-influxdb`
115-
116-
## Kestra
117-
34+
- {ref}`influxdb`
11835
- {ref}`kestra`
119-
120-
## Meltano
121-
12236
- {ref}`meltano`
123-
124-
## MongoDB
125-
:::{div}
126-
- Tutorial: {ref}`integrate-mongodb`
127-
- Documentation: [MongoDB Table Loader]
128-
- Documentation: [MongoDB CDC Relay]
129-
:::
130-
```{toctree}
131-
:hidden:
132-
133-
mongodb
134-
```
135-
136-
137-
## MySQL
138-
139-
- {ref}`integrate-mysql`
140-
141-
```{toctree}
142-
:hidden:
143-
144-
mysql
145-
```
146-
147-
## Node-RED
148-
37+
- {ref}`mongodb`
38+
- {ref}`mysql`
14939
- {ref}`node-red`
150-
151-
## RisingWave
152-
15340
- {ref}`risingwave`
154-
155-
## SQL Server Integration Services
156-
15741
- {ref}`sql-server`
158-
159-
## StreamSets
160-
16142
- {ref}`streamsets`
162-
163-
```{toctree}
164-
:hidden:
165-
166-
streamsets
167-
```

docs/ingest/telemetry/index.md

Lines changed: 2 additions & 5 deletions
@@ -3,17 +3,14 @@
33
(integrate-metrics)=
44
# Telemetry data
55

6+
:::{div}
67
CrateDB integrations with metrics collection agents, brokers, and stores.
78
This documentation section lists applications and daemons which can
89
be used together with CrateDB, and educates about how to use them optimally.
910

1011
Storing metrics data for the long term is a common need in systems monitoring
1112
scenarios. CrateDB offers corresponding integration adapters.
12-
13-
## Prometheus
13+
:::
1414

1515
- {ref}`prometheus`
16-
17-
## Telegraf
18-
1916
- {ref}`telegraf`
Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
1+
(apache-iceberg)=
2+
# Apache Iceberg
3+
4+
:::{rubric} About
5+
:::
6+
The [Iceberg table format] is designed to manage a large, slow-changing collection
7+
of files in a distributed file system or key-value store as a database table.
8+
9+
:::{rubric} Learn
10+
:::
11+
CrateDB provides integration capabilities with Apache Iceberg implementations,
12+
see {ref}`risingwave-iceberg`.
13+
14+
:::{todo}
15+
🚧 _Please note this page is a work in progress._ 🚧
16+
:::
17+
18+
19+
[Iceberg table format]: https://iceberg.apache.org/spec/

docs/integrate/aws-dms/index.md

Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
1+
(aws-dms)=
2+
(cdc-dms)=
3+
# AWS Database Migration Service
4+
5+
:::{include} /_include/links.md
6+
:::
7+
8+
:::{rubric} About
9+
:::
10+
11+
:::{div}
12+
[AWS Database Migration Service (AWS DMS)] is a managed migration and replication
13+
service that helps move your database and analytics workloads between different
14+
kinds of databases quickly, securely, and with minimal downtime and zero data
15+
loss.
16+
17+
AWS DMS supports migration between 20-plus database and analytics engines, either
18+
on-premises, or per EC2 instance databases. Supported data migration sources are:
19+
Amazon Aurora, Amazon DocumentDB, Amazon S3, IBM DB2, MariaDB, Azure SQL Database,
20+
Microsoft SQL Server, MongoDB, MySQL, Oracle, PostgreSQL, SAP ASE.
21+
:::
22+
23+
:::{rubric} Learn
24+
:::
25+
26+
:::{div}
27+
The [AWS DMS Integration with CrateDB] uses Amazon Kinesis Data Streams as
28+
a DMS target, combined with a CrateDB-specific downstream processor element.
29+
30+
CrateDB provides two variants how to conduct data migrations using AWS DMS.
31+
Either use it standalone / on your own premises, or use it in a completely
32+
managed environment with services of AWS and CrateDB Cloud.
33+
34+
AWS DMS supports both `full-load` and `cdc` operation modes, often used in
35+
combination with each other (`full-load-and-cdc`).
36+
:::
Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
1+
(aws-dynamodb)=
2+
(cdc-dynamodb)=
3+
# Amazon DynamoDB
4+
5+
:::{include} /_include/links.md
6+
:::
7+
8+
:::{rubric} About
9+
:::
10+
11+
:::{div}
12+
The [DynamoDB Table Loader] supports loading DynamoDB tables into CrateDB (full-load),
13+
while the [DynamoDB CDC Relay] pipeline uses [Amazon DynamoDB Streams] or [Amazon Kinesis
14+
Data Streams] to relay table change stream CDC events from a DynamoDB table into CrateDB.
15+
16+
:::{rubric} Learn
17+
:::
18+
19+
:::{div}
20+
It is a common application to relay DynamoDB table change stream events to a
21+
Kinesis Stream, and consume that from an adapter to write into an analytical
22+
or long-term storage consolidation database.
23+
24+
If you are looking into serverless replication using AWS Lambda:
25+
- [DynamoDB CDC Relay with AWS Lambda]
26+
- Blog: [Replicating CDC events from DynamoDB to CrateDB]
27+
:::
