diff --git a/docs/_include/card/timeseries-datashader.md b/docs/_include/card/timeseries-datashader.md
index 369e5425..16b5096c 100644
--- a/docs/_include/card/timeseries-datashader.md
+++ b/docs/_include/card/timeseries-datashader.md
@@ -10,7 +10,9 @@ points from your backend systems to the browser's glass.
 This notebook plots the venerable NYC Taxi dataset after importing it into
 a CrateDB Cloud database cluster.
 
-🚧 _Please note this notebook is a work in progress._ 🚧
+```{todo}
+🚧 This notebook is a work in progress. 🚧
+```
 
 {{ '{}[cloud-datashader-github]'.format(nb_github) }}
 {{ '{}[cloud-datashader-colab]'.format(nb_colab) }}
 :::
diff --git a/docs/_include/links.md b/docs/_include/links.md
index b3bd6df5..eff688c3 100644
--- a/docs/_include/links.md
+++ b/docs/_include/links.md
@@ -27,6 +27,7 @@
 [HNSW paper]: https://arxiv.org/pdf/1603.09320
 [HoloViews]: https://www.holoviews.org/
 [Indexing, Columnar Storage, and Aggregations]: https://cratedb.com/product/features/indexing-columnar-storage-aggregations
+[InfluxDB]: https://github.com/influxdata/influxdb
 [inverted index]: https://en.wikipedia.org/wiki/Inverted_index
 [JOIN]: inv:crate-reference#sql_joins
 [JSON Database]: https://cratedb.com/solutions/json-database
@@ -38,9 +39,12 @@
 [langchain-rag-sql-binder]: https://mybinder.org/v2/gh/crate/cratedb-examples/main?labpath=topic%2Fmachine-learning%2Flangchain%2Fcratedb-vectorstore-rag-openai-sql.ipynb
 [langchain-rag-sql-colab]: https://colab.research.google.com/github/crate/cratedb-examples/blob/main/topic/machine-learning/langchain/cratedb-vectorstore-rag-openai-sql.ipynb
 [langchain-rag-sql-github]: https://github.com/crate/cratedb-examples/blob/main/topic/machine-learning/langchain/cratedb-vectorstore-rag-openai-sql.ipynb
-[MongoDB CDC Relay]: https://cratedb-toolkit.readthedocs.io/io/mongodb/cdc.html
+[MongoDB]: https://www.mongodb.com/docs/manual/
+[MongoDB Atlas]: https://www.mongodb.com/docs/atlas/
+[MongoDB CDC Relay]: inv:ctk:*:label#mongodb-cdc-relay
 [MongoDB Change Streams]: https://www.mongodb.com/docs/manual/changeStreams/
-[MongoDB Table Loader]: https://cratedb-toolkit.readthedocs.io/io/mongodb/loader.html
+[MongoDB collections and databases]: https://www.mongodb.com/docs/php-library/current/databases-collections/
+[MongoDB Table Loader]: inv:ctk:*:label#mongodb-loader
 [Multi-model Database]: https://cratedb.com/solutions/multi-model-database
 [nearest neighbor search]: https://en.wikipedia.org/wiki/Nearest_neighbor_search
 [Nested Data Structure]: https://cratedb.com/product/features/nested-data-structure
diff --git a/docs/conf.py b/docs/conf.py
index 698dedfb..65727aed 100644
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -57,6 +57,8 @@
     r"https://www.tableau.com/",
     # Read timed out. (read timeout=15)
     r"https://kubernetes.io/",
+    # Connection to renenyffenegger.ch timed out.
+    r"https://renenyffenegger.ch",
 ]
 
 linkcheck_anchors_ignore_for_url += [
diff --git a/docs/connect/configure.md b/docs/connect/configure.md
index 741bbb70..6c1d2516 100644
--- a/docs/connect/configure.md
+++ b/docs/connect/configure.md
@@ -1,8 +1,10 @@
 (connect-configure)=
-
 # Configure
 
-In order to connect to CrateDB, your application or driver needs to be
+:::{include} /_include/links.md
+:::
+
+To connect to CrateDB, your application or driver must be
 configured with corresponding connection properties. Please note that
 different applications and drivers may obtain connection properties in
 different formats.
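+
+As a quick sketch, the same connection properties (host, port, user, schema)
+surface differently depending on the interface you use. Assuming a local
+instance with the default `crate` user and default ports, the following
+commands are equivalent ways to submit a query; adjust them to your setup.
+
+```shell
+# PostgreSQL wire protocol (port 5432): properties as discrete flags.
+psql -h localhost -p 5432 -U crate -d doc -c "SELECT 1"
+
+# HTTP endpoint (port 4200): properties encoded into the URL and payload.
+curl -sS -H "Content-Type: application/json" \
+  -X POST "http://localhost:4200/_sql" \
+  -d '{"stmt": "SELECT 1"}'
+```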
@@ -143,7 +145,10 @@ crate://crate@localhost:4200/?schema=doc :::::: -```{tip} +:::{rubric} Notes +::: + +:::{div} - CrateDB's fixed catalog name is `crate`, the default schema name is `doc`. - CrateDB does not implement the notion of a database, however tables can be created in different [schemas]. @@ -155,4 +160,4 @@ crate://crate@localhost:4200/?schema=doc called `crate`, defined without a password. - For authenticating properly, please learn about the available [authentication] options. -``` +::: diff --git a/docs/connect/index.md b/docs/connect/index.md index 94069e65..bcfbf77b 100644 --- a/docs/connect/index.md +++ b/docs/connect/index.md @@ -85,6 +85,7 @@ Database connectivity information. :::: ::::{grid-item-card} {material-outlined}`lightbulb;2em` How to connect +:width: auto - {ref}`connect-java` - {ref}`connect-javascript` - {ref}`connect-php` @@ -105,6 +106,7 @@ CLI programs ide Drivers DataFrame libraries +mcp/index ORM libraries ``` diff --git a/docs/integrate/mcp/community.md b/docs/connect/mcp/community.md similarity index 100% rename from docs/integrate/mcp/community.md rename to docs/connect/mcp/community.md diff --git a/docs/integrate/mcp/cratedb-mcp.md b/docs/connect/mcp/cratedb-mcp.md similarity index 100% rename from docs/integrate/mcp/cratedb-mcp.md rename to docs/connect/mcp/cratedb-mcp.md diff --git a/docs/integrate/mcp/index.md b/docs/connect/mcp/index.md similarity index 85% rename from docs/integrate/mcp/index.md rename to docs/connect/mcp/index.md index cabc3c09..4bfc3d73 100644 --- a/docs/integrate/mcp/index.md +++ b/docs/connect/mcp/index.md @@ -1,3 +1,5 @@ +(mcp)= +(connect-mcp)= # Model Context Protocol (MCP) ```{toctree} @@ -8,9 +10,7 @@ cratedb-mcp Community servers ``` -## About - -:::{rubric} Introduction +:::{rubric} About ::: [MCP], the Model Context Protocol, is an open protocol that enables seamless @@ -19,22 +19,14 @@ integration between LLM applications and external data sources and tools. MCP is sometimes described as "OpenAPI for LLMs" or as "USB-C port for AI", providing a uniform way to connect LLMs to resources they can use. -:::{rubric} Details -::: - -The main entities of MCP are [prompts], [resources], and [tools]. +The main entities of MCP are [Prompts], [Resources], and [Tools]. MCP clients call MCP servers, either by invoking them as a subprocess and communicating via Standard Input/Output (stdio), Server-Sent Events (sse), or HTTP Streams (streamable-http), see [transports]. -:::{rubric} Discuss +:::{rubric} Usage ::: -To get in touch with us to discuss CrateDB and MCP, head over to GitHub at -[Model Context Protocol (MCP) @ CrateDB] or the [Community Forum]. - -## Usage - You can use MCP with [CrateDB] and [CrateDB Cloud], either by selecting the **CrateDB MCP Server** suitable for Text-to-SQL and documentation retrieval, or by using community MCP servers that are compatible with PostgreSQL databases. @@ -66,9 +58,16 @@ GitHub Copilot, Mistral AI, OpenAI Agents SDK, VS Code, Windsurf, and others. -[Community Forum]: https://community.cratedb.com/ +:::{rubric} Discuss +::: + +To get in touch with us to discuss CrateDB and MCP, please head over to +the CrateDB community forum at [Introducing the CrateDB MCP Server]. 
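+
+As an illustrative sketch of the community-server route, a PostgreSQL-compatible
+MCP server can be pointed at CrateDB's PostgreSQL interface on port 5432. The
+package name and arguments below are assumptions; check your MCP client's
+documentation for the exact configuration it expects.
+
+```shell
+# Run a community MCP server (stdio transport) against CrateDB,
+# which speaks the PostgreSQL wire protocol (default user: crate).
+npx -y @modelcontextprotocol/server-postgres "postgresql://crate@localhost:5432/doc"
+```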
+
+
 [CrateDB]: https://cratedb.com/database
 [CrateDB Cloud]: https://cratedb.com/docs/cloud/
+[Introducing the CrateDB MCP Server]: https://community.cratedb.com/t/introducing-the-cratedb-mcp-server/2043
 [MCP]: https://modelcontextprotocol.io/
 [MCP clients]: https://modelcontextprotocol.io/clients
 [Model Context Protocol (MCP) @ CrateDB]: https://github.com/crate/crate-clients-tools/discussions/234
diff --git a/docs/ingest/cdc/index.md b/docs/ingest/cdc/index.md
index 8b8fd542..019e41ad 100644
--- a/docs/ingest/cdc/index.md
+++ b/docs/ingest/cdc/index.md
@@ -5,95 +5,20 @@
 :::
 
 :::{div}
-You have a variety of options to connect and integrate with 3rd-party
+CrateDB provides many options to connect and integrate with third-party
 CDC applications, mostly using [CrateDB's PostgreSQL interface].
-
-CrateDB also provides a few native adapter components that can be used
-to leverage its advanced features.
+CrateDB also provides native adapter components to leverage advanced
+features.
 
 This documentation section lists corresponding CDC applications and
 frameworks which can be used together with CrateDB, and outlines how
 to use them optimally.
-
-Please also have a look at support for [generic ETL](#etl) solutions.
-:::
-
-(cdc-dms)=
-## AWS DMS
-
-:::{div}
-[AWS Database Migration Service (AWS DMS)] is a managed migration and replication
-service that helps move your database and analytics workloads between different
-kinds of databases quickly, securely, and with minimal downtime and zero data
-loss.
-
-AWS DMS supports migration between 20-plus database and analytics engines, either
-on-premises, or per EC2 instance databases. Supported data migration sources are:
-Amazon Aurora, Amazon DocumentDB, Amazon S3, IBM DB2, MariaDB, Azure SQL Database,
-Microsoft SQL Server, MongoDB, MySQL, Oracle, PostgreSQL, SAP ASE.
-
-The [AWS DMS Integration with CrateDB] uses Amazon Kinesis Data Streams as
-a DMS target, combined with a CrateDB-specific downstream processor element.
-
-CrateDB provides two variants how to conduct data migrations using AWS DMS.
-Either use it standalone / on your own premises, or use it in a completely
-managed environment with services of AWS and CrateDB Cloud.
-AWS DMS supports both `full-load` and `cdc` operation modes, often used in
-combination with each other (`full-load-and-cdc`).
+Please also take a look at support for {ref}`generic ETL <etl>` solutions.
 :::
 
-(cdc-kinesis)=
-## AWS Kinesis
-You can use Amazon Kinesis Data Streams to collect and process large streams of data
-records in real time. A typical Kinesis Data Streams application reads data from a
-data stream as data records.
-
-As such, a common application is to relay DynamoDB table change stream events to a
-Kinesis Stream, and consume that from an adapter to a consolidation database.
-:::{div}
-- About: [Amazon Kinesis Data Streams]
-- See: [](#cdc-dynamodb)
-:::
-
-## Debezium
-
+- {ref}`aws-dms`
+- {ref}`aws-dynamodb`
+- {ref}`aws-kinesis`
 - {ref}`debezium`
-
-(cdc-dynamodb)=
-## DynamoDB
-:::{div}
-Support for loading DynamoDB tables into CrateDB (full-load), as well as
-[Amazon DynamoDB Streams] and [Amazon Kinesis Data Streams],
-to relay CDC events from DynamoDB into CrateDB.
- -- [DynamoDB Table Loader] -- [DynamoDB CDC Relay] - -If you are looking into serverless replication using AWS Lambda: -- [DynamoDB CDC Relay with AWS Lambda] -- Blog: [Replicating CDC events from DynamoDB to CrateDB] -::: - -## MongoDB -:::{div} -Support for loading MongoDB collections and databases into CrateDB (full-load), -and [MongoDB Change Streams], to relay CDC events from MongoDB into CrateDB. - -- [MongoDB Table Loader] -- [MongoDB CDC Relay] -::: - -## StreamSets - -The [StreamSets Data Collector] is a lightweight and powerful engine that -allows you to build streaming, batch and change-data-capture (CDC) pipelines -that can ingest and transform data from a variety of different sources. - -StreamSets Data Collector Engine makes it easy to run data pipelines from Kafka, -Oracle, Salesforce, JDBC, Hive, and more to Snowflake, Databricks, S3, ADLS, Kafka -and more. Data Collector Engine runs on-premises or any cloud, wherever your data -lives. - +- {ref}`mongodb` - {ref}`streamsets` - - -[StreamSets Data Collector]: https://www.softwareag.com/en_corporate/platform/integration-apis/data-collector-engine.html diff --git a/docs/ingest/etl/index.md b/docs/ingest/etl/index.md index 64143cbc..070fc533 100644 --- a/docs/ingest/etl/index.md +++ b/docs/ingest/etl/index.md @@ -7,161 +7,36 @@ ::: :::{div} -You have a variety of options to connect and integrate with 3rd-party +CrateDB provides many options to connect and integrate with third-party ETL applications, mostly using [CrateDB's PostgreSQL interface]. -::: +CrateDB also provides native adapter components to leverage advanced +features. This documentation section lists corresponding ETL applications and frameworks which can be used together with CrateDB, and outlines how to use them optimally. -Please also have a look at support for [](#cdc) solutions. - +Please also take a look at support for {ref}`cdc` solutions. +::: -## Apache Airflow / Astronomer - {ref}`apache-airflow` - -## Apache Flink - - {ref}`apache-flink` - -## Apache Hop - - {ref}`apache-hop` - -## Apache Iceberg / RisingWave -:::{div} -- {ref}`iceberg-risingwave` -::: - -```{toctree} -:hidden: - -iceberg-risingwave -``` - -## Apache Kafka - +- {ref}`apache-iceberg` - {ref}`apache-kafka` - -## Apache NiFi - - {ref}`apache-nifi` - -## AWS DMS - -:::{div} -[AWS Database Migration Service (AWS DMS)] is a managed migration and replication -service that helps move your database and analytics workloads between different -kinds of databases quickly, securely, and with minimal downtime and zero data -loss. It supports migration between 20-plus database and analytics engines. - -AWS DMS supports migration between 20-plus database and analytics engines, either -on-premises, or per EC2 instance databases. Supported data migration sources are: -Amazon Aurora, Amazon DocumentDB, Amazon S3, IBM DB2, MariaDB, Azure SQL Database, -Microsoft SQL Server, MongoDB, MySQL, Oracle, PostgreSQL, SAP ASE. - -The [AWS DMS Integration with CrateDB] uses Amazon Kinesis Data Streams as -a DMS target, combined with a CrateDB-specific downstream processor element. - -CrateDB provides two variants how to conduct data migrations using AWS DMS. -Either use it standalone / on your own premises, or use it in a completely -managed environment with services of AWS and CrateDB Cloud. 
-::: - - -## AWS Kinesis - -Amazon Kinesis Data Streams is a serverless streaming data service that -simplifies the capture, processing, and storage of data streams at any -scale, such as application logs, website clickstreams, and IoT telemetry -data, for machine learning (ML), analytics, and other applications. -:::{div} -The [DynamoDB CDC Relay] pipeline uses Amazon Kinesis to relay a table -change stream from a DynamoDB table into a CrateDB table, see also -[DynamoDB CDC](#cdc-dynamodb). -::: - - -## Azure Functions - +- {ref}`aws-dms` +- {ref}`aws-dynamodb` +- {ref}`aws-kinesis` - {ref}`azure-functions` - -```{toctree} -:hidden: - -azure-functions -``` - - -## dbt - - {ref}`dbt` - -## DynamoDB -:::{div} -- [DynamoDB Table Loader] -- [DynamoDB CDC Relay] -::: - - -## Estuary - - {ref}`estuary` - -## InfluxDB - -- {ref}`integrate-influxdb` - -## Kestra - +- {ref}`influxdb` - {ref}`kestra` - -## Meltano - - {ref}`meltano` - -## MongoDB -:::{div} -- Tutorial: {ref}`integrate-mongodb` -- Documentation: [MongoDB Table Loader] -- Documentation: [MongoDB CDC Relay] -::: -```{toctree} -:hidden: - -mongodb -``` - - -## MySQL - -- {ref}`integrate-mysql` - -```{toctree} -:hidden: - -mysql -``` - -## Node-RED - +- {ref}`mongodb` +- {ref}`mysql` - {ref}`node-red` - -## RisingWave - - {ref}`risingwave` - -## SQL Server Integration Services - - {ref}`sql-server` - -## StreamSets - - {ref}`streamsets` - -```{toctree} -:hidden: - -streamsets -``` diff --git a/docs/ingest/telemetry/index.md b/docs/ingest/telemetry/index.md index 4b5f4f86..45b17b08 100644 --- a/docs/ingest/telemetry/index.md +++ b/docs/ingest/telemetry/index.md @@ -3,17 +3,14 @@ (integrate-metrics)= # Telemetry data +:::{div} CrateDB integrations with metrics collection agents, brokers, and stores. This documentation section lists applications and daemons which can be used together with CrateDB, and educates about how to use them optimally. Storing metrics data for the long term is a common need in systems monitoring scenarios. CrateDB offers corresponding integration adapters. - -## Prometheus +::: - {ref}`prometheus` - -## Telegraf - - {ref}`telegraf` diff --git a/docs/integrate/apache-iceberg/index.md b/docs/integrate/apache-iceberg/index.md new file mode 100644 index 00000000..606d34f0 --- /dev/null +++ b/docs/integrate/apache-iceberg/index.md @@ -0,0 +1,19 @@ +(apache-iceberg)= +# Apache Iceberg + +:::{rubric} About +::: +The [Iceberg table format] is designed to manage a large, slow-changing collection +of files in a distributed file system or key-value store as a database table. + +:::{rubric} Learn +::: +CrateDB provides integration capabilities with Apache Iceberg implementations, +see {ref}`risingwave-iceberg`. + +:::{todo} +🚧 This page is a work in progress. 🚧 +::: + + +[Iceberg table format]: https://iceberg.apache.org/spec/ diff --git a/docs/integrate/aws-dms/index.md b/docs/integrate/aws-dms/index.md new file mode 100644 index 00000000..d494cb1d --- /dev/null +++ b/docs/integrate/aws-dms/index.md @@ -0,0 +1,35 @@ +(aws-dms)= +(cdc-dms)= +# AWS Database Migration Service + +:::{include} /_include/links.md +::: + +:::{rubric} About +::: + +:::{div} +[AWS Database Migration Service (AWS DMS)] is a managed migration and replication +service that helps move your database and analytics workloads between different +kinds of databases quickly, securely, and with minimal downtime and zero data +loss. + +AWS DMS supports migration between 20+ database and analytics engines, either +on-premises or on EC2-hosted databases. 
Supported data migration sources include:
+Amazon Aurora, Amazon DocumentDB, Amazon S3, IBM DB2, MariaDB, Azure SQL Database,
+Microsoft SQL Server, MongoDB, MySQL, Oracle, PostgreSQL, SAP ASE.
+:::
+
+:::{rubric} Learn
+:::
+
+:::{div}
+The [AWS DMS Integration with CrateDB] uses Amazon Kinesis Data Streams as
+a DMS target, combined with a CrateDB-specific downstream processor element.
+
+CrateDB supports two ways to run AWS DMS migrations:
+either standalone on your own premises, or fully managed with AWS and CrateDB Cloud services.
+
+AWS DMS supports both `full-load` and `cdc` operation modes, which are often
+combined (`full-load-and-cdc`).
+:::
diff --git a/docs/integrate/aws-dynamodb/index.md b/docs/integrate/aws-dynamodb/index.md
new file mode 100644
index 00000000..a94ec33d
--- /dev/null
+++ b/docs/integrate/aws-dynamodb/index.md
@@ -0,0 +1,28 @@
+(aws-dynamodb)=
+(cdc-dynamodb)=
+# Amazon DynamoDB
+
+:::{include} /_include/links.md
+:::
+
+:::{rubric} About
+:::
+
+:::{div}
+The [DynamoDB Table Loader] supports loading DynamoDB tables into CrateDB (full-load),
+while the [DynamoDB CDC Relay] pipeline uses [Amazon DynamoDB Streams] or [Amazon Kinesis
+Data Streams] to relay table change stream CDC events from a DynamoDB table into CrateDB.
+:::
+
+:::{rubric} Learn
+:::
+
+:::{div}
+A common pattern is to relay DynamoDB table change stream events to a
+Kinesis stream and consume them with an adapter that writes into an
+analytical or long-term consolidation database.
+
+If you are looking into serverless replication using AWS Lambda:
+- [DynamoDB CDC Relay with AWS Lambda]
+- Blog: [Replicating CDC events from DynamoDB to CrateDB]
+:::
diff --git a/docs/integrate/aws-kinesis/index.md b/docs/integrate/aws-kinesis/index.md
new file mode 100644
index 00000000..2c773669
--- /dev/null
+++ b/docs/integrate/aws-kinesis/index.md
@@ -0,0 +1,28 @@
+(aws-kinesis)=
+# Amazon Kinesis
+
+:::{include} /_include/links.md
+:::
+
+:::{rubric} About
+:::
+
+:::{div}
+[Amazon Kinesis Data Streams] is a serverless streaming data service that
+simplifies the capture, processing, and storage of data streams at any
+scale, such as application logs, website clickstreams, and IoT telemetry
+data, for machine learning (ML), analytics, and other applications.
+
+You can use Amazon Kinesis Data Streams to collect and process large data
+streams in real time. A typical application reads data from the stream as
+records.
+:::
+
+:::{rubric} Learn
+:::
+
+:::{div}
+The [DynamoDB CDC Relay] pipeline uses Amazon Kinesis to relay a table
+change stream from a DynamoDB table into a CrateDB table; see also
+{ref}`DynamoDB CDC <cdc-dynamodb>`.
+:::
diff --git a/docs/integrate/azure-functions/index.md b/docs/integrate/azure-functions/index.md
new file mode 100644
index 00000000..1b3fd982
--- /dev/null
+++ b/docs/integrate/azure-functions/index.md
@@ -0,0 +1,38 @@
+(azure-functions)=
+# Azure Functions
+
+:::{include} /_include/links.md
+:::
+
+:::{rubric} About
+:::
+
+_Execute event-driven serverless code with an end-to-end development experience._
+
+[Azure Functions] is a serverless solution that allows you to build robust apps
+while using less code, and with less infrastructure and lower costs. Instead
+of worrying about deploying and maintaining servers, you can use the cloud
+infrastructure to provide all the up-to-date resources needed to keep your
+applications running.
+
+An Azure Function is a short-lived, serverless computation that is triggered
+by external events.
The trigger produces an input payload, which is delivered
+to the Azure Function. The Azure Function then performs a computation on this
+payload and outputs its result to other Azure Functions, computation
+services, or storage services. See also [What is Azure Functions?].
+
+:::{rubric} Learn
+:::
+
+A common pattern is to use an Azure Function to enrich and ingest data
+into a CrateDB instance by connecting that Azure Function to an IoT Hub's
+new-messages trigger.
+
+:::{toctree}
+:maxdepth: 1
+learn
+:::
+
+
+[Azure Functions]: https://azure.microsoft.com/en-us/products/functions
+[What is Azure Functions?]: https://learn.microsoft.com/en-us/azure/azure-functions/functions-overview
diff --git a/docs/ingest/etl/azure-functions.rst b/docs/integrate/azure-functions/learn.rst
similarity index 99%
rename from docs/ingest/etl/azure-functions.rst
rename to docs/integrate/azure-functions/learn.rst
index 1d604a24..8c83d89b 100644
--- a/docs/ingest/etl/azure-functions.rst
+++ b/docs/integrate/azure-functions/learn.rst
@@ -1,4 +1,4 @@
-.. _azure-functions:
+.. _azure-functions-learn:
 
 ===========================================================
 Data Enrichment using IoT Hubs, Azure Functions and CrateDB
diff --git a/docs/integrate/cluvio/index.md b/docs/integrate/cluvio/index.md
index 52385cba..c360e3cb 100644
--- a/docs/integrate/cluvio/index.md
+++ b/docs/integrate/cluvio/index.md
@@ -6,7 +6,7 @@
 ```{div}
 :style: "float: right; margin-left: 1em"
 
-[![cluvio-logo-full_color-on_dark.svg ](https://github.com/crate/crate-clients-tools/assets/453543/cac142ef-412a-4a67-a63f-bf9d1ce92c84){w=180px}](https://www.cluvio.com/)
+[![cluvio-logo-full_color-on_dark.svg](https://github.com/crate/crate-clients-tools/assets/453543/cac142ef-412a-4a67-a63f-bf9d1ce92c84){w=180px}](https://www.cluvio.com/)
 ```
 
 [Cluvio] is a programmable and interactive dashboarding platform — your analytics
diff --git a/docs/integrate/index.md b/docs/integrate/index.md
index f0f85f58..ea665ee6 100644
--- a/docs/integrate/index.md
+++ b/docs/integrate/index.md
@@ -19,9 +19,14 @@ Please also visit the [Overview of CrateDB integration tutorials].
 apache-airflow/index
 apache-flink/index
 apache-hop/index
+apache-iceberg/index
 apache-kafka/index
 apache-nifi/index
 apache-superset/index
+azure-functions/index
+aws-dms/index
+aws-dynamodb/index
+aws-kinesis/index
 cluvio/index
 datagrip/index
 dbeaver/index
@@ -35,9 +40,10 @@ grafana/index
 influxdb/index
 kestra/index
 marquez/index
-mcp/index
 meltano/index
 metabase/index
+mongodb/index
+mysql/index
 node-red/index
 plotly/index
 powerbi/index
@@ -47,6 +53,7 @@ rill/index
 risingwave/index
 sql-server/index
 streamlit/index
+streamsets/index
 tableau/index
 telegraf/index
 :::
diff --git a/docs/integrate/influxdb/index.md b/docs/integrate/influxdb/index.md
index 30fd2b83..a26c0774 100644
--- a/docs/integrate/influxdb/index.md
+++ b/docs/integrate/influxdb/index.md
@@ -1,189 +1,25 @@
+(influxdb)=
 (integrate-influxdb)=
 (integrate-influxdb-quickstart)=
-(import-influxdb)=
-# Import data from InfluxDB
+# InfluxDB
 
-In this quick tutorial, you will use the [CrateDB Toolkit InfluxDB I/O subsystem]
-to import data from [InfluxDB] into [CrateDB]. You can also import data directly
-from files in InfluxDB line protocol format.
-
-## Synopsis
-
-### InfluxDB Server
-Transfer data from InfluxDB bucket/measurement into CrateDB schema/table.
-```shell -ctk load table \ - "influxdb2://example:token@influxdb.example.org:8086/testdrive/demo" \ - --cratedb-sqlalchemy-url="crate://user:password@cratedb.example.org:4200/testdrive/demo" -``` -Query data in CrateDB. -```shell -export CRATEPW=password -crash --host=cratedb.example.org --username=user --command='SELECT * FROM testdrive.demo;' -``` - -### InfluxDB Line Protocol -Transfer data from InfluxDB line protocol file into CrateDB schema/table. -```shell -ctk load table \ - "https://github.com/influxdata/influxdb2-sample-data/raw/master/air-sensor-data/air-sensor-data.lp" \ - --cratedb-sqlalchemy-url="crate://user:password@cratedb.example.org:4200/testdrive/air-sensor-data" -``` -Query data in CrateDB. -```shell -export CRATEPW=password -crash --host=cratedb.example.org --username=user --command='SELECT * FROM testdrive."air-sensor-data";' -``` - - -## Data Model - -InfluxDB stores time series data in buckets and measurements. CrateDB stores -data in schemas and tables. - -- A **bucket** is a named location with a retention policy where time series data is stored. -- A **series** is a logical grouping of data defined by shared measurement, tag, and field. -- A **measurement** is similar to an SQL database table. -- A **tag** is similar to an indexed column in an SQL database. -- A **field** is similar to an un-indexed column in an SQL database. -- A **point** is similar to an SQL row. - -## Tutorial - -The tutorial heavily uses Docker to provide services and to run jobs. -Alternatively, you can use the drop-in replacement Podman. -The walkthrough uses basic example setup including InfluxDB 2.x and -a few samples worth of data that is being transferred to CrateDB. - -### Services - -Prerequisites are running instances of CrateDB and InfluxDB. - -Start InfluxDB. -:::{code} shell -docker run --rm -it --name=influxdb \ - --publish=8086:8086 \ - --env=DOCKER_INFLUXDB_INIT_MODE=setup \ - --env=DOCKER_INFLUXDB_INIT_USERNAME=admin \ - --env=DOCKER_INFLUXDB_INIT_PASSWORD=secret0000 \ - --env=DOCKER_INFLUXDB_INIT_ORG=example \ - --env=DOCKER_INFLUXDB_INIT_BUCKET=testdrive \ - --env=DOCKER_INFLUXDB_INIT_ADMIN_TOKEN=token \ - --volume="$PWD/var/lib/influxdb2:/var/lib/influxdb2" \ - influxdb:2 -::: - -Start CrateDB. -:::{code} shell -docker run --rm -it --name=cratedb \ - --publish=4200:4200 \ - --volume="$PWD/var/lib/cratedb:/data" \ - crate:latest -Cdiscovery.type=single-node -::: - -### Sample Data -Command shortcuts. -:::{code} shell -alias influx="docker exec influxdb influx" -alias influx-write="influx write --bucket=testdrive --org=example --token=token --precision=s" -::: - -Write a few samples worth of data to InfluxDB. -:::{code} shell -influx-write "demo,region=amazonas temperature=27.4,humidity=92.3,windspeed=4.5 1588363200" -influx-write "demo,region=amazonas temperature=28.2,humidity=88.7,windspeed=4.7 1588549600" -influx-write "demo,region=amazonas temperature=27.9,humidity=91.6,windspeed=3.2 1588736000" -influx-write "demo,region=amazonas temperature=29.1,humidity=88.1,windspeed=2.4 1588922400" -influx-write "demo,region=amazonas temperature=28.6,humidity=93.4,windspeed=2.9 1589108800" -::: - -### Data Import - -First, create these command aliases, for better UX. 
-:::{code} shell -alias crash="docker run --rm -it --link=cratedb ghcr.io/crate/cratedb-toolkit:latest crash" -alias ctk="docker run --rm -it --link=cratedb --link=influxdb ghcr.io/crate/cratedb-toolkit:latest ctk" +:::{include} /_include/links.md ::: -Now, import data from InfluxDB bucket/measurement into CrateDB schema/table. -:::{code} shell -ctk load table \ - "influxdb2://example:token@influxdb:8086/testdrive/demo" \ - --cratedb-sqlalchemy-url="crate://crate@cratedb:4200/testdrive/demo" +:::{rubric} About ::: -Verify that relevant data has been transferred to CrateDB. -:::{code} shell -crash --host=cratedb --command="SELECT * FROM testdrive.demo;" -::: - -## Cloud to Cloud - -The procedure for importing data from [InfluxDB Cloud] into [CrateDB Cloud] is -similar, with a few small adjustments. - -First, helpful aliases again: -:::{code} shell -alias ctk="docker run --rm -it ghcr.io/crate/cratedb-toolkit:latest ctk" -alias crash="docker run --rm -it ghcr.io/crate/cratedb-toolkit:latest crash" +:::{div} +[InfluxDB] is a scalable datastore for metrics, events, and real-time analytics. +InfluxDB Core is a database built to collect, process, transform, and store event +and time series data. It is ideal for use cases that require real-time ingest and +fast query response times to build user interfaces, monitoring, and automation solutions. ::: -You will need your credentials for both CrateDB and InfluxDB. -These are, with examples: - -**CrateDB Cloud** -* Host: ```purple-shaak-ti.eks1.eu-west-1.aws.cratedb.net``` -* Username: ```admin``` -* Password: ```dZ..qB``` - -**InfluxDB Cloud** - * Host: ```eu-central-1-1.aws.cloud2.influxdata.com``` - * Organization ID: ```9fafc869a91a3406``` - * All-Access API token: ```T2..==``` - -For CrateDB, the credentials are displayed at time of cluster creation. -For InfluxDB, they can be found in the [cloud platform] itself. - -Now, same as before, import data from InfluxDB bucket/measurement into -CrateDB schema/table. -:::{code} shell -ctk load table \ - "influxdb2://9f..06:T2..==@eu-central-1-1.aws.cloud2.influxdata.com/testdrive/demo?ssl=true" \ - --cratedb-sqlalchemy-url="crate://admin:dZ..qB@purple-shaak-ti.eks1.eu-west-1.aws.cratedb.net:4200/testdrive/demo?ssl=true" +:::{rubric} Learn ::: -::: {note} -Note the **necessary** `ssl=true` query parameter at the end of both database connection URLs -when working on Cloud-to-Cloud transfers. +:::{toctree} +:maxdepth: 1 +learn ::: - -Verify that relevant data has been transferred to CrateDB. -:::{code} shell -crash --hosts 'https://admin:dZ..qB@purple-shaak-ti.eks1.eu-west-1.aws.cratedb.net:4200' --command 'SELECT * FROM testdrive.demo;' -::: - -## More information - -There are more ways to apply the I/O subsystem of CrateDB Toolkit as -pipeline elements in your daily data operations routines. Please visit the -[CrateDB Toolkit InfluxDB I/O subsystem] documentation, to learn more about what's possible. - -The InfluxDB I/O subsystem is based on the [influxio] package. Please also -check its documentation to learn about more of its capabilities, supporting -you when working with InfluxDB. - -:::{note} -**Important:** If you discover any issues with this adapter, please -[report them] back to us. 
-:::
-
-
-[cloud platform]: https://docs.influxdata.com/influxdb/cloud/admin
-[CrateDB]: https://github.com/crate/crate
-[CrateDB Cloud]: https://console.cratedb.cloud/
-[CrateDB Toolkit InfluxDB I/O subsystem]: https://cratedb-toolkit.readthedocs.io/io/influxdb/loader.html
-[InfluxDB]: https://github.com/influxdata/influxdb
-[InfluxDB Cloud]: https://cloud2.influxdata.com/
-[influxio]: https://influxio.readthedocs.io/
-[report them]: https://github.com/crate/cratedb-toolkit/issues
-[What are series and bucket in InfluxDB]: https://stackoverflow.com/questions/58190272/what-are-series-and-bucket-in-influxdb/69951376#69951376
diff --git a/docs/integrate/influxdb/learn.md b/docs/integrate/influxdb/learn.md
new file mode 100644
index 00000000..0d17ceda
--- /dev/null
+++ b/docs/integrate/influxdb/learn.md
@@ -0,0 +1,193 @@
+(influxdb-learn)=
+(import-influxdb)=
+# Import data from InfluxDB
+
+In this quick tutorial, you will use the [CrateDB Toolkit InfluxDB I/O subsystem]
+to import data from [InfluxDB] into [CrateDB]. You can also import data directly
+from files in InfluxDB line protocol format.
+
+## Synopsis
+
+### InfluxDB Server
+Transfer data from InfluxDB bucket/measurement into CrateDB schema/table.
+```shell
+ctk load table \
+    "influxdb2://example:token@influxdb.example.org:8086/testdrive/demo" \
+    --cratedb-sqlalchemy-url="crate://user:password@cratedb.example.org:4200/testdrive/demo"
+```
+Query data in CrateDB.
+```shell
+export CRATEPW=password
+crash --host=cratedb.example.org --username=user --command='SELECT * FROM testdrive.demo;'
+```
+
+### InfluxDB Line Protocol
+Transfer data from InfluxDB line protocol file into CrateDB schema/table.
+```shell
+ctk load table \
+    "https://github.com/influxdata/influxdb2-sample-data/raw/master/air-sensor-data/air-sensor-data.lp" \
+    --cratedb-sqlalchemy-url="crate://user:password@cratedb.example.org:4200/testdrive/air-sensor-data"
+```
+Query data in CrateDB.
+```shell
+export CRATEPW=password
+crash --host=cratedb.example.org --username=user --command='SELECT * FROM testdrive."air-sensor-data";'
+```
+
+
+## Data Model
+
+InfluxDB stores time series data in buckets and measurements. CrateDB stores
+data in schemas and tables.
+
+- A **bucket** is a named location with a retention policy where time series data is stored.
+- A **series** is a logical grouping of data defined by shared measurement, tag, and field.
+- A **measurement** is similar to an SQL database table.
+- A **tag** is similar to an indexed column in an SQL database.
+- A **field** is similar to an un-indexed column in an SQL database.
+- A **point** is similar to an SQL row.
+
+> via: [What are series and bucket in InfluxDB]
+
+## Tutorial
+
+The tutorial heavily uses Docker to provide services and to run jobs.
+Alternatively, you can use the drop-in replacement Podman.
+The walkthrough uses a basic example setup, including InfluxDB 2.x and
+a few samples' worth of data that is transferred to CrateDB.
+
+### Services
+
+Prerequisites are running instances of CrateDB and InfluxDB.
+
+Start InfluxDB.
+:::{code} shell
+docker run --rm -it --name=influxdb \
+    --publish=8086:8086 \
+    --env=DOCKER_INFLUXDB_INIT_MODE=setup \
+    --env=DOCKER_INFLUXDB_INIT_USERNAME=admin \
+    --env=DOCKER_INFLUXDB_INIT_PASSWORD=secret0000 \
+    --env=DOCKER_INFLUXDB_INIT_ORG=example \
+    --env=DOCKER_INFLUXDB_INIT_BUCKET=testdrive \
+    --env=DOCKER_INFLUXDB_INIT_ADMIN_TOKEN=token \
+    --volume="$PWD/var/lib/influxdb2:/var/lib/influxdb2" \
+    influxdb:2
+:::
+
+Start CrateDB.
+:::{code} shell
+docker run --rm -it --name=cratedb \
+    --publish=4200:4200 \
+    --volume="$PWD/var/lib/cratedb:/data" \
+    crate:latest -Cdiscovery.type=single-node
+:::
+
+### Sample Data
+Command shortcuts.
+:::{code} shell
+alias influx="docker exec influxdb influx"
+alias influx-write="influx write --bucket=testdrive --org=example --token=token --precision=s"
+:::
+
+Write a few samples' worth of data to InfluxDB.
+:::{code} shell
+influx-write "demo,region=amazonas temperature=27.4,humidity=92.3,windspeed=4.5 1588363200"
+influx-write "demo,region=amazonas temperature=28.2,humidity=88.7,windspeed=4.7 1588549600"
+influx-write "demo,region=amazonas temperature=27.9,humidity=91.6,windspeed=3.2 1588736000"
+influx-write "demo,region=amazonas temperature=29.1,humidity=88.1,windspeed=2.4 1588922400"
+influx-write "demo,region=amazonas temperature=28.6,humidity=93.4,windspeed=2.9 1589108800"
+:::
+
+### Data Import
+
+First, create these command aliases for better UX.
+:::{code} shell
+alias crash="docker run --rm -it --link=cratedb ghcr.io/crate/cratedb-toolkit:latest crash"
+alias ctk="docker run --rm -it --link=cratedb --link=influxdb ghcr.io/crate/cratedb-toolkit:latest ctk"
+:::
+
+Now, import data from InfluxDB bucket/measurement into CrateDB schema/table.
+:::{code} shell
+ctk load table \
+    "influxdb2://example:token@influxdb:8086/testdrive/demo" \
+    --cratedb-sqlalchemy-url="crate://crate@cratedb:4200/testdrive/demo"
+:::
+
+Verify that relevant data has been transferred to CrateDB.
+:::{code} shell
+crash --host=cratedb --command="SELECT * FROM testdrive.demo;"
+:::
+
+## Cloud to Cloud
+
+The procedure for importing data from [InfluxDB Cloud] into [CrateDB Cloud] is
+similar, with a few small adjustments.
+
+First, helpful aliases again:
+:::{code} shell
+alias ctk="docker run --rm -it ghcr.io/crate/cratedb-toolkit:latest ctk"
+alias crash="docker run --rm -it ghcr.io/crate/cratedb-toolkit:latest crash"
+:::
+
+You will need your credentials for both CrateDB and InfluxDB.
+These are, with examples:
+
+:::{rubric} CrateDB Cloud
+:::
+- Host: ```purple-shaak-ti.eks1.eu-west-1.aws.cratedb.net```
+- Username: ```admin```
+- Password: ```dZ..qB```
+
+:::{rubric} InfluxDB Cloud
+:::
+- Host: ```eu-central-1-1.aws.cloud2.influxdata.com```
+- Organization ID: ```9fafc869a91a3406```
+- All-Access API token: ```T2..==```
+
+For CrateDB, the credentials are displayed at time of cluster creation.
+For InfluxDB, they can be found in the [cloud platform] itself.
+
+Now, same as before, import data from InfluxDB bucket/measurement into
+CrateDB schema/table.
+:::{code} shell
+export CRATEPW='dZ..qB'
+ctk load table \
+    "influxdb2://9f..06:T2..==@eu-central-1-1.aws.cloud2.influxdata.com/testdrive/demo?ssl=true" \
+    --cratedb-sqlalchemy-url="crate://admin:${CRATEPW}@purple-shaak-ti.eks1.eu-west-1.aws.cratedb.net:4200/testdrive/demo?ssl=true"
+:::
+
+::: {note}
+Note the **necessary** `ssl=true` query parameter at the end of both database connection URLs
+when working on Cloud-to-Cloud transfers.
+:::
+
+Verify that relevant data has been transferred to CrateDB.
+:::{code} shell
+export CRATEPW='dZ..qB'
+crash --hosts "https://admin:${CRATEPW}@purple-shaak-ti.eks1.eu-west-1.aws.cratedb.net:4200" --command 'SELECT * FROM testdrive.demo;'
+:::
+
+## More information
+
+There are more ways to apply the I/O subsystem of CrateDB Toolkit as
+pipeline elements in your daily data operations routines. Please visit the
[CrateDB Toolkit InfluxDB I/O subsystem] documentation to learn more about what's possible.
+
+The InfluxDB I/O subsystem is based on the [influxio] package. See its
+documentation for additional capabilities when working with InfluxDB.
+
+:::{note}
+**Important:** If you discover any issues with this adapter, please
+[report them] back to us.
+:::
+
+
+[cloud platform]: https://docs.influxdata.com/influxdb/cloud/admin
+[CrateDB]: https://github.com/crate/crate
+[CrateDB Cloud]: https://console.cratedb.cloud/
+[CrateDB Toolkit InfluxDB I/O subsystem]: https://cratedb-toolkit.readthedocs.io/io/influxdb/loader.html
+[InfluxDB]: https://github.com/influxdata/influxdb
+[InfluxDB Cloud]: https://cloud2.influxdata.com/
+[influxio]: https://influxio.readthedocs.io/
+[report them]: https://github.com/crate/cratedb-toolkit/issues
+[What are series and bucket in InfluxDB]: https://stackoverflow.com/questions/58190272/what-are-series-and-bucket-in-influxdb/69951376#69951376
diff --git a/docs/integrate/meltano/index.md b/docs/integrate/meltano/index.md
index 001daeea..893c0114 100644
--- a/docs/integrate/meltano/index.md
+++ b/docs/integrate/meltano/index.md
@@ -34,7 +34,9 @@ as Singer taps and targets.
 - [meltano-tap-cratedb]
 - [meltano-target-cratedb]
 
-🚧 _Please note these adapters are a work in progress._ 🚧
+:::{todo}
+🚧 These adapters are a work in progress. 🚧
+:::
 
 
 [Examples about working with CrateDB and Meltano]: https://github.com/crate/cratedb-examples/tree/amo/meltano/framework/singer-meltano
diff --git a/docs/integrate/mongodb/index.md b/docs/integrate/mongodb/index.md
new file mode 100644
index 00000000..9f68789c
--- /dev/null
+++ b/docs/integrate/mongodb/index.md
@@ -0,0 +1,55 @@
+(mongodb)=
+# MongoDB
+
+:::{include} /_include/links.md
+:::
+
+:::{rubric} About
+:::
+
+:::{div}
+[MongoDB] is a document database designed for ease of application development and scaling.
+[MongoDB Atlas] is a multi-cloud database service by the same people who build MongoDB.
+Atlas simplifies deploying and managing your databases while offering the versatility
+you need to build resilient and performant global applications on the cloud providers
+of your choice.
+:::
+
+:::{rubric} Learn
+:::
+
+:::{div}
+Explore support for loading [MongoDB collections and databases] into CrateDB (`full-load`),
+and [MongoDB Change Streams], to relay CDC events from MongoDB into CrateDB (`cdc`).
+:::
+
+:::{list-table}
+:header-rows: 1
+:widths: auto
+
+* - Feature
+  - CrateDB
+  - CrateDB Cloud
+  - Description
+* - [MongoDB Table Loader]
+  - ✅
+  - ✅
+  - CLI `ctk load table` for loading collections into CrateDB (`full-load`).
+    Tutorial: {ref}`import-mongodb`
+* - [MongoDB CDC Relay]
+  - ✅
+  - ✅
+  - CLI `ctk load table` for streaming changes of collections into CrateDB (`cdc`).
+* - {ref}`MongoDB CDC integration `
+  - ❌
+  - ✅
+  - Managed data loading from MongoDB and MongoDB Atlas into CrateDB Cloud
+    (`full-load` and `cdc`), including advanced data translation and compensation
+    strategies.
+:::
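+
+As a minimal sketch of the `full-load` path, the following command assumes a
+locally running MongoDB with a collection `testdrive.demo` and a local CrateDB
+instance; hostnames, credentials, and addresses are placeholders to adjust.
+
+```shell
+# Load a MongoDB collection into a CrateDB table (full-load).
+ctk load table \
+    "mongodb://localhost:27017/testdrive/demo" \
+    --cratedb-sqlalchemy-url="crate://crate@localhost:4200/testdrive/demo"
+```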
+::: + +:::{toctree} +:maxdepth: 1 +:hidden: +learn +::: diff --git a/docs/ingest/etl/mongodb.md b/docs/integrate/mongodb/learn.md similarity index 99% rename from docs/ingest/etl/mongodb.md rename to docs/integrate/mongodb/learn.md index 711527a2..7db84dbf 100644 --- a/docs/ingest/etl/mongodb.md +++ b/docs/integrate/mongodb/learn.md @@ -2,6 +2,7 @@ (migrating-mongodb)= (integrate-mongodb-quickstart)= (import-mongodb)= +(mongodb-learn)= # Import data from MongoDB diff --git a/docs/integrate/mysql/index.md b/docs/integrate/mysql/index.md new file mode 100644 index 00000000..06d65dd1 --- /dev/null +++ b/docs/integrate/mysql/index.md @@ -0,0 +1,42 @@ +(mysql)= +(mariadb)= +# MySQL and MariaDB + +:::{include} /_include/links.md +::: + +:::{rubric} About +::: + +```{div} +:style: "float: right; margin-left: 1em" + +[![mysql-logo](https://www.mysql.com/common/logos/powered-by-mysql-167x86.png){w=180px}](https://www.mysql.com/) +

+[![mariadb-logo](https://mariadb.com/wp-content/themes/mariadb-2025/public/images/logo-dark.4482a1.svg){w=180px}](https://www.mariadb.com/) +``` + +[MySQL] and [MariaDB] are well-known free and open-source relational database +management systems (RDBMS), available as standalone and managed variants. + +MySQL is a component of the LAMP web application software stack (and others), +which is an acronym for Linux, Apache, MySQL, Perl/PHP/Python. + +In 2010, when Oracle acquired Sun, Monty Widenius, MySQL's founder, forked the +open-source MySQL project to create MariaDB. + +```{div} +:style: "clear: both" +``` + +:::{rubric} Learn +::: + +:::{toctree} +:maxdepth: 1 +learn +::: + + +[MariaDB]: https://mariadb.com/ +[MySQL]: https://www.mysql.com/ diff --git a/docs/ingest/etl/mysql.rst b/docs/integrate/mysql/learn.rst similarity index 100% rename from docs/ingest/etl/mysql.rst rename to docs/integrate/mysql/learn.rst diff --git a/docs/ingest/etl/iceberg-risingwave.md b/docs/integrate/risingwave/apache-iceberg.md similarity index 99% rename from docs/ingest/etl/iceberg-risingwave.md rename to docs/integrate/risingwave/apache-iceberg.md index 1ac4dafb..c28ae4a3 100644 --- a/docs/ingest/etl/iceberg-risingwave.md +++ b/docs/integrate/risingwave/apache-iceberg.md @@ -1,4 +1,5 @@ (iceberg-risingwave)= +(risingwave-iceberg)= # Stream processing from Iceberg tables to CrateDB using RisingWave diff --git a/docs/integrate/risingwave/index.md b/docs/integrate/risingwave/index.md index 1f4ce9a0..c382efc9 100644 --- a/docs/integrate/risingwave/index.md +++ b/docs/integrate/risingwave/index.md @@ -1,5 +1,4 @@ (risingwave)= - # RisingWave ```{div} @@ -76,17 +75,21 @@ referenced below. ## Learn -:::{rubric} Tutorials -::: -- An example with data coming from an Apache Iceberg table and aggregations - materialized in real-time in CrateDB, using RisingWave. - See {ref}`iceberg-risingwave`. +Follow the full example tutorial sourcing data from an Apache Iceberg table, +and sinking it into CrateDB. See {ref}`risingwave-iceberg`. :::{note} We are tracking interoperability issues per [Tool: RisingWave] and appreciate any contributions and reports. ::: +:::{toctree} +:maxdepth: 1 +:hidden: + +apache-iceberg +::: + [CREATE SINK]: https://docs.risingwave.com/sql/commands/sql-create-sink [RisingWave]: https://github.com/risingwavelabs/risingwave diff --git a/docs/integrate/streamsets/index.md b/docs/integrate/streamsets/index.md new file mode 100644 index 00000000..2fecc23d --- /dev/null +++ b/docs/integrate/streamsets/index.md @@ -0,0 +1,23 @@ +(streamsets)= +# StreamSets + +:::{rubric} About +::: + +The [StreamSets Data Collector] is a lightweight, powerful engine for building +streaming, batch, and change data capture (CDC) pipelines that ingest and transform +data from various sources. + +Use it to run pipelines from sources such as Kafka, Oracle, Salesforce, JDBC, and Hive +to destinations including Snowflake, Databricks, Amazon S3, and Azure Data Lake Storage (ADLS). +It runs on-premises or in any cloud. + +:::{rubric} Learn +::: + +:::{toctree} +:maxdepth: 1 +learn +::: + +[StreamSets Data Collector]: https://www.softwareag.com/en_corporate/platform/integration-apis/data-collector-engine.html diff --git a/docs/ingest/etl/streamsets.rst b/docs/integrate/streamsets/learn.rst similarity index 99% rename from docs/ingest/etl/streamsets.rst rename to docs/integrate/streamsets/learn.rst index 24216ff1..da79b2de 100644 --- a/docs/ingest/etl/streamsets.rst +++ b/docs/integrate/streamsets/learn.rst @@ -1,4 +1,4 @@ -.. 
_streamsets: +.. _streamsets-learn: ================================================================ Data Stream Pipelines with CrateDB and StreamSets Data Collector