This repository was archived by the owner on May 22, 2025. It is now read-only.

Commit bd3f51c

Update "Getting Started" tutorial and README (#653)
Resolves #652
1 parent 6a587b2 commit bd3f51c

File tree

2 files changed: +266 additions, -268 deletions

README.md

Lines changed: 116 additions & 106 deletions
@@ -15,110 +15,116 @@
[![CI](https://github.com/astronomer/astro-sdk/actions/workflows/ci.yaml/badge.svg)](https://github.com/astronomer/astro-sdk)
[![codecov](https://codecov.io/gh/astronomer/astro-sdk/branch/main/graph/badge.svg?token=MI4SSE50Q6)](https://codecov.io/gh/astronomer/astro-sdk)

-**Astro SDK Python** allows rapid and clean development of {Extract, Load, Transform} workflows using Python.
-It helps DAG authors to achieve more with less code.
+**Astro Python SDK** allows for rapid and clean development of extract, transform, and load (ETL) workflows using Python.
+
+The SDK abstracts the boilerplate code required for communication between datasets and tasks, which helps DAG authors achieve more with less code.
+
It is powered by [Apache Airflow](https://airflow.apache.org) and maintained by [Astronomer](https://astronomer.io).

> :warning: **Disclaimer** This project is in a **preview** release state. In other words, it is not production-ready yet.
The interfaces may change. We welcome users to try out the interfaces and provide us with feedback.

-## Install
-
-**Astro SDK Python** is available at [PyPI](https://pypi.org/project/astro-sdk-python/). Use the standard Python
-[installation tools](https://packaging.python.org/en/latest/tutorials/installing-packages/).
+## Prerequisites

-To install a cloud-agnostic version of **Astro SDK Python**, run:
+- Apache Airflow >= 2.1.0

-```
-pip install astro-sdk-python
-```
-
-If using cloud providers, install using the optional dependencies of interest:
-
-```commandline
-pip install astro-sdk-python[amazon,google,snowflake,postgres]
-```
-
-
-## Quick-start
-
-After installing Astro, copy the following example dag `calculate_popular_movies.py` to a local directory named `dags`:
-
-```Python
-from datetime import datetime
-from airflow import DAG
-from astro import sql as aql
-from astro.files import File
-from astro.sql.table import Table
-
-@aql.transform()
-def top_five_animations(input_table: Table):
-    return """
-        SELECT Title, Rating
-        FROM {{input_table}}
-        WHERE Genre1=='Animation'
-        ORDER BY Rating desc
-        LIMIT 5;
-    """
-
-with DAG(
-    "calculate_popular_movies",
-    schedule_interval=None,
-    start_date=datetime(2000, 1, 1),
-    catchup=False,
-) as dag:
-    imdb_movies = aql.load_file(
-        File("https://raw.githubusercontent.com/astronomer/astro-sdk/main/tests/data/imdb.csv"),
-        output_table=Table(
-            name="imdb_movies", conn_id="sqlite_default"
-        ),
-    )
-    top_five_animations(
-        input_table=imdb_movies,
-        output_table=Table(
-            name="top_animation"
-        ),
-    )
-```
-
-Set up a local instance of Airflow by running:
+## Install

-```shell
-export AIRFLOW_HOME=`pwd`
-export AIRFLOW__CORE__ENABLE_XCOM_PICKLING=True
-airflow db init
-```
+The Astro Python SDK is available at [PyPI](https://pypi.org/project/astro-sdk-python/). Use the standard Python
+[installation tools](https://packaging.python.org/en/latest/tutorials/installing-packages/).

-Create an SQLite database for the example to run with and run the DAG:
+To install a cloud-agnostic version of the SDK, run:

```shell
-# The sqlite_default connection has different host for MAC vs. Linux
-export SQL_TABLE_NAME=`airflow connections get sqlite_default -o yaml | grep host | awk '{print $2}'`
-sqlite3 "$SQL_TABLE_NAME" "VACUUM;"
-airflow dags test calculate_popular_movies `date -Iseconds`
+pip install astro-sdk-python
```

-Check the top five animations calculated by your first Astro DAG by running:
+You can also install dependencies for using the SDK with popular cloud providers:

```shell
-sqlite3 "$SQL_TABLE_NAME" "select * from top_animation;" ".exit"
-```
-
-You should see the following output:
-
-```console
-$ sqlite3 "$SQL_TABLE_NAME" "select * from top_animation;" ".exit"
-Toy Story 3 (2010)|8.3
-Inside Out (2015)|8.2
-How to Train Your Dragon (2010)|8.1
-Zootopia (2016)|8.1
-How to Train Your Dragon 2 (2014)|7.9
+pip install astro-sdk-python[amazon,google,snowflake,postgres]
```
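After installing, a quick way to confirm the package resolved is to query its metadata from Python. This is an illustrative, stdlib-only sketch (not part of the official docs):

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None when it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print(installed_version("astro-sdk-python") or "astro-sdk-python is not installed")
```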


-## Requirements
-
-**Astro SDK Python** depends on Apache Airflow >= 2.1.0.
+## Quickstart
+
+1. Copy the following DAG into a file named `calculate_popular_movies.py` and add it to the `dags` directory of your Airflow project:
+
+    ```Python
+    from datetime import datetime
+    from airflow import DAG
+    from astro import sql as aql
+    from astro.files import File
+    from astro.sql.table import Table
+
+    @aql.transform()
+    def top_five_animations(input_table: Table):
+        return """
+            SELECT Title, Rating
+            FROM {{input_table}}
+            WHERE Genre1=='Animation'
+            ORDER BY Rating desc
+            LIMIT 5;
+        """
+
+    with DAG(
+        "calculate_popular_movies",
+        schedule_interval=None,
+        start_date=datetime(2000, 1, 1),
+        catchup=False,
+    ) as dag:
+        imdb_movies = aql.load_file(
+            File("https://raw.githubusercontent.com/astronomer/astro-sdk/main/tests/data/imdb.csv"),
+            output_table=Table(
+                name="imdb_movies", conn_id="sqlite_default"
+            ),
+        )
+        top_five_animations(
+            input_table=imdb_movies,
+            output_table=Table(
+                name="top_animation"
+            ),
+        )
+    ```
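Under the hood, `@aql.transform` treats the returned string as a templated query and resolves `{{input_table}}` to the concrete table at run time. A rough, stdlib-only illustration of that substitution idea (the real SDK uses Jinja templating and handles quoting and parameter binding; `render_query` is a hypothetical helper):

```python
import re


def render_query(template: str, tables: dict) -> str:
    """Substitute {{name}} placeholders with concrete table names.

    Illustrative only: the real SDK resolves tables via Jinja templating.
    """
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: tables[m.group(1)], template)


query = render_query(
    "SELECT Title, Rating FROM {{input_table}} WHERE Genre1=='Animation'",
    {"input_table": "imdb_movies"},
)
print(query)  # SELECT Title, Rating FROM imdb_movies WHERE Genre1=='Animation'
```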
+
+2. Ensure that your Airflow environment is set up correctly by running the following commands:
+
+    ```shell
+    export AIRFLOW_HOME=`pwd`
+    export AIRFLOW__CORE__ENABLE_XCOM_PICKLING=True
+    airflow db init
+    ```
+
+3. Create a SQLite database for the example to run with:
+
+    ```shell
+    # The sqlite_default connection has a different host on macOS vs. Linux
+    export SQL_TABLE_NAME=`airflow connections get sqlite_default -o yaml | grep host | awk '{print $2}'`
+    sqlite3 "$SQL_TABLE_NAME" "VACUUM;"
+    ```
+
+4. Run the example DAG:
+
+    ```shell
+    airflow dags test calculate_popular_movies `date -Iseconds`
+    ```
+
+5. Check the result of your DAG by running:
+
+    ```shell
+    sqlite3 "$SQL_TABLE_NAME" "select * from top_animation;" ".exit"
+    ```
+
+    You should see the following output:
+
+    ```console
+    $ sqlite3 "$SQL_TABLE_NAME" "select * from top_animation;" ".exit"
+    Toy Story 3 (2010)|8.3
+    Inside Out (2015)|8.2
+    How to Train Your Dragon (2010)|8.1
+    Zootopia (2016)|8.1
+    How to Train Your Dragon 2 (2014)|7.9
+    ```
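For reference, the quickstart's top-five query can be exercised directly with Python's built-in `sqlite3` module against a small in-memory sample. The rows below are made up for illustration; the tutorial's real data comes from the IMDB CSV:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE imdb_movies (Title TEXT, Rating REAL, Genre1 TEXT)")
conn.executemany(
    "INSERT INTO imdb_movies VALUES (?, ?, ?)",
    [
        ("Toy Story 3 (2010)", 8.3, "Animation"),
        ("Inside Out (2015)", 8.2, "Animation"),
        ("The Dark Knight (2008)", 9.0, "Action"),
        ("Zootopia (2016)", 8.1, "Animation"),
    ],
)
# Same shape as the DAG's transform: filter by genre, sort, take five.
rows = conn.execute(
    "SELECT Title, Rating FROM imdb_movies "
    "WHERE Genre1 == 'Animation' ORDER BY Rating DESC LIMIT 5"
).fetchall()
for title, rating in rows:
    print(f"{title}|{rating}")
```

Note that SQLite accepts `==` as a synonym for `=`, which is why the tutorial's `Genre1=='Animation'` predicate works against the `sqlite_default` connection.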

## Supported technologies

@@ -144,41 +150,45 @@ How to Train Your Dragon 2 (2014)|7.9

## Available operations

-A summary of the currently available operations in **Astro SDK Python**.
-* `load_file`: load a given file into a SQL table
-* `transform`: applies a SQL select statement to a source table and saves the result to a destination table
-* `truncate`: remove all records from a SQL table
-* `run_raw_sql`: run any SQL statement without handling its output
-* `append`: insert rows from the source SQL table into the destination SQL table, if there are no conflicts
-* `merge`: insert rows from the source SQL table into the destination SQL table, depending on conflicts:
-  * ignore: do not add rows that already exist
-  * update: replace existing rows with new ones
-* `export_file`: export SQL table rows into a destination file
-* `dataframe`: export given SQL table into in-memory Pandas data-frame
+The following are some key functions available in the SDK:
+
+- `load_file`: load a given file into a SQL table
+- `transform`: apply a SQL SELECT statement to a source table and save the result to a destination table
+- `drop_table`: drop a SQL table
+- `run_raw_sql`: run any SQL statement without handling its output
+- `append`: insert rows from the source SQL table into the destination SQL table, if there are no conflicts
+- `merge`: insert rows from the source SQL table into the destination SQL table, depending on conflicts:
+  - `ignore`: do not add rows that already exist
+  - `update`: replace existing rows with new ones
+- `export_file`: export SQL table rows into a destination file
+- `dataframe`: export a given SQL table into an in-memory pandas DataFrame
+
+For a full list of available operators, see the [SDK reference documentation](https://astro-sdk.readthedocs.io/en/latest/astro/sql/operators/append.html).
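The `merge` conflict strategies above can be pictured with a small dictionary-based sketch. This is purely illustrative of the semantics; the SDK performs the merge in SQL on the target database, and `merge_rows` is a hypothetical helper:

```python
def merge_rows(target: dict, source: dict, strategy: str) -> dict:
    """Merge source rows (keyed by primary key) into target.

    'ignore' keeps existing target rows on conflict; 'update' replaces them.
    Illustrative of merge semantics only, not the SDK implementation.
    """
    merged = dict(target)
    for key, row in source.items():
        if key not in merged or strategy == "update":
            merged[key] = row
    return merged


target = {1: {"title": "Zootopia", "rating": 8.0}}
source = {1: {"title": "Zootopia", "rating": 8.1}, 2: {"title": "Inside Out", "rating": 8.2}}
print(merge_rows(target, source, "ignore"))  # keeps rating 8.0 for key 1
print(merge_rows(target, source, "update"))  # replaces key 1 with rating 8.1
```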

## Documentation

The documentation is a work in progress--we aim to follow the [Diátaxis](https://diataxis.fr/) system:
-* **[Getting Started](docs/getting-started/GETTING_STARTED.md)**: a hands-on introduction to **Astro SDK Python**
-* **How-to guides**: simple step-by-step user guides to accomplish specific tasks
-* **[Reference guide](https://astro-sdk.readthedocs.io/)**: commands, modules, classes and methods
-* **Explanation**: Clarification and discussion of key decisions when designing the project.
+
+- **[Getting Started](docs/getting-started/GETTING_STARTED.md)**: A hands-on introduction to the Astro Python SDK
+- **How-to guides**: Simple step-by-step user guides to accomplish specific tasks
+- **[Reference guide](https://astro-sdk.readthedocs.io/)**: Commands, modules, classes, and methods
+- **Explanation**: Clarification and discussion of key decisions when designing the project

## Changelog

-We follow Semantic Versioning for releases. Check the [changelog](docs/CHANGELOG.md) for the latest changes.
+The Astro Python SDK follows semantic versioning for releases. Check the [changelog](docs/CHANGELOG.md) for the latest changes.

-## Release Managements
+## Release management

-To learn more about our release philosophy and steps, check [here](docs/development/RELEASE.md)
+To learn more about our release philosophy and steps, see [Managing Releases](docs/development/RELEASE.md).

-## Contribution Guidelines
+## Contribution guidelines

All contributions, bug reports, bug fixes, documentation improvements, enhancements, and ideas are welcome.

Read the [Contribution Guideline](docs/development/CONTRIBUTING.md) for a detailed overview on how to contribute.

-As contributors and maintainers to this project, you should abide by the [Contributor Code of Conduct](docs/development/CODE_OF_CONDUCT.md).
+Contributors and maintainers should abide by the [Contributor Code of Conduct](docs/development/CODE_OF_CONDUCT.md).

## License

0 commit comments
