This repository allows you to install [Apache Airflow](https://airflow.apache.org/) as a CSD (Custom Service Descriptor) for Cloudera Manager.

## Requirements
- A supported operating system.
- A MySQL or PostgreSQL database in which to store Airflow metadata.

### Currently Supported Versions of Airflow
- Airflow 1.10

### Currently Supported Operating Systems
- CentOS/RHEL 6 & 7
- Debian 8
- Ubuntu 14.04, 16.04, & 18.04

## Installing the CSD
1. Download the jar file from the [Airflow CSD archive](http://archive.clairvoyantsoft.com/airflow/csd/).
2. Copy the jar file to `/opt/cloudera/csd` on the Cloudera Manager server.
3. Restart the Cloudera Manager Server service: `service cloudera-scm-server restart`
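The three steps above can be sketched as a shell session. The exact jar filename under the archive directory is an assumption here, so browse the directory listing first and substitute the real name:

```shell
# Sketch of the CSD install steps. AIRFLOW-1.10.jar is a hypothetical
# filename -- check the archive directory listing for the real one.
CSD_URL=http://archive.clairvoyantsoft.com/airflow/csd
CSD_JAR=AIRFLOW-1.10.jar

wget "${CSD_URL}/${CSD_JAR}"                 # 1. download the jar
sudo cp "${CSD_JAR}" /opt/cloudera/csd/      # 2. copy to the CSD directory
sudo service cloudera-scm-server restart     # 3. restart Cloudera Manager
```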

## Airflow Roles

The Airflow service consists of the following roles:

1. Webserver
2. Scheduler
3. Worker
4. Flower Webserver
5. Kerberos
6. Gateway

Webserver: The Airflow Webserver role runs the Airflow Web UI. It can be deployed on more than one instance; all instances serve the same content, so additional instances can be used for redundancy.

Scheduler: The Airflow Scheduler role schedules the Airflow jobs. It is limited to one instance to reduce the risk of duplicate jobs.

Worker: The Airflow Worker role picks up jobs from the Scheduler and executes them. Multiple instances can be deployed.

Flower Webserver: The Flower Webserver role is used to monitor the Celery clusters; Celery is what allows the Worker role to be scaled out. Only one instance is needed.

Kerberos: The Airflow Kerberos role enables the Kerberos protocol for the other Airflow roles and for DAGs. This role should exist on each host that has an Airflow Worker role.

Gateway: The purpose of the Gateway role is to make the Airflow configuration available to CLI clients.

## Using the Airflow Binary
Here are some examples of Airflow commands:
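For instance, a few common Airflow 1.10 CLI commands; `example_dag` and `example_task` are placeholders for your own DAG and task IDs:

```shell
# Common Airflow 1.10 CLI commands (run from a host with the Gateway role).
# example_dag and example_task are placeholder IDs, not shipped examples.
airflow version                                  # print the installed version
airflow list_dags                                # list DAGs found in dags_folder
airflow list_tasks example_dag                   # list the tasks in a DAG
airflow trigger_dag example_dag                  # manually trigger a DAG run
airflow test example_dag example_task 2019-01-01 # run one task instance locally
```

Note that these are the pre-2.0 subcommand names, matching the Airflow 1.10 version this CSD installs.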
The DAG file has to be copied to the `dags_folder` directory within all the nodes.

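Distributing a DAG file could be sketched as below; the node names and the `dags_folder` path are assumptions for illustration, so substitute your own hosts and the path configured in `airflow.cfg`:

```shell
# Push a DAG file into dags_folder on every node running an Airflow role.
# The node names and DAGS_FOLDER are illustrative, not defaults from the CSD.
DAG_FILE=my_dag.py
DAGS_FOLDER=/var/lib/airflow/dags
for node in node1.example.com node2.example.com node3.example.com; do
  scp "${DAG_FILE}" "${node}:${DAGS_FOLDER}/"
done
```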
In order to enable authentication for the Airflow Web UI, check the "Enable Airflow Authentication" option. You can create Airflow users using one of the two options below.
### Creating Airflow Users Using the UI
One way to add Airflow users to the database is through the Airflow Web UI. Users can be added as follows:
1. Navigate to the Airflow Web UI.
2. In the Admin dropdown, choose Users.
3. Choose Create and enter the username, email, and password for the new user.

Note: Although the last created user shows up in the Airflow configurations, you can still use the previously created users.
### Using airflow-mkuser
Another way to add Airflow users to the database is using the `airflow-mkuser` script. Users can be added as follows: