Commit adb21b9

Merged code from master
2 parents db1e042 + 11b39eb commit adb21b9

23 files changed, with 1382 additions and 863 deletions

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -1,2 +1,5 @@
 target/
 *.swp
+.gitignore
+*.DS_Store
+.idea/*

Makefile

Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+# Copyright 2019 Clairvoyant, LLC.
+
+VERSION-CSD = $(shell bash ./version)
+VERSION-PARCEL = $(shell bash ./version-parcel)
+PACKAGE_NAME = AIRFLOW-$(VERSION-CSD)
+SHA_CMD := $(shell { command -v sha1sum || command -v sha1 || command -v shasum; } 2>/dev/null)
+
+.PHONY: help dist validate clean
+help:
+	@echo 'Please use "make <target>" where <target> is one of:'
+	@echo ' dist : Create a CSD jarfile'
+	@echo ' validate : Run unit tests'
+	@echo ' clean : Clean up all generated files'
+
+dist: clean validate
+	@mkdir -p target/$(PACKAGE_NAME)
+	@echo "*** Building CSD jarfile ..."
+	cp -pr src/{aux,descriptor,images,scripts} target/$(PACKAGE_NAME)
+	sed -e 's|{{ version }}|$(VERSION-CSD)|' -e 's|{{ parcel_version }}|$(VERSION-PARCEL)|' \
+	  src/descriptor/service.sdl >target/$(PACKAGE_NAME)/descriptor/service.sdl
+
+	jar -cvf target/$(PACKAGE_NAME).jar -C target/$(PACKAGE_NAME) .
+	$(SHA_CMD) target/$(PACKAGE_NAME).jar | awk '{ print $$1 }' > target/$(PACKAGE_NAME).jar.sha
+	@echo "*** complete"
+
+validate: src/descriptor/service.sdl
+	@echo "*** Validating service config ..."
+	@java -jar ../../cloudera/cm_ext/validator/target/validator.jar -s src/descriptor/service.sdl
+
+validate-mdl: src/descriptor/service.mdl
+	@echo "*** Validating monitor config ..."
+	@java -jar ../../cloudera/cm_ext/validator/target/validator.jar -z src/descriptor/service.mdl
+
+clean:
+	rm -rf target
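
The `dist` recipe in the Makefile above does two things that are easy to miss: it fills the `{{ version }}` and `{{ parcel_version }}` placeholders in `service.sdl` with `sed`, and it writes only the hex digest of the jar to a `.sha` file. The following is a minimal shell sketch of those two steps, not the project's build script. The function names `render_sdl` and `sha1_digest` are hypothetical, and the sketch assumes `sha1sum` or `shasum` is installed (the Makefile's `sha1` fallback is left out for brevity).

```shell
#!/usr/bin/env bash
# Sketch only: render_sdl and sha1_digest are hypothetical helper names,
# not part of the repository.

# Replace the two placeholders used in service.sdl; reads stdin, writes stdout.
render_sdl() {
  local csd_version="$1" parcel_version="$2"
  sed -e "s|{{ version }}|${csd_version}|" \
      -e "s|{{ parcel_version }}|${parcel_version}|"
}

# Print only the hex SHA-1 digest of a file (first field of the tool output).
# Assumes sha1sum or shasum is on PATH; returns non-zero if neither is found.
sha1_digest() {
  local file="$1" tool
  tool=$(command -v sha1sum || command -v shasum) || return 1
  "$tool" "$file" | awk '{ print $1 }'
}
```

For instance, piping a made-up descriptor line such as `"version": "{{ version }}"` through `render_sdl 1.10.0 1.10.0` yields `"version": "1.10.0"`; the version values here are illustrative only.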

README.md

Lines changed: 63 additions & 39 deletions
@@ -5,19 +5,30 @@ This repository allows you to install [Apache Airflow](https://airflow.apache.or
 ## Requirements
 - A supported operating system.
 - MySQL or PostgreSQL database in which to store Airflow metadata.
+<<<<<<< HEAD
 - RabbitMQ
 - [Airflow Parcel](https://github.com/teamclairvoyant/apache-airflow-parcels)
 
 ### Currently Supported Versions of Airflow
 - Airflow 1.9.0
 - Airflow 1.10.0
+=======
+
+### Currently Supported Versions of Airflow
+- Airflow 1.10
+>>>>>>> 11b39ebe4b384d6ad1c1735474123e5af2c8676a
 
 ### Currently Supported Operating Systems
-- CentOS 6 & 7
-- RHEL 6 & 7
+- CentOS/RHEL 6 & 7
+- Debian 8
+- Ubuntu 14.04, 16.04, & 18.04
 
 ## Installing the CSD
+<<<<<<< HEAD
 1. Download the Jar file. [Airflow 1.9.0 CSD](https://s3-us-west-2.amazonaws.com/archive.clairvoyantsoft.com/airflow/csd/AIRFLOW-1.9.0.jar)
+=======
+1. Download the Jar file. [Airflow CSD](http://archive.clairvoyantsoft.com/airflow/csd/)
+>>>>>>> 11b39ebe4b384d6ad1c1735474123e5af2c8676a
 2. Copy the jar file to the `/opt/cloudera/csd` location on the Cloudera Manager server.
 3. Restart the Cloudera Manager Server service. `service cloudera-scm-server restart`
 
@@ -58,25 +69,40 @@ create_postgresql_dbs-airflow.sh --host <host_name> --user <username> --password
 ```
 
 ## Roles
+<<<<<<< HEAD
 There are seven roles defined in the CSD.
 1. Airflow Webserver
 2. Airflow Scheduler
 3. Airflow Worker
 4. Airflow Flower
 5. Kerberos
 6. Gateway
+=======
+There are six roles available for deployment:
+>>>>>>> 11b39ebe4b384d6ad1c1735474123e5af2c8676a
+
+1. Webserver
+2. Scheduler
+3. Worker
+4. Flower Webserver
+5. Kerberos
+6. Gateway
 
-Airflow Webserver: Airflow Webserver role is used to start the Airflow Web UI. Webserver role can be deployed on more than instances. However, they will be the same and can be used for backup purposes.
-
-Airflow Scheduler: Airflow Scheduler role is used to schedule the Airflow jobs. This is limited to one instance to reduce the risk of duplicate jobs.
+Webserver: Airflow Webserver role runs the Airflow Web UI. Webserver role can be deployed on more than instances. However, they will be the same and can be used for backup purposes.
 
-Airflow Worker: Airflow Worker role picks jobs from RabbitMQ and executed them on the nodes. Multiple instances can be deployed.
+Scheduler: Airflow Scheduler role is used to schedule the Airflow jobs. This is limited to one instance to reduce the risk of duplicate jobs.
 
+<<<<<<< HEAD
 Airflow Flower: Airflow Flower is used to monitor celery clusters. Multiple instances are supported
+=======
+Worker: Airflow Worker role picks jobs from the Scheduler and executes them. Multiple instances can be deployed.
 
-Kerberos: Kerberos is used to enable Kerberos protocol for the Airflow. It internally executes `airflow kerberos`. An external Kerberos Distribution Center must be setup. Multiple instances can be setup for load balancing purposes.
+Flower Webserver: Flower Webserver role is used to monitor Celery clusters. Celery allows for the expansion of Worker Only one instance is needed.
+>>>>>>> 11b39ebe4b384d6ad1c1735474123e5af2c8676a
 
-Gateway: The purpose of the gateway role is to write the configurations from the configurations tab into the airflow.cfg file. This is done through the update_cfg.sh file which is executed from the scriptRunner within the gateway role.
+Kerberos: Airflow Kerberos role is used to enable Kerberos protocol for the other Airflow roles and for DAGs. This role should exist on each host with an Airflow Worker role.
+
+Gateway: The purpose of the gateway role is to make the configuration available to CLI clients.
 
 ## Using the Airflow binary:
 Here are some of the examples of Airflow commands:
@@ -101,46 +127,35 @@ The DAG file has to be copied to `dags_folder` directory within all the nodes. I
 In order to enable authentication for the Airflow Web UI check the "Enable Airflow Authentication" option. You can create Airflow users using one of two options below.
 
 ### Creating Airflow Users using UI:
-1. Navigate to Airflow CSD. In the configurations page, enter the Airflow Username, Airflow Email, Airflow Password you want to create.
-2. Deploy the client configurations to create the Airflow user.
+One way to add Airflow users to the database is using the `airflow-mkuser` script. Users can be added as follows:
+
+1. Navigate to Airflow WebUI.
+2. In the Admin dropdown choose Users.
+3. Choose Create and enter the username, email, and password you want to create.
 
 Note: Although the last created user shows up in the Airflow configurations, you can still use the previously created users.
 
-### Using mkuser.sh
-Another way to add Airflow users is using the `mkuser.sh` script. Users can be added as follows:
-1. Navigate to the current working directory of the CSD under `/var/run/cloudera-scm-agent/process`
-2. Export PYTHONPATH and AIRFLOW_HOME environment variables. By default these are:
+### Using airflow-mkuser
+Another way to add Airflow users to the database is using the `airflow-mkuser` script. Users can be added as follows:
 
-PYTHONPATH:
-```bash
-export PYTHONPATH=/opt/cloudera/parcels/AIRFLOW/usr/lib/python2.7/site-packages:$PYTHONPATH
-```
-Airflow Home:
-```bash
-export AIRFLOW_HOME=/var/lib/airflow
-```
-3. Within the scripts directory, you can find the `mkuser.py` file. Execute `mkuser.py` to add a user to Airflow:
-```bash
-/opt/cloudera/parcels/AIRFLOW/bin/python2.7 mkuser.py <Username> <UserEmail> <Password>
-```
-For example, this can be like
-```bash
-/opt/cloudera/parcels/AIRFLOW/usr/bin/python2.7 mkuser.py airflowUser airflow@email.com airflowUserPassword
-```
+```bash
+airflow-mkuser <username> <email> <password>
+```
+For example, this can be like:
+```bash
+airflow-mkuser admin admin@localdomain password123
+```
 
 ## Building the CSD
 ```bash
 git clone https://github.com/teamclairvoyant/apache-airflow-cloudera-csd
 cd apache-airflow-cloudera-csd
-mvn clean package
-```
-or
-```bash
-java -jar target/validator.jar -s src/descriptor/service.sdl
-jar -cvf AIRFLOW-1.0.0.jar -C src/ .
+make dist
 ```
+Update the `version` file before running `make dist` if creating a new release.
 
 ## Limitations:
+<<<<<<< HEAD
 1. The IP address of the RabbitMQ instance has to be manually entered during installation configuration.
 2. After deploying configurations, there is no alert or warning that the specific roles needs to be restarted.
 3. Only 'airflow.contrib.auth.backends.password_auth' mechanism is supported for Airflow user authentication.
@@ -149,6 +164,14 @@ jar -cvf AIRFLOW-1.0.0.jar -C src/ .
 1. Build RabbitMQ parcel.
 2. Test Database connection.
 3. Add the support for more Airflow user authentication methods.
+=======
+1. After deploying configurations, there is no alert or warning that the specific roles needs to be restarted.
+2. Only 'airflow.contrib.auth.backends.password_auth' mechanism is supported for Airflow user authentication.
+
+## Future work:
+1. Test Database connection.
+2. Add the support for more Airflow user authentication methods.
+>>>>>>> 11b39ebe4b384d6ad1c1735474123e5af2c8676a
 
 ## Known Errors:
 
@@ -157,7 +180,8 @@ jar -cvf AIRFLOW-1.0.0.jar -C src/ .
 Upon many deployments, you may face an error called 'Markup file already exists' while trying to stop a role and the process never stops. In that case, stop the process using the "Abort" command and navigate to `/var/run/cloudera-scm-agent/process` and delete all the `GracefulRoleStopRunner` directories.
 
 ## Resources:
-1. https://github.com/cloudera/cm_ext/wiki/The-Structure-of-a-CSD
-2. https://github.com/cloudera/cm_ext/wiki/Service-Descriptor-Language-Reference
-3. https://github.com/cloudera/cm_csds
+1. https://github.com/teamclairvoyant/apache-airflow-parcels
+2. https://github.com/cloudera/cm_ext/wiki/The-Structure-of-a-CSD
+3. https://github.com/cloudera/cm_ext/wiki/Service-Descriptor-Language-Reference
+4. https://github.com/cloudera/cm_csds
 
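
The "Installing the CSD" steps in the README diff above come down to copying the jar into the CSD directory and restarting Cloudera Manager. The following is a minimal shell sketch of the copy step, assuming the jar has already been downloaded. `install_csd` is a hypothetical helper name, and its optional second argument exists only so the copy can be tried against a scratch directory instead of `/opt/cloudera/csd`.

```shell
#!/usr/bin/env bash
# Sketch only: install_csd is a hypothetical helper, not shipped with the CSD.

# Copy a downloaded CSD jar into the CSD directory (default /opt/cloudera/csd).
install_csd() {
  local jar="$1"
  local csd_dir="${2:-/opt/cloudera/csd}"
  if [ ! -f "$jar" ]; then
    echo "jar not found: $jar" >&2
    return 1
  fi
  if [ ! -d "$csd_dir" ]; then
    echo "CSD directory not found: $csd_dir" >&2
    return 1
  fi
  # -p preserves mode and timestamps of the downloaded jar.
  cp -p "$jar" "$csd_dir/"
}
```

On the Cloudera Manager server this would likely be followed by the restart from step 3, for example `install_csd AIRFLOW-1.9.0.jar && service cloudera-scm-server restart`, run with sufficient privileges.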

assembly.xml

Lines changed: 0 additions & 13 deletions
This file was deleted.

pom.xml

Lines changed: 0 additions & 35 deletions
This file was deleted.

src/aux/airflow-env.sh

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+#!/bin/bash
+##
+# Generated by Cloudera Manager and should not be modified directly
+##
