Commit 11b39eb

Merge pull request #5 from teamclairvoyant/simplify
Simplify
2 parents 9b8dc1e + 5dc81f0 commit 11b39eb

26 files changed, +1353 -1243 lines changed

Makefile

Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+# Copyright 2019 Clairvoyant, LLC.
+
+VERSION-CSD = $(shell bash ./version)
+VERSION-PARCEL = $(shell bash ./version-parcel)
+PACKAGE_NAME = AIRFLOW-$(VERSION-CSD)
+SHA_CMD := $(shell { command -v sha1sum || command -v sha1 || command -v shasum; } 2>/dev/null)
+
+.PHONY: help dist validate clean
+help:
+	@echo 'Please use "make <target>" where <target> is one of:'
+	@echo ' dist : Create a CSD jarfile'
+	@echo ' validate : Run unit tests'
+	@echo ' clean : Clean up all generated files'
+
+dist: clean validate
+	@mkdir -p target/$(PACKAGE_NAME)
+	@echo "*** Building CSD jarfile ..."
+	cp -pr src/{aux,descriptor,images,scripts} target/$(PACKAGE_NAME)
+	sed -e 's|{{ version }}|$(VERSION-CSD)|' -e 's|{{ parcel_version }}|$(VERSION-PARCEL)|' \
+	  src/descriptor/service.sdl >target/$(PACKAGE_NAME)/descriptor/service.sdl
+
+	jar -cvf target/$(PACKAGE_NAME).jar -C target/$(PACKAGE_NAME) .
+	$(SHA_CMD) target/$(PACKAGE_NAME).jar | awk '{ print $$1 }' > target/$(PACKAGE_NAME).jar.sha
+	@echo "*** complete"
+
+validate: src/descriptor/service.sdl
+	@echo "*** Validating service config ..."
+	@java -jar ../../cloudera/cm_ext/validator/target/validator.jar -s src/descriptor/service.sdl
+
+validate-mdl: src/descriptor/service.mdl
+	@echo "*** Validating monitor config ..."
+	@java -jar ../../cloudera/cm_ext/validator/target/validator.jar -z src/descriptor/service.mdl
+
+clean:
+	rm -rf target
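
The `dist` recipe above fills the `{{ version }}` and `{{ parcel_version }}` placeholders in `service.sdl` and writes a bare SHA-1 digest next to the jar. A minimal shell sketch of those two steps follows. The function names `render_sdl` and `sha1_of` are hypothetical and not part of this commit, and `sha1` is left out of the tool lookup on the assumption that its output layout may differ from that of `sha1sum`, which would change what the `awk` field extraction returns.

```shell
#!/usr/bin/env bash
# Sketch only: render_sdl and sha1_of are illustrative names, not part of the commit.

# Fill the two placeholders the Makefile substitutes in service.sdl.
# Reads the template on stdin; $1 is the CSD version, $2 the parcel version.
# Like the Makefile's sed call, only the first match on each line is replaced.
render_sdl() {
  sed -e "s|{{ version }}|$1|" -e "s|{{ parcel_version }}|$2|"
}

# Print only the hex SHA-1 digest of file $1, using whichever tool is installed.
sha1_of() {
  local tool
  tool=$(command -v sha1sum || command -v shasum) || return 1
  "$tool" "$1" | awk '{ print $1 }'
}
```

A call such as `render_sdl "$(bash ./version)" "$(bash ./version-parcel)" < src/descriptor/service.sdl` would mirror the Makefile's substitution. Version strings containing `|` or `&` would need escaping before being passed to `sed`.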

README.md

Lines changed: 42 additions & 59 deletions
@@ -5,18 +5,17 @@ This repository allows you to install [Apache Airflow](https://airflow.apache.or
 ## Requirements
 - A supported operating system.
 - MySQL or PostgreSQL database in which to store Airflow metadata.
-- [Airflow](https://github.com/teamclairvoyant/apache-airflow-parcels) and [RabbitMQ](https://github.com/teamclairvoyant/rabbitmq-cloudera-parcel) parcels need to be installed.
 
 ### Currently Supported Versions of Airflow
-- Airflow 1.7.1.3
-- Airflow 1.8.0
+- Airflow 1.10
 
 ### Currently Supported Operating Systems
-- CentOS 6 & 7
-- RHEL 6 & 7
+- CentOS/RHEL 6 & 7
+- Debian 8
+- Ubuntu 14.04, 16.04, & 18.04
 
 ## Installing the CSD
-1. Download the Jar file. [Airflow CSD](https://teamclairvoyant.s3.amazonaws.com/apache-airflow/cloudera/csd/AIRFLOW-1.8.0.jar)
+1. Download the Jar file. [Airflow CSD](http://archive.clairvoyantsoft.com/airflow/csd/)
 2. Copy the jar file to the `/opt/cloudera/csd` location on the Cloudera Manager server.
 3. Restart the Cloudera Manager Server service. `service cloudera-scm-server restart`
 
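
The installation steps in this hunk (copy the jar into the CSD directory, then restart Cloudera Manager) can be sketched as a small shell helper. This is an illustration under assumptions: `install_csd` is a hypothetical name that does not ship with the CSD, the jar is assumed to be downloaded already, and the restart command is only printed as a reminder rather than executed.

```shell
#!/usr/bin/env bash
# Sketch only: install_csd is a hypothetical helper, not shipped with the CSD.

# Copy a downloaded CSD jar ($1) into the CSD directory ($2, defaulting to the
# /opt/cloudera/csd location named in the README) and print the restart command.
install_csd() {
  local jar=$1 csd_dir=${2:-/opt/cloudera/csd}
  [ -f "$jar" ] || { echo "jar not found: $jar" >&2; return 1; }
  cp -p "$jar" "$csd_dir/" || return 1
  echo "Now run: service cloudera-scm-server restart"
}
```

Printing the restart command instead of running it keeps the helper safe to try on a host where Cloudera Manager is not installed.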

@@ -57,28 +56,26 @@ create_postgresql_dbs-airflow.sh --host <host_name> --user <username> --password
 ```
 
 ## Roles
-There are seven roles defined in the CSD.
-1. Airflow Webserver
-2. Airflow Scheduler
-3. Airflow Worker
-4. RabbitMQ
-5. Airflow Flower
-6. Kerberos
-7. Gateway
+There are six roles available for deployment:
 
-Airflow Webserver: Airflow Webserver role is used to start the Airflow Web UI. Webserver role can be deployed on more than instances. However, they will be the same and can be used for backup purposes.
+1. Webserver
+2. Scheduler
+3. Worker
+4. Flower Webserver
+5. Kerberos
+6. Gateway
 
-Airflow Scheduler: Airflow Scheduler role is used to schedule the Airflow jobs. This is limited to one instance to reduce the risk of duplicate jobs.
+Webserver: Airflow Webserver role runs the Airflow Web UI. Webserver role can be deployed on more than one instance. However, they will be the same and can be used for backup purposes.
 
-Airflow Worker: Airflow Worker role picks jobs from RabbitMQ and executed them on the nodes. Multiple instances can be deployed.
+Scheduler: Airflow Scheduler role is used to schedule the Airflow jobs. This is limited to one instance to reduce the risk of duplicate jobs.
 
-RabbitMQ: RabbitMQ role facilitates the use of RabbitMQ as the messaging broker. Currently the number of roles is limited to 1.
+Worker: Airflow Worker role picks jobs from the Scheduler and executes them. Multiple instances can be deployed.
 
-Airflow Flower: Airflow Flower is used to monitor celery clusters. Multiple instances are supported
+Flower Webserver: Flower Webserver role is used to monitor Celery clusters. Celery allows for the expansion of Workers. Only one instance is needed.
 
-Kerberos: Kerberos is used to enable Kerberos protocol for the Airflow. It internally executes `airflow kerberos`. An external Kerberos Distribution Center must be setup. Multiple instances can be setup for load balancing purposes.
+Kerberos: Airflow Kerberos role is used to enable Kerberos protocol for the other Airflow roles and for DAGs. This role should exist on each host with an Airflow Worker role.
 
-Gateway: The purpose of the gateway role is to write the configurations from the configurations tab into the airflow.cfg file. This is done through the update_cfg.sh file which is executed from the scriptRunner within the gateway role.
+Gateway: The purpose of the gateway role is to make the configuration available to CLI clients.
 
 ## Using the Airflow binary:
 Here are some of the examples of Airflow commands:
@@ -103,55 +100,40 @@ The DAG file has to be copied to `dags_folder` directory within all the nodes. I
 In order to enable authentication for the Airflow Web UI check the "Enable Airflow Authentication" option. You can create Airflow users using one of two options below.
 
 ### Creating Airflow Users using UI:
-1. Navigate to Airflow CSD. In the configurations page, enter the Airflow Username, Airflow Email, Airflow Password you want to create.
-2. Deploy the client configurations to create the Airflow user.
+One way to add Airflow users to the database is through the Airflow Web UI. Users can be added as follows:
+
+1. Navigate to the Airflow Web UI.
+2. In the Admin dropdown choose Users.
+3. Choose Create and enter the username, email, and password you want to create.
 
 Note: Although the last created user shows up in the Airflow configurations, you can still use the previously created users.
 
-### Using mkuser.sh
-Another way to add Airflow users is using the `mkuser.sh` script. Users can be added as follows:
-1. Navigate to the current working directory of the CSD under `/var/run/cloudera-scm-agent/process`
-2. Export PYTHONPATH and AIRFLOW_HOME environment variables. By default these are:
+### Using airflow-mkuser
+Another way to add Airflow users to the database is using the `airflow-mkuser` script. Users can be added as follows:
 
-PYTHONPATH:
-```bash
-export PYTHONPATH=/opt/cloudera/parcels/AIRFLOW/usr/lib/python2.7/site-packages:$PYTHONPATH
-```
-Airflow Home:
-```bash
-export AIRFLOW_HOME=/var/lib/airflow
-```
-3. Within the scripts directory, you can find the `mkuser.py` file. Execute `mkuser.py` to add a user to Airflow:
-```bash
-/opt/cloudera/parcels/AIRFLOW/bin/python2.7 mkuser.py <Username> <UserEmail> <Password>
-```
-For example, this can be like
-```bash
-/opt/cloudera/parcels/AIRFLOW/usr/bin/python2.7 mkuser.py airflowUser airflow@email.com airflowUserPassword
-```
+```bash
+airflow-mkuser <username> <email> <password>
+```
+For example, this can be like:
+```bash
+airflow-mkuser admin admin@localdomain password123
+```
 
 ## Building the CSD
 ```bash
 git clone https://github.com/teamclairvoyant/apache-airflow-cloudera-csd
 cd apache-airflow-cloudera-csd
-mvn clean package
-```
-or
-```bash
-java -jar target/validator.jar -s src/descriptor/service.sdl
-jar -cvf AIRFLOW-1.0.0.jar -C src/ .
+make dist
 ```
+Update the `version` file before running `make dist` if creating a new release.
 
 ## Limitations:
-1. Number of RabbitMQ instances is limited to 1.
-2. The IP address of the RabbitMQ instance has to be manually entered during installation configuration.
-3. After deploying configurations, there is no alert or warning that the specific roles needs to be restarted.
-4. Only 'airflow.contrib.auth.backends.password_auth' mechanism is supported for Airflow user authentication.
+1. After deploying configurations, there is no alert or warning that the specific roles need to be restarted.
+2. Only 'airflow.contrib.auth.backends.password_auth' mechanism is supported for Airflow user authentication.
 
 ## Future work:
-1. RabbitMQ needs to installed in Cluster Mode.
-2. Test Database connection.
-3. Add the support for more Airflow user authentication methods.
+1. Test Database connection.
+2. Add the support for more Airflow user authentication methods.
 
 ## Known Errors:
 

@@ -164,7 +146,8 @@ Upon many deployments, you may face an error called 'Markup file already exists'
 Occasionally, we experienced some delay in DAG execution. We are working to fix this.
 
 ## Resources:
-1. https://github.com/cloudera/cm_ext/wiki/The-Structure-of-a-CSD
-2. https://github.com/cloudera/cm_ext/wiki/Service-Descriptor-Language-Reference
-3. https://github.com/cloudera/cm_csds
+1. https://github.com/teamclairvoyant/apache-airflow-parcels
+2. https://github.com/cloudera/cm_ext/wiki/The-Structure-of-a-CSD
+3. https://github.com/cloudera/cm_ext/wiki/Service-Descriptor-Language-Reference
+4. https://github.com/cloudera/cm_csds
 

assembly.xml

Lines changed: 0 additions & 13 deletions
This file was deleted.

pom.xml

Lines changed: 0 additions & 35 deletions
This file was deleted.

src/aux/airflow-env.sh

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+#!/bin/bash
+##
+# Generated by Cloudera Manager and should not be modified directly
+##
