Saturday 26 May 2018

Install Apache Airflow on Ubuntu

What is Airflow:

Airflow is a platform to programmatically author, schedule, and monitor workflows. This post covers the following steps for installing Airflow on an Ubuntu/Linux machine:
  1. Installing system dependencies
  2. Installing Airflow with extra packages
  3. Installing the Airflow metadata database
    1. MySQL
    2. PostgreSQL
  4. Installing RabbitMQ (message broker for CeleryExecutor)
  5. Creating RabbitMQ users
RabbitMQ is only needed as a message broker if you use the CeleryExecutor. With the LocalExecutor there is no need to install a message broker such as RabbitMQ or Redis.
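The executor rule above can be written down as a small shell check. This is only a sketch: the function name needs_message_broker is made up for this post, and it knows about just the three executors listed in it.

```shell
# Hypothetical helper: succeeds (exit 0) if the given executor needs a broker.
needs_message_broker() {
    case "$1" in
        CeleryExecutor)                   return 0 ;;  # workers talk through a broker
        LocalExecutor|SequentialExecutor) return 1 ;;  # tasks run on the same machine
        *) echo "unknown executor: $1" >&2; return 2 ;;
    esac
}

executor=LocalExecutor
if needs_message_broker "$executor"; then
    echo "$executor: install RabbitMQ (steps 4 and 5)"
else
    echo "$executor: steps 4 and 5 can be skipped"
fi
```

With executor=LocalExecutor as written, the script reports that the RabbitMQ steps can be skipped.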

1. Installing Dependency packages:

sudo apt-get update && sudo apt-get upgrade -y

sudo apt-get -yqq install git \
    python-dev \
    libkrb5-dev \
    libsasl2-dev \
    libssl-dev \
    libffi-dev \
    build-essential \
    libblas-dev \
    liblapack-dev \
    libpq-dev \
    python-pip \
    python-requests \
    apt-utils \
    curl \
    netcat \
    locales \
    libmysqlclient-dev

pip install --upgrade pip
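After the packages are installed, it can help to confirm that the tools the later steps rely on are actually on the PATH. The snippet below is a minimal sketch: missing_commands is a helper name invented here, and the list of commands checked (git, curl, pip, python) is an assumption about what you need, not an official requirement list.

```shell
# Hypothetical helper: print every command from the argument list
# that cannot be found on PATH, one name per line.
missing_commands() {
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            printf '%s\n' "$cmd"
        fi
    done
}

missing=$(missing_commands git curl pip python)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line
    echo "Still missing:" $missing
else
    echo "All expected commands are available."
fi
```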

2. Install Apache Airflow

pip install PyYAML==3.12
pip install requests==2.18.4
pip install simplejson==3.12.0

pip install 'apache-airflow[crypto,celery,postgres,hive,hdfs,jdbc,gcp_api,rabbitmq,password,s3,mysql]==1.8.1'
pip install celery==3.1.17 
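The list of extras in the square brackets is long, and you may not need all of them. As a rough sketch, the requirement string can be assembled from a version and whatever extras you want. The function name airflow_requirement is hypothetical, and nothing is installed by this snippet; it only prints the command. The single quotes around the requirement matter because square brackets are glob characters in most shells.

```shell
# Hypothetical helper: build a pip requirement such as
# apache-airflow[celery,mysql]==1.8.1 from a version and a list of extras.
airflow_requirement() {
    version=$1
    shift
    # Join the remaining arguments with commas ("$*" uses the first IFS char).
    extras=$(IFS=,; printf '%s' "$*")
    if [ -n "$extras" ]; then
        printf 'apache-airflow[%s]==%s\n' "$extras" "$version"
    else
        printf 'apache-airflow==%s\n' "$version"
    fi
}

# Dry run: show the command instead of executing it.
echo "pip install '$(airflow_requirement 1.8.1 celery postgres rabbitmq)'"
```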

3. Install Meta Database:

i. Install MySQL

# Install and enable the MySQL server
sudo debconf-set-selections <<< 'mysql-server mysql-server/root_password password airflowd2p'
sudo debconf-set-selections <<< 'mysql-server mysql-server/root_password_again password airflowd2p'

sudo apt-get -y install mysql-server libmysqlclient-dev

ii. Install PostgreSQL

# Install PostgreSQL, enable it at boot, and start the server
sudo apt-get -y install postgresql
sudo update-rc.d postgresql enable

sudo service postgresql start
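Once a database server is running, Airflow has to be pointed at it through the sql_alchemy_conn setting, which can also be supplied as the AIRFLOW__CORE__SQL_ALCHEMY_CONN environment variable. The sketch below builds that URI. Everything specific in it is an assumption for illustration: the helper name metadata_db_uri, the airflow_db_user account, its password, and the database name airflow are not created anywhere in this post, and the ports are just the MySQL and PostgreSQL defaults.

```shell
# Hypothetical helper: compose a SQLAlchemy URI for the metadata database.
# Arguments: backend (mysql|postgres) user password host database
metadata_db_uri() {
    backend=$1 user=$2 password=$3 host=$4 database=$5
    case "$backend" in
        mysql)
            printf 'mysql://%s:%s@%s:3306/%s\n' "$user" "$password" "$host" "$database" ;;
        postgres)
            printf 'postgresql+psycopg2://%s:%s@%s:5432/%s\n' "$user" "$password" "$host" "$database" ;;
        *)
            echo "unsupported backend: $backend" >&2
            return 1 ;;
    esac
}

# Placeholder credentials; replace them with an account you actually created.
AIRFLOW__CORE__SQL_ALCHEMY_CONN=$(metadata_db_uri postgres airflow_db_user change_me localhost airflow)
export AIRFLOW__CORE__SQL_ALCHEMY_CONN
echo "$AIRFLOW__CORE__SQL_ALCHEMY_CONN"
```

Note that a password containing characters such as @ or / would have to be URL-encoded first; this sketch does not do that.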

4. Install RabbitMQ

sudo apt-get update && sudo apt-get upgrade -y
# Install Erlang, a dependency of RabbitMQ
# (erlang-solutions_1.0_all.deb must already be present in the current directory)
sudo dpkg -i erlang-solutions_1.0_all.deb
sudo apt-get update
# Install the RabbitMQ server
echo "deb xenial main" | sudo tee /etc/apt/sources.list.d/bintray.rabbitmq.list
sudo apt-get update

sudo apt-get -yqq install erlang rabbitmq-server

5. Create RabbitMQ users:

#!/usr/bin/env bash
# Create the Airflow user, virtual host, and tags
rabbitmq-plugins enable rabbitmq_management
rabbitmqctl add_user airflow_user airflow_user
rabbitmqctl add_vhost airflow
# set_user_tags replaces any existing tags, so set both in one call
rabbitmqctl set_user_tags airflow_user airflow_tag administrator

rabbitmqctl set_permissions -p airflow airflow_user ".*" ".*" ".*"
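With the user and virtual host in place, the CeleryExecutor needs a broker URL that refers to them. The sketch below assembles an AMQP URL from the values created above (user airflow_user, password airflow_user, vhost airflow). The helper name rabbitmq_broker_url is invented for this post, and the host localhost and port 5672 (the default AMQP port) are assumptions about where RabbitMQ is running.

```shell
# Hypothetical helper: build an AMQP broker URL.
# Arguments: user password host port vhost
rabbitmq_broker_url() {
    printf 'amqp://%s:%s@%s:%s/%s\n' "$1" "$2" "$3" "$4" "$5"
}

# Airflow reads the [celery] broker_url setting from this environment variable.
AIRFLOW__CELERY__BROKER_URL=$(rabbitmq_broker_url airflow_user airflow_user localhost 5672 airflow)
export AIRFLOW__CELERY__BROKER_URL
echo "$AIRFLOW__CELERY__BROKER_URL"
```

The same value can instead be written as broker_url in the [celery] section of airflow.cfg.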