Showing posts with label airflow. Show all posts

Saturday 26 May 2018

Install Apache Airflow on Ubuntu

What is Airflow:

Airflow is a platform to programmatically author, schedule, and monitor workflows. This post covers the following steps for installing Airflow on an Ubuntu/Linux machine:
  1. Installing system dependencies
  2. Installing Airflow with extra packages
  3. Installing the Airflow metadata database
    1. MySQL
    2. PostgreSQL
  4. Installing RabbitMQ (message broker for the CeleryExecutor)
RabbitMQ serves as the message broker when you use the CeleryExecutor. With the LocalExecutor, there is no need to install a message broker such as RabbitMQ or Redis.

1. Installing Dependency packages:


sudo apt-get update && sudo apt-get upgrade -y

sudo apt-get -yqq install git \
    python-dev \
    libkrb5-dev \
    libsasl2-dev \
    libssl-dev \
    libffi-dev \
    build-essential \
    libblas-dev \
    liblapack-dev \
    libpq-dev \
    python-pip \
    python-requests \
    apt-utils \
    curl \
    netcat \
    locales \
    libmysqlclient-dev \
    supervisor
pip install --upgrade pip 

2. Install Apache Airflow


pip install PyYAML==3.12
pip install requests==2.18.4
pip install simplejson==3.12.0

pip install apache-airflow[crypto,celery,postgres,hive,hdfs,jdbc,gcp_api,rabbitmq,password,s3,mysql]==1.8.1
pip install celery==3.1.17 

3. Install the metadata database:


i. Install MySQL


# Preset the MySQL root password so the install runs non-interactively
sudo debconf-set-selections <<< 'mysql-server mysql-server/root_password password airflowd2p'
sudo debconf-set-selections <<< 'mysql-server mysql-server/root_password_again password airflowd2p'

sudo apt-get -y install mysql-server libmysqlclient-dev

ii. Install PostgreSQL


# Install PostgreSQL, enable it at boot, and start the server
sudo apt-get -y install postgresql \
    postgresql-contrib
sudo update-rc.d postgresql enable

sudo service postgresql start
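
Once a metadata database is running, Airflow is pointed at it through the sql_alchemy_conn setting in airflow.cfg. The following Python 3 sketch shows how such a connection string can be assembled. The database name "airflow", the PostgreSQL user and password, the hosts, and the helper name are assumptions made for illustration; only the MySQL root password comes from the commands above.

```python
from urllib.parse import quote_plus


def sql_alchemy_conn(dialect, user, password, host, port, database):
    """Build a SQLAlchemy-style URL for the sql_alchemy_conn setting."""
    # Percent-encode the credentials so characters like '@' or '/' are safe.
    return "{}://{}:{}@{}:{}/{}".format(
        dialect, quote_plus(user), quote_plus(password), host, port, database
    )


# Hypothetical values: the database name "airflow", the hosts, and the
# PostgreSQL credentials are placeholders, not values from this post.
mysql_url = sql_alchemy_conn(
    "mysql", "root", "airflowd2p", "localhost", 3306, "airflow"
)
postgres_url = sql_alchemy_conn(
    "postgresql+psycopg2", "airflow", "p@ss/word", "localhost", 5432, "airflow"
)

print(mysql_url)
print(postgres_url)
```

The resulting string would go on the sql_alchemy_conn line of airflow.cfg, after the target database itself has been created.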

4. Install RabbitMQ

sudo apt-get update && sudo apt-get upgrade -y
# Install Erlang, a dependency of RabbitMQ
# (erlang-solutions_1.0_all.deb must already be downloaded to the current directory)
sudo dpkg -i erlang-solutions_1.0_all.deb
sudo apt-get update
# Add the RabbitMQ apt repository
echo "deb https://dl.bintray.com/rabbitmq/debian xenial main" | sudo tee /etc/apt/sources.list.d/bintray.rabbitmq.list
sudo apt-get update

sudo apt-get -yqq install erlang rabbitmq-server

5. Create RabbitMQ users:


#!/usr/bin/env bash
# Create the airflow user, its tags, and the virtual host
rabbitmq-plugins enable rabbitmq_management
rabbitmqctl add_user airflow_user airflow_user
rabbitmqctl add_vhost airflow
# set_user_tags replaces all existing tags, so set both in one call
rabbitmqctl set_user_tags airflow_user airflow_tag administrator

rabbitmqctl set_permissions -p airflow airflow_user ".*" ".*" ".*"
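
With the user and virtual host in place, the CeleryExecutor needs a broker URL that references them. Here is a minimal Python 3 sketch of how that URL is put together from the values created above. The host "localhost" and the helper name are assumptions; 5672 is the default AMQP port.

```python
from urllib.parse import quote


def celery_broker_url(user, password, vhost, host="localhost", port=5672):
    """Build an AMQP broker URL for the RabbitMQ user and vhost."""
    # Encode each part so special characters cannot break the URL.
    return "amqp://{}:{}@{}:{}/{}".format(
        quote(user, safe=""),
        quote(password, safe=""),
        host,
        port,
        quote(vhost, safe=""),
    )


# User, password, and vhost match the rabbitmqctl commands above;
# the host is a placeholder for wherever RabbitMQ is running.
broker_url = celery_broker_url("airflow_user", "airflow_user", "airflow")
print(broker_url)
```

This value would go in the broker_url setting of the [celery] section in airflow.cfg.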


Sunday 18 September 2016

How does an SMTP server work, and how do you configure one for Apache Airflow?

What is SMTP?


    SMTP stands for Simple Mail Transfer Protocol. But first, what is a protocol? Almost all of your online activity happens with the help of protocols. A protocol is a set of rules and guidelines that lets your computer communicate with other machines across networks.

    There are many protocols, each used for a different purpose, such as sending email, transferring files, shopping online, or reading the news. SMTP works closely with the MTA (Mail Transfer Agent) running on your computer, so an email moves from your computer's MTA to another computer's MTA.

How does it work?


    In simple terms, an SMTP server is a machine like any other server, but dedicated to one specific purpose. The email client on our machine connects to the SMTP server on a specific port (usually port 25).

When we send an email through an SMTP server, our email client first makes a request to the SMTP server configured on our machine.

The SMTP server receives the request and verifies the authentication credentials. Once verification succeeds, it proceeds with the next steps.

The request contains the message content, the sender address, the recipient email addresses, and other information.

The server then forwards the request to the recipient's mail server, which repeats the same process of authentication and verification.

The recipient's server checks the sender and recipient addresses, the message content, DNS records, signatures, and more.

Once everything checks out, the recipient's mail server delivers the message to the recipient.
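
The client side of these steps can be sketched with Python's standard smtplib module. This is a minimal Python 3 illustration rather than how any particular mail client is implemented. The addresses, the host "smtp.example.com", and the function names are placeholders, and the sending call is left commented out because it needs a real server and credentials.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender, recipients, subject, body):
    """Assemble the request: content, sender address, recipient addresses."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_via_smtp(msg, host, port, user, password):
    """Connect to the SMTP server, authenticate, and hand over the message."""
    with smtplib.SMTP(host, port, timeout=30) as server:
        server.starttls()             # upgrade the connection to TLS
        server.login(user, password)  # the server verifies the credentials
        server.send_message(msg)      # the server relays to the recipient's server


# Placeholder addresses, for illustration only.
message = build_message(
    "sender@example.com", ["recipient@example.org"], "Testing", "Hello from SMTP"
)
print(message["To"], "|", message["Subject"])

# Needs a reachable server and valid credentials, so it is not run here:
# send_via_smtp(message, "smtp.example.com", 587, "sender@example.com", "<password>")
```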


SMTP Providers:

 
 There are several SMTP providers available around the world. A few common ones are listed below.


PROVIDER                      URL             SMTP SERVER
Gmail                         gmail.com       smtp.gmail.com
Yahoo                         yahoo.com       smtp.mail.yahoo.com
Outlook.com (former Hotmail)  outlook.com     smtp.live.com
Bluewin                       bluewin.ch      smtpauths.bluewin.ch
BT Connect                    btconnect.com   mail.btconnect.com
Comcast                       comcast.net     smtp.comcast.net
Earthlink                     earthlink.net   smtpauth.earthlink.net
AOL                           aol.com         smtp.aol.com
Orange                        orange.net      smtp.orange.net
Tin                           tin.it          mail.tin.it
Tiscali                       tiscali.co.uk   smtp.tiscali.co.uk
Verizon                       verizon.net     outgoing.verizon.net
Virgin                        virgin.net      smtp.virgin.net
Wanadoo                       wanadoo.fr      smtp.wanadoo.fr
AT&T                          att.net         outbound.att.net


Configure an SMTP server in Apache Airflow:


 
    Airflow is a Python-based platform for scheduling and monitoring workflows. To learn more about Airflow, go through the Airflow documentation.

Airflow can send email notifications when specific events occur, such as a job failure, a retry, or an SLA miss. It also provides an email operator, so you can send email wherever your workflow requires it. Before either works, an SMTP server must be configured so that Airflow can send mail.

Add the SMTP settings to the airflow.cfg file, which is usually located in the Airflow home folder (/root/airflow/). The example configuration below uses the Gmail SMTP server.

Open airflow.cfg in any editor and add the configuration to the [smtp] section.

[smtp]
# If you want airflow to send emails on retries, failure, and you want to use
# the airflow.utils.email.send_email_smtp function, you have to configure an smtp
# server here
smtp_host = smtp.gmail.com
smtp_starttls = True
smtp_ssl = False
smtp_user = <your gmail address>
smtp_port = 587
smtp_password = <password>
smtp_mail_from = <from mail address>
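
To double-check the section before restarting Airflow, the settings can be read back with Python's standard configparser module. The sketch below is Python 3 and parses a sample string instead of the real file. The sample values, the function name, and the check that STARTTLS and SSL are not both enabled are assumptions made for illustration, not part of Airflow itself.

```python
import configparser

# Sample [smtp] section with placeholder credentials.
SAMPLE_CFG = """
[smtp]
smtp_host = smtp.gmail.com
smtp_starttls = True
smtp_ssl = False
smtp_user = someone@gmail.com
smtp_port = 587
smtp_password = secret
smtp_mail_from = someone@gmail.com
"""


def load_smtp_settings(cfg_text):
    """Parse the [smtp] section and return its values with proper types."""
    # interpolation=None keeps '%' characters in passwords from being misread.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    section = parser["smtp"]
    settings = {
        "host": section.get("smtp_host"),
        "port": section.getint("smtp_port"),
        "starttls": section.getboolean("smtp_starttls"),
        "ssl": section.getboolean("smtp_ssl"),
        "user": section.get("smtp_user"),
        "mail_from": section.get("smtp_mail_from"),
    }
    # STARTTLS upgrades a plain connection, so combining it with SSL
    # generally does not make sense.
    if settings["starttls"] and settings["ssl"]:
        raise ValueError("enable either smtp_starttls or smtp_ssl, not both")
    return settings


settings = load_smtp_settings(SAMPLE_CFG)
print(settings)
```

For the real file, parser.read() with the path to airflow.cfg could replace read_string().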

By default SMTP uses port 25, but Google's SMTP server with STARTTLS requires port 587. Once the configuration is done, try the EmailOperator class in Airflow to check that it works.

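from airflow.operators.email_operator import EmailOperator

send_mail = EmailOperator(
    dag=dag,  # the DAG object this task belongs to (not its ID string)
    task_id="simple_email_operator",
    to=["my_mail@gmail.com", "second_mail@yahoo.com"],
    subject="Testing",
    html_content="<h3>Welcome to Airflow</h3>")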
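
The failure and retry notifications mentioned earlier are usually switched on through a DAG's default_args. Below is a minimal Python 3 sketch of such a dictionary. The owner, retry count, and retry delay are illustrative assumptions, while email, email_on_failure, and email_on_retry are standard operator arguments.

```python
from datetime import timedelta

# Arguments applied to every task in the DAG that receives this dict.
default_args = {
    "owner": "airflow",                        # hypothetical owner name
    "email": ["my_mail@gmail.com", "second_mail@yahoo.com"],
    "email_on_failure": True,                  # mail when a task fails
    "email_on_retry": True,                    # mail when a task is retried
    "retries": 1,                              # illustrative retry count
    "retry_delay": timedelta(minutes=5),       # illustrative delay
}

print(sorted(default_args))

# In a DAG file this would be passed as DAG(..., default_args=default_args).
```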