Saturday 26 May 2018

Install apache storm and Zookeeper

Requirements:


  • install java
  • install zookeeper
  • install storm

Installing Java:

    sudo apt-get install software-properties-common python-software-properties
    sudo add-apt-repository ppa:webupd8team/java
    sudo apt-get update
    sudo apt-get install oracle-java8-installer

    

 Check Java version after successful installation :


java -version

Setting Java path


Add following two lines into .bashrc file

Refer https://askubuntu.com/questions/175514/how-to-set-java-home-for-java

export JAVA_HOME =/usr/lib/jvm/java-8-oracle
export PATH=$PATH:$JAVA_HOME/bin
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_151.jdk/Contents/Home
export PATH=$PATH:$JAVA_HOME/bin


Installing zookeeper:

mv zookeeper-3.4.10/ zookeeper
cd zookeeper
mkdir data
cp conf/zoo_sample.cfg conf/zoo.cfg 
bin/zkServer.sh start


Installing Storm:

mv apache-storm-1.1.1/ apache-storm
cd apache-storm
mkdir data
cd apache-storm/bin/
chmod +x storm 

configuring storm.yaml:

vim apache-storm/conf/storm.yaml
storm.zookeeper.servers:     
    - "localhost"
nimbus.seeds: ["localhost"]

Adding Storm path:


Add following line into .bashrc file

export PATH=$PATH:/home/storm_user/storm-pixel/apache-storm/bin


check storm commands after successful installation:

  • storm version
  • storm nimbus 
  • storm supervisor
  • storm ui

Configuring streamparse:



Dependencies:

  • lein
  • storm

Install lein:

# refer http://leiningen.org/ cd /usr/bin wget https://raw.githubusercontent.com/technomancy/leiningen/stable/bin/lein chmod +x lein run lein command. it will install packages to local lein version 


numpy basics python

NumPy is the fundamental package for scientific computing with Python. It contains among other things:

  • a powerful N-dimensional array object
  • sophisticated (broadcasting) functions
  • tools for integrating C/C++ and Fortran code
  • useful linear algebra, Fourier transform, and random number capabilities


Importing Numpy:

import numpy as np

Creating numpy Array:

 d = np.array([1,2,3,4,5])

Numpy range:

  d = np.arange(1,10). # It will create numpy array from range 1 to 9
   

numpy shape:


It will return total elements count based on rows or shape
d = np.array([1,2,3])
print d   # array([1, 2, 3])
print d.shape # (3,)

numpy reshape:



It will change the shape of numpy arrays

d = np.arange(1,10)    # array([1,2,3,4,5,6,7,8,9])
d.shape     # (9,)
d.reshape(3,3)
print d    #  Array([[1, 2, 3],
                                [4, 5, 6],
                                [7, 8, 9]])
Above example, it will reshape like 3X3 matrix structure


np.zeros()


    It will create zero value matrix numpy array. We have to give dimension value in the function and it will create matrix arrays.
np.zeros(3, 3)       # Array([[0., 0., 0.],
                                            [0., 0., 0.],
                                            [0., 0., 0.]]) 


np.vstack()


It will vertically stack each elements in  numpy array.

c = np.array([1,2,3])  # array([1, 2, 3])
np.vstack(c)    # array([ [ 1],
                                       [ 2],
                                       [ 3]])
   

np.eye()


It will create numpy  identical matrix array.

 np.eye(3) # it will create 3X3 matrix
                     Array (  [ 1,    0,   0]
                                   [ 0,     1,   1]
                                   [ 0,     0,   1])
        

     
np.dot()


 It will dot product of two matrix (multiplication)       
 np.dot(M1, M2)

np.sum()


 It will sum of all the elements in given array.
#M = Array([[1, 2, 3],
            [4, 5, 6],
            [7, 8, 9]])
np.sum(M) # 45 it will sum all the elements in array
np.sum(M, axis=0)   # [[12, 15, 18]] 
    
If axis= 0, it will sum column wise, it 
If axis = 1, it will sum row wise


np.random.rand()


It will produce random np arrays


np.append()


Append elements to nd array
 A = array([1, 2, 3])
 B = np.append(A, 4) # [1, 2, 3, 4]
 B = np.append(A, [4, 5,6,7]) # [1, 2, 3, 4, 5, 6, 7]
        
    






            
    

     

Install apache airflow on ubuntu

What is Airflow:

Airflow is a platform to programmatically author, schedule and monitor workflows. This blog contains following procedures to install airflow in ubuntu/linux machine. 
  1. Installing system dependencies 
  2. Installing airflow with extra packages
  3. Installing airflow meta database
    1. Mysql
    2. Postgres
  4. Installing Rabbitmq (Message broker for CeleryExecutor)
We can use RabbitMQ as a message broker if you are using Celery executor. For LocalExecutor no need to install any message brokers like Rabbitmq/Redis 

1. Installing Dependency packages:


apt-get update && apt-get upgrade -y

sudo apt-get -yqq install git \
    python-dev \
    libkrb5-dev \
    libsasl2-dev \
    libssl-dev \
    libffi-dev \
    build-essential \
    libblas-dev \
    liblapack-dev \
    libpq-dev \
    python-pip \
    python-requests \
    apt-utils \
    curl \
    netcat \
    locales \
    libmysqlclient-dev \
    supervisor
pip install --upgrade pip 

2. Install Apache airflow


pip install PyYAML==3.12
pip install requests==2.18.4
pip install simplejson==3.12.0

pip install apache-airflow[crypto,celery,postgres,hive,hdfs,jdbc,gcp_api,rabbitmq,password,s3,mysql]==1.8.1
pip install celery==3.1.17 

3. Install Meta Database:


i. Install  Mysql


#Installing and enable mysql server
sudo debconf-set-selections <<< 'mysql-server mysql-server/root_password password airflowd2p'
sudo debconf-set-selections <<< 'mysql-server mysql-server/root_password_again password airflowd2p'

sudo apt-get -y install mysql-server    libmysqlclient-dev 

ii . Install Postgressql


# Installing and enable postgresql in systemd and starting server
sudo apt-get -y install postgresql \
    postgresql-contrib
update-rc.d postgresql enable

service postgresql start

4. Install rabbitbq

apt-get update && apt-get upgrade -y
#Install erlang - dependency package for rabbitmq
sudo dpkg -i erlang-solutions_1.0_all.deb
sudo apt-get update
#Install rabbitmq server
echo "deb https://dl.bintray.com/rabbitmq/debian xenial main" | sudo tee /etc/apt/sources.list.d/bintray.rabbitmq.list
sudo apt-get update

sudo apt-get -yqq install erlang    rabbitmq-server

5. Create Rabbitmq users:


#!/usr/bin/env bash
#Creating airflow user, tag, virtual host
rabbitmq-plugins enable rabbitmq_management
rabbitmqctl add_user airflow_user airflow_user
rabbitmqctl add_vhost airflow
rabbitmqctl set_user_tags airflow_user airflow_tag
rabbitmqctl set_user_tags airflow_user administrator

rabbitmqctl set_permissions -p airflow airflow_user ".*" ".*" ".*"


ansible basics for beginners

What is Ansible


Ansible interacting with machines via SSH. So nothing need to be installed in client machines. Only prerequisite is ansible need to be installed in controller machine with python and ssh enabled.

Inventory:


Inventory file:


Inventory file is an simple text file which contains List of machines going to interact with it. We can mention single machines or group of machines going to use it. We can pass direct commands to modules in cmd line using ansible cli.

Cmd: ansible group-name -i <inventory-filename> -m <module-name> <module-params>

ansible group-name -i <inventory-filename> -m <module-name> <module-params>
 
Inventory:
server1.mycomp.com
server2.mycomp.com
 
[clients] #group name
server3.mycomp.com
server4.mycomp.com  


Ex: 
ansible clients -i inventory -m ping
ansible clients -i inventory -m apt -a "name=mysql-server state=present"

    Inventory file can also be an executable file. For example if you don’t know the number of instances running in AWS means we can simple write a script to return running instances name from AWS.

Ansible play books:


    Ansible playbook is an simple YAML file which contains list of tasks that need to be performed in client machines which we mentioned in inventory file.

playbook.yaml
---

- hosts: all
  tasks:
    - name: updating package list
      apt: update_cache=yes cache_valid_time=3600
- hosts: clients
  tasks:
    - name: installing mysql server
      apt: name=mysql-server state=present

In above code snippet, we used apt module for updating and installing packages. Host all specifies perform the task to all the host machines which we mentioned in inventory file. 

And also we can perform task to specific group of hosts. “hosts: client” specifies perform  below mentioned tasks only to client group which we created in inventory file. “-name” of each tasks contains some human readable message which will print while performing the tasks. This will be very helpful while monitoring the execution

    Running playbook:

  ansible-playbook -i inventory playbook.yaml

Vaiables in playbook:


Ansible using jinja2 templating system for dealing with varibles.

playbook.yaml
---
- hosts: all
  tasks:
    - name: updating package list
      apt: update_cache=yes cache_valid_time=3600
- hosts: clients
  vars:
    init_script: "create_db.sql"
  tasks:
    - name: installing mysql server
      apt: name=mysql-server state=present
    - name: coping init sql files

      copy: src=/tmp/{{init_script}} dest=/tmp/mysql/{{init_script}}


Variable loops in playbook:


playbook.yaml
---
- hosts: all
  tasks:
    - name: updating package list
      apt: update_cache=yes cache_valid_time=3600
- hosts: clients
  vars:
    init_script: “create_db.sql"
  tasks:
    - name: installing mysql server
      apt: name={{item}} state=present
      with_items:
        - python 
        - python-pip 
        - vim
    - name: coping init sql files

      copy: src=/tmp/{{init_script}} dest=/tmp/mysql/{{init_script}}

Other way - we can combine the variables based on hosts vise

playbook.yaml


---

- hosts: all
  tasks:
    - name: updating package list
      apt: update_cache=yes cache_valid_time=3600
- hosts: clients
  vars:
    packages:
      - python 
      - python-pip 
      - vim
  tasks:
    - name: installing mysql server
      apt: name={{item}} state=present
      with_items: {{packages}}
        - name: coping init sql files
          copy: src=/tmp/{{init_script}} dest=/tmp/mysql/{{init_script}}
     

Directory Group variables:


In default ansible will look directory called “group_vars” and “host_vars” in same location which playbook located. If you define any variables under the group_vars directory it will automatically applied to that specific group.

My folder structure:
    - inventory
    - playbook.yml
    - group_vars
            - all 
            - clients
    - host_vars
            - server.com

In above folder structure, variable defined in the file called “all” under the group_vars directory which will be available for all hosts defined in inventory hosts. If you want to define variables for specific host create file with same hostname under the “host_vars” directory.

Inventory directory:


    Normally inventory file will be simple test file but it can also be an directory. 

     ansible-playbook -i <inventory-dirctory> playbook.yml

  • ansible-playbook -i uat deploy.yml
  • ansible-playbook -i dev deploy.yml
  • ansible-playbook -i prod deploy.yml

Directory structure of inventory folder:
        
        dev
              - hosts
              - group_vars
              - host_vars
        uat
              - hosts
              - group_vars
              - host_vars
        Prod
              - hosts
              - group_vars
              - host_vars
        deploy.yml

Is there any text files available in your inventory directory, ansible will treat it as inventory file.

Roles in ansible:


You can use single playbook file for managing entire tasks of your infrastructure. But once in a stage your playbook file will be more bigger and hard to manage. For this ansible has the “role” feature, so you can split your playbook yaml file into more moduler way.

You can create a directory called “roles” and create playbook modules.

Roles directory structure:

        dev
              - hosts
              - group_vars
              - host_vars
        roles
              - common
                    - defaults
                        - main.yml   # variable values
                    - tasks
                        - main.yml   # list of tasks need to be execute
                    - files
                        - server.py   # file need to be copy
                    - templates
                        - config.py.j2  # template file used for template module
                    - meta
                        - main.yml  # list the dependency task before perform
              - webserver
                      - defaults
                        - main.yml   # variable values
                    - tasks
                        - main.yml   # list of tasks
              - db
                    - tasks
                        - main.yml   # list of tasks
        deploy.yml

Deploy.yaml

- hosts: database-server
  roles:
    - common
    - db
- hosts: web-server
  roles:
    - common
    - webserver



Here we can break down the roles folder into more modules. It has documented in ansible documentation site. 
  • Defaults folder contains the variable need to be register
  • Task folder contains task need to be perform for that group
  • Files folder contains the files need to be transferred
  • Templates folder is for template module
  • Meta folder contains the dependency list for That specific group
    
        Ex:
                main.yml
                --- 
                Dependencies:
                    - common
                    - db