Technical blog: May 2018

Saturday, 26 May 2018

Install apache storm and Zookeeper

Requirements:

install java
install zookeeper
install storm

Installing Java:



    sudo apt-get install software-properties-common python-software-properties

    sudo add-apt-repository ppa:webupd8team/java

    sudo apt-get update

    sudo apt-get install oracle-java8-installer

Check Java version after successful installation :


java -version

Setting Java path

Add the following two lines to your .bashrc file (there must be no space around the = sign)

Refer https://askubuntu.com/questions/175514/how-to-set-java-home-for-java

export JAVA_HOME=/usr/lib/jvm/java-8-oracle

export PATH=$PATH:$JAVA_HOME/bin

On macOS, point JAVA_HOME at the JDK home directory instead, for example:

export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_151.jdk/Contents/Home

export PATH=$PATH:$JAVA_HOME/bin

Installing zookeeper:



wget http://www-eu.apache.org/dist/zookeeper/stable/zookeeper-3.4.10.tar.gz

tar -zxvf zookeeper-3.4.10.tar.gz

mv zookeeper-3.4.10/ zookeeper

cd zookeeper

mkdir data

cp conf/zoo_sample.cfg conf/zoo.cfg 

bin/zkServer.sh start
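
Once the server has been started, one simple way to confirm it is accepting connections is to try opening a TCP connection to its client port. The sketch below is only an illustration: it assumes ZooKeeper runs on localhost with clientPort=2181 (the value in zoo_sample.cfg, assumed unchanged), and port_is_open is a hypothetical helper, not part of ZooKeeper.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 2181 is the clientPort from conf/zoo_sample.cfg (assumed unchanged).
    print("ZooKeeper reachable:", port_is_open("localhost", 2181))
```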

Installing Storm:



wget http://www-us.apache.org/dist/storm/apache-storm-1.1.1/apache-storm-1.1.1.tar.gz

tar -zxvf apache-storm-1.1.1.tar.gz

mv apache-storm-1.1.1/ apache-storm

cd apache-storm

mkdir data

cd bin/

chmod +x storm

configuring storm.yaml:



vim apache-storm/conf/storm.yaml

storm.zookeeper.servers:     

    - "localhost"

nimbus.seeds: ["localhost"]

Adding Storm path:

Add the following line to the .bashrc file, adjusting the path to wherever you extracted Storm

export PATH=$PATH:/home/storm_user/storm-pixel/apache-storm/bin

check storm commands after successful installation:

storm version
storm nimbus
storm supervisor
storm ui

Configuring streamparse:

http://streamparse.readthedocs.io/en/stable/quickstart.html

Dependencies:

lein
storm

Install lein:


# refer http://leiningen.org/
cd /usr/bin
sudo wget https://raw.githubusercontent.com/technomancy/leiningen/stable/bin/lein
sudo chmod +x lein
# Run the lein command once; on first run it installs its packages locally
lein version

numpy basics python

NumPy is the fundamental package for scientific computing with Python. It contains among other things:

a powerful N-dimensional array object

sophisticated (broadcasting) functions

tools for integrating C/C++ and Fortran code

useful linear algebra, Fourier transform, and random number capabilities

Importing Numpy:


import numpy as np

Creating numpy Array:


 d = np.array([1,2,3,4,5])

Numpy range:


  d = np.arange(1,10)  # creates a numpy array with the values 1 to 9

numpy shape:

It returns a tuple giving the number of elements along each dimension of the array



d = np.array([1,2,3])

print(d)   # [1 2 3]

print(d.shape) # (3,)
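
For arrays with more than one dimension, shape has one entry per dimension. A small sketch (the array m is made up for illustration):

```python
import numpy as np

# A 2-D array with 2 rows and 3 columns
m = np.array([[1, 2, 3],
              [4, 5, 6]])

rows, cols = m.shape   # unpack the shape tuple
print(m.shape)         # (2, 3)
print(m.ndim)          # 2, the number of dimensions
print(m.size)          # 6, the total number of elements
```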

numpy reshape:

It returns a new array with the requested shape; the original array is not changed, so assign the result



d = np.arange(1,10)    # array([1,2,3,4,5,6,7,8,9])

d.shape     # (9,)

d = d.reshape(3,3)

print(d)    #  [[1 2 3]
            #   [4 5 6]
            #   [7 8 9]]

The example above reshapes the 9-element array into a 3x3 matrix
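
Two details of reshape are easy to miss: the call leaves the original array alone, and one dimension may be given as -1 so that numpy works it out from the array size. A short sketch, with illustrative variable names:

```python
import numpy as np

d = np.arange(1, 10)      # 9 elements: 1..9
m = d.reshape(3, -1)      # -1 lets numpy work out the second dimension (3)

print(d.shape)            # (9,)  the original is unchanged
print(m.shape)            # (3, 3)
print(m[1, 2])            # 6, the element at row 1, column 2
```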

np.zeros()

It creates an array filled with zeros. Pass the dimensions as a tuple and it returns an array of that shape.



np.zeros((3, 3))     # array([[0., 0., 0.],
                     #        [0., 0., 0.],
                     #        [0., 0., 0.]])

np.vstack()

It stacks arrays vertically (row-wise). Given a single 1-D array, each element becomes its own row.



c = np.array([1,2,3])  # array([1, 2, 3])

np.vstack(c)    # array([[1],
                #        [2],
                #        [3]])
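
The more common use is to stack several arrays of the same length on top of each other. A minimal sketch with two made-up arrays:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

stacked = np.vstack((a, b))   # each 1-D array becomes one row
print(stacked.shape)          # (2, 3)
print(stacked[1])             # [4 5 6]
```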

np.eye()

It creates an identity matrix as a numpy array.


 np.eye(3) # creates a 3x3 identity matrix

           # array([[1., 0., 0.],
           #        [0., 1., 0.],
           #        [0., 0., 1.]])

np.dot()

It computes the dot product of two arrays; for 2-D arrays this is matrix multiplication


 np.dot(M1, M2)
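
M1 and M2 are not defined in the line above, so here is a worked sketch with two made-up 2x2 matrices:

```python
import numpy as np

M1 = np.array([[1, 2],
               [3, 4]])
M2 = np.array([[5, 6],
               [7, 8]])

product = np.dot(M1, M2)
print(product)
# [[19 22]
#  [43 50]]

# Multiplying by the identity matrix leaves M1 unchanged
same = np.dot(M1, np.eye(2))
print(np.array_equal(same, M1))   # True
```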

np.sum()

It returns the sum of all the elements in the given array.



M = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

np.sum(M) # 45, the sum of all the elements in the array

np.sum(M, axis=0)   # array([12, 15, 18])

If axis=0, it sums down each column (one result per column)

If axis=1, it sums across each row (one result per row)
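
A short sketch showing both axes side by side on the same 3x3 matrix:

```python
import numpy as np

M = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

col_sums = np.sum(M, axis=0)   # collapse the rows: one sum per column
row_sums = np.sum(M, axis=1)   # collapse the columns: one sum per row

print(col_sums)   # [12 15 18]
print(row_sums)   # [ 6 15 24]
```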

np.random.rand()

It produces an array of the given shape filled with random values drawn uniformly from the interval [0, 1)
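
np.random.rand takes the dimensions as separate arguments rather than as a tuple. A minimal sketch; the seed call is optional and only there to make runs repeatable:

```python
import numpy as np

np.random.seed(0)             # optional: fix the seed so runs are repeatable
r = np.random.rand(2, 3)      # 2x3 array of floats in [0, 1)

print(r.shape)                            # (2, 3)
print(bool(((r >= 0) & (r < 1)).all()))   # True
```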

np.append()

Appends elements to an ndarray and returns a new array; the original array is not modified



 A = np.array([1, 2, 3])

 B = np.append(A, 4) # [1, 2, 3, 4]

 B = np.append(A, [4, 5,6,7]) # [1, 2, 3, 4, 5, 6, 7]
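
With 2-D arrays the axis argument matters: without it the result is flattened. A small sketch with a made-up 2x2 array:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])

flat = np.append(A, [5, 6])             # no axis: the result is flattened
rows = np.append(A, [[5, 6]], axis=0)   # axis=0: appended as a new row

print(flat)         # [1 2 3 4 5 6]
print(rows.shape)   # (3, 2)
print(A.shape)      # (2, 2)  A itself is unchanged
```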

Install apache airflow on ubuntu

What is Airflow:

Airflow is a platform to programmatically author, schedule and monitor workflows. This post covers the following steps to install Airflow on an Ubuntu/Linux machine.

Installing system dependencies
Installing airflow with extra packages
Installing airflow meta database (Mysql or Postgres)
Installing Rabbitmq (Message broker for CeleryExecutor)

RabbitMQ can be used as the message broker if you are using the CeleryExecutor. With the LocalExecutor there is no need to install a message broker such as Rabbitmq or Redis.

1. Installing Dependency packages:


apt-get update && apt-get upgrade -y

sudo apt-get -yqq install git \
    python-dev \
    libkrb5-dev \
    libsasl2-dev \
    libssl-dev \
    libffi-dev \
    build-essential \
    libblas-dev \
    liblapack-dev \
    libpq-dev \
    python-pip \
    python-requests \
    apt-utils \
    curl \
    netcat \
    locales \
    libmysqlclient-dev \
    supervisor

pip install --upgrade pip

2. Install Apache airflow




pip install PyYAML==3.12
pip install requests==2.18.4
pip install simplejson==3.12.0

pip install apache-airflow[crypto,celery,postgres,hive,hdfs,jdbc,gcp_api,rabbitmq,password,s3,mysql]==1.8.1
pip install celery==3.1.17

3. Install Meta Database:

i. Install Mysql



# Install and enable the MySQL server (the root password is preseeded so the install does not prompt)

sudo debconf-set-selections <<< 'mysql-server mysql-server/root_password password airflowd2p'
sudo debconf-set-selections <<< 'mysql-server mysql-server/root_password_again password airflowd2p'

sudo apt-get -y install mysql-server libmysqlclient-dev

ii. Install PostgreSQL

# Install PostgreSQL, enable it at boot and start the server

sudo apt-get -y install postgresql \
    postgresql-contrib

update-rc.d postgresql enable

service postgresql start

4. Install RabbitMQ

apt-get update && apt-get upgrade -y

# Install Erlang, a dependency package for RabbitMQ
wget https://packages.erlang-solutions.com/erlang-solutions_1.0_all.deb
sudo dpkg -i erlang-solutions_1.0_all.deb
sudo apt-get update

# Install the RabbitMQ server
echo "deb https://dl.bintray.com/rabbitmq/debian xenial main" | sudo tee /etc/apt/sources.list.d/bintray.rabbitmq.list
wget -O- https://dl.bintray.com/rabbitmq/Keys/rabbitmq-release-signing-key.asc | sudo apt-key add -
sudo apt-get update

sudo apt-get -yqq install erlang rabbitmq-server

5. Create Rabbitmq users:



#!/usr/bin/env bash

# Enable the management plugin, then create the airflow user and virtual host
rabbitmq-plugins enable rabbitmq_management
rabbitmqctl add_user airflow_user airflow_user
rabbitmqctl add_vhost airflow

# set_user_tags replaces any existing tags, so set both tags in one call
rabbitmqctl set_user_tags airflow_user airflow_tag administrator

# Grant configure, write and read permissions on the airflow virtual host
rabbitmqctl set_permissions -p airflow airflow_user ".*" ".*" ".*"
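
With the user and virtual host in place, Celery is usually pointed at RabbitMQ through an AMQP URL. The sketch below only assembles such a URL. The host localhost and port 5672 (RabbitMQ's default AMQP port) are assumptions, and build_broker_url is a hypothetical helper. Where the URL goes in airflow.cfg depends on your Airflow version, so check its documentation.

```python
from urllib.parse import quote


def build_broker_url(user, password, host, port, vhost):
    """Assemble an AMQP URL, percent-encoding parts that may hold special characters."""
    return "amqp://{}:{}@{}:{}/{}".format(
        quote(user, safe=""), quote(password, safe=""), host, port, quote(vhost, safe="")
    )


# User, password and vhost match the rabbitmqctl commands above;
# host and port are assumed defaults.
url = build_broker_url("airflow_user", "airflow_user", "localhost", 5672, "airflow")
print(url)   # amqp://airflow_user:airflow_user@localhost:5672/airflow
```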

ansible basics for beginners

What is Ansible

Ansible interacts with machines over SSH, so nothing needs to be installed on the client machines. The only prerequisite is that Ansible is installed on the controller machine, with Python and SSH enabled.

Inventory:

Inventory file:

An inventory file is a simple text file listing the machines Ansible will interact with. It can list single machines or groups of machines. Using the ansible CLI, we can run a module directly against a host or group from the command line.

Cmd: ansible group-name -i <inventory-filename> -m <module-name> -a <module-params>

Inventory:

server1.mycomp.com
server2.mycomp.com

[clients] #group name
server3.mycomp.com
server4.mycomp.com

Ex:


ansible clients -i inventory -m ping

ansible clients -i inventory -m apt -a "name=mysql-server state=present"

The inventory file can also be an executable file. For example, if you don’t know in advance how many instances are running in AWS, you can simply write a script that returns the names of the running instances from AWS.
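
As a rough sketch of such an executable inventory: Ansible runs the script with a --list argument and expects a JSON description of groups and hosts on standard output. The script below returns a hard-coded list as a stand-in for a real AWS lookup; fetch_running_hosts is a hypothetical placeholder, and the host names are just the ones from the example inventory.

```python
#!/usr/bin/env python3
import json
import sys


def fetch_running_hosts():
    """Placeholder for a real lookup (for example an AWS API call)."""
    return ["server3.mycomp.com", "server4.mycomp.com"]


def build_inventory():
    """Return the inventory as a dict in the JSON layout Ansible expects."""
    return {
        "clients": {"hosts": fetch_running_hosts()},
        # An empty hostvars map tells Ansible not to call the script once per host.
        "_meta": {"hostvars": {}},
    }


if __name__ == "__main__":
    # Ansible calls the script with --list to get the whole inventory.
    if len(sys.argv) > 1 and sys.argv[1] == "--list":
        print(json.dumps(build_inventory(), indent=2))
    else:
        # --host <name> asks for per-host variables; none are defined here.
        print(json.dumps({}))
```

Make the script executable and pass it with -i just like a static file, for example: ansible clients -i inventory.py -m ping (inventory.py is an assumed file name).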

Ansible playbooks:

An Ansible playbook is a simple YAML file containing the list of tasks to be performed on the client machines named in the inventory file.

playbook.yaml

---
- hosts: all
  tasks:
    - name: updating package list
      apt: update_cache=yes cache_valid_time=3600

- hosts: clients
  tasks:
    - name: installing mysql server
      apt: name=mysql-server state=present

In the snippet above, we used the apt module to update the package list and install packages. “hosts: all” means the tasks are performed on every host listed in the inventory file.

We can also run tasks against a specific group of hosts. “hosts: clients” means the tasks below it are performed only on the clients group we created in the inventory file. The “name” of each task is a human-readable message that is printed while the task runs, which is very helpful when monitoring the execution.

Running playbook:


  ansible-playbook -i inventory playbook.yaml

Variables in playbook:

Ansible uses the jinja2 templating system for dealing with variables.

playbook.yaml

---
- hosts: all
  tasks:
    - name: updating package list
      apt: update_cache=yes cache_valid_time=3600

- hosts: clients
  vars:
    init_script: "create_db.sql"
  tasks:
    - name: installing mysql server
      apt: name=mysql-server state=present
    - name: copying init sql files
      copy: src=/tmp/{{init_script}} dest=/tmp/mysql/{{init_script}}
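
To see what the {{init_script}} placeholders in the copy task turn into, here is a deliberately simplified stand-in written with Python's re module. It is not Jinja2 and not how Ansible is implemented; render is a hypothetical helper that only handles plain {{name}} substitutions.

```python
import re


def render(template, variables):
    """Replace each {{ name }} placeholder with the matching value from variables."""
    def substitute(match):
        return str(variables[match.group(1)])
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


playbook_vars = {"init_script": "create_db.sql"}

print(render("src=/tmp/{{init_script}}", playbook_vars))
# src=/tmp/create_db.sql
print(render("dest=/tmp/mysql/{{ init_script }}", playbook_vars))
# dest=/tmp/mysql/create_db.sql
```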

Variable loops in playbook:

playbook.yaml

---
- hosts: all
  tasks:
    - name: updating package list
      apt: update_cache=yes cache_valid_time=3600

- hosts: clients
  vars:
    init_script: "create_db.sql"
  tasks:
    - name: installing packages
      apt: name={{item}} state=present
      with_items:
        - python
        - python-pip
        - vim
    - name: copying init sql files
      copy: src=/tmp/{{init_script}} dest=/tmp/mysql/{{init_script}}

Another way: we can define the list of packages as a variable for the host group and loop over it

playbook.yaml

---
- hosts: all
  tasks:
    - name: updating package list
      apt: update_cache=yes cache_valid_time=3600

- hosts: clients
  vars:
    init_script: "create_db.sql"
    packages:
      - python
      - python-pip
      - vim
  tasks:
    - name: installing packages
      apt: name={{item}} state=present
      with_items: "{{packages}}"
    - name: copying init sql files
      copy: src=/tmp/{{init_script}} dest=/tmp/mysql/{{init_script}}

Directory Group variables:

By default, Ansible looks for directories named “group_vars” and “host_vars” in the same location as the playbook. Any variables you define in a file under the group_vars directory are automatically applied to the group with the same name as the file.

My folder structure:

- inventory
- playbook.yml
- group_vars
  - all
  - clients
- host_vars
  - server.com

In the folder structure above, variables defined in the file called “all” under the group_vars directory are available to all hosts defined in the inventory. To define variables for a specific host, create a file with the same name as the host under the “host_vars” directory.
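
The effect of these files can be pictured as a layered dictionary merge: values from group_vars/all are applied first, then the host's group file, then its host_vars file, with later layers overriding earlier ones. The sketch below illustrates that ordering only; the variable names and values are invented, and effective_vars is a hypothetical helper rather than anything Ansible exposes.

```python
def effective_vars(all_vars, group_vars, host_vars):
    """Merge variable layers; later layers override earlier ones."""
    merged = {}
    for layer in (all_vars, group_vars, host_vars):
        merged.update(layer)
    return merged


# Invented values standing in for group_vars/all, group_vars/clients
# and host_vars/server.com
all_vars = {"ntp_server": "ntp.mycomp.com", "db_port": 3306}
clients_vars = {"db_port": 3307}
server_vars = {"db_port": 3308, "is_primary": True}

result = effective_vars(all_vars, clients_vars, server_vars)
print(result["ntp_server"])   # ntp.mycomp.com, inherited from "all"
print(result["db_port"])      # 3308, the host value wins
```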

Inventory directory:

Normally the inventory is a simple text file, but it can also be a directory.

ansible-playbook -i <inventory-directory> playbook.yml

ansible-playbook -i uat deploy.yml
ansible-playbook -i dev deploy.yml
ansible-playbook -i prod deploy.yml

Directory structure of inventory folder:

dev
  - hosts
  - group_vars
  - host_vars

uat
  - hosts
  - group_vars
  - host_vars

prod
  - hosts
  - group_vars
  - host_vars

deploy.yml

If there are any text files in your inventory directory, Ansible will treat them as inventory files.

Roles in ansible:

You can use a single playbook file to manage all the tasks for your infrastructure, but at some point the playbook becomes too big and hard to manage. For this, Ansible has the “role” feature, which lets you split your playbook YAML file up in a more modular way.

You can create a directory called “roles” and create the playbook modules inside it.

Roles directory structure:

dev
  - hosts
  - group_vars
  - host_vars

roles
  - common
    - defaults
      - main.yml # variable values
    - tasks
      - main.yml # list of tasks to be executed
    - files
      - server.py # file to be copied
    - templates
      - config.py.j2 # template file used by the template module
    - meta
      - main.yml # lists the roles this role depends on
  - webserver
    - defaults
      - main.yml # variable values
    - tasks
      - main.yml # list of tasks
  - db
    - tasks
      - main.yml # list of tasks

deploy.yml

deploy.yml

- hosts: database-server
  roles:
    - common
    - db

- hosts: web-server
  roles:
    - common
    - webserver

The roles folder can be broken down into further modules, as described on the Ansible documentation site.

The defaults folder contains the default values of the role's variables
The tasks folder contains the tasks to be performed for that role
The files folder contains the files to be transferred
The templates folder holds the templates used by the template module
The meta folder contains the dependency list for that specific role

Ex:

main.yml

---
dependencies:
  - common
  - db
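
Ansible runs a role's dependencies before the role itself. A rough way to picture the resulting order is a depth-first walk over the dependency lists. The sketch below is illustrative only: it assumes the meta/main.yml above belongs to the webserver role, that db in turn depends on common, and that there are no circular dependencies. run_order is a hypothetical helper, not part of Ansible.

```python
def run_order(role, dependencies, seen=None):
    """Return roles in execution order: dependencies first, each role once."""
    if seen is None:
        seen = []
    for dep in dependencies.get(role, []):
        run_order(dep, dependencies, seen)
    if role not in seen:
        seen.append(role)
    return seen


# Assumed dependency lists, as they would appear under "dependencies:"
# in each role's meta/main.yml
dependencies = {
    "webserver": ["common", "db"],
    "db": ["common"],
}

print(run_order("webserver", dependencies))   # ['common', 'db', 'webserver']
```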
