Sunday 18 September 2016

How does an SMTP server work, and how do you configure an SMTP server for Apache Airflow?

What is SMTP?


     SMTP stands for Simple Mail Transfer Protocol. First of all, what is a protocol? Almost all of your online activity happens with the help of protocols. A protocol is a set of rules and guidelines that lets your computer communicate with other machines on a network.

    There are many protocols, used for purposes such as sending email, transferring files, shopping online, and reading news. SMTP works closely with the MTA (Mail Transfer Agent) running on your computer: an email moves from your computer's MTA to the MTA on another computer.

How does it work?


    In simple words, an SMTP server is a machine like any other server, but dedicated to one specific purpose: sending and relaying email. The email client on our machine connects to the SMTP server on a specific port (traditionally port 25).

When we send an email through an SMTP server, our email client first makes a request to the SMTP server configured on our machine.

The SMTP server receives the request and verifies your authentication credentials. Once verification succeeds, it carries on with the next steps.

The request contains the message content, the sender address, the recipient email addresses, and other information.

The server then forwards the message to the recipient's mail server, which repeats its own process of verification.

The recipient's server checks the sender and recipient addresses, the message content, DNS records, signatures, and so on.

Once everything checks out, the recipient's mail server delivers the message to the recipient.
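
The client side of the steps above can be sketched with Python's standard smtplib module. This is a minimal sketch, not a definitive implementation: the addresses, the host smtp.example.com, and the helper names build_message and send_via_smtp are assumptions made for illustration, and the actual send is left commented out because it needs a real server and credentials.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender, recipients, subject, body):
    """Build the message the client hands to the SMTP server."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_via_smtp(msg, host, port, user, password):
    """Connect, upgrade to TLS, authenticate, and submit the message."""
    with smtplib.SMTP(host, port, timeout=30) as server:
        server.starttls()             # encrypt the connection before logging in
        server.login(user, password)  # the server verifies these credentials
        server.send_message(msg)      # sender and recipients come from the headers


# Placeholder addresses, used only for illustration.
msg = build_message(
    "me@example.com",
    ["friend@example.com"],
    "Hello",
    "Sent through an SMTP server.",
)
# send_via_smtp(msg, "smtp.example.com", 587, "me@example.com", "<password>")
```

Building the message and sending it are kept separate so the message can be inspected before it ever touches the network.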


SMTP Providers:

 
 There are several SMTP providers across the world. I have listed a few common ones below.


PROVIDER                        URL              SMTP SETTINGS
Gmail                           gmail.com        smtp.gmail.com
Yahoo                           yahoo.com        smtp.mail.yahoo.com
Outlook.com (former Hotmail)    outlook.com      smtp.live.com
Bluewin                         bluewin.ch       smtpauths.bluewin.ch
BT Connect                      btconnect.com    mail.btconnect.com
Comcast                         comcast.net      smtp.comcast.net
Earthlink                       earthlink.net    smtpauth.earthlink.net
AOL                             aol.com          smtp.aol.com
Orange                          orange.net       smtp.orange.net
Tin                             tin.it           mail.tin.it
Tiscali                         tiscali.co.uk    smtp.tiscali.co.uk
Verizon                         verizon.net      outgoing.verizon.net
Virgin                          virgin.net       smtp.virgin.net
Wanadoo                         wanadoo.fr       smtp.wanadoo.fr
AT&T                            att.net          outbound.att.net


Configure the SMTP server in Apache Airflow:


 
    Airflow is a Python-based platform for scheduling and monitoring workflows. To learn more about Airflow, go through the Airflow documentation.

Airflow can send email notifications when specific events occur, such as a job failure, a retry, or an SLA miss. It also has an email operator, so you can send email from a task based on your own requirements. Before any of that works, we need to configure an SMTP server so that Airflow can send email.
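
As a rough illustration of the failure and retry notifications, task-level email settings are commonly collected in a default_args dictionary. The sketch below only builds that dictionary, so it runs without Airflow installed; the owner, address, and retry values are placeholders chosen for this example.

```python
from datetime import timedelta

# Placeholder values; replace the address with the people who should be notified.
default_args = {
    "owner": "airflow",
    "email": ["my_mail@gmail.com"],
    "email_on_failure": True,   # send mail when a task fails
    "email_on_retry": True,     # send mail when a task is retried
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

# In a DAG file this dictionary would be passed as DAG(..., default_args=default_args).
```

None of these notifications can be delivered until the SMTP settings are in place.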

We need to add the SMTP configuration settings to the airflow.cfg file, which is usually located in the Airflow home folder (~/airflow by default, for example /root/airflow/ when running as root). I am using the Gmail SMTP server in the example configuration below.

Open airflow.cfg in any editor and add the configuration to the [smtp] section.

[smtp]
# If you want airflow to send emails on retries, failure, and you want to use
# the airflow.utils.email.send_email_smtp function, you have to configure an smtp
# server here
smtp_host = smtp.gmail.com
smtp_starttls = True
smtp_ssl = False
smtp_user = <your gmail address>
smtp_port = 587
smtp_password = <password>
smtp_mail_from = <from mail address>

By default SMTP uses port 25, but Google's SMTP server with STARTTLS (as configured above) listens on port 587. Once the configuration is done, try the EmailOperator class in Airflow to check that it works.

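# Import path used by older Airflow releases; newer ones moved it (airflow.operators.email in Airflow 2)
from airflow.operators.email_operator import EmailOperator

send_mail = EmailOperator(
    dag=dag,  # the DAG object this task belongs to, defined elsewhere in the file
    task_id="simple_email_operator",
    to=["my_mail@gmail.com", "second_mail@yahoo.com"],
    subject="Testing",
    html_content="<h3>Welcome to Airflow</h3>")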
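
If the test email does not arrive, it can help to double-check the [smtp] values outside Airflow. The following is a minimal sketch using Python's standard configparser module. The sample text, the helper names load_smtp_settings and check_smtp_settings, and the rule that STARTTLS and SSL should not both be enabled are assumptions made for this example, not Airflow's own validation.

```python
import configparser

# Sample section with placeholder credentials, mirroring the keys used in airflow.cfg.
SAMPLE_CFG = """
[smtp]
smtp_host = smtp.gmail.com
smtp_starttls = True
smtp_ssl = False
smtp_user = someone@gmail.com
smtp_port = 587
smtp_password = secret
smtp_mail_from = someone@gmail.com
"""


def load_smtp_settings(cfg_text):
    """Parse the [smtp] section and return typed values."""
    # interpolation=None keeps any '%' characters in passwords literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    section = parser["smtp"]
    return {
        "host": section["smtp_host"],
        "port": section.getint("smtp_port"),
        "starttls": section.getboolean("smtp_starttls"),
        "ssl": section.getboolean("smtp_ssl"),
        "user": section["smtp_user"],
        "mail_from": section["smtp_mail_from"],
    }


def check_smtp_settings(settings):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    if settings["starttls"] and settings["ssl"]:
        problems.append("smtp_starttls and smtp_ssl should not both be True")
    if not settings["host"]:
        problems.append("smtp_host is empty")
    return problems


settings = load_smtp_settings(SAMPLE_CFG)
print(settings, check_smtp_settings(settings))
```

To check a real installation, the text of airflow.cfg could be read from disk and passed to load_smtp_settings in place of the sample.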




Wednesday 31 August 2016

How to split large files into smaller chunks using Python?

In the big data world, many of us handle large data files. When a file is very big (above 10 GB), it is difficult to process it as a single file, so we need to split it into several smaller chunks and then process those.

There are several ways to split a large file; here I describe two of them.
  • Using Unix commands to split a file
  • Using plain Python file objects

1. Using unix commands:


Unix has a built-in split command for splitting a file into smaller files.
split [options] filename prefix
Options:

   The split command has several options (see its man page). The main ones split by number of lines or by size:
  -l linenumber

  -b bytes

Example: split -l 1000 myfile.csv

How to call Unix commands using Python?


    Python has a built-in os module that can do this, but I recommend the subprocess module for calling Unix commands from Python code. The subprocess module allows you to spawn new processes, connect to their input/output/error pipes, and obtain their return codes.

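code:
 import subprocess
 # Runs: split -l 10000 myfile.csv
 # With no prefix given, the chunks get the default names xaa, xab, ...
 cmd = ["split", "-l", "10000", "myfile.csv"]
 subprocess.check_call(cmd)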
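
The call above can be wrapped in a small helper that also passes an output prefix and reports which chunk files were produced. This is only a sketch: the function names and the prefix chunk_ are made up for illustration, and split_file assumes the split command is available on the machine.

```python
import glob
import subprocess


def build_split_command(filename, lines_per_chunk, prefix):
    """Return the argument list for: split -l <lines> <filename> <prefix>."""
    return ["split", "-l", str(lines_per_chunk), filename, prefix]


def split_file(filename, lines_per_chunk=10000, prefix="chunk_"):
    """Run split and return the chunk files it produced, sorted by name."""
    cmd = build_split_command(filename, lines_per_chunk, prefix)
    try:
        subprocess.check_call(cmd)
    except subprocess.CalledProcessError as exc:
        raise RuntimeError("split failed with exit code %d" % exc.returncode)
    # split names its outputs <prefix>aa, <prefix>ab, ... so a glob finds them.
    return sorted(glob.glob(glob.escape(prefix) + "*"))


print(build_split_command("myfile.csv", 10000, "chunk_"))
```

Passing the command as a list rather than a single string avoids shell quoting problems when file names contain spaces.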

2. Using a Python file object to split files:


We can also split files in plain Python: read the file line by line and write each line to a new file, starting a new output file whenever the line count reaches the chunk size.

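chunked_file_line_count = 1000
file_name = "mylargefile.csv"

print("================ Splitting file ======================== " + file_name)
with open(file_name) as file_obj:
    line_count = 0
    file_count = 0
    chunked_file_object = None
    for line in file_obj:
        # Open the next chunk only when there is a line to write,
        # so no empty trailing file is created.
        if chunked_file_object is None:
            print("writing file " + str(file_count))
            chunked_file_object = open("file_" + str(file_count) + ".csv", "w")
        chunked_file_object.write(line)
        line_count = line_count + 1
        if line_count == chunked_file_line_count:
            # The chunk is full: close it before moving on to the next one.
            chunked_file_object.close()
            chunked_file_object = None
            file_count = file_count + 1
            line_count = 0
    if chunked_file_object is not None:
        chunked_file_object.close()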
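
The same line-count approach can also be written as a reusable function. The sketch below is one possible variant, not the only way to do it: it uses itertools.islice to read one chunk of lines at a time, and the function name split_by_lines and the 25-line demonstration file are made up for this example.

```python
import os
import tempfile
from itertools import islice


def split_by_lines(path, lines_per_chunk, out_dir):
    """Write consecutive groups of lines to file_0.csv, file_1.csv, ..."""
    chunk_paths = []
    with open(path) as source:
        while True:
            # Read at most lines_per_chunk lines; an empty list means end of file.
            lines = list(islice(source, lines_per_chunk))
            if not lines:
                break
            chunk_path = os.path.join(out_dir, "file_%d.csv" % len(chunk_paths))
            with open(chunk_path, "w") as chunk:
                chunk.writelines(lines)
            chunk_paths.append(chunk_path)
    return chunk_paths


# Small demonstration on a throwaway 25-line file.
demo_dir = tempfile.mkdtemp()
demo_file = os.path.join(demo_dir, "mylargefile.csv")
with open(demo_file, "w") as handle:
    handle.writelines("row %d\n" % i for i in range(25))

chunks = split_by_lines(demo_file, 10, demo_dir)
print([os.path.basename(p) for p in chunks])
```

Only one chunk of lines is held in memory at a time, so the size of the input file does not matter as long as a single chunk fits in memory.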









Monday 22 August 2016

What is SSH key authentication?


SSH key authentication provides a secure way of logging into a server without entering a password. A password can be cracked with a brute-force attack, whereas an SSH key is practically impossible to brute-force.

So what is SSH key authentication?

   An SSH key pair consists of two long strings that are generated together. One string is called the public key and the other is called the private key. You place the public key on your server and keep the private key on your client machine. When you connect, the server uses the stored public key to check that the client really holds the matching private key (the private key itself is never sent to the server). If they match, the server lets you in without the need for a password.


Step 1: Creating a public key and private key using PuTTYgen:



PuTTYgen is a key generator. It generates pairs of public and private keys. A key passphrase gives additional security, but you can leave it empty. Generate the keys and store them on your local machine; PuTTYgen has options to save both the public key and the private key.



Step 2: Add your public key to the server:

Copy the public key from PuTTYgen. The key is shown in the following format:

                                             "ssh-rsa <keystring> keycomment"


                                  






Add your public key to the .ssh folder on your server. On Windows, the .ssh folder is generally found at C:\Users\<username>\.ssh; save the ppk file in this path.

On a Linux machine it is in the user's home directory, "/home/<username>/.ssh".
The .ssh folder is hidden by default.









There is an "authorized_keys" file in the .ssh folder. This file contains the list of public keys the server accepts for that user. You have to add your public key to the authorized_keys file: open "authorized_keys" in any editor (vi, vim, nano, ...) and add your new public key, on a line of its own, at the end of the file.
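
Instead of editing the file by hand, the append step can be scripted. The following is a minimal sketch, assuming a Linux-style server: the helper name add_public_key is made up for this example, the key string is a placeholder rather than a real key, and the demonstration writes to a temporary directory instead of a real .ssh folder.

```python
import os
import tempfile


def add_public_key(ssh_dir, public_key):
    """Append public_key to ssh_dir/authorized_keys unless it is already there.

    Returns True if the key was added, False if it was already present.
    """
    os.makedirs(ssh_dir, mode=0o700, exist_ok=True)
    path = os.path.join(ssh_dir, "authorized_keys")
    key = public_key.strip()

    content = ""
    if os.path.exists(path):
        with open(path) as handle:
            content = handle.read()
    if key in (line.strip() for line in content.splitlines()):
        return False

    with open(path, "a") as handle:
        # Make sure the new key starts on its own line.
        if content and not content.endswith("\n"):
            handle.write("\n")
        handle.write(key + "\n")
    os.chmod(path, 0o600)  # keep the file private to its owner
    return True


# Demonstration against a temporary directory, not a real ~/.ssh folder.
demo_ssh_dir = os.path.join(tempfile.mkdtemp(), ".ssh")
demo_key = "ssh-rsa AAAAB3NzaC1yc2EXAMPLEKEY user@example"  # placeholder, not a real key
first = add_public_key(demo_ssh_dir, demo_key)
second = add_public_key(demo_ssh_dir, demo_key)
print(first, second)
```

Checking for an existing entry first keeps the file free of duplicate keys when the script is run more than once.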



Step 3: Connect to your server with the private key

You have stored the public key in the server's .ssh folder. Now connect to your server with the locally stored private key file. If your private key and the server's public key match, then the server lets you in without asking for a password.

Windows users connect to the server with the PuTTY tool.

  • Open PuTTY
  • Enter the IP address of the server in the Host Name field
  • In the left pane, expand the SSH option and then click the Auth section
  • Browse to the private key file stored on your local machine and click Open
  • Enter the username for login (the user whose .ssh folder on the server holds your public key)




Linux User:

Linux users can connect to the server from a terminal using ssh. Go to the folder containing your pem key file (not the .ppk file) and run the following command.

               ssh -i <pem key file> username@hostname

Example:  ssh -i private_key.pem ubuntu@10.10.10.10

The ssh client on Linux or Mac machines does not support .ppk files; it supports the .pem (Privacy Enhanced Mail) format. You can convert a ppk file into a pem file using the PuTTYgen software.

Converting .ppk file to .pem file:

  • Open PuTTYgen
  • Load the existing ppk file
  • Click Conversions in the menu bar
  • Choose Export OpenSSH key
  • Save it in .pem format

The conversion works both ways: you can convert ppk to pem and also pem to ppk.