Thursday 2 February 2017

Integrate pylint into git hook and pycharm

What is pylint:


Pylint is a source code, bug, and quality checker for the Python programming language. It follows the style recommended by PEP 8, the Python style guide. It is similar to Pychecker and Pyflakes, but includes the following features:
  • Checking the length of each line
  • Checking if variable names are well-formed according to the project's coding standard
  • Checking if declared interfaces are truly implemented.

Installing pylint:

Install the pylint package using pip:

mac/unix:

    pip install pylint

windows:

   python -m pip install pylint

Once pylint is installed, check its version with the following command to confirm that the installation succeeded:

    pylint --version

To check a file, pass its name to pylint:

    pylint example.py

Configure pylint as a git hook:


git-pylint-commit-hook is a pre-commit hook for Git that checks Python code quality. The hook checks files whose names end with .py or that have a shebang line (#!) containing python.
The script looks for pylint configuration files in the order determined by pylint. It also looks for a [pre-commit-hook] section in the pylint configuration for options specific to the commit hook.

Install the hook package with pip:

    pip install git-pylint-commit-hook

Next, go to your git repository and navigate to the .git/hooks/ directory. Rename the existing template file "pre-commit.sample" to "pre-commit".
Delete everything in that file and paste in the following:

    #!/bin/sh
    git-pylint-commit-hook
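
The file selection rule described above (a .py extension, or a shebang line that mentions python) can be sketched in Python. This is only an illustration of the idea, not the actual code of git-pylint-commit-hook; the function name is_python_file is made up for this example.

```python
import os
import tempfile


def is_python_file(path):
    """Return True for *.py files or scripts whose shebang mentions python.

    Hypothetical helper, written only to illustrate the selection rule.
    """
    if path.endswith(".py"):
        return True
    try:
        with open(path, "rb") as handle:
            first_line = handle.readline()
    except OSError:
        # Unreadable or missing files are simply not treated as Python.
        return False
    return first_line.startswith(b"#!") and b"python" in first_line


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        script = os.path.join(tmp, "deploy")
        with open(script, "w") as handle:
            handle.write("#!/usr/bin/env python\nprint('hi')\n")
        print(is_python_file(script))       # extension-less script with a python shebang
        print(is_python_file("module.py"))  # matched by extension alone
```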

Usage

The commit hook is called automatically when you run git commit. To skip the checks for a particular commit, use the -n flag:

    git commit -n

pylint configuration

By default, settings are loaded from the .pylintrc file in the root of your repo.

    [pre-commit-hook]
    command=custom_pylint
    params=--rcfile=/path/to/another/pylint.rc
    limit=8.0

  • command is the actual command to run, for instance when pylint is not installed globally but lives in a virtualenv inside the project.
  • params lets you pass custom parameters to pylint.
  • limit is the lowest pylint score you want to allow. If the score is lower than this, the script fails and the commit is not made.
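
To make the role of limit concrete, here is a minimal sketch of how a score check could work. It is not the hook's real implementation: the function names and the fallback value of 10.0 are assumptions for illustration. It relies only on the summary line that pylint prints ("Your code has been rated at X/10").

```python
import configparser
import re

EXAMPLE_RC = """
[pre-commit-hook]
command=custom_pylint
params=--rcfile=/path/to/another/pylint.rc
limit=8.0
"""


def read_limit(rc_text, default=10.0):
    """Read 'limit' from the [pre-commit-hook] section (default is an assumption)."""
    parser = configparser.ConfigParser()
    parser.read_string(rc_text)
    return parser.getfloat("pre-commit-hook", "limit", fallback=default)


def parse_score(pylint_output):
    """Extract the score from pylint's 'rated at X/10' summary line."""
    match = re.search(r"rated at (-?\d+(?:\.\d+)?)/10", pylint_output)
    return float(match.group(1)) if match else None


def commit_allowed(pylint_output, rc_text):
    """Allow the commit only when a score was found and it reaches the limit."""
    score = parse_score(pylint_output)
    return score is not None and score >= read_limit(rc_text)


if __name__ == "__main__":
    print(commit_allowed("Your code has been rated at 7.50/10", EXAMPLE_RC))
```

With limit=8.0, a file rated 7.50/10 is rejected and a file rated 9.00/10 is accepted.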

Integrate pylint into the PyCharm IDE:


Step 1:
Go to File -> Settings


Step 2:

Select Tools -> External Tools and click the add icon



Step 3:

Fill in the tool settings. To stay flexible, you can use PyCharm macros. For example, use "$FileDir$" (the directory of the current file) for Working directory and "$Prompt$" for Parameters. This lets you reuse the tool in other projects, too.



Step 4:

Pylint is now configured as an external tool in PyCharm. Right-click a file and select pylint from External Tools to run pylint on that file.












Sunday 18 September 2016

How does an SMTP server work, and how do you configure one for Apache Airflow?

What is SMTP?


     SMTP stands for Simple Mail Transfer Protocol. First of all, what is a protocol? Almost all of your online activity happens with the help of protocols. A protocol is a set of rules and guidelines that lets your computer communicate with other computers across networks.

    There are many protocols, used for purposes such as sending email, transferring files, shopping online, and reading news. SMTP works closely with the MTA (Mail Transfer Agent) running on your computer, so emails move from your computer's MTA to another computer's MTA.

How does it work?


    In simple words, an SMTP server is a machine like any other server, but dedicated to one specific purpose: sending and relaying email. The email client on our machine connects to the SMTP server on a specific port (traditionally port 25).

When we send an email through an SMTP server, our email client first makes a request to the SMTP server configured on our machine.

The SMTP server receives the request and verifies our authentication credentials. Once verification is done, it continues with the next steps.

The request contains the message content, the sender address, the recipient email addresses, and other information.

The server then sends the message to the recipient's mail server, which repeats a similar process of verification.

The recipient's server checks the sender and recipient addresses, the message content, DNS records, signatures, and so on.

Once everything checks out, the recipient's mail server delivers the message to the recipient.
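
The client side of this flow can be sketched with Python's standard smtplib and email modules. The addresses, host name, and function names below are placeholders chosen for this example, and the sketch assumes a server that offers STARTTLS and requires a login; the actual send call is left commented out.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender, recipients, subject, body):
    """Assemble the message content, sender address, and recipient addresses."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = ", ".join(recipients)
    message["Subject"] = subject
    message.set_content(body)
    return message


def send_message(message, host, port, user, password):
    """Connect to the SMTP server, authenticate, and hand the message over."""
    with smtplib.SMTP(host, port, timeout=30) as server:
        server.starttls()             # upgrade the connection to TLS
        server.login(user, password)  # the server verifies our credentials
        server.send_message(message)  # the server relays it to the recipient's server


if __name__ == "__main__":
    msg = build_message("me@example.com", ["you@example.com"], "Hello", "Sent via SMTP.")
    print(msg["To"])
    # Placeholder values; fill in real settings before uncommenting.
    # send_message(msg, "smtp.example.com", 587, "me@example.com", "<password>")
```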


SMTP Providers:

 
 There are several SMTP providers available across the world. A few common service providers are listed below.


PROVIDER                      URL             SMTP SETTINGS
Gmail                         Gmail.com       Smtp.gmail.com
Yahoo                         Yahoo.com       Smtp.mail.yahoo.com
Outlook.com (former Hotmail)  Outlook.com     Smtp.live.com
Bluewin                       Bluewin.ch      Smtpauths.bluewin.ch
BT Connect                    Btconnect.com   Mail.btconnect.com
Comcast                       Comcast.net     Smtp.comcast.net
Earthlink                     Earthlink.net   Smtpauth.earthlink.net
AOL                           Aol.com         Smtp.aol.com
Orange                        Orange.net      Smtp.orange.net
Tin                           Tin.it          Mail.tin.it
Tiscali                       Tiscali.co.uk   Smtp.tiscali.co.uk
Verizon                       Verizon.net     Outgoing.verizon.net
Virgin                        Virgin.net      Smtp.virgin.net
Wanadoo                       Wanadoo.fr      Smtp.wanadoo.fr
AT&T                          Att.net         Outbound.att.net


Configure SMTP server in Apache Airflow:


 
    Airflow is a Python-based platform for scheduling and monitoring workflows. To learn more about Airflow, go through the Airflow documentation.

Airflow can send email notifications when specific events occur, such as a job failure, a retry, or an SLA miss. It also has an email operator, so you can send emails wherever your workflow requires them. Before any of this works, we need to configure an SMTP server so that Airflow can send email.

We need to add the SMTP settings to the airflow.cfg file, which is usually located in the Airflow home folder (for example /root/airflow/). The example configuration below uses the Gmail SMTP server.

Open the airflow.cfg file in any editor and add the configuration to the [smtp] section.

[smtp]
# If you want airflow to send emails on retries, failure, and you want to use
# the airflow.utils.email.send_email_smtp function, you have to configure an smtp
# server here
smtp_host = smtp.gmail.com
smtp_starttls = True
smtp_ssl = False
smtp_user = <your gmail address>
smtp_port = 587
smtp_password = <password>
smtp_mail_from = <from mail address>
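
Before restarting Airflow, a section like this can be sanity-checked with a few lines of Python. The sketch below is not part of Airflow; check_smtp_settings and the rules it applies are assumptions for illustration (in particular, it treats enabling both smtp_starttls and smtp_ssl as a likely mistake, since STARTTLS upgrades a plain connection while SSL expects an encrypted one from the start). RawConfigParser is used so that a "%" in a password is not treated as interpolation.

```python
import configparser

EXAMPLE_CFG = """
[smtp]
smtp_host = smtp.gmail.com
smtp_starttls = True
smtp_ssl = False
smtp_port = 587
"""


def check_smtp_settings(cfg_text):
    """Return a list of problems found in the [smtp] section (hypothetical checks)."""
    parser = configparser.RawConfigParser()
    parser.read_string(cfg_text)
    if not parser.has_section("smtp"):
        return ["missing [smtp] section"]
    smtp = parser["smtp"]
    problems = []
    if not smtp.get("smtp_host", "").strip():
        problems.append("smtp_host is empty")
    use_starttls = smtp.getboolean("smtp_starttls", fallback=False)
    use_ssl = smtp.getboolean("smtp_ssl", fallback=False)
    if use_starttls and use_ssl:
        problems.append("smtp_starttls and smtp_ssl are both enabled")
    if not smtp.get("smtp_port", "").strip().isdigit():
        problems.append("smtp_port must be a number")
    return problems


if __name__ == "__main__":
    print(check_smtp_settings(EXAMPLE_CFG))
```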

By default SMTP uses port 25, but if you are using the Gmail SMTP server with STARTTLS, you have to use port 587. Once the configuration is done, try the EmailOperator class in Airflow to check that it works.

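# Import path for Airflow 1.x; newer releases expose it as airflow.operators.email
from airflow.operators.email_operator import EmailOperator

send_mail = EmailOperator(
    dag=dag,  # the DAG object this task belongs to (not a dag_id string)
    task_id="simple_email_operator",
    to=["my_mail@gmail.com", "second_mail@yahoo.com"],
    subject="Testing",
    html_content="<h3>Welcome to Airflow</h3>")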




Wednesday 31 August 2016

How to split large files into smaller chunk files using python?

In the big data world, many of us handle large data files. When a file is very big (above 10 GB), it is difficult to process it as a single file, so we need to split it into several smaller chunks and then process those.

There are several ways to split a large file; here are two of them.
  • Using unix commands to split a file
  • Using plain Python file objects

1. Using unix commands:


Unix has a built-in split command for splitting a file into smaller files.

    split [options] filename prefix

Options:

   The split command has several options (see man split). The main ones split by number of lines or by size:
  -l linenumber

  -b bytes

Example: split -l 1000 myfile.csv

How to call unix commands using python?


    Python has a built-in os module that can do this, but I recommend the subprocess module for calling unix commands from Python code. The subprocess module allows you to spawn new processes, connect to their input/output/error pipes, and obtain their return codes.

code:
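 import subprocess
 # Split myfile.csv into pieces of 10000 lines each.
 # With no prefix given, split names the output files xaa, xab, ...
 cmd = ["split", "-l", "10000", "myfile.csv"]
 subprocess.check_call(cmd)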

2. Using python file object to split files:


We can also split files in plain Python. Read the file line by line and write each line to a new file; the line count decides when to start the next chunk file.

chunked_file_line_count = 1000
source_file_name = "mylargefile.csv"
with open(source_file_name) as file_obj:
    line_count = 0
    file_count = 0
    print("================ Splitting file ======================== " + source_file_name)
    chunked_file_object = open("file_" + str(file_count) + ".csv", "w")
    for line in file_obj:
        if line_count == chunked_file_line_count:
            # The current chunk is full: close it and start the next one.
            chunked_file_object.close()
            file_count = file_count + 1
            line_count = 0
            print("writing file " + str(file_count))
            chunked_file_object = open("file_" + str(file_count) + ".csv", "w")
        chunked_file_object.write(line)
        line_count = line_count + 1

    chunked_file_object.close()
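
The same idea can be wrapped in a reusable function. The following is a sketch of one possible variant, not the only way to do it: it uses itertools.islice to pull a fixed number of lines at a time, and the function name, parameters, and output naming scheme are choices made for this example.

```python
import itertools
import os
import tempfile


def split_file(source_path, lines_per_chunk, output_dir, prefix="file_"):
    """Write consecutive groups of lines to numbered chunk files.

    Returns the list of chunk file paths in the order they were written.
    """
    chunk_paths = []
    with open(source_path) as source:
        while True:
            # islice consumes at most lines_per_chunk lines from the open file.
            lines = list(itertools.islice(source, lines_per_chunk))
            if not lines:
                break
            chunk_path = os.path.join(output_dir, f"{prefix}{len(chunk_paths)}.csv")
            with open(chunk_path, "w") as chunk:
                chunk.writelines(lines)
            chunk_paths.append(chunk_path)
    return chunk_paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        source = os.path.join(tmp, "mylargefile.csv")
        with open(source, "w") as handle:
            handle.writelines(f"row{i}\n" for i in range(2500))
        # 2500 lines in chunks of 1000 gives three files: 1000, 1000, and 500 lines.
        print(len(split_file(source, 1000, tmp)))
```

Only one chunk is held in memory at a time, so the approach also works for files much larger than the available RAM.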









Monday 22 August 2016

What is SSH key authentication?


SSH key authentication provides a secure way of logging into a server without entering a password. While a password can be cracked with a brute-force attack, an SSH key is practically impossible to guess.

So what is SSH key authentication?

   An SSH key pair consists of two long strings that are generated together. One string is called the public key and the other is called the private key. You place the public key on your server and keep the private key on your client machine. When you connect, the server checks that the client holds the private key matching a public key it has on file (the private key itself is never sent to the server). If they match, the server lets you in without the need for a password.


Step 1: Create a public key and private key using PuTTYgen:



PuTTYgen is a key generator. It generates pairs of public and private keys. A key passphrase gives additional security, but you can leave it empty. Generate the keys and store them on your local machine; PuTTYgen has options to save the public key and the private key.



Step 2: Add your public key to server:

Copy the public key from PuTTYgen. The key is shown in the following format:

                                             "ssh-rsa <keystring> keycomment"


                                  






Add your public key to the .ssh folder on your server. On Windows, the .ssh folder is normally found at C:\Users\<username>\.ssh; save the .ppk file (your private key) there on your local machine.

On a Linux machine, the folder is in the user's home directory: "/home/<username>/.ssh"
The .ssh folder is hidden by default.









The .ssh folder contains a file named "authorized_keys". This file holds the list of public keys that the server accepts. You have to add your public key to the authorized_keys file: open "authorized_keys" in any editor (vi, vim, nano, ...) and add your new public key at the end of the file.
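
If you prefer a script to a manual edit, appending the key can be sketched in Python as follows. The function add_authorized_key is a hypothetical helper written for this post, not part of any SSH tooling. It also sets the conventional permissions (700 on the folder when it creates it, 600 on the file), since sshd may refuse key files that other users can write to.

```python
import os
import tempfile


def add_authorized_key(public_key, ssh_dir):
    """Append public_key to ssh_dir/authorized_keys unless it is already listed.

    Returns True when the key was added and False when it was already present.
    """
    os.makedirs(ssh_dir, mode=0o700, exist_ok=True)
    path = os.path.join(ssh_dir, "authorized_keys")
    content = ""
    if os.path.exists(path):
        with open(path) as handle:
            content = handle.read()
    key = public_key.strip()
    if key in (line.strip() for line in content.splitlines()):
        return False
    with open(path, "a") as handle:
        # Make sure the new key starts on its own line.
        if content and not content.endswith("\n"):
            handle.write("\n")
        handle.write(key + "\n")
    os.chmod(path, 0o600)
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo_dir = os.path.join(tmp, ".ssh")
        print(add_authorized_key("ssh-rsa <keystring> keycomment", demo_dir))
        print(add_authorized_key("ssh-rsa <keystring> keycomment", demo_dir))
```

The second call in the demo finds the key already present and leaves the file unchanged.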



Step 3: Call your server with private key

You have stored the public key in the server's .ssh folder. Now connect to your server with the locally stored private key file. If your private key and the server's public key match, then the server lets you in without asking for a password.

Windows users connect to the server with the PuTTY tool.

  • Open PuTTY
  • Enter the IP address of the server in the Host Name field
  • Expand the SSH option on the left side and then click the Auth section
  • Browse to the private key file stored on your local machine and click Open
  • Enter the username for login (the user whose .ssh folder on the server holds your public key)




Linux users:

Linux users can connect to the server from a terminal using ssh. Go to the directory containing your pem key file (not the .ppk file) and run the following command.

               ssh -i <pem key file> username@hostname

Example:  ssh -i private_key.pem ubuntu@10.10.10.10
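
The same command can be launched from a Python script. The sketch below uses helper names invented for this example. It builds the argument list for ssh and, as an extra precaution, checks that the key file is not readable by group or others, because the OpenSSH client refuses private keys with permissions that are too open. The call that actually opens the session is left commented out.

```python
import os
import stat
import subprocess


def key_is_private(path):
    """Return True if the key file grants no permissions to group or others."""
    mode = os.stat(path).st_mode
    return not (mode & (stat.S_IRWXG | stat.S_IRWXO))


def build_ssh_command(key_path, user, host):
    """Build the argument list equivalent to: ssh -i <key> user@host."""
    return ["ssh", "-i", key_path, f"{user}@{host}"]


if __name__ == "__main__":
    command = build_ssh_command("private_key.pem", "ubuntu", "10.10.10.10")
    print(" ".join(command))
    # Uncomment to open the session once the key file exists:
    # if key_is_private("private_key.pem"):
    #     subprocess.call(command)
```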

The ssh client on Linux or Mac machines does not read .ppk files; it expects an OpenSSH-format key, commonly saved with a .pem extension. You can convert a ppk file into a pem file using the PuTTYgen software.

Converting a .ppk file to a .pem file:

  • Open PuTTYgen
  • Load the existing ppk file
  • Click Conversions in the menu bar
  • Choose Export OpenSSH key
  • Save it with a .pem extension

The conversion works in both directions: you can convert ppk to pem and also pem to ppk.