Wednesday, 31 August 2016

How to split large files into smaller chunk files using Python?

In the big data world, many of us handle large data files. When a file is very big (above 10 GB), it is difficult to process it as a single file, so we need to split it into several smaller chunks and then process them.

There are several ways to split a large file; here I describe two of them.
  • Using the Unix split command
  • Using a plain Python file object

1. Using Unix commands:


Unix has a built-in split command for splitting a file into smaller files.
split [options] filename prefix
Options:

   The split command has several options (see man split). The main ones split the file by number of lines or by size:
  -l linenumber   (number of lines per output file)

  -b bytes        (number of bytes per output file)

Example: split -l 1000 myfile.csv

If no prefix is given, the output files are named xaa, xab, xac and so on.

How to call Unix commands using Python?


    Python's built-in os module can do this, but I recommend the subprocess module for calling Unix commands from Python code. The subprocess module allows you to spawn new processes, connect to their input/output/error pipes, and obtain their return codes.

code:
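 import subprocess
 # The input file name comes right after the options; an optional
 # output prefix may follow it
 cmd = ["split", "-l", "10000", "myfile.csv"]
 # check_call raises CalledProcessError if split exits with an error
 subprocess.check_call(cmd)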
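
The call above can be wrapped in a small helper that covers both the -l and the -b option. This is only a sketch: the names build_split_command and run_split and the "chunk_" prefix are made up for illustration, and the split command is assumed to be available on the PATH.

```python
import subprocess


def build_split_command(filename, lines=None, size=None, prefix=None):
    """Build the argument list for the Unix `split` command.

    Exactly one of `lines` (for -l) or `size` (for -b) must be given.
    """
    if (lines is None) == (size is None):
        raise ValueError("give exactly one of lines or size")
    cmd = ["split"]
    if lines is not None:
        cmd += ["-l", str(lines)]
    else:
        cmd += ["-b", str(size)]
    cmd.append(filename)
    if prefix is not None:
        # Optional output prefix; "chunk_" below is a hypothetical choice
        cmd.append(prefix)
    return cmd


def run_split(filename, **options):
    """Run split and raise CalledProcessError if it fails."""
    subprocess.check_call(build_split_command(filename, **options))


print(build_split_command("myfile.csv", lines=1000))
print(build_split_command("myfile.csv", size=1048576, prefix="chunk_"))
```

Building the command as a list, rather than as one string passed to a shell, keeps file names with spaces or special characters from being misinterpreted.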

2. Using a Python file object to split files:


We can also split files in the typical Python way: read the file line by line and write each line to a new file. When the line count reaches the chunk size, close the current chunk and start the next one.

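chunked_file_line_count = 1000
source_file_name = "mylargefile.csv"

with open(source_file_name) as file_obj:
    print("================ Splitting file ======================== " + source_file_name)
    line_count = 0
    file_count = 0
    chunked_file_object = None
    for line in file_obj:
        # Open the next chunk only when there is a line to write,
        # so no empty trailing file is created
        if chunked_file_object is None:
            print("writing file " + str(file_count))
            chunked_file_object = open("file_" + str(file_count) + ".csv", "w")
        chunked_file_object.write(line)
        line_count = line_count + 1
        if line_count == chunked_file_line_count:
            # The chunk is full: close it before moving to the next one
            chunked_file_object.close()
            chunked_file_object = None
            file_count = file_count + 1
            line_count = 0

    # Close the last, partially filled chunk if there is one
    if chunked_file_object is not None:
        chunked_file_object.close()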
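
The same line-count idea can also be written as a reusable function. The version below is a sketch, not the only way to do it: the name split_by_lines and its parameters are hypothetical, and it uses itertools.islice to pull one chunk of lines at a time, so only one chunk is held in memory.

```python
from itertools import islice


def split_by_lines(source_path, lines_per_chunk, prefix="file_"):
    """Split a text file into chunk files of at most `lines_per_chunk` lines.

    Returns the list of chunk file names that were written.
    """
    written = []
    with open(source_path) as source:
        while True:
            # islice reads at most `lines_per_chunk` lines from the
            # current position of the file
            chunk = list(islice(source, lines_per_chunk))
            if not chunk:
                break  # end of the source file
            chunk_name = prefix + str(len(written)) + ".csv"
            with open(chunk_name, "w") as chunk_file:
                chunk_file.writelines(chunk)
            written.append(chunk_name)
    return written
```

For example, split_by_lines("mylargefile.csv", 1000) would write file_0.csv, file_1.csv and so on, and return their names.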









Monday, 22 August 2016

What is SSH key authentication?


SSH key authentication provides a secure way of logging into a server without entering a password. A password can be cracked with a brute force attack, whereas a properly generated SSH key is practically infeasible to brute-force.

So what is SSH key authentication?

   An SSH key is a pair of long strings generated together as a key pair. One string is called the public key and the other is called the private key. You place the public key on your server and keep the private key on your client machine. When you connect, the client proves that it holds the private key by producing a signature, and the server verifies that signature with the stored public key. The private key itself is never sent to the server. If the check succeeds, the server lets you in without the need for a password.
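
To make the idea concrete, here is a toy challenge-response sketch in Python. It is an illustration only and not how SSH is implemented: it uses textbook RSA with tiny numbers (primes 61 and 53) and no padding, and the function names are invented for this example. Real SSH uses much larger keys and signs session data rather than a bare challenge.

```python
import hashlib
import os

# Toy RSA key pair built from the tiny textbook primes 61 and 53.
# These numbers are for illustration only and offer no security.
N = 61 * 53          # modulus shared by both keys (3233)
PUBLIC_EXP = 17      # public key  = (N, PUBLIC_EXP), stored on the server
PRIVATE_EXP = 2753   # private key = (N, PRIVATE_EXP), stays on the client


def digest(challenge):
    """Reduce the challenge bytes to a number smaller than N."""
    return int.from_bytes(hashlib.sha256(challenge).digest(), "big") % N


def client_sign(challenge, private_exp=PRIVATE_EXP):
    """Client side: prove possession of the private key by signing."""
    return pow(digest(challenge), private_exp, N)


def server_verify(challenge, signature, public_exp=PUBLIC_EXP):
    """Server side: check the signature using only the public key."""
    return pow(signature, public_exp, N) == digest(challenge)


challenge = os.urandom(16)           # server picks a fresh random challenge
signature = client_sign(challenge)   # client answers with a signature
print(server_verify(challenge, signature))             # True
print(server_verify(challenge, (signature + 1) % N))   # False
```

The point to notice is that server_verify never needs PRIVATE_EXP: the server can check the answer with the public key alone.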


Step 1: Creating a public key and private key using PuTTYgen:



PuTTYgen is a key generator. It generates pairs of public and private keys. A key passphrase gives additional security, but you can leave it empty. Generate the keys and store them on your local machine; PuTTYgen has options to save the public key and the private key.



Step 2: Add your public key to the server:

Copy the public key from PuTTYgen. The key is shown in the following format:

    "ssh-rsa <keystring> keycomment"

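A quick way to check that you copied the whole key is to parse the line. The sketch below splits a public key line into its three fields and checks that the key type stored inside the base64 data matches the first field. The function name is hypothetical, and the key used in the demonstration is a placeholder built in the code, not a real key.

```python
import base64
import struct


def parse_public_key_line(line):
    """Split an OpenSSH public key line into (key_type, key_blob, comment)."""
    parts = line.strip().split(None, 2)
    if len(parts) < 2:
        raise ValueError("expected '<type> <base64 key> [comment]'")
    key_type, key_b64 = parts[0], parts[1]
    comment = parts[2] if len(parts) == 3 else ""
    key_blob = base64.b64decode(key_b64, validate=True)
    if len(key_blob) < 4:
        raise ValueError("key data is too short")
    # The decoded data starts with a 4-byte big-endian length followed by
    # the key type again; it must agree with the first field of the line.
    (name_len,) = struct.unpack(">I", key_blob[:4])
    if key_blob[4:4 + name_len] != key_type.encode("ascii"):
        raise ValueError("key type does not match the encoded key data")
    return key_type, key_blob, comment


# Placeholder data for demonstration only; this is NOT a usable key.
fake_blob = struct.pack(">I", 7) + b"ssh-rsa" + b"not-a-real-key"
line = "ssh-rsa " + base64.b64encode(fake_blob).decode("ascii") + " keycomment"
key_type, _, comment = parse_public_key_line(line)
print(key_type, comment)   # ssh-rsa keycomment
```

A key that was truncated while copying will usually fail the base64 decoding step and raise an error.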

                                  






Add your public key to the .ssh folder on your server. On Windows, the .ssh folder is generally found at C:\Users\<username>\.ssh; save the .ppk file in this path. (The .ppk file holds your private key, so keep it on your own machine only.)

On a Linux machine, the folder is in the user's home location, "/home/<username>/.ssh".
The .ssh folder is hidden by default.









There is an "authorized_keys" file in the .ssh folder. This file contains the list of public keys accepted by the server, one key per line. You have to add your public key to the "authorized_keys" file: open it in any editor (vi, vim, nano, ...) and add your new public key as a new line at the end of the file.
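
If you prefer to script this step instead of using an editor, it could look like the sketch below. The function name add_authorized_key is hypothetical, and the 700/600 permissions are the commonly used values for the .ssh folder and the authorized_keys file, assumed here rather than taken from this post.

```python
import os


def add_authorized_key(public_key_line, ssh_dir):
    """Append a public key to authorized_keys unless it is already listed.

    Returns True if the key was added, False if it was already present.
    """
    os.makedirs(ssh_dir, exist_ok=True)
    os.chmod(ssh_dir, 0o700)          # only the owner may enter the folder
    path = os.path.join(ssh_dir, "authorized_keys")
    key = public_key_line.strip()

    existing = ""
    if os.path.exists(path):
        with open(path) as handle:
            existing = handle.read()
    if key in [entry.strip() for entry in existing.splitlines()]:
        return False

    with open(path, "a") as handle:
        # Make sure the new key starts on its own line
        if existing and not existing.endswith("\n"):
            handle.write("\n")
        handle.write(key + "\n")
    os.chmod(path, 0o600)             # only the owner may read or write it
    return True
```

On the server it would be called as add_authorized_key("ssh-rsa <keystring> keycomment", os.path.expanduser("~/.ssh")), with the real key line in place of the placeholder.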



Step 3: Connect to your server with the private key

You have stored the public key in the server's .ssh folder. Now connect to your server using the locally stored private key file. If your private key and the server's public key match, then the server lets you in without asking for a password.

Windows users connect to the server with the PuTTY tool.

  • Open PuTTY
  • Enter the IP address of the server in the hostname field
  • In the left-hand panel, expand the SSH option and then click the Auth section
  • Browse to the private key file stored on your local machine and click Open
  • Enter the username for login (the user whose .ssh folder on the server holds your public key)




Linux users:

Linux users can connect to the server from a terminal using ssh. Go to the folder that contains your .pem key file (not the .ppk file) and run the following command.

    ssh -i <pem key file> username@hostname

Example:  ssh -i private_key.pem ubuntu@10.10.10.10
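
The same command can be assembled from Python, for example when a script needs to log in to several servers. This is a sketch with hypothetical helper names. It also restricts the key file to its owner, because the OpenSSH client refuses private keys whose permissions are too open.

```python
import os
import stat


def build_ssh_command(key_file, username, hostname):
    """Return the argument list for `ssh -i <key file> username@hostname`."""
    return ["ssh", "-i", key_file, username + "@" + hostname]


def restrict_key_permissions(key_file):
    """Make the private key readable and writable by its owner only."""
    os.chmod(key_file, stat.S_IRUSR | stat.S_IWUSR)   # same as chmod 600


print(build_ssh_command("private_key.pem", "ubuntu", "10.10.10.10"))
```

The resulting list can be passed to subprocess.call to open the session.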

The ssh client on Linux or Mac machines does not support .ppk files; it expects keys in the .pem (Privacy Enhanced Mail) format. You can convert a .ppk file into a .pem file using the PuTTYgen software.
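
If you are not sure which format a key file is in, its first line usually tells you. The sketch below assumes that PuTTY key files begin with a "PuTTY-User-Key-File-" header and that PEM-style private keys begin with a "-----BEGIN ... PRIVATE KEY-----" line; the function names are hypothetical.

```python
def detect_key_format(first_line):
    """Classify a private key file by the first line of its contents."""
    first_line = first_line.strip()
    if first_line.startswith("PuTTY-User-Key-File-"):
        return "ppk"
    if first_line.startswith("-----BEGIN ") and first_line.endswith("PRIVATE KEY-----"):
        return "pem"
    return "unknown"


def detect_key_file_format(path):
    """Read only the first line of the file at `path` and classify it."""
    with open(path, errors="replace") as handle:
        return detect_key_format(handle.readline())


print(detect_key_format("PuTTY-User-Key-File-2: ssh-rsa"))    # ppk
print(detect_key_format("-----BEGIN RSA PRIVATE KEY-----"))   # pem
```

A file reported as "ppk" needs the conversion before it can be used with ssh -i.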

Converting a .ppk file to a .pem file:

  • Open PuTTYgen
  • Load the existing .ppk file
  • Click Conversions in the menu bar
  • Choose Export OpenSSH key
  • Save it in .pem format

The conversion also works the other way: PuTTYgen can convert .ppk to .pem and also .pem to .ppk.