Apache Sqoop Import Command Usage

posted on Nov 20th, 2016

Apache Sqoop

Apache Sqoop is a command-line interface application for transferring data between relational databases and Hadoop. It supports incremental loads of a single table or a free-form SQL query, as well as saved jobs that can be run multiple times to import updates made to a database since the last import. Imports can also be used to populate tables in Hive or HBase, and exports can be used to put data from Hadoop into a relational database. The name Sqoop comes from 'SQL-to-Hadoop'. Sqoop became a top-level Apache project in March 2012.

Prerequisites

1) A machine with the Ubuntu 14.04 LTS operating system.

2) Apache Hadoop pre-installed (How to install Hadoop on Ubuntu 14.04)

3) MySQL Database pre-installed (How to install MySQL Database on Ubuntu 14.04)

4) Apache Sqoop pre-installed (How to install Sqoop on Ubuntu 14.04)

Sqoop Import from MySQL Database to HDFS

The 'import' tool imports individual tables from an RDBMS into HDFS. Each row in a table is treated as a record in HDFS. All records are stored as text in text files, or as binary data in Avro or Sequence files.
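For example, to store an import in Avro format instead of plain text, add the '--as-avrodatafile' option ('--as-sequencefile' works the same way for Sequence files). A minimal sketch, assuming the userdb database and employee table created in the steps below:

$ sqoop import \
--connect jdbc:mysql://localhost/userdb \
--username root \
--password root \
--table employee \
--m 1 \
--as-avrodatafile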

Step 1 - Change the directory to /usr/local/hadoop/sbin

$ cd /usr/local/hadoop/sbin

Step 2 - Start all Hadoop daemons ('start-all.sh' is deprecated in Hadoop 2.x; running 'start-dfs.sh' followed by 'start-yarn.sh' does the same thing).

$ start-all.sh

Step 3 - Run jps (Java Virtual Machine Process Status Tool) to confirm that the Hadoop daemons are running. Note that jps can only report on JVMs for which it has access permissions.

$ jps
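If all daemons started correctly, the output should look similar to the following (the process ids will differ on your machine):

2345 NameNode
2456 DataNode
2567 SecondaryNameNode
2678 ResourceManager
2789 NodeManager
2890 Jps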

Step 4 - Enter the MySQL command-line interface (CLI). Open a terminal (CTRL + ALT + T) and type the following command.

$ mysql -u root -p

Enter password: ****

Step 5 - Create a new database 'userdb'

mysql> create database userdb;

Step 6 - Use database 'userdb'

mysql> use userdb;

Step 7 - Create a new table 'employee'

mysql> create table employee(id int,name varchar(20),deg varchar(20),salary int,dept varchar(20), primary key(id));

Step 8 - Insert data into employee table.

mysql> insert into employee values(1201,'gopal','manager',50000,'TP');
mysql> insert into employee values(1202,'manisha','Proof reader', 50000,'TP');
mysql> insert into employee values(1203,'khalil', 'php dev',30000,'AC');
mysql> insert into employee values(1204,'prasanth','php dev',30000,'AC');
mysql> insert into employee values(1205,'kranthi','admin',20000,'TP');

Step 9 - Verify

mysql> select * from employee;
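The query should return the five rows inserted above:

+------+----------+--------------+--------+------+
| id   | name     | deg          | salary | dept |
+------+----------+--------------+--------+------+
| 1201 | gopal    | manager      |  50000 | TP   |
| 1202 | manisha  | Proof reader |  50000 | TP   |
| 1203 | khalil   | php dev      |  30000 | AC   |
| 1204 | prasanth | php dev      |  30000 | AC   |
| 1205 | kranthi  | admin        |  20000 | TP   |
+------+----------+--------------+--------+------+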

Step 10 - Create a new table 'employee_address'

mysql> create table employee_address(id int,hno varchar(20),street varchar(20),city varchar(20),primary key(id));

Step 11 - Insert data into employee_address table.

mysql> insert into employee_address values(1201,'288A','vgiri','jublee');
mysql> insert into employee_address values(1202,'108I','aoc','sec-bad');
mysql> insert into employee_address values(1203,'144Z','pgutta','hyd');
mysql> insert into employee_address values(1204,'78B','old city','sec-bad');
mysql> insert into employee_address values(1205,'720X','hitec','sec-bad');

Step 12 - Verify

mysql> select * from employee_address;

Step 13 - Create a new table 'employee_contact'

mysql> create table employee_contact(id int,phno varchar(20),email varchar(30),primary key(id));

Step 14 - Insert data into employee_contact table.

mysql> insert into employee_contact values(1201,'2356742','gopal@tp.com');
mysql> insert into employee_contact values(1202,'1661663','manisha@tp.com');
mysql> insert into employee_contact values(1203,'8887776','khalil@ac.com');
mysql> insert into employee_contact values(1204,'9988774','prasanth@ac.com');
mysql> insert into employee_contact values(1205,'2356742','kranthi@tp.com');
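Verify the inserted rows as before:

mysql> select * from employee_contact;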

Importing a Table

The Sqoop 'import' tool is used to import table data from an RDBMS table into the Hadoop file system as a text file or a binary file.

Step 15 - Change the directory to /usr/local/sqoop/bin

$ cd /usr/local/sqoop/bin

Step 16 - The following command is used to import the employee table from MySQL database server to HDFS.

$ sqoop import \
--connect jdbc:mysql://localhost/userdb \
--username root \
--password root \
--table employee --m 1
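Note that passing '--password' on the command line leaves the password in your shell history. Sqoop also accepts '-P', which prompts for the password at runtime; the same import as a sketch:

$ sqoop import \
--connect jdbc:mysql://localhost/userdb \
--username root \
-P \
--table employee --m 1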

Step 17 - Verify

$ hdfs dfs -cat /user/hduser/employee/part-m-00000
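The file should contain the five employee rows, with fields separated by commas (Sqoop's default text delimiter):

1201,gopal,manager,50000,TP
1202,manisha,Proof reader,50000,TP
1203,khalil,php dev,30000,AC
1204,prasanth,php dev,30000,AC
1205,kranthi,admin,20000,TP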

Importing into Target Directory

We can specify the target directory while importing table data into HDFS using the Sqoop import tool.

Step 18 - The following command is used to import employee_address table data into the '/targetfoldername' directory.

$ sqoop import \
--connect jdbc:mysql://localhost/userdb \
--username root \
--password root \
--table employee_address \
--m 1 \
--target-dir /targetfoldername

Step 19 - Verify

$ hdfs dfs -cat /targetfoldername/part-m-00000
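The file should contain the five address rows:

1201,288A,vgiri,jublee
1202,108I,aoc,sec-bad
1203,144Z,pgutta,hyd
1204,78B,old city,sec-bad
1205,720X,hitec,sec-bad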

Import Subset of Table Data

We can import a subset of a table using the 'where' clause of the Sqoop import tool. Sqoop executes the corresponding SQL query on the database server and stores the result in a target directory in HDFS.

$ cd /usr/local/sqoop/bin

Step 20 - The following command is used to import a subset of the employee_address table data. The subset query retrieves the id and address of the employees who live in Secunderabad city ('sec-bad').

$ sqoop import \
--connect jdbc:mysql://localhost/userdb \
--username root \
--password root \
--table employee_address \
--m 1 \
--where "city ='sec-bad'" \
--target-dir /sqoop/wherequery

Step 21 - Verify

$ hdfs dfs -cat /sqoop/wherequery/part-m-00000
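Only the three rows whose city is 'sec-bad' should appear:

1202,108I,aoc,sec-bad
1204,78B,old city,sec-bad
1205,720X,hitec,sec-bad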

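The 'where' clause covers simple filters. For arbitrary SQL (projections, joins), Sqoop also accepts a free-form query via '--query', as mentioned in the introduction. A sketch; the '/sqoop/freeformquery' target directory is illustrative, and the literal '$CONDITIONS' token must appear in the query so Sqoop can split the work:

$ sqoop import \
--connect jdbc:mysql://localhost/userdb \
--username root \
--password root \
--query 'select id, city from employee_address where city = "sec-bad" and $CONDITIONS' \
--m 1 \
--target-dir /sqoop/freeformquery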

Incremental Import

Incremental import is a technique that imports only the newly added rows of a table. Add the '--incremental', '--check-column', and '--last-value' options to perform an incremental import.

Step 22 - Insert some more data into employee table.

mysql> insert into employee values(1206,'satish p','grp des',20000,'GR');

Note - keep the employee folder in HDFS from Step 16. With '--incremental append', Sqoop does not overwrite the existing directory; it appends the new rows as an additional part file next to the old one.

Step 23 - The following command is used to perform the incremental import in the employee table.

$ sqoop import \
--connect jdbc:mysql://localhost/userdb \
--username root \
--password root \
--table employee \
--m 1 \
--incremental append \
--check-column id \
--last-value 1205
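Re-running this command by hand means updating '--last-value' yourself each time. The saved jobs mentioned in the introduction track the last value automatically; a sketch, where the job name 'employee_import' is illustrative (Sqoop prompts for the database password when the job runs):

$ sqoop job --create employee_import -- import \
--connect jdbc:mysql://localhost/userdb \
--username root \
--table employee \
--m 1 \
--incremental append \
--check-column id \
--last-value 1205

$ sqoop job --exec employee_import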

Step 24 - The following command is used to verify the originally imported data in the HDFS employee directory.

$ hdfs dfs -cat /user/hduser/employee/part-m-00000

Step 25 - The following command is used to see the newly appended rows from the employee table.

$ hdfs dfs -cat /user/hduser/employee/part-m-00001
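This second part file should contain only the newly appended row:

1206,satish p,grp des,20000,GR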

Import all tables

Import all the tables from the RDBMS database server to HDFS. Each table's data is stored in a separate directory, and the directory name is the same as the table name.

Let us take an example of importing all tables from the userdb database. The list of tables that the database userdb contains is as follows.

mysql> show tables;
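The database should list the three tables created above:

+------------------+
| Tables_in_userdb |
+------------------+
| employee         |
| employee_address |
| employee_contact |
+------------------+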

Step 26 - The following command is used to import all the tables from the userdb database. Because a non-incremental import fails if its output directory already exists, first remove the employee directory left over from the earlier steps with '$ hdfs dfs -rm -r /user/hduser/employee'.

$ sqoop import-all-tables \
--connect jdbc:mysql://localhost/userdb \
--username root \
--password root

Step 27 - Verify

$ hdfs dfs -ls /user/hduser/
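The listing should now include one directory per table (the full ls output also shows permissions, owner, and timestamps):

/user/hduser/employee
/user/hduser/employee_address
/user/hduser/employee_contact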
