sqoop working
It is used to transfer data between rdbms to hdfs and vice versa.
- sqoop import: transfer from rdbms to hdfs
- sqoop export: transfer from hdfs to rdbms
- sqoop eval: to run queries on databases
Setup cloudera quickstart machine on local
We have multiple ways to setup cloudera machine either as VM or docker image, in this case we will use docker and local machine is ubuntu. So, prerequisite is to:
- setup docker, follow here
- run below to pull cloudera quickstart image and then run image which will setup cloudera machine to interact via terminal
docker pull cloudera/quickstart:latest
docker run --hostname=quickstart.cloudera --privileged=true -t -i -p 8888:8888 -p 80:80 -p 8080:8080 cloudera/quickstart /usr/bin/docker-quickstart
Commands
list databases
sqoop-list-databases \
--connect "jdbc:mysql://quickstart.cloudera:3306" \
--username root \
--password cloudera
list tables
sqoop-list-tables \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username root \
--password cloudera
query database
sqoop-eval \
--connect "jdbc:mysql://quickstart.cloudera:3306" \
--username root \
--password cloudera \
--query "select * from retail_db.customers limit 5"
import one table
It will run mapreduce job where only mappers will work. Default four mappers will run but can be changed. It divide work based on primary key. If no primary key then command will failm in that case we have two options:
- change number of mapper to one: use
-m 1
parameter - use split by column.
Example: Here orders
table has primary key and we testing import on this table.
sqoop import \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username root \
--password cloudera \
--table orders \
--target-dir /result
Run below command after above import command to check files produced by import command, default number of mappers are 4 so thats why it will have four part-m files.
hadoop fs -ls /result
import all tables
sqoop-import-all-tables \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username root \
--password cloudera \
--table orders \
--warehouse-dir /result
Note:
- target-dir: files will directly be stored in the directory mentioned in command.
- warehouse-dir: files will be stored in the subdirectory (same as tablename) of the directory mentioned in command.
Note: sqoop is retired as of June 2021
Sources:
- Cloudera quickstart docker image for testing, source
Comments