Please keep below points in mind before creating a table in HIVE.
- HIVE is not a database but it is a data warehouse which is designed for data analytics(OLAP)
- HIVE doesn't have any internal storage to store data, it will use HDFS to store table data and RDBMS(like MySQL) to store schema related information. Hence we need to have a RDBMS like MySQL.
- When we create a database, HIVE will create a folder in HDFS. Later when we create a HIVE table, it will create another folder inside database folder.
- After creating database, table, once we load/insert data into it then that data will be stored as a file in HDFS and schema information will be stored in MySQL.
- Delimiter is very important while loading the data, we will see some examples below.
- As this is analytics, we know what type of data is there(as data is historical) and we will create schema based on the data. Based on the data, we need to create a table.
HOW TO CREATE DATABASE
-----------------------------
CREATE DATABASE IF NOT EXISTS kalyan;
CREATE DATABASE IF NOT EXISTS kalyan
COMMENT 'My database name is kalyan'
LOCATION '/hive/kalyan'
WITH DBPROPERTIES ('key1' = 'value1', 'key2' = 'value2');
DESCRIBE DATABASE kalyan;
DESCRIBE DATABASE EXTENDED kalyan; // It will show detailed table information
HOW TO DROP DATABASE
-----------------------------
DROP DATABASE IF EXISTS kalyan; // To drop an empty database
DROP DATABASE IF EXISTS kalyan CASCADE; // Use to drop a database which has tables
DROP DATABASE IF EXISTS kalyan RESTRICT;
HOW TO USE DATABASE
-----------------------------
USE kalyan;
HOW TO ALTER DATABASE
-----------------------------
ALTER DATABASE kalyan SET DBPROPERTIES ('key1' = 'value11', 'key3' = 'value3');
ALTER DATABASE kalyan SET OWNER USER hadoop;
ALTER DATABASE kalyan SET OWNER ROLE hadoop;
Example of a HIVE table :
- Always create EXTERNAL tables in production, otherwise we will land into data loss scenarios.
- Observe that FIELDS(COLUMNS) are terminated by '\t' ; that means columns are tab separated
- Observe that LINES are terminated by '\n' ; that means lines are separated by new line character
- We are going to store this as text file
- If we don't mention a location as below table then by default file will store in default warehouse location mentioned in hive-site.xml (property will be mentioned as below)
- <property>
- <name>hive.metastore.warehouse.dir</name>
- <value>/home/arun/work/warehouse</value>
- </property>
- Note that we have used delimiters, hence we need to load data which is in the above format otherwise entire data will be loaded into one FIELD/COLUMN and rest of the columns will be filled with NULL values
CREATE EXTERNAL TABLE IF NOT EXISTS student
(
name string comment 'student name',
id int comment 'student id',
course string comment 'student course',
year int comment 'student year'
)
COMMENT 'My table name is student'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t' // By default FIELD separator is "ctrl + a" or in unicode \001
LINES TERMINATED BY '\n' // By default LINE separator is \n ; so this line is optional
STORED AS TEXTFILE // By default it will store a text file ; so this line is also optional
LOCATION '/hive/arun/student'
TBLPROPERTIES ('key1'='value1', 'key2'='value2')
;
Incase if we want to store the data which is "," separated then we have to write as below :
CREATE EXTERNAL TABLE IF NOT EXISTS student
(
name string comment 'student name',
id int comment 'student id',
course string comment 'student course',
year int comment 'student year'
)
COMMENT 'My table name is student'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',' // By default FIELD separator is "ctrl + a" or in unicode \001
LINES TERMINATED BY '\n' // By default LINE separator is \n ; so this line is optional
STORED AS TEXTFILE // By default it will store a text file ; so this line is also optional
LOCATION '/hive/arun/student'
TBLPROPERTIES ('key1'='value1', 'key2'='value2')
;
HOW TO KNOW INFORMATION ABOUT TABLE
----------------------------------------------------------------
DESCRIBE tbl_name;
DESCRIBE EXTENDED tbl_name; // To get more information about table
DESCRIBE FORMATTED tbl_name; // To get more information about table in structured format
HOW TO LOAD DATA INTO TABLE
-----------------------------------------------
- LOAD DATA LOCAL INPATH '/home/arun/work/hive_inputs/student.txt' OVERWRITE INTO TABLE student2;
- LOAD DATA LOCAL INPATH '/home/arun/work/hive_inputs/student.txt' INTO TABLE student2;
hadoop fs -mkdir -p /home/arun/work/hive_inputs
hadoop fs -put /home/arun/work/hive_inputs/student.txt /home/orienit/work/hive_inputs/student.txt
- LOAD DATA INPATH '/home/arun/work/hive_inputs/student.txt' OVERWRITE INTO TABLE student2;
hadoop fs -put /home/arun/work/hive_inputs/student.txt /home/orienit/work/hive_inputs/student.txt
- LOAD DATA INPATH '/home/arun/work/hive_inputs/student.txt' INTO TABLE student2;
Points to remember during data loading :
- We can load data in 4 types as shown above
- local to HDFS (with overwrite - old data will be deleted)
- local to HDFS(without overwrite - old data will be retained, new data will be appended)
- HDFS to HDFS (with overwrite - old data will be deleted)
- HDFS to HDFS (without overwrite - old data will be retained , new data will be appended)
- When we load data from local to HDFS then it is a COPY operation
- When we load data from HDFS to HDFS then it is a MOVE operation
Note : We don't load data this way in production, we will generally do something like "INSERT OVERWRITE" etc. which we will see in further blogs
Thanks,
Arun Mathe
Email ID : arunkumar.mathe@gmail.com
Contact ID : 9704117111
Comments
Post a Comment