
HIVE : CREATE & DROP database

HIVE is a data warehouse system built on top of Hadoop; it is not a database.

Just as it is important to understand when to use a particular tool, it is equally important to understand when NOT to use it.

  • HIVE is designed only for large-scale analytical (OLAP) operations; it is not a good fit for transactional (OLTP) operations.
  • Data in HIVE is typically stored de-normalized.
  • HIVE supports JOINs, but we should avoid them as much as we can, because joins on large tables are expensive.
  • HIVE's query language, HQL (HiveQL), is similar to SQL.

Let's understand the relationship between Hadoop (HDFS) and HIVE :

  • HDFS has folders and files
  • HIVE has databases and tables
  • When we create a database in HIVE, it creates a folder in HDFS
  • When we create a table in HIVE, it creates a folder in HDFS (inside the database folder)
  • When we insert records into a HIVE table, those records are saved in HDFS in the form of files (inside the table folder)
  • The delimiter is very important while creating a table in HIVE, because it tells HIVE how to split each line of a file into columns
  • A delimiter can be a comma, tab, etc.
  • HIVE can store structured, semi-structured & un-structured data, but un-structured data must first be converted into a format HIVE understands using a SerDe (Serializer/Deserializer). We will see more about it in further blogs on HIVE


Table in HIVE :

    C1    C2    C3
    11    12    13
    21    22    23

How it is stored in HDFS as a file (comma delimited) :

    11,12,13
    21,22,23
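To see this table-to-file mapping for yourself, here is a minimal HiveQL sketch for the HIVE CLI. It relies on a few assumptions. The default warehouse directory is `/user/hive/warehouse`, which your cluster may change through `hive.metastore.warehouse.dir`. The database name `demo_db` and the table name `numbers` are hypothetical. The CLI's `dfs` command is used to run HDFS shell commands.

```sql
-- Hypothetical database and table, used only for illustration
CREATE DATABASE IF NOT EXISTS demo_db;

-- FIELDS TERMINATED BY ',' makes HIVE write (and read) each row as comma-separated values
CREATE TABLE IF NOT EXISTS demo_db.numbers (
  c1 INT,
  c2 INT,
  c3 INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Insert the two rows shown above (INSERT ... VALUES needs Hive 0.14 or later)
INSERT INTO TABLE demo_db.numbers VALUES (11, 12, 13), (21, 22, 23);

-- Print the data file(s) inside the table folder (assumes the default warehouse location)
dfs -cat /user/hive/warehouse/demo_db.db/numbers/*;
```

The wildcard is used because HIVE chooses the data file names itself. The output should show the rows as comma-separated lines, like the file view above.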


Note :

Always write your HIVE queries in a text editor (such as Notepad) and copy-paste them into the HIVE prompt. The HIVE prompt does not let you go back and edit a query (especially a multi-line one) if there is a mistake in it.



Create a Database in HIVE :

Syntax 1 :

CREATE DATABASE IF NOT EXISTS <DB_NAME>;
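A minimal usage sketch of Syntax 1 in the HIVE CLI. It assumes the default warehouse directory `/user/hive/warehouse`; your cluster may set `hive.metastore.warehouse.dir` to a different path.

```sql
-- Create the database only if it does not already exist (no error if it does)
CREATE DATABASE IF NOT EXISTS sample1;

-- Confirm it is listed
SHOW DATABASES LIKE 'sample*';

-- Without LOCATION, HIVE creates the folder <hive.metastore.warehouse.dir>/sample1.db
SET hive.metastore.warehouse.dir;
dfs -ls /user/hive/warehouse;

-- Make sample1 the current database for the following queries
USE sample1;
```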


Syntax 2 :

CREATE DATABASE IF NOT EXISTS <DB_NAME>
COMMENT 'COMMENT ON DATABASE'
LOCATION 'PATH OF THE DATABASE LOCATION'
WITH DBPROPERTIES ('KEY1' = 'VALUE1', 'KEY2' = 'VALUE2', ...)
;


You can use LOCATION to explicitly specify the HDFS folder where you want your database to be created :

Example :

create database IF NOT EXISTS sample3
COMMENT 'sample 3 database'
LOCATION '/hive/sample3'
;



The location path must be a folder; it won't work with a file path, because HIVE saves databases and tables as folders and records as files.
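A quick way to check where sample3 was created, sketched for the HIVE CLI. It assumes the sample3 example above ran successfully.

```sql
-- Shows the comment and the HDFS location recorded for sample3
DESCRIBE DATABASE sample3;

-- The LOCATION folder should now exist in HDFS
dfs -ls /hive;
```

The `dfs -ls /hive` listing should include `/hive/sample3`, not a folder under the default warehouse directory.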


We can also add some properties as shown below :

Example :

create database IF NOT EXISTS sample4
COMMENT 'sample 4 database'
LOCATION '/hive/sample4'
WITH DBPROPERTIES('key1' = 'value1', 'key2' = 'value2')
;

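These properties are stored as key-value metadata along with the database definition in the HIVE metastore. They can be useful when working with other systems like Spark or NoSQL databases, for example to pass along some access information.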


Using the EXTENDED keyword while describing a database prints additional information, such as the key-value properties we set with WITH DBPROPERTIES while creating the sample4 database.

Describe database EXTENDED sample4; 
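To see exactly what EXTENDED adds, you can run both forms one after the other. This is a sketch that assumes sample4 was created as shown above. The ALTER statement is an optional extra, and `key3`/`value3` are hypothetical.

```sql
-- Basic description: name, comment, location and owner
DESCRIBE DATABASE sample4;

-- EXTENDED also prints the DBPROPERTIES (key1/value1, key2/value2)
DESCRIBE DATABASE EXTENDED sample4;

-- Properties can also be added or changed after creation
ALTER DATABASE sample4 SET DBPROPERTIES ('key3' = 'value3');
DESCRIBE DATABASE EXTENDED sample4;
```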



Note :

  • WHEN WE CREATE A DATABASE / TABLE, THE LOCATION PATH MUST BE A FOLDER PATH, NOT A FILE PATH
    • IF THE FOLDER PATH EXISTS, HIVE WILL USE IT
    • IF THE FOLDER PATH DOES NOT EXIST, HIVE WILL CREATE IT
    • IF THE PATH POINTS TO AN EXISTING FILE, HIVE WILL THROW AN ERROR
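The three rules above can be tried out in the HIVE CLI with a sketch like the following. All database names and HDFS paths here are hypothetical. The exact error text for the third case depends on your HIVE version.

```sql
-- Case 1: the folder already exists, so HIVE uses it
dfs -mkdir -p /hive/existing_dir;
CREATE DATABASE IF NOT EXISTS loc_demo1 LOCATION '/hive/existing_dir';

-- Case 2: the folder does not exist yet, so HIVE creates it
CREATE DATABASE IF NOT EXISTS loc_demo2 LOCATION '/hive/new_dir';
dfs -ls /hive;

-- Case 3: the path is an existing (empty) file, so CREATE DATABASE should fail
dfs -touchz /hive/some_file.txt;
CREATE DATABASE IF NOT EXISTS loc_demo3 LOCATION '/hive/some_file.txt';
```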



DROP DATABASE in HIVE :

Syntax :

  • DROP DATABASE <DATABASE_NAME>;
  • DROP DATABASE IF EXISTS <DATABASE_NAME>;
  • DROP DATABASE IF EXISTS <DATABASE_NAME> RESTRICT;
  • DROP DATABASE IF EXISTS <DATABASE_NAME> CASCADE;

Examples :

  • DROP database IF EXISTS sample1;
  • DROP database IF EXISTS sample1 RESTRICT; ==> Fails if the database still contains tables. RESTRICT is the default behaviour, so a plain DROP DATABASE behaves the same way
  • DROP database IF EXISTS sample1 CASCADE; ==> This is like a force drop: it drops all the tables in sample1 first and then drops the database itself
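The difference between RESTRICT and CASCADE is easiest to see on a database that contains a table. Here is a minimal sketch; the names `drop_demo` and `t1` are hypothetical.

```sql
-- Hypothetical database with one table in it
CREATE DATABASE IF NOT EXISTS drop_demo;
CREATE TABLE IF NOT EXISTS drop_demo.t1 (id INT);

-- Fails: RESTRICT (also the default) refuses to drop a non-empty database
DROP DATABASE IF EXISTS drop_demo RESTRICT;

-- Succeeds: CASCADE drops the tables first, then the database
DROP DATABASE IF EXISTS drop_demo CASCADE;

-- drop_demo should no longer be listed
SHOW DATABASES LIKE 'drop_demo';
```

Run these one at a time at the prompt. In a script, the failing RESTRICT drop would normally stop execution. Also note that CASCADE deletes the data of managed tables; for external tables, only the metadata is removed.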


We will see more about creating tables in the next blog. Have a great day!



Arun Mathe

Gmail ID : arunkumar.mathe@gmail.com




It is extremely important to have a deep knowledge while designing a machine learning model, otherwise we will end up creating ML models which are of no use. We have to have a clear understanding on certain techniques to confidently build a ML model, train it using "training data", finalize the model and to deploy it in production. So far, from blog #1, #2, we have seen about the fundamentals of Deep Learning and Neural Network, architecture of a Neural Network, internal layers and components etc.  Providing the links of Blogs #1 , #2 below for quick reference. Deep Learning & Neural Networks : https://arunsdatasphere.blogspot.com/2026/01/deep-learning-and-neural-networks.html Building a real world neural network: A practical usecase explained : https://arunsdatasphere.blogspot.com/2026/01/building-real-world-neural-network.html Now let's dive through below concepts/criteria to help gaining confidence on building your ML model: Activation Functions (Forward Propaga...