How to Create a Table in Hive

Introduction

Apache Hive is a data warehousing tool used to perform queries and analyze structured data in Apache Hadoop. It uses a SQL-like language called HiveQL.

In this article, learn how to create a table in Hive and load data. We will also show you crucial HiveQL commands to display data.

How to create a table in Hive.

Prerequisites

  • A system running Linux
  • A user account with sudo or root privileges
  • Access to a terminal window/command line
  • Working Hadoop installation
  • Working Hive installation

Note: Follow our instructions for installing Hadoop and installing Hive on Ubuntu if you have not installed it before.

Create and Load Table in Hive

A table in Hive is a set of data that uses a schema to sort the data by given identifiers.

The general syntax for creating a table in Hive is:

CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
(col_name data_type [COMMENT 'col_comment'],, ...)
[COMMENT 'table_comment']
[ROW FORMAT row_format]
[FIELDS TERMINATED BY char]
[STORED AS file_format];

Follow the steps below to create a table in Hive.

Step 1: Create a Database

1. Create a database named “company” by running the create command:

create database company;

The terminal prints a confirmation message and the time needed to perform the action.

2. Next, verify the database is created by running the show command:

show databases;

3. Find the “company” database in the list:

Creating a database in Hive.

4. Open the “company” database by using the following command:

use company;
Selecting database in Hive.

Step 2: Create a Table in Hive

The “company” database does not contain any tables after initial creation. Let’s create a table whose identifiers will match the .txt file you want to transfer data from.

1. Create an “employees.txt” file in the /hdoop directory. The file shall contain data about employees:

Creating the text file used to load the data in the table/

2. Arrange the data from the “employees.txt” file in columns. The column names in our example are:

  • ID
  • Name
  • Country
  • Department
  • Salary

3. Use column names when creating a table. Create the table by running the following command:

create table employees (id int, name string, country string, department string, salary int)

4. Create a logical schema that arranges data from the .txt file to the corresponding columns. In the “employees.txt” file, data is separated by a '-'. To create a logical schema type:

row format delimited fields terminated by '-';

The terminal prints out a confirmation message:

Creating table in Hive.

5. Verify if the table is created by running the show command:

show tables;
Verifying the table creation.

Step 3: Load Data From a File

You have created a table, but it is empty because data is not loaded from the “employees.txt” file located in the /hdoop directory.

1. Load data by running the load command:

load data inpath '/hdoop/employees.txt' overwrite into table employees;

2. Verify if the data is loaded by running the select command:

select * from employees;

The terminal prints out data imported from the employees.txt file:

Loading data into the table.

Display Hive Data

You have several options for displaying data from the table. By using the following options, you can manipulate large amounts of data more efficiently.

Display Columns

Display columns of a table by running the desc command:

desc employees;

The output displays the names and properties of the columns:

Displaying column information

Display Selected Data

Let’s assume that you want to display employees and their countries of origin. Select and display data by running the select command:

select name,country from employees;

The output contains the list of employees and their countries:

Selecting specific data for displaying.

Conclusion

After reading this article, you should have learned how to create a table in Hive and load data into it.

There is also a method of creating an external table in Hive. In Hive terminology, external tables are tables not managed with Hive. Their purpose is to facilitate importing of data from an external file into the metastore.

Working in Hive and Hadoop is beneficial for manipulating big data. Next, we recommend our article about Hadoop architecture to learn more about how Hadoop functions.

原创文章,作者:3628473679,如若转载,请注明出处:https://blog.ytso.com/223774.html

(0)
上一篇 2022年1月7日
下一篇 2022年1月7日

相关推荐

发表回复

登录后才能评论