
Hive-1.2.0 Study Notes (2): Hive Data Types

Published: 2020-07-12 16:09:36  Source: web  Views: 2972  Author: luchunli1985  Category: Big Data

Lu Chunli's work notes. Who says a programmer can't have a touch of artistic flair?



Create a mywork database in Hive; the tests that follow run in this database so we do not have to use the default database every time.

hive> create database mywork;
OK
Time taken: 0.487 seconds
hive> show databases;
OK
default
mywork
Time taken: 0.614 seconds, Fetched: 2 row(s)
hive> use mywork;
OK
Time taken: 0.064 seconds
hive> create table student(id int, name string);
OK
Time taken: 0.519 seconds
hive>

Viewing Hive's storage on HDFS:

[hadoop@dnode1 ~]$ hdfs dfs -ls -R /user/hive
drwxrw-rw-   - hadoop hadoop          0 2015-12-08 21:37 /user/hive/warehouse
drwxrw-rw-   - hadoop hadoop          0 2015-12-08 21:36 /user/hive/warehouse/mywork.db
drwxrw-rw-   - hadoop hadoop          0 2015-12-08 21:36 /user/hive/warehouse/mywork.db/student
[hadoop@dnode1 ~]$


Hive supports the following data types:

    Primitive types:

TINYINT        1 byte
SMALLINT       2 bytes
INT            4 bytes
BIGINT         8 bytes
BOOLEAN        true/false
FLOAT          4 bytes
DOUBLE         8 bytes
STRING         character string
BINARY         (available since Hive 0.8.0)
TIMESTAMP      (available since Hive 0.8.0)

    Complex types:

arrays:  ARRAY<data_type>                 ordered fields, all of the same type
maps:    MAP<primitive_type, data_type>   unordered key/value pairs
structs: STRUCT<col_name : data_type [COMMENT col_comment], ...>   a set of named fields
union:   UNIONTYPE<data_type, data_type, ...>

    Note: ARRAY values are accessed by index (e.g. arrays[0]), MAP values by key (e.g. map['key_name']), and STRUCT fields with dot notation (e.g. structs.col_name).


Example of creating a table:

hive> create table employee (
    > eno int comment 'the no of employee',
    > ename string comment 'name of employee',
    > salary float comment 'salary of employee',
    > subordinates array<string> comment 'employees managed by current employee',
    > deductions map<string, float> comment 'deductions',
    > address struct<province : string, city : string, street : string, zip : int> comment 'address'
    > ) comment 'This is table of employee info';
OK
Time taken: 0.33 seconds
hive>

In Hive, distinct delimiters separate the columns of a row and the elements inside complex types; each line of the data file corresponds to one record.

By default these delimiters are:

\n          separates rows (one record per line)
^A (\001)   separates columns (fields)
^B (\002)   separates the elements of an ARRAY or STRUCT, and the key/value pairs of a MAP
^C (\003)   separates a MAP key from its value


Create a file data_default.txt under the ${HIVE_HOME}/data directory, using the default delimiters. Its content is:

[Screenshot: content of data_default.txt; with the control characters made visible it is the single line
1000^Azhangsan^A5000.0^Alisi^Bwangwu^Aptax^C200^Bpension^C200^Ashandong^Bheze^Bdingtao^B274106]

Hive's default field delimiter is the ASCII control character \001; when creating a table it corresponds to fields terminated by '\001'. To enter this character while editing the data file in vi, press Ctrl+V and then Ctrl+A, which inserts \001 (displayed as ^A). Likewise, \002 is entered with Ctrl+V then Ctrl+B, and so on.

The fields are:

1000                                   employee number
zhangsan                               employee name
5000.0                                 employee salary
lisi^Bwangwu                           subordinate employees
ptax^C200^Bpension^C200                salary deductions (e.g. taxes)
shandong^Bheze^Bdingtao^B274106        home address (for a struct, only the values are given)
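Typing control characters by hand in vi is error-prone, so it can be simpler to generate the file programmatically. A minimal Python sketch that writes the record above with the default delimiters (the file name data_default.txt follows the text; the helper variable names are mine):

```python
# Write one record using Hive's default text delimiters:
# \001 (^A) between fields, \002 (^B) between collection items,
# \003 (^C) between a map key and its value.
FIELD, ITEM, KV = "\001", "\002", "\003"

record = FIELD.join([
    "1000",                                                 # eno
    "zhangsan",                                             # ename
    "5000.0",                                               # salary
    ITEM.join(["lisi", "wangwu"]),                          # subordinates (array)
    ITEM.join([f"ptax{KV}200", f"pension{KV}200"]),         # deductions (map)
    ITEM.join(["shandong", "heze", "dingtao", "274106"]),   # address (struct)
])

with open("data_default.txt", "w") as f:
    f.write(record + "\n")
```

The resulting file is 83 bytes, which matches the totalSize reported by Hive after the load below.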


Loading the data:

hive> load data local inpath 'data/data_default.txt' into table employee;
Loading data to table mywork.employee
Table mywork.employee stats: [numFiles=1, numRows=0, totalSize=83, rawDataSize=0]
OK
Time taken: 0.426 seconds
hive> select * from employee;
OK
1000    zhangsan        5000.0  ["lisi","wangwu"]       {"ptax":200.0,"pension":200.0}  {"province":"shandong","city":"heze","street":"dingtao","zip":274106}
Time taken: 0.114 seconds, Fetched: 1 row(s)
hive>
# For complex types, individual elements are queried as follows:
hive> select eno, ename, salary, subordinates[0], deductions['ptax'], address.province from employee;
OK
1000    zhangsan        5000.0  lisi    200.0   shandong
Time taken: 0.129 seconds, Fetched: 1 row(s)
hive>


Viewing the data layout on HDFS:

[hadoop@dnode1 ~]$ hdfs dfs -ls -R /user/hive/warehouse/
drwxrw-rw-   - hadoop hadoop          0 2015-12-09 00:00 /user/hive/warehouse/mywork.db
drwxrw-rw-   - hadoop hadoop          0 2015-12-09 00:00 /user/hive/warehouse/mywork.db/employee
-rwxrw-rw-   2 hadoop hadoop         83 2015-12-09 00:00 /user/hive/warehouse/mywork.db/employee/data_default.txt
drwxrw-rw-   - hadoop hadoop          0 2015-12-08 23:03 /user/hive/warehouse/mywork.db/student
[hadoop@dnode1 ~]$ hdfs dfs -text /user/hive/warehouse/mywork.db/employee/data_default.txt
1000zhangsan5000.0lisiwangwuptax200pension200shandonghezedingtao274106
[hadoop@dnode1 ~]$

(The \001/\002/\003 control characters are invisible in the terminal, so the fields appear run together in the -text output.)



Custom delimiters:

hive> create table employee_02 (
    > eno int comment 'the no of employee',
    > ename string comment 'name of employee',
    > salary float comment 'salary of employee',
    > subordinates array<string> comment 'employees managed by current employee',
    > deductions map<string, float> comment 'deductions',
    > address struct<province : string, city : string, street : string, zip : int> comment 'address'
    > ) comment 'This is table of employee info'
    > row format delimited fields terminated by '\t'
    > collection items terminated by ','
    > map keys terminated by ':'
    > lines terminated by '\n';
OK
Time taken: 0.228 seconds
hive> load data local inpath 'data/data_employee02.txt' into table employee_02;
Loading data to table mywork.employee_02
Table mywork.employee_02 stats: [numFiles=1, totalSize=99]
OK
Time taken: 0.371 seconds
hive> select * from employee_02;
OK
1000    'zhangsan'      5000.0  ["'lisi'","'wangwu'"]   {"'ptax'":200.0,"'pension'":200.0}      {"province":"'shandong'","city":"'heze'","street":"'dingtao'","zip":274106}
Time taken: 0.101 seconds, Fetched: 1 row(s)
hive>


The content of the data/data_employee02.txt file is:

[hadoop@nnode data]$ pwd
/usr/local/hive1.2.0/data
[hadoop@nnode data]$ cat data_employee02.txt
1000    'zhangsan'      5000.0  'lisi','wangwu' 'ptax':200,'pension':200        'shandong','heze','dingtao',274106
[hadoop@nnode data]$

Note: the text file contains single quotes. After the load, Hive displays the values wrapped in double quotes because the single quotes were treated as part of the field or value, not as quoting syntax. Keep this in mind when preparing data files.
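This behavior follows from the fact that Hive's delimited-text deserialization splits only on the configured delimiters and does not interpret quote characters. The same splitting logic can be sketched in Python on the record above (the variable names are mine, purely for illustration):

```python
# One record from data_employee02.txt, with the custom delimiters:
# '\t' between fields, ',' between collection items, ':' between map key and value.
line = ("1000\t'zhangsan'\t5000.0\t'lisi','wangwu'\t"
        "'ptax':200,'pension':200\t'shandong','heze','dingtao',274106")

fields = line.split("\t")                  # fields terminated by '\t'
subordinates = fields[3].split(",")        # collection items terminated by ','
deductions = dict(kv.split(":") for kv in fields[4].split(","))  # map keys terminated by ':'

print(fields[1])      # 'zhangsan' -- the single quotes stay inside the value
print(subordinates)   # the quotes stay inside each array element too
print(deductions)     # and inside each map key
```

Splitting never strips the quotes, which is exactly why the select output shows values like "'zhangsan'".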


Viewing the detailed table definition:

# Table created with default settings
hive> describe formatted employee;
OK
# col_name              data_type               comment
eno                     int                     the no of employee
ename                   string                  name of employee
salary                  float                   salary of employee
subordinates            array<string>           employees managed by current employee
deductions              map<string,float>       deductions
address                 struct<province:string,city:string,street:string,zip:int>      address

# Detailed Table Information
Database:               mywork
Owner:                  hadoop
CreateTime:             Tue Dec 08 23:10:07 CST 2015
LastAccessTime:         UNKNOWN
Protect Mode:           None
Retention:              0
Location:               hdfs://cluster/user/hive/warehouse/mywork.db/employee
Table Type:             MANAGED_TABLE
Table Parameters:
        COLUMN_STATS_ACCURATE   true
        comment                 This is table of employee info
        numFiles                1
        numRows                 0
        rawDataSize             0
        totalSize               83
        transient_lastDdlTime   1449590423

# Storage Information
SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat:            org.apache.hadoop.mapred.TextInputFormat
OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed:             No
Num Buckets:            -1
Bucket Columns:         []
Sort Columns:           []
Storage Desc Params:
        serialization.format    1
Time taken: 0.098 seconds, Fetched: 37 row(s)

# Table created with custom delimiters
hive> describe formatted employee_02;
OK
# col_name              data_type               comment
eno                     int                     the no of employee
ename                   string                  name of employee
salary                  float                   salary of employee
subordinates            array<string>           employees managed by current employee
deductions              map<string,float>       deductions
address                 struct<province:string,city:string,street:string,zip:int>      address

# Detailed Table Information
Database:               mywork
Owner:                  hadoop
CreateTime:             Wed Dec 09 00:12:53 CST 2015
LastAccessTime:         UNKNOWN
Protect Mode:           None
Retention:              0
Location:               hdfs://cluster/user/hive/warehouse/mywork.db/employee_02
Table Type:             MANAGED_TABLE
Table Parameters:
        COLUMN_STATS_ACCURATE   true
        comment                 This is table of employee info
        numFiles                1
        totalSize               99
        transient_lastDdlTime   1449591260

# Storage Information
SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat:            org.apache.hadoop.mapred.TextInputFormat
OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed:             No
Num Buckets:            -1
Bucket Columns:         []
Sort Columns:           []
Storage Desc Params:
        colelction.delim        ,
        field.delim             \t
        line.delim              \n
        mapkey.delim            :
        serialization.format    \t
Time taken: 0.116 seconds, Fetched: 39 row(s)
hive>


Open issue:

hive> delete from student;
FAILED: SemanticException [Error 10294]: Attempt to do update or delete using transaction manager that does not support these operations.
hive>
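The DELETE fails because Hive's default transaction manager does not support ACID operations. As a sketch (based on the Hive transactions documentation for the 1.x line; verify the property names against your installation), UPDATE/DELETE require a transactional table (stored as ORC, bucketed, created with 'transactional'='true') plus ACID settings in hive-site.xml along these lines:

```xml
<!-- hive-site.xml: minimal ACID-related settings (sketch; verify for your Hive version) -->
<property>
  <name>hive.support.concurrency</name>
  <value>true</value>
</property>
<property>
  <name>hive.txn.manager</name>
  <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
</property>
<property>
  <name>hive.compactor.initiator.on</name>
  <value>true</value>
</property>
<property>
  <name>hive.compactor.worker.threads</name>
  <value>1</value>
</property>
```

For a plain text table like student, truncate table student; (or dropping and re-creating the table) is the usual workaround for clearing all rows.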


Caveat: if a SQL statement pasted into the Hive CLI contains literal tab characters, the following problem occurs:

[Screenshot of the resulting error]

