
MaxCompute: ORC external tables

Last Updated: Aug 26, 2025

This topic describes how to create, read from, and write to ORC-format external tables stored in Object Storage Service (OSS).

Prerequisites

  • Ensure your Alibaba Cloud account, RAM user, or RAM role has the required permissions to access OSS external tables. For more information about authorization, see STS authorization for OSS.

  • (Optional) Prepare an OSS bucket, directories, and data files. For more information, see Create buckets, Manage directories, and Simple upload.

    MaxCompute can automatically create directories in OSS. You can read from or write to an external table with a single SQL statement, or by using a UDF, without creating directories manually. The legacy method of creating the directories yourself is still supported.
  • A MaxCompute project has been created. For more information, see Create a MaxCompute project.

    MaxCompute is deployed only in specific regions. To prevent cross-region data connectivity issues, we recommend that you use a bucket in the same region as your MaxCompute project.
  • The Alibaba Cloud account or RAM user has the CreateTable permission on your project. For more information about table operation permissions, see MaxCompute permissions.

Limits

  • The cluster attribute is not supported for OSS external tables.

  • A single file cannot exceed 3 GB. If a file exceeds this limit, we recommend that you split it.

Notes

If the schema in the ORC file is inconsistent with the external table schema:

  • Inconsistent number of columns: If the number of columns in the ORC file is less than the number of columns in the Data Definition Language (DDL) statement of the external table, the system fills the missing columns with NULL values when reading the ORC data. If the number of columns in the ORC file is greater, the extra column data is discarded.

  • Inconsistent column types: MaxCompute allows you to read INT data from an ORC file as the STRING type, but this is not recommended. When you read STRING data as the INT type, values that cannot be parsed as numbers are converted to NULL, while numeric values are read correctly.
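
The following is a minimal sketch that illustrates the column-count rule. It assumes a set of ORC files whose columns are (id, name, score); the table name, column names, and OSS location are hypothetical placeholders.

    -- The DDL declares four columns, one more than the ORC files contain.
    -- On read, the extra column (remark) is filled with NULL.
    -- If the files instead contained a column that the DDL omits, that column would be discarded.
    CREATE EXTERNAL TABLE IF NOT EXISTS orc_schema_demo
    (
        id     BIGINT,
        name   STRING,
        score  DOUBLE,
        remark STRING
    )
    STORED AS orc
    LOCATION 'oss://<endpoint>/<bucket>/orc_schema_demo/';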

Data type support

Note

  • For more information about MaxCompute data types, see Data Types Version 1.0 and Data Types Version 2.0.

  • Java Native Interface (JNI) mode (the Native ORC Reader is not used to read tables): set odps.ext.oss.orc.native=false;. Both read and write operations are supported.

  • Native mode (the Native ORC Reader is used to read tables): set odps.ext.oss.orc.native=true;. Only read operations are supported.
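
The reader mode can be switched at the session level before a query. The following is a minimal sketch; the table name mc_oss_orc_demo is a hypothetical placeholder.

    -- Read through the JNI path (read and write are supported).
    set odps.ext.oss.orc.native=false;
    SELECT * FROM mc_oss_orc_demo LIMIT 10;

    -- Read through the Native ORC Reader (read-only).
    set odps.ext.oss.orc.native=true;
    SELECT * FROM mc_oss_orc_demo LIMIT 10;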

Data type                 | JNI mode support | Native mode support
--------------------------+------------------+--------------------
TINYINT                   | Supported        | Supported
SMALLINT                  | Supported        | Supported
INT                       | Supported        | Supported
BIGINT                    | Supported        | Supported
BINARY                    | Supported        | Supported
FLOAT                     | Supported        | Supported
DOUBLE                    | Supported        | Supported
DECIMAL(precision,scale)  | Supported        | Supported
VARCHAR(n)                | Supported        | Supported
CHAR(n)                   | Supported        | Supported
STRING                    | Supported        | Supported
DATE                      | Supported        | Supported
DATETIME                  | Supported        | Supported
TIMESTAMP                 | Not supported    | Not supported
TIMESTAMP_NTZ             | Not supported    | Not supported
BOOLEAN                   | Supported        | Supported
ARRAY                     | Supported        | Supported
MAP                       | Supported        | Supported
STRUCT                    | Supported        | Supported
JSON                      | Not supported    | Not supported

Supported compression formats

When you read from or write to compressed OSS files, add the with serdeproperties attribute to the CREATE TABLE statement. For more information, see with serdeproperties parameters.

For both read and write operations, ORC files compressed with SNAPPY or ZLIB are supported.

Create an external table

Syntax

For more information about the syntax structure of external tables for each format, see OSS external tables.

  • Basic syntax

    CREATE EXTERNAL TABLE [IF NOT EXISTS] <mc_oss_extable_name>
    (
        <col_name> <data_type>,
        ...
    )
    [COMMENT <table_comment>]
    [PARTITIONED BY (<col_name> <data_type>, ...)]
    STORED AS orc
    LOCATION '<oss_location>';
  • Full syntax

    CREATE EXTERNAL TABLE [IF NOT EXISTS] <mc_oss_extable_name>
    (
        <col_name> <data_type>,
        ...
    )
    [COMMENT <table_comment>]
    [PARTITIONED BY (<col_name> <data_type>, ...)]
    ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
    WITH serdeproperties (
        'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
    )
    STORED AS orc
    LOCATION '<oss_location>'
    tblproperties ('<xxx>'='<yyy>');

Common parameters

For information about common parameters, see Basic syntax parameters.

Unique parameters

Properties for WITH SERDEPROPERTIES

mcfed.orc.schema.resolution

  • Scenario: Add this property when the data in the same OSS external table has different schemas.

  • Description: Specifies how ORC files are parsed. The value name indicates that files are parsed by column name.

  • property_value: name

  • Default value: By column position, which is equivalent to setting 'mcfed.orc.schema.resolution'='position'.
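
For example, the property can be attached when the table is created. The following is a minimal sketch; the table name, column names, and OSS location are hypothetical placeholders.

    CREATE EXTERNAL TABLE IF NOT EXISTS orc_by_name_demo
    (
        id   BIGINT,
        name STRING
    )
    ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
    WITH serdeproperties ('mcfed.orc.schema.resolution'='name')
    STORED AS orc
    LOCATION '<oss_location>';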

Properties for TBLPROPERTIES

mcfed.orc.compress

  • Scenario: Add this property to write ORC data to OSS in a compressed format.

  • Description: ORC compression property. Specifies the compression method for ORC data.

  • property_value: SNAPPY or ZLIB

  • Default value: None

io.compression.codecs

  • Scenario: Add this property if the OSS data file is in Raw-Snappy format.

  • Description: Set this property to enable MaxCompute to read Raw-Snappy compressed data.

  • property_value: com.aliyun.odps.io.compress.SnappyRawCodec

  • Default value: None
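
For example, the compression property is attached in the tblproperties clause. The following is a minimal sketch; the table name, column names, and OSS location are hypothetical placeholders.

    -- Write ORC data to OSS with ZLIB compression.
    CREATE EXTERNAL TABLE IF NOT EXISTS orc_zlib_demo
    (
        id   BIGINT,
        name STRING
    )
    STORED AS orc
    LOCATION '<oss_location>'
    tblproperties ('mcfed.orc.compress'='ZLIB');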

Write data

For more information about the write syntax in MaxCompute, see Write syntax.
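
As a minimal sketch (the table names are hypothetical), both INSERT INTO and INSERT OVERWRITE can target an ORC external table:

    -- Append rows to a partition of a hypothetical partitioned ORC external table.
    INSERT INTO mc_oss_orc_demo PARTITION (dt='20250526') VALUES (1, 'a'), (2, 'b');

    -- Replace the data in the partition with the result of a query on another table.
    INSERT OVERWRITE TABLE mc_oss_orc_demo PARTITION (dt='20250526')
    SELECT id, name FROM mc_internal_src;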

Query and analyze data

Scenarios

This section describes how to create an ORC external table with SNAPPY compression, and then read data from and write data to the table.

Note
  • When you run the following sample code, replace <uid> with your Alibaba Cloud account ID.

  • The role used in the following examples is aliyunodpsdefaultrole. If you want to use a different role, replace aliyunodpsdefaultrole with the name of the target role and grant the target role permissions to access OSS.

  1. Prepare an ORC file that is compressed by using SNAPPY.

    In the oss-mc-test bucket that stores the sample data, create the orc_snappy/dt=20250526 directory hierarchy, and store the SNAPPY-compressed ORC file in the dt=20250526 partition directory.

  2. Create an ORC external table with SNAPPY compression.

    CREATE EXTERNAL TABLE orc_data_type_snappy
    (
        vehicleId INT,
        recordId INT,
        patientId INT,
        calls INT,
        locationLatitute DOUBLE,
        locationLongtitue DOUBLE,
        recordTime STRING,
        direction STRING
    )
    PARTITIONED BY (dt STRING)
    ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
    WITH serdeproperties ('odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole')
    STORED AS ORC
    LOCATION 'oss://oss-cn-hangzhou-internal.aliyuncs.com/oss-mc-test/orc_snappy/'
    tblproperties ('mcfed.orc.compress'='SNAPPY');
  3. Import partition data. If an OSS external table is a partitioned table, you must perform an additional operation to import partition data. For more information, see Syntax for adding partition data to an OSS external table.

    -- Import partition data.
    MSCK REPAIR TABLE orc_data_type_snappy ADD PARTITIONS;
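    -- Alternative (a hedged sketch): instead of repairing the whole table, add the
    -- dt=20250526 partition prepared in step 1 explicitly.
    ALTER TABLE orc_data_type_snappy ADD IF NOT EXISTS PARTITION (dt='20250526');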
  4. Read data from the ORC external table.

    SELECT * FROM orc_data_type_snappy WHERE dt='20250526' LIMIT 10;

    The following result is returned:

    +------------+------------+------------+------------+------------------+-------------------+----------------+------------+------------+
    | vehicleid  | recordid   | patientid  | calls      | locationlatitute | locationlongtitue | recordtime     | direction  | dt         |
    +------------+------------+------------+------------+------------------+-------------------+----------------+------------+------------+
    | 1          | 12         | 76         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:10 | SW         | 20250526   |
    | 1          | 1          | 51         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:00 | S          | 20250526   |
    | 1          | 2          | 13         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:01 | NE         | 20250526   |
    | 1          | 3          | 48         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:02 | NE         | 20250526   |
    | 1          | 4          | 30         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:03 | W          | 20250526   |
    | 1          | 5          | 47         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:04 | S          | 20250526   |
    | 1          | 6          | 9          | 1          | 46.81006         | -92.08174         | 9/14/2014 0:05 | S          | 20250526   |
    | 1          | 7          | 53         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:06 | N          | 20250526   |
    | 1          | 8          | 63         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:07 | SW         | 20250526   |
    | 1          | 9          | 4          | 1          | 46.81006         | -92.08174         | 9/14/2014 0:08 | NE         | 20250526   |
    | 1          | 10         | 31         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:09 | N          | 20250526   |
    +------------+------------+------------+------------+------------------+-------------------+----------------+------------+------------+
  5. Write data to the ORC external table and query the data.

    INSERT INTO orc_data_type_snappy PARTITION (dt='20250526') VALUES (1,16,76,1,46.81006,-92.08174,'9/14/2014 0:10','SW');
    -- Query the newly written data.
    SELECT * FROM orc_data_type_snappy WHERE dt='20250526' AND recordid=16;

    The following result is returned:

    +------------+------------+------------+------------+------------------+-------------------+----------------+------------+------------+
    | vehicleid  | recordid   | patientid  | calls      | locationlatitute | locationlongtitue | recordtime     | direction  | dt         |
    +------------+------------+------------+------------+------------------+-------------------+----------------+------------+------------+
    | 1          | 16         | 76         | 1          | 46.81006         | -92.08174         | 9/14/2014 0:10 | SW         | 20250526   |
    +------------+------------+------------+------------+------------------+-------------------+----------------+------------+------------+