GitHub - samqws-marketing/amzn-ion-hive-serde: https://github.com/amzn/ion-hive-serde.git

Amazon Ion Hive Serde

An Apache Hive SerDe (short for serializer/deserializer) for the Ion file format.

Features

Reads data stored in Ion format, both binary and text.
Supports all Ion types, including nested data structures; see the Type mapping documentation for more information.
Supports flattening of Ion documents through path extraction.
Supports importing shared symbol tables and custom symbol table catalogs.
IonInputFormat and IonOutputFormat are able to handle both Ion binary and Ion text.
Configurable through SerDe properties.

Installation

Download the latest ion-hive-serde-all-<version-number>.jar from https://github.com/amzn/ion-hive-serde/releases and place the JAR in hive/lib, or load it with ADD JAR in Hive. The JAR contains the SerDe and all its dependencies.

To build it locally, run: ./gradlew :serde:singleJar
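
As a rough sketch of the two installation options above, the following shell helpers either copy the JAR into Hive's lib directory or register it for a single session with ADD JAR. The function names, the HIVE_HOME variable, and the /path/to download directory are assumptions made for illustration and are not part of the project; replace <version-number> with an actual release version.

```shell
#!/bin/sh
# Sketch only: helper names and paths are illustrative, not part of the project.

# Build the JAR file name for a given release version.
ion_serde_jar_name() {
  printf 'ion-hive-serde-all-%s.jar\n' "$1"
}

# Option 1: copy the JAR into Hive's lib directory (assumes HIVE_HOME is set).
install_into_hive_lib() {
  jar_path="$1"
  cp "$jar_path" "${HIVE_HOME:?HIVE_HOME must be set}/lib/"
}

# Option 2: register the JAR for one Hive session, then list the loaded JARs.
add_jar_for_session() {
  jar_path="$1"
  hive -e "ADD JAR ${jar_path}; LIST JARS;"
}

# Example usage (uncomment and replace <version-number> with a real release):
# jar="/path/to/$(ion_serde_jar_name '<version-number>')"
# install_into_hive_lib "$jar"
# add_jar_for_session "$jar"
```

Copying into hive/lib makes the SerDe available to every session, while ADD JAR only affects the session that runs it.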

Building

The project is separated into two modules:

serde: the SerDe code and unit tests.
integration-tests: integration tests that use a dockerized Hive installation.

To build only the SerDe code:

./gradlew :serde:build

To build the SerDe including integration tests:

./gradlew build

Integration tests require Docker to be installed, but the build itself takes care of creating, starting, and stopping the necessary containers. See integration-tests/README.md for more information, including how to run the integration tests in your IDE.
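
Because only the full build needs Docker, a small wrapper can choose the Gradle task based on whether a docker executable is on the PATH. This is a convenience sketch rather than anything shipped with the project: the function name choose_gradle_task is hypothetical, and it only checks that the command exists, not that the Docker daemon is running.

```shell
#!/bin/sh
# Hypothetical helper: choose which Gradle task to run.
choose_gradle_task() {
  if command -v docker >/dev/null 2>&1; then
    echo "build"          # full build, including integration tests
  else
    echo ":serde:build"   # SerDe code and unit tests only
  fi
}

# Example usage from the repository root:
# ./gradlew "$(choose_gradle_task)"
```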

Examples

The examples use Ion text for readability, but Ion binary is recommended in production systems for better performance and compression.

Simple query

~$ cat test.ion

{
  name: "foo",
  age: 32
}
{
  name: "bar",
  age: 28
}

$ hadoop fs -put -f test.ion /user/data/test.ion

$ hive

hive> CREATE DATABASE test;

hive> CREATE EXTERNAL TABLE test (
        name STRING,
        age INT
      )
      ROW FORMAT SERDE 'com.amazon.ionhiveserde.IonHiveSerDe'
      STORED AS
        INPUTFORMAT 'com.amazon.ionhiveserde.formats.IonInputFormat'
        OUTPUTFORMAT 'com.amazon.ionhiveserde.formats.IonOutputFormat'
      LOCATION '/user/data';

hive> SELECT * FROM test;
OK

foo 32
bar 28

Flattening

~$ cat test.ion

{
  personal_info: { name: "foo", age: 32 },
  professional_info: { job_title: "software engineer" }
}
{
  personal_info: { name: "bar", age: 28 },
  professional_info: { job_title: "designer" }
}


$ hadoop fs -put -f test.ion /user/data/test.ion

$ hive

hive> CREATE DATABASE test;

hive> CREATE EXTERNAL TABLE test (
        name STRING,
        age INT,
        jobtitle STRING
      )
      ROW FORMAT SERDE 'com.amazon.ionhiveserde.IonHiveSerDe'
      WITH SERDEPROPERTIES (
        "ion.name.path_extractor" = "(personal_info name)",
        "ion.age.path_extractor" = "(personal_info age)",
        "ion.jobtitle.path_extractor" = "(professional_info job_title)"
      )
      STORED AS
        INPUTFORMAT 'com.amazon.ionhiveserde.formats.IonInputFormat'
        OUTPUTFORMAT 'com.amazon.ionhiveserde.formats.IonOutputFormat'
      LOCATION '/user/data';

hive> SELECT * FROM test;
OK

foo 32 software engineer
bar 28 designer
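
The SERDEPROPERTIES entries in this example follow one pattern: the key is ion.<column>.path_extractor, and the value is a parenthesized list of field names leading from the top-level struct to the value. As a small illustration of that naming pattern only, the hypothetical helper below (path_extractor_property is not part of the SerDe) prints one entry for a given column and path.

```shell
#!/bin/sh
# Hypothetical helper: print one SERDEPROPERTIES entry for a column.
# $1 = Hive column name
# $2 = space-separated Ion field names, starting at the top-level struct
path_extractor_property() {
  printf '"ion.%s.path_extractor" = "(%s)"\n' "$1" "$2"
}

path_extractor_property jobtitle "professional_info job_title"
# prints: "ion.jobtitle.path_extractor" = "(professional_info job_title)"
```

When pasting several entries into a CREATE TABLE statement, separate them with commas and leave the last entry without one.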

Contributing

See CONTRIBUTING

License

This library is licensed under the Apache 2.0 License.
