Creating a large amount of test data can take a lot of effort, and to simulate a more realistic scenario, it's good to have a large number of tables with distinct column types. This script generates random table schemas for Hive.
If you want to set up a Hive environment for dev and test purposes, take a look at: https://dev.to/mesmacosta/quickly-set-up-a-hive-environment-on-gcp-38j8
Environment
Create and activate your virtualenv
pip install --upgrade virtualenv
python3 -m virtualenv --python python3 env
source ./env/bin/activate
Install the requirements for the metadata generator
pip install -r requirements.txt
Code
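As a rough illustration of the idea, and not the actual contents of metadata_generator.py, random Hive table schemas could be built along these lines. The type pool, the naming scheme, and the helper names (`random_name`, `random_create_table`) are all hypothetical; the sketch only prints the statements and does not connect to Hive.

```python
import random
import string

# Hypothetical pool of Hive column types; the real script may use a different set.
HIVE_TYPES = ["INT", "BIGINT", "DOUBLE", "STRING", "BOOLEAN", "TIMESTAMP", "DECIMAL(10,2)"]


def random_name(prefix, rng, length=8):
    """Return a lowercase identifier made of the prefix plus random letters."""
    suffix = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
    return f"{prefix}_{suffix}"


def random_create_table(rng, min_cols=2, max_cols=6):
    """Build one CREATE TABLE statement with randomly typed columns."""
    table = random_name("tbl", rng)
    num_cols = rng.randint(min_cols, max_cols)
    # The index prefix keeps column names unique within the table.
    columns = [
        f"  col{i}_{random_name('c', rng, 4)} {rng.choice(HIVE_TYPES)}"
        for i in range(num_cols)
    ]
    return f"CREATE TABLE IF NOT EXISTS {table} (\n" + ",\n".join(columns) + "\n)"


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so runs are repeatable
    for _ in range(3):
        print(random_create_table(rng))
        print()
```

Passing a seeded `random.Random` instance makes the generated schemas reproducible, which helps when a test run needs to be repeated against the same set of tables.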
Execution
export HIVE_SERVER=127.0.0.1
export HIVE_USERNAME=hive
export HIVE_PORT=10000
export HIVE_DATABASE=default
python metadata_generator.py \
  --hive-host=$HIVE_SERVER \
  --hive-user=$HIVE_USERNAME \
  --hive-port=$HIVE_PORT \
  --hive-database=$HIVE_DATABASE
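For readers curious how flags like these are typically consumed, here is a minimal sketch assuming the script uses Python's standard `argparse` module. Only the four flag names come from the command above; the `parse_args` helper, the defaults, and which flags are required are assumptions, not the script's confirmed behavior.

```python
import argparse


def parse_args(argv=None):
    """Parse the four Hive connection flags shown in the command above."""
    parser = argparse.ArgumentParser(description="Generate random Hive table schemas.")
    # Required/default choices below are assumptions made for this sketch.
    parser.add_argument("--hive-host", required=True, help="Hive server host")
    parser.add_argument("--hive-user", required=True, help="Hive username")
    parser.add_argument("--hive-port", type=int, default=10000, help="Hive server port")
    parser.add_argument("--hive-database", default="default", help="target database")
    return parser.parse_args(argv)


if __name__ == "__main__":
    # Example values matching the exported environment variables above.
    args = parse_args([
        "--hive-host=127.0.0.1",
        "--hive-user=hive",
        "--hive-port=10000",
        "--hive-database=default",
    ])
    print(f"host={args.hive_host} port={args.hive_port} "
          f"user={args.hive_user} database={args.hive_database}")
```

`argparse` turns `--hive-host` into the attribute `args.hive_host`, so the dashed flag names map directly onto Python identifiers.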
And that's it!
If you have difficulties, don't hesitate to reach out. I would love to help you!