
Sub command: Custom

Faisal Ali edited this page Jul 15, 2021 · 9 revisions

Introduction

The mock data tool generates data based on the datatype of each column; it cannot tell whether a column holds a name, an email address, and so on. The custom subcommand hands that control to the user, who decides how each column of a table is mocked, i.e.

  1. User can pick which columns to skip and let the mock data tool decide the best data for them
  2. User can control what kind of data goes into a column, i.e. feed in a custom dataset from which values are picked randomly during mocking
  3. User can select from the list of supported realistic data keys

NOTE: The full list of realistic keys is documented on a separate page.

The custom subcommand gives the user a file containing a plan of how data will be loaded into each column. The file can then be modified and fed back to the tool to control the dataset being mocked.

Short Hand: The short hand of the custom subcommand is c

Preference Order

There are 3 ways to load data using the custom subcommand:

  1. User provided dataset
  2. Realistic dataset
  3. Random dataset

When two or more options are set for a column, the data source is chosen in the order listed above, i.e. a user-provided dataset is given preference over a realistic dataset, which in turn is given preference over random data.
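The preference order can be sketched in a few lines of Python. This is only an illustration of the rule described above, not the tool's actual implementation; the function name `pick_source` is hypothetical, and the column dictionaries simply mirror the `UserData` and `Realistic` keys of the plan file.

```python
def pick_source(column):
    """Return which dataset would feed a column: 'user', 'realistic' or 'random'.

    Hypothetical helper that mirrors the preference order described above.
    """
    if column.get("UserData"):    # a non-empty user-provided list wins
        return "user"
    if column.get("Realistic"):   # otherwise a non-empty realistic key is used
        return "realistic"
    return "random"               # nothing set: fall back to random data


columns = [
    {"Name": "id", "UserData": [], "Realistic": ""},
    {"Name": "name", "UserData": [], "Realistic": "NameFullName"},
    # Both options set: the user-provided dataset takes precedence.
    {"Name": "gender", "UserData": ["M", "F", "O"], "Realistic": "NameGenderAbbrev"},
]

for col in columns:
    print(col["Name"], "->", pick_source(col))
# id -> random
# name -> realistic
# gender -> user
```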

Usage

The usage of the custom subcommand is

[gpadmin@gpdb-m ~]$ mock custom --help
Control the data being written to the tables

Usage:
  mock custom [flags]

Aliases:
  custom, c

Flags:
  -f, --file string         Mock the tables provided in the yaml file
  -h, --help                help for custom
  -t, --table-name string   Provide the table name whose skeleton need to be copied to the file

Global Flags:
  -a, --address string    Hostname where the postgres database lives
  -d, --database string   Database to mock the data (default "gpadmin")
  -q, --dont-prompt       Run without asking for confirmation
  -i, --ignore            Ignore checking and fixing constraints
  -w, --password string   Password for the user to connect to database
  -p, --port int          Port number of the postgres database (default 3000)
  -r, --rows int          Total rows to be faked or mocked (default 10)
  -u, --username string   Username to connect to the database
  -v, --verbose           Enable verbose or debug logging

Example

As indicated above, you can control the data loaded into a table in three ways. Each one is covered in its own section below.

User Generated Dataset

  • Let's take an example of a table that has a check constraint (e.g. a partitioned table in a Greenplum database, or a table of your own in a Postgres database)
  • Now let's build a plan for this table
    mock custom --table-name sales
    -- OR --
    mock c -t sales

    NOTE:

    • If the table is not in the default public schema, use mock c -t <schema-name>.<table-name>
    • If you want to generate a plan for multiple tables, use mock c -t <schema-name1>.<table-name1>,<schema-name2>.<table-name2>...<schema-nameN>.<table-nameN>
  • Once the plan is generated, the location of the yaml file is printed at the end
    The YAML is saved to file: <PATH>/<FILENAME>
  • Edit the file generated using any text editor of your choice
    • For each column you want to take control of, add the array of values the mock data tool should randomly pick from under the UserData key. For example, we take control of the date column below
      Custom:
        - Schema: public
          Table: sales
          Column:
            - Name: id
              Type: integer
              UserData: []
              Realistic: ""
            - Name: date
              Type: date
              UserData:
                - 2016-01-01
                - 2016-03-01
                - 2016-04-01
              Realistic: ""
            - Name: amt
              Type: numeric(10,2)
              UserData: []
              Realistic: ""
    • Continue this procedure for the rest of the columns you are interested in
  • Using the custom generated plan, feed the yaml to the mock tool
    mock custom --file <filename or path/filename>
    -- OR --
    mock c -f <filename or path/filename>
  • If you want more rows, use the rows flag
    mock custom --file <filename or path/filename> --rows <total rows number>
    -- OR --
    mock c -f <filename or path/filename> -r <total rows number>
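When a column needs many values (for example one date per partition), typing the UserData list by hand gets tedious. A minimal sketch, assuming you want the first day of each month in a range, is shown below. The helper names `month_starts` and `as_yaml_list` are hypothetical and not part of the mock tool; the output is meant to be pasted under the UserData key of the plan file, with the indent adjusted to match your file.

```python
from datetime import date


def month_starts(year, first_month, last_month):
    """Return ISO dates for the first day of each month in the given range."""
    return [date(year, m, 1).isoformat() for m in range(first_month, last_month + 1)]


def as_yaml_list(values, indent=8):
    """Format values as YAML list items, one per line, with the given indent."""
    pad = " " * indent
    return "\n".join(f"{pad}- {v}" for v in values)


values = month_starts(2016, 1, 4)
print("UserData:")
print(as_yaml_list(values, indent=2))
# UserData:
#   - 2016-01-01
#   - 2016-02-01
#   - 2016-03-01
#   - 2016-04-01
```

Values that fall outside the table's check constraint would still be rejected by the database, so the generated range should match the partitions that actually exist.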

Realistic Dataset

  • Let's create a table, e.g.
    CREATE TABLE employee (
        name VARCHAR(100),
        email VARCHAR(120),
        mobile VARCHAR(50),
        gender VARCHAR(2),
        address VARCHAR(500)
    );
  • Let's generate a plan for the table
    mock custom --table-name employee
    -- OR --
    mock c -t employee
  • Edit the yaml generated by the above command to include realistic keys as shown below. The complete list of available realistic keys is defined in the tool's source code
    Custom:
      - Schema: public
        Table: employee
        Column:
          - Name: name
            Type: character varying(100)
            UserData: []
            Realistic: "NameFullName"
          - Name: email
            Type: character varying(120)
            UserData: []
            Realistic: "InternetEmail"
          - Name: mobile
            Type: character varying(50)
            UserData: []
            Realistic: "PhoneNumberString"
          - Name: gender
            Type: character varying(2)
            UserData: []
            Realistic: "NameGenderAbbrev"
          - Name: address
            Type: character varying(500)
            UserData: []
            Realistic: "AddressString"
  • Using the custom generated plan, feed the yaml to the mock tool
    mock custom --file <filename or path/filename>
    -- OR --
    mock c -f <filename or path/filename>
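Realistic keys are plain strings, so a typo is easy to miss. The sketch below flags keys in a plan that are not among the ones shown on this page. It is an assumption-laden illustration: the set `KEYS_SEEN_ON_THIS_PAGE` covers only the five keys used above (the tool supports more), the helper `unfamiliar_keys` is hypothetical, and the plan is written as a Python dict that mirrors the YAML structure to avoid depending on a YAML parser. How the tool itself reacts to an unknown key is not covered here.

```python
# Only the realistic keys that appear on this page; the tool supports more.
KEYS_SEEN_ON_THIS_PAGE = {
    "NameFullName",
    "InternetEmail",
    "PhoneNumberString",
    "NameGenderAbbrev",
    "AddressString",
}


def unfamiliar_keys(plan, known=KEYS_SEEN_ON_THIS_PAGE):
    """Return (table, column, key) for every Realistic key not in `known`."""
    found = []
    for table in plan["Custom"]:
        for column in table["Column"]:
            key = column["Realistic"]
            if key and key not in known:
                found.append((table["Table"], column["Name"], key))
    return found


# Same shape as the YAML plan, with a deliberate typo in the email key.
plan = {
    "Custom": [{
        "Schema": "public",
        "Table": "employee",
        "Column": [
            {"Name": "name", "Type": "character varying(100)",
             "UserData": [], "Realistic": "NameFullName"},
            {"Name": "email", "Type": "character varying(120)",
             "UserData": [], "Realistic": "InternetEmial"},
        ],
    }]
}

print(unfamiliar_keys(plan))
# [('employee', 'email', 'InternetEmial')]
```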

Random / User Generated / Realistic Dataset

Combining all three, i.e. randomly generated, user-provided and realistic data, gives you many possible ways of loading the data. Let's take an example

  • Let us create a table

    CREATE TABLE employee (
        name VARCHAR(100),
        password_hash VARCHAR(30),
        gender VARCHAR
    );
  • Let's generate a plan for the table

    mock custom --table-name employee
    -- OR --
    mock c -t employee
  • Edit the yaml generated by the above command. Here we will set it up so that

    • name column will be fed by realistic data
    • password_hash column will be generated randomly by the tool
    • gender column will be inserted by user generated dataset

    so our yaml now looks like

    Custom:
      - Schema: public
        Table: employee
        Column:
          - Name: name
            Type: character varying(100)
            UserData: []
            Realistic: "NameFullName"
          - Name: password_hash
            Type: character varying(30)
            UserData: []
            Realistic: ""
          - Name: gender
            Type: character varying
            UserData: ["M", "F", "O"]
            Realistic: ""
  • Using the custom generated plan, feed the yaml to the mock tool

    mock custom --file <filename or path/filename>
    -- OR --
    mock c -f <filename or path/filename>

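To make the mixed plan concrete, here is a rough simulation of how rows for such a table might be assembled. It is a sketch under stated assumptions, not the tool's code: `SAMPLE_FULL_NAMES` is a hypothetical stand-in for whatever the NameFullName key produces, `random_text` stands in for the tool's random generator, and the 30-character limit is taken from the password_hash column above.

```python
import random
import string

# Hypothetical stand-in values for the "NameFullName" realistic key.
SAMPLE_FULL_NAMES = ["Ada Lovelace", "Alan Turing", "Grace Hopper"]


def random_text(max_len, rng):
    """Return random letters, between 1 and max_len characters long."""
    length = rng.randint(1, max_len)
    return "".join(rng.choice(string.ascii_letters) for _ in range(length))


def mock_value(column, rng, max_len=30):
    """Pick a value: user data first, then realistic, then random."""
    if column["UserData"]:
        return rng.choice(column["UserData"])
    if column["Realistic"] == "NameFullName":  # only this key is simulated here
        return rng.choice(SAMPLE_FULL_NAMES)
    return random_text(max_len, rng)


columns = [
    {"Name": "name", "UserData": [], "Realistic": "NameFullName"},
    {"Name": "password_hash", "UserData": [], "Realistic": ""},
    {"Name": "gender", "UserData": ["M", "F", "O"], "Realistic": ""},
]

rng = random.Random(42)  # seeded so repeated runs give the same rows
for _ in range(3):
    row = {col["Name"]: mock_value(col, rng) for col in columns}
    print(row)
```

Each printed row has a name drawn from the sample list, a random string for password_hash, and a gender taken from the user-provided values.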
