Posted on May 6, 2024

A Comprehensive Guide to Extracting Data from MySQL Using Singer ETL

In this guide, we'll walk through the process of extracting data from MySQL using tap-mysql and loading it into target-jsonl. This seamless process ensures efficient data transfer while maintaining integrity.

Step 1: Enable Python Virtual Environment (venv)

Even without administrative access, you can set up a Python virtual environment. Here's how:

Create a new virtual environment:
```
python3 -m venv <file_name> 
```
Navigate to the Script Directory within the created file and activate the virtual environment:
```
.\activate.bat 
```

Now, your virtual environment is active and ready for use.

Step 2: Install `tap-mysql` and `target-jsonl`

Utilize pip to install the necessary packages:

pip install tap-mysql target-jsonl

Step 3: Prepare Configuration Files

tap-mysql requires two input files: config.json and properties.json.

`config.json`:

{ "host": "127.0.0.1", "port": "3306", "user": "root", "password": "root" }

`properties.json`:

{ "streams": [ { "tap_stream_id": "sakila-actor_info", "table_name": "actor_info", "schema": { "properties": { "actor_id": { "inclusion": "available", "minimum": 0, "maximum": 65535, "type": [ "null", "integer" ] }, "first_name": { "inclusion": "available", "maxLength": 45, "type": [ "null", "string" ] }, "last_name": { "inclusion": "available", "maxLength": 45, "type": [ "null", "string" ] }, "film_info": { "inclusion": "available", "maxLength": 65535, "type": [ "null", "string" ] } }, "type": "object" }, "stream": "actor_info", "metadata": [ { "breadcrumb": [], "metadata": { "selected": true, "replication-method": "FULL_TABLE", "selected-by-default": false, "database-name": "sakila", "is-view": true } }, { "breadcrumb": [ "properties", "actor_id" ], "metadata": { "selected-by-default": true, "sql-datatype": "smallint unsigned" } }, { "breadcrumb": [ "properties", "first_name" ], "metadata": { "selected-by-default": true, "sql-datatype": "varchar(45)" } }, { "breadcrumb": [ "properties", "last_name" ], "metadata": { "selected-by-default": true, "sql-datatype": "varchar(45)" } }, { "breadcrumb": [ "properties", "film_info" ], "metadata": { "selected-by-default": true, "sql-datatype": "text" } } ] } ] }

Step 4: Generate `properties.json`

Execute the following command in discover mode to generate catalog.json:

tap-mysql --config config.json --discover > catalog.json

Locate the JSON content of the desired table in catalog.json and copy it into another file. Let's name this file selected_table.json.

In selected_table.json, add the following lines within the curly braces to ensure the table is selected:

{ "streams": [ // Paste the content here ] }

This step ensures that only the selected table is included for extraction.

Step 5: Run `tap-mysql`

Execute the following command:

tap-mysql --config config.json --catalog selected_table.json

Congratulations! You've successfully extracted data from MySQL using tap-mysql.

Step 6: Send Data to `jsonl` Target

Run the following command to send the data to jsonl target:

tap-mysql --config config.json --catalog selected_table.json | target-jsonl

A file with the same name as the table will be created.

Step 7: Convert Output to a DataFrame

Here's an example of how to convert the output to a DataFrame using Python:

import pandas as pd import json data = [] with open("<file_name>.jsonl", "r") as f: for line in f: data.append(json.loads(line)) df = pd.DataFrame(data) print(df.columns)

This step allows for further analysis and manipulation of the extracted data.

By following these steps, you've successfully extracted and transformed data from MySQL into a structured format, ready for analysis and insights.

DEV Community

A Comprehensive Guide to Extracting Data from MySQL Using Singer ETL

Step 1: Enable Python Virtual Environment (venv)

Step 2: Install `tap-mysql` and `target-jsonl`

Step 3: Prepare Configuration Files

`config.json`:

`properties.json`:

Step 4: Generate `properties.json`

Step 5: Run `tap-mysql`

Step 6: Send Data to `jsonl` Target

Step 7: Convert Output to a DataFrame

Top comments (0)

Step 1: Enable Python Virtual Environment (venv)

Step 2: Install tap-mysql and target-jsonl

Step 3: Prepare Configuration Files

config.json:

properties.json:

Step 4: Generate properties.json

Step 5: Run tap-mysql

Step 6: Send Data to jsonl Target

Step 7: Convert Output to a DataFrame

Step 2: Install `tap-mysql` and `target-jsonl`

`config.json`:

`properties.json`:

Step 4: Generate `properties.json`

Step 5: Run `tap-mysql`

Step 6: Send Data to `jsonl` Target