In this guide, we'll walk through the process of extracting data from MySQL using tap-mysql
and loading it into target-jsonl
. This seamless process ensures efficient data transfer while maintaining integrity.
Step 1: Enable Python Virtual Environment (venv)
Even without administrative access, you can set up a Python virtual environment. Here's how:
-
Create a new virtual environment:
python3 -m venv <file_name>
-
Navigate to the Script Directory within the created file and activate the virtual environment:
.\activate.bat
Now, your virtual environment is active and ready for use.
Step 2: Install tap-mysql
and target-jsonl
Utilize pip to install the necessary packages:
pip install tap-mysql target-jsonl
Step 3: Prepare Configuration Files
tap-mysql
requires two input files: config.json
and properties.json
.
config.json
:
{ "host": "127.0.0.1", "port": "3306", "user": "root", "password": "root" }
properties.json
:
{ "streams": [ { "tap_stream_id": "sakila-actor_info", "table_name": "actor_info", "schema": { "properties": { "actor_id": { "inclusion": "available", "minimum": 0, "maximum": 65535, "type": [ "null", "integer" ] }, "first_name": { "inclusion": "available", "maxLength": 45, "type": [ "null", "string" ] }, "last_name": { "inclusion": "available", "maxLength": 45, "type": [ "null", "string" ] }, "film_info": { "inclusion": "available", "maxLength": 65535, "type": [ "null", "string" ] } }, "type": "object" }, "stream": "actor_info", "metadata": [ { "breadcrumb": [], "metadata": { "selected": true, "replication-method": "FULL_TABLE", "selected-by-default": false, "database-name": "sakila", "is-view": true } }, { "breadcrumb": [ "properties", "actor_id" ], "metadata": { "selected-by-default": true, "sql-datatype": "smallint unsigned" } }, { "breadcrumb": [ "properties", "first_name" ], "metadata": { "selected-by-default": true, "sql-datatype": "varchar(45)" } }, { "breadcrumb": [ "properties", "last_name" ], "metadata": { "selected-by-default": true, "sql-datatype": "varchar(45)" } }, { "breadcrumb": [ "properties", "film_info" ], "metadata": { "selected-by-default": true, "sql-datatype": "text" } } ] } ] }
Step 4: Generate properties.json
Execute the following command in discover mode to generate catalog.json
:
tap-mysql --config config.json --discover > catalog.json
Locate the JSON content of the desired table in catalog.json
and copy it into another file. Let's name this file selected_table.json
.
In selected_table.json
, add the following lines within the curly braces to ensure the table is selected:
{ "streams": [ // Paste the content here ] }
This step ensures that only the selected table is included for extraction.
Step 5: Run tap-mysql
Execute the following command:
tap-mysql --config config.json --catalog selected_table.json
Congratulations! You've successfully extracted data from MySQL using tap-mysql
.
Step 6: Send Data to jsonl
Target
Run the following command to send the data to jsonl
target:
tap-mysql --config config.json --catalog selected_table.json | target-jsonl
A file with the same name as the table will be created.
Step 7: Convert Output to a DataFrame
Here's an example of how to convert the output to a DataFrame using Python:
import pandas as pd import json data = [] with open("<file_name>.jsonl", "r") as f: for line in f: data.append(json.loads(line)) df = pd.DataFrame(data) print(df.columns)
This step allows for further analysis and manipulation of the extracted data.
By following these steps, you've successfully extracted and transformed data from MySQL into a structured format, ready for analysis and insights.
Top comments (0)