TrainDB is an ML model-based approximate query processing engine that aims to answer time-consuming analytical queries in a few seconds. TrainDB will provide a SQL-like query interface and support various DBMS data sources.
- Java 11+
- Maven 3.x
- SQLite3 (or another DBMS for the catalog store that is supported by DataNucleus)
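The requirements above can be checked before building. The following is a minimal sketch, not part of TrainDB itself: the helper name `java_major` is hypothetical, and the script only assumes that `java -version` prints a quoted version string such as `"11.0.2"` or `"1.8.0_292"`.

```shell
#!/bin/sh
# Hypothetical helper: extract the major version from a Java version string.
# Handles the legacy "1.8.0_292" scheme and the modern "11.0.2" / "17-ea" scheme.
java_major() {
  case "$1" in
    1.*) echo "$1" | cut -d. -f2 ;;
    *)   echo "$1" | cut -d. -f1 | cut -d- -f1 ;;
  esac
}

# Probe the local toolchain only when java is installed.
if command -v java >/dev/null 2>&1; then
  # `java -version` writes to stderr; the version is the first quoted string.
  ver=$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  if [ -n "$ver" ] && [ "$(java_major "$ver")" -ge 11 ]; then
    echo "Java $ver: OK"
  else
    echo "Java 11+ required, found: ${ver:-unknown}"
  fi
else
  echo "java not found on PATH"
fi

# Print the Maven version line so it can be compared against 3.x by eye.
if command -v mvn >/dev/null 2>&1; then
  mvn -version | head -n 1
else
  echo "mvn not found on PATH"
fi
```

The version parsing is kept in its own function so it can be reused or adjusted if your JDK reports its version in a different format.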
For Python environment setup, see the README in our traindb-model repository.
$ git clone --recurse-submodules https://github.com/traindb-project/traindb.git
$ cd traindb
$ mvn package
Then, you can find traindb-x.y-SNAPSHOT.tar.gz in the traindb-assembly/target directory.
$ tar xvfz traindb-assembly/target/traindb-x.y-SNAPSHOT.tar.gz
Now, you can execute SQL statements using the command line interface.
You need to put the JDBC driver for your DBMS into a directory included in CLASSPATH.
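One way to set this up is sketched below. It is an illustration under assumptions: the directory `$HOME/jdbc-drivers` and the helper `append_jars` are hypothetical, and the sketch assumes that `bin/trsql` honors the `CLASSPATH` environment variable. Jar files are listed individually because a plain directory entry on a Java classpath does not pick up the jars inside it.

```shell
#!/bin/sh
# Hypothetical directory holding your JDBC driver jars; adjust to your setup.
JDBC_DIR="${JDBC_DIR:-$HOME/jdbc-drivers}"

# Hypothetical helper: append every jar in a directory to a
# colon-separated classpath string and print the result.
append_jars() {
  cp="$1"
  dir="$2"
  for jar in "$dir"/*.jar; do
    [ -e "$jar" ] || continue   # skip when the glob matches nothing
    cp="${cp:+$cp:}$jar"        # add ":" only if cp is already non-empty
  done
  echo "$cp"
}

# Extend the current CLASSPATH (if any) with the driver jars and export it.
CLASSPATH=$(append_jars "${CLASSPATH:-}" "$JDBC_DIR")
export CLASSPATH
```

Source this script in the shell from which you start the command line interface, so that the exported variable is visible to it.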
$ cd traindb-assembly/target/traindb-x.y-SNAPSHOT
$ bin/trsql
sqlline> !connect jdbc:traindb:<dbms>://<host>
Enter username for jdbc:traindb:<dbms>://localhost: <username>
Enter password for jdbc:traindb:<dbms>://localhost: <password>
0: jdbc:traindb:<dbms>://<host>>
You can train ML models and run approximate queries as in the following example.
0: jdbc:traindb:<dbms>://<host>> CREATE MODELTYPE tablegan FOR SYNOPSIS AS LOCAL CLASS 'TableGAN' IN '$TRAINDB_PREFIX/models/TableGAN.py';
No rows affected (0.255 seconds)
0: jdbc:traindb:<dbms>://<host>> TRAIN MODEL tgan MODELTYPE tablegan ON <schema>.<table>(<column 1>, <column 2>, ...);
epoch 1 step 50 tensor(1.1035, grad_fn=<SubBackward0>) tensor(0.7770, grad_fn=<NegBackward>) None
epoch 1 step 100 tensor(0.8791, grad_fn=<SubBackward0>) tensor(0.9682, grad_fn=<NegBackward>) None
...
0: jdbc:traindb:<dbms>://<host>> CREATE SYNOPSIS <synopsis> FROM MODEL tgan LIMIT <# of rows to generate>;
...
0: jdbc:traindb:<dbms>://<host>> SELECT APPROXIMATE avg(<column>) FROM <schema>.<table>;