taewhi commented Sep 25, 2024

This PR adds support for training a model, and generating synopses from it, over columns drawn from multiple tables.

For example, a model can be trained on the join of the orders and order_products tables from the instacart benchmark, joined on order_id, as follows:

trsql> TRAIN MODEL ctgan_join MODELTYPE ctgan
         FROM instacart.orders(order_id, order_dow)
         JOIN instacart.order_products(order_id, product_id, add_to_cart_order, reordered)
         ON orders.order_id = order_products.order_id
         OPTIONS ('epochs' = 100);
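To make the training input concrete, here is a minimal sketch in Python with sqlite3 of the rows such a statement would presumably be trained on. It assumes the model sees the inner join of the two tables restricted to the columns listed in parentheses, with the join key kept once; that is a reading of the statement above, not confirmed engine behavior. The table contents, the extra user_id column, and the aliases o and op are made up for illustration.

```python
import sqlite3

# Toy stand-ins for instacart.orders and instacart.order_products.
# All rows are invented; user_id only shows that unlisted columns are dropped.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER, user_id INTEGER, order_dow INTEGER);
CREATE TABLE order_products (order_id INTEGER, product_id INTEGER,
                             add_to_cart_order INTEGER, reordered INTEGER);
INSERT INTO orders VALUES (1, 10, 0), (2, 11, 3);
INSERT INTO order_products VALUES (1, 100, 1, 0), (1, 101, 2, 1), (2, 100, 1, 1);
""")

# Plain-SQL equivalent of the FROM ... JOIN ... ON clause in TRAIN MODEL,
# keeping only the columns named in parentheses (assumed semantics).
training_rows = conn.execute("""
    SELECT o.order_id, o.order_dow,
           op.product_id, op.add_to_cart_order, op.reordered
    FROM orders AS o
    JOIN order_products AS op ON o.order_id = op.order_id
    ORDER BY o.order_id, op.add_to_cart_order
""").fetchall()

for row in training_rows:
    print(row)
```

Each printed tuple is one joined row (order_id, order_dow, product_id, add_to_cart_order, reordered), one per product in an order.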

After the model is trained, generating a synopsis and running approximate queries work the same way as for simple (single-table) queries.

trsql> CREATE SYNOPSIS ctgan_join_syn FROM MODEL ctgan_join LIMIT 10 PERCENT;
trsql> EXPLAIN PLAN for
         SELECT APPROXIMATE product_name, count(*) as order_count
         FROM instacart.order_products, instacart.orders, instacart.products
         WHERE orders.order_id = order_products.order_id
         AND order_products.product_id = products.product_id
         AND (order_dow = 0 OR order_dow = 1)
         GROUP BY product_name ORDER BY order_count DESC LIMIT 5;
+------------------------------------------------------------------------------------------------------------------+
| PLAN                                                                                                             |
+------------------------------------------------------------------------------------------------------------------+
  JdbcToEnumerableConverter
    JdbcSort(sort0=[$1], dir0=[DESC], fetch=[5])
      JdbcProject(product_name=[$0], order_count=[CAST(*(10.00000177414538:DECIMAL(16, 14), $1)):BIGINT NOT NULL])
        JdbcAggregate(group=[{5}], order_count=[COUNT()])
          JdbcJoin(condition=[=($1, $4)], joinType=[inner])
            JdbcProject(order_id=[$0], product_id=[$2], order_id0=[$0], order_dow=[$1])
              JdbcFilter(condition=[AND(=($0, $0), OR(=($1, 0), =($1, 1)))])
                JdbcTableScan(table=[[jdbc, instacart, ctgan_join_syn]]) <- synopsis scan replacing join
            JdbcProject(product_id=[$0], product_name=[$1])
              JdbcTableScan(table=[[jdbc, instacart, products]])
+------------------------------------------------------------------------------------------------------------------+
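In the plan, COUNT() over the synopsis is multiplied by 10.00000177414538 before being cast to BIGINT. That factor looks like the inverse of the 10 PERCENT sampling ratio, although the PR does not say how it is derived. The sketch below assumes the factor is the base row count divided by the synopsis row count; the function name scale_counts, its parameters, and the sample numbers are hypothetical.

```python
from decimal import Decimal


def scale_counts(synopsis_counts, base_rows, synopsis_rows):
    """Scale per-group counts measured on a synopsis up to base-table estimates.

    Assumption: the scale factor is base_rows / synopsis_rows. This mirrors the
    ~10x multiplier in the plan but is not taken from the engine's source.
    """
    if synopsis_rows <= 0:
        raise ValueError("synopsis_rows must be positive")
    factor = Decimal(base_rows) / Decimal(synopsis_rows)
    # int() truncates toward zero; the engine's CAST rounding is not verified here.
    return {group: int(count * factor) for group, count in synopsis_counts.items()}


# Made-up numbers: a synopsis holding 100 rows standing in for 1000 joined rows.
estimates = scale_counts({"product_a": 47, "product_b": 31},
                         base_rows=1000, synopsis_rows=100)
print(estimates)
```

Because the synopsis row count need not be exactly one tenth of the original, a factor slightly different from 10 would be expected under this assumption.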
taewhi linked an issue Sep 25, 2024 that may be closed by this pull request
taewhi merged commit da6bdc6 into main Sep 25, 2024
1 check passed
taewhi deleted the dev/issue-56 branch September 25, 2024 08:30