@sjrusso8 sjrusso8 commented Jan 2, 2023

@andialbrecht & @mrmasterplan I updated my initial PR with the lexer changes. See below!

This PR adds frequently used Databricks and Delta table syntax. Databricks SQL has many special operations for working with Delta tables, which means many new keywords.

Here is an example of standard Databricks SQL operations on a newly created Delta table.

    CREATE TABLE IF NOT EXISTS default.event (
        id INT,
        name STRING,
        description VARCHAR(30)
    )
    USING delta
    LOCATION '/mnt/data/location'
    PARTITIONED BY (id)
    COMMENT 'this is a comment'
    TBLPROPERTIES (
        'foo'='bar',
        delta.autoOptimize.optimizeWrite = true,
        delta.autoOptimize.autoCompact = true
    );

    OPTIMIZE event
    WHERE date >= current_timestamp() - INTERVAL 1 day
    ZORDER BY (id);

    VACUUM event;

    CREATE BLOOMFILTER INDEX ON TABLE event
    FOR COLUMNS(description OPTIONS (fpp=0.1, numItems=50000000));

    CREATE TABLE default.event_clone SHALLOW CLONE default.event;

    DESCRIBE HISTORY event;

    DESCRIBE TABLE EXTENDED event;

    SHOW DETAIL event;

    MSCK REPAIR TABLE event SYNC METADATA;

    REFRESH TABLE event;

Parsing those statements should then pick out the additional keywords, as shown below.

    import sqlparse

    # `sql` holds the statements shown above
    statements = sqlparse.parse(sql)
    for statement in statements:
        result = [
            v.value
            for v in sqlparse.sql.IdentifierList(statement.tokens).get_identifiers()
            if v.is_keyword
        ]
        print(result)

    >>> ['CREATE', 'TABLE', 'IF', 'NOT', 'EXISTS', 'USING', 'LOCATION', 'PARTITIONED BY', 'COMMENT', 'TBLPROPERTIES']
    >>> ['OPTIMIZE', 'ZORDER BY']
    >>> ['VACUUM']
    >>> ['CREATE', 'BLOOMFILTER INDEX', 'ON', 'TABLE', 'FOR']
    >>> ['CREATE', 'TABLE', 'SHALLOW CLONE']
    >>> ['DESCRIBE', 'HISTORY']
    >>> ['DESCRIBE', 'TABLE', 'EXTENDED']
    >>> ['SHOW', 'DETAIL']
    >>> ['MSCK REPAIR', 'TABLE', 'SYNC', 'METADATA']
    >>> ['REFRESH', 'TABLE']
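
For anyone wondering why multi-word entries such as 'ZORDER BY' or 'MSCK REPAIR' show up as a single keyword: here is a minimal standalone sketch of the general idea, using only Python's `re` module. It is not the lexer code from this PR. `KEYWORD_PATTERNS` and `find_keywords` are hypothetical names made up for illustration, and the pattern list is a small subset chosen from the examples above. Multi-word patterns are listed first and joined with `\s+` so that any run of whitespace between the words still yields one token.

```python
import re

# Hypothetical pattern list, for illustration only (not sqlparse's keyword table).
# Multi-word keywords come first and use \s+ so "ZORDER   BY" is one match.
KEYWORD_PATTERNS = [
    r"PARTITIONED\s+BY",
    r"ZORDER\s+BY",
    r"BLOOMFILTER\s+INDEX",
    r"SHALLOW\s+CLONE",
    r"MSCK\s+REPAIR",
    r"OPTIMIZE|VACUUM|DESCRIBE|HISTORY|REFRESH|TABLE|WHERE",
]
KEYWORD_RE = re.compile(
    r"\b(?:" + "|".join(KEYWORD_PATTERNS) + r")\b", re.IGNORECASE
)


def find_keywords(sql):
    """Return matched keywords, upper-cased with inner whitespace collapsed."""
    return [" ".join(m.group(0).upper().split()) for m in KEYWORD_RE.finditer(sql)]


print(find_keywords("OPTIMIZE event WHERE id > 1 ZORDER  BY (id);"))
# ['OPTIMIZE', 'WHERE', 'ZORDER BY']
```

This sketch ignores string literals, comments, and quoted identifiers, so a keyword inside a quoted string would also match. A real lexer has to handle those cases before keyword matching.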