Conversation

@LightWind1

#376
I added three regular expressions to match Chinese, Japanese, and Korean words.
Now it can correctly tokenize SQL such as 'select T2.名称 , T2.南北区域 from 民风彪悍十大城市 as T1 join 省份 as T2 on 民风彪悍十大城市.所属省份id == 省份.词条id group by T1.所属省份id order by count ( * ) asc limit 3'
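As a minimal sketch of what such patterns typically cover (the exact Unicode ranges and how they are wired into sqlparse's tokenizer are assumptions for illustration, not the actual patch):

import re

# Hypothetical CJK "word" pattern, for illustration only.
CJK_NAME = re.compile(
    r'[\u4e00-\u9fff'   # CJK Unified Ideographs (Chinese)
    r'\u3040-\u30ff'    # Hiragana and Katakana (Japanese)
    r'\uac00-\ud7af]+'  # Hangul syllables (Korean)
)

print(CJK_NAME.findall('select T2.名称 from 省份 as T2'))
# ['名称', '省份']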

@andialbrecht self-assigned this Mar 5, 2024
@andialbrecht
Owner

Hi @LightWind1, can you clarify what problem your change solves?
I've had a look at how the parser sees your statement, and to me everything looks as expected:

import sqlparse

sql = 'select T2.名称 , T2.南北区域 from 民风彪悍十大城市 as T1 join 省份 as T2 on 民风彪悍十大城市.所属省份id == 省份.词条id group by T1.所属省份id order by count ( * ) asc limit 3'
p = sqlparse.parse(sql)[0]
p._pprint_tree()

|- 0 DML 'select'
|- 1 Whitespace ' '
|- 2 IdentifierList 'T2.名称 ...'
|  |- 0 Identifier 'T2.名称'
|  |  |- 0 Name 'T2'
|  |  |- 1 Punctuation '.'
|  |  `- 2 Name '名称'
|  |- 1 Whitespace ' '
|  |- 2 Punctuation ','
.....and so on.....
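For anyone wanting to double-check this without reading the full tree, a short sketch using sqlparse's public API (the query below is a shortened, hypothetical variant of the one above) could look like:

import sqlparse
from sqlparse import tokens as T

sql = 'select T2.名称 from 省份 as T2'
stmt = sqlparse.parse(sql)[0]

# Flatten the parse tree to leaf tokens and print the ones whose type
# matters here: the CJK identifiers should come back as Name tokens,
# not as Error tokens.
for tok in stmt.flatten():
    if tok.ttype in (T.Name, T.Error):
        print(tok.ttype, repr(tok.value))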