DEV Community

AXUM中文博客


Chinese Full-Text Search in PostgreSQL with the Zhparser Extension

Docker container

docker run \
  --name postgres \
  -e POSTGRES_PASSWORD=postgres \
  -e TZ=PRC \
  --restart=always \
  -e PGDATA=/var/lib/postgresql/data/pgdata \
  -v /var/docker/postgres:/var/lib/postgresql/data \
  -p 5432:5432 \
  -d postgres

docker exec -it postgres bash  # open a shell inside the pg container

Building and installing Zhparser

All of the following steps are run inside the PG container.

Install the dependencies:

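Replace postgresql-server-dev-17 with the version that matches your PostgreSQL server. Alternatively, pin the image version explicitly in docker run (e.g. postgres:17) so the two stay consistent.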

apt update -y && apt install wget gcc make git bzip2 postgresql-server-dev-17 -y 
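The `-17` suffix has to match the server's major version. Below is a minimal sketch of picking it automatically, assuming the official `postgres` image sets the `PG_MAJOR` environment variable and otherwise falling back to parsing `postgres --version`. The function names are hypothetical, and the parser assumes PostgreSQL 10 or later, where the major version is a single number.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: choose the postgresql-server-dev-* package matching the server.

# Extract the major version from output such as
#   "PostgreSQL 17.2 (Debian 17.2-1.pgdg120+1)"              (pg_config --version)
#   "postgres (PostgreSQL) 17.2 (Debian 17.2-1.pgdg120+1)"   (postgres --version)
# Assumes PostgreSQL 10+, where the major version is one number.
pg_major_from_version() {
  local ver
  ver=${1#*PostgreSQL}              # text after "PostgreSQL", e.g. ") 17.2 (Debian ...)"
  ver=${ver#"${ver%%[0-9]*}"}       # drop everything before the first digit -> "17.2 (Debian ...)"
  printf '%s\n' "${ver%%[!0-9]*}"   # keep only the leading digits -> "17"
}

# Print the dev package name, preferring $PG_MAJOR (assumed to be set by the
# official postgres image) and falling back to `postgres --version`.
pg_dev_package() {
  local major=${PG_MAJOR:-}
  if [ -z "$major" ]; then
    major=$(pg_major_from_version "$(postgres --version)")
  fi
  printf 'postgresql-server-dev-%s\n' "$major"
}

# Usage inside the container:
#   apt update -y && apt install -y wget gcc make git bzip2 "$(pg_dev_package)"
```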

Build SCWS (the Chinese word segmentation library Zhparser depends on) and then Zhparser:

cd /tmp
wget http://www.xunsearch.com/scws/down/scws-1.2.3.tar.bz2
tar -jxvf scws-1.2.3.tar.bz2
cd scws-1.2.3
./configure && make && make install
cd ..
git clone https://github.com/amutu/zhparser.git
cd zhparser/
make && make install
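Before switching to SQL, it can help to confirm that `make install` put the files where the server looks for extensions. A sketch, assuming the standard PGXS layout (`zhparser.so` in `pg_config --pkglibdir`, `zhparser.control` under `$(pg_config --sharedir)/extension`); the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical check: succeed only if zhparser's shared library and
# extension control file are both installed.
#   $1 = library directory, normally $(pg_config --pkglibdir)
#   $2 = share directory,   normally $(pg_config --sharedir)
zhparser_installed() {
  local libdir=$1 sharedir=$2
  [ -f "$libdir/zhparser.so" ] && [ -f "$sharedir/extension/zhparser.control" ]
}

# Usage inside the container:
#   if zhparser_installed "$(pg_config --pkglibdir)" "$(pg_config --sharedir)"; then
#     echo "zhparser files found"
#   else
#     echo "zhparser files missing"
#   fi
```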

Verify the installation. First, connect to the PG server:

psql -U postgres 

Then run:

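CREATE EXTENSION zhparser;  -- enable the Zhparser extension
CREATE TEXT SEARCH CONFIGURATION chinese (PARSER = zhparser);  -- text search configuration for Chinese
ALTER TEXT SEARCH CONFIGURATION chinese ADD MAPPING FOR n,v,a,i,e,l WITH simple;  -- map these token types (parts of speech) to the simple dictionary
SELECT ts_token_type('zhparser');  -- list the available token types (parts of speech)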
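To search a table with the `chinese` configuration, a common pattern is an expression GIN index on `to_tsvector('chinese', ...)`. PostgreSQL only accepts the two-argument form of `to_tsvector` (with an explicit configuration) in an index expression. The sketch below wraps the SQL in a shell function using the same `psql -U postgres` connection as above; the table `articles`, its `body` column, the index name and the function name are all hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: create a demo table with a GIN index built on the
# 'chinese' text search configuration, insert one row, and run a search.
# Intended to be run once; each call inserts the demo row again.
create_zh_demo() {
  psql -U postgres -v ON_ERROR_STOP=1 <<'SQL'
-- Demo table (hypothetical name and columns)
CREATE TABLE IF NOT EXISTS articles (
    id   serial PRIMARY KEY,
    body text NOT NULL
);

-- Expression index: uses the two-argument to_tsvector so it is allowed in an index
CREATE INDEX IF NOT EXISTS articles_body_zh_idx
    ON articles USING gin (to_tsvector('chinese', body));

INSERT INTO articles (body)
VALUES ('人生得意须尽欢,莫使金樽空对月。天生我材必有用,千金散尽还复来。');

-- The WHERE expression matches the index expression, so the planner can use the index
SELECT id, body
FROM articles
WHERE to_tsvector('chinese', body) @@ to_tsquery('chinese', '得意');
SQL
}

# Usage inside the container:
#   create_zh_demo
```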

Testing:

Test to_tsvector:

SELECT to_tsvector('chinese','人生得意须尽欢,莫使金樽空对月。天生我材必有用,千金散尽还复来。Hello world'); 

Result:

 to_tsvector
------------------------------------------------------------------------------------------------------------------------------
 'hello':12 'world':13 '人生':1 '使':4 '千金':8 '复来':11 '天生我材必有用':7 '对月':6 '尽':10 '尽欢':3 '得意':2 '散':9 '空':5
(1 row)
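Not every word of the input shows up in the vector above. To see which token type zhparser assigned to each token and whether a dictionary handled it, PostgreSQL's `ts_debug` can be used; tokens with an empty `dictionaries` list have no mapping in the configuration and are dropped by `to_tsvector`. A sketch, where the function name is hypothetical and the text is passed as a psql variable (`:'txt'`) so psql handles the quoting.

```bash
#!/usr/bin/env bash
# Hypothetical helper: show how the 'chinese' configuration handles each token.
# alias = token type assigned by zhparser, dictionaries = dictionaries mapped
# to that type (empty means the token is discarded), lexemes = resulting lexemes.
zh_debug() {
  psql -U postgres -v ON_ERROR_STOP=1 -v txt="$1" <<'SQL'
SELECT alias, token, dictionaries, lexemes
FROM ts_debug('chinese', :'txt');
SQL
}

# Usage inside the container:
#   zh_debug '人生得意须尽欢,莫使金樽空对月。'
```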

Test to_tsquery:

SELECT to_tsquery('chinese', '金风玉露一相逢,便胜却人间无数。It & works'); 

Result:

 to_tsquery
--------------------------------------------------------------
 '金风玉露' <-> '相逢' <-> '胜' <-> '人间' <-> 'it' & 'works'
(1 row)
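`to_tsquery` expects query syntax (operands joined by `&`, `|`, `!` or `<->`), so raw user input with a space between English words, such as `hello world`, raises a syntax error. For free-text search boxes, `plainto_tsquery` or `websearch_to_tsquery` (PostgreSQL 11+) accept unstructured input. A sketch that checks a document against free-text input with `@@` and scores it with `ts_rank`; the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: match free-text input against a document using the
# 'chinese' configuration. Both values are passed as psql variables so that
# psql quotes them safely.
#   $1 = document text, $2 = raw search input (may contain spaces)
zh_match() {
  psql -U postgres -v ON_ERROR_STOP=1 -v doc="$1" -v q="$2" <<'SQL'
SELECT to_tsvector('chinese', :'doc') @@ websearch_to_tsquery('chinese', :'q') AS matched,
       ts_rank(to_tsvector('chinese', :'doc'),
               websearch_to_tsquery('chinese', :'q')) AS rank;
SQL
}

# Usage inside the container:
#   zh_match '人生得意须尽欢,莫使金樽空对月。' '人生 得意'
```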

Reference: https://www.fdevops.com/2023/02/05/postgres-zhparser-31246
