Commit 090f9b4

Update readme: add usage for Python users
1 parent 3f44ac1 commit 090f9b4

1 file changed: +71 -4 lines changed

Readme.md

Lines changed: 71 additions & 4 deletions
@@ -14,12 +14,79 @@ VnCoreNLP is a Java NLP annotation pipeline for Vietnamese, providing rich lingu
1414

1515
Please **CITE** paper [1] whenever VnCoreNLP is used to produce published results or incorporated into other software. If you are dealing in depth with either word segmentation or POS tagging, you are encouraged to also cite paper [2] or [3], respectively.
1616

17-
**NOTE** that if you are looking for light-weight versions, VnCoreNLP's word segmentation and POS tagging components have also been released as independent packages [RDRsegmenter](https://github.com/datquocnguyen/RDRsegmenter) [2] and [VnMarMoT](https://github.com/datquocnguyen/VnMarMoT) [3], resepectively.
17+
**NOTE**
18+
- If you are looking for lightweight versions, VnCoreNLP's word segmentation and POS tagging components have also been released as independent packages [RDRsegmenter](https://github.com/datquocnguyen/RDRsegmenter) [2] and [VnMarMoT](https://github.com/datquocnguyen/VnMarMoT) [3], respectively.
19+
- A special thanks goes to Khoa Duong (@dnanhkhoa) for creating a Python wrapper of VnCoreNLP.
1820

21+
## Installation
1922

20-
## Using VnCoreNLP from the command line
23+
- `Python 3.4+` if using the Python wrapper. Then, to install the wrapper, users have to run the following command:
2124

22-
Assume that Java 1.8+ is already set to run in the command line or terminal (for example: adding Java to the environment variable `path` in Windows OS); and file `VnCoreNLP-1.1.jar` (27MB) and folder `models` (115MB) are placed in the same working folder. You can run VnCoreNLP to annotate an input raw text corpus (e.g. a collection of news content) by using following commands:
25+
``$ pip install vncorenlp``
26+
27+
- `Java 1.8+`
28+
- File `VnCoreNLP-1.1.jar` (27MB) and folder `models` (115MB) are placed in the same working folder.
29+
30+
31+
32+
## Usage for Python users
33+
34+
### Use as a service (recommended)
35+
36+
1. Run the following command:
37+
38+
``$ vncorenlp -Xmx2g <VnCoreNLP-jar-file-path> -p 9000 -a "wseg,pos,ner,parse"``
39+
40+
The service is now available at ``http://127.0.0.1:9000``.
41+
42+
2. Use the service in your `python` code:
43+
44+
```python
from vncorenlp import VnCoreNLP
text = "Ông Nguyễn Khắc Chúc đang làm việc tại Đại học Quốc gia Hà Nội. Bà Lan, vợ ông Chúc, cũng làm việc tại đây."
annotator = VnCoreNLP(address="http://127.0.0.1", port=9000)
annotated_text = annotator.annotate(text) # json format

# If you want to use only the word segmenter
word_segmented_text = annotator.tokenize(text)
```
53+
54+
- `print(annotated_text)`
55+
56+
```
{'sentences': [[{'index': 1, 'form': 'Ông', 'posTag': 'Nc', 'nerLabel': 'O', 'head': 4, 'depLabel': 'sub'}, {'index': 2, 'form': 'Nguyễn_Khắc_Chúc', 'posTag': 'Np', 'nerLabel': 'B-PER', 'head': 1, 'depLabel': 'nmod'}, {'index': 3, 'form': 'đang', 'posTag': 'R', 'nerLabel': 'O', 'head': 4, 'depLabel': 'adv'}, {'index': 4, 'form': 'làm_việc', 'posTag': 'V', 'nerLabel': 'O', 'head': 0, 'depLabel': 'root'}, {'index': 5, 'form': 'tại', 'posTag': 'E', 'nerLabel': 'O', 'head': 4, 'depLabel': 'loc'}, {'index': 6, 'form': 'Đại_học', 'posTag': 'N', 'nerLabel': 'B-ORG', 'head': 5, 'depLabel': 'pob'}, {'index': 7, 'form': 'Quốc_gia', 'posTag': 'N', 'nerLabel': 'I-ORG', 'head': 6, 'depLabel': 'nmod'}, {'index': 8, 'form': 'Hà_Nội', 'posTag': 'Np', 'nerLabel': 'I-ORG', 'head': 6, 'depLabel': 'nmod'}, {'index': 9, 'form': '.', 'posTag': 'CH', 'nerLabel': 'O', 'head': 4, 'depLabel': 'punct'}], [{'index': 1, 'form': 'Bà', 'posTag': 'Nc', 'nerLabel': 'O', 'head': 9, 'depLabel': 'sub'}, {'index': 2, 'form': 'Lan', 'posTag': 'Np', 'nerLabel': 'B-PER', 'head': 1, 'depLabel': 'nmod'}, {'index': 3, 'form': ',', 'posTag': 'CH', 'nerLabel': 'O', 'head': 1, 'depLabel': 'punct'}, {'index': 4, 'form': 'vợ', 'posTag': 'N', 'nerLabel': 'O', 'head': 1, 'depLabel': 'nmod'}, {'index': 5, 'form': 'ông', 'posTag': 'Nc', 'nerLabel': 'O', 'head': 4, 'depLabel': 'nmod'}, {'index': 6, 'form': 'Chúc', 'posTag': 'Np', 'nerLabel': 'B-PER', 'head': 5, 'depLabel': 'nmod'}, {'index': 7, 'form': ',', 'posTag': 'CH', 'nerLabel': 'O', 'head': 1, 'depLabel': 'punct'}, {'index': 8, 'form': 'cũng', 'posTag': 'R', 'nerLabel': 'O', 'head': 9, 'depLabel': 'adv'}, {'index': 9, 'form': 'làm_việc', 'posTag': 'V', 'nerLabel': 'O', 'head': 0, 'depLabel': 'root'}, {'index': 10, 'form': 'tại', 'posTag': 'E', 'nerLabel': 'O', 'head': 9, 'depLabel': 'loc'}, {'index': 11, 'form': 'đây', 'posTag': 'P', 'nerLabel': 'O', 'head': 10, 'depLabel': 'pob'}, {'index': 12, 'form': '.', 'posTag': 'CH', 'nerLabel': 'O', 'head': 9, 'depLabel': 'punct'}]]}
```
59+
60+
- `print(word_segmented_text)`
61+
62+
```
[['Ông', 'Nguyễn_Khắc_Chúc', 'đang', 'làm_việc', 'tại', 'Đại_học', 'Quốc_gia', 'Hà_Nội', '.'], ['Bà', 'Lan', ',', 'vợ', 'ông', 'Chúc', ',', 'cũng', 'làm_việc', 'tại', 'đây', '.']]
```
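The `nerLabel` values in the printed annotation follow a `B-`/`I-`/`O` scheme, so multi-token entities have to be regrouped by the caller. The following is a minimal sketch of one way to do that, not part of the wrapper: `extract_entities` is a hypothetical helper, `sample` is the first sentence of the output above abridged to the two fields the helper reads, and replacing `_` with a space is an assumption about how you want the segmented words displayed.

```python
def extract_entities(annotated):
    """Group B-/I- tagged tokens into (text, type) pairs.

    Hypothetical helper: expects the dict layout printed above, i.e.
    {'sentences': [[{'form': ..., 'nerLabel': ...}, ...], ...]}.
    """
    entities = []

    def flush(words, entity_type):
        # Store the entity collected so far, if any.
        if words:
            entities.append((" ".join(words).replace("_", " "), entity_type))

    for sentence in annotated["sentences"]:
        words, entity_type = [], None
        for token in sentence:
            label = token["nerLabel"]
            if label.startswith("B-"):
                # A new entity starts; close the previous one first.
                flush(words, entity_type)
                words, entity_type = [token["form"]], label[2:]
            elif label.startswith("I-") and entity_type == label[2:]:
                # Continuation of the entity currently being collected.
                words.append(token["form"])
            else:
                # 'O' (or an I- tag with no matching B-) ends the entity.
                flush(words, entity_type)
                words, entity_type = [], None
        flush(words, entity_type)  # entity that runs to the end of the sentence
    return entities


# First sentence of the sample output, abridged to 'form' and 'nerLabel'.
sample = {"sentences": [[
    {"form": "Ông", "nerLabel": "O"},
    {"form": "Nguyễn_Khắc_Chúc", "nerLabel": "B-PER"},
    {"form": "đang", "nerLabel": "O"},
    {"form": "làm_việc", "nerLabel": "O"},
    {"form": "tại", "nerLabel": "O"},
    {"form": "Đại_học", "nerLabel": "B-ORG"},
    {"form": "Quốc_gia", "nerLabel": "I-ORG"},
    {"form": "Hà_Nội", "nerLabel": "I-ORG"},
    {"form": ".", "nerLabel": "O"},
]]}

entities = extract_entities(sample)
for text, entity_type in entities:
    print(entity_type, text)
```

On this sample the helper yields one `PER` entity and one `ORG` entity; the three `ORG` tokens are merged into a single span.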
65+
66+
67+
68+
69+
### Use without the service
70+
71+
```python
from vncorenlp import VnCoreNLP
text = "Ông Nguyễn Khắc Chúc đang làm việc tại Đại học Quốc gia Hà Nội. Bà Lan, vợ ông Chúc, cũng làm việc tại đây."
annotator = VnCoreNLP("<VnCoreNLP-jar-file-path>")
annotated_text = annotator.annotate(text) # json format

# If you want to use only the word segmenter
word_segmented_text = annotator.tokenize(text)
```
81+
82+
For more details, we refer users to [https://github.com/dnanhkhoa/python-vncorenlp](https://github.com/dnanhkhoa/python-vncorenlp).
83+
84+
85+
## Usage for Java users
86+
87+
### Using VnCoreNLP from the command line
88+
89+
You can run VnCoreNLP to annotate an input raw text corpus (e.g. a collection of news content) by using the following commands:
2390

2491
//To perform word segmentation, POS tagging, NER and then dependency parsing
2592
$ java -Xmx2g -jar VnCoreNLP-1.1.jar -fin input.txt -fout output.txt
@@ -31,7 +98,7 @@ Assume that Java 1.8+ is already set to run in the command line or terminal (for
3198
$ java -Xmx2g -jar VnCoreNLP-1.1.jar -fin input.txt -fout output.txt -annotators wseg
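For Python users who prefer to drive the jar directly instead of using the wrapper, the command lines above can be assembled programmatically. This is only a sketch: `build_vncorenlp_command` is a hypothetical helper, its defaults mirror the file names used in the commands above, and joining several annotators with commas is an assumption based on the `-a "wseg,pos,ner,parse"` form shown for the service.

```python
def build_vncorenlp_command(jar_path="VnCoreNLP-1.1.jar", fin="input.txt",
                            fout="output.txt", annotators=None, heap="-Xmx2g"):
    """Return the argument list for one VnCoreNLP command-line run.

    Hypothetical helper; it only builds the list and does not start Java.
    """
    command = ["java", heap, "-jar", jar_path, "-fin", fin, "-fout", fout]
    if annotators:
        # Assumption: multiple annotators are passed as one comma-separated value.
        command += ["-annotators", ",".join(annotators)]
    return command


# Word segmentation only, as in the last command above.
wseg_only = build_vncorenlp_command(annotators=["wseg"])
print(" ".join(wseg_only))

# Full pipeline: no -annotators flag, as in the first command above.
full_pipeline = build_vncorenlp_command()
print(" ".join(full_pipeline))
```

The returned list can be passed to `subprocess.run(command, check=True)` once Java 1.8+ and the `models` folder are in place.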
3299

33100

34-
## Using VnCoreNLP from the API
101+
### Using VnCoreNLP from the API
35102

36103
The following code is a simple and complete example:
37104
