
Commit de8dc90

Update README
1 parent 67e54db commit de8dc90


README.md

Lines changed: 7 additions & 8 deletions
````diff
@@ -72,12 +72,11 @@ We already trained a model for 8 epochs on the data that was preprocessed in the
 The number of epochs was chosen using [early stopping](https://en.wikipedia.org/wiki/Early_stopping), as the version that maximized the F1 score on the validation set.
 ```
 wget https://s3.amazonaws.com/code2vec/model/java14m_model.tar.gz
-mkdir -p models/java14m/
 tar -xvzf java14m_model.tar.gz
 ```
 
 ##### Note:
-This trained model is in a "released" state, which means that we stripped it from its training parameters and can thus be used for inference, but cannot be further trained. If you use this trained model in the next steps, use 'saved_model_iter8.release' instead of 'saved_model_iter8' in every command line example that loads the model such as: '--load models/java14m/saved_model_iter8'. To read how to release a model, see [Releasing the model](#releasing-the-model).
+This trained model is in a "released" state, which means that we stripped it from its training parameters and can thus be used for inference, but cannot be further trained. If you use this trained model in the next steps, use 'saved_model_iter8.release' instead of 'saved_model_iter8' in every command line example that loads the model such as: '--load models/java14_model/saved_model_iter8'. To read how to release a model, see [Releasing the model](#releasing-the-model).
 
 #### Training a model from scratch
 To train a model from scratch:
````
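As a minimal sketch of the substitution the updated note asks for, assuming the downloaded archive unpacks so that the checkpoint ends up under models/java14_model/ (the directory name this commit switches to), and reusing the test-file path from the evaluation command later in this diff:

```bash
# Sketch: evaluate the downloaded, "released" pre-trained model.
# The checkpoint name carries the '.release' suffix, per the note above;
# the --test path follows the evaluation command shown later in this diff.
python3 code2vec.py --load models/java14_model/saved_model_iter8.release --test data/java14m/java14m.test.c2v
```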
````diff
@@ -103,14 +102,14 @@ Once the score on the validation set stops improving over time, you can stop the
 and pick the iteration that performed the best on the validation set.
 Suppose that iteration #8 is our chosen model, run:
 ```
-python3 code2vec.py --load models/java14m/saved_model_iter8 --test data/java14m/java14m.test.c2v
+python3 code2vec.py --load models/java14_model/saved_model_iter8 --test data/java14m/java14m.test.c2v
 ```
 While evaluating, a file named "log.txt" is written with each test example name and the model's prediction.
 
 ### Step 4: Manual examination of a trained model
 To manually examine a trained model, run:
 ```
-python3 code2vec.py --load models/java14m/saved_model_iter8 --predict
+python3 code2vec.py --load models/java14_model/saved_model_iter8 --predict
 ```
 After the model loads, follow the instructions and edit the file Input.java and enter a Java
 method or code snippet, and examine the model's predictions and attention scores.
````
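As an illustration of the manual-examination step just above, the Java method below is an arbitrary, hypothetical snippet (not taken from the repository) that one might write into Input.java before running --predict:

```bash
# Hypothetical toy input for the --predict mode described above: overwrite
# Input.java with a small Java method named "f" on purpose, so the model has
# to predict a descriptive name for it.
cat > Input.java <<'EOF'
int f(int[] array) {
    int count = 0;
    for (int value : array) {
        if (value > 0) {
            count++;
        }
    }
    return count;
}
EOF
```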
````diff
@@ -156,7 +155,7 @@ Code2vec supports the following features:
 If you wish to keep a trained model for inference only (without the ability to continue training it) you can
 release the model using:
 ```
-python3 code2vec.py --load models/java14m/saved_model_iter8 --release
+python3 code2vec.py --load models/java14_model/saved_model_iter8 --release
 ```
 This will save a copy of the trained model with the '.release' suffix.
 A "released" model usually takes 3x less disk space.
````
````diff
@@ -169,11 +168,11 @@ In order to export embeddings from a trained model, use the "--save_w2v" and "--
 
 Exporting the trained *token* embeddings:
 ```
-python3 code2vec.py --load models/java14m/saved_model_iter3 --save_w2v models/java14m/tokens.txt
+python3 code2vec.py --load models/java14_model/saved_model_iter3 --save_w2v models/java14_model/tokens.txt
 ```
 Exporting the trained *target* (method name) embeddings:
 ```
-python3 code2vec.py --load models/java14m/saved_model_iter3 --save_t2v models/java14m/targets.txt
+python3 code2vec.py --load models/java14_model/saved_model_iter3 --save_t2v models/java14_model/targets.txt
 ```
 This saves the tokens/targets embedding matrices in word2vec format to the specified text file, in which:
 the first line is: \<vocab_size\> \<dimension\>
````
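Because the exported file is plain word2vec text whose first line is `<vocab_size> <dimension>`, a quick, optional sanity check on the token embeddings exported above could be:

```bash
# Sketch: print the word2vec header line ("<vocab_size> <dimension>") of the
# token embeddings exported by the command above.
head -n 1 models/java14_model/tokens.txt
```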
````diff
@@ -183,7 +182,7 @@ These word2vec files can be manually parsed or easily loaded and inspected using
 ```python
 python3
 >>> from gensim.models import KeyedVectors as word2vec
->>> vectors_text_path = 'models/java14m/targets.txt' # or: `models/java14m/tokens.txt'
+>>> vectors_text_path = 'models/java14_model/targets.txt' # or: `models/java14_model/tokens.txt'
 >>> model = word2vec.load_word2vec_format(vectors_text_path, binary=False)
 >>> model.most_similar(positive=['equals', 'to|lower']) # or: 'tolower', if using the downloaded embeddings
 >>> model.most_similar(positive=['download', 'send'], negative=['receive'])
````
