Add tf.keras model implementation #27

eladn · 2019-07-17T09:06:15Z

Main changes:

Move to TensorFlow 2.0.0-beta1.
New Keras model, implemented in keras_model.py. It uses the tf.keras module (not the standalone Keras package). To use the newly added Keras model, pass the --framework keras argument to code2vec.py. The default framework is still the old TensorFlow model (the one chosen if no additional arguments are given).
The Keras model allows TensorBoard output while training.
Move the TensorFlow model to tensorflow_model.py.
Implement model_base.py for common implementation-independent model methods; the models in both tensorflow_model.py and keras_model.py inherit from it.
Use tf.data.Dataset for the input pipeline and re-implement the reader in path_context_reader.py.
Adapt the TensorFlow model to work with the new reader.
Minor refactoring of the TensorFlow model.
Move configurations from common.py to config.py.
Added parameters and properties to config.py.
Refactor the printing format of the model evaluation results.
Use Python's logger for output instead of print() calls.
Added an option to separate the <OOV> and <PAD> special words. Use the SEPARATE_OOV_AND_PAD parameter in config.py to enable it. The default is False (the previous behavior).
Add requirements.txt file.
Add Python type annotations to all parts of the code.
The README.md has been updated accordingly.
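
The framework selection and shared base class described above can be pictured roughly as follows. This is a minimal sketch, not the code in this PR: the class names, the `parse_framework` and `load_model` helpers, and the `tensorflow` choice value are assumptions made for illustration. Only the `--framework keras` argument and the TensorFlow default come from the description.

```python
import argparse
from abc import ABC, abstractmethod
from typing import List


class ModelBase(ABC):
    """Hypothetical stand-in for the shared base class in model_base.py."""

    @abstractmethod
    def train(self) -> str:
        """Run training; each implementation supplies its own version."""


class TensorFlowModel(ModelBase):
    """Hypothetical stand-in for the model in tensorflow_model.py."""

    def train(self) -> str:
        return "training with the original TensorFlow implementation"


class KerasModel(ModelBase):
    """Hypothetical stand-in for the model in keras_model.py."""

    def train(self) -> str:
        return "training with the tf.keras implementation"


def parse_framework(argv: List[str]) -> str:
    # Assumption: the non-Keras choice is spelled "tensorflow".
    parser = argparse.ArgumentParser()
    parser.add_argument("--framework", choices=["tensorflow", "keras"],
                        default="tensorflow")
    return parser.parse_args(argv).framework


def load_model(framework: str) -> ModelBase:
    # Fall back to the TensorFlow model unless Keras is requested explicitly.
    if framework == "keras":
        return KerasModel()
    return TensorFlowModel()
```

With no arguments, `load_model(parse_framework([]))` returns the TensorFlow stand-in, which mirrors the "default is unchanged" behavior stated in the description.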

What is left to do:

Train the new Keras model for 8 iterations, upload it to S3, and add the link to the README.

This should not break the current behavior; it only adds new functionality. If the new parameters are not stated explicitly, the default behavior is unchanged.
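
As an illustration of a flag that leaves the default untouched, the SEPARATE_OOV_AND_PAD option could behave roughly like the sketch below. This is an assumption about the general idea, not the PR's vocabulary code: the `build_word_to_index` and `lookup` helpers and the concrete index values are hypothetical.

```python
from typing import Dict, Iterable


def build_word_to_index(words: Iterable[str],
                        separate_oov_and_pad: bool = False) -> Dict[str, int]:
    """Assign indices to words, reserving the lowest ones for special words."""
    if separate_oov_and_pad:
        # Separate mode: padding and out-of-vocabulary get distinct indices.
        word_to_index = {"<PAD>": 0, "<OOV>": 1}
        next_index = 2
    else:
        # Joined mode (the default): both special words share one index.
        word_to_index = {"<PAD>": 0, "<OOV>": 0}
        next_index = 1
    for word in words:
        if word not in word_to_index:
            word_to_index[word] = next_index
            next_index += 1
    return word_to_index


def lookup(word_to_index: Dict[str, int], word: str) -> int:
    """Map unknown words to the <OOV> index."""
    return word_to_index.get(word, word_to_index["<OOV>"])
```

Because the flag defaults to False, callers that never mention it get the joined behavior, so existing call sites are unaffected.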

eladn added 30 commits March 13, 2019 16:30

README minor fixes

624eab6

train.sh - add #! hash first line

79f5891

gitignore: ignore models, data, .idea, and tar.gz

04efcc5

new: Keras AttentionLayer

beb2eda

Config / add params: DL_FRAMEWORK & DROPOUT_KEEP_RATE

36c69a7

common::split_to_batches() / use iterator instead of creating a list

b79127d

Keras AttentionLayer / minor comments modification

254ee5e

add Keras model impl (not fully implemented yet); dispatch tf/keras impl at word2vec.py; use common model base

7ebf1ce

keras model: use tf.data reader [now training works]

bee8ccb

export Config to config.py; common.SpecialDictWords(Enum);

e41730c

add common.tf_get_first_true()

429258e

model_base: use common.SpecialDictWords

69dc76a

keras model: add prediction tf graph, add evaluation f1 metric; pass reader.dataset directly to model.fit() instead of using reader.dataset.iterator; rename "Model" ==> "Code2VecModel"; use common.SpecialDictWords in keras_model and reader; fix tensorflow.*python*.keras imports

9bf16e8

export word prediction calculation from model into WordPredictionLayer

290f254

impl keras Words Subtoken Metrics (Precision, Recall, F1)

79fdfe3

export the topk param into config.TOP_K_WORDS_CONSIDERED_DURING_PREDICTION

b832b4e

keras model: use 'target_word_prediction' layer output as an additional model output; PathContextReader: use abstract ModelInputTensorsFormer to be inherited by the impl.

fd48203

minor refactor: name for metrics

4a99e60

keras model: impl save+load, use val reader for train, use checkpoints for train; refactor config param names

6014a33

keras model: minor refactor

992622a

keras model: add code_vectors to output, impl evaluate()+predict()

d976f41

base model redactor; OOV+PAD special words instead of NoSuch; impl save+load keras model; reader refactor

e80cc9b

fix store+load model (to use RELEASE param correctly); fix transform layer to not use bias (now #trainable_params equals to orig tf model); use tf.train.AdamOptimizer() as optimizer instead keras "adam" - now training works on GPU

4ca69ca

SpecialDictWords: each word has its string representation and its index

8ea4851

subtoken metrics: fix subtoken separator

8db1d7d

impl Vocab class; refactor SpecialVocabWords

d191727

reader: minor refactor + fix csv_record_defaults

ffd416a

export vocabs management into vocabularies.py; new Code2VecVocabs class; migrate lookup tables handling into `Vocab`

5122495

move VocabType to vocabularies; move save_word2vec_format() to base model; impl `_get_vocab_embedding_as_np_array()` in keras model; separate embedding constants in `Config` (by vocab type)

98a0b1b

eladn added 28 commits June 2, 2019 13:45

config: change default values

365bc36

migrate to TF-2.0.0-alpha; now keras model works with BS=1024 + TP=1200; use keras CB to perform logging and evaluate during training; move arg parsing to config; fix AttentionLayer so mask will be input; use logger in classes (instead of prints)

46ae7ad

config: refactor: add annotations + optional strings + is_saving + return None for non-relevant properties

5891587

reader: fix process_input_row() to adapt TF2

df3b0ca

vocabs: fix minor issue (field initialization position)

cbe8fa6

keras modeL: save only when needed; adapt save+load+predict to TF2

7da5dc4

docs: add some docs to classes

2547420

keras model: refactor: add _create_train_callbacks(); add docs

b79789e

logger fix: don't print twice (turn off propagate)

69b41b6

model base tiny refactor

6edde7a

keras model save fix (to also save vocabs)

8242e6e

tensorflow model: print # trainable params

49bbd64

add requirements.txt

0aa498e

README: add keras impl title, update requirements

f710132

model base: create model save dir if not exists

38ffab5

update version: TF2.0.0-alpha ==> TF2.0.0-beta1 (use tf.compat.v1.string_split())

d43d0d0

tensorflow model: make it work in TF2.0.0-beta1 (train+eval+predict+save+load) using tf.compat.v1 api

1bc14d4

add & support option config.SEPARATE_OOV_AND_PAD and `vocab.special_words`

cb7df2b

README: minor bold-updates order fix

63a8019

vocabs: fix save&load to match old format (without special words); now old tf model file loads successfully

41e13ec

config: rename param "NUM_BATCHES_TO_LOG_PROGRESS"

dfba714

vocabs: refactor error msg on failed load due to wrong min word idx

cf716b5

tf model: use logger instead of print()s

b68f0e9

vocabs: fix save_to_file() to adapt to old format

cbc6487

keras & tf models: refactor - move aux classes below main model class

86d5b43

config: make logging go to stdout instead of stderr

419c7eb

readme: remove paper-version note. We just added new functionalities. If new params not stated explicitly the default behavior haven't changed.

1c92dab

Merge remote-tracking branch 'upstream/master'

64bebf7

urialon merged commit 01d1731 into tech-srl:master Jul 17, 2019

anki54 pushed a commit to anki54/code2vec that referenced this pull request May 31, 2020

Merge pull request tech-srl#27 from eladn/master

4f6b0b8

Add tf.keras model implementation
