[Flax] Fix eval and data_args usage in streaming example #14129

falcaopetri · 2021-10-23T15:25:29Z

What does this PR do?

This PR fixes the evaluation loop in run_mlm_flax_stream.py. The current code does not update the correct variable, which leads to data leakage during evaluation.
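A minimal sketch of how this kind of leakage can arise, using plain Python lists rather than the script's actual code. All names here (`make_batches`, `train_samples`, `eval_samples`) are hypothetical, and the sketch assumes the problem is that evaluation ends up reading the training samples instead of the held-out ones:

```python
def make_batches(samples, batch_size):
    """Split a dict of equally sized columns into consecutive batches."""
    num_samples = len(next(iter(samples.values())))
    return [
        {key: values[start:start + batch_size] for key, values in samples.items()}
        for start in range(0, num_samples, batch_size)
    ]


# Hypothetical data: the eval samples are held out once, before training starts.
eval_samples = {"input_ids": [[100, 101], [102, 103]]}
# The training samples are replaced at every training step.
train_samples = {"input_ids": [[1, 2], [3, 4]]}

# Leaky pattern: evaluation batches are built from the latest training samples.
leaky_batches = make_batches(train_samples, batch_size=1)

# Intended pattern: evaluation batches are built only from the held-out samples.
clean_batches = make_batches(eval_samples, batch_size=1)
```

With the leaky pattern, the reported eval metrics are computed on data the model has just trained on, so they no longer measure generalization.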

It also takes the opportunity to improve how some DataTrainingArguments fields are used.

This is a draft PR because one further improvement is still open: the script splits train and eval data based solely on data_args.{dataset_name,num_eval_samples}, yet it also accepts the unused args train_file, validation_file, train_ref_file, validation_ref_file, and validation_split_percentage. Two other data args are also unused: pad_to_max_length and line_by_line.

My suggestion would be to remove all these unused args. May I proceed with that?
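For context, a rough sketch of a split that depends only on `num_eval_samples`, assuming the held-out samples are taken from the front of the stream and the remainder is used for training. `split_stream` is a hypothetical helper, not a function from the script, and the integer range stands in for a streamed dataset:

```python
from itertools import islice


def split_stream(stream, num_eval_samples):
    """Hold out the first `num_eval_samples` items; return them and the remaining iterator."""
    iterator = iter(stream)
    eval_samples = list(islice(iterator, num_eval_samples))
    return eval_samples, iterator


# Stand-in for a streamed dataset; a real stream would yield tokenized examples.
eval_samples, train_iter = split_stream(range(10), num_eval_samples=3)
first_train_sample = next(train_iter)
```

A split of this shape never consults file paths or a split percentage, which is why args such as train_file or validation_split_percentage would have no effect on it.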

Before submitting

Did you read the contributor guideline, Pull Request section?
Did you make sure to update the documentation with your changes?
The script is mentioned in jax-projects/dataset-streaming/README, but no changes are required.

Who can review?

@patrickvonplaten

github-actions · 2021-12-03T15:02:03Z

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

patrickvonplaten · 2021-12-22T20:06:52Z

@patil-suraj could you take a look here? :-)

falcaopetri added 2 commits October 23, 2021 11:43

[Flax] Fix eval in streaming example

7c9a3d4

[Flax] Improve data_args usage in streaming example

754f16f

patrickvonplaten requested a review from patil-suraj November 9, 2021 07:42

github-actions bot closed this Dec 12, 2021

patrickvonplaten reopened this Dec 13, 2021

falcaopetri marked this pull request as ready for review December 14, 2021 01:48

github-actions bot closed this Dec 22, 2021

patrickvonplaten reopened this Dec 22, 2021

huggingface deleted a comment from github-actions bot Jan 16, 2022

patil-suraj added the WIP label Jan 16, 2022
