
@changlan

  • Do not create sharded TFRecords; write a single TFRecord file instead.
  • Ensure that only one process per host creates the TFRecord file.
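
A minimal sketch of the second point, assuming an Open MPI launch (Open MPI exports OMPI_COMM_WORLD_LOCAL_RANK to each process) and treating local rank 0 as the single writer on each host. The helper names are hypothetical and not taken from this PR; under Horovod, hvd.local_rank() would play the same role.

```python
import os


def local_rank() -> int:
    # Hypothetical helper: Open MPI sets this per process; default to 0
    # so a single-process run still writes its own file.
    return int(os.environ.get("OMPI_COMM_WORLD_LOCAL_RANK", "0"))


def should_write_tfrecord() -> bool:
    # Exactly one process per host (local rank 0) creates the TFRecord file;
    # every other process on that host only reads it.
    return local_rank() == 0


def writers_per_host(procs):
    # procs: list of (hostname, local_rank) pairs.
    # Returns how many processes on each host would act as the writer.
    counts = {}
    for host, lr in procs:
        counts.setdefault(host, 0)
        if lr == 0:
            counts[host] += 1
    return counts
```

In a real job the non-writing processes would also need to wait (for example on an MPI barrier) until the writer on their host has finished the file.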

Signed-off-by: Chang Lan <changlan@google.com>

@changlan (author)

Fixes #383. The issue was that host 0 had shards 0-7 and host 1 had shards 8-15, but the TFRecordDataset in every MPI process had to access all of the shards, regardless of d.shard(). I did not observe any performance regression after removing the sharding.
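
A pure-Python simulation of why every process ended up needing every shard, assuming the records are read from the files in order and that d is sharded per element with tf.data's Dataset.shard(num_shards, index), which keeps element i only when i % num_shards == index. The function name and the record count per file are hypothetical, chosen only for illustration; the 16 shards split 0-7 / 8-15 across two hosts come from the comment above.

```python
def files_read_by_worker(num_files, records_per_file, num_workers, index):
    # Records from all files, concatenated in file order,
    # as a TFRecordDataset over the full file list would yield them.
    records = [(f, r) for f in range(num_files) for r in range(records_per_file)]
    # Element-level sharding: keep every num_workers-th record starting at index.
    shard = records[index::num_workers]
    # Which files this worker actually has to open.
    return sorted({f for f, _ in shard})


host0_files = set(range(8))   # shards 0-7 live on host 0
needed = set(files_read_by_worker(16, 100, 16, 0))
print(sorted(needed - host0_files))  # files a host-0 worker needs but lacks
```

Because sharding happens after the records are read, each worker still touches records in every file, so a worker on host 0 needs shards 8-15 that only host 1 had.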
