
Conversation

@lidiancracy
Copy link
Contributor

When I used process.sh to extract project data, I found that some projects were too large to be extracted in one pass, so I modified extract.py to read the dataset paths in batches. In addition, some datasets contain errors that prevent them from being parsed: no error is thrown, the process simply hangs, which was quite perplexing. I therefore added a time constraint, so that if a project is not processed within a certain duration, it is skipped. I hope this can assist users dealing with large volumes of data.
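The approach described above can be sketched roughly as follows. This is a minimal illustration, not the actual extract.py change: the batch size, timeout value, and the `parse_project` placeholder are all assumptions. Running each parse in a separate process lets us abandon a hang (which raises no exception) by terminating the worker after a deadline:

```python
import multiprocessing as mp

BATCH_SIZE = 100       # assumed batch size, not the PR's actual value
TIMEOUT_SECONDS = 60   # assumed per-project time limit

def parse_project(path):
    """Placeholder for the real extraction logic in extract.py."""
    pass

def process_in_batches(paths, batch_size=BATCH_SIZE, timeout=TIMEOUT_SECONDS):
    """Read dataset paths in batches; skip any project whose parse
    does not finish within `timeout` seconds. Returns skipped paths."""
    skipped = []
    for start in range(0, len(paths), batch_size):
        batch = paths[start:start + batch_size]
        for path in batch:
            worker = mp.Process(target=parse_project, args=(path,))
            worker.start()
            worker.join(timeout)
            if worker.is_alive():
                # The parse hung rather than erroring: kill it and move on.
                worker.terminate()
                worker.join()
                skipped.append(path)
    return skipped
```

A separate process (rather than `signal.alarm` or a thread) is used here because a hung parser cannot be interrupted cooperatively, and `Process.terminate()` can stop it from the outside.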

…ng of projects instead of loading them all at once. During batch processing, I also incorporated timeout handling.
@lidiancracy lidiancracy changed the title update exteacr.py update extract.py Sep 20, 2023
@urialon urialon merged commit 77637c5 into tech-srl:master Sep 20, 2023
@urialon
Copy link
Collaborator

urialon commented Sep 20, 2023

Great, thank you @lidiancracy !

