
Conversation

@lidiancracy
Copy link
Contributor

When I used process.sh to extract project data, I found that some projects were too large to be extracted in one pass, so I modified extract.py to read the dataset paths in batches. In addition, some datasets contain errors that prevent them from being parsed: no error is thrown, the process simply hangs, which was quite perplexing. I therefore added a time constraint, so that if a project is not processed within a certain duration, it is skipped. I hope this can assist users dealing with large volumes of data.
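The approach described above can be sketched roughly as follows. This is a minimal illustration, not the actual extract.py change: the batch size, timeout value, and the `parse_project` placeholder are all assumptions. Running each parse in a separate process lets us abandon a hang (which raises no exception) by terminating the worker after a deadline:

```python
import multiprocessing as mp

BATCH_SIZE = 100       # assumed batch size, not the PR's actual value
TIMEOUT_SECONDS = 60   # assumed per-project time limit

def parse_project(path):
    """Placeholder for the real extraction logic in extract.py."""
    pass

def process_in_batches(paths, batch_size=BATCH_SIZE, timeout=TIMEOUT_SECONDS):
    """Read dataset paths in batches; skip any project whose parse
    does not finish within `timeout` seconds. Returns skipped paths."""
    skipped = []
    for start in range(0, len(paths), batch_size):
        batch = paths[start:start + batch_size]
        for path in batch:
            worker = mp.Process(target=parse_project, args=(path,))
            worker.start()
            worker.join(timeout)
            if worker.is_alive():
                # The parse hung rather than erroring: kill it and move on.
                worker.terminate()
                worker.join()
                skipped.append(path)
    return skipped
```

A separate process (rather than `signal.alarm` or a thread) is used here because a hung parser cannot be interrupted cooperatively, and `Process.terminate()` can stop it from the outside.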

…ng of projects instead of loading them all at once. During batch processing, I also incorporated timeout handling.
@lidiancracy lidiancracy changed the title update exteacr.py update extract.py Sep 20, 2023
@urialon urialon merged commit 77637c5 into tech-srl:master Sep 20, 2023
@urialon
Copy link
Collaborator

urialon commented Sep 20, 2023

Great, thank you @lidiancracy !

