@Andrey170170
Collaborator

Completed the first version of the Distributed downloader package. It is runnable and installable.
There are some non-critical problems:

  • tests still need to be written
  • the package should expose only selected functions and keep internal classes hidden.
Andrey170170 and others added 30 commits June 11, 2024 18:52
Prepared file structure for package creation
some little fixes
fixes for scripts to be runnable in new file structure
Added config file in yaml format
Created new wrapper scripts that control the whole process to make the project more package-like.
Finished the `main.py` script. Added folder structure initialization steps to the `server_prep.py` script (now renamed to `initialization.py`).
Created `fake_profiler.py`, which initializes profiles with a constant `rate_limit`. Rewrote `MPI_download_prep` to follow the new structure logic.
Moved downloader job submission into `schedule_creator`. Added a restriction that prevents the user from running `main.py` if `schedule_creation` was already scheduled and has not completed yet. Some minor changes.
Small fix
Small fix
Added filtering scripts: one based on image size and one based on similarity between MD5 hashsums. Also added scripts to delete images that were filtered out.
Added filtering scripts: one based on image size and one based on similarity between MD5 hashsums. Also added scripts to delete images that were filtered out.
Some minor changes and fixes
Added name_table to have stable names between several sections of data transfer
minor updates
Fixed bug in schedule creation script.
Made downloader scripts consistent with the new configuration format (a `.yaml` file). Added a verification step inside the downloading job (`slurm` files) to reduce the total number of jobs that are scheduled.
Added a check to the main function for a possible infinite loop and for whether all servers have been downloaded.
Added scripts to perform data merging
some small adjustments
Transferred code of all filters into a new file structure.
Changed how the registry works: it now uses decorators. Added wrapper runner scripts for each stage of the tool.
Completed tools refactoring; not tested yet.
Conflicts: README.md, requirements.txt
Some minor fixes
Some minor fixes
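
The decorator-based registry mentioned in the commits above could look roughly like the following. This is a minimal sketch under assumptions, not the code from this PR: `FILTER_REGISTRY`, `register_filter`, `run_filter`, the `"size_based"` key, and the 224-pixel threshold are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical registry mapping a filter name to the function implementing it.
FILTER_REGISTRY: Dict[str, Callable[[dict], bool]] = {}


def register_filter(name: str):
    """Return a decorator that stores the decorated function under `name`."""

    def decorator(func: Callable[[dict], bool]) -> Callable[[dict], bool]:
        if name in FILTER_REGISTRY:
            raise ValueError(f"filter {name!r} is already registered")
        FILTER_REGISTRY[name] = func
        return func  # return the function unchanged so it stays callable directly

    return decorator


@register_filter("size_based")
def keep_large_images(image: dict) -> bool:
    """Keep images whose width and height both meet an assumed threshold."""
    return image["width"] >= 224 and image["height"] >= 224


def run_filter(name: str, images: List[dict]) -> List[dict]:
    """Look up a registered filter by name and apply it to each image record."""
    selected = FILTER_REGISTRY[name]
    return [image for image in images if selected(image)]
```

With this pattern, adding a new filter only requires decorating a function; the runner script does not need to change.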
Andrey170170 and others added 4 commits July 23, 2024 17:36
Updated tools to follow the new Config/Checkpoint logic. Refactored code to follow the snake_case scheme for all file fields.
Added a config checking mechanism (compares the config with a template). Added reset options for the downloader and tools, so they can now be relaunched automatically.
Updated the structure so the package is installable.
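
A config check that compares a config with a template, as described in the commits above, might be sketched as follows. This is an assumption about the general idea rather than the implementation in this PR: `find_missing_keys` and every key name below are hypothetical, and plain dictionaries stand in for a parsed `.yaml` file.

```python
def find_missing_keys(config: dict, template: dict, prefix: str = "") -> list:
    """Return dotted paths of keys that the template has but the config lacks."""
    missing = []
    for key, template_value in template.items():
        path = f"{prefix}{key}"
        if key not in config:
            missing.append(path)
        elif isinstance(template_value, dict):
            if isinstance(config[key], dict):
                # Recurse into nested sections, extending the dotted path.
                missing.extend(
                    find_missing_keys(config[key], template_value, path + ".")
                )
            else:
                # The template expects a section, but the config has a scalar.
                missing.append(path)
    return missing


# Hypothetical template and user config, as they might look after YAML parsing.
example_template = {
    "path_to_output_folder": None,
    "downloader_parameters": {"batch_size": None, "rate_limit": None},
}
example_config = {
    "path_to_output_folder": "/tmp/out",
    "downloader_parameters": {"batch_size": 100},
}

print(find_missing_keys(example_config, example_template))
# prints ['downloader_parameters.rate_limit']
```

Reporting every missing key at once, rather than failing on the first one, lets a user fix the config in a single pass.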
Andrey170170 and others added 5 commits July 29, 2024 23:42
Updated documentation (Readme.md file)
Added example for ignored_servers
Small readme fixes
Andrey170170 and others added 3 commits August 7, 2024 20:48
Changed gbif_id to source_id
Co-authored-by: Matt Thompson <31709066+thompsonmj@users.noreply.github.com>
@thompsonmj thompsonmj self-requested a review February 4, 2025 15:19
Contributor

@thompsonmj thompsonmj left a comment

👍

@Andrey170170 Andrey170170 merged commit a381072 into main Feb 4, 2025
@Andrey170170 Andrey170170 deleted the refactoring branch February 4, 2025 15:20