Scripts that download and transform dev.to blog articles.
The main.py script runs three subordinate scripts, download_articles.py, download_images.py, and copy_and_transform.py. The following will run the main script, which downloads articles, images, and then makes copies which transform the original markdown content:
python main.py <username> --root <root dir> --download_dir <download dir> --transformed_dir <transform dir> --article <article name>
Options:
usernamerefers to the dev.to user whose articles will be downloadedroot dirdetermines which base path the files are downloaded to - defaults to the current directory.download diris the directory to which the markdown files and image files will be downloaded - defaults todownloaded_filestransform dirwill contain a copy of the contents ofdownload dirwith changes applied to the markdown (localizes links and fixes some markdown to work with jekyll) - defaults totransformed_files- Once this script has been run for all articles, the
--articleoption can be used to download and transform additional articles one at a time.article namerefers to the name of the article in the dev.to url - a wildcard match to this parameter is applied.
Example:
python main.py ben --root ~/downloaded_articles
When download_articles.py runs, it generates an articles_dict.json file. This file stores key-value pair mappings of article names to article titles. This is used by the copy_and_transform.py script to produce local links when an article links another article by the current author, as specified by username.
By default, these scripts download the markdown files to the _posts subdirectory, and the images to the assets/images subdirectory, to match the structure expected by the jekyll static site generator. This can be modified in common.py.
To delete an article, run:
python delete_matching.py <article name> --root <root dir>
This will delete matching markdown files and image directories, and will also remove the mapping for that article from articles_dict.json.