
sp1thas/scrapy-folder-tree


scrapy-folder-tree


This package provides Scrapy pipelines that store downloaded files and images in nested folder structures rather than in a single flat directory.

Supported folder structures:

For example, a scraped file named 05b40af07cb3284506acbf395452e0e93bfc94c8.jpg can be stored using any of the following structures:

Using the file name

class: scrapy_folder_tree.ImagesHashTreePipeline

full
└── 0
    └── 5
        └── b
            └── 05b40af07cb3284506acbf395452e0e93bfc94c8.jpg

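The tree above appears to use the first three characters of the file name as directory levels under full/. A minimal sketch of that mapping, assuming a depth of three; the helper hash_tree_path is hypothetical and is not part of the package's API:

```python
from pathlib import PurePosixPath


def hash_tree_path(filename: str, depth: int = 3, root: str = "full") -> str:
    """Nest `filename` under one directory per leading character.

    Hypothetical helper for illustration; depth=3 is an assumption taken
    from the example tree (0/5/b), not from the package source.
    """
    if len(filename) < depth:
        raise ValueError("filename is shorter than the requested tree depth")
    # One directory level per leading character, e.g. "0", "5", "b".
    levels = list(filename[:depth])
    return str(PurePosixPath(root, *levels, filename))


print(hash_tree_path("05b40af07cb3284506acbf395452e0e93bfc94c8.jpg"))
# full/0/5/b/05b40af07cb3284506acbf395452e0e93bfc94c8.jpg
```

Scrapy's default file names are SHA1 hex digests of the request URL, so their leading characters are spread roughly evenly over 16 hex digits. Three levels therefore give up to 4096 leaf directories, which keeps any single directory from growing very large.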
Using the crawling time

class: scrapy_folder_tree.ImagesTimeTreePipeline

full
└── 0
    └── 11
        └── 48
            └── 05b40af07cb3284506acbf395452e0e93bfc94c8.jpg

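Reading the example levels (0, 11, 48) as the hour, minute and second of the crawl, the layout could be reproduced roughly as below. The helper time_tree_path is hypothetical, and both the hour/minute/second interpretation and the unpadded numbers are assumptions drawn from the example, not confirmed behavior of ImagesTimeTreePipeline:

```python
from datetime import datetime
from pathlib import PurePosixPath


def time_tree_path(filename: str, crawled_at: datetime, root: str = "full") -> str:
    """Nest `filename` under hour/minute/second of `crawled_at`.

    Hypothetical helper; the level meaning and the lack of zero-padding
    are assumptions based on the example tree (0/11/48).
    """
    levels = (str(crawled_at.hour), str(crawled_at.minute), str(crawled_at.second))
    return str(PurePosixPath(root, *levels, filename))


ts = datetime(2022, 1, 24, 0, 11, 48)
print(time_tree_path("05b40af07cb3284506acbf395452e0e93bfc94c8.jpg", ts))
# full/0/11/48/05b40af07cb3284506acbf395452e0e93bfc94c8.jpg
```

Under this reading, files crawled at the same time of day on different dates end up in the same directory.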
Using the crawling date

class: scrapy_folder_tree.ImagesDateTreePipeline

full
└── 2022
    └── 1
        └── 24
            └── 05b40af07cb3284506acbf395452e0e93bfc94c8.jpg

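The date layout reads as year/month/day of the crawl (2022, 1, 24). A minimal sketch, assuming unpadded numbers as in the example; date_tree_path is a hypothetical helper, not the package's API:

```python
from datetime import date
from pathlib import PurePosixPath


def date_tree_path(filename: str, crawled_on: date, root: str = "full") -> str:
    """Nest `filename` under year/month/day of `crawled_on`.

    Hypothetical helper; unpadded month/day (1, not 01) is an assumption
    taken from the example tree. A datetime works too, since it subclasses date.
    """
    levels = (str(crawled_on.year), str(crawled_on.month), str(crawled_on.day))
    return str(PurePosixPath(root, *levels, filename))


print(date_tree_path("05b40af07cb3284506acbf395452e0e93bfc94c8.jpg", date(2022, 1, 24)))
# full/2022/1/24/05b40af07cb3284506acbf395452e0e93bfc94c8.jpg
```

Without zero-padding, a plain lexical sort of the directory names does not match chronological order ("10" sorts before "2"), which matters if you later list these folders with tools that sort by name.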

Installation

pip install scrapy-folder-tree

Usage

Use the following settings in your project:

ITEM_PIPELINES = { 'scrapy_folder_tree.FilesHashTreePipeline': 300 }
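The line above only registers the pipeline. Assuming these classes build on Scrapy's built-in FilesPipeline and ImagesPipeline (the full/ prefix and SHA1-style file name in the trees suggest so, but this README does not state it), a fuller settings.py would also set the storage root those base classes read. The directory names below are placeholders:

```python
# settings.py: a sketch, assuming the scrapy_folder_tree pipelines extend
# Scrapy's FilesPipeline / ImagesPipeline.
ITEM_PIPELINES = {
    "scrapy_folder_tree.FilesHashTreePipeline": 300,
}

# Storage root read by Scrapy's FilesPipeline (placeholder path).
FILES_STORE = "downloads"

# The Images*TreePipeline classes would instead rely on IMAGES_STORE,
# which Scrapy's ImagesPipeline reads (it also requires Pillow).
IMAGES_STORE = "images"

# A spider then yields items listing the URLs to download, e.g.
#     yield {"file_urls": ["https://example.com/photo.jpg"]}
# ("image_urls" is the default field name for the images pipeline).
```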