Retail products classifier

This project classifies retail products into categories. Although the categories in this example form a hierarchy, to keep things simple I treated every subcategory as a top-level category. The main packages used in this project are sklearn, nltk and dataset.

You can read the post explaining this project here.

You will need Python 3 or later to use this project.

Installation

1. Download

First, get the text-classification-python project files into your workspace:

$ git clone https://github.com/joaorafaelm/text-classification-python
$ cd text-classification-python

2. Virtualenv (Optional)

You should already know what virtualenv is at this stage, so simply create one for the project:

$ virtualenv venv
$ source venv/bin/activate

3. Requirements

The dependencies are listed in requirements.txt. To install them, simply type:

$ pip install -r requirements.txt

Running the scraper

To run the scraper you will need a CSV of ASINs (Amazon's product identifiers). You can find such lists by searching the web. Then run:
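Before feeding a downloaded list to the scraper, it can help to clean it up. The following is a minimal sketch, not code from this project: it assumes the ASINs sit in the first column of the CSV and have the usual form of 10 letters and digits. The function name read_asins and the sample values are made up for illustration, and the layout amazon_scrape.py actually expects may differ.

```python
import csv
import io
import re

# Assumption: an ASIN is 10 characters long and made of uppercase letters and digits.
ASIN_PATTERN = re.compile(r"[A-Z0-9]{10}")


def read_asins(csv_file):
    """Return the unique, well-formed ASINs found in the first column, in file order."""
    seen = set()
    asins = []
    for row in csv.reader(csv_file):
        if not row:
            continue  # skip blank lines
        candidate = row[0].strip().upper()
        if ASIN_PATTERN.fullmatch(candidate) and candidate not in seen:
            seen.add(candidate)
            asins.append(candidate)
    return asins


# Hypothetical sample input. The "asin" header row is dropped because it
# does not look like an ASIN, and the lowercase duplicate is merged.
sample = io.StringIO("asin\nB00EXAMPLE\nb00example\nnot-an-asin\n0123456789\n")
print(read_asins(sample))  # ['B00EXAMPLE', '0123456789']
```

To use it on a real file, open the file with open(path, newline="") and pass the file object to read_asins.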

python amazon_scrape.py

All data will be saved to SQLite (file database.db), in the products table.
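To check what the scraper stored, you can inspect the table with Python's built-in sqlite3 module. This is only a sketch: the helper summarize_table is hypothetical, and the columns used in the demo (title, category) are invented, since the real schema of the products table is not described here. The demo runs against an in-memory database so it works without any scraped data.

```python
import sqlite3


def summarize_table(connection, table="products"):
    """Return (column_names, row_count) for the given table."""
    # Table names cannot be bound as SQL parameters, so confirm the table
    # exists before interpolating its name into the statements below.
    exists = connection.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table,),
    ).fetchone()
    if exists is None:
        raise ValueError(f"no such table: {table}")
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    columns = [row[1] for row in connection.execute(f'PRAGMA table_info("{table}")')]
    (count,) = connection.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
    return columns, count


# Demo on an in-memory database with made-up columns.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, title TEXT, category TEXT)"
)
connection.execute(
    "INSERT INTO products (title, category) VALUES ('USB cable', 'Electronics')"
)
print(summarize_table(connection))  # (['id', 'title', 'category'], 1)
connection.close()
```

For the real data, replace ":memory:" with the path to the database file and drop the CREATE and INSERT statements.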

Dump the database

datafreeze .datafreeze.yaml

This will create a JSON file in the dumps/ directory.

Data preparation

python data_prep.py

The script will create a new file called products.json at the root of the project, and print out the category tree structure. Change the value of the variables default_depth, min_samples and domain if you need more data.
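To give a feel for why these variables affect how much data you get, here is a small sketch of one plausible reading of default_depth and min_samples. It is not taken from data_prep.py: it assumes that the depth cuts each category path to a fixed number of levels and that min_samples drops categories with too few products. The function prepare and the sample products are invented for illustration, and domain is left out because its meaning is not described here.

```python
from collections import Counter


def prepare(products, depth, min_samples):
    """Label each product by its category path cut to `depth` levels,
    then drop labels that have fewer than `min_samples` products."""
    labeled = [
        (name, " > ".join(path[:depth]))
        for name, path in products
        if path  # skip products without any category
    ]
    counts = Counter(label for _, label in labeled)
    return [(name, label) for name, label in labeled if counts[label] >= min_samples]


# Made-up products with their category paths.
products = [
    ("USB cable", ["Electronics", "Accessories", "Cables"]),
    ("HDMI cable", ["Electronics", "Accessories", "Cables"]),
    ("Phone case", ["Electronics", "Accessories", "Cases"]),
    ("Garden hose", ["Garden", "Watering"]),
]

# A deeper cut gives finer categories, so fewer of them reach min_samples.
print(len(prepare(products, depth=3, min_samples=2)))  # 2
# A shallower cut merges subcategories, so more products survive.
print(len(prepare(products, depth=2, min_samples=2)))  # 3
```

Under this reading, lowering the depth or lowering min_samples both keep more products, at the cost of coarser or less well-supported categories.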

Classify and get prediction results.

python classify.py

It will print out the accuracy of each category, along with the confusion matrix.
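If you are unsure how to read that output, the sketch below shows how a confusion matrix and a per-category accuracy can be computed from true and predicted labels. It uses only the standard library and made-up labels, and it is not the code in classify.py. Here "accuracy of a category" is taken to mean the share of products truly in that category that were predicted correctly, which is an assumption about what the script reports. In sklearn, sklearn.metrics.confusion_matrix builds the same kind of table.

```python
from collections import Counter


def build_confusion_matrix(y_true, y_pred, labels):
    """Rows are true categories, columns are predicted categories."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(true, pred)] for pred in labels] for true in labels]


def per_category_accuracy(matrix, labels):
    """Share of each true category that was predicted correctly."""
    result = {}
    for i, label in enumerate(labels):
        total = sum(matrix[i])
        result[label] = matrix[i][i] / total if total else 0.0
    return result


# Made-up labels for six products.
labels = ["Books", "Electronics", "Toys"]
y_true = ["Books", "Books", "Electronics", "Electronics", "Toys", "Toys"]
y_pred = ["Books", "Toys", "Electronics", "Electronics", "Toys", "Books"]

matrix = build_confusion_matrix(y_true, y_pred, labels)
for label, row in zip(labels, matrix):
    print(label, row)
# Books [1, 0, 1]
# Electronics [0, 2, 0]
# Toys [1, 0, 1]

print(per_category_accuracy(matrix, labels))
# {'Books': 0.5, 'Electronics': 1.0, 'Toys': 0.5}
```

Off-diagonal cells show which categories get confused with each other: in this example one book was predicted as a toy and one toy as a book.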

