This project performs customer segmentation on mall customer data using K-Means clustering. It includes both a command-line analysis pipeline and an interactive web dashboard built with Flask.
- Data Exploration & Preprocessing
- Automatic Optimal Cluster Detection (Elbow Method & Silhouette Score)
- 2D & 3D Visualizations of customer segments
- Detailed Cluster Analysis with demographic and behavioral insights
- Marketing Strategy Recommendations based on segment characteristics
- Interactive Web Dashboard for exploring segmentation results
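As a rough illustration of how the cluster count can be chosen automatically, the sketch below runs a small pure-Python K-Means for several values of k and keeps the k with the highest mean silhouette score. It is not the code in kmeans_clustering.py. The function names, the farthest-point seeding (a deterministic stand-in for random or k-means++ initialization), and the sample points are all assumptions made for the example.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_point_init(points, k):
    """Deterministic seeding: start at points[0], then keep adding the point
    farthest from the centroids chosen so far (an assumption for this sketch)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return centroids

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (labels, inertia)."""
    centroids = farthest_point_init(points, k)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                updated.append(centroids[c])  # keep the centroid of an empty cluster
        if updated == centroids:
            break
        centroids = updated
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, inertia

def silhouette(points, labels):
    """Mean silhouette score; singleton clusters score 0 by convention."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return 0.0
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Made-up (annual income k$, spending score) pairs forming three loose groups.
points = [(15, 80), (17, 82), (16, 78), (18, 81),
          (60, 50), (62, 48), (61, 52), (59, 49),
          (100, 15), (102, 18), (101, 12), (99, 16)]

results = {k: kmeans(points, k) for k in range(2, 6)}
scores = {k: silhouette(points, labels) for k, (labels, _) in results.items()}
best_k = max(scores, key=scores.get)
print("best k by silhouette:", best_k)
```

The inertia values kept in `results` are what an elbow plot would show: plotting them against k and looking for the bend is the visual counterpart of the silhouette check.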
The analysis uses the Mall_Customers.csv dataset with the following features:
- CustomerID: Unique identifier for each customer
- Gender: Customer's gender (Male/Female)
- Age: Customer's age
- Annual Income (k$): Customer's annual income in thousands of dollars
- Spending Score (1-100): Score assigned by the mall based on customer behavior and spending patterns
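The columns above can be read with the standard library alone. The following is a minimal sketch, assuming the CSV header uses exactly the feature names listed; the `load_customers` helper and the three sample rows are hypothetical and only show the expected layout.

```python
import csv
import io

# Made-up sample rows in the column layout described above (not real data).
SAMPLE_CSV = """CustomerID,Gender,Age,Annual Income (k$),Spending Score (1-100)
1,Male,25,30,70
2,Female,40,80,20
3,Female,33,55,50
"""

def load_customers(handle):
    """Parse CSV rows into dicts, converting the numeric columns to int."""
    customers = []
    for row in csv.DictReader(handle):
        customers.append({
            "id": int(row["CustomerID"]),
            "gender": row["Gender"],
            "age": int(row["Age"]),
            "income_k": int(row["Annual Income (k$)"]),
            "spending": int(row["Spending Score (1-100)"]),
        })
    return customers

customers = load_customers(io.StringIO(SAMPLE_CSV))
print(customers[0])
```

For the real dataset, the same helper could be called with a file handle, for example `with open("Mall_Customers.csv", newline="") as f: customers = load_customers(f)`.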
- Clone the repository:

      git clone https://github.com/username/mall-customer-segmentation.git
      cd mall-customer-segmentation

- Create and activate a virtual environment (optional but recommended):

      python -m venv venv
      # On Windows
      venv\Scripts\activate
      # On macOS/Linux
      source venv/bin/activate

- Install the required packages:

      pip install -r requirements.txt
To run the complete analysis pipeline from the command line:

    python main.py

This generates visualizations and CSV files with the segmentation results.
To launch the interactive web dashboard:

    python run_flask_app.py

This starts a Flask web server and opens a browser tab at http://127.0.0.1:5000/ with the application.
    mall-customer-segmentation/
    │
    ├── Mall_Customers.csv        # Input dataset
    ├── requirements.txt          # Python dependencies
    ├── README.md                 # Project documentation
    │
    ├── data_preprocessing.py     # Data loading and preprocessing
    ├── kmeans_clustering.py      # K-means clustering implementation
    ├── advanced_analysis.py      # Multi-dimensional clustering
    ├── main.py                   # CLI analysis pipeline
    │
    ├── app.py                    # Flask web application
    ├── run_flask_app.py          # Script to run the web app
    │
    └── templates/                # HTML templates for Flask
        ├── index.html            # Landing page
        ├── results.html          # Analysis results page
        └── error.html            # Error page

The K-means clustering algorithm identifies distinct customer segments based on annual income and spending patterns. For each segment, the analysis provides:
- Demographic Profile: Age distribution, gender ratio, income levels
- Spending Behavior: Average spending score and patterns
- Marketing Recommendations: Tailored strategies for each segment
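The per-segment profile listed above boils down to grouping customers by cluster label and aggregating each group. The sketch below shows one way to do that with the standard library; the `profile_segments` helper, the field names, and the four labeled rows are made up for illustration and do not come from the project's own analysis code.

```python
from collections import defaultdict
from statistics import mean

# Made-up rows: (cluster label, gender, age, annual income k$, spending score).
labeled = [
    (0, "Female", 23, 20, 80),
    (0, "Male", 27, 24, 76),
    (1, "Female", 45, 90, 15),
    (1, "Female", 51, 86, 21),
]

def profile_segments(rows):
    """Group rows by cluster label and summarize demographics and spending."""
    groups = defaultdict(list)
    for label, gender, age, income, spending in rows:
        groups[label].append((gender, age, income, spending))
    profiles = {}
    for label, members in sorted(groups.items()):
        profiles[label] = {
            "size": len(members),
            "female_share": sum(g == "Female" for g, _, _, _ in members) / len(members),
            "mean_age": mean(a for _, a, _, _ in members),
            "mean_income_k": mean(i for _, _, i, _ in members),
            "mean_spending": mean(s for _, _, _, s in members),
        }
    return profiles

profiles = profile_segments(labeled)
for label, summary in profiles.items():
    print(label, summary)
```

A summary like this is the natural input for the marketing recommendations: each segment's income and spending averages indicate which kind of offer is likely to fit it.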
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
Before pushing to GitHub:

- Create a .gitignore file with the following content:

      # Python bytecode
      __pycache__/
      *.py[cod]
      *$py.class

      # Virtual environments
      venv/
      env/
      ENV/

      # Generated files
      *.png
      *.csv
      !Mall_Customers.csv
      # Keep README screenshots, which *.png would otherwise ignore
      !screenshots/*.png

      # Flask session files
      flask_session/

      # IDE files
      .idea/
      .vscode/
      *.swp
      *.swo

- Create a screenshots directory for README images:

      mkdir screenshots

- Run the analysis once to generate screenshots for documentation:

      python main.py

- Copy key visualizations to the screenshots directory for README use:

      # On Windows
      copy customer_segments_5_clusters.png screenshots\customer_segments.png
      # On macOS/Linux
      cp customer_segments_5_clusters.png screenshots/customer_segments.png

- Update the repository URL in this README with your actual GitHub username and repository name.