The OpenWeatherMap Data Pipeline Engineering Project collects real-time weather data from multiple cities, processes it through a robust ETL pipeline, and generates insightful visualizations to reveal patterns and trends in temperature, humidity, and wind conditions


🌤️ OpenWeatherMap Data Pipeline Engineering Project

1. Screenshots

  1. Docker Compose Containers
    Docker Compose Up
    This shows Docker running our containers for the weather pipeline, Prometheus, and Grafana.

  2. Grafana Dashboard
    Grafana Dashboard
    Displays the pipeline metrics (data volumes, durations, API performance) once properly configured.

  3. Airflow Login Page
    Airflow Login
    Airflow prompts for username and password to access its web UI.

  4. Airflow DAGs Interface
    Airflow DAGs
    A list of DAGs (pipelines) that can be scheduled and monitored via Airflow.


2. Overview

The OpenWeatherMap Data Pipeline Engineering Project is a comprehensive data engineering solution to collect, process, and analyze weather data from the OpenWeatherMap API. It demonstrates a complete ETL pipeline with integrated monitoring, visualization, and multiple deployment options.

```mermaid
flowchart LR
    User([User])
    API[OpenWeatherMap API]
    ETL[ETL Pipeline]
    DB[(Storage)]
    Monitor[Monitoring]
    Insight[Analytics]
    User --> API
    API --> ETL
    ETL --> DB
    DB --> Insight
    ETL <--> Monitor
    Insight --> User
    style API fill:#93c5fd,stroke:#2563eb,stroke-width:2px
    style ETL fill:#fde68a,stroke:#d97706,stroke-width:2px
    style Monitor fill:#d1fae5,stroke:#059669,stroke-width:2px
    style Insight fill:#fbcfe8,stroke:#db2777,stroke-width:2px
```

3. Table of Contents

  1. Key Features
  2. Technology Stack
  3. Project Structure
  4. Installing & Running (The Story)
  5. Processing Pipeline
  6. Data Analysis
  7. Deployment Options
  8. Monitoring
  9. References
  10. License

4. Key Features

  • Automated Weather Data Collection

    • Multi-city weather data extraction
    • Configurable sampling frequency
    • Resilient retry logic for API calls
  • Robust Data Processing

    • Data cleaning, outlier handling
    • Derived metric computation
  • Comprehensive Analytics

    • City-to-city comparisons
    • Temperature trend analysis
    • Weather pattern visualizations
  • Enterprise-Grade Infrastructure

    • Docker containerization
    • Kubernetes orchestration
    • Optional Airflow scheduling
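
The resilient retry logic listed above can be sketched as a small decorator. This is a minimal illustration, not the project's actual implementation: the `with_retries` helper and `flaky_fetch` stand-in are hypothetical names, and the backoff parameters are placeholders.

```python
import time
from functools import wraps


def with_retries(max_attempts=3, base_delay=1.0):
    """Retry a flaky call with exponential backoff (hypothetical helper)."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            delay = base_delay
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the error
                    time.sleep(delay)
                    delay *= 2  # exponential backoff
        return wrapper
    return decorator


@with_retries(max_attempts=3, base_delay=0.01)
def flaky_fetch(state={"calls": 0}):
    # Stand-in for an OpenWeatherMap API call that fails twice, then succeeds.
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("transient network error")
    return {"city": "Kuala Lumpur", "temp_c": 31.0}
```

In the real pipeline the decorated function would issue the HTTP request; the decorator keeps retry policy out of the extraction logic itself.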

5. Technology Stack

  • Python 3.12+
  • Pandas / Matplotlib (data processing & plotting)
  • Docker / Kubernetes
  • Prometheus / Grafana
  • Apache Airflow (for advanced scheduling)
```mermaid
flowchart TD
    Python[Python 3.12] --> Pandas[Pandas] & Matplotlib[Matplotlib]
    Python --> Docker[Docker]
    Docker --> K8s[Kubernetes]
    Python --> Airflow[Airflow]
    Python --> Prometheus[Prometheus]
    Prometheus --> Grafana[Grafana]
```

6. Project Structure

```
weather_data_pipeline/
├── README.md
├── images/
│   ├── Grafana.png
│   ├── apache_airflow_interface.png
│   ├── apache_airflow_login.png
│   └── docker-compose up.png
├── config/
│   └── config.yaml
├── data/
│   ├── raw/
│   ├── processed/
│   └── output/
├── logs/
├── requirements.txt
├── src/
│   ├── extract.py
│   ├── transform.py
│   ├── load.py
│   ├── analyze.py
│   └── utils.py
├── main.py
├── Dockerfile
├── docker-compose.yml
├── airflow/
│   └── weather_pipeline_dag.py
├── kubernetes/
│   └── deployment.yaml
└── monitoring/
    ├── prometheus.yml
    └── grafana-dashboard.json
```
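
`config/config.yaml` holds the pipeline's settings. A plausible layout is sketched below; the key names are illustrative assumptions, not the project's actual schema:

```yaml
api:
  base_url: "https://api.openweathermap.org/data/2.5/weather"
  units: metric
cities:
  - London
  - Tokyo
  - Kuala Lumpur
sampling:
  interval_minutes: 30
  max_retries: 3
paths:
  raw: data/raw/
  processed: data/processed/
  output: data/output/
```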

7. Installing & Running (The Story)

Below is a step-by-step flow illustrating how to install Docker, confirm everything is up and running, then transition to Kubernetes and Airflow.

Step 1: Install & Verify Docker

  1. On macOS, if Docker hangs on startup, remove its stale networking helper:
    sudo launchctl remove com.docker.vmnetd
  2. Verify Docker works end to end:
    docker pull hello-world
    docker run hello-world
    docker ps
    docker ps -a
    docker --version
  3. Check existing images & remove any (optional):
    docker images
    docker rmi <IMAGE_ID>
  4. Open Docker Desktop GUI (macOS):
    open -a Docker

Step 2: Build & Run Our Weather Pipeline

  1. Build the Docker image:

    docker build -t weather-pipeline .
  2. Run the container with your API key:

    docker run --env-file .env weather-pipeline
  3. Spin up services via Docker Compose:

    docker compose up

    If you get a port conflict (e.g., for port 9090), free the port and retry:

    lsof -i :9090
    kill -9 <PID>
    pkill -f prometheus
    docker compose up
  4. Check Docker containers:

    docker ps

    Screenshot:
    Docker Compose Up

    You should see 3 containers:

    • Weather Pipeline (port 8000)
    • Prometheus (port 9090)
    • Grafana (port 3000)
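
The `--env-file .env` flag in step 2 expects a `.env` file alongside the project. Based on the variable name used for the Kubernetes secret later in this README, it would look like this (substitute your real key):

```
API_KEY=your_openweathermap_api_key
```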

Step 3: Monitor Pipeline with Grafana

Once Docker is up, open Grafana at http://localhost:3000.

  • Username: admin
  • Password: admin (by default, if unchanged)

Screenshot:
Grafana Dashboard

You’ll see panels for:

  • Pipeline Duration
  • Data Volumes (Records Processed & Data Points Extracted)
  • API Performance

If you see “No data,” check your prometheus.yml or the pipeline’s main logs to ensure metrics are being scraped properly.
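
A common cause of "No data" is a scrape config that does not point at the pipeline's metrics endpoint. A minimal `prometheus.yml` sketch, assuming the pipeline exposes metrics on port 8000 under the Compose service name `weather-pipeline` (both assumptions — match your actual service name and port):

```yaml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: weather_pipeline
    static_configs:
      - targets: ["weather-pipeline:8000"]
```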

Step 4: Using Airflow Locally

Screenshot (Login):
Airflow Login

  1. Install & Initialize Airflow:
    pip install apache-airflow
    airflow db init
  2. Create Admin User:
    airflow users create \
      --username admin \
      --password admin \
      --firstname Admin \
      --lastname User \
      --role Admin \
      --email admin@example.com
  3. Add the DAG:
    mkdir -p ~/airflow/dags
    cp airflow/weather_pipeline_dag.py ~/airflow/dags/
  4. Start Airflow (run the webserver and scheduler in separate terminals):
    airflow webserver --port 8080
    airflow scheduler
  5. Screenshot (DAGs):
    Airflow DAGs
    You should now see a list of DAGs (pipelines). Enable or trigger the relevant ones.

Step 5: Kubernetes (Optional)

  1. Start Minikube:
    minikube start
  2. Apply the Weather Pipeline Deployment:
    kubectl apply -f kubernetes/deployment.yaml
  3. (Optional) Create a secret for your API key:
    kubectl create secret generic weather-pipeline-secrets \
      --from-literal=API_KEY=your_openweathermap_api_key
  4. Check pods:
    kubectl get pods

Now your pipeline can run in a Kubernetes environment!
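
A minimal `kubernetes/deployment.yaml` sketch wiring in the secret from step 3 is shown below; the image name, labels, and replica count are assumptions, not necessarily the project's actual manifest:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: weather-pipeline
spec:
  replicas: 1
  selector:
    matchLabels:
      app: weather-pipeline
  template:
    metadata:
      labels:
        app: weather-pipeline
    spec:
      containers:
        - name: weather-pipeline
          image: weather-pipeline:latest
          env:
            - name: API_KEY
              valueFrom:
                secretKeyRef:
                  name: weather-pipeline-secrets
                  key: API_KEY
```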


8. Processing Pipeline

```mermaid
graph TD
    A[Input Config] --> B[Data Extraction]
    B --> C[Data Transformation]
    C --> D[Data Loading]
    C --> E[Data Analysis]
    E --> F[Visualization]
    F --> G[Results]
```

  1. Extract: Grab weather data from OpenWeatherMap
  2. Transform: Clean, normalize, handle outliers
  3. Load: Store processed data in local files or DB
  4. Analyze: Generate city comparisons, identify trends
  5. Visualize: Plot charts & graphs (Matplotlib, etc.)
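
The Transform step above (cleaning, outlier handling, derived metrics) can be sketched on a single record. The field names mirror OpenWeatherMap's JSON response, whose default temperature unit is Kelvin, but the function and outlier thresholds here are illustrative, not the actual `src/transform.py` logic:

```python
def transform_record(raw):
    """Normalize one OpenWeatherMap-style reading (hypothetical schema)."""
    temp_c = raw["main"]["temp"] - 273.15  # API default units are Kelvin
    humidity = raw["main"]["humidity"]
    return {
        "city": raw["name"],
        "temp_c": round(temp_c, 1),
        "humidity_pct": humidity,
        "wind_ms": raw["wind"]["speed"],
        # Flag physically implausible readings (illustrative thresholds).
        "is_outlier": not (-60.0 <= temp_c <= 60.0) or not (0 <= humidity <= 100),
    }


sample = {
    "name": "London",
    "main": {"temp": 288.15, "humidity": 72},
    "wind": {"speed": 4.1},
}
clean = transform_record(sample)
```

Keeping the transform a pure function of one record makes it easy to unit-test and to map over a batch of extracted readings.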

9. Data Analysis

```mermaid
mindmap
  root((Weather Analysis))
    City Comparisons
      Temperature
      Humidity
      Wind Speed
    Temporal Analysis
      Daily Variation
      Long-term Trend
    Weather Conditions
      Condition Distribution
      Alerts
    Correlation
      Temperature-Humidity
      Wind-Temperature
```

The pipeline can generate:

  • Time-series plots (temperature trends)
  • Comparison charts across multiple cities
  • Correlation analyses (humidity vs. temperature)

10. Deployment Options

  • Local Docker: docker-compose up --build
  • Kubernetes (Minikube): minikube start && kubectl apply -f deployment.yaml
  • Airflow: Local scheduler and UI (port 8080)
  • EC2: GitHub Actions CI/CD for continuous deployment

11. Monitoring

  1. Prometheus collects pipeline metrics (port 9090).
  2. Grafana visualizes metrics (port 3000).
    • Import monitoring/grafana-dashboard.json for a pre-built dashboard.
  3. Alerts can be configured in Prometheus/Grafana to notify on pipeline failures or anomalies.

12. References


13. License

© 2025 Fahmi Zainal. All rights reserved. This project and its contents are proprietary and confidential. Unauthorized copying, distribution, or modification of this software, via any medium, is strictly prohibited. For licensing inquiries, please contact the project maintainer.
