- **Docker Compose Containers** — Docker running our containers for the weather pipeline, Prometheus, and Grafana.
- **Grafana Dashboard** — displays the pipeline metrics (data volumes, durations, API performance) once properly configured.
- **Airflow Login Page** — Airflow prompts for a username and password to access its web UI.
- **Airflow DAGs Interface** — a list of DAGs (pipelines) that can be scheduled and monitored via Airflow.
The OpenWeatherMap Data Pipeline Engineering Project is a comprehensive data engineering solution to collect, process, and analyze weather data from the OpenWeatherMap API. It demonstrates a complete ETL pipeline with integrated monitoring, visualization, and multiple deployment options.
```mermaid
flowchart LR
    User([User])
    API[OpenWeatherMap API]
    ETL[ETL Pipeline]
    DB[(Storage)]
    Monitor[Monitoring]
    Insight[Analytics]

    User --> API
    API --> ETL
    ETL --> DB
    DB --> Insight
    ETL <--> Monitor
    Insight --> User

    style API fill:#93c5fd,stroke:#2563eb,stroke-width:2px
    style ETL fill:#fde68a,stroke:#d97706,stroke-width:2px
    style Monitor fill:#d1fae5,stroke:#059669,stroke-width:2px
    style Insight fill:#fbcfe8,stroke:#db2777,stroke-width:2px
```
- Key Features
- Technology Stack
- Project Structure
- Installing & Running (The Story)
- Processing Pipeline
- Data Analysis
- Deployment Options
- Monitoring
- References
- License
- **Automated Weather Data Collection**
  - Multi-city weather data extraction
  - Configurable sampling frequency
  - Resilient retry logic for API calls
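The retry behavior can be sketched as a small backoff wrapper; the function names and backoff parameters below are illustrative, not the project's actual API:

```python
import time

def fetch_with_retry(fetch, max_retries=3, backoff_s=1.0):
    """Call `fetch`, retrying transient failures with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return fetch()
        except ConnectionError:
            if attempt == max_retries - 1:
                raise  # give up after the final attempt
            time.sleep(backoff_s * (2 ** attempt))

# Demo: a fake API call that fails twice, then succeeds on the third try.
calls = {"n": 0}
def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient API error")
    return {"city": "Kuala Lumpur", "temp_c": 31.2}

result = fetch_with_retry(flaky_fetch, backoff_s=0.01)
```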
- **Robust Data Processing**
  - Data cleaning and outlier handling
  - Derived metric computation
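As a rough illustration with made-up values and a hypothetical schema (not the pipeline's real one), the cleaning and derivation steps might look like:

```python
import pandas as pd

# Hypothetical raw readings; the 95.0 °C row is an obvious outlier.
raw = pd.DataFrame({
    "city": ["KL", "KL", "Penang", "Penang"],
    "temp_c": [31.0, 95.0, 29.5, 30.1],
    "humidity": [70.0, 68.0, None, 75.0],
})

clean = raw.copy()
# Fill missing humidity with each city's median.
clean["humidity"] = clean.groupby("city")["humidity"].transform(
    lambda s: s.fillna(s.median())
)
# Drop physically implausible temperatures.
clean = clean[clean["temp_c"].between(-60, 60)]
# Derived metric: Fahrenheit conversion.
clean["temp_f"] = clean["temp_c"] * 9 / 5 + 32
```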
- **Comprehensive Analytics**
  - City-to-city comparisons
  - Temperature trend analysis
  - Weather pattern visualizations
- **Enterprise-Grade Infrastructure**
  - Docker containerization
  - Kubernetes orchestration
  - Optional Airflow scheduling
- Python 3.12+
- Docker / Kubernetes
- Prometheus / Grafana
- Apache Airflow (for advanced scheduling)
```mermaid
flowchart TD
    Python[Python 3.12] --> Pandas[Pandas] & Matplotlib[Matplotlib]
    Python --> Docker[Docker]
    Docker --> K8s[Kubernetes]
    Python --> Airflow[Airflow]
    Python --> Prometheus[Prometheus]
    Prometheus --> Grafana[Grafana]
```
```
weather_data_pipeline/
├── README.md
├── images/
│   ├── Grafana.png
│   ├── apache_airflow_interface.png
│   ├── apache_airflow_login.png
│   └── docker-compose up.png
├── config/
│   └── config.yaml
├── data/
│   ├── raw/
│   ├── processed/
│   └── output/
├── logs/
├── requirements.txt
├── src/
│   ├── extract.py
│   ├── transform.py
│   ├── load.py
│   ├── analyze.py
│   └── utils.py
├── main.py
├── Dockerfile
├── docker-compose.yml
├── airflow/
│   └── weather_pipeline_dag.py
├── kubernetes/
│   └── deployment.yaml
└── monitoring/
    ├── prometheus.yml
    └── grafana-dashboard.json
```
Below is a step-by-step flow illustrating how to install Docker, confirm everything is up and running, then transition to Kubernetes and Airflow.
- On macOS, ensure Docker is not stuck:

  ```bash
  sudo launchctl remove com.docker.vmnetd
  ```

- Verify Docker commands:

  ```bash
  docker pull hello-world
  docker run hello-world
  docker ps
  docker ps -a
  docker --version
  ```

- Check existing images & remove any (optional):

  ```bash
  docker images
  docker rmi <IMAGE_ID>
  ```

- Open Docker Desktop GUI (macOS):

  ```bash
  open -a Docker
  ```
- Build the Docker image:

  ```bash
  docker build -t weather-pipeline .
  ```

- Run the container with your API key:

  ```bash
  docker run --env-file .env weather-pipeline
  ```
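The `.env` file referenced above holds the API key. A minimal sketch — the variable name `API_KEY` matches the Kubernetes secret created later in this README, but confirm it against what `src/extract.py` actually reads:

```
# .env — keep this file out of version control
API_KEY=your_openweathermap_api_key
```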
- Spin up services via Docker Compose:

  ```bash
  docker compose up
  ```

  If you get a port conflict (e.g., on port 9090), try:

  ```bash
  lsof -i :9090
  kill -9 <PID>
  pkill -f prometheus
  docker compose up
  ```
- Check Docker containers:

  ```bash
  docker ps
  ```

  You should see 3 containers:

  - Weather Pipeline (port 8000)
  - Prometheus (port 9090)
  - Grafana (port 3000)
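Those three services are wired together in `docker-compose.yml`. A rough sketch of how that file might look — the image tags and volume paths are assumptions, not the project's actual file:

```yaml
services:
  weather-pipeline:
    build: .
    env_file: .env
    ports:
      - "8000:8000"   # pipeline metrics endpoint
  prometheus:
    image: prom/prometheus
    volumes:
      - ./monitoring/prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"
  grafana:
    image: grafana/grafana
    ports:
      - "3000:3000"
```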
Once Docker is up, open Grafana at http://localhost:3000.

- Username: `admin`
- Password: `admin`

(the defaults, if unchanged)
You’ll see panels for:
- Pipeline Duration
- Data Volumes (Records Processed & Data Points Extracted)
- API Performance
If you see "No data," check your `prometheus.yml` or the pipeline's main logs to ensure metrics are being scraped properly.
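For reference, a minimal `prometheus.yml` that scrapes the pipeline's port-8000 endpoint could look like this — the job name and target host are assumptions; match them to your Compose service names:

```yaml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: weather_pipeline
    static_configs:
      - targets: ["weather-pipeline:8000"]
```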
- Install & Initialize Airflow:

  ```bash
  pip install apache-airflow
  airflow db init
  ```
- Create Admin User:

  ```bash
  airflow users create \
    --username admin \
    --password admin \
    --firstname Admin \
    --lastname User \
    --role Admin \
    --email admin@example.com
  ```
- Add the DAG:

  ```bash
  mkdir -p ~/airflow/dags
  cp airflow/weather_pipeline_dag.py ~/airflow/dags/
  ```
- Start Airflow (run each command in its own terminal):

  ```bash
  airflow webserver --port 8080
  airflow scheduler
  ```
You should now see the list of DAGs (pipelines) in the Airflow UI. Enable or trigger the relevant ones.
- Start Minikube:

  ```bash
  minikube start
  ```

- Apply the Weather Pipeline Deployment:

  ```bash
  kubectl apply -f kubernetes/deployment.yaml
  ```
- (Optional) Create a secret for your API key:

  ```bash
  kubectl create secret generic weather-pipeline-secrets \
    --from-literal=API_KEY=your_openweathermap_api_key
  ```
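Inside `kubernetes/deployment.yaml`, the container can then pull the key from that secret. A sketch of the relevant container-spec fragment (the surrounding Deployment fields are omitted):

```yaml
env:
  - name: API_KEY
    valueFrom:
      secretKeyRef:
        name: weather-pipeline-secrets
        key: API_KEY
```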
- Check pods:

  ```bash
  kubectl get pods
  ```
Now your pipeline can run in a Kubernetes environment!
```mermaid
graph TD
    A[Input Config] --> B[Data Extraction]
    B --> C[Data Transformation]
    C --> D[Data Loading]
    C --> E[Data Analysis]
    E --> F[Visualization]
    F --> G[Results]
```
- Extract: Grab weather data from OpenWeatherMap
- Transform: Clean, normalize, handle outliers
- Load: Store processed data in local files or DB
- Analyze: Generate city comparisons, identify trends
- Visualize: Plot charts & graphs (Matplotlib, etc.)
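The five stages chain together in a single entry point. A toy sketch with stand-in data — the real `main.py` pulls from the OpenWeatherMap API and writes under `data/`:

```python
# Toy versions of the pipeline stages; stand-in data replaces the real API call.
def extract():
    return [{"city": "KL", "temp_c": 31.0}, {"city": "Penang", "temp_c": 29.5}]

def transform(records):
    # Derive a Fahrenheit column as an example of a computed metric.
    return [dict(r, temp_f=r["temp_c"] * 9 / 5 + 32) for r in records]

def load(records):
    # The real pipeline persists to data/processed/; here we pass through.
    return records

def analyze(records):
    # City comparison: pick the warmest city.
    return max(records, key=lambda r: r["temp_c"])["city"]

hottest = analyze(load(transform(extract())))
```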
```mermaid
mindmap
  root((Weather Analysis))
    City Comparisons
      Temperature
      Humidity
      Wind Speed
    Temporal Analysis
      Daily Variation
      Long-term Trend
    Weather Conditions
      Condition Distribution
      Alerts
    Correlation
      Temperature-Humidity
      Wind-Temperature
```
The pipeline can generate:
- Time-series plots (temperature trends)
- Comparison charts across multiple cities
- Correlation analyses (humidity vs. temperature)
- **Local Docker**:

  ```bash
  docker-compose up --build
  ```

- **Kubernetes (Minikube)**:

  ```bash
  minikube start && kubectl apply -f kubernetes/deployment.yaml
  ```

- **Airflow**: local scheduler and UI (port `8080`)
- **EC2**: GitHub Actions CI/CD for continuous deployment
- Prometheus collects pipeline metrics (port `9090`).
- Grafana visualizes metrics (port `3000`).
- Import `monitoring/grafana-dashboard.json` for a pre-built dashboard.
- Alerts can be configured in Prometheus/Grafana to notify on pipeline failures or anomalies.
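A Prometheus alerting rule for failed runs could be sketched like this — the metric name `pipeline_failures_total` is an assumption; use whichever counter the pipeline actually exports:

```yaml
groups:
  - name: weather_pipeline_alerts
    rules:
      - alert: PipelineRunFailed
        expr: increase(pipeline_failures_total[1h]) > 0
        labels:
          severity: critical
        annotations:
          summary: "Weather pipeline reported a failure in the last hour"
```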
- OpenWeatherMap API Docs
- Docker Documentation
- Kubernetes Docs
- Prometheus Docs
- Grafana Docs
- Apache Airflow Docs
© 2025 Fahmi Zainal. All rights reserved. This project and its contents are proprietary and confidential. Unauthorized copying, distribution, or modification of this software, via any medium, is strictly prohibited. For licensing inquiries, please contact the project maintainer.