DEV Community

# dataengineering

Posts

Apache Iceberg Table Optimization #2: The Basics of Compaction — Bin Packing Your Data for Efficiency

3 min read
Big Data Fundamentals: big data tutorial

1 reaction · 5 min read
Big Data Fundamentals: big data tutorial

1 reaction · 5 min read
Apache Iceberg Table Optimization #7: Using Iceberg Metadata Tables to Determine When Compaction Is Needed

3 min read
Apache Iceberg Table Optimization #5: Avoiding Metadata Bloat with Snapshot Expiration and Rewriting Manifests

3 min read
Apache Iceberg Table Optimization #4: Smarter Data Layout — Sorting and Clustering Iceberg Tables

1 reaction · 3 min read
Big Data Fundamentals: big data tutorial

1 reaction · 5 min read
Apache Iceberg Table Optimization #1: The Cost of Neglect — How Apache Iceberg Tables Degrade Without Optimization

3 min read
Apache Iceberg Table Optimization #3: Optimizing Compaction for Streaming Workloads in Apache Iceberg

3 min read
Database Design Errors to Avoid & How To Fix Them

11 reactions · 1 comment · 5 min read
Cross-Platform Multi-Channel Attribution in Marketing: Balancing Costs and Results Across Devices

1 reaction · 5 min read
🛒 Real-Life Data Lakehouse Use Case: Revolutionizing Retail Analytics

1 reaction · 2 min read
Data and analytics reimagined with Terraform and DevOps principles

3 min read
Top Trends and Applications in Data Engineering and AI for the Modern Enterprise

1 comment · 5 min read
A Real-Time Earthquake Monitoring Pipeline with Kafka, MySQL, PostgreSQL, and Grafana

3 reactions · 4 min read
Pandas vs Polars: Is It Time to Rethink Python’s Trusted DataFrame Library?

3 reactions · 2 comments · 3 min read
Big Data Fundamentals: big data project

5 reactions · 5 min read
Big Data Fundamentals: big data tutorial

5 reactions · 5 min read
Big Data Fundamentals: big data

5 reactions · 6 min read
Big Data Fundamentals: big data example

5 reactions · 5 min read
Building a Real-Time Crypto Pipeline with Binance APIs, PostgreSQL, Debezium, Kafka, Spark & Cassandra

2 reactions · 6 min read
Architecting your GenAI data pipeline with AWS native services

6 reactions · 10 min read
BEGINNER'S GUIDE TO STREAM REAL-TIME DATA USING APACHE KAFKA

4 min read
Stop Drawing ETL Diagrams — Your Python Code Visualizes Itself

4 reactions · 4 min read
⚡ Kafka ClickHouse: Real-Time Data Pipeline for Beginners

2 reactions · 2 min read