

Amazon S3 Tables: Turn Your S3 into a SQL-Powered Data Lakehouse – Desi Style!

"Bas ek storage bucket hai s3 toh... "kaise query karein SQL?"

Well, here's your solution – Amazon S3 Tables! AWS be like: 😎


🧠 What Are S3 Tables?

Amazon S3 Tables make your S3 intelligent enough to handle analytics workloads – without much hassle – with built-in tooling that improves query performance and lowers table storage costs.

These tables are purpose-built for tabular data such as:

  • 📢 Ad impressions
  • 📈 Streaming sensor readings
  • 💸 Daily transactions

Just like a traditional database table – rows and columns!


🪣 A New Bucket Type: The Table Bucket

Amazon S3 Tables introduce a new bucket type called the table bucket.

  • Table buckets store tables as sub-resources
  • Tables are stored in the Apache Iceberg format (future-proof!)
  • Perfect for big data workflows & data lakes (see the quick sketch below)
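
By the way, you don't have to use the console for this – a table bucket can also be created from code. Here's a minimal boto3 sketch; the bucket name is just a placeholder, so treat this as a starting point rather than the official recipe:

```python
import boto3

# S3 Tables has its own service client in recent boto3 versions.
s3tables = boto3.client("s3tables", region_name="us-east-1")

# Create a table bucket; the response contains its ARN,
# which you'll need later as the Iceberg warehouse location.
response = s3tables.create_table_bucket(name="demo-bucket-table-poc")  # placeholder name
print(response["arn"])
```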

🧪 Query Like a Boss – With SQL!

Once your data is stored in an S3 table bucket, you can write SQL-style queries using:

| ⚙️ Engine | 🧠 Use Case |
| --- | --- |
| 🔥 Apache Spark | High-performance big data jobs |
| 🎯 Amazon Redshift | Data warehouse meets lakehouse |
| 🧠 Amazon Athena | Direct SQL on S3 without ETL |

Just set up the schema, then run `SELECT * FROM your_table` – easy as that! 😍


📌 S3 Tables Explained – Short and Sweet

Amazon S3 Tables =
✅ Analytics-ready storage in S3
✅ Iceberg format support
✅ SQL queries via Athena, Redshift, Spark
✅ Table-like behavior with cost-efficient S3 backend

Your data lake has just become a smart data lake! 🧠💧


🔧 Implementation – Let's Build an S3 Table! 🏗️

📍 Default Region: us-east-1
(Think of it as the New York of AWS – everything launches there first 😄)


🪣 Step 1: Create a Table Bucket and Enable Its Integration

*(Screenshot: creating the table bucket with the AWS analytics services integration enabled)*


🔗 Step 2: After the Table Bucket Is Created

*(Screenshot: the newly created table bucket)*


🔐 Step 3: Create an IAM Role – Glue Needs Permissions! 🧑‍💻

Now the Glue job needs a proper IAM Role to do its work. Without permissions, no AWS service will cooperate – you'll just get "Access Denied" 😅

{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "Service": "glue.amazonaws.com" }, "Action": "sts:AssumeRole" } ] }


📦 Step 4: Download & Store the Iceberg JAR – Teach Glue Some Iceberg! 🧊

Now the Glue job needs a little extra knowledge to handle Apache Iceberg tables – and for that, it needs a runtime JAR.

This JAR file is essentially a translator that teaches Glue how to read and write data in the Iceberg format.

*(Screenshot: the Iceberg runtime JAR stored in an S3 bucket)*

https://mvnrepository.com/artifact/software.amazon.s3tables/s3-tables-catalog-for-iceberg-runtime
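
Once you've downloaded the JAR from the link above, push it to a regular S3 bucket so Glue can pick it up. A quick boto3 sketch – the bucket, key, and exact JAR file name are placeholders (use the version you actually downloaded):

```python
import boto3

s3 = boto3.client("s3")

# Upload the Iceberg runtime JAR to a regular (general purpose) S3 bucket.
# Replace the file name with the version you downloaded from Maven.
s3.upload_file(
    Filename="s3-tables-catalog-for-iceberg-runtime-x.y.z.jar",  # local path
    Bucket="my-glue-assets",                                     # placeholder bucket
    Key="jars/s3-tables-catalog-for-iceberg-runtime-x.y.z.jar",
)
```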


🔥 Step 5: Create the Glue Job – Now Comes the Spark Magic! ⚡

Our table bucket and Iceberg JAR are ready. Next step – let's create a Glue Job that uses the Spark engine to talk to S3 Tables.


🛠️ Glue Job Configuration

Here's how to set up the Glue job:

  • Job Type: Spark
  • Glue Version: 5.0 (❗Only this version supports S3 Tables)
  • IAM Role: Use the IAM role created in the earlier step (with S3 + Glue permissions)
  • Script Location: You can provide a script or write one in the editor
  • JAR Location: Provide the path to the Iceberg runtime JAR you uploaded in the last step

*(Screenshot: Glue job configuration)*


Next, add the dependent JAR path pointing to the Iceberg runtime JAR we stored in the S3 bucket.

*(Screenshot: dependent JARs path)*
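
If you prefer code over console clicks, the same job can be defined with boto3. A hedged sketch – the job, role, and S3 path names are placeholders, while `--extra-jars` is Glue's standard parameter for dependent JARs:

```python
import boto3

glue = boto3.client("glue", region_name="us-east-1")

# Define the Glue job: Spark engine, Glue 5.0, our IAM role,
# and the Iceberg runtime JAR wired in via --extra-jars.
glue.create_job(
    Name="s3-tables-demo-job",                    # placeholder name
    Role="GlueS3TablesDemoRole",                  # role from Step 3
    GlueVersion="5.0",                            # required for S3 Tables
    Command={
        "Name": "glueetl",                        # Spark job type
        "ScriptLocation": "s3://my-glue-assets/scripts/s3_tables_demo.py",
        "PythonVersion": "3",
    },
    DefaultArguments={
        "--extra-jars": "s3://my-glue-assets/jars/s3-tables-catalog-for-iceberg-runtime-x.y.z.jar"
    },
)
```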


💻 Step 6: The Glue Script – Let's Create That Table, Boss! 🧊

Now that the Glue Job is ready, let's create an Iceberg table that will be stored inside the Amazon S3 table bucket. This script uses the Spark engine to create the table with SQL-style queries.

🔁 Note: Replace ACCOUNT_NO with your actual AWS account ID!

```python
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.sql import SparkSession

# Configure Spark session for Iceberg
spark_conf = SparkSession.builder.appName("GlueJob") \
    .config("spark.sql.catalog.s3tablesbucket", "org.apache.iceberg.spark.SparkCatalog") \
    .config("spark.sql.catalog.s3tablesbucket.catalog-impl", "software.amazon.s3tables.iceberg.S3TablesCatalog") \
    .config("spark.sql.catalog.s3tablesbucket.warehouse", "arn:aws:s3tables:us-east-1:ACCOUNT_NO:bucket/demo-bucket-table-poc") \
    .config("spark.sql.defaultCatalog", "s3tablesbucket") \
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions") \
    .config("spark.sql.catalog.s3tablesbucket.cache-enabled", "false")

# Initialize Glue context with custom Spark configuration
sc = SparkContext.getOrCreate(conf=spark_conf.getOrCreate().sparkContext.getConf())
glueContext = GlueContext(sc)
spark = glueContext.spark_session

## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

namespace = "demo"
table = "student_table"

def run_sql(query):
    try:
        result = spark.sql(query)
        result.show()
        return result
    except Exception as e:
        print(f"Error executing query '{query}': {str(e)}")
        return None

def main():
    try:
        # Create a new namespace if it doesn't exist
        print("CREATE NAMESPACE")
        run_sql(f"CREATE NAMESPACE IF NOT EXISTS {namespace}")

        # Show all namespaces
        print("SHOW NAMESPACES")
        run_sql("SHOW NAMESPACES")

        # Describe a specific namespace
        print("DESCRIBE NAMESPACE")
        run_sql(f"DESCRIBE NAMESPACE {namespace}")

        # Create table in the namespace
        print("CREATE TABLE")
        create_table_query = f"""
        CREATE TABLE IF NOT EXISTS {namespace}.{table} (
            rollno INT,
            name STRING,
            marks INT
        )
        """
        run_sql(create_table_query)

        # Insert data into table
        print("INSERT INTO")
        insert_query = f"""
        INSERT INTO {namespace}.{table}
        VALUES
            (1, 'ABC', 100),
            (2, 'XYZ', 200)
        """
        run_sql(insert_query)

        # Show tables in the namespace
        print("SHOW TABLES")
        run_sql(f"SHOW TABLES IN {namespace}")

        # Select all from a specific table
        print("SELECT FROM TABLE")
        run_sql(f"SELECT * FROM {namespace}.{table} LIMIT 20")

    except Exception as e:
        print(f"An error occurred in main execution: {str(e)}")
        raise  # Re-raise the exception for Glue to handle
    finally:
        job.commit()

if __name__ == "__main__":
    try:
        main()
    except Exception as e:
        print(f"Job failed with error: {str(e)}")
        sys.exit(1)
```

🔍 Step 7: Now Watch the Magic – the Table Appears in the S3 Table Bucket! 🪄

As soon as your Glue Job executes successfully, the namespace and table details are automatically created inside your S3 table bucket.

*(Screenshot: the namespace and table inside the table bucket)*
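
You can also verify this without the console. A small boto3 sketch that lists the namespaces and tables in the table bucket (replace ACCOUNT_NO as before; treat the response field names as a sketch against the s3tables API):

```python
import boto3

s3tables = boto3.client("s3tables", region_name="us-east-1")
bucket_arn = "arn:aws:s3tables:us-east-1:ACCOUNT_NO:bucket/demo-bucket-table-poc"

# The Glue job should have created the "demo" namespace...
for ns in s3tables.list_namespaces(tableBucketARN=bucket_arn)["namespaces"]:
    print("namespace:", ns["namespace"])

# ...and the "student_table" table inside it.
for tbl in s3tables.list_tables(tableBucketARN=bucket_arn, namespace="demo")["tables"]:
    print("table:", tbl["name"])
```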


🔎 Step 8: Time to Talk to Athena – Let's Query the Table! 🧠📊

Now let's see how to query our S3 table through Athena. It feels just like writing plain SQL!

*(Screenshot: querying the table from the Athena query editor)*
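
If you'd rather script it, the same query can be fired through the Athena API with boto3. This sketch rests on two assumptions: after the analytics-services integration, the table bucket shows up in Athena as a catalog named `s3tablescatalog/<bucket-name>` (verify the exact name in your account), and Athena needs an S3 output location for results:

```python
import boto3

athena = boto3.client("athena", region_name="us-east-1")

# Run a SQL query against the S3 Table through Athena.
resp = athena.start_query_execution(
    QueryString="SELECT * FROM student_table LIMIT 10",
    QueryExecutionContext={
        "Catalog": "s3tablescatalog/demo-bucket-table-poc",  # assumption – check your account
        "Database": "demo",                                  # the namespace from the Glue job
    },
    ResultConfiguration={"OutputLocation": "s3://my-glue-assets/athena-results/"},
)
print("Query execution id:", resp["QueryExecutionId"])
```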


✅ Wrap Up – S3 Isn't Just Storage Anymore, It's a Smart Table! 🎓

So friends, the next time someone says:

“S3 is just a storage service…”

You can confidently reply –

“No! It's Amazon S3 Tables now – an analytics-ready, SQL-queryable, smart storage solution!” 🧠💡


💥 This powerful combo gives you:

✅ Low-cost S3 storage
✅ High-performance Apache Iceberg tables
✅ SQL queries with Athena, Redshift, and Spark
✅ Easy integration with AWS Glue & other services

If your work involves big data, analytics, or building data lakes,
S3 Tables can be a game-changer. 🚀


👨‍💻 About Me

Hi! I'm Utkarsh, a Cloud Specialist & AWS Community Builder who loves turning complex AWS topics into fun chai-time stories.

👉 Explore more

