iceberg-python
polaris-local-forge
| | iceberg-python | polaris-local-forge |
|---|---|---|
| Mentions | 8 | 2 |
| Stars | 965 | 12 |
| Growth | 6.6% | - |
| Activity | 9.8 | 5.9 |
| Last commit | 4 days ago | 19 days ago |
| Language | Python | Jinja |
| License | Apache License 2.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
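The growth figure can be read as ordinary month-over-month arithmetic. The sketch below assumes the usual definition, (current - previous) / previous, which this page does not spell out. The function name is illustrative, and the prior-month star count of 905 is a made-up value chosen only to show how a figure like 6.6% could arise from 965 stars.

```python
def month_over_month_growth(previous: int, current: int) -> float:
    """Return the percentage change from `previous` to `current`."""
    if previous <= 0:
        raise ValueError("previous must be a positive count")
    return (current - previous) / previous * 100.0


# 965 is the star count shown in the table; 905 is a hypothetical
# prior-month count, not a figure taken from the page.
growth = month_over_month_growth(previous=905, current=965)
print(f"{growth:.1f}%")
```

A project with no recorded prior-month count has no defined growth under this formula, so the function rejects a non-positive `previous`.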
iceberg-python
- DuckLake is an integrated data lake and catalog format
Have you tried out PyIceberg yet? It's a pure Python implementation and it works pretty well. It supports a SQL Catalog as well as an In-Memory Catalog via a baked-in SQLite SQL Catalog.
https://py.iceberg.apache.org/
- AWS open source newsletter, #207
Access data in Amazon S3 Tables using PyIceberg through the AWS Glue Iceberg REST endpoint - demonstrates how to access Iceberg tables stored in S3 Tables using PyIceberg, a Python library for programmatic access to Iceberg table metadata as well as to table data in Iceberg format [hands on]
- Let's Build Together: A Local Playground for Apache Polaris
PyIceberg - Python library for Apache Iceberg
- Ultimate Directory of Apache Iceberg Resources
PyIceberg Docs
- Quick tip: Using SingleStore with PyIceberg
In a previous article, we implemented an Iceberg catalog using SingleStore and JDBC. Another way that we can create the catalog is using PyIceberg. In this article, we'll see how.
- Lessons Learned from Scaling to Multi-Terabyte Datasets
Iceberg is working hard to support pure python[0] / rust[1] workflows without Spark. Following Tabular's acquisition [2], I hope it still moves in this direction at the same clip.
We're using iceberg + duckdb to power analytics in our app[3] and I'm really happy with the combo.
0 - https://github.com/apache/iceberg-python
1 - https://github.com/apache/iceberg-rust
2 - https://x.com/thisritchie/status/1800522255426072647
3 - https://www.definite.app/
- Understanding Parquet, Iceberg and Data Lakehouses
You don't need a Spark deployment. The first reference implementations for reading and writing were in Spark.
Now, with PyIceberg, there is read support in Python. Write support should be merged very soon - https://github.com/apache/iceberg-python/pull/41
polaris-local-forge
- All Data and AI Weekly #183 - 31-March-2025
- Let's Build Together: A Local Playground for Apache Polaris
This is why I created an open source starter kit that provides everything needed to get Polaris up and running in a local development environment. The project follows the true spirit of open source collaboration, building upon and integrating with other excellent open source tools in the ecosystem.
What are some alternatives?
lance - Open Lakehouse Format for Multimodal AI. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, Pyarrow, and PyTorch with more integrations coming.
pussh - Parallel SSH, batch and command line oriented
Daft - High-performance data engine for AI and multimodal workloads. Process images, audio, video, and structured data at any scale
Snowflake-AI-Toolkit - Snowflake AI Toolkit is an AI Accelerator and Playground for enabling AI in Snowflake. It is a plug-and-play, Streamlit-based Native App that can be used to explore, learn, and build rapid prototypes of AI solutions in Snowflake, powered by Snowflake's Cortex and AI Functions.
iceberg-rust - Apache Iceberg
arrow-datafusion-python - Apache DataFusion Python Bindings [Moved to: https://github.com/apache/datafusion-python]