| | iceberg-python | helm |
|---|---|---|
| Mentions | 8 | 268 |
| Stars | 965 | 29,200 |
| Growth | 6.6% | 1.7% |
| Activity | 9.8 | 9.8 |
| Last commit | 4 days ago | 4 days ago |
| Language | Python | Go |
| License | Apache License 2.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
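As a rough illustration of how these figures relate, the sketch below turns an activity score into a "top N%" figure and works backwards from a growth percentage to the star count it implies for the previous month. Both formulas are assumptions: the page only states that 9.0 corresponds to the top 10%, so the linear mapping is a guess, and growth is assumed to be `(current - previous) / previous`. The function names are hypothetical.

```python
def top_percent_from_activity(activity: float) -> float:
    """Map an activity score (0-10) to a 'top N%' figure.

    ASSUMPTION: the mapping is linear, extrapolated from the single
    data point given in the text (9.0 -> top 10%).
    """
    if not 0.0 <= activity <= 10.0:
        raise ValueError("activity is expected to be between 0 and 10")
    return round((10.0 - activity) * 10.0, 1)


def month_over_month_growth(previous: float, current: float) -> float:
    """Growth in percent, ASSUMING growth = (current - previous) / previous."""
    if previous <= 0:
        raise ValueError("previous star count must be positive")
    return (current - previous) / previous * 100.0


def implied_previous_stars(current: float, growth_percent: float) -> float:
    """Invert the growth formula to estimate last month's star count."""
    return current / (1.0 + growth_percent / 100.0)


# Figures taken from the comparison table above.
print(top_percent_from_activity(9.0))   # 10.0
print(top_percent_from_activity(9.8))   # 2.0
print(round(implied_previous_stars(965, 6.6)))  # iceberg-python, one month earlier
```

Under these assumptions, both projects' activity of 9.8 would place them in roughly the top 2% of tracked projects. The star estimate is only approximate, since the displayed counts and percentages are rounded.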
iceberg-python
- DuckLake is an integrated data lake and catalog format
Have you tried out PyIceberg yet? It's a pure Python implementation and it works pretty well. It supports a SQL Catalog as well as an In-Memory Catalog via a baked-in SQLite SQL Catalog.
https://py.iceberg.apache.org/
- AWS open source newsletter, #207
Access data in Amazon S3 Tables using PyIceberg through the AWS Glue Iceberg REST endpoint - demonstrates how to access Iceberg tables stored in S3 Tables using PyIceberg, a Python library for programmatic access to Iceberg table metadata as well as to table data in Iceberg format [hands on]
- Let's Build Together: A Local Playground for Apache Polaris
PyIceberg - Python library for Apache Iceberg
- Ultimate Directory of Apache Iceberg Resources
PyIceberg Docs
- Quick tip: Using SingleStore with PyIceberg
In a previous article, we implemented an Iceberg catalog using SingleStore and JDBC. Another way that we can create the catalog is using PyIceberg. In this article, we'll see how.
- Lessons Learned from Scaling to Multi-Terabyte Datasets
Iceberg is working hard to support pure python[0] / rust[1] workflows without Spark. Following Tabular's acquisition [2], I hope it still moves in this direction at the same clip.
We're using iceberg + duckdb to power analytics in our app[3] and I'm really happy with the combo.
0 - https://github.com/apache/iceberg-python
1 - https://github.com/apache/iceberg-rust
2 - https://x.com/thisritchie/status/1800522255426072647
3 - https://www.definite.app/
- Understanding Parquet, Iceberg and Data Lakehouses
You don't need a Spark deployment. The first reference implementations for reading and writing were in Spark.
Now, with PyIceberg, there is read support in Python. Write support should be merged very soon - https://github.com/apache/iceberg-python/pull/41
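The SQLite-backed SQL catalog mentioned in the DuckLake thread above can be sketched roughly as follows. This is a minimal sketch, not taken from any of the linked posts: it assumes PyIceberg is installed with its SQLite extra (`pip install "pyiceberg[sql-sqlite]"`), and the helper names, the catalog name `local`, the namespace `demo`, and the file name `pyiceberg_catalog.db` are all illustrative choices.

```python
import tempfile
from pathlib import Path


def sqlite_catalog_properties(warehouse_dir: Path) -> dict[str, str]:
    """Build properties for a SQLite-backed PyIceberg SQL catalog.

    The catalog metadata lives in a SQLite file inside the warehouse
    directory; table data is written under the same directory.
    """
    return {
        "type": "sql",
        "uri": f"sqlite:///{warehouse_dir}/pyiceberg_catalog.db",
        "warehouse": f"file://{warehouse_dir}",
    }


def open_local_catalog(warehouse_dir: Path):
    """Open (and create if needed) a local catalog under warehouse_dir."""
    # Imported lazily so the helper above can be used without PyIceberg.
    from pyiceberg.catalog import load_catalog

    warehouse_dir.mkdir(parents=True, exist_ok=True)
    return load_catalog("local", **sqlite_catalog_properties(warehouse_dir))


def demo() -> None:
    """Create a throwaway catalog and list its namespaces."""
    with tempfile.TemporaryDirectory() as tmp:
        catalog = open_local_catalog(Path(tmp) / "warehouse")
        catalog.create_namespace("demo")  # hypothetical namespace name
        print(catalog.list_namespaces())
```

Calling `demo()` needs no Spark deployment or external service, which is the point the excerpts above make about PyIceberg.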
helm
- Helm v4: key changes from v3
Helm's first major release in six years, v4, is now available. In this article, I'll explore the key changes: switching from 3-Way Merge to Server-Side Apply, support for WASM plugins, improved resource readiness tracking, content-based chart caching, and more.
- Helm Charts: Kubernetes Package Management
Helm GitHub Repository
- Helm v4.0.0
- DocumentDB goes cloud-native: Introducing the DocumentDB Kubernetes Operator
Ready to try it out? Getting started with the operator is straightforward. You can use a local Kubernetes cluster such as minikube or kind and use Helm for installation.
- A Different Way to Think About Deploying Containers to the Cloud
To get to a working deployment of the proposed app, though, you would probably need to learn at least a dozen different k8s concepts. Here’s a short list of what you might need: a Deployment to describe Pods in a ReplicaSet along with a Service, Ingress and Ingress Controller to hook up your domain. Helm to install Cert Manager so you can get SSL working. You’ll likely need to learn about plenty more along the way.
- Fast Overview for Infraestructure as Data
HELM
- Guide to Testing SQS-Based Microservices with Signadot Sandboxes
Install Docker, Minikube & Helm on your local machine
- Platform Engineering for the uninitiated
Some of the brightest minds came together to set up the Cloud Native Computing Foundation and championed the concept of GitOps. This brought about yet another major shift in developer mindsets, and allowed techies to be more declarative with their infrastructure and focus solely on the what; the responsibility of how was abstracted away with the new technologies on the horizon. With Kubernetes widely adopted for container orchestration at scale in the cloud, Helm surfaced as the package manager for deployments to k8s clusters. The packages came to be known as Charts and could be deployed with predictability and consistency. This was a step in the right direction, but still included a little bit of ClickOps. Thanks to CNCF yet again, tools like Flux and Argo CD alleviated that pain aptly, and it became possible to manage Helm deployments in a declarative manner. As one can tell, this is already a lot to deal with for a developer who's supposed to write code for implementing business logic.
- Kubernetes Overview: Container Orchestration & Cloud-Native
Package Management: Helm charts simplify application deployment and configuration management. Over 2,000 community charts provide pre-configured applications, while organizations maintain internal chart repositories for proprietary software. Helm best practices ensure secure and maintainable deployments.
- Deploying GitHub Self-Hosted Runners on Your Home Kubernetes Cluster with ARC
kubectl and helm installed on your machine
What are some alternatives?
lance - Open Lakehouse Format for Multimodal AI. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, Pyarrow, and PyTorch with more integrations coming.
oauth2-proxy - A reverse proxy that provides authentication with Google, Azure, OpenID Connect and many more identity providers.
Daft - High-performance data engine for AI and multimodal workloads. Process images, audio, video, and structured data at any scale
istio - Connect, secure, control, and observe services.
iceberg-rust - Apache Iceberg
keda - KEDA is a Kubernetes-based Event Driven Autoscaling component. It provides event driven scale for any container running in Kubernetes