Simplifying And Accelerating Data Access for Python With Dremio and Apache Arrow

Sudheesh Katkam Simplifying Data Access for Python

Introduction
• Data comes in all shapes, sizes and formats
• Data captured in multiple storage systems
• Data takes a complex path to (Python) applications
• How do we simplify access to data?

[Figure: memory layout of a table in a traditional memory buffer compared with an Arrow memory buffer]
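The two layouts contrasted on this slide can be sketched with the Python standard library alone. This is an illustration and not Arrow's actual implementation: the table, its column names (`id`, `score`) and its values are made up, and `struct`/`array` stand in for real memory buffers.

```python
import struct
from array import array

# Hypothetical three-row table with columns (id: int64, score: float64).
rows = [(1, 0.5), (2, 1.5), (3, 2.5)]

# Row-oriented buffer: the fields of each record sit next to each other,
# so the bytes read id, score, id, score, ...
row_buffer = b"".join(struct.pack("<qd", i, s) for i, s in rows)

# Columnar buffers: all values of one column are contiguous in memory.
id_column = array("q", (i for i, _ in rows))
score_column = array("d", (s for _, s in rows))

# Summing one column from the row buffer has to step over the unused ids ...
row_sum = sum(s for _, s in struct.iter_unpack("<qd", row_buffer))

# ... while the columnar buffer is scanned front to back.
col_sum = sum(score_column)
```

Both sums are equal; the difference is only in which bytes have to be touched to compute them, which is what makes a columnar layout friendlier to CPU caches for analytical scans.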

Apache Arrow Goals
• Cache-efficient columnar memory
• Zero-copy messaging / IPC
• Language-agnostic metadata
• Complex/nested schema support
• Main implementations in C++ and Java, with bindings for C, Python, Ruby, JavaScript

About Dremio
• Launched in July 2017
• Self-Service Data Platform
• Apache License
• Built entirely on Apache Arrow, Calcite, Parquet
• Narwhal’s name is Gnarly (see me for stickers!)

New Tier in Analytics: Self-Service Data
• SQL
• Data Virtualization: RDBMS, MongoDB, Elasticsearch, Hadoop, S3, NAS, Excel, JSON
• Data Acceleration: OLAP and ad hoc queries at interactive speed, without cubes or BI extracts
• Data Curation: Wrangle, prepare, enrich any source without making copies of your data
• Data Catalog: Interactive Data Discovery, Enterprise and Personal Data Assets

Join the Community!
• GitHub: github.com/dremio/dremio-oss, github.com/apache/arrow
• Dremio Community: community.dremio.com
• Arrow Slack: apachearrowslackin.herokuapp.com
• Twitter: @ApacheArrow, @DremioHQ
