Sudheesh Katkam
Simplifying Data Access for Python
Introduction
• Data comes in all shapes, sizes and formats
• Data captured in multiple storage systems
• Data takes a complex path to (Python) applications
• How do we simplify access to data?
Demo
Memory Layout
[Figure: a table as stored in a traditional memory buffer versus an Arrow memory buffer]
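The two layouts compared above can be sketched with the Python standard library alone. This is only an illustration: it assumes the traditional buffer is row-oriented, the table contents and column names are invented, and the real Arrow format adds pieces (validity bitmaps, offsets, metadata) that are left out here.

```python
from array import array

# Hypothetical three-row table with columns (session_id, price).
rows = [(1, 9.5), (2, 3.25), (3, 7.0)]

# Row-oriented layout: the fields of each record sit next to each other.
row_layout = [value for record in rows for value in record]

# Column-oriented layout: each column is one contiguous, typed buffer.
columns = {
    "session_id": array("q", (r[0] for r in rows)),  # 64-bit signed ints
    "price": array("d", (r[1] for r in rows)),       # 64-bit floats
}

# Scanning one column reads only that column's contiguous bytes.
total_price = sum(columns["price"])
```

In the row layout, summing `price` means stepping over every `session_id` on the way; in the columnar layout the values being summed are adjacent in memory.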
Apache Arrow Goals
• Cache-efficient columnar memory
• Zero-copy messaging / IPC
• Language-agnostic metadata
• Complex/nested schema support
• Main implementations in C++ and Java, with bindings for C, Python, Ruby, JavaScript
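The zero-copy goal can be illustrated, loosely, inside a single Python process. The sketch below uses the standard library's `memoryview` rather than Arrow itself, and the values are made up. Arrow's messaging and IPC apply the same idea across processes and languages, which this example does not attempt to show.

```python
from array import array

# A contiguous column of 64-bit integers (values are made up).
column = array("q", [10, 20, 30, 40, 50])

# memoryview exposes the same buffer without copying it.
view = memoryview(column)
window = view[1:4]  # slicing a memoryview does not copy either

# A write through the original buffer is visible through the view,
# which shows that both names refer to the same memory.
column[2] = 99
shared = window.tolist()
```

After the write, `shared` contains the updated value, because `window` never held its own copy of the data.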
Apache Arrow
Apache Arrow Adoption
About Dremio
• Launched in July 2017
• Self-Service Data Platform
• Apache License
• Built entirely on Apache Arrow, Calcite, Parquet
• Narwhal’s name is Gnarly (see me for stickers!)
New Tier in Analytics: Self-Service Data
• SQL Data Virtualization: RDBMS, MongoDB, Elasticsearch, Hadoop, S3, NAS, Excel, JSON
• Data Acceleration: OLAP and ad hoc queries at interactive speed, without cubes or BI extracts
• Data Curation: Wrangle, prepare, enrich any source without making copies of your data
• Data Catalog: Interactive Data Discovery, Enterprise and Personal Data Assets
Demo
Join the Community!
• GitHub: github.com/dremio/dremio-oss, github.com/apache/arrow
• Dremio Community: community.dremio.com
• Arrow Slack: apachearrowslackin.herokuapp.com
• Twitter: @ApacheArrow, @DremioHQ

Simplifying And Accelerating Data Access for Python With Dremio and Apache Arrow