ML and Data Science at Uber Sudhir Tonse, Engineering Lead, Uber FEB 18, 2017 GITPro 2017
Agenda: Where do we want to go today?
Agenda: Hop on the Uber ML Ride … destination please? • Introduction: Who am I and what are we talking about today? • Problem Space: Why does Uber need ML and what are some of the problems we tackle? • Tools of the Trade: What does Uber’s tech stack look like? • Challenges & Opportunities: Challenges likely unique to Uber .. interesting opportunities
Introduction: Uber, this talk, and me, the speaker
Who am I? • Engineering Leader @ Uber • Marketplace Data • Realtime Data Processing • Analytics • Forecasting • Previous -> MicroServices/Cloud Platform at Netflix • Twitter @stonse
Introduction: Uber • Marketplace: Uber’s logistics platform • Driver Partner: Our partner in the ride sharing business • Riders: Folks like you and me who request a ride on any of Uber’s transportation products, e.g. UberX, uberPool • Merchants: Restaurants or shops that have signed on to the Uber platform
“Transportation as reliable as running water, everywhere, for everyone” Uber Mission
ML Problems: Why do we need Machine Learning? • Mapping (Routes, ETAs, …) • Fraud and Security • uberEATS Recommendations • Marketplace Optimizations • Forecasting • Driver Positioning • Health, Trends, Issues, ... • And more … Examples: ETA, Route Optimization, Pickup Points, Pool rider matches
Uber | Marketplace Mission: Build the platform, products, and algorithms responsible for the real-time execution and online optimization of Uber's marketplace. We are building the brain of Uber, solving NP-hard algorithmic and economic optimization problems at scale.
Overall Flow • Request Event • Driver Accept Event • Trip Started Event • more events … • Match Services
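One way to picture the event flow on this slide is as a small state machine that replays events in order. This is only a sketch: it covers just the three events named on the slide, and the state names, the `TRANSITIONS` table, and the `apply_event` / `replay` helpers are hypothetical, not Uber's actual services.

```python
from enum import Enum


class TripState(Enum):
    # Hypothetical state names, one per event named on the slide.
    REQUESTED = "requested"
    ACCEPTED = "accepted"
    STARTED = "started"


# event name -> (state required before the event, state after the event).
# None means "no trip exists yet".
TRANSITIONS = {
    "request": (None, TripState.REQUESTED),
    "driver_accept": (TripState.REQUESTED, TripState.ACCEPTED),
    "trip_started": (TripState.ACCEPTED, TripState.STARTED),
}


def apply_event(state, event):
    """Return the new trip state, or raise if the event arrives out of order."""
    required, new_state = TRANSITIONS[event]
    if state is not required:
        raise ValueError(f"event {event!r} is not valid in state {state}")
    return new_state


def replay(events):
    """Fold a sequence of events into the final trip state."""
    state = None
    for event in events:
        state = apply_event(state, event)
    return state


final_state = replay(["request", "driver_accept", "trip_started"])
```

Keeping the transitions in a table makes out-of-order events fail loudly instead of silently corrupting the trip state.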
Trip States
Scale ~400 Cities Many Billion Events per Day
Scale Geo Space Vehicle Types Time
Space -> Hexagons • Indexing, Lookup, Rendering • Symmetric Neighbors • Convex & Compact Regions • Equal Areas • Equal Shape
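The "Symmetric Neighbors" property can be illustrated with a tiny hexagonal grid. The sketch below assumes axial (q, r) coordinates and pointy-top hexagons of unit size, which is a common textbook scheme and not necessarily the indexing Uber uses; `hex_neighbors`, `hex_distance`, and `hex_center` are hypothetical helper names.

```python
import math

# The six axial-coordinate offsets to the neighbors of any hexagon.
HEX_DIRECTIONS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]


def hex_neighbors(cell):
    """Return the six cells adjacent to `cell` (axial coordinates)."""
    q, r = cell
    return [(q + dq, r + dr) for dq, dr in HEX_DIRECTIONS]


def hex_distance(a, b):
    """Number of hexagon steps between two cells."""
    dq = a[0] - b[0]
    dr = a[1] - b[1]
    return (abs(dq) + abs(dr) + abs(dq + dr)) // 2


def hex_center(cell):
    """Planar center of a pointy-top hexagon with unit size."""
    q, r = cell
    return (math.sqrt(3) * (q + r / 2), 1.5 * r)


origin = (0, 0)
center_gaps = [math.dist(hex_center(origin), hex_center(n))
               for n in hex_neighbors(origin)]
```

Every entry of `center_gaps` is the same length, so all six neighbors are equally far away. On a square grid the four corner neighbors are farther than the four edge neighbors, which complicates smoothing and aggregation across cells.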
Granular Data
ML Examples: Multi-resolution Realtime Forecasting, Airport ETR
Example 1: Real-time spatiotemporal forecasting at a variable resolution of time and space
Rider Demand Forecasting: Predict the number of riders per hexagon for various time horizons
Spatial granularity & Multiresolution Forecasting • The more you aggregate or zoom out, the more clearly trends emerge • Sparsity at hexagon level: many hexagons have little signal
Multiresolution Forecasting: Forecasting at different spatial granularity 1. Forecast at the hex-cluster level 2. Using past activity for a similar time window, apportion out total activity from the hex-cluster to its component hexagons
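The two steps above can be sketched in a few lines. This is an illustration under stated assumptions: the cluster-level forecast is a naive mean of past totals (a stand-in for whatever model is actually used), the even-split fallback for clusters with no history is an assumption, and `forecast_cluster`, `apportion`, and the hexagon ids are hypothetical.

```python
def forecast_cluster(past_cluster_totals):
    """Step 1 (stand-in): forecast the hex-cluster total as the mean of
    totals seen in similar past time windows."""
    return sum(past_cluster_totals) / len(past_cluster_totals)


def apportion(cluster_forecast, past_hex_counts):
    """Step 2: split the cluster forecast across component hexagons in
    proportion to each hexagon's past activity."""
    total = sum(past_hex_counts.values())
    if total == 0:
        # No history at all: fall back to an even split (an assumption).
        share = 1.0 / len(past_hex_counts)
        return {h: cluster_forecast * share for h in past_hex_counts}
    return {h: cluster_forecast * count / total
            for h, count in past_hex_counts.items()}


# Hypothetical history for one cluster and its three hexagons.
cluster_forecast = forecast_cluster([40, 50, 60])
hex_forecast = apportion(cluster_forecast, {"hex_a": 6, "hex_b": 3, "hex_c": 1})
# hex_forecast == {"hex_a": 30.0, "hex_b": 15.0, "hex_c": 5.0}
```

Forecasting at the cluster level sidesteps the sparsity of individual hexagons, while the apportioning step still yields a per-hexagon number that sums back to the cluster total.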
ML Example No. 2: Airport ETR • Airport Taxi Line • Uber Airport Lot
Airport Demand (ETR) • Flight Arrival (t1) • Client Eyeball (t2) • Pickup Request (t3) • Mean Delay ~30 minutes • Half Life ~1.0 minute
“ETR too much. I bail out ..” Solution: Time Meter Banner “Only about 20 minutes. I would wait!” 20 minutes wait to get a $40 trip, oh yeah!
Data Science Flow: A Typical Data Scientist Workflow • Analyze/Prepare: Data exploration, cleansing, transformations etc. • Feature Selection: Evaluate strength of various signals • Model Fitting: Use Python/R etc. to fit Model. • Evaluation: Evaluate Model Performance • Storage: Store Model with versioning • Serving/Dissemination: Apply Model and serve predictions • Monitoring: Evaluate Runtime Performance
Data Preparation: A Typical Data Scientist Workflow • Analyze/Prepare: Data exploration, cleansing, transformations etc. • Feature Selection: Evaluate strength of various signals • Model Fitting: Use Python/R etc. to fit Model. • Evaluation: Evaluate Model Performance • Storage: Store Model with versioning • Serving/Dissemination: Apply Model and serve predictions • Monitoring: Evaluate Runtime Performance
Data Processing
Data Science Flow: A Typical Data Scientist Workflow • Feature Selection: Evaluate strength of various signals • Model Fitting: Use Python/R etc. to fit Model. • Evaluation: Evaluate Model Performance • Storage: Store Model with versioning
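A toy pass through these four stages might look like the sketch below. Everything in it is an assumption for illustration: Pearson correlation stands in for "strength of various signals", a one-variable least-squares line stands in for the model, and the `model_v<N>.json` naming is a hypothetical versioning convention, not Uber's model store.

```python
import json
import statistics
import tempfile
from pathlib import Path


def pearson(xs, ys):
    """Feature selection: correlation as a simple signal-strength score."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def fit_line(xs, ys):
    """Model fitting: least-squares fit of y = slope * x + intercept."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = cov / sum((x - mx) ** 2 for x in xs)
    return {"slope": slope, "intercept": my - slope * mx}


def mean_abs_error(model, xs, ys):
    """Evaluation: mean absolute error on held-out data."""
    errors = [abs(model["slope"] * x + model["intercept"] - y)
              for x, y in zip(xs, ys)]
    return statistics.fmean(errors)


def store_model(model, directory):
    """Storage: write model_v<N>.json, with N one above the latest version."""
    directory = Path(directory)
    versions = [int(p.stem.split("_v")[1])
                for p in directory.glob("model_v*.json")]
    version = max(versions, default=0) + 1
    (directory / f"model_v{version}.json").write_text(json.dumps(model))
    return version


# Hypothetical signals and target (target is exactly 2 * signal_a + 1).
signals = {"signal_a": [1, 2, 3, 4, 5, 6], "signal_b": [5, 1, 4, 2, 6, 3]}
target = [3, 5, 7, 9, 11, 13]

best = max(signals, key=lambda name: abs(pearson(signals[name], target)))
model = fit_line(signals[best][:4], target[:4])      # fit on the first four points
model["feature"] = best
error = mean_abs_error(model, signals[best][4:], target[4:])  # evaluate on the rest

with tempfile.TemporaryDirectory() as tmp:
    first_version = store_model(model, tmp)
    second_version = store_model(model, tmp)
```

Storing each fit under a new version number keeps earlier models available for comparison or rollback.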
Data Scientists (Analytics)
Overview: Streamline the forecasting process from conception to production • Streams w/ flexible geo-temporal resolution • Valuable external data feeds • Modular, reusable components at each stage • Same code for offline model fitting and production to enable fast model iteration Components: External Data Streams (Airport feed, Weather feed, Concerts feed) • Operators & Computation DAGs • Feature Generation • Offline Model Fitting • Online Models • Predictions, Metrics & Visualizations
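"Flexible geo-temporal resolution" can be read as counting the same event stream at whatever cell size and time window a model needs. The sketch below is a simplification: it uses square latitude/longitude cells as a stand-in for hexagons, and `bucket_events` and the sample events are hypothetical.

```python
from collections import Counter


def bucket_events(events, cell_size_deg, window_s):
    """Count events per (cell, time window) at the requested resolution.

    events: iterable of (timestamp_seconds, latitude, longitude).
    """
    counts = Counter()
    for ts, lat, lng in events:
        # Square grid cell; a stand-in for a hexagon index.
        cell = (int(lat // cell_size_deg), int(lng // cell_size_deg))
        window = int(ts // window_s)
        counts[(cell, window)] += 1
    return counts


# Hypothetical request events.
events = [
    (0, 37.7751, -122.4183),
    (30, 37.7759, -122.4181),
    (400, 37.7851, -122.4083),
]

fine = bucket_events(events, cell_size_deg=0.01, window_s=300)    # small cells, 5 min
coarse = bucket_events(events, cell_size_deg=0.1, window_s=900)   # large cells, 15 min
```

The same function serves both resolutions, so downstream operators only need to agree on the (cell, window) key.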
Realtime Models • Something happened at a time and a place. Now we will evaluate the DAG • DAG evaluated for a single instant in time • Real-time spatiotemporal forecasting at a variable resolution of time and space
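Evaluating a computation DAG for a single instant can be sketched with the standard-library topological sorter. The node names, the toy operators, and `evaluate_dag` are hypothetical; the point is only that each operator runs once, after its dependencies, for the values observed at that instant.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def evaluate_dag(nodes, inputs):
    """Evaluate every node once, dependencies first, for one instant.

    nodes maps a node name to (function, [dependency names]);
    inputs holds the raw values observed at that instant.
    """
    graph = {name: set(deps) for name, (_, deps) in nodes.items()}
    values = dict(inputs)
    for name in TopologicalSorter(graph).static_order():
        if name in values:  # raw input, nothing to compute
            continue
        func, deps = nodes[name]
        values[name] = func(*(values[d] for d in deps))
    return values


# Hypothetical operators: a demand rate, a weather adjustment, a forecast.
NODES = {
    "demand_rate": (lambda requests, window_min: requests / window_min,
                    ["requests", "window_min"]),
    "weather_factor": (lambda raining: 1.5 if raining else 1.0, ["raining"]),
    "forecast": (lambda rate, factor: rate * factor,
                 ["demand_rate", "weather_factor"]),
}

result = evaluate_dag(NODES, {"requests": 30, "window_min": 5, "raining": True})
# result["forecast"] == 9.0
```

Because `evaluate_dag` only needs the inputs for one instant, the same graph can be replayed over historical instants for offline fitting or fed live events in production.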
Under the hood .. Tools & Framework
Machine Learning as a Service: ML workflow at Uber • Curated set of algorithms • Model Versioning • Model Performance & Visualizations • Automated Deployment Workflow • …
Open Source Technologies • Spark Streaming: Micro Batch based processing; Good integration with HDFS & S3; Exactly once semantics • Samza: Well integrated with Kafka; Built in State Management; Built in Checkpointing • Distributed Indexes & Queries; Versatile aggregations • Jupyter/IPython: Great community support; Data Scientists familiar with Python
.. Challenges & Opportunities
ML Problems: Challenges • What’s the best model for integrating vast amounts of disparate kinds of information over space and time? • What’s the best way of building spatiotemporal models in a fashion that is effective, elegant, and debuggable? • About 100 or so more … :-)
Thank you! Links: • Realtime Streaming at Uber (https://www.infoq.com/presentations/real-time-streaming-uber) • Spark at Uber (http://www.slideshare.net/databricks/spark-meetup-at-uber) • Career at Uber (https://www.uber.com/careers/) • https://join.uber.com/marketplace
Q & A: Happy to discuss design/architecture. No product/business questions please :-) @stonse
Proprietary and confidential © 2016 Uber Technologies, Inc. All rights reserved. No part of this document may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage or retrieval systems, without permission in writing from Uber. This document is intended only for the use of the individual or entity to whom it is addressed and contains information that is privileged, confidential or otherwise exempt from disclosure under applicable law. All recipients of this document are notified that the information contained herein includes proprietary and confidential information of Uber, and recipient may not make use of, disseminate, or in any way disclose this document or any of the enclosed information to any person other than employees of addressee to the extent necessary for consultations with authorized personnel of Uber. Sudhir Tonse @stonse Thank you
