Rapid Productionalization of Predictive Models: In-database Modeling with Revolution Analytics on Teradata
Skylar Lyon, Accenture Analytics
Introduction
Skylar Lyon, Accenture Analytics
• 7 years of experience with a focus on big data and predictive analytics, using discrete choice modeling, random forest classification, ensemble modeling, and clustering
• Technology experience includes Hadoop, Accumulo, PostgreSQL, QGIS, JBoss, Tomcat, R, GeoMesa, and more
• Worked from Army installations across the nation, and traveled twice to Baghdad to deploy solutions downrange
Copyright © 2014 Accenture. All rights reserved.
How we got here
Project background and my involvement
• New Customer Analytics team for a Silicon Valley Internet eCommerce giant
• Data scientists developing predictive models
• Deferred focus on productionalization
• Joined as Big Data Infrastructure and Analytics Lead
Colleague's CRAN R model
Binomial logistic regression
• 50+ independent variables, including categorical variables with indicator encoding
• Trained from a small sample (many thousands of rows) – not a problem in and of itself
• Scored across the entire corpus (many hundreds of millions of rows) – slightly more challenging
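A minimal base R sketch of the kind of model described above. The variable names and simulated data are purely illustrative, not from the original project; the point is that glm() expands a factor column into indicator variables automatically.

```r
# Simulated training sample: numeric predictors plus a categorical
# variable that glm() expands into indicator (dummy) variables.
set.seed(42)
n <- 5000
training.data <- data.frame(
  spend     = rnorm(n, mean = 100, sd = 25),
  visits    = rpois(n, lambda = 3),
  segment   = factor(sample(c("new", "returning", "vip"), n, replace = TRUE)),
  converted = rbinom(n, size = 1, prob = 0.3)
)

# Binomial logistic regression; each non-reference factor level
# becomes an indicator column in the design matrix.
trainit <- glm(converted ~ spend + visits + segment,
               data = training.data, family = 'binomial', maxit = 25)

# Scoring: predicted conversion probabilities in [0, 1].
fits <- predict(trainit, newdata = training.data, type = 'response')
```

Training on a few thousand rows like this is quick; the challenge in the deck is scoring the same model over hundreds of millions of rows.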
We optimized the current productionalization process
We moved compute to data
[Before/After architecture diagram: reduced a 5+ hour process to 40 seconds]
Benchmarking our optimized process
5+ hours to 40 seconds: the recommendation is that this now become the de facto productionalization process
[Benchmark chart: rows vs. minutes]
Optimization process
Recode CRAN R to Rx R

Before:
trainit <- glm(as.formula(specs[[i]]), data = training.data, family = 'binomial', maxit = iters)
fits <- predict(trainit, newdata = test.data, type = 'response')

After:
trainit <- rxGlm(as.formula(specs[[i]]), data = training.data, family = 'binomial', maxIterations = iters)
fits <- rxPredict(trainit, newdata = test.data, type = 'response')
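The recoded Rx calls only move the work in-database once the compute context and data sources point at Teradata. A hedged sketch of that wiring, based on the RevoScaleR 7.1 API; the connection string, paths, and table names are placeholders, not values from the original project.

```r
library(RevoScaleR)

# Point RevoScaleR at the Teradata appliance (placeholder credentials).
tdConnString <- "DRIVER=Teradata;DBCNAME=tdhost;UID=user;PWD=pass"
rxSetComputeContext(RxInTeradata(
  connectionString = tdConnString,
  shareDir         = "c:/AllShare",                        # local exchange dir
  remoteShareDir   = "/tmp/revoJobs",                      # dir on the appliance
  revoPath         = "/usr/lib64/Revo-7.1/R-3.0.2/lib64/R" # R on the appliance
))

# Data sources resolve to tables inside the warehouse, so neither the
# training sample nor the scoring corpus ever leaves the database.
training.ds <- RxTeradata(connectionString = tdConnString, table = "train_sample")
scoring.ds  <- RxTeradata(connectionString = tdConnString, table = "full_corpus")

trainit <- rxGlm(converted ~ spend + visits + segment,
                 data = training.ds, family = 'binomial', maxIterations = 25)

# Scores are written to a warehouse table rather than pulled back to R.
rxPredict(trainit, data = scoring.ds, type = 'response',
          outData = RxTeradata(connectionString = tdConnString, table = "scores"))
```

Writing predictions to an output table instead of returning them to the client is what eliminates the bulk data movement behind the 5+ hour process.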
Additional benefits to new process
Technology is increasing the data science team's options and opportunities
• Train in-database on a much larger set – reduces the need to sample
• Nearly "native" R language – decreases deploy time
• Hadoop support – score in multiple data warehouses
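The Hadoop bullet above can be sketched the same way: the Rx calls stay unchanged and only the compute context and data source are swapped. Host names and paths are placeholders, and `trainit` is assumed to be the rxGlm model trained earlier.

```r
library(RevoScaleR)

# Swap the compute context to a Hadoop cluster; the same Rx calls
# then run against data in HDFS instead of Teradata.
rxSetComputeContext(RxHadoopMR(
  sshHostname  = "hadoop-edge",
  sshUsername  = "analyst",
  hdfsShareDir = "/user/analyst/share",
  shareDir     = "/var/RevoShare/analyst"
))

# A delimited-text corpus in HDFS, described once and scored in place.
corpus.ds <- RxTextData("/data/corpus", fileSystem = RxHdfsFileSystem())

# 'trainit' is the previously trained rxGlm model (assumed in scope).
fits <- rxPredict(trainit, data = corpus.ds, type = 'response')
```

Because model objects are portable across compute contexts, one trained model can score in whichever warehouse currently holds the data.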
Appendix
Table of Contents
• Technical Considerations
Technical considerations
Environment setup
• Teradata environment – 4-node, 1700-series appliance
• Revolution R Enterprise – version 7.1, running R 3.0.2
Editor's Notes

  • #4 Problem statement
  • #5 Gabi's binomial logistic regression model. Admittedly, it could be recoded to SQL, but that is not so easy with random forest and more powerful ensemble models
  • #6 Lots of data movement; a 6+ hour process
  • #8 Show some CRAN R versus Rx R code