© 2018 KNIME AG. All Rights Reserved.
From Raw Data to Deployment
Paolo Tamagnini: paolo.tamagnini@knime.com
Maarit Widmann: maarit.widmann@knime.com
Rosaria Silipo: rosaria.silipo@knime.com
@KNIME
Do you recognize this?
https://en.wikipedia.org/wiki/Cross_Industry_Standard_Process_for_Data_Mining
Let’s unroll it! It always starts with some data …
• Data Preparation: Data Manipulation, Data Blending, Missing Values Handling, Feature Generation, Dimensionality Reduction, Feature Selection, Outlier Removal, Normalization, Partitioning, …
• Model Training: Model Training, Bag of Models, Model Selection, Ensemble Models, Own Ensemble Model, External Models, Import Existing Models, Model Factory, …
• Model Optimization: Parameter Tuning, Parameter Optimization, Regularization, Model Size, No. Iterations, …
• Model Evaluation: Performance Measures, Accuracy, ROC Curve, Cross-Validation, …
• Deployment: Files & DBs, Dashboards, REST API, SQL Code Export, Reporting, …
The Many Lives of a Dataset
Phases: Data Preparation, Model Training, Model Optimization, Model Evaluation, Deployment
• Original Data Set with Past Observations
• Partitioning:
– Training Set
– Validation Set
– Test Set
• New Data from Real World Applications
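The partitioning step above can be sketched in plain Python. This is only an illustration, not the KNIME workflow itself: the function name `partition`, the 60/20/20 split, and the fixed seed are assumptions that do not come from the slides.

```python
import random


def partition(rows, train_frac=0.6, val_frac=0.2, seed=42):
    """Shuffle rows and split them into training, validation, and test sets.

    The fractions and the seed are illustrative assumptions; whatever is
    left after the training and validation sets becomes the test set.
    """
    if train_frac <= 0 or val_frac < 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave room for a test set")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, validation, test


train, validation, test = partition(range(100))
print(len(train), len(validation), len(test))
```

In this sketch the training set would feed model training, the validation set model optimization, and the test set the final evaluation, so that no row is reused across phases.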
Data Exploration
• Sometimes there is a Data Exploration phase between Data Access and Data Preparation
• The Data Exploration phase is useful to get to know the data
• KNIME offers a few visualization nodes to build dashboards to explore the data
What about Big Data?
• Big Data serves Scalability
• The whole Analytics Process is no different on Big Data
• You need:
– a Big Data Platform
– the KNIME Big Data (Spark & Hive) Extension
One Example for Every Need
The KNIME EXAMPLES Server
50_Applications
Classification Problem & Data Set
• Airline Dataset: http://stat-computing.org/dataexpo/2009/the-data.html
• Smaller dataset (Jan 2007) (AirlineDataset.table)
• Challenge: Predict Departure Delays
– If working on the original airline dataset, use only flights from airport ORD
– Output Class = “delay” if depdelay > 15 min, otherwise “no delay”
– Available features: date, dep time, arr time, carrier, destination, cancelled, …
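The class definition above can be written down as a small Python sketch. The 15-minute threshold and the ORD filter come from the slide; the field names `origin` and `dep_delay`, the helper names, the sample rows, and the choice to skip flights without a recorded delay are assumptions made for illustration only.

```python
DELAY_THRESHOLD_MIN = 15  # from the slide: "delay" if depdelay > 15 min


def label_departure(dep_delay_min):
    """Map a departure delay in minutes to the output class."""
    return "delay" if dep_delay_min > DELAY_THRESHOLD_MIN else "no delay"


def prepare(flights, origin="ORD"):
    """Keep flights from one origin airport and attach the class label.

    The keys "origin" and "dep_delay" are hypothetical field names.
    """
    labeled = []
    for flight in flights:
        if flight["origin"] != origin:
            continue
        if flight["dep_delay"] is None:
            # Assumption: flights with no recorded delay (for example
            # cancelled ones) are skipped rather than labeled.
            continue
        labeled.append(
            {**flight, "departure_class": label_departure(flight["dep_delay"])}
        )
    return labeled


# Made-up sample rows, not taken from the airline dataset.
sample = [
    {"origin": "ORD", "dep_delay": 42},
    {"origin": "ORD", "dep_delay": 15},
    {"origin": "SFO", "dep_delay": 90},
    {"origin": "ORD", "dep_delay": None},
]
for row in prepare(sample):
    print(row)
```

Note that a delay of exactly 15 minutes falls into “no delay”, because the slide uses a strict inequality.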
Challenges
• Group 1. Data Access and Data Preparation
• Group 2. ML Model Training
• Group 3. Model Deployment
• Import file Learnathon_2018.knar into your workspace
Group 1. Data Access and Data Preparation
Group 2. Model Training & Optimization
Group 3. Deployment
KNIME Fall Summit 2018
November 6 – 9 at AT&T Executive Education and Conference Center, Austin, Texas
• Tuesday & Wednesday: One-day courses
• Thursday & Friday: Summit sessions
Use the code LEARNATHON for 10% off tickets!
Register at: knime.com/fall-summit2018
Save the Date: KNIME Spring Summit 2019
March 18 – 22 at bcc Berlin Congress Center, Berlin
• Monday & Tuesday: One-day courses
• Wednesday & Thursday: Summit sessions
• Friday: Workshops
Use the code LEARNATHON for 10% off tickets!
Tickets available soon at www.knime.com
KNIME Beginner’s Luck
Free copy of the KNIME Beginner’s Luck book from KNIME Press (https://www.knime.com/knimepress) with this code: HAMBURG-0918
You can find KNIMers here!
• KNIME (www.knime.com)
• BLOG for news, tips and tricks (www.knime.com/blog)
• FORUM for questions and answers (tech.knime.com/forum)
• EXAMPLE SERVER for example workflows
• LEARNING HUB (www.knime.com/learning-hub)
• KNIME TV channel
• KNIME on @KNIME
• KNIME on https://www.facebook.com/KNIMEanalytics
Thank You!
The KNIME® trademark and logo and OPEN FOR INNOVATION® trademark are used by KNIME.com AG under license from KNIME GmbH, and are registered in the United States. KNIME® is also registered in Germany.

KNIME Data Science Learnathon: From Raw Data To Deployment
