Story of Building a Telecom Data Solution
Sawinder Pal Kaur, PhD
Data Scientist, SAP Labs
Outline
1. Defining business objectives and translating the business problem into a data science problem
2. Introduction to telecom data: data scale, volume, continuous and categorical variables, static and dynamic data
3. Architecture and data processing pipeline: big data handling and data science methods for categorical feature selection
4. Solution engineering: how to enable project managers to do feature selection and identify opportunities to optimize the existing plans and services
Business Objective
Business Objective
• Personalize recommendation
• More customer satisfaction
• Improved customer retention
• Increased frequency of selling
• Better mix of products
• Increased customer loyalty
• Better decisions on coupons and discounts
• Develop effective strategy for new product launches
• Better offers to specific customer profiles
• Better product design / pricing
• Improve quality of service for highest-margin customers
• Invest where highest-margin customers are using the network resources
Recommend Plans and Services · Grouping/Clustering · Identify Profit Maximization Opportunities
Telecom Data & Data Processing Pipeline
Data
• How much data is available?
• Data infrastructure
• Data dashboards
• Data preparation for machine learning
• Data protection and privacy
Partitioning the data into similar groups
• Grouping customers: one-dimensional binning/clustering
• Multi-dimensional clustering
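One-dimensional binning can be sketched with the Python standard library alone. The example below is a minimal illustration, not the pipeline used in the talk: the function name `quantile_bins`, the choice of quantile cut points, and the revenue figures are all assumptions made for the sketch.

```python
from bisect import bisect_right
from statistics import quantiles


def quantile_bins(values, n_bins=3):
    """Assign each value to one of n_bins groups split at empirical quantiles."""
    # quantiles() returns the n_bins - 1 interior cut points.
    cuts = quantiles(values, n=n_bins, method="inclusive")
    # bisect_right counts how many cut points lie at or below the value,
    # which is the index of the bin the value falls into.
    labels = [bisect_right(cuts, v) for v in values]
    return labels, cuts


# Hypothetical monthly revenue per customer (illustrative numbers only).
revenue = [12.0, 15.5, 18.0, 22.0, 25.0, 31.0, 40.0, 55.0, 90.0]
labels, cuts = quantile_bins(revenue, n_bins=3)
print(labels)  # bin index per customer: 0 = lowest third, 2 = highest third
```

Quantile cut points give groups of roughly equal size; fixed-width bins or a one-dimensional clustering method would be reasonable alternatives when group sizes should follow the data instead.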
High-, low-, and normal-profit customers
• One-dimensional outlier detection
• Multi-dimensional outlier detection
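One common way to do one-dimensional outlier detection is the interquartile-range (IQR) rule. The slides do not say which rule was used, so the sketch below is an assumption: the function name `iqr_flags`, the multiplier `k=1.5`, and the profit values are hypothetical.

```python
from statistics import quantiles


def iqr_flags(values, k=1.5):
    """Label each value 'low', 'normal', or 'high' using the IQR rule."""
    # Quartiles of the observed values (quantiles() sorts internally).
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    return [
        "low" if v < low_fence else "high" if v > high_fence else "normal"
        for v in values
    ]


# Hypothetical profit per customer (illustrative numbers only).
profit = [-40.0, 8.0, 9.5, 10.0, 11.0, 12.0, 12.5, 13.0, 75.0]
flags = iqr_flags(profit)
print(list(zip(profit, flags)))
```

The same idea extends to several dimensions by replacing the fences with a distance or density score per customer, which is what multi-dimensional outlier detection methods compute.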
Dealing with missing values
• Delete the rows with missing values
• Replace missing values using:
  • Mean/median
  • Another fixed number
  • Conditional mean
  • A model such as k-nearest neighbors
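The conditional-mean option from the list above can be sketched as follows: a missing value is replaced by the mean of the observed values in the same group, with the overall mean as a fallback. The field names `plan` and `usage_gb`, the function name, and the records are hypothetical and only serve the illustration.

```python
from collections import defaultdict
from statistics import mean


def impute_conditional_mean(rows, group_key, value_key):
    """Fill missing values (None) with the mean of the row's group."""
    observed = [r[value_key] for r in rows if r[value_key] is not None]
    overall = mean(observed)  # fallback when a group has no observed values

    by_group = defaultdict(list)
    for r in rows:
        if r[value_key] is not None:
            by_group[r[group_key]].append(r[value_key])

    filled = []
    for r in rows:
        r = dict(r)  # copy so the input rows stay unchanged
        if r[value_key] is None:
            group_values = by_group.get(r[group_key])
            r[value_key] = mean(group_values) if group_values else overall
        filled.append(r)
    return filled


# Hypothetical customer records (illustrative only).
customers = [
    {"plan": "prepaid", "usage_gb": 2.0},
    {"plan": "prepaid", "usage_gb": 4.0},
    {"plan": "prepaid", "usage_gb": None},
    {"plan": "postpaid", "usage_gb": 10.0},
    {"plan": "postpaid", "usage_gb": None},
    {"plan": "corporate", "usage_gb": None},
]
filled = impute_conditional_mean(customers, "plan", "usage_gb")
print(filled)
```

A k-nearest-neighbors imputer follows the same pattern, except that the "group" for each row is the set of its most similar rows instead of a single categorical key.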
Feature selection
• Filter methods: used as independent feature selection, e.g. Pearson correlation, mutual information, mRMR
• Dimensionality reduction: PCA, variational autoencoder
• Feature engineering
  • Creating new variables: polynomials, interaction variables, ratios
• Wrapper and embedded methods: used in the model building process
(Diagram labels: Feature selection, Base set, Learning Model, Performance)
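As an illustration of a filter method for categorical features, the sketch below scores each feature by its mutual information with the target and ranks the features, independently of any model. It is a minimal plug-in estimate from counts, in nats; the feature names, the `churned` target, and the data are hypothetical.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in nats for two categorical sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum over observed pairs of p(x, y) * log(p(x, y) / (p(x) * p(y))).
    return sum(
        (c / n) * log(c * n / (px[x] * py[y])) for (x, y), c in pxy.items()
    )


# Hypothetical categorical features and target (illustrative only).
features = {
    "plan": ["prepaid"] * 4 + ["postpaid"] * 4,
    "region": ["north", "south"] * 4,
}
churned = ["yes", "yes", "yes", "no", "no", "no", "no", "yes"]

scores = {name: mutual_information(col, churned) for name, col in features.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked, scores)
```

In this toy data `plan` carries information about the target while `region` does not, so a filter step would keep `plan` first. Methods such as mRMR additionally penalize features that are redundant with ones already selected.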
Business Insights
Cluster   Size   Revenue   Profit   Usage   Discount    Cost
1         1283      0.05    -0.24    0.90       0.23    0.46
2          582     -0.13    -0.05   -0.15      -1.87   -0.10
3           71     -0.28    -0.55    0.05      -8.07    0.46
4         5309     -0.17    -0.01   -0.37       0.25   -0.25
5            9     19.37    16.26    1.12      -0.06    3.03
6          222      0.10    -1.19    3.66       0.13    2.06
7          270      2.75     2.35    0.11       0.08    0.36
8            8      0.64   -12.55    6.61       0.25   20.97

Callouts on the slide:
• Revenue, profit, and cost are very high
• Profit is very low
• profit and cost and volume are very high
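The values in the table look like standardized scores, that is, cluster averages expressed in standard deviations from the overall mean. The slides do not state this, so treat it as an assumption. Under that assumption, a profile of this kind could be computed roughly as follows; the function name, column names, and numbers are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def cluster_profile(rows, cluster_key, metrics):
    """Per-cluster size and mean z-score of each metric (assumed layout)."""
    # Overall mean and standard deviation of each metric across all customers.
    # Assumes every metric varies, so the standard deviation is non-zero.
    stats = {}
    for m in metrics:
        column = [r[m] for r in rows]
        stats[m] = (mean(column), pstdev(column))

    groups = defaultdict(list)
    for r in rows:
        groups[r[cluster_key]].append(r)

    profile = {}
    for cluster, members in sorted(groups.items()):
        profile[cluster] = {"size": len(members)}
        for m in metrics:
            mu, sd = stats[m]
            z_scores = [(r[m] - mu) / sd for r in members]
            profile[cluster][m] = round(mean(z_scores), 2)
    return profile


# Hypothetical customers with a cluster label (illustrative only).
rows = [
    {"cluster": 1, "revenue": 10, "cost": 2},
    {"cluster": 1, "revenue": 10, "cost": 4},
    {"cluster": 1, "revenue": 10, "cost": 6},
    {"cluster": 2, "revenue": 50, "cost": 20},
]
profile = cluster_profile(rows, "cluster", ["revenue", "cost"])
print(profile)
```

Read this way, a small cluster with a large positive or negative score on one metric stands out immediately, which is how the callouts on the slide single out particular clusters.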
Telecom Data Analytics
