Telecom Data Analytics

Story of Building a Telecom Data Solution
Sawinder Pal Kaur, PhD
Data Scientist, SAP Labs

Outline
1. Defining business objectives and translating the business problem into a data science problem
2. Introduction to telecom data: data scale, volume, continuous and categorical variables, static and dynamic data
3. Architecture and data processing pipeline: big data handling and data science methods for categorical feature selection
4. Solution engineering: how to help project managers do feature selection and identify opportunities to optimize the existing plans and services

Business Objective
• Personalized recommendations
• Higher customer satisfaction
• Improved customer retention
• Increased frequency of selling
• Better mix of products
• Increased customer loyalty
• Better decisions on coupons and discounts
• Effective strategies for new product launches
• Better offers for specific customer profiles
• Better product design and pricing
• Improved quality of service for the highest-margin customers
• Investment where the highest-margin customers use network resources

Recommend Plans and Services
Grouping/Clustering
Identify Profit Maximization Opportunities

Telecom Data & Data Processing Pipeline

Data
• How much data is available?
• Data infrastructure
• Data dashboards
• Data preparation for machine learning
• Data protection and privacy

Partitioning the data into similar groups
• Multi-dimensional clustering
• Grouping customers: one-dimensional binning/clustering
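One-dimensional binning can be sketched as follows. This is a minimal illustration, not the method used in the talk: it assumes quantile (equal-frequency) binning, and the `monthly_revenue` values and the `quantile_bins` helper are hypothetical.

```python
from bisect import bisect_right
from statistics import quantiles


def quantile_bins(values, n_bins=3):
    """Assign each value a bin index in 0..n_bins-1 using quantile cut points."""
    cuts = quantiles(values, n=n_bins)  # returns n_bins - 1 cut points
    return [bisect_right(cuts, v) for v in values]


# Hypothetical monthly revenue per customer (illustrative numbers only).
monthly_revenue = [12.0, 15.5, 18.0, 22.0, 25.0, 31.0, 40.0, 55.0, 90.0]
bins = quantile_bins(monthly_revenue, n_bins=3)
print(bins)
```

Quantile cut points give groups of roughly equal size; fixed-width bins or a one-dimensional clustering method would be alternatives when group sizes should follow the shape of the data instead.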

Identifying customers with high, low, and normal profitability
• One-dimensional outlier detection
• Multi-dimensional outlier detection
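A one-dimensional outlier rule that splits customers into high, low, and normal groups might look like the sketch below. The slides do not say which rule was used; this assumes the common interquartile-range (Tukey fence) rule, and the `profit` values and the `label_by_iqr` helper are hypothetical.

```python
from statistics import quantiles


def label_by_iqr(values, k=1.5):
    """Label each value 'low', 'high', or 'normal' using Tukey's IQR fences."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [
        "low" if v < lower else "high" if v > upper else "normal"
        for v in values
    ]


# Hypothetical profit per customer (illustrative numbers only).
profit = [-40.0, 8.0, 9.5, 10.0, 11.0, 12.0, 12.5, 13.0, 14.0, 95.0]
labels = label_by_iqr(profit)
print(list(zip(profit, labels)))
```

The multiplier `k` controls how far a value must sit from the middle half of the data before it is flagged; 1.5 is a conventional default, not a value taken from the source.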

Dealing with missing values
• Delete the rows with missing values
• Replace missing values using:
  • Mean/median
  • Another fixed number
  • Conditional mean
  • A model such as k-nearest neighbors
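The simpler replacement options can be sketched in a few lines. This is an illustration under assumptions: missing values are represented as `None`, the conditional mean is taken within a grouping variable (here a hypothetical `plan` column), and the data is made up.

```python
from collections import defaultdict
from statistics import mean, median


def impute(values, strategy="mean"):
    """Replace None with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed) if strategy == "mean" else median(observed)
    return [fill if v is None else v for v in values]


def impute_conditional_mean(values, groups):
    """Replace None with the mean of observed values in the same group."""
    observed_by_group = defaultdict(list)
    for v, g in zip(values, groups):
        if v is not None:
            observed_by_group[g].append(v)
    overall = mean([v for v in values if v is not None])
    filled = []
    for v, g in zip(values, groups):
        if v is None:
            group_values = observed_by_group.get(g)
            # Fall back to the overall mean if the group has no observed values.
            v = mean(group_values) if group_values else overall
        filled.append(v)
    return filled


# Hypothetical data usage (GB) and plan type per customer.
usage_gb = [2.0, None, 4.0, 10.0, None, 14.0]
plan = ["basic", "basic", "basic", "premium", "premium", "premium"]

by_mean = impute(usage_gb, "mean")
by_median = impute(usage_gb, "median")
by_plan = impute_conditional_mean(usage_gb, plan)
print(by_mean, by_median, by_plan, sep="\n")
```

Conditioning on the plan keeps a basic-plan customer from inheriting a value pulled upward by premium-plan usage, which is the main reason to prefer it over a global mean when groups differ strongly.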

• Filter methods: used as independent feature selection, e.g. Pearson correlation, mutual information, mRMR
• Dimensionality reduction: PCA, variational autoencoder
• Feature engineering
  • Creating new variables: polynomials, interaction variables, ratios
• Wrapper and embedded methods: used in the model-building process

[Diagram: Feature selection, Base set, Learning Model, Performance]
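As an example of a filter method, the sketch below ranks features by the absolute Pearson correlation with a target, independently of any model. The feature names, the `revenue` target, and the helper functions are hypothetical and only illustrate the idea.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def rank_features(features, target):
    """Return (name, |r|) pairs sorted from most to least correlated."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical customer features and target (illustrative numbers only).
features = {
    "data_usage": [1, 2, 3, 4, 5, 6],
    "call_minutes": [5, 3, 6, 2, 7, 4],
}
revenue = [10, 12, 15, 17, 21, 22]

ranked = rank_features(features, revenue)
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

Pearson correlation only captures linear relationships between numeric variables; mutual information, also listed above, is the usual choice when the relationship may be nonlinear or the variables are categorical.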

Cluster   Size  Revenue  Profit  Usage  Discount   Cost
1         1283     0.05   -0.24   0.90      0.23   0.46
2          582    -0.13   -0.05  -0.15     -1.87  -0.10
3           71    -0.28   -0.55   0.05     -8.07   0.46
4         5309    -0.17   -0.01  -0.37      0.25  -0.25
5            9    19.37   16.26   1.12     -0.06   3.03
6          222     0.10   -1.19   3.66      0.13   2.06
7          270     2.75    2.35   0.11      0.08   0.36
8            8     0.64  -12.55   6.61      0.25  20.97

Callouts on the slide:
• Revenue, profit, and cost are very high
• Profit is very low
• Profit, cost, and volume are very high
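A cluster profile like the one above could be produced roughly as follows. The source does not say how the values were computed; this sketch assumes they are per-cluster means of standardized (z-scored) variables, and the rows, labels, and `cluster_profile` helper are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def cluster_profile(rows, labels):
    """Per-cluster size and mean z-score of each numeric feature.

    rows: list of dicts mapping feature name to value
    labels: cluster id for each row (e.g. from a clustering step)
    """
    names = list(rows[0])
    mu = {n: mean([r[n] for r in rows]) for n in names}
    sd = {n: pstdev([r[n] for r in rows]) for n in names}

    members = defaultdict(list)
    for row, label in zip(rows, labels):
        members[label].append(row)

    profile = {}
    for label, group in members.items():
        summary = {"size": len(group)}
        for n in names:
            # Assumes every feature varies, so sd[n] is nonzero.
            summary[n] = mean([(r[n] - mu[n]) / sd[n] for r in group])
        profile[label] = summary
    return profile


# Hypothetical customers and cluster assignments (illustrative only).
rows = [
    {"revenue": 10, "cost": 4},
    {"revenue": 12, "cost": 5},
    {"revenue": 11, "cost": 6},
    {"revenue": 95, "cost": 30},
]
labels = [1, 1, 1, 2]

profile = cluster_profile(rows, labels)
for cluster_id, summary in sorted(profile.items()):
    print(cluster_id, summary)
```

Under that assumption, values near 0 mean a cluster looks like the average customer on that variable, while large positive or negative values in small clusters point to the unusual groups highlighted in the callouts.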
