Tutorial of Topological Data Analysis Tran Quoc Hoan @k09hthaduonght.wordpress.com/ Paper Alert 2016-04-15, Hasegawa lab., Tokyo The University of Tokyo Part III - Mapper Algorithm
My TDA = Topology Data Analysis ’s road TDA Road 2 Part I - Basic concepts & applications Part II - Advanced TDA computation Part III - Mapper Algorithm Part V - Applications in… Part VI - Applications in… Part IV - Software Roadmap He is following me
TDA Road Image source: http://www.enseignement.polytechnique.fr/informatique/INF563/ Mapper Algorithm
Basic motivation Mapper Algorithm 4 Basic idea Perform clustering at different “scales”, track how clusters change as scale varies Motivation • Coarser than manifold learning, but still works in nonlinear situation • Extract meaningful geometric information about dataset • Efficiently computable (for large dataset) Topological Methods for the Analysis of High Dimensional Data Sets and 3D Object Recognition. G Singh, F Mémoli, GE Carlsson - SPBG, 2007
Morse theory Mapper Algorithm 5 Basic idea Describe topology of a smooth manifold M using level sets of a suitable function h : M -> R • Recover M by looking at h-1((∞, t]), as t scans over the range of h • Topology of M changes at critical points of h
Reeb graphs Mapper Algorithm 6 • For each t in R, contract each component of f-1(t) to a point • Resulting structure is a graph
Mapper Mapper Algorithm 7 The mapper algorithm is a generalization of this procedure (Singh- Memoli-Carlsson) Input ✤ Filter (continuous) function f: X -> R ✤ Cover L of im(f) by open intervals: Method ✤ Cluster each inverse image f-1(Lα) into various connected components ✤ The Mapper is the nerve of V • Clusters are vertices • 1 k-simplex per (k+1)-fold intersection connected cover V ✤ Color vertices according to average value of f in the cluster k i=0Vi 6= ;, V0, ..., Vk 2 V
Workflow - Illustration Mapper Algorithm 8Image source: http://www.enseignement.polytechnique.fr/informatique/INF563/ f could be in n-dimension
Workflow - Illustration Mapper Algorithm 9Image source: http://www.enseignement.polytechnique.fr/informatique/INF563/ f could be in n-dimension
Workflow - Illustration Mapper Algorithm 10Image source: http://www.enseignement.polytechnique.fr/informatique/INF563/ f could be in n-dimension
Mapper in practice Mapper Algorithm 11 Input ✤ Filter (continuous) function f: P -> R ✤ Cover L of im(f) by open intervals: Method ✤ Cluster each inverse image f-1(Lα) into various connected components in G ✤ The Mapper is the nerve of V connected cover V ✤ Color vertices according to average value of f in the cluster - Point cloud P with metric dP - Compute neighborhood graph G = (P, E) • Clusters are vertices • 1 k-simplex per (k+1)-fold intersection k i=0Vi 6= ;, V0, ..., Vk 2 V (intersections materialized by data points)
Mapper in practice Mapper Algorithm 12Image source: http://www.enseignement.polytechnique.fr/informatique/INF563/
Mapper in practice Mapper Algorithm 13Image source: http://www.enseignement.polytechnique.fr/informatique/INF563/
Mapper in practice Mapper Algorithm 14Image source: http://www.enseignement.polytechnique.fr/informatique/INF563/
Mapper in practice Mapper Algorithm 15 Parameters ✤ Filter (continuous) function f: P -> R ✤ Cover L of im(f) by open intervals: ✤ Neighborhood size δ Example: uniform cover L • Resolution / granularity: r (diameter of intervals) • Gain: g (percentage of overlap) range scale geometric scale
Filter functions Mapper Algorithm 16 Choice of filter function is essential • Some kind of density measure • A score measure difference (distance) from some baseline • An eccentricity measure Statistics
 Mean/Max/Min Variance n-Moment Density … Machine Learning
 PCA/SVD Auto encoders Isomap/MDS/TSNE SVM Distance Error/Debugging Info … Geometry
 Centrality
 Curvature
 Harmonic Cycles …
Filter functions Mapper Algorithm 17 Eccentricity Density - How close the point lies to the “center” of the point cloud. - How close the point to the surrounding points
Mapper in applications Mapper Algorithm 18 Extracting insights from the shape of complex data using topology, Lum et al., Nature, 2013 Topological Data Analysis for Discovery in Preclinical Spinal Cord Injury and Traumatic Brain Injury, Nielson et al., Nature, 2015 Using Topological Data Analysis for Diagnosis Pulmonary Embolism, Rucco et al., arXiv preprint, 2014 Topological Methods for Exploring Low-density States in Biomolecular Folding Pathways, Yao et al., J. Chemical Physics, 2009 CD8 T-cell reactivity to islet antigens is unique to type 1 while CD4 T-cell reactivity exists in both type 1 and type 2 diabetes, Sarikonda et al., J. Autoimmunity, 2013 Innate and adaptive T cells in asthmatic patients: Relationship to severity and disease mechanisms, Hinks et al., J. Allergy Clinical Immunology, 2015 ✤ ✤ ✤ ✤ ✤ ✤
Mapper in practice Mapper Algorithm 19 1. Clustering 2. Feature selection
Mapper in clustering Mapper Algorithm 20 (1) Compute the Mapper (2) Detect interesting topological substructures (“loops”, “flares”) (3) Use substructure to cluster data select parameters Not easy (Tutorial part 1 + 2)
Mapper Algorithm 21 Extracting insights from the shape of complex data using topology, Lum et al., Nature, 2013 f: 1st and 2nd SVD r = 120, g = 22% PCA can show the Republican/ Democrat cluster but TDA gives more information House Party representative grouping Point: member of the House PCA
Mapper Algorithm 22 Extracting insights from the shape of complex data using topology, Lum et al., Nature, 2013 Detect new clusters for NBA players
Mapper Algorithm 23 Innate and adaptive T cells in asthmatic patients: Relationship to severity and disease mechanisms, Hinks et al., J. Allergy Clinical Immunology, 2015 The TDA used 62 subjects with most complete data. f: 1st and 2nd SVD r = 120, g = 14%, equalized
Mapper in feature selection Mapper Algorithm 24 (1) Compute the Mapper (2) Detect interesting topological substructures (“loops”, “flares”) (3) Select features that best discriminate data in substructure select parameters Kolmogorov-Smirnov test on (substructure) feature vs. (whole dataset) feature, select features with low p-val
Mapper Algorithm 25 Extracting insights from the shape of complex data using topology, Lum et al., Nature, 2013 Goal: detect factors that influence survival after therapy in breast cancer patients Points: breast cancer patients that went through specific therapy PCA/Single-linkage clustering cannot see this f: eccentricity r = 1/30, g = 33%
Mapper Algorithm 26 Topological Data Analysis for Discovery in Preclinical Spinal Cord Injury and Traumatic Brain Injury, Nielson et al., Nature, 2015
Select Parameters Mapper Algorithm 27 parameter r parameter g parameter δ parameter f • Small r -> fine cover 
 (close to Reeb) (sensitive to δ) • Large r -> rough cover 
 (less sensitive to δ) • g ≈ 1 -> more points inside intersections , less sensitive to δ but far from Reeb • g ≈ 0 -> controlled Mapper dimension, close to Reeb • Large δ -> fewer nodes, clean Mapper but far from Reeb (more straight lines) • Small δ -> distinct topological structure but lots of nodes (noisy) • Depend mostly on the dataset coordinate, density estimation, eccentricity, eigenvector
Select Parameters Mapper Algorithm 28 Example: P in R2 sampled from known distribution f = density estimator, r = 1/30, g = 20% δ = percentage of the diameter of X Image source: http://www.enseignement.polytechnique.fr/informatique/INF563/
Reference links Mapper Algorithm 29 • INF563 Topological Data Analysis Course
 http://www.enseignement.polytechnique.fr/informatique/INF563/ • AYASDI
 http://www.ayasdi.com/ • …

Tutorial of topological data analysis part 3(Mapper algorithm)

  • 1.
    Tutorial of Topological DataAnalysis Tran Quoc Hoan @k09hthaduonght.wordpress.com/ Paper Alert 2016-04-15, Hasegawa lab., Tokyo The University of Tokyo Part III - Mapper Algorithm
  • 2.
    My TDA =Topology Data Analysis ’s road TDA Road 2 Part I - Basic concepts & applications Part II - Advanced TDA computation Part III - Mapper Algorithm Part V - Applications in… Part VI - Applications in… Part IV - Software Roadmap He is following me
  • 3.
    TDA Road Imagesource: http://www.enseignement.polytechnique.fr/informatique/INF563/ Mapper Algorithm
  • 4.
    Basic motivation Mapper Algorithm4 Basic idea Perform clustering at different “scales”, track how clusters change as scale varies Motivation • Coarser than manifold learning, but still works in nonlinear situation • Extract meaningful geometric information about dataset • Efficiently computable (for large dataset) Topological Methods for the Analysis of High Dimensional Data Sets and 3D Object Recognition. G Singh, F Mémoli, GE Carlsson - SPBG, 2007
  • 5.
    Morse theory Mapper Algorithm5 Basic idea Describe topology of a smooth manifold M using level sets of a suitable function h : M -> R • Recover M by looking at h-1((∞, t]), as t scans over the range of h • Topology of M changes at critical points of h
  • 6.
    Reeb graphs Mapper Algorithm6 • For each t in R, contract each component of f-1(t) to a point • Resulting structure is a graph
  • 7.
    Mapper Mapper Algorithm 7 Themapper algorithm is a generalization of this procedure (Singh- Memoli-Carlsson) Input ✤ Filter (continuous) function f: X -> R ✤ Cover L of im(f) by open intervals: Method ✤ Cluster each inverse image f-1(Lα) into various connected components ✤ The Mapper is the nerve of V • Clusters are vertices • 1 k-simplex per (k+1)-fold intersection connected cover V ✤ Color vertices according to average value of f in the cluster k i=0Vi 6= ;, V0, ..., Vk 2 V
  • 8.
    Workflow - Illustration MapperAlgorithm 8Image source: http://www.enseignement.polytechnique.fr/informatique/INF563/ f could be in n-dimension
  • 9.
    Workflow - Illustration MapperAlgorithm 9Image source: http://www.enseignement.polytechnique.fr/informatique/INF563/ f could be in n-dimension
  • 10.
    Workflow - Illustration MapperAlgorithm 10Image source: http://www.enseignement.polytechnique.fr/informatique/INF563/ f could be in n-dimension
  • 11.
    Mapper in practice MapperAlgorithm 11 Input ✤ Filter (continuous) function f: P -> R ✤ Cover L of im(f) by open intervals: Method ✤ Cluster each inverse image f-1(Lα) into various connected components in G ✤ The Mapper is the nerve of V connected cover V ✤ Color vertices according to average value of f in the cluster - Point cloud P with metric dP - Compute neighborhood graph G = (P, E) • Clusters are vertices • 1 k-simplex per (k+1)-fold intersection k i=0Vi 6= ;, V0, ..., Vk 2 V (intersections materialized by data points)
  • 12.
    Mapper in practice MapperAlgorithm 12Image source: http://www.enseignement.polytechnique.fr/informatique/INF563/
  • 13.
    Mapper in practice MapperAlgorithm 13Image source: http://www.enseignement.polytechnique.fr/informatique/INF563/
  • 14.
    Mapper in practice MapperAlgorithm 14Image source: http://www.enseignement.polytechnique.fr/informatique/INF563/
  • 15.
    Mapper in practice MapperAlgorithm 15 Parameters ✤ Filter (continuous) function f: P -> R ✤ Cover L of im(f) by open intervals: ✤ Neighborhood size δ Example: uniform cover L • Resolution / granularity: r (diameter of intervals) • Gain: g (percentage of overlap) range scale geometric scale
  • 16.
    Filter functions Mapper Algorithm16 Choice of filter function is essential • Some kind of density measure • A score measure difference (distance) from some baseline • An eccentricity measure Statistics
 Mean/Max/Min Variance n-Moment Density … Machine Learning
 PCA/SVD Auto encoders Isomap/MDS/TSNE SVM Distance Error/Debugging Info … Geometry
 Centrality
 Curvature
 Harmonic Cycles …
  • 17.
    Filter functions Mapper Algorithm17 Eccentricity Density - How close the point lies to the “center” of the point cloud. - How close the point to the surrounding points
  • 18.
    Mapper in applications MapperAlgorithm 18 Extracting insights from the shape of complex data using topology, Lum et al., Nature, 2013 Topological Data Analysis for Discovery in Preclinical Spinal Cord Injury and Traumatic Brain Injury, Nielson et al., Nature, 2015 Using Topological Data Analysis for Diagnosis Pulmonary Embolism, Rucco et al., arXiv preprint, 2014 Topological Methods for Exploring Low-density States in Biomolecular Folding Pathways, Yao et al., J. Chemical Physics, 2009 CD8 T-cell reactivity to islet antigens is unique to type 1 while CD4 T-cell reactivity exists in both type 1 and type 2 diabetes, Sarikonda et al., J. Autoimmunity, 2013 Innate and adaptive T cells in asthmatic patients: Relationship to severity and disease mechanisms, Hinks et al., J. Allergy Clinical Immunology, 2015 ✤ ✤ ✤ ✤ ✤ ✤
  • 19.
    Mapper in practice MapperAlgorithm 19 1. Clustering 2. Feature selection
  • 20.
    Mapper in clustering MapperAlgorithm 20 (1) Compute the Mapper (2) Detect interesting topological substructures (“loops”, “flares”) (3) Use substructure to cluster data select parameters Not easy (Tutorial part 1 + 2)
  • 21.
    Mapper Algorithm 21 Extractinginsights from the shape of complex data using topology, Lum et al., Nature, 2013 f: 1st and 2nd SVD r = 120, g = 22% PCA can show the Republican/ Democrat cluster but TDA gives more information House Party representative grouping Point: member of the House PCA
  • 22.
    Mapper Algorithm 22 Extractinginsights from the shape of complex data using topology, Lum et al., Nature, 2013 Detect new clusters for NBA players
  • 23.
    Mapper Algorithm 23 Innateand adaptive T cells in asthmatic patients: Relationship to severity and disease mechanisms, Hinks et al., J. Allergy Clinical Immunology, 2015 The TDA used 62 subjects with most complete data. f: 1st and 2nd SVD r = 120, g = 14%, equalized
  • 24.
    Mapper in featureselection Mapper Algorithm 24 (1) Compute the Mapper (2) Detect interesting topological substructures (“loops”, “flares”) (3) Select features that best discriminate data in substructure select parameters Kolmogorov-Smirnov test on (substructure) feature vs. (whole dataset) feature, select features with low p-val
  • 25.
    Mapper Algorithm 25 Extractinginsights from the shape of complex data using topology, Lum et al., Nature, 2013 Goal: detect factors that influence survival after therapy in breast cancer patients Points: breast cancer patients that went through specific therapy PCA/Single-linkage clustering cannot see this f: eccentricity r = 1/30, g = 33%
  • 26.
    Mapper Algorithm 26 TopologicalData Analysis for Discovery in Preclinical Spinal Cord Injury and Traumatic Brain Injury, Nielson et al., Nature, 2015
  • 27.
    Select Parameters Mapper Algorithm27 parameter r parameter g parameter δ parameter f • Small r -> fine cover 
 (close to Reeb) (sensitive to δ) • Large r -> rough cover 
 (less sensitive to δ) • g ≈ 1 -> more points inside intersections , less sensitive to δ but far from Reeb • g ≈ 0 -> controlled Mapper dimension, close to Reeb • Large δ -> fewer nodes, clean Mapper but far from Reeb (more straight lines) • Small δ -> distinct topological structure but lots of nodes (noisy) • Depend mostly on the dataset coordinate, density estimation, eccentricity, eigenvector
  • 28.
    Select Parameters Mapper Algorithm28 Example: P in R2 sampled from known distribution f = density estimator, r = 1/30, g = 20% δ = percentage of the diameter of X Image source: http://www.enseignement.polytechnique.fr/informatique/INF563/
  • 29.
    Reference links Mapper Algorithm29 • INF563 Topological Data Analysis Course
 http://www.enseignement.polytechnique.fr/informatique/INF563/ • AYASDI
 http://www.ayasdi.com/ • …