The document discusses the importance of data visualization in data science, highlighting its role at various stages of the data science process and how it aids in understanding data and engaging audiences. It covers visual encoding techniques, biases in visualization, and the tools available for creating effective visualizations. Additionally, it outlines design principles and narrative structures essential for conveying insights through data.
Data Visualization in Data Science
Maloy Manna
biguru.wordpress.com linkedin.com/in/maloy twitter.com/itsmaloy
Synopsis
Having data is not enough. Adding context to data is essential to understand the data, find patterns and engage audiences. Data visualization is a key element of data science, the interdisciplinary field which deals with finding insights from data.
• In this webinar, we explore the roles of data visualization at different stages of the data science process, and why it is essential.
• We also look at how data is encoded visually with shape, size, color and other variables, and how the basic principles of visual encoding can be applied to build better visualizations.
• We cover narratives, types of bias and maps.
• Finally, we look at the various tools, both open source and off-the-shelf software, used in data science to build effective data visualizations.
Speaker profile
Maloy Manna
Project Manager - Engineering, AXA Data Innovation Lab
• Over 14 years of experience building data-driven products and services
• Previous organizations: Thomson Reuters, Saama, Infosys, TCS
Contents
• Defining Data visualization
• Data science process
• Data visualization
• Visual encoding of data
• Narrative structures
• Dataviz Technology & Tools
Defining Data visualization
• Visual display of quantitative information
• Mapping data to visual elements
• Encoding data with size, shape, color...
• Storytelling / narrative elements
Data science project life-cycle
• Acquire data
• Prepare data
• Analysis & Modeling
• Evaluation & Interpretation
• Deployment
• Operations & Optimization
Data science process
• Data Wrangling
• EDA: Exploratory Data Analysis
• Data Visualization: Explanatory / Exploratory
Source: Computational Information Design | Ben Fry
Exploratory data visualization
Data analysis approaches:
• Classical: Problem > Data > Model > Analysis > Conclusions
• EDA (Exploratory Data Analysis): Problem > Data > Analysis > Model > Conclusions
• Bayesian: Problem > Data > Model > Prior distribution > Analysis > Conclusions
EDA = approach, not a set of techniques
Exploratory data visualization
Statistical approaches:
• Quantitative
  • Hypothesis testing
  • Analysis of variance (ANOVA)
  • Point estimates and confidence intervals
  • Least squares regression
• Graphical
  • Scatter plots
  • Histograms
  • Probability plots
  • Residual plots
  • Box plots
  • Block plots
Exploratory data visualization
Graphical analysis procedures:
• Testing assumptions
• Model selection
• Model validation
• Estimator selection
• Relationship identification
• Factor effect determination
• Outlier detection
MUST USE for deriving insights from data
Exploratory data analysis
Anscombe's quartet: four datasets that share the same summary statistics
• N = 11
• Mean of X = 9.0
• Mean of Y = 7.5
• Intercept = 3
• Slope = 0.5
• Residual standard deviation = 1.237
• Correlation = 0.816
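The summary statistics quoted for Anscombe's quartet can be reproduced with a few lines of standard-library Python. This is a minimal sketch: the data values are the commonly published ones (the slide does not list them, so they are an assumption here), and `summarize` is a hypothetical helper name.

```python
from math import sqrt

# Commonly published values of Anscombe's quartet (assumed; not on the slide).
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return the summary statistics listed on the slide for one dataset."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                      # least squares slope
    intercept = mean_y - slope * mean_x    # least squares intercept
    # Residual standard deviation, using n - 2 degrees of freedom.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return {
        "n": n,
        "mean_x": mean_x,
        "mean_y": mean_y,
        "intercept": intercept,
        "slope": slope,
        "resid_sd": sqrt(sse / (n - 2)),
        "corr": sxy / sqrt(sxx * syy),
    }


SUMMARIES = {name: summarize(x, y) for name, (x, y) in QUARTET.items()}

if __name__ == "__main__":
    for name, stats in SUMMARIES.items():
        print(name, {key: round(value, 3) for key, value in stats.items()})
```

All four datasets give nearly the same numbers, yet they look completely different when plotted, which is why graphical analysis is needed alongside the quantitative summary.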
Visual encoding of data
Data → visual display elements
• Position x
• Position y
• Retinal variables
  • Size, Orientation (ordered data)
  • Color Hue, Shape (nominal data)
• Animation
Visual encoding of data
Ranking visual display elements (framework):
1. Position along a common scale, e.g. scatter plots
2. Position on identical but non-aligned scales, e.g. multiple scatter plots
3. Length, e.g. bar chart
4. Angle & slope, e.g. pie chart
5. Area, e.g. bubbles
6. Volume, density & color saturation, e.g. heat map
7. Color hue, e.g. highlights
Ref.: Graphical Perception and Graphical Methods for Analyzing Scientific Data, William Cleveland & Robert McGill (1985)
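One way to put this ranking to work is to treat it as an ordered list and prefer the highest-ranked encoding that a chart can still offer. The sketch below is illustrative only: the identifier strings and the helpers `sort_by_accuracy` and `best_encoding` are hypothetical names, not part of any library.

```python
# Encodings ordered from most to least accurately perceived,
# following the ranking on the slide. Identifier names are hypothetical.
ENCODING_RANK = [
    "position_common_scale",
    "position_nonaligned_scales",
    "length",
    "angle_slope",
    "area",
    "volume_density_saturation",
    "color_hue",
]


def sort_by_accuracy(encodings):
    """Sort encodings so the most accurately perceived one comes first."""
    return sorted(encodings, key=ENCODING_RANK.index)


def best_encoding(available):
    """Return the highest-ranked encoding among those available."""
    candidates = sort_by_accuracy(available)
    if not candidates:
        raise ValueError("no encodings available")
    return candidates[0]


if __name__ == "__main__":
    # Position is already used for other fields; what should carry the next one?
    print(best_encoding({"area", "length", "color_hue"}))
```

In this sketch, a quantitative field that cannot be shown by position would go to length (a bar) before area (a bubble) or color hue.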
Design principles
Choose the right type of chart:
• Trends / Change over time → Line charts
• Distributions → Histograms
• Summary Information → Table
• Relationships → Scatter Plots
Get it right in black & white (before adding color)
Prefer 2D to 3D for statistical charts
Use color to highlight
Avoid rainbow palette
Avoid chartjunk: "less is more"
Try to have a high data-ink ratio
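The chart-selection rules above amount to a simple lookup, and the data-ink ratio is a simple fraction (ink used to show data divided by total ink in the graphic, following Tufte's definition). The sketch below encodes both; the key names and the helpers `suggest_chart` and `data_ink_ratio` are hypothetical, and the ink quantities are whatever unit the caller chooses to measure.

```python
# Purpose -> chart type, taken from the slide. Key names are hypothetical.
CHART_FOR_PURPOSE = {
    "trend": "line chart",
    "distribution": "histogram",
    "summary": "table",
    "relationship": "scatter plot",
}


def suggest_chart(purpose):
    """Look up the chart type recommended for a given analytical purpose."""
    key = purpose.strip().lower()
    if key not in CHART_FOR_PURPOSE:
        raise ValueError(f"unknown purpose: {purpose!r}")
    return CHART_FOR_PURPOSE[key]


def data_ink_ratio(data_ink, total_ink):
    """Share of a graphic's ink that presents data (higher is better)."""
    if total_ink <= 0 or not 0 <= data_ink <= total_ink:
        raise ValueError("need 0 <= data_ink <= total_ink and total_ink > 0")
    return data_ink / total_ink


if __name__ == "__main__":
    print(suggest_chart("Trend"))
    print(data_ink_ratio(30, 40))
```

Removing chartjunk (heavy gridlines, 3D effects, decorative fills) lowers total ink without touching data ink, so the ratio rises.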
Design principles
Choose the right type of chart:
• Ranking
• Time-series
• Deviation
• Correlation
• Nominal comparison
Narrative structures
Data Journalism
Traditional journalism | Data journalism
Data around narrative | Narrative around data
Linear flow | Complex, often non-linear flow
Physical static media | Online interactive media
Narrative structures
Bias and Errors (statistics):
• Selection bias, e.g. in sampling
• Omitted-variable bias
Errors:
• Hypothesis testing
• Null Hypothesis = default / no-effect state
Decision | Null Hypothesis H0 valid | Null Hypothesis H0 invalid
Reject | Type I error (false positive) | Correct inference (true positive)
Accept | Correct inference (true negative) | Type II error (false negative)
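The four cells of the hypothesis-testing table can be written as a small decision function. This is a minimal sketch: `classify_outcome` and `reject_h0` are hypothetical helper names, and the significance level of 0.05 is a conventional default assumed here, not something stated on the slide.

```python
def reject_h0(p_value, alpha=0.05):
    """Decide whether to reject H0; alpha = 0.05 is an assumed convention."""
    return p_value < alpha


def classify_outcome(h0_is_valid, h0_rejected):
    """Map the true state of H0 and the test decision to a table cell."""
    if h0_rejected:
        if h0_is_valid:
            return "Type I error (false positive)"
        return "Correct inference (true positive)"
    if h0_is_valid:
        return "Correct inference (true negative)"
    return "Type II error (false negative)"


if __name__ == "__main__":
    # H0 is actually valid, but the test produced a small p-value.
    decision = reject_h0(p_value=0.03)
    print(classify_outcome(h0_is_valid=True, h0_rejected=decision))
```

In practice the true state of H0 is unknown, so the function is useful mainly in simulations, where it lets you count how often each kind of error occurs.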
Narrative structures
Storytelling: visual narratives have moved from author-driven to viewer-driven with the use of highly interactive media for data visualization.
Author-driven | Viewer-driven
Strong ordering | Exploratory
Heavy messaging | Ability to ask questions
Need for clarity and speed | Build own story
References
• Visual Display of Quantitative Information: Edward Tufte http://goo.gl/qb5ej
• Exploratory Data Analysis: John Tukey http://goo.gl/tV57HP
• Data Science Life cycle: Maloy Manna http://www.datasciencecentral.com/profiles/blogs/the-data-science-project-lifecycle
• Selecting right graph for your message: Stephen Few www.perceptualedge.com/articles/ie/the_right_graph.pdf
• Practical rules for using color in charts: Stephen Few www.perceptualedge.com/articles/visual.../rules_for_using_color.pdf
• OpenIntro Statistics: https://www.openintro.org/stat/
• Misleading with statistics: Eric Portelance https://medium.com/i-data/misleading-with-statistics-c63780efa928
• Computational Information Design: Ben Fry http://benfry.com/phd/dissertation-050312b-acrobat.pdf