Introduction to Explain
© Explore Data Science Academy
Overview
This tutorial is laid out as follows:
Explain
The Explore Data Science Process
Visualise
Communicate
Which tools we are using
Conclusion
Explain
Explaining your work is about visualising the data and communicating your insights in an effective way.
[Figure: Visualise + Communicate = data dinosaur? Source: Same Stats, Different Graphs]
The Explore Data Science Process
The Explore Data Science Process is about solving real-world problems using data.
The EXPLAIN stage of the process draws on three skill areas:
• Business Intelligence: Data Exploration, Dashboards
• Business Acumen: Communication, Problem identification, Ethical considerations
• Statistics: Probability Theory, Applied Statistics, Bayesian Stats
Visualise
Visualise is the dual skill of understanding and interpreting your data through visual images.
Data Exploration
• Key to understanding ALL of your data.
• Individual variables are summarised by summary statistics and distributions.
• Relationships between variables are examined using hypothesis testing.
Dashboards
• Dashboards turn data into insights.
• The key is to find the key performance indicators (KPIs) that the end user cares about, to help them understand their problem.
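The first Data Exploration point, summarising an individual variable with summary statistics, can be sketched with Python's standard `statistics` module. The delivery-time values and the `summarise` helper are hypothetical, for illustration only; with pandas, `Series.describe()` produces a similar summary.

```python
from statistics import mean, median, stdev, quantiles

# Hypothetical sample of one numeric variable: delivery times in minutes.
delivery_minutes = [22, 25, 19, 31, 27, 24, 45, 23, 26, 28]


def summarise(values):
    """Summarise a single numeric variable: count, centre, spread and range."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles (default 'exclusive' method)
    return {
        "count": len(values),
        "mean": mean(values),
        "median": median(values),
        "std": stdev(values),  # sample standard deviation
        "min": min(values),
        "q1": q1,
        "q3": q3,
        "max": max(values),
    }


print(summarise(delivery_minutes))
```

Here the mean (27) sits above the median (25.5) because of the single 45-minute delivery, which is exactly the kind of skew a plot of the distribution would make obvious.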
Why do we need to visualise data?
What do all of the datasets below have in common with this dinosaur?
“...make both calculations and graphs. Both sorts of output should be studied; each will contribute to understanding.”
F. J. Anscombe, 1973
Source: Same Stats, Different Graphs
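Anscombe’s quartet, from the 1973 paper quoted above, makes the same point in numbers. The sketch below computes identical summary statistics for his four small datasets using only the Python standard library; the helper name `summarise_xy` is our own.

```python
from statistics import mean, variance

# Anscombe's quartet (Anscombe, 1973): datasets I-III share the same x values.
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_SHARED, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarise_xy(xs, ys):
    """Means, sample variance of x, Pearson correlation and least-squares line."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    slope = sxy / sxx
    return {
        "mean_x": mx,
        "mean_y": my,
        "var_x": variance(xs),
        "r": sxy / (sxx * syy) ** 0.5,
        "slope": slope,
        "intercept": my - slope * mx,
    }


stats = {name: summarise_xy(xs, ys) for name, (xs, ys) in QUARTET.items()}
for name, s in stats.items():
    print(name, {k: round(v, 2) for k, v in s.items()})
```

All four datasets give a mean x of 9, a mean y of about 7.50, a correlation of about 0.816 and the line y ≈ 3.00 + 0.500x. Plotting them (for example with Matplotlib's `scatter`) shows a noisy straight line, a smooth curve, a tight line with one outlier, and a vertical stack of points with one far-off point.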
But visualisation by itself is dangerous
Hypothesis Testing
• Summary Statistics help us gain insight into a single variable’s distribution.
• Hypothesis Testing is the set of statistical techniques that help us judge whether our intuitions and hypotheses are supported by the data, or whether an apparent pattern could plausibly be explained by chance.
• Linear Regression is an essential tool for understanding the relationship between two numerical variables.
• Chi-squared (χ²) tests allow us to understand how categorical variables interact and are related.
Source: Spurious Correlations
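To make the chi-squared bullet concrete, here is a minimal sketch of Pearson’s chi-squared test of independence for a contingency table, in plain Python. The counts and the `chi_squared_independence` helper are hypothetical, for illustration only; the critical values are the standard 5%-level values for 1 to 4 degrees of freedom. A library routine such as `scipy.stats.chi2_contingency` would also return a p-value, and by default applies Yates’ continuity correction to 2×2 tables, which this sketch does not.

```python
# 5%-level critical values of the chi-squared distribution, by degrees of freedom.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def chi_squared_independence(table):
    """Pearson's chi-squared statistic and degrees of freedom for a contingency table.

    table: list of rows of observed counts (rows = categories of variable A,
    columns = categories of variable B). No continuity correction is applied.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count we would expect in this cell if A and B were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical survey: rows = customer segment, columns = (renewed, did not renew).
observed = [[30, 10],
            [20, 40]]
stat, dof = chi_squared_independence(observed)
related = stat > CRITICAL_05[dof]
print(f"chi2 = {stat:.2f}, dof = {dof}, reject independence at 5%: {related}")
```

A large statistic says the two variables are unlikely to be independent; it does not say why they are related, which is where the spurious-correlation warning applies.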
Visualisation and communication go hand-in-hand
Edward Tufte’s Rules for Data Visualisation:
1. Show your data!
2. Use graphics
3. Avoid Chartjunk
4. Utilise Data Ink
5. Use Labels
6. Utilise Macro / Micro
7. Use Layers
8. Use Multiples
9. Use Colours
10. Understand Narrative
Source: Tufte's Rules
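One possible way to apply a few of these rules in Matplotlib is sketched below. This is our own interpretation, not Tufte’s code: one small panel per series with a shared y-axis (use multiples), titles and axis labels (use labels), and no top or right frame lines (avoid chartjunk, utilise data ink). The monthly figures and region names are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical monthly sales, for illustration only.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
series = {
    "Region A": [12, 15, 14, 18, 21, 24],
    "Region B": [10, 9, 11, 10, 12, 11],
}

# Use multiples: one small panel per series, sharing the y-axis for fair comparison.
fig, axes = plt.subplots(1, len(series), figsize=(8, 3), sharey=True)
for ax, (name, values) in zip(axes, series.items()):
    ax.plot(months, values, color="tab:blue", linewidth=2)  # show the data plainly
    ax.set_title(name)  # use labels
    ax.set_xlabel("Month")
    # Avoid chartjunk: remove frame lines that carry no data.
    ax.spines["top"].set_visible(False)
    ax.spines["right"].set_visible(False)
axes[0].set_ylabel("Sales (units)")
fig.tight_layout()
fig.savefig("small_multiples.png", dpi=150)
```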
Communicate
Communicate is the skill of delivering solutions that create change in the real world.
Communication Skills
• Problem identification is a fundamental data science skill.
• Effective communication is about describing your solution in a way that suits your audience.
• It’s not about what you have done; it’s about what they must do.
• Presentations help us deliver our solutions with structure and evidence.
• Develop your written and spoken communication skills throughout the year.
What tools are used within the Academy?
Some of the following tools may be used within your course:
● Google Docs is a free web-based application in which documents and spreadsheets can be created, edited and stored online.
● Google Slides is a free web-based application in which presentations can be created, edited and stored online.
● Power BI is a business analytics service that provides interactive visualizations with self-service business intelligence capabilities, where end users can create reports and dashboards by themselves.
● Matplotlib is a Python 2D plotting library which produces publication-quality figures in a variety of hard copy formats and interactive environments across platforms.
● Seaborn is a Python visualization library based on matplotlib that provides a high-level interface for drawing attractive statistical graphics.
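As a minimal sketch of what “a high-level interface based on matplotlib” means in practice: one Seaborn call below draws a scatter plot, a fitted regression line and its confidence band, and returns an ordinary Matplotlib `Axes` that is then labelled with Matplotlib methods. The data and column names are hypothetical, and the sketch assumes pandas, Matplotlib and Seaborn are installed.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data, for illustration only.
df = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5, 6, 7, 8],
    "test_score": [52, 55, 61, 64, 70, 72, 79, 83],
})

# One Seaborn call draws the points, the least-squares line and a confidence band...
ax = sns.regplot(data=df, x="hours_studied", y="test_score")
# ...and returns a Matplotlib Axes, so ordinary Matplotlib calls still apply.
ax.set_xlabel("Hours studied")
ax.set_ylabel("Test score (%)")
ax.set_title("Test score vs hours studied")
plt.savefig("regplot.png", dpi=150)
```

Doing the same in plain Matplotlib would mean fitting the line and computing the band yourself, which is the trade-off Seaborn is designed to remove.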
Conclusion
What we’ve learnt
• Visualising data and communicating your insights in an effective way
• Understanding and interpreting your data through visual images
• Delivering solutions that create change in the real world