Chapter 2
Types of Digital Data
 “Reimagining Data Visualization Using Python”
 Seema Acharya
 Copyright © 2022 Wiley India Pvt. Ltd. All rights reserved.
In the presentation
 Digital Data
 Classification of digital data
 Structured data
  Benefits of structured data
  Disadvantages of structured data
 Semi-structured data
 Unstructured data
 Structured Vs. Unstructured data
Digital Data
 Irrespective of the size of an enterprise (big or small), data is a precious
 and irreplaceable asset.
 Data is present inside the enterprise and also outside the four walls and
 firewalls of the enterprise.
 Data is present in homogeneous sources as well as in heterogeneous sources.
Classification of Digital Data
 Digital data
  Structured data
  Semi-structured data
  Unstructured data
Classification of Digital Data
Almost 80% of the data generated in any enterprise today is unstructured.
Structured data and semi-structured data each account for roughly 10%.
 [Figure: 10% structured data, 10% semi-structured data, 80% unstructured data]
Structured Data
When do we say that data is structured?
Simply put, data is structured when it conforms to a pre-defined schema or
structure.
 [Figure: Sources of structured data]
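As a minimal sketch of what "conforming to a pre-defined schema" can look like in practice, the example below uses Python's built-in sqlite3 module. The table name, column names and sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical table and columns, used only to illustrate a pre-defined schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        salary REAL NOT NULL CHECK (salary >= 0)
    )
    """
)

# Rows that match the schema are accepted.
conn.executemany(
    "INSERT INTO employee (emp_id, name, salary) VALUES (?, ?, ?)",
    [(1, "Asha", 52000.0), (2, "Ravi", 61000.0)],
)

# A row that breaks the schema (missing name) is rejected by the database.
try:
    conn.execute(
        "INSERT INTO employee (emp_id, name, salary) VALUES (?, ?, ?)",
        (3, None, 40000.0),
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Because every row has the same columns, querying is straightforward.
high_earners = conn.execute(
    "SELECT name FROM employee WHERE salary > 55000 ORDER BY name"
).fetchall()
row_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
```

The database, not the application, enforces the structure here: the non-conforming row never reaches storage.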
Benefits of Structured Data
 It can be easily used by machine learning algorithms. It is easy to manipulate
 and query structured data.
 It can be easily used by an average business user.
 There are several tools available in the market to work with and analyze
 structured data.
 Structured data is recommended for use on websites. It helps you mark up your
 webpage so that search engines can crawl your page quickly. It tells the
 search engine what is on each webpage and lets it easily pick out the
 important pieces of information that it needs. This can lead to improved SEO
 (Search Engine Optimization) and allows search engines to display relevant
 content more accurately.
 Structured data requires less storage space. It is formatted to fit a
 pre-defined structure before it is loaded into data storage.
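The website markup point above can be sketched in Python. The slides do not name a markup format, so this example assumes JSON-LD with the schema.org vocabulary, which is one commonly used option. The helper name to_script_tag is hypothetical.

```python
import json

# Assumption: JSON-LD with schema.org terms is used as the markup format.
# The book details come from the title slide of this presentation.
book_markup = {
    "@context": "https://schema.org",
    "@type": "Book",
    "name": "Reimagining Data Visualization Using Python",
    "author": {"@type": "Person", "name": "Seema Acharya"},
    "copyrightYear": 2022,
}


def to_script_tag(markup: dict) -> str:
    """Wrap the markup in the script tag that would be placed in a webpage."""
    payload = json.dumps(markup, indent=2)
    return f'<script type="application/ld+json">\n{payload}\n</script>'


snippet = to_script_tag(book_markup)
print(snippet)
```

A crawler that understands this vocabulary can read the title and author directly from the tag rather than inferring them from the page text.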
Disadvantages of Structured Data
 Storage inflexibility: Structured data is generally stored in relational
 databases or data warehouses, both of which have highly rigid and stringent
 structures.
 Limited use cases: Pre-defined, structured data can only be used for its
 intended purpose, which limits its use cases.
Semi-Structured Data
 Semi-structured data is also referred to as self-describing structure.
 It does not conform to the data models that one typically associates with
 relational databases or any other form of data tables.
 It uses tags to segregate semantic elements. Tags are also used to
 enforce hierarchies of records and fields within data.
 [Figure: Sources of semi-structured data]
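The role of tags can be sketched with a small XML document parsed by Python's built-in xml.etree.ElementTree module. The records below are hypothetical. Note that the second record carries an extra field, which a fixed relational table would not allow without a schema change.

```python
import xml.etree.ElementTree as ET

# Hypothetical records; tags name each field and nesting gives the hierarchy.
document = """
<employees>
  <employee id="1">
    <name>Asha</name>
    <email>asha@example.com</email>
  </employee>
  <employee id="2">
    <name>Ravi</name>
    <email>ravi@example.com</email>
    <phone>555-0100</phone>
  </employee>
</employees>
""".strip()

root = ET.fromstring(document)

# Each record describes itself: the field names are read from the tags.
records = []
for employee in root.findall("employee"):
    record = {"id": employee.get("id")}
    for field in employee:
        record[field.tag] = field.text
    records.append(record)

for record in records:
    print(record)
```

No schema is declared up front, yet the tags are enough to recover both the fields and the record hierarchy.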
Unstructured Data
 Unstructured data does not conform to any data model.
 Its structure is quite unpredictable.
 Human generated – social media comments, emails, word processing,
 PowerPoint presentations etc.
 Machine generated – satellite images, scientific data, surveillance images
 and videos etc.
 [Figure: Sources of unstructured data]
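Because unstructured data follows no data model, any structure has to be imposed by the reader. The sketch below uses hypothetical free-text comments and a deliberately simple email pattern (an illustrative assumption, not a complete email validator). It shows that a given piece of text may or may not contain the information being looked for.

```python
import re

# Hypothetical free-text comments; nothing guarantees what each one contains.
comments = [
    "Loved the new dashboard! Mail me at asha@example.com for feedback.",
    "The report took 3 hours to load...",
    "call me maybe? no email here",
]

# Simplified pattern for illustration only.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

# One list of matches per comment; many comments yield nothing at all.
extracted = [EMAIL_PATTERN.findall(text) for text in comments]

for text, emails in zip(comments, extracted):
    print(emails, "<-", text)
```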
Structured Vs. Unstructured data
 | Aspect | Structured data | Unstructured data |
 |---|---|---|
 | Who | Self-service access; business users | Data scientists |
 | What | Only select data types | Varied data types |
 | When | Schema on write | Schema on read |
 | Where | Commonly stored in data warehouses | Commonly stored in data lakes |
 | How | Predefined format | Native format |
 | Generated by | Human generated: spreadsheets. Machine generated: weblog statistics, point-of-sale data such as barcodes and quantity | Human generated: social media comments, emails, word processing, PowerPoint presentations, etc. Machine generated: satellite images, scientific data, surveillance images and videos, etc. |
 | Characteristic | Quantitative, factual | Qualitative |
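The "schema on write" versus "schema on read" contrast can be sketched as follows. This is a simplified illustration: an in-memory SQLite table stands in for a data warehouse and a list of raw text lines stands in for a data lake. The names warehouse, data_lake and read_sales, and the sample records, are all hypothetical.

```python
import json
import sqlite3

# Schema on write: the structure is declared first and checked when data is stored.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE sale (barcode TEXT NOT NULL, quantity INTEGER NOT NULL)"
)
warehouse.execute("INSERT INTO sale VALUES (?, ?)", ("8901234567890", 3))

# Schema on read: raw records are stored in their native form, unchecked.
data_lake = [
    '{"barcode": "8901234567890", "quantity": 3}',
    '{"comment": "Great product, fast delivery"}',
]


def read_sales(raw_lines):
    """Apply a sales structure only at read time, skipping records that lack it."""
    sales = []
    for line in raw_lines:
        record = json.loads(line)
        if "barcode" in record and "quantity" in record:
            sales.append((record["barcode"], int(record["quantity"])))
    return sales


from_warehouse = warehouse.execute("SELECT barcode, quantity FROM sale").fetchall()
from_lake = read_sales(data_lake)
print(from_warehouse, from_lake)
```

In the first case the cost of structuring is paid once, when the data is written. In the second it is paid by every reader, who is free to apply a different structure to the same raw data.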
Thank you