CSE423
VIRTUALIZATION AND CLOUD COMPUTING
Working with Cloud-based Storage
Introduction
The world is creating massive amounts of data.
A large percentage of that data either is already stored in the
cloud, will be stored in the cloud, or will pass through the cloud
during the data's lifecycle.
Cloud storage systems are among the most successful cloud
computing applications in use today.
This chapter surveys the area of cloud storage systems,
categorizes the different cloud storage system types, and discusses
file-sharing and backup software and systems.
Lecture Outline
Measuring the digital universe
Provisioning cloud storage
Creating cloud storage systems
Cloud backup solutions
Cloud storage interoperability
Measuring the Digital Universe
Facts of hunger for storage
An email with a 1 GB attachment sent to 3 people can generate
an estimated 5 GB of stored managed data.
Only 25% of the data stored is unique, while 75% of the
data stored is duplicated.
70% of the data stored in the world is user initiated; the
remainder is enterprise-generated content.
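The duplication figure above is the reason storage systems deduplicate by content. A minimal sketch, assuming whole-object SHA-256 hashing; the function name `dedup_ratio` and the sample payloads are hypothetical and not taken from the course book:

```python
import hashlib


def dedup_ratio(objects):
    """Return (unique_count, total_count) for a list of byte strings.

    Objects with the same SHA-256 digest are treated as duplicates.
    """
    digests = {hashlib.sha256(data).hexdigest() for data in objects}
    return len(digests), len(objects)


# Hypothetical sample: one attachment stored four times plus one distinct file.
attachment = b"quarterly-report" * 1000
stored = [attachment, attachment, attachment, attachment, b"notes"]

unique, total = dedup_ratio(stored)
print(f"{unique} unique of {total} stored objects")
```

Only the unique objects need to be kept; the duplicates can be replaced by references to the stored copy.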
Measuring the Digital Universe
Facts of hunger for storage
More than 50% of the data created every day is automatically
generated (called shadow data or the digital shadow), especially
from video cameras and surveillance photos, financial
transaction event logs, performance data, and so on.
However, much shadow data is retained without ever having
been touched by a human being.
Much of the data produced is transient: it is stored briefly and
then deleted.
Measuring the Digital Universe
The storage giant EMC has an interest in knowing just how
much data is being stored worldwide.
EMC has funded some studies over the past decade to assess
the size of what it calls “The Digital Universe.”
The latest study, done by IDC in 2007-2008, predicted that by
2011 the world would store 1800 exabytes (EB), or 1.8 zettabytes
(ZB), of data. By the year 2020, stored data is projected to reach
an astonishing 35 ZB.
https://www.emc.com/leadership/digital-universe/index.htm
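The unit conversion behind these figures can be checked in a few lines. This is a sketch assuming decimal (SI) prefixes, where 1 ZB = 1000 EB; the variable and function names are illustrative only:

```python
EB_PER_ZB = 1000  # decimal (SI) prefixes: 1 zettabyte = 1000 exabytes


def eb_to_zb(exabytes):
    """Convert exabytes to zettabytes."""
    return exabytes / EB_PER_ZB


forecast_2011_zb = eb_to_zb(1800)   # the 2011 figure quoted above
forecast_2020_zb = 35               # the 2020 figure quoted above

growth = forecast_2020_zb / forecast_2011_zb
print(f"2011: {forecast_2011_zb} ZB, 2020: {forecast_2020_zb} ZB")
print(f"Implied growth factor: about {growth:.1f}x")
```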
EMC’s Digital Universe Homepage
Cloud Storage Data Usage in 2020
By International Data Corporation, Digital Universe, May 2010
Cloud Storage Definition
IaaS model
Storage accessed by Web service API
Cloudy characteristics
Network access most often through browser
On-demand provisioning
User control
SaaS model
Software package on top of cloud storage
for backup, synchronization, archiving, etc.
Storage Devices
Block storage device
Raw storage that can be partitioned to create volumes
Data is transferred in blocks
Examples: hard disks, flash drives
Faster data transfers, but additional overhead on clients
File storage device
Exposes its storage to clients in the form of files
Example: a file server, most often in the form of a Network
Attached Storage (NAS) device
Slower transfers, but less overhead on clients
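The difference between the two device types can be sketched in memory. This is an illustration under stated assumptions, not how any real device or NAS is implemented: the 512-byte block size and the class names `BlockDevice` and `FileStore` are hypothetical.

```python
class BlockDevice:
    """Raw storage addressed by block number."""

    BLOCK_SIZE = 512  # assumed block size for this sketch

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.data = bytearray(num_blocks * self.BLOCK_SIZE)

    def _offset(self, index):
        if not 0 <= index < self.num_blocks:
            raise IndexError("block index out of range")
        return index * self.BLOCK_SIZE

    def write_block(self, index, payload):
        if len(payload) != self.BLOCK_SIZE:
            raise ValueError("payload must be exactly one block")
        start = self._offset(index)
        self.data[start:start + self.BLOCK_SIZE] = payload

    def read_block(self, index):
        start = self._offset(index)
        return bytes(self.data[start:start + self.BLOCK_SIZE])


class FileStore:
    """File-level view: the server tracks which blocks belong to each name."""

    def __init__(self, device):
        self.device = device
        self.table = {}       # file name -> (list of block indexes, length)
        self.next_free = 0    # naive allocator: blocks are never reused

    def write_file(self, name, content):
        size = self.device.BLOCK_SIZE
        blocks = []
        for offset in range(0, len(content), size):
            # Pad the final chunk so every write is exactly one block.
            chunk = content[offset:offset + size].ljust(size, b"\0")
            self.device.write_block(self.next_free, chunk)
            blocks.append(self.next_free)
            self.next_free += 1
        self.table[name] = (blocks, len(content))

    def read_file(self, name):
        blocks, length = self.table[name]
        raw = b"".join(self.device.read_block(i) for i in blocks)
        return raw[:length]


device = BlockDevice(num_blocks=8)
store = FileStore(device)
store.write_file("report.txt", b"hello" * 200)   # 1000 bytes, two blocks
print(len(store.read_file("report.txt")))
```

A client of `BlockDevice` must do its own bookkeeping (the overhead noted above), while a client of `FileStore` only names a file and leaves the block mapping to the server.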
Provisioning Cloud Storage
Cloud storage may be broadly
categorized into two major classes of
storage:
Unmanaged Storage
Managed Storage
Cloud Storage Types
Unmanaged storage
Unmanaged storage is presented to a user as if it is a
ready-to-use disk drive. The user has little control over
how the disk is used.
Preconfigured storage (limited level of management)
Cannot (1) format as you like, (2) install your own
file system (FAT, NTFS), or (3) change drive
properties (compression, encryption)
Reliable, relatively cheap, easy to work with
Example: applications using this storage are SaaS web services
Cloud Storage Types
Managed storage
Managed storage involves the provisioning of raw
virtualized disk and the use of that disk to support
applications that use cloud-based storage
Provided as a raw disk
Can (1) format and partition the disk, (2) attach or
mount the disk, and (3) make storage assets available
to applications and other users
Support applications built using Web services
Example: applications using this storage are IaaS web services
Unmanaged Cloud Storage
With the development of high-capacity disks in the mid-to-late
1990s, a new class of storage provider known as the Storage
Service Provider (SSP) appeared, with the intent of offering
online storage.
IDrive, FreeDrive, MyVirtualDrive, OmniDrive, and Xdrive
offered file hosting services in unmanaged storage form.
Volumes were accessible first using FTP, then through a utility, and
then within browsers. Dropbox is an example of a file transfer utility.
In unmanaged cloud storage, disk space is provided to the user as a
sized partition.
Dropbox – File Transfer Utility
Managed Cloud Storage
User provisions storage on demand and pays using pay-as-you-
go model
System appears to user as a raw disk that user must partition
and format
Amazon Simple Storage Service (S3)
http://aws.amazon.com/s3/
Rackspace Cloud
http://www.rackspace.com/index.php
Google Storage for Developers
https://cloud.google.com/storage/
Amazon S3 and Rackspace Cloud
Creating Cloud Storage Systems
Concepts
Multiple copies of data are stored on
multiple servers and in multiple locations
Storage virtualization software
Failover -> changing the pointers to the stored
object’s location
Example
Amazon Web Services (EC2, S3) supports
“failover” / load balancing -> but you must
purchase these features
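The failover concept above (multiple copies, with a pointer that is redirected when a location fails) can be sketched as follows. This is a toy in-memory model, not the mechanism of any particular provider; the class name `ReplicatedObject` and the location names are hypothetical.

```python
class ReplicatedObject:
    """One stored object kept as identical copies in several locations."""

    def __init__(self, data, locations):
        self.copies = {loc: data for loc in locations}
        self.active = locations[0]   # pointer to the copy currently served
        self.down = set()            # locations marked as unavailable

    def mark_down(self, location):
        self.down.add(location)

    def read(self):
        # Failover: if the active copy is unavailable, repoint to a healthy one.
        if self.active in self.down:
            self._failover()
        return self.copies[self.active]

    def _failover(self):
        for location in self.copies:
            if location not in self.down:
                self.active = location
                return
        raise RuntimeError("no healthy replica available")


obj = ReplicatedObject(b"invoice", ["site-a", "site-b", "site-c"])
obj.mark_down("site-a")
print(obj.read(), "served from", obj.active)
```

The client keeps calling `read()` and never needs to know which location answered; only the pointer changes.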
Evaluating Cloud Storage
Important considerations
Client self-service
Strong management capabilities
Scale up – more disks
Scale out – additional storage systems
Performance characteristics such as
throughput
Block-based or file-based protocol support
Seamless maintenance and upgrades
Cloud Backup Solutions
Last line of defense in a strong backup routine
Backup types
Full system or image backups
Point-in-time (PIT) backups or snapshots
Incremental backups
3-2-1 Backup rule
3 copies (1 primary and 2 backups)
2 different media
1 copy should be stored offsite
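The 3-2-1 rule can be expressed as a simple check over a backup inventory. A minimal sketch; the `Copy` record, its field names, and the sample inventory are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Copy:
    name: str
    medium: str     # for example "disk", "tape", or "cloud"
    offsite: bool


def satisfies_3_2_1(copies):
    """Check: at least 3 copies, on at least 2 media, at least 1 offsite."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.medium for c in copies}) >= 2
    has_offsite = any(c.offsite for c in copies)
    return enough_copies and enough_media and has_offsite


inventory = [
    Copy("primary", "disk", offsite=False),
    Copy("local backup", "tape", offsite=False),
    Copy("cloud backup", "cloud", offsite=True),
]
print(satisfies_3_2_1(inventory))
```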
Backup Types
Full System/ Image Backups
Creates a complete copy of volume
including all system files, the boot record
and any other data contained in the disks.
To create an image backup of an active system,
all applications must be stopped.
Example: Ghost
Backup Types
Point in Time (PIT) or Snapshots
Also referred to as incremental backups; created
at regular intervals.
Lets you restore your data to a point in
time and save multiple copies of any files
that have been changed.
Example: Carbonite
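The idea of saving only the files that have changed since the previous run can be sketched with content hashes. This is one common way to detect changes, assumed here for illustration; it is not a description of how Carbonite or any other product works, and the function name `incremental_backup` is hypothetical.

```python
import hashlib


def incremental_backup(files, last_digests):
    """Return (changed, new_digests).

    files:        dict mapping file name -> current content (bytes)
    last_digests: dict mapping file name -> SHA-256 hex digest from the
                  previous backup run (empty on the first run)
    """
    changed = {}
    new_digests = {}
    for name, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        new_digests[name] = digest
        if last_digests.get(name) != digest:
            changed[name] = content   # new or modified since the last run
    return changed, new_digests


# First run: everything is new, so everything is copied.
files = {"a.txt": b"alpha", "b.txt": b"beta"}
first_changed, digests = incremental_backup(files, {})

# Second run: only the modified file is copied.
files["b.txt"] = b"beta v2"
second_changed, digests = incremental_backup(files, digests)
print(sorted(second_changed))
```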
Cloud Backup Features
Logon authentication
Strong encryption of data transfers
Automated and scheduled backup
Fast backup (snapshots) after full online
backup, with 10-30 historical versions of
a file retained
Ability to retrieve historical versions of
file
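Retaining a bounded number of historical versions can be sketched with a fixed-length queue. The default limit of 10 mirrors the low end of the range quoted above; the class name `VersionHistory` and its methods are hypothetical:

```python
from collections import deque


class VersionHistory:
    """Keep only the most recent versions of one file."""

    def __init__(self, max_versions=10):
        # A deque with maxlen silently drops the oldest entry when full.
        self.versions = deque(maxlen=max_versions)

    def save(self, content):
        self.versions.append(content)

    def retrieve(self, age=0):
        """Return a stored version: age=0 is the newest, 1 the one before."""
        if not 0 <= age < len(self.versions):
            raise IndexError("no version of that age is retained")
        return self.versions[-1 - age]


history = VersionHistory(max_versions=3)
for text in (b"v1", b"v2", b"v3", b"v4", b"v5"):
    history.save(text)

print(history.retrieve(0), history.retrieve(2), len(history.versions))
```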
Cloud Backup Features (2)
Multiplatform support (Win/ Mac / Linux)
Web-based management console with easy-to-use
features such as drag and drop.
24x7 technical support
Logging and reporting of operations
Multisite storage or replication, enabling data
failover
Cloud Attached Backup
CTERA sells a server referred to as Cloud Attached Storage,
which is meant for the Small and Medium Business (SMB)
market, branch offices, and the Small Office Home Office
(SOHO) market.
The CTERA Cloud Attached Storage backup server has the
attributes of a NAS (Network Attached Storage), with the added
feature that after you set up which systems you want to back
up, create user accounts, and set the backup options through a
browser interface, the system runs automated backup copying
and synchronizing of your data with cloud storage. Backed up
data may be shared between users.
Cloud Storage Interoperability
Open standards (operating-system
neutral and file-system neutral)
Workgroups
Cloud Data Management Interface (CDMI)
from Storage Networking Industry
Association (SNIA)
http://www.snia.org
Open Cloud Computing Interface (OCCI)
from SNIA and Open Grid Forum (OGF)
http://www.ogf.org
References
Chapter 15 of Course Book: Cloud
Computing Bible, 2011, Wiley Publishing
Inc.