
Google Colab & Python for ML Beginners

Information System Course. This presentation is about Google Colab and Python, taught in the North South University MIS (Management Information System) module by Dr. Rakibul Islam, Assistant Professor.

Uploaded by

Sadman Kabir

Google Colab and Python for Data Analytics


Google Colab
• Google has invested heavily in AI research. Over many years,
Google developed an AI framework called TensorFlow and a
development tool called Colaboratory.
• TensorFlow is an end-to-end platform that makes it easy for
you to build and deploy ML models. It is an entire ecosystem
to help you solve challenging, real-world problems with
machine learning. TensorFlow is open-sourced.

Google Colab
• Colaboratory is now known as Google Colab or simply Colab.
• The introduction of Colab has eased the learning and development of
machine learning applications.
• Another attractive feature that Google offers developers is free
access to GPUs. One possible reason for making it free to the public is
to establish Google's software as a standard in academia for teaching
machine learning and data science.
• This presentation assumes that you are already familiar with Jupyter,
GitHub, and the basics of Python and other programming languages.

Google Colab
• Colab is a free Jupyter notebook environment that runs entirely
in the cloud. Most importantly, it does not require a setup and
the notebooks that you create can be simultaneously edited by
your team members - just the way you edit documents in Google
Docs.
• Colab supports many popular machine learning libraries which
can be easily loaded in your notebook.

What Does Colab Offer You?
• As a programmer, you can do the following with Google Colab:
  • Write and execute code in Python
  • Document your code with text that supports mathematical equations
  • Create/Upload/Share notebooks
  • Import/Save notebooks from/to Google Drive
  • Import/Publish notebooks from/to GitHub
  • Import external datasets, e.g. from Kaggle
  • Integrate PyTorch, TensorFlow, Keras, OpenCV
  • Use a free cloud service with a free GPU

How Does Google Colab Work?

As Colab implicitly uses Google Drive for storing your notebooks,
ensure that you are logged in to your Google Drive account
before proceeding further.

Step 1: Open the following URL in your browser:
https://colab.research.google.com
Step 2: Click the NEW NOTEBOOK link at the
bottom of the screen.
Step 3: Click Connect at the top right of the
screen.
Step 4: Click the notebook name (Untitled.ipynb) at the top left
of the screen.
Step 5: Write and execute your program.
Step 6: Click the box in the left corner to import a file.
Step 7: Click the box with the arrow in the top left corner to
select a file.
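As a first cell for Step 5, a minimal sketch like the following can confirm that the notebook is connected and running Python. The helper name `describe_runtime` is hypothetical, and checking `sys.modules` for `google.colab` is a common convention for detecting Colab, assumed here rather than guaranteed.

```python
import platform
import sys


def describe_runtime():
    """Return basic facts about the Python runtime executing this cell."""
    return {
        "python_version": platform.python_version(),
        # Assumption: the Colab runtime preloads the google.colab package.
        "in_colab": "google.colab" in sys.modules,
    }


info = describe_runtime()
print("Python", info["python_version"])
print("Running in Colab:", info["in_colab"])
```

Outside Colab the second line should simply report False, so the same cell can be run in a local Jupyter notebook as well.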
What is Python?
• Python is an interpreted, high-level, general-purpose
programming language.
• Being general-purpose means that, unlike languages tied mainly
to the web such as HTML and CSS, Python can be used for many
types of programming and software development besides
web development.

Why Python?

• Python works on different platforms (Windows, Mac, Linux, Raspberry Pi,
etc.).
• Python has a simple syntax similar to the English language with influence
from mathematics.
• Python has syntax that allows developers to write programs with fewer
lines than some other programming languages.
• Python runs on an interpreter system, meaning that code can be executed
as soon as it is written. This means that prototyping can be very quick.
• Python can be treated in a procedural way, an object-oriented way or a
functional way.
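To illustrate the last point, here is a small sketch that computes the same sum of squares in a procedural, an object-oriented, and a functional style. The class name `SquareSummer` and the sample numbers are made up for illustration.

```python
from functools import reduce

numbers = [1, 2, 3, 4]

# Procedural style: an explicit loop that updates a running total.
total_procedural = 0
for n in numbers:
    total_procedural += n * n


# Object-oriented style: state and behaviour live together in a class.
class SquareSummer:
    def __init__(self):
        self.total = 0

    def add(self, value):
        self.total += value * value


summer = SquareSummer()
for n in numbers:
    summer.add(n)
total_oo = summer.total

# Functional style: combine map and reduce without mutating any state.
total_functional = reduce(
    lambda acc, sq: acc + sq, map(lambda n: n * n, numbers), 0
)

print(total_procedural, total_oo, total_functional)  # 30 30 30
```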

Development Steps of Python
• Python was conceived in the late 1980s by Guido van Rossum
as a successor to the ABC programming language, and was first
released in 1991. At that time, van Rossum was working at the
CWI (Centrum Wiskunde & Informatica) on a project called
Amoeba, a distributed operating system.
• ABC is a general-purpose programming language and
programming environment that was developed at the CWI in
Amsterdam, the Netherlands.

Development Steps of Python

• Guido van Rossum published the first version of Python (version 0.9.0)
to alt.sources in February 1991. This release already included exception
handling, functions, and the core data types list, dict, str and others. It
was also object-oriented and had a module system.
• Python version 1.0 was released in January 1994. The major new features
in this release were the functional programming tools lambda,
map, filter and reduce, which Guido van Rossum never liked.
• Six and a half years later, in October 2000, Python 2.0 was introduced. This
release included list comprehensions, a full garbage collector, and
Unicode support.
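The features named above can be compared side by side. The sketch below, written for Python 3 (where `reduce` lives in `functools`), filters and doubles a list first with the functional tools and then with a list comprehension; the sample values are made up.

```python
from functools import reduce  # reduce is in functools in Python 3

values = [1, 2, 3, 4, 5, 6]

# Functional tools introduced with Python 1.0: lambda, map, filter, reduce.
evens = list(filter(lambda v: v % 2 == 0, values))
doubled = list(map(lambda v: v * 2, evens))
total = reduce(lambda a, b: a + b, doubled)

# List comprehension (Python 2.0): filter and map in a single expression.
doubled_lc = [v * 2 for v in values if v % 2 == 0]

print(doubled)     # [4, 8, 12]
print(doubled_lc)  # [4, 8, 12]
print(total)       # 24
```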

Development Steps of Python

• Python flourished for another 8 years in the 2.x versions
before the next major release, Python 3.0 (also known as
"Python 3000" and "Py3K"). Python 3 is not
backwards compatible with Python 2.x. The emphasis in
Python 3 was on the removal of duplicate programming
constructs and modules, thus fulfilling or coming close to
fulfilling the 13th aphorism of the Zen of Python: "There should be
one -- and preferably only one -- obvious way to do it."
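The slide does not list specific incompatibilities. As a hedged illustration, the sketch below shows three well-known Python 3 behaviours that differ from Python 2: `print` is a function, `/` is true division, and text is Unicode by default.

```python
# In Python 3, print is a function, so it accepts keyword arguments.
print("Python", 3, sep="-")  # Python-3

# "/" always performs true division; "//" performs floor division.
true_div = 7 / 2
floor_div = 7 // 2
print(true_div, floor_div)  # 3.5 3

# Text (str) is Unicode by default; bytes are a separate type.
text = "naïve"
encoded = text.encode("utf-8")
print(type(text).__name__, type(encoded).__name__)  # str bytes
```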

What can Python do?
• Python can be used on a server to create web applications.
• Python can be used alongside software to create workflows.
• Python can connect to database systems. It can also read and modify files.
• Python can be used to handle big data and perform complex mathematics.
• Python can be used for rapid prototyping, or for production-ready software development.
• Entry-Level Python Jobs
• Entry-Level Software Developer.
• Quality Assurance Engineer.
• Junior Python Developer.
• Python Full Stack Developer.
• GIS Analyst.
• Senior Python Developer.
• Data Scientist.
• Machine Learning Engineer.
Python Variables
• Variables are containers for storing data values.
• Python has no command for declaring a variable.
• A variable is created the moment you first assign a value to it.
x = 5
y = "John"
print(x)
print(y)
• Variables do not need to be declared with any particular type, and can even
change type after they have been set.
x = 4       # x is of type int
x = "Sally" # x is now of type str
print(x)
Python Variables
• If you want to specify the data type of a variable, this can be done with casting.
x = str(3)    # x will be '3'
y = int(3)    # y will be 3
z = float(3)  # z will be 3.0
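Casting is most useful when data arrives as text. The following sketch, with made-up values, converts strings to numbers and shows that an invalid conversion raises a `ValueError`.

```python
raw_age = "42"       # text, e.g. as read from a file or a form
raw_price = "19.99"

age = int(raw_age)        # 42
price = float(raw_price)  # 19.99
label = str(age)          # "42"
truncated = int(7.9)      # 7: int() drops the fractional part

print(age + 1)      # 43
print(label + "!")  # 42!
print(truncated)    # 7

# Casting fails loudly when the text is not a valid number.
try:
    int("forty-two")
    conversion_failed = False
except ValueError as err:
    conversion_failed = True
    print("Cannot convert:", err)
```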

• You can get the data type of a variable with the type() function.


x = 5
y = "John"
print(type(x))
print(type(y))
• String variables can be declared either by using single or double quotes.
x = "John"
# is the same as
x = 'John'
• Variable names are case-sensitive.
a = 4
A = "Sally"
#A will not overwrite a
Python Variables
• A variable can have a short name (like x and y) or a more descriptive name
(age, carname, total_volume).
• Rules for Python variables:
• A variable name must start with a letter or the underscore character
• A variable name cannot start with a number
• A variable name can only contain alpha-numeric characters and underscores (A-Z, a-z, 0-9,
and _ )
• Variable names are case-sensitive (age, Age and AGE are three different variables)
myvar = "John"
my_var = "John"
_my_var = "John"
myVar = "John"
MYVAR = "John"
myvar2 = "John"

Illegal variable names
• 2myvar = "John"
• my-var = "John"
• my var = "John"
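The naming rules can be checked programmatically. This sketch uses `str.isidentifier()` and the standard `keyword` module; the helper name `is_legal_name` is hypothetical, and the reserved word `class` is added here as an extra illegal case that the slides do not mention.

```python
import keyword

candidates = ["myvar", "_my_var", "myVar2", "2myvar", "my-var", "my var", "class"]


def is_legal_name(name):
    """True if name is a valid identifier and not a reserved keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)


results = {name: is_legal_name(name) for name in candidates}
for name, ok in results.items():
    print(f"{name!r}: {'legal' if ok else 'illegal'}")
```

Note that `isidentifier()` also accepts non-ASCII letters, so it is slightly more permissive than the A-Z, 0-9, underscore rule stated in the slides.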

Multi Words Variable Names
• Variable names with more than one word can be difficult to read.
• There are several techniques you can use to make them more readable:
• Pascal Case
• Each word starts with a capital letter:
• MyVariableName = "John"
• Camel Case
• Each word, except the first, starts with a capital letter:
• myVariableName = "John"
• Snake Case
• Each word is separated by an underscore character
• my_variable_name = "John"
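As a small illustration of the three conventions, the hypothetical helpers below build each style from the same list of words.

```python
def to_snake_case(words):
    """Join lowercase words with underscores."""
    return "_".join(w.lower() for w in words)


def to_pascal_case(words):
    """Capitalize every word and join without separators."""
    return "".join(w.capitalize() for w in words)


def to_camel_case(words):
    """Like Pascal case, but the first letter stays lowercase."""
    pascal = to_pascal_case(words)
    return pascal[0].lower() + pascal[1:] if pascal else pascal


words = ["my", "variable", "name"]
print(to_pascal_case(words))  # MyVariableName
print(to_camel_case(words))   # myVariableName
print(to_snake_case(words))   # my_variable_name
```

The PEP 8 style guide recommends snake case for variable and function names, which is why it is the form most often seen in Python code.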
Python Variables - Assign Multiple Values
• Many Values to Multiple Variables
• Python allows you to assign values to multiple variables in one
line:
• Example (Note: Make sure the number of variables matches
the number of values, or else you will get an error.)
x, y, z = "Orange", "Banana", "Cherry"
print(x)
print(y)
print(z)
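The error mentioned in the note can be seen directly. In this sketch the second assignment has three values but only two names, so Python raises a `ValueError` when the line runs.

```python
# Matching counts: three names, three values.
x, y, z = "Orange", "Banana", "Cherry"

# Mismatched counts raise ValueError at run time.
try:
    a, b = "Orange", "Banana", "Cherry"
    message = ""
except ValueError as err:
    message = str(err)
    print("Error:", message)
```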

Python Variables - Assign Multiple Values
• One Value to Multiple Variables
• You can assign the same value to multiple variables in one line:
x = y = z = "Orange"
print(x)
print(y)
print(z)
• Unpack a Collection
• If you have a collection of values in a list, tuple, etc., Python allows you to extract the
values into variables. This is called unpacking.
fruits = ["apple", "banana", "cherry"]
x, y, z = fruits
print(x)
print(y)
print(z)
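Unpacking also works when the number of values is not known in advance. The sketch below goes slightly beyond the slide: it uses a starred name to collect the remaining items, and the same mechanism to swap two variables.

```python
fruits = ["apple", "banana", "cherry", "mango"]

# A starred name collects whatever is left over into a list.
first, *rest = fruits
print(first)  # apple
print(rest)   # ['banana', 'cherry', 'mango']

# The star can sit in the middle as well.
head, *middle, tail = fruits
print(middle)  # ['banana', 'cherry']

# Tuple unpacking also swaps two variables without a temporary.
x, y = 1, 2
x, y = y, x
print(x, y)  # 2 1
```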
Python - Output Variables
• The Python print() function is often used to output variables.
• To combine text and a variable, Python uses the + character:
x = "awesome"
print("Python is " + x)
• You can also use the + character to add a variable to another variable
x = "Python is "
y = "awesome"
z =  x + y
print(z)
• For numbers, the + character works as a mathematical operator:
x = 5
y = 10
print(x + y)
• If you try to combine a string and a number with +, Python raises a TypeError:
x = 5
y = "John"
print(x + y)
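There are several common ways around this error. The sketch below catches the `TypeError` and then shows three alternatives: explicit conversion with `str()`, an f-string, and passing several arguments to `print()`.

```python
x = 5
y = "John"

try:
    print(x + y)
except TypeError as err:
    print("TypeError:", err)

# Option 1: convert the number to text explicitly.
message_concat = y + " is " + str(x)

# Option 2: an f-string formats the value in place.
message_fstring = f"{y} is {x}"

# Option 3: print() accepts several arguments and separates them with spaces.
print(y, "is", x)

print(message_concat)   # John is 5
print(message_fstring)  # John is 5
```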
Data Type in Python
Built-in Data Types
• In programming, data type is an important concept.
• Variables can store data of different types, and different types can do different things.
• Python has the following data types built-in by default, in these categories:

Text Type: str
Numeric Types: int, float, complex
Sequence Types: list, tuple, range
Mapping Type: dict
Set Types: set, frozenset
Boolean Type: bool
Binary Types: bytes, bytearray, memoryview
Data Type in Python
• You can get the data type of any object by using the type() function
• Print the data type of the variable x:
x = 5
print(type(x))

Setting the Data Type
• In Python, the data type is set when you assign a value to a variable:
Example                                       Data Type
x = "Hello World"                             str
x = 20                                        int
x = 20.5                                      float
x = 1j                                        complex
x = ["apple", "banana", "cherry"]             list
x = ("apple", "banana", "cherry")             tuple
x = range(6)                                  range
x = {"name" : "John", "age" : 36}             dict
x = {"apple", "banana", "cherry"}             set
x = frozenset({"apple", "banana", "cherry"})  frozenset
x = True                                      bool
x = b"Hello"                                  bytes
x = bytearray(5)                              bytearray
x = memoryview(bytes(5))                      memoryview
Setting the Specific Data Type
If you want to specify the data type, you can use the following constructor
functions:
Example                                       Data Type
x = str("Hello World")                        str
x = int(20)                                   int
x = float(20.5)                               float
x = complex(1j)                               complex
x = list(("apple", "banana", "cherry"))       list
x = tuple(("apple", "banana", "cherry"))      tuple
x = range(6)                                  range
x = dict(name="John", age=36)                 dict
x = set(("apple", "banana", "cherry"))        set
x = frozenset(("apple", "banana", "cherry"))  frozenset
x = bool(5)                                   bool
x = bytes(5)                                  bytes
x = bytearray(5)                              bytearray
x = memoryview(bytes(5))                      memoryview
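Constructor functions are also handy for converting one collection type into another. The following sketch uses made-up sample data to show a few such conversions and how `bool()` treats different values.

```python
fruits = ["apple", "banana", "apple", "cherry"]

as_tuple = tuple(fruits)          # immutable copy, order kept
as_set = set(fruits)              # duplicates removed, order not guaranteed
as_frozenset = frozenset(fruits)  # like set, but immutable
pairs = dict([("name", "John"), ("age", 36)])  # dict from key/value pairs
numbers = list(range(6))          # turn a range into a list

print(len(as_tuple), len(as_set))  # 4 3
print(pairs["age"])                # 36
print(numbers)                     # [0, 1, 2, 3, 4, 5]

# bool() treats zero and empty containers as False, everything else as True.
print(bool(0), bool(5), bool([]), bool("text"))  # False True False True
```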
Exercise

• The following code prints the data type of x. What data
type would that be?
x = 5
print(type(x))

Python Libraries for Data Science
Many popular Python toolboxes/libraries:
Data Processing and Modeling
• NumPy
• SciPy
• Pandas
• SciKit-Learn
• TensorFlow
• Keras
Data Mining
• scrapy
Visualization libraries
• matplotlib
• Seaborn
• plotly
• pydot

and many more …


Python Libraries for Data Science
NumPy:
• NumPy (Numerical Python) is a tool for scientific computing
and for performing basic and advanced array operations.
• introduces objects for multidimensional arrays and matrices, as well
as functions that make it easy to perform advanced mathematical
and statistical operations on those objects
• provides vectorization of mathematical operations on arrays and
matrices, which significantly improves performance
• many other Python libraries are built on NumPy
Link: http://www.numpy.org/
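A minimal sketch of what vectorization means in practice, assuming NumPy is installed (it is preinstalled in Colab). The arrays hold made-up numbers; the arithmetic is applied to whole arrays without an explicit Python loop.

```python
import numpy as np

prices = np.array([10.0, 20.0, 30.0, 40.0])
quantities = np.array([1, 2, 3, 4])

# Vectorized arithmetic: the multiplication is applied element by element.
revenue = prices * quantities
print(revenue.tolist())       # [10.0, 40.0, 90.0, 160.0]
print(float(revenue.sum()))   # 300.0
print(float(revenue.mean()))  # 75.0

# A two-dimensional array (matrix) and a matrix product.
m = np.array([[1, 2], [3, 4]])
product = m @ m
print(product.tolist())       # [[7, 10], [15, 22]]
```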

Python Libraries for Data Science
SciPy:
• collection of algorithms for linear algebra, differential equations, numerical
integration, optimization, statistics and more
• part of the SciPy Stack
• built on NumPy

Link: https://www.scipy.org/scipylib/

Python Libraries for Data Science
Pandas:
• Pandas is a library created to help developers work with "labeled" and
"relational" data.
• adds data structures (Series and DataFrame) and tools designed to work with
table-like data (similar to data frames in R)
• provides tools for data manipulation: reshaping, merging, sorting, slicing,
aggregation etc.
• allows handling missing data

Link: http://pandas.pydata.org/
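The manipulation tools listed for pandas can be sketched on a tiny table, assuming pandas is installed (it is preinstalled in Colab). The names, departments, and salaries below are made up for illustration only.

```python
import pandas as pd

# Hypothetical data, invented for this example.
df = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Cara", "Dev"],
        "dept": ["A", "B", "A", "B"],
        "salary": [100.0, None, 120.0, 90.0],
    }
)

# Handling missing data: count the gaps, then fill them with the column mean.
missing = int(df["salary"].isna().sum())
filled = df.assign(salary=df["salary"].fillna(df["salary"].mean()))

# Sorting and slicing: the two highest salaries.
top_two = filled.sort_values("salary", ascending=False).head(2)

# Aggregation: mean salary per department.
by_dept = filled.groupby("dept")["salary"].mean()

print(missing)                   # 1
print(top_two["name"].tolist())  # ['Cara', 'Ben']
print(by_dept.to_dict())
```

Filling with the mean is only one possible choice; dropping incomplete rows with `dropna()` is another common option.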
Python Libraries for Data Science
SciKit-Learn:
• This is an industry standard for data science projects based in Python.
SciKits are a group of packages in the SciPy Stack that were created for
specific functionality, for example image processing. Scikit-learn uses the
math operations of SciPy to expose a concise interface to the most common
machine learning algorithms.
• Data scientists use it for handling standard machine learning and data
mining tasks such as classification, clustering, regression, model selection,
model validation, and dimensionality reduction.
• built on NumPy, SciPy and matplotlib
Link: http://scikit-learn.org/

Python Libraries for Data Science
•  TensorFlow
• TensorFlow is a popular Python framework for machine learning and
deep learning, which was developed at Google Brain. It is widely used
for tasks like object identification, speech recognition, and many
others. It helps in working with artificial neural networks that need to
handle multiple data sets. The library includes various layer-helpers
(tflearn, tf-slim, skflow), which make it even more functional.
TensorFlow is constantly expanded with new releases, including
fixes for potential security vulnerabilities and improvements in the
integration of TensorFlow and GPUs.

Python Libraries for Data Science
• Keras
• Keras is a great library for building neural networks and modeling. It's
very straightforward to use and provides developers with a good
degree of extensibility. The library uses other packages
(Theano or TensorFlow) as its backends. Moreover, Microsoft
integrated CNTK (Microsoft Cognitive Toolkit) to serve as another
backend. It's a great pick if you want to experiment quickly using
compact systems – the minimalist approach to design really pays off!

Python Libraries for Data Science
• Scrapy
• One of the most popular Python data science libraries, Scrapy helps to
build crawling programs (spider bots) that can retrieve structured
data from the web – for example, URLs or contact info. It's a great
tool for scraping data used in, for example, Python machine learning
models. 
• Developers use it for gathering data from APIs. This full-fledged
framework follows the Don't Repeat Yourself principle in the design of
its interface. As a result, the tool inspires users to write universal code
that can be reused for building and scaling large crawlers.

Python Libraries for Data Science
matplotlib:
• This is a standard data science library that helps to generate data
visualizations such as two-dimensional diagrams and graphs (histograms,
line plots, pie charts, scatterplots, graphs in non-Cartesian coordinates).
• Matplotlib is one of the plotting libraries that are really useful in data
science projects: it provides an object-oriented API for embedding plots
into applications.
• a set of functionalities similar to those of MATLAB
• relatively low-level; some effort is needed to create advanced visualizations

Link: https://matplotlib.org/

Python Libraries for Data Science
Seaborn:
• Seaborn serves as a useful Python tool for visualizing statistical
models, with heatmaps and other types of visualizations that summarize data and
depict the overall distributions. When using this library, you get to benefit
from an extensive gallery of visualizations (including complex ones like time
series, joint plots, and violin diagrams).
• based on matplotlib
• provides a high-level interface for drawing attractive statistical graphics
• similar (in style) to the popular ggplot2 library in R

Link: https://seaborn.pydata.org/

Python Libraries for Data Science
• Plotly
• This web-based tool for data visualization offers many
useful out-of-the-box graphics; you can find them on the
Plot.ly website. The library works very well in interactive web
applications. Its creators are busy expanding the library with
new graphics and features for supporting multiple linked
views, animation, and crosstalk integration.

Link: https://plotly.com/

Python Libraries for Data Science
•  Pydot
• This library helps to generate directed and undirected
graphs. Written in pure Python, it serves as an interface to
Graphviz. You can easily show the structure of graphs with the
help of this library. That comes in handy when you're
developing algorithms based on neural networks and
decision trees.

Loading Python Libraries
# Import Python libraries
import numpy as np
import scipy as sp
import pandas as pd
import matplotlib as mpl
import seaborn as sns

Press Shift+Enter to execute the Jupyter cell.

Reading data using pandas
# Read a CSV file
df = pd.read_csv("Salaries.csv")

Note: pd.read_csv has many optional arguments to fine-tune the data import process.

There are a number of pandas commands to read other data formats:

pd.read_excel('myfile.xlsx',sheet_name='Sheet1', index_col=None,
na_values=['NA'])
pd.read_stata('myfile.dta')
pd.read_sas('myfile.sas7bdat')
pd.read_hdf('myfile.h5','df')
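To try `pd.read_csv` without a file on disk, the sketch below reads CSV text from memory. The column names and values are hypothetical stand-ins and are not taken from the actual Salaries.csv file.

```python
import io

import pandas as pd

# Stand-in for a file such as "Salaries.csv"; the columns are hypothetical.
csv_text = """rank,discipline,salary
Prof,B,186960
AssocProf,A,NA
AsstProf,B,92000
"""

# read_csv accepts a path, a URL, or any file-like object.
df = pd.read_csv(io.StringIO(csv_text), na_values=["NA"])
missing_salaries = int(df["salary"].isna().sum())

print(df.shape)          # (3, 3)
print(df.head())
print(missing_salaries)  # 1
```

In Colab, a real file would first have to be uploaded or read from Google Drive before its path can be passed to `pd.read_csv`.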
Data Frame data types
Pandas Type | Native Python Type | Description
object | string | The most general dtype. Will be assigned to your column if the column has mixed types (numbers and strings).
int64 | int | Whole numbers. 64 refers to the number of bits allocated to hold each value.
float64 | float | Numbers with decimals. If a column contains numbers and NaNs (see below), pandas will default to float64, in case your missing value has a decimal.
datetime64, timedelta[ns] | N/A (but see the datetime module in Python's standard library) | Values meant to hold time data. Look into these for time series experiments.
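The dtypes described above can be inspected with the `dtypes` attribute. In this sketch the column names and values are made up, and the exact time unit shown in brackets for the date columns may differ between pandas versions.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "count": [1, 2, 3],          # whole numbers
        "score": [1, None, 3],       # numbers with a missing value
        "mixed": [1, "two", 3.0],    # numbers and strings together
        "when": pd.to_datetime(["2021-01-01", "2021-06-15", "2021-12-31"]),
    }
)
# Differences between dates give a timedelta column.
df["elapsed"] = df["when"] - df["when"].iloc[0]

dtypes = {col: str(dtype) for col, dtype in df.dtypes.items()}
for col, name in dtypes.items():
    print(f"{col}: {name}")
```

The `score` column illustrates the rule from the table: its values are whole numbers, but the missing entry makes pandas store the column as float64.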

How to install Python?
• Installing or updating Python on your computer is the first
step to becoming a Python programmer. There are a
multitude of installation methods: you can download official
Python distributions from Python.org, install from a package
manager, and even install specialized distributions for
scientific computing, Internet of Things, and embedded
systems. 

How to install Python?
• Python 3 Installation on Windows
• Step 1: Select Version of Python to Install. ...
• Step 2: Download Python Executable Installer. ...
• Step 3: Run Executable Installer. ...
• Step 4: Verify Python Was Installed On Windows. ...
• Step 5: Verify Pip Was Installed. ...
• Step 6: Add Python Path to Environment Variables
(Optional)
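Steps 4 and 5 are usually done from the command line, but the same checks can be sketched from inside Python. This is an illustrative alternative, not part of the installer's own procedure.

```python
import importlib.util
import sys

version = sys.version_info
print(f"Python {version.major}.{version.minor}.{version.micro}")
print("Interpreter path:", sys.executable)

# find_spec returns None when a module cannot be found.
pip_available = importlib.util.find_spec("pip") is not None
print("pip available:", pip_available)
```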

Thank You
