Management Analytics Python Giovanni Della Lunga giovanni.dellalunga@gmail.com MASTER BIG DATA, ANALYTICS AND TECHNOLOGIES FOR MANAGEMENT
Python ABC: A Concise Introduction
Major Versions and Implementations of Python
»“Python” (CPython, the reference implementation) is written in C
 - Version 2.7 came out in mid-2010
 - Version 3.1.2 came out in early 2010
»“Jython” is written in Java for the JVM
»“IronPython” is written in C# for the .NET environment
2.x Vs 3.x
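The comparison on this slide did not survive extraction. The following Python 3 snippet is a small sketch of a few well-known differences from 2.x that affect the code in this deck: print is a function, `/` is true division, there is a single unbounded int type, and `dict.has_key()` is gone. The variable names are illustrative only.

```python
# Python 3 behaviour at a few points where 2.x and 3.x differ.
print("Hello", "there")       # print is a function in 3.x (a statement in 2.x)

quotient = 7 / 2              # true division in 3.x: 3.5 (2.x gives 3 for two ints)
floor_quotient = 7 // 2       # floor division in both versions: 3
big = 132323 ** 2             # one unbounded int type, no trailing L
tel = {"cs": 7000}
has_cs = "cs" in tel          # dict.has_key() was removed in 3.x; use `in`

print(quotient, floor_quotient, big, has_cs)
```

In 3.x, `raw_input()` is also renamed `input()`.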
Development Environments
What IDE to use? http://stackoverflow.com/questions/81584
1. PyDev with Eclipse
2. Komodo
3. Emacs
4. Vim
5. TextMate
6. Gedit
7. IDLE
8. PIDA (Linux) (Vim-based)
9. Notepad++ (Windows)
10. Bluefish (Linux)
PyDev with Eclipse
Setup
»Anaconda
 - http://docs.continuum.io/conda/index.html
 - Installs: a Python environment (including IPython) and several packages
»Eclipse (prerequisite: Java)
 - http://www.eclipse.org/downloads/
»PyDev (requires Java 7)
 - Install: http://pydev.org/manual_101_install.html
 - Set up the interpreter
Python Interactive Shell
% python
Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29)
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
You can type things directly into a running Python session:
>>> 2+3*4
14
>>> name = "Andrew"
>>> name
'Andrew'
>>> print "Hello", name
Hello Andrew
>>>
The Python Interpreter
• Python is an interpreted language
• The interpreter provides an interactive environment to play with the language
• Results of expressions are printed on the screen
>>> 3 + 7
10
>>> 3 < 15
True
>>> 'print me'
'print me'
>>> print 'print me'
print me
>>>
The print Statement
>>> print 'hello'
hello
>>> print 'hello', 'there'
hello there
• Elements separated by commas print with a space between them
• A comma at the end of the statement (print 'hello',) suppresses the newline character
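In Python 3, `print` is a function, and the behaviour described above is controlled by keyword arguments. This sketch writes into an in-memory `io.StringIO` buffer (an illustrative choice) so the exact output can be inspected; `end=" "` is a close, though not byte-identical, counterpart of the 2.x trailing comma.

```python
import io

buf = io.StringIO()                          # collect output in memory so it can be inspected
print("hello", "there", file=buf)            # arguments are separated by one space by default
print("hello", "there", sep="-", file=buf)   # sep= changes the separator
print("hello", end=" ", file=buf)            # end=" " instead of a newline
print("world", file=buf)

output = buf.getvalue()
print(output, end="")
```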
No Braces, only Spaces! »Python uses indentation instead of braces to delimit blocks of code (the bodies of if, for, def, class, ...) »All lines of a block must be indented by the same amount (inner blocks are indented further) »This forces the programmer to indent properly, since the indentation is part of the program's syntax!
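A small Python 3 sketch of indentation defining scope: the function `classify` is a hypothetical example whose body, `if` branch and `else` branch are delimited only by how far each line is indented.

```python
def classify(n):
    # Everything indented under "def" belongs to the function body.
    if n < 0:
        # Indented one more level: this line belongs to the if branch.
        label = "negative"
    else:
        label = "non-negative"
    # Back at the function-body level: runs after the if/else.
    return label

print(classify(-5), classify(3))
```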
Variables »Are not declared, just assigned »The variable is created the first time you assign it a value »Are references to objects »Type information is with the object, not the reference »Everything in Python is an object
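To make "variables are references" concrete, this Python 3 sketch binds two names to one list object and then rebinds a name to an object of a different type. The names are illustrative.

```python
a = [1, 2, 3]      # "a" is created on first assignment and refers to a list object
b = a              # "b" now refers to the same list object, not a copy
b.append(4)        # mutating through b is visible through a
print(a)           # [1, 2, 3, 4]
print(a is b)      # True: one object, two names

x = 42
print(type(x).__name__)   # int: the type lives with the object
x = "forty-two"           # the same name can later refer to an object of another type
print(type(x).__name__)   # str
```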
Numbers: Integers
»Integer: the equivalent of a C long
»Long integer: an unbounded integer value, shown with a trailing L in Python 2 (Python 3 merges both into a single unbounded int type)
>>> 132224
132224
>>> 132323 ** 2
17509376329L
>>>
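A short Python 3 sketch of the single arbitrary-precision `int` type: the slide's square no longer needs a separate long type, and much larger values work the same way.

```python
# Python 3 has one int type with arbitrary precision (no separate long, no L suffix).
square = 132323 ** 2
huge = 2 ** 100
print(square)               # 17509376329
print(huge)                 # 1267650600228229401496703205376
print(type(huge).__name__)  # int
```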
Numbers: Floating Point
»int(x) converts x to an integer
»float(x) converts x to a floating-point number
»The Python 2.6 interpreter shows a lot of digits (Python 2.7 and 3.1+ show the shortest form, 1.23232)
>>> 1.23232
1.2323200000000001
>>> print 1.23232
1.23232
>>> 1.3E7
13000000.0
>>> int(2.0)
2
>>> float(2)
2.0
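A Python 3 sketch of the conversions above, plus two details worth knowing: `int()` truncates toward zero, and floating-point results should be compared with a tolerance. The tolerance `1e-9` is an arbitrary illustrative choice.

```python
truncated = int(2.7)       # 2: int() truncates toward zero
negative = int(-2.7)       # -2
parsed = int("42")         # 42: int() also parses strings
as_float = float(2)        # 2.0
shown = repr(1.23232)      # shortest repr in Python 2.7 and 3.1+
close_enough = abs((0.1 + 0.2) - 0.3) < 1e-9   # compare floats with a tolerance
print(truncated, negative, parsed, as_float, shown, 0.1 + 0.2 == 0.3, close_enough)
```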
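Numbers are immutable
>>> x = 4.5
>>> y = x
>>> y += 3
>>> x
4.5
>>> y
7.5
[Figure: x and y first refer to the same 4.5 object; after y += 3, y refers to a new object 7.5 while x still refers to 4.5]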
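The diagram can be checked with the `is` operator, which tests object identity. In this Python 3 sketch, `y += 3` cannot change the immutable float in place, so it creates a new object and rebinds `y`.

```python
x = 4.5
y = x              # y and x refer to the same float object
print(y is x)      # True
y += 3             # floats are immutable: a new object 7.5 is created and bound to y
print(x, y)        # 4.5 7.5
print(y is x)      # False: x still refers to the original 4.5
```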
Basic operations
»Assignment:
 - size = 40
 - a = b = c = 3
»Numbers:
 - integer, float
 - complex numbers: 1j+3, abs(z)
»Strings:
 - 'hello world', 'it\'s hot'
 - "bye world"
 - continuation via \ or use """ long text """
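A Python 3 sketch of the literal forms listed above: chained assignment, the two ways to put a quote inside a string, a triple-quoted string, backslash line continuation, and `abs()` on a complex number. The variable names are illustrative.

```python
a = b = c = 3                # chained assignment: all three names refer to 3
s1 = 'it\'s hot'             # escape the quote...
s2 = "it's hot"              # ...or use the other kind of quotes
long_text = """first line
second line"""               # triple quotes span several lines
joined = "a very long " \
         "string"            # backslash continues the line; adjacent literals are concatenated
z = 3 + 4j
print(abs(z))                # 5.0: the modulus of the complex number
```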
Date
import datetime
now = datetime.datetime.now()
print
print "Current date and time using str method of datetime object:"
print str(now)
print
print "Current date and time using instance attributes:"
print "Current year: %d" % now.year
print "Current month: %d" % now.month
print "Current day: %d" % now.day
print "Current hour: %d" % now.hour
print "Current minute: %d" % now.minute
print "Current second: %d" % now.second
print "Current microsecond: %d" % now.microsecond
print
print "Current date and time using strftime:"
print now.strftime("%Y-%m-%d %H:%M")
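A Python 3 version of the same ideas. To make the output reproducible, this sketch uses a fixed, arbitrary timestamp instead of `datetime.datetime.now()`; swap it back in to get the current time.

```python
import datetime

# An arbitrary fixed timestamp so the output is reproducible.
moment = datetime.datetime(2024, 1, 15, 9, 30, 5)
print(str(moment))                          # 2024-01-15 09:30:05
print("Current year: %d" % moment.year)     # old-style % formatting still works
print(f"Current month: {moment.month}")     # f-strings (Python 3.6+)
print(moment.strftime("%Y-%m-%d %H:%M"))    # 2024-01-15 09:30
```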
Lists
»Lists can be heterogeneous
 - a = ['spam', 'eggs', 100, 1234, 2*2]
»Lists can be indexed and sliced:
 - a[0] → 'spam'
 - a[:2] → ['spam', 'eggs']
»Lists can be manipulated:
 - a[2] = a[2] + 23
 - a[0:2] = [1, 12]
 - a[0:0] = [] (replaces an empty slice with nothing, so a is unchanged)
 - len(a) → 5
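A Python 3 walk-through of indexing, slicing and slice assignment on the slide's list. Unlike the slide, this sketch assigns `[]` to a non-empty slice, which is how items are removed.

```python
a = ['spam', 'eggs', 100, 1234, 2 * 2]
print(a[0])        # spam
print(a[:2])       # ['spam', 'eggs']
print(a[-1])       # 4: negative indices count from the end
a[2] = a[2] + 23   # lists are mutable: replace one item (100 -> 123)
a[0:2] = [1, 12]   # replace a slice: [1, 12, 123, 1234, 4]
a[1:3] = []        # assigning an empty list to a slice removes those items
print(a, len(a))   # [1, 1234, 4] 3
```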
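List methods
»append(x): add x to the end
»extend(L): append all items in list L (like Tcl lappend)
»insert(i, x): insert x before position i
»remove(x): remove the first item equal to x
»pop([i]): remove and return the item at position i (the last item by default)
 - pop() lets a list act as a stack (LIFO); pop(0) lets it act as a queue (FIFO)
»index(x): return the index of the first item equal to x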
List methods
»count(x): how many times x appears in the list
»sort(): sort items in place
»reverse(): reverse the list in place
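A Python 3 sketch exercising each list method above, including `pop()` as a stack and `pop(0)` as a queue. The list contents are arbitrary.

```python
stack = [1, 2]
stack.append(3)            # push onto the end
top = stack.pop()          # pop() removes the last item: LIFO (stack)

queue = ["a", "b"]
queue.append("c")
first = queue.pop(0)       # pop(0) removes the first item: FIFO (queue)

nums = [3, 1, 2, 3]
nums.extend([5, 0])        # append every item of another list
nums.insert(0, 9)          # insert 9 before index 0
nums.remove(3)             # remove the first occurrence of 3
print(nums.count(3), nums.index(2))   # 1 2
nums.sort()                # sorts in place and returns None
nums.reverse()             # reverses in place
print(nums)                # [9, 5, 3, 2, 1, 0]
```

For long queues, `collections.deque` is usually a better fit, because `pop(0)` on a list has to shift every remaining item while `deque.popleft()` does not.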
String Literals
»Strings are immutable
»There is no char type like in C++ or Java
»+ is overloaded to do concatenation
>>> x = 'hello'
>>> x = x + ' there'
>>> x
'hello there'
Strings share many features with lists
>>> smiles = "C(=N)(N)N.C(=O)(O)O"
>>> smiles[0]
'C'
>>> smiles[1]
'('
>>> smiles[-1]
'O'
>>> smiles[1:5]
'(=N)'
>>> smiles[10:-4]
'C(=O)'
Use “slice” notation to get a substring
String operations
»Concatenate with + or by placing literals next to each other:
 - word = 'Help' + x
 - word = 'Help' 'a'
»Subscripting of strings:
 - 'Hello'[2] → 'l'
 - slice: 'Hello'[1:3] → 'el'
 - word[-1] → last character
 - len(word) → 5
 - immutable: you cannot assign to a subscript
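A Python 3 sketch of the operations above. It also shows what "immutable" means in practice: assigning to an index raises `TypeError`, so you build a new string instead.

```python
word = 'Help' 'A'          # adjacent literals are concatenated: 'HelpA'
print(word[2])             # l
print(word[1:3])           # el
print(word[-1])            # A (last character)
print(len(word))           # 5
try:
    word[0] = 'x'          # strings are immutable...
except TypeError as err:
    print("TypeError:", err)
fixed = 'x' + word[1:]     # ...so build a new string instead
print(fixed)               # xelpA
```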
String Methods: find, split
>>> smiles = "C(=N)(N)N.C(=O)(O)O"
>>> smiles.find("(O)")    # find returns the start of a substring
15
>>> smiles.find(".")
9
>>> smiles.find(".", 10)  # start looking at position 10; -1 means no match
-1
>>> smiles.split(".")     # split the string into parts, using "." as the delimiter
['C(=N)(N)N', 'C(=O)(O)O']
>>>
String operators: in, not in
if "Br" in "Brother":
    print "contains brother"
email_address = "clin"
if "@" not in email_address:
    email_address += "@brandeis.edu"
String Methods: strip, rstrip, lstrip
»strip(), rstrip() and lstrip() remove whitespace (or selected characters) from both ends, the right end, or the left end
>>> line = " # This is a comment line \n"
>>> line.strip()
'# This is a comment line'
>>> line.rstrip()
' # This is a comment line'
>>> line.rstrip("\n")
' # This is a comment line '
>>>
More String methods
»email.startswith("c"), email.endswith("u") → True/False
>>> "%s@brandeis.edu" % "clin"
'clin@brandeis.edu'
>>> names = ["Ben", "Chen", "Yaqin"]
>>> ", ".join(names)
'Ben, Chen, Yaqin'
>>> "chen".upper()
'CHEN'
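The string methods and operators from the last few slides, collected into one Python 3 sketch using the same sample data.

```python
smiles = "C(=N)(N)N.C(=O)(O)O"
print(smiles.find("(O)"))        # 15
print(smiles.find(".", 10))      # -1: no "." at or after position 10
print(smiles.split("."))         # ['C(=N)(N)N', 'C(=O)(O)O']

line = " # This is a comment line \n"
print(repr(line.strip()))        # '# This is a comment line'

email = "clin"
if "@" not in email:
    email += "@brandeis.edu"
print(email.startswith("c"), email.endswith("u"))   # True True
print(", ".join(["Ben", "Chen", "Yaqin"]))           # Ben, Chen, Yaqin
print("chen".upper())                                # CHEN
```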
Control flow: if
x = int(raw_input("Please enter #:"))
if x < 0:
    x = 0
    print 'Negative changed to zero'
elif x == 0:
    print 'Zero'
elif x == 1:
    print 'Single'
else:
    print 'More'
» no case statement: use an if/elif chain (Python 3.10 later added a match statement)
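The same branching in Python 3, wrapped in a hypothetical `describe()` function so it can be tested without interactive input.

```python
def describe(x):
    # Same if/elif/else chain as the slide, returning a label instead of printing.
    if x < 0:
        return "Negative changed to zero"
    elif x == 0:
        return "Zero"
    elif x == 1:
        return "Single"
    else:
        return "More"

for value in (-3, 0, 1, 7):
    print(value, describe(value))
# Interactive version in Python 3 (raw_input was renamed input):
# x = int(input("Please enter #: "))
```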
Control flow: for
a = ['cat', 'window', 'defenestrate']
for x in a:
    print x, len(x)
»no C-style arithmetic-progression loop, but:
 - range(10) → [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
 - for i in range(len(a)):
       print i, a[i]
»do not modify the sequence being iterated over (loop over a copy, a[:], instead)
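A Python 3 sketch of the loop forms above. Here `range()` is lazy, so `list()` is used to display it. `enumerate()` is the usual replacement for `range(len(a))`. The last loop modifies the list safely by iterating over a copy.

```python
a = ['cat', 'window', 'defenestrate']
for x in a:
    print(x, len(x))

print(list(range(10)))        # [0, 1, ..., 9]
print(list(range(2, 10, 3)))  # start, stop, step: [2, 5, 8]

for i, x in enumerate(a):     # index and item together
    print(i, x)

# To modify the list while looping, iterate over a copy (a[:]).
for x in a[:]:
    if len(x) > 6:
        a.insert(0, x)
print(a)   # ['defenestrate', 'cat', 'window', 'defenestrate']
```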
Loop Control Statements
break: jumps out of the closest enclosing loop
continue: jumps to the top of the closest enclosing loop (the next iteration)
pass: does nothing; an empty statement used as a placeholder
Loops: break, continue, else
»break and continue work as in C
»a loop's else clause runs when the loop is exhausted, not when it is left via break
for n in range(2, 10):
    for x in range(2, n):
        if n % x == 0:
            print n, 'equals', x, '*', n/x
            break
    else:
        # loop fell through without finding a factor
        print n, 'is prime'
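A Python 3 sketch that uses all three loop-control statements. The function names `primes_below` and `odd_numbers` and the class `Placeholder` are hypothetical. The prime search is the slide's `for ... else` pattern, returning a list instead of printing.

```python
def primes_below(limit):
    primes = []
    for n in range(2, limit):
        for x in range(2, n):
            if n % x == 0:
                break            # found a factor: leave the inner loop early
        else:
            primes.append(n)     # runs only if the inner loop did not break
    return primes

def odd_numbers(values):
    result = []
    for v in values:
        if v % 2 == 0:
            continue             # skip even values, go to the next iteration
        result.append(v)
    return result

class Placeholder:
    pass                         # empty body: pass is required syntactically

print(primes_below(10))          # [2, 3, 5, 7]
print(odd_numbers(range(6)))     # [1, 3, 5]
```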
Loop example
import fnmatch
import os

images = ['*.jpg', '*.jpeg', '*.png', '*.tif', '*.tiff']
matches = []
for root, dirnames, filenames in os.walk('C:\\'):
    for pattern in images:
        for filename in fnmatch.filter(filenames, pattern):
            print filename
            matches.append(os.path.join(root, filename))
Simple Matching: fnmatch() compares a single file name against a pattern and returns a Boolean indicating whether or not they match.
Filtering: to test a sequence of file names, use filter(). It returns a list of the names that match the pattern argument.
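To try `fnmatch` without walking a real disk, this Python 3 sketch uses a hypothetical in-memory list of file names. Names are kept lowercase because matching is case-sensitive on POSIX systems but not on Windows.

```python
import fnmatch

# Hypothetical file names, used instead of the output of os.walk().
filenames = ["photo.jpg", "scan.tiff", "notes.txt", "logo.png"]
patterns = ["*.jpg", "*.jpeg", "*.png", "*.tif", "*.tiff"]

print(fnmatch.fnmatch("logo.png", "*.png"))   # True: one name, one pattern
matches = [name
           for pattern in patterns
           for name in fnmatch.filter(filenames, pattern)]
print(matches)   # ['photo.jpg', 'logo.png', 'scan.tiff']
```

Note that `*.tif` does not match `scan.tiff`: the pattern has to match the whole name, which is why the slide lists both extensions.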
Loop example
# Python program to check if the input number is prime or not
num = 407

# take input from the user
# num = int(input("Enter a number: "))

# prime numbers are greater than 1
if num > 1:
    # check for factors
    for i in range(2, num):
        if (num % i) == 0:
            print(num, "is not a prime number")
            print(i, "times", num//i, "is", num)
            break
    else:
        print(num, "is a prime number")

# if input number is less than
# or equal to 1, it is not prime
else:
    print(num, "is not a prime number")
To understand this example, you should know the following Python topics:
• Python if...else statement
• Python for loop
• Python break and continue
A prime number is a positive integer greater than 1 that has no factors other than 1 and itself. 2, 3, 5, 7, etc. are prime numbers because they have no other factors. But 6 is not prime (it is composite), since 2 x 3 = 6.
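The loop above tries every candidate up to `num - 1`. A common refinement, shown here as a Python 3 sketch with a hypothetical `is_prime()` helper, stops at the integer square root. If `num` has a factor larger than √num, it must also have a matching factor smaller than √num. `math.isqrt` requires Python 3.8 or later.

```python
import math

def is_prime(num):
    if num < 2:
        return False
    # A composite number always has a factor no larger than its square root.
    for i in range(2, math.isqrt(num) + 1):
        if num % i == 0:
            return False
    return True

print(is_prime(407))   # False: 11 * 37 = 407
print([n for n in range(20) if is_prime(n)])
```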
Python Structures Storing Structured Information
List comprehensions
»Create lists without map(), filter(), lambda
»= an expression followed by a for clause, then zero or more for or if clauses
>>> vec = [2, 4, 6]
>>> [3*x for x in vec]
[6, 12, 18]
>>> [{x: x**2} for x in vec]
[{2: 4}, {4: 16}, {6: 36}]
List comprehensions
»cross products:
>>> vec1 = [2, 4, 6]
>>> vec2 = [4, 3, -9]
>>> [x*y for x in vec1 for y in vec2]
[8, 6, -18, 16, 12, -36, 24, 18, -54]
>>> [x+y for x in vec1 for y in vec2]
[6, 5, -7, 8, 7, -5, 10, 9, -3]
>>> [vec1[i]*vec2[i] for i in range(len(vec1))]
[8, 12, -54]
List comprehensions
»can also use if:
>>> [3*x for x in vec if x > 3]
[12, 18]
>>> [3*x for x in vec if x < 2]
[]
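The comprehension forms from these slides, in Python 3. Two extras are included: `zip()` as a more direct way to pair items by position than `range(len(...))`, and a dict comprehension.

```python
vec = [2, 4, 6]
vec2 = [4, 3, -9]
tripled = [3 * x for x in vec]                  # [6, 12, 18]
big = [3 * x for x in vec if x > 3]             # [12, 18]
sums = [x + y for x in vec for y in vec2]       # every pair, like nested loops
pairwise = [x * y for x, y in zip(vec, vec2)]   # position by position: [8, 12, -54]
squares = {x: x ** 2 for x in vec}              # dict comprehension: {2: 4, 4: 16, 6: 36}
print(tripled, big, sums, pairwise, squares)
```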
Tuples: sort of an immutable list
>>> yellow = (255, 255, 0)  # r, g, b
>>> one = (1,)
>>> yellow[0]
255
>>> yellow[1:]
(255, 0)
>>> yellow[0] = 0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
Very common in string interpolation:
>>> "%s lives in %s at latitude %.1f" % ("Andrew", "Sweden", 57.7056)
'Andrew lives in Sweden at latitude 57.7'
Tuples and sequences
»lists, strings, tuples: examples of sequence types
»tuple = values separated by commas
>>> t = 123, 543, 'bar'
>>> t[0]
123
>>> t
(123, 543, 'bar')
Tuples
»Tuples may be nested
>>> u = t, (1, 2)
>>> u
((123, 543, 'bar'), (1, 2))
»like strings, immutable: you can't assign to individual items
Tuples
»Empty tuples: ()
>>> empty = ()
>>> len(empty)
0
»one item: trailing comma
>>> singleton = 'foo',
Tuples
»sequence unpacking: distribute elements across variables
>>> t = 123, 543, 'bar'
>>> x, y, z = t
>>> x
123
»packing always creates a tuple
»unpacking works for any sequence
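The tuple rules above in one Python 3 sketch. It includes the classic swap idiom, which packs the right-hand side into a tuple and then unpacks it into the names on the left.

```python
yellow = (255, 255, 0)
one = (1,)                 # the trailing comma makes a 1-tuple
not_a_tuple = (1)          # just the int 1 in parentheses
t = 123, 543, 'bar'        # packing: parentheses are optional
x, y, z = t                # unpacking into three names
a, b = 1, 2
a, b = b, a                # swap via packing and unpacking
print(type(one).__name__, type(not_a_tuple).__name__)   # tuple int
print("%s lives in %s at latitude %.1f" % ("Andrew", "Sweden", 57.7056))
```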
Dictionaries
»associative arrays
»indexed by keys
»keys can be any immutable type: e.g., strings, numbers, tuples (of immutable items)
»but not lists (mutable!)
»use 'key: value' notation
>>> tel = {'hgs': 7042, 'lennox': 7018}
>>> tel['cs'] = 7000
>>> tel
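Dictionaries
»no particular order (since Python 3.7, dictionaries preserve insertion order)
»delete elements with del
>>> del tel['hgs']
»keys() method: unsorted list of keys (a view object in Python 3)
>>> tel.keys()
['cs', 'lennox']
»use has_key() to check for existence (in Python 3, use 'foo' in tel)
>>> tel.has_key('foo')
False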
Dictionaries
prices = {'apple': 0.40, 'banana': 0.50}
my_purchase = {
    'apple': 1,
    'banana': 6}
grocery_bill = sum(prices[fruit] * my_purchase[fruit]
                   for fruit in my_purchase)
print 'I owe the grocer $%.2f' % grocery_bill
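The dictionary operations from these slides in Python 3: `in` replaces `has_key()`, `get()` supplies a default instead of raising `KeyError`, and the grocery-bill example works unchanged apart from `print`.

```python
tel = {'hgs': 7042, 'lennox': 7018}
tel['cs'] = 7000                    # add a new key
del tel['hgs']                      # delete an existing key (KeyError if missing)
print('foo' in tel)                 # False: replaces has_key() in Python 3
print(tel.get('foo', 'unknown'))    # a default instead of KeyError
print(sorted(tel))                  # ['cs', 'lennox']

prices = {'apple': 0.40, 'banana': 0.50}
my_purchase = {'apple': 1, 'banana': 6}
grocery_bill = sum(prices[fruit] * my_purchase[fruit] for fruit in my_purchase)
print('I owe the grocer $%.2f' % grocery_bill)   # I owe the grocer $3.40
```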
Defining functions
def fib(n):
    """Print a Fibonacci series up to n."""
    a, b = 0, 1
    while b < n:
        print b,
        a, b = b, a+b

>>> fib(2000)
1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597
» The first line of the body is the docstring
» Variable lookup checks the local scope first, then the global scope
» A global statement is needed to assign to a global variable
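A Python 3 sketch of the points above. This `fib` returns a list instead of printing, which makes the result easier to reuse. The hypothetical `bump()` shows why `global` is needed to assign to a module-level name.

```python
def fib(n):
    """Return the Fibonacci numbers below n as a list."""
    result = []
    a, b = 0, 1
    while b < n:
        result.append(b)
        a, b = b, a + b
    return result

counter = 0

def bump():
    global counter      # needed because we assign to the module-level name
    counter += 1

print(fib(2000))        # [1, 1, 2, 3, 5, ..., 987, 1597]
print(fib.__doc__)      # the docstring is available at run time
bump()
bump()
print(counter)          # 2
```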
Modules Coding Your Ideas
Importing and Modules
» Use classes & functions defined in another file
» A Python module is a file whose name is the module name plus the .py extension
» Like Java import, C++ include
» Three formats of the command:
import somefile
from somefile import *
from somefile import className
» The difference? What gets imported from the file, and what name refers to it after importing
import …
import somefile
» Everything in somefile.py gets imported.
» To refer to something in the file, prefix its name with "somefile.":
somefile.className.method("abc")
somefile.myFunction(34)
from … import *
from somefile import *
» Everything in somefile.py gets imported
» To refer to anything in the module, just use its name: everything in the module is now in the current namespace.
» Take care! This form can easily overwrite the definition of an existing function or variable!
className.method("abc")
myFunction(34)
from … import …
from somefile import className
» Only the item className in somefile.py gets imported.
» After importing className, you can use it without a module prefix: it is brought into the current namespace.
» Take care! This overwrites any existing definition of that name in the current namespace!
className.method("abc")   ← imported
myFunction(34)            ← not imported
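The slides use a hypothetical `somefile`. This Python 3 sketch shows the same import forms with the standard `math` module so that it runs as-is. It also shows `as`, which renames a name on import.

```python
import math                  # whole module: refer to names as math.<name>
from math import sqrt        # one name, brought into the current namespace
from math import pi as PI    # "as" binds the imported object to a different name

print(math.floor(2.7))       # 2
print(sqrt(16))              # 4.0
print(round(PI, 2))          # 3.14
```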
Module search path
»current directory
»list of directories specified in the PYTHONPATH environment variable
»an installation-dependent default if PYTHONPATH is not defined, e.g., .:/usr/local/lib/python
»the resulting list is available as sys.path
>>> import sys
>>> sys.path
['', 'C:\\PROGRA~1\\Python2.2', 'C:\\Program Files\\Python2.2\\DLLs', 'C:\\Program Files\\Python2.2\\lib', 'C:\\Program Files\\Python2.2\\lib\\lib-tk', 'C:\\Program Files\\Python2.2', 'C:\\Program Files\\Python2.2\\lib\\site-packages']
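`sys.path` is an ordinary list, so you can inspect it and extend it at run time. In this Python 3 sketch, the directory `/path/to/my/modules` is a hypothetical placeholder.

```python
import sys

# sys.path is a plain list of directory strings, searched in order.
for directory in sys.path[:3]:
    print(repr(directory))

# Add a directory at run time (hypothetical path, for illustration only).
extra = "/path/to/my/modules"
if extra not in sys.path:
    sys.path.append(extra)
```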
Target Web Scraping
Target
»Web Scraping
 - The need to extract data from the web, and its importance, are becoming increasingly clear.
 - There are several ways to extract information from the web.
 - Using an API is probably the best way to extract data from a website: if you can get what you need through an API, it is almost always preferable to web scraping.
Target
»Web Scraping
 - Sadly, not all websites provide an API.
 - Some don't because they do not want readers to extract large amounts of information in a structured way, while others lack the technical knowledge to provide one. What do you do in these cases?
 - Well, we need to scrape the website to fetch the information.
Target
»OK, but what is Web Scraping?
 - Web scraping is a software technique for extracting information from websites. It mostly focuses on transforming unstructured data on the web (HTML) into structured data (a database or spreadsheet).
 - You can perform web scraping in various ways…
 - We'll use Python because it is easy to use and has a rich ecosystem. It has a library called 'BeautifulSoup' that helps with this task.
 - In this lesson, I'll show you the easiest way to learn web scraping using Python.
Downloading Files from the Web with the requests Module » The requests module lets you easily download files from the Web without having to worry about complicated issues such as network errors, connection problems, and data compression. » The requests module doesn’t come with Python, so you’ll have to install it first. From the command line, run pip install requests. » Next, do a simple test to make sure the requests module installed itself correctly. Enter the following into the interactive shell: >>> import requests » If no error messages show up, then the requests module has been successfully installed.
Downloading a Web Page with the requests.get() Function »The requests.get() function takes a string of a URL to download. »By calling type() on requests.get()’s return value, you can see that it returns a Response object, which contains the response that the web server gave for your request…
Downloading a Web Page with the requests.get() Function
Sample output (the beginning of the downloaded text):
The Complete Works of William Shakespeare The Tragedy of Romeo and Juliet The Library of the Future Complete Works of William Shakespeare Library of the Future is a TradeMark (TM) of World Library Inc. <<THIS ELECTRONIC VERSION OF THE COMPLETE WORKS OF WILLIAM SHAKESPEARE IS COPYRIGHT 1990-1993 BY WORLD LIBRARY, INC., AND IS PROVIDED BY PROJECT GUTENBERG ETEXT OF CARNEGIE MELLON UNIVERSITY WITH PERMISSION. ELECTRONIC AND MACHINE READABLE COPIES MAY BE DISTRIBUTED SO LONG AS SUCH COPIES (1) ARE FOR YOUR OR OTHERS PERSONAL USE ONLY, AND (2) ARE NOT DISTRIBUTED OR USED COMMERCIALLY. PROHIBITED COMMERCIAL DISTRIBUTION INCLUDES BY ANY SERVICE THAT CHARGES FOR DOWNLOAD TIME OR FOR MEMBERSHIP.>> 1595 THE TRAGEDY OF ROMEO AND JULIET by William Shakespeare
Libraries required for web scraping
» urllib2: a Python 2 module for fetching URLs (its Python 3 counterpart is urllib.request).
 - It defines functions and classes to help with URL actions (basic and digest authentication, redirections, cookies, etc.).
 - For more details, refer to its documentation page.
» BeautifulSoup: an excellent tool for pulling information out of a web page.
 - You can use it to extract tables, lists and paragraphs, and you can also apply filters to extract information from web pages.
 - See the installation instructions on its documentation page.
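A minimal Python 3 sketch of fetching a URL with `urllib.request`, the successor of urllib2. So that it runs without network access, it opens a `data:` URL, which carries its content inline. For a real page you would pass an `http://` or `https://` URL instead, and you should expect network errors to be possible.

```python
from urllib.request import urlopen

# A "data:" URL embeds its (percent-encoded) content, so no network is needed.
url = "data:text/plain;charset=utf-8,Hello%2C%20web%21"
with urlopen(url) as response:
    body = response.read().decode("utf-8")
print(body)   # Hello, web!
```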
Basics – Get familiar with HTML (Tags)
<!DOCTYPE html>: HTML documents must start with a type declaration
The HTML document is contained between <html> and </html>
The visible part of the HTML document is between <body> and </body>
HTML headings are defined with the <h1> to <h6> tags
HTML paragraphs are defined with the <p> tag
Basics – Get familiar with HTML (Tags)
»Other useful HTML tags are:
 - HTML links are defined with the <a> tag: <a href="http://www.test.com">This is a link for test.com</a>
 - HTML tables are defined with <table>; rows are defined with <tr>, and each row is divided into data cells with <td>
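To see how a scraper reads `<a>` tags, this Python 3 sketch collects links using the standard library's `html.parser.HTMLParser`. This is a stand-in for BeautifulSoup, not its API. The `LinkCollector` class is hypothetical.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None      # href of the <a> tag currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

page = '<p>See <a href="http://www.test.com">This is a link for test.com</a>.</p>'
collector = LinkCollector()
collector.feed(page)
print(collector.links)   # [('http://www.test.com', 'This is a link for test.com')]
```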
Scraping a Web Page using BeautifulSoup
»Here, I am scraping data from a Wikipedia page.
 - Our final goal is to extract the list of capitals of Indian states and union territories, along with some basic details such as year of establishment and former capital, from this Wikipedia page.
 - Let's learn by doing this project step by step…
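The core step of this project is turning `<tr>`/`<td>` rows into records. This Python 3 sketch does that with the standard library's `HTMLParser` instead of BeautifulSoup, on a small inline table rather than the live Wikipedia page. The `TableParser` class is hypothetical.

```python
from html.parser import HTMLParser

class TableParser(HTMLParser):
    """Turn every <tr> into a list of the text of its <th>/<td> cells."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None       # cells of the row being read
        self._cell = None      # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

sample_html = """
<table>
  <tr><th>State</th><th>Capital</th></tr>
  <tr><td>Karnataka</td><td>Bengaluru</td></tr>
  <tr><td>Kerala</td><td>Thiruvananthapuram</td></tr>
</table>
"""
parser = TableParser()
parser.feed(sample_html)
header, *records = parser.rows
for record in records:
    print(dict(zip(header, record)))
```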
Defining Classes Object Oriented Programming in Python
It's all objects…
»Everything in Python is really an object.
 - We've seen hints of this already…
"hello".upper()
list3.append('a')
dict2.keys()
 - These look like Java or C++ method calls.
 - New object classes can easily be defined in addition to these built-in data types.
»In fact, programming in Python is typically done in an object-oriented fashion.
Defining a Class »A class is a special data type which defines how to build a certain kind of object. »The class also stores some data items that are shared by all the instances of this class »Instances are objects that are created which follow the definition given inside of the class »Python doesn’t use separate class interface definitions as in some languages »You just define the class and then use it
Methods in Classes »Define a method in a class by including function definitions within the scope of the class block »Every method definition must have a special first argument, self, which gets bound to the calling instance »Most classes have a special method called __init__ »We'll talk about both later…
A simple class def: student
class student:
    """A class representing a student."""
    def __init__(self, n, a):
        self.full_name = n
        self.age = a
    def get_age(self):
        return self.age
Instantiating Objects » There is no “new” keyword as in Java. » Just use the class name with ( ) notation and assign the result to a variable » __init__ serves as a constructor for the class; it usually does some initialization work » The arguments passed to the class name are given to its __init__() method » So, the __init__ method for student is passed "Bob" and 21, and the new class instance is bound to b: b = student("Bob", 21)
Constructor: __init__ »An __init__ method can take any number of arguments. »Like other functions or methods, the arguments can be defined with default values, making them optional to the caller. »However, the first argument self in the definition of __init__ is special…
Self »The first argument of every method is a reference to the current instance of the class »By convention, we name this argument self »In __init__, self refers to the object currently being created; in other class methods, it refers to the instance whose method was called »Similar to the keyword this in Java or C++ »But Python uses self more often than Java uses this
Self
»Although you must specify self explicitly when defining the method, you don't include it when calling the method.
»Python passes it for you automatically.
Defining a method (this code goes inside a class definition):
def set_age(self, num):
    self.age = num
Calling a method:
>>> x.set_age(23)
Deleting instances: No Need to “free” »When you are done with an object, you don't have to delete or free it explicitly. »Python has automatic garbage collection. »Python automatically detects when an object is no longer referenced and frees its memory. »Generally works well, with few memory leaks »You rarely need a “destructor”: Python does offer a __del__ method, but exactly when it runs is not guaranteed
Definition of student
class student:
    """A class representing a student."""
    def __init__(self, n, a):
        self.full_name = n
        self.age = a
    def get_age(self):
        return self.age
Traditional Syntax for Access
>>> f = student("Bob Smith", 23)
>>> f.full_name      # Access attribute
'Bob Smith'
>>> f.get_age()      # Access a method
23
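The `student` class in Python 3, extended with the `set_age` method from the "Self" slide. This sketch uses the conventional capitalised name `Student`. The last line shows that `f.get_age()` is shorthand for calling the method on the class with the instance passed explicitly as `self`.

```python
class Student:
    """A class representing a student."""

    def __init__(self, n, a):
        self.full_name = n       # data attributes are created by assignment
        self.age = a

    def get_age(self):
        return self.age

    def set_age(self, num):
        self.age = num

f = Student("Bob Smith", 23)     # no "new": call the class like a function
print(f.full_name)               # Bob Smith
f.set_age(24)                    # self is passed automatically
print(f.get_age())               # 24
print(Student.get_age(f))        # 24: the explicit form of the same call
```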
Two Kinds of Attributes
» The non-method data stored by objects are called attributes
» Data attributes
 - Variable owned by a particular instance of a class
 - Each instance has its own value for it
 - These are the most common kind of attribute
» Class attributes
 - Owned by the class as a whole
 - All class instances share the same value for it
 - Called “static” variables in some languages
 - Good for (1) class-wide constants and (2) building a counter of how many instances of the class have been made
Data Attributes
»Data attributes are created and initialized by an __init__() method.
 - Simply assigning to a name creates the attribute
 - Inside the class, refer to data attributes using self, for example, self.full_name
class teacher:
    "A class representing teachers."
    def __init__(self, n):
        self.full_name = n
    def print_name(self):
        print self.full_name
Class Attributes
» Because all instances of a class share one copy of a class attribute, when it is changed through the class, the value changes for all instances
» Class attributes are defined within a class definition and outside of any method
» Since there is one of these attributes per class and not one per instance, they're accessed via a different notation:
 - Access class attributes using the self.__class__.name notation. This is just one way to do this, and the safest in general.
class sample:
    x = 23
    def increment(self):
        self.__class__.x += 1
>>> a = sample()
>>> a.increment()
>>> a.__class__.x
24
Data vs. Class Attributes
class counter:
    overall_total = 0          # class attribute
    def __init__(self):
        self.my_total = 0      # data attribute
    def increment(self):
        counter.overall_total = counter.overall_total + 1
        self.my_total = self.my_total + 1
>>> a = counter()
>>> b = counter()
>>> a.increment()
>>> b.increment()
>>> b.increment()
>>> a.my_total
1
>>> a.__class__.overall_total
3
>>> b.my_total
2
>>> b.__class__.overall_total
3
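The same counter in Python 3, followed by a pitfall that explains why the slides update class attributes through the class. Assigning to the name through an instance creates a new data attribute that hides the class attribute for that instance only.

```python
class Counter:
    overall_total = 0            # class attribute: one value shared by all instances

    def __init__(self):
        self.my_total = 0        # data attribute: one value per instance

    def increment(self):
        Counter.overall_total += 1
        self.my_total += 1

a, b = Counter(), Counter()
a.increment()
b.increment()
b.increment()
print(a.my_total, b.my_total, Counter.overall_total)            # 1 2 3

# Pitfall: this creates a data attribute on a, shadowing the class attribute.
a.overall_total = 100
print(a.overall_total, b.overall_total, Counter.overall_total)  # 100 3 3
```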
Subclasses
»A class can extend the definition of another class
 - Allows use (or extension) of methods and attributes already defined in the previous one.
 - New class: subclass. Original: parent, ancestor or superclass
»To define a subclass, put the name of the superclass in parentheses after the subclass's name on the first line of the definition.
class Cs_student(student):
 - Python has no 'extends' keyword like Java.
 - Multiple inheritance is supported.
Redefining Methods
»To redefine a method of the parent class, include a new definition using the same name in the subclass.
 - The old code won't get executed.
»To execute the parent class's method in addition to new code, explicitly call the parent's version of the method:
parentClass.methodName(self, a, b, c)
 - Calling an ancestor's method through the class name is the only case where you pass 'self' explicitly as an argument.
Definition of a class extending student
class student:
    "A class representing a student."
    def __init__(self, n, a):
        self.full_name = n
        self.age = a
    def get_age(self):
        return self.age

class Cs_student(student):
    "A class extending student."
    def __init__(self, n, a, s):
        student.__init__(self, n, a)   # Call __init__ for student
        self.section_num = s
    def get_age(self):                 # Redefines get_age method entirely
        print "Age: " + str(self.age)
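A Python 3 sketch of the same inheritance using `super()`, which is the usual way to call the parent's `__init__` in Python 3. The class names follow PEP 8 capitalisation. The overriding `get_age` returns its text instead of printing it, which is an illustrative change.

```python
class Student:
    """A class representing a student."""

    def __init__(self, n, a):
        self.full_name = n
        self.age = a

    def get_age(self):
        return self.age

class CsStudent(Student):
    """A class extending Student."""

    def __init__(self, n, a, s):
        super().__init__(n, a)     # same effect as Student.__init__(self, n, a)
        self.section_num = s

    def get_age(self):             # overrides the parent's method
        return "Age: " + str(self.age)

c = CsStudent("Ada", 20, 3)
print(c.full_name, c.section_num)  # Ada 3
print(c.get_age())                 # Age: 20
print(isinstance(c, Student))      # True
```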

Introduction to python programming 1

  • 1.
    Management Analytics Python Giovanni DellaLunga giovanni.dellalunga@gmail.com MASTER BIG DATA, ANALYTICS AND TECHNOLOGIES FOR MANAGEMENT
  • 2.
  • 3.
    4 Major Versionsof Python »“Python” is written in C/C++ - Version 2.7 came out in mid-2010 - Version 3.1.2 came out in early 2010 »“Jython” is written in Java for the JVM »“IronPython” is (was!) written in C# for the .Net environment
  • 4.
  • 5.
    Development Environments what IDEto use? http://stackoverflow.com/questions/81584 1. PyDev with Eclipse 2. Komodo 3. Emacs 4. Vim 5. TextMate 6. Gedit 7. Idle 8. PIDA (Linux)(VIM Based) 9. NotePad++ (Windows) 10.BlueFish (Linux)
  • 6.
  • 7.
    Setup »Anaconda  http://docs.continuum.io/conda/index.html  Installs: Python env (including IPython)  Several packages »Eclipse (pre-requisite: Java)  http://www.eclipse.org/downloads/ »PyDev (requires Java 7)  Install: http://pydev.org/manual_101_install.html  Setup Interpreter
  • 8.
    Python Interactive Shell %python Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29) [GCC 4.2.1 (Apple Inc. build 5646)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> You can type things directly into a running Python session >>> 2+3*4 14 >>> name = "Andrew" >>> name 'Andrew' >>> print "Hello", name Hello Andrew >>>
  • 9.
    The Python Interpreter •Python is an interpreted language • The interpreter provides an interactive environment to play with the language • Results of expressions are printed on the screen >>> 3 + 7 10 >>> 3 < 15 True >>> 'print me' 'print me' >>> print 'print me' print me >>>
  • 10.
    The print Statement >>>print 'hello' hello >>> print 'hello', 'there' hello there • Elements separated by commas print with a space between them • A comma at the end of the statement (print ‘hello’,) will not print a newline character
  • 11.
    No Braces, onlySpaces! »Python uses indentation instead of braces to determine the scope of expressions »All lines must be indented the same amount to be part of the scope (or indented more if part of an inner scope) »This forces the programmer to use proper indentation since the indenting is part of the program!
  • 12.
    Variables »Are not declared,just assigned »The variable is created the first time you assign it a value »Are references to objects »Type information is with the object, not the reference »Everything in Python is an object
  • 13.
  • 14.
  • 15.
    Numbers: Integers »Integer –the equivalent of a C long »Long Integer – an unbounded integer value. >>> 132224 132224 >>> 132323 ** 2 17509376329L >>>
  • 16.
    Numbers: Floating Point »int(x)converts x to an integer »float(x) converts x to a floating point »The interpreter shows a lot of digits >>> 1.23232 1.2323200000000001 >>> print 1.23232 1.23232 >>> 1.3E7 13000000.0 >>> int(2.0) 2 >>> float(2) 2.0
  • 17.
    Numbers are immutable >>>x = 4.5 >>> y = x >>> y += 3 >>> x 4.5 >>> y 7.5 x 4.5 y x 4.5 y 7.5
  • 18.
    Basic operations »Assignment:  size= 40  a = b = c = 3 »Numbers  integer, float  complex numbers: 1j+3, abs(z) »Strings  'hello world', 'it's hot'  "bye world"  continuation via or use """ long text """"
  • 19.
    Date » import datetime »now = datetime.datetime.now() » print » print "Current date and time using str method of datetime object:" » print str(now) » print » print "Current date and time using instance attributes:" » print "Current year: %d" % now.year » print "Current month: %d" % now.month » print "Current day: %d" % now.day » print "Current hour: %d" % now.hour » print "Current minute: %d" % now.minute » print "Current second: %d" % now.second » print "Current microsecond: %d" % now.microsecond » print » print "Current date and time using strftime:" » print now.strftime("%Y-%m-%d %H:%M")
  • 20.
    Lists »lists can beheterogeneous  a = ['spam', 'eggs', 100, 1234, 2*2] »Lists can be indexed and sliced:  a[0]  spam  a[:2]  ['spam', 'eggs'] »Lists can be manipulated  a[2] = a[2] + 23  a[0:2] = [1,12]  a[0:0] = []  len(a)  5
  • 21.
    List methods »append(x) »extend(L)  appendall items in list (like Tcl lappend) »insert(i,x) »remove(x) »pop([i]), pop()  create stack (FIFO), or queue (LIFO)  pop(0) »index(x)  return the index for value x
  • 22.
    List methods »count(x)  howmany times x appears in list »sort()  sort items in place »reverse()  reverse list
  • 23.
    String Literals »Strings areimmutable »There is no char type like in C++ or Java »+ is overloaded to do concatenation >>> x = 'hello' >>> x = x + ' there' >>> x 'hello there'
  • 24.
    Strings share manyfeatures with lists >>> smiles = "C(=N)(N)N.C(=O)(O)O" >>> smiles[0] 'C' >>> smiles[1] '(' >>> smiles[-1] 'O' >>> smiles[1:5] '(=N)' >>> smiles[10:-4] 'C(=O)' Use “slice” notation to get a substring
  • 25.
    String operations »concatenate with+ or neighbors  word = 'Help' + x  word = 'Help' 'a' »subscripting of strings  'Hello'[2]  'l'  slice: 'Hello'[1:2]  'el'  word[-1]  last character  len(word)  5  immutable: cannot assign to subscript
  • 26.
    String Methods: find,split smiles = "C(=N)(N)N.C(=O)(O)O" >>> smiles.find("(O)") 15 >>> smiles.find(".") 9 >>> smiles.find(".", 10) -1 >>> smiles.split(".") ['C(=N)(N)N', 'C(=O)(O)O'] >>> Use “find” to find the start of a substring. Start looking at position 10. Find returns -1 if it couldn’t find a match. Split the string into parts with “.” as the delimiter
  • 27.
    String operators: in,not in if "Br" in “Brother”: print "contains brother“ email_address = “clin” if "@" not in email_address: email_address += "@brandeis.edu“
  • 28.
    String Method: “strip”,“rstrip”, “lstrip” are ways to remove whitespace or selected characters >>> line = " # This is a comment line n" >>> line.strip() '# This is a comment line' >>> line.rstrip() ' # This is a comment line' >>> line.rstrip("n") ' # This is a comment line ' >>>
  • 29.
    More String methods email.startswith(“c")endswith(“u”) True/False >>> "%s@brandeis.edu" % "clin" 'clin@brandeis.edu' >>> names = [“Ben", “Chen", “Yaqin"] >>> ", ".join(names) ‘Ben, Chen, Yaqin‘ >>> “chen".upper() ‘CHEN'
  • 30.
    Control flow: if x= int(raw_input("Please enter #:")) if x < 0: x = 0 print 'Negative changed to zero' elif x == 0: print 'Zero' elif x == 1: print 'Single' else: print 'More' » no case statement
  • 31.
    Control flow: for a= ['cat', 'window', 'defenestrate'] for x in a: print x, len(x) »no arithmetic progression, but  range(10)  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]  for i in range(len(a)): print i, a[i] »do not modify the sequence being iterated over
  • 32.
    Loop Control Statements breakJumps out of the closest enclosing loop continue Jumps to the top of the closest enclosing loop pass Does nothing, empty statement placeholder
  • 33.
    Loops: break, continue,else »break and continue like C »else after loop exhaustion for n in range(2,10): for x in range(2,n): if n % x == 0: print n, 'equals', x, '*', n/x break else: # loop fell through without finding a factor print n, 'is prime'
  • 34.
    Loop example » importfnmatch » import os » images = ['*.jpg', '*.jpeg', '*.png', '*.tif', '*.tiff'] » matches = [] » for root, dirnames, filenames in os.walk('C:'): » for extensions in images: » for filename in fnmatch.filter(filenames, extensions): » print filename » matches.append(os.path.join(root, filename)) Simple Matching fnmatch() compares a single file name against a pattern and returns a Boolean indicating whether or not they match. Filtering To test a sequence of filenames, you can use filter(). It returns a list of the names that match the pattern argument.
  • 35.
    Loop example » #Python program to check if the input number is prime or not » num = 407 » # take input from the user » # num = int(input("Enter a number: ")) » # prime numbers are greater than 1 » if num > 1: » # check for factors » for i in range(2,num): » if (num % i) == 0: » print(num,"is not a prime number") » print(i,"times",num//i,"is",num) » break » else: » print(num,"is a prime number") » » # if input number is less than » # or equal to 1, it is not prime » else: » print(num,"is not a prime number") To understand this example, you should have the knowledge of following Python programming topics: • Python if...else Statement • Python for Loop • Python break and continue A positive integer greater than 1 which has no other factors except 1 and the number itself is called a prime number. 2, 3, 5, 7 etc. are prime numbers as they do not have any other factors. But 6 is not prime (it is composite) since, 2 x 3 = 6.
  • 36.
  • 37.
    List comprehensions
    » create lists without map(), filter() or lambda
    » syntax: an expression followed by a for clause, then zero or more for or if clauses
    >>> vec = [2, 4, 6]
    >>> [3*x for x in vec]
    [6, 12, 18]
    >>> [{x: x**2} for x in vec]
    [{2: 4}, {4: 16}, {6: 36}]
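To see what "without map(), filter(), lambda" means, this Python 3 sketch builds the same lists both ways. In Python 3, map() and filter() return iterators, so list() is needed to compare them.

```python
vec = [2, 4, 6]

# List comprehensions
tripled = [3 * x for x in vec]
tripled_big = [3 * x for x in vec if x > 3]

# The same lists built with map(), filter() and lambda
tripled_map = list(map(lambda x: 3 * x, vec))
tripled_big_map = list(map(lambda x: 3 * x, filter(lambda x: x > 3, vec)))

print(tripled, tripled_big)                                    # [6, 12, 18] [12, 18]
print(tripled == tripled_map, tripled_big == tripled_big_map)  # True True
```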
  • 38.
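    List comprehensions
    » several for clauses visit every combination of items, like nested loops:
    >>> vec1 = [2, 4, 6]
    >>> vec2 = [4, 3, -9]
    >>> [x*y for x in vec1 for y in vec2]
    [8, 6, -18, 16, 12, -36, 24, 18, -54]
    >>> [x+y for x in vec1 for y in vec2]
    [6, 5, -7, 8, 7, -5, 10, 9, -3]
    » to combine items position by position, index both lists:
    >>> [vec1[i]*vec2[i] for i in range(len(vec1))]
    [8, 12, -54]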
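Two standard-library helpers express the same two ideas directly: itertools.product yields every (x, y) pair, and zip pairs items by position so no index arithmetic is needed. A Python 3 sketch:

```python
from itertools import product

vec1 = [2, 4, 6]
vec2 = [4, 3, -9]

# Two for clauses visit every (x, y) pair, exactly like itertools.product
all_pairs = [x * y for x in vec1 for y in vec2]
same_pairs = [x * y for x, y in product(vec1, vec2)]

# Element-wise product: zip walks both lists in step
pairwise = [x * y for x, y in zip(vec1, vec2)]

print(all_pairs == same_pairs)  # True
print(pairwise)                 # [8, 12, -54]
```

zip stops at the shorter list, whereas the range(len(vec1)) version raises IndexError if vec2 is shorter.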
  • 39.
    List comprehensions
    » can also use if (with vec = [2, 4, 6] as before):
    >>> [3*x for x in vec if x > 3]
    [12, 18]
    >>> [3*x for x in vec if x < 2]
    []
  • 40.
    Tuples: sort of an immutable list
    >>> yellow = (255, 255, 0)  # r, g, b
    >>> one = (1,)
    >>> yellow[0]
    255
    >>> yellow[1:]
    (255, 0)
    >>> yellow[0] = 0
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'tuple' object does not support item assignment
    Very common in string interpolation:
    >>> "%s lives in %s at latitude %.1f" % ("Andrew", "Sweden", 57.7056)
    'Andrew lives in Sweden at latitude 57.7'
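Because % takes its values from a tuple, formatting a tuple as a single value needs care. This Python 3 sketch shows the wrapping trick and, for comparison, the f-string equivalent (Python 3.6+) of the slide's interpolation.

```python
name, country, latitude = "Andrew", "Sweden", 57.7056

# %-formatting takes its values from a tuple ...
old_style = "%s lives in %s at latitude %.1f" % (name, country, latitude)
# ... f-strings are the modern equivalent
new_style = f"{name} lives in {country} at latitude {latitude:.1f}"
print(old_style == new_style)     # True

# To format a tuple as ONE value, wrap it in a one-element tuple:
# "%s" % yellow would try to use 255, 255 and 0 as three separate values
# and fail with TypeError (not all arguments converted).
yellow = (255, 255, 0)
label = "rgb = %s" % (yellow,)
print(label)                      # rgb = (255, 255, 0)
```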
  • 41.
    Tuples and sequences
    » lists, strings and tuples are all examples of sequence types
    » a tuple is a number of values separated by commas
    >>> t = 123, 543, 'bar'
    >>> t[0]
    123
    >>> t
    (123, 543, 'bar')
  • 42.
    Tuples
    » tuples may be nested
    >>> u = t, (1, 2)
    >>> u
    ((123, 543, 'bar'), (1, 2))
    » like strings, tuples are immutable: you can't assign to individual items
  • 43.
    Tuples
    » empty tuples: ()
    >>> empty = ()
    >>> len(empty)
    0
    » a one-item tuple needs a trailing comma
    >>> singleton = 'foo',
    >>> singleton
    ('foo',)
    >>> len(singleton)
    1
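The trailing comma matters because parentheses on their own only group an expression. A quick Python 3 check:

```python
not_a_tuple = ('foo')     # parentheses alone only group: this is just the string 'foo'
singleton = ('foo',)      # the trailing comma is what makes a one-item tuple
also_singleton = 'foo',   # the parentheses are optional
empty = ()

print(type(not_a_tuple).__name__, type(singleton).__name__)  # str tuple
print(len(not_a_tuple), len(singleton), len(empty))          # 3 1 0
print(singleton == also_singleton)                           # True
```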
  • 44.
    Tuples
    » sequence unpacking distributes the elements across variables
    >>> t = 123, 543, 'bar'
    >>> x, y, z = t
    >>> x
    123
    » packing always creates a tuple
    » unpacking works for any sequence, as long as the number of variables matches the number of elements
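A few everyday uses of packing and unpacking, as a Python 3 sketch. The starred form (head, *rest) is Python 3 only.

```python
t = 123, 543, 'bar'          # packing: the commas build a tuple
x, y, z = t                  # unpacking: one name per element

a, b = 1, 2
a, b = b, a                  # swap two variables without a temporary

first, second = 'hi'         # any sequence can be unpacked, e.g. a string
head, *rest = [10, 20, 30]   # Python 3 only: * gathers the remaining items into a list

try:
    p, q = t                 # three values but only two names
except ValueError as err:
    unpack_error = str(err)  # "too many values to unpack ..."

print(x, a, b, first, head, rest)   # 123 2 1 h 10 [20, 30]
```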
  • 45.
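    Dictionaries
    » associative arrays, indexed by keys
    » keys can be any immutable type, e.g. strings, numbers, or tuples (as long as the tuples contain only immutable items)
    » but not lists (mutable!)
    » uses 'key: value' notation
    >>> tel = {'hgs': 7042, 'lennox': 7018}
    >>> tel['cs'] = 7000
    >>> tel
    {'cs': 7000, 'lennox': 7018, 'hgs': 7042}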
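The "immutable keys" rule can be checked directly: a tuple of numbers works as a key, while a list (or a tuple that contains a list) raises TypeError. A Python 3 sketch; grid is just an example name.

```python
grid = {}
grid[(0, 0)] = 'origin'              # a tuple of numbers is immutable: allowed as a key

errors = []
for bad_key in ([0, 1], (0, [1])):   # a list, and a tuple that contains a list
    try:
        grid[bad_key] = 'oops'
    except TypeError as err:
        errors.append(str(err))      # e.g. "unhashable type: 'list'"

print(grid)         # {(0, 0): 'origin'}
print(len(errors))  # 2
```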
  • 46.
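    Dictionaries
    » no particular order (in Python 2; since Python 3.7, dictionaries keep insertion order)
    » delete elements with del (deleting a key that is not present raises KeyError)
    >>> del tel['cs']
    » the keys() method returns an unsorted list of the keys
    >>> tel.keys()
    ['lennox', 'hgs']
    » use has_key() to check whether a key exists
    >>> tel.has_key('foo')
    False
    » has_key() was removed in Python 3; the in operator works in both versions
    >>> 'foo' in tel
    False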
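The same operations in Python 3, where keys() returns a view (wrap it in list() to see it) and dictionaries remember insertion order. A short sketch:

```python
tel = {'hgs': 7042, 'lennox': 7018}
tel['cs'] = 7000                   # add an entry
del tel['hgs']                     # remove an entry (KeyError if the key is missing)

print('cs' in tel, 'foo' in tel)   # True False  (replaces has_key())
print(tel.get('foo', 0))           # 0: get() returns a default instead of raising KeyError
print(list(tel.keys()))            # ['lennox', 'cs']: insertion order, Python 3.7+
print(sorted(tel))                 # ['cs', 'lennox']: sort explicitly when order matters
```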
  • 47.
    Dictionaries
    prices = {'apple': 0.40, 'banana': 0.50}
    my_purchase = {
        'apple': 1,
        'banana': 6}
    grocery_bill = sum(prices[fruit] * my_purchase[fruit]
                       for fruit in my_purchase)
    print 'I owe the grocer $%.2f' % grocery_bill
    # prints: I owe the grocer $3.40
  • 48.
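    Defining functions
    def fib(n):
        """Print a Fibonacci series up to n."""
        a, b = 0, 1
        while b < n:
            print b,
            a, b = b, a+b

    >>> fib(2000)
    1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597
    » a string literal on the first line of the body is the function's docstring
    » variables are looked up first in the local scope, then in the global scope (then among the built-ins)
    » a global statement is needed to assign to a global variable inside a function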
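A Python 3 sketch of the last three points: a variant of fib that returns a list instead of printing (fib_list is a hypothetical name), the docstring stored on the function object, and the global statement.

```python
def fib_list(n):
    """Return the Fibonacci numbers below n as a list instead of printing them."""
    result = []
    a, b = 0, 1
    while b < n:
        result.append(b)
        a, b = b, a + b
    return result


calls = 0  # a global (module-level) variable


def count_call():
    global calls   # without this line, "calls += 1" would raise UnboundLocalError
    calls += 1


count_call()
count_call()
print(fib_list(100))     # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
print(calls)             # 2
print(fib_list.__doc__)  # the docstring is stored on the function object
```

Reading a global needs no declaration; only assigning to it does, because any assignment inside a function makes the name local by default.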
  • 49.
  • 50.
    Importing and Modules
    » Use classes and functions defined in another file
    » A Python module is a file whose name is the module name plus the .py extension
    » Similar to import in Java or #include in C++
    » Three forms of the command:
    import somefile
    from somefile import *
    from somefile import className
    » The difference? What gets imported from the file, and which name refers to it after importing
  • 51.
    import …
    import somefile
    » Everything in somefile.py gets imported.
    » To refer to something in the file, put "somefile." in front of its name:
    somefile.className.method("abc")
    somefile.myFunction(34)
  • 52.
    from … import *
    from somefile import *
    » Everything in somefile.py gets imported.
    » To refer to anything in the module, just use its name: everything in the module is now in the current namespace.
    » Take care! This form can easily overwrite the definition of an existing function or variable!
    className.method("abc")
    myFunction(34)
  • 53.
    from … import …
    from somefile import className
    » Only the item className in somefile.py gets imported.
    » After importing className, you can use it without a module prefix: it is brought into the current namespace.
    » Take care! This overwrites the definition of the name if it is already defined in the current namespace!
    className.method("abc")    # imported
    myFunction(34)             # not imported
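The three forms can be tried with a standard-library module. This Python 3 sketch uses math in place of the slides' placeholder somefile, and shows the "Take care!" warning in action: a function of our own named pow is silently replaced by math.pow after a star import. Run it as a script, since star imports are only allowed at module level.

```python
import math                  # form 1: names stay behind the math. prefix
from math import sqrt        # form 3: only sqrt enters this namespace

print(math.sqrt(16), sqrt(16))   # 4.0 4.0


def pow(base, exp):          # our own function named pow ...
    return "my pow"


from math import *           # form 2: ... is silently replaced by math.pow here
print(pow(2, 3))             # 8.0, not "my pow"
```

This is why explicit imports (forms 1 and 3) are usually preferred in real code: it stays obvious where each name comes from.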
  • 54.
    Module search path
    » the directory containing the script (the current directory in an interactive session)
    » the list of directories in the PYTHONPATH environment variable
    » an installation-dependent default, e.g. .:/usr/local/lib/python
    » the resulting search path is stored in sys.path
    >>> import sys
    >>> sys.path
    ['', 'C:\\PROGRA~1\\Python2.2', 'C:\\Program Files\\Python2.2\\DLLs', 'C:\\Program Files\\Python2.2\\lib', 'C:\\Program Files\\Python2.2\\lib\\lib-tk', 'C:\\Program Files\\Python2.2', 'C:\\Program Files\\Python2.2\\lib\\site-packages']
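Since sys.path is an ordinary list, a program can add a directory to it at run time. This Python 3 sketch writes a throwaway module into a temporary directory and imports it; somefile and myFunction are the slides' placeholder names, reused here for illustration only.

```python
import importlib
import os
import sys
import tempfile

# Write a small module file into a fresh temporary directory.
tmp_dir = tempfile.mkdtemp()
with open(os.path.join(tmp_dir, 'somefile.py'), 'w') as f:
    f.write('def myFunction(x):\n    return x * 2\n')

sys.path.insert(0, tmp_dir)     # search this directory before the others
importlib.invalidate_caches()   # recommended after creating module files at run time

import somefile                 # now found in tmp_dir

print(somefile.myFunction(34))  # 68
print(somefile.__file__)        # shows which file was actually imported
```

For permanent additions, setting PYTHONPATH (or installing the package) is usually cleaner than editing sys.path in code.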
  • 55.
  • 56.
    Target
    » Web Scraping
      - The need to extract data from the web, and the importance of doing so, are becoming increasingly clear.
      - There are several ways to extract information from the web.
      - Using an API is probably the best way to extract data from a website: if you can get what you need through an API, it is almost always preferable to web scraping.
  • 57.
    Target
    » Web Scraping
      - Sadly, not all websites provide an API.
      - Some don't because they do not want readers to extract large amounts of information in a structured way, while others lack the technical knowledge to provide one. What do you do in these cases?
      - Well, we need to scrape the website to fetch the information.
  • 58.
    Target
    » OK, but what is Web Scraping?
      - Web scraping is a software technique for extracting information from websites. It mostly focuses on transforming unstructured data on the web (HTML) into structured data (a database or spreadsheet).
      - You can perform web scraping in various ways…
      - We'll use Python because it is easy to use and has a rich ecosystem. It has a library called BeautifulSoup that helps with this task.
      - In this lesson, I'll show you a simple way to learn web scraping with Python.
  • 59.
    Downloading Files from the Web with the requests Module
    » The requests module lets you easily download files from the Web without having to worry about complicated issues such as network errors, connection problems, and data compression.
    » The requests module doesn't come with Python, so you'll have to install it first. From the command line, run pip install requests.
    » Next, do a simple test to make sure the requests module installed itself correctly. Enter the following into the interactive shell:
    >>> import requests
    » If no error messages show up, then the requests module has been successfully installed.
  • 60.
    Downloading a Web Page with the requests.get() Function
    » The requests.get() function takes a string of a URL to download.
    » By calling type() on requests.get()'s return value, you can see that it returns a Response object, which contains the response that the web server gave for your request…
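To try requests.get() without depending on any real website, this Python 3 sketch serves a tiny page from your own machine with the standard library's http.server and downloads it. The handler name Hello and the page content are made up; requests must be installed (pip install requests).

```python
import http.server
import threading

import requests  # third-party: pip install requests


class Hello(http.server.BaseHTTPRequestHandler):
    """Answer every GET request with a tiny HTML page."""

    def do_GET(self):
        body = b"<html><body><h1>Hello</h1></body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):   # keep the console quiet
        pass


# Port 0 asks the operating system for any free port.
server = http.server.HTTPServer(("127.0.0.1", 0), Hello)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_port

res = requests.get(url)
print(type(res))                               # <class 'requests.models.Response'>
res.raise_for_status()                         # raises requests.HTTPError for 4xx/5xx answers
print(res.status_code == requests.codes.ok)    # True
print(res.text[:30])                           # the start of the downloaded page

server.shutdown()
server.server_close()
```

With a real site you would simply pass its URL to requests.get(); calling raise_for_status() right away makes a failed download stop the program instead of silently continuing with an error page.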
  • 61.
    Downloading a Web Page with the requests.get() Function
    The Complete Works of William Shakespeare The Tragedy of Romeo and Juliet The Library of the Future Complete Works of William Shakespeare Library of the Future is a TradeMark (TM) of World Library Inc. <<THIS ELECTRONIC VERSION OF THE COMPLETE WORKS OF WILLIAM SHAKESPEARE IS COPYRIGHT 1990-1993 BY WORLD LIBRARY, INC., AND IS PROVIDED BY PROJECT GUTENBERG ETEXT OF CARNEGIE MELLON UNIVERSITY WITH PERMISSION. ELECTRONIC AND MACHINE READABLE COPIES MAY BE DISTRIBUTED SO LONG AS SUCH COPIES (1) ARE FOR YOUR OR OTHERS PERSONAL USE ONLY, AND (2) ARE NOT DISTRIBUTED OR USED COMMERCIALLY. PROHIBITED COMMERCIAL DISTRIBUTION INCLUDES BY ANY SERVICE THAT CHARGES FOR DOWNLOAD TIME OR FOR MEMBERSHIP.>> 1595 THE TRAGEDY OF ROMEO AND JULIET by William Shakespeare
  • 62.
    Libraries required for web scraping
    » urllib2: a Python 2 module for fetching URLs (in Python 3 its functionality lives in urllib.request and urllib.error).
      - It defines functions and classes that help with URL actions (basic and digest authentication, redirections, cookies, etc.).
      - For more detail, refer to its documentation page.
    » BeautifulSoup: a library for pulling information out of a web page.
      - You can use it to extract tables, lists and paragraphs, and you can also apply filters to extract specific information from web pages.
      - See the installation instructions on its documentation page.
  • 63.
    Basics – Get familiar with HTML (Tags)
    » <!DOCTYPE html>: HTML documents must start with a type declaration
    » the HTML document is contained between <html> and </html>
    » the visible part of the HTML document is between <body> and </body>
    » HTML headings are defined with the <h1> to <h6> tags
    » HTML paragraphs are defined with the <p> tag
  • 64.
    Basics – Get familiar with HTML (Tags)
  • 65.
    Basics – Get familiar with HTML (Tags)
    » Other useful HTML tags are:
      - HTML links are defined with the <a> tag: <a href="http://www.test.com">This is a link for test.com</a>
      - HTML tables are defined with <table>; each row is a <tr>, and each row is divided into data cells with <td> (header cells use <th>)
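To see these tags from a program's point of view, this Python 3 sketch feeds a small hand-written page to the standard library's html.parser (used here instead of BeautifulSoup so it runs with no extra installs) and records the tags, link targets and table cells. The page content and the class name TagCollector are made up for illustration.

```python
from html.parser import HTMLParser

PAGE = """<!DOCTYPE html>
<html>
<body>
<h1>My heading</h1>
<p>My paragraph with <a href="http://www.test.com">a link for test.com</a>.</p>
<table>
  <tr><td>row 1, cell 1</td><td>row 1, cell 2</td></tr>
  <tr><td>row 2, cell 1</td><td>row 2, cell 2</td></tr>
</table>
</body>
</html>"""


class TagCollector(HTMLParser):
    """Record every opening tag, every link target and the text of every <td>."""

    def __init__(self):
        super().__init__()
        self.tags, self.links, self.cells = [], [], []
        self._in_td = False

    def handle_starttag(self, tag, attrs):     # attrs is a list of (name, value) pairs
        self.tags.append(tag)
        if tag == "a":
            self.links.append(dict(attrs).get("href"))
        elif tag == "td":
            self._in_td = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        if self._in_td:
            self.cells[-1] += data


parser = TagCollector()
parser.feed(PAGE)
parser.close()
print(parser.tags[:5])  # ['html', 'body', 'h1', 'p', 'a']
print(parser.links)     # ['http://www.test.com']
print(parser.cells)     # ['row 1, cell 1', 'row 1, cell 2', 'row 2, cell 1', 'row 2, cell 2']
```

Tracking state by hand like this quickly gets tedious, which is exactly the work BeautifulSoup takes over.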
  • 66.
    Scraping a web page using BeautifulSoup
    » Here, I am scraping data from a Wikipedia page.
      - Our final goal is to extract the list of state and union territory capitals in India, along with some basic details such as establishment and former capital, from this Wikipedia page.
      - Let's learn by doing this project step by step…
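The core of such a project is turning an HTML table into rows of text. This Python 3 sketch runs BeautifulSoup on a small inline table; the state and city names are placeholders, not real data, and the class "wikitable" is an assumption about how the target table is marked up (check the real page in your browser first). It needs pip install beautifulsoup4.

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# A made-up stand-in for the page; the names below are placeholders.
html = """
<table class="wikitable">
  <tr><th>State</th><th>Capital</th></tr>
  <tr><td>State A</td><td>City A</td></tr>
  <tr><td>State B</td><td>City B</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")        # Python's built-in parser, no lxml needed
table = soup.find("table", class_="wikitable")   # first <table> with that CSS class

header = [th.get_text(strip=True) for th in table.find_all("th")]
rows = []
for tr in table.find_all("tr")[1:]:              # skip the header row
    rows.append([td.get_text(strip=True) for td in tr.find_all("td")])

print(header)  # ['State', 'Capital']
print(rows)    # [['State A', 'City A'], ['State B', 'City B']]
```

On the real page you would pass the downloaded text instead, e.g. BeautifulSoup(requests.get(url).text, "html.parser"), and adapt the column handling to the table's actual layout.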
  • 67.
    Defining Classes
    Object Oriented Programming in Python
  • 68.
    It’s all objects…
    » Everything in Python is really an object.
      - We've seen hints of this already: "hello".upper()  list3.append('a')  dict2.keys()
      - These look like Java or C++ method calls.
      - New object classes can easily be defined in addition to these built-in data types.
    » In fact, Python programs are often written in an object-oriented style.
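A quick Python 3 check that built-in values, and even functions, are objects with methods and a class:

```python
greeting = "hello"
numbers = [1, 2]

print(greeting.upper())        # HELLO: a method of the str object
numbers.append(3)              # a method that changes the list object in place
print((42).bit_length())       # 6: even an int literal has methods

# Values of every built-in type, and functions too, are instances of object.
for value in [greeting, 42, numbers, {'a': 1}, len]:
    print(type(value).__name__, isinstance(value, object))
```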
  • 69.
    Defining a Class
    » A class is a special data type that defines how to build a certain kind of object.
    » The class also stores some data items that are shared by all the instances of this class.
    » Instances are objects that are created following the definition given inside the class.
    » Python doesn't use separate class interface definitions as some languages do.
    » You just define the class and then use it.
  • 70.
    Methods in Classes
    » Define a method in a class by including function definitions within the scope of the class block.
    » Every method definition has a special first argument, self, which gets bound to the calling instance.
    » Most classes have a special method called __init__.
    » We'll talk about both later…
  • 71.
    A simple class def: student
    class student:
        """A class representing a student."""
        def __init__(self, n, a):
            self.full_name = n
            self.age = a
        def get_age(self):
            return self.age
  • 72.
    Instantiating Objects
    » There is no "new" keyword as in Java.
    » Just use the class name with ( ) notation and assign the result to a variable.
    » __init__ serves as a constructor for the class; it usually does some initialization work.
    » The arguments passed to the class name are given to its __init__() method.
    » So, the __init__ method for student is passed "Bob" and 21, and the new class instance is bound to b:
    b = student("Bob", 21)
  • 73.
    Constructor: __init__
    » An __init__ method can take any number of arguments.
    » Like other functions or methods, the arguments can be defined with default values, making them optional to the caller.
    » However, the first argument self in the definition of __init__ is special…
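A Python 3 sketch of an __init__ with an optional argument. The class is capitalised as Student, following the usual Python naming convention, and the default age=None is an illustrative choice, not part of the slides' class.

```python
class Student:
    """A class representing a student (sketch with an optional age)."""

    def __init__(self, full_name, age=None):   # age has a default, so it is optional
        self.full_name = full_name
        self.age = age

    def get_age(self):
        return self.age


b = Student("Bob", 21)        # no "new": calling the class runs __init__
c = Student("Carla")          # the caller may omit age; it defaults to None
print(b.get_age(), c.get_age())   # 21 None
```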
  • 74.
    Self
    » The first argument of every method is a reference to the current instance of the class.
    » By convention, we name this argument self.
    » In __init__, self refers to the object currently being created; in other class methods, it refers to the instance whose method was called.
    » Similar to the keyword this in Java or C++.
    » But Python uses self more often than Java uses this, because every access to an instance attribute inside a method must be written explicitly as self.name.
  • 75.
    Self
    » Although you must specify self explicitly when defining the method, you don't include it when calling the method.
    » Python passes it for you automatically.
    Defining a method (this code goes inside a class definition):
    def set_age(self, num):
        self.age = num
    Calling a method:
    >>> x.set_age(23)
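The automatic passing of self can be made visible: calling a method on an instance is equivalent to calling the function on the class and passing the instance yourself. A Python 3 sketch with an illustrative Student class:

```python
class Student:
    def __init__(self, full_name, age):
        self.full_name = full_name
        self.age = age

    def set_age(self, num):
        self.age = num


x = Student("Bob", 21)
x.set_age(23)              # Python passes x as self automatically ...
print(x.age)               # 23
Student.set_age(x, 24)     # ... which is equivalent to this explicit call
print(x.age)               # 24
```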
  • 76.
    Deleting instances: No Need to "free"
    » When you are done with an object, you don't have to delete or free it explicitly.
    » Python has automatic garbage collection.
    » Python automatically detects when there are no more references to an object and frees its memory.
    » Generally works well, with few memory leaks.
    » There is no destructor in the C++ sense: a class may define a __del__ method, but Python does not guarantee when (or, at interpreter exit, whether) it runs, so don't rely on it for cleanup.
  • 77.
    Definition of student
    class student:
        """A class representing a student."""
        def __init__(self, n, a):
            self.full_name = n
            self.age = a
        def get_age(self):
            return self.age
  • 78.
    Traditional Syntax for Access
    >>> f = student("Bob Smith", 23)
    >>> f.full_name    # Access an attribute
    'Bob Smith'
    >>> f.get_age()    # Access a method
    23
  • 79.
    Two Kinds of Attributes
    » The non-method data stored by objects are called attributes.
    » Data attributes
      - Variables owned by a particular instance of a class
      - Each instance has its own value for it
      - These are the most common kind of attribute
    » Class attributes
      - Owned by the class as a whole
      - All class instances share the same value for it
      - Called "static" variables in some languages
      - Good for (1) class-wide constants and (2) keeping count of how many instances of the class have been made
  • 80.
    Data Attributes
    » Data attributes are created and initialized by an __init__() method.
      - Simply assigning to a name creates the attribute.
      - Inside the class, refer to data attributes using self, for example self.full_name
    class teacher:
        "A class representing teachers."
        def __init__(self, n):
            self.full_name = n
        def print_name(self):
            print self.full_name
  • 81.
    Class Attributes
    » Because all instances of a class share one copy of a class attribute, when any instance changes it through the class, the value is changed for all instances.
    » Class attributes are defined within a class definition and outside of any method.
    » Since there is one of these attributes per class and not one per instance, they're accessed via a different notation:
      - Access class attributes using the self.__class__.name notation. This is just one way to do it, and the safest in general: assigning self.name = … inside a method would instead create a separate data attribute on that one instance.
    class sample:
        x = 23
        def increment(self):
            self.__class__.x += 1

    >>> a = sample()
    >>> a.increment()
    >>> a.__class__.x
    24
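The difference between updating the class attribute and assigning through self can be seen side by side. A Python 3 sketch; the shadow method is a hypothetical addition to show the pitfall, and type(self) is an equivalent spelling of self.__class__.

```python
class Sample:
    x = 23                       # class attribute, shared by all instances

    def increment(self):
        type(self).x += 1        # updates the shared value (same as self.__class__.x)

    def shadow(self):
        self.x = 0               # creates a data attribute on this instance only


a, b = Sample(), Sample()
a.increment()
print(a.x, b.x, Sample.x)        # 24 24 24
b.shadow()
print(a.x, b.x, Sample.x)        # 24 0 24: b's own x now hides the class attribute
```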
  • 82.
    Data vs. Class Attributes
    class counter:
        overall_total = 0             # class attribute
        def __init__(self):
            self.my_total = 0         # data attribute
        def increment(self):
            counter.overall_total = counter.overall_total + 1
            self.my_total = self.my_total + 1

    >>> a = counter()
    >>> b = counter()
    >>> a.increment()
    >>> b.increment()
    >>> b.increment()
    >>> a.my_total
    1
    >>> a.__class__.overall_total
    3
    >>> b.my_total
    2
    >>> b.__class__.overall_total
    3
  • 83.
    Subclasses
    » A class can extend the definition of another class
      - Allows use (or extension) of methods and attributes already defined in the previous one.
      - New class: subclass. Original: parent, ancestor or superclass.
    » To define a subclass, put the name of the superclass in parentheses after the subclass's name on the first line of the definition:
    class Cs_student(student):
      - Python has no 'extends' keyword like Java.
      - Multiple inheritance is supported.
  • 84.
    Redefining Methods
    » To redefine a method of the parent class, include a new definition using the same name in the subclass.
      - The old code won't get executed.
    » To execute the method in the parent class in addition to new code for some method, explicitly call the parent's version of the method:
    parentClass.methodName(self, a, b, c)
      - This is about the only time you explicitly pass self as an argument: when calling a method of an ancestor through the class name.
      - In Python 3, super().methodName(a, b, c) is the usual alternative and does not name the parent class.
  • 85.
    Definition of a class extending student
    class student:
        "A class representing a student."
        def __init__(self, n, a):
            self.full_name = n
            self.age = a
        def get_age(self):
            return self.age

    class Cs_student(student):
        "A class extending student."
        def __init__(self, n, a, s):
            student.__init__(self, n, a)   # Call __init__ for student
            self.section_num = s
        def get_age(self):                 # Redefines get_age method entirely
            print "Age: " + str(self.age)
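The same pair of classes in Python 3, as a sketch: super().__init__ runs the parent's constructor without naming the parent, and the overriding get_age returns a string instead of printing it so the result can be reused. The names Student and CsStudent follow the usual capitalisation convention and are otherwise illustrative.

```python
class Student:
    """A class representing a student."""

    def __init__(self, full_name, age):
        self.full_name = full_name
        self.age = age

    def get_age(self):
        return self.age


class CsStudent(Student):
    """A class extending Student with a section number."""

    def __init__(self, full_name, age, section_num):
        super().__init__(full_name, age)   # run Student.__init__ first
        self.section_num = section_num

    def get_age(self):                     # overrides Student.get_age entirely
        return "Age: " + str(self.age)


s = CsStudent("Ada", 20, 3)
print(s.get_age())                # Age: 20
print(Student.get_age(s))         # 20: explicitly call the parent's version
print(isinstance(s, Student))     # True: a CsStudent is also a Student
```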