ISI Summer Internship Program 2010 Python Tutorial Liang Huang lhuang@isi.edu based on a course I taught at Penn (but updated to Python 2.6) www.cis.upenn.edu/~lhuang3/cse399-python Monday, June 14, 2010
“Hello, World” • C#include <stdio.h> int main(int argc, char ** argv) { ! printf(“Hello, World!n”); } • Java class Hello public { ! public static void main(String argv[]) ! { ! ! System.out.println(“Hello, World!”); ! } } • now in “Hello, World!” print Python 2 Monday, June 14, 2010
Reversing an Array static int[] reverse_array(int a[]) { ! int [] temp = new int[ a.length ]; ! for (int i = 0; i < len; i++) ! { ! ! temp [i] = a [a.length - i - 1]; ! } ! return temp; } Java def rev(a): def ...(...): Python ! if a == []: no need to specify ! ! return [] ... argument and return types! ! else: python will figure it out. ! ! return rev(a[1:]) + [a[0]] (dynamically typed) or even simpler: a without a[0] singleton list a.reverse() built-in list-processing function 3 Monday, June 14, 2010
Quick-sort public void sort(int low, int high) int partition(int low, int high) { { ! if (low >= high) return; ! int pivot = a[low]; ! int p = partition(low, high); ! int i = low - 1; ! sort(low, p); ! int j = high + 1; ! sort(p + 1, high); ! while (i < j) } ! { void swap(int i, int j) ! ! i++; while (a[i] < pivot) i++; {! ! ! j--; while (a[j] > pivot) j--; ! int temp = a[i]; ! ! if (i < j) swap(i, j); ! a[i] = a[j]; ! } ! a[j] = temp; ! return j; } } Java def sort(a): ! if a == []: {x | x ∈ a, x < pivot} Python ! ! return [] ! else: ! ! pivot = a[0]!! ! ! left = [x for x in a if x < pivot ] ! ! right = [x for x in a[1:] if x >= pivot] ! ! return sort(left) + [pivot] + sort(right) smaller semantic-gap! 4 Monday, June 14, 2010
Python is... • a scripting language (strong in text-processing) • interpreted, like Perl, but much more elegant • a very high-level language (closer to human semantics) • almost like pseudo-code! • procedural (like C, Pascal, Basic, and many more) • but also object-oriented (like C++ and Java) • and even functional! (like ML/OCaml, LISP/Scheme, Haskell, etc.) • from today, you should use Python for everything • not just for scripting, but for serious coding! 5 Monday, June 14, 2010
Basic Python Syntax Monday, June 14, 2010
Numbers and Strings • like Java, Python has built-in (atomic) types • numbers (int, float), bool, string, list, etc. • numeric operators: + - * / ** % >>> a = 5 >>> >>> s = “hey” c = 1.5 >>> b = 3 >>> 5/2 >>> s + “ guys” >>> type (5) 2 'hey guys' <type 'int'> >>> 5/2. >>> len(s) >>> a += 4 2.5 3 >>> a >>> 5 ** 2 >>> s[0] 9 25 'h' no i++ or ++i >>> s[-1] >>> from __future__ import division 'y' >>> 5/2 2.5 recommended! 7 Monday, June 14, 2010
Assignments and Comparisons >>> a = b = 0 >>> a = b = 0 >>> a >>> a == b 0 True >>> b >>> type (3 == 5) 0 <type 'bool'> >>> "my" == 'my' >>> a, b = 3, 5 True >>> a + b 8 >>> (1, 2) == (1, 2) >>> (a, b) = (3, 5) True >>> a + b >>> 8 >>> 1, 2 == 1, 2 >>> a, b = b, a ??? (swap) (1, False, 2) 8 Monday, June 14, 2010
for loops and range() • for always iterates through a list or sequence >>> sum = 0 >>> for i in range(10): ... sum += i ... Java 1.5 >>> print sum foreach (String word : words) ! System.out.println(word) 45 >>> for word in ["welcome", "to", "python"]: ... print word, ... welcome to python >>> range(5), range(4,6), range(1,7,2) ([0, 1, 2, 3, 4], [4, 5], [1, 3, 5]) 9 Monday, June 14, 2010
while loops • very similar to while in Java and C • but be careful • in behaves differently in for and while • break statement, same as in Java/C >>> a, b = 0, 1 >>> while b <= 5: ... print b ... a, b = b, a+b ... 1 simultaneous 1 assignment 2 3 fibonacci series 5 10 Monday, June 14, 2010
Conditionals >>> if x < 10 and x >= 0: >>> if 4 > 5: ... print x, "is a digit" ... print "foo" ... ... else: >>> False and False or True ... print "bar" True ... >>> not True bar False >>> print “foo” if 4 > 5 else “bar” ... conditional expr since Python 2.5 >>> bar C/Java printf( (4>5)? ”foo” : “bar”); 11 Monday, June 14, 2010
if ... elif ... else >>> a = "foo" >>> if a in ["blue", "yellow", "red"]: ... print a + " is a color" ... else: ...! ! if a in ["US", "China"]: ... ! ! print a + " is a country" ... ! ! else: ...! ! ! ! print "I don't know what”, a, “is!" ... I don't know what foo is! switch (a) { ! case “blue”: >>> if a in ...: ! case “yellow”: ! case “red”: ... print ... C/Java ! ! print ...; break; ... elif a in ...: ! case “US”: ! case “China”: ... print ... ! ! print ...; break; ... else: ! else: ... print ... ! } ! print ...; 12 Monday, June 14, 2010 ! !
break, continue and else • break and continue borrowed from C/Java • special else in loops • when loop terminated normally (i.e., not by break) • very handy in testing a set of properties || func(n) >>> for n in range(2, 10): for (n=2; n<10; n++) { ... for x in range(2, n): ! good = true; ! for (x=2; x<n; x++) ... if n % x == 0: ! ! if (n % x == 0) { ... break ! ! ! good = false; ... else: C/Java ! ! ! break; ... print n, ! ! } if (x==n) ... ! if (good) ! ! printf(“%d “, n); } prime numbers 13 Monday, June 14, 2010
Defining a Function def • no type declarations needed! wow! • Python will figure it out at run-time • you get a run-time error for type violation • well, Python does not have a compile-error at all >>> def fact(n): ... if n == 0: ... return 1 ... else: ... return n * fact(n-1) ... >>> fact(4) 24 14 Monday, June 14, 2010
Fibonacci Revisited >>> a, b = 0, 1 >>> while b <= 5: ... print b ... a, b = b, a+b ... 1 1 def fib(n): 2 ! if n <= 1: 3 ! ! return n 5 ! else: ! ! return fib (n-1) + fib (n-2) conceptually cleaner, but much slower! >>> fib(5) 5 >>> fib(6) 8 15 Monday, June 14, 2010
Default Values >>> def add(a, L=[]): ... return L + [a] ... >>> add(1) [1] >>> add(1,1) error! >>> add(add(1)) [[1]] lists are heterogenous! >>> add(add(1), add(1)) ??? [1, [1]] 16 Monday, June 14, 2010
Approaches to Typing ✓ strongly typed: types are strictly enforced. no implicit type conversion - weakly typed: not strictly enforced - statically typed: type-checking done at compile-time ✓ dynamically typed: types are inferred at runtime weak strong static C, C++ Java, Pascal dynamic Perl,VB Python, OCaml, Scheme 17 Monday, June 14, 2010
Lists heterogeneous variable-sized array a = [1,'python', [2,'4']] Monday, June 14, 2010
Basic List Operations • length, subscript, and slicing >>> a[0:3:2] [1, [2, '4']] >>> a = [1,'python', [2,'4']] >>> len(a) >>> a[:-1] 3 [1, 'python'] >>> a[2][1] '4' >>> a[0:3:] >>> a[3] [1, 'python', [2, '4']] IndexError! >>> a[-2] >>> a[0::2] 'python' [1, [2, '4']] >>> a[1:2] ['python'] >>> a[::] [1, 'python', [2, '4']] >>> a[:] [1, 'python', [2, '4']] 19 Monday, June 14, 2010
+, extend, +=, append • extend (+=) and append mutates the list! >>> a = [1,'python', [2,'4']] >>> a + [2] [1, 'python', [2, '4'], 2] >>> a.extend([2, 3]) >>> a [1, 'python', [2, '4'], 2, 3] same as a += [2, 3] >>> a.append('5') >>> a [1, 'python', [2, '4'], 2, 3, '5'] >>> a[2].append('xtra') >>> a [1, 'python', [2, '4', 'xtra'], 2, 3, '5'] 20 Monday, June 14, 2010
Comparison and Reference • as in Java, comparing built-in types is by value • by contrast, comparing objects is by reference >>> c = b [:] >>> [1, '2'] == [1, '2'] >>> c True [1, 5] >>> a = b = [1, '2'] >>> c == b slicing gets >>> a == b True a shallow copy True >>> c is b >>> a is b False True >>> b [1] = 5 what about a += [1]? >>> a [1, 5] >>> b[:0] = [2] insertion >>> b >>> a = 4 [2, 1, 3, 4] >>> b >>> b[1:3]=[] [1, 5] >>> b >>> a is b [2, 4] deletion >>> False 21 Monday, June 14, 2010
List Comprehension >>> a = [1, 5, 2, 3, 4 , 6] >>> [x*2 for x in a] [2, 10, 4, 6, 8, 12] >>> [x for x in a if 4th smallest element ... len( [y for y in a if y < x] ) == 3 ] [4] >>> a = range(2,10) >>> [x*x for x in a if ... [y for y in a if y < x and (x % y == 0)] == [] ] ??? [4, 9, 25, 49] square of prime numbers 22 Monday, June 14, 2010
List Comprehensions >>> vec = [2, 4, 6] >>> [[x,x**2] for x in vec] [[2, 4], [4, 16], [6, 36]] >>> [x, x**2 for x in vec] SyntaxError: invalid syntax >>> [(x, x**2) for x in vec] [(2, 4), (4, 16), (6, 36)] >>> vec1 = [2, 4, 6] >>> vec2 = [4, 3, -9] >>> [x*y for x in vec1 for y in vec2] [8, 6, -18, 16, 12, -36, 24, 18, -54] (cross product) >>> [x+y for x in vec1 for y in vec2] [6, 5, -7, 8, 7, -5, 10, 9, -3] >>> [vec1[i]*vec2[i] for i in range(len(vec1))] [8, 12, -54] (dot product) 23 Monday, June 14, 2010
Strings sequence of characters Monday, June 14, 2010
Basic String Operations • join, split, strip • upper(), lower() >>> s = " this is a python course. n" >>> words = s.split() >>> words ['this', 'is', 'a', 'python', 'course.'] >>> s.strip() 'this is a python course.' >>> " ".join(words) 'this is a python course.' >>> "; ".join(words).split("; ") ['this', 'is', 'a', 'python', 'course.'] >>> s.upper() ' THIS IS A PYTHON COURSE. n' http://docs.python.org/lib/string-methods.html 25 Monday, June 14, 2010
Basic Search/Replace in String >>> "this is a course".find("is") 2 >>> "this is a course".find("is a") 5 >>> "this is a course".find("is at") -1 >>> "this is a course".replace("is", "was") 'thwas was a course' >>> "this is a course".replace(" is", " was") 'this was a course' >>> "this is a course".replace("was", "were") 'this is a course' these operations are much faster than regexps! 26 Monday, June 14, 2010
String Formatting >>> print “%.2f%%” % 97.2363 97.24 >>> s = '%s has %03d quote types.' % ("Python", 2) >>> print s Python has 002 quote types. 27 Monday, June 14, 2010
Tuples immutable lists Monday, June 14, 2010
Tuples and Equality • caveat: singleton tuple a += (1,2) # new copy • ==, is, is not >>> (1, 'a') >>> 1, 2 == 1, 2 (1, 'a') (1, False, 2) >>> (1) >>> (1, 2) == (1, 2) 1 True >>> [1] >>> (1, 2) is (1, 2) [1] False >>> (1,) >>> "ab" is "ab" (1,) True >>> [1,] >>> [1] is [1] [1] False >>> (5) + (6) >>> 1 is 1 11 True >>> (5,)+ (6,) >>> True is True (5, 6) True 29 Monday, June 14, 2010
Comparison • between the same type: “lexicographical” • between different types: arbitrary • cmp(): three-way <, >, == • C: strcmp(s, t), Java: a.compareTo(b) >>> (1, 'ab') < (1, 'ac') >>> [1] < [1, 2] < [1, 3] True True >>> (1, ) < (1, 'ac') >>> [1] == [1,] == [1.0] True True >>> [1] < [1, 'ac'] >>> cmp ( (1, ), (1, 2) ) True -1 >>> 1 < True >>> cmp ( (1, ), (1, ) ) False 0 >>> True < 1 >>> cmp ( (1, 2), (1, ) ) False 1 30 Monday, June 14, 2010
enumerate >>> words = ['this', 'is', 'python'] >>> i = 0 >>> for word in words: ... i += 1 ... print i, word ... 1 this 2 is 3 python >>> for i, word in enumerate(words): ... print i+1, word ... • how to enumerate two lists/tuples simultaneously? 31 Monday, June 14, 2010
zip and _ >>> a = [1, 2] >>> b = ['a', 'b'] >>> zip (a,b) [(1, 'a'), (2, 'b')] >>> zip(a,b,a) [(1, 'a', 1), (2, 'b', 2)] >>> zip ([1], b) [(1, 'a')] >>> a = ['p', 'q']; b = [[2, 3], [5, 6]] >>> for i, (x, [_, y]) in enumerate(zip(a, b)): ...! ! print i, x, y ... 0 p 3 1 q 6 32 Monday, June 14, 2010
zip and list comprehensions >>> vec1 = [2, 4, 6] >>> vec2 = [4, 3, -9] >>> [(x, y) for x in vec1 for y in vec2] [(2, 4), (2, 3), (2, -9), (4, 4), (4, 3), (4, -9), (6, 4), (6, 3), (6, -9)] >>> [(vec1[i], vec2[i]) for i in range(len(vec1))] [(2, 4), (4, 3), (6, -9)] >>> sum([vec1[i]*vec2[i] for i in range(len(vec1))] -34 >>> sum([x*y for (x,y) in zip(vec1, vec2)]) -34 >>> sum([v[0]*v[1] for v in zip(vec1, vec2)]) -34 33 Monday, June 14, 2010
how to implement zip? binary zip: easy >>> def myzip(a,b): ... if a == [] or b == []: ... return [] ... return [(a[0], b[0])] + myzip(a[1:], b[1:]) ... >>> myzip([1,2], ['a','b']) [(1, 'a'), (2, 'b')] >>> myzip([1,2], ['b']) [(1, 'b')] how to deal with arbitrarily many arguments? 34 Monday, June 14, 2010
Basic import and I/O Monday, June 14, 2010
import and I/O • similar to import in Java • File I/O much easier than Java import sys from sys import * for line in sys.stdin: or for line in stdin: ! print line.split() ! print line.split() import System; Java import System.*; >>> f = open("my.in", "rt") to read a line: >>> g = open("my.out", "wt") line = f.readline() >>> for line in f: ... print >> g, line, to read all the lines: ... g.close() lines = f.readlines() file copy note this comma! 36 Monday, June 14, 2010
import and __main__ • multiple source files (modules) foo.py • C: #include “my.h” def pp(a): ! print “ “.join(a) • Java: import My • if __name__ == “__main__”: demo ! from sys import * ! a = stdin.readline() ! pp (a.split()) • handy for debugging >>> import foo >>> pp([1,2,3]) interactive 1 2 3 37 Monday, June 14, 2010
Quiz • Palindromes abcba • read in a string from standard input, and print True if it is a palindrome, print False if otherwise def palindrome(s): len(s) <= 1 if _____________: return True s[0] == s[-1] s[1:-1] return _____________ and palindrome(________) if __name__ == '__main__': import sys ___________ strip() s = sys.stdin.readline().________ print palindrome(s) 38 Monday, June 14, 2010
Functional Programming Monday, June 14, 2010
map and filter • intuition: function as data • we have already seen functional programming a lot! • list comprehension, custom comparison function map(f, a) [ f(x) for x in a ] filter(p, a) [ x for x in a if p(x) ] map(f, filter(p, a)) [ f(x) for x in a if p(x) ] >>> map(int, ['1','2']) >>> def is_even(x): [1, 2] ... return x % 2 == 0 >>> " ".join(map(str, [1,2])) ... 1 2 >>> filter(is_even, [-1, 0]) [0] 40 Monday, June 14, 2010
lambda • map/filter in one line for custom functions? • “anonymous inline function” • borrowed from LISP, Scheme, ML, OCaml >>> f = lambda x: x*2 >>> f(1) 2 >>> map (lambda x: x**2, [1, 2]) [1, 4] >>> filter (lambda x: x > 0, [-1, 1]) [1] >>> g = lambda x,y : x+y >>> g(5,6) 11 >>> map (lambda (x,y): x+y, [(1,2), (3,4)]) [3, 7] 41 Monday, June 14, 2010
more on lambda >>> f = lambda : "good!" >>> f <function <lambda> at 0x381730> >>> f() 'good!' lazy evaluation >>> a = [5, 1, 2, 6, 4] >>> a.sort(lambda x,y : y - x) >>> a [6, 5, 4, 2, 1] custom comparison 42 Monday, June 14, 2010
Dictionaries (heterogeneous) hash maps Monday, June 14, 2010
Constructing Dicts • key : value pairs >>> d = {'a': 1, 'b': 2, 'c': 1} >>> d['b'] 2 >>> d['b'] = 3 >>> d['b'] 3 >>> d['e'] KeyError! >>> d.has_key('a') True >>> 'a' in d True >>> d.keys() ['a', 'c', 'b'] >>> d.values() [1, 1, 3] 44 Monday, June 14, 2010
Other Constructions • zipping, list comprehension, keyword argument • dump to a list of tuples >>> d = {'a': 1, 'b': 2, 'c': 1} >>> keys = ['b', 'c', 'a'] >>> values = [2, 1, 1 ] >>> e = dict (zip (keys, values)) >>> d == e True >>> d.items() [('a', 1), ('c', 1), ('b', 2)] >>> f = dict( [(x, x**2) for x in values] ) >>> f {1: 1, 2: 4} >>> g = dict(a=1, b=2, c=1) >>> g == d True 45 Monday, June 14, 2010
Mapping Type http://docs.python.org/lib/typesmapping.html 46 Monday, June 14, 2010
Sets identity maps, unordered collection Monday, June 14, 2010
Sets • sets do not have a special syntactic form • unlike [] for lists, () for tuples and {} for dicts • construction from lists, tuples, dicts (keys), and strs • in, not in, add, remove >>> a = set([]) >>> a = set((1,2)) >>> 1 in a >>> a False set([1, 2]) >>> a.add(1) >>> b = set([1,2]) >>> a.add('b') >>> a == b >>> a True set([1, 'b']) >>> c = set({1:'a', 2:'b'}) >>> a.remove(1) >>> c >>> a set([1, 2]) set(['b']) 48 Monday, June 14, 2010
set and frozenset type 49 Monday, June 14, 2010
Basic Sorting >>> a = [5, 2, 3, 1, 4] >>> a.sort() >>> print a [1, 2, 3, 4, 5] >>> a = [5, 2, 3, 1, 4] >>> a.sort(reverse=True) >>> a [5, 4, 3, 2, 1] >>> a = [5, 2, 3, 1, 4] >>> a.sort() >>> a.reverse() >>> a [5, 4, 3, 2, 1] 50 Monday, June 14, 2010
Built-in and Custom cmp >>> a = [5, 2, 3, 1, 4] >>> a.sort(cmp) >>> print a [1, 2, 3, 4, 5] >>> a = [5, 2, 3, 1, 4] >>> def reverse_numeric(x, y): >>> return y-x >>> >>> a.sort(reverse_numeric) >>> a [5, 4, 3, 2, 1] 51 Monday, June 14, 2010
Sorting by Keys >>> a = "This is a test string from Andrew".split() >>> a.sort(key=str.lower) >>> a ['a', 'Andrew', 'from', 'is', 'string', 'test', 'This'] >>> import operator >>> L = [('c', 2), ('d', 1), ('a', 4), ('b', 3), ('b', 1)] >>> L.sort(key=operator.itemgetter(1)) >>> L [('d', 1), ('b', 1), ('c', 2), ('b', 3), ('a', 4)] >>> sorted(L, key=operator.itemgetter(1, 0)) [('b', 1), ('d', 1), ('c', 2), ('b', 3), ('a', 4)] sort by two keys 52 Monday, June 14, 2010
Decorate-Sort-Undecorate >>> words = "This is a test string from Andrew.".split() >>> deco = [ (word.lower(), i, word) for i, word in ... enumerate(words) ] >>> deco.sort() >>> new_words = [ word for _, _, word in deco ] >>> print new_words ['a', 'Andrew.', 'from', 'is', 'string', 'test', 'This'] • Most General • Faster than custom cmp • stable sort (by supplying index) 53 Monday, June 14, 2010
default values • counting frequencies >>> def incr(d, key): ... if key not in d: ... d[key] = 1 ... else: ... d[key] += 1 ... >>> def incr(d, key): ...! ! d[key] = d.get(key, 0) + 1 ... >>> incr(d, 'z') >>> d {'a': 1, 'c': 1, 'b': 2, 'z': 1} >>> incr(d, 'b') >>> d {'a': 1, 'c': 1, 'b': 3, 'z': 1} 54 Monday, June 14, 2010
defaultdict • best feature introduced in Python 2.5 >>> from collections import defaultdict >>> d = defaultdict(int) >>> d['a'] 0 >>> d['b'] += 1 >>> d {'a': 0, 'b': 1} >>> d = defaultdict(list) >>> d['b'] += [1] >>> d {'b': [1]} >>> d = defaultdict(lambda : <expr>) 55 Monday, June 14, 2010
Example: Word Freqs • Counting Word Frequencies • read in a text file, count the frequencies of each word, and print in descending order of frequency import sys input Python is a cool language but OCaml if __name__ == '__main__': is even cooler since it is purely functional ! wordlist = {} ! for i, line in enumerate(sys.stdin): ! ! for word in line.split(): output 3 is 1 2 ! if word in wordlist:! ! 1a1 ! ! wordlist[word][0] += 1 1 but 1 ! ! wordlist[word][1].add(i+1) 1 cool 1 ! else: 1 cooler 2 ! ! wordlist[word] = [1, set([i+1])] 1 even 2 1 functional 2 ! sortedlst = [(-freq, word, lines) 1 it 2 for (word, (freq, lines)) in wordlist.items()] 1 language 1 1 ocaml 1 ! sortedlst.sort() 1 purely 2 for freq, word, lines in sortedlist: 1 python 1 ! ! print -freq, word, " ".join(map(str, lines)) 1 since 2 56 Monday, June 14, 2010
using defaultdict • Counting Word Frequencies • read in a text file, count the frequencies of each word, and print in descending order of frequency import sys input from collections import defaultdict Python is a cool language but OCaml is even cooler since it is purely functional if __name__ == '__main__': ! wordlist = defaultdict(set) ! for i, line in enumerate(sys.stdin): output 3 is 1 2 ! ! for word in line.split(): 1a1 ! wordlist[word].add(i) 1 but 1 1 cool 1 ! sortedlist = sorted([(-len(lines), word, lines) 1 cooler 2 for (word, lines) in wordlist.items()]) 1 even 2 1 functional 2 1 it 2 for freq, word, lines in sortedlist: 1 language 1 ! ! print -freq, word, " ".join(map(str, lines)) 1 ocaml 1 1 purely 2 1 python 1 1 since 2 57 Monday, June 14, 2010
Implementation • lists, tuples, and dicts are all implemented by hashing • strings are implemented as arrays • lists, tuples, and strings • random access: O(1) • insertion/deletion/in: O(n) • dict • in/random access: O(1) • insertion/deletion: O(1) • no linear ordering! 58 Monday, June 14, 2010
Pythonic Styles • do not write ... when you can write ... for key in d.keys(): for key in d: if d.has_key(key): if key in d: i = 0 for x in a: ... for i, x in enumerate(a): i += 1 a[0:len(a) - i] a[:-i] for line in for line in sys.stdin: sys.stdin.readlines(): for x in a: print x, print " ".join(map(str, a)) print s = "" for i in range(lev): print " " * lev s += " " print s 59 Monday, June 14, 2010
Object-Oriented Programming Monday, June 14, 2010
Overview of Python OOP • Motivations of OOP • complicated data structures • modularity • Perl does not have built-in OOP • Perl + OOP ==> Ruby (pure OO, like Java) • Python has OOP from the very beginning • hybrid approach (like C++) • nearly everything inside a class is public • explicit argument self 61 Monday, June 14, 2010
Classes class Point(object): ! def __init__(self, x, y): ! ! self.x = x ! ! self.y = y ! ! def norm(self): ! ! return self.x ** 2 + self.y ** 2 ! ! def __str__(self): ! ! return "(" + str(self.x) + ", " + str(self.y) + ")" • constructor __init__(self, ...) >>> p = Point (3, 4) >>> p.__str__() >>> p.x '(3, 4)' 3 >>> str(p) >>> p.norm() '(3, 4)' 25 >>> print p (3, 4) 62 Monday, June 14, 2010
Classes class Point(object): ! "A 2D point" ! def __init__(self, x, y): ! ! self.x = x ! ! self.y = y ! ! ! def __str__(self): ! ! " like toString() in Java " ! ! return "(" + str(self.x) + ", " + str(self.y) + ")" • doc-strings (__doc__), like javadoc (with pydoc) • no polymorphic functions (earlier defs will be shadowed) • ==> only one constructor (and no destructor) • each function can have only one signature • semantics: Point.__str__(p) equivalent to p.__str__() 63 Monday, June 14, 2010
Implementation: dicts class Point(object): ! "A 2D point" ! def __init__(self, x, y): ! ! self.x = x ! ! self.y = y ! ! ! def __str__(self): ! ! " like toString() in Java " ! ! return "(" + str(self.x) + ", " + str(self.y) + ")" >>> p = Point (5, 6) >>> p.z = 7 >>> print p (5, 6) >>> print p.w AttributeError - no attribute ‘w’ >>> p["x"] = 1 AttributeError - no attribute ‘setitem’ 64 Monday, June 14, 2010
repr and cmp class Point: ! def __init__(self, x, y): ! ! self.x = x ! ! self.y = y ! ! ! def __str__(self): ! ! return "(" + str(self.x) + ", " + str(self.y) + ")" >>> p = Point(3,4) >>> p <__main__.Point instance at 0x38be18> >>> Point (3,4) == Point (3,4) False 65 Monday, June 14, 2010
repr and cmp class Point: ! def __init__(self, x, y): ! ! self.x = x ! ! self.y = y ! ! ! def __str__(self): ! ! return "(" + str(self.x) + ", " + str(self.y) + ")" ! def __repr__(self): ! ! return self.__str__() ! ! def __cmp__(self, other): ! ! if self.x == other.x: ! ! ! return self.y - other.y ! ! return self.x - other.x! with __repr__ and __cmp__ >>> p = Point(3,4) >>> p >>> p (3, 4) <__main__.Point instance at 0x38be18> >>> cmp(Point(3,4), Point(4,3)) >>> Point (3,4) == Point (3,4) -1 False >>> Point (3,4) == Point (3,4) True 66 Monday, June 14, 2010
Inheritance class Point (object): ! ... ! def __str__(self): ! ! return "(" + self.__repr__() + ")" ! def __repr__(self): ! ! return str(self.x) + ", " + str(self.y) ! ... class Point3D (Point): ! "A 3D point" ! def __init__(self, x, y, z): ! ! ! ! Point.__init__(self, x, y) self.z = z super-class, like C++ ! ! (multiple inheritance allowed) ! def __str__(self): ! ! return Point.__str__(self) + ", " + str(self.z) ! def __cmp__(self, other): ! ! tmp = Point.__cmp__(self, other) ! ! if tmp != 0: ! ! ! return tmp ! ! return self.z - other.z 67 Monday, June 14, 2010
Overloading • like operator overloading in C++ • special functions like __add__(), __mul__() class Point:   # previously defined methods here...   def __add__(self, other):     return Point(self.x + other.x, self.y + other.y) def __mul__(self, other):   ! return self.x * other.x + self.y * other.y >>> p = Point (3, 4) dot-product >>> q = Point (5, 6) >>> print (p + q) (8, 10) >>> print (p * q) 35 68 Monday, June 14, 2010
mul vs. rmul class Point: ! def __mul__(self, other):   ! ! return self.x * other.x + self.y * other.y ! def __rmul__(self, other):   ! ! return Point(other * self.x,  other * self.y) scalar-multiplication >>> p1 = Point(3, 4) >>> p2 = Point(5, 7) >>> print p1 * p2 43 >>> print 2 * p2 (10, 14) >>> print p2 * 2 AttributeError: 'int' object has no attribute 'x' 69 Monday, June 14, 2010
iadd (+=) and neg (-) class Point(object):   def __add__(self, other):     return Point(self.x + other.x, self.y + other.y) def __iadd__(self, other): self.x += other.x self.y += other.y return self must return self here! def __neg__(self): return Point(-self.x, -self.y) add, sub, mul, div, radd, rsub, rmul, rdiv, iadd, isub, imul, idiv, neg, inv, pow, len, str, repr, cmp, eq, ne, lt, le, gt, ge 70 Monday, June 14, 2010
Trees class Tree:   def __init__(self, node, children=[]):     self.node = node     self.children = children def total(self):   if self == None: return 0   return self.node + sum([x.total() for x in self.children]) def pp(self, dep=0):   print “ |“ * dep, self.node left = Tree(2) for child in self.children: right = Tree(3) child.pp(dep+1) >>> t = Tree(1, [Tree(2), Tree(3)]) >>> total(t) def __str__(self):   return “(%s)” % “ “.join(map(str, 5 [self.node] + self.children)) >>> t.pp() 1 | 2 | 3 >>> print t (1 (2) (3)) 71 Monday, June 14, 2010

Python for text processing

  • 1.
    ISI Summer InternshipProgram 2010 Python Tutorial Liang Huang lhuang@isi.edu based on a course I taught at Penn (but updated to Python 2.6) www.cis.upenn.edu/~lhuang3/cse399-python Monday, June 14, 2010
  • 2.
    “Hello, World” • C#include <stdio.h> int main(int argc, char ** argv) { ! printf(“Hello, World!n”); } • Java class Hello public { ! public static void main(String argv[]) ! { ! ! System.out.println(“Hello, World!”); ! } } • now in “Hello, World!” print Python 2 Monday, June 14, 2010
  • 3.
    Reversing an Array static int[] reverse_array(int a[]) { ! int [] temp = new int[ a.length ]; ! for (int i = 0; i < len; i++) ! { ! ! temp [i] = a [a.length - i - 1]; ! } ! return temp; } Java def rev(a): def ...(...): Python ! if a == []: no need to specify ! ! return [] ... argument and return types! ! else: python will figure it out. ! ! return rev(a[1:]) + [a[0]] (dynamically typed) or even simpler: a without a[0] singleton list a.reverse() built-in list-processing function 3 Monday, June 14, 2010
  • 4.
    Quick-sort public void sort(intlow, int high) int partition(int low, int high) { { ! if (low >= high) return; ! int pivot = a[low]; ! int p = partition(low, high); ! int i = low - 1; ! sort(low, p); ! int j = high + 1; ! sort(p + 1, high); ! while (i < j) } ! { void swap(int i, int j) ! ! i++; while (a[i] < pivot) i++; {! ! ! j--; while (a[j] > pivot) j--; ! int temp = a[i]; ! ! if (i < j) swap(i, j); ! a[i] = a[j]; ! } ! a[j] = temp; ! return j; } } Java def sort(a): ! if a == []: {x | x ∈ a, x < pivot} Python ! ! return [] ! else: ! ! pivot = a[0]!! ! ! left = [x for x in a if x < pivot ] ! ! right = [x for x in a[1:] if x >= pivot] ! ! return sort(left) + [pivot] + sort(right) smaller semantic-gap! 4 Monday, June 14, 2010
  • 5.
    Python is... • a scripting language (strong in text-processing) • interpreted, like Perl, but much more elegant • a very high-level language (closer to human semantics) • almost like pseudo-code! • procedural (like C, Pascal, Basic, and many more) • but also object-oriented (like C++ and Java) • and even functional! (like ML/OCaml, LISP/Scheme, Haskell, etc.) • from today, you should use Python for everything • not just for scripting, but for serious coding! 5 Monday, June 14, 2010
  • 6.
  • 7.
    Numbers and Strings • like Java, Python has built-in (atomic) types • numbers (int, float), bool, string, list, etc. • numeric operators: + - * / ** % >>> a = 5 >>> >>> s = “hey” c = 1.5 >>> b = 3 >>> 5/2 >>> s + “ guys” >>> type (5) 2 'hey guys' <type 'int'> >>> 5/2. >>> len(s) >>> a += 4 2.5 3 >>> a >>> 5 ** 2 >>> s[0] 9 25 'h' no i++ or ++i >>> s[-1] >>> from __future__ import division 'y' >>> 5/2 2.5 recommended! 7 Monday, June 14, 2010
  • 8.
    Assignments and Comparisons >>> a = b = 0 >>> a = b = 0 >>> a >>> a == b 0 True >>> b >>> type (3 == 5) 0 <type 'bool'> >>> "my" == 'my' >>> a, b = 3, 5 True >>> a + b 8 >>> (1, 2) == (1, 2) >>> (a, b) = (3, 5) True >>> a + b >>> 8 >>> 1, 2 == 1, 2 >>> a, b = b, a ??? (swap) (1, False, 2) 8 Monday, June 14, 2010
  • 9.
    for loops andrange() • for always iterates through a list or sequence >>> sum = 0 >>> for i in range(10): ... sum += i ... Java 1.5 >>> print sum foreach (String word : words) ! System.out.println(word) 45 >>> for word in ["welcome", "to", "python"]: ... print word, ... welcome to python >>> range(5), range(4,6), range(1,7,2) ([0, 1, 2, 3, 4], [4, 5], [1, 3, 5]) 9 Monday, June 14, 2010
  • 10.
    while loops • very similar to while in Java and C • but be careful • in behaves differently in for and while • break statement, same as in Java/C >>> a, b = 0, 1 >>> while b <= 5: ... print b ... a, b = b, a+b ... 1 simultaneous 1 assignment 2 3 fibonacci series 5 10 Monday, June 14, 2010
  • 11.
    Conditionals >>>if x < 10 and x >= 0: >>> if 4 > 5: ... print x, "is a digit" ... print "foo" ... ... else: >>> False and False or True ... print "bar" True ... >>> not True bar False >>> print “foo” if 4 > 5 else “bar” ... conditional expr since Python 2.5 >>> bar C/Java printf( (4>5)? ”foo” : “bar”); 11 Monday, June 14, 2010
  • 12.
    if ... elif... else >>> a = "foo" >>> if a in ["blue", "yellow", "red"]: ... print a + " is a color" ... else: ...! ! if a in ["US", "China"]: ... ! ! print a + " is a country" ... ! ! else: ...! ! ! ! print "I don't know what”, a, “is!" ... I don't know what foo is! switch (a) { ! case “blue”: >>> if a in ...: ! case “yellow”: ! case “red”: ... print ... C/Java ! ! print ...; break; ... elif a in ...: ! case “US”: ! case “China”: ... print ... ! ! print ...; break; ... else: ! else: ... print ... ! } ! print ...; 12 Monday, June 14, 2010 ! !
  • 13.
    break, continue andelse • break and continue borrowed from C/Java • special else in loops • when loop terminated normally (i.e., not by break) • very handy in testing a set of properties || func(n) >>> for n in range(2, 10): for (n=2; n<10; n++) { ... for x in range(2, n): ! good = true; ! for (x=2; x<n; x++) ... if n % x == 0: ! ! if (n % x == 0) { ... break ! ! ! good = false; ... else: C/Java ! ! ! break; ... print n, ! ! } if (x==n) ... ! if (good) ! ! printf(“%d “, n); } prime numbers 13 Monday, June 14, 2010
  • 14.
    Defining a Functiondef • no type declarations needed! wow! • Python will figure it out at run-time • you get a run-time error for type violation • well, Python does not have a compile-error at all >>> def fact(n): ... if n == 0: ... return 1 ... else: ... return n * fact(n-1) ... >>> fact(4) 24 14 Monday, June 14, 2010
  • 15.
    Fibonacci Revisited >>> a, b = 0, 1 >>> while b <= 5: ... print b ... a, b = b, a+b ... 1 1 def fib(n): 2 ! if n <= 1: 3 ! ! return n 5 ! else: ! ! return fib (n-1) + fib (n-2) conceptually cleaner, but much slower! >>> fib(5) 5 >>> fib(6) 8 15 Monday, June 14, 2010
  • 16.
    Default Values >>> def add(a, L=[]): ... return L + [a] ... >>> add(1) [1] >>> add(1,1) error! >>> add(add(1)) [[1]] lists are heterogenous! >>> add(add(1), add(1)) ??? [1, [1]] 16 Monday, June 14, 2010
  • 17.
    Approaches to Typing ✓ strongly typed: types are strictly enforced. no implicit type conversion - weakly typed: not strictly enforced - statically typed: type-checking done at compile-time ✓ dynamically typed: types are inferred at runtime weak strong static C, C++ Java, Pascal dynamic Perl,VB Python, OCaml, Scheme 17 Monday, June 14, 2010
  • 18.
    Lists heterogeneous variable-sized array a = [1,'python', [2,'4']] Monday, June 14, 2010
  • 19.
    Basic List Operations • length, subscript, and slicing >>> a[0:3:2] [1, [2, '4']] >>> a = [1,'python', [2,'4']] >>> len(a) >>> a[:-1] 3 [1, 'python'] >>> a[2][1] '4' >>> a[0:3:] >>> a[3] [1, 'python', [2, '4']] IndexError! >>> a[-2] >>> a[0::2] 'python' [1, [2, '4']] >>> a[1:2] ['python'] >>> a[::] [1, 'python', [2, '4']] >>> a[:] [1, 'python', [2, '4']] 19 Monday, June 14, 2010
  • 20.
    +, extend, +=,append • extend (+=) and append mutates the list! >>> a = [1,'python', [2,'4']] >>> a + [2] [1, 'python', [2, '4'], 2] >>> a.extend([2, 3]) >>> a [1, 'python', [2, '4'], 2, 3] same as a += [2, 3] >>> a.append('5') >>> a [1, 'python', [2, '4'], 2, 3, '5'] >>> a[2].append('xtra') >>> a [1, 'python', [2, '4', 'xtra'], 2, 3, '5'] 20 Monday, June 14, 2010
  • 21.
    Comparison and Reference • as in Java, comparing built-in types is by value • by contrast, comparing objects is by reference >>> c = b [:] >>> [1, '2'] == [1, '2'] >>> c True [1, 5] >>> a = b = [1, '2'] >>> c == b slicing gets >>> a == b True a shallow copy True >>> c is b >>> a is b False True >>> b [1] = 5 what about a += [1]? >>> a [1, 5] >>> b[:0] = [2] insertion >>> b >>> a = 4 [2, 1, 3, 4] >>> b >>> b[1:3]=[] [1, 5] >>> b >>> a is b [2, 4] deletion >>> False 21 Monday, June 14, 2010
  • 22.
    List Comprehension >>> a = [1, 5, 2, 3, 4 , 6] >>> [x*2 for x in a] [2, 10, 4, 6, 8, 12] >>> [x for x in a if 4th smallest element ... len( [y for y in a if y < x] ) == 3 ] [4] >>> a = range(2,10) >>> [x*x for x in a if ... [y for y in a if y < x and (x % y == 0)] == [] ] ??? [4, 9, 25, 49] square of prime numbers 22 Monday, June 14, 2010
  • 23.
    List Comprehensions >>> vec = [2, 4, 6] >>> [[x,x**2] for x in vec] [[2, 4], [4, 16], [6, 36]] >>> [x, x**2 for x in vec] SyntaxError: invalid syntax >>> [(x, x**2) for x in vec] [(2, 4), (4, 16), (6, 36)] >>> vec1 = [2, 4, 6] >>> vec2 = [4, 3, -9] >>> [x*y for x in vec1 for y in vec2] [8, 6, -18, 16, 12, -36, 24, 18, -54] (cross product) >>> [x+y for x in vec1 for y in vec2] [6, 5, -7, 8, 7, -5, 10, 9, -3] >>> [vec1[i]*vec2[i] for i in range(len(vec1))] [8, 12, -54] (dot product) 23 Monday, June 14, 2010
  • 24.
    Strings sequence of characters Monday, June 14, 2010
  • 25.
    Basic String Operations • join, split, strip • upper(), lower() >>> s = " this is a python course. n" >>> words = s.split() >>> words ['this', 'is', 'a', 'python', 'course.'] >>> s.strip() 'this is a python course.' >>> " ".join(words) 'this is a python course.' >>> "; ".join(words).split("; ") ['this', 'is', 'a', 'python', 'course.'] >>> s.upper() ' THIS IS A PYTHON COURSE. n' http://docs.python.org/lib/string-methods.html 25 Monday, June 14, 2010
  • 26.
    Basic Search/Replace inString >>> "this is a course".find("is") 2 >>> "this is a course".find("is a") 5 >>> "this is a course".find("is at") -1 >>> "this is a course".replace("is", "was") 'thwas was a course' >>> "this is a course".replace(" is", " was") 'this was a course' >>> "this is a course".replace("was", "were") 'this is a course' these operations are much faster than regexps! 26 Monday, June 14, 2010
  • 27.
    String Formatting >>> print “%.2f%%” % 97.2363 97.24 >>> s = '%s has %03d quote types.' % ("Python", 2) >>> print s Python has 002 quote types. 27 Monday, June 14, 2010
  • 28.
    Tuples immutable lists Monday, June 14, 2010
  • 29.
    Tuples and Equality • caveat: singleton tuple a += (1,2) # new copy • ==, is, is not >>> (1, 'a') >>> 1, 2 == 1, 2 (1, 'a') (1, False, 2) >>> (1) >>> (1, 2) == (1, 2) 1 True >>> [1] >>> (1, 2) is (1, 2) [1] False >>> (1,) >>> "ab" is "ab" (1,) True >>> [1,] >>> [1] is [1] [1] False >>> (5) + (6) >>> 1 is 1 11 True >>> (5,)+ (6,) >>> True is True (5, 6) True 29 Monday, June 14, 2010
  • 30.
    Comparison • between the same type: “lexicographical” • between different types: arbitrary • cmp(): three-way <, >, == • C: strcmp(s, t), Java: a.compareTo(b) >>> (1, 'ab') < (1, 'ac') >>> [1] < [1, 2] < [1, 3] True True >>> (1, ) < (1, 'ac') >>> [1] == [1,] == [1.0] True True >>> [1] < [1, 'ac'] >>> cmp ( (1, ), (1, 2) ) True -1 >>> 1 < True >>> cmp ( (1, ), (1, ) ) False 0 >>> True < 1 >>> cmp ( (1, 2), (1, ) ) False 1 30 Monday, June 14, 2010
  • 31.
    enumerate >>> words = ['this', 'is', 'python'] >>> i = 0 >>> for word in words: ... i += 1 ... print i, word ... 1 this 2 is 3 python >>> for i, word in enumerate(words): ... print i+1, word ... • how to enumerate two lists/tuples simultaneously? 31 Monday, June 14, 2010
  • 32.
    zip and _ >>> a = [1, 2] >>> b = ['a', 'b'] >>> zip (a,b) [(1, 'a'), (2, 'b')] >>> zip(a,b,a) [(1, 'a', 1), (2, 'b', 2)] >>> zip ([1], b) [(1, 'a')] >>> a = ['p', 'q']; b = [[2, 3], [5, 6]] >>> for i, (x, [_, y]) in enumerate(zip(a, b)): ...! ! print i, x, y ... 0 p 3 1 q 6 32 Monday, June 14, 2010
  • 33.
    zip and listcomprehensions >>> vec1 = [2, 4, 6] >>> vec2 = [4, 3, -9] >>> [(x, y) for x in vec1 for y in vec2] [(2, 4), (2, 3), (2, -9), (4, 4), (4, 3), (4, -9), (6, 4), (6, 3), (6, -9)] >>> [(vec1[i], vec2[i]) for i in range(len(vec1))] [(2, 4), (4, 3), (6, -9)] >>> sum([vec1[i]*vec2[i] for i in range(len(vec1))] -34 >>> sum([x*y for (x,y) in zip(vec1, vec2)]) -34 >>> sum([v[0]*v[1] for v in zip(vec1, vec2)]) -34 33 Monday, June 14, 2010
  • 34.
    how to implementzip? binary zip: easy >>> def myzip(a,b): ... if a == [] or b == []: ... return [] ... return [(a[0], b[0])] + myzip(a[1:], b[1:]) ... >>> myzip([1,2], ['a','b']) [(1, 'a'), (2, 'b')] >>> myzip([1,2], ['b']) [(1, 'b')] how to deal with arbitrarily many arguments? 34 Monday, June 14, 2010
  • 35.
    Basic import andI/O Monday, June 14, 2010
  • 36.
    import and I/O • similar to import in Java • File I/O much easier than Java import sys from sys import * for line in sys.stdin: or for line in stdin: ! print line.split() ! print line.split() import System; Java import System.*; >>> f = open("my.in", "rt") to read a line: >>> g = open("my.out", "wt") line = f.readline() >>> for line in f: ... print >> g, line, to read all the lines: ... g.close() lines = f.readlines() file copy note this comma! 36 Monday, June 14, 2010
  • 37.
    import and __main__ • multiple source files (modules) foo.py • C: #include “my.h” def pp(a): ! print “ “.join(a) • Java: import My • if __name__ == “__main__”: demo ! from sys import * ! a = stdin.readline() ! pp (a.split()) • handy for debugging >>> import foo >>> pp([1,2,3]) interactive 1 2 3 37 Monday, June 14, 2010
  • 38.
    Quiz • Palindromes abcba • read in a string from standard input, and print True if it is a palindrome, print False if otherwise def palindrome(s): len(s) <= 1 if _____________: return True s[0] == s[-1] s[1:-1] return _____________ and palindrome(________) if __name__ == '__main__': import sys ___________ strip() s = sys.stdin.readline().________ print palindrome(s) 38 Monday, June 14, 2010
  • 39.
  • 40.
    map and filter • intuition: function as data • we have already seen functional programming a lot! • list comprehension, custom comparison function map(f, a) [ f(x) for x in a ] filter(p, a) [ x for x in a if p(x) ] map(f, filter(p, a)) [ f(x) for x in a if p(x) ] >>> map(int, ['1','2']) >>> def is_even(x): [1, 2] ... return x % 2 == 0 >>> " ".join(map(str, [1,2])) ... 1 2 >>> filter(is_even, [-1, 0]) [0] 40 Monday, June 14, 2010
  • 41.
    lambda • map/filter in one line for custom functions? • “anonymous inline function” • borrowed from LISP, Scheme, ML, OCaml >>> f = lambda x: x*2 >>> f(1) 2 >>> map (lambda x: x**2, [1, 2]) [1, 4] >>> filter (lambda x: x > 0, [-1, 1]) [1] >>> g = lambda x,y : x+y >>> g(5,6) 11 >>> map (lambda (x,y): x+y, [(1,2), (3,4)]) [3, 7] 41 Monday, June 14, 2010
  • 42.
    more on lambda >>> f = lambda : "good!" >>> f <function <lambda> at 0x381730> >>> f() 'good!' lazy evaluation >>> a = [5, 1, 2, 6, 4] >>> a.sort(lambda x,y : y - x) >>> a [6, 5, 4, 2, 1] custom comparison 42 Monday, June 14, 2010
  • 43.
    Dictionaries (heterogeneous) hash maps Monday, June 14, 2010
  • 44.
    Constructing Dicts • key : value pairs >>> d = {'a': 1, 'b': 2, 'c': 1} >>> d['b'] 2 >>> d['b'] = 3 >>> d['b'] 3 >>> d['e'] KeyError! >>> d.has_key('a') True >>> 'a' in d True >>> d.keys() ['a', 'c', 'b'] >>> d.values() [1, 1, 3] 44 Monday, June 14, 2010
  • 45.
    Other Constructions • zipping, list comprehension, keyword argument • dump to a list of tuples >>> d = {'a': 1, 'b': 2, 'c': 1} >>> keys = ['b', 'c', 'a'] >>> values = [2, 1, 1 ] >>> e = dict (zip (keys, values)) >>> d == e True >>> d.items() [('a', 1), ('c', 1), ('b', 2)] >>> f = dict( [(x, x**2) for x in values] ) >>> f {1: 1, 2: 4} >>> g = dict(a=1, b=2, c=1) >>> g == d True 45 Monday, June 14, 2010
  • 46.
    Mapping Type http://docs.python.org/lib/typesmapping.html 46 Monday, June 14, 2010
  • 47.
    Sets identity maps, unordered collection Monday, June 14, 2010
  • 48.
    Sets • sets do not have a special syntactic form • unlike [] for lists, () for tuples and {} for dicts • construction from lists, tuples, dicts (keys), and strs • in, not in, add, remove >>> a = set([]) >>> a = set((1,2)) >>> 1 in a >>> a False set([1, 2]) >>> a.add(1) >>> b = set([1,2]) >>> a.add('b') >>> a == b >>> a True set([1, 'b']) >>> c = set({1:'a', 2:'b'}) >>> a.remove(1) >>> c >>> a set([1, 2]) set(['b']) 48 Monday, June 14, 2010
  • 49.
    set and frozensettype 49 Monday, June 14, 2010
  • 50.
    Basic Sorting >>> a = [5, 2, 3, 1, 4] >>> a.sort() >>> print a [1, 2, 3, 4, 5] >>> a = [5, 2, 3, 1, 4] >>> a.sort(reverse=True) >>> a [5, 4, 3, 2, 1] >>> a = [5, 2, 3, 1, 4] >>> a.sort() >>> a.reverse() >>> a [5, 4, 3, 2, 1] 50 Monday, June 14, 2010
  • 51.
    Built-in and Customcmp >>> a = [5, 2, 3, 1, 4] >>> a.sort(cmp) >>> print a [1, 2, 3, 4, 5] >>> a = [5, 2, 3, 1, 4] >>> def reverse_numeric(x, y): >>> return y-x >>> >>> a.sort(reverse_numeric) >>> a [5, 4, 3, 2, 1] 51 Monday, June 14, 2010
  • 52.
    Sorting by Keys >>> a = "This is a test string from Andrew".split() >>> a.sort(key=str.lower) >>> a ['a', 'Andrew', 'from', 'is', 'string', 'test', 'This'] >>> import operator >>> L = [('c', 2), ('d', 1), ('a', 4), ('b', 3), ('b', 1)] >>> L.sort(key=operator.itemgetter(1)) >>> L [('d', 1), ('b', 1), ('c', 2), ('b', 3), ('a', 4)] >>> sorted(L, key=operator.itemgetter(1, 0)) [('b', 1), ('d', 1), ('c', 2), ('b', 3), ('a', 4)] sort by two keys 52 Monday, June 14, 2010
  • 53.
    Decorate-Sort-Undecorate >>> words = "This is a test string from Andrew.".split() >>> deco = [ (word.lower(), i, word) for i, word in ... enumerate(words) ] >>> deco.sort() >>> new_words = [ word for _, _, word in deco ] >>> print new_words ['a', 'Andrew.', 'from', 'is', 'string', 'test', 'This'] • Most General • Faster than custom cmp • stable sort (by supplying index) 53 Monday, June 14, 2010
  • 54.
    default values • counting frequencies >>> def incr(d, key): ... if key not in d: ... d[key] = 1 ... else: ... d[key] += 1 ... >>> def incr(d, key): ...! ! d[key] = d.get(key, 0) + 1 ... >>> incr(d, 'z') >>> d {'a': 1, 'c': 1, 'b': 2, 'z': 1} >>> incr(d, 'b') >>> d {'a': 1, 'c': 1, 'b': 3, 'z': 1} 54 Monday, June 14, 2010
  • 55.
    defaultdict • best feature introduced in Python 2.5 >>> from collections import defaultdict >>> d = defaultdict(int) >>> d['a'] 0 >>> d['b'] += 1 >>> d {'a': 0, 'b': 1} >>> d = defaultdict(list) >>> d['b'] += [1] >>> d {'b': [1]} >>> d = defaultdict(lambda : <expr>) 55 Monday, June 14, 2010
  • 56.
    Example: Word Freqs • Counting Word Frequencies • read in a text file, count the frequencies of each word, and print in descending order of frequency import sys input Python is a cool language but OCaml if __name__ == '__main__': is even cooler since it is purely functional ! wordlist = {} ! for i, line in enumerate(sys.stdin): ! ! for word in line.split(): output 3 is 1 2 ! if word in wordlist:! ! 1a1 ! ! wordlist[word][0] += 1 1 but 1 ! ! wordlist[word][1].add(i+1) 1 cool 1 ! else: 1 cooler 2 ! ! wordlist[word] = [1, set([i+1])] 1 even 2 1 functional 2 ! sortedlst = [(-freq, word, lines) 1 it 2 for (word, (freq, lines)) in wordlist.items()] 1 language 1 1 ocaml 1 ! sortedlst.sort() 1 purely 2 for freq, word, lines in sortedlist: 1 python 1 ! ! print -freq, word, " ".join(map(str, lines)) 1 since 2 56 Monday, June 14, 2010
  • 57.
    using defaultdict • Counting Word Frequencies • read in a text file, count the frequencies of each word, and print in descending order of frequency import sys input from collections import defaultdict Python is a cool language but OCaml is even cooler since it is purely functional if __name__ == '__main__': ! wordlist = defaultdict(set) ! for i, line in enumerate(sys.stdin): output 3 is 1 2 ! ! for word in line.split(): 1a1 ! wordlist[word].add(i) 1 but 1 1 cool 1 ! sortedlist = sorted([(-len(lines), word, lines) 1 cooler 2 for (word, lines) in wordlist.items()]) 1 even 2 1 functional 2 1 it 2 for freq, word, lines in sortedlist: 1 language 1 ! ! print -freq, word, " ".join(map(str, lines)) 1 ocaml 1 1 purely 2 1 python 1 1 since 2 57 Monday, June 14, 2010
  • 58.
    Implementation • lists, tuples, and dicts are all implemented by hashing • strings are implemented as arrays • lists, tuples, and strings • random access: O(1) • insertion/deletion/in: O(n) • dict • in/random access: O(1) • insertion/deletion: O(1) • no linear ordering! 58 Monday, June 14, 2010
  • 59.
    Pythonic Styles • do not write ... when you can write ... for key in d.keys(): for key in d: if d.has_key(key): if key in d: i = 0 for x in a: ... for i, x in enumerate(a): i += 1 a[0:len(a) - i] a[:-i] for line in for line in sys.stdin: sys.stdin.readlines(): for x in a: print x, print " ".join(map(str, a)) print s = "" for i in range(lev): print " " * lev s += " " print s 59 Monday, June 14, 2010
  • 60.
  • 61.
    Overview of PythonOOP • Motivations of OOP • complicated data structures • modularity • Perl does not have built-in OOP • Perl + OOP ==> Ruby (pure OO, like Java) • Python has OOP from the very beginning • hybrid approach (like C++) • nearly everything inside a class is public • explicit argument self 61 Monday, June 14, 2010
  • 62.
    Classes class Point(object): ! def __init__(self, x, y): ! ! self.x = x ! ! self.y = y ! ! def norm(self): ! ! return self.x ** 2 + self.y ** 2 ! ! def __str__(self): ! ! return "(" + str(self.x) + ", " + str(self.y) + ")" • constructor __init__(self, ...) >>> p = Point (3, 4) >>> p.__str__() >>> p.x '(3, 4)' 3 >>> str(p) >>> p.norm() '(3, 4)' 25 >>> print p (3, 4) 62 Monday, June 14, 2010
  • 63.
    Classes class Point(object): ! "A 2D point" ! def __init__(self, x, y): ! ! self.x = x ! ! self.y = y ! ! ! def __str__(self): ! ! " like toString() in Java " ! ! return "(" + str(self.x) + ", " + str(self.y) + ")" • doc-strings (__doc__), like javadoc (with pydoc) • no polymorphic functions (earlier defs will be shadowed) • ==> only one constructor (and no destructor) • each function can have only one signature • semantics: Point.__str__(p) equivalent to p.__str__() 63 Monday, June 14, 2010
  • 64.
    Implementation: dicts class Point(object): ! "A 2D point" ! def __init__(self, x, y): ! ! self.x = x ! ! self.y = y ! ! ! def __str__(self): ! ! " like toString() in Java " ! ! return "(" + str(self.x) + ", " + str(self.y) + ")" >>> p = Point (5, 6) >>> p.z = 7 >>> print p (5, 6) >>> print p.w AttributeError - no attribute ‘w’ >>> p["x"] = 1 AttributeError - no attribute ‘setitem’ 64 Monday, June 14, 2010
  • 65.
    repr and cmp class Point: ! def __init__(self, x, y): ! ! self.x = x ! ! self.y = y ! ! ! def __str__(self): ! ! return "(" + str(self.x) + ", " + str(self.y) + ")" >>> p = Point(3,4) >>> p <__main__.Point instance at 0x38be18> >>> Point (3,4) == Point (3,4) False 65 Monday, June 14, 2010
  • 66.
    repr and cmp class Point: ! def __init__(self, x, y): ! ! self.x = x ! ! self.y = y ! ! ! def __str__(self): ! ! return "(" + str(self.x) + ", " + str(self.y) + ")" ! def __repr__(self): ! ! return self.__str__() ! ! def __cmp__(self, other): ! ! if self.x == other.x: ! ! ! return self.y - other.y ! ! return self.x - other.x! with __repr__ and __cmp__ >>> p = Point(3,4) >>> p >>> p (3, 4) <__main__.Point instance at 0x38be18> >>> cmp(Point(3,4), Point(4,3)) >>> Point (3,4) == Point (3,4) -1 False >>> Point (3,4) == Point (3,4) True 66 Monday, June 14, 2010
  • 67.
    Inheritance class Point (object): ! ... ! def __str__(self): ! ! return "(" + self.__repr__() + ")" ! def __repr__(self): ! ! return str(self.x) + ", " + str(self.y) ! ... class Point3D (Point): ! "A 3D point" ! def __init__(self, x, y, z): ! ! ! ! Point.__init__(self, x, y) self.z = z super-class, like C++ ! ! (multiple inheritance allowed) ! def __str__(self): ! ! return Point.__str__(self) + ", " + str(self.z) ! def __cmp__(self, other): ! ! tmp = Point.__cmp__(self, other) ! ! if tmp != 0: ! ! ! return tmp ! ! return self.z - other.z 67 Monday, June 14, 2010
  • 68.
    Overloading • like operator overloading in C++ • special functions like __add__(), __mul__() class Point:   # previously defined methods here...   def __add__(self, other):     return Point(self.x + other.x, self.y + other.y) def __mul__(self, other):   ! return self.x * other.x + self.y * other.y >>> p = Point (3, 4) dot-product >>> q = Point (5, 6) >>> print (p + q) (8, 10) >>> print (p * q) 35 68 Monday, June 14, 2010
  • 69.
    mul vs. rmul class Point: ! def __mul__(self, other):   ! ! return self.x * other.x + self.y * other.y ! def __rmul__(self, other):   ! ! return Point(other * self.x,  other * self.y) scalar-multiplication >>> p1 = Point(3, 4) >>> p2 = Point(5, 7) >>> print p1 * p2 43 >>> print 2 * p2 (10, 14) >>> print p2 * 2 AttributeError: 'int' object has no attribute 'x' 69 Monday, June 14, 2010
  • 70.
    iadd (+=) andneg (-) class Point(object):   def __add__(self, other):     return Point(self.x + other.x, self.y + other.y) def __iadd__(self, other): self.x += other.x self.y += other.y return self must return self here! def __neg__(self): return Point(-self.x, -self.y) add, sub, mul, div, radd, rsub, rmul, rdiv, iadd, isub, imul, idiv, neg, inv, pow, len, str, repr, cmp, eq, ne, lt, le, gt, ge 70 Monday, June 14, 2010
  • 71.
    Trees class Tree:   def __init__(self, node, children=[]):     self.node = node     self.children = children def total(self):   if self == None: return 0   return self.node + sum([x.total() for x in self.children]) def pp(self, dep=0):   print “ |“ * dep, self.node left = Tree(2) for child in self.children: right = Tree(3) child.pp(dep+1) >>> t = Tree(1, [Tree(2), Tree(3)]) >>> total(t) def __str__(self):   return “(%s)” % “ “.join(map(str, 5 [self.node] + self.children)) >>> t.pp() 1 | 2 | 3 >>> print t (1 (2) (3)) 71 Monday, June 14, 2010