Lecture 1: Introduction to Algorithms
Steven Skiena
Department of Computer Science
State University of New York
Stony Brook, NY 11794-4400
http://www.cs.stonybrook.edu/~skiena
Topic: Course Mechanics
Syllabus / Course Mechanics
• Prerequisites (Data structures and linear algebra)
• Textbook (ADM third edition)
• Grading
• Homeworks
• Daily problems
• Exams
• Rules of the Game
 1  Introduction to algorithms           1-27
 2  Asymptotic notation                  31-40
 3  Logarithms and more                  41-58     HW1 out
 4  Elementary data structures           65-75
 5  Dictionary data structures           76-92
 6  Hashing                              93-102
 7  Applications of Sorting              109-114
 8  Heapsort/Priority Queues             115-126   HW1 in / HW2 out
 9  Mergesort/Quicksort/Binsort          127-151
    Midterm 1
10  Data structures for graphs           197-211
11  Breadth-first search                 212-220   HW2 in / HW3 out
12  Topological sort/connectivity        221-234
13  Minimum spanning trees               243-256
14  Shortest paths                       257-266
15  Exploiting graph algorithms          267-275
16  Combinatorial search                 281-288   HW3 in / HW4 out
17  Program optimization                 289-302
18  Elements of dynamic programming      307-325
19  Examples of dynamic programming      326-336
20  Limitations of dynamic programming   337-344   HW4 in / HW5 out
21  Dynamic programming review
    Midterm 2
22  Reductions                           355-360
23  Easy reductions                      361-368
24  Harder reductions                    369-372
25  The NP-completeness challenge        373-382   HW5 in
    Final Exam
Instructor Style Disclaimer
I try to make lectures fun through jokes and analogies, but always fear saying something that may offend someone in the class. I am particularly fearful of teaching online, as I will miss feedback mechanisms I am used to in the classroom.
I want everyone to feel comfortable in my classroom. If anything I say bothers you, please come by and tell me so. I will apologize, and then do my best to understand the issue to avoid doing so again.
Questions?
Topic: What is an Algorithm?
What Is An Algorithm?
Algorithms are the ideas behind computer programs.
An algorithm is the thing which stays the same whether the program is written in assembly language and running on a supercomputer in New York, or written in Python and running on a cell phone in Kathmandu!
To be interesting, an algorithm has to solve a general, specified problem. An algorithmic problem is specified by describing the set of instances it must work on, and what desired properties the output must have.
Example Problem: Sorting
Input: A sequence of n numbers a1, ..., an.
Output: The permutation (reordering) of the input sequence such that a1 ≤ a2 ≤ ... ≤ an.
We seek algorithms which are correct and efficient.
A faster algorithm running on a slower computer will always win for sufficiently large instances, as we shall see. Usually, problems don’t have to get that large before the faster algorithm wins.
Correctness
For any algorithm, we must prove that it always returns the desired output for all legal instances of the problem.
For sorting, this means even if (1) the input is already sorted, or (2) it contains repeated elements.
Algorithm correctness is not obvious in many optimization problems!
Algorithmic problems must be carefully specified to allow a provably correct algorithm to exist. We can find the “shortest tour” but not the “best tour”.
Expressing Algorithms
We need some way to express the sequence of steps comprising an algorithm.
In order of increasing precision, we have English, pseudocode, and real programming languages. Unfortunately, ease of expression moves in the reverse order.
I prefer to describe the ideas of an algorithm in English, moving to pseudocode to clarify sufficiently tricky details of the algorithm.
Questions?
Topic: Robot Tour Optimization
Robot Tour Optimization
Suppose you have a robot arm equipped with a tool, say a soldering iron. To enable the robot arm to do a soldering job, we must construct an ordering of the contact points, so the robot visits (and solders) the points in order.
We seek the order which minimizes the testing time (i.e. travel distance) it takes to assemble the circuit board.
Find the Shortest Robot Tour
You are given the job to program the robot arm. Give me an algorithm to find the most efficient tour!
Nearest Neighbor Tour
A popular solution starts at some point p0 and then walks to its nearest neighbor p1 first, then repeats from p1, etc. until done.
    Pick and visit an initial point p0
    p = p0
    i = 0
    While there are still unvisited points
        i = i + 1
        Let pi be the closest unvisited point to pi−1
        Visit pi
    Return to p0 from pi
Nearest Neighbor Tour is Wrong!
[Figure: points at −21, −5, −1, 0, 1, 3, 11 on a line, shown in two panels.]
Starting from the leftmost point will not fix the problem.
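The failure can be reproduced with a minimal Python sketch of the nearest-neighbor heuristic, run on the points of the figure. Several things here are assumptions rather than part of the slide: the function names, placing the points on the x-axis, starting at 0, and breaking ties in favor of the point listed first.

```python
import math

def tour_length(tour):
    """Length of the closed tour that visits the points in order and returns to the start."""
    return sum(math.dist(tour[i], tour[(i + 1) % len(tour)]) for i in range(len(tour)))

def nearest_neighbor_tour(points, start):
    """Repeatedly walk to the closest unvisited point until none remain."""
    unvisited = [p for p in points if p != start]
    tour = [start]
    while unvisited:
        # min() keeps the first of several equally close points (assumed tie-break).
        nxt = min(unvisited, key=lambda p: math.dist(tour[-1], p))
        unvisited.remove(nxt)
        tour.append(nxt)
    return tour

# Points of the figure, placed on the x-axis; the list order decides the ties.
points = [(0, 0), (1, 0), (-1, 0), (3, 0), (-5, 0), (11, 0), (-21, 0)]
nn = nearest_neighbor_tour(points, start=(0, 0))
sweep = sorted(points)  # left to right, then straight back to the leftmost point

print(tour_length(nn))     # 84.0
print(tour_length(sweep))  # 64.0
```

Under these assumptions the heuristic hops back and forth across 0 and travels 84 units, while the simple left-to-right sweep needs only 64.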
Closest Pair Tour
Another idea is to repeatedly connect the closest pair of points whose connection will not cause a cycle or a three-way branch, until all points are in one tour.
    Let n be the number of points in the set
    For i = 1 to n − 1 do
        d = ∞
        For each pair of endpoints (x, y) of partial paths
            If dist(x, y) ≤ d then
                xm = x, ym = y, d = dist(x, y)
        Connect (xm, ym) by an edge
    Connect the two endpoints by an edge.
Closest Pair Tour is Wrong!
Although it works correctly on the previous example, other data causes trouble:
[Figure: a point set with spacings labeled 1 + ε and 1 − ε, shown in two panels, (l) and (r).]
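A hedged sketch of the closest-pair heuristic shows the kind of trouble meant here. It assumes the figure is the instance with two rows of points spaced 1 + ε apart within a row and 1 − ε between rows; the choice of three points per row, ε = 0.1, and all function names are illustrative. Unlike the pseudocode's ≤ test, ties go to the first pair found.

```python
import math
from itertools import combinations

def tour_length(points, order):
    """Length of the closed tour visiting points[order[0]], points[order[1]], ... and back."""
    return sum(math.dist(points[order[i]], points[order[(i + 1) % len(order)]])
               for i in range(len(order)))

def closest_pair_tour(points):
    """Repeatedly join the two closest endpoints of different partial paths."""
    paths = [[i] for i in range(len(points))]  # every point starts as its own path
    while len(paths) > 1:
        best = None  # (distance, path index a, path index b, endpoint x, endpoint y)
        for a, b in combinations(range(len(paths)), 2):
            for x in (paths[a][0], paths[a][-1]):
                for y in (paths[b][0], paths[b][-1]):
                    d = math.dist(points[x], points[y])
                    if best is None or d < best[0]:
                        best = (d, a, b, x, y)
        _, a, b, x, y = best
        # Orient both paths so that x ends the first and y starts the second.
        left = paths[a] if paths[a][-1] == x else paths[a][::-1]
        right = paths[b] if paths[b][0] == y else paths[b][::-1]
        paths = [p for k, p in enumerate(paths) if k not in (a, b)] + [left + right]
    return paths[0] if paths else []

eps = 0.1  # assumed value of epsilon
gap_x, gap_y = 1 + eps, 1 - eps
# Two rows of three points: indices 0-2 form the bottom row, 3-5 the top row.
points = [(k * gap_x, 0.0) for k in range(3)] + [(k * gap_x, gap_y) for k in range(3)]

heuristic = closest_pair_tour(points)
boundary = [0, 1, 2, 5, 4, 3]  # walk around the outside of the rectangle

print(tour_length(points, heuristic) > tour_length(points, boundary))  # True
```

The heuristic first pairs points across the narrow gap, then zigzags between the pairs and closes with a long diagonal, so its tour is noticeably longer than the walk around the boundary.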
A Correct Algorithm: Exhaustive Search
We could try all possible orderings of the points, then select the one which minimizes the total length:
    d = ∞
    For each of the n! permutations Πi of the n points
        If (cost(Πi) ≤ d) then
            d = cost(Πi) and Pmin = Πi
    Return Pmin
Since all possible orderings are considered, we are guaranteed to end up with the shortest possible tour.
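The pseudocode above translates almost line for line into Python. This is a minimal sketch: `exhaustive_tour` and `tour_length` are illustrative names, and the test points are the seven collinear points from the nearest-neighbor figure, placed on the x-axis as an assumption.

```python
import math
from itertools import permutations

def tour_length(tour):
    """Length of the closed tour that visits the points in order and returns to the start."""
    return sum(math.dist(tour[i], tour[(i + 1) % len(tour)]) for i in range(len(tour)))

def exhaustive_tour(points):
    """Try every ordering of the points and keep the shortest closed tour."""
    best, best_cost = None, math.inf
    for order in permutations(points):  # all n! orderings
        cost = tour_length(order)
        if cost < best_cost:
            best, best_cost = list(order), cost
    return best, best_cost

points = [(0, 0), (1, 0), (-1, 0), (3, 0), (-5, 0), (11, 0), (-21, 0)]
tour, cost = exhaustive_tour(points)
print(cost)  # 64.0
```

With seven points this checks 7! = 5040 orderings and recovers the 64-unit sweep that the nearest-neighbor heuristic misses.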
Exhaustive Search is Slow!
Because it tries all n! permutations, it is much too slow to use when there are more than 10-20 points.
No efficient, correct algorithm is known for the traveling salesman problem, as we will see later.
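A quick calculation makes the 10-20 point limit concrete. The rate of one billion tours per second is an assumed figure, chosen only to give a sense of scale.

```python
import math

for n in (10, 15, 20):
    print(n, math.factorial(n))
# 10 3628800
# 15 1307674368000
# 20 2432902008176640000

# Assumption: a machine that evaluates 10**9 tours per second.
seconds = math.factorial(20) / 1e9
years = seconds / (365.25 * 24 * 3600)
print(round(years))  # 77
```

Ten points are trivial at that rate, but twenty points would already take roughly a human lifetime.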
Questions?
Topic: Movie Star Scheduling
Selecting the Right Jobs
A movie star wants to select the maximum number of starring roles such that no two jobs require his presence at the same time.
[Figure: the candidate roles: Process Terminated, "Discrete" Mathematics, Halting State, Programming Challenges, Calculated Bets, Tarjan of the Jungle, The President’s Algorist, Steiner’s Tree, The Four Volume Problem.]
The Movie Star Scheduling Problem
Input: A set I of n intervals on the line.
Output: What is the largest subset of mutually non-overlapping intervals which can be selected from I?
Give an algorithm to solve the problem!
Earliest Job First
Start working as soon as there is work available:
    EarliestJobFirst(I)
        Accept the earliest starting job j from I which does not overlap any previously accepted job, and repeat until no more such jobs remain.
Earliest Job First is Wrong!
The first job might be so long (War and Peace) that it prevents us from taking any other job.
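A small sketch reproduces this failure. Jobs are modeled as `(start, end)` pairs, and two jobs that merely touch at an endpoint are assumed not to overlap; the function names and the sample jobs are illustrative, not taken from the slides.

```python
def overlaps(a, b):
    """True when jobs a and b share some time; touching endpoints do not count."""
    return a[0] < b[1] and b[0] < a[1]

def earliest_job_first(jobs):
    """Accept jobs in order of start time, skipping any that clash with an accepted job."""
    accepted = []
    for job in sorted(jobs, key=lambda j: j[0]):
        if all(not overlaps(job, other) for other in accepted):
            accepted.append(job)
    return accepted

# One very long job that starts first, plus three short ones it blocks.
jobs = [(0, 100), (1, 2), (3, 4), (5, 6)]
print(earliest_job_first(jobs))  # [(0, 100)]
```

The heuristic accepts only the long job, although the three short jobs could all be taken.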
Shortest Job First
Always take the shortest possible job, so you spend the least time working (and thus unavailable).
    ShortestJobFirst(I)
        While (I ≠ ∅) do
            Accept the shortest possible job j from I.
            Delete j, and intervals which intersect j from I.
Shortest Job First is Wrong!
Taking the shortest job can prevent us from taking two longer jobs which barely overlap it.
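The same situation in a short sketch, again with jobs as `(start, end)` pairs and touching endpoints assumed not to overlap. The names and the three sample jobs are illustrative.

```python
def overlaps(a, b):
    """True when jobs a and b share some time; touching endpoints do not count."""
    return a[0] < b[1] and b[0] < a[1]

def shortest_job_first(jobs):
    """Repeatedly accept the shortest remaining job and delete everything it intersects."""
    remaining = list(jobs)
    accepted = []
    while remaining:
        job = min(remaining, key=lambda j: j[1] - j[0])
        accepted.append(job)
        remaining.remove(job)
        remaining = [other for other in remaining if not overlaps(job, other)]
    return accepted

# A short job in the middle that barely overlaps two longer ones.
jobs = [(0, 5), (4, 7), (6, 11)]
print(shortest_job_first(jobs))  # [(4, 7)]
```

Taking the short middle job rules out both longer jobs, which do not overlap each other and could have been taken together.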
First Job to Complete
Take the job with the earliest completion date:
    OptimalScheduling(I)
        While (I ≠ ∅) do
            Accept job j with the earliest completion date.
            Delete j, and whatever intersects j from I.
First Job to Complete is Optimal!
Proof: Other jobs may well have started before the first to complete (say, x), but all must at least partially overlap both x and each other.
Thus we can select at most one from the group.
The first of these jobs to complete is x, so selecting any job but x would only block out more opportunities after x.
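One way to sketch OptimalScheduling in Python is to sort the jobs by completion time and scan once, which should accept the same jobs as the delete-as-you-go loop. This formulation, the function name, and the sample jobs are assumptions for illustration; jobs are `(start, end)` pairs and touching endpoints are taken as non-overlapping.

```python
def optimal_scheduling(jobs):
    """Accept each job that starts no earlier than the end of the last accepted job."""
    accepted = []
    last_end = float("-inf")
    for start, end in sorted(jobs, key=lambda j: j[1]):  # earliest completion first
        if start >= last_end:
            accepted.append((start, end))
            last_end = end
    return accepted

# Instances of the kind that defeat the two earlier heuristics.
print(optimal_scheduling([(0, 100), (1, 2), (3, 4), (5, 6)]))  # [(1, 2), (3, 4), (5, 6)]
print(optimal_scheduling([(0, 5), (4, 7), (6, 11)]))           # [(0, 5), (6, 11)]
```

Sorting dominates the cost, so this version runs in O(n log n) time.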
Questions?
Topic: Proof and Counterexample
Demonstrating Incorrectness
Searching for counterexamples is the best way to disprove the correctness of a heuristic.
• Think about all small examples.
• Think about examples with ties on your decision criteria (e.g. pick the nearest point).
• Think about examples with extremes of big and small...
Induction and Recursion
Failure to find a counterexample to a given algorithm does not mean “it is obvious” that the algorithm is correct.
Mathematical induction is a very useful method for proving the correctness of recursive algorithms.
Recursion and induction are the same basic idea: (1) basis case, (2) general assumption, (3) general case.
$\sum_{i=1}^{n} i = n(n+1)/2$
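As a worked illustration of the three steps, the sum formula on this slide can be proved as follows. This is the standard textbook argument, written out here as a sketch rather than taken from the slides.

```latex
\begin{align*}
\text{(1) Basis case } (n = 1):\quad
  & \sum_{i=1}^{1} i = 1 = \frac{1(1+1)}{2}. \\
\text{(2) General assumption:}\quad
  & \sum_{i=1}^{n-1} i = \frac{(n-1)n}{2}. \\
\text{(3) General case:}\quad
  & \sum_{i=1}^{n} i = \left(\sum_{i=1}^{n-1} i\right) + n
    = \frac{(n-1)n}{2} + n
    = \frac{n^2 - n + 2n}{2}
    = \frac{n(n+1)}{2}.
\end{align*}
```

The general case peels off the last term exactly as a recursive algorithm would peel off its last element before calling itself on the rest.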
Questions?
