Machine Learning: Version Spaces Learning
Introduction: There are very many approaches to machine learning: neural net approaches, and symbolic approaches such as version spaces, decision trees, knowledge discovery, data mining, speed-up learning, inductive learning, …
Version Spaces: a concept learning technique based on refining models of the world.
Concept Learning Example: a student has the following observations about having an allergic reaction after meals:

  Restaurant   Meal        Day       Cost        Reaction
  Alma 3       breakfast   Friday    cheap       Yes (+)
  De Moete     lunch       Friday    expensive   No  (-)
  Alma 3       lunch       Saturday  cheap       Yes (+)
  Sedes        breakfast   Sunday    cheap       No  (-)
  Alma 3       breakfast   Sunday    expensive   No  (-)

Concept to learn: under which circumstances do I get an allergic reaction after meals?
In general: There is a set of all possible events. Example: Restaurant × Meal × Day × Cost, with 3 × 3 × 7 × 2 = 126 possible events. There is a boolean function (implicitly) defined on this set. Example: Reaction: Restaurant × Meal × Day × Cost → Bool. We have the value of this function for SOME examples only. Find an inductive 'guess' of the concept that covers all the examples!
Pictured: [figure: the set of all possible events, with a few events marked + and -] Given some examples, find a concept that covers all the positive examples and none of the negative ones!
Non-determinism: There are many different ways to solve this! How to choose? [figure: the same set of events and examples as before]
An obvious, but bad choice: the concept IS (Alma 3 and breakfast and Friday and cheap) OR (Alma 3 and lunch and Saturday and cheap). This does NOT generalize the examples at all! (Observations table as before.)
Pictured: only the positive examples are classified positive! [figure: a concept boundary drawn tightly around the + events only]
Equally bad is: the concept is anything EXCEPT (De Moete and lunch and Friday and expensive) AND (Sedes and breakfast and Sunday and cheap) AND (Alma 3 and breakfast and Sunday and expensive). (Observations table as before.)
Pictured: everything except the negative examples is classified positive. [figure: a concept covering the whole event space except the - events]
Solution: fix a language of hypotheses: We introduce a fixed language of concept descriptions: the hypothesis space. The concept can only be identified as being one of the hypotheses in this language. This avoids the problem of having 'useless' conclusions, and it forces some generalization/induction, to cover more than just the given examples.
Reaction example (observations as before): Every hypothesis is a 4-tuple: maximally specific, e.g. [Sedes, lunch, Monday, cheap]; combinations of ? and values are allowed, e.g. [De Moete, ?, ?, expensive] or [?, lunch, ?, ?]. One more hypothesis: ⊥ (bottom, which denotes: covers no example). The most general hypothesis: [?, ?, ?, ?].
Hypotheses relate to sets of possible events. Example: x1 = <Alma 3, lunch, Monday, expensive>, x2 = <Sedes, lunch, Sunday, cheap>; h1 = [?, lunch, Monday, ?], h2 = [?, lunch, ?, cheap], h3 = [?, lunch, ?, ?]. [figure: in the space of events, the sets covered by h1 and h2 are subsets of the set covered by h3; x1 lies in h1, x2 in h2. h3 is more general; h1 and h2 are more specific.]
Expressive power of this hypothesis language: conjunctions of explicit, individual properties. Examples: [?, lunch, ?, cheap]: Meal = lunch ∧ Cost = cheap; [?, lunch, Monday, ?]: Meal = lunch ∧ Day = Monday; [?, lunch, ?, ?]: Meal = lunch. In addition, the 2 special hypotheses: ⊥: nothing; [?, ?, ?, ?]: anything.
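To make the covering relation concrete, here is a minimal Python sketch of this hypothesis language (an illustration, not part of the original slides): a hypothesis is a 4-tuple, '?' is a wildcard, and None plays the role of ⊥.

```python
BOTTOM = None  # the bottom hypothesis: covers no event at all

def covers(h, x):
    """True iff hypothesis h covers event x."""
    if h is BOTTOM:
        return False
    return all(hv == '?' or hv == xv for hv, xv in zip(h, x))

x = ('Alma 3', 'lunch', 'Monday', 'cheap')
print(covers(('?', 'lunch', '?', 'cheap'), x))   # True:  Meal = lunch and Cost = cheap
print(covers(('?', 'lunch', 'Sunday', '?'), x))  # False: Day does not match
print(covers(('?', '?', '?', '?'), x))           # True:  the most general hypothesis
```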
Other languages of hypotheses are allowed. Example: identify the color of a given sequence of colored objects; the examples: red: +, purple: -, blue: +. A useful language of hypotheses: [figure: a hierarchy with any_color at the top, mono_color and poly_color below it, the specific colors red, blue, green, orange, purple below those, and ⊥ at the bottom]
Important about hypothesis languages: they should have a specific ↔ general ordering, corresponding to the set-inclusion of the events they cover.
Defining Concept Learning: Given: a set X of possible events (e.g. Eat-events: <Restaurant, Meal, Day, Cost>); an (unknown) target function c: X → {-, +} (e.g. Reaction: Eat-events → {-, +}); a language of hypotheses H (e.g. conjunctions such as [?, lunch, Monday, ?]); a set of training examples D, with their value under c (e.g. (<Alma 3, breakfast, Friday, cheap>, +), …). Find: a hypothesis h in H such that for all x in D: x is covered by h ⟺ c(x) = +.
The inductive learning hypothesis: if a hypothesis approximates the target function well over a sufficiently large number of examples, then the hypothesis will also approximate the target function well on other, unobserved examples.
Find-S: a naïve algorithm. Initialize: h := ⊥. For each positive training example x in D do: if h does not cover x, replace h by a minimal generalization of h that covers x. Return h.
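A minimal Python sketch of Find-S, reusing covers from the sketch above. For this conjunctive language the minimal generalization is unique: keep matching attributes and replace every mismatching one by '?'.

```python
def minimal_generalization(h, x):
    """Smallest generalization of h that covers the positive example x."""
    if h is BOTTOM:          # bottom generalizes to the example itself
        return tuple(x)
    return tuple(hv if hv == xv else '?' for hv, xv in zip(h, x))

def find_s(examples):
    """examples: list of (event, label) pairs, with label '+' or '-'."""
    h = BOTTOM
    for x, label in examples:
        if label == '+' and not covers(h, x):
            h = minimal_generalization(h, x)
    return h

D = [(('Alma 3', 'breakfast', 'Friday', 'cheap'), '+'),
     (('De Moete', 'lunch', 'Friday', 'expensive'), '-'),
     (('Alma 3', 'lunch', 'Saturday', 'cheap'), '+'),
     (('Sedes', 'breakfast', 'Sunday', 'cheap'), '-'),
     (('Alma 3', 'breakfast', 'Sunday', 'expensive'), '-')]
print(find_s(D))  # ('Alma 3', '?', '?', 'cheap'), as in the trace below
```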
Reaction example (observations as before): minimal generalizations = the individual events; generalization = replace something by ?. Initially: h = ⊥. After positive example 1: h = [Alma 3, breakfast, Friday, cheap]. After positive example 3 (the next positive one): h = [Alma 3, ?, ?, cheap]. No more positive examples: return h.
Properties of Find-S: Non-deterministic: depending on H, there may be several minimal generalizations. [figure: a small hypothesis space H over the persons Beth, Jo and Alex, with more general hypotheses Hacker, Scientist and Footballplayer, and ⊥ at the bottom] Beth can be minimally generalized in 2 ways to include a new example Jo.
Properties of Find-S (2): It may pick an incorrect hypothesis (w.r.t. the negative examples). D: Beth +, Alex -, Jo +. [figure: the same hypothesis space H; one of the two minimal generalizations also covers the negative example Alex and is a wrong solution]
Properties of Find-S (3): It cannot detect inconsistency of the training data, e.g. D: Beth +, Jo +, Beth -. Nor can it detect the inability of the language H to learn the concept, e.g. D: Beth +, Alex -, Jo +. [figure: a reduced hypothesis space H in which the only hypothesis covering both Beth and Jo, Scientist, also covers the negative example Alex]
Nice about Find-S: it doesn't have to remember previous examples! If the previous h already covered all previous examples, then a minimal generalization h' will cover them too (e.g. if h already covered the first 20 examples, then h' will as well). [figure: in H, the set of events covered by h is nested inside the set covered by the larger h']
Dual Find-S: Initialize: h := [?, ?, …, ?]. For each negative training example x in D do: if h does cover x, replace h by a minimal specialization of h that does not cover x. Return h.
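A sketch of Dual Find-S under the same representation. A minimal specialization fills one '?' slot with some value the negative example does not have; which slot and which value is a genuinely non-deterministic choice, so this sketch simply takes the first candidate. The attribute domains DOMAINS are read off the Reaction example.

```python
DOMAINS = [('Alma 3', 'De Moete', 'Sedes'),
           ('breakfast', 'lunch', 'dinner'),
           ('Monday', 'Tuesday', 'Wednesday', 'Thursday',
            'Friday', 'Saturday', 'Sunday'),
           ('cheap', 'expensive')]

def minimal_specializations(h, x):
    """All minimal specializations of h that exclude event x."""
    result = []
    for i, hv in enumerate(h):
        if hv == '?':
            for value in DOMAINS[i]:
                if value != x[i]:
                    result.append(h[:i] + (value,) + h[i + 1:])
    return result

def dual_find_s(examples):
    h = ('?', '?', '?', '?')
    for x, label in examples:
        if label == '-' and covers(h, x):
            h = minimal_specializations(h, x)[0]  # arbitrary minimal choice
    return h

print(dual_find_s(D))  # ('Alma 3', 'lunch', '?', '?') -- the trace on the
                       # next slide makes different, equally minimal choices
```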
Reaction example (observations as before): Initially: h = [?, ?, ?, ?]. After negative example 1: h = [?, breakfast, ?, ?]. After negative example 2: h = [Alma 3, breakfast, ?, ?]. After negative example 3: h = [Alma 3, breakfast, ?, cheap].
Version Spaces: the idea: perform both Find-S and Dual Find-S: Find-S deals with the positive examples, Dual Find-S deals with the negative examples. BUT: do NOT select 1 minimal generalization or specialization at each step; keep track of ALL minimal generalizations or specializations.
Version spaces: initialization. The boundary sets G and S are initialized to the most general and the most specific hypotheses only: G = { [?, ?, …, ?] }, S = { ⊥ }.
Negative examples: replace the top hypothesis by ALL minimal specializations that do NOT cover the negative example: G = { [?, ?, …, ?] } becomes G = {h1, h2, …, hn}; S remains { ⊥ }. Invariant: only the hypotheses more specific than the ones of G are still possible: they don't cover the negative example.
Positive examples: replace the bottom hypothesis by ALL minimal generalizations that DO cover the positive example: S = { ⊥ } becomes S = {h1', h2', …, hm'}. Invariant: only the hypotheses more general than the ones of S are still possible: they do cover the positive example.
Later: negative examples: replace all hypotheses in G that cover a next negative example by ALL minimal specializations that do NOT cover it: G = {h1, h2, …, hn} becomes, e.g., G = {h1, h21, h22, …, hn}. Invariant: only the hypotheses more specific than the ones of G are still possible: they don't cover the negative example.
Later: positive examples: replace all hypotheses in S that do not cover a next positive example by ALL minimal generalizations that DO cover it: S = {h1', h2', …, hm'} becomes, e.g., S = {h11', h12', h13', h2', …, hm'}. Invariant: only the hypotheses more general than the ones of S are still possible: they do cover the positive example.
Optimization: negative examples: only consider specializations of elements in G that are still more general than some specific hypothesis in S. Here the invariant on S is used!
Optimization: positive examples: only consider generalizations of elements in S that are still more specific than some general hypothesis in G. Here the invariant on G is used!
Pruning: negative examples: the new negative example can also be used to prune all the S-hypotheses that cover it. The invariant only guarantees consistency with the previous examples, not with the last one: S-hypotheses that cover the last negative example must be removed.
Pruning: positive examples: dually, the new positive example can be used to prune all the G-hypotheses that do not cover it.
Eliminate redundant hypotheses: if a hypothesis in G is more specific than another hypothesis in G, eliminate it! (Obviously the dual holds for S.) Reason: the invariant acts as a wave front: anything more general than G is not allowed, so the most general elements of G define the real boundary.
Convergence: eventually, G and S MAY get a common element: then Version Spaces has converged to a solution. The remaining examples still need to be checked against this solution.
Reaction example. Initialization: G = { [?, ?, ?, ?] } (most general), S = { ⊥ } (most specific).
Alma3, breakfast, Friday, cheap: +. Positive example: minimal generalization of ⊥. S becomes { [Alma3, breakfast, Friday, cheap] }; G remains { [?, ?, ?, ?] }.
DeMoete, lunch, Friday, expensive: -. Negative example: minimal specialization of [?, ?, ?, ?]. There are 15 possible specializations:
[Alma3, ?, ?, ?] [DeMoete, ?, ?, ?] [Sedes, ?, ?, ?]
[?, breakfast, ?, ?] [?, lunch, ?, ?] [?, dinner, ?, ?]
[?, ?, Monday, ?] [?, ?, Tuesday, ?] [?, ?, Wednesday, ?] [?, ?, Thursday, ?] [?, ?, Friday, ?] [?, ?, Saturday, ?] [?, ?, Sunday, ?]
[?, ?, ?, cheap] [?, ?, ?, expensive]
Four of them ([DeMoete, ?, ?, ?], [?, lunch, ?, ?], [?, ?, Friday, ?], [?, ?, ?, expensive]) match the negative example; eight more do not generalize the specific model [Alma3, breakfast, Friday, cheap]. Three remain: [Alma3, ?, ?, ?], [?, breakfast, ?, ?], [?, ?, ?, cheap].
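This filtering can be checked mechanically. A sketch, reusing covers and DOMAINS from the earlier sketches; more_general_or_equal is a hypothetical helper implementing the general/specific ordering.

```python
def more_general_or_equal(h1, h2):
    """True iff h1 covers at least every event that h2 covers."""
    if h2 is BOTTOM:         # bottom is below everything
        return True
    return all(a == '?' or a == b for a, b in zip(h1, h2))

neg = ('De Moete', 'lunch', 'Friday', 'expensive')
s = ('Alma 3', 'breakfast', 'Friday', 'cheap')       # the current S element

candidates = [('?',) * i + (v,) + ('?',) * (3 - i)   # all 15 specializations
              for i in range(4) for v in DOMAINS[i]]
G = [h for h in candidates
     if not covers(h, neg)                  # must exclude the negative example
     and more_general_or_equal(h, s)]       # must stay above the S element
print(G)  # [('Alma 3','?','?','?'), ('?','breakfast','?','?'), ('?','?','?','cheap')]
```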
Result after example 2: S = { [Alma3, breakfast, Friday, cheap] }, G = { [Alma3, ?, ?, ?], [?, breakfast, ?, ?], [?, ?, ?, cheap] }.
Alma3, lunch, Saturday, cheap: +. Positive example: the minimal generalization of [Alma3, breakfast, Friday, cheap] is [Alma3, ?, ?, cheap]. In G, [?, breakfast, ?, ?] does not match the new example and is pruned. Result: S = { [Alma3, ?, ?, cheap] }, G = { [Alma3, ?, ?, ?], [?, ?, ?, cheap] }.
Sedes, breakfast, Sunday, cheap: -. Negative example: minimal specialization of the general models. [Alma3, ?, ?, ?] does not cover the example and stays. For [?, ?, ?, cheap], the only specialization that is introduced, [Alma3, ?, ?, cheap], is pruned again, because it is more specific than the other general hypothesis [Alma3, ?, ?, ?]. Result: S = { [Alma3, ?, ?, cheap] }, G = { [Alma3, ?, ?, ?] }.
Alma 3, breakfast, Sunday, expensive: -. Negative example: the only surviving minimal specialization of [Alma3, ?, ?, ?] is [Alma3, ?, ?, cheap]: the same hypothesis as in S! The version space has converged: cheap food at Alma 3 produces the allergy!
Version Space Algorithm: Initially: G := { the hypothesis that covers everything }, S := { ⊥ }.
For each new positive example: generalize all hypotheses in S that do not cover the example yet, but ensure the following:
- Only introduce minimal changes on the hypotheses.
- Each new specific hypothesis is a specialization of some general hypothesis.
- No new specific hypothesis is a generalization of some other specific hypothesis.
Then prune away all hypotheses in G that do not cover the example.
Version Space Algorithm (2): For each new negative example: specialize all hypotheses in G that cover the example, but ensure the following:
- Only introduce minimal changes on the hypotheses.
- Each new general hypothesis is a generalization of some specific hypothesis.
- No new general hypothesis is a specialization of some other general hypothesis.
Then prune away all hypotheses in S that cover the example.
Repeat until there are no more examples: report S and G. Or until S or G becomes empty: report failure.
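Putting both cases together, a condensed sketch of the full loop, reusing covers, minimal_generalization, minimal_specializations, more_general_or_equal and the example set D from the earlier sketches. In this conjunctive language every positive example has a unique minimal generalization, so S stays a singleton; G can genuinely branch.

```python
def candidate_elimination(examples):
    G = [('?', '?', '?', '?')]
    S = [BOTTOM]
    for x, label in examples:
        if label == '+':
            G = [g for g in G if covers(g, x)]           # prune G
            S = [s if covers(s, x) else minimal_generalization(s, x)
                 for s in S]
            S = [s for s in S                            # keep S below G
                 if any(more_general_or_equal(g, s) for g in G)]
        else:
            S = [s for s in S if not covers(s, x)]       # prune S
            new_G = []
            for g in G:
                if not covers(g, x):
                    new_G.append(g)
                else:
                    new_G.extend(h for h in minimal_specializations(g, x)
                                 if any(more_general_or_equal(h, s)
                                        for s in S))     # keep G above S
            G = [g for g in new_G                        # drop redundant ones
                 if not any(g2 != g and more_general_or_equal(g2, g)
                            for g2 in new_G)]
    return S, G

print(candidate_elimination(D))
# ([('Alma 3', '?', '?', 'cheap')], [('Alma 3', '?', '?', 'cheap')]): converged
```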
Properties of VS: Symmetry: positive and negative examples are dealt with in a completely dual way. It does not need to remember previous examples. Noise: VS cannot deal with noise! If a positive example is wrongly given as negative, then VS eliminates the desired hypothesis from the version space!
Termination: if it terminates because of "no more examples", then all remaining hypotheses, and all intermediate hypotheses between S and G, are still correct descriptions covering the given data. Example (spaces on termination): G = { [Alma3, ?, ?, ?], [?, ?, Monday, ?] }, S = { [Alma3, ?, Monday, cheap] }, with [Alma3, ?, ?, cheap], [?, ?, Monday, cheap] and [Alma3, ?, Monday, ?] in between. VS makes NO unnecessary choices!
Termination (2): if it terminates because S or G becomes empty, then either: the data is inconsistent (noise?), or the target concept cannot be represented in the hypothesis language H. Example: the target concept is [Alma3, breakfast, ?, cheap] ∨ [Alma3, lunch, ?, cheap]; given examples like <Alma3, breakfast, Sunday, cheap> +, <Alma3, lunch, Sunday, cheap> +, <Alma3, dinner, Sunday, cheap> -, this cannot be learned in our language H.
Which example next? VS can decide itself which example would be most useful next: it can 'query' a user for the most relevant additional classification! Example: with the version space above ({ [Alma3, ?, ?, ?], [?, ?, Monday, ?], [Alma3, ?, ?, cheap], [?, ?, Monday, cheap], [Alma3, ?, Monday, ?], [Alma3, ?, Monday, cheap] }), the event <Alma3, lunch, Monday, expensive> is classified positive by 3 hypotheses and negative by 3 hypotheses: it is the most informative new example.
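A sketch of such a query strategy (a hypothetical helper, not spelled out on the slides): enumerate the candidate events and pick one on which the remaining hypotheses split as evenly as possible, so that either answer eliminates about half of the version space. It reuses covers and DOMAINS from the earlier sketches.

```python
from itertools import product

def best_query(version_space):
    """Event on which the remaining hypotheses disagree most evenly."""
    best, best_balance = None, -1
    for x in product(*DOMAINS):                    # all 126 possible events
        pos = sum(covers(h, x) for h in version_space)
        balance = min(pos, len(version_space) - pos)
        if balance > best_balance:
            best, best_balance = x, balance
    return best

VS = [('Alma 3', '?', '?', '?'), ('?', '?', 'Monday', '?'),
      ('Alma 3', '?', '?', 'cheap'), ('?', '?', 'Monday', 'cheap'),
      ('Alma 3', '?', 'Monday', '?'), ('Alma 3', '?', 'Monday', 'cheap')]
print(best_query(VS))  # ('Alma 3', 'breakfast', 'Monday', 'expensive'): 3 vs 3
```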
Use of partially learned concepts: Example: <Alma3, lunch, Monday, cheap> can be classified as positive: it is covered by all remaining hypotheses! It is enough to check that it is covered by the hypotheses in S (all others generalize these).
Use of partially learned concepts (2): Example: <Sedes, lunch, Sunday, cheap> can be classified as negative: it is not covered by any remaining hypothesis! It is enough to check that it is not covered by any hypothesis in G (all others specialize these).
Use of partially learned concepts (3): Example: <Alma3, lunch, Monday, expensive> cannot be classified: it is covered by 3 and not covered by 3 of the hypotheses: no conclusion.
Use of partially learned concepts (4): Example: <Sedes, lunch, Monday, expensive> can only be classified with a certain degree of precision: it is covered by 1 and not covered by 5 of the hypotheses, so it probably does not belong to the concept (ratio: 1/6).
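All four cases can be decided from the boundary sets plus, for the graded case, a vote over the remaining hypotheses. A sketch, reusing covers and the six-hypothesis version space VS from the query sketch above:

```python
def classify(x, S, G, version_space):
    if all(covers(s, x) for s in S):      # then every remaining hypothesis covers x
        return '+'
    if not any(covers(g, x) for g in G):  # then no remaining hypothesis covers x
        return '-'
    pos = sum(covers(h, x) for h in version_space)
    return f'unsure: {pos}/{len(version_space)} hypotheses say +'

S = [('Alma 3', '?', 'Monday', 'cheap')]
G = [('Alma 3', '?', '?', '?'), ('?', '?', 'Monday', '?')]
print(classify(('Alma 3', 'lunch', 'Monday', 'cheap'), S, G, VS))     # +
print(classify(('Sedes', 'lunch', 'Sunday', 'cheap'), S, G, VS))      # -
print(classify(('Sedes', 'lunch', 'Monday', 'expensive'), S, G, VS))  # unsure: 1/6
```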
The relevance of inductive BIAS: choosing H. Our hypothesis language H fails to learn some concepts (see the example: [Alma3, breakfast, ?, cheap] ∨ [Alma3, lunch, ?, cheap]). So what about choosing a more expressive language H'? Assume H' allows conjunctions (as before), but also disjunction and negation. Example: Restaurant = Alma3 ∧ ¬(Day = Monday).
Inductive BIAS (2): this language H' allows representing ANY subset of the complete set of all events X. But X has 3 × 3 × 7 × 2 = 126 elements, so we can now express 2^126 different hypotheses!
Inductive BIAS (3): Version Spaces using H' (observations as before). S evolves from ⊥ to {[Alma 3, breakfast, Friday, cheap]} and then to {[Alma 3, breakfast, Friday, cheap] ∨ [Alma 3, lunch, Saturday, cheap]}. G evolves from the hypothesis that covers everything via {¬[De Moete, lunch, Friday, expensive]} and {¬[De Moete, lunch, Friday, expensive] ∧ ¬[Sedes, breakfast, Sunday, cheap]} to {¬[De Moete, lunch, Friday, expensive] ∧ ¬[Sedes, breakfast, Sunday, cheap] ∧ ¬[Alma 3, breakfast, Sunday, expensive]}.
Inductive BIAS (4): Resulting version spaces: G = {¬[De Moete, lunch, Friday, expensive] ∧ ¬[Sedes, breakfast, Sunday, cheap] ∧ ¬[Alma 3, breakfast, Sunday, expensive]}, S = {[Alma 3, breakfast, Friday, cheap] ∨ [Alma 3, lunch, Saturday, cheap]}. We haven't learned anything: we have merely restated our positive and negative examples! In general, in order to be able to learn, we need an inductive BIAS (= an assumption). Example: "the desired concept CAN be described as a conjunction of features".
Shift of Bias: a practical approach to the bias problem: start VS with a very weak hypothesis language; if the concept is learned: o.k.; else: refine your language and restart VS. This avoids the choice of bias and gives the most general concept that can be learned.