The document discusses version space learning, an approach to machine learning in which both the most general and the most specific hypotheses consistent with the training examples are maintained. It begins by introducing concept learning and version spaces, showing how all possible hypotheses can be organized into a lattice. The Find-S and Dual Find-S algorithms are presented for generalizing hypotheses on positive examples and specializing them on negative examples, and are then combined into the version-space approach. The key property of version spaces is that they track all hypotheses consistent with the examples seen so far, avoiding premature commitment to a single hypothesis.
Overview of machine learning, highlighting neural-net approaches and symbolic approaches such as decision trees and version spaces.
Discussion of version spaces as a technique for concept learning and refining models based on observed examples. Illustration of concept learning through a student's allergic reaction data, focusing on how to infer concepts from examples.
Introduction of hypothesis spaces to formalize concept learning, emphasizing need for expressiveness and specific/general hypotheses.
Definition of inductive learning from examples and hypotheses, and its justification by the inductive learning hypothesis.
Explanation of the Find-S algorithm for concept learning, detailing its operation, properties, and challenges.
Exploration of the Dual Find-S algorithm to specialize hypotheses based on negative training examples.
Introduction to managing hypotheses with version spaces, incorporating both generalizations and specializations.
Process of replacing hypotheses in version spaces using positive and negative examples to refine learning.
Techniques to prune redundant hypotheses in version spaces to ensure convergence and maintain learning efficiency.
Conditions under which version spaces converge to a solution, addressing data inconsistency and hypothesis limitations.
Approaches to classify new examples based on partially learned concepts within version spaces.
Discussion on inductive bias in choosing hypothesis languages and its effect on learning expressive concept representations.
Introduction: Very many approaches to machine learning: Neural Net approaches; Symbolic approaches: version spaces, decision trees, knowledge discovery, data mining, speed-up learning, inductive learning, …
Version Spaces: a concept learning technique based on refining models of the world.
Concept Learning Example: a student has the following observations about having an allergic reaction after meals:

Restaurant   Meal        Day       Cost        Reaction
Alma 3       breakfast   Friday    cheap       Yes (+)
De Moete     lunch       Friday    expensive   No  (-)
Alma 3       lunch       Saturday  cheap       Yes (+)
Sedes        breakfast   Sunday    cheap       No  (-)
Alma 3       breakfast   Sunday    expensive   No  (-)

Concept to learn: under which circumstances do I get an allergic reaction after meals?
In general: There is a set of all possible events. Example: Restaurant X Meal X Day X Cost, with 3 X 3 X 7 X 2 = 126 events. There is a boolean function (implicitly) defined on this set. Example: Reaction: Restaurant X Meal X Day X Cost --> Bool. We have the value of this function for SOME examples only. Find an inductive ‘guess’ of the concept that covers all the examples!
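The size of this event space is just the product of the attribute domains. A quick sanity check in Python (a sketch: the concrete value lists for Meal and Day are assumptions, since the slides only give the counts 3 X 3 X 7 X 2):

```python
from itertools import product

RESTAURANTS = ["Alma 3", "De Moete", "Sedes"]
MEALS = ["breakfast", "lunch", "dinner"]  # third meal value assumed
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]
COSTS = ["cheap", "expensive"]

# Enumerate all possible events and confirm the count from the slide.
events = list(product(RESTAURANTS, MEALS, DAYS, COSTS))
assert len(events) == 3 * 3 * 7 * 2 == 126
```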
Pictured: the set of all possible events, with some given examples (+ and -). Find a concept that covers all the positive examples and none of the negative ones!
Non-determinism: many different ways to solve this! How to choose? (Figure: the same event space, admitting several different concepts that separate the + from the - examples.)
An obvious, but bad choice: the concept IS: (Alma 3 and breakfast and Friday and cheap) OR (Alma 3 and lunch and Saturday and cheap). This does NOT generalize the examples at all! (See the observation table above.)
Pictured: only the positive examples themselves are classified positive; every other event is classified negative.
Equally bad is: the concept is anything EXCEPT: (De Moete and lunch and Friday and expensive) AND (Sedes and breakfast and Sunday and cheap) AND (Alma 3 and breakfast and Sunday and expensive). (See the observation table above.)
Solution: fix a language of hypotheses: we introduce a fixed language of concept descriptions = the hypothesis space. The concept can only be identified as being one of the hypotheses in this language. This avoids the problem of having ‘useless’ conclusions, and forces some generalization/induction to cover more than just the given examples.
Reaction example (see the observation table above): every hypothesis is a 4-tuple. Maximally specific hypotheses, ex.: [Sedes, lunch, Monday, cheap]. Combinations of ? and values are allowed, ex.: [De Moete, ?, ?, expensive] or [?, lunch, ?, ?]. The most general hypothesis is [?, ?, ?, ?]. One more hypothesis: ⊥ (bottom), which denotes no example.
Expressive power of this hypothesis language: conjunctions of explicit, individual properties. Examples: [?, lunch, ?, cheap]: Meal = lunch AND Cost = cheap; [?, lunch, Monday, ?]: Meal = lunch AND Day = Monday; [?, lunch, ?, ?]: Meal = lunch. In addition there are the 2 special hypotheses: [?, ?, ?, ?]: anything; ⊥: nothing.
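The semantics of such a hypothesis is a coverage test. A minimal Python sketch, assuming hypotheses are 4-tuples with '?' as a wildcard and `None` standing in for the bottom hypothesis ⊥ (the names are illustrative):

```python
BOTTOM = None  # plays the role of the bottom hypothesis: covers no event

def covers(h, x):
    """True iff hypothesis h covers event x ('?' matches any value)."""
    if h is BOTTOM:
        return False
    return all(hv == "?" or hv == xv for hv, xv in zip(h, x))

# [?, lunch, ?, cheap] covers exactly the cheap lunches:
assert covers(("?", "lunch", "?", "cheap"),
              ("Alma 3", "lunch", "Friday", "cheap"))
assert not covers(("?", "lunch", "?", "cheap"),
                  ("Alma 3", "breakfast", "Friday", "cheap"))
# [?, ?, ?, ?] covers anything:
assert covers(("?", "?", "?", "?"), ("Sedes", "dinner", "Monday", "expensive"))
```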
Other languages of hypotheses are allowed. Example: identify the color of a given sequence of colored objects. A useful language of hypotheses is a taxonomy: any_color above mono_color and poly_color, with mono_color above red, blue, green, orange, purple. The examples: red: +, purple: -, blue: +.
Important about hypothesis languages: they should have a specific <-> general ordering, corresponding to the set-inclusion of the events they cover.
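For the conjunctive 4-tuple language, this set-inclusion ordering reduces to an attribute-wise check. A sketch, reusing the conventions of the `covers` sketch above:

```python
def more_general_or_equal(h1, h2):
    """True iff every event covered by h2 is also covered by h1."""
    if h2 is BOTTOM:   # bottom is below every hypothesis
        return True
    if h1 is BOTTOM:
        return False
    return all(a == "?" or a == b for a, b in zip(h1, h2))

# Dropping a constraint moves up the ordering:
assert more_general_or_equal(("?", "lunch", "?", "?"),
                             ("?", "lunch", "?", "cheap"))
assert not more_general_or_equal(("?", "lunch", "?", "cheap"),
                                 ("?", "lunch", "?", "?"))
```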
Defining Concept Learning: Given: a set X of possible events (ex.: Eat-events: <Restaurant, Meal, Day, Cost>); an (unknown) target function c: X -> {-, +} (ex.: Reaction: Eat-events -> {-, +}); a language of hypotheses H (ex.: conjunctions: [?, lunch, Monday, ?]); a set of training examples D, with their value under c (ex.: (<Alma 3, breakfast, Friday, cheap>, +), …). Find: a hypothesis h in H such that for all x in D: x is covered by h if and only if c(x) = +.
The inductive learning hypothesis: if a hypothesis approximates the target function well over a sufficiently large number of examples, then the hypothesis will also approximate the target function well on other unobserved examples.
Find-S: a naïve algorithm. Initialize: h := ⊥. For each positive training example x in D: if h does not cover x, replace h by a minimal generalization of h that covers x. Return h.
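A minimal Python sketch of Find-S for the conjunctive 4-tuple language (in this language the minimal generalization is unique: replace every mismatching attribute by '?'; `covers` and `BOTTOM` come from the earlier sketch):

```python
def minimal_generalization(h, x):
    """Smallest step up the ordering so that h covers event x."""
    if h is BOTTOM:
        return tuple(x)  # bottom generalizes to the example itself
    return tuple(hv if hv == xv else "?" for hv, xv in zip(h, x))

def find_s(positive_examples):
    h = BOTTOM  # h := bottom
    for x in positive_examples:
        if not covers(h, x):
            h = minimal_generalization(h, x)
    return h

# The two positive Reaction examples reproduce the trace on the next slide:
positives = [("Alma 3", "breakfast", "Friday", "cheap"),
             ("Alma 3", "lunch", "Saturday", "cheap")]
assert find_s(positives) == ("Alma 3", "?", "?", "cheap")
```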
Reaction example (see the observation table above): minimal generalizations of ⊥ = the individual events; generalization = replace something by ?. Initially: h = ⊥. Positive example 1: h = [Alma 3, breakfast, Friday, cheap]. Positive example 2: h = [Alma 3, ?, ?, cheap]. No more positive examples: return h.
Properties of Find-S: Non-deterministic: depending on H, there may be several minimal generalizations. (Figure: a hypothesis lattice H with Hacker, Scientist and Footballplayer above the events Beth, Jo, Alex.) Beth can be minimally generalized in 2 ways to include a new example Jo.
Properties of Find-S (2): it may pick an incorrect hypothesis (w.r.t. the negative examples). With D: Beth +, Alex -, Jo + in the same lattice, Find-S may settle on a generalization that also covers the negative example Alex: a wrong solution.
Properties of Find-S (3): it cannot detect inconsistency of the training data, ex.: D: Beth +, Jo +, Beth -. Nor can it detect the inability of the language H to learn the concept, ex.: D: Beth +, Alex -, Jo + when no hypothesis in H covers Beth and Jo without covering Alex.
Nice about Find-S: it doesn’t have to remember previous examples! If the previous h already covered all previous examples (say, the first 20), then a minimal generalization h’ will cover them as well.
Dual Find-S: Initialize: h := [?, ?, .., ?]. For each negative training example x in D: if h does cover x, replace h by a minimal specialization of h that does not cover x. Return h.
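A sketch of Dual Find-S under the same conventions. Unlike minimal generalization, a minimal specialization is not unique: any '?' can be filled in with any value different from the example's, so the algorithm must pick one (here, by default, the first option; `DOMAINS` reuses the assumed value lists from the event-space sketch):

```python
DOMAINS = [RESTAURANTS, MEALS, DAYS, COSTS]

def minimal_specializations(h, x):
    """All smallest steps down the ordering so that h no longer covers x."""
    for i, (hv, xv) in enumerate(zip(h, x)):
        if hv == "?":
            for v in DOMAINS[i]:
                if v != xv:
                    yield h[:i] + (v,) + h[i + 1:]

def dual_find_s(negative_examples, choose=next):
    h = ("?", "?", "?", "?")
    for x in negative_examples:
        if covers(h, x):
            h = choose(minimal_specializations(h, x))  # non-deterministic pick
    return h
```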
Reaction example (see the observation table above): Initially: h = [?, ?, ?, ?]. Negative example 1: h = [?, breakfast, ?, ?]. Negative example 2: h = [Alma 3, breakfast, ?, ?]. Negative example 3: h = [Alma 3, breakfast, ?, cheap].
Version Spaces: the idea: perform both Find-S and Dual Find-S. Find-S deals with the positive examples; Dual Find-S deals with the negative examples. BUT: do NOT select 1 minimal generalization or specialization at each step; keep track of ALL minimal generalizations or specializations.
Version spaces: initialization. The boundary sets G and S are initialized to the most general and the most specific hypotheses only: G = { [?, ?, …, ?] }, S = { ⊥ }.
Negative examples: replace the top hypothesis by ALL minimal specializations that DO NOT cover the negative example: G = { [?, ?, …, ?] } becomes G = {h1, h2, …, hn}, while S = { ⊥ }. Invariant: only the hypotheses more specific than the ones of G are still possible: they don’t cover the negative example.
Positive examples: replace the bottom hypothesis by ALL minimal generalizations that DO cover the positive example: S = { ⊥ } becomes S = {h1’, h2’, …, hm’}. Invariant: only the hypotheses more general than the ones of S are still possible: they do cover the positive example.
Later negative examples: replace all the hypotheses in G that cover the next negative example by ALL their minimal specializations that DO NOT cover it: G = {h1, h2, …, hn} becomes, e.g., G = {h1, h21, h22, …, hn}. Invariant: only the hypotheses more specific than the ones of G are still possible: they don’t cover the negative example.
Later positive examples: replace all the hypotheses in S that do not cover the next positive example by ALL their minimal generalizations that DO cover it: S = {h1’, h2’, …, hm’} becomes, e.g., S = {h11’, h12’, h13’, h2’, …, hm’}. Invariant: only the hypotheses more general than the ones of S are still possible: they do cover the positive example.
Optimization for negative examples: only consider specializations of elements in G that are still more general than some specific hypothesis in S. Here the invariant on S is used!
Optimization for positive examples: only consider generalizations of elements in S that are still more specific than some general hypothesis in G. Here the invariant on G is used!
Pruning on negative examples: the new negative example can also be used to prune all the S-hypotheses that cover it. The invariant only guarantees this for the previous examples, not the last one, so S-hypotheses that cover the last negative example must be removed explicitly.
Pruning on positive examples: the new positive example can also be used to prune all the G-hypotheses that do not cover it.
Eliminate redundant hypotheses: if a hypothesis from G is more specific than another hypothesis from G, eliminate it! Reason: the invariant acts as a wave front: anything above G is not allowed, so the most general elements of G define the real boundary. Obviously, the dual holds for S.
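In code, this pruning keeps only the maximal elements of G (and dually the minimal elements of S). A sketch using `more_general_or_equal` from earlier:

```python
def prune_g(G):
    """Keep only the maximally general hypotheses: they define the boundary."""
    return [g for g in G
            if not any(g2 != g and more_general_or_equal(g2, g) for g2 in G)]

def prune_s(S):
    """Dually, keep only the maximally specific hypotheses in S."""
    return [s for s in S
            if not any(s2 != s and more_general_or_equal(s, s2) for s2 in S)]
```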
Convergence: eventually, G and S MAY get a common element: then Version Spaces has converged to a solution. Remaining examples still need to be checked against the solution.
Positive example <Alma3, lunch, Saturday, cheap>: the minimal generalization of [Alma3, breakfast, Friday, cheap] is [Alma3, ?, ?, cheap]. In G, [?, breakfast, ?, ?] does not match the new example and is pruned; [Alma3, ?, ?, ?] and [?, ?, ?, cheap] remain.
Negative example <Sedes, breakfast, Sunday, cheap>: minimally specialize the general models. The only specialization that is introduced, [Alma3, ?, ?, cheap] (from [?, ?, ?, cheap]), is pruned, because it is more specific than the other general hypothesis [Alma3, ?, ?, ?].
Negative example <Alma 3, breakfast, Sunday, expensive>: the minimal specialization of [Alma3, ?, ?, ?] that remains is [Alma3, ?, ?, cheap]: the same hypothesis as in S! Cheap food at Alma3 produces the allergy!
Version Space Algorithm: Initially: G := { the hypothesis that covers everything }, S := { ⊥ }. For each new positive example: generalize all hypotheses in S that do not cover the example yet, but ensure the following: only introduce minimal changes on the hypotheses; each new specific hypothesis is a specialization of some general hypothesis; no new specific hypothesis is a generalization of some other specific hypothesis. Prune away all hypotheses in G that do not cover the example.
Version Space Algorithm (2): For each new negative example: specialize all hypotheses in G that cover the example, but ensure the following: only introduce minimal changes on the hypotheses; each new general hypothesis is a generalization of some specific hypothesis; no new general hypothesis is a specialization of some other general hypothesis. Prune away all hypotheses in S that cover the example. Repeat until there are no more examples: report S and G; or until S or G becomes empty: report failure.
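The pieces sketched earlier combine into a compact version of the full algorithm (widely known as candidate elimination). The two boundary checks against the opposite set implement the “ensure” conditions above, and the final asserts reproduce the Reaction trace:

```python
def candidate_elimination(examples):
    """examples: iterable of (event, '+' or '-') pairs."""
    G = [("?", "?", "?", "?")]
    S = [BOTTOM]
    for x, label in examples:
        if label == "+":
            G = [g for g in G if covers(g, x)]      # prune G
            new_S = []
            for s in S:
                if covers(s, x):
                    new_S.append(s)
                else:
                    gen = minimal_generalization(s, x)
                    # keep only generalizations still below some element of G
                    if any(more_general_or_equal(g, gen) for g in G):
                        new_S.append(gen)
            S = prune_s(new_S)
        else:
            S = [s for s in S if not covers(s, x)]  # prune S
            new_G = []
            for g in G:
                if not covers(g, x):
                    new_G.append(g)
                else:
                    for spec in minimal_specializations(g, x):
                        # keep only specializations still above some element of S
                        if any(more_general_or_equal(spec, s) for s in S):
                            new_G.append(spec)
            G = prune_g(new_G)
        if not S or not G:
            raise ValueError("data inconsistent or concept not representable in H")
    return S, G

data = [(("Alma 3", "breakfast", "Friday", "cheap"), "+"),
        (("De Moete", "lunch", "Friday", "expensive"), "-"),
        (("Alma 3", "lunch", "Saturday", "cheap"), "+"),
        (("Sedes", "breakfast", "Sunday", "cheap"), "-"),
        (("Alma 3", "breakfast", "Sunday", "expensive"), "-")]
S, G = candidate_elimination(data)
assert S == G == [("Alma 3", "?", "?", "cheap")]  # converged, as on the slides
```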
Properties of VS: Symmetry: positive and negative examples are dealt with in a completely dual way. It does not need to remember previous examples. Noise: VS cannot deal with noise! If a positive example is incorrectly given as negative, then VS eliminates the desired hypothesis from the version space!
Termination: if it terminates because of “no more examples”, then all hypotheses in S and G, and all intermediate hypotheses, are still correct descriptions covering the training data. Example (spaces on termination): G = {[Alma3, ?, ?, ?], [?, ?, Monday, ?]}, S = {[Alma3, ?, Monday, cheap]}, with [Alma3, ?, ?, cheap], [?, ?, Monday, cheap] and [Alma3, ?, Monday, ?] in between. VS makes NO unnecessary choices!
Termination (2): if it terminates because S or G becomes empty, then either the data is inconsistent (noise?), or the target concept cannot be represented in the hypothesis language H. Example: the target concept is [Alma3, breakfast, ?, cheap] OR [Alma3, lunch, ?, cheap]; given examples like <Alma3, breakfast, Sunday, cheap> +, <Alma3, lunch, Sunday, cheap> + and <Alma3, dinner, Sunday, cheap> -, this cannot be learned in our language H.
Which example next? VS can decide itself which example would be most useful next: it can ‘query’ a user for the most relevant additional classification! Example: with the six remaining hypotheses [Alma3, ?, ?, ?], [?, ?, Monday, ?], [Alma3, ?, ?, cheap], [?, ?, Monday, cheap], [Alma3, ?, Monday, ?], [Alma3, ?, Monday, cheap], the event <Alma3, lunch, Monday, expensive> is classified positive by 3 hypotheses and negative by 3 hypotheses: it is the most informative new example.
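One way to pick such a query: score candidate events by how evenly they split the remaining hypotheses. A sketch that assumes the version space is small enough to enumerate explicitly (true for the six hypotheses on this slide):

```python
def vote_split(version_space, x):
    """How evenly event x splits the hypotheses (higher = more informative)."""
    pos = sum(1 for h in version_space if covers(h, x))
    return min(pos, len(version_space) - pos)

def best_query(version_space, candidate_events):
    return max(candidate_events, key=lambda x: vote_split(version_space, x))
```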
Use of partially learned concepts: Example: <Alma3, lunch, Monday, cheap> can be classified as positive: it is covered by all remaining hypotheses! It is enough to check that it is covered by the hypotheses in S (all others generalize these).
Use of partially learned concepts (2): Example: <Sedes, lunch, Sunday, cheap> can be classified as negative: it is not covered by any remaining hypothesis! It is enough to check that it is not covered by any hypothesis in G (all others specialize these).
Use of partially learned concepts (3): Example: <Alma3, lunch, Monday, expensive> can not be classified: it is covered by 3 and not covered by 3 of the hypotheses: no conclusion.
Use of partially learned concepts (4): Example: <Sedes, lunch, Monday, expensive> is covered by 1 and not covered by 5 of the hypotheses: it can only be classified with a certain degree of precision: it probably does not belong to the concept (ratio: 1/6).
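The four cases above fit one small classifier. A sketch: checking S suffices for a confident “+”, checking G for a confident “-”; otherwise only a vote ratio can be reported (the explicit `version_space` list is assumed enumerable, as on these slides):

```python
def classify(S, G, version_space, x):
    if all(covers(s, x) for s in S):
        return "+"   # everything above S covers x as well
    if not any(covers(g, x) for g in G):
        return "-"   # nothing below G covers x
    pos = sum(1 for h in version_space if covers(h, x))
    return "unknown: %d/%d hypotheses vote +" % (pos, len(version_space))
```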
The relevance of inductive BIAS: choosing H. Our hypothesis language H fails to learn some concepts (see the earlier example: [Alma3, breakfast, ?, cheap] OR [Alma3, lunch, ?, cheap]). What about choosing a more expressive language H’? Assume H’ allows conjunctions (as before), but allows disjunction and negation too! Example: Restaurant = Alma3 OR ~(Day = Monday).
Inductive BIAS (2): this language H’ allows us to represent ANY subset of the complete set of all events X. But X has 3 X 3 X 7 X 2 = 126 elements, so we can express 2^126 different hypotheses now!
Inductive BIAS (3): Version Spaces using H’ on the observation table above: S grows from {[Alma 3, breakfast, Friday, cheap]} to {[Alma 3, breakfast, Friday, cheap] OR [Alma 3, lunch, Saturday, cheap]}, while G shrinks from {~[De Moete, lunch, Friday, expensive]} to {~[De Moete, lunch, Friday, expensive] AND ~[Sedes, breakfast, Sunday, cheap]} and finally to {~[De Moete, lunch, Friday, expensive] AND ~[Sedes, breakfast, Sunday, cheap] AND ~[Alma 3, breakfast, Sunday, expensive]}.
Inductive BIAS (4): Resulting version spaces: S = {[Alma 3, breakfast, Friday, cheap] OR [Alma 3, lunch, Saturday, cheap]}, G = {~[De Moete, lunch, Friday, expensive] AND ~[Sedes, breakfast, Sunday, cheap] AND ~[Alma 3, breakfast, Sunday, expensive]}. We haven’t learned anything: we have merely restated our positive and negative examples! In general, in order to be able to learn, we need an inductive BIAS (= an assumption). Example: “The desired concept CAN be described as a conjunction of features.”
Shift of Bias: a practical approach to the bias problem: start VS with a very weak hypothesis language. If the concept is learned: o.k. Else: refine your language and restart VS. This avoids the choice of bias and gives the most general concept that can be learned.