UNSUPERVISED LEARNING

Supervised and Unsupervised Learning: ID3 and version space are supervised learning algorithms. Unsupervised learning eliminates the teacher and requires that the learners form concepts (categories) on their own. Conceptual clustering is the problem of discovering useful categories in unclassified data (data whose categories are not pre-determined).
CONCEPTUAL CLUSTERING

Unsupervised Learning and Numeric Taxonomy: The clustering problem begins with a collection of unclassified objects and a means of measuring the similarity of objects. The goal is to organize the objects into classes so that similar objects fall in the same class. Numeric taxonomy is one of the oldest approaches to the clustering problem.
CONCEPTUAL CLUSTERING

In numeric taxonomy, objects are represented as collections of features, each with a numeric value. An object is thus a vector of n feature values and can be viewed as a point in n-dimensional space. The similarity of any two objects is measured by the Euclidean distance between them in this space. Using this similarity metric, clustering algorithms build clusters in a bottom-up fashion (an agglomerative clustering strategy).
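Under this representation, the distance metric can be sketched in a few lines of Python (the function name is my own):

```python
import math

def euclidean_distance(a, b):
    # Distance between two objects represented as n-dimensional feature vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Two objects with three numeric features each
d = euclidean_distance([1.0, 2.0, 3.0], [4.0, 6.0, 3.0])  # sqrt(9 + 16 + 0) = 5.0
```

Smaller distances mean more similar objects, so a clustering algorithm merges the pair with the smallest distance first.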
CONCEPTUAL CLUSTERING

The categories are formed by the following approach:
1. Examine all pairs of objects, select the pair with the highest degree of similarity, and make that pair a cluster.
2. Define the features of the cluster as some function, such as the average, of the features of its component members, and replace the component objects with this cluster definition.
3. Repeat this process on the collection of objects until all objects have been reduced to a single cluster.
The result of this algorithm is a binary tree whose leaf nodes are instances and whose internal nodes are clusters of increasing size.
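The three steps above can be sketched as follows. The `agglomerate` helper, the average-based cluster definition, and the tuple-based tree representation are illustrative choices of mine, not the textbook's code:

```python
import math

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def agglomerate(objects):
    # Each cluster is a (definition, tree) pair; initially every object is its own cluster
    clusters = [(obj, obj) for obj in objects]
    while len(clusters) > 1:
        # Step 1: find the most similar (closest) pair of clusters
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda p: distance(clusters[p[0]][0], clusters[p[1]][0]),
        )
        (def_i, tree_i), (def_j, tree_j) = clusters[i], clusters[j]
        # Step 2: the merged cluster's features are the average of its members' features
        merged = [(x + y) / 2 for x, y in zip(def_i, def_j)]
        # Step 3: replace the pair with the new cluster and repeat
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append((merged, (tree_i, tree_j)))
    # A binary tree: leaves are instances, internal nodes are merges of increasing size
    return clusters[0][1]

tree = agglomerate([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
```

Here the two nearby points are merged first, so they end up as siblings deep in the tree while the distant point joins only at the root.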
CONCEPTUAL CLUSTERING
CONCEPTUAL CLUSTERING

We may extend this algorithm to objects represented as sets of symbolic features. The only difficulty is in measuring the similarity of such objects. A simple similarity metric is the proportion of features that any two objects have in common:
object 1 = {small, red, rubber, ball}
object 2 = {small, blue, rubber, ball}
object 3 = {large, black, wooden, ball}
similarity(object 1, object 2) = 3/4
similarity(object 1, object 3) = 1/4
similarity(object 2, object 3) = 1/4
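This proportion-of-common-features metric can be sketched directly with Python sets; dividing by the larger object's feature count reproduces the 3/4 and 1/4 values above:

```python
def similarity(a, b):
    # Proportion of features two objects have in common,
    # relative to the larger object's feature count
    return len(a & b) / max(len(a), len(b))

object1 = {"small", "red", "rubber", "ball"}
object2 = {"small", "blue", "rubber", "ball"}
object3 = {"large", "black", "wooden", "ball"}

s12 = similarity(object1, object2)  # 3/4: share small, rubber, ball
s13 = similarity(object1, object3)  # 1/4: share only ball
```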
CONCEPTUAL CLUSTERING

In defining categories, we cannot give all features equal weight. In any given context, certain of an object's features are more important than others, yet simple similarity metrics treat all features equally. The feature weights should be set according to the goals of the categorization.
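One possible weighted variant (the weighting scheme below is my own illustration, not a standard formula) lets important features count for more than unimportant ones:

```python
def weighted_similarity(a, b, weights):
    # Each feature contributes its context-dependent weight;
    # features not listed default to weight 1.0
    shared = sum(weights.get(f, 1.0) for f in a & b)
    total = sum(weights.get(f, 1.0) for f in a | b)
    return shared / total

object1 = {"small", "red", "rubber", "ball"}
object2 = {"small", "blue", "rubber", "ball"}

# In a context where material matters most, weight material features up
material_weights = {"rubber": 3.0, "wooden": 3.0}
s = weighted_similarity(object1, object2, material_weights)
```

Because the two objects share the heavily weighted `rubber` feature, their weighted similarity (5/7) is higher than their unweighted ratio (3/5) under this metric.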
CLUSTER/2

CLUSTER/2 forms k categories by constructing clusters around k seed objects. The parameter k is user-adjustable. CLUSTER/2 evaluates the resulting clusters, selecting new seeds and repeating the process until its quality criteria are met.
CLUSTER/2

The algorithm:
• Select k seeds from the set of observed objects. This may be done randomly or according to some selection function.
• For each seed, using that seed as a positive instance and all other seeds as negative instances, produce a maximally general definition that tries to cover all of the non-seed instances, stopping only when blocked by the negative instances (the other seeds).
CLUSTER/2

The algorithm (continued):
• Classify all objects in the sample according to these descriptions. Note that this may lead to multiple classifications of other (non-seed) objects.
CLUSTER/2

The algorithm (continued):
• Replace each maximally general description with a maximally specific description that covers all objects in the category. This decreases the likelihood that the classes overlap on unseen objects.
CLUSTER/2

The algorithm (continued):
• The classes may still overlap on the given objects. Using a distance metric, select the element closest to the center of each class. Using these central elements as new seeds, repeat the steps above.
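Selecting the central element of a class can be sketched as follows, assuming numeric feature vectors; the helper names are my own:

```python
import math

def central_element(cls):
    # Mean feature vector of the class, then the member nearest to it;
    # that member becomes the seed for the next iteration
    center = [sum(vals) / len(cls) for vals in zip(*cls)]
    return min(cls, key=lambda obj: math.dist(obj, center))

# The outlier pulls the center away, but the seed must be an actual member
new_seed = central_element([[0, 0], [1, 0], [10, 10]])
```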
CLUSTER/2

The algorithm (continued):
• Stop when the clusters are satisfactory. A typical quality metric is the complexity of the general descriptions of the classes: for instance, we might prefer clusterings that yield syntactically simple definitions, such as those with a small number of conjuncts.
CLUSTER/2

The algorithm (continued):
• If the clusters are unsatisfactory and no improvement occurs over several iterations, select the new seeds closest to the edge of each cluster rather than those at the center. This favors the creation of entirely new clusters.
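Putting the loop together, a deliberately simplified numeric analogue of CLUSTER/2 might look like the sketch below. It replaces the symbolic maximally general/specific descriptions with nearest-seed classification, so it captures only the control structure (seed, classify, re-seed from central elements, repeat until stable), not the concept-description machinery:

```python
import math

def cluster2_sketch(objects, seeds, iterations=10):
    # Simplified numeric analogue of the CLUSTER/2 loop (illustrative only)
    k = len(seeds)
    for _ in range(iterations):
        # Classify every object with the nearest seed (stand-in for the
        # symbolic category descriptions)
        classes = [[] for _ in range(k)]
        for obj in objects:
            nearest = min(range(k), key=lambda i: math.dist(obj, seeds[i]))
            classes[nearest].append(obj)
        # The element closest to each class's center becomes the new seed
        new_seeds = []
        for j, cls in enumerate(classes):
            if not cls:                      # empty class: keep the old seed
                new_seeds.append(seeds[j])
                continue
            center = [sum(vals) / len(cls) for vals in zip(*cls)]
            new_seeds.append(min(cls, key=lambda o: math.dist(o, center)))
        if new_seeds == seeds:               # stable: clusters are satisfactory
            break
        seeds = new_seeds
    return classes

objs = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
groups = cluster2_sketch(objs, seeds=[objs[0], objs[1]])
```

Even with both initial seeds drawn from the same natural group, re-seeding from central elements pulls one seed into the far group, and the loop stabilizes on the two obvious clusters.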
Assignment

Read Sections 9.6.1 and 9.6.2 of Luger.
