Graphical Models Prepared By: Nivetha Department of Computer Science and Engineering
Graphical Models • A graphical model is a probabilistic model for which a graph expresses the conditional dependence structure between random variables. • It provides a language that facilitates communication between a domain expert and a statistician, offers flexible and modular definitions of families of probability distributions, and is amenable to scalable computational techniques. • Graphical models in machine learning are a powerful framework used to represent and reason about the dependencies between variables. • These models provide a structured way to visualize and compute joint probabilities for a set of variables in complex systems, which is useful for tasks like prediction, decision making, and inference.
Graphical Models • The graphical model (GM) is a branch of ML that uses a graph to represent a domain problem • Probabilistic graphical modeling combines probability theory and graph theory • Also called Bayesian networks, belief networks, or probabilistic networks • Consists of a graph structure: nodes and arcs • Two categories: Bayesian networks and Markov networks
Graphical Models • Each node corresponds to a random variable, X, and has a value corresponding to the probability of the random variable, P(X). • If there is a directed arc from node X to node Y, this indicates that X has a direct influence on Y. • This influence is specified by the conditional probability P(Y|X). • In a Bayesian network, the graph is a directed acyclic graph (DAG); namely, there are no cycles. • The nodes and the arcs between the nodes define the structure of the network, and the conditional probabilities are the parameters given the structure.
Example • This example models that rain causes the grass to get wet. • It rains on 40 percent of days, and when it rains, there is a 90 percent chance that the grass gets wet; maybe 10 percent of the time it does not rain long enough for us to really consider the grass wet. • The random variables in this example are binary; they are either true or false. • There is a 20 percent probability that the grass gets wet without it actually raining, for example, when a sprinkler is used.
Ex: Bayesian network modeling that rain is the cause of wet grass
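A minimal sketch of this two-node network in plain Python, using only the probabilities quoted above (P(rain) = 0.4, P(wet | rain) = 0.9, P(wet | no rain) = 0.2); the variable names are illustrative.

```python
# Two-node Bayesian network: Rain (R) -> WetGrass (W), values from the example above.
P_R = 0.4                      # P(rain)
P_W_given_R = {True: 0.9,      # P(wet | rain)
               False: 0.2}     # P(wet | no rain), e.g. a sprinkler was used

def joint(r, w):
    """Joint probability P(R=r, W=w) = P(R=r) * P(W=w | R=r)."""
    p_r = P_R if r else 1 - P_R
    p_w = P_W_given_R[r] if w else 1 - P_W_given_R[r]
    return p_r * p_w

# Marginal probability that the grass is wet:
P_W = joint(True, True) + joint(False, True)   # 0.9*0.4 + 0.2*0.6 = 0.48

# Diagnostic inference by Bayes' rule: probability it rained given the grass is wet.
P_R_given_W = joint(True, True) / P_W           # 0.36 / 0.48 = 0.75
print(P_W, P_R_given_W)
```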
Conditional Independence • In a graphical model, not all nodes are connected; in general, a node is connected to only a small number of other nodes. • Certain subgraphs imply conditional independence statements, and these allow us to break down a complex graph into smaller subsets in which inference can be done locally and whose results are later propagated over the graph.
Canonical Cases for Conditional Independence • Head-to-tail Connection • Tail-to-Tail Connection • Head-to-Head Connection
Canonical Cases for Conditional Independence Case 1: Head-to-tail Connection • Three events may be connected serially, as seen in the figure. We see here that X and Z are independent given Y: knowing Y tells Z everything; knowing the state of X does not add any extra knowledge about Z; we write P(Z|X,Y) = P(Z|Y). We say that Y blocks the path from X to Z, or in other words, it separates them in the sense that if Y is removed, there is no path between X and Z. In this case, the joint is written as P(X, Y, Z) = P(X) P(Y|X) P(Z|Y).
Case 2: Tail-to-tail X may be the parent of two nodes Y and Z. The joint density is written as P(X, Y, Z) = P(X) P(Y|X) P(Z|X). Normally Y and Z are dependent through X; given X, they become independent: P(Y, Z|X) = P(Y|X) P(Z|X).
Case 3: Head-to-head There are two parents X and Y to a single node Z. The joint density is written as P(X, Y, Z) = P(X) P(Y) P(Z|X,Y). X and Y are independent a priori, but they become dependent once Z (or any of its descendants) is observed.
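A short numerical illustration of the head-to-head case (all probability values below are made up for this sketch, not taken from the slides): X and Y are independent a priori, but once Z is observed, learning Y changes our belief about X, the "explaining away" effect.

```python
# Head-to-head: X -> Z <- Y, with P(X, Y, Z) = P(X) P(Y) P(Z | X, Y).
# The numbers below are illustrative assumptions.
P_X, P_Y = 0.3, 0.5
P_Z_given_XY = {(1, 1): 0.95, (1, 0): 0.80, (0, 1): 0.60, (0, 0): 0.10}

def joint(x, y, z):
    px = P_X if x else 1 - P_X
    py = P_Y if y else 1 - P_Y
    pz = P_Z_given_XY[(x, y)] if z else 1 - P_Z_given_XY[(x, y)]
    return px * py * pz

# P(X=1 | Z=1) versus P(X=1 | Z=1, Y=1): observing Y "explains away" Z,
# so the two values differ, i.e. X and Y become dependent once Z is known.
p_x_given_z = sum(joint(1, y, 1) for y in (0, 1)) / \
              sum(joint(x, y, 1) for x in (0, 1) for y in (0, 1))
p_x_given_zy = joint(1, 1, 1) / sum(joint(x, 1, 1) for x in (0, 1))
print(p_x_given_z, p_x_given_zy)
```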
Combining Subgraphs
Advantages • The number of probability values that need to be stored is smaller than for the full joint distribution • We do not need to designate explicitly certain variables as input and certain others as output.
Example Graphical Model • Naïve Bayes Classifier • Hidden Markov model
Example Graphical Model Naive Bayes’ Classifier Hidden Markov Model
Classification
BAYESIAN NETWORKS  Directed graphs not contain cycles, that is, there cannot be any loops in the graphs(DAGs: directed, acyclic graphs) , when they are paired with the conditional probability tables, they are called Bayesian networks  Bayesian Networks help us to effectively visualize the probabilistic model for each domain and to study the relationship between random variables in the form of a user-friendly graph.
Why Bayes Network?  Bayes optimal classifier is too costly to apply  Naïve Bayes makes overly restrictive independence assumptions  But variables are rarely completely independent in practice  A Bayes network represents the conditional independence relations among the features  Representing causal relations makes both the representation and inference efficient.
Bayes Network  Two different ways to calculate the conditional probability.  Given A and B are dependent events, the conditional probability is calculated as P (A| B) = P (A and B) / P (B)  If A and B are independent events, then the expression for conditional probability is given by, P(A| B) = P (A)
Bayesian Network – example 1 o The probability of a random variable depends on its parents. o Bayesian network models capture both conditionally dependent and conditionally independent relationships between random variables.  Create a Bayesian Network that will model the marks of a student in an examination
Bayesian Network- example The marks will depend on  Exam level (e): (difficult, easy)  IQ of the student (i): (high, low)  Marks (m) determine whether the student is admitted (a) to a university  The IQ determines the aptitude score (s) of the student  Each node has a probability table
Bayesian Network- example  Exam level and IQ level are parent nodes – represented the probability  Marks depends on Exam level and IQ level – represented by conditional probability .  Conditional probability table for Marks contains entry for Exam level and IQ level  Conditional probability table for Admission contains entry for Marks  Conditional probability table for Apti score contains entry for IQ level
Bayesian Network- example  Calculate Joint probability p(a,m,i,e,s)=p(a|m) p(m|i,e) p(e) p(i) p(s|i)  p(a|m) : CP of student admit-> marks  p(m|i,d):cp of the student’s marks ->(IQ & Exam level)  p(i): probability -> IQ level  p(e): probability -> exam level  p(a): probability ->aptitude level  p(s|i) CP of aptitude scores ->IQ level
Bayesian Network- example Calculate the probability that in spite of the exam level being difficult, the student having a low IQ level and a low Aptitude Score, manages to pass the exam and secure admission to the university. Joint Probability Distribution can be written as P[a=1, m=1, i=0, e=1, s=0] From the above Conditional Probability tables, the values for the given conditions are fed to the formula and is calculated as below. P[a=1, m=1, i=0, e=0, s=0] = P(a=1 | m=1) . P(m=1 | i=0, e=1) . P(i=0) . P(e=1) . P(s=0 | i=0) = 0.1 * 0.1 * 0.8 * 0.3 * 0.75 = 0.0018
Bayesian Networks – Example 2  You have a new burglar alarm installed at home  It is reliable at detecting burglary, but it also sometimes responds to minor earthquakes  You have two neighbors, John and Mary, who promised to call you at work when they hear the alarm  John always calls when he hears the alarm, but sometimes confuses the telephone ringing with the alarm and calls then too  Mary likes loud music and sometimes misses the alarm  Given the evidence of who has or has not called, we would like to estimate the probability of a burglary
Probability of no burglary = 1 - 0.01 = 0.99 Probability of no earthquake = 1 - 0.02 = 0.98 Probability of no alarm given burglary and earthquake = 1 - 0.95 = 0.05 Probability that Mary will not call given no alarm = 1 - 0.01 = 0.99
1. What is the probability that the alarm has sounded but neither a burglary nor an earthquake has occurred, and both John and Mary call? This is the joint probability P(J, M, A, ¬B, ¬E) = P(J|A) P(M|A) P(A|¬B,¬E) P(¬B) P(¬E).
2. What is the probability that John calls? This is the marginal P(J), obtained by summing the joint distribution over all values of the other variables.
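A sketch of how both queries can be answered by enumerating the joint distribution. Only P(B) = 0.01, P(E) = 0.02, P(A|B,E) = 0.95 and P(M|¬A) = 0.01 are quoted in the slide; the remaining conditional probabilities below are the standard textbook values for this example and should be treated as assumptions.

```python
from itertools import product

# CPTs for the burglary network (1 = true, 0 = false).
P_B, P_E = 0.01, 0.02
P_A = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}   # P(A=1 | B, E)
P_J = {1: 0.90, 0: 0.05}    # P(JohnCalls=1 | A)
P_M = {1: 0.70, 0: 0.01}    # P(MaryCalls=1 | A)

def joint(b, e, a, j, m):
    """Joint probability via the network factorization P(B)P(E)P(A|B,E)P(J|A)P(M|A)."""
    p  = P_B if b else 1 - P_B
    p *= P_E if e else 1 - P_E
    p *= P_A[(b, e)] if a else 1 - P_A[(b, e)]
    p *= P_J[a] if j else 1 - P_J[a]
    p *= P_M[a] if m else 1 - P_M[a]
    return p

# Q1: alarm sounded, no burglary, no earthquake, and both John and Mary call.
q1 = joint(0, 0, 1, 1, 1)

# Q2: P(John calls) = sum of the joint over all other variables with J = 1.
q2 = sum(joint(b, e, a, 1, m) for b, e, a, m in product((0, 1), repeat=4))
print(q1, q2)
```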
Naive Bayes’ Classifier If the inputs are independent given the class, we have the graph shown, which is called the naive Bayes’ classifier, because it ignores possible dependencies, namely correlations, among the inputs and reduces a multivariate problem to a group of univariate problems
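A minimal sketch of the naive Bayes' factorization, P(C, x1..xd) = P(C) ∏ P(xj | C), for binary features; the priors and per-feature likelihoods below are made-up values for illustration.

```python
# Naive Bayes' classifier as a graphical model: the class C is the single parent
# of every input x_j, so P(C | x) is proportional to P(C) * prod_j P(x_j | C).
P_C = {0: 0.6, 1: 0.4}                 # prior over classes (assumed)
P_x_given_C = {                        # P(x_j = 1 | C) per feature (assumed)
    0: [0.2, 0.5, 0.7],
    1: [0.8, 0.4, 0.1],
}

def posterior(x):
    """Return the normalized class posterior P(C | x) for a binary feature vector x."""
    scores = {}
    for c, prior in P_C.items():
        p = prior
        for pj, xj in zip(P_x_given_C[c], x):
            p *= pj if xj else 1 - pj
        scores[c] = p
    z = sum(scores.values())
    return {c: p / z for c, p in scores.items()}

print(posterior([1, 0, 1]))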
The Hidden Markov model (HMM) • The Hidden Markov model (HMM) is a statistical model based on a Markov process with hidden (unobserved) states. • In this model, the observed outputs are used to infer the hidden states, which are then used for further analysis. • It is a probabilistic graphical model that is commonly used in statistical pattern recognition and classification.
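A compact sketch of an HMM with two hidden states and two observation symbols, scoring an observation sequence with the forward algorithm; the initial, transition, and emission probabilities are assumed values for illustration.

```python
import numpy as np

# Hidden Markov model: the hidden states follow a Markov chain; only emissions are observed.
# pi: initial state distribution, A: transition matrix, B: emission matrix (assumed values).
pi = np.array([0.6, 0.4])
A  = np.array([[0.7, 0.3],
               [0.4, 0.6]])          # A[i, j] = P(next state j | current state i)
B  = np.array([[0.9, 0.1],
               [0.2, 0.8]])          # B[i, k] = P(observation k | state i)

def forward(obs):
    """Forward algorithm: returns the likelihood P(observation sequence) under the model."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()

print(forward([0, 1, 0]))            # likelihood of the observation sequence 0, 1, 0
```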
Hidden Markov Model as a Graphical Model