Graphical Models
• A graphical model is a probabilistic model in which a graph expresses the conditional dependence structure between random variables.
• Graphical models provide a language that facilitates communication between a domain expert and a statistician, allow flexible and modular definitions of families of probability distributions, and are amenable to scalable computational techniques.
• In machine learning, graphical models are a powerful framework for representing and reasoning about the dependencies between variables.
• These models provide a structured way to visualize and compute joint probabilities for a set of variables in complex systems, which is useful for tasks like prediction, decision making, and inference.
Graphical Models
• The graphical model (GM) is a branch of ML that uses a graph to represent a domain problem.
• Probabilistic graphical modeling combines probability theory and graph theory.
• Also called Bayesian networks, belief networks, or probabilistic networks.
• Consists of a graph structure: nodes and arcs.
• Two categories: Bayesian networks and Markov networks.
Graphical Models
• Each node corresponds to a random variable, X, and has a value corresponding to the probability of the random variable, P(X).
• If there is a directed arc from node X to node Y, this indicates that X has a direct influence on Y.
• This influence is specified by the conditional probability P(Y|X).
• In a Bayesian network, the graph is a directed acyclic graph (DAG); namely, there are no cycles.
• The nodes and the arcs between the nodes define the structure of the network, and the conditional probabilities are the parameters given the structure.
Example
• This example models that rain causes the grass to get wet.
• It rains on 40 percent of days, and when it rains, there is a 90 percent chance that the grass gets wet; maybe 10 percent of the time it does not rain long enough for us to really consider the grass wet.
• The random variables in this example are binary; they are either true or false.
• There is a 20 percent probability that the grass gets wet without it actually raining, for example, when a sprinkler is used.
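As a quick check of these numbers, here is a minimal Python sketch that computes the marginal probability of wet grass and the diagnostic probability of rain given wet grass, using only the values quoted above:

    # Values from the slide: P(rain) = 0.4, P(wet | rain) = 0.9, P(wet | no rain) = 0.2
    p_rain = 0.4
    p_wet_given_rain = 0.9
    p_wet_given_no_rain = 0.2

    # Marginal probability that the grass is wet, summing over rain:
    # P(W) = P(W|R) P(R) + P(W|~R) P(~R)
    p_wet = p_wet_given_rain * p_rain + p_wet_given_no_rain * (1 - p_rain)
    print(f"P(grass wet) = {p_wet:.2f}")  # 0.9*0.4 + 0.2*0.6 = 0.48

    # Diagnostic (inverse) inference by Bayes' rule: P(R|W) = P(W|R) P(R) / P(W)
    p_rain_given_wet = p_wet_given_rain * p_rain / p_wet
    print(f"P(rain | grass wet) = {p_rain_given_wet:.2f}")  # 0.36 / 0.48 = 0.75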
Conditional Independence
• In a graphical model, not all nodes are connected; in general, a node is connected to only a small number of other nodes.
• Certain subgraphs imply conditional independence statements, and these allow us to break down a complex graph into smaller subsets in which inference can be done locally and whose results are later propagated over the graph.
Canonical Cases for Conditional Independence
Case 1: Head-to-tail Connection
• Three events may be connected serially, as X → Y → Z. We see here that X and Z are independent given Y: knowing Y tells Z everything; knowing the state of X does not add any extra knowledge for Z; we write P(Z|Y,X) = P(Z|Y). We say that Y blocks the path from X to Z, or in other words, it separates them in the sense that if Y is removed, there is no path between X and Z. In this case, the joint is written as P(X,Y,Z) = P(X) P(Y|X) P(Z|Y).
Case 2: Tail-to-tail Connection
• X may be the parent of two nodes Y and Z. The joint density is written as P(X,Y,Z) = P(X) P(Y|X) P(Z|X). Normally Y and Z are dependent through X; given X, they become independent: P(Y,Z|X) = P(Y|X) P(Z|X).
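To make the two canonical cases concrete, here is a small Python sketch that builds both joints from made-up binary CPTs (the numbers are illustrative, not from the slides) and verifies the stated conditional independences numerically:

    import itertools

    pX = {0: 0.6, 1: 0.4}     # P(X = x); arbitrary illustrative values
    pY_X = {0: 0.7, 1: 0.2}   # P(Y=1 | X = x)
    pZ_Y = {0: 0.1, 1: 0.8}   # P(Z=1 | Y = y), used in the head-to-tail case
    pZ_X = {0: 0.3, 1: 0.9}   # P(Z=1 | X = x), used in the tail-to-tail case

    def bern(p1, v):          # probability of value v for a Bernoulli with P(1) = p1
        return p1 if v == 1 else 1 - p1

    # Case 1 (head-to-tail X -> Y -> Z): joint = P(x) P(y|x) P(z|y)
    joint1 = {(x, y, z): pX[x] * bern(pY_X[x], y) * bern(pZ_Y[y], z)
              for x, y, z in itertools.product((0, 1), repeat=3)}
    # Check P(z | x, y) = P(z | y): X adds nothing once Y is known
    for x, y in itertools.product((0, 1), repeat=2):
        p_z1_given_xy = joint1[(x, y, 1)] / (joint1[(x, y, 0)] + joint1[(x, y, 1)])
        assert abs(p_z1_given_xy - pZ_Y[y]) < 1e-12

    # Case 2 (tail-to-tail Y <- X -> Z): joint = P(x) P(y|x) P(z|x)
    joint2 = {(x, y, z): pX[x] * bern(pY_X[x], y) * bern(pZ_X[x], z)
              for x, y, z in itertools.product((0, 1), repeat=3)}
    # Check P(y, z | x) = P(y|x) P(z|x): Y and Z factorize given X
    for x, y, z in itertools.product((0, 1), repeat=3):
        assert abs(joint2[(x, y, z)] / pX[x]
                   - bern(pY_X[x], y) * bern(pZ_X[x], z)) < 1e-12

    print("both conditional-independence checks pass")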
BAYESIAN NETWORKS
• Directed graphs that do not contain cycles, that is, no loops (DAGs: directed, acyclic graphs), when paired with conditional probability tables, are called Bayesian networks.
• Bayesian networks help us to effectively visualize the probabilistic model for a domain and to study the relationships between random variables in the form of a user-friendly graph.
Why Bayes Network?
• The Bayes optimal classifier is too costly to apply.
• Naïve Bayes makes overly restrictive assumptions: variables are rarely completely independent.
• A Bayes network represents conditional independence relations among the features.
• Representing causal relations makes both the representation and inference efficient.
Bayes Network
• There are two different ways to calculate the conditional probability.
• Given that A and B are dependent events, the conditional probability is calculated as P(A|B) = P(A and B) / P(B).
• If A and B are independent events, the expression for the conditional probability reduces to P(A|B) = P(A).
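A minimal numeric illustration of the two formulas, with arbitrary example probabilities (not from the slides):

    # Dependent case: P(A|B) = P(A and B) / P(B)
    p_b = 0.5          # P(B), illustrative
    p_a_and_b = 0.2    # P(A and B), illustrative
    p_a_given_b = p_a_and_b / p_b
    print(p_a_given_b)  # 0.4

    # Independent case: P(A and B) = P(A) P(B), so P(A|B) reduces to P(A)
    p_a = 0.4
    assert abs((p_a * p_b) / p_b - p_a) < 1e-12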
Bayesian Network – Example 1
• The probability of a random variable depends on its parents.
• Bayesian network models capture both conditionally dependent and conditionally independent relationships between random variables.
• Create a Bayesian network that will model the marks of a student in an examination.
Bayesian Network – Example
• The marks (m) will depend on:
  Exam level (e): (difficult, easy)
  IQ of the student (i): (high, low)
• Marks determine whether the student is admitted (a) to a university.
• IQ determines the aptitude score (s) of the student.
• Each node has a probability table.
Bayesian Network – Example
• Exam level and IQ are parent nodes, each represented by a prior probability table.
• Marks depends on Exam level and IQ, represented by a conditional probability table.
• The conditional probability table for Marks contains an entry for each combination of Exam level and IQ.
• The conditional probability table for Admission contains an entry for Marks.
• The conditional probability table for Aptitude score contains an entry for IQ.
Bayesian Network – Example
Calculate the joint probability:
p(a,m,i,e,s) = p(a|m) p(m|i,e) p(e) p(i) p(s|i)
• p(a|m): CP of the student's admission given marks
• p(m|i,e): CP of the student's marks given IQ and exam level
• p(i): prior probability of the IQ level
• p(e): prior probability of the exam level
• p(s|i): CP of the aptitude score given IQ level
Bayesian Network – Example
• Calculate the probability that, in spite of the exam level being difficult, a student with a low IQ and a low aptitude score manages to pass the exam and secure admission to the university.
• This joint probability can be written as P[a=1, m=1, i=0, e=1, s=0].
• Reading the values for these conditions off the conditional probability tables:
P[a=1, m=1, i=0, e=1, s=0] = P(a=1 | m=1) · P(m=1 | i=0, e=1) · P(i=0) · P(e=1) · P(s=0 | i=0)
= 0.1 × 0.1 × 0.8 × 0.3 × 0.75 = 0.0018
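The same calculation as a short Python sketch, using the five CPT values read off the slide's tables for this particular assignment:

    p_e1 = 0.3                 # P(e=1): exam is difficult
    p_i0 = 0.8                 # P(i=0): IQ is low
    p_m1_given_i0_e1 = 0.1     # P(m=1 | i=0, e=1): good marks despite low IQ, hard exam
    p_a1_given_m1 = 0.1        # P(a=1 | m=1): admitted given good marks
    p_s0_given_i0 = 0.75       # P(s=0 | i=0): low aptitude score given low IQ

    # p(a,m,i,e,s) = p(a|m) p(m|i,e) p(e) p(i) p(s|i)
    joint = p_a1_given_m1 * p_m1_given_i0_e1 * p_e1 * p_i0 * p_s0_given_i0
    print(f"P(a=1, m=1, i=0, e=1, s=0) = {joint:.4f}")  # 0.0018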
Bayesian Networks – Example 2
• You have a new burglar alarm installed at home.
• It is reliable at detecting burglary, but also sometimes responds to minor earthquakes.
• You have two neighbors, John and Mary, who promised to call you at work when they hear the alarm.
• John always calls when he hears the alarm, but sometimes confuses the telephone ringing with the alarm and calls then, too.
• Mary likes loud music and sometimes misses the alarm.
• Given the evidence of who has or has not called, we would like to estimate the probability of a burglary.
• Probability of no burglary = 1 − 0.01 = 0.99
• Probability of no earthquake = 1 − 0.02 = 0.98
• Probability of no alarm given burglary and earthquake = 1 − 0.95 = 0.05
• Probability that Mary will not call given no alarm = 1 − 0.01 = 0.99
1. What is the probability that the alarm has sounded but neither a burglary nor an earthquake has occurred, and both John and Mary call?
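A sketch of this query as a product of network factors. P(¬B) = 0.99 and P(¬E) = 0.98 follow from the priors above; the three remaining CPT values (P(A|¬B,¬E), P(J|A), P(M|A)) are not reproduced on these slides, so the numbers below are assumptions borrowed from the classic textbook version of the alarm network:

    # P(J, M, A, ~B, ~E) = P(J|A) P(M|A) P(A|~B,~E) P(~B) P(~E)
    p_not_b = 0.99                 # 1 - P(B), with P(B) = 0.01 as above
    p_not_e = 0.98                 # 1 - P(E), with P(E) = 0.02 as above
    p_a_given_not_b_not_e = 0.001  # assumed: alarm sounds with neither cause
    p_j_given_a = 0.90             # assumed: John calls when the alarm sounds
    p_m_given_a = 0.70             # assumed: Mary calls when the alarm sounds

    p = p_j_given_a * p_m_given_a * p_a_given_not_b_not_e * p_not_b * p_not_e
    print(f"P(J, M, A, ~B, ~E) = {p:.6f}")  # ~0.000611 with these values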
Naive Bayes' Classifier
• If the inputs are independent, we get the graph that is called the naive Bayes' classifier, because it ignores possible dependencies, namely correlations, among the inputs and reduces a multivariate problem to a group of univariate problems.
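A minimal Gaussian naive Bayes sketch illustrating this factorization, p(x|C) = ∏ⱼ p(xⱼ|C): the class likelihood is a product of univariate densities, one per input dimension. The toy data and helper names here are illustrative only:

    import numpy as np

    def gauss(x, mu, var):  # univariate Gaussian density
        return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

    X = np.array([[1.0, 2.1], [1.2, 1.9], [3.0, 4.2], [3.1, 3.8]])  # toy inputs
    y = np.array([0, 0, 1, 1])                                      # toy labels

    def predict(x):
        scores = []
        for c in (0, 1):
            Xc = X[y == c]
            prior = len(Xc) / len(X)
            # product of one univariate Gaussian per input dimension
            lik = np.prod([gauss(x[j], Xc[:, j].mean(), Xc[:, j].var() + 1e-9)
                           for j in range(X.shape[1])])
            scores.append(prior * lik)
        return int(np.argmax(scores))

    print(predict(np.array([1.1, 2.0])))  # expected: class 0
    print(predict(np.array([3.0, 4.0])))  # expected: class 1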
The Hidden Markov Model (HMM)
• The Hidden Markov model (HMM) is a statistical model that uses a Markov process containing hidden, unknown parameters.
• In this model, the observed parameters are used to identify the hidden parameters, which are then used for further analysis.
• It is a probabilistic graphical model that is commonly used in statistical pattern recognition and classification.
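As an illustration, here is a minimal forward-algorithm sketch for a two-state HMM that computes the probability of an observed sequence; all transition, emission, and initial probabilities are made-up example values, not from the slides:

    import numpy as np

    A = np.array([[0.7, 0.3],    # hidden-state transition probabilities
                  [0.4, 0.6]])
    B = np.array([[0.9, 0.1],    # emission probabilities P(obs | state)
                  [0.2, 0.8]])
    pi = np.array([0.5, 0.5])    # initial state distribution

    obs = [0, 1, 0]              # an observed sequence

    # Forward pass: alpha[i] = P(obs[0..t], state_t = i), updated per step
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]

    # Summing over the final hidden state gives the sequence likelihood
    print(f"P(observation sequence) = {alpha.sum():.4f}")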