Followup to: Searching for Bayes-Structure
Previously I spoke of mutual information between X and Y, I(X;Y), which is the difference between the sum of the marginal entropies, H(X) + H(Y), and the entropy of the joint probability distribution, H(X,Y).
I gave the example of a variable X, having eight states 1..8, which are all equally probable if we have not yet encountered any evidence; and a variable Y, with states 1..4, likewise all equally probable in the absence of evidence. If we calculate the marginal entropies H(X) and H(Y), we find that X has 3 bits of entropy, and Y has 2 bits.
However, we also know that X and Y are both even or both odd; and this is all we know about the relation between them. So for the joint distribution (X,Y) there are only 16 possible states, all equally probable, for a joint entropy of 4 bits. This is a 1-bit entropy defect, compared to 5 bits of entropy if X and Y were independent. This entropy defect is the mutual information – the information that X tells us about Y, or vice versa, so that we are not as uncertain about one after having learned the other.
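As a quick check of the arithmetic, here is a minimal Python sketch (the helper names entropy, p_joint, and so on are my own for illustration, not anything from this post) that enumerates the 16 equally probable joint states and recovers H(X) = 3 bits, H(Y) = 2 bits, H(X,Y) = 4 bits, and hence 1 bit of mutual information:

```python
from collections import Counter
from math import log2

def entropy(dist):
    """Shannon entropy in bits of a dict mapping outcomes to probabilities."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

# X in 1..8, Y in 1..4, constrained to share parity: 16 equally probable pairs.
joint = [(x, y) for x in range(1, 9) for y in range(1, 5) if x % 2 == y % 2]
p_joint = {xy: 1 / len(joint) for xy in joint}

# Marginal distributions of X and Y.
p_x, p_y = Counter(), Counter()
for (x, y), p in p_joint.items():
    p_x[x] += p
    p_y[y] += p

H_X, H_Y, H_XY = entropy(p_x), entropy(p_y), entropy(p_joint)
print(H_X, H_Y, H_XY)      # 3.0 2.0 4.0
print(H_X + H_Y - H_XY)    # 1.0 -> I(X;Y) = 1 bit
```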
Suppose, however, that there exists a third variable Z. Z has two states, “even” and “odd”, perfectly correlated to the evenness or oddness of (X,Y). In fact, we’ll suppose that Z is just the question “Are X and Y even or odd?”
If we have no evidence about X and Y, then Z itself necessarily has 1 bit of entropy on the information given. There is 1 bit of mutual information between Z and X, and 1 bit of mutual information between Z and Y. And, as previously noted, 1 bit of mutual information between X and Y. So how much entropy for the whole system (X,Y,Z)? You might naively expect that
H(X,Y,Z) = H(X) + H(Y) + H(Z) - I(X;Z) - I(Z;Y) - I(X;Y)
but this turns out not to be the case.
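To see the discrepancy concretely, here is a small Python sketch (again with helper names I made up for illustration) that enumerates the 16 equally probable (X,Y,Z) states. Since Z is fully determined by X and Y, the true joint entropy is still 4 bits, while the naive formula gives 3 + 2 + 1 - 1 - 1 - 1 = 3 bits:

```python
from collections import Counter
from math import log2

def entropy(dist):
    """Shannon entropy in bits of a dict mapping outcomes to probabilities."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

# Enumerate (X, Y, Z): X in 1..8 and Y in 1..4 with matching parity,
# Z being that shared parity, so Z is fully determined by X (or by Y).
states = [(x, y, "even" if x % 2 == 0 else "odd")
          for x in range(1, 9) for y in range(1, 5) if x % 2 == y % 2]
p_xyz = {s: 1 / len(states) for s in states}   # 16 states, each 1/16

def marginal(indices):
    """Marginal distribution over the given coordinate indices (0=X, 1=Y, 2=Z)."""
    m = Counter()
    for s, p in p_xyz.items():
        m[tuple(s[i] for i in indices)] += p
    return m

H = lambda *idx: entropy(marginal(idx))
I = lambda a, b: H(a) + H(b) - H(a, b)         # pairwise mutual information

naive = H(0) + H(1) + H(2) - I(0, 2) - I(2, 1) - I(0, 1)
print(H(0, 1, 2))   # 4.0 bits: the true joint entropy
print(naive)        # 3.0 bits: the naive formula undercounts by 1 bit
```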