Learning Bayes net parameters with missing data

(2.4 hours to learn)

Summary

There is no closed-form solution for the maximum likelihood parameters of a Bayes net when some of the variables are unobserved. However, it is possible to apply the EM algorithm: the E step uses an inference algorithm to compute posterior marginals over the unobserved variables (specifically, marginals over each node and its parents), and the M step computes maximum likelihood parameters exactly as in the fully observed case, with expected counts standing in for observed counts.
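
As a rough illustration, here is a minimal EM sketch for a toy two-node net Z → X, where Z is always hidden and X is always observed. The network, function names, and data below are invented for this example; a real implementation would call out to generic toolboxes for inference and for fully observed parameter learning, as discussed in the goals below.

```python
import numpy as np

# Toy two-node Bayes net Z -> X: Z (n_z states) is never observed,
# X (n_x states) always is. theta_z[k] = P(Z=k), theta_x[k, v] = P(X=v | Z=k).
def em(x_data, n_z, n_x, n_iters=50, seed=0):
    rng = np.random.default_rng(seed)
    theta_z = rng.dirichlet(np.ones(n_z))            # random initialization
    theta_x = rng.dirichlet(np.ones(n_x), size=n_z)  # one row per state of Z

    for _ in range(n_iters):
        # E step: inference gives the posterior marginal P(Z | X = x_n)
        # for each data case under the current parameters.
        joint = theta_z[None, :] * theta_x[:, x_data].T      # shape (N, n_z)
        posterior = joint / joint.sum(axis=1, keepdims=True)

        # M step: maximum likelihood exactly as in the fully observed
        # case, but with expected ("soft") counts from the E step.
        theta_z = posterior.mean(axis=0)
        counts = np.array([posterior[x_data == v].sum(axis=0)
                           for v in range(n_x)]).T           # shape (n_z, n_x)
        theta_x = counts / counts.sum(axis=1, keepdims=True)

    return theta_z, theta_x

# Usage: 200 draws of a binary X; EM fits a 2-state hidden Z.
x = np.concatenate([np.zeros(120, dtype=int), np.ones(80, dtype=int)])
theta_z, theta_x = em(x, n_z=2, n_x=2)
```

Note that the E step only needs posterior marginals from the inference routine, and the M step reuses the fully observed estimator with fractional counts in place of integer counts.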

Context

This concept has the prerequisites:

Goals

  • Be able to use the EM algorithm to learn Bayes net parameters when some of the variables are unobserved.
    • Know how to derive the update rules (a sketch of the derivation appears after this list).
    • Know how you would implement it if you're given toolboxes for inference and for parameter learning with fully observed data. What outputs are needed from the inference algorithm?
  • What is the missing at random assumption, and why is it needed to apply EM?
  • In the fully observed case, maximum likelihood decomposes into separate estimation problems for each clique. Why doesn't that happen when there is missing data?
    • And why does the decomposition hold in the M step?
  • Give an example where the likelihood function is multimodal (and therefore you shouldn't always expect to find the global optimum).
  • Give an example where the model is unidentifiable, i.e., multiple parameter settings are equally good. (One example covering both of these goals is sketched after this list.)
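
The derivation asked for above is short; here is one hedged sketch, using our own notation \(\theta_{x_i \mid u_i}\) for the CPT entry \(P(x_i \mid u_i)\). It also answers the question about inference outputs: the E step only needs the posterior family marginals.

```latex
% Expected complete-data log-likelihood for one data case with evidence e,
% where the expectation is over the E-step posterior p(z | e, theta^(t)):
\mathbb{E}\bigl[\log p(x \mid \theta)\bigr]
  = \sum_i \sum_{x_i, u_i} p(x_i, u_i \mid e, \theta^{(t)})
      \, \log \theta_{x_i \mid u_i}
% This is a sum of independent terms, one per family (x_i, u_i), which is
% why the M step decomposes exactly as in the fully observed case even
% though the marginal likelihood itself does not. Maximizing each term
% under the normalization constraint gives the update
\theta^{(t+1)}_{x_i \mid u_i}
  = \frac{\sum_n p(x_i, u_i \mid e^{(n)}, \theta^{(t)})}
         {\sum_{x_i'} \sum_n p(x_i', u_i \mid e^{(n)}, \theta^{(t)})}
```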

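For the last two goals, the simplest example is label switching in a model with a hidden variable:

```latex
% For a net with hidden Z and observed evidence x, the likelihood is
p(x \mid \theta) = \sum_{k} P(Z = k)\, P(x \mid Z = k)
% Relabeling the hidden states (permuting k together with the
% corresponding parameters) leaves this sum unchanged, so with K hidden
% states every parameter setting has K! - 1 equally good "mirror images".
% The model is therefore unidentifiable, and the likelihood surface has
% multiple symmetric modes.
```
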
Core resources (read/watch one of the following)

-Free-

Coursera: Probabilistic Graphical Models (2013)
An online course on probabilistic graphical models.
Author: Daphne Koller
Other notes:
  • The lecture "EM in practice" has good practical advice about using EM, and "Latent variables" talks about some cool applications.
  • Click on "Preview" to see the videos.

Supplemental resources (the following are optional, but you may find them useful)

-Free-

Coursera: Machine Learning
An online machine learning course aimed at advanced undergraduates.
Author: Pedro Domingos
Other notes:
  • Click on "Preview" to see the videos.

See also

-No Additional Notes-