Observable Markov models explained visually

by Will Paul

Observable Markov models (or just Markov models/chains) are kind of scary by default: who's Markov, what's observable, and why do I care? To make matters worse, most textbooks start with a technical definition and throw in some lambdas and Greek letters for good measure. In reality the concept is simple and acts as a building block for a number of sequential classifiers.

All Markov models are built around one simplifying assumption that a guy named Markov posited over a hundred years ago: what if the chance of the next event depended only on the previous event (or on a fixed number of previous events)? It's a simplifying assumption because in most cases it just isn't true; the real world is far more complex than that. So why bother? Because this assumption makes it possible to simulate and predict events efficiently with data we can actually collect.
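Written out (using \(q_t\) for the state at time \(t\); the notation here is a standard convention, not something from the original visualization), the first-order version of the assumption says:

\[
P(q_t \mid q_{t-1}, q_{t-2}, \ldots, q_1) = P(q_t \mid q_{t-1})
\]

In other words, once you know the current state, the entire earlier history adds nothing.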

So what sort of events can we predict with Markov models? The classic textbook example is weather. Imagine we're predicting weather in Rochester and decide to bucket the weather into two possible states: cloudy and raining. We might look through the weather data in aggregate and find that there is a 50/50 chance of rain vs. clouds and decide that the best simulation we could make of the Rochester weather was effectively a coin toss: heads for rain, tails for clouds.

However, with a Markov model we can actually do better. By looking at the order of weather events in our data, we uncover the fact that weather is sort of 'sticky', i.e. when it's raining it's more likely to stay raining than to change, and vice versa. This is exactly the sort of information that can be encoded in a Markov model. Simply increase the probability of rainy \(\rightarrow\) rainy and cloudy \(\rightarrow\) cloudy, and suddenly the model outperforms the old coin toss approach. Obviously this is a contrived example, but Markov models are used all around you in the real world, from the autocomplete on your phone to Google's PageRank algorithm.
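To make the 'sticky' idea concrete, here's a minimal Python sketch of sampling from such a chain. The 0.8/0.2 transition probabilities are invented for illustration, not real Rochester data:

```python
import random

# Made-up 'sticky' transition probabilities (for illustration only):
# each state has a 0.8 chance of repeating and a 0.2 chance of switching.
TRANSITIONS = {
    'rainy':  {'rainy': 0.8, 'cloudy': 0.2},
    'cloudy': {'rainy': 0.2, 'cloudy': 0.8},
}

def next_state(current):
    """Sample the next weather state given only the current one."""
    r = random.random()
    cumulative = 0.0
    for state, prob in TRANSITIONS[current].items():
        cumulative += prob
        if r < cumulative:
            return state
    return state  # guard against floating-point rounding at the top end

def simulate(days, start='cloudy'):
    """Generate `days` of weather by repeatedly applying the chain."""
    sequence = [start]
    for _ in range(days - 1):
        sequence.append(next_state(sequence[-1]))
    return sequence

print(simulate(10))
```

Run it a few times: the output tends to come in streaks of the same state, which is exactly the stickiness a plain coin toss can't capture.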

Instructions

Each node in the graph below represents a state and each link represents the probability of transitioning from one node to the next. These correspond to the cells in the transition matrix, which give the probability of going from the row node to the column node. Note that the outgoing paths from each node (the rows) sum to 1, but no such restriction is placed on the columns.


This transition matrix shows the probability of going from one state (rows) to another (columns).
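For reference, here's how such a matrix might look in code; a minimal sketch reusing the made-up probabilities from above, with a check that mirrors the row-sum restriction just described:

```python
# The same chain as a transition matrix: rows are the current state,
# columns are the next state (same made-up numbers as earlier).
STATES = ['rainy', 'cloudy']
MATRIX = [
    [0.8, 0.2],  # rainy  -> rainy, cloudy
    [0.2, 0.8],  # cloudy -> rainy, cloudy
]

# Each row must sum to 1: from any state, you have to end up somewhere.
for state, row in zip(STATES, MATRIX):
    assert abs(sum(row) - 1.0) < 1e-9, f"row for {state!r} does not sum to 1"

def transition_prob(frm, to):
    """Look up the probability of going from the row state to the column state."""
    return MATRIX[STATES.index(frm)][STATES.index(to)]

print(transition_prob('rainy', 'cloudy'))  # 0.2
# Columns carry no such restriction: a column sum just reflects how
# 'popular' a destination state is, and needn't equal 1.
```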


You might be wondering why I call these 'observable' Markov models. It's to emphasize that none of the states are hidden. For example, whether it's rainy or cloudy is something that can usually be measured automatically. A powerful extension built on top of Markov models (the hidden Markov model) can be used to predict hidden states based on what can be observed: for example, using ice cream sales (observable) to infer the weather (hidden), or converting raw speech sound waves (observable) into text (hidden). But that's another visualization for another day.