Description
<p style="margin-bottom: 0px; padding: 5px 0px 5px 10px; border: 0px; outline: 0px; vertical-align: baseline; font-family: Arial;">Hidden Markov Models are used in several areas of machine learning, such as speech recognition, handwritten letter recognition, and natural language processing.</p><p style="margin-bottom: 0px; padding: 5px 0px 5px 10px; border: 0px; outline: 0px; vertical-align: baseline; font-family: Arial;"><a name="HiddenMarkovModels-FormalDefinition" style="margin: 0px; padding: 0px; color: rgb(48, 76, 144); border: 0px; outline: 0px; vertical-align: baseline;"></a></p><h2 id="formal-definition" style="margin: 0px; padding: 20px 10px 5px; font-family: Arial; font-weight: normal; line-height: 27.299999237060547px; color: rgb(85, 85, 85); text-rendering: optimizelegibility; font-size: 1.5em; border: 0px; outline: 0px; vertical-align: baseline;">Formal Definition</h2><p style="margin-bottom: 0px; padding: 5px 0px 5px 10px; border: 0px; outline: 0px; vertical-align: baseline; font-family: Arial;">A Hidden Markov Model (HMM) is a statistical model of a process consisting of two (in our case discrete) random variables O and Y, which change their state sequentially. The variable Y with states {y_1, ... , y_n} is called the "hidden variable", since its state is not directly observable. The state of Y changes sequentially according to the Markov property (in our case, first-order). This means that the probability of a state change of Y depends only on its current state and does not change over time. Formally, we write: P(Y(t+1)=y_i|Y(0),...,Y(t)) = P(Y(t+1)=y_i|Y(t)) = P(Y(2)=y_i|Y(1)). The variable O with states {o_1, ... , o_m} is called the "observable variable", since its state can be directly observed. 
O does not itself have the Markov property; instead, the probability of each of its states depends only on the current state of Y, and this dependence does not change over time.</p><p style="margin-bottom: 0px; padding: 5px 0px 5px 10px; border: 0px; outline: 0px; vertical-align: baseline; font-family: Arial;">Formally, an HMM is defined as a tuple M=(n,m,P,A,B), where n is the number of hidden states, m is the number of observable states, P is an n-dimensional vector containing the initial hidden state probabilities, A is the n×n "transition matrix" containing the transition probabilities such that A[i,j] = P(Y(t)=y_i|Y(t-1)=y_j), and B is the m×n "emission matrix" containing the observation probabilities such that B[i,j] = P(O=o_i|Y=y_j).</p>
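<p style="margin-bottom: 0px; padding: 5px 0px 5px 10px; border: 0px; outline: 0px; vertical-align: baseline; font-family: Arial;">To make the tuple M=(n,m,P,A,B) concrete, the following is a minimal Python sketch using only the standard library. It is an illustration under stated assumptions, not a reference implementation: the names HMM, column, is_valid and sample are hypothetical, and the numbers in the example model are made up. With the index convention above, A[i,j] and B[i,j] are conditioned on the column index j, so each column of A and B (rather than each row) must sum to 1.</p>

```python
import random
from dataclasses import dataclass


@dataclass
class HMM:
    """Hypothetical container for the tuple M = (n, m, P, A, B)."""
    n: int    # number of hidden states
    m: int    # number of observable states
    P: list   # P[j] = P(Y(0) = y_j), length n
    A: list   # A[i][j] = P(Y(t) = y_i | Y(t-1) = y_j), n x n
    B: list   # B[i][j] = P(O = o_i | Y = y_j), m x n


def column(matrix, j):
    """Return column j of a matrix stored as a list of rows."""
    return [row[j] for row in matrix]


def is_valid(hmm, tol=1e-9):
    """Check the shapes, and that P and every column of A and B sum to 1."""
    if len(hmm.P) != hmm.n or len(hmm.A) != hmm.n or len(hmm.B) != hmm.m:
        return False
    if any(len(row) != hmm.n for row in hmm.A + hmm.B):
        return False
    dists = [hmm.P]
    dists += [column(hmm.A, j) for j in range(hmm.n)]
    dists += [column(hmm.B, j) for j in range(hmm.n)]
    return all(min(d) >= 0 and abs(sum(d) - 1.0) <= tol for d in dists)


def sample(hmm, length, rng):
    """Draw hidden and observed state index sequences of the given length."""
    hidden, observed = [], []
    # Initial hidden state is drawn from P.
    y = rng.choices(range(hmm.n), weights=hmm.P)[0]
    for _ in range(length):
        # The observation depends only on the current hidden state (column y of B).
        o = rng.choices(range(hmm.m), weights=column(hmm.B, y))[0]
        hidden.append(y)
        observed.append(o)
        # The next hidden state depends only on the current one (column y of A).
        y = rng.choices(range(hmm.n), weights=column(hmm.A, y))[0]
    return hidden, observed


# Made-up numbers, for illustration only: 2 hidden states, 3 observable states.
example = HMM(
    n=2, m=3,
    P=[0.6, 0.4],
    A=[[0.7, 0.4],
       [0.3, 0.6]],
    B=[[0.5, 0.1],
       [0.4, 0.3],
       [0.1, 0.6]],
)
hidden, observed = sample(example, 5, random.Random(0))
```

<p style="margin-bottom: 0px; padding: 5px 0px 5px 10px; border: 0px; outline: 0px; vertical-align: baseline; font-family: Arial;">In this sketch, states are represented by their zero-based indices, so hidden holds values in 0..n-1 and observed holds values in 0..m-1. Only the observed sequence would be available in practice; the hidden sequence is returned here purely to show how the two variables evolve together.</p>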