Given bigram probabilities for words in a text, how would one compute trigram probabilities? For example, if we know that P(dog cat) = 0.3 and P(cat mouse) = 0.2, how do we find P(dog cat mouse)? Thank you!
Why would you want to do that? It's unlikely to be a good estimator of the true trigram probability. There might never even be a third word after dog cat; there's just no way to tell from probabilities based on "given two words, this combination occurs X out of Z times".
Commented Dec 14, 2013 at 10:59
I understand that it is not a good way to get a trigram's probability, but is there some way to estimate the probability given 2 bigrams?
Commented Dec 14, 2013 at 17:25
In the following I consider a trigram as three random variables A, B, C. So dog cat horse would be A=dog, B=cat, C=horse.
Using the chain rule: P(A,B,C) = P(A,B) * P(C|A,B). Now you're stuck if you want to stay exact.
What you can do is assume that C is independent of A given B. Then P(C|A,B) = P(C|B), and P(C|B) = P(B,C) / P(B), which you should be able to compute from your bigram frequencies. Note that in your case P(C|B) should really be the probability of C following B, so it's the probability of the bigram BC divided by the probability of B*, i.e. of any bigram that starts with B.
So to sum it up, when using the conditional independence assumption:
P(ABC) = P(AB) * P(BC) / P(B*)
And to compute P(B*) you have to sum up the probabilities of all bigrams beginning with B.
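As a rough illustration of the formula above, here is a minimal Python sketch. It assumes the bigram probabilities are stored in a dict keyed by (first word, second word); the function name `trigram_estimate` and every number other than the two from the question are made up for the example.

```python
def trigram_estimate(bigram_probs, a, b, c):
    """Estimate P(a b c) as P(a b) * P(b c) / P(b *).

    Relies on the conditional independence assumption:
    c is independent of a given b.
    """
    # P(b *): total probability of all bigrams whose first word is b
    p_b_star = sum(p for (first, _), p in bigram_probs.items() if first == b)
    if p_b_star == 0:
        # b never starts a bigram, so no trigram "a b c" can be formed
        return 0.0
    p_ab = bigram_probs.get((a, b), 0.0)
    p_bc = bigram_probs.get((b, c), 0.0)
    return p_ab * p_bc / p_b_star


# Hypothetical toy table; only the first two values come from the question.
bigram_probs = {
    ("dog", "cat"): 0.3,
    ("cat", "mouse"): 0.2,
    ("cat", "dog"): 0.2,
    ("mouse", "cat"): 0.3,
}

estimate = trigram_estimate(bigram_probs, "dog", "cat", "mouse")
print(estimate)
```

With these toy numbers P(cat *) = 0.2 + 0.2 = 0.4, so the estimate comes out at roughly 0.3 * 0.2 / 0.4 = 0.15. Keep in mind this is only as good as the independence assumption.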