Compute trigram probability from bigram probabilities

Given bigram probabilities for words in a text, how would one compute trigram probabilities? For example, if we know that P(dog cat) = 0.3 and P(cat mouse) = 0.2, how do we find P(dog cat mouse)? Thank you!

asked Dec 14, 2013 at 10:22 by Leopold Joy

Why would you want to do that? It's unlikely to be a good estimator of the true trigram probability. There might never even be a third word after dog cat; there is just no way to tell from probabilities of the form "given two words, this combination occurs X out of Z times".

Commented Dec 14, 2013 at 10:59

I understand that it is not a good way to get a trigram's probability, but is there some way to estimate the probability given 2 bigrams?

Commented Dec 14, 2013 at 17:25

1 Answer

In the following I consider a trigram as three random variables A, B, C. So dog cat horse would be A=dog, B=cat, C=horse.

Using the chain rule: P(A,B,C) = P(A,B) * P(C|A,B). Now you're stuck if you want to stay exact, because P(C|A,B) cannot be recovered from bigram probabilities alone.

What you can do is assume that C is independent of A given B. Then it holds that P(C|A,B) = P(C|B). And P(C|B) = P(C,B) / P(B), which you should be able to compute from your bigram frequencies. Note that in your case P(C|B) should really be the probability of C following a B, so it's the probability of a BC divided by the probability of a B*, where B* stands for any bigram that starts with B.

So to sum it up, when using the conditional independence assumption:

P(ABC) = P(AB) * P(BC) / P(B*) 

And to compute P(B*) you have to sum up the probabilities of all bigrams beginning with B.
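As a minimal sketch of the estimate above, assuming the bigram probabilities are stored in a dict keyed by word pairs: the function name `estimate_trigram_prob` and the table `bigram_probs` are made up for illustration, and only the values 0.3 and 0.2 come from the question; the other two entries are invented so that the table sums to 1.

```python
def estimate_trigram_prob(bigram_probs, a, b, c):
    """Approximate P(a b c) as P(a b) * P(b c) / P(b *).

    This relies on the conditional independence assumption
    P(c | a, b) = P(c | b), so it is an estimate, not the true value.
    """
    # P(b *): total probability of all bigrams whose first word is b.
    p_b_star = sum(p for (first, _), p in bigram_probs.items() if first == b)
    if p_b_star == 0:
        # b never starts a bigram, so nothing can follow it.
        return 0.0
    p_ab = bigram_probs.get((a, b), 0.0)
    p_bc = bigram_probs.get((b, c), 0.0)
    return p_ab * p_bc / p_b_star


# Hypothetical toy table: only the first two values appear in the question.
bigram_probs = {
    ("dog", "cat"): 0.3,
    ("cat", "mouse"): 0.2,
    ("cat", "dog"): 0.2,
    ("mouse", "dog"): 0.3,
}

estimate = estimate_trigram_prob(bigram_probs, "dog", "cat", "mouse")
print(estimate)
```

With this toy table, P(cat *) = 0.2 + 0.2 = 0.4, so the estimate is 0.3 * 0.2 / 0.4, which is approximately 0.15. The result is only as good as the independence assumption; with real data, counting trigrams directly will usually be more reliable.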