Double-counting spaces?! There may be a bug in this derivation. As it stands, the factor (M + 1)^(k+2) seems to double-count the interword spaces and perhaps to miss all-space stretches. W. Li (personal correspondence) claims that the normalization of Equation 5.4 makes this issue disappear, but I am not fully convinced.

Subject: Re: Clarification of your InfoTheory92 paper?
Date: Wed, 9 Jun 1999 17:08:45 -0400 (EDT)
From: “W. Li” <wli@crick.rockefeller.edu>
To: rik@cs.ucsd.edu

I just went back to read my paper. I think counting the space once or twice leads to the same result. In the paper, I wrote the probability . . . is “proportional” to. . . . then I added up all to get the normalization factor. The same (1/27) factor will be canceled.

No, nobody asked this question before! I suspect people only look at the abstract/conclusion and never bother with the derivation itself (it can be bad for the authors because no feedback!).
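
The cancellation Li describes can be checked numerically. The following is a minimal sketch, not the derivation from the paper: it assumes an alphabet of M = 26 letters plus a space (which matches the 1/27 in the email), truncates word lengths at a hypothetical MAX_LEN, and weights each word of length k by (1/(M + 1))^(k + s), where s is the number of spaces counted per word. The names M, MAX_LEN, and normalized_length_probs are illustrative only.

```python
from fractions import Fraction

M = 26        # assumed alphabet size; with the space this gives the 1/27 factor
MAX_LEN = 8   # hypothetical cutoff on word length, for illustration only


def normalized_length_probs(spaces_counted):
    """Normalized probability of each word length k = 1..MAX_LEN.

    Each particular word of length k gets the unnormalized weight
    (1 / (M + 1)) ** (k + spaces_counted), and there are M ** k such words.
    """
    weights = [
        Fraction(M ** k, (M + 1) ** (k + spaces_counted))
        for k in range(1, MAX_LEN + 1)
    ]
    total = sum(weights)  # the normalization factor
    return [w / total for w in weights]


once = normalized_length_probs(1)   # count one delimiting space per word
twice = normalized_length_probs(2)  # count both delimiting spaces per word

# The extra 1/(M + 1) appears in every weight and in the total, so it cancels.
print(once == twice)
```

Exact rational arithmetic is used so that the comparison is not blurred by floating-point rounding. The sketch only shows that a constant per-word factor drops out under normalization; it does not address whether all-space stretches are accounted for.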