Your favorite book is almost certainly chosen from the 129 million books that Google knows about: http://www.fastcompany.com/1678254/how-many-books-are-there-...

That gives you 27 bits of entropy.

The average book length is probably not more than 400 pages, and an average page probably doesn't hold more than 25 sentences, so the whole book contains only about ten thousand sentences.
That gives you roughly 14 more bits of entropy (log2(10,000) ≈ 13.3).
The total is 41 bits of entropy. Four random words drawn from a 2,000-word list give 4 × log2(2,000) ≈ 44 bits, so measured strictly by entropy the sentence scheme is about 3 bits weaker, i.e. one-eighth as secure.
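The rounded figures above (27 + 14 = 41 bits, versus roughly 44 for four random words) can be checked with a few lines of Python. The 129 million books, the 400 × 25 sentence count, and the 2,000-word list are the thread's own assumptions, not measured values:

```python
import math

# Figures assumed from the discussion above, not measured here.
BOOKS = 129_000_000              # books Google knows about
SENTENCES_PER_BOOK = 400 * 25    # ~400 pages * ~25 sentences/page
WORDLIST = 2000                  # size of the random-word corpus

book_bits = math.log2(BOOKS)                   # ~26.9, rounds to 27
sentence_bits = math.log2(SENTENCES_PER_BOOK)  # ~13.3, rounds to 14
total_bits = book_bits + sentence_bits         # ~40.2, rounds to 41

four_word_bits = 4 * math.log2(WORDLIST)       # ~43.9, rounds to 44

# The gap is ~3 bits: the attacker's search space is 2^gap times
# smaller, which is where "one-eighth as secure" (2^3 = 8) comes from.
gap = four_word_bits - total_bits
print(f"{total_bits:.1f} vs {four_word_bits:.1f}, gap {gap:.1f} bits")
```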
The situation is actually much worse, though: your favorite book is probably a popular book. So the number of bits of entropy provided by the choice of book might be a lot smaller than 27. I would guess that it's perhaps 10.
And many of those 129 million books are not very different. They contain quotes from other books, reprinted short stories, folk tales, set phrases, and so on.
In practice I think it might be difficult to mount a password-guessing attack using the Google Books corpus, because it's hard to get access to that corpus. The Project Gutenberg corpus would not be so hard.
Of course, the flip side of this is that we're veering off into attacks where you're targeting one specific person and know a bit about how they've chosen their password.
If you want to mount such an attack, fine, but most of us are dealing with the much-more-common threat of someone who gets a file or a database of hashed passwords and wants to crack them all in one go.
That's an interesting analysis. I can't really see any major deficiencies with it.
On the plus side, a sentence is probably going to be easier to remember than 4 random words. Personally, I draw some of my "high-security" passwords from literature, but then I modify the case and do the "leetspeak" character substitution, so a naive sentence attack would not work. A more clever one might, though.
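A "more clever" attack along those lines is easy to sketch: enumerate case and leetspeak variants of each candidate sentence before hashing. The substitution table below is an illustrative assumption, not a standard rule set:

```python
from itertools import islice, product

# Illustrative leetspeak table; real cracking rule sets (e.g. hashcat
# rules) are far richer. Everything here is an assumption for the sketch.
LEET = {'a': '@4', 'e': '3', 'i': '1!', 'o': '0', 's': '$5', 't': '7'}

def leet_case_variants(sentence):
    """Yield every case/leetspeak variant of a base sentence."""
    per_char = []
    for c in sentence:
        # Each position can be lower, upper, or a leet substitution.
        opts = {c.lower(), c.upper()} | set(LEET.get(c.lower(), ''))
        per_char.append(sorted(opts))
    for combo in product(*per_char):
        yield ''.join(combo)

# Each substitution point only multiplies the search space by 2-4,
# so a targeted attacker can still afford to hash every variant.
for guess in islice(leet_case_variants("call me"), 3):
    print(guess)
```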
Edit: Oops, as dpark points out, I swapped two digits. My apologies. Below, my original, erroneous comment.
41 bits of entropy means you have on the order of a one in 10^12 chance (2^41) of guessing it, and 2,000^4 is on the order of 10^16. So how is the former "one eighth as secure" as the other? Wouldn’t it be 10^4 times less secure, that is, 10,000 times more likely to be cracked?
If you choose one of eight small modifications to apply at a randomly-selected character, you get perhaps 6 bits of entropy from the choice of character and 3 bits from the choice of modification. That's better, but adding an extra common word to the end of the sentence would be better still.
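That comparison is easy to make concrete. The ~64-character sentence length implied by "6 bits from the choice of character" is an assumption here:

```python
import math

# One of 8 modifications at one of ~64 character positions
# (a ~64-character sentence is assumed for illustration).
position_bits = math.log2(64)      # 6.0 bits
modification_bits = math.log2(8)   # 3.0 bits
mod_total = position_bits + modification_bits  # 9.0 bits added

# Versus appending one extra word from a 2,000-word common-word list:
extra_word_bits = math.log2(2000)  # ~11.0 bits added

print(mod_total, extra_word_bits)
```

The extra word wins by about two bits, which is why it is "better still".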