Word frequencies from French Google Books corpus

The Google Books Ngram Viewer is a great resource. It has word frequency counts for a large sampling of books spanning hundreds of years and many languages.

I wrote some code (in this Colab notebook) to help me augment my vocabulary lists with frequencies: how often a word, in any of its inflected forms, appears in the subset of French books published in 2007 and known to Google. This lets me post tables like this:

expression (root)    frequency
bondé                1 in 742,000
détrempé             1 in 2,040,000
patauger             1 in 1,220,000
ballotté             1 in 834,000
pain azyme           1 in 10,200,000
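
For the curious, here is a minimal sketch of how a "1 in N" figure like the ones in this table could be derived. It is not the code from the notebook. It assumes you already have, for each inflected form, its relative frequency (its share of all words in the corpus for the chosen year), sums those shares, and inverts the total. The function names `round_sig` and `one_in_n` and the sample numbers are made up for illustration.

```python
from math import floor, log10


def round_sig(x: float, digits: int = 3) -> int:
    """Round a positive number to `digits` significant figures."""
    if x <= 0:
        raise ValueError("x must be positive")
    magnitude = floor(log10(x))              # position of the leading digit
    factor = 10 ** (magnitude - digits + 1)  # size of the last digit we keep
    return int(round(x / factor) * factor)


def one_in_n(form_frequencies: dict[str, float]) -> str:
    """Combine per-form relative frequencies into a '1 in N' string.

    `form_frequencies` maps each inflected form to its share of all
    words in the corpus (e.g. 6.0e-7 means 6 occurrences per 10 million words).
    """
    total = sum(form_frequencies.values())
    if total <= 0:
        raise ValueError("no occurrences found")
    return f"1 in {round_sig(1 / total):,}"


# Hypothetical shares for the forms of one adjective (not real corpus data).
example = {"bondé": 6.0e-7, "bondée": 4.0e-7, "bondés": 2.5e-7, "bondées": 1.0e-7}
print(one_in_n(example))
```

Summing before inverting matters: the forms together are more common than any single form, so the combined "1 in N" is smaller than the figure for the most frequent form alone.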

According to this estimate, I’d have to read 742,000 words on average before coming upon bondé or one of its forms. As it happens, this word has been becoming (somewhat) more common in books over time:

I’ve gone back to my earlier vocabulary list posts (Pietr-le-Letton Chapter 5 and Chapter 6) and updated the lists with frequencies. I’ve also pointed out a few false conflations that Google has made. For example, it treats étaient as a form of the rare verb étayer. It is one, but most instances of étaient are conjugations of être. Take a look at the old list posts, and play around with the Ngram Viewer if you’ve never seen it before.