Vocab list: Pietr-le-Letton, Chapter 7

I’m making lists of unfamiliar words as I read Georges Simenon’s 1931 Pietr-le-Letton, the novel debut of the famous commissaire Maigret. Here’s my list for Chapter 7 (Troisième Entracte) with links to definitions and word frequencies from Google Books NGram Viewer.

In this chapter, Maigret stops briefly at the hotel where a person of interest is staying, then follows them first to the theater and then to a cabaret nightclub. I’m a fan of French theater, so many theater-specific vocab words did not make it onto this list (though some did). The list is largely words about coming, going, dining, and dressing.

I’ve augmented my frequency tables based on a comment from reader F. P., who suggested that I display both modern word frequency and contemporaneous frequency. The most recent data I have from the Google NGram Viewer ends at 2008. Pietr-le-Letton was published in 1931, but I rounded back to 1928 for aesthetics. 1968 falls midway between these two.

To put these numbers in some perspective, a typical novel is 60,000 to 100,000 words long, and really hefty novels top out around 500,000 words. So when you see a word frequency of 1 in 1,000,000 you should think “I could read 5-10 novels and never see this word or its variants.” Recall that the frequencies shown pool together the various inflections of the word, so the row for matelassé is really all of matelassé, matelassée, matelassées, matelassés, matelasser, matelasse, matelassent, matelassant, and matelassait combined.
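As a rough check on that rule of thumb, here is a small sketch in Python. It assumes a word's occurrences are spread evenly and independently through the text (a Poisson approximation, which is a simplifying assumption and not something the NGram data guarantees). The function names are made up for this illustration.

```python
import math

def novels_per_occurrence(one_in_n, words_per_novel):
    """Average number of novels read per occurrence of the word."""
    return one_in_n / words_per_novel

def chance_of_never_seeing(one_in_n, words_read):
    """Poisson approximation: probability of zero occurrences in words_read words."""
    return math.exp(-words_read / one_in_n)

# A 1-in-1,000,000 word, with novels of 60,000 to 100,000 words.
for words_per_novel in (60_000, 100_000):
    novels = novels_per_occurrence(1_000_000, words_per_novel)
    print(f"{words_per_novel:,} words per novel: {novels:.1f} novels per occurrence")

# Chance of getting through five 100,000-word novels without meeting the word.
print(f"{chance_of_never_seeing(1_000_000, 5 * 100_000):.2f}")
```

Under these assumptions, a one-in-a-million word turns up about once every 10 to 17 typical novels, and there is roughly a 61% chance of reading five 100,000-word novels without meeting it at all.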

expression (root) | Frequency in 2008 | Frequency in 1968 | Frequency in 1928
ruée | 1 in 7,420 | 1 in 7,020 | 1 in 5,920
dresser | 1 in 25,200 | 1 in 18,700 | 1 in 14,200
soulevé | 1 in 27,400 | 1 in 21,200 | 1 in 18,500
lasse | 1 in 30,600 | 1 in 31,600 | 1 in 32,400
cerne | 1 in 64,300 | 1 in 104,000 | 1 in 236,000
cernée | 1 in 64,300 | 1 in 104,000 | 1 in 236,000
coulisses | 1 in 238,000 | 1 in 132,000 | 1 in 224,000
affermissant | 1 in 328,000 | 1 in 200,000 | 1 in 176,000
crispé | 1 in 418,000 | 1 in 373,000 | 1 in 453,000
corbeille | 1 in 433,000 | 1 in 431,000 | 1 in 259,000
croquer | 1 in 518,000 | 1 in 808,000 | 1 in 874,000
Mâcon | 1 in 606,000 | 1 in 512,000 | 1 in 476,000
vergogne | 1 in 662,000 | 1 in 754,000 | 1 in 925,000
navré | 1 in 677,000 | 1 in 564,000 | 1 in 455,000
badaud | 1 in 818,000 | 1 in 774,000 | 1 in 813,000
blanchâtre | 1 in 864,000 | 1 in 484,000 | 1 in 293,000
réverbère | 1 in 886,000 | 1 in 659,000 | 1 in 825,000
bleuté | 1 in 955,000 | 1 in 919,000 | 1 in 932,000
crépitant | 1 in 1,010,000 | 1 in 796,000 | 1 in 1,030,000
désaltérer | 1 in 1,160,000 | 1 in 1,590,000 | 1 in 1,190,000
crotté | 1 in 1,250,000 | 1 in 1,470,000 | 1 in 1,220,000
piétinements | 1 in 1,320,000 | 1 in 936,000 | 1 in 1,470,000
hargneux | 1 in 1,560,000 | 1 in 1,090,000 | 1 in 1,100,000
emmitouflée | 1 in 2,340,000 | 1 in 3,280,000 | 1 in 3,960,000
péristyle | 1 in 2,450,000 | 1 in 1,380,000 | 1 in 970,000
débraillé | 1 in 2,760,000 | 1 in 1,590,000 | 1 in 1,480,000
entrefilet | 1 in 3,080,000 | 1 in 2,590,000 | 1 in 2,070,000
plastron | 1 in 3,210,000 | 1 in 2,290,000 | 1 in 1,630,000
lestement | 1 in 3,630,000 | 1 in 3,880,000 | 1 in 1,450,000
matelassé | 1 in 7,210,000 | 1 in 5,220,000 | 1 in 6,830,000
contremarque | 1 in 10,500,000 | 1 in 26,000,000 | 1 in 12,300,000
maigriote | 1 in 75,900,000 | 1 in 27,400,000 | 1 in 23,400,000
panneau-réclame

F. P. also suggested that I sort the words by frequency, which I have done, using the 2008 data. Those interested in the details of the data generation can read my code.

Looking down the first column of the table, I see that there are a few words I was unfamiliar with that are currently more common than 1 in 100,000 words of book text. But the bulk of the new-to-me words are rarer than that, and many are rarer than one-in-a-million. And recall, this statistic pools together various inflections of the word (so matelassé is really all of matelassé, matelassée, matelassées, matelassés, matelasser, matelasse, matelassent, matelassant, and matelassait combined).
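A minimal sketch of the pooling idea, assuming the per-form relative frequencies are simply added together and the total is then expressed as "1 in N" rounded to three significant figures (which is how the table entries look). The per-form numbers below are invented for illustration and are not NGram data, the function names are hypothetical, and this is not the notebook's actual code.

```python
def pooled_one_in(form_frequencies):
    """Sum per-form relative frequencies and return N such that the total is 1 in N."""
    total = sum(form_frequencies.values())
    return 1 / total

def format_one_in(n, sig=3):
    """Round N to `sig` significant figures and format it like the table entries."""
    rounded = float(f"{n:.{sig}g}")
    return f"1 in {int(rounded):,}"

# Hypothetical relative frequencies (occurrences per word of text), NOT real data.
forms = {
    "matelassé": 6e-8,
    "matelassée": 4e-8,
    "matelassés": 2e-8,
    "matelasser": 1e-8,
}
print(format_one_in(pooled_one_in(forms)))
```

Because the frequencies add, the pooled figure is always at least as common as the most common single form, which is why one very frequent homograph can swamp a rare word's estimate.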

Looking across the rows, you can see which words were rare even in Simenon’s time, and which were relatively common then but have since fallen out of favor. For example, blanchâtre is currently a 1-in-864,000 word, though when Simenon wrote it was only a 1-in-293,000 word. Likewise péristyle was a one-in-a-million word then, but has become 2.5x more rare since. On the other hand, lasse was pretty common then and is pretty common now, piétinements was very rare then and now, and maigriote was already off the charts rare in 1928, coming in at a whopping 1-in-23,400,000 (it’s 3x as rare now, but…).
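The comparisons across columns can be checked with a few lines of Python. This sketch copies the "1 in N" values for four words from the table and divides the 2008 value by the 1928 value; the dictionary and function names are just for this illustration.

```python
# "1 in N" values copied from the table: (2008, 1928).
one_in = {
    "blanchâtre": (864_000, 293_000),
    "péristyle": (2_450_000, 970_000),
    "maigriote": (75_900_000, 23_400_000),
    "lasse": (30_600, 32_400),
}

def rarity_ratio(n_2008, n_1928):
    """How many times rarer the word is in 2008 than in 1928 (below 1 means more common)."""
    return n_2008 / n_1928

for word, (n_2008, n_1928) in one_in.items():
    print(f"{word}: {rarity_ratio(n_2008, n_1928):.1f}x")
```

A ratio well above 1 marks a word that has fallen out of favor since Simenon's time, while a ratio near 1 (as for lasse) marks a word whose frequency has barely moved.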

Word frequencies from French Google Books corpus

The Google Books NGram Viewer is a great resource. It has word frequency counts for a large sampling of books spanning hundreds of years and many languages.

I wrote some code (in this Colab notebook) to help me augment my vocabulary lists with frequencies of how often a word, in any of its inflected forms, appears in the subset of French books published in 2007 and known to Google. This lets me post tables like this:

expression (root) | frequency
bondé | 1 in 742,000
détrempé | 1 in 2,040,000
patauger | 1 in 1,220,000
ballotté | 1 in 834,000
pain azyme | 1 in 10,200,000

According to this estimate, I’d have to read 742,000 words on average before coming upon bondé or one of its forms. As it happens, the usage of this word in books has been becoming (somewhat) more common over time.

I’ve gone back to my earlier vocabulary list posts (Pietr-le-Letton Chapter 5 and Chapter 6) and updated the lists with frequencies. I’ve also pointed out a few false conflations that Google has made (e.g. it thinks étaient is a form of the rare verb étayer. It is, but most of the instances of étaient are conjugations of être.) Take a look at the old list posts, and play around with the NGram viewer if you’ve never seen it before.

Vocab list: Pietr-le-Letton, Chapter 6

I’m making lists of unfamiliar words as I read Georges Simenon’s Pietr-le-Letton. Here’s my list for Chapter 6 (Au Roi de Sicile), with links to the search result page on Linguee and word frequencies from the Google NGram Viewer.

In this chapter, Maigret follows up a lead in a run-down building in the Jewish quarter of town, near rue des Rosiers in le Marais. Simenon explicitly calls this place «le ghetto de Paris». Maigret interviews the building manager, a not-very-cooperative Jew. The vocabulary has a lot of words about ragged, crowded, noisy, dilapidated, damp and dirty conditions.

28 unfamiliar words in 7 1/2 pages is getting up there, but still less than 1 in 5, which is the cutoff for a “just right book”.

expression (root) | frequency
bondé | 1 in 742,000
détrempé | 1 in 2,040,000
patauger | 1 in 1,220,000
ballotté | 1 in 834,000
pain azyme | 1 in 10,200,000
grouillante | 1 in 1,330,000
grouillement | 1 in 3,190,000
faïence | 1 in 677,000
étayer | 1 in 2,360
boyau | 1 in 912,000
calotte | 1 in 971,000
crasseux | 1 in 1,330,000
empâtée | 1 in 3,710,000
peignoir | 1 in 1,500,000
entrouvrir | 1 in 382,000
esclandre | 1 in 3,310,000
ameuter | 1 in 1,470,000
grommeler | 1 in 942,000
parois | 1 in 69,100
crayeux | 1 in 3,880,000
sournois | 1 in 482,000
effaré | 1 in 712,000
loqueteux | 1 in 6,740,000
verdâtre | 1 in 923,000
clapoter | 1 in 5,110,000
vol à l’esbroufe | None
en faction | 1 in 2,420,000
pestant | 1 in 102,000
ronfler | 1 in 983,000

The frequency numbers are from the French Google Books corpus, specifically books published in 2007. They count how many words of such books you would have to read on average before coming upon the given word in any of its inflected forms. As you can see, a lot of these are fairly literary or old-fashioned words – Pietr-le-Letton was written in 1931, after all.

There are a few glitches in this analysis. The word étayer (meaning “to support”) is not so common that you’d see it once in 2,360 words. Rather, Google NGram Viewer is conflating the 3rd person plural imparfait of the verb être (ils étaient) with the 3rd person plural present of the verb étayer (ils étaient). Same spelling, very different frequency. So take the frequency estimates with a grain of salt.

Vocab list: Pietr-le-Letton, Chapter 5

I’m making lists of unfamiliar words as I read Georges Simenon’s Pietr-le-Letton. Below is my list for Chapter 5 (Le Russe Ivre), with links to the search result page on Linguee and word frequencies from the Google NGram Viewer.

The chapter takes place in a run-down bar in a fishing town (Fécamp) in winter, which accounts for why there are so many words about boats, bars, and rain. There are 26 words here and the chapter is 9 pages long, so that’s about 3 new words a page – a “just right book” for my reading level.

expression (root) | frequency
prunelles | 1 in 742,000
bouges | 1 in 61,200
soutiers | 1 in 11,100,000
zinc | 1 in 396,000
canaille | 1 in 690,000
entrebâillement | 1 in 4,290,000
crapuleux | 1 in 1,690,000
louvoyer | 1 in 1,640,000
luisant | 1 in 670
oeillade | 1 in 13,900,000
se saouler | 1 in 5,040,000
vergue | 1 in 1,610,000
tressaillir | 1 in 454,000
heurter | 1 in 48,400
toussotement | 1 in 11,600,000
buée | 1 in 1,670,000
ricaner | 1 in 528,000
bac | 1 in 82,000
tremper | 1 in 140,000
tiraillait | 1 in 594,000
bec-de-cane | 1 in 19,800,000
tournant | 1 in 8,540
marchand de bestiaux | 1 in 17,500,000
entrouverte | 1 in 382,000
blême | 1 in 860,000
tasser | 1 in 166,000

The frequency numbers are from the French Google Books corpus, specifically books published in 2007. They count how many words of such books you would have to read on average before coming upon the given word in any of its inflected forms. As you can see, a lot of these are fairly literary or old-fashioned words – Pietr-le-Letton was written in 1931, after all.

There are a few glitches in this analysis. The word luisant, from luire = to shine, is not so common that you’d see it once in 670 words. Rather, Google NGram Viewer thinks that lui is a form of luire. As far as I can tell, that’s outright wrong, but of course the pronoun lui is very common and so the conflation makes the estimate worthless. The single form luisant occurs 1 in 1,160,000, but that doesn’t account for all the other forms of luire. So take the frequency estimates with a grain of salt.
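To see how badly the conflation distorts the estimate, here is a small Python sketch using only the two figures quoted above: the pooled 1 in 670 and the single-form 1 in 1,160,000 for luisant. It assumes the pooled figure is the sum of the per-form frequencies; the function name is made up for this illustration.

```python
pooled_one_in = 670          # reported for luire, with "lui" wrongly pooled in
luisant_one_in = 1_160_000   # the single form luisant on its own

def share_of_pooled(form_one_in, pooled_one_in):
    """Fraction of the pooled occurrences contributed by one form."""
    return pooled_one_in / form_one_in

print(f"{share_of_pooled(luisant_one_in, pooled_one_in):.4%}")
```

On these numbers, luisant accounts for well under a tenth of a percent of the pooled count, so nearly all of the 1-in-670 figure comes from other forms, the pronoun lui presumably chief among them.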

I’ll be curious to see if my list length diminishes in later chapters and later novels. I’m reminded of the game I used to play when reading Sherlock Holmes stories aloud with my daughter – we’d joke about how many paragraphs into a story Conan Doyle could get without using the word “singular”. It was rarely double-digit.

Lesson 2020-07-01

My lesson with my teacher N today was mostly conversation (tout en français, bien sûr), and mostly what we discussed was the process of creating this website, www.monsieurmiller.com. Turns out I really don’t know how to pronounce the first syllable of monsieur. In general, my pronunciation is pretty terrible, but that’s an awkwardly beginner word for me not to have the correct pronunciation ingrained.

In the discussion, we talked over the nuances of construire, créer, and édifier, and decided that créer was the best word for the start of a new website. Overall, it was a good exercise of technical web vocabulary: domaine, lien, site, enregistrer, navigateur, onglet, etc. I tried to articulate the difference between a page and a post, which is not clear to me even in English. N asked me whether I intended to make the site bilingue, which for the time being I do not. Once I get my feet under me I may try writing some all-French posts.

We spent a little time looking at the several idiomatic expressions using the word lieu, following this quiz from www.partajondelfdalf.com, a site I had not encountered before.

Other tidbits: the expression en avoir marre de keeps tripping me up, as I think of en as absorbing the final de as in “Essais d’ouvrir la porte” –> “J’en ai essayé.” But you need both the en and the de in that expression: “Ma famille en a marre de m’écouter parler de la France.” and not “Ma famille a marre de m’écouter parler de la France.”