
The overloading of the shi sound in Mandarin approaches absurd levels.

I've always liked:

四 是 四 , 十 是 十 , 十 四 是 十 四 , 四 十 是 四 十 , 四 十 四 只 石 狮 子 是 死 的

sì shì sì shí shì shí shí sì shì shí sì sì shí shì sì shí sì shí sì zhī shí shī zǐ shì sǐ de.

Translation: 4 is 4, 10 is 10, 14 is 14, 40 is 40, 44 stone lions are dead.

Then get a southerner with, say, a Sichuan or Yunnan accent to say this (in Mandarin). They cannot properly pronounce shi (they say it like si), so the above just sounds like an angry bee. Given that shi and si are used a lot, this becomes a real pain for a non-native speaker.

It makes things tough when you're buying something that costs 44 RMB and you can't tell whether they said "is 14" or "44" or what.



My impression (from a very brief study of Mandarin) is that Mandarin has a lot more homonyms than most languages, e.g. lǐ has a ton of meanings: http://en.wiktionary.org/wiki/l%C7%90 It also seems to me that a lot of Mandarin phonemes are much closer together than in most languages, e.g. ch and q are both a similar "ch" sound (likewise sh and x). Or maybe these sounds just seem similar to me because of my English upbringing?

Do linguists have a way to quantitatively measure how close together the sounds and words are in a language? Some sort of Shannon entropy measure, maybe? Or a way to measure how spread out words are in "phonetic space". I couldn't find anything, but I'd like to know if there's a way to measure these things objectively.
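
To make the entropy idea concrete, here is a rough sketch rather than an established linguistic measure. It takes the pinyin of the tongue twister quoted in this thread and computes the Shannon entropy of its syllable distribution three ways: as written, with tones stripped, and with sh/zh/ch collapsed to s/z/c. The function names and the merger rule are my own assumptions, and a real measurement would need a proper corpus instead of a single sentence.

```python
import math
import unicodedata
from collections import Counter

# Pinyin of the tongue twister quoted in this thread (26 syllables).
TWISTER = ("sì shì sì shí shì shí shí sì shì shí sì sì shí shì sì "
           "shí sì shí sì zhī shí shī zǐ shì sǐ de")


def strip_tones(syllable):
    """Drop tone diacritics, e.g. 'shì' becomes 'shi'."""
    decomposed = unicodedata.normalize("NFD", syllable)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def merge_retroflex(syllable):
    """Hypothetical accent model (an assumption): zh/ch/sh become z/c/s."""
    for retroflex, plain in (("zh", "z"), ("ch", "c"), ("sh", "s")):
        if syllable.startswith(retroflex):
            return plain + syllable[len(retroflex):]
    return syllable


def entropy_bits(tokens):
    """Shannon entropy, in bits per token, of the empirical distribution."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


syllables = TWISTER.split()
toneless = [strip_tones(s) for s in syllables]
merged = [merge_retroflex(s) for s in syllables]  # tones kept, sh -> s

for label, tokens in (("as written", syllables),
                      ("tones stripped", toneless),
                      ("sh/zh/ch merged", merged)):
    print(f"{label}: {len(set(tokens))} distinct syllables, "
          f"{entropy_bits(tokens):.2f} bits per syllable")
```

Lower entropy means fewer effective distinctions per syllable. Merging categories can never raise the number, so running the same comparison across languages on comparable corpora would be one crude way to quantify how crowded the sound inventory is.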


On the first part: See my other post for more, but basically, much (not all) of the homonymy is from older and/or written-only forms; and yes, ch and q don't sound any closer to a native Mandarin speaker than, say, sh and s do to a native English speaker or u and ou to a native French speaker.

On the second part: no defined measure that I know of. It would be a little tricky in that language is a moving target, everyone speaks it slightly differently, and even for a single speaker the "location" of a particular phone is more of a probability distribution even after you factor out varying context. That said, there definitely are charts that map out the space and take a stab at identifying the prototype location of each phone in the sound space, so it's not entirely implausible that you could summarise that with a distance measure. I strongly suspect that the value of the measure would not vary much among languages with similar-size phoneme inventories, though.


In real-life speech, people don't say such things in Chinese, just as English speakers don't say "Suzie sells sea shells by the sea shore". If they do, they'll say it more slowly to avoid ambiguity, and maybe rephrase it as well, e.g. "You remember Suzie? She's into sea shells. She sells them by the beach."



