Syl·lab·i·fi·ca·tion of Phonemes

synsinger

Jul 3

One of the things that synSinger needs to do is take multi-syllable words, convert them to phonemes, and then split those phonemes back into syllables.

The good news is that I've got a version of the CMU Pronouncing Dictionary that does exactly that. Here's a portion of it:

SHUPP  SH AH1 P
SHUR  SH ER1
SHURE  SH UH1 R
SHURGARD  SH UH1 R G AA2 R D
SHURLEY  SH ER1 L IY0
SHURR  SH ER1
SHURTLEFF  SH ER1 T L IH0 F
SHURTLIFF  SH ER1 T L IH0 F
SHURTZ  SH ER1 T S
SHUSTER  SH AH1 S T ER0
SHUSTERMAN  SH AH1 S T ER0 M AH0 N
SHUT  SH AH1 T
SHUTDOWN  SH AH1 T D AW2 N
SHUTDOWNS  SH AH1 T D AW2 N Z
SHUTE  SH UW1 T
SHUTES  SH UW1 T S
SHUTOUT  SH AH1 T AW2 T

For words in the CMU Dictionary, it's simply a matter of doing a lookup.
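As a rough illustration of that lookup (a sketch in Python, not synSinger's actual code), the entries can be loaded into a mapping keyed by the upper-case word. The names `parse_cmudict` and `lookup` are made up for this example, and the sample lines are taken from the excerpt above.

```python
def parse_cmudict(lines):
    """Build a word -> phoneme-list mapping from CMU-style lines."""
    entries = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines and the ";;;" comment lines used by the CMU dictionary.
        if not line or line.startswith(";;;"):
            continue
        word, *phonemes = line.split()
        entries[word] = phonemes
    return entries


def lookup(entries, word):
    """Return the phonemes for a word, or None if the word is not listed."""
    return entries.get(word.upper())


sample = [
    "SHUT  SH AH1 T",
    "SHUTDOWN  SH AH1 T D AW2 N",
    "SHUTE  SH UW1 T",
]
cmu = parse_cmudict(sample)
print(lookup(cmu, "shutdown"))  # ['SH', 'AH1', 'T', 'D', 'AW2', 'N']
print(lookup(cmu, "shutter"))   # None
```

A `None` result is the signal to fall back to the rules-based converter.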

For words not in the dictionary, the process is a bit more complex.

I've got a modified copy of the 1976 rules-based English-to-phonemes program from the Naval Research Laboratory. For example, here are the rules for the /T/ section:

  -- T
  "(TH)",           "TH",
  "#:(TED)",        "TIXD",
  "S(TI)#N",        "CH",
  "(TI)O",          "SH",
  "(TI)A",          "SH",
  "(TIEN)",         "SHUN",
  "(TUR)#",         "CHER",
  "(TU)A",          "CHUW",
  " (TWO)",         "TUW",
  "&(T)EN ",        "",
  "(T)",            "T",
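Each rule pairs a pattern with its phoneme output. In the pattern, the letters in parentheses are the text being translated, and whatever sits to the left and right is the context that must match. The sketch below (Python, with a hypothetical `split_rule` helper) only separates those three parts; it does not interpret the context symbols, whose meanings may differ in a modified copy of the rules.

```python
import re

# left context, "(" letters to translate ")", right context
RULE_PATTERN = re.compile(
    r"^(?P<left>[^()]*)\((?P<match>[^()]+)\)(?P<right>[^()]*)$"
)


def split_rule(pattern):
    """Split a rule pattern into (left context, matched letters, right context)."""
    m = RULE_PATTERN.match(pattern)
    if m is None:
        raise ValueError(f"not a rule pattern: {pattern!r}")
    return m.group("left"), m.group("match"), m.group("right")


# A few of the /T/ rules from the listing, as (pattern, phonemes) pairs.
T_RULES = [
    ("#:(TED)", "TIXD"),
    (" (TWO)", "TUW"),
    ("&(T)EN ", ""),
    ("(T)", "T"),
]

for pattern, phonemes in T_RULES:
    print(split_rule(pattern), "->", repr(phonemes))
```

Because the rules are tried in order, the catch-all `(T)` has to come last.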

The program isn't great, but as a fallback for the dictionary, it's adequate.

But there are no syllabification rules, so the phonemes still need to be split into syllables.

In one respect, this is a non-issue. As a rule of singing, the phoneme at the head of a note is always the vowel. Any consonants that precede a vowel are moved to the end of the prior syllable.

For example, the word "geometry" is rendered phonetically as "JH IY AA M AH T R IY".

Internally, synSinger breaks the syllables at the vowels, so it would be "JH - IY - AA M - AH T R - IY". Note that the /jh/ is moved to the note or silence that precedes it.
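The vowel-headed split can be sketched in a few lines of Python. This is an illustration of the rule as described, not synSinger's code; `split_at_vowels` is a made-up name, and the vowel set is the standard ARPAbet vowel list with stress digits stripped before comparison.

```python
# ARPAbet vowels as used by the CMU dictionary (stress digits removed).
VOWELS = {
    "AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
    "EY", "IH", "IY", "OW", "OY", "UH", "UW",
}


def is_vowel(phoneme):
    """True if the phoneme is a vowel, ignoring any trailing stress digit."""
    return phoneme.rstrip("012") in VOWELS


def split_at_vowels(phonemes):
    """Start a new group at every vowel; consonants join the group before them."""
    groups = [[]]  # the first group collects consonants before the first vowel
    for p in phonemes:
        if is_vowel(p):
            groups.append([p])
        else:
            groups[-1].append(p)
    if not groups[0]:
        groups.pop(0)  # the word started with a vowel
    return groups


groups = split_at_vowels("JH IY AA M AH T R IY".split())
print(" - ".join(" ".join(g) for g in groups))
# JH - IY - AA M - AH T R - IY
```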

This can look a bit odd to a user, so I've put together some code to convert phonemes generated by the rule-based code into syllables. It renders the example as "JH IY - AA - M AH - T R IY".

It isn't completely correct, but it looks reasonably accurate, which is all it needs to be. After all, it's just the fallback code.
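The conversion code itself isn't shown here. One very simple rule that happens to reproduce the "geometry" example is to attach every run of consonants to the vowel that follows it, and to attach any leftover consonants at the end of the word to the last syllable. The Python below is a sketch under that assumption, with a hypothetical function name, and should not be read as the actual implementation.

```python
VOWELS = {
    "AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
    "EY", "IH", "IY", "OW", "OY", "UH", "UW",
}


def is_vowel(phoneme):
    """True if the phoneme is a vowel, ignoring any trailing stress digit."""
    return phoneme.rstrip("012") in VOWELS


def syllabify_onset_first(phonemes):
    """Close a syllable at each vowel, so consonants lead into the next vowel."""
    syllables = []
    current = []
    for p in phonemes:
        current.append(p)
        if is_vowel(p):
            syllables.append(current)
            current = []
    if current:
        # Consonants after the final vowel belong to the last syllable.
        if syllables:
            syllables[-1].extend(current)
        else:
            syllables.append(current)  # no vowel at all
    return syllables


result = syllabify_onset_first("JH IY AA M AH T R IY".split())
print(" - ".join(" ".join(s) for s in result))
# JH IY - AA - M AH - T R IY
```

The same rule turns SHUTDOWN (SH AH1 T D AW2 N) into "SH AH1 - T D AW2 N", which shows the kind of inaccuracy a rule this simple accepts.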

Another option would be to write code to scan the phonetic dictionary for the phonemes that fall between vowels, and keep the most common ones as hyphenation rules. This wouldn't be too difficult to do, so I may do that at some point.
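A sketch of that idea in Python: for every consonant cluster that sits between two vowels, count where the dictionary places the syllable break, then keep the most common break position per cluster. The input format (each word as a list of syllables) and the three sample syllabifications are assumptions written by hand for illustration; they are not taken from the dictionary file.

```python
from collections import Counter, defaultdict

VOWELS = {
    "AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
    "EY", "IH", "IY", "OW", "OY", "UH", "UW",
}


def is_vowel(phoneme):
    return phoneme.rstrip("012") in VOWELS


def leading_consonants(syllable):
    """Consonants before the first vowel of a syllable."""
    out = []
    for p in syllable:
        if is_vowel(p):
            break
        out.append(p)
    return out


def trailing_consonants(syllable):
    """Consonants after the last vowel of a syllable."""
    return leading_consonants(syllable[::-1])[::-1]


def learn_split_rules(words):
    """Map each intervocalic cluster to its most common break position."""
    counts = defaultdict(Counter)
    for syllables in words:
        for left, right in zip(syllables, syllables[1:]):
            coda = trailing_consonants(left)
            onset = leading_consonants(right)
            counts[tuple(coda + onset)][len(coda)] += 1
    return {c: n.most_common(1)[0][0] for c, n in counts.items()}


# Hand-written sample syllabifications (assumed, for illustration only).
words = [
    [["SH", "AH1", "T"], ["D", "AW2", "N"]],
    [["SH", "AH1"], ["S", "T", "ER0"]],
    [["SH", "AH1"], ["S", "T", "ER0"], ["M", "AH0", "N"]],
]
rules = learn_split_rules(words)
print(rules)  # {('T', 'D'): 1, ('S', 'T'): 0, ('M',): 0}
```

Each value is the number of consonants that stay with the earlier syllable, so `('T', 'D'): 1` means the break falls between T and D.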

But for now, I just want to integrate the existing English-to-phoneme code so I can move forward with the synthesis code.
