I was having problems with WORLD rendering phonemes - the results were "bumpy", as follows:
The output should have been relatively flat, not wavy
After much head scratching, I ran a number of tests and verified that the code worked just fine on other audio, just not on my voice.
I suspected that the FFT size might be at fault, but changing it didn't fix the problem. Finally, the light dawned on me: I've got a relatively deep voice, so WORLD might be having trouble detecting the pitch.
There's a comment in the WORLD code that addresses this very issue:
// You can set the f0_floor below world::kFloorF0.
The f0_floor setting was what I was looking for: lowering it lets the pitch detection reach down to a deeper voice. I ended up increasing the size of the FFT as well, since low-pitched waves have longer periods and need a larger FFT for the pitch to be detected properly.
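As a rough illustration of why the pitch floor and the FFT size go together: the analysis window has to span a few periods of the lowest pitch it is expected to handle, so a lower floor means a longer window. The sketch below is not WORLD's own code. `FftSizeForF0Floor` is a hypothetical helper, and the "three periods plus one sample" rule is an assumption modelled on how WORLD's CheapTrick appears to size its FFT. The sample rate and floor values in the note after it are example numbers only.

```cpp
// Hypothetical helper, not part of the WORLD API.
// Assumption: the analysis window should cover about three periods of the
// lowest expected pitch, plus one sample, rounded up to a power of two.
int FftSizeForF0Floor(int fs, double f0_floor) {
  // Samples needed to hold three periods of the lowest pitch, plus one.
  const double min_length = 3.0 * fs / f0_floor + 1.0;

  // Smallest power of two strictly greater than min_length.
  int fft_size = 1;
  while (fft_size <= min_length) {
    fft_size *= 2;
  }
  return fft_size;
}
```

Under these assumptions, at a 44.1 kHz sample rate a 71 Hz floor gives an FFT size of 2048, while dropping the floor to 40 Hz pushes it up to 4096.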
With those two corrected, WORLD was able to properly analyze the pitch of the wave, and synthesize the waveform correctly:
Correctly rendering the wave
I'm now able to render strings of phonemes:
A rendering of "Twinkle, Little Star" displayed in Praat
There's a lot of work to be done in making the phonemes connect smoothly, but it's starting to sound much better (although still very robotic).