Phoneme Transcription Schemes

A phonetic transcription scheme is a way of writing down phonemes to represent the sounds of one or more languages or accents. Such schemes are typically used either to describe how the words of a language are pronounced or to transcribe speech.

Arpabet

Arpabet is used for transcribing General American English. It has several variants used in different text-to-speech programs and dictionaries:

  1. Arpabet – used in the CMU dictionary
  2. festival – used in the Festival dictionary
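
To illustrate how an Arpabet transcription is structured, the sketch below splits a CMU-dictionary-style entry into its phonemes and stress markers. It assumes the common layout of a word followed by whitespace-separated phonemes, where vowels carry a trailing stress digit (0, 1 or 2). The function name parse_cmu_entry is hypothetical and is not part of any dictionary tooling.

```python
def parse_cmu_entry(line):
    """Split a CMU-dictionary-style line into (word, [(phoneme, stress), ...]).

    Stress is an int (0, 1 or 2) for vowels and None for consonants.
    Assumes the layout: WORD followed by whitespace-separated phonemes.
    """
    word, *symbols = line.split()
    phonemes = []
    for symbol in symbols:
        if symbol[-1].isdigit():
            # Vowel: strip the trailing stress digit, e.g. "OW1" -> ("OW", 1).
            phonemes.append((symbol[:-1], int(symbol[-1])))
        else:
            # Consonant: no stress marker.
            phonemes.append((symbol, None))
    return word, phonemes


word, phonemes = parse_cmu_entry("HELLO  HH AH0 L OW1")
```

Separating the stress digit from the phoneme makes it easier to compare an entry against variants of the phoneme set that mark stress differently.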

Cepstral

The Cepstral text-to-speech program uses the CMU dictionary and a variation of the Arpabet phoneme set for its General American English voices. Voices for other languages use phoneme sets specific to each language, which also appear to be derived from Arpabet:

  1. cepstral-de – Cepstral German voices
  2. cepstral-en_UK – Cepstral British English voices
  3. cepstral-en_US – Cepstral American English voices
  4. cepstral-es_LA – Cepstral Americas Spanish voices
  5. cepstral-fr_CA – Cepstral Canadian French voices
  6. cepstral-it – Cepstral Italian voices

SAMPA

The SAMPA-based transcription schemes vary between the different languages, and consist of:

  1. sampa-ar
  2. sampa-bg
  3. sampa-cs
  4. sampa-da
  5. sampa-de
  6. sampa-el
  7. sampa-en
  8. sampa-en_US
  9. sampa-es
  10. sampa-et
  11. sampa-fr
  12. sampa-he
  13. sampa-hr
  14. sampa-hu
  15. sampa-it
  16. sampa-nl
  17. sampa-no
  18. sampa-pl
  19. sampa-pt
  20. sampa-ro
  21. sampa-ru
  22. sampa-sl
  23. sampa-sv
  24. sampa-th
  25. sampa-tr
  26. sampa-yue

Various prosodic elements are supported through the SAMPROSA transcription scheme, which can be used alongside the language-specific transcription scheme.

MBROLA

The MBROLA phoneme sets are different for each voice. Each voice lists the phonemes it supports in a text file with the same name as the voice (e.g. ca1.txt). These phoneme sets are derived from the corresponding SAMPA phoneme set for the voice's language.

MBROLA transcriptions also contain prosody information. Each line consists of the following fields:

phoneme length position frequency position frequency ...

where the (position, frequency) pairs are optional and together form the shape of the frequency contour. Each position is a percentage of the way through the phoneme.
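
The line format above can be read with a few lines of code. The following is a minimal sketch, not MBROLA's own parser: the function name parse_pho_line and the sample line are made up for illustration, and the units (length in milliseconds, frequency in Hz) are an assumption based on common MBROLA usage rather than something stated here.

```python
def parse_pho_line(line):
    """Parse 'phoneme length [position frequency]...' into a tuple.

    Returns (phoneme, length, contour), where contour is a list of
    (position, frequency) pairs and may be empty.
    """
    fields = line.split()
    if len(fields) < 2:
        raise ValueError(f"expected at least a phoneme and a length: {line!r}")
    phoneme, length = fields[0], float(fields[1])
    rest = [float(value) for value in fields[2:]]
    if len(rest) % 2 != 0:
        raise ValueError(f"unpaired position/frequency value: {line!r}")
    # Pair up alternating values: positions at even indices, frequencies at odd.
    contour = list(zip(rest[0::2], rest[1::2]))
    return phoneme, length, contour


# Hypothetical sample line: phoneme "a", length 120, three contour points.
phoneme, length, contour = parse_pho_line("a 120 0 110 50 130 100 115")
```

A line with no contour points, such as "t 80", yields an empty contour list.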

International Phonetic Alphabet

The International Phonetic Alphabet (IPA) aims to represent all the distinct speech sounds used in languages around the world. There are several transcription schemes based on the IPA:

  1. Unicode – using the correct Unicode symbols for the IPA
  2. Kirshenbaum – ASCII transcription of IPA
  3. X-SAMPA – alternate ASCII transcription of IPA
  4. CXS – extension to X-SAMPA for constructed languages
  5. Z-SAMPA – another extension to X-SAMPA for constructed languages

NOTE: Transcribers use the IPA symbols differently depending on the language and on the phonetic qualities they want to record.
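
Because the ASCII schemes are re-encodings of the IPA, converting between them is largely a table lookup. The sketch below converts X-SAMPA to Unicode IPA using a longest-match scan, which is needed because some X-SAMPA symbols span more than one character (for example r\ and @`). The table is a small illustrative subset, and the names XSAMPA_TO_IPA and xsampa_to_ipa are hypothetical.

```python
# A small, illustrative subset of the X-SAMPA to IPA correspondence.
XSAMPA_TO_IPA = {
    "r\\": "ɹ",  # two characters: r followed by a backslash
    "@`": "ɚ",
    "@": "ə",
    "S": "ʃ",
    "Z": "ʒ",
    "T": "θ",
    "D": "ð",
    "N": "ŋ",
    "{": "æ",
    "E": "ɛ",
    "I": "ɪ",
    "U": "ʊ",
    ":": "ː",
    '"': "ˈ",
}


def xsampa_to_ipa(text):
    """Convert X-SAMPA text to IPA, preferring the longest matching symbol."""
    longest = max(len(key) for key in XSAMPA_TO_IPA)
    out = []
    i = 0
    while i < len(text):
        for size in range(longest, 0, -1):
            chunk = text[i:i + size]
            if chunk in XSAMPA_TO_IPA:
                out.append(XSAMPA_TO_IPA[chunk])
                i += len(chunk)
                break
        else:
            # Not in the subset table: pass through unchanged. Many lowercase
            # letters are written the same way in X-SAMPA and in the IPA.
            out.append(text[i])
            i += 1
    return "".join(out)


thing = xsampa_to_ipa('"TIN')
red = xsampa_to_ipa("r\\Ed")
```

Without the longest-match rule, r\ would be read as a plain r followed by a stray backslash instead of as a single symbol.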

eSpeak

The eSpeak phoneme set is derived from Kirshenbaum, but instead of having a single phoneme set for all languages, it uses a different phoneme set for each language, tailoring the voice to that language.

Language Phonetic Alphabets

Some of the world's languages have alphabets that are phonetic: the written form implicitly describes how the words are pronounced. These alphabets include:

  1. Hiragana – Japanese (for Japanese language words)
  2. Katakana – Japanese (for foreign language words)
  3. Cherokee
  4. Hangul – Korean

For these alphabets, the alphabet characters can either be:

  1. pronounced directly;
  2. mapped to the alphabet of the speaker (e.g. using Hepburn romaji for Japanese Hiragana/Katakana);
  3. mapped to the IPA or one of its variants (for a more precise pronunciation).
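
The second option, mapping to the alphabet of the speaker, can be sketched as a per-character lookup. The example below covers only a handful of Hiragana characters with their Hepburn romanizations and deliberately ignores small-kana digraphs, doubled consonants and long vowels. The names HIRAGANA_TO_ROMAJI and hiragana_to_romaji are hypothetical.

```python
# Illustrative subset of Hiragana with Hepburn romanizations.
HIRAGANA_TO_ROMAJI = {
    "あ": "a", "い": "i", "う": "u", "え": "e", "お": "o",
    "か": "ka", "き": "ki", "く": "ku", "け": "ke", "こ": "ko",
    "さ": "sa", "し": "shi", "す": "su", "せ": "se", "そ": "so",
    "な": "na", "に": "ni", "ぬ": "nu", "ね": "ne", "の": "no",
}


def hiragana_to_romaji(text):
    """Romanize text one character at a time; unknown characters pass through."""
    return "".join(HIRAGANA_TO_ROMAJI.get(ch, ch) for ch in text)


sushi = hiragana_to_romaji("すし")
sakana = hiragana_to_romaji("さかな")
```

Mapping to the IPA instead would use the same structure with IPA strings as the table values.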

Languages that have other writing systems in addition to the phonetic ones (e.g. Japanese Kanji, which are imported Chinese characters) can map those to their phonetic alphabet (e.g. mapping Kanji to Hiragana). This allows the pronunciation of those characters to be kept as accurate as possible for as long as possible.