Numbers

Numbers can be written in many different scripts, such as Latin, Roman, Japanese, Malayalam, Braille and Ogham. Some of these scripts may have special rules for processing them (e.g. how to interpret the tens, hundreds and thousands).

The eSpeak text-to-speech engine is poor at handling numbers greater than billion (it pronounces 1 trillion as 1 thousand billion and then reverts to speaking the numbers individually). It can handle numbers with commas in better (e.g. 1,000), pronouncing 1 quadrillion as 1 million billion, but pronounces 1 quintillion incorrectly as 1 million!

The rsynth text-to-speech engine generates words as part of the number conversion algorithm that feed into the grapheme-to-phoneme rules, whereas eSpeak directly generates phonemes. The eSpeak approach is harder to maintain because accent/dialect variations need to be applied to the number rules in addition to their associated words and the phoneme sequences are harder to read.

Algorithm

The Cainteoir Text-to-Speech engine uses the following algorithm when processing numbers:

1. Normalization

The idea here is to identify the number form being used via the script and codepoints that make up the number. This identification results in a set of number processing rules to be run through the number to convert it to its ASCII form (e.g. converting VI to 6).

After this step, all numbers will be in their ASCII form irrespective of the way they were written in the source text. The rationale behind this is that it makes the subsequent processing easier.

2. Grouping

The pronunciation rules for different languages use groups of digits, where the length of the groups vary on the language. For Ethiopic, groups of 2 digits are used; for Chinese, Japanese and Korean, groups of 4 digits are used; for Western languages, groups of 3 digits are used.

The number is split into blocks reading right-to-left to ensure that each block has the correct number of digits. Also, each block is associated with the rank (block number) of that block.

The rationale behind this is that it then allows the correct pronunciation for numbers to be constructed by reading the processed blocks left-to-right.

3. Word Generation

Reading the number and rank of each block, moving left-to-right, the words to pronounce each block are constructed from word rules associated with the specified locale.

These rules handle regional differences (e.g. twelve thousand and sixty three in British English and twelve thousand sixty three in American English¹), as well as language differences.

The different word lists used to represent each number triple varies depending on the number scale used. For example, for 1000000000 (10^9):

Short Scale used in US and modern British English, uses one billion².
British Long Scale used in traditional British English, uses one thousand million².
European Long Scale used in Continental Europe, uses one milliard².
Greek Scale as proposed by Russ Rowlett, uses one gillion³.

NOTE: These systems can represent very large numbers over 10^99 in size^{4, 5}.

Numbers

Algorithm

1. Normalization

2. Grouping

3. Word Generation

References