Mapping between Languages

In [1]:
from phones import PhoneCollection
pc = PhoneCollection()

There are several ways to convert phones between languages.

On a Collection Level

The get_closest method finds the phone or phones in the target language that are closest to a given source-language phone, and returns each one together with its Euclidean distance.

In [2]:
# get the closest German phones to the English phone ð
pc.get_closest("ð", "eng", "deu")
Out[2]:
[(2.8284271247461903, z (deu)), (2.8284271247461903, ʒ (deu))]
In [3]:
# also get phones which are further away
pc.get_closest("ð", "eng", "deu", return_all=True)[:5]
Out[3]:
[(2.8284271247461903, z (deu)),
 (2.8284271247461903, ʒ (deu)),
 (3.4641016151377544, d (deu)),
 (3.4641016151377544, s (deu)),
 (3.4641016151377544, ʃ (deu))]
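
To make the distances above more concrete, here is a minimal sketch of a nearest-neighbour lookup with Euclidean distance. It is not the library's implementation: the function name closest, the labels, and the 4-feature vectors are hypothetical, and the -1/0/1 feature encoding is an assumption. Under that assumption, a distance of 2.828... (the square root of 8) is what you would get, for example, when two features flip sign, and 3.464... (the square root of 12) when three do.

```python
import numpy as np


def closest(source_vec, candidates, return_all=False):
    """Rank candidates by Euclidean distance to source_vec.

    candidates maps a label to a feature vector. The result is a list of
    (distance, label) pairs sorted by distance. Unless return_all is set,
    only the entries tied for the minimum distance are kept.
    """
    source = np.asarray(source_vec, dtype=float)
    scored = sorted(
        (float(np.linalg.norm(source - np.asarray(vec, dtype=float))), label)
        for label, vec in candidates.items()
    )
    if return_all:
        return scored
    best = scored[0][0]
    return [(d, label) for d, label in scored if np.isclose(d, best)]


# Hypothetical feature vectors with values in {-1, 0, 1}; not real phone data.
source = [1, 1, -1, 1]
candidates = {
    "a": [1, -1, 1, 1],    # differs from source in two features
    "b": [-1, 1, -1, -1],  # differs from source in two features
    "c": [-1, -1, 1, 1],   # differs from source in three features
}

result = closest(source, candidates)
everything = closest(source, candidates, return_all=True)
```

Here result contains the two tied candidates "a" and "b", while everything also includes the more distant "c", mirroring the difference between the default call and return_all=True.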

On a Phone Level

Alternatively, you can call the .closest method that each phone provides.

In [4]:
pc.phones("ð").val.langs("deu").closest()
Out[4]:
[(2.8284271247461903, z (deu)), (2.8284271247461903, ʒ (deu))]

Noisy Conversion

As you can see above, two phones are at the same distance. You might want to introduce some randomness so that the conversion does not always yield the same close phones. In this case, you can use the .noise method.

In [5]:
(
    pc
    .phones("ð")
    .val
    .noise(p=.05, random_state=10)
    .langs("deu")
    .closest()
)
Out[5]:
[(2.8284271247461903, ʒ (deu))]

This is done by corrupting each value in the phone vector with probability p.

The higher p is, the more values in the vector are likely to change, which can lead to phones at greater distances.

In [6]:
(
    pc
    .phones("ð")
    .val
    .noise(p=1, random_state=10)
    .langs("deu")
    .closest()
)
Out[6]:
[(7.3484692283495345, øː (deu))]
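
The corruption step described above can be sketched as follows. This is only an illustration of the idea, not the library's actual .noise implementation: the function name add_noise, the set of replacement values (-1, 0, 1), and the use of NumPy's default_rng are all assumptions.

```python
import numpy as np


def add_noise(vec, p, values=(-1, 0, 1), random_state=None):
    """Return a copy of vec in which each entry is resampled with probability p.

    Replacement values are drawn uniformly from `values` (an assumed
    encoding), so a corrupted entry may happen to keep its original value.
    """
    rng = np.random.default_rng(random_state)
    vec = np.asarray(vec)
    mask = rng.random(vec.shape) < p                   # entries selected for corruption
    replacements = rng.choice(values, size=vec.shape)  # candidate new values
    return np.where(mask, replacements, vec)


# Hypothetical feature vector; not real phone data.
original = np.array([1, 1, -1, 1, 0, -1])

unchanged = add_noise(original, p=0.0, random_state=10)  # no entry is selected
scrambled = add_noise(original, p=1.0, random_state=10)  # every entry is resampled
repeated = add_noise(original, p=1.0, random_state=10)   # same seed, same result
```

With p=0 the vector is returned unchanged, and with p=1 every entry is redrawn, which is why a large p can move the vector far from its starting point. Passing the same random_state makes the corruption reproducible.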