
API Reference#

phones#

PhoneCollection #

val: object property readonly #

If the collection is filtered down to a single phone, return that phone.

Returns:

Type Description
object

A Phone object.

values: List[object] property readonly #

The collection as a list of phones.

Returns:

Type Description
List[object]

A list of Phone objects.

values_with_allophones: List[object] property readonly #

The collection as a list of phones.

Returns:

Type Description
List[object]

A list of Phone objects.

__init__(self, source: PhoneSource = PhoneSource(urls=['https://raw.githubusercontent.com/phoible/dev/master/data/phoible.csv'], index_column='Phoneme', feature_columns=['tone', 'stress', 'syllabic', 'short', 'long', 'consonantal', 'sonorant', 'continuant', 'delayedRelease', 'approximant', 'tap', 'trill', 'nasal', 'lateral', 'labial', 'round', 'labiodental', 'coronal', 'anterior', 'distributed', 'strident', 'dorsal', 'high', 'low', 'front', 'back', 'tense', 'retractedTongueRoot', 'advancedTongueRoot', 'periodicGlottalSource', 'epilaryngealSource', 'spreadGlottis', 'constrictedGlottis', 'fortis', 'raisedLarynxEjective', 'loweredLarynxImplosive', 'click'], language_column='ISO6393', allophone_column='Allophones', dialect_column='SpecificDialect'), cache_dir: str = '/home/runner/.cache/phones', merge_same_language: bool = True, load_dialects: bool = False, _master: object = None) -> None special #

Creates a PhoneCollection object that loads phones from a PhoneSource into a pandas DataFrame.

Parameters:

Name Type Description Default
source PhoneSource

The PhoneSource object that defines the source of the data.

PhoneSource(urls=['https://raw.githubusercontent.com/phoible/dev/master/data/phoible.csv'], index_column='Phoneme', feature_columns=['tone', 'stress', 'syllabic', 'short', 'long', 'consonantal', 'sonorant', 'continuant', 'delayedRelease', 'approximant', 'tap', 'trill', 'nasal', 'lateral', 'labial', 'round', 'labiodental', 'coronal', 'anterior', 'distributed', 'strident', 'dorsal', 'high', 'low', 'front', 'back', 'tense', 'retractedTongueRoot', 'advancedTongueRoot', 'periodicGlottalSource', 'epilaryngealSource', 'spreadGlottis', 'constrictedGlottis', 'fortis', 'raisedLarynxEjective', 'loweredLarynxImplosive', 'click'], language_column='ISO6393', allophone_column='Allophones', dialect_column='SpecificDialect')
cache_dir str

The directory where the data will be downloaded and cached.

'/home/runner/.cache/phones'
merge_same_language bool

If true, multiple phone definitions in the same language are merged.

True
load_dialects bool

If false, dialects are ignored.

False

dialects(self, dialects: Union[str, List[str]], inplace = True) -> object #

Takes a dialect or a list of dialects and returns a copy of the PhoneCollection with only the rows that have one of those dialects.

Parameters:

Name Type Description Default
dialects Union[str, List[str]]

A list of dialects or a single dialect to filter on. Use None to remove all dialects except the one without a specific name.

required
inplace

Modifies the underlying dataframe, affecting phones.

True

Returns:

Type Description
object

A new instance of the class, with the filtered data.

feature_to_weight(feature: str) -> float staticmethod #

Converts a feature string to a float: "-" is converted to -1 and "+" to 1. A comma-delimited list of "+" and "-" is converted to the mean of the corresponding floats. Any other string is converted to a float if possible; otherwise 0.0 is returned.

Parameters:

Name Type Description Default
feature str

The feature to be converted to a weight.

required

Returns:

Type Description
float

The string feature converted to a float; for "+" and "-" inputs (or lists of them) the result lies in the range [-1.0, 1.0].
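The conversion rules described above can be sketched as follows. This is a minimal illustration written from the description only, not the library's source; the name feature_to_weight_sketch is hypothetical.

```python
from statistics import mean


def feature_to_weight_sketch(feature: str) -> float:
    """Map a feature string to a float, following the rules described above."""
    signs = {"+": 1.0, "-": -1.0}
    parts = [part.strip() for part in feature.split(",")]
    # A single "+"/"-" or a comma-delimited list of them: average the signs.
    if all(part in signs for part in parts):
        return mean(signs[part] for part in parts)
    # Otherwise try a plain numeric string, falling back to 0.0.
    try:
        return float(feature)
    except ValueError:
        return 0.0
```

For instance, "+,-" averages to 0.0, while an unparseable string such as "abc" falls back to 0.0.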

get_closest(self, phone: str, src_language: str, tgt_language: str, return_allophones: bool = False, distance_fn: Callable[[Iterable[float], Iterable[float]], float] = <function euclidean at 0x7f3a5748ca70>, distance_weights = None, allow_allophones = True, return_all = False) -> Union[List[Tuple[float, str]], Tuple[List[Tuple[float, str]], List[Tuple[float, str]]]] #

Given a phone, a source language, and a target language, get_closest returns the closest phone(s) in the target language together with their distances from the source phone, computed with the given distance function and distance weights. If return_allophones is True, it also returns the allophones of the source phone in the target language.

Examples:

Suppose we want to find the German phone closest to the English phone ð.

pc = PhoneCollection()
pc.get_closest("ð", "eng", "deu")

[(2.8284271247461903, 'z'), (2.8284271247461903, 'ʒ')]

Parameters:

Name Type Description Default
phone str

The phone to be mapped.

required
src_language str

The language of the source phone, i.e. the phone you want to find the closest match for.

required
tgt_language str

The language of the target phone.

required
return_allophones bool

If True, return a tuple of (closest_phones, allophones)

False
distance_fn Callable[[Iterable[float], Iterable[float]], float]

The distance function to use.

<function euclidean at 0x7f3a5748ca70>
distance_weights

If None, the distance weights are set to 1/n, where n is the number of features. Otherwise, the weights are normalised and then used for the distance calculations.

None
allow_allophones

If True, then if the phone is not found in the inventory, search for a phone the given phone is an allophone of.

True
return_all

If True, return all phones and their distances, not just the closest ones.

False

Returns:

Type Description
Union[List[Tuple[float, str]], Tuple[List[Tuple[float, str]], List[Tuple[float, str]]]]

Returns a list of (distance, phone) tuples for the closest phones, or for all phones if return_all is True. If return_allophones is True, returns a tuple of two lists, the first containing the closest phones and the second the allophones.
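To make the roles of distance_weights and return_all more concrete, here is a minimal sketch of a tie-aware nearest-phone search over toy feature vectors. How the library actually applies the weights inside the distance is not stated above, so the weighted Euclidean form used here is an assumption, as are all function names and the toy inventory.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def normalise_weights(weights: Optional[Sequence[float]], n: int) -> List[float]:
    """Uniform 1/n weights if none are given, else weights scaled to sum to 1."""
    if weights is None:
        return [1.0 / n] * n
    total = sum(weights)
    return [w / total for w in weights]


def weighted_euclidean(u: Sequence[float], v: Sequence[float], weights: Sequence[float]) -> float:
    """Euclidean distance with a per-feature weight (assumed form)."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(u, v, weights)))


def closest_sketch(
    source: Sequence[float],
    inventory: Dict[str, Sequence[float]],
    weights: Optional[Sequence[float]] = None,
    return_all: bool = False,
) -> List[Tuple[float, str]]:
    """Return (distance, label) pairs: all of them, or only those tied for closest."""
    w = normalise_weights(weights, len(source))
    scored = sorted((weighted_euclidean(source, vec, w), label) for label, vec in inventory.items())
    if return_all:
        return scored
    best = scored[0][0]
    return [(d, label) for d, label in scored if math.isclose(d, best)]


# Toy data, not taken from PHOIBLE: four features per phone.
toy_inventory = {"a": [1, -1, -1, 1], "b": [1, 1, 1, 1], "c": [-1, 1, -1, 1]}
toy_source = [1, -1, 1, 1]
```

With the toy data, "a" and "b" are equally close to the source vector, so both are returned, which mirrors the tied result in the example above.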

get_closest_by_phone(self, phone: List[float], distance_fn: Callable[[Iterable[float], Iterable[float]], float] = <function euclidean at 0x7f3a5748ca70>) -> List[Tuple[float, object]] #

Given a phone, return the closest phone in the collection.

Parameters:

Name Type Description Default
phone List[float]

The phone to find the closest phone to.

required
distance_fn Callable[[Iterable[float], Iterable[float]], float]

The function that will be used to measure the distance between phones.

<function euclidean at 0x7f3a5748ca70>

Returns:

Type Description
List[Tuple[float, object]]

A list of tuples, where each tuple contains a distance and a phone.

get_closest_by_vector(self, vector: List[float], distance_fn: Callable[[Iterable[float], Iterable[float]], float] = <function euclidean at 0x7f3a5748ca70>) -> List[Tuple[float, object]] #

Given a vector, find the phone that is closest to the vector.

Parameters:

Name Type Description Default
vector List[float]

The vector we're looking for the closest phones to.

required
distance_fn Callable[[Iterable[float], Iterable[float]], float]

The function that will be used to calculate the distance between the vector and phones.

<function euclidean at 0x7f3a5748ca70>

Returns:

Type Description
List[Tuple[float, object]]

A list of tuples, where each tuple contains a distance and a phone.

get_mean_allophone_distance(self, distance_weights = None, show_progress = False) -> float #

For each row in the dataframe, we get the phone and allophone values. If the allophone is different from the phone, we get the mean distance between the allophone and the phone. We return the mean of all allophone <-> phone distances.

Parameters:

Name Type Description Default
distance_weights

A dictionary of weights for each distance type.

None
show_progress

If True, show a progress bar.

False

Returns:

Type Description
float

The mean of the distances between allophones and their phones.
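The computation described above can be sketched on toy data as follows. The sketch averages over all phone/allophone pairs and uses an unweighted Euclidean distance; whether the library averages per row first, and how it applies distance_weights, is not specified here, so both choices are assumptions. The function name and the vectors are hypothetical.

```python
import math
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def mean_allophone_distance_sketch(
    rows: List[Tuple[str, List[str]]],
    vectors: Dict[str, Sequence[float]],
) -> float:
    """Mean Euclidean distance over all (phone, allophone) pairs that differ."""
    distances = []
    for phone, allophones in rows:
        for allophone in allophones:
            # Skip the phone itself and allophones without a known vector.
            if allophone == phone or allophone not in vectors:
                continue
            distances.append(math.dist(vectors[phone], vectors[allophone]))
    return mean(distances) if distances else 0.0


# Toy two-feature vectors, not taken from any real inventory.
toy_vectors = {"t": [1, -1], "tʰ": [1, 1], "ɾ": [-1, -1]}
toy_rows = [("t", ["t", "tʰ", "ɾ"])]
```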

get_mean_phone_distance(self, phone: str, other_phone: str, distance_fn: Callable[[Iterable[float], Iterable[float]], float] = <function euclidean at 0x7f3a5748ca70>, distance_weights = None) -> float #

For a given phone, find the mean of all the features for that phone. Then, find the distance between that phone and another phone.

Parameters:

Name Type Description Default
phone str

The phone to compare to the other phone.

required
other_phone str

The other phone to compare to.

required
distance_fn Callable[[Iterable[float], Iterable[float]], float]

The distance function to use.

<function euclidean at 0x7f3a5748ca70>
distance_weights

This is a list of weights for each feature.

None

Returns:

Type Description
float

The mean distance between the two phones.
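One way to read the description above is that each phone's feature vectors (for example, one per language) are averaged element-wise before the distance is taken. The sketch below follows that reading with an unweighted Euclidean distance; it is an interpretation, not the library's code, and the helper names and vectors are hypothetical.

```python
import math
from typing import List, Sequence


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of several equally long feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def mean_phone_distance_sketch(
    phone_vectors: Sequence[Sequence[float]],
    other_vectors: Sequence[Sequence[float]],
) -> float:
    """Distance between the averaged vectors of two phones."""
    return math.dist(mean_vector(phone_vectors), mean_vector(other_vectors))


# Toy data: one phone defined twice with a differing second feature.
z_entries = [[1, 1, -1], [1, -1, -1]]
s_entries = [[1, 1, 1], [1, 1, 1]]
```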

langs(self, langs, inplace = True) -> object #

Takes a language or a list of languages and returns a copy of the PhoneCollection with only the rows that have one of those languages.

Parameters:

Name Type Description Default
langs

A list of languages or a single language to filter on.

required
inplace

Modifies the underlying dataframe, affecting phones.

True

Returns:

Type Description
object

A new instance of the class, with the filtered data.

phones(self, phones: Union[str, List[str]]) -> object #

Takes a phone or a list of phones and returns a copy of the PhoneCollection with only the rows that have one of those phones.

Parameters:

Name Type Description Default
phones Union[str, List[str]]

A list of phones or a single phone to filter on.

required

Returns:

Type Description
object

A new instance of the class, with the filtered data.

convert #

This module converts between the "ipa", "xsampa" and "arpabet" formats. The code is adapted from the phonecodes package by Mark Hasegawa-Johnson.

Examples:

A converter object can be used.

from phones.convert import Converter
converter = Converter()
converter("wɜ˞ld", "ipa", "arpabet")

['W', 'ER', 'L', 'D']

You can also list all possible formats.

converter.formats
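
To illustrate the kind of symbol mapping behind the IPA to ARPABET example above, here is a minimal greedy longest-match sketch. It is not the package's implementation: the table covers only the four symbols of that example, and the names IPA_TO_ARPABET and ipa_to_arpabet_sketch are hypothetical.

```python
from typing import Dict, List

# Toy mapping covering only the symbols in the example above.
IPA_TO_ARPABET: Dict[str, str] = {
    "w": "W",
    "\u025c\u02de": "ER",  # reversed open e followed by the rhotic hook (two code points)
    "l": "L",
    "d": "D",
}


def ipa_to_arpabet_sketch(text: str, table: Dict[str, str] = IPA_TO_ARPABET) -> List[str]:
    """Convert an IPA string symbol by symbol, preferring the longest match."""
    longest = max(len(key) for key in table)
    output: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first so multi-character symbols win.
        for size in range(min(longest, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in table:
                output.append(table[chunk])
                i += size
                break
        else:
            raise ValueError(f"no mapping for {text[i]!r} at position {i}")
    return output
```

Matching the longest key first matters because some IPA symbols, such as the rhotacised vowel here, span more than one code point.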

features #

Phone #

__init__(self, index: str, features: Dict[str, Union[int, str]], language_code: Optional[str] = None, allophones: Optional[List[str]] = None, collection: Optional[object] = None) -> None special #

Create a new Phone object.

Parameters:

Name Type Description Default
index str

The index of the phone in the phone set.

required
features Dict[str, Union[int, str]]

A dictionary of features. When not provided with numerical values, - will be replaced with -1 and + with 1.

required
language_code Optional[str]

The language code of the language that the phoneme belongs to.

None
allophones Optional[List[str]]

A list of allophones for the phoneme.

None
collection Optional[object]

The PhoneCollection the phone is contained in.

None

Examples:

The Phone class supports arithmetic operations.

pc = PhoneCollection()
pc.phones("i").val + pc.phones("u").val

[(0.7071067811865476, iu (adn)),(0.7071067811865476, iu (bhg)),...]

And data augmentation.

pc = PhoneCollection()
z = pc.phones("z").val
z_noise = pc.phones("z").val.noise(.05, random_state=42)
z.vector - z_noise.vector

array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., -2., 0., 0., 0., 0., 0., 0., 0.])

If the phone's vector has been altered, we can also find its closest existing phone(s).

pc = PhoneCollection()
z_noise = pc.phones("z").val.noise(.05, random_state=42).closest()

[(0.0, z̤ (xho))]

The search can also be filtered by language(s).

pc = PhoneCollection()
z_noise = pc.phones("z").val.noise(.05, random_state=42).langs("eng").closest()

[(2.0, z (eng))]

closest(self) -> List[Tuple[float, object]] #

Given the current phone's vector, return the closest phone(s) in the collection and their distances.

Returns:

Type Description
List[Tuple[float, object]]

A list of (distance, phone) tuples.

get_feature_vector(self, features: List[str]) -> ndarray #

Get the feature vector of the phone for the features provided.

Returns:

Type Description
ndarray

A numpy array of the feature vector.
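The idea of selecting features by name and returning them in the requested order can be sketched as below. This illustration returns a plain list rather than a numpy array to stay dependency-free, and the function name is hypothetical; the "+" to 1 and "-" to -1 mapping follows the Phone constructor description above.

```python
from typing import Dict, List, Union


def feature_vector_sketch(
    features: Dict[str, Union[int, float, str]],
    names: List[str],
) -> List[float]:
    """Return the values of the requested features, in the order requested."""
    signs = {"+": 1.0, "-": -1.0}
    vector = []
    for name in names:
        value = features[name]
        # "+" and "-" map to 1 and -1; anything else must already be numeric.
        vector.append(signs[value] if value in signs else float(value))
    return vector
```

The order of the output follows the names argument, not the order of the dictionary, so two phones queried with the same list yield comparable vectors.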

langs(self, langs: str) -> object #

Takes a language code or a list of language codes and returns the phone with the language filter applied.

Parameters:

Name Type Description Default
langs str

A list of language codes or a single language code.

required

Returns:

Type Description
object

A new instance of the class.

noise(self, p: float = 0.005, abs_max_change: float = 2, return_close = False, random_state: int = None) -> Union[List[Tuple[float, object]], object] #

Given a phone, it will return a new phone with a random vector that is close to the original phone.

Parameters:

Name Type Description Default
p float

The element-wise probability of a change in a phone vector.

0.005
abs_max_change float

The maximum absolute value an element of the phone vector can change.

2
random_state int

Seed for the random number generator.

None

Returns:

Type Description
Union[List[Tuple[float, object]], object]

The phone object with a noised feature vector.
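The roles of p, abs_max_change and random_state can be illustrated with a small standard-library sketch. The distribution the library draws changes from is not documented here, so the uniform change used below is an assumption, and noise_sketch is a hypothetical name operating on a plain list rather than a Phone.

```python
import random
from typing import List, Optional, Sequence


def noise_sketch(
    vector: Sequence[float],
    p: float = 0.005,
    abs_max_change: float = 2.0,
    random_state: Optional[int] = None,
) -> List[float]:
    """Perturb each element with probability p by at most abs_max_change."""
    rng = random.Random(random_state)  # seeded for reproducibility
    noised = []
    for value in vector:
        if rng.random() < p:
            # Assumed: the change is drawn uniformly from the allowed range.
            value = value + rng.uniform(-abs_max_change, abs_max_change)
        noised.append(value)
    return noised
```

Passing the same random_state twice yields the same perturbed vector, which is what makes seeded examples like the ones above repeatable.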

PhoneFeature #

__init__(self, feature: str, value: float) -> None special #

Create a new instance of a PhoneFeature with the feature and value provided.

Parameters:

Name Type Description Default
feature str

The feature to be evaluated.

required
value float

The value of the feature.

required

sources #

Sources for phone features.

sources.PHOIBLE#

Source for phone inventories from phoible.org. Use the following citation:

@article{moran2014phoible,
title={PHOIBLE online},
author={Moran, Steven and McCloy, Daniel and Wright, Richard},
year={2014},
publisher={Max Planck Institute for Evolutionary Anthropology}
}

sources.PANPHON#

Source for phone inventories from panphon. Use the following citation:

@inproceedings{mortensen2016panphon,
title={Panphon: A resource for mapping IPA segments to articulatory feature vectors},
author={Mortensen, David R and Littell, Patrick and Bharadwaj, Akash and Goyal, Kartik and Dyer, Chris and Levin, Lori},
booktitle={Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers},
pages={3475--3484},
year={2016}
}

PhoneSource dataclass #

The PhoneSource class is a dataclass that stores the information about a phone source.

A phone source is a CSV file containing phone definitions and their linguistic features.

Attributes:

Name Type Description
urls List[str]

a list of URLs to the CSV files

index_column str

the name of the column that contains the index (IPA character(s)) of the phone

feature_columns List[str]

a list of the names of the columns that contain the features of the phone

language_column str

the name of the column that contains the ISO code of the language

dialect_column str

the name of the column that contains the dialect
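
As a rough picture of the attributes listed above, the following sketch declares a stand-in dataclass and fills it with values copied from the default PhoneSource shown in the PhoneCollection constructor. The class name PhoneSourceSketch is hypothetical, the feature list is truncated for brevity, and allophone_column is included only because it appears in that default; the real class may define defaults or extra fields not shown here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PhoneSourceSketch:
    """Stand-in mirroring the documented PhoneSource attributes."""
    urls: List[str]
    index_column: str
    feature_columns: List[str]
    language_column: str
    allophone_column: str
    dialect_column: str


source = PhoneSourceSketch(
    urls=["https://raw.githubusercontent.com/phoible/dev/master/data/phoible.csv"],
    index_column="Phoneme",
    feature_columns=["tone", "stress", "syllabic"],  # truncated for brevity
    language_column="ISO6393",
    allophone_column="Allophones",
    dialect_column="SpecificDialect",
)
```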