API Reference
phones
PhoneCollection
val: object
property, readonly
If the collection is filtered down to a single phone, return that phone.
Returns:
Type | Description |
---|---|
object | A |
values: List[object]
property, readonly
The collection as a list of phones.
Returns:
Type | Description |
---|---|
List[object] | A list of |
values_with_allophones: List[object]
property, readonly
The collection as a list of phones.
Returns:
Type | Description |
---|---|
List[object] | A list of |
__init__(self, source: PhoneSource = PhoneSource(urls=['https://raw.githubusercontent.com/phoible/dev/master/data/phoible.csv'], index_column='Phoneme', feature_columns=['tone', 'stress', 'syllabic', 'short', 'long', 'consonantal', 'sonorant', 'continuant', 'delayedRelease', 'approximant', 'tap', 'trill', 'nasal', 'lateral', 'labial', 'round', 'labiodental', 'coronal', 'anterior', 'distributed', 'strident', 'dorsal', 'high', 'low', 'front', 'back', 'tense', 'retractedTongueRoot', 'advancedTongueRoot', 'periodicGlottalSource', 'epilaryngealSource', 'spreadGlottis', 'constrictedGlottis', 'fortis', 'raisedLarynxEjective', 'loweredLarynxImplosive', 'click'], language_column='ISO6393', allophone_column='Allophones', dialect_column='SpecificDialect'), cache_dir: str = '/home/runner/.cache/phones', merge_same_language: bool = True, load_dialects: bool = False, _master: object = None) -> None
special
Creates a PhoneCollection object that loads phones from a PhoneSource into a pandas DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
source | PhoneSource | The | PhoneSource(urls=['https://raw.githubusercontent.com/phoible/dev/master/data/phoible.csv'], index_column='Phoneme', feature_columns=['tone', 'stress', 'syllabic', 'short', 'long', 'consonantal', 'sonorant', 'continuant', 'delayedRelease', 'approximant', 'tap', 'trill', 'nasal', 'lateral', 'labial', 'round', 'labiodental', 'coronal', 'anterior', 'distributed', 'strident', 'dorsal', 'high', 'low', 'front', 'back', 'tense', 'retractedTongueRoot', 'advancedTongueRoot', 'periodicGlottalSource', 'epilaryngealSource', 'spreadGlottis', 'constrictedGlottis', 'fortis', 'raisedLarynxEjective', 'loweredLarynxImplosive', 'click'], language_column='ISO6393', allophone_column='Allophones', dialect_column='SpecificDialect') |
cache_dir | str | The directory where the data will be downloaded and cached. | '/home/runner/.cache/phones' |
merge_same_language | bool | If true, multiple phone definitions in the same language are merged. | True |
load_dialects | bool | If false, dialects are ignored. | False |
dialects(self, dialects: Union[str, List[str]], inplace = True) -> object
Takes a dialect or a list of dialects and returns a copy of the PhoneCollection with only the rows that have one of those dialects.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dialects | Union[str, List[str]] | A list of dialects or a single dialect to filter on. Use | required |
inplace | | Modifies the underlying dataframe, affecting phones. | True |
Returns:
Type | Description |
---|---|
object | A new instance of the class, with the filtered data. |
feature_to_weight(feature: str) -> float
staticmethod
Converts a string feature to a float: "-" is converted to -1 and "+" to 1. If the string cannot be converted to a float, 0.0 is returned. If it is a comma-delimited list of "+" and "-", the mean of the resulting list of floats is returned.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
feature | str | The feature to be converted to a weight. | required |
Returns:
Type | Description |
---|---|
float | The string feature converted to a float in |
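The conversion rules described for feature_to_weight can be illustrated with a small standalone function. This is a minimal sketch written from the description above, not the library's own code; the name feature_to_weight_sketch is hypothetical.

```python
from statistics import mean


def feature_to_weight_sketch(feature: str) -> float:
    """Hypothetical re-implementation of the rules described above."""
    signs = {"+": 1.0, "-": -1.0}
    parts = [part.strip() for part in feature.split(",")]
    # A single "+" / "-" or a comma-delimited list of them: average the signs.
    if all(part in signs for part in parts):
        return mean(signs[part] for part in parts)
    # Otherwise try a plain float conversion and fall back to 0.0.
    try:
        return float(feature)
    except ValueError:
        return 0.0


print(feature_to_weight_sketch("+"))    # a single plus sign
print(feature_to_weight_sketch("+,-"))  # a mixed list averages out
print(feature_to_weight_sketch("n/a"))  # not convertible
```

Under these rules a mixed value such as "+,-" lands halfway between the two signs, which is one way to read the "mean of the list" wording.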
get_closest(self, phone: str, src_language: str, tgt_language: str, return_allophones: bool = False, distance_fn: Callable[[Iterable[float], Iterable[float]], float] = euclidean, distance_weights = None, allow_allophones = True, return_all = False) -> Union[List[Tuple[float, str]], Tuple[List[Tuple[float, str]], List[Tuple[float, str]]]]
Given a phone, a source language, a target language, a distance function, and distance weights, get_closest returns the closest phone(s) in the target language, together with the distance between the source phone and each of them. It can also return the allophones of the source phone in the target language.
Examples:
Let's say we want to find the closest phone in German to the English phone ð.
[(2.8284271247461903, 'z'), (2.8284271247461903, 'ʒ')]
Parameters:
Name | Type | Description | Default |
---|---|---|---|
phone | str | The phone to be mapped. | required |
src_language | str | The language of the phone that you want to find the closest one to. | required |
tgt_language | str | The language of the target phone. | required |
return_allophones | bool | If True, return a tuple of | False |
distance_fn | Callable[[Iterable[float], Iterable[float]], float] | The distance function to use. | euclidean |
distance_weights | | If None, the distance weights are set to 1/n, where n is the number of features. Otherwise, the weights are normalised and then used for the distance calculations. | None |
allow_allophones | | If True, then if the phone is not found in the inventory, search for a phone the given | True |
return_all | | If True, return all phones and their distances, not just the closest ones. | False |
Returns:
Type | Description |
---|---|
Union[List[Tuple[float, str]], Tuple[List[Tuple[float, str]], List[Tuple[float, str]]]] | Returns a list of |
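The distance_weights behaviour described in the table can be sketched as follows. Two points here are assumptions rather than documented facts: that "normalised" means dividing by the sum of the weights, and that each weight scales the squared difference of its feature. The helper names normalise_weights and weighted_euclidean are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def normalise_weights(weights: Optional[Sequence[float]], n_features: int) -> List[float]:
    """Uniform 1/n weights when none are given, else weights rescaled to sum to 1."""
    if weights is None:
        return [1.0 / n_features] * n_features
    total = sum(weights)  # assumption: normalisation divides by the sum
    return [w / total for w in weights]


def weighted_euclidean(u: Sequence[float], v: Sequence[float], weights: Sequence[float]) -> float:
    """Euclidean distance with one weight per squared feature difference (assumed form)."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(u, v, weights)))


uniform = normalise_weights(None, 4)      # four equal weights
custom = normalise_weights([1.0, 3.0], 2)  # rescaled so they sum to 1
print(uniform, custom)
print(weighted_euclidean([0.0, 0.0], [2.0, 2.0], [0.5, 0.5]))
```

Normalising first keeps distances comparable between runs that pass weights on different scales.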
get_closest_by_phone(self, phone: List[float], distance_fn: Callable[[Iterable[float], Iterable[float]], float] = euclidean) -> List[Tuple[float, object]]
Given a phone, return the closest phone(s) in the collection.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
phone | List[float] | The phone to find the closest phone to. | required |
distance_fn | Callable[[Iterable[float], Iterable[float]], float] | The function that will be used to measure the distance between phones. | euclidean |
Returns:
Type | Description |
---|---|
List[Tuple[float, object]] | A list of tuples, where each tuple contains a distance and a phone. |
get_closest_by_vector(self, vector: List[float], distance_fn: Callable[[Iterable[float], Iterable[float]], float] = euclidean) -> List[Tuple[float, object]]
Given a vector, find the phone(s) closest to that vector.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
vector | List[float] | The vector we're looking for the closest phones to. | required |
distance_fn | Callable[[Iterable[float], Iterable[float]], float] | The function that will be used to calculate the distance between the vector and phones. | euclidean |
Returns:
Type | Description |
---|---|
List[Tuple[float, object]] | A list of tuples, where each tuple contains a distance and a phone. |
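A nearest-phone lookup of this shape can be sketched in a few lines. The sketch below assumes a toy inventory stored as a plain dictionary of made-up two-element vectors, and it keeps every phone tied for the smallest distance, which would explain why a list is returned. The names closest_by_vector and toy_inventory are hypothetical, and euclidean is defined locally rather than imported.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Plain Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def closest_by_vector(
    vector: Sequence[float],
    inventory: Dict[str, Sequence[float]],
    distance_fn: Callable[[Sequence[float], Sequence[float]], float] = euclidean,
) -> List[Tuple[float, str]]:
    """Return (distance, phone) tuples for every phone tied for the minimum distance."""
    distances = [(distance_fn(vector, feats), name) for name, feats in inventory.items()]
    best = min(d for d, _ in distances)
    return [(d, name) for d, name in distances if d == best]


# Made-up vectors for illustration only, not real feature values.
toy_inventory = {"a": [0.0, 0.0], "b": [2.0, 0.0], "c": [0.0, 2.0]}
print(closest_by_vector([1.0, 0.0], toy_inventory))
```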
get_mean_allophone_distance(self, distance_weights = None, show_progress = False) -> float
For each row in the dataframe, takes the phone and its allophone values. Where an allophone differs from the phone, the mean distance between the allophone and the phone is computed. The result is the mean of all allophone <-> phone distances.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
distance_weights | | A dictionary of weights for each distance type. | None |
show_progress | | If True, show a progress bar. | False |
Returns:
Type | Description |
---|---|
float | The mean of the distances between allophones and their phones. |
get_mean_phone_distance(self, phone: str, other_phone: str, distance_fn: Callable[[Iterable[float], Iterable[float]], float] = euclidean, distance_weights = None) -> float
For a given phone, find the mean of all the features for that phone. Then, find the distance between that phone and another phone.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
phone | str | The phone to compare to the other phone. | required |
other_phone | str | The other phone to compare to. | required |
distance_fn | Callable[[Iterable[float], Iterable[float]], float] | The distance function to use. | euclidean |
distance_weights | | This is a list of weights for each feature. | None |
Returns:
Type | Description |
---|---|
float | The mean distance between the two phones. |
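One way to read the description of get_mean_phone_distance is that a phone can have several feature vectors (for example one per language), which are averaged element-wise before the distance is taken. That reading is an assumption, and the sketch below uses made-up vectors and the hypothetical helper names mean_vector and mean_phone_distance.

```python
import math
from typing import List, Sequence


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of several equally long feature vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def mean_phone_distance(
    phone_vectors: Sequence[Sequence[float]],
    other_vectors: Sequence[Sequence[float]],
) -> float:
    """Euclidean distance between the mean vectors of two phones."""
    a = mean_vector(phone_vectors)
    b = mean_vector(other_vectors)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Two made-up definitions of one phone and a single definition of another.
print(mean_vector([[1.0, -1.0], [1.0, 1.0]]))
print(mean_phone_distance([[1.0, -1.0], [1.0, 1.0]], [[-1.0, 0.0]]))
```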
langs(self, langs, inplace = True) -> object
Takes a language or a list of languages and returns a copy of the PhoneCollection with only the rows that have one of those languages.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
langs | | A list of languages or a single language to filter on. | required |
inplace | | Modifies the underlying dataframe, affecting phones. | True |
Returns:
Type | Description |
---|---|
object | A new instance of the class, with the filtered data. |
phones(self, phones: Union[str, List[str]]) -> object
Takes a phone or a list of phones and returns a copy of the PhoneCollection with only the rows that have one of those phones.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
phones | Union[str, List[str]] | A list of phones or a single phone to filter on. | required |
Returns:
Type | Description |
---|---|
object | A new instance of the class, with the filtered data. |
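The dialects, langs and phones filters all accept either a single string or a list of strings. A minimal sketch of that pattern is shown below; it uses plain dictionaries as stand-in rows instead of the pandas DataFrame the class loads, and the function name filter_rows and the column names are hypothetical.

```python
from typing import Dict, List, Union


def filter_rows(
    rows: List[Dict[str, str]],
    column: str,
    wanted: Union[str, List[str]],
) -> List[Dict[str, str]]:
    """Keep rows whose value in `column` is one of `wanted`; the input list is not modified."""
    wanted_set = {wanted} if isinstance(wanted, str) else set(wanted)
    return [row for row in rows if row[column] in wanted_set]


# Stand-in rows for illustration only.
rows = [
    {"phone": "z", "language": "eng"},
    {"phone": "z", "language": "deu"},
    {"phone": "ð", "language": "eng"},
]
print(filter_rows(rows, "language", "eng"))
print(filter_rows(rows, "phone", ["z", "ð"]))
```

Wrapping a lone string in a set avoids the common mistake of iterating over its characters.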
convert
This module converts between the "ipa", "xsampa" and "arpabet" formats. The code is adapted from the phonecodes package by Mark Hasegawa-Johnson.
Examples:
A converter object can be used.
['W', 'ER', 'L', 'D']
You can also list all possible formats.
features
Phone
__init__(self, index: str, features: Dict[str, Union[int, str]], language_code: Optional[str] = None, allophones: Optional[List[str]] = None, collection: Optional[object] = None) -> None
special
Create a new Phone object.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
index | str | The index of the phone in the phone set. | required |
features | Dict[str, Union[int, str]] | A dictionary of features. When not provided with numerical values, | required |
language_code | Optional[str] | The language code of the language that the phoneme belongs to. | None |
allophones | Optional[List[str]] | A list of allophones for the phoneme. | None |
collection | Optional[object] | The | None |
Examples:
The Phone class supports arithmetic operations.
[(0.7071067811865476, iu (adn)),(0.7071067811865476, iu (bhg)),...]
And data augmentation.
array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., -2., 0., 0., 0., 0., 0., 0., 0.])
If the phone's vector has been altered, we can also find its closest existing phone(s).
[(0.0, z̤ (xho))]
... filtered by language(s)
[(2.0, z (eng))]
closest(self) -> List[Tuple[float, object]]
Given the current phone's vector, return the closest phone(s) in the collection and their distances.
Returns:
Type | Description |
---|---|
List[Tuple[float, object]] | A list of (distance, phone) tuples. |
get_feature_vector(self, features: List[str]) -> ndarray
Get the feature vector of the phone for the features provided.
Returns:
Type | Description |
---|---|
ndarray | A numpy array of the feature vector. |
langs(self, langs: str) -> object
Takes a language code or a list of language codes and returns the phone with the language filter applied.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
langs | str | A list of language codes or a single language code. | required |
Returns:
Type | Description |
---|---|
object | A new instance of the class. |
noise(self, p: float = 0.005, abs_max_change: float = 2, return_close = False, random_state: int = None) -> Union[List[Tuple[float, object]], object]
Returns a new phone whose feature vector is a randomly perturbed version of the original phone's vector and stays close to it.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
p | float | The element-wise probability of a change in a phone vector. | 0.005 |
abs_max_change | float | The maximum absolute value an element of the phone vector can change. | 2 |
random_state | int | Seed used for the random numbers used. | None |
Returns:
Type | Description |
---|---|
Union[List[Tuple[float, object]], object] | The phone object with a noised feature vector. |
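The roles of p, abs_max_change and random_state can be illustrated with a standalone sketch. How the library actually draws the change is not stated here, so the uniform draw below is an assumption, and the function name noise_sketch is hypothetical.

```python
import random
from typing import List, Optional, Sequence


def noise_sketch(
    vector: Sequence[float],
    p: float = 0.005,
    abs_max_change: float = 2,
    random_state: Optional[int] = None,
) -> List[float]:
    """Perturb each element with probability p by at most abs_max_change."""
    rng = random.Random(random_state)  # seeded generator for reproducibility
    noised = []
    for value in vector:
        if rng.random() < p:
            # Assumption: the change is drawn uniformly from
            # [-abs_max_change, abs_max_change].
            value = value + rng.uniform(-abs_max_change, abs_max_change)
        noised.append(value)
    return noised


original = [0.0] * 37  # same length as the default feature list
print(noise_sketch(original, p=0.1, random_state=42))
```

With the default p of 0.005 and 37 features, most calls would leave the vector unchanged, so a larger p is used in the usage line to make the effect visible.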
sources
Sources for phone features.
sources.PHOIBLE
Source for phone inventories from phoible.org. Use the following citation:
sources.PANPHON
Source for phone inventories from panphon. Use the following citation:
PhoneSource
dataclass
The PhoneSource class is a dataclass that stores the information about a phone source.
A phone source is a CSV file containing phone definitions and their linguistic features.
Attributes:
Name | Type | Description |
---|---|---|
urls | List[str] | A list of URLs to the CSV files. |
index_column | str | The name of the column that contains the index (IPA character(s)) of the phone. |
feature_columns | List[str] | A list of the names of the columns that contain the features of the phone. |
language_column | str | The name of the column that contains the ISO code of the language. |
dialect_column | str | The name of the column that contains the dialect. |
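The attribute table maps directly onto a dataclass. The sketch below mirrors only the attributes listed in the table (the default shown in the PhoneCollection signature also carries an allophone_column, which the table does not describe). The class name PhoneSourceSketch and the URL are hypothetical; the column names are taken from the default source shown earlier.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PhoneSourceSketch:
    """Stand-in mirroring the attributes listed in the table above."""
    urls: List[str]
    index_column: str
    feature_columns: List[str]
    language_column: str
    dialect_column: str


custom_source = PhoneSourceSketch(
    urls=["https://example.org/my_phones.csv"],  # hypothetical URL
    index_column="Phoneme",
    feature_columns=["syllabic", "nasal", "round"],
    language_column="ISO6393",
    dialect_column="SpecificDialect",
)
print(custom_source)
```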