Elan

class pympi.Elan.Eaf(file_path=None, author='pympi')

Read and write Elan’s Eaf files.

Note

All times are in milliseconds and must be whole numbers; fractional values are not supported.
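Since times are whole milliseconds, values measured in seconds need converting before they are passed in. A minimal sketch, assuming the input is a float number of seconds; the helper name `to_ms` is hypothetical and not part of pympi:

```python
def to_ms(seconds):
    """Convert a time in seconds to a whole number of milliseconds."""
    ms = int(round(seconds * 1000))
    if ms < 0:
        # Negative times are rejected by several methods documented here.
        raise ValueError("times must not be negative")
    return ms


print(to_ms(2.5))
```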

Variables:

annotation_document (dict) – Annotation document TAG entries.
licences (dict) – Licences included in the file.
header (dict) – XML header.
media_descriptors (list) – Linked files, where every file is of the form: {attrib}.
properties (list) – Properties, where every property is of the form: (value, {attrib}).
linked_file_descriptors (list) – Secondary linked files, where every linked file is of the form: {attrib}.
timeslots (dict) – Timeslot data of the form: {id -> time(ms)}.
tiers (dict) –
Tiers, where every tier is of the form: {tier_name -> (aligned_annotations, reference_annotations, attributes, ordinal)},

aligned_annotations of the form: [{id -> (begin_ts, end_ts, value, svg_ref)}],

reference_annotations of the form: [{id -> (reference, value, previous, svg_ref)}].
linguistic_types (list) – Linguistic types, where every type is of the form: {id -> attrib}.
locales (list) – Locales, every locale is of the form: {attrib}.
constraints (dict) – Constraints, every constraint is of the form: {stereotype -> description}.
controlled_vocabularies (dict) –
Controlled vocabularies, where every controlled vocabulary is of the form: {id -> (descriptions, entries, ext_ref)},

descriptions of the form: [(lang_ref, text)],

entries of the form: {id -> (values, ext_ref)},

values of the form: [(lang_ref, description, text)].
external_refs (list) – External references, where every reference is of the form: [id, type, value].
lexicon_refs (list) – Lexicon references, where every reference is of the form: [{attribs}].
annotations (dict) – Dictionary of annotations of the form: {id -> tier}; this is only used internally.
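To make the nested forms above concrete, the sketch below builds plain dictionaries in the documented shapes and resolves one tier's aligned annotations to millisecond times. It does not use pympi; the ids (`ts1`, `a1`), the tier name, the attribute dictionary and the helper `resolve_aligned` are all hypothetical, and the aligned annotations are modelled as a single dict.

```python
# Plain-Python mock-up of the documented shapes; ids and names are made up.
timeslots = {"ts1": 0, "ts2": 1500, "ts3": 1500, "ts4": 4000}

# tier_name -> (aligned_annotations, reference_annotations, attributes, ordinal)
tiers = {
    "speakerA": (
        {"a1": ("ts1", "ts2", "hello", None),
         "a2": ("ts3", "ts4", "world", None)},
        {},
        {"TIER_ID": "speakerA"},
        1,
    ),
}


def resolve_aligned(tiers, timeslots, tier_name):
    """Return sorted (begin_ms, end_ms, value) tuples for one tier."""
    aligned = tiers[tier_name][0]  # KeyError for an unknown tier
    return sorted(
        (timeslots[begin], timeslots[end], value)
        for begin, end, value, _svg_ref in aligned.values()
    )


print(resolve_aligned(tiers, timeslots, "speakerA"))
```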

__init__(file_path=None, author='pympi')

Construct either a new Eaf file or read one from a file/stream.

Parameters:

file_path (str) – Path to read from, - for stdin. If `None`, an empty Eaf file will be created.
author (str) – Author of the file.

add_linguistic_type(lingtype, constraints=None, timealignable=True, graphicreferences=False, extref=None, param_dict=None)

Add a linguistic type.

Parameters:

lingtype (str) – Name of the linguistic type.
constraints (list) – Constraint names.
timealignable (bool) – Flag for time alignable.
graphicreferences (bool) – Flag for graphic references.
extref (str) – External reference.
param_dict (dict) – TAG attributes; when this is not None, all other options are ignored. Please only use dictionaries coming from get_parameters_for_linguistic_type().

Raises KeyError:

If a constraint is not defined.

add_linked_file(file_path, relpath=None, mimetype=None, time_origin=None, ex_from=None)

Add a linked file.

Parameters:

file_path (str) – Path of the file.
relpath (str) – Relative path of the file.
mimetype (str) – Mimetype of the file. If `None`, it is guessed from the file extension, which currently only works for wav, mpg, mpeg and xml.
time_origin (int) – Time origin for the media file.
ex_from (str) – Extracted from field.

Raises KeyError:

If the mimetype had to be guessed and the file has a non-standard extension or an unknown mimetype.
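The extension-based guess and its KeyError can be pictured as a dictionary lookup. This is only a sketch of the idea: the mimetype strings, the lowercasing of the extension and the helper `guess_mimetype` are assumptions, not pympi's actual table or code.

```python
import os

# Assumed mapping for illustration; the values pympi really uses may differ.
KNOWN_MIMETYPES = {
    "wav": "audio/x-wav",
    "mpg": "video/mpeg",
    "mpeg": "video/mpeg",
    "xml": "text/xml",
}


def guess_mimetype(file_path):
    """Look up a mimetype by file extension; unknown extensions raise KeyError."""
    extension = os.path.splitext(file_path)[1].lstrip(".").lower()
    return KNOWN_MIMETYPES[extension]


print(guess_mimetype("recording.wav"))
```

Passing an explicit `mimetype` avoids the lookup altogether, which is the safer route for formats outside the four listed extensions.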

add_secondary_linked_file(file_path, relpath=None, mimetype=None, time_origin=None, assoc_with=None)

Add a secondary linked file.

Parameters:

file_path (str) – Path of the file.
relpath (str) – Relative path of the file.
mimetype (str) – Mimetype of the file. If `None`, it is guessed from the file extension, which currently only works for wav, mpg, mpeg and xml.
time_origin (int) – Time origin for the media file.
assoc_with (str) – Associated with field.

Raises KeyError:

If the mimetype had to be guessed and the file has a non-standard extension or an unknown mimetype.

add_tier(tier_id, ling='default-lt', parent=None, locale=None, part=None, ann=None, tier_dict=None)

Add a tier. When no linguistic type is given and the default linguistic type is unavailable then the assigned linguistic type will be the first in the list.

Parameters:

tier_id (str) – Name of the tier.
ling (str) – Linguistic type, if the type is not available it will warn and pick the first available type.
parent (str) – Parent tier name.
locale (str) – Locale.
part (str) – Participant.
ann (str) – Annotator.
tier_dict (dict) – TAG attributes; when this is not None, all other options are ignored. Please only use dictionaries coming from get_parameters_for_tier().

Raises ValueError:

If the tier_id is empty.

child_tiers_for(id_tier)

Give all child tiers for a tier.

Parameters:	id_tier (str) – Name of the tier.
Returns:	List of all children.
Raises:	KeyError – If the tier does not exist.

clean_time_slots()

Clean up all unused timeslots.

Warning

This can and will take time for larger tiers. When you want to do a lot of operations on a lot of tiers, please unset the flags for cleaning in the functions so that the cleaning is only performed afterwards.
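The cost comes from having to look at every annotation before any timeslot can be declared unused. A minimal sketch of that idea on plain dictionaries in the documented `timeslots` and `tiers` shapes; the helpers `unused_timeslots` and `clean` are hypothetical and are not pympi's implementation.

```python
def unused_timeslots(timeslots, tiers):
    """Return ids of timeslots that no aligned annotation refers to."""
    used = set()
    for aligned, _refs, _attrs, _ordinal in tiers.values():
        for begin_ts, end_ts, _value, _svg_ref in aligned.values():
            used.update((begin_ts, end_ts))
    return sorted(set(timeslots) - used)


def clean(timeslots, tiers):
    """Delete unused timeslots in place and return how many were removed."""
    stale = unused_timeslots(timeslots, tiers)
    for ts_id in stale:
        del timeslots[ts_id]
    return len(stale)


timeslots = {"ts1": 0, "ts2": 900, "ts3": 1200}
tiers = {"speakerA": ({"a1": ("ts1", "ts2", "hi", None)}, {}, {}, 1)}
removed = clean(timeslots, tiers)
print(removed, timeslots)
```

Running one such pass after a batch of removals does the scan once instead of once per call, which is the reason for the `clean` flags on the removal methods.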

copy_tier(eaf_obj, tier_name)

Copies a tier to another pympi.Elan.Eaf object.

Parameters:

eaf_obj (pympi.Elan.Eaf) – Target Eaf object.
tier_name (str) – Name of the tier.

Raises KeyError:

If the tier doesn’t exist.

create_controlled_vocabulary(cv_id, descriptions, entries, ext_ref=None)

Create a controlled vocabulary.

Warning

This is a very raw implementation and you should check the Eaf file format specification for the entries.

Parameters:

cv_id (str) – Name of the controlled vocabulary.
descriptions (list) – List of descriptions.
entries (dict) – Entries dictionary.
ext_ref (str) – External reference.

create_gaps_and_overlaps_tier(tier1, tier2, tier_name=None, maxlen=-1, fast=False)

Create a tier with the gaps and overlaps of the annotations. For types see get_gaps_and_overlaps().

Parameters:

tier1 (str) – Name of the first tier.
tier2 (str) – Name of the second tier.
tier_name (str) – Name of the new tier; if `None`, the name will be generated.
maxlen (int) – Maximum length of gaps (skip longer ones); if `-1`, no maximum will be used.
fast (bool) – Flag for using the fast method.

Returns:

List of gaps and overlaps of the form: `[(type, start, end)]`.

Raises:

KeyError – If a tier does not exist.
IndexError – If no annotations are available in the tiers.

extract(start, end)

Extracts the selected time frame as a new object.

Parameters:

start (int) – Start time.
end (int) – End time.

Returns:

pympi.Elan.Eaf object containing the extracted frame.

filter_annotations(tier, tier_name=None, filtin=None, filtex=None, regex=False, safe=False)

Filter annotations in a tier using an exclusive and/or inclusive filter.

Parameters:

tier (str) – Name of the tier.
tier_name (str) – Name of the output tier; when None, the name will be generated.
filtin (list) – List of strings to be included; if None, all annotations are included.
filtex (list) – List of strings to be excluded; if None, no strings are excluded.
regex (bool) – If this flag is set, the filters are treated as regex matches.
safe (bool) – Ignore zero-length annotations (when working with possibly malformed data).

Raises KeyError:

If the tier does not exist.
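A minimal sketch of how an inclusive and an exclusive filter can combine on annotation values. It assumes exact string comparison when `regex` is unset and `re.match` (anchored at the start of the value) when it is set; the real matching rules may differ, and the helper `keep_annotation` is hypothetical.

```python
import re


def keep_annotation(value, filtin=None, filtex=None, regex=False):
    """Decide whether an annotation value passes both filters."""
    def matches(pattern):
        if regex:
            return re.match(pattern, value) is not None
        return pattern == value

    # Inclusive filter: the value has to match at least one entry.
    if filtin is not None and not any(matches(p) for p in filtin):
        return False
    # Exclusive filter: the value must not match any entry.
    if filtex is not None and any(matches(p) for p in filtex):
        return False
    return True


annotations = [(0, 500, "yes"), (500, 900, "no"), (900, 1400, "yeah")]
kept = [a for a in annotations
        if keep_annotation(a[2], filtin=["ye.*"], regex=True)]
print(kept)
```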

generate_annotation_id()

Generate the next annotation id. This function is mainly used internally.

generate_ts_id(time=None)

Generate the next timeslot id. This function is mainly used internally.

Parameters:	time (int) – Initial time to assign to the timeslot.
Raises:	ValueError – If the time is negative.

get_annotation_data_at_time(id_tier, time)

Give the annotations at the given time.

Parameters:

id_tier (str) – Name of the tier.
time (int) – Time of the annotation.

Returns:

List of annotations at that time.

Raises KeyError:

If the tier does not exist.

get_annotation_data_between_times(id_tier, start, end)

Gives the annotations within the times.

Parameters:

id_tier (str) – Name of the tier.
start (int) – Start time of the annotation.
end (int) – End time of the annotation.

Returns:

List of annotations within that time.

Raises KeyError:

If the tier does not exist.
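The two time-based lookups can be pictured on a plain list of `(begin, end, value)` tuples. The boundary handling here is an assumption: the point query treats both ends as inclusive and the range query keeps every annotation that overlaps the window, which may not match pympi exactly. Both helper names are hypothetical.

```python
def annotations_at(annotations, time):
    """Annotations whose interval contains the time (ends inclusive)."""
    return [a for a in annotations if a[0] <= time <= a[1]]


def annotations_between(annotations, start, end):
    """Annotations that overlap the window from start to end."""
    return [a for a in annotations if a[0] < end and a[1] > start]


data = [(0, 1000, "a"), (1000, 2000, "b"), (2500, 3000, "c")]
print(annotations_at(data, 1000))
print(annotations_between(data, 900, 1100))
```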

get_annotation_data_for_tier(id_tier)

Gives a list of annotations of the form: (begin, end, value).

Parameters:	id_tier (str) – Name of the tier.
Raises:	KeyError – If the tier does not exist.

get_full_time_interval()

Give the full time interval of the file.

Returns:	Tuple of the form: `(min_time, max_time)`.

get_gaps_and_overlaps(tier1, tier2, maxlen=-1)

Give gaps and overlaps. The return types are shown in the table below. The string will be of the format: id_tiername_tiername.

Note

There is also a faster method: get_gaps_and_overlaps2()

For example when a gap occurs between tier1 and tier2 and they are called speakerA and speakerB the annotation value of that gap will be G12_speakerA_speakerB.

The gaps and overlaps are calculated using Heldner and Edlund’s method, found in:

Heldner, M., & Edlund, J. (2010). Pauses, gaps and overlaps in conversations. Journal of Phonetics, 38(4), 555–568. doi:10.1016/j.wocn.2010.08.002

id	Description
O12	Overlap from tier1 to tier2
O21	Overlap from tier2 to tier1
G12	Between speaker gap from tier1 to tier2
G21	Between speaker gap from tier2 to tier1
W12	Within speaker overlap from tier2 in tier1
W21	Within speaker overlap from tier1 in tier2
P1	Pause for tier1
P2	Pause for tier2

Parameters:

tier1 (str) – Name of the first tier.
tier2 (str) – Name of the second tier.
maxlen (int) – Maximum length of gaps (skip longer ones); if `-1`, no maximum will be used.

Yields:

Tuples of the form `[(start, end, type)]`.

Raises:

KeyError – If a tier does not exist.
IndexError – If no annotations are available in the tiers.
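The naming scheme and the `maxlen` rule can be illustrated without computing any gaps. In this sketch the input tuples are invented, the helpers `gap_label` and `keep_gap` are hypothetical, and keeping a gap whose length equals `maxlen` exactly is an assumption.

```python
def gap_label(kind, tier1, tier2):
    """Build a value of the documented form id_tiername_tiername."""
    return "{}_{}_{}".format(kind, tier1, tier2)


def keep_gap(start, end, maxlen=-1):
    """Apply the documented maxlen rule: -1 disables the limit."""
    return maxlen == -1 or end - start <= maxlen


# Hypothetical (start, end, type) tuples in the documented yield shape.
found = [(1000, 1200, "G12"), (3000, 9000, "G21")]
labelled = [
    (start, end, gap_label(kind, "speakerA", "speakerB"))
    for start, end, kind in found
    if keep_gap(start, end, maxlen=1000)
]
print(labelled)
```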

get_gaps_and_overlaps2(tier1, tier2, maxlen=-1)

Faster variant of get_gaps_and_overlaps().

Parameters:

tier1 (str) – Name of the first tier.
tier2 (str) – Name of the second tier.
maxlen (int) – Maximum length of gaps (skip longer ones); if `-1`, no maximum will be used.

Yields:

Tuples of the form `[(start, end, type)]`.

Raises KeyError:

If a tier does not exist.

get_linguistic_type_names()

Give a list of available linguistic types.

Returns:	List of linguistic type names.

get_linked_files()

Give all linked files.

get_parameters_for_linguistic_type(lingtype)

Give the parameter dictionary; this is usable in add_linguistic_type().

Parameters:	lingtype (str) – Name of the linguistic type.
Raises:	KeyError – If the linguistic type doesn’t exist.

get_parameters_for_tier(id_tier)

Give the parameter dictionary; this is usable in add_tier().

Parameters:	id_tier (str) – Name of the tier.
Returns:	Dictionary of parameters.
Raises:	KeyError – If the tier does not exist.

get_ref_annotation_at_time(tier, time)

Give the ref annotations at the given time.

Parameters:

tier (str) – Name of the tier.
time (int) – Time of the annotation of the parent.

Returns:

List of annotations at that time.

Raises KeyError:

If the tier does not exist.

get_ref_annotation_data_for_tier(id_tier)

Give a list of all reference annotations of the form: [(start, end, value, refvalue)].

Parameters:	id_tier (str) – Name of the tier.
Yields:	Reference annotations within that tier.
Raises:	KeyError – If the tier does not exist.

get_secondary_linked_files()

Give all secondary linked files.

get_tier_ids_for_linguistic_type(ling_type, parent=None)

Give a list of all tiers matching a linguistic type.

Parameters:

ling_type (str) – Name of the linguistic type.
parent (str) – Only match tiers from this parent; when `None`, this option will be ignored.

Returns:

List of tier names.

Raises KeyError:

If a tier or linguistic type does not exist.

get_tier_names()

List all the tier names.

Returns:	List of all tier names.

insert_annotation(id_tier, start, end, value='', svg_ref=None)

Insert an annotation.

Parameters:

id_tier (str) – Name of the tier.
start (int) – Start time of the annotation.
end (int) – End time of the annotation.
value (str) – Value of the annotation.
svg_ref (str) – Svg reference.

Raises:

KeyError – If the tier does not exist.
ValueError – If one of the values is negative, if start is bigger than end, or if the tier already contains ref annotations.
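Checking times before the call makes the ValueError cases easy to catch early. The sketch below covers only the two time-related conditions listed above (negative values and start bigger than end); the helper `check_times` is hypothetical and not part of pympi.

```python
def check_times(start, end):
    """Raise ValueError for time pairs that the method above rejects."""
    if start < 0 or end < 0:
        raise ValueError("start and end must not be negative")
    if start > end:
        raise ValueError("start must not be bigger than end")
    return start, end


print(check_times(0, 1500))
```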

insert_ref_annotation(id_tier, tier2, time, value, prev=None, svg=None)

Insert a reference annotation.

Parameters:

id_tier (str) – Name of the tier.
tier2 (str) – Tier of the referenced annotation.
time (int) – Time of the referenced annotation.
value (str) – Value of the annotation.
prev (str) – Id of the previous annotation.
svg (str) – Svg reference.

Raises:

KeyError – If the tier does not exist.
ValueError – If the tier already contains normal annotations or if there is no annotation in the referenced tier at the given time.

merge_tiers(tiers, tiernew=None, gapt=0, sep='_', safe=False)

Merge tiers into a new tier and, when the gap is lower than the threshold, glue the annotations together.

Parameters:

tiers (list) – List of tier names.
tiernew (str) – Name for the new tier, if None the name will be generated.
gapt (int) – Threshold for the gaps; if this is set to 10, all gaps below 10 are ignored.
sep (str) – Separator for the merged annotations.
safe (bool) – Ignore zero-length annotations (when working with possibly malformed data).

Raises KeyError:

If a tier does not exist.
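The gluing rule can be sketched on plain lists of `(start, end, value)` tuples. This is an illustration under assumptions rather than pympi's code: annotations are processed in start order, a gap strictly below `gapt` joins two annotations, and joined values are concatenated with `sep`. The helper `merge_annotations` is hypothetical.

```python
def merge_annotations(tiers, gapt=0, sep="_"):
    """Merge several lists of (start, end, value) into one glued list."""
    merged = []
    for start, end, value in sorted(a for tier in tiers for a in tier):
        if merged and start - merged[-1][1] < gapt:
            # Gap is below the threshold: extend the previous annotation.
            prev_start, prev_end, prev_value = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end),
                          prev_value + sep + value)
        else:
            merged.append((start, end, value))
    return merged


tier_a = [(0, 1000, "hi"), (3000, 4000, "bye")]
tier_b = [(1005, 2000, "there")]
print(merge_annotations([tier_a, tier_b], gapt=10))
```

With `gapt=0` the same input stays as three separate annotations, since the 5 ms gap is not below the threshold.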

remove_all_annotations_from_tier(id_tier, clean=True)

Remove all annotations from a tier.

Parameters:

id_tier (str) – Name of the tier.
clean (bool) – Flag to clean the timeslots afterwards.

Raises KeyError:

If the tier does not exist.

remove_annotation(id_tier, time, clean=True)

Remove an annotation in a tier. If you need speed, the best thing is to clean the timeslots only after the last removal.

Parameters:

id_tier (str) – Name of the tier.
time (int) – Timepoint within the annotation.
clean (bool) – Flag to clean the timeslots afterwards.

Returns:

Number of removed annotations.

Raises KeyError:

If the tier does not exist.

remove_controlled_vocabulary(cv)

Remove a controlled vocabulary.

Parameters:	cv (str) – Controlled vocabulary id.
Raises:	KeyError – If the controlled vocabulary does not exist.

remove_linguistic_type(ling_type)

Remove a linguistic type.

Parameters:	ling_type (str) – Name of the linguistic type.
Raises:	KeyError – When the linguistic type doesn’t exist.

remove_linked_files(file_path=None, relpath=None, mimetype=None, time_origin=None, ex_from=None)

Remove all linked files that match all the criteria; criteria that are None are ignored.

Parameters:

file_path (str) – Path of the file.
relpath (str) – Relative filepath.
mimetype (str) – Mimetype of the file.
time_origin (int) – Time origin.
ex_from (str) – Extracted from.

remove_secondary_linked_files(file_path=None, relpath=None, mimetype=None, time_origin=None, assoc_with=None)

Remove all secondary linked files that match all the criteria; criteria that are None are ignored.

Parameters:

file_path (str) – Path of the file.
relpath (str) – Relative filepath.
mimetype (str) – Mimetype of the file.
time_origin (int) – Time origin.
assoc_with (str) – Associated with.

remove_tier(id_tier, clean=True)

Remove a tier.

Parameters:

id_tier (str) – Name of the tier.
clean (bool) – Flag to also clean the timeslots.

Raises KeyError:

If the tier does not exist.

remove_tiers(tiers)

Remove multiple tiers. Note that this is a lot faster than removing them individually because of the delayed cleaning of timeslots.

Parameters:	tiers (list) – Names of the tiers to remove.
Raises:	KeyError – If a tier does not exist.

shift_annotations(time)

Shift all annotations in time. When a left shift is applied, annotations at the beginning can be squashed or discarded.

Parameters:	time (int) – Time shift width, negative numbers make a left shift.
Returns:	Tuple of a list of squashed annotations and a list of removed annotations in the format: `(tiername, start, end, value)`.
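What squashing and discarding could look like on a left shift is sketched below on plain `(tiername, start, end, value)` tuples. The rules are assumptions: an annotation that ends at or before zero after the shift is removed, one that only starts before zero is clipped to start at zero, and both report lists carry the original times. The helper `shift` is hypothetical and is not pympi's implementation.

```python
def shift(annotations, time):
    """Return (shifted, squashed, removed) for a list of annotations."""
    shifted, squashed, removed = [], [], []
    for tier, start, end, value in annotations:
        new_start, new_end = start + time, end + time
        if new_end <= 0:
            # Nothing of the annotation is left after the shift.
            removed.append((tier, start, end, value))
        elif new_start < 0:
            # Clip the start to zero and keep the remainder.
            squashed.append((tier, start, end, value))
            shifted.append((tier, 0, new_end, value))
        else:
            shifted.append((tier, new_start, new_end, value))
    return shifted, squashed, removed


data = [("A", 0, 400, "x"), ("A", 300, 900, "y"), ("A", 1000, 1500, "z")]
shifted, squashed, removed = shift(data, -500)
print(shifted)
```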

to_file(file_path, pretty=True)

Write the object to a file. If the file already exists, a backup will be created with the .bak suffix.

Parameters:

file_path (str) – Filepath to write to.
pretty (bool) – Flag for pretty XML printing. Only unset this if you are concerned about file size; without it, no unnecessary whitespace is written.
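The backup behaviour can be pictured with the standard library alone. This sketch assumes the suffix is appended to the full file name (`session.eaf` becomes `session.eaf.bak`) and that the old file is copied rather than renamed; the helper `write_with_backup` is hypothetical.

```python
import os
import shutil
import tempfile


def write_with_backup(file_path, text):
    """Write text to file_path, keeping any previous file as a .bak copy."""
    if os.path.exists(file_path):
        shutil.copyfile(file_path, file_path + ".bak")
    with open(file_path, "w", encoding="utf-8") as handle:
        handle.write(text)


with tempfile.TemporaryDirectory() as folder:
    target = os.path.join(folder, "session.eaf")
    write_with_backup(target, "first version")
    write_with_backup(target, "second version")
    print(sorted(os.listdir(folder)))
```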

to_textgrid(filtin=[], filtex=[], regex=False)

Convert the object to a pympi.Praat.TextGrid object.

Parameters:

filtin (list) – Include only tiers in this list; if empty, all tiers are included.
filtex (list) – Exclude all tiers in this list.
regex (bool) – If this flag is set, the filters are treated as regexes.

Returns:

`pympi.Praat.TextGrid` representation.

Raises ImportError:

If the pympi.Praat module can’t be loaded.

pympi.Elan.indent(el, level=0)

Function to pretty print the XML, meaning adding tabs and newlines.

Parameters:

el (ElementTree.Element) – Current element.
level (int) – Current level.
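A recursive indenter with this signature is a common ElementTree recipe. The version below is a sketch of that recipe, not necessarily the code pympi ships: it sets `text` and `tail` whitespace in place, using one tab per nesting level.

```python
import xml.etree.ElementTree as ET


def indent(el, level=0):
    """Add newlines and tabs to an element tree in place."""
    pad = "\n" + level * "\t"
    if len(el):
        if not el.text or not el.text.strip():
            el.text = pad + "\t"
        if not el.tail or not el.tail.strip():
            el.tail = pad
        for child in el:
            indent(child, level + 1)
        # The last child closes the parent, so it dedents one level.
        if not child.tail or not child.tail.strip():
            child.tail = pad
    elif level and (not el.tail or not el.tail.strip()):
        el.tail = pad


root = ET.fromstring("<A><B>x</B><C/></A>")
indent(root)
print(ET.tostring(root, encoding="unicode"))
```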

pympi.Elan.parse_eaf(file_path, eaf_obj)

Parse an EAF file.

Parameters:

file_path (str) – Path to read from, - for stdin.
eaf_obj (pympi.Elan.Eaf) – Existing EAF object to put the data in.

Returns:

EAF object.

pympi.Elan.to_eaf(file_path, eaf_obj, pretty=True)

Write an Eaf object to file.

Parameters:

file_path (str) – Filepath to write to, - for stdout.
eaf_obj (pympi.Elan.Eaf) – Object to write.
pretty (bool) – Flag to set pretty printing.
