pynlpl.formats.folia.Correction
class pynlpl.formats.folia.Correction(doc, *args, **kwargs)
Bases: pynlpl.formats.folia.AbstractElement, pynlpl.formats.folia.AllowGenerateID
Corrections are one of the most complex annotation types in FoLiA. They can be applied not just over text, but over any type of structure annotation, token annotation or span annotation. Corrections explicitly preserve the original, and do so recursively when corrections are made over other corrections.
Despite this complexity, the library handles corrections transparently. Whenever you query for a particular element that is part of a correction, you get the corrected version rather than the original. The original is always non-authoritative, and normal selection methods will ignore it.
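This class takes four classes as children, which in turn encapsulate the actual annotations:
- New – Encapsulates the new corrected annotation(s)
- Original – Encapsulates the old annotation(s) prior to correction
- Current – Encapsulates the current authoritative annotation(s) (needed only in a structural context when suggestions are proposed)
- Suggestion – Encapsulates the suggested annotation(s)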
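To make the transparent handling concrete, here is a minimal, library-independent sketch. The classes `Node`, `Word`, `New`, `Original` and `Correction` below are simplified stand-ins written for this illustration (they are not the pynlpl implementation), and `select_authoritative` is a hypothetical helper that mimics how normal selection skips the non-authoritative original.

```python
# Stand-in classes for illustration only; this is NOT the pynlpl API.
class Node:
    AUTH = True  # elements are authoritative unless stated otherwise

    def __init__(self, *children, text=None):
        self.children = list(children)
        self.text = text


class Word(Node):
    pass


class New(Node):
    pass


class Original(Node):
    AUTH = False  # the original is preserved but non-authoritative


class Correction(Node):
    pass


def select_authoritative(node, cls):
    """Yield descendants of type cls, skipping non-authoritative branches."""
    for child in node.children:
        if not child.AUTH:
            continue  # never descend into Original
        if isinstance(child, cls):
            yield child
        yield from select_authoritative(child, cls)


# A sentence whose second word was corrected from "treee" to "tree".
correction = Correction(
    New(Word(text="tree")),
    Original(Word(text="treee")),
)
sentence = Node(Word(text="the"), correction)

corrected = [w.text for w in select_authoritative(sentence, Word)]
print(corrected)
```

The original word is still stored inside the correction, so it can be retrieved explicitly, but a plain query over the sentence only ever sees the corrected word.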
Method Summary
__init__(doc, *args, **kwargs) – Initialize self.
accepts(Class[, raiseexceptions, parentinstance])
add(child, *args, **kwargs)
addable(parent[, set, raiseexceptions]) – Tests whether a new element of this class can be added to the parent.
addidsuffix(idsuffix[, recursive]) – Appends a suffix to this element’s ID, and optionally to all child IDs as well.
addtoindex([norecurse]) – Makes sure this element (and all subelements) are properly added to the index.
ancestor(*Classes) – Find the most immediate ancestor of the specified type; multiple classes may be specified.
ancestors([Class]) – Generator yielding all ancestors of this element, effectively back-tracing its path to the root element.
append(child, *args, **kwargs) – See AbstractElement.append()
context(size[, placeholder, scope]) – Returns this word in context: {size} words to the left, the current word, and {size} words to the right.
copy([newdoc, idsuffix]) – Make a deep copy of this element and all its children.
copychildren([newdoc, idsuffix]) – Generator creating a deep copy of the children of this element.
correct(**kwargs)
count(Class[, set, recursive, ignore, node]) – Like AbstractElement.select(), but instead of returning the elements, it merely counts them.
current([index]) – Get the current authoritative annotation (used with suggestions in a structural context).
deepvalidation() – Perform deep validation of this element.
description() – Obtain the description associated with the element.
feat(subset) – Obtain the feature class value of the specific subset.
findcorrectionhandling(cls) – Find the proper correctionhandling given a textclass by looking in the underlying corrections where it is reused.
findreplaceables(parent[, set]) – Internal method to find replaceable elements.
generate_id(cls)
getindex(child[, recursive, ignore]) – Get the index at which an element occurs, recursive by default!
getmetadata([key]) – Get the metadata that applies to this element, automatically inherited from parent elements.
gettextdelimiter([retaintokenisation]) – See AbstractElement.gettextdelimiter()
hascurrent([allowempty]) – Does the correction record the current authoritative annotation (needed only in a structural context when suggestions are proposed)?
hasnew([allowempty]) – Does the correction define new corrected annotations?
hasoriginal([allowempty]) – Does the correction record the old annotations prior to correction?
hasphon([cls, strict, correctionhandling]) – See AbstractElement.hasphon()
hassuggestions([allowempty]) – Does the correction propose suggestions for correction?
hastext([cls, strict, correctionhandling]) – See AbstractElement.hastext()
incorrection() – Is this element part of a correction? If it is, it returns the Correction element (evaluating to True), otherwise it returns None.
insert(index, child, *args, **kwargs)
items([founditems]) – Returns a depth-first flat list of all items below this element (not limited to AbstractElement).
json([attribs, recurse, ignorelist]) – Serialises the FoLiA element and all its contents to a Python dictionary suitable for serialisation to JSON.
leftcontext(size[, placeholder, scope]) – Returns the left context for an element, as a list.
new([index]) – Get the new corrected annotation.
next([Class, scope, reverse]) – Returns the next element, if it is of the specified type and if it does not cross the boundary of the defined scope.
original([index]) – Get the old annotation prior to correction.
originaltext([cls]) – Alias for retrieving the original uncorrected text.
parsexml(node, doc, **kwargs) – Internal class method used for turning an XML element into an instance of the Class.
phon([cls, previousdelimiter, strict, …]) – See AbstractElement.phon()
phoncontent([cls, correctionhandling]) – See AbstractElement.phoncontent()
postappend() – This method will be called after an element is added to another and does some checks.
previous([Class, scope]) – Returns the previous element, if it is of the specified type and if it does not cross the boundary of the defined scope.
relaxng([includechildren, extraattribs, …]) – Returns a RelaxNG definition for this element (as an XML element (lxml.etree) rather than a string).
remove(child) – Removes the child element.
replace(child, *args, **kwargs) – Appends a child element like append(), but replaces any existing child element of the same type and set.
resolveword(id)
rightcontext(size[, placeholder, scope]) – Returns the right context for an element, as a list.
select(Class[, set, recursive, ignore, node]) – Select child elements of the specified class.
setdoc(newdoc) – Set a different document.
setdocument(doc) – Associate a document with this element.
setparents() – Correct all parent relations for elements within the scope.
settext(text[, cls]) – Set the text for this element.
speech_speaker() – Retrieves the speaker of the audio or video file associated with the element.
speech_src() – Retrieves the URL/filename of the audio or video file associated with the element.
stricttext([cls]) – Alias for text() with strict=True
suggestions([index]) – Get suggestions for correction.
text([cls, retaintokenisation, …]) – See AbstractElement.text()
textcontent([cls, correctionhandling]) – See AbstractElement.textcontent()
textvalidation([warnonly]) – Run text validation on this element.
toktext([cls]) – Alias for text() with retaintokenisation=True
updatetext() – Recompute textual value based on the text content of the children.
xml([attribs, elements, skipchildren]) – Serialises the FoLiA element and all its contents to XML.
xmlstring([pretty_print]) – Serialises this FoLiA element and all its contents to XML.
__iter__() – Iterate over all children of this element.
__len__() – Returns the number of child elements under the current element.
__str__() – Alias for text()
Class Attributes
ACCEPTED_DATA = (<class 'pynlpl.formats.folia.Comment'>, <class 'pynlpl.formats.folia.Current'>, <class 'pynlpl.formats.folia.Description'>, <class 'pynlpl.formats.folia.ErrorDetection'>, <class 'pynlpl.formats.folia.Feature'>, <class 'pynlpl.formats.folia.ForeignData'>, <class 'pynlpl.formats.folia.Metric'>, <class 'pynlpl.formats.folia.New'>, <class 'pynlpl.formats.folia.Original'>, <class 'pynlpl.formats.folia.Suggestion'>)
ANNOTATIONTYPE = 16
AUTH = True
AUTO_GENERATE_ID = False
LABEL = 'Correction'
OCCURRENCES = 0
OCCURRENCES_PER_SET = 0
OPTIONAL_ATTRIBS = (0, 1, 2, 4, 3, 5, 8, 6, 7, 9, 11)
PHONCONTAINER = False
PRIMARYELEMENT = True
PRINTABLE = True
REQUIRED_ATTRIBS = None
REQUIRED_DATA = None
SETONLY = False
SPEAKABLE = True
SUBSET = None
TEXTCONTAINER = False
TEXTDELIMITER = None
XLINK = False
XMLTAG = 'correction'
Method Details
-
__init__(doc, *args, **kwargs)
Initialize self. See help(type(self)) for accurate signature.
-
classmethod accepts(Class, raiseexceptions=True, parentinstance=None)
-
add(child, *args, **kwargs)
-
classmethod addable(parent, set=None, raiseexceptions=True)
Tests whether a new element of this class can be added to the parent.
This method is mostly for internal use. It uses the OCCURRENCES property, but may be overridden by subclasses for more customised behaviour.
Parameters:
- parent (AbstractElement) – The element that is being added to
- set (str or None) – The set
- raiseexceptions (bool) – Raise an exception if the element can’t be added?
Returns: bool
Raises: ValueError
-
addidsuffix(idsuffix, recursive=True)
Appends a suffix to this element’s ID, and optionally to all child IDs as well. There is usually no need to call this directly; it is invoked implicitly by copy().
-
addtoindex(norecurse=[])
Makes sure this element (and all subelements) are properly added to the index.
Mostly for internal use.
-
ancestor(*Classes)
Find the most immediate ancestor of the specified type; multiple classes may be specified.
Parameters: *Classes – The possible classes (AbstractElement or subclasses) to select from. Not instances!
Example:
    paragraph = word.ancestor(folia.Paragraph)
-
ancestors(Class=None)
Generator yielding all ancestors of this element, effectively back-tracing its path to the root element. A tuple of multiple classes may be specified.
Parameters: Class – The class or classes (AbstractElement or subclasses). Not instances!
Yields: elements (instances derived from AbstractElement)
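The relationship between the two methods can be sketched without the library. The `Element` class below is a hypothetical stand-in with a plain `parent` link, not pynlpl code; raising `LookupError` when nothing matches is an assumption made for this sketch only.

```python
# Stand-in element with a parent link; illustration only, not pynlpl code.
class Element:
    def __init__(self, parent=None):
        self.parent = parent

    def ancestors(self, Class=None):
        """Yield ancestors from the nearest parent up to the root."""
        node = self.parent
        while node is not None:
            # Class may be a single class or a tuple of classes.
            if Class is None or isinstance(node, Class):
                yield node
            node = node.parent

    def ancestor(self, *Classes):
        """Return the nearest ancestor matching any of the given classes."""
        for node in self.ancestors(Classes):
            return node
        # Assumption of this sketch: signal "not found" with LookupError.
        raise LookupError("no ancestor of the requested type")


class Paragraph(Element):
    pass


class Sentence(Element):
    pass


class Word(Element):
    pass


paragraph = Paragraph()
sentence = Sentence(parent=paragraph)
word = Word(parent=sentence)

path = [type(a).__name__ for a in word.ancestors()]
nearest = word.ancestor(Paragraph, Sentence)
print(path, type(nearest).__name__)
```

Because the walk starts at the immediate parent, passing several classes returns whichever matching ancestor is closest, regardless of the order of the arguments.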
-
append(child, *args, **kwargs)
See AbstractElement.append()
-
context(size, placeholder=None, scope=None)
Returns this word in context: {size} words to the left, the current word, and {size} words to the right.
-
copy(newdoc=None, idsuffix='')
Make a deep copy of this element and all its children.
Parameters:
- newdoc (Document) – The document the copy should be associated with.
- idsuffix (str or bool) – If set to a string, this string will be appended to the ID of the copy (prevents duplicate IDs when making copies for the same document). If set to True, a random suffix will be generated.
Returns: a copy of the element
-
copychildren(newdoc=None, idsuffix='')
Generator creating a deep copy of the children of this element.
Invokes copy() on all children; parameters are the same.
-
correct(**kwargs)
-
count(Class, set=None, recursive=True, ignore=True, node=None)
Like AbstractElement.select(), but instead of returning the elements, it merely counts them.
Returns: int
-
current(index=None)
Get the current authoritative annotation (used with suggestions in a structural context).
This returns only one annotation if multiple exist; use index to select another in the sequence.
Returns: an annotation element (AbstractElement)
Raises: NoSuchAnnotation
-
deepvalidation()
Perform deep validation of this element.
Raises: DeepValidationError
-
description()
Obtain the description associated with the element.
Raises: NoSuchAnnotation if there is no associated description.
-
feat(subset)
Obtain the feature class value of the specific subset.
If a feature occurs multiple times, the values will be returned in a list.
Example:
    sense = word.annotation(folia.Sense)
    synset = sense.feat('synset')
Returns: str or list
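Since the return type is either a string or a list, calling code usually has to handle both cases. The sketch below illustrates this with plain data: `feat` here is a stand-in function over hypothetical `(subset, class)` pairs, not the library method, and both the `LookupError` for a missing subset and the helper `feat_values` are assumptions of this illustration.

```python
def feat(features, subset):
    """Return the class of the single matching feature, or a list if several match.

    `features` is a list of (subset, cls) pairs; stand-in data for this sketch.
    """
    values = [cls for s, cls in features if s == subset]
    if not values:
        # Assumption: a missing subset is reported with LookupError here.
        raise LookupError(subset)
    return values[0] if len(values) == 1 else values


def feat_values(result):
    """Normalise a str-or-list result to a list, so callers can always iterate."""
    return [result] if isinstance(result, str) else list(result)


features = [("synset", "d_n-1"), ("synset", "d_n-2"), ("pos", "noun")]

single = feat(features, "pos")
multiple = feat(features, "synset")
print(single, multiple, feat_values(single))
```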
-
findcorrectionhandling(cls)
Find the proper correctionhandling given a textclass by looking in the underlying corrections where it is reused.
-
classmethod findreplaceables(parent, set=None, **kwargs)
Internal method to find replaceable elements. Auxiliary function used by AbstractElement.replace(). Can be overridden for more fine-grained control.
-
generate_id(cls)
-
getindex(child, recursive=True, ignore=True)
Get the index at which an element occurs, recursive by default!
Returns: int
-
getmetadata(key=None)
Get the metadata that applies to this element, automatically inherited from parent elements.
-
gettextdelimiter(retaintokenisation=False)
-
hascurrent(allowempty=False)
Does the correction record the current authoritative annotation (needed only in a structural context when suggestions are proposed)?
-
hasnew(allowempty=False)
Does the correction define new corrected annotations?
-
hasoriginal(allowempty=False)
Does the correction record the old annotations prior to correction?
-
hasphon(cls='current', strict=True, correctionhandling=1)
-
hassuggestions(allowempty=False)
Does the correction propose suggestions for correction?
-
hastext(cls='current', strict=True, correctionhandling=1)
-
incorrection()
Is this element part of a correction? If it is, the Correction element is returned (which evaluates to True); otherwise None is returned.
-
insert(index, child, *args, **kwargs)
-
items(founditems=[])
Returns a depth-first flat list of all items below this element (not limited to AbstractElement).
-
json(attribs=None, recurse=True, ignorelist=False)
Serialises the FoLiA element and all its contents to a Python dictionary suitable for serialisation to JSON.
Example:
    import json
    json.dumps(word.json())
Returns: dict
-
leftcontext(size, placeholder=None, scope=None)
Returns the left context for an element, as a list. This method crosses sentence/paragraph boundaries by default, which can be restricted by setting scope.
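As a rough illustration of what a left context with a placeholder looks like, the sketch below works on a plain list of strings instead of FoLiA elements. The function is a stand-in, not the library method, and its padding behaviour (pad up to `size` only when a placeholder is given) is an assumption of this sketch; scope handling is left out entirely.

```python
def leftcontext(words, index, size, placeholder=None):
    """Return up to `size` items preceding words[index].

    If a placeholder is given, the result is padded on the left so that it
    always has exactly `size` items (assumed behaviour for this sketch).
    """
    found = words[max(0, index - size):index]
    if placeholder is not None:
        found = [placeholder] * (size - len(found)) + found
    return found


words = ["the", "quick", "brown", "fox"]

padded = leftcontext(words, 1, 3, placeholder="<s>")
unpadded = leftcontext(words, 1, 3)
full = leftcontext(words, 3, 2)
print(padded, unpadded, full)
```

A fixed-length, padded context is convenient when the result feeds a feature vector of constant width.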
-
new(index=None)
Get the new corrected annotation.
This returns only one annotation if multiple exist; use index to select another in the sequence.
Returns: an annotation element (AbstractElement)
Raises: NoSuchAnnotation
-
next(Class=True, scope=True, reverse=False)
Returns the next element, if it is of the specified type and if it does not cross the boundary of the defined scope. Returns None if no next element is found. Non-authoritative elements are never returned.
Parameters:
- Class (*) – The class to select; any Python class subclassed off AbstractElement, may also be a tuple of multiple classes. Set to True to constrain to the same class as that of the current instance, set to None to not constrain at all.
- scope (*) – A list of classes which are never crossed looking for a next element. Set to True to constrain to a default list of structure elements (Sentence, Paragraph, Division, Event, ListItem, Caption), set to None to not constrain at all.
-
original(index=None)
Get the old annotation prior to correction.
This returns only one annotation if multiple exist; use index to select another in the sequence.
Returns: an annotation element (AbstractElement)
Raises: NoSuchAnnotation
-
originaltext(cls='original')
Alias for retrieving the original uncorrected text.
A call to text() with correctionhandling=CorrectionHandling.ORIGINAL
-
classmethod parsexml(node, doc, **kwargs)
Internal class method used for turning an XML element into an instance of the Class.
Parameters:
- node – XML Element
- doc – Document
Returns: An instance of the current Class.
-
phon(cls='current', previousdelimiter='', strict=False, correctionhandling=1)
-
phoncontent(cls='current', correctionhandling=1)
-
postappend()
This method will be called after an element is added to another and does some checks.
It can do extra checks and if necessary raise exceptions to prevent addition. By default makes sure the right document is associated.
This method is mostly for internal use.
-
previous(Class=True, scope=True)
Returns the previous element, if it is of the specified type and if it does not cross the boundary of the defined scope. Returns None if no previous element is found. Non-authoritative elements are never returned.
Parameters:
- Class (*) – The class to select; any Python class subclassed off AbstractElement. Set to True to constrain to the same class as that of the current instance, set to None to not constrain at all.
- scope (*) – A list of classes which are never crossed looking for a previous element. Set to True to constrain to a default list of structure elements (Sentence, Paragraph, Division, Event, ListItem, Caption), set to None to not constrain at all.
-
classmethod relaxng(includechildren=True, extraattribs=None, extraelements=None, origclass=None)
Returns a RelaxNG definition for this element (as an XML element (lxml.etree) rather than a string).
-
remove(child)
Removes the child element.
-
replace(child, *args, **kwargs)
Appends a child element like append(), but replaces any existing child element of the same type and set. If no such child element exists, this will act the same as append().
Keyword Arguments:
- alternative (bool) – If set to True, the replaced element will be made into an alternative. Simply use AbstractElement.append() if you want the added element to be an alternative.
See AbstractElement.append() for more information and all parameters.
-
resolveword(id)
-
rightcontext(size, placeholder=None, scope=None)
Returns the right context for an element, as a list. This method crosses sentence/paragraph boundaries by default, which can be restricted by setting scope.
-
select(Class, set=None, recursive=True, ignore=True, node=None)
Select child elements of the specified class.
A further restriction can be made based on set.
Parameters:
- Class (class) – The class to select; any Python class (not instance) subclassed off AbstractElement
- set (str) – The set to match against; only elements pertaining to this set will be returned. If set to None (default), all elements regardless of set will be returned.
- recursive (bool) – Select recursively? Descending into child elements? Defaults to True.
- ignore – A list of Classes to ignore. If set to True instead of a list, all non-authoritative elements will be skipped (this is the default behaviour and corresponds to the following elements: Alternative, AlternativeLayer, Suggestion, and folia.Original). These elements and those contained within are never authoritative. You may also include the boolean True as a member of a list, if you want to skip additional tags along with the predefined non-authoritative ones.
- node (*) – Reserved for internal usage, used in recursion.
Yields: Elements (instances derived from AbstractElement)
Example:
    for sense in text.select(folia.Sense, 'cornetto', True, [folia.Original, folia.Suggestion, folia.Alternative]):
        ...
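The three forms the ignore argument can take (True, a list of classes, or a list that also contains True) can be illustrated with a small stand-alone model. Everything below is a simplified stand-in written for this sketch and not the pynlpl implementation: the classes only share names with their FoLiA counterparts, `resolve_ignore` is a hypothetical helper, and set matching is omitted.

```python
# Stand-in classes; they only share their names with the FoLiA elements.
class Element:
    def __init__(self, *children):
        self.children = list(children)


class Sense(Element): pass
class Original(Element): pass
class Suggestion(Element): pass
class Alternative(Element): pass
class AlternativeLayer(Element): pass
class Comment(Element): pass


# The predefined non-authoritative elements named in the parameter description.
DEFAULT_IGNORE = (Alternative, AlternativeLayer, Suggestion, Original)


def resolve_ignore(ignore):
    """Turn the ignore argument into a tuple of classes to skip."""
    if ignore is True:
        return DEFAULT_IGNORE
    if not ignore:
        return ()
    classes = []
    for item in ignore:
        if item is True:
            classes.extend(DEFAULT_IGNORE)  # True inside a list adds the defaults
        else:
            classes.append(item)
    return tuple(classes)


def select(node, Class, recursive=True, ignore=True):
    """Yield matching descendants, never entering ignored elements."""
    skipped = resolve_ignore(ignore)
    for child in node.children:
        if isinstance(child, skipped):
            continue
        if isinstance(child, Class):
            yield child
        if recursive:
            yield from select(child, Class, recursive, ignore)


tree = Element(Sense(), Original(Sense()), Comment(Sense()))

default_count = len(list(select(tree, Sense)))
everything = len(list(select(tree, Sense, ignore=False)))
extra_skipped = len(list(select(tree, Sense, ignore=[True, Comment])))
print(default_count, everything, extra_skipped)
```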
-
setdoc(newdoc)
Set a different document. There is usually no need to call this directly; it is invoked implicitly by copy().
-
setdocument(doc)
Associate a document with this element.
Parameters: doc (Document) – A document
Each element must be associated with a FoLiA document.
-
setparents()
Correct all parent relations for elements within the scope. There is usually no need to call this directly; it is invoked implicitly by copy().
-
settext(text, cls='current')
Set the text for this element.
Parameters:
- text (str) – The text
- cls (str) – The class of the text, defaults to current (leave this unless you know what you are doing). There may be only one text content element of each class associated with the element.
-
speech_speaker()
Retrieves the speaker of the audio or video file associated with the element.
The speaker is inherited from ancestor elements if none is specified. For this reason, always use this method rather than accessing the speaker attribute directly.
Returns: str or None if not found
-
speech_src()
Retrieves the URL/filename of the audio or video file associated with the element.
The source is inherited from ancestor elements if none is specified. For this reason, always use this method rather than accessing the src attribute directly.
Returns: str or None if not found
-
suggestions(index=None)
Get suggestions for correction.
Yields: Suggestion elements that encapsulate the suggested annotations (if index is None, the default)
Returns: a Suggestion element that encapsulates the suggested annotations (if index is set)
Raises: IndexError
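The two calling modes described here (iterate over everything, or pick one item by position) can be sketched with a stand-in function over a plain list. This is an illustration of the documented contract only, not the library code; the strings stand in for Suggestion elements.

```python
def suggestions(items, index=None):
    """Return an iterator over all items, or the single item at `index`.

    Raises IndexError if `index` is set but no such item exists.
    """
    if index is None:
        return iter(items)
    for i, item in enumerate(items):
        if i == index:
            return item
    raise IndexError(index)


proposals = ["tree", "three"]  # stand-ins for Suggestion elements

all_proposals = list(suggestions(proposals))
second = suggestions(proposals, 1)
print(all_proposals, second)
```

Note that with an index the caller gets a single element back, whereas without one the result has to be iterated.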
-
text(cls='current', retaintokenisation=False, previousdelimiter='', strict=False, correctionhandling=1, normalize_spaces=False)
-
textcontent(cls='current', correctionhandling=1)
-
textvalidation(warnonly=None)
Run text validation on this element. Checks whether any text redundancy is consistent and whether offsets are valid.
Parameters: warnonly (bool) – Warn only (True) or raise exceptions (False). If set to None then this value will be determined based on the document’s FoLiA version (warn only before FoLiA v1.5).
Returns: bool
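The version-dependent default for warnonly amounts to a small decision rule, sketched below. `resolve_warnonly` is a hypothetical helper written for this illustration, and the assumption that the FoLiA version is available as a dotted string such as '1.4.0' is made for the sketch only.

```python
def resolve_warnonly(warnonly, version):
    """Decide whether validation problems should only produce warnings.

    An explicit True/False wins; None falls back to the document version,
    warning only for versions before 1.5.
    """
    if warnonly is not None:
        return warnonly
    parts = tuple(int(p) for p in version.split("."))
    return parts < (1, 5)


old_default = resolve_warnonly(None, "1.4.0")
new_default = resolve_warnonly(None, "1.5.0")
forced = resolve_warnonly(True, "2.0.0")
print(old_default, new_default, forced)
```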
-
updatetext()
Recompute textual value based on the text content of the children. Only supported on elements that are a TEXTCONTAINER.
-
xml(attribs=None, elements=None, skipchildren=False)
Serialises the FoLiA element and all its contents to XML.
Arguments are mostly for internal use.
Returns: an lxml.etree.Element
See also AbstractElement.xmlstring() for direct string output.
-
xmlstring(pretty_print=False)
Serialises this FoLiA element and all its contents to XML.
Returns: a string with XML representation for this element and all its children
Return type: str
-
__iter__()
Iterate over all children of this element.
Example:
    for annotation in word:
        ...
-
__len__()
Returns the number of child elements under the current element.