Formats
Corpus Gesproken Nederlands
exception pynlpl.formats.cgn.InvalidFeatureException
exception pynlpl.formats.cgn.InvalidTagException
pynlpl.formats.cgn.parse_cgn_postag(rawtag, raisefeatureexceptions=False)
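The reference gives no description for parse_cgn_postag. As a rough illustration of the input it deals with, CGN part-of-speech tags are conventionally written as a head followed by comma-separated features in parentheses, for example N(soort,ev,basis,zijd,stan). The sketch below only splits such a string; the helper name split_cgn_tag is hypothetical, and it does not reproduce the return value or the validation rules of pynlpl's own function.

```python
import re

# Hypothetical helper, not part of pynlpl: splits a raw CGN-style tag
# into its head and its list of features.
_CGN_TAG = re.compile(r"^(?P<head>[A-Z]+)\((?P<features>[^()]*)\)$")


def split_cgn_tag(rawtag):
    """Return (head, features) for a tag such as 'WW(pv,tgw,ev)'."""
    match = _CGN_TAG.match(rawtag.strip())
    if match is None:
        raise ValueError("not a CGN-style tag: %r" % rawtag)
    # An empty feature list, as in 'LET()', yields no features.
    features = [f for f in match.group("features").split(",") if f]
    return match.group("head"), features


head, features = split_cgn_tag("N(soort,ev,basis,zijd,stan)")
```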
GIZA++
class pynlpl.formats.giza.GizaModel(filename, encoding='utf-8')
class pynlpl.formats.giza.GizaSentenceAlignment(sourceline, targetline, index)
    getalignedtarget(index)
        Returns the target range only if the source index aligns to a single consecutive range of target tokens.
    intersect(other)
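The condition described for getalignedtarget, a single consecutive range of target tokens, can be sketched independently of the library. The function below is a minimal illustration under assumptions: the name consecutive_range and the (first, last) return shape are hypothetical, and the real method may represent the range differently.

```python
def consecutive_range(target_indices):
    """Return (first, last) if the indices form one unbroken run, else None.

    Hypothetical helper, not part of pynlpl.
    """
    if not target_indices:
        return None
    ordered = sorted(set(target_indices))
    first, last = ordered[0], ordered[-1]
    # An unbroken run contains exactly last - first + 1 distinct indices.
    if last - first + 1 != len(ordered):
        return None
    return first, last
```

A source token aligned to target positions 2, 3 and 4 yields a range, whereas one aligned to 2 and 4 does not, because position 3 is missing.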
class pynlpl.formats.giza.IntersectionAlignment(source2target, target2source, encoding=False)
    reset()
class pynlpl.formats.giza.MultiWordAlignment(filename, encoding=False)
    Source-to-target alignment: reads source-target.A3.final files, in which each source word may be aligned to multiple target words (adapted from code by Sander Canisius).
    reset()
    targetword(index, targetwords, alignment)
        Returns the aligned target word for the specified index in the source words. Multiple words are joined with a single space.
    targetwords(index, targetwords, alignment)
        Returns the aligned target words for the specified index in the source words.
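The relationship between targetwords and targetword can be sketched as follows. This is an illustration only: it assumes that alignment is a list in which entry i holds the 0-based target positions aligned to source word i, which the reference does not confirm, and the function names are hypothetical stand-ins for the methods.

```python
def aligned_words(index, targetwords, alignment):
    """List the target words aligned to the source word at `index`.

    Assumes alignment[index] is a list of 0-based target positions.
    """
    return [targetwords[position] for position in alignment[index]]


def aligned_word(index, targetwords, alignment):
    """Same lookup, with multiple words joined by a single space."""
    return " ".join(aligned_words(index, targetwords, alignment))


# Hypothetical sentence pair: "ik ben thuis" -> "i am at home".
targetwords = ["i", "am", "at", "home"]
alignment = [[0], [1], [2, 3]]

phrase = aligned_word(2, targetwords, alignment)
```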
class pynlpl.formats.giza.WordAlignment(filename, encoding=False)
    Target-to-source alignment: reads target-source.A3.final files, in which each source word is aligned to one target word.
    reset()
    targetword(index, targetwords, alignment)
        Returns the aligned target word for the specified index in the source words.
pynlpl.formats.giza.parseAlignment(tokens)
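For orientation, the alignment line in a GIZA++ A3.final file lists each word followed by the 1-based positions it aligns to, wrapped in ({ and }), starting with a NULL entry. The sketch below parses one such line. It is a stand-alone illustration of the file format with a hypothetical function name, not the implementation of parseAlignment, whose exact behaviour the reference does not describe.

```python
def parse_a3_alignment(line):
    """Yield (word, positions) pairs from one GIZA++ A3 alignment line.

    Hypothetical helper, not part of pynlpl.
    """
    tokens = line.split()
    i = 0
    while i < len(tokens):
        word = tokens[i]
        if i + 1 >= len(tokens) or tokens[i + 1] != "({":
            raise ValueError("expected '({' after %r" % word)
        i += 2
        positions = []
        # Collect position numbers up to the closing '})'.
        while i < len(tokens) and tokens[i] != "})":
            positions.append(int(tokens[i]))
            i += 1
        if i == len(tokens):
            raise ValueError("unterminated alignment for %r" % word)
        i += 1  # skip the closing '})'
        yield word, positions


line = "NULL ({ }) ik ({ 1 }) ben ({ 2 }) thuis ({ 3 4 })"
pairs = list(parse_a3_alignment(line))
```

The NULL entry collects positions that have no counterpart, so callers that only want word-to-word links can drop the first pair.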
Moses
class pynlpl.formats.moses.PhraseTable(filename, quiet=False, reverse=False, delimiter='|||', score_column=3, max_sourcen=0, sourceencoder=None, targetencoder=None, scorefilter=None)
class pynlpl.formats.moses.PhraseTableClient(host='localhost', port=65432)
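The delimiter and score_column defaults of PhraseTable match the usual layout of a Moses phrase table line, where fields are separated by ||| and the third field holds the scores. The sketch below splits a single line along those lines. It assumes score_column is 1-based, which the signature suggests but the reference does not state, and parse_phrase_line is a hypothetical name rather than a pynlpl function.

```python
def parse_phrase_line(line, delimiter="|||", score_column=3):
    """Split one phrase table line into (source, target, scores).

    Hypothetical helper; score_column is assumed to be 1-based.
    """
    fields = [field.strip() for field in line.split(delimiter)]
    if len(fields) < max(2, score_column):
        raise ValueError("too few fields in line: %r" % line)
    source, target = fields[0], fields[1]
    scores = tuple(float(value) for value in fields[score_column - 1].split())
    return source, target, scores


# Hypothetical phrase table entry.
line = "het huis ||| the house ||| 0.5 0.25 0.75 0.125 ||| 0-0 1-1 ||| 4 3 3"
source, target, scores = parse_phrase_line(line)
```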
SoNaR
class pynlpl.formats.sonar.Corpus(corpusdir, extension='pos', restrict_to_collection='', conditionf=<function Corpus.<lambda>>, ignoreerrors=False)
class pynlpl.formats.sonar.CorpusDocument(filename, encoding='iso-8859-15')
    This class represents one document/text of the Corpus (read-only).
    paragraphs(with_id=False)
        Extracts paragraphs and returns a list of plain-text(!) paragraphs.
    sentences()
        Iterates over all sentences in the document as (sentence_id, sentence) pairs, where sentence is a list of 4-tuples (word, id, pos, lemma).
    words()
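The data shape described for sentences() can be consumed with ordinary tuple unpacking. The sketch below works on hand-written sample data in that shape rather than on a real CorpusDocument, so the identifiers, tags and helper functions are all hypothetical.

```python
from collections import Counter

# Hypothetical data in the documented shape: (sentence_id, sentence) pairs,
# where each sentence is a list of (word, id, pos, lemma) 4-tuples.
sentences = [
    ("s.1", [
        ("De", "s.1.w.1", "LID(bep,stan,rest)", "de"),
        ("huizen", "s.1.w.2", "N(soort,mv,basis)", "huis"),
    ]),
    ("s.2", [
        ("Het", "s.2.w.1", "LID(bep,stan,evon)", "het"),
        ("huis", "s.2.w.2", "N(soort,ev,basis,onz,stan)", "huis"),
    ]),
]


def plain_text(sentence):
    """Join the word forms of one sentence with spaces."""
    return " ".join(word for word, _id, _pos, _lemma in sentence)


def lemma_counts(sentences):
    """Count lemma frequencies over all sentences."""
    counts = Counter()
    for _sentence_id, sentence in sentences:
        counts.update(lemma for _word, _id, _pos, lemma in sentence)
    return counts


texts = {sentence_id: plain_text(sentence) for sentence_id, sentence in sentences}
counts = lemma_counts(sentences)
```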
class pynlpl.formats.sonar.CorpusDocumentX(filename, tree=None, index=True)
    This class represents one document/text of the Corpus, loaded into memory at once and retaining the full structure.
    paragraphs(node=None)
        Iterates over paragraphs.
    save(filename=None, encoding='iso-8859-15')
    sentences(node=None)
        Iterates over sentences.
    validate(formats_dir='../formats/')
        Checks whether the document is valid.
    words(node=None)
        Iterates over words.
    xpath(expression)
        Executes an XPath expression using the correct namespaces.
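Why namespaces matter for a method like xpath can be shown with the standard library alone. The sketch below uses xml.etree.ElementTree instead of whatever XML backend pynlpl relies on, and both the namespace URI and the ex prefix are hypothetical placeholders, not the namespaces used by SoNaR documents.

```python
import xml.etree.ElementTree as ET

# Hypothetical prefix-to-URI map; real corpus documents use their own URIs.
NAMESPACES = {"ex": "http://example.org/corpus"}

XML = (
    '<doc xmlns="http://example.org/corpus">'
    "<p><s><w>Het</w><w>huis</w></s></p>"
    "</doc>"
)

root = ET.fromstring(XML)

# A namespace-aware query finds the word elements.
words = [w.text for w in root.findall(".//ex:w", NAMESPACES)]

# The same query without a namespace matches nothing, because every
# element in the document lives in the default namespace.
unqualified = root.findall(".//w")
```

A wrapper that always passes the right prefix map saves callers from repeating it on every query, which is presumably the convenience the method above provides.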
class pynlpl.formats.sonar.CorpusFiles(corpusdir, extension='pos', restrict_to_collection='', conditionf=<function Corpus.<lambda>>, ignoreerrors=False)
class pynlpl.formats.sonar.CorpusX(corpusdir, extension='pos', restrict_to_collection='', conditionf=<function Corpus.<lambda>>, ignoreerrors=False)
pynlpl.formats.sonar.ns(namespace)
    Resolves the namespace identifier to a full URL.