Language Models

class pynlpl.lm.lm.ARPALanguageModel(filename, encoding='utf-8', encoder=None, base_e=True, dounknown=True, debug=False, mode='simple')

Full back-off language model, loaded from file in ARPA format.

This class does not build the model; it only loads a pre-computed one. To build the model itself, use a tool such as ngram-count from SRILM.
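For orientation, an ARPA file groups n-grams by order under headers such as \1-grams:, and each entry holds a base-10 log probability, the n-gram itself, and an optional back-off weight. The following is a minimal sketch of reading that layout. It is independent of pynlpl; the function name read_arpa, the sample data, and the choice to treat a missing back-off weight as 0.0 are assumptions made for illustration, not the library's implementation.

```python
import io


def read_arpa(stream):
    """Parse ARPA text into {ngram_tuple: (log10_prob, log10_backoff)}.

    Hypothetical helper for illustration; not part of pynlpl.
    """
    table = {}
    order = 0  # 0 means "not inside an n-gram section"
    for line in stream:
        line = line.strip()
        if not line:
            continue
        if line.startswith("\\"):
            # "\2-grams:" opens the bigram section; "\data\" and "\end\" do not.
            if line.endswith("-grams:"):
                order = int(line[1:line.index("-")])
            else:
                order = 0
            continue
        if order == 0:
            continue  # skips the "ngram 1=..." count lines in the header
        fields = line.split()
        prob = float(fields[0])
        ngram = tuple(fields[1:1 + order])
        # The back-off weight is optional; assume 0.0 (log of 1) when absent.
        backoff = float(fields[1 + order]) if len(fields) > 1 + order else 0.0
        table[ngram] = (prob, backoff)
    return table


# Made-up sample data in ARPA layout.
SAMPLE = """\\data\\
ngram 1=3
ngram 2=1

\\1-grams:
-0.5\tthe\t-0.3
-0.7\tcat\t-0.2
-1.0\t</s>

\\2-grams:
-0.2\tthe cat

\\end\\
"""

table = read_arpa(io.StringIO(SAMPLE))
```

In practice you would pass the path of the real file to ARPALanguageModel rather than parse it yourself; the sketch only shows what the file contains.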

class NgramsProbs(data, mode='simple', delim=' ')

Store Ngrams with their probabilities and backoffs.

This class is used in order to abstract the physical storage layout, and enable memory/speed tradeoffs.

backoff(ngram)

Return the backoff value of a given ngram tuple.

prob(ngram)

Return the probability of a given ngram tuple.
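The way probabilities and backoffs combine in a back-off model can be sketched as follows. This is the standard back-off recursion written against plain dictionaries, not pynlpl's own code; the function name backoff_logprob and the example values are hypothetical. Values are assumed to be log probabilities, so weights are added rather than multiplied.

```python
def backoff_logprob(ngram, probs, backoffs):
    """Log probability of the last word of `ngram` given the words before it.

    `probs` and `backoffs` map ngram tuples to log values (illustrative only).
    """
    if ngram in probs:
        return probs[ngram]
    if len(ngram) == 1:
        # An unseen unigram cannot be backed off any further.
        raise KeyError(ngram)
    history = ngram[:-1]
    # Add the history's back-off weight (0.0 if the history is unseen),
    # then retry with the first word of the n-gram dropped.
    return backoffs.get(history, 0.0) + backoff_logprob(ngram[1:], probs, backoffs)


# Made-up example values.
probs = {("the",): -0.5, ("cat",): -0.7, ("the", "cat"): -0.2}
backoffs = {("the",): -0.3, ("cat",): -0.2}

seen = backoff_logprob(("the", "cat"), probs, backoffs)    # stored bigram, used directly
unseen = backoff_logprob(("cat", "the"), probs, backoffs)  # backs off to the unigram "the"
```

The second call finds no entry for the bigram, so it adds the back-off weight of ("cat",) to the unigram log probability of ("the",).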

score(data, history=None)

scoreword(word, history=None)

class pynlpl.lm.lm.SimpleLanguageModel(n=2, casesensitive=True, beginmarker='<begin>', endmarker='<end>')

This is a simple unsmoothed language model. This class can both hold and compute the model.
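To illustrate what "unsmoothed" means here, the sketch below estimates n-gram probabilities from raw counts alone, so any n-gram never seen in training gets probability zero. It is a stand-alone illustration, not pynlpl's implementation: the class name UnsmoothedNgramModel, the prob method, and the padding scheme (n-1 begin markers and one end marker per sentence) are assumptions.

```python
from collections import Counter


class UnsmoothedNgramModel:
    """Maximum-likelihood n-gram model without smoothing (illustrative only)."""

    def __init__(self, n=2, beginmarker="<begin>", endmarker="<end>"):
        self.n = n
        self.beginmarker = beginmarker
        self.endmarker = endmarker
        self.ngrams = Counter()     # counts of full n-grams
        self.histories = Counter()  # counts of the (n-1)-word prefixes

    def append(self, sentence):
        """Add one tokenised sentence (a list of words) to the counts."""
        tokens = [self.beginmarker] * (self.n - 1) + list(sentence) + [self.endmarker]
        for i in range(len(tokens) - self.n + 1):
            ngram = tuple(tokens[i:i + self.n])
            self.ngrams[ngram] += 1
            self.histories[ngram[:-1]] += 1

    def prob(self, ngram):
        """Relative frequency of the n-gram given its history; 0.0 if unseen."""
        ngram = tuple(ngram)
        seen = self.histories[ngram[:-1]]
        return self.ngrams[ngram] / seen if seen else 0.0


model = UnsmoothedNgramModel(n=2)
model.append(["the", "cat"])
model.append(["the", "dog"])
```

With these two sentences, "the" is followed by "cat" once out of two occurrences, while a bigram such as ("cat", "the") was never observed and therefore scores zero.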

append(sentence)

load(filename)

save(filename)

scoresentence(sentence)

class pynlpl.lm.srilm.SRILM(filename, n)

logscore(ngram)

scoresentence(sentence, unknownwordprob=-12)

exception pynlpl.lm.srilm.SRILMException

Base Exception for SRILM.

class pynlpl.lm.client.LMClient(host='localhost', port=12346, n=0)

scoresentence(sentence)