gransk.plugins.find

Find entities and names in extracted content.

gransk.plugins.find.find_entities

class gransk.plugins.find.find_entities.Subscriber(pipeline)

Bases: gransk.core.abstract_subscriber.Subscriber

Class for finding entities in text based on regular expressions.

Add subscriber to pipeline.

Parameters: pipeline (gransk.core.pipeline.Pipeline) – Pipeline managing subscribers and events.

consume(doc, _)

Find entities in documents matching compiled regular expression.

Parameters: doc (gransk.core.document.Document) – Document object.

setup(config)

Compile configured regular expressions.

Parameters: config (dict) – Configuration object.
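The split between setup (compile the configured expressions once) and consume (run them over each document) can be sketched in plain Python. This is a minimal sketch, not gransk's implementation: the RegexEntityFinder class, the "entity_patterns" config key, and the (type, value, offset) result tuples are hypothetical, and a plain string stands in for gransk.core.document.Document.

```python
import re
from typing import Dict, List, Tuple


class RegexEntityFinder:
    """Hypothetical stand-in for find_entities.Subscriber; not gransk's API."""

    def __init__(self) -> None:
        self.patterns: Dict[str, re.Pattern] = {}

    def setup(self, config: Dict[str, Dict[str, str]]) -> None:
        # Compile every configured pattern once, before any document arrives.
        # The "entity_patterns" key is an assumption made for this sketch.
        for entity_type, pattern in config.get("entity_patterns", {}).items():
            self.patterns[entity_type] = re.compile(pattern)

    def consume(self, text: str) -> List[Tuple[str, str, int]]:
        # Report (entity type, matched text, start offset) for every match.
        found = []
        for entity_type, regex in self.patterns.items():
            for match in regex.finditer(text):
                found.append((entity_type, match.group(0), match.start()))
        return found


# Example usage with two illustrative patterns.
finder = RegexEntityFinder()
finder.setup({"entity_patterns": {
    "email": r"[\w.+-]+@[\w-]+\.[\w.]+",
    "ip_addr": r"\b(?:\d{1,3}\.){3}\d{1,3}\b",
}})
found = finder.consume("Mail bob@example.com from 10.0.0.1")
```

Compiling in setup rather than in consume means the cost of building each regex is paid once per pipeline, not once per document.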

gransk.plugins.find.find_names_brute

class gransk.plugins.find.find_names_brute.Subscriber(pipeline)

Bases: gransk.core.abstract_subscriber.Subscriber

Class for finding names in text based on a provided list of tokens. Unlike other Named Entity Extraction approaches, this one does not depend on the context in which the names appear, so it may be a useful supplement for improving entity recognition.

Add subscriber to pipeline.

Parameters: pipeline (gransk.core.pipeline.Pipeline) – Pipeline managing subscribers and events.

consume(doc, _)

Find names in documents based on the provided word list.

Parameters: doc (gransk.core.document.Document) – Document object.

setup(config)

Load name model (word list) and compile regexes for stop characters.

Parameters: config (dict) – Configuration object.
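Word-list matching with stop characters can be sketched as follows. This is an illustrative sketch under stated assumptions, not gransk's implementation: the BruteNameFinder class, the stop-character set, the min_tokens threshold, and the rule that a name is a run of capitalized tokens that all appear in the word list are hypothetical.

```python
import re
from typing import Iterable, List


class BruteNameFinder:
    """Hypothetical sketch of word-list name matching; not gransk's API."""

    def __init__(self, names: Iterable[str], min_tokens: int = 2) -> None:
        # Name model: a set of known name tokens, compared case-insensitively.
        self.names = {name.lower() for name in names}
        self.min_tokens = min_tokens
        # Stop characters end a candidate name; this character set is an assumption.
        self.stop_chars = re.compile(r"[.,;:!?()\"\n]")
        self.token_split = re.compile(r"\s+")

    def find(self, text: str) -> List[str]:
        results = []
        # Split at stop characters first so a name never spans one.
        for segment in self.stop_chars.split(text):
            run: List[str] = []
            # The trailing "" is a sentinel that flushes the last run.
            for token in self.token_split.split(segment.strip()) + [""]:
                if token and token[0].isupper() and token.lower() in self.names:
                    run.append(token)
                    continue
                # Run ended: keep it if it has enough tokens to count as a name.
                if len(run) >= self.min_tokens:
                    results.append(" ".join(run))
                run = []
        return results


# Example usage with a tiny illustrative word list.
finder = BruteNameFinder(["john", "smith", "anna", "berg"])
names = finder.find("Yesterday John Smith met Anna Berg, then left.")
```

Matching consults only the word list, so "John Smith" would be found whether it follows "met", "Dear", or nothing at all, which is the context independence described for this subscriber.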

gransk.plugins.find.polyglot_ner

class gransk.plugins.find.polyglot_ner.Subscriber(pipeline)

Bases: gransk.core.abstract_subscriber.Subscriber

Class for finding named entities in text using the Polyglot NER package.

Add subscriber to pipeline.

Parameters: pipeline (gransk.core.pipeline.Pipeline) – Pipeline managing subscribers and events.

consume(doc, _)

Find names in documents using Polyglot NER.

Parameters: doc (gransk.core.document.Document) – Document object.

setup(config)

Load the Polyglot NER package.

Parameters: config (dict) – Configuration object.