What is SProUT?
SProUT is DFKI LT Lab's linguistic Swiss army knife, a flexible multi-purpose engine for domain-independent and domain-specific multilingual NLP tasks such as structured named entity recognition, information extraction, opinion mining, ontology extraction from text, and many more.
SProUT (Shallow Processing with Unification and Typed Feature Structures) is also a platform for the development of multilingual shallow text processing and information extraction systems.
It consists of several reusable, Unicode-capable online linguistic processing components for basic linguistic operations ranging from tokenization to coreference matching. Since typed feature structures (TFS) are used as a uniform data structure for representing the input and output of each of these processing resources, the components can be flexibly combined into a pipeline that produces several streams of linguistically annotated structures. These streams serve as input for the shallow grammar interpreter, which is applied at the next stage.
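The idea of a uniform data structure can be illustrated with a minimal Python sketch. This is not SProUT code and does not use any SProUT API: plain dictionaries stand in for typed feature structures, and the names FS, tokenize, classify_tokens and run_pipeline, as well as the feature names, are assumptions made up for this illustration. Because every component consumes and produces the same kind of structure, components can be chained in any order that makes sense.

```python
from typing import Callable, Dict, List

# Toy stand-in for a typed feature structure (assumption: a flat dict).
FS = Dict[str, object]
Component = Callable[[List[FS]], List[FS]]


def tokenize(text: str) -> List[FS]:
    """Split on whitespace and record character offsets for each token."""
    out: List[FS] = []
    pos = 0
    for word in text.split():
        start = text.index(word, pos)
        end = start + len(word)
        out.append({"TYPE": "token", "SURFACE": word, "START": start, "END": end})
        pos = end
    return out


def classify_tokens(items: List[FS]) -> List[FS]:
    """Add a coarse token class; input and output share the same structure."""
    result: List[FS] = []
    for fs in items:
        surface = str(fs["SURFACE"])
        if surface.isdigit():
            cls = "number"
        elif surface[0].isupper():
            cls = "capitalized"
        else:
            cls = "lowercase"
        result.append({**fs, "TOKEN_CLASS": cls})
    return result


def run_pipeline(text: str, components: List[Component]) -> List[FS]:
    """Tokenize, then pass the stream of structures through each component."""
    items = tokenize(text)
    for component in components:
        items = component(items)
    return items


annotated = run_pipeline("Anna paid 42 euros", [classify_tokens])
```

Further hypothetical components (for example a gazetteer lookup) would have the same signature as classify_tokens and could simply be appended to the list passed to run_pipeline.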
The grammar formalism in SProUT, called XTDL, is a blend of very efficient finite-state techniques and unification-based formalisms, which are known to guarantee transparency and expressiveness. A grammar in SProUT consists of pattern/action rules, where the LHS of a rule is a regular expression over TFSs with functional operators and coreferences, representing the recognition pattern, and the RHS of a rule is a TFS specification of the output structure. Click here to learn more about XTDL.
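The pattern/action idea can be sketched in a few lines of Python. This is an illustration only, not XTDL syntax and not the SProUT interpreter: dictionaries stand in for TFSs, there is no type hierarchy, one-way matching is used as a simplification of unification, the only regular-expression operator supported is optionality, and functional operators are left out. Writing coreference variables as strings beginning with "#" is a convention of this sketch, and all function and feature names are hypothetical.

```python
def match_fs(pattern, fs, bindings):
    """Match one pattern structure against fs; return new bindings or None."""
    new = dict(bindings)
    for feat, pval in pattern.items():
        if feat not in fs:
            return None
        val = fs[feat]
        if isinstance(pval, str) and pval.startswith("#"):
            # Coreference variable: bind it, or check it against an earlier binding.
            if pval in new and new[pval] != val:
                return None
            new[pval] = val
        elif isinstance(pval, dict):
            if not isinstance(val, dict):
                return None
            new = match_fs(pval, val, new)
            if new is None:
                return None
        elif pval != val:
            return None
    return new


def match_lhs(lhs, items, i, bindings):
    """Match a sequence of (pattern, optional) pairs starting at items[i]."""
    if not lhs:
        return i, bindings
    (pattern, optional), rest = lhs[0], lhs[1:]
    if i < len(items):
        b = match_fs(pattern, items[i], bindings)
        if b is not None:
            res = match_lhs(rest, items, i + 1, b)
            if res is not None:
                return res
    if optional:
        return match_lhs(rest, items, i, bindings)
    return None


def instantiate(template, bindings):
    """Build the output structure; unbound variables become None."""
    out = {}
    for feat, val in template.items():
        if isinstance(val, dict):
            out[feat] = instantiate(val, bindings)
        elif isinstance(val, str) and val.startswith("#"):
            out[feat] = bindings.get(val)
        else:
            out[feat] = val
    return out


def apply_rule(lhs, rhs, items):
    """Scan left to right and emit one output structure per non-overlapping match."""
    results, i = [], 0
    while i < len(items):
        res = match_lhs(lhs, items, i, {})
        if res is not None and res[0] > i:
            results.append(instantiate(rhs, res[1]))
            i = res[0]
        else:
            i += 1
    return results


# Hypothetical rule: optional title, then two capitalized tokens -> person.
person_lhs = [
    ({"TOKEN_CLASS": "title", "SURFACE": "#title"}, True),
    ({"TOKEN_CLASS": "capitalized", "SURFACE": "#given"}, False),
    ({"TOKEN_CLASS": "capitalized", "SURFACE": "#surname"}, False),
]
person_rhs = {"TYPE": "person", "TITLE": "#title",
              "GIVEN_NAME": "#given", "SURNAME": "#surname"}

tokens = [
    {"SURFACE": "Dr.", "TOKEN_CLASS": "title"},
    {"SURFACE": "Anna", "TOKEN_CLASS": "capitalized"},
    {"SURFACE": "Schmidt", "TOKEN_CLASS": "capitalized"},
    {"SURFACE": "spoke", "TOKEN_CLASS": "lowercase"},
]
people = apply_rule(person_lhs, person_rhs, tokens)
```

With the sample tokens, the rule yields a single person structure whose TITLE, GIVEN_NAME and SURNAME values are carried over from the matched tokens through the shared variables. That is the role coreferences play between the LHS and RHS of a rule.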
Furthermore, SProUT comes with an integrated grammar development and testing environment.
Currently, the platform provides linguistic processing resources for several languages including, among others, English, German, French, Italian, Dutch and Spanish.
To learn more about SProUT, see the Publications and Documentation sections.
A comprehensive compilation of (mostly online) publications describing applications built with SProUT and other derived work is available here.
If you are interested in obtaining a licence for using SProUT, please go to the Licencing section.