Introduction to NIF
NLP Interchange Format (NIF) is an ontology that describes strings. There are some interesting resources where you can find complete wikis and documentation:
You can create a NIF dataset either with the online web services or by downloading the project from GitHub and running it as a Maven project from the command line (CLI). This wiki describes both approaches: the web service and the CLI.
The easiest way to create a NIF dataset is to use one of the implemented NIF REST web services. However, our online web services are only suitable for creating small datasets. To annotate large documents or texts, we highly recommend deploying a local web service or using the CLI (there is a detailed example on the GitHub page). You can check which web services are available on the NIF Dashboard and choose one of them to annotate your text. In this example, the DBpedia Spotlight wrapper is used.
The simplest way to understand and start using NIF is to open the following address in your browser (or request it with curl):
It's important to understand each of the parameters in order to produce correct output:
- f=text: the input FORMAT is text.
- i=This+is+Germany: the input text that NIF will annotate ("+" encodes a space in the URL). NIF refers to the input text or document as a Context; in this case, the Context is the sentence "This is Germany".
- t=direct: the input is passed directly as a text string, not as a file or URL.
- confidence=0.2: the confidence level used to query Spotlight; it ranges from 0 to 1.
- prefix=http://myprefix.org/: the prefix that the NIF implementation must use to create and parse URIs.
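The same parameters can also be sent with curl without encoding the text by hand. The sketch below defines a hypothetical helper, nif_annotate, that relies on curl's --get and --data-urlencode options to build the query string. The default confidence and prefix are taken from the example above; the service URL is whatever endpoint you use, for instance http://localhost:9995/spotlight, the local endpoint used later on this page.

```shell
# Hypothetical helper (not part of the NIF project): send text to a NIF web
# service using the parameters documented above. curl URL-encodes each value,
# so spaces and punctuation in the input text need no manual "+" encoding.
nif_annotate() {
  local service="$1"                          # e.g. http://localhost:9995/spotlight
  local text="$2"                             # the Context to annotate
  local confidence="${3:-0.2}"                # Spotlight confidence, 0 to 1
  local prefix="${4:-http://myprefix.org/}"   # prefix for generated URIs

  # --get appends the --data-urlencode pairs to the URL as a query string;
  # --fail makes curl exit non-zero on HTTP error responses.
  curl --silent --show-error --fail --get "$service" \
    --data-urlencode "f=text" \
    --data-urlencode "i=$text" \
    --data-urlencode "t=direct" \
    --data-urlencode "confidence=$confidence" \
    --data-urlencode "prefix=$prefix"
}
```

For example, nif_annotate http://localhost:9995/spotlight "This is Germany." > annotation.out would annotate the example sentence and save the result (the output file name is arbitrary).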
Notice that the string "This is Germany." was annotated. The process for annotating large text files or documents is the same, but you must use a NIF web service running on your localhost.
The following steps describe how to start a local web service.
- Clone the NIF project from our GitHub page:
git clone https://github.com/NLP2RDF/software.git
- Open the software/java-maven folder and compile the project:
mvn install
- Open the software/java-maven/implementation/spotlight folder.
- Start the Jetty server:
mvn jetty:run
At this point, you can access the NIF web service locally on port 9995. The same example can be run against the local service with:
curl "http://localhost:9995/spotlight?f=text&i=This+is+Germany.&t=direct&confidence=0.2&prefix=http://myprefix.org/"
To save the annotated string to a file, redirect the output using the shell operator >.
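Because mvn jetty:run keeps running in the foreground and the server can take a while to start, a script that uses the local service may want to wait until port 9995 answers before sending requests. The sketch below defines a hypothetical helper, wait_for_nif; the default of 30 attempts and the 2-second delay are arbitrary choices, not values from the NIF project.

```shell
# Hypothetical helper: wait until the local NIF web service accepts requests.
# Start `mvn jetty:run` in another terminal (or in the background) first.
wait_for_nif() {
  local url="${1:-http://localhost:9995/spotlight}"
  local tries="${2:-30}"
  local i=1
  while [ "$i" -le "$tries" ]; do
    # Without --fail, curl exits 0 for any HTTP response (even an error page)
    # and non-zero when nothing is listening on the port yet.
    if curl --silent --output /dev/null "$url"; then
      return 0
    fi
    sleep 2
    i=$((i + 1))
  done
  echo "NIF service at $url did not respond after $tries attempts" >&2
  return 1
}
```

Calling wait_for_nif before the curl command above, and redirecting that command's output with > as usual, avoids "connection refused" errors while Jetty is still starting.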
To use the CLI software, you must first clone and build the NIF project, as described in section 1.2.
After opening the spotlight folder, run the wrapper with the command:
mvn exec:java -e -Dexec.mainClass="org.nlp2rdf.implementation.spotlight.SpotlightCLI" -Dexec.args="-f text -i 'This is Germany.'"
In most cases, the text or document you need to annotate is stored in a file. In that case, set the input type to file with -t file and pass the path as the input with -i pathToFile. Note that file input works only in CLI mode. Similarly, you can write the output to a file using outfile=pathToFile.
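When several files need to be annotated, the CLI command can be wrapped in a small script. The sketch below defines hypothetical helpers, nif_annotate_file and nif_annotate_files, that pass -t file and -i as described above. It assumes you run it from the spotlight folder after building the project and that the CLI prints its result to standard output; since the exact CLI spelling of the outfile option is not shown on this page, the sketch redirects standard output instead. Maven's -q (quiet) flag keeps Maven's own log lines out of the saved file.

```shell
# Sketch: annotate text files with the Spotlight CLI wrapper.
# Assumptions: run from software/java-maven/implementation/spotlight after the
# project has been built, and the CLI writes its result to standard output.

# Hypothetical helper: annotate one file and save the output to another file.
nif_annotate_file() {
  local infile="$1"
  local outfile="$2"
  # -q hides Maven's own log lines so $outfile holds only the program output.
  # The single quotes keep a path containing spaces together inside exec.args.
  mvn -q exec:java \
    -Dexec.mainClass="org.nlp2rdf.implementation.spotlight.SpotlightCLI" \
    -Dexec.args="-f text -t file -i '$infile'" > "$outfile"
}

# Hypothetical helper: annotate every file given as an argument, writing
# <name>.nif next to each input (the suffix is an arbitrary choice).
nif_annotate_files() {
  local f
  for f in "$@"; do
    nif_annotate_file "$f" "$f.nif" || echo "failed: $f" >&2
  done
}
```

Each call starts a new Maven and JVM process, so for many files a locally running web service may be faster.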
For more details on the NIF parameters, see the Public API Specification (http://persistence.uni-leipzig.org/nlp2rdf/specification/api.html).