Integrate BioNLPProcessor with Odinson #392

Open
wants to merge 1 commit into
base: master
1 change: 1 addition & 0 deletions extra/build.sbt
@@ -11,6 +11,7 @@ libraryDependencies ++= {
"ai.lum" %% "nxmlreader" % "0.1.2",
"org.clulab" %% "processors-main" % procVersion,
"org.clulab" %% "processors-corenlp" % procVersion,
"org.clulab" %% "reach-processors" % "1.6.3-SNAPSHOT"
Member

I don't think we want to include a snapshot dependency in the default branch.

@kwalcock, is a release version going to be published? What is the ETA on getting the transformer-based processor integrated into Reach?

Contributor

Someone ;-) might still verify that the RC version works on aarch. Someone else might want to prepare a version of breeze/netlib that works on anything other than Linux (luhenry/netlib#21). I suspect that it will need to be me and that it will require forking that project and will take more time than we want.

)

}
3 changes: 2 additions & 1 deletion extra/src/main/resources/application.conf
@@ -23,8 +23,9 @@ odinson.indexDir = ${odinson.dataDir}/index
odinson.extra {
# processor to use for AnnotateText
# choices: FastNLPProcessor, CluProcessor, BioNLPProcessor
processorType = "CluProcessor"
#processorType = "CluProcessor"
#processorType = "FastNLPProcessor"
processorType = "BioNLPProcessor"
rulesFile = /example/rules.yml
outputFile = ../example_extractions.jsonl
}
5 changes: 5 additions & 0 deletions extra/src/main/scala/ai/lum/odinson/extra/AnnotateText.scala
@@ -13,6 +13,7 @@ import ai.lum.odinson.Document
import ai.lum.odinson.extra.utils.{ ExtraFileUtils, ProcessorsUtils }
import ai.lum.odinson.extra.utils.ProcessorsUtils.getProcessor
import org.clulab.utils.FileUtils
import org.clulab.processors.bionlp.BioNLPProcessor

object AnnotateText extends App with LazyLogging {

@@ -59,6 +60,10 @@ object AnnotateText extends App with LazyLogging {
def annotateTextFile(f: File): Document = {
val text = f.readString()
val doc = processor.annotate(text)
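processor match {
// BioNLPProcessor needs an extra rule-based NER pass after annotate()
case p: BioNLPProcessor =>
p.recognizeRuleNamedEntities(doc)
// any other processor needs no extra step; without this case the match throws a MatchError
case _ => ()
}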
// use file base name as document id
doc.id = Some(f.getBaseName())
ProcessorsUtils.convertDocument(doc)
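
A note on the `processor match` in this hunk: a type-based match with no fallback case throws `scala.MatchError` at runtime for every processor other than the one it names, so `FastNLPProcessor` and `CluProcessor` runs would depend on a `case _` being present. The following is a minimal, self-contained sketch of that behavior. `Proc`, `BioProc`, `CluProc`, and `extraNer` are hypothetical stand-ins invented for illustration, not the real processors API.

```scala
// Hypothetical stand-ins for the real processor classes; names are illustrative only.
trait Proc { def name: String }

final class BioProc extends Proc {
  val name = "bio"
  // Stands in for an extra, processor-specific annotation pass.
  def extraNer(text: String): String = s"$text [rule NER applied]"
}

final class CluProc extends Proc { val name = "clu" }

object MatchSketch {
  // Single typed case, no fallback: throws scala.MatchError for any non-BioProc value.
  // Because Proc is not sealed, the compiler does not warn about this.
  def unsafe(p: Proc, text: String): String = p match {
    case b: BioProc => b.extraNer(text)
  }

  // Same match with a fallback: other processors pass through unchanged.
  def safe(p: Proc, text: String): String = p match {
    case b: BioProc => b.extraNer(text)
    case _          => text
  }
}
```

Wrapping the call in `scala.util.Try` is one way to observe the failure in a test without crashing the run.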
@@ -8,6 +8,7 @@ import ai.lum.odinson.{ Document => OdinsonDocument, Sentence => OdinsonSentence
import org.clulab.dynet
import org.clulab.processors.clu.CluProcessor
import org.clulab.processors.fastnlp.FastNLPProcessor
import org.clulab.processors.bionlp.BioNLPProcessor
import org.clulab.processors.{
Processor,
Document => ProcessorsDocument,
@@ -42,6 +43,10 @@ object ProcessorsUtils {
dynet.Utils.initializeDyNet(autoBatch = false, mem = "1024,1024,1024,1024")
new CluProcessor
}
case "BioNLPProcessor" => {
dynet.Utils.initializeDyNet(autoBatch = false, mem = "1024,1024,1024,1024")
new BioNLPProcessor
}
}
}
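
The string match in this hunk selects a processor by the configured `processorType`, and the `CluProcessor` and `BioNLPProcessor` branches repeat the same DyNet initialization. One possible way to make the selection explicit is sketched below: parse the string once into a closed set of choices, report unknown values with a readable message, and keep the "needs DyNet" decision in one place. `ProcessorChoice`, `parse`, and `needsDyNet` are hypothetical names, and treating `FastNLPProcessor` as not needing DyNet is an assumption, since that branch is not shown in the diff.

```scala
// Hypothetical helper, not part of Odinson or processors; a sketch only.
object ProcessorChoice {
  sealed trait Choice
  case object Fast extends Choice
  case object Clu  extends Choice
  case object Bio  extends Choice

  private val known = "FastNLPProcessor, CluProcessor, BioNLPProcessor"

  // Turn the configured string into a Choice, or an error message for unknown values.
  def parse(processorType: String): Either[String, Choice] = processorType match {
    case "FastNLPProcessor" => Right(Fast)
    case "CluProcessor"     => Right(Clu)
    case "BioNLPProcessor"  => Right(Bio)
    case other              => Left(s"Unknown processorType '$other'; expected one of: $known")
  }

  // The diff initializes DyNet for CluProcessor and BioNLPProcessor.
  // Assumption: FastNLPProcessor does not need it.
  def needsDyNet(choice: Choice): Boolean = choice match {
    case Clu | Bio => true
    case Fast      => false
  }
}
```

With a sealed `Choice`, the compiler can flag a missing case in `needsDyNet` if another processor is added later.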
