Apache cTAKES™ is a natural language processing system for extracting information from the clinical free text of electronic medical records.
This version of the smoking status collection processing engine processes flat files to classify patient records into five pre-determined categories:
- past smoker (P)¹
- current smoker (C)¹
- smoker (S)
- nonsmoker (N)
- unknown (U)

¹ Past and current smokers are distinguished based on temporal expressions in the patient's medical records.
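When post-processing results, it can help to translate the one-letter codes back into their labels. The following is a minimal sketch; `smoking_label` is a hypothetical helper and not part of cTAKES, and it only encodes the five codes listed above.

```shell
# Hypothetical helper (not part of cTAKES): map a one-letter
# smoking status code to the label used in this README.
smoking_label() {
  case "$1" in
    P) echo "past smoker" ;;
    C) echo "current smoker" ;;
    S) echo "smoker" ;;
    N) echo "nonsmoker" ;;
    U) echo "unknown" ;;
    *) echo "unrecognized code: $1" >&2; return 1 ;;
  esac
}
```

For example, `smoking_label P` prints `past smoker`, and any other letter returns a non-zero status.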
Java 1.8 is required to run cTAKES.
Install scripts for Windows and Linux are located in the Scripts directory.
In the initial setup, cTAKES recognizes only a few sample concepts in text. To perform named entity recognition or concept identification for anything other than these few words, you must provide UMLS credentials to cTAKES. If you do not have a UMLS API key, you may request one at UMLS Terminology Services. After obtaining a key, you can supply it to Apache cTAKES by either of two methods (an operating system variable or a Java command parameter), each of which accepts two names for the key (ctakes.umls_apikey or umlsKey).
Method 1: Operating System Variable
Set either `ctakes.umls_apikey` or `umlsKey` as an operating system variable:
Option | Windows | Linux |
---|---|---|
1 | set ctakes.umls_apikey=MY_UMLS_KEY | export ctakes.umls_apikey=MY_UMLS_KEY |
2 | set umlsKey=MY_UMLS_KEY | export umlsKey=MY_UMLS_KEY |
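On Linux, a small pre-flight check can confirm that the key is visible before the pipeline starts. This is a sketch only: `require_umls_key` is a hypothetical helper, not something shipped with cTAKES. It checks option 2 (`umlsKey`) because POSIX shells such as bash reject variable names that contain a dot.

```shell
# Hypothetical pre-flight check (not part of cTAKES): succeed only if
# the umlsKey variable is set and non-empty in the current shell.
require_umls_key() {
  if [ -z "${umlsKey:-}" ]; then
    echo "umlsKey is not set; run: export umlsKey=MY_UMLS_KEY" >&2
    return 1
  fi
}
```

A typical use would be `export umlsKey=MY_UMLS_KEY` followed by `require_umls_key && ./bin/runSmokingStatusCPE.sh`, with your actual key substituted.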
Method 2: Java Command Parameter
Set either `-Dctakes.umls_apikey` or `-DumlsKey` in your Java command parameters.
Once you have your UMLS API key, find the line in each script that runs java and add the chosen parameter, with your key, to the java command. Make sure you substitute your actual key for MY_UMLS_KEY. The examples below omit everything after -cp because that part of the line does not need to change; do not delete it.
Option | Code |
---|---|
1 | java -Dctakes.umls_apikey=MY_UMLS_KEY -cp ... |
2 | java -DumlsKey=MY_UMLS_KEY -cp ... |
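If you prefer not to edit each script by hand, the change can be sketched as a text substitution. The helper below is hypothetical and makes two assumptions that may not hold for your copy of the scripts: the java line contains the literal text `java -cp `, and the key contains no characters that are special to sed (such as `/` or `&`).

```shell
# Hypothetical helper (not part of cTAKES): read script text on stdin and
# insert -DumlsKey=<key> between "java" and "-cp".
# $1: the UMLS API key. Lines without "java -cp " pass through unchanged.
add_umls_key_param() {
  sed "s/java -cp /java -DumlsKey=$1 -cp /"
}
```

For instance, `add_umls_key_param MY_UMLS_KEY < bin/runSmokingStatusCPE.sh` writes the modified script to standard output, so you can review it before replacing the original file.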
Windows
Step | Windows |
---|---|
1. Place patient note files: | C:\apache-ctakes-4.0.0.1\testdata\smoking\testinput |
2. Run the smoking status pipeline: | cd C:\apache-ctakes-4.0.0.1<br>bin\runSmokingStatusCPE.bat |
3. Results are written to: | C:\apache-ctakes-4.0.0.1\testdata\smoking\testoutput\results.txt |
Linux
Step | Linux |
---|---|
1. Place patient note files: | /usr/local/apache-ctakes-4.0.0.1/testdata/smoking/testinput |
2. Run the smoking status pipeline: | cd /usr/local/apache-ctakes-4.0.0.1<br>./bin/runSmokingStatusCPE.sh |
3. Results are written to: | /usr/local/apache-ctakes-4.0.0.1/testdata/smoking/testoutput/results.txt |
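The three Linux steps can be chained in a small wrapper. This is a sketch under stated assumptions: `run_smoking_status` is a hypothetical function, not part of cTAKES, and it takes the install directory as its first argument rather than assuming `/usr/local/apache-ctakes-4.0.0.1`.

```shell
# Hypothetical wrapper (not part of cTAKES) around the three Linux steps.
# $1: cTAKES install directory; remaining arguments: note files to classify.
run_smoking_status() {
  ctakes_home="$1"
  shift
  # Step 1: place the patient note files in the input directory.
  cp "$@" "$ctakes_home/testdata/smoking/testinput/" || return 1
  # Step 2: run the pipeline from the install directory, in a subshell
  # so the caller's working directory is left unchanged.
  ( cd "$ctakes_home" && ./bin/runSmokingStatusCPE.sh ) || return 1
  # Step 3: print the results file.
  cat "$ctakes_home/testdata/smoking/testoutput/results.txt"
}
```

It could be called as `run_smoking_status /usr/local/apache-ctakes-4.0.0.1 note1.txt note2.txt`, where the note file names are placeholders.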
Please note that this project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.
Established in 1853, Washington University in Saint Louis is among the world’s leaders in teaching, research, patient care, and service to society. Boasting 24 Nobel laureates to date, the University is ranked 7th in the world for most cited researchers, received the 4th highest amount of NIH medical research grants among medical schools in 2019, and was tied for 1st in the United States for genetics and genomics in 2018. The University is committed to learning and exploration, discovery and impact, and intellectual passions and challenging the unknown.