Computational methods for evaluating patient-provider communication.
Logistic regression and a hidden Markov model (HMM) can be used for prediction.
Author: Jihyun Park
Last updated: 6/22/2018
Python 2 should be installed with the packages numpy, nltk, pandas, sklearn, csv, and cPickle (csv and cPickle are part of the Python 2 standard library).
load_model_and_predict.ipynb
Demo IPython notebook that loads the pre-trained model and the sample test data, and predicts labels for the sample test set.
Training and test files should be pipe-delimited (fields separated by "|") with the column names visitid, talkturn, text, topicnumber, and topicletter.
For the test data, the topicnumber and topicletter columns are optional, since the model can predict without labels. However, scores are not calculated when those columns are missing.
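As a minimal sketch of reading this format (Python 3, standard library only), the snippet below groups rows by visitid and sorts each visit's turns by talkturn. The function name read_dialogs and the choice of topicnumber as the label are illustrative assumptions, not the repository's data classes.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows in the documented pipe-delimited format.
SAMPLE = (
    "visitid|talkturn|text|topicnumber|topicletter\n"
    "v1|2|How have you been sleeping?|3|B\n"
    "v1|1|Hello, nice to see you.|1|A\n"
    "v2|1|Any side effects from the medication?|5|C\n"
)


def read_dialogs(f):
    """Group rows by visitid and order each visit's turns by talkturn.

    Returns {visitid: [(talkturn, text, label_or_None), ...]}.
    The label is topicnumber when present, else None (test data may omit it).
    """
    dialogs = defaultdict(list)
    for row in csv.DictReader(f, delimiter="|"):
        label = row.get("topicnumber")  # None if the column is absent
        dialogs[row["visitid"]].append(
            (int(row["talkturn"]), row["text"], int(label) if label else None))
    for turns in dialogs.values():
        turns.sort(key=lambda turn: turn[0])  # restore conversational order
    return dict(dialogs)


dialogs = read_dialogs(io.StringIO(SAMPLE))
print(dialogs["v1"][0][1])  # Hello, nice to see you.
```

Because csv.DictReader keys each row by the header line, a test file without the topicnumber column simply yields None labels.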
The model classes are defined in this file. Details of their usage can be found in the demo IPython notebook and in the code docstrings.
- DialogModel: base class for dialog models. Use it when you have results from another base (independent) model that was trained elsewhere (e.g. the output of an RNN). Predictions and output probabilities are loaded with load_model() on an instance of this class, which can then be plugged into the HMM.
- LogRegDialogModel: runs an independent logistic regression model. Use fit_model(tr_data) to train and predict(te_data) to make predictions.
- HMMDialogModel: runs a hidden Markov model on top of a base independent model. Use fit_model(tr_data) to train and predict_viterbi(te_data) to make predictions.
- DialogResult: stores the results, and calculates and prints the scores.
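For intuition about what an HMM adds on top of an independent model, here is a minimal Viterbi sketch (Python 3, standard library only; not the repository's predict_viterbi). It assumes the base model supplies per-turn class probabilities P(y | turn), and it assumes the common hybrid trick of dividing each posterior by the class prior P(y) to get a scaled emission likelihood. The start and transition probabilities would come from training label sequences. All function and parameter names are hypothetical.

```python
import math


def _log(p):
    # log that maps zero probability to -inf instead of raising
    return math.log(p) if p > 0 else float("-inf")


def viterbi(posteriors, prior, start, trans):
    """Most likely label sequence for one dialog (list of label ids).

    posteriors: per-turn lists of P(y | turn) from a base model.
    prior:      class priors P(y), all > 0; P(y | turn) / P(y) is used as a
                scaled emission likelihood (proportional to P(turn | y)).
    start:      start[k] = P(y_1 = k).
    trans:      trans[i][j] = P(y_t = j | y_{t-1} = i).
    """
    K = len(prior)
    emit = [[_log(p[k]) - math.log(prior[k]) for k in range(K)]
            for p in posteriors]
    score = [_log(start[k]) + emit[0][k] for k in range(K)]
    backptrs = []
    for t in range(1, len(posteriors)):
        best_prev, new_score = [], []
        for j in range(K):
            # best predecessor state i for current state j
            cand = [score[i] + _log(trans[i][j]) for i in range(K)]
            i_best = max(range(K), key=cand.__getitem__)
            best_prev.append(i_best)
            new_score.append(cand[i_best] + emit[t][j])
        backptrs.append(best_prev)
        score = new_score
    # Trace back from the best final state.
    path = [max(range(K), key=score.__getitem__)]
    for best_prev in reversed(backptrs):
        path.append(best_prev[path[-1]])
    return path[::-1]


path = viterbi(
    posteriors=[[0.7, 0.3], [0.4, 0.6], [0.8, 0.2]],
    prior=[0.5, 0.5],
    start=[0.5, 0.5],
    trans=[[0.9, 0.1], [0.1, 0.9]],
)
print(path)  # [0, 0, 0]; per-turn argmax alone would give [0, 1, 0]
```

With "sticky" transitions, the isolated switch to class 1 at the second turn is smoothed away. This is the kind of sequential context that an independent per-turn classifier cannot use.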
Classes for the data. These classes load and preprocess the data.
- DialogData: base class for dialog data.
- MHDTrainData: class for MHD training data.
- MHDTestData: class for MHD test data.

Preprocessing methods are in the preprocess.py file.
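For illustration only, here is a hypothetical normalizer of the kind such a module might contain (Python 3). The actual steps in preprocess.py may differ and may use nltk. This sketch lowercases the text, replaces digit runs with a <num> placeholder, and extracts word tokens with a regular expression.

```python
import re


def tokenize(text):
    """Hypothetical normalizer: lowercase, replace digit runs with <num>,
    and split into word tokens (keeping apostrophes inside words)."""
    text = text.lower()
    text = re.sub(r"\d+", "<num>", text)
    # "<num>" is tried first so the placeholder survives as one token
    return re.findall(r"<num>|[a-z]+(?:'[a-z]+)?", text)


print(tokenize("I'll take 20 mg twice a day."))
# ["i'll", 'take', '<num>', 'mg', 'twice', 'a', 'day']
```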
Methods related to the HMM.
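The start and transition distributions are standard HMM ingredients. Below is a minimal sketch (Python 3; the function name and the smoothing constant alpha are assumptions) that estimates both from per-dialog label sequences. It uses additive smoothing so that transitions never seen in training keep a nonzero probability.

```python
def estimate_hmm_params(label_seqs, n_labels, alpha=1.0):
    """Estimate start and transition probabilities from per-dialog label
    sequences (lists of label ids in talkturn order), with additive
    smoothing alpha added to every count."""
    start = [alpha] * n_labels
    counts = [[alpha] * n_labels for _ in range(n_labels)]
    for seq in label_seqs:
        if not seq:
            continue
        start[seq[0]] += 1
        for prev, cur in zip(seq, seq[1:]):  # consecutive label pairs
            counts[prev][cur] += 1
    total = sum(start)
    start = [c / total for c in start]
    trans = [[c / sum(row) for c in row] for row in counts]  # rows sum to 1
    return start, trans


start, trans = estimate_hmm_params([[0, 0, 1], [1, 1]], n_labels=2)
print(start)  # [0.5, 0.5]
```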
Utility methods.
Trained models are saved as model/*.pkl.
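Here is a minimal sketch of a pickle round trip (Python 3). Under Python 2 the repository uses cPickle, which exposes the same dump/load interface. The helpers save_model and load_model_file are hypothetical, and the dict stands in for a trained model object. Protocol 2 is used because it is the highest protocol Python 2 can read.

```python
import os
import pickle
import tempfile


def save_model(model, path):
    """Pickle any Python object (e.g. a trained model) to path."""
    with open(path, "wb") as f:
        pickle.dump(model, f, protocol=2)  # readable by Python 2 and 3


def load_model_file(path):
    """Load an object previously written by save_model."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Stand-in "model": a dict of HMM parameters.
params = {"start": [0.5, 0.5], "trans": [[0.9, 0.1], [0.2, 0.8]]}
path = os.path.join(tempfile.mkdtemp(), "hmm.pkl")
save_model(params, path)
restored = load_model_file(path)
```

Only unpickle files from trusted sources, because loading a pickle can execute arbitrary code.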
Files
Vocabulary and label files built from the training data are saved as data/vocab.pkl and data/label*.pkl.
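Here is a minimal sketch of how vocabulary and label indices can be built from tokenized training data (Python 3). The helper names, the <unk> convention, and the min_count cutoff are assumptions, and the layout of the repository's vocab.pkl and label*.pkl may differ. The resulting dicts can be pickled like any other object.

```python
from collections import Counter


def build_vocab(token_lists, min_count=1):
    """Map each token seen at least min_count times to an integer id.
    Id 0 is reserved for unknown (out-of-vocabulary) tokens."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    vocab = {"<unk>": 0}
    for tok in sorted(t for t, c in counts.items()
                      if c >= min_count and t != "<unk>"):
        vocab[tok] = len(vocab)
    return vocab


def build_label_index(labels):
    """Map each distinct label to a contiguous id, in sorted order."""
    return {lab: i for i, lab in enumerate(sorted(set(labels)))}


def encode(tokens, vocab):
    """Convert tokens to ids, sending unseen tokens to <unk>."""
    return [vocab.get(t, vocab["<unk>"]) for t in tokens]


vocab = build_vocab([["hello", "doctor"], ["hello", "again"]])
print(encode(["hello", "nurse"], vocab))  # [3, 0]  ("nurse" is unknown)
```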