In medical practice, doctors detail patients' care plans in discharge summaries written as unstructured free text, which contains, among other things, medication names and prescription information. Extracting prescriptions from discharge summaries is challenging due to the way these documents are written. Handwritten rules and medical gazetteers have proven to be useful for this purpose but come with limitations on performance, scalability, and generalizability. We instead present a machine learning approach to extract and organize medication names and prescription information into individual entries. Our approach utilizes word embeddings and tackles the task in two extraction steps, both of which are treated as sequence labeling problems. When evaluated on the 2009 i2b2 Challenge official benchmark set, the proposed approach achieves a horizontal phrase-level F1-measure of 0.864, which to the best of our knowledge represents an improvement over the current state-of-the-art.
Studying multiple outcomes simultaneously allows researchers to begin to identify underlying factors that affect all of a set of diseases (i.e., shared etiology) and what may give rise to differences in disorders between patients (i.e., disease subtypes). In this work, our goal is to build risk scores that are predictive of multiple phenotypes simultaneously and identify subpopulations at high risk of multiple phenotypes. Such analyses could yield insight into etiology or point to treatment and prevention strategies. The standard canonical correlation analysis (CCA) can be used to relate multiple continuous outcomes to multiple predictors. However, in order to capture the full complexity of a disorder, phenotypes may include a diverse range of data types, including binary, continuous, ordinal, and censored variables. When phenotypes are diverse in this way, standard CCA is not possible and no methods currently exist to model them jointly. In the presence of such complications, we propose a semi-parametric CCA method to develop risk scores that are predictive of multiple phenotypes. To guard against potential model mis-specification, we also propose a nonparametric calibration method to identify subgroups that are at high risk of multiple disorders. A resampling procedure is also developed to account for the variability in these estimates. Our method opens the door to synthesizing a wide array of data sources for the purposes of joint prediction.
OBJECTIVE: Our objective was to develop a machine learning-based system to determine the severity of Positive Valence symptoms for a patient, based on information included in their initial psychiatric evaluation. Severity was rated by experts on an ordinal scale of 0-3 as follows: 0 (absent=no symptoms), 1 (mild=modest significance), 2 (moderate=requires treatment), 3 (severe=causes substantial impairment). MATERIALS AND METHODS: We treated the task of assigning Positive Valence severity as a text classification problem. During development, we experimented with regularized multinomial logistic regression classifiers, gradient boosted trees, and feedforward, fully-connected neural networks. We found both regularization and feature selection via mutual information to be very important in preventing models from overfitting the data. Our best configuration was a neural network with three fully connected hidden layers with rectified linear unit activations. RESULTS: Our best performing system achieved a score of 77.86%. The evaluation metric is an inverse normalization of the Mean Absolute Error presented as a percentage between 0 and 100, where 100 means the highest performance. Error analysis showed that 90% of the system errors involved neighboring severity categories. CONCLUSION: Machine learning text classification techniques with feature selection can be trained to recognize broad differences in Positive Valence symptom severity with a modest amount of training data (in this case 600 documents, 167 of which were unannotated). An increase in the amount of annotated data can increase accuracy of symptom severity classification by several percentage points. Additional features and/or a larger training corpus may further improve accuracy.
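The evaluation metric described above can be illustrated with a short sketch. The abstract only states that the score is an inverse normalization of the Mean Absolute Error on a 0-100 scale; the details below (dividing each error by the largest error possible for that gold label, then macro-averaging over gold classes) are an assumption made for illustration, and the function name `normalized_mae_score` is hypothetical.

```python
from collections import defaultdict


def normalized_mae_score(gold, pred, max_level=3):
    """Return 100 * (1 - macro-averaged normalized MAE) for labels 0..max_level.

    Assumption: each absolute error is divided by the largest error that is
    possible for its gold label, and classes are averaged with equal weight.
    """
    errors_by_class = defaultdict(list)
    for g, p in zip(gold, pred):
        # Largest possible error for gold label g on the 0..max_level scale.
        worst = max(g, max_level - g)
        errors_by_class[g].append(abs(g - p) / worst)
    # Mean normalized error per gold class, then the mean over classes.
    per_class = [sum(errs) / len(errs) for errs in errors_by_class.values()]
    return 100.0 * (1.0 - sum(per_class) / len(per_class))


# Toy labels invented for this example (not challenge data).
gold = [0, 1, 2, 3, 1, 2]
pred = [0, 2, 2, 1, 1, 3]
score = normalized_mae_score(gold, pred)
print(round(score, 2))
```

Under this definition a system that always predicts the gold label scores 100, and errors between neighboring categories cost less than errors that span the scale.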
This paper presents a novel method for automatically recognizing symptom severity by using natural language processing of psychiatric evaluation records to extract features that are processed by machine learning techniques to assign a severity score to each record evaluated in the 2016 RDoC for Psychiatry Challenge from CEGS/N-GRID. The natural language processing techniques focused on (a) discerning the discourse information expressed in questions and answers; (b) identifying medical concepts that relate to mental disorders; and (c) accounting for the role of negation. The machine learning techniques rely on the assumptions that (1) the severity of a patient's positive valence symptoms exists on a latent continuous spectrum and (2) all the patient's answers and narratives documented in the psychological evaluation records are informed by the patient's latent severity score along this spectrum. These assumptions motivated our two-step machine learning framework for automatically recognizing psychological symptom severity. In the first step, the latent continuous severity score is inferred from each record; in the second step, the severity score is mapped to one of the four discrete severity levels used in the CEGS/N-GRID challenge. We evaluated three methods for inferring the latent severity score associated with each record: (i) pointwise ridge regression; (ii) pairwise comparison-based classification; and (iii) a hybrid approach combining pointwise regression and the pairwise classifier. The second step was implemented using a tree of cascading support vector machine (SVM) classifiers. While the official evaluation results indicate that all three methods are promising, the hybrid approach not only outperformed the pairwise and pointwise methods, but also produced the second highest performance of all submissions to the CEGS/N-GRID challenge with a normalized MAE score of 84.093% (where higher numbers indicate better performance). 
These evaluation results enabled us to observe that, for this task, considering pairwise information can produce more accurate severity scores than pointwise regression - an approach widely used in other systems for assigning severity scores. Moreover, our analysis indicates that using a cascading SVM tree outperforms traditional SVM classification methods for the purpose of determining discrete severity levels.
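The second step of the framework, mapping a latent continuous score to one of four discrete levels through a cascade of binary decisions, might look roughly like the sketch below. The abstract does not describe the layout of the SVM tree, so the three-stage ordering is an assumption; the binary deciders here are plain score thresholds standing in for trained SVMs, and the cutoff values and function names are hypothetical.

```python
def make_threshold_classifier(cutoff):
    """Stand-in for a trained binary SVM: True when the score exceeds cutoff."""
    return lambda score: score > cutoff


def cascade_predict(score, is_present, is_above_mild, is_severe):
    """Walk a cascade of binary decisions to reach a severity level 0..3."""
    if not is_present(score):
        return 0  # absent
    if not is_above_mild(score):
        return 1  # mild
    if not is_severe(score):
        return 2  # moderate
    return 3      # severe


# Hypothetical cutoffs on the latent severity scale.
is_present = make_threshold_classifier(0.5)
is_above_mild = make_threshold_classifier(1.5)
is_severe = make_threshold_classifier(2.5)

latent_scores = [0.2, 1.1, 2.0, 2.9]
levels = [cascade_predict(s, is_present, is_above_mild, is_severe)
          for s in latent_scores]
print(levels)
```

Each stage only has to separate one level from everything more severe, which is the general appeal of a cascade over a single four-way classifier.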
Ava C Carter, Howard Y Chang, George Church, Ashley Dombkowski, Joseph R Ecker, Elad Gil, Paul G Giresi, Henry Greely, William J Greenleaf, Nir Hacohen, Chuan He, David Hill, Justin Ko, Isaac Kohane, Anshul Kundaje, Megan Palmer, Michael P Snyder, Joyce Tung, Alexander Urban, Marc Vidal, and Wing Wong. 2017. “Challenges and recommendations for epigenomics in precision health.” Nat Biotechnol, 35, 12, Pp. 1128-1132.
Linking putatively pathogenic variants to the tissues they affect is necessary for determining the correct diagnostic workup and therapeutic regime in undiagnosed patients. Here, we explored how gene expression across healthy tissues can be used to infer this link. We integrated 6,665 tissue-wide transcriptomes with genetic disorder knowledge bases covering 3,397 diseases. Receiver-operating characteristics (ROC) analysis using expression levels in each tissue and across tissues indicated significant but modest associations between elevated expression and phenotype for most tissues (maximum area under ROC curve = 0.69). At extreme elevation, associations were marked. Upregulation of disease genes in affected tissues was pronounced for genes associated with autosomal dominant over recessive disorders. Pathways enriched for genes expressed and associated with phenotypes highlighted tissue functionality, including lipid metabolism in spleen and DNA repair in adipose tissue. These results suggest features useful for evaluating the likelihood of particular tissue manifestations in genetic disorders. The web address of an interactive platform integrating these data is provided.
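As an illustration of the ROC analysis mentioned above, the area under the ROC curve can be computed directly as the probability that a randomly chosen positive example (a gene associated with a phenotype in the tissue) has a higher expression level than a randomly chosen negative one. This is a minimal sketch with invented toy values; the function name `roc_auc` is hypothetical, and the study's actual pipeline is not described in the abstract.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy expression levels and disease-gene labels, invented for this example.
expression = [8.1, 2.3, 5.0, 7.4, 1.2, 5.0]
affected = [1, 0, 1, 1, 0, 0]
auc = roc_auc(expression, affected)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to no association between expression and phenotype, so the reported maximum of 0.69 sits between chance and perfect separation.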
The CEGS N-GRID 2016 Shared Task (Filannino et al., 2017) in Clinical Natural Language Processing introduces the assignment of a severity score to a psychiatric symptom, based on a psychiatric intake report. We present a method that employs the inherent interview-like structure of the report to extract relevant information from the report and generate a representation. The representation consists of a restricted set of psychiatric concepts (and the context they occur in), identified using medical concepts defined in UMLS that are directly related to the psychiatric diagnoses present in the Diagnostic and Statistical Manual of Mental Disorders, 4th Edition (DSM-IV) ontology. Random Forests provides a generalization of the extracted, case-specific features in our representation. The best variant presented here scored an inverse mean absolute error (MAE) of 80.64%. A concise concept-based representation, paired with identification of concept certainty and scope (family, patient), shows a robust performance on the task.
De-identification, the removal of identifying information such as protected health information (PHI) from clinical data, is a critical step to enable data to be shared or published. The 2016 Centers of Excellence in Genomic Science (CEGS) Neuropsychiatric Genome-scale and RDOC Individualized Domains (N-GRID) clinical natural language processing (NLP) challenge includes a track on de-identifying electronic medical records (EMRs) (i.e., track 1). The challenge organizers provide 1000 annotated mental health records for this track, 600 of which are used as a training set and 400 as a test set. We develop a hybrid system for the de-identification task on the training set. First, four individual subsystems, that is, a subsystem based on bidirectional LSTM (long short-term memory, a variant of recurrent neural network), a subsystem based on bidirectional LSTM with features, a subsystem based on conditional random field (CRF) and a rule-based subsystem, are used to identify PHI instances. Then, an ensemble learning-based classifier is deployed to combine all PHI instances predicted by the three machine learning-based subsystems. Finally, the results of the ensemble learning-based classifier and the rule-based subsystem are merged together. Experiments conducted on the official test set show that our system achieves the highest micro F1-scores of 93.07%, 91.43% and 95.23% under the "token", "strict" and "binary token" criteria respectively, ranking first in the 2016 CEGS N-GRID NLP challenge. In addition, on the dataset of the 2014 i2b2 NLP challenge, our system achieves the highest micro F1-scores of 96.98%, 95.11% and 98.28% under the "token", "strict" and "binary token" criteria respectively, outperforming other state-of-the-art systems. These experiments demonstrate the effectiveness of our proposed method.
The CEGS N-GRID 2016 Shared Task 1 in Clinical Natural Language Processing focuses on the de-identification of psychiatric evaluation records. This paper describes two participating systems of our team, based on conditional random fields (CRFs) and long short-term memory networks (LSTMs). A pre-processing module was introduced for sentence detection and tokenization before de-identification. For CRFs, manually extracted rich features were utilized to train the model. For LSTMs, a character-level bi-directional LSTM network was applied to represent tokens and classify tags for each token, following which a decoding layer was stacked to decode the most probable protected health information (PHI) terms. The LSTM-based system attained an i2b2 strict micro-F measure of 0.8986, which was higher than that of the CRF-based system.
The 2016 CEGS N-GRID shared tasks for clinical records contained three tracks. Track 1 focused on de-identification of a new corpus of 1000 psychiatric intake records. This track tackled de-identification in two sub-tracks: Track 1.A was a "sight unseen" task, where nine teams ran existing de-identification systems, without any modifications or training, on 600 new records in order to gauge how well systems generalize to new data. The best-performing system for this track scored an F1 of 0.799. Track 1.B was a traditional Natural Language Processing (NLP) shared task on de-identification, where 15 teams had two months to train their systems on the new data, then test it on an unannotated test set. The best-performing system from this track scored an F1 of 0.914. The scores for Track 1.A show that unmodified existing systems do not generalize well to new data without the benefit of training data. The scores for Track 1.B are slightly lower than the 2014 de-identification shared task (which was almost identical to 2016 Track 1.B), indicating that these new psychiatric records pose a more difficult challenge to NLP systems. Overall, de-identification is still not a solved problem, though it is important to the future of clinical NLP.
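The F1 scores reported for the de-identification tracks can be made concrete with a small sketch of a micro-averaged, span-level F1. It assumes a strict matching criterion in which a predicted PHI span counts as correct only when its record, character offsets, and PHI type all match a gold annotation exactly; the tuple layout and the function name `strict_micro_f1` are hypothetical, and the official evaluation script may differ in details.

```python
def strict_micro_f1(gold, predicted):
    """Micro-averaged F1 over sets of (record_id, start, end, phi_type) tuples.

    Assumption: a prediction is a true positive only on an exact match of
    record, boundaries, and type.
    """
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy annotations invented for this example.
gold = {("r1", 0, 4, "NAME"), ("r1", 10, 20, "DATE"), ("r2", 5, 9, "CITY")}
predicted = {
    ("r1", 0, 4, "NAME"),
    ("r1", 10, 18, "DATE"),   # boundary error: not counted under strict matching
    ("r2", 5, 9, "CITY"),
    ("r2", 30, 35, "AGE"),    # spurious span
}
f1 = strict_micro_f1(gold, predicted)
print(round(f1, 4))
```

Because counts are pooled over all records and PHI types before computing precision and recall, frequent PHI categories dominate a micro-averaged score.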
Neuropsychiatric disorders are common health problems affecting approximately 1% of the population. Twin, adoption, and family studies have displayed a strong genetic component for many of these disorders; however, the underlying pathophysiological mechanisms and neural substrates remain largely unknown. Given the critical need for new diagnostic markers and disease-modifying treatments, expanding the focus of genomic studies of neuropsychiatric disorders to include the role of non-coding RNAs (ncRNAs) is of growing interest. Of known types of ncRNAs, microRNAs (miRNAs) are 20-25-nucleotide, single-stranded, molecules that regulate gene expression through post-transcriptional mechanisms and have the potential to coordinately regulate complex regulatory networks. In this review, we summarize the current knowledge on miRNA alteration/dysregulation in neuropsychiatric disorders, with a special emphasis on schizophrenia (SCZ), bipolar disorder (BD), and major depressive disorder (MDD). With an eye toward the future, we also discuss the diagnostic and prognostic potential of miRNAs for neuropsychiatric disorders in the context of personalized treatments and network medicine.
Evidence has revealed interesting associations of clinical and social parameters with violent behaviors of patients with psychiatric disorders. Men are more violent preceding and during hospitalization, whereas women are more violent than men throughout the 3 days following a hospital admission. It has also been proven that mental disorders may be a consistent risk factor for the occurrence of violence. In order to better understand violent behaviors of patients with psychiatric disorders, it is important to investigate both the clinical symptoms and psychosocial factors that accompany violence in these patients. In this study, we utilized a dataset released by the Partners Healthcare and Neuropsychiatric Genome-scale and RDoC Individualized Domains project of Harvard Medical School to develop a unique text mining pipeline that processes unstructured clinical data in order to recognize clinical and social parameters such as age, gender, history of alcohol use, and violent behaviors, and explored the associations between these parameters and violent behaviors of patients with psychiatric disorders. The aim of our work was to demonstrate the feasibility of mining factors that are strongly associated with violent behaviors among psychiatric patients from unstructured psychiatric evaluation records using clinical text mining. Experimental results showed that stimulants, followed by a family history of violent behavior, suicidal behaviors, and financial stress were strongly associated with violent behaviors. Key aspects explicated in this paper include employing our text mining pipeline to extract clinical and social factors linked with violent behaviors, generating association rules to uncover possible associations between these factors and violent behaviors, and lastly the ranking of top rules associated with violent behaviors using statistical analysis and interpretation.
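The association rules mentioned above are usually ranked with the standard measures of support, confidence, and lift. The sketch below computes them for a single rule over toy records; which measures the authors actually used for ranking is not stated in the abstract, and the factor names, records, and function name `rule_metrics` are invented for illustration.

```python
def rule_metrics(records, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    Each record is a set of factors extracted from one patient note.
    """
    n = len(records)
    n_antecedent = sum(1 for r in records if antecedent <= r)
    n_consequent = sum(1 for r in records if consequent <= r)
    n_both = sum(1 for r in records if (antecedent | consequent) <= r)
    support = n_both / n
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    # Lift compares the rule's confidence with the consequent's base rate.
    lift = confidence / (n_consequent / n) if n_consequent else 0.0
    return support, confidence, lift


# Toy records invented for this example (not study data).
records = [
    {"stimulants", "violence"},
    {"stimulants", "financial_stress", "violence"},
    {"financial_stress"},
    {"stimulants"},
    {"violence"},
    {"alcohol_use"},
]
support, confidence, lift = rule_metrics(records, {"stimulants"}, {"violence"})
print(round(support, 3), round(confidence, 3), round(lift, 3))
```

A lift above 1 means the consequent occurs more often among records matching the antecedent than in the data overall, which is what makes a rule a candidate for closer statistical scrutiny.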
De-identification, or identifying and removing protected health information (PHI) from clinical data, is a critical step in making clinical data available for clinical applications and research. This paper presents a natural language processing system for automatic de-identification of psychiatric notes, which was designed to participate in the 2016 CEGS N-GRID shared task Track 1. The system has a hybrid structure that combines machine learning techniques and rule-based approaches. The rule-based components exploit the structure of the psychiatric notes as well as characteristic surface patterns of PHI mentions. The machine learning components utilize supervised learning with rich features. In addition, the system performance was boosted by integrating additional data into the training set through domain adaptation. The hybrid system achieved an overall micro-averaged F-score of 90.74 on the test set, second-best among all the participants of the CEGS N-GRID task.
De-identification of clinical narratives is one of the main obstacles to making healthcare free text available for research. In this paper we describe our experience in expanding and tailoring two existing tools as part of the 2016 CEGS N-GRID Shared Tasks Track 1, which evaluated de-identification methods on a set of psychiatric evaluation notes for up to 25 different types of Protected Health Information (PHI). The methods we used rely on machine learning on either a large or small feature space, with additional strategies, including two-pass tagging and multi-class models, which both proved to be beneficial. The results show that the integration of the proposed methods can identify Health Insurance Portability and Accountability Act (HIPAA)-defined PHI with overall F-scores of ∼90% and above. Yet, some classes (Profession, Organization) again proved challenging given the variability of the expressions used to refer to this information.
BACKGROUND: The CEGS N-GRID 2016 Shared Task in Clinical Natural Language Processing (NLP) provided a set of 1000 neuropsychiatric notes to participants as part of a competition to predict psychiatric symptom severity scores. This paper summarizes our methods, results, and experiences based on our participation in the second track of the shared task. OBJECTIVE: Classical methods of text classification usually fall into one of three problem types: binary, multi-class, and multi-label classification. In this effort, we study ordinal regression problems with text data where misclassifications are penalized differently based on how far apart the ground truth and model predictions are on the ordinal scale. Specifically, we present our entries (methods and results) in the N-GRID shared task in predicting research domain criteria (RDoC) positive valence ordinal symptom severity scores (absent, mild, moderate, and severe) from psychiatric notes. METHODS: We propose a novel convolutional neural network (CNN) model designed to handle ordinal regression tasks on psychiatric notes. Broadly speaking, our model combines an ordinal loss function, a CNN, and conventional feature engineering (wide features) into a single model which is learned end-to-end. Given interpretability is an important concern with nonlinear models, we apply a recent approach called locally interpretable model-agnostic explanation (LIME) to identify important words that lead to instance specific predictions. RESULTS: Our best model entered into the shared task placed third among 24 teams and scored a macro mean absolute error (MMAE) based normalized score (100·(1-MMAE)) of 83.86. Since the competition, we improved our score (using basic ensembling) to 85.55, comparable with the winning shared task entry. Applying LIME to model predictions, we demonstrate the feasibility of instance specific prediction interpretation by identifying words that led to a particular decision. 
CONCLUSION: In this paper, we present a method that successfully uses wide features and an ordinal loss function applied to convolutional neural networks for ordinal text classification specifically in predicting psychiatric symptom severity scores. Our approach leads to excellent performance on the N-GRID shared task and is also amenable to interpretability using existing model-agnostic approaches.
Engulfment of synapses and neural progenitor cells (NPCs) by microglia is critical for the development and maintenance of proper brain circuitry, and has been implicated in neurodevelopmental as well as neurodegenerative disease etiology. We have developed and validated models of these mechanisms by reprogramming microglia-like cells from peripheral blood mononuclear cells, and combining them with NPCs and neurons derived from induced pluripotent stem cells to create patient-specific cellular models of complement-dependent synaptic pruning and elimination of NPCs. The resulting microglia-like cells express appropriate markers and function as primary human microglia, while patient-matched macrophages differ markedly. As a demonstration of disease-relevant application, we studied the role of C4, recently implicated in schizophrenia, in engulfment of synaptic structures by human microglia. The ability to create complete patient-specific cellular models of critical microglial functions utilizing samples taken during a single clinical visit will extend the ability to model central nervous system disease while facilitating high-throughput screening.
Major depressive disorder frequently co-occurs with medical disorders, raising the possibility of shared genetic liability. Recent identification of 15 novel genetic loci associated with depression allows direct investigation of this question. In cohorts of individuals participating in biobanks at two academic medical centers, we calculated polygenic loading for risk loci reported to be associated with depression. We then examined the association between such loading and 50 groups of clinical diagnoses, or topics, drawn from these patients' electronic health records, determined using a novel application of latent Dirichlet allocation. Three topics showed experiment-wide association with the depression liability score; these included diagnostic groups representing greater prevalence of mood and anxiety disorders, greater prevalence of cardiac ischemia, and a decreased prevalence of heart failure. The latter two associations persisted even among individuals with no mood disorder diagnosis. This application of a novel method for grouping related diagnoses in biobanks indicates shared genetic risk for depression and cardiac disease, with a pattern suggesting greater ischemic risk and diminished heart failure risk.
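A polygenic loading of the kind described above is commonly computed as a weighted sum of risk-allele counts across loci. The sketch below shows that general form only; the abstract does not give the weighting scheme used in the study, and the locus identifiers, effect sizes, and function name `polygenic_score` are hypothetical.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0, 1, or 2 copies per locus).

    Assumption: a locus missing from `dosages` contributes nothing; a real
    analysis would typically impute or otherwise handle missing genotypes.
    """
    return sum(weight * dosages.get(locus, 0) for locus, weight in weights.items())


# Hypothetical per-locus effect sizes (for example, log odds ratios).
weights = {"locus_a": 0.05, "locus_b": 0.03, "locus_c": 0.02}

# Hypothetical genotype for one individual: copies of the risk allele per locus.
individual = {"locus_a": 2, "locus_b": 0, "locus_c": 1}

score = polygenic_score(individual, weights)
print(round(score, 4))
```

Scores computed this way for each individual can then be tested for association with each diagnostic topic.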
BACKGROUND: Applications of natural language processing to mental health notes are not common given the sensitive nature of the associated narratives. The CEGS N-GRID 2016 Shared Task in Clinical Natural Language Processing (NLP) changed this scenario by providing the first set of neuropsychiatric notes to participants. This study summarizes our efforts and results in proposing a novel data use case for this dataset as part of the third track in this shared task. OBJECTIVE: We explore the feasibility and effectiveness of predicting a set of common mental conditions a patient has based on the short textual description of the patient's history of present illness, which typically occurs at the beginning of a psychiatric initial evaluation note. MATERIALS AND METHODS: We clean and process the 1000 records made available through the N-GRID clinical NLP task into a key-value dictionary and build a dataset of 986 examples for which there is a narrative for history of present illness as well as Yes/No responses with regard to the presence of specific mental conditions. We propose two independent deep neural network models: one based on convolutional neural networks (CNN) and another based on recurrent neural networks with hierarchical attention (ReHAN), the latter of which allows for interpretation of model decisions. We conduct experiments to compare these methods to each other and to baselines based on linear models and named entity recognition (NER). RESULTS: Our CNN model with optimized thresholding of output probability estimates achieves the best overall mean micro-F score of 63.144% for 11 common mental conditions, with statistically significant gains (p<0.05) over all other models. The ReHAN model with its interpretable attention mechanism scored a mean micro-F1 of 61.904%. Both models' improvements over baseline models (support vector machines and NER) are statistically significant. 
The ReHAN model additionally aids in interpretation of the results by surfacing important words and sentences that lead to a particular prediction for each instance. CONCLUSIONS: Although the history of present illness is a short text segment averaging 300 words, it is a good predictor for a few conditions such as anxiety, depression, panic disorder, and attention deficit hyperactivity disorder. Proposed CNN and RNN models outperform baseline approaches and complement each other when evaluating on a per-label basis.
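The optimized thresholding of output probabilities mentioned in the results can be sketched as a search for the cutoff that maximizes F1 on held-out data. This is a minimal illustration for a single label with invented numbers; the abstract does not say whether one threshold or one per condition was tuned, nor how candidates were chosen, so the grid and the function names are assumptions.

```python
def f1_score(gold, pred):
    """Binary F1 from parallel lists of 0/1 labels."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(probs, gold, candidates):
    """Return the (threshold, F1) pair with the highest F1.

    On ties, the earliest candidate in the list is kept.
    """
    best_t, best_f = None, -1.0
    for t in candidates:
        pred = [1 if p >= t else 0 for p in probs]
        score = f1_score(gold, pred)
        if score > best_f:
            best_t, best_f = t, score
    return best_t, best_f


# Toy validation probabilities and labels for one condition (invented).
probs = [0.9, 0.4, 0.35, 0.2, 0.6, 0.1]
gold = [1, 1, 1, 0, 0, 0]
candidates = [i / 10 for i in range(1, 10)]  # hypothetical grid 0.1 .. 0.9
threshold, f1 = best_threshold(probs, gold, candidates)
print(threshold, round(f1, 3))
```

In this toy case a cutoff below the default 0.5 recovers positives that would otherwise be missed, which is the usual motivation for tuning thresholds on imbalanced labels.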
In response to the challenges set forth by the CEGS N-GRID 2016 Shared Task in Clinical Natural Language Processing, we describe a framework to automatically classify initial psychiatric evaluation records to one of four positive valence system severities: absent, mild, moderate, or severe. We used a dataset provided by the event organizers to develop a framework comprising natural language processing (NLP) modules and three predictive models (two decision tree models and one Bayesian network model) used in the competition. We also developed two additional predictive models for comparison purposes. To evaluate our framework, we employed a blind test dataset provided by the 2016 CEGS N-GRID. The predictive scores, measured by the macro-averaged inverse normalized mean absolute error score, from the two decision trees and Naïve Bayes models were 82.56%, 82.18%, and 80.56%, respectively. The framework proposed in this paper can potentially be applied to other predictive tasks for processing initial psychiatric evaluation records, such as predicting 30-day psychiatric readmissions.