Deidentified Psychiatric Intake Notes

A corpus of 1,000 deidentified and annotated psychiatric intake notes was developed and used for the RDoC Natural Language Processing Challenge and Workshop. The notes cover patients' medical and psychiatric histories, drug and alcohol use, family history, current living situations, and other details potentially relevant to their psychiatric problems. The corpus contains 1,862,452 tokens. It will be made available soon on the Department of Biomedical Informatics website and data portal.

Outcomes of this challenge were published in a November 2017 special issue of the Journal of Biomedical Informatics (Volume 75 Supplement). Guest-edited by challenge organizers Özlem Uzuner, Amber Stubbs, and Michele Filannino, this issue provides overviews of the challenge and its two main tracks (De-identification and Symptom Severity Classification), along with the results of 14 teams that participated in the challenge.