Abstract: The second track of the CEGS N-GRID 2016 natural language processing shared tasks focused on predicting symptom severity from neuropsychiatric clinical records. For the first time, initial psychiatric evaluation records have been collected, de-identified, annotated and shared with the scientific community. One hundred ten researchers organized into twenty-four teams participated in this track and submitted sixty-five system runs for evaluation. The top ten teams each achieved an inverse normalized macro-averaged mean absolute error score over 0.80. The top performing system employed an ensemble of six different machine learning-based classifiers to achieve a score of 0.86. The task proved generally easy, with the exception of two specific classes of records: records with very few but crucial positive valence signals, and records describing patients predominantly affected by negative rather than positive valence. These cases proved very challenging for most systems. Further research is required before the task can be considered solved. Overall, the results of this track demonstrate the effectiveness of data-driven approaches to the task of symptom severity classification.
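
The evaluation score named above can be illustrated with a minimal sketch. The abstract does not define the metric, so the following is an assumption rather than the organizers' official scorer: severity is taken to be an ordinal label on a four-point scale (0 to 3), the mean absolute error is computed separately for each gold class, each class error is divided by the largest error possible for that class, the normalized errors are averaged over the classes present, and the result is subtracted from 1. The function name `inverse_normalized_macro_mae` and the example labels are hypothetical.

```python
from collections import defaultdict


def inverse_normalized_macro_mae(gold, pred, labels=(0, 1, 2, 3)):
    """Return 1 minus the normalized macro-averaged MAE (assumed definition).

    gold, pred: equal-length sequences of integer severity labels.
    labels: the ordinal scale (assumed here to be 0..3).
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    lo, hi = min(labels), max(labels)

    # Group absolute errors by the gold class of each record.
    errors = defaultdict(list)
    for g, p in zip(gold, pred):
        errors[g].append(abs(p - g))

    per_class = []
    for c in labels:
        if c not in errors:
            continue  # skip classes with no gold records
        # Largest error a prediction can make for a record of class c.
        max_err = max(c - lo, hi - c)
        class_mae = sum(errors[c]) / len(errors[c])
        per_class.append(class_mae / max_err)

    if not per_class:
        raise ValueError("no gold records with a label in `labels`")

    # Macro average: every class counts equally, whatever its size.
    return 1.0 - sum(per_class) / len(per_class)


# Hypothetical labels, for illustration only.
gold = [0, 0, 1, 2, 3]
pred = [1, 0, 1, 0, 3]
score = inverse_normalized_macro_mae(gold, pred)
```

Under this definition a perfect system scores 1.0, and because the average is taken over classes rather than records, errors on a rare severity class weigh as much as errors on a common one.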