It has long been hypothesized that aging and neurodegeneration are associated with somatic mutation in neurons; however, methodological hurdles have prevented testing this hypothesis directly. We used single-cell whole-genome sequencing to perform genome-wide somatic single-nucleotide variant (sSNV) identification on DNA from 161 single neurons from the prefrontal cortex and hippocampus of 15 normal individuals (aged 4 months to 82 years), as well as 9 individuals affected by early-onset neurodegeneration due to genetic disorders of DNA repair (Cockayne syndrome and xeroderma pigmentosum). sSNVs increased approximately linearly with age in both areas (with a higher rate in hippocampus) and were more abundant in neurodegenerative disease. The accumulation of somatic mutations with age, which we term genosenium, shows age-related, region-related, and disease-related molecular signatures and may be important in other human age-associated conditions.
OBJECTIVES: As electronic mental health records become more widely available, several approaches have been suggested to automatically extract information from free-text narrative to support epidemiological research and clinical decision-making. In this paper, we explore extraction of explicit mentions of symptom severity from initial psychiatric evaluation records. We use the data provided by the 2016 CEGS N-GRID NLP shared task Track 2, which contains 541 records manually annotated for symptom severity according to the Research Domain Criteria. METHODS: We designed and implemented 3 automatic methods: a knowledge-driven approach relying on local lexicalized rules based on common syntactic patterns in text suggesting positive valence symptoms; a machine learning method using a neural network; and a hybrid approach combining the first 2 methods with a neural network. RESULTS: The results on an unseen evaluation set of 216 psychiatric evaluation records showed a performance of 80.1% for the rule-based method, 73.3% for the machine-learning approach, and 72.0% for the hybrid one. CONCLUSIONS: Although more work is needed to improve the accuracy, the results are encouraging and indicate that automated text mining methods can be used to classify mental health symptom severity from free-text psychiatric notes to support epidemiological and clinical research.
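As a rough illustration of what a local lexicalized rule of the kind described above might look like, the following minimal sketch matches a severity word that appears within a few tokens of a symptom word. The lexicons, the label names, and the function `extract_severity` are all hypothetical; the abstract does not list the actual rules, and this sketch does not handle negation or the absence of a symptom.

```python
import re
from typing import Optional

# Hypothetical severity lexicon; the real rules and labels are not given in the abstract.
SEVERITY_TERMS = {
    "mild": "MILD",
    "moderate": "MODERATE",
    "severe": "SEVERE",
}

# Hypothetical symptom cue words, chosen only for illustration.
SYMPTOM_TERMS = ["depression", "anxiety", "mania", "cravings"]

# Local pattern: a severity word, then at most two intervening words, then a symptom word.
PATTERN = re.compile(
    r"\b(?P<severity>" + "|".join(SEVERITY_TERMS) + r")\b"
    r"(?:\s+\w+){0,2}?\s+"
    r"(?P<symptom>" + "|".join(SYMPTOM_TERMS) + r")\b",
    re.IGNORECASE,
)


def extract_severity(text: str) -> Optional[str]:
    """Return the label of the first severity mention tied to a symptom, or None."""
    match = PATTERN.search(text)
    if match is None:
        return None
    return SEVERITY_TERMS[match.group("severity").lower()]


if __name__ == "__main__":
    print(extract_severity("Patient reports severe recurrent depression since 2014."))
    print(extract_severity("No acute complaints today."))
```

A rule like this is transparent and easy to audit, which is one plausible reason a rule-based system can compete with learned models on a small annotated corpus; a usable system would need many more patterns plus negation handling.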
BACKGROUND: Genetic studies of neuropsychiatric disease strongly suggest an overlap in liability. There are growing efforts to characterize these diseases dimensionally rather than categorically, but the extent to which such dimensional models correspond to biology is unknown. METHODS: We applied a newly developed natural language processing method to extract five symptom dimensions based on the National Institute of Mental Health Research Domain Criteria definitions from narrative hospital discharge notes in a large biobank. We conducted a genome-wide association study to examine whether common variants were associated with each of these dimensions as quantitative traits. RESULTS: Among 4687 individuals, loci in three of five domains exceeded a genome-wide threshold for statistical significance. These included a locus spanning the neocortical development genes RFPL3 and RFPL3S for arousal (p = 2.29 × 10) and one spanning the FPR3 gene for cognition (p = 3.22 × 10). CONCLUSIONS: Natural language processing identifies dimensional phenotypes that may facilitate the discovery of common genetic variation that is relevant to psychopathology.
BACKGROUND: Relying on diagnostic categories of neuropsychiatric illness obscures the complexity of these disorders. Capturing multiple dimensional measures of neuropathology could facilitate the clinical and neurobiological investigation of cognitive and behavioral phenotypes. METHODS: We developed a natural language processing-based approach to extract five symptom dimensions, based on the National Institute of Mental Health Research Domain Criteria definitions, from narrative clinical notes. Estimates of Research Domain Criteria loading were derived from a cohort of 3619 individuals with 4623 hospital admissions. We applied this tool to a large corpus of psychiatric inpatient admission and discharge notes (2010-2015), and using the same cohort we examined face validity, predictive validity, and convergent validity with gold standard annotations. RESULTS: In mixed-effect models adjusted for sociodemographic and clinical features, greater negative and positive symptom domains were associated with a shorter length of stay (β = -.88, p = .001 and β = -1.22, p < .001, respectively), while greater social and arousal domain scores were associated with a longer length of stay (β = .93, p < .001 and β = .81, p = .007, respectively). In fully adjusted Cox regression models, a greater positive domain score at discharge was also associated with a significant increase in readmission risk (hazard ratio = 1.22, p < .001). Positive and negative valence domains were correlated with expert annotation (by analysis of variance [df = 3], R = .13 and .19, respectively). Likewise, in a subset of patients, neurocognitive testing was correlated with cognitive performance scores (p < .008 for three of six measures). CONCLUSIONS: This shows that natural language processing can be used to efficiently and transparently score clinical notes in terms of cognitive and psychopathologic domains.
Detailed characterization of the cell types in the human brain requires scalable experimental approaches to examine multiple aspects of the molecular state of individual cells, as well as computational integration of the data to produce unified cell-state annotations. Here we report improved high-throughput methods for single-nucleus droplet-based sequencing (snDrop-seq) and single-cell transposome hypersensitive site sequencing (scTHS-seq). We used each method to acquire nuclear transcriptomic and DNA accessibility maps for >60,000 single cells from human adult visual cortex, frontal cortex, and cerebellum. Integration of these data revealed regulatory elements and transcription factors that underlie cell-type distinctions, providing a basis for the study of complex processes in the brain, such as genetic programs that coordinate adult remyelination. We also mapped disease-associated risk variants to specific cellular populations, which provided insights into normal and pathogenic cellular processes in the human brain. This integrative multi-omics approach permits more detailed single-cell interrogation of complex organs and tissues.
Motivated by applications in genomics, we consider in this paper global and multiple testing for the comparison of two high-dimensional linear regression models. A procedure for testing the equality of the two regression vectors globally is proposed and shown to be particularly powerful against sparse alternatives. We then introduce a multiple testing procedure for identifying unequal coordinates while controlling the false discovery rate and false discovery proportion. Theoretical justifications are provided to guarantee the validity of the proposed tests, and optimality results are established under sparsity assumptions on the regression coefficients. The proposed testing procedures are easy to implement. Numerical properties of the procedures are investigated through simulation and data analysis. The results show that the proposed tests maintain the desired error rates under the null and have good power under the alternative at moderate sample sizes. The procedures are applied to the Framingham Offspring study to investigate the interactions between smoking and cardiovascular-related genetic mutations important for an inflammation marker.
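To make the idea of false discovery rate control concrete, here is a minimal sketch of the classical Benjamini-Hochberg step-up rule applied to one p-value per coordinate. This is a generic stand-in, not the procedure proposed in the paper, whose test statistics and thresholds are not described in the abstract. The function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
from typing import List


def benjamini_hochberg(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Return one reject flag per hypothesis using the BH step-up rule."""
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the hypotheses, ordered by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= k * alpha / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values, one per regression coordinate (not taken from the paper).
example_p = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]
flags = benjamini_hochberg(example_p, alpha=0.05)
print(flags)  # [True, True, False, False, False, False]
```

With six hypotheses and alpha = 0.05, the rank thresholds are 0.05k/6, so only the two smallest p-values fall under their thresholds and are flagged as unequal coordinates.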