General General

Reducing noise and stutter in short tandem repeat loci with unique molecular identifiers.

In Forensic science international. Genetics

Unique molecular identifiers (UMIs) are a promising approach to contend with errors generated during PCR and massively parallel sequencing (MPS). With UMI technology, random molecular barcodes are ligated to template DNA molecules prior to PCR, allowing PCR and sequencing errors to be tracked and corrected bioinformatically. UMIs have the potential to be particularly informative for the interpretation of short tandem repeats (STRs). Traditional MPS approaches may simply lead to the observation of alleles that are consistent with the hypothesis of stutter, whereas with UMIs, stutter products may be bioinformatically re-associated with their parental alleles and subsequently removed. Herein, a bioinformatics pipeline named strumi is described that is designed for the analysis of STRs that are tagged with UMIs. Unlike other tools, strumi is an alignment-free, machine learning-driven algorithm that clusters individual MPS reads into UMI families, infers consensus super-reads that represent each family, and provides an estimate of the resulting haplotype's accuracy. Super-reads, in turn, approximate independent measurements not of the PCR products, but of the original template molecules, both in terms of quantity and sequence identity. Provisional assessments show that naïve threshold-based approaches generate super-reads that are accurate (∼97 % haplotype accuracy, compared to ∼78 % when UMIs are not used), and the application of a more nuanced machine learning approach increases the accuracy to ∼99.5 %, depending on the level of certainty desired. With these features, UMIs may greatly simplify probabilistic genotyping systems and reduce uncertainty. However, the ability to interpret alleles at trace levels also permits the interpretation, characterization and quantification of contamination as well as somatic variation (including somatic stutter), which may present newfound challenges.
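The idea of collapsing reads into UMI families and calling one consensus per family can be illustrated with a minimal sketch. This is not the strumi algorithm (which is alignment-free and machine learning-driven); it is a naïve threshold-based stand-in, and the function name `umi_consensus`, its parameters, and the toy reads are all hypothetical. It assumes exact UMI matching, so errors within the UMI itself are not handled.

```python
from collections import Counter, defaultdict


def umi_consensus(reads, min_family_size=2, min_agreement=0.5):
    """Group (umi, sequence) reads by exact UMI and call a majority haplotype.

    Hypothetical helper for illustration only.
    Returns {umi: (consensus_sequence, support_fraction)}.
    """
    families = defaultdict(list)
    for umi, seq in reads:
        families[umi].append(seq)

    consensus = {}
    for umi, seqs in families.items():
        if len(seqs) < min_family_size:
            continue  # a singleton family cannot outvote an error
        top_seq, top_count = Counter(seqs).most_common(1)[0]
        support = top_count / len(seqs)
        # Keep the family only if one haplotype clearly dominates it.
        if support > min_agreement:
            consensus[umi] = (top_seq, support)
    return consensus


# Toy data: a TCTA repeat locus; the 3-repeat read in family AACGT
# plays the role of a stutter product of the 4-repeat template.
reads = [
    ("AACGT", "TCTA" * 4),
    ("AACGT", "TCTA" * 4),
    ("AACGT", "TCTA" * 3),
    ("GGTCA", "TCTA" * 5),
    ("GGTCA", "TCTA" * 5),
    ("TTTTT", "TCTA" * 3),  # singleton family, dropped
]
result = umi_consensus(reads)
for umi, (seq, support) in sorted(result.items()):
    print(umi, len(seq) // 4, "repeats", round(support, 2))
```

Voting on the whole read rather than per position is a deliberate simplification here: stutter changes the repeat count, so reads within a family can differ in length and do not line up column by column.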

Woerner August E, Mandape Sammed, King Jonathan L, Muenzler Melissa, Crysup Benjamin, Budowle Bruce


Molecular barcodes, Probabilistic genotyping, Stutter, Unique molecular identifiers

Surgery Surgery

Identification of candidate genes encoding tumor-specific neoantigens in early- and late-stage colon adenocarcinoma.

In Aging ; h5-index 49.0

Colon adenocarcinoma (COAD) is one of the most common gastrointestinal malignant tumors and is characterized by a high mortality rate. Here, we integrated whole-exome and RNA sequencing data from The Cancer Genome Atlas and investigated the mutational spectra of COAD-overexpressed genes to define clinically relevant diagnostic/prognostic signatures and to unmask functional relationships with both tumor-infiltrating immune cells and regulatory miRNAs. We identified 24 recurrently mutated genes (frequency > 5%) encoding putative COAD-specific neoantigens. Five of them (NEB, DNAH2, ABCA12, CENPF and CELSR1) had not been previously reported as COAD biomarkers. Through machine learning-based feature selection, four early-stage-related (COL11A1, TG, SOX9, and DNAH2) and four late-stage-related (COL11A1, SOX9, TG and BRCA2) candidate neoantigen-encoding genes were selected as diagnostic signatures. They respectively showed 100% and 97% accuracy in predicting early- and late-stage patients, and an 8-gene signature had excellent prognostic performance predicting disease-free survival (DFS) in COAD patients. We also found significant correlations between the 24 candidate neoantigen genes and the abundance and/or activation status of 22 tumor-infiltrating immune cell types and 56 regulatory miRNAs. Our novel neoantigen-based signatures may improve diagnostic and prognostic accuracy and help design targeted immunotherapies for COAD treatment.

Wang Chong, Xue Wenhua, Zhang Haohao, Fu Yang


colon adenocarcinoma, machine learning, neoantigens, recurrent mutations, sequencing

General General

Quantifying the Alignment of Graph and Features in Deep Learning.

In IEEE transactions on neural networks and learning systems

We show that the classification performance of graph convolutional networks (GCNs) is related to the alignment between features, graph, and ground truth, which we quantify using a subspace alignment measure (SAM) corresponding to the Frobenius norm of the matrix of pairwise chordal distances between three subspaces associated with features, graph, and ground truth. The proposed measure is based on the principal angles between subspaces and has both spectral and geometrical interpretations. We showcase the relationship between the SAM and the classification performance through the study of limiting cases of GCNs and systematic randomizations of both features and graph structure applied to a constructive example and several examples of citation networks of different origins. The analysis also reveals the relative importance of the graph and features for classification purposes.
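The measure described above can be written out under one common definition of the chordal distance. The symbols below are assumptions introduced for illustration and do not come from the abstract: $\mathcal{X}$, $\mathcal{A}$ and $\mathcal{Y}$ denote the subspaces associated with features, graph and ground truth, and $\theta_1, \dots, \theta_k$ are the principal angles between two subspaces. How the paper extracts these subspaces and whether it normalizes the distances is not stated here.

```latex
% Principal angles: cos(theta_j) are the singular values of Q_U^T Q_V,
% where Q_U and Q_V are orthonormal bases of the subspaces U and V.
\[
  d(\mathcal{U}, \mathcal{V}) = \sqrt{\sum_{j=1}^{k} \sin^{2}\theta_j}
\]

% Matrix of pairwise chordal distances between the three subspaces.
\[
  D =
  \begin{pmatrix}
    0 & d(\mathcal{X},\mathcal{A}) & d(\mathcal{X},\mathcal{Y}) \\
    d(\mathcal{X},\mathcal{A}) & 0 & d(\mathcal{A},\mathcal{Y}) \\
    d(\mathcal{X},\mathcal{Y}) & d(\mathcal{A},\mathcal{Y}) & 0
  \end{pmatrix}
\]

% SAM as the Frobenius norm of D; each off-diagonal entry appears twice.
\[
  \mathrm{SAM} = \lVert D \rVert_{F}
  = \sqrt{2\left( d(\mathcal{X},\mathcal{A})^{2}
                + d(\mathcal{X},\mathcal{Y})^{2}
                + d(\mathcal{A},\mathcal{Y})^{2} \right)}
\]
```

Under this reading, a smaller SAM means the three subspaces are closer to one another, that is, better aligned.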

Qian Yifan, Expert Paul, Rieu Tom, Panzarasa Pietro, Barahona Mauricio


General General

A rapid whisker-based decision underlying skilled locomotion in mice.

In eLife

Skilled motor behavior requires rapidly integrating external sensory input with information about internal state to decide which movements to make next. Using machine learning approaches for high-resolution kinematic analysis, we uncover the logic of a rapid decision underlying sensory-guided locomotion in mice. After detecting obstacles with their whiskers, mice select distinct kinematic strategies depending on a whisker-derived estimate of obstacle location together with the position and velocity of their body. Although mice rely on whiskers for obstacle avoidance, lesions of primary whisker sensory cortex had minimal impact. While motor cortex manipulations affected the execution of the chosen strategy, the decision-making process remained largely intact. These results highlight the potential of machine learning for reductionist analysis of naturalistic behaviors and provide a case in which subcortical brain structures appear sufficient for mediating a relatively sophisticated sensorimotor decision.

Warren Richard A, Zhang Qianyun, Hoffman Judah R, Li Edward Y, Hong Y Kate, Bruno Randy M, Sawtell Nathaniel B


barrel cortex, decision-making, locomotion, motor cortex, mouse, neuroscience, whiskers

Surgery Surgery

A Novel Scoring System to Predict Length of Stay After Anterior Cervical Discectomy and Fusion.

In The Journal of the American Academy of Orthopaedic Surgeons

INTRODUCTION : The movement toward reducing healthcare expenditures has led to an increased volume of outpatient anterior cervical diskectomy and fusions (ACDFs). Appropriateness for outpatient surgery can be gauged based on the duration of recovery each patient will likely need.

METHODS : Patients undergoing 1- or 2-level ACDFs were retrospectively identified at a single Level I spine surgery referral institution. Length of stay (LOS) was categorized binarily as either less than two midnights or two or more midnights. The data were split into training (80%) and test (20%) sets. Two multivariate regressions and three machine learning models were developed to predict the probability of LOS ≥ 2 based on preoperative patient characteristics. Using each model, coefficients were computed for each risk factor based on the training data set and used to create a calculable ACDF Predictive Scoring System (APSS). Performance of each APSS was then evaluated on the subsample of the data set withheld from training. Decision curve analysis was performed to evaluate benefit across probability thresholds for the best performing model.

RESULTS : In the final analysis, 1,516 patients had a LOS <2 and 643 had a LOS ≥2. Patient characteristics used for predictive modeling were American Society of Anesthesiologists score, age, body mass index, sex, procedure type, history of chronic pulmonary disease, depression, diabetes, hypertension, and hypothyroidism. The best performing APSS was modeled after a lasso regression. When applied to the withheld test data set, the APSS-lasso had an area under the curve from the receiver operating characteristic curve of 0.68, with a specificity of 0.78 and a sensitivity of 0.49. The calculated APSS scores ranged between 0 and 45 and corresponded to a probability of LOS ≥2 between 4% and 97%.

CONCLUSION : Using classic statistics and machine learning, this scoring system provides a platform for stratifying patients undergoing ACDF into an inpatient or outpatient surgical setting.
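The sensitivity and specificity quoted for the APSS-lasso are standard confusion-matrix quantities. The sketch below shows how they are obtained from predicted probabilities and a decision threshold. The function name, the 0.5 threshold and the toy data are hypothetical and are not taken from the study; the positive class stands for LOS ≥ 2.

```python
def sensitivity_specificity(y_true, y_prob, threshold=0.5):
    """Return (sensitivity, specificity) for binary labels.

    y_true: 1 for LOS >= 2 midnights, 0 for LOS < 2 (illustrative coding).
    y_prob: predicted probability of the positive class.
    """
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # share of long stays that were flagged
    specificity = tn / (tn + fp)  # share of short stays correctly cleared
    return sensitivity, specificity


# Made-up example: 4 long-stay and 6 short-stay patients.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.7, 0.4, 0.2, 0.1, 0.3, 0.2, 0.4, 0.6, 0.1]
sens, spec = sensitivity_specificity(y_true, y_prob)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

Lowering the threshold trades specificity for sensitivity, which is the kind of trade-off that decision curve analysis examines across probability thresholds.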

Russo Glenn S, Canseco Jose A, Chang Michael, Levy Hannah A, Nicholson Kristen, Karamian Brian A, Mangan John, Fang Taolin, Vaccaro Alexander R, Kepler Christopher K


General General

Convolutional Neural Networks for Semantic Segmentation as a Tool for Multiclass Face Analysis in Thermal Infrared.

In Journal of nondestructive evaluation

Convolutional neural networks were used for multiclass segmentation in thermal infrared face analysis. The principle is based on existing image-to-image translation approaches, where each pixel in an image is assigned a class label. We show that established network architectures can be trained for the task of multiclass face analysis in thermal infrared. The class annotations created for this purpose consisted of pixel-accurate locations of the different face classes. The trained network can then segment a previously unseen infrared face image into the defined classes. Furthermore, face classification during live image acquisition is demonstrated, so that the relative temperature of the learned areas can be displayed in real time. This allows a pixel-accurate temperature analysis of the face, e.g., for detecting infections such as COVID-19. At the same time, our approach offers the advantage of concentrating on the relevant areas of the face. Areas of the face that are irrelevant for the relative temperature calculation, as well as accessories such as glasses, masks and jewelry, are not considered. A custom database was created to train the network. The results were quantitatively evaluated with the intersection over union (IoU) metric. The methodology shown can be transferred to similar problems in quantitative thermography, such as materials characterization or quality control in production.
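For readers unfamiliar with the IoU metric, a minimal per-class computation on integer label maps might look like the following. The function name and the tiny 3×3 label maps are made up for illustration; the abstract does not describe the authors' evaluation code or class definitions.

```python
def per_class_iou(pred, target, num_classes):
    """Compute intersection over union for each class.

    pred, target: 2-D lists of integer class labels with the same shape.
    Returns a list of IoU values; None for classes absent from both maps.
    """
    inter = [0] * num_classes
    union = [0] * num_classes
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                inter[p] += 1
                union[p] += 1
            else:
                # A mismatched pixel enlarges the union of both classes.
                union[p] += 1
                union[t] += 1
    return [inter[c] / union[c] if union[c] else None
            for c in range(num_classes)]


# Toy label maps with three hypothetical classes (0, 1, 2).
target = [[0, 0, 1],
          [0, 1, 1],
          [2, 2, 1]]
pred = [[0, 0, 1],
        [0, 0, 1],
        [2, 1, 1]]
ious = per_class_iou(pred, target, num_classes=3)
print(ious)  # [0.75, 0.6, 0.5]
```

Averaging the per-class values gives the mean IoU often reported for multiclass segmentation.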

Müller David, Ehlen Andreas, Valeske Bernd


Artificial intelligence, Health monitoring, Intelligent sensors, Machine learning, Thermography