General

DeepProg: an ensemble of deep-learning and machine-learning models for prognosis prediction using multi-omics data.

In Genome medicine ; h5-index 64.0

Multi-omics data are good resources for prognosis and survival prediction; however, these are difficult to integrate computationally. We introduce DeepProg, a novel ensemble framework of deep-learning and machine-learning approaches that robustly predicts patient survival subtypes using multi-omics data. It identifies two optimal survival subtypes in most cancers and yields significantly better risk-stratification than other multi-omics integration methods. DeepProg is highly predictive, exemplified by two liver cancer (C-index 0.73-0.80) and five breast cancer datasets (C-index 0.68-0.73). Pan-cancer analysis associates common genomic signatures in poor survival subtypes with extracellular matrix modeling, immune deregulation, and mitosis processes. DeepProg is freely available at

Poirion Olivier B, Jing Zheng, Chaudhary Kumardeep, Huang Sijia, Garmire Lana X


Cancer, Deep learning, Ensemble learning, Machine learning, Prognosis, Survival, multi-omics
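
The C-index values quoted in the DeepProg abstract measure how well predicted risk orders patients by survival time. The following is a minimal sketch of Harrell's concordance index, not the DeepProg implementation; the function name and the toy data are assumptions made for illustration, and pairs with tied survival times are simply skipped.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    times: observed follow-up times
    events: 1 if the event (death) was observed, 0 if censored
    risk_scores: higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the times differ and the
        # shorter time ends in an observed event, not censoring.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # higher risk failed first: concordant
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort (illustrative values only).
toy_times = [5, 8, 12, 20, 25]
toy_events = [1, 1, 0, 1, 0]
toy_risk = [0.9, 0.4, 0.6, 0.3, 0.1]
toy_c_index = concordance_index(toy_times, toy_events, toy_risk)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the 0.68-0.80 figures above should be read.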

General

Highly accurate protein structure prediction with AlphaFold.

In Nature ; h5-index 368.0

Proteins are essential to life, and understanding their structure can facilitate a mechanistic understanding of their function. Through an enormous experimental effort [1-4], the structures of around 100,000 unique proteins have been determined [5], but this represents a small fraction of the billions of known protein sequences [6,7]. Structural coverage is bottlenecked by the months to years of painstaking effort required to determine a single protein structure. Accurate computational approaches are needed to address this gap and to enable large-scale structural bioinformatics. Predicting the 3-D structure that a protein will adopt based solely on its amino acid sequence, the structure prediction component of the 'protein folding problem' [8], has been an important open research problem for more than 50 years [9]. Despite recent progress [10-14], existing methods fall far short of atomic accuracy, especially when no homologous structure is available. Here we provide the first computational method that can regularly predict protein structures with atomic accuracy even where no similar structure is known. We validated an entirely redesigned version of our neural network-based model, AlphaFold, in the challenging 14th Critical Assessment of protein Structure Prediction (CASP14) [15], demonstrating accuracy competitive with experiment in a majority of cases and greatly outperforming other methods. Underpinning the latest version of AlphaFold is a novel machine learning approach that incorporates physical and biological knowledge about protein structure, leveraging multi-sequence alignments, into the design of the deep learning algorithm.

Jumper John, Evans Richard, Pritzel Alexander, Green Tim, Figurnov Michael, Ronneberger Olaf, Tunyasuvunakool Kathryn, Bates Russ, Žídek Augustin, Potapenko Anna, Bridgland Alex, Meyer Clemens, Kohl Simon A A, Ballard Andrew J, Cowie Andrew, Romera-Paredes Bernardino, Nikolov Stanislav, Jain Rishub, Adler Jonas, Back Trevor, Petersen Stig, Reiman David, Clancy Ellen, Zielinski Michal, Steinegger Martin, Pacholska Michalina, Berghammer Tamas, Bodenstein Sebastian, Silver David, Vinyals Oriol, Senior Andrew W, Kavukcuoglu Koray, Kohli Pushmeet, Hassabis Demis


Ophthalmology

Clinician-driven artificial intelligence in ophthalmology: resources enabling democratization.

In Current opinion in ophthalmology

PURPOSE OF REVIEW: This article aims to discuss the current state of resources enabling the democratization of artificial intelligence (AI) in ophthalmology.

RECENT FINDINGS: Open datasets, efficient labeling techniques, code-free automated machine learning (AutoML) and cloud-based platforms for deployment are resources that enable clinicians with scarce resources to drive their own AI projects.

SUMMARY: Clinicians are the use-case experts who are best suited to drive AI projects tackling patient-relevant outcome measures. Taken together, open datasets, efficient labeling techniques, code-free AutoML and cloud platforms break the barriers for clinician-driven AI. As AI becomes increasingly democratized through such tools, clinicians and patients stand to benefit greatly.

Korot Edward, Gonçalves Mariana B, Khan Saad M, Struyven Robbert, Wagner Siegfried K, Keane Pearse A


Ophthalmology

Artificial intelligence-based predictions in neovascular age-related macular degeneration.

In Current opinion in ophthalmology

PURPOSE OF REVIEW: Predicting treatment response and optimizing treatment regimen in patients with neovascular age-related macular degeneration (nAMD) remains challenging. Artificial intelligence-based tools have the potential to increase confidence in clinical development of new therapeutics, facilitate individual prognostic predictions, and ultimately inform treatment decisions in clinical practice.

RECENT FINDINGS: To date, most advances in applying artificial intelligence to nAMD have focused on facilitating image analysis, particularly for automated segmentation, extraction, and quantification of imaging-based features from optical coherence tomography (OCT) images. No studies in our literature search evaluated whether artificial intelligence could predict the treatment regimen required for an optimal visual response for an individual patient. Challenges identified for developing artificial intelligence-based models for nAMD include the limited number of large datasets with high-quality OCT data, limiting the patient populations included in model development; lack of counterfactual data to inform how individual patients may have fared with an alternative treatment strategy; and absence of OCT data standards, impairing the development of models usable across devices.

SUMMARY: Artificial intelligence has the potential to enable powerful prognostic tools for a complex nAMD treatment landscape; however, additional work remains before these tools are applicable to informing treatment decisions for nAMD in clinical practice.

Ferrara Daniela, Newton Elizabeth M, Lee Aaron Y


General

Ecotoxicological read-across models for predicting acute toxicity of freshly dispersed versus medium-aged NMs to Daphnia magna.

In Chemosphere

Nanoinformatics models to predict the toxicity/ecotoxicity of nanomaterials (NMs) are urgently needed to support commercialization of nanotechnologies and allow grouping of NMs based on their physico-chemical and/or (eco)toxicological properties, to facilitate read-across of knowledge from data-rich NMs to data-poor ones. Here we present the first ecotoxicological read-across models for predicting NM ecotoxicity, which were developed in accordance with ECHA's recommended strategy for grouping of NMs as a means to explore in silico the effects of a panel of freshly dispersed versus environmentally aged (in various media) Ag and TiO2 NMs on the freshwater zooplankton Daphnia magna, a keystone species used in regulatory testing. The dataset used to develop the models consisted of dose-response data from 11 NMs (5 TiO2 NMs of identical cores with different coatings, and 6 Ag NMs with different capping agents/coatings), each dispersed in three different media (a high hardness medium (HH Combo) and two representative river waters containing different amounts of natural organic matter (NOM) and having different ionic strengths), generated in accordance with the OECD 202 immobilization test. The experimental hypotheses being tested were (1) that the presence of NOM in the medium would reduce the toxicity of the NMs by forming an ecological corona, and (2) that environmental ageing of NMs reduces their toxicity compared to the freshly dispersed NMs irrespective of the medium composition (salt only or NOM-containing). As per the ECHA guidance, the NMs were grouped into two categories, freshly dispersed and 2-year-aged, and explored in silico to identify the most important features driving the toxicity in each group. The final predictive models have been validated according to the OECD criteria, and a QSAR model reporting format (QMRF) report is included in the supplementary information to support adoption of the models for regulatory purposes.

Varsou Dimitra-Danai, Ellis Laura-Jayne A, Afantitis Antreas, Melagraki Georgia, Lynch Iseult


Ecological corona, Machine learning, Nanoinformatics, Nanomaterials ageing, Nanosafety, Read-across

General

Predicting suicide attempts and suicide deaths among adolescents following outpatient visits.

In Journal of affective disorders ; h5-index 79.0

BACKGROUND: Few studies report on machine learning models for suicide risk prediction in adolescents and their utility in identifying those in need of further evaluation. This study examined whether a model trained and validated using data from all age groups works as well for adolescents or whether it could be improved.

METHODS: We used healthcare data for 1.4 million specialty mental health and primary care outpatient visits among 256,823 adolescents across 7 health systems. The prediction target was 90-day risk of suicide attempt following a visit. We used logistic regression with least absolute shrinkage and selection operator (LASSO) and generalized estimating equations (GEE) to predict risk. We compared performance of three models: an existing model, a recalibrated version of that model, and a newly-learned model. Models were compared using area under the receiver operating characteristic curve (AUC), sensitivity, specificity, positive predictive value and negative predictive value.

RESULTS: For specialty mental health visits, the AUC of the existing model estimated in adolescents alone (0.796; [0.789, 0.802]) was not significantly different from the AUC of the recalibrated existing model (0.794; [0.787, 0.80]) or the newly-learned model (0.795; [0.789, 0.801]). The AUCs following primary care visits were also similar: existing (0.855; [0.844, 0.866]), recalibrated (0.85; [0.839, 0.862]), newly-learned (0.842; [0.829, 0.854]).

LIMITATIONS: The models did not incorporate non-healthcare risk factors. The models relied on ICD-9-CM codes for diagnoses and outcome measurement.

CONCLUSIONS: Prediction models already in operational use by health systems can be reliably employed for identifying adolescents in need of further evaluation.

Penfold Robert B, Johnson Eric, Shortreed Susan M, Ziebell Rebecca A, Lynch Frances L, Clarke Greg N, Coleman Karen J, Waitzfelder Beth E, Beck Arne L, Rossom Rebecca C, Ahmedani Brian K, Simon Gregory E


Adolescents, Machine learning, Suicide
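
The comparison metrics named in the abstract above (AUC, sensitivity, specificity, positive and negative predictive value) can be computed from predicted risk scores and observed outcomes. The sketch below is illustrative only and is not the study's code; the function names, the 0.5 threshold and the toy data are assumptions. The AUC is computed as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def _ratio(numerator, denominator):
    # Return NaN instead of raising when a metric is undefined.
    return numerator / denominator if denominator else float("nan")


def threshold_metrics(labels, scores, threshold):
    """Sensitivity, specificity, PPV and NPV at a given risk threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        flagged = s >= threshold
        if flagged and y == 1:
            tp += 1
        elif flagged and y == 0:
            fp += 1
        elif not flagged and y == 0:
            tn += 1
        else:
            fn += 1
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


# Hypothetical toy data: 1 = event within the follow-up window.
toy_labels = [1, 0, 1, 0, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.6, 0.2, 0.1]
toy_auc = auc(toy_labels, toy_scores)
toy_metrics = threshold_metrics(toy_labels, toy_scores, threshold=0.5)
```

Unlike the AUC, the four threshold-based metrics depend on the chosen cut-off, and the predictive values also depend on how common the outcome is in the population being scored.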