
General

Integrated knowledge mining, genome-scale modeling, and machine learning for predicting Yarrowia lipolytica bioproduction.

In Metabolic engineering

Predicting bioproduction titers from microbial hosts has been challenging due to complex interactions between microbial regulatory networks, stress responses, and suboptimal cultivation conditions. This study integrated knowledge mining, feature extraction, genome-scale modeling (GSM), and machine learning (ML) to develop a model for predicting Yarrowia lipolytica chemical titers (e.g., organic acids and terpenoids). First, Y. lipolytica production data, including cultivation conditions, genetic engineering strategies, and product information, were manually collected from the literature (∼100 papers) and stored as either numerical (e.g., substrate concentrations) or categorical (e.g., bioreactor modes) variables. For each case recorded, central pathway fluxes were estimated using GSMs and flux balance analysis (FBA) to provide metabolic features. Second, an ML ensemble learner was trained to predict strain production titers. Accurate predictions were obtained for instances with production titers >1 g/L (R² = 0.92). However, the model had reduced predictability for low-performance strains (0.01-1 g/L, R² = 0.36) due to biosynthesis bottlenecks not captured in the features. Feature ranking indicated that the FBA fluxes, the number of enzyme steps, the substrate inputs, and thermodynamic barriers (i.e., Gibbs free energy of reaction) were the most influential factors. Third, the model was evaluated on other oleaginous yeasts, and the results indicated that some hosts share conserved features that could potentially be exploited by transfer learning. The platform was also designed to assist computational strain design tools (such as OptKnock) in screening genetic targets for improved microbial production in light of experimental conditions.

Czajka Jeffrey J, Oyetunde Tolutola, Tang Yinjie J


Computational strain design, FBA, Machine learning, Pathway bottlenecks, Yarrowia lipolytica
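
The entry above reports R² separately for high-titer (>1 g/L) and low-titer (0.01-1 g/L) instances. A minimal Python sketch of that kind of stratified evaluation is given here. The function names, the default thresholds, and the choice to compute R² on raw rather than log-transformed titers are assumptions made for illustration; this is not the authors' code, and it does not reproduce their data.

```python
from typing import Dict, Sequence


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


def r_squared_by_titer_range(
    observed: Sequence[float],
    predicted: Sequence[float],
    low: float = 0.01,   # assumed lower bound of the low-titer group (g/L)
    high: float = 1.0,   # assumed boundary between the two groups (g/L)
) -> Dict[str, float]:
    """Split instances by observed titer and score each group separately."""
    pairs = list(zip(observed, predicted))
    high_group = [(o, p) for o, p in pairs if o > high]
    low_group = [(o, p) for o, p in pairs if low <= o <= high]
    return {
        "high_titer": r_squared(*zip(*high_group)),
        "low_titer": r_squared(*zip(*low_group)),
    }
```

Scoring the two ranges separately, as the abstract does, keeps a few large titers from dominating a single pooled R² and hiding weak performance on low-titer strains.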

General

ncRDense: A novel computational approach for classification of non-coding RNA family by deep learning.

In Genomics

With the rapidly growing importance of biological research, non-coding RNAs (ncRNAs) are attracting more attention in biology and bioinformatics. They play vital roles in biological processes such as transcription and translation. Classification of ncRNAs is essential to our understanding of disease mechanisms and treatment design. Many approaches to ncRNA classification have been developed, several of which use machine learning and deep learning. In this paper, we construct a novel deep learning-based architecture, ncRDense, to effectively classify and distinguish ncRNA families. In a comparative study, our model produces results comparable to those of existing state-of-the-art methods. Finally, we built a freely accessible web server for the ncRDense tool, which is available at

Chantsalnyam Tuvshinbayar, Siraj Arslan, Tayara Hilal, Chong Kil To


Classification, Deep learning, Densenet, Feature encoding, Non-coding RNA

Internal Medicine

The effect of cardiac rhythm on artificial intelligence-enabled ECG evaluation of left ventricular ejection fraction prediction in cardiac intensive care unit patients.

In International journal of cardiology ; h5-index 68.0

The presence of left ventricular systolic dysfunction (LVSD) alters clinical management and prognosis in most acute and chronic cardiovascular conditions. While transthoracic echocardiography (TTE) remains the most common diagnostic tool to screen for LVSD, it is operator-dependent, time-consuming, effort-intensive, and relatively expensive. Recent work has demonstrated the ability of an artificial intelligence-augmented ECG (AI-ECG) model to accurately predict LVSD in cardiac intensive care unit (CICU) patients. We demonstrate that the AI-ECG algorithm can maintain its performance in these patients with and without atrial fibrillation (AF) despite their clinical differences. An AI-ECG algorithm can serve as a non-invasive, inexpensive, and rapid screening tool for early detection of LVSD in resource-limited settings, and can potentially expedite clinical decision making and guideline-directed therapies in the acute care setting.

Kashou Anthony H, Noseworthy Peter A, Lopez-Jimenez Francisco, Attia Zachi I, Kapa Suraj, Friedman Paul A, Jentzer Jacob C


Artificial intelligence, Atrial fibrillation, Cardiac intensive care unit, Echocardiography, Electrocardiogram, Left ventricular systolic dysfunction

General

High-Throughput Computational Analysis of Biofilm Formation from Time-Lapse Microscopy.

In Current protocols

Candida albicans biofilm formation in the presence of drugs can be examined through time-lapse microscopy. In many cases, the images are used qualitatively, which limits their utility for hypothesis testing. We employed a machine-learning algorithm implemented in the Orbit Image Analysis program to detect the percent area covered by cells from each image. This is combined with custom R scripts to determine the growth rate, growth asymptote, and time to reach the asymptote as quantitative proxies for biofilm formation. We describe step-by-step protocols that go from sample preparation for time-lapse microscopy through image analysis parameterization and visualization of the model fit. © 2021 Wiley Periodicals LLC.

Basic Protocol 1: Sample preparation
Basic Protocol 2: Time-lapse microscopy: Evos protocol
Basic Protocol 3: Batch file renaming
Basic Protocol 4: Machine learning analysis of Evos images with Orbit
Basic Protocol 5: Parametrization of Orbit output in R
Basic Protocol 6: Visualization of logistic fits in R

Salama Ola E, Gerstein Aleeza C


Candida, R, drug response, image analysis, orbit
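
The entry above summarizes each time-lapse series by a growth rate, a growth asymptote, and the time to reach the asymptote, using logistic fits. The published protocols use custom R scripts; what follows is only a rough Python sketch of the same idea under stated assumptions. It assumes a three-parameter logistic curve, fits it with a simple grid search over rate and midpoint (solving for the asymptote in closed form at each grid point), and defines "time to reach the asymptote" as the time to reach 95% of it. The function names, the grid, and the 95% threshold are all hypothetical choices, not taken from the protocols.

```python
import math
from typing import Sequence, Tuple


def logistic(t: float, asymptote: float, rate: float, midpoint: float) -> float:
    """Three-parameter logistic curve: percent area covered at time t."""
    return asymptote / (1.0 + math.exp(-rate * (t - midpoint)))


def fit_logistic(
    times: Sequence[float],
    areas: Sequence[float],
    rates: Sequence[float],
    midpoints: Sequence[float],
) -> Tuple[float, float, float]:
    """Least-squares fit by grid search; returns (asymptote, rate, midpoint).

    For a fixed rate and midpoint the model is linear in the asymptote,
    so the best asymptote has the closed form sum(y*s) / sum(s*s).
    """
    best = None
    for rate in rates:
        for midpoint in midpoints:
            shape = [logistic(t, 1.0, rate, midpoint) for t in times]
            denom = sum(s * s for s in shape)
            if denom == 0:
                continue  # curve is numerically flat at zero; skip this point
            asymptote = sum(a * s for a, s in zip(areas, shape)) / denom
            sse = sum((a - asymptote * s) ** 2 for a, s in zip(areas, shape))
            if best is None or sse < best[0]:
                best = (sse, asymptote, rate, midpoint)
    if best is None:
        raise ValueError("no usable grid point")
    return best[1], best[2], best[3]


def time_to_fraction(rate: float, midpoint: float, fraction: float = 0.95) -> float:
    """Time at which the curve reaches `fraction` of its asymptote."""
    return midpoint + math.log(fraction / (1.0 - fraction)) / rate
```

A grid search is crude next to a nonlinear least-squares routine, but it needs no starting values and makes the role of each parameter easy to see. The time-to-fraction formula follows from solving the logistic equation for t.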

Dermatology

Do AI models recognise rare, aggressive skin cancers? An assessment of a direct-to-consumer app in the diagnosis of Merkel cell carcinoma and amelanotic melanoma.

In Journal of the European Academy of Dermatology and Venereology : JEADV

Machine learning (ML) models for skin cancer recognition have reported comparable or superior performance to dermatologists in controlled or restricted settings [1]. One restriction is the number of disease classes. When trained models are deployed into real-world contexts, an important challenge will be the detection of rare but aggressive skin cancers that are not well-covered in training datasets, such as Merkel cell carcinoma (MCC) and amelanotic melanoma.

Steele L, Velazquez-Pimentel D, Thomas B R


Amelanotic, Artificial Intelligence, Machine Learning, Melanoma, Mobile Applications, Reproducibility of Results, Skin Neoplasms

General

BERM: a Belowground Ecosystem Resiliency Model for estimating Spartina alterniflora belowground biomass.

In The New phytologist

Spatiotemporal patterns of Spartina alterniflora belowground biomass (BGB) are important for evaluating salt marsh resiliency. To estimate these patterns, we created BERM (the Belowground Ecosystem Resiliency Model), which estimates monthly BGB (30-m spatial resolution) from freely available data such as Landsat-8 and Daymet climate summaries. Our modeling framework relied on extreme gradient boosting and used field observations from four Georgia salt marshes as ground-truth data. Model predictors included estimated tidal inundation, elevation, leaf area index, foliar nitrogen, chlorophyll, surface temperature, phenology, and climate data. The final model included thirty-three variables, and the most important variables were elevation, vapor pressure from the previous four months, NDVI from the previous five months, and inundation. The root mean squared error for BGB on testing data was 313 g m⁻² (11% of the field data range), and explained variance (R²) was 0.62-0.77. Testing data results were unbiased across BGB values and were positively correlated with ground-truth data across all sites and years (r = 0.56-0.82 and 0.45-0.95, respectively). BERM can estimate BGB within S. alterniflora salt marshes where environmental parameters are within the training data range, and can be readily extended through a reproducible workflow. This provides a powerful approach for evaluating spatiotemporal BGB and associated ecosystem function.

O’Connell Jessica L, Mishra Deepak R, Alber Merryl, Byrd Kristin B


Sporobolus alterniflorus, Georgia Coastal Ecosystems LTER, PhenoCam, machine learning, phenology, productivity, tidal salt marsh, wetland
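
The BERM entry above reports its testing error both as an absolute root mean squared error (313 g m⁻²) and as a share of the field data range (11%). Taken together, those two figures suggest a field range on the order of 2,800 g m⁻² (313 / 0.11), though the rounded percentage makes this only approximate. The Python sketch below shows how such a range-normalized error could be computed. The function names are hypothetical, and dividing by the range of the observed values is an assumption about the normalization, not a detail confirmed by the abstract.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def range_normalized_rmse(
    observed: Sequence[float], predicted: Sequence[float]
) -> float:
    """RMSE divided by the range (max - min) of the observed values."""
    value_range = max(observed) - min(observed)
    if value_range == 0:
        raise ValueError("observed values have zero range")
    return rmse(observed, predicted) / value_range
```

Expressing the error as a fraction of the observed range makes it easier to compare across sites or variables measured on different scales.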