General

Systems biology approaches integrated with artificial intelligence for optimized food-focused metabolic engineering.

In Metabolic engineering communications

Metabolic engineering aims to maximize the production of bio-economically important substances (compounds, enzymes, or other proteins) by optimizing the genetics, cellular processes and growth conditions of microorganisms. This requires a detailed understanding of the metabolic pathways involved in producing the targeted substances, and of how the engineering regulates cellular processes or growth conditions. To achieve this goal, a large array of experimental techniques, compound libraries, computational methods and data resources, including multi-omics data, is used. The recent advent of multi-omics systems biology approaches has significantly impacted the field by opening new avenues for dynamic and large-scale analyses that deepen our knowledge of these manipulations. However, with the enormous volume of transcriptomics, proteomics and metabolomics data available, integrating the data for a more holistic understanding is a daunting task. Novel data mining and analytics approaches, including Artificial Intelligence (AI), can provide breakthroughs that traditional low-throughput, experiment-only methods cannot easily achieve. Here, we review the latest attempts to combine systems biology and AI in metabolic engineering research, and highlight how this alliance can help overcome the current challenges facing industrial biotechnology, especially for the production of food-related substances and compounds using microorganisms.

Helmy Mohamed, Smith Derek, Selvarajoo Kumar


Oncology

Immune contexture analysis in immuno-oncology: applications and challenges of multiplex fluorescent immunohistochemistry.

In Clinical & translational immunology

The tumor microenvironment is an integral player in cancer initiation, tumor progression, and response and resistance to anti-cancer therapy. Understanding the complex interactions of tumor immune architecture (referred to as 'immune contexture') has therefore become increasingly desirable to guide our approach to patient selection, clinical trial design, combination therapies, and patient management. Quantitative image analysis methods based on multiplexed fluorescence immunohistochemistry and deep learning technologies are rapidly developing, enabling researchers to interrogate complex information from the tumor microenvironment and find predictive insights into treatment response. Herein, we discuss current developments in multiplexed fluorescence immunohistochemistry for immune contexture analysis and their application in immuno-oncology, and discuss the challenges of using this technology effectively in clinical settings. We also present a multiplexed image analysis workflow to analyse fluorescence multiplexed stained tumor sections using the Vectra Automated Digital Pathology System together with FCS Express flow cytometry software. The benefit of this strategy is that the spectral unmixing accurately generates and analyses complex arrays of multiple biomarkers, which can be helpful for diagnosis, risk stratification, and guiding clinical management of oncology patients.

Shakya Reshma, Nguyen Tam Hong, Waterhouse Nigel, Khanna Rajiv


FCS express image cytometry, immune profiling, multiplexed fluorescent immunohistochemistry, quantitative digital pathology, tumor microenvironment, vectra

General

Defining relictual biodiversity: Conservation units in speckled dace (Leuciscidae: Rhinichthys osculus) of the Greater Death Valley ecosystem.

In Ecology and evolution

The tips in the tree of life serve as foci for conservation and management, yet clear delimitations are masked by inherent variance at the species-population interface. Analyses using thousands of nuclear loci can potentially sort inconsistencies, yet standard categories applied to this parsing are themselves potentially conflicting and/or subjective [e.g., DPS (distinct population segments); DUs (Diagnosable Units-Canada); MUs (management units); SSP (subspecies); ESUs (Evolutionarily Significant Units); and UIEUs (uniquely identified evolutionary units)]. One potential solution for consistent categorization is to create a comparative framework by accumulating statistical results from independent studies and evaluating congruence among data sets. Our study illustrates this approach in speckled dace (Leuciscidae: Rhinichthys osculus) endemic to two basins (Owens and Amargosa) in the Death Valley ecosystem. These fish persist in the Mojave Desert as isolated Plio-Pleistocene relicts and are of conservation concern, but lack formal taxonomic descriptions/designations. Double digest RAD (ddRAD) methods identified 14,355 SNP loci across 10 populations (N = 140). Species delimitation analyses [multispecies coalescent (MSC) and unsupervised machine learning (UML)] delineated four putative ESUs. FST outlier loci (N = 106) were juxtaposed to uncover the potential for localized adaptations. We detected one hybrid population that resulted from upstream reconnection of habitat following contemporary pluvial periods, whereas remaining populations represent relics of ancient tectonism within geographically isolated springs and groundwater-fed streams. Our study offers three salient conclusions: a blueprint for a multifaceted delimitation of conservation units; a proposed mechanism by which criteria for intraspecific biodiversity can be potentially standardized; and a strong argument for the proactive management of critically endangered Death Valley ecosystem fishes.

Mussmann Steven M, Douglas Marlis R, Oakey David D, Douglas Michael E


Amargosa Basin, Owens Basin, SNPs, ddRAD, machine learning, phylogenomics, selection

General

Improving the accessibility and transferability of machine learning algorithms for identification of animals in camera trap images: MLWIC2.

In Ecology and evolution

Motion-activated wildlife cameras (or "camera traps") are frequently used to remotely and noninvasively observe animals. The vast number of images collected from camera trap projects has prompted some biologists to employ machine learning algorithms to automatically recognize species in these images, or at least filter out images that do not contain animals. These approaches are often limited by model transferability, as a model trained to recognize species from one location might not work as well for the same species in different locations. Furthermore, these methods often require advanced computational skills, making them inaccessible to many biologists. We used 3 million camera trap images from 18 studies in 10 states across the United States of America to train two deep neural networks: one that recognizes 58 species (the "species model") and one that determines whether an image is empty or contains an animal (the "empty-animal model"). Our species model and empty-animal model had accuracies of 96.8% and 97.3%, respectively. Furthermore, the models performed well on some out-of-sample datasets, as the species model had 91% accuracy on species from Canada (accuracy range 36%-91% across all out-of-sample datasets) and the empty-animal model achieved an accuracy of 91%-94% on out-of-sample datasets from different continents. Our software addresses some of the limitations of using machine learning to classify images from camera traps. By including many species from several locations, our species model is potentially applicable to many camera trap studies in North America. We also found that our empty-animal model can facilitate removal of images without animals globally. We provide the trained models in an R package (MLWIC2: Machine Learning for Wildlife Image Classification in R), which contains Shiny applications that allow scientists with minimal programming experience to use trained models and train new models in six neural network architectures with varying depths.
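
The out-of-sample figures in this abstract are per-dataset accuracies summarized as a range. A minimal Python sketch of that bookkeeping follows. It is not part of MLWIC2 (which is an R package) and does not use its API; the function name `accuracy_by_dataset`, the record layout, and the toy predictions are all hypothetical and serve only to illustrate the calculation.

```python
from collections import defaultdict


def accuracy_by_dataset(records):
    """Return {dataset: fraction of correct predictions}.

    `records` is an iterable of (dataset, true_label, predicted_label)
    tuples. This layout is an assumption made for the sketch.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for dataset, truth, predicted in records:
        total[dataset] += 1
        correct[dataset] += int(truth == predicted)
    return {name: correct[name] / total[name] for name in total}


# Hypothetical toy predictions from an empty-vs-animal classifier.
records = [
    ("site_a", "animal", "animal"),
    ("site_a", "empty", "empty"),
    ("site_a", "empty", "animal"),   # misclassified
    ("site_a", "animal", "animal"),
    ("site_b", "empty", "empty"),
    ("site_b", "animal", "empty"),   # misclassified
]

per_dataset = accuracy_by_dataset(records)
# Range of accuracies across datasets, as (lowest, highest).
accuracy_range = (min(per_dataset.values()), max(per_dataset.values()))
print(per_dataset, accuracy_range)
```

Reporting the range alongside the best case, as the abstract does, makes clear how much accuracy can drop when a model is moved to a new location.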

Tabak Michael A, Norouzzadeh Mohammad S, Wolfson David W, Newton Erica J, Boughton Raoul K, Ivan Jacob S, Odell Eric A, Newkirk Eric S, Conrey Reesa Y, Stenglein Jennifer, Iannarilli Fabiola, Erb John, Brook Ryan K, Davis Amy J, Lewis Jesse, Walsh Daniel P, Beasley James C, VerCauteren Kurt C, Clune Jeff, Miller Ryan S


R package, computer vision, deep convolutional neural networks, image classification, machine learning, motion‐activated camera, remote sensing, species identification

Public Health


In The annals of applied statistics

Bayesian Additive Regression Trees (BART) is a flexible machine learning algorithm capable of capturing nonlinearities between an outcome and covariates and interactions among covariates. We extend BART to a semiparametric regression framework in which the conditional expectation of an outcome is a function of treatment, its effect modifiers, and confounders. The confounders are allowed to have unspecified functional form, while treatment and effect modifiers that are directly related to the research question are given a linear form. The result is a Bayesian semiparametric linear regression model where the posterior distribution of the parameters of the linear part can be interpreted as in parametric Bayesian regression. This is useful in situations where a subset of the variables is of substantive interest and the others are nuisance variables that we would like to control for. An example of this occurs in causal modeling with the structural mean model (SMM). Under certain causal assumptions, our method can be used as a Bayesian SMM. Our methods are demonstrated with simulation studies and an application to a dataset involving adults with HIV/Hepatitis C coinfection who newly initiate antiretroviral therapy. The methods are available in an R package called semibart.
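
One way to write down the structure this abstract describes is sketched below. The notation is assumed here and is not taken from the paper: Y is the outcome, A the treatment, X the effect modifiers, L the confounders, and the specific linear terms (a main treatment effect plus a treatment-by-modifier interaction) are only one plausible choice of linear part.

```latex
\[
  \mathbb{E}[Y \mid A, X, L]
    = \underbrace{\omega(L)}_{\text{nonparametric (BART)}}
    + \underbrace{\psi_1 A + \psi_2^{\top}(A X)}_{\text{linear, interpretable}},
  \qquad
  \omega(L) = \sum_{j=1}^{m} g(L;\, T_j, M_j)
\]
```

Here each g(L; T_j, M_j) is a regression tree with structure T_j and leaf values M_j, so that the confounder term is a sum of m trees as in standard BART, while the posterior of the coefficients (psi_1, psi_2) can be read as in an ordinary Bayesian linear regression.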

Zeldow Bret, Lo Re Vincent, Roy Jason


Bayesian Additive Regression Trees, antiretrovirals, structural mean model

General

Deep Learning for Acute Myeloid Leukemia Diagnosis.

In Journal of medicine and life

With changing lifestyles and rising cancer incidence, accurate diagnosis has become an important medical task. Today, DNA microarrays are widely used in cancer diagnosis and screening because they can measure gene expression levels. Common statistical methods are not well suited to analyzing these data because of the high dimensionality of gene expression data. This study therefore aims to use newer techniques to diagnose acute myeloid leukemia. Leukemia microarray gene data, containing 22,283 genes, were extracted from the Gene Expression Omnibus repository. Initial preprocessing was performed using a normalization test and principal component analysis (PCA) in Python. A deep neural network (DNN) was then designed and applied to the data, and the results were finally cross-validated by classifiers. The normalization test was significant (P>0.05), and the results show the potential of PCA for gene segregation and the independence of cancer and healthy cells. The accuracies of the single-layer neural network and the DNN with three hidden layers were 63.33% and 96.67%, respectively. Newer methods such as deep learning can improve diagnostic accuracy and performance compared with older methods. It is recommended to use these methods in cancer diagnosis and for effective gene selection in various types of cancer.
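
The preprocessing step mentioned in this abstract (standardizing expression values, then PCA) can be illustrated with a small standard-library Python sketch. This is not the authors' code: the data are synthetic, the four "genes" and both sample groups are invented for the example, and only the first principal component is computed, using power iteration in place of whatever PCA routine the study actually used.

```python
import math
import random


def standardize(rows):
    """Z-score each column (gene) to mean 0 and sample variance 1."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
            for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]


def covariance(rows):
    """Sample covariance matrix of rows whose columns are already centered."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]


def first_component(cov, iterations=500):
    """Leading eigenvector and eigenvalue of `cov` via power iteration."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v gives the variance explained along v.
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                     for i in range(d))
    return v, eigenvalue


# Synthetic expression matrix (assumption): 10 "healthy" and 10 "disease"
# samples, 4 genes; only the first two genes differ between the groups.
random.seed(0)
healthy = [[random.gauss(0, 1) for _ in range(4)] for _ in range(10)]
disease = [[random.gauss(6, 1), random.gauss(6, 1),
            random.gauss(0, 1), random.gauss(0, 1)] for _ in range(10)]

z = standardize(healthy + disease)
direction, eigenvalue = first_component(covariance(z))

# Project every sample onto the first principal component.
scores = [sum(x * w for x, w in zip(row, direction)) for row in z]
healthy_mean = sum(scores[:10]) / 10
disease_mean = sum(scores[10:]) / 10
print(direction, eigenvalue, healthy_mean, disease_mean)
```

With real microarray data (thousands of genes), a library implementation of PCA would be the practical choice; the point of the sketch is only that projecting standardized expression values onto the leading component gives each sample a low-dimensional score that a downstream classifier can use.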

Nazari Elham, Farzin Amir Hossein, Aghemiri Mehran, Avan Amir, Tara Mahmood, Tabesh Hamed

AML, deep learning, machine learning, microarray, neural network