Public Health

Asthma in farm children is more determined by genetic polymorphisms and in non-farm children by environmental factors.

In Pediatric allergy and immunology : official publication of the European Society of Pediatric Allergy and Immunology

BACKGROUND : The asthma syndrome is influenced by hereditary and environmental factors. With the example of farm exposure, we study whether genetic and environmental factors interact for asthma.

METHODS : Statistical learning approaches based on penalized regression and decision trees were used to predict asthma in the GABRIELA study with 850 cases (9% farm children) and 857 controls (14% farm children). Single-nucleotide polymorphisms (SNPs) were selected from a genome-wide dataset based on a literature search or by statistical selection techniques. Prediction was assessed by receiver operating characteristics (ROC) curves and validated in the PASTURE cohort.

RESULTS : Prediction by family history of asthma and atopy yielded an area under the ROC curve (AUC) of 0.62 [0.57-0.66] in the random forest machine learning approach. Adding information on demographics (sex and age) and 26 environmental exposure variables significantly improved the quality of prediction (AUC=0.65 [0.61-0.70]). In farm children, however, environmental variables did not improve prediction quality. Rather, SNPs related to IL33 and RAD50 contributed significantly to the prediction of asthma (AUC=0.70 [0.62-0.78]).

CONCLUSIONS : Asthma in farm children is more likely predicted by other factors than asthma in non-farm children, though in both forms family history may integrate environmental exposure, genotype, and degree of penetrance.
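As an aid to reading the AUC values reported above, the following minimal sketch shows one common way to compute an area under the ROC curve from predicted scores and binary labels, using the rank-based (Mann-Whitney) formulation. The function name `roc_auc` and the toy numbers are illustrative assumptions; they are not the study's code or data.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random case outscores a random control.

    `labels` holds 1 for cases and 0 for controls; tied scores count as one half.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Toy example with made-up scores (not study data).
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.6, 0.4, 0.5, 0.3, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of cases from controls.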

Krautenbacher Norbert, Kabesch Michael, Horak Elisabeth, Braun-Fahrländer Charlotte, Genuneit Jon, Boznanski Andrzej, von Mutius Erika, Theis Fabian, Fuchs Christiane, Ege Markus J


Childhood asthma, GWAS, SNPs, environment, farming, machine learning, penalized regression, random forest, risk prediction, statistical learning

General

Molecular basis and therapeutic potential of myostatin on bone formation and metabolism in orthopedic disease.

In BioFactors (Oxford, England)

Myostatin, a member of the transforming growth factor-β (TGF-β) superfamily, is a key autocrine/paracrine inhibitor of skeletal muscle growth. Recently, researchers have postulated that myostatin is a negative regulator of bone formation and metabolism. Reportedly, myostatin is highly expressed in the fracture area, affecting the endochondral ossification process during the early stages of fracture healing. Furthermore, myostatin is highly expressed in the synovium of patients with rheumatoid arthritis (RA) and is an effective therapeutic target for interfering with osteoclast formation and joint destruction in RA. Thus, myostatin is a potent anti-osteogenic factor and a direct modulator of osteoclast differentiation. Evaluation of the molecular pathway revealed that myostatin can activate SMAD and mitogen-activated protein kinase signaling pathways, inhibiting the Wnt/β-catenin pathway to synergistically regulate muscle and bone growth and metabolism. In summary, inhibition of myostatin or the myostatin signaling pathway has therapeutic potential in the treatment of orthopedic diseases. This review focused on the effects of myostatin on bone formation and metabolism and discussed the potential therapeutic effects of inhibiting myostatin and its pathways in related orthopedic diseases.

Cui Yinxing, Yi Qian, Sun Weichao, Huang Dixi, Zhang Hui, Duan Li, Shang Hongxi, Wang Daping, Xiong Jianyi


bone metabolism, bone regeneration, myostatin, myostatin inhibitor, myostatin signaling pathway

Public Health

Early prediction of mortality risk among patients with severe COVID-19, using machine learning.

In International journal of epidemiology ; h5-index 76.0

BACKGROUND : Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 infection, has been spreading globally. We aimed to develop a clinical model to predict the outcome of patients with severe COVID-19 infection early.

METHODS : Demographic, clinical and first laboratory findings after admission of 183 patients with severe COVID-19 infection (115 survivors and 68 non-survivors from the Sino-French New City Branch of Tongji Hospital, Wuhan) were used to develop the predictive models. Machine learning approaches were used to select the features and predict the patients' outcomes. The area under the receiver operating characteristic curve (AUROC) was applied to compare the models' performance. A total of 64 patients with severe COVID-19 infection from the Optical Valley Branch of Tongji Hospital, Wuhan, were used to externally validate the final predictive model.

RESULTS : The baseline characteristics and laboratory tests were significantly different between the survivors and non-survivors. Four variables (age, high-sensitivity C-reactive protein level, lymphocyte count and d-dimer level) were selected by all five models. Given the similar performance among the models, the logistic regression model was selected as the final predictive model because of its simplicity and interpretability. The AUROC on the external validation set was 0.881. The sensitivity and specificity were 0.839 and 0.794 for the validation set, when using a probability of death of 50% as the cutoff. A risk score based on the selected variables can be used to assess the mortality risk. The predictive model is available at [].

CONCLUSIONS : Age, high-sensitivity C-reactive protein level, lymphocyte count and d-dimer level of COVID-19 patients at admission are informative for the patients' outcomes.
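The cutoff-based evaluation described in the results can be sketched as follows: a logistic model turns a linear score into a probability of death, and sensitivity and specificity are then computed at a chosen probability cutoff (50% in the abstract). This is only an illustration under stated assumptions. The function names and the toy probabilities are hypothetical, and the published model's coefficients are not given in the abstract, so none are reproduced here.

```python
import math


def logistic_probability(intercept, coefficients, features):
    """Probability from a logistic model: 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    linear_score = intercept + sum(b * x for b, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-linear_score))


def sensitivity_specificity(labels, probabilities, cutoff=0.5):
    """Sensitivity and specificity when predicting death for probability >= cutoff.

    `labels` holds 1 for non-survivors and 0 for survivors.
    """
    tp = fn = tn = fp = 0
    for y, p in zip(labels, probabilities):
        predicted_death = p >= cutoff
        if y == 1 and predicted_death:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_death:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    return tp / (tp + fn), tn / (tn + fp)


# Toy probabilities (made up for illustration, not patient data).
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_probs = [0.8, 0.55, 0.3, 0.6, 0.4, 0.2, 0.1]
toy_sens, toy_spec = sensitivity_specificity(toy_labels, toy_probs, cutoff=0.5)
```

Lowering the cutoff generally raises sensitivity at the expense of specificity, which is why the cutoff is reported alongside both figures.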

Hu Chuanyu, Liu Zhenqiu, Jiang Yanfeng, Shi Oumin, Zhang Xin, Xu Kelin, Suo Chen, Wang Qin, Song Yujing, Yu Kangkang, Mao Xianhua, Wu Xuefu, Wu Mingshan, Shi Tingting, Jiang Wei, Mu Lina, Tully Damien C, Xu Lei, Jin Li, Li Shusheng, Tao Xuejin, Zhang Tiejun, Chen Xingdong


COVID-19, death, fatality rate, machine learning, predictive model

General

Estimation and prediction of ellipsoidal molecular shapes in organic crystals based on ellipsoid packing.

In PloS one ; h5-index 176.0

Crystal structure prediction has been one of the fundamental and challenging problems in materials science. It is computationally expensive to identify molecular conformations and arrangements in organic molecular crystals because of the complexity of intra- and inter-molecular interactions. From a geometrical viewpoint, specific types of organic crystal structures can be characterized by ellipsoid packing. In particular, we focus on aromatic systems, which are important for organic semiconductor materials. In this study, we aim to estimate the ellipsoidal molecular shapes of such crystals and predict them from single molecular descriptors. First, we identify the molecular crystals with molecular centroid arrangements that correspond to affine transformations of four basic cubic lattices, through topological analysis of the dataset of crystalline polycyclic aromatic molecules. The novelty of our method is that the topological data analysis is applied to arrangements of molecular centroids instead of those of atoms. For each of the identified crystals, we estimate the intracrystalline molecular shape based on the ellipsoid packing assumption. Then, we show that the ellipsoidal shape can be predicted from single molecular descriptors using a machine learning method. The results suggest that topological characterization of molecular arrangements is useful for structure prediction of organic semiconductor materials.

Ito Daiki, Shirasawa Raku, Iino Yoichiro, Tomiya Shigetaka, Tanaka Gouhei


Cardiology

Comparing a novel machine learning method to the Friedewald formula and Martin-Hopkins equation for low-density lipoprotein estimation.

In PloS one ; h5-index 176.0

BACKGROUND : Low-density lipoprotein cholesterol (LDL-C) is a target for cardiovascular prevention. Contemporary equations for LDL-C estimation have limited accuracy in certain scenarios (high triglycerides [TG], very low LDL-C).

OBJECTIVES : We derived a novel method for LDL-C estimation from the standard lipid profile using a machine learning (ML) approach utilizing random forests (the Weill Cornell model). We compared its correlation to direct LDL-C with the Friedewald and Martin-Hopkins equations for LDL-C estimation.

METHODS : The study cohort comprised a convenience sample of standard lipid profile measurements (with the directly measured components of total cholesterol [TC], high-density lipoprotein cholesterol [HDL-C], and TG) as well as chemical-based direct LDL-C performed on the same day at the New York-Presbyterian Hospital/Weill Cornell Medicine (NYP-WCM). Subsequently, an ML algorithm was used to construct a model for LDL-C estimation. Results are reported on the held-out test set, with correlation coefficients and absolute residuals used to assess model performance.

RESULTS : Between 2005 and 2019, there were 17,500 lipid profiles performed on 10,936 unique individuals (4,456 females; 40.8%) aged 1 to 103. Correlation coefficients between estimated and measured LDL-C values were 0.982 for the Weill Cornell model, compared to 0.950 for Friedewald and 0.962 for the Martin-Hopkins method. The Weill Cornell model was consistently better across subgroups stratified by LDL-C and TG values, including TG >500 and LDL-C <70.

CONCLUSIONS : An ML model was found to have a better correlation with direct LDL-C than either the Friedewald formula or Martin-Hopkins equation, including in the setting of elevated TG and very low LDL-C.
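For context on the two comparison methods, the sketch below writes out the Friedewald formula (LDL-C = TC - HDL-C - TG/5, all in mg/dL) and the general form of the Martin-Hopkins equation, which replaces the fixed divisor 5 with a factor looked up from a table stratified by non-HDL-C and TG. The function names are illustrative assumptions, the lookup table itself is not reproduced (the factor is passed in by the caller), and the Weill Cornell random forest model is not represented here.

```python
def friedewald_ldl(total_chol, hdl, triglycerides):
    """Friedewald estimate of LDL-C in mg/dL: TC - HDL-C - TG / 5."""
    return total_chol - hdl - triglycerides / 5.0


def adjustable_factor_ldl(total_chol, hdl, triglycerides, factor):
    """Martin-Hopkins-style estimate in mg/dL: TC - HDL-C - TG / factor.

    `factor` is assumed to come from the published stratified table,
    which is not included in this sketch.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    return total_chol - hdl - triglycerides / factor


# Worked example with round illustrative values (mg/dL), not study data.
example_friedewald = friedewald_ldl(200.0, 50.0, 150.0)               # 200 - 50 - 30
example_adjusted = adjustable_factor_ldl(200.0, 50.0, 150.0, 6.0)     # 200 - 50 - 25
```

Because both equations subtract a TG-derived term, their error grows as TG rises or LDL-C falls, which is the scenario the abstract highlights.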

Singh Gurpreet, Hussain Yasin, Xu Zhuoran, Sholle Evan, Michalak Kelly, Dolan Kristina, Lee Benjamin C, van Rosendael Alexander R, Fatima Zahra, Peña Jessica M, Wilson Peter W F, Gotto Antonio M, Shaw Leslee J, Baskaran Lohendran, Al’Aref Subhi J


General

Forecasting and optimizing Agrobacterium-mediated genetic transformation via ensemble model-fruit fly optimization algorithm: A data mining approach using chrysanthemum databases.

In PloS one ; h5-index 176.0

Optimizing gene transformation factors can be considered the first and foremost step in successful genetic engineering and genome editing studies. However, it is usually difficult to achieve an optimized gene transformation protocol because the process is costly, time-consuming, and complex. Therefore, novel computational approaches such as machine learning models are needed for analyzing gene transformation data. In the current study, three individual machine learning models, Multi-Layer Perceptron (MLP), Adaptive Neuro-Fuzzy Inference System (ANFIS), and Radial Basis Function (RBF), were developed for forecasting Agrobacterium-mediated gene transformation in chrysanthemum based on eleven input variables: Agrobacterium strain, optical density (OD), co-culture period (CCP), and different antibiotics including kanamycin (K), vancomycin (VA), cefotaxime (CF), hygromycin (H), carbenicillin (CA), geneticin (G), ticarcillin (TI), and paromomycin (P). Subsequently, the best-obtained results were used in the fusion process by the bagging method. Results showed that the ensemble model, with the highest R2 (0.83), outperformed all individual models (MLP: 0.63, RBF: 0.69, and ANFIS: 0.74) in the validation set. The ensemble model was also linked to the fruit fly optimization algorithm (FOA) for optimizing gene transformation, and the results showed that the maximum gene transformation efficiency (37.54%) can be achieved from EHA105 strain with 0.9 OD600, for 3.8 days CCP, 46.43 mg/l P, 9.54 mg/l K, 18.62 mg/l H, and 4.79 mg/l G as selection antibiotics and 109.74 μg/ml VA, 287.63 μg/ml CF, 334.07 μg/ml CA and 87.36 μg/ml TI as antibiotics in the selection medium. Moreover, sensitivity analysis demonstrated that the input variables differ in importance in the gene transformation system, in the order Agrobacterium strain > CCP > K > CF > VA > P > OD > CA > H > TI > G. Generally, the hybrid model developed in this study (ensemble model-FOA) can be employed as an accurate and reliable approach in future genetic engineering and genome editing studies.
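To make the comparison of R2 values concrete, the sketch below computes the coefficient of determination for individual model predictions and for a simple averaged ensemble. Plain averaging is an assumption standing in for the fusion step; the abstract does not detail how the bagging was carried out, and full bagging would also involve training on bootstrap resamples. All names and numbers are hypothetical and are not the chrysanthemum data.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values must not all be equal")
    return 1.0 - ss_res / ss_tot


def average_ensemble(predictions_by_model):
    """Average the predictions of several models, sample by sample."""
    n_models = len(predictions_by_model)
    return [sum(column) / n_models for column in zip(*predictions_by_model)]


# Toy values (made up for illustration only).
observed = [10.0, 20.0, 30.0, 40.0]
model_a = [12.0, 19.0, 33.0, 38.0]
model_b = [9.0, 23.0, 29.0, 44.0]
ensemble = average_ensemble([model_a, model_b])

r2_a = r_squared(observed, model_a)
r2_b = r_squared(observed, model_b)
r2_ensemble = r_squared(observed, ensemble)
```

In this toy case the averaged predictions score higher than either model alone because the two models' errors partly cancel; that is not guaranteed in general.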

Hesami Mohsen, Alizadeh Milad, Naderi Roohangiz, Tohidfar Masoud