Doctor Penguin

Public Health

Monitoring Longitudinal Trends and Assessment of the Health Risk of Shigella flexneri Antimicrobial Resistance.

In Environmental science & technology ; h5-index 132.0
Shigella flexneri infection is the main cause of diarrhea in humans worldwide. The emergence of antimicrobial resistance (AMR) in S. flexneri is a growing public health threat worldwide, yet large-scale studies monitoring the longitudinal AMR trends of isolates remain scarce. Here, the AMR gene (ARG) profiles of 717 S. flexneri isolates collected worldwide from 1920 to 2020 were determined. The results showed that the average number of ARGs in isolates increased significantly, from 19.2 ± 2.4 before 1970 to 29.6 ± 5.3 after 2010. In addition, mobile genetic elements were important contributors to ARGs in S. flexneri isolates. The results of the structural equation model showed that the human development index drove the consumption of antibiotics and indirectly promoted antibiotic resistance. Finally, a machine learning algorithm was used to predict the antibiotic resistance risk of global terrestrial S. flexneri isolates and successfully map the antibiotic resistance threats in global land habitats with over 80% accuracy. Collectively, this study monitored the longitudinal AMR trends, quantitatively assessed the health risk of S. flexneri AMR, and provided a theoretical basis for mitigating the threat of antibiotic resistance.
Fang Guan-Yu, Mu Xiao-Jing, Huang Bing-Wen, Jiang Yu-Jian

2023-Mar-17

genomic analysis, human development index, machine learning, mobile genetic elements, pathogenic bacteria

General

miProBERT: identification of microRNA promoters based on the pre-trained model BERT.

In Briefings in bioinformatics
Accurate prediction of promoter regions driving miRNA gene expression has become a major challenge due to the lack of annotation information for pri-miRNA transcripts. This gap hinders our understanding of miRNA-mediated regulatory networks. Some algorithms have been designed during the past decade to detect miRNA promoters. However, these methods rely on biosignal data such as CpG islands and still need to be improved. Here, we propose miProBERT, a BERT-based model for predicting promoters directly from gene sequences without using any structural or biological signals. To our knowledge, this is the first time a BERT-based model has been employed to identify miRNA promoters. We fine-tune the pre-trained model DNABERT on the gene promoter dataset so that its representation captures the richer biological properties of promoter sequences, and then systematically scan the upstream regions of each intergenic miRNA using the fine-tuned model. About 665 miRNA promoters are found. The innovative use of a random substitution strategy to construct a negative dataset improves the discriminative ability of the model and further reduces the false positive rate (FPR) to as low as 0.0421. On independent datasets, miProBERT outperformed other gene promoter prediction methods. In a comparison on 33 experimentally validated miRNA promoter datasets, miProBERT significantly outperformed previously developed miRNA promoter prediction programs, with 78.13% precision and 75.76% recall. We further verify the predicted promoter regions by analyzing conservation, CpG content and histone marks. The effectiveness and robustness of miProBERT are highlighted.
Wang Xin, Gao Xin, Wang Guohua, Li Dan

2023-Mar-17

BERT, deep learning, microRNA promoter, ncRNA
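
As a rough illustration of the sequence handling described in the abstract, the sketch below splits a DNA sequence into overlapping k-mers (the token style DNABERT uses) and slides a fixed-size window along an upstream region. The function names, the value k=6, the window size, the stride, and the toy sequence are all assumptions made for illustration; the abstract does not state the settings miProBERT actually uses, and no model is called here.

```python
def kmer_tokens(sequence, k=6):
    """Split a DNA sequence into overlapping k-mers (stride 1).

    k=6 is an assumed value; DNABERT was released for k between 3 and 6.
    """
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def sliding_windows(sequence, window, stride):
    """Yield (start, subsequence) pairs covering the sequence left to right.

    window and stride are hypothetical parameters, not values from the paper.
    """
    for start in range(0, len(sequence) - window + 1, stride):
        yield start, sequence[start:start + window]


# Toy upstream region (invented for this example).
upstream = "ACGTTGCAAGCTTAGGCTA"

tokens = kmer_tokens("ACGTTGCA", k=6)
windows = list(sliding_windows(upstream, window=8, stride=4))

print(tokens)
for start, sub in windows:
    # In a real pipeline each window would be tokenized and scored by the
    # fine-tuned classifier; here we only show the tokens it would receive.
    print(start, sub, kmer_tokens(sub, k=6))
```

Each window would then be passed to a classifier that returns a promoter score; that step is omitted because it depends on the trained model.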

General

DaDL-SChlo: protein subchloroplast localization prediction based on generative adversarial networks and pre-trained protein language model.

In Briefings in bioinformatics
The chloroplast is a crucial site for photosynthesis in plants. Determining the location and distribution of proteins in subchloroplasts is significant for studying the energy conversion of chloroplasts and regulating the utilization of light energy in crop production. However, the prediction accuracy of the currently developed protein subcellular site predictors is still limited due to the complex protein sequence features and the scarcity of labeled samples. We propose DaDL-SChlo, a multi-location protein subchloroplast localization predictor, which addresses the above problems by fusing pre-trained protein language model deep learning features with traditional handcrafted features and using generative adversarial networks for data augmentation. The experimental results of cross-validation and independent testing show that DaDL-SChlo greatly improves the prediction of protein subchloroplast localization compared with the state-of-the-art predictors. Specifically, the overall actual accuracy outperforms the state-of-the-art predictors by 10.7% on 10-fold cross-validation and 12.6% on independent testing. DaDL-SChlo is a promising and efficient predictor for protein subchloroplast localization. The datasets and codes of DaDL-SChlo are available at https://github.com/xwanggroup/DaDL-SChlo.
Wang Xiao, Han Lijun, Wang Rong, Chen Haoran

2023-Mar-17

data augmentation, generative adversarial network, pre-trained model, subchloroplast localization

Radiology

An imaging-based method of mapping multi-echo BOLD intracranial pulsatility.

In Magnetic resonance in medicine ; h5-index 66.0

PURPOSE : Cardiac-related intracranial pulsatility may relate to cerebrovascular health, and this information is contained in BOLD MRI data. There is broad interest in methods to isolate BOLD pulsatility, and the current study examines a deep learning approach.

METHODS : Multi-echo BOLD images and respiratory and cardiac recordings were acquired in 55 adults. Ground truth BOLD pulsatility maps were calculated with an established method. BOLD fast Fourier transform magnitude images were used as temporal-frequency image inputs to a U-Net deep learning model. Model performance was evaluated by mean squared error (MSE), mean absolute error (MAE), structural similarity index (SSIM), and mutual information (MI). Experiments evaluated the influence of input channel size, an age group effect during training, dependence on TE, performance without the U-Net architecture, and importance of respiratory preprocessing.

RESULTS : The U-Net model generated BOLD pulsatility maps with lower MSE as additional fast Fourier transform input images were used. There was no age group effect for MSE (P > 0.14). MAE and SSIM metrics did not vary across TE (P > 0.36), whereas MI showed a significant TE dependence (P < 0.05). The U-Net versus no U-Net comparison showed no significant difference for MAE (P = 0.059); however, SSIM and MI were significantly different between models (P < 0.001). Within the insula, the cross-correlation values were high (r > 0.90) when comparing the U-Net model trained with/without respiratory preprocessing.

CONCLUSION : Multi-echo BOLD pulsatility maps were synthesized from a U-Net model that was trained to use temporal-frequency BOLD image inputs. This work adds to the deep learning methods that characterize BOLD physiological signals.

Valsamis Jake J, Luciw Nicholas J, Haq Nandinee, Atwi Sarah, Duchesne Simon, Cameron William, MacIntosh Bradley J

2023-Mar-17

blood oxygenation, cerebrovascular, deep learning, pulsatility
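
To make the "temporal-frequency image inputs" in the abstract more concrete, the sketch below computes the Fourier magnitude spectrum of a single synthetic voxel time series and locates its dominant frequency. It uses a plain discrete Fourier transform, which gives the same values as a fast Fourier transform for this purpose. The sampling rate, series length, baseline, and 1 Hz "cardiac" component are assumed toy values, not acquisition parameters from the study, and the function name is hypothetical.

```python
import cmath
import math


def dft_magnitude(series):
    """One-sided DFT magnitude of a mean-centered time series."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]  # remove the baseline (DC) term
    magnitudes = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        magnitudes.append(abs(coeff))
    return magnitudes


# Assumed toy values, chosen so the cardiac peak lands exactly on a bin.
SAMPLING_HZ = 4.0
N_SAMPLES = 40
CARDIAC_HZ = 1.0

# Synthetic voxel: constant baseline plus a unit-amplitude cardiac oscillation.
voxel = [
    100.0 + math.sin(2 * math.pi * CARDIAC_HZ * t / SAMPLING_HZ)
    for t in range(N_SAMPLES)
]

mags = dft_magnitude(voxel)
peak_bin = max(range(len(mags)), key=mags.__getitem__)
peak_hz = peak_bin * SAMPLING_HZ / N_SAMPLES
print(peak_bin, peak_hz)
```

Applying the same transform to every voxel yields one magnitude image per frequency bin, which is the kind of stack that could be supplied to a network as input channels.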

General

Anticipating the transmissibility of the 2022 mpox outbreak.

In Journal of medical virology
An ongoing outbreak of monkeypox virus (MPXV) was first reported in the United Kingdom (UK) on 6 May 2022. As of 17 November, there had been a total of 80,221 confirmed MPXV cases in over 110 countries. Based on data reported between 6 May and 30 June 2022 in the UK, Spain, and Germany, we applied a deep learning approach using convolutional neural networks to evaluate the parameters of the 2022 MPXV outbreak. The basic reproduction number (R0) of MPXV was estimated to be 2.32 in the UK, which indicates the active diffusion of MPXV since the beginning of the outbreak. The data from Spain and Germany produced higher median R0 values of 2.42 and 2.88, respectively. Importantly, the estimated R0 of MPXV in the three countries tends toward the previously calculated R0 of smallpox (3.50 to 6.00). Furthermore, the incubation (1/ε) and infectious (1/γ) periods were predicted to be 9-10 days and 4-5 days, respectively. The R0 value derived for MPXV is consistent with the significantly increasing number of cases, indicating the risk of a rapid spread of MPXV worldwide. These findings provide important insights for the prevention and control of the MPXV epidemic.
Liu Tuoyu, Yang Shan, Luo Boyu, Fan Xinyue, Zhuang Yingtan, Gao George F, Bi Yuhai, Teng Yue

2023-Mar-17

R0, epidemic, mpox, outbreak, transmissibility
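
The parameters quoted in the abstract can be related to one another with a standard SEIR compartment model, in which R0 = β/γ when births and deaths are ignored. The sketch below is only an illustration of that relationship: it does not reproduce the convolutional neural network estimation used in the paper, and the abstract does not confirm the exact model structure. The midpoint periods (9.5 and 4.5 days), the population size, the time step, and the function name are assumptions.

```python
def seir_step(s, e, i, r, beta, epsilon, gamma, dt):
    """Advance a closed-population SEIR model by one forward-Euler step."""
    n = s + e + i + r
    new_exposed = beta * s * i / n * dt   # S -> E
    new_infectious = epsilon * e * dt     # E -> I
    new_recovered = gamma * i * dt        # I -> R
    return (
        s - new_exposed,
        e + new_exposed - new_infectious,
        i + new_infectious - new_recovered,
        r + new_recovered,
    )


R0 = 2.32               # UK estimate quoted in the abstract
INCUBATION_DAYS = 9.5   # assumed midpoint of the 9-10 day range
INFECTIOUS_DAYS = 4.5   # assumed midpoint of the 4-5 day range

epsilon = 1.0 / INCUBATION_DAYS
gamma = 1.0 / INFECTIOUS_DAYS
beta = R0 * gamma       # transmission rate implied by R0 = beta / gamma

# Hypothetical population of 10,000 with one initial infectious case.
state = (9999.0, 0.0, 1.0, 0.0)
dt = 0.1                # days per step
for _ in range(int(365 / dt)):
    state = seir_step(*state, beta, epsilon, gamma, dt)

s, e, i, r = state
print(f"beta = {beta:.3f} per day")
print(f"fraction ever infected after one year = {(e + i + r) / sum(state):.2f}")
```

Because the model is deterministic and the population is closed and well mixed, the output should be read as a qualitative picture of what an R0 above 1 implies, not as a forecast.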

General

Measuring biological age using a functionally interpretable multi-tissue RNA clock.

In Aging cell ; h5-index 58.0
The quantification of the biological age of cells holds great promise for accelerating the discovery of novel rejuvenation strategies. Here, we present MultiTIMER, the first multi-tissue aging clock that measures the biological, rather than chronological, age of cells from their transcriptional profiles by evaluating key cellular processes. We applied MultiTIMER to more than 70,000 transcriptional profiles and demonstrate that it accurately responds to cellular stressors and known interventions while informing about dysregulated cellular functions.
Jung Sascha, Arcos Hodar Javier, Del Sol Antonio

2023-Mar-16

aging, machine learning, transcriptomics