Doctor Penguin

General

Machine learning models trained on synthetic datasets of multiple sample sizes for the use of predicting blood pressure from clinical data in a national dataset.

In PloS one ; h5-index 176.0

INTRODUCTION : The potential for synthetic data to act as a replacement for real data in research has attracted attention in recent months due to the prospect of increasing access to data and overcoming data privacy concerns when sharing data. The field of generative artificial intelligence and synthetic data is still early in its development, and there is a gap in research evidence that synthetic data can adequately be used to train algorithms that are then applied to real data. This study compares the performance of a series of machine learning models trained on real data and synthetic data, based on the National Diet and Nutrition Survey (NDNS).

METHODS : Features identified as potentially relevant by directed acyclic graphs were isolated from the NDNS dataset and used to construct synthetic datasets and impute missing data. Recursive feature elimination identified only four variables needed to predict mean arterial blood pressure: age, sex, weight and height. Bayesian generalised linear regression, random forest and neural network models were constructed based on these four variables to predict blood pressure. Models were trained on the real data training set (n = 2408), a synthetic data training set (n = 2408), a larger synthetic data training set (n = 4816), and a combination of the real and synthetic data training sets (n = 4816). The same test set (n = 424) was used for each model.

RESULTS : Synthetic datasets demonstrated a high degree of fidelity with the real dataset. There was no significant difference between the performance of models trained on real, synthetic or combined datasets. Mean average error across all models and all training data ranged from 8.12 to 8.33. This indicates that synthetic data was capable of training machine learning models as accurate as those trained on real data.

DISCUSSION : Further research is needed on a variety of datasets to confirm the utility of synthetic data to replace the use of potentially identifiable patient data. Further research is also urgently needed to establish whether synthetic data can truly protect patient privacy against adversarial attempts to re-identify real individuals from the synthetic dataset.

Arora Anmol, Arora Ananya

2023
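
The comparison design in the abstract above (several training sets, one shared test set, one error metric) can be sketched as follows. This is a minimal illustration under loud assumptions, not the study's pipeline: the data are randomly generated toy values rather than NDNS records, a one-feature least-squares line stands in for the Bayesian regression, random forest and neural network models, and `make_toy_data` and `synthesize` are hypothetical helpers invented for this sketch. Only the set sizes (2408, 4816, 424) come from the abstract.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def mean_absolute_error(model, xs, ys):
    """Average absolute gap between predictions and observed values."""
    a, b = model
    return statistics.fmean([abs(y - (a + b * x)) for x, y in zip(xs, ys)])


def make_toy_data(n, rng):
    """Hypothetical 'real' data: an invented age-to-pressure relationship."""
    xs = [rng.uniform(20, 80) for _ in range(n)]
    ys = [70 + 0.3 * x + rng.gauss(0, 10) for x in xs]
    return xs, ys


def synthesize(xs, ys, n, rng):
    """Toy generator (an assumption): resample x from a fitted normal and
    draw y from the fitted line plus noise matching the residual spread."""
    a, b = fit_line(xs, ys)
    resid_sd = statistics.pstdev([y - (a + b * x) for x, y in zip(xs, ys)])
    mx, sx = statistics.fmean(xs), statistics.pstdev(xs)
    new_x = [rng.gauss(mx, sx) for _ in range(n)]
    new_y = [a + b * x + rng.gauss(0, resid_sd) for x in new_x]
    return new_x, new_y


rng = random.Random(0)
train_x, train_y = make_toy_data(2408, rng)
test_x, test_y = make_toy_data(424, rng)  # the same test set scores every model
syn_x, syn_y = synthesize(train_x, train_y, 2408, rng)
big_x, big_y = synthesize(train_x, train_y, 4816, rng)

training_sets = {
    "real (n=2408)": (train_x, train_y),
    "synthetic (n=2408)": (syn_x, syn_y),
    "synthetic (n=4816)": (big_x, big_y),
    "real + synthetic (n=4816)": (train_x + syn_x, train_y + syn_y),
}

results = {
    name: mean_absolute_error(fit_line(x, y), test_x, test_y)
    for name, (x, y) in training_sets.items()
}
for name, mae in results.items():
    print(f"{name}: MAE = {mae:.2f}")
```

Holding the test set fixed is what makes the four error values comparable: any difference between them reflects the training data, not a change in the evaluation sample.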

General

AUNet: a deep learning method for spectral information classification to identify inks.

In Analytical methods : advancing methods and applications
Illegal tampering with document contents and forgery of contracts are common. In this work, we propose a U-shaped network with attention modules (AUNet) and combine it with a hyperspectral system to identify different inks, providing an effective detection method for such tampering and forgery. First, the hyperspectral system obtains the spectral information of different pen inks without destroying the sample. Second, because the hyperspectral system yields only small sample sets, we introduce U-Net to conduct the deep fusion of multi-level spectral information to avoid feature degradation and fully mine the deep features hidden in the spectral information. Finally, spatial and channel attention modules are introduced to focus on the features affecting classification performance. The results show that AUNet classifies ink spectral information effectively, achieving 97.81% accuracy, 98.71% recall, 98.80% precision, and 98.71% F₁-score.
Shi Yan, He Xinyu, Zhang Qinglun, Yin Chongbo, Feng Ninghui, Chen Haoming, Lin Hualing

2023-Mar-17
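
The four metrics reported in the AUNet abstract above can be computed from true and predicted labels as in the sketch below. The ink labels are invented for illustration, and macro-averaging over classes is an assumption: the abstract does not say which averaging scheme the authors used.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1.

    Macro-averaging (an unweighted mean over classes) is an assumption
    of this sketch, not something stated in the abstract.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    def safe_div(num, den):
        return num / den if den else 0.0

    precisions = [safe_div(tp[c], tp[c] + fp[c]) for c in labels]
    recalls = [safe_div(tp[c], tp[c] + fn[c]) for c in labels]
    f1s = [safe_div(2 * p * r, p + r) for p, r in zip(precisions, recalls)]
    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical ink classes for six spectra; one blue ink is misread as black.
y_true = ["blue", "blue", "black", "black", "red", "red"]
y_pred = ["blue", "black", "black", "black", "red", "red"]
metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```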

General

Corrigendum: Small target detection with remote sensing images based on an improved YOLOv5 algorithm.

In Frontiers in neurorobotics
[This corrects the article DOI: 10.3389/fnbot.2022.1074862.]
Pei Wenjing, Shi Zhanhao, Gong Kai

2023

EIoU loss, YOLOv5s, deep learning, remote sensing images, small target detection

General

Longitudinal proteomic investigation of COVID-19 vaccination.

In Protein & cell
Although the development of COVID-19 vaccines has been a remarkable success, the heterogeneity of individual antibody generation and decline over time remains poorly understood and hard to predict. In this study, blood samples were collected from 163 participants who then received two doses of an inactivated COVID-19 vaccine (CoronaVac®) at a 28-day interval. Using TMT-based proteomics, we identified 1,715 serum proteins and 7,342 peripheral blood mononuclear cell (PBMC) proteins. We proposed two sets of potential biomarkers (seven from serum, five from PBMCs) at baseline using machine learning, and predicted the individual seropositivity 57 days after vaccination (AUC = 0.87). Based on four potential PBMC biomarkers, we predicted the antibody persistence until 180 days after vaccination (AUC = 0.79). Our data highlighted characteristic hematological host responses, including altered lymphocyte migration regulation, neutrophil degranulation, and humoral immune response. This study proposed potential blood-derived protein biomarkers before vaccination for predicting heterogeneous antibody generation and decline after COVID-19 vaccination, shedding light on immunization mechanisms and individual booster shot planning.
Wang Yingrui, Zhu Qianru, Sun Rui, Yi Xiao, Huang Lingling, Hu Yifan, Ge Weigang, Gao Huanhuan, Ye Xinfu, Song Yu, Shao Li, Li Yantao, Li Jie, Guo Tiannan, Shi Junping

2023-Feb-06

COVID-19, machine learning, neutralizing antibodies (NAbs), proteomics, vaccination
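
The AUC values quoted in the vaccination abstract above summarize how well a score ranks positive cases above negative ones. A minimal sketch of that calculation, using the pairwise (Mann-Whitney) formulation, is shown below. The labels and scores are invented; they are not data from the study, and nothing here reflects how the authors built their classifier.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve as the fraction of (positive, negative)
    pairs in which the positive case gets the higher score; ties count half."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical outcomes (1 = seropositive) and model scores for eight people.
labels = [1, 1, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.7, 0.2, 0.6]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported 0.87 and 0.79 should be read.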

General

A deep reinforcement learning algorithm for the rectangular strip packing problem.

In PloS one ; h5-index 176.0
As a branch of the two-dimensional (2D) optimal blanking problem, rectangular strip packing is a typical non-deterministic polynomial-time hard (NP-hard) problem. The classical packing solution method relies on heuristic and metaheuristic algorithms. These usually require manually designed decision rules to guide the solution, resulting in a small solution scale, weak generalization, and low solution efficiency. Inspired by deep learning and reinforcement learning, combined with the characteristics of rectangular piece packing, a novel algorithm based on deep reinforcement learning is proposed in this work to solve the rectangular strip packing problem. The pointer network with an encoder and decoder structure is taken as the basic network for the deep reinforcement learning algorithm. A model-free reinforcement learning algorithm is designed to train network parameters to optimize the packing sequence. This design not only avoids designing heuristic rules separately for different problems but also uses deep networks with self-learning characteristics to solve a wider range of instances. At the same time, a piece positioning algorithm based on the maximum rectangles bottom-left (Maxrects-BL) heuristic is designed to determine the placement position of pieces on the plate and calculate model rewards and packing parameters. Finally, instances are used to analyze the optimization effect of the algorithm. The experimental results show that the proposed algorithm can produce three better and five comparable results compared with some classical heuristic algorithms. In addition, the calculation time of the proposed algorithm is less than 1 second in all test instances, which shows good generalization, solution efficiency, and practical application potential.
Fang Jie, Rao Yunqing, Shi Mingliang

2023
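
The strip packing abstract above splits the problem into choosing a packing sequence and then positioning each piece. The sketch below illustrates only the positioning step, using a simplified bottom-left rule over corner candidates. It is not the Maxrects-BL algorithm the authors describe and it contains no learning; the function names and the example pieces are assumptions made for illustration.

```python
def overlaps(a, b):
    """True if rectangles (x, y, w, h) share interior area."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def bottom_left_pack(sizes, strip_width):
    """Place pieces in the given order at the lowest, then leftmost,
    candidate position that fits inside the strip without overlap.

    Candidates are built from x = 0, y = 0 and the right and top edges
    of pieces already placed (a simplification of bottom-left packing).
    """
    placed = []
    for w, h in sizes:
        if w > strip_width:
            raise ValueError("piece is wider than the strip")
        xs = sorted({0} | {px + pw for px, _, pw, _ in placed})
        ys = sorted({0} | {py + ph for _, py, _, ph in placed})
        for x, y in ((x, y) for y in ys for x in xs):  # lowest y first
            rect = (x, y, w, h)
            if x + w <= strip_width and not any(overlaps(rect, p) for p in placed):
                placed.append(rect)
                break
    height = max((y + h for _, y, _, h in placed), default=0)
    return placed, height


# Hypothetical instance: five pieces given as (width, height), strip width 10.
pieces = [(6, 4), (4, 4), (5, 3), (5, 3), (10, 2)]
placed, height = bottom_left_pack(pieces, strip_width=10)
for rect in placed:
    print(rect)
print("used strip height:", height)
```

In the approach the abstract describes, the order of `pieces` is what the trained network would decide, and the resulting strip height is the kind of quantity a reward could be based on.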

Radiology

Quantitative bone marrow lesion, meniscus, and synovitis measurement: current status.

In Skeletal radiology
Imaging plays a pivotal role in osteoarthritis research, particularly in epidemiological and clinical trials of knee osteoarthritis (KOA), with the ultimate goal being the development of an effective drug treatment for future prevention or cessation of disease. Imaging assessment methods can be semi-quantitative, quantitative, or a combination, with quantitative methods usually relying on software to assist. The software generally attempts image segmentation (outlining of relevant structures). New techniques using artificial intelligence (AI) or deep learning (DL) are currently a frequent topic of research. This review article provides an overview of the literature to date, focusing primarily on the current status of quantitative software-based assessment techniques of KOA using magnetic resonance (MR) imaging. We will concentrate on the imaging evaluation of three specific structural imaging biomarkers: bone marrow lesions (BMLs), meniscus, and synovitis consisting of effusion synovitis (ES) and Hoffa's synovitis (HS). A brief clinical and imaging background review of osteoarthritis evaluation, particularly relating to these three structural markers, is provided as well as a general summary of the software methods. A summary of the literature with respect to each KOA assessment method will be presented overall as well as with respect to each specific biomarker individually. Novel techniques, as well as future goals and directions using quantitative imaging assessment, will be discussed.
Smith Stacy E, Bahouth Sara M, Duryea Jeffrey

2023-Mar-16

Bone marrow lesions, Effusion synovitis, Hoffa’s, MRI, Quantitative software, Meniscus