Doctor Penguin

Public Health

Examining thematic and emotional differences across Twitter, Reddit, and YouTube: The case of COVID-19 vaccine side effects.

In Computers in human behavior ; h5-index 125.0
Social media discourse has become a key data source for understanding the public's perceptions of, and sentiments during, a public health crisis. However, given the different niches that platforms occupy in terms of information exchange, reliance on a single platform would provide an incomplete picture of public opinion. Based on schema theory, this study proposes a 'social media platform schema' to describe users' different expectations based on their previous use of a platform, and argues that a platform's distinct characteristics foster a distinct platform schema and, in turn, information of a distinct nature. We analyzed COVID-19 vaccine side effect-related discussions from Twitter, Reddit, and YouTube, each of which represents a different type of platform, and found thematic and emotional differences across platforms. Thematic analysis using the k-means clustering algorithm identified seven clusters in each platform. To computationally group and contrast thematic clusters across platforms, we employed modularity analysis using the Louvain algorithm to determine a semantic network structure based on themes. We also observed differences in emotional contexts across platforms. Theoretical and public health implications are then discussed.
Kwon Soyeon, Park Albert

2023-Jul

Consumer health information, Schema theory, Social media, Social network analysis, Unsupervised machine learning
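
As a rough illustration of the k-means step named in the abstract above, the following Python sketch runs Lloyd's algorithm on a list of numeric vectors. It is not the authors' pipeline: the function name `kmeans`, the evenly spaced initialization, and the idea of feeding it small toy vectors in place of document embeddings are all assumptions made for illustration.

```python
import math


def kmeans(points, k, iterations=100):
    """Lloyd's algorithm on a list of equal-length numeric tuples.

    Returns (labels, centroids). Initialization is a simplification:
    centroids start at evenly spaced input points (an assumption made
    here for determinism; k-means++ is the usual practical choice).
    """
    centroids = [points[i * len(points) // k] for i in range(k)]
    labels = [None] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids
```

Calling `kmeans(vectors, 7)` on one platform's vectors would mirror the seven-cluster setting reported in the abstract; how the texts were vectorized is not stated there and is left open here.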

General

A study on Shine-Muscat grape detection at maturity based on deep learning.

In Scientific reports ; h5-index 158.0
The efficient detection of grapes is a crucial technology for fruit-picking robots. To better distinguish grapes from branch shading that is similar in color to the fruit, and to improve the detection accuracy for green grapes affected by cluster adhesion, this study proposes a Shine-Muscat Grape Detection Model (S-MGDM) for the ripening stage, based on an improved YOLOv3. DenseNet is fused into the backbone feature extraction network to extract richer underlying grape information; depthwise separable convolution, CBAM, and SPPNet are added to the multi-scale detection module to enlarge the receptive field for grape targets and reduce model computation; meanwhile, PANet is combined with FPN to promote inter-network information flow and iteratively extract grape features. In addition, the CIoU regression loss function is used, and the prior (anchor) box sizes are adjusted with the k-means algorithm to improve detection accuracy. The improved detection model achieves an AP value of 96.73% and an F1 value of 91% on the test set, which are 3.87% and 3% higher than the original network model, respectively; the average detection speed on GPU reaches 26.95 frames/s, which is 6.49 frames/s higher than the original model. Comparisons with several mainstream detection algorithms, such as SSD and the YOLO series, show that the method has excellent detection accuracy and good real-time performance, providing a useful reference for the accurate identification of Shine-Muscat grapes at maturity.
Wei Xinjie, Xie Fuxiang, Wang Kai, Song Jian, Bai Yang

2023-Mar-20
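
The CIoU regression loss mentioned in the abstract above can be sketched in a few lines of Python. This follows the commonly used definition (IoU, minus a normalized center-distance penalty, minus an aspect-ratio term); the function name `ciou_loss`, the `(x1, y1, x2, y2)` box format, and the `eps` stabilizer are assumptions for illustration, and the abstract does not say how S-MGDM implements it.

```python
import math


def ciou_loss(box_a, box_b, eps=1e-9):
    """CIoU loss for two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    aw, ah = ax2 - ax1, ay2 - ay1
    bw, bh = bx2 - bx1, by2 - by1

    # Plain IoU: intersection area over union area.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    iou = inter / (union + eps)

    # Squared distance between the two box centers.
    rho2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 \
        + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2
    # Squared diagonal of the smallest box enclosing both boxes.
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (
        math.atan(bw / (bh + eps)) - math.atan(aw / (ah + eps))
    ) ** 2
    alpha = v / (1 - iou + v + eps)

    return 1 - iou + rho2 / c2 + alpha * v
```

Unlike a plain IoU loss, this still gives a useful signal when the predicted and ground-truth boxes do not overlap, because the center-distance term keeps growing as the boxes move apart.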

General

CBCovid19EC: A dataset complete blood count and PCR test for COVID-19 detection in Ecuadorian population.

In Data in brief
In this work, we present the complete blood count data and PCR test results of a population of Ecuadorians from different provinces, primarily residing in the Andean region, especially in Quito. PCR has been the standard test for detecting Covid-19 since the pandemic began in 2020. The data were obtained between March 1st and August 12th, 2021. The Segurilab and Previne Salud laboratories performed the tests. The dataset contains about 400 clinical cases. Each patient agreed to participate in the study by sharing the results of their PCR (reverse transcription polymerase chain reaction) and CBC (complete blood count) tests. The CBC test measured several components and features of the blood, including red blood cells, white blood cells, hemoglobin, hematocrit, and platelets. The shared data are intended to provide researchers with input to analyze various events associated with the diagnosis of Covid-19 linked to potential diseases identified in the components measured in the CBC test. These data are helpful for pattern analysis of blood components in prediction and clustering problems. The components measured in the complete blood count and CRP together can be helpful for the analysis of different medical conditions using machine learning algorithms.
Ordoñez-Avila R, Parraga-Alava J, Hormaza J Meza, Vaca-Cárdenas L, Portmann E, Terán L, Dorn M

2023-Apr

Ecuador, Hematological data, Machine learning, SARS-Cov-2

Radiology

MEDIMP: Medical Images and Prompts for renal transplant representation learning

ArXiv Preprint
Renal transplantation emerges as the most effective solution for end-stage renal disease. Occurring from complex causes, a substantial risk of transplant chronic dysfunction persists and may lead to graft loss. Medical imaging plays a substantial role in renal transplant monitoring in clinical practice. However, graft supervision is multi-disciplinary, notably joining nephrology, urology, and radiology, while identifying robust biomarkers from such high-dimensional and complex data for prognosis is challenging. In this work, taking inspiration from the recent success of Large Language Models (LLMs), we propose MEDIMP -- Medical Images and Prompts -- a model to learn meaningful multi-modal representations of renal transplant Dynamic Contrast-Enhanced Magnetic Resonance Imaging (DCE MRI) by incorporating structural clinicobiological data after translating them into text prompts. MEDIMP is based on contrastive learning from joint text-image paired embeddings to perform this challenging task. Moreover, we propose a framework that generates medical prompts using automatic textual data augmentations from LLMs. Our goal is to learn meaningful manifolds of renal transplant DCE MRI, interesting for the prognosis of the transplant or patient status (2, 3, and 4 years after the transplant), fully exploiting the available multi-modal data in the most efficient way. Extensive experiments and comparisons with other renal transplant representation learning methods with limited data prove the effectiveness of MEDIMP in a relevant clinical setting, giving new directions toward medical prompts. Our code is available at https://github.com/leomlck/MEDIMP.
Leo Milecki, Vicky Kalogeiton, Sylvain Bodard, Dany Anglicheau, Jean-Michel Correas, Marc-Olivier Timsit, Maria Vakalopoulou

2023-03-22
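
To make the phrase "contrastive learning from joint text-image paired embeddings" in the abstract above concrete, here is a minimal Python sketch of a symmetric contrastive (InfoNCE-style) loss over a batch of paired embeddings. It is an assumption that this general form resembles what MEDIMP uses; the function name `symmetric_contrastive_loss`, the `temperature` default, and the list-of-lists input format are hypothetical, and the authors' repository is the place to check the actual objective.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def _diagonal_cross_entropy(logits):
    """Mean cross-entropy where row i's correct class is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        top = max(row)  # subtract the max for numerical stability
        log_sum_exp = top + math.log(sum(math.exp(x - top) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Pull matching image/text pairs together, push mismatched pairs apart.

    image_emb[i] and text_emb[i] are assumed to describe the same case.
    """
    images = [_normalize(v) for v in image_emb]
    texts = [_normalize(v) for v in text_emb]
    # Cosine similarity of every image with every text, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
        for img in images
    ]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-image direction
    return 0.5 * (_diagonal_cross_entropy(logits) + _diagonal_cross_entropy(logits_t))
```

The loss is small when each image embedding is most similar to its own prompt embedding and large when the pairing is scrambled, which is the property that lets text prompts built from clinicobiological data shape the image representation.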

Public Health

Splicing signature database development to delineate cancer pathways using literature mining and transcriptome machine learning.

In Computational and structural biotechnology journal
Alternative splicing (AS) events modulate certain pathways and phenotypic plasticity in cancer. Although previous studies have computationally analyzed splicing events, it is still a challenge to uncover the biological functions induced by reliable AS events among a vast number of candidates. To provide essential splicing event signatures for assessing pathway regulation, we developed a database by collecting two datasets: (i) reported literature and (ii) cancer transcriptome profiles. The former includes knowledge-based splicing signatures collected from 63,229 PubMed abstracts using natural language processing, extracted for 202 pathways. The latter comprises machine learning-based splicing signatures identified from the pan-cancer transcriptome for 16 cancer types and 42 pathways. We established six different learning models to classify pathway activities, using splicing profiles as the learning dataset. The AS events ranked highest by model feature importance became the signature for each pathway. To validate our learning results, we performed evaluations using (i) performance metrics, (ii) differential AS sets acquired from external datasets, and (iii) our knowledge-based signatures. The area under the receiver operating characteristic curve values of the learning models did not differ drastically. However, random forest clearly performed best when compared against the AS sets identified from external datasets and our knowledge-based signatures. Therefore, we used the signatures obtained from the random-forest model. Our database provides the clinical characteristics of the AS signatures, including survival test, molecular subtype, and tumor microenvironment. Regulation by splicing factors was additionally investigated. Our database of the developed signatures supports a retrieval and visualization system.
Lee Kyubin, Hyung Daejin, Cho Soo Young, Yu Namhee, Hong Sewha, Kim Jihyun, Kim Sunshin, Han Ji-Youn, Park Charny

2023

Alternative splicing, Database, Gene signature, Machine-learning, Text-mining, Tumor transcriptome
Abbreviations: AS, alternative splicing; AUCPR, the area under the precision-recall curve; AUROC, the area under the receiver operating characteristic; DAS, differential alternative splicing; EMT, epithelial mesenchymal transition; ML, machine learning; NER, named entity recognition; NLP, natural language process; PCA, principal component analysis; PSI, percent spliced in index; RF, random-forest; SF, splicing factor; TCGA, The Cancer Genome Atlas

General

A dataset on the physiological state and behavior of drivers in conditionally automated driving.

In Data in brief
This dataset contains data of 346 drivers collected during six experiments conducted in a fixed-base driving simulator. Five studies simulated conditionally automated driving (L3-SAE), and the other one simulated manual driving (L0-SAE). The dataset includes physiological data (electrocardiogram (ECG), electrodermal activity (EDA), and respiration (RESP)), driving and behavioral data (reaction time, steering wheel angle, …), performance data of non-driving-related tasks, and questionnaire responses. Among them, measures from standardized questionnaires were collected, either to control the experimental manipulation of the driver's state, or to measure constructs related to human factors and driving safety (drowsiness, mental workload, affective state, situation awareness, situational trust, user experience). In the provided dataset, some raw data have been processed, notably physiological data from which physiological indicators (or features) have been calculated. The latter can be used as input for machine learning models to predict various states (sleep deprivation, high mental workload, ...) that may be critical for driver safety. Subjective self-reported measures can also be used as ground truth to apply regression techniques. Besides that, statistical analyses can be performed using the dataset, in particular to analyze the situational awareness or the takeover quality of drivers, in different states and different driving scenarios. Overall, this dataset contributes to better understanding and consideration of the driver's state and behavior in conditionally automated driving. In addition, this dataset stimulates and inspires research in the fields of physiological/affective computing and human factors in transportation, and allows companies from the automotive industry to better design adapted human-vehicle interfaces for safe use of automated vehicles on the roads.
Meteier Quentin, Capallera Marine, de Salis Emmanuel, Angelini Leonardo, Carrino Stefano, Widmer Marino, Abou Khaled Omar, Mugellini Elena, Sonderegger Andreas

2023-Apr

Conditionally automated driving, Driver state, Electrocardiogram (ECG), Electrodermal activity (EDA), Physiology, Respiration, Situation awareness (SA), Takeover quality