Doctor Penguin

General

Forecast-Aware Model Driven LSTM

ArXiv Preprint
Poor air quality can have a significant impact on human health. The National Oceanic and Atmospheric Administration (NOAA) air quality forecasting guidance is challenged by the increasing presence of extreme air quality events due to extreme weather events such as wildfires and heatwaves. These extreme air quality events further affect human health. Traditional methods used to correct model bias make assumptions about linearity and the underlying distribution. Extreme air quality events tend to occur without a strong signal leading up to the event and this behavior tends to cause existing methods to either under- or overcompensate for the bias. Deep learning holds promise for air quality forecasting in the presence of extreme air quality events due to its ability to generalize and learn nonlinear problems. However, in the presence of these anomalous air quality events, standard deep network approaches that use a single network for generalizing to future forecasts may not always provide the best performance even with a full feature-set including geography and meteorology. In this work we describe a method that combines unsupervised learning and a forecast-aware bi-directional LSTM network to perform bias correction for operational air quality forecasting using AirNow station data for ozone and PM2.5 in the continental US. Using an unsupervised clustering method trained on station geographical features such as latitude and longitude, urbanization, and elevation, the learned clusters direct training by partitioning the training data for the LSTM networks. LSTMs are forecast-aware and implemented using a unique way to perform learning forward and backwards in time across forecasting days. When comparing the RMSE of the forecast model to the RMSE of the bias corrected model, the bias corrected model shows significant improvement (27% lower RMSE for ozone) over the base forecast.
Sophia Hamer, Jennifer Sleeman, Ivanka Stajner

2023-03-23
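
The abstract above describes partitioning training data by station cluster and comparing the RMSE of the base forecast with that of the bias-corrected forecast. The following is only a minimal sketch of that bookkeeping, not the authors' method: the cluster labels and all numbers are made up, the function names are hypothetical, and a per-cluster mean-bias subtraction stands in for the bi-directional LSTM described in the paper.

```python
import math
from collections import defaultdict


def rmse(pred, obs):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def partition_by_cluster(records):
    """Group (cluster_id, forecast, observed) records by cluster id."""
    groups = defaultdict(list)
    for cluster_id, forecast, observed in records:
        groups[cluster_id].append((forecast, observed))
    return groups


def fit_mean_bias(groups):
    """Stand-in corrector: one mean bias per cluster (NOT the paper's LSTM)."""
    return {
        cid: sum(f - o for f, o in pairs) / len(pairs)
        for cid, pairs in groups.items()
    }


def percent_rmse_reduction(rmse_base, rmse_corrected):
    """Relative RMSE improvement of the corrected forecast, in percent."""
    return 100.0 * (rmse_base - rmse_corrected) / rmse_base


# Made-up toy data: (cluster_id, forecast, observed). Cluster ids are assumed
# to come from an earlier clustering step on station features.
train = [(0, 52.0, 48.0), (0, 50.0, 46.0), (1, 30.0, 33.0), (1, 28.0, 31.0)]
test = [(0, 55.0, 50.0), (1, 29.0, 33.0)]

# Fit one corrector per cluster, then apply it to held-out forecasts.
bias = fit_mean_bias(partition_by_cluster(train))
base = [f for _, f, _ in test]
corrected = [f - bias[c] for c, f, _ in test]
obs = [o for _, _, o in test]

rmse_base = rmse(base, obs)
rmse_corr = rmse(corrected, obs)
reduction = percent_rmse_reduction(rmse_base, rmse_corr)
print(f"base RMSE={rmse_base:.2f}, corrected RMSE={rmse_corr:.2f}, "
      f"reduction={reduction:.1f}%")
```

A figure such as the reported 27% lower RMSE for ozone corresponds to the value returned by a calculation like `percent_rmse_reduction`; the toy numbers here are illustrative only and say nothing about the paper's results.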

General

Human Behavior in the Time of COVID-19: Learning from Big Data

ArXiv Preprint
Since the World Health Organization (WHO) characterized COVID-19 as a pandemic in March 2020, there have been over 600 million confirmed cases of COVID-19 and more than six million deaths as of October 2022. The relationship between the COVID-19 pandemic and human behavior is complicated. On one hand, human behavior is found to shape the spread of the disease. On the other hand, the pandemic has impacted and even changed human behavior in almost every aspect. To provide a holistic understanding of the complex interplay between human behavior and the COVID-19 pandemic, researchers have been employing big data techniques such as natural language processing, computer vision, audio signal processing, frequent pattern mining, and machine learning. In this study, we present an overview of the existing studies on using big data techniques to study human behavior in the time of the COVID-19 pandemic. In particular, we categorize these studies into three groups - using big data to measure, model, and leverage human behavior, respectively. The related tasks, data, and methods are summarized accordingly. To provide more insights into how to fight the COVID-19 pandemic and future global catastrophes, we further discuss challenges and potential opportunities.
Hanjia Lyu, Arsal Imtiaz, Yufei Zhao, Jiebo Luo

2023-03-23

General

Natural language processing models reveal neural dynamics of human conversation.

In bioRxiv : the preprint server for biology
Human verbal communication requires a rapid interplay between speech planning, production, and comprehension. These processes are subserved by local and long-range neural dynamics across widely distributed brain areas. How linguistic information is precisely represented during natural conversation or what shared neural processes are involved, however, remain largely unknown. Here we used intracranial neural recordings in participants engaged in free dialogue and employed deep learning natural language processing models to find a striking similarity not only between neural-to-artificial network activities but also between how linguistic information is encoded in the brain during production and comprehension. Collectively, neural activity patterns that encoded linguistic information were closely aligned to those reflecting speaker-listener transitions and were reduced after word utterance or when no conversation was held. They were also observed across distinct mesoscopic areas and frequency bands during production and comprehension, suggesting that these signals reflected the hierarchically structured information being conveyed during dialogue. Together, these findings suggest that linguistic information is encoded in the brain through similar neural representations during both speaking and listening, and start to reveal the distributed neural dynamics subserving human communication.
Cai Jing, Hadjinicolaou Alex E, Paulk Angelique C, Williams Ziv M, Cash Sydney S

2023-03-11

General

Digitally Diagnosing Multiple Developmental Delays using Crowdsourcing fused with Machine Learning: A Research Protocol.

In medRxiv : the preprint server for health sciences

BACKGROUND : Roughly 17% of minors in the United States aged 3 through 17 years have a diagnosis of one or more developmental or psychiatric conditions, with the true prevalence likely being higher due to underdiagnosis in rural areas and for minority populations. Unfortunately, timely diagnostic services are inaccessible to a large portion of the United States and global population due to cost, distance, and clinician availability. Digital phenotyping tools have the potential to shorten the time-to-diagnosis and to bring diagnostic services to more people by enabling accessible evaluations. While automated machine learning (ML) approaches for detection of pediatric psychiatry conditions have garnered increased research attention in recent years, existing approaches use a limited set of social features for the prediction task and focus on a single binary prediction.

OBJECTIVE : I propose the development of a gamified web system for data collection followed by a fusion of novel crowdsourcing algorithms with machine learning behavioral feature extraction approaches to simultaneously predict diagnoses of Autism Spectrum Disorder (ASD) and Attention-Deficit/Hyperactivity Disorder (ADHD) in a precise and specific manner.

METHODS : The proposed pipeline will consist of: (1) a gamified web application to curate videos of social interactions adaptively based on needs of the diagnostic system, (2) behavioral feature extraction techniques consisting of automated ML methods and novel crowdsourcing algorithms, and (3) development of ML models which classify several conditions simultaneously and which adaptively request additional information based on uncertainties about the data.

CONCLUSIONS : The prospect of high reward stems from the possibility of creating the first AI-powered tool which can identify complex social behaviors well enough to distinguish conditions with nuanced differentiators such as ASD and ADHD.

Washington Peter

2023-03-07

General

Conserved cysteine residues in Kaposi's sarcoma herpesvirus ORF34 are necessary for viral production and viral pre-initiation complex formation.

In bioRxiv : the preprint server for biology

Kaposi's sarcoma herpesvirus (KSHV) ORF34 is a component of the viral pre-initiation complex (vPIC), a highly conserved piece of machinery essential for late gene expression among beta- and gamma-herpesviruses. KSHV ORF34 is also estimated to be a hub protein, associated with the majority of vPIC components. However, the precise mechanisms underlying how the ORF34 molecule contributes to the vPIC function, including the binding manner to other vPIC components, remain unclear. Therefore, we constructed ORF34 alanine-scanning mutants, in which amino-acid residues that were conserved among other herpesviruses had been replaced by alanine. The mutants were analyzed for their binding functions to other vPIC factors, and then were evaluated for their ability to restore viral production in cells harboring ORF34-deficient KSHV-BAC. The results demonstrated that at least four cysteines conserved in ORF34 were crucial for binding to other vPIC components, ORF24 and ORF66, virus production, and late gene transcription and expression. Based on the amino acid sequence of ORF34, these four cysteines were expected to constitute a pair of C-Xn-C consensus motifs. An artificial intelligence-predicted structure model revealed that the four cysteines were present tetrahedrally in an intramolecular fashion. Another prediction algorithm indicated the possible capture of metal cations by ORF34. Furthermore, it was experimentally observed that the elimination of cations by a selective chelator resulted in the loss of ORF34's binding ability to other vPIC components. In conclusion, our results suggest the functional importance of KSHV ORF34 conserved cysteines for vPIC component assembly and viral replication.

IMPORTANCE : The gamma- and beta-herpesvirus families conserve the viral-factor-based mechanism for initiating viral late gene transcription. This viral pre-initiation complex (vPIC) is a functional analog of the cellular PIC consisting of general transcriptional factors. We focused on KSHV ORF34, an essential factor for viral replication as a vPIC component. The precise mechanism underlying vPIC formation and critical domain structure of ORF34 for its function are presently unclear. Therefore, we investigated the contribution of conserved amino-acid residues among ORF34 homologs to virus production, late gene expression, and interaction with other vPIC components. We demonstrated for the first time that four conserved cysteines (C170, C175, C256, and C259) in ORF34 are essential for vPIC formation, late gene transcription, and viral production. Importantly, the predicted structure model and biochemical experiment provide evidence showing that these four conserved cysteines are present in a tetrahedral formation that helps retain a metal cation.

Watanabe Tadashi, Narahari Akshara, Bhardwaj Esha, Kuriyama Kazushi, Nishimura Mayu, Izumi Taisuke, Fujimuro Masahiro, Ohno Shinji

2023-03-09

General

Aberrant phase separation is a common killing strategy of positively charged peptides in biology and human disease.

In bioRxiv : the preprint server for biology
Positively charged repeat peptides are emerging as key players in neurodegenerative diseases. These peptides can perturb diverse cellular pathways but a unifying framework for how such promiscuous toxicity arises has remained elusive. We used mass-spectrometry-based proteomics to define the protein targets of these neurotoxic peptides and found that they all share similar sequence features that drive their aberrant condensation with these positively charged peptides. We trained a machine learning algorithm to detect such sequence features and unexpectedly discovered that this mode of toxicity is not limited to human repeat expansion disorders but has evolved countless times across the tree of life in the form of cationic antimicrobial and venom peptides. We demonstrate that an excess in positive charge is necessary and sufficient for this killer activity, which we name 'polycation poisoning'. These findings reveal an ancient and conserved mechanism and inform ways to leverage its design rules for new generations of bioactive peptides.
Boeynaems Steven, Ma X Rosa, Yeong Vivian, Ginell Garrett M, Chen Jian-Hua, Blum Jacob A, Nakayama Lisa, Sanyal Anushka, Briner Adam, Van Haver Delphi, Pauwels Jarne, Ekman Axel, Schmidt H Broder, Sundararajan Kousik, Porta Lucas, Lasker Keren, Larabell Carolyn, Hayashi Mirian A F, Kundaje Anshul, Impens Francis, Obermeyer Allie, Holehouse Alex S, Gitler Aaron D

2023-03-09