
General

High performance logistic regression for privacy-preserving genome analysis.

In BMC medical genomics

BACKGROUND : In biomedical applications, valuable data is often split between owners who cannot openly share the data because of privacy regulations and concerns. Training machine learning models on the joint data without violating privacy is a major technology challenge that can be addressed by combining techniques from machine learning and cryptography. When collaboratively training machine learning models with the cryptographic technique named secure multi-party computation, the price paid for keeping the data of the owners private is an increase in computational cost and runtime. A careful choice of machine learning techniques, together with algorithmic and implementation optimizations, is necessary to enable practical secure machine learning over distributed data sets. Such optimizations can be tailored to the kind of data and machine learning problem at hand.

METHODS : Our setup involves secure two-party computation protocols, along with a trusted initializer that distributes correlated randomness to the two computing parties. We use a gradient-descent-based algorithm for training a logistic-regression-like model with a clipped ReLU activation function, and we break down the algorithm into corresponding cryptographic protocols. Our main contributions are a new protocol for computing the activation function that requires neither secure comparison protocols nor Yao's garbled circuits, and a series of cryptographic engineering optimizations to improve the performance.
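As a rough illustration of the setup described above, the sketch below shows how two parties holding additive secret shares can multiply values using correlated randomness (a multiplication triple) handed out by a trusted initializer. This is the standard Beaver-triple technique, not necessarily the exact protocol of the paper. The ring size `2**64`, all function names, and the breakpoints of the plaintext `clipped_relu` are assumptions made for this example, and both parties are simulated in a single process. The paper's new secure activation protocol is not reproduced here.

```python
import secrets

MOD = 2 ** 64  # ring for additive shares (assumed for this sketch)


def share(x):
    """Split x into two additive shares modulo MOD."""
    s0 = secrets.randbelow(MOD)
    return s0, (x - s0) % MOD


def reconstruct(s0, s1):
    """Recombine two additive shares into the underlying value."""
    return (s0 + s1) % MOD


def trusted_initializer_triple():
    """Trusted initializer: sample a and b, set c = a*b, hand out shares."""
    a = secrets.randbelow(MOD)
    b = secrets.randbelow(MOD)
    c = (a * b) % MOD
    return share(a), share(b), share(c)


def secure_multiply(x_sh, y_sh, triple):
    """Return shares of x*y given shares of x and y plus one triple."""
    (a0, a1), (b0, b1), (c0, c1) = triple
    # Each party masks its shares locally; only d = x - a and e = y - b
    # are opened, and they reveal nothing because a and b are uniform.
    d = reconstruct((x_sh[0] - a0) % MOD, (x_sh[1] - a1) % MOD)
    e = reconstruct((y_sh[0] - b0) % MOD, (y_sh[1] - b1) % MOD)
    # x*y = c + d*b + e*a + d*e; the public term d*e is added by party 0.
    z0 = (c0 + d * b0 + e * a0 + d * e) % MOD
    z1 = (c1 + d * b1 + e * a1) % MOD
    return z0, z1


def clipped_relu(x):
    """Plaintext clipped ReLU (assumed form): x + 0.5 clamped to [0, 1]."""
    return min(max(x + 0.5, 0.0), 1.0)


# Demo: multiply 6 and 7 without either share revealing an input.
product_shares = secure_multiply(share(6), share(7), trusted_initializer_triple())
print(reconstruct(*product_shares))
```

Each secure multiplication consumes one fresh triple, which is why the number of multiplications is a natural cost measure for this kind of protocol.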

RESULTS : For our largest gene expression data set, we train a model that requires over 7 billion secure multiplications; the training completes in about 26.90 s in a local area network. The implementation in this work is a further optimized version of the implementation with which we won first place in Track 4 of the iDASH 2019 secure genome analysis competition.

CONCLUSIONS : In this paper, we present a secure logistic regression training protocol and its implementation, with a new subprotocol to securely compute the activation function. To the best of our knowledge, we present the fastest existing secure multi-party computation implementation for training logistic regression models on high dimensional genome data distributed across a local area network.

De Cock Martine, Dowsley Rafael, Nascimento Anderson C A, Railsback Davis, Shen Jianwei, Todoki Ariel


Gene expression data, Gradient descent, Logistic regression, Machine learning, Secure multi-party computation

Pathology

Anatomical and Pathological Observation and Analysis of SARS and COVID-19: Microthrombosis Is the Main Cause of Death.

In Biological procedures online

The spread of the coronavirus SARS-CoV-2 (the cause of COVID-19) has caused a large number of deaths around the world. We summarized the data reported in the past few months and emphasized that the main causes of death of COVID-19 patients are DAD (Diffuse Alveolar Damage) and DIC (Disseminated Intravascular Coagulation). Microthrombosis is a prominent clinical feature of COVID-19, and 91.3% of dead patients had microthrombosis. Endothelial damage caused by SARS-CoV-2 cell invasion and subsequent host response disorders involving inflammation and coagulation pathways play a key role in the progression of severe COVID-19. Microvascular thrombosis may lead to microcirculation disorders and multiple organ failure, and ultimately to death. The characteristic pathological changes of DAD include alveolar epithelial and vascular endothelial injury, increased alveolar membrane permeability, extensive neutrophil infiltration, and alveolar hyaline membrane formation, with hypoxemia and respiratory distress as the main clinical manifestations. DAD leads to ARDS in COVID-19 patients. DIC is a syndrome characterized by the activation of systemic intravascular coagulation, which leads to extensive fibrin deposition in the blood. Its occurrence and development begin with the expression of tissue factor and involve interaction with physiological anticoagulation pathways. The down-regulation of fibrin and the impaired fibrinolysis together lead to extensive fibrin deposition. DIC is characterized by a decreased platelet count, increased fibrin degradation products such as D-dimer, and low fibrinogen. The formation of microthrombi leads to the disturbance of microcirculation, which in turn leads to the death of the patient.
However, the best prevention and treatment of COVID-19 microthrombosis is still uncertain. This review discusses the latest findings of basic and clinical research on COVID-19-related microthrombosis and proposes the theory of microcirculation perfusion bundle therapy to explore effective methods for preventing and treating it. Further research is urgently needed to clarify how SARS-CoV-2 infection causes thrombotic complications and how it affects the course and severity of the disease, in order to build a more comprehensive understanding of the underlying mechanism of this disease and to raise awareness of the importance of preventing and treating microthrombosis in patients with COVID-19.

Chen Wenjing, Pan Jing Ye


Autopsy, COVID-19, Diffuse Alveolar Damage, Disseminated Intravascular Coagulation, Microcirculation Dysfunction, Pathology, SARS-CoV-2

General

Development and Validation of a Machine learning prediction model of respiratory failure within 48 hours of patient admission for COVID-19.

In Journal of medical Internet research ; h5-index 88.0

BACKGROUND : Predicting early respiratory failure in COVID-19 can help triage patients to higher levels of care, allocate scarce resources, and reduce morbidity and mortality by appropriately monitoring and treating patients at greatest risk for deterioration. Given the complexity of COVID-19 disease, machine learning (ML) approaches may support clinical decision making for patients with this disease.

OBJECTIVE : Our objective is to derive a machine learning model that predicts respiratory failure within 48 hours of admission based on data from the emergency department (ED).

METHODS : Data was collected from patients with COVID-19 who were admitted to Northwell Health acute care hospitals and discharged, died, or spent a minimum of 48 hours in the hospital between March 1, 2020 and May 11, 2020. Of 11,525 patients, 933 (8.1%) were placed on invasive mechanical ventilation within 48 hours of admission. Variables used by the models included clinical and laboratory data commonly collected in the ED. We trained and validated three predictive models (two based on XGBoost, one that utilized logistic regression) using cross-hospital validation. We compared the performance of all three models and of an established early warning score, the Modified Early Warning Score (MEWS), using receiver operating characteristic (ROC) curves, precision-recall (PR) curves, and other metrics.

RESULTS : The XGBoost model had the highest mean accuracy of 0.919 (AUC = 0.77), outperforming the other two models as well as MEWS. Important predictor variables included the type of oxygen delivery used in the ED, patient age, Emergency Severity Index (ESI), respiratory rate, serum lactate, and demographic characteristics.
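When reading accuracy figures for an imbalanced outcome, it can help to compare them with the accuracy of a trivial classifier that always predicts the majority class. The short calculation below is a plain arithmetic check on the cohort counts quoted in the abstract, not a reanalysis of the study. It assumes that the outcome is invasive mechanical ventilation within 48 hours and that the pooled cohort prevalence applies. The variable names are chosen for this example.

```python
# Cohort counts as quoted in the abstract.
total_patients = 11525
ventilated_within_48h = 933

# Share of patients with the outcome (positive class).
prevalence = ventilated_within_48h / total_patients

# Accuracy of a classifier that always predicts "no respiratory failure".
majority_baseline_accuracy = 1 - prevalence

print(f"prevalence: {prevalence:.3f}")
print(f"always-negative accuracy: {majority_baseline_accuracy:.3f}")
```

With about 8.1% positives, the always-negative baseline already reaches an accuracy of about 0.919, which is why threshold-independent measures such as the ROC and PR curves mentioned in the abstract are informative alongside accuracy.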

CONCLUSIONS : XGBoost has high predictive accuracy, outperforming other early warning scores. The clinical plausibility and predictive ability of XGBoost suggest that the model could be used to predict 48-hour respiratory failure in admitted patients with COVID-19.


Bolourani Siavash, Brenner Max, Wang Ping, McGinn Thomas, Hirsch Jamie, Barnaby Douglas, Zanos Theodoros


General

Prediction of adverse drug reactions using drug convolutional neural networks.

In Journal of bioinformatics and computational biology

Prediction of Adverse Drug Reactions (ADRs) has been an important aspect of pharmacovigilance because of its impact on the pharma industry. The standard process of introducing a new drug into a market involves many clinical trials and tests. This is a tedious and time-consuming process that also requires substantial monetary resources. Faster approval of a drug helps the patients who are in need of it. The in silico prediction of ADRs can help speed up this process. The challenges involved are the lack of negative data and the need to predict ADRs from the chemical structure alone. Although many models are already available to predict ADRs, most of them use biological activity identifiers and chemical and physical properties in addition to the chemical structures of the drugs. For most new drugs to be tested, however, only chemical structures will be available. Existing models that predict ADRs using only chemical structures do not perform well. Therefore, this paper proposes an efficient method for predicting ADRs from the chemical structure alone. The proposed method involves a separate model for each ADR, making it a binary classification problem. This paper presents a novel CNN model called Drug Convolutional Neural Network (DCNN) to predict ADRs using the chemical structures of the drugs. The performance is measured using metrics such as Accuracy, Recall, Precision, Specificity, F1 score, AUROC and MCC. The proposed DCNN model outperforms the competing models on the SIDER4.1 database in terms of all the metrics. A case study was performed on drugs recommended for COVID-19, where the ADRs predicted by the proposed model were well aligned with the observations made by medical professionals using conventional methods.
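Because each ADR is treated as its own binary classification problem, most of the listed metrics can be computed from a single confusion matrix per ADR. The sketch below uses the standard textbook definitions. The function name `binary_metrics` and the example counts are made up for illustration and do not come from the paper. AUROC is omitted because it needs ranked prediction scores rather than hard labels.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute common binary classification metrics from confusion counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # sensitivity
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # Matthews correlation coefficient; defined as 0 when a margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts for one ADR: 40 true positives, 10 false positives,
# 45 true negatives, 5 false negatives.
example = binary_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

MCC is often reported alongside accuracy in settings like this because it uses all four confusion-matrix cells and is less flattering than accuracy when the classes are imbalanced.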

Mantripragada Anjani Sankar, Teja Sai Phani, Katasani Rohith Reddy, Joshi Pratik, Masilamani V, Ramesh Raj


Adverse drug reactions, CNN, COVID-19, deep learning, health informatics, machine learning, pharmacovigilance

Public Health

How artificial intelligence may help the Covid-19 pandemic: Pitfalls and lessons for the future.

In Reviews in medical virology

The clinical severity, rapid transmission and human losses due to coronavirus disease 2019 (Covid-19) have led the World Health Organization to declare it a pandemic. Traditional epidemiological tools are being significantly complemented by recent innovations, especially those using artificial intelligence (AI) and machine learning. AI-based model systems could improve pattern recognition of disease spread in populations and predictions of outbreaks in different geographical locations. A variable and a minimal amount of data are available for the signs and symptoms of Covid-19, allowing a composite of maximum likelihood algorithms to be employed to enhance the accuracy of disease diagnosis and to identify potential drugs. AI-based forecasting and predictions are expected to complement traditional approaches by helping public health officials to select better response and preparedness measures against Covid-19 cases. AI-based approaches have helped address the key issues, but a significant impact on the global healthcare industry is yet to be achieved. The capability of AI to address these challenges may make it a key player in the operation of healthcare systems in the future. Here, we present an overview of the prospective applications of AI model systems in healthcare settings during the ongoing Covid-19 pandemic.

Malik Yashpal Singh, Sircar Shubhankar, Bhat Sudipta, Ansari Mohd Ikram, Pande Tripti, Kumar Prashant, Mathapati Basavaraj, Balasubramanian Ganesh, Kaushik Rahul, Natesan Senthilkumar, Ezzikouri Sayeh, El Zowalaty Mohamed E, Dhama Kuldeep


SARS-CoV-2, artificial intelligence, covid-19, diagnosis, epidemiology, therapeutic developments

General

Impact of Machine Learning Pipeline Choices in Autism Prediction From Functional Connectivity Data.

In International journal of neural systems

Autism Spectrum Disorder (ASD) is a highly prevalent neurodevelopmental condition with a large social and economic impact that affects the entire life of families. There is an intense search for biomarkers that can be assessed as early as possible in order to initiate treatment and prepare the family to deal with the challenges imposed by the condition. Brain imaging biomarkers are of special interest. Specifically, functional connectivity data extracted from resting state functional magnetic resonance imaging (rs-fMRI) should allow the detection of brain connectivity alterations. Machine learning pipelines encompass the estimation of the functional connectivity matrix from brain parcellations, feature extraction, and building classification models for ASD prediction. The works reported in the literature are very heterogeneous from the computational and methodological points of view. In this paper, we carry out a comprehensive computational exploration of the impact of the choices involved while building these machine learning pipelines. Specifically, we consider six brain parcellation definitions, five methods for functional connectivity matrix construction, six feature extraction/selection approaches, and nine classifier building algorithms. We report the prediction performance sensitivity to each of these choices, as well as the best results that are comparable with the state of the art.
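To give a sense of the size of the design space described above, the sketch below enumerates every combination of the four pipeline choices. The option names are placeholders, since the abstract gives only the number of options per stage. Whether the study evaluated the full Cartesian product is an assumption of this example.

```python
from itertools import product

# Placeholder names; only the counts per stage come from the abstract.
parcellations = [f"parcellation_{i}" for i in range(1, 7)]          # six
connectivity_methods = [f"connectivity_{i}" for i in range(1, 6)]   # five
feature_methods = [f"features_{i}" for i in range(1, 7)]            # six
classifiers = [f"classifier_{i}" for i in range(1, 10)]             # nine

# Every pipeline is one choice per stage, so the grid is a Cartesian product.
pipelines = list(product(parcellations, connectivity_methods,
                         feature_methods, classifiers))

print(len(pipelines))  # 6 * 5 * 6 * 9 = 1620
```

Under that assumption there would be 1,620 distinct pipelines, each needing its own cross-validated evaluation, which illustrates why a systematic sensitivity analysis of this kind is computationally demanding.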

Graña Manuel, Silva Moises


Autism, brain functional connectivity, brain parcellation, feature extraction, machine learning