
General

Detection and classification of breast cancer using Logistic Regression feature selection and GMDH classifier.

In Journal of biomedical informatics ; h5-index 55.0

Breast cancer is the most common cancer among women, so a precise and reliable system for distinguishing benign from malignant tumors is critical. Using the results of Fine Needle Aspiration (FNA) cytology together with machine learning techniques, this cancer can now be detected and diagnosed early with greater accuracy. In this paper, we propose a two-step method. In the first step, logistic regression is used to eliminate the less important features. In the second step, the Group Method Data Handling (GMDH) neural network is used to classify samples as benign or malignant. To evaluate the performance of the proposed method, three datasets (WBCD, WDBC and WPBC) are investigated using the following metrics: precision, the area under the ROC curve (AUC), true positive rate, false positive rate, accuracy and F-criteria. Simulation results show that the proposed method reaches a precision of 99.4% on WBCD, 99.6% on WDBC and 96.9% on WPBC.

Khandezamin Ziba, Naderan Marjan, Javad Rashti Mohammad


Group Method Data Handling, breast cancer, feature selection, logistic regression, machine learning
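
As a rough illustration of the evaluation metrics listed in the abstract above, the following sketch computes precision, true positive rate, false positive rate, accuracy and an F-score from binary predictions. It is not the authors' code: the function name `binary_metrics`, the label convention (1 = malignant, 0 = benign) and the reading of "F-criteria" as the F1 score are assumptions made for this example. AUC is left out because it needs continuous scores rather than hard labels.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    precision: float
    true_positive_rate: float
    false_positive_rate: float
    accuracy: float
    f1: float


def _ratio(numerator, denominator):
    # Return 0.0 instead of raising when a denominator is empty.
    return numerator / denominator if denominator else 0.0


def binary_metrics(y_true, y_pred, positive=1):
    """Compute confusion-matrix metrics for hard binary predictions.

    Hypothetical helper: `positive` marks the malignant class by assumption.
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1

    precision = _ratio(tp, tp + fp)
    tpr = _ratio(tp, tp + fn)  # also called recall or sensitivity
    fpr = _ratio(fp, fp + tn)
    accuracy = _ratio(tp + tn, tp + fp + tn + fn)
    f1 = _ratio(2 * precision * tpr, precision + tpr)
    return BinaryMetrics(precision, tpr, fpr, accuracy, f1)


if __name__ == "__main__":
    # Toy labels, not taken from WBCD, WDBC or WPBC.
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
    print(binary_metrics(y_true, y_pred))
```

Precision and true positive rate answer different questions: the first is the fraction of predicted malignant cases that are truly malignant, the second is the fraction of truly malignant cases that were found.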

Oncology

Ensemble learning models that predict surface protein abundance from single-cell multimodal omics data.

In Methods (San Diego, Calif.)

Single-cell protein abundance is a fundamental type of information for characterizing cell states. Due to high cost and technical barriers, however, direct quantification of proteins is difficult. Single-cell RNA sequencing (scRNA-seq) data, which serve as a cost-effective substitute for single-cell proteomics, may not accurately reflect protein expression levels because of measurement error, noise, and post-transcriptional and translational regulation. Recently emerging single-cell multimodal omics technologies, e.g. CITE-seq and REAP-seq, can simultaneously profile RNA and protein abundances in single cells, providing labeled data for predictive modeling in a supervised learning framework. A deep neural network-based transfer learning method has been applied to the imputation of surface protein abundance from single-cell transcriptomic data. However, it is unclear whether the artificial neural network is the best model, and it is desirable to improve the prediction performance (e.g. accuracy, interpretability) of machine learning models. In this paper, we compared several tree-based ensemble learning methods with neural network models, and found that ensemble learning often performed better than neural networks, with Random Forest (RF) performing best overall. Moreover, we used the feature importance scores from RF to interpret the biological mechanisms underlying the predictions. Our study demonstrates the effectiveness of ensemble learning for reliable protein abundance prediction using single-cell multimodal omics data, and paves the way for knowledge discovery by mining single-cell multi-omics data at large scale.

Xu Fan, Wang Shike, Dai Xinnan, Mundra Piyushkumar A, Zheng Jie


CITE-seq, Ensemble learning, Protein abundance, REAP-seq, Single cell, Transcriptomic

Oncology

Acute myeloid leukemia and artificial intelligence, algorithms and new scores.

In Best practice & research. Clinical haematology

Artificial intelligence, and more narrowly machine-learning, is beginning to expand humanity's capacity to analyze increasingly large and complex datasets. Advances in computer hardware and software have led to breakthroughs in multiple sectors of our society, including a burgeoning role in medical research and clinical practice. As the volume of medical data grows at an apparently exponential rate, particularly since the human genome project laid the foundation for modern genetic inquiry, informatics tools like machine learning are becoming crucial in analyzing these data to provide meaningful tools for diagnostic, prognostic, and therapeutic purposes. Within medicine, hematologic diseases can be particularly challenging to understand and treat given the increasingly complex and intercalated genetic, epigenetic, immunologic, and regulatory pathways that must be understood to optimize patient outcomes. In acute myeloid leukemia (AML), new developments in machine learning algorithms have enabled a deeper understanding of disease biology and the development of better prognostic and predictive tools. Ongoing work in the field brings these developments incrementally closer to clinical implementation.

Radakovich Nathan, Cortese Matthew, Nazha Aziz


Acute myeloid leukemia, Artificial intelligence, Genomics, Machine learning, Malignant hematology, Multi-omics, Risk stratification

Cardiology

Digital cardiovascular care in COVID-19 pandemic: A potential alternative?

In Journal of cardiac surgery ; h5-index 21.0

BACKGROUND: Cardiovascular patients are at increased risk of acquiring coronavirus disease 2019 (COVID-19) infection during visits to healthcare facilities. There is a need for alternative tools for optimal monitoring and management of cardiovascular patients in the present pandemic situation. Digital health care may prove to be a revolutionary new tool for protecting cardiovascular patients from coronavirus disease by avoiding routine visits to health care facilities that are already overwhelmed with COVID-19 patients.

METHODS: To evaluate the role of digital health care in the present era of the COVID-19 pandemic, we reviewed the published literature on digital health services providing cardiovascular care.

RESULTS AND CONCLUSION: Digital health, including telemedicine services, robotic telemedicine carts, artificial intelligence and machine learning, and digital gadgets such as smartwatches and web-based applications, may be a safe alternative for the management of cardiovascular patients in the present pandemic situation.

Kaushik Atul, Patel Surendra, Dubey Kalika


COVID-19 pandemic, artificial intelligence, cardiovascular care, digital health, telemedicine

General

Unsupervised explainable AI for simultaneous molecular evolutionary study of forty thousand SARS-CoV-2 genomes

bioRxiv Preprint

Unsupervised AI (artificial intelligence) can obtain novel knowledge from big data without particular models or prior knowledge and is highly desirable for unveiling hidden features in big data. SARS-CoV-2 poses a serious threat to public health, and one important issue in characterizing this fast-evolving virus is to elucidate various aspects of its genome sequence changes. We previously established an unsupervised AI method, the BLSOM (batch-learning SOM), which can analyze five million genomic sequences simultaneously. The present study applied the BLSOM to the oligonucleotide compositions of forty thousand SARS-CoV-2 genomes. Although only the oligonucleotide composition was given, the resulting clusters of genomes corresponded primarily to the known main clades and to internal divisions within those clades. Since the BLSOM is explainable AI, it reveals which features of the oligonucleotide composition are responsible for clade clustering. The BLSOM has powerful image display capabilities and enables efficient knowledge discovery about viral evolutionary processes.

Ikemura, T.; Wada, K.; Wada, Y.; Iwasaki, Y.; Abe, T.
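
The abstract above says the BLSOM is given only the oligonucleotide composition of each genome. A minimal sketch of how such a composition vector might be computed is shown below. It covers the input features only, not the BLSOM itself, and it is not the authors' pipeline: the function name `oligo_composition`, the choice of `k`, and the decision to skip windows containing ambiguous bases are all assumptions made for this example.

```python
from itertools import product


def oligo_composition(sequence, k=2):
    """Return the relative frequency of every length-k oligonucleotide.

    Hypothetical helper: windows containing characters other than
    A, C, G, T (for example N) are skipped by assumption.
    """
    alphabet = "ACGT"
    # Fixed key order so that vectors from different genomes are comparable.
    counts = {"".join(kmer): 0 for kmer in product(alphabet, repeat=k)}

    sequence = sequence.upper()
    total = 0
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        if window in counts:
            counts[window] += 1
            total += 1

    if total == 0:
        return {kmer: 0.0 for kmer in counts}
    return {kmer: count / total for kmer, count in counts.items()}


if __name__ == "__main__":
    # Toy sequence, not a real SARS-CoV-2 genome.
    composition = oligo_composition("ACGTACGT", k=2)
    nonzero = {kmer: freq for kmer, freq in composition.items() if freq > 0}
    print(nonzero)
```

Each genome becomes a vector of 4^k frequencies regardless of its length, which is what allows many genomes to be compared and clustered without aligning them.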


Pathology

Implicit Subspace Prior Learning for Dual-Blind Face Restoration

ArXiv Preprint

Face restoration is an inherently ill-posed problem, where additional prior constraints are typically considered crucial for mitigating such pathology. However, real-world image priors are often hard to simulate with precise mathematical models, which inevitably limits the performance and generalization ability of existing prior-regularized restoration methods. In this paper, we study the problem of face restoration under a more practical "dual-blind" setting, i.e., without prior assumptions or hand-crafted regularization terms on the degradation profile or image contents. To this end, a novel implicit subspace prior learning (ISPL) framework is proposed as a generic solution to dual-blind face restoration, with two key elements: 1) an implicit formulation to circumvent the ill-defined restoration mapping and 2) a subspace prior decomposition and fusion mechanism to dynamically handle inputs at varying degradation levels with consistent high-quality restoration results. Experimental results demonstrate significant perception-distortion improvement of ISPL over existing state-of-the-art methods on a variety of restoration subtasks, including a 3.69 dB PSNR gain and a 45.8% FID gain over ESRGAN, the 2018 NTIRE SR challenge winner. Overall, we prove that it is possible to capture and utilize prior knowledge without explicitly formulating it, which will help inspire new research paradigms for low-level vision tasks.

Lingbo Yang, Pan Wang, Zhanning Gao, Shanshe Wang, Peiran Ren, Siwei Ma, Wen Gao
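
To give the PSNR figure quoted in the abstract above some context, the sketch below computes peak signal-to-noise ratio using its standard definition, 10 * log10(MAX^2 / MSE). It is a generic illustration and not the ISPL evaluation code: the function name `psnr`, the flat list representation of pixels, and the 8-bit peak value of 255 are assumptions made for this example.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists.

    Hypothetical helper: images are flattened to 1-D sequences and
    assumed to use an 8-bit range unless `max_value` says otherwise.
    """
    if len(reference) != len(restored):
        raise ValueError("images must have the same number of pixels")
    if not reference:
        raise ValueError("images must not be empty")

    squared_error = sum((r - s) ** 2 for r, s in zip(reference, restored))
    mse = squared_error / len(reference)
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    # Toy pixel values, not taken from any benchmark.
    reference = [100, 100, 100, 100]
    restored = [110, 90, 110, 90]
    print(f"{psnr(reference, restored):.2f} dB")
```

Because PSNR is logarithmic in the mean squared error, and assuming both methods are scored against the same peak value, a gain of 3.69 dB corresponds to a mean squared error that is smaller by a factor of about 10^0.369, or roughly 2.3.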