General General

Subclonal reconstruction of tumors by using machine learning and population genetics.

In Nature genetics ; h5-index 174.0

Most cancer genomic data are generated from bulk samples composed of mixtures of cancer subpopulations, as well as normal cells. Subclonal reconstruction methods based on machine learning aim to separate those subpopulations in a sample and infer their evolutionary history. However, current approaches are entirely data driven and agnostic to evolutionary theory. We demonstrate that systematic errors occur in the analysis if evolution is not accounted for, and this is exacerbated with multi-sampling of the same tumor. We present a novel approach for model-based tumor subclonal reconstruction, called MOBSTER, which combines machine learning with theoretical population genetics. Using public whole-genome sequencing data from 2,606 samples from different cohorts, new data and synthetic validation, we show that this method is more robust and accurate than current techniques in single-sample, multiregion and longitudinal data. This approach minimizes the confounding factors of nonevolutionary methods, thus leading to more accurate recovery of the evolutionary history of human cancers.

Caravagna Giulio, Heide Timon, Williams Marc J, Zapata Luis, Nichol Daniel, Chkhaidze Ketevan, Cross William, Cresswell George D, Werner Benjamin, Acar Ahmet, Chesler Louis, Barnes Chris P, Sanguinetti Guido, Graham Trevor A, Sottoriva Andrea

2020-Sep

General General

Hybrid Harris hawks optimization with cuckoo search for drug design and discovery in chemoinformatics.

In Scientific reports ; h5-index 158.0

One of the major drawbacks of cheminformatics is the large amount of information present in the datasets. In the majority of cases, this information contains redundant instances that affect the analysis of similarity measurements with respect to drug design and discovery. Therefore, classical methods such as the protein bank database and quantum mechanical calculations are insufficient owing to the dimensionality of the search spaces. In this paper, we introduce a hybrid metaheuristic algorithm called CHHO-CS, which combines the Harris hawks optimizer (HHO) with two operators: cuckoo search (CS) and chaotic maps. The role of CS is to control the main position vectors of the HHO algorithm to maintain the balance between the exploitation and exploration phases, while the chaotic maps are used to update the control energy parameters to avoid falling into local optima and premature convergence. Feature selection (FS) is a tool that reduces the dimensionality of a dataset by removing redundant and undesired information, which makes FS very helpful in cheminformatics. FS methods employ a classifier to identify the best subset of features. Support vector machines (SVMs) are used by the proposed CHHO-CS as the objective function for the classification process in FS. The CHHO-CS-SVM is tested in the selection of appropriate chemical descriptors and compound activities. Various datasets are used to validate the efficiency of the proposed CHHO-CS-SVM approach, including ten from the UCI machine learning repository. Additionally, two chemical datasets (i.e., quantitative structure-activity relation biodegradation and monoamine oxidase) were used for selecting the most significant chemical descriptors and chemical compound activities. The extensive experimental and statistical analyses show that the suggested CHHO-CS method achieved better trade-off solutions than competitor algorithms from the literature, including HHO, CS, particle swarm optimization, moth-flame optimization, grey wolf optimizer, salp swarm algorithm, and sine-cosine algorithm. The experimental results indicate that the complexity associated with cheminformatics can be handled using chaotic maps and hybridized metaheuristic methods.

Houssein Essam H, Hosney Mosa E, Elhoseny Mohamed, Oliva Diego, Mohamed Waleed M, Hassaballah M

2020-Sep-02
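
The abstract above says that chaotic maps update the control energy parameters of HHO, but it does not say which map is used or how it is wired in. The following is only a rough sketch under two assumptions: the logistic map stands in for the chaotic map, and the energy follows the standard HHO escaping-energy form E = 2 * E0 * (1 - t / T), with the random initial energy E0 replaced by a chaotic value. The function names are hypothetical and this is not the authors' implementation.

```python
def logistic_map(x0=0.7, r=4.0):
    """Yield an endless chaotic sequence in [0, 1] (logistic map, assumed choice)."""
    x = x0
    while True:
        x = r * x * (1.0 - x)
        yield x


def chaotic_escaping_energy(max_iter, x0=0.7):
    """Return one escaping-energy value per iteration.

    Assumption: E = 2 * E0 * (1 - t / T), with E0 drawn from the chaotic
    sequence (rescaled to [-1, 1]) instead of a uniform random number.
    """
    chaos = logistic_map(x0)
    energies = []
    for t in range(max_iter):
        e0 = 2.0 * next(chaos) - 1.0          # rescale [0, 1] -> [-1, 1]
        energies.append(2.0 * e0 * (1.0 - t / max_iter))
    return energies


energies = chaotic_escaping_energy(max_iter=50)
# In HHO, |E| >= 1 is commonly read as exploration and |E| < 1 as exploitation.
exploring = sum(1 for e in energies if abs(e) >= 1.0)
print(f"{exploring} of {len(energies)} iterations would explore")
```

The envelope 2 * (1 - t / T) shrinks linearly, so late iterations are pushed toward exploitation regardless of the chaotic value.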

General General

Robustness and rich clubs in collaborative learning groups: a learning analytics study using network science.

In Scientific reports ; h5-index 158.0

Productive and effective collaborative learning is rarely a spontaneous phenomenon but rather the result of meeting a set of conditions, orchestrating and scaffolding productive interactions. Several studies have demonstrated that conflicts can have detrimental effects on student collaboration. Through the application of network science, and social network analysis in particular, this learning analytics study investigates the concept of group robustness; that is, the capacity of collaborative groups to remain functional despite the withdrawal or absence of group members, and its relation to group performance in the frame of collaborative learning. Data on all student and teacher interactions were collected from two phases of a course in medical education that employed an online learning environment. Visual and mathematical analyses were conducted, simulating the removal of actors and its effect on the group's robustness and network structure. In addition, the extracted network parameters were used as features in machine learning algorithms to predict student performance. The study contributes findings that demonstrate the use of network science to shed light on essential elements of collaborative learning and demonstrates how the concept and measures of group robustness can increase understanding of the conditions of productive collaborative learning. It also contributes to understanding how certain interaction patterns can help to promote the sustainability or robustness of groups, while other interaction patterns can make the group more vulnerable to withdrawal and dysfunction. The findings also indicate that teachers can be a driving factor behind the formation of rich clubs of a well-connected few and a less connected many in some cases, and in other cases can contribute to a more collaborative and sustainable process where every student is included.

Saqr Mohammed, Nouri Jalal, Vartiainen Henriikka, Tedre Matti

2020-Sep-02
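
The abstract above describes simulating the removal of actors and observing the effect on group robustness. One simple way to illustrate the idea (an assumption, since the abstract does not name the measure used) is to remove one member and report the share of the remaining members that still sit in the largest connected component. The graph and function names below are hypothetical.

```python
from collections import deque


def largest_component_size(adj, removed=frozenset()):
    """Size of the largest connected component, ignoring removed nodes."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:                      # breadth-first search
            node = queue.popleft()
            size += 1
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        best = max(best, size)
    return best


def robustness_after_removal(adj, node):
    """Fraction of remaining members still in the largest component."""
    remaining = len(adj) - 1
    return largest_component_size(adj, {node}) / remaining


# Hypothetical group: a teacher-centred discussion with one student-student tie.
group = {
    "teacher": {"s1", "s2", "s3", "s4"},
    "s1": {"teacher", "s2"},
    "s2": {"teacher", "s1"},
    "s3": {"teacher"},
    "s4": {"teacher"},
}
for member in group:
    print(member, robustness_after_removal(group, member))
```

In this toy group, removing the teacher fragments the network, while removing any single student leaves everyone else connected, which is the kind of contrast a robustness analysis is meant to expose.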

General General

Deep learning-based diatom taxonomy on virtual slides.

In Scientific reports ; h5-index 158.0

Deep convolutional neural networks are emerging as the state-of-the-art method for supervised classification of images, including in the context of taxonomic identification. Different morphologies and imaging technologies applied across organismal groups lead to highly specific image domains, which need customization of deep learning solutions. Here we provide an example using deep convolutional neural networks (CNNs) for taxonomic identification of the morphologically diverse microalgal group of diatoms. Using a combination of high-resolution slide scanning microscopy, web-based collaborative image annotation and diatom-tailored image analysis, we assembled a diatom image database from two Southern Ocean expeditions. We use these data to investigate the effect of CNN architecture, background masking, data set size and possible concept drift upon image classification performance. Surprisingly, VGG16, a relatively old network architecture, showed the best performance and generalizing ability on our images. In contrast to a previous study, we found that background masking slightly improved performance. In general, training only a classifier on top of convolutional layers pre-trained on extensive, but not domain-specific, image data showed surprisingly high performance (F1 scores around 97%) with relatively few (100-300) examples per class, indicating that domain adaptation to a novel taxonomic group can be feasible with a limited investment of effort.

Kloster Michael, Langenkämper Daniel, Zurowietz Martin, Beszteri Bánk, Nattkemper Tim W

2020-Sep-02

Ophthalmology Ophthalmology

A deep learning approach in diagnosing fungal keratitis based on corneal photographs.

In Scientific reports ; h5-index 158.0

Fungal keratitis (FK) is the most devastating and vision-threatening microbial keratitis, but its clinical diagnosis is a great challenge. This study aimed to develop and verify a deep learning (DL)-based corneal photograph model for diagnosing FK. Corneal photos of laboratory-confirmed microbial keratitis were consecutively collected from a single referral center. A DL framework with DenseNet architecture was used to automatically recognize FK from the photo. For comparison with the DL-based model, diagnoses of FK from corneal photographs were made by the NCS-Oph and Expert groups through a majority decision of three non-corneal specialty ophthalmologists and three corneal specialists, respectively. The average sensitivity, specificity, positive predictive value, and negative predictive value of the DL model were approximately 71%, 68%, 60%, and 78%, respectively. The sensitivity was higher than that of the NCS-Oph (52%, P < .01), whereas the specificity was lower than that of the NCS-Oph (83%, P < .01). The average accuracy of around 70% was comparable with that of the NCS-Oph. Therefore, the sensitive DL-based diagnostic model is a promising tool for improving first-line medical care in rural areas through early identification of FK.

Kuo Ming-Tse, Hsu Benny Wei-Yun, Yin Yu-Kai, Fang Po-Chiung, Lai Hung-Yin, Chen Alexander, Yu Meng-Shan, Tseng Vincent S

2020-Sep-02
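
The abstract above reports sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV). As a reminder of how these relate to a binary confusion matrix, here is a minimal sketch. The counts are hypothetical and are not taken from the study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),   # share of true FK cases detected
        "specificity": tn / (tn + fp),   # share of non-FK cases correctly cleared
        "ppv": tp / (tp + fp),           # share of positive calls that are right
        "npv": tn / (tn + fn),           # share of negative calls that are right
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts for illustration only.
metrics = diagnostic_metrics(tp=40, fp=20, tn=30, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Unlike sensitivity and specificity, PPV and NPV depend on how common the disease is in the evaluated sample, so they do not transfer directly to a population with a different FK prevalence.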

General General

AI-based prediction for the risk of coronary heart disease among patients with type 2 diabetes mellitus.

In Scientific reports ; h5-index 158.0

Type 2 diabetes mellitus (T2DM) is a common chronic disease caused by insulin secretion disorder that often leads to severe outcomes and even death due to complications, among which coronary heart disease (CHD) is the most common and severe. Given the huge number of T2DM patients, it is increasingly important to identify those at high risk of CHD complications, but a quantitative method is still not available. Here, we first curated a dataset of 1,273 T2DM patients, including 304 with and 969 without CHD. We then trained an artificial intelligence (AI) model using a randomly selected 4/5 of the dataset and used the remaining data to validate the performance of the model. The results showed that the model achieved an AUC of 0.77 (fivefold cross-validation) on the training dataset and 0.80 on the testing dataset. To further confirm the performance of the presented model, we recruited 1,253 new T2DM patients as a fully independent testing dataset, including 200 with and 1,053 without CHD. On this dataset, the model achieved an AUC of 0.71. In addition, we implemented a model to quantitatively evaluate the risk contribution of each feature, which can therefore provide personalized guidance for specific individuals. Finally, an online web server for the model was built. This study presents an AI model to determine the risk of T2DM patients developing CHD, which has potential value in providing early warning and personalized guidance on CHD risk for both T2DM patients and clinicians.

Fan Rui, Zhang Ning, Yang Longyan, Ke Jing, Zhao Dong, Cui Qinghua

2020-Sep-02
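
The abstract above evaluates the model by AUC on a random 4/5 versus 1/5 split. As a minimal sketch of those two steps, the code below computes AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one (ties count half) and builds a seeded 4/5 split of indices. The function names, seed and toy data are assumptions for illustration. The abstract does not describe the authors' pipeline at this level.

```python
import random


def auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def split_four_fifths(n_items, seed=0):
    """Shuffle indices and return (train, test) with 4/5 in train."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    cut = (4 * n_items) // 5
    return idx[:cut], idx[cut:]


# Toy labels (1 = CHD) and hypothetical risk scores.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc(labels, scores))

train_idx, test_idx = split_four_fifths(10)
print(len(train_idx), len(test_idx))
```

The pairwise form is quadratic in the number of cases, which is fine for cohorts of this size. A rank-based formulation gives the same value more efficiently.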