Receive a weekly summary and discussion of the top papers of the week by leading researchers in the field.

Radiology Radiology

Inconsistency in the use of the term "validation" in studies reporting the performance of deep learning algorithms in providing diagnosis from medical imaging.

In PloS one ; h5-index 176.0

BACKGROUND : The development of deep learning (DL) algorithms is a three-step process-training, tuning, and testing. Studies are inconsistent in the use of the term "validation", with some using it to refer to tuning and others testing, which hinders accurate delivery of information and may inadvertently exaggerate the performance of DL algorithms. We investigated the extent of inconsistency in usage of the term "validation" in studies on the accuracy of DL algorithms in providing diagnosis from medical imaging.

METHODS AND FINDINGS : We analyzed the full texts of research papers cited in two recent systematic reviews. The papers were categorized according to whether the term "validation" was used to refer to tuning alone, both tuning and testing, or testing alone. We analyzed whether paper characteristics (i.e., journal category, field of study, year of print publication, journal impact factor [JIF], and nature of test data) were associated with the usage of the terminology using multivariable logistic regression analysis with generalized estimating equations. Of 201 papers published in 125 journals, 118 (58.7%), 9 (4.5%), and 74 (36.8%) used the term to refer to tuning alone, both tuning and testing, and testing alone, respectively. A weak association was noted between higher JIF and using the term to refer to testing (i.e., testing alone or both tuning and testing) instead of tuning alone (vs. JIF <5; JIF 5 to 10: adjusted odds ratio 2.11, P = 0.042; JIF >10: adjusted odds ratio 2.41, P = 0.089). Journal category, field of study, year of print publication, and nature of test data were not significantly associated with the terminology usage.

CONCLUSIONS : Existing literature has a significant degree of inconsistency in using the term "validation" when referring to the steps in DL algorithm development. Efforts are needed to improve the accuracy and clarity in the terminology usage.

Kim Dong Wook, Jang Hye Young, Ko Yousun, Son Jung Hee, Kim Pyeong Hwa, Kim Seon-Ok, Lim Joon Seo, Park Seong Ho

2020

Public Health Public Health

India nudges to contain COVID-19 pandemic: A reactive public policy analysis using machine-learning based topic modelling.

In PloS one ; h5-index 176.0

India locked down 1.3 billion people on March 25, 2020, in the wake of COVID-19 pandemic. The economic cost of it was estimated at USD 98 billion, while the social costs are still unknown. This study investigated how government formed reactive policies to fight coronavirus across its policy sectors. Primary data was collected from the Press Information Bureau (PIB) in the form press releases of government plans, policies, programme initiatives and achievements. A text corpus of 260,852 words was created from 396 documents from the PIB. An unsupervised machine-based topic modelling using Latent Dirichlet Allocation (LDA) algorithm was performed on the text corpus. It was done to extract high probability topics in the policy sectors. The interpretation of the extracted topics was made through a nudge theoretic lens to derive the critical policy heuristics of the government. Results showed that most interventions were targeted to generate endogenous nudge by using external triggers. Notably, the nudges from the Prime Minister of India was critical in creating herd effect on lockdown and social distancing norms across the nation. A similar effect was also observed around the public health (e.g., masks in public spaces; Yoga and Ayurveda for immunity), transport (e.g., old trains converted to isolation wards), micro, small and medium enterprises (e.g., rapid production of PPE and masks), science and technology sector (e.g., diagnostic kits, robots and nano-technology), home affairs (e.g., surveillance and lockdown), urban (e.g. drones, GIS-tools) and education (e.g., online learning). A conclusion was drawn on leveraging these heuristics are crucial for lockdown easement planning.

Debnath Ramit, Bardhan Ronita

2020

General General

Person identification from EEG using various machine learning techniques with inter-hemispheric amplitude ratio.

In PloS one ; h5-index 176.0

Association between electroencephalography (EEG) and individually personal information is being explored by the scientific community. Though person identification using EEG is an attraction among researchers, the complexity of sensing limits using such technologies in real-world applications. In this research, the challenge has been addressed by reducing the complexity of the brain signal acquisition and analysis processes. This was achieved by reducing the number of electrodes, simplifying the critical task without compromising accuracy. Event-related potentials (ERP), a.k.a. time-locked stimulation, was used to collect data from each subject's head. Following a relaxation period, each subject was visually presented a random four-digit number and then asked to think of it for 10 seconds. Fifteen trials were conducted with each subject with relaxation and visual stimulation phases preceding each mental recall segment. We introduce a novel derived feature, dubbed Inter-Hemispheric Amplitude Ratio (IHAR), which expresses the ratio of amplitudes of laterally corresponding electrode pairs. The feature was extracted after expanding the training set using signal augmentation techniques and tested with several machine learning (ML) algorithms, including Linear Discriminant Analysis (LDA), Support Vector Machine (SVM), and k-Nearest Neighbor (kNN). Most of the ML algorithms showed 100% accuracy with 14 electrodes, and according to our results, perfect accuracy can also be achieved using fewer electrodes. However, AF3, AF4, F7, and F8 electrode combination with kNN classifier which yielded 99.0±0.8% testing accuracy is the best for person identification to maintain both user-friendliness and performance. Surprisingly, the relaxation phase manifested the highest accuracy of the three phases.

Jayarathne Isuru, Cohen Michael, Amarakeerthi Senaka

2020

General General

A detection method for android application security based on TF-IDF and machine learning.

In PloS one ; h5-index 176.0

Android is the most widely used mobile operating system (OS). A large number of third-party Android application (app) markets have emerged. The absence of third-party market regulation has prompted research institutions to propose different malware detection techniques. However, due to improvements of malware itself and Android system, it is difficult to design a detection method that can efficiently and effectively detect malicious apps for a long time. Meanwhile, adopting more features will increase the complexity of the model and the computational cost of the system. Permissions play a vital role in the security of the Android apps. Term Frequency-Inverse Document Frequency (TF-IDF) is used to assess the importance of a word for a file set in a corpus. The static analysis method does not need to run the app. It can efficiently and accurately extract the permissions from an app. Based on this cognition and perspective, in this paper, a new static detection method based on TF-IDF and Machine Learning is proposed. The system permissions are extracted in Android application package's (Apk's) manifest file. TF-IDF algorithm is used to calculate the permission value (PV) of each permission and the sensitivity value of apk (SVOA) of each app. The SVOA and the number of the used permissions are learned and tested by machine learning. 6070 benign apps and 9419 malware are used to evaluate the proposed approach. The experiment results show that only use dangerous permissions or the number of used permissions can't accurately distinguish whether an app is malicious or benign. For malware detection, the proposed approach achieve up to 99.5% accuracy and the learning and training time only needs 0.05s. For malware families detection, the accuracy is 99.6%. The results indicate that the method for unknown/new sample's detection accuracy is 92.71%. Compared against other state-of-the-art approaches, the proposed approach is more effective by detecting malware and malware families.

Yuan Hongli, Tang Yongchuan, Sun Wenjuan, Liu Li

2020

General General

Interactive machine learning for fast and robust cell profiling.

In PloS one ; h5-index 176.0

Automated profiling of cell morphology is a powerful tool for inferring cell function. However, this technique retains a high barrier to entry. In particular, configuring image processing parameters for optimal cell profiling is susceptible to cognitive biases and dependent on user experience. Here, we use interactive machine learning to identify the optimum cell profiling configuration that maximises quality of the cell profiling outcome. The process is guided by the user, from whom a rating of the quality of a cell profiling configuration is obtained. We use Bayesian optimisation, an established machine learning algorithm, to learn from this information and automatically recommend the next configuration to examine with the aim of maximising the quality of the processing or analysis. Compared to existing interactive machine learning tools that require domain expertise for per-class or per-pixel annotations, we rely on users' explicit assessment of output quality of the cell profiling task at hand. We validated our interactive approach against the standard human trial-and-error scheme to optimise an object segmentation task using the standard software CellProfiler. Our toolkit enabled rapid optimisation of an object segmentation pipeline, increasing the quality of object segmentation over a pipeline optimised through trial-and-error. Users also attested to the ease of use and reduced cognitive load enabled by our machine learning strategy over the standard approach. We envision that our interactive machine learning approach can enhance the quality and efficiency of pipeline optimisation to democratise image-based cell profiling.

Laux Lisa, Cutiongco Marie F A, Gadegaard Nikolaj, Jensen Bjørn Sand

2020

Public Health Public Health

Contrastive Cross-site Learning with Redesigned Net for COVID-19 CT Classification.

In IEEE journal of biomedical and health informatics

The pandemic of coronavirus disease 2019 (COVID-19) has lead to a global public health crisis spreading hundreds of countries. With the continuous growth of new infections, developing automated tools for COVID-19 identification with CT image is highly desired to assist the clinical diagnosis and reduce the tedious workload of image interpretation. To enlarge the datasets for developing machine learning methods, it is essentially helpful to aggregate the cases from different medical systems for learning robust and generalizable models. This paper proposes a novel joint learning framework to perform accurate COVID-19 identification by effectively learning with heterogeneous datasets with distribution descrepancy.We build a powerful backbone by redesigning the recently proposed COVID-Net in aspects of network architecture and learning strategy to improve the prediction accuracy and learning efficiency. On top of our improved backbone, we further explicitly tackle the cross-site domain shift by conducting separate feature normalization in latent space. Moreover, we propose a contrastive training objective to enhance the domain invariance of semantic embeddings for boosting the classification performance on each dataset. We develop and evaluate our method with two public large-scale COVID-19 diagnosis datasets from real CT images. Extensive experiments show that our approach consistently improves the performanceson both datasets, outperforming the original COVID-Net trained on each dataset by 12.16% and 14.23% in AUC respectively, also exceeding existing state-of-the-art multi-site learning methods.

Wang Zhao, Liu Quande, Dou Qi

2020-Sep-10