
General

Implementing a high-efficiency similarity analysis approach for firmware code.

In PloS one ; h5-index 176.0

The rapid expansion of the open-source community has shortened the software development cycle, but it has also accelerated the spread of vulnerabilities, especially in the Internet of Things. In recent years, attacks against connected devices have increased exponentially, making these vulnerabilities all the more serious. State-of-the-art firmware security inspection techniques, such as those based on machine learning and graph theory, find similar applications based on known vulnerabilities, but they cannot work without detailed information about those vulnerabilities. Moreover, the model training required by machine learning techniques demands a significant amount of time and data, resulting in low efficiency and poor extensibility. To address these shortcomings, this study proposes a high-efficiency similarity analysis approach for firmware code. First, control flow and data flow features are extracted from the firmware functions and the vulnerable functions, and these features are used to calculate a SimHash for each function. The pigeonhole principle is used to support mass storage and fast querying of the SimHash values. Second, candidate similar function pairs are analyzed in detail both within and among basic blocks. Within basic blocks, symbolic execution generates basic-block semantic information, and a constraint solver determines semantic equivalence. Among basic blocks, local control flow graphs are analyzed to obtain their similarity. We then implemented a prototype and present its evaluation. The evaluation results demonstrate that the proposed approach can perform large-scale firmware function similarity analysis. It can also locate real-world firmware patches without information about the vulnerable function. Finally, we compare our method with existing methods.
The comparison results demonstrate that our method is more efficient and accurate than Gemini and StagedMethod. More than 90% of the firmware functions can be indexed within 0.1 s, and searching 100,000 firmware functions takes less than 2 s.
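The abstract does not give the fingerprinting details, so the following is only a minimal Python sketch, assuming a 64-bit Charikar-style SimHash over weighted string features and a Hamming-distance threshold k. The feature strings (e.g. `cfg:bb_count=7`), the MD5-based feature hashing, and the names `simhash`, `PigeonholeIndex`, and `k=3` are illustrative assumptions, not the authors' implementation. It shows why the pigeonhole principle enables fast lookup: if two fingerprints differ in at most k bits, splitting them into k + 1 blocks leaves at least one block identical, so exact-match hash tables on each block retrieve every candidate.

```python
import hashlib
from typing import Dict, List, Set, Tuple

BITS = 64


def simhash(features: Dict[str, int], bits: int = BITS) -> int:
    """Charikar-style SimHash: each weighted feature votes on every bit position."""
    acc = [0] * bits
    for feat, weight in features.items():
        # Hypothetical choice: first 8 bytes of MD5 as the 64-bit feature hash.
        h = int.from_bytes(hashlib.md5(feat.encode("utf-8")).digest()[:8], "big")
        for i in range(bits):
            acc[i] += weight if (h >> i) & 1 else -weight
    return sum(1 << i for i in range(bits) if acc[i] > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


class PigeonholeIndex:
    """Fingerprints within Hamming distance k share at least one of k + 1 blocks."""

    def __init__(self, k: int = 3, bits: int = BITS) -> None:
        self.k = k
        base, extra = divmod(bits, k + 1)
        self.spans: List[Tuple[int, int]] = []  # (start bit, width) per block
        start = 0
        for b in range(k + 1):
            width = base + (1 if b < extra else 0)
            self.spans.append((start, width))
            start += width
        # One exact-match table per block: block value -> function names.
        self.tables: List[Dict[int, Set[str]]] = [{} for _ in self.spans]
        self.fingerprints: Dict[str, int] = {}

    def _block(self, fp: int, b: int) -> int:
        start, width = self.spans[b]
        return (fp >> start) & ((1 << width) - 1)

    def add(self, name: str, fp: int) -> None:
        self.fingerprints[name] = fp
        for b, table in enumerate(self.tables):
            table.setdefault(self._block(fp, b), set()).add(name)

    def query(self, fp: int) -> List[str]:
        candidates: Set[str] = set()
        for b, table in enumerate(self.tables):
            candidates |= table.get(self._block(fp, b), set())
        # Exact block matches only narrow the search; confirm with the full distance.
        return sorted(n for n in candidates if hamming(self.fingerprints[n], fp) <= self.k)
```

In this sketch, a query touches only k + 1 hash buckets instead of scanning every stored fingerprint; candidates that pass the Hamming check would then go to the costlier basic-block analysis (symbolic execution and local control flow graph comparison) described in the abstract.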

Wang Yisen, Wang Ruimin, Jing Jing, Wang Huanwei


General

Deep Learning for Accelerometric Data Assessment and Ataxic Gait Monitoring.

In IEEE transactions on neural systems and rehabilitation engineering : a publication of the IEEE Engineering in Medicine and Biology Society

Ataxic gait monitoring and the assessment of neurological disorders are important multidisciplinary areas supported by digital signal processing methods and machine learning tools. This paper explores the use of accelerometric data to optimise deep learning convolutional neural network systems that distinguish between ataxic and normal gait. The experimental dataset includes 860 signal segments from 16 ataxic patients and 19 control individuals, with mean ages of 38.6 and 39.6 years, respectively. The proposed methodology is based on analysing the frequency components of accelerometric signals recorded simultaneously at specific body positions with a sampling frequency of 60 Hz. The deep learning system uses all frequency components in the range ⟨0, 30⟩ Hz, that is, the full band up to the 30 Hz Nyquist frequency. Our classification results are compared with those obtained by standard methods, including the support vector machine, Bayesian methods, and a two-layer neural network, with features estimated as the relative power in selected frequency bands. Our results show that an appropriate choice of sensor position can increase the accuracy from 81.2 % for the foot position to 91.7 % for the spine position. Combining the input data and using a five-layer deep learning model increased the accuracy to 95.8 %. Our results suggest that artificial intelligence and deep learning methods are efficient in the assessment of motion disorders and have a wide range of further applications.
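The standard classifiers in this study used relative power in selected frequency bands as features. The sketch below computes such features from one accelerometer axis with a one-sided periodogram; at the reported 60 Hz sampling rate the spectrum ends at the 30 Hz Nyquist frequency, matching the ⟨0, 30⟩ Hz range used by the deep learning system. The band edges, the mean removal, and the naive DFT (chosen only to keep the example dependency-free) are assumptions, not the authors' settings, and the function names are hypothetical.

```python
import cmath
import math
from typing import List, Sequence, Tuple

FS_HZ = 60.0  # sampling frequency reported in the abstract


def one_sided_power(x: Sequence[float], fs: float = FS_HZ) -> List[Tuple[float, float]]:
    """(frequency in Hz, power) pairs from 0 Hz up to the Nyquist limit fs / 2."""
    n = len(x)
    mean = sum(x) / n
    centred = [v - mean for v in x]  # drop the DC offset (e.g. gravity)
    spectrum = []
    for k in range(n // 2 + 1):
        coef = sum(centred[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        # Interior bins also carry the power of their negative-frequency mirror.
        mirrored = 0 < k < n / 2
        spectrum.append((k * fs / n, (2 if mirrored else 1) * abs(coef) ** 2))
    return spectrum


def relative_band_power(
    x: Sequence[float], bands: Sequence[Tuple[float, float]], fs: float = FS_HZ
) -> List[float]:
    """Share of total power that falls in each half-open band [lo, hi) Hz."""
    spectrum = one_sided_power(x, fs)
    total = sum(p for _, p in spectrum)
    if total == 0.0:
        return [0.0 for _ in bands]
    return [sum(p for f, p in spectrum if lo <= f < hi) / total for lo, hi in bands]
```

For example, a 5 Hz sinusoid sampled at 60 Hz for one second puts essentially all of its power in a hypothetical 3-10 Hz band. In practice an FFT routine would replace the O(n²) DFT loop.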

Prochazka Ales, Dostal Ondrej, Cejnar Pavel, Mohamed Hagar Ibrahim, Pavelek Zbysek, Valis Martin, Vysata Oldrich


General

Salient Object Detection in the Deep Learning Era: An In-depth Survey.

In IEEE transactions on pattern analysis and machine intelligence ; h5-index 127.0

As an essential problem in computer vision, salient object detection (SOD) has attracted increasing research attention over the years. Recent advances in SOD are predominantly led by deep learning-based solutions (termed deep SOD). To enable an in-depth understanding of deep SOD, this paper provides a comprehensive survey covering aspects ranging from algorithm taxonomy to unsolved issues. In particular, we first review deep SOD algorithms from different perspectives, including network architecture, level of supervision, learning paradigm, and object-/instance-level detection. We then summarize and analyze existing SOD datasets and evaluation metrics. Next, we benchmark a large group of representative SOD models and provide detailed analyses of the comparison results. Moreover, by constructing a novel SOD dataset with rich attribute annotations covering various salient object types, challenging factors, and scene categories, we study the performance of SOD algorithms under different attribute settings, which has not been thoroughly explored before. We further analyze, for the first time in the field, the robustness of SOD models to random input perturbations and adversarial attacks. We also examine the generalization and difficulty of existing SOD datasets. Finally, we discuss several open issues in SOD and outline future research directions. All saliency prediction maps, our constructed dataset with annotations, and the evaluation code are publicly available at
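The abstract mentions SOD evaluation metrics without naming them. Two that are standard in the SOD literature are the mean absolute error (MAE) and the F-measure with β² = 0.3; the sketch below computes both for a saliency map in [0, 1] against a binary ground-truth mask. The adaptive threshold of twice the mean saliency is one common convention and is an assumption here, as are the function names; this is not the survey's released evaluation code.

```python
from typing import List, Optional

Map = List[List[float]]  # saliency values in [0, 1]; ground truth uses 0/1


def mae(sal: Map, gt: Map) -> float:
    """Mean absolute error between a saliency map and a binary ground-truth mask."""
    diffs = [abs(s - g) for srow, grow in zip(sal, gt) for s, g in zip(srow, grow)]
    return sum(diffs) / len(diffs)


def f_measure(sal: Map, gt: Map, beta2: float = 0.3, threshold: Optional[float] = None) -> float:
    """F-measure of the binarised saliency map; beta2 < 1 weights precision more."""
    flat_s = [s for row in sal for s in row]
    flat_g = [g for row in gt for g in row]
    if threshold is None:
        # Assumed adaptive threshold: twice the mean saliency, capped at 1.
        threshold = min(2 * sum(flat_s) / len(flat_s), 1.0)
    pred = [s >= threshold for s in flat_s]
    tp = sum(1 for p, g in zip(pred, flat_g) if p and g)
    fp = sum(1 for p, g in zip(pred, flat_g) if p and not g)
    fn = sum(1 for p, g in zip(pred, flat_g) if not p and g)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return (1 + beta2) * precision * recall / (beta2 * precision + recall)
```

With β² = 0.3, a map that finds half of the object with perfect precision scores 0.8125 rather than the 0.667 an unweighted F1 would give, reflecting the convention that precision matters more for saliency.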

Wang Wenguan, Lai Qiuxia, Fu Huazhu, Shen Jianbing, Ling Haibin, Yang Ruigang


General

A Machine Learning Strategy for Drug Discovery Identifies Anti-Schistosomal Small Molecules.

In ACS infectious diseases

Schistosomiasis is a chronic and painful disease of poverty caused by the flatworm parasite Schistosoma. Drug discovery for antischistosomal compounds predominantly employs in vitro whole-organism (phenotypic) screens against two developmental stages of Schistosoma mansoni: post-infective larvae (somules) and adults. We generated two rule books and associated scoring systems to normalize 3898 phenotypic data points to enable machine learning. The data were used to generate eight Bayesian machine learning models with the Assay Central software according to the parasite's developmental stage and experimental time point (≤24, 48, 72, and >72 h). The models helped predict 56 active and inactive compounds from commercial compound libraries for testing. When these compounds were screened against S. mansoni in vitro, the prediction accuracy for actives and inactives was 61% and 56% for somules and adults, respectively; the hit rates were 48% and 34%, respectively, far exceeding the typical 1-2% hit rate of traditional high-throughput screens.
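The abstract states only that Bayesian models were built with Assay Central. As a minimal illustration of how a fingerprint-style Bayesian activity model can score compounds, the sketch below uses a Laplacian-corrected naive Bayesian weighting, log((A_F + 1) / (T_F · P + 1)), a form widely used for substructure-fingerprint models; we do not claim it reproduces Assay Central's internals. Compounds are assumed to be already reduced to sets of substructure identifiers with binary active/inactive labels (the role the paper's rule books play in normalizing the phenotypic data); the feature and function names are hypothetical.

```python
import math
from typing import Dict, Iterable, List, Set


def laplacian_bayes_weights(samples: List[Set[str]], labels: List[bool]) -> Dict[str, float]:
    """Weight per feature F: log((A_F + 1) / (T_F * P + 1)).

    A_F = number of actives containing F, T_F = number of samples containing F,
    P = overall fraction of actives in the training data.
    """
    p_active = sum(labels) / len(labels)
    total: Dict[str, int] = {}
    active: Dict[str, int] = {}
    for feats, is_active in zip(samples, labels):
        for f in feats:
            total[f] = total.get(f, 0) + 1
            if is_active:
                active[f] = active.get(f, 0) + 1
    return {
        f: math.log((active.get(f, 0) + 1) / (t * p_active + 1))
        for f, t in total.items()
    }


def score(feats: Iterable[str], weights: Dict[str, float]) -> float:
    """Sum of feature weights; unseen features contribute 0, i.e. log(1 / 1)."""
    return sum(weights.get(f, 0.0) for f in feats)
```

The correction pulls rarely seen features toward a neutral weight of 0, so a substructure observed once in an active does not dominate the score; library compounds would then be ranked by `score` and the top and bottom of the ranking picked for testing.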

Zorn Kimberley M, Sun Shengxi, McConnon Cecelia L, Ma Kelley, Chen Eric K, Foil Daniel H, Lane Thomas R, Liu Lawrence J, El-Sakkary Nelly, Skinner Danielle E, Ekins Sean, Caffrey Conor R


Bayesian, Schistosoma, drug discovery, machine learning, phenotypic screen, schistosomiasis

General

Machine learning method for predicting pacemaker implantation following transcatheter aortic valve replacement.

In Pacing and clinical electrophysiology : PACE

BACKGROUND : An accurate assessment of permanent pacemaker implantation (PPI) risk following transcatheter aortic valve replacement (TAVR) is important for clinical decision making. The aims of this study were to investigate the significance and utility of pre- and post-TAVR ECG data and compare machine learning approaches with traditional logistic regression in predicting pacemaker risk following TAVR.

METHODS : A total of 557 patients in sinus rhythm undergoing TAVR for severe aortic stenosis (AS) were included in the analysis. Baseline demographic, clinical, pre-TAVR ECG, post-TAVR, post-TAVR ECG (24 hours following TAVR and before PPI), and echocardiographic data were recorded. A Random Forest (RF) algorithm and logistic regression were used to train models for assessing the likelihood of PPI following TAVR.

RESULTS : The average age was 80 ± 9 years, and 52% of patients were male. PPI after TAVR occurred in 95 patients (17.1%). The optimal cutoff of delta PR (the difference between post- and pre-TAVR PR intervals) for predicting PPI was 20 ms, with a sensitivity of 0.82 and a specificity of 0.66. For delta QRS, the optimal cutoff was 13 ms, with a sensitivity of 0.68 and a specificity of 0.59. The RF model that incorporated post-TAVR ECG data (AUC 0.81) predicted PPI risk more accurately than the RF model without post-TAVR ECG data (AUC 0.72). Moreover, the RF model outperformed the logistic regression model in predicting PPI risk (AUC: 0.81 vs. 0.69).

CONCLUSIONS : Machine learning using RF methodology is significantly more powerful than traditional logistic regression in predicting PPI risk following TAVR.
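The abstract does not say how the "optimal" cutoffs were chosen; a common choice is the threshold that maximizes Youden's J (sensitivity + specificity − 1), which is an assumption here. The sketch below shows that computation for a single predictor such as delta PR, plus a rank-based (Mann-Whitney) AUC of the kind used to compare the RF and logistic regression models. The toy values and function names are hypothetical; this is not the study's analysis code.

```python
from typing import Sequence, Tuple


def sens_spec(values: Sequence[float], labels: Sequence[bool], cutoff: float) -> Tuple[float, float]:
    """Sensitivity and specificity when 'value >= cutoff' predicts PPI."""
    pairs = list(zip(values, labels))
    tp = sum(1 for v, y in pairs if y and v >= cutoff)
    fn = sum(1 for v, y in pairs if y and v < cutoff)
    tn = sum(1 for v, y in pairs if not y and v < cutoff)
    fp = sum(1 for v, y in pairs if not y and v >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def youden_cutoff(values: Sequence[float], labels: Sequence[bool]) -> Tuple[float, float, float]:
    """Cutoff (taken from observed values) maximising J = sens + spec - 1."""
    best = None
    for c in sorted(set(values)):
        se, sp = sens_spec(values, labels, c)
        if best is None or se + sp - 1 > best[0]:
            best = (se + sp - 1, c, se, sp)
    return best[1], best[2], best[3]


def auc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """AUC as P(random positive scores above random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Both functions assume at least one patient with and one without PPI. On a toy cohort with delta PR values of 5, 10, 15 ms (no PPI) and 25, 30, 40 ms (PPI), the Youden cutoff is 25 ms with sensitivity and specificity of 1.0; real data would of course show the trade-off seen in the reported 0.82/0.66 figures.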

Truong Vien T, Beyerbach Daniel, Mazur Wojciech, Wigle Matthew, Bateman Emma, Pallerla Akhil, Ngo Tam N M, Shreenivas Satya, Tretter Justin T, Palmer Cassady, Kereiakes Dean J, Chung Eugene S


TAVR, machine learning, pacemaker implantation, prediction, random forest

General

Cloud based ensemble machine learning approach for smart detection of epileptic seizures using higher order spectral analysis.

In Physical and engineering sciences in medicine

The present paper proposes a smart framework for detecting epileptic seizures using IoT technologies, cloud computing, and machine learning. The framework processes the acquired scalp EEG signals with the fast Walsh-Hadamard transform. The transformed signals are then examined using higher-order spectral analysis to extract amplitude- and entropy-based statistical features. A correlation-based feature selection algorithm then selects among the extracted features, reducing complexity and delay so that classification can run closer to real time. Finally, the samples containing the selected features are fed to ensemble machine learning techniques for classification into three EEG states: normal, interictal, and ictal. The employed techniques include the Dagging, Bagging, Stacking, MultiBoost AB, and AdaBoost M1 algorithms, each with the C4.5 decision tree algorithm as the base classifier. The results of the ensemble techniques are also compared with standalone C4.5 decision tree and SVM algorithms. The performance analysis of the simulation results shows that the ensemble of AdaBoost M1 and C4.5 decision trees with higher-order spectral features is an adequate technique for automated real-time detection of epileptic seizures. This technique achieves 100% classification accuracy, sensitivity, and specificity with a very short classification time.
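The abstract names the fast Walsh-Hadamard transform as the first processing step but gives no implementation details. A minimal sketch of the standard unnormalised transform in natural (Hadamard) order follows; segment lengths are assumed to be a power of two (real EEG windows would need padding or trimming, which is our assumption), and the function name `fwht` is illustrative.

```python
from typing import List, Sequence


def fwht(x: Sequence[float]) -> List[float]:
    """Unnormalised fast Walsh-Hadamard transform in natural (Hadamard) order."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    a = list(x)
    h = 1
    while h < n:
        # Butterfly stage: combine elements that are h positions apart.
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                u, v = a[j], a[j + h]
                a[j], a[j + h] = u + v, u - v
        h *= 2
    return a
```

The butterfly uses only additions and subtractions and runs in O(n log n). Applying it twice returns the input scaled by n, which gives a quick self-check, e.g. `fwht([1, 0, 1, 0, 0, 1, 1, 0])` yields `[4, 2, 0, -2, 0, 2, 0, 2]`.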

Singh Kuldeep, Malhotra Jyoteesh


Cloud computing, EEG, Ensemble machine learning, Epilepsy, Healthcare, Internet of Things