Receive a weekly summary and discussion of the top papers of the week by leading researchers in the field.

General

Detrending the Waveforms of Steady-State Vowels.

In Entropy (Basel, Switzerland)

Steady-state vowels are vowels uttered with a momentarily fixed vocal tract configuration and with steady vibration of the vocal folds. In this steady state, the vowel waveform appears as a quasi-periodic string of elementary units called pitch periods. Humans perceive this quasi-periodic regularity as a definite pitch. Likewise, so-called pitch-synchronous methods exploit this regularity by using the duration of the pitch periods as a natural time scale for their analysis. In this work, we present a simple pitch-synchronous method that uses a Bayesian approach to estimate formants. It slightly generalizes the basic approach of modeling the pitch periods as a superposition of decaying sinusoids, one for each vowel formant, by explicitly taking into account the additional low-frequency content in the waveform which arises not from formants but from the glottal pulse. We model this low-frequency content in the time domain as a polynomial trend function added to the decaying sinusoids. The problem then reduces to a familiar one in macroeconomics: estimate the cycles (our decaying sinusoids) independently from the trend (our polynomial trend function); in other words, detrend the waveforms of steady-state vowels. We show how to do this efficiently.

Van Soom Marnix, de Boer Bart


acoustic phonetics, detrending, formant, model averaging, nested sampling, probability theory, source-filter theory, steady-state, uncertainty quantification, vowel
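The trend-plus-sinusoids model in the abstract above can be sketched numerically. This is a minimal illustration, not the paper's Bayesian method: an ordinary least-squares polynomial fit stands in for the Bayesian trend estimation, and the signal parameters (a single 500 Hz decaying sinusoid as the "formant", a degree-2 trend as the glottal-pulse contribution) are invented for the example:

```python
import numpy as np

fs = 10_000                      # sampling rate (Hz), illustrative
t = np.arange(0, 0.01, 1 / fs)   # one ~10 ms pitch period

# One decaying sinusoid standing in for a vowel formant
formant = np.exp(-200 * t) * np.sin(2 * np.pi * 500 * t)
# Low-frequency glottal-pulse content modeled as a polynomial trend
trend = 0.3 - 40 * t + 2000 * t**2
waveform = formant + trend

# Detrend: fit and subtract a degree-2 polynomial (OLS stand-in for the
# paper's Bayesian treatment of the trend coefficients)
coeffs = np.polyfit(t, waveform, deg=2)
detrended = waveform - np.polyval(coeffs, t)

# The detrended waveform is much closer to the pure decaying sinusoid
err_before = np.abs(waveform - formant).max()
err_after = np.abs(detrended - formant).max()
```

In the full model each pitch period carries several decaying sinusoids (one per formant), but the separation of cycles from trend works the same way.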

General

Averaging Is Probably Not the Optimum Way of Aggregating Parameters in Federated Learning.

In Entropy (Basel, Switzerland)

Federated learning is a decentralized form of deep learning that trains a shared model on data distributed among clients (such as mobile phones and wearable devices), ensuring data privacy by keeping raw data out of the data center (server). After each client computes a new model parameter by stochastic gradient descent (SGD) on its own local data, these locally computed parameters are aggregated to generate an updated global model. Many current state-of-the-art studies aggregate the different client-computed parameters by averaging them, but none theoretically explains why averaging is a good approach. In this paper, we treat each client-computed parameter as a random vector, owing to the stochastic properties of SGD, and estimate the mutual information between two client-computed parameters at different training phases using two methods in two learning tasks. The results confirm the correlation between different clients and show an increasing trend of mutual information over training iterations. However, when we further compute the distance between client-computed parameters, we find that the parameters become more correlated while not getting closer. This phenomenon suggests that averaging may not be the optimum way of aggregating trained parameters.

Xiao Peng, Cheng Samuel, Stankovic Vladimir, Vukobratovic Dejan


averaging, correlation, decentralized learning, federated learning, mutual information
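The "correlated but not closer" observation above can be illustrated with toy random vectors. The stand-ins below are invented for the example (they are not real model weights): two client parameters that share structure, so their correlation is high, while their Euclidean distance remains large:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 1000  # toy parameter dimension

# Two "client-computed parameters" treated as random vectors: both pick up
# the same shared structure (high correlation), but at different scales,
# so they stay far apart in Euclidean distance.
shared = rng.normal(size=d)
theta_a = shared + 0.1 * rng.normal(size=d)
theta_b = 3.0 * shared + 0.1 * rng.normal(size=d)

corr = np.corrcoef(theta_a, theta_b)[0, 1]       # close to 1
dist = float(np.linalg.norm(theta_a - theta_b))  # large

# FedAvg-style aggregation: the plain parameter average the paper questions
theta_avg = 0.5 * (theta_a + theta_b)
```

High correlation alone does not make the average a good representative of either parameter, which is the intuition behind the paper's doubt about plain averaging.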

Pathology

Robust and Scalable Learning of Complex Intrinsic Dataset Geometry via ElPiGraph.

In Entropy (Basel, Switzerland)

Multidimensional data point clouds representing large datasets are frequently characterized by non-trivial low-dimensional geometry and topology which can be recovered by unsupervised machine learning approaches, in particular by principal graphs. Principal graphs approximate the multivariate data by a graph injected into the data space with some constraints imposed on the node mapping. Here we present ElPiGraph, a scalable and robust method for constructing principal graphs. ElPiGraph exploits and further develops the concept of elastic energy, the topological graph grammar approach, and a gradient descent-like optimization of the graph topology. The method is able to withstand high levels of noise and is capable of approximating data point clouds via principal graph ensembles. This strategy can be used to estimate the statistical significance of complex data features and to summarize them into a single consensus principal graph. ElPiGraph deals efficiently with large datasets in various fields such as biology, where it can be used for example with single-cell transcriptomic or epigenomic datasets to infer gene expression dynamics and recover differentiation landscapes.

Albergante Luca, Mirkes Evgeny, Bac Jonathan, Chen Huidong, Martin Alexis, Faure Louis, Barillot Emmanuel, Pinello Luca, Gorban Alexander, Zinovyev Andrei


data approximation, principal graphs, principal trees, software, topological grammars
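The elastic-energy concept mentioned above can be sketched in a few lines. This is a simplified illustration, not ElPiGraph itself: it includes only a data-fitting term and an edge-stretching penalty (ElPiGraph's bending term and graph grammars are omitted), and the node positions and penalty weight are invented for the example:

```python
import numpy as np

def elastic_energy(X, nodes, edges, lam=0.01):
    """Toy elastic energy of a principal graph: mean squared distance of each
    data point to its nearest node, plus a penalty on squared edge lengths."""
    # Approximation term: squared distance from each point to its nearest node
    d2 = ((X[:, None, :] - nodes[None, :, :]) ** 2).sum(axis=-1)
    fit = d2.min(axis=1).mean()
    # Stretching term: total squared edge length
    stretch = sum(((nodes[i] - nodes[j]) ** 2).sum() for i, j in edges)
    return fit + lam * stretch

rng = np.random.default_rng(1)
# Noisy points along a line segment in 2D
X = np.linspace(0, 1, 50)[:, None] * np.array([[1.0, 0.5]])
X += 0.01 * rng.normal(size=X.shape)

nodes = np.array([[0.0, 0.0], [0.5, 0.25], [1.0, 0.5]])  # 3-node chain on the line
edges = [(0, 1), (1, 2)]

e_good = elastic_energy(X, nodes, edges)
e_bad = elastic_energy(X, nodes + 1.0, edges)  # displaced graph fits worse
```

ElPiGraph minimizes an energy of this general form while also searching over graph topologies via grammar operations.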

General

Semi-Supervised Bidirectional Long Short-Term Memory and Conditional Random Fields Model for Named-Entity Recognition Using Embeddings from Language Models Representations.

In Entropy (Basel, Switzerland)

Increasingly popular online museums have significantly changed the way people acquire cultural knowledge, and they have been generating abundant amounts of cultural relics data. In recent years, researchers have used deep learning models, which can automatically extract complex features and have rich representation capabilities, to implement named-entity recognition (NER). However, the lack of labeled data in the field of cultural relics makes it difficult for deep learning models that rely on labeled data to achieve excellent performance. To address this problem, this paper proposes a semi-supervised deep learning model named SCRNER (Semi-supervised model for Cultural Relics' Named Entity Recognition) that trains a bidirectional long short-term memory (BiLSTM) and conditional random fields (CRF) model on scarce labeled data and abundant unlabeled data to attain effective performance. To support semi-supervised sample selection, we propose a repeat-labeled (relabeled) strategy that selects high-confidence samples to enlarge the training set iteratively. In addition, we use embeddings from language models (ELMo) to dynamically acquire word representations as the input of the model, addressing the blurred entity boundaries of cultural objects and the characteristics of Chinese texts in the field of cultural relics. Experimental results demonstrate that our proposed model, trained on limited labeled data, achieves effective performance on the task of named-entity recognition of cultural relics.

Zhang Min, Geng Guohua, Chen Jing


bidirectional long short-term memory network, conditional random fields, cultural relics, embeddings from language models, named-entity recognition, semi-supervised learning
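The repeat-labeled (relabeled) selection step described above can be sketched as a filter over unlabeled samples. The function names, the confidence threshold, and the toy tagger below are all illustrative assumptions; the paper's actual tagger is a BiLSTM-CRF over ELMo representations:

```python
def select_confident(unlabeled, predict, threshold=0.9, repeats=2):
    """Keep an unlabeled sample only if repeated predictions agree on the
    labels AND every token's confidence clears the threshold (a sketch of a
    repeat-labeled, high-confidence selection strategy; names are hypothetical).
    `predict(sample)` returns a list of (label, confidence) pairs per token."""
    selected = []
    for sample in unlabeled:
        preds = [predict(sample) for _ in range(repeats)]
        label_seqs = {tuple(lab for lab, _ in p) for p in preds}
        min_conf = min(conf for p in preds for _, conf in p)
        if len(label_seqs) == 1 and min_conf >= threshold:
            selected.append((sample, preds[0]))
    return selected

def toy_predict(sample):
    # Hypothetical stand-in tagger: confident on short sentences, not on long ones
    conf = 0.95 if len(sample) <= 3 else 0.5
    return [("B-RELIC" if i == 0 else "O", conf) for i in range(len(sample))]

picked = select_confident([["bronze", "ding"],
                           ["a", "long", "uncertain", "relic", "text"]], toy_predict)
```

Samples that pass the filter are added (with their predicted labels) to the training set, and the select-and-retrain loop repeats.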

General

A Novel Counterfeit Feature Extraction Technique for Exposing Face-Swap Images Based on Deep Learning and Error Level Analysis.

In Entropy (Basel, Switzerland)

The quality and efficiency of generating face-swap images have been markedly strengthened by deep learning. For instance, face-swap manipulations by DeepFake are so realistic that it is difficult to distinguish authenticity through automatic or manual detection. To improve the efficiency of distinguishing face-swap images generated by DeepFake from real facial images, a novel counterfeit feature extraction technique was developed based on deep learning and error level analysis (ELA). It relates to entropy and information theory, for example through the cross-entropy loss function in the final softmax layer. Because the DeepFake algorithm can generate only limited resolutions, it produces two different image compression ratios between the fake face area (the foreground) and the original area (the background), which leaves distinctive counterfeit traces. Through the ELA method, we can detect whether such differing compression ratios are present. A convolutional neural network (CNN), one of the representative technologies of deep learning, can extract the counterfeit feature and detect whether images are fake. Experiments show that the training efficiency of the CNN model can be significantly improved by the ELA method. In addition, the proposed technique accurately extracts the counterfeit feature and therefore outperforms direct detection methods in simplicity and efficiency. Specifically, without loss of accuracy, the amount of computation can be significantly reduced (the required floating-point computing power is reduced by more than 90%).

Zhang Weiguo, Zhao Chenggang, Li Yuxing


CNN, DeepFake detection, ELA detection, deep learning, feature extraction
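The ELA intuition above can be demonstrated with a crude numerical stand-in. Real ELA recompresses a JPEG and inspects the residual; here uniform quantization plays the role of JPEG compression, and the image and quantization steps are invented for the example:

```python
import numpy as np

def quantize(img, step):
    """Crude stand-in for JPEG compression: uniform quantization."""
    return np.round(img / step) * step

rng = np.random.default_rng(2)
img = rng.uniform(0, 255, size=(32, 32))

# Compose a "face-swap": background compressed once at one ratio, a pasted
# foreground region compressed at a different ratio.
composite = quantize(img, 8)
composite[8:24, 8:24] = quantize(img, 20)[8:24, 8:24]

# ELA step: recompress the whole composite at the background's ratio and
# look at the residual. Already-matching regions change little; the pasted
# region, compressed differently, leaves a visibly larger error level.
residual = np.abs(composite - quantize(composite, 8))
fg_err = residual[8:24, 8:24].mean()
bg_only = residual.copy()
bg_only[8:24, 8:24] = np.nan
bg_err = float(np.nanmean(bg_only))
```

The resulting error-level map (here, `residual`) is what the paper feeds to a CNN as the counterfeit feature, rather than the raw pixels.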

General

Entropy-Based Measures of Hypnopompic Heart Rate Variability Contribute to the Automatic Prediction of Cardiovascular Events.

In Entropy (Basel, Switzerland)

Surges in sympathetic activity are considered a major contributor to the frequent occurrence of cardiovascular events towards the end of nocturnal sleep. We aimed to investigate whether analysis of hypnopompic heart rate variability (HRV) could assist in the prediction of cardiovascular disease (CVD). A total of 2217 baseline CVD-free subjects were identified and divided into a CVD group and a non-CVD group according to the presence of CVD at a follow-up visit. HRV measures derived from time-domain analysis, frequency-domain analysis and nonlinear analysis were employed to characterize cardiac functioning. Machine learning models for both long-term and short-term CVD prediction were then constructed based on hypnopompic HRV metrics and other typical CVD risk factors. CVD was associated with significant alterations in hypnopompic HRV. An accuracy of 81.4% was achieved in short-term prediction of CVD, a 10.7% increase compared with long-term prediction. Without the HRV metrics, the predictive performance for short-term CVD outcomes declined by more than 6%. The complexity of hypnopompic HRV, measured by entropy-based indices, contributed considerably to the prediction and achieved greater importance in the proposed models than conventional HRV measures. Our findings suggest that hypnopompic HRV assists the prediction of CVD outcomes, especially the occurrence of a CVD event within two years.

Yan Xueya, Zhang Lulu, Li Jinlian, Du Ding, Hou Fengzhen


XGBoost, cardiovascular disease, heart rate variability, machine learning, sleep
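Sample entropy is one common entropy-based HRV complexity index of the kind the abstract above refers to. The sketch below is a minimal implementation (the paper's exact indices and parameters are not specified here), applied to two invented RR-interval series: a regular, respiration-like series scores lower than an uncorrelated one:

```python
import numpy as np

def sample_entropy(x, m=2, r=0.2):
    """Sample entropy -ln(A/B): A and B count template matches of length
    m+1 and m under a Chebyshev tolerance of r * std(x). Minimal sketch."""
    x = np.asarray(x, dtype=float)
    tol = r * x.std()
    n = len(x)

    def match_count(mm):
        templates = np.array([x[i:i + mm] for i in range(n - mm)])
        count = 0
        for i in range(len(templates)):
            d = np.abs(templates - templates[i]).max(axis=1)
            count += int(np.sum(d <= tol)) - 1  # exclude self-match
        return count

    b, a = match_count(m), match_count(m + 1)
    return -np.log(a / b) if a > 0 and b > 0 else float("inf")

rng = np.random.default_rng(0)
beats = np.arange(300)
rr_regular = 0.8 + 0.05 * np.sin(0.3 * beats)      # respiration-like modulation (s)
rr_irregular = 0.8 + 0.05 * rng.normal(size=300)   # uncorrelated variability (s)

se_regular = sample_entropy(rr_regular)
se_irregular = sample_entropy(rr_irregular)
```

Indices like this one, computed over the hypnopompic RR series, are the entropy-based features the study feeds into its prediction models alongside conventional time- and frequency-domain HRV measures.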