General General

Clustering Analysis via Deep Generative Models With Mixture Models.

In IEEE transactions on neural networks and learning systems

Clustering is a fundamental problem that frequently arises in many fields, such as pattern recognition, data mining, and machine learning. Although various clustering algorithms have been developed in the past, traditional clustering algorithms with shallow structures cannot uncover the interdependence of complex data features in latent space. Recently, deep generative models, such as the autoencoder (AE), variational AE (VAE), and generative adversarial network (GAN), have achieved remarkable success in many unsupervised applications thanks to their ability to learn promising latent representations from the original data. In this work, we first propose a novel clustering approach based on both the Wasserstein GAN with gradient penalty (WGAN-GP) and a VAE with a Gaussian mixture prior. By combining the WGAN-GP with the VAE, the generator of the WGAN-GP is formulated by drawing samples from the probabilistic decoder of the VAE. Moreover, to provide more robust clustering and generation performance when outliers are encountered in the data, a variant of the proposed deep generative model is developed based on a Student's-t mixture prior. The effectiveness of our deep generative models is validated through experiments on both clustering analysis and sample generation. Compared with other state-of-the-art clustering approaches based on deep generative models, the proposed approach provides more stable training of the model, improves the accuracy of clustering, and generates realistic samples.

Yang Lin, Fan Wentao, Bouguila Nizar

2020-Oct-13
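
To make the role of a Gaussian mixture prior concrete, the following is a minimal sketch of how a latent code can be assigned to a cluster once a mixture over the latent space is available. It is not the authors' implementation: the function names, the diagonal-covariance assumption, and the toy parameter values are all illustrative assumptions, and the encoder, decoder, and WGAN-GP critic described in the abstract are omitted entirely.

```python
import math


def log_gaussian_diag(z, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at z."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(z, mean, var)
    )


def responsibilities(z, weights, means, variances):
    """Posterior p(c | z) over mixture components for one latent code z."""
    log_joint = [
        math.log(w) + log_gaussian_diag(z, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_joint)
    unnorm = [math.exp(lj - top) for lj in log_joint]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def assign_cluster(z, weights, means, variances):
    """Index of the mixture component with the highest posterior."""
    post = responsibilities(z, weights, means, variances)
    return max(range(len(post)), key=post.__getitem__)


# Toy two-component prior in a 2-D latent space (hypothetical values).
weights = [0.5, 0.5]
means = [[-2.0, 0.0], [2.0, 0.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]

print(assign_cluster([1.5, 0.3], weights, means, variances))
```

In a trained model, `z` would come from the encoder and the mixture parameters would be learned. A Student's-t mixture, as in the robust variant mentioned above, would replace `log_gaussian_diag` with a heavier-tailed density.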

General General

Concept Factorization With Local Centroids.

In IEEE transactions on neural networks and learning systems

Data clustering is a fundamental problem in the field of machine learning. Among the numerous clustering techniques, matrix factorization-based methods have achieved impressive performance because they provide a compact and interpretable representation of the input data. However, most existing works assume that each class has a global centroid, which does not hold for data with complicated structures. Moreover, they cannot guarantee that each sample is associated with its nearest centroid. In this work, we present a concept factorization with local centroids (CFLC) approach for data clustering. The proposed model has the following advantages: 1) the samples from the same class are allowed to connect with multiple local centroids such that the manifold structure is captured; 2) the pairwise relationship between the samples and centroids is modeled to produce a reasonable label assignment; and 3) the clustering problem is formulated as a bipartite graph partitioning task, and an efficient algorithm is designed for optimization. Experiments on several data sets validate the effectiveness of the CFLC model and demonstrate its superior performance over the state of the art.

Chen Mulin, Li Xuelong

2020-Oct-13
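
For context, the standard concept factorization objective that this line of work builds on can be written as below. This is the classical baseline formulation, not the CFLC objective, which the abstract does not state. The symbols X (data matrix with n samples as columns), W, V, and k (number of centroids) are notation chosen here for illustration.

```latex
% Each centroid is a nonnegative combination of the samples: r_c = \sum_i w_{ic} x_i.
% Each sample is approximated by a nonnegative combination of centroids:
%   x_i \approx \sum_c v_{ic} r_c.
\min_{W \ge 0,\; V \ge 0} \; \bigl\lVert X - X W V^{\top} \bigr\rVert_F^2,
\qquad X \in \mathbb{R}^{d \times n},\quad W, V \in \mathbb{R}^{n \times k}
```

Under this formulation, row i of V is typically read as the soft cluster membership of sample i. The abstract's point is that a single global centroid per class (k equal to the number of classes) can be too coarse for data with complicated structure.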

General General

Visual Analysis of Discrimination in Machine Learning.

In IEEE transactions on visualization and computer graphics

The growing use of automated decision-making in critical applications, such as crime prediction and college admission, has raised questions about fairness in machine learning. How can we decide whether different treatments are reasonable or discriminatory? In this paper, we investigate discrimination in machine learning from a visual analytics perspective and propose an interactive visualization tool, DiscriLens, to support a more comprehensive analysis. To reveal detailed information on algorithmic discrimination, DiscriLens identifies a collection of potentially discriminatory itemsets based on causal modeling and classification rule mining. By combining an extended Euler diagram with a matrix-based visualization, we develop a novel set visualization to facilitate the exploration and interpretation of discriminatory itemsets. A user study shows that users can interpret the visually encoded information in DiscriLens quickly and accurately. Use cases demonstrate that DiscriLens provides informative guidance in understanding and reducing algorithmic discrimination.

Wang Qianwen, Xu Zhenhua, Chen Zhutian, Wang Yong, Liu Shixia, Qu Huamin

2020-Oct-13

General General

HypoML: Visual Analysis for Hypothesis-based Evaluation of Machine Learning Models.

In IEEE transactions on visualization and computer graphics

In this paper, we present a visual analytics tool for enabling hypothesis-based evaluation of machine learning (ML) models. We describe a novel ML-testing framework that combines traditional statistical hypothesis testing (commonly used in empirical research) with logical reasoning about the conclusions of multiple hypotheses. The framework defines a controlled configuration for testing a number of hypotheses as to whether and how some extra information about a "concept" or "feature" may benefit or hinder an ML model. Because reasoning about multiple hypotheses is not always straightforward, we provide HypoML as a visual analysis tool, in which the multi-thread testing results are first transformed into analytical results using statistical and logical inferences, and then into a visual representation for rapid observation of the conclusions and the logical flow between the testing results and hypotheses. We have applied HypoML to a number of hypothesized concepts, demonstrating the intuitive and explainable nature of the visual analysis.

Wang Qianwen, Alexander William, Pegg Jack, Qu Huamin, Chen Min

2020-Oct-13

General General

CNN EXPLAINER: Learning Convolutional Neural Networks with Interactive Visualization.

In IEEE transactions on visualization and computer graphics

Deep learning's great success motivates many practitioners and students to learn about this exciting technology. However, it is often challenging for beginners to take their first step due to the complexity of understanding and applying deep learning. We present CNN EXPLAINER, an interactive visualization tool designed for non-experts to learn and examine convolutional neural networks (CNNs), a foundational deep learning model architecture. Our tool addresses key challenges that novices face while learning about CNNs, which we identify from interviews with instructors and a survey of past students. CNN EXPLAINER tightly integrates a model overview that summarizes a CNN's structure with on-demand, dynamic visual explanation views that help users understand the underlying components of CNNs. Through smooth transitions across levels of abstraction, our tool enables users to inspect the interplay between low-level mathematical operations and high-level model structures. A qualitative user study shows that CNN EXPLAINER helps users more easily understand the inner workings of CNNs, and is engaging and enjoyable to use. We also derive design lessons from our study. Developed using modern web technologies, CNN EXPLAINER runs locally in users' web browsers without the need for installation or specialized hardware, broadening the public's educational access to modern deep learning techniques.

Wang Zijie J, Turko Robert, Shaikh Omar, Park Haekyu, Das Nilaksh, Hohman Fred, Kahng Minsuk, Chau Duen Horng

2020-Oct-13
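
The "low-level mathematical operations" mentioned in the abstract can be illustrated with a small, dependency-free sketch of three common CNN building blocks: a convolution (implemented, as in most deep learning frameworks, as a cross-correlation without kernel flipping), a ReLU activation, and 2x2 max pooling. The function names and the toy input are illustrative assumptions and are not taken from CNN EXPLAINER itself.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and
    sum the elementwise products at each position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + a][j + b] * kernel[a][b]
                for a in range(kh)
                for b in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Replace negative activations with zero."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool_2x2(feature_map):
    """Keep the maximum of each non-overlapping 2x2 window."""
    return [
        [
            max(
                feature_map[i][j], feature_map[i][j + 1],
                feature_map[i + 1][j], feature_map[i + 1][j + 1],
            )
            for j in range(0, len(feature_map[0]) - 1, 2)
        ]
        for i in range(0, len(feature_map) - 1, 2)
    ]


# Hypothetical 5x5 single-channel input and a 2x2 kernel.
image = [
    [1, 0, 2, 1, 0],
    [0, 1, 3, 0, 1],
    [2, 1, 0, 1, 2],
    [1, 0, 1, 3, 0],
    [0, 2, 1, 0, 1],
]
kernel = [[1, -1], [-1, 1]]

pooled = max_pool_2x2(relu(conv2d_valid(image, kernel)))
print(pooled)
```

A real CNN layer applies many such kernels across multiple input channels and adds a bias term. The single-channel version above only shows the arithmetic at each sliding-window position.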

General General

Visual Analytics for Temporal Hypergraph Model Exploration.

In IEEE transactions on visualization and computer graphics

Many processes, from gene interaction in biology to computer networks to social media, can be modeled more precisely as temporal hypergraphs than as regular graphs. This is because hypergraphs generalize graphs by extending edges to connect any number of vertices, allowing complex relationships to be described more accurately and their behavior over time to be predicted. However, the interactive exploration and seamless refinement of such hypergraph-based prediction models still pose a major challenge. We contribute HYPER-MATRIX, a novel visual analytics technique that addresses this challenge through a tight coupling between machine learning and interactive visualizations. In particular, the technique incorporates a geometric deep learning model as a blueprint for problem-specific models while integrating visualizations for graph-based and category-based data with a novel combination of interactions for an effective user-driven exploration of hypergraph models. To eliminate demanding context switches and ensure scalability, our matrix-based visualization provides drill-down capabilities across multiple levels of semantic zoom, from an overview of model predictions down to the content. We facilitate a focused analysis of relevant connections and groups based on interactive user-steering for filtering and search tasks, a dynamically modifiable partition hierarchy, various matrix reordering techniques, and interactive model feedback. We evaluate our technique in a case study and through formative evaluation with law enforcement experts using real-world internet forum communication data. The results show that our approach surpasses existing solutions in terms of scalability and applicability, enables the incorporation of domain knowledge, and allows for fast search-space traversal. With the proposed technique, we pave the way for the visual analytics of temporal hypergraphs in a wide variety of domains.

Fischer Maximilian T, Arya Devanshu, Streeb Dirk, Seebacher Daniel, Keim Daniel A, Worring Marcel

2020-Oct-13
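
As a small illustration of the data structure involved, the sketch below represents a temporal hypergraph as a list of timestamped hyperedges, each connecting an arbitrary number of vertices, and derives a per-time-step incidence view of the kind a matrix-based display could be built on. The vertex names, timestamps, and function names are hypothetical, and this is not the HYPER-MATRIX data model.

```python
from collections import defaultdict

# Each hyperedge is (timestamp, set of vertices). Unlike an ordinary
# graph edge, it may connect any number of vertices.
hyperedges = [
    (1, frozenset({"alice", "bob", "carol"})),
    (1, frozenset({"bob", "dave"})),
    (2, frozenset({"alice", "dave", "erin", "frank"})),
]


def incidence_by_time(edges):
    """Return {timestamp: {vertex: [indices of hyperedges containing it]}}."""
    slices = defaultdict(lambda: defaultdict(list))
    for idx, (t, members) in enumerate(edges):
        for v in members:
            slices[t][v].append(idx)
    return slices


def degree(edges, vertex, up_to=None):
    """Number of hyperedges containing the vertex, optionally limited
    to hyperedges with timestamp <= up_to."""
    return sum(
        1
        for t, members in edges
        if vertex in members and (up_to is None or t <= up_to)
    )


slices = incidence_by_time(hyperedges)
print(sorted(slices[2]))
print(degree(hyperedges, "alice", up_to=1))
```

Grouping by timestamp keeps each time slice independently inspectable. A regular graph is the special case in which every hyperedge contains exactly two vertices.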