In The Journal of the Acoustical Society of America

In acoustic scene classification (ASC), acoustic features play a crucial role in the extraction of scene information, which can be stored over different time scales. Moreover, the limited size of the dataset may lead to a biased model that performs poorly on recordings from unseen cities and on easily confused scene classes. This paper proposes a long-term wavelet feature that captures discriminative long-term scene information. The extracted scalogram requires less storage and can be classified faster and more accurately than classic Mel filter bank coefficients (FBank). Furthermore, to improve the generalization of ASC systems, a data augmentation scheme is adopted that iteratively extends the database using auxiliary classifier generative adversarial networks (ACGANs) and a deep learning-based sample filter. Experiments were conducted on datasets from the Detection and Classification of Acoustic Scenes and Events (DCASE) challenges. On the DCASE17 and DCASE19 datasets, the proposed techniques outperformed the FBank classifier. Moreover, the ACGAN-based data augmentation scheme achieved an absolute accuracy improvement of 6.10% on recordings from unseen cities, far exceeding classic augmentation methods.

Chen Hangting, Liu Zuozhen, Liu Zongming, Zhang Pengyuan