Schedule as of Oct 11, 2022 - subject to change

Default Time Zone is EDT - Eastern Daylight Time

Stereo InSE-NET: Stereo Audio Quality Predictor Transfer Learned from Mono InSE-NET

Automatic coded audio quality predictors are typically designed for evaluating single channels without considering any spatial aspects. With InSE-NET [1], we demonstrated mimicking a state-of-the-art coded audio quality metric (ViSQOL-v3 [2]) with deep neural networks (DNN) and subsequently improving it, using only programmatically generated data. In this study, we take steps towards building a DNN-based coded stereo audio quality predictor and propose an extension of InSE-NET for handling stereo signals. The design accounts for stereo/spatial aspects by conditioning the model on the left, right, mid, and side channels; we name our model Stereo InSE-NET. By transferring selected weights from the pre-trained mono InSE-NET and retraining with both real and synthetically augmented listening tests, we demonstrate a significant improvement of 12% and 6% in Pearson's and Spearman's rank correlation coefficients, respectively, over the latest ViSQOL-v3 [3].
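As a rough illustration of two ideas in the abstract, the sketch below derives the four conditioning channels (left, right, mid, side) from a stereo pair and computes the Pearson and Spearman rank correlations used to compare a predictor against listening-test scores. It is not the authors' implementation: the function names are hypothetical, and the mid/side scaling of (L + R) / 2 and (L - R) / 2 is an assumed common convention that the abstract does not specify.

```python
import math


def stereo_to_four_channels(left, right):
    """Return (left, right, mid, side) from two equal-length sample lists.

    Assumption: mid = (L + R) / 2 and side = (L - R) / 2. The scaling
    actually used by Stereo InSE-NET is not stated in the abstract.
    """
    if len(left) != len(right):
        raise ValueError("left and right must have the same length")
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return left, right, mid, side


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

In an evaluation of this kind, x would hold the predictor's scores and y the subjective listening-test scores for the same items; Pearson measures linear agreement, while Spearman measures agreement in ranking only.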

Speakers

Arijit Biswas

Dolby Germany GmbH

Guanxin Jiang

Dolby Germany GmbH

Thursday October 27, 2022 11:45am - 12:00pm EDT
Online Papers

Applications in Audio

badge type: ALL ACCESS or ONLINE

AES Fall Convention 2022

