Shared by 大工至善|大学至真 http://blog.sciencenet.cn/u/lcj2212916


[Repost] [Information Technology] [2003] Analysis and Coding of High Quality Audio Signals

770 views | 2021-4-2 17:03 | Category: Research notes | Source: Repost


This is a doctoral thesis from Queensland University of Technology, Australia (author: Daryl Ning), 179 pages in total.

 

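Digital audio is increasingly a part of our daily lives. Unfortunately, the high bitrate of the raw digital signal makes it an extremely expensive representation. Applications such as digital audio broadcasting, high definition television, and internet audio require high quality audio at low bitrates. The field of audio coding addresses this problem of reducing the bitrate of digital audio while maintaining high perceptual quality. Developing an efficient audio coder requires a detailed analysis of the audio signals themselves, and it is important to find a representation that can concisely model any general audio signal.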
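To make the cost of the raw representation concrete, the sketch below computes the bitrate of uncompressed PCM audio. The CD-audio parameters (44.1 kHz, 16 bits, stereo) and the 64 kbps comparison target are illustrative assumptions, not figures stated in the paragraph above.

```python
def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample, channels):
    """Bitrate of uncompressed PCM audio in kilobits per second."""
    return sample_rate_hz * bits_per_sample * channels / 1000.0


# Assumed example: CD-quality stereo audio.
cd_kbps = pcm_bitrate_kbps(44_100, 16, 2)

# Assumed low-bitrate target used only for comparison.
target_kbps = 64
compression_ratio = cd_kbps / target_kbps

print(f"raw PCM: {cd_kbps:.1f} kbps, ratio to {target_kbps} kbps: {compression_ratio:.2f}")
```

Under these assumptions a coder has to discard or re-encode roughly 95% of the raw bits, which is why a compact signal representation matters.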

 

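In this thesis, we propose two new high quality audio coders based on two different audio representations: the sinusoidal-wavelet representation and the warped linear predictive coding (WLPC)-wavelet representation. Beyond high quality coding, it is also important for audio coders to be flexible in their application. With the growing popularity of internet audio, it is advantageous for audio coders to address issues related to real-time audio delivery. This thesis targets bitstream scalability, and a third audio coder capable of bitstream scalability is therefore also proposed. The performance of each proposed coder was evaluated by comparison with the MPEG layer III coder.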

 

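The first coder is based on a hybrid sinusoidal-wavelet representation. Each frame of audio is assumed to be a sum of sinusoids plus a noisy residual. The discrete wavelet transform (DWT) decomposes the residual into subbands that approximate the critical bands of human hearing, and a perceptually derived bit allocation algorithm then minimises the audible distortion introduced by quantising the DWT coefficients. Listening tests showed that the coder delivers near-transparent quality for a range of critical audio signals at 64 kbps. It also outperforms the MPEG layer III coder operating at the same bitrate. However, this coder is only useful for high quality coding and is difficult to scale to lower rates.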
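The sinusoid-plus-residual idea can be sketched as follows. This is a minimal illustration, not the thesis's method: it extracts only one sinusoid by DFT peak picking, uses a synthetic frame, and applies a plain octave-band Haar DWT, whereas approximating critical bands would normally need a more refined decomposition. All function names and parameters are hypothetical.

```python
import cmath
import math
import random


def dft_bin(frame, k):
    """Return the k-th DFT coefficient of a real-valued frame."""
    n = len(frame)
    return sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))


def extract_strongest_sinusoid(frame):
    """Pick the largest DFT peak and return (bin, amplitude, phase)."""
    n = len(frame)
    spectrum = {k: dft_bin(frame, k) for k in range(1, n // 2)}
    k_peak = max(spectrum, key=lambda k: abs(spectrum[k]))
    peak = spectrum[k_peak]
    # For A*cos(2*pi*k*i/n + phi), the k-th bin equals (A*n/2)*exp(j*phi).
    return k_peak, 2.0 * abs(peak) / n, cmath.phase(peak)


def haar_dwt(signal, levels):
    """Octave-band Haar DWT; returns [detail_1, ..., detail_L, approx_L]."""
    bands, approx = [], list(signal)
    for _ in range(levels):
        pairs = list(zip(approx[0::2], approx[1::2]))
        bands.append([(a - b) / math.sqrt(2) for a, b in pairs])
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    bands.append(approx)
    return bands


def energy(xs):
    return sum(x * x for x in xs)


# Synthetic frame (assumption): one sinusoid on bin 20 plus weak Gaussian noise.
rng = random.Random(0)
N = 256
frame = [math.cos(2 * math.pi * 20 * i / N + 0.3) + 0.05 * rng.gauss(0.0, 1.0)
         for i in range(N)]

k, amp, phase = extract_strongest_sinusoid(frame)
model = [amp * math.cos(2 * math.pi * k * i / N + phase) for i in range(N)]
residual = [x - m for x, m in zip(frame, model)]   # the "noisy residual"
subbands = haar_dwt(residual, levels=4)            # residual split into subbands

print(k, round(amp, 3), [len(b) for b in subbands])
```

After the sinusoid is removed, the residual carries only a small fraction of the frame energy, so the subband coefficients are what the perceptual bit allocation would then have to quantise.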

 

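The second coder is based on a hybrid WLPC-wavelet representation. In this approach, the spectrum of the audio signal is estimated by an all-pole filter obtained with warped linear prediction (WLP). WLP operates on a warped frequency domain whose resolution can be adjusted to approximate that of the human auditory system. This makes the inherent noise shaping of the synthesis filter even better suited to audio coding. The excitation to this filter is transformed with the DWT and perceptually encoded. Listening tests showed that near-transparent coding is achieved at 64 kbps. The coder was also found to be slightly superior to the MPEG layer III coder operating at the same bitrate.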
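A common way to realise warped linear prediction, sketched below, replaces each unit delay with a first-order all-pass section and then solves the usual normal equations. This is an assumption about the general technique, not a reproduction of the thesis's implementation. The Bark-matching formula for the warping coefficient is the closed-form fit by Smith and Abel; the test signal and all function names are hypothetical.

```python
import math
import random


def bark_warping_coefficient(sample_rate_hz):
    """Smith-Abel closed-form fit of the all-pass coefficient to the Bark scale."""
    fs_khz = sample_rate_hz / 1000.0
    return 1.0674 * math.sqrt((2.0 / math.pi) * math.atan(0.06583 * fs_khz)) - 0.1916


def warped_frequency(omega, lam):
    """Map a normalised frequency (rad/sample) through the all-pass warping."""
    return omega + 2.0 * math.atan2(lam * math.sin(omega), 1.0 - lam * math.cos(omega))


def allpass(signal, lam):
    """Filter with D(z) = (z^-1 - lam) / (1 - lam * z^-1)."""
    out, prev_x, prev_y = [], 0.0, 0.0
    for x in signal:
        y = -lam * x + prev_x + lam * prev_y
        out.append(y)
        prev_x, prev_y = x, y
    return out


def warped_autocorrelation(signal, order, lam):
    """r[k] = sum of x[n] times x passed through k all-pass sections."""
    r, delayed = [], list(signal)
    for _ in range(order + 1):
        r.append(sum(a * b for a, b in zip(signal, delayed)))
        delayed = allpass(delayed, lam)
    return r


def levinson_durbin(r, order):
    """Return predictor coefficients a[1..order] and the final prediction error."""
    a, err = [0.0] * (order + 1), r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a, err = new_a, err * (1.0 - k * k)
    return a[1:], err


lam = bark_warping_coefficient(44_100)

# Hypothetical test frame: white noise through a one-pole low-pass filter.
rng = random.Random(1)
frame, state = [], 0.0
for _ in range(512):
    state = 0.9 * state + rng.gauss(0.0, 1.0)
    frame.append(state)

r = warped_autocorrelation(frame, order=8, lam=lam)
coeffs, err = levinson_durbin(r, order=8)
print(round(lam, 4), [round(c, 3) for c in coeffs])
```

With a positive warping coefficient, low frequencies are stretched on the warped axis, so a predictor of fixed order spends more of its resolution where hearing is most selective. Setting the coefficient to zero turns each all-pass section into a plain unit delay and recovers ordinary linear prediction.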

 

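The third proposed coder is similar to the preceding WLPC-wavelet coder but is modified to achieve bitstream scalability. A noise model for high frequency components keeps the overall bitrate low, and a two-stage quantisation scheme is applied to the DWT coefficients. The first stage uses fixed-rate scalar and vector quantisation to provide a coarse approximation of the coefficients, which allows low bitrate, low quality versions of the input signal to be embedded in the overall bitstream. The second stage adds detail to the coefficients and hence enhances the quality of the output signal. Listening tests showed that signal quality improves gracefully as the bitrate increases from 16 kbps to 80 kbps. The performance of this coder is comparable to that of the MPEG layer III coder operating at a similar (but fixed) bitrate.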
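The embedding idea behind two-stage quantisation can be illustrated with a small sketch. Here two uniform scalar quantisers stand in for the fixed-rate scalar and vector quantisation described above, and the coefficient values and step sizes are made up for illustration. A decoder that receives only the first stage gets a coarse signal, while one that also receives the second stage gets a refined one.

```python
def quantise(values, step):
    """Uniform scalar quantiser: map each value to an integer index."""
    return [round(v / step) for v in values]


def dequantise(indices, step):
    return [i * step for i in indices]


def encode_two_stage(coeffs, coarse_step, fine_step):
    """Stage 1 codes the coefficients coarsely; stage 2 codes the stage-1 error."""
    base = quantise(coeffs, coarse_step)
    base_rec = dequantise(base, coarse_step)
    refinement = quantise([c - b for c, b in zip(coeffs, base_rec)], fine_step)
    return base, refinement


def decode(base, refinement, coarse_step, fine_step, use_refinement):
    """Decode the base layer, optionally adding the refinement layer."""
    rec = dequantise(base, coarse_step)
    if use_refinement:
        detail = dequantise(refinement, fine_step)
        rec = [r + d for r, d in zip(rec, detail)]
    return rec


def max_error(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


# Hypothetical DWT coefficients and step sizes.
coeffs = [0.93, -2.41, 0.07, 1.58, -0.66, 3.12]
base, refinement = encode_two_stage(coeffs, coarse_step=1.0, fine_step=0.125)

low_quality = decode(base, refinement, 1.0, 0.125, use_refinement=False)
high_quality = decode(base, refinement, 1.0, 0.125, use_refinement=True)

print(max_error(coeffs, low_quality), max_error(coeffs, high_quality))
```

Because the refinement indices only correct the base layer, a stream can be truncated after the base layer and still decode, which is the property that makes the bitstream scalable.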

 

Digital audio is increasingly becoming a part of our daily lives. Unfortunately, the excessive bitrate associated with the raw digital signal makes it an extremely expensive representation. Applications such as digital audio broadcasting, high definition television, and internet audio require high quality audio at low bitrates. The field of audio coding addresses this important issue of reducing the bitrate of digital audio while maintaining a high perceptual quality. Developing an efficient audio coder requires a detailed analysis of the audio signals themselves. It is important to find a representation that can concisely model any general audio signal.

In this thesis, we propose two new high quality audio coders based on two different audio representations: the sinusoidal-wavelet representation and the warped linear predictive coding (WLPC)-wavelet representation. In addition to high quality coding, it is also important for audio coders to be flexible in their application. With the increasing popularity of internet audio, it is advantageous for audio coders to address issues related to real-time audio delivery. The issue of bitstream scalability has been targeted in this thesis, and therefore a third audio coder capable of bitstream scalability is also proposed. The performance of each of the proposed coders was evaluated by comparisons with the MPEG layer III coder.

The first coder proposed is based on a hybrid sinusoidal-wavelet representation. This assumes that each frame of audio can be modelled as a sum of sinusoids plus a noisy residual. The discrete wavelet transform (DWT) is used to decompose the residual into subbands that approximate the critical bands of human hearing. A perceptually derived bit allocation algorithm is then used to minimise the audible distortions introduced by quantising the DWT coefficients. Listening tests showed that the coder delivers near transparent quality for a range of critical audio signals at 64 kbps. It also outperforms the MPEG layer III coder operating at this same bitrate. This coder, however, is only useful for high quality coding, and is difficult to scale to operate at lower rates.

The second coder proposed is based on a hybrid WLPC-wavelet representation. In this approach, the spectrum of the audio signal is estimated by an all-pole filter using warped linear prediction (WLP). WLP operates on a warped frequency domain, where the resolution can be adjusted to approximate that of the human auditory system. This makes the inherent noise shaping of the synthesis filter even more suited to audio coding. The excitation to this filter is transformed using the DWT and perceptually encoded. Listening tests showed that near transparent coding is achieved at 64 kbps. The coder was also found to be slightly superior to the MPEG layer III coder operating at this same bitrate.

The third proposed coder is similar to the previous WLPC-wavelet coder, but modified to achieve bitstream scalability. A noise model for high frequency components is included to keep the overall bitrate low, and a two stage quantisation scheme for the DWT coefficients is implemented. The first stage uses fixed rate scalar and vector quantisation to provide a coarse approximation of the coefficients. This allows for low bitrate, low quality versions of the input signal to be embedded in the overall bitstream. The second stage of quantisation adds detail to the coefficients, and hence enhances the quality of the output signal. Listening tests showed that signal quality gracefully improves as the bitrate increases from 16 kbps to 80 kbps. This coder has a performance that is comparable to the MPEG layer III coder operating at a similar (but fixed) bitrate.

 

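1. Introduction

2. Audio Coding Fundamentals

3. Perceptual Audio Coding Schemes

4. Hybrid Sinusoidal-Wavelet Audio Coding

5. Hybrid WLPC-Wavelet Audio Coding

6. Bitstream Scalable WLPC-Wavelet Audio Coding

7. Conclusions and Future Work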


https://wap.sciencenet.cn/blog-69686-1279962.html
