
Speex Manual ---- Codec Description

Reposted from: http://blog.csdn.net/DotScylla/article/details/4402688

Concepts
Codec
Preprocessor
Adaptive jitter buffer
Echo cancellation
Resampling

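Preface: The official Speex website is http://speex.org/, where the English manual can be found under Documentation, as a PDF or as online HTML. My English is limited and I am not very familiar with speech coding, so the translation may contain errors. For that reason I have included the original English paragraph alongside each translated one. If you spot a mistake, please do not hesitate to point it out, so we can all improve together.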

PS: If you repost this, please credit the source. Many thanks.


This section describes Speex and its features in more detail.

This part introduces Speex and its features in detail.

2.1 Concepts

Before introducing all the Speex features, here are some concepts in speech coding that help better understand the rest of the manual. Although some are general concepts in speech/audio processing, others are specific to Speex.

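Before introducing the Speex features, a few concepts need explaining to make the rest of the document easier to follow. Some of them are common in speech/audio processing, while others are specific to Speex.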

Sampling rate

The sampling rate expressed in Hertz (Hz) is the number of samples taken from a signal per second. For a sampling rate of Fs kHz, the highest frequency that can be represented is equal to Fs/2 kHz (Fs/2 is known as the Nyquist frequency). This is a fundamental property in signal processing and is described by the sampling theorem. Speex is mainly designed for three different sampling rates: 8 kHz, 16 kHz, and 32 kHz. These are respectively referred to as narrowband, wideband and ultra-wideband.

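The sampling rate is the number of samples taken from a continuous signal per second. For a sampling rate of Fs kHz, the highest frequency that can be represented is Fs/2 kHz (the Nyquist frequency). The sampling theorem establishes this as a fundamental property of signal processing. Speex is mainly designed for three sampling rates, 8 kHz, 16 kHz and 32 kHz, called narrowband, wideband and ultra-wideband respectively.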
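As a quick illustration of the Nyquist relationship and the three Speex bands described above, here is a minimal C sketch. The helper names `nyquist_hz` and `speex_band_name` are hypothetical and exist only for this illustration; they are not part of the Speex API.

```c
#include <stddef.h>

/* Highest frequency (the Nyquist frequency) that a signal sampled at
   sample_rate_hz can represent, in Hz. */
static unsigned nyquist_hz(unsigned sample_rate_hz) {
    return sample_rate_hz / 2;
}

/* Band name the manual uses for each of the three main Speex sampling
   rates; returns NULL for any other rate. */
static const char *speex_band_name(unsigned sample_rate_hz) {
    switch (sample_rate_hz) {
    case 8000:  return "narrowband";
    case 16000: return "wideband";
    case 32000: return "ultra-wideband";
    default:    return NULL;
    }
}
```

For example, `nyquist_hz(16000)` returns 8000, so wideband Speex can represent content up to 8 kHz.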

Bit-rate

When encoding a speech signal, the bit-rate is defined as the number of bits per unit of time required to encode the speech. It is measured in bits per second (bps), or generally kilobits per second. It is important to make the distinction between kilobits per second (kbps) and kilobytes per second (kBps).

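The bit-rate is the number of bits transmitted per second. In speech coding, it is the number of bits needed to represent each second of speech, measured in bps (bits per second) or kbps (kilobits per second). Be careful to distinguish kbps from kBps (kilobytes per second).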
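The bit-rate is simply bits per frame times frames per second, and the kbps/kBps distinction is a factor of 8. The sketch below assumes 20 ms frames (the frame duration Speex uses); `bitrate_bps` and `kbps_to_kBps` are hypothetical helper names.

```c
#include <stdint.h>

/* Bit-rate in bits per second of a stream that emits bits_per_frame
   bits every frame_ms milliseconds (frame_ms must be non-zero). */
static double bitrate_bps(uint32_t bits_per_frame, uint32_t frame_ms) {
    return (double)bits_per_frame * 1000.0 / (double)frame_ms;
}

/* Kilobits per second -> kilobytes per second (8 bits per byte). */
static double kbps_to_kBps(double kbps) {
    return kbps / 8.0;
}
```

A hypothetical stream emitting 160 bits every 20 ms runs at `bitrate_bps(160, 20)`, i.e. 8000 bps or 8 kbps, which is only 1 kBps.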

Quality (variable)

Speex is a lossy codec, which means that it achieves compression at the expense of fidelity of the input speech signal. Unlike some other speech codecs, it is possible to control the tradeoff made between quality and bit-rate. The Speex encoding process is controlled most of the time by a quality parameter that ranges from 0 to 10. In constant bit-rate (CBR) operation, the quality parameter is an integer, while for variable bit-rate (VBR), the parameter is a float.

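Speex is a lossy codec: it achieves compression at the cost of some distortion of the input speech signal. Unlike some other speech codecs, it lets you control the trade-off between quality and bit-rate. Most of the time, Speex encoding is controlled by a quality parameter ranging from 0 to 10. In constant bit-rate (CBR) operation the quality parameter is an integer; in variable bit-rate (VBR) operation it is a float.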

Complexity (variable)

With Speex, it is possible to vary the complexity allowed for the encoder. This is done by controlling how the search is performed with an integer ranging from 1 to 10 in a way that’s similar to the -1 to -9 options to gzip and bzip2 compression utilities. For normal use, the noise level at complexity 1 is between 1 and 2 dB higher than at complexity 10, but the CPU requirements for complexity 10 are about 5 times higher than for complexity 1. In practice, the best trade-off is between complexity 2 and 4, though higher settings are often useful when encoding non-speech sounds like DTMF tones.

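In Speex, the encoder's complexity is adjustable. An integer from 1 to 10 controls how the search is performed, similar to the -1 to -9 options of the gzip and bzip2 compression tools (blogger's note: for bzip2, these options set the compression block size, from 100k to 900k). Normally, the noise level at complexity 1 is 1 to 2 dB higher than at complexity 10, while complexity 10 needs about 5 times as much CPU as complexity 1. In practice, the best trade-off lies between complexity 2 and 4; higher settings are useful for encoding non-speech sounds such as dual-tone multi-frequency (DTMF) tones.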

Variable Bit-Rate (VBR)

Variable bit-rate (VBR) allows a codec to change its bit-rate dynamically to adapt to the “difficulty” of the audio being encoded. In the example of Speex, sounds like vowels and high-energy transients require a higher bit-rate to achieve good quality, while fricatives (e.g. s,f sounds) can be coded adequately with fewer bits. For this reason, VBR can achieve lower bit-rate for the same quality, or a better quality for a certain bit-rate. Despite its advantages, VBR has two main drawbacks: first, by only specifying quality, there’s no guarantee about the final average bit-rate. Second, for some real-time applications like voice over IP (VoIP), what counts is the maximum bit-rate, which must be low enough for the communication channel.

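Variable bit-rate (VBR) lets the codec adjust its bit-rate dynamically to the “difficulty” of the audio being encoded. In Speex, for example, vowels and high-energy transients need a higher bit-rate to sound good, while fricatives (such as s and f sounds) can be encoded adequately with fewer bits. VBR can therefore reach the same quality at a lower bit-rate, or better quality at a given bit-rate. Despite these advantages, VBR has two main drawbacks: first, because only the quality is specified, there is no guarantee on the final average bit-rate; second, in some real-time applications such as VoIP, what matters is the maximum bit-rate, which must be low enough for the communication channel.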
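To make the two drawbacks concrete, the sketch below summarizes a sequence of per-frame bit counts: the average is what a VBR quality setting does not pin down, while the peak is what a real-time channel must be able to carry. It assumes 20 ms frames; `summarize_vbr` is a hypothetical helper, not a Speex function, and the frame sizes in the usage line are made-up values.

```c
#include <stddef.h>
#include <stdint.h>

/* Average and peak bit-rate of a run of encoded frames. */
typedef struct {
    double avg_bps;   /* long-term average bit-rate */
    double peak_bps;  /* bit-rate implied by the most expensive frame */
} rate_summary;

/* frame_bits[i] is the number of bits spent on frame i; each frame lasts
   frame_ms milliseconds. Returns zeros for empty input. */
static rate_summary summarize_vbr(const uint32_t *frame_bits, size_t n,
                                  uint32_t frame_ms) {
    rate_summary s = {0.0, 0.0};
    if (n == 0 || frame_ms == 0)
        return s;
    uint64_t total = 0;
    uint32_t max_bits = 0;
    for (size_t i = 0; i < n; i++) {
        total += frame_bits[i];
        if (frame_bits[i] > max_bits)
            max_bits = frame_bits[i];
    }
    double frames_per_sec = 1000.0 / (double)frame_ms;
    s.avg_bps = (double)total / (double)n * frames_per_sec;
    s.peak_bps = (double)max_bits * frames_per_sec;
    return s;
}
```

With hypothetical 20 ms frames of 80, 160, 300 and 60 bits, the average is 7500 bps, but the peak frame needs a channel of 15000 bps.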

Average Bit-Rate (ABR)

Average bit-rate solves one of the problems of VBR, as it dynamically adjusts VBR quality in order to meet a specific target bit-rate. Because the quality/bit-rate is adjusted in real-time (open-loop), the global quality will be slightly lower than that obtained by encoding in VBR with exactly the right quality setting to meet the target average bit-rate.

Average bit-rate (ABR) solves one of VBR's problems by dynamically adjusting the VBR quality to meet a specific target bit-rate. Because ABR adjusts the quality/bit-rate in real time (open-loop), the overall quality is slightly lower than what you would get by encoding in VBR with exactly the quality setting that hits the target average bit-rate.
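One simple way to picture ABR is a controller that nudges the VBR quality after each frame, depending on how far a running bit-rate estimate is from the target. This is a minimal sketch under that assumption: the smoothing weight, the step size and all names are hypothetical, and it is not the rule Speex itself uses. Because it only reacts to bits already spent and never re-encodes, it also hints at why ABR ends up slightly below a VBR quality chosen with hindsight.

```c
/* Hypothetical ABR-style controller (not Speex's actual algorithm). */
typedef struct {
    double target_bps;  /* desired average bit-rate */
    double quality;     /* VBR quality, kept within [0, 10] */
    double avg_bps;     /* exponentially smoothed bit-rate estimate */
    double smoothing;   /* weight of the newest frame in the estimate */
    double step;        /* quality change per unit of relative error */
} abr_ctl;

static double clampd(double x, double lo, double hi) {
    return x < lo ? lo : (x > hi ? hi : x);
}

static void abr_init(abr_ctl *c, double target_bps, double start_quality) {
    c->target_bps = target_bps;
    c->quality = clampd(start_quality, 0.0, 10.0);
    c->avg_bps = target_bps;  /* start the estimate at the target */
    c->smoothing = 0.05;
    c->step = 2.0;
}

/* Feed the bit-rate the last frame actually produced; returns the quality
   to use for the next frame (lower if over target, higher if under). */
static double abr_update(abr_ctl *c, double frame_bps) {
    c->avg_bps += c->smoothing * (frame_bps - c->avg_bps);
    double rel_err = (c->target_bps - c->avg_bps) / c->target_bps;
    c->quality = clampd(c->quality + c->step * rel_err, 0.0, 10.0);
    return c->quality;
}
```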

Voice Activity Detection (VAD)

When enabled, voice activity detection detects whether the audio being encoded is speech or silence/background noise. VAD is always implicitly activated when encoding in VBR, so the option is only useful in non-VBR operation. In this case, Speex detects non-speech periods and encodes them with just enough bits to reproduce the background noise. This is called “comfort noise generation” (CNG).

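Voice activity detection (VAD) determines whether the audio being encoded is speech or silence/background noise. It is always enabled when encoding in VBR, so the option only matters in non-VBR operation. In that case, Speex detects non-speech periods and encodes them with just enough bits to reproduce the background noise. This is called “comfort noise generation” (CNG).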
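The idea of separating speech from background noise can be illustrated with a toy energy detector that tracks a noise floor. This is a deliberately simple stand-in, not the detector Speex implements; the struct, its fields and the thresholds are all hypothetical.

```c
#include <stddef.h>

/* Toy energy-based VAD, NOT the one Speex uses: a frame counts as speech
   when its mean energy exceeds the tracked noise floor by a fixed factor. */
typedef struct {
    double noise_floor;   /* mean-square energy of the background noise (> 0) */
    double speech_ratio;  /* energy/floor needed to call it speech; 10.0 = 10 dB */
    double adapt;         /* how fast the floor follows non-speech frames, 0..1 */
} toy_vad;

/* Mean-square energy of one frame of 16-bit samples. */
static double frame_energy(const short *x, size_t n) {
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += (double)x[i] * x[i];
    return n ? acc / (double)n : 0.0;
}

/* Returns 1 for speech, 0 for silence/background noise. The floor is only
   updated on non-speech frames, so speech does not drag it upward. */
static int toy_vad_frame(toy_vad *v, const short *x, size_t n) {
    double e = frame_energy(x, n);
    int speech = e > v->speech_ratio * v->noise_floor;
    if (!speech) {
        v->noise_floor += v->adapt * (e - v->noise_floor);
        if (v->noise_floor < 1.0)
            v->noise_floor = 1.0;  /* keep the floor positive in digital silence */
    }
    return speech;
}
```

Frames flagged 0 are the ones that, in non-VBR mode, would be encoded with just enough bits for comfort noise.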

Discontinuous Transmission (DTX)

Discontinuous transmission is an addition to VAD/VBR operation that allows the encoder to stop transmitting completely when the background noise is stationary. In file-based operation, since we cannot just stop writing to the file, only 5 bits are used for such frames (corresponding to 250 bps).

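Discontinuous transmission (DTX) is an extra option on top of VAD/VBR operation that allows transmission to stop completely when the background noise is stationary. In file-based operation, since we cannot simply stop writing to the file, only 5 bits are used for such frames (corresponding to 250 bps).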
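The 250 bps figure follows directly from the frame rate, assuming Speex's 20 ms frames (50 frames per second):

```latex
R_{\text{DTX}} = \frac{5\ \text{bits}}{20\ \text{ms}}
              = 5\ \text{bits} \times 50\ \text{frames/s}
              = 250\ \text{bps}
```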

Perceptual enhancement

Perceptual enhancement is a part of the decoder which, when turned on, attempts to reduce the perception of the noise/distortion produced by the encoding/decoding process. In most cases, perceptual enhancement brings the sound further from the original objectively (e.g. considering only SNR), but in the end it still sounds better (subjective improvement).

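Perceptual enhancement is part of the decoder. When enabled, it tries to reduce the perceived noise/distortion introduced by the encoding/decoding process. In most cases it moves the sound objectively further from the original (for example, when considering only the signal-to-noise ratio, SNR), but the result still sounds better (a subjective improvement).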

Algorithmic delay

Every speech codec introduces a delay in the transmission. For Speex, this delay is equal to the frame size, plus some amount of “look-ahead” required to process each frame. In narrowband operation (8 kHz), the delay is 30 ms, while for wideband (16 kHz), the delay is 34 ms. These values don’t account for the CPU time it takes to encode or decode the frames.

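Every speech codec introduces delay into the transmission. For Speex, the delay equals the frame size plus the “look-ahead” needed to process each frame. In narrowband (8 kHz) operation the delay is 30 ms, and in wideband (16 kHz) operation it is 34 ms. These figures do not include the CPU time needed to encode or decode the frames.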
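Since a Speex frame lasts 20 ms in both narrowband (160 samples at 8 kHz) and wideband (320 samples at 16 kHz), the look-ahead implied by the delays quoted above can be worked out by subtraction:

```latex
D = T_{\text{frame}} + T_{\text{look-ahead}}, \qquad T_{\text{frame}} = 20\ \text{ms}

T_{\text{look-ahead}}^{\text{NB}} = 30\ \text{ms} - 20\ \text{ms} = 10\ \text{ms}

T_{\text{look-ahead}}^{\text{WB}} = 34\ \text{ms} - 20\ \text{ms} = 14\ \text{ms}
```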

2.2 Codec

The main characteristics of Speex can be summarized as follows:
    • Free software/open-source, patent and royalty-free
    • Integration of narrowband and wideband using an embedded bit-stream
    • Wide range of bit-rates available (from 2.15 kbps to 44 kbps)
    • Dynamic bit-rate switching (AMR) and Variable Bit-Rate (VBR) operation
    • Voice Activity Detection (VAD, integrated with VBR) and discontinuous transmission (DTX)
    • Variable complexity
    • Embedded wideband structure (scalable sampling rate)
    • Ultra-wideband sampling rate at 32 kHz
    • Intensity stereo encoding option
    • Fixed-point implementation

The main features of Speex are summarized below:

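Free and open-source software, patent-free and royalty-free
Narrowband and wideband integrated in a single embedded bit-stream
Wide range of available bit-rates (from 2.15 kbps to 44 kbps)
Dynamic bit-rate switching (AMR) and variable bit-rate (VBR) operation
Voice activity detection (VAD, integrated with VBR) and discontinuous transmission (DTX)
Variable complexity
Embedded wideband structure (scalable sampling rate)
Ultra-wideband sampling rate at 32 kHz
Intensity stereo encoding option
Fixed-point implementation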

2.3 Preprocessor

This part refers to the preprocessor module introduced in the 1.1.x branch. The preprocessor is designed to be used on the audio before running the encoder. The preprocessor provides three main functionalities:
• noise suppression
• automatic gain control (AGC)
• voice activity detection (VAD)

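This part covers the preprocessor module introduced in the 1.1.x branch. The preprocessor is applied to the audio before it is encoded, and provides three main functions: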

Noise suppression
Automatic gain control (AGC)
Voice activity detection (VAD)

The denoiser can be used to reduce the amount of background noise present in the input signal. This provides higher quality speech whether or not the denoised signal is encoded with Speex (or at all). However, when using the denoised signal with the codec, there is an additional benefit. Speech codecs in general (Speex included) tend to perform poorly on noisy input, which tends to amplify the noise. The denoiser greatly reduces this effect.

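The denoiser reduces the amount of background noise in the input signal. This gives higher-quality speech whether or not the denoised signal is then encoded with Speex (or encoded at all). When the denoised signal is fed to the codec, however, there is an extra benefit: speech codecs in general (Speex included) perform poorly on noisy input and tend to amplify the noise. The denoiser greatly reduces this effect.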

Automatic gain control (AGC) is a feature that deals with the fact that the recording volume may vary by a large amount between different setups. The AGC provides a way to adjust a signal to a reference volume. This is useful for voice over IP because it removes the need for manual adjustment of the microphone gain. A secondary advantage is that by setting the microphone gain to a conservative (low) level, it is easier to avoid clipping.

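Recording volume can vary widely between different setups, and automatic gain control (AGC) deals with this. It provides a way to adjust a signal to a reference volume. This is very useful for VoIP (voice over IP), because it removes the need to adjust the microphone gain by hand. A secondary benefit is that, by setting the microphone gain to a conservative (low) level, clipping is easier to avoid.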
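A toy version of the AGC idea: measure each frame's level, step a gain toward a reference level, and saturate the output so it never wraps around the 16-bit sample range. This is a hypothetical sketch (the names, step factors and gain cap are assumptions), not the algorithm in the Speex preprocessor. It compares squared levels, so it needs no `sqrt()`.

```c
#include <stddef.h>

/* Toy automatic gain control, not the Speex preprocessor's AGC. */
typedef struct {
    double gain;        /* current linear gain applied to the samples */
    double target_rms;  /* reference level, in 16-bit sample units */
    double step_up;     /* gain multiplier when too quiet, e.g. 1.05 */
    double step_down;   /* gain multiplier when too loud, e.g. 0.90 */
    double max_gain;    /* cap so silence is not amplified without limit */
} toy_agc;

/* Adjusts the gain from one frame's level, then applies it in place. */
static void toy_agc_process(toy_agc *a, short *x, size_t n) {
    if (n == 0)
        return;
    double ms = 0.0;  /* input mean-square level */
    for (size_t i = 0; i < n; i++)
        ms += (double)x[i] * x[i];
    ms /= (double)n;
    /* Compare squared levels so no sqrt() is needed. */
    double out_ms = a->gain * a->gain * ms;
    double target_ms = a->target_rms * a->target_rms;
    if (out_ms < target_ms)
        a->gain *= a->step_up;
    else if (out_ms > target_ms)
        a->gain *= a->step_down;
    if (a->gain > a->max_gain)
        a->gain = a->max_gain;
    /* Apply the gain, saturating instead of overflowing. */
    for (size_t i = 0; i < n; i++) {
        double y = x[i] * a->gain;
        if (y > 32767.0) y = 32767.0;
        if (y < -32768.0) y = -32768.0;
        x[i] = (short)y;
    }
}
```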

The voice activity detector (VAD) provided by the preprocessor is more advanced than the one directly provided in the codec.

The voice activity detector (VAD) provided by the preprocessor is more advanced than the one built directly into the codec.

2.4 Adaptive Jitter Buffer

When transmitting voice (or any content for that matter) over UDP or RTP, packets may be lost, arrive with different delays, or even arrive out of order. The purpose of a jitter buffer is to reorder packets and buffer them long enough (but no longer than necessary) so they can be sent to be decoded.

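When voice (or any other content) is transmitted over UDP or RTP, packets may be lost, arrive with varying delays, or even arrive out of order. The purpose of a jitter buffer is to reorder the packets and buffer them long enough (but no longer than necessary) before passing them to the decoder.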
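The reordering half of a jitter buffer can be sketched with a small ring indexed by sequence number. This is a hypothetical, fixed-size toy: it does not adapt its buffering delay the way Speex's adaptive jitter buffer does, and none of these names belong to the Speex API.

```c
#include <stdint.h>
#include <string.h>

#define JB_SLOTS 16        /* hypothetical capacity, in packets */
#define JB_MAX_PAYLOAD 64  /* hypothetical maximum packet size, in bytes */

typedef struct {
    int used;
    uint32_t seq;
    size_t len;
    unsigned char data[JB_MAX_PAYLOAD];
} jb_slot;

typedef struct {
    jb_slot slot[JB_SLOTS];
    uint32_t next_seq;  /* sequence number the decoder wants next */
} toy_jitter_buffer;

static void jb_init(toy_jitter_buffer *jb, uint32_t first_seq) {
    memset(jb, 0, sizeof *jb);
    jb->next_seq = first_seq;
}

/* Store a packet that may arrive late or out of order. Packets that are
   already too old, too far ahead, or too large are dropped (returns 0). */
static int jb_put(toy_jitter_buffer *jb, uint32_t seq,
                  const void *data, size_t len) {
    uint32_t ahead = seq - jb->next_seq;  /* wraps to a huge value for old packets */
    if (ahead >= JB_SLOTS || len > JB_MAX_PAYLOAD)
        return 0;
    jb_slot *s = &jb->slot[seq % JB_SLOTS];
    s->used = 1;
    s->seq = seq;
    s->len = len;
    memcpy(s->data, data, len);
    return 1;
}

/* Fetch the next packet in sequence order. Returns its length, or -1 if
   it is missing; either way the buffer advances by one frame. */
static long jb_get(toy_jitter_buffer *jb, void *out) {
    jb_slot *s = &jb->slot[jb->next_seq % JB_SLOTS];
    long result = -1;
    if (s->used && s->seq == jb->next_seq) {
        memcpy(out, s->data, s->len);
        result = (long)s->len;
        s->used = 0;
    }
    jb->next_seq++;
    return result;
}
```

When `jb_get` returns -1, the frame was lost or is still in flight, and the decoder would have to conceal the gap.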

2.5 Echo Cancellation

Figure 2.1: Acoustic echo model

In any hands-free communication system (Fig. 2.1), speech from the remote end is played in the local loudspeaker, propagates in the room and is captured by the microphone. If the audio captured from the microphone is sent directly to the remote end, then the remote user hears an echo of his voice. An acoustic echo canceller is designed to remove the acoustic echo before it is sent to the remote end. It is important to understand that the echo canceller is meant to improve the quality on the remote end.

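As shown in Figure 2.1, in a hands-free communication system, speech from the remote end is played through the local loudspeaker, propagates through the room, and is picked up by the microphone. If the microphone audio is sent straight back to the remote end, the remote user hears an echo of their own voice. An acoustic echo canceller is designed to remove this echo before the audio is sent to the remote end. It is important to understand that the echo canceller improves the quality heard at the remote end.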
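At the core of an acoustic echo canceller is an adaptive filter that learns the loudspeaker-to-microphone path and subtracts the predicted echo from the microphone signal. The sketch below uses plain time-domain normalized LMS (NLMS) as a stand-in; Speex's own canceller is a more elaborate frequency-domain adaptive filter. The tap count, step size and names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define AEC_TAPS 8  /* hypothetical; real room echo needs far longer filters */

typedef struct {
    double w[AEC_TAPS];     /* estimated loudspeaker-to-microphone impulse response */
    double hist[AEC_TAPS];  /* recent far-end samples, hist[0] is the newest */
    double mu;              /* NLMS step size, 0 < mu < 2 */
} toy_aec;

static void aec_init(toy_aec *a, double mu) {
    memset(a, 0, sizeof *a);
    a->mu = mu;
}

/* One sample of NLMS echo cancellation. far_end is the sample sent to the
   loudspeaker, mic the sample captured by the microphone. Returns the
   microphone sample with the estimated echo removed. */
static double aec_step(toy_aec *a, double far_end, double mic) {
    memmove(&a->hist[1], &a->hist[0], (AEC_TAPS - 1) * sizeof a->hist[0]);
    a->hist[0] = far_end;
    double echo_est = 0.0;
    double energy = 1e-6;  /* regularizer: avoids dividing by zero */
    for (size_t k = 0; k < AEC_TAPS; k++) {
        echo_est += a->w[k] * a->hist[k];
        energy += a->hist[k] * a->hist[k];
    }
    double err = mic - echo_est;      /* residual after removing the estimate */
    double g = a->mu * err / energy;  /* normalized update step */
    for (size_t k = 0; k < AEC_TAPS; k++)
        a->w[k] += g * a->hist[k];
    return err;
}
```

Real echo paths last far longer than 8 samples, so a practical canceller needs a much longer filter, plus handling for moments when both ends talk at once.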

2.6 Resampling

In some cases, it may be useful to convert audio from one sampling rate to another. There are many reasons for that. It can be for mixing streams that have different sampling rates, for supporting sampling rates that the soundcard doesn’t support, for transcoding, etc. That’s why there is now a resampler that is part of the Speex project. This resampler can be used to convert between any two arbitrary rates (the ratio must only be a rational number) and there is control over the quality/complexity tradeoff.

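In some cases it is useful to convert audio from one sampling rate to another, for many reasons: mixing streams that have different sampling rates, supporting sampling rates that the sound card does not support, transcoding, and so on. That is why a resampler is now part of the Speex project. It can convert between any two rates (the ratio only has to be a rational number) and offers control over the quality/complexity trade-off.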
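Two parts of the description can be illustrated directly: reducing the conversion ratio to lowest terms, and stepping through the input at that rational ratio with exact integer positions. The interpolation here is plain linear interpolation, a much simpler technique than the filter Speex's resampler uses, and it applies no low-pass filtering, so downsampling with it can alias. All names are hypothetical.

```c
#include <stddef.h>

/* Greatest common divisor (Euclid's algorithm). */
static unsigned gcd_u(unsigned a, unsigned b) {
    while (b != 0) {
        unsigned t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Reduce in_rate/out_rate to lowest terms, e.g. 44100/16000 -> 441/160.
   Both rates must be non-zero. */
static void reduce_ratio(unsigned in_rate, unsigned out_rate,
                         unsigned *num, unsigned *den) {
    unsigned g = gcd_u(in_rate, out_rate);
    *num = in_rate / g;
    *den = out_rate / g;
}

/* Linear-interpolation resampler. Returns the number of samples written. */
static size_t resample_linear(const short *in, size_t in_len,
                              unsigned in_rate, unsigned out_rate,
                              short *out, size_t out_cap) {
    if (in_rate == 0 || out_rate == 0)
        return 0;
    unsigned num, den;
    reduce_ratio(in_rate, out_rate, &num, &den);
    size_t written = 0;
    for (size_t j = 0; written < out_cap; j++) {
        /* Output sample j sits at input position j * num / den. */
        unsigned long long pos = (unsigned long long)j * num;
        size_t i = (size_t)(pos / den);
        unsigned frac = (unsigned)(pos % den);
        if (i + 1 >= in_len) {
            /* Past the last interval: keep only an exact hit on the final sample. */
            if (i < in_len && frac == 0)
                out[written++] = in[i];
            break;
        }
        double y = in[i] + (double)(in[i + 1] - in[i]) * frac / den;
        out[written++] = (short)(y < 0 ? y - 0.5 : y + 0.5);  /* round to nearest */
    }
    return written;
}
```

For example, converting 44100 Hz to 16000 Hz reduces to the ratio 441/160, and upsampling {0, 100, 200} from 8 kHz to 16 kHz yields {0, 50, 100, 150, 200}.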

后記：

Well, overall I feel the translation is rather rough, and there are places I did not fully understand. I am posting it here for everyone to critique.

Posted on 2012-11-22 00:01 by 楊粼波, filed under: Article Collection, C++
