O(1) 的小樂

In probability theory and information theory, the Kullback–Leibler divergence[1][2][3] (also information divergence,information gain, relative entropy, or KLIC) is a non-symmetric measure of the difference between two probability distributions P and Q. KL measures the expected number of extra bits required to code samples from P when using a code based on Q, rather than using a code based on P. Typically P represents the "true" distribution of data, observations, or a precise calculated theoretical distribution. The measure Q typically represents a theory, model, description, or approximation of P.

Although it is often intuited as a distance metric, the KL divergence is not a true metric – for example, the KL from P to Q is not necessarily the same as the KL from Q to P.

KL divergence is a special case of a broader class of divergences called f-divergences. Originally introduced by Solomon Kullbackand Richard Leibler in 1951 as the directed divergence between two distributions, it is not the same as a divergence incalculus. However, the KL divergence can be derived from the Bregman divergence.

注意P通常指數據集，我們已有的數據集，Q表示理論結果，所以KL divergence 的物理含義就是當用Q來編碼P中的采樣時，比用P來編碼P中的采用需要多用的位數！

KL散度，也有人稱為KL距離，但是它并不是嚴格的距離概念，其不滿足三角不等式

KL散度是不對稱的，當然，如果希望把它變對稱，

Ds(p1, p2) = [D(p1, p2) + D(p2, p1)] / 2

下面是KL散度的離散和連續定義！

$D_{\mathrm{KL}}(P\|Q) = \sum_i P(i) \log \frac{P(i)}{Q(i)}. \!$

$D_{\mathrm{KL}}(P\|Q) = \int_{-\infty}^\infty p(x) \log \frac{p(x)}{q(x)} \; dx, \!$

注意的一點是p(x) 和q(x)分別是pq兩個隨機變量的PDF,D(P||Q)是一個數值，而不是一個函數，看下圖！

注意：KL Area to be Integrated!

KL 散度一個很強大的性質：

The Kullback–Leibler divergence is always non-negative,

$D_{\mathrm{KL}}(P\|Q) \geq 0, \,$

a result known as , with D_KL(P||Q) zero if and only if P = Q.

計算KL散度的時候，注意問題是在稀疏數據集上KL散度計算通常會出現分母為零的情況！

Matlab中的函數：KLDIV給出了兩個分布的KL散度

Description

KLDIV Kullback-Leibler or Jensen-Shannon divergence between two distributions.

KLDIV(X,P1,P2) returns the Kullback-Leibler divergence between two distributions specified over the M variable values in vector X. P1 is a length-M vector of probabilities representing distribution 1, and P2 is a length-M vector of probabilities representing distribution 2. Thus, the probability of value X(i) is P1(i) for distribution 1 and P2(i) for distribution 2. The Kullback-Leibler divergence is given by:

KL(P1(x),P2(x)) = sum[P1(x).log(P1(x)/P2(x))]

If X contains duplicate values, there will be an warning message, and these values will be treated as distinct values. (I.e., the actual values do not enter into the computation, but the probabilities for the two duplicate values will be considered as probabilities corresponding to two unique values.) The elements of probability vectors P1 and P2 must each sum to 1 +/- .00001.

A "log of zero" warning will be thrown for zero-valued probabilities. Handle this however you wish. Adding 'eps' or some other small value to all probabilities seems reasonable. (Renormalize if necessary.)

KLDIV(X,P1,P2,'sym') returns a symmetric variant of the Kullback-Leibler divergence, given by [KL(P1,P2)+KL(P2,P1)]/2. See Johnson and Sinanovic (2001).

KLDIV(X,P1,P2,'js') returns the Jensen-Shannon divergence, given by [KL(P1,Q)+KL(P2,Q)]/2, where Q = (P1+P2)/2. See the Wikipedia article for "Kullback–Leibler divergence". This is equal to 1/2 the so-called "Jeffrey divergence." See Rubner et al. (2000).

EXAMPLE: Let the event set and probability sets be as follow:
   X = [1 2 3 3 4]';
   P1 = ones(5,1)/5;
   P2 = [0 0 .5 .2 .3]' + eps;
Note that the event set here has duplicate values (two 3's). These will be treated as DISTINCT events by KLDIV. If you want these to be treated as the SAME event, you will need to collapse their probabilities together before running KLDIV. One way to do this is to use UNIQUE to find the set of unique events, and then iterate over that set, summing probabilities for each instance of each unique event. Here, we just leave the duplicate values to be treated independently (the default):
   KL = kldiv(X,P1,P2);
   KL =
        19.4899

Note also that we avoided the log-of-zero warning by adding 'eps' to all probability values in P2. We didn't need to renormalize because we're still within the sum-to-one tolerance.

REFERENCES:
1) Cover, T.M. and J.A. Thomas. "Elements of Information Theory," Wiley, 1991.
2) Johnson, D.H. and S. Sinanovic. "Symmetrizing the Kullback-Leibler distance." IEEE Transactions on Information Theory (Submitted).
3) Rubner, Y., Tomasi, C., and Guibas, L. J., 2000. "The Earth Mover's distance as a metric for image retrieval." International Journal of Computer Vision, 40(2): 99-121.
4) <a href="http://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence">Kullback–Leibler divergence</a>. Wikipedia, The Free Encyclopedia.

posted on 2010-10-16 15:04 Sosi 閱讀(10034) 評論(2) 編輯收藏引用所屬分類: Taps in Research

只有注冊用戶登錄后才能發表評論。
【推薦】100%開源！大型工業跨平臺軟件C++源碼提供，建模，組態！

相關文章: 游戲程序員養成計劃 (更新2010.11.6) Free Resource Thread 在VS2008下編譯ODE物理引擎zz 文獻調研1 Kullback–Leibler divergence KL散度 Mahalanobis distance 馬氏距離 6 blog tips for busy academics Two Classes of solutions to IK Footskate Yak Shaving Mark

網站導航: 博客園 IT新聞 BlogJava 博問 Chat2DB 管理

統計系統

# re: Kullback–Leibler divergence KL散度 2010-11-30 16:17 tintin0324

# re: Kullback–Leibler divergence KL散度 2010-12-05 22:37 Sosi

青青草原综合久久大伊人导航_色综合久久天天综合_日日噜噜夜夜狠狠久久丁香五月_热久久这里只有精品

O(1) 的小樂

公告

統計

留言簿(10)

隨筆分類(70)

隨筆檔案(182)

文章檔案(1)

如影隨形

搜索

最新隨筆

最新評論

閱讀排行榜

評論排行榜

Kullback–Leibler divergence KL散度

評論

青青草原综合久久大伊人导航_色综合久久天天综合_日日噜噜夜夜狠狠久久丁香五月_热久久这里只有精品

O(1) 的小樂

公告

統計

留言簿(10)

隨筆分類(70)

隨筆檔案(182)

文章檔案(1)

如影隨形

搜索

最新隨筆

最新評論

閱讀排行榜

評論排行榜

Kullback–Leibler divergence KL散度

評論

# re: Kullback&ndash;Leibler divergence KL散度 2010-11-30 16:17 tintin0324

# re: Kullback&ndash;Leibler divergence KL散度 2010-12-05 22:37 Sosi

# re: Kullback–Leibler divergence KL散度 2010-11-30 16:17 tintin0324

# re: Kullback–Leibler divergence KL散度 2010-12-05 22:37 Sosi