AEP for continuous random variables: let $X_1,\dots,X_n$ be i.i.d. random variables with density $f(x)$. Then

$$-\frac{1}{n}\log f(X_1,\dots,X_n)\to E[-\log f(X)] = h(X)\quad \text{in probability}$$
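A minimal Monte Carlo sketch of this convergence (my illustration, not from the source; it assumes $X_i\sim\mathscr{N}(0,1)$, whose differential entropy is $\frac{1}{2}\log 2\pi e$ bits):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
h_true = 0.5 * np.log2(2 * np.pi * np.e)  # h(X) for N(0,1), in bits

for n in [10, 100, 10_000]:
    x = rng.standard_normal(n)
    # For i.i.d. samples the joint density factorizes, so
    # log f(X_1,...,X_n) is the sum of the marginal log-densities.
    neg_avg = -norm.logpdf(x).sum() / (n * np.log(2))  # nats -> bits
    print(f"n={n:6d}: -1/n log f = {neg_avg:.4f}   h(X) = {h_true:.4f}")
```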
Typical set:

$$A_\epsilon^{(n)} = \left\{(x_1,\dots,x_n)\in S^n : \left|-\frac{1}{n}\log f(x_1,\dots,x_n) - h(X)\right|\leq\epsilon\right\}$$
Write $\mathrm{Vol}(A) = \int_A dx_1\,dx_2\cdots dx_n$ for $A\subset\mathbb{R}^n$.
Then $A_\epsilon^{(n)}$ satisfies the following properties:
1. $\Pr(A_\epsilon^{(n)})\geq 1-\epsilon$ for $n$ sufficiently large.
2. $\mathrm{Vol}(A_\epsilon^{(n)})\leq 2^{n(h(X)+\epsilon)}$ for all $n$.
3. $\mathrm{Vol}(A_\epsilon^{(n)})\geq(1-\epsilon)2^{n(h(X)-\epsilon)}$ for $n$ sufficiently large.
Theorem 9.3.1: If the density $f(x)$ of the random variable $X$ is Riemann integrable, then

$$H(X^\Delta) + \log\Delta\to h(f) = h(X)\quad\text{as }\Delta\to 0$$
Thus the entropy of an $n$-bit quantization of a continuous random variable $X$ is approximately $h(X)+n$.
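A quick numerical check of Theorem 9.3.1 (my sketch; the Gaussian example, the bin widths, and the truncation at $\pm 10$ are arbitrary choices):

```python
import numpy as np
from scipy.stats import norm

h_true = 0.5 * np.log2(2 * np.pi * np.e)  # h(X) for N(0,1), in bits
for delta in [1.0, 0.1, 0.01]:
    edges = np.arange(-10.0, 10.0 + delta, delta)  # truncate the real line at +-10
    p = np.diff(norm.cdf(edges))                   # P(X falls in each bin)
    p = p[p > 0]
    H_quant = -(p * np.log2(p)).sum()              # entropy of the quantized variable
    print(f"Delta={delta}: H(X^Delta) + log2(Delta) = "
          f"{H_quant + np.log2(delta):.4f}   h(X) = {h_true:.4f}")
```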
Conditional differential entropy:
$$h(X\mid Y) = -\int f(x,y)\log f(x\mid y)\,dx\,dy$$

$$h(X\mid Y) = h(X,Y) - h(Y)$$
The differential entropy of a multivariate normal distribution is

$$h(X_1,\dots,X_n) = h(\mathscr{N}_n(\mu,K)) = \frac{1}{2}\log\left((2\pi e)^n|K|\right)\ \text{bits}$$
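This formula also gives a concrete check of the chain rule above. A sketch (my example; the correlation $\rho = 0.8$ is arbitrary, and scipy reports entropy in nats):

```python
import numpy as np
from scipy.stats import multivariate_normal

rho = 0.8
K = np.array([[1.0, rho], [rho, 1.0]])  # covariance of a bivariate normal

h_joint = 0.5 * np.log2((2 * np.pi * np.e) ** 2 * np.linalg.det(K))  # h(X,Y)
h_y = 0.5 * np.log2(2 * np.pi * np.e * K[1, 1])                      # h(Y)
h_cond = h_joint - h_y                                               # chain rule h(X|Y)

# Direct closed form: given Y = y, X is normal with variance 1 - rho^2.
h_cond_direct = 0.5 * np.log2(2 * np.pi * np.e * (1 - rho**2))
print(h_cond, h_cond_direct)  # should agree

# Cross-check the joint entropy against scipy (nats -> bits).
print(multivariate_normal(mean=[0, 0], cov=K).entropy() / np.log(2), h_joint)
```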
Relative Entropy:
$$D(f\mid\mid g) = \int f\log\frac{f}{g}\,dx$$
Motivated by continuity, we set $0\log\frac{0}{0} = 0$.
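For two univariate Gaussians, $D(f\mid\mid g)$ has a closed form that can be checked against direct numerical integration (my sketch; the parameter values are arbitrary and the result is in nats):

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

m1, s1, m2, s2 = 0.0, 1.0, 1.0, 2.0

# Closed form for D(N(m1,s1^2) || N(m2,s2^2)), in nats.
closed = np.log(s2 / s1) + (s1**2 + (m1 - m2) ** 2) / (2 * s2**2) - 0.5

# Numerical integration of f log(f/g) over an interval carrying ~all the mass.
integrand = lambda x: norm.pdf(x, m1, s1) * (norm.logpdf(x, m1, s1) - norm.logpdf(x, m2, s2))
numeric, _ = quad(integrand, -30, 30)
print(closed, numeric)  # should match to quadrature accuracy
```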
Mutual information:
$$I(X;Y) = \int f(x,y)\log\frac{f(x,y)}{f(x)f(y)}\,dx\,dy = D(f(x,y)\mid\mid f(x)f(y))$$
Moreover, $I(X^\Delta; Y^\Delta)\to I(X;Y)$ as $\Delta\to 0$: mutual information is the limit of the mutual information between the quantized versions.
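A sketch of this limit for a bivariate Gaussian, where $I(X;Y) = -\frac{1}{2}\log(1-\rho^2)$ (my illustration; $\rho$, the sample size, and the bin widths are arbitrary, and the plug-in estimate carries some bias):

```python
import numpy as np

rng = np.random.default_rng(1)
rho, n = 0.6, 500_000
x = rng.standard_normal(n)
y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(n)

def mi_quantized(delta):
    """Plug-in estimate of I(X^Delta; Y^Delta) in bits."""
    bx = np.arange(x.min(), x.max() + delta, delta)
    by = np.arange(y.min(), y.max() + delta, delta)
    pxy, _, _ = np.histogram2d(x, y, bins=[bx, by])
    pxy /= n
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0
    return (pxy[nz] * np.log2(pxy[nz] / np.outer(px, py)[nz])).sum()

print("I(X;Y) =", -0.5 * np.log2(1 - rho**2))
for delta in [1.0, 0.5, 0.25]:
    print(f"Delta={delta}: {mi_quantized(delta):.4f}")
```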
Hadamard's inequality: if we let $\vec{X}\sim\mathscr{N}(0,K)$ be a multivariate normal random variable, then substituting the Gaussian entropies into the inequality $h(X_1,\dots,X_n)\leq\sum_{i=1}^n h(X_i)$ gives

$$|K|\leq\prod_{i=1}^n K_{ii}$$
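A one-line numerical check (my sketch; the dimension is arbitrary, and $K = AA^t$ makes the matrix positive semidefinite by construction):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5))
K = A @ A.T  # positive semidefinite, so a valid covariance matrix
print(np.linalg.det(K), np.prod(np.diag(K)))  # det(K) should not exceed the product
```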
Properties of differential entropy:
1. $h(X+c) = h(X)$
2. $h(aX) = h(X) + \log|a|$
3. $h(A\vec{X}) = h(\vec{X}) + \log|A|$, where $|A|$ is the absolute value of the determinant of $A$ (see the sketch below)
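Checking the third property on a Gaussian, where both sides have closed forms: if $\vec{X}\sim\mathscr{N}(0,K)$ then $A\vec{X}\sim\mathscr{N}(0,AKA^t)$ (my sketch; the matrix values are arbitrary):

```python
import numpy as np

K = np.array([[2.0, 0.3], [0.3, 1.0]])  # covariance of X
A = np.array([[1.0, 2.0], [0.0, 3.0]])  # invertible linear map

def h_gauss(K):
    """Differential entropy of N(0, K), in bits."""
    n = K.shape[0]
    return 0.5 * np.log2((2 * np.pi * np.e) ** n * np.linalg.det(K))

lhs = h_gauss(A @ K @ A.T)                         # h(AX)
rhs = h_gauss(K) + np.log2(abs(np.linalg.det(A)))  # h(X) + log|A|
print(lhs, rhs)  # should agree
```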
Let the random vector $\vec{X}\in\mathbb{R}^n$ have zero mean and covariance $K = E\vec{X}\vec{X}^t$, i.e. $K_{ij} = EX_iX_j$. Then $h(\vec{X})\leq\frac{1}{2}\log\left((2\pi e)^n|K|\right)$, with equality if and only if $\vec{X}\sim\mathscr{N}(0,K)$.

Proof: let $g(\vec{x})$ be any density satisfying $\int g(\vec{x})x_ix_j\,d\vec{x} = K_{ij}$, and let $\phi_K$ be the density of a $\mathscr{N}(0,K)$ random vector. Note that $\log\phi_K(\vec{x})$ is a quadratic form and $\int x_ix_j\phi_K(\vec{x})\,d\vec{x} = K_{ij}$. Then
$$\begin{aligned}
0 &\leq D(g\mid\mid\phi_K) \\
&= \int g\log\left(\frac{g}{\phi_K}\right) \\
&= -h(g) - \int g\log\phi_K \\
&= -h(g) - \int\phi_K\log\phi_K \\
&= -h(g) + h(\phi_K)
\end{aligned}$$

where the third equality substitutes $\phi_K$ for $g$: since $\log\phi_K$ is a quadratic form, its integral against a density depends only on the second moments, which $g$ and $\phi_K$ share.
Among all distributions with the same variance, the normal distribution has the maximum entropy. We can use this to bound the entropy of a discrete random variable. The bound will not be in terms of the variance, since a discrete random variable can have arbitrarily small variance and still have large entropy; instead, it is stated via an integer-valued random variable with the same probabilities.
Let $X$ be a random variable taking values in $\mathscr{X} = \{a_1, a_2,\dots\}$ with probability mass function

$$\Pr(X = a_i) = p_i$$
Thm 9.7.1:
$$H(p_1,\dots)\leq\frac{1}{2}\log\left(2\pi e\left(\sum_{i=1}^\infty p_i i^2 - \left(\sum_{i=1}^\infty ip_i\right)^2 + \frac{1}{12}\right)\right)$$
and, for every permutation $\sigma$,

$$H(p_1,\dots)\leq\frac{1}{2}\log\left(2\pi e\left(\sum_{i=1}^\infty p_{\sigma(i)} i^2 - \left(\sum_{i=1}^\infty ip_{\sigma(i)}\right)^2 + \frac{1}{12}\right)\right)$$
Proof: define two random variables. The first, $X_0$, satisfies

$$\Pr(X_0 = i) = p_i$$

The second, $U$, is uniform on $[0,1]$ and independent of $X_0$. Let $\tilde{X} = X_0 + U$.
$$\begin{aligned}
H(X_0) &= -\sum_{i=1}^\infty p_i\log p_i \\
&= -\sum_{i=1}^\infty\left(\int_i^{i+1} f_{\tilde X}(x)\,dx\right)\log\left(\int_i^{i+1} f_{\tilde X}(x)\,dx\right) \\
&= -\sum_{i=1}^\infty\int_i^{i+1} f_{\tilde X}(x)\log f_{\tilde X}(x)\,dx \\
&= -\int_1^\infty f_{\tilde X}(x)\log f_{\tilde X}(x)\,dx \\
&= h(\tilde X)
\end{aligned}$$

where the second and third equalities hold because $f_{\tilde X}$ is constant (equal to $p_i$) on each interval $[i, i+1)$.
Hence we have the following chain of inequalities:
$$\begin{aligned}
H(X) &= H(X_0) = h(\tilde X) \\
&\leq\frac{1}{2}\log\left(2\pi e\,\mathrm{Var}(\tilde X)\right) \\
&=\frac{1}{2}\log\left(2\pi e\,(\mathrm{Var}(X_0) + \mathrm{Var}(U))\right) \\
&=\frac{1}{2}\log\left(2\pi e\left(\sum_{i=1}^\infty p_i i^2 - \left(\sum_{i=1}^\infty ip_i\right)^2 + \frac{1}{12}\right)\right)
\end{aligned}$$

where the inequality is the maximum-entropy bound for fixed variance, and the second equality uses the independence of $X_0$ and $U$, with $\mathrm{Var}(U) = \frac{1}{12}$.
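A numerical check of Thm 9.7.1 (my sketch) on a geometric pmf $p_i = (1-q)q^{i-1}$, $i\geq 1$; the value $q = 0.7$ is arbitrary and the series are truncated where the tail is negligible:

```python
import numpy as np

q = 0.7
i = np.arange(1, 201)
p = (1 - q) * q ** (i - 1)  # geometric pmf; the tail beyond i=200 is negligible

H = -(p * np.log2(p)).sum()                                  # entropy in bits
var_plus = (p * i**2).sum() - ((p * i).sum()) ** 2 + 1 / 12  # Var(X_0) + 1/12
bound = 0.5 * np.log2(2 * np.pi * np.e * var_plus)
print(H, bound)  # H should not exceed the bound
```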