My Notes

Created: 2026-03-06 07:53:04

Updated: 2026-03-06 07:53:04

Codes & Extensions:

Let $S,T$ be two finite sets, called the source and target alphabets. A code $C:S\to T^*$ is a total function mapping each source symbol to a string over the target alphabet, and the extension of $C$ is the homomorphism $ext(C):S^*\to T^*$ that maps each sequence of source symbols to the concatenation of the corresponding codewords.
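
As a small illustration of the definition, the extension can be sketched as codeword concatenation. The mapping `code` below is a made-up example (not taken from these notes), and `ext` is a hypothetical helper name.

```python
def ext(code):
    """Return the extension of `code`: map a source sequence to the concatenated codewords."""
    def extended(sequence):
        return "".join(code[symbol] for symbol in sequence)
    return extended


# Illustrative code over S = {a, b, c} and T = {0, 1} (assumed values).
code = {"a": "0", "b": "10", "c": "11"}
encode = ext(code)

# Homomorphism property: encoding a concatenation equals concatenating the encodings.
assert encode("ab" + "ca") == encode("ab") + encode("ca")
```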

Non-singular codes: the mapping is injective

A code is non-singular if each source symbol is mapped to a different non-empty bit string, i.e. the mapping from source symbols to bit strings is injective ($f(x_{1})=f(x_{2})\implies x_{1}=x_{2}$).
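
A minimal sketch of this check, assuming the code is stored as a Python dict from source symbols to bit strings (`is_non_singular` is a hypothetical name):

```python
def is_non_singular(code):
    """True if all codewords are non-empty and pairwise distinct."""
    words = list(code.values())
    return all(words) and len(set(words)) == len(words)


# Two symbols share the codeword "0", so this illustrative mapping is singular.
singular = {"a": "0", "b": "0", "c": "1"}
# All codewords differ and none is empty, so this one is non-singular.
non_singular = {"a": "0", "b": "01", "c": "011"}
```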

Uniquely decodable codes:

For every finite encoded string $T'\in T^*$ produced by the code, there is exactly one decoding $S'\in S^*$ such that $ext(C)(S')=T'$.

A code $C$ is uniquely decodable if its extension $ext(C)$ is non-singular. Whether a given code is uniquely decodable can be decided with the Sardinas-Patterson algorithm.

%%Given a code, how do we decide whether it is uniquely decodable? Suppose some $T'\in T^*$ has two decodings $S_{1},S_{2}\in S^*$, with $S_{1}=(A_{1},A_{2},\dots,A_{m})$ and $S_{2}=(B_{1},B_{2},\dots,B_{n})$, where $A_{i},B_{j}\in S$.%%

Sardinas-Patterson Algorithm

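Let $S_{0}=C$ (the set of codewords).
Round $i$: compute the set of dangling suffixes $S_{i}$, where $A^{-1}B=\{w \mid aw\in B \text{ for some } a\in A\}$.
- $S_{1}=S_{0}^{-1}S_{0}\setminus\{\varepsilon\}$
- $S_{i+1}=C^{-1}S_{i}\cup S_{i}^{-1}C$ for $i\geq 1$
- $C$ is uniquely decodable if and only if no $S_{i}$ with $i\geq 1$ contains a codeword.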
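
A minimal Python sketch of the test, under the standard formulation: each round forms the dangling suffixes $C^{-1}S_{i}\cup S_{i}^{-1}C$, and the code fails as soon as a dangling suffix is itself a codeword. The function names are hypothetical, and codewords are assumed to be distinct non-empty strings.

```python
def dangling_suffixes(prefixes, words):
    """Return A^{-1}B: every w such that a + w is in B for some a in A."""
    return {w[len(a):] for a in prefixes for w in words if w.startswith(a)}


def is_uniquely_decodable(codewords):
    """Sardinas-Patterson test on a collection of distinct non-empty codewords."""
    code = set(codewords)
    # S_1 = C^{-1}C without the empty word (a codeword trivially prefixes itself).
    current = dangling_suffixes(code, code) - {""}
    seen = set()
    while current:
        if current & code:
            # A dangling suffix is itself a codeword: some string has two parses.
            return False
        key = frozenset(current)
        if key in seen:
            # The sets have started to repeat, so no codeword will ever appear.
            return True
        seen.add(key)
        # S_{i+1} = C^{-1}S_i union S_i^{-1}C
        current = dangling_suffixes(code, current) | dangling_suffixes(current, code)
    return True
```

The loop terminates because every dangling suffix is a suffix of some codeword, so only finitely many distinct sets can occur.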

![[294024f145e41457d784606e9b5742a6d7ca7ffe.svg]]: Singular
![[e7077bf9dbc7c975f4d5f510567a34ca5ff0e1d7.svg]]: Non-singular; its extension will generate a lossless coding, which will be useful for general data transmission.
Note: it is not necessary for the non-singular code to be more compact than the source.

Consider $M_{2}$ again. It is not uniquely decodable, since the string 011101110011 can be interpreted as cdb or as babe. However, such a code is still useful when the set of all possible source symbols is completely known and finite, or when there are restrictions (for example a formal syntax) that determine whether source elements of this extension are acceptable. Such restrictions permit decoding of the original message by checking which of the candidate source sequences mapped to the same string are valid under the restrictions.
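
To see the ambiguity concretely, the sketch below enumerates every parse of that string. The figure defining $M_{2}$ is not reproduced in this text, so the mapping used here is an assumption, chosen to be consistent with the two decodings cdb and babe quoted above; `all_decodings` is a hypothetical helper.

```python
# Assumed mapping for M_2 (hypothetical: the original figure is not available here).
M2 = {"a": "1", "b": "011", "c": "01110", "d": "1110", "e": "10011"}


def all_decodings(encoded, code):
    """Return every source string whose encoding under `code` equals `encoded`."""
    if encoded == "":
        return [""]
    results = []
    for symbol, word in code.items():
        if encoded.startswith(word):
            # Consume one codeword, then decode the remainder recursively.
            rest = all_decodings(encoded[len(word):], code)
            results += [symbol + tail for tail in rest]
    return results


decodings = sorted(all_decodings("011101110011", M2))
```

Under this assumed mapping the search returns babe and cdb, along with two further parses (bacb and caae).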

Prefix codes

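$C:S\to T^*=(c_{1},c_{2},\dots c_{n})$: a code with the property that for any $i\neq j$, $c_{i}$ is not a prefix of $c_{j}$.
Let $A$ be the set of uniquely decodable codes, and let $B$ be the set of prefix codes (also called instantaneous codes).
We now study the codes with minimum average code length. We claim that for any $\vec{ c}\in A-B$ attaining the minimum average code length, there is always a corresponding $\vec{c'}\in B$ with the same codeword lengths, and hence the same average code length.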

Proof:

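Every uniquely decodable code $C$ satisfies the Kraft Inequality; the proof is given below.
Given a set of codeword lengths satisfying the Kraft Inequality, we can construct a prefix code with those lengths as follows:
Reorder the codeword lengths $l_{i}$ so that $l_{1}\leq l_{2}\leq\dots\leq l_{n}$. Define the $i$-th codeword $C_{i}$ to be the first $l_{i}$ digits after the radix point of the $D$-ary expansion of
$\sum_{j=1}^{i-1}D^{-l_{j}}$
For the first length $l_{1}$ the sum is empty, so $C_{1}$ consists of $l_{1}$ zeros. In a code constructed this way no codeword is a prefix of another, so it is a prefix code with the required lengths.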
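
The construction can be sketched in Python with exact rational arithmetic. The function names are hypothetical, and digits are written as decimal characters, so this sketch assumes $D\leq 10$.

```python
from fractions import Fraction


def to_base(value, base, width):
    """Write a non-negative integer in `base`, left-padded with zeros to `width` digits."""
    digits = []
    for _ in range(width):
        value, d = divmod(value, base)
        digits.append(str(d))
    return "".join(reversed(digits))


def prefix_code_from_lengths(lengths, D=2):
    """Build codewords with the given lengths, assuming they satisfy the Kraft inequality."""
    lengths = sorted(lengths)
    if sum(Fraction(1, D ** l) for l in lengths) > 1:
        raise ValueError("lengths violate the Kraft inequality")
    codewords = []
    cumulative = Fraction(0)  # sum of D^{-l_j} over the codewords placed so far
    for l in lengths:
        # First l digits after the radix point of `cumulative` in base D.
        codewords.append(to_base(int(cumulative * D ** l), D, l))
        cumulative += Fraction(1, D ** l)
    return codewords
```

For example, the lengths 1, 2, 3, 3 with $D=2$ give the codewords 0, 10, 110, 111.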

For a random variable $X$, let $P=(p_{1},p_{2},\dots,p_{n})$, where $p_{k}$ is the probability that $X=x_{k}$. How do we find the prefix-free code $c_{1},c_{2},\dots c_{n}$ that minimizes the average code length $E(l)=\sum_{i}p_{i}|c_{i}|$?

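But being prefix-free is not a very mathematical property! How can we state it in a more mathematical form?
One intuition is that the prefix-free constraint makes it impossible for all the $c_{i}$ to be short. How can this intuition be made quantitative? That is exactly what the Kraft Inequality describes: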

Kraft Inequality:

If $c_{1},c_{2},\dots,c_{n}$ are the codewords of a binary prefix-free code and $l_{1},l_{2},\dots,l_{n}$ are their lengths, then

$\sum_{i=1}^n 2^{-l_{i}}\leq 1$

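A prefix-free code can be viewed as a binary tree in which every codeword is a leaf; the prefix-free property requires that no codeword sits at an internal node of the tree.
Equality corresponds to a full binary tree, and only in that case can the shortest average code length be attained.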
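
A quick numerical check of the inequality, as a sketch: the two example codes are illustrative choices, and `kraft_sum` and `is_prefix_free` are hypothetical helper names.

```python
from fractions import Fraction


def kraft_sum(codewords, D=2):
    """Sum of D^{-l_i} over the codeword lengths, computed exactly."""
    return sum(Fraction(1, D ** len(c)) for c in codewords)


def is_prefix_free(codewords):
    """True if no codeword is a prefix of a different codeword."""
    return not any(a != b and b.startswith(a) for a in codewords for b in codewords)


full_tree = ["0", "10", "110", "111"]  # every internal node has two children
partial_tree = ["0", "10", "110"]      # the leaf 111 is unused
```

Here `kraft_sum(full_tree)` equals 1, while `kraft_sum(partial_tree)` is 7/8, which reflects the unused leaf.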

For uniquely decodable codes, there is a rather elegant proof:

[!theorem] Theorem 2 (McMillan)
For a uniquely decodable code over a $D$-ary target alphabet $T$, the codeword lengths $l_{i}$ must satisfy: $\sum_{i}D^{-l_{i}}\leq 1$

[!tip] Proof

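We want to show that $\sum_{x\in \mathscr{X}}D^{-l(x)}\leq 1$.
Raise the left-hand side to the $k$-th power:

$\left(\sum_{x\in \mathscr{X}} D^{-l(x)}\right)^k = \sum_{x_{1},\dots,x_{k}\in \mathscr{X}}D^{-\sum_{i=1}^k l(x_{i})}=\sum_{x^k\in \mathscr{X}^k}D^{-l(x^k)}=\sum_{m=1}^{kl_{\max }}a(m)D^{-m}\leq kl_{\max }$
where $a(m)$ is the number of source sequences $x^k$ whose encoding has length $m$. Since the code is uniquely decodable, its extension is non-singular, so $a(m)\leq D^m$, which gives the last inequality. Therefore

$\sum_{x\in \mathscr{X}}D^{-l(x)}\leq (kl_{\max })^{1/k}$
Since $(kl_{\max })^{1/k}\to 1$ as $k\to \infty$, the claim follows.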
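
The counting step can be checked numerically on a small example. The sketch below assumes the code {0, 01, 011} (an illustrative choice, uniquely decodable because every codeword is a single 0 followed by ones) and $k=3$; all variable names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

code = ["0", "01", "011"]  # assumed example of a uniquely decodable code
D, k = 2, 3
l_max = max(len(c) for c in code)

# a[m] = number of length-k source sequences whose encoding has total length m
a = Counter(sum(len(c) for c in seq) for seq in product(code, repeat=k))

lhs = sum(Fraction(1, D ** len(c)) for c in code) ** k        # (sum D^{-l(x)})^k
rhs = sum(Fraction(count, D ** m) for m, count in a.items())  # sum a(m) D^{-m}
bound_holds = all(count <= D ** m for m, count in a.items())  # a(m) <= D^m
```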


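Our problem now becomes:

$\min \sum_{i}p_{i}l_{i}\qquad \text{subject to } \sum_{i}2^{-l_{i}}=1$

Let $q_{i}= 2^{-l_{i}}$, so that $l_{i}=-\log_{2} q_{i}$. The problem then becomes:

$\min \sum_{i}-p_{i}\log_{2}{q_{i}}\qquad \text{subject to } \sum_{i}q_{i}=1$

This is exactly the theorem proved earlier: the minimum is attained when $q_{i}=p_{i}$, i.e. $l_{i} = \log_{2} \frac{1}{p_{i}}$, and the minimum average code length is $\sum_{i} -p_{i}\log_{2} p_{i}$. Of course, this ignores the constraint that each $l_{i}$ must be an integer.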
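
A small numerical illustration, using an assumed dyadic distribution so that the ideal lengths happen to be integers (`cross_entropy` is a hypothetical helper name):

```python
import math


def cross_entropy(p, q):
    """Sum of -p_i * log2(q_i): the average length if l_i = -log2(q_i) were allowed."""
    return sum(-pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)


p = [0.5, 0.25, 0.125, 0.125]            # assumed example distribution
entropy = cross_entropy(p, p)            # the choice q = p
other = cross_entropy(p, [0.25] * 4)     # a different q that also sums to 1
lengths = [-math.log2(pi) for pi in p]   # ideal lengths log2(1/p_i)
```

With these values the choice $q=p$ gives an average length of 1.75 bits with lengths 1, 2, 3, 3, while the uniform $q$ gives 2 bits.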

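Additivity of entropy: let $X,Y,Z$ be random variables, where $X$ has distribution $(p_{1},p_{2},\dots,p_{n})$; given $X=x_{n}$, we have $p(Y=y_{1})=q_{1}$ and $p(Y=y_{2})=q_{2}$; and $Z$ has distribution $(p_{1},p_{2},\dots,p_{n-1},p_{n}q_{1},p_{n}q_{2})$. Then the three entropies are related by: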

$H(X)+p_{n}H(Y)=H(Z)$
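
The identity can be checked numerically. The probabilities below are arbitrary illustrative values, and `entropy` is a hypothetical helper.

```python
import math


def entropy(dist):
    """Shannon entropy in bits of a probability vector."""
    return sum(-x * math.log2(x) for x in dist if x > 0)


p = [0.5, 0.3, 0.2]                     # distribution of X (assumed values)
q = [0.25, 0.75]                        # distribution of Y given X = x_n
z = p[:-1] + [p[-1] * qi for qi in q]   # distribution of Z: split the last outcome

lhs = entropy(p) + p[-1] * entropy(q)   # H(X) + p_n H(Y)
rhs = entropy(z)                        # H(Z)
```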

Optimal code

For a random variable $X$ with distribution $(p_{1},p_{2},\dots,p_{n})$, where $p_{1}\geq p_{2}\geq\dots\geq p_{n}$, find codewords $c_{1},c_{2},\dots,c_{n}\in \{0,1\}^*$ that minimize the average code length.

An optimal code should have the following properties:

- $|c_{1}|\leq |c_{2}|\leq\dots\leq|c_{n-1}|\leq |c_{n}|$
- $\sum_{i=1}^n 2^{-|c_{i}|}=1$
- Recursive: merging the two least probable nodes yields a code for $n-1$ messages, and that code is also optimal.
- $|c_{n}|=|c_{n-1}|$
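
One standard construction built on repeatedly merging the two least probable nodes is Huffman's algorithm, which these notes do not name explicitly. The following is a minimal sketch of it, with the function name and the example distribution as assumptions.

```python
import heapq
from itertools import count


def huffman_code(probs):
    """Return binary codewords, one per probability, built by repeated merging."""
    if len(probs) == 1:
        return ["0"]
    tie = count()  # tie-breaker so the heap never has to compare the index lists
    heap = [(p, next(tie), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    codes = [""] * len(probs)
    while len(heap) > 1:
        p0, _, group0 = heapq.heappop(heap)  # least probable node
        p1, _, group1 = heapq.heappop(heap)  # second least probable node
        # Prepend one bit to every codeword below each of the two merged nodes.
        for i in group0:
            codes[i] = "0" + codes[i]
        for i in group1:
            codes[i] = "1" + codes[i]
        heapq.heappush(heap, (p0 + p1, next(tie), group0 + group1))
    return codes


codes = huffman_code([0.5, 0.25, 0.125, 0.125])  # assumed example distribution
```

For this dyadic example the codeword lengths come out as 1, 2, 3, 3: they are non-decreasing, the two longest are equal, and the Kraft sum is exactly 1.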