TA的每日心情 | 开心 2021-12-13 21:45 |
---|
签到天数: 15 天 [LV.4]偶尔看看III
|
哈夫曼树是一种特殊的树,结合前面做书上动态规划题的了解,哈夫曼树就是最优二叉树。
建立一颗哈夫曼树前需要明确条件,比如一颗词典树(节点值为单词),我们希望能通过我们的查找习惯建立一颗更快、更合适的二叉树,那么,这里的条件就是树中每个单词的搜索频率,显然,搜索频率越高的单词越靠近树根,查找效率会更好,通过搜索频率(权值)与节点离根节点的路径距离计算出WPL(带权路径长),当词典树的形态为某种情况的时候(哈夫曼树总是一颗满二叉树 — 除叶节点外,内部节点都是儿孙满堂的),WPL最小,那么这样的一颗二叉树就是最优二叉树,也就是我们想要的树的形态了。
可通过动态规划算法证明,上面描述的二叉树的各个节点是否与最优二叉树的各节点相等。当然书上还有更严谨的算法数学证明。
WPL计算很简单,公式:WPL = ∑ L[sub]i[/sub] × P[sub]i[/sub] (其中L是路径长度,P是权值)。
建立哈夫曼树很简单:初始化节点数据,维护一个最小优先队列,将节点按权值大小加入到优先队列中,然后将队列中的节点弹出,由下而上建立哈夫曼树。
算法伪python代码:
- """
- class node:
- int f; //权值
- type var; //其他数据类型
- node left;
- ndoe right;
- """
- def build_Huffman_tree(nodes):
- """
- nodes是一组node类型的节点
- """
- priority_queue<node> que = nodes; //加入到优先队列
- while que.size > 1:
- left = que.top;
- right = que.top;
- p = new node; // 请求一个新节点
- p.f = left.f + right.f;
- que.add = p;
- return que.top;
复制代码
哈夫曼编码是一种变长编码的方式,变长编码一般比定长编码压缩率高,所以这里不考虑定长编码,但定长编码也很简单,自己制定一个编码表,通过查表的方式编码,效率高。解码也是查表即可。
制定哈夫曼编码规则:左路径编码为0,右路径编码为1。这样就可以通过遍历二叉树进行编码了。如图:
图片来自百度图片
解码也很简单,只需要根据制定的规则,再进行树的遍历,然后通过查表即可解码。
完整代码:
- #include <iostream>
- #include <string.h>
- #define MAXSIZE 0xffff
- #define QUE_LEFT(i) (2*(i) + 1)
- class node {
- public:
- char var;
- size_t freq;
- node * left;
- node * right;
- node() {}
- node(char c, size_t f) : var(c), freq(f) {}
- node(node * l, node * r) : var(0), freq(l->freq + r->freq), left(l), right(r) {}
- virtual ~node() {}
- };
- class queue : public node{
- public:
- size_t size_s;
- node * priority_queue[MAXSIZE];
- queue() : size_s(0) {}
- ~queue() {
- while(!empty())
- size_s--;
- }
- bool empty() const {
- if (size_s == 0)
- return true;
- return false;
- }
- bool full() const {
- if (size_s == MAXSIZE)
- return true;
- return false;
- }
- size_t size() const {
- return size_s;
- }
- void insert(node * n);
- node * pop();
- };
- void queue::insert(node * n) {
- if (full())
- exit(1);
- int i = size_s++;
- for (; i > 0 && priority_queue[i / 2]->freq >= n->freq; i /= 2)
- priority_queue[i] = priority_queue[i / 2];
- priority_queue[i] = n;
- }
- node * queue::pop() {
- if (empty())
- exit(1);
- size_s--;
- node * root = priority_queue[0];
- int i = 0;
- for (int l; QUE_LEFT(i) < (int)size_s; i = l) {
- l = QUE_LEFT(i);
- if (l + 1 < (int)size_s && priority_queue[l + 1]->freq < priority_queue[l]->freq)
- l++;
- priority_queue[i] = priority_queue[l];
- }
- priority_queue[i] = priority_queue[size_s];
- return root;
- }
- class HuffmanTree {
- public:
- node * build_Huffman_tree(std::string str, int * & freq);
- void coding(node * root, char * write, char ** code, int len);
- std::string encode(node * root, std::string str, char ** code);
- void decode(node * root, std::string codes);
- void destory(node * root);
- };
- node * HuffmanTree::build_Huffman_tree(std::string str, int * & freq) {
- queue que;
- for (auto v : str)
- ++freq[(int)v];
- for (int i = 0; i < 128 + 1; i++)
- if (freq[i]) {
- node * n = new node(i, freq[i]);
- que.insert(n);
- }
- while (que.size() > 1) {
- node * left = que.pop();
- node * right = que.pop();
- node * parent = new node(left, right);
- que.insert(parent);
- }
- return que.pop();
- }
- void HuffmanTree::coding(node * pr, char * write, char ** code, int len) {
- static char buf[MAXSIZE >> 1], *out = buf;
- if (pr->var) {
- write[len] = 0;
- strcpy(out, write);
- code[(int)pr->var] = out;
- out += len + 1;
- return;
- }
- write[len] = "0"; coding(pr->left, write, code, len + 1);
- write[len] = "1"; coding(pr->right, write, code, len + 1);
- }
- std::string HuffmanTree::encode(node * root, std::string str, char ** code) {
- char * write = new char;
- coding(root, write, code, 0);
- delete write;
- std::string read = "";
- for (auto v : str)
- read += code[(int)v];
- return read;
- }
- void HuffmanTree::decode(node * root, std::string codes) {
- node * n = root;
- int i = 0;
- while (codes[i]) {
- if (codes[i++] == "0")
- n = n->left;
- else
- n = n->right;
- if (n->var) putchar(n->var), n = root;
- }
- }
- void HuffmanTree::destory(node * root) {
- if (root) {
- destory(root->left);
- destory(root->right);
- delete root;
- }
- }
- void TravelTree(node * root) {
- if (root) {
- std::cout << root->freq;
- if (root->var)
- std::cout << ":" << root->var;
- std::cout << std::endl;
- TravelTree(root->left);
- TravelTree(root->right);
- }
- }
- int main()
- {
- int freq[128 + 1] = { 0 }, *f = freq;
- char *code[MAXSIZE + 1];
- node * root;
- HuffmanTree tree;
- std::string str = "hello world!";
- root = tree.build_Huffman_tree(str, f);
- str = tree.encode(root, str, code);
- for (int i = 0; i < 128 + 1; i++)
- if (code[i]) {
- std::cout << (char)i << ":" << freq[i] <<
- " --- " << code[i] << std::endl;
- }
- std::cout << "编码结果:" << str << std::endl;
- std::cout << "解码:";
- tree.decode(root, str);
- return 0;
- }
复制代码
运行结果:

参考资料:
1.这个网址是在维基百科找到的,有各种语言的哈夫曼编码的实现:http://rosettacode.org/wiki/Huffman_coding
2.http://www.cnblogs.com/sench/p/7798064.html |
|