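The Huffman encoder/decoder I wrote earlier fails when reading or writing files larger than 4G, with the error basic_filebuf::xsgetn error reading the file: iostream error. That was surprising, because the C++ standard library takes the byte count as a wide integer type (my code passes size_t; sgetn itself is declared with std::streamsize), so there should be no problem. A reader later pointed out that the standard library call indeed should not fail here, but the Windows API underneath takes the length as a DWORD, a 32-bit unsigned integer (unsigned long on Windows), so anything over 4G fails. For example, WriteFileEx is declared as: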
BOOL WriteFileEx(
HANDLE hFile,
LPCVOID lpBuffer,
DWORD nNumberOfBytesToWrite,
LPOVERLAPPED lpOverlapped,
LPOVERLAPPED_COMPLETION_ROUTINE lpCompletionRoutine
);
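To make the 32-bit limit concrete, here is a small sketch of what happens to a byte count once it is narrowed to a DWORD-sized value. The names Dword, fits_in_dword, narrow_to_dword and calls_needed are made up for illustration and are not part of the Windows API or of the project; Dword is assumed to be 32 bits wide, as DWORD is on Windows.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Stand-in for the Windows DWORD type (assumption: 32-bit unsigned).
using Dword = std::uint32_t;

// True when a byte count can be passed in a single 32-bit length argument.
inline bool fits_in_dword(std::uint64_t byte_count) {
    return byte_count <= std::numeric_limits<Dword>::max();
}

// What is left of a byte count after narrowing: only the low 32 bits survive.
inline Dword narrow_to_dword(std::uint64_t byte_count) {
    return static_cast<Dword>(byte_count);
}

// Number of calls needed when each call may carry at most max_per_call bytes.
inline std::uint64_t calls_needed(std::uint64_t byte_count,
                                  std::uint64_t max_per_call) {
    assert(max_per_call > 0);
    return byte_count / max_per_call + (byte_count % max_per_call != 0 ? 1 : 0);
}
```

With these helpers, a 5 GiB request (5ULL << 30) does not fit in a Dword, and narrowing it leaves 1 GiB, which is why a large transfer has to be split into several smaller calls.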
So I split the read into chunks based on the file size; writing works the same way:
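std::filebuf *ptr = in.rdbuf();
// Seek to the end to get the file size, then rewind to the start.
size_t size = ptr->pubseekoff(0, std::ios::end, std::ios::in);
ptr->pubseekpos(0, std::ios::in);
// Allocate only after size is known.
char *buffer = new char[size];
// Read in 1 KiB chunks so that no single sgetn call carries a count above the 32-bit limit.
for (size_t i = 0; (i << 10) < size; i++)
    ptr->sgetn(buffer + (i << 10), std::min(size_t(1 << 10), size - (i << 10)));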
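The same idea can be wrapped into reusable helpers that also check the return values of sgetn and sputn, which the loop above ignores. This is only a sketch under assumptions: the function names and the 1 MiB default chunk size (kChunk) are hypothetical choices for illustration, not what the project uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <fstream>
#include <ios>
#include <streambuf>
#include <string>

// Hypothetical default chunk size (1 MiB), far below the 32-bit limit.
constexpr std::size_t kChunk = std::size_t(1) << 20;

// Reads up to `size` bytes into `dst`, one chunk per sgetn call.
// Returns the number of bytes actually read.
inline std::size_t read_chunked(std::streambuf& sb, char* dst,
                                std::size_t size, std::size_t chunk = kChunk) {
    assert(chunk > 0);
    std::size_t done = 0;
    while (done < size) {
        const std::size_t want = std::min(chunk, size - done);
        const std::streamsize got =
            sb.sgetn(dst + done, static_cast<std::streamsize>(want));
        if (got <= 0) break;  // end of file or read error
        done += static_cast<std::size_t>(got);
    }
    return done;
}

// Writes `size` bytes from `src`, one chunk per sputn call.
// Returns the number of bytes actually written.
inline std::size_t write_chunked(std::streambuf& sb, const char* src,
                                 std::size_t size, std::size_t chunk = kChunk) {
    assert(chunk > 0);
    std::size_t done = 0;
    while (done < size) {
        const std::size_t want = std::min(chunk, size - done);
        const std::streamsize put =
            sb.sputn(src + done, static_cast<std::streamsize>(want));
        if (put <= 0) break;  // write error
        done += static_cast<std::size_t>(put);
    }
    return done;
}

// Reads a whole file into memory through read_chunked.
inline std::string read_file_chunked(const std::string& path,
                                     std::size_t chunk = kChunk) {
    std::filebuf fb;
    if (!fb.open(path, std::ios::in | std::ios::binary)) return {};
    const std::streamoff end = fb.pubseekoff(0, std::ios::end, std::ios::in);
    fb.pubseekpos(0, std::ios::in);
    std::string data(end > 0 ? static_cast<std::size_t>(end) : 0, '\0');
    data.resize(read_chunked(fb, &data[0], data.size(), chunk));
    return data;
}

// Writes a whole buffer to a file through write_chunked.
inline bool write_file_chunked(const std::string& path, const std::string& data,
                               std::size_t chunk = kChunk) {
    std::filebuf fb;
    if (!fb.open(path, std::ios::out | std::ios::binary | std::ios::trunc))
        return false;
    const std::size_t wrote = write_chunked(fb, data.data(), data.size(), chunk);
    return fb.close() != nullptr && wrote == data.size();
}
```

A larger chunk than 1 KiB means fewer calls into the stream buffer; any size that stays well under 4G per call avoids the limit.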
Next I will probably switch to a streaming (online) mode to reduce memory usage, and also try multithreading to speed up processing.
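As a rough idea of what the streaming mode could look like for the counting pass, the sketch below tallies byte frequencies through a fixed-size buffer, so memory use no longer depends on the file size. It is not the project's code: Histogram, count_bytes_streaming, count_file_bytes and the 64 KiB buffer size are assumptions made for illustration, and the encoding pass would need a second read of the file.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <istream>
#include <string>
#include <vector>

// One counter per possible byte value.
using Histogram = std::array<std::uint64_t, 256>;

// Counts byte frequencies while holding only `buffer_size` bytes in memory.
inline Histogram count_bytes_streaming(std::istream& in,
                                       std::size_t buffer_size = 1 << 16) {
    assert(buffer_size > 0);
    Histogram freq{};  // zero-initialized
    std::vector<char> buffer(buffer_size);
    while (in) {
        in.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
        const std::streamsize got = in.gcount();  // may be short on the last read
        for (std::streamsize i = 0; i < got; ++i)
            ++freq[static_cast<unsigned char>(buffer[static_cast<std::size_t>(i)])];
    }
    return freq;
}

// Convenience wrapper: opens `path` in binary mode and counts its bytes.
// A file that cannot be opened yields an all-zero histogram.
inline Histogram count_file_bytes(const std::string& path,
                                  std::size_t buffer_size = 1 << 16) {
    std::ifstream in(path, std::ios::binary);
    return count_bytes_streaming(in, buffer_size);
}
```

Because each read is at most one buffer long, this approach also stays clear of the 4G-per-call limit described above.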
The project README follows.
Huffman
Usage
./encode --infile input --outfile output
./decode --infile input --outfile output
Speed
Tested on an i5-8265U CPU @ 1.60GHz, single core.
Using 2018-11-13-raspbian-stretch-lite.img as the example file.
> encode --infile j --outfile d
0.00 Started
File Size: 1.74G
0.83 File Read
4.37 Char Counted
4.37 Tree Constructed
4.37 Tree Walked
20.25 File Encoded
20.44 Stopped
Overall Speed: 87.08M/s
Compression Rate: 47.68%
> decode --infile d --outfile s
0.18 Started
0.19 Huffman Loaded
File Size: 848.76M
0.61 File Read
20.01 File Decoded
20.01 Stopped
Overall Speed: 88.94M/s