Skip to content

it is an command user interface used to compress files using huffman algorithm(Greedy algorithm) with variable length of chunks for compressing

Notifications You must be signed in to change notification settings

ziad40/Compress-file

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Analysis and Design of Algorithms Programming Assignment 2

Huffman's Algorithm

In this part, it is required to implement Huffman's algorithm that we discussed. Your implementation should allow compressing and decompressing arbitrary files. As discussed in class, the implementation should collect statistics from the input file first, then apply the compression algorithm. Note that you will need to store a representation of the codewords in the compressed file, so that you can decompress the file back. Your program should have the capability of considering more than one byte. For example, instead of just collecting the frequencies and finding codewords for single bytes. The same can be done assuming the basic unit is n bytes, where n is an integer.

Specifications

  • You will submit a single runnable jar that will be used for both compression and decompression. Your jar should be named as huffman.jar. The jar must include the source code files.

  • To use it for compressing an input file, the following will be called: java -jar huffman.jar c absolute_path_to_input_file n

  • c means compressing the file.

  • n is the number of bytes that will be considered together.

  • To use it for decompressing an input file, the following be called: java -jar huffman.jar d absolute_path_to_input_file

  • If the user chooses to compress a file with the name abc.exe, the compressed file should have the name compressed..abc.exe.hc , and should be replaced by n (the number of bytes per group). The compressed file should appear in the same directory of the input file. The program should print the compression ratio and the compression time.

  • If the user chooses to decompress a file with name abc.exe.hc, the output file should be named extracted.abc.exe. This should appear in the same directory of the input file. The program should print the decompression time in this case.

Analysis Requirements

  • Submit a report that shows the compression ratio when running your implementation on the following files for different values of n = 1, 2, 3, 4, 5.
  • File 1: This file is from the NIH genetic sequence database. Note that the file is already compressed. Extract the file first, then run the experiments on the uncompressed file (gbbct10.seq). Note that the experiments here could take time, as the file is large. You should test your program and make sure of correctness on small inputs first. Also, try to implement your code efficiently to save time.
  • Compare the compression ratio you got in both cases with the compression ratio of 7-zip: https://www.7-zip.org/. Try to explain the observations. Note: The compression ratio can be defined using multiple ways. You can calculate it in the same way as 7-zip does, which is the size of the compressed file divided by the size of the original file.

About

it is an command user interface used to compress files using huffman algorithm(Greedy algorithm) with variable length of chunks for compressing

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages