gzip (GNU zip) is a compression program and file format, developed to replace the compress program used in earlier Unix systems. It is free and open source software, created by Jean-loup Gailly and Mark Adler.
The gzip command is popularly used to compress files, and the same format is used to compress web pages on the server for decompression in the browser. It is also used for compressing streamed data. Besides compressing individual files, gzip can handle several compressed streams concatenated into a single file.
Gzip was initially intended for use by GNU, but it is also supported on other operating systems, such as Windows and macOS. Compression is lossless, and files compressed using gzip can be decompressed by a program called gunzip.
In this article, we will discuss how gzip works, how the gzip command compresses and decompresses files, what else it can do, and how it compares with zip. Let’s start.
How does gzip work?
The gzip file format has the following structure:
- A 10-byte header with a magic number (1f 8b), compression ID (08 for DEFLATE), file flags, 32-bit timestamp, compression flags, and OS id.
- Optional extra header fields, such as the original filename, whose presence is signalled by the file flags.
- A body with a DEFLATE-compressed payload.
- An 8-byte footer with a CRC-32 checksum and the length of the original uncompressed data.
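The layout described above can be checked by dumping the bytes of a small compressed stream. The following is a minimal sketch, assuming a Unix-like shell with gzip, od and tail available; the sample string and the variable names are only illustrative.

```shell
# Compress a short string from standard input and keep the raw bytes in a temp file.
tmp=$(mktemp)
printf 'hello gzip\n' | gzip -c > "$tmp"

# First 10 bytes: magic (1f 8b), method (08), flags, 4-byte timestamp, extra flags, OS id.
header=$(od -An -tx1 -N10 "$tmp" | tr -s ' \n' ' ' | sed 's/^ //; s/ $//')
echo "header:  $header"

# Last 8 bytes: CRC-32 followed by the uncompressed length, both stored little-endian.
trailer=$(tail -c 8 "$tmp" | od -An -tx1 | tr -s ' \n' ' ' | sed 's/^ //; s/ $//')
echo "trailer: $trailer"

rm -f "$tmp"
```

The header line should begin with 1f 8b 08, and the last four bytes of the trailer encode the input length (11 bytes here, so 0b 00 00 00).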
Gzip offers lossless compression. It functions based on the DEFLATE algorithm, which is a combination of LZ77 and Huffman Coding.
DEFLATE was developed as a replacement for LZW and other patent-encumbered compression algorithms, which limited the usability of compress and similar tools.
LZ77 replaces repeated occurrences of data with references to an earlier copy. Every reference contains two values: a distance (how far back to jump) and a length (how many bytes to copy).
The amount of compression obtained depends on the size of the input and the distribution of common substrings. Typically, text such as source code or English is reduced by 60-70%.
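How much repeated substrings matter can be seen by compressing two inputs of the same size. This is a rough sketch, assuming awk, head and /dev/urandom are available; the exact byte counts will vary between systems and gzip versions.

```shell
# 100,000 bytes of highly repetitive text: LZ77 finds long back-references everywhere.
rep_size=$(awk 'BEGIN { for (i = 0; i < 5000; i++) print "the quick brown fox" }' \
    | gzip -c | wc -c | tr -d ' ')

# 100,000 bytes of random data: almost no repeated substrings to refer back to.
rand_size=$(head -c 100000 /dev/urandom | gzip -c | wc -c | tr -d ' ')

echo "repetitive text: 100000 -> $rep_size bytes"
echo "random bytes:    100000 -> $rand_size bytes"
```

The repetitive input should shrink to a tiny fraction of its size, while the random input should come out slightly larger than it went in because of the header and footer.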
Huffman coding allocates shorter codes to more frequently occurring symbols. It is a variable-length coding method. With variable-length codes, the decoder must be able to tell where one code ends and the next one begins.
Huffman coding solves this problem by producing a prefix code: no codeword is a prefix of any other. You can visit the GNU gzip home page to download the tool.
How to compress a file using gzip?
The gzip command is popularly used to compress files, and one should be familiar with it to benefit from it. When a file is compressed using gzip, the result keeps the same file name with a .gz extension added, and it replaces the original file.
Run the following command to compress a single file with gzip:
gzip filename
For example, if you want to compress a file named myfile.odt, the command looks like:
gzip myfile.odt
Some files compress better than others. For instance, documents, text files, bitmap images and uncompressed audio such as WAV compress significantly.
Other kinds of files, like JPEG images, MP3 audio or MPEG video, don’t compress so well. Sometimes, they even increase in size on running the gzip command because such files are already compressed.
The gzip command only attempts to compress regular files. In particular, it ignores symbolic links.
How to decompress using gzip command?
For decompressing a file, use the following command:
gzip -d filename.gz
To decompress the file in our previous example, the command will look like:
gzip -d myfile.odt.gz
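To see compression and decompression work together, a complete round trip can be run in a scratch directory. This is a minimal sketch, assuming mktemp, awk and cmp are available; sample.txt and reference.txt are made-up names used only for the demonstration.

```shell
workdir=$(mktemp -d)

# Create a sample file and keep an untouched copy to compare against later.
awk 'BEGIN { for (i = 1; i <= 2000; i++) print "line", i, "of the sample file" }' \
    > "$workdir/sample.txt"
cp "$workdir/sample.txt" "$workdir/reference.txt"

gzip "$workdir/sample.txt"           # sample.txt is replaced by sample.txt.gz
after_compress=$(ls "$workdir" | tr '\n' ' ')
echo "after gzip:    $after_compress"

gzip -d "$workdir/sample.txt.gz"     # sample.txt.gz is replaced by sample.txt
if cmp -s "$workdir/sample.txt" "$workdir/reference.txt"; then
    roundtrip=identical
else
    roundtrip=different
fi
echo "after gzip -d: $roundtrip"

rm -rf "$workdir"
```

Because the compression is lossless, the restored file should be byte-for-byte identical to the reference copy.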
Also, gunzip or zcat can be used to decompress files and restore them to their original form. If the original name saved in the compressed file is not suitable for the file system, a new name is constructed from the original one to make it legal.
Gunzip takes a list of files on its command line and replaces every file whose name ends with .gz, -gz, .z, -z, or _z (ignoring case) and which begins with the correct magic number with an uncompressed file without the original extension.
It also recognizes the special extensions .tgz and .taz as shorthands for .tar.gz and .tar.Z respectively. When compressing, gzip uses the .tgz extension, if necessary, instead of truncating a file with a .tar extension.
Gzip can uncompress files created using zip only if they have a single member compressed with the ‘deflation’ method. This is only intended to help conversion of tar.zip files to the tar.gz format.
For extracting a zip file with a single member, use the command:
gunzip <foo.zip or gunzip -S .zip foo.zip
To extract the zip files with multiple members, use ‘unzip’ instead of gunzip.
zcat is similar to gunzip -c. On certain systems, it may be installed as gzcat to preserve the original link to compress. zcat can uncompress either a list of files on the command line or its standard input.
It then writes the uncompressed data on standard output. zcat will uncompress files having the right magic number, whether they have a .gz suffix or not.
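The behaviour described above can be sketched with gunzip -c, which the text treats as equivalent to zcat (zcat itself is avoided here because some systems install it under another name or expect a different suffix). The file names are illustrative assumptions.

```shell
workdir=$(mktemp -d)
printf 'first line\nsecond line\n' > "$workdir/notes.txt"
gzip "$workdir/notes.txt"

# Write the uncompressed data to standard output; notes.txt.gz stays in place.
from_file=$(gunzip -c "$workdir/notes.txt.gz")

# Read from standard input: the data is recognised by its magic number,
# so the missing .gz suffix does not matter.
cp "$workdir/notes.txt.gz" "$workdir/renamed.bin"
from_stdin=$(gunzip -c < "$workdir/renamed.bin")

if [ -f "$workdir/notes.txt.gz" ]; then still_there=yes; else still_there=no; fi

printf '%s\n' "$from_file"
echo "compressed file still present: $still_there"
rm -rf "$workdir"
```

Both variables should hold the two original lines, and the compressed file should still exist afterwards.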
What are some other functions that can be performed using gzip command?
Let us take a look at what other actions can be executed using the gzip command.
Forcefully compress a file
Sometimes, you may attempt to compress a file (say, mydoc) when a file named mydoc.gz already exists on your system. In such a case, run the following command to compress the file forcefully and overwrite the existing .gz file:
gzip -f filename
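The situation can be reproduced in a scratch directory. This is a sketch under the assumption that gzip, when it cannot prompt on a terminal, refuses to overwrite an existing .gz file and exits with a non-zero status; the exact message and status may differ between implementations.

```shell
workdir=$(mktemp -d)
printf 'old contents\n' > "$workdir/mydoc"
gzip "$workdir/mydoc"                        # creates mydoc.gz
printf 'new contents\n' > "$workdir/mydoc"   # a new mydoc next to the existing mydoc.gz

# Without -f (and with no terminal to ask on) gzip is expected to leave mydoc.gz alone.
if gzip "$workdir/mydoc" < /dev/null 2> /dev/null; then
    plain=overwrote
else
    plain=refused
fi

# With -f the existing mydoc.gz is replaced without asking.
gzip -f "$workdir/mydoc"
forced=$(gzip -dc "$workdir/mydoc.gz")

echo "plain gzip: $plain"
echo "mydoc.gz now contains: $forced"
rm -rf "$workdir"
```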
Keeping the uncompressed file
When you compress a file using gzip, you end up with a new file with the extension .gz, and the original file is removed. To compress the file and also keep the original, use the following command:
gzip -k filename
For example, you will end up with the files mydoc.odt and mydoc.odt.gz on running the command:
gzip -k mydoc.odt
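A quick way to confirm that both files survive is sketched below. It assumes a gzip that supports -k (older releases may lack the option); redirecting the output of gzip -c is shown as an alternative that also leaves the input untouched. The name copy.gz is made up for the example.

```shell
workdir=$(mktemp -d)
printf 'keep me around\n' > "$workdir/mydoc.odt"

# -k keeps the input file next to the new .gz file.
gzip -k "$workdir/mydoc.odt"

# Alternative: write the compressed data to standard output and redirect it.
gzip -c "$workdir/mydoc.odt" > "$workdir/copy.gz"

kept=$(ls "$workdir" | tr '\n' ' ')
echo "$kept"
rm -rf "$workdir"
```

The listing should show copy.gz, mydoc.odt and mydoc.odt.gz.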
Information regarding disk space
We compress a file to save disk space and to send it more quickly over the network. Using the gzip command, you can find out how much disk space a compressed file saves and how well it compressed.
To obtain such information, use the command
gzip -l filename.gz
This command provides information concerning compressed size, uncompressed size, the ratio (percentage) and uncompressed filename.
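For illustration, the listing can be produced for a generated file as follows. This is only a sketch; report.txt is a hypothetical name, and the column widths and exact figures depend on the gzip implementation.

```shell
workdir=$(mktemp -d)
awk 'BEGIN { for (i = 1; i <= 3000; i++) print "row", i, "some repeated text" }' \
    > "$workdir/report.txt"
gzip "$workdir/report.txt"

# One header line followed by one line per compressed file.
listing=$(gzip -l "$workdir/report.txt.gz")
echo "$listing"
rm -rf "$workdir"
```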
Compressing each file in a folder/subfolder
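Using the gzip command with the -r option, you can traverse an entire folder structure and compress every file in it. Each file is compressed individually into its own .gz file; the folder itself is not turned into a single archive.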
The command is
gzip -r foldername
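The effect of the recursive option on a small folder tree can be sketched like this, assuming find is available; the folder and file names are invented for the example.

```shell
workdir=$(mktemp -d)
mkdir -p "$workdir/project/docs/old"
printf 'top level\n' > "$workdir/project/readme.txt"
printf 'nested\n'    > "$workdir/project/docs/guide.txt"
printf 'deep\n'      > "$workdir/project/docs/old/notes.txt"

gzip -r "$workdir/project"

# Count regular files with and without the .gz suffix after the run.
compressed=$(find "$workdir/project" -type f -name '*.gz' | wc -l | tr -d ' ')
remaining=$(find "$workdir/project" -type f ! -name '*.gz' | wc -l | tr -d ' ')
echo "compressed files: $compressed, uncompressed files left: $remaining"
rm -rf "$workdir"
```

All three files should end up as separate .gz files in their original folders.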
Checking the validity of a compressed file
Use the following command to test whether a compressed file is valid:
gzip -t filename.gz
There is no output for a valid file.
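In scripts, the exit status is usually more convenient than the missing output. The sketch below compares an intact file with a deliberately truncated copy; it assumes head -c is available, and broken.gz is a made-up name.

```shell
workdir=$(mktemp -d)
awk 'BEGIN { for (i = 1; i <= 1000; i++) print "entry", i }' > "$workdir/data.txt"
gzip "$workdir/data.txt"

# Simulate a damaged download by keeping only the first 100 bytes.
head -c 100 "$workdir/data.txt.gz" > "$workdir/broken.gz"

if gzip -t "$workdir/data.txt.gz"; then good=valid; else good=invalid; fi
if gzip -t "$workdir/broken.gz" 2> /dev/null; then bad=valid; else bad=invalid; fi

echo "data.txt.gz: $good"
echo "broken.gz:   $bad"
rm -rf "$workdir"
```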
Changing the compression level
A file can be compressed at different levels. You can opt for less compression to speed up your work, or prefer maximum compression, which takes more time to run.
Use the following commands for
- Minimum compression at fastest speed
gzip -1 filename
- Maximum compression at the slowest speed
gzip -9 filename
You can trade speed against compression by choosing any digit from 1 to 9. If no level is given, gzip uses level 6.
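The difference between levels can be measured on a generated file, as in the sketch below. The input is synthetic, so the figures say little about real data, where the gap between levels may be larger or smaller.

```shell
workdir=$(mktemp -d)
awk 'BEGIN { for (i = 1; i <= 20000; i++) print "record", i, "value", (i * i) % 977 }' \
    > "$workdir/sample.txt"
orig=$(wc -c < "$workdir/sample.txt" | tr -d ' ')

# -c writes to standard output, so the input file is reused for every level.
size1=$(gzip -c -1 "$workdir/sample.txt" | wc -c | tr -d ' ')
size6=$(gzip -c -6 "$workdir/sample.txt" | wc -c | tr -d ' ')
size9=$(gzip -c -9 "$workdir/sample.txt" | wc -c | tr -d ' ')

echo "original: $orig bytes"
echo "level 1:  $size1 bytes"
echo "level 6:  $size6 bytes"
echo "level 9:  $size9 bytes"
rm -rf "$workdir"
```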
Comparison between Zip and Gzip
ZIP and GZIP are two well-known methods of compressing files to save space or to reduce the time required to transmit them across a network or the internet.
A GZIP-compressed tarball is usually smaller than the equivalent ZIP file, particularly when compressing a large number of small files.
Software using the ZIP format archives and compresses the files in one step.
These are two separate processes. Compression decreases the file size using algorithms, while archiving combines multiple files so that the result is a single file.
GZIP is purely a compression tool and depends on another tool, usually TAR, to archive the files. Generally, TAR first combines all the files into a single tarball, which GZIP then compresses.
In ZIP, the individual files are compressed and then added to the archive. To take a file out of a ZIP archive, only that file has to be located and decompressed.
With GZIP, the tarball has to be decompressed from the beginning until the required file is reached. Pulling a 1MB file out of a 10GB archive therefore takes much longer with GZIP than with ZIP.
As GZIP compresses one large file instead of several smaller ones, it can take advantage of redundancy between the files to decrease the size further.
For example, if you archive and compress 10 identical small files with both ZIP and GZIP, the ZIP file can be roughly 10 times larger than the resulting GZIP file. The files have to be small for this to work, because DEFLATE only looks for repeats within the previous 32 KB of data.
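The effect can be approximated without a zip program. In the sketch below, compressing every file separately with gzip stands in for ZIP's per-file compression (an assumption made for the demonstration, not how a real ZIP tool is invoked), and tar piped into gzip produces the tarball. The files are 8 KB of random data each, so a single copy is incompressible on its own.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/files"
head -c 8192 /dev/urandom > "$workdir/files/copy01.bin"
for n in 02 03 04 05 06 07 08 09 10; do
    cp "$workdir/files/copy01.bin" "$workdir/files/copy$n.bin"
done

# Per-file compression: every copy is compressed in isolation.
per_file_total=0
for f in "$workdir"/files/*.bin; do
    size=$(gzip -c "$f" | wc -c | tr -d ' ')
    per_file_total=$((per_file_total + size))
done

# tar, then gzip: one stream, so later copies can refer back to the first one.
tar_gz_size=$(tar -cf - -C "$workdir" files | gzip -c | wc -c | tr -d ' ')

echo "compressed separately: $per_file_total bytes"
echo "tar + gzip:            $tar_gz_size bytes"
rm -rf "$workdir"
```

The tarball should come out several times smaller than the sum of the separately compressed files. With files much larger than 32 KB the advantage largely disappears.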
Both of them can be used on nearly any operating system. However, ZIP is more common on Windows and GZIP on Unix-like OSes.
gzip is a popular compression tool, initially developed for Unix-like operating systems, that can also be used on other OSes such as Windows and macOS.
It is based on the DEFLATE algorithm, which combines LZ77 and Huffman coding, and its compression is lossless. It compresses regular files and skips symbolic links.
Some files are compressed to a great extent while others are not. Text files, documents and uncompressed audio such as WAV shrink significantly, but formats like JPEG, MP3 and MPEG are already compressed and can even increase in size if gzip is applied.
The gzip command can perform various functions. It can compress and decompress files, check the validity of a compressed file, report statistics on compression performance, change the compression level, and so on.
You can also use gunzip or zcat to decompress files compressed using gzip.