Lossless compression(link) is essential when data integrity is critical—such as in images, source code, and backups. Among the most popular lossless formats is ZIP, introduced in 1989. In this post, we’ll explore how ZIP compression works, how to use it in Python, and why it’s still relevant in 2025.

Whether you’re a developer trying to minimize storage usage or a data scientist needing to compress datasets while preserving accuracy, Python’s zipfile module provides a simple yet powerful solution.


What is ZIP Compression?

ZIP compression is a lossless compression algorithm that reduces file size without removing any data. It works by:

  • Identifying redundancy: Repetitive sequences like "abcabcabc" can be compressed to "3abc" (conceptually).
  • Combining multiple files into one container archive.
  • Preserving data integrity, making it suitable for files that must remain unchanged.

Lossless vs Lossy Compression

LosslessLossy
No data is lostSome data is discarded
Ideal for text, source code, images (when quality must be preserved)Ideal for media like music, video, photos
Examples: ZIP, PNG, FLACExamples: MP3, MP4, JPEG

YouTube, for example, uses lossy compression to stream videos efficiently based on network conditions and user preferences.


Why Use ZIP in Python?

As a developer, you might want to:

  • Archive multiple files
  • Reduce backup sizes
  • Transfer files with reduced size
  • Preserve the exact original data

Python’s built-in zipfile module makes this straightforward.


Python ZIP Compression Code Example

Here’s a full example that includes:

  • Compressing a single file
  • Compressing all files of a certain type in a directory
  • Unzipping specific files or all contents
import os
import zipfile

def file_zip(source_zip, target_file):
    f_zip = zipfile.ZipFile(source_zip, 'w')
    f_zip.write(target_file, compress_type=zipfile.ZIP_DEFLATED)
    f_zip.close()

def files_zip(source_zip, filetype):
    f_zip = zipfile.ZipFile(source_zip, 'w')
    base_dir = os.path.join(os.path.dirname(
        os.path.dirname(os.path.abspath(__file__))), 'tmp')
    for folder, _, files in os.walk(base_dir):
        for file in files:
            if file.endswith(filetype):
                full_path = os.path.join(folder, file)
                relative_path = os.path.relpath(full_path, base_dir)
                f_zip.write(full_path, relative_path,
                            compress_type=zipfile.ZIP_DEFLATED)
    f_zip.close()

def file_unzip(source_zip, target_file):
    with zipfile.ZipFile(source_zip) as f_zip:
        f_zip.extract(target_file, os.path.dirname(os.path.abspath(__file__)))

def file_all_unzip(source_zip):
    with zipfile.ZipFile(source_zip) as f_zip:
        f_zip.extractall(os.path.dirname(os.path.abspath(__file__)))

# Sample usage
if __name__ == "__main__":
    file_zip('compress.zip', 'source.file')
    file_unzip('compress.zip', 'source.file')
    file_all_unzip('compress.zip')
    os.remove('compress.zip')

Practical Notes

  • The function files_zip() adds complexity by scanning directories and filtering by file extension.
  • For simple tasks, many developers still prefer shell commands (especially on Linux) like zip or unzip for speed and simplicity.
  • That said, Python gives you cross-platform control, making it ideal for automated scripts or desktop apps.

Insights from Experience

“After 20+ years in software, I’ve found that developers are language-agnostic. Use Python if you’re working cross-platform. Use the command line if you’re automating in a Unix environment.”

ZIP is not the most sophisticated compression format, but it strikes a good balance between efficiency, speed, and compatibility. And it’s still one of the most common formats used globally for lossless file compression.


Trade-offs: Speed vs Compression Ratio

Modern lossless compression tools (like 7z, gzip, or bzip2) vary in:

  • Compression ratio
  • Speed
  • System compatibility

For ZIP:

  • Compression ratio: ⭐⭐☆☆☆
  • Speed: ⭐⭐⭐⭐☆
  • Compatibility: ⭐⭐⭐⭐⭐

Conclusion

ZIP compression in Python offers a fast, portable, and reliable way to archive and transfer files without losing any data. Whether you’re writing a script to clean up storage, backing up image data in OpenCV workflows, or preparing a dataset for ML, zipfile in Python is your go-to solution.

By Mark

-_-

Leave a Reply

Your email address will not be published. Required fields are marked *