Lossless compression(link) is essential when data integrity is critical—such as in images, source code, and backups. Among the most popular lossless formats is ZIP, introduced in 1989. In this post, we’ll explore how ZIP compression works, how to use it in Python, and why it’s still relevant in 2025.
Whether you’re a developer trying to minimize storage usage or a data scientist needing to compress datasets while preserving accuracy, Python’s zipfile
module provides a simple yet powerful solution.
What is ZIP Compression?
ZIP compression is a lossless compression algorithm that reduces file size without removing any data. It works by:
- Identifying redundancy: Repetitive sequences like
"abcabcabc"
can be compressed to"3abc"
(conceptually). - Combining multiple files into one container archive.
- Preserving data integrity, making it suitable for files that must remain unchanged.
Lossless vs Lossy Compression
Lossless | Lossy |
---|---|
No data is lost | Some data is discarded |
Ideal for text, source code, images (when quality must be preserved) | Ideal for media like music, video, photos |
Examples: ZIP, PNG, FLAC | Examples: MP3, MP4, JPEG |
YouTube, for example, uses lossy compression to stream videos efficiently based on network conditions and user preferences.
Why Use ZIP in Python?
As a developer, you might want to:
- Archive multiple files
- Reduce backup sizes
- Transfer files with reduced size
- Preserve the exact original data
Python’s built-in zipfile
module makes this straightforward.
Python ZIP Compression Code Example
Here’s a full example that includes:
- Compressing a single file
- Compressing all files of a certain type in a directory
- Unzipping specific files or all contents
import os
import zipfile
def file_zip(source_zip, target_file):
f_zip = zipfile.ZipFile(source_zip, 'w')
f_zip.write(target_file, compress_type=zipfile.ZIP_DEFLATED)
f_zip.close()
def files_zip(source_zip, filetype):
f_zip = zipfile.ZipFile(source_zip, 'w')
base_dir = os.path.join(os.path.dirname(
os.path.dirname(os.path.abspath(__file__))), 'tmp')
for folder, _, files in os.walk(base_dir):
for file in files:
if file.endswith(filetype):
full_path = os.path.join(folder, file)
relative_path = os.path.relpath(full_path, base_dir)
f_zip.write(full_path, relative_path,
compress_type=zipfile.ZIP_DEFLATED)
f_zip.close()
def file_unzip(source_zip, target_file):
with zipfile.ZipFile(source_zip) as f_zip:
f_zip.extract(target_file, os.path.dirname(os.path.abspath(__file__)))
def file_all_unzip(source_zip):
with zipfile.ZipFile(source_zip) as f_zip:
f_zip.extractall(os.path.dirname(os.path.abspath(__file__)))
# Sample usage
if __name__ == "__main__":
file_zip('compress.zip', 'source.file')
file_unzip('compress.zip', 'source.file')
file_all_unzip('compress.zip')
os.remove('compress.zip')
Practical Notes
- The function
files_zip()
adds complexity by scanning directories and filtering by file extension. - For simple tasks, many developers still prefer shell commands (especially on Linux) like
zip
orunzip
for speed and simplicity. - That said, Python gives you cross-platform control, making it ideal for automated scripts or desktop apps.
Insights from Experience
“After 20+ years in software, I’ve found that developers are language-agnostic. Use Python if you’re working cross-platform. Use the command line if you’re automating in a Unix environment.”
ZIP is not the most sophisticated compression format, but it strikes a good balance between efficiency, speed, and compatibility. And it’s still one of the most common formats used globally for lossless file compression.
Trade-offs: Speed vs Compression Ratio
Modern lossless compression tools (like 7z
, gzip
, or bzip2
) vary in:
- Compression ratio
- Speed
- System compatibility
For ZIP:
- Compression ratio: ⭐⭐☆☆☆
- Speed: ⭐⭐⭐⭐☆
- Compatibility: ⭐⭐⭐⭐⭐
Conclusion
ZIP compression in Python offers a fast, portable, and reliable way to archive and transfer files without losing any data. Whether you’re writing a script to clean up storage, backing up image data in OpenCV workflows, or preparing a dataset for ML, zipfile
in Python is your go-to solution.