ZIP is a widely used archive format for bundling and compressing data. It was created in 1989 and supports lossless data compression.
ZIP files end with the .zip or .ZIP extension.
The majority of operating systems nowadays provide built-in zip compression utilities.
Zip archives can use different compression algorithms (DEFLATE, BZIP2, LZMA, etc.) to compress their contents.
When zip archives are transferred over the internet, the MIME type application/zip is used to represent them.
Python provides a module named zipfile for working with zip archives. It provides tools to create, read, write, and append to zip archives.
As a part of this tutorial, we explain how to use the Python module zipfile to handle zip archives with simple examples. We cover the important usage of the zipfile module: how to read the contents of zip archives, create zip archives, append files to an existing zip archive, compress content with different compression algorithms, test a zip file for corruption, etc.
We have created one simple text file that we'll be using when explaining various examples. Below we have printed the contents of the file. We have also created one archive with only this file using our Linux tools which we'll try to read in our first example.
!cat zen_of_python.txt
As a part of our first example, we'll explain how we can open an existing zip archive and read the contents of the file stored in it. We'll be using class ZipFile and its methods for this purpose.
Our code for this example starts by creating an instance of ZipFile with zip archive and then opens text file inside of it using open() method of ZipFile instance. It then reads the contents of the file, prints it, and closes the archive. We have decoded the contents of the file as it’s returned as bytes.
Below, we have listed the class and methods used in the example. Each section lists the classes and methods it uses so that they can be referred to for important parameters.
ZipFile(file,mode='r',compression=ZIP_STORED,allowZip64=True,compresslevel=None) - This class lets us work with the zip file given as the first parameter. It lets us open, read, write, and append to zip files.
import zipfile
zipped_file = zipfile.ZipFile("zen_of_python.zip")
fp = zipped_file.open("zen_of_python.txt") ## Open file from zip archive
file_content = fp.read().decode() ## Read contents
print(file_content)
zipped_file.close() ## Close Archive
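The object returned by open() is a binary file-like object, so it can also be decoded lazily instead of reading everything at once. Below is a minimal sketch of that idea. The archive name sample.zip and member name notes.txt are hypothetical, and the archive is built in a temporary directory only so the snippet runs on its own.

```python
import io
import os
import tempfile
import zipfile

# Build a throwaway archive so the sketch is self-contained.
tmp_dir = tempfile.mkdtemp()
archive_path = os.path.join(tmp_dir, "sample.zip")  # hypothetical name
with zipfile.ZipFile(archive_path, mode="w") as archive:
    archive.writestr("notes.txt", "first line\nsecond line\n")

# open() yields bytes; TextIOWrapper decodes them line by line.
lines = []
with zipfile.ZipFile(archive_path) as archive:
    with archive.open("notes.txt") as member:
        for line in io.TextIOWrapper(member, encoding="utf-8"):
            lines.append(line.rstrip("\n"))

print(lines)
```

This approach can be handy for large text members because only one line is held in memory at a time.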
Our second code example for this part explains the usage of read() method of ZipFile object. It has almost the same code as the previous example, with the only change being that it reads the member directly with read() instead of opening it first. We also check whether the file is a valid zip file using the is_zipfile() function of the zipfile module.
is_zipfile() - This function takes a file name or file-like object as input and returns True if it is a zip archive, based on the magic number of the file, else returns False.
import zipfile
zipped_file = zipfile.ZipFile("zen_of_python.zip")
file_content = zipped_file.read("zen_of_python.txt")
print(file_content.decode())
zipped_file.close()
print("Is zen_of_python.zip a zip file ? ",zipfile.is_zipfile("zen_of_python.zip")) ## Checking for zip file
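To see is_zipfile() return both answers, the sketch below checks one real archive and one plain text file that merely carries a .zip extension. Both file names are hypothetical and are created in a temporary directory for illustration.

```python
import os
import tempfile
import zipfile

tmp_dir = tempfile.mkdtemp()
real_path = os.path.join(tmp_dir, "real.zip")  # hypothetical names
fake_path = os.path.join(tmp_dir, "fake.zip")

# A genuine archive with one small member.
with zipfile.ZipFile(real_path, mode="w") as archive:
    archive.writestr("a.txt", "hello")

# A text file that only pretends to be an archive through its extension.
with open(fake_path, "w") as fp:
    fp.write("this is plain text, not a zip archive")

is_real = zipfile.is_zipfile(real_path)
is_fake = zipfile.is_zipfile(fake_path)
print("real.zip :", is_real)
print("fake.zip :", is_fake)
```

The check looks at the file contents, so the extension alone does not influence the result.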
As a part of our second example, we'll explain how we can create zip archives. We'll be using write() method of ZipFile instance to write files to zip archive.
write(filename) - It accepts a filename and adds that file to the archive.
Our code for this example creates a new archive by using 'w' mode of ZipFile instance. It then writes our text file to the archive and closes the archive.
The next part of our code opens the same archive again in reading mode and reads the contents of the text file written to verify it.
import zipfile
### Create archive and write file to it.
zipped_file = zipfile.ZipFile("zen_of_python2.zip", mode="w", ) ## Opening in write mode
zipped_file.write("zen_of_python.txt")
zipped_file.close()
## Read archive to check contents
zipped_file = zipfile.ZipFile("zen_of_python2.zip", mode="r")
file_content = zipped_file.read("zen_of_python.txt")
print(file_content.decode())
zipped_file.close()
Our code for this example is exactly the same as our code for the previous example, with the only change being that it uses writestr() method. Instead of a path on disk, writestr() takes the member name and the data (string or bytes) to store under that name.
import zipfile
### Create archive and write file contents to it.
zipped_file = zipfile.ZipFile("zen_of_python2.zip", mode="w")
with open("zen_of_python.txt", "r") as text_file: ## Close the source file once it is read
    zipped_file.writestr("zen_of_python.txt", data=text_file.read())
zipped_file.close()
## Read archive to check contents
zipped_file = zipfile.ZipFile("zen_of_python2.zip", mode="r")
file_content = zipped_file.read("zen_of_python.txt")
print(file_content.decode())
zipped_file.close()
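Because writestr() takes data directly, an archive can be built without touching the disk at all. The sketch below, which assumes an in-memory io.BytesIO buffer as the target and uses hypothetical member names, writes one string and one bytes value and reads them back.

```python
import io
import zipfile

# The archive lives in this buffer instead of a file on disk.
buffer = io.BytesIO()

with zipfile.ZipFile(buffer, mode="w") as archive:
    archive.writestr("greeting.txt", "Hello from memory")  # str is stored as UTF-8
    archive.writestr("raw.bin", b"\x00\x01\x02")           # bytes are stored as-is

# Re-open the same bytes for reading to verify the contents.
with zipfile.ZipFile(io.BytesIO(buffer.getvalue())) as archive:
    names = archive.namelist()
    greeting = archive.read("greeting.txt").decode("utf-8")
    raw = archive.read("raw.bin")

print(names)
print(greeting)
```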
As a part of our third example, we'll explain how we can append new files to an existing archive.
Our code for this example opens an archive that we created in the previous example in append mode. It then writes a new JPEG file to the archive and closes it. The next part of our code again opens the same archive in reading mode and prints the contents of the archive using printdir() method. The printdir() method lists archive contents which contain information about each file of an archive, its last modification date, and size.
import zipfile
### Open existing archive in append mode and add a file to it.
zipped_file = zipfile.ZipFile("zen_of_python2.zip", mode="a", )
zipped_file.write("dr_apj_kalam.jpeg")
zipped_file.close()
## Read archive to check contents
zipped_file = zipfile.ZipFile("zen_of_python2.zip", mode="r")
zipped_file.printdir()
zipped_file.close()
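The same append flow can be tried without any pre-existing files. The sketch below is a minimal, self-contained version that assumes a hypothetical archive named growing.zip in a temporary directory and uses writestr() to supply the member contents.

```python
import os
import tempfile
import zipfile

tmp_dir = tempfile.mkdtemp()
archive_path = os.path.join(tmp_dir, "growing.zip")  # hypothetical name

# Mode 'w' creates the archive with a single member.
with zipfile.ZipFile(archive_path, mode="w") as archive:
    archive.writestr("first.txt", "one")

# Mode 'a' keeps the existing members and adds a new one.
with zipfile.ZipFile(archive_path, mode="a") as archive:
    archive.writestr("second.txt", "two")

with zipfile.ZipFile(archive_path) as archive:
    names = archive.namelist()
    first = archive.read("first.txt").decode()

print(names)
```

Had the second step used mode 'w' instead of 'a', the archive would have been truncated and first.txt lost.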
We can also use ZipFile instance as a context manager (with statement). We don't need to call close() method on archive if we have opened it as a context manager as it'll close the archive by itself once we exit from context manager.
Our code for this example opens an existing archive as a context manager, reads the contents of the text file inside of it, and prints them.
It's recommended to use ZipFile as a context manager because it's a safe and easy way to work with archives. We'll be using it as a context manager in our examples going forward.
Python has a library named contextlib that lets us create context managers with ease.
import zipfile
with zipfile.ZipFile("zen_of_python.zip") as zipped_file:
    file_content = zipped_file.read("zen_of_python.txt")
    print(file_content.decode())
As a part of our fifth example, we are demonstrating how we can write multiple files to a zip archive.
Our code for this example starts by creating an instance of ZipFile with a new zip archive name and mode as 'w'. It then loops through four file names and writes them one by one to the archive. Finally, it prints the list of files present in the archive using namelist() method.
import zipfile
with zipfile.ZipFile("multiple_files.zip", "w") as zipped_file:
    for file in ["zen_of_python.txt", "dr_apj_kalam.jpeg", "intro_ball.gif", "Deploying a Django Application to Google App Engine.pdf"]:
        zipped_file.write(file)
    print("List of files in archive : ", zipped_file.namelist())
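When the files to be archived live in another directory, write() stores their directory path as part of the member name. The optional arcname argument of write() lets us choose the stored name instead. The sketch below assumes two hypothetical files, a.txt and b.txt, created in a temporary directory, and stores them under their base names only.

```python
import os
import tempfile
import zipfile

# Create two small source files somewhere outside the current directory.
source_dir = tempfile.mkdtemp()
source_paths = []
for name in ["a.txt", "b.txt"]:  # hypothetical file names
    path = os.path.join(source_dir, name)
    with open(path, "w") as fp:
        fp.write("contents of " + name)
    source_paths.append(path)

archive_path = os.path.join(source_dir, "flat.zip")
with zipfile.ZipFile(archive_path, mode="w") as archive:
    for path in source_paths:
        # arcname controls the name recorded inside the archive.
        archive.write(path, arcname=os.path.basename(path))
    names = archive.namelist()

print("List of files in archive : ", names)
```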
As a part of our sixth example, we'll try different compression algorithms available for creating zip archives.
Our code for this example creates a different archive for each compression type and writes four files of a different type to each archive. We are then listing archive names to see the file size created by each type.
import zipfile
### Zip STORED
with zipfile.ZipFile("multiple_files_stored.zip", mode="w", compression=zipfile.ZIP_STORED) as zipped_file:
    for file in ["zen_of_python.txt", "dr_apj_kalam.jpeg", "intro_ball.gif", "Deploying a Django Application to Google App Engine.pdf"]:
        zipped_file.write(file)
### ZIP DEFLATED
with zipfile.ZipFile("multiple_files_deflated.zip", mode="w", compression=zipfile.ZIP_DEFLATED) as zipped_file:
    for file in ["zen_of_python.txt", "dr_apj_kalam.jpeg", "intro_ball.gif", "Deploying a Django Application to Google App Engine.pdf"]:
        zipped_file.write(file)
### ZIP BZIP2
with zipfile.ZipFile("multiple_files_bzip2.zip", mode="w", compression=zipfile.ZIP_BZIP2) as zipped_file:
    for file in ["zen_of_python.txt", "dr_apj_kalam.jpeg", "intro_ball.gif", "Deploying a Django Application to Google App Engine.pdf"]:
        zipped_file.write(file)
### ZIP LZMA
with zipfile.ZipFile("multiple_files_lzma.zip", mode="w", compression=zipfile.ZIP_LZMA) as zipped_file:
    for file in ["zen_of_python.txt", "dr_apj_kalam.jpeg", "intro_ball.gif", "Deploying a Django Application to Google App Engine.pdf"]:
        zipped_file.write(file)
We can notice from our results below that lzma has performed more compression compared to other algorithms.
!ls -l multiple_files_*
Below we have created another example that is exactly the same as our above example, with the only change being that we have provided compresslevel for DEFLATED and BZIP2 compressions. Lower compression levels complete faster but compress less, hence file sizes are larger than with higher levels. DEFLATED accepts levels 0 through 9 (0 means no compression at all) and BZIP2 accepts levels 1 through 9. The compresslevel parameter has no effect on ZIP_STORED and ZIP_LZMA.
When we don't provide a compression level, each algorithm uses its library default: the zlib default level for DEFLATED (a balance between speed and size rather than the maximum) and level 9 for BZIP2. We can compare file sizes generated by this example with the previous ones. We can notice that file sizes generated by DEFLATED and BZIP2 are larger than in the previous example. For this example, we have provided the lowest compression level for both, hence compression completes fast but file sizes are larger due to less compression.
import zipfile
### Zip STORED
with zipfile.ZipFile("multiple_files_stored.zip", mode="w", compression=zipfile.ZIP_STORED) as zipped_file:
    for file in ["zen_of_python.txt", "dr_apj_kalam.jpeg", "intro_ball.gif", "Deploying a Django Application to Google App Engine.pdf"]:
        zipped_file.write(file)
### ZIP DEFLATED with compresslevel=0 (least compression)
with zipfile.ZipFile("multiple_files_deflated.zip", mode="w", compression=zipfile.ZIP_DEFLATED, compresslevel=0) as zipped_file:
    for file in ["zen_of_python.txt", "dr_apj_kalam.jpeg", "intro_ball.gif", "Deploying a Django Application to Google App Engine.pdf"]:
        zipped_file.write(file)
### ZIP BZIP2 with compresslevel=1 (least compression)
with zipfile.ZipFile("multiple_files_bzip2.zip", mode="w", compression=zipfile.ZIP_BZIP2, compresslevel=1) as zipped_file:
    for file in ["zen_of_python.txt", "dr_apj_kalam.jpeg", "intro_ball.gif", "Deploying a Django Application to Google App Engine.pdf"]:
        zipped_file.write(file)
### ZIP LZMA
with zipfile.ZipFile("multiple_files_lzma.zip", mode="w", compression=zipfile.ZIP_LZMA) as zipped_file:
    for file in ["zen_of_python.txt", "dr_apj_kalam.jpeg", "intro_ball.gif", "Deploying a Django Application to Google App Engine.pdf"]:
        zipped_file.write(file)
!ls -l multiple_files_*
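The effect of the compression type and level can also be measured without the sample files. The sketch below is only an illustration under stated assumptions: it compresses a hypothetical, highly repetitive payload into in-memory archives through a helper named archive_size() that we define here, so the exact numbers will differ from the ones produced by the tutorial files.

```python
import io
import zipfile

# Hypothetical payload: repetitive text compresses very well.
payload = b"Beautiful is better than ugly.\n" * 2000

def archive_size(compression, compresslevel=None):
    """Return the size in bytes of an in-memory archive holding the payload."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, mode="w", compression=compression,
                         compresslevel=compresslevel) as archive:
        archive.writestr("payload.txt", payload)
    return len(buffer.getvalue())

sizes = {
    "stored": archive_size(zipfile.ZIP_STORED),
    "deflated, level 0": archive_size(zipfile.ZIP_DEFLATED, 0),
    "deflated, level 9": archive_size(zipfile.ZIP_DEFLATED, 9),
    "bzip2, level 1": archive_size(zipfile.ZIP_BZIP2, 1),
    "bzip2, level 9": archive_size(zipfile.ZIP_BZIP2, 9),
    "lzma": archive_size(zipfile.ZIP_LZMA),  # compresslevel is ignored here
}

for label, size in sizes.items():
    print("{:<20} {:>8} bytes".format(label, size))
```

Which algorithm wins depends heavily on the data, so it is worth running such a comparison on content that resembles the real workload.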
As a part of our seventh example, we'll explain how we can extract the contents of an archive. We'll be using extract() and extractall() methods of ZipFile instance for our purpose.
extract(member_name,path=None) - This method accepts the name of the member that we want to extract. By default, it extracts the member to the current directory. If we want to extract the file to a different location, we can give that location as a string to the path parameter of the method. The method returns the path where the file was extracted.
extractall(path=None, members=None) - This method extracts all members of the zip archive to the current directory if the path is not provided, else it extracts members to the given path. We can also give a list of member names to the members parameter and it'll extract only that subset of members. Unlike extract(), this method does not return a path; it returns None.
Our code for this example starts by opening a previously created zip file in reading mode. It then extracts one single JPEG to the current path. We then extract the same file to a different location by giving the location to path parameter. We have then also extracted all files using extractall() method.
import zipfile
with zipfile.ZipFile("multiple_files.zip") as zipped_file:
    extraction_path = zipped_file.extract("dr_apj_kalam.jpeg")
    print("Path where files are extracted : ", extraction_path)
    extraction_path = zipped_file.extract("dr_apj_kalam.jpeg", path="/home/sunny/multiple_files/")
    print("Path where files are extracted : ", extraction_path)
    zipped_file.extractall(path="/home/sunny/multiple_files/") ## Returns None, so nothing to capture
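The difference between the two methods is easy to check on a small archive. The sketch below assumes a hypothetical archive named bundle.zip with two members, built in a temporary directory, and extracts one member with extract() and a chosen subset with extractall().

```python
import os
import tempfile
import zipfile

work_dir = tempfile.mkdtemp()
archive_path = os.path.join(work_dir, "bundle.zip")  # hypothetical name
with zipfile.ZipFile(archive_path, mode="w") as archive:
    archive.writestr("docs/readme.txt", "read me")
    archive.writestr("data.csv", "a,b\n1,2\n")

target_dir = os.path.join(work_dir, "out")
with zipfile.ZipFile(archive_path) as archive:
    # extract() handles one member and returns the path it created.
    extracted_path = archive.extract("data.csv", path=target_dir)
    # extractall() can be limited to a subset through 'members'.
    result = archive.extractall(path=target_dir, members=["docs/readme.txt"])

print("extract() returned    :", extracted_path)
print("extractall() returned :", result)
```

The target directory and any nested directories recorded in member names are created as needed during extraction.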
As a part of our eighth example, we'll explain how we can retrieve details of individual members of an archive.
The zipfile library has a class named ZipInfo whose object stores information about one file.
The information like modification date, compression type, compression size, file size, CRC, and few other important attributes are available through ZipInfo instance.
We can get ZipInfo instances using getinfo() or infolist() methods of ZipFile instance.
Our code for this example starts by creating a function that takes a ZipInfo instance as input and prints important attributes of the member represented by that instance.
We have then opened the existing zip archive in reading mode. We have then retrieved ZipInfo instance for the text file using getinfo() method and printed the attributes of the file.
We have then retrieved all ZipInfo instances for all members of the archive and printed details of each member.
import zipfile
def print_zipinfo_details(zipinfo):
    print("\n=============== {} ==================".format(zipinfo.filename))
    print("\nLast Modification Datetime of file : ", zipinfo.date_time)
    print("Compression Type : ", zipinfo.compress_type)
    print("Compressed Data Size : ", zipinfo.compress_size)
    print("Uncompressed Data Size : ", zipinfo.file_size)
    print("Comment for file : ", zipinfo.comment)
    print("System that created zip file : ", zipinfo.create_system)
    print("PKZIP Version : ", zipinfo.create_version)
    print("PKZIP Version needed for extraction : ", zipinfo.extract_version)
    print("ZIP Flags : ", zipinfo.flag_bits)
    print("Volume Number of File Header : ", zipinfo.volume)
    print("Internal Attributes : ", zipinfo.internal_attr)
    print("External Attributes : ", zipinfo.external_attr)
    print("Byte offset to File Header : ", zipinfo.header_offset)
    print("CRC-32 of uncompressed data : ", zipinfo.CRC)
    print("Is zip member directory ? : ", zipinfo.is_dir())
with zipfile.ZipFile("multiple_files.zip") as zipped_file:
    print("List of files in archive : ", zipped_file.namelist())
    zipinfo = zipped_file.getinfo("zen_of_python.txt")
    print("\nZipInfo Object : ", zipinfo)
    print_zipinfo_details(zipinfo)
    for zipinfo in zipped_file.infolist():
        print_zipinfo_details(zipinfo)
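ZipInfo attributes are plain values, so they are easy to combine into derived figures. As a small sketch, the snippet below computes a compression ratio from compress_size and file_size and turns the date_time tuple into a datetime object. The member name repeat.txt and its contents are hypothetical, and the archive is built in memory for illustration.

```python
import io
import zipfile
from datetime import datetime

# Build a small DEFLATED archive in memory.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode="w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("repeat.txt", "spam " * 1000)

with zipfile.ZipFile(buffer) as archive:
    info = archive.getinfo("repeat.txt")

# Fraction of the original size that the compressed data occupies.
ratio = info.compress_size / info.file_size
# date_time is a (year, month, day, hour, minute, second) tuple.
modified = datetime(*info.date_time)

print("Uncompressed size :", info.file_size)
print("Compressed size   :", info.compress_size)
print("Ratio             : {:.3f}".format(ratio))
print("Last modified     :", modified)
```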
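As a part of our ninth example, we are demonstrating how we can check for corrupt archives using testzip() method of ZipFile instance. The method reads every member of the archive and verifies its CRC. It returns the name of the first bad member, or None if no problem was found.
Our code for this example opens two different archives and checks them for corruption. It then prints whether the archives are corrupt or not.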
import zipfile
with zipfile.ZipFile("multiple_files.zip") as zipped_file:
    test_result = zipped_file.testzip() ## Name of first corrupt member or None
    print("Is zip file (multiple_files.zip) valid ? ", "Yes" if test_result is None else "No, first bad file : {}".format(test_result))
with zipfile.ZipFile("zen_of_python.zip") as zipped_file:
    test_result = zipped_file.testzip()
    print("Is zip file (zen_of_python.zip) valid ? ", "Yes" if test_result is None else "No, first bad file : {}".format(test_result))
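To see testzip() report a problem, we need an archive that is actually damaged. The sketch below is one way to simulate that, under the assumption that flipping a single byte of a stored (uncompressed) member is a reasonable stand-in for real-world corruption. The member name data.txt and its payload are hypothetical.

```python
import io
import zipfile

payload = b"hello world " * 50

# ZIP_STORED keeps the payload bytes verbatim, so they are easy to locate.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode="w", compression=zipfile.ZIP_STORED) as archive:
    archive.writestr("data.txt", payload)

# Flip one byte inside the member data; headers and directory stay intact.
raw = bytearray(buffer.getvalue())
position = raw.index(b"hello")
raw[position] ^= 0xFF

with zipfile.ZipFile(io.BytesIO(bytes(raw))) as archive:
    first_bad = archive.testzip()      # CRC no longer matches the data

with zipfile.ZipFile(io.BytesIO(buffer.getvalue())) as archive:
    clean_result = archive.testzip()   # untouched copy passes the check

print("Damaged archive :", first_bad)
print("Clean archive   :", clean_result)
```

Note that the damaged archive still opens without an error; the corruption only shows up once the member data is read and checked.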
As a part of our tenth and last example, we'll explain how we can compress compiled python files (.pyc) and create an archive from them. The zipfile provides a special class named PyZipFile for this purpose.
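PyZipFile(file, mode='r', compression=ZIP_STORED, allowZip64=True, optimize=-1) - The PyZipFile constructor has the same parameters as the ZipFile constructor, plus one extra parameter named optimize.
It has the same methods as ZipFile with one extra method, writepy(), which is used in the example below. Given the path of a .py file or of a directory containing Python files, writepy() compiles the files if needed and adds the resulting .pyc files to the archive.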
Our code for this example starts by creating an instance of PyZipFile for the new archive. It then adds files to the zip archive from the path given as input. We then list all files added to the archive using printdir() method.
import zipfile
with zipfile.PyZipFile("python_files.zip", mode="w") as pyzipped_file:
    pyzipped_file.writepy(pathname="/home/sunny/multiprocessing_synchronization_primitive_examples")
    pyzipped_file.printdir()
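The example above depends on a directory that exists only on our machine. The sketch below is a self-contained variant that assumes a hypothetical single-file module named hello_mod.py, created in a temporary directory, and passes that file to writepy().

```python
import os
import tempfile
import zipfile

# Create a tiny module to archive.
src_dir = tempfile.mkdtemp()
module_path = os.path.join(src_dir, "hello_mod.py")  # hypothetical module
with open(module_path, "w") as fp:
    fp.write("def greet():\n    return 'hello'\n")

archive_path = os.path.join(src_dir, "python_files.zip")
with zipfile.PyZipFile(archive_path, mode="w") as pyzipped_file:
    # writepy() compiles the module and stores the compiled .pyc file.
    pyzipped_file.writepy(module_path)
    names = pyzipped_file.namelist()

print("List of files in archive : ", names)
```

The archive holds the compiled file rather than the .py source, which is the main difference from adding the same module with write().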
This ends our small tutorial explaining the usage of zipfile module of Python.