Python developers often need to compare files in the same or different directories when performing tasks such as data analysis or machine learning, and they frequently end up writing their own comparison routines. To save that effort, Python's standard library includes a module named filecmp, which lets developers compare files and directories through an easy-to-use API. The module provides functions for comparing two files, comparing a list of files across two directories, and comparing whole directory trees. In this tutorial, we'll explain how to use the filecmp module for these kinds of comparisons with simple, easy-to-understand examples.
We have created a directory structure with a list of files inside it which will be used for various comparison examples that we'll explain as a part of this tutorial.
directory1
|-- directory1_1
|   |-- directory1_1_1
|   |   |-- original1_1_1.txt
|   |   `-- modified1_1_1.txt
|   |-- original1_1.txt
|   `-- modified1_1.txt
|-- file2.txt
|-- original.txt
`-- modified.txt
directory2
|-- directory1_1
|   |-- directory1_1_1
|   |   |-- original.txt
|   |   `-- modified.txt
|   |-- original.txt
|   |-- modified1_1.txt
|   `-- file2.txt
|-- file2.txt
|-- original.txt
`-- modified.txt
Both directories have almost the same structure: some subdirectories contain files with the same names and some contain files with different names. The contents of all original*.txt files are the same, as are the contents of all modified*.txt files. The commands below display the contents of the files, which are taken from the text of the Zen of Python (import this).
!cat directory1/original.txt
!cat directory1/modified.txt
!cat directory1/file2.txt
Please NOTE that filecmp compares the contents of files and only reports the result as a boolean value (same or not). If you are interested in the line-by-line differences between two files, please check our tutorial on the difflib module, which provides that functionality.
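To make the division of labour between the two modules concrete, here is a minimal sketch that first asks filecmp whether two files match and only then asks difflib which lines differ. The file names left.txt and right.txt and their contents are hypothetical and are created in a temporary directory; they are not the tutorial's files.

```python
import difflib
import filecmp
import os
import tempfile

# Hypothetical sample files, created in a temporary directory for illustration.
with tempfile.TemporaryDirectory() as tmp:
    left = os.path.join(tmp, "left.txt")
    right = os.path.join(tmp, "right.txt")
    with open(left, "w") as f:
        f.write("Beautiful is better than ugly.\nExplicit is better than implicit.\n")
    with open(right, "w") as f:
        f.write("Beautiful is better than ugly.\nSimple is better than complex.\n")

    # filecmp only answers "same or not".
    same = filecmp.cmp(left, right, shallow=False)

    # difflib reports which lines differ.
    diff_lines = []
    if not same:
        with open(left) as f1, open(right) as f2:
            diff_lines = list(
                difflib.unified_diff(
                    f1.readlines(), f2.readlines(),
                    fromfile="left.txt", tofile="right.txt",
                )
            )

print("Same contents? :", same)
print("".join(diff_lines), end="")
```

Running the cheap boolean check first avoids building a diff for files that turn out to be identical.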
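As a part of our first example, we'll explain how we can compare two files using the cmp() function available from filecmp.
Our code for this example simply compares files in directory1 and directory2 with different parameter settings. By default (shallow=True), two files whose os.stat() signatures (file type, size, and modification time) are identical are taken to be equal; with shallow=False, the contents of the files are compared.
Please NOTE that cmp() caches the results of earlier comparisons and reuses them as long as the os.stat() information of the files has not changed. The cache can be cleared with the filecmp.clear_cache() function, which is useful when files are modified so soon after a comparison that their modification time does not change, so that stale results are not returned.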
import filecmp
# Default (shallow) comparison of two files in the same directory.
result = filecmp.cmp("directory1/original.txt", "directory1/modified.txt")
print("Is {} equal to {}? : {}".format("directory1/original.txt", "directory1/modified.txt", result))
# Content comparison (shallow=False) of same-named files across the two directories.
result = filecmp.cmp("directory1/original.txt", "directory2/original.txt", shallow=False)
print("Is {} equal to {}? : {}".format("directory1/original.txt", "directory2/original.txt", result))
result = filecmp.cmp("directory1/modified.txt", "directory2/modified.txt", shallow=False)
print("Is {} equal to {}? : {}".format("directory1/modified.txt", "directory2/modified.txt", result))
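The difference between the default shallow comparison and shallow=False can be seen with two files that have the same size and modification time but different contents. The sketch below is only an illustration under those assumptions: the file names are hypothetical, and the timestamps are forced to match with os.utime().

```python
import filecmp
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    first = os.path.join(tmp, "first.txt")
    second = os.path.join(tmp, "second.txt")

    # Same length, different contents.
    with open(first, "w") as f:
        f.write("aaaa")
    with open(second, "w") as f:
        f.write("bbbb")

    # Force identical access/modification times so both os.stat() signatures match.
    timestamp_ns = 1_600_000_000 * 10**9
    os.utime(first, ns=(timestamp_ns, timestamp_ns))
    os.utime(second, ns=(timestamp_ns, timestamp_ns))

    # shallow=True is the default: matching signatures are treated as equal.
    shallow_result = filecmp.cmp(first, second)
    # shallow=False reads and compares the file contents.
    deep_result = filecmp.cmp(first, second, shallow=False)

    # Drop cached comparison results, e.g. before re-checking files that change often.
    filecmp.clear_cache()

print("Shallow comparison :", shallow_result)
print("Deep comparison    :", deep_result)
```

The shallow comparison reports the files as equal because only the signatures are consulted, whereas the deep comparison detects the differing bytes. When correctness matters more than speed, shallow=False is the safer choice.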
As a part of our second example, we'll explain how we can compare a list of files that have the same names in two different directories using the cmpfiles() function.
Our code for this example first creates a list of file names to check in both directories. The list has some file names that are present in both directories and some that are present in neither. The code then compares these files across both directories. cmpfiles() returns three lists: files that match, files that do not match, and files that could not be compared (errors), and we print all three.
import filecmp
files_to_compare = ["original.txt", "modified.txt", "file2.txt", "file3.txt", "file4.txt"]
match, mismatch, errors = filecmp.cmpfiles("directory1", "directory2", common=files_to_compare)
print("Matched Files : {}".format(match))
print("Mismatched Files : {}".format(mismatch))
print("Errors : {}".format(errors))
match, mismatch, errors = filecmp.cmpfiles("directory1", "directory2", common=files_to_compare, shallow=False)
print("\nMatched Files : {}".format(match))
print("Mismatched Files : {}".format(mismatch))
print("Errors : {}".format(errors))
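For readers who want to try cmpfiles() without recreating the directory layout above, the following self-contained sketch builds two small temporary directories. The directory names, file names, and the write_text helper are hypothetical and exist only for this illustration.

```python
import filecmp
import os
import tempfile


def write_text(path, text):
    """Hypothetical helper: write text to path."""
    with open(path, "w") as f:
        f.write(text)


with tempfile.TemporaryDirectory() as tmp:
    dir_a = os.path.join(tmp, "a")
    dir_b = os.path.join(tmp, "b")
    os.mkdir(dir_a)
    os.mkdir(dir_b)

    # same.txt is identical in both directories, changed.txt differs,
    # and absent.txt exists in neither directory.
    write_text(os.path.join(dir_a, "same.txt"), "identical\n")
    write_text(os.path.join(dir_b, "same.txt"), "identical\n")
    write_text(os.path.join(dir_a, "changed.txt"), "short\n")
    write_text(os.path.join(dir_b, "changed.txt"), "a much longer line\n")

    match, mismatch, errors = filecmp.cmpfiles(
        dir_a, dir_b, ["same.txt", "changed.txt", "absent.txt"], shallow=False
    )

print("Matched Files    :", match)
print("Mismatched Files :", mismatch)
print("Errors           :", errors)
```

A name ends up in the errors list when the file cannot be compared, for example because it is missing from one or both directories.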
As a part of our third example, we'll explain how we can compare two directories using the dircmp class of filecmp. A dircmp instance lets us compare all the files in two directories and in their subdirectories as well. It also lets us print a report showing the results of the comparison.
All the reports include information about identical files and differing files between the two directories, as well as files that are present in only one of the directories. They also include information about common subdirectories. A report has one section per directory comparison.
Our code for this example creates a dircmp instance for comparing directories directory1 and directory2. It then prints reports by calling the three report-generating methods: report(), report_partial_closure(), and report_full_closure().
When we run the code below, we can notice that the report() method only compares files in the given directories, the report_partial_closure() method goes only one level down, and the report_full_closure() method compares all subdirectories recursively.
import filecmp
directory_cmp = filecmp.dircmp(a="directory1", b="directory2")
print("=========== Comparison Report =========== \n")
directory_cmp.report()
print("\n========= Comparison Report Partial ============\n")
directory_cmp.report_partial_closure()
print("\n========= Comparison Report Full ============== \n")
directory_cmp.report_full_closure()
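The report methods print directly to standard output and do not return the text. If the report is needed as a string, for example to write it to a log, one possible approach is to capture it with contextlib.redirect_stdout. This is a sketch on a hypothetical temporary directory pair, not part of the filecmp API itself.

```python
import contextlib
import filecmp
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    left = os.path.join(tmp, "left")
    right = os.path.join(tmp, "right")
    os.mkdir(left)
    os.mkdir(right)

    # shared.txt is identical on both sides; extra.txt exists only on the left.
    for folder in (left, right):
        with open(os.path.join(folder, "shared.txt"), "w") as f:
            f.write("same text\n")
    with open(os.path.join(left, "extra.txt"), "w") as f:
        f.write("only on the left\n")

    # report() prints to sys.stdout, so redirect stdout into a buffer.
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        filecmp.dircmp(left, right).report()
    report_text = buffer.getvalue()

print(report_text)
```

The same pattern should work for report_partial_closure() and report_full_closure(), since they print in the same way.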
As a part of our fourth example, we'll demonstrate how we can ignore some files when doing a comparison with a dircmp instance by using the ignore parameter of the constructor.
Our code for this example is exactly the same as our previous example with one minor change. We have added file2.txt to the list of files to be ignored when doing the comparison.
If we compare the output of this example with the previous example then we can clearly see that file2.txt is not present in the output of this example.
import filecmp
directory_cmp = filecmp.dircmp(a="directory1", b="directory2", ignore=["file2.txt"])
print("=========== Comparison Report =========== \n")
directory_cmp.report()
print("\n========= Comparison Report Partial ============\n")
directory_cmp.report_partial_closure()
print("\n========= Comparison Report Full ============== \n")
directory_cmp.report_full_closure()
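One detail worth keeping in mind is that when ignore is not given, dircmp uses filecmp.DEFAULT_IGNORES (names such as .git and __pycache__), and passing your own list replaces those defaults. The sketch below, on hypothetical temporary directories, extends the defaults instead of replacing them and checks the effect through the diff_files and common attributes.

```python
import filecmp
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    left = os.path.join(tmp, "left")
    right = os.path.join(tmp, "right")
    os.mkdir(left)
    os.mkdir(right)

    # original.txt is identical on both sides; file2.txt differs.
    for folder in (left, right):
        with open(os.path.join(folder, "original.txt"), "w") as f:
            f.write("same text\n")
    with open(os.path.join(left, "file2.txt"), "w") as f:
        f.write("left version\n")
    with open(os.path.join(right, "file2.txt"), "w") as f:
        f.write("right version, a bit longer\n")

    plain = filecmp.dircmp(left, right)
    # Keep the default ignore names and add file2.txt to them.
    filtered = filecmp.dircmp(
        left, right, ignore=list(filecmp.DEFAULT_IGNORES) + ["file2.txt"]
    )

    # Attributes are computed lazily, so read them while the files still exist.
    plain_diff = plain.diff_files
    filtered_diff = filtered.diff_files
    filtered_common = filtered.common

print("Differing files without ignore :", plain_diff)
print("Differing files with ignore    :", filtered_diff)
print("Common names with ignore       :", filtered_common)
```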
As a part of our fifth example, we'll explain how we can hide information about a list of files when generating a report using dircmp. We'll be using its hide parameter for this purpose.
Our code for this example is exactly the same as example 3 with a minor change. We have passed a list of two files (modified1_1_1.txt, original1_1_1.txt) as the value of the hide parameter to tell the report to hide information about them.
If we compare the output of this example with the previous examples, we can clearly notice that information about the two files mentioned above is not present in any report.
import filecmp
directory_cmp = filecmp.dircmp(a="directory1", b="directory2", hide=["modified1_1_1.txt", "original1_1_1.txt"])
print("=========== Comparison Report =========== \n")
directory_cmp.report()
print("\n========= Comparison Report Partial ============\n")
directory_cmp.report_partial_closure()
print("\n========= Comparison Report Full ============== \n")
directory_cmp.report_full_closure()
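The effect of hide can also be checked without reading a report, because hidden names are left out of the left_list and right_list attributes. The following sketch uses hypothetical temporary directories; it also keeps os.curdir and os.pardir in the list, since those are the default value of hide and a custom list replaces it.

```python
import filecmp
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    left = os.path.join(tmp, "left")
    right = os.path.join(tmp, "right")
    os.mkdir(left)
    os.mkdir(right)

    # Both directories hold the same two files.
    for folder in (left, right):
        for name in ("original.txt", "modified.txt"):
            with open(os.path.join(folder, name), "w") as f:
                f.write("some text\n")

    visible = filecmp.dircmp(left, right)
    hidden = filecmp.dircmp(
        left, right, hide=[os.curdir, os.pardir, "modified.txt"]
    )

    # Read the listings while the temporary directories still exist.
    visible_names = visible.left_list
    hidden_names = hidden.left_list

print("Left listing without hide :", visible_names)
print("Left listing with hide    :", hidden_names)
```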
As a part of our sixth example, we'll explain various attributes of a dircmp instance.
Our code for this part creates a dircmp instance for the directory directory1_1, which is present in both directory1 and directory2. We then print the values of its important attributes.
import filecmp
directory_cmp = filecmp.dircmp(a="directory1/directory1_1", b="directory2/directory1_1")
print("=========== Comparison Report =========== \n")
directory_cmp.report()
print("\n=========== Important Attributes of dircmp Instance ========")
print("\nLeft Directory : {}".format(directory_cmp.left))
print("Right Directory : {}".format(directory_cmp.right))
print("Left List of Files/Directories : {}".format(directory_cmp.left_list))
print("Right List of Files/Directories : {}".format(directory_cmp.right_list))
print("Common Files/Directories : {}".format(directory_cmp.common))
print("Common Directories : {}".format(directory_cmp.common_dirs))
print("Common Files : {}".format(directory_cmp.common_files))
print("Common Funny : {}".format(directory_cmp.common_funny))
print("Identical Files : {}".format(directory_cmp.same_files))
print("Different Files : {}".format(directory_cmp.diff_files))
print("Funny Files : {}".format(directory_cmp.funny_files))
print("Mapping from Dirname to dircmp : {}".format(directory_cmp.subdirs))
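The subdirs attribute maps each common subdirectory name to another dircmp instance, which makes it possible to walk a whole tree programmatically instead of reading a printed report. Below is a minimal sketch of such a walk. The collect_differences helper and the temporary directory layout are hypothetical; left_only and right_only are dircmp attributes that list names found on only one side.

```python
import filecmp
import os
import tempfile


def collect_differences(dcmp):
    """Hypothetical helper: recursively gather (kind, path) pairs from a dircmp."""
    found = []
    for name in dcmp.diff_files:
        found.append(("differs", os.path.join(dcmp.left, name)))
    for name in dcmp.left_only:
        found.append(("left only", os.path.join(dcmp.left, name)))
    for name in dcmp.right_only:
        found.append(("right only", os.path.join(dcmp.right, name)))
    # Each value in subdirs is itself a dircmp instance for a common subdirectory.
    for sub_dcmp in dcmp.subdirs.values():
        found.extend(collect_differences(sub_dcmp))
    return found


with tempfile.TemporaryDirectory() as tmp:
    left = os.path.join(tmp, "left")
    right = os.path.join(tmp, "right")
    os.makedirs(os.path.join(left, "sub"))
    os.makedirs(os.path.join(right, "sub"))

    # sub/a.txt differs, sub/only_left.txt and top_right.txt exist on one side only.
    with open(os.path.join(left, "sub", "a.txt"), "w") as f:
        f.write("x\n")
    with open(os.path.join(right, "sub", "a.txt"), "w") as f:
        f.write("longer text\n")
    with open(os.path.join(left, "sub", "only_left.txt"), "w") as f:
        f.write("left\n")
    with open(os.path.join(right, "top_right.txt"), "w") as f:
        f.write("right\n")

    differences = collect_differences(filecmp.dircmp(left, right))
    summary = sorted((kind, os.path.basename(path)) for kind, path in differences)

for kind, name in summary:
    print("{:<10} : {}".format(kind, name))
```

Returning plain tuples keeps the helper easy to reuse, for instance to filter by kind or to feed the paths into another tool.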
This ends our small tutorial explaining how we can compare files and directories using the filecmp module of Python. Please feel free to let us know your views in the comments section.