Every Python object has special methods that the interpreter invokes when certain operations are performed on it, such as object creation, addition, comparison, or conversion to a string representation. The names of these methods start and end with double underscores, which is why they are commonly called dunder methods. We do not need to call them explicitly; Python calls them when the corresponding operation is performed. Most of them have default implementations inherited from the object class, which every class extends implicitly in Python 3. These defaults are often not helpful, and writing our own implementations for every class is repetitive. Python provides a module named dataclasses that adds implementations of several important dunder methods to a class when we decorate it with the dataclass decorator. In this tutorial, we'll explain how to use the dataclasses module to add special methods to our classes with as little code as possible. We'll explain the API of the module with simple, easy-to-follow examples.
Below is a list of possible special methods that can be added using dataclasses module.
__init__
__repr__
__eq__
__lt__
__gt__
__le__
__ge__
__hash__
As a part of our first example, we'll demonstrate how we can add three dunder methods to our class just by adding the dataclasses.dataclass decorator. We need to define the class with attribute names and their type annotations for the decorator to work. The Python interpreter does not enforce these type annotations; they serve only as hints.
Our example code for this part first creates a simple class named Employee which has three attributes named emp_id, name, and age. We have also included type annotations for each attribute. We can create an instance of this class, but it won't have any of these attributes, so accessing employee.name raises AttributeError. All classes inherit a default dunder init method from the object superclass, but it does not set any attributes. To add them as instance attributes, we would normally need to write our own dunder init method that overrides the inherited one.
We can avoid writing our own dunder init method by using the dataclass() function of the dataclasses module as a class decorator. It adds implementations of three dunder methods, namely init, repr, and eq, to our class without any further code from us.
After decorating our class with the dataclass decorator, we try to create an instance without arguments. This fails because the generated init method requires us to provide all three attributes that we have specified.
We then print an instance of the class, which shows a readable representation with attribute names and values because the dunder repr method has also been generated.
Our code then creates three instances of Employee and compares them using the double-equals operator, which invokes the dunder eq method. The eq method generated by the dataclass decorator compares two instances of the same class as if they were tuples of their attributes: it compares the (emp_id, name, age) tuple of one instance with that of the other. We can notice from our code that emp1 and emp3 have the same attribute values, hence comparing them returns True.
class Employee:
    emp_id: int
    name: str
    age: int

employee = Employee()
employee.name  # raises AttributeError: annotations alone create no attributes

import dataclasses

@dataclasses.dataclass
class Employee:
    emp_id: int
    name: str
    age: int

try:
    employee = Employee()
except Exception as e:
    print("Employee Creation Failed. Error : {}".format(e))

employee = Employee(123, "William", 30)
print("Employee Detail : {}".format(employee))
print("Employee Detail : {}-{}-{}".format(employee.emp_id, employee.name, employee.age))

emp1 = Employee(123, "William", 30)
emp2 = Employee(123, "William", 32)
emp3 = Employee(123, "William", 30)
print("Is emp1 and emp2 are same? {}".format(emp1 == emp2))
print("Is emp2 and emp3 are same? {}".format(emp2 == emp3))
print("Is emp1 and emp3 are same? {}".format(emp1 == emp3))
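To make the generated methods less abstract, the sketch below hand-writes rough equivalents of the three dunder methods that the decorator adds. It is only an approximation of the behavior described above, not the code that the dataclasses module actually generates, and the class name ManualEmployee is made up for this illustration.

```python
import dataclasses


@dataclasses.dataclass
class Employee:
    emp_id: int
    name: str
    age: int


class ManualEmployee:
    """Hand-written approximation of what the decorator adds (illustrative only)."""

    def __init__(self, emp_id: int, name: str, age: int):
        self.emp_id = emp_id
        self.name = name
        self.age = age

    def __repr__(self):
        return "ManualEmployee(emp_id={!r}, name={!r}, age={!r})".format(
            self.emp_id, self.name, self.age
        )

    def __eq__(self, other):
        # Only instances of exactly the same class are compared field by field.
        if other.__class__ is not self.__class__:
            return NotImplemented
        return (self.emp_id, self.name, self.age) == (other.emp_id, other.name, other.age)


auto = Employee(123, "William", 30)
manual = ManualEmployee(123, "William", 30)

print(auto)
print(manual)  # ManualEmployee(emp_id=123, name='William', age=30)
print(manual == ManualEmployee(123, "William", 30))  # True
print(manual == ManualEmployee(123, "William", 32))  # False
```

The decorated class needs three annotated lines for what takes roughly fifteen lines by hand, which is the main saving the decorator offers.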
The dataclass decorator accepts several parameters that we can tweak according to our needs: init, repr, eq, order, unsafe_hash, and frozen.
We'll now explain how we can use these parameters with examples.
Our second example has code that is almost the same as our first example, but we have added default values to our class attributes. These default values are carried over to the definition of the generated dunder init method. We then create instances of class Employee in different ways and print them to see the effect of the default values.
Please make a NOTE that we have decorated the class with dataclass(), unlike the previous example where we used the version without parentheses. Both mean the same thing.
import dataclasses
@dataclasses.dataclass()
class Employee:
    emp_id: int = 123456789
    name: str = "NA"
    age: int = None
employee = Employee()
print("Employee Detail : {}".format(employee))
employee = Employee(123, "William")
print("Employee Detail : {}".format(employee))
employee = Employee(name="William G")
print("Employee Detail : {}".format(employee))
employee = Employee(123, "William", 31)
print("Employee Detail : {}".format(employee))
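One way to confirm that the defaults really end up in the generated dunder init method is to inspect its signature. The sketch below uses the standard inspect module for this; it is an illustrative check rather than part of the original example.

```python
import dataclasses
import inspect


@dataclasses.dataclass
class Employee:
    emp_id: int = 123456789
    name: str = "NA"
    age: int = None


# The signature of the generated __init__ carries the declared defaults.
signature = inspect.signature(Employee.__init__)
print(signature)

defaults = {
    name: param.default
    for name, param in signature.parameters.items()
    if name != "self"
}
print(defaults)  # {'emp_id': 123456789, 'name': 'NA', 'age': None}
```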
As a part of our third example, we set arguments of the dataclass decorator. We have set the parameters init, repr, eq, and order to True. The parameters init, repr, and eq are True by default, hence the only one we have actually changed is order.
Our code for this example builds on the previous example and adds a few lines of its own. It creates a class like the previous example, but the decorator has the parameters init, repr, eq, and order set to True. The order parameter adds the dunder gt, lt, ge, and le methods to the class, which let us compare class instances.
Our code then creates three instances of the class and prints comparison results between them for the operations greater than, less than, greater than or equal to, and less than or equal to. Each comparison calls the corresponding dunder method, which compares the (emp_id, name, age) tuples of the two instances.
We can notice from the result that comparing emp1 (123, "William", 30) with emp2 (123, "William", 32) for greater than returns False because the first two values of the tuple (emp_id, name) are the same but age is greater for emp2. The same logic applies to all comparisons: the elements of the tuples are compared one by one until a difference is found. If there is no difference after comparing all values, both instances are considered equal.
Please make a NOTE that when assigning default values, if you give a default value to the first few attributes and leave later attributes without one, Python raises a TypeError saying non-default argument follows default argument. This error is raised when the class is defined, at the moment the decorator generates the dunder init method, because in a function definition parameters without defaults cannot follow parameters with defaults.
import dataclasses
@dataclasses.dataclass(init=True, repr=True, eq=True, order=True)
class Employee:
    emp_id: int
    name: str = "NA"
    age: int = None
emp1 = Employee(123, "William", 30)
emp2 = Employee(123, "William", 32)
emp3 = Employee(234, "Dalton")
print("Is emp1 greater than emp2? {}".format(emp1 > emp2))
print("Is emp2 greater than emp1? {}".format(emp2 > emp1))
print("Is emp3 greater than emp2? {}".format(emp3 > emp2))
print("\nIs emp1 less than emp2? {}".format(emp1 < emp2))
print("Is emp2 less than emp1? {}".format(emp2 < emp1))
print("Is emp3 less than emp2? {}".format(emp3 < emp2))
print("\nIs emp1 greater than or equal to emp2? {}".format(emp1 >= emp2))
print("Is emp2 greater than or equal to emp1? {}".format(emp2 >= emp1))
print("Is emp3 greater than or equal to emp2? {}".format(emp3 >= emp2))
print("\nIs emp1 less than or equal to emp2? {}".format(emp1 <= emp2))
print("Is emp2 less than or equal to emp1? {}".format(emp2 <= emp1))
print("Is emp3 less than or equal to emp2? {}".format(emp3 <= emp2))
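Two points from this example can be checked in a few lines: instances of a class with order=True can be passed to sorted(), and a field without a default after a field with a default is rejected when the class is defined. In the sketch below, age defaults to 0 rather than None so that every field is comparable, and BrokenEmployee is a made-up class that exists only to trigger the error.

```python
import dataclasses


@dataclasses.dataclass(order=True)
class Employee:
    emp_id: int
    name: str = "NA"
    age: int = 0


staff = [
    Employee(234, "Dalton", 45),
    Employee(123, "William", 32),
    Employee(123, "William", 30),
]
# sorted() relies on the generated __lt__, i.e. on (emp_id, name, age) tuples.
ordered = sorted(staff)
print([(e.emp_id, e.name, e.age) for e in ordered])
# [(123, 'William', 30), (123, 'William', 32), (234, 'Dalton', 45)]

definition_error = None
try:
    @dataclasses.dataclass
    class BrokenEmployee:
        emp_id: int = 123456789
        name: str  # no default after a field that has one
except TypeError as e:
    definition_error = e

print("Definition failed: {}".format(definition_error))
```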
As part of our fourth example, we demonstrate how to prevent modification of an instance once it is created. We can set the frozen parameter to True in the dataclass decorator, and it will raise FrozenInstanceError whenever we try to modify an attribute of an instance.
Our code for this example first builds the class in the normal way, like previous examples, but with the frozen parameter set to True. We then create an instance of the class.
We then try to modify all three attributes of the instance, and each attempt raises an error, preventing instance modification.
import dataclasses
@dataclasses.dataclass(init=True, repr=True, eq=True, order=True, frozen=True)
class Employee:
    emp_id: int
    name: str = "NA"
    age: int = None

emp1 = Employee(123, "William", 30)

try:
    emp1.name = "William G"
except Exception as e:
    print("ErrorType : {}, Error : {}".format(type(e).__name__, e))

try:
    emp1.age = 32
except Exception as e:
    print("ErrorType : {}, Error : {}".format(type(e).__name__, e))

try:
    emp1.emp_id = 245
except Exception as e:
    print("ErrorType : {}, Error : {}".format(type(e).__name__, e))
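Instead of catching the generic Exception, the error can be caught by its own type. The sketch below is a small variation on the example above and shows that deleting an attribute is blocked as well, and that FrozenInstanceError is a subclass of AttributeError.

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class Employee:
    emp_id: int
    name: str = "NA"


emp = Employee(123, "William")

try:
    emp.name = "William G"
except dataclasses.FrozenInstanceError as e:
    print("Assignment blocked: {}".format(e))

try:
    del emp.name
except dataclasses.FrozenInstanceError as e:
    print("Deletion blocked: {}".format(e))

# FrozenInstanceError derives from AttributeError, so handlers for
# AttributeError catch it too.
print(issubclass(dataclasses.FrozenInstanceError, AttributeError))  # True
print(emp.name)  # William
```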
As a part of our fifth example, we demonstrate how we can force the class to generate a hash of the instance from its attribute values. We can do so by setting the unsafe_hash parameter to True in the dataclass decorator.
Our code for this example creates two employee classes, where the second one has unsafe_hash and frozen set to True. We then create two instances, one of each class, with the same attribute values.
We then try to generate a hash of both instances. We can notice from the output that hashing the emp1 instance of class Employee1 fails, whereas hashing the emp2 instance of class Employee2 works.
Please make a NOTE that when eq is True and frozen is False in the decorator (and unsafe_hash is left at False), the decorator sets the dunder hash method to None, hence hashing fails. If you create a class without the dataclass decorator, the dunder hash method inherited from the object superclass is present by default; it is based on the identity of the instance (the number we get when we call id(object)).
import dataclasses
@dataclasses.dataclass(init=True, repr=True, eq=True, order=True)
class Employee1:
    emp_id: int
    name: str = "NA"
    age: int = None

@dataclasses.dataclass(init=True, repr=True, eq=True, order=True, unsafe_hash=True, frozen=True)
class Employee2:
    emp_id: int
    name: str = "NA"
    age: int = None

emp1 = Employee1(123, "William", 30)
emp2 = Employee2(123, "William", 30)

try:
    hash_of_emp1 = hash(emp1)
    print("Hash Value : {}".format(hash_of_emp1))
except Exception as e:
    print("ErrorType : {}, Error : {}".format(type(e).__name__, e))

try:
    hash_of_emp2 = hash(emp2)
    print("Hash Value : {}".format(hash_of_emp2))
except Exception as e:
    print("ErrorType : {}, Error : {}".format(type(e).__name__, e))
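The practical reason to care about hashing is that only hashable instances can be stored in sets or used as dictionary keys. The sketch below uses two made-up classes, MutableEmployee and FrozenEmployee, and relies on the rule that eq=True together with frozen=True generates a dunder hash method even when unsafe_hash is not set.

```python
import dataclasses


@dataclasses.dataclass  # eq=True, frozen=False: __hash__ is set to None
class MutableEmployee:
    emp_id: int
    name: str = "NA"


@dataclasses.dataclass(frozen=True)  # eq=True, frozen=True: __hash__ is generated
class FrozenEmployee:
    emp_id: int
    name: str = "NA"


print(MutableEmployee.__hash__ is None)  # True

# Equal frozen instances hash equally, so they collapse into one set member
# and can be used interchangeably as dictionary keys.
a = FrozenEmployee(123, "William")
b = FrozenEmployee(123, "William")
print(len({a, b}))  # 1

salaries = {a: 100000}
print(salaries[b])  # 100000
```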
Our sixth example explains how we can keep class attributes when decorating a class with the dataclass decorator. We need to annotate the attribute with typing.ClassVar to tell the dataclass decorator that it should be treated as a class attribute and not as an instance attribute.
Our code for this example builds on previous examples. We have again created the Employee class but have added one extra attribute named department with the type annotation typing.ClassVar. We have also assigned a value to the class variable department; this is optional and can be done later as well.
We then create instances of the class and print them. Printing an instance shows only the three instance attributes, so we print the class attribute department explicitly. We can access a class attribute through both the class and its instances.
import dataclasses
import typing
@dataclasses.dataclass(init=True, repr=True, eq=True, order=True, frozen=True)
class Employee:
    emp_id: int
    name: str = "NA"
    age: int = None
    department: typing.ClassVar = "Computer Science"
print("Employee's Department by Default : {}".format(Employee.department))
employee = Employee(123, "William", 30)
print("\nEmployee Detail : {}".format(employee))
print("Employee Department : {}".format(employee.department))
employee = Employee(456, "Conrad Dalton", 45)
print("\nEmployee Detail : {}".format(employee))
print("Employee Department : {}".format(employee.department))
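A quick way to verify that a ClassVar attribute is not treated as a field is to list the fields that the decorator registered. The sketch below is a reduced version of the class above (without frozen) used only for this check.

```python
import dataclasses
import typing


@dataclasses.dataclass
class Employee:
    emp_id: int
    name: str = "NA"
    department: typing.ClassVar[str] = "Computer Science"


field_names = [f.name for f in dataclasses.fields(Employee)]
print(field_names)  # ['emp_id', 'name']

emp = Employee(123, "William")
print(dataclasses.asdict(emp))  # {'emp_id': 123, 'name': 'William'}

# Rebinding the class attribute is visible through every instance.
Employee.department = "Mathematics"
print(emp.department)  # Mathematics
```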
As a part of our seventh example, we'll explain how we can give special treatment to attributes, such as including or excluding them when initializing objects, comparing objects, generating a hash, etc. We can do this by using the field() function of the dataclasses module.
Our code for this example creates an Employee class like previous examples, but we assign the result of calling field() to each attribute of the class as its default value. The returned object holds information about how to handle the attribute. We have also introduced a new attribute named addresses, which is a tuple of strings holding the employee's addresses.
Our code then creates employee instances in different ways and prints them.
The important parameters of field() are default, default_factory, init, repr, hash, compare, and metadata. We'll explain several of them through various examples.
Please make a NOTE that examples 7-11 will work on Python 3.9+ only, as they use the type annotation tuple[str], which is not supported in earlier versions. We can also use plain tuple as the type annotation, and the code will then work on earlier versions as well.
import dataclasses
@dataclasses.dataclass
class Employee:
    emp_id: int = dataclasses.field(default=123456789)
    name: str = dataclasses.field(default="NotPresent")
    age: int = dataclasses.field(default=None)
    addresses: tuple[str] = dataclasses.field(default_factory=tuple)
employee = Employee(123, "William", 30, addresses=("Address1", "Address2"))
print("Employee Detail : {}".format(employee))
employee = Employee(123, "William", 30, addresses=())
print("Employee Detail : {}".format(employee))
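The default_factory parameter matters most for mutable defaults such as lists. The sketch below, which swaps the tuple from the example above for a list purely for illustration, shows that the decorator rejects a plain list default and that default_factory gives every instance its own list. BadEmployee is a made-up class that exists only to trigger the error.

```python
import dataclasses

mutable_default_error = None
try:
    @dataclasses.dataclass
    class BadEmployee:
        addresses: list = []  # mutable default: rejected by the decorator
except ValueError as e:
    mutable_default_error = e
print("Definition failed: {}".format(mutable_default_error))


@dataclasses.dataclass
class Employee:
    name: str = "NotPresent"
    addresses: list = dataclasses.field(default_factory=list)


emp1 = Employee("William")
emp2 = Employee("Dalton")
emp1.addresses.append("Address1")

# Each instance received a separate list from the factory.
print(emp1.addresses)  # ['Address1']
print(emp2.addresses)  # []
```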
Our eighth example continues explaining the usage of the field() function for special handling of attributes. As a part of this example, we'll explain how we can exclude some attributes from the dunder init method, from the string representation of an instance, and from the comparison of instances.
Our class for this example is almost the same as in the previous example, with a few changes.
We have set the parameter init to False for the emp_id attribute, which means that emp_id is not needed when creating an instance. It won't be included in the parameters of the generated dunder init method; the instance still gets the attribute, set to its default value.
We have then set the parameters repr and compare to False for the addresses attribute. This makes sure that addresses is not included in the string representation of an instance. It also makes sure that instance comparison, which is based on comparing tuples of attribute values, ignores addresses and uses only (emp_id, name, age).
Our code then creates employee instances in different ways and prints their representation to check the changes. We also compare instances to check whether the addresses attribute is considered in the comparison. We can see from the results that it is not; otherwise, comparing emp1 and emp2 would have returned False (it returns True, meaning that addresses was ignored).
import dataclasses
@dataclasses.dataclass
class Employee:
    emp_id: int = dataclasses.field(default=123456789, init=False)
    name: str = dataclasses.field(default="NotPresent")
    age: int = dataclasses.field(default=None)
    addresses: tuple[str] = dataclasses.field(default_factory=tuple, repr=False, compare=False)
employee = Employee("William", 30, addresses=("Address1", "Address2"))
print("Employee Detail : {}".format(employee))
print("Employee Addresses : {}".format(employee.addresses))
employee = Employee("William", 35)
employee.emp_id = 123
print("\nEmployee Detail : {}".format(employee))
print("Employee Addresses : {}".format(employee.addresses))
emp1 = Employee("William", 30)
print("\nEmployee Detail : {}, Addresses : {}".format(emp1, emp1.addresses))
emp2 = Employee("William", 30, addresses=("Address1", "Address2"))
print("Employee Detail : {}, Addresses : {}".format(emp2, emp2.addresses))
emp3 = Employee("William", 35, addresses=("Address1", "Address2"))
print("Employee Detail : {}, Addresses : {}".format(emp3, emp3.addresses))
print("\nIs emp1 and emp2 are same? {}".format(emp1 == emp2))
print("Is emp2 and emp3 are same? {}".format(emp2 == emp3))
print("Is emp1 and emp3 are same? {}".format(emp1 == emp3))
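The effect of init=False, repr=False, and compare=False can also be checked directly. The sketch below is a reduced variant of the class above; the notes field is a hypothetical attribute introduced only for this illustration.

```python
import dataclasses
import inspect


@dataclasses.dataclass
class Employee:
    emp_id: int = dataclasses.field(default=123456789, init=False)
    name: str = dataclasses.field(default="NotPresent")
    notes: str = dataclasses.field(default="", repr=False, compare=False)


# emp_id is missing from the constructor parameters.
init_params = list(inspect.signature(Employee).parameters)
print(init_params)  # ['name', 'notes']

try:
    Employee(emp_id=1, name="William")
except TypeError as e:
    print("init=False field rejected: {}".format(e))

emp1 = Employee("William", notes="first")
emp2 = Employee("William", notes="second")
print(emp1.emp_id)   # 123456789
print(emp1 == emp2)  # True, notes is ignored by the generated __eq__
```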
Our ninth example further expands on the usage of the field() function. This time we consider calculating the hash of an instance, and we'll explain how we can specify through field() which fields to include in hashing and which to leave out.
Our code for this part creates three classes. All of them have the same code as the previous example's class with a minor change, summarized below.
Employee1 uses only emp_id for hashing.
Employee2 uses emp_id and name for hashing.
Employee3 uses emp_id, name, and age for hashing.
We have not included the addresses attribute in hashing in any of the classes.
Our code then creates instances of all three classes and prints the hash generated from each instance. We can notice how different fields contribute to generating a different hash for each class.
Please make a NOTE that we have also set the frozen parameter of dataclass to True so that a hash can be generated on the assumption that instances are immutable.
import dataclasses
@dataclasses.dataclass(frozen=True)
class Employee1:
    emp_id: int = dataclasses.field(default=123456789, hash=True)
    name: str = dataclasses.field(default="NotPresent", hash=False)
    age: int = dataclasses.field(default=None, hash=False)
    addresses: tuple[str] = dataclasses.field(default_factory=tuple, repr=False, compare=False, hash=False)
@dataclasses.dataclass(frozen=True)
class Employee2:
    emp_id: int = dataclasses.field(default=123456789, hash=True)
    name: str = dataclasses.field(default="NotPresent", hash=True)
    age: int = dataclasses.field(default=None, hash=False)
    addresses: tuple[str] = dataclasses.field(default_factory=tuple, repr=False, compare=False, hash=False)
@dataclasses.dataclass(frozen=True)
class Employee3:
    emp_id: int = dataclasses.field(default=123456789, hash=True)
    name: str = dataclasses.field(default="NotPresent", hash=True)
    age: int = dataclasses.field(default=None, hash=True)
    addresses: tuple[str] = dataclasses.field(default_factory=tuple, repr=False, compare=False, hash=False)
employee1 = Employee1(123, "William", 30, addresses=("Address1", "Address2"))
print("Employee1 Detail : {}".format(employee1))
print("Employee1 Addresses : {}".format(employee1.addresses))
print("Hash of Employee1 : {}".format(hash(employee1)))
employee2 = Employee2(123, "William", 30, addresses=("Address1", "Address2"))
print("\nEmployee2 Detail : {}".format(employee2))
print("Employee2 Addresses : {}".format(employee2.addresses))
print("Hash of Employee2 : {}".format(hash(employee2)))
employee3 = Employee3(123, "William", 30, addresses=("Address1", "Address2"))
print("\nEmployee3 Detail : {}".format(employee3))
print("Employee3 Addresses : {}".format(employee3.addresses))
print("Hash of Employee3 : {}".format(hash(employee3)))
## All comparisons below return False because the instances belong to different classes.
print("\nIs employee1 is equal to employee2? {}".format(employee1 == employee2))
print("\nIs employee1 is equal to employee3? {}".format(employee1 == employee3))
print("\nIs employee2 is equal to employee3? {}".format(employee2 == employee3))
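The relation between hashing and equality is easier to see inside a single class. In the sketch below, ById and ByIdCopy are made-up classes with identical fields: excluding name from the hash makes two instances hash identically while they still compare as unequal, and instances of different classes are never equal even when their field values match.

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class ById:
    emp_id: int = dataclasses.field(hash=True)
    name: str = dataclasses.field(default="NotPresent", hash=False)


@dataclasses.dataclass(frozen=True)
class ByIdCopy:
    emp_id: int = dataclasses.field(hash=True)
    name: str = dataclasses.field(default="NotPresent", hash=False)


a = ById(123, "William")
b = ById(123, "Dalton")

# name is excluded from hashing, so both instances hash identically...
print(hash(a) == hash(b))  # True
# ...but name still takes part in comparison, so they are not equal.
print(a == b)  # False

# Same field values, different classes: never equal.
print(a == ByIdCopy(123, "William"))  # False
```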
We'll use our tenth example to demonstrate the usage of a few important functions available in the dataclasses module: fields(), asdict(), astuple(), and is_dataclass().
Our code for this part starts by creating an Employee class similar to class Employee1 from the previous example, except that it is not frozen and emp_id is excluded from the dunder init method. We then explain the usage of each function by printing its result.
import dataclasses
@dataclasses.dataclass
class Employee:
    emp_id: int = dataclasses.field(default=123456789, init=False, hash=True)
    name: str = dataclasses.field(default="NotPresent", hash=False)
    age: int = dataclasses.field(default=None, hash=False)
    addresses: tuple[str] = dataclasses.field(default_factory=tuple, repr=False, compare=False, hash=False)

print("========== Field Details =============")
for field in dataclasses.fields(Employee):
    print(field, "\n")
print("======================================")
employee = Employee("William", 30, addresses=("Address1", "Address2"))
print("\nEmployee Details as Dictionary : {}".format(dataclasses.asdict(employee)))
print("\nEmployee Details as Tuple : {}".format(dataclasses.astuple(employee)))
print("\nIs employee instance a data class generated? {}".format(dataclasses.is_dataclass(employee)))
print("\nIs Employee a data class? {}".format(dataclasses.is_dataclass(Employee)))
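One behavior of asdict() and astuple() that the flat Employee class cannot show is that they recurse into nested data classes. The sketch below introduces a hypothetical Address data class purely to illustrate this.

```python
import dataclasses


@dataclasses.dataclass
class Address:
    city: str
    country: str


@dataclasses.dataclass
class Employee:
    name: str
    address: Address


emp = Employee("William", Address("City1", "Country1"))

as_dict = dataclasses.asdict(emp)
print(as_dict)
# {'name': 'William', 'address': {'city': 'City1', 'country': 'Country1'}}

as_tuple = dataclasses.astuple(emp)
print(as_tuple)
# ('William', ('City1', 'Country1'))

# fields() accepts both the class and its instances.
print([f.name for f in dataclasses.fields(emp)])  # ['name', 'address']

# is_dataclass() is True for classes and instances alike; combine it with
# isinstance(obj, type) to tell the two apart.
print(dataclasses.is_dataclass(emp) and not isinstance(emp, type))  # True
```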
We'll use our eleventh example to explain how we can easily create a copy of an instance with the values of a few attributes replaced, using the replace() function. This is useful when an instance has a lot of attributes and we need a new instance that differs in only a few of them.
Our code starts by defining the Employee class like previous examples. We then create an employee instance and derive new instances from it with the replace() function, which returns a new object and leaves the original unchanged. The last call passes an unknown name (address instead of addresses) and therefore raises an error.
import dataclasses
@dataclasses.dataclass
class Employee:
    emp_id: int = dataclasses.field(default=123456789, init=False, hash=True)
    name: str = dataclasses.field(default="NotPresent", hash=False)
    age: int = dataclasses.field(default=None, hash=False)
    addresses: tuple[str] = dataclasses.field(default_factory=tuple, repr=False, compare=False, hash=False)
employee = Employee("William", 30, addresses=("Address1", "Address2"))
print("\nEmployee Details as Dictionary : {}".format(dataclasses.asdict(employee)))
employee = dataclasses.replace(employee, name="William G")
print("\nEmployee Details as Dictionary : {}".format(dataclasses.asdict(employee)))
employee = dataclasses.replace(employee, age=32, addresses=("Address-3", "Address-4"))
print("\nEmployee Details as Dictionary : {}".format(dataclasses.asdict(employee)))
employee = dataclasses.replace(employee, **{"age":33, "addresses": ("Address-4", "Address-5"), "name":"William Grfn"})
print("\nEmployee Details as Dictionary : {}".format(dataclasses.asdict(employee)))
try:
    employee = dataclasses.replace(employee, age=32, address="Address-3")
    print("\nEmployee Details as Dictionary : {}".format(dataclasses.asdict(employee)))
except Exception as e:
    print("\nErrorType : {}, Error : {}".format(type(e).__name__, e))
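Two details of replace() are worth checking explicitly: it returns a new object instead of modifying the original, and it refuses to set a field declared with init=False. The sketch below is a reduced variant of the class above; because the exception type for the second case differs between Python versions, it catches both TypeError and ValueError.

```python
import dataclasses


@dataclasses.dataclass
class Employee:
    emp_id: int = dataclasses.field(default=123456789, init=False)
    name: str = "NotPresent"
    age: int = None


original = Employee("William", 30)
updated = dataclasses.replace(original, age=31)

print(updated is original)        # False
print(original.age, updated.age)  # 30 31

replace_error = None
try:
    # emp_id is declared with init=False, so replace() cannot set it.
    dataclasses.replace(original, emp_id=1)
except (TypeError, ValueError) as e:
    replace_error = e
print("ErrorType : {}".format(type(replace_error).__name__))
```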
As a part of our twelfth example, we'll explain how we can create a data class using the make_dataclass() function of the dataclasses module. We'll try to create a copy of the exact data class that we have created in our previous examples.
Our code for this example starts by creating two functions named date_of_birth() and raised_salary().
Our code then creates a data class using the make_dataclass() function, which has exactly the same effect as the data classes that we created in our earlier examples, with the addition of the two new methods explained above.
Our code then creates instances of the class, prints their string representation, compares them, and tries to hash them to verify that the created data class works as expected.
# For reference: the equivalent class written with the decorator.
@dataclasses.dataclass(init=True, repr=True, eq=True, order=True, unsafe_hash=True, frozen=True)
class Employee(object):
    emp_id: int = 123456789
    name: str = "NA"
    age: int = -1
    salary: int = -1

    def date_of_birth(self):
        return datetime.datetime.now() - datetime.timedelta(days = self.age*365)

    def raised_salary(self, prcnt):
        '''
        prcnt : float : It should be in the range 0-1. This is considering nobody gets more than 100% hike.
        No arguments please.
        '''
        return int(self.salary + self.salary * prcnt)

import dataclasses
import datetime

def date_of_birth(self):
    return datetime.datetime.now() - datetime.timedelta(days = self.age*365)

def raised_salary(self, prcnt):
    '''
    prcnt : float : It should be in the range 0-1. This is considering nobody gets more than 100% hike. No arguments please.
    '''
    return int(self.salary + self.salary * prcnt)

Employee = dataclasses.make_dataclass(
    cls_name="Employee",
    fields=[
        ("emp_id", int, dataclasses.field(default=123456789)),
        ("name", str, dataclasses.field(default="NA")),
        ("age", int, dataclasses.field(default=-1)),
        ("salary", int, dataclasses.field(default=-1)),
    ],
    bases=(object, ),
    namespace={"date_of_birth" : date_of_birth, "raised_salary": raised_salary},
    init=True,
    repr=True,
    eq=True,
    order=True,
    unsafe_hash=True,
    frozen=True
)
employee = Employee(123, "William", 30, 100000)
print("Employee Detail : {}".format(employee))
dob = employee.date_of_birth()
print("\nEmployee DOB : {}".format(dob))
raised_salary = employee.raised_salary(0.10)
print("\nEmployee Salary After Raise : {}".format(raised_salary))
emp1 = Employee(123, "William", 30)
emp2 = Employee(123, "William", 32)
emp3 = Employee(123, "William", 30)
print("\nIs emp1 and emp2 are same? {}".format(emp1 == emp2))
print("Is emp2 and emp3 are same? {}".format(emp2 == emp3))
print("Is emp1 and emp3 are same? {}".format(emp1 == emp3))
print("\nIs emp1 greater than emp2? {}".format(emp1 > emp2))
print("Is emp2 greater than emp1? {}".format(emp2 > emp1))
print("Is emp3 greater than emp2? {}".format(emp3 > emp2))
print("\nIs emp1 less than emp2? {}".format(emp1 < emp2))
print("Is emp2 less than emp1? {}".format(emp2 < emp1))
print("Is emp3 less than emp2? {}".format(emp3 < emp2))
print("\nIs emp1 greater than or equal to emp2? {}".format(emp1 >= emp2))
print("Is emp2 greater than or equal to emp1? {}".format(emp2 >= emp1))
print("Is emp3 greater than or equal to emp2? {}".format(emp3 >= emp2))
print("\nIs emp1 less than or equal to emp2? {}".format(emp1 <= emp2))
print("Is emp2 less than or equal to emp1? {}".format(emp2 <= emp1))
print("Is emp3 less than or equal to emp2? {}".format(emp3 <= emp2))
try:
    emp1.name = "William G"
except Exception as e:
    print("ErrorType : {}, Error : {}".format(type(e).__name__, e))

try:
    hash_of_emp1 = hash(emp1)
    print("Hash Value : {}".format(hash_of_emp1))
except Exception as e:
    print("ErrorType : {}, Error : {}".format(type(e).__name__, e))
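The fields argument of make_dataclass() also accepts shorter forms than the three-element tuples used above: a bare name, or a (name, type) pair. The sketch below shows all three forms side by side; the Intern class and the describe() function are hypothetical names chosen only for this illustration.

```python
import dataclasses


def describe(self):
    return "{} ({})".format(self.name, self.emp_id)


Intern = dataclasses.make_dataclass(
    "Intern",
    [
        "name",                                       # name only, typed as typing.Any
        ("emp_id", int),                              # (name, type)
        ("age", int, dataclasses.field(default=-1)),  # (name, type, field)
    ],
    namespace={"describe": describe},
    frozen=True,
)

intern = Intern("William", 123)
print(intern)             # Intern(name='William', emp_id=123, age=-1)
print(intern.describe())  # William (123)
print([f.name for f in dataclasses.fields(Intern)])  # ['name', 'emp_id', 'age']
```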
This ends our small tutorial explaining the API of the dataclasses module with simple and easy-to-understand examples.