Represent data models elegantly and with ease
Python 3.7 introduced dataclasses (PEP 557). Their goal is to simplify writing classes that represent data models. You can think of a dataclass as a high-level, object-oriented data structure.
When you write a regular Python class for this purpose, you have to implement methods such as __init__() and __repr__() yourself. Repeating the same routine for each class is cumbersome. Fortunately, dataclasses generate these methods for you automatically.
In this Python tutorial, you will get an overview of how dataclasses work through examples and explore the possibilities they offer.
Declaring a Python Dataclass
Declaring a dataclass only requires applying the @dataclass decorator to a class with type-annotated fields. The code snippet below uses a dataclass to represent GPS coordinates:
```python
from dataclasses import dataclass


@dataclass
class Position:
    lat: float
    lon: float


if __name__ == '__main__':
    position = Position(37.6216, -122.3929)
    print(position)
```
When you run this code, the Position object is printed with its latitude and longitude attributes. As mentioned before, no handwritten __repr__() method is needed:
```
$ python dataclass_ex1.py
Position(lat=37.6216, lon=-122.3929)
```
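For comparison, here is a rough sketch of what you would write by hand to get similar behavior without the decorator. HandwrittenPosition is a hypothetical name used only for this illustration. The methods the decorator actually generates handle a few more options, but the core logic is the same.

```python
from dataclasses import dataclass


@dataclass
class Position:
    lat: float
    lon: float


class HandwrittenPosition:
    """Hypothetical hand-written equivalent of the Position dataclass."""

    def __init__(self, lat: float, lon: float):
        self.lat = lat
        self.lon = lon

    def __repr__(self):
        # Same format as the generated __repr__(): ClassName(field=value, ...)
        return f'{self.__class__.__qualname__}(lat={self.lat!r}, lon={self.lon!r})'

    def __eq__(self, other):
        # Compare field by field, but only against instances of the same class
        if other.__class__ is not self.__class__:
            return NotImplemented
        return (self.lat, self.lon) == (other.lat, other.lon)


print(Position(37.6216, -122.3929))             # Position(lat=37.6216, lon=-122.3929)
print(HandwrittenPosition(37.6216, -122.3929))  # HandwrittenPosition(lat=37.6216, lon=-122.3929)
```

Even for two fields, the handwritten version is already several times longer. The gap grows with every field you add.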
You can compare dataclass instances with the equality operator, like any other Python type. No handwritten __eq__() is needed either; the generated method compares the fields in order:
```python
from dataclasses import dataclass


@dataclass
class Position:
    lat: float
    lon: float


if __name__ == '__main__':
    position = Position(37.6216, -122.3929)
    print(position == Position(37.6216, -122.3929))
```

```
$ python dataclass_ex2.py
True
```
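The generated __eq__() compares the fields in declaration order, and only when both objects have exactly the same class. The sketch below shows this along with two related details:

- Because Position is mutable and gets a generated __eq__(), the decorator sets __hash__ to None. Its instances therefore cannot be put in sets or used as dictionary keys, unless you pass frozen=True or unsafe_hash=True.
- Passing order=True also generates <, <=, > and >=, which compare the same field tuples.

OrderedPosition is a hypothetical name used only for the illustration.

```python
from dataclasses import dataclass


@dataclass
class Position:
    lat: float
    lon: float


@dataclass(order=True)
class OrderedPosition:
    """Hypothetical variant that also generates comparison operators."""
    lat: float
    lon: float


a = Position(37.6216, -122.3929)
b = Position(37.6216, -122.3929)
print(a == b)  # True: same class and same field values
print(a is b)  # False: still two distinct objects

# A mutable dataclass with eq=True (the default) is unhashable
try:
    {a}
except TypeError as exc:
    print('unhashable:', exc)

# order=True compares (lat, lon) tuples lexicographically
print(OrderedPosition(1.0, 5.0) < OrderedPosition(2.0, 0.0))  # True
```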
Implementing Methods In a Python Dataclass
As with a traditional Python class, you can also implement methods inside a dataclass. In this example, a method that calculates the haversine distance in kilometers between two positions is added:
```python
from dataclasses import dataclass
import math


@dataclass
class Position:
    lat: float
    lon: float

    def distance_to(self, position):
        """
        Calculate the haversine distance between two positions.
        :param position: other Position object
        :return: a float representing the distance in kilometers between the two positions
        """
        r = 6371.0  # Earth radius in kilometers
        lam1, lam2 = math.radians(self.lon), math.radians(position.lon)
        phi1, phi2 = math.radians(self.lat), math.radians(position.lat)
        delta_lam, delta_phi = lam2 - lam1, phi2 - phi1
        a = math.sin(delta_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(delta_lam / 2) ** 2
        return r * (2 * math.atan2(math.sqrt(a), math.sqrt(1 - a)))


if __name__ == '__main__':
    # Position expects (latitude, longitude): Paris is at about 48.86° N, 2.35° E
    paris = Position(48.856614, 2.3522219)
    san_francisco = Position(37.6216, -122.3929)
    print(f'{paris.distance_to(san_francisco):.0f} km')
```

```
$ python dataclass_ex3.py
8966 km
```
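For reference, distance_to() implements the haversine formula. In the code, phi1 and phi2 are the latitudes, lam1 and lam2 the longitudes (both in radians), and r is the Earth radius of 6371 km:

```latex
\begin{aligned}
\Delta\varphi &= \varphi_2 - \varphi_1, \qquad \Delta\lambda = \lambda_2 - \lambda_1 \\
a &= \sin^2\!\left(\frac{\Delta\varphi}{2}\right) + \cos\varphi_1 \cos\varphi_2 \, \sin^2\!\left(\frac{\Delta\lambda}{2}\right) \\
d &= 2r \cdot \operatorname{atan2}\!\left(\sqrt{a}, \sqrt{1 - a}\right)
\end{aligned}
```

Because the formula works on a sphere of mean radius r, the result is a close approximation of the real distance, not a geodesic on the Earth's ellipsoid.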
Using Python Dataclass Objects as Attributes
A dataclass object can be used like any other Python type, including as a field of another dataclass. To show this, the following example creates Town objects that each hold a Position dataclass object:
```python
from dataclasses import dataclass


@dataclass
class Position:
    lat: float
    lon: float


@dataclass
class Town:
    name: str
    position: Position


if __name__ == '__main__':
    paris = Town('Paris', Position(48.856614, 2.3522219))
    san_francisco = Town('San Francisco', Position(37.6216, -122.3929))
    # The generated __repr__() also renders the nested Position object:
    # Town(name='Paris', position=Position(lat=48.856614, lon=2.3522219))
    print(paris)
```
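Nested dataclasses also work with the module's helper functions. For instance, dataclasses.asdict() converts an instance to a dictionary and recurses into fields that are themselves dataclasses. This can be handy before serializing to JSON. Here is a minimal sketch that reuses the Town and Position classes above:

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Position:
    lat: float
    lon: float


@dataclass
class Town:
    name: str
    position: Position


paris = Town('Paris', Position(48.856614, 2.3522219))

# asdict() recursively turns the nested Position into a plain dict
paris_dict = asdict(paris)
print(paris_dict)  # {'name': 'Paris', 'position': {'lat': 48.856614, 'lon': 2.3522219}}

# Plain dicts of floats and strings serialize directly with the json module
print(json.dumps(paris_dict))
```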
Dataclasses and Inheritance
The Town class presented in the last section can be simplified if we consider that a town is a position. Town objects then inherit the latitude and longitude attributes from the parent Position class. The inherited fields come first in the generated __init__(), followed by the name field:
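```python
from dataclasses import dataclass


@dataclass
class Position:
    lat: float
    lon: float


@dataclass
class Town(Position):
    name: str


if __name__ == '__main__':
    # Arguments follow the field order: lat, lon (inherited), then name
    paris = Town(48.856614, 2.3522219, 'Paris')
    san_francisco = Town(37.6216, -122.3929, 'San Francisco')
    print(paris)  # Town(lat=48.856614, lon=2.3522219, name='Paris')
```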
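You can inspect the argument order of the generated __init__() with dataclasses.fields(). Inherited fields come first, in the order they were declared on the base class, followed by the fields the subclass adds. Passing keyword arguments avoids depending on that order. A short sketch:

```python
from dataclasses import dataclass, fields


@dataclass
class Position:
    lat: float
    lon: float


@dataclass
class Town(Position):
    name: str


# Inherited fields are listed before the subclass's own fields
print([f.name for f in fields(Town)])  # ['lat', 'lon', 'name']

# Keyword arguments make the call independent of the field order
paris = Town(name='Paris', lat=48.856614, lon=2.3522219)
print(paris)                        # Town(lat=48.856614, lon=2.3522219, name='Paris')
print(isinstance(paris, Position))  # True: a Town is a Position
```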
To go even further, let's add a new class to distinguish capitals from other towns:
```python
from dataclasses import dataclass


@dataclass
class Position:
    lat: float
    lon: float


@dataclass
class Town(Position):
    name: str


@dataclass
class Capital(Town):
    pass


if __name__ == '__main__':
    paris = Capital(48.856614, 2.3522219, 'Paris')
    san_francisco = Town(37.6216, -122.3929, 'San Francisco')
```
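One consequence of this hierarchy is worth knowing. The generated __eq__() only compares objects whose classes are exactly the same. So a Capital is an instance of Town, yet a Town and a Capital with identical fields are not equal. The sketch below shows this. The two Paris objects are illustrative and exist only for the comparison.

```python
from dataclasses import dataclass


@dataclass
class Position:
    lat: float
    lon: float


@dataclass
class Town(Position):
    name: str


@dataclass
class Capital(Town):
    pass


paris_town = Town(48.856614, 2.3522219, 'Paris')
paris_capital = Capital(48.856614, 2.3522219, 'Paris')

print(isinstance(paris_capital, Town))  # True: a Capital is a Town
print(paris_capital)                    # Capital(lat=48.856614, lon=2.3522219, name='Paris')
# Each generated __eq__() returns NotImplemented for a different class,
# so Python falls back to an identity check
print(paris_town == paris_capital)      # False
```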
Dataclass Fields
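The dataclasses module provides the field() function to customize each field of your dataclass. It accepts several parameters, such as default, default_factory, repr, compare and metadata. In the following example, metadata records that the latitude and longitude of a position are expressed in degrees: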
```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Position:
    lat: float = field(default=0.0, metadata={'unit': 'degrees'})
    lon: float = field(default=0.0, metadata={'unit': 'degrees'})


@dataclass
class Town(Position):
    # Fields without a default cannot follow fields with a default,
    # so name needs one because the inherited lat and lon have defaults
    name: Optional[str] = None


if __name__ == '__main__':
    paris = Town(48.856614, 2.3522219, 'Paris')
    san_francisco = Town(37.6216, -122.3929, 'San Francisco')
```
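The metadata mapping is not used by the dataclasses module itself. It exists for your own code or for third-party tools, and you can read it back through dataclasses.fields(). The sketch below also uses two other field() parameters on a hypothetical population field:

- repr=False leaves the field out of the generated __repr__().
- compare=False leaves it out of the generated __eq__().

The population value is an arbitrary sample number.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class Position:
    lat: float = field(default=0.0, metadata={'unit': 'degrees'})
    lon: float = field(default=0.0, metadata={'unit': 'degrees'})


@dataclass
class Town(Position):
    name: Optional[str] = None
    # Hypothetical field: hidden from repr() and ignored by ==
    population: int = field(default=0, repr=False, compare=False)


# Read back the unit stored in each field's metadata (empty mapping if none)
units = {f.name: f.metadata.get('unit') for f in fields(Town)}
print(units)  # {'lat': 'degrees', 'lon': 'degrees', 'name': None, 'population': None}

paris = Town(48.856614, 2.3522219, 'Paris', population=123)  # arbitrary sample value
print(paris)  # Town(lat=48.856614, lon=2.3522219, name='Paris')
print(paris == Town(48.856614, 2.3522219, 'Paris'))  # True: population is not compared
```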
Immutability
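Dataclasses can be made immutable by passing frozen=True to the @dataclass decorator. When this flag is enabled, assigning to a field of an existing instance raises dataclasses.FrozenInstanceError.
Be careful when combining immutability with inheritance. A frozen dataclass cannot inherit from a non-frozen one, and vice versa. So Position, Town and Capital must all be declared frozen.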
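Here is a minimal sketch of what frozen=True changes in practice:

- Assigning to a field raises dataclasses.FrozenInstanceError.
- dataclasses.replace() builds a modified copy instead of changing the original.
- Frozen dataclasses with eq=True get a generated __hash__(), so their instances can be used as dictionary keys.

The last lines show the inheritance restriction, using a hypothetical MutableTown class that tries to extend a frozen base.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Position:
    lat: float = 0.0
    lon: float = 0.0


sfo = Position(37.6216, -122.3929)

try:
    sfo.lat = 0.0  # any assignment on a frozen instance fails
except FrozenInstanceError as exc:
    print('cannot assign:', exc)

# replace() returns a new object; the original is left untouched
moved = replace(sfo, lat=38.0)
print(sfo, moved)

# Frozen dataclasses with eq=True are hashable, so they work as dict keys
print({sfo: 'San Francisco'}[Position(37.6216, -122.3929)])

# A non-frozen dataclass cannot extend a frozen one
try:
    @dataclass
    class MutableTown(Position):  # hypothetical class, for illustration only
        name: str = ''
except TypeError as exc:
    print('cannot inherit:', exc)
```

Note that freezing is shallow. A frozen dataclass holding a list still lets you mutate that list in place.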
Town positions are not meant to change, so the following example freezes Position, Town and Capital. It also adds a Country dataclass that holds a collection of towns. Its get_capital() method returns the first capital among the country's towns, or None if there is none:
```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Position:
    lat: float = field(default=0.0, metadata={'unit': 'degrees'})
    lon: float = field(default=0.0, metadata={'unit': 'degrees'})


@dataclass(frozen=True)
class Town(Position):
    name: Optional[str] = None


@dataclass(frozen=True)
class Capital(Town):
    pass


@dataclass
class Country:
    code: str
    towns: List[Town] = field(default_factory=list)

    def get_capital(self) -> Optional[Capital]:
        # Return the first town that is a Capital, or None if there is none
        return next((town for town in self.towns if isinstance(town, Capital)), None)


if __name__ == '__main__':
    paris = Capital(48.856614, 2.3522219, 'Paris')
    san_francisco = Town(37.6216, -122.3929, 'San Francisco')
    washington = Capital(38.9072, -77.0369, 'Washington')  # Washington, D.C.
    united_states = Country('US', [san_francisco, washington])
    print(united_states.get_capital())
```
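The towns field uses default_factory=list rather than a plain [] default. The dataclasses module rejects list, dict and set defaults with a ValueError, because a single list object would otherwise be shared by every instance. The factory is called for each new Country, so each one gets its own list. In this small sketch, towns are plain strings to keep it self-contained. 'FR' is just a sample code, and BrokenCountry is a hypothetical class.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Country:
    code: str
    # default_factory is called once per instance, so each Country gets its own list
    towns: List[str] = field(default_factory=list)


us = Country('US')
fr = Country('FR')
us.towns.append('San Francisco')
print(us.towns, fr.towns)  # ['San Francisco'] []

# A plain mutable default is rejected when the class is created
try:
    @dataclass
    class BrokenCountry:  # hypothetical class, for illustration only
        code: str
        towns: list = []
except ValueError as exc:
    print('rejected:', exc)
```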
Conclusion
Through these examples, you have seen the following points about dataclasses:
- You do not need to write the usual methods such as __init__(), __repr__() and __eq__() for a new class. You can still write them explicitly to override the default behavior.
- A dataclass is not so different from a traditional Python class: it can have methods and take part in inheritance.
- You can define immutable objects with frozen=True when that suits your needs.
- Dataclasses are an elegant way to create clearer, more concise data models.
Since I've discovered this feature, I try to use it as much as possible to write readable code! And you?