Photo by Girl with red hat / Unsplash

How To Make Your Python Code Prettier With Dataclasses

Represent data models elegantly and with ease

Guillaume Vincent

Python 3.7 introduced dataclasses. Their goal is to simplify the creation of classes that represent data models. A dataclass can be seen as a high-level, object-oriented data structure.

When you use a regular Python class for this purpose, you have to implement methods such as __init__() and __repr__() yourself. Repeating the same routine for each class is cumbersome. Fortunately, dataclasses automatically generate these methods for you.
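For comparison, here is a rough sketch of the boilerplate a hand-written class needs to get the same behavior as the Position dataclass declared in the next section. The name PlainPosition is hypothetical, and this is not the exact code that @dataclass generates; it only mirrors the __init__(), __repr__(), and __eq__() methods it provides by default.

```python
class PlainPosition:
    """Hand-written two-field data model (illustrative only)."""

    def __init__(self, lat: float, lon: float) -> None:
        self.lat = lat
        self.lon = lon

    def __repr__(self) -> str:
        # Mirrors the ClassName(field=value, ...) format used by dataclasses
        return f"{self.__class__.__name__}(lat={self.lat!r}, lon={self.lon!r})"

    def __eq__(self, other: object) -> bool:
        # Only compare with instances of exactly the same class
        if other.__class__ is not self.__class__:
            return NotImplemented
        return (self.lat, self.lon) == (other.lat, other.lon)


if __name__ == '__main__':
    position = PlainPosition(37.6216, -122.3929)
    print(position)  # PlainPosition(lat=37.6216, lon=-122.3929)
    print(position == PlainPosition(37.6216, -122.3929))  # True
```

Every new field means touching all three methods by hand, which is exactly the repetition dataclasses remove.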

In this Python tutorial, you will get an overview of how dataclasses work through examples and explore the possibilities they offer.

Declaring a Python Dataclass

Declaring a dataclass only requires applying the @dataclass decorator to a class. In the code snippet below, a dataclass represents GPS coordinates:

from dataclasses import dataclass


@dataclass
class Position:
  lat: float
  lon: float


if __name__ == '__main__':
  position = Position(37.6216, -122.3929)
  print(position)
Dataclass representing GPS coordinates with latitude and longitude

When executing this code, the Position object is printed with its latitude and longitude attributes. As mentioned before, no extra __repr__() method is needed:

$ python dataclass_ex1.py
Position(lat=37.6216, lon=-122.3929)
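If you are curious about what the decorator produced, the standard inspect module and dataclasses.fields() let you peek at it. This small sketch, not part of the original example, checks that the generated __init__() takes one parameter per annotated field, in declaration order:

```python
import inspect
from dataclasses import dataclass, fields


@dataclass
class Position:
    lat: float
    lon: float


if __name__ == '__main__':
    # The generated __init__ has one parameter per annotated field
    print(list(inspect.signature(Position).parameters))  # ['lat', 'lon']
    # fields() returns the Field objects collected by the decorator
    print([f.name for f in fields(Position)])  # ['lat', 'lon']
```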

You can compare dataclass instances with the equality operator (==), just like any other Python type. Again, no extra __eq__() method is needed:

from dataclasses import dataclass


@dataclass
class Position:
  lat: float
  lon: float


if __name__ == '__main__':
  position = Position(37.6216, -122.3929)
  print(position == Position(37.6216, -122.3929))
$ python dataclass_ex2.py
True
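The generated __eq__() compares the field values of both objects, and only when they are instances of exactly the same class. A short sketch of what that means in practice:

```python
from dataclasses import dataclass


@dataclass
class Position:
    lat: float
    lon: float


if __name__ == '__main__':
    a = Position(37.6216, -122.3929)
    b = Position(37.6216, -122.3929)
    print(a == b)  # True: same class, same field values
    print(a is b)  # False: two distinct objects
    print(a == (37.6216, -122.3929))  # False: a tuple is not a Position
```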

Implementing Methods In a Python Dataclass

Just like a traditional Python class, a dataclass can also define methods. In this example, a method calculating the haversine distance in kilometers between two positions is added:

from dataclasses import dataclass

import math


@dataclass
class Position:
  lat: float
  lon: float

  def distance_to(self, position):
    """
    Calculate the haversine distance between two positions
    :param position: other position object
    :return: a float representing distance in kilometers between two positions
    """
    r = 6371.0  # Earth radius in kilometers
    lam1, lam2 = math.radians(self.lon), math.radians(position.lon)
    phi1, phi2 = math.radians(self.lat), math.radians(position.lat)
    delta_lam, delta_phi = lam2 - lam1, phi2 - phi1
    a = math.sin(delta_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(delta_lam / 2) ** 2
    return r * (2 * math.atan2(math.sqrt(a), math.sqrt(1 - a)))


if __name__ == '__main__':
  paris = Position(48.856614, 2.3522219)
  san_francisco = Position(37.6216, -122.3929)
  print(paris.distance_to(san_francisco))

Running python dataclass_ex3.py prints the distance between Paris and San Francisco in kilometers, a little under 9,000 km.
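For reference, distance_to() implements the haversine formula. Writing φ for latitudes and λ for longitudes (both in radians) and r for the Earth's radius (6371 km in the code), the computation of a and the returned value correspond to:

```latex
\begin{aligned}
\Delta\varphi &= \varphi_2 - \varphi_1, \qquad \Delta\lambda = \lambda_2 - \lambda_1 \\
a &= \sin^2\!\left(\frac{\Delta\varphi}{2}\right) + \cos\varphi_1 \cos\varphi_2 \sin^2\!\left(\frac{\Delta\lambda}{2}\right) \\
d &= 2r\,\operatorname{atan2}\!\left(\sqrt{a}, \sqrt{1 - a}\right)
\end{aligned}
```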

Using Python Dataclass Objects As Attributes

A dataclass object can be used like any other Python type. To illustrate, the following example instantiates Town objects that each hold a Position dataclass object:

from dataclasses import dataclass


@dataclass
class Position:
  lat: float
  lon: float


@dataclass
class Town:
  name: str
  position: Position


if __name__ == '__main__':
  paris = Town('Paris', Position(48.856614, 2.3522219))
  san_francisco = Town('San Francisco', Position(37.6216, -122.3929))
  print(paris)
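Because Town simply stores a Position, equality works recursively: comparing two towns also compares their nested positions, and the coordinates remain reachable through the position attribute. A short sketch, repeating the class definitions so it runs on its own:

```python
from dataclasses import dataclass


@dataclass
class Position:
    lat: float
    lon: float


@dataclass
class Town:
    name: str
    position: Position


if __name__ == '__main__':
    paris = Town('Paris', Position(48.856614, 2.3522219))
    # == compares name and position; nested Positions use their own __eq__
    print(paris == Town('Paris', Position(48.856614, 2.3522219)))  # True
    print(paris == Town('Paris', Position(0.0, 0.0)))  # False
    # The nested coordinates stay reachable through the position attribute
    print(paris.position.lat)  # 48.856614
```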

Dataclasses and Inheritance

The Town class presented in the last section can be simplified. Let's consider that a town is a position: Town objects then inherit the latitude and longitude attributes from the parent Position class:

from dataclasses import dataclass


@dataclass
class Position:
  lat: float
  lon: float


@dataclass
class Town(Position):
  name: str


if __name__ == '__main__':
  paris = Town(48.856614, 2.3522219, 'Paris')
  san_francisco = Town(37.6216, -122.3929, 'San Francisco')
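With inheritance, the parent class's fields come first and the subclass's fields are appended after them, which is why the town name is the last constructor argument. The sketch below uses dataclasses.fields() to show that order and passes keyword arguments, which do not depend on it:

```python
from dataclasses import dataclass, fields


@dataclass
class Position:
    lat: float
    lon: float


@dataclass
class Town(Position):
    name: str


if __name__ == '__main__':
    # Inherited fields first, then the subclass's own fields
    print([f.name for f in fields(Town)])  # ['lat', 'lon', 'name']
    # Keyword arguments avoid relying on that positional order
    print(Town(lat=48.856614, lon=2.3522219, name='Paris'))
    # Town(lat=48.856614, lon=2.3522219, name='Paris')
```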

To go even further, let's add a new class to distinguish capitals from other towns:

from dataclasses import dataclass


@dataclass
class Position:
  lat: float
  lon: float


@dataclass
class Town(Position):
  name: str


@dataclass
class Capital(Town):
  pass


if __name__ == '__main__':
  paris = Capital(48.856614, 2.3522219, 'Paris')
  san_francisco = Town(37.6216, -122.3929, 'San Francisco')
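Since Capital inherits from Town, which inherits from Position, a capital passes isinstance() checks for all three classes. The generated __eq__(), however, only compares objects of exactly the same class, so a Capital and a Town with identical field values are not equal. A quick sketch:

```python
from dataclasses import dataclass


@dataclass
class Position:
    lat: float
    lon: float


@dataclass
class Town(Position):
    name: str


@dataclass
class Capital(Town):
    pass


if __name__ == '__main__':
    paris = Capital(48.856614, 2.3522219, 'Paris')
    print(isinstance(paris, Town), isinstance(paris, Position))  # True True
    # Same field values, but different classes, so == returns False
    print(paris == Town(48.856614, 2.3522219, 'Paris'))  # False
```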

Dataclass Fields

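The dataclasses module provides a field() function to customize each field individually. It accepts several parameters, such as default, default_factory, and metadata. In the example below, both coordinates default to 0.0 and carry metadata stating that latitude and longitude are expressed in degrees: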

from dataclasses import dataclass, field


@dataclass
class Position:
  lat: float = field(default=0.0, metadata={'unit': 'degrees'})
  lon: float = field(default=0.0, metadata={'unit': 'degrees'})


@dataclass
class Town(Position):
  # Default arguments cannot be followed by non-default arguments
  name: str = None


if __name__ == '__main__':
  paris = Town(48.856614, 2.3522219, 'Paris')
  san_francisco = Town(37.6216, -122.3929, 'San Francisco')
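The metadata mapping is not used by the dataclass machinery itself; it is stored on each Field object for your own code to read. The sketch below reads the units back with dataclasses.fields(). It also checks the rule behind the comment in Town: the hypothetical BrokenTown class declares a field without a default after Position's defaulted fields, and the decorator rejects it with a TypeError.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Position:
    lat: float = field(default=0.0, metadata={'unit': 'degrees'})
    lon: float = field(default=0.0, metadata={'unit': 'degrees'})


def subclass_without_default_fails() -> bool:
    """Try to add a field with no default after Position's defaulted fields."""
    try:
        @dataclass
        class BrokenTown(Position):  # hypothetical class, for illustration only
            name: str
    except TypeError:
        return True
    return False


if __name__ == '__main__':
    for f in fields(Position):
        # f.metadata is a read-only mapping built from the dict given to field()
        print(f.name, f.default, f.metadata['unit'])
    # lat 0.0 degrees
    # lon 0.0 degrees
    print(Position())  # Position(lat=0.0, lon=0.0)
    print(subclass_without_default_fails())  # True
```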

Immutability

Dataclasses can be made immutable by passing frozen=True to the decorator. When this flag is enabled, assigning to a field after the object has been created raises a dataclasses.FrozenInstanceError.
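A minimal sketch of frozen=True in action: assigning to a field raises dataclasses.FrozenInstanceError, while dataclasses.replace() builds a modified copy and leaves the original untouched. Frozen dataclasses that keep the default eq=True also get a generated __hash__(), so they can be used as set members or dictionary keys.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Position:
    lat: float
    lon: float


if __name__ == '__main__':
    position = Position(37.6216, -122.3929)
    try:
        position.lat = 0.0  # assignment is blocked on a frozen instance
    except FrozenInstanceError as error:
        print('Cannot modify:', error)
    # replace() returns a new Position; the original object is unchanged
    moved = replace(position, lat=0.0)
    print(position)  # Position(lat=37.6216, lon=-122.3929)
    print(moved)  # Position(lat=0.0, lon=-122.3929)
    # eq=True (the default) plus frozen=True also generates __hash__()
    print(hash(position) == hash(Position(37.6216, -122.3929)))  # True
```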

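Be careful when combining frozen dataclasses with inheritance: a frozen dataclass cannot inherit from a non-frozen one, and vice versa. This is why every class in the Position hierarchy below is declared with frozen=True.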
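The inheritance caveat comes from a rule checked when a class is decorated: a non-frozen dataclass cannot inherit from a frozen one, and a frozen dataclass cannot inherit from a non-frozen one; both cases raise TypeError. In this sketch, FrozenBase, MutableBase, and the inherits_cleanly() helper are hypothetical names used only for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrozenBase:
    x: int


@dataclass
class MutableBase:
    x: int


def inherits_cleanly(base: type, frozen: bool) -> bool:
    """Return True if a dataclass with the given frozen flag can subclass base."""
    try:
        @dataclass(frozen=frozen)
        class Child(base):
            y: int = 0
    except TypeError:
        return False
    return True


if __name__ == '__main__':
    print(inherits_cleanly(FrozenBase, frozen=True))    # True
    print(inherits_cleanly(FrozenBase, frozen=False))   # False
    print(inherits_cleanly(MutableBase, frozen=True))   # False
    print(inherits_cleanly(MutableBase, frozen=False))  # True
```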

Town positions are not meant to change, so Position, Town, and Capital are all frozen. The following example also shows a Country dataclass, which holds a collection of towns. In this class, a get_capital() method returns the first capital found among the country's towns, or None if there is none:

from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Position:
  lat: float = field(default=0.0, metadata={'unit': 'degrees'})
  lon: float = field(default=0.0, metadata={'unit': 'degrees'})


@dataclass(frozen=True)
class Town(Position):
  name: str = None


@dataclass(frozen=True)
class Capital(Town):
  pass


@dataclass
class Country:
  code: str
  towns: List[Town] = field(default_factory=list)

  def get_capital(self):
    # Return the first Capital among the towns, or None if there is none
    return next((town for town in self.towns if isinstance(town, Capital)), None)


if __name__ == '__main__':
  paris = Capital(48.856614, 2.3522219, 'Paris')
  san_francisco = Town(37.6216, -122.3929, 'San Francisco')
  washington = Capital(38.9072, -77.0369, 'Washington')
  united_states = Country('US', [san_francisco, washington])
  print(united_states.get_capital())
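The towns field uses field(default_factory=list) rather than a plain = [] default. A bare list default would be shared by every Country instance, so dataclasses reject list, dict, and set defaults with a ValueError when the class is created, while default_factory calls list() to build a fresh list for each object. A small sketch, with towns simplified to plain strings and the BrokenCountry class and mutable_default_rejected() helper being hypothetical:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Country:
    code: str
    # default_factory runs for each new instance, so lists are never shared
    towns: List[str] = field(default_factory=list)


def mutable_default_rejected() -> bool:
    """Return True if a bare list default is refused at class creation."""
    try:
        @dataclass
        class BrokenCountry:  # hypothetical class, for illustration only
            code: str
            towns: List[str] = []
    except ValueError:
        return True
    return False


if __name__ == '__main__':
    us, fr = Country('US'), Country('FR')
    us.towns.append('San Francisco')
    print(us.towns, fr.towns)  # ['San Francisco'] []
    print(mutable_default_rejected())  # True
```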

Conclusion

Through these examples, you have seen the following points about dataclasses:

  • You do not need to write boilerplate methods such as __init__(), __repr__(), and __eq__() for each new class. You can still write them explicitly to override the default behavior.
  • A dataclass is not so different from a traditional Python class: it supports methods, composition, and inheritance.
  • You can define immutable objects with frozen=True when it suits your use case.
  • Dataclasses are an elegant way to create more readable data models.

Since I discovered this feature, I have used it as much as possible to write more readable code! What about you?



DevOps Engineer & AWS Certified Solution Architect. Cloud enthusiast and automation addict