Pydantic

Rationale

We use Pydantic as the default library for data modeling, validation, and serialization across our Python components.

It is open source.
It defines data models using plain Python classes and type hints, requiring no separate schema language or configuration format.
It validates data at runtime, catching invalid inputs at the boundary of the system before they propagate further.
It supports complex validation scenarios, including field-level constraints, cross-field dependencies, and custom validator functions.
It provides first-class serialization and deserialization to and from JSON, with fine-grained control over field aliasing and output shape.
Its v2 implementation delegates validation and serialization to pydantic-core, which is written in Rust, making it significantly faster than pure-Python alternatives.
It has a very large community and is one of the most downloaded Python packages, so it is actively maintained, widely documented, and well supported.
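It integrates naturally with modern Python tooling, including static type checkers (it ships a mypy plugin), IDE autocompletion, and frameworks such as FastAPI that are built on it.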
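The following is a minimal sketch of the capabilities listed above, assuming Pydantic v2. The ScanConfig model and its fields are hypothetical and chosen only for illustration: type hints define the schema, Field() adds constraints, a field validator and a model validator implement custom and cross-field checks, and an alias controls the JSON key.

```python
from pydantic import BaseModel, Field, ValidationError, field_validator, model_validator


class ScanConfig(BaseModel):
    """Hypothetical scanner configuration, for illustration only."""

    # Type hints define the schema; Field() adds declarative constraints.
    target: str = Field(min_length=1)
    timeout_seconds: int = Field(default=30, gt=0, le=3600)
    min_severity: int = Field(default=0, ge=0, le=10)
    max_severity: int = Field(default=10, ge=0, le=10)
    # Read from (and, with by_alias=True, written as) "outputPath";
    # accessed as output_path in Python.
    output_path: str = Field(default="report.json", alias="outputPath")

    @field_validator("target")
    @classmethod
    def reject_option_like_target(cls, value: str) -> str:
        # Custom field validator: refuse values that could be parsed as CLI flags.
        if value.startswith("-"):
            raise ValueError("target must not start with '-'")
        return value

    @model_validator(mode="after")
    def check_severity_range(self) -> "ScanConfig":
        # Cross-field check; "after" validators run only once every field is valid.
        if self.min_severity > self.max_severity:
            raise ValueError("min_severity must not exceed max_severity")
        return self


# Deserialize from JSON; in the default lax mode "30" is coerced to the int 30.
config = ScanConfig.model_validate_json(
    '{"target": "src/", "timeout_seconds": "30", "outputPath": "out.sarif"}'
)

# Serialize back to JSON using the alias rather than the attribute name.
serialized = config.model_dump_json(by_alias=True)

# Invalid input is rejected at the boundary with a structured error report.
try:
    ScanConfig(target="-rf", timeout_seconds=0)
except ValidationError as exc:
    for error in exc.errors():
        print(error["loc"], error["msg"])
```

Note that the failing constructor call reports both the target and the timeout_seconds problems in a single ValidationError rather than stopping at the first one.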

dataclasses is a standard library module for defining simple data containers using type annotations.

It is part of the Python standard library, requiring no additional dependency.
It does not perform any runtime validation; fields accept any value regardless of the annotated type.
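It has no built-in JSON serialization; dataclasses.asdict() only converts instances to plain dicts, and there is no deserialization counterpart.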
It does not support field constraints, custom validators, or cross-field validation out of the box; any such checks must be written by hand, typically in __post_init__.
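For contrast, a standard-library-only sketch of the dataclasses behavior described above. Finding and CheckedFinding are hypothetical classes: the first silently accepts a value of the wrong type, and the second shows the hand-written __post_init__ checks needed to get even basic validation.

```python
import dataclasses
import json
from dataclasses import dataclass


@dataclass
class Finding:
    """Hypothetical record, for illustration only."""

    rule_id: str
    severity: int


# The annotation says int, but a str is accepted at runtime
# (a static type checker would flag this, but nothing stops it when running).
finding = Finding(rule_id="R001", severity="high")

# asdict() yields a plain dict; JSON encoding is left to the caller, and
# there is no built-in way to turn that JSON back into a Finding.
payload = json.dumps(dataclasses.asdict(finding))


@dataclass
class CheckedFinding:
    """Same fields, with validation written by hand."""

    rule_id: str
    severity: int

    def __post_init__(self) -> None:
        # Every check is coded manually; nothing is derived from the type hints.
        if not isinstance(self.severity, int):
            raise TypeError("severity must be an int")
        if not 0 <= self.severity <= 10:
            raise ValueError("severity must be between 0 and 10")
```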

attrs is a mature library for writing concise and correct Python classes.

It is open source.
It supports validators and converters, but they must be attached explicitly to each field, which requires more boilerplate than Pydantic’s type-annotation-driven approach.
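It does not provide native JSON serialization or deserialization; attrs.asdict() only produces plain dicts, and converting data back into classes typically requires the companion library cattrs.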
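Type annotations are not enforced at runtime unless validators such as attrs.validators.instance_of are added explicitly, which is less intuitive than Pydantic’s API for developers already familiar with Python type hints.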
It has a smaller community than Pydantic.

marshmallow is an object serialization and deserialization library for Python.

It is open source.
It defines schemas separately from the data classes themselves, which increases verbosity and introduces duplication.
It supports custom validation and field transformations, but requires more manual wiring than Pydantic.
It does not use Python type hints as its primary interface, reducing integration with static analysis tools.
It is slower than Pydantic v2 in most benchmarks.
It has a smaller community than Pydantic.

We use Pydantic across our Python components for:

Defining typed data models for structured data such as configuration files, vulnerability schemas, and API payloads.
Validating user and external inputs at system boundaries, including CLI arguments and data ingested from third-party sources.
Managing typed application settings loaded from environment variables via the separate pydantic-settings package.
Serializing internal models to standards-compliant output formats such as JSON, SARIF, and YAML.
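A minimal end-to-end sketch of the boundary-validation and output use cases, assuming Pydantic v2. The Vulnerability model, the SARIF subset (only version, runs, tool.driver.name, and each result's ruleId, level, and message.text), the LEVELS severity mapping, and the ingest and to_sarif helpers are all hypothetical; a real SARIF writer would model much more of the SARIF 2.1.0 schema.

```python
from typing import Literal

from pydantic import BaseModel, ConfigDict, Field, ValidationError


class Vulnerability(BaseModel):
    """Hypothetical record shape from a third-party vulnerability feed."""

    # Fail loudly on unexpected keys instead of silently ignoring them.
    model_config = ConfigDict(extra="forbid")

    id: str = Field(pattern=r"^CVE-\d{4}-\d{4,}$")
    severity: Literal["low", "medium", "high", "critical"]
    summary: str


class SarifMessage(BaseModel):
    text: str


class SarifResult(BaseModel):
    # Python-style name internally, SARIF's camelCase key on output.
    rule_id: str = Field(serialization_alias="ruleId")
    level: Literal["none", "note", "warning", "error"]
    message: SarifMessage


class SarifDriver(BaseModel):
    name: str


class SarifTool(BaseModel):
    driver: SarifDriver


class SarifRun(BaseModel):
    tool: SarifTool
    results: list[SarifResult]


class SarifLog(BaseModel):
    version: Literal["2.1.0"] = "2.1.0"
    runs: list[SarifRun]


# Hypothetical mapping from feed severity to SARIF result level.
LEVELS = {"low": "note", "medium": "warning", "high": "error", "critical": "error"}


def ingest(raw: list[dict]) -> list[Vulnerability]:
    """Validate untrusted records; raises ValidationError on the first bad record."""
    return [Vulnerability.model_validate(item) for item in raw]


def to_sarif(vulns: list[Vulnerability], tool_name: str) -> str:
    """Render validated records as a minimal SARIF-shaped JSON document."""
    results = [
        SarifResult(
            rule_id=v.id,
            level=LEVELS[v.severity],
            message=SarifMessage(text=v.summary),
        )
        for v in vulns
    ]
    run = SarifRun(tool=SarifTool(driver=SarifDriver(name=tool_name)), results=results)
    # by_alias=True applies serialization aliases such as "ruleId".
    return SarifLog(runs=[run]).model_dump_json(by_alias=True, indent=2)
```

For YAML output, calling model_dump(mode="json", by_alias=True) on the same SarifLog returns plain dicts and lists that a YAML library can serialize; Pydantic itself has no built-in YAML support.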