Pydantic
Rationale
We use Pydantic as our default library for data modeling, validation, and serialization across our Python components.
- It is open source.
- It defines data models using plain Python classes and type hints, requiring no separate schema language or configuration format.
- It validates data at runtime, catching invalid inputs at the boundary of the system before they propagate further.
- It supports complex validation scenarios, including field-level constraints, cross-field dependencies, and custom validator functions.
- It provides first-class serialization and deserialization to and from JSON, with fine-grained control over field aliasing and output shape.
- Its v2 implementation is backed by a Rust core, making it significantly faster than pure Python alternatives.
- It has a very large community and is one of the most downloaded Python packages, so it is widely exercised in production and well supported.
- It integrates naturally with modern Python tooling such as static type checkers, IDEs, and frameworks built on type hints.
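As a rough illustration of the points above, the following sketch defines a model with field-level constraints, a cross-field check, and an aliased JSON field. It assumes Pydantic v2; the `Finding` model, its fields, and the `error_locations` helper are hypothetical names chosen for this example and do not come from our codebase.

```python
from datetime import date

from pydantic import BaseModel, Field, ValidationError, model_validator


class Finding(BaseModel):
    """Hypothetical model used only for illustration."""

    # Field-level constraints; the alias controls the external JSON key.
    rule_id: str = Field(min_length=1, alias="ruleId")
    severity: float = Field(ge=0.0, le=10.0)
    first_seen: date
    last_seen: date

    # Cross-field dependency: runs after all fields validated successfully.
    @model_validator(mode="after")
    def check_date_order(self) -> "Finding":
        if self.last_seen < self.first_seen:
            raise ValueError("last_seen must not be earlier than first_seen")
        return self


def error_locations(data: dict) -> list[tuple]:
    """Return the location of every validation error, or [] if data is valid."""
    try:
        Finding.model_validate(data)
    except ValidationError as exc:
        return [err["loc"] for err in exc.errors()]
    return []


# Deserialize from JSON, then serialize back using the alias.
finding = Finding.model_validate_json(
    '{"ruleId": "R001", "severity": 7.5,'
    ' "first_seen": "2024-01-01", "last_seen": "2024-02-01"}'
)
payload = finding.model_dump_json(by_alias=True)

bad_severity = error_locations(
    {"ruleId": "R002", "severity": 11,
     "first_seen": "2024-01-01", "last_seen": "2024-02-01"}
)
bad_dates = error_locations(
    {"ruleId": "R003", "severity": 5,
     "first_seen": "2024-03-01", "last_seen": "2024-01-01"}
)
```

Here the out-of-range severity is reported against the `severity` field, while the date-order failure is reported at the model level because it comes from the cross-field validator.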
Alternatives
dataclasses
dataclasses is a standard library module for defining simple data containers using type annotations.
- It is part of the Python standard library, requiring no additional dependency.
- It does not perform any runtime validation; fields accept any value regardless of the annotated type.
- It has no built-in serialization support.
- It does not support field constraints, custom validators, or cross-field validation out of the box.
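The lack of runtime validation can be seen in a small sketch that uses only the standard library. The `Port` and `CheckedPort` classes are hypothetical examples; the second one shows the kind of handwritten check that would be needed to enforce the annotation.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Port:
    number: int


# The annotation is not enforced: a string is stored as-is.
port = Port(number="not-a-number")
stored_type = type(port.number).__name__

# Serialization has to be assembled by hand from asdict() and json.
as_json = json.dumps(asdict(port))


@dataclass
class CheckedPort:
    number: int

    # Validation must be written manually, field by field.
    def __post_init__(self) -> None:
        if not isinstance(self.number, int):
            raise TypeError("number must be an int")


try:
    CheckedPort(number="not-a-number")
    rejected = False
except TypeError:
    rejected = True
```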
attrs
attrs is a mature library for writing concise and correct Python classes.
- It is open source.
- It supports validators and converters, but requires more boilerplate compared to Pydantic’s type-annotation-driven approach.
- It does not provide native JSON serialization or deserialization.
- Its API is less intuitive than Pydantic’s for developers already familiar with Python type hints.
- It has a smaller community than Pydantic.
marshmallow
marshmallow is an object serialization and deserialization library for Python.
- It is open source.
- It defines schemas separately from the data classes themselves, which increases verbosity and introduces duplication.
- It supports custom validation and field transformations, but requires more manual wiring than Pydantic.
- It does not use Python type hints as its primary interface, reducing integration with static analysis tools.
- It is slower than Pydantic v2 in most benchmarks.
- It has a smaller community than Pydantic.
Usage
We use Pydantic across our Python components for:
- Defining typed data models for structured data such as configuration files, vulnerability schemas, and API payloads.
- Validating user and external inputs at system boundaries, including CLI arguments and data ingested from third-party sources.
- Managing typed application settings loaded from environment variables via pydantic-settings.
- Serializing internal models to standards-compliant output formats such as JSON, SARIF, and YAML.