Validation

Data filtering and validation with composable filters and Pydantic integration.

Overview

The validation module provides composable filters for data cleaning and transformation with seamless Pydantic integration:

  • Composable Filters: Chain filters together
  • Pydantic Integration: Automatic field validation
  • Built-in Filters: Common text transformations
  • Custom Filters: Create domain-specific filters
  • Type-safe: Full type hints

Core Concepts

Filters

Filters are simple transformations that take a value and return a transformed value:

from dspu.validation import LowercaseFilter

# Named "lowercase" to avoid shadowing Python's built-in filter()
lowercase = LowercaseFilter()
result = lowercase("HELLO")  # "hello"

Filter Composition

Chain filters using .then():

from dspu.validation import StripWhitespaceFilter, LowercaseFilter

# Chain filters
email_filter = StripWhitespaceFilter().then(LowercaseFilter())

# Apply chain
email = email_filter("  ALICE@EXAMPLE.COM  ")  # "alice@example.com"
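
To picture how `.then()` composition could work, here is a minimal stand-in written without dspu. It is a sketch, not the library's implementation: the names `SketchFilter`, `_Composed`, `Strip`, and `Lower` are hypothetical, and only the `apply` / call / `.then()` pattern is taken from the examples on this page.

```python
from abc import ABC, abstractmethod


class SketchFilter(ABC):
    """Hypothetical stand-in for a filter base class (not dspu's code)."""

    @abstractmethod
    def apply(self, value: str) -> str:
        """Return the transformed value."""

    def __call__(self, value: str) -> str:
        # Calling a filter instance delegates to apply().
        return self.apply(value)

    def then(self, other: "SketchFilter") -> "SketchFilter":
        # Composition returns a new filter; neither operand is modified.
        return _Composed(self, other)


class _Composed(SketchFilter):
    """Runs `first`, then feeds its result into `second`."""

    def __init__(self, first: SketchFilter, second: SketchFilter) -> None:
        self.first = first
        self.second = second

    def apply(self, value: str) -> str:
        return self.second(self.first(value))


class Strip(SketchFilter):
    def apply(self, value: str) -> str:
        return value.strip()


class Lower(SketchFilter):
    def apply(self, value: str) -> str:
        return value.lower()


email_filter = Strip().then(Lower())
print(email_filter("  ALICE@EXAMPLE.COM  "))  # alice@example.com
```

Because `.then()` returns a new object in this sketch, a partial chain can be reused as the starting point for several longer chains.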

Filter Chains

Explicit chains for complex pipelines:

from dspu.validation import (
    FilterChain,
    LowercaseFilter,
    StripWhitespaceFilter,
    TruncateFilter,
)

chain = FilterChain([
    StripWhitespaceFilter(),
    LowercaseFilter(),
    TruncateFilter(max_length=50),
])

result = chain("  HELLO WORLD  ")  # "hello world"
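
Filters in a chain run in list order, so the order changes the result. The sketch below illustrates this with plain functions and `functools.reduce`; `run_chain` and `truncate5` are hypothetical helpers for illustration, not part of dspu.

```python
from functools import reduce
from typing import Callable, Sequence

StrFilter = Callable[[str], str]


def run_chain(filters: Sequence[StrFilter], value: str) -> str:
    """Apply each filter in order, feeding each result into the next."""
    return reduce(lambda acc, f: f(acc), filters, value)


def truncate5(value: str) -> str:
    """Keep at most the first five characters."""
    return value[:5]


raw = "   hello world"

# Strip first: the five kept characters are all real content.
strip_first = run_chain([str.strip, truncate5], raw)

# Truncate first: leading spaces use up the budget before stripping.
truncate_first = run_chain([truncate5, str.strip], raw)

print(repr(strip_first))     # 'hello'
print(repr(truncate_first))  # 'he'
```

This is why the examples on this page place whitespace stripping ahead of truncation.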

Built-in Filters

String Filters

StripWhitespaceFilter

Remove leading/trailing whitespace:

from dspu.validation import StripWhitespaceFilter

strip = StripWhitespaceFilter()
strip("  hello  ")  # "hello"

LowercaseFilter

Convert to lowercase:

from dspu.validation import LowercaseFilter

lower = LowercaseFilter()
lower("HELLO WORLD")  # "hello world"

UppercaseFilter

Convert to uppercase:

from dspu.validation import UppercaseFilter

upper = UppercaseFilter()
upper("hello world")  # "HELLO WORLD"

TruncateFilter

Limit string length:

from dspu.validation import TruncateFilter

# With suffix
truncate = TruncateFilter(max_length=10, suffix="...")
truncate("Hello World!")  # "Hello W..."

# Without suffix
truncate = TruncateFilter(max_length=10)
truncate("Hello World!")  # "Hello Worl"
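
The two outputs above imply that the suffix counts toward `max_length` ("Hello W..." is exactly 10 characters). A minimal sketch of that behavior, assuming strings at or under the limit are returned unchanged (the source does not show this case); the function name `truncate_text` is hypothetical:

```python
def truncate_text(value: str, max_length: int, suffix: str = "") -> str:
    """Shorten value so the result, including suffix, fits in max_length."""
    if len(value) <= max_length:
        return value  # assumption: short input passes through untouched
    keep = max(max_length - len(suffix), 0)
    # The final slice guards the case where the suffix alone exceeds the limit.
    return (value[:keep] + suffix)[:max_length]


print(truncate_text("Hello World!", 10, "..."))  # Hello W...
print(truncate_text("Hello World!", 10))         # Hello Worl
print(truncate_text("Hi", 10, "..."))            # Hi
```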

Specialized Filters

EmailNormalizationFilter

Normalize email addresses:

from dspu.validation import EmailNormalizationFilter

email = EmailNormalizationFilter()
email("Alice@Gmail.COM")  # "alice@gmail.com"

Features:

  • Lowercase email
  • Remove dots from Gmail addresses
  • Remove plus addressing
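
The three features listed above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not dspu's implementation: the source does not say whether plus addressing is removed for every domain or only for Gmail (the sketch removes it everywhere), and treating `googlemail.com` like `gmail.com` is also an assumption. The name `normalize_email` is hypothetical.

```python
GMAIL_DOMAINS = {"gmail.com", "googlemail.com"}  # assumption for this sketch


def normalize_email(address: str) -> str:
    """Lowercase, drop plus addressing, and drop dots in Gmail local parts."""
    lowered = address.lower()
    local, sep, domain = lowered.rpartition("@")
    if not sep:
        return lowered  # no "@" present: only lowercase
    local = local.split("+", 1)[0]  # "alice+news" -> "alice"
    if domain in GMAIL_DOMAINS:
        local = local.replace(".", "")  # "a.li.ce" -> "alice"
    return f"{local}@{domain}"


print(normalize_email("Alice@Gmail.COM"))             # alice@gmail.com
print(normalize_email("A.li.ce+news@gmail.com"))      # alice@gmail.com
print(normalize_email("First.Last+tag@example.com"))  # first.last@example.com
```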

SlugifyFilter

Create URL-safe slugs:

from dspu.validation import SlugifyFilter

slug = SlugifyFilter()
slug("Hello World!")        # "hello-world"
slug("C++ Programming")     # "c-programming"
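
Both outputs above are consistent with a simple rule: lowercase the text, replace every run of non-alphanumeric characters with one hyphen, and trim hyphens from the ends. A sketch of that rule follows; it is ASCII-only by assumption, so how `SlugifyFilter` treats accented or non-Latin characters is not covered here. The function name `slugify` is hypothetical.

```python
import re


def slugify(text: str) -> str:
    """Build a URL-safe slug from ASCII letters and digits."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower())
    return slug.strip("-")


print(slugify("Hello World!"))     # hello-world
print(slugify("C++ Programming"))  # c-programming
```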

RemoveSpecialCharsFilter

Remove non-alphanumeric characters:

from dspu.validation import RemoveSpecialCharsFilter

remove = RemoveSpecialCharsFilter()
remove("hello@world!")  # "helloworld"

RegexReplaceFilter

Pattern-based replacement:

from dspu.validation import RegexReplaceFilter

# Replace digits with X
regex = RegexReplaceFilter(pattern=r"\d+", replacement="X")
regex("order123")  # "orderX"

# Remove punctuation
regex = RegexReplaceFilter(pattern=r"[^\w\s]", replacement="")
regex("Hello, World!")  # "Hello World"
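
The results above match what Python's standard `re.sub` produces, so the sketch below uses it directly to show two details worth knowing when writing patterns. It assumes `RegexReplaceFilter` behaves like `re.sub`, which the source does not state explicitly; `regex_replace` is a hypothetical helper.

```python
import re
from typing import Callable


def regex_replace(pattern: str, replacement: str) -> Callable[[str], str]:
    """Return a function that applies re.sub with a precompiled pattern."""
    compiled = re.compile(pattern)
    return lambda value: compiled.sub(replacement, value)


# "\d+" replaces each run of digits once; "\d" replaces every digit.
print(regex_replace(r"\d+", "X")("order123"))  # orderX
print(regex_replace(r"\d", "X")("order123"))   # orderXXX

# "\w" includes the underscore, so "[^\w\s]" leaves underscores in place.
strip_punct = regex_replace(r"[^\w\s]", "")
print(strip_punct("Hello, World!"))  # Hello World
print(strip_punct("snake_case!"))    # snake_case
```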

Pydantic Integration

Field Validators

Apply filters to specific fields:

from pydantic import BaseModel
from dspu.validation import (
    pydantic_filter_validator,
    StripWhitespaceFilter,
    LowercaseFilter,
    EmailNormalizationFilter,
)

# Create filter
email_filter = (
    StripWhitespaceFilter()
    .then(LowercaseFilter())
    .then(EmailNormalizationFilter())
)

class User(BaseModel):
    name: str
    email: str

    # Apply filter to email field
    _email_filter = pydantic_filter_validator("email", email_filter)

# Automatic filtering
user = User(name="Alice", email="  ALICE@GMAIL.COM  ")
print(user.email)  # "alice@gmail.com"
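
For comparison, plain Pydantic v2 can express the same filter-then-validate flow with `BeforeValidator`. This sketch does not show how `pydantic_filter_validator` is implemented (the source does not describe that); `clean_email` and `CleanEmail` are hypothetical names, and the example assumes Pydantic v2 is installed.

```python
from typing import Annotated

from pydantic import BaseModel, BeforeValidator


def clean_email(value: object) -> object:
    # Only transform strings; other types pass through so that
    # Pydantic's own type validation reports the error.
    if isinstance(value, str):
        return value.strip().lower()
    return value


# Reusable annotated type: the cleaning step runs before str validation.
CleanEmail = Annotated[str, BeforeValidator(clean_email)]


class User(BaseModel):
    name: str
    email: CleanEmail


user = User(name="Alice", email="  ALICE@GMAIL.COM  ")
print(user.email)  # alice@gmail.com
```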

FilteredModel

Base class for models with filtering:

from dspu.validation import (
    FilteredModel,
    LowercaseFilter,
    StripWhitespaceFilter,
)

class User(FilteredModel):
    name: str
    email: str
    username: str

    # Define filters for each field
    _filters = {
        "name": StripWhitespaceFilter(),
        "email": StripWhitespaceFilter().then(LowercaseFilter()),
        "username": StripWhitespaceFilter().then(LowercaseFilter()),
    }

# Automatic filtering on all fields
user = User(
    name="  Alice  ",
    email="  ALICE@EXAMPLE.COM  ",
    username="  Alice123  "
)

print(user.name)      # "Alice"
print(user.email)     # "alice@example.com"
print(user.username)  # "alice123"

Custom Filters

Creating a Filter

from dspu.validation import Filter, StripWhitespaceFilter

class CapitalizeFilter(Filter):
    def apply(self, value: str) -> str:
        return value.capitalize()

# Use custom filter
capitalize = CapitalizeFilter()
result = capitalize("hello world")  # "Hello world"

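# Compose with built-in filters (avoid naming the variable "filter",
# which would shadow Python's built-in)
strip_and_capitalize = StripWhitespaceFilter().then(CapitalizeFilter())
result = strip_and_capitalize("  hello  ")  # "Hello"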

Parameterized Filters

class ReplaceFilter(Filter):
    def __init__(self, old: str, new: str):
        self.old = old
        self.new = new

    def apply(self, value: str) -> str:
        return value.replace(self.old, self.new)

# Use with parameters
replace = ReplaceFilter(old="@", new="[at]")
result = replace("alice@example.com")  # "alice[at]example.com"

Stateful Filters

class CounterFilter(Filter):
    def __init__(self):
        self.count = 0

    def apply(self, value: str) -> str:
        self.count += 1
        return f"{self.count}. {value}"

counter = CounterFilter()
counter("first")   # "1. first"
counter("second")  # "2. second"
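
A stateful filter carries its state wherever the instance is reused, which can surprise callers. The sketch below mirrors the counter above using a plain callable class so it runs without dspu; `Counter` here is a hypothetical stand-in, not a library class.

```python
class Counter:
    """Hypothetical stand-in mirroring CounterFilter, without dspu."""

    def __init__(self) -> None:
        self.count = 0

    def __call__(self, value: str) -> str:
        self.count += 1
        return f"{self.count}. {value}"


# One shared instance: numbering continues across unrelated lists.
shared = Counter()
titles = [shared(t) for t in ["intro", "setup"]]
notes = [shared(n) for n in ["todo"]]
print(titles)  # ['1. intro', '2. setup']
print(notes)   # ['3. todo']

# A fresh instance per list restarts the numbering.
fresh = Counter()
fresh_notes = [fresh(n) for n in ["todo"]]
print(fresh_notes)  # ['1. todo']
```

If a stateful filter is placed inside a chain that is shared across models or requests, every caller advances the same counter.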

Common Patterns

Pattern 1: Email Validation

from pydantic import BaseModel, EmailStr
from dspu.validation import (
    LowercaseFilter,
    StripWhitespaceFilter,
    pydantic_filter_validator,
)

email_filter = StripWhitespaceFilter().then(LowercaseFilter())

class User(BaseModel):
    email: EmailStr  # Pydantic validates format

    _email_filter = pydantic_filter_validator("email", email_filter)

# Both filtered and validated
user = User(email="  Alice@Example.com  ")
print(user.email)  # "alice@example.com"

Pattern 2: Username Normalization

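from pydantic import BaseModel
from dspu.validation import (
    LowercaseFilter,
    RemoveSpecialCharsFilter,
    StripWhitespaceFilter,
    TruncateFilter,
    pydantic_filter_validator,
)

username_filter = (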
    StripWhitespaceFilter()
    .then(LowercaseFilter())
    .then(RemoveSpecialCharsFilter())
    .then(TruncateFilter(max_length=20))
)

class User(BaseModel):
    username: str

    _username_filter = pydantic_filter_validator("username", username_filter)

user = User(username="  Alice@123!  ")
print(user.username)  # "alice123"

Pattern 3: Slug Generation

from pydantic import BaseModel, model_validator
from dspu.validation import SlugifyFilter

class Article(BaseModel):
    title: str
    slug: str = ""

    @model_validator(mode='after')
    def generate_slug(self):
        if not self.slug:
            self.slug = SlugifyFilter()(self.title)
        return self

article = Article(title="Hello World!")
print(article.slug)  # "hello-world"

Pattern 4: Form Data Sanitization

from dspu.validation import (
    FilteredModel,
    StripWhitespaceFilter,
    TruncateFilter,
)

# Sanitize all user input
sanitize = (
    StripWhitespaceFilter()
    .then(TruncateFilter(max_length=1000))
)

class CommentForm(FilteredModel):
    author: str
    text: str

    _filters = {
        "author": sanitize.then(TruncateFilter(max_length=100)),
        "text": sanitize,
    }

comment = CommentForm(
    author="  " + "A" * 200,  # Very long name
    text="  " + "B" * 2000,   # Very long text
)

len(comment.author)  # 100 (truncated)
len(comment.text)    # 1000 (truncated)

Pattern 5: Multi-Field Validation

from pydantic import field_validator
from dspu.validation import (
    FilteredModel,
    LowercaseFilter,
    StripWhitespaceFilter,
)

class User(FilteredModel):
    email: str
    username: str

    _filters = {
        "email": StripWhitespaceFilter().then(LowercaseFilter()),
        "username": StripWhitespaceFilter().then(LowercaseFilter()),
    }

    @field_validator("username")
    @classmethod
    def validate_username(cls, v: str) -> str:
        if len(v) < 3:
            raise ValueError("Username must be at least 3 characters")
        return v

Best Practices

Filters

DO:

  • Keep filters simple and focused
  • Compose filters for complex logic
  • Create reusable filter chains
  • Document filter behavior
  • Test filters with edge cases

DON'T:

  • Don't mutate input in-place
  • Don't perform validation (use Pydantic validators)
  • Don't create overly complex filters
  • Don't skip error handling
  • Don't trust unfiltered user input
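
The advice to test filters with edge cases can be sketched as a small property check. The `normalize` function is a hypothetical stand-in for a strip-then-lowercase chain so that the example runs without dspu; the same loop could be pointed at a real filter chain.

```python
def normalize(value: str) -> str:
    # Hypothetical stand-in for a strip-then-lowercase filter chain.
    return value.strip().lower()


edge_cases = [
    "",               # empty input
    "   ",            # whitespace only
    "\tMiXeD\n",      # tabs and newlines around mixed case
    "already clean",  # input that needs no change
    "ÄÖÜ straße",     # non-ASCII letters
]

for raw in edge_cases:
    once = normalize(raw)
    # Idempotence: filtering an already-filtered value changes nothing.
    assert normalize(once) == once, raw
    # No leading or trailing whitespace survives.
    assert once == once.strip(), raw

print("all edge cases passed")
```

Idempotence is a useful property for normalization filters because stored values can be safely passed through the same chain again.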

Pydantic Integration

DO:

  • Use FilteredModel for multiple filtered fields
  • Combine with Pydantic validators
  • Filter before validation
  • Use meaningful filter names
  • Document filtering behavior

DON'T:

  • Don't skip validation after filtering
  • Don't confuse filtering with validation
  • Don't override filtered values
  • Don't filter sensitive data (sanitize properly)

Custom Filters

DO:

  • Inherit from the Filter base class
  • Keep filters stateless when possible
  • Provide clear documentation
  • Test thoroughly
  • Make filters composable

DON'T:

  • Don't add side effects
  • Don't perform I/O operations
  • Don't raise exceptions (return the transformed value)
  • Don't mutate shared state

Common Use Cases

User Registration

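from pydantic import EmailStr, field_validator
from dspu.validation import FilteredModel, StripWhitespaceFilter

# email_filter and username_filter are the chains defined in
# Pattern 1 and Pattern 2 above

class RegistrationForm(FilteredModel):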
    email: EmailStr
    username: str
    full_name: str

    _filters = {
        "email": email_filter,
        "username": username_filter,
        "full_name": StripWhitespaceFilter(),
    }

    @field_validator("username")
    @classmethod
    def validate_username(cls, v: str) -> str:
        if len(v) < 3:
            raise ValueError("Username too short")
        if not v.isalnum():
            raise ValueError("Username must be alphanumeric")
        return v

Content Moderation

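from dspu.validation import (
    FilteredModel,
    RemoveSpecialCharsFilter,
    StripWhitespaceFilter,
    TruncateFilter,
)

# Sanitize content: strip whitespace, remove special characters, truncate.
# Note: this chain does not detect profanity; that would need a custom filter.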
def create_content_filter(max_length: int):
    return (
        StripWhitespaceFilter()
        .then(RemoveSpecialCharsFilter())
        .then(TruncateFilter(max_length=max_length))
    )

class Post(FilteredModel):
    title: str
    content: str

    _filters = {
        "title": create_content_filter(max_length=100),
        "content": create_content_filter(max_length=5000),
    }

Search Query Normalization

from dspu.validation import LowercaseFilter, RegexReplaceFilter, StripWhitespaceFilter

search_filter = (
    StripWhitespaceFilter()
    .then(LowercaseFilter())
    .then(RegexReplaceFilter(pattern=r"\s+", replacement=" "))
)

query = search_filter("  Hello    World  ")  # "hello world"
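
The same normalization can be written in plain Python, which is handy for checking expected outputs in tests. Both helpers below are hypothetical and independent of dspu; they agree for ordinary spaces, tabs, and newlines.

```python
import re


def normalize_query_regex(query: str) -> str:
    """Strip, lowercase, then collapse whitespace runs to one space."""
    return re.sub(r"\s+", " ", query.strip().lower())


def normalize_query_split(query: str) -> str:
    """Same result via split/join: split() drops all surrounding whitespace."""
    return " ".join(query.lower().split())


for raw in ["  Hello    World  ", "a\t\nb", "", "   "]:
    assert normalize_query_regex(raw) == normalize_query_split(raw)

print(normalize_query_regex("  Hello    World  "))  # hello world
```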

Installation

# Validation included in base installation
pip install dspu

Next Steps