Validation¶
Data filtering and validation with composable filters and Pydantic integration.
Overview¶
The validation module provides composable filters for cleaning and transforming data, plus helpers that apply those filters to Pydantic model fields:
- Composable Filters: Chain filters together
- Pydantic Integration: Automatic field validation
- Built-in Filters: Common text transformations
- Custom Filters: Create domain-specific filters
- Type-safe: Full type hints
Core Concepts¶
Filters¶
Filters are simple transformations that take a value and return a transformed value:
from dspu.validation import LowercaseFilter
lowercase = LowercaseFilter()  # avoid shadowing the built-in filter()
result = lowercase("HELLO") # "hello"
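The callable behavior above can be pictured with a small stand-in. This is a sketch, not dspu's implementation: `SketchFilter` and `SketchLowercase` are hypothetical names, and it assumes that calling a filter simply delegates to an `apply` method, as the custom filter examples in this document suggest.

```python
from abc import ABC, abstractmethod


class SketchFilter(ABC):
    """Hypothetical stand-in for a filter base class (not the dspu API)."""

    @abstractmethod
    def apply(self, value: str) -> str:
        """Return a transformed copy of value."""

    def __call__(self, value: str) -> str:
        # Calling the filter object delegates to apply().
        return self.apply(value)


class SketchLowercase(SketchFilter):
    def apply(self, value: str) -> str:
        return value.lower()


lowered = SketchLowercase()("HELLO")
print(lowered)  # hello
```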
Filter Composition¶
Chain filters using .then():
from dspu.validation import StripWhitespaceFilter, LowercaseFilter
# Chain filters
email_filter = StripWhitespaceFilter().then(LowercaseFilter())
# Apply chain
email = email_filter(" ALICE@EXAMPLE.COM ") # "alice@example.com"
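One way to picture what `.then()` does is plain function composition. The sketch below uses a hypothetical `Composable` wrapper around ordinary `str -> str` functions; it is an illustration of the idea, not the library's code, and it assumes filters run left to right.

```python
from typing import Callable

StrFn = Callable[[str], str]


class Composable:
    """Hypothetical wrapper that gives any str -> str function a then() method."""

    def __init__(self, fn: StrFn) -> None:
        self.fn = fn

    def __call__(self, value: str) -> str:
        return self.fn(value)

    def then(self, nxt: StrFn) -> "Composable":
        # Run this filter first, then feed its output to nxt.
        return Composable(lambda value: nxt(self(value)))


strip = Composable(str.strip)
lower = Composable(str.lower)
email_sketch = strip.then(lower)
print(email_sketch("  ALICE@EXAMPLE.COM  "))  # alice@example.com
```

Because `then()` returns a new wrapper, the original `strip` and `lower` objects stay reusable in other chains.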
Filter Chains¶
Explicit chains for complex pipelines:
from dspu.validation import (
    FilterChain,
    LowercaseFilter,
    StripWhitespaceFilter,
    TruncateFilter,
)
chain = FilterChain([
    StripWhitespaceFilter(),
    LowercaseFilter(),
    TruncateFilter(max_length=50),
])
result = chain(" HELLO WORLD ") # "hello world"
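Conceptually, an explicit chain is a fold over a list of filters. The sketch below shows that idea with `functools.reduce` and plain functions; `run_chain` is a hypothetical helper and says nothing about how `FilterChain` is implemented internally.

```python
from functools import reduce
from typing import Callable, Sequence


def run_chain(filters: Sequence[Callable[[str], str]], value: str) -> str:
    """Apply each filter in order, passing the result along (illustrative only)."""
    return reduce(lambda acc, f: f(acc), filters, value)


# Plain functions standing in for strip, lowercase, and truncate-to-50.
steps = [str.strip, str.lower, lambda s: s[:50]]
print(run_chain(steps, "  HELLO WORLD  "))  # hello world
```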
Built-in Filters¶
String Filters¶
StripWhitespaceFilter¶
Remove leading/trailing whitespace:
from dspu.validation import StripWhitespaceFilter
strip = StripWhitespaceFilter()
strip(" hello ") # "hello"
LowercaseFilter¶
Convert to lowercase:
from dspu.validation import LowercaseFilter
lower = LowercaseFilter()
lower("HELLO WORLD") # "hello world"
UppercaseFilter¶
Convert to uppercase:
from dspu.validation import UppercaseFilter
upper = UppercaseFilter()
upper("hello world") # "HELLO WORLD"
TruncateFilter¶
Limit string length:
from dspu.validation import TruncateFilter
# With suffix
truncate = TruncateFilter(max_length=10, suffix="...")
truncate("Hello World!") # "Hello W..."
# Without suffix
truncate = TruncateFilter(max_length=10)
truncate("Hello World!") # "Hello Worl"
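The two outputs above imply that the suffix counts toward `max_length` (7 characters plus "..." gives 10). A minimal sketch of that rule follows. The `truncate` function is hypothetical, and returning short inputs unchanged is an assumption rather than documented behavior.

```python
def truncate(value: str, max_length: int, suffix: str = "") -> str:
    """Shorten value to at most max_length characters, suffix included."""
    if len(value) <= max_length:
        return value  # Assumed: short input is returned unchanged, no suffix.
    keep = max(max_length - len(suffix), 0)
    # The final slice guards the case where the suffix alone exceeds max_length.
    return (value[:keep] + suffix)[:max_length]


print(truncate("Hello World!", 10, "..."))  # Hello W...
print(truncate("Hello World!", 10))         # Hello Worl
```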
Specialized Filters¶
EmailNormalizationFilter¶
Normalize email addresses:
from dspu.validation import EmailNormalizationFilter
email = EmailNormalizationFilter()
email("Alice@Gmail.COM") # "alice@gmail.com"
Features:
- Lowercase the address
- Remove dots from Gmail addresses
- Remove plus addressing
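The three features can be sketched with standard string methods. This is an illustration only: `normalize_email` is a hypothetical function, and it assumes plus addressing is stripped for every domain while dots are removed only for `gmail.com`. The library may draw those lines differently.

```python
def normalize_email(address: str) -> str:
    """Illustrative normalization: lowercase, drop +tags, drop Gmail dots."""
    lowered = address.lower()
    local, sep, domain = lowered.rpartition("@")
    if not sep:
        return lowered  # No "@" present: nothing more to normalize.
    local = local.split("+", 1)[0]  # Remove plus addressing.
    if domain == "gmail.com":
        local = local.replace(".", "")  # Assumed Gmail-only rule.
    return f"{local}@{domain}"


print(normalize_email("Alice@Gmail.COM"))         # alice@gmail.com
print(normalize_email("A.li.ce+news@gmail.com"))  # alice@gmail.com
print(normalize_email("bob.smith+x@example.com")) # bob.smith@example.com
```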
SlugifyFilter¶
Create URL-safe slugs:
from dspu.validation import SlugifyFilter
slug = SlugifyFilter()
slug("Hello World!") # "hello-world"
slug("C++ Programming") # "c-programming"
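Both results above are consistent with a simple rule: lowercase the text, replace each run of non-alphanumeric characters with one hyphen, and trim hyphens from the ends. The sketch below implements that rule with `re.sub`. The `slugify` function is hypothetical and handles ASCII only, which may not match `SlugifyFilter` for accented or non-Latin input.

```python
import re


def slugify(text: str) -> str:
    """ASCII-only slug sketch: non-alphanumeric runs become single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


print(slugify("Hello World!"))     # hello-world
print(slugify("C++ Programming"))  # c-programming
```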
RemoveSpecialCharsFilter¶
Remove non-alphanumeric characters:
from dspu.validation import RemoveSpecialCharsFilter
remove = RemoveSpecialCharsFilter()
remove("hello@world!") # "helloworld"
RegexReplaceFilter¶
Pattern-based replacement:
from dspu.validation import RegexReplaceFilter
# Replace digits with X
regex = RegexReplaceFilter(pattern=r"\d+", replacement="X")
regex("order123") # "orderX"
# Remove punctuation
regex = RegexReplaceFilter(pattern=r"[^\w\s]", replacement="")
regex("Hello, World!") # "Hello World"
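A filter like this can be approximated with the standard `re` module. In the sketch below, `RegexReplace` is a hypothetical class, not the library's; it assumes the pattern is compiled once at construction so repeated calls reuse it.

```python
import re


class RegexReplace:
    """Hypothetical stand-in: compile the pattern once, substitute on each call."""

    def __init__(self, pattern: str, replacement: str) -> None:
        self._regex = re.compile(pattern)
        self._replacement = replacement

    def __call__(self, value: str) -> str:
        return self._regex.sub(self._replacement, value)


print(RegexReplace(r"\d+", "X")("order123"))           # orderX
print(RegexReplace(r"[^\w\s]", "")("Hello, World!"))   # Hello World
```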
Pydantic Integration¶
Field Validators¶
Apply filters to specific fields:
from pydantic import BaseModel
from dspu.validation import (
    pydantic_filter_validator,
    StripWhitespaceFilter,
    LowercaseFilter,
    EmailNormalizationFilter,
)
# Create filter
email_filter = (
    StripWhitespaceFilter()
    .then(LowercaseFilter())
    .then(EmailNormalizationFilter())
)
class User(BaseModel):
    name: str
    email: str
    # Apply filter to email field
    _email_filter = pydantic_filter_validator("email", email_filter)
# Automatic filtering
user = User(name="Alice", email=" ALICE@GMAIL.COM ")
print(user.email) # "alice@gmail.com"
FilteredModel¶
Base class for models with filtering:
from dspu.validation import (
    FilteredModel,
    StripWhitespaceFilter,
    LowercaseFilter,
)
class User(FilteredModel):
    name: str
    email: str
    username: str
    # Define filters for each field
    _filters = {
        "name": StripWhitespaceFilter(),
        "email": StripWhitespaceFilter().then(LowercaseFilter()),
        "username": StripWhitespaceFilter().then(LowercaseFilter()),
    }
# Automatic filtering on all fields
user = User(
    name=" Alice ",
    email=" ALICE@EXAMPLE.COM ",
    username=" Alice123 ",
)
print(user.name) # "Alice"
print(user.email) # "alice@example.com"
print(user.username) # "alice123"
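The idea of a per-field filter map can be shown without Pydantic. The sketch below uses a plain dataclass as a stand-in: `FilteredRecord` and `UserRecord` are hypothetical names, and the `__post_init__` hook is a substitute technique, not how `FilteredModel` works internally.

```python
from dataclasses import dataclass, fields
from typing import Callable, ClassVar, Dict


@dataclass
class FilteredRecord:
    """Hypothetical dataclass stand-in; not FilteredModel and not Pydantic."""

    _filters: ClassVar[Dict[str, Callable[[str], str]]] = {}

    def __post_init__(self) -> None:
        # Look up each field's filter by name and store the cleaned value.
        for f in fields(self):
            fn = self._filters.get(f.name)
            if fn is not None:
                setattr(self, f.name, fn(getattr(self, f.name)))


@dataclass
class UserRecord(FilteredRecord):
    name: str
    email: str
    _filters: ClassVar[Dict[str, Callable[[str], str]]] = {
        "name": str.strip,
        "email": lambda s: s.strip().lower(),
    }


user_record = UserRecord(name="  Alice  ", email="  ALICE@EXAMPLE.COM  ")
print(user_record.name, user_record.email)  # Alice alice@example.com
```

Fields without an entry in the map pass through untouched, which keeps the mapping opt-in per field.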
Custom Filters¶
Creating a Filter¶
from dspu.validation import Filter, StripWhitespaceFilter
class CapitalizeFilter(Filter):
    def apply(self, value: str) -> str:
        return value.capitalize()
# Use custom filter
capitalize = CapitalizeFilter()
result = capitalize("hello world") # "Hello world"
# Compose with built-in filters
strip_and_capitalize = StripWhitespaceFilter().then(CapitalizeFilter())
result = strip_and_capitalize(" hello ") # "Hello"
Parameterized Filters¶
from dspu.validation import Filter
class ReplaceFilter(Filter):
    def __init__(self, old: str, new: str):
        self.old = old
        self.new = new
    def apply(self, value: str) -> str:
        return value.replace(self.old, self.new)
# Use with parameters
replace = ReplaceFilter(old="@", new="[at]")
result = replace("alice@example.com") # "alice[at]example.com"
Stateful Filters¶
class CounterFilter(Filter):
    def __init__(self):
        self.count = 0
    def apply(self, value: str) -> str:
        self.count += 1
        return f"{self.count}. {value}"
counter = CounterFilter()
counter("first") # "1. first"
counter("second") # "2. second"
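State lives on the filter instance, so every place that reuses the same instance shares the count. The sketch below illustrates this with a hypothetical `SketchCounter` class that mimics the example above without depending on the library's base class.

```python
class SketchCounter:
    """Hypothetical stand-in for a stateful filter (not the dspu base class)."""

    def __init__(self) -> None:
        self.count = 0

    def __call__(self, value: str) -> str:
        self.count += 1
        return f"{self.count}. {value}"


shared = SketchCounter()
titles = [shared(t) for t in ["intro", "setup"]]
notes = [shared(n) for n in ["todo"]]  # Continues at 3, not 1.
fresh = SketchCounter()("todo")        # A new instance restarts at 1.

print(titles)  # ['1. intro', '2. setup']
print(notes)   # ['3. todo']
print(fresh)   # 1. todo
```

Creating one instance per pipeline avoids this kind of accidental coupling.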
Common Patterns¶
Pattern 1: Email Validation¶
from pydantic import BaseModel, EmailStr
from dspu.validation import (
    pydantic_filter_validator,
    StripWhitespaceFilter,
    LowercaseFilter,
)
email_filter = StripWhitespaceFilter().then(LowercaseFilter())
class User(BaseModel):
    email: EmailStr # Pydantic validates format
    _email_filter = pydantic_filter_validator("email", email_filter)
# Both filtered and validated
user = User(email=" Alice@Example.com ")
print(user.email) # "alice@example.com"
Pattern 2: Username Normalization¶
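from pydantic import BaseModel
from dspu.validation import (
    pydantic_filter_validator,
    StripWhitespaceFilter,
    LowercaseFilter,
    RemoveSpecialCharsFilter,
    TruncateFilter,
)
username_filter = (
    StripWhitespaceFilter()
    .then(LowercaseFilter())
    .then(RemoveSpecialCharsFilter())
    .then(TruncateFilter(max_length=20))
)
class User(BaseModel):
    username: str
    _username_filter = pydantic_filter_validator("username", username_filter)
user = User(username=" Alice@123! ")
print(user.username) # "alice123"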
Pattern 3: Slug Generation¶
from dspu.validation import SlugifyFilter
from pydantic import BaseModel, model_validator
class Article(BaseModel):
    title: str
    slug: str = ""
    @model_validator(mode='after')
    def generate_slug(self):
        # Only derive a slug when the caller did not supply one
        if not self.slug:
            self.slug = SlugifyFilter()(self.title)
        return self
article = Article(title="Hello World!")
print(article.slug) # "hello-world"
Pattern 4: Form Data Sanitization¶
from dspu.validation import FilteredModel, StripWhitespaceFilter, TruncateFilter
# Sanitize all user input
sanitize = (
    StripWhitespaceFilter()
    .then(TruncateFilter(max_length=1000))
)
class CommentForm(FilteredModel):
    author: str
    text: str
    _filters = {
        "author": sanitize.then(TruncateFilter(max_length=100)),
        "text": sanitize,
    }
comment = CommentForm(
    author=" " + "A" * 200, # Very long name
    text=" " + "B" * 2000, # Very long text
)
len(comment.author) # 100 (truncated)
len(comment.text) # 1000 (truncated)
Pattern 5: Multi-Field Validation¶
from pydantic import field_validator
from dspu.validation import FilteredModel, StripWhitespaceFilter, LowercaseFilter
class User(FilteredModel):
    email: str
    username: str
    _filters = {
        "email": StripWhitespaceFilter().then(LowercaseFilter()),
        "username": StripWhitespaceFilter().then(LowercaseFilter()),
    }
    @field_validator("username")
    @classmethod
    def validate_username(cls, v: str) -> str:
        if len(v) < 3:
            raise ValueError("Username must be at least 3 characters")
        return v
Best Practices¶
Filters¶
✅ DO:
- Keep filters simple and focused
- Compose filters for complex logic
- Create reusable filter chains
- Document filter behavior
- Test filters with edge cases
❌ DON'T:
- Don't mutate input in-place
- Don't perform validation (use Pydantic validators)
- Don't create overly complex filters
- Don't skip error handling
- Don't trust unfiltered user input
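Edge-case testing can be as simple as a loop over awkward inputs with a few invariants. The sketch below checks a hypothetical `normalize_query` stand-in (trim, lowercase, collapse whitespace) for idempotence and clean edges. The function and the chosen invariants are illustrative assumptions, not part of the library.

```python
import re


def normalize_query(text: str) -> str:
    """Stand-in filter under test: trim, lowercase, collapse whitespace."""
    return re.sub(r"\s+", " ", text.strip().lower())


edge_cases = ["", "   ", "Hello   World", "\tTabs\nand\nnewlines ", "ÀÉÎ"]
for case in edge_cases:
    once = normalize_query(case)
    # Idempotence: filtering already-clean output must change nothing.
    assert normalize_query(once) == once
    # The result never has leading or trailing whitespace.
    assert once == once.strip()

print(normalize_query("\tTabs\nand\nnewlines "))  # tabs and newlines
```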
Pydantic Integration¶
✅ DO:
- Use FilteredModel for multiple filtered fields
- Combine with Pydantic validators
- Filter before validation
- Use meaningful filter names
- Document filtering behavior
❌ DON'T:
- Don't skip validation after filtering
- Don't confuse filtering with validation
- Don't override filtered values
- Don't filter sensitive data (sanitize properly)
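The ordering advice matters because a check on the raw value can pass where the cleaned value should fail. The sketch below uses a hypothetical `clean_username` helper in plain Python (no Pydantic) to show filtering first and validating second.

```python
def clean_username(raw: str) -> str:
    """Filter first, then validate the cleaned value (illustrative helper)."""
    cleaned = raw.strip().lower()  # Filtering: transforms, never rejects.
    if len(cleaned) < 3:           # Validation: may reject.
        raise ValueError("Username must be at least 3 characters")
    return cleaned


print(clean_username("  Alice  "))  # alice

try:
    clean_username("  ab   ")  # 7 raw characters, only 2 after stripping
except ValueError as exc:
    print(exc)  # Username must be at least 3 characters
```

Had the length check run on the raw string, the padded two-letter name would have been accepted.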
Custom Filters¶
✅ DO:
- Inherit from the Filter base class
- Keep filters stateless when possible
- Provide clear documentation
- Test thoroughly
- Make filters composable
❌ DON'T:
- Don't add side effects
- Don't perform I/O operations
- Don't raise exceptions (return the transformed value)
- Don't mutate shared state
Common Use Cases¶
User Registration¶
from pydantic import EmailStr, field_validator
from dspu.validation import FilteredModel, StripWhitespaceFilter
# email_filter and username_filter are the chains defined in Patterns 1 and 2
class RegistrationForm(FilteredModel):
    email: EmailStr
    username: str
    full_name: str
    _filters = {
        "email": email_filter,
        "username": username_filter,
        "full_name": StripWhitespaceFilter(),
    }
    @field_validator("username")
    @classmethod
    def validate_username(cls, v: str) -> str:
        if len(v) < 3:
            raise ValueError("Username too short")
        if not v.isalnum():
            raise ValueError("Username must be alphanumeric")
        return v
Content Moderation¶
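from dspu.validation import (
    FilteredModel,
    StripWhitespaceFilter,
    RemoveSpecialCharsFilter,
    TruncateFilter,
)
# Sanitize text: strip special characters and cap the length
# (no profanity filtering is shown here)
def create_content_filter(max_length: int):
    return (
        StripWhitespaceFilter()
        .then(RemoveSpecialCharsFilter())
        .then(TruncateFilter(max_length=max_length))
    )
class Post(FilteredModel):
    title: str
    content: str
    _filters = {
        "title": create_content_filter(max_length=100),
        "content": create_content_filter(max_length=5000),
    }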
Search Query Normalization¶
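from dspu.validation import (
    StripWhitespaceFilter,
    LowercaseFilter,
    RegexReplaceFilter,
)
search_filter = (
    StripWhitespaceFilter()
    .then(LowercaseFilter())
    # Collapse each run of whitespace into a single space
    .then(RegexReplaceFilter(pattern=r"\s+", replacement=" "))
)
query = search_filter(" Hello World ") # "hello world"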