I/O Examples¶
Practical examples for storage abstraction and multi-format file operations.
Overview¶
The I/O module provides a unified API for file operations across local and cloud storage with support for multiple file formats.
Examples¶
1. Basic File Operations¶
File: examples/io/01_basic_file_operations.py

Learn the fundamentals.

Topics:

- Creating a Storage instance
- Writing and reading files
- Checking file existence
- Listing files with patterns
- Deleting files
- Working with raw bytes

Run: `python examples/io/01_basic_file_operations.py`
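Pattern-based listing, one of the topics above, typically follows glob semantics. A stdlib sketch of the idea (independent of the Storage API, which the example file covers):

```python
from pathlib import Path
import tempfile

# Create a few files, then list only those matching a glob pattern
with tempfile.TemporaryDirectory() as root:
    for name in ("a.json", "b.json", "notes.txt"):
        (Path(root) / name).write_text("{}")
    matches = sorted(p.name for p in Path(root).glob("*.json"))
    print(matches)  # ['a.json', 'b.json']
```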
2. Multi-Format Files¶
File: examples/io/02_multi_format_files.py

Master file format handling.

Topics:

- Writing to 9 different formats (JSON, YAML, TOML, HOCON, CSV, Markdown, Python, Env, Text)
- Auto-detection from file extensions
- Format-specific options (indentation, delimiters, etc.)
- Reading back from various formats
- Format override capabilities

Supported Formats:

- Structured: JSON, YAML, TOML, HOCON
- Text: Plain text, Python code, Markdown
- Tabular: CSV
- Config: .env files

Run: `python examples/io/02_multi_format_files.py`
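Extension-based auto-detection boils down to a suffix lookup. A simplified stand-in (the mapping table here is illustrative, not the library's actual registry):

```python
from pathlib import Path

# Hypothetical extension-to-format table, for illustration only
EXTENSION_MAP = {
    ".json": "json", ".yaml": "yaml", ".yml": "yaml",
    ".toml": "toml", ".conf": "hocon", ".hocon": "hocon",
    ".csv": "csv", ".md": "markdown", ".py": "python",
    ".env": "env", ".txt": "text",
}

def detect_format(path: str) -> str:
    """Map a file path to a format name via its (case-insensitive) suffix."""
    suffix = Path(path).suffix.lower()
    try:
        return EXTENSION_MAP[suffix]
    except KeyError:
        raise ValueError(f"unsupported extension: {suffix!r}") from None

print(detect_format("config.YML"))  # yaml
```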
3. Path Resolution¶
File: examples/io/03_path_resolution.py

Work safely with file paths.

Topics:

- PathResolver for relative paths
- Resolving paths relative to source files
- Path traversal security checks
- Using basis directories
- Integration with Storage
- Static path validation

Run: `python examples/io/03_path_resolution.py`
4. Streaming Large Files¶
File: examples/io/04_streaming_large_files.py

Handle large files efficiently.

Topics:

- Streaming reads and writes
- Processing data in chunks
- Memory-efficient operations
- Transforming data while streaming
- CSV streaming example

Benefits:

- Handle files larger than RAM
- Lower memory footprint
- Start processing before a download completes

Run: `python examples/io/04_streaming_large_files.py`
5. Config Management¶
File: examples/io/05_config_management.py

Practical configuration management.

Topics:

- Multi-environment configs (dev, staging, production)
- Config file generation in multiple formats
- Merging configs from multiple sources
- Generating .env files
- Auto-generating documentation

Use Cases:

- Application configuration
- Environment-specific settings
- Config file templating

Run: `python examples/io/05_config_management.py`
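Merging configs from multiple sources is usually a recursive dict merge where later sources override earlier ones. A minimal sketch of the pattern (not the library's actual merge function):

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge override into base, returning a new dict."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

base = {"database": {"host": "localhost", "port": 5432}}
prod = {"database": {"host": "db.prod.internal"}}
print(deep_merge(base, prod))
# {'database': {'host': 'db.prod.internal', 'port': 5432}}
```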
6. Cloud Storage¶
File: examples/io/06_cloud_storage.py

Work with cloud storage providers.

Topics:

- Unified API for local, S3, GCS, and Azure
- URI-based provider selection
- Multi-cloud patterns
- Data migration strategies
- Best practices

Supported Providers:

- Local filesystem
- Amazon S3
- Google Cloud Storage
- Azure Blob Storage

Run: `python examples/io/06_cloud_storage.py`
Quick Start¶
Installation¶
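The exact package name and extras should be confirmed against the project's packaging; based on the Troubleshooting section below, installation with the I/O extras looks like:

```shell
pip install 'dspu[io]'
```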
Basic Usage¶
```python
from dspu.io import Storage

# Create storage instance
storage = Storage.from_uri("./data")

# Write and read
await storage.write("file.json", {"status": "success"})
data = await storage.read("file.json")

# Multi-format
await storage.write_format("config.yaml", {"database": {"host": "localhost"}})
await storage.write_format("data.csv", [{"name": "Alice", "age": 30}])
```
Common Patterns¶
Pattern 1: Reading Configuration¶
```python
from dspu.io import Storage

storage = Storage.from_uri("./config")
config = await storage.read_format("app.yaml")
```
Pattern 2: Writing Multiple Formats¶
```python
data = {"name": "MyApp", "version": "1.0.0"}

# Format is auto-detected from the extension
await storage.write_format("config.json", data)
await storage.write_format("config.yaml", data)
await storage.write_format("config.toml", data)
```
Pattern 3: Streaming Large Files¶
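The Storage streaming API itself is shown in examples/io/04_streaming_large_files.py; the underlying chunked-processing pattern can be sketched with the standard library alone (no `dspu` calls, so no method names are implied):

```python
import os
import tempfile

def line_count_in_chunks(path: str, chunk_size: int = 64 * 1024) -> int:
    """Count newlines without ever holding the whole file in memory."""
    count = 0
    with open(path, "rb") as fh:
        while chunk := fh.read(chunk_size):
            count += chunk.count(b"\n")
    return count

# Build a sample file of 100,000 short rows (~400 KB)
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"row\n" * 100_000)

n = line_count_in_chunks(tmp.name)
print(n)  # 100000
os.remove(tmp.name)
```

The same loop shape applies to transforms: read a chunk, process it, write it out, and move on, so peak memory stays at one chunk regardless of file size.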
Pattern 4: Path Resolution¶
```python
from dspu.io import PathResolver

resolver = PathResolver(__file__, basis="../configs")
config_path = resolver.resolve("app.yaml", check_exists=True)
```
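The traversal checks that PathResolver performs can be approximated with `pathlib` (a simplified stand-in, not the library's implementation):

```python
from pathlib import Path

def resolve_safe(base: str, relative: str) -> Path:
    """Resolve relative against base, refusing results that escape base."""
    base_dir = Path(base).resolve()
    candidate = (base_dir / relative).resolve()
    if not candidate.is_relative_to(base_dir):  # Python 3.9+
        raise ValueError(f"path escapes base directory: {relative}")
    return candidate

print(resolve_safe("configs", "app.yaml").name)  # app.yaml
try:
    resolve_safe("configs", "../../etc/passwd")
except ValueError as exc:
    print(f"blocked: {exc}")
```

Resolving before comparing is the important step: it normalizes `..` segments and symlinks so the containment check cannot be bypassed textually.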
Pattern 5: Cloud Storage¶
```python
# Same API, different URI
local = Storage.from_uri("./data")
s3 = Storage.from_uri("s3://my-bucket/data")
gcs = Storage.from_uri("gs://my-bucket/data")

# All backends use identical methods
await local.write_format("data.json", data)
await s3.write_format("data.json", data)
await gcs.write_format("data.json", data)
```
Format Options¶
JSON Options¶
```python
await storage.write_format(
    "config.json",
    data,
    format_options={
        "indent": 4,            # Pretty-print with 4 spaces
        "sort_keys": True,      # Sort dictionary keys
        "ensure_ascii": False,  # Allow Unicode characters
    },
)
```
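These option names match the standard-library `json.dumps` keywords, so their effect can be previewed without the Storage layer:

```python
import json

data = {"b": 1, "a": "café"}

# indent pretty-prints, sort_keys orders keys, ensure_ascii=False keeps "é"
text = json.dumps(data, indent=4, sort_keys=True, ensure_ascii=False)
print(text)
```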
CSV Options¶
```python
await storage.write_format(
    "data.csv",
    rows,
    format_options={
        "header": True,    # Include header row
        "delimiter": ",",  # Field delimiter
    },
)
```
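The CSV options correspond to concepts from the standard-library `csv` module; a stand-alone preview of header and delimiter handling:

```python
import csv
import io

rows = [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["name", "age"], delimiter=";")
writer.writeheader()    # corresponds to "header": True
writer.writerows(rows)
print(buf.getvalue())
```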
Python Code Options¶
```python
await storage.write_format(
    "utils.py",
    python_code,
    format_options={
        "validate_syntax": True,  # Validate syntax before writing
    },
)
```
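A syntax check of this kind can be implemented with the standard library's `ast` module; a minimal stand-in:

```python
import ast

def is_valid_python(source: str) -> bool:
    """Return True if source parses as Python, without executing it."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False

print(is_valid_python("def f():\n    return 1"))  # True
print(is_valid_python("def f(:"))                 # False
```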
Supported Formats¶
| Format | Extensions | Use Case |
|---|---|---|
| JSON | .json | Structured data, APIs |
| YAML | .yaml, .yml | Human-readable configs |
| TOML | .toml | Python projects |
| HOCON | .conf, .hocon | Complex hierarchical configs |
| CSV | .csv | Tabular data |
| Markdown | .md | Documentation |
| Python | .py | Code generation |
| Env | .env | Environment variables |
| Text | .txt | Plain text |
Cloud Storage URIs¶
```python
# Local filesystem
storage = Storage.from_uri("file:///data")
storage = Storage.from_uri("./data")  # Relative path

# Amazon S3
storage = Storage.from_uri("s3://bucket/path")

# Google Cloud Storage
storage = Storage.from_uri("gs://bucket/path")

# Azure Blob Storage
storage = Storage.from_uri("azure://container/path")
```
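Scheme-based provider selection can be sketched with `urllib.parse` (the scheme table here is illustrative; the library's actual dispatch may differ):

```python
from urllib.parse import urlparse

# Illustrative scheme-to-backend table; "" covers bare relative paths
BACKENDS = {"": "local", "file": "local", "s3": "s3", "gs": "gcs", "azure": "azure"}

def backend_for(uri: str) -> str:
    scheme = urlparse(uri).scheme
    try:
        return BACKENDS[scheme]
    except KeyError:
        raise ValueError(f"unsupported scheme: {scheme!r}") from None

print(backend_for("./data"))               # local
print(backend_for("s3://bucket/path"))     # s3
print(backend_for("azure://container/x"))  # azure
```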
Error Handling¶
```python
from dspu.io import FormatError, StorageError

try:
    # TOML requires a table (mapping) at the document root, so this fails
    await storage.write_format("config.toml", ["list", "at", "root"])
except FormatError as e:
    print(f"Format error: {e}")
    print(f"Suggestion: {e.suggestion}")

try:
    data = await storage.read("nonexistent.json")
except StorageError as e:
    print(f"Storage error: {e}")
```
Advanced Usage¶
Custom Format Registration¶
```python
from dspu.io import register_format

class XMLFormat:
    def write(self, obj, path):
        # Implementation
        ...

    def read(self, data, path):
        # Implementation
        ...

    def can_write(self, obj):
        return isinstance(obj, dict)

    @property
    def extensions(self):
        return [".xml"]

register_format("xml", XMLFormat)
```
Format Discovery¶
```python
from dspu.io import list_formats, list_extensions

print(f"Available formats: {list_formats()}")
print(f"Supported extensions: {list_extensions()}")
```
Best Practices¶
✅ DO:

- Use the Storage abstraction for portability
- Use format auto-detection from extensions
- Stream large files to save memory
- Use PathResolver for secure path handling
- Validate paths to prevent traversal attacks

❌ DON'T:

- Load entire large files into memory
- Construct paths with string concatenation
- Ignore format errors
- Hardcode cloud URIs (use config)
Troubleshooting¶
Import Errors¶
```shell
# If you see: ModuleNotFoundError: No module named 'yaml'
pip install pyyaml

# For cloud storage support
pip install 'dspu[io]'
```
Permission Errors¶
Format Errors¶
```python
from dspu.io import get_format, list_formats

# Check which formats are available
print(list_formats())

# Check whether your data is compatible with a given format
fmt = get_format("json")
if fmt.can_write(my_data):
    await storage.write_format("out.json", my_data)
```