I/O Examples

Practical examples for storage abstraction and multi-format file operations.

Overview

The I/O module provides a unified API for file operations across local and cloud storage with support for multiple file formats.

Examples

1. Basic File Operations

File: examples/io/01_basic_file_operations.py

Learn the fundamentals.

Topics:

- Creating a Storage instance
- Writing and reading files
- Checking file existence
- Listing files with patterns
- Deleting files
- Working with raw bytes

Run:

python examples/io/01_basic_file_operations.py
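
The same fundamentals (write, read, existence checks, pattern listing, raw bytes, delete) can be tried with plain pathlib before moving to the Storage abstraction. This is a stdlib sketch, not dspu code; the directory and file names are illustrative:

```python
from pathlib import Path

# Illustrative names; any writable directory works.
data_dir = Path("demo_data")
data_dir.mkdir(exist_ok=True)

f = data_dir / "hello.txt"
f.write_text("hello")                               # write
content = f.read_text()                             # read
exists_before = f.exists()                          # existence check
names = [p.name for p in data_dir.glob("*.txt")]    # pattern listing
raw = f.read_bytes()                                # raw bytes

f.unlink()                                          # delete
exists_after = f.exists()

print(content, exists_before, names, raw, exists_after)
data_dir.rmdir()
```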

2. Multi-Format Files

File: examples/io/02_multi_format_files.py

Master file format handling.

Topics:

- Writing to 9 different formats (JSON, YAML, TOML, HOCON, CSV, Markdown, Python, Env, Text)
- Auto-detection from file extensions
- Format-specific options (indentation, delimiters, etc.)
- Reading back from various formats
- Format override capabilities

Supported Formats:

- Structured: JSON, YAML, TOML, HOCON
- Text: Plain text, Python code, Markdown
- Tabular: CSV
- Config: .env files

Run:

python examples/io/02_multi_format_files.py
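
Auto-detection from file extensions boils down to a dispatch table keyed on the suffix. The sketch below illustrates the idea with two stdlib serializers; the `SERIALIZERS` table and `serialize` helper are hypothetical, not dspu internals (which cover nine formats):

```python
import json
from pathlib import Path

# Hypothetical extension-to-serializer table for illustration only.
SERIALIZERS = {
    ".json": lambda obj: json.dumps(obj, indent=2),
    ".txt": str,
}

def serialize(filename: str, obj) -> str:
    """Pick a serializer from the file extension, like write_format does."""
    ext = Path(filename).suffix.lower()
    if ext not in SERIALIZERS:
        raise ValueError(f"unsupported extension: {ext}")
    return SERIALIZERS[ext](obj)

print(serialize("note.txt", "hello"))             # hello
print(serialize("config.json", {"debug": True}))
```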

3. Path Resolution

File: examples/io/03_path_resolution.py

Work safely with file paths.

Topics:

- PathResolver for relative paths
- Resolving paths relative to source files
- Path traversal security checks
- Using basis directories
- Integration with Storage
- Static path validation

Run:

python examples/io/03_path_resolution.py
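
The traversal check this example covers can be sketched with plain pathlib: resolve the candidate against the basis directory and verify it never escapes it. The `is_within_basis` helper below is illustrative, not part of dspu:

```python
from pathlib import Path

def is_within_basis(basis: str, candidate: str) -> bool:
    """Return True only if `candidate` stays inside the `basis` directory."""
    base = Path(basis).resolve()
    target = (base / candidate).resolve()
    # Path.is_relative_to is available on Python 3.9+
    return target.is_relative_to(base)

print(is_within_basis("/srv/configs", "app.yaml"))        # True
print(is_within_basis("/srv/configs", "../secrets/key"))  # False
```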

4. Streaming Large Files

File: examples/io/04_streaming_large_files.py

Handle large files efficiently.

Topics:

- Streaming reads and writes
- Processing data in chunks
- Memory-efficient operations
- Transforming data while streaming
- CSV streaming example

Benefits:

- Handle files larger than RAM
- Lower memory footprint
- Start processing before download completes

Run:

python examples/io/04_streaming_large_files.py
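
The chunked pattern behind streaming reads can be mimicked with plain file IO: only one fixed-size chunk is ever held in memory. A stdlib sketch with an illustrative file name and the same 8 KiB chunk size used elsewhere in this page:

```python
from pathlib import Path

# Write a sample file, then re-read it in fixed-size chunks.
sample = Path("large_file.bin")
sample.write_bytes(b"x" * 20_000)

total = 0
chunks = 0
with sample.open("rb") as fh:
    while chunk := fh.read(8192):
        total += len(chunk)   # transform/process each chunk here
        chunks += 1

print(total, chunks)  # 20000 3
sample.unlink()
```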

5. Config Management

File: examples/io/05_config_management.py

Practical configuration management.

Topics:

- Multi-environment configs (dev, staging, production)
- Config file generation in multiple formats
- Merging configs from multiple sources
- Generating .env files
- Auto-generating documentation

Use Cases:

- Application configuration
- Environment-specific settings
- Config file templating

Run:

python examples/io/05_config_management.py
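
Merging configs from multiple sources usually means a recursive dictionary merge where later sources win on conflicts. The `deep_merge` helper below is an illustrative sketch of that idea, not a dspu API:

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge `override` into `base`; override wins on conflicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

defaults = {"database": {"host": "localhost", "port": 5432}, "debug": True}
production = {"database": {"host": "db.internal"}, "debug": False}

config = deep_merge(defaults, production)
print(config["database"])  # {'host': 'db.internal', 'port': 5432}
```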

6. Cloud Storage

File: examples/io/06_cloud_storage.py

Work with cloud storage providers.

Topics:

- Unified API for local, S3, GCS, Azure
- URI-based provider selection
- Multi-cloud patterns
- Data migration strategies
- Best practices

Supported Providers:

- Local filesystem
- Amazon S3
- Google Cloud Storage
- Azure Blob Storage

Run:

python examples/io/06_cloud_storage.py
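
URI-based provider selection amounts to dispatching on the URI scheme. The sketch below uses the stdlib urllib.parse to illustrate the idea; the `PROVIDERS` table and `provider_for` helper are hypothetical, not dspu internals:

```python
from urllib.parse import urlparse

# Hypothetical scheme-to-provider table mirroring the URIs shown above.
PROVIDERS = {
    "s3": "Amazon S3",
    "gs": "Google Cloud Storage",
    "azure": "Azure Blob Storage",
    "file": "Local filesystem",
    "": "Local filesystem",  # bare relative paths have no scheme
}

def provider_for(uri: str) -> str:
    """Map a storage URI to its backing provider by scheme."""
    return PROVIDERS.get(urlparse(uri).scheme, "unknown")

print(provider_for("s3://my-bucket/data"))  # Amazon S3
print(provider_for("./data"))               # Local filesystem
```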

Quick Start

Installation

# Basic I/O
pip install dspu

# With cloud storage support
pip install 'dspu[io]'

Basic Usage

from dspu.io import Storage

# Create storage instance
storage = Storage.from_uri("./data")

# Write and read
await storage.write("file.json", {"status": "success"})
data = await storage.read("file.json")

# Multi-format
await storage.write_format("config.yaml", {"database": {"host": "localhost"}})
await storage.write_format("data.csv", [{"name": "Alice", "age": 30}])

Common Patterns

Pattern 1: Reading Configuration

from dspu.io import Storage

storage = Storage.from_uri("./config")
config = await storage.read_format("app.yaml")

Pattern 2: Writing Multiple Formats

data = {"name": "MyApp", "version": "1.0.0"}

# Auto-detection from extension
await storage.write_format("config.json", data)
await storage.write_format("config.yaml", data)
await storage.write_format("config.toml", data)

Pattern 3: Streaming Large Files

async for chunk in storage.read_stream("large_file.csv", chunk_size=8192):
    process(chunk)

Pattern 4: Path Resolution

from dspu.io import PathResolver

resolver = PathResolver(__file__, basis="../configs")
config_path = resolver.resolve("app.yaml", check_exists=True)

Pattern 5: Cloud Storage

# Same API, different URI
local = Storage.from_uri("./data")
s3 = Storage.from_uri("s3://my-bucket/data")
gcs = Storage.from_uri("gs://my-bucket/data")

# All use identical methods
await local.write_format("data.json", data)
await s3.write_format("data.json", data)
await gcs.write_format("data.json", data)

Format Options

JSON Options

await storage.write_format(
    "config.json",
    data,
    format_options={
        "indent": 4,           # Pretty print with 4 spaces
        "sort_keys": True,     # Sort dictionary keys
        "ensure_ascii": False  # Allow unicode characters
    }
)

CSV Options

await storage.write_format(
    "data.csv",
    rows,
    format_options={
        "header": True,        # Include header row
        "delimiter": ",",      # Field delimiter
    }
)

Python Code Options

await storage.write_format(
    "utils.py",
    python_code,
    format_options={
        "validate_syntax": True  # Validate syntax before writing
    }
)

Supported Formats

Format    Extensions     Use Case
JSON      .json          Structured data, APIs
YAML      .yaml, .yml    Human-readable configs
TOML      .toml          Python projects
HOCON     .conf, .hocon  Complex hierarchical configs
CSV       .csv           Tabular data
Markdown  .md            Documentation
Python    .py            Code generation
Env       .env           Environment variables
Text      .txt           Plain text

Cloud Storage URIs

# Local filesystem
storage = Storage.from_uri("file:///data")
storage = Storage.from_uri("./data")  # Relative path

# Amazon S3
storage = Storage.from_uri("s3://bucket/path")

# Google Cloud Storage
storage = Storage.from_uri("gs://bucket/path")

# Azure Blob Storage
storage = Storage.from_uri("azure://container/path")

Error Handling

from dspu.io import FormatError, StorageError

try:
    await storage.write_format("config.toml", ["list", "at", "root"])
except FormatError as e:
    print(f"Format error: {e}")
    print(f"Suggestion: {e.suggestion}")

try:
    data = await storage.read("nonexistent.json")
except StorageError as e:
    print(f"Storage error: {e}")

Advanced Usage

Custom Format Registration

from dspu.io import register_format

class XMLFormat:
    def write(self, obj, path):
        # Implementation
        ...

    def read(self, data, path):
        # Implementation
        ...

    def can_write(self, obj):
        return isinstance(obj, dict)

    @property
    def extensions(self):
        return [".xml"]

register_format("xml", XMLFormat)

Format Discovery

from dspu.io import list_formats, list_extensions

print(f"Available formats: {list_formats()}")
print(f"Supported extensions: {list_extensions()}")

Best Practices

DO:

- Use the Storage abstraction for portability
- Use format auto-detection from extensions
- Stream large files to save memory
- Use PathResolver for secure path handling
- Validate paths to prevent traversal attacks

DON'T:

- Don't load entire large files into memory
- Don't construct paths with string concatenation
- Don't ignore format errors
- Don't hardcode cloud URIs (use config)

Troubleshooting

Import Errors

# If you see: ModuleNotFoundError: No module named 'yaml'
pip install pyyaml

# For cloud storage
pip install 'dspu[io]'

Permission Errors

# Ensure the output directory exists and is writable before writing
from pathlib import Path

output_dir = Path("./output")
output_dir.mkdir(parents=True, exist_ok=True)

Format Errors

# Check what formats are available
from dspu.io import get_format, list_formats
print(list_formats())

# Check if your data is compatible
fmt = get_format("json")
if fmt.can_write(my_data):
    await storage.write_format("out.json", my_data)

See Also