
Data Models Reference

Complete documentation of all dataclasses and Pydantic models used throughout Common Secretary Services.

Overview

All data models follow a consistent structure:

  • Type Safety: Strict type annotations with the Python typing module
  • Validation: Field validation in __post_init__ methods
  • Serialization: to_dict() and from_dict() methods for JSON conversion
  • Immutability: Many models use frozen=True for immutability
  • Slots: Performance optimization with slots=True where applicable

Model Categories

Base Models

Fundamental models used by all processors:

  • Base Models - BaseResponse, ErrorInfo, RequestInfo, ProcessInfo

Processor-Specific Models

Models for specific processors:

  • Audio Models - AudioResponse, TranscriptionResult, TranscriptionSegment
  • Video Models - VideoResponse, VideoProcessingResult, VideoSource
  • PDF Models - PDFResponse, PDFMetadata, PDFProcessingResult
  • ImageOCR Models - ImageOCRResponse, ImageOCRMetadata
  • Transformer Models - TransformerResponse, TemplateField, TemplateFields
  • Session Models - SessionResponse, SessionInput, SessionOutput, SessionData
  • Event Models - EventResponse, EventInput, EventOutput, EventData
  • Track Models - TrackResponse, TrackInput, TrackOutput, TrackData
  • Story Models - StoryResponse, StoryProcessorInput, StoryProcessorOutput
  • YouTube Models - YoutubeResponse, YoutubeMetadata, YoutubeProcessingResult

System Models

Models for system functionality:

Enums and Types

  • Enums - ProcessorType, ProcessingStatus, OutputFormat, EventFormat, PublicationStatus, LanguageCode
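As a rough illustration of why enums pair well with JSON serialization, the sketch below defines a string-backed enum. `ExampleStatus` and its values are assumptions made for this example; the actual members of ProcessingStatus are not listed on this page and may differ.

```python
import json
from enum import Enum


class ExampleStatus(str, Enum):
    """Hypothetical stand-in for an enum such as ProcessingStatus."""

    SUCCESS = "success"  # assumed value
    ERROR = "error"      # assumed member and value


# A str-backed enum member serializes as its plain string value.
payload = json.dumps({"status": ExampleStatus.SUCCESS})

# Looking up by value restores the enum member when reading JSON back.
restored = ExampleStatus(json.loads(payload)["status"])
```

Inheriting from `str` is one common way to keep enum members JSON-serializable without a custom encoder.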

Common Patterns

Response Format

All API responses follow this structure:

@dataclass(frozen=True)
class BaseResponse:
    status: ProcessingStatus
    request: RequestInfo
    process: Optional[ProcessInfo]
    error: Optional[ErrorInfo]
    data: Any  # Processor-specific data

Error Handling

Errors are structured as:

@dataclass
class ErrorInfo:
    code: str
    message: str
    details: Dict[str, Any]
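One way such a structure might be filled is from a caught exception. The sketch below redefines a local `ErrorInfo` with the same three fields so that it runs on its own; the helper `error_from_exception`, the default code `"PROCESSING_ERROR"`, and the `exception_type` detail key are hypothetical and not taken from the library.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ErrorInfo:
    """Local stand-in mirroring the fields shown above."""

    code: str
    message: str
    details: Dict[str, Any] = field(default_factory=dict)


def error_from_exception(exc: Exception, code: str = "PROCESSING_ERROR") -> ErrorInfo:
    """Hypothetical helper: map an exception onto the structured error format."""
    return ErrorInfo(
        code=code,
        message=str(exc),
        # Keep machine-readable context separate from the human-readable message.
        details={"exception_type": type(exc).__name__},
    )


try:
    raise ValueError("unsupported audio format")
except ValueError as exc:
    error = error_from_exception(exc, code="INVALID_INPUT")
```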

Process Tracking

Process information includes LLM tracking:

@dataclass
class ProcessInfo:
    id: str
    main_processor: str
    started: str
    completed: Optional[str]
    duration: Optional[float]
    llm_info: Optional[LLMInfo]  # LLM usage tracking
    is_from_cache: bool
    cache_key: Optional[str]
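The relationship between started, completed, and duration can be sketched as follows. This assumes the timestamps are ISO 8601 strings (as in the usage example on this page) and that duration is measured in seconds; neither is confirmed here, and the helper names are hypothetical.

```python
from datetime import datetime
from typing import Optional


def parse_utc(timestamp: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    # datetime.fromisoformat() only accepts 'Z' from Python 3.11 onward,
    # so normalize it to an explicit offset first.
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def duration_seconds(started: str, completed: Optional[str]) -> Optional[float]:
    """Return elapsed seconds, or None while the process is still running."""
    if completed is None:
        return None
    return (parse_utc(completed) - parse_utc(started)).total_seconds()
```

Returning None for an unfinished process matches the Optional types of completed and duration shown above.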

Usage Examples

Creating a Response

from src.core.models.base import BaseResponse, RequestInfo, ProcessInfo
from src.core.models.enums import ProcessingStatus

response = BaseResponse(
    status=ProcessingStatus.SUCCESS,
    request=RequestInfo(
        processor="audio",
        timestamp="2024-01-01T00:00:00Z"
    ),
    process=ProcessInfo(
        id="process-123",
        main_processor="audio",
        started="2024-01-01T00:00:00Z",
        # Optional fields are passed explicitly to match the definition above
        completed=None,
        duration=None,
        llm_info=None,
        is_from_cache=False,
        cache_key=None
    ),
    error=None,  # No error on success
    data={"transcription": "..."}
)

Serialization

# Convert to dictionary
response_dict = response.to_dict()

# Convert to JSON
import json
json_str = json.dumps(response_dict)

Deserialization

# From dictionary
response = BaseResponse.from_dict(response_dict)

# From JSON
import json
response_dict = json.loads(json_str)
response = BaseResponse.from_dict(response_dict)
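To show how a nested model might survive the full JSON round trip, here is a self-contained sketch. `MiniRequest` and `MiniResponse` are simplified, hypothetical stand-ins for RequestInfo and BaseResponse; the real from_dict() implementations may rebuild nested objects differently.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class MiniRequest:
    """Hypothetical stand-in for RequestInfo."""

    processor: str
    timestamp: str


@dataclass(frozen=True)
class MiniResponse:
    """Hypothetical stand-in for BaseResponse with one nested model."""

    status: str
    request: MiniRequest
    data: Any = None

    def to_dict(self) -> Dict[str, Any]:
        # asdict() recurses into nested dataclasses.
        return asdict(self)

    @classmethod
    def from_dict(cls, payload: Dict[str, Any]) -> "MiniResponse":
        # Nested dicts must be turned back into their model classes by hand.
        return cls(
            status=payload["status"],
            request=MiniRequest(**payload["request"]),
            data=payload.get("data"),
        )


def round_trip(response: MiniResponse) -> MiniResponse:
    """Serialize to a JSON string and back again."""
    return MiniResponse.from_dict(json.loads(json.dumps(response.to_dict())))


original = MiniResponse(
    status="success",
    request=MiniRequest(processor="audio", timestamp="2024-01-01T00:00:00Z"),
    data={"transcription": "..."},
)
restored = round_trip(original)
```

Comparing `restored` with `original` is a quick way to check that no field is lost or changes type during serialization.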