Data Module

This module provides data processing operations including batching and SQL transformations.

Module Overview

  • Purpose: Data processing operations including batching and SQL transformations
  • Architecture: Two-file design
    • batching.sh: Batching logic for grouping INSERT statements into efficient API batches
    • transform.sh: SQL transformation documentation and extension point

Dependencies

  • External commands: None
  • Internal modules:
    • src/lib/utils/core.sh: die() function

Batching Functions (src/lib/data/batching.sh)

batch_inserts(batch_size, table_name, force_overwrite)

Groups INSERT statements into batches for efficient API calls. Validates that the batch size is a positive integer, reads INSERT statements line by line from stdin, accumulates them until the batch size is reached, and outputs each complete batch as a NUL-delimited string.
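The accumulate-and-flush loop described above can be sketched as follows. This is an illustrative reimplementation, not the actual `batch_inserts` from `batching.sh`; it omits the `table_name` and `force_overwrite` parameters and uses a plain `return` instead of the module's `die()` so it is self-contained.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the batching loop; the real batch_inserts
# in src/lib/data/batching.sh may differ.
batch_inserts_sketch() {
  local batch_size=$1
  # Validate that the batch size is a positive integer
  # (the real module reports this via die() from core.sh).
  [[ $batch_size =~ ^[1-9][0-9]*$ ]] || { echo "invalid batch size: $batch_size" >&2; return 1; }

  local batch="" count=0 line
  while IFS= read -r line; do
    batch+="$line"$'\n'
    (( ++count ))
    if (( count == batch_size )); then
      printf '%s\0' "$batch"   # emit a complete batch, NUL-terminated
      batch="" count=0
    fi
  done
  # Flush a partial final batch, if any
  (( count > 0 )) && printf '%s\0' "$batch"
  return 0
}
```

Piping five statements through with a batch size of 2 yields three NUL-delimited batches: two full batches and one partial final batch.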

Batching Strategy

Default Batch Size

  • Default: 100 rows per batch
  • Configurable: Via --batch-size flag in main script
  • Rationale: Balance between API call overhead and payload size limits

NUL-Delimited Output Format

  • Format: Each batch is output as a NUL-delimited string (\0)
  • Rationale: Batches can be read safely with read -r -d '', even when the SQL contains embedded newlines, quotes, and other special characters

Force Overwrite Pattern

Deletes existing rows before inserting, to handle updates and keep syncs idempotent. During batching, Z_PK values are collected from the INSERT statements and a DELETE FROM table WHERE Z_PK IN (...) statement is prepended to each batch.
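A minimal sketch of that transformation: `prepend_delete` is a hypothetical helper, and its sed pattern assumes Z_PK is the first column in each VALUES list; the actual extraction logic in `batching.sh` may differ.

```shell
#!/usr/bin/env bash
# Hypothetical force-overwrite helper: collect Z_PK values from a batch
# of INSERT statements and prepend a matching DELETE.
prepend_delete() {
  local table=$1 batch=$2 pks
  # Extract the first VALUES column (assumed to be Z_PK) from each INSERT
  pks=$(printf '%s' "$batch" \
    | sed -n "s/^INSERT INTO $table VALUES (\([0-9]*\),.*/\1/p" \
    | paste -sd, -)
  printf 'DELETE FROM %s WHERE Z_PK IN (%s);\n%s' "$table" "$pks" "$batch"
}
```

For a batch inserting rows 1 and 2 into `notes`, the output begins with `DELETE FROM notes WHERE Z_PK IN (1,2);`, followed by the original INSERT statements.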

Integration Guidelines

Sourcing Order

Source the data module after core utilities and logging, but before the cloudflare and bear modules.
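An illustrative sourcing block (the `logging.sh`, `cloudflare`, and `bear` module paths are assumed; only `src/lib/utils/core.sh` and the data module paths appear in this document):

```shell
# Core utilities and logging first: batching.sh needs die() from core.sh
source src/lib/utils/core.sh
source src/lib/utils/logging.sh    # path assumed

# Data module
source src/lib/data/batching.sh
source src/lib/data/transform.sh

# cloudflare and bear modules are sourced after the data module
```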

Global Variable Initialization

Callers must initialize this variable before using module functions:

FORCE_OVERWRITE=0  # 0 = insert only; 1 = delete matching Z_PK rows before inserting