Data Module
This module provides data processing operations including batching and SQL transformations.
Module Overview
- Purpose: Data processing operations including batching and SQL transformations
- Architecture: Two-file design
  - batching.sh: Batching logic for grouping INSERT statements into efficient API batches
  - transform.sh: SQL transformation documentation and extension point
Dependencies
- External commands: None
- Internal modules:
  - src/lib/utils/core.sh: die() function
Batching Functions (src/lib/data/batching.sh)
batch_inserts(batch_size, table_name, force_overwrite)
Groups INSERT statements into batches for efficient API calls. Validates that the batch size is a positive integer, reads INSERT statements line by line from stdin, accumulates them until the batch size is reached, and outputs each complete batch as a NUL-delimited string.
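The core loop can be sketched as follows. This is a minimal illustration of the behavior described above, not the actual implementation in src/lib/data/batching.sh; the table_name and force_overwrite parameters are omitted here for brevity.

```shell
# Sketch of the batching loop: validate the batch size, accumulate
# stdin lines, and emit each full batch as one NUL-delimited string.
batch_inserts() {
  local batch_size="$1"
  # Reject anything that is not a positive integer.
  case "$batch_size" in
    ''|*[!0-9]*|0)
      echo "batch size must be a positive integer" >&2
      return 1
      ;;
  esac

  local batch="" count=0 line
  while IFS= read -r line; do
    batch="${batch}${line}"$'\n'
    count=$((count + 1))
    if [ "$count" -eq "$batch_size" ]; then
      printf '%s\0' "$batch"   # emit completed batch, NUL-terminated
      batch=""
      count=0
    fi
  done
  # Flush any partial final batch.
  [ -n "$batch" ] && printf '%s\0' "$batch"
  return 0
}
```

With a batch size of 2, three input statements produce two NUL-delimited batches: one of two statements and one holding the remainder.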
Batching Strategy
Default Batch Size
- Default: 100 rows per batch
- Configurable: Via the --batch-size flag in the main script
- Rationale: Balance between API call overhead and payload size limits
NUL-Delimited Output Format
- Format: Each batch is output as a NUL-delimited string (\0)
- Rationale: Safe binary processing with read -r -d '' to handle SQL containing newlines, quotes, and special characters
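A consumer of this output can read batches safely with `read -r -d ''`, as sketched below; the printf here stands in for the batching function's actual output stream.

```shell
# Consume NUL-delimited batches. read -r -d '' reads up to each NUL
# byte, so embedded newlines, quotes, and other special characters in
# the SQL survive intact.
count=0
while IFS= read -r -d '' batch; do
  count=$((count + 1))
  printf 'batch %d is %d bytes\n' "$count" "${#batch}"
done < <(printf 'INSERT A;\0INSERT B;\0')   # simulated batch stream
```

Because the loop runs in the current shell (via process substitution rather than a pipe), variables set inside it remain visible afterwards.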
Force Overwrite Pattern
Deletes existing rows before inserting, to handle updates and keep syncs idempotent. During batching, Z_PK values are collected from the INSERT statements, and a DELETE FROM table WHERE Z_PK IN (...) statement is prepended to each batch.
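One way to build that DELETE prefix is sketched below. The prepend_delete helper is illustrative, not part of the module, and it assumes Z_PK is the first value in each INSERT's VALUES list; the real extraction in batching.sh may parse differently.

```shell
# Sketch: collect Z_PK values from a batch of INSERT statements and
# prepend a matching DELETE so re-running the sync overwrites rows.
prepend_delete() {
  local table="$1" batch="$2" pks
  # Take the first numeric value after each "VALUES (" as the Z_PK.
  pks=$(printf '%s\n' "$batch" \
        | sed -n 's/.*VALUES *(\([0-9][0-9]*\),.*/\1/p' \
        | tr '\n' ',')
  pks=${pks%,}   # drop trailing comma
  if [ -n "$pks" ]; then
    printf 'DELETE FROM %s WHERE Z_PK IN (%s);\n%s' "$table" "$pks" "$batch"
  else
    printf '%s' "$batch"
  fi
}
```

For a batch inserting rows with Z_PK 7 and 9 into a table ZNOTE (a made-up table name), the emitted batch starts with `DELETE FROM ZNOTE WHERE Z_PK IN (7,9);`.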
Integration Guidelines
Sourcing Order
Source the data module after core utilities and logging, but before cloudflare and bear modules.
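The ordering can be demonstrated with stub modules in a temporary directory. Only src/lib/utils/core.sh and src/lib/data/batching.sh are paths named by this document; everything else here is a placeholder.

```shell
# Demonstrate the sourcing order with stub files: core utilities first
# (so die() exists), then the data module, then dependents.
workdir=$(mktemp -d)
mkdir -p "$workdir/src/lib/utils" "$workdir/src/lib/data"
echo 'die() { printf "fatal: %s\n" "$*" >&2; exit 1; }' \
  > "$workdir/src/lib/utils/core.sh"
echo ': # batching.sh assumes die() is already defined' \
  > "$workdir/src/lib/data/batching.sh"

source "$workdir/src/lib/utils/core.sh"     # 1. core utilities (die())
source "$workdir/src/lib/data/batching.sh"  # 2. data module
# 3. cloudflare and bear modules would be sourced here, after data
```

Sourcing in this order guarantees that die() is defined before any data-module function can call it, and that batching functions exist before the cloudflare and bear modules reference them.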
Global Variable Initialization
The caller must initialize the following variable before using module functions:
FORCE_OVERWRITE=0 # or 1
Navigation
- Architecture Overview - High-level architecture
- Modular Architecture - Module hierarchy and dependencies
- Bear Module - Bear database operations
- Cloudflare D1 Module - D1 API integration
- Sync Module - Sync orchestration