GTM Pipeline Comparison
LeadGrow Orchestrator vs. kkrlstrm/gtm-pipeline

A critical, source-verified architecture comparison — deep on observability, cost, agents, durability, feedback loops, quality gates, and storage. Every claim below was checked against cloned source by independent recon agents on 2026-06-12; this supersedes the first-pass audit (PR #102) and corrects three material errors in it.

LeadGrow stages

External stages

LeadGrow LOC: ~41k (31.8k py · 9.6k ts)

External LOC: ~2.5k py + workflows

Test files: 282 vs selftest.sh

Analysis: 2026-06-12 · 3 recon agents

⚠ Corrections to the first-pass audit (PR #102)

Cross-run learning was backwards. #102 said we have "zero cross-run intelligence" and told us to copy a filter_intelligence.json keyword loop from the external repo. Verified: that file/loop exists in neither external repo, and we already run a live cross-run loop via the Nexus knowledge graph (copy generation reads past performance before writing; reply outcomes are deposited back). This is our advantage, not a gap.
"Via negativa, 97% accuracy" — no such metric exists in the external repo. There is an exclusions-first qualifier, but the number was invented.
External state is JSONL/Postgres, not "YAML files." Minor, but it changes the storage comparison.
Scale restated: ~41k LOC (not 26k); the "fragile single-SQLite ThreadPool" claim is unverified and softened below.

01 Architecture at a glance

● LeadGrow GTM Orchestrator

production python pkg + cli · mid-migration to ts harness

01-discover · 02a-clay-format · 02b-clay-enrich · 02c-join-back · 02.5-domain-mx · 03-qualify · 03b-enrich-co · 04-people · 04b-waterfall · 04c-mx · 04d-enrich-contact · 06-segment · 07-apply (copy) · 07b-personalize · 08-qa ×3 · 08.5-scorecard · 09-launch ×8

End-to-end: plain-English brief → launched Bison campaigns with copy, personalization, QA [orchestrator.py:86-104]
Copy framework (0–100 rubric, proximity auditor, 300+-phrase spam registry) + QA + 11-dim scorecard hard-block
Live cross-run learning via Nexus KG (retrieve-before-write) [copy_sub_runner.py:436 · scoring/retrospective.py:171]
State: per-campaign state.sqlite (v13, WAL) + Supabase mirror; Next.js/Supabase-Realtime dashboard
TS migration partial: 15/18 stages have TS handlers; copy, scorecard, launch are Python-bridge only

● kkrlstrm/gtm-pipeline

claude code plugin · markdown agents + js workflows · ~2k loc

company_search · company_enrich · people_search · qualify · email_enrich · phone_enrich · activate

Provider-agnostic manifest seam: swap providers by editing one YAML, never code [providers/*/manifest.yaml · gtm.config.yaml]
Model-routing by stage economics: Sonnet for research, Haiku for high-volume scoring [score-leads.js · company-researcher.md]
Role/segment expansion frozen as a gate-reviewed artifact at plan time [orchestrator/agent.md §1a]
Stops at CSV export / draft sequencer push — no copy, QA, personalization, or launch
No feedback loop, no automated tests beyond a smoke script, no UI

Below, each dimension Mitchell flagged gets a mechanics-level side-by-side. The pattern repeats: we have the deeper, production implementation; they have the cleaner seam. Both lessons matter.

02 Observability · Winner: LeadGrow

LeadGrow

structured + traced + live UI

structlog JSON events throughout; append-only campaign.jsonl with stage.completed {rows_out, duration_ms}
Optional Langfuse trace/span wrapping every LLM stage under a campaign trace [adapters/langfuse_obs.py; no-op if key absent]
generateRetro() emits a funnel table + anomalies (row-drop, verifier-fail) [agents/retro.ts]
Live Next.js + Supabase-Realtime dashboard (Vercel) subscribed to stage_status/checkpoints
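
A minimal sketch of the append-only event log plus a funnel/anomaly pass over it, using only the standard library rather than structlog. The `rows_in` field, the function names, and the 50% drop threshold are assumptions for illustration; the source only confirms `stage.completed {rows_out, duration_ms}` and a row-drop anomaly in the retro.

```python
import json
import time
from pathlib import Path


def log_stage_completed(log_path: Path, stage: str, rows_in: int,
                        rows_out: int, duration_ms: int) -> None:
    """Append one stage.completed event as a single JSON line."""
    event = {
        "event": "stage.completed",
        "stage": stage,
        "rows_in": rows_in,      # assumption: not confirmed as a logged field
        "rows_out": rows_out,
        "duration_ms": duration_ms,
        "ts": time.time(),
    }
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(event) + "\n")


def funnel(log_path: Path, max_drop: float = 0.5) -> list[dict]:
    """Per-stage funnel; flags stages that drop more than max_drop of input."""
    rows = []
    for line in log_path.read_text(encoding="utf-8").splitlines():
        event = json.loads(line)
        if event.get("event") != "stage.completed":
            continue
        rows_in = event["rows_in"]
        drop = 1 - event["rows_out"] / rows_in if rows_in else 0.0
        rows.append({
            "stage": event["stage"],
            "rows_out": event["rows_out"],
            "drop": round(drop, 3),
            "anomaly": drop > max_drop,  # hypothetical threshold
        })
    return rows
```

Because the log is append-only JSONL, the funnel can be rebuilt at any time from the file alone, without touching stage state.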

External

cli + stderr, no ui

storage/cli.py list_summary — per-stage counts on demand
Per-contact email_waterfall_log / phone_waterfall_log columns (which providers were tried)
No run-ID trace, no span timing, no metrics, no UI — you read JSONL/CSV by hand

Read: We're clearly ahead. The one idea worth noting from them: the per-row waterfall log (which provider answered, per contact) is a clean, cheap observability primitive our enrichment stages could persist verbatim for post-hoc fill-rate analysis.

03 Cost tracking · Winner: External (on discipline)

LeadGrow

richer attribution — but reactive + a live bug

Python: cost_accumulator + lead_cost tables give per-stage and per-lead token/USD attribution [state/store.py:98 · adapters/runtime.py]
llm_rate_cards.yaml pre-flight projections; costs/estimate CLI
Budget is gated reactively: campaign_budget_exceeded fires only after the budget has been exceeded
Bug: TS harness OpenAIPromptRunner writes costUsd: 0 on every call, so TS-native runs report $0.00 [agents/openai-prompt-runner.ts:115]

External

proactive, pre-spend

Every adapter has an --estimate mode returning credit cost without spending [verified in selftest.sh]
Gate #3 estimates total cost before any paid enrichment; auto-proceeds under thresholds, else asks
No persisted per-run ledger; credits never summed/stored post-run

Read: We measure cost more richly but spend first and gate later; they gate before spending. Their pre-spend estimate gate is the carry-over (rec #4) — and fixing the TS costUsd:0 bug is a one-line prerequisite so the gate has real numbers to read.
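
The arithmetic behind a non-zero cost record is small. The sketch below prices a call from its token usage and projects a stage total. It is written in Python for consistency with the other examples here, although the bug itself lives in the TS runner. `RateCard`, both function names, and any prices passed in are hypothetical; the real schema of llm_rate_cards.yaml is not shown in this review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RateCard:
    """USD per one million tokens (caller supplies the actual prices)."""
    input_per_mtok: float
    output_per_mtok: float


def call_cost_usd(card: RateCard, input_tokens: int, output_tokens: int) -> float:
    """Price one LLM call from token usage instead of recording a constant 0."""
    total = input_tokens * card.input_per_mtok + output_tokens * card.output_per_mtok
    return total / 1_000_000


def project_stage_cost(card: RateCard, rows: int,
                       avg_input_tokens: int, avg_output_tokens: int) -> float:
    """Pre-flight projection: rows times the average per-call cost."""
    return rows * call_cost_usd(card, avg_input_tokens, avg_output_tokens)
```

The same function can serve both the post-call ledger and the pre-flight projection, which keeps the two numbers comparable.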

04 Agents & model economics · Winner: External (on routing)

LeadGrow

role-typed agents, mostly one model tier

Typed roles: Implementer (Sonnet), Verifier (deterministic, SHA-256 evidence), Reviewer, Closer (Haiku), Retro (Sonnet) [CONTEXT.md]
Inner loop: Implementer → gate → review → retry (maxRevisions=2) → escalate [inner-loop.ts]
LLM stages mostly gpt-4.1-mini, copy on gpt-4.1; no systematic cheap-model routing for high-volume row work
Fan-out via ThreadPoolExecutor (≤8) for personalize/copy; domain-caching (1 call per unique domain)
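
One plausible reading of the inner loop above, sketched in Python: the Implementer produces output, a deterministic gate accepts or rejects it, the Reviewer's feedback feeds the next attempt, and the loop escalates once the revision budget is spent. The callable signatures and `LoopResult` are assumptions; inner-loop.ts may order or name these steps differently.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class LoopResult:
    status: str     # "accepted" or "escalated"
    output: str
    revisions: int  # retries spent before the loop ended


def inner_loop(implement: Callable[[str, str], str],
               gate: Callable[[str], bool],
               review: Callable[[str], str],
               task: str,
               max_revisions: int = 2) -> LoopResult:
    """Implement, gate, review, retry; escalate when revisions run out."""
    feedback = ""
    output = ""
    # One initial attempt plus up to max_revisions retries.
    for attempt in range(max_revisions + 1):
        output = implement(task, feedback)
        if gate(output):
            return LoopResult("accepted", output, attempt)
        feedback = review(output)
    return LoopResult("escalated", output, max_revisions)
```

With `max_revisions=2` the Implementer runs at most three times, so the worst-case cost of a row is bounded before a human sees it.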

External

model routing as a first-class lever

Research/sourcing on Sonnet, 15-per-batch scoring on Haiku, hardcoded per workflow [score-leads.js:26 · discover-companies.js:71]
Subagent fan-out keeps intermediate research out of the main context (token-clean)
Non-deterministic prompt behavior; no verifier/gate equivalent

Read: Their pattern — route by stage economics (expensive reasoning → Sonnet; high-volume classification → Haiku/cheap; keep intermediates off the main context) — is a 3–5× cost lever on exactly our heaviest row-by-row stages (qualify, segment, personalize). This is the most under-weighted finding in #102 → rec #3.

05 Durability & resume · Winner: LeadGrow

LeadGrow

real per-row resume + integrity invariant

state.sqlite per-row row_state; done|cached skipped on resume, failed re-enqueued [state/store.py]
Row-conservation invariant: every stage must satisfy rows_in = kept + rejected or throw [row_conservation.py + agents/row-conservation.ts]
Brief frozen by SHA-256 — pipeline refuses to resume if the brief changed
TS harness has no SamplingGate equivalent; TS CostTracker is in-memory (crash loses cost)
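
The two integrity checks in this list are simple to state in code. The following is a sketch under assumed names (`RowConservationError`, `assert_brief_unchanged`), not the contents of row_conservation.py: one function enforces rows_in = kept + rejected, the other refuses to resume when the brief's SHA-256 digest no longer matches the frozen one.

```python
import hashlib


class RowConservationError(Exception):
    """Raised when a stage loses or invents rows."""


def assert_row_conservation(stage: str, rows_in: int, kept: int, rejected: int) -> None:
    """Every input row must end up either kept or rejected."""
    if rows_in != kept + rejected:
        raise RowConservationError(
            f"{stage}: rows_in={rows_in} but kept+rejected={kept + rejected}"
        )


def brief_digest(brief_text: str) -> str:
    """SHA-256 hex digest of the brief, recorded when the run is frozen."""
    return hashlib.sha256(brief_text.encode("utf-8")).hexdigest()


def assert_brief_unchanged(frozen_digest: str, brief_text: str) -> None:
    """Refuse to resume if the brief was edited after the freeze."""
    if brief_digest(brief_text) != frozen_digest:
        raise RuntimeError("brief changed since freeze; refusing to resume")
```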

External

idempotent writes, no real resume

Idempotent upserts (normalized LinkedIn URL); atomic file writes (write-then-rename)
No mid-stage resume. Crash mid-batch → re-run the stage (idempotent, but no "pick up where it left off")
No per-row error state, no retry queue, no run history
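
For reference, the external repo's two durability primitives can be sketched as below. This is an illustration, not their code: the normalization rules and function names are assumptions, and only the general techniques (upsert keyed on a normalized LinkedIn URL, write-then-rename) come from the recon.

```python
import json
import os
import tempfile
from pathlib import Path


def normalize_linkedin_url(url: str) -> str:
    """Hypothetical rules: lowercase, drop scheme, 'www.', query, trailing slash."""
    u = url.strip().lower().split("?", 1)[0]
    for prefix in ("https://", "http://"):
        if u.startswith(prefix):
            u = u[len(prefix):]
    if u.startswith("www."):
        u = u[4:]
    return u.rstrip("/")


def upsert_contact(index: dict, contact: dict) -> None:
    """Idempotent: re-running a stage merges into the same key."""
    key = normalize_linkedin_url(contact["linkedin_url"])
    index[key] = {**index.get(key, {}), **contact}


def atomic_write_json(path: Path, payload: dict) -> None:
    """Write to a temp file in the same directory, then rename over the target."""
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(payload, fh)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp, path)  # atomic when source and target share a filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

This explains why a crashed batch is safe to re-run there, and also why it is not a resume: nothing records which rows were already attempted.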

Read: We're materially ahead — per-row resume + row-conservation is genuine production hardening they don't have. The open item is internal: finish porting durability (SamplingGate, persisted cost) into the TS harness.

06 Feedback loops · Winner: LeadGrow

LeadGrow

a real, wired retrieve-then-deposit loop

CopySubRunner._query_nexus_intelligence() queries the KG for "copy angles · campaign performance · segment" before writing copy [copy_sub_runner.py:433-452]
score-replies → deposit_retrospective_intelligence() → TierEngine → Nexus KG closes the loop with reply outcomes [scoring/retrospective.py:171]
Conditional on NEXUS_API_KEY; TS harness has zero Nexus wiring; signal-bank runs are manual + disconnected
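
The shape of a retrieve-then-deposit loop, reduced to a toy. `InMemoryKG` is a hypothetical stand-in for the Nexus knowledge graph and none of its methods reflect the real Nexus API; the point is only the ordering: read past outcomes before writing, deposit new outcomes afterwards.

```python
from collections import defaultdict


class InMemoryKG:
    """Hypothetical stand-in: segment -> list of (angle, reply_rate) outcomes."""

    def __init__(self) -> None:
        self._outcomes: dict[str, list[tuple[str, float]]] = defaultdict(list)

    def deposit(self, segment: str, angle: str, reply_rate: float) -> None:
        """Close the loop: record how an angle performed for a segment."""
        self._outcomes[segment].append((angle, reply_rate))

    def query(self, segment: str, top_k: int = 3) -> list[str]:
        """Retrieve angles for a segment, best mean reply rate first."""
        by_angle: dict[str, list[float]] = defaultdict(list)
        for angle, rate in self._outcomes.get(segment, []):
            by_angle[angle].append(rate)
        ranked = sorted(by_angle,
                        key=lambda a: sum(by_angle[a]) / len(by_angle[a]),
                        reverse=True)
        return ranked[:top_k]


def choose_angle(kg: InMemoryKG, segment: str, default_angle: str) -> str:
    """Retrieve-before-write: prefer the best past angle, else the default."""
    prior = kg.query(segment, top_k=1)
    return prior[0] if prior else default_angle
```

A TS-side port would need the same two calls in the same order, which is the gap this section flags.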

External

none automated

Cross-campaign suppression via opt-in master_contacts (you sync it from your sender)
No automated learning. Rubric/persona weights are static between campaigns (recon §6: "no loop that auto-adjusts ICP scoring")

Read — this is the corrected headline. #102 had it inverted. We have the live feedback loop; they don't. The action is not "build one" — it's extend our existing Nexus loop into the TS harness and re-connect signal-bank (internal gaps, §13).

07 Quality gates · Winner: LeadGrow

LeadGrow

automated scoring + structural launch gate

11-dimension scorecard (8 list + 3 copy, weighted); grade F (<0.40) = exit 7 hard block [stages/scorecard.py]
QA: rendered semantic QA + mechanical spam/variable scan + brace conversion [qa_rendered/qa_mechanical/qa_convert]
SamplingGate pre-fan-out quality floor (grade C+); launch gate needs operator_confirms + signature
7 human checkpoints, each appended to decisions.md (full audit trail)
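
A sketch of how a weighted scorecard maps to a hard-block exit code. Only the F cutoff (<0.40) and exit code 7 come from this review; the weights, the other grade bands, and the function names are placeholders, not the contents of stages/scorecard.py.

```python
EXIT_HARD_BLOCK = 7  # from the review: grade F exits with code 7


def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean of per-dimension scores, each expected in [0, 1]."""
    total_weight = sum(weights.values())
    return sum(scores[dim] * w for dim, w in weights.items()) / total_weight


def grade(score: float) -> str:
    """Only the F boundary is sourced; the other bands are placeholders."""
    if score < 0.40:
        return "F"
    if score < 0.55:
        return "D"
    if score < 0.70:
        return "C"
    if score < 0.85:
        return "B"
    return "A"


def launch_exit_code(score: float) -> int:
    """0 lets the pipeline continue; 7 blocks the launch."""
    return EXIT_HARD_BLOCK if grade(score) == "F" else 0
```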

External

4 lean human gates, no scoring

Cost/plan judgment up front (Gate #1 plan + Gate #2 probe-cost) — proactive by design
Gate #3 qualify review (QUALIFY/MAYBE/SKIP); Gate #4 activation requires typed "activate"
No automated quality scoring, no copy/QA gates (it generates no copy)

Read: We gate quality automatically and deeply; they gate spend earlier. Adopt their front-loaded cost gate (rec #4) without giving up our automated quality gates.

08 Storage & Supabase · Winner: LeadGrow

Dimension	LeadGrow	External
Primary store	Per-campaign `state.sqlite` (v13, WAL, in-code migrations)	JSONL (`contacts.jsonl` + `index.json`) or Postgres
Shared / cloud	Supabase `leadgrow_knowledge`: `global_companies`, `global_contacts`, `campaign_contacts`, `enrichments` + dashboard tables	Optional Postgres; `master_contacts` xref (manually synced)
Row tracking	`row_state` per stage (pending/failed/done/skipped + error)	`stage` field per contact via `advance_stage`
Cross-campaign dedup	Supabase mirror, cross-client	`crossref_master` (local backend always returns "new")
Crash recovery	SQLite WAL + row-state retry + `StateReconciler`	Atomic file writes; re-run stage

Read: Our Supabase mirror + per-row SQLite is a richer, multi-client store. Their edge is portability — JSONL "runs anywhere with zero infra," which is why their seam is cleaner (next section).

09 Provider abstraction · Winner: External

LeadGrow

static registry, config-tunable

Enrichment CATALOG: 12 EnrichmentSpec entries (clay | runtime | apify), tunable via spec fields + config/enrichment_providers.yaml [stages/enrichments/registry.py]
Adding a provider = a code-level CATALOG entry + a provider class; no declarative manifest/field-map seam
TS core executeProvider() shared retry/backoff exists, but only millionverifier is ported

External

4-layer declarative seam

manifest.yaml per provider: auth.env, request_template, field_map (provider field names live only here, never in agent prompts) [providers/*/manifest.yaml]
gtm.config.yaml waterfalls = the only place a provider is chosen; missing key → silently skipped (BYOK shaping)
Quirks ("encode quirks as data so the agent stays generic") + uniform adapter.py --capability contract

Read: Their manifest seam is the single best idea in the repo. We have a registry — so this is an upgrade (declarative manifest + field-map layer on top of CATALOG), not a rewrite. Rec #2.

10 Feature matrix (verified)

Capability	LeadGrow	External	Winner
End-to-end → launched campaigns	●	○ (CSV/draft)	LeadGrow
Copy generation framework	●	○	LeadGrow
QA + iterative fixes	●	○	LeadGrow
11-dim scorecard hard block	●	○	LeadGrow
Per-lead personalization loop	●	○	LeadGrow
Per-row resume + conservation invariant	●	◐	LeadGrow
Cross-run learning loop	● (Nexus)	○	LeadGrow ⬅ #102 had this inverted
Live observability UI	●	○	LeadGrow
Per-lead cost attribution (Python)	●	○	LeadGrow
Provider manifest / config-swap	◐ (registry)	●	External
Model-routing by stage economics	○	●	External
Role/segment title expansion	◐ (literal-ish)	●	External
Pre-spend cost gate	◐ (reactive)	●	External
BYOK graceful pipeline shaping	○	●	External
Zero-infra portability	○	●	External

11 Carry-overs we should make (corrected & ranked)

Filtered to what's verified real in the external repo and genuinely missing/weaker in ours. The #102 "cross-run intelligence" recommendation is removed — we already have it.

Rec #1: Role / segment title expansion as a gate-reviewed frozen artifact

High impact · High feasibility · ~2–3 days · Low risk

"Target CFOs" → an explicit equivalence class (CFO / Chief Financial Officer / VP Finance / Head of Finance / Controller / Treasurer). The external pipeline infers it once at brief time, shows it at Gate #1 with provenance (inferred vs from-context), the operator edits it, and it's frozen into the run — no stage re-infers. Prevents silent under-sourcing from literal title matching.

How: expansion step in brief capture → frozen expansion.yaml read by 03-qualify + 04-people; surface at the first checkpoint. ~200 lines + a module.
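
A sketch of what the frozen artifact could look like in memory. `TitleExpansion`, its fields, and the substring matcher are hypothetical, and the digest uses JSON rather than the proposed expansion.yaml to stay within the standard library. The title list is the CFO example from the paragraph above.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class TitleExpansion:
    """One role expanded to an explicit equivalence class, immutable once built."""
    role: str
    titles: tuple[str, ...]
    provenance: str  # "inferred" or "from-context"

    def digest(self) -> str:
        """Stable hash so later stages can verify they read the frozen version."""
        payload = json.dumps(
            {"role": self.role, "titles": sorted(self.titles),
             "provenance": self.provenance},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def matches(self, title: str) -> bool:
        """Simplistic case-insensitive substring match, for illustration only."""
        t = title.strip().lower()
        return any(candidate.lower() in t for candidate in self.titles)


CFO = TitleExpansion(
    role="CFO",
    titles=("CFO", "Chief Financial Officer", "VP Finance",
            "Head of Finance", "Controller", "Treasurer"),
    provenance="inferred",
)
```

Freezing the dataclass means no stage can quietly widen or narrow the class mid-run; any change has to go back through the checkpoint.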

Rec #2: Declarative provider manifest seam (upgrade the registry)

High impact · Med feasibility · ~3–4 days · Med risk

Add a field_map + request_template manifest layer on top of our existing CATALOG registry so adding/swapping a provider is a YAML file, not a code change + a Python class. Keep waterfalls as ordered name lists in config. This is the external repo's strongest idea, and we already have the registry to build it on.

How: manifest parser + registry loader; refactor the 12 specs to register via manifests; preserve the TS executeProvider() retry core. Backward-compatible wrapping + waterfall test coverage is the risk to manage.
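
The core of a field_map layer is a small translation step. In the sketch below the manifest is a Python dict standing in for a parsed YAML file; the provider name, env var, field names, and the dotted-path convention are all invented for illustration and do not describe the external repo's actual manifest schema.

```python
from typing import Any

# Hypothetical manifest, as if loaded from a provider's manifest.yaml.
MANIFEST: dict[str, Any] = {
    "name": "example_provider",
    "auth": {"env": "EXAMPLE_PROVIDER_API_KEY"},
    # canonical field -> provider field (dotted path into the raw response)
    "field_map": {"email": "work_email", "full_name": "person.name"},
}


def dig(payload: Any, dotted_path: str) -> Any:
    """Follow a dotted path through nested dicts; None if any hop is missing."""
    current = payload
    for part in dotted_path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def normalize(manifest: dict[str, Any], raw: dict[str, Any]) -> dict[str, Any]:
    """Translate a raw provider response into canonical field names."""
    return {
        canonical: dig(raw, provider_field)
        for canonical, provider_field in manifest["field_map"].items()
    }
```

With this in place, a CATALOG entry only needs to point at a manifest; provider field names never leak into stage code or prompts.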

Rec #3: Model-routing by stage economics

High cost impact · Med feasibility · ~3–4 days · Med risk

Route by what each stage actually needs: expensive reasoning (research, copy) on a strong model; high-volume row classification (qualify, segment) on a cheap fast model in batches; keep intermediate output out of the main context. The external repo does Sonnet-research / Haiku-15-per-batch-scoring — a verified 3–5× cost delta on the heaviest stages. Most under-valued idea in the first pass.

How: per-stage target_model already exists in prompt YAML — formalize a routing policy + batch the classifier stages; fix the TS costUsd:0 bug first so savings are measurable.
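
A routing policy plus batching can start as a lookup table and a chunker. The stage keys and the model tier labels below are placeholders rather than real model IDs or our actual prompt-YAML values; the batch size of 15 mirrors the external repo's scoring batches.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")

# Hypothetical policy: reasoning-heavy stages get the strong tier,
# high-volume row classification gets the cheap tier.
ROUTING: dict[str, str] = {
    "research": "strong-model",
    "copy": "strong-model",
    "qualify": "cheap-model",
    "segment": "cheap-model",
}
DEFAULT_MODEL = "strong-model"  # fail safe: unknown stages are not downgraded


def model_for(stage: str) -> str:
    return ROUTING.get(stage, DEFAULT_MODEL)


def batched(rows: Iterable[T], size: int = 15) -> Iterator[list[T]]:
    """Yield rows in fixed-size chunks so one call classifies many rows."""
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield chunk
```

Defaulting unknown stages to the strong tier trades cost for safety: a missing policy entry costs money but never silently degrades quality.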

Rec #4: Pre-spend cost gate (proactive budgeting) + fix TS cost=0

Med impact · High feasibility · ~1 day · Low risk

We gate budget after exceeding; they estimate cost before spending (Gate #3 + --estimate). Wire our existing llm_rate_cards.yaml / enrichment cost estimates into a pre-stage gate that raises CheckpointPending when projected spend crosses a threshold. Prerequisite: fix OpenAIPromptRunner writing costUsd:0.
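
A sketch of the gate itself, assuming a projected cost is already available for the stage. The `CheckpointPending` class here is a local stand-in for ours, and the two thresholds (hard budget, auto-proceed ceiling) are an assumed design modeled on the external repo's "auto-proceeds under thresholds, else asks" behaviour.

```python
class CheckpointPending(Exception):
    """Local stand-in for the pipeline's checkpoint signal."""


def pre_spend_gate(stage: str, projected_usd: float, spent_usd: float,
                   budget_usd: float, auto_proceed_usd: float) -> None:
    """Run before a paid stage; returns silently when spending may proceed."""
    if spent_usd + projected_usd > budget_usd:
        raise CheckpointPending(
            f"{stage}: projected ${projected_usd:.2f} would exceed the "
            f"${budget_usd:.2f} budget (already spent ${spent_usd:.2f})"
        )
    if projected_usd > auto_proceed_usd:
        raise CheckpointPending(
            f"{stage}: projected ${projected_usd:.2f} is above the "
            f"${auto_proceed_usd:.2f} auto-proceed threshold"
        )
```

The check runs on the projection, so it fires before any money moves; the existing reactive campaign_budget_exceeded check can stay as a backstop.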

Rec #5: Per-row waterfall log + BYOK pipeline shaping

Low–med impact · High feasibility · ~1–2 days · Low risk

Two cheap wins: (a) persist a per-contact waterfall_log (which provider answered) for fill-rate analysis; (b) at init, probe which provider keys exist and light up only those stages (their BYOK shaping) instead of failing on a missing key.
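
Both wins fit in a few lines. In this sketch the provider names, env var names, and log shape are hypothetical: `available` drops providers whose key is missing, and `run_waterfall` returns the first hit along with a per-contact log of every provider tried.

```python
import os
from typing import Callable, Mapping, Optional

Provider = Callable[[str], Optional[str]]


def available(waterfall: list[str], key_env: Mapping[str, str],
              env: Optional[Mapping[str, str]] = None) -> list[str]:
    """BYOK shaping: keep only providers whose API key is present."""
    env = os.environ if env is None else env
    return [name for name in waterfall if env.get(key_env[name])]


def run_waterfall(contact_id: str, waterfall: list[str],
                  providers: Mapping[str, Provider]
                  ) -> tuple[Optional[str], list[dict]]:
    """Try providers in order; return the first hit plus the attempt log."""
    log: list[dict] = []
    for name in waterfall:
        value = providers[name](contact_id)
        log.append({"provider": name, "hit": value is not None})
        if value is not None:
            return value, log
    return None, log
```

Persisting the returned log next to the contact row is enough for fill-rate analysis per provider after the run.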

12 What we already do better — lean in

End-to-end to launch. Brief → launched Bison campaigns with copy, personalization, QA. They stop at CSV/draft. Biggest structural advantage.

Copy framework. 0–100 rubric, proximity auditor, 300+-phrase spam registry — production systems with zero external equivalent.

QA + scorecard. Rendered semantic QA, mechanical scan, and an 11-dim scorecard that hard-blocks grade F. Bad copy can't reach inboxes.

Cross-run learning (corrected). Nexus retrieve-before-write + retrospective deposit. The external repo has nothing like it.

Durability. Per-row resume, row-conservation invariant, SHA-256 brief freeze, decisions.md audit trail.

Structural launch gate. operator_confirms + signature, exit 7 hard blocks — irreversibility handled with code, not vibes.

13 Internal gaps the recon surfaced (not about them — about us)

These aren't carry-overs; they're things to fix in our own house, found while reading our source for this comparison.

Gap	Detail	Evidence
TS cost tracker writes $0	`OpenAIPromptRunner` records `costUsd:0` every call; TS-native runs report $0.00 cost	openai-prompt-runner.ts:115
Nexus not wired into TS harness	TS-native stages never read/write the KG — the feedback loop only fires on the Python path	integrations/nexus.py
3 stages bridge-only	copy (07), scorecard (08.5), launch (09) have no TS handler — always spawn the Python subprocess, no migration target tracked	flows/registry.ts:22-23
Signal-bank disconnected	Manual run order, no cron/Trigger.dev; a stale `RUNNING.lock` suggests a crashed run never cleaned up	signal-bank/runs/.../RUNNING.lock

Verification provenance. Both external repos were shallow-cloned and read in full; gtm-orchestrator was inspected in place, read-only, on this machine. All file:line citations were produced by three independent recon passes on 2026-06-12 and reconciled against the first-pass audit (PR #102), whose three material errors are corrected above. Items the recon could not confirm against live APIs (e.g. the external repo's synthetic example run, exact copy-stage model) are flagged in-line rather than asserted.

LeadGrow GTM — internal architecture review · 2026-06-12
