GTM Pipeline Comparison
LeadGrow Orchestrator vs. kkrlstrm/gtm-pipeline

A critical, source-verified architecture comparison — deep on observability, cost, agents, durability, feedback loops, quality gates, and storage. Every claim below was checked against cloned source by independent recon agents on 2026-06-12; this supersedes the first-pass audit (PR #102) and corrects three material errors in it.

LeadGrow stages: 18
External stages: 7
LeadGrow LOC: ~41k (31.8k py · 9.6k ts)
External LOC: ~2.5k py + workflows
Test files: 282 vs selftest.sh
Analysis: 2026-06-12 (3 recon agents)

01 Architecture at a glance

● LeadGrow GTM Orchestrator

production python pkg + cli · mid-migration to ts harness
01-discover · 02a-clay-format · 02b-clay-enrich · 02c-join-back · 02.5-domain-mx · 03-qualify · 03b-enrich-co · 04-people · 04b-waterfall · 04c-mx · 04d-enrich-contact · 06-segment · 07-apply (copy) · 07b-personalize · 08-qa ×3 · 08.5-scorecard · 09-launch ×8
  • End-to-end: plain-English brief → launched Bison campaigns with copy, personalization, QA (orchestrator.py:86-104)
  • Copy framework (0–100 rubric, proximity auditor, 300+-phrase spam registry) + QA + 11-dim scorecard hard-block
  • Live cross-run learning via Nexus KG (retrieve-before-write) (copy_sub_runner.py:436 · scoring/retrospective.py:171)
  • State: per-campaign state.sqlite (v13, WAL) + Supabase mirror; Next.js/Supabase-Realtime dashboard
  • TS migration partial: 15/18 stages have TS handlers; copy, scorecard, launch are Python-bridge only

● kkrlstrm/gtm-pipeline

claude code plugin · markdown agents + js workflows · ~2k loc
company_search · company_enrich · people_search · qualify · email_enrich · phone_enrich · activate
  • Provider-agnostic manifest seam: swap providers by editing one YAML, never code (providers/*/manifest.yaml · gtm.config.yaml)
  • Model-routing by stage economics: Sonnet for research, Haiku for high-volume scoring (score-leads.js · company-researcher.md)
  • Role/segment expansion frozen as a gate-reviewed artifact at plan time (orchestrator/agent.md §1a)
  • Stops at CSV export / draft sequencer push — no copy, QA, personalization, or launch
  • No feedback loop, no automated tests beyond a smoke script, no UI

Below, each dimension Mitchell flagged gets a mechanics-level side-by-side. The pattern repeats: we have the deeper, production implementation; they have the cleaner seam. Both lessons matter.

02 Observability · Winner: LeadGrow

LeadGrow

structured + traced + live UI
  • structlog JSON events throughout; append-only campaign.jsonl with stage.completed {rows_out, duration_ms}
  • Optional Langfuse trace/span wrapping every LLM stage under a campaign trace (adapters/langfuse_obs.py; no-op if key absent)
  • generateRetro() emits a funnel table + anomalies (row-drop, verifier-fail) (agents/retro.ts)
  • Live Next.js + Supabase-Realtime dashboard (Vercel) subscribed to stage_status/checkpoints

External

cli + stderr, no ui
  • storage/cli.py list_summary — per-stage counts on demand
  • Per-contact email_waterfall_log / phone_waterfall_log columns (which providers were tried)
  • No run-ID trace, no span timing, no metrics, no UI — you read JSONL/CSV by hand
Read: We're clearly ahead. The one idea worth noting from them: the per-row waterfall log (which provider answered, per contact) is a clean, cheap observability primitive our enrichment stages could persist verbatim for post-hoc fill-rate analysis.
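The stage.completed events in campaign.jsonl are enough to rebuild a funnel offline. A minimal sketch, assuming each line is a JSON object with `event`, `stage`, `rows_out`, and `duration_ms` keys. Only `rows_out` and `duration_ms` are confirmed by the source; the other key names, the helper names, and the 80% row-drop threshold are illustrative assumptions, and this is not the generateRetro() implementation.

```python
import json


def parse_events(jsonl_text: str) -> list[dict]:
    """Parse an append-only JSONL log, ignoring blank lines."""
    return [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]


def funnel(events: list[dict]) -> list[dict]:
    """One funnel row per stage.completed event, with drop vs. the previous stage."""
    table = []
    prev_out = None
    for event in events:
        if event.get("event") != "stage.completed":
            continue
        rows_out = event["rows_out"]
        # Drop is undefined for the first stage or when the previous stage emitted 0 rows.
        drop = None if prev_out in (None, 0) else 1 - rows_out / prev_out
        table.append(
            {
                "stage": event["stage"],
                "rows_out": rows_out,
                "duration_ms": event["duration_ms"],
                "drop": drop,
            }
        )
        prev_out = rows_out
    return table


def row_drop_anomalies(table: list[dict], max_drop: float = 0.8) -> list[str]:
    """Stages whose row drop exceeds max_drop (threshold is an assumption)."""
    return [r["stage"] for r in table if r["drop"] is not None and r["drop"] > max_drop]
```

The same table could carry a per-provider fill rate if the enrichment stages persisted a waterfall log per contact.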

03 Cost tracking · Winner: External (on discipline)

LeadGrow

richer attribution — but reactive + a live bug
  • Python: cost_accumulator + lead_cost tables for per-stage and per-lead token/USD attribution (state/store.py:98 · adapters/runtime.py)
  • llm_rate_cards.yaml pre-flight projections; costs/estimate CLI
  • Budget is gated reactively: campaign_budget_exceeded fires only after the budget is exceeded
  • Bug: TS harness OpenAIPromptRunner writes costUsd: 0 on every call, so TS-native runs report $0.00 (agents/openai-prompt-runner.ts:115)

External

proactive, pre-spend
  • Every adapter has an --estimate mode returning credit cost without spending (verified in selftest.sh)
  • Gate #3 estimates total cost before any paid enrichment; auto-proceeds under thresholds, else asks
  • No persisted per-run ledger; credits never summed/stored post-run
Read: We measure cost more richly but spend first and gate later; they gate before spending. Their pre-spend estimate gate is the carry-over (rec #4) — and fixing the TS costUsd:0 bug is a one-line prerequisite so the gate has real numbers to read.

04 Agents & model economics · Winner: External (on routing)

LeadGrow

role-typed agents, mostly one model tier
  • Typed roles: Implementer (Sonnet), Verifier (deterministic, SHA-256 evidence), Reviewer, Closer (Haiku), Retro (Sonnet) (CONTEXT.md)
  • Inner loop: Implementer → gate → review → retry (maxRevisions=2) → escalate (inner-loop.ts)
  • LLM stages mostly gpt-4.1-mini, copy on gpt-4.1; no systematic cheap-model routing for high-volume row work
  • Fan-out via ThreadPoolExecutor (≤8) for personalize/copy; domain-caching (1 call per unique domain)

External

model routing as a first-class lever
  • Research/sourcing on Sonnet, 15-per-batch scoring on Haiku, hardcoded per workflow (score-leads.js:26 · discover-companies.js:71)
  • Subagent fan-out keeps intermediate research out of the main context (token-clean)
  • Non-deterministic prompt behavior; no verifier/gate equivalent
Read: Their pattern — route by stage economics (expensive reasoning → Sonnet; high-volume classification → Haiku/cheap; keep intermediates off the main context) — is a 3–5× cost lever on exactly our heaviest row-by-row stages (qualify, segment, personalize). This is the most under-weighted finding in #102 → rec #3.

05 Durability & resume · Winner: LeadGrow

LeadGrow

real per-row resume + integrity invariant
  • state.sqlite per-row row_state; done|cached skipped on resume, failed re-enqueued (state/store.py)
  • Row-conservation invariant: every stage must satisfy rows_in = kept + rejected or throw (row_conservation.py + agents/row-conservation.ts)
  • Brief frozen by SHA-256 — pipeline refuses to resume if the brief changed
  • TS harness has no SamplingGate equivalent; TS CostTracker is in-memory (crash loses cost)

External

idempotent writes, no real resume
  • Idempotent upserts (normalized LinkedIn URL); atomic file writes (write-then-rename)
  • No mid-stage resume. Crash mid-batch → re-run the stage (idempotent, but no "pick up where it left off")
  • No per-row error state, no retry queue, no run history
Read: We're materially ahead — per-row resume + row-conservation is genuine production hardening they don't have. The open item is internal: finish porting durability (SamplingGate, persisted cost) into the TS harness.
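For readers outside the codebase, the two integrity checks named above (row conservation and the SHA-256 brief freeze) reduce to a few lines. This is a sketch under assumptions, not the contents of row_conservation.py: `StageResult`, `RowConservationError`, and the function names are hypothetical.

```python
import hashlib
from dataclasses import dataclass


class RowConservationError(RuntimeError):
    """Raised when a stage loses or invents rows (hypothetical name)."""


@dataclass(frozen=True)
class StageResult:
    stage: str
    rows_in: int
    kept: int
    rejected: int


def check_row_conservation(result: StageResult) -> None:
    """Enforce rows_in == kept + rejected for one stage, or throw."""
    accounted = result.kept + result.rejected
    if accounted != result.rows_in:
        raise RowConservationError(
            f"{result.stage}: rows_in={result.rows_in} but kept+rejected={accounted}"
        )


def brief_fingerprint(brief_text: str) -> str:
    """SHA-256 hex digest of the brief, stored when the run is frozen."""
    return hashlib.sha256(brief_text.encode("utf-8")).hexdigest()


def assert_brief_unchanged(frozen_digest: str, brief_text: str) -> None:
    """Refuse to resume if the brief no longer matches the frozen digest."""
    if brief_fingerprint(brief_text) != frozen_digest:
        raise RuntimeError("brief changed since the run was frozen; refusing to resume")
```

Throwing instead of logging is the point: a stage that silently drops rows fails the run immediately and never reaches launch.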

06 Feedback loops · Winner: LeadGrow

LeadGrow

a real, wired retrieve-then-deposit loop
  • CopySubRunner._query_nexus_intelligence() queries the KG for "copy angles · campaign performance · segment" before writing copy (copy_sub_runner.py:433-452)
  • score-replies → deposit_retrospective_intelligence() → TierEngine → Nexus KG closes the loop with reply outcomes (scoring/retrospective.py:171)
  • Conditional on NEXUS_API_KEY; TS harness has zero Nexus wiring; signal-bank runs are manual + disconnected

External

none automated
  • Cross-campaign suppression via opt-in master_contacts (you sync it from your sender)
  • No automated learning. Rubric/persona weights are static between campaigns (recon §6: "no loop that auto-adjusts ICP scoring")
Read — this is the corrected headline. #102 had it inverted. We have the live feedback loop; they don't. The action is not "build one" — it's extend our existing Nexus loop into the TS harness and re-connect signal-bank (internal gaps, §13).

07 Quality gates · Winner: LeadGrow

LeadGrow

automated scoring + structural launch gate
  • 11-dimension scorecard (8 list + 3 copy, weighted); grade F (<0.40) = exit 7 hard block (stages/scorecard.py)
  • QA: rendered semantic QA + mechanical spam/variable scan + brace conversion (qa_rendered / qa_mechanical / qa_convert)
  • SamplingGate pre-fan-out quality floor (grade C+); launch gate needs operator_confirms + signature
  • 7 human checkpoints, each appended to decisions.md (full audit trail)

External

4 lean human gates, no scoring
  • Cost/plan judgment up front (Gate #1 plan + Gate #2 probe-cost) — proactive by design
  • Gate #3 qualify review (QUALIFY/MAYBE/SKIP); Gate #4 activation requires typed "activate"
  • No automated quality scoring, no copy/QA gates (it generates no copy)
Read: We gate quality automatically and deeply; they gate spend earlier. Adopt their front-loaded cost gate (rec #4) without giving up our automated quality gates.
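The scorecard hard block can be pictured as a weighted mean with a floor. A sketch assuming per-dimension scores in [0, 1] and a plain weighted average: the 0.40 threshold and exit code 7 come from the review, while the aggregation formula, dimension names, and function names are assumptions and not the logic in stages/scorecard.py.

```python
import sys

HARD_BLOCK_EXIT_CODE = 7   # from the review: grade F exits with code 7
GRADE_F_THRESHOLD = 0.40   # from the review: grade F means score < 0.40


def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean of per-dimension scores (aggregation formula is assumed)."""
    if scores.keys() != weights.keys():
        raise ValueError("scores and weights must cover the same dimensions")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[dim] * weights[dim] for dim in scores) / total_weight


def is_hard_blocked(score: float) -> bool:
    """True when the campaign grades F and must not launch."""
    return score < GRADE_F_THRESHOLD


def enforce_launch_gate(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Exit the process with code 7 on grade F; otherwise return the score."""
    score = weighted_score(scores, weights)
    if is_hard_blocked(score):
        sys.exit(HARD_BLOCK_EXIT_CODE)
    return score
```

Using a process exit code keeps the block enforceable from any caller (CLI, bridge subprocess, CI) without trusting the caller to inspect a return value.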

08 Storage & Supabase · Winner: LeadGrow

Dimension | LeadGrow | External
Primary store | Per-campaign state.sqlite (v13, WAL, in-code migrations) | JSONL (contacts.jsonl + index.json) or Postgres
Shared / cloud | Supabase leadgrow_knowledge: global_companies, global_contacts, campaign_contacts, enrichments + dashboard tables | Optional Postgres; master_contacts xref (manually synced)
Row tracking | row_state per stage (pending/failed/done/skipped + error) | stage field per contact via advance_stage
Cross-campaign dedup | Supabase mirror, cross-client | crossref_master (local backend always returns "new")
Crash recovery | SQLite WAL + row-state retry + StateReconciler | Atomic file writes; re-run stage
Read: Our Supabase mirror + per-row SQLite is a richer, multi-client store. Their edge is portability — JSONL "runs anywhere with zero infra," which is why their seam is cleaner (next section).
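Per-row tracking is what makes "pick up where it left off" possible. A minimal sketch of a row_state table with the standard-library sqlite3 module; the column layout, the helper names, and the rule that only pending and failed rows are re-run are assumptions for illustration, not the v13 schema in state/store.py.

```python
import sqlite3
from typing import Optional

SCHEMA = """
CREATE TABLE IF NOT EXISTS row_state (
    stage  TEXT NOT NULL,
    row_id TEXT NOT NULL,
    status TEXT NOT NULL,   -- pending | failed | done | skipped
    error  TEXT,
    PRIMARY KEY (stage, row_id)
)
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a state database and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def mark(conn: sqlite3.Connection, stage: str, row_id: str,
         status: str, error: Optional[str] = None) -> None:
    """Record the latest status for one row in one stage."""
    conn.execute(
        "INSERT OR REPLACE INTO row_state (stage, row_id, status, error) "
        "VALUES (?, ?, ?, ?)",
        (stage, row_id, status, error),
    )
    conn.commit()


def rows_to_resume(conn: sqlite3.Connection, stage: str) -> list[str]:
    """Rows a resumed run still has to process: pending and failed only."""
    cursor = conn.execute(
        "SELECT row_id FROM row_state "
        "WHERE stage = ? AND status IN ('pending', 'failed') ORDER BY row_id",
        (stage,),
    )
    return [row[0] for row in cursor]
```

A JSONL store can approximate this with idempotent upserts, but it cannot answer "which rows failed, and why" without rereading every record.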

09 Provider abstraction · Winner: External

LeadGrow

static registry, config-tunable
  • Enrichment CATALOG: 12 EnrichmentSpec entries (clay | runtime | apify), tunable via spec fields + config/enrichment_providers.yaml (stages/enrichments/registry.py)
  • Adding a provider = a code-level CATALOG entry + a provider class; no declarative manifest/field-map seam
  • TS core executeProvider() shared retry/backoff exists, but only millionverifier is ported

External

4-layer declarative seam
  • manifest.yaml per provider: auth.env, request_template, field_map (provider field names live only here, never in agent prompts) (providers/*/manifest.yaml)
  • gtm.config.yaml waterfalls = the only place a provider is chosen; missing key → silently skipped (BYOK shaping)
  • Quirks ("encode quirks as data so the agent stays generic") + uniform adapter.py --capability contract
Read: Their manifest seam is the single best idea in the repo. We have a registry — so this is an upgrade (declarative manifest + field-map layer on top of CATALOG), not a rewrite. Rec #2.

10 Feature matrix (verified)

Capability | LeadGrow | External | Winner
End-to-end → launched campaigns | yes | (CSV/draft) | LeadGrow
Copy generation framework | yes | no | LeadGrow
QA + iterative fixes | yes | no | LeadGrow
11-dim scorecard hard block | yes | no | LeadGrow
Per-lead personalization loop | yes | no | LeadGrow
Per-row resume + conservation invariant | yes | no | LeadGrow
Cross-run learning loop | yes (Nexus) | no | LeadGrow (#102 had this inverted)
Live observability UI | yes | no | LeadGrow
Per-lead cost attribution (Python) | yes | no | LeadGrow
Provider manifest / config-swap | (registry) | yes | External
Model-routing by stage economics | no | yes | External
Role/segment title expansion | (literal-ish) | yes | External
Pre-spend cost gate | (reactive) | yes | External
BYOK graceful pipeline shaping | no | yes | External
Zero-infra portability | no | yes | External

11 Carry-overs we should make (corrected & ranked)

Filtered to what's verified real in the external repo and genuinely missing/weaker in ours. The #102 "cross-run intelligence" recommendation is removed — we already have it.

1. Role / segment title expansion as a gate-reviewed frozen artifact

High impact · High feasibility · ~2–3 days · Low risk

"Target CFOs" → an explicit equivalence class (CFO / Chief Financial Officer / VP Finance / Head of Finance / Controller / Treasurer). The external pipeline infers it once at brief time, shows it at Gate #1 with provenance (inferred vs from-context), the operator edits it, and it's frozen into the run — no stage re-infers. Prevents silent under-sourcing from literal title matching.

How: expansion step in brief capture → frozen expansion.yaml read by 03-qualify + 04-people; surface at the first checkpoint. ~200 lines + a module.
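The frozen artifact could look like the sketch below. It assumes a JSON payload plus a SHA-256 digest so the example stays standard-library only (the plan above names expansion.yaml); `RoleExpansion` and every function name here are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class RoleExpansion:
    requested: str             # what the brief literally said, e.g. "CFO"
    titles: tuple[str, ...]    # operator-approved equivalence class
    provenance: str            # "inferred" or "from-context"


def normalize_title(title: str) -> str:
    """Case- and whitespace-insensitive form used for matching."""
    return " ".join(title.lower().split())


def freeze_expansion(expansion: RoleExpansion) -> tuple[str, str]:
    """Serialize once at the gate; return (payload, sha256 digest)."""
    payload = json.dumps(
        {
            "requested": expansion.requested,
            "titles": list(expansion.titles),
            "provenance": expansion.provenance,
        },
        sort_keys=True,
    )
    return payload, hashlib.sha256(payload.encode("utf-8")).hexdigest()


def load_frozen(payload: str, digest: str) -> RoleExpansion:
    """Downstream stages read the artifact; they never re-infer it."""
    if hashlib.sha256(payload.encode("utf-8")).hexdigest() != digest:
        raise ValueError("expansion artifact was modified after the gate")
    data = json.loads(payload)
    return RoleExpansion(data["requested"], tuple(data["titles"]), data["provenance"])


def title_matches(expansion: RoleExpansion, candidate: str) -> bool:
    """True if a sourced title falls inside the approved equivalence class."""
    allowed = {normalize_title(t) for t in expansion.titles}
    return normalize_title(candidate) in allowed
```

Both 03-qualify and 04-people would call `load_frozen` and `title_matches`, so the two stages cannot drift apart on what "CFO" means.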

2. Declarative provider manifest seam (upgrade the registry)

High impact · Med feasibility · ~3–4 days · Med risk

Add a field_map + request_template manifest layer on top of our existing CATALOG registry so adding/swapping a provider is a YAML file, not a code change + a Python class. Keep waterfalls as ordered name lists in config. This is the external repo's strongest idea, and we already have the registry to build it on.

How: manifest parser + registry loader; refactor the 12 specs to register via manifests; preserve the TS executeProvider() retry core. Backward-compatible wrapping + waterfall test coverage is the risk to manage.
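A sketch of the field-map layer, assuming a manifest already parsed into a dict (YAML loading is left out to avoid a third-party dependency). The keys `auth.env` and `field_map` are named in the external repo's manifests per the recon; the mapping direction (provider field → canonical field), the class, and the example provider are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderManifest:
    name: str
    auth_env: str                # env var that holds the API key
    field_map: dict[str, str]    # provider field name -> canonical field name


def load_manifest(raw: dict) -> ProviderManifest:
    """Validate the minimal shape this sketch relies on."""
    field_map = raw["field_map"]
    if len(set(field_map.values())) != len(field_map):
        raise ValueError("two provider fields map to the same canonical field")
    return ProviderManifest(
        name=raw["name"],
        auth_env=raw["auth"]["env"],
        field_map=dict(field_map),
    )


def map_fields(manifest: ProviderManifest, provider_record: dict) -> dict:
    """Translate a provider response into canonical fields; drop unmapped keys."""
    return {
        canonical: provider_record[source]
        for source, canonical in manifest.field_map.items()
        if source in provider_record
    }
```

With this layer in place, provider-specific field names live in one data file per provider, and the stage code only ever sees canonical names.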

3. Model-routing by stage economics

High cost impact · Med feasibility · ~3–4 days · Med risk

Route by what each stage actually needs: expensive reasoning (research, copy) on a strong model; high-volume row classification (qualify, segment) on a cheap fast model in batches; keep intermediate output out of the main context. The external repo does Sonnet-research / Haiku-15-per-batch-scoring — a verified 3–5× cost delta on the heaviest stages. Most under-valued idea in the first pass.

How: per-stage target_model already exists in prompt YAML — formalize a routing policy + batch the classifier stages; fix the TS costUsd:0 bug first so savings are measurable.
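A routing policy plus batching is small enough to sketch. The tier assignments follow the text above and the batch size of 15 mirrors the external repo's scoring batches; the model identifiers are placeholders, and the policy table and function names are assumptions, not existing config.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")

# Which tier each stage needs (assignments taken from the recommendation text).
STAGE_TIER = {
    "research": "strong",
    "copy": "strong",
    "qualify": "cheap",
    "segment": "cheap",
}

# Placeholder identifiers; substitute whatever the rate card actually lists.
TIER_MODEL = {
    "strong": "strong-model-placeholder",
    "cheap": "cheap-model-placeholder",
}


def model_for_stage(stage: str) -> str:
    """Unknown stages fall back to the strong tier (the safe, costly default)."""
    return TIER_MODEL[STAGE_TIER.get(stage, "strong")]


def chunk_rows(rows: Sequence[T], size: int = 15) -> Iterator[list[T]]:
    """Yield consecutive batches of at most `size` rows for one classifier call each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(rows), size):
        yield list(rows[start:start + size])


def calls_needed(n_rows: int, size: int = 15) -> int:
    """Number of LLM calls after batching (ceiling division)."""
    return -(-n_rows // size)
```

For example, 32 rows become three calls instead of 32, before any per-token saving from the cheaper tier is counted.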

4. Pre-spend cost gate (proactive budgeting) + fix TS cost=0

Med impact · High feasibility · ~1 day · Low risk

We gate budget after exceeding; they estimate cost before spending (Gate #3 + --estimate). Wire our existing llm_rate_cards.yaml / enrichment cost estimates into a pre-stage gate that raises CheckpointPending when projected spend crosses a threshold. Prerequisite: fix OpenAIPromptRunner writing costUsd:0.
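A sketch of the pre-stage gate. `CheckpointPending` is the name used above, but its constructor, the `RateCard` shape, and the per-1k-token pricing fields are assumptions, since the layout of llm_rate_cards.yaml is not shown here.

```python
from dataclasses import dataclass


class CheckpointPending(Exception):
    """Signals that an operator must approve before the stage runs (shape assumed)."""


@dataclass(frozen=True)
class RateCard:
    usd_per_1k_input_tokens: float
    usd_per_1k_output_tokens: float


def project_stage_cost(rows: int, avg_input_tokens: int,
                       avg_output_tokens: int, card: RateCard) -> float:
    """Projected USD for a row-by-row LLM stage, computed before any call is made."""
    per_row = (
        avg_input_tokens / 1000 * card.usd_per_1k_input_tokens
        + avg_output_tokens / 1000 * card.usd_per_1k_output_tokens
    )
    return rows * per_row


def pre_spend_gate(stage: str, projected_usd: float,
                   spent_usd: float, budget_usd: float) -> None:
    """Raise before spending if this stage would push the campaign over budget."""
    if spent_usd + projected_usd > budget_usd:
        raise CheckpointPending(
            f"{stage}: projected ${projected_usd:.2f} on top of ${spent_usd:.2f} "
            f"exceeds budget ${budget_usd:.2f}"
        )
```

The gate only works if `spent_usd` is real, which is why the costUsd:0 fix has to land first.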

5. Per-row waterfall log + BYOK pipeline shaping

Low–med impact · High feasibility · ~1–2 days · Low risk

Two cheap wins: (a) persist a per-contact waterfall_log (which provider answered) for fill-rate analysis; (b) at init, probe which provider keys exist and light up only those stages (their BYOK shaping) instead of failing on a missing key.
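Both wins fit in one small sketch. It assumes each provider is a callable that returns a value or `None`; the function names, the log entry shape, and the provider and env-var names in the usage note are hypothetical.

```python
from typing import Callable, Mapping, Optional, Sequence

Lookup = Callable[[dict], Optional[str]]


def shape_waterfall(waterfall: Sequence[str], key_env: Mapping[str, str],
                    environ: Mapping[str, str]) -> list[str]:
    """Keep only providers whose API key is present (BYOK shaping).

    Providers with no key_env entry, or with an unset or empty key, are
    skipped rather than failing the run.
    """
    return [name for name in waterfall if environ.get(key_env.get(name, ""))]


def run_waterfall(contact: dict, waterfall: Sequence[str],
                  providers: Mapping[str, Lookup]) -> tuple[Optional[str], list[dict]]:
    """Try providers in order; return the first hit plus a per-contact log."""
    log: list[dict] = []
    for name in waterfall:
        value = providers[name](contact)
        log.append({"provider": name, "hit": value is not None})
        if value is not None:
            return value, log
    return None, log
```

In use, `shape_waterfall(order, key_env, os.environ)` would run once at init, and the returned log would be stored next to the contact so fill rate per provider can be computed after the run.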

12 What we already do better: lean in

End-to-end to launch. Brief → launched Bison campaigns with copy, personalization, QA. They stop at CSV/draft. Biggest structural advantage.
Copy framework. 0–100 rubric, proximity auditor, 300+-phrase spam registry — production systems with zero external equivalent.
QA + scorecard. Rendered semantic QA, mechanical scan, and an 11-dim scorecard that hard-blocks grade F. Bad copy can't reach inboxes.
Cross-run learning (corrected). Nexus retrieve-before-write + retrospective deposit. The external repo has nothing like it.
Durability. Per-row resume, row-conservation invariant, SHA-256 brief freeze, decisions.md audit trail.
Structural launch gate. operator_confirms + signature, exit 7 hard blocks — irreversibility handled with code, not vibes.

13 Internal gaps the recon surfaced (not about them, about us)

These aren't carry-overs; they're things to fix in our own house, found while reading our source for this comparison.

Gap | Detail | Evidence
TS cost tracker writes $0 | OpenAIPromptRunner records costUsd:0 every call; TS-native runs report $0.00 cost | openai-prompt-runner.ts:115
Nexus not wired into TS harness | TS-native stages never read/write the KG; the feedback loop only fires on the Python path | integrations/nexus.py
3 stages bridge-only | copy (07), scorecard (08.5), launch (09) have no TS handler; they always spawn the Python subprocess, and no migration target is tracked | flows/registry.ts:22-23
Signal-bank disconnected | Manual run order, no cron/Trigger.dev; a stale RUNNING.lock suggests a crashed run never cleaned up | signal-bank/runs/.../RUNNING.lock

Verification provenance. Both external repos were shallow-cloned and read in full; gtm-orchestrator was read read-only on this machine. All file:line citations were produced by three independent recon passes on 2026-06-12 and reconciled against the first-pass audit (PR #102), whose three material errors are corrected above. Items the recon could not confirm against live APIs (e.g. the external repo's synthetic example run, exact copy-stage model) are flagged in-line rather than asserted.

LeadGrow GTM — internal architecture review · 2026-06-12