From the visitor's side, a form submission is a button and a half-second spinner. From the system's side, that half second contains a small assembly line: a spam gate, a truth check, an identity decision, a filing step, and an announcement that the rest of the platform listens for. This post traces one submission through our shipped pipeline, end to end — and pulls out the design choices we'd defend in any system like it.
TL;DR: The form is a versioned definition file; the submission hits a public endpoint that challenges bots and validates every field server-side. Then the identity moment: if the form maps an email, the customer record is created or updated synchronously, so the person exists before the response returns, and is joined to a contact list named for the form. The full payload (campaign parameters included) is stored as an archival submission record, and two events fire: one generic, and one named for the form, which is the hook journeys trigger on. Identity is normalized; payload is archival; the announcement is the integration surface.
Before the click: the form is a file
The pipeline starts before any visitor arrives, with a design decision: a form's definition — its fields, validation rules, and the mapping that says which fields mean identity — lives as a versioned configuration file, with the page that hosts it as a second artifact alongside. We've made the product case for this elsewhere; the engineering case is the same one as ever: artifacts you can inspect beat configuration you have to remember. A file is diffable when a field disappears, reviewable when AI writes it, and answerable when someone asks what the form looked like in March. The mapping deserves special mention because everything downstream pivots on it: it declares, explicitly, that this field is the email — and that declaration is what turns an anonymous payload into a customer.
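As a rough illustration, a definition of this kind could be modeled as below. The class names, the `identity_mapping` key, the field type names, and the example form are all assumptions made for this sketch, not the shipped file format.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class FieldSpec:
    name: str
    type: str               # hypothetical type names: "email", "text", "number"
    required: bool = False


@dataclass(frozen=True)
class FormDefinition:
    title: str
    version: int
    fields: Tuple[FieldSpec, ...]
    # Maps an identity role (e.g. "email") to the form field that carries it.
    identity_mapping: Dict[str, str] = field(default_factory=dict)

    def email_field(self) -> Optional[str]:
        """Return the name of the field declared as the email, if any."""
        return self.identity_mapping.get("email")


# Hypothetical example form; the title and field names are made up.
QUOTE_FORM = FormDefinition(
    title="Kitchen Quote",
    version=3,
    fields=(
        FieldSpec("contact_email", "email", required=True),
        FieldSpec("project_description", "text"),
    ),
    identity_mapping={"email": "contact_email"},
)
```

The point of the explicit mapping is that nothing downstream has to guess which field is the email: `QUOTE_FORM.email_field()` answers it, and a form with no mapping simply answers `None`.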
At the door: the gate and the truth check
A form's submit endpoint is public by nature — it has to accept POSTs from any visitor's browser — and public endpoints get found by bots within days. Two stations guard the door:
- The spam gate. An invisible challenge verifies the submitter is plausibly human before the payload is processed. Invisible matters: every visible hoop costs real conversions, so the gate's job is to be cheap for humans and expensive for scripts.
- The truth check. Every field is validated server-side against the form's definition — types, required fields, shape. The browser's validation is a courtesy to the visitor; the server's validation is the rule. Anyone can POST anything to a public endpoint, so the definition file does double duty here: the same artifact that rendered the form is the contract the server enforces. One source of truth, both sides of the wire.
Junk that fails either station dies at the door — before identity, before storage, before events. The ordering is the point: nothing downstream ever has to ask "was this real?"
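The ordering of the two stations can be sketched as follows. This is a minimal illustration under stated assumptions: the `DEFINITION` dict, the loose email pattern, the rejection of unknown fields, and the `challenge_passed` flag (standing in for whatever verdict the challenge provider returns) are all hypothetical.

```python
import re

# Hypothetical definition: field name -> (type, required).
DEFINITION = {
    "contact_email": ("email", True),
    "project_description": ("text", False),
    "budget": ("number", False),
}

# Deliberately loose shape check, not a full address validator.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate(payload: dict, definition: dict) -> list:
    """Return a list of error strings; an empty list means the payload is valid."""
    errors = []
    for name in payload:
        if name not in definition:
            errors.append(f"{name}: unknown field")
    for name, (ftype, required) in definition.items():
        value = payload.get(name)
        if value is None or value == "":
            if required:
                errors.append(f"{name}: required")
            continue
        if ftype == "email" and not (isinstance(value, str) and EMAIL_RE.match(value)):
            errors.append(f"{name}: not an email")
        elif ftype == "text" and not isinstance(value, str):
            errors.append(f"{name}: not text")
        elif ftype == "number" and (
            isinstance(value, bool) or not isinstance(value, (int, float))
        ):
            errors.append(f"{name}: not a number")
    return errors


def admit(payload: dict, challenge_passed: bool) -> tuple:
    """Run both stations in order; nothing downstream runs unless both pass."""
    if not challenge_passed:                 # station 1: spam gate
        return (False, ["challenge failed"])
    errors = validate(payload, DEFINITION)   # station 2: truth check
    return (not errors, errors)
```

A caller would proceed to identity, storage, and events only when `admit` returns `True` as its first element, which is what lets every later stage assume the payload is real.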
The identity moment: synchronous on purpose
Now the interesting decision. If the form's mapping includes an email, the pipeline resolves identity inline, before responding: the customer record is created or updated (matched by email within the organization), the person is attached to a contact list (auto-created and named for the form if it doesn't exist), and the submission is stamped into their form history.
Synchronous identity is a deliberate trade. The async version — queue it, respond fast, reconcile later — shaves milliseconds and buys a consistency window where the lead exists but the customer doesn't: the window where a same-second follow-up email greets a person with no record, where two rapid submissions race to create twins. Doing the upsert inline means by the time the visitor sees "thanks!", the person your automations will act on already exists. For a pipeline whose entire purpose is feeding downstream actors, the milliseconds are cheap and the invariant is gold.
Note what does not get written to the person: everything else. Only mapped identity fields touch the record; the project description, the photo uploads, the qualifying answers stay with the submission. Identity is normalized — clean, comparable, deduplicatable. Payload is archival — rich, messy, preserved in context. Mixing them is how CRMs rot: forty submission-shaped fields on a person record, each true for exactly one moment.
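A minimal in-memory sketch of that upsert is below. The `CustomerStore` class, its method name, and the trim-and-lowercase email normalization are assumptions for illustration; a real store would also need a uniqueness constraint or transaction to keep concurrent submissions from racing, which this sketch does not model.

```python
class CustomerStore:
    """In-memory stand-in for the customer table (hypothetical)."""

    def __init__(self):
        self.customers = {}   # (org_id, normalized email) -> customer record
        self.lists = {}       # (org_id, list name) -> set of customer keys

    def upsert_from_submission(self, org_id, form_title, payload, identity_mapping):
        email_field = identity_mapping.get("email")
        raw = payload.get(email_field) if email_field else None
        if not raw:
            return None  # no mapped email: the submission stays anonymous

        # Match by email within the organization (normalization is assumed).
        key = (org_id, raw.strip().lower())
        customer = self.customers.setdefault(
            key, {"email": key[1], "form_history": []}
        )

        # Only mapped identity fields touch the record; the rest of the
        # payload stays with the submission.
        for role, field_name in identity_mapping.items():
            if role != "email" and payload.get(field_name):
                customer[role] = payload[field_name]

        customer["form_history"].append(form_title)

        # The contact list is created on first use and named for the form.
        self.lists.setdefault((org_id, form_title), set()).add(key)
        return customer
```

Calling this before building the response is the whole trade: when the function returns, the person and their list membership already exist for anything that reacts to the submission.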
The announcement: two events, two granularities
With identity resolved and the submission stored, the pipeline announces. Two events fire (plus an identify, so engagement systems learn who this is): a generic "form submitted" event, and a specific one carrying the form's title in its name. The split looks redundant and isn't:
| Event | Granularity | Who listens |
|---|---|---|
| "Form Submitted" | All forms | Analytics, dashboards, anything counting activity broadly |
| "Form Submitted: {title}" | One form | Journey event entries — the welcome sequence for this form |
The named event is the integration surface: a journey binds to that exact name, which makes the form's title part of a contract — the same lesson as the journey post, from the other side of the wire. The form announces; it neither knows nor cares whether anyone listens. That ignorance is the architecture: the forms pipeline has no journey logic in it, and the journey engine has no forms logic in it. The event name is the entire coupling, which is why either side can be rebuilt without the other noticing.
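The two-granularity announcement can be sketched with a tiny in-process bus. The `EventBus` class and the `announce` helper are hypothetical; only the two event names follow the table above, and a production system would more likely publish to a queue than call listeners directly.

```python
from collections import defaultdict


class EventBus:
    """Minimal in-process pub/sub, for illustration only."""

    def __init__(self):
        self._listeners = defaultdict(list)

    def subscribe(self, event_name, listener):
        self._listeners[event_name].append(listener)

    def publish(self, event_name, data):
        for listener in self._listeners[event_name]:
            listener(event_name, data)


def announce(bus, form_title, submission_id):
    """Fire the generic event, then the one named for the form."""
    data = {"form_title": form_title, "submission_id": submission_id}
    bus.publish("Form Submitted", data)
    bus.publish(f"Form Submitted: {form_title}", data)
```

A journey that subscribes to `"Form Submitted: Kitchen Quote"` hears only that form, while an analytics listener on `"Form Submitted"` hears every form. Note that `announce` never checks whether anyone is subscribed.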
Where the submission itself lives
The full payload (every field, plus provenance like the campaign parameters the visitor arrived with) is stored as an append-friendly submission record, chunked for scale rather than normalized into someone's idea of a schema. Two choices are worth noting. Provenance stays with the submission: the UTM tags describe how this submission arrived, not who the person eternally is, so they live on the submission where attribution questions get asked, not smeared onto the profile where they'd be overwritten by the next visit and wrong forever after. And archival means archival: the submission is the historical fact of what was said; corrections and relationship truth happen on the customer record, never by editing history.
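One way to picture a submission record that carries its own provenance is the sketch below. The record shape, the `SubmissionLog` class, and the idea of reading UTM tags from a landing URL are assumptions for illustration; chunking for scale is not modeled here.

```python
from urllib.parse import parse_qs, urlsplit


def build_submission_record(form_title, payload, landing_url, received_at):
    """Bundle the full payload with the provenance of this one submission."""
    query = parse_qs(urlsplit(landing_url).query)
    # Keep only campaign parameters; take the first value of each.
    provenance = {k: v[0] for k, v in query.items() if k.startswith("utm_")}
    return {
        "form_title": form_title,
        "received_at": received_at,
        "payload": dict(payload),   # every field, mapped or not
        "provenance": provenance,   # how this submission arrived
    }


class SubmissionLog:
    """Append-only store: records are added, never edited (hypothetical)."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1   # position of the new record

    def all(self):
        return tuple(self._records)
```

Because the UTM tags sit on the record rather than on the person, a later visit from a different campaign adds a new record with its own provenance instead of overwriting the old answer.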
Lessons we'd carry to any pipeline
- Spend your synchrony budget on invariants. Identity-before-response is worth the milliseconds; everything that can tolerate eventual consistency (analytics, broad counters) goes through events.
- One definition, enforced at the boundary. The artifact that renders the form validates the submission — divergence between what's shown and what's accepted becomes structurally impossible.
- Separate who someone is from what they sent. Normalized identity, archival payload, and provenance kept with the submission: the trio that keeps both the CRM and the audit trail honest.
- Announce by name; couple by contract. Named events are the only seam between subsystems, and the names should be treated like URLs, because someone built on them.
Key takeaways
- The half second has five stations: spam gate, server-side truth check, synchronous identity, archival storage, and a two-granularity announcement.
- The definition file does double duty: the artifact that renders the form is the contract the server enforces — one source of truth on both sides of the wire.
- Junk dies at the door: challenge and validation run before identity, storage, or events — nothing downstream ever asks "was this real?"
- Identity is synchronous on purpose: the customer record exists before the response returns, closing the consistency window where automations would greet a person with no record.
- Identity is normalized, payload is archival: only mapped fields touch the person; the rich messy rest stays on the submission, with provenance (UTM) where attribution is actually asked.
- The named event is the entire coupling: forms announce, journeys listen, neither contains the other's logic — and the event name is a contract someone built on.
Frequently asked questions
Why not validate only in the browser, since the form controls its own page?
Because the endpoint is public and the browser is optional: anyone — a bot, a curl command, a stale cached page with last month's fields — can POST directly. Browser validation exists for the visitor's experience (instant feedback, no round trip); server validation exists for the system's integrity. The useful framing: the browser validates so humans don't get annoyed; the server validates so the database doesn't get lied to. You need both, and they must read from the same definition or they'll drift apart.
What happens when the same email submits two different forms?
One person, two submissions, two list memberships. The identity upsert matches by email within the organization, so both submissions land on the same customer record's history, while each form's list gains the person independently, because the lists describe what someone asked about, and both are true. This is the normalized-identity dividend: the "two leads or one customer?" question never arises, because identity was resolved inline, before each submission's response returned.
Why fire events instead of calling the journey engine directly?
Direct calls couple the caller to the callee's existence, location, and API — the forms pipeline would need to know journeys exist, handle their failures, and change when they change. An announced event inverts all of it: the form's job ends at "this happened, here's its name," and any number of listeners — journeys today, something unbuilt tomorrow — subscribe without the form's knowledge. The decoupling also localizes failure: a journey hiccup can't fail a form submission, because the submission was complete before any listener woke up.
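The failure-localization claim can be illustrated with a small sketch, assuming listeners are plain callables and the submission is stored before anything is published. The function names and the record shape are hypothetical.

```python
def publish_isolated(listeners, event_name, data):
    """Deliver to every listener; collect failures instead of raising."""
    failures = []
    for listener in listeners:
        try:
            listener(event_name, data)
        except Exception as exc:  # a listener's failure stays with the listener
            name = getattr(listener, "__name__", repr(listener))
            failures.append((name, str(exc)))
    return failures


def handle_submission(payload, listeners):
    # The submission is complete (stored) before any listener runs.
    submission = {"id": 1, "payload": payload, "status": "stored"}
    failures = publish_isolated(listeners, "Form Submitted", submission)
    return submission, failures
```

If one listener raises, the others still receive the event and the submission is still returned as stored; the failure shows up only in the collected list, where it can be logged or retried.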
How does this design handle a burst — say, a viral post sending thousands of submissions?
Each station degrades independently and in the right order. The spam gate and validation are stateless, so they scale horizontally. The identity upsert is the contended step (everyone's writing customer records), which is exactly why it's the only contended database work kept in the synchronous path. Storage is append-friendly by design, so submissions never wait on rewrites of existing records. And the events fan out to listeners that process at their own pace. The architecture's burst story is the same as its normal story: the synchronous core is minimal, and everything elastic is behind the announcement.
This pipeline ships inside Faster: every form a versioned artifact, every submission a customer record and an event.