Engineering Notes
HOW THE
PIPELINE WORKS.
One Vercel deploy. No separate backend. Every generation runs in a Node.js serverless function with a 60-second budget. Below: the eight steps between a brief and streamed copy, the non-obvious decisions, and the gaps honestly disclosed.
Pipeline diagram
Client (browser)
│
│ POST /api/generate { brief, contentType, tone, length }
▼
┌─────────────────────────────────────────────────────────────────┐
│ Next.js Route Handler (Node.js runtime, maxDuration=60s) │
│ │
│ 1. getClientIp() ──► checkRate() ──► 429 if exhausted │
│ │
│ 2. JSON.parse(body) ──► GenerateSchema.safeParse() │
│ (zod: enums + 10–4,000 char cap) │
│ ──► 422 on invalid │
│ │
│ 3. buildPrompt(brief, contentType, tone, length) │
│ ├── select system instruction per contentType │
│ ├── apply tone modifier string │
│ ├── apply word-count target per length │
│ └── wrap brief in <BRIEF>…</BRIEF> DATA block │
│ │
│ 4. getVertex() ──► model.generateContentStream() │
│ └── maxOutputTokens cap (512 / 1024 / 2048) │
│ │
│ 5. ReadableStream ──► sseBytes('delta', chunk) │
│ ──► sseBytes('done', {}) │
│ ──► sseBytes('error', {message}) │
│ (no stack traces to client; server-logged only) │
└─────────────────────────────────────────────────────────────────┘
│
│ text/event-stream (SSE over HTTP)
▼
Client SSE parser (parseSseChunk — handles partial chunks)
│
├── event: delta ──► accumulate ──► setOutput()
├── event: done ──► setStatus('done')
└── event: error ──► setError()
│
▼
SafeMarkdown (react-markdown + remark-gfm)
└── skipHtml=true — raw HTML in model output is stripped
No dangerouslySetInnerHTML anywhereStep by step.
- 01
Rate limit by IP
Upstash Redis sliding window — 30 requests per IP per 24 hours. Gracefully no-ops when Upstash is unconfigured (local dev). Prefix: rl:content.
- 02
zod Input Validation
GenerateSchema enforces: brief 10–4,000 chars, contentType ∈ {blog, product, social}, tone ∈ {professional, friendly, bold, playful}, length ∈ {short, medium, long}. Any violation returns 422 with a typed error message — no raw Zod output.
- 03
Prompt Construction
buildPrompt() selects a system instruction per content type (blog / product / social), injects tone and word-count modifiers, then wraps the user's brief inside <BRIEF>…</BRIEF> XML tags as a clearly-labelled DATA block. The model has already received its instructions in the system turn and treats the brief as content to write about, not as commands.
- 04
Prompt Injection Defence
The brief is sandwiched between XML delimiters in the user message, not interpolated into the system prompt. Even if the brief contains "Ignore all previous instructions", the model has already been told what to do and sees the brief as data inside a named section. This is not a perfect defence — no purely prompt-level solution is — but it is the industry-standard mitigation for single-turn generation tasks.
- 05
Gemini 2.5 Flash Streaming
generateContentStream() is called with a maxOutputTokens cap per length tier: 512 (short) / 1,024 (medium) / 2,048 (long). Temperature 0.8 for varied, natural-sounding output. maxDuration=60s on the route prevents runaway Vercel function charges.
- 06
Server-Sent Events
Each token chunk from Gemini is encoded as an SSE "delta" event and flushed immediately. The stream terminates with a "done" event or an "error" event (message only — no stack trace). The client uses a stateful parseSseChunk() helper that handles partial TCP frames correctly.
- 07
Safe Markdown Rendering
react-markdown with skipHtml=true parses the streamed output to a React virtual DOM. No dangerouslySetInnerHTML anywhere in the output path. Model-generated links are rendered as non-clickable spans — untrusted URLs are not given href attributes.
- 08
Error Handling
Server errors are logged with full context but only a generic "Content generation failed" message reaches the client. The client distinguishes rate-limit (429), validation (422), and stream errors, showing appropriate UI for each.
What is defended
- Input size cap (10–4,000 chars) — prevents oversized prompt attacks.
- Enum validation — only allowed content types, tones, lengths accepted.
- Rate limit 30/day by IP — limits abuse cost.
- maxOutputTokens cap — hard ceiling on generation cost per request.
- Prompt injection mitigation — brief in DATA block, not instructions.
- No raw HTML from model — react-markdown skipHtml=true.
- No dangerouslySetInnerHTML — ever.
- No stack traces to client — typed error codes only.
- GCP credentials never reach the client bundle.
Known gaps (honest)
- Prompt injection is mitigated, not eliminated. A sufficiently crafted brief in a multi-turn or instruction-following context could still influence output style beyond intent. Content moderation (e.g. Vertex AI Safety filters) is not enabled in this demo.
- Rate limiting is per-IP from x-forwarded-for, which is trivially spoofed without a WAF. Adequate for a demo; production would add auth.
- No CSRF protection — POST is open. Acceptable for a stateless generation endpoint with no side effects, but worth noting.
- Token budget is capped but Vertex billing is not monitored here — a GCP budget alert is recommended in production.
- Output is not validated by zod — we trust the model to return coherent markdown. A schema-constrained structured-output endpoint would strengthen this.
Stack
- Framework
- Next.js 14, App Router
- AI
- Vertex AI — Gemini 2.5 Flash
- Streaming
- Server-Sent Events
- Validation
- zod 4
- Rate Limit
- Upstash Redis
- Markdown
- react-markdown + remark-gfm
- Deploy
- Vercel
Next step
Need content at scale?
This demo generates one piece at a time with a browser session. The production architecture adds batch generation, brand-voice fine-tuning via system prompt injection, CMS integration (Contentful, Notion, Sanity), and a post-generation SEO scoring pass. If you have a content pipeline problem, email me.