Assay
COBOL documentation generator. A 5-pass analysis pipeline using Claude Opus 4.6 with a 1M-token context window.
01 System Overview
Assay accepts legacy COBOL source code and produces comprehensive plain-English documentation through a multi-pass AI analysis pipeline. Each source program is processed through 5 specialized passes, generating structured markdown covering business rules, dependency maps, dead code detection, and data flow analysis.
The system is designed for enterprises with large COBOL codebases - banks, insurance companies, government agencies, and logistics firms - where understanding 40-year-old business logic is critical but the engineers who hold COBOL expertise are rapidly retiring.
02 Technology Stack
| Layer | Technology | Purpose |
|---|---|---|
| Framework | Next.js 16 (App Router) | Server-side rendering, API routes, edge middleware |
| Runtime | React 19 | UI components, client interactivity |
| Language | TypeScript 5 (strict) | Type safety across all modules |
| Styling | Tailwind CSS v4 | Utility-first with glassmorphism design tokens |
| AI Engine | Claude Opus 4.6 (Anthropic SDK) | 1M token context, 5-pass analysis |
| Email | Resend | Transactional email for PoC requests |
| State | Zustand 5 | Client-side state management |
| Testing | Playwright 1.58 | Cross-browser E2E testing |
| Hosting | Vercel | Edge deployment, serverless functions |
| Analytics | Vercel Analytics | Traffic and performance monitoring |
03 High-Level Architecture
graph TB
subgraph Client["Browser Client"]
LP["Landing Page"]
DP["Demo Page"]
BK["Booking Widget
donnacha.app"]
PP["Privacy / Terms"]
end
subgraph Edge["Next.js Edge Middleware"]
RL["Rate Limiter
IP-based sliding window"]
HEADERS["Security Headers
HSTS, X-Frame, CSP"]
end
subgraph API["API Routes (Serverless)"]
A_UP["/api/upload
POST"]
A_PROC["/api/process
POST"]
A_STAT["/api/status/:id
GET"]
A_DL["/api/download/:id
GET"]
A_CONT["/api/contact
POST"]
end
subgraph Core["Core Processing"]
PARSER["COBOL Parser
Regex-based structural analysis"]
GROUPER["Program Grouper
Resolve COPY/CALL references"]
CHAIN["Call Chain Analyzer
Dependency graph builder"]
PIPELINE["5-Pass AI Pipeline
Claude Opus 4.6"]
MARKDOWN["Markdown Assembler
Knowledge base builder"]
MERMAID_GEN["Mermaid Generator
Fallback diagram builder"]
end
subgraph Storage["In-Memory Store"]
JOBS[("Job Map
Map<id, StoredJob>")]
end
subgraph External["External Services"]
ANTHROPIC["Anthropic API
Claude Opus 4.6"]
DONNACHA["Donnacha VPS
Booking API"]
RESEND["Resend API
Email delivery"]
end
Client --> Edge --> API
A_UP --> PARSER --> GROUPER --> CHAIN
A_UP --> JOBS
A_PROC --> PIPELINE
PIPELINE --> ANTHROPIC
PIPELINE --> MARKDOWN
MARKDOWN --> MERMAID_GEN
PIPELINE --> JOBS
A_STAT --> JOBS
A_DL --> JOBS
A_CONT --> RESEND
BK -.->|"iframe / API"| DONNACHA
DP -.->|"Static JSON
Zero API cost"| DP
style Client fill:#0d1530,stroke:#1e3a5f,color:#e0e4ef
style Edge fill:#0d1530,stroke:#1e3a5f,color:#e0e4ef
style API fill:#0d1530,stroke:#1e3a5f,color:#e0e4ef
style Core fill:#0d1530,stroke:#1e3a5f,color:#e0e4ef
style Storage fill:#0d1530,stroke:#1e3a5f,color:#e0e4ef
style External fill:#0d1530,stroke:#1e3a5f,color:#e0e4ef
04 5-Pass AI Pipeline
Each COBOL program group goes through five passes. Four of them call the AI model; Pass 3, the dependency map, is generated programmatically from the dependency graph. All AI passes share one system prompt that casts the model as a senior COBOL systems analyst with 30 years of mainframe experience.
sequenceDiagram
participant C as Client
participant P as /api/process
participant S as Job Store
participant AI as Claude Opus 4.6
participant M as Markdown Assembler
C->>P: POST {jobId}
P->>S: getJob(jobId)
P-->>C: {status: processing}
loop For each program group
P->>AI: Pass 1 - Overview Prompt<br/>(full source + copybooks + calls)
AI-->>P: 9-section markdown
P->>AI: Pass 2 - Business Rules<br/>(source + copybooks)
AI-->>P: Rules table (ID, condition, action, severity)
P->>P: Pass 3 - Dependency Map<br/>(programmatic from DependencyGraph)
P->>AI: Pass 4 - Dead Code<br/>(source + parsed structure metadata)
AI-->>P: Dead code table (type, confidence, recommendation)
P->>AI: Pass 5 - Data Flow<br/>(source + FDs + linkage items)
AI-->>P: Mermaid sequence diagram + narrative
P->>M: Assemble program document
M->>S: Store markdown output
P->>S: Update progress
end
P->>M: Generate project overview + index
M->>S: Store final bundle
P->>S: Status = complete
Pass Details
| Pass | Input | Output | Key Feature |
|---|---|---|---|
| 01 Overview | Full source + resolved copybooks + called programs | 9-section structured markdown | Business purpose, processing logic with line refs, modernization notes |
| 02 Business Rules | Program source + copybooks | Markdown table with severity ratings | Every conditional extracted with business meaning. Severity: CRITICAL / IMPORTANT / INFORMATIONAL |
| 03 Dependencies | Pre-built DependencyGraph (nodes + edges) | Mermaid graph TD diagram | AI-enhanced labels, subgraphs for >15 nodes, CALL vs COPY distinction |
| 04 Dead Code | Source + structure metadata (paragraphs, data items) | Markdown table with confidence levels | Considers PERFORM THRU, REDEFINES, dynamic CALLs. Confidence: HIGH / MEDIUM / LOW |
| 05 Data Flow | Source + FD names + linkage items | Mermaid sequence diagram + narrative | Traces data from input files through transformations to outputs |
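The pass ordering in the table can be sketched as a small orchestrator. This is a minimal illustration under stated assumptions, not the Assay implementation: the names `PassName`, `GroupInput`, `ModelCall`, `renderDependencyMap` and `runPasses` are hypothetical, the prompts are placeholders, and `callModel` is an injected stand-in for the real model call so the sketch runs offline.

```typescript
// Illustrative names only: none of these identifiers come from the Assay codebase.
type PassName = "overview" | "businessRules" | "dependencies" | "deadCode" | "dataFlow";

interface DependencyEdge { from: string; to: string; kind: "CALL" | "COPY"; }

interface GroupInput {
  source: string;          // primary program source
  copybooks: string[];     // resolved copybook sources
  edges: DependencyEdge[]; // pre-built dependency graph edges
}

// Stand-in for the model call, injected so the sketch needs no network access.
type ModelCall = (systemPrompt: string, userPrompt: string) => Promise<string>;

const SYSTEM_PROMPT =
  "You are a senior COBOL systems analyst with 30 years of mainframe experience.";

// Mermaid node IDs are sanitized; the original name is kept as the visible label.
function nodeRef(name: string): string {
  return `${name.replace(/[^A-Za-z0-9_]/g, "_")}["${name}"]`;
}

// Pass 3: no model call, the edges are rendered directly as a Mermaid graph.
function renderDependencyMap(edges: DependencyEdge[]): string {
  const lines = edges.map((e) =>
    `  ${nodeRef(e.from)} ${e.kind === "CALL" ? "-->" : "-.->"}|${e.kind}| ${nodeRef(e.to)}`);
  return ["graph TD", ...lines].join("\n");
}

async function runPasses(
  group: GroupInput,
  callModel: ModelCall,
): Promise<Record<PassName, string>> {
  const context = [group.source, ...group.copybooks].join("\n");
  const ask = (task: string) => callModel(SYSTEM_PROMPT, `${task}\n\n${context}`);

  // Passes run in order; only four of the five reach the model.
  const steps: Array<{ name: PassName; run: () => Promise<string> }> = [
    { name: "overview", run: () => ask("Pass 1: summarize the program.") },
    { name: "businessRules", run: () => ask("Pass 2: extract business rules.") },
    { name: "dependencies", run: async () => renderDependencyMap(group.edges) },
    { name: "deadCode", run: () => ask("Pass 4: list likely dead code.") },
    { name: "dataFlow", run: () => ask("Pass 5: trace the data flow.") },
  ];

  const results = {} as Record<PassName, string>;
  for (const step of steps) {
    results[step.name] = await step.run();
  }
  return results;
}
```

Injecting the model call keeps the ordering logic testable without API cost, which mirrors the reason the dependency pass is kept programmatic.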
05 Data Flow
flowchart LR
subgraph Upload["1. Upload"]
FILES["COBOL Files
.cbl .cob .cpy"]
end
subgraph Parse["2. Parse"]
CLASSIFY["Classify
program / copybook / JCL"]
STRIP["Strip Sequence Numbers
Columns 1-6"]
EXTRACT["Extract Structure
Divisions, Sections, Paragraphs
Data Items, FDs, COPY/CALL"]
end
subgraph Group["3. Group"]
RESOLVE["Resolve References
Match COPY to copybooks
Match CALL to programs"]
TOKEN["Estimate Tokens
18 tokens/line x 1.1"]
SPLIT["Split if > 800K tokens"]
end
subgraph Estimate["4. Estimate"]
COST["Calculate Cost
Tokens x 5 passes x pricing"]
TIER["Assign Tier
S / M / L / XL"]
end
subgraph Process["5. AI Pipeline"]
P1["Pass 1: Overview"]
P2["Pass 2: Rules"]
P3["Pass 3: Deps"]
P4["Pass 4: Dead Code"]
P5["Pass 5: Data Flow"]
end
subgraph Output["6. Output"]
ASSEMBLE["Assemble Markdown"]
BUNDLE["JSON Knowledge Base
Content-Disposition: attachment"]
end
FILES --> CLASSIFY --> STRIP --> EXTRACT
EXTRACT --> RESOLVE --> TOKEN --> SPLIT
SPLIT --> COST --> TIER
TIER --> P1 --> P2 --> P3 --> P4 --> P5
P5 --> ASSEMBLE --> BUNDLE
style Upload fill:#0a1628,stroke:#1e3a5f,color:#e0e4ef
style Parse fill:#0a1628,stroke:#1e3a5f,color:#e0e4ef
style Group fill:#0a1628,stroke:#1e3a5f,color:#e0e4ef
style Estimate fill:#0a1628,stroke:#1e3a5f,color:#e0e4ef
style Process fill:#0a1628,stroke:#1e3a5f,color:#e0e4ef
style Output fill:#0a1628,stroke:#1e3a5f,color:#e0e4ef
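The Group and Estimate steps in the diagram reduce to a little arithmetic. The sketch below is an illustration under assumptions: the function names are hypothetical, rounding up is my choice, the greedy splitting strategy is a guess at what "split if > 800K tokens" means, and the price per million tokens is a caller-supplied parameter because the source does not state one.

```typescript
const TOKENS_PER_LINE = 18;       // from the diagram
const MAX_GROUP_TOKENS = 800000;  // from the diagram
const PASSES = 5;                 // from the diagram

// 18 tokens/line x 1.1 overhead. Integer math (x 11 / 10) avoids floating-point
// drift such as 1800 * 1.1 = 1980.0000000000002. Rounding up is an assumption.
function estimateTokens(lineCount: number): number {
  return Math.ceil((lineCount * TOKENS_PER_LINE * 11) / 10);
}

// Assumed strategy: pack files greedily, starting a new chunk whenever the next
// file would push the running total past the limit.
function splitByTokenBudget(lineCounts: number[], maxTokens: number = MAX_GROUP_TOKENS): number[][] {
  const chunks: number[][] = [];
  let current: number[] = [];
  let used = 0;
  for (const lines of lineCounts) {
    const tokens = estimateTokens(lines);
    if (current.length > 0 && used + tokens > maxTokens) {
      chunks.push(current);
      current = [];
      used = 0;
    }
    current.push(lines);
    used += tokens;
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}

// Tokens x 5 passes x pricing; the price is a hypothetical input, not a quoted rate.
function estimateCostUsd(tokens: number, usdPerMillionTokens: number): number {
  return (tokens * PASSES * usdPerMillionTokens) / 1000000;
}
```

For the 301-line demo program this gives `estimateTokens(301)` = 5960 tokens per pass, far below the group limit.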
06 API Routes
flowchart TB
subgraph Routes["API Routes (All Public)"]
POST_CONTACT["POST /api/contact
PoC request form
Sends via Resend"]
POST_UPLOAD["POST /api/upload
Multipart COBOL files
Parse + estimate cost"]
POST_PROCESS["POST /api/process
Start AI pipeline
Async, returns immediately"]
GET_STATUS["GET /api/status/:id
Poll job progress
0-100% + currentStep"]
GET_DOWNLOAD["GET /api/download/:id
Download knowledge base
JSON attachment"]
end
subgraph RateLimits["Rate Limits (In-Memory)"]
RL2["Contact: 5 / 1 hr / IP"]
RL3["Process: 3 / 6 hr / IP"]
end
subgraph Booking["Booking (External)"]
BW["donnacha.app/booking-widget.js
Multi-tenant calendar"]
BA["donnacha.app/api/booking
Availability + scheduling"]
end
POST_CONTACT --- RL2
POST_PROCESS --- RL3
BW --> BA
style Routes fill:#0a1628,stroke:#34d399,color:#e0e4ef
style RateLimits fill:#0a1628,stroke:#ef4444,color:#e0e4ef
style Booking fill:#0a1628,stroke:#00d4ff,color:#e0e4ef
| Route | Method | Rate Limit | Description |
|---|---|---|---|
| /api/contact | POST | 5 / 1 hr | PoC request form. Validates fields, sends via Resend |
| /api/upload | POST | - | Accepts multipart files (max 100MB). Returns jobId + cost estimate |
| /api/process | POST | 3 / 6 hr | Triggers async 5-pass pipeline. Returns immediately |
| /api/status/:id | GET | - | Returns progress (0-100), currentStep, status. No source code exposed |
| /api/download/:id | GET | - | Downloads completed knowledge base as JSON bundle |
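The per-IP limits in the table could be enforced with an in-memory sliding window along these lines. This is a minimal sketch, not the project's middleware: the class name `SlidingWindowLimiter` and its `allow` method are hypothetical, and only the two limits (5 per hour, 3 per 6 hours) come from the table.

```typescript
// Hypothetical helper: keeps recent request timestamps per IP in memory.
class SlidingWindowLimiter {
  private readonly hits = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true and records the request if the IP is under its limit.
  allow(ip: string, now: number = Date.now()): boolean {
    const cutoff = now - this.windowMs;
    // Drop timestamps that have slid out of the window.
    const recent = (this.hits.get(ip) ?? []).filter((t) => t > cutoff);
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(now);
    this.hits.set(ip, recent);
    return allowed;
  }
}

const HOUR_MS = 60 * 60 * 1000;
const contactLimiter = new SlidingWindowLimiter(5, HOUR_MS);     // /api/contact
const processLimiter = new SlidingWindowLimiter(3, 6 * HOUR_MS); // /api/process
```

Because the state lives in a `Map`, it resets whenever the process restarts, and separate serverless instances would each keep their own counts.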
07 Middleware & Security Headers
The site is fully public (no authentication gate). Edge middleware applies security headers and passes all requests through. The auth gate was removed so the site can serve as a public portfolio demo.
flowchart TD
REQ["Incoming Request"] --> STATIC{"Static Asset?
/_next, /favicon, .ico"}
STATIC -->|Yes| PASS["Pass Through"]
STATIC -->|No| HEADERS["Apply Security Headers
HSTS, X-Frame-Options,
Content-Type-Options"]
HEADERS --> PASS
style REQ fill:#0a1628,stroke:#00d4ff,color:#e0e4ef
style PASS fill:#0a2618,stroke:#34d399,color:#e0e4ef
style HEADERS fill:#0a1628,stroke:#f0b429,color:#e0e4ef
Security Headers (next.config.ts)
| Header | Value | Purpose |
|---|---|---|
| X-Frame-Options | DENY | Clickjacking prevention |
| X-Content-Type-Options | nosniff | MIME-sniffing prevention |
| Referrer-Policy | strict-origin-when-cross-origin | Referrer data control |
| Strict-Transport-Security | max-age=63072000; includeSubDomains; preload | Force HTTPS (2-year HSTS) |
| Permissions-Policy | camera=(), microphone=(), geolocation=() | Disable unnecessary browser APIs |
| X-Powered-By | removed | Framework fingerprint suppression |
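In a Next.js config, the table above would typically translate into a `headers()` function plus `poweredByHeader: false`. The sketch below is an assumption about how that could look, not a copy of the project's `next.config.ts`: the header values come from the table, while the catch-all `source` pattern and the variable names are illustrative.

```typescript
// Header values copied from the table; variable names are illustrative.
const securityHeaders = [
  { key: "X-Frame-Options", value: "DENY" },
  { key: "X-Content-Type-Options", value: "nosniff" },
  { key: "Referrer-Policy", value: "strict-origin-when-cross-origin" },
  { key: "Strict-Transport-Security", value: "max-age=63072000; includeSubDomains; preload" },
  { key: "Permissions-Policy", value: "camera=(), microphone=(), geolocation=()" },
];

const nextConfig = {
  // Suppresses the X-Powered-By response header.
  poweredByHeader: false,
  // Applies the headers to every route; the "/(.*)" pattern is an assumption.
  async headers() {
    return [{ source: "/(.*)", headers: securityHeaders }];
  },
};

// In next.config.ts this object would be the module's default export.
```

The `max-age` of 63072000 seconds is 2 x 365 x 24 x 3600, which matches the 2-year HSTS note in the table.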
08 COBOL Parser
Regex-based structural parser for fixed-format COBOL (columns 7-72). Handles sequence number stripping, division/section/paragraph extraction, data item parsing, and reference resolution.
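As an illustration of the fixed-format handling described above, one line can be split by column position. This is a sketch under assumptions rather than the parser's actual code: `parseFixedFormatLine` and its return shape are hypothetical, it treats column 7 as the indicator area and columns 8-72 as code (standard fixed-format layout), and it does not expand tabs.

```typescript
// Hypothetical result shape for one fixed-format source line.
interface FixedFormatLine {
  indicator: string;  // column 7: " ", "*", "/", "-" or "D"
  text: string;       // columns 8-72, trailing whitespace removed
  isComment: boolean; // "*" or "/" in the indicator column
}

function parseFixedFormatLine(raw: string): FixedFormatLine {
  const line = raw.replace(/\r$/, "");    // tolerate CRLF line endings
  // Columns 1-6 (indices 0-5) hold the sequence number and are discarded.
  const indicator = line.charAt(6) || " ";
  // Columns 73-80 (identification area) are cut off by the slice.
  const text = line.slice(7, 72).replace(/\s+$/, "");
  return { indicator, text, isComment: indicator === "*" || indicator === "/" };
}
```

Slicing by position rather than matching a 6-digit prefix also copes with blank or non-numeric sequence areas, at the cost of assuming the input really is fixed-format.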
flowchart LR
RAW["Raw Source"] --> SEQ["Strip Sequence
Numbers (cols 1-6)"]
SEQ --> DIV["Extract Divisions
IDENTIFICATION
ENVIRONMENT
DATA
PROCEDURE"]
DIV --> SEC["Extract Sections
WORKING-STORAGE
FILE
LINKAGE
INPUT-OUTPUT"]
SEC --> PARA["Extract Paragraphs
Named code blocks"]
PARA --> DATA["Extract Data Items
Level numbers, PIC,
USAGE, OCCURS,
REDEFINES"]
DATA --> FD["Extract FD/SD
File descriptors"]
FD --> REFS["Extract References
COPY statements
CALL statements"]
REFS --> METRICS["Compute Metrics
Lines, code, comments,
max nesting depth"]
METRICS --> STRUCT["CobolStructure"]
style RAW fill:#0a1628,stroke:#1e3a5f,color:#e0e4ef
style STRUCT fill:#0a1628,stroke:#00d4ff,color:#00d4ff
Key Parsing Rules
| Feature | Detection Pattern | Notes |
|---|---|---|
| File classification | .cbl/.cob = program, .cpy/.copy = copybook, .jcl/.proc = JCL | Falls back to content-based detection |
| Sequence numbers | 6-digit prefix in columns 1-6 | Stripped during parse, not stored |
| CALL detection | CALL ['"](\w+)['"] vs CALL (\w+) | Literal = static, identifier = dynamic |
| COPY detection | COPY (\S+) + optional REPLACING | Handles member-of-library syntax |
| Token estimation | 18 tokens/line * 1.1 overhead | Conservative estimate for cost calculation |
| Group size limit | 800,000 tokens max per group | Programs exceeding this are split |
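The classification and reference-detection rules in the table might look roughly like this. It is a sketch, not the project's parser: the function and type names are hypothetical, and the regexes deliberately widen the table's `\w+` to `[\w-]+` because COBOL names commonly contain hyphens, which is my adjustment rather than something the source states.

```typescript
type FileKind = "program" | "copybook" | "jcl" | "unknown";

function classifyByExtension(fileName: string): FileKind {
  const dot = fileName.lastIndexOf(".");
  const ext = dot >= 0 ? fileName.slice(dot).toLowerCase() : "";
  if (ext === ".cbl" || ext === ".cob") return "program";
  if (ext === ".cpy" || ext === ".copy") return "copybook";
  if (ext === ".jcl" || ext === ".proc") return "jcl";
  return "unknown"; // a caller would fall back to content-based detection here
}

interface CallRef { target: string; kind: "static" | "dynamic"; }

// Quoted literal = static CALL, bare identifier = dynamic CALL.
function findCalls(line: string): CallRef[] {
  const refs: CallRef[] = [];
  const pattern = /\bCALL\s+(?:['"]([\w-]+)['"]|([\w-]+))/gi;
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(line)) !== null) {
    const literal = m[1];
    const identifier = m[2];
    if (literal) refs.push({ target: literal, kind: "static" });
    else if (identifier) refs.push({ target: identifier, kind: "dynamic" });
  }
  return refs;
}

interface CopyRef { member: string; library: string | null; replacing: boolean; }

// Handles "COPY member", "COPY member OF|IN library" and a REPLACING clause.
function findCopy(line: string): CopyRef | null {
  const m = /\bCOPY\s+([\w-]+)(?:\s+(?:OF|IN)\s+([\w-]+))?/i.exec(line);
  if (!m || !m[1]) return null;
  return { member: m[1], library: m[2] ? m[2] : null, replacing: /\bREPLACING\b/i.test(line) };
}
```

A dynamic CALL only yields the identifier name, so resolving it to an actual program would need data-flow knowledge the regex pass does not have.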
09 Type System
classDiagram
class CobolProject {
+string name
+CobolFile[] allFiles
+CobolFile[] programs
+CobolFile[] copybooks
+ProcessingGroup[] groups
+number totalLines
+number totalPrograms
}
class CobolFile {
+string path
+string name
+string content
+CobolFileType type
+CobolStructure structure
}
class CobolStructure {
+string programId
+CobolDivision[] divisions
+CobolSection[] sections
+CobolParagraph[] paragraphs
+CobolDataItem[] dataItems
+CobolFileDescriptor[] fileDescriptors
+CobolCopyStatement[] copyStatements
+CobolCallStatement[] callStatements
+CobolMetrics metrics
}
class ProcessingGroup {
+CobolFile primaryProgram
+CobolFile[] copybooks
+CobolFile[] calledPrograms
+number estimatedTokens
}
class ProcessingJob {
+string id
+JobStatus status
+number progress
+string currentStep
+CostEstimate costEstimate
}
class KnowledgeBase {
+ProgramDocumentation[] programs
+string systemDependencyMap
+GlossaryEntry[] glossary
+ProjectStatistics statistics
}
class ProgramDocumentation {
+ProgramOverview overview
+BusinessRule[] businessRules
+string dependencyMap
+DeadCodeItem[] deadCode
+string dataFlow
}
CobolProject "1" --> "*" CobolFile
CobolProject "1" --> "*" ProcessingGroup
CobolFile "1" --> "1" CobolStructure
ProcessingGroup "1" --> "1" CobolFile : primaryProgram
ProcessingGroup "1" --> "*" CobolFile : copybooks
ProcessingJob "1" --> "1" CobolProject
KnowledgeBase "1" --> "*" ProgramDocumentation
Type Modules
| Module | Types | Purpose |
|---|---|---|
| cobol.ts | CobolFile, CobolStructure, CobolDivision, CobolSection, CobolParagraph, CobolDataItem, CobolFileDescriptor, CobolCopyStatement, CobolCallStatement, CobolMetrics, ProcessingGroup, CobolProject | COBOL source model |
| job.ts | ProcessingJob, JobStatus, UploadedFile, CostEstimate | Processing lifecycle |
| documentation.ts | KnowledgeBase, ProgramDocumentation, ProgramOverview, BusinessRule, DeadCodeItem, GlossaryEntry, ProjectStatistics, IndexEntry | Output documents |
10 Pre-Computed Demo
The demo replays pre-generated Claude Opus 4.6 output with zero API cost. The analysis was captured once and stored as static JSON.
flowchart TB
subgraph OneTime["One-Time Generation (Offline)"]
SAMPLE["Sample COBOL
PAYROLL-CALC.cbl
301 lines, 2 copybooks"]
RUN["Run 5-Pass Pipeline
Claude Opus 4.6"]
JSON["payroll-calc.json
4 pass outputs, stats,
business rules, diagrams"]
end
subgraph Runtime["Runtime (Zero API Cost)"]
FETCH["fetch('/demo/output/
payroll-calc.json')"]
PARSE_JSON["Parse JSON
4 passes + metadata"]
TYPE["Typewriter Animation
8ms/char, 40ms newline pause"]
TABS["Tab Navigation
Overview | Rules |
Dead Code | Data Flow"]
DL["Download Bundle
Client-side assembly"]
end
SAMPLE --> RUN --> JSON
JSON -.->|"Static file
from /public"| FETCH
FETCH --> PARSE_JSON --> TYPE
PARSE_JSON --> TABS
TABS --> DL
style OneTime fill:#1a1628,stroke:#a78bfa,color:#e0e4ef
style Runtime fill:#0a1628,stroke:#34d399,color:#e0e4ef
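The typewriter timing in the diagram (8 ms per character, 40 ms pause at newlines) can be modelled as a pure schedule that a render loop reads from. This is a sketch with hypothetical function names; it assumes the 40 ms pause is added after a newline character on top of its own 8 ms, which the source does not specify.

```typescript
const CHAR_DELAY_MS = 8;     // from the diagram
const NEWLINE_PAUSE_MS = 40; // from the diagram

// For each character, the time in ms (from animation start) at which it appears.
function typewriterSchedule(text: string): number[] {
  const times: number[] = [];
  let t = 0;
  for (let i = 0; i < text.length; i++) {
    t += CHAR_DELAY_MS;
    times.push(t);
    // Assumption: the newline pause is extra, applied before the next character.
    if (text[i] === "\n") t += NEWLINE_PAUSE_MS;
  }
  return times;
}

// The prefix of `text` that should be visible after `elapsedMs`.
function visibleText(text: string, elapsedMs: number): string {
  const times = typewriterSchedule(text);
  let count = 0;
  while (count < times.length && times[count] <= elapsedMs) count++;
  return text.slice(0, count);
}
```

Deriving the visible prefix from elapsed time, instead of chaining one timer per character, keeps the animation in step with the clock even if a frame is dropped.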
Demo Sample Program
| Metric | Value |
|---|---|
| Program ID | PAYROLL-CALC |
| Total lines | 301 (224 code, 42 comments) |
| Copybooks | PAYROLL-CONSTANTS.cpy, TAX-TABLES.cpy |
| Paragraphs | 12 |
| CALL statements | 2 |
| Business rules found | 13 |
| Dead code items | 8 |
11 Security Hardening
flowchart TB
subgraph Layer1["Layer 1: Edge"]
HSTS["HSTS Preload
2-year max-age"]
FRAME["X-Frame-Options
DENY"]
MIME["Content-Type
nosniff"]
PERM["Permissions Policy
No camera/mic/geo"]
end
subgraph Layer2["Layer 2: Public Access"]
MW["Edge Middleware
Pass-through, no auth gate"]
NOTE["Portfolio demo mode
All routes publicly accessible"]
end
subgraph Layer3["Layer 3: Rate Limiting"]
RL_A["Auth: 10/15min"]
RL_C["Contact: 5/hr"]
RL_P["Process: 3/6hr
(API cost protection)"]
end
subgraph Layer4["Layer 4: Input Validation"]
JOB_ID["Job ID regex
cb-[a-z0-9]+-[a-z0-9]+"]
FILE_EXT["File extension allowlist
.cbl .cob .cpy .jcl"]
MAX_SIZE["Upload size limit
100MB max"]
FIELDS["Contact form validation
Max lengths, email regex"]
end
subgraph Layer5["Layer 5: Data Protection"]
NO_SRC["No source in status API
getPublicJob() strips code"]
VOLATILE["Volatile job store
Auto-clears on restart"]
NO_LOG["No source code logging"]
end
Layer1 --> Layer2 --> Layer3 --> Layer4 --> Layer5
style Layer1 fill:#0a1628,stroke:#00d4ff,color:#e0e4ef
style Layer2 fill:#0a1628,stroke:#f0b429,color:#e0e4ef
style Layer3 fill:#0a1628,stroke:#ef4444,color:#e0e4ef
style Layer4 fill:#0a1628,stroke:#a78bfa,color:#e0e4ef
style Layer5 fill:#0a1628,stroke:#34d399,color:#e0e4ef
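Layers 4 and 5 of the diagram can be illustrated with a few small guards. This is a sketch under assumptions: the job ID pattern, extension allowlist and 100MB cap come from the diagram, but anchoring the regex, reading 100MB as 100 x 1024 x 1024 bytes, and the `StoredJob` / `PublicJob` field layout are mine; only the name `getPublicJob` appears in the source.

```typescript
// Pattern from the diagram, anchored here so partial matches are rejected (assumption).
const JOB_ID_PATTERN = /^cb-[a-z0-9]+-[a-z0-9]+$/;
const ALLOWED_EXTENSIONS = [".cbl", ".cob", ".cpy", ".jcl"];
const MAX_UPLOAD_BYTES = 100 * 1024 * 1024; // "100MB" read as MiB (assumption)

function isValidJobId(id: string): boolean {
  return JOB_ID_PATTERN.test(id);
}

function isAllowedFile(name: string): boolean {
  const dot = name.lastIndexOf(".");
  return dot >= 0 && ALLOWED_EXTENSIONS.indexOf(name.slice(dot).toLowerCase()) !== -1;
}

function isWithinUploadLimit(totalBytes: number): boolean {
  return totalBytes >= 0 && totalBytes <= MAX_UPLOAD_BYTES;
}

// Hypothetical shapes: the real StoredJob may carry different fields.
interface StoredJob {
  id: string;
  status: string;
  progress: number;
  currentStep: string;
  files: Array<{ name: string; content: string }>; // uploaded source, never exposed
}

interface PublicJob { id: string; status: string; progress: number; currentStep: string; }

// Copies only the fields that are safe to return from the status endpoint.
function getPublicJob(job: StoredJob): PublicJob {
  return { id: job.id, status: job.status, progress: job.progress, currentStep: job.currentStep };
}
```

Building the public object field by field, rather than deleting keys from a copy, means a field added to `StoredJob` later stays private unless someone opts it in.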
12 Test Coverage
flowchart TB
subgraph Setup["Setup Phase"]
AUTH_SETUP["auth.setup.ts
Authenticate once
Save cookie to .auth/state.json"]
end
subgraph Tests["Test Suites (52 tests)"]
AUTH_SPEC["auth.spec.ts
5 tests
(runs WITHOUT auth)"]
LANDING["landing-page.spec.ts
13 tests"]
LEGAL["legal-pages.spec.ts
14 tests"]
CONTACT["contact-form.spec.ts
8 tests"]
API_TEST["api-routes.spec.ts
8 tests"]
end
AUTH_SETUP -->|"storageState"| LANDING
AUTH_SETUP -->|"storageState"| LEGAL
AUTH_SETUP -->|"storageState"| CONTACT
AUTH_SETUP -->|"storageState"| API_TEST
AUTH_SPEC -.->|"No storageState
Tests login gate"| AUTH_SPEC
style Setup fill:#0a1628,stroke:#f0b429,color:#e0e4ef
style Tests fill:#0a1628,stroke:#34d399,color:#e0e4ef
| Suite | Tests | Focus | Auth |
|---|---|---|---|
| auth.spec.ts | 5 | Login gate, redirect, wrong password, correct password, API 401 | None |
| landing-page.spec.ts | 13 | Hero, stats, CTAs, 5 passes, code preview, pricing, trust, footer, nav | Authenticated |
| legal-pages.spec.ts | 14 | Privacy (7 sections, Anthropic mention, 30-day deletion), Terms (6 sections, WA jurisdiction), cross-page nav | Authenticated |
| contact-form.spec.ts | 8 | Form fields, submission, validation, loading state, API validation (400s) | Authenticated |
| api-routes.spec.ts | 8 | Upload success/rejection, status 404, download 404, process 400/404 | Authenticated |
Test Configuration
| Setting | Value |
|---|---|
| Browser | Chromium only |
| Base URL | http://localhost:3099 (avoids port conflicts) |
| CI mode | Single worker, 2 retries |
| Dev mode | Parallel workers, reuse dev server |
| Artifacts | Screenshots on failure, traces on first retry |
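The settings in the table map onto standard Playwright options roughly as follows. This is a sketch, not the project's `playwright.config.ts`: it returns a plain object so it runs without Playwright installed, and the `buildPlaywrightConfig` name, the dev server command, and the zero retries outside CI are assumptions.

```typescript
// In a real playwright.config.ts the returned object would be passed to defineConfig().
function buildPlaywrightConfig(isCI: boolean) {
  const baseURL = "http://localhost:3099";
  return {
    workers: isCI ? 1 : undefined, // single worker on CI, Playwright default otherwise
    retries: isCI ? 2 : 0,         // 2 retries on CI; 0 locally is an assumption
    use: {
      baseURL,
      screenshot: "only-on-failure",
      trace: "on-first-retry",
    },
    projects: [{ name: "chromium", use: { browserName: "chromium" } }],
    webServer: {
      command: "npm run dev -- --port 3099", // hypothetical start command
      url: baseURL,
      reuseExistingServer: !isCI,            // reuse a running dev server locally
    },
  };
}
```

Passing `isCI` in as a parameter, instead of reading the environment inside the function, keeps both modes easy to inspect side by side.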
13 Booking System
Discovery calls are scheduled through a multi-tenant booking widget. The widget loads from the VPS and communicates with the booking API.
sequenceDiagram
participant U as User
participant W as Booking Widget<br/>(donnacha.app)
participant API as Booking API<br/>(VPS)
participant E as Email<br/>(Notification)
U->>W: Click "Book a Call"
W->>API: GET /api/booking/assay/availability
API-->>W: Available slots<br/>(Mon-Thu 10am-4pm AWST)
W->>U: Display calendar
U->>W: Select time slot
W->>API: POST /api/booking/assay/book
API->>E: Send confirmation<br/>(project: assay tagged)
API-->>W: Booking confirmed
W->>U: Show confirmation
Configuration
| Setting | Value |
|---|---|
| Project ID | assay |
| Display Name | Assay - COBOL Documentation |
| Service Type | 15-min discovery call |
| Available Days | Monday - Thursday |
| Available Hours | 10:00 AM - 4:00 PM (AWST) |
| Minimum Notice | 48 hours |
| Widget Source | donnacha.app/booking-widget.js |
| Button Text | Hidden (custom CTA used) |
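The availability rules in the table can be expressed as a single predicate. This is an illustrative sketch, not the booking API's logic: `isSlotBookable` is a hypothetical name, AWST is treated as a fixed UTC+8 offset (it has no daylight saving), and requiring the 15-minute call to finish by 4:00 PM is my assumption.

```typescript
const AWST_OFFSET_MS = 8 * 60 * 60 * 1000;  // AWST = UTC+8
const MIN_NOTICE_MS = 48 * 60 * 60 * 1000;  // 48 hours minimum notice
const CALL_MINUTES = 15;                    // 15-min discovery call

function isSlotBookable(slotStart: Date, now: Date): boolean {
  // Shift the instant by +8h, then read AWST wall-clock fields via the UTC getters.
  const awst = new Date(slotStart.getTime() + AWST_OFFSET_MS);
  const day = awst.getUTCDay(); // 0 = Sunday ... 6 = Saturday
  const minutes = awst.getUTCHours() * 60 + awst.getUTCMinutes();

  const isMonToThu = day >= 1 && day <= 4;
  // Assumption: the call must start at or after 10:00 and end by 16:00.
  const inHours = minutes >= 10 * 60 && minutes + CALL_MINUTES <= 16 * 60;
  const enoughNotice = slotStart.getTime() - now.getTime() >= MIN_NOTICE_MS;

  return isMonToThu && inHours && enoughNotice;
}
```

For example, a slot at 2025-01-06T02:00:00Z is Monday 10:00 AM AWST, so it is bookable when requested on 2025-01-03 but not when requested on 2025-01-05, which is inside the 48-hour notice period.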