9 changes: 9 additions & 0 deletions .gitignore
@@ -0,0 +1,9 @@
node_modules/
.next/
generated/
*.db
*.db-journal
*.tsbuildinfo
next-env.d.ts
.env.local
.DS_Store
46 changes: 46 additions & 0 deletions DESIGN.md
@@ -0,0 +1,46 @@
# Design notes

## Data model

Four entities (Prisma + SQLite, `prisma/schema.prisma`):

- **CalendarEvent**: the anchor. Holds `title`, `meetingType`, `status`, and `startTime`/`endTime`, stored as UTC instants alongside the IANA `timezone` they were entered in. Input is wall-clock time plus zone, converted server-side with a DST-correct double-offset lookup (`lib/time.ts`).
- **RawTranscript**: 1:1 with an event (unique `eventId`). Immutable by construction: no update path exists, and a second attach returns 409.
- **ProcessedTranscriptVersion**: append-only versions per event (`@@unique([eventId, version])`). Holds `segments` (ordered `{speaker, text, timestamp}`) and a structured `summary`, both JSON. `source` distinguishes `PIPELINE` from `MANUAL_EDIT`; `jobId` links a pipeline version to the job that produced it. "Current" = highest version; everything else is history.
- **TranscriptJob**: explicit pipeline state (`PENDING → PROCESSING → COMPLETED | FAILED`), with `attempts`, `error`, and timing fields.
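
One plausible reading of the "double-offset lookup" mentioned for `CalendarEvent` is sketched below. It is not the contents of `lib/time.ts`: the function names, the assumed `YYYY-MM-DDTHH:mm` input format, and the use of `Intl.DateTimeFormat` are all assumptions made for illustration. The idea is to look up the zone's offset once at a first guess and again at the corrected instant, so that a DST transition between the two does not leave the result an hour off.

```typescript
// Hypothetical sketch; the real lib/time.ts may parse and handle edge cases differently.

// Offset of `timeZone` from UTC, in milliseconds, at the given instant.
function zoneOffsetMs(instantMs: number, timeZone: string): number {
  const parts = new Intl.DateTimeFormat('en-US', {
    timeZone,
    hourCycle: 'h23',
    year: 'numeric',
    month: '2-digit',
    day: '2-digit',
    hour: '2-digit',
    minute: '2-digit',
    second: '2-digit',
  }).formatToParts(new Date(instantMs));
  const get = (type: string): number =>
    Number(parts.find((p) => p.type === type)?.value ?? NaN);
  // Re-read the zone's wall-clock fields as if they were UTC; the difference is the offset.
  const wallAsUtc = Date.UTC(
    get('year'), get('month') - 1, get('day'),
    get('hour'), get('minute'), get('second'),
  );
  return wallAsUtc - instantMs;
}

// Convert a wall-clock string (assumed format: YYYY-MM-DDTHH:mm) in `timeZone` to a UTC Date.
function zonedNaiveToUtcSketch(local: string, timeZone: string): Date {
  const m = /^(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2})$/.exec(local);
  if (!m) throw new Error(`Unparseable local time: ${local}`);
  const [y, mo, d, h, mi] = m.slice(1).map(Number);
  const naive = Date.UTC(y, mo - 1, d, h, mi);
  // First lookup: the offset at the naive instant gives a first guess.
  const firstGuess = naive - zoneOffsetMs(naive, timeZone);
  // Second lookup: the offset at the guess is the one actually in force near the target.
  return new Date(naive - zoneOffsetMs(firstGuess, timeZone));
}

// 09:00 in New York during summer time (UTC-4) is 13:00 UTC.
console.log(zonedNaiveToUtcSketch('2024-07-01T09:00', 'America/New_York').toISOString());
```

For ambiguous autumn times this sketch happens to pick the earlier occurrence, and it does not reject nonexistent spring-forward times; the design notes indicate the real implementation has characterization tests for both cases.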

SQLite has no enums, so enum-like columns are strings validated with zod at every API boundary (`lib/types.ts` is the single source of truth for the unions).
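
As a rough illustration of string columns guarded at the boundary, the sketch below validates a job status with a hand-rolled type guard. The project uses zod for this; the dependency-free guard and the names `JOB_STATUSES`, `isJobStatus`, and `parseJobStatus` are assumptions chosen to keep the example self-contained.

```typescript
// Hypothetical sketch: the real unions live in lib/types.ts and are validated with zod.
const JOB_STATUSES = ['PENDING', 'PROCESSING', 'COMPLETED', 'FAILED'] as const;
type JobStatus = (typeof JOB_STATUSES)[number];

// Narrow an unknown value (e.g. a string column read from SQLite) to the union.
function isJobStatus(value: unknown): value is JobStatus {
  return typeof value === 'string' && (JOB_STATUSES as readonly string[]).includes(value);
}

// Throwing variant for API boundaries, where an invalid value should be rejected outright.
function parseJobStatus(value: unknown): JobStatus {
  if (!isJobStatus(value)) {
    throw new Error(`Invalid job status: ${JSON.stringify(value)}`);
  }
  return value;
}
```

Deriving the type from the `as const` tuple keeps the runtime list and the compile-time union from drifting apart.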

## Async pipeline

Attaching a raw transcript creates the job and returns immediately; nothing is fire-and-forget because **every transition is a database row update**:

- An in-process runner (`lib/jobs/runner.ts`) drains PENDING jobs. Claims are atomic conditional updates (`UPDATE ... WHERE status = 'PENDING'`), so concurrent runners execute a job at most once — covered by a test that races three drains.
- The new version + the COMPLETED transition commit in one batch transaction; failures store the error message on the job, and a per-job error can never kill the drain loop. There are deliberately **no interactive transactions**: with one SQLite connection behind the driver adapter, an open interactive transaction can swallow interleaved writes on rollback, so writers use single statements / batch transactions with the `(eventId, version)` unique constraint as the race backstop.
- A sweeper (`lib/jobs/sweeper.ts`, started once per server via `instrumentation.ts`) covers the two crash modes the per-request kick can't: PENDING jobs left behind by a restart, and PROCESSING rows orphaned by a mid-job crash (reset to PENDING after 60s).
- **Retry** is user-triggered (FAILED → PENDING, attempts preserved). For demos, a transcript containing `[[FLAKY]]` deterministically fails its first attempt and succeeds on retry; a transcript with no parseable speaker lines fails permanently with a clear error.
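
The claim discipline described above can be modelled without a database. The sketch below is a simplified in-memory stand-in, not the code in `lib/jobs/runner.ts`: `JobTable`, `claim`, and `raceDrains` are hypothetical names, and the "race" is simulated by letting every runner read the same PENDING snapshot before any of them claims.

```typescript
type JobStatus = 'PENDING' | 'PROCESSING' | 'COMPLETED' | 'FAILED';

// In-memory stand-in for the jobs table (hypothetical; the project uses Prisma + SQLite).
class JobTable {
  private readonly rows = new Map<string, JobStatus>();

  enqueue(id: string): void {
    this.rows.set(id, 'PENDING');
  }

  pendingIds(): string[] {
    return Array.from(this.rows.entries())
      .filter(([, status]) => status === 'PENDING')
      .map(([id]) => id);
  }

  // Models: UPDATE job SET status = 'PROCESSING' WHERE id = ? AND status = 'PENDING'
  // Returns the affected-row count: 1 if this caller won the claim, 0 otherwise.
  claim(id: string): number {
    if (this.rows.get(id) !== 'PENDING') return 0;
    this.rows.set(id, 'PROCESSING');
    return 1;
  }

  complete(id: string): void {
    this.rows.set(id, 'COMPLETED');
  }
}

// Each runner sees the same stale list of PENDING ids, then tries to claim every one.
function raceDrains(table: JobTable, runnerCount: number): string[] {
  const snapshots = Array.from({ length: runnerCount }, () => table.pendingIds());
  const executed: string[] = [];
  snapshots.forEach((ids, runner) => {
    for (const id of ids) {
      if (table.claim(id) === 0) continue; // another runner already claimed it
      executed.push(`runner${runner}:${id}`);
      table.complete(id);
    }
  });
  return executed;
}

const table = new JobTable();
table.enqueue('job-a');
table.enqueue('job-b');
const executed = raceDrains(table, 3);
console.log(executed.length); // 2: each job ran once despite three runners
```

The point of the conditional update is that a stale read is harmless: a runner that lost the race gets an affected-row count of 0 and moves on.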

Trade-off, made consciously: the queue is the database and the worker lives in the web process. For a single-user local app this gives full observability with zero infrastructure; the claim discipline means moving to a real worker (BullMQ, pg-boss, or a cron-driven runner) changes only who calls `drainJobs()`.

## Per-meeting-type formats

Processing is parse → clean → summarize (`lib/transcript/`). The split that keeps types cheap to add:

- Summarizers are a `Record<MeetingType, Summarizer>` registry (`lib/transcript/summarize/index.ts`). Each one *builds a different document* (Q&A pairs, roadshow sections, minutes with owners) but emits the same small block vocabulary (`paragraph | bullets | qa | actionItems`).
- Storage, API, and rendering only know that vocabulary, so they are type-agnostic.

**Adding a fourth meeting type** = add the value to `MEETING_TYPES` (the `Record` then fails to compile until you register a summarizer — the gap can't ship silently), write one summarizer file, and map its sample/label. No schema change, no API change, no UI change. A new *block kind* (e.g. a table) would be one renderer case + the zod union.
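
A minimal sketch of the registry idea follows, assuming block shapes and toy summarizers invented for illustration (the discriminator name `kind`, the block fields, and the summarizer bodies are not taken from `lib/transcript/summarize/`). What it does show faithfully is the compile-time property: adding a value to `MEETING_TYPES` without adding a key to the `Record` is a type error.

```typescript
const MEETING_TYPES = ['EXPERT_CALL', 'ROADSHOW', 'WEEKLY_GROUP_CALL'] as const;
type MeetingType = (typeof MEETING_TYPES)[number];

interface Segment {
  speaker: string;
  text: string;
  timestamp: string;
}

// Shared block vocabulary; the field names here are assumptions.
type Block =
  | { kind: 'paragraph'; text: string }
  | { kind: 'bullets'; items: string[] }
  | { kind: 'qa'; question: string; answer: string }
  | { kind: 'actionItems'; items: { owner: string; task: string }[] };

type Summarizer = (segments: Segment[]) => Block[];

// Record<MeetingType, ...> must have exactly one entry per meeting type.
const summarizers: Record<MeetingType, Summarizer> = {
  // Toy rule: treat consecutive segment pairs as question and answer.
  EXPERT_CALL: (segments) => {
    const blocks: Block[] = [];
    for (let i = 0; i + 1 < segments.length; i += 2) {
      blocks.push({ kind: 'qa', question: segments[i].text, answer: segments[i + 1].text });
    }
    return blocks;
  },
  // Toy rule: one paragraph joining everything that was said.
  ROADSHOW: (segments) => [{ kind: 'paragraph', text: segments.map((s) => s.text).join(' ') }],
  // Toy rule: one bullet per turn, prefixed with the speaker.
  WEEKLY_GROUP_CALL: (segments) => [
    { kind: 'bullets', items: segments.map((s) => `${s.speaker}: ${s.text}`) },
  ],
};

function summarize(type: MeetingType, segments: Segment[]): Block[] {
  return summarizers[type](segments);
}
```

Storage and rendering would only ever switch on `kind`, which is why a new meeting type needs no changes there.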

The processor is deterministic and rule-based (filler/stutter removal, role inference by word count, question-cue extraction, action-item/owner matching by content-word overlap). An LLM-backed processor would replace one function (`processRawTranscript`) behind the same signature.
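
To make the filler/stutter step concrete, here is a toy cleaner. The patterns and the name `cleanText` are assumptions for illustration and are far cruder than whatever rules the real cleaner applies; the property worth noticing is idempotence, which the test suite described below also checks for the real one.

```typescript
// Hypothetical rules: a short filler list and immediate word repetition.
const FILLERS = /\b(?:um+|uh+|er+m?)\b[,.]?\s*/gi;
const STUTTER = /\b(\w+)(?:\s+\1\b)+/gi;

function cleanText(text: string): string {
  return text
    .replace(FILLERS, '') // drop fillers plus trailing punctuation and whitespace
    .replace(STUTTER, '$1') // collapse "I I think" to "I think"
    .replace(/\s+/g, ' ') // normalize whitespace left behind
    .trim();
}

const cleaned = cleanText('Um, I I think uh margins are are fine');
console.log(cleaned); // "I think margins are fine"
console.log(cleanText(cleaned) === cleaned); // true: a second pass changes nothing
```

A real cleaner would need to preserve deliberate repetition ("very very"), which is presumably what the meaning-preservation tests guard against.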

## Tests (63)

The things that would actually break:

- parsing edge cases (near-miss speaker lines, hyphenated fillers, filler-only turns)
- cleaner idempotence and meaning preservation
- all three summarizers against the real sample files (with count and absence assertions, not just substring presence)
- the full job lifecycle (success, parse failure, flaky-then-retry, atomic claims under concurrency, the one-active-job guard for enqueue *and* retry)
- sweeper crash recovery
- version monotonicity for regenerate + manual edit
- DST boundary conversion (including characterization of nonexistent/ambiguous wall-clock times)
- input validation (impossible calendar dates are 400s, not 500s)
- the HTTP contract itself: the route handlers are called directly as functions for the attach→process→edit→retry flow and its 4xx cases

## What I'd build next

1. **SSE or polling-with-ETag** instead of the 2.5s client poll for job status.
2. **Real queue + LLM processor** behind the existing interfaces (worker process, provider key via env).
3. **Segment-level diffing** between versions, and an explicit "restore version N" action.
4. **Full-text search** across processed transcripts (SQLite FTS5 would do locally).
5. Pagination, auth, and multi-tenancy — consciously skipped: single-user was in scope, and none of them change the core design.
16 changes: 16 additions & 0 deletions README.md
@@ -1,5 +1,21 @@
# Funda Take-Home: Meeting Transcript Platform

## Quick start (solution)

```bash
npm install
npm run setup # prisma generate + migrate + seed (3 demo events)
npm run dev # http://localhost:3000
```

`npm test` runs the suite (63 tests). See [DESIGN.md](DESIGN.md) for the data model, pipeline, and trade-offs.

Demo walkthrough: open a seeded event → **Load sample for this meeting type** (or paste/upload a `.txt`) → watch the job go `PENDING → PROCESSING → COMPLETED` → the processed transcript appears with the per-type summary. **Regenerate** or **Edit segments** to create new versions. To see the failure path, paste a sample with a line containing `[[FLAKY]]` added — the job fails on the first attempt and succeeds on **Retry** (a transcript with no `[HH:MM:SS] Speaker N:` lines fails permanently with a clear error).

---

*Original assignment below.*

At Funda we record many kinds of investor meetings — expert calls, company roadshows, internal weekly group calls — and turn their recordings into clean, structured, searchable transcripts. This assignment asks you to build a miniature version of that system.

**Timebox: aim for 4–6 focused hours.** We'd rather see a well-designed core than a feature-complete rush job. If you run out of time, write down what you'd do next in your design notes.
60 changes: 60 additions & 0 deletions app/api/events/[id]/raw-transcript/route.ts
@@ -0,0 +1,60 @@
import { readFile } from 'node:fs/promises';
import path from 'node:path';
import { NextResponse } from 'next/server';
import { z } from 'zod';
import { prisma } from '@/lib/db';
import { attachRawTranscriptSchema, type MeetingType } from '@/lib/types';
import { kickJobRunner } from '@/lib/jobs/runner';
import { jsonError, readJsonBody, zodErrorResponse } from '@/lib/api';
import { serializeJob } from '@/lib/serialize';

// Attach a raw transcript: pasted text, an uploaded .txt file's contents, or
// `{ sample: true }` to load the bundled sample for the event's meeting type.
// Attaching automatically enqueues the processing job — there is no separate
// "process" action.

const bodySchema = z.union([z.object({ sample: z.literal(true) }), attachRawTranscriptSchema]);

const SAMPLE_FILES: Record<MeetingType, string> = {
  EXPERT_CALL: 'expert-call-raw.txt',
  ROADSHOW: 'roadshow-raw.txt',
  WEEKLY_GROUP_CALL: 'weekly-group-call-raw.txt',
};

export async function POST(req: Request, { params }: { params: Promise<{ id: string }> }) {
  const { id } = await params;
  const parsed = bodySchema.safeParse(await readJsonBody(req));
  if (!parsed.success) return zodErrorResponse(parsed.error);

  const event = await prisma.calendarEvent.findUnique({
    where: { id },
    include: { rawTranscript: { select: { id: true } } },
  });
  if (!event) return jsonError(404, 'Event not found');
  if (event.rawTranscript) {
    return jsonError(409, 'A raw transcript is already attached; raw transcripts are immutable');
  }

  let content: string;
  let fileName: string | undefined;
  if ('sample' in parsed.data) {
    fileName = SAMPLE_FILES[event.meetingType as MeetingType];
    content = await readFile(path.join(process.cwd(), 'sample-data', fileName), 'utf8');
  } else {
    content = parsed.data.text;
    fileName = parsed.data.fileName;
  }

  // Raw transcript + its processing job are created atomically (nested
  // write): a crash between the two can't leave a transcript with no job.
  const raw = await prisma.rawTranscript.create({
    data: { eventId: event.id, content, fileName, jobs: { create: { eventId: event.id } } },
    include: { jobs: true },
  });
  kickJobRunner();

  return NextResponse.json(
    { rawTranscriptId: raw.id, job: serializeJob(raw.jobs[0]) },
    { status: 201 },
  );
}
26 changes: 26 additions & 0 deletions app/api/events/[id]/regenerate/route.ts
@@ -0,0 +1,26 @@
import { NextResponse } from 'next/server';
import { prisma } from '@/lib/db';
import { ActiveJobError, enqueueProcessingJob } from '@/lib/jobs/runner';
import { jsonError } from '@/lib/api';
import { serializeJob } from '@/lib/serialize';

// Re-run the pipeline against the immutable raw transcript. The result is a
// NEW processed transcript version; history is never overwritten.

export async function POST(_req: Request, { params }: { params: Promise<{ id: string }> }) {
  const { id } = await params;
  const event = await prisma.calendarEvent.findUnique({
    where: { id },
    include: { rawTranscript: { select: { id: true } } },
  });
  if (!event) return jsonError(404, 'Event not found');
  if (!event.rawTranscript) return jsonError(409, 'No raw transcript attached yet');

  try {
    const job = await enqueueProcessingJob(event.id, event.rawTranscript.id);
    return NextResponse.json({ job: serializeJob(job) }, { status: 201 });
  } catch (err) {
    if (err instanceof ActiveJobError) return jsonError(409, err.message);
    throw err;
  }
}
18 changes: 18 additions & 0 deletions app/api/events/[id]/route.ts
@@ -0,0 +1,18 @@
import { NextResponse } from 'next/server';
import { prisma } from '@/lib/db';
import { jsonError } from '@/lib/api';
import { serializeEventDetail } from '@/lib/serialize';

export async function GET(_req: Request, { params }: { params: Promise<{ id: string }> }) {
  const { id } = await params;
  const event = await prisma.calendarEvent.findUnique({
    where: { id },
    include: {
      rawTranscript: true,
      jobs: { orderBy: { createdAt: 'desc' } },
      processedVersions: { orderBy: { version: 'desc' } },
    },
  });
  if (!event) return jsonError(404, 'Event not found');
  return NextResponse.json(serializeEventDetail(event));
}
30 changes: 30 additions & 0 deletions app/api/events/[id]/versions/route.ts
@@ -0,0 +1,30 @@
import { NextResponse } from 'next/server';
import { prisma } from '@/lib/db';
import { manualEditSchema, type MeetingType } from '@/lib/types';
import { createManualVersion, NothingToEditError } from '@/lib/versions';
import { jsonError, readJsonBody, zodErrorResponse } from '@/lib/api';
import { serializeVersion } from '@/lib/serialize';

// Manual edit: the user submits corrected segments, which become a NEW
// version (source = MANUAL_EDIT) — history is never overwritten.

export async function POST(req: Request, { params }: { params: Promise<{ id: string }> }) {
  const { id } = await params;
  const parsed = manualEditSchema.safeParse(await readJsonBody(req));
  if (!parsed.success) return zodErrorResponse(parsed.error);

  const event = await prisma.calendarEvent.findUnique({ where: { id } });
  if (!event) return jsonError(404, 'Event not found');

  try {
    const version = await createManualVersion(
      id,
      event.meetingType as MeetingType,
      parsed.data.segments,
    );
    return NextResponse.json(serializeVersion(version), { status: 201 });
  } catch (err) {
    if (err instanceof NothingToEditError) return jsonError(409, err.message);
    throw err;
  }
}
28 changes: 28 additions & 0 deletions app/api/events/route.ts
@@ -0,0 +1,28 @@
import { NextResponse } from 'next/server';
import { prisma } from '@/lib/db';
import { listEventSummaries } from '@/lib/events';
import { createEventSchema } from '@/lib/types';
import { zonedNaiveToUtc } from '@/lib/time';
import { readJsonBody, zodErrorResponse } from '@/lib/api';

export async function GET() {
  return NextResponse.json(await listEventSummaries());
}

export async function POST(req: Request) {
  const parsed = createEventSchema.safeParse(await readJsonBody(req));
  if (!parsed.success) return zodErrorResponse(parsed.error);

  const { title, meetingType, startLocal, endLocal, timezone, status } = parsed.data;
  const event = await prisma.calendarEvent.create({
    data: {
      title,
      meetingType,
      startTime: zonedNaiveToUtc(startLocal, timezone),
      endTime: zonedNaiveToUtc(endLocal, timezone),
      timezone,
      status,
    },
  });
  return NextResponse.json(event, { status: 201 });
}
20 changes: 20 additions & 0 deletions app/api/jobs/[id]/retry/route.ts
@@ -0,0 +1,20 @@
import { NextResponse } from 'next/server';
import { prisma } from '@/lib/db';
import { ActiveJobError, retryJob } from '@/lib/jobs/runner';
import { jsonError } from '@/lib/api';

export async function POST(_req: Request, { params }: { params: Promise<{ id: string }> }) {
  const { id } = await params;
  try {
    const retried = await retryJob(id);
    if (!retried) {
      const job = await prisma.transcriptJob.findUnique({ where: { id }, select: { status: true } });
      if (!job) return jsonError(404, 'Job not found');
      return jsonError(409, `Only FAILED jobs can be retried (job is ${job.status})`);
    }
    return NextResponse.json({ ok: true });
  } catch (err) {
    if (err instanceof ActiveJobError) return jsonError(409, err.message);
    throw err;
  }
}
28 changes: 28 additions & 0 deletions app/events/[id]/page.tsx
@@ -0,0 +1,28 @@
import { notFound } from 'next/navigation';
import { prisma } from '@/lib/db';
import { serializeEventDetail } from '@/lib/serialize';
import { EventDetail } from '@/components/EventDetail';
import type { EventDetailView } from '@/components/api-types';

export const dynamic = 'force-dynamic';

export default async function EventPage({ params }: { params: Promise<{ id: string }> }) {
  const { id } = await params;
  const event = await prisma.calendarEvent.findUnique({
    where: { id },
    include: {
      rawTranscript: true,
      jobs: { orderBy: { createdAt: 'desc' } },
      processedVersions: { orderBy: { version: 'desc' } },
    },
  });
  if (!event) notFound();

  // Round-trip through JSON so the client component receives exactly the
  // wire shape its polling fetches will produce (dates as ISO strings).
  const initialData = JSON.parse(
    JSON.stringify(serializeEventDetail(event)),
  ) as EventDetailView;

  return <EventDetail eventId={id} initialData={initialData} />;
}