AI and matching

This is the most important architectural section to read. V2 made a deliberate, non-trivial change from V1 here.

TL;DR

Matching is deterministic math. Reach/Fit/Safety is computed from real numbers (GPA, test scores, admit rates, federal net price, completion rates). Same input → same output → fully explainable.
AI writes the 2–3 sentence "why this might be for you" blurb on each college card. That's it. AI does not decide the match.
V1 used RAG (Retrieval-Augmented Generation with vector embeddings). It was replaced because the inputs that actually drive college matching are numeric, not textual.

V1's matching engine ran on @convex-dev/rag with OpenAI's text-embedding-3-small (1536-dimensional embeddings). Every college's profile — name, location, stats, top programs, "strong in" categories, features — was concatenated into a structured text blob and embedded. The vectors were stored in Convex's RAG component under the colleges namespace.

A user's quiz responses were converted into a similar query embedding. Cosine similarity over the vector store returned the nearest N colleges; those were handed to a generation model to produce match explanations.
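
For readers unfamiliar with this style of retrieval, the nearest-N lookup can be sketched as below. This is a minimal in-memory illustration, not the V1 code: the `EmbeddedCollege` shape and the `nearestColleges` helper are hypothetical names, and the real system delegated this step to the RAG component's vector index.

```typescript
// Hypothetical shape: a college id paired with its embedding vector.
type EmbeddedCollege = { unitId: string; vector: number[] };

// Cosine similarity: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // zero vectors have no direction
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every college against the query and keep the n highest scores.
function nearestColleges(
  query: number[],
  colleges: EmbeddedCollege[],
  n: number,
): string[] {
  return colleges
    .map((c) => ({ unitId: c.unitId, score: cosineSimilarity(query, c.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, n)
    .map((c) => c.unitId);
}
```

Note that the ranking depends only on how close two text-derived vectors are; nothing in it compares a GPA to an admit rate directly.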

The V1 ingestion pipeline

V1 didn't read Scorecard live. It downloaded the full Scorecard CSV (MERGED2023_24_PP.csv, ~6,000 institutions), parsed it with a batch processor under packages/backend/scripts/ingestion/, called OpenAI for AI-generated descriptions and feature badges, scraped logos via logo.dev, fetched hero images via Serper, then bulk-inserted the fully-formed college documents.

A single ingestion run cost ~$0.01/college for content generation + Serper calls (~$0.90 per 3,000 schools at $0.30/1K queries) + logo.dev. Full pipeline: 2–3 seconds per college; partial runs an order of magnitude faster.
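
The arithmetic behind those figures can be sketched as follows. The per-college and per-1K rates come from the paragraph above; the assumptions that each college triggers exactly one Serper query and that colleges are processed sequentially are mine, and logo.dev is left out because no price is stated.

```typescript
// Rates quoted in the text above.
const CONTENT_COST_PER_COLLEGE_USD = 0.01;
const SERPER_COST_PER_1K_QUERIES_USD = 0.3;

// Content generation plus Serper, assuming one Serper query per college.
// logo.dev is excluded because the text gives no price for it.
function ingestionCostUsd(colleges: number): number {
  const content = colleges * CONTENT_COST_PER_COLLEGE_USD;
  const serper = (colleges / 1000) * SERPER_COST_PER_1K_QUERIES_USD;
  return content + serper;
}

// Wall-clock time if colleges are processed one after another (an assumption).
function sequentialRuntimeMinutes(colleges: number, secondsPerCollege: number): number {
  return (colleges * secondsPerCollege) / 60;
}

// For 3,000 schools: roughly $30 of content generation plus about $0.90 of Serper calls,
// and between 100 and 150 minutes at 2 to 3 seconds per college.
const costFor3000 = ingestionCostUsd(3000);
const fastestMinutes = sequentialRuntimeMinutes(3000, 2);
const slowestMinutes = sequentialRuntimeMinutes(3000, 3);
```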

Why V1 didn't work for NXT

Retrieval was non-deterministic in practice. Identical query embeddings could surface different top-K results across runs due to vector index sharding and tie-breaking. Same student, different day, different list. Impossible to support.
The signal is numeric, not textual. "Is this school a Reach?" is decided by SAT range + GPA + admit rate, not by how a college's description "sounds like" the student's profile. RAG converts numbers to text, embeds text, then searches by text similarity — three lossy steps where one direct comparison would do.
Coverage was uneven. Schools without rich text descriptions (most trade schools, many community colleges) had weaker embeddings and got worse matches. The V2 expansion into trade schools — the loudest signal from administrators and teachers — would have been infeasible under V1.
Cost grew with usage. Every retrieval was a paid embedding + paid generation call. V2 pays for AI once per (student, school) pair and caches it.
Hallucinations on facts. Generation models fed retrieved text under RAG would invent specifics, such as a "great mentorship program", when no such program existed. A blurb grounded only in vector-retrieved text cannot be fact-checked against numeric ground truth.

The original V1 RAG implementation lives at packages/backend/convex/services/rag.ts on the legacy/v1-archive branch. The V1 setup guide is at docs/technical/rag-setup-guide.md on that branch. Both are reference-only.

What V2 does

V2 has two layers: a deterministic personalization engine that picks and ranks schools, and a grounded AI blurb that writes the friendly explanation.

Deterministic personalization

The current code lives at:

packages/backend/convex/features/discover/ — the rail composer (Picked For You, Learning Style, Campus Vibe, High Value, Hidden Gems, Program Leader, Test Optional, First Generation, MSI).
packages/backend/convex/lib/rfsEngine.ts — Reach/Fit/Safety verdict computation.
packages/backend/convex/features/rfs/ — verdict caching + cleanup.

Every signal is sourced from a federal field:

Rail / signal	Source field(s)
RFS verdict	`ADM_RATE`, `SAT_AVG`, `ACT*` percentiles vs. student GPA/test scores
Picked For You	composite score: interests × programs offered, geographic distance, GPA fit
Learning Style	personality quiz result mapped to documented school traits
Campus Vibe	setting (rural/town/suburb/metro) + size + walkability + politics from quiz
High Value	net price ÷ median earnings (10-yr post-grad)
Hidden Gems	high quality bucket, low awareness signal
Program Leader	top earnings + completions in student's study area (CIP-level)
Test Optional	`ADMCON7=5` Scorecard flag
First Generation	parental education proxy + Pell-eligible cohort outcomes
MSI	Scorecard's HBCU / AANAPISI / HSI / TRIBAL / PBI flags
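
As one concrete reading of a row in this table, the High Value signal can be sketched as a plain ratio. This is an illustration only: the function and field names are hypothetical, and treating a lower ratio as better value is my assumption rather than something the table states.

```typescript
// Hypothetical minimal shape for the two inputs named in the table.
type ValueInputs = {
  unitId: string;
  netPriceUsd: number | null; // Scorecard values can be missing or suppressed
  medianEarnings10yrUsd: number | null;
};

// net price ÷ median earnings; null when either input is unusable.
function highValueRatio(school: ValueInputs): number | null {
  const { netPriceUsd, medianEarnings10yrUsd } = school;
  if (netPriceUsd === null || medianEarnings10yrUsd === null) return null;
  if (medianEarnings10yrUsd <= 0) return null;
  return netPriceUsd / medianEarnings10yrUsd;
}

// Rank schools by ascending ratio (assumption: lower ratio = better value),
// dropping schools whose ratio cannot be computed.
function rankByValue(schools: ValueInputs[]): string[] {
  return schools
    .map((s) => ({ unitId: s.unitId, ratio: highValueRatio(s) }))
    .filter((s): s is { unitId: string; ratio: number } => s.ratio !== null)
    .sort((x, y) => x.ratio - y.ratio)
    .map((s) => s.unitId);
}
```

Because the inputs are two numbers per school, the same profile always yields the same ordering.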

The Afford peek on every card runs the actual federal net-price formula against the student's financeBracket (one of 5 federal income brackets). "$X for a family in your income range" is a fact pulled from NPT4* fields, not a guess.
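
A minimal sketch of that lookup, assuming `financeBracket` is stored as a number from 1 to 5 and the school's sector is known. The `affordPeek` helper and the `Sector` type are hypothetical; the field naming (`NPT41_PUB` through `NPT45_PUB`, and the `_PRIV` equivalents) follows the public Scorecard data dictionary.

```typescript
// Assumption: the student's bracket is stored as 1..5, matching Scorecard's
// income bands ($0-30k, $30-48k, $48-75k, $75-110k, $110k+).
type FinanceBracket = 1 | 2 | 3 | 4 | 5;

// Scorecard reports net price separately for public and private institutions.
type Sector = "PUB" | "PRIV";

// A Scorecard row reduced to the numeric fields we care about.
type ScorecardRow = Record<string, number | null | undefined>;

// Look up the average net price for the student's income bracket.
// Returns null when the school did not report that field.
function affordPeek(
  row: ScorecardRow,
  sector: Sector,
  bracket: FinanceBracket,
): number | null {
  const field = `NPT4${bracket}_${sector}`; // e.g. "NPT42_PUB"
  const value = row[field];
  return typeof value === "number" ? value : null;
}
```

The result is either a reported number or nothing; there is no estimation step for the card to get wrong.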

The five meaning-first answers on college detail (Afford, Admit, Outcomes, Community, Finish) each pull from named Scorecard fields. Outcomes shows real median earnings 10 years post-graduation (MD_EARN_WNE_P10). Admit shows the school's real admit rate against the student's real GPA/test scores to produce a Reach/Fit/Safety verdict.
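
To make "deterministic math" concrete, here is a deliberately simplified verdict function. The cutoffs (15% admit rate, 60 SAT points, 50% admit rate) are invented for illustration and are not the values in `rfsEngine.ts`, which also considers GPA and ACT percentiles.

```typescript
type Verdict = "Reach" | "Fit" | "Safety";

// Simplified sketch: compares one student score against the school's SAT_AVG
// and ADM_RATE. All thresholds below are hypothetical.
function rfsVerdictSketch(
  studentSat: number,
  schoolSatAvg: number,
  admitRate: number, // fraction between 0 and 1, as in Scorecard's ADM_RATE
): Verdict {
  // Very selective schools are a reach for everyone in this sketch.
  if (admitRate < 0.15) return "Reach";

  const gap = studentSat - schoolSatAvg;
  if (gap <= -60) return "Reach";
  if (gap >= 60 && admitRate >= 0.5) return "Safety";
  return "Fit";
}
```

Whatever the real thresholds are, the property that matters is visible here: the function has no randomness and no external calls, so the same inputs always produce the same verdict, and each branch can be quoted back to the student as the reason.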

The AI blurb (the only AI in the app)

Each college card shows a 2–3 sentence "why this might be for you" blurb. Example:

You're aiming for a small, walkable campus and Bowdoin's setting matches that. Your SAT puts you inside their middle 50% range, and Bowdoin's outcomes for English majors line up with the area you're considering.

Where it lives

Builder: packages/backend/convex/lib/openai.ts — pure prompt builder + HTTP client.
Action: packages/backend/convex/features/colleges/actions.ts → generateUserReasoning.
Cache table: collegeReasoning (one row per (userId, unitId)).
Cleanup: weekly cron collegeReasoning cleanup evicts rows older than 30 days.

Model + cost

Model: gpt-5.4-nano (cheapest + fastest reasoning-family model, as of 2026-05).
Input: structured ReasoningUserContext + ReasoningCollegeContext — only the numeric facts the model needs.
Output: 2–3 sentences, ≤200 tokens.
Temperature: handled by the API default (reasoning models reject custom temperature and reject max_tokens — max_completion_tokens is used instead).
Timeout: 15s.
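
The settings above translate into a request along these lines. This is a sketch against the public Chat Completions endpoint, not the contents of `openai.ts`: the helper names are hypothetical, and mapping the 200-token output cap directly onto `max_completion_tokens` is my assumption.

```typescript
type ChatMessage = { role: "system" | "user"; content: string };

// Build the JSON body. Note what is absent: no `temperature` and no
// `max_tokens`, since the text above says reasoning models reject both.
function buildBlurbRequest(messages: ChatMessage[]) {
  return {
    model: "gpt-5.4-nano",
    messages,
    max_completion_tokens: 200, // assumption: the output cap is passed here
  };
}

// Send the request with a 15 second timeout and return the blurb text.
async function requestBlurb(apiKey: string, messages: ChatMessage[]): Promise<string> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), 15_000);
  try {
    const res = await fetch("https://api.openai.com/v1/chat/completions", {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify(buildBlurbRequest(messages)),
      signal: controller.signal,
    });
    if (!res.ok) throw new Error(`OpenAI request failed: ${res.status}`);
    const data = (await res.json()) as {
      choices: { message: { content: string | null } }[];
    };
    const content = data.choices[0]?.message.content;
    if (typeof content !== "string") throw new Error("empty completion");
    return content;
  } finally {
    clearTimeout(timer); // avoid a dangling timer on success or failure
  }
}
```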

Anti-slop guardrails (locked, enforced in system prompt)

No superlatives ("amazing", "perfect", "world-class").
No marketing buzzwords ("nurturing community", "vibrant tapestry").
No AI-vocab tics ("delve", "robust", "pivotal").
No em dashes.
Second-person voice ("you").
2–3 short sentences max.
Every claim grounded in a specific data point handed to the model — no inventing facts.

The prompt is grounded: the model receives the student's GPA and test scores, the school's admit rate, the net-price estimate, and the matching programs, and is instructed to write 2–3 sentences using only those facts.
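
A grounded prompt of this kind might be assembled roughly as follows. Everything here is a sketch: the context field names, the `Sketch` suffixed types, and the wording of the system prompt are hypothetical stand-ins, and the real rules live in `openai.ts`.

```typescript
// Hypothetical, reduced versions of the two context objects.
type UserContextSketch = { gpa: number; satTotal?: number };
type CollegeContextSketch = {
  name: string;
  admitRate: number; // fraction between 0 and 1
  netPriceUsd: number;
  matchingPrograms: string[];
};
type PromptMessage = { role: "system" | "user"; content: string };

// Illustrative wording only; the production prompt is stricter.
const SYSTEM_PROMPT_SKETCH = [
  "Write 2 to 3 short sentences in second person.",
  "Use only the facts provided. Do not invent facts.",
  "No superlatives, no marketing buzzwords, no em dashes.",
].join("\n");

// System prompt first, then a user message that is nothing but facts.
function buildReasoningMessagesSketch(
  user: UserContextSketch,
  college: CollegeContextSketch,
): PromptMessage[] {
  const facts = [
    `Student GPA: ${user.gpa}`,
    ...(user.satTotal !== undefined ? [`Student SAT: ${user.satTotal}`] : []),
    `College: ${college.name}`,
    `Admit rate: ${(college.admitRate * 100).toFixed(0)}%`,
    `Estimated net price: $${college.netPriceUsd}`,
    `Matching programs: ${college.matchingPrograms.join(", ")}`,
  ].join("\n");

  return [
    { role: "system", content: SYSTEM_PROMPT_SKETCH },
    { role: "user", content: facts },
  ];
}
```

Keeping the user message to a flat list of facts means any claim in the blurb can be traced back to a line the model was given.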

Failure mode

If OPENAI_API_KEY is missing or OpenAI is down, the call throws; the action layer catches the error and the school card renders without the blurb. The rest of the card still works.

Cost per user interaction

Action	Cost (USD, 2026-05)
Browsing rails (any number of swipes)	$0 — no AI call
Opening a college card (first time)	~$0.0003 — one `gpt-5.4-nano` blurb
Re-opening the same card within 30 days	$0 — served from `collegeReasoning` cache
Reach/Fit/Safety verdict (computed once per profile change, cached)	$0 — pure math
The Afford peek	$0 — federal formula

Embedding + vector search costs from V1: gone. V2 has no embedding model, no vector store, no cosine search.
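
To put the per-card figure in context, a back-of-envelope projection can be sketched as below. The $0.0003 rate is from the table above; the student and card counts in the example are invented, and the sketch assumes every counted card is a cache miss.

```typescript
// Per-blurb rate from the cost table above.
const COST_PER_BLURB_USD = 0.0003;

// Cost of first-time card opens over one cache window.
// Re-opens within 30 days are free, so only distinct (student, school)
// pairs are counted.
function blurbCostUsd(students: number, distinctCardsPerStudent: number): number {
  return students * distinctCardsPerStudent * COST_PER_BLURB_USD;
}

// Hypothetical example: 10,000 students each opening 40 distinct cards
// comes to roughly $120 for the whole cohort.
const exampleCohortCost = blurbCostUsd(10_000, 40);
```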

Blurb request flow
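
Pieced together from the sections above, the flow is: look up the cached row for the (userId, unitId) pair, generate a blurb on a miss, store it, and fall back to no blurb if generation fails. The sketch below uses an in-memory `Map` as a stand-in for the `collegeReasoning` table; the `getBlurb` helper is hypothetical, and checking row age on the read path (rather than relying only on the weekly cleanup) is my assumption.

```typescript
type ReasoningRow = { blurb: string; createdAt: number };

const THIRTY_DAYS_MS = 30 * 24 * 60 * 60 * 1000;

// Read-through cache: one row per (userId, unitId).
// Returns null when generation fails so the card can render without a blurb.
async function getBlurb(
  cache: Map<string, ReasoningRow>,
  userId: string,
  unitId: string,
  generate: () => Promise<string>, // the paid model call
  now: number = Date.now(),
): Promise<string | null> {
  const key = `${userId}:${unitId}`;

  // 1. Cache hit: no AI call, no cost.
  const cached = cache.get(key);
  if (cached && now - cached.createdAt < THIRTY_DAYS_MS) return cached.blurb;

  // 2. Cache miss: generate once and store the result.
  try {
    const blurb = await generate();
    cache.set(key, { blurb, createdAt: now });
    return blurb;
  } catch {
    // 3. Missing key, outage, or timeout: degrade to a card with no blurb.
    return null;
  }
}
```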

Operational details

One env var. OPENAI_API_KEY on the Convex production deployment. Set via npx convex env set.
Two weekly cleanup crons. rfsVerdicts cleanup (Sunday 9:00 UTC) and collegeReasoning cleanup (Sunday 10:00 UTC) evict rows older than 30 days. Both self-recurse via the scheduler until the backlog clears.
Backfill safety. New college fields can be added to Scorecard mappers without re-running V1-style ingestion. The monthly scorecard refresh cron picks them up on the 1st of each month.
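
The self-recursing cleanup described above can be sketched as a batch eviction that reports whether another pass is needed. This is an in-memory illustration with hypothetical names and a made-up batch size; in Convex the "run again" step would presumably be a scheduler call rather than the loop shown here.

```typescript
type CachedRow = { id: string; createdAt: number };

const MAX_AGE_MS = 30 * 24 * 60 * 60 * 1000;

// Delete at most `batchSize` stale rows and report whether more remain.
function evictBatch(
  rows: CachedRow[],
  now: number,
  batchSize: number,
): { kept: CachedRow[]; deleted: number; more: boolean } {
  const stale = rows.filter((r) => now - r.createdAt > MAX_AGE_MS);
  const toDelete = new Set(stale.slice(0, batchSize).map((r) => r.id));
  return {
    kept: rows.filter((r) => !toDelete.has(r.id)),
    deleted: toDelete.size,
    more: stale.length > batchSize, // backlog not yet cleared
  };
}

// Stand-in for the scheduler: keep running passes until the backlog clears.
function runCleanup(
  rows: CachedRow[],
  now: number,
  batchSize: number,
): { kept: CachedRow[]; passes: number } {
  let current = rows;
  let passes = 0;
  while (true) {
    const result = evictBatch(current, now, batchSize);
    current = result.kept;
    passes++;
    if (!result.more) return { kept: current, passes };
  }
}
```

Bounding each pass keeps any single run small, which is the usual reason for clearing a backlog in batches instead of one large delete.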

Technical detail

Where to read the prompt

packages/backend/convex/lib/openai.ts exports buildReasoningMessages(user, college). The system prompt is the first message; the user message is the structured fact block. Read both before touching either — anti-slop rules are encoded as terse imperatives, easy to weaken accidentally.

Why not Claude or Gemini

Both are viable. gpt-5.4-nano was chosen for (a) lowest current $/M-input-token at acceptable quality, (b) low p99 latency on short outputs, (c) the existing OpenAI account already had org-level cost controls configured. Switching providers is a one-file change in openai.ts (HTTP client + auth header + response shape). The blurb prompt would need re-tuning for the new model's defaults — superlative + em-dash bans are model-agnostic but each model has its own slop fingerprint.

Why not fine-tune

The blurb's job is to summarize structured data the model is already given. There is no domain-specific vocabulary to teach. Fine-tuning would add operating burden (training pipeline, model versioning, eval set) without measurable quality gain over a good system prompt with strict guardrails.

Why not on-device

The mobile stack (expo-router + Hermes) cannot run an LLM on-device at acceptable latency on mid-range Android. The blurb is short enough that a cloud round-trip of about 300ms is below the threshold a user notices.

What was deliberately not built

Re-running the V1 ingestion pipeline anywhere in V2. The scripts/ingestion/* files exist only on legacy/v1-archive.
A vector store. No Convex RAG component, no Pinecone, no pgvector.
A "find similar schools" semantic search. Browse uses a Convex searchIndex on identity.name in the colleges table, plus structured filters (active, primaryCategory, state, ownership). See packages/backend/convex/schema.ts colleges.searchIndex("search_colleges_v2", ...).

If a future product decision adds back semantic search, do it as an additive layer on top of the current deterministic engine. Do not unwind the determinism.