Decisions
Major architectural decisions and the reasoning behind each
This page captures the non-obvious architectural decisions NXT has made. Each entry records what was chosen, what was rejected, why, and the consequences.
For deferred work (decisions to NOT build something today), see Roadmap.
D1 · Deterministic matching engine, not RAG
Chosen: Deterministic computation over federal Scorecard fields. AI used only for the per-school blurb.
Rejected: RAG (V1's approach) — embeddings + vector store + retrieval-augmented generation.
Why: The signals that decide a match (GPA, SAT, admit rate, net price, completion rates) are numeric. Same input must produce the same output, every time, with an explainable line. RAG retrieval is non-deterministic in practice, has weaker coverage for institutions without rich text descriptions, and grows in cost per query. See AI and matching for the full history.
Consequences: Deterministic matching is auditable and fast. AI cost is capped at one blurb per (user, school) per 30 days. Semantic search is not available out of the box; if it is needed, build it as an additive layer on top of the deterministic engine.
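As a minimal sketch of what "same input, same output, with an explainable line" can look like, the pure function below classifies one school for one student. The type names, field names, and thresholds are all hypothetical and chosen for illustration; they are not NXT's actual matching rules.

```typescript
// Hypothetical shapes: field names and thresholds are illustrative only.
interface StudentProfile {
  sat: number; // composite score, 400 to 1600
  maxNetPrice: number; // USD per year the family can pay
}

interface SchoolStats {
  name: string;
  admitRate: number; // fraction between 0 and 1
  sat25: number; // 25th percentile composite SAT
  sat75: number; // 75th percentile composite SAT
  avgNetPrice: number; // USD per year
}

type Tier = "reach" | "target" | "likely";

interface MatchResult {
  tier: Tier;
  affordable: boolean;
  reason: string; // the explainable line shown to the user
}

// Pure function: no clock, randomness, or network access, so equal inputs
// always yield equal outputs.
function matchSchool(student: StudentProfile, school: SchoolStats): MatchResult {
  let tier: Tier;
  if (school.admitRate < 0.2 || student.sat < school.sat25) {
    tier = "reach";
  } else if (student.sat >= school.sat75 && school.admitRate >= 0.5) {
    tier = "likely";
  } else {
    tier = "target";
  }
  const affordable = school.avgNetPrice <= student.maxNetPrice;
  const pct = Math.round(school.admitRate * 100);
  const reason =
    `${school.name}: SAT ${student.sat} vs middle 50% ${school.sat25}-${school.sat75}, ` +
    `admit rate ${pct}%, net price $${school.avgNetPrice}`;
  return { tier, affordable, reason };
}
```

Because the function reads only its arguments, an audit can replay any past match by re-running it with the stored inputs.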
D2 · Convex as the entire backend
Chosen: Convex Cloud for database + functions + crons + real-time subscriptions.
Rejected: Separate Postgres + REST/GraphQL API server + Redis subscriptions.
Why: Types flow end-to-end via @app/data-contract. Subscriptions arrive for free. One deploy ships database, functions, and crons together. Atomic deploys eliminate half-deployed states.
Consequences: Vendor lock-in on Convex, accepted explicitly. @app/data-contract keeps the schema portable, but migrating off Convex would still be a multi-month project; the trade-off is one fewer service to operate today.
D3 · WorkOS AuthKit, not a custom auth implementation
Chosen: WorkOS AuthKit for identity. Shared tenant across mobile + web + docs.
Rejected: Build-your-own (bcrypt + JWT + session table), Auth0, Clerk.
Why: WorkOS handles password hashes, MFA, magic links, OAuth, and session JWTs in one provider. Migration to another OIDC provider is a few days of auth wiring if ever needed. AuthKit Pro tier (SSO, SCIM) is available without re-architecting if NXT signs an enterprise customer.
Consequences: Single-tenant dependency on WorkOS. NXT does not store passwords. WorkOS sees every sign-in.
D4 · Mobile-first, marketing-only on web
Chosen: Product lives in apps/mobile. apps/web is a marketing site (home, About, Contact, privacy/terms, account-deletion request).
Rejected: A web product surface (authenticated routes on wearenxt.com).
Why: Adding a web product doubles the auth surface to harden, duplicates the design system, and increases the deploy blast radius. Mobile is where the user signal lives. The marketing site's job is to drive App Store + Play Store installs, not to compete with the app.
Consequences: Students cannot use NXT on a desktop. If enterprise customers (districts, counselor admins) require a browser-based view, that's net-new work — scope estimate in Roadmap.
D5 · Live Scorecard with cache, not bulk CSV ingestion
Chosen: Read College Scorecard API live, cache results in colleges, refresh monthly via cron.
Rejected: V1's bulk CSV ingestion — download MERGED2023_24_PP.csv, parse + transform + insert + AI-enrich every school in a batch run.
Why: Ingestion runs were slow (2–3 sec/school × 6K institutions, or roughly 3.3 to 5 hours per run if schools are processed sequentially), expensive (per-school OpenAI + Serper + logo.dev calls), and fragile (federal field renames broke runs). Live + cache stays current to within the monthly refresh and pays only for institutions users actually see.
Consequences: The first fetch of a cold school adds a small latency cost (mitigated by the monthly warm refresh). The colleges table grows with demand rather than being pre-populated.
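The read-through pattern described here can be sketched as follows. This is an illustration under stated assumptions, not NXT's implementation: an in-memory Map stands in for the colleges table, the live Scorecard call is injected as a hypothetical fetchLive function, and the 30-day freshness window is an assumed reading of "monthly".

```typescript
// Hypothetical shapes and names; real Scorecard payloads are far wider.
type ScorecardPayload = Record<string, unknown>;

interface CachedCollege {
  unitId: number;
  data: ScorecardPayload;
  fetchedAt: number; // epoch milliseconds
}

type LiveFetcher = (unitId: number) => Promise<ScorecardPayload>;

// Assumed freshness window standing in for the "monthly" refresh.
const MAX_AGE_MS = 30 * 24 * 60 * 60 * 1000;

function isFresh(entry: CachedCollege, nowMs: number): boolean {
  return nowMs - entry.fetchedAt < MAX_AGE_MS;
}

function createCollegeCache(fetchLive: LiveFetcher, now: () => number = Date.now) {
  // Stands in for the colleges table.
  const store = new Map<number, CachedCollege>();

  async function fetchAndStore(unitId: number): Promise<CachedCollege> {
    const data = await fetchLive(unitId);
    const entry: CachedCollege = { unitId, data, fetchedAt: now() };
    store.set(unitId, entry);
    return entry;
  }

  return {
    // Read path: serve from cache when fresh, otherwise pay for one live call.
    async get(unitId: number): Promise<CachedCollege> {
      const hit = store.get(unitId);
      if (hit !== undefined && isFresh(hit, now())) return hit;
      return fetchAndStore(unitId);
    },
    // Cron path: re-fetch only schools that someone has already requested.
    async refreshAll(): Promise<number> {
      const ids = Array.from(store.keys());
      for (const id of ids) await fetchAndStore(id);
      return ids.length;
    },
  };
}
```

Injecting the clock and the fetcher keeps the sketch testable without network access; a school nobody has requested never costs a live call.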
D6 · v.any() for the colleges table shape
Chosen: colleges: defineTable(v.any()) — no Convex validator on the shape.
Rejected: A typed Convex schema mirroring every Scorecard field.
Why: Scorecard's schema is wide (~100 fields, occasionally renamed) and NXT doesn't own it. Forcing a strict validator would break ingestion on every federal release and duplicate types across mapper + schema.
Consequences: Writes are not Convex-validated. Mitigation: every write goes through colleges/internal.ts upsert, which uses a typed builder. All reads go through lib/collegeShape.ts helpers.
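A minimal sketch of the mitigation: one typed builder on the write side and defensive helpers on the read side. The CollegeDoc shape, the function names, and the dotted keys are illustrative assumptions modeled on Scorecard-style field names; they are not the actual contents of colleges/internal.ts or lib/collegeShape.ts.

```typescript
type RawScorecard = Record<string, unknown>;

// Hypothetical stored shape: a few typed fields plus the raw payload.
interface CollegeDoc {
  unitId: number;
  name: string;
  admitRate: number | null;
  raw: RawScorecard; // passed through so new or renamed fields are not lost
}

// Accept only finite numbers; anything else becomes null.
function toNumber(value: unknown): number | null {
  return typeof value === "number" && Number.isFinite(value) ? value : null;
}

// Write side: the single place that turns an untyped payload into a document.
function buildCollegeDoc(raw: RawScorecard): CollegeDoc {
  const unitId = toNumber(raw["id"]);
  const name = raw["school.name"];
  if (unitId === null || typeof name !== "string") {
    throw new Error("Scorecard payload is missing id or school.name");
  }
  return {
    unitId,
    name,
    admitRate: toNumber(raw["latest.admissions.admission_rate.overall"]),
    raw,
  };
}

// Read side: takes `unknown` because the table itself carries no validator.
function readAdmitRate(doc: unknown): number | null {
  if (typeof doc !== "object" || doc === null) return null;
  return toNumber((doc as Record<string, unknown>)["admitRate"]);
}
```

With this split, a federal rename surfaces as a null field in one mapper rather than as a rejected write.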
D7 · Wrapped Convex factories, lint-enforced
Chosen: userQuery / userMutation / userAction wrappers + custom/no-raw-convex-throw ESLint rule.
Rejected: Raw query() / mutation() / action() factories.
Why: Raw factories are publicly callable. A mutation that "only the admin should use" is a parameter-spoofing vulnerability waiting to happen. Wrappers pre-resolve ctx.user from the verified WorkOS session and bake authorization into the factory.
Consequences: Lint catches violations before they merge. New engineers must learn the wrapper pattern, but the lint rule is loud enough that a raw factory does not slip through unnoticed.
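The wrapper idea can be sketched in plain TypeScript without any Convex imports. Everything below is a hypothetical stand-in: RawCtx models a context whose identity comes from a verified session, the users map replaces a database lookup, and requireRole is one assumed way to "bake authorization into the factory". It shows the shape of the pattern, not NXT's actual userMutation.

```typescript
// Hypothetical types: stand-ins for the Convex context and NXT's user model.
type Role = "student" | "admin";

interface User {
  id: string;
  role: Role;
}

interface RawCtx {
  verifiedUserId: string | null; // identity taken from the verified session
}

interface UserCtx {
  user: User;
}

// Stand-in user store so the sketch runs without a database.
const users = new Map<string, User>([
  ["u_admin", { id: "u_admin", role: "admin" }],
  ["u_student", { id: "u_student", role: "student" }],
]);

interface UserMutationOptions<Args, Result> {
  requireRole?: Role;
  handler: (ctx: UserCtx, args: Args) => Result;
}

// The wrapper resolves the user and checks the role before the handler runs.
function userMutation<Args, Result>(options: UserMutationOptions<Args, Result>) {
  return (ctx: RawCtx, args: Args): Result => {
    const user = ctx.verifiedUserId === null ? undefined : users.get(ctx.verifiedUserId);
    if (user === undefined) throw new Error("UNAUTHENTICATED");
    if (options.requireRole !== undefined && user.role !== options.requireRole) {
      throw new Error("FORBIDDEN");
    }
    return options.handler({ user }, args);
  };
}

// The handler never reads identity from args, so a caller cannot spoof it.
const removeCollege = userMutation<{ collegeId: string }, string>({
  requireRole: "admin",
  handler: (ctx, args) => `${ctx.user.id} removed ${args.collegeId}`,
});
```

Calling removeCollege with a student session throws FORBIDDEN regardless of what the arguments claim, which is the property a raw factory does not give by default.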
D8 · One i18n catalog for mobile + web
Chosen: Single packages/i18n/locales/en.json. Mobile (i18next) and web (next-intl) read the same file. pnpm i18n:lint enforces no hardcoded strings.
Rejected: Per-app catalogs, dynamic translation services.
Why: A second language is a translation pass, not engineering work. A single catalog means a single rename PR (pnpm i18n:rename) updates both apps.
Consequences: Mobile + web copy stay in sync by default. Adding Spanish is one translation file.
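One check that keeps "one translation file" honest is key parity between the base catalog and a new locale. The sketch below is a hypothetical helper, not the behavior of pnpm i18n:lint, and the catalog contents are invented for illustration.

```typescript
// A catalog is nested JSON whose leaves are strings.
interface Catalog {
  [key: string]: string | Catalog;
}

// Flatten { home: { title: "x" } } into ["home.title"], sorted for stable output.
function flattenKeys(catalog: Catalog, prefix = ""): string[] {
  const keys: string[] = [];
  for (const [key, value] of Object.entries(catalog)) {
    const path = prefix === "" ? key : `${prefix}.${key}`;
    if (typeof value === "string") {
      keys.push(path);
    } else {
      keys.push(...flattenKeys(value, path));
    }
  }
  return keys.sort();
}

// Keys present in the base catalog but absent from the translation.
function missingKeys(base: Catalog, translation: Catalog): string[] {
  const translated = new Set(flattenKeys(translation));
  return flattenKeys(base).filter((key) => !translated.has(key));
}

// Hypothetical catalogs; the real en.json content is not shown on this page.
const en: Catalog = {
  home: { title: "Find your school", cta: "Get started" },
  common: { ok: "OK" },
};
const es: Catalog = {
  home: { title: "Encuentra tu universidad" },
  common: { ok: "Aceptar" },
};

console.log(missingKeys(en, es)); // prints [ 'home.cta' ]
```

Because both apps read the same file, a check like this would only need to run once per locale rather than once per app.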
D9 · No staging Convex deployment
Chosen: One production + one dev Convex deployment. Mobile staging (TestFlight) builds point at production Convex.
Rejected: A separate staging Convex deployment.
Why: A staging deployment would double the operator burden: two sets of crons firing, two sets of credentials, two webhook endpoints. The dev deployment already serves QA + integration testing. Signed-in test accounts cover the gap.
Consequences: A bad migration on production Convex has no staging tier to catch it. Convex deploys are atomic; rollback is fast. Re-evaluate if a production-only bug ships through dev testing.
D10 · No CI deploys for backend (yet)
Chosen: Production Convex deploy runs from a local operator machine. Web auto-deploys via Vercel on push to main.
Rejected: GitHub Actions-based Convex deploy on main.
Why: Today's local deploy gives a human checkpoint before code reaches production. Convex deploys are atomic; the cost of a bad deploy (revert + redeploy) is low.
Consequences: Production deploys bottleneck on a single operator. Move to CI deploys when more than one engineer is regularly pushing; see Roadmap.