Data
Database, schema, external data sources, caching, backups
Database
Convex. Hosted database + serverless functions + real-time subscriptions in one. There is no Postgres instance to back up, no Redis sidecar, no message broker.
- Production deployment. Configured in the Convex dashboard under the project linked to nxt-app@outlook.com.
- Development deployment. Separate Convex deployment used by the current operator for local work.
- V1 deployments are deleted. Both neighborly-chameleon-947 (V1 prod) and aware-bee-958 (V1 dev) have been removed. The legacy/v1-archive branch still references them in .env.example comments and won't run as-is; V1 is code-only history now.
Schema
The canonical schema lives at packages/backend/convex/schema.ts. It's a single monolithic file by convention (splitting into multiple defineSchema files breaks Convex's generated API surface).
| Table | Purpose |
|---|---|
| users | One row per account, keyed to WorkOS via workos_id. Index: by_email, by_workos_id. |
| profiles | Structured student profile: academics, test scores, finances, background, story, achievements, extracurriculars, community service. Index: by_userId. |
| colleges | Live cache of Scorecard data + derived fields. v.any() shape: read via lib/collegeShape.ts helpers, never direct field access. Indexes: by_unitId, by_active, by_active_score, by_active_category, by_active_state_category, by_parentUnitId. Search index search_colleges_v2 on identity.name with active + primaryCategory + state + ownership filter fields. |
| collegeFieldsOfStudy | Per-(college × CIP × credential level) outcome data. Backs the Outcomes section on college detail. |
| savedSchools | Every save, tagged with the Reach/Fit/Safety verdict at save time. Owns the user's college list. |
| rfsVerdicts | Cached per-(student, school) Reach/Fit/Safety calculations. Evicted by the weekly cleanup cron at 30-day TTL. |
| collegeReasoning | Cached AI blurbs from §3. One row per (userId, unitId). Evicted by the weekly cleanup cron at 30-day TTL. |
| quizPersonality / quizStudyAreas / quizCampusSetting | One row per student per quiz. |
| webhook_events | Idempotency ledger preventing duplicate webhook processing (e.g. WorkOS user-creation webhooks fired twice). |
| reports | Abuse / content reports submitted from inside the app. |
| backfillMetadata | Tracks when external data sources were last refreshed (per-source cursor + counts). |
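The idempotency ledger in webhook_events can be pictured with a minimal sketch. This is an in-memory stand-in, not the actual handler: the names `handleOnce`, `WebhookEvent`, and `processedEvents` are hypothetical, and a real Convex mutation would do the lookup and the insert inside one transaction.

```typescript
// In-memory stand-in for the webhook_events table, keyed by event ID (assumption).
const processedEvents: Record<string, true> = {};

// Hypothetical minimal event shape; real webhook payloads carry more fields.
type WebhookEvent = { id: string; type: string };

// Runs the handler at most once per event ID.
// Returns true if the handler ran, false if the event was a duplicate.
function handleOnce(
  event: WebhookEvent,
  handler: (e: WebhookEvent) => void
): boolean {
  if (processedEvents[event.id]) return false; // already seen: skip
  handler(event);
  processedEvents[event.id] = true; // record only after the handler succeeds
  return true;
}
```

A second delivery of the same event ID returns false without invoking the handler, which is the behaviour the ledger is there to guarantee.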
Where college data comes from
Almost all factual college data comes from the federal College Scorecard API published by the U.S. Department of Education. NXT does not own or maintain it; NXT only reads from it.
| Source | What it provides | Refresh |
|---|---|---|
| College Scorecard API | Identity, location, admit, test, demographics, financials, completion, earnings, MSI flags | Monthly cron scorecard refresh (1st of month, 06:00 UTC) |
| IPEDS | Sibling federal dataset for fields Scorecard doesn't expose | Manual via runbook |
| EADA | Athletics classification | Monthly cron eada refresh (1st of month, 07:00 UTC) |
| Walkability | Campus location walk score | Manual only |
| logo.dev | School logos | Fetched on-demand, cached on the college doc |
| Serper | Hero images (Google Image proxy) | Fetched on-demand, cached on the college doc |
The full cron schedule lives at packages/backend/convex/crons.ts. There are no nightly batch jobs to monitor; Convex runs the crons on schedule automatically. A daily cron, scorecard cron health check (12:00 UTC), flags when the Scorecard refresh has silently stopped firing.
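The staleness test behind a health check like this could look roughly as follows. This is a sketch under assumptions: the 35-day threshold, the `checkRefreshHealth` name, and the idea that the last successful refresh time is available as an epoch-millisecond timestamp are all illustrative, not taken from crons.ts.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Assumed threshold: a monthly job with no success in 35 days is treated as stalled.
const MAX_REFRESH_AGE_MS = 35 * DAY_MS;

type RefreshStatus = { healthy: boolean; ageDays: number };

// Compares the last successful refresh time against the threshold.
// Both arguments are epoch milliseconds.
function checkRefreshHealth(lastRefreshedAt: number, now: number): RefreshStatus {
  const ageMs = now - lastRefreshedAt;
  return {
    healthy: ageMs <= MAX_REFRESH_AGE_MS,
    ageDays: Math.floor(ageMs / DAY_MS),
  };
}
```

With a refresh on the 1st and a check on the 15th of the same month the status is healthy; a refresh two and a half months old is flagged.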
Caching strategy
V2 reads Scorecard live and caches the parsed shape in the colleges table. The first time a user encounters a school, the action fetches Scorecard, normalizes via lib/collegeShape.ts, stamps derived fields (primaryCategory, sortKey, state, etc.) via lib/derive.ts + lib/categorize.ts, and upserts.
The monthly Scorecard refresh paginates through all operating institutions (predominant degrees 1–4) and schedules an upsert per school. The upsert hash check makes re-runs a no-op for unchanged records, so the monthly run is cheap.
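The hash-checked upsert can be illustrated with a small, dependency-free sketch. Everything here is an assumption made for illustration: the `contentHash` field, the FNV-1a-style hash, and the in-memory `store` stand in for whatever the real upsert in the backend uses.

```typescript
// Hypothetical stored shape: the parsed payload plus a hash of its content.
type CachedCollege = {
  unitId: number;
  contentHash: string;
  payload: Record<string, unknown>;
};

// Serialize with sorted object keys so key order does not affect the hash.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(stableStringify).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${stableStringify(obj[k])}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value) ?? "null";
}

// FNV-1a-style 32-bit hash over UTF-16 code units (non-cryptographic, sketch only).
function hashText(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16);
}

// In-memory stand-in for the colleges table, keyed by unitId.
const store: Record<number, CachedCollege> = {};

// Writes only when the content hash differs from what is already stored.
function upsertCollege(
  unitId: number,
  payload: Record<string, unknown>
): "inserted" | "updated" | "unchanged" {
  const contentHash = hashText(stableStringify(payload));
  const existing = store[unitId];
  if (existing && existing.contentHash === contentHash) return "unchanged";
  store[unitId] = { unitId, contentHash, payload };
  return existing ? "updated" : "inserted";
}
```

Re-running the refresh with an identical payload hits the "unchanged" branch and performs no write, which is why repeated runs stay cheap.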
Cached RFS verdicts and AI blurbs live in their own tables with 30-day TTLs. Cleanup crons evict stale rows on Sundays. The query layer also filters stale rows at read time as a belt-and-braces measure.
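The two layers of TTL enforcement (cron eviction plus read-time filtering) can share one freshness predicate. The sketch below assumes a `computedAt` timestamp field in epoch milliseconds; that field name and the helper names are hypothetical.

```typescript
const TTL_MS = 30 * 24 * 60 * 60 * 1000; // 30-day TTL

// Hypothetical cached row; `computedAt` is an assumed field name (epoch ms).
type CachedRow = { userId: string; unitId: number; computedAt: number };

// A row is fresh while it is younger than the TTL.
function isFresh(row: CachedRow, now: number): boolean {
  return now - row.computedAt < TTL_MS;
}

// Read path: drop stale rows even if the cleanup cron has not run yet.
function readFresh<T extends CachedRow>(rows: T[], now: number): T[] {
  return rows.filter((row) => isFresh(row, now));
}

// Cleanup path: the rows a weekly cron would delete.
function selectExpired<T extends CachedRow>(rows: T[], now: number): T[] {
  return rows.filter((row) => !isFresh(row, now));
}
```

Because the cleanup runs weekly, a row can outlive its TTL by up to seven days in storage; the read-time filter keeps such rows from ever being served.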
Backups
Convex retains point-in-time snapshots automatically as part of the hosted service. For defense-in-depth:
- Configurable scheduled exports to S3 / R2. Set up via the Convex dashboard under Settings → Backups. Not configured today — Convex-internal snapshots are the recovery story.
- Data export. npx convex export --path snapshot.zip writes the full deployment to a local zip file. Suitable for one-off backups, audits, or migration prep.
- @app/data-contract holds the schema as TypeScript. If everything else burns down, the schema can be re-applied to a fresh deployment in minutes.
How to export data
For ad-hoc data export (a single table, or a filtered slice):
# Full export
npx convex export --path snapshot.zip
# Inspect via dashboard
# https://dashboard.convex.dev/d/<deployment>/data

The dashboard's table view supports CSV download for any table or filtered view.
Technical detail
Why colleges is v.any()
Scorecard's schema is wide (~100 fields, occasionally renamed across yearly releases) and NXT does not own it. Forcing a strict Convex validator on every Scorecard field would (a) break ingestion every time a federal field name changes and (b) duplicate every field's type in two places. v.any() lets the mapper layer (lib/collegeShape.ts) be the single source of truth for shape, and pushes type safety to the read-time helpers.
The trade-off: writes are not validated by Convex. The mitigation: every write goes through colleges/internal.ts upsert, which constructs the document via a typed builder.
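The split between a typed builder on write and narrowing helpers on read can be sketched as below. The `CollegeDoc` subset, `buildCollegeDoc`, and `getCollegeName` are hypothetical names chosen for illustration; the real helpers in lib/collegeShape.ts and colleges/internal.ts may look quite different.

```typescript
// Hypothetical subset of the cached document shape.
type CollegeDoc = {
  unitId: number;
  active: boolean;
  identity: { name: string };
};

// Typed builder: the single place a colleges document is constructed,
// so the compiler checks the shape even though the table validator does not.
function buildCollegeDoc(input: {
  unitId: number;
  name: string;
  operating: boolean;
}): CollegeDoc {
  return {
    unitId: input.unitId,
    active: input.operating,
    identity: { name: input.name },
  };
}

// Read-time helper: the stored value is untyped, so narrow before use.
function getCollegeName(doc: unknown): string | null {
  if (typeof doc !== "object" || doc === null) return null;
  const identity = (doc as { identity?: unknown }).identity;
  if (typeof identity !== "object" || identity === null) return null;
  const name = (identity as { name?: unknown }).name;
  return typeof name === "string" ? name : null;
}
```

Callers that go through the helper get `null` for a malformed or legacy document instead of a runtime exception from direct field access.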
Derived top-level fields
Convex cannot index nested fields. Where a rail or browse query needs to filter or sort on a nested Scorecard field, the value is promoted top-level on every write:
- active: operating institution flag.
- sortKey: mirrors derived.academicScore for the by_active_score index. Flagged for removal once aggregateWorkforce + aggregateTransferOutcome aggregate components fully replace it.
- primaryCategory: stamped by lib/categorize.ts. Drives per-category rails + browse filtering. Null only on pre-categorize-backfill docs (none should exist on prod today).
- state, ownership: promoted from nested location/control fields for the by_active_state_category index.
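A minimal sketch of the promotion step follows. The nested paths (`operating`, `location.state`, `control.ownership`) and the `stampTopLevel` name are assumptions for illustration; only `derived.academicScore` and the top-level field names come from the list above.

```typescript
// Hypothetical parsed shape before promotion; nested paths are assumed.
type ParsedCollege = {
  operating: boolean;
  location: { state: string };
  control: { ownership: string };
  derived: { academicScore: number };
};

// The same document with the indexable fields copied to the top level.
type StampedCollege = ParsedCollege & {
  active: boolean;
  sortKey: number;
  primaryCategory: string | null;
  state: string;
  ownership: string;
};

// Runs on every write so the top-level copies never drift from the nested values.
// `categorize` stands in for whatever lib/categorize.ts exposes.
function stampTopLevel(
  parsed: ParsedCollege,
  categorize: (c: ParsedCollege) => string | null
): StampedCollege {
  return {
    ...parsed,
    active: parsed.operating,
    sortKey: parsed.derived.academicScore,
    primaryCategory: categorize(parsed),
    state: parsed.location.state,
    ownership: parsed.control.ownership,
  };
}
```

Stamping inside the single write path means an index such as by_active_state_category can rely on the promoted fields being present on every document.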
Ownership invariant
Every user-scoped table has an ownership index. For example, savedSchools is indexed by userId so the user-internal cascade query (account deletion) can collect only the deleting user's rows. The lint rule @convex-dev/no-collect-in-query enforces that .collect() is only used on bounded sets — exceptions are explicitly allow-listed in eslint.config.mjs.
Errors as data
Failure modes for the AI blurb, RFS computation, and Scorecard fetch each have a typed factory in convex/lib/errors.ts or convex/lib/errors-<feature>.ts. The frontend translates these via parseConvexError(e).code and getErrorMessage(err, t). There are no raw throw new Error(...) calls in convex code — custom/no-raw-convex-throw blocks them.
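The errors-as-data pattern can be sketched with a local stand-in class. `AppError`, `scorecardErrors`, `parseErrorCode`, and the error codes are all hypothetical; the actual factories in convex/lib/errors.ts and the real parseConvexError helper are not reproduced here.

```typescript
// Local stand-in for an application error that carries structured data.
class AppError extends Error {
  readonly data: { code: string; retryable: boolean };
  constructor(data: { code: string; retryable: boolean }) {
    super(data.code);
    this.name = "AppError";
    this.data = data;
  }
}

// Typed factories: one per failure mode (codes here are hypothetical).
const scorecardErrors = {
  upstreamUnavailable: () =>
    new AppError({ code: "SCORECARD_UNAVAILABLE", retryable: true }),
  notFound: () =>
    new AppError({ code: "SCORECARD_NOT_FOUND", retryable: false }),
};

// Client side: recover the code from whatever was thrown.
// Checks the payload shape rather than the class, since errors that cross
// a network boundary may not keep their prototype.
function parseErrorCode(e: unknown): string {
  if (typeof e === "object" && e !== null && "data" in e) {
    const data = (e as { data: unknown }).data;
    if (typeof data === "object" && data !== null) {
      const code = (data as { code?: unknown }).code;
      if (typeof code === "string") return code;
    }
  }
  return "UNKNOWN";
}
```

The frontend can then map a stable code to a translated message, and anything without a recognizable payload falls through to a generic fallback.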