Add College Field
Operational runbook
Categorization note
Every Scorecard upsert now calls categorizeFromRawRow (Plan Unit 9)
and stamps primaryCategory, categoryVersion, contentVersion,
tags, parentUnitId, isBranch, state (top-level promoted) onto
the doc. If the new field changes the prompt-input set used by AI
reasoning or rfsLive, also bump hashContentInputs domain in
mapper.ts so contentVersion invalidates the cache.
See also: docs/runbooks/categorize-school.md.
When to use this
Use this guide when you want to surface a new College Scorecard (or IPEDS / EADA) attribute on the mobile detail screen.
Prerequisites
- Confirm the field exists in the Scorecard data dictionary at https://collegescorecard.ed.gov/data/documentation/
- Know its dot-key path (e.g. latest.student.share_25_older) and value type (number 0-1, integer, enum code)
- Understand the unit: fraction vs. percentage, raw code vs. human-readable label, etc. Wrong units are silent: the field will populate with garbage and pass typecheck.
Steps
1. Add the dot-key to INSTITUTION_FIELDS
File: packages/backend/convex/lib/scorecard/fields.ts
Append the Scorecard dot-key to the INSTITUTION_FIELDS const array. Keep the group structure (school.* → location.* → admissions.* → cost.* → completion.*, etc.).
// packages/backend/convex/lib/scorecard/fields.ts
"latest.student.share_25_older",
Adding a key here is safe: the mapper drops unknowns. Removing a key that the mapper reads will break the mapper silently (fields will become null for new fetches; existing cached rows are unaffected until re-upserted).
2. Read and map the field in mapInstitutionDoc
File: packages/backend/convex/lib/scorecard/mapper.ts
Use the f(s, key) + num() / str() / bool() helper pattern. Never inline raw field access.
// Inside mapInstitutionDoc return object, under the appropriate section:
students: {
// existing fields ...
over25Share: num(f(s, "latest.student.share_25_older")),
},
All values must be nullable (number | null, string | null). Scorecard suppresses small cohorts, so null is a normal production value, not an error.
If the field is a numeric code that maps to a label (like school.locale), add the mapping table to fields.ts alongside LOCALE_MAP and SIZE_MAP, then apply it in the mapper. Inline switch statements in mapInstitutionDoc make the mapper hard to test.
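The helper pattern from this step can be sketched as follows. This is a minimal illustration, not the real implementation: the bodies of f() and num() below are assumptions (a flat dot-key lookup and a finite-number check), and mapStudents is a hypothetical stand-in for the relevant slice of mapInstitutionDoc.

```typescript
type RawRow = Record<string, unknown>;

// Assumed behaviour of f(): look up a flat dot-key on the raw Scorecard row.
function f(row: RawRow, key: string): unknown {
  return row[key];
}

// Assumed behaviour of num(): keep finite numbers, turn everything else into null.
function num(value: unknown): number | null {
  return typeof value === "number" && Number.isFinite(value) ? value : null;
}

// Hypothetical slice of the mapper's return object for the students section.
function mapStudents(s: RawRow): { over25Share: number | null } {
  return {
    over25Share: num(f(s, "latest.student.share_25_older")),
  };
}
```

With helpers shaped like this, a suppressed or missing value ends up as null in the doc rather than undefined or NaN, which is what the nullable types in the later steps rely on.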
3. Add the field to ShapeRoot in collegeShape.ts
File: packages/backend/convex/lib/collegeShape.ts
The ShapeRoot interface is the internal read-layer for Convex docs. Add the field under the correct section, marked optional (the table is v.any() — old docs won't have it).
// packages/backend/convex/lib/collegeShape.ts
students?: {
// existing fields ...
over25Share?: number | null;
};
4. Expose the field in toCollegeContract
File: packages/backend/convex/lib/collegeShape.ts
toCollegeContract is the mobile serializer. Add the field to its return object:
over25Share: c.students?.over25Share ?? null,
Use ?? null, never ?? 0 or ?? false, so callers can distinguish "data not yet ingested" from a true zero.
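To make the ?? null rule concrete, here is a small sketch. ShapeSlice and toContractSlice are hypothetical, trimmed-down stand-ins for ShapeRoot and toCollegeContract; only the over25Share line mirrors the runbook.

```typescript
// Hypothetical minimal version of the ShapeRoot section used in this guide.
interface ShapeSlice {
  students?: {
    over25Share?: number | null;
  };
}

// Hypothetical slice of the serializer: absent data becomes null, never 0.
function toContractSlice(c: ShapeSlice): { over25Share: number | null } {
  return {
    over25Share: c.students?.over25Share ?? null,
  };
}

// An old doc without the section, a suppressed value, and a genuine zero.
const notIngested = toContractSlice({});
const suppressed = toContractSlice({ students: { over25Share: null } });
const trueZero = toContractSlice({ students: { over25Share: 0 } });
```

Because ?? only falls back on null or undefined, the genuine zero survives as 0 while the other two cases become null. Using ?? 0 would collapse all three into the same value.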
5. Add the field to the College type in @app/data-contract
File: packages/data-contract/src/types.ts
College is the single source of truth for the mobile contract. Add the field with a TSDoc comment explaining the source and units:
/**
* Share of undergrads aged 25+ (SHARE_25_OLDER). Scorecard `latest.student.share_25_older`.
* 0–1 fraction or null when suppressed.
*/
over25Share: number | null;
Never re-declare the type in apps/mobile/ or apps/web/. Every consumer imports from @app/data-contract.
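On the consumer side, the documented units (0-1 fraction or null) determine how the value is displayed. The sketch below is illustrative only: formatShare and the "Not reported" label are hypothetical, and a real consumer would import College from @app/data-contract rather than take a bare number.

```typescript
// Hypothetical display helper for a nullable 0-1 fraction such as over25Share.
function formatShare(value: number | null): string {
  if (value === null) {
    // Suppressed or not yet ingested: show a placeholder, not "0%".
    return "Not reported";
  }
  // The contract carries a fraction, so scale to a percentage only for display.
  return `${Math.round(value * 100)}%`;
}
```

Keeping the fraction in the contract and converting at the display edge means every consumer sees the same units the TSDoc comment promises.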
6. Run the Scorecard backfill
After deploying the code changes, re-ingest all colleges so the new field populates from live Scorecard data:
npx convex run features/colleges/actions:scorecardBackfill
This enqueues an upsert mutation for every school. The upsert's content-hash check skips writes for unchanged records. New field additions always trigger a write because the hash changes. Wall time: ~5 minutes for the full ~6,500-school dataset.
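The skip-on-unchanged behaviour can be pictured with the sketch below. It is not the project's contentHash(): the key-sorted serialization, the FNV-1a hash, and the names stableStringify, contentHashSketch, and needsWrite are all assumptions chosen to keep the example self-contained.

```typescript
// Serialize with sorted object keys so key order does not affect the hash.
// Note: undefined is serialized as "null" here, unlike JSON.stringify.
function stableStringify(value: unknown): string {
  if (value === undefined) return "null";
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return `[${value.map(stableStringify).join(",")}]`;
  const obj = value as Record<string, unknown>;
  const parts = Object.keys(obj)
    .sort()
    .map((k) => `${JSON.stringify(k)}:${stableStringify(obj[k])}`);
  return `{${parts.join(",")}}`;
}

// 32-bit FNV-1a over UTF-16 code units; a stand-in for whatever hash the project uses.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

function contentHashSketch(doc: unknown): string {
  return fnv1a(stableStringify(doc));
}

// The upsert writes only when the stored hash differs from the new doc's hash.
function needsWrite(storedHash: string, nextDoc: unknown): boolean {
  return storedHash !== contentHashSketch(nextDoc);
}
```

Under this model, re-running the backfill with identical mapped output is cheap, because needsWrite returns false and no write happens.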
7. Verify in the Convex dashboard and smoke test
- Open Convex dashboard → Tables → colleges → pick a well-known school (e.g. unitId 110635 = University of California, Berkeley)
- Confirm the new field is present and non-null under the correct section
- Run pnpm check (typecheck + lint); the contract change must pass
Verification commands
# Full typecheck + lint after changes
pnpm check
# Confirm the backfill ran (check dashboard or query directly)
npx convex run features/colleges/internal:byUnitId '{"unitId":110635}'
# Run the scorecard backfill manually
npx convex run features/colleges/actions:scorecardBackfill
# After backfill, spot-check the record for one school
npx convex run features/colleges/internal:byUnitId '{"unitId":110635}'
What can go wrong
All rows show null after backfill
The dot-key in INSTITUTION_FIELDS is wrong or the field doesn't exist for the filter set. Verify in the Scorecard API Explorer at https://collegescorecard.ed.gov/data/documentation/ with a manual curl:
curl "https://api.data.ed.gov/ed/collegescorecard/v1/schools?api_key=YOURKEY&fields=id,school.name,latest.student.share_25_older&per_page=5"
Wrong units (e.g. 0.43 vs 43)
Scorecard rate fields are 0-1 fractions; integer-count fields are whole numbers. The mapper's num() helper passes the raw value through — if the Scorecard returns 43.0 for a percentage that should be 0.43, you need to divide in the mapper. Check the Scorecard data dictionary's "type" and "example" columns.
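A unit fix in the mapper might look like the sketch below. The helper names isFraction and pctToFraction are hypothetical and not part of the existing mapper. The conversion should only be applied after the data dictionary confirms that the field really is a 0-100 percentage.

```typescript
// True when a value is a plausible 0-1 fraction.
function isFraction(value: number): boolean {
  return value >= 0 && value <= 1;
}

// Convert a 0-100 percentage to a 0-1 fraction, preserving null for suppressed values.
function pctToFraction(value: number | null): number | null {
  return value === null ? null : value / 100;
}
```

A range check such as isFraction is also handy in a test or a one-off script over a sample of rows, because a wrong unit never fails typecheck.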
Type errors after adding to College
The return type of toCollegeContract is inferred, but it must still include every field declared in College. If you add to College but forget toCollegeContract, TypeScript will surface an error at the byUnitId query level (the return type mismatch propagates up). Fix by adding the field to toCollegeContract.
Content hash not invalidated
The upsert mutation in features/colleges/internal.ts uses contentHash() to skip writes when nothing changed. A field that maps to null for all schools will not change existing docs — they already have null for an unknown field. This is expected. The field will appear after the next Scorecard refresh for schools where the API returns a non-null value.
Anti-patterns
Non-nullable types in College — Every Scorecard field must be T | null. Non-null contracts will cause runtime crashes when Scorecard suppresses values for small schools. The existing fields (admitRate, medianEarnings10yr, etc.) are all nullable — match that pattern.
Deriving values from other fields inside mapInstitutionDoc — Derived values (composite scores, GPA proxies, grade labels) belong in packages/backend/convex/lib/derive.ts via computeDerived(), not inline in the mapper. The mapper maps; the derive module derives.
Forgetting toCollegeContract — Adding the field to ShapeRoot and College but not to toCollegeContract means the field will be present in the Convex doc but silently absent from every mobile query response. Always check toCollegeContract last.
Inline string literals in mapOwnership-style logic — Use the existing helper functions (mapOwnership, mapTestPolicy, mapLocale via LOCALE_MAP) rather than inline conditionals. If you need a new code-to-label mapping, add a const map to fields.ts.
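As an illustration of the const-map approach recommended above, here is a minimal sketch. EXAMPLE_LOCALE_LABELS and labelFor are hypothetical names, and the three entries are only a sample of locale codes, not the contents of the real LOCALE_MAP.

```typescript
// Hypothetical code-to-label table, kept as data so it can be tested on its own.
const EXAMPLE_LOCALE_LABELS: Readonly<Record<number, string>> = {
  11: "City: Large",
  12: "City: Midsize",
  13: "City: Small",
};

// Look up a label; null input and unknown codes both map to null.
function labelFor(
  table: Readonly<Record<number, string>>,
  code: number | null,
): string | null {
  if (code === null) return null;
  return table[code] ?? null;
}
```

Keeping the table as a const next to the other maps means adding a code is a data change, and the lookup stays a single line in the mapper.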
Reference example
The walkability block follows this exact pattern end-to-end but uses a separate action (fetchAndCacheWalkabilityForCollege) rather than Scorecard. For a pure Scorecard field, the closest example is the share_25_older addition path above — look at how pellPercent and federalLoanPercent flow from fields.ts → mapper.ts → collegeShape.ts → toCollegeContract → types.ts.