Methodology

How the AI produces the report — honestly.

Everything in your report traces to one of two places: something in your student's uploaded documents, or a piece of institutional data we maintain for the school in question. Here's exactly how that works.

What the AI actually reads

The AI ingests the documents you upload (your student's transcript, test score reports, AP/IB scores, and activities list) along with the high school profile, which we fetch automatically. It extracts structured information: every course and grade by semester, every test sitting and section, every activity's hours, years, and role descriptions. It does not read Common App essays. It does not read letters of recommendation. It does not receive demographic fields like race or ethnicity.

DETAILS

Required documents

Transcript (grades 9–12), test score reports (SAT / ACT / AP / IB), activities list

Auto-fetched

High school profile — grade distribution, course offerings, Naviance data where available

Not used

Essays, recommendation letters, photos, student names (kept out of the reasoning), race/ethnicity, religion

Extraction

OCR + structured parsing for transcript and score PDFs; direct parse for Common App-format activities
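
As a rough illustration of the structured information described above, the extracted profile could be modeled along these lines. This is a sketch only: the class and field names are hypothetical and are not the product's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class CourseGrade:
    course: str
    semester: str  # e.g. "11-Fall"; the label format is an assumption
    grade: str


@dataclass
class TestSitting:
    test: str                 # "SAT", "ACT", "AP", or "IB"
    date: str
    sections: dict[str, int]  # section name -> score


@dataclass
class Activity:
    name: str
    role: str
    years: list[int]          # grade levels in which the student took part
    hours: float


@dataclass
class StudentProfile:
    # Deliberately no fields for essays, recommendation letters,
    # or race/ethnicity, mirroring the "Not used" entry above.
    courses: list[CourseGrade] = field(default_factory=list)
    tests: list[TestSitting] = field(default_factory=list)
    activities: list[Activity] = field(default_factory=list)
```

Leaving the excluded fields out of the type entirely, rather than filtering them later, is one way to make "not used" a structural property instead of a promise.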

How the AI reasons about admissions

The AI uses a detailed admissions-domain prompt we've built and iterate on continuously. The prompt encodes how admissions officers actually read files at selective schools: academic rigor first, then achievement within rigor, then extracurricular depth and leadership, then context (school profile, geography, first-gen status). Every section of the report is produced by the model walking through that evaluation explicitly, not by pattern-matching on a training set of prior outcomes.

DETAILS

Prompt

A domain-specific system prompt that encodes admissions evaluation frameworks used at selective schools

Reasoning style

Explicit, step-by-step. The model names the features it's weighing at each school and shows its work

Iteration

The prompt is versioned. We update it as admissions policies and data change: test-optional shifts, Early Decision (ED) rate changes, newly published institutional data

How probabilities are produced

Probabilities are computed by a deterministic, catalog-anchored function — not by the AI. The AI identifies which qualitative factors apply at each school (named accomplishment, ED, first-generation status, legacy ties, etc.); the server then computes the probability from a closed-form formula: a per-major base admit rate, plus tier-aware adjustments for GPA fit, test fit, and the factors the AI flagged. The same student profile produces the exact same probability run-to-run.
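
To make the division of labor concrete, here is a minimal sketch of a closed-form calculation of this shape. The clamp bounds of 1 and 95 are the ones this page states for the calculation; the function names and the adjustment magnitudes are hypothetical, chosen only for illustration.

```python
def clamp(value: float, low: float, high: float) -> float:
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def admit_probability(base_rate_pct: float, adjustments_pp: dict[str, float]) -> float:
    """Base rate plus the sum of factor adjustments, clamped to 1..95 percent.

    The keys of adjustments_pp stand in for the factor labels the AI selects;
    the values stand in for magnitudes the server owns.
    """
    return clamp(base_rate_pct + sum(adjustments_pp.values()), 1.0, 95.0)


# Hypothetical magnitudes in percentage points, not the engine's real values.
adjustments = {"gpa_fit": 3.0, "test_fit": 2.0, "early_decision": 4.0}
print(admit_probability(13.0, adjustments))  # 22.0
```

Because the function is pure arithmetic over its inputs, the same base rate and the same set of flagged factors always yield the same number.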

DETAILS

Calculation

prob = clamp(per-major base rate + sum of tier-aware factor adjustments, 1, 95), with every term expressed in percentage points

AI's role

Selects which factor labels apply (qualitative judgment); writes the strategic prose

Server's role

Owns every number — admit rate, GPA / test bumps, ED magnitude, first-gen and legacy lifts. AI does not invent magnitudes.

Per-major base rate

Each (school, major) combo gets its own base. Where a published per-major admit rate exists (e.g., CMU SCS 5%, UCSD CSE 13%), the catalog uses it. Otherwise the base is computed from the school's overall rate × a tier-aware major multiplier built from real per-major data — bio at engineering-heavy schools is ~1.5–2× the overall rate (because bio applicants don't compete with the CS pool); CS at most flagships is ~0.4–0.6× overall.

Direct-admit programs

A short curated list of programs with separate, much harder admission gates (UIUC CS, UT Austin Cockrell, Penn Wharton, Penn Nursing, etc.) gets an additional 0.55× factor on top of the major multiplier

Tier-aware magnitudes

A 4.0 GPA is more diagnostic at a 60%-admit school than at HYPS (Harvard, Yale, Princeton, Stanford), where a 4.0 is typical of the applicant pool; the engine reflects that

Reproducibility

Same student × same school = same probability, every run. No model drift on the numbers.

Confidence band

Width depends on data source: ±10pp for hand-curated CDS data, ±14pp for College Scorecard, ±18pp for modeled per-major rates

Transparency

Every probability shows the top three drivers with +/− markers so you can see exactly what moved it
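
The entries above can be sketched in code. This is an illustration under stated assumptions, not the engine itself: the function names are hypothetical, the sketch assumes the 0.55 direct-admit factor applies only when the base is modeled from a multiplier, and truncating the band at 0 and 100 is likewise an assumption.

```python
from typing import Optional

DIRECT_ADMIT_FACTOR = 0.55  # from the "Direct-admit programs" entry

# Band half-widths in percentage points, keyed by data source.
BAND_PP = {"cds": 10.0, "scorecard": 14.0, "modeled": 18.0}


def base_rate_pct(overall_pct: float,
                  major_multiplier: float,
                  published_pct: Optional[float] = None,
                  direct_admit: bool = False) -> float:
    """Per-major base admit rate, in percent."""
    if published_pct is not None:
        return published_pct  # a published per-major rate takes precedence
    rate = overall_pct * major_multiplier
    if direct_admit:
        rate *= DIRECT_ADMIT_FACTOR  # assumption: applied to modeled rates only
    return rate


def confidence_band(prob_pct: float, source: str) -> tuple[float, float]:
    """Return (low, high) around prob_pct; truncation to 0..100 is an assumption."""
    half_width = BAND_PP[source]
    return (max(0.0, prob_pct - half_width), min(100.0, prob_pct + half_width))


def top_drivers(adjustments_pp: dict[str, float], n: int = 3) -> list[str]:
    """The n largest adjustments by absolute size, each with a +/- marker."""
    ranked = sorted(adjustments_pp.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return [f"{'+' if delta >= 0 else '-'} {name}" for name, delta in ranked[:n]]
```

For example, an overall rate of 20% with a hypothetical 0.5 major multiplier gives a modeled base of 10%, and a direct-admit program would bring that to roughly 5.5% before any student-specific adjustments.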

Live institutional data

Every probability is anchored to a structured catalog row that the server owns. The base catalog covers 1,500+ US four-year colleges loaded from the US Department of Education's College Scorecard — admit rates, ACT and SAT middle-50% bands, urbanicity, public/private, enrollment. On top of that we layer hand-curated data from Common Data Set filings for the most-applied-to schools, plus a regression-derived layer for schools where we don't yet have CDS-quality data. Schools without catalog coverage are excluded from the probability chart and listed in an explicit notice — we don't guess.

DETAILS

Universities covered

1,500+ US four-year colleges with admit-rate data, sourced from College Scorecard

Hand-curated CDS data

The most-applied-to schools have 25th-percentile and median GPA, ED bump, and per-major admit rates from published Common Data Set filings

Regression-derived bands

For schools without CDS data, GPA bands are estimated from acceptance rate × ACT bands (validated against external CDS values to within 0.06 GPA points)

Per-major modeling

For (school, major) combos without published rates, the catalog computes a per-major base from a tier-aware multiplier table covering 35+ undergraduate majors — calibrated so CS at engineering-heavy publics reads ~0.4× overall, biology at the same schools reads ~1.7× overall (bio applicants don't compete with the CS pool), humanities read 1.05–1.2× overall

Refresh cadence

Quarterly Scorecard refresh; CDS policy changes (e.g., a school reintroducing required SAT) trigger out-of-band updates

Honest gaps

If a target school isn't in the catalog, we drop it from the probability chart and surface a banner — never a guessed number
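
A minimal sketch of the layering and the "no guess" rule might look like the following. The function names, dictionary layout, and school names are hypothetical, and the sample values are made up for illustration.

```python
from typing import Optional


def lookup(school: str, cds: dict[str, dict], scorecard: dict[str, dict]) -> Optional[dict]:
    """Prefer hand-curated CDS values, fall back to Scorecard, else None."""
    if school in cds:
        # CDS fields override any Scorecard fields with the same key.
        return {**scorecard.get(school, {}), **cds[school], "source": "cds"}
    if school in scorecard:
        return {**scorecard[school], "source": "scorecard"}
    return None


def build_chart(targets: list[str],
                cds: dict[str, dict],
                scorecard: dict[str, dict]) -> tuple[dict[str, dict], list[str]]:
    """Split targets into chartable rows and schools to list in the notice."""
    rows: dict[str, dict] = {}
    excluded: list[str] = []
    for school in targets:
        row = lookup(school, cds, scorecard)
        if row is None:
            excluded.append(school)  # surfaced in a banner, never given a number
        else:
            rows[school] = row
    return rows, excluded


# Made-up catalog contents, for illustration only.
scorecard = {"Example University": {"admit_rate": 0.42}, "Sample College": {"admit_rate": 0.18}}
cds = {"Sample College": {"admit_rate": 0.17, "gpa_median": 3.9}}
rows, excluded = build_chart(["Example University", "Sample College", "Unlisted Institute"], cds, scorecard)
```

Returning the excluded schools as their own list keeps the gap visible to the caller instead of hiding it behind a default value.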

What we deliberately don't do

Much of the credibility of a college advisor comes from what they refuse to do. The AI doesn't invent outcomes data it doesn't have. It doesn't score race or ethnicity. It doesn't read essays. It doesn't claim single-point precision where admissions is genuinely stochastic. And it doesn't scare families with catastrophizing — concerns are ranked by their actual probability impact, not by how alarming they sound.

DETAILS

No race or ethnicity scoring

Since the Supreme Court's 2023 decision in SFFA v. Harvard, these fields are legally and ethically fraught. The AI does not receive them

No essay reading

Essays belong to the student, and recommendation letters to the recommenders who wrote them. We don't ingest either

No point estimates without bands

Every probability comes with a confidence band; anyone quoting a single number is guessing

No manufactured urgency

We don't invent crises to sell upgrades. The report says what it finds and nothing more

Your data, never trained on, never shared

Your student's documents and the profile we extract from them stay inside our infrastructure. They're never used to train models — ours or our LLM provider's — never sold, and never shared with universities, testing organizations, or admissions consultancies. From Account → Delete My Data you can wipe everything; backups roll off within 35 days.

DETAILS

Training

Never. Your student's profile is never used to train, fine-tune, or evaluate any model — ours or our LLM provider's. The probability engine is closed-form and deterministic; the language model adds prose, never numbers

Provider terms

Our LLM provider's commercial terms contractually prohibit using customer prompts or outputs to train their models — it's a guarantee in the commercial agreement, not a setting we toggle

Sharing

Never sold. Never shared with universities, testing organizations, or admissions consultancies. No third-party tracking SDKs (no Google Analytics, no Meta pixel) in production code

Account isolation

Each student's data is scoped to its account. A user can only read or write rows for students they own or were explicitly invited to via a single-use, email-matched token

Deletion

One-click from Account → Delete My Data. Backups roll off within 35 days, after which no copy of your student's profile remains in our infrastructure

Full policy

See the Data Use Policy and Privacy Policy for the legal-grade detail — both are linked from the footer

Grounding and verification

Every claim in the report has to be traceable — either to something in the student's uploaded documents, or to the school brief for the institution in question. Before a report ships, an automated check walks the document and verifies that every numeric claim and every school-specific fact appears in one of those two sources. If something doesn't ground, the report is regenerated.

DETAILS

Document grounding

Numeric claims (GPA, scores, activity hours) must appear in the uploaded documents

School grounding

Claims about a specific school's policies or data must appear in the school brief

Automated grounding check

Runs before delivery; ungrounded claims trigger regeneration

Drift monitoring

We sample outputs weekly and review for accuracy, updating the prompt as needed
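
The numeric half of such a check can be sketched as a set difference between the numbers in the report and the numbers in its sources. This is a simplified illustration with hypothetical function names: it matches plain digit strings only (a figure written with a thousands separator would need normalizing first) and says nothing about how the production check handles non-numeric facts.

```python
import re

# Integers and simple decimals, e.g. "1510" or "3.92".
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def numeric_claims(text: str) -> set[str]:
    """Every number that appears in the text, as written."""
    return set(NUMBER.findall(text))


def ungrounded_numbers(report: str, sources: list[str]) -> set[str]:
    """Numbers in the report that appear in none of the source texts."""
    grounded: set[str] = set()
    for source in sources:
        grounded |= numeric_claims(source)
    return numeric_claims(report) - grounded


report = "GPA 3.92 with SAT 1510 and 400 service hours"
sources = ["Cumulative GPA: 3.92", "SAT total 1510"]
missing = ungrounded_numbers(report, sources)
needs_regeneration = bool(missing)  # here "400" has no source, so this is True
```

A non-empty result is the signal that would trigger regeneration in a pipeline of this kind.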

Questions about how we work?

Email methodology@collegesignal.ai. We answer every one personally.