Methodology

How the AI produces the report — honestly.

Everything in your report traces to one of two places: something in your student's uploaded documents, or a piece of institutional data we maintain for the school in question. Here's exactly how that works.

01

What the AI actually reads

The AI ingests the documents you upload (your student's transcript, test score reports, AP/IB scores, and activities list) plus the high school profile, which we fetch automatically. From these it extracts structured information: every course and grade by semester, every test sitting and section score, and every activity's hours, years, and role description. It does not read Common App essays. It does not read letters of recommendation. It does not receive demographic fields like race or ethnicity.
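A minimal sketch of what an extracted profile could look like. The class and field names (`CourseGrade`, `TestSitting`, `Activity`, `StudentProfile`) are hypothetical illustrations, not our production schema. The point it shows is structural: the profile simply has no place to store essays, recommendation letters, or demographic fields.

```python
from dataclasses import dataclass, field


@dataclass
class CourseGrade:
    # One row per course per semester, as read from the transcript.
    course: str
    grade_level: int  # 9-12
    semester: str     # e.g. "Fall"
    grade: str        # letter grade as printed on the transcript


@dataclass
class TestSitting:
    # One row per test date; section scores are kept separately.
    test: str                  # "SAT", "ACT", "AP Calculus BC", ...
    date: str                  # ISO date string
    sections: dict[str, int]   # section name -> score


@dataclass
class Activity:
    name: str
    role: str
    hours_per_week: float
    years: int


@dataclass
class StudentProfile:
    # Deliberately no fields for essays, recommendation letters, photos,
    # race/ethnicity, or religion: those inputs are never extracted.
    courses: list[CourseGrade] = field(default_factory=list)
    tests: list[TestSitting] = field(default_factory=list)
    activities: list[Activity] = field(default_factory=list)
```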

DETAILS
Required documents
Transcript (grades 9–12), test score reports (SAT / ACT / AP / IB), activities list
Auto-fetched
High school profile — grade distribution, course offerings, Naviance data where available
Not used
Essays, recommendation letters, photos, race/ethnicity, religion; the student's name is never used in reasoning
Extraction
OCR + structured parsing for transcript and score PDFs; direct parse for Common App-format activities
02

How the AI reasons about admissions

The AI uses a detailed admissions-domain prompt that we built and refine continuously. The prompt encodes how admissions officers actually read files at selective schools: academic rigor first, then achievement within that rigor, then extracurricular depth and leadership, then context (school profile, geography, first-gen status). Every qualitative section of the report is produced by the model walking through that evaluation explicitly, not by pattern-matching on a training set of prior outcomes.

DETAILS
Prompt
A domain-specific system prompt that encodes admissions evaluation frameworks used at selective schools
Reasoning style
Explicit, step-by-step. The model names the features it's weighing at each school and shows its work
Iteration
The prompt is versioned. We update it as admissions policies change: test-optional shifts, Early Decision (ED) rate changes, newly published institutional data
03

How probabilities are produced

Probabilities are computed by a deterministic, catalog-anchored function — not by the AI. The AI identifies which qualitative factors apply at each school (named accomplishment, ED, first-generation status, legacy ties, etc.); the server then computes the probability from a closed-form formula: a per-major base admit rate, plus tier-aware adjustments for GPA fit, test fit, and the factors the AI flagged. The same student profile produces the exact same probability run-to-run.
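A minimal sketch of that closed-form step, assuming every term is expressed in percentage points and that the GPA and test adjustments arrive already scaled for the school's tier. The tier names, factor labels, and magnitudes in `FACTOR_LIFTS` are hypothetical placeholders, not the catalog's real values. What the sketch does reflect is the division of labor: the AI contributes only labels, and the server turns them into numbers with no randomness.

```python
# Hypothetical per-tier factor magnitudes, in percentage points.
# Real values live in the server-owned catalog; these are placeholders.
FACTOR_LIFTS = {
    "highly_selective": {"ed": 8.0, "first_gen": 2.0, "legacy": 3.0},
    "selective": {"ed": 12.0, "first_gen": 3.0, "legacy": 4.0},
}


def clamp(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def admit_probability(base_rate: float, tier: str, gpa_adj: float,
                      test_adj: float, ai_factor_labels: list[str]) -> float:
    """Deterministic: identical inputs always give an identical output."""
    lifts = FACTOR_LIFTS[tier]
    # The AI only names factors; an unrecognized label is priced at zero,
    # so the model can never introduce a magnitude of its own.
    factor_adj = sum(lifts.get(label, 0.0) for label in ai_factor_labels)
    return clamp(base_rate + gpa_adj + test_adj + factor_adj, 1.0, 95.0)
```

For example, `admit_probability(13.0, "selective", 2.0, 1.0, ["ed", "first_gen"])` returns `31.0`, while a profile whose adjustments push the sum below 1 is floored at `1.0`.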

DETAILS
Calculation
prob = clamp(per-major base rate + sum of tier-aware factor adjustments, 1, 95), with all terms in percent
AI's role
Selects which factor labels apply (qualitative judgment); writes the strategic prose
Server's role
Owns every number — admit rate, GPA / test bumps, ED magnitude, first-gen and legacy lifts. AI does not invent magnitudes.
Per-major base rate
Each (school, major) combo gets its own base. Where a published per-major admit rate exists (e.g., CMU SCS 5%, UCSD CSE 13%), the catalog uses it. Otherwise the base is computed from the school's overall rate × a tier-aware major multiplier built from real per-major data — bio at engineering-heavy schools is ~1.5–2× the overall rate (because bio applicants don't compete with the CS pool); CS at most flagships is ~0.4–0.6× overall.
Direct-admit programs
A short curated list of programs with separate, much harder admission gates (UIUC CS, UT Austin Cockrell, Penn Wharton, Penn Nursing, etc.) gets an additional 0.55× factor on top of the major multiplier
Tier-aware magnitudes
A 4.0 GPA at a 60%-admit school is more diagnostic than at HYPS (Harvard, Yale, Princeton, Stanford), where a 4.0 is roughly the median; the engine sizes each adjustment to the school's selectivity tier
Reproducibility
Same student × same school = same probability, every run. No model drift on the numbers.
Confidence band
Width depends on data source: ±10pp for hand-curated CDS data, ±14pp for College Scorecard, ±18pp for modeled per-major rates
Transparency
Every probability shows the top three drivers with +/− markers so you can see exactly what moved it
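The per-major base rate, direct-admit factor, confidence band, and top-three drivers described above can be sketched together as follows. The 0.55 direct-admit factor and the ±10/14/18pp band widths come from the table; the function names, the data-source keys, and the choice to clip the band to 0–100% are assumptions made for illustration.

```python
from typing import Optional

DIRECT_ADMIT_FACTOR = 0.55  # extra factor for curated direct-admit programs
# Half-width of the confidence band, in percentage points, by data source.
BAND_HALF_WIDTH_PP = {"cds": 10.0, "scorecard": 14.0, "modeled": 18.0}


def per_major_base(overall_rate: float, major_multiplier: float,
                   published_rate: Optional[float] = None,
                   direct_admit: bool = False) -> float:
    """Per-(school, major) base admit rate, in percent."""
    if published_rate is not None:
        return published_rate  # a published per-major rate always wins
    base = overall_rate * major_multiplier
    if direct_admit:
        base *= DIRECT_ADMIT_FACTOR  # applied on top of the major multiplier
    return base


def confidence_band(prob: float, source: str) -> tuple[float, float]:
    half = BAND_HALF_WIDTH_PP[source]
    # Assumption: the band is clipped to the valid 0-100% range.
    return max(0.0, prob - half), min(100.0, prob + half)


def top_drivers(adjustments: dict[str, float], n: int = 3) -> list[str]:
    # Rank by magnitude so large negatives surface alongside large positives.
    ranked = sorted(adjustments.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return [("+ " if value >= 0 else "- ") + name for name, value in ranked[:n]]
```

Ranking drivers by absolute size, rather than by signed value, is what lets a single large negative (say, a test score well below the school's band) appear in the top three next to the positives.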
04

Live institutional data

Every probability is anchored to a structured catalog row that the server owns. The base catalog covers 1,500+ US four-year colleges loaded from the US Department of Education's College Scorecard — admit rates, ACT and SAT middle-50% bands, urbanicity, public/private, enrollment. On top of that we layer hand-curated data from Common Data Set filings for the most-applied-to schools, plus a regression-derived layer for schools where we don't yet have CDS-quality data. Schools without catalog coverage are excluded from the probability chart and listed in an explicit notice — we don't guess.

DETAILS
Universities covered
1,500+ US four-year colleges with admit-rate data, sourced from College Scorecard
Hand-curated CDS data
Top schools have GPA p25/median, ED bump, and per-major admit rates from published Common Data Set filings
Regression-derived bands
For schools without CDS data, GPA bands are estimated by regression on acceptance rate and ACT bands (validated against external CDS values to within 0.06 GPA points)
Per-major modeling
For (school, major) combos without published rates, the catalog computes a per-major base from a tier-aware multiplier table covering 35+ undergraduate majors — calibrated so CS at engineering-heavy publics reads ~0.4× overall, biology at the same schools reads ~1.7× overall (bio applicants don't compete with the CS pool), humanities read 1.05–1.2× overall
Refresh cadence
Quarterly Scorecard refresh; CDS policy changes (e.g., a school reinstating an SAT requirement) trigger out-of-band updates
Honest gaps
If a target school isn't in the catalog, we drop it from the probability chart and surface a banner — never a guessed number
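A sketch of the layering and the "honest gaps" rule, assuming a simple in-memory catalog keyed by school name. `CatalogRow`, its fields, and the two functions are hypothetical. The precedence (hand-curated CDS data first, then the regression-derived estimate) and the exclude-rather-than-guess behavior follow the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CatalogRow:
    admit_rate: float                                    # from College Scorecard
    cds_gpa_band: Optional[tuple[float, float]] = None   # hand-curated CDS
    est_gpa_band: Optional[tuple[float, float]] = None   # regression-derived


def split_targets(targets: list[str],
                  catalog: dict[str, CatalogRow]) -> tuple[list[str], list[str]]:
    """Return (schools to chart, schools to list in the 'not covered' banner)."""
    charted = [s for s in targets if s in catalog]
    excluded = [s for s in targets if s not in catalog]  # never guessed
    return charted, excluded


def resolve_gpa_band(row: CatalogRow):
    # Hand-curated CDS data takes precedence over the regression estimate.
    if row.cds_gpa_band is not None:
        return row.cds_gpa_band, "cds"
    return row.est_gpa_band, "regression"
```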
05

What we deliberately don't do

Much of the credibility of a college advisor comes from what they refuse to do. The AI doesn't invent outcomes data it doesn't have. It doesn't score race or ethnicity. It doesn't read essays. It doesn't claim single-point precision where admissions is genuinely stochastic. And it doesn't scare families with catastrophizing — concerns are ranked by their actual probability impact, not by how alarming they sound.

DETAILS
No race or ethnicity scoring
Post-SFFA (the 2023 Supreme Court ruling in Students for Fair Admissions v. Harvard), these fields are legally and ethically fraught. The AI does not receive them
No essay reading
Essays belong to the student and letters to their recommenders. We don't ingest either
No point estimates without bands
Every probability comes with a confidence band; anyone quoting a single number is guessing
No manufactured urgency
We don't invent crises to sell upgrades. The report says what it finds and nothing more
06

Your data, never trained on, never shared

Your student's documents and the profile we extract from them stay inside our infrastructure. They're never used to train models — ours or our LLM provider's — never sold, and never shared with universities, testing organizations, or admissions consultancies. From Account → Delete My Data you can wipe everything; backups roll off within 35 days.

DETAILS
Training
Never. Your student's profile is never used to train, fine-tune, or evaluate any model — ours or our LLM provider's. The probability engine is closed-form and deterministic; the language model adds prose, never numbers
Provider terms
Our LLM provider's commercial terms contractually prohibit using customer prompts or outputs to train their models — it's a guarantee in the commercial agreement, not a setting we toggle
Sharing
Never sold. Never shared with universities, testing organizations, or admissions consultancies. No third-party tracking SDKs (no Google Analytics, no Meta pixel) in production code
Account isolation
Each student's data is scoped to its account. A user can only read or write rows for students they own or were explicitly invited to via a single-use, email-matched token
Deletion
One-click from Account → Delete My Data. Backups roll off within 35 days, after which no copy of your student's profile remains in our infrastructure
Full policy
See the Data Use Policy and Privacy Policy for the legal-grade detail — both are linked from the footer
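The account-isolation rule can be sketched as an ownership check plus a one-time invite redemption. This is an in-memory illustration only: the names (`Invite`, `create_invite`, `redeem_invite`, `can_access`) are hypothetical, and treating email matching as case-insensitive is an assumption.

```python
import secrets
from dataclasses import dataclass


@dataclass
class Invite:
    student_id: str
    email: str
    used: bool = False


owners: dict[str, set[str]] = {}   # student_id -> user ids allowed access
invites: dict[str, Invite] = {}    # token -> pending invite


def create_invite(student_id: str, email: str) -> str:
    token = secrets.token_urlsafe(32)  # unguessable, URL-safe token
    invites[token] = Invite(student_id, email.lower())
    return token


def redeem_invite(token: str, user_id: str, user_email: str) -> bool:
    invite = invites.get(token)
    if invite is None or invite.used:
        return False  # unknown or already-used token
    # The signed-in user's email must match the address that was invited.
    if invite.email != user_email.lower():
        return False
    invite.used = True  # single use: no one can redeem it again
    owners.setdefault(invite.student_id, set()).add(user_id)
    return True


def can_access(user_id: str, student_id: str) -> bool:
    # Every read or write of a student's rows goes through this check.
    return user_id in owners.get(student_id, set())
```

Note that a failed redemption with the wrong email does not consume the token, so the intended recipient can still use it; only a successful redemption marks it used.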
07

Grounding and verification

Every claim in the report has to be traceable — either to something in the student's uploaded documents, or to the school brief for the institution in question. Before a report ships, an automated check walks the document and verifies that every numeric claim and every school-specific fact appears in one of those two sources. If something doesn't ground, the report is regenerated.
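A deliberately simplified sketch of the numeric half of that check, applied to the report's prose: pull every number out of the text and confirm each one appears in either the extracted document text or the school brief. The regex and function names are hypothetical, and matching on literal numbers alone is an assumption; the school-fact half of the check would need more than string matching.

```python
import re

NUMBER = re.compile(r"\d+(?:\.\d+)?")  # integers and decimals, e.g. 1520, 3.92


def numbers_in(text: str) -> set[str]:
    return set(NUMBER.findall(text))


def ungrounded_numbers(report: str, documents: str, school_brief: str) -> set[str]:
    """Numbers in the report that appear in neither source."""
    allowed = numbers_in(documents) | numbers_in(school_brief)
    return numbers_in(report) - allowed


def check_report(report: str, documents: str, school_brief: str) -> bool:
    # False means the report should be regenerated rather than delivered.
    return not ungrounded_numbers(report, documents, school_brief)
```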

DETAILS
Document grounding
Numeric claims (GPA, scores, activity hours) must appear in the uploaded documents
School grounding
Claims about a specific school's policies or data must appear in the school brief
Automated grounding check
Runs before delivery; ungrounded claims trigger regeneration
Drift monitoring
We sample outputs weekly and review for accuracy, updating the prompt as needed

Questions about how we work?

Email methodology@collegesignal.ai. We answer every one personally.
