Authentication

The API currently uses GCP Application Default Credentials on the server side. No Authorization header is required from API clients. Set GCP_PROJECT_ID and GOOGLE_APPLICATION_CREDENTIALS in the server environment before starting.
Endpoints that require GCP (BigQuery, Vertex AI) will return 503 Service Unavailable if GCP_PROJECT_ID is not configured.

Error Codes

| Status | Meaning | Common cause |
|--------|---------|--------------|
| 400 | Bad Request | Missing required field or empty resume text |
| 422 | Unprocessable Entity | PDF parsing failed or empty PDF |
| 500 | Internal Server Error | Pipeline orchestration failure |
| 503 | Service Unavailable | GCP not configured — set GCP_PROJECT_ID |

POST /analyze
analyze

Full 5-stage career-rebound orchestration. The primary interview-demo endpoint — one call, complete output. Accepts a resume (PDF upload or raw text) plus a target role and optional JD, then runs all pipeline stages in sequence. Each stage is fail-safe: a failure in one never blocks results from earlier stages.

Send as multipart/form-data. Provide either resume (PDF file) or resume_text (plain string) — if both are given, the PDF takes precedence.

Request Parameters

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| resume | file | optional | Resume PDF file upload |
| resume_text | string | optional | Resume plain text (alternative to PDF) |
| target_role | string | required | Target job title, e.g. Senior Data Engineer |
| candidate_id | string | optional | Unique ID — auto-generated as demo-{8hex} if omitted |
| jd_text | string | optional | Job description text — enables gap analysis (Stage 2) |
| jd_title | string | optional | JD title used in narrative generation |
| industry | string | optional | Target industry — auto-detected from industry match if omitted |
| include_pathway | bool | optional | Include 90-day CrewAI roadmap (default false, adds ~45s) |

Example Request

# PDF upload
curl -X POST http://localhost:8000/analyze \
  -F "resume=@resume.pdf" \
  -F "target_role=Senior Data Engineer" \
  -F "jd_text=We need Python, BigQuery, Airflow, dbt, Spark" \
  -F "candidate_id=demo-001"

# Plain text (no file needed)
curl -X POST http://localhost:8000/analyze \
  -F "resume_text=Experienced data engineer with Python, SQL, BigQuery..." \
  -F "target_role=Senior Data Engineer"

# With 90-day pathway (~45 s extra)
curl -X POST http://localhost:8000/analyze \
  -F "resume=@resume.pdf" \
  -F "target_role=Senior Data Engineer" \
  -F "include_pathway=true"

Response — AnalysisResult

200 OK · application/json
{
  "candidate_id":  "demo-001",
  "target_role":   "Senior Data Engineer",
  "analyzed_at":   "2026-04-17T10:30:00Z",
  "skill_count":   47,
  "top_skills": [
    { "name": "Python",   "category": "technical", "confidence": 0.97 },
    { "name": "BigQuery", "category": "tool",      "confidence": 0.95 }
  ],
  "gap": {
    "gap_score": 72.4,
    "matched_skills":      ["Python", "SQL", "BigQuery"],
    "missing_skills":      ["Kafka", "Terraform"],
    "transferable_skills": [
      { "jd_skill": "Airflow", "candidate_skill": "Luigi", "similarity": 0.81 }
    ],
    "recommendation": "Strong match. Bridge Kafka and Terraform to close gaps."
  },
  "industry_match": {
    "top_industry":       "data_ai",
    "top_industry_label": "Data & AI",
    "scores": [
      { "rank": 1, "industry": "data_ai",     "match_score": 88.3 },
      { "rank": 2, "industry": "cloud_devops", "match_score": 74.1 }
    ]
  },
  "narrative": "A data-first engineer with a strong Python foundation...",
  "pathway": null,
  "stages": {
    "extract":   { "success": true, "duration_ms": 342  },
    "gap":       { "success": true, "duration_ms": 1820 },
    "industry":  { "success": true, "duration_ms": 510  },
    "narrative": { "success": true, "duration_ms": 3102 },
    "pathway":   { "success": true, "duration_ms": 0, "error": "skipped" }
  },
  "total_duration_ms": 5892
}
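
Because each stage is fail-safe, a response can carry partial output: earlier stages still return results when a later one fails. A minimal client-side sketch of checking per-stage success (field values are illustrative):

```python
# Sketch: consuming an AnalysisResult defensively. The "stages" shape
# follows the example response above; the values here are illustrative.
result = {
    "stages": {
        "extract":   {"success": True,  "duration_ms": 342},
        "gap":       {"success": False, "duration_ms": 0, "error": "JD not found"},
        "narrative": {"success": True,  "duration_ms": 3102},
    },
}

# Check per-stage success instead of treating the call as all-or-nothing:
# a failed "gap" stage does not invalidate the "extract" output.
failed = [name for name, s in result["stages"].items() if not s["success"]]
```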

POST /extract
extraction

Extract skills from raw text using the spaCy PhraseMatcher + NER pipeline. Persists results to BigQuery and refreshes the candidate profile.

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| text | string | required | Raw text to extract skills from |
| candidate_id | string | required | Candidate identifier for BigQuery storage |
| model_name | string | optional | spaCy model override (default: en_core_web_lg) |

POST /resume/upload
resume

Upload a PDF resume. Parses the PDF with pdfplumber, splits into labeled sections (Summary, Experience, Skills, Education, Certifications, Projects, Other), extracts skills per section, and deduplicates by highest confidence.

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| file | file | required | PDF resume file |
| candidate_id | string | required | Candidate identifier |
| store | bool | optional | Persist to BigQuery (default: true) |

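
The per-section extraction followed by highest-confidence dedup can be sketched as follows (field names mirror the API examples; the helper is illustrative, not the actual implementation):

```python
# Sketch: keep the highest-confidence occurrence of each skill across
# resume sections. Data and field names follow the API examples above.
def dedupe_skills(skills):
    best = {}
    for s in skills:
        key = s["name"].lower()
        if key not in best or s["confidence"] > best[key]["confidence"]:
            best[key] = s
    return list(best.values())

skills = [
    {"name": "Python", "section": "Skills",     "confidence": 0.97},
    {"name": "Python", "section": "Experience", "confidence": 0.88},
    {"name": "SQL",    "section": "Skills",     "confidence": 0.91},
]
deduped = dedupe_skills(skills)  # "Python" kept once, at confidence 0.97
```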
POST /agent/extract
agent

LangGraph stateful skill extractor. Runs section-by-section at high confidence (0.7), retries on full text at lower threshold (0.4) if skill count < 3. Returns skills plus a full graph trace for debugging.

Response includes

| Field | Type | Description |
|-------|------|-------------|
| skills | Skill[] | Extracted and validated skills |
| retry_count | int | Number of retry passes taken |
| extraction_id | string | BigQuery row ID if stored |
| trace | list | LangGraph node execution trace |
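
The retry behavior described above can be sketched like this. The thresholds (0.7, 0.4) and the minimum skill count (3) are the documented values; `extract` is a stand-in for the real extractor:

```python
# Documented defaults: high-confidence first pass, low-confidence retry,
# minimum skill count that triggers the retry.
HIGH, LOW, MIN_SKILLS = 0.7, 0.4, 3

def extract_with_retry(extract, sections, full_text):
    # First pass: section-by-section at the high-confidence threshold.
    skills = [s for sec in sections for s in extract(sec, HIGH)]
    retry_count = 0
    # Fallback: one retry over the full text at the lower threshold
    # when too few skills survived the first pass.
    if len(skills) < MIN_SKILLS:
        skills = extract(full_text, LOW)
        retry_count = 1
    return skills, retry_count
```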

GET /candidate/{candidate_id}/profile
candidate

Returns the aggregated skill profile for a candidate — one entry per unique skill, ranked by frequency and confidence. Built from all extraction runs via BigQuery MERGE.

ProfiledSkill fields

| Field | Type | Description |
|-------|------|-------------|
| skill_name | string | Normalized skill name |
| category | string | technical / soft / domain / tool / certification |
| frequency | int | Times seen across all extractions |
| confidence_avg | float | Average extraction confidence (0–1) |
| first_seen / last_seen | datetime | Temporal range of skill evidence |
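
The "ranked by frequency and confidence" ordering can be illustrated with a small sketch (sample data is made up; using confidence as the tie-breaker is an assumption):

```python
# Sketch of profile ranking: frequency first, then average confidence
# as an assumed tie-breaker. Rows are illustrative.
profile = [
    {"skill_name": "SQL",     "frequency": 5, "confidence_avg": 0.90},
    {"skill_name": "Python",  "frequency": 5, "confidence_avg": 0.95},
    {"skill_name": "Airflow", "frequency": 2, "confidence_avg": 0.99},
]
ranked = sorted(profile, key=lambda s: (-s["frequency"], -s["confidence_avg"]))
```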

POST /jd
job-description

Ingest a job description. Detects seniority level (junior/mid/senior/lead/staff/manager), classifies into one of 8 industries, splits required vs preferred skills, extracts skills per section, and stores to BigQuery.

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| jd_id | string | required | Unique JD identifier |
| title | string | required | Job title |
| text | string | required | Full JD text |
| company | string | optional | Company name |
| industry | string | optional | Industry override (auto-detected if omitted) |

POST /gap
gap-analysis

Gap analysis between a candidate's skill profile and a stored JD. Uses exact match plus semantic cosine similarity via Vertex AI embeddings. Surfaces transferable skills that keyword matching misses.

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| candidate_id | string | required | Candidate to evaluate |
| jd_id | string | required | Previously ingested JD ID |
| similarity_threshold | float | optional | Minimum cosine similarity for a transferable match (default: 0.75) |

GapAnalysisResult fields

| Field | Type | Description |
|-------|------|-------------|
| gap_score | float 0–100 | Higher = stronger fit. Formula: min(100, (matched + 0.7×transferable) / total_required × 100) |
| matched_skills | string[] | Skills that exactly match JD requirements |
| transferable_skills | TransferableSkill[] | Similar but not identical — includes similarity score |
| missing_skills | string[] | Required JD skills with no candidate match |
| recommendation | string | Human-readable career guidance |

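
The documented gap_score formula, worked through on the /analyze sample data (3 matched, 1 transferable); the total_required of 6 is an assumed value for illustration:

```python
# Documented formula: min(100, (matched + 0.7 * transferable) / total_required * 100)
def gap_score(matched, transferable, total_required):
    return min(100.0, (matched + 0.7 * transferable) / total_required * 100)

# 3 exact matches, 1 transferable skill, assumed 6 required JD skills.
score = gap_score(matched=3, transferable=1, total_required=6)  # ~61.7
```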
GET /industry/match/{candidate_id}
industry-match

Score a candidate against all 8 industry centroid vectors using BQML ML.DISTANCE(COSINE) directly in BigQuery. Returns ranked scores for: Data & AI, Software Engineering, FinTech, HealthTech, eCommerce, Cybersecurity, Cloud/DevOps, Product Management.

IndustryScore fields (per industry)

| Field | Type | Description |
|-------|------|-------------|
| rank | int | 1 = best fit |
| industry | string | Industry enum key |
| industry_label | string | Human-readable label |
| match_score | float 0–100 | Converted from cosine distance (100 = perfect) |

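
One plausible distance-to-score conversion. The exact mapping is an assumption here: score = (1 − distance) × 100 is consistent with ML.DISTANCE returning 0 for identical vectors under COSINE, and reproduces the sample values.

```python
# Assumed conversion from cosine distance (0 = identical) to a 0-100
# match score. The real mapping may differ; this is a sketch.
def match_score(cosine_distance):
    return max(0.0, (1.0 - cosine_distance) * 100)

match_score(0.0)    # identical to centroid
match_score(0.117)  # matches the Data & AI sample score of 88.3
```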
POST /narrative
narrative

Generate a RAG-grounded career narrative via Gemini 2.5 Flash. Retrieves candidate's top 8 skills, industry's top 10 demanded skills, skill overlap, and sample JD titles from BigQuery — then generates a 3-sentence, second-person story. System prompt enforces no hallucinations.

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| candidate_id | string | required | Candidate with an existing profile |
| target_role | string | required | Target job title for the narrative |
| industry | string | optional | Industry context (auto-detected if omitted) |

POST /market/analyze
market-analyst

Real-time skill demand analysis via a CrewAI single-agent crew. Uses DuckDuckGo search (no API key needed) to research current job market demand for up to 10 skills. Returns demand score 0–100, trend (growing / stable / declining), and evidence snippet per skill.

This endpoint makes live web search calls. Latency varies (~10–30s for 10 skills). Limit to the skills most important for the gap analysis to keep response time reasonable.
POST /pathway/plan
pathway-planner

CrewAI two-agent 90-day reskilling roadmap. Researcher agent searches Coursera, Udemy, and YouTube for real courses per missing skill. Planner agent synthesizes into a 3-phase roadmap with weekly hours, milestones, and success metrics. Expect ~45s.

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| candidate_id | string | required | Candidate identifier |
| target_role | string | required | Target job title |
| missing_skills | string[] | required | Skills from gap analysis to close |
| gap_score | float | optional | Gap score 0–100 for roadmap pacing |

POST /embeddings/candidate/{candidate_id}
embeddings

Embed a candidate's profile skills not yet in the global catalog. Reads the candidate profile, filters skills already embedded, embeds new ones via Vertex AI text-embedding-004 (768-dim), and upserts to skill_embeddings table.

POST /embeddings/similar
embeddings

Semantic skill similarity search. Embeds the query skill using RETRIEVAL_QUERY task type, then runs BigQuery VECTOR_SEARCH (COSINE) against the skill catalog. Returns the top-N most similar skills with distance scores.


GET /lakehouse/status
lakehouse

Returns row counts for all 8 tables across Bronze, Silver, and Gold BigQuery datasets. Use this to verify lakehouse health and confirm data is flowing through the medallion pipeline.

LakehouseStatus structure

| Layer | Type | Tables |
|-------|------|--------|
| bronze | LayerTableInfo[] | raw_resume_ingestion, raw_jd_ingestion |
| silver | LayerTableInfo[] | candidate_skills, jd_skill_profiles, ingestion_log |
| gold | LayerTableInfo[] | match_scores, industry_rankings, candidate_readiness |

GET /lakehouse/gold/readiness/{candidate_id}
lakehouse

Fetch the Gold-layer composite readiness index for a candidate. Combines match scores, industry coverage, extraction confidence, and skill breadth into a single 0–100 score with tier (READY / DEVELOPING / EMERGING).

CandidateReadiness formula

| Field | Type | Description |
|-------|------|-------------|
| readiness_index | float 0–100 | 40% match score + 30% industry coverage + 20% avg confidence + 10% skill breadth |
| readiness_tier | string | READY (≥70) · DEVELOPING (40–69) · EMERGING (<40) |
| best_industry | string | Top industry from most recent match run |
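
The weighting and tier cutoffs above, as a runnable sketch. It assumes each component has already been normalized to 0–100 (the profile's confidence_avg is 0–1, so a scaling step would precede this):

```python
# Documented weights (40/30/20/10) and tier cutoffs (70, 40).
# Inputs are assumed pre-normalized to 0-100.
def readiness(match, industry_coverage, avg_confidence, skill_breadth):
    index = (0.40 * match + 0.30 * industry_coverage
             + 0.20 * avg_confidence + 0.10 * skill_breadth)
    tier = "READY" if index >= 70 else "DEVELOPING" if index >= 40 else "EMERGING"
    return round(index, 1), tier

readiness(88.3, 75.0, 92.0, 60.0)  # -> (82.2, 'READY')
```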

POST /registry/evaluate
model-registry

Run F1 evaluation against the golden test set — no GCP write. Returns precision, recall, F1 score, per-example breakdown, and whether the model passes the F1 ≥ 0.85 gate required for registration.
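
The F1 gate can be sketched as follows (the tp/fp/fn counts are illustrative, the 0.85 threshold is the documented gate):

```python
# Precision / recall / F1 over a golden test set, with the documented
# F1 >= 0.85 registration gate. Counts below are illustrative.
def f1_gate(tp, fp, fn, threshold=0.85):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return f1, f1 >= threshold

f1, passes = f1_gate(tp=90, fp=8, fn=10)
```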

GET /monitoring/drift/recent
drift-monitoring

Returns the last N drift records from BigQuery. Each record includes unknown_skill_rate, avg_confidence, taxonomy_coverage, and whether the alert threshold was triggered (unknown_rate > 20%).

| Param | Type | Description |
|-------|------|-------------|
| limit | int | Max records to return (default: 10) |

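
The alert rule can be sketched as below, assuming unknown_skill_rate is stored as a 0–1 fraction (an assumption; the docs state the threshold as 20%):

```python
# Documented alert rule: trigger when unknown_skill_rate exceeds 20%.
# Assumes the rate is stored as a fraction (0.20 == 20%).
ALERT_THRESHOLD = 0.20

def check_drift(record):
    return record["unknown_skill_rate"] > ALERT_THRESHOLD

check_drift({"unknown_skill_rate": 0.31})  # alert fires
check_drift({"unknown_skill_rate": 0.12})  # no alert
```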
GET /health
ops

Basic health check. Returns {"status": "ok"}. No GCP dependency — use this for Cloud Run health probes and load balancer checks.