ReSkillio

AI-powered career rebound platform — resume in, career intelligence out. Skill extraction, gap analysis, industry fit, Gemini narrative, and a 90-day reskilling roadmap in a single API call.

Built with: FastAPI · Google Cloud · spaCy NLP · LangGraph · CrewAI · BigQuery ML · Vertex AI · Gemini

Five-Stage Pipeline

Stage 1 · Skill Extraction
spaCy PhraseMatcher + NER pulls 200+ skills from resume text and persists them to BigQuery.

Stage 2 · Gap Analysis
Exact match plus semantic similarity via Vertex AI embeddings. Produces a 0–100 gap score and a list of transferable skills.

Stage 3 · Industry Match
BQML cosine distance against 8 industry centroid vectors. Ranked fit scores computed in-database.

Stage 4 · Gemini Narrative
RAG-grounded three-sentence career story from Gemini 2.5 Flash, grounded strictly in extracted facts.

Stage 5 · 90-Day Pathway (opt-in)
A CrewAI two-agent crew researches real courses (via DuckDuckGo) and builds a phased reskilling roadmap.
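Stage 1's matching step can be sketched without spaCy: the snippet below is a simplified pure-Python stand-in for phrase matching against a skill taxonomy. The real pipeline uses spaCy's PhraseMatcher plus NER over a 200+ entry taxonomy; the taxonomy entries and function name here are illustrative only.

```python
import re

# Tiny illustrative taxonomy; the real one holds 200+ skills (taxonomy.json).
TAXONOMY = {
    "python": "Python",
    "bigquery": "BigQuery",
    "apache airflow": "Airflow",
    "dbt": "dbt",
}

def extract_skills(text: str) -> list[str]:
    """Case-insensitive whole-word phrase matching, longest phrases first."""
    found = []
    for phrase in sorted(TAXONOMY, key=len, reverse=True):
        if re.search(rf"\b{re.escape(phrase)}\b", text, re.IGNORECASE):
            found.append(TAXONOMY[phrase])
    return found

print(extract_skills("Data engineer using Python, BigQuery and Apache Airflow."))
# → ['Airflow', 'BigQuery', 'Python']
```

Matching longest phrases first mirrors why PhraseMatcher is used in the real stage: multi-word skills like "Apache Airflow" must win over their single-word substrings.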

System Architecture

Client / Demo UI (curl · Postman · Streamlit) sends POST /analyze to FastAPI on Cloud Run (16 routers: /analyze · /extract · /gap · /narrative · /pathway · ...). analyze_pipeline.py then runs the five stages in order: Stage 1 spaCy extract → Stage 2 gap analysis → Stage 3 industry match → Stage 4 Gemini narrative → Stage 5 CrewAI pathway. Each stage is fail-safe, with per-stage timing and a success/error flag.

Backing services:

  • BigQuery: skill_extractions, candidate_profiles
  • Vertex AI Embeddings: text-embedding-004 (768-dim, COSINE)
  • BigQuery ML: industry_vectors, ML.DISTANCE across 8 industries
  • Gemini 2.5 Flash: RAG narrative (Vertex AI · Studio)
  • CI/CD: taxonomy.json → GCS → Cloud Build → F1 ≥ 0.85 gate → Model Registry
  • Cloud Monitoring: unknown_skill_rate, avg_confidence; alert when unknown rate > 20%
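The per-stage fail-safe behaviour can be sketched as a small runner: each stage is timed, exceptions are caught so a failing stage never aborts the pipeline, and a success/error flag is recorded. Function and field names here are illustrative, not taken from analyze_pipeline.py.

```python
import time

def run_stage(name: str, fn, state: dict, results: dict) -> None:
    """Run one pipeline stage fail-safe: time it, record success or error."""
    start = time.perf_counter()
    try:
        state.update(fn(state) or {})
        results[name] = {"success": True}
    except Exception as exc:  # a failing stage must not kill the pipeline
        results[name] = {"success": False, "error": str(exc)}
    results[name]["duration_ms"] = round((time.perf_counter() - start) * 1000)

state, results = {"resume_text": "..."}, {}
run_stage("extract", lambda s: {"skills": ["Python"]}, state, results)
run_stage("gap", lambda s: 1 / 0, state, results)  # simulated stage failure
print(results["extract"]["success"], results["gap"]["success"])
# → True False
```

This is why the sample /analyze response below still returns overall results even when one stage fails or is skipped: downstream stages read whatever state upstream stages managed to produce.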

GCP Service Map

Service                   | Used for                                                                        | Resource
BigQuery                  | Skill storage, profiles, embeddings, JD catalog, industry vectors, drift metrics | reskillio.*
BigQuery ML               | Cosine distance against 8 industry centroid vectors; scoring runs in-database    | industry_vectors
BigQuery Lakehouse        | Medallion Bronze/Silver/Gold layers for analytics and auditability               | reskillio_bronze/silver/gold.*
Vertex AI Embeddings      | 768-dim skill vectors for gap analysis and industry matching                     | text-embedding-004
Vertex AI Gemini          | RAG-grounded career narrative generation                                         | gemini-2.5-flash
Vertex AI Model Registry  | Versioned spaCy skill extractor with F1 gating                                   | reskillio-skill-extractor
Vertex AI Pipelines (KFP) | Orchestrated PDF ingestion: load → extract → embed                               | reskillio-ingestion-pipeline
Cloud Storage             | Model artifacts, taxonomy JSON, pipeline root                                    | {project}-models
Cloud Build               | CI/CD retraining on taxonomy.json change in GCS                                  | Pub/Sub trigger
Cloud Monitoring          | Drift metrics + alert policy (unknown_rate > 20%)                                | 3 custom metric descriptors
Cloud Run                 | FastAPI API hosting                                                              | reskillio-api
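The BigQuery ML scoring step is conceptually ML.DISTANCE(candidate_vec, centroid_vec, 'COSINE') ranked across the 8 industry centroids. The same ranking can be sketched in plain Python; the toy 3-dim vectors and industry names here stand in for the real 768-dim text-embedding-004 vectors in industry_vectors.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity, matching BigQuery's COSINE distance semantics."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1 - dot / (norm_a * norm_b)

# Toy centroids; the real table holds 8 industries x 768 dims.
centroids = {
    "data_ai":      [0.9, 0.1, 0.1],
    "cloud_devops": [0.5, 0.5, 0.1],
    "fintech":      [0.1, 0.2, 0.9],
}

candidate = [0.8, 0.2, 0.1]
ranked = sorted(centroids, key=lambda k: cosine_distance(candidate, centroids[k]))
print(ranked[0])  # → data_ai (smallest cosine distance = best industry fit)
```

Doing this ranking in-database (rather than in Python) is the point of the BQML design: the candidate vector is joined against all centroids in one query, with no embeddings shipped over the wire.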

API Reference

Demo endpoint — POST /analyze

# PDF upload
curl -X POST http://localhost:8000/analyze \
  -F "resume=@resume.pdf" \
  -F "target_role=Senior Data Engineer" \
  -F "jd_text=We need Python, BigQuery, Airflow, dbt..." \
  -F "candidate_id=demo-001"

# Plain text (no PDF needed)
curl -X POST http://localhost:8000/analyze \
  -F "resume_text=Experienced data engineer with Python and BigQuery..." \
  -F "target_role=Senior Data Engineer"

# With 90-day pathway (~45s extra)
curl -X POST http://localhost:8000/analyze \
  -F "resume=@resume.pdf" \
  -F "target_role=Senior Data Engineer" \
  -F "include_pathway=true"

All Endpoints

POST  /analyze                     Full 5-stage career-rebound analysis
POST  /extract                     Extract skills from raw text
POST  /resume/upload               PDF upload with section-aware extraction
GET   /candidate/{id}/profile      Aggregated skill profile
POST  /jd                          Ingest a job description
POST  /gap                         Gap analysis vs a stored JD
GET   /industry/match/{id}         Industry fit scores (8 industries)
POST  /narrative                   Gemini RAG career narrative
POST  /agent/extract               LangGraph stateful skill extractor
POST  /market/analyze              CrewAI real-time skill demand
POST  /pathway/plan                CrewAI 90-day reskilling roadmap
POST  /embeddings/candidate/{id}   Embed candidate skills
GET   /registry/versions           List Vertex AI model versions
GET   /monitoring/drift/recent     Recent drift records
GET   /lakehouse/status            Row counts across Bronze/Silver/Gold
GET   /health                      Health check

Sample Output

// POST /analyze response (truncated)
{
  "candidate_id":   "demo-001",
  "target_role":    "Senior Data Engineer",
  "skill_count":    47,
  "top_skills": [
    { "name": "Python",   "category": "technical", "confidence": 0.97 },
    { "name": "BigQuery", "category": "tool",      "confidence": 0.95 }
  ],
  "gap": {
    "gap_score": 72.4,
    "matched_skills":      ["Python", "SQL", "BigQuery", "Spark", "dbt"],
    "missing_skills":      ["Kafka", "Terraform"],
    "transferable_skills": [{ "jd_skill": "Airflow", "candidate_skill": "Luigi", "similarity": 0.81 }],
    "recommendation":  "Strong match. Bridge Kafka and Terraform to close remaining gaps."
  },
  "industry_match": {
    "top_industry": "data_ai",
    "scores": [
      { "rank": 1, "industry": "data_ai",     "match_score": 88.3 },
      { "rank": 2, "industry": "cloud_devops", "match_score": 71.1 }
    ]
  },
  "narrative": "A data-first engineer with a strong foundation in Python and BigQuery,
   your transferable workflow orchestration experience positions you well for
   this role. Rounding out Kafka and Terraform would make you a top candidate.",
  "stages": {
    "extract":   { "success": true,  "duration_ms": 342  },
    "gap":       { "success": true,  "duration_ms": 1820 },
    "industry":  { "success": true,  "duration_ms": 510  },
    "narrative": { "success": true,  "duration_ms": 3102 },
    "pathway":   { "success": true,  "duration_ms": 0,    "error": "skipped" }
  },
  "total_duration_ms": 5892
}
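The gap object above combines set logic (exact matches and misses) with embedding similarity for transferables. A simplified sketch follows, with a hard-coded similarity table standing in for Vertex AI embedding cosine similarity; the 0–100 scoring weights and threshold are illustrative, not the production formula.

```python
def gap_analysis(candidate, jd, similarity, threshold=0.75):
    """Split JD skills into matched / transferable / missing vs a candidate."""
    cand, jdset = set(candidate), set(jd)
    matched = sorted(cand & jdset)
    missing, transferable = [], []
    for skill in jdset - cand:
        # Best near-match among candidate skills, per the similarity oracle.
        best = max(((c, similarity.get((skill, c), 0.0)) for c in cand),
                   key=lambda p: p[1], default=(None, 0.0))
        if best[1] >= threshold:
            transferable.append({"jd_skill": skill, "candidate_skill": best[0],
                                 "similarity": best[1]})
        else:
            missing.append(skill)
    # Illustrative score: full credit for matches, half for transferables.
    score = round(100 * (len(matched) + 0.5 * len(transferable)) / len(jdset), 1)
    return {"gap_score": score, "matched_skills": matched,
            "missing_skills": sorted(missing), "transferable_skills": transferable}

sim = {("Airflow", "Luigi"): 0.81}  # stand-in for embedding cosine similarity
result = gap_analysis(["Python", "BigQuery", "Luigi"],
                      ["Python", "BigQuery", "Airflow", "Kafka"], sim)
print(result["gap_score"], result["missing_skills"])
# → 62.5 ['Kafka']
```

The Airflow/Luigi pair shows why semantic similarity matters: an exact-match-only comparison would report Airflow as missing, hiding directly transferable orchestration experience.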

BigQuery Medallion Lakehouse

Bronze — Raw Ingestion

  • raw_resume_ingestion
  • raw_jd_ingestion

Append-only. Every source document preserved as-is.

Silver — Validated

  • candidate_skills
  • jd_skill_profiles
  • ingestion_log

Deduplicated, validated, enriched via MERGE SQL.

Gold — Analytics

  • match_scores
  • industry_rankings
  • candidate_readiness

Computed scores. Readiness index = 40% match + 30% industry + 20% confidence + 10% breadth, each component on a 0–100 scale.
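The readiness index is a straight weighted sum. A minimal sketch, assuming each component arrives already normalised to 0–100 (the source states only the weights):

```python
def readiness_index(match: float, industry: float,
                    confidence: float, breadth: float) -> float:
    """40% match + 30% industry + 20% confidence + 10% breadth, all 0-100."""
    return round(0.40 * match + 0.30 * industry
                 + 0.20 * confidence + 0.10 * breadth, 1)

print(readiness_index(match=80, industry=70, confidence=90, breadth=60))
# → 77.0
```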

Getting Started

1. Clone & install

git clone https://github.com/vipul9811kumar/ReSkillio.git
cd ReSkillio
pip install -r requirements.txt
python -m spacy download en_core_web_lg

2. Configure GCP

cp .env.example .env
# Set GCP_PROJECT_ID and GOOGLE_APPLICATION_CREDENTIALS

3. Bootstrap GCP resources

python scripts/setup_gcp.py
python scripts/build_industry_vectors.py

4. Start the API

uvicorn reskillio.api.main:app --reload --port 8000
# Docs at http://localhost:8000/docs

5. Run the demo

curl -X POST http://localhost:8000/analyze \
  -F "resume=@data/raw/sample_resume.pdf" \
  -F "target_role=Senior Data Engineer"