## Five-Stage Pipeline

1. **Skill Extraction:** spaCy PhraseMatcher + NER pulls 200+ skills from resume text and persists them to BigQuery.
2. **Gap Analysis:** exact matching plus semantic similarity via Vertex AI embeddings, yielding a 0–100 gap score and a list of transferable skills.
3. **Industry Match:** BQML cosine distance against 8 industry centroid vectors; fit scores are ranked in-database.
4. **Gemini Narrative:** a RAG-grounded, three-sentence career story from Gemini 2.5 Flash — grounded in extracted facts only, to curb hallucination.
5. **90-Day Pathway:** a two-agent CrewAI crew researches real courses (via DuckDuckGo search) and builds a phased reskilling roadmap.
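The Gap Analysis stage combines exact matching with embedding similarity: a JD skill the candidate lacks can still count (at half weight) as *transferable* if a candidate skill sits close to it in embedding space. A minimal pure-Python sketch of that scoring idea — the toy 3-dim vectors and the 0.75 threshold below are illustrative assumptions; the real pipeline uses 768-dim Vertex AI `text-embedding-004` vectors:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def gap_score(candidate_skills, jd_skills, embeddings, sim_threshold=0.75):
    """Exact JD matches count fully; a near-neighbour above the threshold
    counts as a transferable skill at half weight. Returns a 0-100 score."""
    matched, transferable, missing = [], [], []
    cand = set(candidate_skills)
    for jd_skill in jd_skills:
        if jd_skill in cand:
            matched.append(jd_skill)
            continue
        # Best semantic match among candidate skills (stand-in for Vertex AI lookups).
        best_skill, best_sim = max(
            ((c, cosine(embeddings[jd_skill], embeddings[c])) for c in cand),
            key=lambda t: t[1],
        )
        if best_sim >= sim_threshold:
            transferable.append({"jd_skill": jd_skill,
                                 "candidate_skill": best_skill,
                                 "similarity": round(best_sim, 2)})
        else:
            missing.append(jd_skill)
    score = 100.0 * (len(matched) + 0.5 * len(transferable)) / len(jd_skills)
    return {"gap_score": round(score, 1), "matched_skills": matched,
            "transferable_skills": transferable, "missing_skills": missing}
```

With toy vectors where Luigi sits near Airflow, `gap_score(["Python", "Luigi"], ["Python", "Airflow", "Kafka"], emb)` matches Python exactly, flags Luigi→Airflow as transferable, and leaves Kafka missing — the same shape as the `gap` object in the sample response.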
## System Architecture

### GCP Service Map
| Service | Used for | Resource |
|---|---|---|
| BigQuery | Skill storage, profiles, embeddings, JD catalog, industry vectors, drift metrics | reskillio.* |
| BigQuery ML | Cosine distance against 8 industry centroid vectors — scoring in-database | industry_vectors |
| BigQuery Lakehouse | Medallion Bronze/Silver/Gold layers for analytics & auditability | reskillio_bronze/silver/gold.* |
| Vertex AI Embeddings | 768-dim skill vectors for gap analysis + industry matching | text-embedding-004 |
| Vertex AI Gemini | RAG-grounded career narrative generation | gemini-2.5-flash |
| Vertex AI Model Registry | Versioned spaCy skill extractor with F1 gating | reskillio-skill-extractor |
| Vertex AI Pipelines (KFP) | Orchestrated PDF ingestion: load → extract → embed | reskillio-ingestion-pipeline |
| Cloud Storage | Model artifacts, taxonomy JSON, pipeline root | {project}-models |
| Cloud Build | CI/CD retraining on taxonomy.json change in GCS | Pub/Sub trigger |
| Cloud Monitoring | Drift metrics + alert policy (unknown_rate > 20%) | 3 custom metric descriptors |
| Cloud Run | FastAPI API hosting | reskillio-api |
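The BigQuery ML row above scores industry fit in-database via cosine distance to centroid vectors. The equivalent logic, sketched in pure Python with toy 3-dim centroids (the real centroids are 768-dim `text-embedding-004` vectors, and the exact distance-to-score mapping lives in SQL, so the `100 * (1 - distance)` mapping here is an illustrative assumption):

```python
import math

def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity), as BQML computes for vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / norm

def rank_industries(profile_vec, centroids):
    """Rank industry centroids by closeness to a candidate profile vector.
    Returns the same rank/industry/match_score shape as the API response."""
    scored = sorted(
        ((name, cosine_distance(profile_vec, vec)) for name, vec in centroids.items()),
        key=lambda t: t[1],  # smallest distance first
    )
    return [{"rank": i + 1, "industry": name, "match_score": round(100 * (1 - d), 1)}
            for i, (name, d) in enumerate(scored)]
```

A profile vector leaning toward the `data_ai` centroid ranks that industry first, mirroring the `industry_match.scores` array in the sample output below.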
## API Reference

### Demo endpoint — POST /analyze
```bash
# PDF upload
curl -X POST http://localhost:8000/analyze \
  -F "resume=@resume.pdf" \
  -F "target_role=Senior Data Engineer" \
  -F "jd_text=We need Python, BigQuery, Airflow, dbt..." \
  -F "candidate_id=demo-001"

# Plain text (no PDF needed)
curl -X POST http://localhost:8000/analyze \
  -F "resume_text=Experienced data engineer with Python and BigQuery..." \
  -F "target_role=Senior Data Engineer"

# With 90-day pathway (~45s extra)
curl -X POST http://localhost:8000/analyze \
  -F "resume=@resume.pdf" \
  -F "target_role=Senior Data Engineer" \
  -F "include_pathway=true"
```
### All Endpoints

### Sample Output
```jsonc
// POST /analyze response (truncated)
{
  "candidate_id": "demo-001",
  "target_role": "Senior Data Engineer",
  "skill_count": 47,
  "top_skills": [
    { "name": "Python", "category": "technical", "confidence": 0.97 },
    { "name": "BigQuery", "category": "tool", "confidence": 0.95 }
  ],
  "gap": {
    "gap_score": 72.4,
    "matched_skills": ["Python", "SQL", "BigQuery", "Spark", "dbt"],
    "missing_skills": ["Kafka", "Terraform"],
    "transferable_skills": [
      { "jd_skill": "Airflow", "candidate_skill": "Luigi", "similarity": 0.81 }
    ],
    "recommendation": "Strong match. Bridge Kafka and Terraform to close remaining gaps."
  },
  "industry_match": {
    "top_industry": "data_ai",
    "scores": [
      { "rank": 1, "industry": "data_ai", "match_score": 88.3 },
      { "rank": 2, "industry": "cloud_devops", "match_score": 71.1 }
    ]
  },
  "narrative": "A data-first engineer with a strong foundation in Python and BigQuery, your transferable workflow orchestration experience positions you well for this role. Rounding out Kafka and Terraform would make you a top candidate.",
  "stages": {
    "extract": { "success": true, "duration_ms": 342 },
    "gap": { "success": true, "duration_ms": 1820 },
    "industry": { "success": true, "duration_ms": 510 },
    "narrative": { "success": true, "duration_ms": 3102 },
    "pathway": { "success": true, "duration_ms": 0, "error": "skipped" }
  },
  "total_duration_ms": 5892
}
```
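A client usually only needs a few fields from this payload. A minimal sketch that condenses a response dict of the shape above into a one-line verdict — the helper name is ours, not part of the API:

```python
def summarize_analysis(resp: dict) -> str:
    """One-line summary from an /analyze response dict (field names as in the sample)."""
    gap = resp["gap"]
    missing = ", ".join(gap["missing_skills"]) or "none"
    return (f"{resp['target_role']}: gap score {gap['gap_score']}/100, "
            f"best-fit industry '{resp['industry_match']['top_industry']}', "
            f"skills to bridge: {missing}")
```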
## BigQuery Medallion Lakehouse

### Bronze — Raw Ingestion
- raw_resume_ingestion
- raw_jd_ingestion
Append-only. Every source document is preserved as-is.
### Silver — Validated
- candidate_skills
- jd_skill_profiles
- ingestion_log
Deduplicated, validated, enriched via MERGE SQL.
### Gold — Analytics
- match_scores
- industry_rankings
- candidate_readiness
Computed scores. Readiness index = 40% match + 30% industry + 20% confidence + 10% breadth.
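The readiness formula above is a straight weighted sum. A small sketch, assuming all four component scores are already normalized to a 0–100 scale (that normalization is our assumption; the actual Gold-layer SQL may scale inputs differently):

```python
# Weights from the readiness formula: 40% match + 30% industry
# + 20% confidence + 10% breadth.
WEIGHTS = {"match": 0.40, "industry": 0.30, "confidence": 0.20, "breadth": 0.10}

def readiness_index(match: float, industry: float,
                    confidence: float, breadth: float) -> float:
    """Weighted readiness index; all inputs assumed on a 0-100 scale."""
    components = {"match": match, "industry": industry,
                  "confidence": confidence, "breadth": breadth}
    return round(sum(WEIGHTS[k] * v for k, v in components.items()), 1)
```

A candidate at 100 on every component scores 100.0; one with only a perfect match score (and zeros elsewhere) caps out at 40.0, reflecting the 40% weight.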
## Getting Started

### Clone & install

```bash
git clone https://github.com/vipul9811kumar/ReSkillio.git
cd ReSkillio
pip install -r requirements.txt
python -m spacy download en_core_web_lg
```
### Configure GCP

```bash
cp .env.example .env
# Set GCP_PROJECT_ID and GOOGLE_APPLICATION_CREDENTIALS
```
### Bootstrap GCP resources

```bash
python scripts/setup_gcp.py
python scripts/build_industry_vectors.py
```
### Start the API

```bash
uvicorn reskillio.api.main:app --reload --port 8000
# Docs at http://localhost:8000/docs
```
### Run the demo

```bash
curl -X POST http://localhost:8000/analyze \
  -F "resume=@data/raw/sample_resume.pdf" \
  -F "target_role=Senior Data Engineer"
```