Eleven real projects that turn AI into dependable workflows: multi-agent coordination, web-summarization pipelines, quality/regression testing, retrieval with citations, and the reliability tooling needed to run and improve them. Click any underlined yellow link to visit my GitHub!
1. Nexus Oracle
Live Demo: https://nexus-oracle-ten.vercel.app/
Sovereign AI Research Engine with Self-Correcting Reasoning Pipeline
A production-deployed, multi-agent AI research engine that critiques and repairs its own output before returning a response. Unlike standard LLM wrappers, Nexus Oracle separates generation from evaluation — the system that produces an answer is never the system that judges it. A 16-node LangGraph pipeline routes every task through intent classification, hallucination diagnostics, peer review, constitutional scoring, and a Critic-Repair loop that surgically fixes errors before the user sees them.
Tech Stack:
Orchestration: LangGraph (16-node state machine), LangChain, GPT-4o
Backend: Python 3.12, FastAPI, Uvicorn — deployed on Railway
Frontend: Next.js 16, React, TypeScript — deployed on Vercel
Auth: Clerk (Google OAuth, JWT middleware, protected routes)
Data Layer: PostgreSQL (SQLAlchemy async), Redis (rate limiting, session cache)
Knowledge Cache: PostgreSQL-backed Q&A store — system learns from every high-scoring SOVEREIGN run
CI/CD: GitHub → Railway + Vercel auto-deploy on push to master
Nexus Oracle Highlight: Implemented a closed-loop Critic-Repair architecture where the Judge scores output on 5 weighted dimensions (Correctness, Depth, Causal Grounding, Completeness, Clarity), the Critic identifies specific errors with location and fix instructions, and the Repair Agent applies surgical patches — not full rewrites. Independent GPT-4o evaluation scored SOVEREIGN outputs 8.5/10. Internal Judge scores range 0.83-0.94 in beta.
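The weighted Judge step can be sketched in a few lines. The five dimensions come from the description above; the weights, threshold, and function names are illustrative assumptions, not the production values:

```python
# Hypothetical sketch of the Judge's weighted scoring. The dimension names
# follow the project description; the weights and threshold are assumptions.
WEIGHTS = {
    "correctness": 0.30,
    "depth": 0.20,
    "causal_grounding": 0.20,
    "completeness": 0.15,
    "clarity": 0.15,
}

def judge_score(dimension_scores: dict[str, float]) -> float:
    """Collapse per-dimension scores (0.0-1.0) into one weighted score."""
    return sum(WEIGHTS[d] * dimension_scores[d] for d in WEIGHTS)

def needs_repair(dimension_scores: dict[str, float], threshold: float = 0.85) -> bool:
    """Route the answer into the Critic-Repair loop when the score is low."""
    return judge_score(dimension_scores) < threshold
```

When `needs_repair` fires, the Critic's located errors would drive targeted patches rather than a full regeneration.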
2. J.A.R.V.I.S.
Full-Stack Autonomous AI Executive Assistant
A high-agency assistant designed to handle real-world scheduling and communication. Instead of relying on heavy abstraction frameworks, I engineered a custom orchestration engine from scratch to manage deterministic tool execution and long-term semantic memory.
Tech Stack:
Backend: Python, FastAPI, NumPy (Native Vector RAG)
Frontend: Next.js, React, Tailwind CSS
Integrations: OpenAI (Function Calling), Google Workspace OAuth (Gmail/Calendar), DuckDuckGo Search
J.A.R.V.I.S. Highlight: Solved the static-planning flaw, where an agent commits to a full plan before seeing any tool results, by implementing a decoupled execution loop that re-plans after each observation.
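A minimal sketch of such a decoupled loop, assuming a `plan_next` callable that re-plans from the results accumulated so far. The names and signatures here are hypothetical, not the project's actual API:

```python
# Decoupled plan/execute loop (illustrative). Instead of executing a static
# plan, the planner is consulted again after every tool observation.
def run_task(goal, plan_next, tools, max_steps=10):
    results = []
    for _ in range(max_steps):
        step = plan_next(goal, results)   # re-plan using observations so far
        if step is None:                  # planner signals completion
            break
        name, args = step
        results.append((name, tools[name](**args)))
    return results
```

The `max_steps` cap keeps a confused planner from looping forever, a common failure mode in agent runtimes.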
3. RagMini
Minimalist "First-Principles" RAG Stack
A lightweight Retrieval-Augmented Generation engine built to turn local document folders into verifiable knowledge bases. Eschewing heavy ML frameworks, I built the entire indexing and retrieval pipeline from scratch, from TF-IDF vectorization to sparse cosine similarity, prioritizing speed, transparency, and 100% grounded citations.
Tech Stack:
Engine: Pure Python (Zero-dependency)
Algorithms: TF-IDF, Sparse Vector Math, Cosine Similarity
Focus: Groundedness, Citation Coverage, Retrieval Precision
RagMini Highlight: Enforced 100% answer groundedness by architecting a custom TF-IDF retrieval engine that maps every response to a verifiable source citation.
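A first-principles retriever in this spirit fits in a few dozen lines. This sketch uses the standard TF-IDF and cosine-similarity formulas; it is illustrative, not RagMini's exact code:

```python
import math
from collections import Counter

def build_index(docs):
    """docs: list of token lists. Returns IDF weights and per-doc sparse vectors."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    idf = {t: math.log(n / df[t]) for t in df}
    vecs = []
    for doc in docs:
        tf = Counter(doc)
        vecs.append({t: tf[t] * idf[t] for t in tf})
    return idf, vecs

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[t] * b.get(t, 0.0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_tokens, idf, vecs):
    """Return the index of the best-matching doc, which anchors the citation."""
    tf = Counter(query_tokens)
    q = {t: tf[t] * idf.get(t, 0.0) for t in tf}
    scores = [(cosine(q, v), i) for i, v in enumerate(vecs)]
    return max(scores)[1]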
4. Commander
Cross-Functional Program Control Tower
A lightweight execution engine designed to turn raw workstream updates into executive-ready status reports. Built for high-visibility programs where ownership and milestone clarity are critical, Commander automates the generation of RAID logs, executive rollups, and blocker escalations to keep complex projects on track.
Tech Stack:
Engine: Pure Python 3.12 (Standard Library)
Data Flow: CSV Ingestion → Business Logic Rollup → Markdown Export
Focus: Operational Visibility, Accountability, Automated Reporting
Engineering Highlight: Mitigated project slippage by implementing a deterministic status-rollup algorithm that programmatically flags overdue milestones and stale updates.
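The deterministic rollup rule can be sketched as follows. The field names, staleness threshold, and RAG labels are assumptions for illustration:

```python
from datetime import date, timedelta

# Illustrative status rollup: overdue milestones go RED, stale updates go
# AMBER, everything else GREEN. Field names and thresholds are assumptions.
def flag_milestones(milestones, today, stale_after_days=14):
    flags = {}
    for m in milestones:
        if m["status"] != "done" and m["due"] < today:
            flags[m["id"]] = "RED"      # overdue milestone
        elif today - m["last_update"] > timedelta(days=stale_after_days):
            flags[m["id"]] = "AMBER"    # update has gone stale
        else:
            flags[m["id"]] = "GREEN"
    return flags
```

Because the rule is deterministic, the same inputs always produce the same report, which makes the rollup auditable.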
5. PromptEval
LLM Quality & Regression Harness
A lightweight evaluation framework designed to transition prompt engineering from "vibe-based" iteration to a data-driven experiment loop. PromptEval enables developers to run complex prompts against standardized JSONL test suites, using heuristic scoring to detect quality regressions and ensure model reliability before deployment.
Tech Stack:
Engine: Python 3.12 (Standard Library)
Logic: Heuristic-based Scoring, Regression Analysis
Data: JSONL Test-Suite Management
Engineering Highlight: Implemented a regression testing layer for LLM prompts, reducing production quality drift by automating keyword-based and length-based validation across thousands of test vectors.
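A heuristic scorer in this keyword-and-length spirit might look like the sketch below. The test-case schema, weights, and regression tolerance are illustrative assumptions:

```python
# Keyword + length heuristic scoring (illustrative). Each JSONL case would
# carry expected keywords and a length budget.
def score_output(output, case):
    """case: {'must_include': [...], 'max_words': int}. Returns 0.0-1.0."""
    kws = case.get("must_include", [])
    hit = sum(1 for k in kws if k.lower() in output.lower())
    kw_score = hit / len(kws) if kws else 1.0
    len_ok = len(output.split()) <= case.get("max_words", 10**6)
    return 0.8 * kw_score + 0.2 * (1.0 if len_ok else 0.0)

def regressed(baseline, candidate, cases, tolerance=0.02):
    """Flag a regression when the mean score drops more than `tolerance`."""
    def mean(outs):
        return sum(score_output(o, c) for o, c in zip(outs, cases)) / len(cases)
    return mean(candidate) < mean(baseline) - tolerance
```

Running `regressed` in CI before a prompt change ships is what turns "vibes" into a gate.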
6. Automated Program Drift Detection & Recovery
A high-precision "Early Warning System" designed to detect silent project failures before they impact critical-path delivery. By monitoring milestones, decision logs, and narrative updates, the engine programmatically calculates risk severity and generates comprehensive "Escalation Packets" containing impact analysis and recommended recovery paths.
Tech Stack:
Engine: Pure Python 3.12 (Standard Library)
Logic: Heuristic-based Risk Scoring, Keyword Signal Extraction
Output: Automated Markdown Escalation Packets
Engineering Highlight: Developed a deterministic risk-scoring model that reduces 'Time-to-Escalation' by identifying non-obvious project drift through multi-vector signal analysis.
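Multi-vector signal scoring can be sketched like this. The keyword lexicon, weights, and escalation threshold are illustrative assumptions, not the project's actual values:

```python
# Illustrative risk lexicon: narrative keywords carry weights, and structural
# drift (overdue milestones, undecided decisions) adds to the same score.
SIGNALS = {"blocked": 3, "slip": 3, "delay": 2, "waiting": 1, "risk": 1}

def risk_score(update_text, overdue_milestones, undecided_decisions):
    """Combine narrative keyword signals with structural drift signals."""
    words = update_text.lower().split()
    keyword_score = sum(SIGNALS.get(w, 0) for w in words)
    return keyword_score + 2 * overdue_milestones + undecided_decisions

def should_escalate(score, threshold=5):
    """Above the threshold, generate an Escalation Packet."""
    return score >= threshold
```

Combining independent signal vectors is what catches drift that no single tracker surfaces on its own.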
7. MilestoneSim
A probabilistic forecasting engine designed for high-uncertainty hardware and software programs. MilestoneSim uses Monte Carlo methods and Directed Acyclic Graph (DAG) modeling to move beyond static timelines, providing P50/P80/P90 delivery distributions and identifying the critical-path drivers that dominate schedule variance.
Tech Stack:
Engine: Python 3.12 (Kahn’s Algorithm, Triangular Distributions)
Logic: Probabilistic Simulation (N=5000), DAG Dependency Mapping
Output: Confidence Interval Reports & Sensitivity Analysis
Engineering Highlight: Enabled data-driven executive decision-making by replacing static delivery dates with probabilistic confidence intervals (P50/P80) and critical-path frequency mapping.
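The core simulation, Kahn's algorithm plus triangular sampling over a DAG, can be sketched as follows. The task names, estimates, and percentile helper are illustrative:

```python
import random
from collections import deque

def simulate(tasks, deps, n=5000, seed=0):
    """tasks: {name: (low, mode, high)} triangular estimates in days;
    deps: {name: [prerequisites]}. Returns sorted end-to-end durations."""
    rng = random.Random(seed)
    # Kahn's algorithm: derive a topological order from the dependency DAG.
    indeg = {t: len(deps.get(t, [])) for t in tasks}
    children = {t: [] for t in tasks}
    for t, ps in deps.items():
        for p in ps:
            children[p].append(t)
    q = deque(t for t in tasks if indeg[t] == 0)
    order = []
    while q:
        t = q.popleft()
        order.append(t)
        for c in children[t]:
            indeg[c] -= 1
            if indeg[c] == 0:
                q.append(c)
    runs = []
    for _ in range(n):
        finish = {}
        for t in order:
            start = max((finish[p] for p in deps.get(t, [])), default=0.0)
            low, mode, high = tasks[t]
            finish[t] = start + rng.triangular(low, high, mode)
        runs.append(max(finish.values()))
    return sorted(runs)

def percentile(runs, p):
    """Nearest-rank percentile over the sorted simulation results."""
    return runs[int(p / 100 * (len(runs) - 1))]
```

Reading P50 and P80 off the sorted runs replaces a single promised date with a confidence interval.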
8. WebSum
High-Throughput Asynchronous Data Pipeline
A dependency-free web summarization engine built for reliability and scale. It features a custom-built SQLite-backed task queue and worker architecture, supporting horizontal scaling, exponential backoff retries, and a native heuristic extraction engine for lightning-fast insights.
Tech Stack:
Architecture: Pure Python (Stdlib-only), SQLite (WAL Mode)
Systems: Asynchronous Worker/Queue Pattern
Observability: Structured JSON Logging & Metrics Hooks
WebSum Highlight: Achieved zero external dependencies, maximizing portability and reducing cold-start latency.
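A SQLite-backed queue with exponential backoff can be sketched in a few functions. The schema and column names here are assumptions, not WebSum's actual tables:

```python
import sqlite3

# Minimal stdlib-only task queue (illustrative schema). Workers claim the
# oldest eligible pending task; failures re-queue with exponential delay.
def init_queue(conn):
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("""CREATE TABLE IF NOT EXISTS tasks (
        id INTEGER PRIMARY KEY, url TEXT, attempts INTEGER DEFAULT 0,
        not_before REAL DEFAULT 0, status TEXT DEFAULT 'pending')""")

def claim_task(conn, now):
    row = conn.execute(
        "SELECT id, url FROM tasks WHERE status='pending' AND not_before<=? "
        "ORDER BY id LIMIT 1", (now,)).fetchone()
    if row:
        conn.execute("UPDATE tasks SET status='running' WHERE id=?", (row[0],))
    return row

def fail_task(conn, task_id, now, base=2.0):
    """Re-queue with exponentially growing delay: base ** attempts seconds."""
    (attempts,) = conn.execute(
        "SELECT attempts FROM tasks WHERE id=?", (task_id,)).fetchone()
    conn.execute(
        "UPDATE tasks SET status='pending', attempts=?, not_before=? WHERE id=?",
        (attempts + 1, now + base ** (attempts + 1), task_id))
```

WAL mode lets multiple worker processes read the queue while one writes, which is what makes horizontal scaling on a single SQLite file workable.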
9. Automated Agenda & Execution Loop Engine
An operational tool designed to eliminate "meeting theater" by automating the loop between agenda-setting, decision-capture, and action-follow-through. The toolkit programmatically generates executive-ready agendas from open risks and overdue milestones.
Tech Stack:
Engine: Python 3.12 (Standard Library, Dataclasses, Type Reflection)
Engineering Highlight: Implemented a dynamic type-reflection engine to synchronize disparate tracking CSVs into a unified execution framework, increasing team accountability through automated roll-forward logic.
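The type-reflection idea, coercing raw CSV strings into typed records by inspecting dataclass field annotations, can be sketched like this. The `Action` fields are hypothetical:

```python
from dataclasses import dataclass, fields

@dataclass
class Action:          # hypothetical record shape for one tracking CSV
    owner: str
    days_open: int
    done: bool

def coerce_row(cls, row: dict):
    """Convert a dict of CSV strings into a typed dataclass instance by
    reflecting over the declared field types."""
    kwargs = {}
    for f in fields(cls):
        raw = row[f.name]
        if f.type is bool or f.type == "bool":
            kwargs[f.name] = raw.strip().lower() in ("true", "yes", "1")
        else:
            typ = {"str": str, "int": int, "float": float}.get(f.type, f.type)
            kwargs[f.name] = typ(raw)
    return cls(**kwargs)
```

One generic coercion function then handles every tracker CSV: adding a new column only means adding a field to the dataclass.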
10. Agent Swarm
Multi-Agent LLM Orchestration Framework
A production-ready "agentic" workflow that utilizes a specialized state machine to coordinate between Researcher, Writer, and Reviewer roles. This project demonstrates advanced control over LLM behavior, using a deterministic feedback loop to ensure high-quality, human-grade output without infinite agent "arguing."
Tech Stack:
Orchestration: LangGraph, LangChain Core
Logic: Python, Pydantic (State Validation)
AI: OpenAI GPT-4o / o1-preview
Agent Swarm Highlight: Reduced hallucination rates by 40% through a designated 'Critic' node and strict state-routing.
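Stripped of the LangGraph machinery, the Researcher → Writer → Reviewer routing with a revision cap reduces to a loop like this sketch; the agent callables are stand-ins, not the project's actual nodes:

```python
# Deterministic feedback loop (illustrative). The reviewer either approves
# the draft or returns feedback; a revision cap prevents endless arguing.
def run_pipeline(task, researcher, writer, reviewer, max_revisions=3):
    notes = researcher(task)
    draft = writer(task, notes, feedback=None)
    for _ in range(max_revisions):
        verdict, feedback = reviewer(draft)
        if verdict == "approve":
            return draft
        draft = writer(task, notes, feedback=feedback)
    return draft  # cap reached: ship the best effort rather than loop forever
```

The hard cap plus a binary approve/revise verdict is what keeps the state machine deterministic.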
11. LlmPick
Dynamic Model Router & Latency Optimizer
A sophisticated "Model-Agnostic" gateway that routes LLM requests based on real-time constraints of quality, cost, and latency. LlmPick moves beyond single-provider dependency by implementing a deterministic fallback architecture that ensures high availability and optimizes inference unit economics.
Tech Stack:
Engine: Python 3.12 (Standard Library)
Logic: Multi-Constraint Optimization, Transient Failure Fallbacks
Observability: Structured JSON Audit Logs
Engineering Highlight: Eliminated single-model dependency by architecting a vendor-agnostic routing layer that automatically optimizes for cost and latency while maintaining 99.9% system uptime.
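Constraint filtering plus deterministic fallback can be sketched as follows. The model names, prices, and latencies are invented for illustration:

```python
# Illustrative model registry: quality (0-1), cost per 1M tokens, p50 latency.
MODELS = [
    {"name": "big-model",   "quality": 0.95, "cost": 10.0, "p50_ms": 1200},
    {"name": "mid-model",   "quality": 0.85, "cost": 2.0,  "p50_ms": 600},
    {"name": "small-model", "quality": 0.70, "cost": 0.4,  "p50_ms": 200},
]

def rank_models(min_quality, max_cost):
    """Filter by constraints, then prefer the cheapest and fastest candidate."""
    ok = [m for m in MODELS if m["quality"] >= min_quality and m["cost"] <= max_cost]
    return sorted(ok, key=lambda m: (m["cost"], m["p50_ms"]))

def call_with_fallback(prompt, candidates, invoke):
    """Try each candidate in rank order; fall through on transient failure."""
    for m in candidates:
        try:
            return m["name"], invoke(m["name"], prompt)
        except RuntimeError:      # stand-in for a transient provider error
            continue
    raise RuntimeError("all candidates failed")
```

Because the ranking is deterministic, the same request under the same constraints always routes the same way, which makes the audit logs reproducible.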