Omni Agent

Autonomous multi-agent AI system with document processing, web automation, and code execution powered by LLM agents and MCP servers

Installation

npx omni-agent

OmniAgent

Advanced Multi-Agent AI System with autonomous document processing, web automation, code execution, and intelligent query retrieval powered by LLM agents and MCP (Model Context Protocol) servers
Winner of Bajaj Finserv HackRX 6.0 Hackathon 2025, selected from 10,000+ teams
Our final pitch presentation for HackRX 6.0 can be found here: Team ILLVZN Final Pitch.pdf

Overview

OmniAgent is a sophisticated agentic AI system that combines multiple specialized AI agents with powerful automation capabilities. The system uses Model Context Protocol (MCP) servers to provide autonomous agents with advanced tools for complex task execution.

Core Agentic Capabilities

Autonomous Document Processing

Multi-format Processing: PDF, PPTX, DOCX, Excel (XLSX), CSV, images (PNG, JPG, TIFF), and plain text files
Advanced OCR Agents: Tesseract and EasyOCR for extracting text from images and scanned documents
Large Document Handling: Intelligent chunking and processing of documents that run to thousands of pages
Smart Content Analysis: RAG (Retrieval-Augmented Generation) pipeline with semantic understanding
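
The chunking step can be pictured as a sliding window over the extracted text. The following is a minimal sketch under stated assumptions, not the project's implementation (which may well rely on LangChain's text splitters): the function name chunk_text is hypothetical, it splits on raw characters rather than tokens or sentences, and its default sizes are borrowed from the CHUNK_SIZE=1000 and CHUNK_OVERLAP=200 values in the sample backend configuration.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list:
    """Split text into character windows; neighbours share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text,
        # so no trailing chunk consists purely of overlap.
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.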

Web Automation & Browser Control

Intelligent Web Crawling: Automated website navigation and content extraction
Dynamic Page Interaction: JavaScript-enabled rendering and form interaction
URL Content Processing: Direct processing of web pages, APIs, and online documents
Real-time Data Extraction: Live web data retrieval with content monitoring

Code Execution & Analysis

Multi-language Execution: Python, JavaScript, shell commands, and script automation
Repository Analysis: GitHub repository processing and code understanding
Code Generation: Automated code creation and documentation
Interactive Execution: Real-time code execution with result visualization

Intelligent Query & Reasoning

Natural Language Processing: Complex query interpretation with contextual understanding
Multi-step Task Execution: Breaking down complex requests into executable agent workflows
Cross-source Analysis: Correlating information across documents, web content, and databases
Streaming AI Responses: Real-time agent communication with progress tracking

Multi-Agent Architecture

This full-stack agentic ecosystem features:

Agent Orchestration: Coordinated AI agents with specialized capabilities
MCP Tool Integration: Extensible agent toolkit via Model Context Protocol
Real-time Observability: VoltAgent monitoring for agent behavior and debugging
High-performance Pipeline: Scalable document, web, and code processing
Interactive Agent Interface: User-friendly chat with streaming agent responses

[Images: HackRX Chat; Architecture (Rounds 1-4); Architecture (Rounds 5-7); Technology Stack; VoltAgent Dashboard]

Architecture

Frontend/Main Backend (Next.js 14+ with TypeScript)

Primary System for Rounds 5, 6, 7:

Chat Interface: Interactive chat UI with streaming responses at /chat
HackRX Evaluation Endpoint: Main backend API at /api/hackrx/run for evaluation purposes
Next.js API Routes: Server-side processing with AI integration
MCP Client: Consumes tools from Python MCP server when needed
MCP Servers: Playwright MCP Server (https://github.com/microsoft/playwright-mcp)
Observability: VoltAgent integration for monitoring and debugging (https://voltagent.dev)
Authentication: Supabase Auth with secure user management
Database: PostgreSQL via Supabase with vector support

Python Backend (Supplementary/Tool Provider)

Primary backend for Rounds 1-4. In later rounds the Python backend serves as a supplementary tool provider:

FastAPI Server: High-performance API with automatic documentation
MCP Server: Model Context Protocol server for tool integration using FastMCP
Document Processing: Advanced RAG pipeline with multiple vector store support
AI Integration: Multiple LLM providers (OpenAI, Google Gemini, Groq, Cerebras)
Vector Databases: Support for Pinecone, Qdrant, and PGVector
OCR Processing: Tesseract OCR and EasyOCR for image text extraction

Tech Stack

Component	Technology	Purpose
Frontend Framework	Next.js 14+ with TypeScript	Main application & API routes
UI Components	Radix UI + Tailwind CSS	Modern, accessible interface
State Management	TanStack Query	Frontend data management
HTTP Client	Axios	Frontend API communication
AI Integration	Vercel AI SDK	Streaming AI responses
Authentication	Supabase Auth	User management
Database	Supabase (PostgreSQL)	Data persistence
Observability	VoltAgent	Agent monitoring & debugging
Backend API	FastAPI + Python 3.12+	Document processing tools
MCP Integration	Model Context Protocol	Tool extensibility
Vector Storage	Pinecone, Qdrant, PGVector	Semantic search
Web Automation	Playwright MCP	Browser control
Document Processing	PyMuPDF, LangChain	Multi-format support
OCR Tools	PyMuPDF4LLM, Tesseract OCR, python-pptx, EasyOCR	Document text extraction
AI/ML	LangChain ecosystem, OpenAI GPT, Google Gemini, Groq, Cerebras	Multi-provider AI support
Runtime	Python 3.12+	Backend execution environment
Containerization	Docker + Docker Compose	Deployment
Platform Support	Windows 10/11 with PowerShell	Primary platform
Alternative	Docker (cross-platform)	Cross-platform deployment

Quick Start

IMPORTANT: For Rounds 5, 6, 7, the main backend is the Next.js application. The Python backend is optional and only needed for advanced document processing tools.

Option 1: Frontend/Main Backend Only (Recommended for Rounds 5, 6, 7)

Prerequisites

Node.js 18+
Git

Setup

# Clone the repository
git clone https://github.com/CubeStar1/omni-agent.git
cd omni-agent/frontend

# Install dependencies
npm install

# Configure environment
cp env.example .env.local
# Edit .env.local with your configuration

# Start the application
npm run dev

Access Points

Main Application: http://127.0.0.1:3000
Chat Interface: http://127.0.0.1:3000/chat
HackRX Evaluation API: http://127.0.0.1:3000/api/hackrx/run (Main Backend Endpoint)

Option 2: Full Setup with Python Tools (Optional)

Prerequisites

Python 3.12+
Node.js 18+
Git

Python Tools Backend Setup (Optional)

cd backend
python -m venv venv
.\venv\Scripts\Activate.ps1  # Windows
pip install -r requirements.txt

# Configure environment
cp env.example .env
# Edit .env with your API keys (or leave as is for defaults)

# Start main FastAPI server
python main.py

# Start MCP server (in separate terminal)
python run_mcp.py

Python services (when running):

FastAPI Server: http://127.0.0.1:8000
MCP Server: http://127.0.0.1:8001 (FastMCP server)

Frontend Setup

cd frontend
npm install
cp env.example .env.local
# Edit .env.local with your configuration (including NEXT_PUBLIC_MCP_URL if using Python tools)
npm run dev

Option 3: Python Tools with Docker (Optional)

The Python backend includes full Docker support:

# Docker setup for Python backend
cd backend
cp .env.example .env
# Edit .env with your API keys

# Using Docker Compose (recommended)
docker-compose up -d

# Or using PowerShell script
.\docker-run.ps1

# To view logs
docker-compose logs -f

Note: Docker is only available for the Python tools backend. The main Next.js application runs natively.

Configuration

Database

Go to Supabase and create a new project.
Paste the migrations from frontend/lib/supabase/migrations.sql into the SQL editor and run them.
Get your Supabase URL and Anon Key from the project settings.

Frontend (.env.local) - Main Application

NEXT_PUBLIC_SUPABASE_URL=
NEXT_PUBLIC_SUPABASE_ANON_KEY=
SUPABASE_ADMIN=

RESEND_API_KEY=
RESEND_DOMAIN=

NEXT_PUBLIC_APP_NAME=HackRX Chat
NEXT_PUBLIC_APP_ICON='/logos/hackrx-logo.webp'

# AI Providers
OPENAI_API_KEY=
XAI_API_KEY=
GROQ_API_KEY=
HACKRX_API_KEY=
HACKRX_BASE_URL="https://register.hackrx.in/llm/openai"
HACKRX_MCP_MODEL=hackrx-gpt-4.1-mini

# AI Tools
TAVILY_API_KEY=

NEXT_PUBLIC_MCP_URL=http://127.0.0.1:8001/mcp

# VoltAgent Observability
VOLTAGENT_PUBLIC_KEY=
VOLTAGENT_SECRET_KEY=

GITHUB_PERSONAL_ACCESS_TOKEN=

Python Tools Backend (.env) - Optional

# Environment Configuration
PROJECT_NAME="OmniAgent Intelligence System"
ENVIRONMENT=production

# Authentication
BEARER_TOKEN=your-bearer-token

# MCP Server Configuration
MCP_SERVER_PORT=8001

# Vector Store Configuration
DEFAULT_VECTOR_STORE=inmemory
EMBEDDING_MODEL=text-embedding-3-small

# Pinecone Configuration
PINECONE_API_KEY=your-pinecone-key
PINECONE_INDEX_NAME=hackrx-documents
PINECONE_ENVIRONMENT=us-east-1

# LLM Providers
OPENAI_API_KEY=your-openai-key
GEMINI_API_KEY=your-gemini-key
DEFAULT_LLM_PROVIDER=openai
GROQ_API_KEY=your-groq-key
CEREBRAS_API_KEY=your-cerebras-key

# Processing Configuration
CHUNK_SIZE=1000
CHUNK_OVERLAP=200

# Database
SUPABASE_URL=your-supabase-url
SUPABASE_ANON_KEY=your-supabase-key
SUPABASE_SERVICE_KEY=your-supabase-service-key

# Additional Settings
ENABLE_REQUEST_LOGGING=true
AGENT_ENABLED=true
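
One way to read these variables into typed settings is sketched below. This is an illustration only, not the backend's actual configuration code: BackendSettings and load_settings are hypothetical names, only a subset of the variables is covered, and the fallback values are copied from the sample file above rather than from the application's real defaults.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BackendSettings:
    mcp_server_port: int
    default_vector_store: str
    embedding_model: str
    default_llm_provider: str
    chunk_size: int
    chunk_overlap: int


def load_settings(env: Optional[Mapping] = None) -> BackendSettings:
    """Build settings from a mapping (os.environ by default), converting types."""
    env = os.environ if env is None else env
    settings = BackendSettings(
        mcp_server_port=int(env.get("MCP_SERVER_PORT", "8001")),
        default_vector_store=env.get("DEFAULT_VECTOR_STORE", "inmemory"),
        embedding_model=env.get("EMBEDDING_MODEL", "text-embedding-3-small"),
        default_llm_provider=env.get("DEFAULT_LLM_PROVIDER", "openai"),
        chunk_size=int(env.get("CHUNK_SIZE", "1000")),
        chunk_overlap=int(env.get("CHUNK_OVERLAP", "200")),
    )
    # An overlap as large as the chunk itself would never advance through the text.
    if settings.chunk_overlap >= settings.chunk_size:
        raise ValueError("CHUNK_OVERLAP must be smaller than CHUNK_SIZE")
    return settings
```

Passing a plain dictionary instead of os.environ makes the loader easy to exercise in tests without touching the real environment.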

API Endpoints

Main Application (Next.js)

For Rounds 5, 6, 7 - Primary Backend:

Base URL: http://127.0.0.1:3000
Chat Interface: http://127.0.0.1:3000/chat
HackRX Evaluation Endpoint: http://127.0.0.1:3000/api/hackrx/run
Chat API: http://127.0.0.1:3000/api/chat
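
A request to the evaluation endpoint might be assembled as in the sketch below. The request schema for /api/hackrx/run is not documented in this README, so the payload is passed through untouched and any field names you put in it (for example documents or questions) are assumptions to check against the route handler. The optional Bearer header is likewise an assumption. Both helper names are hypothetical.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://127.0.0.1:3000"  # Next.js dev server from the Quick Start


def build_hackrx_request(payload: dict, token: Optional[str] = None) -> urllib.request.Request:
    """Prepare (but do not send) a JSON POST to the evaluation endpoint."""
    headers = {"Content-Type": "application/json"}
    if token:
        # Assumption: the endpoint accepts a Bearer token; omit if it does not.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        f"{BASE_URL}/api/hackrx/run",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def run_evaluation(payload: dict, token: Optional[str] = None, timeout: float = 120.0) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_hackrx_request(payload, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes it possible to inspect the URL, headers, and body before anything goes over the network.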

Python Tools Backend (Optional - Rounds 1-4)

Base URL: http://127.0.0.1:8000
Health Check: http://127.0.0.1:8000/health
Documentation: http://127.0.0.1:8000/docs
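
When scripting against the Python backend, it can help to wait until the health check responds before sending work. The sketch below polls the /health URL listed above and looks only at the HTTP status code, since the response body format is not described here. The function name wait_for_health and its parameters are hypothetical.

```python
import time
import urllib.request
from typing import Callable, Optional


def _http_status(url: str) -> int:
    """Return the HTTP status code for a GET request to `url`."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.status


def wait_for_health(
    url: str = "http://127.0.0.1:8000/health",
    attempts: int = 10,
    delay: float = 1.0,
    fetch: Optional[Callable] = None,
) -> bool:
    """Poll `url` until it answers 200 or `attempts` tries have been used."""
    fetch = fetch or _http_status
    for attempt in range(attempts):
        try:
            if fetch(url) == 200:
                return True
        except OSError:
            # urllib's URLError and HTTPError are OSError subclasses;
            # a refused connection usually means the server is still starting.
            pass
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

The fetch parameter exists so the polling logic can be tested with a stub instead of a live server.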

MCP Server (Optional)

Base URL: http://127.0.0.1:8001
Tools: retrieve_context, rag_search
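
MCP is built on JSON-RPC 2.0, so listing and invoking these tools comes down to tools/list and tools/call messages. The sketch below only constructs those messages. A real client must first complete the protocol's initialize handshake and handle the HTTP transport, which an MCP client library normally does for you. The helper names are hypothetical, and the arguments each tool accepts (such as a query field) are not documented in this README, so treat them as assumptions and consult the tools/list response for the actual input schema.

```python
import itertools
from typing import Optional

_request_ids = itertools.count(1)  # JSON-RPC requests need unique ids per session


def jsonrpc_request(method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request object."""
    message = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


def list_tools_request() -> dict:
    """Ask the server which tools it exposes."""
    return jsonrpc_request("tools/list")


def call_tool_request(name: str, arguments: dict) -> dict:
    """Invoke a named tool with the given arguments."""
    return jsonrpc_request("tools/call", {"name": name, "arguments": arguments})
```

For example, call_tool_request("rag_search", {"query": "..."}) yields a message whose params carry the tool name and arguments, ready to be serialised and sent to the endpoint configured in NEXT_PUBLIC_MCP_URL.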

Project Structure

omni-agent/
├── frontend/                   # Next.js Main Application (Primary Backend for Rounds 5,6,7)
│   ├── app/                   # Next.js app directory
│   │   ├── chat/             # Chat interface components
│   │   ├── api/              # **Main Backend API Routes**
│   │   │   ├── chat/         # Chat API endpoint
│   │   │   └── hackrx/run/   # **HackRX evaluation endpoint (Main Backend)**
│   │   └── ...
│   ├── components/            # React components
│   ├── lib/                   # Utilities and configurations
│   └── package.json           # Node dependencies
├── backend/                   # Python Tools Backend (Supplementary/Optional)
│   ├── app/                  # FastAPI application code
│   ├── mcp_server/           # FastMCP server implementation
│   │   ├── main.py          # MCP server entry point
│   │   ├── server.py        # FastMCP server configuration
│   │   ├── tools/           # MCP tools (retrieve_context, rag_search)
│   │   └── config/          # MCP configuration
│   ├── requirements.txt      # Python dependencies
│   ├── requirements-mcp.txt  # MCP-specific dependencies
│   ├── main.py              # FastAPI server entry point
│   ├── run_mcp.py           # MCP server launcher
│   ├── Dockerfile           # Docker configuration
│   ├── docker-compose.yml   # Docker Compose setup
│   ├── DOCKER_README.md     # Docker documentation
│   └── README.md            # Python backend documentation
├── challenge.html            # Challenge description
└── README.md                # This file

Development Commands

Main Application (Frontend/Next.js Backend)

cd frontend

# Start development server (Main Application)
npm run dev

# Production build
npm run build

# Start production server
npm run start

# Run linting
npm run lint

Python Tools Backend (Optional)

cd backend

# Start FastAPI server
python main.py

# Start MCP server
python run_mcp.py

# Run API tests
python test_api.py

# Docker commands
docker-compose up -d              # Start services
docker-compose up -d --build      # Rebuild and start
docker-compose down               # Stop services
docker-compose logs -f            # View logs

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgements

Frontend Chat: CubeStar1/ai-sdk-template
Playwright MCP: Microsoft Playwright MCP
VoltAgent: VoltAgent Observability