Argus
Automated Retrieval and GPT Understanding System, built on Azure Document Intelligence in combination with GPT models.
Installation
npx argus
ARGUS: The All-Seeing Document Intelligence Platform
Named after Argus Panoptes, the mythological giant with a hundred eyes, ARGUS never misses a detail in your documents.
Transform Document Processing with AI Intelligence
ARGUS revolutionizes how organizations extract, understand, and act on document data. By combining the precision of Azure Document Intelligence with the contextual reasoning of GPT-5.4, ARGUS doesn't just read documents; it understands them.
Why ARGUS?
Traditional OCR solutions extract text but miss the context. AI-only approaches struggle with complex layouts. ARGUS bridges this gap, delivering enterprise-grade document intelligence that:
- Extracts with Purpose: Understands document context, not just text
- Scales Effortlessly: Processes thousands of documents with cloud-native architecture
- Secures by Design: Enterprise security with managed identities and RBAC
- Learns Continuously: Configurable datasets adapt to your specific document types
- Measures Success: Built-in evaluation tools ensure consistent accuracy
Key Capabilities
- Intelligent Document Understanding
- Enterprise-Ready Performance
- Advanced Control & Customization
- Comprehensive Analytics
Architecture: Built for Scale and Security
ARGUS employs a modern, cloud-native architecture designed for enterprise workloads:
graph TB
    subgraph "Document Input"
        A[Documents] --> B[Azure Blob Storage]
        C[Direct Upload API] --> D[FastAPI Backend]
    end
    subgraph "AI Processing Engine"
        B --> D
        D --> E{OCR Provider}
        E -->|Azure| E1[Azure Document Intelligence]
        E -->|Mistral| E2[Mistral Document AI]
        D --> F[GPT-5.4]
        E1 --> G[Hybrid Processing Pipeline]
        E2 --> G
        F --> G
    end
    subgraph "Intelligence & Analytics"
        G --> H[Custom Evaluators]
        G --> I[Interactive Chat]
        H --> J[Results & Analytics]
    end
    subgraph "Data Layer"
        G --> K[Azure Cosmos DB]
        J --> K
        I --> K
        K --> L[Next.js Frontend]
    end
style A fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
style B fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
style C fill:#e8f5e8,stroke:#388e3c,stroke-width:2px
style D fill:#fff3e0,stroke:#f57c00,stroke-width:2px
style E fill:#fce4ec,stroke:#c2185b,stroke-width:2px
style E1 fill:#fce4ec,stroke:#c2185b,stroke-width:2px
style E2 fill:#fce4ec,stroke:#c2185b,stroke-width:2px
style F fill:#e0f2f1,stroke:#00695c,stroke-width:2px
style G fill:#fff8e1,stroke:#ffa000,stroke-width:2px
style H fill:#f1f8e9,stroke:#558b2f,stroke-width:2px
style I fill:#e8eaf6,stroke:#3f51b5,stroke-width:2px
style J fill:#fdf2e9,stroke:#e65100,stroke-width:2px
style K fill:#e0f7fa,stroke:#0097a7,stroke-width:2px
style L fill:#f9fbe7,stroke:#827717,stroke-width:2px
Infrastructure Components
| Component | Technology | Purpose |
|---|---|---|
| Backend API | Azure Container Apps + FastAPI | High-performance document processing engine |
| Frontend UI | Next.js (React) | Modern document management interface |
| Document Storage | Azure Blob Storage | Secure, scalable document repository |
| Metadata Database | Azure Cosmos DB | Results, configurations, and analytics |
| OCR Engine | Azure Document Intelligence or Mistral Document AI | Structured text and layout extraction |
| AI Reasoning | Azure OpenAI (GPT-5.4) | Contextual understanding and extraction |
| Container Registry | Azure Container Registry | Private, secure container images |
| Security | Managed Identity + RBAC | Zero-credential architecture |
| Network | VNet + Private Endpoints | Network isolation for all Azure services |
| Secrets | Azure Key Vault | Centralized secrets management |
| Monitoring | Application Insights | Performance and health monitoring |
Security Architecture
ARGUS implements a defense-in-depth security model:
Network Isolation
- VNet Integration: All Container Apps run within a dedicated Virtual Network (10.0.0.0/16)
- Private Endpoints: Storage, Cosmos DB, OpenAI, Document Intelligence, and Key Vault are accessible only through private endpoints
- Private DNS Zones: Automatic DNS resolution for private endpoints via Azure Private DNS
- No Public Access: All backend services have publicNetworkAccess: Disabled
Identity & Authentication
- Managed Identity: User-assigned managed identity for all service-to-service authentication
- No API Keys: Local authentication is disabled on all Azure services (disableLocalAuth: true)
- No Shared Keys: Storage account shared key access is disabled (allowSharedKeyAccess: false)
- RBAC-Only Access: All permissions are granted through Azure RBAC role assignments
RBAC Roles (Principle of Least Privilege)
| Role | Scope | Purpose |
|---|---|---|
| Storage Blob Data Contributor | Storage Account | Read/write blob data |
| Cosmos DB Built-in Data Contributor | Cosmos DB Account | Read/write database items |
| Cognitive Services User | Document Intelligence | OCR operations |
| Cognitive Services OpenAI User | Azure OpenAI | Model inference |
| Key Vault Secrets User | Key Vault | Read secrets |
| AcrPull | Container Registry | Pull container images |
Quick Start: Deploy in Minutes
Prerequisites
Required Tools
- Docker
  # Install Docker (required for containerization during deployment)
  # Visit https://docs.docker.com/get-docker/ for installation instructions
- Azure Developer CLI (azd)
  curl -fsSL https://aka.ms/install-azd.sh | bash
- Azure CLI
  curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash
- Azure Subscription
  - An active Azure subscription with permissions to create resources
  - The deployment automatically provisions all required Azure services (OpenAI, Storage, Cosmos DB, etc.)
  - Authentication uses managed identity, so no API keys are required
One-Command Deployment
# 1. Clone the repository
git clone https://github.com/Azure-Samples/ARGUS.git
cd ARGUS
# 2. Login to Azure
az login
# 3. Deploy everything with a single command
azd up
That's it! Your ARGUS instance is now running in the cloud.
Verify Your Deployment
# Check system health
curl "$(azd env get-value BACKEND_URL)/health"
# Expected response:
{
  "status": "healthy",
  "services": {
    "cosmos_db": "✅ connected",
    "blob_storage": "✅ connected",
    "document_intelligence": "✅ connected",
    "azure_openai": "✅ connected"
  }
}
# View live application logs
azd logs --follow
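To script this verification step, the health payload can be checked programmatically. The following is a minimal Python sketch, assuming the response keeps the `status`/`services` shape shown in the expected response and that a healthy service value ends with the word `connected`. `unhealthy_services` is a hypothetical helper name, not part of ARGUS.

```python
def unhealthy_services(health: dict) -> list[str]:
    """Return the names of services whose status does not end with the word 'connected'."""
    services = health.get("services", {})
    return sorted(
        name for name, state in services.items()
        # Compare the last whitespace-separated word so "disconnected" is not a match.
        if str(state).split()[-1:] != ["connected"]
    )


# Sample payload modeled on the expected response, with one service marked as failing.
sample = {
    "status": "degraded",
    "services": {
        "cosmos_db": "connected",
        "blob_storage": "connected",
        "document_intelligence": "connected",
        "azure_openai": "timeout",
    },
}

print(unhealthy_services(sample))  # ['azure_openai']
```

An empty list means every listed service reported as connected.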
Usage Examples: See ARGUS in Action
Method 1: Upload via Frontend Interface (Recommended)
The easiest way to process documents is through the user-friendly web interface:
- Access the Frontend:
  # Get the frontend URL after deployment
  azd env get-value FRONTEND_URL
- Upload and Process Documents:
  - Navigate to the "Process Files" tab
  - Select your dataset from the dropdown (e.g., "default-dataset", "medical-dataset")
  - Use the file uploader to select PDF, image, or Office documents
  - Click "Submit" to upload files
  - Files are automatically processed using the selected dataset's configuration
  - Monitor processing status in the "Explore Data" tab
Method 2: Direct Blob Storage Upload
For automation or bulk processing, upload files directly to Azure Blob Storage:
# Upload a document to be processed automatically
az storage blob upload \
  --account-name "$(azd env get-value STORAGE_ACCOUNT_NAME)" \
  --container-name "datasets" \
  --name "default-dataset/invoice-2024.pdf" \
  --file "./my-invoice.pdf" \
  --auth-mode login
# Files uploaded to blob storage are automatically detected and processed
# Results can be viewed in the frontend or retrieved via API
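The blob name in the upload command follows a `<dataset>/<filename>` pattern inside the `datasets` container. For bulk uploads, the argument list can be assembled per file. This is a small Python sketch under that assumption; `blob_name_for` and `upload_args` are hypothetical helpers, and the sketch reuses the local file name as the blob name, which the command above does not require.

```python
from pathlib import Path, PurePosixPath


def blob_name_for(dataset: str, local_file: str) -> str:
    """Build '<dataset>/<filename>' for a local file."""
    dataset = dataset.strip("/")
    if not dataset:
        raise ValueError("dataset name must not be empty")
    return str(PurePosixPath(dataset) / Path(local_file).name)


def upload_args(account: str, dataset: str, local_file: str) -> list[str]:
    """Assemble the `az storage blob upload` arguments, using only the flags shown above."""
    return [
        "az", "storage", "blob", "upload",
        "--account-name", account,
        "--container-name", "datasets",
        "--name", blob_name_for(dataset, local_file),
        "--file", local_file,
        "--auth-mode", "login",
    ]


# "mystorage" is a placeholder account name.
print(upload_args("mystorage", "default-dataset", "./my-invoice.pdf"))
```

The resulting list can be passed to `subprocess.run(args, check=True)` once per file.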
Method 3: Interactive Document Chat
Ask questions about any processed document through the API:
curl -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "blob_url": "https://mystorage.blob.core.windows.net/datasets/default-dataset/contract.pdf",
    "question": "What are the key terms and conditions in this contract?"
  }' \
  "$(azd env get-value BACKEND_URL)/api/chat"
# Example response:
{
  "answer": "The key terms include: 1) 12-month service agreement, 2) $5000/month fee, 3) 30-day termination clause...",
  "confidence": 0.91,
  "sources": ["page 1, paragraph 3", "page 2, section 2.1"]
}
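The same chat call can be prepared from Python with only the standard library. This sketch builds the request without sending it; the backend URL is a placeholder, and `build_chat_request` is a hypothetical helper rather than an ARGUS client API.

```python
import json
import urllib.request


def build_chat_request(backend_url: str, blob_url: str, question: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST to <backend_url>/api/chat."""
    body = json.dumps({"blob_url": blob_url, "question": question}).encode("utf-8")
    return urllib.request.Request(
        url=backend_url.rstrip("/") + "/api/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_chat_request(
    "https://example-backend.invalid",  # placeholder; use your BACKEND_URL value
    "https://mystorage.blob.core.windows.net/datasets/default-dataset/contract.pdf",
    "What are the key terms and conditions in this contract?",
)
print(req.get_method(), req.full_url)
```

To send it, pass `req` to `urllib.request.urlopen(req, timeout=60)` and decode the JSON body of the response.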
MCP Integration: AI-Powered Document Access
ARGUS supports the Model Context Protocol (MCP) using the modern Streamable HTTP transport, enabling AI assistants like GitHub Copilot, Claude, and other MCP-compatible clients to interact directly with your document intelligence platform.
What is MCP?
The Model Context Protocol is an open standard that allows AI assistants to securely connect to external data sources and tools. With ARGUS MCP support, your AI assistant can:
- List and search documents across all your datasets
- Query document content and extracted data
- Chat with documents using natural language
- Upload new documents for processing
- Manage datasets and configurations
Quick Setup
Add ARGUS to your MCP client configuration:
VS Code / GitHub Copilot (~/.vscode/mcp.json or workspace settings):
{
  "mcpServers": {
    "argus": {
      "url": "https://<your-backend-url>/mcp"
    }
  }
}
Tip: After deployment with azd up, get your backend URL from the Azure Portal or run azd show to find the Container App URL.
Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json):
{
  "mcpServers": {
    "argus": {
      "url": "https://<your-backend-url>/mcp"
    }
  }
}
Note: ARGUS uses the Streamable HTTP transport (the modern MCP standard). The endpoint is a single /mcp path that handles all MCP communication.
Available MCP Tools
| Tool | Description |
|---|---|
| argus_list_documents | List all processed documents with filtering options |
| argus_get_document | Get detailed document information including OCR and extraction results |
| argus_chat_with_document | Ask natural language questions about a document |
| argus_search_documents | Search documents by keyword across all datasets |
| argus_list_datasets | List available dataset configurations |
| argus_get_dataset_config | Get system prompt and schema for a dataset |
| argus_create_dataset | Create a new dataset with custom prompt and schema |
| argus_process_document_url | Queue a document for processing from blob URL |
| argus_get_extraction | Get extracted structured data from a document |
| argus_get_upload_url | Get a pre-signed SAS URL for direct document upload |
Example Interactions
Once configured, you can interact with ARGUS through your AI assistant:
User: "Show me all invoices processed in the last week"
AI: [Uses argus_list_documents to retrieve recent invoices]
User: "What's the total amount on invoice INV-2024-001?"
AI: [Uses argus_get_document to fetch invoice details]
User: "I need to upload a new contract for processing"
AI: [Uses argus_get_upload_url to get a secure upload link]
User: "Compare the extraction results between these two invoices"
AI: [Uses argus_get_extraction on both documents and compares]
User: "Create a new dataset for processing purchase orders"
AI: [Uses argus_create_dataset with appropriate prompt and schema]
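Under the hood, an MCP client turns each of these interactions into a JSON-RPC 2.0 `tools/call` request sent to the `/mcp` endpoint. The sketch below shows roughly what such a message looks like. The tool name comes from the tools table, but the `arguments` keys (`document_id`, `question`) are assumptions, since the tools' input schemas are not listed here.

```python
import json


def build_tool_call(tool: str, arguments: dict, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 message for the MCP `tools/call` method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


message = build_tool_call(
    "argus_chat_with_document",
    # Hypothetical argument names; a client discovers the real schema via `tools/list`.
    {"document_id": "INV-2024-001", "question": "What is the total amount?"},
)
print(json.dumps(message, indent=2))
```

In practice the MCP client library handles session setup and transport, so hand-built messages like this are mainly useful for debugging.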
Advanced Configuration
Dataset Management
ARGUS uses datasets to define how different types of documents should be processed. A dataset contains:
- Model Prompt: Instructions telling the AI how to extract data from documents
- Output Schema: The target structure for extracted data (can be empty to let AI determine the structure)
- Processing Options: Settings for OCR, image analysis, summarization, and evaluation
When to create custom datasets: Create a new dataset when you have a specific document type that requires different extraction logic than the built-in datasets (e.g., contracts, medical reports, financial statements).
Built-in Datasets
- default-dataset/: Invoices, receipts, general business documents
- medical-dataset/: Medical forms, prescriptions, healthcare documents
Create Custom Datasets
Datasets are managed through the web frontend interface (deployed automatically with azd):
- Access the frontend (URL provided after azd deployment)
- Navigate to the Process Files tab
- Scroll to the "Add New Dataset" section
- Configure your dataset:
  - Enter a dataset name (e.g., "legal-contracts")
  - Define the model prompt with extraction instructions
  - Specify the output schema (JSON format) or leave it empty
  - Set processing options (OCR, images, evaluation)
- Click "Add New Dataset"; the dataset is saved directly to Cosmos DB
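The pieces a dataset combines can be pictured as a small record. Below is a sketch for drafting and sanity-checking one before entering it in the form. The field names (`model_prompt`, `output_schema`, `processing_options`) and the option keys are illustrative assumptions, not the format ARGUS stores in Cosmos DB.

```python
import json
from dataclasses import dataclass, field


@dataclass
class DatasetDraft:
    name: str
    model_prompt: str
    output_schema: str = ""  # JSON text; empty lets the model choose the structure
    processing_options: dict = field(default_factory=dict)

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the draft looks usable."""
        problems = []
        if not self.name.strip():
            problems.append("name is empty")
        if not self.model_prompt.strip():
            problems.append("model prompt is empty")
        if self.output_schema.strip():
            try:
                json.loads(self.output_schema)
            except json.JSONDecodeError as exc:
                problems.append(f"output schema is not valid JSON: {exc.msg}")
        return problems


draft = DatasetDraft(
    name="legal-contracts",
    model_prompt="Extract the parties, effective date, and termination terms.",
    output_schema='{"parties": [], "effective_date": "", "termination_terms": ""}',
    processing_options={"ocr": True, "images": False, "evaluation": True},
)
print(draft.validate())  # []
```

Checking the schema text locally catches malformed JSON before it is saved.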
OCR Provider Configuration
ARGUS supports two OCR providers for document text extraction:
- Azure Document Intelligence (Default): Microsoft's enterprise OCR service with advanced layout understanding
- Mistral Document AI: Mistral's document processing service with markdown-optimized output
Configure OCR Provider
Via Frontend (Recommended):
- Navigate to the Settings tab in the web interface
- Select the OCR Provider section
- Choose your provider:
  - Azure: Uses Azure Document Intelligence (automatically configured during deployment)
  - Mistral: Requires additional configuration (endpoint, API key, model name)
- For Mistral, enter:
  - Mistral Endpoint: Your Mistral Document AI API endpoint URL
  - Mistral API Key: Your Mistral API authentication key
  - Mistral Model: Model name (default: mistral-document-ai-2505)
- Click "Update OCR Provider" to apply changes
Via Environment Variables: Set the following environment variables in your deployment:
# Choose OCR provider
OCR_PROVIDER=mistral # or "azure" (default)
# Mistral-specific configuration (only needed if OCR_PROVIDER=mistral)
MISTRAL_DOC_AI_ENDPOINT=https://your-endpoint.services.ai.azure.com/providers/mistral/azure/ocr
MISTRAL_DOC_AI_KEY=your-mistral-api-key
MISTRAL_DOC_AI_MODEL=mistral-document-ai-2505
Update via Azure Portal:
- Navigate to Azure Portal → Container Apps → Your Backend App
- Go to Settings → Environment variables
- Add/update the variables listed above
- Restart the container app
Update via Azure CLI:
# Switch to Mistral
az containerapp update \
  --name <your-backend-app-name> \
  --resource-group <your-resource-group> \
  --set-env-vars \
    OCR_PROVIDER="mistral" \
    MISTRAL_DOC_AI_ENDPOINT="https://your-endpoint.../ocr" \
    MISTRAL_DOC_AI_KEY="your-api-key" \
    MISTRAL_DOC_AI_MODEL="mistral-document-ai-2505"
# Switch back to Azure
az containerapp update \
  --name <your-backend-app-name> \
  --resource-group <your-resource-group> \
  --set-env-vars OCR_PROVIDER="azure"
Note: OCR provider selection is configured at the solution level and applies to all document processing operations.
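Before switching providers, it can help to confirm that the required variables are present. This is a pre-flight sketch using the environment variable names from this section. It is not how ARGUS itself reads its configuration, and treating the model variable as optional (because a default is documented) is an assumption.

```python
import os
from typing import Mapping, Optional

# Variables checked when OCR_PROVIDER=mistral; the model name has a documented default.
REQUIRED_MISTRAL_VARS = ("MISTRAL_DOC_AI_ENDPOINT", "MISTRAL_DOC_AI_KEY")


def check_ocr_env(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return a list of configuration problems for the selected OCR provider."""
    env = os.environ if env is None else env
    provider = env.get("OCR_PROVIDER", "azure").strip().lower()
    if provider == "azure":
        return []
    if provider == "mistral":
        return [f"{name} is not set" for name in REQUIRED_MISTRAL_VARS if not env.get(name)]
    return [f"unknown OCR_PROVIDER: {provider!r}"]


print(check_ocr_env({"OCR_PROVIDER": "mistral", "MISTRAL_DOC_AI_KEY": "secret"}))
# ['MISTRAL_DOC_AI_ENDPOINT is not set']
```

Calling `check_ocr_env()` with no argument inspects the current process environment.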
The web frontend is automatically deployed with azd up and provides a user-friendly interface for document management.
Note: ARGUS ships with two frontends: a modern Next.js interface (default, deployed as ca-frontend) and a legacy Streamlit interface. The Next.js frontend is recommended for production use.
Frontend Features
| Tab | Functionality |
|---|---|
| Process Files | Drag-and-drop document upload with real-time processing status |
| Explore Data | Browse processed documents, search results, view extraction details |
| Settings | Configure datasets, adjust processing parameters, manage connections |
| Instructions | Interactive help, API documentation, and usage examples |
Development & Customization
Project Structure Deep Dive
ARGUS/
├── azure.yaml  # Azure Developer CLI configuration
├── README.md  # Project documentation & setup guide
├── LICENSE  # MIT license file
├── CONTRIBUTING.md  # Contribution guidelines
├── sample-invoice.pdf  # Sample document for testing
├── .env.template  # Environment variables template
├── .github/  # GitHub Actions & workflows
├── .devcontainer/  # Development container configuration
├── .vscode/  # VS Code settings & extensions
│
├── infra/  # Azure Infrastructure as Code
│   ├── main.bicep  # Orchestrator Bicep template (calls modules)
│   ├── main.parameters.json  # Infrastructure parameters & configuration
│   ├── main-containerapp.bicep  # Container App specific infrastructure
│   ├── main-containerapp.parameters.json  # Container App parameters
│   ├── abbreviations.json  # Azure resource naming abbreviations
│   └── modules/  # Modular Bicep components
│       ├── network.bicep  # VNet, subnets, private DNS zones
│       ├── identity.bicep  # User-assigned managed identity
│       ├── storage.bicep  # Storage account + private endpoint
│       ├── cosmos.bicep  # Cosmos DB + private endpoint
│       ├── ai-services.bicep  # Azure OpenAI + model deployment + PE
│       ├── document-intelligence.bicep  # Doc Intelligence + private endpoint
│       ├── key-vault.bicep  # Key Vault + private endpoint
│       ├── container-registry.bicep  # ACR for container images
│       ├── container-apps.bicep  # CAE + backend/frontend container apps
│       ├── role-assignments.bicep  # RBAC role assignments
│       ├── monitoring.bicep  # Application Insights + Log Analytics
│       └── event-processing.bicep  # Event Grid subscriptions
│
├── src/  # Core Application Source Code
│   ├── containerapp/  # FastAPI Backend Service
│   │   ├── main.py  # FastAPI app lifecycle & configuration
│   │   ├── api_routes.py  # HTTP endpoints & request handlers
│   │   ├── dependencies.py  # Azure client initialization & management
│   │   ├── models.py  # Pydantic data models & schemas
│   │   ├── blob_processing.py  # Document processing pipeline orchestration
│   │   ├── logic_app_manager.py  # Azure Logic Apps concurrency management
│   │   ├── Dockerfile  # Container image definition
│   │   ├── requirements.txt  # Python dependencies
│   │   ├── REFACTORING_SUMMARY.md  # Architecture documentation
│   │   │
│   │   ├── ai_ocr/  # AI Processing Engine
│   │   │   ├── process.py  # Main processing orchestration & workflow
│   │   │   ├── chains.py  # LangChain integration & AI workflows
│   │   │   ├── model.py  # Configuration models & data structures
│   │   │   ├── timeout.py  # Processing timeout management
│   │   │   │
│   │   │   └── azure/  # Azure Service Integrations
│   │   │       ├── config.py  # Environment & configuration management
│   │   │       ├── doc_intelligence.py  # Azure Document Intelligence OCR
│   │   │       ├── images.py  # PDF to image conversion utilities
│   │   │       └── openai_ops.py  # Azure OpenAI API operations
│   │   │
│   │   ├── example-datasets/  # Default Dataset Configurations
│   │   ├── datasets/  # Runtime dataset storage
│   │   └── evaluators/  # Data quality evaluation modules
│   │
│   └── evaluators/  # Evaluation Framework
│       ├── field_evaluator_base.py  # Abstract base class for evaluators
│       ├── fuzz_string_evaluator.py  # Fuzzy string matching evaluation
│       ├── cosine_similarity_string_evaluator.py  # Semantic similarity evaluation
│       ├── custom_string_evaluator.py  # Custom evaluation logic
│       ├── json_evaluator.py  # JSON structure validation
│       └── tests/  # Unit tests for evaluators
│
├── frontend-next/  # Next.js Web Interface
│   ├── src/app/  # App Router pages and API routes
│   │   ├── page.tsx  # Home page with document processing
│   │   ├── explore/  # Document browsing & analysis
│   │   ├── settings/  # Configuration management
│   │   ├── instructions/  # Help & documentation
│   │   ├── api-docs/  # API reference documentation
│   │   ├── mcp/  # MCP integration info
│   │   └── api/  # Backend proxy API routes
│   ├── src/components/  # Reusable React components
│   ├── src/lib/  # API client & utilities
│   ├── Dockerfile  # Frontend container definition
│   ├── package.json  # Node.js dependencies
│   └── next.config.js  # Next.js configuration
│
├── frontend/  # Legacy Streamlit Interface
│   ├── app.py  # Main Streamlit application entry point
│   ├── backend_client.py  # API client for backend communication
│   ├── process_files.py  # File upload & processing interface
│   ├── explore_data.py  # Document browsing & analysis UI
│   ├── document_chat.py  # Interactive document Q&A interface
│   ├── instructions.py  # Help & documentation tab
│   ├── settings.py  # Configuration management UI
│   ├── concurrency_management.py  # Performance tuning interface
│   ├── concurrency_settings.py  # Concurrency configuration utilities
│   ├── Dockerfile  # Frontend container definition
│   ├── requirements.txt  # Python dependencies for frontend
│   └── static/  # Static assets (logos, images)
│       └── logo.png  # ARGUS brand logo
│
├── demo/  # Sample Datasets & Examples
│   ├── default-dataset/  # General business documents dataset
│   │   ├── system_prompt.txt  # AI extraction instructions
│   │   ├── output_schema.json  # Expected data structure
│   │   ├── ground_truth.json  # Validation reference data
│   │   └── Invoice Sample.pdf  # Sample document for testing
│   │
│   └── medical-dataset/  # Healthcare documents dataset
│       ├── system_prompt.txt  # Medical-specific extraction rules
│       ├── output_schema.json  # Medical data structure
│       └── eyes_surgery_pre_1_4.pdf  # Sample medical document
│
├── notebooks/  # Analytics & Evaluation Tools
│   ├── evaluator.ipynb  # Comprehensive evaluation dashboard
│   ├── output.json  # Evaluation results & metrics
│   ├── requirements.txt  # Jupyter notebook dependencies
│   ├── README.md  # Notebook usage instructions
│   └── outputs/  # Historical evaluation results
│
└── docs/  # Documentation & Assets
    └── ArchitectureOverview.png  # System architecture diagram
Local Development Setup
# Setup development environment
cd src/containerapp
python -m venv venv
source venv/bin/activate # or `venv\Scripts\activate` on Windows
pip install -r requirements.txt
# Configure local environment
cp ../../.env.template .env
# Edit .env with your development credentials
# Run with hot reload
uvicorn main:app --reload --host 0.0.0.0 --port 8000
# Access API documentation
open http://localhost:8000/docs
Key Technologies & Libraries
| Category | Technologies |
|---|---|
| API Framework | FastAPI, Uvicorn, Pydantic |
| AI/ML | LangChain, OpenAI SDK, Azure AI SDK |
| Azure Services | Azure SDK (Blob, Cosmos, Document Intelligence, Key Vault) |
| Frontend | Next.js 15, React, Tailwind CSS, shadcn/ui |
| Document Processing | PyMuPDF, Pillow, PyPDF2 |
| Data & Analytics | Pandas, NumPy, Matplotlib |
| Security | Azure Identity, managed identities, Private Endpoints |
API Reference: Complete Documentation
Core Processing Endpoints
POST /api/process-blob - Process Document from Storage
Request:
{
  "blob_url": "https://storage.blob.core.windows.net/datasets/default-dataset/invoice.pdf",
  "dataset_name": "default-dataset",
  "priority": "normal",
  "webhook_url": "https://your-app.com/webhooks/argus",
  "metadata": {
    "source": "email_attachment",
    "user_id": "user123"
  }
}
Response:
{
  "status": "success",
  "job_id": "job_12345",
  "extraction_results": {
    "invoice_number": "INV-2024-001",
    "total_amount": "$1,250.00",
    "confidence_score": 0.94
  },
  "processing_time": "2.3s",
  "timestamp": "2024-01-15T10:30:00Z"
}
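In the process-blob request, `dataset_name` matches the path segment that follows the `datasets` container in `blob_url`. The sketch below derives one from the other so the two fields cannot drift apart. It assumes a `<container>/<dataset>/<file>` layout always holds, and `process_blob_payload` is a hypothetical helper.

```python
from urllib.parse import urlparse


def process_blob_payload(blob_url: str, priority: str = "normal") -> dict:
    """Build a /api/process-blob body, deriving dataset_name from the blob path."""
    parts = [p for p in urlparse(blob_url).path.split("/") if p]
    # Assumed layout: <container>/<dataset>/<file...>
    if len(parts) < 3:
        raise ValueError(f"expected <container>/<dataset>/<file> in {blob_url!r}")
    return {"blob_url": blob_url, "dataset_name": parts[1], "priority": priority}


payload = process_blob_payload(
    "https://storage.blob.core.windows.net/datasets/default-dataset/invoice.pdf"
)
print(payload["dataset_name"])  # default-dataset
```

Optional fields such as `webhook_url` and `metadata` can be added to the returned dictionary before sending.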
POST /api/process-file - Direct File Upload
Request (multipart/form-data):
file: [PDF/Image file]
dataset_name: "default-dataset"
priority: "high"
Response:
{
  "status": "success",
  "job_id": "job_12346",
  "blob_url": "https://storage.blob.core.windows.net/temp/uploaded_file.pdf",
  "extraction_results": {...},
  "processing_time": "1.8s"
}
POST /api/chat - Interactive Document Q&A
Request:
{
  "blob_url": "https://storage.blob.core.windows.net/datasets/contract.pdf",
  "question": "What are the payment terms and penalties for late payment?",
  "context": "focus on financial obligations",
  "temperature": 0.1
}
Response:
{
  "answer": "Payment terms are Net 30 days. Late payment penalty is 1.5% per month on outstanding balance...",
  "confidence": 0.91,
  "sources": [
    {"page": 2, "section": "Payment Terms"},
    {"page": 5, "section": "Default Provisions"}
  ],
  "processing_time": "1.2s"
}
Configuration Management
GET/POST /api/configuration - System Configuration
GET Response:
{
  "openai_settings": {
    "endpoint": "https://your-openai.openai.azure.com/",
    "model": "gpt-5.4",
    "temperature": 0.1,
    "max_tokens": 4000
  },
  "processing_settings": {
    "max_concurrent_jobs": 5,
    "timeout_seconds": 300,
    "retry_attempts": 3
  },
  "datasets": ["default-dataset", "medical-dataset", "financial-reports"]
}
POST Request:
{
  "openai_settings": {
    "temperature": 0.05,
    "max_tokens": 6000
  },
  "processing_settings": {
    "max_concurrent_jobs": 8
  }
}
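The POST body lists only the fields being changed. If the endpoint treats it as a partial update (an assumption; the merge semantics are not stated here), the resulting configuration would be a recursive merge of the two documents. The sketch below illustrates that with a hypothetical `merge_config` helper.

```python
import copy


def merge_config(current: dict, update: dict) -> dict:
    """Return a new dict with `update` merged into `current`, recursing into nested dicts."""
    merged = copy.deepcopy(current)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


current = {
    "openai_settings": {"model": "gpt-5.4", "temperature": 0.1, "max_tokens": 4000},
    "processing_settings": {"max_concurrent_jobs": 5, "timeout_seconds": 300, "retry_attempts": 3},
}
update = {
    "openai_settings": {"temperature": 0.05, "max_tokens": 6000},
    "processing_settings": {"max_concurrent_jobs": 8},
}
merged = merge_config(current, update)
print(merged["openai_settings"])
```

Fields absent from the update, such as `timeout_seconds`, keep their previous values, and the input dictionaries are left unmodified.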
Monitoring & Analytics
GET /api/metrics - Performance Metrics
Response:
{
  "period": "last_24h",
  "summary": {
    "total_documents": 1247,
    "successful_extractions": 1198,
    "failed_extractions": 49,
    "success_rate": 96.1,
    "avg_processing_time": "2.3s"
  },
  "performance": {
    "p50_processing_time": "1.8s",
    "p95_processing_time": "4.2s",
    "p99_processing_time": "8.1s"
  },
  "errors": {
    "ocr_failures": 12,
    "ai_timeouts": 8,
    "storage_issues": 3,
    "other": 26
  }
}
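The figures in the sample metrics response are internally consistent, which makes them a handy reference when validating a monitoring integration. A short sketch of the arithmetic follows; `success_rate` is a hypothetical helper, and rounding to one decimal place is an assumption based on the sample value.

```python
def success_rate(summary: dict) -> float:
    """Percentage of successful extractions, rounded to one decimal place."""
    total = summary["total_documents"]
    if total == 0:
        return 0.0
    return round(100 * summary["successful_extractions"] / total, 1)


summary = {"total_documents": 1247, "successful_extractions": 1198, "failed_extractions": 49}
errors = {"ocr_failures": 12, "ai_timeouts": 8, "storage_issues": 3, "other": 26}

print(success_rate(summary))                                  # 96.1
print(sum(errors.values()) == summary["failed_extractions"])  # True
```

The error categories sum to the 49 failed extractions, and successes plus failures equal the 1247 total documents.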
Contributing & Community
How to Contribute
We welcome contributions! Here's how to get started:
- Fork & Clone:
  git clone https://github.com/your-username/ARGUS.git
  cd ARGUS
- Create Feature Branch:
  git checkout -b feature/amazing-improvement
- Develop & Test:
  # Setup development environment
  ./scripts/setup-dev.sh
  # Run tests
  pytest tests/ -v
  # Lint code
  black src/ && flake8 src/
- Document Changes:
  # Update documentation
  # Add examples to README
  # Update API documentation
- Submit PR:
  git commit -m "feat: add amazing improvement"
  git push origin feature/amazing-improvement
  # Create pull request on GitHub
Contribution Guidelines
| Type | Guidelines |
|---|---|
| Bug Fixes | Include reproduction steps, expected vs actual behavior |
| New Features | Discuss in issues first, include tests and documentation |
| Documentation | Clear examples, practical use cases, proper formatting |
| Performance | Benchmark results, before/after comparisons |
Recognition
Contributors will be recognized in:
- Release notes for significant contributions
- Contributors section (with permission)
- Community showcase for innovative use cases
Support & Resources
Getting Help
| Resource | Description | Link |
|---|---|---|
| Documentation | Complete setup and usage guides | docs/ |
| Issue Tracker | Bug reports and feature requests | GitHub Issues |
| Discussions | Community Q&A and ideas | GitHub Discussions |
| Team Contact | Direct contact for enterprise needs | See team section below |
Additional Resources
- Azure Document Intelligence: Official Documentation
- Azure OpenAI: Service Documentation
- FastAPI: Framework Documentation
- LangChain: Integration Guides
Team
- Alberto Gallo
- Petteri Johansson
- Christin Pohl
- Konstantinos Mavrodis
License
This project is licensed under the MIT License - see the LICENSE file for details.
