Visara
Visara - Visual MCP Server for detailed UI prototype analysis
Installation
npx visara
Visara - Visual MCP Server
Visara is a Model Context Protocol (MCP) server for visual analysis. It analyzes images, extracts text, interprets scenes, and produces detailed descriptions for use in frontend development workflows.
Features
- MCP Protocol Compliance: Full compliance with the Model Context Protocol specification using the official @modelcontextprotocol/sdk
- Image Analysis: Analyze images and extract detailed information including objects, text, and scene understanding
- Frontend Development Support: Specialized prompts for UI/UX analysis and frontend development
- Local File Path Support: Automatically converts local file paths to base64 data URLs
- Production Ready: Includes Docker support, health checks, and caching
- Qwen-VL Plus Integration: Connects to Qwen-VL Plus multimodal API for advanced image analysis
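The local file path conversion listed above can be pictured roughly as follows. This is a minimal sketch, not Visara's actual implementation: the helper names (mimeFromPath, toDataUrl, fileToDataUrl) are hypothetical, and the extension-to-MIME mapping is assumed from the default ALLOWED_MIME_TYPES value.

```typescript
import { readFileSync } from "node:fs";
import { extname } from "node:path";

// Assumed mapping, based on the documented ALLOWED_MIME_TYPES default.
const MIME_BY_EXTENSION: Record<string, string> = {
  ".jpg": "image/jpeg",
  ".jpeg": "image/jpeg",
  ".png": "image/png",
  ".webp": "image/webp",
};

// Hypothetical helper: guess the MIME type from the file extension.
function mimeFromPath(filePath: string): string | undefined {
  return MIME_BY_EXTENSION[extname(filePath).toLowerCase()];
}

// Hypothetical helper: wrap raw bytes in a base64 data URL.
function toDataUrl(bytes: Uint8Array, mimeType: string): string {
  return `data:${mimeType};base64,${Buffer.from(bytes).toString("base64")}`;
}

// Hypothetical helper: read a local image and return it as a data URL.
function fileToDataUrl(filePath: string): string {
  const mimeType = mimeFromPath(filePath);
  if (mimeType === undefined) {
    throw new Error(`Unsupported image type: ${filePath}`);
  }
  return toDataUrl(readFileSync(filePath), mimeType);
}
```

Calling fileToDataUrl with a path to a PNG, JPEG, or WebP file returns a string of the form data:image/png;base64,... which can then be passed wherever an image URL is expected.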
Installation
git clone <repository-url>
cd visara
npm install
Usage
Development
# Build the project
npm run build
# Start the server
npm start
The server will be available at http://localhost:9451.
Docker
# Copy environment variables
cp .env.example .env
# Edit .env with your Qwen-VL API key
# Build and run with Docker Compose
docker-compose up --build
MCP Endpoints
- GET /health: Health check endpoint
- GET /tools: List available tools
- GET /resources: List available resources
- GET /prompts: List available prompts
- POST /: Main MCP endpoint for tool calls
- POST /images/upload: File upload endpoint for direct image processing
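A quick way to confirm the server is reachable is to request the health endpoint. The sketch below assumes the default address http://localhost:9451 and Node 18 or later (for the global fetch). The helper names are hypothetical, and only the HTTP status is checked because the response body format is not documented here.

```typescript
// Default address from the Usage section; adjust if PORT or HOST is changed.
const BASE_URL = "http://localhost:9451";

// The read-only endpoints listed above.
const ENDPOINTS = {
  health: "/health",
  tools: "/tools",
  resources: "/resources",
  prompts: "/prompts",
} as const;

type EndpointName = keyof typeof ENDPOINTS;

// Hypothetical helper: build the absolute URL for a named endpoint.
function endpointUrl(name: EndpointName, base: string = BASE_URL): string {
  return new URL(ENDPOINTS[name], base).toString();
}

// Hypothetical helper: resolves to true when GET /health returns a 2xx status.
async function checkHealth(base: string = BASE_URL): Promise<boolean> {
  const response = await fetch(endpointUrl("health", base));
  return response.ok;
}
```

With the server running, `await checkHealth()` should resolve to true. The same endpointUrl helper can be reused for /tools, /resources, and /prompts.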
Tools
analyze_image
Analyze an image and extract detailed information.
Parameters:
- imageUrl (string, required): URL of the image to analyze, or a local file path
- imageBase64 (string, optional): Base64 encoded image data
- prompt (string, optional): Custom prompt for image analysis
- model (string, optional): Model to use (default: qwen-vl-plus)
- temperature (number, optional): Temperature for generation (0.0-1.0)
- maxTokens (number, optional): Maximum tokens for response
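As an illustration of how these parameters fit together, the sketch below builds a request body for analyze_image. It assumes the main POST / endpoint accepts the standard MCP JSON-RPC 2.0 tools/call envelope. That is an assumption drawn from the MCP specification, not something this README confirms. buildAnalyzeImageCall is a hypothetical helper, and the image path in the example is made up.

```typescript
// Argument shape, taken from the parameter list above.
interface AnalyzeImageArgs {
  imageUrl: string;      // URL or local file path (required)
  imageBase64?: string;  // base64 encoded image data
  prompt?: string;       // custom analysis prompt
  model?: string;        // documented default: qwen-vl-plus
  temperature?: number;  // documented range: 0.0-1.0
  maxTokens?: number;    // maximum tokens for the response
}

// Hypothetical helper: wrap the arguments in an MCP-style tools/call request.
// The JSON-RPC envelope is assumed, not confirmed for this server.
function buildAnalyzeImageCall(args: AnalyzeImageArgs, id: number = 1) {
  const t = args.temperature;
  if (t !== undefined && (t < 0 || t > 1)) {
    throw new RangeError("temperature must be between 0.0 and 1.0");
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "analyze_image", arguments: args },
  };
}

// Example payload with a made-up local path.
const exampleRequest = buildAnalyzeImageCall({
  imageUrl: "./mockups/login.png",
  prompt: "List every UI component and its position.",
  temperature: 0.2,
});
```

The resulting object can be serialized with JSON.stringify and sent as the body of a POST request with a Content-Type of application/json.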
Prompts
- detailed_description: Get a detailed description of all visible elements in the image
- frontend_ui_analysis: Analyze UI/UX prototype and extract component structure, layout, and styling information
- react_component_generation: Generate React component structure based on UI prototype
- css_style_extraction: Extract detailed CSS styles, colors, typography, and spacing
- ui_component_inventory: Create inventory of all UI components and elements present in the prototype
- responsive_design_analysis: Analyze responsive design aspects and breakpoints
- object_detection: Identify and list all objects in the image with their positions
- text_extraction: Extract all visible text from the image
- scene_understanding: Provide high-level understanding of the scene context
Environment Variables
- QWEN_VL_API_KEY: Your Qwen-VL API key from https://dashscope.console.aliyun.com/apiKey
- QWEN_VL_API_BASE_URL: Qwen-VL API base URL (default: https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation)
- PORT: Server port (default: 9451)
- HOST: Server host (default: 0.0.0.0)
- CACHE_TTL: Cache time-to-live in seconds (default: 3600)
- MAX_FILE_SIZE: Maximum file size for uploads in bytes (default: 10485760 = 10MB)
- ALLOWED_MIME_TYPES: Allowed MIME types for file uploads (default: image/jpeg,image/png,image/webp)
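To show how these variables and their defaults combine, here is a minimal sketch of a configuration loader. It is not Visara's actual code: VisaraConfig, intOr, and loadConfig are hypothetical names, and treating QWEN_VL_API_KEY as mandatory is an assumption based on it having no documented default.

```typescript
// Hypothetical config shape mirroring the variables listed above.
interface VisaraConfig {
  apiKey: string;
  apiBaseUrl: string;
  port: number;
  host: string;
  cacheTtlSeconds: number;
  maxFileSizeBytes: number;
  allowedMimeTypes: string[];
}

// Parse an integer, falling back to the default when unset or invalid.
function intOr(value: string | undefined, fallback: number): number {
  const parsed = Number.parseInt(value ?? "", 10);
  return Number.isNaN(parsed) ? fallback : parsed;
}

// Hypothetical loader: applies the documented defaults to an env-like record.
function loadConfig(env: Record<string, string | undefined>): VisaraConfig {
  const apiKey = env["QWEN_VL_API_KEY"];
  if (!apiKey) {
    // Assumption: the key is required because no default is documented.
    throw new Error("QWEN_VL_API_KEY is not set");
  }
  return {
    apiKey,
    apiBaseUrl:
      env["QWEN_VL_API_BASE_URL"] ??
      "https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation",
    port: intOr(env["PORT"], 9451),
    host: env["HOST"] ?? "0.0.0.0",
    cacheTtlSeconds: intOr(env["CACHE_TTL"], 3600),
    maxFileSizeBytes: intOr(env["MAX_FILE_SIZE"], 10485760),
    allowedMimeTypes: (env["ALLOWED_MIME_TYPES"] ?? "image/jpeg,image/png,image/webp")
      .split(",")
      .map((entry) => entry.trim())
      .filter((entry) => entry.length > 0),
  };
}
```

In a Node process this would typically be called as loadConfig(process.env) after the .env file has been loaded.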
License
MIT
