AI Gov Content Curator
💡An end-to-end solution for aggregating, summarizing, and displaying news articles using an AI-powered backend, an automated CRON crawler & newsletter emailer, and a responsive Next.js frontend. It integrates technologies like Express.js, MongoDB, Puppeteer, and GenAI/LLMs to deliver up-to-date, cur
Installation
npx ai-gov-content-curatorAsk AI about AI Gov Content Curator
Powered by Claude · Grounded in docs
I know everything about AI Gov Content Curator. Ask me about installation, configuration, usage, or troubleshooting.
0/500
Reviews
Documentation
SynthoraAI - AI-Powered Article Content Curator
[!TIP] SynthoraAI - Synthesizing the world’s news & information through AI. 🚀✨
The SynthoraAI - AI-Powered Article Content Curator is a comprehensive, AI-powered system designed to aggregate, summarize, and present curated government-related articles. This monorepo, multi-services project is organized into seven main components:
- Backend: Provides a robust RESTful API to store and serve curated articles.
- Crawler: Automatically crawls and extracts article URLs and metadata from government homepages and public API sources.
- Frontend: Offers an intuitive Next.js-based user interface for government staff (and potentially the public) to browse and view article details.
- Newsletter: Sends daily updates to subscribers with the latest articles.
- Agentic AI Pipeline: Sophisticated multi-agent system for advanced content processing using LangGraph and LangChain, with a FastAPI HTTP bridge for cross-service integration.
- Chat Orchestration: TypeScript-based dual-provider (Anthropic + Google) chat layer with 16 specialized agents, intent routing, grounding validation, and cost tracking.
- MCP Server + ACP Layer: Model Context Protocol server exposing the agentic pipeline as 28 tools, 14 resources, and 7 prompts, plus a production-grade Agent Communication Protocol (ACP) for agent-to-agent messaging with Redis-backed multi-replica support.
High-Level Architecture: Each component is maintained in its own directory:
- Backend:
backend/ - Frontend:
frontend/ - Crawler:
crawler/ - Newsletter:
newsletters/ - Agentic AI Pipeline:
agentic_ai/- Documentation: agentic_ai/README.md
Additionally, the project includes a set of shell scripts and a Makefile for automating common tasks, as well as a CLI tool for managing crawling and article operations. It also fully supports AWS, Terraform, & Kubernetes deployments with blue/green deployments, canary releases, and rolling updates via GitHub Actions. Enterprise-grade observability is provided through Splunk (via OpenTelemetry Collector and Kinesis Data Firehose), Prometheus, and Grafana.
Table of Contents
- Overview
- Architecture
- Collaboration & Agile Workflow with Jira
- User Interface
- Backend
- Crawler
- Frontend
- Newsletter Subscription
- Agentic AI Pipeline
- Article Q&A Feature
- Sitewide AI Chat
- Intelligent Recommendation System
- Command Line Interface (CLI)
- Shell Scripts & Makefile
- Testing
- Continuous Integration / Continuous Deployment (CI/CD)
- Deployment
- License
- Contact
- Conclusion
Overview
The SynthoraAI - AI-Powered Article Content Curator system is designed to provide government staff with up-to-date, summarized content from trusted government sources and reputable news outlets. By leveraging AI (Google Generative AI / Gemini) for summarization and using modern web technologies, this solution ensures that users receive concise, accurate, and timely information.
[!IMPORTANT] Live Web App: https://synthoraai.vercel.app/.
Architecture
Below is a high-level diagram outlining the system architecture:
flowchart LR
subgraph Sources[Trusted Sources]
GovSites[Government Websites]
NewsAPIs[Public News APIs]
end
Sources -->|URLs & Metadata| Crawler
Crawler[[Crawler Service<br/>Next.js API Routes]] -->|Summaries & Topics| MongoDB[(MongoDB Atlas)]
Crawler -->|AI Prompts| GoogleAI[(Google Generative AI)]
MongoDB --> Backend[[Backend API<br/>Next.js + Express]]
Backend -->|Cached Responses| Redis[(Redis)]
Backend -->|REST/JSON| Frontend[[Frontend Web App<br/>Next.js + React]]
Backend -->|Digest Payload| Newsletter[[Newsletter Service<br/>Next.js Serverless]]
Newsletter -->|Emails| Subscribers[(Subscribers)]
Frontend -->|Browse & Manage| Users[(Staff & Public Users)]
Backend -.->|AI-Assisted Features| GoogleAI
Crawler -.->|Cron & Shell Scripts| Automation[Shell / Make CLI]
Backend -.->|Operational Scripts| Automation
subgraph Observability[Observability Stack]
OTEL[Splunk OTEL Collector]
Splunk[(Splunk)]
Prometheus[Prometheus]
Grafana[Grafana]
end
Backend -.->|OTLP + logs| OTEL
Frontend -.->|OTLP + logs| OTEL
Crawler -.->|logs| OTEL
Newsletter -.->|logs| OTEL
OTEL -->|HEC| Splunk
OTEL -->|remote write| Prometheus
Prometheus --> Grafana
To illustrate how articles move through the platform, the following sequence captures the typical daily refresh:
sequenceDiagram
participant Cron as Vercel Cron / Shell
participant Crawl as Crawler Service
participant Back as Backend API
participant AI as Google Generative AI
participant DB as MongoDB
participant UI as Frontend
participant Mail as Newsletter Service
Cron->>Crawl: Trigger fetchAndSummarize job
Crawl->>AI: Request summaries & topics
AI-->>Crawl: Return condensed insights
Crawl->>DB: Upsert articles & analytics
Back->>DB: Query latest curated data
UI->>Back: Fetch paginated articles
Mail->>Back: Request latest articles
Back-->>Mail: Deliver curated digest data
Mail->>Subscribers: Send newsletter email
This project consists of 4 primary microservices that interact with each other:
- Crawler:
- Crawls government homepages and public API sources to extract article URLs and metadata.
- Uses Axios and Cheerio for static HTML parsing, with Puppeteer as a fallback for dynamic content.
- Scheduled to run daily at 6:00 AM UTC via a serverless function on Vercel.
- Provides a basic landing page with information about the crawler and links to the backend and frontend.
- Deployed on Vercel at https://ai-content-curator-crawler.vercel.app.
- Backend:
- Built with Express.js and Next.js, serving as a RESTful API for the frontend.
- Integrates Google Generative AI (Gemini) for content summarization.
- Stores articles in MongoDB using Mongoose, with fields for URL, title, full content, summary, source information, and fetch timestamp.
- Scheduled serverless function to fetch and process new articles daily at 6:00 AM UTC.
- Deployed on Vercel at https://ai-content-curator-backend.vercel.app.
- Newsletter Service:
- Allows users to subscribe to a newsletter for daily updates on the latest articles.
- Integrated with Resend API for managing subscriptions and sending emails.
- By default, the newsletter is sent daily at 9:00 AM UTC, from the email address with the
sonnguyenhoang.comdomain. - Deployed on Vercel as a serverless function, at https://ai-content-curator-newsletters.vercel.app.
- Frontend:
- Built with Next.js and React, providing a modern, mobile-responsive UI for browsing and viewing curated articles.
- Fetches and displays a paginated list of articles from the backend API, with filtering options.
- Dedicated pages for full article content, AI-generated summaries, source information, and fetched timestamps.
- User authentication for marking articles as favorites, commenting, discussions, and upvoting/downvoting comments.
- Deployed on Vercel at https://synthoraai.vercel.app/.
Additionally, there are 3 advanced AI components:
- Agentic AI Pipeline:
- A sophisticated multi-agent system built with Python, leveraging LangGraph and LangChain for advanced content processing. This pipeline handles tasks such as article summarization, topic extraction, bias analysis, and more. It is designed to be modular and extensible, allowing for the addition of new agents and tools as needed. The pipeline is exposed via a FastAPI HTTP bridge for seamless integration with other services.
- Orchestration Layer:
- Python (
agentic_ai/orchestration/): Enterprise article processing orchestration atop the LangGraph pipeline — content supervision, cost budgeting, error recovery with circuit breaking, dead-letter queuing, and concurrent batch processing. Exposed via a FastAPI HTTP bridge (agentic_ai/api.pyon port 8100) for cross-service integration. - TypeScript (
orchestration/): Dual-provider LLM chat layer — unified Anthropic + Google client, intent-based agent routing (16 agents), grounding validation, prompt caching, cost tracking, and structured observability. Integrated into the backend via/api/orchestrator/*endpoints. Seeorchestration/README.md.
- Python (
- MCP Server + ACP Layer (
mcp_server/): Model Context Protocol server exposing the agentic pipeline over stdio — 28 tools, 14 resources, and 7 prompts for Claude Code and IDE integration. Includes ACP with Redis-backed agent registry and inter-agent message routing for production multi-replica deployments. Configured via.mcp.json.
This monorepo, microservices architecture is designed to be modular and scalable, allowing for easy updates and maintenance. Each component can be developed, tested, and deployed independently, ensuring a smooth development workflow.
[!NOTE] This architecture diagram above is a simplified representation and may not include all components or interactions. For a more detailed view, please refer to the individual service documentation.
Collaboration & Agile Workflow with Jira
Introduction
This project is currently using Jira for task management and collaboration. The project's Kanban board is organized into six main columns: Backlog, To Do, In Progress, Testing, Code Review, and Done. Each task is assigned to a specific team member and includes detailed descriptions, acceptance criteria, and due dates.
[!TIP] Before getting started, please make sure to read through the entire section to understand the workflow and how to effectively use Jira for this project.
Agile Approach
We are following an AGILE approach to development, which emphasizes iterative progress, collaboration, and flexibility. This allows us to adapt to changes quickly and deliver value to users in a timely manner.
Agile methodologies, such as Scrum or Kanban, are used to manage the development process, ensuring that tasks are prioritized, completed, and reviewed efficiently. This approach helps us maintain a high level of quality and responsiveness to user needs.
We chose Kanban for this project because it allows us to visualize the workflow, limit work in progress, and focus on delivering value incrementally. The Kanban board provides a clear overview of the project's status, making it easy to track progress and identify bottlenecks.
Why Jira?
We are currently pursuing an AGILE approach to development, and Jira is a great tool for managing tasks, tracking progress, and facilitating collaboration among team members. It allows us to create tasks, assign them to team members, set priorities, and track the status of each task in real-time.
Also, Jira provides a comprehensive set of features for managing projects, including sprint planning, backlog management, and reporting. It allows us to create user stories, epics, and tasks, and track their progress throughout the development cycle.
Project Board
You can view the project board and tasks at https://ai-content-curator.atlassian.net.
[!IMPORTANT] Login is required to access the board, and you can create an account if you don't have one.
If you need access to the project board, please contact me directly at sonnguyenhoang.com or via email at hoangson091104@gmail.com for an invitation. I believe that having access to the project board will help you understand the project's progress, tasks, and overall workflow better, and it will also allow you to contribute more effectively to the project.
Workflow
As soon as you receive a task (verbally or in writing) or come up with an idea:
- Create a new task in the Backlog column of the Jira board/list.
- Add a detailed description of the task, including acceptance criteria and due date.
- Assign the task to yourself or another team member.
- Create a new branch in the GitHub repository for the task. Make sure that you use the Jira issue key in the branch name (e.g.,
AICC-123).- Mark the Jira task as To Do/In Progress.
- This is very important for Jira to recognize the branch and link it to the task!
- Work on the task in your local development environment, committing changes to the branch as you go.
- Once the task is complete, push the branch to the remote repository and create a pull request (PR) in GitHub.
- Name the PR with a descriptive title that includes the Jira issue key (e.g.,
feat(ui): implement new feature [AICC-123). - Before you commit your changes, make sure to run any applicable tests and ensure that the code is properly formatted and linted.
- Note: If your changes do not involve any AI functionalities (e.g. chatbot, crawler), then set
GOOGLE_AI_API_KEY=dummyin yourbackend/.envfile to bypass the git hooks that check for AI-related environment variables. This will allow you to commit and push your changes without needing access to the actual API keys, while still maintaining the integrity of the development workflow.
- Name the PR with a descriptive title that includes the Jira issue key (e.g.,
- Assign the PR to the appropriate team member for code review.
- Move the Jira task to the Code Review column.
- Make any necessary changes based on feedback from the code review.
- Once the PR is approved, merge it into the main branch.
- Make sure to resolve any merge conflicts before merging.
- After merging, move the Jira task to the Done column.
This workflow ensures that tasks are tracked, code is reviewed, and the project progresses smoothly. It also allows for easy collaboration and communication among team members, and for our pursuit of an AGILE approach to development.
Confluence
We are also using Confluence for documentation and knowledge sharing. The Confluence space is organized into different sections, including project overview, architecture, API documentation, and user guides.
You can view the Confluence space at https://ai-content-curator.atlassian.net/wiki/spaces/ACC/.
To gain access to the Confluence space, please contact me directly at sonnguyenhoang.com or via email at hoangson091104@gmail.com.
User Interface
The user interface is built with Next.js and React, providing a modern, mobile-responsive experience. Below are some screenshots of the application (some screenshots may be outdated and not reflect the latest UI - visit https://synthoraai.vercel.app/ for the latest version):
0. Landing Page
1. Home Page
2. Article Details Page
2.1. Article Q&A Feature
2.2. Related Articles (Vector Similarity Search)
2.3. AI-Powered Article Bias Analysis
2.4. Article Ratings
2.5. Article Comments
3. Favorite Articles Page (Only for Authenticated Users)
4. Newsletter Subscription Page
5. User Authentication
6. User Registration
7. Reset Password
8. Search Results
9. App-wide Translate Feature
10. 404 Not Found Page
11. Daily Newsletter Email Example
12. Passkey Management Page
more pages and features are available in the app - we encourage you to explore!
Backend
The Backend is responsible for storing articles and serving them via RESTful endpoints. It integrates AI summarization, MongoDB for storage, and runs within a Next.js environment using Express.js for API routes.
Features
- Data Ingestion:
Receives article URLs and data from the crawler and external API sources. - Content Summarization:
Uses Google Generative AI (Gemini) to generate concise summaries. - Storage:
Persists articles in MongoDB using Mongoose with fields for URL, title, full content, summary, source information, and fetch timestamp. - API Endpoints:
GET /api/articles– Retrieves a paginated list of articles (supports filtering via query parameters such aspage,limit, andsource).GET /api/articles/:id– Retrieves detailed information for a specific article.
- Scheduled Updates:
A serverless function (triggered twice daily at 6:00 AM and 6:00 PM UTC) fetches and processes new articles, so that the system remains up-to-date! - User Authentication:
Supports email + password registration, login, and JWT-based authentication, plus passwordless passkey (WebAuthn / FIDO2) sign-in. Users can sign up with a passkey only, add additional passkeys to an existing account, and manage them (list / rename / delete) at/account/passkeys. Both flows issue the same JWT, so the rest of the system is unchanged. - Favorite Articles:
Authenticated users can mark articles as favorites for quick access. - Newsletter Subscription:
Users can subscribe to a newsletter for daily updates on the latest articles. This feature is integrated with a third-party service (Resend) for managing subscriptions and sending emails. - Bias Detection & Analysis:
The app also includes a bias detection & analysis feature, powered by Google Generative AI, to analyze articles for potential bias and provide deep article insights to users. - User Ratings:
Users can rate articles, allowing for feedback and quality assessment. - Dark Mode:
The frontend offers a dark mode option for improved readability and user experience. - Discussions & Comments:
Users can also discuss and comment on articles, fostering engagement and collaboration. - Upvote/Downvote Comments:
Users can upvote or downvote comments to highlight valuable contributions.
Backend Swagger API Documentation
Prerequisites & Installation (Backend)
[!CAUTION] Before proceeding, run
npm installonce in the root directory of the monorepo to install the necessary dependencies for managing the project!
-
Prerequisites:
- Node.js (v18 or later)
- MongoDB (local or cloud)
- Vercel CLI (for deployment)
-
Clone the Repository:
git clone https://github.com/hoangsonww/AI-Gov-Content-Curator.git cd AI-Gov-Content-Curator/backend # then, fill in the environment variables as described in .env.example file # approved team members, please contact me for the actual .env file and API keys! -
Install Dependencies (inside
backend/):npm install
Configuration (Backend)
Create a .env file in the ROOT directory with the following (this will be shared across all components in this monorepo):
MONGODB_URI=<your_mongodb_uri>
GOOGLE_AI_API_KEY=<your_google_ai_api_key>
GOOGLE_AI_API_KEY1=<your_google_ai_api_key1_optional>
GOOGLE_AI_API_KEY2=<your_google_ai_api_key2_optional>
GOOGLE_AI_API_KEY3=<your_google_ai_api_key3_optional>
AI_INSTRUCTIONS=Summarize the articles concisely and naturally (change if needed)
NEWS_API_KEY=<your_news_api_key>
NEWS_API_KEY1=<your_news_api_key1>
PORT=3000
CRAWL_URLS=https://www.state.gov/press-releases/,https://www.bbc.com/news,https://www.nytimes.com/,https://www.dallasnews.com/news/,https://www.houstonchronicle.com/,,https://www.whitehouse.gov/briefing-room/,https://www.congress.gov/,https://www.statesman.com/news/politics-elections/
AICC_API_URL=https://ai-content-curator-backend.vercel.app/
RESEND_API_KEY=<your_resend_api_key>
RESEND_FROM="AI Curator <your_email>"
UNSUBSCRIBE_BASE_URL=<your_unsubscribe_base_url>
# Auth
JWT_SECRET=<long_random_string>
# WebAuthn / Passkey configuration
# RP_ID MUST equal the FRONTEND apex domain (eTLD+1), not the backend's host.
RP_ID=localhost
RP_NAME=SynthoraAI
RP_ORIGIN=http://localhost:3000,https://synthoraai.vercel.app
Refer to the .env.example file for more details on each variable.
Running Locally (Backend)
Start the development server:
npm run dev
Access endpoints:
GET http://localhost:3000/api/articlesGET http://localhost:3000/api/articles/:id
Deployment on Vercel (Backend)
-
Configure Environment Variables in your Vercel project settings.
-
Create or update the
vercel.jsonin the root of the backend directory:{ "version": 2, "builds": [ { "src": "package.json", "use": "@vercel/next" } ], "crons": [ { "path": "/api/scheduled/fetchAndSummarize", "schedule": "0 6,18 * * *" } ] } -
Deploy with:
vercel --prod
Crawler
The Crawler automatically retrieves article links and metadata from government homepages and public API sources. It uses Axios and Cheerio for static HTML parsing and falls back to Puppeteer when necessary.
Features
-
Article Extraction:
Crawls specified URLs to extract article links and metadata. -
Error Handling & Resilience:
Implements a retry mechanism and fallback to Puppeteer for dynamic content fetching when encountering issues (e.g., HTTP 403, ECONNRESET). -
Scheduling:
Deployed as a serverless function on Vercel, scheduled via cron (runs daily at 6:00 AM UTC). -
Next.js UI:
Provides a basic landing page with information about the crawler and links to the backend and frontend.
Prerequisites & Installation (Crawler)
- Prerequisites:
- Node.js (v18 or later)
- NPM (or Yarn)
- Vercel CLI (for deployment)
-
Clone the Repository:
git clone https://github.com/hoangsonww/AI-Gov-Content-Curator.git cd AI-Gov-Content-Curator/crawler -
Install Dependencies:
npm install
Running Locally (Crawler)
Start the Next.js development server to test both the UI and crawler function:
npm run dev
- UI: http://localhost:3000/
- Crawler Function: http://localhost:3000/api/scheduled/fetchAndSummarize
Alternatively, run the crawler directly:
npx ts-node schedule/fetchAndSummarize.ts
# or
npm run crawl
Also, there are 2 more scripts for the crawler:
-
Fetch and crawl all past articles (will run indefinitely, unless you stop it):
npx ts-node scripts/fetchPastArticles.ts # or npm run fetch:past -
Fetch and crawl all newest/latest articles:
npx ts-node scripts/fetchLatestArticles.ts # or npm run fetch:latest
Run these locally to test the crawler functionality. You can also run them in a Docker container if you prefer.
Deployment on Vercel (Crawler)
-
Set Environment Variables in the Vercel dashboard.
-
Create or update the
vercel.jsonin thecrawlerdirectory:{ "version": 2, "builds": [ { "src": "package.json", "use": "@vercel/next" } ], "crons": [ { "path": "/api/scheduled/fetchAndSummarize", "schedule": "0 6 * * *" } ] } -
Deploy with:
vercel --prod
Frontend
The Frontend is built with Next.js and React, providing a modern, mobile-responsive UI for browsing and viewing curated articles.
Features
-
Article Listing:
Fetches and displays a paginated list of articles from the backend API. Supports filtering by source. -
Article Detail View:
Dedicated pages display full article content, AI-generated summaries, source information, and fetched timestamps. -
Responsive Design:
The UI is optimized for both desktop and mobile devices. -
Authentication:
Users can register and log in with email + password, or sign in with a passkey (Face ID, Touch ID, Windows Hello, or a hardware key) for a faster, passwordless experience. New accounts may be created passkey-only. The frontend stores the issued JWT inlocalStorageand sends it on theAuthorizationheader for protected routes. -
Chatbot Q&A Feature: Users can ask questions about specific articles, powered by RAG (Retrieval-Augmented Generation) using Google Generative AI. The sitewide chat supports inline message editing with conversation branching—edit any prior message and the conversation forks from that point with fresh AI responses.
-
Related Articles (Vector Similarity Search):
Displays related articles based on vector similarity search using embeddings generated by Google Generative AI. -
Article Bias Analysis:
Analyzes articles for potential bias using Google Generative AI and provides insights to users. -
Article Ratings:
Users can rate articles, allowing for feedback and quality assessment. -
Discussions & Comments:
Users can discuss and comment on articles, fostering engagement and collaboration. -
Upvote/Downvote Comments:
Users can upvote or downvote comments to highlight valuable contributions. -
Search Functionality:
Users can search for articles using keywords, with results fetched from the backend API. -
Translation Feature:
Articles can be translated into multiple languages using Google Translate API. -
Favorite Articles:
Authenticated users can mark articles as favorites for quick access. This is stored in the backend and displayed in the frontend. -
Newsletter Subscription:
Users can subscribe to a newsletter for daily updates on the latest articles. This feature is integrated with a third-party service (Resend) for managing subscriptions and sending emails. -
Dark Mode:
The frontend offers a dark mode option for improved readability and user experience. -
Additional UI Components:
Includes components like HeroSlider, LatestArticles, ThemeToggle, and more for an enhanced user experience. -
Static Site Generation (SSG):
The frontend uses Next.js's SSG capabilities to pre-render pages for improved performance and SEO.
Prerequisites & Installation (Frontend)
- Prerequisites:
- Node.js (v18 or later)
- NPM or Yarn
-
Clone the Repository:
git clone https://github.com/hoangsonww/AI-Gov-Content-Curator.git cd AI-Gov-Content-Curator/frontend -
Install Dependencies:
npm installor
yarn
Configuration (Frontend)
(Optional) Create a .env.local file in the frontend directory to configure the API URL:
NEXT_PUBLIC_API_URL=https://your-backend.example.com
Running Locally (Frontend)
Start the development server:
npm run dev
Access the application at http://localhost:3000.
Deployment on Vercel (Frontend)
-
Configure Environment Variables in the Vercel dashboard (e.g.,
NEXT_PUBLIC_API_URL). -
Vercel automatically detects the Next.js project; if needed, customize with a
vercel.json. -
Deploy with:
vercel --prod
Alternatively, you can deploy directly from the Vercel dashboard.
Newsletter Subscription
The app also includes a newsletter subscription feature, allowing users to sign up for updates. This is integrated with a third-party service (Resend) for managing subscriptions.
Features (Newsletter)
- Subscription Form:
Users can enter their email addresses to subscribe to the newsletter. - Unsubscribe Option:
Users can unsubscribe from the newsletter at any time. - Daily Updates:
Subscribers receive daily updates with the latest articles. Only the latest articles are sent to subscribers, ensuring they receive the most relevant information.
Prerequisites & Installation (Newsletter)
[!IMPORTANT] This assumes that you have already set up the backend and frontend as described above.
- Prerequisites: Sign up for a Resend account and obtain your API key.
- Go to Domain Settings: In your Resend dashboard, navigate to the "Domains" section and add your domain (you'll have to have purchased a domain name that you have access to). Render will ask that you verify your domain ownership by adding a TXT record to your DNS settings, as well as adding an MX record to your DNS settings, and more. Follow the instructions provided by Resend to complete this step.
- Configure Environment Variables: Create a
.envfile in thenewslettersdirectory with the following variables:RESEND_API_KEY: Your Resend API key.RESEND_DOMAIN: The domain you added in the Resend dashboard.
- Deploy the CRON Job: Simply run
vercel --prodin thenewslettersdirectory to deploy the CRON job that sends daily updates to subscribers. - Configure the CRON Job: In your Vercel dashboard, navigate to the "Functions" section and set up a CRON job that runs daily at 9:00 AM UTC. This job will send the latest articles to subscribers.
- That's it! Your newsletter subscription feature is now set up and ready to go. Users can subscribe to receive daily updates with the latest articles.
Note
[!IMPORTANT]
- The newsletter subscription feature is designed to be simple and effective. It allows users to stay informed about the latest articles without overwhelming them with too many emails.
- The subscription form is integrated into the frontend, and users can easily sign up or unsubscribe at any time.
- The daily updates are sent via email, ensuring that subscribers receive the most relevant information without having to check the app constantly.
- The newsletter feature is built using the Resend API, which provides a reliable and scalable solution for managing subscriptions and sending emails.
- Sometimes, the emails may end up in the spam folder, so users should check their spam folder if they don't see the emails in their inbox.
Agentic AI Pipeline
The Agentic AI Pipeline is a sophisticated, production-ready multi-agent system built with LangGraph and LangChain that processes articles through a series of specialized AI agents. This advanced system provides enhanced content analysis, summarization, classification, sentiment analysis, and quality assurance beyond the basic AI features.
Overview
The Agentic AI Pipeline implements an assembly line architecture where each specialized agent performs a specific task in sequence. Built on LangGraph's state machine framework, the pipeline ensures reliable, scalable, and sophisticated multi-agent orchestration.
graph LR
A[Article Input] --> B[Content Analyzer]
B --> C[Summarizer]
C --> D[Classifier]
D --> E[Sentiment Analyzer]
E --> F[Quality Checker]
F --> G{Quality Pass?}
G -->|Yes| H[Output]
G -->|No & Retry| B
G -->|Max Retries| H
Key Features
-
🤖 Multi-Agent Architecture: Five specialized agents working in concert
- Content Analyzer: Extracts structure, entities, and key information
- Summarizer: Generates concise, accurate summaries
- Classifier: Categorizes content into 15+ topic categories
- Sentiment Analyzer: Analyzes emotional tone and objectivity
- Quality Checker: Validates outputs with automatic retry logic
-
🔄 Assembly Line Processing: LangGraph-based state machine with conditional routing
-
🔌 MCP Server: Model Context Protocol server for standardized AI interactions
-
🛰️ ACP Layer: Agent Communication Protocol for inter-agent messaging (
register -> heartbeat -> send -> inbox -> ack) -
📬 Durable Agent Comms: Redis-backed ACP store for multi-replica deployments with TTL, retention, and liveness pruning
-
🧪 Operational Preflight: Live ACP roundtrip checks are part of
make mcp-preflight -
☁️ Cloud-Ready: Production configs for AWS Lambda and Azure Functions
-
📊 Quality Assurance: Built-in quality checking with automatic retry mechanisms
-
⚡ Production-Ready: Comprehensive logging, monitoring, and error handling
-
🔐 Secure: Secrets management via AWS Secrets Manager and Azure Key Vault
Architecture
The pipeline uses an assembly line architecture where articles flow through multiple specialized agents:
- Intake Node: Validates input and initializes state
- Content Analysis: Extracts structure, entities, dates, and style
- Summarization: Generates 150-200 word summaries
- Classification: Categorizes into relevant topics
- Sentiment Analysis: Analyzes tone, objectivity, urgency, and controversy
- Quality Check: Validates outputs and determines if retry is needed
- Output Node: Returns final results
Technology Stack:
- LangChain: Framework for LLM-powered applications
- LangGraph: State machine orchestration for multi-agent systems
- Python 3.11+: Modern Python with async/await support
- Redis: State management and caching
- MongoDB: Data persistence
- Prometheus: Metrics and monitoring
- Splunk + OpenTelemetry: Centralized log aggregation, distributed tracing, and enterprise observability via OTEL Collector
- MCP Python SDK (FastMCP): Model Context Protocol server implementation
- ACP Store Backends: Redis (production) + in-memory fallback (non-production)
MCP + ACP Surface:
- MCP: 28 tools, 14 resources, 7 prompts
- ACP Tools:
acp_register_agent,acp_unregister_agent,acp_heartbeat,acp_send_message,acp_fetch_inbox,acp_acknowledge_message,acp_list_agents,acp_get_message - ACP Resources:
acp://agents,acp://stats,acp://messages/recent
Cloud Deployment:
- AWS: Lambda, API Gateway, S3, SQS, Secrets Manager, CloudWatch, Kinesis Firehose → Splunk HEC
- Azure: Functions, Storage Queues, Blob Storage, Key Vault, Application Insights
Beads Subarchitecture:
- Beads are the atomic unit of work in the agentic architecture. Each bead is a discrete, well-scoped task that an agent (human or AI) can claim, execute, and verify independently
- Beads follow a
PENDING → CLAIMED → IN_PROGRESS → REVIEW → DONElifecycle with aBLOCKEDescape state, and use file-level reservations (.beads/.status.json) to prevent concurrent-edit conflicts across agents - Service-scoped IDs (
ORCH-001,CRAWL-005,PIPE-012, etc.) tie every bead to the service it changes, enabling per-service tracking and parallelism - A compound learning loop records structured session logs in
.agent-sessions/after each completed bead, so future agents benefit from accumulated experience - See .beads/README.md for the full specification and .agent-sessions/README.md for session log format
Getting Started
Quick Start:
# Navigate to the agentic_ai directory
cd agentic_ai
# Install dependencies
pip install -r requirements.txt
# Configure environment
cp .env.example .env
# Edit .env with your API keys
# For production ACP, set:
# ACP_ENABLED=true
# ACP_BACKEND=redis
# REDIS_HOST=<redis-host>
# REDIS_PORT=6379
# Run the MCP server
PYTHONPATH=.. python -m mcp_server
# Run production-readiness preflight (includes live ACP checks)
make mcp-preflight
Use Programmatically:
from agentic_ai.core.pipeline import AgenticPipeline
import asyncio
# Initialize pipeline
pipeline = AgenticPipeline()
# Process an article
result = asyncio.run(pipeline.process_article({
"id": "article-123",
"content": "Your article content...",
"url": "https://example.com/article",
"source": "government"
}))
print(f"Summary: {result['summary']}")
print(f"Topics: {result['topics']}")
print(f"Quality Score: {result['quality_score']}")
Deploy to Cloud:
# Deploy to AWS
cd agentic_ai/aws
./deploy.sh production
# Deploy to Azure
cd agentic_ai/azure
./deploy.sh production
Docker Deployment:
cd agentic_ai
docker-compose up -d
Detailed Documentation
For comprehensive documentation including:
- Detailed agent specifications
- MCP server API reference
- Cloud deployment guides
- Performance optimization tips
- Monitoring and observability setup
- Integration examples
Please see the complete documentation: agentic_ai/README.md
For full MCP + ACP protocol and runtime diagrams, see MCP-ACP.md.
[!TIP] For a comprehensive reference of all AI/ML components — the three orchestration layers, 21 agents, LLM providers, cost controls, grounding rules, and 15+ Mermaid architecture diagrams — see AI_ML.md.
Article Q&A Feature
The article Q&A feature allows users to ask questions about specific articles and receive AI-generated answers. This feature is integrated into the frontend and backend, providing a seamless experience for users.
The AI will have access to the article content and will use RAG to generate answers based on the information provided in the article. This feature is designed to enhance user engagement and provide quick answers to common questions.
Features (Article Q&A)
The chatbot is given an identity (ArticleIQ) and is designed to answer questions related to the articles. The AI will have access to the article content and will use RAG (Retrieval-Augmented Generation) to generate answers based on the information provided in the article:
- Ask Questions: Users can ask questions about specific articles directly from the article detail page.
- AI-Generated Answers: The AI will generate answers based on the content of the article, providing users with relevant information.
- User-Friendly Interface: The Q&A feature is integrated into the article detail page, making it easy for users to ask questions and receive answers without navigating away from the content.
- RAG Integration: The AI will use RAG (Retrieval-Augmented Generation) to provide accurate and contextually relevant answers based on the article content.
- Real-Time Responses: Users will receive answers in real-time, enhancing the overall user experience and engagement with the content.
In addition to the site-wide chatbot, article-specific chatbots are also available on each article detail page. These chatbots are tailored to the content of the specific article, allowing users to ask questions and receive answers that are directly relevant to the article they are reading.
Prerequisites & Installation (Article Q&A)
[!TIP] This feature is integrated into the existing backend and frontend, so you don't need to set up anything separately.
- Prerequisites: Ensure that you have the backend and frontend set up as described above.
- Install Dependencies: Make sure you have the necessary dependencies installed in both the backend and frontend directories. You can run
npm installin the root directory to install dependencies for all components. - Configure Environment Variables: Ensure that you have the necessary environment variables set up in your
.envfile. This includes the Google AI API key and any other required variables. - Deploy the Backend: If you haven't already, deploy the backend to Vercel using
vercel --prodin the backend directory. Or, just run locally withnpm run dev. - Deploy the Frontend: If you haven't already, deploy the frontend to Vercel using
vercel --prodin the frontend directory. Or, just run locally withnpm run dev. - Test the Feature: Once everything is set up, you can test the article Q&A feature by navigating to an article detail page of an article and asking questions. The AI will generate answers based on the content of the article.
- That's it! The article Q&A feature is now integrated into the existing system, providing users with an enhanced experience and quick access to information.
Using the Article Q&A Feature
To use the article Q&A feature, simply navigate to the article detail page of an article and look for the Q&A section. You can ask questions related to the article, and the AI will generate answers based on the content provided.
Feel free to ask any questions related to the article, and the AI will do its best to provide accurate and relevant answers. This feature is designed to enhance user engagement and provide quick access to information without having to read through the entire article.
Sitewide AI Chat
The sitewide chat lets users ask open-ended questions across the full corpus—not just a single article—while keeping every claim cited. The system provides two chat paths: a direct Gemini-powered RAG pipeline and an orchestrated multi-agent pipeline.
Direct RAG Chat (/api/chat/sitewide)
- RAG over the whole library: The backend converts queries to gemini-embedding-001 vectors, searches Pinecone for top matches, and builds a context block with
[Source N]slots. - Streaming Gemini replies: Gemini 2.0 Flash / Flash Lite streams text via Server-Sent Events (SSE) with automatic API-key/model failover and history compaction to stay within token budgets.
- Inline citations & warnings: Responses carry citation metadata plus hallucination checks (missing citations, invalid refs, overconfident claims, uncited numbers). The frontend renders clickable superscripts and yellow warnings if issues are detected.
Orchestrated Multi-Agent Chat (/api/orchestrator/chat)
- 16 specialized agents: 8 Anthropic (Claude) primary + 8 Google (Gemini) fallback agents covering article search, Q&A, topic exploration, trend analysis, bias detection, clarification, and quality review.
- Intent-based routing: LLM-based intent classification routes queries to the best-fit agent, with keyword heuristic fallback.
- Dual-provider failover: Anthropic primary with automatic Google failover after retry exhaustion (3 attempts, exponential backoff with jitter).
- Grounding validation: 10 canonical rules applied post-generation to detect hallucinations, missing citations, and unsupported claims.
- Cost tracking: Real-time daily budget enforcement with per-model cost breakdown.
- Streaming support: SSE streaming via
/api/orchestrator/chat/stream.
Rich Client UX
frontend/pages/ai_chat.tsxprovides multiple conversations, local storage persistence, typing indicators, and interactive source cards.- Message editing with conversation branching: Users can click the pencil icon on any sent message to edit it inline. Submitting the edit truncates the conversation at that point (removing all subsequent messages) and re-sends the edited message with only the preceding history, effectively branching the conversation from the edit point.
How It Works
sequenceDiagram
participant User
participant UI as Frontend (ai_chat.tsx)
participant API as Backend /api/chat/sitewide
participant Vec as Pinecone (ai-gov-articles)
participant LLM as Gemini 2.0 Flash
User->>UI: Ask question
UI->>API: POST userMessage + trimmed history
API->>Vec: Embed query (gemini-embedding-001) & semantic search
Vec-->>API: Top K articles + metadata
API->>LLM: Stream request with context + citations + guardrails
LLM-->>API: SSE chunks (text)
API-->>UI: SSE events (status/context/citations/chunk/warnings/done)
UI-->>User: Live updates, clickable citations, warnings if any
Streaming Contract
- Endpoint:
POST /api/chat/sitewide - Events:
status,context,citations,chunk,warnings,done - Payloads: JSON per event (e.g.,
{"message":"Generating response..."},{"sources":[{number,title,url,score}]},{"text":"partial reply"}) - Frontend handling: Streams append text into the active message bubble; citations hydrate source cards; warnings show a yellow banner above the AI reply.
Intelligent Recommendation System
SynthoraAI employs a sophisticated, multi-layered recommendation engine to deliver personalized and contextually relevant content to users.
Related Articles (Vector Similarity Search)
The Related Articles feature leverages Pinecone, a high-performance vector database, to find semantically similar articles:
flowchart LR
Article[Current Article] --> Embed[Generate Embedding]
Embed --> Pinecone[(Pinecone Index)]
UI[Article Page] --> API[/api/articles/:id/similar/]
API --> Pinecone
Pinecone --> API
API --> UI
- Embedding Generation: Article content is transformed into high-dimensional vector embeddings using state-of-the-art NLP models
- Vector Storage: Embeddings are indexed in Pinecone for lightning-fast similarity searches
- Semantic Matching: When viewing an article, the system queries Pinecone to retrieve the most semantically similar articles based on cosine similarity
- Real-time Results: Users see up to 6 related articles displayed in an interactive carousel on each article detail page
This approach goes beyond simple keyword matching, understanding the deeper semantic relationships between articles to surface truly relevant content.
Recommended Articles (Client-Side ML)
The Recommended Articles section uses a lightweight machine learning model running directly in the browser:
- Privacy-First: All computation happens client-side—no user data is sent to external servers
- Behavioral Analysis: The model analyzes user interactions (views, time spent, favorites, ratings) to build a personalized preference profile
- Real-Time Adaptation: Recommendations dynamically update based on current session behavior and historical patterns
- Hybrid Approach: Combines collaborative filtering with content-based signals for optimal accuracy
- Performance Optimized: Uses WebAssembly and quantized model weights to ensure fast inference without impacting page load times
Together, these systems ensure users discover relevant content through both semantic similarity and personalized behavioral patterns.
Command Line Interface (CLI)
The aicc command gives you a single entrypoint to manage your entire monorepo—frontend, backend, crawler—and perform content‐curation tasks.
Installation
From the project root:
# Install dependencies
npm install
# Link the CLI so `aicc` is on your PATH
npm link
[!TIP] This sets up a global symlink named
aiccpointing at./bin/aicc.js.
Usage
Run aicc with no arguments to display help:
aicc
Workspace Management
| Command | Description |
|---|---|
aicc dev | Start all services in development mode |
aicc dev <service> | Start one service (frontend / backend / crawler) in dev |
aicc build | Build all services for production |
aicc build <service> | Build one service |
aicc start | Start all services in production mode |
aicc start <service> | Start one service |
aicc lint | Run Prettier across all packages |
aicc format | Alias for aicc lint |
Examples:
# Run frontend + backend + crawler in parallel
aicc dev
# Build only the backend
aicc build backend
# Start crawler in prod mode
aicc start crawler
# Lint & format everything
aicc lint
Crawling
Kick off your scheduled crawler (schedule/fetchAndSummarize.ts) in the crawler package:
aicc crawl
This will cd crawler and run npm run crawl under the hood.
Article CRUD
Interact with your backend’s /api/articles endpoints directly from the CLI:
| Command | Description |
|---|---|
aicc article create --title <t> --content <c> [...flags] | Create a new article |
aicc article get <id> | Fetch one article by its MongoDB _id |
aicc article list [--limit N] | List articles, optionally limiting the number |
aicc article update <id> [--flags] | Update fields on an existing article |
aicc article delete <id> | Delete an article by ID |
Flags for create & update:
--title <string>— Article title--content <string>— Full article content (stored incontent)--summary <string>— Brief summary--topics <topic1> ...— One or more topic tags--source <string>— Source identifier
Examples:
# Create a new article
aicc article create \
--title "AI in 2025" \
--content "Deep dive into AI trends..." \
--summary "Key trends in AI" \
--topics ai machine-learning \
--source "manual-cli"
# Get an article
aicc article get 64a1f2d3e4b5c6a7d8e9f0
# List up to 5 articles
aicc article list --limit 5
# Update title and topics
aicc article update 64a1f2d3e4b5c6a7d8e9f0 \
--title "AI Trends 2025" \
--topics ai trends
# Delete an article
aicc article delete 64a1f2d3e4b5c6a7d8e9f0
With aicc in your toolbox, you can develop, build, run, lint, crawl, and manage content—all from one unified interface.
Shell Scripts & Makefile
The project includes several shell scripts and a Makefile to simplify common tasks. These scripts are located in the scripts directory and can be executed directly from the command line.
Shell Scripts
Various shell scripts are provided for tasks such as:
- Starting the backend or frontend
- Running the crawler
- Building the project
- Running tests
- Deploying to Vercel
- and more...
These scripts are designed to be easy to use and can be executed with a simple command.
Visit the shell directory for more details on each script.
To run a shell script, use the following command:
chmod +x scripts/<script_name>.sh
./scripts/<script_name>.sh
daily.sh Script in Root Directory
The daily.sh script is a shell script that automates the process of running the crawler and sending out the newsletter.
It is designed to be run daily, and it performs the following tasks:
- Runs the crawler to fetch the latest articles.
- Processes the articles and generates summaries.
- Cleanups any temporary files or artifacts, as well as dirty/corrupted articles.
- Sends out the newsletter to subscribers with the latest articles.
- Performs any other necessary tasks related to the daily operation of the application.
To run the daily.sh script, use the following command:
chmod +x daily.sh
./daily.sh
Please ensure that you have the necessary permissions and environment variables set up before running the script.
Also, you can set up a cron job to run this script automatically at a specified time each day. To do so, simply run
the install_daily_cron.sh script, which will install the cron job for you.
chmod +x install_daily_cron.sh
./install_daily_cron.sh
This will create a cron job that runs the daily.sh script every day at 16:00 (4:00 PM) UTC. You can adjust the timing in the install_daily_cron.sh script if needed.
To confirm, run the following command to view your cron jobs:
crontab -l
This will display a list of all your cron jobs, including the one you just created for the daily.sh script.
[!CAUTION] Be sure to keep your computer on and connected to the internet for the cron job to run successfully at the scheduled time!
[!TIP] Logs will be saved in the
daily.logfile in the root directory, so you can check the output of the script and any errors that may occur.
Makefile
The Makefile provides a convenient way to run common tasks using the make command. It includes targets for building, testing, and deploying the project.
To use the Makefile, navigate to the project root directory and run:
make <target>
Example Makefile Targets
| Target | Description |
|---|---|
bootstrap | Install dependencies for all packages |
clean | Remove build artifacts and temporary files |
deps | Install dependencies for all packages |
dev:frontend | Start the frontend in development mode |
dev:backend | Start the backend in development mode |
| and more... |
To see all available targets, run:
make help
This will display a list of all targets defined in the Makefile along with their descriptions.
Testing
Backend
The backend uses Jest + Supertest (with an in-memory MongoDB) for unit and integration tests. From the backend workspace root, run:
# Install dependencies (npm ci is correct here—it installs exactly from package-lock)
npm ci
# Run all tests once
npm run test
# Rerun tests on file changes
npm run test:watch
# Generate a coverage report
npm run test:coverage
[!NOTE] If your changes do not involve AI functionality, you'll need to set
GOOGLE_AI_API_KEY=dummyinbackend/.envto prevent tests from failing due to missing API keys.
Frontend
The frontend uses Playwright for end-to-end testing. From the frontend workspace directory, run:
# Install dependencies
npm ci
# Headless E2E run (default)
npm run test:e2e
# Run in headed mode (open real browser windows)
npm run test:e2e:headed
# Open the HTML report after a run
npm run test:e2e:report
By default, the Playwright report is served at http://localhost:9323 (or another port as printed in your console).
Crawler
The crawler uses Jest + ts-jest to test the fetchAndSummarize job. From the crawler workspace root, run:
# Install dependencies
npm ci
# Run all crawler tests once
npm run test
# (Optional) Re-run on file changes
npm run test -- --watch
# To execute an actual crawl against your configured URLs:
npm run crawl
Make sure required environment variables (e.g. MONGODB_URI, CRAWL_URLS, CRAWL_MAX_LINKS, etc.) are defined before invoking npm run crawl and any test commands. Using npm ci in each workspace ensures a clean, reproducible installation based on your lockfile.
Continuous Integration / Continuous Deployment (CI/CD)
The project uses GitHub Actions for CI/CD. The workflow is defined in .github/workflows/ci.yml. It includes:
- Linting: Runs ESLint and Prettier checks on all code changes.
- Testing: Executes unit tests for the backend and frontend.
- Deployment: Automatically deploys the backend and frontend to Vercel on successful merges to the main branch.
- Docker: Builds and pushes Docker images for the backend and crawler.
- Cron Jobs: Configures scheduled tasks for the backend and crawler.
- Environment Variables: Sets up environment variables for the backend and crawler.
- and more...
Additional .yml files are also available for specific tasks, such as backend-ci.yml, crawler-ci.yml, and frontend-ci.yml.
Deployment
The project fully supports deployment with AWS, Kubernetes, and Terraform for infrastructure as code (IaC). It utilizes blue/green and canary deployment strategies for zero-downtime releases.
For detailed deployment instructions, refer to the infrastructure/ directory, which contains Terraform scripts and Kubernetes manifests.
License
This project is licensed under the MIT License. See the LICENSE file for more details.
Contact
If you have any questions or suggestions, feel free to reach out to the repository maintainer:
I will be happy to assist you with any questions or issues you may have regarding this project.
[!TIP] If I don't know the answer, I'll be able to forward your question to the right person in the AICC team who can help you!
Conclusion
The SynthoraAI - AI-Powered Article Content Curator project brings together a powerful backend, an intelligent crawler, a newsletter service, and a modern frontend to deliver up-to-date, summarized government-related articles. Leveraging advanced technologies like Google Generative AI, Next.js, Express.js, and MongoDB, the system is both scalable and robust. Whether you’re a government staff member or a curious public user, this solution provides a streamlined, user-friendly experience to quickly access relevant, summarized content.
[!NOTE] This project is a work in progress, and contributions are welcome! If you have ideas for improvements, bug fixes, or new features, please feel free to open an issue or submit a pull request.
Thank you for exploring this project! If you have any questions, suggestions, or contributions, feel free to reach out. Your feedback is invaluable in making this project even better. Cheers to a more informed world! 🚀
