AI Gov Content Curator

💡An end-to-end solution for aggregating, summarizing, and displaying news articles using an AI-powered backend, an automated CRON crawler & newsletter emailer, and a responsive Next.js frontend. It integrates technologies like Express.js, MongoDB, Puppeteer, and GenAI/LLMs to deliver up-to-date, cur

Installation

npx ai-gov-content-curator

SynthoraAI - AI-Powered Article Content Curator

[!TIP] SynthoraAI - Synthesizing the world’s news & information through AI. 🚀✨

SynthoraAI - AI-Powered Article Content Curator is a comprehensive system designed to aggregate, summarize, and present curated government-related articles. This multi-service monorepo is organized into seven main components:

Backend: Provides a robust RESTful API to store and serve curated articles.
Crawler: Automatically crawls and extracts article URLs and metadata from government homepages and public API sources.
Frontend: Offers an intuitive Next.js-based user interface for government staff (and potentially the public) to browse and view article details.
Newsletter: Sends daily updates to subscribers with the latest articles.
Agentic AI Pipeline: Sophisticated multi-agent system for advanced content processing using LangGraph and LangChain, with a FastAPI HTTP bridge for cross-service integration.
Chat Orchestration: TypeScript-based dual-provider (Anthropic + Google) chat layer with 16 specialized agents, intent routing, grounding validation, and cost tracking.
MCP Server + ACP Layer: Model Context Protocol server exposing the agentic pipeline as 28 tools, 14 resources, and 7 prompts, plus a production-grade Agent Communication Protocol (ACP) for agent-to-agent messaging with Redis-backed multi-replica support.

High-Level Architecture: Each component is maintained in its own directory:

Backend: backend/
- Live: https://ai-content-curator-backend.vercel.app/
Frontend: frontend/
- Live: https://synthoraai.vercel.app/
Crawler: crawler/
- Live: https://ai-content-curator-crawler.vercel.app/
Newsletter: newsletters/
- Live: https://ai-content-curator-newsletters.vercel.app/
Agentic AI Pipeline: agentic_ai/
- Documentation: agentic_ai/README.md

Additionally, the project includes a set of shell scripts and a Makefile for automating common tasks, as well as a CLI tool for managing crawling and article operations. It also supports AWS, Terraform, and Kubernetes deployments, with blue/green deployments, canary releases, and rolling updates run through GitHub Actions. Enterprise-grade observability is provided through Splunk (via OpenTelemetry Collector and Kinesis Data Firehose), Prometheus, and Grafana.

Overview
Architecture
Collaboration & Agile Workflow with Jira
User Interface
Backend
Crawler
Frontend
Newsletter Subscription
Agentic AI Pipeline
Article Q&A Feature
Sitewide AI Chat
Intelligent Recommendation System
Command Line Interface (CLI)
Shell Scripts & Makefile
Testing
Continuous Integration / Continuous Deployment (CI/CD)
Deployment
License
Contact
Conclusion

Overview

The SynthoraAI - AI-Powered Article Content Curator system is designed to provide government staff with up-to-date, summarized content from trusted government sources and reputable news outlets. By leveraging AI (Google Generative AI / Gemini) for summarization and using modern web technologies, this solution ensures that users receive concise, accurate, and timely information.

[!IMPORTANT] Live Web App: https://synthoraai.vercel.app/.

Architecture

Below is a high-level diagram outlining the system architecture:

flowchart LR
    subgraph Sources[Trusted Sources]
        GovSites[Government Websites]
        NewsAPIs[Public News APIs]
    end

    Sources -->|URLs & Metadata| Crawler
    Crawler[[Crawler Service<br/>Next.js API Routes]] -->|Summaries & Topics| MongoDB[(MongoDB Atlas)]
    Crawler -->|AI Prompts| GoogleAI[(Google Generative AI)]
    MongoDB --> Backend[[Backend API<br/>Next.js + Express]]
    Backend -->|Cached Responses| Redis[(Redis)]
    Backend -->|REST/JSON| Frontend[[Frontend Web App<br/>Next.js + React]]
    Backend -->|Digest Payload| Newsletter[[Newsletter Service<br/>Next.js Serverless]]
    Newsletter -->|Emails| Subscribers[(Subscribers)]
    Frontend -->|Browse & Manage| Users[(Staff & Public Users)]
    Backend -.->|AI-Assisted Features| GoogleAI
    Crawler -.->|Cron & Shell Scripts| Automation[Shell / Make CLI]
    Backend -.->|Operational Scripts| Automation

    subgraph Observability[Observability Stack]
        OTEL[Splunk OTEL Collector]
        Splunk[(Splunk)]
        Prometheus[Prometheus]
        Grafana[Grafana]
    end

    Backend -.->|OTLP + logs| OTEL
    Frontend -.->|OTLP + logs| OTEL
    Crawler -.->|logs| OTEL
    Newsletter -.->|logs| OTEL
    OTEL -->|HEC| Splunk
    OTEL -->|remote write| Prometheus
    Prometheus --> Grafana

To illustrate how articles move through the platform, the following sequence captures the typical daily refresh:

sequenceDiagram
    participant Cron as Vercel Cron / Shell
    participant Crawl as Crawler Service
    participant Back as Backend API
    participant AI as Google Generative AI
    participant DB as MongoDB
    participant UI as Frontend
    participant Mail as Newsletter Service

    Cron->>Crawl: Trigger fetchAndSummarize job
    Crawl->>AI: Request summaries & topics
    AI-->>Crawl: Return condensed insights
    Crawl->>DB: Upsert articles & analytics
    Back->>DB: Query latest curated data
    UI->>Back: Fetch paginated articles
    Mail->>Back: Request latest articles
    Back-->>Mail: Deliver curated digest data
    Mail->>Subscribers: Send newsletter email

This project consists of 4 primary microservices that interact with each other:

Crawler:
- Crawls government homepages and public API sources to extract article URLs and metadata.
- Uses Axios and Cheerio for static HTML parsing, with Puppeteer as a fallback for dynamic content.
- Scheduled to run daily at 6:00 AM UTC via a serverless function on Vercel.
- Provides a basic landing page with information about the crawler and links to the backend and frontend.
- Deployed on Vercel at https://ai-content-curator-crawler.vercel.app.
Backend:
- Built with Express.js and Next.js, serving as a RESTful API for the frontend.
- Integrates Google Generative AI (Gemini) for content summarization.
- Stores articles in MongoDB using Mongoose, with fields for URL, title, full content, summary, source information, and fetch timestamp.
- Runs a scheduled serverless function twice daily, at 6:00 AM and 6:00 PM UTC, to fetch and process new articles.
- Deployed on Vercel at https://ai-content-curator-backend.vercel.app.
Newsletter Service:
- Allows users to subscribe to a newsletter for daily updates on the latest articles.
- Integrated with Resend API for managing subscriptions and sending emails.
- By default, the newsletter is sent daily at 9:00 AM UTC from an email address on the sonnguyenhoang.com domain.
- Deployed on Vercel as a serverless function, at https://ai-content-curator-newsletters.vercel.app.
Frontend:
- Built with Next.js and React, providing a modern, mobile-responsive UI for browsing and viewing curated articles.
- Fetches and displays a paginated list of articles from the backend API, with filtering options.
- Dedicated pages for full article content, AI-generated summaries, source information, and fetched timestamps.
- User authentication for marking articles as favorites, commenting, discussions, and upvoting/downvoting comments.
- Deployed on Vercel at https://synthoraai.vercel.app/.

Additionally, there are 3 advanced AI components:

Agentic AI Pipeline:
- A sophisticated multi-agent system built with Python, leveraging LangGraph and LangChain for advanced content processing. This pipeline handles tasks such as article summarization, topic extraction, bias analysis, and more. It is designed to be modular and extensible, allowing for the addition of new agents and tools as needed. The pipeline is exposed via a FastAPI HTTP bridge for seamless integration with other services.
Orchestration Layer:
- Python (agentic_ai/orchestration/): Enterprise article processing orchestration atop the LangGraph pipeline — content supervision, cost budgeting, error recovery with circuit breaking, dead-letter queuing, and concurrent batch processing. Exposed via a FastAPI HTTP bridge (agentic_ai/api.py on port 8100) for cross-service integration.
- TypeScript (orchestration/): Dual-provider LLM chat layer — unified Anthropic + Google client, intent-based agent routing (16 agents), grounding validation, prompt caching, cost tracking, and structured observability. Integrated into the backend via /api/orchestrator/* endpoints. See orchestration/README.md.
MCP Server + ACP Layer (mcp_server/): Model Context Protocol server exposing the agentic pipeline over stdio — 28 tools, 14 resources, and 7 prompts for Claude Code and IDE integration. Includes ACP with Redis-backed agent registry and inter-agent message routing for production multi-replica deployments. Configured via .mcp.json.

This monorepo, microservices architecture is designed to be modular and scalable, allowing for easy updates and maintenance. Each component can be developed, tested, and deployed independently, ensuring a smooth development workflow.

[!NOTE] The architecture diagram above is a simplified representation and may not include all components or interactions. For a more detailed view, please refer to the individual service documentation.

Collaboration & Agile Workflow with Jira

Introduction

This project is currently using Jira for task management and collaboration. The project's Kanban board is organized into six main columns: Backlog, To Do, In Progress, Testing, Code Review, and Done. Each task is assigned to a specific team member and includes detailed descriptions, acceptance criteria, and due dates.

[!TIP] Before getting started, please make sure to read through the entire section to understand the workflow and how to effectively use Jira for this project.

Agile Approach

We are following an Agile approach to development, which emphasizes iterative progress, collaboration, and flexibility. This allows us to adapt to changes quickly and deliver value to users in a timely manner.

Agile methodologies, such as Scrum or Kanban, are used to manage the development process, ensuring that tasks are prioritized, completed, and reviewed efficiently. This approach helps us maintain a high level of quality and responsiveness to user needs.

We chose Kanban for this project because it allows us to visualize the workflow, limit work in progress, and focus on delivering value incrementally. The Kanban board provides a clear overview of the project's status, making it easy to track progress and identify bottlenecks.

Why Jira?

Because we follow an Agile approach, we use Jira to manage tasks, track progress, and collaborate as a team. It lets us create tasks, assign them to team members, set priorities, and track the status of each task in real time.

Also, Jira provides a comprehensive set of features for managing projects, including sprint planning, backlog management, and reporting. It allows us to create user stories, epics, and tasks, and track their progress throughout the development cycle.

Project Board

You can view the project board and tasks at https://ai-content-curator.atlassian.net.

[!IMPORTANT] Login is required to access the board, and you can create an account if you don't have one.

If you need access to the project board, please contact me directly at sonnguyenhoang.com or via email at hoangson091104@gmail.com for an invitation. Access to the board will help you follow the project's progress, tasks, and overall workflow, and contribute more effectively.

Workflow

As soon as you receive a task (verbally or in writing) or come up with an idea:

Create a new task in the Backlog column of the Jira board/list.
Add a detailed description of the task, including acceptance criteria and due date.
Assign the task to yourself or another team member.
Create a new branch in the GitHub repository for the task. Make sure that you use the Jira issue key in the branch name (e.g., AICC-123).
- Including the issue key is very important: it is how Jira recognizes the branch and links it to the task.
- Mark the Jira task as To Do/In Progress.
Work on the task in your local development environment, committing changes to the branch as you go.
Once the task is complete, push the branch to the remote repository and create a pull request (PR) in GitHub.
- Name the PR with a descriptive title that includes the Jira issue key (e.g., feat(ui): implement new feature [AICC-123]).
- Before you commit your changes, make sure to run any applicable tests and ensure that the code is properly formatted and linted.
- Note: If your changes do not involve any AI functionalities (e.g. chatbot, crawler), then set GOOGLE_AI_API_KEY=dummy in your backend/.env file to bypass the git hooks that check for AI-related environment variables. This will allow you to commit and push your changes without needing access to the actual API keys, while still maintaining the integrity of the development workflow.
Assign the PR to the appropriate team member for code review.
- Move the Jira task to the Code Review column.
Make any necessary changes based on feedback from the code review.
Once the PR is approved, merge it into the main branch.
- Make sure to resolve any merge conflicts before merging.
After merging, move the Jira task to the Done column.

This workflow ensures that tasks are tracked, code is reviewed, and the project progresses smoothly. It also supports easy collaboration and communication among team members, in keeping with our Agile approach.
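The branch and PR naming rules in the workflow above can be checked mechanically. The following is a minimal sketch, assuming issue keys follow the `AICC-123` pattern shown in the examples (an uppercase project prefix, a hyphen, and a number). The function name `find_issue_key` is hypothetical and not part of this repository.

```python
import re
from typing import Optional

# Assumed key format: uppercase project prefix, hyphen, number (e.g. AICC-123).
ISSUE_KEY = re.compile(r"\b([A-Z][A-Z0-9]+-\d+)\b")


def find_issue_key(text: str) -> Optional[str]:
    """Return the first Jira-style issue key in a branch name or PR title."""
    match = ISSUE_KEY.search(text)
    return match.group(1) if match else None
```

A check like this could run in a pre-push hook or CI step to flag branches that Jira would not be able to link to a task.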

Jira Workflow

Confluence

We are also using Confluence for documentation and knowledge sharing. The Confluence space is organized into different sections, including project overview, architecture, API documentation, and user guides.

You can view the Confluence space at https://ai-content-curator.atlassian.net/wiki/spaces/ACC/.

Confluence Space

To gain access to the Confluence space, please contact me directly at sonnguyenhoang.com or via email at hoangson091104@gmail.com.

User Interface

The user interface is built with Next.js and React, providing a modern, mobile-responsive experience. Below are some screenshots of the application (some screenshots may be outdated and not reflect the latest UI - visit https://synthoraai.vercel.app/ for the latest version):

0. Landing Page

Landing Page

1. Home Page

Home Page

2. Article Details Page

Article Detail Page

2.1. Article Q&A Feature

Article Q&A Feature

2.2. Related Articles (Vector Similarity Search)

Related Articles (Vector Similarity Search)

2.3. AI-Powered Article Bias Analysis

Article Bias Analysis

2.4. Article Ratings

Article Ratings

2.5. Article Comments

Article Comments & Ratings

3. Favorite Articles Page (Only for Authenticated Users)

Favorite Articles Page

4. Newsletter Subscription Page

Newsletter Subscription Page

5. User Authentication

User Authentication

6. User Registration

User Registration

7. Reset Password

Reset Password

8. Search Results

9. App-wide Translate Feature

App-wide Translate Feature

10. 404 Not Found Page

404 Not Found Page

11. Daily Newsletter Email Example

Daily Newsletter Example

12. Passkey Management Page

Passkey Management Page

More pages and features are available in the app. We encourage you to explore!

Backend

The Backend is responsible for storing articles and serving them via RESTful endpoints. It integrates AI summarization, MongoDB for storage, and runs within a Next.js environment using Express.js for API routes.

Features

Data Ingestion:
Receives article URLs and data from the crawler and external API sources.
Content Summarization:
Uses Google Generative AI (Gemini) to generate concise summaries.
Storage:
Persists articles in MongoDB using Mongoose with fields for URL, title, full content, summary, source information, and fetch timestamp.
API Endpoints:
- GET /api/articles – Retrieves a paginated list of articles (supports filtering via query parameters such as page, limit, and source).
- GET /api/articles/:id – Retrieves detailed information for a specific article.
Scheduled Updates:
A serverless function (triggered twice daily at 6:00 AM and 6:00 PM UTC) fetches and processes new articles, so that the system remains up-to-date!
User Authentication:
Supports email + password registration, login, and JWT-based authentication, plus passwordless passkey (WebAuthn / FIDO2) sign-in. Users can sign up with a passkey only, add additional passkeys to an existing account, and manage them (list / rename / delete) at /account/passkeys. Both flows issue the same JWT, so the rest of the system is unchanged.
Favorite Articles:
Authenticated users can mark articles as favorites for quick access.
Newsletter Subscription:
Users can subscribe to a newsletter for daily updates on the latest articles. This feature is integrated with a third-party service (Resend) for managing subscriptions and sending emails.
Bias Detection & Analysis:
The app also includes a bias detection & analysis feature, powered by Google Generative AI, to analyze articles for potential bias and provide deep article insights to users.
User Ratings:
Users can rate articles, allowing for feedback and quality assessment.
Dark Mode:
The frontend offers a dark mode option for improved readability and user experience.
Discussions & Comments:
Users can also discuss and comment on articles, fostering engagement and collaboration.
Upvote/Downvote Comments:
Users can upvote or downvote comments to highlight valuable contributions.

Backend Swagger API Documentation

Prerequisites & Installation (Backend)

[!CAUTION] Before proceeding, run npm install once in the root directory of the monorepo to install the necessary dependencies for managing the project!

Prerequisites:
- Node.js (v18 or later)
- MongoDB (local or cloud)
- Vercel CLI (for deployment)

Clone the Repository:

git clone https://github.com/hoangsonww/AI-Gov-Content-Curator.git
cd AI-Gov-Content-Curator/backend

# then, fill in the environment variables as described in .env.example file
# approved team members, please contact me for the actual .env file and API keys!

Install Dependencies (inside backend/):
```
npm install
```

Configuration (Backend)

Create a .env file in the ROOT directory with the following (this will be shared across all components in this monorepo):

MONGODB_URI=<your_mongodb_uri>
GOOGLE_AI_API_KEY=<your_google_ai_api_key>
GOOGLE_AI_API_KEY1=<your_google_ai_api_key1_optional>
GOOGLE_AI_API_KEY2=<your_google_ai_api_key2_optional>
GOOGLE_AI_API_KEY3=<your_google_ai_api_key3_optional>
AI_INSTRUCTIONS=Summarize the articles concisely and naturally (change if needed)
NEWS_API_KEY=<your_news_api_key>
NEWS_API_KEY1=<your_news_api_key1>
PORT=3000
CRAWL_URLS=https://www.state.gov/press-releases/,https://www.bbc.com/news,https://www.nytimes.com/,https://www.dallasnews.com/news/,https://www.houstonchronicle.com/,https://www.whitehouse.gov/briefing-room/,https://www.congress.gov/,https://www.statesman.com/news/politics-elections/
AICC_API_URL=https://ai-content-curator-backend.vercel.app/
RESEND_API_KEY=<your_resend_api_key>
RESEND_FROM="AI Curator <your_email>"
UNSUBSCRIBE_BASE_URL=<your_unsubscribe_base_url>

# Auth
JWT_SECRET=<long_random_string>

# WebAuthn / Passkey configuration
# RP_ID MUST equal the FRONTEND apex domain (eTLD+1), not the backend's host.
RP_ID=localhost
RP_NAME=SynthoraAI
RP_ORIGIN=http://localhost:3000,https://synthoraai.vercel.app

Refer to the .env.example file for more details on each variable.
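Because `CRAWL_URLS` is a single comma-separated string, a stray comma or trailing space can easily produce an empty or duplicate entry. The sketch below shows one defensive way to parse such a value. It is illustrative only: the function name `parse_crawl_urls` is hypothetical, and the project's own services may parse this variable differently.

```python
from typing import List


def parse_crawl_urls(raw: str) -> List[str]:
    """Split a comma-separated URL list, dropping blanks and duplicates."""
    seen = set()
    urls = []
    for part in raw.split(","):
        url = part.strip()
        # Skip empty entries (e.g. from ",,") and URLs already collected.
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls
```

Typical usage would be `parse_crawl_urls(os.environ.get("CRAWL_URLS", ""))` after loading the .env file.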

Running Locally (Backend)

Start the development server:

npm run dev

Access endpoints:

GET http://localhost:3000/api/articles
GET http://localhost:3000/api/articles/:id
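As a quick way to exercise the list endpoint, the following sketch builds a request URL from the `page`, `limit`, and `source` query parameters described above. The helper names and default values are assumptions, and the shape of the JSON response is not documented here, so `fetch_articles` simply returns whatever the server sends.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def articles_url(base_url, page=1, limit=10, source=None):
    """Build the GET /api/articles URL with optional source filtering."""
    params = {"page": page, "limit": limit}
    if source:
        params["source"] = source
    return f"{base_url.rstrip('/')}/api/articles?{urlencode(params)}"


def fetch_articles(base_url, **filters):
    """Fetch one page of articles and return the decoded JSON body."""
    with urlopen(articles_url(base_url, **filters), timeout=10) as resp:
        return json.load(resp)
```

With the development server running, `fetch_articles("http://localhost:3000", page=2, limit=5)` would request the second page of five articles.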

Deployment on Vercel (Backend)

Configure Environment Variables in your Vercel project settings.

Create or update the vercel.json in the root of the backend directory:

{
  "version": 2,
  "builds": [
    {
      "src": "package.json",
      "use": "@vercel/next"
    }
  ],
  "crons": [
    {
      "path": "/api/scheduled/fetchAndSummarize",
      "schedule": "0 6,18 * * *"
    }
  ]
}

Deploy with:
```
vercel --prod
```
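The `schedule` value in the vercel.json above is a five-field cron expression: `0 6,18 * * *` means minute 0 of hours 6 and 18, every day. The sketch below expands such an expression into the run times for one day. It deliberately handles only comma-separated minute and hour lists with `*` in the remaining fields, which is an assumption that covers the schedules in this README but not cron syntax in general.

```python
from datetime import datetime, timezone


def daily_run_times(schedule, day):
    """Expand a 'minute hour * * *' cron expression into run times for one day."""
    minute_field, hour_field, dom, month, dow = schedule.split()
    if (dom, month, dow) != ("*", "*", "*"):
        raise ValueError("this sketch only handles daily schedules")
    minutes = sorted(int(m) for m in minute_field.split(","))
    hours = sorted(int(h) for h in hour_field.split(","))
    return [
        day.replace(hour=h, minute=m, second=0, microsecond=0)
        for h in hours
        for m in minutes
    ]


runs = daily_run_times("0 6,18 * * *", datetime(2025, 1, 1, tzinfo=timezone.utc))
```

Here `runs` holds two timestamps, 06:00 and 18:00 UTC, matching the twice-daily schedule described under Scheduled Updates.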

Crawler

The Crawler automatically retrieves article links and metadata from government homepages and public API sources. It uses Axios and Cheerio for static HTML parsing and falls back to Puppeteer when necessary.

Features

Article Extraction:
Crawls specified URLs to extract article links and metadata.
Error Handling & Resilience:
Implements a retry mechanism and fallback to Puppeteer for dynamic content fetching when encountering issues (e.g., HTTP 403, ECONNRESET).
Scheduling:
Deployed as a serverless function on Vercel, scheduled via cron (runs daily at 6:00 AM UTC).
Next.js UI:
Provides a basic landing page with information about the crawler and links to the backend and frontend.
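The retry-then-fallback behavior described under Error Handling & Resilience can be sketched independently of any HTTP library. In the example below, `primary` stands in for the static fetch (Axios and Cheerio in the crawler) and `fallback` for the headless-browser fetch (Puppeteer). Both are plain callables supplied by the caller. The retry count and exponential backoff are assumptions, not the crawler's actual settings.

```python
import time


def fetch_with_fallback(url, primary, fallback, retries=3, base_delay=0.5,
                        sleep=time.sleep):
    """Try `primary` up to `retries` times, then hand the URL to `fallback`."""
    for attempt in range(retries):
        try:
            return primary(url)
        except Exception:
            # Back off before the next attempt (0.5s, 1s, 2s, ... by default).
            if attempt < retries - 1:
                sleep(base_delay * 2 ** attempt)
    try:
        return fallback(url)
    except Exception as err:
        raise RuntimeError(f"all fetch strategies failed for {url}") from err
```

Passing `sleep` as a parameter keeps the function easy to test, since a test can substitute a no-op instead of waiting.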

Prerequisites & Installation (Crawler)

Prerequisites:

Node.js (v18 or later)
NPM (or Yarn)
Vercel CLI (for deployment)

Clone the Repository:

git clone https://github.com/hoangsonww/AI-Gov-Content-Curator.git
cd AI-Gov-Content-Curator/crawler

Install Dependencies:
```
npm install
```

Running Locally (Crawler)

Start the Next.js development server to test both the UI and crawler function:

npm run dev

UI: http://localhost:3000/
Crawler Function: http://localhost:3000/api/scheduled/fetchAndSummarize

Alternatively, run the crawler directly:

npx ts-node schedule/fetchAndSummarize.ts

# or
npm run crawl

Also, there are 2 more scripts for the crawler:

Fetch and crawl all past articles (will run indefinitely, unless you stop it):
```
npx ts-node scripts/fetchPastArticles.ts

# or
npm run fetch:past
```

Fetch and crawl all newest/latest articles:

npx ts-node scripts/fetchLatestArticles.ts

# or
npm run fetch:latest

Run these locally to test the crawler functionality. You can also run them in a Docker container if you prefer.

Deployment on Vercel (Crawler)

Set Environment Variables in the Vercel dashboard.

Create or update the vercel.json in the crawler directory:

{
  "version": 2,
  "builds": [
    {
      "src": "package.json",
      "use": "@vercel/next"
    }
  ],
  "crons": [
    {
      "path": "/api/scheduled/fetchAndSummarize",
      "schedule": "0 6 * * *"
    }
  ]
}

Deploy with:
```
vercel --prod
```

Frontend

The Frontend is built with Next.js and React, providing a modern, mobile-responsive UI for browsing and viewing curated articles.

Features

Article Listing:
Fetches and displays a paginated list of articles from the backend API. Supports filtering by source.
Article Detail View:
Dedicated pages display full article content, AI-generated summaries, source information, and fetched timestamps.
Responsive Design:
The UI is optimized for both desktop and mobile devices.
Authentication:
Users can register and log in with email + password, or sign in with a passkey (Face ID, Touch ID, Windows Hello, or a hardware key) for a faster, passwordless experience. New accounts may be created passkey-only. The frontend stores the issued JWT in localStorage and sends it in the Authorization header for protected routes.
Chatbot Q&A Feature:
Users can ask questions about specific articles, powered by RAG (Retrieval-Augmented Generation) using Google Generative AI. The sitewide chat supports inline message editing with conversation branching: edit any prior message and the conversation forks from that point with fresh AI responses.
Related Articles (Vector Similarity Search):
Displays related articles based on vector similarity search using embeddings generated by Google Generative AI.
Article Bias Analysis:
Analyzes articles for potential bias using Google Generative AI and provides insights to users.
Article Ratings:
Users can rate articles, allowing for feedback and quality assessment.
Discussions & Comments:
Users can discuss and comment on articles, fostering engagement and collaboration.
Upvote/Downvote Comments:
Users can upvote or downvote comments to highlight valuable contributions.
Search Functionality:
Users can search for articles using keywords, with results fetched from the backend API.
Translation Feature:
Articles can be translated into multiple languages using the Google Translate API.
Favorite Articles:
Authenticated users can mark articles as favorites for quick access. This is stored in the backend and displayed in the frontend.
Newsletter Subscription:
Users can subscribe to a newsletter for daily updates on the latest articles. This feature is integrated with a third-party service (Resend) for managing subscriptions and sending emails.
Dark Mode:
The frontend offers a dark mode option for improved readability and user experience.
Additional UI Components:
Includes components like HeroSlider, LatestArticles, ThemeToggle, and more for an enhanced user experience.
Static Site Generation (SSG):
The frontend uses Next.js's SSG capabilities to pre-render pages for improved performance and SEO.
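To make the Related Articles feature more concrete, the sketch below ranks candidate articles by cosine similarity between embedding vectors. It is a plain-Python illustration of the general technique, not the project's implementation: the function names are hypothetical, and the real system generates its embeddings with Google Generative AI and may use a vector index rather than a linear scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def related_articles(target, candidates, top_k=3):
    """Return the IDs of the `top_k` candidates most similar to `target`.

    `candidates` maps an article ID to its embedding vector.
    """
    scored = [(cosine_similarity(target, vec), article_id)
              for article_id, vec in candidates.items()]
    scored.sort(reverse=True)
    return [article_id for _, article_id in scored[:top_k]]
```

Cosine similarity compares direction rather than magnitude, so two articles of very different length can still score as closely related.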

Prerequisites & Installation (Frontend)

Prerequisites:

Node.js (v18 or later)
NPM or Yarn

Clone the Repository:

git clone https://github.com/hoangsonww/AI-Gov-Content-Curator.git
cd AI-Gov-Content-Curator/frontend

Install Dependencies:
```
npm install
```
or
```
yarn
```

Configuration (Frontend)

(Optional) Create a .env.local file in the frontend directory to configure the API URL:

NEXT_PUBLIC_API_URL=https://your-backend.example.com

Running Locally (Frontend)

Start the development server:

npm run dev

Access the application at http://localhost:3000.

Deployment on Vercel (Frontend)

Configure Environment Variables in the Vercel dashboard (e.g., NEXT_PUBLIC_API_URL).
Vercel automatically detects the Next.js project; if needed, customize with a vercel.json.
Deploy with:
```
vercel --prod
```

Alternatively, you can deploy directly from the Vercel dashboard.

Newsletter Subscription

The app also includes a newsletter subscription feature, allowing users to sign up for updates. This is integrated with a third-party service (Resend) for managing subscriptions.

Features (Newsletter)

Subscription Form:
Users can enter their email addresses to subscribe to the newsletter.
Unsubscribe Option:
Users can unsubscribe from the newsletter at any time.
Daily Updates:
Subscribers receive daily updates with the latest articles. Only the latest articles are sent to subscribers, ensuring they receive the most relevant information.
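This README does not specify how "latest" is determined for the daily digest. One plausible reading, shown in the sketch below, is to keep only articles fetched within the 24 hours before the send time. The `fetchedAt` field name and the 24-hour window are both assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def latest_articles(articles, now, window=timedelta(hours=24)):
    """Return articles fetched within `window` of `now`, newest first."""
    cutoff = now - window
    fresh = [a for a in articles if a["fetchedAt"] > cutoff]
    return sorted(fresh, key=lambda a: a["fetchedAt"], reverse=True)
```

Filtering on a fixed window keeps each digest short and avoids resending articles that appeared in the previous day's email.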

Prerequisites & Installation (Newsletter)

[!IMPORTANT] This assumes that you have already set up the backend and frontend as described above.

Prerequisites: Sign up for a Resend account and obtain your API key.
Go to Domain Settings: In your Resend dashboard, navigate to the "Domains" section and add your domain (you must own a domain name and have access to its DNS settings). Resend will ask you to verify ownership of the domain by adding DNS records, such as a TXT record and an MX record, among others. Follow the instructions provided by Resend to complete this step.
Configure Environment Variables: Create a .env file in the newsletters directory with the following variables:
- RESEND_API_KEY: Your Resend API key.
- RESEND_DOMAIN: The domain you added in the Resend dashboard.
Deploy the CRON Job: Simply run vercel --prod in the newsletters directory to deploy the CRON job that sends daily updates to subscribers.
Configure the CRON Job: In your Vercel dashboard, navigate to the "Functions" section and set up a CRON job that runs daily at 9:00 AM UTC. This job will send the latest articles to subscribers.
That's it! Your newsletter subscription feature is now set up and ready to go. Users can subscribe to receive daily updates with the latest articles.

Note

[!IMPORTANT]

The newsletter subscription feature is designed to be simple and effective. It allows users to stay informed about the latest articles without overwhelming them with too many emails.

The subscription form is integrated into the frontend, and users can easily sign up or unsubscribe at any time.

The daily updates are sent via email, ensuring that subscribers receive the most relevant information without having to check the app constantly.

The newsletter feature is built using the Resend API, which provides a reliable and scalable solution for managing subscriptions and sending emails.

Sometimes, the emails may end up in the spam folder, so users should check their spam folder if they don't see the emails in their inbox.

Agentic AI Pipeline

The Agentic AI Pipeline is a sophisticated, production-ready multi-agent system built with LangGraph and LangChain that processes articles through a series of specialized AI agents. This advanced system provides enhanced content analysis, summarization, classification, sentiment analysis, and quality assurance beyond the basic AI features.

Overview

The Agentic AI Pipeline implements an assembly line architecture where each specialized agent performs a specific task in sequence. Built on LangGraph's state machine framework, the pipeline ensures reliable, scalable, and sophisticated multi-agent orchestration.

graph LR
    A[Article Input] --> B[Content Analyzer]
    B --> C[Summarizer]
    C --> D[Classifier]
    D --> E[Sentiment Analyzer]
    E --> F[Quality Checker]
    F --> G{Quality Pass?}
    G -->|Yes| H[Output]
    G -->|No & Retry| B
    G -->|Max Retries| H

Key Features

🤖 Multi-Agent Architecture: Five specialized agents working in concert
- Content Analyzer: Extracts structure, entities, and key information
- Summarizer: Generates concise, accurate summaries
- Classifier: Categorizes content into 15+ topic categories
- Sentiment Analyzer: Analyzes emotional tone and objectivity
- Quality Checker: Validates outputs with automatic retry logic
🔄 Assembly Line Processing: LangGraph-based state machine with conditional routing
🔌 MCP Server: Model Context Protocol server for standardized AI interactions
🛰️ ACP Layer: Agent Communication Protocol for inter-agent messaging (register -> heartbeat -> send -> inbox -> ack)
📬 Durable Agent Comms: Redis-backed ACP store for multi-replica deployments with TTL, retention, and liveness pruning
🧪 Operational Preflight: Live ACP roundtrip checks are part of make mcp-preflight
☁️ Cloud-Ready: Production configs for AWS Lambda and Azure Functions
📊 Quality Assurance: Built-in quality checking with automatic retry mechanisms
⚡ Production-Ready: Comprehensive logging, monitoring, and error handling
🔐 Secure: Secrets management via AWS Secrets Manager and Azure Key Vault

Architecture

The pipeline uses an assembly line architecture where articles flow through multiple specialized agents:

Intake Node: Validates input and initializes state
Content Analysis: Extracts structure, entities, dates, and style
Summarization: Generates 150-200 word summaries
Classification: Categorizes into relevant topics
Sentiment Analysis: Analyzes tone, objectivity, urgency, and controversy
Quality Check: Validates outputs and determines if retry is needed
Output Node: Returns final results
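The control flow of the assembly line, including the quality gate and its retry loop, can be illustrated without LangGraph. The sketch below is a simplified stand-in, not the pipeline's actual code: each stage is a function that reads the shared state and returns the fields it adds, and `max_retries` is an assumed setting.

```python
def run_pipeline(article, stages, quality_ok, max_retries=2):
    """Run every stage in order, repeating the pass until quality passes.

    After `max_retries` extra passes the state is returned anyway, mirroring
    the "Max Retries" edge that leads straight to the output node.
    """
    state = {"article": article, "attempts": 0, "passed": False}
    while not state["passed"] and state["attempts"] <= max_retries:
        state["attempts"] += 1
        for stage in stages:
            # Each stage sees the accumulated state and contributes new fields.
            state.update(stage(state))
        state["passed"] = quality_ok(state)
    return state
```

A caller can inspect `state["passed"]` to tell a validated result from one that was emitted only because the retry budget ran out.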

Technology Stack:

LangChain: Framework for LLM-powered applications
LangGraph: State machine orchestration for multi-agent systems
Python 3.11+: Modern Python with async/await support
Redis: State management and caching
MongoDB: Data persistence
Prometheus: Metrics and monitoring
Splunk + OpenTelemetry: Centralized log aggregation, distributed tracing, and enterprise observability via OTEL Collector
MCP Python SDK (FastMCP): Model Context Protocol server implementation
ACP Store Backends: Redis (production) + in-memory fallback (non-production)

MCP + ACP Surface:

MCP: 28 tools, 14 resources, 7 prompts
ACP Tools: acp_register_agent, acp_unregister_agent, acp_heartbeat, acp_send_message, acp_fetch_inbox, acp_acknowledge_message, acp_list_agents, acp_get_message
ACP Resources: acp://agents, acp://stats, acp://messages/recent
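
To make the register -> heartbeat -> send -> inbox -> ack lifecycle concrete, here is a minimal in-memory sketch. It is not the project's ACP store: the class name `InMemoryAcpStore`, the method signatures, the message fields, and the 30-second liveness window are assumptions that loosely mirror the tool names listed above. The clock is injectable so liveness pruning can be exercised without waiting.

```python
import itertools
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Message:
    message_id: str
    sender: str
    recipient: str
    body: str
    acknowledged: bool = False


class InMemoryAcpStore:
    """Toy stand-in for an ACP store (hypothetical, for illustration only)."""

    def __init__(self, liveness_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._liveness = liveness_seconds
        self._clock = clock
        self._last_seen: Dict[str, float] = {}
        self._messages: Dict[str, Message] = {}
        self._ids = itertools.count(1)

    def register_agent(self, agent_id: str) -> None:
        self._last_seen[agent_id] = self._clock()

    def heartbeat(self, agent_id: str) -> None:
        if agent_id not in self._last_seen:
            raise KeyError(f"unknown agent: {agent_id}")
        self._last_seen[agent_id] = self._clock()

    def prune_stale_agents(self) -> List[str]:
        """Drop agents whose last heartbeat is older than the liveness window."""
        now = self._clock()
        stale = [a for a, seen in self._last_seen.items()
                 if now - seen > self._liveness]
        for agent_id in stale:
            del self._last_seen[agent_id]
        return stale

    def list_agents(self) -> List[str]:
        return sorted(self._last_seen)

    def send_message(self, sender: str, recipient: str, body: str) -> str:
        if recipient not in self._last_seen:
            raise KeyError(f"unknown recipient: {recipient}")
        message_id = f"msg-{next(self._ids)}"
        self._messages[message_id] = Message(message_id, sender, recipient, body)
        return message_id

    def fetch_inbox(self, agent_id: str) -> List[Message]:
        """Return messages addressed to the agent that are not yet acknowledged."""
        return [m for m in self._messages.values()
                if m.recipient == agent_id and not m.acknowledged]

    def acknowledge_message(self, agent_id: str, message_id: str) -> None:
        message = self._messages[message_id]
        if message.recipient != agent_id:
            raise PermissionError("only the recipient may acknowledge a message")
        message.acknowledged = True
```

A Redis-backed variant would presumably replace the dictionaries with keys that carry a TTL, which is how the retention and liveness behavior described earlier could survive across replicas.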

Cloud Deployment:

AWS: Lambda, API Gateway, S3, SQS, Secrets Manager, CloudWatch, Kinesis Firehose → Splunk HEC
Azure: Functions, Storage Queues, Blob Storage, Key Vault, Application Insights

Beads Subarchitecture:

Beads are the atomic unit of work in the agentic architecture. Each bead is a discrete, well-scoped task that an agent (human or AI) can claim, execute, and verify independently
Beads follow a PENDING → CLAIMED → IN_PROGRESS → REVIEW → DONE lifecycle with a BLOCKED escape state, and use file-level reservations (.beads/.status.json) to prevent concurrent-edit conflicts across agents
Service-scoped IDs (ORCH-001, CRAWL-005, PIPE-012, etc.) tie every bead to the service it changes, enabling per-service tracking and parallelism
A compound learning loop records structured session logs in .agent-sessions/ after each completed bead, so future agents benefit from accumulated experience
See .beads/README.md for the full specification and .agent-sessions/README.md for session log format
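
The bead lifecycle and ID scheme can be sketched as a small transition table. This is an illustrative sketch only, not the specification in .beads/README.md: the linear path and the BLOCKED state come from the description above, while the exact set of transitions into and out of BLOCKED, the REVIEW back to IN_PROGRESS edge, and the ID pattern (uppercase prefix, hyphen, three digits) are assumptions.

```python
import re
from enum import Enum
from typing import Dict, Set, Tuple


class BeadStatus(Enum):
    PENDING = "PENDING"
    CLAIMED = "CLAIMED"
    IN_PROGRESS = "IN_PROGRESS"
    REVIEW = "REVIEW"
    DONE = "DONE"
    BLOCKED = "BLOCKED"


S = BeadStatus

# Assumed transition table; only the PENDING -> ... -> DONE path is documented.
ALLOWED: Dict[BeadStatus, Set[BeadStatus]] = {
    S.PENDING: {S.CLAIMED},
    S.CLAIMED: {S.IN_PROGRESS, S.BLOCKED},
    S.IN_PROGRESS: {S.REVIEW, S.BLOCKED},
    S.REVIEW: {S.DONE, S.IN_PROGRESS, S.BLOCKED},
    S.BLOCKED: {S.PENDING, S.IN_PROGRESS},
    S.DONE: set(),
}

BEAD_ID = re.compile(r"^([A-Z]+)-(\d{3})$")  # assumed format, e.g. ORCH-001


def advance(current: BeadStatus, target: BeadStatus) -> BeadStatus:
    """Return the new status, or raise if the move is not in the table."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target


def parse_bead_id(bead_id: str) -> Tuple[str, int]:
    """Split a service-scoped ID into (service prefix, sequence number)."""
    match = BEAD_ID.match(bead_id)
    if match is None:
        raise ValueError(f"not a bead ID: {bead_id!r}")
    return match.group(1), int(match.group(2))
```

Keeping the table explicit makes an illegal move (for example PENDING straight to DONE) fail loudly instead of silently corrupting the status file.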

Getting Started

Quick Start:

# Navigate to the agentic_ai directory
cd agentic_ai

# Install dependencies
pip install -r requirements.txt

# Configure environment
cp .env.example .env
# Edit .env with your API keys
# For production ACP, set:
# ACP_ENABLED=true
# ACP_BACKEND=redis
# REDIS_HOST=<redis-host>
# REDIS_PORT=6379

# Run the MCP server
PYTHONPATH=.. python -m mcp_server

# Run production-readiness preflight (includes live ACP checks)
make mcp-preflight

Use Programmatically:

from agentic_ai.core.pipeline import AgenticPipeline
import asyncio

# Initialize pipeline
pipeline = AgenticPipeline()

# Process an article
result = asyncio.run(pipeline.process_article({
    "id": "article-123",
    "content": "Your article content...",
    "url": "https://example.com/article",
    "source": "government"
}))

print(f"Summary: {result['summary']}")
print(f"Topics: {result['topics']}")
print(f"Quality Score: {result['quality_score']}")

Deploy to Cloud:

# Deploy to AWS
cd agentic_ai/aws
./deploy.sh production

# Deploy to Azure
cd agentic_ai/azure
./deploy.sh production

Docker Deployment:

cd agentic_ai
docker-compose up -d

Detailed Documentation

For comprehensive documentation including:

Detailed agent specifications
MCP server API reference
Cloud deployment guides
Performance optimization tips
Monitoring and observability setup
Integration examples

Please see the complete documentation: agentic_ai/README.md

For full MCP + ACP protocol and runtime diagrams, see MCP-ACP.md.

[!TIP] For a comprehensive reference of all AI/ML components — the three orchestration layers, 21 agents, LLM providers, cost controls, grounding rules, and 15+ Mermaid architecture diagrams — see AI_ML.md.

Article Q&A Feature

The article Q&A feature allows users to ask questions about specific articles and receive AI-generated answers. This feature is integrated into the frontend and backend, providing a seamless experience for users.

The AI will have access to the article content and will use RAG to generate answers based on the information provided in the article. This feature is designed to enhance user engagement and provide quick answers to common questions.

Features (Article Q&A)

The chatbot is given an identity (ArticleIQ) and answers questions about the article being viewed, using RAG (Retrieval-Augmented Generation) over the article content:

Ask Questions: Users can ask questions about specific articles directly from the article detail page.
AI-Generated Answers: The AI will generate answers based on the content of the article, providing users with relevant information.
User-Friendly Interface: The Q&A feature is integrated into the article detail page, making it easy for users to ask questions and receive answers without navigating away from the content.
RAG Integration: The AI will use RAG (Retrieval-Augmented Generation) to provide accurate and contextually relevant answers based on the article content.
Real-Time Responses: Users will receive answers in real-time, enhancing the overall user experience and engagement with the content.

In addition to the site-wide chatbot, article-specific chatbots are also available on each article detail page. These chatbots are tailored to the content of the specific article, allowing users to ask questions and receive answers that are directly relevant to the article they are reading.

Prerequisites & Installation (Article Q&A)

[!TIP] This feature is integrated into the existing backend and frontend, so you don't need to set up anything separately.

Prerequisites: Ensure that you have the backend and frontend set up as described above.
Install Dependencies: Make sure you have the necessary dependencies installed in both the backend and frontend directories. You can run npm install in the root directory to install dependencies for all components.
Configure Environment Variables: Ensure that you have the necessary environment variables set up in your .env file. This includes the Google AI API key and any other required variables.
Deploy the Backend: If you haven't already, deploy the backend to Vercel using vercel --prod in the backend directory. Or, just run locally with npm run dev.
Deploy the Frontend: If you haven't already, deploy the frontend to Vercel using vercel --prod in the frontend directory. Or, just run locally with npm run dev.
Test the Feature: Once everything is set up, you can test the article Q&A feature by navigating to any article detail page and asking questions. The AI will generate answers based on the content of the article.
That's it! The article Q&A feature is now integrated into the existing system, providing users with an enhanced experience and quick access to information.

Using the Article Q&A Feature

To use the article Q&A feature, navigate to any article detail page and look for the Q&A section. You can ask questions related to the article, and the AI will generate answers based on the content provided.

Feel free to ask any questions related to the article, and the AI will do its best to provide accurate and relevant answers. This feature is designed to enhance user engagement and provide quick access to information without having to read through the entire article.

Sitewide AI Chat

The sitewide chat lets users ask open-ended questions across the full corpus—not just a single article—while keeping every claim cited. The system provides two chat paths: a direct Gemini-powered RAG pipeline and an orchestrated multi-agent pipeline.

Direct RAG Chat (`/api/chat/sitewide`)

RAG over the whole library: The backend converts queries to gemini-embedding-001 vectors, searches Pinecone for top matches, and builds a context block with [Source N] slots.
Streaming Gemini replies: Gemini 2.0 Flash / Flash Lite streams text via Server-Sent Events (SSE) with automatic API-key/model failover and history compaction to stay within token budgets.
Inline citations & warnings: Responses carry citation metadata plus hallucination checks (missing citations, invalid refs, overconfident claims, uncited numbers). The frontend renders clickable superscripts and yellow warnings if issues are detected.
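
The context block and the citation checks described above could look roughly like the sketch below. It is written in Python purely for illustration and is not the backend's code: the function names, the source dictionary keys (`title`, `text`), and the warning strings are assumptions, and only three of the listed checks (missing citations, invalid refs, uncited numbers) are shown. The number check assumes the `[Source N]` marker sits inside the sentence it supports.

```python
import re
from typing import Dict, List

CITATION = re.compile(r"\[Source (\d+)\]")


def build_context(sources: List[Dict[str, str]]) -> str:
    """Number each retrieved source so the model can cite it as [Source N]."""
    blocks = []
    for number, source in enumerate(sources, start=1):
        blocks.append(f"[Source {number}] {source['title']}\n{source['text']}")
    return "\n\n".join(blocks)


def check_citations(answer: str, source_count: int) -> List[str]:
    """Return warning labels for an answer generated from `source_count` sources."""
    warnings: List[str] = []
    cited = [int(n) for n in CITATION.findall(answer)]

    if not cited:
        warnings.append("missing citations")
    if any(n < 1 or n > source_count for n in cited):
        warnings.append("invalid refs")

    # Flag any sentence that contains a digit but carries no citation marker.
    for sentence in re.split(r"(?<=[.!?])\s+", answer):
        without_markers = CITATION.sub("", sentence)
        if re.search(r"\d", without_markers) and not CITATION.search(sentence):
            warnings.append("uncited numbers")
            break

    return warnings
```

An empty list from `check_citations` means none of the three checks fired; a non-empty list is the kind of signal the frontend could surface as a warning.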

Orchestrated Multi-Agent Chat (`/api/orchestrator/chat`)

16 specialized agents: 8 Anthropic (Claude) primary + 8 Google (Gemini) fallback agents covering article search, Q&A, topic exploration, trend analysis, bias detection, clarification, and quality review.
Intent-based routing: LLM-based intent classification routes queries to the best-fit agent, with keyword heuristic fallback.
Dual-provider failover: Anthropic primary with automatic Google failover after retry exhaustion (3 attempts, exponential backoff with jitter).
Grounding validation: 10 canonical rules applied post-generation to detect hallucinations, missing citations, and unsupported claims.
Cost tracking: Real-time daily budget enforcement with per-model cost breakdown.
Streaming support: SSE streaming via /api/orchestrator/chat/stream.
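
The dual-provider failover can be sketched as a retry loop around the primary call, followed by a single fallback call. This is a hedged illustration in Python, not the orchestrator's TypeScript implementation: `call_with_failover`, its parameters, the 0.5-second base delay, and the "full jitter" formula are assumptions; only the 3 attempts and the backoff-with-jitter idea come from the description above. The `sleep` and `rng` hooks exist so the behavior can be tested deterministically.

```python
import random
import time
from typing import Callable


def call_with_failover(
    primary: Callable[[], str],
    fallback: Callable[[], str],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[], float] = random.random,
) -> str:
    """Try the primary provider with backoff, then fall back once."""
    for attempt in range(attempts):
        try:
            return primary()
        except Exception:  # broad catch for brevity; real code would filter errors
            if attempt < attempts - 1:
                # Exponential backoff with jitter: up to base_delay * 2^attempt.
                sleep(base_delay * (2 ** attempt) * rng())

    # Primary retries are exhausted; hand the request to the fallback provider.
    try:
        return fallback()
    except Exception as exc:
        raise RuntimeError("both providers failed") from exc
```

In practice `primary` and `fallback` would wrap the Anthropic and Google calls respectively; here they are plain callables so the control flow stays visible.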

Rich Client UX

frontend/pages/ai_chat.tsx provides multiple conversations, local storage persistence, typing indicators, and interactive source cards.
Message editing with conversation branching: Users can click the pencil icon on any sent message to edit it inline. Submitting the edit truncates the conversation at that point (removing all subsequent messages) and re-sends the edited message with only the preceding history, effectively branching the conversation from the edit point.
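
The edit-and-branch behavior reduces to a list truncation. The sketch below expresses that logic in Python for illustration; it is not the code in frontend/pages/ai_chat.tsx, and the message shape (`role`, `content`) and the function name `branch_on_edit` are assumptions.

```python
from typing import Dict, List, Tuple

Message = Dict[str, str]


def branch_on_edit(
    messages: List[Message], index: int, new_text: str
) -> Tuple[List[Message], List[Message]]:
    """Edit the user message at `index` and drop everything after it.

    Returns (new_conversation, history_to_send): the conversation ends with
    the edited message, and the history holds only the messages before it.
    """
    if not 0 <= index < len(messages):
        raise IndexError("no message at that position")
    if messages[index]["role"] != "user":
        raise ValueError("only sent (user) messages can be edited")

    history = messages[:index]                     # preceding turns are kept
    edited = {**messages[index], "content": new_text}
    return history + [edited], history             # later turns are discarded
```

The original list is left untouched, so a caller could keep the old branch around if it ever wanted to offer an undo.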

How It Works

sequenceDiagram
    participant User
    participant UI as Frontend (ai_chat.tsx)
    participant API as Backend /api/chat/sitewide
    participant Vec as Pinecone (ai-gov-articles)
    participant LLM as Gemini 2.0 Flash

    User->>UI: Ask question
    UI->>API: POST userMessage + trimmed history
    API->>Vec: Embed query (gemini-embedding-001) & semantic search
    Vec-->>API: Top K articles + metadata
    API->>LLM: Stream request with context + citations + guardrails
    LLM-->>API: SSE chunks (text)
    API-->>UI: SSE events (status/context/citations/chunk/warnings/done)
    UI-->>User: Live updates, clickable citations, warnings if any

Streaming Contract

Endpoint: POST /api/chat/sitewide
Events: status, context, citations, chunk, warnings, done
Payloads: JSON per event (e.g., {"message":"Generating response..."}, {"sources":[{number,title,url,score}]}, {"text":"partial reply"})
Frontend handling: Streams append text into the active message bubble; citations hydrate source cards; warnings show a yellow banner above the AI reply.
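
A client consuming this contract has to split the stream into named events and fold them into a single reply. The following Python sketch shows one way to do that under stated assumptions: it expects standard `event:` / `data:` lines separated by blank lines, it reads `sources` from whichever of the `context` or `citations` events carries that key, and it assumes the `warnings` event payload looks like `{"warnings": [...]}`, which the contract above does not spell out.

```python
import json
from typing import Any, Dict, Iterable, Iterator, Tuple

Event = Tuple[str, Dict[str, Any]]


def parse_sse(lines: Iterable[str]) -> Iterator[Event]:
    """Yield (event_name, json_payload) pairs from raw SSE lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":                      # a blank line dispatches the event
            if data:
                yield event, json.loads("\n".join(data))
            event, data = "message", []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            data.append(value[1:] if value.startswith(" ") else value)


def collect_reply(events: Iterable[Event]) -> Dict[str, Any]:
    """Fold a stream of events into the final text, sources, and warnings."""
    reply: Dict[str, Any] = {"text": "", "sources": [], "warnings": [], "done": False}
    for name, payload in events:
        if name == "chunk":
            reply["text"] += payload.get("text", "")
        elif name in ("context", "citations") and "sources" in payload:
            reply["sources"] = payload["sources"]
        elif name == "warnings":
            reply["warnings"] = payload.get("warnings", [])  # assumed shape
        elif name == "done":
            reply["done"] = True
            break
    return reply
```

With an HTTP client that exposes the response body line by line, `collect_reply(parse_sse(lines))` would produce the same accumulated state that the frontend builds incrementally in the message bubble.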

Intelligent Recommendation System

SynthoraAI employs a sophisticated, multi-layered recommendation engine to deliver personalized and contextually relevant content to users.

Command Line Interface (CLI)

The aicc command gives you a single entrypoint to manage your entire monorepo—frontend, backend, crawler—and perform content‐curation tasks.

Installation

From the project root:

# Install dependencies
npm install

# Link the CLI so `aicc` is on your PATH
npm link

[!TIP] This sets up a global symlink named aicc pointing at ./bin/aicc.js.

Usage

Run aicc with no arguments to display help:

aicc

Workspace Management

| Command | Description |
| --- | --- |
| `aicc dev` | Start all services in development mode |
| `aicc dev <service>` | Start one service (`frontend` / `backend` / `crawler`) in dev |
| `aicc build` | Build all services for production |
| `aicc build <service>` | Build one service |
| `aicc start` | Start all services in production mode |
| `aicc start <service>` | Start one service |
| `aicc lint` | Run Prettier across all packages |
| `aicc format` | Alias for `aicc lint` |

Examples:

# Run frontend + backend + crawler in parallel
aicc dev

# Build only the backend
aicc build backend

# Start crawler in prod mode
aicc start crawler

# Lint & format everything
aicc lint

Crawling

Kick off your scheduled crawler (schedule/fetchAndSummarize.ts) in the crawler package:

aicc crawl

This will cd crawler and run npm run crawl under the hood.

Article CRUD

Interact with your backend’s /api/articles endpoints directly from the CLI:

| Command | Description |
| --- | --- |
| `aicc article create --title <t> --content <c> [...flags]` | Create a new article |
| `aicc article get <id>` | Fetch one article by its MongoDB `_id` |
| `aicc article list [--limit N]` | List articles, optionally limiting the number |
| `aicc article update <id> [--flags]` | Update fields on an existing article |
| `aicc article delete <id>` | Delete an article by ID |

Flags for create & update:

--title <string> — Article title
--content <string> — Full article content (stored in content)
--summary <string> — Brief summary
--topics <topic1> ... — One or more topic tags
--source <string> — Source identifier

Examples:

# Create a new article
aicc article create \
  --title "AI in 2025" \
  --content "Deep dive into AI trends..." \
  --summary "Key trends in AI" \
  --topics ai machine-learning \
  --source "manual-cli"

# Get an article (a MongoDB ObjectId is 24 hex characters)
aicc article get 64a1f2d3e4b5c6a7d8e9f0a1

# List up to 5 articles
aicc article list --limit 5

# Update title and topics
aicc article update 64a1f2d3e4b5c6a7d8e9f0a1 \
  --title "AI Trends 2025" \
  --topics ai trends

# Delete an article
aicc article delete 64a1f2d3e4b5c6a7d8e9f0a1

With aicc in your toolbox, you can develop, build, run, lint, crawl, and manage content—all from one unified interface.

Shell Scripts & Makefile

The project includes several shell scripts and a Makefile to simplify common tasks. These scripts are located in the scripts directory and can be executed directly from the command line.

Shell Scripts

Various shell scripts are provided for tasks such as:

Starting the backend or frontend
Running the crawler
Building the project
Running tests
Deploying to Vercel
and more...

These scripts are designed to be easy to use and can be executed with a simple command. Visit the scripts directory for more details on each script.

To run a shell script, use the following command:

chmod +x scripts/<script_name>.sh
./scripts/<script_name>.sh

`daily.sh` Script in Root Directory

The daily.sh script is a shell script that automates the process of running the crawler and sending out the newsletter. It is designed to be run daily, and it performs the following tasks:

Runs the crawler to fetch the latest articles.
Processes the articles and generates summaries.
Cleans up any temporary files or artifacts, as well as dirty/corrupted articles.
Sends out the newsletter to subscribers with the latest articles.
Performs any other necessary tasks related to the daily operation of the application.

To run the daily.sh script, use the following command:

chmod +x daily.sh
./daily.sh

Please ensure that you have the necessary permissions and environment variables set up before running the script.

Also, you can set up a cron job to run this script automatically at a specified time each day. To do so, simply run the install_daily_cron.sh script, which will install the cron job for you.

chmod +x install_daily_cron.sh
./install_daily_cron.sh

This will create a cron job that runs the daily.sh script every day at 16:00 (4:00 PM) UTC. You can adjust the timing in the install_daily_cron.sh script if needed.

To confirm, run the following command to view your cron jobs:

crontab -l

This will display a list of all your cron jobs, including the one you just created for the daily.sh script.

[!CAUTION] Be sure to keep your computer on and connected to the internet for the cron job to run successfully at the scheduled time!

[!TIP] Logs will be saved in the daily.log file in the root directory, so you can check the output of the script and any errors that may occur.

Makefile

The Makefile provides a convenient way to run common tasks using the make command. It includes targets for building, testing, and deploying the project.

To use the Makefile, navigate to the project root directory and run:

make <target>

Example Makefile Targets

| Target | Description |
| --- | --- |
| `bootstrap` | Install dependencies for all packages |
| `clean` | Remove build artifacts and temporary files |
| `deps` | Install dependencies for all packages |
| `dev:frontend` | Start the frontend in development mode |
| `dev:backend` | Start the backend in development mode |

and more...

To see all available targets, run:

make help

This will display a list of all targets defined in the Makefile along with their descriptions.

Testing

Backend

The backend uses Jest + Supertest (with an in-memory MongoDB) for unit and integration tests. From the backend workspace root, run:

# Install dependencies (npm ci installs exactly what package-lock.json specifies)
npm ci

# Run all tests once
npm run test

# Rerun tests on file changes
npm run test:watch

# Generate a coverage report
npm run test:coverage

[!NOTE] If your changes do not involve AI functionality, you can set GOOGLE_AI_API_KEY=dummy in backend/.env so that tests do not fail because of a missing API key.

Frontend

The frontend uses Playwright for end-to-end testing. From the frontend workspace directory, run:

# Install dependencies
npm ci

# Headless E2E run (default)
npm run test:e2e

# Run in headed mode (open real browser windows)
npm run test:e2e:headed

# Open the HTML report after a run
npm run test:e2e:report

By default, the Playwright report is served at http://localhost:9323 (or another port as printed in your console).

Crawler

The crawler uses Jest + ts-jest to test the fetchAndSummarize job. From the crawler workspace root, run:

# Install dependencies
npm ci

# Run all crawler tests once
npm run test

# (Optional) Re-run on file changes
npm run test -- --watch

# To execute an actual crawl against your configured URLs:
npm run crawl

Make sure required environment variables (e.g. MONGODB_URI, CRAWL_URLS, CRAWL_MAX_LINKS, etc.) are defined before invoking npm run crawl and any test commands. Using npm ci in each workspace ensures a clean, reproducible installation based on your lockfile.

Continuous Integration / Continuous Deployment (CI/CD)

The project uses GitHub Actions for CI/CD. The workflow is defined in .github/workflows/ci.yml. It includes:

Linting: Runs ESLint and Prettier checks on all code changes.
Testing: Executes unit tests for the backend and frontend.
Deployment: Automatically deploys the backend and frontend to Vercel on successful merges to the main branch.
Docker: Builds and pushes Docker images for the backend and crawler.
Cron Jobs: Configures scheduled tasks for the backend and crawler.
Environment Variables: Sets up environment variables for the backend and crawler.
and more...

CI/CD Workflow

Additional .yml files are also available for specific tasks, such as backend-ci.yml, crawler-ci.yml, and frontend-ci.yml.

Deployment

The project fully supports deployment with AWS, Kubernetes, and Terraform for infrastructure as code (IaC). It utilizes blue/green and canary deployment strategies for zero-downtime releases.

For detailed deployment instructions, refer to the infrastructure/ directory, which contains Terraform scripts and Kubernetes manifests.

License

This project is licensed under the MIT License. See the LICENSE file for more details.

Contact

If you have any questions or suggestions, feel free to reach out to the repository maintainer:

David Nguyen
- LinkedIn
- GitHub
- Email
- Website

I will be happy to assist you with any questions or issues you may have regarding this project.

[!TIP] If I don't know the answer, I'll be able to forward your question to the right person in the AICC team who can help you!

Conclusion

The SynthoraAI - AI-Powered Article Content Curator project brings together a powerful backend, an intelligent crawler, a newsletter service, and a modern frontend to deliver up-to-date, summarized government-related articles. Leveraging advanced technologies like Google Generative AI, Next.js, Express.js, and MongoDB, the system is both scalable and robust. Whether you’re a government staff member or a curious public user, this solution provides a streamlined, user-friendly experience to quickly access relevant, summarized content.

[!NOTE] This project is a work in progress, and contributions are welcome! If you have ideas for improvements, bug fixes, or new features, please feel free to open an issue or submit a pull request.

Thank you for exploring this project! If you have any questions, suggestions, or contributions, feel free to reach out. Your feedback is invaluable in making this project even better. Cheers to a more informed world! 🚀

