Legalhack
Demo app for legislation.gov.uk endpoint calling, entity extraction and query building
Installation
npx legalhack
legalhack
Legal document analysis tool built for The National Archives Legal Data Hackathon (Challenge 2: Trust, AI, and Access to Justice).
Extracts structured entities from legal documents (statutes, provisions, dates, risks, organisations, etc.) and fans out targeted queries to legislation.gov.uk endpoints to retrieve relevant legislation. Available as both a CLI and a web UI with real-time SSE streaming.
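As an illustration of the pattern-based side of entity extraction, the sketch below pulls statute and provision references out of a string and records their character offsets. The `PatternEntity` shape, the two regexes, and the `extractPatterns` name are assumptions made for this example; they are not taken from the project's `pattern-extraction.ts`.

```typescript
// Hypothetical entity shape; the project's real types live in enriched-types.ts.
interface PatternEntity {
  type: "statute" | "provision";
  text: string;
  start: number; // offset of the first character of the match
  end: number;   // exclusive end offset
}

// Illustrative patterns only (assumptions, not the project's regexes).
// Statute: one or more capitalised words followed by "Act <year>".
const STATUTE = /\b(?:[A-Z][a-z]+ )+Act \d{4}\b/;
// Provision: "section 12", "section 12A", "section 170(1)" and similar.
const PROVISION = /\b[Ss]ection \d+[A-Z]?(?:\(\d+\))*/;

function extractPatterns(text: string): PatternEntity[] {
  const entities: PatternEntity[] = [];
  const patterns: Array<[PatternEntity["type"], RegExp]> = [
    ["statute", STATUTE],
    ["provision", PROVISION],
  ];
  for (const [type, pattern] of patterns) {
    // Fresh global regex per scan so lastIndex state is never shared.
    const re = new RegExp(pattern.source, "g");
    let match: RegExpExecArray | null;
    while ((match = re.exec(text)) !== null) {
      entities.push({
        type,
        text: match[0],
        start: match.index,
        end: match.index + match[0].length,
      });
    }
  }
  // Return entities in document order.
  return entities.sort((a, b) => a.start - b.start);
}

const sample = "Breach of the Data Protection Act 2018, section 170(1).";
const found = extractPatterns(sample);
// found[0] is the statute at offsets 14..38, found[1] the provision at 40..54.
```

Keeping offsets alongside the matched text is what makes it possible to highlight each entity in the source document later.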
Features
- Document Parsing -- regex-based line classification into questions, legislation references, facts, and opinions with character offsets
- Named Entity Recognition -- HuggingFace Transformers.js (BERT NER) extracts organisations, people, locations, and regulatory bodies
- Pattern Extraction -- regex patterns for statutes, provisions, courts, legal terms, implicit legislation (UK GDPR, TUPE, etc.), financial terms, dates, deliverables, risk indicators
- Targeted Query Generation -- maps extracted entities to search queries with strategy (exact/semantic/keyword), reason, and target endpoints
- API Fan-Out -- parallel queries across 9 legislation.gov.uk endpoints (7 MCP tools + 2 REST) via Promise.allSettled
- Source Tracing -- colour-coded entity highlights in the original document with bidirectional hover tracing between results, queries, entity pills, and source text
- SSE Streaming -- phased results stream in real-time (parsed -> queries -> results -> enriched -> complete)
- CLI + Web UI -- both interfaces share the same pipeline module
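The document-parsing step listed above can be pictured as a line-by-line classifier. The following is a minimal sketch under stated assumptions: the `ClassifiedLine` shape, the heuristics, and the order in which they are tried are all hypothetical and far simpler than whatever `src/parser/index.ts` actually does.

```typescript
type LineKind = "question" | "legislation" | "opinion" | "fact";

// Hypothetical result shape for this example.
interface ClassifiedLine {
  kind: LineKind;
  text: string;
  start: number; // character offset of the trimmed line in the document
  end: number;   // exclusive end offset
}

// Illustrative heuristics (assumptions). No "g" flag, so test() is stateless.
const LEGISLATION = /\b(?:Act|Regulations|Order) \d{4}\b/;
const OPINION = /\b(?:I think|in my view|arguably|it seems|we believe)\b/i;

function classifyLines(doc: string): ClassifiedLine[] {
  const out: ClassifiedLine[] = [];
  let offset = 0; // offset of the current raw line within doc
  for (const raw of doc.split("\n")) {
    const text = raw.trim();
    if (text.length > 0) {
      const start = offset + raw.indexOf(text); // skip leading whitespace
      let kind: LineKind = "fact"; // default when nothing else matches
      if (text.endsWith("?")) kind = "question";
      else if (LEGISLATION.test(text)) kind = "legislation";
      else if (OPINION.test(text)) kind = "opinion";
      out.push({ kind, text, start, end: start + text.length });
    }
    offset += raw.length + 1; // +1 for the newline consumed by split
  }
  return out;
}

const notes = [
  "Is the dismissal unfair?",
  "See the Employment Rights Act 1996.",
  "I think the claim is strong.",
  "The employee was dismissed in May.",
].join("\n");
const classified = classifyLines(notes);
// kinds, in order: question, legislation, opinion, fact
```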
Quick Start
npm install
cp .env.example .env
# Check endpoint connectivity
npm start -- init
# Analyse documents
npm start -- run --dir input_docs
Web UI
cd web && npm install
# Terminal 1: Hono backend (port 3333)
npm run dev:server
# Terminal 2: Vite dev server (port 5173)
npm run dev
Open http://localhost:5173, paste a legal document, click Analyse.
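Once Analyse is clicked, the backend streams results to the browser in the phases listed under Features. The sketch below shows how one such phase could be encoded and decoded as a server-sent event frame. It assumes each phase is sent as a named SSE event with a JSON payload; the README does not confirm the actual event names or payload shapes, and `formatSseEvent` and `parseSseEvent` are hypothetical helpers.

```typescript
// Phase names taken from the Features list; using them as SSE event names is an assumption.
const PHASES = ["parsed", "queries", "results", "enriched", "complete"] as const;
type Phase = (typeof PHASES)[number];

// Encode one phase as an SSE frame: an "event:" line, a "data:" line, then a blank line.
function formatSseEvent(phase: Phase, payload: Record<string, unknown>): string {
  // JSON.stringify never emits raw newlines, so a single data line is enough.
  return `event: ${phase}\ndata: ${JSON.stringify(payload)}\n\n`;
}

// Decode a single frame back into its event name and JSON payload.
// Throws if the frame has no data line (JSON.parse of an empty string).
function parseSseEvent(frame: string): { phase: string; payload: unknown } {
  let phase = "message"; // SSE default when no event name is given
  const data: string[] = [];
  for (const line of frame.split("\n")) {
    if (line.startsWith("event:")) phase = line.slice(6).trim();
    else if (line.startsWith("data:")) data.push(line.slice(5).trim());
  }
  return { phase, payload: JSON.parse(data.join("\n")) };
}

const frame = formatSseEvent("parsed", { items: 2 });
const decoded = parseSseEvent(frame);
```

In a browser, an `EventSource` listener registered for each phase name would receive the `data` string and parse it the same way.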
Environment Variables
MCP_SERVER_URL=https://mcp.l9n.org/mcp # Legislation MCP server
LEGISLATION_BASE_URL=https://www.legislation.gov.uk # REST API base
RESEARCH_USER=research # Optional
RESEARCH_PASS= # Optional
LEGISLATION_CHAT_USER= # Optional
LEGISLATION_CHAT_PASS= # Optional
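One way to read these variables into a typed configuration object is sketched below. The `LegalhackConfig` shape and `loadConfig` function are hypothetical, and falling back to the URLs shown above when a variable is unset is an assumption about the project's behaviour, not something the README states.

```typescript
// Hypothetical config shape for this example.
interface LegalhackConfig {
  mcpServerUrl: string;
  legislationBaseUrl: string;
  researchUser: string | undefined;
  researchPass: string | undefined;
  legislationChatUser: string | undefined;
  legislationChatPass: string | undefined;
}

type Env = Record<string, string | undefined>;

// Treat missing and empty values (e.g. "RESEARCH_PASS=") as unset.
function optional(env: Env, key: string): string | undefined {
  const value = (env[key] ?? "").trim();
  return value.length > 0 ? value : undefined;
}

function loadConfig(env: Env): LegalhackConfig {
  return {
    // Assumed defaults, copied from the example values above.
    mcpServerUrl: optional(env, "MCP_SERVER_URL") ?? "https://mcp.l9n.org/mcp",
    legislationBaseUrl:
      optional(env, "LEGISLATION_BASE_URL") ?? "https://www.legislation.gov.uk",
    researchUser: optional(env, "RESEARCH_USER"),
    researchPass: optional(env, "RESEARCH_PASS"),
    legislationChatUser: optional(env, "LEGISLATION_CHAT_USER"),
    legislationChatPass: optional(env, "LEGISLATION_CHAT_PASS"),
  };
}

const config = loadConfig({ RESEARCH_USER: "research", RESEARCH_PASS: "" });
```

In a Node process the same function could be called as `loadConfig(process.env)` after the `.env` file has been loaded.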
Architecture
                 ┌─────────────┐
                 │  Document   │
                 │ (markdown)  │
                 └──────┬──────┘
                        │
           ┌────────────┼────────────┐
           ▼            ▼            ▼
      ┌──────────┐ ┌──────────┐ ┌──────────┐
      │  Regex   │ │   NER    │ │ Pattern  │
      │  Parser  │ │  (BERT)  │ │ Extract  │
      └────┬─────┘ └────┬─────┘ └────┬─────┘
           │            │            │
           ▼            ▼            ▼
      ┌─────────────────────────────────────┐
      │          Enriched Document          │
      │  questions, facts, opinions,        │
      │  legislation refs, statutes,        │
      │  provisions, courts, legal terms,   │
      │  organisations, dates, risks, ...   │
      └──────────────┬──────────────────────┘
                     │
                     ▼
            ┌─────────────────┐
            │ Query Generator │
            │   (targeted,    │
            │  with reasons)  │
            └────────┬────────┘
                     │
       ┌─────────────┼─────────────┐
       ▼             ▼             ▼
 ┌───────────┐ ┌───────────┐ ┌───────────┐
 │ MCP Tools │ │ MCP Tools │ │ REST API  │
 │ (search)  │ │   (get)   │ │ (search)  │
 └─────┬─────┘ └─────┬─────┘ └─────┬─────┘
       │             │             │
       ▼             ▼             ▼
┌─────────────────────────────────────────┐
│           Legislation Results           │
│  title, url, snippet, relevance         │
└─────────────────────────────────────────┘
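The Query Generator stage in the diagram turns enriched entities into targeted queries, each carrying a strategy, a reason, and target endpoints. A minimal sketch of that mapping follows. The entity type names, the mapping rules, and the `generateQueries` function are assumptions for illustration; only the strategy names and the source identifiers (`mcp-search`, `mcp-semantic`, and so on) come from this README.

```typescript
type Strategy = "exact" | "semantic" | "keyword";

// Hypothetical shapes for this example.
interface Entity {
  type: string;
  text: string;
}

interface TargetedQuery {
  query: string;
  strategy: Strategy;
  reason: string;    // why this query was generated
  targets: string[]; // source identifiers to send it to
}

// Assumed mapping from entity type to query plan; other types produce no query.
function toQuery(entity: Entity): TargetedQuery | null {
  switch (entity.type) {
    case "statute":
      return {
        query: entity.text,
        strategy: "exact",
        reason: `Statute "${entity.text}" is named in the document`,
        targets: ["mcp-search"],
      };
    case "legalTerm":
      return {
        query: entity.text,
        strategy: "semantic",
        reason: `Legal term "${entity.text}" may relate to relevant provisions`,
        targets: ["mcp-semantic", "mcp-sections-semantic"],
      };
    case "implicitLegislation":
      return {
        query: entity.text,
        strategy: "keyword",
        reason: `"${entity.text}" is a common short name for legislation`,
        targets: ["rest-search"],
      };
    default:
      return null;
  }
}

function generateQueries(entities: Entity[]): TargetedQuery[] {
  const seen = new Set<string>();
  const queries: TargetedQuery[] = [];
  for (const entity of entities) {
    const query = toQuery(entity);
    if (query === null) continue;
    // Skip case-insensitive duplicates of the same strategy and text.
    const key = `${query.strategy}:${query.query.toLowerCase()}`;
    if (seen.has(key)) continue;
    seen.add(key);
    queries.push(query);
  }
  return queries;
}

const plan = generateQueries([
  { type: "statute", text: "Equality Act 2010" },
  { type: "statute", text: "equality act 2010" },
  { type: "legalTerm", text: "constructive dismissal" },
  { type: "date", text: "1 May 2024" },
]);
// plan has two queries: one exact, one semantic.
```

Recording a reason on every query is what lets a reader see why a given search was issued, which fits the traceability aim of the tool.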
API Endpoints (legislation.gov.uk)
| Source | Tool/Endpoint | Strategy |
|---|---|---|
| mcp-search | search_legislation | exact (title match) |
| mcp-semantic | search_legislation_semantic | semantic |
| mcp-sections-semantic | search_legislation_sections_semantic | semantic |
| mcp-get | get_legislation | exact (type/year/number) |
| mcp-fragment | get_legislation_fragment | exact |
| mcp-metadata | get_legislation_metadata | exact |
| mcp-toc | get_legislation_table_of_contents | exact |
| rest-search | GET /all?title=... | keyword |
| rest-changes | GET /changes/affected/... | exact |
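The fan-out across these sources relies on `Promise.allSettled`, which waits for every call and reports each outcome separately, so one failing endpoint does not discard the others. The sketch below shows that pattern with stand-in functions; `fanOut`, `EndpointResult`, and the stubbed calls are hypothetical and do not reflect the real MCP or REST clients.

```typescript
// Hypothetical result shape for this example.
interface EndpointResult {
  source: string;   // e.g. "mcp-search" or "rest-search"
  ok: boolean;
  items: unknown[];
  error?: string;
}

type EndpointCall = () => Promise<unknown[]>;

async function fanOut(calls: Record<string, EndpointCall>): Promise<EndpointResult[]> {
  const sources = Object.keys(calls);
  // Start every call at once; allSettled never rejects.
  const settled = await Promise.allSettled(sources.map((source) => calls[source]()));
  return settled.map((outcome, i): EndpointResult =>
    outcome.status === "fulfilled"
      ? { source: sources[i], ok: true, items: outcome.value }
      : { source: sources[i], ok: false, items: [], error: String(outcome.reason) }
  );
}

// Stand-in endpoint calls: one succeeds, one fails.
const demo = fanOut({
  "mcp-search": async () => [{ title: "Equality Act 2010" }],
  "rest-search": async () => {
    throw new Error("timeout");
  },
});
```

With `Promise.all` the rejected `rest-search` call would have rejected the whole batch; here it becomes one failed entry next to the successful `mcp-search` entry.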
Project Structure
src/
├── cli.ts                      # CLI entry (init, run commands)
├── pipeline/                   # Shared pipeline orchestration
├── parser/
│   ├── index.ts                # Regex line classifier
│   ├── ner.ts                  # HuggingFace BERT NER
│   ├── pattern-extraction.ts   # Domain-specific regex patterns
│   ├── enrichment.ts           # Parallel NER + pattern orchestrator
│   └── enriched-types.ts       # Entity type definitions
├── queries/
│   ├── index.ts                # Query generation from parsed items
│   └── enriched.ts             # Queries from enriched entities
├── api/
│   ├── mcp-client.ts           # MCP server client (7 tools)
│   ├── rest-client.ts          # legislation.gov.uk REST client
│   └── unified.ts              # Fan-out orchestrator
└── output/
    ├── formatter.ts            # Markdown formatting
    └── writer.ts               # File I/O
web/
├── src/
│   ├── server/routes/analyse.ts            # SSE streaming + REST endpoints
│   ├── components/
│   │   ├── pages/DocumentAnalysis.tsx      # Main analysis UI
│   │   └── HighlightedDocument.tsx         # Entity-highlighted source viewer
│   └── lib/build-entity-spans.ts           # Non-overlapping span calculation
└── vite.config.ts
Testing
npm test # Run all 234 tests
npm run test:watch # Watch mode
The suite comprises 15 test files covering the parser, NER, pattern extraction, query generation, API clients, fan-out orchestration, output formatting, the pipeline, and entity span building.
Tech Stack
| Layer | Technology |
|---|---|
| Language | TypeScript (strict mode) |
| CLI | Commander, chalk, ora |
| NLP | @huggingface/transformers (BERT NER) |
| MCP | @modelcontextprotocol/sdk (streamable-http) |
| Web Frontend | React 19, Tailwind CSS 4, Lucide icons |
| Web Backend | Hono (SSE streaming) |
| Build | Vite, tsc |
| Testing | vitest |
| Linting | eslint + typescript-eslint |
CLI Output
Results are written to output/<timestamp>/<document>/:
output/
└── 2026-02-23T16-00-00/
    └── case-notes/
        ├── parsed.md    # Extracted questions, facts, opinions, legislation refs
        ├── queries.md   # Generated query plan with reasons
        ├── results.md   # Search results with links
        └── raw/         # Raw JSON per endpoint
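The directory layout above can be derived from the run time and the input file name. The sketch below is one way to do that; `outputDir` is a hypothetical helper, and the use of UTC and of the file name without its extension as the document folder are assumptions inferred from the example tree.

```typescript
// Build "output/<timestamp>/<document>" for one analysed file.
function outputDir(documentPath: string, now: Date): string {
  // "2026-02-23T16:00:00.000Z" -> "2026-02-23T16-00-00" (UTC assumed;
  // colons replaced so the name is valid on all file systems).
  const stamp = now.toISOString().slice(0, 19).replace(/:/g, "-");
  // Last path segment without its extension, e.g. "case-notes.md" -> "case-notes".
  const segments = documentPath.split("/");
  const base = segments[segments.length - 1];
  const name = base.replace(/\.[^.]+$/, "");
  return `output/${stamp}/${name}`;
}

const dir = outputDir("input_docs/case-notes.md", new Date("2026-02-23T16:00:00Z"));
// dir === "output/2026-02-23T16-00-00/case-notes"
```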
Web UI
The web UI provides a split-pane interface:
- Left panel -- Original document with colour-coded entity highlighting (statutes in purple, provisions in indigo, dates in sky blue, courts in slate, etc.)
- Right panel -- Three resizable sections:
  - Extracted Entities -- Categorised pills (queried entities marked with search icon)
  - Queries -- Generated queries with strategy badges and target endpoints
  - Results by Endpoint -- Grouped by API source with parameter details
Hover tracing: Hover over a result entry to trace back through the query, entity pill, and source text span. Hover over an entity pill to highlight its queries and results.
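Highlighting entities in the source text requires spans that do not overlap, since two entities can cover the same characters (for example a year inside a statute title). A minimal sketch of one resolution strategy follows: sort by start offset, prefer the longer span on ties, and drop anything that overlaps a span already kept. This greedy rule and the `Span` shape are assumptions; `build-entity-spans.ts` may resolve overlaps differently, for instance by entity type priority.

```typescript
// Hypothetical span shape for this example.
interface Span {
  start: number; // inclusive character offset
  end: number;   // exclusive character offset
  type: string;
}

function buildNonOverlappingSpans(spans: Span[]): Span[] {
  // Copy before sorting so the caller's array is left untouched.
  const sorted = spans.slice().sort(
    (a, b) => a.start - b.start || (b.end - b.start) - (a.end - a.start)
  );
  const kept: Span[] = [];
  let cursor = 0; // end offset of the last kept span
  for (const span of sorted) {
    // Keep non-empty spans that begin at or after the previous span's end.
    if (span.end > span.start && span.start >= cursor) {
      kept.push(span);
      cursor = span.end;
    }
  }
  return kept;
}

const spans = buildNonOverlappingSpans([
  { start: 34, end: 38, type: "date" },      // "2018" inside the statute title
  { start: 14, end: 38, type: "statute" },   // "Data Protection Act 2018"
  { start: 40, end: 54, type: "provision" }, // "section 170(1)"
]);
// spans keeps the statute and the provision; the nested date is dropped.
```

The kept spans can then be rendered left to right, with the plain text between them emitted unchanged.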
Development
The project follows a TDD workflow, with git hooks that run lint and tests on each commit:
npm run lint # ESLint
npm run typecheck # tsc --noEmit
npm test # vitest
Hackathon Context
Built for The National Archives Legal Data Hackathon, Challenge 2: Trust, AI, and Access to Justice. The tool demonstrates how AI-assisted legal document analysis can surface relevant legislation from authoritative government sources, with full traceability from extracted entities back to source text.
License
MIT
