Night Owl Research Agent
Fully automatic AI research agent for Geoscientists, Remote Sensing researchers, and GIScientists. Features harness engineering, GeoBenchmark (OLS/GWR/MGWR), journal templates, and MCP servers.
NORA (Night Owl Research Agent)
A fully automatic, domain-aware AI research agent for Geoscientists, Remote Sensing researchers, and GIScientists, powered entirely by Claude Code skills.

Quick Start
NORA runs inside Claude Code. There is no Python entry point, no server to spin up, and no build step: you just drop the skills into Claude Code's skill directory and invoke the launcher.
Step 1: Install Claude Code
Install Claude Code first. Any of the official distributions works:
- CLI (recommended):
npm install -g @anthropic-ai/claude-code
claude --version
- Desktop app (macOS / Windows): download from https://claude.com/claude-code
- VS Code extension: install "Claude Code" from the Marketplace
- Web: https://claude.ai/code
Sign in once with your Anthropic account so Claude Code can reach the API.
Step 2: Get NORA onto your machine
git clone https://github.com/GRIND-Lab-Core/night_owl_research_agent.git
cd night_owl_research_agent
Step 3: Install the skills into Claude Code
Claude Code looks for skills under ~/.claude/skills/ (user-level, available in every project) or <project>/.claude/skills/ (project-local). Copy the entire skills/ folder from this repo into one of those locations.
macOS / Linux (user-level, recommended):
mkdir -p ~/.claude/skills
cp -R skills/* ~/.claude/skills/
Windows PowerShell (user-level):
New-Item -ItemType Directory -Force -Path "$env:USERPROFILE\.claude\skills" | Out-Null
Copy-Item -Recurse -Force .\skills\* "$env:USERPROFILE\.claude\skills\"
Windows bash / Git Bash:
mkdir -p "$USERPROFILE/.claude/skills"
cp -R skills/* "$USERPROFILE/.claude/skills/"
Project-local alternative (skills only visible when Claude Code is opened in this folder):
mkdir -p .claude/skills
cp -R skills/* .claude/skills/
Also copy the launcher slash command so /launcher is available:
# macOS / Linux
mkdir -p ~/.claude/commands
cp .claude/commands/launcher.md ~/.claude/commands/
# Windows PowerShell
New-Item -ItemType Directory -Force -Path "$env:USERPROFILE\.claude\commands" | Out-Null
Copy-Item -Force .\.claude\commands\launcher.md "$env:USERPROFILE\.claude\commands\"
Verify the install: open Claude Code and run:
/skills
You should see the NORA skills (full-pipeline, lit-review, idea-discovery-pipeline, deploy-experiment, paper-draft, …) listed.
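If /skills comes up empty, you can inspect the target directory directly. A quick cross-platform sketch (the helper name is illustrative, not part of NORA; it assumes the user-level install path from the step above):

```python
from pathlib import Path

def installed_skills(root=None):
    """List skill directories that contain a SKILL.md under ~/.claude/skills."""
    root = Path(root) if root else Path.home() / ".claude" / "skills"
    # Each NORA skill is a directory whose entry point is a SKILL.md file.
    return sorted(p.parent.name for p in root.glob("*/SKILL.md"))
```

An empty list means the copy step did not land the skills where Claude Code looks.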
Step 4: Start a research session
Open the night_owl_research_agent folder in Claude Code (this gives NORA access to CLAUDE.md, RESEARCH_PLAN.md, output/, memory/, and tools/), then pick one of the two entry points:
Option A: the interactive launcher (best for first-time users):
/launcher
The launcher walks you through a short questionnaire (research topic, stage to start from, and the control flags AUTO_PROCEED, HUMAN_CHECKPOINT, COMPACT_MODE, REVIEWER_DIFFICULTY) and routes to the correct skill.
Option B: the end-to-end pipeline (best when you already know what you want to run):
Skill: full-pipeline
"Your research direction here, e.g. 'urban soundscape inequality via street-view + audio foundation models'"
or, if you prefer a slash-style invocation:
/full-pipeline "your research direction"
full-pipeline chains all four stages:
idea-discovery-pipeline → deploy-experiment → auto-review-loop → generate-report
and then hands off to paper-writing-pipeline for the manuscript.
Tip: for reproducibility, fill in RESEARCH_PLAN.md (or BRIEF.md) in the project root before launching. When either file is present, skills read it as the authoritative brief and ignore conflicting $ARGUMENTS.
Step 5: (Optional) Enable extras
- MCP servers: edit .mcp.json and register with Claude Code (/mcp inside the chat) to enable filesystem, fetch, arxiv_mcp, geo_mcp, github, and brave_search. See mcp/README_MCP.md.
- Hooks: settings.json wires harness/hooks/*.sh into Claude Code's lifecycle (writes handoff.json on session end, validates tool use, sends desktop notifications). On Windows, run the hook scripts via Git Bash or WSL.
- W&B: if your experiments use Weights & Biases, run wandb login once on the host where deploy-experiment will launch training.
- API keys: set ANTHROPIC_API_KEY (for Claude Code), plus any optional keys you want to use (SEMANTIC_SCHOLAR_API_KEY, GITHUB_TOKEN, BRAVE_API_KEY).
Prerequisites summary
| Requirement | Why |
|---|---|
| Claude Code (CLI / desktop / web / VS Code) | Runtime for skills |
| Anthropic account + API credit | Powers the agent |
| Python 3.10+ with pip install arxiv requests | tools/arxiv_fetch.py, tools/semantic_scholar_fetch.py |
| Conda env with geopandas, pysal, libpysal, esda, spreg, mgwr, rasterio, xarray | Track B (spatial) experiments |
| CUDA GPU (local / remote SSH / Modal) | Track A (deep-learning) experiments (optional) |
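The Python requirements in the table can be sanity-checked before a long run. A minimal sketch (the helper name is illustrative, not part of NORA):

```python
import importlib.util

def check_python_prereqs(modules=("arxiv", "requests")):
    """Return {module_name: True/False} for whether each dependency is importable."""
    return {m: importlib.util.find_spec(m) is not None for m in modules}

# Any False entry means the matching tools/ script would fail at import time.
```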
What It Does
NORA automates the complete academic research lifecycle using Claude Code skills: Markdown-defined workflows that Claude reads and executes, selecting appropriate tools and methods based on context.
- Literature review: searches ArXiv, Semantic Scholar, local papers, Zotero, and Obsidian; synthesizes findings and identifies ranked research gaps.
- Idea discovery: generates 8–12 research ideas from literature gaps, validates novelty via multi-source search + external reviewer, and pilot-tests the top candidates. Pilots run a mandatory local-GPU presence check first (nvidia-smi → CUDA, then MPS, then none); when a local GPU is detected, every pilot launches on it instead of silently falling back to CPU or remote.
- Method refinement: iteratively refines vague research directions into problem-anchored, implementation-ready proposals via adversarial review (up to 5 rounds, score ≥ 9 target).
- Experiment design & execution: produces claim-driven experiment roadmaps and deploys to local, remote SSH, or Modal serverless GPU (Track A), runs spatial/GIScience methods on CPU (Track B), or both for mixed GeoAI. The same mandatory local-GPU check runs at Step 0 of deploy-experiment, so any ML/DL workload (pilot or full) executes on the local GPU when present.
- Data acquisition: discovers, evaluates, downloads, validates, and documents datasets from government portals, APIs, cloud archives, and open repositories with full provenance.
- Spatial analysis: guideline-driven. The skill classifies the analytical objective, runs ESDA, and applies geospatial diagnostics conditionally on the research question. MAUP discussion, GWR/MGWR, residual Moran's I, alternative spatial weights, and spatial CV are triggered only when the claim depends on them; when in doubt, the skill pauses for a human checkpoint instead of running heavyweight checks (or skipping reviewer-expected ones) by reflex.
- Adversarial review: up to 4 rounds of generator–evaluator-separated review with per-criterion hard floors; medium/hard/nightmare reviewer modes via Codex MCP, codex exec, or a Claude subagent. Domain personas (giscience, remote-sensing, spatial-data-science) apply geo-specific must-checks only where the paper's claims actually depend on them, instead of penalizing every paper for missing MAUP / GWR discussion.
- Report + paper writing: consolidates every pipeline artifact into output/NARRATIVE_REPORT.md, then runs paper-writing-pipeline to produce a journal-ready manuscript (Markdown → LaTeX → PDF/DOCX) with journal-specific profiles for IJGIS, IEEE TGRS, ISPRS JPRS, RSE, AAG, TGIS, and more.
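The local-GPU presence check mentioned above (nvidia-smi, then MPS, then none) can be sketched as follows. This illustrates the decision order only; NORA's actual check lives in the skill files, and the function name is an assumption:

```python
import shutil

def detect_local_accelerator():
    """Return 'cuda', 'mps', or 'none', following the check order described above."""
    # 1. nvidia-smi on PATH is taken as evidence of a usable CUDA GPU.
    if shutil.which("nvidia-smi"):
        return "cuda"
    # 2. Otherwise probe Apple-silicon MPS via torch, if torch is installed.
    try:
        import torch
        if getattr(torch.backends, "mps", None) and torch.backends.mps.is_available():
            return "mps"
    except ImportError:
        pass
    # 3. No local accelerator: falling back to CPU or remote must be an explicit
    #    decision, never a silent default.
    return "none"
```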
Architecture
NORA is a skills-first system. All research logic lives in Markdown skill files that Claude reads and executes.

Skills describe workflow logic in Markdown. Claude reads a skill to understand the workflow, then decides the exact sequence of actions based on context: the skill provides guidelines and decision frameworks, not rigid procedures.
You (or /launcher)
   ↓ invokes
Skill SKILL.md ←─── reads domain knowledge from skills/knowledge/
   ↓ Claude decides what to do
CLI tools (tools/arxiv_fetch.py, etc.) + inline Python + MCP servers as needed
   ↓ produce
Output files (reports, paper-cache, figures, manuscript)
   ↓ read by
Next skill in pipeline
The single installed slash command is /launcher. Every other skill is invoked by name (Claude Code's native Skill tool) or by being called internally from another skill.
Skills
22 workflow skills in skills/ plus domain knowledge in skills/knowledge/. Each skill is a self-contained Markdown workflow file.
Workflow Skills
| Skill | What it does |
|---|---|
| full-pipeline | Master pipeline: idea discovery → experiment → review → report → paper |
| lit-review | Search + synthesize + gap analysis (ArXiv, Semantic Scholar, local papers, Zotero, Obsidian) |
| idea-discovery-pipeline | Full idea pipeline: lit-review → generate-idea → novelty-check → idea-review → experiment-design-pipeline |
| generate-idea | Brainstorm 8–12 ideas, filter, pilot-test top 3, rank (called by idea-discovery-pipeline) |
| novelty-check | Verify idea novelty via multi-source search + external reviewer |
| idea-review | External critical review of research ideas (Codex MCP) |
| refine-research | Iterative method refinement via external review (up to 5 rounds, score ≥ 9) |
| experiment-design | Claim-driven experiment roadmap with run order, budget, decision gates |
| experiment-design-pipeline | One-shot wrapper: refine-research → experiment-design |
| deploy-experiment | Deploy experiments: mandatory local-GPU check at Step 0, then Track A (GPU ML) and/or Track B (CPU spatial) |
| data-download | Discover, evaluate, download datasets with provenance tracking |
| spatial-analysis | Research-question-driven spatial analysis: classification → ESDA → method → conditional diagnostics → interpretation, with a human checkpoint before adding or skipping heavyweight spatial checks |
| auto-review-loop | Up to 4 adversarial review rounds with per-criterion floors |
| generate-report | Consolidate lit-review + idea + experiment + review artifacts into output/NARRATIVE_REPORT.md |
| paper-writing-pipeline | Orchestrates paper-plan → paper-figure-generate → paper-draft → paper-review-loop → paper-covert |
| paper-plan | Build section outline + figure plan (output/PAPER_PLAN.md) |
| paper-figure-generate | Generate publication-quality figures, maps, diagrams, and captions |
| paper-draft | Turn output/PAPER_PLAN.md into a journal-quality Markdown manuscript |
| paper-review-loop | Reviewer-editor review of the draft manuscript and iterative revision |
| paper-covert | Convert final manuscript into venue submission package (modular LaTeX, PDF, DOCX) |
| submit-check | Validate manuscript against target-journal requirements |
| training-check | Monitor running experiments for stalls/failures |
Domain Knowledge
| File | Domain |
|---|---|
| spatial-methods.md | Spatial statistics, regression, autocorrelation |
| geoai-domain.md | GeoAI, spatial deep learning, foundation models |
| academic-writing.md | Academic writing conventions |
| apa-citations.md | APA 7th edition citation formatting |
| disaster-resilience.md | Disaster management, community resilience |
| environmental-health.md | Environmental epidemiology, exposure assessment |
| literature-mining.md | Literature search and synthesis strategies |
| research-iteration.md | Iterative research refinement patterns |
Control Flags
Edit CLAUDE.md before starting a long run:
AUTO_PROCEED: false # true = auto-select top idea after discovery; false = wait for approval
HUMAN_CHECKPOINT: true # true = pause after each review round; false = run autonomously
COMPACT_MODE: false # true = use output/PROJ_NOTES.md instead of full logs (saves context)
EXTERNAL_REVIEW: false # true = use Claude subagent / external reviewer LLM
full-pipeline also accepts REVIEWER_DIFFICULTY = medium | hard | nightmare and ARXIV_DOWNLOAD = true | false. Overrides can be passed inline, e.g.:
/full-pipeline "topic — AUTO_PROCEED: false, difficulty: nightmare"
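The inline-override shape is just "topic, then flag: value pairs". A hypothetical parser showing how such an argument string splits (the separator and function name are assumptions for illustration, not NORA's code):

```python
def parse_inline_overrides(argument: str):
    """Split '<topic> — FLAG: value, FLAG: value' into (topic, {flag: value})."""
    topic, _, flag_part = argument.partition("—")
    flags = {}
    for pair in flag_part.split(","):
        key, sep, value = pair.partition(":")
        if sep:  # skip fragments without a 'flag: value' shape
            flags[key.strip()] = value.strip()
    return topic.strip(), flags
```

For example, parse_inline_overrides('urban heat — AUTO_PROCEED: false, difficulty: nightmare') yields ('urban heat', {'AUTO_PROCEED': 'false', 'difficulty': 'nightmare'}).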
Harness Engineering
Claude Code's hook system automates lifecycle management (configured in settings.json):
| Hook | When | What it does |
|---|---|---|
PreToolUse | Before Bash/Write | Validates paths, blocks dangerous commands, logs intent |
PostToolUse | After tool execution | Updates state, caches results |
SkillUse | Before/after each Skill tool call | harness/hooks/skill_marker.sh writes per-stage markers feeding tools/telemetry_stage_marker.py |
Stop | Agent session ends | Writes handoff.json, updates memory/MEMORY.md, runs tools/telemetry_aggregate.py to emit output/TELEMETRY.jsonl and output/TELEMETRY_STAGES.jsonl, sends notification |
Notification | Long tasks finish | Desktop alert via notify-send / osascript |
Autoresearch Scoring Loop
paper-draft writes draft
   ↓
paper-review-loop scores it (separate context: generator–evaluator separation)
   ↓
All 5 dimension floors met AND weighted avg ≥ 7.5? → ACCEPT
   ↓ (else)
paper-draft revises (max 3 attempts total)
   ↓
If still not accepted → flag for human review
| Dimension | Weight | Hard floor |
|---|---|---|
| Novelty | 30% | ≥ 6.5 |
| Rigor | 25% | ≥ 7.0 |
| Literature coverage | 20% | ≥ 6.5 |
| Clarity | 15% | ≥ 6.0 |
| Impact | 10% | ≥ 6.0 |
Accept requires weighted avg ≥ 7.5 and all five floors met.
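The acceptance rule is mechanical, so it can be restated in a few lines with the table's weights and floors hard-coded (dimension keys here are illustrative shorthand):

```python
WEIGHTS = {"novelty": 0.30, "rigor": 0.25, "literature": 0.20, "clarity": 0.15, "impact": 0.10}
FLOORS  = {"novelty": 6.5,  "rigor": 7.0,  "literature": 6.5,  "clarity": 6.0,  "impact": 6.0}

def accept(scores: dict) -> bool:
    """ACCEPT iff every per-dimension hard floor holds AND the weighted avg >= 7.5."""
    if any(scores[dim] < floor for dim, floor in FLOORS.items()):
        return False
    weighted_avg = sum(scores[dim] * w for dim, w in WEIGHTS.items())
    return weighted_avg >= 7.5
```

Note the two ways to fail: uniform 7s pass every floor but miss the 7.5 weighted average, while a single novelty score of 6.0 fails its floor no matter how strong the other dimensions are.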
Journal Templates & Profiles
Templates enforce correct structure, section ordering, word limits, and formatting. paper-covert additionally loads a YAML profile that drives LaTeX conversion.
Markdown templates (templates/)
| Category | Journals |
|---|---|
| geoscience/ | Nature Geoscience, Geophysical Research Letters |
| remote_sensing/ | Remote Sensing of Environment, IEEE TGRS, ISPRS JPRS |
| giscience/ | IJGIS, Transactions in GIS, Annals of AAG |
Submission profiles (skills/paper-covert/profiles/)
aag_annals.yaml, generic.yaml, ieee_tgrs.yaml, ijgis.yaml, isprs_jprs.yaml, rse.yaml, tgis.yaml.
MCP Servers
Declared in .mcp.json. Setup notes in mcp/README_MCP.md.
| Server | Purpose |
|---|---|
| filesystem | Read/write local files and datasets |
| fetch | Fetch web content (papers, data portals, journal pages) |
| geo_mcp | Spatial data: GADM, OSM Overpass, Census ACS, GEE (mcp/geo_mcp_server.py) |
| arxiv_mcp | ArXiv search, paper fetch, abstract parsing |
| github | GitHub repo reading and code management |
| brave_search | Web search for literature, datasets, documentation |
Key Output Files
| File | Written by |
|---|---|
| output/LIT_REVIEW_REPORT.md | lit-review |
| output/IDEA_REPORT.md / NOVELTY_REPORT.md / IDEA_REVIEW_REPORT.md | idea-discovery-pipeline |
| output/refine-logs/FINAL_PROPOSAL.md / REFINE_REPORT.md | refine-research |
| output/refine-logs/EXPERIMENT_PLAN.md / output/EXPERIMENT_TRACKER.md | experiment-design |
| output/experiment/EXPERIMENT_RESULT.md / EXPERIMENT_LOG.md | deploy-experiment |
| output/experiment/data/ / figures/ / scripts/ | deploy-experiment, spatial-analysis |
| output/AUTO_REVIEW_REPORT.md / REVIEW_STATE.json / review-rounds/ | auto-review-loop |
| output/METHOD_DESCRIPTION.md | auto-review-loop |
| output/NARRATIVE_REPORT.md | generate-report |
| output/PAPER_PLAN.md | paper-plan |
| output/figures/ | paper-figure-generate |
| output/manuscript/ | paper-draft, paper-review-loop |
| output/papers/ | paper-covert |
| output/reports/submit_check_*.md | submit-check |
| data/DATA_MANIFEST.md, data/raw/ | data-download |
| output/PROJ_NOTES.md | all skills (append-only, compact log) |
| output/TELEMETRY.jsonl (per-session) and output/TELEMETRY_STAGES.jsonl (per-skill) | tools/telemetry_aggregate.py (run by Stop hook) |
| output/CONTRACT_VIOLATION.md | any skill that detects a downgraded success criterion or other contract violation |
| memory/MEMORY.md, handoff.json | Stop hook |
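The two telemetry files are plain JSONL (one JSON object per line), so they can be loaded with a few lines of Python. The record fields shown below are hypothetical; the real schema comes from tools/telemetry_aggregate.py:

```python
import json

def read_jsonl(source):
    """Parse JSONL records from a file path or any iterable of lines."""
    if isinstance(source, str):
        with open(source, encoding="utf-8") as f:
            lines = f.readlines()
    else:
        lines = source
    # Blank lines are skipped; every other line must be a standalone JSON object.
    return [json.loads(line) for line in lines if line.strip()]

# read_jsonl("output/TELEMETRY_STAGES.jsonl") would yield one dict per skill invocation.
```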
Project Structure
night_owl_research_agent/
├── CLAUDE.md - Dashboard and project conventions
├── README.md - This file
├── design_principle.md - Skill-level design principles (export → Excel via tools/)
├── design_principle_agents.md - Sub-agent design principles
├── settings.json - Claude Code hooks, permissions, env vars
├── .mcp.json - MCP server declarations
│
├── .claude/
│   ├── commands/
│   │   └── launcher.md - /launcher (only installed slash command)
│   └── agents/ - Specialist sub-agent definitions (9 total)
│       ├── orchestrator.md
│       ├── literature-scout.md
│       ├── synthesis-analyst.md
│       ├── gap-finder.md
│       ├── hypothesis-generator.md
│       ├── geo-specialist.md
│       ├── paper-writer.md
│       ├── peer-reviewer.md
│       └── citation-manager.md
│
├── skills/ - 22 workflow skills + knowledge/
│   ├── full-pipeline/SKILL.md
│   ├── lit-review/SKILL.md
│   ├── idea-discovery-pipeline/SKILL.md
│   ├── generate-idea/SKILL.md
│   ├── novelty-check/SKILL.md
│   ├── idea-review/SKILL.md
│   ├── refine-research/SKILL.md
│   ├── experiment-design/SKILL.md
│   ├── experiment-design-pipeline/SKILL.md
│   ├── deploy-experiment/SKILL.md
│   ├── data-download/SKILL.md
│   ├── spatial-analysis/SKILL.md
│   ├── auto-review-loop/SKILL.md
│   ├── generate-report/{SKILL.md, templates/}
│   ├── paper-writing-pipeline/SKILL.md
│   ├── paper-plan/SKILL.md
│   ├── paper-figure-generate/{SKILL.md, templates/}
│   ├── paper-draft/{SKILL.md, templates/}
│   ├── paper-review-loop/{SKILL.md, templates/}
│   ├── paper-covert/{SKILL.md, profiles/, templates/}
│   ├── submit-check/SKILL.md
│   ├── training-check/SKILL.md
│   └── knowledge/ - Domain reference files
│
├── tools/ - CLI utilities (called by skills + harness)
│   ├── arxiv_fetch.py
│   ├── semantic_scholar_fetch.py
│   ├── convert_skills_to_llm_chat.py
│   ├── export_design_principle_table.py - exports design_principle.md tables to Excel
│   ├── export_agent_design_principle_table.py - exports design_principle_agents.md tables
│   ├── telemetry_stage_marker.py - called by skill_marker hook (per-skill timing)
│   └── telemetry_aggregate.py - called by Stop hook (session/stage telemetry)
│
├── configs/
│   └── default.yaml - Scoring weights, domain keywords
│
├── templates/ - Project + paper templates
│   ├── EXPERIMENT_LOG_TEMPLATE.md
│   ├── EXPERIMENT_PLAN_TEMPLATE.md
│   ├── FINDINGS_TEMPLATE.md
│   ├── HANDOFF_TEMPLATE.json
│   ├── IDEA_CANDIDATES_TEMPLATE.md
│   ├── PAPER_PLAN_TEMPLATE.md
│   ├── RESEARCH_CONTRACT_TEMPLATE.md
│   ├── RESEARCH_PLAN_TEMPLATE.md
│   ├── REVIEW_STATE_TEMPLATE.json
│   ├── geoscience/ (nature_geoscience, grl_template)
│   ├── remote_sensing/ (ieee_tgrs, isprs_jprs, remote_sensing_env)
│   └── giscience/ (ijgis, transactions_gis, annals_aag)
│
├── harness/
│   ├── hooks/ (pre_tool_use, post_tool_use, skill_marker, stop_hook, notification)
│   └── prompts/system_geo.md
│
├── mcp/ - MCP server implementations
│   ├── geo_mcp_server.py
│   └── README_MCP.md
│
├── memory/MEMORY.md - Persistent session memory
│
├── output/ - All generated outputs
│   ├── AUTO_REVIEW.md
│   ├── REVIEW_STATE.json
│   ├── ARCHITECTURE_DIAGRAM_PROMPTS.md
│   ├── papers/
│   ├── figures/
│   └── reports/
│
├── res/nora_architecture.png - Architecture diagram
│
└── archived/ - Retired skills and pre-skill Python modules
Contributing
- Fork this repository.
- Add skills in skills/<name>/SKILL.md.
- Add journal templates in templates/ (plus a YAML profile in skills/paper-covert/profiles/ if needed).
- Add domain knowledge in skills/knowledge/.
License
MIT License. See LICENSE for details.
Design Principles
Two living documents describe the rules NORA's skills and sub-agents follow. Treat them as the source of truth when you write or change a skill.
| File | Scope |
|---|---|
| design_principle.md | Skill-level principles: anchored problem, smallest adequate mechanism, generator–evaluator separation, conditional geospatial checks, mandatory local-GPU check before any pilot or full experiment, human-checkpoint pattern for synthesis decisions. |
| design_principle_agents.md | Sub-agent principles for the 9 specialists in .claude/agents/ (orchestrator, literature-scout, gap-finder, hypothesis-generator, geo-specialist, paper-writer, peer-reviewer, citation-manager, synthesis-analyst). |
Both files can be exported to Excel for review or workshop use:
python tools/export_design_principle_table.py
python tools/export_agent_design_principle_table.py
Inspired By
NORA's design borrows ideas from several open-source projects. Credit and gratitude to their authors:
- BZBarrett/superpowers: skill-pack patterns for extending Claude Code with composable Markdown workflows.
- BZBarrett/get-shit-done: pragmatic harness patterns for getting long-running agentic work to actually finish.
- wanshuiyin/Auto-claude-code-research-in-sleep: the "research while you sleep" autonomous-loop concept that motivated NORA's overnight pipelines, handoff.json recovery, and adversarial review loop.
- karpathy/autoresearch: generator–evaluator separation and the per-criterion floors + weighted-average scoring loop adapted into auto-review-loop and paper-review-loop.
If your project influenced NORA and is missing here, please open an issue and we will add it.
Citation
If you use NORA in your research, please cite the arXiv preprint:
Zhou, B., Wu, Q., Huang, X., Ning, H., Li, D., & Zhang, Z. (2026). NORA: Night Owl Research Agent — Autonomous AI Research for Geoscience, Remote Sensing, and GIScience. arXiv:2605.02092. https://arxiv.org/abs/2605.02092
@misc{zhou2026nora,
title = {NORA: Night Owl Research Agent --- Autonomous AI Research for Geoscience, Remote Sensing, and GIScience},
author = {Zhou, Bing and Wu, Qiusheng and Huang, Xiao and Ning, Huan and Li, Diya and Zhang, Ziyi},
year = {2026},
eprint = {2605.02092},
archivePrefix = {arXiv},
url = {https://arxiv.org/abs/2605.02092},
howpublished = {\url{https://github.com/GRIND-Lab-Core/night_owl_research_agent}}
}
