Overview
SpearHead is a multi-agent spear phishing research platform that automates the full OSINT-to-phishing-email lifecycle. It is designed exclusively for authorized security testing, academic research, and defensive cybersecurity training.
Architecture
The system is composed of five specialized agents that run in sequence, each feeding its results into the next phase: Scout → Profiler → Attack → Reviewer → Report.
Services
| Service | Stack | Port | Purpose |
|---|---|---|---|
| Backend | Python / FastAPI | 8000 | Pipeline engine, REST API, WebSocket |
| Dashboard | Next.js 16 | 3000 | Operational UI — graph, console, results |
| Neo4j | neo4j:5 (Docker) | 7474 / 7687 | Knowledge graph database |
Quick Start
Two ways to get SpearHead running quickly, depending on your setup:
Option A — Docker Compose (recommended)
No Python, Node.js, or Neo4j installation required. Just Docker Desktop.

    $ git clone https://github.com/ismaellaya/SpearAI.git
    $ cd SpearAI
    $ cp .env.example .env
    # Edit .env — set GEMINI_API_KEY or adjust OLLAMA_BASE_URL
    $ docker compose up --build

Open http://localhost:3000 once the three services are running. See the Docker Compose section below for full details, Ollama tips, and runtime key management.
Option B — Shell scripts (dev mode)
Requires Python 3.10+, Node.js 20+, and Docker Desktop already installed. The scripts start Neo4j via Docker Compose, activate the virtualenv, run the FastAPI backend, and launch the Next.js dev server.

    $ git clone https://github.com/ismaellaya/SpearAI.git
    $ cd SpearAI
    $ cp .env.example .env
    # Install Python dependencies (first time only)
    $ python -m venv .venv
    $ source .venv/bin/activate    # Linux / macOS
    $ .venv\Scripts\activate       # Windows PowerShell
    $ pip install -r requirements.txt
    # Install dashboard dependencies (first time only)
    $ cd dashboard && npm install && cd ..
    # Run everything
    $ chmod +x run_dashboard.sh && ./run_dashboard.sh    # Linux / macOS
    $ .\run_dashboard.ps1                                # Windows PowerShell

Both scripts start the backend with auto-reload (--reload) and the dashboard in development mode (next dev).
Docker Compose
The easiest way to share SpearHead with a tester or run it on a fresh machine — no Python venv, no manual service juggling. A single command builds and starts Neo4j, the FastAPI backend, and the Next.js dashboard together.
Prerequisites
Only Docker Desktop (Mac/Windows) or Docker Engine + Compose v2 (Linux) is required. No Python, Node.js, or Neo4j installation needed on the host.
1 — Clone & configure

    $ git clone https://github.com/ismaellaya/SpearAI.git
    $ cd SpearAI
    $ cp .env.example .env

Open .env and set at least one LLM provider. For Gemini (recommended for first run):

    LLM_PROVIDER=gemini
    GEMINI_API_KEY=your_key_here

If Ollama runs on the host machine, the backend container cannot reach it at localhost. Set the host address instead:
• Mac / Windows (Docker Desktop):

    OLLAMA_BASE_URL=http://host.docker.internal:11434

• Linux:

    OLLAMA_BASE_URL=http://172.17.0.1:11434

2 — Build & start

    # First run builds images (~5 min). Subsequent starts are instant.
    $ docker compose up --build
    # Run in the background
    $ docker compose up --build -d

All three services start together. The backend connects to Neo4j automatically once it is ready (typically within ~60 seconds of first startup). You will see logs from all three services interleaved in the terminal.
3 — Open SpearHead
| URL | Service |
|---|---|
| http://localhost:3000 | Dashboard — main UI |
| http://localhost:8000/health | Backend health check |
| http://localhost:7474 | Neo4j browser (neo4j / changeme) |
On first visit, the Setup Screen will appear asking you to select a use case (Academic, Red Team, etc.). This choice is stored in .env automatically.
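Because Neo4j can take up to a minute to come up on first start, a small polling helper can confirm the backend is reachable before you open the dashboard. The following is a minimal sketch, not part of SpearHead: it assumes that /health returns a JSON body, and the function names `fetch_health` and `wait_until_healthy` are hypothetical.

```python
import json
import time
import urllib.request
from typing import Callable

HEALTH_URL = "http://localhost:8000/health"  # backend health check from the table above


def fetch_health(url: str = HEALTH_URL, timeout: float = 3.0) -> dict:
    """Fetch the health endpoint and parse it (assumes a JSON response)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_until_healthy(
    fetch: Callable[[], dict] = fetch_health,
    attempts: int = 30,
    delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call `fetch` until it succeeds; raise TimeoutError after `attempts` failures."""
    last_error = None
    for _ in range(attempts):
        try:
            return fetch()
        except (OSError, ValueError) as exc:  # connection refused or invalid JSON
            last_error = exc
            sleep(delay)
    raise TimeoutError(f"backend not reachable: {last_error}")
```

Calling `wait_until_healthy()` with the defaults polls for roughly a minute, which matches the startup window mentioned above.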
Stopping & restarting

    # Stop all services (data is preserved in ./data/neo4j and ./results)
    $ docker compose down
    # Rebuild after a code change
    $ docker compose up --build
    # Wipe everything including graph data
    $ docker compose down -v

Configuring OSINT keys at runtime
You do not need to edit .env before every key change. Open the dashboard → Settings panel → OSINT Tools, enter the key and click Save. The backend writes it to .env immediately and picks it up on the next pipeline run — no container restart needed.
The .env file is bind-mounted into the backend container. Never commit it to a public repository, since it may contain LLM API keys.
Installation (Manual)
Manual installation gives you full control over each service and is recommended for production or customized deployments.
1 — Python backend

    $ python -m venv .venv
    # Activate (Linux/macOS)
    $ source .venv/bin/activate
    # Activate (Windows)
    $ .venv\Scripts\activate
    $ pip install -r requirements.txt

2 — Neo4j (Docker)

    # Starts Neo4j at bolt://localhost:7687
    # Web UI at http://localhost:7474
    $ docker-compose up -d

Default credentials: neo4j / changeme (set in .env).
3 — FastAPI server

    $ python -m uvicorn src.api:app --reload --host 0.0.0.0 --port 8000

4 — Dashboard

    $ cd dashboard
    $ npm install
    $ npm run dev    # Dev server at http://localhost:3000

Configuration
Copy .env.example to .env in the project root. Only one LLM provider is required — all OSINT keys are optional and can also be set at runtime from the dashboard Settings panel.

    # ── Neo4j ────────────────────────────────────────────────────────────────────
    # Docker Compose overrides NEO4J_URI to bolt://neo4j:7687 automatically.
    NEO4J_URI=bolt://localhost:7687
    NEO4J_USER=neo4j
    NEO4J_PASSWORD=changeme

    # ── LLM Provider — pick one: local | gemini | claude | openai ────────────────
    LLM_PROVIDER=gemini

    # Ollama (LLM_PROVIDER=local)
    OLLAMA_BASE_URL=http://localhost:11434
    OLLAMA_MODEL=llama3

    # Gemini (LLM_PROVIDER=gemini)
    GEMINI_API_KEY=
    GEMINI_MODEL=gemini-2.0-flash

    # Claude / Anthropic (LLM_PROVIDER=claude)
    ANTHROPIC_API_KEY=
    ANTHROPIC_MODEL=claude-sonnet-4-6

    # OpenAI (LLM_PROVIDER=openai)
    OPENAI_API_KEY=
    OPENAI_MODEL=gpt-4o

    # ── Use case (set automatically via Setup Screen on first run) ────────────────
    # Values: academic | red_team | blue_team | awareness | threat_intel
    # USE_CASE=

    # ── Optional OSINT tools ──────────────────────────────────────────────────────
    GITHUB_TOKEN=        # Raises GitHub rate limit from 60 to 5 000 req/h
    HUNTER_API_KEY=      # Hunter.io email finder — 25 searches/month (free tier)
    APIFY_API_TOKEN=     # Apify social scraping — Instagram & TikTok profiles
    APIFY_ENABLED=true   # Set to false to disable Apify without removing the key
    HIBP_API_KEY=        # HaveIBeenPwned individual email lookup (paid key)

USE_CASE values
| Value | Context injected into LLM prompts |
|---|---|
| academic | Academic research and security education context |
| red_team | Authorized penetration test under signed statement of work |
| blue_team | Defensive SOC analyst training material |
| awareness | Authorized employee security awareness campaign |
| threat_intel | Threat intelligence and attribution research |
Environment Variables Reference
All configuration is done via environment variables in .env. Optional keys can also be set at runtime from the dashboard Settings panel without restarting the backend.
| Variable | Type | Default | Description |
|---|---|---|---|
| NEO4J_URI | url | bolt://localhost:7687 | Neo4j Bolt URI (overridden to bolt://neo4j:7687 in Docker Compose) |
| NEO4J_USER | string | neo4j | Neo4j username |
| NEO4J_PASSWORD | secret | changeme | Neo4j password |
| LLM_PROVIDER | string | gemini | LLM backend: local (Ollama) | gemini | claude | openai |
| OLLAMA_BASE_URL | url | http://localhost:11434 | Ollama server URL — use host.docker.internal:11434 in Docker |
| OLLAMA_MODEL | string | llama3 | Ollama model name |
| GEMINI_API_KEY | secret | — | Google Gemini API key |
| GEMINI_MODEL | string | gemini-2.0-flash | Gemini model name |
| ANTHROPIC_API_KEY | secret | — | Anthropic Claude API key |
| ANTHROPIC_MODEL | string | claude-sonnet-4-6 | Anthropic model name |
| OPENAI_API_KEY | secret | — | OpenAI API key |
| OPENAI_MODEL | string | gpt-4o | OpenAI model name |
| USE_CASE | string | (setup screen) | Authorization framing: academic | red_team | blue_team | awareness | threat_intel |
| GITHUB_TOKEN | secret | — | GitHub PAT — raises rate limit from 60 to 5 000 req/h |
| HUNTER_API_KEY | secret | — | Hunter.io API key for email discovery |
| APIFY_API_TOKEN | secret | — | Apify token for Instagram & TikTok OSINT |
| APIFY_ENABLED | bool | true | Set to false to disable Apify per-run without removing the key |
| HIBP_API_KEY | secret | — | HaveIBeenPwned key for individual email breach lookup (paid) |
| CORS_ORIGINS | string | localhost:3000 | Comma-separated allowed origins for the dashboard |
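To make the defaults in the table concrete, here is a minimal sketch of how a settings loader could read these variables and fail early on a misconfigured provider. It is an illustration under stated assumptions, not the backend's actual loader: `load_settings`, `VALID_PROVIDERS` and `REQUIRED_KEY` are hypothetical names, and the rule that a hosted provider needs its matching key is inferred from the table rather than confirmed.

```python
import os
from typing import Mapping

VALID_PROVIDERS = {"local", "gemini", "claude", "openai"}

# Assumed mapping: which key each hosted provider needs ("local" / Ollama needs none).
REQUIRED_KEY = {
    "gemini": "GEMINI_API_KEY",
    "claude": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
}


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Read a subset of the documented variables, applying the documented defaults."""
    provider = env.get("LLM_PROVIDER", "gemini").strip().lower()
    if provider not in VALID_PROVIDERS:
        raise ValueError(f"LLM_PROVIDER must be one of {sorted(VALID_PROVIDERS)}")
    key_name = REQUIRED_KEY.get(provider)
    if key_name and not env.get(key_name):
        raise ValueError(f"{key_name} must be set when LLM_PROVIDER={provider}")
    origins = env.get("CORS_ORIGINS", "localhost:3000")
    return {
        "neo4j_uri": env.get("NEO4J_URI", "bolt://localhost:7687"),
        "neo4j_user": env.get("NEO4J_USER", "neo4j"),
        "llm_provider": provider,
        "ollama_base_url": env.get("OLLAMA_BASE_URL", "http://localhost:11434"),
        "apify_enabled": env.get("APIFY_ENABLED", "true").strip().lower() == "true",
        "cors_origins": [o.strip() for o in origins.split(",") if o.strip()],
    }
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise without touching the real environment.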
Pipeline Modes
| Mode | Agents | Output |
|---|---|---|
| full | Scout → Profiler → Attack → Reviewer → Report | Emails + graph + JSON report |
| search | Scout → Profiler | Knowledge graph only (no emails) |
| deep_search | Scout (extended depth) → Profiler | More OSINT + knowledge graph |
Running a Scan
Follow these steps to run a full pipeline from the dashboard:
1. Enter target(s) in the sidebar textarea — names, email addresses, or domains (up to 10 per run).
2. Fill the Engagement Context panel: org name, type (Corporation / University / Government), known domains, and scope notes.
3. Select the pipeline mode (Full / Search / Deep Search) and the number of attack variation angles (1–3).
4. Choose the SpearDetector mode: Entropy (fast, no API needed) or AI (requires a configured LLM).
5. Click Run Pipeline. Watch the Phase Timeline and Agent Console update in real time.
6. When complete, click View Results to see generated emails grouped by target and variation. Use the Compare mode for side-by-side analysis.
Batch Targets
Enter multiple targets one per line in the sidebar textarea (maximum 10 per run). The pipeline processes them sequentially and tracks progress via the batch_info WebSocket field.

    # Example target list (one per line)
    john.doe@example.com
    jane.smith@corp.com
    targetdomain.com
    Alice Johnson

Use Clean DB (in Settings) before a new engagement to avoid graph data from previous runs mixing with the current batch. The Clean DB option also runs automatically for the first target when checked in the UI.
First Run Workflow
The single most effective technique for improving result quality is running the pipeline on a new target with Deep Search enabled and a disambiguation hint in parentheses:

    # Use natural language — commas separate targets, so avoid them inside hints
    John Smith (CTO working at Acme Corp based in Madrid)
    alice.johnson@corp.com (IT Security engineer at Corp Ltd in London)
    targetdomain.com

The hint inside parentheses acts as a semantic anchor: the ScoutAgent's LLM summarizer uses it to discard search results that belong to a different person with the same name, and the ProfilerAgent uses it to reject entities that don't fit the described profile.
Step 1 — First run (new target)
1. Enable Deep Search (the "Deep" button in the sidebar turns amber when active).
2. Add context in parentheses after the target name using natural language — avoid commas as they are used to separate targets. Example: "John Smith (CTO at Acme Corp based in Madrid)".
3. Fill the Engagement Context panel with the target organisation's known domains.
4. Run in Full mode. The pipeline creates a Person node in Neo4j with high-confidence properties.
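The input rules described so far (one target per line, commas also acting as separators, an optional hint in parentheses, and at most 10 targets per run) can be sketched as a small parser. This is an illustrative sketch only; `parse_targets` and the returned dictionary shape are hypothetical and not the backend's real parser.

```python
import re

MAX_TARGETS = 10  # documented per-run limit

# "Name (hint)" where neither part contains parentheses.
_HINT_RE = re.compile(r"^(?P<target>[^()]+?)\s*\((?P<hint>[^()]*)\)\s*$")


def parse_targets(raw: str) -> list[dict]:
    """Split raw textarea input into targets with an optional disambiguation hint."""
    entries = [e.strip() for e in re.split(r"[,\n]", raw)]
    entries = [e for e in entries if e]  # drop blank lines
    if len(entries) > MAX_TARGETS:
        raise ValueError(f"at most {MAX_TARGETS} targets per run, got {len(entries)}")
    parsed = []
    for entry in entries:
        match = _HINT_RE.match(entry)
        if match:
            parsed.append(
                {"target": match.group("target").strip(), "hint": match.group("hint").strip()}
            )
        else:
            parsed.append({"target": entry, "hint": None})
    return parsed
```

Because commas split targets, a hint such as "CTO, Acme Corp" would be cut in two by this sketch, which is why the guidance above recommends avoiding commas inside hints.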
Step 2 — Enrichment from the graph
Once the node exists in Neo4j, right-click it in the graph and launch a new pipeline directly from the context menu. On this second pass:
- The ProfilerAgent sees existing node properties and discards entities that contradict the established profile.
- Re-running with Deep Search adds more OSINT depth without duplicating existing confident nodes.
Step 3 — Annotate and validate
Right-click any node → Annotate to mark it with one of three statuses:
| Status | Color | Meaning |
|---|---|---|
| confirmed | 🟢 green #00ff88 | Manually verified as correct |
| false_positive | 🔴 red #ff4444 | Belongs to a different person — keep visible but flagged |
| needs_review | 🟡 amber #ffaa00 | Uncertain — pending verification |
Annotations are stored as Neo4j node properties and persist across runs. They help the LLM on future executions by providing a curated, validated context.
Reducing False Positives
False positives — nodes that belong to a different person with the same name — are the main quality issue when profiling common names. These techniques reduce them significantly.
Disambiguation hints (most effective)
Add context in parentheses directly in the target field. The more specific, the better:
| Target input | Effect |
|---|---|
| John Smith | High false-positive risk — common name |
| John Smith (software engineer) | Better — filters unrelated professions |
| John Smith (senior engineer at Acme) | Good — company narrows scope significantly |
| John Smith (CTO at Acme Corp based in Madrid) | Best — role + company + city is unambiguous |
Engagement Context scope notes
The scope_notes field in the Engagement Context panel is injected into the ProfilerAgent prompt. Use it to explicitly rule out categories:

    # Example scope notes
    Target is the CTO of Acme Corp Madrid — discard any results
    related to sports, music, or other industries. Focus only
    on software engineering and corporate leadership.

Confidence filter
Use the confidence filter in the graph toolbar (ALL / MED+ / HIGH) to hide low-confidence nodes during review. Person nodes are always shown regardless of the filter. Switch to HIGH first to inspect the reliable core, then expand to ALL to review edge cases.
Nodes marked false_positive remain in the graph with a red ring. They act as negative examples, signalling to future pipeline runs that these entities have already been evaluated and rejected.
Engagement Context Tips
The Engagement Context panel (collapsible, between the target input and the phase timeline) enriches both the OSINT collection and the LLM analysis phases.
| Field | What it does |
|---|---|
| org_name | Injected into the dork generator LLM prompt to produce more targeted Google queries |
| org_type | Sets the organisational frame (Corporation / University / Government / NGO / Startup) — influences ProfilerAgent entity weighting |
| known_domains | Activates passes 5–7 (WHOIS, DNS, crt.sh) and 7b (HIBP) on those domains; results go to org_domain_sources, kept separate from person data |
| scope_notes | Free-text injected into ProfilerAgent prompt to constrain entity extraction and reduce off-topic false positives |
known_domains: what gets activated
- WHOIS lookup — registrant org, registrar, creation/expiry dates
- DNS enumeration — MX records reveal email provider (Google Workspace / M365 / custom), TXT records expose SPF + DMARC policy
- crt.sh Certificate Transparency — subdomain discovery, cert issuer
- HIBP domain breach check — free lookup of corporate email domain against known breaches
- Org-scoped DuckDuckGo search — "target name" site:domain.com for confirmed association evidence
- Email pattern inference — ProfilerAgent derives first.last@domain, flast@domain patterns from DNS + confirmed Email nodes
An active badge appears on the panel header. Infrastructure nodes created from known_domains (HOSTED_BY, REGISTERED_BY, USES_SERVICE, ISSUED_BY) are linked to the Website node, never directly to the Person node.
Working with the Graph
Toolbar quick reference
Right-click context menu
- Edit Node — update properties directly from the dashboard
- Annotate Node — set confirmed / false_positive / needs_review status with optional notes and tags
- Generate Email — runs AttackAgent + ReviewerAgent using this node's Neo4j neighbourhood as RAG context
- Delete Node — removes the node and all its relationships
- Right-click on canvas background — opens Add Node picker to manually insert a new entity
Before starting a new engagement
Export the current graph (FileDown → GEXF) before clearing the database. GEXF files can be opened directly in Gephi for offline analysis. Then use Clean DB in Settings → Danger Zone (requires confirmation) to start fresh.
fx/fy) on the d3 simulation. It resets automatically when you refresh the graph or start a new run.
ScoutAgent
Collection passes (sequential)
| Pass | Source | Condition |
|---|---|---|
| 1 | Google/DuckDuckGo dork (LLM-generated) | Always |
| 1b | Org-scoped search: "name" site:domain.com | Only if org_context.known_domains set |
| 2 | GitHub — public repos, languages, topics, bio | Always |
| 3 | Wayback Machine CDX API | Domain targets only |
| 4 | WHOIS — registrant, registrar, dates | Domain targets only |
| 5 | DNS — MX, TXT (SPF/DMARC), A records | Domain targets only |
| 6 | crt.sh Certificate Transparency (subdomains) | Domain targets only |
| 7 | HIBP domain breach check (no API key) | Domain targets only |
| 7b | HIBP email/paste lookup | Email targets (requires HIBP_API_KEY for paid endpoint; otherwise uses domain fallback) |
| 8 | Hunter.io email finder | Requires HUNTER_API_KEY |
| 9 | Apify social media (Instagram, TikTok) | Requires APIFY_API_TOKEN |
Output keys

    {
      "target": "john.doe@example.com",
      "sources": [...],             # Person/target OSINT results
      "org_domain_sources": [...],  # Infrastructure data (WHOIS/DNS/crt.sh) — kept separate
      "summary": "...",             # LLM narrative summary
      "memory_context": "..."       # Condensed context for AttackAgent
    }

Source deduplication
Before passing to the LLM summarizer, _deduplicate_sources() applies:
- Max 3 results per domain (prevents link-farm flooding)
- Snippet prefix dedup (first 80 chars) removes near-duplicates
- Hard cap: 50 sources total
- Apify sources (image_url) and HIBP breach entries (DATA_BREACH) are always preserved, exempt from caps
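The four rules above can be sketched as a single pass over the source list. This is a hedged illustration rather than the real `_deduplicate_sources()`: the field names `url`, `snippet` and `type` are assumptions about the source dictionaries, and treating the 50-source cap as applying only to non-exempt entries is one possible reading of "exempt from caps".

```python
from collections import Counter
from urllib.parse import urlparse

MAX_PER_DOMAIN = 3   # results kept per domain
PREFIX_LEN = 80      # snippet characters compared for near-duplicates
MAX_SOURCES = 50     # hard cap on non-exempt sources


def is_exempt(source: dict) -> bool:
    """Apify entries (carrying image_url) and HIBP breach entries bypass every cap."""
    return bool(source.get("image_url")) or source.get("type") == "DATA_BREACH"


def deduplicate_sources(sources: list[dict]) -> list[dict]:
    """Apply the per-domain limit, snippet-prefix dedup and hard cap, preserving order."""
    per_domain: Counter = Counter()
    seen_prefixes: set = set()
    kept: list = []
    capped = 0  # number of non-exempt sources kept so far
    for src in sources:
        if is_exempt(src):
            kept.append(src)
            continue
        if capped >= MAX_SOURCES:
            continue
        domain = urlparse(src.get("url", "")).netloc.lower()
        prefix = src.get("snippet", "")[:PREFIX_LEN].strip().lower()
        if domain and per_domain[domain] >= MAX_PER_DOMAIN:
            continue
        if prefix and prefix in seen_prefixes:
            continue
        per_domain[domain] += 1
        if prefix:
            seen_prefixes.add(prefix)
        kept.append(src)
        capped += 1
    return kept
```

Comparing only the first 80 characters is a cheap way to catch syndicated copies of the same page, at the cost of occasionally merging two distinct pages that share a long boilerplate opening.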
ProfilerAgent
Node types
Relationship types
| Relationship | Usage |
|---|---|
| WORKS_AT | Person → Company (current or past employer) |
| EDUCATED_AT | Person → Education (university, school) |
| HAS_EMAIL | Person → Email |
| HAS_USERNAME | Person → Username (social media handles) |
| REGISTERED_ON | Person → Website (owned or operated domain) |
| INTERESTED_IN | Person → Topic (hobby, skill, technology) |
| LIVES_IN | Person → Location |
| HOSTED_BY | Website → Company (hosting provider) |
| REGISTERED_BY | Website → Company (domain registrar) |
| USES_SERVICE | Website → Company (Google Workspace, M365…) |
| ISSUED_BY | Website → Company (SSL certificate authority) |
Email pattern inference
When known org domains are present in the Engagement Context, ProfilerAgent automatically infers email address patterns by cross-referencing confirmed Email nodes against each domain and scoring format candidates:

    first.last@domain
    flast@domain
    f.last@domain
    firstlast@domain
    first@domain

Inferred patterns are stored as Email nodes with the property inferred_pattern: true for visual distinction in the graph.
Injection protection
Before calling the LLM, all OSINT data is sanitized against a blocklist of prompt injection patterns (</s>, <|system|>, [INST], ###, etc.). Blocked patterns are stripped silently, so the injection attempt is defused without halting the pipeline.
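A minimal sketch of this kind of blocklist stripping follows. It is an assumption-laden illustration, not the project's sanitizer: `BLOCKLIST` contains only the four patterns named above (the "etc." is deliberately not filled in), and `sanitize` is a hypothetical name. The loop repeats until the text stops changing, so a token reassembled from the pieces left behind by a first pass is also removed.

```python
import re

# Only the patterns named in the text; the real blocklist is longer.
BLOCKLIST = ["</s>", "<|system|>", "[INST]", "###"]

_PATTERN = re.compile("|".join(re.escape(p) for p in BLOCKLIST), re.IGNORECASE)


def sanitize(text: str) -> str:
    """Strip blocklisted markers, repeating until no marker remains."""
    previous = None
    while previous != text:
        previous = text
        text = _PATTERN.sub("", text)
    return text
```

A single `re.sub` pass would turn `[IN[INST]ST]` into `[INST]`, which is exactly the marker being blocked; the fixed-point loop closes that gap.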
AttackAgent
Graph RAG retrieval
AttackAgent queries Neo4j for the target's 1-hop neighbors:

    MATCH (p:Person {name: $name})-[r]-(neighbor)
    RETURN type(r), labels(neighbor), neighbor.name, neighbor.source_context
    LIMIT 25

Retrieved neighbors (organizations, education, topics, usernames) are injected into the system prompt as structured context. This grounds the generated email in verifiable facts about the target rather than generic templates.
Persuasion angle variations
Output format

    {
      "target": "John Doe",
      "variation": 1,
      "subject": "Urgent: Account verification required by EOD",
      "body": "...",
      "language": "en",
      "context_used": ["Acme Corp", "GitHub", "Python", "San Francisco"]
    }

ReviewerAgent
Scoring criteria
Score range: 0 (obvious spam) to 10 (would likely bypass detection):
| Score range | Interpretation |
|---|---|
| 0 – 3 | Low realism — obvious flags (generic template, wrong name, broken links) |
| 4 – 6 | Medium — might fool some users but would flag most gateways |
| 7 – 8 | High — convincing pretext, grounded details, natural tone |
| 9 – 10 | Very high — personalized to the point of requiring specific counter-detection |
Output format

    {
      "score": 8,
      "critique": "Strong authority pretext using real org name. The IT dept signature and urgency cues would flag Proofpoint's TAP rules. Subject line uses urgency keyword 'EOD'.",
      "improved_subject": "Re: Action required — identity verification portal",
      "improved_body": "...",
      "variation": 1
    }

Always-Active Tools
These tools require no API key and run on every scan:
| Tool | Source | What it finds |
|---|---|---|
| Google/DuckDuckGo | HTML backend | Web profiles, job history, news mentions, conference talks |
| GitHub OSINT | api.github.com | Public repos, programming languages, topics, bio, README content |
| Wayback Machine | CDX API (cdx.api) | Archived page snapshots — old job titles, portfolio pages |
| WHOIS | python-whois | Domain registrant org, registrar, creation/expiry dates, nameservers |
| DNS | dnspython | MX (email provider), TXT (SPF/DMARC policy), A records |
| crt.sh | crt.sh public API | Subdomains via Certificate Transparency logs, cert issuers |
| HIBP (domain) | HIBP public API | Data breaches associated with an org domain (no key needed) |
Optional Tools (API Keys)
| Tool | Env var | What it adds | Cost |
|---|---|---|---|
| GitHub (auth) | GITHUB_TOKEN | Rate limit: 60 → 5000 req/h. Also unlocks private org membership on some accounts. | Free (personal token) |
| Hunter.io | HUNTER_API_KEY | Corporate email finder: format + pattern discovery for a domain. Email verifier. | 25 searches/mo free |
| HIBP (email) | HIBP_API_KEY | Individual email breach lookup + paste database search. Provides pretext context. | Paid — $3.50/mo |
Setting keys at runtime
API keys can be updated without restarting the backend. Open the dashboard Settings panel (gear icon in the sidebar) and expand the OSINT Tools section. Keys are saved to .env via POST /config/tools and take effect immediately. Use the Test button to verify each key before running a scan.
Red Team Engagement
Set USE_CASE=red_team in .env. This mode frames all LLM prompts as an authorized penetration test under a signed statement of work, maximizing realism while maintaining ethical guardrails.
Recommended workflow
1. Fill the Engagement Context panel: org name, type (Corporation), known email domains, scope notes referencing the SoW number.
2. Add employee names as batch targets — use hints like "Alice Johnson (IT Helpdesk)" to help the LLM disambiguate common names.
3. Run in full mode with 3 variations to generate Authority, Rapport, and Opportunity angle emails for each target.
4. Review results in the Compare pane — sort by Reviewer score to prioritize the most convincing simulations.
5. Export as HTML report (Pro) or JSON for evidence documentation.
6. Test top emails against your email gateway (Proofpoint, Mimecast) to measure detection rate.
Blue Team Training
Set USE_CASE=blue_team. Generates realistic phishing samples for SOC analyst training, detection rule development, and email gateway tuning.
Recommended workflow
1. Run the pipeline on public personas related to your industry (e.g., public LinkedIn profiles of CISOs at peer companies).
2. Export phishing emails from View Results — copy the raw email body or use the JSON export.
3. Import into your phishing simulation platform (GoPhish, KnowBe4) as custom email templates.
4. Use the Reviewer critique text as ground truth for what features make an email convincing — build detection rules targeting those features.
5. Export the knowledge graph as GEXF to visualize organizational attack surface in Gephi.
Academic Research
Set USE_CASE=academic. All generated artifacts are labeled for academic/research context. Recommended for conference papers, security research, and educational use.
Research outputs
Security Awareness
Set USE_CASE=awareness. Personalized simulated phishing for authorized employee awareness campaigns — each email is tailored to the individual target using real OSINT.
Recommended workflow
1. Add employee names as batch targets (up to 10 at a time). The system processes them sequentially.
2. Run in full mode — each email is personalized to the individual's public professional profile.
3. Review scores in the results panel. Lower-scoring emails may need manual refinement before deployment.
4. Use generated emails in your authorized simulated phishing platform.
5. Include the Reviewer critique in post-campaign follow-up training materials to explain exactly what made the email convincing.
Interface Overview
The dashboard runs at http://localhost:3000 and provides real-time visualization of the pipeline as it runs.
| Panel | Description |
|---|---|
| Sidebar | Target input, mode selector, Engagement Context panel, Phase Timeline (5 phases with timings), Agent Console (live log stream) |
| Graph Panel | Interactive force-directed knowledge graph. Nodes colored by type. Zoom, pan, fit-to-view. |
| Toolbar | Export (CSV/GEXF), layout (force/radial), label toggle, fullscreen, node type filters, confidence filter (ALL/MED+/HIGH) |
| Results Modal | Generated emails grouped by target. Sub-tabs per variation. Side-by-side Compare mode for multi-variation analysis. |
| History Panel | Previous pipeline results. Search by target or filename. JSON download, HTML report viewer, PDF export (Pro). |
| Email Analyzer | Standalone phishing detection for pasting any email. Entropy or AI detection mode. |
| Settings Modal | LLM config, OSINT tool keys (with Test button), Danger Zone (Clear Database with confirmation dialog). |
Knowledge Graph
The graph panel renders the full Neo4j graph using react-force-graph-2d. All OSINT entities and relationships are visualized as a force-directed layout.
Right-click context menu
| Action | Description |
|---|---|
| Edit Node | Update node name or any property inline |
| Annotate Node | Mark as Confirmed / False Positive / Needs Review with notes and tags. Color ring appears around the node. |
| Generate Email | Run AttackAgent + ReviewerAgent directly from any Person node. Shows result in GenerateEmailModal. |
| Delete Node | Removes node and all its relationships from Neo4j |
| Add Node (canvas) | Right-click on empty canvas background to add a new node of any type |
Graph export
Click the FileDown icon in the toolbar to export:
- CSV: nodes-DATE.csv and edges-DATE.csv. Compatible with any spreadsheet tool or pandas.
Confidence filter
The toolbar confidence control (ALL / MED+ / HIGH) hides low-confidence inferred nodes. Person nodes are always shown regardless of confidence. Useful for cleaning up cluttered graphs from large batch runs.
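The filter rule can be expressed in a few lines. The sketch below is hypothetical: it assumes each node carries a `label` and a single overall `confidence` of "Low", "Medium" or "High", which simplifies however the dashboard actually stores confidence, and it treats a missing confidence as "Low".

```python
# Assumed ordering of confidence levels and the minimum rank each mode requires.
RANK = {"Low": 0, "Medium": 1, "High": 2}
THRESHOLD = {"ALL": 0, "MED+": 1, "HIGH": 2}


def visible_nodes(nodes: list[dict], mode: str = "ALL") -> list[dict]:
    """Return nodes that pass the confidence filter; Person nodes always pass."""
    minimum = THRESHOLD[mode]
    return [
        node
        for node in nodes
        if node.get("label") == "Person"
        or RANK.get(node.get("confidence"), 0) >= minimum
    ]
```

Keeping Person nodes unconditionally means the graph never loses its anchor points, even when every surrounding entity is filtered out.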
Annotation colors
Email Analysis
The standalone Email Analyzer panel allows you to paste any raw email and run SpearDetector analysis without running a full pipeline. Accessible from the dashboard via the mail icon in the sidebar.
| Mode | How it works | API key needed |
|---|---|---|
| Entropy | Statistical analysis: Shannon entropy, multilingual phishing keywords, link inspection, URL redirect chains | No |
| AI | LLM-based analysis: semantic phishing pattern detection, contextualized scoring | Yes (configured LLM) |
Rate limit: 10 analyses per minute per IP. Results show a risk score, flagged indicators, and entropy breakdown. The endpoint is POST /analyze-email.
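For readers unfamiliar with the statistic behind the Entropy mode, Shannon entropy over characters is H = -Σ p(c) · log2 p(c), where p(c) is the relative frequency of character c. The sketch below shows only that standard formula; how SpearDetector combines it with keyword and link checks is not specified here, and `shannon_entropy` is a hypothetical helper name.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

A string made of one repeated character scores 0 bits, and a string of four equally frequent characters scores 2 bits. Random-looking tokens such as obfuscated URL parameters score noticeably higher than ordinary prose, which is what makes the measure useful as a cheap indicator.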
Run History
The History panel (clock icon in the sidebar) lists all previous pipeline runs saved to results/. Results are saved as result_{target}_{timestamp}.json.
| Action | Description |
|---|---|
| Search | Filter history by target name or filename (real-time) |
| Open | View the full JSON result in the results modal |
| Download | Download the raw JSON result file |
| HTML Report | Open the generated HTML report in a new browser tab (Pro) |
| PDF Export | Download a Playwright-rendered A4 PDF (Pro) |
| Delete | Permanently removes the result file (path-traversal protected) |
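"Path-traversal protected" typically means that a user-supplied filename is resolved and then checked to still sit inside the results directory. The following is a minimal sketch of that idea, not the backend's actual check: `resolve_result_path` is a hypothetical name, and restricting deletions to .json files directly under results/ is an assumption based on the naming scheme above.

```python
from pathlib import Path

RESULTS_DIR = Path("results")


def resolve_result_path(filename: str, base: Path = RESULTS_DIR) -> Path:
    """Resolve `filename` under `base`, rejecting anything that escapes it."""
    base_resolved = base.resolve()
    candidate = (base_resolved / filename).resolve()
    # The file must sit directly in the results directory and be a JSON result.
    if candidate.parent != base_resolved or candidate.suffix != ".json":
        raise ValueError("invalid result filename")
    return candidate
```

Resolving before comparing is the important step: it collapses `..` segments and absolute paths, so `../.env` or `/etc/passwd` fail the parent check instead of slipping past a naive string test.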
Checkpoints are written to results/checkpoints/ after each completed phase. If a pipeline run is interrupted, use POST /resume-pipeline/{run_id} to resume from the last checkpoint without repeating completed phases.
Key Endpoints
The FastAPI backend exposes a REST API at http://localhost:8000. Interactive docs at /docs (Swagger UI) and /redoc.
| Method | Endpoint | Description | Rate limit |
|---|---|---|---|
| GET | /health | Service status (Neo4j, Ollama, Gemini, Pro features) | — |
| POST | /run-pipeline | Start pipeline. Body: PipelineRequest (targets, mode, variations_count, org_context) | 5/min |
| POST | /resume-pipeline/{run_id} | Resume from last checkpoint. run_id is a 12-char hex string. | 5/min |
| GET | /pipeline-status | Current phase + logs snapshot (polling fallback) | — |
| WS | /ws/logs | Real-time log stream + status + batch_info + db stats | — |
| GET/POST | /graph-data | Full graph as {nodes, links} | — |
| POST | /run-query | Execute arbitrary Cypher query (max 2000 chars) | 20/min |
| POST | /search-nodes | Search nodes by name substring | — |
| GET | /analytics | Graph metrics: degree/betweenness centrality, density, components | — |
| POST | /analyze-email | SpearDetector: paste email content, returns risk score | 10/min |
| POST | /nodes/{node_id}/annotate | Add/update annotation (status, notes, tags) | — |
| PUT | /nodes/{node_id} | Update node properties | — |
| DELETE | /nodes/{node_id} | Delete node and all relationships | — |
| POST | /nodes | Create (MERGE) a new node. Body: {label, name, properties?} | — |
| POST | /generate-email/{node_id} | Run AttackAgent + ReviewerAgent for a graph node | — |
| GET | /history | List saved result files | — |
| DELETE | /history/{filename} | Delete a result file (path-traversal protected) | — |
| GET | /export-graph | Export full graph: ?format=csv or ?format=gexf | — |
| GET | /config/tools | Returns which optional API keys are configured (never exposes values) | — |
| POST | /config/tools | Update API keys at runtime, persists to .env | — |
| POST | /config/tools/test | Validate configured API keys for GitHub, Hunter.io, HIBP | — |
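The per-minute limits in the table can be understood through a sliding-window counter keyed by client. The sketch below illustrates the concept only; the backend may use a different algorithm or a rate-limiting library, and `SlidingWindowLimiter` is a hypothetical class. The clock is injectable so the behaviour can be checked without waiting.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most `limit` requests per client within any `window_seconds` span."""

    def __init__(
        self,
        limit: int,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self._hits = defaultdict(deque)  # client id -> timestamps of accepted requests

    def allow(self, client_id: str) -> bool:
        now = self.clock()
        hits = self._hits[client_id]
        # Discard timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

For example, `SlidingWindowLimiter(limit=5)` mirrors the 5/min limit on /run-pipeline: a sixth call from the same client inside one minute is refused, while other clients are unaffected.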
Request Examples

    # Start a pipeline
    curl -X POST http://localhost:8000/run-pipeline \
      -H "Content-Type: application/json" \
      -d '{
        "targets": ["Alice Smith (Marketing)"],
        "mode": "full",
        "variations_count": 2,
        "detection_method": "entropy",
        "org_context": {
          "org_name": "Acme Corp",
          "org_type": "corporation",
          "known_domains": ["acme.com"],
          "scope_notes": "Authorized red team - signed SoW ref #2024-RT-001"
        }
      }'

    # Analyze a suspicious email
    curl -X POST http://localhost:8000/analyze-email \
      -H "Content-Type: application/json" \
      -d '{
        "content": "Subject: Urgent: Verify your account\n\nDear user, click here to verify...",
        "method": "entropy"
      }'

    // WebSocket connection
    const ws = new WebSocket('ws://localhost:8000/ws/logs');
    ws.onmessage = (event) => {
      const msg = JSON.parse(event.data);
      if (msg.type === 'log') {
        console.log(`[${msg.level}] ${msg.message}`);
      }
      if (msg.type === 'status') {
        console.log(`Phase ${msg.current_phase} -- running: ${msg.is_running}`);
        // msg.batch_info: { total, current_index, current_target, completed, failed }
      }
    };

Graph Query Examples
Useful Cypher queries you can paste into POST /run-query or the dashboard's Query panel:

    // Get full profile of a person
    MATCH (p:Person {name: "Alice Smith"})-[r]-(n)
    RETURN p, type(r) AS rel, n

    // Find all inferred email addresses
    MATCH (p:Person)-[:HAS_EMAIL]->(e:Email)
    WHERE e.inferred_pattern IS NOT NULL
    RETURN p.name AS person, e.name AS email, e.inferred_pattern AS pattern
    ORDER BY person

    // List all companies with their employees
    MATCH (p:Person)-[:WORKS_AT]->(c:Company)
    RETURN c.name AS company, collect(p.name) AS employees
    ORDER BY size(employees) DESC

    // Export high-confidence nodes only
    MATCH (p:Person)-[r]-(n)
    WHERE n.confidence IS NOT NULL
      AND ANY(v IN values(apoc.convert.fromJsonMap(n.confidence)) WHERE v = "High")
    RETURN p.name, type(r), n.name, labels(n)

WebSocket
Connect to ws://localhost:8000/ws/logs to receive real-time events. The dashboard's WebSocketProvider manages a single shared connection.
Message types

    // Log message
    {
      "type": "log",
      "message": "Phase 2: ProfilerAgent starting entity extraction...",
      "level": "info",  // info | warning | error | debug
      "timestamp": "2026-03-23T14:32:01.123Z"
    }

    // Status update (sent every ~1 second while pipeline runs)
    {
      "type": "status",
      "is_running": true,
      "current_phase": 2,
      "batch_info": {
        "total": 3,
        "current_index": 1,
        "current_target": "john.doe@example.com",
        "completed": 1,
        "failed": 0
      }
    }

Subscribing in React

    import { useWSMessage } from '@/lib/WebSocketProvider';

    // Subscribe to log messages
    useWSMessage('log', (msg) => {
      console.log(msg.level, msg.message);
    });

    // Subscribe to status updates
    useWSMessage('status', (msg) => {
      setIsRunning(msg.is_running);
      setPhase(msg.current_phase);
    });

HTML Reports
The Pro package (spearai-pro) adds full HTML report generation after each pipeline run. Reports are saved alongside the JSON result in results/ and accessible from the History panel.
Report sections
| Section | Contents |
|---|---|
| Cover Page | Target name, date, USE_CASE mode, pipeline mode, run ID |
| Resource Usage | Total tokens, estimated USD cost, model name, per-phase token breakdown |
| Reconnaissance | OSINT sources used, full source list with categories, key findings |
| Knowledge Graph | Entity counts by type, relationship counts, graph density, top connected nodes |
| Attack Simulations | All generated emails organized by variation angle, subject, body, Reviewer score and critique |
| Conclusions | Risk summary, top-scoring simulations, recommended mitigations |
Installing Pro

    # After purchasing, you receive a .whl file
    $ pip install spearai_pro-1.0.0-py3-none-any.whl
    # Verify installation
    $ python -c "from spearai_pro import ReportingAgent; print('Pro installed')"
    # Restart the backend — /health will show reports_available: true
    $ python -m uvicorn src.api:app --reload --host 0.0.0.0 --port 8000

PDF Export
PDF export uses Playwright / Chromium to render the HTML report at A4 dimensions and export a pixel-perfect PDF. Available from the History panel via the Download PDF button.
Technical details
| Parameter | Value |
|---|---|
| Engine | Playwright chromium (headless) |
| Viewport | 794 × 1123 px (A4 at 96 dpi) |
| Wait | wait_until="load" + document.fonts.ready (Google Fonts) |
| Format | A4, portrait, no margin override |
| Endpoint | GET /export-pdf/{filename} |
| Runs in | Thread executor (avoids uvicorn event loop conflict) |
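The viewport figures follow directly from the A4 paper size (210 × 297 mm) at 96 dpi: pixels = millimetres ÷ 25.4 × 96, rounded to the nearest integer. A short worked check, with `mm_to_px` as a hypothetical helper name:

```python
MM_PER_INCH = 25.4
A4_MM = (210, 297)  # A4 width and height in millimetres


def mm_to_px(mm: float, dpi: int = 96) -> int:
    """Convert a length in millimetres to whole CSS pixels at the given dpi."""
    return round(mm / MM_PER_INCH * dpi)


viewport = tuple(mm_to_px(side) for side in A4_MM)
print(viewport)  # (794, 1123)
```

The unrounded values are about 793.7 and 1122.5 px, which is why the table lists 794 × 1123.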
Playwright setup

    # Install Playwright browsers (one-time)
    $ playwright install chromium
    # Or via Python
    $ python -m playwright install chromium

Troubleshooting
Common issues encountered when setting up or running SpearHead, with step-by-step resolution guidance.
Neo4j Connection Error

    Neo4j unavailable – Scout running without graph memory

Start Neo4j with docker-compose up -d. Check Neo4j at http://localhost:7474 (default credentials: neo4j / changeme). The pipeline continues without the graph, but results won't be persisted.
LLM Provider Not Responding
For Ollama: make sure the model has been pulled (ollama pull llama3). For Gemini / Claude / OpenAI: check your API key in .env or Settings → LLM Config in the dashboard. The LLM retries up to 3 times with 1s / 5s / 15s backoff.
DuckDuckGo Returns No Results
Use deep_search mode more sparingly. The Google dork auto-generator may also produce overly specific queries, so try simplifying the target name.
Windows Encoding Errors (cp1252)

    UnicodeDecodeError: 'cp1252' codec can't decode...

Run chcp 65001 in cmd / PowerShell, or set PYTHONUTF8=1 in your environment. All Python source files include # -*- coding: utf-8 -*-.
Apify Returns No Results
Check that APIFY_API_TOKEN is set and Apify is enabled in Settings → OSINT Tools. Instagram and TikTok results depend on profile visibility settings.
Pipeline Stuck / No Progress
Reduce max_google_results by using search mode instead of deep_search.
SpearHead Documentation · MIT License
Last updated: March 2026 · Built by Ismael Laya
Social Media (Apify)
Set APIFY_API_TOKEN to enable social media OSINT via the Apify actor platform. The toggle in Settings lets you enable/disable Apify per-run without removing the key.
Profile pictures from Apify are stored as profile_pic_url on the corresponding Username node in Neo4j. They are displayed in node tooltips in the dashboard graph.