Incident Response Playbook

Severity Matrix & SLAs

Severity	Description	Detect SLA	Contain SLA	Examples
P0 Critical	Platform down, auth breach, mass data exposure, confirmed misinfo causing field harm	<15 min	<1 hour	JWT secret leaked, DB exposed publicly, XSS exfiltrating tokens
P1 High	Significant feed contamination, auth bypass, critical route returning wrong data	<1 hour	<4 hours	Chad-person articles in live feed, admin route accessible to free user
P2 Medium	Degraded accuracy, partial service failure, elevated false-positive rate	<4 hours	<24 hours	Social search returning >30% irrelevant results, LLM analysis timing out
P3 Low	Minor UI defects, non-critical metric drift, cosmetic issues	<24 hours	<72 hours	Confidence badge missing on some items, timestamp formatting wrong

Severity

Description

Detect SLA

Contain SLA

Examples

P0 Critical

Platform down, auth breach, mass data exposure, confirmed misinfo causing field harm

<15 min

<1 hour

JWT secret leaked, DB exposed publicly, XSS exfiltrating tokens

P1 High

Significant feed contamination, auth bypass, critical route returning wrong data

<1 hour

<4 hours

Chad-person articles in live feed, admin route accessible to free user

P2 Medium

Degraded accuracy, partial service failure, elevated false-positive rate

<4 hours

<24 hours

Social search returning >30% irrelevant results, LLM analysis timing out

P3 Low

Minor UI defects, non-critical metric drift, cosmetic issues

<24 hours

<72 hours

Confidence badge missing on some items, timestamp formatting wrong

Escalation RACI

Incident Type	Responsible	Accountable	Consulted	Informed
Data integrity / feed contamination	Backend Engineer	Engineering Lead	Data/Intel Lead	All users via status page
Auth / access control breach	Security Lead	CISO / Founder	Backend Engineer	Affected users, legal if data exposed
Misinfo injection / LLM manipulation	Data/Intel Lead	Engineering Lead	Security Lead	Field operators using affected country data
Platform outage (P0)	DevOps/SRE	Engineering Lead	All engineers	All users, management

Incident Type

Responsible

Accountable

Consulted

Informed

Data integrity / feed contamination

Backend Engineer

Engineering Lead

Data/Intel Lead

All users via status page

Auth / access control breach

Security Lead

CISO / Founder

Backend Engineer

Affected users, legal if data exposed

Misinfo injection / LLM manipulation

Data/Intel Lead

Engineering Lead

Security Lead

Field operators using affected country data

Platform outage (P0)

DevOps/SRE

Engineering Lead

All engineers

All users, management

Playbooks

Playbook 1 — Feed Contamination / Geo-Disambiguation Failure P1

Trigger

Live Incident Feed or social-search fallback showing sports/entertainment/person-name content for an African country. Example: MLB articles appearing for Chad, music articles for Mali.

Response Steps

Detect: Monitor /health and check /social-search?q=Chad&county=Chad meta for dropped_geo_irrelevant counter. If counter is 0 and feed has bad content, filter is not firing.
Isolate: Check if AMBIGUOUS_COUNTRY_NAMES Set includes the affected country. Check if isGeoRelevant and topicScore are being called in the affected code path.
Contain: If immediate fix not possible, temporarily disable the fallback path by returning empty array for affected country until fix is deployed.
Fix: Add country to AMBIGUOUS_COUNTRY_NAMES if missing. Verify fix covers all three paths: monitor ingestion, /africa/events, /social-search.
Validate: Run manual query: curl "/social-search?q=Chad Africa&county=Chad" and confirm meta shows dropped_geo_irrelevant > 0 and results are geopolitical.
Deploy & monitor: Deploy fix via scp + pm2 restart. Watch PM2 logs for 10 minutes post-deploy.

Evidence to preserve: screenshot of feed, browser network tab showing API response JSON with results array, server log excerpt showing dropped counters.

Playbook 2 — Authentication / Authorization Breach P0

Trigger

Unauthorized access to admin routes, JWT tokens accepted after revocation, API key bypass, privilege escalation from free-tier to admin.

Response Steps

Detect: Check audit log at /audit for unexpected admin actions. Check /admin/route-matrix to confirm all admin routes require requireRole('admin').
Contain immediately: If active breach suspected — rotate JWT_SECRET in .env and restart server. This invalidates ALL active sessions (all users must re-login).
Rotate secrets: Generate new JWT_SECRET (32+ bytes), new WEBHOOK_SIGNING_KEY. Update /opt/africa-watch/.env. Restart: pm2 restart africa-watch.
Audit accounts: Query SELECT * FROM users WHERE role='admin' — verify no unexpected admin accounts.
Review logs: Pull full audit log for past 48h. Identify all actions taken by compromised session/token.
Root cause: Check if bypass was via x-api-key header (should now return 401 "not configured"), stale JWT, or role escalation.
Notify: If user data was accessed — notify affected users within 72h per data protection obligations.

Evidence: audit_log table export, PM2 logs from incident window, compromised JWT (decode at jwt.io for claims), IP addresses from logs.

Playbook 3 — LLM Misinfo Injection / Prompt Manipulation P1

Trigger

LLM analysis output contains fabricated events, contradicts known facts, or shows signs of prompt injection (unusual instruction-following tone, policy-violating content, off-topic analysis).

Response Steps

Detect: Cross-reference LLM output against raw articles in the analysis modal. If LLM claims X but no source article supports it — likely hallucination or injection.
Isolate: Capture the exact prompt sent to the LLM: add temporary debug logging to buildPrompt() in llm-analysis.js.
Check inputs: Review the article text that fed the prompt. Check sanitizePromptInput() was applied. Look for injection patterns: "Ignore previous", "System:", "You are now".
Contain: If active injection detected — add the offending article source to BLOCKED_DOMAINS in /social-search. Clear _explainCache in memory (restart server).
Review sanitizer: Update INJECTION_PATTERN regex in security-middleware.js to catch new pattern.
Validate: Re-run the affected location's analysis and confirm output is grounded in cited articles only.

Evidence: LLM prompt (log it), LLM response verbatim, offending source article URL, browser console showing the analysis API response.

Communications Templates

Internal (Slack / WhatsApp)

🚨 INCIDENT DECLARED — [P0/P1/P2] [SHORT TITLE] Time detected: [HH:MM UTC] Affected: [system/feature/users] Current status: [investigating / contained / resolved] Incident lead: [@name] Next update: [HH:MM UTC] Thread for updates ↓

External (User-Facing Status)

We are aware of an issue affecting [feature] on the Africa Watch platform. Our team is investigating and working to resolve this as quickly as possible. Current status: [Investigating / Fix deployed / Monitoring] We will provide an update by [time]. We apologise for any inconvenience. — Africa Watch Team

Date

Playbook

Participants

Outcome

Actions Raised

Incident Response Playbook

Severity Matrix & SLAs

Escalation RACI

Playbooks

Trigger

Response Steps

Trigger

Response Steps

Trigger

Response Steps

Communications Templates

Internal (Slack / WhatsApp)

External (User-Facing Status)

Drill Log