Search Victim-Owned Websites
This detection identifies adversary reconnaissance targeting victim-owned websites, including automated crawling, directory enumeration, and harvesting of sensitive pages such as robots.txt, sitemap.xml, staff/contact directories, and hidden paths. Because T1594 is a Reconnaissance technique whose activity takes place outside the victim network, detection relies on web server access logs, WAF telemetry, and CDN logs ingested into a SIEM. The detection focuses on high request volumes from single source IPs, enumeration of employee/contact pages, known scraping-tool user agents, and sequential access patterns typical of the automated reconnaissance tooling used by groups such as Kimsuky, Volt Typhoon, Silent Librarian, and Sandworm Team.
What is T1594 Search Victim-Owned Websites?
Search Victim-Owned Websites (T1594) maps to the Reconnaissance tactic in MITRE ATT&CK: the adversary is trying to gather information they can use to plan future operations.
This page provides production-ready detection logic for Search Victim-Owned Websites, built on the following data sources: Microsoft Sentinel (W3C IIS Logs) and Azure WAF Logs. The queries below are rated medium severity with low confidence. They ship for 7 SIEM platforms: KQL, SPL, Elastic, QRadar, Sumo, YARA-L, and LogScale.
MITRE ATT&CK
- Tactic: Reconnaissance
- Technique: T1594 Search Victim-Owned Websites
- Canonical reference: https://attack.mitre.org/techniques/T1594/
```kql
// Tunable parameters
let timeWindow = 1h;
let requestThreshold = 100;   // lowest scoring tier for TotalRequests
let errorThreshold = 20;      // middle scoring tier for Count404
let suspiciousUAs = dynamic(["scrapy", "python-requests", "python-urllib", "wget", "nikto", "masscan", "nmap", "zgrab", "go-http-client", "libwww-perl", "java/", "curl/", "dirbuster", "gobuster", "feroxbuster", "wfuzz", "ffuf", "httprint", "sqlmap", "whatweb"]);
let reconPaths = dynamic(["/robots.txt", "/sitemap.xml", "/sitemap_index.xml", "/.well-known/security.txt", "/.git/", "/.env", "/wp-admin", "/admin", "/staff", "/team", "/employees", "/contact", "/about", "/management", "/leadership", "/directory"]);
W3CIISLog
| where TimeGenerated > ago(timeWindow)
// Convert scStatus to int so the numeric comparisons below work whether it is stored as string or number
| extend Status = toint(scStatus)
| extend UALower = tolower(csUserAgent)
| extend IsReconUA = UALower has_any (suspiciousUAs)
| extend IsReconPath = csUriStem has_any (reconPaths)
| extend Is404 = (Status == 404)
| extend IsEmployeePage = csUriStem matches regex @"(?i)/(staff|team|employees|people|directory|contact|about|leadership|management|board)"
// Only requests with at least one recon indicator are kept, so TotalRequests
// below counts indicator-matching requests, not all traffic from the source
| where IsReconUA or IsReconPath or Is404 or IsEmployeePage
| summarize
    TotalRequests = count(),
    Count404 = countif(Is404),
    Count403 = countif(Status == 403),
    UniquePathsRequested = dcount(csUriStem),
    ReconPathHits = countif(IsReconPath),
    EmployeePageHits = countif(IsEmployeePage),
    SuspiciousUAUsed = countif(IsReconUA),
    UserAgents = make_set(csUserAgent, 5),
    AccessedPaths = make_set(csUriStem, 30),
    FirstRequest = min(TimeGenerated),
    LastRequest = max(TimeGenerated)
    by SourceIP = cIP, TargetSite = sSiteName
// Composite score: 404 volume and request volume contribute up to 3 points each, the other indicators up to 2
| extend ReconScore =
    (case(Count404 > 50, 3, Count404 > errorThreshold, 2, Count404 > 5, 1, 0)) +
    (case(TotalRequests > 500, 3, TotalRequests > 200, 2, TotalRequests > requestThreshold, 1, 0)) +
    (case(ReconPathHits > 3, 2, ReconPathHits >= 1, 1, 0)) +
    (case(SuspiciousUAUsed > 0, 2, 0)) +
    (case(EmployeePageHits > 5, 2, EmployeePageHits >= 1, 1, 0))
| where ReconScore >= 3
| extend SessionDurationMinutes = datetime_diff('minute', LastRequest, FirstRequest)
// Guard against division by zero for bursts shorter than a minute
| extend RequestsPerMinute = round(todouble(TotalRequests) / max_of(SessionDurationMinutes, 1), 1)
| project
    FirstRequest, LastRequest, SourceIP, TargetSite,
    TotalRequests, Count404, Count403, UniquePathsRequested,
    ReconPathHits, EmployeePageHits, SuspiciousUAUsed,
    SessionDurationMinutes, RequestsPerMinute,
    UserAgents, AccessedPaths, ReconScore
| sort by ReconScore desc, TotalRequests desc
```

This query detects automated reconnaissance against victim-owned web properties by correlating IIS web server access logs for several signals:

- high-volume requests
- directory enumeration patterns (404 spikes)
- access to reconnaissance-specific paths (robots.txt, sitemap.xml, .git, .env)
- employee/contact directory harvesting
- known scraping-tool user agents

It assigns a composite ReconScore so that high-confidence alerts are prioritized.
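The ReconScore tiers can be mirrored outside the SIEM, which makes it easier to reason about threshold changes before tuning them in production. The following is a minimal Python sketch. It rests on three assumptions. `SourceStats`, `tier`, and `recon_score` are hypothetical names. The per-source aggregates are assumed to be already computed. The tier boundaries are copied from the query, with `request_threshold` and `error_threshold` standing in for `requestThreshold` and `errorThreshold`.

```python
from dataclasses import dataclass


@dataclass
class SourceStats:
    """Hypothetical per-(SourceIP, TargetSite) aggregates, mirroring the query's summarize step."""
    total_requests: int
    count_404: int
    recon_path_hits: int
    suspicious_ua_used: int
    employee_page_hits: int


def tier(value: int, thresholds: list[tuple[int, int]]) -> int:
    """Return the points for the first (threshold, points) pair that value exceeds, else 0."""
    for threshold, points in thresholds:
        if value > threshold:
            return points
    return 0


def recon_score(s: SourceStats, request_threshold: int = 100, error_threshold: int = 20) -> int:
    """Sum the same tiered contributions as the KQL case() expressions."""
    score = tier(s.count_404, [(50, 3), (error_threshold, 2), (5, 1)])
    score += tier(s.total_requests, [(500, 3), (200, 2), (request_threshold, 1)])
    # For integer counts, ">= 1" in the query is equivalent to "> 0" here
    score += tier(s.recon_path_hits, [(3, 2), (0, 1)])
    score += 2 if s.suspicious_ua_used > 0 else 0
    score += tier(s.employee_page_hits, [(5, 2), (0, 1)])
    return score


# Hypothetical aggregates for a short enumeration burst
example = SourceStats(total_requests=12, count_404=12, recon_path_hits=4,
                      suspicious_ua_used=0, employee_page_hits=1)
print(recon_score(example), recon_score(example) >= 3)  # → 4 True
```

In this example the burst clears the `ReconScore >= 3` cutoff even though neither the 404 count nor the request volume alone would. That is the point of a composite score: several weak indicators combine into one alert.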
Data Sources
- Microsoft Sentinel (W3C IIS Logs)
- Azure WAF Logs
Required Tables
- W3CIISLog
False Positives
- Legitimate search engine crawlers (Googlebot, Bingbot, DuckDuckBot), which request robots.txt and sitemap.xml routinely and at high volume: filter by known crawler IP ranges and UA strings
- Authorized penetration testing or red team engagements scheduled by the organization: cross-reference with change management records
- Web archiving services such as archive.org (Internet Archive) performing scheduled snapshots
- SEO audit tools used by the marketing team (Screaming Frog, Ahrefs, SEMrush bots)
- Load testing tools (Apache JMeter, k6, Locust) run by the engineering team, which can generate high 404 rates
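A user-agent string alone cannot separate real search engine crawlers from a scraper that spoofs one. Google and Microsoft both document a forward-confirmed reverse DNS check for their crawlers:

1. Resolve the source IP to a hostname.
2. Check that the hostname belongs to the crawler's domain.
3. Resolve the hostname forward and confirm that it maps back to the same IP.

The following is a minimal Python sketch of that check. The `CRAWLER_DOMAINS` suffix list is an assumption that should be verified against each operator's current documentation. The resolver functions are injectable so the logic can be tested without network access. `socket.gethostbyname_ex` only returns IPv4 addresses.

```python
import socket
from typing import Callable, Iterable

# Assumed crawler hostname suffixes; confirm against each operator's current documentation.
CRAWLER_DOMAINS = (".googlebot.com", ".google.com", ".search.msn.com")


def reverse_lookup(ip: str) -> str:
    """PTR lookup via the system resolver (raises OSError if there is no record)."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(hostname: str) -> list[str]:
    """IPv4 address lookup via the system resolver."""
    return socket.gethostbyname_ex(hostname)[2]


def is_verified_crawler(
    ip: str,
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], Iterable[str]] = forward_lookup,
    domains: tuple[str, ...] = CRAWLER_DOMAINS,
) -> bool:
    """Forward-confirmed reverse DNS: the PTR name must sit under a crawler domain
    and must resolve back to the same IP."""
    try:
        hostname = reverse(ip).rstrip(".").lower()
    except OSError:  # socket.herror and socket.gaierror both subclass OSError
        return False
    if not hostname.endswith(domains):
        return False
    try:
        return ip in forward(hostname)
    except OSError:
        return False
```

A source that claims to be Googlebot in its UA but fails this check is better kept in scope for the detection than filtered out as a false positive.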
Sigma rule & cross-platform mapping
The detection logic for Search Victim-Owned Websites (T1594) is also provided in vendor-neutral form so that it can be deployed on any SIEM. The same logic ships as native queries for KQL (Microsoft Sentinel / Defender), SPL (Splunk), EQL (Elastic Security), AQL (IBM QRadar), Sumo Logic CSE, YARA-L (Google Chronicle / SecOps), and CQL (CrowdStrike LogScale). In Sigma terms, this detection targets the following logsource:
```yaml
logsource:
  product: azure
```
Testing Methodology
Validate this detection against three Atomic Red Team tests. Each test below lists the behaviour to exercise and the telemetry you should expect to see. Executable commands and cleanup steps are available with Pro.
- Test 1: Automated Website Crawling with wget Spider Mode
  Expected signal: Web server access logs showing rapid sequential GET requests from a single IP with a wget user agent, with a mix of 200, 301, and 404 responses across diverse URL paths. Request rate 20-100 req/min.
- Test 2: Reconnaissance Path Enumeration with robots.txt and sitemap.xml Harvest
  Expected signal: Sequential requests to robots.txt and sitemap.xml, then to employee-related paths, all with the 'python-requests' user agent. A mix of 200 and 404 responses within a 60-second window.
- Test 3: Directory Enumeration with ffuf Wordlist Scanning
  Expected signal: A burst of 404 responses (12 requests, one per path) within 90 seconds, using the ffuf user agent or a spoofed browser UA. Requests target paths such as /admin, /staff, /.git, and /.env, at approximately 8 req/min (12 requests over 90 seconds).
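The raw logs captured during these tests can be checked for the expected signal before the KQL query runs. The following is a minimal Python sketch, with several assumptions:

- Logs are in IIS W3C extended format with a `#Fields:` directive.
- Field names such as `c-ip`, `sc-status`, `cs-uri-stem`, and `cs(User-Agent)` are present.
- `parse_w3c` and `tally_by_source` are hypothetical helpers.
- Only a handful of fields are used.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def parse_w3c(lines: Iterable[str]) -> Iterator[dict[str, str]]:
    """Yield one dict per log entry, keyed by the names in the most recent #Fields: directive."""
    fields: list[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Directive lines may repeat within a file; keep the latest field list
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue
        values = line.split()
        if fields and len(values) == len(fields):
            yield dict(zip(fields, values))


def tally_by_source(records: Iterable[dict[str, str]]) -> dict[str, dict]:
    """Per client IP: total requests, 404 count, distinct URI stems and user agents."""
    stats: dict[str, dict] = defaultdict(
        lambda: {"total": 0, "count_404": 0, "paths": set(), "user_agents": set()}
    )
    for rec in records:
        s = stats[rec["c-ip"]]
        s["total"] += 1
        if rec.get("sc-status") == "404":
            s["count_404"] += 1
        s["paths"].add(rec.get("cs-uri-stem", "-"))
        # IIS writes spaces in the user agent as '+'; this is approximate,
        # since a literal '+' in the original UA is also turned into a space
        s["user_agents"].add(rec.get("cs(User-Agent)", "-").replace("+", " "))
    return dict(stats)
```

For Test 3, the tally should show a single source with about a dozen 404s spread over as many distinct paths, within a short time span.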