Search Victim-Owned Websites
This detection identifies adversary reconnaissance against victim-owned websites (MITRE ATT&CK T1594): automated crawling, directory enumeration, and harvesting of pages such as robots.txt, sitemap.xml, staff and contact directories, and hidden paths. T1594 belongs to the Reconnaissance tactic (formerly covered by the PRE-ATT&CK matrix) and leaves no endpoint artifacts inside the victim network, so detection relies on web server access logs, WAF telemetry, and CDN logs ingested into the SIEM; the query below reads IIS logs from the W3CIISLog table. It scores each source IP per site on five signals: volume of 404 responses, volume of matching requests, hits on well-known reconnaissance paths, user agents of known scraping and scanning tools, and hits on employee or contact pages. Sources scoring 3 or higher are returned. This kind of website reconnaissance is associated with groups such as Kimsuky, Volt Typhoon, Silent Librarian, and Sandworm Team.
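To make the scoring concrete, the following sketch applies the same point tiers as the query below to two made-up aggregates. The `datatable` rows, the IP addresses (taken from documentation ranges), and the `Alerts` column are illustrative assumptions, not real telemetry.

```kql
// Hypothetical per-source aggregates, as they would look after the summarize step.
// Row 1: a directory brute-forcer using a scanner user agent.
// Row 2: an ordinary visitor who opened /about and /contact.
datatable(SourceIP:string, TotalRequests:long, Count404:long, ReconPathHits:long, SuspiciousUAUsed:long, EmployeePageHits:long)
[
    "198.51.100.10", 650, 60, 4, 650, 0,
    "203.0.113.25", 2, 0, 2, 0, 2
]
| extend ReconScore =
    case(Count404 > 50, 3, Count404 > 20, 2, Count404 > 5, 1, 0) +
    case(TotalRequests > 500, 3, TotalRequests > 200, 2, TotalRequests > 100, 1, 0) +
    case(ReconPathHits > 3, 2, ReconPathHits >= 1, 1, 0) +
    case(SuspiciousUAUsed > 0, 2, 0) +
    case(EmployeePageHits > 5, 2, EmployeePageHits >= 1, 1, 0)
| extend Alerts = ReconScore >= 3
```

Under these assumed values the first row scores 3 + 3 + 2 + 2 + 0 = 10 and is returned, while the second scores 0 + 0 + 1 + 0 + 1 = 2 and falls below the cutoff of 3.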
let timeWindow = 1h;
let requestThreshold = 100;
let errorThreshold = 20;
let suspiciousUAs = dynamic(["scrapy", "python-requests", "python-urllib", "wget", "nikto", "masscan", "nmap", "zgrab", "go-http-client", "libwww-perl", "java/", "curl/", "dirbuster", "gobuster", "feroxbuster", "wfuzz", "ffuf", "httprint", "sqlmap", "whatweb"]);
let reconPaths = dynamic(["/robots.txt", "/sitemap.xml", "/sitemap_index.xml", "/.well-known/security.txt", "/.git/", "/.env", "/wp-admin", "/admin", "/staff", "/team", "/employees", "/contact", "/about", "/management", "/leadership", "/directory"]);
W3CIISLog
| where TimeGenerated > ago(timeWindow)
| extend UALower = tolower(csUserAgent)
| extend IsReconUA = UALower has_any (suspiciousUAs)
| extend IsReconPath = csUriStem has_any (reconPaths)
| extend Is404 = (scStatus == 404)
| extend IsEmployeePage = csUriStem matches regex @"(?i)/(staff|team|employees|people|directory|contact|about|leadership|management|board)"
| where IsReconUA or IsReconPath or Is404 or IsEmployeePage
// Counts below cover only requests that passed the filter above, not all traffic from the source.
| summarize
    TotalRequests = count(),
    Count404 = countif(scStatus == 404),
    Count403 = countif(scStatus == 403),
    UniquePathsRequested = dcount(csUriStem),
    ReconPathHits = countif(IsReconPath),
    EmployeePageHits = countif(IsEmployeePage),
    SuspiciousUAUsed = countif(IsReconUA),
    UserAgents = make_set(csUserAgent, 5),
    AccessedPaths = make_set(csUriStem, 30),
    FirstRequest = min(TimeGenerated),
    LastRequest = max(TimeGenerated)
    by SourceIP = cIp, TargetSite = sSiteName
// Sum of per-signal points; errorThreshold (20) and requestThreshold (100) are the let values declared at the top.
| extend ReconScore =
    (case(Count404 > 50, 3, Count404 > errorThreshold, 2, Count404 > 5, 1, 0)) +
    (case(TotalRequests > 500, 3, TotalRequests > 200, 2, TotalRequests > requestThreshold, 1, 0)) +
    (case(ReconPathHits > 3, 2, ReconPathHits >= 1, 1, 0)) +
    (case(SuspiciousUAUsed > 0, 2, 0)) +
    (case(EmployeePageHits > 5, 2, EmployeePageHits >= 1, 1, 0))
| where ReconScore >= 3
| extend SessionDurationMinutes = datetime_diff('minute', LastRequest, FirstRequest)
| extend RequestsPerMinute = round(todouble(TotalRequests) / max_of(SessionDurationMinutes, 1), 1)
| project
    FirstRequest, LastRequest, SourceIP, TargetSite,
    TotalRequests, Count404, Count403, UniquePathsRequested,
    ReconPathHits, EmployeePageHits, SuspiciousUAUsed,
    SessionDurationMinutes, RequestsPerMinute,
    UserAgents, AccessedPaths, ReconScore
| sort by ReconScore desc, TotalRequests desc
Data Sources
Required Tables
- W3CIISLog
False Positives
- Legitimate search engine crawlers (Googlebot, Bingbot, DuckDuckBot) with high request volumes — filter by known crawler IP ranges and UA strings
- Authorized penetration testing or red team engagements scheduled by the organization — cross-reference with change management records
- Web archiving services such as archive.org (Internet Archive) performing scheduled snapshots
- SEO audit tools used by the marketing team (Screaming Frog, Ahrefs, SEMrush bots)
- Load testing tools (Apache JMeter, k6, Locust) run by the engineering team generating high 404 rates
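The first two items above suggest filtering by known IP ranges and user-agent strings. A minimal sketch of such a pre-filter follows, assuming IIS logs in W3CIISLog; the `authorizedScannerRanges` and `knownCrawlerUAs` lists, the `ClaimsKnownCrawler` column, and the example range are hypothetical placeholders to be replaced with values verified for your environment.

```kql
// Hypothetical allowlists; replace with ranges and agents you have verified.
let authorizedScannerRanges = dynamic(["192.0.2.0/24"]);   // assumed pen-test source range
let knownCrawlerUAs = dynamic(["googlebot", "bingbot", "duckduckbot", "ahrefsbot", "semrushbot"]);
W3CIISLog
| where TimeGenerated > ago(1h)
// ipv4_is_in_any_range() returns null for values it cannot parse as IPv4 (for example IPv6 clients);
// coalesce() keeps those rows instead of silently dropping them.
| where not(coalesce(ipv4_is_in_any_range(cIp, authorizedScannerRanges), false))
// User agents are trivially spoofed, so this only labels the claim instead of excluding the row.
| extend ClaimsKnownCrawler = tolower(csUserAgent) has_any (knownCrawlerUAs)
| summarize Requests = count() by cIp, ClaimsKnownCrawler
```

Labelling rather than dropping crawler traffic is a deliberate choice in this sketch: a source that claims to be a search engine but does not come from that operator's published IP ranges is itself worth a closer look.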
References
- https://attack.mitre.org/techniques/T1594/
- https://www.comparitech.com/blog/information-security/website-information-leak/
- https://www.theregister.com/2015/07/29/googles_robots_txt_problem/
- https://www.cisa.gov/news-events/cybersecurity-advisories/aa24-038a
- https://blog.google/threat-analysis-group/exposing-initial-access-broker-ties-conti/
- https://www.justice.gov/opa/press-release/file/1328521/download