T1593.002 Microsoft Sentinel · KQL

Detect Search Engines in Microsoft Sentinel

Adversaries may use search engines to collect information about victims that can be used during targeting. Search engine services typically crawl online sites to index content and may provide users with specialized syntax to search for specific keywords or specific types of content (e.g., file types). Adversaries may craft various search engine queries, commonly called 'Google dorks', to harvest general information about victims, and may use specialized queries to look for spillages or leaks of sensitive information such as network details, credentials, or exposed configuration files. Information from these sources may reveal opportunities for other forms of reconnaissance, for establishing operational resources, or for initial access. The Kimsuky threat group (G0094) has been documented using Google searches to identify target vulnerabilities, tools, and geopolitical trends.

MITRE ATT&CK

Tactic: Reconnaissance
Technique: T1593 Search Open Websites/Domains
Sub-technique: T1593.002 Search Engines
Canonical reference: https://attack.mitre.org/techniques/T1593/002/

KQL Detection Query

Microsoft Sentinel (KQL)
// Search operators commonly used in dork queries.
let DorkPatterns = dynamic(["filetype:", "ext:", "inurl:", "intitle:", "intext:", "site:", "cache:", "allintitle:", "allinurl:"]);
// Keywords that suggest a search for exposed secrets or administrative surfaces.
let SensitiveTerms = dynamic(["password", "passwd", "credential", "api_key", "apikey", "secret", "token", "config", "backup", ".env", "admin", "vpn", "portal", "jira", "confluence", "database"]);
// Referer fragments that identify a search engine results page.
let SearchEngineReferrers = dynamic(["google.com/search", "bing.com/search", "search.yahoo.com", "duckduckgo.com", "yandex.com/search"]);
// Branch 1: requests whose Referer carries a search query with dork operators or sensitive terms.
let RefererDorks = W3CIISLog
| where TimeGenerated > ago(24h)
| where isnotempty(csReferer) and csReferer has_any (SearchEngineReferrers)
// Google, Bing, and DuckDuckGo pass the query as q=, Yahoo as p=, and Yandex as text=.
| extend SearchQuery = url_decode(extract(@"[?&](?:q|p|text)=([^&]+)", 1, csReferer))
| where isnotempty(SearchQuery)
| where SearchQuery has_any (DorkPatterns) or SearchQuery has_any (SensitiveTerms)
| extend HasDorkOperator = SearchQuery has_any (DorkPatterns)
| extend HasSensitiveTerm = SearchQuery has_any (SensitiveTerms)
| extend DetectionBranch = "referer_dork"
| project TimeGenerated, cIP, csUsername, csMethod, csUriStem, csUriQuery, SearchQuery, csReferer, HasDorkOperator, HasSensitiveTerm, scStatus, DetectionBranch;
// Branch 2: direct requests for paths that dork queries commonly surface.
let SensitivePathAccess = W3CIISLog
| where TimeGenerated > ago(24h)
| where csUriStem has_any (".env", ".git/config", ".git/HEAD", "wp-config.php", "web.config", "config.php", ".htpasswd", "/backup", "database.sql", "/credentials", "/.aws/credentials", "/.ssh/id_rsa", "phpinfo.php", "/server-status", "/elmah.axd", "/.DS_Store")
// Fill the Branch 1 columns with defaults so both branches share one schema.
| extend HasDorkOperator = false
| extend HasSensitiveTerm = false
| extend SearchQuery = ""
| extend DetectionBranch = "sensitive_path_access"
| project TimeGenerated, cIP, csUsername, csMethod, csUriStem, csUriQuery, SearchQuery, csReferer, HasDorkOperator, HasSensitiveTerm, scStatus, DetectionBranch;
// Combine both branches, newest events first.
RefererDorks
| union SensitivePathAccess
| sort by TimeGenerated desc

Severity: medium · Confidence: low

Detects potential Google dorking reconnaissance through two complementary branches over IIS web server logs (W3CIISLog) ingested into Microsoft Sentinel.

Branch 1 (referer_dork) identifies inbound HTTP requests whose Referer header contains a search engine query string with dork operators (filetype:, inurl:, intitle:, site:, ext:) or sensitive keyword terms, indicating that the visitor reached your site through a crafted dork query.

Branch 2 (sensitive_path_access) identifies direct access to commonly dorked sensitive paths (.env files, .git directories, wp-config.php, backup files, and credential stores) that are frequent targets of Google dork enumeration campaigns.

Confidence is low because this is a PRE-platform technique: the adversary's search activity happens entirely outside your environment and leaves victim-side telemetry only if they follow through and access the discovered resources.
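
Because victim-side telemetry only matters when the adversary follows through, one way to triage the sensitive_path_access branch is to keep only requests the server actually fulfilled. The query below is a minimal sketch of that idea, not part of the published rule. It assumes scStatus can be compared as a string (hence the tostring call) and uses a shortened, illustrative path list named SensitivePaths.

```kusto
// Sketch: narrow the sensitive-path branch to requests that returned content.
// Assumptions: SensitivePaths is an illustrative subset of the rule's path list;
// scStatus is cast to string so the comparison works regardless of column type.
let SensitivePaths = dynamic([".env", ".git/config", "wp-config.php", "web.config", ".htpasswd"]);
W3CIISLog
| where TimeGenerated > ago(24h)
| where csUriStem has_any (SensitivePaths)
| where tostring(scStatus) startswith "2"   // 2xx responses only
| summarize Hits = count(),
            Paths = make_set(csUriStem, 25),
            FirstSeen = min(TimeGenerated),
            LastSeen = max(TimeGenerated)
    by cIP
| sort by Hits desc
```

Clients that received a 2xx response for one of these paths are usually worth reviewing before the much larger set of 404 probes.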

Data Sources

  • Application Log: Application Log Content
  • Network Traffic: Network Traffic Content
  • IIS Web Server Logs (W3CIISLog)

Required Tables

W3CIISLog

False Positives & Tuning

  • Legitimate users reaching public web content via normal search engine queries that happen to contain sensitive keywords (e.g., searching for 'admin portal login guide' and landing on your documentation)
  • Security researchers and authorized penetration testers performing scheduled reconnaissance assessments against your domains
  • Search engine crawlers (Googlebot, Bingbot, DuckDuckBot) probing robots.txt, sitemap.xml, and other indexed paths as part of normal site indexing
  • Automated vulnerability scanners (Qualys, Nessus, Burp Suite Enterprise) probing for sensitive file paths during authorized scheduled scans
  • Web monitoring and uptime services that access known paths for availability checks, potentially triggering the sensitive path branch
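
Several of the false positive sources above can be suppressed with allowlists placed ahead of the detection logic. The following is a hedged sketch: AuthorizedScannerIPs and CrawlerAgents are hypothetical names, the IP addresses are documentation-range placeholders, and the path filter is shortened for illustration. User-Agent strings are trivially spoofed, so the crawler filter reduces noise rather than proving a request is benign.

```kusto
// Sketch: exclude known-benign sources before applying the sensitive-path filter.
// Assumptions: both allowlists are placeholders to be replaced with values
// verified for your environment.
let AuthorizedScannerIPs = dynamic(["203.0.113.10", "203.0.113.11"]);
let CrawlerAgents = dynamic(["Googlebot", "bingbot", "DuckDuckBot"]);
W3CIISLog
| where TimeGenerated > ago(24h)
| where cIP !in (AuthorizedScannerIPs)               // authorized scanners and monitors
| where not(csUserAgent has_any (CrawlerAgents))     // self-declared search engine crawlers
| where csUriStem has_any (".env", ".git/config", "wp-config.php")
| project TimeGenerated, cIP, csUserAgent, csUriStem, scStatus
```

The same two exclusion lines can be added to each branch of the detection query if the allowlists prove reliable.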

Testing Methodology

Validate this detection against 5 adversary techniques from Atomic Red Team. Each test below lists the behaviour to exercise and the telemetry you should expect to see.

  1. Test 1: Google Dork Enumeration via curl (Linux)

    Expected signal: On the attacker system: outbound HTTPS connections to google.com (142.250.0.0/15 range). In attacker network flow logs: TCP connections to port 443 with google.com SNI. If the adversary clicks through to discovered resources: victim W3CIISLog entries with csReferer containing 'google.com/search?q=site:example.com+filetype:env', cIP matching the attacker IP, and csUriStem matching the discovered sensitive file path.

  2. Test 2: Sensitive File Path Probe, Post-Dork Simulation (Linux)

    Expected signal: Web server access logs showing 18 sequential GET requests to sensitive paths from the same source IP within seconds. IIS W3CIISLog: cIP=attacker_ip, csMethod=GET, csUriStem matching each path, scStatus reflecting actual server response, scBytes showing bytes returned. Apache/Nginx access_combined: similar fields. The burst pattern (multiple sensitive paths in <30 seconds from same IP) distinguishes automated probing from manual browsing.

  3. Test 3: theHarvester Search Engine Reconnaissance

    Expected signal: On the attacker system: theHarvester makes repeated HTTPS requests to Google/Bing/DuckDuckGo with queries formatted as '@example.com' and 'site:example.com'. Network logs show connections to search engine IP ranges. If using a monitored network: proxy logs capture the search queries. On the victim's DNS infrastructure: resolution requests for any discovered subdomains from the attacker IP may appear in DNS query logs.

  4. Test 4: Google Hacking Database (GHDB) Pattern Audit via PowerShell

    Expected signal: Sysmon Event ID 1: powershell.exe process with Invoke-WebRequest in CommandLine. Sysmon Event ID 3: network connections from powershell.exe to Bing IP ranges (40.77.0.0/17, 204.79.197.200). Windows Firewall log: outbound connections to port 443. PowerShell ScriptBlock Logging Event ID 4104 (if enabled): full script content including domain variable and dork patterns. Security Event ID 4688 (if command line auditing enabled): PowerShell process with Invoke-WebRequest command.

  5. Test 5: robots.txt and Sitemap Enumeration (Pre-Dork Reconnaissance)

    Expected signal: Web server access logs showing sequential GET requests to /robots.txt, /sitemap.xml, /sitemap_index.xml, and /.well-known/security.txt from the same source IP within seconds. IIS W3CIISLog: cIP=attacker_ip, csUriStem matching each discovery endpoint. Apache/Nginx access.log: similar pattern. The hunting query for unusual spikes in robots.txt/sitemap access triggers if this is run at scale (multiple IPs) or if the baseline is low.
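
The burst pattern described for the sensitive path probe (multiple sensitive paths from one source IP in under 30 seconds) can be approximated by counting distinct paths per client in fixed time windows. This is a sketch under stated assumptions: the threshold MinDistinctPaths = 5 and the shortened SensitivePaths list are illustrative values, not tuned ones, and fixed 30-second bins can split a single burst across two adjacent windows.

```kusto
// Sketch: flag clients that request several distinct sensitive paths within one 30-second window.
// Assumptions: MinDistinctPaths and SensitivePaths are illustrative and need tuning per environment.
let SensitivePaths = dynamic([".env", ".git/config", ".git/HEAD", "wp-config.php", "web.config", ".htpasswd", "phpinfo.php"]);
let MinDistinctPaths = 5;
W3CIISLog
| where TimeGenerated > ago(24h)
| where csUriStem has_any (SensitivePaths)
| summarize DistinctPaths = dcount(csUriStem),
            Paths = make_set(csUriStem, 25),
            Requests = count()
    by cIP, Window = bin(TimeGenerated, 30s)
| where DistinctPaths >= MinDistinctPaths
| sort by DistinctPaths desc
```

Replacing the path list with the discovery endpoints from the robots.txt and sitemap test (/robots.txt, /sitemap.xml, /sitemap_index.xml, /.well-known/security.txt) and lowering the threshold gives a comparable check for that enumeration pattern.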
