Detect Search Engines in Splunk
Adversaries may use search engines to collect information about victims that can be used during targeting. Search engine services typically crawl online sites to index content and may provide users with specialized syntax to search for specific keywords or specific types of content (e.g. filetypes). Adversaries may craft search engine queries, commonly called 'Google dorks', to harvest general information about victims, and may use specialized queries to look for spillages or leaks of sensitive information such as network details, credentials, or exposed configuration files. Information from these sources may reveal opportunities for other forms of reconnaissance, establishing operational resources, and/or initial access. The Kimsuky threat group (G0094) has been documented using Google searches to identify target vulnerabilities, tools, and geopolitical trends.
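The operator syntax described above is regular enough to parse mechanically, which is what lets a detection recognise a dork once the query text is available. The following is a minimal Python sketch, not part of any Splunk tooling: `find_dork_operators` and `DORK_OPERATORS` are hypothetical names, and the operator list is the one used in this page's SPL query rather than an exhaustive list of any engine's syntax.

```python
import re

# Operator list copied from the has_dork_operator check in this page's SPL query;
# real search engines support more operators than this.
DORK_OPERATORS = ("filetype", "ext", "inurl", "intitle", "intext",
                  "site", "cache", "allintitle", "allinurl")

# An operator token is "name:value", where value is a quoted phrase or a bare word.
DORK_TOKEN_RE = re.compile(
    r'\b(' + "|".join(DORK_OPERATORS) + r'):("[^"]*"|\S+)',
    re.IGNORECASE,
)


def find_dork_operators(query: str) -> list:
    """Return (operator, value) pairs found in a search query, in order."""
    return [(op.lower(), value.strip('"'))
            for op, value in DORK_TOKEN_RE.findall(query)]


if __name__ == "__main__":
    print(find_dork_operators('site:example.com filetype:env "DB_PASSWORD"'))
```

Run directly, the example prints `[('site', 'example.com'), ('filetype', 'env')]`; the quoted keyword is ignored because it carries no operator prefix.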
MITRE ATT&CK
- Tactic: Reconnaissance
- Technique: T1593 Search Open Websites/Domains
- Sub-technique: T1593.002 Search Engines
- Canonical reference: https://attack.mitre.org/techniques/T1593/002/
SPL Detection Query
index=web (sourcetype=iis OR sourcetype=access_combined OR sourcetype="apache:access")
| eval src_ip=coalesce(c_ip, clientip)
| eval user_norm=coalesce(cs_username, user)
| eval http_method=coalesce(cs_method, method)
| eval uri_path_norm=coalesce(cs_uri_stem, uri_path)
| eval uri_query_norm=coalesce(cs_uri_query, uri_query)
| eval referer_norm=coalesce(cs_referer, referer)
| eval http_status=coalesce(sc_status, status)
| eval uri_lower=lower(uri_path_norm)
| eval referer_lower=lower(referer_norm)
| eval is_search_referrer=if(match(referer_lower, "(google\.com/search|bing\.com/search|search\.yahoo\.com|duckduckgo\.com|yandex\.com/search)"), 1, 0)
| eval is_sensitive_path=if(match(uri_lower, "(\.env|\.git/config|\.git/head|wp-config\.php|web\.config|config\.php|\.htpasswd|/backup|database\.sql|/credentials|\.aws/credentials|\.ssh/id_rsa|phpinfo\.php|/server-status|/elmah\.axd|\.ds_store)"), 1, 0)
| where is_search_referrer=1 OR is_sensitive_path=1
| rex field=referer_norm "[?&](?:q|p|text)=(?<raw_query>[^&]+)"
| eval search_query=urldecode(replace(raw_query, "\+", " "))
| eval query_lower=lower(coalesce(search_query, ""))
| eval has_dork_operator=if(match(query_lower, "(filetype:|ext:|inurl:|intitle:|intext:|site:|cache:|allintitle:|allinurl:)"), 1, 0)
| eval has_sensitive_term=if(match(query_lower, "(password|passwd|credential|api_key|apikey|secret|token|config|backup|\.env|admin|vpn|portal|jira|confluence|database)"), 1, 0)
| eval detection_branch=case(
is_search_referrer=1 AND (has_dork_operator=1 OR has_sensitive_term=1), "referer_dork",
is_sensitive_path=1, "sensitive_path_access",
1=1, "other")
| where detection_branch != "other"
| eval risk_score=case(
detection_branch="referer_dork" AND has_dork_operator=1 AND has_sensitive_term=1, 3,
detection_branch="referer_dork" AND (has_dork_operator=1 OR has_sensitive_term=1), 2,
detection_branch="sensitive_path_access" AND http_status=200, 3,
detection_branch="sensitive_path_access", 1,
1=1, 1)
| table _time, src_ip, user_norm, http_method, uri_path_norm, uri_query_norm, referer_norm, search_query, has_dork_operator, has_sensitive_term, http_status, detection_branch, risk_score
| sort 0 - risk_score, - _time
Detects Google dorking reconnaissance in IIS or Apache web server access logs in Splunk. The query evaluates two detection branches: (1) a search-engine Referer header whose query string contains dork operators (filetype:, inurl:, intitle:, site:) or sensitive keywords, indicating that the visitor arrived via a crafted dork query; and (2) direct requests for commonly dorked sensitive paths (.env, .git/config, wp-config.php, etc.) that are standard targets of GHDB-style enumeration. Each event receives a risk score from 1 to 3. The highest score (3) goes to dork referrers that contain both an operator and a sensitive term, and to sensitive-path requests that succeeded (HTTP 200). Results are sorted by risk score, then by recency; `sort 0` removes the default result limit of the sort command. Because IIS (W3C) and Apache extractions name the same fields differently, the query normalizes them with coalesce(); confirm the field names against your own add-on's extractions. Google, Bing, and DuckDuckGo carry the search terms in the q parameter, Yahoo in p, and Yandex in text. Note also that major search engines now generally strip the query string from the Referer header and send only the origin (for example https://www.google.com/), so the referer branch is likely to fire only when a client forwards the full search URL.
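To unit-test the regular expressions and scoring outside Splunk (for example, against fixture referers before deploying a change), the branch and risk logic can be mirrored in Python. This is a sketch under stated assumptions, not a Splunk feature: `extract_search_query` and `score_event` are hypothetical names, the patterns are copied from the SPL, and the per-engine query parameter names (`q`, `p`, `text`) are an assumption carried over from the query.

```python
import re
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlsplit

# Patterns copied from the SPL query; they are applied to lower-cased values.
SEARCH_REFERRER_RE = re.compile(
    r"(google\.com/search|bing\.com/search|search\.yahoo\.com|duckduckgo\.com|yandex\.com/search)")
SENSITIVE_PATH_RE = re.compile(
    r"(\.env|\.git/config|\.git/head|wp-config\.php|web\.config|config\.php|\.htpasswd|/backup|"
    r"database\.sql|/credentials|\.aws/credentials|\.ssh/id_rsa|phpinfo\.php|/server-status|"
    r"/elmah\.axd|\.ds_store)")
DORK_OPERATOR_RE = re.compile(
    r"(filetype:|ext:|inurl:|intitle:|intext:|site:|cache:|allintitle:|allinurl:)")
SENSITIVE_TERM_RE = re.compile(
    r"(password|passwd|credential|api_key|apikey|secret|token|config|backup|\.env|admin|vpn|"
    r"portal|jira|confluence|database)")

# Assumed search-term parameters: Google/Bing/DuckDuckGo "q", Yahoo "p", Yandex "text".
QUERY_PARAMS = ("q", "p", "text")


def extract_search_query(referer: str) -> str:
    """Return the decoded search terms from a referer URL, or '' if none."""
    params = parse_qs(urlsplit(referer).query)  # decodes %XX escapes and '+'
    for name in QUERY_PARAMS:
        if name in params:
            return params[name][0]
    return ""


def score_event(uri_path: Optional[str], referer: Optional[str],
                status: int) -> Optional[Tuple[str, int]]:
    """Mirror the SPL detection_branch / risk_score logic for one event."""
    uri_lower = (uri_path or "").lower()
    referer_lower = (referer or "").lower()
    is_search_referrer = bool(SEARCH_REFERRER_RE.search(referer_lower))
    is_sensitive_path = bool(SENSITIVE_PATH_RE.search(uri_lower))
    if not (is_search_referrer or is_sensitive_path):
        return None  # dropped by the first `where` in the SPL
    query_lower = extract_search_query(referer or "").lower()
    has_op = bool(DORK_OPERATOR_RE.search(query_lower))
    has_term = bool(SENSITIVE_TERM_RE.search(query_lower))
    if is_search_referrer and (has_op or has_term):
        return ("referer_dork", 3 if (has_op and has_term) else 2)
    if is_sensitive_path:
        return ("sensitive_path_access", 3 if status == 200 else 1)
    return None  # the SPL's "other" branch, which is filtered out
```

Keeping the patterns as module-level constants makes it easy to diff them against the SPL when either side changes.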
Data Sources
Required Sourcetypes
- iis
- access_combined
- apache:access
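When building test fixtures or replaying raw Apache logs through the detection logic outside Splunk, a small parser for the Apache combined log format (`%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"`) is enough. The sketch below is a hypothetical helper: the field names are chosen to mirror those Splunk commonly extracts for access_combined, and malformed request lines are simply skipped.

```python
import re
from typing import Optional

# Apache "combined" format: host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED_RE = re.compile(
    r'(?P<clientip>\S+) \S+ (?P<user>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<uri>\S+) [^"]*" (?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<useragent>[^"]*)"'
)


def parse_combined(line: str) -> Optional[dict]:
    """Parse one combined-format line into a dict, or return None if it does not match."""
    match = COMBINED_RE.match(line)
    if match is None:
        return None
    event = match.groupdict()
    # Split the request URI into path and query string, like uri_path / uri_query.
    path, _, query = event["uri"].partition("?")
    event["uri_path"] = path
    event["uri_query"] = query
    event["status"] = int(event["status"])
    return event
```

Parsing `203.0.113.7 - - [10/Oct/2024:13:55:36 +0000] "GET /.env?x=1 HTTP/1.1" 404 196 "https://www.google.com/" "Mozilla/5.0"` yields `uri_path` `/.env`, `uri_query` `x=1`, and an integer `status` of 404. The bare-origin referer in that example is typical of what modern search engines send.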
False Positives & Tuning
- Legitimate users who reach public pages through searches that happen to include sensitive keywords (for example 'admin', 'vpn', or 'portal')
- Authorized penetration testers and security researchers performing scheduled assessments against your web infrastructure
- Search engine crawlers (Googlebot, Bingbot) requesting indexed or linked URLs that match the sensitive path patterns (well-behaved crawlers honour robots.txt, so crawler user agents requesting disallowed paths may be spoofed)
- Automated vulnerability scanners performing authorized scans that enumerate common sensitive paths as part of their checklist
- Uptime and performance monitoring tools that periodically probe known application paths
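A common way to tune out the authorized sources listed above is an allowlist applied before alerting. The Python sketch below illustrates the idea under loud assumptions: the CIDR ranges are documentation placeholders, `is_expected_source` and `filter_events` are hypothetical names, and crawler user agents are trusted only when the caller has already verified the source IP separately (for example with the reverse-then-forward DNS check Google documents for Googlebot), because user-agent strings are trivially spoofed.

```python
import ipaddress
from typing import Iterable

# Hypothetical allowlists: replace with your authorized scanner ranges.
AUTHORIZED_SCANNER_NETS = [ipaddress.ip_network(n)
                           for n in ("192.0.2.0/24", "198.51.100.16/28")]
CRAWLER_UA_MARKERS = ("googlebot", "bingbot")


def is_expected_source(client_ip: str, user_agent: str,
                       crawler_verified: bool = False) -> bool:
    """Return True if an event comes from a known benign source and can be suppressed."""
    addr = ipaddress.ip_address(client_ip)
    if any(addr in net for net in AUTHORIZED_SCANNER_NETS):
        return True
    # Only trust a crawler user agent when the IP was verified out of band.
    ua = user_agent.lower()
    return crawler_verified and any(marker in ua for marker in CRAWLER_UA_MARKERS)


def filter_events(events: Iterable[dict]) -> list:
    """Drop events from expected sources; each event needs at least 'clientip'."""
    return [e for e in events
            if not is_expected_source(e["clientip"], e.get("useragent", ""),
                                      e.get("crawler_verified", False))]
```

Keeping the allowlist outside the detection query means scheduled pentest windows or new scanner ranges can be updated without editing the SPL itself.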
Testing Methodology
Validate this detection with the following five Atomic Red Team tests. Each test lists the behaviour to exercise and the telemetry you should expect to see.
- Test 1: Google Dork Enumeration via curl (Linux)
Expected signal: On the attacker system: outbound HTTPS connections to google.com (for example, addresses in the 142.250.0.0/15 range). In attacker network flow logs: TCP connections to port 443 with google.com SNI. If the adversary clicks through to discovered resources: victim W3CIISLog entries with cIP matching the attacker IP, csUriStem matching the discovered sensitive file path, and csReferer containing the search URL, e.g. 'google.com/search?q=site:example.com+filetype:env' (only when the full URL is forwarded; modern search engines usually send just the origin).
- Test 2: Sensitive File Path Probe, Post-Dork Simulation (Linux)
Expected signal: Web server access logs showing 18 sequential GET requests to sensitive paths from the same source IP within seconds. IIS W3CIISLog: cIP=attacker_ip, csMethod=GET, csUriStem matching each path, scStatus reflecting actual server response, scBytes showing bytes returned. Apache/Nginx access_combined: similar fields. The burst pattern (multiple sensitive paths in <30 seconds from same IP) distinguishes automated probing from manual browsing.
- Test 3: theHarvester Search Engine Reconnaissance
Expected signal: On the attacker system: theHarvester makes repeated HTTPS requests to Google/Bing/DuckDuckGo with queries formatted as '@example.com' and 'site:example.com'. Network logs show connections to search engine IP ranges. If the traffic passes through a monitored proxy, the proxy logs capture the search queries. On the victim's DNS infrastructure: resolution requests for any discovered subdomains from the attacker IP may appear in DNS query logs.
- Test 4: Google Hacking Database (GHDB) Pattern Audit via PowerShell
Expected signal: Sysmon Event ID 1: powershell.exe process with Invoke-WebRequest in CommandLine. Sysmon Event ID 3: network connections from powershell.exe to Bing IP ranges (40.77.0.0/17, 204.79.197.200). Windows Firewall log: outbound connections to port 443. PowerShell ScriptBlock Logging Event ID 4104 (if enabled): full script content, including the domain variable and dork patterns. Security Event ID 4688 (if command-line auditing is enabled): PowerShell process with the Invoke-WebRequest command.
- Test 5: robots.txt and Sitemap Enumeration (Pre-Dork Reconnaissance)
Expected signal: Web server access logs showing sequential GET requests to /robots.txt, /sitemap.xml, /sitemap_index.xml, and /.well-known/security.txt from the same source IP within seconds. IIS W3CIISLog: cIP=attacker_ip, csUriStem matching each discovery endpoint. Apache/Nginx access.log: similar pattern. A hunting query for unusual spikes in robots.txt/sitemap access will trigger if this test is run at scale (from multiple IPs) or if the normal baseline is low. The SPL detection query on this page does not match these discovery endpoints, since none of them appear in its sensitive path list.
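The burst behaviour described in Tests 2 and 5 (several sensitive or discovery paths requested from one IP within seconds) is a multi-event pattern that the per-event SPL query does not score. A minimal sliding-window sketch in Python, assuming events have already been reduced to (timestamp, client IP, path) tuples; `find_bursts` is a hypothetical name, the 30-second window follows Test 2, while the threshold of three distinct paths and the sensitive path subset are illustrative choices.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

# Discovery endpoints from Test 5; the sensitive set is an illustrative subset for Test 2.
SENSITIVE_PATHS = {"/.env", "/.git/config", "/wp-config.php", "/.htpasswd",
                   "/phpinfo.php", "/server-status"}
DISCOVERY_PATHS = {"/robots.txt", "/sitemap.xml", "/sitemap_index.xml",
                   "/.well-known/security.txt"}


def find_bursts(events, watched_paths, min_distinct=3,
                window=timedelta(seconds=30)):
    """Return {client_ip: peak distinct watched paths within `window`} for every
    IP whose peak reaches `min_distinct`. `events` holds (timestamp, ip, path)."""
    recent = defaultdict(deque)  # ip -> (timestamp, path) pairs inside the window
    peaks = {}
    for ts, ip, path in sorted(events):
        path = path.lower()
        if path not in watched_paths:
            continue
        hits = recent[ip]
        hits.append((ts, path))
        # Drop hits that fell out of the sliding window ending at ts.
        while ts - hits[0][0] > window:
            hits.popleft()
        distinct = len({p for _, p in hits})
        if distinct >= min_distinct:
            peaks[ip] = max(peaks.get(ip, 0), distinct)
    return peaks
```

Calling `find_bursts(events, DISCOVERY_PATHS)` covers Test 5 and `find_bursts(events, SENSITIVE_PATHS)` covers Test 2. Unlike the SPL regex, which matches substrings anywhere in the path, this sketch compares exact paths.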
References (10)
- https://attack.mitre.org/techniques/T1593/002/
- https://www.exploit-db.com/google-hacking-database
- https://www.recordedfuture.com/threat-intelligence-101/threat-analysis-techniques/google-dorks
- https://securitytrails.com/blog/google-hacking-techniques
- https://github.com/laramies/theHarvester
- https://developers.google.com/search/docs/crawling-indexing/robots/intro
- https://search.google.com/search-console/about
- https://learn.microsoft.com/en-us/azure/sentinel/data-connectors/iis-logs
- https://learn.microsoft.com/en-us/azure/azure-monitor/reference/tables/w3ciislog
- https://www.sans.org/blog/google-hacking-finding-vulnerabilities/