Detect Search Engines in Sumo Logic CSE
Adversaries may use search engines to collect information about victims that can be used during targeting. Search engine services typically crawl online sites to index content and may provide users with specialized syntax to search for specific keywords or specific types of content (e.g., filetypes). Adversaries may craft various search engine queries, commonly called 'Google dorks', to harvest general information about victims, and may use specialized queries to look for spillages or leaks of sensitive information such as network details, credentials, or exposed configuration files. Information from these sources may reveal opportunities for other forms of reconnaissance, establishing operational resources, and/or initial access. The Kimsuky threat group (G0094) has been documented using Google searches to identify target vulnerabilities, tools, and geopolitical trends.
MITRE ATT&CK
- Tactic: Reconnaissance
- Technique: T1593 Search Open Websites/Domains
- Sub-technique: T1593.002 Search Engines
- Canonical reference: https://attack.mitre.org/techniques/T1593/002/
Sumo Detection Query
_sourceCategory=*web* OR _sourceCategory=*iis* OR _sourceCategory=*apache*
| parse regex "(?<client_ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) - (?<user>[^ ]+) .* \"(?<method>[A-Z]+) (?<uri>[^ ]+).*\" (?<status>\d+)"
| count by client_ip, uri
| sort by _count desc
Sumo Logic detection for Search Engines (T1593.002). The query filters on _sourceCategory so that it works across differently routed web, IIS, and Apache log sources, extracts the client IP, user, method, URI, and status with parse regex, and counts requests by client IP and URI to surface search engine reconnaissance patterns. Designed for the Sumo Logic Cloud SIEM platform.
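To check the extraction logic offline before deploying the query, the same parse-and-count steps can be sketched in Python. This is a minimal sketch, not part of the detection package: the function name `count_by_ip_and_uri` and the sample log lines are hypothetical, and the sample assumes Apache combined-style access logs with documentation-range IP addresses.

```python
import re
from collections import Counter

# Same capture groups as the parse regex in the query above,
# written with Python's (?P<name>...) syntax.
ACCESS_LOG = re.compile(
    r'(?P<client_ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) - (?P<user>[^ ]+) .* '
    r'"(?P<method>[A-Z]+) (?P<uri>[^ ]+).*" (?P<status>\d+)'
)


def count_by_ip_and_uri(lines):
    """Return ((client_ip, uri), count) pairs, highest count first."""
    counts = Counter()
    for line in lines:
        match = ACCESS_LOG.search(line)
        if match:  # lines that do not match the pattern are skipped
            counts[(match["client_ip"], match["uri"])] += 1
    return counts.most_common()


# Hypothetical sample lines (assumption: Apache combined-style format).
SAMPLE = [
    '203.0.113.7 - - [10/Oct/2024:13:55:36 +0000] "GET /.env HTTP/1.1" 404 153',
    '203.0.113.7 - - [10/Oct/2024:13:55:37 +0000] "GET /.env HTTP/1.1" 404 153',
    '198.51.100.4 - - [10/Oct/2024:13:56:01 +0000] "GET /index.html HTTP/1.1" 200 5120',
]

if __name__ == "__main__":
    for (ip, uri), n in count_by_ip_and_uri(SAMPLE):
        print(ip, uri, n)
```

Running the sketch against a sample of your own logs is a quick way to confirm that the pattern matches your log format before relying on the query's counts.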
False Positives & Tuning
- Legitimate users reaching public web content via normal search engine queries that happen to contain sensitive keywords (e.g., searching for 'admin portal login guide' and landing on your documentation)
- Security researchers and authorized penetration testers performing scheduled reconnaissance assessments against your domains
- Search engine crawlers (Googlebot, Bingbot, DuckDuckBot) probing robots.txt, sitemap.xml, and other indexed paths as part of normal site indexing
- Automated vulnerability scanners (Qualys, Nessus, Burp Suite Enterprise) probing for sensitive file paths during authorized scheduled scans
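The false-positive sources listed above could be suppressed with a pre-filter along the following lines. This is a sketch under stated assumptions: the crawler tokens come from the list above, while the `AUTHORIZED_SCANNERS` range and the `is_expected_traffic` helper are hypothetical placeholders to be replaced with your own scanner and penetration-test source addresses.

```python
import ipaddress

# Crawler names taken from the list above, matched against the User-Agent.
CRAWLER_TOKENS = ("googlebot", "bingbot", "duckduckbot")

# Hypothetical allowlist of authorized scanner / pentest source ranges.
AUTHORIZED_SCANNERS = [ipaddress.ip_network("192.0.2.0/28")]


def is_expected_traffic(client_ip, user_agent):
    """Return True if a request looks like a known crawler or authorized scanner."""
    ua = user_agent.lower()
    if any(token in ua for token in CRAWLER_TOKENS):
        return True
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in AUTHORIZED_SCANNERS)
```

A User-Agent match alone is weak evidence because the header is trivially spoofed; pairing it with a source-address check (for example, the reverse DNS verification that search engines document for their crawlers) would make the filter harder to abuse.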
Testing Methodology
Validate this detection against 5 adversary techniques from Atomic Red Team. Each test below lists the behaviour to exercise and the telemetry you should expect to see.
- Test 1: Google Dork Enumeration via curl (Linux)
Expected signal: On the attacker system: outbound HTTPS connections to google.com (142.250.0.0/15 range). In attacker network flow logs: TCP connections to port 443 with google.com SNI. If the adversary clicks through to discovered resources: victim W3CIISLog entries with csReferer containing 'google.com/search?q=site:example.com+filetype:env', cIP matching the attacker IP, and csUriStem matching the discovered sensitive file path.
- Test 2: Sensitive File Path Probe, Post-Dork Simulation (Linux)
Expected signal: Web server access logs showing 18 sequential GET requests to sensitive paths from the same source IP within seconds. IIS W3CIISLog: cIP=attacker_ip, csMethod=GET, csUriStem matching each path, scStatus reflecting actual server response, scBytes showing bytes returned. Apache/Nginx access_combined: similar fields. The burst pattern (multiple sensitive paths in <30 seconds from same IP) distinguishes automated probing from manual browsing.
- Test 3: theHarvester Search Engine Reconnaissance
Expected signal: On the attacker system: theHarvester makes repeated HTTPS requests to Google/Bing/DuckDuckGo with queries formatted as '@example.com' and 'site:example.com'. Network logs show connections to search engine IP ranges. If using a monitored network: proxy logs capture the search queries. On the victim's DNS infrastructure: resolution requests for any discovered subdomains from the attacker IP may appear in DNS query logs.
- Test 4: Google Hacking Database (GHDB) Pattern Audit via PowerShell
Expected signal: Sysmon Event ID 1: powershell.exe process with Invoke-WebRequest in CommandLine. Sysmon Event ID 3: network connections from powershell.exe to Bing IP ranges (40.77.0.0/17, 204.79.197.200). Windows Firewall log: outbound connections to port 443. PowerShell ScriptBlock Logging Event ID 4104 (if enabled): full script content including domain variable and dork patterns. Security Event ID 4688 (if command line auditing enabled): PowerShell process with Invoke-WebRequest command.
- Test 5: robots.txt and Sitemap Enumeration (Pre-Dork Reconnaissance)
Expected signal: Web server access logs showing sequential GET requests to /robots.txt, /sitemap.xml, /sitemap_index.xml, and /.well-known/security.txt from the same source IP within seconds. IIS W3CIISLog: cIP=attacker_ip, csUriStem matching each discovery endpoint. Apache/Nginx access.log: similar pattern. The hunting query for unusual spikes in robots.txt/sitemap access triggers if this is run at scale (multiple IPs) or if the baseline is low.
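Two of the expected signals above lend themselves to simple offline checks: dork operators in a referer (Test 1) and a burst of distinct watched paths from one source IP (Tests 2 and 5). The sketch below is illustrative only. The helper names, the operator list beyond `site:` and `filetype:`, and the `min_paths=3` threshold are assumptions; the 30-second window and the discovery endpoints are taken from the test descriptions.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta
from urllib.parse import urlparse, parse_qs

# Assumption: a small, non-exhaustive set of common search operators.
DORK_OPERATOR = re.compile(r"\b(site|filetype|inurl|intitle|ext):", re.IGNORECASE)

# Discovery endpoints named in Test 5.
DISCOVERY_PATHS = {
    "/robots.txt",
    "/sitemap.xml",
    "/sitemap_index.xml",
    "/.well-known/security.txt",
}


def dork_operators(referer):
    """Return the search operators found in the q= parameter of a referer URL."""
    query = parse_qs(urlparse(referer).query).get("q", [""])[0]
    return sorted({op.lower() for op in DORK_OPERATOR.findall(query)})


def burst_sources(events, paths, window=timedelta(seconds=30), min_paths=3):
    """Return client IPs that requested at least min_paths distinct watched
    paths within one window. events is an iterable of (timestamp, ip, uri)."""
    by_ip = defaultdict(list)
    for ts, ip, uri in events:
        if uri in paths:
            by_ip[ip].append((ts, uri))

    flagged = set()
    for ip, hits in by_ip.items():
        hits.sort()  # order by timestamp
        start = 0
        for end in range(len(hits)):
            # Shrink the window until it spans no more than `window`.
            while hits[end][0] - hits[start][0] > window:
                start += 1
            if len({uri for _, uri in hits[start:end + 1]}) >= min_paths:
                flagged.add(ip)
                break
    return flagged
```

For the referer from Test 1, `dork_operators("https://www.google.com/search?q=site:example.com+filetype:env")` yields the `site` and `filetype` operators. In practice the referer check may rarely fire, since many modern browsers send only the origin on cross-site requests by default, so the burst check is likely the more dependable of the two.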
References
- https://attack.mitre.org/techniques/T1593/002/
- https://www.exploit-db.com/google-hacking-database
- https://www.recordedfuture.com/threat-intelligence-101/threat-analysis-techniques/google-dorks
- https://securitytrails.com/blog/google-hacking-techniques
- https://github.com/laramies/theHarvester
- https://developers.google.com/search/docs/crawling-indexing/robots/intro
- https://search.google.com/search-console/about
- https://learn.microsoft.com/en-us/azure/sentinel/data-connectors/iis-logs
- https://learn.microsoft.com/en-us/azure/azure-monitor/reference/tables/w3ciislog
- https://www.sans.org/blog/google-hacking-finding-vulnerabilities/