Detect Determine Physical Locations in Microsoft Sentinel
Adversaries may gather the victim's physical location(s) that can be used during targeting. Information about physical locations of a target organization may include a variety of details, including where key resources and infrastructure are housed. Physical locations may also indicate what legal jurisdiction and/or authorities the victim operates within. Adversaries may gather this information via direct elicitation through phishing for information, by searching victim-owned websites, or by leveraging publicly accessible data sets such as SEC EDGAR filings, WHOIS registration records, and social media. This reconnaissance technique is largely external to the victim environment, making direct detection extremely limited. Observable signals include automated scraping of organization-owned web properties, OSINT tool execution on managed endpoints, and email-based location elicitation attempts.
MITRE ATT&CK
- Tactic: Reconnaissance
- Technique: T1591 Gather Victim Org Information
- Sub-technique: T1591.001 Determine Physical Locations
- Canonical reference: https://attack.mitre.org/techniques/T1591/001/
KQL Detection Query
// Branch 1: Detect automated scraping of location/contact pages via WAF/proxy logs (CommonSecurityLog)
let LocationPagePatterns = dynamic(["/contact", "/about", "/locations", "/offices", "/headquarters", "/find-us", "/branches", "/our-locations", "/office-locations", "/where-we-are"]);
let SuspiciousAgents = dynamic(["python-requests", "go-http-client", "curl/", "wget/", "Scrapy", "theHarvester", "recon-ng", "HTTrack", "WebCopier", "libwww-perl", "mechanize", "python-urllib"]);
let WebScrapingAlerts = CommonSecurityLog
    | where TimeGenerated > ago(24h)
    // CommonSecurityLog stores the HTTP user agent in RequestClientApplication
    | extend UserAgent = RequestClientApplication
    | where RequestURL has_any (LocationPagePatterns)
    | where UserAgent has_any (SuspiciousAgents)
        or UserAgent matches regex @"(?i)(bot|spider|crawler|scraper|scanner|harvest)"
        or isempty(UserAgent)
    | summarize
        RequestCount = count(),
        UniqueURLs = dcount(RequestURL),
        FirstSeen = min(TimeGenerated),
        LastSeen = max(TimeGenerated),
        SampleURLs = make_set(RequestURL, 5),
        SampleAgents = make_set(UserAgent, 3)
        by SourceIP, DeviceName
    | where RequestCount > 15 or UniqueURLs > 4
    | extend DetectionSource = "WAF_LocationScraping"
    | extend DeviceName2 = DeviceName, AccountName2 = "", FileName2 = "", ProcessCommandLine2 = "";
// Branch 2: OSINT tools for physical location gathering on managed endpoints
let OsintToolAlerts = DeviceProcessEvents
    | where Timestamp > ago(24h)
    | where FileName has_any ("theHarvester", "recon-ng", "maltego", "spiderfoot", "metagoofil", "datasploit")
        or (FileName in~ ("python.exe", "python3.exe", "python")
            and ProcessCommandLine has_any ("theHarvester", "recon-ng", "harvester", "spiderfoot", "metagoofil"))
        or (FileName in~ ("cmd.exe", "powershell.exe")
            and ProcessCommandLine has_all ("whois", "-d"))
    // Placeholder columns use long to match count()/dcount() in Branch 1,
    // so the union does not split them into type-suffixed columns
    | extend
        RequestCount = long(null),
        UniqueURLs = long(null),
        FirstSeen = Timestamp,
        LastSeen = Timestamp,
        SampleURLs = dynamic([]),
        SampleAgents = dynamic([])
    | extend DetectionSource = "Endpoint_OsintTool"
    | extend DeviceName2 = DeviceName, AccountName2 = AccountName, FileName2 = FileName, ProcessCommandLine2 = ProcessCommandLine;
WebScrapingAlerts
| union OsintToolAlerts
| project DetectionSource, FirstSeen, LastSeen, DeviceName2, AccountName2, FileName2, ProcessCommandLine2, SourceIP, RequestCount, UniqueURLs, SampleURLs, SampleAgents
| sort by FirstSeen desc
Detects indicators of physical location reconnaissance targeting the organization using two parallel branches. Branch 1 monitors CommonSecurityLog (WAF/proxy CEF-format logs) for automated scraping of location and contact pages using suspicious user agents or anomalous request volumes. Branch 2 monitors DeviceProcessEvents for known OSINT frameworks (theHarvester, recon-ng, Maltego, SpiderFoot) executing on managed endpoints. Confidence is set to low due to the predominantly external nature of this reconnaissance activity and high false positive potential from legitimate web crawlers.
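To sanity-check the Branch 1 thresholds against sample log records before deploying the rule, the filtering and aggregation can be approximated offline. The following is a minimal Python sketch, not the rule itself: the record keys (`source_ip`, `device_name`, `url`, `user_agent`) and the function names are hypothetical, KQL `has_any` is approximated by a case-insensitive substring match, and `dcount` is replaced by an exact distinct count.

```python
import re
from collections import defaultdict

LOCATION_PAGE_PATTERNS = [
    "/contact", "/about", "/locations", "/offices", "/headquarters",
    "/find-us", "/branches", "/our-locations", "/office-locations", "/where-we-are",
]
SUSPICIOUS_AGENTS = [
    "python-requests", "go-http-client", "curl/", "wget/", "scrapy", "theharvester",
    "recon-ng", "httrack", "webcopier", "libwww-perl", "mechanize", "python-urllib",
]
AGENT_REGEX = re.compile(r"(bot|spider|crawler|scraper|scanner|harvest)", re.IGNORECASE)


def is_suspicious_agent(user_agent):
    """Mirror the three user-agent conditions: empty, known tool, or generic bot keyword."""
    if not user_agent:
        return True
    lowered = user_agent.lower()
    return any(a in lowered for a in SUSPICIOUS_AGENTS) or bool(AGENT_REGEX.search(user_agent))


def find_location_scrapers(records, min_requests=15, min_unique_urls=4):
    """Group matching requests by (source_ip, device_name) and apply the thresholds.

    Substring matching only approximates KQL has_any (which is term-based).
    """
    groups = defaultdict(lambda: {"count": 0, "urls": set()})
    for rec in records:
        url = rec["url"].lower()
        if not any(p in url for p in LOCATION_PAGE_PATTERNS):
            continue
        if not is_suspicious_agent(rec.get("user_agent")):
            continue
        group = groups[(rec["source_ip"], rec["device_name"])]
        group["count"] += 1
        group["urls"].add(rec["url"])
    return {
        key: {"RequestCount": g["count"], "UniqueURLs": len(g["urls"])}
        for key, g in groups.items()
        if g["count"] > min_requests or len(g["urls"]) > min_unique_urls
    }
```

A source that requests five distinct location pages with a `python-requests` user agent is flagged on the unique-URL threshold alone, while a crawler that matches the `bot` keyword but makes only a few requests stays below both thresholds.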
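Data Sources
- WAF/proxy logs in CEF format
- Endpoint process creation telemetry
Required Tables
- CommonSecurityLog
- DeviceProcessEvents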
False Positives & Tuning
- Legitimate search engine crawlers (Googlebot, Bingbot, DuckDuckBot) accessing public location pages — filter by known good crawler IP ranges published by Google and Microsoft
- Internal IT security teams or authorized penetration testers executing OSINT tools on managed endpoints during sanctioned assessments — correlate against approved change tickets
- Marketing or business development teams using web scraping tools for competitive intelligence or market research — verify user account context and business justification
- Website uptime monitoring and accessibility checking services (UptimeRobot, Pingdom, StatusCake) that regularly access contact/about pages to verify availability
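The first tuning item, filtering by known-good crawler IP ranges, can be prototyped offline before it is encoded in the rule. The sketch below is one possible approach using Python's standard `ipaddress` module. The CIDR blocks are documentation placeholders (RFC 5737), not real Google or Microsoft ranges, and the function names and the `SourceIP` alert key are assumptions for illustration; substitute the ranges the crawler operators actually publish.

```python
import ipaddress

# Placeholder CIDR blocks (RFC 5737 documentation ranges), NOT real crawler ranges.
# Replace with the ranges published by the crawler operators you trust.
KNOWN_CRAWLER_RANGES = [
    ipaddress.ip_network(cidr) for cidr in ("192.0.2.0/24", "198.51.100.0/24")
]


def is_known_crawler(ip):
    """Return True when the address falls inside any allowlisted crawler range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in KNOWN_CRAWLER_RANGES)


def drop_known_crawlers(alerts):
    """Keep only alerts whose SourceIP is outside the allowlisted ranges."""
    return [a for a in alerts if not is_known_crawler(a["SourceIP"])]
```

Allowlisting by source range rather than by user agent matters here because the user-agent string is attacker-controlled, whereas the source address of a completed HTTP request is much harder to spoof.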
Testing Methodology
Validate this detection against five adversary behaviours from Atomic Red Team. Each test below lists the behaviour to exercise and the telemetry you should expect to see.
- Test 1: theHarvester Domain Reconnaissance for Physical Location Data
Expected signal: Auditd syscall: execve() for 'theharvester' binary with -d and -b arguments. Sysmon for Linux Event ID 1 (if deployed): Image=theharvester, CommandLine contains '-d example.com -b google,bing,linkedin'. DNS queries: multiple resolution requests for subdomains of example.com. Network connections: outbound HTTPS (port 443) to Google, Bing, and LinkedIn APIs. File creation: /tmp/argus_location_recon.html and .xml output files.
- Test 2: recon-ng Physical Location Module Execution
Expected signal: Process creation: recon-ng process with -w workspace and -C command arguments. Network connections: outbound HTTPS to whois servers and the recon-ng module data sources. File creation: ~/.recon-ng/workspaces/argus_test_workspace/ directory and data.db SQLite database. DNS queries for whois server hostnames.
- Test 3: Automated HTTP Scraping of Organizational Location Pages
Expected signal: Web server access log entries: Multiple requests from 127.0.0.1 with UserAgent='python-requests/2.28.2' targeting /contact, /about, /locations, /offices, /headquarters, /find-us, /branches, /our-locations in rapid succession. Process creation (Sysmon Event ID 1): python3 process with inline script. Network connections (Sysmon Event ID 3): python3 to localhost:8080. UniqueURLs=8 exceeds the detection threshold of 4.
- Test 4: WHOIS Registration Data Physical Address Extraction
Expected signal: Process creation: whois binary executed three times (one per domain), visible in Sysmon Event ID 1 with FileName=whois and domain argument in CommandLine. Network connections (Sysmon Event ID 3): TCP connections to port 43 (WHOIS protocol) to whois.iana.org, whois.verisign-grs.com, and registrar-specific WHOIS servers. DNS queries (Sysmon Event ID 22) for WHOIS server hostnames. Shell history: whois commands preserved in ~/.bash_history.
- Test 5: SEC EDGAR Filing Search for Physical Address Disclosure
Expected signal: Process creation (Sysmon Event ID 1): curl execution with EDGAR URL as argument, FileName=curl. DNS query (Sysmon Event ID 22): efts.sec.gov DNS resolution from internal endpoint. Network connection (Sysmon Event ID 3): outbound HTTPS (port 443) to SEC EDGAR servers. File creation (Sysmon Event ID 11): /tmp/edgar_location_results.json. Shell history: curl command with EDGAR URL preserved.
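When planning these tests, it helps to know which process events the Branch 2 filter would be expected to match. The following Python sketch approximates that filter under stated assumptions: KQL `has_any` and `has_all` are treated as case-insensitive substring checks, and the sample command lines are illustrative reconstructions from the expected-signal descriptions above (placeholders in angle brackets stand for arguments the descriptions do not spell out). The names `branch2_matches` and `SAMPLE_EVENTS` are hypothetical.

```python
OSINT_TOOL_NAMES = ["theharvester", "recon-ng", "maltego", "spiderfoot", "metagoofil", "datasploit"]
PYTHON_HOSTS = {"python.exe", "python3.exe", "python"}
PYTHON_TOOL_ARGS = ["theharvester", "recon-ng", "harvester", "spiderfoot", "metagoofil"]
SHELL_HOSTS = {"cmd.exe", "powershell.exe"}


def branch2_matches(file_name, command_line):
    """Approximate the three OR-ed clauses of the Branch 2 process filter."""
    name = file_name.lower()
    cmd = command_line.lower()
    # Clause 1: the process image itself is a known OSINT tool.
    if any(tool in name for tool in OSINT_TOOL_NAMES):
        return True
    # Clause 2: a Python interpreter launched with an OSINT tool in its arguments.
    if name in PYTHON_HOSTS and any(tool in cmd for tool in PYTHON_TOOL_ARGS):
        return True
    # Clause 3: whois with -d, but only when run from cmd.exe or powershell.exe.
    if name in SHELL_HOSTS and "whois" in cmd and "-d" in cmd:
        return True
    return False


# Illustrative (file name, command line) pairs, one per test above.
SAMPLE_EVENTS = {
    "Test 1": ("theharvester", "theharvester -d example.com -b google,bing,linkedin"),
    "Test 2": ("recon-ng", "recon-ng -w argus_test_workspace -C <command>"),
    "Test 3": ("python3", "python3 -c <inline script>"),
    "Test 4": ("whois", "whois example.com"),
    "Test 5": ("curl", "curl <EDGAR URL>"),
}

coverage = {test: branch2_matches(*event) for test, event in SAMPLE_EVENTS.items()}
```

Under this approximation only the Test 1 and Test 2 events match Branch 2. Test 3 relies on Branch 1 seeing the web requests in CommonSecurityLog, and the Test 4 and Test 5 events do not match Branch 2 as written, because its whois clause applies only when FileName is cmd.exe or powershell.exe and no clause references curl.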
References (12)
- https://attack.mitre.org/techniques/T1591/001/
- https://attack.mitre.org/techniques/T1591/
- https://www.sec.gov/edgar/search/
- https://threatpost.com/broadvoice-leaks-350m-records-voicemail-transcripts/160158/
- https://blog.google/threat-analysis-group/iranian-backed-threat-actor-techniques/
- https://github.com/laramies/theHarvester
- https://github.com/lanmaster53/recon-ng
- https://github.com/smicallef/spiderfoot
- https://www.maltego.com/
- https://learn.microsoft.com/en-us/azure/web-application-firewall/ag/ag-overview
- https://learn.microsoft.com/en-us/microsoft-365/security/defender/advanced-hunting-emailevents-table
- https://docs.splunk.com/Documentation/SplunkCloud/latest/SearchReference/Multisearch