← Blog · · df00tech

Splunk SPL vs Microsoft KQL: Detection Rule Syntax Comparison

SPL KQL Splunk Microsoft Sentinel detection engineering comparison

If you learned SPL first, KQL feels backwards. If you learned KQL first, SPL feels stuck in the 90s. Both catch the same attacks — they just argue differently about how.

This post walks through the real syntactic differences between Splunk’s Search Processing Language and Microsoft’s Kusto Query Language, with three production detection queries translated side-by-side. Every SPL and KQL snippet below is copied verbatim from the df00tech detection library — no hand-rolled examples, no simplified toy queries.

1. Philosophy: Pipes vs Operator Chain, Schema-on-Read vs Tabular

SPL was born from the Unix pipe. You start with a search (often implicit search index=...), and each | passes rows to the next command. Commands are verbs: stats, eval, table, sort. The schema is discovered at search time — Splunk looks at raw events, extracts fields on the fly (schema-on-read), and lets you transform them mid-pipeline. This is extremely flexible for messy, heterogeneous log data.

KQL was born from OLAP analytics. It’s a tabular expression language — queries always start from a named table with a known schema. You chain operators (also with |, but semantically different) like where, summarize, extend, project. Data is strongly typed. The optimizer knows column types up front, which is why KQL tends to be fast on large volumes but less forgiving when your schema drifts.

The practical impact: SPL will happily eat malformed JSON, weird CSVs, and half-broken syslog and still produce a result. KQL expects you to normalize ingestion, define schemas, and commit to field types. SPL rewards hackers. KQL rewards librarians.

2. Core Operators Side-by-Side

Filtering rows:

index=wineventlog sourcetype=Sysmon EventCode=1 Image="*\\powershell.exe"
DeviceProcessEvents
| where FileName =~ "powershell.exe"

Notice there is no | before the Splunk filter — the initial search clause is implicit. In KQL, every operator, including the first where, sits behind a pipe.

Aggregation:

| stats count as Hits, dc(host) as Devices by User
| summarize Hits = count(), Devices = dcount(DeviceName) by AccountName

stats and summarize are semantic twins. SPL uses dc() for distinct count; KQL uses dcount(). Aliases come after the function in SPL (count as Hits), before the function in KQL (Hits = count()).

Computed columns:

| eval EncodedCmd=if(match(CommandLine, "-encodedcommand"), 1, 0)
| extend EncodedCmd = ProcessCommandLine has "-encodedcommand"

eval vs extend. SPL’s eval returns integers or strings for booleans (most people use 1/0). KQL’s extend returns a real bool type, which is why downstream KQL can do | where EncodedCmd without comparing to anything.

Regex and string matching:

| where match(CommandLine, "(invoke-webrequest|iwr\s|net\.webclient)")
| where ProcessCommandLine has_any ("Invoke-WebRequest", "IWR ", "Net.WebClient")

This is where the two languages really diverge. SPL leans on PCRE regex; KQL leans on indexed term operators (has, has_any, has_all, contains, startswith). has_any is dramatically faster than regex on KQL’s indexed string columns. SPL has no equivalent indexed operator — match() is a linear scan.

3. Joining Data Sources

SPL’s default joining strategy is append, appendcols, or the subsearch [ search ... ]. Joins exist but are considered expensive and row-limited:

search index=wineventlog EventCode=4624
| append [ search index=wineventlog EventCode=4672 ]

KQL uses union for stacking tables and join for relational joins. Both are first-class:

DeviceLogonEvents
| union DeviceProcessEvents
| where Timestamp > ago(24h)

The mental model difference: in SPL, everything is a stream of events flowing through the pipe and you splice streams together. In KQL, tables are real relational objects and you compose them with set operations.

4. Time-Bucketing and Aggregation

Splunk:

| bin _time span=1h
| stats count by _time, host

KQL:

| summarize count() by bin(TimeGenerated, 1h), DeviceName

The logic is identical, but the syntax locations flip. In SPL, bin is a separate pipeline step that modifies _time before stats. In KQL, bin() is a function invoked inside summarize by, keeping everything in one statement. KQL’s style composes better for complex queries; SPL’s step-by-step style is easier to debug one command at a time.

5. Three Real Detections, Translated

Now the real thing. Below are three production detections from the df00tech library, shown verbatim in both KQL and SPL. These are not simplified — they are the actual queries we ship.

Detection A: Suspicious PowerShell Execution (T1059.001)

let SuspiciousPatterns = dynamic([
  "-EncodedCommand", "-enc ", "-e ", "-ec ",
  "Invoke-WebRequest", "IWR ", "Invoke-RestMethod",
  "Net.WebClient", "DownloadString", "DownloadFile", "DownloadData",
  "Start-BitsTransfer",
  "AmsiUtils", "amsiInitFailed", "SetProtectionLevel",
  "Invoke-Expression", "IEX(", "IEX ",
  "-ExecutionPolicy Bypass", "-ep bypass", "-ep unrestricted",
  "-WindowStyle Hidden", "-w hidden", "-windowstyle h",
  "[Convert]::FromBase64String", "[System.Convert]::FromBase64String",
  "Invoke-Mimikatz", "Invoke-Shellcode",
  "New-Object IO.MemoryStream", "IO.Compression",
  "bitsadmin", "certutil -urlcache"
]);
DeviceProcessEvents
| where Timestamp > ago(24h)
| where FileName =~ "powershell.exe" or FileName =~ "pwsh.exe"
| where ProcessCommandLine has_any (SuspiciousPatterns)
| extend EncodedCmd = ProcessCommandLine has_any ("-EncodedCommand", "-enc ", "-e ", "-ec ")
| extend DownloadCradle = ProcessCommandLine has_any ("Invoke-WebRequest", "Net.WebClient", "DownloadString", "DownloadFile", "IWR ", "Start-BitsTransfer")
| extend AmsiBypass = ProcessCommandLine has_any ("AmsiUtils", "amsiInitFailed", "SetProtectionLevel")
| extend PolicyBypass = ProcessCommandLine has_any ("-ExecutionPolicy Bypass", "-ep bypass")
| extend HiddenWindow = ProcessCommandLine has_any ("-WindowStyle Hidden", "-w hidden")
| project Timestamp, DeviceName, AccountName, FileName, ProcessCommandLine,
         InitiatingProcessFileName, InitiatingProcessCommandLine,
         EncodedCmd, DownloadCradle, AmsiBypass, PolicyBypass, HiddenWindow
| sort by Timestamp desc
index=wineventlog sourcetype="XmlWinEventLog:Microsoft-Windows-Sysmon/Operational" EventCode=1
  (Image="*\\powershell.exe" OR Image="*\\pwsh.exe")
| eval CommandLine=lower(CommandLine)
| eval EncodedCmd=if(match(CommandLine, "(-encodedcommand|-enc\s|-e\s|-ec\s)"), 1, 0)
| eval DownloadCradle=if(match(CommandLine, "(invoke-webrequest|iwr\s|net\.webclient|downloadstring|downloadfile|downloaddata|start-bitstransfer|invoke-restmethod)"), 1, 0)
| eval AmsiBypass=if(match(CommandLine, "(amsiutils|amsiinitfailed|setprotectionlevel)"), 1, 0)
| eval PolicyBypass=if(match(CommandLine, "(-executionpolicy\s+bypass|-ep\s+bypass|-ep\s+unrestricted)"), 1, 0)
| eval HiddenWindow=if(match(CommandLine, "(-windowstyle\s+hidden|-w\s+hidden)"), 1, 0)
| eval InvokeExpression=if(match(CommandLine, "(invoke-expression|iex\(|iex\s)"), 1, 0)
| eval SuspicionScore=EncodedCmd + DownloadCradle + AmsiBypass + PolicyBypass + HiddenWindow + InvokeExpression
| where SuspicionScore > 0
| table _time, host, User, Image, CommandLine, ParentImage, ParentCommandLine, EncodedCmd, DownloadCradle, AmsiBypass, PolicyBypass, HiddenWindow, InvokeExpression, SuspicionScore
| sort - _time

Three syntactic differences that matter here:

  1. Pattern arrays vs regex alternation. KQL declares a dynamic([]) list and uses has_any for indexed multi-term matching. SPL jams alternations into a single match() regex. The KQL version is easier to maintain (add one string to the array) and faster at scale (term index lookup). The SPL version is more portable across Splunk versions that lack list variables.

  2. Boolean columns vs integer flags. KQL’s extend EncodedCmd = ... has_any (...) produces a native bool. SPL’s eval EncodedCmd=if(..., 1, 0) produces an integer. That integer lets SPL do SuspicionScore = EncodedCmd + DownloadCradle + ... — arithmetic on booleans. KQL would need toint() casts for the same trick, which is why this detection’s KQL version skips the composite score entirely.

  3. Schema binding. KQL starts with DeviceProcessEvents, a Microsoft Defender for Endpoint table with a fixed schema. SPL starts with index=wineventlog sourcetype=... — a content-addressed scope that Splunk schema-on-reads. Same attack, two completely different data-access models.

Detection B: LSASS Credential Dumping (T1003.001)

let LsassAccessTools = dynamic([
  "procdump", "procdump64", "mimikatz", "mimilib", "wce", "gsecdump",
  "lsass.exe", "sqldumper", "werfault", "taskmgr"
]);
let SuspiciousLsassAccess = DeviceEvents
| where Timestamp > ago(24h)
| where ActionType == "ProcessAccessed"
| where FileName =~ "lsass.exe"
| where InitiatingProcessFileName !in~ (
    "MsMpEng.exe", "csrss.exe", "services.exe", "lsm.exe",
    "svchost.exe", "winlogon.exe", "wmiprvse.exe", "wininit.exe",
    "SecurityHealthService.exe", "SenseIR.exe"
  )
| where InitiatingProcessGrantedAccessMask in (
    "0x1fffff", "0x1f3fff", "0x143a", "0x1410", "0x1010", "0x40"
  )
| project Timestamp, DeviceName, InitiatingProcessFileName, InitiatingProcessCommandLine,
          InitiatingProcessAccountName, InitiatingProcessGrantedAccessMask;
let ComsvcsMinidump = DeviceProcessEvents
| where Timestamp > ago(24h)
| where FileName =~ "rundll32.exe"
| where ProcessCommandLine has_all ("comsvcs.dll", "MiniDump")
| project Timestamp, DeviceName, AccountName, FileName, ProcessCommandLine;
let ProcDumpLsass = DeviceProcessEvents
| where Timestamp > ago(24h)
| where FileName in~ ("procdump.exe", "procdump64.exe")
| where ProcessCommandLine has "lsass"
| project Timestamp, DeviceName, AccountName, FileName, ProcessCommandLine;
union SuspiciousLsassAccess, ComsvcsMinidump, ProcDumpLsass
| sort by Timestamp desc
index=wineventlog sourcetype="XmlWinEventLog:Microsoft-Windows-Sysmon/Operational" EventCode=10
  TargetImage="*\\lsass.exe"
  NOT (SourceImage="*\\MsMpEng.exe" OR SourceImage="*\\csrss.exe" OR SourceImage="*\\services.exe"
    OR SourceImage="*\\lsm.exe" OR SourceImage="*\\svchost.exe" OR SourceImage="*\\winlogon.exe"
    OR SourceImage="*\\wininit.exe" OR SourceImage="*\\SecurityHealthService.exe")
| eval HighAccess=if(match(GrantedAccess, "^0x(1[fF][3f][fF]{3}|1[fF][fF]{4}|143[aA]|1[04][01][0a0])"), 1, 0)
| where HighAccess=1
| table _time, host, SourceImage, SourceCommandLine, GrantedAccess, SourceUser
| sort - _time
| append
  [search index=wineventlog sourcetype="XmlWinEventLog:Microsoft-Windows-Sysmon/Operational" EventCode=1
   ((Image="*\\rundll32.exe" CommandLine="*comsvcs*" CommandLine="*MiniDump*")
    OR (Image="*\\procdump*.exe" CommandLine="*lsass*"))
  | eval DetectionType=case(
      match(CommandLine, "comsvcs"), "ComsvcsMinidump",
      match(Image, "procdump"), "ProcDumpLsass",
      1==1, "Other"
    )
  | table _time, host, Image, CommandLine, User, DetectionType]

Three syntactic differences that matter here:

  1. union of named subqueries vs append with subsearch. KQL names three intermediate results (SuspiciousLsassAccess, ComsvcsMinidump, ProcDumpLsass) with let, each of which is a full, reusable table expression, then unions them. SPL has to jam its second and third logical branches into an append [ search ... ] subsearch — and subsearches in Splunk have a default row limit (50,000) that can silently truncate results. The KQL version scales better for high-volume environments.

  2. has_all vs two substring predicates. KQL’s ProcessCommandLine has_all ("comsvcs.dll", "MiniDump") is a single indexed operator checking both terms are present in any order. SPL achieves the same thing with CommandLine="*comsvcs*" CommandLine="*MiniDump*" — two wildcard predicates that trigger scans. Same intent, very different execution cost on large indexes.

  3. Typed set membership vs regex mask matching. KQL’s InitiatingProcessGrantedAccessMask in ("0x1fffff", ...) is a simple set-membership check against discrete string values. SPL wraps the equivalent into a regex ^0x(1[fF][3f][fF]{3}|...) because match() is the most portable way to check multiple possible values in older SPL. The regex works, but it’s harder to audit than KQL’s explicit list.

Detection C: Scheduled Task Creation (T1053.005)

let SuspiciousTaskPatterns = dynamic([
  "cmd.exe", "powershell.exe", "pwsh.exe", "wscript.exe", "cscript.exe",
  "mshta.exe", "rundll32.exe", "regsvr32.exe", "certutil.exe",
  "bitsadmin.exe", "msbuild.exe", "wmic.exe", "msiexec.exe"
]);
let SuspiciousLocations = dynamic([
  "\\AppData\\", "\\Temp\\", "\\ProgramData\\", "\\Public\\",
  "\\Users\\Default\\", "%temp%", "%appdata%", "%public%"
]);
let SuspiciousSchtasksArgs = dynamic([
  "/sc onlogon", "/sc onstart", "/sc onstartup", "/ru system",
  "/ru \"system\"", "http://", "https://", "\\\\\\\\" 
]);
// Branch 1: schtasks.exe process creation with suspicious patterns
let SchtasksExecution = DeviceProcessEvents
| where Timestamp > ago(24h)
| where FileName =~ "schtasks.exe"
| where ProcessCommandLine has_any ("/create", "/change")
| extend HasSuspiciousLoc = ProcessCommandLine has_any (SuspiciousLocations)
| extend HasSuspiciousBin = ProcessCommandLine has_any (SuspiciousTaskPatterns)
| extend RunAsSystem = ProcessCommandLine has_any ("/ru system", "/ru \"SYSTEM\"", "/ru \"NT AUTHORITY\\SYSTEM\"")
| extend RemoteTask = ProcessCommandLine has "/s "
| extend OnLogonTrigger = ProcessCommandLine has_any ("/sc onlogon", "/sc onstartup", "/sc onstart")
| extend HighFreqTrigger = ProcessCommandLine has_any ("/sc minute", "/sc hourly")
| extend SuspicionScore = toint(HasSuspiciousLoc) + toint(HasSuspiciousBin) + toint(RunAsSystem) + toint(RemoteTask) + toint(OnLogonTrigger) + toint(HighFreqTrigger)
| where SuspicionScore > 0
| extend DetectionSource = "schtasks_process"
| project Timestamp, DeviceName, AccountName, FileName, ProcessCommandLine,
         InitiatingProcessFileName, InitiatingProcessCommandLine,
         HasSuspiciousLoc, HasSuspiciousBin, RunAsSystem, RemoteTask,
         OnLogonTrigger, HighFreqTrigger, SuspicionScore, DetectionSource;
// Branch 2: Suspicious parent processes spawning schtasks
let SuspiciousParentSchtasks = DeviceProcessEvents
| where Timestamp > ago(24h)
| where FileName =~ "schtasks.exe"
| where InitiatingProcessFileName in~ ("powershell.exe", "pwsh.exe", "cmd.exe", "wscript.exe",
         "cscript.exe", "mshta.exe", "rundll32.exe", "regsvr32.exe", "msiexec.exe",
         "wmic.exe", "explorer.exe", "winword.exe", "excel.exe", "outlook.exe",
         "acrord32.exe", "msedge.exe", "chrome.exe", "firefox.exe")
| extend SuspicionScore = 2
| extend DetectionSource = "suspicious_parent"
| project Timestamp, DeviceName, AccountName, FileName, ProcessCommandLine,
         InitiatingProcessFileName, InitiatingProcessCommandLine,
         HasSuspiciousLoc=false, HasSuspiciousBin=false, RunAsSystem=false,
         RemoteTask=false, OnLogonTrigger=false, HighFreqTrigger=false,
         SuspicionScore, DetectionSource;
// Branch 3: Task Scheduler registry writes to suspicious paths
let RegistryTaskCreation = DeviceRegistryEvents
| where Timestamp > ago(24h)
| where RegistryKey has "\\SOFTWARE\\Microsoft\\Windows NT\\CurrentVersion\\Schedule\\TaskCache\\"
| where ActionType in ("RegistryValueSet", "RegistryKeyCreated")
| where InitiatingProcessFileName !in~ ("svchost.exe", "taskeng.exe", "taskhostw.exe", "TaskScheduler")
| extend SuspicionScore = 1
| extend DetectionSource = "registry_task"
| project Timestamp=Timestamp, DeviceName, AccountName=InitiatingProcessAccountName,
         FileName=InitiatingProcessFileName,
         ProcessCommandLine=InitiatingProcessCommandLine,
         InitiatingProcessFileName, InitiatingProcessCommandLine,
         HasSuspiciousLoc=false, HasSuspiciousBin=false, RunAsSystem=false,
         RemoteTask=false, OnLogonTrigger=false, HighFreqTrigger=false,
         SuspicionScore, DetectionSource;
union SchtasksExecution, SuspiciousParentSchtasks, RegistryTaskCreation
| sort by Timestamp desc
index=wineventlog (sourcetype="XmlWinEventLog:Microsoft-Windows-Sysmon/Operational" OR sourcetype="WinEventLog:Security")
(
  (EventCode=1 Image="*\\schtasks.exe")
  OR (EventCode=4688 NewProcessName="*\\schtasks.exe")
  OR (EventCode=4698)
  OR (EventCode=4702)
)
| eval CommandLine=coalesce(CommandLine, ProcessCommandLine, CommandLine)
| eval CommandLineLower=lower(CommandLine)
| eval TaskName=coalesce(TaskName, "")
| eval TaskContent=coalesce(TaskContent, "")

| eval HasCreate=if(match(CommandLineLower, "(/create|/change)"), 1, 0)
| eval RunAsSystem=if(match(CommandLineLower, "(/ru\s+(system|\"system\"|\"nt authority\\\\system\"))"), 1, 0)
| eval SuspiciousLoc=if(match(CommandLineLower, "(appdata|\\\\temp\\\\|programdata|\\\\public\\\\|%temp%|%appdata%)"), 1, 0)
| eval SuspiciousBin=if(match(CommandLineLower, "(powershell|cmd\.exe|wscript|cscript|mshta|rundll32|regsvr32|certutil|bitsadmin|msbuild|wmic|msiexec)"), 1, 0)
| eval OnLogonTrigger=if(match(CommandLineLower, "(/sc\s+(onlogon|onstartup|onstart))"), 1, 0)
| eval HighFreqTrigger=if(match(CommandLineLower, "(/sc\s+(minute|hourly))"), 1, 0)
| eval RemoteTask=if(match(CommandLineLower, "/s\s+"), 1, 0)
| eval SuspiciousParent=if(match(lower(ParentImage), "(powershell|pwsh|wscript|cscript|mshta|rundll32|regsvr32|winword|excel|outlook|acrord32|msedge|chrome|firefox)"), 1, 0)
| eval IsSecurityEvent=if(EventCode=4698 OR EventCode=4702, 1, 0)

| eval SuspicionScore=HasCreate + RunAsSystem + SuspiciousLoc + SuspiciousBin + OnLogonTrigger + HighFreqTrigger + RemoteTask + SuspiciousParent + IsSecurityEvent

| where SuspicionScore > 0

| eval DetectionSource=case(
    EventCode=4698, "task_created_security_log",
    EventCode=4702, "task_updated_security_log",
    EventCode=1, "sysmon_process_create",
    EventCode=4688, "security_process_create",
    true(), "unknown"
  )

| table _time, host, User, Image, CommandLine, ParentImage, ParentCommandLine,
        TaskName, HasCreate, RunAsSystem, SuspiciousLoc, SuspiciousBin,
        OnLogonTrigger, HighFreqTrigger, RemoteTask, SuspiciousParent,
        SuspicionScore, DetectionSource, EventCode
| sort - SuspicionScore, - _time

Three syntactic differences that matter here:

  1. let bindings for modular branches vs a single flattened pipeline. The KQL detection breaks into three named branches (SchtasksExecution, SuspiciousParentSchtasks, RegistryTaskCreation) that each query different tables (DeviceProcessEvents vs DeviceRegistryEvents), then unions them. SPL achieves multi-source coverage with a single top-level (EventCode=1 OR EventCode=4688 OR EventCode=4698 OR EventCode=4702) filter — every event type flows through the same eval pipeline. SPL is more compact; KQL is more structured.

  2. case() vs KQL’s iff/case for categorical output. Both languages have a case expression, but SPL’s case(condition1, value1, condition2, value2, true(), default) uses true() as the “else” clause — a readable idiom that KQL lacks.

  3. coalesce() for cross-source field normalization. SPL uses coalesce(CommandLine, ProcessCommandLine, CommandLine) to pick the first non-null value across event types, because Sysmon calls it CommandLine and Security Event 4688 calls it ProcessCommandLine. KQL sidesteps this entirely by querying separate tables that already have consistent column names in each — one of the payoffs of a strictly-typed schema.

6. Performance and Cost Considerations

Indexing model. Splunk indexes every raw event and every extracted field at ingest. That makes full-text search very fast but storage expensive. Typical Splunk licenses charge by ingest volume in GB/day. Microsoft Sentinel also charges by ingest volume but stores data in Azure Data Explorer, which uses columnar storage with per-column compression — often yielding 10-20x better compression than Splunk.

Query cost. KQL’s indexed has, has_any, and has_all operators execute in logarithmic time on a term index. SPL’s match() regex operator is O(n) over each event’s raw text. For a has_any against a list of 30 IOC strings, KQL finishes in the time SPL spends parsing the regex.

Wildcards. SPL’s leading-wildcard patterns ("*\\powershell.exe") force a full event scan — they cannot use the inverted index. KQL does not have leading wildcards by default; you either use endswith (which KQL can optimize on an indexed column) or normalize your filenames at ingest.

Practical impact. On a 500GB/day environment, a badly-written SPL query can cost hundreds of dollars in extra compute per run. The same detection in KQL against the same data volume typically runs for pennies. This is not a small difference — it changes what kinds of hunts are economically viable.

7. Which Should You Learn First?

Opinionated answer: learn KQL first if you’re new to detection engineering. Three reasons:

  1. KQL is on the way up, SPL is on the way down. Microsoft Defender XDR, Sentinel, Azure Monitor, Azure Data Explorer, and Microsoft Fabric all use KQL. The Microsoft ecosystem is the biggest growth market in SOC tooling. Splunk is still everywhere, but the hiring curve has flattened. A Sentinel-native SOC analyst role didn’t exist five years ago; it’s now a common job title.

  2. KQL forces you to think in schemas. Detection engineering at scale requires you to know your data model — which table, which column, which type. KQL makes this non-optional from day one. You can write working SPL for a year without truly understanding your event model, and when you finally need to scale or migrate, you’ll rewrite everything. KQL teaches the right habits up front.

  3. KQL syntax is closer to what the rest of the data world speaks. summarize, join, project, extend map cleanly onto SQL’s GROUP BY, JOIN, SELECT, and computed columns. If you know KQL, picking up Spark SQL, Snowflake, BigQuery, or standard ANSI SQL is an easy transition. SPL’s verb-first pipe syntax is idiosyncratic — it’s a good DSL for Splunk, but the skills transfer less cleanly.

That said: if you already work in a Splunk shop, learn SPL first. The best detection language is the one deployed against your actual data. Detection engineering is a craft practiced against real logs, not a language debate.

Both, Eventually

The real endgame is being able to read and translate detections between both languages. Adversaries don’t care what query language your SOC runs. A PowerShell download cradle looks the same in DeviceProcessEvents and Sysmon. LSASS memory gets dumped the same way whether you’re hunting in Splunk or Sentinel. The detection logic transfers — only the syntax changes.

Browse the full df00tech detection library — every detection ships with queries in BOTH KQL and SPL (plus Elastic, QRadar, Sumo, Chronicle, LogScale). Pick the syntax your stack speaks, and the detection logic comes along for free.

Start with these five techniques — they cover the highest-frequency attacker behaviors in the MITRE ATT&CK framework: