Detect Code Repositories in Sumo Logic CSE
Adversaries may leverage code repositories to collect valuable information including proprietary source code and unsecured credentials embedded within software. Code repositories such as GitHub, GitLab, Bitbucket, and Azure DevOps store source code and automate software builds, and may be hosted internally or externally. Once adversaries gain access via compromised credentials, stolen OAuth tokens, or insider access, they may bulk-clone repositories, run automated secret-scanning tools (trufflehog, gitleaks) to harvest embedded API keys and passwords, or enumerate organizational repositories at scale via API calls. LAPSUS$ searched victim networks for GitLab and GitHub instances to discover high-privilege credentials; Scattered Spider enumerated internal GitHub repositories as part of broader data theft operations; APT41 cloned victim Git repositories during intrusions. Successful exploitation provides adversaries with source code for developing targeted exploits, service credentials for lateral movement, and intellectual property for competitive or financial gain.
MITRE ATT&CK
- Tactic: Collection
- Technique: T1213 Data from Information Repositories
- Sub-technique: T1213.003 Code Repositories
- Canonical reference: https://attack.mitre.org/techniques/T1213/003/
Sumo Detection Query
_sourceCategory=*windows* OR _sourceCategory=*sysmon* OR _sourceCategory=*linux*
| where (%"EventID" = "1" OR %"EventID" = "4688" OR _sourceCategory matches "*linux*")
| parse field=CommandLine "*" as cmd nodrop
| parse field=Image "*" as process_image nodrop
| toLowerCase(cmd) as cmd_lower
| toLowerCase(process_image) as image_lower
// Classify activity
| if(
cmd_lower matches "*trufflehog*" OR
cmd_lower matches "*gitleaks*" OR
cmd_lower matches "*git-secrets*" OR
cmd_lower matches "*gitrob*" OR
cmd_lower matches "*shhgit*" OR
cmd_lower matches "*detect-secrets*" OR
cmd_lower matches "*git-hound*" OR
cmd_lower matches "*gitallsecrets*",
1, 0) as IsSecretScan
| if(
cmd_lower matches "*api.github.com/orgs*" OR
cmd_lower matches "*api.github.com/users/*" OR
cmd_lower matches "*api.github.com/repos*" OR
cmd_lower matches "*api.github.com/search/repositories*" OR
cmd_lower matches "*gitlab.com/api/v4/projects*" OR
cmd_lower matches "*gitlab.com/api/v4/groups*" OR
cmd_lower matches "*api.bitbucket.org/2.0/repositories*" OR
cmd_lower matches "*dev.azure.com*",
1, 0) as IsAPIEnum
// Backslashes inside string literals are escaped
| if(
(image_lower matches "*\\git.exe" OR image_lower matches "*/git")
AND (cmd_lower matches "* archive *" OR cmd_lower matches "* bundle *"),
1, 0) as IsBulkExtract
| if(
(image_lower matches "*\\git.exe" OR image_lower matches "*/git")
AND cmd_lower matches "* clone *",
1, 0) as IsBulkClone
| where IsSecretScan = 1 OR IsAPIEnum = 1 OR IsBulkExtract = 1 OR IsBulkClone = 1
// Aggregate per host, user, and process image
| count as EventCount,
sum(IsBulkClone) as CloneCount,
max(IsSecretScan) as IsSecretScan,
max(IsAPIEnum) as IsAPIEnum,
max(IsBulkExtract) as IsBulkExtract,
values(cmd) as CommandSamples,
min(_messageTime) as FirstSeen,
max(_messageTime) as LastSeen
by host, user, process_image
// A single clone is routine, so require at least 5 clone events per group
| if(CloneCount >= 5, 1, 0) as IsBulkCloneHit
| where IsSecretScan = 1
OR IsAPIEnum = 1
OR IsBulkExtract = 1
OR IsBulkCloneHit = 1
// Check for more than one signal first; otherwise report the single signal that fired
| if(IsSecretScan + IsAPIEnum + IsBulkExtract + IsBulkCloneHit > 1, "MultipleSignals",
if(IsSecretScan = 1, "SecretScanningToolExecution",
if(IsAPIEnum = 1, "RepositoryAPIEnumeration",
if(IsBulkExtract = 1, "GitBulkExtraction",
"BulkRepositoryCloning")))) as DetectionType
| fields FirstSeen, LastSeen, host, user, process_image, DetectionType, EventCount, CommandSamples
| sort by -FirstSeen
Sumo Logic query detecting T1213.003 code repository collection via process creation events from Windows Sysmon (EventID 1), Windows Security (EventID 4688), and Linux audit logs. Classifies events as secret scanning tool execution, repository API enumeration, git bulk extraction, or bulk cloning, applying a count threshold of 5+ for clone operations. Aggregates per host, user, and process image.
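The classification step of the query can be mirrored offline to check the patterns against sample command lines before deploying the rule. The following Python sketch is an illustration only, not part of the Sumo Logic rule: the function names `is_git_image` and `classify` and the short labels are hypothetical, and the wildcard `matches` patterns are approximated with substring and suffix checks.

```python
# Offline approximation of the query's classification logic (illustrative only).
# The pattern lists are copied from the query; function names are hypothetical.

SECRET_SCAN_TOOLS = (
    "trufflehog", "gitleaks", "git-secrets", "gitrob",
    "shhgit", "detect-secrets", "git-hound", "gitallsecrets",
)

API_ENUM_MARKERS = (
    "api.github.com/orgs",
    "api.github.com/users/",
    "api.github.com/repos",
    "api.github.com/search/repositories",
    "gitlab.com/api/v4/projects",
    "gitlab.com/api/v4/groups",
    "api.bitbucket.org/2.0/repositories",
    "dev.azure.com",
)


def is_git_image(image: str) -> bool:
    """True when the process image path ends in \\git.exe or /git."""
    image = image.lower()
    return image.endswith("\\git.exe") or image.endswith("/git")


def classify(image: str, command_line: str) -> set:
    """Return the set of signal labels that a single process event triggers."""
    cmd = command_line.lower()
    labels = set()
    if any(tool in cmd for tool in SECRET_SCAN_TOOLS):
        labels.add("SecretScan")
    if any(marker in cmd for marker in API_ENUM_MARKERS):
        labels.add("APIEnum")
    if is_git_image(image):
        # "* archive *" style wildcards reduce to a space-delimited substring test
        if " archive " in cmd or " bundle " in cmd:
            labels.add("BulkExtract")
        if " clone " in cmd:
            labels.add("BulkClone")
    return labels


if __name__ == "__main__":
    print(classify("/usr/bin/git", "git clone https://github.com/example/repo"))
```

Because the clone and archive patterns require a space on both sides of the subcommand, a command line that ends in a bare `clone` with no trailing argument would not match; that mirrors the query's behaviour rather than correcting it.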
False Positives & Tuning
- Developer workstations with legitimate trufflehog or gitleaks installed as pre-commit hooks that run on every commit.
- CI/CD runners (GitHub Actions, GitLab CI, Jenkins agents) that clone multiple repositories as part of automated build and test pipelines.
- Security tooling VMs that run authorized secret-scanning workflows against internal codebases on a scheduled basis.
- Data engineering teams archiving or bundling repositories via git archive as part of regular backup or compliance workflows.
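One way to apply the tuning guidance above is to suppress expected activity from known hosts after the query runs. The sketch below is a minimal illustration under stated assumptions: the host prefixes, the scanner host name, and the `Detection` record are all hypothetical placeholders, and a real deployment would more likely keep such allowlists in a lookup table or rule tuning expression than in code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One aggregated result row; field names follow the query output."""
    host: str
    user: str
    detection_type: str


# Hypothetical allowlists; replace with values from your own environment.
CI_RUNNER_HOST_PREFIXES = ("ci-runner-", "jenkins-agent-")
AUTHORIZED_SCANNER_HOSTS = {"secscan-vm-01"}


def is_expected(d: Detection) -> bool:
    """True when the detection matches a documented false-positive pattern."""
    host = d.host.lower()
    # CI/CD runners legitimately clone many repositories during builds.
    if d.detection_type == "BulkRepositoryCloning" and host.startswith(CI_RUNNER_HOST_PREFIXES):
        return True
    # Authorized security tooling VMs run scheduled secret scans.
    if d.detection_type == "SecretScanningToolExecution" and host in AUTHORIZED_SCANNER_HOSTS:
        return True
    return False


def triage(detections):
    """Keep only detections that are not explained by an allowlist entry."""
    return [d for d in detections if not is_expected(d)]
```

Scoping each allowlist entry to a single detection type keeps a CI runner that suddenly runs a secret scanner, or a scanner VM that starts enumerating APIs, visible to analysts.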
Testing Methodology
Validate this detection against 4 adversary techniques from Atomic Red Team. Each test below lists the behaviour to exercise and the telemetry you should expect to see.
- Test 1: Bulk Repository Cloning via Shell Loop
Expected signal: Sysmon Event ID 1 (Sysmon for Linux, or equivalent auditd process execution records): multiple git process creation events with CommandLine containing 'clone' within seconds. On Windows endpoints monitored by Microsoft Defender: DeviceProcessEvents entries for git.exe with ProcessCommandLine matching 'clone https://github.com/'. Five or more clone events from the same AccountName within a 1-hour bin. Note that the Sumo query above applies the threshold of five across the whole search time range rather than per hour.
- Test 2: Secret Scanning with Trufflehog Against Local Repository
Expected signal: Sysmon Event ID 1: Process Create with Image containing 'trufflehog' or CommandLine containing 'trufflehog'. If trufflehog is not installed, the which command still creates a process event. DeviceProcessEvents: FileName='trufflehog' or ProcessCommandLine has 'trufflehog'.
- Test 3: GitHub Organization Repository Enumeration via API
Expected signal: Sysmon Event ID 1: Process Create for curl with CommandLine containing 'api.github.com/orgs'. Sysmon Event ID 3: Network Connection to api.github.com:443. DeviceProcessEvents: FileName='curl' with ProcessCommandLine has 'api.github.com/orgs'. DeviceNetworkEvents: RemoteUrl containing 'api.github.com'.
- Test 4: Git Archive Bulk Content Extraction
Expected signal: Sysmon Event ID 1: Process Create with Image containing 'git' and CommandLine containing 'archive --format'. Sysmon Event ID 11: File Create events for the extracted files in /tmp. DeviceProcessEvents: FileName='git' with ProcessCommandLine has 'archive'. DeviceFileEvents: multiple file creation events from the git process in output directory.
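The bulk-cloning expectation in Test 1 (five or more clone events from the same account within a 1-hour bin) can be checked against exported process events with a short script. This is a sketch under assumptions: the function name `bulk_clone_bins` and the `(account, timestamp, command_line)` tuple layout are hypothetical, and bins are aligned to the top of each hour.

```python
from collections import Counter
from datetime import datetime


def bulk_clone_bins(events, threshold=5):
    """Count clone events per (account, hour) and keep bins at or above threshold.

    events: iterable of (account_name, timestamp, command_line) tuples,
    where timestamp is a datetime. This layout is an assumption for illustration.
    """
    counts = Counter()
    for account, ts, command_line in events:
        # Same space-delimited match the detection uses for clone operations
        if " clone " in command_line.lower():
            hour = ts.replace(minute=0, second=0, microsecond=0)
            counts[(account, hour)] += 1
    return {key: n for key, n in counts.items() if n >= threshold}


if __name__ == "__main__":
    sample = [
        ("alice", datetime(2024, 1, 1, 10, m), f"git clone https://github.com/example/repo{m}")
        for m in range(5)
    ]
    for (account, hour), n in bulk_clone_bins(sample).items():
        print(account, hour.isoformat(), n)
```

Fixed hourly bins can split a burst that straddles the hour boundary into two bins that each fall under the threshold; a sliding window would avoid that at the cost of a slightly longer script.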
References (11)
- https://attack.mitre.org/techniques/T1213/003/
- https://www.microsoft.com/en-us/security/blog/2022/03/22/dev-0537-criminal-actor-targeting-organizations-for-data-exfiltration-and-destruction/
- https://www.cisa.gov/sites/default/files/2023-11/aa23-320a_scattered_spider.pdf
- https://www.nccgroup.com/us/research-blog/lapsus-recent-techniques-tactics-and-procedures/
- https://www.wired.com/story/uber-paid-off-hackers-to-hide-a-57-million-user-data-breach/
- https://krebsonsecurity.com/2013/10/adobe-to-announce-source-code-customer-data-breach/
- https://github.com/trufflesecurity/trufflehog
- https://github.com/gitleaks/gitleaks
- https://docs.github.com/en/organizations/keeping-your-organization-secure/managing-security-settings-for-your-organization/reviewing-the-audit-log-for-your-organization
- https://learn.microsoft.com/en-us/defender-cloud-apps/connect-github-ec
- https://github.com/redcanaryco/atomic-red-team/blob/master/atomics/T1213.003/T1213.003.md