T1213.003 Google Chronicle · YARA-L

Detect Code Repositories in Google Chronicle

Adversaries may leverage code repositories to collect valuable information, including proprietary source code and unsecured credentials embedded in software. Code repositories such as GitHub, GitLab, Bitbucket, and Azure DevOps store source code and automate software builds, and may be hosted internally or externally. Once adversaries gain access through compromised credentials, stolen OAuth tokens, or insider access, they may bulk-clone repositories, run automated secret-scanning tools (trufflehog, gitleaks) to harvest embedded API keys and passwords, or enumerate organizational repositories at scale through API calls. LAPSUS$ searched victim networks for GitLab and GitHub instances to discover high-privilege credentials; Scattered Spider enumerated internal GitHub repositories as part of broader data theft operations; APT41 cloned victim Git repositories during intrusions. Successful collection gives adversaries source code for developing targeted exploits, service credentials for lateral movement, and intellectual property for competitive or financial gain.

MITRE ATT&CK

Tactic: Collection
Technique: T1213 Data from Information Repositories
Sub-technique: T1213.003 Code Repositories
Canonical reference: https://attack.mitre.org/techniques/T1213/003/

YARA-L Detection Query

Google Chronicle (YARA-L)
```yaral
rule T1213_003_code_repository_collection {
  meta:
    author = "Detection Engineering"
    description = "Detects T1213.003 - Code Repository collection via secret scanning tools, repository API enumeration, and git bulk extraction operations."
    mitre_attack_tactic = "Collection"
    mitre_attack_technique = "T1213.003"
    severity = "HIGH"
    confidence = "HIGH"
    reference = "https://attack.mitre.org/techniques/T1213/003/"
    created = "2026-04-19"

  events:
    $e.metadata.event_type = "PROCESS_LAUNCH"
    // In UDM PROCESS_LAUNCH events, target.process is the newly launched
    // process and principal.process is its parent.
    $hostname = $e.principal.hostname

    (
      // Secret scanning tools by binary name (optional .exe on Windows)
      re.regex($e.target.process.file.full_path, `(?i)(trufflehog|gitleaks|git-secrets|gitrob|shhgit|detect-secrets|git-hound|gitallsecrets)(\.exe)?$`)

      // Secret scanning tools named in command line arguments
      or re.regex($e.target.process.command_line, `(?i)(trufflehog|gitleaks|git.secrets|gitrob|shhgit|detect.secrets|git.hound|gitallsecrets)`)

      // Git bulk archive or bundle operations
      or (
        re.regex($e.target.process.file.full_path, `(?i)(^|[/\\])git(\.exe)?$`)
        and re.regex($e.target.process.command_line, `(?i)\b(archive|bundle)\b`)
      )

      // Repository API enumeration via scripting interpreters and HTTP clients
      or (
        re.regex($e.target.process.file.full_path, `(?i)(^|[/\\])(python(3(\.[0-9]+)?)?|powershell|pwsh|curl|wget)(\.exe)?$`)
        and re.regex($e.target.process.command_line, `(?i)api\.github\.com/(orgs|users|repos|search)|gitlab\.com/api/v4/(projects|groups)|api\.bitbucket\.org/2\.0/repositories|dev\.azure\.com`)
      )

      // Git clone operations. One clone is not bulk collection; enforce a
      // volume threshold (for example #e >= 5) in a separate aggregation rule.
      or (
        re.regex($e.target.process.file.full_path, `(?i)(^|[/\\])git(\.exe)?$`)
        and re.regex($e.target.process.command_line, `(?i)\bclone\b`)
      )
    )

  match:
    $hostname over 1h

  outcome:
    // With a match section, outcome values must be aggregated across events.
    $username = array_distinct($e.principal.user.userid)
    $process_path = array_distinct($e.target.process.file.full_path)
    $command_line = array_distinct($e.target.process.command_line)
    // Conditions are ordered by descending score, so the nested if yields the
    // highest applicable score per event; max() keeps the highest in the window.
    $risk_score = max(
      if(re.regex($e.target.process.command_line, `(?i)(trufflehog|gitleaks|git.secrets|gitrob|shhgit|detect.secrets|git.hound|gitallsecrets)`), 90,
      if(re.regex($e.target.process.command_line, `(?i)\b(archive|bundle)\b`) and re.regex($e.target.process.file.full_path, `(?i)(^|[/\\])git(\.exe)?$`), 80,
      if(re.regex($e.target.process.command_line, `(?i)api\.github\.com|gitlab\.com/api|api\.bitbucket\.org|dev\.azure\.com`), 75,
      if(re.regex($e.target.process.command_line, `(?i)\bclone\b`) and re.regex($e.target.process.file.full_path, `(?i)(^|[/\\])git(\.exe)?$`), 50, 0))))
    )
    $detection_type = array_distinct(
      if(re.regex($e.target.process.command_line, `(?i)(trufflehog|gitleaks|git.secrets|gitrob|shhgit|detect.secrets|git.hound|gitallsecrets)`), "SecretScanningToolExecution",
      if(re.regex($e.target.process.command_line, `(?i)api\.github\.com|gitlab\.com/api|api\.bitbucket\.org|dev\.azure\.com`), "RepositoryAPIEnumeration",
      if(re.regex($e.target.process.command_line, `(?i)\b(archive|bundle)\b`), "GitBulkExtraction", "BulkRepositoryCloning")))
    )

  condition:
    $e
}
```
Severity: High · Confidence: High

Chronicle YARA-L 2.0 rule detecting T1213.003 code repository collection. It matches PROCESS_LAUNCH UDM events and classifies each as secret scanning tool execution (trufflehog/gitleaks/gitrob family), repository API enumeration (GitHub, GitLab, Bitbucket, or Azure DevOps REST API calls made through Python, PowerShell, curl, or wget), git bulk archive/bundle extraction, or git clone operations. It outputs a risk score (the highest score among the matched events) and a detection type label, and groups matching events per hostname over a 1-hour window.
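The rule above alerts on every git clone, so on its own it cannot tell a single clone from bulk cloning. A companion aggregation rule can enforce a volume threshold instead. The sketch below is illustrative and not part of the original detection: the rule name, the MEDIUM severity, and the threshold of five (borrowed from the bulk-cloning test's expected signal) are assumptions to adjust for your environment. It reads the launched process from the target.process fields, which is where UDM places the new process in PROCESS_LAUNCH events.

```yaral
// Hypothetical companion rule: bulk git cloning per user per host.
rule T1213_003_bulk_git_clone {
  meta:
    author = "Detection Engineering"
    description = "Sketch: five or more git clone launches by one user on one host within 1 hour."
    mitre_attack_technique = "T1213.003"
    severity = "MEDIUM"

  events:
    $e.metadata.event_type = "PROCESS_LAUNCH"
    // target.process is the launched process in UDM PROCESS_LAUNCH events
    re.regex($e.target.process.file.full_path, `(?i)(^|[/\\])git(\.exe)?$`)
    re.regex($e.target.process.command_line, `(?i)\bclone\b`)
    $hostname = $e.principal.hostname
    $user = $e.principal.user.userid

  match:
    $hostname, $user over 1h

  outcome:
    // Distinct clone command lines seen in the window, for triage
    $clone_count = count_distinct($e.target.process.command_line)
    $sample_commands = array_distinct($e.target.process.command_line)

  condition:
    // Assumed threshold: five or more clone events in the 1-hour window
    #e >= 5
}
```

Grouping on both hostname and user keeps a shared build server from pooling clones by unrelated accounts into one alert; remove $user from the match section if you prefer per-host totals.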

Data Sources

Google Chronicle UDM: PROCESS_LAUNCH events
Endpoint telemetry forwarded to Chronicle (CrowdStrike, Carbon Black, SentinelOne, Windows Event logs via Chronicle forwarder)

Required Tables

UDM events with metadata.event_type = PROCESS_LAUNCH

False Positives & Tuning

  • Authorized security assessments where red team or security engineering runs gitleaks/trufflehog against internal repositories as part of approved engagements.
  • Developer machines with pre-commit hooks that automatically invoke secret-scanning tools on every git commit.
  • CI/CD build agents (GitHub Actions self-hosted runners, Jenkins workers) that legitimately clone multiple repositories and interact with repository APIs.
  • DevOps engineers using scripted tools to manage repository settings, create repositories, or configure webhooks via GitHub/GitLab REST APIs.
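One way to apply the tuning above is to exclude known build agents and pre-commit hook invocations. The sketch below narrows to the secret-scanning branch only. It assumes a string reference list named %ci_build_agents (hypothetical; it must exist in Chronicle before the rule will save) that holds CI/CD runner hostnames, and it assumes pre-commit hook runs can be recognised by "pre-commit" in the parent process command line, which depends on how hooks are installed in your environment.

```yaral
// Hypothetical tuned variant: secret scanner execution with exclusions.
rule T1213_003_secret_scanner_tuned {
  meta:
    author = "Detection Engineering"
    description = "Sketch: secret scanning tool execution excluding CI/CD agents and pre-commit hooks."
    mitre_attack_technique = "T1213.003"
    severity = "HIGH"

  events:
    $e.metadata.event_type = "PROCESS_LAUNCH"
    // target.process is the launched process in UDM PROCESS_LAUNCH events
    re.regex($e.target.process.command_line, `(?i)(trufflehog|gitleaks|git.secrets|gitrob|shhgit|detect.secrets|git.hound|gitallsecrets)`)

    // Exclusion 1: hosts in a hypothetical reference list of CI/CD build agents
    not $e.principal.hostname in %ci_build_agents

    // Exclusion 2: scanners launched from a pre-commit hook
    // (assumes the parent command line contains "pre-commit")
    not re.regex($e.principal.process.command_line, `(?i)pre-commit`)

    $hostname = $e.principal.hostname

  match:
    $hostname over 1h

  outcome:
    $username = array_distinct($e.principal.user.userid)
    $command_line = array_distinct($e.target.process.command_line)

  condition:
    $e
}
```

Both exclusions are partly attacker-controllable: an adversary who runs gitleaks from a CI runner, or from a process whose command line mentions "pre-commit", would be suppressed. Prefer narrow matches (for example, the full path of the hook runner) and review the reference list regularly.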

Testing Methodology

Validate this detection with four Atomic Red Team tests. Each test below lists the behaviour to exercise and the telemetry you should expect to see.

  1. Bulk Repository Cloning via Shell Loop

    Expected signal: multiple git process creation events with CommandLine containing 'clone' within seconds. On Linux, these appear as Sysmon for Linux Event ID 1 or auditd execve records; on Windows endpoints, as DeviceProcessEvents entries for git.exe with ProcessCommandLine matching 'clone https://github.com/'. Bulk activity shows as five or more clone events from the same AccountName within a 1-hour bin.

  2. Secret Scanning with Trufflehog Against Local Repository

    Expected signal: Sysmon Event ID 1: Process Create with Image containing 'trufflehog' or CommandLine containing 'trufflehog'. If trufflehog is not installed, a lookup such as 'which trufflehog' still creates a process event whose CommandLine contains 'trufflehog'. DeviceProcessEvents: FileName='trufflehog' or ProcessCommandLine has 'trufflehog'.

  3. GitHub Organization Repository Enumeration via API

    Expected signal: Sysmon Event ID 1: Process Create for curl with CommandLine containing 'api.github.com/orgs'. Sysmon Event ID 3: Network Connection to api.github.com:443. DeviceProcessEvents: FileName='curl' with ProcessCommandLine has 'api.github.com/orgs'. DeviceNetworkEvents: RemoteUrl containing 'api.github.com'.

  4. Git Archive Bulk Content Extraction

    Expected signal: Sysmon Event ID 1: Process Create with Image containing 'git' and CommandLine containing 'archive --format'. Sysmon Event ID 11: File Create events for the archive written to /tmp and for any files extracted from it. DeviceProcessEvents: FileName='git' with ProcessCommandLine has 'archive'. DeviceFileEvents: file creation events in the output directory from the git process or the tool that extracts the archive.