From Idea to Pipeline: Building an AI-Powered Security Triage Tool From Scratch

Why I Built This

I wanted to stop reading about detection and start doing it. I had been studying MITRE ATT&CK, learning about Sysmon, and watching SOC workflows from the outside. At some point I realized the best way to understand how detection actually works was to build something that did it. Not a lab exercise following someone else’s steps. Something mine, from scratch, with real telemetry and real decisions.

Setting Up the Lab

The question I started with was simple: could I simulate a real attack on my own isolated Windows VM, capture the resulting telemetry with Sysmon, and use AI to automatically triage what it found? That question became this project.

Before any code got written I needed an environment. I built an isolated Windows 10 VM in VMware Workstation Pro on a host machine with 64GB RAM. Isolated means Host-Only networking, no connection to the internet or my real network. Whatever happened inside that VM stayed inside that VM.

I installed Sysmon using the SwiftOnSecurity community configuration, which tunes the logging to capture meaningful security events without drowning in noise. Sysmon runs silently in the background as a Windows service. Process creations, file modifications, registry changes, and network connections that match the configuration get logged automatically.

Getting to this point had its own lessons. VMware would not boot from the ISO initially because the virtual CD drive was not connected to the file. Sysmon would not install the first time because the VM was on Host-Only networking and could not reach the internet. These are small details, but they matter when putting a lab like this together.

Simulating Real Attacks with Atomic Red Team

Atomic Red Team is a library of small scripts, each one simulating a specific MITRE ATT&CK technique. A real attacker does not use Atomic Red Team. They run the actual commands themselves. But Atomic Red Team runs those same real commands, generating the same real telemetry, which is what matters for detection.

I ran a ten-technique attack chain designed to tell a story. The sequence moved through the phases a real attacker follows after gaining initial access. First, orientation: who am I and where am I. Then network reconnaissance: what can I reach. Then system discovery: what is running and what is configured. Then persistence: surviving a reboot. Then an attempt to disable defenses. Then credential access.

Some techniques succeeded quietly because they used legitimate Windows tools. Tasklist, systeminfo, whoami, reg query. Windows Defender cannot block these because blocking them would break normal administration. That is the Living off the Land concept in practice. Other techniques were blocked immediately. Mimikatz was caught the moment it appeared. The attempt to disable Defender failed silently because Tamper Protection held. Seeing which techniques passed and which were caught was one of the most educational parts of the entire project.

The QakBot recon simulation was particularly interesting. It ran a sequence of commands that a real QakBot infection uses to profile a victim machine: whoami /all to enumerate privileges, ipconfig to map the network, arp to discover neighbors, netstat to identify listening ports, and nslookup to try to find a domain controller. Every command is a legitimate Windows tool. Together they form a recognizable attack pattern.
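To make "recognizable pattern" concrete, here is a minimal sketch of one way to flag it: look for several distinct discovery tools starting within a short time window. This is an illustration, not code from the project. The tool list comes from the paragraph above, while the function names, the five-minute window, and the threshold of four tools are assumptions chosen for the example.

```python
from datetime import datetime, timedelta

# Watchlist taken from the commands named in the text above.
RECON_TOOLS = {"whoami", "ipconfig", "arp", "netstat", "nslookup"}


def tool_name(command_line):
    """Return the lowercase executable name without its path or .exe suffix.

    Simplification: splits on whitespace, so quoted paths that contain
    spaces are not handled.
    """
    parts = command_line.strip().split()
    if not parts:
        return ""
    exe = parts[0].strip('"').replace("\\", "/").rsplit("/", 1)[-1].lower()
    return exe[:-4] if exe.endswith(".exe") else exe


def find_recon_burst(events, window=timedelta(minutes=5), min_tools=4):
    """Return the first run of recon events in which at least `min_tools`
    distinct tools start within `window`, or an empty list if none is found.

    `events` is an iterable of (timestamp, command_line) tuples.
    The window and threshold defaults are assumed values, not tuned ones.
    """
    hits = sorted(
        (ts, tool_name(cmd)) for ts, cmd in events
        if tool_name(cmd) in RECON_TOOLS
    )
    start = 0
    for end in range(len(hits)):
        # Slide the left edge forward until the span fits inside the window.
        while hits[end][0] - hits[start][0] > window:
            start += 1
        current = hits[start:end + 1]
        if len({name for _, name in current}) >= min_tools:
            return current
    return []
```

A single whoami would return an empty list here. Only the cluster of different tools in quick succession is flagged, which is the point: each command is harmless alone and suspicious in sequence.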

Building the Tool

The code went through six versions and each one solved a real problem the previous version could not handle.

V1 was a proof of concept. A hardcoded mock alert based on the T1057 process discovery output, sent to Google Gemini, getting back a structured triage result. Severity High, technique correctly identified, process chain analyzed, false positive likelihood explained.

V2 made the input dynamic. Instead of a hardcoded alert the script accepted any text pasted into the terminal. It also added error handling so the script would not crash if the API failed or the input was empty. These changes seem small but they are the difference between a script and a tool.
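The kind of guard rails described for V2 can be sketched roughly as follows. This is not the project's actual code: `send_to_model` is a hypothetical callable standing in for whatever client sends the prompt to the model, and the prompt wording and return shape are assumptions made for illustration.

```python
def triage(alert_text, send_to_model):
    """Validate the input, call the model, and report failures as data.

    `send_to_model` is a hypothetical callable that takes a prompt string
    and returns the model's reply as a string. It may raise on API errors.
    """
    alert_text = (alert_text or "").strip()
    if not alert_text:
        # Empty or whitespace-only input never reaches the API.
        return {"ok": False, "error": "empty input"}

    prompt = "Triage this security alert and map it to MITRE ATT&CK:\n" + alert_text
    try:
        reply = send_to_model(prompt)
    except Exception as exc:  # e.g. network errors, quota errors, timeouts
        return {"ok": False, "error": f"API call failed: {exc}"}
    return {"ok": True, "result": reply}
```

Returning an error record instead of raising means the caller can print a readable message and keep the terminal session alive.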

V3 was the real turning point. Instead of manually writing mock alerts the script automatically ingested a Sysmon XML export, parsed 2,077 real security events, filtered for suspicious activity, and sent the results to the AI. The first few runs kept finding VS Code installation activity instead of attack techniques because the time window captured the wrong period. The AI correctly identified what it saw. It flagged me opening the Sysmon file in Notepad as suspicious reconnaissance. Technically it was not wrong. The analyst and the attacker were the same person in this lab and the AI had no way to know that.
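The ingest-and-filter step can be sketched along these lines, assuming the export follows the standard Windows event XML layout (Event elements in the Windows event namespace, each with System and EventData children). The keyword list and function names are hypothetical, since the original filter's keywords are not listed here.

```python
import xml.etree.ElementTree as ET

# XML namespace used by Windows event records.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

# Hypothetical keyword list for illustration only.
KEYWORDS = ("whoami", "mimikatz", "netstat", "reg query")


def parse_events(xml_text):
    """Yield one dict per Event: its EventID plus every named Data field."""
    root = ET.fromstring(xml_text)
    for event in root.iter("{%s}Event" % NS["e"]):
        record = {
            "EventID": event.findtext("e:System/e:EventID", default="", namespaces=NS)
        }
        for data in event.findall("e:EventData/e:Data", NS):
            record[data.get("Name", "")] = data.text or ""
        yield record


def filter_suspicious(records, keywords=KEYWORDS):
    """Keep records whose CommandLine contains any keyword (case-insensitive)."""
    kept = []
    for record in records:
        command = record.get("CommandLine", "").lower()
        if any(keyword in command for keyword in keywords):
            kept.append(record)
    return kept
```

Only the records that survive `filter_suspicious` would be sent on to the model, which is why the choice of keywords and time window shapes everything the AI can see.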

V4 added MITRE ATT&CK enrichment. After the AI identified technique IDs the script automatically queried the MITRE ATT&CK database via the attackcti Python library and pulled back the tactic classification, detection guidance, and reference URL for each one. Every finding now had authoritative MITRE context attached automatically.
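Before any enrichment lookup can run, the technique IDs have to be pulled out of the AI's free-text answer. A minimal sketch of that step follows. It deliberately leaves out the attackcti call itself and only shows ID extraction plus the reference URL, which follows the public attack.mitre.org address pattern. The function names are hypothetical.

```python
import re

# Matches T1059 and T1059.001 style IDs.
TECHNIQUE_ID = re.compile(r"\bT\d{4}(?:\.\d{3})?\b")


def extract_technique_ids(text):
    """Return unique ATT&CK technique IDs in order of first appearance."""
    seen = []
    for match in TECHNIQUE_ID.findall(text):
        if match not in seen:
            seen.append(match)
    return seen


def attack_url(technique_id):
    """Build the attack.mitre.org reference URL for a technique ID."""
    base, _, sub = technique_id.partition(".")
    url = f"https://attack.mitre.org/techniques/{base}/"
    if sub:
        # Sub-techniques live one path segment below their parent.
        url += f"{sub}/"
    return url
```

For example, `attack_url("T1059.001")` gives `https://attack.mitre.org/techniques/T1059/001/`.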

V5 added MITRE D3FEND. D3FEND is the defensive counterpart to ATT&CK. Where ATT&CK catalogs what attackers do, D3FEND maps what defenders should do in response. For every technique the AI detected the script now automatically queried the D3FEND REST API and returned the corresponding defensive countermeasures. T1033 returned 31 countermeasures. T1003 returned 48. The pipeline now went from raw telemetry all the way to defensive recommendations without a human touching anything in between.

V6 rebuilt the D3FEND enrichment layer entirely. The previous version capped results at five countermeasures per technique and listed them flat with no context. V6 queries the sub-technique ID first and falls back to the parent technique automatically, removing a gap where sub-techniques like T1059.001 were returning no results. It now returns all available countermeasures grouped by defensive tactic (Detect, Isolate, Harden, Deceive, Evict), so the output reads like an actionable defense brief rather than a raw list. Each countermeasure now includes its D3FEND reference URL. T1059.001 returned 15 countermeasures. T1082 returned 7. T1016 returned 19. The report is saved automatically to JSON.
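The fallback and grouping logic can be sketched as below. This is an illustration under stated assumptions, not the project's implementation. `fetch` is a hypothetical callable that hides the actual D3FEND request, and the `tactic` key on each countermeasure record is an assumed field name rather than the API's confirmed schema.

```python
from collections import defaultdict


def lookup_with_fallback(technique_id, fetch):
    """Query the sub-technique first, then the parent if nothing comes back.

    `fetch` is a hypothetical callable: technique ID -> list of
    countermeasure dicts. Returns (id_that_was_used, countermeasures).
    """
    results = fetch(technique_id)
    if not results and "." in technique_id:
        parent = technique_id.split(".", 1)[0]
        return parent, fetch(parent)
    return technique_id, results


def group_by_tactic(countermeasures):
    """Group countermeasure dicts by their (assumed) 'tactic' key."""
    grouped = defaultdict(list)
    for item in countermeasures:
        grouped[item.get("tactic", "Unknown")].append(item)
    return dict(grouped)
```

Returning the ID that actually produced results keeps the report honest about whether a countermeasure applies to the sub-technique or only to its parent.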

The Moment It Worked

When V6 produced its first complete output I sat with it for a minute. Critical severity. Eleven MITRE ATT&CK techniques correctly identified. The AI caught the QakBot recon chain, the offensive tools downloaded from GitHub, the Kerberoasting attempt, the credential dumping precursor. It described the attack narrative in language that sounded like a real incident report. Then the MITRE enrichment appeared, then the D3FEND countermeasures, now grouped by tactic, fully linked, with every available countermeasure returned instead of a capped preview. All saved to a JSON report automatically. 2,077 Sysmon events went in. A structured, defense-ready security intelligence report came out.

What the AI Missed and Why That Matters

The keyword filter only passed events containing specific words. Some technique executions may have generated telemetry that contained none of the filter's keywords and so never made it to the AI.

Sophisticated attackers obfuscate commands. Whoami becomes a PowerShell expression that returns the same information without containing the word whoami. Base64 encoding can hide entire command sequences from keyword filters. The AI analyzes what it receives but if the filter does not pass the right events the AI never sees them.
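The Base64 case is easy to demonstrate. PowerShell's -EncodedCommand parameter takes a Base64 string of UTF-16LE text, so the raw command line never contains the plain command. The sketch below decodes that argument before applying the keyword check. It is one possible mitigation, not something the tool currently does. The function names are hypothetical, and the regex covers only a few common abbreviations of the parameter.

```python
import base64
import re

# Covers -e, -ec, -en, -enc, and -EncodedCommand. Not an exhaustive list.
ENCODED_ARG = re.compile(
    r"-(?:e|ec|en|enc|encodedcommand)\s+([A-Za-z0-9+/=]+)", re.IGNORECASE
)


def decode_encoded_command(command_line):
    """Return the decoded PowerShell payload, or None if none is present."""
    match = ENCODED_ARG.search(command_line)
    if not match:
        return None
    try:
        raw = base64.b64decode(match.group(1), validate=True)
        return raw.decode("utf-16-le")
    except ValueError:
        # Covers invalid Base64 and bytes that are not valid UTF-16LE.
        return None


def matches_keyword(command_line, keywords):
    """Check keywords against both the raw and the decoded command line."""
    decoded = decode_encoded_command(command_line) or ""
    haystack = (command_line + " " + decoded).lower()
    return any(keyword in haystack for keyword in keywords)
```

Decoding handles only this one obfuscation trick. A PowerShell expression that never spells out whoami would still pass, which is where behavioral detection has to take over.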

This is not a flaw unique to this tool. It is a fundamental tension in detection engineering. Keyword-based detection catches known patterns. Behavioral detection catches unknown patterns by looking at relationships and context rather than content. The right approach combines both. This tool does the former. The latter is what future versions will address. Understanding where a tool fails is as valuable as understanding where it succeeds.

What I Learned

Building iteratively forced me to understand each problem before moving to the next one. Every version broke in a way that taught me something the previous version could not have.

Living-off-the-land techniques were more eye-opening in practice than in theory. Reading about them in a textbook is different from watching tasklist and whoami generate real Sysmon telemetry that looks identical to normal administration. That gap between what is malicious and what is legitimate is where detection engineering lives.

The Active Directory limitation was the most clarifying. A standalone workstation in WORKGROUP mode cannot generate Kerberoasting telemetry, DCSync events, Pass-the-Hash lateral movement, or any other domain-based attack pattern. Those techniques simply do not apply without a domain controller. Understanding why the results had a ceiling told me exactly what Phase 2 needs to be.

The Open Source Contribution

While building V4 I noticed that T1548.002, Bypass UAC, appeared in the AI triage output and in the D3FEND results but returned zero results from the attackcti library. I verified the technique exists on the official MITRE ATT&CK site. It was last modified April 15, 2026. I confirmed I had the latest version of attackcti installed. I searched the repository issues and found no one had reported it.

So I reported it. I wrote a properly formatted bug report with reproduction steps, expected behavior, actual behavior, environment details, and context explaining how I found it, and submitted it to the OTRF/ATTACK-Python-Client repository on GitHub.

What Comes Next

The pipeline currently requires manually exporting the Sysmon XML and transferring it to the host machine. That manual step is the biggest remaining limitation. The next version will connect directly to Microsoft Sentinel deployed on Azure, pulling live alerts via the Azure API instead of waiting for a manual export. That removes the last manual step and closes the loop from telemetry to triage.