Cochise

Autonomous LLM-driven agent for Active Directory pen-testing. Automates attack chains from assumed breach to domain admin using a Planner/Executor model.

Introduction

Cochise is a lightweight, autonomous penetration testing prototype designed to simulate "Assumed Breach" scenarios within Microsoft Active Directory environments. Built in about 576 lines of Python, the tool leverages Large Language Models (LLMs) to automate complex attack chains, including reconnaissance, credential harvesting, and lateral movement. It uses a dual-component architecture: a persistent Strategic Planner directs ephemeral Tactical Executors, which run commands over SSH on a Kali Linux VM inside the target network.
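The Planner/Executor split described above can be sketched roughly as follows. This is a minimal illustration, not Cochise's actual code: the scripted plan stands in for the Planner's LLM calls, and the injected `run_command` callable stands in for the SSH transport to the Kali VM.

```python
# Sketch of a persistent Planner driving ephemeral Executors.
# The plan is hard-coded here; in the real tool each step is LLM-driven.
from dataclasses import dataclass, field

@dataclass
class Planner:
    """Persistent strategist: keeps the full engagement history."""
    history: list = field(default_factory=list)

    def next_task(self, last_result: str) -> str:
        self.history.append(last_result)
        plan = ["enumerate domain", "kerberoast", "pivot with harvested creds"]
        step = len(self.history) - 1
        return plan[step] if step < len(plan) else "done"

class Executor:
    """Ephemeral tactician: one task, a fresh context, then discarded."""
    def __init__(self, run_command):
        self.run_command = run_command  # e.g. an SSH wrapper in the real tool

    def execute(self, task: str) -> str:
        # In Cochise an LLM turns the task into concrete shell commands;
        # here we echo a placeholder to keep the sketch self-contained.
        return self.run_command(f"echo simulating: {task}")

def engagement(run_command, max_rounds: int = 3):
    planner = Planner()
    result = "assumed breach: low-privilege foothold"
    transcript = []
    for _ in range(max_rounds):
        task = planner.next_task(result)
        if task == "done":
            break
        result = Executor(run_command).execute(task)  # fresh Executor per task
        transcript.append((task, result))
    return transcript
```

Keeping the Planner's long-lived history separate from short-lived Executor contexts is what lets the design manage LLM context windows across a long engagement.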

By integrating with LiteLLM, Cochise supports over 100 model providers, allowing red teamers to benchmark the cybersecurity capabilities of different LLMs. In testing against the GOAD (Game of Active Directory) lab environment, the tool has demonstrated the ability to escalate from an initial foothold to full domain dominance in under two hours at a minimal operational cost.

Key Features
  • Planner/Executor Architecture: Separates high-level strategic planning from tactical command execution to manage context windows effectively.
  • Autonomous AD Hacking: Automates Kerberoasting, AS-REP roasting, domain enumeration, and credential pivoting without human intervention.
  • Multi-Model Support: Uses LiteLLM to interface with OpenAI, Anthropic, Gemini, and open-weight models via a single environment variable.
  • Detailed Logging: Captures every LLM interaction, shell command, and discovered credential in structured JSON for post-operation replay and analysis.
  • Built-in Knowledge Base: Tracks compromised accounts and discovered network entities across multiple rounds of interaction.
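The last two features above, structured JSON logging and a knowledge base of compromised accounts and discovered entities, can be combined in a small sketch. The event types and field names here are illustrative, not Cochise's actual schema.

```python
# Sketch of a knowledge base that tracks credentials and hosts across
# rounds while emitting JSON-lines events for post-operation replay.
import json

class KnowledgeBase:
    def __init__(self):
        self.credentials = {}  # account -> secret (password or hash)
        self.hosts = set()     # discovered network entities
        self.events = []       # structured log for replay/analysis

    def record(self, event_type: str, **fields):
        self.events.append({"event": event_type, **fields})
        if event_type == "credential":
            self.credentials[fields["account"]] = fields["secret"]
        elif event_type == "host":
            self.hosts.add(fields["name"])

    def dump(self) -> str:
        # One JSON object per line keeps the log greppable and replayable.
        return "\n".join(json.dumps(e, sort_keys=True) for e in self.events)
```

Persisting state this way is what lets the agent pivot with previously harvested credentials in later rounds instead of rediscovering them.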
Use Cases
  • Red Team Automation: Scaling initial internal reconnaissance and exploitation phases in complex Active Directory environments.
  • LLM Benchmarking: Evaluating and comparing the offensive security reasoning capabilities of frontier models like Claude 3.5, GPT-4, and DeepSeek.
  • Training and Research: Serving as a readable baseline for developing more advanced agentic AI security tools and offensive security frameworks.

Information

  • Publisher
  • Website: github.com
  • Created date: 04/18/2026
  • Published date: 04/18/2026