The Challenge: AI Agents Write Vulnerable Code
We use AI coding agents (Claude Code, Codex CLI) to build our entire platform. They’re fast — an agent can scaffold a new microservice in 20 minutes, write integration tests, and push a PR. But speed without security is just faster failure.
When we audited the code our agents had written, we found a pattern of vulnerabilities that no amount of prompt engineering could prevent:
- SQL injection — String concatenation in database queries. The agent knew it “should use parameterized queries” but didn’t always do it, especially in utility scripts and one-off tools.
- Silent exception swallowing — except: pass blocks that hide critical errors. The agent writes them to “handle gracefully” — which means hiding failures.
- Weak cryptography — MD5 for hashing where SHA-256 was required. The agent picks the first hash function it recalls from training data.
- Outdated dependencies — Agents install the version they saw most often in training, not the latest patched version.
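Here is a minimal sketch of the SQL injection bullet. It uses Python's built-in `sqlite3` and a hypothetical `users` table, and the function names are illustrative, not taken from `vuln_db.py`. In the concatenated query, a crafted name rewrites the `WHERE` clause. The parameterized query binds the same string as a plain value.

```python
import sqlite3

def make_demo_db() -> sqlite3.Connection:
    # Hypothetical in-memory table used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    return conn

def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # Vulnerable: user input is concatenated into the SQL text itself.
    query = "SELECT id, name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()

def find_user_safe(conn: sqlite3.Connection, name: str) -> list:
    # Parameterized: the driver binds `name` as data, so it never becomes SQL.
    return conn.execute("SELECT id, name FROM users WHERE name = ?", (name,)).fetchall()
```

With the payload `nobody' OR '1'='1`, `find_user_unsafe(make_demo_db(), payload)` returns both rows. `find_user_safe` returns an empty list.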
“The agent doesn’t know that os.system(user_input) is a command injection vulnerability. It just knows the code compiles and the tests pass.”
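The quoted `os.system(user_input)` case can be reproduced safely with a harmless payload. In this sketch, `subprocess.run(..., shell=True)` stands in for `os.system`, because both hand the string to `/bin/sh`. It assumes a POSIX system with `echo` on the `PATH`.

```python
import shlex
import subprocess

PAYLOAD = "hello; echo INJECTED"  # harmless stand-in for a malicious input

def run_unsafe(user_input: str) -> str:
    # Same shell parsing as os.system(f"echo {user_input}"):
    # the ";" ends the first command and "echo INJECTED" runs as a second one.
    return subprocess.run(f"echo {user_input}", shell=True,
                          capture_output=True, text=True).stdout

def run_safe(user_input: str) -> str:
    # No shell: the whole input reaches echo as one literal argument.
    return subprocess.run(["echo", user_input],
                          capture_output=True, text=True, check=True).stdout

def run_safe_with_shell(user_input: str) -> str:
    # If a shell is genuinely required, quote every untrusted value.
    return subprocess.run(f"echo {shlex.quote(user_input)}", shell=True,
                          capture_output=True, text=True, check=True).stdout
```

An argument list with no shell is the simpler fix. `shlex.quote` is the fallback when shell features are actually needed.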
Traditional code review catches some of this. But when agents are shipping 9 tasks/day across 5 codebases, manual review becomes the bottleneck — and the thing that gets skipped at 11 PM.
The Solution: RAKṢĀ + Datadog Code Security
RAKṢĀ (रक्षा — Sanskrit for “protection”) is our security scanning platform, integrated with Datadog Code Security MCP for deep static analysis. Together, they form a pre-deployment security gate that catches vulnerabilities before code leaves the CI pipeline.
Architecture: Scan → Block → Report → Fix
               Agent writes code
                       │
                       ▼
┌──────────────────────────────────────────────┐
│              GitHub Action (CI)              │
│                                              │
│  ┌─────────────┐   ┌──────────────────────┐  │
│  │ RAKṢĀ       │   │ Datadog Code         │  │
│  │ Cloud       │   │ Security MCP         │  │
│  │ Scanner     │   │                      │  │
│  │             │   │ • SAST analysis      │  │
│  │ • SAST      │   │ • Secret detection   │  │
│  │ • SCA       │   │ • CVE scanning       │  │
│  │ • Secrets   │   │ • SBOM generation    │  │
│  └──────┬──────┘   └──────────┬───────────┘  │
│         │                     │              │
│         └──────────┬──────────┘              │
│                    ▼                         │
│           ┌─────────────────┐                │
│           │ SARIF Report    │                │
│           │ + GitHub Code   │                │
│           │   Scanning      │                │
│           └────────┬────────┘                │
│                    │                         │
│          Findings > threshold?               │
│                    │                         │
│           ┌────────┴────────┐                │
│           │                 │                │
│          YES                NO               │
│           │                 │                │
│     Block Deploy        ✅ Deploy            │
└───────────┼─────────────────┼────────────────┘
            │                 │
            ▼                 ▼
       Agent fixes       Production
       in same PR
The critical design choice: SARIF output feeds directly back into the agent’s context. The same coding agent that wrote the vulnerable code receives the scan results and fixes the issues in the same PR and the same session. No human handoff. No Jira ticket that sits for weeks.
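The “Findings > threshold?” decision in the diagram could look roughly like the sketch below. It is not RAKṢĀ's implementation. It assumes HIGH findings are emitted with SARIF level `error` and MEDIUM findings with `warning`. It also sums results across every run, since RAKṢĀ and Datadog may each contribute one run to the log.

```python
import json

# Assumed ordering of SARIF result levels, lowest to highest.
LEVEL_RANK = {"none": 0, "note": 1, "warning": 2, "error": 3}

def count_blocking(sarif: dict, threshold: str = "error") -> int:
    """Count results at or above `threshold` across every run in a SARIF log."""
    floor = LEVEL_RANK[threshold]
    total = 0
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            # Simplification: a missing level falls back to "warning" (SARIF's
            # default); a rule's defaultConfiguration is not consulted here.
            level = result.get("level", "warning")
            if LEVEL_RANK.get(level, 0) >= floor:
                total += 1
    return total

def gate(path: str, threshold: str = "error") -> int:
    """Return a process exit code: 1 blocks the deploy, 0 lets it through."""
    with open(path, encoding="utf-8") as fh:
        blocking = count_blocking(json.load(fh), threshold)
    print(f"{blocking} finding(s) at or above '{threshold}'")
    return 1 if blocking else 0
```

A CI step could end with `sys.exit(gate("raksha-results.sarif"))`, so any blocking finding produces a non-zero exit and stops the deploy.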
Real Scan Data: What We Found Today
Here’s what RAKṢĀ + Datadog Code Security found in a single scan across two repositories — RAKṢĀ itself and DevOps RAG:
SAST Findings: 46 Total
| Repository | HIGH | MEDIUM | Total |
|---|---|---|---|
| RAKṢĀ | 17 | 17 | 34 |
| DevOps RAG | 5 | 7 | 12 |
Top Findings by Category
| Finding | Severity | File | Fix |
|---|---|---|---|
| SQL Injection | HIGH | vuln_db.py | Parameterized queries |
| Silent Exceptions (×8) | MEDIUM | Multiple files | Specific exception types + logging |
| Weak Hashing (MD5) | HIGH | utils/hash.py | Migrated to SHA-256 |
| Hardcoded Credentials | HIGH | config.py | Environment variables |
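Each entry in the Fix column maps to a small, standard Python pattern. Below is a minimal sketch of three of them. The function names and environment variable are hypothetical. SHA-256 suits integrity checks such as file digests; passwords need a key-derivation function like `hashlib.scrypt` instead.

```python
import hashlib
import json
import logging
import os

logger = logging.getLogger(__name__)

def load_settings(path: str) -> dict:
    """Silent exceptions -> catch the specific errors you expect and log them."""
    try:
        with open(path, encoding="utf-8") as fh:
            return json.load(fh)
    except FileNotFoundError:
        # Expected on a fresh install: log it and fall back to defaults.
        logger.warning("settings file %s not found; using defaults", path)
        return {}
    except json.JSONDecodeError:
        # A corrupt file is a real failure: log with traceback, then re-raise.
        logger.exception("settings file %s is not valid JSON", path)
        raise

def sha256_hex(data: bytes) -> str:
    """Weak hashing -> SHA-256 instead of MD5 for integrity digests."""
    return hashlib.sha256(data).hexdigest()

def require_env(name: str) -> str:
    """Hardcoded credentials -> read from the environment, fail loudly if absent."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"missing required environment variable {name}")
    return value
```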
Dependency Vulnerabilities: 10 CVEs
| Package | Vulnerability | Severity | Fix Version |
|---|---|---|---|
| urllib3 | SSRF / Header injection | HIGH | ≥2.3.0 |
| starlette | Path traversal | HIGH | ≥0.40.0 |
| requests | Certificate verification bypass | MEDIUM | ≥2.32.0 |
| python-dotenv | Path injection | MEDIUM | ≥1.1.0 |
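One quick way to check an environment against the fix versions above is to read installed versions with `importlib.metadata`. The version parser here is a deliberate simplification: it keeps numeric components only and ignores pre-release tags. `packaging.version.Version` handles full PEP 440 if that dependency is available.

```python
import re
from importlib import metadata

# Minimum patched versions, copied from the table above.
MIN_SAFE = {
    "urllib3": "2.3.0",
    "starlette": "0.40.0",
    "requests": "2.32.0",
    "python-dotenv": "1.1.0",
}

def parse_version(text: str) -> tuple:
    """Simplified parser: leading digits of each dot-separated part.

    Stops at the first non-numeric part and drops trailing zeros,
    so "2.32" and "2.32.0" compare as equal.
    """
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    while parts and parts[-1] == 0:
        parts.pop()
    return tuple(parts)

def is_outdated(installed: str, minimum: str) -> bool:
    return parse_version(installed) < parse_version(minimum)

def audit_installed(min_safe: dict = MIN_SAFE) -> dict:
    """Return {package: installed_version} for packages below their floor."""
    outdated = {}
    for name, floor in min_safe.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            continue  # not installed, so nothing to flag
        if is_outdated(installed, floor):
            outdated[name] = installed
    return outdated
```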
Secret detection: 0 findings. This is the one area where our agents have been consistently disciplined — likely because we have a strong .gitignore and .env.example pattern that the agents learned from.
CI/CD Integration: The GitHub Action
RAKṢĀ runs on every push and every PR. The GitHub Action is the primary enforcement point:
# .github/workflows/security.yml
name: RAKṢĀ Security Scan
on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]
# upload-sarif needs security-events: write; the PR comment needs pull-requests: write
permissions:
  contents: read
  security-events: write
  pull-requests: write
jobs:
  security-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run RAKṢĀ Security Scan
        uses: avyay/raksha-scan-action@v1
        with:
          severity_threshold: high
          scan_type: full  # SAST + SCA + secrets
          exclude_paths: |
            sample-code/
            tests/fixtures/
        env:
          RAKSHA_API_KEY: ${{ secrets.RAKSHA_API_KEY }}
          DD_API_KEY: ${{ secrets.DD_API_KEY }}
      - name: Upload SARIF to GitHub Security
        if: always()
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: raksha-results.sarif
      - name: Post findings to PR
        if: github.event_name == 'pull_request' && failure()
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const sarif = JSON.parse(fs.readFileSync('raksha-results.sarif', 'utf8'));
            // Count results across every run (RAKṢĀ and Datadog may each emit one)
            const findings = sarif.runs.reduce((n, run) => n + (run.results || []).length, 0);
            await github.rest.issues.createComment({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
              body: `🛡️ **RAKṢĀ found ${findings} security issues.** Fix before merge.`
            });
Docker Image Hardening
One subtle but important integration: the .dockerignore excludes sample vulnerable code that ships with RAKṢĀ for testing:
# .dockerignore
# Intentionally vulnerable samples
sample-code/
# Test fixtures with known vulnerabilities
tests/fixtures/vuln/
# Scan results (contain file paths)
*.sarif
# Environment files
.env*
Without this, RAKṢĀ’s own Docker image would contain the very vulnerabilities it’s designed to detect, which is a common trap in security tooling.
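One detail is worth checking in any `.dockerignore`. Docker treats a line as a comment only when `#` is in column 1, so a trailing `# note` becomes part of the pattern and the path is silently not excluded. Below is a sketch of a hypothetical lint for that. It is a heuristic and would also flag a filename that genuinely contains `#`.

```python
def inline_comment_lines(dockerignore_text: str) -> list:
    """Return (line_number, line) pairs that contain '#' outside column 1.

    Docker only skips lines that start with '#'; anywhere else the '#'
    and everything after it stay part of the exclusion pattern.
    """
    flagged = []
    for number, line in enumerate(dockerignore_text.splitlines(), start=1):
        if not line.startswith("#") and "#" in line:
            flagged.append((number, line))
    return flagged
```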
The Results: Zero Vulnerabilities in Production
After implementing RAKṢĀ + Datadog Code Security across all five Avyay microservices:
| Metric | Before RAKṢĀ | After RAKṢĀ |
|---|---|---|
| SAST findings reaching prod | Unknown (no scanning) | 0 |
| Known CVEs in dependencies | 10+ (untracked) | 0 |
| Secret leaks | 2 incidents (caught manually) | 0 |
| Time from finding to fix | Days (manual review) | <30 min (agent auto-fix) |
| Scan coverage | Ad-hoc | 100% of commits |
The key metric isn’t the number of findings caught; it’s the time from finding to fix: under 30 minutes.
Because SARIF output feeds directly into the coding agent’s context, the same agent that introduced the vulnerability fixes it in the same PR cycle. No handoff. No ticket. No “we’ll get to it next sprint.”
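The post doesn't show how findings reach the agent. One minimal approach, assuming the agent accepts plain-text instructions, is to flatten the SARIF log into `file:line [tool/rule] message` lines. The field paths follow SARIF 2.1.0; the header wording and the `limit` cap are illustrative choices.

```python
def sarif_to_agent_prompt(sarif: dict, limit: int = 20) -> str:
    """Flatten SARIF results into short lines an agent can act on."""
    lines = []
    for run in sarif.get("runs", []):
        tool = run.get("tool", {}).get("driver", {}).get("name", "scanner")
        for result in run.get("results", []):
            # Use the first reported location; SARIF allows several.
            location = (result.get("locations") or [{}])[0].get("physicalLocation", {})
            uri = location.get("artifactLocation", {}).get("uri", "<unknown>")
            start_line = location.get("region", {}).get("startLine", "?")
            rule = result.get("ruleId", "unknown-rule")
            message = result.get("message", {}).get("text", "")
            lines.append(f"{uri}:{start_line} [{tool}/{rule}] {message}")
    body = lines[:limit]
    if len(lines) > limit:
        body.append(f"... and {len(lines) - limit} more")
    header = f"Security scan found {len(lines)} issue(s). Fix each one in this PR:"
    return "\n".join([header, *body])
```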
Get Started with RAKṢĀ
# Install the CLI
pip install raksha-cli
# Scan your project
raksha scan --severity high --format sarif
# Or use the GitHub Action
# Add avyay/raksha-scan-action@v1 to your workflow
# Or call the API directly
# (curl -F "name=@path" uploads a single file, so point it at a file or archive, not a directory)
curl -X POST https://raksha.avyay.ai/v1/scan \
  -H "Authorization: Bearer your-key" \
  -F "files=@./src" \
  -F "scan_type=full"
- CLI: pip install raksha-cli
- GitHub Action: github.com/marketplace/actions/raksha-security-scan
- Documentation: docs.avyay.ai/raksha
Gaurav Sharma is the founder of Avyay (अव्यय). RAKṢĀ is the security layer of the Avyay platform. Read about the full architecture at avyay.ai/blog/avyay-architecture.