The Security Debt Nobody Talks About
Every engineering team has a security debt problem. Not because they don’t care — because the tooling makes caring nearly impossible.
A scanner runs nightly. It finds 247 CVEs across your dependency tree. An engineer gets assigned. They try the bump, watch 14 tests fail, and close the ticket with “needs investigation.”
The industry numbers most organizations quietly accept:
- Mean Time to Remediate (MTTR) for critical vulns: 60–90 days
- Percentage of scanner findings that get fixed: <30%
- Engineering hours per non-trivial CVE: 2–8 hours
- Dependabot PRs that merge without manual intervention: ~40%
That last number means roughly 60% of Dependabot PRs need a human to finish them. Dependabot bumps versions. When the build breaks, it walks away, and the engineer is back to square one.
“Detection is solved. Remediation isn’t. The gap between ‘found’ and ‘fixed’ is where security incidents live.”
What We Built
ShieldOps is an autonomous security remediation platform — not a scanner, not a dashboard. It takes vulnerabilities from “detected” to “pull request ready for review” without human intervention.
Architecture: a trust control plane orchestrating three systems:
- Devin AI — autonomous coding agent
- Datadog — observability for the remediation pipeline
- GitHub — source of truth for code, issues, PRs
Scan → Triage → Devin Fleet → Policy Boundary → GitHub PRs + Datadog
6-Stage Pipeline
| Stage | What Happens |
|---|---|
| 01 Scan | pip-audit, npm audit, trivy, semgrep |
| 02 Triage | Severity × reachability × fix availability × complexity |
| 03 Devin Fleet | Context-aware prompts, reads CHANGELOGs, fixes breaking call sites |
| 04 Policy Boundary | Auto-merge · Human review · Blocked |
| 05 Evidence Bundle | What changed, why, blast radius, confidence score |
| 06 Datadog | Fleet health, trust split, cost, audit trail |
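The triage stage multiplies four factors. A minimal sketch of that scoring is below, assuming hypothetical weights and scales: the `Finding` type, the `SEVERITY_WEIGHT` values, the 0.25 discount for unreachable code, and the choice to rank complex fixes lower are all illustrative and not taken from ShieldOps itself.

```python
from dataclasses import dataclass

# Hypothetical weights: the article names the four factors but not their scales.
SEVERITY_WEIGHT = {"LOW": 1.0, "MEDIUM": 2.0, "HIGH": 3.0, "CRITICAL": 4.0}


@dataclass(frozen=True)
class Finding:
    package: str
    severity: str        # one of SEVERITY_WEIGHT's keys
    reachable: bool      # is the vulnerable code actually imported or called?
    fix_available: bool  # does a patched release exist?
    complexity: int      # 1 (patch bump) to 5 (architectural migration)


def triage_score(f: Finding) -> float:
    """Multiply the four factors; a higher score means 'fix this first'."""
    reachability = 1.0 if f.reachable else 0.25   # assumed discount
    fixability = 1.0 if f.fix_available else 0.0  # nothing to bump to
    ease = 1.0 / f.complexity                     # assumed: simpler fixes rank higher
    return SEVERITY_WEIGHT[f.severity] * reachability * fixability * ease


def prioritize(findings: list[Finding]) -> list[Finding]:
    """Return findings sorted from highest to lowest triage score."""
    return sorted(findings, key=triage_score, reverse=True)
```

Because the factors are multiplied rather than added, a finding with no available fix scores zero regardless of severity, which keeps the fleet from spending sessions on work it cannot finish.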
The Hero Story: Flask 2→3 in 500K Lines of Code
This is the moment that defines what autonomous remediation actually means.
Apache Superset: 500K lines of Python, running Flask 2.3.3, which has reached end of support. It was issue #1 in our auto-created triage, labeled CRITICAL.
What Dependabot Does
Dependabot opens a PR bumping Flask from 2.3.3 to 3.1.0. The build fails on broken imports, changed APIs, and deprecated patterns. The PR sits red until an engineer closes it with “too complex for automated fix.”
What Devin Did
- Read the Flask 3.x CHANGELOG and migration guide
- Found all version constraints across 5 files
- Updated pyproject.toml dependency spec
- Updated requirements/base.txt pin
- Updated requirements/development.txt
- Fixed integration test imports
- Fixed security dataset test compatibility
- Verified no breaking API call sites remained
# Before (pyproject.toml)
"flask>=2.2.5, <4.0.0"
# After
"flask>=3.1.0, <4.0.0"
The result: PR #10, +11/-12 across 5 files. Clean. Mergeable. No human touched it.
“That’s the work only an autonomous coding agent can do — reading the error, understanding the CHANGELOG, fixing the call sites, iterating until green.”
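The constraint edit in pyproject.toml is the mechanical part of that work. As a rough sketch of it, the hypothetical helper `bump_lower_bound` below rewrites only the `>=` bound for one package in a requirement string. It is an illustration using a regular expression, not how Devin performs the change, and it leaves `==` pins and similarly named packages untouched.

```python
import re


def bump_lower_bound(spec: str, package: str, new_min: str) -> str:
    """Replace the '>=' lower bound for `package` in a requirement string.

    Strings without a '>=' bound for that exact package are returned unchanged.
    """
    pattern = re.compile(
        rf"(?P<head>\b{re.escape(package)}\s*>=\s*)(?P<old>[0-9][0-9A-Za-z.]*)",
        re.IGNORECASE,
    )
    # Keep the package name and operator, swap only the version number.
    return pattern.sub(lambda m: m.group("head") + new_min, spec, count=1)


print(bump_lower_bound('"flask>=2.2.5, <4.0.0"', "flask", "3.1.0"))
```

The upper bound `<4.0.0` is preserved because the pattern stops at the first character that cannot be part of a version number. Everything beyond this edit, such as fixing imports and test compatibility, is where the agent's work actually lies.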
The Other PRs: Not Just the Hero
ShieldOps didn’t just handle the hard one. Here’s the full picture:
| PR | Title | What Devin Did | Changes |
|---|---|---|---|
| #8 | Dockerfile Hardening | Pinned base images to SHA256 digests, purged dev packages, added HEALTHCHECK | +20/-4 |
| #9 | Paramiko CVE-2026-44405 | Upgraded 3.5.1→5.0.0, handled breaking changes (GSSAPI removed, DH modulus), verified API compatibility | +8/-4 |
| #10 | Flask 2.3→3.x (Hero) | Major version upgrade, fixed imports across 5 files in 500K LOC codebase | +11/-12 |
7 Issues Auto-Created with Severity Labels
| Severity | Count | Examples |
|---|---|---|
| CRITICAL | 1 | Flask EOL upgrade |
| HIGH | 3 | SQLAlchemy 1.4→2.0, flask-sqlalchemy, npm audit |
| MEDIUM | 1 | Dockerfile hardening |
| LOW | 1 | Paramiko CVE (CVSS 3.4) |
The Trust Boundary
The VP question: “Is this thing safe to run?”
The goal is not to remove humans but to make their job trivial. Every fix lands in one of three tiers: auto-merge, human review, or blocked.
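One way to picture the policy boundary is as a routing function from fix attributes to a tier. The sketch below is an assumption-laden illustration: the `route` function, its inputs, and the thresholds (0.9 confidence, 3 files) are hypothetical and would be set by whoever owns the policy.

```python
from enum import Enum


class Tier(Enum):
    AUTO_MERGE = "auto-merge"
    HUMAN_REVIEW = "human review"
    BLOCKED = "blocked"


def route(tests_passed: bool, major_bump: bool,
          confidence: float, files_changed: int) -> Tier:
    """Map a proposed fix to a trust tier (thresholds are illustrative)."""
    if not tests_passed:
        return Tier.BLOCKED        # a red build never reaches a reviewer
    if major_bump or confidence < 0.9 or files_changed > 3:
        return Tier.HUMAN_REVIEW   # risky or wide changes need a human
    return Tier.AUTO_MERGE         # small, green, high-confidence fixes


# Under these assumed rules, a major-version upgrade touching 5 files
# would go to a reviewer even with passing tests.
print(route(tests_passed=True, major_bump=True, confidence=0.95, files_changed=5))
```

Keeping the rules in one small function makes the boundary auditable: expanding auto-merge later is a reviewed change to a threshold, not a change in agent behavior.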
VP Dashboard
The Enterprise Use Case
Every enterprise with 50+ repos faces the same math.
| Metric | Manual Process | ShieldOps |
|---|---|---|
| Cost per CVE fix | $600 (engineer time) | ~$15 (Devin session) |
| MTTR for critical vulns | 60–90 days | Hours |
| Remediation coverage | <30% of findings | 80%+ |
| Audit evidence | Manual, inconsistent | Automated, every fix |
| Scale | Linear with headcount | Concurrent fleet |
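To make the table concrete, here is a back-of-the-envelope calculation using the 247 findings from the opening example and the table's own figures. It treats the "<30%" and "80%+" coverage numbers as exactly 30% and 80%, which is a simplifying assumption, and `remediation_budget` is a hypothetical helper.

```python
def remediation_budget(findings: int, coverage: float,
                       cost_per_fix: float) -> tuple[int, float]:
    """Return (findings fixed, total spend) for a given coverage rate."""
    fixed = int(findings * coverage)  # round down to whole fixes
    return fixed, fixed * cost_per_fix


manual = remediation_budget(247, 0.30, 600.0)  # engineer time
fleet = remediation_budget(247, 0.80, 15.0)    # agent sessions

print(f"manual: {manual[0]} fixes for ${manual[1]:,.0f}")
print(f"fleet:  {fleet[0]} fixes for ${fleet[1]:,.0f}")
```

Under these assumptions the manual process fixes 74 findings for $44,400, while the fleet fixes 197 for $2,955.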
What CISOs actually care about:
- Remediation velocity (not scan counts)
- Evidence for auditors (evidence bundles on every PR)
- Predictable cost per fix
- Fleet scaling without headcount
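An evidence bundle can be pictured as a small record attached to each PR. The sketch below uses the four fields named in the pipeline table (what changed, why, blast radius, confidence score). The class name, the `render` method, the 0-to-1 confidence scale, and the example confidence value are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceBundle:
    what_changed: str
    why: str
    blast_radius: list[str]  # files touched by the fix
    confidence: float        # 0.0 to 1.0; the scale is an assumption

    def render(self) -> str:
        """Render the bundle as plain text suitable for a PR comment."""
        files = "\n".join(f"- {path}" for path in self.blast_radius)
        return (
            f"What changed: {self.what_changed}\n"
            f"Why: {self.why}\n"
            f"Blast radius ({len(self.blast_radius)} files):\n{files}\n"
            f"Confidence: {self.confidence:.0%}"
        )


bundle = EvidenceBundle(
    what_changed="flask>=2.2.5 raised to flask>=3.1.0",
    why="Flask 2.3.3 reached end of support",
    blast_radius=["pyproject.toml", "requirements/base.txt",
                  "requirements/development.txt"],
    confidence=0.9,  # illustrative value, not a measured score
)
print(bundle.render())
```

A fixed schema like this is what lets an auditor compare fixes across repos without reading each diff.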
What Most People Miss
The value isn’t fixing CVEs faster. It’s building a trust layer that lets autonomous agents operate safely in production codebases.
Security remediation is the perfect proving ground:
- Bounded scope (one CVE, one fix)
- Measurable success (tests pass or they don’t)
- Natural trust tiers (auto-merge, human review, blocked)
- Evidence-rich (every fix has a paper trail)
The pattern — scan → triage → autonomous execution → policy boundary → human oversight — applies to all autonomous engineering. Feature development. Refactoring. Migration. Security is just application #1.
“ShieldOps isn’t a faster vulnerability scanner. It’s a trust control plane for an autonomous engineering workforce.”
Tradeoffs and Honest Assessment
What Doesn’t Work Yet
- Architectural migrations (SQLAlchemy 1.4→2.0 requires understanding query patterns across entire codebase)
- Test suite fragility (if existing tests are bad, Devin can’t tell if its fix broke something real)
- Session failure rate: 15–20% of sessions don’t converge (complexity exceeds agent capability)
- Cost: Devin sessions aren’t free. At scale, ACU budgeting becomes a real concern.
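The failure rate and the cost concern interact. As a rough sketch, assuming a session that fails to converge costs the same as one that succeeds (the article does not say), the spend per converged fix is the session cost divided by the success rate. The helper name is hypothetical and the $15 figure is the approximate per-session cost quoted earlier.

```python
def cost_per_converged_fix(session_cost: float, failure_rate: float) -> float:
    """Average spend per successful fix when failed sessions are still billed."""
    if not 0.0 <= failure_rate < 1.0:
        raise ValueError("failure_rate must be in [0, 1)")
    return session_cost / (1.0 - failure_rate)


for rate in (0.15, 0.20):
    print(f"{rate:.0%} failure: ${cost_per_converged_fix(15.0, rate):.2f} per fix")
```

At a 20% failure rate, a nominal $15 session works out to about $18.75 per converged fix, which is the number an ACU budget should plan around.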
Where Humans Are Still Essential
- Setting policy boundaries
- Reviewing major architectural changes
- Expanding auto-merge rules as trust data accumulates
Where This Goes
- CI integration: scan on every merge, not just scheduled
- Multi-repo fleet: same policy boundary across every repo in the org
- Expanding auto-merge: as confidence data accumulates, more fixes qualify for auto-merge
- ACU budgeting: cost optimization at fleet scale
- Beyond security: the same architecture for feature development, refactoring, migration
Gaurav Sharma is the founder of Avyay (अव्यय). ShieldOps is open source at github.com/gaurav21/shieldops. Read more about the platform at avyay.ai/products.