How to Remediate a Vulnerability

A few years back, I wrote an internal blog post to help IBM developers resolve vulnerabilities in their projects. As lead engineer on the CIO's Developer Experience team, I saw firsthand how many developers were unsure how to fix the vulnerabilities my CI/CD pipeline found. The post covered the basics: update your dependencies, remove unused dependencies, use overrides when necessary, and test every change. Developers got better at remediation, and we saw large reductions in overall vulnerability counts.

Fast-forward to now, and things have changed. Most developers use AI agents for development, resulting in more projects than ever. Without assistance, maintaining all these projects becomes a project in itself. I've been tinkering with a new way to resolve vulnerabilities, one that lets AI agents make these changes with confidence.

The Current State

Current AI tooling doesn't resolve vulnerabilities well out of the box. Agents usually update the right dependencies to the correct versions, but they often skip testing the change, updating peer dependencies, or re-scanning the project afterward. This creates a longer feedback loop: developers push changes expecting every issue to be resolved, only to find the CI pipeline failing. The agent is almost there but needs more help to be reliable.

Giving Agents Some Help

My first attempt at improving the agent workflow was to write a markdown-based skill for the agent to use when resolving vulnerabilities. The skill described the expectations in detail: check the current status of the project, update dependencies to resolve vulnerabilities, and test the changes. Even so, the agent only partially followed the guidelines, claiming completion after updating dependencies but skipping the tests entirely. This approach was no real improvement over the base model.

I thought about the problem more and realized most of what I needed was deterministic. The agent was already good at fixing code after breaking updates but wasn't good at following all my testing requirements. I had an idea: add bash scripts to the skill that the agent could call to handle all the deterministic checks.

I wrote three scripts: an audit script to check the current state of the project, a fix script to update dependencies, and a verify script to run all tests after the changes. Because all our projects use a standard CI/CD pipeline with predictable scripts for tasks like linting and unit testing, the same checks can run unchanged in every repository. Each script outputs a markdown report, making results easy to review.
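As a rough illustration of the shape of such a script, here is a minimal sketch of a verify-style step that runs a set of commands and renders a markdown report. It is not one of the actual scripts, which are bash and are not shown here; it is written in Python only for illustration. The names `CheckResult`, `run_check`, and `render_report`, and the report layout, are all assumptions.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class CheckResult:
    """Outcome of one deterministic check (hypothetical structure)."""
    name: str
    passed: bool
    detail: str = ""


def run_check(name, command, cwd="."):
    """Run one command and record whether it exited with status 0."""
    try:
        proc = subprocess.run(command, cwd=cwd, capture_output=True, text=True)
    except FileNotFoundError:
        return CheckResult(name, False, "command or directory not found")
    output = (proc.stdout + proc.stderr).strip()
    # Keep only the last output line so the report stays short.
    last_line = output.splitlines()[-1] if output else ""
    return CheckResult(name, proc.returncode == 0, last_line)


def render_report(title, results):
    """Render check results as a small markdown report."""
    lines = [f"# {title}", "", "| Check | Status | Detail |", "| --- | --- | --- |"]
    for r in results:
        status = "pass" if r.passed else "FAIL"
        # Escape pipes so command output cannot break the table layout.
        detail = (r.detail or "-").replace("|", "\\|")
        lines.append(f"| {r.name} | {status} | {detail} |")
    failed = sum(1 for r in results if not r.passed)
    lines += ["", f"{failed} of {len(results)} checks failed."]
    return "\n".join(lines)
```

A caller would pass whatever commands the project's pipeline already defines, for example a lint command and a unit test command, and print the result of `render_report` for the agent and the developer to read. Because the pass or fail decision comes from exit codes rather than from the agent's judgment, the agent cannot claim completion while a check is still failing.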

I rewrote the skill to be more high-level, simply calling these scripts in a specific order. After each step, the agent reviews the results with the developer. For example, if the audit script finds low test coverage, the developer can add more tests before any dependency changes are made. After the fix script runs, the agent uses its existing capabilities to fix code issues, then uses the verify script to find any remaining problems.
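The ordering and the review checkpoint can be sketched as a small loop. This is an illustrative Python sketch under assumptions, not the skill itself (which is a markdown document the agent follows): `run_workflow`, the step callables, and the `review` callback are hypothetical names standing in for the scripts and for the conversation between the agent and the developer.

```python
from typing import Callable, List, Tuple

# A step runs one script and returns its markdown report (assumed shape).
Step = Callable[[], str]
# A review receives the step name and report, and returns True to continue.
Review = Callable[[str, str], bool]


def run_workflow(steps: List[Tuple[str, Step]], review: Review) -> List[str]:
    """Run steps in order, pausing for review after each one.

    Returns the names of the steps that actually ran. The workflow stops
    as soon as a review declines to continue, so later steps never run.
    """
    completed = []
    for name, step in steps:
        report = step()
        completed.append(name)
        if not review(name, report):
            break
    return completed
```

With steps named `audit`, `fix`, and `verify`, a review that declines after the audit (say, because coverage is too low) stops the workflow before any dependency is touched, which mirrors the checkpoint described above.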

Results

I've seen immediate improvement in the agent's ability to resolve vulnerabilities. With this skill, the agent consistently follows the workflow, auditing the project and verifying changes. The agent continues to fix breaking changes, so we haven't lost the main benefit of using an agent for vulnerability remediation.

Next, I plan to test letting the agent complete the workflow without interaction. First, I'd need to make some improvements unrelated to vulnerabilities, such as running our other tooling to ensure CI/CD passes. If the agent can make these changes independently, I can give it a list of repositories to work on while I sleep.
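An unattended run over a list of repositories might look something like the sketch below. This is speculative and not something that exists yet: `remediate_all` and the `remediate` callable (standing in for the whole audit, fix, and verify workflow on one repository) are hypothetical names. The one design point it illustrates is that a failure in one repository should be recorded rather than allowed to stop the rest of the overnight run.

```python
from typing import Callable, Dict, Iterable


def remediate_all(repos: Iterable[str], remediate: Callable[[str], bool]) -> Dict[str, str]:
    """Run the remediation workflow on each repository and collect outcomes.

    `remediate` is assumed to return True when verification passed and
    False when something still needs a human to look at it.
    """
    outcomes = {}
    for repo in repos:
        try:
            outcomes[repo] = "ok" if remediate(repo) else "needs review"
        except Exception as exc:
            # Record the error and move on to the next repository.
            outcomes[repo] = f"error: {exc}"
    return outcomes
```

The resulting mapping gives a summary to review in the morning: which repositories are ready, which need attention, and which failed outright.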