Building the Sensor Layer: Why AI Code Security Needs Computational Gates, Not Just Prompts
Source: martinfowler
The Thoughtworks team discovered something familiar while scaling an AI-generated video assembly prototype: their coding assistants recommended making storage buckets public and granting excessive IAM permissions. Both suggestions would have created serious security exposure. Both required a human to notice and push back.
This pattern repeats across the industry. According to 2026 security research, 25% of AI-generated code ships with confirmed vulnerabilities. The number makes sense once you understand how these tools work. Language models optimize for the path of least resistance, and that path rarely aligns with security best practices. A public storage bucket is simpler than configuring signed URLs. Broad IAM roles are easier than scoping to minimum required permissions.
The article correctly identifies that prompting an AI to “be secure” is insufficient. Prompts can be overridden, misunderstood, or simply ignored when a user phrases their request differently. What interests me more is the solution architecture they outline but don’t fully expand on: the computational sensor layer.
The Harness Model
Birgitta Böckeler’s harness engineering framework provides the mental model. She describes wrapping AI coding agents in controls structured along two axes. Guides (feedforward) steer the model before it acts. Sensors (feedback) observe and validate after it acts. Each can be either computational (deterministic, CPU-run) or inferential (semantic, AI-driven).
The critical insight is that inferential guides alone create a single point of failure. If your security context file tells the AI to follow least privilege principles, you’re relying on the model’s ability to correctly interpret and apply that instruction every time. Models are probabilistic. They will fail.
Computational sensors provide the deterministic safety net. These are the tools that run after code is generated and before it ships: static analysis, credential scanners, infrastructure validators, dependency auditors. They operate independently of what the AI was told to do.
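You can see the property that matters in a function signature. Below is a minimal Python sketch that assumes a sensor is simply a deterministic function from generated source text to a list of findings. The names (Finding, gate, no_public_acl) are hypothetical, and the regex rule is a toy stand-in for a real scanner such as Checkov. The gate receives only the generated code, so nothing in the prompt can change its verdict.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Finding:
    check: str
    message: str


# Hypothetical sensor type: generated source text in, findings out.
Sensor = Callable[[str], List[Finding]]


def no_public_acl(source: str) -> List[Finding]:
    """Toy rule: flag a Terraform-style acl = "public-read" assignment."""
    if re.search(r'\bacl\s*=\s*"public-read"', source):
        return [Finding("no_public_acl", 'bucket ACL is "public-read"')]
    return []


def gate(source: str, sensors: List[Sensor]) -> List[Finding]:
    """Run every sensor over the generated code; any finding blocks the change."""
    findings: List[Finding] = []
    for sensor in sensors:
        findings.extend(sensor(source))
    return findings


generated = 'resource "aws_s3_bucket_acl" "media" {\n  acl = "public-read"\n}\n'
print(gate(generated, [no_public_acl]))
```

An empty result means the change may proceed. Because the input is only the output code, rephrasing the request cannot talk the gate out of a finding.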
What the Sensor Layer Catches
The two incidents from the Thoughtworks article map directly to sensor categories:
Infrastructure Misconfigurations
Public storage buckets, overly permissive security groups, databases exposed to 0.0.0.0/0. Tools like Checkov, tfsec, and Terrascan scan infrastructure-as-code for these patterns. They integrate into CI pipelines and fail builds when violations appear.
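A custom check for the Thoughtworks case could look like this in Checkov:
```python
from typing import Any, Dict, List
from checkov.common.models.enums import CheckCategories, CheckResult
from checkov.terraform.checks.resource.base_resource_check import BaseResourceCheck

PUBLIC_PRINCIPALS = {"allUsers", "allAuthenticatedUsers"}

class GCSBucketPublicAccess(BaseResourceCheck):
    def __init__(self) -> None:
        name = "Ensure GCS bucket is not publicly accessible"
        # Custom checks need their own ID; Checkov's built-in CKV_GCP_28 covers this case.
        id = "CKV_GCP_CUSTOM_1"
        supported_resources = ["google_storage_bucket_iam_binding",
                               "google_storage_bucket_iam_member"]
        categories = [CheckCategories.IAM]
        super().__init__(name=name, id=id, categories=categories,
                         supported_resources=supported_resources)

    def scan_resource_conf(self, conf: Dict[str, List[Any]]) -> CheckResult:
        # Checkov's Terraform parser wraps attribute values in lists:
        # iam_binding sets `members` (a list), iam_member sets a single `member`.
        principals: List[Any] = []
        for key in ("members", "member"):
            for value in conf.get(key, []):
                principals.extend(value if isinstance(value, list) else [value])
        if any(p in PUBLIC_PRINCIPALS for p in principals if isinstance(p, str)):
            return CheckResult.FAILED
        return CheckResult.PASSED

# Instantiating the class registers the check with Checkov's registry.
check = GCSBucketPublicAccess()
```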
This runs deterministically. The model cannot negotiate with it. If the infrastructure code grants public access, the check fails, regardless of what was in the prompt.
Excessive IAM Permissions
The second incident, in which the assistant recommended granting the Access Token Creator role, reflects a common pattern: AI tools suggest roles that are broader than necessary because choosing the narrower alternative requires knowing exactly which permissions the workload needs. Tools like IAM Access Analyzer (AWS) and IAM Recommender (GCP) can identify overprivileged roles and service accounts.
For proactive detection, Parliament (AWS) and IAM Policy Validator scan policy documents for privilege escalation risks:
```shell
# Scan an AWS IAM policy for security issues
parliament --file policy.json
```
These tools can flag policies that grant iam:*, s3:*, or other wildcard permissions that open the door to privilege escalation and lateral movement.
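To show what "wildcard permissions" look like in a policy document, here is a simplified Python sketch. It walks an IAM policy and reports Allow actions that are a bare * or end in :* (such as iam:* or s3:*). This illustrates the idea and is not Parliament's implementation. It ignores NotAction, partial wildcards like iam:Pass*, conditions, and resource scoping.

```python
import json
from typing import Any, Dict, List


def wildcard_actions(policy: Dict[str, Any]) -> List[str]:
    """Return actions in Allow statements that are "*" or a service-wide "service:*"."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # IAM allows a single statement object
        statements = [statements]
    flagged: List[str] = []
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):  # Action may be a string or a list
            actions = [actions]
        flagged.extend(a for a in actions if a == "*" or a.endswith(":*"))
    return flagged


policy = json.loads("""
{
  "Version": "2012-10-17",
  "Statement": [
    {"Effect": "Allow", "Action": ["s3:GetObject", "iam:*"], "Resource": "*"},
    {"Effect": "Deny", "Action": "s3:*", "Resource": "*"}
  ]
}
""")
print(wildcard_actions(policy))  # → ['iam:*']
```

A CI job could fail whenever this list is non-empty. Dedicated validators go much further, checking things like resource mismatches and known escalation paths.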
Hardcoded Secrets
AI models sometimes hardcode API keys directly in code or insert placeholder credentials that developers forget to replace. Gitleaks, TruffleHog, and detect-secrets scan for these patterns:
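```yaml
# .gitlab-ci.yml
secret-scan:
  stage: test
  image:
    name: zricethezav/gitleaks:latest
    # The image's entrypoint is the gitleaks binary; clear it so GitLab can run the script
    entrypoint: [""]
  script:
    - gitleaks detect --source . --verbose --no-git
  allow_failure: false
```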
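Gitleaks exits with a non-zero code when it finds a secret, and that fails the job. allow_failure: false is already GitLab's default for regular jobs and is stated here for clarity. It ensures the failure stops the pipeline instead of being recorded as a warning.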
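Secret scanners combine pattern rules with extras such as entropy checks and allowlists. The sketch below keeps only the pattern part and is a deliberate simplification. Two illustrative regex rules (an AWS-style access key ID and a PEM private key header) stand in for Gitleaks' much larger rule set. The non-zero exit code mirrors how a CI job fails when something is found.

```python
import re
import sys
from pathlib import Path
from typing import List, Tuple

# Illustrative rules only; real scanners ship many more, plus entropy checks.
RULES = {
    "aws-access-key-id": re.compile(r"\b(?:AKIA|ASIA)[A-Z0-9]{16}\b"),
    "private-key": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
}


def scan_text(text: str) -> List[Tuple[int, str]]:
    """Return (line number, rule id) for every rule that matches a line."""
    hits: List[Tuple[int, str]] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for rule_id, pattern in RULES.items():
            if pattern.search(line):
                hits.append((lineno, rule_id))
    return hits


def main(paths: List[str]) -> int:
    found = False
    for path in paths:
        text = Path(path).read_text(encoding="utf-8", errors="ignore")
        for lineno, rule_id in scan_text(text):
            print(f"{path}:{lineno}: {rule_id}")
            found = True
    return 1 if found else 0  # non-zero exit fails the pipeline job


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

The test string AKIAIOSFODNN7EXAMPLE is the placeholder key ID used in AWS documentation, so it is safe to use in examples.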
Vulnerable Dependencies
AI coding assistants sometimes suggest outdated packages or libraries with known CVEs. Dependabot, Snyk, and OWASP Dependency-Check monitor for this:
```shell
# Scan for vulnerable dependencies
snyk test --severity-threshold=high
```
According to the 2026 Black Duck OSSRA report, 78% of codebases contain high or critical severity vulnerabilities, with vulnerabilities per codebase increasing 107% year over year. Automated scanning is no longer optional.
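The --severity-threshold=high flag turns a vulnerability report into a pass/fail gate: only high and critical findings fail the run. Here is a minimal Python sketch of that filtering logic. It assumes findings arrive as simple dicts with a severity field. That shape and the package names are hypothetical and do not reflect Snyk's JSON output format.

```python
from typing import Dict, List

# Snyk's four severity levels, from least to most severe.
SEVERITY_ORDER = ["low", "medium", "high", "critical"]


def blocking_findings(findings: List[Dict[str, str]], threshold: str = "high") -> List[Dict[str, str]]:
    """Keep findings at or above the threshold; a non-empty result should fail the build."""
    min_rank = SEVERITY_ORDER.index(threshold)
    return [f for f in findings if SEVERITY_ORDER.index(f["severity"]) >= min_rank]


findings = [
    {"package": "example-lib@1.0.0", "severity": "medium"},
    {"package": "other-lib@2.3.1", "severity": "critical"},
]
print([f["package"] for f in blocking_findings(findings)])  # → ['other-lib@2.3.1']
```

Raising or lowering the threshold trades build noise against exposure. Medium findings still appear in reports, but they do not block the merge.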
Code-Level Vulnerabilities
SQL injection, path traversal, insecure deserialization. Static analysis tools like Semgrep, CodeQL, and Bandit (Python) detect these at the code level:
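```yaml
# semgrep rule: SQL built with string formatting and passed to execute()
rules:
  - id: sql-injection
    pattern-either:
      - pattern: $CURSOR.execute(f"...{...}...", ...)
      - pattern: $CURSOR.execute("..." % ..., ...)
      - pattern: $CURSOR.execute("...".format(...), ...)
    message: SQL query built with string formatting; use a parameterized query instead
    severity: ERROR
    languages: [python]
```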
These rules are pattern-based and deterministic. They are fast enough to run in pre-commit hooks as well as in CI.
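For reference, here is the code shape the rule above targets, shown next to the parameterized fix. The example uses Python's built-in sqlite3 module with an in-memory database, and the users table and injected input are illustrative. When the attacker-controlled string 1 OR 1=1 goes through the f-string version, it rewrites the WHERE clause. The parameterized version treats the same string as a plain value.

```python
import sqlite3
from typing import Any, List, Tuple


def find_user_unsafe(conn: sqlite3.Connection, user_id: Any) -> List[Tuple]:
    cursor = conn.cursor()
    # Vulnerable: the value is spliced into the SQL text (what the rule flags).
    cursor.execute(f"SELECT * FROM users WHERE id = {user_id}")
    return cursor.fetchall()


def find_user_safe(conn: sqlite3.Connection, user_id: Any) -> List[Tuple]:
    cursor = conn.cursor()
    # Safe: the value is sent separately as a bound parameter.
    cursor.execute("SELECT * FROM users WHERE id = ?", (user_id,))
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])

payload = "1 OR 1=1"
print(len(find_user_unsafe(conn, payload)))  # → 2 (every row leaks)
print(len(find_user_safe(conn, payload)))    # → 0
```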
Implementation Architecture
The sensor layer belongs in three places:
Pre-commit Hooks
Local validation before code reaches the repository. Use pre-commit to orchestrate multiple tools:
```yaml
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/gitleaks/gitleaks
    rev: v8.18.2
    hooks:
      - id: gitleaks
  - repo: https://github.com/semgrep/semgrep
    rev: v1.70.0
    hooks:
      - id: semgrep
        args: ['--config', 'auto', '--error']
  - repo: https://github.com/aquasecurity/tfsec
    rev: v1.28.1
    hooks:
      - id: tfsec
```
This configuration runs Gitleaks, Semgrep, and tfsec on every commit attempt. If any check fails, the commit is rejected.
CI Pipeline Gates
Centralized enforcement for all code entering the main branch:
```yaml
# GitHub Actions example
name: Security Scan
on: [push, pull_request]
jobs:
  security:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Run Semgrep
        uses: returntocorp/semgrep-action@v1
        with:
          config: auto
      - name: Run Checkov
        uses: bridgecrewio/checkov-action@master
        with:
          framework: terraform
          soft_fail: false
      - name: Run Snyk
        uses: snyk/actions/node@master
        env:
          SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}
        with:
          args: --severity-threshold=high
```
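The soft_fail: false setting makes the Checkov step fail the build on any failed check, and the Semgrep and Snyk steps exit non-zero when they report findings. Mark the workflow as a required status check in branch protection, and it becomes a non-negotiable gate for the main branch.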
IDE Integration
Real-time feedback while the AI is generating code. Many SAST tools now offer IDE plugins that surface issues immediately, before code is committed.
The Layered Defense Model
The Thoughtworks article advocates for a security context file as an inferential guide. That file should exist, but it’s layer one of three:
- Inferential Guide (Security Context File): Loaded into the AI's context at session start, containing organizational security policies, architectural patterns, and coding standards. This increases the probability of secure code generation.
- Computational Sensors (SAST, secret, dependency, and IaC scanning, plus DAST/IAST where a running build exists): Automated tools that validate output deterministically. The static checks run in IDEs, pre-commit hooks, and CI pipelines to catch what the guide missed.
- Human Review: Code review with security training and an explicit checklist for AI-generated code. According to Aikido Security's 2026 report, 1 in 5 enterprise breaches now originate from AI-generated code, making trained human review essential.
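Layers one and three are probabilistic: humans make mistakes, and AI models hallucinate or misinterpret instructions. Layer two is deterministic. The tools always run, always apply the same rules, and always fail code that matches those rules. They can miss issues no rule describes, but they never skip a check they are configured to run.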
The Performance Question
Running multiple security scanners sounds slow. In practice, most of these scanners finish in seconds: Gitleaks scans a typical repository in under 5 seconds, Semgrep finishes in under 10 seconds on most codebases, and Checkov processes Terraform files nearly instantaneously.
The pipeline bottleneck is usually dependency scanning (Snyk, OWASP Dependency-Check), which requires network calls to vulnerability databases. Caching and incremental scans mitigate this:
```yaml
# Step to add under the security job's steps
- name: Cache Snyk
  uses: actions/cache@v3
  with:
    path: ~/.cache/snyk
    key: ${{ runner.os }}-snyk-${{ hashFiles('**/package-lock.json') }}
```
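You can sketch the incremental half of that advice in a few lines of Python. Keep a manifest of content hashes from the last scan, and rescan only the files whose hash changed. This is a hypothetical helper that illustrates the idea. It is not how Snyk or any particular scanner implements incremental scans, and where the manifest is stored (for example, in the CI cache) is left open.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, Iterable, List


def file_digests(paths: Iterable[Path]) -> Dict[str, str]:
    """Map each file path to the SHA-256 hex digest of its contents."""
    return {str(p): hashlib.sha256(p.read_bytes()).hexdigest() for p in paths}


def files_to_rescan(previous: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """Return new files and files whose contents changed since the last scan."""
    return sorted(path for path, digest in current.items() if previous.get(path) != digest)


with tempfile.TemporaryDirectory() as tmp:
    main_tf, vars_tf = Path(tmp, "main.tf"), Path(tmp, "variables.tf")
    main_tf.write_text('resource "google_storage_bucket" "media" {}\n')
    vars_tf.write_text('variable "region" {}\n')
    before = file_digests([main_tf, vars_tf])  # manifest from the last scan
    main_tf.write_text('resource "google_storage_bucket" "media" { name = "m" }\n')
    after = file_digests([main_tf, vars_tf])
    print(files_to_rescan(before, after) == [str(main_tf)])  # → True
```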
For local development, pre-commit hooks can run a subset of checks (secrets, basic SAST) while CI runs the full suite (dependency scanning, infrastructure validation, deep SAST).
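One way to keep the local and CI suites in sync is a single script that both the pre-commit hook and the CI job call with a stage name. Here is a minimal Python sketch. The stage names and the CHECKS table are hypothetical, while the commands reuse the CLIs discussed above. A missing tool counts as a failure, so the gate fails closed rather than silently skipping a check.

```python
import subprocess
import sys
from typing import Dict, List

# Hypothetical stage table: fast checks locally, the full suite in CI.
CHECKS: Dict[str, List[List[str]]] = {
    "pre-commit": [
        ["gitleaks", "detect", "--source", ".", "--no-git"],
        ["semgrep", "scan", "--config", "auto", "--error"],
    ],
}
CHECKS["ci"] = CHECKS["pre-commit"] + [
    ["checkov", "-d", ".", "--framework", "terraform"],
    ["snyk", "test", "--severity-threshold=high"],
]


def run_checks(commands: List[List[str]]) -> int:
    """Run every command; return 1 if any fails or is not installed, else 0."""
    failed = False
    for cmd in commands:
        try:
            returncode = subprocess.run(cmd).returncode
        except FileNotFoundError:  # tool missing: fail closed, never skip silently
            returncode = 127
        if returncode != 0:
            print(f"FAILED ({returncode}): {' '.join(cmd)}", file=sys.stderr)
            failed = True
    return 1 if failed else 0


if __name__ == "__main__":
    stage = sys.argv[1] if len(sys.argv) > 1 else "pre-commit"
    sys.exit(run_checks(CHECKS[stage]))
```

The loop keeps going after a failure, so a developer sees every problem in one run instead of fixing them one at a time.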
What This Means for Teams Using AI Coding Tools
If you’re building with GitHub Copilot, Cursor, Claude Code, or any AI assistant, the workflow is:
- Provide a security context file in your project (.github/copilot-instructions.md, CLAUDE.md, or a tool-specific format) with your organization's security requirements.
- Install pre-commit hooks with, at minimum, secret scanning (Gitleaks), SAST (Semgrep), and infrastructure scanning if you use IaC (Checkov, tfsec).
- Configure CI to run the same checks plus dependency scanning (Snyk in the pipeline, with Dependabot watching the repository) and fail builds on high/critical findings.
- Train developers to review AI-generated code specifically for security patterns: authentication bypasses, authorization flaws, injection vulnerabilities, excessive permissions.
The ProjectDiscovery 2026 AI Coding Impact Report found that 62% of security teams say keeping up with AI-generated code volume is getting harder. Computational sensors are how you scale review capacity without scaling headcount.
The Broader Pattern
This layered approach applies beyond security. If you want AI-generated code to follow architectural patterns, enforce them with linters that check module boundaries and dependency graphs against the rules in your architectural decision records. If you want test coverage, enforce it with coverage gates (pytest --cov --cov-fail-under=80). If you want accessible UI components, enforce accessibility with axe-core in CI.
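Here is the same idea applied to architecture: a minimal Python sketch of a module-boundary check built on the standard ast module. The layer names and the FORBIDDEN table are hypothetical. In practice the rules would come from your own architectural decisions, and dedicated tools such as import-linter cover far more cases.

```python
import ast
from typing import Dict, List, Set

# Hypothetical rule: the "domain" layer must not depend on outer layers.
FORBIDDEN: Dict[str, Set[str]] = {"domain": {"infrastructure", "web"}}


def boundary_violations(layer: str, source: str) -> List[str]:
    """Return imported module names that cross a forbidden boundary for this layer."""
    banned = FORBIDDEN.get(layer, set())
    violations: List[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules = [node.module]  # absolute "from x import y"; relative imports skipped
        else:
            continue
        violations.extend(m for m in modules if m.split(".")[0] in banned)
    return violations


generated = "import os\nfrom infrastructure.db import Session\nimport web.handlers\n"
print(boundary_violations("domain", generated))  # → ['infrastructure.db', 'web.handlers']
```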
Prompts are a suggestion layer. Deterministic checks are the enforcement layer. Build both.
The Thoughtworks article provides the conceptual framework and real-world motivation. What that story leaves out is the implementation detail: which specific tools to run, where to run them, and how to configure them to fail unsafe code before it reaches production. Most of the tools listed here are open source, all are widely adopted, and they integrate into existing development workflows with minimal friction.
If 42% of new enterprise software is now AI-generated or AI-assisted, the teams that build computational sensor layers will ship faster and more securely than teams relying on prompts and human review alone.