Skip to content

fix(cloud-security): AWS Config new recording model + GCP/Azure remediation robustness#3036

Merged
tofikwest merged 4 commits into
mainfrom
tofik/aws-config-recorder-remediation-fix
Jun 4, 2026
Merged

fix(cloud-security): AWS Config new recording model + GCP/Azure remediation robustness#3036
tofikwest merged 4 commits into
mainfrom
tofik/aws-config-recorder-remediation-fix

Conversation

@tofikwest

@tofikwest tofikwest commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

Summary

Customer-reported (Ian Hunter): AWS auto-remediation for "AWS Config recorder not fully active" generated empty configs, the Retry button did nothing, and applying a generated fix failed and fell back to dated manual steps. Root cause: the Config check + auto-fix only understood AWS's legacy recordingGroup.allSupported model, but the customer's recorder uses AWS's current model ("Record all resource types with customizable overrides" = recordingStrategy / exclusionByResourceTypes).

While fixing AWS, an audit found the same robustness bugs (Retry no-op, empty-plan handling) living in the separate GCP and Azure remediation paths, so this PR brings all three to parity.

What changed

AWS-specific

  • Config recorder check (config.adapter.ts) now treats recordingStrategy.useOnly === 'ALL_SUPPORTED_RESOURCE_TYPES' (and legacy allSupported) as "records all" → eliminates false positives on the new model. Genuine EXCLUSION/INCLUSION recorders stay flagged.
  • Config remediation now produces an AWS-valid call: read the existing recorder, then PutConfigurationRecorder with a clean recordingGroup { allSupported: true, includeGlobalResourceTypes: true } and no recordingStrategy/exclusionByResourceTypes/resourceTypes (those are mutually exclusive with allSupported and trigger a ValidationException). Also records the global IAM resource types the customer was missing.
  • Deterministic executor guardrail (aws-command-executor.tsnormalizeConfigRecordingGroup) collapses any all-supported-intent PutConfigurationRecorder to the single valid shape right before the SDK call, regardless of what the AI emits.

Cross-provider robustness (AWS + GCP + Azure)

  • Retry no-op fixed: never cache an empty / non-auto-fixable plan, and drop the stale cache entry on execute. Previously execute reloaded the same dead plan, so Retry failed identically every time.
  • Empty-plan retry: generateFixPlan / generateGcpFixPlan / generateAzureFixPlan retry once at a higher temperature when the first pass yields zero fix steps (temperature 0 just reproduced the same empty result).
  • Azure: if the refined plan flips canAutoFix to false after reading real state, return guided steps instead of a misleading auto-fix preview; delete the cached plan in the execute catch block.

Frontend (shared, all providers)

  • RemediationDialog.tsx: disable Apply Fix on an empty plan and explain why.

Prompts

  • Discourage S3 ACL steps (a common cause of empty plans), reinforce the valid Config recorder call, and base manual steps on the current AWS Console.

Tests

  • New config.adapter.spec.ts (recording-model awareness + AWS-valid remediation text).
  • New normalizeConfigRecordingGroup cases, AWS + GCP + Azure empty-plan retry cases, and isUsablePlan cache-guard cases.
  • Full cloud-security jest suite green (292 passing). All changed files typecheck clean.

Deploy / customer impact

  • No DB migration, no env changes, no manifest change, no reconnection. Applies to existing connections.
  • Takes effect after apps/api + apps/app deploy; the corrected check re-evaluates on the next scan.

⚠️ Not yet verified

  • No live apply against a real AWS/GCP/Azure account yet — the only true proof for remediation that mutates infra. Recommend an impersonated end-to-end Auto-Remediate (start with Ian's AWS Config finding) before declaring the ticket resolved.
  • Ian's finding won't auto-clear on the next scan (his recorder genuinely excludes IAM) — he must click Auto-Remediate, and may need a one-time config:* permission grant on CompAI-Remediator (surfaced in-product).

Flagged but intentionally NOT fixed here (separate scope / higher risk)

  • GCP/Azure scanners swallow per-adapter/per-scope errors → a real API/permission failure can look like "0 findings."
  • GCP Cloud SQL databaseFlags is a REPLACE op with no guard that existing flags are preserved; disabling public IP has no private-IP precondition check.

🤖 Generated with Claude Code


Summary by cubic

Supports AWS Config’s current recording model and fixes auto‑remediation so plans apply cleanly and Retry regenerates instead of doing nothing. Also hardens GCP and Azure remediation to avoid empty plans and no‑op retries, and aligns prompt command outputs with the executor.

  • Bug Fixes
    • AWS Config: check recognizes recordingStrategy.useOnly === 'ALL_SUPPORTED_RESOURCE_TYPES' (and legacy allSupported) as “records all.” Remediation uses { recordingGroup: { allSupported: true, includeGlobalResourceTypes: true } } with no recordingStrategy/exclusionByResourceTypes/resourceTypes. The executor normalizes recorder payloads to this valid shape.
    • Caching/Retry (AWS/GCP/Azure): never cache empty or non‑auto‑fixable plans; on execute, only reuse fresh and usable cached plans, otherwise drop and regenerate. Generators retry once at a higher temperature when the first plan is empty, and prefer a retry that either has steps or correctly sets canAutoFix to false (routes to guided steps).
    • Azure/Frontend: Azure returns guided steps if canAutoFix flips to false and deletes failed cached plans so Retry regenerates. The UI disables Apply Fix on an empty plan and explains why.
    • Prompts: use bare AWS SDK command names (service named separately) to match output rules; add the missing GetBucketPolicyCommand; remove PutBucketAcl from the permissions chain and reinforce “never use ACLs”; reiterate the valid Config recorder call.

Written for commit 432fd39. Summary will update on new commits.

Review in cubic

tofikwest and others added 2 commits June 4, 2026 17:54
… + auto-remediation

Customer-reported: AWS auto-remediation for "AWS Config recorder not fully
active" generated empty configs, the Retry button did nothing, and applying a
generated fix failed and fell back to dated manual steps. Root cause: the
entire Config check + auto-fix assumed the legacy `recordingGroup.allSupported`
model, but the customer's recorder uses AWS's current model ("Record all
resource types with customizable overrides" = recordingStrategy /
exclusionByResourceTypes).

- config.adapter: the recorder check now treats
  recordingStrategy.useOnly === 'ALL_SUPPORTED_RESOURCE_TYPES' (and legacy
  allSupported) as "records all", eliminating false positives on the new
  model. Genuine EXCLUSION/INCLUSION recorders stay flagged.
- config.adapter: remediation guidance now produces an AWS-valid call —
  read the existing recorder, then PutConfigurationRecorder with a clean
  recordingGroup { allSupported: true, includeGlobalResourceTypes: true } and
  NO recordingStrategy/exclusionByResourceTypes/resourceTypes (those are
  mutually exclusive with allSupported and trigger a ValidationException). This
  also records the global IAM resource types the customer was missing.
- aws-command-executor: deterministic guardrail (normalizeConfigRecordingGroup)
  collapses any all-supported-intent PutConfigurationRecorder to the single
  valid shape right before the SDK call, regardless of what the AI emits.
- remediation.service: never cache an empty / non-auto-fixable plan and drop
  the stale entry on execute — this is what made "Retry" a guaranteed no-op
  (it reloaded the same dead plan). Retry now regenerates.
- ai-remediation.service: generateFixPlan retries once at a higher temperature
  when the first pass yields zero fix steps (temp 0 would reproduce it).
- prompts: discourage S3 ACL steps (cause of empty plans), reinforce the valid
  Config recorder call, and base manual steps on the current AWS Console.
- RemediationDialog: disable "Apply Fix" on an empty plan and explain why.

Tests: new config.adapter.spec, recordingGroup-normalizer and retry-on-empty
cases; full cloud-security jest suite green (288 passing).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The Retry-no-op and empty-plan bugs fixed for AWS also existed in the separate
GCP and Azure remediation paths (own services + own planCache + own
generators). Bring them to parity.

GCP (gcp-remediation.service.ts):
- isUsablePlan guard: never cache an empty / non-auto-fixable plan; on execute,
  only reuse a fresh AND usable cached plan and drop the stale entry otherwise
  (was the Retry no-op — execute reloaded the same dead plan).
Azure (azure-remediation.service.ts):
- Same isUsablePlan cache guard on preview + execute.
- Delete the cached plan in the execute catch block so Retry regenerates.
- If the refined plan flips canAutoFix to false, return guided steps instead of
  a misleading auto-fix preview.
GCP + Azure generators (ai-remediation.service.ts):
- generateGcpFixPlan / generateAzureFixPlan now retry once at a higher
  temperature when the first pass yields zero fix steps (temp 0 reproduces it).

Tests: GCP/Azure empty-plan retry cases added; full cloud-security jest suite
green (292 passing). Typecheck clean for all changed files.

Note (flagged, NOT changed here — separate scope / higher risk):
- GCP/Azure scanners swallow per-adapter/per-scope errors (return [] on
  failure), so a real API/permission failure can look like "0 findings".
- GCP Cloud SQL databaseFlags is a REPLACE op with no guard that all existing
  flags are preserved; disabling public IP has no private-IP precondition check.
These are real but need their own design + tests before touching.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@vercel

vercel Bot commented Jun 4, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
app Ready Ready Preview, Comment Jun 4, 2026 10:38pm
comp-framework-editor Ready Ready Preview, Comment Jun 4, 2026 10:38pm
portal Ready Ready Preview, Comment Jun 4, 2026 10:38pm

Request Review

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4 issues found across 12 files

Confidence score: 3/5

  • There is a concrete regression risk in apps/api/src/cloud-security/ai-remediation.prompt.ts: the new service-prefixed command naming can conflict with the command schema and produce invalid AWS SDK command values.
  • The same prompt file also has inconsistent command formatting (GetBucketPolicy missing Command) and conflicting ACL guidance, which can lead to remediation output that fails validation or gives mixed instructions.
  • apps/api/src/cloud-security/ai-remediation.service.ts has retry-handling logic that may discard valid non-auto-fix retry results and still return an unusable empty canAutoFix=true plan, which is user-impacting behavior.
  • Pay close attention to apps/api/src/cloud-security/ai-remediation.prompt.ts, apps/api/src/cloud-security/ai-remediation.service.ts - command/schema consistency and empty-plan retry selection need verification.

Reply with feedback, questions, or to request a fix.

Fix all with cubic | Re-trigger cubic

Comment thread apps/api/src/cloud-security/ai-remediation.prompt.ts Outdated
Comment thread apps/api/src/cloud-security/ai-remediation.prompt.ts Outdated
Comment thread apps/api/src/cloud-security/ai-remediation.prompt.ts
Comment thread apps/api/src/cloud-security/ai-remediation.service.ts Outdated
tofikwest and others added 2 commits June 4, 2026 18:34
…y + retry selection

- ai-remediation.prompt.ts: use bare AWS SDK command names (with the service
  named separately) in the new S3 + Config guidance, matching the OUTPUT RULES
  schema — the prior "s3:PutPublicAccessBlockCommand" / "config-service:..."
  shorthand could nudge the model to emit a service-prefixed (invalid) command
  value. Add the missing "Command" suffix to GetBucketPolicy. Remove PutBucketAcl
  from the "permissions you need" chain so it no longer contradicts the new
  "never use ACLs" rule.
- ai-remediation.service.ts: the empty-plan retry now prefers a retry that is
  usable OR correctly canAutoFix=false (routes to guided steps) instead of
  discarding it and returning the original empty canAutoFix=true plan. Applied
  to AWS, GCP, and Azure generators.
- Test: retry prefers a non-auto-fixable result.

cloud-security jest suite green (293 passing); changed files typecheck clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@vercel vercel Bot temporarily deployed to Preview – portal June 4, 2026 22:37 Inactive
@vercel vercel Bot temporarily deployed to Preview – app June 4, 2026 22:37 Inactive
@tofikwest tofikwest merged commit 512d819 into main Jun 4, 2026
11 checks passed
@tofikwest tofikwest deleted the tofik/aws-config-recorder-remediation-fix branch June 4, 2026 23:06
@claudfuen

Copy link
Copy Markdown
Contributor

🎉 This PR is included in version 3.70.4 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants