fix(cloud-security): AWS Config new recording model + GCP/Azure remediation robustness#3036
Merged
Merged
Conversation
… + auto-remediation
Customer-reported: AWS auto-remediation for "AWS Config recorder not fully
active" generated empty configs, the Retry button did nothing, and applying a
generated fix failed and fell back to dated manual steps. Root cause: the
entire Config check + auto-fix assumed the legacy `recordingGroup.allSupported`
model, but the customer's recorder uses AWS's current model ("Record all
resource types with customizable overrides" = recordingStrategy /
exclusionByResourceTypes).
- config.adapter: the recorder check now treats
recordingStrategy.useOnly === 'ALL_SUPPORTED_RESOURCE_TYPES' (and legacy
allSupported) as "records all", eliminating false positives on the new
model. Genuine EXCLUSION/INCLUSION recorders stay flagged.
- config.adapter: remediation guidance now produces an AWS-valid call —
read the existing recorder, then PutConfigurationRecorder with a clean
recordingGroup { allSupported: true, includeGlobalResourceTypes: true } and
NO recordingStrategy/exclusionByResourceTypes/resourceTypes (those are
mutually exclusive with allSupported and trigger a ValidationException). This
also records the global IAM resource types the customer was missing.
- aws-command-executor: deterministic guardrail (normalizeConfigRecordingGroup)
collapses any all-supported-intent PutConfigurationRecorder to the single
valid shape right before the SDK call, regardless of what the AI emits.
- remediation.service: never cache an empty / non-auto-fixable plan and drop
the stale entry on execute — this is what made "Retry" a guaranteed no-op
(it reloaded the same dead plan). Retry now regenerates.
- ai-remediation.service: generateFixPlan retries once at a higher temperature
when the first pass yields zero fix steps (temp 0 would reproduce it).
- prompts: discourage S3 ACL steps (cause of empty plans), reinforce the valid
Config recorder call, and base manual steps on the current AWS Console.
- RemediationDialog: disable "Apply Fix" on an empty plan and explain why.
Tests: new config.adapter.spec, recordingGroup-normalizer and retry-on-empty
cases; full cloud-security jest suite green (288 passing).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The Retry-no-op and empty-plan bugs fixed for AWS also existed in the separate GCP and Azure remediation paths (own services + own planCache + own generators). Bring them to parity. GCP (gcp-remediation.service.ts): - isUsablePlan guard: never cache an empty / non-auto-fixable plan; on execute, only reuse a fresh AND usable cached plan and drop the stale entry otherwise (was the Retry no-op — execute reloaded the same dead plan). Azure (azure-remediation.service.ts): - Same isUsablePlan cache guard on preview + execute. - Delete the cached plan in the execute catch block so Retry regenerates. - If the refined plan flips canAutoFix to false, return guided steps instead of a misleading auto-fix preview. GCP + Azure generators (ai-remediation.service.ts): - generateGcpFixPlan / generateAzureFixPlan now retry once at a higher temperature when the first pass yields zero fix steps (temp 0 reproduces it). Tests: GCP/Azure empty-plan retry cases added; full cloud-security jest suite green (292 passing). Typecheck clean for all changed files. Note (flagged, NOT changed here — separate scope / higher risk): - GCP/Azure scanners swallow per-adapter/per-scope errors (return [] on failure), so a real API/permission failure can look like "0 findings". - GCP Cloud SQL databaseFlags is a REPLACE op with no guard that all existing flags are preserved; disabling public IP has no private-IP precondition check. These are real but need their own design + tests before touching. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Contributor
There was a problem hiding this comment.
4 issues found across 12 files
Confidence score: 3/5
- There is a concrete regression risk in
apps/api/src/cloud-security/ai-remediation.prompt.ts: the new service-prefixed command naming can conflict with the command schema and produce invalid AWS SDK command values. - The same prompt file also has inconsistent command formatting (
GetBucketPolicymissingCommand) and conflicting ACL guidance, which can lead to remediation output that fails validation or gives mixed instructions. apps/api/src/cloud-security/ai-remediation.service.tshas retry-handling logic that may discard valid non-auto-fix retry results and still return an unusable emptycanAutoFix=trueplan, which is user-impacting behavior.- Pay close attention to
apps/api/src/cloud-security/ai-remediation.prompt.ts,apps/api/src/cloud-security/ai-remediation.service.ts- command/schema consistency and empty-plan retry selection need verification.
Reply with feedback, questions, or to request a fix.
Fix all with cubic | Re-trigger cubic
…y + retry selection - ai-remediation.prompt.ts: use bare AWS SDK command names (with the service named separately) in the new S3 + Config guidance, matching the OUTPUT RULES schema — the prior "s3:PutPublicAccessBlockCommand" / "config-service:..." shorthand could nudge the model to emit a service-prefixed (invalid) command value. Add the missing "Command" suffix to GetBucketPolicy. Remove PutBucketAcl from the "permissions you need" chain so it no longer contradicts the new "never use ACLs" rule. - ai-remediation.service.ts: the empty-plan retry now prefers a retry that is usable OR correctly canAutoFix=false (routes to guided steps) instead of discarding it and returning the original empty canAutoFix=true plan. Applied to AWS, GCP, and Azure generators. - Test: retry prefers a non-auto-fixable result. cloud-security jest suite green (293 passing); changed files typecheck clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Contributor
|
🎉 This PR is included in version 3.70.4 🎉 The release is available on GitHub release Your semantic-release bot 📦🚀 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Customer-reported (Ian Hunter): AWS auto-remediation for "AWS Config recorder not fully active" generated empty configs, the Retry button did nothing, and applying a generated fix failed and fell back to dated manual steps. Root cause: the Config check + auto-fix only understood AWS's legacy
recordingGroup.allSupportedmodel, but the customer's recorder uses AWS's current model ("Record all resource types with customizable overrides" =recordingStrategy/exclusionByResourceTypes).While fixing AWS, an audit found the same robustness bugs (Retry no-op, empty-plan handling) living in the separate GCP and Azure remediation paths, so this PR brings all three to parity.
What changed
AWS-specific
config.adapter.ts) now treatsrecordingStrategy.useOnly === 'ALL_SUPPORTED_RESOURCE_TYPES'(and legacyallSupported) as "records all" → eliminates false positives on the new model. GenuineEXCLUSION/INCLUSIONrecorders stay flagged.PutConfigurationRecorderwith a cleanrecordingGroup { allSupported: true, includeGlobalResourceTypes: true }and norecordingStrategy/exclusionByResourceTypes/resourceTypes(those are mutually exclusive withallSupportedand trigger aValidationException). Also records the global IAM resource types the customer was missing.aws-command-executor.ts→normalizeConfigRecordingGroup) collapses any all-supported-intentPutConfigurationRecorderto the single valid shape right before the SDK call, regardless of what the AI emits.Cross-provider robustness (AWS + GCP + Azure)
generateFixPlan/generateGcpFixPlan/generateAzureFixPlanretry once at a higher temperature when the first pass yields zero fix steps (temperature 0 just reproduced the same empty result).canAutoFixto false after reading real state, return guided steps instead of a misleading auto-fix preview; delete the cached plan in the execute catch block.Frontend (shared, all providers)
RemediationDialog.tsx: disable Apply Fix on an empty plan and explain why.Prompts
Tests
config.adapter.spec.ts(recording-model awareness + AWS-valid remediation text).normalizeConfigRecordingGroupcases, AWS + GCP + Azure empty-plan retry cases, andisUsablePlancache-guard cases.Deploy / customer impact
apps/api+apps/appdeploy; the corrected check re-evaluates on the next scan.config:*permission grant onCompAI-Remediator(surfaced in-product).Flagged but intentionally NOT fixed here (separate scope / higher risk)
databaseFlagsis a REPLACE op with no guard that existing flags are preserved; disabling public IP has no private-IP precondition check.🤖 Generated with Claude Code
Summary by cubic
Supports AWS Config’s current recording model and fixes auto‑remediation so plans apply cleanly and Retry regenerates instead of doing nothing. Also hardens GCP and Azure remediation to avoid empty plans and no‑op retries, and aligns prompt command outputs with the executor.
recordingStrategy.useOnly === 'ALL_SUPPORTED_RESOURCE_TYPES'(and legacyallSupported) as “records all.” Remediation uses{ recordingGroup: { allSupported: true, includeGlobalResourceTypes: true } }with norecordingStrategy/exclusionByResourceTypes/resourceTypes. The executor normalizes recorder payloads to this valid shape.canAutoFixto false (routes to guided steps).canAutoFixflips to false and deletes failed cached plans so Retry regenerates. The UI disables Apply Fix on an empty plan and explains why.GetBucketPolicyCommand; removePutBucketAclfrom the permissions chain and reinforce “never use ACLs”; reiterate the valid Config recorder call.Written for commit 432fd39. Summary will update on new commits.