alert-mgmt-01: k8s foundation #787
sradco wants to merge 2 commits into openshift:alerts-management-api from
Conversation
Important: Review skipped. Auto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI.
Walkthrough: The PR upgrades core Kubernetes and Prometheus dependencies to v0.34.2+ while introducing a comprehensive Kubernetes client abstraction layer with managers for Prometheus rules, namespace monitoring, alerting health, and alert configuration management.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 8
🧹 Nitpick comments (1)
pkg/k8s/client.go (1)
57-62: `routeClientset` is created but not stored on the `client` struct.
The `routeClientset` is only passed to `newPrometheusAlerts`. If future methods on `client` need route access, it won't be available. This is fine if `prometheusAlerts` is the only consumer, but worth noting for awareness. 🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/k8s/client.go` around lines 57 - 62, The client constructor builds a routeClientset but doesn't store it on the client struct; update the client struct (type client) to include a routeClientset field and assign the created routeClientset to that field in the constructor where c := &client{...} is created, so routeClientset is available for future methods; keep passing routeClientset into newPrometheusAlerts as before but also store it on the client instance (reference symbols: routeClientset, client struct, newPrometheusAlerts, prometheusAlerts).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@go.mod`:
- Around line 12-13: Update the Prometheus endpoint constants in vars.go so they
include the required "/api" prefix: change the PrometheusAlertsPath and
PrometheusRulesPath constants from "/v1/alerts" and "/v1/rules" to
"/api/v1/alerts" and "/api/v1/rules" respectively; locate the symbols
PrometheusAlertsPath and PrometheusRulesPath in vars.go and modify their string
values to the corrected paths.
In `@pkg/k8s/client.go`:
- Around line 36-94: The Client interface methods ConfigMaps()
ConfigMapInterface and AlertingHealth(ctx context.Context) (AlertingHealth,
error) are declared but not implemented on the *client type; add receiver
methods on *client named ConfigMaps and AlertingHealth that return the
appropriate values (for ConfigMaps return the configmap manager/implementation
instance held on the client or create one similarly to namespaceManager; for
AlertingHealth call or create the alert-health checker using existing clients
like monitoringv1clientset or prometheusAlerts and return (AlertingHealth,
error)). Ensure the method signatures exactly match the interface (ConfigMaps()
ConfigMapInterface and AlertingHealth(ctx context.Context) (AlertingHealth,
error)) so the compile-time assertion var _ Client = (*client)(nil) succeeds.
In `@pkg/k8s/namespace.go`:
- Around line 69-75: WaitForNamedCacheSync call's boolean return value is
ignored: after starting nm.informer.Run(ctx.Done()) you must check the result of
cache.WaitForNamedCacheSync("Namespace informer", ctx.Done(),
nm.informer.HasSynced) and if it returns false, return an error instead of
returning nm nil; update the function that constructs/returns nm to propagate a
descriptive error (e.g., "namespace informer failed to sync") when
WaitForNamedCacheSync returns false so the caller knows the informer never
synced.
In `@pkg/k8s/prometheus_rule.go`:
- Around line 21-39: Change newPrometheusRuleManager to return
(*prometheusRuleManager, error), check the boolean result of
cache.WaitForNamedCacheSync and return an error if it returns false (e.g.,
context cancelled or sync failed) instead of discarding it; update the function
signature and its return statements (prometheusRuleManager constructor) and
adjust the caller in pkg/k8s/client.go to handle the returned error (as shown in
other managers like newNamespaceManager/newAlertRelabelConfigManager) so
creation fails fast when the informer never syncs.
- Around line 82-89: The Delete handler in prometheusRuleManager (func Delete)
returns an error that only mentions the resource name; update the error
formatting to include both namespace and name (same pattern used in
Update/AddRule) by changing the fmt.Errorf call to include namespace and name in
the message so logs show "failed to delete PrometheusRule <namespace>/<name>:
%w".
- Around line 45-58: The List method prometheusRuleManager.List currently
ignores the namespace parameter and returns all items from
prm.informer.GetStore().List(); update it to filter the returned PrometheusRule
objects by namespace: iterate prs as before, assert item to
*monitoringv1.PrometheusRule (pr), then skip entries where pr.GetNamespace() !=
namespace when namespace is non-empty (and if you want to preserve previous
behavior, treat an empty namespace as "no filtering" to return all). Ensure the
function still returns a slice of monitoringv1.PrometheusRule built from the
filtered items and returns nil error.
In `@pkg/k8s/types.go`:
- Around line 157-159: The IsClusterMonitoringNamespace method currently returns
only bool which conflates “not cluster-monitoring” with lookup/read errors;
update the NamespaceInterface by changing IsClusterMonitoringNamespace(name
string) bool to IsClusterMonitoringNamespace(name string) (bool, error) and then
update all implementations of that method (and callers) to return (true, nil) or
(false, nil) for a definitive negative result and to return (false, err) when
namespace retrieval/label read fails so errors are propagated rather than
silently treated as false.
- Around line 147-150: RelabeledRulesInterface currently has List(ctx
context.Context) []monitoringv1.Rule and Get(ctx context.Context, id string)
(monitoringv1.Rule, bool) which cannot surface backend or context errors; change
the contract so List returns ([]monitoringv1.Rule, error) and Get returns
(monitoringv1.Rule, bool, error) (or alternatively (*monitoringv1.Rule, error)
if preferred), then update all implementations of RelabeledRulesInterface and
their callers to propagate and handle the error return (including
context/cancellation and backend failure cases) so failures are not silently
swallowed; update unit tests and any call sites that assume the old signatures
accordingly.
---
Nitpick comments:
In `@pkg/k8s/client.go`:
- Around line 57-62: The client constructor builds a routeClientset but doesn't
store it on the client struct; update the client struct (type client) to include
a routeClientset field and assign the created routeClientset to that field in
the constructor where c := &client{...} is created, so routeClientset is
available for future methods; keep passing routeClientset into
newPrometheusAlerts as before but also store it on the client instance
(reference symbols: routeClientset, client struct, newPrometheusAlerts,
prometheusAlerts).
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to data retention organization setting
⛔ Files ignored due to path filters (1)
go.sum is excluded by `!**/*.sum`
📒 Files selected for processing (9)
go.mod, pkg/k8s/auth_context.go, pkg/k8s/client.go, pkg/k8s/client_factory.go, pkg/k8s/namespace.go, pkg/k8s/prometheus_rule.go, pkg/k8s/prometheus_rules_types.go, pkg/k8s/types.go, pkg/k8s/vars.go
```
github.com/openshift/client-go v0.0.0-20251123231646-4685125c2287
github.com/openshift/library-go v0.0.0-20240905123346-5bdbfe35a6f5
```
🧩 Analysis chain
🌐 Web query:
prometheus/prometheus v0.308.0 release validity
💡 Result:
Yes—prometheus/prometheus v0.308.0 is a real, official GitHub release tag.
- GitHub shows a release page for v0.308.0, pointing to commit e44ed35, and indicates it was tagged on Dec 2, 2025 (tagged by jan--f). [1]
- Independent release indexing also lists v0.308.0 as released on Dec 2, 2025. [2]
Conclusion: v0.308.0 is a valid Prometheus repository release/tag (not a fabricated version). [1]
Sources: [1] GitHub release tag page, [2] ReleaseAlert release index.
🏁 Script executed:
```shell
# Find vars.go file in the repository
fd vars.go
```
Repository: openshift/monitoring-plugin
Length of output: 86
🏁 Script executed:
```shell
# Check the go.mod file to see if prometheus/prometheus dependency is listed
cat go.mod
```
Repository: openshift/monitoring-plugin
Length of output: 4293
🏁 Script executed:
cat -n pkg/k8s/vars.goRepository: openshift/monitoring-plugin
Length of output: 1777
🌐 Web query:
prometheus v0.308.0 API endpoints /alerts /rules
💡 Result:
In Prometheus, the HTTP API endpoints for alerts and rules are:
- Active alerts: `GET /api/v1/alerts` (returns all currently active alerts) [1]
- Loaded rules: `GET /api/v1/rules` (returns currently loaded recording + alerting rules, and includes active alerts per alerting rule) [1]
`/api/v1/rules` optional query params include: `type=alert|record`, `rule_name[]=...`, `rule_group[]=...`, `file[]=...`, plus pagination via `group_limit` + `group_next_token` [1].
Note: These are Prometheus server endpoints. If you meant Alertmanager's alerts API (to send alerts), that's `POST /api/v2/alerts` on Alertmanager, not Prometheus [2].
Sources: [1] [2]
🌐 Web query:
prometheus http API paths /v1/alerts /v1/rules documentation
💡 Result:
In Prometheus (server), the stable HTTP API is rooted at `/api/v1` (not `/v1`). (prometheus.io)
Active alerts (Prometheus)
- Path: `GET /api/v1/alerts`
- Docs: Prometheus "HTTP API" → "Alerts" (prometheus.io)
Loaded rules (Prometheus)
- Path: `GET /api/v1/rules`
- Optional filters include `type=alert|record`, `rule_name[]`, `rule_group[]`, `file[]`, plus pagination via `group_next_token` (with `group_limit`). (prometheus.io)
Common gotcha: Alertmanager `/v1/alerts`
If you meant Alertmanager's alerts ingestion endpoint: Alertmanager API v1 was deprecated (0.16.0) and removed (0.27.0); use `POST /api/v2/alerts` instead. (prometheus.io)
Citations:
- 1: https://prometheus.io/docs/prometheus/latest/querying/api/
- 2: https://prometheus.io/docs/prometheus/latest/querying/api/
- 3: https://prometheus.io/docs/prometheus/latest/querying/api/?utm_source=openai
- 4: https://prometheus.io/docs/alerting/latest/alerts_api/?utm_source=openai
Fix API paths: add missing /api prefix to Prometheus endpoints.
The paths defined in vars.go lines 12–13 are incorrect. The Prometheus HTTP API uses /api/v1/alerts and /api/v1/rules, not /v1/alerts and /v1/rules. This applies to prometheus/prometheus v0.308.0 and all other versions—the /api prefix is required.
Current (incorrect) paths:
```go
PrometheusAlertsPath = "/v1/alerts"
PrometheusRulesPath  = "/v1/rules"
```
Update both constants to include the /api prefix to match the official Prometheus API specification.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@go.mod` around lines 12 - 13, Update the Prometheus endpoint constants in
vars.go so they include the required "/api" prefix: change the
PrometheusAlertsPath and PrometheusRulesPath constants from "/v1/alerts" and
"/v1/rules" to "/api/v1/alerts" and "/api/v1/rules" respectively; locate the
symbols PrometheusAlertsPath and PrometheusRulesPath in vars.go and modify their
string values to the corrected paths.
```go
func newClient(ctx context.Context, config *rest.Config) (Client, error) {
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		return nil, fmt.Errorf("failed to create clientset: %w", err)
	}

	monitoringv1clientset, err := monitoringv1client.NewForConfig(config)
	if err != nil {
		return nil, fmt.Errorf("failed to create monitoringv1 clientset: %w", err)
	}

	osmv1clientset, err := osmv1client.NewForConfig(config)
	if err != nil {
		return nil, fmt.Errorf("failed to create osmv1 clientset: %w", err)
	}

	routeClientset, err := routeclient.NewForConfig(config)
	if err != nil {
		return nil, fmt.Errorf("failed to create route clientset: %w", err)
	}

	c := &client{
		clientset:             clientset,
		monitoringv1clientset: monitoringv1clientset,
		osmv1clientset:        osmv1clientset,
		config:                config,
	}

	c.prometheusRuleManager = newPrometheusRuleManager(ctx, monitoringv1clientset)

	c.prometheusAlerts = newPrometheusAlerts(routeClientset, clientset.CoreV1(), config, c.prometheusRuleManager)

	c.alertRelabelConfigManager, err = newAlertRelabelConfigManager(ctx, osmv1clientset)
	if err != nil {
		return nil, fmt.Errorf("failed to create alert relabel config manager: %w", err)
	}

	c.alertingRuleManager, err = newAlertingRuleManager(ctx, osmv1clientset)
	if err != nil {
		return nil, fmt.Errorf("failed to create alerting rule manager: %w", err)
	}

	c.namespaceManager, err = newNamespaceManager(ctx, clientset)
	if err != nil {
		return nil, fmt.Errorf("failed to create namespace manager: %w", err)
	}

	c.clusterMonitoringConfig, err = newClusterMonitoringConfigManager(ctx, clientset)
	if err != nil {
		return nil, fmt.Errorf("failed to create cluster monitoring config manager: %w", err)
	}

	c.relabeledRulesManager, err = newRelabeledRulesManager(ctx, c.namespaceManager, c.alertRelabelConfigManager, monitoringv1clientset, clientset)
	if err != nil {
		return nil, fmt.Errorf("failed to create relabeled rules config manager: %w", err)
	}

	return c, nil
}
```
🧩 Analysis chain
🏁 Script executed:
```shell
#!/bin/bash
# Search for ConfigMaps and AlertingHealth method implementations in the k8s package
echo "=== Looking for ConfigMaps method on *client ==="
rg -n 'func \(c \*client\) ConfigMaps' pkg/k8s/ --type=go
echo ""
echo "=== Looking for AlertingHealth method on *client ==="
rg -n 'func \(c \*client\) AlertingHealth' pkg/k8s/ --type=go
echo ""
echo "=== Checking all go files in pkg/k8s/ ==="
ls -la pkg/k8s/*.go
```
Repository: openshift/monitoring-plugin
Length of output: 791
🏁 Script executed:
```shell
#!/bin/bash
# Check the Client interface definition in types.go
echo "=== Client interface in types.go ==="
cat -n pkg/k8s/types.go | head -100
echo ""
echo "=== Check for compile-time interface assertion in client.go ==="
head -30 pkg/k8s/client.go
```
Repository: openshift/monitoring-plugin
Length of output: 5439
🏁 Script executed:
```shell
#!/bin/bash
# Search across entire codebase for these method implementations
echo "=== Search entire repo for ConfigMaps method ==="
rg -n 'func.*ConfigMaps' --type=go
echo ""
echo "=== Search entire repo for AlertingHealth method ==="
rg -n 'func.*AlertingHealth' --type=go
echo ""
echo "=== Check if client struct is actually used/tested ==="
rg -n 'newClient\|Client\(' pkg/ --type=go | head -20
```
Repository: openshift/monitoring-plugin
Length of output: 229
Implement missing interface methods ConfigMaps() and AlertingHealth().
The Client interface declares `ConfigMaps() ConfigMapInterface` and `AlertingHealth(ctx context.Context) (AlertingHealth, error)` (types.go lines 47, 26), but neither is implemented on the `*client` type. The compile-time check on line 18 (`var _ Client = (*client)(nil)`) will fail at build time.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/k8s/client.go` around lines 36 - 94, The Client interface methods
ConfigMaps() ConfigMapInterface and AlertingHealth(ctx context.Context)
(AlertingHealth, error) are declared but not implemented on the *client type;
add receiver methods on *client named ConfigMaps and AlertingHealth that return
the appropriate values (for ConfigMaps return the configmap
manager/implementation instance held on the client or create one similarly to
namespaceManager; for AlertingHealth call or create the alert-health checker
using existing clients like monitoringv1clientset or prometheusAlerts and return
(AlertingHealth, error)). Ensure the method signatures exactly match the
interface (ConfigMaps() ConfigMapInterface and AlertingHealth(ctx
context.Context) (AlertingHealth, error)) so the compile-time assertion var _
Client = (*client)(nil) succeeds.
```go
go nm.informer.Run(ctx.Done())

cache.WaitForNamedCacheSync("Namespace informer", ctx.Done(),
	nm.informer.HasSynced,
)

return nm, nil
```
WaitForNamedCacheSync return value is discarded.
Same issue as in prometheus_rule.go: if the context is cancelled before the cache syncs, WaitForNamedCacheSync returns false but the manager is returned successfully. Since this function already has an error return, it should check the result.
🐛 Proposed fix
```diff
 go nm.informer.Run(ctx.Done())
-cache.WaitForNamedCacheSync("Namespace informer", ctx.Done(),
+if !cache.WaitForNamedCacheSync("Namespace informer", ctx.Done(),
 	nm.informer.HasSynced,
-)
+) {
+	return nil, fmt.Errorf("failed to sync Namespace informer cache")
+}
 return nm, nil
```
📝 Committable suggestion
‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
go nm.informer.Run(ctx.Done())
if !cache.WaitForNamedCacheSync("Namespace informer", ctx.Done(),
	nm.informer.HasSynced,
) {
	return nil, fmt.Errorf("failed to sync Namespace informer cache")
}
return nm, nil
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/k8s/namespace.go` around lines 69 - 75, WaitForNamedCacheSync call's
boolean return value is ignored: after starting nm.informer.Run(ctx.Done()) you
must check the result of cache.WaitForNamedCacheSync("Namespace informer",
ctx.Done(), nm.informer.HasSynced) and if it returns false, return an error
instead of returning nm nil; update the function that constructs/returns nm to
propagate a descriptive error (e.g., "namespace informer failed to sync") when
WaitForNamedCacheSync returns false so the caller knows the informer never
synced.
```go
func newPrometheusRuleManager(ctx context.Context, clientset *monitoringv1client.Clientset) *prometheusRuleManager {
	informer := cache.NewSharedIndexInformer(
		prometheusRuleListWatchForAllNamespaces(clientset),
		&monitoringv1.PrometheusRule{},
		0,
		cache.Indexers{},
	)

	go informer.Run(ctx.Done())

	cache.WaitForNamedCacheSync("PrometheusRule informer", ctx.Done(),
		informer.HasSynced,
	)

	return &prometheusRuleManager{
		clientset: clientset,
		informer:  informer,
	}
}
```
WaitForNamedCacheSync return value is discarded; no error returned to caller.
WaitForNamedCacheSync returns false if synchronization fails (e.g., context cancelled), but the result is ignored and newPrometheusRuleManager does not return an error. Other managers in this PR (newNamespaceManager, newAlertRelabelConfigManager) return (*manager, error), but this one only returns *prometheusRuleManager. If the cache never syncs, operations will silently work with stale or empty data.
🐛 Proposed fix: return an error on sync failure
```diff
-func newPrometheusRuleManager(ctx context.Context, clientset *monitoringv1client.Clientset) *prometheusRuleManager {
+func newPrometheusRuleManager(ctx context.Context, clientset *monitoringv1client.Clientset) (*prometheusRuleManager, error) {
 	informer := cache.NewSharedIndexInformer(
 		prometheusRuleListWatchForAllNamespaces(clientset),
 		&monitoringv1.PrometheusRule{},
 		0,
 		cache.Indexers{},
 	)
 	go informer.Run(ctx.Done())
-	cache.WaitForNamedCacheSync("PrometheusRule informer", ctx.Done(),
+	if !cache.WaitForNamedCacheSync("PrometheusRule informer", ctx.Done(),
 		informer.HasSynced,
-	)
+	) {
+		return nil, fmt.Errorf("failed to sync PrometheusRule informer cache")
+	}
 	return &prometheusRuleManager{
 		clientset: clientset,
 		informer:  informer,
-	}
+	}, nil
 }
```
Note: This change requires updating the call site in pkg/k8s/client.go (Line 64) to handle the error:
```go
c.prometheusRuleManager, err = newPrometheusRuleManager(ctx, monitoringv1clientset)
if err != nil {
	return nil, fmt.Errorf("failed to create prometheus rule manager: %w", err)
}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/k8s/prometheus_rule.go` around lines 21 - 39, Change
newPrometheusRuleManager to return (*prometheusRuleManager, error), check the
boolean result of cache.WaitForNamedCacheSync and return an error if it returns
false (e.g., context cancelled or sync failed) instead of discarding it; update
the function signature and its return statements (prometheusRuleManager
constructor) and adjust the caller in pkg/k8s/client.go to handle the returned
error (as shown in other managers like
newNamespaceManager/newAlertRelabelConfigManager) so creation fails fast when
the informer never syncs.
```go
func (prm *prometheusRuleManager) List(ctx context.Context, namespace string) ([]monitoringv1.PrometheusRule, error) {
	prs := prm.informer.GetStore().List()

	prometheusRules := make([]monitoringv1.PrometheusRule, 0, len(prs))
	for _, item := range prs {
		pr, ok := item.(*monitoringv1.PrometheusRule)
		if !ok {
			continue
		}
		prometheusRules = append(prometheusRules, *pr)
	}

	return prometheusRules, nil
}
```
Bug: List ignores the namespace parameter.
The namespace parameter is accepted but never used. The method returns all PrometheusRule resources from the informer store regardless of namespace. This silently returns a superset of results when callers expect namespace-scoped results.
🐛 Proposed fix to filter by namespace
```diff
 func (prm *prometheusRuleManager) List(ctx context.Context, namespace string) ([]monitoringv1.PrometheusRule, error) {
 	prs := prm.informer.GetStore().List()
 	prometheusRules := make([]monitoringv1.PrometheusRule, 0, len(prs))
 	for _, item := range prs {
 		pr, ok := item.(*monitoringv1.PrometheusRule)
 		if !ok {
 			continue
 		}
+		if namespace != "" && pr.Namespace != namespace {
+			continue
+		}
 		prometheusRules = append(prometheusRules, *pr)
 	}
 	return prometheusRules, nil
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/k8s/prometheus_rule.go` around lines 45 - 58, The List method
prometheusRuleManager.List currently ignores the namespace parameter and returns
all items from prm.informer.GetStore().List(); update it to filter the returned
PrometheusRule objects by namespace: iterate prs as before, assert item to
*monitoringv1.PrometheusRule (pr), then skip entries where pr.GetNamespace() !=
namespace when namespace is non-empty (and if you want to preserve previous
behavior, treat an empty namespace as "no filtering" to return all). Ensure the
function still returns a slice of monitoringv1.PrometheusRule built from the
filtered items and returns nil error.
```go
func (prm *prometheusRuleManager) Delete(ctx context.Context, namespace string, name string) error {
	err := prm.clientset.MonitoringV1().PrometheusRules(namespace).Delete(ctx, name, metav1.DeleteOptions{})
	if err != nil {
		return fmt.Errorf("failed to delete PrometheusRule %s: %w", name, err)
	}

	return nil
}
```
Delete error message is missing the namespace.
Other error messages in this file (e.g., Update, AddRule) include both namespace and name, but Delete only includes the name. This makes debugging harder.
Proposed fix
```diff
-	return fmt.Errorf("failed to delete PrometheusRule %s: %w", name, err)
+	return fmt.Errorf("failed to delete PrometheusRule %s/%s: %w", namespace, name, err)
```
📝 Committable suggestion
```go
func (prm *prometheusRuleManager) Delete(ctx context.Context, namespace string, name string) error {
	err := prm.clientset.MonitoringV1().PrometheusRules(namespace).Delete(ctx, name, metav1.DeleteOptions{})
	if err != nil {
		return fmt.Errorf("failed to delete PrometheusRule %s/%s: %w", namespace, name, err)
	}
	return nil
}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/k8s/prometheus_rule.go` around lines 82 - 89, The Delete handler in
prometheusRuleManager (func Delete) returns an error that only mentions the
resource name; update the error formatting to include both namespace and name
(same pattern used in Update/AddRule) by changing the fmt.Errorf call to include
namespace and name in the message so logs show "failed to delete PrometheusRule
<namespace>/<name>: %w".
pkg/k8s/types.go (outdated)
```go
	List(ctx context.Context) []monitoringv1.Rule

	// Get retrieves the relabeled rule for a given id
	Get(ctx context.Context, id string) (monitoringv1.Rule, bool)
```
Expose failures in RelabeledRulesInterface reads.
At Line [147] and Line [150], the API can’t report failures even though calls are context-aware. This forces implementations to swallow backend/config errors and can present incorrect empty data.
💡 Suggested contract update
```diff
 type RelabeledRulesInterface interface {
 	// List retrieves the relabeled rules for a given PrometheusRule
-	List(ctx context.Context) []monitoringv1.Rule
+	List(ctx context.Context) ([]monitoringv1.Rule, error)
 	// Get retrieves the relabeled rule for a given id
-	Get(ctx context.Context, id string) (monitoringv1.Rule, bool)
+	Get(ctx context.Context, id string) (monitoringv1.Rule, bool, error)
 	// Config returns the list of alert relabel configs
 	Config() []*relabel.Config
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/k8s/types.go` around lines 147 - 150, RelabeledRulesInterface currently
has List(ctx context.Context) []monitoringv1.Rule and Get(ctx context.Context,
id string) (monitoringv1.Rule, bool) which cannot surface backend or context
errors; change the contract so List returns ([]monitoringv1.Rule, error) and Get
returns (monitoringv1.Rule, bool, error) (or alternatively (*monitoringv1.Rule,
error) if preferred), then update all implementations of RelabeledRulesInterface
and their callers to propagate and handle the error return (including
context/cancellation and backend failure cases) so failures are not silently
swallowed; update unit tests and any call sites that assume the old signatures
accordingly.
```go
type NamespaceInterface interface {
	// IsClusterMonitoringNamespace checks if a namespace has the openshift.io/cluster-monitoring=true label
	IsClusterMonitoringNamespace(name string) bool
```
Don’t conflate namespace lookup failure with a negative classification.
At Line [159], bool alone cannot distinguish “not cluster-monitoring” from “failed to read namespace labels.” This can silently downgrade real errors into false.
💡 Suggested contract update
```diff
 type NamespaceInterface interface {
 	// IsClusterMonitoringNamespace checks if a namespace has the openshift.io/cluster-monitoring=true label
-	IsClusterMonitoringNamespace(name string) bool
+	IsClusterMonitoringNamespace(ctx context.Context, name string) (bool, error)
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/k8s/types.go` around lines 157 - 159, The IsClusterMonitoringNamespace
method currently returns only bool which conflates “not cluster-monitoring” with
lookup/read errors; update the NamespaceInterface by changing
IsClusterMonitoringNamespace(name string) bool to
IsClusterMonitoringNamespace(name string) (bool, error) and then update all
implementations of that method (and callers) to return (true, nil) or (false,
nil) for a definitive negative result and to return (false, err) when namespace
retrieval/label read fails so errors are propagated rather than silently treated
as false.
@sradco: This pull request references mgmt-01 which is a valid jira issue.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
pkg/k8s/const.go (outdated)
```go
PlatformRouteName             = "prometheus-k8s"
PlatformAlertmanagerRouteName = "alertmanager-main"
UserWorkloadRouteNamespace    = "openshift-user-workload-monitoring"
UserWorkloadRouteName         = "prometheus-user-workload"
```
there's no prometheus-user-workload route in OCP?
The prometheus-user-workload route doesn't exist in standard OCP, AFAIU.
The code probes it optimistically and falls back to direct service access (prometheus-user-workload-web:9090) when the route is not found.
The same pattern applies also to alertmanager-user-workload.
This keeps the code forward-compatible if the route is introduced in future OCP versions or custom installs.
Do you think we should only use the direct service access?
```diff
@@ -0,0 +1,34 @@
+package k8s
+
+const (
```
not sure how all these constants will be used.
```go
// GetRulesRequest holds parameters for filtering rules alerts.
type GetRulesRequest struct {
	// Labels filters alerts by labels
	Labels map[string]string
```
why not supporting label matchers like upstream Prometheus API?
3f23644 to ef9bb18 (Compare)
@sradco: This pull request references mgmt-01 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
ef9bb18 to 0ed1db6 (Compare)
@sradco: The following tests failed, say
Full PR test history. Your PR dashboard.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: sradco. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files. Approvers can indicate their approval by writing
2bf65d3 to e38bbf8 (Compare)
e38bbf8 to fb22c01 (Compare)
…metheusRule parsing
Add the foundational k8s layer for alert management:
- Client factory for dynamic and typed Kubernetes clients
- Namespace resolution helpers
- Auth context extraction from HTTP requests
- Core types (AlertingRuleSource, sortable slices)
- Shared vars (label names, configmap keys)
- PrometheusRule types and rule parsing/filtering helpers
Signed-off-by: avlitman <alitman@redhat.com>
Signed-off-by: Shirly Radco <sradco@redhat.com>
Signed-off-by: machadovilaca <machadovilaca@gmail.com>
Co-authored-by: Cursor AI Agent <cursor-ai@users.noreply.github.com>
Made-with: Cursor
fb22c01 to 10099ad (Compare)
@jgbernalp, @jan--f, @avlitman, @machacekondra: Please review this PR.
3c4424e to 68beae9 (Compare)
a82ec2e to 0480720 (Compare)
…nt API
Signed-off-by: Shirly Radco <sradco@redhat.com>
Signed-off-by: João Vilaça <jvilaca@redhat.com>
Signed-off-by: Aviv Litman <alitman@redhat.com>
Co-authored-by: AI Assistant <noreply@cursor.com>
0480720 to 1cbfb7d (Compare)
Alert Management API — Part 1/8: k8s foundation
Summary
- `GET /api/v1/alerting/health` stub (returns `501 Not Implemented`) to make the intended API shape/call-path reviewable early
Dependencies
This PR is part of a stacked series. Please review in order.
Summary by CodeRabbit
Release Notes
Chores
New Features