RHINENG-26546: add job to backfill workspace data by TenSt · Pull Request #2221 · RedHatInsights/patchman-engine

TenSt · 2026-06-01T16:09:49Z

This PR:

Adds a batched workspace_backfill job that populates workspace_id and workspace_name from the workspaces JSON field
Adds job tests and Prometheus metrics
Adds a Grafana panel
Adds a suspended CronJob in deploy/clowdapp.yaml (every 10 min, 50k rows/run in prod)
Runs the job as the admin DB user (needed for session_replication_role and the updates)
Adds a local Docker Compose stack and an e2e script
Adds a slim test_generate_system_inventory.sql for local load testing
Adds dev/workspace_backfill.md with docs and local testing instructions

sourcery-ai · 2026-06-01T16:10:09Z

Reviewer's Guide

Introduces a new batched workspace backfill job that runs as the admin DB user and populates the denormalized workspace_id and workspace_name columns from the workspaces JSON field. The job is wired into the job runner and into the ClowdApp as a suspended CronJob with tunable limits. It exposes Prometheus metrics and a Grafana panel for monitoring, and ships with local Docker/e2e tooling plus SQL helpers for realistic load generation and verification.

File-Level Changes

| Change | Details | Files |
| --- | --- | --- |
| Add a batched workspace backfill job that updates system_inventory workspace columns from the workspaces JSON field with safety limits and metrics. | Introduce tasks/workspace_backfill package implementing account-partitioned batching with per-run row limits, per-batch size, and optional sleep between batches. Run the backfill using the admin DB configuration so it can set session_replication_role to replica and update partitioned tables in a transaction-safe way. Query pending accounts and rows via read-replica transactions, and track pending/invalid row stats with structured logging before each run. Expose Prometheus counters for rows updated, batches completed, and batch errors, and push them via a dedicated pushgateway job name. Add unit test that seeds systems with workspaces JSON, clears denormalized columns under replica role, runs backfillBatch, and asserts workspace columns are set while last_updated is preserved. Wire the new job into main.runJob via the workspace_backfill case so it can be invoked through the existing job entrypoint. | `tasks/workspace_backfill/workspace_backfill.go` `tasks/workspace_backfill/metrics.go` `tasks/workspace_backfill/workspace_backfill_test.go` `main.go` |
| Expose configuration and deployment wiring for the workspace backfill job, including CronJob, POD_CONFIG, and tunable batch parameters. | Add WorkspaceBackfillMaxRowsPerRun, WorkspaceBackfillBatchSize, and WorkspaceBackfillBatchSleepMs to the tasks config, sourced from PodConfig with sensible defaults. Add a workspace-backfill CronJob object in the ClowdApp manifest that runs the job command, uses the admin DB check initContainer, and passes PROMETHEUS_PUSHGATEWAY and WORKSPACE_BACKFILL_CONFIG env. Introduce new ClowdApp parameters for schedule, suspend flag, and default WORKSPACE_BACKFILL_CONFIG string tuned for production (50k rows/run, 1k batch size). | `tasks/config.go` `deploy/clowdapp.yaml` |
| Add observability via Grafana and Prometheus for workspace backfill progress and errors. | Register three Prometheus counters (rows_updated, batches, batch_errors) under the patchman_engine_workspace_backfill namespace and expose them via a pushgateway pusher. Add a Grafana timeseries panel that graphs increases of rows updated, batches, and batch errors over the standard $interval for the workspace_backfill job. | `tasks/workspace_backfill/metrics.go` `dashboards/app-sre/grafana-dashboard-insights-patchman-engine-general.configmap.yaml` |
| Provide local Docker, SQL generators, and scripts to run and validate workspace backfill end-to-end. | Add a dedicated docker-compose.workspace-backfill.yml stack with a DB container and a workspace-backfill runner that executes a new e2e script. Introduce workspace_backfill.env to configure local POD_CONFIG (batch size, max rows per run, sleep) separate from other jobs. Create dev/test_generate_system_inventory.sql to quickly generate rh_account, system_inventory, and system_patch data including realistic workspaces JSON and timestamps suited for backfill testing. Add dev/prepare_workspace_backfill_test.sql to clear workspace_id and workspace_name under replica role without firing triggers, and dev/verify_workspace_backfill.sql to report pending and mismatched rows. Implement scripts/workspace_backfill_e2e.sh which waits for DB, runs migrations, loads test data, clears workspace columns, runs the job once, and validates via verify_workspace_backfill.sql. Document the local workflow, configuration, and production notes in dev/workspace_backfill.md, including manual batched runs and one-shot e2e usage. | `docker-compose.workspace-backfill.yml` `conf/workspace_backfill.env` `dev/test_generate_system_inventory.sql` `dev/prepare_workspace_backfill_test.sql` `dev/verify_workspace_backfill.sql` `scripts/workspace_backfill_e2e.sh` `dev/workspace_backfill.md` `dev/test_generate_data.sql` |
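
The three tunables named in the table (max rows per run, batch size, sleep between batches) could be read from a `key=value;key=value` string roughly as sketched below. This is an illustration only: the key names, the string format, and the zero sleep default are assumptions, and the real job reads its values through the project's PodConfig helper. Only the 50k rows/run and 1k batch size figures come from the table.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// BackfillConfig holds the three tunables named in the table above.
type BackfillConfig struct {
	MaxRowsPerRun int
	BatchSize     int
	BatchSleepMs  int
}

// parseBackfillConfig reads "key=value;key=value" pairs (an assumed format)
// and keeps the given defaults for missing or malformed values.
func parseBackfillConfig(raw string, def BackfillConfig) BackfillConfig {
	cfg := def
	for _, pair := range strings.Split(raw, ";") {
		key, val, ok := strings.Cut(strings.TrimSpace(pair), "=")
		if !ok {
			continue // not a key=value pair
		}
		n, err := strconv.Atoi(strings.TrimSpace(val))
		if err != nil || n < 0 {
			continue // keep the default on bad input
		}
		switch strings.TrimSpace(key) {
		case "max_rows_per_run": // hypothetical key name
			cfg.MaxRowsPerRun = n
		case "batch_size": // hypothetical key name
			cfg.BatchSize = n
		case "batch_sleep_ms": // hypothetical key name
			cfg.BatchSleepMs = n
		}
	}
	return cfg
}

func main() {
	// Production-like defaults: 50k rows per run, 1k rows per batch.
	def := BackfillConfig{MaxRowsPerRun: 50000, BatchSize: 1000, BatchSleepMs: 0}
	cfg := parseBackfillConfig("batch_size=500;batch_sleep_ms=100", def)
	fmt.Printf("%+v\n", cfg)
}
```

Falling back to defaults on malformed input keeps a typo in the config string from turning into an unbounded run.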


codecov-commenter · 2026-06-01T21:24:59Z

Codecov Report

❌ Patch coverage is 6.73077% with 97 lines in your changes missing coverage. Please review.
✅ Project coverage is 58.46%. Comparing base (52d4524) to head (b95d69b).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| tasks/workspace_backfill/workspace_backfill.go | 7.14% | 89 Missing and 2 partials ⚠️ |
| tasks/workspace_backfill/metrics.go | 0.00% | 4 Missing ⚠️ |
| main.go | 0.00% | 2 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #2221      +/-   ##
==========================================
- Coverage   59.07%   58.46%   -0.61%     
==========================================
  Files         137      139       +2     
  Lines        8821     8925     +104     
==========================================
+ Hits         5211     5218       +7     
- Misses       3064     3159      +95     
- Partials      546      548       +2

| Flag | Coverage Δ | |
| --- | --- | --- |
| unittests | `58.46% <6.73%> (-0.61%)` | ⬇️ |
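
The percentages in the report follow from the hit and line counts in the coverage diff. A small sketch that recomputes them is shown here; the helper name `pct` is made up for illustration, the inputs are the numbers reported above, and Codecov's exact rounding rules are not confirmed by this page.

```go
package main

import "fmt"

// pct returns hits/lines as a percentage.
func pct(hits, lines int) float64 {
	return 100 * float64(hits) / float64(lines)
}

func main() {
	// Base and head project coverage: Hits / Lines from the diff table.
	fmt.Printf("base:  %.2f%%\n", pct(5211, 8821))
	fmt.Printf("head:  %.2f%%\n", pct(5218, 8925))
	// Patch coverage: new hits over new lines (+7 hits, +104 lines).
	fmt.Printf("patch: %.5f%%\n", pct(5218-5211, 8925-8821))
}
```

The 97 uncovered patch lines in the headline are the 95 new misses plus the 2 new partials.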


sourcery-ai

Hey - I've found 1 issue, and left some high level feedback:

The workspace eligibility predicates are duplicated across backfillUpdateSQL, pendingAccountsSQL, pendingRowsSQL, invalidPendingRowsSQL, and the SQL in verify_workspace_backfill.sql; consider centralizing this condition (e.g., in a view or a single reusable SQL snippet) so future changes to the rules don’t get out of sync between the job and verification scripts.
loadPendingAccounts scans rh_account_id into a []int, which can be narrower than the DB type; if rh_account_id is bigint in the schema, it would be safer to use []int64 (and adjust the function signatures) to avoid potential truncation issues.
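
To illustrate the second point: Go's `int` is 32 bits wide on 32-bit platforms and 64 bits wide on 64-bit ones, so whether a `bigint` value fits depends on the build target. The sketch below uses an explicit `int32` conversion to show what truncation looks like. The sample ID is made up, and the actual column type of `rh_account_id` is not stated in this thread.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// fitsInt32 reports whether v survives a round trip through int32.
func fitsInt32(v int64) bool {
	return v >= math.MinInt32 && v <= math.MaxInt32
}

func main() {
	fmt.Println("width of int on this platform:", strconv.IntSize)

	// Hypothetical account ID just past the int32 range.
	id := int64(math.MaxInt32) + 1
	fmt.Println("fits in int32:", fitsInt32(id))
	// A narrowing conversion wraps around silently instead of failing.
	fmt.Println("after conversion:", int32(id))
}
```

Scanning into `[]int64` avoids the question entirely, at the cost of adjusting the function signatures that pass account IDs around.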

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- The workspace eligibility predicates are duplicated across `backfillUpdateSQL`, `pendingAccountsSQL`, `pendingRowsSQL`, `invalidPendingRowsSQL`, and the SQL in `verify_workspace_backfill.sql`; consider centralizing this condition (e.g., in a view or a single reusable SQL snippet) so future changes to the rules don’t get out of sync between the job and verification scripts.
- `loadPendingAccounts` scans `rh_account_id` into a `[]int`, which can be narrower than the DB type; if `rh_account_id` is `bigint` in the schema, it would be safer to use `[]int64` (and adjust the function signatures) to avoid potential truncation issues.

## Individual Comments

### Comment 1
<location path="tasks/workspace_backfill/workspace_backfill.go" line_range="93" />
<code_context>
+	}
+}
+
+func runWorkspaceBackfill() (nUpdated int64, complete bool, err error) {
+	if err := logPendingStats(); err != nil {
+		return 0, false, err
</code_context>
<issue_to_address>
**issue (complexity):** Consider refactoring the backfill loop into a helper and centralizing the shared SQL predicate to simplify control flow and make predicate changes safer.

You can reduce complexity in two focused spots without changing behavior: the per-account loop and the SQL predicates.

---

### 1. Simplify `runWorkspaceBackfill` loop control

Right now `runWorkspaceBackfill` has:

- nested `for` loops
- two `total >= maxRows` checks
- `break` vs `return` scattered in the inner loop

You can make the control flow easier to follow by:

1. Extracting per-account processing into a helper.
2. Making the inner loop condition explicit.
3. Having a single place that decides whether the global limit was hit.

For example:

```go
func runWorkspaceBackfill() (nUpdated int64, complete bool, err error) {
	if err := logPendingStats(); err != nil {
		return 0, false, err
	}

	accounts, err := loadPendingAccounts()
	if err != nil {
		return 0, false, err
	}
	if len(accounts) == 0 {
		return 0, true, nil
	}

	utils.LogInfo("accounts", len(accounts), "Starting workspace backfill")

	maxRows := int64(tasks.WorkspaceBackfillMaxRowsPerRun)
	var total int64

	for i, rhAccountID := range accounts {
		rows, hitLimit, err := processAccountBatches(i, rhAccountID, maxRows, total)
		total += rows
		if err != nil {
			// keep existing behavior: on batch error we just skip that account
			continue
		}
		if hitLimit {
			return total, false, nil
		}
	}

	pending, err := countPending()
	if err != nil {
		return total, false, err
	}
	return total, pending == 0, nil
}

func processAccountBatches(idx, rhAccountID int, maxRows, totalSoFar int64) (rowsUpdated int64, hitLimit bool, err error) {
	for totalSoFar+rowsUpdated < maxRows {
		remaining := maxRows - (totalSoFar + rowsUpdated)
		batchLimit := tasks.WorkspaceBackfillBatchSize
		if int64(batchLimit) > remaining {
			batchLimit = int(remaining)
		}

		rows, batchErr := backfillBatch(rhAccountID, batchLimit)
		if batchErr != nil {
			utils.LogWarn("rhAccountID", rhAccountID, "err", batchErr.Error(), "Workspace backfill batch failed")
			backfillErrorsCnt.Inc()
			return rowsUpdated, false, batchErr
		}
		if rows == 0 {
			return rowsUpdated, false, nil
		}

		rowsUpdated += rows
		backfillRowsCnt.Add(float64(rows))
		backfillBatchesCnt.Inc()
		utils.LogInfo("i", idx, "rhAccountID", rhAccountID, "nRows", rows, "total", totalSoFar+rowsUpdated, "Workspace backfill batch")

		if tasks.WorkspaceBackfillBatchSleepMs > 0 {
			time.Sleep(time.Duration(tasks.WorkspaceBackfillBatchSleepMs) * time.Millisecond)
		}
	}

	return rowsUpdated, true, nil
}
```

This keeps the same behavior but:

- makes the limit condition explicit (`totalSoFar+rowsUpdated < maxRows`)
- centralizes the “did we hit the limit?” decision in a single boolean
- isolates per-account concerns in `processAccountBatches`
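
The per-batch limit logic in `processAccountBatches` can also be checked in isolation. A minimal sketch, assuming a hypothetical pure helper `nextBatchLimit` that is not part of the PR, of how the last batch of a run is clamped to the remaining row budget:

```go
package main

import "fmt"

// nextBatchLimit returns how many rows the next batch may update:
// the configured batch size, clamped to what is left of the per-run budget.
// It returns 0 once the budget is used up.
func nextBatchLimit(maxRows, total int64, batchSize int) int {
	remaining := maxRows - total
	if remaining <= 0 {
		return 0
	}
	if int64(batchSize) > remaining {
		return int(remaining)
	}
	return batchSize
}

func main() {
	// Simulate one run with a 2500-row budget and 1000-row batches.
	var total int64
	for {
		limit := nextBatchLimit(2500, total, 1000)
		if limit == 0 {
			break
		}
		fmt.Println("batch limit:", limit)
		total += int64(limit) // assume every batch is full
	}
}
```

Keeping the clamp in a side-effect-free function makes the budget arithmetic testable without a database.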

---

### 2. Centralize the “pending rows” predicate

The JSON predicate for “pending” rows is repeated in four places with a negation for invalid rows. You can factor out the core predicate once and build the other SQL strings from it, which will make future changes safer.

For example:

```go
const workspacePendingPredicate = `
workspace_id IS NULL
  AND workspaces IS NOT NULL
  AND jsonb_typeof(workspaces) = 'array'
  AND jsonb_array_length(workspaces) > 0
  AND workspaces->0->>'id' IS NOT NULL
  AND workspaces->0->>'name' IS NOT NULL
  AND NOT empty(workspaces->0->>'name')
`

const backfillUpdateSQL = `
UPDATE system_inventory si
SET workspace_id   = (si.workspaces->0->>'id')::uuid,
    workspace_name = si.workspaces->0->>'name'
FROM (
    SELECT rh_account_id, id
    FROM system_inventory
    WHERE rh_account_id = ?
      AND ` + workspacePendingPredicate + `
    ORDER BY id
    LIMIT ?
) batch
WHERE si.rh_account_id = batch.rh_account_id
  AND si.id = batch.id
`

const pendingAccountsSQL = `
SELECT rh_account_id
FROM system_inventory
WHERE ` + workspacePendingPredicate + `
GROUP BY rh_account_id
ORDER BY hash_partition_id(rh_account_id, 128), rh_account_id
`

const pendingRowsSQL = workspacePendingPredicate

const invalidPendingRowsSQL = `
workspace_id IS NULL
  AND workspaces IS NOT NULL
  AND NOT (
    ` + workspacePendingPredicate + `
  )
`
```

This keeps the SQL semantics identical, but:

- “what is a pending row” is defined exactly once
- adding/changing a condition only requires updating `workspacePendingPredicate`
- `countPending`, `backfillUpdateSQL`, `pendingAccountsSQL`, and `invalidPendingRowsSQL` all stay in sync automatically
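
Because Go evaluates constant string concatenation at compile time, the composed queries remain ordinary `const` values. A reduced sketch follows, using a shortened predicate and simplified queries rather than the PR's SQL (`countPendingSQL` and `allEmbedPredicate` are hypothetical names), with a check that every query embeds the shared predicate:

```go
package main

import (
	"fmt"
	"strings"
)

// pendingPredicate is a shortened stand-in for the shared WHERE condition.
const pendingPredicate = `workspace_id IS NULL AND workspaces IS NOT NULL`

// Both queries are built from the same predicate at compile time.
const countPendingSQL = `SELECT count(*) FROM system_inventory WHERE ` + pendingPredicate

const pendingAccountsSQL = `SELECT rh_account_id FROM system_inventory WHERE ` +
	pendingPredicate + ` GROUP BY rh_account_id`

// allEmbedPredicate reports whether each query contains the shared predicate.
func allEmbedPredicate(queries ...string) bool {
	for _, q := range queries {
		if !strings.Contains(q, pendingPredicate) {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(allEmbedPredicate(countPendingSQL, pendingAccountsSQL))
}
```

A check like this could live in a unit test so that a query written without the shared constant is caught early.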
</issue_to_address>


TenSt marked this pull request as ready for review June 1, 2026 21:45

TenSt requested a review from a team as a code owner June 1, 2026 21:45

sourcery-ai Bot reviewed Jun 1, 2026


TenSt force-pushed the stepan/RHINENG-26546-job-to-backfill-workspace-data branch 2 times, most recently from 954040f to 52ddab1 · June 1, 2026 22:28

RHINENG-26546: create new task workspace_backfill

0379827

TenSt force-pushed the stepan/RHINENG-26546-job-to-backfill-workspace-data branch from 52ddab1 to a653a73 · June 2, 2026 00:00

TenSt added 6 commits June 2, 2026 02:02

RHINENG-26546: add tests

95db32f

RHINENG-26546: add deployment

b27897e

RHINENG-26546: add local run

34150aa

RHINENG-26546: run backfill job as admin user

5f58ab4

RHINENG-26546: add grafana panel

719fdd6

RHINENG-26546: move dev files into one folder

b95d69b

TenSt force-pushed the stepan/RHINENG-26546-job-to-backfill-workspace-data branch from a653a73 to b95d69b · June 2, 2026 00:02

MichaelMraka self-assigned this Jun 2, 2026

MichaelMraka approved these changes Jun 2, 2026


TenSt merged commit 86e75ea into RedHatInsights:master Jun 2, 2026
8 checks passed

TenSt merged 7 commits into RedHatInsights:master from TenSt:stepan/RHINENG-26546-job-to-backfill-workspace-data
