Skip to content

fix: allow retrieve of public and owned ds#2712

Open
minottic wants to merge 2 commits into
masterfrom
job_sub_fix
Open

fix: allow retrieve of public and owned ds#2712
minottic wants to merge 2 commits into
masterfrom
job_sub_fix

Conversation

@minottic
Copy link
Copy Markdown
Member

@minottic minottic commented Apr 29, 2026

Description

The computation of job.ownerGroup, error thrown if empty, suffered from having
jobs trying to access public and owned datasets because public datasets might
be not owned by the user submitting the job. It also includes a refactor to
simplify maintenance as well as more verbose error messages

Fixes

  • public datasets excluded from intersection computation

Changes:

  • refactor
  • more verbose error messages/cleaner error codes
  • speed up dataset retrieval with a findAll rather than looping on all datasets

Tests included

  • Included for each change/fix?
  • Passing?

Documentation

  • swagger documentation updated (required for API changes)
  • official documentation updated

official documentation info

Summary by Sourcery

Adjust job creation mapping and validation to correctly handle public and owned datasets while improving ownership checks and error reporting.

Bug Fixes:

  • Exclude published datasets from group intersection when resolving ownerGroup for #datasetAccess jobs so mixed public and owned datasets can be used.
  • Ensure ownerGroup resolution for #datasetOwner jobs validates that all datasets share the same owner group and fails with a clear error otherwise.
  • Align HTTP status codes for invalid ownership, group membership, and anonymous job email validation with semantic 403 and 422 responses instead of generic 400s.

Enhancements:

  • Refactor the v3 job creation interceptor into smaller helpers for building job parameters, owner user, and owner group.
  • Relax immutability on CreateJobDto ownership fields so they can be populated server-side before persistence.
  • Improve error messages around job ownership and group validation for clearer diagnostics.

Tests:

  • Extend job authorization tests to cover mixed public and owned datasets, differing owner groups, and updated error status codes and messages.
  • Add new test datasets to verify published datasets do not block dataset access jobs via group intersection.

@minottic minottic requested a review from a team as a code owner April 29, 2026 13:18
Copy link
Copy Markdown

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 1 issue, and left some high level feedback:

  • In buildOwnerGroup, the findAll query uses { _id: { $in: datasetPid } }, but datasetPid is built from datasetDto.pid; this likely should query on pid instead of _id to avoid silently returning no datasets and miscomputing ownership.
  • The #datasetOwner branch in buildOwnerGroup does const [firstOwner] = datasetOwners; on a Set, which will always yield undefined; consider converting the set to an array first (e.g. const [firstOwner] = [...datasetOwners];).
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `buildOwnerGroup`, the `findAll` query uses `{ _id: { $in: datasetPid } }`, but `datasetPid` is built from `datasetDto.pid`; this likely should query on `pid` instead of `_id` to avoid silently returning no datasets and miscomputing ownership.
- The `#datasetOwner` branch in `buildOwnerGroup` does `const [firstOwner] = datasetOwners;` on a `Set`, which will always yield `undefined`; consider converting the set to an array first (e.g. `const [firstOwner] = [...datasetOwners];`).

## Individual Comments

### Comment 1
<location path="src/jobs/interceptors/create-job-v3-mapping.interceptor.ts" line_range="113-114" />
<code_context>
+    const datasetList = jobParams.datasetList;
+    if (datasetList.length === 0) return undefined;
+    const datasetPid = datasetList.map((datasetDto) => datasetDto.pid);
+    const datasets = await this.datasetsService.findAll({
+      where: { _id: { $in: datasetPid } },
+      fields: ["isPublished", "accessGroups", "ownerGroup"],
+    });
</code_context>
<issue_to_address>
**🚨 issue (security):** Dataset lookup now uses `_id` with a list of PIDs, which likely changes semantics and may break owner-group resolution.

The previous implementation queried by `pid` via `findOne({ where: { pid: datasetDto.pid } })`, but the new code collects PIDs into `datasetPid` and queries with `where: { _id: { $in: datasetPid } }`. Unless `_id` is guaranteed to equal `pid`, this will return wrong or empty results and lead to incorrect `ownerGroup` and authorization decisions. Please either query by `pid` again or ensure you collect and filter on the same identifier field.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread src/jobs/interceptors/create-job-v3-mapping.interceptor.ts Outdated
@minottic minottic force-pushed the job_sub_fix branch 6 times, most recently from ee4e19c to aa90928 Compare May 5, 2026 14:18
minottic added 2 commits May 8, 2026 09:50
The computation of job.ownerGroup, error thrown if empty, suffered from having
jobs trying to access public and owned datasets because public datasets might
be not owned by the user submitting the job. It also includes a refactor to
simplify maintenance as well as more verbose error messages
Comment on lines +144 to +149
if (
jobConfigCreateAuth === CreateJobAuth.DatasetAccess &&
datasets.every((dataset) => dataset?.isPublished)
)
return jobUserCurrentGroups?.[0];
const nonPublishedDatasets = datasets.filter(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not very familiar with the job concept, but this logic looks slightly off to me.
If each job requires ownerGroup, instead of picking arbitary one from userGroups array, how about forcing user to provide one? if possible

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants