Reduce code size of warpspeed scan kernel #7440

@bernhardmgruber

Description

The first ~560 instructions of the warpspeed scan kernel (everything up to CCTL.IVALL) only handle the setup of mbarriers, which is excessive given that the whole kernel is ~1800 instructions. I believe there is great potential to reduce this.

I made a few attempts in: https://github.com/bernhardmgruber/cccl/tree/warpspeed_barrer_init. However, although they reduced the setup code to 140 instructions, a few runs showed small regressions, so I did not integrate the branch.

We should revisit this.
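
For illustration only, and not taken from the linked branch or from the actual warpspeed kernel: one generic way to keep barrier setup compact is to initialize all barriers from a single thread in a loop that the compiler is asked not to unroll. The sketch below uses `cuda::barrier` from libcu++, which can map to hardware mbarriers on architectures that support them. The barrier count, the kernel name `init_barriers_in_loop`, and the host helper `run_sketch` are all hypothetical, and whether this helps the real kernel (or avoids the regressions seen above) is an open assumption.

```cuda
#include <cuda/barrier>
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical barrier count, chosen only for illustration.
constexpr int num_barriers = 8;

using block_barrier = cuda::barrier<cuda::thread_scope_block>;

__global__ void init_barriers_in_loop(int* out)
{
  // Raw, suitably aligned shared storage; the barriers are constructed by init() below.
  __shared__ alignas(block_barrier) unsigned char storage[num_barriers * sizeof(block_barrier)];
  block_barrier* bars = reinterpret_cast<block_barrier*>(storage);

  if (threadIdx.x == 0)
  {
    // Request no unrolling so the init code is emitted once rather than once per barrier.
#pragma unroll 1
    for (int i = 0; i < num_barriers; ++i)
    {
      // Every thread of the block is expected to arrive at each barrier.
      init(&bars[i], static_cast<std::ptrdiff_t>(blockDim.x));
    }
  }
  __syncthreads(); // make the initialized barriers visible to the whole block

  // Exercise each barrier once to show that the initialization is usable.
  for (int i = 0; i < num_barriers; ++i)
  {
    bars[i].arrive_and_wait();
  }
  if (threadIdx.x == 0)
  {
    *out = num_barriers;
  }
}

// Hypothetical host helper: launches one block and returns the value written by
// the kernel, or -1 if any CUDA call fails.
int run_sketch()
{
  int* d_out = nullptr;
  if (cudaMalloc(&d_out, sizeof(int)) != cudaSuccess) return -1;
  cudaMemset(d_out, 0, sizeof(int));
  init_barriers_in_loop<<<1, 128>>>(d_out);
  int result = -1;
  const cudaError_t err = cudaMemcpy(&result, d_out, sizeof(int), cudaMemcpyDeviceToHost);
  cudaFree(d_out);
  return err == cudaSuccess ? result : -1;
}
```

Comparing the SASS of such a loop against a fully unrolled version (for example with `cuobjdump -sass`) would show whether the instruction count actually drops for a given compiler and architecture.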
