Skip to content

Promote halide_malloc_alignment() to be user-overridable#7204

Closed
steven-johnson wants to merge 1 commit intomainfrom
srj/halide-malloc-alignment
Closed

Promote halide_malloc_alignment() to be user-overridable#7204
steven-johnson wants to merge 1 commit intomainfrom
srj/halide-malloc-alignment

Conversation

@steven-johnson
Copy link
Copy Markdown
Contributor

Currently, halide_malloc_alignment() is WEAK_INLINE, so even if you attempt to override it, it won't have any effect (the call sites in halide_malloc() will have inlined the implementation selected by LLVM_Runtime_Linker).

This promotes halide_malloc_alignment() to a documented piece of the runtime you can override, with the caveat that it must return the same value every time it is called. To reduce overhead in halide_malloc(), we copy the value into a global at startup time and use that instead.

By itself, this isn't that useful, but it's essential to have in order for #7189 to be robust; I'm pulling this change into a separate PR so that it can land separately, enabling downstream users to be able to update their halide_malloc() implementations ahead of the "real" change.

Currently, halide_malloc_alignment() is WEAK_INLINE, so even if you attempt to override it, it won't have any effect (the call sites in halide_malloc() will have inlined the implementation selected by LLVM_Runtime_Linker).

This promotes halide_malloc_alignment() to a documented piece of the runtime you can override, with the caveat that it must return the same value every time it is called. To reduce overhead in halide_malloc(), we copy the value into a global at startup time and use that instead.

By itself, this isn't that useful, but it's essential to have in order for #7189 to be robust; I'm pulling this change into a separate PR so that it can land separately, enabling downstream users to be able to update their halide_malloc() implementations ahead of the "real" change.
@abadams
Copy link
Copy Markdown
Member

abadams commented Dec 6, 2022

Overhead looks minimal, but why is this essential for the other PR to be robust?

@steven-johnson
Copy link
Copy Markdown
Contributor Author

Overhead looks minimal, but why is this essential for the other PR to be robust?

There exists code (e.g. apps/hannk) that calls halide_malloc_alignment() so it can make decisions about how to pad and package various things. (This may be a dubious use of a not-official API, but it does it because it needs to in order to work efficiently.)

Now imagine that you run such a piece of code that that uses the default arm alignment of 32, but overrides halide_malloc() in a way that uses (say) align=8. Currently, you have no way to modify what halide_malloc_alignment() returns -- the decisions are hardcoded in LLVM_Runtime_Linker -- and because it's WEAK_INLINE, you can't change the value.

I agree that having the constants be "compile-time" constants as we have now is preferable, but if we agree that it's desirable to allow user code to know the alignment contract that the current halide_malloc() implementation is using, I'm not sure of a better way to accomplish this. I'm definitely open for better suggestions, of course.

@abadams
Copy link
Copy Markdown
Member

abadams commented Dec 6, 2022

What do you think of adding a setter instead of a weak-overridable function? It would work on more platforms.

@steven-johnson
Copy link
Copy Markdown
Contributor Author

Update: apparently initializing a global to the result of a call doesn't work in JIT mode for the qurt runtime (i.e.: WEAK size_t _alignment = 128; is fine, but WEAK size_t _alignment = halide_malloc_alignment(); is garbage), which is why the current PR is getting failures there (_alignment is garbage and thus we 'align' to huge values that fail). Oy.

What do you think of adding a setter instead of a weak-overridable function? It would work on more platforms.

Hmm... that seems like it would work. Let me take a stab at it.

(This whole alignment effort is turning into a bigger can of worms than I expected.)

@steven-johnson
Copy link
Copy Markdown
Contributor Author

What do you think of adding a setter instead of a weak-overridable function? It would work on more platforms.

Hmm... that seems like it would work. Let me take a stab at it.

So this definitely can be made to work, but it feels pretty fragile.

The problem here is that there are several ways to override the halide_malloc() allocator (AOT: override at link time, calling halide_set_custom_malloc; JIT: call JITSharedRuntime::set_default_handlers() or pass a set of JITHandlers to compile_to_callable()). In all of these cases, it is desirable to ensure that halide_malloc_alignment() is in sync with what halide_malloc() is doing (or at least to make the API make it hard to avoid it), but doing so would either require revising the API somehow to ensure both are set at the same time (which is ~impossible for the override-at-link-time variant), or to add some runtime check that somehow reality-checks this (which would add undesirable runtime overhead , unless we did it in debug-only mode, in which case it would ~never be checked for most downstream code).

Now, to be fair... we already have a setup in which there is no enforcement that halide_malloc_alignment() agrees with halide_malloc() (indeed, a case inside Google is what led me here), so it's quite likely that I'm overthinking this. So let me suggest a few harebrained alternatives to the entire status quo:

  • Revise the API for halide_malloc() to always get the alignment passed in (i.e., have it match the API for aligned_alloc(). This would cause massive internal and downstream breakage and require a lot of churn even within Halide. As a practical matter, we'd need to rename it (halide_aligned_alloc()?) and add some sort of linktime/runtime checking to warn/fail if someone provides an override to halide_malloc(). Also not sure how/whether there would be a good transition path to allow both to exist at the same time.

  • Revise the API(s) for overriding halide_malloc() to add an int alignment argument, which is the expected alignment of the malloc implementation. This would break existing code but in a likely-much-less-painful way than the previous option. Also, it's not clear how we would implement/enforce this for overrides to halide_malloc(). (...maybe forbid overriding halide_malloc() via linkage entirely and require calling halide_set_custom_malloc() for all AOT code?)

  • Hack API of halide_malloc() to allow it to also serve as halide_malloc_alignment()... e.g., calling halide_malloc(-1) returns the alignment value, cast to a void*. This kinda makes me sick to even suggest it. We shouldn't do this.

@steven-johnson
Copy link
Copy Markdown
Contributor Author

(Side note: why the heck do we have both halide_set_custom_malloc() and halide_set_custom_free()? Surely you would ~always want to override these in tandem...)

@steven-johnson
Copy link
Copy Markdown
Contributor Author

steven-johnson commented Dec 6, 2022

Finally, there is another option I overlooked: punt on the whole problem entirely.

At present, there is exactly one use case that I know of for using halide_malloc_alignment() outside of the Halide runtime, and that's in apps/hannk, for ensuring that certain allocations are alignment appropriately. Instead of calling halide_malloc_alignment(), we could just choose a "suitably large" alignment, the way HalideBuffer.h does (=128), and then revise halide_malloc_alignment() to make it impossible to see or call outside the runtime.

This is likely the best short-term option. I'll prep a (separate) PR to try that out.

@steven-johnson
Copy link
Copy Markdown
Contributor Author

I'm going to close this in favor of an alternate solution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants