Numba: fuse AdvancedSubtensor->Elemwise->AdvancedIncSubtensor #2015
Draft
ricardoV94 wants to merge 7 commits into
…ariables When a shared variable's update is deleted but the variable is still destroyed (mutated inplace) by a node in the copied graph, the shared variable storage will still be mutated. Emit a UserWarning in this case.
Extend FusionOptimizer to merge independent subgraphs that share inputs but have no producer-consumer edge (siblings like f(x) and g(x)). The eager expansion only walks producer-consumer edges, missing these. Also extract InplaceGraphOptimizer.try_inplace_on_node helper and _insert_sorted_subgraph to deduplicate insertion-point logic.
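To illustrate what merging sibling subgraphs buys, here is a plain-Python sketch (not PyTensor code). The functions `f` and `g` are hypothetical scalar operations that share the input `x` but do not feed each other; fusing them means a single pass over `x` produces both outputs.

```python
import math

def f(v):
    # Hypothetical first sibling: depends only on x.
    return math.exp(v)

def g(v):
    # Hypothetical second sibling: also depends only on x.
    return v * v

def separate_loops(x):
    # Two independent Elemwise nodes: x is traversed twice.
    out_f = [f(v) for v in x]
    out_g = [g(v) for v in x]
    return out_f, out_g

def fused_siblings(x):
    # One fused node with two outputs: x is traversed once.
    out_f, out_g = [], []
    for v in x:
        out_f.append(f(v))
        out_g.append(g(v))
    return out_f, out_g

x = [0.0, 1.0, 2.0]
assert separate_loops(x) == fused_siblings(x)
```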
The inplace_pattern loop used `input_type.layout` leaked from the preceding core-input-types loop instead of `output_type.layout`.
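This class of bug comes from Python loop variables staying bound after the loop ends. The sketch below reproduces the shape of the problem with hypothetical function names and string layouts; it is not the actual dispatch code.

```python
def buggy_output_layouts(input_layouts, output_layouts):
    for input_layout in input_layouts:
        pass  # stands in for the core-input-types loop
    # Bug: `input_layout` is still bound to the last input's layout here.
    return [input_layout for _output_layout in output_layouts]

def fixed_output_layouts(input_layouts, output_layouts):
    for input_layout in input_layouts:
        pass  # same preceding loop as above
    # Fix: read the layout from each output instead of the leaked variable.
    return [output_layout for output_layout in output_layouts]

inputs = ["C", "A"]   # hypothetical layouts of the inputs
outputs = ["F"]       # hypothetical layout of the output
print(buggy_output_layouts(inputs, outputs))
print(fixed_output_layouts(inputs, outputs))
```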
Extend the IndexedElemwise fusion to also absorb
AdvancedIncSubtensor1 (indexed set/inc) on the output side.
Before (3 nodes):
    temp = Elemwise(x[idx], y)  # shape (919,)
    result = IncSubtensor(target, temp, idx)  # target shape (85,)
After (1 fused loop, target is an input):
    for k in range(919):
        target[idx[k]] += scalar_fn(x[idx[k]], y[k])
- FuseIndexedElemwise now detects AdvancedIncSubtensor1 consumers
- Reject fusion when val broadcasts against target's non-indexed axes
- store_core_outputs supports inc mode via o[...] += val
- Inner fgraph always uses inplace IncSubtensor
- op_debug_information shows buf_N / idx_N linkage
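The equivalence between the three-node graph and the fused loop can be checked with a small plain-Python sketch. `scalar_fn` is a hypothetical stand-in for the Elemwise scalar function, and lists stand in for arrays; the real kernel is generated by the Numba backend and is not shown here.

```python
def scalar_fn(a, b):
    # Hypothetical stand-in for the Elemwise scalar function.
    return a * b + 1.0

def unfused(target, x, y, idx):
    # Node 1: materialize x[idx] (AdvancedSubtensor1).
    gathered = [x[i] for i in idx]
    # Node 2: Elemwise into a temporary of length len(idx).
    temp = [scalar_fn(a, b) for a, b in zip(gathered, y)]
    # Node 3: accumulate into the target (AdvancedIncSubtensor1, inc mode).
    out = list(target)
    for k, i in enumerate(idx):
        out[i] += temp[k]
    return out

def fused(target, x, y, idx):
    # Single loop: indirect read, scalar op, and indirect update together.
    out = list(target)
    for k in range(len(idx)):
        out[idx[k]] += scalar_fn(x[idx[k]], y[k])
    return out

x = [0.5, 1.5, 2.5]
y = [1.0, 2.0, 3.0, 4.0, 5.0]
idx = [0, 2, 2, 1, 0]          # repeated indices accumulate
target = [0.0, 0.0, 0.0]
assert unfused(target, x, y, idx) == fused(target, x, y, idx)
```

Note that the fused version never allocates `gathered` or `temp`, which is the point of the rewrite.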
Support AdvancedSubtensor on any axis (not just axis 0) and multi-index patterns like x[idx_row, idx_col] where multiple 1D index arrays address consecutive source axes. Generalize writes (AdvancedIncSubtensor) to match.
Reads:
- Add undo_take_dimshuffle_for_fusion pre-fusion rewrite
- _get_indexed_read_info handles AdvancedSubtensor with consecutive tensor indices, full-slice prefix/suffix
- Reject boolean indices and non-consecutive advanced indices
Writes:
- _get_indexed_update_info mirrors _get_indexed_read_info for AdvancedIncSubtensor
- find_indexed_update_consumers detects both AdvancedIncSubtensor1 and AdvancedIncSubtensor
- Broadcast guard generalized for non-axis-0 indexed axes
- Indexed update construction supports AdvancedIncSubtensor (inplace)
Dispatch + codegen:
- indexed_inputs encoding: ((positions, axis, idx_bc), ...)
- input_read_spec uses tuple of (idx_k, axis) pairs per input
- n_index_loop_dims = max(idx.ndim for group)
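As a rough picture of the loops these generalized patterns correspond to, here is a plain-Python sketch using nested lists. The function names and the choice of multiplication and addition as the scalar operation are assumptions for illustration only.

```python
def fused_multi_index(target, x, y, idx_row, idx_col):
    # target[idx_row, idx_col] += x[idx_row, idx_col] * y
    # Two 1D index arrays address two consecutive axes with one loop counter.
    out = [row[:] for row in target]
    for k in range(len(idx_row)):
        r, c = idx_row[k], idx_col[k]
        out[r][c] += x[r][c] * y[k]
    return out

def fused_axis1_read(x, y, idx):
    # out = x[:, idx] + y, reading along axis 1 without materializing x[:, idx].
    return [[x[i][idx[k]] + y[i][k] for k in range(len(idx))]
            for i in range(len(x))]

x = [[1, 2], [3, 4]]
print(fused_multi_index([[0, 0], [0, 0]], x, [10, 20, 30], [0, 1, 1], [1, 0, 0]))
print(fused_axis1_read(x, [[0, 0, 0], [1, 1, 1]], [1, 1, 0]))
```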
Support multidimensional (e.g. 2D matrix) and 0-d integer indices in IndexedElemwise fusion, for both reads and writes.
ND indices:
- Add undo_take_reshape_for_fusion: undoes the Reshape+flatten pattern that transform_take applies for ND indices, recovering the original AdvancedSubtensor(source, mat_idx) form for fusion. Handles both axis=0 and axis>0 (with DimShuffle wrapping).
- idx_load_axes: tuple of tuples, each index array loads from idx_ndim loop counters
0-d indices:
- Accept 0-d tensor indices (e.g. x[scalar_idx, vec_idx]) which are valid AdvancedSubtensor inputs that broadcast with higher-dim indices.
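The equivalence that the reshape-undoing rewrite relies on, and the way a 2D index maps onto two loop counters, can be sketched in plain Python. All function names below are hypothetical, and addition is an arbitrary choice of scalar operation.

```python
def take_nd(x, mat_idx):
    # Direct form: AdvancedSubtensor(source, mat_idx) with a 2D index.
    return [[x[i] for i in row] for row in mat_idx]

def take_flatten_reshape(x, mat_idx):
    # Equivalent form: flatten the index, take along axis 0, reshape back.
    n_cols = len(mat_idx[0])
    flat_idx = [i for row in mat_idx for i in row]
    flat = [x[i] for i in flat_idx]
    return [flat[r * n_cols:(r + 1) * n_cols] for r in range(len(mat_idx))]

def fused_nd(x, y, mat_idx):
    # Fused loop: the 2D index is loaded with two loop counters (r, c).
    return [[x[mat_idx[r][c]] + y[r][c] for c in range(len(mat_idx[r]))]
            for r in range(len(mat_idx))]

def take_scalar_and_vector(x2d, scalar_idx, vec_idx):
    # x[scalar_idx, vec_idx]: the 0-d index broadcasts against the 1D index.
    return [x2d[scalar_idx][j] for j in vec_idx]

x = [10, 20, 30]
mat_idx = [[0, 2], [1, 1]]
assert take_nd(x, mat_idx) == take_flatten_reshape(x, mat_idx)
```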
Summary
Introduce IndexedElemwise, an OpFromGraph that wraps AdvancedSubtensor + Elemwise + AdvancedIncSubtensor subgraphs so the Numba backend can generate a single loop with indirect indexing. This avoids materializing the AdvancedSubtensor input arrays and writes directly to the output buffer, doing the job of AdvancedIncSubtensor in the same loop instead of looping again over the intermediate Elemwise output.
Commit 1 fuses indexed reads (AdvancedSubtensor1 on inputs).
Commit 2 fuses indexed updates (AdvancedIncSubtensor1 on outputs).
Commit 3 extends this to AdvancedSubtensor inputs with 1D indices on arbitrary consecutive axes.
Motivation
In hierarchical models with mu = beta[group_idx] * x + ..., the logp+gradient graph combines indexed reads and indexed updates in the same Elemwise (the forward expands group-level parameters via advanced subtensor, and the gradient accumulates back into the source via advanced inc subtensor).
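A plain-Python sketch of this pattern follows. It assumes, purely for illustration, a unit-variance Gaussian log-density `-0.5 * sum((y - mu)**2)` with `mu = beta[group_idx] * x`; the function name is hypothetical and this is not the PR's own example. The gradient with respect to `beta` reads `beta[group_idx]` and accumulates back into a `beta`-shaped buffer in the same loop.

```python
def grad_logp_wrt_beta(beta, x, y, group_idx):
    # d/d beta[g] of -0.5 * sum((y - mu)**2), with mu[k] = beta[group_idx[k]] * x[k]
    grad = [0.0] * len(beta)
    for k in range(len(x)):
        g = group_idx[k]
        resid = y[k] - beta[g] * x[k]   # indexed read of beta
        grad[g] += resid * x[k]         # indexed update into the gradient
    return grad

beta = [1.0, 2.0]
x = [1.0, 2.0, 3.0]
y = [2.0, 2.0, 5.0]
group_idx = [0, 1, 0]
print(grad_logp_wrt_beta(beta, x, y, group_idx))
```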
A simple example
The next step would be to also fuse the sum directly into the Elemwise, so we end up with a single loop over the data. This matters because the sum can easily break the fusion: we don't fuse when the Elemwise output is needed elsewhere (such as in a sum).