I'm pitching this idea to see whether we'd like to try it. I'm also happy to develop it in a dev branch so we can see how it feels in practice. But first I'd like to get your thoughts on this strategy, @AmenRa, @milyenpabo, and @andersonbcdefg.
Problem
Numba is currently a hard dependency that significantly impacts the user experience:
- Code readability: Numba decorators and type hints make the codebase harder to understand
- Performance inconsistency: JIT compilation can sometimes be slower than pure NumPy for certain workloads
- Process spawning issues: Numba can create too many processes leading to crashes in some environments
- Installation complexity: Numba adds significant build complexity and binary size
- Debugging difficulty: JIT-compiled code is harder to debug and profile
See also #74 and #64.
Proposed Solution
Make Numba an optional dependency through a progressive two-step migration strategy.
Implementation Approach
Strategy: Progressive Migration to Dual Implementation
Step 1: Conditional Decorators (Quick Win)
- Replace @njit with @maybe_njit, which falls back to the identity function when Numba is disabled
- Immediate benefit: Users can disable Numba globally with minimal code changes
- Handle Numba-specific types (numba.typed.Dict/List) with fallbacks
- Convert prange to range when Numba is disabled
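A minimal sketch of what ranx/decorators.py could look like for Step 1. The NUMBA_AVAILABLE flag and the inline use_numba() stand-in are assumptions for illustration (the real use_numba() would come from the configuration system); only numba.njit and numba.prange are existing Numba APIs.

```python
import os

try:
    import numba
    NUMBA_AVAILABLE = True
except ImportError:  # Numba is not installed
    numba = None
    NUMBA_AVAILABLE = False


def use_numba():
    """Stand-in for the proposed config helper (assumption for this sketch)."""
    return os.environ.get("RANX_USE_NUMBA", "true").lower() != "false"


def maybe_njit(*args, **kwargs):
    """Apply numba.njit when Numba is enabled, else return the function unchanged.

    Supports both `@maybe_njit` and `@maybe_njit(cache=True)`.
    """
    enabled = NUMBA_AVAILABLE and use_numba()

    # Bare usage: @maybe_njit
    if len(args) == 1 and callable(args[0]) and not kwargs:
        func = args[0]
        return numba.njit(func) if enabled else func

    # Parameterized usage: @maybe_njit(cache=True, parallel=True, ...)
    def decorator(func):
        return numba.njit(*args, **kwargs)(func) if enabled else func

    return decorator


# prange is only parallel inside jitted code; plain range is the fallback.
prange = numba.prange if (NUMBA_AVAILABLE and use_numba()) else range
```

In this sketch the choice is made when the decorator runs, that is, at import time. The environment variable (or a call to set_numba_enabled) would therefore have to be set before the decorated modules are imported; switching at runtime would need the dispatch pattern from Step 2.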
Step 2: Dual Implementation Pattern (Long-term Solution)
- Keep existing Numba implementations for performance
- Add clean, readable NumPy implementations as fallbacks
- Runtime selection based on Numba availability and user preference
- Much better code readability and debugging experience
Code Evolution Example: Precision Metric
Current:
import numpy as np
from numba import njit

from .common import clean_qrels, fix_k


@njit(cache=True)
def precision_at_k(qrels, run, k):
    qrels = clean_qrels(qrels, 1)
    run = run[:fix_k(k, run)]
    if qrels.shape[0] == 0:
        return 0.0
    return np.intersect1d(qrels[:, 0], run[:, 0]).shape[0] / run.shape[0]
Step 1: Conditional Decorators
import numpy as np

from ..decorators import maybe_njit
from .common import clean_qrels, fix_k


@maybe_njit(cache=True)  # Falls back to pure Python when Numba is disabled
def precision_at_k(qrels, run, k):
    qrels = clean_qrels(qrels, 1)
    run = run[:fix_k(k, run)]
    if qrels.shape[0] == 0:
        return 0.0
    return np.intersect1d(qrels[:, 0], run[:, 0]).shape[0] / run.shape[0]
Step 2: Dual Implementation (Future)
import numpy as np

# use_numba() comes from the configuration system; NUMBA_AVAILABLE is assumed
# to be a flag set once at import time, depending on whether Numba is installed.


def precision_at_k_numpy(qrels, run, k):
    """Clean, readable NumPy implementation."""
    # Keep only the IDs of documents judged relevant (relevance >= 1)
    relevant_docs = qrels[qrels[:, 1] >= 1][:, 0]
    # k == 0 means "no cutoff"; also clamp k to the length of the run
    if k == 0 or k > len(run):
        k = len(run)
    top_k_docs = run[:k, 0]
    if len(relevant_docs) == 0:
        return 0.0
    relevant_retrieved = np.intersect1d(relevant_docs, top_k_docs)
    return len(relevant_retrieved) / k


@njit(cache=True)
def precision_at_k_numba(qrels, run, k):
    # Existing optimized implementation
    ...


def precision_at_k(qrels, run, k):
    """Auto-select best implementation."""
    if NUMBA_AVAILABLE and use_numba():
        return precision_at_k_numba(qrels, run, k)
    else:
        return precision_at_k_numpy(qrels, run, k)
Configuration System
# ranx/config.py
import os

_USE_NUMBA = None


def use_numba():
    global _USE_NUMBA
    if _USE_NUMBA is None:
        _USE_NUMBA = os.environ.get('RANX_USE_NUMBA', 'true').lower() != 'false'
    return _USE_NUMBA


def set_numba_enabled(enabled: bool):
    global _USE_NUMBA
    _USE_NUMBA = enabled
Usage Examples
import ranx
# Option 1: Disable Numba globally
ranx.set_numba_enabled(False)
# Option 2: Environment variable
# export RANX_USE_NUMBA=false
# Usage remains identical - automatic fallback
qrels = ranx.Qrels.from_dict({"q1": {"d1": 1, "d2": 1}})
run = ranx.Run.from_dict({"q1": {"d1": 0.9, "d2": 0.8, "d3": 0.7}})
result = ranx.evaluate(qrels, run, ["precision@2"]) # Uses best available implementation
Benefits
Step 1 Benefits:
- Immediate relief: Users can disable Numba right away
- Simplified debugging: Pure Python stack traces
- Easier development: No JIT compilation delays
- Zero breaking changes: Existing API unchanged
Step 2 Benefits:
- Clean, readable code: NumPy versions are self-documenting
- Educational value: Clean implementations help users understand metrics
- Better maintenance: Easier to debug and modify NumPy versions
- Performance flexibility: Users choose speed vs simplicity
Impact Areas
The change would affect ~132 functions across:
- Metrics (50+ functions): ndcg, precision, recall, etc.
- Fusion algorithms (40+ functions): bordafuse, bayesfuse, etc.
- Data structures (15+ functions): Qrels, Run operations
- Normalization (10+ functions)
- Utilities and statistical tests (15+ functions)
Performance Considerations
- Step 1: Same algorithms, just without JIT compilation (slower but functional)
- Step 2: NumPy implementations could often be as fast as Numba
- Vectorized NumPy might sometimes outperform Numba on small datasets
- Users get to choose their performance/readability tradeoff
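These performance expectations would need measuring rather than assuming. A rough sketch of a timing harness follows, using only timeit and NumPy; the two overlap_* functions are hypothetical stand-ins for a pair of competing implementations, not ranx code.

```python
import timeit

import numpy as np


def overlap_numpy(relevant_docs, top_k_docs):
    """Vectorized: count documents present in both arrays."""
    return np.intersect1d(relevant_docs, top_k_docs).shape[0]


def overlap_python(relevant_docs, top_k_docs):
    """Plain-Python baseline using set intersection."""
    return len(set(relevant_docs.tolist()) & set(top_k_docs.tolist()))


def benchmark(func, *args, repeat=3, number=50):
    """Return the best observed time per call, in seconds."""
    func(*args)  # warm-up call; for a jitted function this triggers compilation
    timings = timeit.repeat(lambda: func(*args), repeat=repeat, number=number)
    return min(timings) / number


rng = np.random.default_rng(0)
for n_docs in (10, 1_000, 10_000):
    # Unique document IDs, so both implementations count the same overlap
    relevant = rng.choice(2 * n_docs, size=n_docs, replace=False)
    retrieved = rng.choice(2 * n_docs, size=n_docs, replace=False)
    for impl in (overlap_numpy, overlap_python):
        per_call = benchmark(impl, relevant, retrieved)
        print(f"n={n_docs:>6} {impl.__name__:<15} {per_call * 1e6:10.1f} us/call")
```

The warm-up call matters when a Numba implementation is benchmarked this way, since otherwise the first-call compilation time would dominate the result on small inputs.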
Implementation Plan
Phase 1: Foundation (Step 1)
- Create configuration system (ranx/config.py)
- Add conditional decorators (ranx/decorators.py)
- Migrate decorators across codebase (can be done incrementally)
- Update __init__.py for conditional Numba setup
- Add tests for both Numba and non-Numba modes
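One way the tests for both modes could be structured is a parity check that runs two implementations of the same metric on random inputs and asserts that they agree. The sketch below is only an illustration: precision_at_k_reference and check_parity are hypothetical names, and the set-based reference stands in for the Numba version so the example runs without Numba installed.

```python
import numpy as np


def precision_at_k_numpy(qrels, run, k):
    """NumPy implementation from the Step 2 example."""
    relevant_docs = qrels[qrels[:, 1] >= 1][:, 0]
    if k == 0 or k > len(run):
        k = len(run)
    top_k_docs = run[:k, 0]
    if len(relevant_docs) == 0:
        return 0.0
    return len(np.intersect1d(relevant_docs, top_k_docs)) / k


def precision_at_k_reference(qrels, run, k):
    """Set-based pure-Python reference (hypothetical stand-in)."""
    relevant = {doc for doc, rel in qrels.tolist() if rel >= 1}
    ranked = [doc for doc, _score in run.tolist()]
    if k == 0 or k > len(ranked):
        k = len(ranked)
    if not relevant:
        return 0.0
    return len(relevant & set(ranked[:k])) / k


def check_parity(impl_a, impl_b, trials=100, seed=0):
    """Assert that two implementations agree on randomly generated inputs."""
    rng = np.random.default_rng(seed)
    n_docs, run_len = 50, 20
    for _ in range(trials):
        # qrels rows are (doc_id, relevance); run rows are (doc_id, score)
        qrels = np.column_stack(
            [np.arange(n_docs), rng.integers(0, 3, n_docs)]
        ).astype(np.float64)
        run = np.column_stack(
            [rng.permutation(n_docs)[:run_len], np.sort(rng.random(run_len))[::-1]]
        )
        k = int(rng.integers(0, 30))  # includes k == 0 and k > len(run)
        assert np.isclose(impl_a(qrels, run, k), impl_b(qrels, run, k))
    return trials


check_parity(precision_at_k_numpy, precision_at_k_reference)
```

In the actual test suite, the two arguments would presumably be the Numba and NumPy versions of each metric, with the Numba half skipped when Numba is not installed.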
Phase 2: Clean Implementations (Step 2)
- Start with high-impact metrics (precision, recall, ndcg)
- Add dual implementations incrementally
- Create comprehensive benchmarks
- Update documentation with examples
Alternative Approaches Considered
- Pure NumPy rewrite: Would break performance for existing users
- Separate packages (split into ranx and ranx-numba): Too complex
- Lazy imports (import Numba only when needed): Doesn't solve core readability issues
This progressive approach gives immediate relief to users experiencing Numba issues while working toward a long-term solution with clean, readable implementations.