Evaluation: Uploading dataset concurrently#461
Conversation
Walkthrough

This change implements concurrent uploading of dataset items to Langfuse by replacing sequential item creation with ThreadPoolExecutor-based parallel processing. An inner helper function encapsulates item creation with error handling, and flush behavior is consolidated into a single final call instead of per-item flushing.

Changes

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 2
🧹 Nitpick comments (4)
backend/app/crud/evaluations/langfuse.py (3)
251-270: Add return type annotation to inner function.

Per coding guidelines, type hints should be used throughout. The `upload_item` helper is missing its return type annotation.

```diff
- def upload_item(item: dict[str, str], duplicate_num: int):
+ def upload_item(item: dict[str, str], duplicate_num: int) -> bool:
```

If the current signature already has the return type, disregard; otherwise, ensure the `-> bool` annotation is preserved after Black reformats the file.
285-285: Consider making `max_workers` configurable.

The hardcoded `max_workers=4` may not be optimal for all environments. Consider exposing this as a configuration parameter or deriving it from settings (similar to `CELERY_WORKER_CONCURRENCY`).

```diff
+from app.core.config import settings
+
+# In function or at module level
+MAX_UPLOAD_WORKERS = getattr(settings, "LANGFUSE_UPLOAD_WORKERS", 4)
+
 # Then use:
-    with ThreadPoolExecutor(max_workers=4) as executor:
+    with ThreadPoolExecutor(max_workers=MAX_UPLOAD_WORKERS) as executor:
```
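A standalone sketch of the same idea, reading the worker count from an environment variable; the `LANGFUSE_UPLOAD_WORKERS` name here is illustrative, not an existing project setting:

```python
import os
from concurrent.futures import ThreadPoolExecutor


def get_upload_workers(default: int = 4) -> int:
    """Read the worker count from a (hypothetical) env var, falling back to 4."""
    raw = os.environ.get("LANGFUSE_UPLOAD_WORKERS", "")
    try:
        value = int(raw)
    except ValueError:
        return default
    return value if value > 0 else default


# Usage: the pool size is now tunable per environment without a code change.
with ThreadPoolExecutor(max_workers=get_upload_workers()) as executor:
    pass  # submit upload tasks here
```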
287-290: Simplify by using `executor.map` or a list comprehension with `submit`.

The current pattern of appending futures to a list in a loop can be simplified.

```diff
-        futures = []
-        for item, dup_num in upload_tasks:
-            future = executor.submit(upload_item, item, dup_num)
-            futures.append(future)
+        futures = [
+            executor.submit(upload_item, item, dup_num)
+            for item, dup_num in upload_tasks
+        ]
```

backend/CELERY_OVERVIEW.md (1)
15-25: Add a language specifier to the fenced code block.

The file structure code block is missing a language identifier, which triggers markdownlint warning MD040. Use `text` or `plaintext` for directory listings.

````diff
-```
+```text
 app/celery/
 ├── __init__.py    # Package initialization, exports celery_app
````
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- `backend/CELERY_OVERVIEW.md` (1 hunks)
- `backend/app/crud/evaluations/langfuse.py` (2 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use type hints in Python code (Python 3.11+ project)
Files:
backend/app/crud/evaluations/langfuse.py
backend/app/crud/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Implement database access operations in backend/app/crud/
Files:
backend/app/crud/evaluations/langfuse.py
🧠 Learnings (2)
📚 Learning: 2025-10-08T12:05:01.317Z
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/celery/**/*.py : Keep Celery app configuration (priority queues, beat scheduler, workers) under backend/app/celery/
Applied to files:
backend/CELERY_OVERVIEW.md
📚 Learning: 2025-10-08T12:05:01.317Z
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/celery/tasks/**/*.py : Define Celery tasks under backend/app/celery/tasks/
Applied to files:
backend/CELERY_OVERVIEW.md
🧬 Code graph analysis (1)
backend/app/crud/evaluations/langfuse.py (1)
backend/app/core/langfuse/langfuse.py (1)
flush(108-109)
🪛 GitHub Actions: AI Platform CI
backend/app/crud/evaluations/langfuse.py
[error] 1-1: Trailing whitespace detected and removed by pre-commit hook 'trailing-whitespace'. Re-run pre-commit to finalize changes.
[error] 1-1: Black formatter reformatted the file. Re-run pre-commit to commit the changes.
🪛 LanguageTool
backend/CELERY_OVERVIEW.md
[uncategorized] ~171-~171: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...Identical to high priority but uses the low priority queue get_task_status() (lines 7...
(EN_COMPOUND_ADJECTIVE_INTERNAL)
[grammar] ~555-~555: Use a hyphen to join words.
Context: ...vice** (app/services/llm/) - Uses high priority queue for real-time API calls -...
(QB_NEW_EN_HYPHEN)
🪛 markdownlint-cli2 (0.18.1)
backend/CELERY_OVERVIEW.md
15-15: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🔇 Additional comments (2)
backend/CELERY_OVERVIEW.md (1)
1-588: Documentation looks comprehensive and well-structured.

This is a valuable addition that clearly explains the Celery architecture, queue priorities, task routing, and best practices. The end-to-end flow example and integration patterns are particularly helpful for onboarding developers. Based on learnings, the file structure aligns with project conventions for Celery configuration under `backend/app/celery/`.

backend/app/crud/evaluations/langfuse.py (1)
283-295: Langfuse client thread-safety is properly handled.

The Langfuse Python SDK is thread-safe for concurrent `create_dataset_item` calls when using a single client instance, which your code does. The internal queue and background worker handle batching, and `langfuse.flush()` at line 298 correctly ensures all items are delivered before the function returns.
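The single-client, single-final-flush pattern described above can be sketched independently of the SDK. The `create_dataset_item`/`flush` method names follow the Langfuse API; the stub client and everything else here are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from threading import Lock


class StubClient:
    """Stands in for a thread-safe Langfuse client instance."""

    def __init__(self) -> None:
        self._lock = Lock()
        self.items: list[dict[str, str]] = []
        self.flush_calls = 0

    def create_dataset_item(self, item: dict[str, str]) -> None:
        with self._lock:  # serialize access to the shared list
            self.items.append(item)

    def flush(self) -> None:
        self.flush_calls += 1


def upload_all(client: StubClient, items: list[dict[str, str]]) -> int:
    """Upload items concurrently, returning how many succeeded."""

    def upload_item(item: dict[str, str]) -> bool:
        try:
            client.create_dataset_item(item)
            return True
        except Exception:
            return False

    with ThreadPoolExecutor(max_workers=4) as executor:
        futures = [executor.submit(upload_item, it) for it in items]
        uploaded = sum(f.result() for f in as_completed(futures))
    client.flush()  # single final flush instead of per-item flushing
    return uploaded
```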
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
backend/app/tests/crud/evaluations/test_langfuse.py (3)
489-512: Missing flush verification in error handling test.

All other tests in this class verify flush behavior, but this error-handling test does not assert that `flush` was called despite item creation failures. If the new behavior guarantees a final flush even when errors occur, add an assertion to confirm it.

🔎 Add flush verification:

```diff
     # 2 succeeded out of 3
     assert total_items == 2
     assert mock_langfuse.create_dataset_item.call_count == 3
+    mock_langfuse.flush.assert_called_once()
```
386-393: Use factory pattern for test fixtures.

As per coding guidelines, test fixtures in `backend/app/tests/` should use the factory pattern rather than returning static data directly.

Based on coding guidelines: "Use factory pattern for test fixtures in `backend/app/tests/`"

🔎 Refactor to factory pattern:

```diff
     @pytest.fixture
-    def valid_items(self):
+    def valid_items_factory(self):
         """Valid parsed items."""
-        return [
-            {"question": "What is 2+2?", "answer": "4"},
-            {"question": "What is the capital of France?", "answer": "Paris"},
-            {"question": "Who wrote Romeo and Juliet?", "answer": "Shakespeare"},
-        ]
+        def _factory():
+            return [
+                {"question": "What is 2+2?", "answer": "4"},
+                {"question": "What is the capital of France?", "answer": "Paris"},
+                {"question": "Who wrote Romeo and Juliet?", "answer": "Shakespeare"},
+            ]
+        return _factory
```

Then update test methods to call the factory:

```python
def test_upload_dataset_to_langfuse_success(self, valid_items_factory):
    valid_items = valid_items_factory()
    ...
```
16-512: Add type hints to test methods and fixtures.

The coding guidelines require type hints on all function parameters and return values. Test methods should specify `-> None`, and the fixture should specify its return type.

Based on coding guidelines: "Always add type hints to all function parameters and return values in Python code"

Example for test methods:

```python
def test_create_langfuse_dataset_run_success(self) -> None:
    ...
```

Example for fixture (after applying factory pattern):

```python
from collections.abc import Callable

@pytest.fixture
def valid_items_factory(self) -> Callable[[], list[dict[str, str]]]:
    ...
```
🧹 Nitpick comments (3)
backend/app/tests/crud/evaluations/test_langfuse.py (3)
418-419: Prefer `assert_called_once()` for consistency.

These lines use `flush.call_count == 1` while other flush assertions in this file use `assert_called_once()` (lines 80, 208, 298, 332, 371, 380). The latter is more idiomatic and provides clearer assertion failure messages.

🔎 Apply this diff for consistency:

```diff
     # Verify flush was called once (final flush)
-    assert mock_langfuse.flush.call_count == 1
+    mock_langfuse.flush.assert_called_once()
```
486-487: Prefer `assert_called_once()` for consistency.

Same as above: use `assert_called_once()` instead of comparing `call_count`, for consistency with the rest of the file.

🔎 Apply this diff for consistency:

```diff
     # final flush once
-    assert mock_langfuse.flush.call_count == 1
+    mock_langfuse.flush.assert_called_once()
```
383-512: Consider adding concurrency validation tests.The PR introduces ThreadPoolExecutor-based concurrent uploads, but the current tests use synchronous mocks and don't validate thread-safety or concurrent execution behavior. While MagicMock is generally thread-safe, consider adding tests that verify the concurrent upload mechanism works correctly—for example, confirming that all items are processed even under concurrent execution or that errors in one thread don't block others.
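One hedged way to exercise this, using a `MagicMock` whose side effect records the calling thread; the test name and setup here are illustrative, not the project's actual test helpers:

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from unittest.mock import MagicMock


def test_items_uploaded_from_worker_threads() -> None:
    seen_threads: set[str] = set()
    lock = threading.Lock()

    def record_thread(*args: object, **kwargs: object) -> None:
        # Track which thread performed each mocked upload call.
        with lock:
            seen_threads.add(threading.current_thread().name)

    mock_client = MagicMock()
    mock_client.create_dataset_item.side_effect = record_thread

    items = [{"question": f"q{i}", "answer": f"a{i}"} for i in range(20)]
    with ThreadPoolExecutor(max_workers=4) as executor:
        list(executor.map(mock_client.create_dataset_item, items))

    # All items processed, and none were handled on the main thread.
    assert mock_client.create_dataset_item.call_count == 20
    assert threading.current_thread().name not in seen_threads
```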
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- `backend/app/tests/crud/evaluations/test_langfuse.py` (2 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Always add type hints to all function parameters and return values in Python code

Prefix all log messages with the function name in square brackets: `logger.info(f"[function_name] Message {mask_string(sensitive_value)}")`
Use Python 3.11+ with type hints throughout the codebase
Files:
backend/app/tests/crud/evaluations/test_langfuse.py
backend/app/tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use factory pattern for test fixtures in `backend/app/tests/`
Files:
backend/app/tests/crud/evaluations/test_langfuse.py
🧬 Code graph analysis (1)
backend/app/tests/crud/evaluations/test_langfuse.py (1)
backend/app/core/langfuse/langfuse.py (1)
flush(110-111)
Codecov Report: ✅ All modified and coverable lines are covered by tests.
* Enhancing swagger and redocs (#484)
* rearranging endpoints for swagger and redocs
* Claude: Next steps (#500)
* claude updates
* updated migration command
* making it concise
* add step to add typehint
* Add Langfuse observability to Unified API (#457)
* Add Langfuse observability to LLM execution methods
* Enhance observability decorator to validate Langfuse credentials before execution
* remove trace metadata
* precommit
* remove creds check
* Unified API: Add support for Kaapi Abstracted LLM Call (#498)
* Add Kaapi LLM parameters and completion config; implement transformation to native provider format
* Refine LLM API documentation and improve code formatting for clarity; enhance configuration handling for OpenAI provider
* add/fix tests
* Fix validation logic in map_kaapi_to_openai_params to prevent simultaneous setting of 'temperature' and 'reasoning' parameters
* Remove default value for 'model' in KaapiLLMParams to enforce explicit assignment
* Refactor KaapiLLMParams to enforce explicit reasoning levels; update mapping logic to handle reasoning and temperature conflicts with warnings
* Enhance LLM API documentation to clarify ad-hoc configuration parameters and warning handling for unsupported settings
* Refactor execute_job to use completion_config directly instead of config_blob.completion
* Refactor LLM provider interfaces to use NativeCompletionConfig instead of CompletionConfig
* precommit
* Evaluation: Uploading dataset concurrently (#461)
* fix: add threadpool based concurrency to speeden up langfuse dataset upload
* chore: fix precommit linting issues
* fix: cleanup and deleted CELERY.md
* chore: formatting

Co-authored-by: Akhilesh Negi <akhileshnegi.an3@gmail.com>

* adding provider input (#502)
* Documentation : repo enhancement (#496)
* Documentation : repo MDs enhancement and adding enhancement template
* Kaapi v1.0: Permissions Review and Authorization Cleanup (#501)
* Refactor dependencies and enhance AuthContext for non-optional organization and project attributes
* Refactor permission checks to require SUPERUSER role across multiple routes
* fix session
* Refactor routes to enhance AuthContext usage and enforce project permissions
* Refactor dependency imports and remove unused parameters across multiple files
* Refactor user model by removing UserOrganization and UserProjectOrg classes; update tests to use AuthContext for user-related operations
* precommit
* require project in llm call
* fix: update project attribute reference in CRUD operations

Co-authored-by: Nishika Yadav <89646695+nishika26@users.noreply.github.com>

* refactor: remove API key encryption and decryption functions from security module and tests (#507)
* API Key: remove API key encryption and decryption functions
* added depends as import

Co-authored-by: Nishika Yadav <89646695+nishika26@users.noreply.github.com>
Co-authored-by: Akhilesh Negi <akhileshnegi.an3@gmail.com>
Co-authored-by: Aviraj Gour <100823015+avirajsingh7@users.noreply.github.com>
Summary
Target issue is #460
Adding a threadpool-based concurrency model that allows up to 4 worker threads to take up the insert (input, output) pairs instead of one blocking process doing all the heavy lifting. This enhances the endpoint's ability to insert CSV files of up to 1000 line items within 60 seconds.
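A minimal sketch of the model described above, assuming an iterable of (input, output) pairs and a hypothetical `insert_pair` callable standing in for the real per-item insert:

```python
from concurrent.futures import ThreadPoolExecutor


def insert_pair(pair: tuple[str, str]) -> bool:
    # Placeholder for the real per-item insert (e.g. a Langfuse dataset-item call).
    return True


def insert_all(pairs: list[tuple[str, str]], max_workers: int = 4) -> int:
    """Insert pairs using up to 4 worker threads instead of one blocking loop."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        results = list(executor.map(insert_pair, pairs))
    return sum(results)
```

With fast per-item calls dominated by network latency, four threads roughly quadruple throughput over a sequential loop, which is what makes a 1000-row CSV feasible inside a 60-second window.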
Chore: Also added a Celery doc explaining how Celery is implemented in the repo. Largely unrelated to the core task.
Checklist
Before submitting a pull request, please ensure that you mark these tasks.

- Run `fastapi run --reload app/main.py` or `docker compose up` in the repository root and test.

Summary by CodeRabbit
Refactor
Tests