
Evaluation: Uploading dataset concurrently#461

Merged
Prajna1999 merged 6 commits into main from refactor/lf-dataset-upload on Dec 18, 2025

Conversation


@Prajna1999 Prajna1999 (Collaborator) commented Dec 2, 2025

Summary

Target issue is #460

Adds a thread-pool-based concurrency model that lets up to 4 worker threads handle the (input, output) pair inserts, instead of one blocking process doing all the heavy lifting. This improves the endpoint's ability to insert CSV files of up to 1,000 line items within 60 seconds.
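A minimal sketch of this pattern, assuming a stand-in for the actual Langfuse insert (the names `create_item` and `upload_pairs` are illustrative, not the PR's code):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


# Hypothetical stand-in for the per-item Langfuse insert call.
def create_item(pair: tuple[str, str]) -> bool:
    input_text, output_text = pair
    return bool(input_text and output_text)


def upload_pairs(pairs: list[tuple[str, str]], max_workers: int = 4) -> int:
    """Upload (input, output) pairs concurrently; return the success count."""
    uploaded = 0
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(create_item, pair) for pair in pairs]
        for future in as_completed(futures):
            try:
                if future.result():
                    uploaded += 1
            except Exception:
                # A failure in one item must not block the others.
                pass
    return uploaded


print(upload_pairs([("q1", "a1"), ("q2", "a2"), ("", "a3")]))  # → 2
```

With a single flush after the pool drains (as the PR does), the number of network round-trips no longer scales with the item count.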

Chore: Also added a Celery doc explaining how Celery is implemented in the repo. Largely unrelated to the core task.

Checklist

Before submitting a pull request, please ensure that you complete these tasks.

  • Ran fastapi run --reload app/main.py or docker compose up in the repository root and tested.
  • If you've fixed a bug or added code, ensure it is covered by test cases.

Summary by CodeRabbit

  • Refactor

    • Optimized dataset uploads to Langfuse with concurrent processing for improved performance.
    • Enhanced error handling during upload operations.
  • Tests

    • Updated test assertions to reflect optimized upload behavior.


@coderabbitai

coderabbitai bot commented Dec 2, 2025

Walkthrough

This change implements concurrent uploading of dataset items to Langfuse by replacing sequential item creation with ThreadPoolExecutor-based parallel processing. An inner helper function encapsulates item creation and error handling, and flush behavior is consolidated into a single final call instead of per-item flushing.

Changes

Cohort / File(s) | Summary

Langfuse concurrent upload implementation
backend/app/crud/evaluations/langfuse.py
Introduces the ThreadPoolExecutor and as_completed imports; replaces the sequential loop with concurrent task submission via an upload_tasks list; adds an inner upload_item helper for Langfuse item creation and error handling; removes per-item flush calls and consolidates to a single final flush; derives langfuse_dataset_id directly from dataset.id.

Langfuse test updates
backend/app/tests/crud/evaluations/test_langfuse.py
Updates assertions to expect a single final flush instead of per-item flushing; reduces the expected flush call count from 4 to 1 in the successful-upload and single-duplication test scenarios; item creation counts and metadata verification logic are unchanged.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Concurrency correctness: Verify ThreadPoolExecutor usage, exception handling in concurrent context, and as_completed iteration logic
  • Flush behavior: Confirm that single final flush after all concurrent uploads is functionally equivalent to per-item flushes and doesn't introduce race conditions or data consistency issues
  • Error handling: Ensure errors from individual item uploads are properly captured and counted in concurrent execution
  • Test alignment: Validate that test assertions accurately reflect the new flush semantics across all upload scenarios

Possibly related issues

Poem

🐰 Threads hop in parallel now,
No more single-file waiting—
Many uploads dance at once,
Flush once when all are done!
Speed hops forward, swift and bright. 🚀

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name | Status | Explanation
Description check | ✅ Passed | Check skipped; CodeRabbit's high-level summary is enabled.
Title check | ✅ Passed | The title "Evaluation: Uploading dataset concurrently" directly aligns with the main change of implementing concurrent dataset uploads using ThreadPoolExecutor.
Docstring coverage | ✅ Passed | Docstring coverage is 83.33%, which meets the required threshold of 80.00%.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (4)
backend/app/crud/evaluations/langfuse.py (3)

251-270: Add return type annotation to inner function.

Per coding guidelines, type hints should be used throughout. The upload_item helper is missing its return type annotation.

-    def upload_item(item: dict[str, str], duplicate_num: int):
+    def upload_item(item: dict[str, str], duplicate_num: int) -> bool:

Ensure the -> bool annotation is preserved when Black reformats the file.


285-285: Consider making max_workers configurable.

The hardcoded max_workers=4 may not be optimal for all environments. Consider exposing this as a configuration parameter or deriving it from settings (similar to CELERY_WORKER_CONCURRENCY).

+from app.core.config import settings
+
+# In function or at module level
+MAX_UPLOAD_WORKERS = getattr(settings, "LANGFUSE_UPLOAD_WORKERS", 4)
+
 # Then use:
-        with ThreadPoolExecutor(max_workers=4) as executor:
+        with ThreadPoolExecutor(max_workers=MAX_UPLOAD_WORKERS) as executor:

287-290: Simplify by using executor.map or list comprehension with submit.

The current pattern of appending futures to a list in a loop can be simplified.

-            futures = []
-            for item, dup_num in upload_tasks:
-                future = executor.submit(upload_item, item, dup_num)
-                futures.append(future)
+            futures = [executor.submit(upload_item, item, dup_num) for item, dup_num in upload_tasks]
backend/CELERY_OVERVIEW.md (1)

15-25: Add language specifier to fenced code block.

The file structure code block is missing a language identifier, which triggers markdown lint warning MD040. Use text or plaintext for directory listings.

-```
+```text
 app/celery/
 ├── __init__.py              # Package initialization, exports celery_app
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 17f427e and cc015ec.

📒 Files selected for processing (2)
  • backend/CELERY_OVERVIEW.md (1 hunks)
  • backend/app/crud/evaluations/langfuse.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use type hints in Python code (Python 3.11+ project)

Files:

  • backend/app/crud/evaluations/langfuse.py
backend/app/crud/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Implement database access operations in backend/app/crud/

Files:

  • backend/app/crud/evaluations/langfuse.py
🧠 Learnings (2)
📚 Learning: 2025-10-08T12:05:01.317Z
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/celery/**/*.py : Keep Celery app configuration (priority queues, beat scheduler, workers) under backend/app/celery/

Applied to files:

  • backend/CELERY_OVERVIEW.md
📚 Learning: 2025-10-08T12:05:01.317Z
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/celery/tasks/**/*.py : Define Celery tasks under backend/app/celery/tasks/

Applied to files:

  • backend/CELERY_OVERVIEW.md
🧬 Code graph analysis (1)
backend/app/crud/evaluations/langfuse.py (1)
backend/app/core/langfuse/langfuse.py (1)
  • flush (108-109)
🪛 GitHub Actions: AI Platform CI
backend/app/crud/evaluations/langfuse.py

[error] 1-1: Trailing whitespace detected and removed by pre-commit hook 'trailing-whitespace'. Re-run pre-commit to finalize changes.


[error] 1-1: Black formatter reformatted the file. Re-run pre-commit to commit the changes.

🪛 LanguageTool
backend/CELERY_OVERVIEW.md

[uncategorized] ~171-~171: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...Identical to high priority but uses the low priority queue get_task_status() (lines 7...

(EN_COMPOUND_ADJECTIVE_INTERNAL)


[grammar] ~555-~555: Use a hyphen to join words.
Context: ...vice** (app/services/llm/) - Uses high priority queue for real-time API calls -...

(QB_NEW_EN_HYPHEN)

🪛 markdownlint-cli2 (0.18.1)
backend/CELERY_OVERVIEW.md

15-15: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔇 Additional comments (2)
backend/CELERY_OVERVIEW.md (1)

1-588: Documentation looks comprehensive and well-structured.

This is a valuable addition that clearly explains the Celery architecture, queue priorities, task routing, and best practices. The end-to-end flow example and integration patterns are particularly helpful for onboarding developers. Based on learnings, the file structure aligns with project conventions for Celery configuration under backend/app/celery/.

backend/app/crud/evaluations/langfuse.py (1)

283-295: Langfuse client thread-safety is properly handled.

The Langfuse Python SDK is thread-safe for concurrent create_dataset_item calls when using a single client instance, which your code does. The internal queue and background worker handle batching, and langfuse.flush() at line 298 correctly ensures all items are delivered before the function returns.

@Prajna1999 Prajna1999 requested a review from nishika26 December 12, 2025 04:22
@AkhileshNegi AkhileshNegi changed the title from "fix: add threadpool based concurrency to speeden up langfuse dataset upload functionality" to "Evaluation: Uploading dataset concurrently" Dec 12, 2025
@Prajna1999 Prajna1999 removed the request for review from kartpop December 17, 2025 11:11
@Prajna1999 Prajna1999 self-assigned this Dec 18, 2025
@Prajna1999 Prajna1999 added the bug Something isn't working label Dec 18, 2025

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
backend/app/tests/crud/evaluations/test_langfuse.py (3)

489-512: Missing flush verification in error handling test.

All other tests in this class verify flush behavior, but this error-handling test does not assert that flush was called despite item creation failures. If the new behavior guarantees a final flush even when errors occur, add an assertion to confirm it.

🔎 Add flush verification:
         # 2 succeeded out of 3
         assert total_items == 2
         assert mock_langfuse.create_dataset_item.call_count == 3
+        mock_langfuse.flush.assert_called_once()

386-393: Use factory pattern for test fixtures.

As per coding guidelines, test fixtures in backend/app/tests/ should use the factory pattern rather than returning static data directly.

Based on coding guidelines: "Use factory pattern for test fixtures in backend/app/tests/"

🔎 Refactor to factory pattern:
     @pytest.fixture
-    def valid_items(self):
+    def valid_items_factory(self):
         """Valid parsed items."""
-        return [
-            {"question": "What is 2+2?", "answer": "4"},
-            {"question": "What is the capital of France?", "answer": "Paris"},
-            {"question": "Who wrote Romeo and Juliet?", "answer": "Shakespeare"},
-        ]
+        def _factory():
+            return [
+                {"question": "What is 2+2?", "answer": "4"},
+                {"question": "What is the capital of France?", "answer": "Paris"},
+                {"question": "Who wrote Romeo and Juliet?", "answer": "Shakespeare"},
+            ]
+        return _factory

Then update test methods to call the factory:

def test_upload_dataset_to_langfuse_success(self, valid_items_factory):
    valid_items = valid_items_factory()
    ...

16-512: Add type hints to test methods and fixtures.

The coding guidelines require type hints on all function parameters and return values. Test methods should specify -> None, and the fixture should specify its return type.

Based on coding guidelines: "Always add type hints to all function parameters and return values in Python code"

Example for test methods:

def test_create_langfuse_dataset_run_success(self) -> None:
    ...

Example for fixture (after applying factory pattern):

from collections.abc import Callable

@pytest.fixture
def valid_items_factory(self) -> Callable[[], list[dict[str, str]]]:
    ...
🧹 Nitpick comments (3)
backend/app/tests/crud/evaluations/test_langfuse.py (3)

418-419: Prefer assert_called_once() for consistency.

These lines use flush.call_count == 1 while other flush assertions in this file use assert_called_once() (lines 80, 208, 298, 332, 371, 380). The latter is more idiomatic and provides clearer assertion failure messages.

🔎 Apply this diff for consistency:
-        # Verify flush was called once (final flush)
-        assert mock_langfuse.flush.call_count == 1
+        # Verify flush was called once (final flush)
+        mock_langfuse.flush.assert_called_once()

486-487: Prefer assert_called_once() for consistency.

Same as above—use assert_called_once() instead of comparing call_count for consistency with the rest of the file.

🔎 Apply this diff for consistency:
-        # final flush once
-        assert mock_langfuse.flush.call_count == 1
+        # final flush once
+        mock_langfuse.flush.assert_called_once()

383-512: Consider adding concurrency validation tests.

The PR introduces ThreadPoolExecutor-based concurrent uploads, but the current tests use synchronous mocks and don't validate thread-safety or concurrent execution behavior. While MagicMock is generally thread-safe, consider adding tests that verify the concurrent upload mechanism works correctly—for example, confirming that all items are processed even under concurrent execution or that errors in one thread don't block others.
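One way to sketch such a concurrency test, using a thread-safe MagicMock; `upload_all` and `fake_create` are hypothetical names mirroring the pattern described in the review, not the repository's actual test code:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from unittest.mock import MagicMock


def upload_all(client: MagicMock, items: list[dict], max_workers: int = 4) -> int:
    """Hypothetical upload helper mirroring the ThreadPoolExecutor pattern."""
    def upload_item(item: dict) -> bool:
        try:
            client.create_dataset_item(item)
            return True
        except Exception:
            return False  # an error in one thread must not block the others
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(upload_item, item) for item in items]
        successes = sum(f.result() for f in as_completed(futures))
    client.flush()  # single final flush
    return successes


def fake_create(item: dict) -> None:
    # Fail only on the marked item, to verify error isolation.
    if item.get("bad"):
        raise RuntimeError("boom")


client = MagicMock()
client.create_dataset_item.side_effect = fake_create
items = [{"q": str(i)} for i in range(9)] + [{"bad": True}]

assert upload_all(client, items) == 9            # all good items processed
assert client.create_dataset_item.call_count == 10
client.flush.assert_called_once()                # flushed exactly once
```

Because MagicMock records calls under an internal lock, the call-count and flush assertions remain reliable even with multiple worker threads.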

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 08225a9 and 968895d.

📒 Files selected for processing (1)
  • backend/app/tests/crud/evaluations/test_langfuse.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Always add type hints to all function parameters and return values in Python code
Prefix all log messages with the function name in square brackets: logger.info(f"[function_name] Message {mask_string(sensitive_value)}")
Use Python 3.11+ with type hints throughout the codebase

Files:

  • backend/app/tests/crud/evaluations/test_langfuse.py
backend/app/tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use factory pattern for test fixtures in backend/app/tests/

Files:

  • backend/app/tests/crud/evaluations/test_langfuse.py
🧬 Code graph analysis (1)
backend/app/tests/crud/evaluations/test_langfuse.py (1)
backend/app/core/langfuse/langfuse.py (1)
  • flush (110-111)

@codecov

codecov bot commented Dec 18, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.


@Prajna1999 Prajna1999 merged commit fc46fa3 into main Dec 18, 2025
3 checks passed
@Prajna1999 Prajna1999 deleted the refactor/lf-dataset-upload branch December 18, 2025 11:34
@Prajna1999 Prajna1999 restored the refactor/lf-dataset-upload branch December 18, 2025 11:34
Prajna1999 added a commit that referenced this pull request Jan 12, 2026
* Enhancing swagger and redocs (#484)

* rearranging endpoints for swagger and redocs

* Claude: Next steps (#500)

* claude updates

* updated migration command

* making it concise

* add step to add typehint

* Add Langfuse observability to Unified API (#457)

* Add Langfuse observability to LLM execution methods

* Enhance observability decorator to validate Langfuse credentials before execution

* remove trace metadata

* precommit

* remove creds check

* Unified API: Add support for Kaapi Abstracted LLM Call (#498)

* Add Kaapi LLM parameters and completion config; implement transformation to native provider format

* Refine LLM API documentation and improve code formatting for clarity; enhance configuration handling for OpenAI provider

* add/fix tests

* Fix validation logic in map_kaapi_to_openai_params to prevent simultaneous setting of 'temperature' and 'reasoning' parameters

* Remove default value for 'model' in KaapiLLMParams to enforce explicit assignment

* Refactor KaapiLLMParams to enforce explicit reasoning levels; update mapping logic to handle reasoning and temperature conflicts with warnings

* Enhance LLM API documentation to clarify ad-hoc configuration parameters and warning handling for unsupported settings

* Refactor execute_job to use completion_config directly instead of config_blob.completion

* Refactor LLM provider interfaces to use NativeCompletionConfig instead of CompletionConfig

* precommit

* Evaluation: Uploading dataset concurrently (#461)

* fix: add threadpool based concurrency to speeden up langfuse dataset upload

* chore: fix precommit linting issues

* fix: cleanup and deleted CELERY.md

* chore: formatting

---------

Co-authored-by: Akhilesh Negi <akhileshnegi.an3@gmail.com>

* adding provider input (#502)

* Documentation : repo enhancement (#496)

* Documentation : repo MDs enhancement and adding enhancement template

* Kaapi v1.0: Permissions Review and Authorization Cleanup (#501)

* Refactor dependencies and enhance AuthContext for non-optional organization and project attributes

* Refactor permission checks to require SUPERUSER role across multiple routes

* fix session

* Refactor routes to enhance AuthContext usage and enforce project permissions

* Refactor dependency imports and remove unused parameters across multiple files

* Refactor user model by removing UserOrganization and UserProjectOrg classes; update tests to use AuthContext for user-related operations

* precommit

* require project in llm call

* fix: update project attribute reference in CRUD operations

---------

Co-authored-by: Nishika Yadav <89646695+nishika26@users.noreply.github.com>

* refactor: remove API key encryption and decryption functions from security module and tests (#507)

API Key: remove API key encryption and decryption functions

* added depends as import

---------

Co-authored-by: Nishika Yadav <89646695+nishika26@users.noreply.github.com>
Co-authored-by: Akhilesh Negi <akhileshnegi.an3@gmail.com>
Co-authored-by: Aviraj Gour <100823015+avirajsingh7@users.noreply.github.com>
@AkhileshNegi AkhileshNegi deleted the refactor/lf-dataset-upload branch April 2, 2026 04:02

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Evaluation: Optimize Upload Dataset to Langfuse Function to prevent timeout error

3 participants