
Evaluation: Uploading dataset concurrently#461

Merged
Prajna1999 merged 6 commits into main from refactor/lf-dataset-upload on Dec 18, 2025

Conversation


@Prajna1999 Prajna1999 (Collaborator) commented Dec 2, 2025

Summary

Target issue is #460

Adds a thread-pool-based concurrency model that lets up to 4 worker threads handle the (input, output) pair inserts, instead of one blocking process doing all the heavy lifting. This improves the endpoint's ability to insert CSV files of up to 1,000 line items within 60 seconds.
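A minimal sketch of this pattern, assuming a stand-in for the actual Langfuse insert (the names `create_item` and `upload_pairs` are illustrative, not the PR's code):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


# Hypothetical stand-in for the per-item Langfuse insert call.
def create_item(pair: tuple[str, str]) -> bool:
    input_text, output_text = pair
    return bool(input_text and output_text)


def upload_pairs(pairs: list[tuple[str, str]], max_workers: int = 4) -> int:
    """Upload (input, output) pairs concurrently; return the success count."""
    uploaded = 0
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(create_item, pair) for pair in pairs]
        for future in as_completed(futures):
            try:
                if future.result():
                    uploaded += 1
            except Exception:
                # A failure in one item must not block the others.
                pass
    return uploaded


print(upload_pairs([("q1", "a1"), ("q2", "a2"), ("", "a3")]))  # → 2
```

With a single flush after the pool drains (as the PR does), the number of network round-trips no longer scales with the item count.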

Chore: Also added a Celery doc explaining how Celery is implemented in the repo. Largely unrelated to the core task.

Checklist

Before submitting a pull request, please ensure that you complete these tasks.

  • Ran fastapi run --reload app/main.py or docker compose up in the repository root and tested.
  • If you've fixed a bug or added code, ensure it is covered by test cases.

Summary by CodeRabbit

  • Refactor

    • Optimized dataset uploads to Langfuse with concurrent processing for improved performance.
    • Enhanced error handling during upload operations.
  • Tests

    • Updated test assertions to reflect optimized upload behavior.


@coderabbitai

coderabbitai bot commented Dec 2, 2025

Walkthrough

This change implements concurrent uploading of dataset items to Langfuse by replacing sequential item creation with ThreadPoolExecutor-based parallel processing. An inner helper function encapsulates item creation and error handling, and flush behavior is consolidated into a single final call instead of per-item flushing.

Changes

Cohort / File(s) | Summary

Langfuse concurrent upload implementation
backend/app/crud/evaluations/langfuse.py
Introduces the ThreadPoolExecutor and as_completed imports; replaces the sequential loop with concurrent task submission via an upload_tasks list; adds an inner upload_item helper for Langfuse item creation and error handling; removes per-item flush calls and consolidates to a single final flush; derives langfuse_dataset_id directly from dataset.id.

Langfuse test updates
backend/app/tests/crud/evaluations/test_langfuse.py
Updates assertions to expect a single final flush instead of per-item flushing; reduces the expected flush call count from 4 to 1 in the successful-upload and single-duplication test scenarios; item creation counts and metadata verification logic are unchanged.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Concurrency correctness: Verify ThreadPoolExecutor usage, exception handling in concurrent context, and as_completed iteration logic
  • Flush behavior: Confirm that single final flush after all concurrent uploads is functionally equivalent to per-item flushes and doesn't introduce race conditions or data consistency issues
  • Error handling: Ensure errors from individual item uploads are properly captured and counted in concurrent execution
  • Test alignment: Validate that test assertions accurately reflect the new flush semantics across all upload scenarios

Possibly related issues

Poem

🐰 Threads hop in parallel now,
No more single-file waiting—
Many uploads dance at once,
Flush once when all are done!
Speed hops forward, swift and bright. 🚀

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name | Status | Explanation
Description check | ✅ Passed | Check skipped; CodeRabbit's high-level summary is enabled.
Title check | ✅ Passed | The title "Evaluation: Uploading dataset concurrently" directly aligns with the main change of implementing concurrent dataset uploads using ThreadPoolExecutor.
Docstring coverage | ✅ Passed | Docstring coverage is 83.33%, which meets the required threshold of 80.00%.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (4)
backend/app/crud/evaluations/langfuse.py (3)

251-270: Add return type annotation to inner function.

Per coding guidelines, type hints should be used throughout. The upload_item helper is missing its return type annotation.

-    def upload_item(item: dict[str, str], duplicate_num: int):
+    def upload_item(item: dict[str, str], duplicate_num: int) -> bool:

Ensure the -> bool annotation is preserved when Black reformats the file.


285-285: Consider making max_workers configurable.

The hardcoded max_workers=4 may not be optimal for all environments. Consider exposing this as a configuration parameter or deriving it from settings (similar to CELERY_WORKER_CONCURRENCY).

+from app.core.config import settings
+
+# In function or at module level
+MAX_UPLOAD_WORKERS = getattr(settings, "LANGFUSE_UPLOAD_WORKERS", 4)
+
 # Then use:
-        with ThreadPoolExecutor(max_workers=4) as executor:
+        with ThreadPoolExecutor(max_workers=MAX_UPLOAD_WORKERS) as executor:

287-290: Simplify by using executor.map or list comprehension with submit.

The current pattern of appending futures to a list in a loop can be simplified.

-            futures = []
-            for item, dup_num in upload_tasks:
-                future = executor.submit(upload_item, item, dup_num)
-                futures.append(future)
+            futures = [executor.submit(upload_item, item, dup_num) for item, dup_num in upload_tasks]
backend/CELERY_OVERVIEW.md (1)

15-25: Add language specifier to fenced code block.

The file structure code block is missing a language identifier, which triggers markdown lint warning MD040. Use text or plaintext for directory listings.

-```
+```text
 app/celery/
 ├── __init__.py              # Package initialization, exports celery_app
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 17f427e and cc015ec.

📒 Files selected for processing (2)
  • backend/CELERY_OVERVIEW.md (1 hunks)
  • backend/app/crud/evaluations/langfuse.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use type hints in Python code (Python 3.11+ project)

Files:

  • backend/app/crud/evaluations/langfuse.py
backend/app/crud/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Implement database access operations in backend/app/crud/

Files:

  • backend/app/crud/evaluations/langfuse.py
🧠 Learnings (2)
📚 Learning: 2025-10-08T12:05:01.317Z
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/celery/**/*.py : Keep Celery app configuration (priority queues, beat scheduler, workers) under backend/app/celery/

Applied to files:

  • backend/CELERY_OVERVIEW.md
📚 Learning: 2025-10-08T12:05:01.317Z
Learnt from: CR
Repo: ProjectTech4DevAI/ai-platform PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-10-08T12:05:01.317Z
Learning: Applies to backend/app/celery/tasks/**/*.py : Define Celery tasks under backend/app/celery/tasks/

Applied to files:

  • backend/CELERY_OVERVIEW.md
🧬 Code graph analysis (1)
backend/app/crud/evaluations/langfuse.py (1)
backend/app/core/langfuse/langfuse.py (1)
  • flush (108-109)
🪛 GitHub Actions: AI Platform CI
backend/app/crud/evaluations/langfuse.py

[error] 1-1: Trailing whitespace detected and removed by pre-commit hook 'trailing-whitespace'. Re-run pre-commit to finalize changes.


[error] 1-1: Black formatter reformatted the file. Re-run pre-commit to commit the changes.

🪛 LanguageTool
backend/CELERY_OVERVIEW.md

[uncategorized] ~171-~171: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...Identical to high priority but uses the low priority queue get_task_status() (lines 7...

(EN_COMPOUND_ADJECTIVE_INTERNAL)


[grammar] ~555-~555: Use a hyphen to join words.
Context: ...vice** (app/services/llm/) - Uses high priority queue for real-time API calls -...

(QB_NEW_EN_HYPHEN)

🪛 markdownlint-cli2 (0.18.1)
backend/CELERY_OVERVIEW.md

15-15: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔇 Additional comments (2)
backend/CELERY_OVERVIEW.md (1)

1-588: Documentation looks comprehensive and well-structured.

This is a valuable addition that clearly explains the Celery architecture, queue priorities, task routing, and best practices. The end-to-end flow example and integration patterns are particularly helpful for onboarding developers. Based on learnings, the file structure aligns with project conventions for Celery configuration under backend/app/celery/.

backend/app/crud/evaluations/langfuse.py (1)

283-295: Langfuse client thread-safety is properly handled.

The Langfuse Python SDK is thread-safe for concurrent create_dataset_item calls when using a single client instance, which your code does. The internal queue and background worker handle batching, and langfuse.flush() at line 298 correctly ensures all items are delivered before the function returns.

@Prajna1999 Prajna1999 requested a review from nishika26 December 12, 2025 04:22
@AkhileshNegi AkhileshNegi changed the title from "fix: add threadpool based concurrency to speeden up langfuse dataset upload functionality" to "Evaluation: Uploading dataset concurrently" Dec 12, 2025
@Prajna1999 Prajna1999 removed the request for review from kartpop December 17, 2025 11:11
@Prajna1999 Prajna1999 self-assigned this Dec 18, 2025
@Prajna1999 Prajna1999 added the bug Something isn't working label Dec 18, 2025

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
backend/app/tests/crud/evaluations/test_langfuse.py (3)

489-512: Missing flush verification in error handling test.

All other tests in this class verify flush behavior, but this error-handling test does not assert that flush was called despite item creation failures. If the new behavior guarantees a final flush even when errors occur, add an assertion to confirm it.

🔎 Add flush verification:
         # 2 succeeded out of 3
         assert total_items == 2
         assert mock_langfuse.create_dataset_item.call_count == 3
+        mock_langfuse.flush.assert_called_once()

386-393: Use factory pattern for test fixtures.

As per coding guidelines, test fixtures in backend/app/tests/ should use the factory pattern rather than returning static data directly.

Based on coding guidelines: "Use factory pattern for test fixtures in backend/app/tests/"

🔎 Refactor to factory pattern:
     @pytest.fixture
-    def valid_items(self):
+    def valid_items_factory(self):
         """Valid parsed items."""
-        return [
-            {"question": "What is 2+2?", "answer": "4"},
-            {"question": "What is the capital of France?", "answer": "Paris"},
-            {"question": "Who wrote Romeo and Juliet?", "answer": "Shakespeare"},
-        ]
+        def _factory():
+            return [
+                {"question": "What is 2+2?", "answer": "4"},
+                {"question": "What is the capital of France?", "answer": "Paris"},
+                {"question": "Who wrote Romeo and Juliet?", "answer": "Shakespeare"},
+            ]
+        return _factory

Then update test methods to call the factory:

def test_upload_dataset_to_langfuse_success(self, valid_items_factory):
    valid_items = valid_items_factory()
    ...

16-512: Add type hints to test methods and fixtures.

The coding guidelines require type hints on all function parameters and return values. Test methods should specify -> None, and the fixture should specify its return type.

Based on coding guidelines: "Always add type hints to all function parameters and return values in Python code"

Example for test methods:

def test_create_langfuse_dataset_run_success(self) -> None:
    ...

Example for fixture (after applying factory pattern):

from collections.abc import Callable

@pytest.fixture
def valid_items_factory(self) -> Callable[[], list[dict[str, str]]]:
    ...
🧹 Nitpick comments (3)
backend/app/tests/crud/evaluations/test_langfuse.py (3)

418-419: Prefer assert_called_once() for consistency.

These lines use flush.call_count == 1 while other flush assertions in this file use assert_called_once() (lines 80, 208, 298, 332, 371, 380). The latter is more idiomatic and provides clearer assertion failure messages.

🔎 Apply this diff for consistency:
-        # Verify flush was called once (final flush)
-        assert mock_langfuse.flush.call_count == 1
+        # Verify flush was called once (final flush)
+        mock_langfuse.flush.assert_called_once()

486-487: Prefer assert_called_once() for consistency.

Same as above—use assert_called_once() instead of comparing call_count for consistency with the rest of the file.

🔎 Apply this diff for consistency:
-        # final flush once
-        assert mock_langfuse.flush.call_count == 1
+        # final flush once
+        mock_langfuse.flush.assert_called_once()

383-512: Consider adding concurrency validation tests.

The PR introduces ThreadPoolExecutor-based concurrent uploads, but the current tests use synchronous mocks and don't validate thread-safety or concurrent execution behavior. While MagicMock is generally thread-safe, consider adding tests that verify the concurrent upload mechanism works correctly—for example, confirming that all items are processed even under concurrent execution or that errors in one thread don't block others.
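One way to sketch such a concurrency test, using a thread-safe MagicMock; `upload_all` and `fake_create` are hypothetical names mirroring the pattern described in the review, not the repository's actual test code:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from unittest.mock import MagicMock


def upload_all(client: MagicMock, items: list[dict], max_workers: int = 4) -> int:
    """Hypothetical upload helper mirroring the ThreadPoolExecutor pattern."""
    def upload_item(item: dict) -> bool:
        try:
            client.create_dataset_item(item)
            return True
        except Exception:
            return False  # an error in one thread must not block the others
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(upload_item, item) for item in items]
        successes = sum(f.result() for f in as_completed(futures))
    client.flush()  # single final flush
    return successes


def fake_create(item: dict) -> None:
    # Fail only on the marked item, to verify error isolation.
    if item.get("bad"):
        raise RuntimeError("boom")


client = MagicMock()
client.create_dataset_item.side_effect = fake_create
items = [{"q": str(i)} for i in range(9)] + [{"bad": True}]

assert upload_all(client, items) == 9            # all good items processed
assert client.create_dataset_item.call_count == 10
client.flush.assert_called_once()                # flushed exactly once
```

Because MagicMock records calls under an internal lock, the call-count and flush assertions remain reliable even with multiple worker threads.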

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 08225a9 and 968895d.

📒 Files selected for processing (1)
  • backend/app/tests/crud/evaluations/test_langfuse.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Always add type hints to all function parameters and return values in Python code
Prefix all log messages with the function name in square brackets: logger.info(f"[function_name] Message {mask_string(sensitive_value)}")
Use Python 3.11+ with type hints throughout the codebase

Files:

  • backend/app/tests/crud/evaluations/test_langfuse.py
backend/app/tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use factory pattern for test fixtures in backend/app/tests/

Files:

  • backend/app/tests/crud/evaluations/test_langfuse.py
🧬 Code graph analysis (1)
backend/app/tests/crud/evaluations/test_langfuse.py (1)
backend/app/core/langfuse/langfuse.py (1)
  • flush (110-111)

@codecov

codecov bot commented Dec 18, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.


@Prajna1999 Prajna1999 merged commit fc46fa3 into main Dec 18, 2025
3 checks passed
@Prajna1999 Prajna1999 deleted the refactor/lf-dataset-upload branch December 18, 2025 11:34
@Prajna1999 Prajna1999 restored the refactor/lf-dataset-upload branch December 18, 2025 11:34
Prajna1999 added a commit that referenced this pull request Jan 12, 2026
* Enhancing swagger and redocs (#484)

* rearranging endpoints for swagger and redocs

* Claude: Next steps (#500)

* claude updates

* updated migration command

* making it concise

* add step to add typehint

* Add Langfuse observability to Unified API (#457)

* Add Langfuse observability to LLM execution methods

* Enhance observability decorator to validate Langfuse credentials before execution

* remove trace metadata

* precommit

* remove creds check

* Unified API: Add support for Kaapi Abstracted LLM Call (#498)

* Add Kaapi LLM parameters and completion config; implement transformation to native provider format

* Refine LLM API documentation and improve code formatting for clarity; enhance configuration handling for OpenAI provider

* add/fix tests

* Fix validation logic in map_kaapi_to_openai_params to prevent simultaneous setting of 'temperature' and 'reasoning' parameters

* Remove default value for 'model' in KaapiLLMParams to enforce explicit assignment

* Refactor KaapiLLMParams to enforce explicit reasoning levels; update mapping logic to handle reasoning and temperature conflicts with warnings

* Enhance LLM API documentation to clarify ad-hoc configuration parameters and warning handling for unsupported settings

* Refactor execute_job to use completion_config directly instead of config_blob.completion

* Refactor LLM provider interfaces to use NativeCompletionConfig instead of CompletionConfig

* precommit

* Evaluation: Uploading dataset concurrently (#461)

* fix: add threadpool based concurrency to speeden up langfuse dataset upload

* chore: fix precommit linting issues

* fix: cleanup and deleted CELERY.md

* chore: formatting

---------

Co-authored-by: Akhilesh Negi <akhileshnegi.an3@gmail.com>

* adding provider input (#502)

* Documentation : repo enhancement (#496)

* Documentation : repo MDs enhancement and adding enhancement template

* Kaapi v1.0: Permissions Review and Authorization Cleanup (#501)

* Refactor dependencies and enhance AuthContext for non-optional organization and project attributes

* Refactor permission checks to require SUPERUSER role across multiple routes

* fix session

* Refactor routes to enhance AuthContext usage and enforce project permissions

* Refactor dependency imports and remove unused parameters across multiple files

* Refactor user model by removing UserOrganization and UserProjectOrg classes; update tests to use AuthContext for user-related operations

* precommit

* require project in llm call

* fix: update project attribute reference in CRUD operations

---------

Co-authored-by: Nishika Yadav <89646695+nishika26@users.noreply.github.com>

* refactor: remove API key encryption and decryption functions from security module and tests (#507)

API Key: remove API key encryption and decryption functions

* added depends as import

---------

Co-authored-by: Nishika Yadav <89646695+nishika26@users.noreply.github.com>
Co-authored-by: Akhilesh Negi <akhileshnegi.an3@gmail.com>
Co-authored-by: Aviraj Gour <100823015+avirajsingh7@users.noreply.github.com>
@AkhileshNegi AkhileshNegi deleted the refactor/lf-dataset-upload branch April 2, 2026 04:02

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Evaluation: Optimize Upload Dataset to Langfuse Function to prevent timeout error

3 participants