19 changes: 11 additions & 8 deletions backend/app/api/docs/collections/create.md
Expand Up @@ -7,10 +7,12 @@ pipeline:
* Create an OpenAI [Vector
Store](https://platform.openai.com/docs/api-reference/vector-stores)
based on those Files.
* Attach the Vector Store to an OpenAI
* [To be deprecated] Attach the Vector Store to an OpenAI
[Assistant](https://platform.openai.com/docs/api-reference/assistants). Use
parameters in the request body relevant to an Assistant to flesh out
its configuration.
its configuration. Note that an assistant is only created when you pass both
"model" and "instructions" in the request body; otherwise, only a vector store is
created from the given documents.
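
The rule in the note above can be sketched as a small predicate. It mirrors the `with_assistant` check in the route code of this change, where the second field is spelled `instructions`. `CreationRequestSketch` is a hypothetical stand-in for the real request model and carries only the two relevant fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CreationRequestSketch:
    """Hypothetical stand-in for the real request model (assumption)."""

    model: Optional[str] = None
    instructions: Optional[str] = None


def wants_assistant(request: CreationRequestSketch) -> bool:
    # An Assistant is created only when both fields are present and non-empty;
    # in every other case the job creates a vector store only.
    return bool(
        getattr(request, "model", None) and getattr(request, "instructions", None)
    )
```

Passing only one of the two fields behaves the same as passing neither: the job falls back to a vector store without an Assistant.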

If any one of the OpenAI interactions fails, all OpenAI resources are
cleaned up. If a Vector Store is unable to be created, for example,
Expand All @@ -19,9 +21,10 @@ OpenAI. Failure can occur from OpenAI being down, or some parameter
value being invalid. It can also fail due to document types not being
accepted. This is especially true for PDFs that may not be parseable.

The immediate response from the endpoint is `collection_job` object which is
going to contain the collection "job ID", status and action type ("CREATE").
Once the collection has been created, information about the collection will
be returned to the user via the callback URL. If a callback URL is not provided,
clients can poll the `collection job info` endpoint with the `id` in the
`collection_job` object returned as it is the `job id`, to retrieve the same information.
Vector store/assistant creation happens asynchronously. The immediate response
from this endpoint is a `collection_job` object that contains
the collection "job ID" and status. Once the collection has been created,
information about the collection is returned to the user via the
callback URL. If a callback URL is not provided, clients can poll the
`collection job info` endpoint with the `job_id` to retrieve
information about the collection's creation.
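
When no callback URL is supplied, a client-side polling loop might look like the sketch below. It is a minimal illustration under stated assumptions: `fetch_job` is a hypothetical callable that performs the HTTP request to the collection job info endpoint and returns the job as a dict, and the terminal status spellings (`SUCCESSFUL`, `FAILED`) are inferred from the `CollectionJobStatus` names in the route code, not from a documented wire format.

```python
import time
from typing import Any, Callable, Dict


def wait_for_job(
    fetch_job: Callable[[str], Dict[str, Any]],  # hypothetical HTTP wrapper
    job_id: str,
    interval_s: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll the job until it reaches a terminal status or attempts run out."""
    terminal = {"SUCCESSFUL", "FAILED"}  # assumed spellings of terminal states
    for attempt in range(max_attempts):
        job = fetch_job(job_id)
        if str(job.get("status", "")).upper() in terminal:
            return job
        # Wait between polls, but not after the final attempt.
        if attempt < max_attempts - 1:
            sleep(interval_s)
    raise TimeoutError(f"job {job_id} did not finish after {max_attempts} polls")
```

Injecting `fetch_job` and `sleep` keeps the loop independent of any particular HTTP client and makes it easy to exercise without network access.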
9 changes: 5 additions & 4 deletions backend/app/api/docs/collections/delete.md
Expand Up @@ -7,7 +7,8 @@ Remove a collection from the platform. This is a two step process:
No action is taken on the documents themselves: the contents of the
documents that were a part of the collection remain unchanged, those
documents can still be accessed via the documents endpoints. The response from this
endpoint will be a `collection_job` object which will contain the collection `job ID`,
status and action type ("DELETE"). when you take the id returned and use the collection job
info endpoint, if the job is successful, you will get the status as successful and nothing will
be returned as the collection as it has been deleted and marked as deleted.
endpoint will be a `collection_job` object which will contain the collection `job_id` and
status. When you pass the returned ID to the collection job
info endpoint and the job has succeeded, the status is reported as successful.
Additionally, if a `callback_url` was provided in the request body,
you will receive a message indicating whether the deletion succeeded or failed.
3 changes: 0 additions & 3 deletions backend/app/api/docs/collections/docs.md

This file was deleted.

6 changes: 3 additions & 3 deletions backend/app/api/docs/collections/info.md
@@ -1,4 +1,4 @@
Retrieve detailed information about a specific collection by its ID from the collection table. Note that this endpoint CANNOT be used as a polling endpoint for collection creation because an entry will be made in the collection table only after the resource creation and association has been successful.

This endpoint returns metadata for the collection, including its project, organization,
Retrieve detailed information about a specific collection by its ID from the collection table. This endpoint returns the collection object, including its project, organization,
timestamps, and associated LLM service details (`llm_service_id`).

Additionally, if the `include_docs` query parameter is true (the default in the route), the response also includes the documents associated with the collection, paginated with `skip` and `limit`. Documents returned are not only stored by the AI platform, but also by OpenAI.
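
In the route code of this change, `include_docs`, `skip`, and `limit` are declared as query parameters, with `skip >= 0` and `0 < limit <= 100`. The sketch below builds a request URL under those bounds. `base_url` is an assumption: it is expected to already contain any API prefix the deployment uses, and the host in the usage note is a placeholder.

```python
from urllib.parse import urlencode
from uuid import UUID


def collection_info_url(
    base_url: str,  # assumed to include any API version prefix
    collection_id: UUID,
    include_docs: bool = True,
    skip: int = 0,
    limit: int = 100,
) -> str:
    """Build the GET URL for the collection info endpoint."""
    # Mirror the bounds declared on the route's Query parameters.
    if skip < 0:
        raise ValueError("skip must be >= 0")
    if not 0 < limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    query = urlencode(
        {"include_docs": str(include_docs).lower(), "skip": skip, "limit": limit}
    )
    return f"{base_url.rstrip('/')}/collections/{collection_id}?{query}"
```

For example, `collection_info_url("https://api.example.test", some_uuid, include_docs=False)` yields a URL that asks for the collection object without its documents.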
13 changes: 5 additions & 8 deletions backend/app/api/docs/collections/job_info.md
@@ -1,12 +1,9 @@
Retrieve information about a collection job by the collection job ID. This endpoint can be considered the polling endpoint for collection creation job. This endpoint provides detailed status and metadata for a specific collection job
in the AI platform. It is especially useful for:
Retrieve information about a collection job by the collection job ID. This endpoint provides detailed status and metadata for a specific collection job in the AI platform. It is especially useful for:

* Fetching the collection job object containing the ID which will be collection job id, collection ID, status of the job as well as error message.
* Fetching the collection job object, including the collection job ID, the current status, and the associated collection details.

* If a collection-creation job has finished successfully, this endpoint also fetches the associated collection details from the collection table, including:
- `llm_service_id`: The OpenAI assistant or model used for the collection.
- Collection metadata such as ID, project, organization, and timestamps.
- `llm_service_id`: The OpenAI assistant or model used for the collection.
- Collection metadata such as ID, project, organization, and timestamps.

* If the job of delete collection was successful, we will get the status as successful and nothing will be returned as collection.

* Containing a simplified error messages in the retrieved collection job object when a job has failed.
* If the delete-collection job succeeds, the status is set to “successful” and the `collection_key` contains the ID of the collection that has been deleted.
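
The cases above can be sketched as a single function that decides what goes into the job's collection field. This is a simplified, self-contained model of the branching in the `collection_job_info` route from this change: the `Job` dataclass, the enum values, and the `load_collection` callable are hypothetical stand-ins for the real models and CRUD layer.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Dict, Optional


class Action(str, Enum):
    CREATE = "CREATE"
    DELETE = "DELETE"


class Status(str, Enum):
    PENDING = "PENDING"
    SUCCESSFUL = "SUCCESSFUL"
    FAILED = "FAILED"


@dataclass
class Job:
    """Hypothetical, reduced view of a collection job (assumption)."""

    action_type: Action
    status: Status
    collection_id: Optional[str] = None


def collection_field(
    job: Job, load_collection: Callable[[str], Dict[str, Any]]
) -> Optional[Dict[str, Any]]:
    # Without a collection ID there is nothing to attach.
    if not job.collection_id:
        return None
    # Successful create: attach the full collection record.
    if job.action_type is Action.CREATE and job.status is Status.SUCCESSFUL:
        return load_collection(job.collection_id)
    # Delete: attach only the ID of the collection the job targets.
    if job.action_type is Action.DELETE:
        return {"id": job.collection_id}
    return None
```

Note that, as in the route code, the delete branch does not check the job status, so the collection ID is attached for pending and failed delete jobs as well.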
4 changes: 4 additions & 0 deletions backend/app/api/docs/collections/list.md
@@ -1,2 +1,6 @@
List _active_ collections -- collections that have been created but
not deleted

If a vector store was created, `llm_service_name` and `llm_service_id` in the response denote the name of the vector store (e.g. 'openai vector store') and its ID.

[To be deprecated] If an assistant was created, `llm_service_name` and `llm_service_id` in the response denote the name of the model used in the assistant (e.g. 'gpt-4o') and the assistant ID.
36 changes: 23 additions & 13 deletions backend/app/api/routes/collection_job.py
Expand Up @@ -10,7 +10,12 @@
CollectionCrud,
CollectionJobCrud,
)
from app.models import CollectionJobStatus, CollectionJobPublic, CollectionActionType
from app.models import (
CollectionJobStatus,
CollectionIDPublic,
CollectionActionType,
CollectionJobPublic,
)
from app.models.collection import CollectionPublic
from app.utils import APIResponse, load_description
from app.services.collections.helpers import extract_error_message
Expand All @@ -21,7 +26,7 @@


@router.get(
"/info/jobs/{job_id}",
"/jobs/{job_id}",
description=load_description("collections/job_info.md"),
response_model=APIResponse[CollectionJobPublic],
)
Expand All @@ -35,16 +40,21 @@ def collection_job_info(

job_out = CollectionJobPublic.model_validate(collection_job)

if (
collection_job.status == CollectionJobStatus.SUCCESSFUL
and collection_job.action_type == CollectionActionType.CREATE
and collection_job.collection_id
):
collection_crud = CollectionCrud(session, current_user.project_id)
collection = collection_crud.read_one(collection_job.collection_id)
job_out.collection = CollectionPublic.model_validate(collection)

if collection_job.status == CollectionJobStatus.FAILED and job_out.error_message:
job_out.error_message = extract_error_message(job_out.error_message)
if collection_job.collection_id:
if (
collection_job.action_type == CollectionActionType.CREATE
and collection_job.status == CollectionJobStatus.SUCCESSFUL
):
collection_crud = CollectionCrud(session, current_user.project_id)
collection = collection_crud.read_one(collection_job.collection_id)
job_out.collection = CollectionPublic.model_validate(collection)

elif collection_job.action_type == CollectionActionType.DELETE:
job_out.collection = CollectionIDPublic(id=collection_job.collection_id)

if collection_job.status == CollectionJobStatus.FAILED:
raw_error = getattr(collection_job, "error_message", None)
error_message = extract_error_message(raw_error)
job_out.error_message = error_message

return APIResponse.success_response(data=job_out)
153 changes: 90 additions & 63 deletions backend/app/api/routes/collections.py
@@ -1,12 +1,10 @@
import inspect
import logging
from uuid import UUID
from typing import List

from fastapi import APIRouter, Query
from fastapi import APIRouter, Query, Body
from fastapi import Path as FastPath


from app.api.deps import SessionDep, CurrentUserOrgProject
from app.crud import (
CollectionCrud,
Expand All @@ -18,28 +16,65 @@
CollectionJobStatus,
CollectionActionType,
CollectionJobCreate,
CollectionJobPublic,
CollectionJobImmediatePublic,
CollectionWithDocsPublic,
)
from app.models.collection import (
ResponsePayload,
CreationRequest,
CallbackRequest,
DeletionRequest,
CollectionPublic,
)
from app.utils import APIResponse, load_description
from app.services.collections.helpers import extract_error_message
from app.services.collections import (
create_collection as create_service,
delete_collection as delete_service,
)


logger = logging.getLogger(__name__)

router = APIRouter(prefix="/collections", tags=["collections"])
collection_callback_router = APIRouter()


@collection_callback_router.post(
"{$callback_url}",
name="collection_callback",
)
def collection_callback_notification(body: APIResponse[CollectionJobPublic]):
"""
Callback endpoint specification for collection creation/deletion.

The callback will receive:
- On success: APIResponse with success=True and data containing CollectionJobPublic
- On failure: APIResponse with success=False and error message
- metadata field will always be included if provided in the request
"""
...


@router.get(
"/",
description=load_description("collections/list.md"),
response_model=APIResponse[List[CollectionPublic]],
)
def list_collections(
session: SessionDep,
current_user: CurrentUserOrgProject,
):
collection_crud = CollectionCrud(session, current_user.project_id)
rows = collection_crud.read_all()

return APIResponse.success_response(rows)


@router.post(
"/create",
"/",
description=load_description("collections/create.md"),
response_model=APIResponse[CollectionJobImmediatePublic],
callbacks=collection_callback_router.routes,
)
def create_collection(
session: SessionDep,
Expand All @@ -55,110 +90,102 @@ def create_collection(
)
)

this = inspect.currentframe()
route = router.url_path_for(this.f_code.co_name)
payload = ResponsePayload(
status="processing", route=route, key=str(collection_job.id)
# True iff both model and instructions were provided in the request body
with_assistant = bool(
getattr(request, "model", None) and getattr(request, "instructions", None)
)

create_service.start_job(
db=session,
request=request,
payload=payload,
collection_job_id=collection_job.id,
project_id=current_user.project_id,
organization_id=current_user.organization_id,
with_assistant=with_assistant,
)

return APIResponse.success_response(collection_job)
metadata = None
if not with_assistant:
metadata = {
"note": (
"This job will create a vector store only (no Assistant). "
"Assistant creation happens when both 'model' and 'instructions' are included."
)
}

return APIResponse.success_response(
CollectionJobImmediatePublic.model_validate(collection_job), metadata=metadata
)


@router.post(
"/delete",
@router.delete(
"/{collection_id}",
description=load_description("collections/delete.md"),
response_model=APIResponse[CollectionJobImmediatePublic],
callbacks=collection_callback_router.routes,
)
def delete_collection(
session: SessionDep,
current_user: CurrentUserOrgProject,
request: DeletionRequest,
collection_id: UUID = FastPath(description="Collection to delete"),
request: CallbackRequest | None = Body(default=None),
):
collection_crud = CollectionCrud(session, current_user.project_id)
collection = collection_crud.read_one(request.collection_id)
_ = CollectionCrud(session, current_user.project_id).read_one(collection_id)

deletion_request = DeletionRequest(
collection_id=collection_id,
callback_url=request.callback_url if request else None,
)

collection_job_crud = CollectionJobCrud(session, current_user.project_id)
collection_job = collection_job_crud.create(
CollectionJobCreate(
action_type=CollectionActionType.DELETE,
project_id=current_user.project_id,
status=CollectionJobStatus.PENDING,
collection_id=collection.id,
collection_id=collection_id,
)
)

this = inspect.currentframe()
route = router.url_path_for(this.f_code.co_name)
payload = ResponsePayload(
status="processing", route=route, key=str(collection_job.id)
)

delete_service.start_job(
db=session,
request=request,
payload=payload,
collection=collection,
request=deletion_request,
collection_job_id=collection_job.id,
project_id=current_user.project_id,
organization_id=current_user.organization_id,
)

return APIResponse.success_response(collection_job)
return APIResponse.success_response(
CollectionJobImmediatePublic.model_validate(collection_job)
)


@router.get(
"/info/{collection_id}",
"/{collection_id}",
description=load_description("collections/info.md"),
response_model=APIResponse[CollectionPublic],
response_model=APIResponse[CollectionWithDocsPublic],
)
def collection_info(
session: SessionDep,
current_user: CurrentUserOrgProject,
collection_id: UUID = FastPath(description="Collection to retrieve"),
include_docs: bool = Query(
True,
description="If true, include documents linked to this collection",
),
skip: int = Query(0, ge=0),
limit: int = Query(100, gt=0, le=100),
):
collection_crud = CollectionCrud(session, current_user.project_id)
collection = collection_crud.read_one(collection_id)

return APIResponse.success_response(collection)

collection_with_docs = CollectionWithDocsPublic.model_validate(collection)

@router.get(
"/list",
description=load_description("collections/list.md"),
response_model=APIResponse[List[CollectionPublic]],
)
def list_collections(
session: SessionDep,
current_user: CurrentUserOrgProject,
):
collection_crud = CollectionCrud(session, current_user.project_id)
rows = collection_crud.read_all()
if include_docs:
document_collection_crud = DocumentCollectionCrud(session)
docs = document_collection_crud.read(collection, skip, limit)
collection_with_docs.documents = [
DocumentPublic.model_validate(doc) for doc in docs
]

return APIResponse.success_response(rows)


@router.post(
"/docs/{collection_id}",
description=load_description("collections/docs.md"),
response_model=APIResponse[List[DocumentPublic]],
)
def collection_documents(
session: SessionDep,
current_user: CurrentUserOrgProject,
collection_id: UUID = FastPath(description="Collection to retrieve"),
skip: int = Query(0, ge=0),
limit: int = Query(100, gt=0, le=100),
):
collection_crud = CollectionCrud(session, current_user.project_id)
document_collection_crud = DocumentCollectionCrud(session)
collection = collection_crud.read_one(collection_id)
data = document_collection_crud.read(collection, skip, limit)
return APIResponse.success_response(data)
return APIResponse.success_response(collection_with_docs)