Description
Build a model evaluation endpoint to assess the performance of fine-tuned models, compare results, and retain only the best-performing model.
Tasks
-
Implement logic to split data and convert the test set into JSONL format.
-
Generate predictions using one or more fine-tuned models on the test data.
-
Implement evaluation logic using the Matthews Correlation Coefficient (MCC) metric.
-
Identify the best-performing model based on evaluation results.
-
Automatically delete lower-performing models to maintain system efficiency.
-
Write test cases to validate the full evaluation and cleanup workflow.
Description
Build a model evaluation endpoint to assess the performance of fine-tuned models, compare results, and retain only the best-performing model.
Tasks
Implement logic to split data and convert the test set into JSONL format.
Generate predictions using one or more fine-tuned models on the test data.
Implement evaluation logic using the Matthews Correlation Coefficient (MCC) metric.
Identify the best-performing model based on evaluation results.
Automatically delete lower-performing models to maintain system efficiency.
Write test cases to validate the full evaluation and cleanup workflow.