AI Robustness Tests
- class datarobot.models.genai.insights_configuration.InsightsConfiguration
Bases:
APIObject
Configuration information for a specific insight.
- Variables:
  - insight_name (str) – The name of the insight.
  - insight_type (InsightTypes, optional) – The type of the insight.
  - deployment_id (Optional[str]) – The deployment ID the insight is applied to.
  - model_id (Optional[str]) – The model ID for the insight.
  - sidecar_model_metric_validation_id (Optional[str]) – The validation ID for the sidecar model metric.
  - custom_metric_id (Optional[str]) – The ID for a custom model metric.
  - evaluation_dataset_configuration_id (Optional[str]) – The ID for the evaluation dataset configuration.
  - cost_configuration_id (Optional[str]) – The ID for the cost configuration information.
  - result_unit (Optional[str]) – The unit of the result, for example "USD".
  - ootb_metric_id (Optional[str]) – The ID of the DataRobot-provided metric that does not require additional configuration.
  - ootb_metric_name (Optional[str]) – The name of the DataRobot-provided metric that does not require additional configuration.
  - guard_conditions (list[dict], optional) – The guard conditions to be used with the insight.
  - moderation_configuration (dict, optional) – The moderation configuration for the insight.
  - execution_status (Optional[str]) – The execution status of the insight.
  - error_message (Optional[str]) – The error message for the insight, for example if it is missing specific configuration for deployed models.
  - error_resolution (Optional[str]) – An indicator of which field must be edited to resolve an error state.
  - nemo_metric_id (Optional[str]) – The ID for the NEMO metric.
  - llm_id (Optional[str]) – The LLM ID for OOTB metrics that use LLMs.
  - custom_model_llm_validation_id (Optional[str]) – The ID for the custom model LLM validation, if a custom model LLM is used for OOTB metrics.
  - aggregation_types (list[str], optional) – The aggregation types to be used for the insight.
  - stage (Optional[str]) – The stage (prompt or response) at which the metric is calculated.
  - sidecar_model_metric_metadata (dict, optional) – Metadata specific to sidecar model metrics.
  - guard_template_id (Optional[str]) – The ID for the guard template that applies to the insight.
  - guard_configuration_id (Optional[str]) – The ID for the guard configuration that applies to the insight.
- to_dict()
- Return type:
InsightsConfigurationDict
- class datarobot.models.genai.insights_configuration.SupportedInsights
Bases:
InsightsConfiguration
Supported insights configurations for a given use case.
- classmethod list(use_case_id)
Get a list of all supported insights that can be used within a given Use Case.
- Parameters:
  - use_case_id (str) – The ID of the Use Case to list supported insights for.
- Returns:
insights – A list of supported insights.
- Return type:
list[InsightsConfiguration]
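A minimal usage sketch for listing the insights supported in a Use Case (assumes a configured DataRobot client; the ID below is a placeholder):

```python
from datarobot.models.genai.insights_configuration import SupportedInsights

use_case_id = "5f3e9c0d1a2b3c4d5e6f7a8b"  # placeholder Use Case ID

supported = SupportedInsights.list(use_case_id)
for insight in supported:
    print(insight.insight_name, insight.insight_type)
```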
- class datarobot.models.genai.insights_configuration.Insights
Bases:
APIObject
The insights configured for a playground.
- Variables:
playground_id (
str
) – The ID of the playground the insights are configured for.insights_configuration (
list[InsightsConfiguration]
) – The insights configuration for the playground.creation_date (
str
) – The date the insights were configured.creation_user_id (
str
) – The ID of the user who created the insights.last_update_date (
str
) – The date the insights were last updated.last_update_user_id (
str
) – The ID of the user who last updated the insights.tenant_id (
str
) – The tenant ID that applies to the record.
- classmethod get(playground, with_aggregation_types_only=False)
Get the insights configuration for a given playground.
- Parameters:
  - playground (str | Playground) – The ID of the playground to get insights for.
  - with_aggregation_types_only (Optional[bool]) – If True, only return the aggregation types for the insights.
- Returns:
insights – The insights configuration for the playground.
- Return type:
Insights
- classmethod create(playground, insights_configuration, use_case)
Create a new insights configuration for a given playground.
- Parameters:
  - playground (str) – The ID of the playground to create insights for.
  - insights_configuration (list[InsightsConfiguration]) – The insights configuration for the playground.
  - use_case (str) – The ID of the Use Case that the playground is a part of.
- Returns:
insights – The created insights configuration.
- Return type:
Insights
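A hedged sketch of configuring insights for a playground from the insights supported in its Use Case (placeholder IDs; assumes a configured client):

```python
from datarobot.models.genai.insights_configuration import Insights, SupportedInsights

use_case_id = "5f3e9c0d1a2b3c4d5e6f7a8b"    # placeholder Use Case ID
playground_id = "5f3e9c0d1a2b3c4d5e6f7a8c"  # placeholder playground ID

# Apply every insight supported by the Use Case to the playground.
supported = SupportedInsights.list(use_case_id)
Insights.create(
    playground=playground_id,
    insights_configuration=supported,
    use_case=use_case_id,
)

# Read the configuration back later.
insights = Insights.get(playground_id)
for config in insights.insights_configuration:
    print(config.insight_name, config.execution_status)
```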
- class datarobot.models.genai.cost_metric_configurations.LLMCostConfiguration
Bases:
APIObject
Cost configuration for a specific LLM model, used for cost metric calculation. The price per token is the price divided by the reference token count.
- Variables:
  - output_token_price (float)
  - reference_output_token_count (int)
  - input_token_price (float)
  - reference_input_token_count (int)
  - llm_id (str)
  - (str)
  - custom_model_llm_validation_id (Optional[str])
- to_dict()
- Return type:
Dict[str, Any]
- class datarobot.models.genai.cost_metric_configurations.CostMetricConfiguration
Bases:
APIObject
Cost metric configuration for a use case.
- Variables:
  - use_case_id (str)
  - (str)
  - cost_metric_configurations (List[LLMCostConfiguration])
- classmethod get(cost_metric_configuration_id)
Get cost metric configuration by ID.
- Return type:
CostMetricConfiguration
- update(cost_metric_configurations, name=None)
Update the cost configurations.
- Return type:
CostMetricConfiguration
- classmethod create(use_case_id, playground_id, name, cost_metric_configurations)
Create a new cost metric configuration.
- Return type:
CostMetricConfiguration
- delete()
Delete the cost metric configuration.
- Return type:
None
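A short sketch of fetching and removing a cost metric configuration (the ID is a placeholder; attribute names follow the variable lists above):

```python
from datarobot.models.genai.cost_metric_configurations import CostMetricConfiguration

cost_config = CostMetricConfiguration.get("5f3e9c0d1a2b3c4d5e6f7a8d")  # placeholder ID
for llm_cost in cost_config.cost_metric_configurations:
    print(llm_cost.llm_id, llm_cost.output_token_price)

cost_config.delete()  # remove the configuration when it is no longer needed
```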
- class datarobot.models.genai.evaluation_dataset_configuration.EvaluationDatasetConfiguration
Bases:
APIObject
An evaluation dataset configuration used to evaluate the performance of LLMs.
- Variables:
  - id (str) – The evaluation dataset configuration ID.
  - name (str) – The name of the evaluation dataset configuration.
  - size (int) – The size of the evaluation dataset (in bytes).
  - rows_count (int) – The row count of the evaluation dataset.
  - use_case_id (str) – The ID of the Use Case associated with the evaluation dataset configuration.
  - playground_id (Optional[str]) – The ID of the playground associated with the evaluation dataset configuration.
  - dataset_id (str) – The ID of the evaluation dataset.
  - dataset_name (str) – The name of the evaluation dataset.
  - prompt_column_name (str) – The name of the dataset column containing the prompt text.
  - response_column_name (Optional[str]) – The name of the dataset column containing the response text.
  - user_name (str) – The name of the user who created the evaluation dataset configuration.
  - correctness_enabled (Optional[bool]) – Whether correctness is enabled for the evaluation dataset configuration.
  - creation_user_id (str) – The ID of the user who created the evaluation dataset configuration.
  - creation_date (str) – The creation date of the evaluation dataset configuration (ISO-8601 formatted).
  - tenant_id (str) – The ID of the DataRobot tenant this evaluation dataset configuration belongs to.
  - execution_status (str) – The execution status of the evaluation dataset configuration.
  - error_message (Optional[str]) – The error message associated with the evaluation dataset configuration.
- classmethod get(id)
Get an evaluation dataset configuration by ID.
- Parameters:
  - id (str) – The evaluation dataset configuration ID to fetch.
- Returns:
evaluation_dataset_configuration – The evaluation dataset configuration.
- Return type:
EvaluationDatasetConfiguration
- classmethod list(use_case_id, playground_id=None, evaluation_dataset_configuration_id=None, offset=0, limit=100, sort=None, search=None, correctness_only=False, completed_only=False)
List all evaluation dataset configurations for a Use Case.
- Parameters:
  - use_case_id (str) – The ID of the Use Case that evaluation datasets are returned for.
  - playground_id (str, optional) – The ID of the playground that evaluation datasets are returned for. Default is None.
  - evaluation_dataset_configuration_id (Optional[str]) – The ID of the evaluation dataset configuration to fetch. Default is None.
  - offset (Optional[int]) – The offset to start fetching evaluation datasets from. Default is 0.
  - limit (Optional[int]) – The maximum number of evaluation datasets to return. Default is 100.
  - sort (Optional[str]) – The sort order for the returned evaluation datasets. Default is None, which sorts by creation time.
  - search (Optional[str]) – A search term that filters results so that only evaluation datasets with names matching the string are returned. Default is None.
  - correctness_only (Optional[bool]) – Whether to return only datasets with correctness enabled. Default is False.
  - completed_only (Optional[bool]) – Whether to return only completed datasets (particularly applicable to generated synthetic datasets, whose generation may still be in progress). Default is False.
- Returns:
evaluation_dataset_configurations – A list of evaluation dataset configurations.
- Return type:
List[EvaluationDatasetConfiguration]
- classmethod create(name, use_case_id, dataset_id, prompt_column_name, playground_id, is_synthetic_dataset=False, response_column_name=None)
Create an evaluation dataset configuration for an existing dataset.
- Parameters:
  - name (str) – The name of the evaluation dataset configuration.
  - use_case_id (str) – The ID of the Use Case that the evaluation dataset configuration will be added to.
  - dataset_id (str) – The ID of the evaluation dataset to associate with the configuration.
  - playground_id (str) – The ID of the playground that the evaluation dataset configuration will be added to.
  - prompt_column_name (str) – The name of the prompt column in the dataset.
  - response_column_name (Optional[str]) – The name of the response column in the dataset.
  - is_synthetic_dataset (bool) – Whether the evaluation dataset is synthetic.
- Returns:
evaluation_dataset_configuration – The created evaluation dataset configuration.
- Return type:
EvaluationDatasetConfiguration
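A sketch of registering an existing dataset as an evaluation dataset for a playground (the IDs and column names are placeholders):

```python
from datarobot.models.genai.evaluation_dataset_configuration import (
    EvaluationDatasetConfiguration,
)

eval_config = EvaluationDatasetConfiguration.create(
    name="Golden questions",
    use_case_id="5f3e9c0d1a2b3c4d5e6f7a8b",   # placeholder IDs
    playground_id="5f3e9c0d1a2b3c4d5e6f7a8c",
    dataset_id="5f3e9c0d1a2b3c4d5e6f7a8e",
    prompt_column_name="prompt",               # placeholder column names
    response_column_name="expected_response",
)

# List the configurations registered for the Use Case.
for config in EvaluationDatasetConfiguration.list(use_case_id="5f3e9c0d1a2b3c4d5e6f7a8b"):
    print(config.name, config.rows_count, config.execution_status)
```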
- update(name=None, dataset_id=None, prompt_column_name=None, response_column_name=None)
Update the evaluation dataset configuration.
- Parameters:
  - name (Optional[str]) – The name of the evaluation dataset configuration.
  - dataset_id (Optional[str]) – The ID of the dataset used in this configuration.
  - prompt_column_name (Optional[str]) – The name of the prompt column in the dataset.
  - response_column_name (Optional[str]) – The name of the response column in the dataset.
- Returns:
evaluation_dataset_configuration – The updated evaluation dataset configuration.
- Return type:
EvaluationDatasetConfiguration
- delete()
Delete the evaluation dataset configuration.
- Return type:
None
- class datarobot.models.genai.evaluation_dataset_metric_aggregation.EvaluationDatasetMetricAggregation
Bases:
APIObject
- Information about an evaluation dataset metric aggregation job.
This job runs a metric against LLMs using an evaluation dataset and aggregates the results.
- Variables:
  - llm_blueprint_id (str) – The LLM blueprint ID.
  - evaluation_dataset_configuration_id (str) – The evaluation dataset configuration ID.
  - ootb_dataset_name (str | None) – The name of the DataRobot-provided dataset that does not require additional configuration.
  - metric_name (str) – The name of the metric.
  - deployment_id (str | None) – A deployment ID if the evaluation was run against a deployment.
  - dataset_id (str | None) – The ID of the dataset used in the evaluation.
  - dataset_name (str | None) – The name of the dataset used in the evaluation.
  - chat_id (str) – The ID of the chat created to run the evaluation.
  - chat_name (str) – The name of the chat created to run the evaluation.
  - aggregation_value (float | List[Dict[str, float]]) – The aggregated metric result.
  - aggregation_type (AggregationType) – The type of aggregation used for the metric results.
  - creation_date (str) – The date the evaluation job was created.
  - creation_user_id (str) – The ID of the user who created the evaluation job.
  - tenant_id (str) – The ID of the tenant that owns the evaluation job.
- classmethod create(chat_name, llm_blueprint_ids, evaluation_dataset_configuration_id, insights_configuration)
Create a new evaluation dataset metric aggregation job. The job will run the specified metric for the specified LLM blueprint IDs using the prompt-response pairs in the evaluation dataset.
- Parameters:
  - chat_name (str) – The name of the chat that will be created to run the evaluation.
  - llm_blueprint_ids (List[str]) – The LLM blueprint IDs to evaluate.
  - evaluation_dataset_configuration_id (str) – The ID of the evaluation dataset configuration to use during the evaluation.
  - insights_configuration (List[InsightsConfiguration]) – The insights configurations to use during the evaluation.
- Returns:
The ID of the evaluation dataset metric aggregation job.
- Return type:
str
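A sketch of running an evaluation dataset metric aggregation, reusing the metric insights already configured for a playground (placeholder IDs):

```python
from datarobot.models.genai.evaluation_dataset_metric_aggregation import (
    EvaluationDatasetMetricAggregation,
)
from datarobot.models.genai.metric_insights import MetricInsights

playground_id = "5f3e9c0d1a2b3c4d5e6f7a8c"        # placeholder IDs
llm_blueprint_id = "5f3e9c0d1a2b3c4d5e6f7a90"
eval_dataset_config_id = "5f3e9c0d1a2b3c4d5e6f7a91"

insights_configuration = MetricInsights.list(playground_id)

job_id = EvaluationDatasetMetricAggregation.create(
    chat_name="Evaluation run",
    llm_blueprint_ids=[llm_blueprint_id],
    evaluation_dataset_configuration_id=eval_dataset_config_id,
    insights_configuration=insights_configuration,
)
print(job_id)  # ID of the asynchronous aggregation job
```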
- classmethod list(llm_blueprint_ids=None, chat_ids=None, evaluation_dataset_configuration_ids=None, metric_names=None, aggregation_types=None, current_configuration_only=False, sort=None, offset=0, limit=100, non_errored_only=True)
List evaluation dataset metric aggregations. The results will be filtered by the provided LLM blueprint IDs and chat IDs.
- Parameters:
  - llm_blueprint_ids (List[str]) – The LLM blueprint IDs to filter on.
  - chat_ids (List[str]) – The chat IDs to filter on.
  - evaluation_dataset_configuration_ids (List[str]) – The evaluation dataset configuration IDs to filter on.
  - metric_names (List[str]) – The metric names to filter on.
  - aggregation_types (List[str]) – The aggregation types to filter on.
  - current_configuration_only (Optional[bool]) – If True, only return results associated with the current configuration of the LLM blueprint. Defaults to False.
  - sort (Optional[str]) – The field to sort on. Defaults to None.
  - offset (Optional[int]) – The offset to start at. Defaults to 0.
  - limit (Optional[int]) – The maximum number of results to return. Defaults to 100.
  - non_errored_only (Optional[bool]) – If True, only return results that did not encounter an error. Defaults to True.
- Returns:
A list of evaluation dataset metric aggregations.
- Return type:
List[EvaluationDatasetMetricAggregation]
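Once the job completes, the aggregated results can be listed per blueprint, for example:

```python
from datarobot.models.genai.evaluation_dataset_metric_aggregation import (
    EvaluationDatasetMetricAggregation,
)

results = EvaluationDatasetMetricAggregation.list(
    llm_blueprint_ids=["5f3e9c0d1a2b3c4d5e6f7a90"],  # placeholder blueprint ID
    non_errored_only=True,
)
for result in results:
    print(result.metric_name, result.aggregation_type, result.aggregation_value)
```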
- classmethod delete(llm_blueprint_ids, chat_ids)
Delete the associated evaluation dataset metric aggregations. Either llm_blueprint_ids or chat_ids must be provided. If both are provided, only results matching both will be removed.
- Parameters:
  - llm_blueprint_ids (List[str]) – The LLM blueprint IDs to filter on.
  - chat_ids (List[str]) – The chat IDs to filter on.
- Return type:
None
- class datarobot.models.genai.synthetic_evaluation_dataset_generation.SyntheticEvaluationDataset
Bases:
APIObject
A synthetically generated evaluation dataset for LLMs.
- Variables:
  - response_column_name (str)
  - prompt_column_name (str)
  - dataset_id (str)
- classmethod create(llm_id, vector_database_id, llm_settings=None, dataset_name=None, language=None)
Create a synthetic evaluation dataset generation job. This will create a synthetic dataset to be used for evaluation of a language model.
- Parameters:
  - llm_id (str) – The ID of the LLM to use for dataset generation.
  - vector_database_id (str) – The ID of the vector database to use for dataset generation.
  - llm_settings (Optional[Dict[str, Union[bool, int, float, str]]]) – The settings to use for the language model used for dataset generation.
  - dataset_name (Optional[str]) – The name of the generated dataset.
  - language (Optional[str]) – The language to use for dataset generation.
- Returns:
SyntheticEvaluationDataset – A reference to the synthetic evaluation dataset that was created.
- Return type:
SyntheticEvaluationDataset
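A hedged sketch of generating a synthetic evaluation dataset from a vector database (the IDs are placeholders and the language value shown is illustrative):

```python
from datarobot.models.genai.synthetic_evaluation_dataset_generation import (
    SyntheticEvaluationDataset,
)

synthetic = SyntheticEvaluationDataset.create(
    llm_id="5f3e9c0d1a2b3c4d5e6f7a92",              # placeholder LLM ID
    vector_database_id="5f3e9c0d1a2b3c4d5e6f7a93",  # placeholder vector database ID
    dataset_name="Synthetic evaluation data",
    language="English",                             # illustrative value
)
print(synthetic.response_column_name)
```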
- class datarobot.models.genai.sidecar_model_metric.SidecarModelMetricValidation
Bases:
APIObject
A sidecar model metric validation for LLMs.
- Variables:
  - id (str) – The ID of the sidecar model metric validation.
  - prompt_column_name (str) – The name of the prompt column for the sidecar model.
  - deployment_id (str) – The ID of the deployment associated with the sidecar model.
  - model_id (str) – The ID of the sidecar model.
  - validation_status (str) – The status of the validation job.
  - deployment_access_data (dict) – Data used to access the deployment prediction server. Only available for deployments that passed validation. Dict fields:
    - prediction_api_url – The URL of the deployment prediction server.
    - datarobot_key – The first of two auth headers for the prediction server.
    - authorization_header – The second of two auth headers for the prediction server.
    - input_type – Either JSON or CSV, the input type that the model expects.
    - model_type – The target type of the deployed custom model.
  - tenant_id (str) – The ID of the tenant that created the sidecar model metric validation.
  - name (str) – The name of the sidecar model metric.
  - creation_date (str) – The date the sidecar model metric validation was created.
  - user_id (str) – The ID of the user that created the sidecar model metric validation.
  - deployment_name (str) – The name of the deployment associated with the sidecar model.
  - user_name (str) – The name of the user that created the sidecar model metric validation.
  - use_case_id (str) – The ID of the Use Case associated with the sidecar model metric validation.
  - prediction_timeout (int) – The timeout, in seconds, for the prediction API used in this sidecar model metric validation.
  - error_message (str) – Additional information for an errored validation.
  - citations_prefix_column_name (str) – The name of the prefix in the citations column for the sidecar model.
  - response_column_name (str) – The name of the response column for the sidecar model.
  - expected_response_column_name (str) – The name of the expected response column for the sidecar model.
  - target_column_name (str) – The name of the target column for the sidecar model.
- classmethod create(deployment_id, name, prediction_timeout, model_id=None, use_case_id=None, playground_id=None, prompt_column_name=None, target_column_name=None, response_column_name=None, citation_prefix_column_name=None, expected_response_column_name=None)
Create a sidecar model metric validation.
- Parameters:
  - deployment_id (str) – The ID of the deployment to validate.
  - name (str) – The name of the validation.
  - prediction_timeout (int) – The timeout, in seconds, for the prediction API used in this validation.
  - model_id (Optional[str]) – The ID of the model to validate.
  - use_case_id (Optional[str]) – The ID of the Use Case associated with the validation.
  - playground_id (Optional[str]) – The ID of the playground associated with the validation.
  - prompt_column_name (Optional[str]) – The name of the prompt column for the sidecar model.
  - target_column_name (Optional[str]) – The name of the target column for the sidecar model.
  - response_column_name (Optional[str]) – The name of the response column for the sidecar model.
  - citation_prefix_column_name (Optional[str]) – The name of the prefix for the citations column for the sidecar model.
  - expected_response_column_name (Optional[str]) – The name of the expected response column for the sidecar model.
- Returns:
The created sidecar model metric validation.
- Return type:
SidecarModelMetricValidation
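A sketch of validating a deployed sidecar metric model so it can be used as an insight (placeholder IDs and column names):

```python
from datarobot.models.genai.sidecar_model_metric import SidecarModelMetricValidation

validation = SidecarModelMetricValidation.create(
    deployment_id="5f3e9c0d1a2b3c4d5e6f7a94",  # placeholder deployment ID
    name="Toxicity sidecar metric",
    prediction_timeout=300,
    playground_id="5f3e9c0d1a2b3c4d5e6f7a8c",  # placeholder playground ID
    prompt_column_name="prompt",
    response_column_name="response",
)
print(validation.id, validation.validation_status)
```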
- classmethod list(use_case_ids=None, offset=None, limit=None, search=None, sort=None, completed_only=True, deployment_id=None, model_id=None, prompt_column_name=None, target_column_name=None, citation_prefix_column_name=None)
List sidecar model metric validations.
- Parameters:
  - use_case_ids (List[str], optional) – The IDs of the Use Cases to filter by.
  - offset (Optional[int]) – The number of records to skip.
  - limit (Optional[int]) – The maximum number of records to return.
  - search (Optional[str]) – The search string.
  - sort (Optional[str]) – The sort order.
  - completed_only (Optional[bool]) – Whether to return only completed validations.
  - deployment_id (Optional[str]) – The ID of the deployment to filter by.
  - model_id (Optional[str]) – The ID of the model to filter by.
  - prompt_column_name (Optional[str]) – The name of the prompt column to filter by.
  - target_column_name (Optional[str]) – The name of the target column to filter by.
  - citation_prefix_column_name (Optional[str]) – The name of the prefix for the citations column to filter by.
- Returns:
The list of sidecar model metric validations.
- Return type:
List[SidecarModelMetricValidation]
- classmethod get(validation_id)
Get a sidecar model metric validation by ID.
- Parameters:
  - validation_id (str) – The ID of the validation to get.
- Returns:
The sidecar model metric validation.
- Return type:
SidecarModelMetricValidation
- revalidate()
Revalidate the sidecar model metric validation.
- Returns:
The sidecar model metric validation.
- Return type:
SidecarModelMetricValidation
- update(name=None, prompt_column_name=None, target_column_name=None, response_column_name=None, expected_response_column_name=None, citation_prefix_column_name=None, deployment_id=None, model_id=None, prediction_timeout=None)
Update the sidecar model metric validation.
- Parameters:
  - name (Optional[str]) – The name of the validation.
  - prompt_column_name (Optional[str]) – The name of the prompt column for the sidecar model.
  - target_column_name (Optional[str]) – The name of the target column for the sidecar model.
  - response_column_name (Optional[str]) – The name of the response column for the sidecar model.
  - expected_response_column_name (Optional[str]) – The name of the expected response column for the sidecar model.
  - citation_prefix_column_name (Optional[str]) – The name of the prefix for the citations column for the sidecar model.
  - deployment_id (Optional[str]) – The ID of the deployment to validate.
  - model_id (Optional[str]) – The ID of the model to validate.
  - prediction_timeout (Optional[int]) – The timeout, in seconds, for the prediction API used in this validation.
- Returns:
The updated sidecar model metric validation.
- Return type:
SidecarModelMetricValidation
- delete()
Delete the sidecar model metric validation.
- Return type:
None
- class datarobot.models.genai.llm_test_configuration.LLMTestConfiguration
Bases:
APIObject
Metadata for a DataRobot GenAI LLM test configuration.
- Variables:
  - id (str) – The LLM test configuration ID.
  - name (str) – The LLM test configuration name.
  - description (str) – The LLM test configuration description.
  - dataset_evaluations (list[DatasetEvaluation]) – The dataset/insight combinations that make up the LLM test configuration.
  - llm_test_grading_criteria (LLMTestGradingCriteria) – The criteria used to grade the result of the LLM test configuration.
  - is_out_of_the_box_test_configuration (bool) – Whether this is an out-of-the-box configuration.
  - use_case_id (Optional[str]) – The ID of the linked Use Case, if any.
  - creation_date (Optional[str]) – The date the LLM test configuration was created, if any.
  - creation_user_id (Optional[str]) – The ID of the creating user, if any.
  - warnings (Optional[list[Dict[str, str]]]) – The warnings for the LLM test configuration, if any.
- classmethod create(name, dataset_evaluations, llm_test_grading_criteria, use_case=None, description=None)
Creates a new LLM test configuration.
- Parameters:
  - name (str) – The LLM test configuration name.
  - dataset_evaluations (list[DatasetEvaluationRequestDict]) – The LLM test dataset evaluation requests.
  - llm_test_grading_criteria (LLMTestGradingCriteria) – The LLM test grading criteria.
  - use_case (Optional[Union[UseCase, str]], optional) – The Use Case to link to the created LLM test configuration.
  - description (Optional[str]) – The LLM test configuration description. If None (the default), the description is an empty string.
- Returns:
llm_test_configuration – The created LLM test configuration.
- Return type:
LLMTestConfiguration
- classmethod get(llm_test_configuration)
Retrieve a single LLM Test configuration.
- Parameters:
  - llm_test_configuration (LLMTestConfiguration or str) – The LLM test configuration to retrieve, either an LLMTestConfiguration or an LLMTestConfiguration ID.
- Returns:
llm_test_configuration – The requested LLM Test configuration.
- Return type:
LLMTestConfiguration
- classmethod list(use_case=None, test_config_type=None)
List all LLM test configurations available to the user. If a Use Case is specified, results are restricted to only those configurations associated with that Use Case.
- Parameters:
  - use_case (Optional[UseCaseLike], optional) – Returns only those configurations associated with a particular Use Case, specified by either the Use Case name or ID.
  - test_config_type (Optional[LLMTestConfigurationType], optional) – Returns only configurations of the specified type. If not specified, custom test configurations are returned.
- Returns:
llm_test_configurations – Returns a list of LLM test configurations.
- Return type:
list[LLMTestConfiguration]
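A sketch of browsing the LLM test configurations linked to a Use Case and retrieving one by ID (placeholder ID):

```python
from datarobot.models.genai.llm_test_configuration import LLMTestConfiguration

configs = LLMTestConfiguration.list(use_case="5f3e9c0d1a2b3c4d5e6f7a8b")  # placeholder ID
for config in configs:
    print(config.id, config.name, config.is_out_of_the_box_test_configuration)

if configs:
    detail = LLMTestConfiguration.get(configs[0].id)
    print(detail.description)
```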
- update(name=None, description=None, dataset_evaluations=None, llm_test_grading_criteria=None)
Update the LLM test configuration.
- Parameters:
  - name (Optional[str]) – The new LLM test configuration name.
  - description (Optional[str]) – The new LLM test configuration description.
  - dataset_evaluations (list[DatasetEvaluationRequestDict], optional) – The new dataset evaluation requests.
  - llm_test_grading_criteria (LLMTestGradingCriteria, optional) – The new grading criteria.
- Returns:
llm_test_configuration – The updated LLM test configuration.
- Return type:
LLMTestConfiguration
- delete()
Delete a single LLM test configuration.
- Return type:
None
- class datarobot.models.genai.llm_test_configuration.LLMTestConfigurationSupportedInsights
Bases:
APIObject
Metadata for a DataRobot GenAI LLM test configuration supported insights.
- Variables:
  - supported_insight_configurations (list[InsightsConfiguration]) – The supported insights for LLM test configurations.
- classmethod list(use_case=None, playground=None)
List all supported insights for an LLM test configuration.
- Parameters:
  - use_case (Optional[Union[UseCase, str]], optional) – Returns only those supported insight configurations associated with a particular Use Case, specified by either the Use Case name or ID.
  - playground (Optional[Union[Playground, str]], optional) – Returns only those supported insight configurations associated with a particular playground, specified by either the Playground or the playground ID.
- Returns:
llm_test_configuration_supported_insights – Returns the supported insight configurations for the LLM test configuration.
- Return type:
LLMTestConfigurationSupportedInsights
- class datarobot.models.genai.llm_test_result.LLMTestResult
Bases:
APIObject
Metadata for a DataRobot GenAI LLM test result.
- Variables:
  - id (str) – The LLM test result ID.
  - llm_test_configuration_id (str) – The LLM test configuration ID associated with this LLM test result.
  - llm_test_configuration_name (str) – The LLM test configuration name associated with this LLM test result.
  - use_case_id (str) – The ID of the Use Case associated with this LLM test result.
  - llm_blueprint_id (str) – The ID of the LLM blueprint for this LLM test result.
  - llm_test_grading_criteria (LLMTestGradingCriteria) – The criteria used to grade the result of the LLM test configuration.
  - grading_result (GradingResult) – The overall grading result for the LLM test.
  - pass_percentage (float) – The percentage of insight evaluation results that passed the grading criteria.
  - execution_status (str) – The execution status of the job that evaluated the LLM test result.
  - insight_evaluation_result (list[InsightEvaluationResult]) – The results for the individual insights that make up the LLM test result.
  - creation_date (str) – The date of the LLM test result.
  - creation_user_id (str) – The ID of the user who executed the LLM test.
  - creation_user_name (str) – The name of the user who executed the LLM test.
- classmethod create(llm_test_configuration, llm_blueprint)
Create a new LLMTestResult. This executes the LLM test configuration using the specified LLM blueprint. To check the status of the LLM test, use the LLMTestResult.get method with the returned ID.
- Parameters:
  - llm_test_configuration (LLMTestConfiguration or str) – The LLM test configuration to execute, either an LLMTestConfiguration or the LLM test configuration ID.
  - llm_blueprint (LLMBlueprint or str) – The LLM blueprint to test, either an LLMBlueprint or the LLM blueprint ID.
- Returns:
llm_test_result – The created LLM test result.
- Return type:
LLMTestResult
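A sketch of executing an LLM test against a blueprint and polling the result (placeholder IDs):

```python
from datarobot.models.genai.llm_test_result import LLMTestResult

result = LLMTestResult.create(
    llm_test_configuration="5f3e9c0d1a2b3c4d5e6f7a95",  # placeholder IDs
    llm_blueprint="5f3e9c0d1a2b3c4d5e6f7a90",
)

# Execution is asynchronous; re-fetch until execution_status indicates completion.
result = LLMTestResult.get(result.id)
print(result.execution_status, result.grading_result, result.pass_percentage)
```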
- classmethod get(llm_test_result)
Retrieve a single LLM test result.
- Parameters:
  - llm_test_result (LLMTestResult or str) – The LLM test result to retrieve, specified by either an LLMTestResult or the LLM test result ID.
- Returns:
llm_test_result – The requested LLM test result.
- Return type:
LLMTestResult
- classmethod list(llm_test_configuration=None, llm_blueprint=None)
List all LLM test results available to the user. If the LLM test configuration or LLM blueprint is specified, results are restricted to only those LLM test results associated with the LLM test configuration or LLM blueprint.
- Parameters:
  - llm_test_configuration (Optional[Union[LLMTestConfiguration, str]]) – The returned LLM test results are filtered to those associated with a specific LLM test configuration, if specified.
  - llm_blueprint (Optional[Union[LLMBlueprint, str]]) – The returned LLM test results are filtered to those associated with a specific LLM blueprint, if specified.
- Returns:
llm_test_results – Returns a list of LLM test results.
- Return type:
List[LLMTestResult]
- delete()
Delete a single LLM test result.
- Return type:
None
- class datarobot.models.genai.llm_test_configuration.DatasetEvaluation
Bases:
APIObject
Metadata for a DataRobot GenAI dataset evaluation.
- Variables:
  - evaluation_name (str) – The name of the evaluation.
  - evaluation_dataset_configuration_id (str or None, optional) – The ID of the evaluation dataset configuration for custom datasets.
  - evaluation_dataset_name (str) – The name of the evaluation dataset.
  - ootb_dataset (OOTBDataset or None, optional) – The out-of-the-box dataset.
  - insight_configuration (InsightsConfiguration) – The insight to calculate for this dataset.
  - insight_grading_criteria (InsightGradingCriteria) – The criteria to use for grading the results.
  - max_num_prompts (int) – The maximum number of prompts to use for the evaluation.
  - prompt_sampling_strategy (PromptSamplingStrategy) – The prompt sampling strategy for the dataset evaluation.
- to_dict()
- Return type:
DatasetEvaluationDict
- class datarobot.models.genai.llm_test_result.InsightEvaluationResult
Bases:
APIObject
Metadata for a DataRobot GenAI insight evaluation result.
- Variables:
  - id (str) – The ID of the insight evaluation result.
  - llm_test_result_id (str) – The ID of the LLM test result associated with this insight evaluation result.
  - evaluation_dataset_configuration_id (str) – The ID of the evaluation dataset configuration.
  - evaluation_dataset_name (str) – The name of the evaluation dataset.
  - metric_name (str) – The name of the metric.
  - chat_id (str) – The ID of the chat containing the prompts and responses.
  - chat_name (str) – The name of the chat containing the prompts and responses.
  - aggregation_type (AggregationType) – The type of aggregation used for the metric results.
  - grading_result (GradingResult) – The overall grade for the LLM test.
  - execution_status (str) – The execution status of the LLM test.
  - evaluation_name (str) – The name of the evaluation.
  - insight_grading_criteria (InsightGradingCriteria) – The criteria used to grade the results.
  - last_update_date (str) – The date the result was most recently updated.
  - aggregation_value (float | List[Dict[str, float]] | None) – The aggregated metric result.
- class datarobot.models.genai.llm_test_configuration.OOTBDatasetDict
Bases:
dict
- dataset_url: Optional[str]
- dataset_name: str
- prompt_column_name: str
- response_column_name: Optional[str]
- rows_count: int
- warning: Optional[str]
- class datarobot.models.genai.llm_test_configuration.DatasetEvaluationRequestDict
Bases:
dict
- ootb_dataset_name: str
- evaluation_name: str
- evaluation_dataset_configuration_id: Optional[str]
- insight_configuration: InsightsConfigurationDict
- insight_grading_criteria: InsightGradingCriteriaDict
- max_num_prompts: Optional[int]
- prompt_sampling_strategy: Optional[PromptSamplingStrategy]
- class datarobot.models.genai.llm_test_configuration.DatasetEvaluationDict
Bases:
dict
- evaluation_name: str
- evaluation_dataset_configuration_id: Optional[str]
- evaluation_dataset_name: str
- ootb_dataset: Optional[OOTBDatasetDict]
- insight_configuration: InsightsConfigurationDict
- insight_grading_criteria: InsightGradingCriteriaDict
- max_num_prompts: Optional[int]
- prompt_sampling_strategy: Optional[PromptSamplingStrategy]
- class datarobot.models.genai.nemo_configuration.NemoConfiguration
Bases:
APIObject
Configuration for the Nemo Pipeline.
- Variables:
  - prompt_pipeline_metric_name (Optional[str]) – The name of the metric for the prompt pipeline.
  - prompt_pipeline_files (NemoFileContentsResponse, optional) – The files used in the prompt pipeline.
  - prompt_llm_configuration (NemoLLMConfiguration, optional) – The LLM configuration for the prompt pipeline.
  - prompt_moderation_configuration (ModerationConfigurationWithoutID, optional) – The moderation configuration for the prompt pipeline.
  - prompt_pipeline_template_id (Optional[str]) – The ID of the prompt pipeline template. This parameter defines the actions.py file.
  - response_pipeline_metric_name (Optional[str]) – The name of the metric for the response pipeline.
  - response_pipeline_files (NemoFileContentsResponse, optional) – The files used in the response pipeline.
  - response_llm_configuration (NemoLLMConfiguration, optional) – The LLM configuration for the response pipeline.
  - response_moderation_configuration (ModerationConfigurationWithoutID, optional) – The moderation configuration for the response pipeline.
  - response_pipeline_template_id (Optional[str]) – The ID of the response pipeline template. This parameter defines the actions.py file.
  - blocked_terms_file_contents (str) – The contents of the blocked terms file. This is shared between the prompt and response pipelines.
- classmethod get(playground)
Get the Nemo configuration for a playground.
- Parameters:
  - playground (str or Playground) – The playground to get the configuration for.
- Returns:
The Nemo configuration for the playground.
- Return type:
NemoConfiguration
- classmethod upsert(playground, blocked_terms_file_contents, prompt_pipeline_metric_name=None, prompt_pipeline_files=None, prompt_llm_configuration=None, prompt_moderation_configuration=None, prompt_pipeline_template_id=None, response_pipeline_metric_name=None, response_pipeline_files=None, response_llm_configuration=None, response_moderation_configuration=None, response_pipeline_template_id=None)
Create or update the Nemo configuration for a playground.
- Parameters:
  - playground (str or Playground) – The playground for the configuration.
  - blocked_terms_file_contents (str) – The contents of the blocked terms file.
  - prompt_pipeline_metric_name (Optional[str]) – The name of the metric for the prompt pipeline.
  - prompt_pipeline_files (NemoFileContents, optional) – The files used in the prompt pipeline.
  - prompt_llm_configuration (NemoLLMConfiguration, optional) – The LLM configuration for the prompt pipeline.
  - prompt_moderation_configuration (ModerationConfigurationWithoutID, optional) – The moderation configuration for the prompt pipeline.
  - prompt_pipeline_template_id (Optional[str]) – The ID of the prompt pipeline template. This defines the actions.py file.
  - response_pipeline_metric_name (Optional[str]) – The name of the metric for the response pipeline.
  - response_pipeline_files (NemoFileContents, optional) – The files used in the response pipeline.
  - response_llm_configuration (NemoLLMConfiguration, optional) – The LLM configuration for the response pipeline.
  - response_moderation_configuration (ModerationConfigurationWithoutID, optional) – The moderation configuration for the response pipeline.
  - response_pipeline_template_id (Optional[str]) – The ID of the response pipeline template. This defines the actions.py file.
- Returns:
The Nemo configuration for the playground.
- Return type:
NemoConfiguration
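A hedged sketch of configuring the Nemo pipeline for a playground with a blocked terms list; the newline-separated file format shown here is an assumption, and the ID is a placeholder:

```python
from datarobot.models.genai.nemo_configuration import NemoConfiguration

playground_id = "5f3e9c0d1a2b3c4d5e6f7a8c"  # placeholder playground ID

# Assumed format: one blocked term per line.
blocked_terms = "confidential\nproprietary\n"

NemoConfiguration.upsert(
    playground=playground_id,
    blocked_terms_file_contents=blocked_terms,
)

config = NemoConfiguration.get(playground_id)
print(config.blocked_terms_file_contents)
```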
- class datarobot.models.genai.llm_test_configuration.OOTBDataset
Bases:
APIObject
Metadata for a DataRobot GenAI out-of-the-box LLM compliance test dataset.
- Variables:
  - dataset_name (str) – The name of the dataset.
  - prompt_column_name (str) – The name of the prompt column.
  - response_column_name (str or None, optional) – The name of the response column, if any.
  - dataset_url (str or None, optional) – The URL of the dataset.
  - rows_count (int) – The number of rows in the dataset.
  - warning (str or None, optional) – A warning message regarding the contents of the dataset, if any.
- to_dict()
- Return type:
OOTBDatasetDict
- classmethod list()
List all out-of-the-box datasets available to the user.
- Returns:
ootb_datasets – Returns a list of out-of-the-box datasets.
- Return type:
list[OOTBDataset]
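For example, the available out-of-the-box datasets can be inspected before building a test configuration:

```python
from datarobot.models.genai.llm_test_configuration import OOTBDataset

for dataset in OOTBDataset.list():
    print(dataset.dataset_name, dataset.rows_count, dataset.prompt_column_name)
```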
- class datarobot.models.genai.llm_test_configuration.NonOOTBDataset
Bases:
APIObject
Metadata for a DataRobot GenAI non out-of-the-box (OOTB) LLM compliance test dataset.
- classmethod list(use_case=None)
List all non out-of-the-box datasets available to the user.
- Returns:
non_ootb_datasets – Returns a list of non out-of-the-box datasets.
- Return type:
list[NonOOTBDataset]
- class datarobot.models.genai.metric_insights.MetricInsights
Bases:
InsightsConfiguration
Metric insights for a playground.
- classmethod list(playground)
Get metric insights for a playground.
- Parameters:
  - playground (str or Playground) – The playground to get the supported metrics from.
- Returns:
insights – Metric insights for the playground.
- Return type:
list[InsightsConfiguration]
- classmethod copy_to_playground(source_playground, target_playground, add_to_existing=True, with_evaluation_datasets=False)
Copy metric insights from one playground to another.
- Parameters:
  - source_playground (str or Playground) – The playground to copy metric insights from.
  - target_playground (str or Playground) – The playground to copy metric insights to.
  - add_to_existing (Optional[bool]) – Add metric insights to existing ones in the target playground. Defaults to True.
  - with_evaluation_datasets (Optional[bool]) – Copy evaluation datasets from the source playground.
- Return type:
None
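A sketch of copying the configured metric insights from one playground to another (placeholder IDs):

```python
from datarobot.models.genai.metric_insights import MetricInsights

source_playground_id = "5f3e9c0d1a2b3c4d5e6f7a8c"  # placeholder playground IDs
target_playground_id = "5f3e9c0d1a2b3c4d5e6f7a96"

MetricInsights.copy_to_playground(
    source_playground_id,
    target_playground_id,
    add_to_existing=True,
    with_evaluation_datasets=True,
)

# Verify the metrics now configured in the target playground.
for insight in MetricInsights.list(target_playground_id):
    print(insight.insight_name)
```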
- class datarobot.models.genai.ootb_metric_configuration.PlaygroundOOTBMetricConfiguration
Bases:
APIObject
OOTB metric configurations for a playground.
- Variables:
  - ootb_metric_configurations (List[OOTBMetricConfigurationResponse]) – The list of the OOTB metric configurations.
- path = 'api/v2/genai/playgrounds/{playground_id}/ootbMetricConfigurations'
- classmethod get(playground_id)
Get OOTB metric configurations for the playground.
- Return type:
PlaygroundOOTBMetricConfiguration
- classmethod create(playground_id, ootb_metric_configurations)
Create new OOTB metric configurations.
- Return type:
PlaygroundOOTBMetricConfiguration
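A minimal sketch of reading the OOTB metric configurations of a playground (placeholder ID):

```python
from datarobot.models.genai.ootb_metric_configuration import (
    PlaygroundOOTBMetricConfiguration,
)

ootb_config = PlaygroundOOTBMetricConfiguration.get("5f3e9c0d1a2b3c4d5e6f7a8c")  # placeholder ID
print(ootb_config.ootb_metric_configurations)
```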