AI Robustness Tests
- class datarobot.models.genai.insights_configuration.InsightsConfiguration
Bases:
APIObject
Configuration information for a specific insight.
- Variables:
  - insight_name (str) – The name of the insight.
  - insight_type (InsightTypes, optional) – The type of the insight.
  - deployment_id (Optional[str]) – The deployment ID the insight is applied to.
  - model_id (Optional[str]) – The model ID for the insight.
  - sidecar_model_metric_validation_id (Optional[str]) – The validation ID for the sidecar model metric.
  - custom_metric_id (Optional[str]) – The ID for a custom model metric.
  - evaluation_dataset_configuration_id (Optional[str]) – The ID for the evaluation dataset configuration.
  - cost_configuration_id (Optional[str]) – The ID for the cost configuration information.
  - result_unit (Optional[str]) – The unit of the result, for example "USD".
  - ootb_metric_id (Optional[str]) – The ID of the DataRobot-provided metric that does not require additional configuration.
  - ootb_metric_name (Optional[str]) – The name of the DataRobot-provided metric that does not require additional configuration.
  - guard_conditions (list[dict], optional) – The guard conditions to be used with the insight.
  - moderation_configuration (dict, optional) – The moderation configuration for the insight.
  - execution_status (Optional[str]) – The execution status of the insight.
  - error_message (Optional[str]) – The error message for the insight, for example if it is missing specific configuration for deployed models.
  - error_resolution (Optional[str]) – An indicator of which field must be edited to resolve an error state.
  - nemo_metric_id (Optional[str]) – The ID for the NEMO metric.
  - llm_id (Optional[str]) – The LLM ID for OOTB metrics that use LLMs.
  - custom_model_llm_validation_id (Optional[str]) – The ID for the custom model LLM validation, if a custom model LLM is used for OOTB metrics.
  - aggregation_types (list[str], optional) – The aggregation types to be used for the insight.
  - stage (Optional[str]) – The stage (prompt or response) at which the metric is calculated.
  - sidecar_model_metric_metadata (dict, optional) – Metadata specific to sidecar model metrics.
  - guard_template_id (Optional[str]) – The ID for the guard template that applies to the insight.
  - guard_configuration_id (Optional[str]) – The ID for the guard configuration that applies to the insight.
- to_dict()
- Return type:
InsightsConfigurationDict
- class datarobot.models.genai.insights_configuration.SupportedInsights
Bases:
InsightsConfiguration
Supported insights configurations for a given use case.
- classmethod list(use_case_id)
Get a list of all supported insights that can be used within a given Use Case.
- Parameters:
  - use_case_id (str) – The ID of the Use Case to list supported insights for.
- Returns:
insights – A list of supported insights.
- Return type:
list[InsightsConfiguration]
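A minimal usage sketch for listing the insights supported in a Use Case (assumes a configured DataRobot client; the ID below is a placeholder):

```python
from datarobot.models.genai.insights_configuration import SupportedInsights

use_case_id = "5f3e9c0d1a2b3c4d5e6f7a8b"  # placeholder Use Case ID

supported = SupportedInsights.list(use_case_id)
for insight in supported:
    print(insight.insight_name, insight.insight_type)
```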
- class datarobot.models.genai.insights_configuration.Insights
Bases:
APIObject
The insights configured for a playground.
- Variables:
playground_id (
str
) – The ID of the playground the insights are configured for.insights_configuration (
list[InsightsConfiguration]
) – The insights configuration for the playground.creation_date (
str
) – The date the insights were configured.creation_user_id (
str
) – The ID of the user who created the insights.last_update_date (
str
) – The date the insights were last updated.last_update_user_id (
str
) – The ID of the user who last updated the insights.tenant_id (
str
) – The tenant ID that applies to the record.
- classmethod get(playground, with_aggregation_types_only=False)
Get the insights configuration for a given playground.
- Parameters:
  - playground (str | Playground) – The ID of the playground to get insights for.
  - with_aggregation_types_only (Optional[bool]) – If True, only return the aggregation types for the insights.
- Returns:
insights – The insights configuration for the playground.
- Return type:
Insights
- classmethod create(playground, insights_configuration, use_case)
Create a new insights configuration for a given playground.
- Parameters:
  - playground (str) – The ID of the playground to create insights for.
  - insights_configuration (list[InsightsConfiguration]) – The insights configuration for the playground.
  - use_case (str) – The ID of the Use Case that the playground is a part of.
- Returns:
insights – The created insights configuration.
- Return type:
Insights
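A hedged sketch of configuring insights for a playground from the insights supported in its Use Case (placeholder IDs; assumes a configured client):

```python
from datarobot.models.genai.insights_configuration import Insights, SupportedInsights

use_case_id = "5f3e9c0d1a2b3c4d5e6f7a8b"    # placeholder Use Case ID
playground_id = "5f3e9c0d1a2b3c4d5e6f7a8c"  # placeholder playground ID

# Apply every insight supported by the Use Case to the playground.
supported = SupportedInsights.list(use_case_id)
Insights.create(
    playground=playground_id,
    insights_configuration=supported,
    use_case=use_case_id,
)

# Read the configuration back later.
insights = Insights.get(playground_id)
for config in insights.insights_configuration:
    print(config.insight_name, config.execution_status)
```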
- class datarobot.models.genai.cost_metric_configurations.LLMCostConfiguration
Bases:
APIObject
Cost configuration for a specific LLM model, used for cost metric calculation. The price per token is the price divided by the reference token count.
- Variables:
  - output_token_price (float)
  - reference_output_token_count (int)
  - input_token_price (float)
  - reference_input_token_count (int)
  - llm_id (str)
  - (str)
  - custom_model_llm_validation_id (Optional[str])
- to_dict()
- Return type:
Dict[str, Any]
- class datarobot.models.genai.cost_metric_configurations.CostMetricConfiguration
Bases:
APIObject
Cost metric configuration for a use case.
- Variables:
  - use_case_id (str)
  - (str)
  - cost_metric_configurations (List[LLMCostConfiguration])
- classmethod get(cost_metric_configuration_id)
Get cost metric configuration by ID.
- Return type:
CostMetricConfiguration
- update(cost_metric_configurations, name=None)
Update the cost configurations.
- Return type:
CostMetricConfiguration
- classmethod create(use_case_id, playground_id, name, cost_metric_configurations)
Create a new cost metric configuration.
- Return type:
CostMetricConfiguration
- delete()
Delete the cost metric configuration.
- Return type:
None
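A short sketch of fetching and removing a cost metric configuration (the ID is a placeholder; attribute names follow the variable lists above):

```python
from datarobot.models.genai.cost_metric_configurations import CostMetricConfiguration

cost_config = CostMetricConfiguration.get("5f3e9c0d1a2b3c4d5e6f7a8d")  # placeholder ID
for llm_cost in cost_config.cost_metric_configurations:
    print(llm_cost.llm_id, llm_cost.output_token_price)

cost_config.delete()  # remove the configuration when it is no longer needed
```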
- class datarobot.models.genai.evaluation_dataset_configuration.EvaluationDatasetConfiguration
Bases:
APIObject
An evaluation dataset configuration used to evaluate the performance of LLMs.
- Variables:
  - id (str) – The evaluation dataset configuration ID.
  - name (str) – The name of the evaluation dataset configuration.
  - size (int) – The size of the evaluation dataset (in bytes).
  - rows_count (int) – The row count of the evaluation dataset.
  - use_case_id (str) – The ID of the Use Case associated with the evaluation dataset configuration.
  - playground_id (Optional[str]) – The ID of the playground associated with the evaluation dataset configuration.
  - dataset_id (str) – The ID of the evaluation dataset.
  - dataset_name (str) – The name of the evaluation dataset.
  - prompt_column_name (str) – The name of the dataset column containing the prompt text.
  - response_column_name (Optional[str]) – The name of the dataset column containing the response text.
  - user_name (str) – The name of the user who created the evaluation dataset configuration.
  - correctness_enabled (Optional[bool]) – Whether correctness is enabled for the evaluation dataset configuration.
  - creation_user_id (str) – The ID of the user who created the evaluation dataset configuration.
  - creation_date (str) – The creation date of the evaluation dataset configuration (ISO-8601 formatted).
  - tenant_id (str) – The ID of the DataRobot tenant this evaluation dataset configuration belongs to.
  - execution_status (str) – The execution status of the evaluation dataset configuration.
  - error_message (Optional[str]) – The error message associated with the evaluation dataset configuration.
- classmethod get(id)
Get an evaluation dataset configuration by ID.
- Parameters:
  - id (str) – The evaluation dataset configuration ID to fetch.
- Returns:
evaluation_dataset_configuration – The evaluation dataset configuration.
- Return type:
EvaluationDatasetConfiguration
- classmethod list(use_case_id, playground_id=None, evaluation_dataset_configuration_id=None, offset=0, limit=100, sort=None, search=None, correctness_only=False, completed_only=False)
List all evaluation dataset configurations for a Use Case.
- Parameters:
  - use_case_id (str) – The ID of the Use Case that evaluation datasets are returned for.
  - playground_id (str, optional) – The ID of the playground that evaluation datasets are returned for. Default is None.
  - evaluation_dataset_configuration_id (Optional[str]) – The ID of the evaluation dataset configuration to fetch. Default is None.
  - offset (Optional[int]) – The offset to start fetching evaluation datasets from. Default is 0.
  - limit (Optional[int]) – The maximum number of evaluation datasets to return. Default is 100.
  - sort (Optional[str]) – The sort order for the returned evaluation datasets. Default is None, which sorts by creation time.
  - search (Optional[str]) – A search term that filters results so that only evaluation datasets with names matching the string are returned. Default is None.
  - correctness_only (Optional[bool]) – Whether to return only datasets with correctness enabled. Default is False.
  - completed_only (Optional[bool]) – Whether to return only completed datasets (particularly applicable to generated synthetic datasets, whose generation may still be in progress). Default is False.
- Returns:
evaluation_dataset_configurations – A list of evaluation dataset configurations.
- Return type:
List[EvaluationDatasetConfiguration]
- classmethod create(name, use_case_id, dataset_id, prompt_column_name, playground_id, is_synthetic_dataset=False, response_column_name=None)
Create an evaluation dataset configuration for an existing dataset.
- Parameters:
  - name (str) – The name of the evaluation dataset configuration.
  - use_case_id (str) – The ID of the Use Case that the evaluation dataset configuration will be added to.
  - dataset_id (str) – The ID of the evaluation dataset to associate with the configuration.
  - playground_id (str) – The ID of the playground that the evaluation dataset configuration will be added to.
  - prompt_column_name (str) – The name of the prompt column in the dataset.
  - response_column_name (Optional[str]) – The name of the response column in the dataset.
  - is_synthetic_dataset (bool) – Whether the evaluation dataset is synthetic.
- Returns:
evaluation_dataset_configuration – The created evaluation dataset configuration.
- Return type:
EvaluationDatasetConfiguration
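A sketch of registering an existing dataset as an evaluation dataset for a playground (the IDs and column names are placeholders):

```python
from datarobot.models.genai.evaluation_dataset_configuration import (
    EvaluationDatasetConfiguration,
)

eval_config = EvaluationDatasetConfiguration.create(
    name="Golden questions",
    use_case_id="5f3e9c0d1a2b3c4d5e6f7a8b",   # placeholder IDs
    playground_id="5f3e9c0d1a2b3c4d5e6f7a8c",
    dataset_id="5f3e9c0d1a2b3c4d5e6f7a8e",
    prompt_column_name="prompt",               # placeholder column names
    response_column_name="expected_response",
)

# List the configurations registered for the Use Case.
for config in EvaluationDatasetConfiguration.list(use_case_id="5f3e9c0d1a2b3c4d5e6f7a8b"):
    print(config.name, config.rows_count, config.execution_status)
```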
- update(name=None, dataset_id=None, prompt_column_name=None, response_column_name=None)
Update the evaluation dataset configuration.
- Parameters:
  - name (Optional[str]) – The name of the evaluation dataset configuration.
  - dataset_id (Optional[str]) – The ID of the dataset used in this configuration.
  - prompt_column_name (Optional[str]) – The name of the prompt column in the dataset.
  - response_column_name (Optional[str]) – The name of the response column in the dataset.
- Returns:
evaluation_dataset_configuration – The updated evaluation dataset configuration.
- Return type:
EvaluationDatasetConfiguration
- delete()
Delete the evaluation dataset configuration.
- Return type:
None
- class datarobot.models.genai.evaluation_dataset_metric_aggregation.EvaluationDatasetMetricAggregation
Bases:
APIObject
- Information about an evaluation dataset metric aggregation job.
This job runs a metric against LLMs using an evaluation dataset and aggregates the results.
- Variables:
  - llm_blueprint_id (str) – The LLM blueprint ID.
  - evaluation_dataset_configuration_id (str) – The evaluation dataset configuration ID.
  - ootb_dataset_name (str | None) – The name of the DataRobot-provided dataset that does not require additional configuration.
  - metric_name (str) – The name of the metric.
  - deployment_id (str | None) – A deployment ID if the evaluation was run against a deployment.
  - dataset_id (str | None) – The ID of the dataset used in the evaluation.
  - dataset_name (str | None) – The name of the dataset used in the evaluation.
  - chat_id (str) – The ID of the chat created to run the evaluation.
  - chat_name (str) – The name of the chat created to run the evaluation.
  - aggregation_value (float | List[Dict[str, float]]) – The aggregated metric result.
  - aggregation_type (AggregationType) – The type of aggregation used for the metric results.
  - creation_date (str) – The date the evaluation job was created.
  - creation_user_id (str) – The ID of the user who created the evaluation job.
  - tenant_id (str) – The ID of the tenant that owns the evaluation job.
- classmethod create(chat_name, llm_blueprint_ids, evaluation_dataset_configuration_id, insights_configuration)
Create a new evaluation dataset metric aggregation job. The job will run the specified metric for the specified LLM blueprint IDs using the prompt-response pairs in the evaluation dataset.
- Parameters:
  - chat_name (str) – The name of the chat that will be created to run the evaluation.
  - llm_blueprint_ids (List[str]) – The LLM blueprint IDs to evaluate.
  - evaluation_dataset_configuration_id (str) – The ID of the evaluation dataset configuration to use during the evaluation.
  - insights_configuration (List[InsightsConfiguration]) – The insights configurations to use during the evaluation.
- Returns:
The ID of the evaluation dataset metric aggregation job.
- Return type:
str
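A sketch of running an evaluation dataset metric aggregation, reusing the metric insights already configured for a playground (placeholder IDs):

```python
from datarobot.models.genai.evaluation_dataset_metric_aggregation import (
    EvaluationDatasetMetricAggregation,
)
from datarobot.models.genai.metric_insights import MetricInsights

playground_id = "5f3e9c0d1a2b3c4d5e6f7a8c"        # placeholder IDs
llm_blueprint_id = "5f3e9c0d1a2b3c4d5e6f7a90"
eval_dataset_config_id = "5f3e9c0d1a2b3c4d5e6f7a91"

insights_configuration = MetricInsights.list(playground_id)

job_id = EvaluationDatasetMetricAggregation.create(
    chat_name="Evaluation run",
    llm_blueprint_ids=[llm_blueprint_id],
    evaluation_dataset_configuration_id=eval_dataset_config_id,
    insights_configuration=insights_configuration,
)
print(job_id)  # ID of the asynchronous aggregation job
```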
- classmethod list(llm_blueprint_ids=None, chat_ids=None, evaluation_dataset_configuration_ids=None, metric_names=None, aggregation_types=None, current_configuration_only=False, sort=None, offset=0, limit=100, non_errored_only=True)
List evaluation dataset metric aggregations. The results will be filtered by the provided LLM blueprint IDs and chat IDs.
- Parameters:
  - llm_blueprint_ids (List[str]) – The LLM blueprint IDs to filter on.
  - chat_ids (List[str]) – The chat IDs to filter on.
  - evaluation_dataset_configuration_ids (List[str]) – The evaluation dataset configuration IDs to filter on.
  - metric_names (List[str]) – The metric names to filter on.
  - aggregation_types (List[str]) – The aggregation types to filter on.
  - current_configuration_only (Optional[bool]) – If True, only return results associated with the current configuration of the LLM blueprint. Defaults to False.
  - sort (Optional[str]) – The field to sort on. Defaults to None.
  - offset (Optional[int]) – The offset to start at. Defaults to 0.
  - limit (Optional[int]) – The maximum number of results to return. Defaults to 100.
  - non_errored_only (Optional[bool]) – If True, only return results that did not encounter an error. Defaults to True.
- Returns:
A list of evaluation dataset metric aggregations.
- Return type:
List[EvaluationDatasetMetricAggregation]
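Once the job completes, the aggregated results can be listed per blueprint, for example:

```python
from datarobot.models.genai.evaluation_dataset_metric_aggregation import (
    EvaluationDatasetMetricAggregation,
)

results = EvaluationDatasetMetricAggregation.list(
    llm_blueprint_ids=["5f3e9c0d1a2b3c4d5e6f7a90"],  # placeholder blueprint ID
    non_errored_only=True,
)
for result in results:
    print(result.metric_name, result.aggregation_type, result.aggregation_value)
```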
- classmethod delete(llm_blueprint_ids, chat_ids)
Delete the associated evaluation dataset metric aggregations. Either llm_blueprint_ids or chat_ids must be provided. If both are provided, only results matching both will be removed.
- Parameters:
  - llm_blueprint_ids (List[str]) – The LLM blueprint IDs to filter on.
  - chat_ids (List[str]) – The chat IDs to filter on.
- Return type:
None
- class datarobot.models.genai.synthetic_evaluation_dataset_generation.SyntheticEvaluationDataset
Bases:
APIObject
A synthetically generated evaluation dataset for LLMs.
- Variables:
  - response_column_name (str)
  - prompt_column_name (str)
  - dataset_id (str)
- classmethod create(llm_id, vector_database_id, llm_settings=None, dataset_name=None, language=None)
Create a synthetic evaluation dataset generation job. This will create a synthetic dataset to be used for evaluation of a language model.
- Parameters:
  - llm_id (str) – The ID of the LLM to use for dataset generation.
  - vector_database_id (str) – The ID of the vector database to use for dataset generation.
  - llm_settings (Optional[Dict[str, Union[bool, int, float, str]]]) – The settings to use for the language model used for dataset generation.
  - dataset_name (Optional[str]) – The name of the generated dataset.
  - language (Optional[str]) – The language to use for dataset generation.
- Returns:
SyntheticEvaluationDataset – A reference to the synthetic evaluation dataset that was created.
- Return type:
SyntheticEvaluationDataset
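A hedged sketch of generating a synthetic evaluation dataset from a vector database (the IDs are placeholders and the language value shown is illustrative):

```python
from datarobot.models.genai.synthetic_evaluation_dataset_generation import (
    SyntheticEvaluationDataset,
)

synthetic = SyntheticEvaluationDataset.create(
    llm_id="5f3e9c0d1a2b3c4d5e6f7a92",              # placeholder LLM ID
    vector_database_id="5f3e9c0d1a2b3c4d5e6f7a93",  # placeholder vector database ID
    dataset_name="Synthetic evaluation data",
    language="English",                             # illustrative value
)
print(synthetic.response_column_name)
```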
- class datarobot.models.genai.sidecar_model_metric.SidecarModelMetricValidation
Bases:
APIObject
A sidecar model metric validation for LLMs.
- Variables:
  - id (str) – The ID of the sidecar model metric validation.
  - prompt_column_name (str) – The name of the prompt column for the sidecar model.
  - deployment_id (str) – The ID of the deployment associated with the sidecar model.
  - model_id (str) – The ID of the sidecar model.
  - validation_status (str) – The status of the validation job.
  - deployment_access_data (dict) – Data used to access the deployment prediction server. Only available for deployments that passed validation. Dict fields:
    - prediction_api_url – The URL of the deployment prediction server.
    - datarobot_key – The first of two auth headers for the prediction server.
    - authorization_header – The second of two auth headers for the prediction server.
    - input_type – Either JSON or CSV, the input type that the model expects.
    - model_type – The target type of the deployed custom model.
  - tenant_id (str) – The ID of the tenant that created the sidecar model metric validation.
  - name (str) – The name of the sidecar model metric.
  - creation_date (str) – The date the sidecar model metric validation was created.
  - user_id (str) – The ID of the user that created the sidecar model metric validation.
  - deployment_name (str) – The name of the deployment associated with the sidecar model.
  - user_name (str) – The name of the user that created the sidecar model metric validation.
  - use_case_id (str) – The ID of the Use Case associated with the sidecar model metric validation.
  - prediction_timeout (int) – The timeout, in seconds, for the prediction API used in this sidecar model metric validation.
  - error_message (str) – Additional information for an errored validation.
  - citations_prefix_column_name (str) – The name of the prefix in the citations column for the sidecar model.
  - response_column_name (str) – The name of the response column for the sidecar model.
  - expected_response_column_name (str) – The name of the expected response column for the sidecar model.
  - target_column_name (str) – The name of the target column for the sidecar model.
- classmethod create(deployment_id, name, prediction_timeout, model_id=None, use_case_id=None, playground_id=None, prompt_column_name=None, target_column_name=None, response_column_name=None, citation_prefix_column_name=None, expected_response_column_name=None)
Create a sidecar model metric validation.
- Parameters:
  - deployment_id (str) – The ID of the deployment to validate.
  - name (str) – The name of the validation.
  - prediction_timeout (int) – The timeout, in seconds, for the prediction API used in this validation.
  - model_id (Optional[str]) – The ID of the model to validate.
  - use_case_id (Optional[str]) – The ID of the Use Case associated with the validation.
  - playground_id (Optional[str]) – The ID of the playground associated with the validation.
  - prompt_column_name (Optional[str]) – The name of the prompt column for the sidecar model.
  - target_column_name (Optional[str]) – The name of the target column for the sidecar model.
  - response_column_name (Optional[str]) – The name of the response column for the sidecar model.
  - citation_prefix_column_name (Optional[str]) – The name of the prefix for the citations column for the sidecar model.
  - expected_response_column_name (Optional[str]) – The name of the expected response column for the sidecar model.
- Returns:
The created sidecar model metric validation.
- Return type:
SidecarModelMetricValidation
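A sketch of validating a deployed sidecar metric model so it can be used as an insight (placeholder IDs and column names):

```python
from datarobot.models.genai.sidecar_model_metric import SidecarModelMetricValidation

validation = SidecarModelMetricValidation.create(
    deployment_id="5f3e9c0d1a2b3c4d5e6f7a94",  # placeholder deployment ID
    name="Toxicity sidecar metric",
    prediction_timeout=300,
    playground_id="5f3e9c0d1a2b3c4d5e6f7a8c",  # placeholder playground ID
    prompt_column_name="prompt",
    response_column_name="response",
)
print(validation.id, validation.validation_status)
```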
- classmethod list(use_case_ids=None, offset=None, limit=None, search=None, sort=None, completed_only=True, deployment_id=None, model_id=None, prompt_column_name=None, target_column_name=None, citation_prefix_column_name=None)
List sidecar model metric validations.
- Parameters:
  - use_case_ids (List[str], optional) – The IDs of the Use Cases to filter by.
  - offset (Optional[int]) – The number of records to skip.
  - limit (Optional[int]) – The maximum number of records to return.
  - search (Optional[str]) – The search string.
  - sort (Optional[str]) – The sort order.
  - completed_only (Optional[bool]) – Whether to return only completed validations.
  - deployment_id (Optional[str]) – The ID of the deployment to filter by.
  - model_id (Optional[str]) – The ID of the model to filter by.
  - prompt_column_name (Optional[str]) – The name of the prompt column to filter by.
  - target_column_name (Optional[str]) – The name of the target column to filter by.
  - citation_prefix_column_name (Optional[str]) – The name of the prefix for the citations column to filter by.
- Returns:
The list of sidecar model metric validations.
- Return type:
List[SidecarModelMetricValidation]
- classmethod get(validation_id)
Get a sidecar model metric validation by ID.
- Parameters:
  - validation_id (str) – The ID of the validation to get.
- Returns:
The sidecar model metric validation.
- Return type:
SidecarModelMetricValidation
- revalidate()
Revalidate the sidecar model metric validation.
- Returns:
The sidecar model metric validation.
- Return type:
SidecarModelMetricValidation
- update(name=None, prompt_column_name=None, target_column_name=None, response_column_name=None, expected_response_column_name=None, citation_prefix_column_name=None, deployment_id=None, model_id=None, prediction_timeout=None)
Update the sidecar model metric validation.
- Parameters:
  - name (Optional[str]) – The name of the validation.
  - prompt_column_name (Optional[str]) – The name of the prompt column for the sidecar model.
  - target_column_name (Optional[str]) – The name of the target column for the sidecar model.
  - response_column_name (Optional[str]) – The name of the response column for the sidecar model.
  - expected_response_column_name (Optional[str]) – The name of the expected response column for the sidecar model.
  - citation_prefix_column_name (Optional[str]) – The name of the prefix for the citations column for the sidecar model.
  - deployment_id (Optional[str]) – The ID of the deployment to validate.
  - model_id (Optional[str]) – The ID of the model to validate.
  - prediction_timeout (Optional[int]) – The timeout, in seconds, for the prediction API used in this validation.
- Returns:
The updated sidecar model metric validation.
- Return type:
SidecarModelMetricValidation
- delete()
Delete the sidecar model metric validation.
- Return type:
None
- class datarobot.models.genai.llm_test_configuration.LLMTestConfiguration
Bases:
APIObject
Metadata for a DataRobot GenAI LLM test configuration.
- Variables:
  - id (str) – The LLM test configuration ID.
  - name (str) – The LLM test configuration name.
  - description (str) – The LLM test configuration description.
  - dataset_evaluations (list[DatasetEvaluation]) – The dataset/insight combinations that make up the LLM test configuration.
  - llm_test_grading_criteria (LLMTestGradingCriteria) – The criteria used to grade the result of the LLM test configuration.
  - is_out_of_the_box_test_configuration (bool) – Whether this is an out-of-the-box configuration.
  - use_case_id (Optional[str]) – The ID of the linked Use Case, if any.
  - creation_date (Optional[str]) – The date the LLM test configuration was created, if any.
  - creation_user_id (Optional[str]) – The ID of the creating user, if any.
  - warnings (Optional[list[Dict[str, str]]]) – The warnings for the LLM test configuration, if any.
- classmethod create(name, dataset_evaluations, llm_test_grading_criteria, use_case=None, description=None)
Creates a new LLM test configuration.
- Parameters:
  - name (str) – The LLM test configuration name.
  - dataset_evaluations (list[DatasetEvaluationRequestDict]) – The LLM test dataset evaluation requests.
  - llm_test_grading_criteria (LLMTestGradingCriteria) – The LLM test grading criteria.
  - use_case (Optional[Union[UseCase, str]], optional) – The Use Case to link to the created LLM test configuration.
  - description (Optional[str]) – The LLM test configuration description. If None (the default), the description is an empty string.
- Returns:
llm_test_configuration – The created LLM test configuration.
- Return type:
LLMTestConfiguration
- classmethod get(llm_test_configuration)
Retrieve a single LLM Test configuration.
- Parameters:
  - llm_test_configuration (LLMTestConfiguration or str) – The LLM test configuration to retrieve, either an LLMTestConfiguration or an LLMTestConfiguration ID.
- Returns:
llm_test_configuration – The requested LLM Test configuration.
- Return type:
LLMTestConfiguration
- classmethod list(use_case=None, test_config_type=None)
List all LLM test configurations available to the user. If a Use Case is specified, results are restricted to only those configurations associated with that Use Case.
- Parameters:
  - use_case (Optional[UseCaseLike], optional) – Returns only those configurations associated with a particular Use Case, specified by either the Use Case name or ID.
  - test_config_type (Optional[LLMTestConfigurationType], optional) – Returns only configurations of the specified type. If not specified, custom test configurations are returned.
- Returns:
llm_test_configurations – Returns a list of LLM test configurations.
- Return type:
list[LLMTestConfiguration]
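A sketch of browsing the LLM test configurations linked to a Use Case and retrieving one by ID (placeholder ID):

```python
from datarobot.models.genai.llm_test_configuration import LLMTestConfiguration

configs = LLMTestConfiguration.list(use_case="5f3e9c0d1a2b3c4d5e6f7a8b")  # placeholder ID
for config in configs:
    print(config.id, config.name, config.is_out_of_the_box_test_configuration)

if configs:
    detail = LLMTestConfiguration.get(configs[0].id)
    print(detail.description)
```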
- update(name=None, description=None, dataset_evaluations=None, llm_test_grading_criteria=None)
Update the LLM test configuration.
- Parameters:
  - name (Optional[str]) – The new LLM test configuration name.
  - description (Optional[str]) – The new LLM test configuration description.
  - dataset_evaluations (list[DatasetEvaluationRequestDict], optional) – The new dataset evaluation requests.
  - llm_test_grading_criteria (LLMTestGradingCriteria, optional) – The new grading criteria.
- Returns:
llm_test_configuration – The updated LLM test configuration.
- Return type:
LLMTestConfiguration
- delete()
Delete a single LLM test configuration.
- Return type:
None
- class datarobot.models.genai.llm_test_configuration.LLMTestConfigurationSupportedInsights
Bases:
APIObject
Metadata for a DataRobot GenAI LLM test configuration supported insights.
- Variables:
  - supported_insight_configurations (list[InsightsConfiguration]) – The supported insights for LLM test configurations.
- classmethod list(use_case=None, playground=None)
List all supported insights for an LLM test configuration.
- Parameters:
  - use_case (Optional[Union[UseCase, str]], optional) – Returns only those supported insight configurations associated with a particular Use Case, specified by either the Use Case name or ID.
  - playground (Optional[Union[Playground, str]], optional) – Returns only those supported insight configurations associated with a particular playground, specified by either the Playground or the playground ID.
- Returns:
llm_test_configuration_supported_insights – Returns the supported insight configurations for the LLM test configuration.
- Return type:
LLMTestConfigurationSupportedInsights
- class datarobot.models.genai.llm_test_result.LLMTestResult
Bases:
APIObject
Metadata for a DataRobot GenAI LLM test result.
- Variables:
  - id (str) – The LLM test result ID.
  - llm_test_configuration_id (str) – The LLM test configuration ID associated with this LLM test result.
  - llm_test_configuration_name (str) – The LLM test configuration name associated with this LLM test result.
  - use_case_id (str) – The ID of the Use Case associated with this LLM test result.
  - llm_blueprint_id (str) – The ID of the LLM blueprint for this LLM test result.
  - llm_test_grading_criteria (LLMTestGradingCriteria) – The criteria used to grade the result of the LLM test configuration.
  - grading_result (GradingResult) – The overall grading result for the LLM test.
  - pass_percentage (float) – The percentage of insight evaluation results that passed the grading criteria.
  - execution_status (str) – The execution status of the job that evaluated the LLM test result.
  - insight_evaluation_result (list[InsightEvaluationResult]) – The results for the individual insights that make up the LLM test result.
  - creation_date (str) – The date of the LLM test result.
  - creation_user_id (str) – The ID of the user who executed the LLM test.
  - creation_user_name (str) – The name of the user who executed the LLM test.
- classmethod create(llm_test_configuration, llm_blueprint)
Create a new LLMTestResult. This executes the LLM test configuration using the specified LLM blueprint. To check the status of the LLM test, use the LLMTestResult.get method with the returned ID.
- Parameters:
  - llm_test_configuration (LLMTestConfiguration or str) – The LLM test configuration to execute, either an LLMTestConfiguration or the LLM test configuration ID.
  - llm_blueprint (LLMBlueprint or str) – The LLM blueprint to test, either an LLMBlueprint or the LLM blueprint ID.
- Returns:
llm_test_result – The created LLM test result.
- Return type:
LLMTestResult
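A sketch of executing an LLM test against a blueprint and polling the result (placeholder IDs):

```python
from datarobot.models.genai.llm_test_result import LLMTestResult

result = LLMTestResult.create(
    llm_test_configuration="5f3e9c0d1a2b3c4d5e6f7a95",  # placeholder IDs
    llm_blueprint="5f3e9c0d1a2b3c4d5e6f7a90",
)

# Execution is asynchronous; re-fetch until execution_status indicates completion.
result = LLMTestResult.get(result.id)
print(result.execution_status, result.grading_result, result.pass_percentage)
```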
- classmethod get(llm_test_result)
Retrieve a single LLM test result.
- Parameters:
  - llm_test_result (LLMTestResult or str) – The LLM test result to retrieve, specified by either an LLMTestResult or the LLM test result ID.
- Returns:
llm_test_result – The requested LLM test result.
- Return type:
LLMTestResult
- classmethod list(llm_test_configuration=None, llm_blueprint=None)
List all LLM test results available to the user. If the LLM test configuration or LLM blueprint is specified, results are restricted to only those LLM test results associated with the LLM test configuration or LLM blueprint.
- Parameters:
  - llm_test_configuration (Optional[Union[LLMTestConfiguration, str]]) – The returned LLM test results are filtered to those associated with a specific LLM test configuration, if specified.
  - llm_blueprint (Optional[Union[LLMBlueprint, str]]) – The returned LLM test results are filtered to those associated with a specific LLM blueprint, if specified.
- Returns:
llm_test_results – Returns a list of LLM test results.
- Return type:
List[LLMTestResult]
- delete()
Delete a single LLM test result.
- Return type:
None
- class datarobot.models.genai.llm_test_configuration.DatasetEvaluation
Bases:
APIObject
Metadata for a DataRobot GenAI dataset evaluation.
- Variables:
  - evaluation_name (str) – The name of the evaluation.
  - evaluation_dataset_configuration_id (str or None, optional) – The ID of the evaluation dataset configuration for custom datasets.
  - evaluation_dataset_name (str) – The name of the evaluation dataset.
  - ootb_dataset (OOTBDataset or None, optional) – The out-of-the-box dataset.
  - insight_configuration (InsightsConfiguration) – The insight to calculate for this dataset.
  - insight_grading_criteria (InsightGradingCriteria) – The criteria to use for grading the results.
  - max_num_prompts (int) – The maximum number of prompts to use for the evaluation.
  - prompt_sampling_strategy (PromptSamplingStrategy) – The prompt sampling strategy for the dataset evaluation.
- to_dict()
- Return type:
DatasetEvaluationDict
- class datarobot.models.genai.llm_test_result.InsightEvaluationResult
Bases:
APIObject
Metadata for a DataRobot GenAI insight evaluation result.
- Variables:
  - id (str) – The ID of the insight evaluation result.
  - llm_test_result_id (str) – The ID of the LLM test result associated with this insight evaluation result.
  - evaluation_dataset_configuration_id (str) – The ID of the evaluation dataset configuration.
  - evaluation_dataset_name (str) – The name of the evaluation dataset.
  - metric_name (str) – The name of the metric.
  - chat_id (str) – The ID of the chat containing the prompts and responses.
  - chat_name (str) – The name of the chat containing the prompts and responses.
  - aggregation_type (AggregationType) – The type of aggregation used for the metric results.
  - grading_result (GradingResult) – The overall grade for the LLM test.
  - execution_status (str) – The execution status of the LLM test.
  - evaluation_name (str) – The name of the evaluation.
  - insight_grading_criteria (InsightGradingCriteria) – The criteria used to grade the results.
  - last_update_date (str) – The date the result was most recently updated.
  - aggregation_value (float | List[Dict[str, float]] | None) – The aggregated metric result.
- class datarobot.models.genai.llm_test_configuration.OOTBDatasetDict
Bases:
dict
- dataset_url: Optional[str]
- dataset_name: str
- prompt_column_name: str
- response_column_name: Optional[str]
- rows_count: int
- warning: Optional[str]
- class datarobot.models.genai.llm_test_configuration.DatasetEvaluationRequestDict
Bases:
dict
- ootb_dataset_name: str
- evaluation_name: str
- evaluation_dataset_configuration_id: Optional[str]
- insight_configuration: InsightsConfigurationDict
- insight_grading_criteria: InsightGradingCriteriaDict
- max_num_prompts: Optional[int]
- prompt_sampling_strategy: Optional[PromptSamplingStrategy]
- class datarobot.models.genai.llm_test_configuration.DatasetEvaluationDict
Bases:
dict
- evaluation_name: str
- evaluation_dataset_configuration_id: Optional[str]
- evaluation_dataset_name: str
- ootb_dataset: Optional[OOTBDatasetDict]
- insight_configuration: InsightsConfigurationDict
- insight_grading_criteria: InsightGradingCriteriaDict
- max_num_prompts: Optional[int]
- prompt_sampling_strategy: Optional[PromptSamplingStrategy]
- class datarobot.models.genai.nemo_configuration.NemoConfiguration
Bases:
APIObject
Configuration for the Nemo Pipeline.
- Variables:
  - prompt_pipeline_metric_name (Optional[str]) – The name of the metric for the prompt pipeline.
  - prompt_pipeline_files (NemoFileContentsResponse, optional) – The files used in the prompt pipeline.
  - prompt_llm_configuration (NemoLLMConfiguration, optional) – The LLM configuration for the prompt pipeline.
  - prompt_moderation_configuration (ModerationConfigurationWithoutID, optional) – The moderation configuration for the prompt pipeline.
  - prompt_pipeline_template_id (Optional[str]) – The ID of the prompt pipeline template. This parameter defines the actions.py file.
  - response_pipeline_metric_name (Optional[str]) – The name of the metric for the response pipeline.
  - response_pipeline_files (NemoFileContentsResponse, optional) – The files used in the response pipeline.
  - response_llm_configuration (NemoLLMConfiguration, optional) – The LLM configuration for the response pipeline.
  - response_moderation_configuration (ModerationConfigurationWithoutID, optional) – The moderation configuration for the response pipeline.
  - response_pipeline_template_id (Optional[str]) – The ID of the response pipeline template. This parameter defines the actions.py file.
  - blocked_terms_file_contents (str) – The contents of the blocked terms file. This is shared between the prompt and response pipelines.
- classmethod get(playground)
Get the Nemo configuration for a playground.
- Parameters:
  - playground (str or Playground) – The playground to get the configuration for.
- Returns:
The Nemo configuration for the playground.
- Return type:
NemoConfiguration
- classmethod upsert(playground, blocked_terms_file_contents, prompt_pipeline_metric_name=None, prompt_pipeline_files=None, prompt_llm_configuration=None, prompt_moderation_configuration=None, prompt_pipeline_template_id=None, response_pipeline_metric_name=None, response_pipeline_files=None, response_llm_configuration=None, response_moderation_configuration=None, response_pipeline_template_id=None)
Create or update the Nemo configuration for a playground.
- Parameters:
  - playground (str or Playground) – The playground for the configuration.
  - blocked_terms_file_contents (str) – The contents of the blocked terms file.
  - prompt_pipeline_metric_name (Optional[str]) – The name of the metric for the prompt pipeline.
  - prompt_pipeline_files (NemoFileContents, optional) – The files used in the prompt pipeline.
  - prompt_llm_configuration (NemoLLMConfiguration, optional) – The LLM configuration for the prompt pipeline.
  - prompt_moderation_configuration (ModerationConfigurationWithoutID, optional) – The moderation configuration for the prompt pipeline.
  - prompt_pipeline_template_id (Optional[str]) – The ID of the prompt pipeline template. This defines the actions.py file.
  - response_pipeline_metric_name (Optional[str]) – The name of the metric for the response pipeline.
  - response_pipeline_files (NemoFileContents, optional) – The files used in the response pipeline.
  - response_llm_configuration (NemoLLMConfiguration, optional) – The LLM configuration for the response pipeline.
  - response_moderation_configuration (ModerationConfigurationWithoutID, optional) – The moderation configuration for the response pipeline.
  - response_pipeline_template_id (Optional[str]) – The ID of the response pipeline template. This defines the actions.py file.
- Returns:
The Nemo configuration for the playground.
- Return type:
NemoConfiguration
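A hedged sketch of configuring the Nemo pipeline for a playground with a blocked terms list; the newline-separated file format shown here is an assumption, and the ID is a placeholder:

```python
from datarobot.models.genai.nemo_configuration import NemoConfiguration

playground_id = "5f3e9c0d1a2b3c4d5e6f7a8c"  # placeholder playground ID

# Assumed format: one blocked term per line.
blocked_terms = "confidential\nproprietary\n"

NemoConfiguration.upsert(
    playground=playground_id,
    blocked_terms_file_contents=blocked_terms,
)

config = NemoConfiguration.get(playground_id)
print(config.blocked_terms_file_contents)
```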
- class datarobot.models.genai.llm_test_configuration.OOTBDataset
Bases:
APIObject
Metadata for a DataRobot GenAI out-of-the-box LLM compliance test dataset.
- Variables:
  - dataset_name (str) – The name of the dataset.
  - prompt_column_name (str) – The name of the prompt column.
  - response_column_name (str or None, optional) – The name of the response column, if any.
  - dataset_url (str or None, optional) – The URL of the dataset.
  - rows_count (int) – The number of rows in the dataset.
  - warning (str or None, optional) – A warning message regarding the contents of the dataset, if any.
- to_dict()
- Return type:
OOTBDatasetDict
- classmethod list()
List all out-of-the-box datasets available to the user.
- Returns:
ootb_datasets – Returns a list of out-of-the-box datasets.
- Return type:
list[OOTBDataset]
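For example, the available out-of-the-box datasets can be inspected before building a test configuration:

```python
from datarobot.models.genai.llm_test_configuration import OOTBDataset

for dataset in OOTBDataset.list():
    print(dataset.dataset_name, dataset.rows_count, dataset.prompt_column_name)
```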
- class datarobot.models.genai.llm_test_configuration.NonOOTBDataset
Bases:
APIObject
Metadata for a DataRobot GenAI non out-of-the-box (OOTB) LLM compliance test dataset.
- classmethod list(use_case=None)
List all non out-of-the-box datasets available to the user.
- Returns:
non_ootb_datasets – Returns a list of non out-of-the-box datasets.
- Return type:
list[NonOOTBDataset]
- class datarobot.models.genai.metric_insights.MetricInsights
Bases:
InsightsConfiguration
Metric insights for a playground.
- classmethod list(playground)
Get metric insights for a playground.
- Parameters:
  - playground (str or Playground) – The playground to get the supported metrics from.
- Returns:
insights – Metric insights for the playground.
- Return type:
list[InsightsConfiguration]
- classmethod copy_to_playground(source_playground, target_playground, add_to_existing=True, with_evaluation_datasets=False)
Copy metric insights from one playground to another.
- Parameters:
  - source_playground (str or Playground) – The playground to copy metric insights from.
  - target_playground (str or Playground) – The playground to copy metric insights to.
  - add_to_existing (Optional[bool]) – Add metric insights to existing ones in the target playground. Defaults to True.
  - with_evaluation_datasets (Optional[bool]) – Copy evaluation datasets from the source playground.
- Return type:
None
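A sketch of copying the configured metric insights from one playground to another (placeholder IDs):

```python
from datarobot.models.genai.metric_insights import MetricInsights

source_playground_id = "5f3e9c0d1a2b3c4d5e6f7a8c"  # placeholder playground IDs
target_playground_id = "5f3e9c0d1a2b3c4d5e6f7a96"

MetricInsights.copy_to_playground(
    source_playground_id,
    target_playground_id,
    add_to_existing=True,
    with_evaluation_datasets=True,
)

# Verify the metrics now configured in the target playground.
for insight in MetricInsights.list(target_playground_id):
    print(insight.insight_name)
```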
- class datarobot.models.genai.ootb_metric_configuration.PlaygroundOOTBMetricConfiguration
Bases:
APIObject
OOTB metric configurations for a playground.
- Variables:
  - ootb_metric_configurations (List[OOTBMetricConfigurationResponse]) – The list of the OOTB metric configurations.
- path = 'api/v2/genai/playgrounds/{playground_id}/ootbMetricConfigurations'
- classmethod get(playground_id)
Get OOTB metric configurations for the playground.
- Return type:
PlaygroundOOTBMetricConfiguration
- classmethod create(playground_id, ootb_metric_configurations)
Create new OOTB metric configurations.
- Return type:
PlaygroundOOTBMetricConfiguration
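A minimal sketch of reading the OOTB metric configurations of a playground (placeholder ID):

```python
from datarobot.models.genai.ootb_metric_configuration import (
    PlaygroundOOTBMetricConfiguration,
)

ootb_config = PlaygroundOOTBMetricConfiguration.get("5f3e9c0d1a2b3c4d5e6f7a8c")  # placeholder ID
print(ootb_config.ootb_metric_configurations)
```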