Data exports
Use deployment data export to retrieve the data sent for predictions along with the associated predictions.
Prediction data export
Use the following commands to manage prediction data exports:
Create a prediction data export
To create a prediction data export, use PredictionDataExport.create, defining the time window to include in the export using the start and end parameters, as shown in the following example:
from datetime import datetime, timedelta
from datarobot.models.deployment import PredictionDataExport
now=datetime.now()
prediction_data_export = PredictionDataExport.create(
deployment_id='5c939e08962d741e34f609f0', start=now - timedelta(days=7), end=now)
To export data for a specific model, specify the model ID; otherwise, the champion model ID is used by default:
from datetime import datetime, timedelta
from datarobot.models.deployment import PredictionDataExport
now=datetime.now()
prediction_data_export = PredictionDataExport.create(
deployment_id='5c939e08962d741e34f609f0',
model_id='6444482e5583f6ee2e572265',
start=now - timedelta(days=7),
end=now
)
For deployments in batch mode, provide batch IDs to export prediction data for those batches:
from datetime import datetime, timedelta
from datarobot.models.deployment import PredictionDataExport
now=datetime.now()
prediction_data_export = PredictionDataExport.create(
deployment_id='5c939e08962d741e34f609f0',
model_id='6444482e5583f6ee2e572265',
start=now - timedelta(days=7),
end=now,
batch_ids=['6572db2c9f9d4ad3b9de33d0', '6572db2c9f9d4ad3b9de33d0']
)
The start and end of the export can be defined as a datetime or string type.
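Because start and end accept either type, the window can be built once and reused across several exports. The following is a minimal sketch: export_window is a hypothetical helper (not part of the datarobot client), and the use of timezone-aware UTC datetimes and ISO 8601 strings is an assumption about what is convenient, not a documented requirement.

```python
from datetime import datetime, timedelta, timezone


def export_window(days, now=None, as_strings=False):
    """Return a (start, end) pair covering the last `days` days.

    Hypothetical helper, not part of the datarobot client.
    """
    if days <= 0:
        raise ValueError("days must be positive")
    # Use an explicit UTC timestamp so the window does not depend on the local timezone.
    end = now if now is not None else datetime.now(timezone.utc)
    start = end - timedelta(days=days)
    if as_strings:
        # ISO 8601 strings; assumed (not confirmed here) to be an accepted string form.
        return start.isoformat(), end.isoformat()
    return start, end


start, end = export_window(7, now=datetime(2024, 3, 20, 15, 0, tzinfo=timezone.utc))
print(start, end)
```

The returned pair could then be passed as the start and end arguments of PredictionDataExport.create.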
List prediction data exports
To list prediction data exports, use PredictionDataExport.list, as in the following example:
from datarobot.models.deployment import PredictionDataExport
prediction_data_exports = PredictionDataExport.list(deployment_id='5c939e08962d741e34f609f0', limit=0)
prediction_data_exports
>>> [PredictionDataExport('65fbe59aaa3f847bd5acc75b'),
PredictionDataExport('65fbe59aaa3f847bd5acc75c'),
PredictionDataExport('65fbe59aaa3f847bd5acc75a')]
To list all prediction data exports, set the limit to 0.
Adjust additional parameters to filter the data as needed:
from datarobot.enums import ExportStatus
from datarobot.models.deployment import PredictionDataExport
prediction_data_exports = PredictionDataExport.list(deployment_id='5c939e08962d741e34f609f0', limit=100, offset=100)
# use additional filters
prediction_data_exports = PredictionDataExport.list(
deployment_id='5c939e08962d741e34f609f0',
model_id="6444482e5583f6ee2e572265",
batch=False,
status=ExportStatus.FAILED
)
Retrieve a prediction data export
To get a prediction data export by identifier, use PredictionDataExport.get, as in the following example:
from datarobot.models.deployment import PredictionDataExport
prediction_data_export = PredictionDataExport.get(
deployment_id='5c939e08962d741e34f609f0', export_id='65fbe59aaa3f847bd5acc75b'
)
prediction_data_export
>>> PredictionDataExport('65fbe59aaa3f847bd5acc75b')
Fetch prediction export datasets
To return data from a prediction export as dr.Dataset, use the fetch_data method, as in the following example:
from datarobot.models.deployment import PredictionDataExport
prediction_data_export = PredictionDataExport.get(
deployment_id='5c939e08962d741e34f609f0', export_id='65fbe59aaa3f847bd5acc75b'
)
prediction_datasets = prediction_data_export.fetch_data()
prediction_datasets
>>> [Dataset(name='Deployment prediction data', id='65f240b0e37a9f1a104bf450')]
prediction_dataset = prediction_datasets[0]
df = prediction_dataset.get_as_dataframe()
df.head(2)
>>> DR_RESERVED_PREDICTION_TIMESTAMP ... upstream_x_datarobot_version
0 2024-03-13 23:00:38.998000+00:00 ... predictionapi/X/X
1 2024-03-13 23:00:38.998000+00:00 ... predictionapi/X/X
This method returns a list of datasets. The list usually contains a single dataset, but in some cases, such as time series, it contains more than one. Each dataset can be converted into, for example, a pandas DataFrame.
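Since the list may hold more than one dataset, it can help to convert all of them in one step rather than indexing the first element. The following is a minimal sketch under stated assumptions: frames_by_dataset_id is a hypothetical helper, it relies only on the id attribute and the get_as_dataframe method shown in the example, and StubDataset is a stand-in so the code runs offline (its "frame" is a plain list of rows).

```python
def frames_by_dataset_id(datasets):
    """Map each dataset's id to the object returned by its get_as_dataframe()."""
    return {dataset.id: dataset.get_as_dataframe() for dataset in datasets}


class StubDataset:
    """Offline stand-in for dr.Dataset (hypothetical, for illustration only)."""

    def __init__(self, id, rows):
        self.id = id
        self._rows = rows

    def get_as_dataframe(self):
        # A real dr.Dataset returns a pandas DataFrame here.
        return self._rows


datasets = [
    StubDataset("dataset-a", [{"x": 1}]),
    StubDataset("dataset-b", [{"x": 2}, {"x": 3}]),
]
frames = frames_by_dataset_id(datasets)
print(sorted(frames))
```

If the resulting DataFrames share the same columns, they could be combined with pandas.concat(list(frames.values()), ignore_index=True); whether that is meaningful depends on the deployment.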
Actuals data export
Use the following commands to manage actuals data exports:
Create an actuals data export
To create an actuals data export, use ActualsDataExport.create, defining the time window to include in the export using the start and end parameters, as shown in the following example:
from datetime import datetime, timedelta
from datarobot.models.deployment import ActualsDataExport
now=datetime.now()
actuals_data_export = ActualsDataExport.create(
deployment_id='5c939e08962d741e34f609f0', start=now - timedelta(days=7), end=now
)
To export data for a specific model, specify the model ID; otherwise, the champion model ID is used by default:
from datetime import datetime, timedelta
from datarobot.models.deployment import ActualsDataExport
now=datetime.now()
actuals_data_export = ActualsDataExport.create(
deployment_id='5c939e08962d741e34f609f0',
model_id="6444482e5583f6ee2e572265",
start=now - timedelta(days=7),
end=now,
)
To export only actuals that are matched to predictions, set only_matched_predictions to True; by default, all available actuals are exported:
from datetime import datetime, timedelta
from datarobot.models.deployment import ActualsDataExport
now=datetime.now()
actuals_data_export = ActualsDataExport.create(
deployment_id='5c939e08962d741e34f609f0',
only_matched_predictions=True,
start=now - timedelta(days=7),
end=now,
)
The start and end of the export can be defined as a datetime or string type.
List actuals data exports
To list actuals data exports, use ActualsDataExport.list, as in the following example:
from datarobot.models.deployment import ActualsDataExport
actuals_data_exports = ActualsDataExport.list(deployment_id='5c939e08962d741e34f609f0', limit=0)
actuals_data_exports
>>> [ActualsDataExport('660456a332d0081029ee5031'),
ActualsDataExport('660456a332d0081029ee5032'),
ActualsDataExport('660456a332d0081029ee5033')]
To list all actuals data exports, set the limit to 0.
Adjust additional parameters to filter the data as needed:
from datarobot.enums import ExportStatus
from datarobot.models.deployment import ActualsDataExport
# use additional filters
actuals_data_exports = ActualsDataExport.list(
deployment_id='5c939e08962d741e34f609f0',
offset=500,
limit=50,
status=ExportStatus.SUCCEEDED
)
Retrieve an actuals data export
To get an actuals data export by identifier, use ActualsDataExport.get, as in the following example:
from datarobot.models.deployment import ActualsDataExport
actuals_data_export = ActualsDataExport.get(
deployment_id='5c939e08962d741e34f609f0', export_id='660456a332d0081029ee4031'
)
actuals_data_export
>>> ActualsDataExport('660456a332d0081029ee4031')
Fetch actuals export datasets
To return data from an actuals export as dr.Dataset, use the fetch_data method, as in the following example:
from datarobot.models.deployment import ActualsDataExport
actuals_data_export = ActualsDataExport.get(
deployment_id='5c939e08962d741e34f609f0', export_id='660456a332d0081029ee4031'
)
actuals_datasets = actuals_data_export.fetch_data()
actuals_datasets
>>> [Dataset(name='Deployment prediction data', id='65f240b0e37a9f1a104bf450')]
actuals_dataset = actuals_datasets[0]
df = actuals_dataset.get_as_dataframe()
df.head(2)
>>> association_id timestamp actuals predictions
0 1 2024-03-20 15:00:00+00:00 21.0 18.125388
1 10 2024-03-20 15:00:00+00:00 12.0 22.805252
This method may return a list of datasets; however, it usually returns one dataset. The obtained dataset (or datasets) can be transformed into, for example, a pandas DataFrame.
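Once the export is available as rows of actuals and predictions, a simple accuracy check is straightforward. The sketch below computes the mean absolute error over the two rows from the example output above. It is an illustration only: mean_absolute_error is a hypothetical helper, it assumes numeric (regression-style) values in the actuals and predictions columns, and it treats missing values as None.

```python
def mean_absolute_error(rows):
    """Average |actual - prediction| over rows that have both values."""
    errors = [
        abs(row["actuals"] - row["predictions"])
        for row in rows
        if row.get("actuals") is not None and row.get("predictions") is not None
    ]
    if not errors:
        raise ValueError("no rows with both an actual and a prediction")
    return sum(errors) / len(errors)


# The two rows from the example output, written as plain dictionaries.
rows = [
    {"association_id": 1, "actuals": 21.0, "predictions": 18.125388},
    {"association_id": 10, "actuals": 12.0, "predictions": 22.805252},
]
print(round(mean_absolute_error(rows), 6))
```

Starting from a DataFrame, the rows could be produced with df.to_dict("records"); pandas represents missing values as NaN rather than None, so drop incomplete rows first (for example with df.dropna) before using this helper.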
Training data export
Use the following commands to manage training data exports:
Create a training data export
To create a training data export, use TrainingDataExport.create and define the deployment ID, as shown in the following example:
from datarobot.models.deployment import TrainingDataExport
dataset_id = TrainingDataExport.create(deployment_id='5c939e08962d741e34f609f0')
To export data for a specific model, specify the model ID; otherwise, the champion model ID is used by default:
from datarobot.models.deployment import TrainingDataExport
dataset_id = TrainingDataExport.create(
deployment_id='5c939e08962d741e34f609f0', model_id='6444482e5583f6ee2e572265')
dataset_id
>>> 65fb0c25019ca3333bbb4c10
This method returns the ID of the dataset that contains the training data. This dataset is saved in the AI Catalog.
List training data exports
To list training data exports, use TrainingDataExport.list, as in the following example:
from datarobot.models.deployment import TrainingDataExport
training_data_exports = TrainingDataExport.list(deployment_id='5c939e08962d741e34f609f0')
training_data_exports
>>> [TrainingDataExport('65fbf2356124f1daa3acc522')]
Retrieve a training data export
To get a training data export by identifier, use TrainingDataExport.get, as in the following example:
from datarobot.models.deployment import TrainingDataExport
training_data_export = TrainingDataExport.get(
deployment_id='5c939e08962d741e34f609f0', export_id='65fbf2356124f1daa3acc522'
)
training_data_export
>>> TrainingDataExport('65fbf2356124f1daa3acc522')
Fetch training export dataset
To return data from the training export as dr.Dataset, use fetch_data, as in the following example:
from datarobot.models.deployment import TrainingDataExport
training_data_export = TrainingDataExport.get(
deployment_id='5c939e08962d741e34f609f0', export_id='660456a332d0081029ee4031'
)
training_dataset = training_data_export.fetch_data()
training_dataset
>>> Dataset(name='training-data-10k_diabetes.csv', id='65fb0c25019ca3333bbb4c10')
df = training_dataset.get_as_dataframe()
df.head(2)
>>> acetohexamide time_in_hospital ... number_outpatient payer_code
0 No 1 ... 0 YY
1 No 2 ... 0 XX
This method returns a single training dataset. The obtained dataset can be transformed into, for example, a pandas DataFrame.
Data quality export
Data quality exports provide feedback on LLM deployments. They are intended to be used in conjunction with custom metrics for prompt monitoring.
Use the following commands to manage data quality exports:
List data quality exports
To list data quality exports, use DataQualityExport.list, as in the following example:
from datetime import datetime, timedelta
from datarobot.models.deployment import DataQualityExport
now=datetime.now()
data_quality_exports = DataQualityExport.list(
deployment_id='66903c40f18e6ec90fd7c8c7',
start=now - timedelta(days=1),
end=now,
)
data_quality_exports
>>> [DataQualityExport(6447ca39c6a04df6b5b0ed19c6101e3c),
...
DataQualityExport(0ff46fd3636545a9ac3e15ee1dbd8638)]
data_quality_exports[0].metrics
>>> [{'id': '669688f90a23524131e2d301', 'name': 'metric 3', 'value': None},
{'id': '669688e633ae1ffce40eb2f8', 'name': 'metric 2', 'value': 45.0},
{'id': '669688d282c9384ab8068a6c', 'name': 'metric 1', 'value': 178.0}]
The start and end of the export can be defined as a datetime or string type. Additional parameters are available for filtering and ordering the results.
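The metrics attribute shown above is a list of dictionaries, which is awkward to query by metric name. A minimal sketch of one way to reshape it follows; metrics_by_name is a hypothetical helper, and the sample list is copied from the example output rather than fetched from a deployment.

```python
def metrics_by_name(metrics, skip_missing=True):
    """Turn a list of {'id', 'name', 'value'} dicts into a {name: value} mapping."""
    return {
        metric["name"]: metric["value"]
        for metric in metrics
        # Optionally drop metrics that have no recorded value.
        if not (skip_missing and metric["value"] is None)
    }


# Sample copied from the example output.
metrics = [
    {'id': '669688f90a23524131e2d301', 'name': 'metric 3', 'value': None},
    {'id': '669688e633ae1ffce40eb2f8', 'name': 'metric 2', 'value': 45.0},
    {'id': '669688d282c9384ab8068a6c', 'name': 'metric 1', 'value': 178.0},
]
print(metrics_by_name(metrics))
```

Note that if two metrics share a name, the later entry overwrites the earlier one; keying by id instead avoids that.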