Data exports
Use deployment data export to retrieve data sent for predictions along with the associated predictions.
Prediction data export
The following sections outline how to manage prediction data exports.
Create a prediction data export
To create a prediction data export, use PredictionDataExport.create, defining the time window to include in the export using the start and end parameters:
from datetime import datetime, timedelta
from datarobot.models.deployment import PredictionDataExport
now = datetime.now()
prediction_data_export = PredictionDataExport.create(
    deployment_id='5c939e08962d741e34f609f0',
    start=now - timedelta(days=7),
    end=now,
)
Optionally, specify the ID of the model to export data for; if it is omitted, the champion model ID is used by default:
from datetime import datetime, timedelta
from datarobot.models.deployment import PredictionDataExport
now = datetime.now()
prediction_data_export = PredictionDataExport.create(
    deployment_id='5c939e08962d741e34f609f0',
    model_id='6444482e5583f6ee2e572265',
    start=now - timedelta(days=7),
    end=now,
)
For deployments in batch mode, provide batch IDs to export prediction data for those batches:
from datetime import datetime, timedelta
from datarobot.models.deployment import PredictionDataExport
now = datetime.now()
prediction_data_export = PredictionDataExport.create(
    deployment_id='5c939e08962d741e34f609f0',
    model_id='6444482e5583f6ee2e572265',
    start=now - timedelta(days=7),
    end=now,
    batch_ids=['6572db2c9f9d4ad3b9de33d0', '6572db2c9f9d4ad3b9de33d0'],
)
The start and end of the export can each be passed as a datetime or a string.
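To illustrate the two accepted forms, the sketch below builds a seven-day window once as datetime objects and once as strings. The helper name export_window is hypothetical, and the use of ISO 8601 strings with a UTC offset is an assumption made for this example; the text above does not state which string formats the client accepts.

```python
from datetime import datetime, timedelta, timezone


def export_window(days, now=None):
    """Return (start, end) covering the last `days` days.

    Hypothetical helper for illustration; not part of the datarobot package.
    """
    end = now if now is not None else datetime.now(timezone.utc)
    start = end - timedelta(days=days)
    return start, end


# A fixed reference time keeps the example reproducible.
reference = datetime(2024, 3, 20, 15, 0, tzinfo=timezone.utc)
start, end = export_window(7, now=reference)

# Assumption: ISO 8601 strings; the accepted string format is not stated above.
start_str, end_str = start.isoformat(), end.isoformat()
print(start_str, end_str)
```

Either pair, (start, end) or (start_str, end_str), could then be supplied as the start and end arguments of PredictionDataExport.create, provided the string format matches what the client expects.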
List prediction data exports
To list prediction data exports, use PredictionDataExport.list:
from datarobot.models.deployment import PredictionDataExport
prediction_data_exports = PredictionDataExport.list(deployment_id='5c939e08962d741e34f609f0', limit=0)
prediction_data_exports
>>> [PredictionDataExport('65fbe59aaa3f847bd5acc75b'),
PredictionDataExport('65fbe59aaa3f847bd5acc75c'),
PredictionDataExport('65fbe59aaa3f847bd5acc75a')]
To list all prediction data exports, set limit to 0.
Use limit and offset to page through the results, and pass additional parameters to filter them as needed:
from datarobot.enums import ExportStatus
from datarobot.models.deployment import PredictionDataExport
prediction_data_exports = PredictionDataExport.list(deployment_id='5c939e08962d741e34f609f0', limit=100, offset=100)
# Use additional filters
prediction_data_exports = PredictionDataExport.list(
    deployment_id='5c939e08962d741e34f609f0',
    model_id='6444482e5583f6ee2e572265',
    batch=False,
    status=ExportStatus.FAILED,
)
Retrieve a prediction data export
To get a prediction data export by identifier, use PredictionDataExport.get:
from datarobot.models.deployment import PredictionDataExport
prediction_data_export = PredictionDataExport.get(
    deployment_id='5c939e08962d741e34f609f0',
    export_id='65fbe59aaa3f847bd5acc75b',
)
prediction_data_export
>>> PredictionDataExport('65fbe59aaa3f847bd5acc75b')
Fetch prediction export datasets
To return the data from a prediction export as dr.Dataset objects, use the fetch_data method. It returns a list of datasets. The list usually contains a single dataset, but in some cases, such as time series, it contains more than one. Each dataset can then be converted to, for example, a pandas DataFrame.
from datarobot.models.deployment import PredictionDataExport
prediction_data_export = PredictionDataExport.get(
    deployment_id='5c939e08962d741e34f609f0',
    export_id='65fbe59aaa3f847bd5acc75b',
)
prediction_datasets = prediction_data_export.fetch_data()
prediction_datasets
>>> [Dataset(name='Deployment prediction data', id='65f240b0e37a9f1a104bf450')]
prediction_dataset = prediction_datasets[0]
df = prediction_dataset.get_as_dataframe()
df.head(2)
>>> DR_RESERVED_PREDICTION_TIMESTAMP ... upstream_x_datarobot_version
0 2024-03-13 23:00:38.998000+00:00 ... predictionapi/X/X
1 2024-03-13 23:00:38.998000+00:00 ... predictionapi/X/X
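Because fetch_data can return more than one dataset, taking only the first element may drop data. The sketch below converts every returned dataset and keys the results by dataset ID. It assumes each dataset exposes an id attribute and a get_as_dataframe() method, as the example above suggests. FakeDataset and frames_by_dataset are hypothetical names used so that the snippet runs without a DataRobot connection.

```python
class FakeDataset:
    """Hypothetical stand-in for dr.Dataset so the sketch runs offline."""

    def __init__(self, id, rows):
        self.id = id
        self._rows = rows

    def get_as_dataframe(self):
        # The real method returns a pandas DataFrame; a list of dicts is
        # used here only to keep the example dependency-free.
        return self._rows


def frames_by_dataset(datasets):
    """Convert every returned dataset, keyed by dataset ID."""
    return {ds.id: ds.get_as_dataframe() for ds in datasets}


datasets = [
    FakeDataset("ds-1", [{"prediction": 0.2}]),
    FakeDataset("ds-2", [{"prediction": 0.7}, {"prediction": 0.9}]),
]
frames = frames_by_dataset(datasets)
print({key: len(rows) for key, rows in frames.items()})
```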
Actuals data export
The following examples outline how to manage actuals data exports.
Create an actuals data export
To create an actuals data export, use ActualsDataExport.create, defining the time window to include in the export using the start and end parameters:
from datetime import datetime, timedelta
from datarobot.models.deployment import ActualsDataExport
now = datetime.now()
actuals_data_export = ActualsDataExport.create(
    deployment_id='5c939e08962d741e34f609f0',
    start=now - timedelta(days=7),
    end=now,
)
Optionally, specify the ID of the model to export data for; if it is omitted, the champion model ID is used by default:
from datetime import datetime, timedelta
from datarobot.models.deployment import ActualsDataExport
now = datetime.now()
actuals_data_export = ActualsDataExport.create(
    deployment_id='5c939e08962d741e34f609f0',
    model_id='6444482e5583f6ee2e572265',
    start=now - timedelta(days=7),
    end=now,
)
To export only actuals that are matched to predictions, set only_matched_predictions to True; by default, all available actuals are exported. The start and end of the export can each be passed as a datetime or a string.
from datetime import datetime, timedelta
from datarobot.models.deployment import ActualsDataExport
now = datetime.now()
actuals_data_export = ActualsDataExport.create(
    deployment_id='5c939e08962d741e34f609f0',
    only_matched_predictions=True,
    start=now - timedelta(days=7),
    end=now,
)
List actuals data exports
To list actuals data exports, use ActualsDataExport.list:
from datarobot.models.deployment import ActualsDataExport
actuals_data_exports = ActualsDataExport.list(deployment_id='5c939e08962d741e34f609f0', limit=0)
actuals_data_exports
>>> [ActualsDataExport('660456a332d0081029ee5031'),
ActualsDataExport('660456a332d0081029ee5032'),
ActualsDataExport('660456a332d0081029ee5033')]
To list all actuals data exports, set limit to 0.
Use limit and offset to page through the results, and pass additional parameters to filter them as needed:
from datarobot.enums import ExportStatus
from datarobot.models.deployment import ActualsDataExport
# use additional filters
actuals_data_exports = ActualsDataExport.list(
    deployment_id='5c939e08962d741e34f609f0',
    offset=500,
    limit=50,
    status=ExportStatus.SUCCEEDED,
)
Retrieve an actuals data export
To get an actuals data export by identifier, use ActualsDataExport.get, as in the following example:
from datarobot.models.deployment import ActualsDataExport
actuals_data_export = ActualsDataExport.get(
    deployment_id='5c939e08962d741e34f609f0',
    export_id='660456a332d0081029ee4031',
)
actuals_data_export
>>> ActualsDataExport('660456a332d0081029ee4031')
Fetch actuals export datasets
To return the data from an actuals export as dr.Dataset objects, use the fetch_data method:
from datarobot.models.deployment import ActualsDataExport
actuals_data_export = ActualsDataExport.get(
    deployment_id='5c939e08962d741e34f609f0',
    export_id='660456a332d0081029ee4031',
)
actuals_datasets = actuals_data_export.fetch_data()
actuals_datasets
>>> [Dataset(name='Deployment prediction data', id='65f240b0e37a9f1a104bf450')]
actuals_dataset = actuals_datasets[0]
df = actuals_dataset.get_as_dataframe()
df.head(2)
>>> association_id timestamp actuals predictions
0 1 2024-03-20 15:00:00+00:00 21.0 18.125388
1 10 2024-03-20 15:00:00+00:00 12.0 22.805252
This method may return a list of datasets; however, it usually returns one dataset. The obtained dataset (or datasets) can be transformed into, for example, a pandas DataFrame.
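Once actuals and predictions sit side by side, as in the sample output above, they can be compared directly. The following sketch computes a mean absolute error over the two sample rows, represented as plain dictionaries so that it runs without pandas or a DataRobot connection. The helper name mean_absolute_error is hypothetical, and the metric itself is only an illustration of what the exported columns allow.

```python
def mean_absolute_error(rows):
    """Average |actuals - predictions| over rows that have both values."""
    errors = [
        abs(row["actuals"] - row["predictions"])
        for row in rows
        if row["actuals"] is not None and row["predictions"] is not None
    ]
    if not errors:
        raise ValueError("no rows with both actuals and predictions")
    return sum(errors) / len(errors)


# The two sample rows from the output above, as plain dicts.
rows = [
    {"association_id": "1", "actuals": 21.0, "predictions": 18.125388},
    {"association_id": "10", "actuals": 12.0, "predictions": 22.805252},
]
mae = mean_absolute_error(rows)
print(round(mae, 6))
```

With a real export, rows could be obtained from the DataFrame via df.to_dict("records"). Missing values in a pandas frame appear as NaN rather than None, so they would need to be dropped first.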
Training data export
The following examples outline how to manage training data exports.
Create a training data export
To create a training data export, use TrainingDataExport.create and define the deployment ID:
from datarobot.models.deployment import TrainingDataExport
dataset_id = TrainingDataExport.create(deployment_id='5c939e08962d741e34f609f0')
Optionally, specify the ID of the model to export data for; if it is omitted, the champion model ID is used by default:
from datarobot.models.deployment import TrainingDataExport
dataset_id = TrainingDataExport.create(
    deployment_id='5c939e08962d741e34f609f0',
    model_id='6444482e5583f6ee2e572265',
)
dataset_id
>>> 65fb0c25019ca3333bbb4c10
This method returns the ID of the dataset that contains the training data. This dataset is saved in the AI Catalog.
List training data exports
To list training data exports, use TrainingDataExport.list:
from datarobot.models.deployment import TrainingDataExport
training_data_exports = TrainingDataExport.list(deployment_id='5c939e08962d741e34f609f0')
training_data_exports
>>> [TrainingDataExport('65fbf2356124f1daa3acc522')]
Retrieve a training data export
To get a training data export by identifier, use TrainingDataExport.get:
from datarobot.models.deployment import TrainingDataExport
training_data_export = TrainingDataExport.get(
    deployment_id='5c939e08962d741e34f609f0',
    export_id='65fbf2356124f1daa3acc522',
)
training_data_export
>>> TrainingDataExport('65fbf2356124f1daa3acc522')
Fetch a training export dataset
To return the data from a training export as a dr.Dataset, use fetch_data. This method returns a single training dataset, which can then be converted to, for example, a pandas DataFrame.
from datarobot.models.deployment import TrainingDataExport
training_data_export = TrainingDataExport.get(
    deployment_id='5c939e08962d741e34f609f0',
    export_id='660456a332d0081029ee4031',
)
training_dataset = training_data_export.fetch_data()
training_dataset
>>> Dataset(name='training-data-10k_diabetes.csv', id='65fb0c25019ca3333bbb4c10')
df = training_dataset.get_as_dataframe()
df.head(2)
>>> acetohexamide time_in_hospital ... number_outpatient payer_code
0 No 1 ... 0 YY
1 No 2 ... 0 XX
Data quality export
Data quality exports provide feedback on LLM deployments. They are intended to be used in conjunction with custom metrics for prompt monitoring.
List data quality exports
To list data quality exports, use DataQualityExport.list. The start and end of the export can each be passed as a datetime or a string. Additional options are available for filtering and ordering the data.
from datetime import datetime, timedelta
from datarobot.models.deployment import DataQualityExport
now = datetime.now()
data_quality_exports = DataQualityExport.list(
    deployment_id='66903c40f18e6ec90fd7c8c7',
    start=now - timedelta(days=1),
    end=now,
)
data_quality_exports
>>> [DataQualityExport(6447ca39c6a04df6b5b0ed19c6101e3c),
...
DataQualityExport(0ff46fd3636545a9ac3e15ee1dbd8638)]
data_quality_exports[0].metrics
>>> [{'id': '669688f90a23524131e2d301', 'name': 'metric 3', 'value': None},
{'id': '669688e633ae1ffce40eb2f8', 'name': 'metric 2', 'value': 45.0},
{'id': '669688d282c9384ab8068a6c', 'name': 'metric 1', 'value': 178.0}]
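The metrics attribute shown above is a list of dictionaries, which is awkward to query by metric name. The following minimal sketch reshapes it into a name-to-value mapping and skips metrics without a value. The helper name metrics_by_name is hypothetical, and the input is copied from the sample output rather than fetched from a deployment.

```python
def metrics_by_name(metrics, skip_missing=True):
    """Map metric name to value, optionally dropping metrics whose value is None."""
    return {
        metric["name"]: metric["value"]
        for metric in metrics
        if not (skip_missing and metric["value"] is None)
    }


# Sample entries copied from the output above.
metrics = [
    {"id": "669688f90a23524131e2d301", "name": "metric 3", "value": None},
    {"id": "669688e633ae1ffce40eb2f8", "name": "metric 2", "value": 45.0},
    {"id": "669688d282c9384ab8068a6c", "name": "metric 1", "value": 178.0},
]
values = metrics_by_name(metrics)
print(values)
```

If two metrics share a name, the later entry overwrites the earlier one; keying by the id field instead avoids that.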