Data exports

Use deployment data exports to retrieve the data sent to a deployment for predictions along with the associated predictions, as well as actuals, training data, and data quality results.

Prediction data export

The following sections outline how to manage prediction data exports.

Create a prediction data export

To create a prediction data export, use PredictionDataExport.create, defining the time window to include in the export using the start and end parameters:

from datetime import datetime, timedelta
from datarobot.models.deployment import PredictionDataExport

now = datetime.now()

prediction_data_export = PredictionDataExport.create(
    deployment_id='5c939e08962d741e34f609f0', start=now - timedelta(days=7), end=now)

To export data for a specific model, specify its model ID; otherwise, the deployment's champion model is used by default:

from datetime import datetime, timedelta
from datarobot.models.deployment import PredictionDataExport

now = datetime.now()

prediction_data_export = PredictionDataExport.create(
    deployment_id='5c939e08962d741e34f609f0',
    model_id='6444482e5583f6ee2e572265',
    start=now - timedelta(days=7),
    end=now
)

For deployments in batch mode, provide batch IDs to export prediction data for those batches:

from datetime import datetime, timedelta
from datarobot.models.deployment import PredictionDataExport

now = datetime.now()

prediction_data_export = PredictionDataExport.create(
    deployment_id='5c939e08962d741e34f609f0',
    model_id='6444482e5583f6ee2e572265',
    start=now - timedelta(days=7),
    end=now,
    batch_ids=['6572db2c9f9d4ad3b9de33d0', '6572db2c9f9d4ad3b9de33d1']
)

The start and end of the export can be defined as a datetime or string type.
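The examples above use datetime.now(), which returns a naive datetime in local time. As a minimal sketch, the hypothetical helper export_window below builds a timezone-aware UTC window for the last N days and can return it either as datetime objects or as ISO 8601 strings. That the API accepts ISO 8601 strings is an assumption; the exact string format is not specified here.

```python
from datetime import datetime, timedelta, timezone


def export_window(days, as_string=False, now=None):
    """Return a (start, end) pair covering the `days` days that end at `now`.

    Hypothetical helper: defaults to a timezone-aware UTC "now" instead of the
    naive local time from datetime.now(). When as_string is True, both values
    are ISO 8601 strings (an assumed format, not confirmed by the API docs).
    """
    end = now if now is not None else datetime.now(timezone.utc)
    start = end - timedelta(days=days)
    if as_string:
        return start.isoformat(), end.isoformat()
    return start, end


# Usage (requires a configured DataRobot client):
# from datarobot.models.deployment import PredictionDataExport
# start, end = export_window(7, as_string=True)
# PredictionDataExport.create(
#     deployment_id='5c939e08962d741e34f609f0', start=start, end=end)
```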

List prediction data exports

To list prediction data exports, use PredictionDataExport.list:

from datarobot.models.deployment import PredictionDataExport

prediction_data_exports = PredictionDataExport.list(deployment_id='5c939e08962d741e34f609f0', limit=0)

prediction_data_exports
>>> [PredictionDataExport('65fbe59aaa3f847bd5acc75b'),
     PredictionDataExport('65fbe59aaa3f847bd5acc75c'),
     PredictionDataExport('65fbe59aaa3f847bd5acc75a')]

To list all prediction data exports, set the limit to 0.

Use additional parameters to paginate and filter the results as needed:

from datarobot.enums import ExportStatus
from datarobot.models.deployment import PredictionDataExport

prediction_data_exports = PredictionDataExport.list(deployment_id='5c939e08962d741e34f609f0', limit=100, offset=100)

# Use additional filters
prediction_data_exports = PredictionDataExport.list(
    deployment_id='5c939e08962d741e34f609f0',
    model_id="6444482e5583f6ee2e572265",
    batch=False,
    status=ExportStatus.FAILED
)
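If you prefer to page through exports explicitly instead of requesting everything with limit=0, limit and offset can be combined in a loop. The sketch below is a hypothetical helper, iter_pages, where fetch_page stands in for a call such as PredictionDataExport.list; it assumes that the call returns a list and that a page shorter than limit marks the last page.

```python
def iter_pages(fetch_page, page_size=100):
    """Yield items from fetch_page(offset=..., limit=...) until a short page is returned.

    Hypothetical helper; assumes fetch_page returns a list and that a page
    with fewer than page_size items is the final one.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    offset = 0
    while True:
        page = fetch_page(offset=offset, limit=page_size)
        yield from page
        if len(page) < page_size:
            break  # a short (or empty) page means nothing is left to fetch
        offset += page_size


# Usage (requires a configured DataRobot client):
# from functools import partial
# from datarobot.models.deployment import PredictionDataExport
# fetch = partial(PredictionDataExport.list, deployment_id='5c939e08962d741e34f609f0')
# for export in iter_pages(fetch, page_size=100):
#     print(export)
```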

Retrieve a prediction data export

To get a prediction data export by its ID, use PredictionDataExport.get:

from datarobot.models.deployment import PredictionDataExport

prediction_data_export = PredictionDataExport.get(
    deployment_id='5c939e08962d741e34f609f0', export_id='65fbe59aaa3f847bd5acc75b'
    )

prediction_data_export
>>> PredictionDataExport('65fbe59aaa3f847bd5acc75b')

Fetch prediction export datasets

To return data from a prediction export, use the fetch_data method, which returns a list of dr.Dataset objects. The list usually contains a single dataset, but in some cases, such as time series deployments, it contains more than one. Each dataset can be converted to, for example, a pandas DataFrame with get_as_dataframe:

from datarobot.models.deployment import PredictionDataExport

prediction_data_export = PredictionDataExport.get(
    deployment_id='5c939e08962d741e34f609f0', export_id='65fbe59aaa3f847bd5acc75b'
    )
prediction_datasets = prediction_data_export.fetch_data()

prediction_datasets
>>> [Dataset(name='Deployment prediction data', id='65f240b0e37a9f1a104bf450')]

prediction_dataset = prediction_datasets[0]

df = prediction_dataset.get_as_dataframe()
df.head(2)
>>>    DR_RESERVED_PREDICTION_TIMESTAMP  ...    upstream_x_datarobot_version
    0  2024-03-13 23:00:38.998000+00:00  ...               predictionapi/X/X
    1  2024-03-13 23:00:38.998000+00:00  ...               predictionapi/X/X
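Because fetch_data returns a list, it can be convenient to combine every returned dataset into one DataFrame. A minimal sketch, using a hypothetical helper combine_datasets that relies only on each dataset's get_as_dataframe method shown above; it assumes all datasets share the same columns, which may not hold in every case.

```python
import pandas as pd


def combine_datasets(datasets):
    """Concatenate the DataFrames of all datasets returned by fetch_data().

    Hypothetical helper; assumes the datasets have compatible columns.
    """
    frames = [dataset.get_as_dataframe() for dataset in datasets]
    if not frames:
        return pd.DataFrame()  # nothing was exported
    # ignore_index renumbers rows so indexes from separate datasets do not collide
    return pd.concat(frames, ignore_index=True)


# Usage:
# df = combine_datasets(prediction_data_export.fetch_data())
```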

Actuals data export

The following examples outline how to manage actuals data exports.

Create an actuals data export

To create an actuals data export, use ActualsDataExport.create, defining the time window to include in the export using the start and end parameters:

from datetime import datetime, timedelta
from datarobot.models.deployment import ActualsDataExport

now = datetime.now()
actuals_data_export = ActualsDataExport.create(
    deployment_id='5c939e08962d741e34f609f0', start=now - timedelta(days=7), end=now
    )

To export data for a specific model, specify its model ID; otherwise, the deployment's champion model is used by default:

from datetime import datetime, timedelta
from datarobot.models.deployment import ActualsDataExport

now = datetime.now()
actuals_data_export = ActualsDataExport.create(
    deployment_id='5c939e08962d741e34f609f0',
    model_id="6444482e5583f6ee2e572265",
    start=now - timedelta(days=7),
    end=now,
    )

The start and end of the export can be defined as a datetime or string type. To export only actuals that are matched to predictions, set only_matched_predictions to True; by default, all available actuals are exported:

from datetime import datetime, timedelta
from datarobot.models.deployment import ActualsDataExport

now = datetime.now()
actuals_data_export = ActualsDataExport.create(
    deployment_id='5c939e08962d741e34f609f0',
    only_matched_predictions=True,
    start=now - timedelta(days=7),
    end=now,
    )

List actuals data exports

To list actuals data exports, use ActualsDataExport.list:

from datarobot.models.deployment import ActualsDataExport

actuals_data_exports = ActualsDataExport.list(deployment_id='5c939e08962d741e34f609f0', limit=0)

actuals_data_exports
>>> [ActualsDataExport('660456a332d0081029ee5031'),
     ActualsDataExport('660456a332d0081029ee5032'),
     ActualsDataExport('660456a332d0081029ee5033')]

To list all actuals data exports, set the limit to 0.

Use additional parameters to paginate and filter the results as needed:

from datarobot.enums import ExportStatus
from datarobot.models.deployment import ActualsDataExport

# Use additional filters
actuals_data_exports = ActualsDataExport.list(
    deployment_id='5c939e08962d741e34f609f0',
    offset=500,
    limit=50,
    status=ExportStatus.SUCCEEDED
)

Retrieve an actuals data export

To get an actuals data export by its ID, use ActualsDataExport.get:

from datarobot.models.deployment import ActualsDataExport

actuals_data_export = ActualsDataExport.get(
    deployment_id='5c939e08962d741e34f609f0', export_id='660456a332d0081029ee4031'
    )

actuals_data_export
>>> ActualsDataExport('660456a332d0081029ee4031')

Fetch actuals export datasets

To return data from an actuals export as a list of dr.Dataset objects, use the fetch_data method:

from datarobot.models.deployment import ActualsDataExport

actuals_data_export = ActualsDataExport.get(
    deployment_id='5c939e08962d741e34f609f0', export_id='660456a332d0081029ee4031'
    )
actuals_datasets = actuals_data_export.fetch_data()

actuals_datasets
>>> [Dataset(name='Deployment prediction data', id='65f240b0e37a9f1a104bf450')]

actuals_dataset = actuals_datasets[0]

df = actuals_dataset.get_as_dataframe()
df.head(2)
>>>    association_id                  timestamp  actuals  predictions
    0               1  2024-03-20 15:00:00+00:00     21.0    18.125388
    1              10  2024-03-20 15:00:00+00:00     12.0    22.805252

As with prediction data exports, fetch_data returns a list of datasets that usually contains a single element. Each dataset can be converted to, for example, a pandas DataFrame with get_as_dataframe.
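Exported actuals can be compared with predictions directly in pandas. A minimal sketch, using a hypothetical helper actuals_error_summary that assumes the DataFrame has numeric actuals and predictions columns, as in the regression output above. It also assumes that actuals without a matched prediction (possible when only_matched_predictions is not set) appear with a missing prediction value, so those rows are dropped first.

```python
import pandas as pd


def actuals_error_summary(df: pd.DataFrame) -> dict:
    """Return row count, MAE, and RMSE for rows with both an actual and a prediction.

    Hypothetical helper; assumes numeric 'actuals' and 'predictions' columns.
    """
    # Keep only rows where both values are present
    matched = df.dropna(subset=["actuals", "predictions"])
    errors = matched["actuals"] - matched["predictions"]
    return {
        "rows": len(matched),
        "mae": float(errors.abs().mean()),
        "rmse": float((errors ** 2).mean() ** 0.5),
    }


# Usage:
# summary = actuals_error_summary(actuals_dataset.get_as_dataframe())
```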

Training data export

The following examples outline how to manage training data exports.

Create a training data export

To create a training data export, use TrainingDataExport.create and define the deployment ID:

from datarobot.models.deployment import TrainingDataExport

dataset_id = TrainingDataExport.create(deployment_id='5c939e08962d741e34f609f0')

To export data for a specific model, specify its model ID; otherwise, the deployment's champion model is used by default:

from datarobot.models.deployment import TrainingDataExport

dataset_id = TrainingDataExport.create(
    deployment_id='5c939e08962d741e34f609f0', model_id='6444482e5583f6ee2e572265')

dataset_id
>>> 65fb0c25019ca3333bbb4c10

This method returns the ID of the dataset that contains the training data. This dataset is saved in the AI Catalog.

List training data exports

To list training data exports, use TrainingDataExport.list:

from datarobot.models.deployment import TrainingDataExport

training_data_exports = TrainingDataExport.list(deployment_id='5c939e08962d741e34f609f0')

training_data_exports
>>> [TrainingDataExport('65fbf2356124f1daa3acc522')]

Retrieve a training data export

To get a training data export by its ID, use TrainingDataExport.get:

from datarobot.models.deployment import TrainingDataExport

training_data_export = TrainingDataExport.get(
    deployment_id='5c939e08962d741e34f609f0', export_id='65fbf2356124f1daa3acc522'
    )

training_data_export
>>> TrainingDataExport('65fbf2356124f1daa3acc522')

Fetch a training export dataset

To return data from a training export as a dr.Dataset, use fetch_data. Unlike the prediction and actuals exports, this method returns a single dataset rather than a list. The dataset can be converted to, for example, a pandas DataFrame.

from datarobot.models.deployment import TrainingDataExport

training_data_export = TrainingDataExport.get(
    deployment_id='5c939e08962d741e34f609f0', export_id='65fbf2356124f1daa3acc522'
    )
training_dataset = training_data_export.fetch_data()

training_dataset
>>> Dataset(name='training-data-10k_diabetes.csv', id='65fb0c25019ca3333bbb4c10')

df = training_dataset.get_as_dataframe()
df.head(2)
>>> acetohexamide  time_in_hospital  ... number_outpatient payer_code
  0            No                 1  ...                 0         YY
  1            No                 2  ...                 0         XX

Data quality export

Data quality exports provide feedback on LLM deployments. They are intended to be used in conjunction with custom metrics for prompt monitoring.

List data quality exports

To list data quality exports, use DataQualityExport.list. The start and end of the time window can be defined as a datetime or string type, and many additional options are available for filtering and ordering the data:

from datetime import datetime, timedelta
from datarobot.models.deployment import DataQualityExport

now = datetime.now()

data_quality_exports = DataQualityExport.list(
    deployment_id='66903c40f18e6ec90fd7c8c7',
    start=now - timedelta(days=1),
    end=now,
)

data_quality_exports
>>> [DataQualityExport(6447ca39c6a04df6b5b0ed19c6101e3c),
 ...
 DataQualityExport(0ff46fd3636545a9ac3e15ee1dbd8638)]

data_quality_exports[0].metrics
>>> [{'id': '669688f90a23524131e2d301', 'name': 'metric 3', 'value': None},
 {'id': '669688e633ae1ffce40eb2f8', 'name': 'metric 2', 'value': 45.0},
 {'id': '669688d282c9384ab8068a6c', 'name': 'metric 1', 'value': 178.0}]
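Each entry in metrics is a dictionary with id, name, and value keys, as in the output above. A minimal sketch for reshaping these lists: metrics_by_name and collect_metric_values are hypothetical helpers, and the first assumes metric names are unique within one export (key on id instead if they are not).

```python
from collections import defaultdict


def metrics_by_name(metrics):
    """Map each metric name to its value for a single export (None values are kept)."""
    return {metric["name"]: metric.get("value") for metric in metrics}


def collect_metric_values(metric_lists):
    """Gather non-None values per metric name across several exports' metrics lists."""
    values = defaultdict(list)
    for metrics in metric_lists:
        for name, value in metrics_by_name(metrics).items():
            if value is not None:  # skip metrics that have no value for this export
                values[name].append(value)
    return dict(values)


# Usage:
# per_export = metrics_by_name(data_quality_exports[0].metrics)
# all_values = collect_metric_values(e.metrics for e in data_quality_exports)
```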