Data exports

Use deployment data export to retrieve data sent for predictions along with the associated predictions.

Prediction data export

The following sections outline how to manage prediction data exports.

Create a prediction data export

To create a prediction data export, use PredictionDataExport.create, defining the time window to include in the export using the start and end parameters:

from datetime import datetime, timedelta
from datarobot.models.deployment import PredictionDataExport

now=datetime.now()

prediction_data_export = PredictionDataExport.create(
    deployment_id='5c939e08962d741e34f609f0', start=now - timedelta(days=7), end=now)

To export data for a specific model, specify the model ID. Otherwise, the champion model ID is used by default:

from datetime import datetime, timedelta
from datarobot.models.deployment import PredictionDataExport

now=datetime.now()

prediction_data_export = PredictionDataExport.create(
    deployment_id='5c939e08962d741e34f609f0',
    model_id='6444482e5583f6ee2e572265',
    start=now - timedelta(days=7),
    end=now
)

For deployments in batch mode, provide batch IDs to export prediction data for those batches:

from datetime import datetime, timedelta
from datarobot.models.deployment import PredictionDataExport

now=datetime.now()

prediction_data_export = PredictionDataExport.create(
    deployment_id='5c939e08962d741e34f609f0',
    model_id='6444482e5583f6ee2e572265',
    start=now - timedelta(days=7),
    end=now,
    batch_ids=['6572db2c9f9d4ad3b9de33d0', '6572db2c9f9d4ad3b9de33d0']
)

The start and end of the export can be defined as a datetime or string type.
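Because either type is accepted, the window can also be built once and passed as strings. The following is a minimal sketch: export_window is a hypothetical helper, not part of the client, and it assumes that the string form is an ISO 8601 timestamp with a UTC offset, which this page does not confirm.

```python
from datetime import datetime, timedelta, timezone


def export_window(days, now=None):
    """Return (start, end) as ISO 8601 strings covering the last `days` days.

    Hypothetical helper; assumes the export API accepts ISO 8601 strings.
    """
    # Use an explicit UTC timestamp so the window does not depend on local time.
    end = now or datetime.now(timezone.utc)
    start = end - timedelta(days=days)
    return start.isoformat(), end.isoformat()


start, end = export_window(7, now=datetime(2024, 3, 20, 15, 0, tzinfo=timezone.utc))
print(start)  # 2024-03-13T15:00:00+00:00
print(end)    # 2024-03-20T15:00:00+00:00
```

The resulting values would then be passed as the start and end arguments of PredictionDataExport.create in place of the datetime objects.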

List prediction data exports

To list prediction data exports, use PredictionDataExport.list:

from datarobot.models.deployment import PredictionDataExport

prediction_data_exports = PredictionDataExport.list(deployment_id='5c939e08962d741e34f609f0', limit=0)

prediction_data_exports
>>> [PredictionDataExport('65fbe59aaa3f847bd5acc75b'),
     PredictionDataExport('65fbe59aaa3f847bd5acc75c'),
     PredictionDataExport('65fbe59aaa3f847bd5acc75a')]

To list all prediction data exports, set the limit to 0.

Adjust additional parameters to filter the data as needed:

from datarobot.enums import ExportStatus
from datarobot.models.deployment import PredictionDataExport

prediction_data_exports = PredictionDataExport.list(deployment_id='5c939e08962d741e34f609f0', limit=100, offset=100)

# Use additional filters
prediction_data_exports = PredictionDataExport.list(
    deployment_id='5c939e08962d741e34f609f0',
    model_id="6444482e5583f6ee2e572265",
    batch=False,
    status=ExportStatus.FAILED
)
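When the number of exports is large, limit and offset can be combined to walk through the results page by page. The sketch below shows one way to do this. iter_exports is a hypothetical helper, and fake_list stands in for PredictionDataExport.list so that the example runs without a deployment; the only assumption about the real method is that it accepts limit and offset as shown above.

```python
def iter_exports(list_fn, page_size=100, **filters):
    """Yield every export by requesting successive pages from `list_fn`.

    Hypothetical helper; `list_fn` must accept `limit` and `offset`
    keyword arguments, plus any filters passed through unchanged.
    """
    offset = 0
    while True:
        page = list_fn(limit=page_size, offset=offset, **filters)
        if not page:
            break
        yield from page
        # A short page means there is nothing left to request.
        if len(page) < page_size:
            break
        offset += page_size


# Stand-in for PredictionDataExport.list, used only to make the sketch runnable.
def fake_list(limit, offset, **filters):
    items = [f"export-{i}" for i in range(5)]
    return items[offset:offset + limit]


print(list(iter_exports(fake_list, page_size=2)))
# ['export-0', 'export-1', 'export-2', 'export-3', 'export-4']
```

With the client, the call would look like iter_exports(PredictionDataExport.list, deployment_id='5c939e08962d741e34f609f0', status=ExportStatus.FAILED).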

Retrieve a prediction data export

To get a prediction data export by identifier, use PredictionDataExport.get:

from datarobot.models.deployment import PredictionDataExport

prediction_data_export = PredictionDataExport.get(
    deployment_id='5c939e08962d741e34f609f0', export_id='65fbe59aaa3f847bd5acc75b'
    )

prediction_data_export
>>> PredictionDataExport('65fbe59aaa3f847bd5acc75b')

Fetch prediction export datasets

To return data from a prediction export as dr.Dataset, use the fetch_data method. This method returns a list of datasets. The list usually contains a single dataset, but in some cases, such as time series, it contains more than one. Each dataset can be converted into, for example, a pandas DataFrame:

from datarobot.models.deployment import PredictionDataExport

prediction_data_export = PredictionDataExport.get(
    deployment_id='5c939e08962d741e34f609f0', export_id='65fbe59aaa3f847bd5acc75b'
    )
prediction_datasets = prediction_data_export.fetch_data()

prediction_datasets
>>> [Dataset(name='Deployment prediction data', id='65f240b0e37a9f1a104bf450')]

prediction_dataset = prediction_datasets[0]

df = prediction_dataset.get_as_dataframe()
df.head(2)
>>>    DR_RESERVED_PREDICTION_TIMESTAMP  ...    upstream_x_datarobot_version
    0  2024-03-13 23:00:38.998000+00:00  ...               predictionapi/X/X
    1  2024-03-13 23:00:38.998000+00:00  ...               predictionapi/X/X
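Since fetch_data can return more than one dataset, indexing with [0] silently ignores the rest. A minimal sketch of converting every returned dataset is shown below. export_to_frames is a hypothetical helper, and FakeDataset and FakeExport are stand-ins so that the example runs without a DataRobot connection; the helper itself uses only the fetch_data and get_as_dataframe calls shown above.

```python
def export_to_frames(export):
    """Return one DataFrame per dataset produced by `export.fetch_data()`.

    Hypothetical helper; works with any export object whose fetch_data
    method returns a list of datasets.
    """
    return [dataset.get_as_dataframe() for dataset in export.fetch_data()]


# Minimal stand-ins so the sketch runs without a DataRobot connection.
class FakeDataset:
    def __init__(self, rows):
        self.rows = rows

    def get_as_dataframe(self):
        # A real Dataset returns a pandas DataFrame here.
        return self.rows


class FakeExport:
    def fetch_data(self):
        return [FakeDataset([1, 2]), FakeDataset([3])]


frames = export_to_frames(FakeExport())
print(len(frames))  # 2
```

If the resulting frames share the same columns, they can be combined with pandas.concat; whether they do for a given deployment is not stated here and should be checked first.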

Actuals data export

The following examples outline how to manage actuals data exports.

Create an actuals data export

To create an actuals data export, use ActualsDataExport.create, defining the time window to include in the export using the start and end parameters:

from datetime import datetime, timedelta
from datarobot.models.deployment import ActualsDataExport

now=datetime.now()
actuals_data_export = ActualsDataExport.create(
    deployment_id='5c939e08962d741e34f609f0', start=now - timedelta(days=7), end=now
    )

To export data for a specific model, specify the model ID. Otherwise, the champion model ID is used by default:

from datetime import datetime, timedelta
from datarobot.models.deployment import ActualsDataExport

now=datetime.now()
actuals_data_export = ActualsDataExport.create(
    deployment_id='5c939e08962d741e34f609f0',
    model_id="6444482e5583f6ee2e572265",
    start=now - timedelta(days=7),
    end=now,
    )

To export only actuals that are matched to predictions, set only_matched_predictions to True. By default, all available actuals are exported. The start and end of the export can be defined as a datetime or string type:

from datetime import datetime, timedelta
from datarobot.models.deployment import ActualsDataExport

now=datetime.now()
actuals_data_export = ActualsDataExport.create(
    deployment_id='5c939e08962d741e34f609f0',
    only_matched_predictions=True,
    start=now - timedelta(days=7),
    end=now,
    )

List actuals data exports

To list actuals data exports, use ActualsDataExport.list:

from datarobot.models.deployment import ActualsDataExport

actuals_data_exports = ActualsDataExport.list(deployment_id='5c939e08962d741e34f609f0', limit=0)

actuals_data_exports
>>> [ActualsDataExport('660456a332d0081029ee5031'),
     ActualsDataExport('660456a332d0081029ee5032'),
     ActualsDataExport('660456a332d0081029ee5033')]

To list all actuals data exports, set the limit to 0.

Adjust additional parameters to filter the data as needed:

from datarobot.enums import ExportStatus
from datarobot.models.deployment import ActualsDataExport

# use additional filters
actuals_data_exports = ActualsDataExport.list(
    deployment_id='5c939e08962d741e34f609f0',
    offset=500,
    limit=50,
    status=ExportStatus.SUCCEEDED
)

Retrieve an actuals data export

To get an actuals data export by identifier, use ActualsDataExport.get, as in the following example:

from datarobot.models.deployment import ActualsDataExport

actuals_data_export = ActualsDataExport.get(
    deployment_id='5c939e08962d741e34f609f0', export_id='660456a332d0081029ee4031'
    )

actuals_data_export
>>> ActualsDataExport('660456a332d0081029ee4031')

Fetch actuals export datasets

To return data from an actuals export as dr.Dataset, use the fetch_data method:

from datarobot.models.deployment import ActualsDataExport

actuals_data_export = ActualsDataExport.get(
    deployment_id='5c939e08962d741e34f609f0', export_id='660456a332d0081029ee4031'
    )
actuals_datasets = actuals_data_export.fetch_data()

actuals_datasets
>>> [Dataset(name='Deployment prediction data', id='65f240b0e37a9f1a104bf450')]

actuals_dataset = actuals_datasets[0]

df = actuals_dataset.get_as_dataframe()
df.head(2)
>>>    association_id                  timestamp  actuals  predictions
    0               1  2024-03-20 15:00:00+00:00     21.0    18.125388
    1              10  2024-03-20 15:00:00+00:00     12.0    22.805252

This method returns a list of datasets, which usually contains a single dataset. Each dataset can be converted into, for example, a pandas DataFrame.

Training data export

The following examples outline how to manage training data exports.

Create a training data export

To create a training data export, use TrainingDataExport.create and define the deployment ID:

from datarobot.models.deployment import TrainingDataExport

dataset_id = TrainingDataExport.create(deployment_id='5c939e08962d741e34f609f0')

To export data for a specific model, specify the model ID. Otherwise, the champion model ID is used by default:

from datarobot.models.deployment import TrainingDataExport

dataset_id = TrainingDataExport.create(
    deployment_id='5c939e08962d741e34f609f0', model_id='6444482e5583f6ee2e572265')

dataset_id
>>> 65fb0c25019ca3333bbb4c10

This method returns the ID of the dataset that contains the training data. This dataset is saved in the AI Catalog.

List training data exports

To list training data exports, use TrainingDataExport.list:

from datarobot.models.deployment import TrainingDataExport

training_data_exports = TrainingDataExport.list(deployment_id='5c939e08962d741e34f609f0')

training_data_exports
>>> [TrainingDataExport('65fbf2356124f1daa3acc522')]

Retrieve a training data export

To get a training data export by identifier, use TrainingDataExport.get:

from datarobot.models.deployment import TrainingDataExport

training_data_export = TrainingDataExport.get(
    deployment_id='5c939e08962d741e34f609f0', export_id='65fbf2356124f1daa3acc522'
    )

training_data_export
>>> TrainingDataExport('65fbf2356124f1daa3acc522')

Fetch a training export dataset

To return data from the training export as dr.Dataset, use fetch_data. This method returns a single training dataset. The obtained dataset can be transformed into, for example, a pandas DataFrame.

from datarobot.models.deployment import TrainingDataExport

training_data_export = TrainingDataExport.get(
    deployment_id='5c939e08962d741e34f609f0', export_id='660456a332d0081029ee4031'
    )
training_dataset = training_data_export.fetch_data()

training_dataset
>>> Dataset(name='training-data-10k_diabetes.csv', id='65fb0c25019ca3333bbb4c10')

df = training_dataset.get_as_dataframe()
df.head(2)
>>> acetohexamide  time_in_hospital  ... number_outpatient payer_code
  0            No                 1  ...                 0         YY
  1            No                 2  ...                 0         XX

Data quality export

Data quality exports provide feedback on LLM deployments. They are intended to be used in conjunction with custom metrics for prompt monitoring.

List data quality exports

To list data quality exports, use DataQualityExport.list. The start and end of the export can be defined as a datetime or string type, and additional parameters are available for filtering and ordering the results:

from datetime import datetime, timedelta
from datarobot.models.deployment import DataQualityExport

now=datetime.now()

data_quality_exports = DataQualityExport.list(
    deployment_id='66903c40f18e6ec90fd7c8c7',
    start=now - timedelta(days=1),
    end=now,
)

data_quality_exports
>>> [DataQualityExport(6447ca39c6a04df6b5b0ed19c6101e3c),
 ...
 DataQualityExport(0ff46fd3636545a9ac3e15ee1dbd8638)]

data_quality_exports[0].metrics
>>> [{'id': '669688f90a23524131e2d301', 'name': 'metric 3', 'value': None},
 {'id': '669688e633ae1ffce40eb2f8', 'name': 'metric 2', 'value': 45.0},
 {'id': '669688d282c9384ab8068a6c', 'name': 'metric 1', 'value': 178.0}]
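The metrics attribute is easier to work with once it is keyed by metric name. The following is a small sketch using the sample output above. metrics_by_name is a hypothetical helper, and it assumes that each entry is a dict with name and value keys, as the output suggests.

```python
def metrics_by_name(metrics):
    """Map metric name to value, skipping metrics with no recorded value."""
    return {m["name"]: m["value"] for m in metrics if m["value"] is not None}


# Sample taken from the metrics output shown above.
metrics = [
    {'id': '669688f90a23524131e2d301', 'name': 'metric 3', 'value': None},
    {'id': '669688e633ae1ffce40eb2f8', 'name': 'metric 2', 'value': 45.0},
    {'id': '669688d282c9384ab8068a6c', 'name': 'metric 1', 'value': 178.0},
]

print(metrics_by_name(metrics))  # {'metric 2': 45.0, 'metric 1': 178.0}
```

Metrics with a value of None are dropped here so that downstream arithmetic does not fail. Keep them if a missing value is itself meaningful for monitoring.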