Binary Data Helpers

datarobot.helpers.binary_data_utils.get_encoded_image_contents_from_urls(urls, custom_headers=None, image_options=None, continue_on_error=False, n_threads=None)

Returns base64-encoded strings for the images located at the URLs passed in the input collection. The input collection should contain valid image URLs reachable from the location where the code is being executed. The method retrieves each image and applies the specified reformatting before converting its contents to a base64 string. Results are returned in the same order as the input collection.

Parameters:
urls: Iterable

Iterable with url addresses to download images from

custom_headers: dict

Dictionary containing custom headers to use when downloading files using a URL. Details on supported HTTP headers can be found in the RFC specification for headers: https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html. When specified, the passed values overwrite the default header values.

image_options: ImageOptions class

Class holding parameters for use in image transformation and formatting.

continue_on_error: bool

Whether processing should continue when an item encounters an error while retrieving content (e.g., the file does not exist). If False, the error terminates the download of the remaining files. If True, processing continues and the failed file is skipped.

n_threads: int or None

Number of threads to use for processing. If “None” is passed, the number of threads is determined automatically based on the number of available CPU cores. If this is not possible, 4 threads are used.

Returns:
List of base64 encoded strings representing reformatted images.
Raises:
ContentRetrievalTerminatedError:

The error is raised when the flag continue_on_error is set to False and processing has been terminated due to an exception while loading the contents of the file.

Return type:

List[Optional[str]]
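
A minimal usage sketch, assuming the datarobot package is installed and that items skipped under continue_on_error=True come back as None (suggested by the List[Optional[str]] return type, but not stated explicitly above). The header value and the helper names failed_items and encode_image_urls are hypothetical.

```python
from typing import List, Optional, Sequence


def failed_items(inputs: Sequence[str], encoded: Sequence[Optional[str]]) -> List[str]:
    """Return the inputs whose result is None.

    Relies on results being returned in the same order as the inputs.
    Assumption: a skipped item is represented by None.
    """
    return [item for item, result in zip(inputs, encoded) if result is None]


def encode_image_urls(urls: List[str]) -> List[Optional[str]]:
    # Imported inside the function so the helper above can be used
    # without the datarobot package being installed.
    from datarobot.helpers.binary_data_utils import (
        get_encoded_image_contents_from_urls,
    )

    return get_encoded_image_contents_from_urls(
        urls,
        custom_headers={"User-Agent": "my-image-loader/1.0"},  # hypothetical value
        continue_on_error=True,  # skip unreachable images instead of raising
    )
```

Calling failed_items(urls, encode_image_urls(urls)) would then list the URLs that could not be retrieved.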

datarobot.helpers.binary_data_utils.get_encoded_image_contents_from_paths(paths, image_options=None, continue_on_error=False, n_threads=None, base_path=None)

Returns base64-encoded strings for the images located at the paths passed in the input collection. The input collection should contain valid image paths reachable from the location where the code is being executed. The method retrieves each image and applies the specified reformatting before converting its contents to a base64 string. Results are returned in the same order as the input collection.

Parameters:
paths: Iterable

Iterable with path locations to open images from

image_options: ImageOptions class

Class holding parameters for image transformation and formatting

continue_on_error: bool

Whether processing should continue when an item encounters an error while retrieving content (e.g., the file does not exist). If False, the error terminates processing of the remaining files. If True, processing continues and the failed file is skipped.

n_threads: int or None

Number of threads to use for processing. If “None” is passed, the number of threads is determined automatically based on the number of available CPU cores. If this is not possible, 4 threads are used.

base_path: Optional[str]

Base path to use when opening files. If specified, this path will be used as a base directory against which all relative paths will be evaluated. If not specified, the path will be evaluated against the directory where this script is running.

Returns:
List of base64 encoded strings representing reformatted images.
Raises:
ContentRetrievalTerminatedError:

The error is raised when the flag continue_on_error is set to False and processing has been terminated due to an exception while loading the contents of the file.

Return type:

List[Optional[str]]
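
The role of base_path can be sketched with the standard library. This is only a conceptual illustration, not the library's internal path handling: the helper resolve_paths is hypothetical, and falling back to the current working directory is one reading of "the directory where this script is running". The second function shows a call using the signature documented above and assumes the datarobot package is installed.

```python
import os
from typing import List, Optional


def resolve_paths(paths: List[str], base_path: Optional[str] = None) -> List[str]:
    """Illustrate how relative paths combine with a base directory.

    Assumption: with no base_path, the current working directory is used.
    """
    root = base_path if base_path is not None else os.getcwd()
    # os.path.join leaves absolute paths untouched and prefixes relative ones.
    return [os.path.join(root, p) for p in paths]


def encode_images_in_dir(file_names: List[str], image_dir: str):
    # Imported lazily; requires the datarobot package.
    from datarobot.helpers.binary_data_utils import (
        get_encoded_image_contents_from_paths,
    )

    # Relative file names are evaluated against image_dir.
    return get_encoded_image_contents_from_paths(file_names, base_path=image_dir)
```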

datarobot.helpers.binary_data_utils.get_encoded_file_contents_from_paths(paths, continue_on_error=False, n_threads=None, base_path=None)

Returns base64-encoded strings for the files located at the paths passed in the input collection. The input collection should contain valid file paths reachable from the location where the code is being executed. The method retrieves each file and converts its contents to a base64 string. Results are returned in the same order as the input collection.

Parameters:
paths: Iterable

Iterable with path locations to open files from

continue_on_error: bool

Whether processing should continue when an item encounters an error while retrieving content (e.g., the file does not exist). If False, the error terminates processing of the remaining files. If True, processing continues and the failed file is skipped.

n_threads: int or None

Number of threads to use for processing. If “None” is passed, the number of threads is determined automatically based on the number of available CPU cores. If this is not possible, 4 threads are used.

base_path: Optional[str]

Base path to use when opening files. If specified, this path will be used as a base directory against which all relative paths will be evaluated. If not specified, the path will be evaluated against the directory where this script is running.

Returns:
List of base64 encoded strings representing files.
Raises:
ContentRetrievalTerminatedError:

The error is raised when the flag continue_on_error is set to False and processing has been terminated due to an exception while loading the contents of the file.

Return type:

List[Optional[str]]
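
To make the result format concrete, here is a standard-library sketch of the behavior described above. It is not the library's implementation: it is single-threaded, it re-raises the original OSError instead of ContentRetrievalTerminatedError, and it assumes a skipped file is represented by None. The name encode_files_sketch is hypothetical.

```python
import base64
from pathlib import Path
from typing import Iterable, List, Optional


def encode_files_sketch(
    paths: Iterable[str], continue_on_error: bool = False
) -> List[Optional[str]]:
    """Read each file and return its contents as a base64 string."""
    results: List[Optional[str]] = []
    for path in paths:
        try:
            raw = Path(path).read_bytes()
        except OSError:
            if not continue_on_error:
                raise  # stop processing at the first unreadable file
            results.append(None)  # keep the slot so output order matches input
            continue
        results.append(base64.b64encode(raw).decode("ascii"))
    return results
```

Each non-None entry can be turned back into the original bytes with base64.b64decode.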

datarobot.helpers.binary_data_utils.get_encoded_file_contents_from_urls(urls, custom_headers=None, continue_on_error=False, n_threads=None)

Returns base64-encoded strings for the files located at the URLs passed in the input collection. The input collection should contain valid file URLs reachable from the location where the code is being executed. The method retrieves each file and converts its contents to a base64 string. Results are returned in the same order as the input collection.

Parameters:
urls: Iterable

Iterable containing URL addresses to download files from.

custom_headers: dict

Dictionary with headers to use when downloading files using a URL. Detailed data related to supported Headers in HTTP can be found in the RFC specification: https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html. When specified, passed values will overwrite default header values.

continue_on_error: bool

If an item encounters an error while retrieving content (e.g., the file does not exist), specifies whether the error terminates the download of the remaining files or processing continues. Skipped files will be marked as missing.

n_threads: int or None

Number of threads to use for processing. If “None” is passed, the number of threads is determined automatically based on the number of available CPU cores. If this is not possible, 4 threads are used.

Returns:
List of base64 encoded strings representing files.
Raises:
ContentRetrievalTerminatedError:

The error is raised when the flag continue_on_error is set to False and processing has been terminated due to an exception while loading the contents of the file.

Return type:

List[Optional[str]]
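
The n_threads default and the ordering guarantee can be illustrated with the standard library. This is a sketch of one way to obtain the documented behavior, not the library's actual code; resolve_thread_count and map_in_order are hypothetical names.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def resolve_thread_count(n_threads: Optional[int] = None, fallback: int = 4) -> int:
    """Use the explicit value, else the CPU core count, else the fallback."""
    if n_threads is not None:
        return n_threads
    # os.cpu_count() returns None when the core count cannot be determined.
    return os.cpu_count() or fallback


def map_in_order(
    func: Callable[[T], R], items: Iterable[T], n_threads: Optional[int] = None
) -> List[R]:
    """Apply func to items on a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=resolve_thread_count(n_threads)) as pool:
        # Executor.map yields results in input order even when tasks
        # finish out of order.
        return list(pool.map(func, items))
```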

class datarobot.helpers.image_utils.ImageOptions(should_resize=True, force_size=True, image_size=(224, 224), image_format=None, image_quality=75, image_subsampling=None, resample_method=1, keep_quality=True)

Image options class. Holds options related to image resizing and reformatting.

should_resize: bool

Whether the input image should be resized to new dimensions.

force_size: bool

Whether the image size should fully match the new requested size. If the original and new image sizes have different aspect ratios, specifying True will force a resize to exactly match the requested size. This may break the aspect ratio of the original image. If False, the resize method modifies the image to contain a thumbnail version of itself, no larger than the given size, that preserves the image’s aspect ratio.

image_size: Tuple[int, int]

New image size (width, height). Both values (width, height) should be specified and contain a positive value. Depending on the value of force_size, the image will be resized exactly to the given image size or will be resized into a thumbnail version of itself, no larger than the given size.

image_format: ImageFormat | str

Image format used to save the resulting image after transformations (for example, ImageFormat.JPEG or ImageFormat.PNG). Supported values are in line with the values supported by DataRobot. If None is passed, the original image format is preserved.

image_quality: int or None

The image quality used when saving the image. When None is specified, no value is passed and the Pillow library uses its default.

resample_method: ImageResampleMethod

Resampling method to use when resizing the image.

keep_quality: bool

Whether the image quality is kept (when possible). If True, for JPEG images quality will be preserved. For other types, the value specified in image_quality will be used.
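
The two force_size modes can be sketched as follows. The helper target_size is hypothetical and only approximates the resulting dimensions: it assumes thumbnail mode scales down to fit within the requested size, preserves the aspect ratio, and never enlarges the image, and Pillow's exact rounding may differ by a pixel. The second function constructs ImageOptions using only parameters from the signature above and assumes the datarobot package is installed.

```python
from typing import Tuple


def target_size(
    original: Tuple[int, int], requested: Tuple[int, int], force_size: bool
) -> Tuple[int, int]:
    """Approximate the output (width, height) for the two force_size modes."""
    if force_size:
        # Exact match to the requested size; aspect ratio may change.
        return requested
    # Thumbnail mode: fit inside the requested box, keep the aspect ratio.
    scale = min(requested[0] / original[0], requested[1] / original[1], 1.0)
    return (
        max(1, round(original[0] * scale)),
        max(1, round(original[1] * scale)),
    )


def thumbnail_options():
    # Imported lazily; requires the datarobot package.
    from datarobot.helpers.image_utils import ImageOptions

    # Resize into a thumbnail no larger than 224x224, preserving aspect ratio.
    return ImageOptions(should_resize=True, force_size=False, image_size=(224, 224))
```

For a 448x224 source and the default image_size of (224, 224), this sketch gives (224, 224) with force_size=True and (224, 112) with force_size=False.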