DataRobot File System
DataRobot’s file system uses containers or “buckets” to store one or more files using a key-value storage approach, where the file’s path is the key and its contents the value. Each container is listed as an item under Data Assets (Data Catalog). We refer to the container as a catalog item.
The following should be kept in mind when working with the DataRobot file system:
Permissions are attached to the catalog item containing the files. All files inside a catalog item share the same permissions.
Since the DR file system uses key-value pairs to store files inside containers, directory structures are simulated and may change as their contents change. Most operations in the DataRobot file system support directory paths.
DR file system does not support empty directories.
To create a directory, simply upload a file to a path that contains the directory name, e.g. <directory>/file.txt. A directory is deleted once all files inside it are deleted.
While the DR file system does not support empty directories, a catalog item may be empty.
The DR file system simulates a top-level directory structure by giving each catalog item its own directory named according to its id. Files inside the catalog item will appear as paths inside its directory.
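Because directories are derived from flat keys, a "directory" listing is just prefix-splitting over the stored paths. The following is a minimal, self-contained sketch of that idea in plain Python; `store` and `ls` are illustrative stand-ins, not DataRobot code:

```python
# A flat key-value store standing in for a catalog item:
# keys are file paths, values are file contents.
store = {
    "finance/fy-2024/budget.pdf": b"...",
    "finance/notes.txt": b"...",
    "readme.md": b"...",
}

def ls(store, prefix=""):
    """List immediate children of `prefix`: files as-is, directories with a trailing slash."""
    entries = set()
    for key in store:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        # Everything before the first "/" is the immediate child name;
        # if a "/" is present, the child is a (simulated) directory.
        head, sep, _ = rest.partition("/")
        entries.add(head + "/" if sep else head)
    return sorted(entries)
```

Deleting every key under a prefix makes that "directory" vanish from listings, which is exactly why the DR file system cannot hold empty directories.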
- class datarobot.fs.file_system.DataRobotFileSystem
Bases:
AbstractFileSystem

fsspec implementation of DataRobot's file system.
- File paths are of the form:
dr://<catalog_item_id>/path/to/file.txt or <catalog_item_id>/path/to/file.txt
- Variables:
protocol (str) – The protocol prefix for the DataRobot file system. Can be removed with _strip_protocol().
root_marker (str) – The root path of the DataRobot file system.
Examples
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
List all catalog items in the file system:
>>> fs.ls("")
['696935d6d5a04a752419cf6d/', '69691fc3d5a04a752419cf5c/']
Create a new catalog item to hold your files:
>>> catalog_id = fs.create_catalog_item_dir()
>>> fs.put_file("local/path/to/file.txt", f"dr://{catalog_id}/file.txt")
>>> fs.ls(f"dr://{catalog_id}/")
['file.txt']
Find all PDF files you’ve uploaded to your catalog item:
>>> fs.glob(f"dr://{catalog_id}/**/*.pdf")
['696935d6d5a04a752419cf6d/file.pdf', '696935d6d5a04a752419cf6d/finance/fy-2024/budgets/Q2_budget_2024.pdf']
Copy, move or delete your files:
>>> fs.copy(f"dr://{catalog_id}/file.txt", f"dr://{catalog_id}/file_copy.txt")
>>> fs.move(f"dr://{catalog_id}/file_copy.txt", f"dr://{catalog_id}/file_moved.txt")
>>> fs.rm(f"dr://{catalog_id}/file_moved.txt")
Open files for reading or writing:
>>> with fs.open(f"dr://{catalog_id}/new_file.txt", mode="w") as f:
...     f.write("Hello, world!")
>>> with fs.open(f"dr://{catalog_id}/new_file.txt", mode="r") as f:
...     data = f.read()
...     print(data)
Hello, world!
- classmethod _strip_protocol(path)
Turn path from fully-qualified to DR file system specific.
- Parameters:
path (str) – File path in the DataRobot file system.
- Returns:
Validated file path without the protocol prefix.
- Return type:
str
Examples
>>> from datarobot.fs import DataRobotFileSystem
>>> DataRobotFileSystem._strip_protocol("dr://12345/path/to/file.txt")
'12345/path/to/file.txt'
>>> DataRobotFileSystem._strip_protocol("dr://12345/path/")
'12345/path/'
>>> DataRobotFileSystem._strip_protocol("dr:///12345/")
'12345/'
>>> DataRobotFileSystem._strip_protocol("dr://")
''
- _split_path(path)
Split the given path into catalog ID and internal file path. Internal paths can be empty.
- Parameters:
path (str) – File path in the DataRobot file system.
- Returns:
A tuple of catalog ID and the internal file path.
- Return type:
Tuple[str, str]
- Raises:
ValueError – If the path format is invalid.
Examples
>>> fs = DataRobotFileSystem()
>>> fs._split_path("dr://12345/path/to/file.txt")
('12345', 'path/to/file.txt')
>>> fs._split_path("dr:///12345/")
('12345', '')
>>> fs._split_path("12345/folder/")
('12345', 'folder/')
- ls(path, detail=True, **kwargs)
List files and folders at the given directory path. Use info() for information about a specific file.
If detail is True, returns a list of dictionaries with file details including name (path), size and type. If detail is False, returns a list of file and folder paths as strings.
- Parameters:
path (str) – Path in the DataRobot file system to list.
detail (bool) – Whether to return detailed information.
kwargs (Any) – Additional keyword arguments for future proofing.
version_id (str) – Version ID of the catalog item to target. If not provided, the latest version is used.
- Returns:
paths – List of dicts with file and folder details if detail is True, otherwise list of paths.
- Return type:
List[FileInfo] or List[str]
- Raises:
FileNotFoundError – If the specified path does not exist.
Examples
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> fs.ls("dr://", detail=False)
['696935d6d5a04a752419cf6d/', 'abcdef1234567890abcdef12/']
>>> fs.ls("dr://696935d6d5a04a752419cf6d/finance/")
[
    {
        'name': '696935d6d5a04a752419cf6d/finance/fy-2024/',
        'size': 0,
        'type': 'directory',
        'format': None
    },
    {
        'name': '696935d6d5a04a752419cf6d/finance/employee-list.csv',
        'size': 2048,
        'type': 'file',
        'format': 'csv'
    },
]
- info(path, **kwargs)
Get details about a file or directory.
For info about a directory path append a forward slash (/) at the end of the path. Paths without a trailing slash can return info about files or directories. If both a file and directory share the same path, the file info is returned.
- Parameters:
path (str) – Path in the DataRobot file system to get information about.
version_id – Optional version ID of the catalog item to target. If not provided, the latest version is used.
kwargs (Any) – Additional keyword arguments passed to ls().
- Returns:
info – A dictionary with file or directory details including name (path), size and type.
- Return type:
FileInfo
- Raises:
FileNotFoundError – If the specified path does not exist.
ValueError – If the path is invalid. Root path is not allowed.
Examples
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> fs.info("dr://696935d6d5a04a752419cf6d/finance/employee-list.csv")
{
    'name': '696935d6d5a04a752419cf6d/finance/employee-list.csv',
    'size': 2048,
    'type': 'file',
    'format': 'csv',
    'created_at': datetime.datetime(2026, 3, 6, 10, 5, 16, 805655)
}
>>> fs.info("dr://696935d6d5a04a752419cf6d/finance/")
{
    'name': '696935d6d5a04a752419cf6d/finance/',
    'size': 0,
    'type': 'directory',
    'format': None,
    'created_at': None
}
>>> fs.info("dr://696935d6d5a04a752419cf6d/my_folder")
{
    'name': '696935d6d5a04a752419cf6d/my_folder/',
    'size': 0,
    'type': 'directory',
    'format': None,
    'created_at': None
}
- created(path)
Return the created timestamp of a file as a datetime.datetime.
- Parameters:
path (str) – Path in the DataRobot file system to get information about.
- Returns:
created – A datetime.datetime timestamp of when the file was created, or None if a directory.
- Return type:
Optional[datetime.datetime]
- Raises:
FileNotFoundError – If the specified path does not exist.
ValueError – If the path is invalid. Root path is not allowed.
Examples
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> fs.created("dr://696935d6d5a04a752419cf6d/finance/employee-list.csv")
datetime.datetime(2026, 3, 6, 10, 5, 16, 805655)
- du(path, total=True, maxdepth=None, withdirs=False, **kwargs)
Retrieve space used by files and optionally directories at a path.
Notes
Directory size does not include the size of its contents and is set to zero.
- Parameters:
path (str) – The path to retrieve file space usage for.
total (bool) – Whether to sum all file sizes.
maxdepth (Optional[int]) – Maximum number of directory levels to descend when searching for files. Use None for unlimited.
withdirs (bool) – Whether to include directory paths in the output.
kwargs (Any) – Additional keyword arguments passed to find().
- Returns:
If total is True, the number of bytes of all files in the path. If total is False, a dictionary mapping paths to their size.
- Return type:
int or Dict[str, int]
Examples
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> fs.du("dr://696935d6d5a04a752419cf6d/finance/yellow.txt")
2048
>>> fs.du("dr://696935d6d5a04a752419cf6d/", total=False)
{'696935d6d5a04a752419cf6d/file.txt': 102, '696935d6d5a04a752419cf6d/finance/yellow.txt': 2048}
>>> fs.du("dr://696935d6d5a04a752419cf6d/", total=False, maxdepth=1, withdirs=True)
{'696935d6d5a04a752419cf6d/file.txt': 102, '696935d6d5a04a752419cf6d/finance/': 0}
- find(path, maxdepth=None, withdirs=False, detail=False, **kwargs)
List all files below path. If withdirs is True, include directories as well.
Like the POSIX find command without conditions.
- Parameters:
path (str) – The path to search from. Note that unlike the glob method, this method does not support glob patterns and treats the path as a literal directory path to search under or a filename to match.
maxdepth (Optional[int]) – If not None, the maximum number of levels to descend.
withdirs (bool) – Whether to include directory paths in the output.
kwargs (Any) – Passed to ls().
- Returns:
If detail is False, a list of file (and optionally directory) paths. If detail is True, a dictionary mapping paths to their info dictionaries.
- Return type:
List[str] or Dict[str, FileInfo]
Examples
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> fs.find("dr://696935d6d5a04a752419cf6d/", withdirs=True)
[
    '696935d6d5a04a752419cf6d/',
    '696935d6d5a04a752419cf6d/finance/',
    '696935d6d5a04a752419cf6d/finance/budgets/',
    '696935d6d5a04a752419cf6d/finance/budgets/Q2_budget_2024.pdf',
    '696935d6d5a04a752419cf6d/finance/employee-list.csv'
]
>>> fs.find("dr://696935d6d5a04a752419cf6d/finance/", maxdepth=1)
['696935d6d5a04a752419cf6d/finance/employee-list.csv']
>>> fs.find("dr://696935d6d5a04a752419cf6d/finance", maxdepth=1, withdirs=True, detail=True)
{
    '696935d6d5a04a752419cf6d/finance/': {
        'name': '696935d6d5a04a752419cf6d/finance/',
        'size': 0,
        'type': 'directory',
        'format': None,
        'created_at': None
    },
    '696935d6d5a04a752419cf6d/finance/employee-list.csv': {
        'name': '696935d6d5a04a752419cf6d/finance/employee-list.csv',
        'size': 2048,
        'type': 'file',
        'format': 'csv',
        'created_at': datetime.datetime(2026, 3, 6, 10, 5, 16, 805655)
    },
    '696935d6d5a04a752419cf6d/finance/budgets/': {
        'name': '696935d6d5a04a752419cf6d/finance/budgets/',
        'size': 0,
        'type': 'directory',
        'format': None,
        'created_at': None
    },
}
- glob(path, maxdepth=None, detail=False, **kwargs)
Find files by glob-matching.
Pattern matching capabilities for finding files that match the given pattern.
- Parameters:
path (str) – The glob pattern to match against.
maxdepth (Optional[int]) – Maximum depth for '**' patterns. Applied on the first '**' found. Must be at least 1 if provided.
detail (bool) – Whether to return detailed information.
kwargs (Any) – Additional arguments passed to find().
- Returns:
If detail is False, a list of file and directory paths. If detail is True, a dictionary mapping paths to their info dictionaries.
- Return type:
List[str] or Dict[str, FileInfo]
Notes
Supported patterns:
‘*’: Matches any sequence of characters within a single directory level
‘**’: Matches any number of directory levels (must be an entire path component)
‘?’: Matches exactly one character
‘[abc]’: Matches any character in the set
‘[a-z]’: Matches any character in the range
‘[!cat]’: Matches any character NOT in the set {c, a, t}
Special behaviors:
If the path ends with ‘/’, only folders are returned
Consecutive ‘*’ characters are compressed into a single ‘*’
Empty brackets ‘[]’ never match anything
Negated empty brackets ‘[!]’ match any single character
Special characters in character classes are escaped properly
Limitations:
‘**’ must be a complete path component (e.g., ‘a/**/b’, not ‘a**b’)
No brace expansion (‘{a, b}.txt’)
No extended glob patterns (‘+(pattern)’, ‘!(pattern)’)
Examples
Find all files and directories directly under the specified path.
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> fs.glob("dr://696935d6d5a04a752419cf6d/finance/*", detail=False)
[
    '696935d6d5a04a752419cf6d/finance/budgets/',
    '696935d6d5a04a752419cf6d/finance/employee-list.csv'
]
Find only directories directly under the specified path.
>>> fs.glob("dr://696935d6d5a04a752419cf6d/finance/*/", detail=False)
['696935d6d5a04a752419cf6d/finance/budgets/']
Find any budget directories with a 4-digit year in their name.
>>> fs.glob("dr://696935d6d5a04a752419cf6d/finance/budgets/*-202[0-9]/", detail=False)
[
    '696935d6d5a04a752419cf6d/finance/budgets/fy-2024/',
    '696935d6d5a04a752419cf6d/finance/budgets/fy-2023/'
]
Find all .csv files at a maximum depth of 2 levels.
>>> fs.glob("dr://696935d6d5a04a752419cf6d/**/*.csv", maxdepth=2, detail=False)
[
    '696935d6d5a04a752419cf6d/finance/employee-list.csv',
    '696935d6d5a04a752419cf6d/sales/data.csv'
]
- tree(path='', recursion_limit=2, max_display=25, display_size=False, prefix='', is_last=True, first=True, indent_size=4)
Return a tree-like structure string of the DataRobot file system from the given path.
- Parameters:
path (str) – Path in the DataRobot file system to display the tree from.
recursion_limit (int) – Maximum depth of directory traversal.
max_display (int) – Maximum number of items to display per directory.
display_size (bool) – Whether to display file sizes.
prefix (str) – Current line prefix for visual tree structure.
is_last (bool) – Whether the current item is last in its level.
first (bool) – Whether this is the first call (displays root path).
indent_size (int) – Number of spaces per indent.
- Returns:
tree_str – A string representing the tree structure of the file system.
- Return type:
str
Examples
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> print(fs.tree("dr://696935d6d5a04a752419cf6d/", recursion_limit=5))
696935d6d5a04a752419cf6d/
└── finance/
    ├── fy-2024/
    │   └── budgets/
    │       └── Q2_budget_2024.pdf
    └── employee-list.csv
- cat_file(path, start=None, end=None, **kwargs)
Fetch a single file’s contents.
- Parameters:
path (str) – File path in the DataRobot file system to read.
start (Optional[int]) – Optional starting byte position to read from. If negative, counts from the end of the file.
end (Optional[int]) – Optional ending byte position to read to. If negative, counts from the end of the file.
kwargs (Any) – Keyword arguments passed to DataRobotFileSystem.open().
- Returns:
The contents of the file as bytes.
- Return type:
bytes
Examples
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> fs.cat_file("dr://696935d6d5a04a752419cf6d/finance/report.txt")
b'Q2 Financial Report...'
Read a range of bytes from a file:
>>> fs.cat_file("dr://696935d6d5a04a752419cf6d/finance/report.txt", start=10, end=20)
b'Financial Report...'
- cat(path, recursive=False, on_error='raise', **kwargs)
Fetch (potentially multiple) path’s contents.
- Parameters:
path (Union[str, List[str]]) – File or directory path(s) in the DataRobot file system to read. Can include glob patterns.
recursive (bool) – If True, assume the path(s) are directories, and get contents of all contained files.
on_error (Union[Literal['raise'], Literal['omit'], Literal['return']]) – If "raise", an underlying exception will be raised (converted to KeyError if the type is in self.missing_exceptions); if "omit", keys with exceptions will simply not be included in the output; if "return", all keys are included in the output, but the value will be bytes or an exception instance.
kwargs (Any) – Additional keyword arguments passed to cat_file().
- Returns:
If a single file path is provided, returns the file contents as bytes. If multiple paths are provided or the path is otherwise expanded, returns a dictionary mapping each path to its contents as bytes or an exception instance if on_error is set to “return”.
- Return type:
bytes or Dict[str, bytes] or Dict[str, Union[bytes, Exception]]
Examples
Read a single file:
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> fs.cat("dr://696935d6d5a04a752419cf6d/finance/report.txt")
b'Q2 Financial Report...'
Read multiple files and all files in a directory:
>>> fs.cat(
...     ["dr://696935d6d5a04a752419cf6d/finance/summary.txt", "dr://696935d6d5a04a752419cf6d/reports/"],
...     recursive=True
... )
{
    '696935d6d5a04a752419cf6d/finance/summary.txt': b'Summary...',
    '696935d6d5a04a752419cf6d/reports/report_2024.txt': b'2024 Report...',
    '696935d6d5a04a752419cf6d/reports/report_2025.txt': b'2025 Report...'
}
Read all CSV files matching a glob pattern:
>>> fs.cat("dr://696935d6d5a04a752419cf6d/data/**/*.csv")
{
    '696935d6d5a04a752419cf6d/data/sales.csv': b'date,amount\n2024-01-01,1000\n...',
    '696935d6d5a04a752419cf6d/data/archive/old_sales.csv': b'date,amount\n2023-01-01,950\n...'
}
- sign(path, expiration=100, version_id=None, **kwargs)
Create a signed URL for the given file path. Optionally specify a version ID to retrieve a signed URL for an earlier version of the file from that version of the catalog directory.
- Parameters:
path (str) – File path in the DataRobot file system to sign.
expiration (int) – Number of seconds until the signed URL expires.
version_id (Optional[str]) – Version ID of the catalog directory to target. If not provided, the latest version is used.
kwargs (Any) – Additional keyword arguments for future proofing.
- Returns:
A signed URL granting temporary access to the file.
- Return type:
str
- Raises:
FileNotFoundError – If the specified file does not exist.
IsADirectoryError – If the specified path is a directory.
ValueError – If the path format is invalid.
Examples
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> signed_url = fs.sign(
...     "dr://696935d6d5a04a752419cf6d/finance/budgets/Q2_budget_2024.pdf",
...     expiration=300,
... )
- cp_file(path1, path2, overwrite_strategy=FilesOverwriteStrategy.RENAME, max_wait=600, wait_for_completion=True, **kwargs)
Copy a file or directory from path1 to path2.
Copies directories recursively. Specify an overwrite strategy to handle file naming conflicts at the target location. Note that copying between catalog item directories is an asynchronous operation. Cannot create a new catalog item directory by copying files into a non-existent catalog item directory.
- Parameters:
path1 (str) – Source file or directory path. Directory paths should end with a forward slash (/).
path2 (str) – Target file or directory path. Directory paths should end with a forward slash (/).
overwrite_strategy (FilesOverwriteStrategy) – Strategy to handle naming conflicts at the target location.
max_wait (int) – Maximum time in seconds to wait for the copy operation to complete when copying between catalog items.
wait_for_completion (bool) – Whether to wait for the copy operation to complete before returning when copying between catalog items.
kwargs (Any) – Additional keyword arguments for future proofing.
- Raises:
FileNotFoundError – If the source path does not exist or either catalog item directory does not exist.
ValueError – If attempting to copy a directory to a file path.
FileExistsError – If the target file or directory already exists and overwrite strategy is set to ERROR.
- Return type:
None
Examples
Copy a file to a new file path:
>>> from datarobot.fs import DataRobotFileSystem
>>> from datarobot.enums import FilesOverwriteStrategy
>>> fs = DataRobotFileSystem()
>>> fs.cp_file(
...     "dr://696935d6d5a04a752419cf6d/fy-2024/budgets/Q2_budget_2024.pdf",
...     "dr://69691fc3d5a04a752419cf5/fy-2024/budgets-copy.pdf",
... )
Copy file into a directory, replace existing file if present:
>>> fs.cp_file(
...     "dr://696935d6d5a04a752419cf6d/fy-2024/budgets/Q2_budget_2024.pdf",
...     "dr://69691fc3d5a04a752419cf5/fy-2024/budgets/",
...     overwrite_strategy=FilesOverwriteStrategy.OVERWRITE,
... )
Copy the contents of a directory into another directory:
>>> fs.cp_file(
...     "dr://696935d6d5a04a752419cf6d/fy-2024/budgets/",
...     "dr://69691fc3d5a04a752419cf5/archive/budgets-2024/",
... )
- cp_directory(path1, path2, overwrite_strategy=FilesOverwriteStrategy.RENAME, max_wait=600, wait_for_completion=True, **kwargs)
Copy a directory recursively from path1 to path2.
Validates that both paths are directories by checking for trailing slashes (/). Calls cp_file() internally.
- Parameters:
path1 (str) – Source directory path. Must end with a forward slash (/).
path2 (str) – Target directory path. Must end with a forward slash (/).
overwrite_strategy (FilesOverwriteStrategy) – Strategy to handle naming conflicts at the target location.
max_wait (int) – Maximum time in seconds to wait for the copy operation to complete when copying between catalog items.
wait_for_completion (bool) – Whether to wait for the copy operation to complete before returning when copying between catalog items.
kwargs (Any) – Additional keyword arguments passed to cp_file().
- Return type:
None
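Examples

A sketch of copying a directory between catalog items; the ids, paths, and chosen strategy below are illustrative, not output from a real deployment:

>>> from datarobot.fs import DataRobotFileSystem
>>> from datarobot.enums import FilesOverwriteStrategy
>>> fs = DataRobotFileSystem()
>>> fs.cp_directory(
...     "dr://696935d6d5a04a752419cf6d/fy-2024/budgets/",
...     "dr://69691fc3d5a04a752419cf5c/archive/budgets-2024/",
...     overwrite_strategy=FilesOverwriteStrategy.SKIP,
... )

Both paths must end with a forward slash (/); otherwise validation fails.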
- copy(path1, path2, recursive=False, maxdepth=None, on_error=None, **kwargs)
Copy files or directories between two locations in the DataRobot file system.
- Parameters:
path1 (Union[str, List[str]]) – Source file or directory path(s). Supports glob patterns. If specifying a directory, recursive should be True.
path2 (Union[str, List[str]]) – Target file or directory path(s).
recursive (bool) – Whether to copy directory contents recursively.
maxdepth (Optional[int]) – Maximum depth to recurse when finding files to copy.
on_error (Optional[Literal['raise', 'ignore']]) – If "raise", any file-not-found exceptions will be raised. If "ignore", any file-not-found exceptions will be skipped and ignored. Defaults to "raise" unless recursive is True, in which case the default is "ignore".
kwargs (Any) – Additional keyword arguments passed to cp_file().
overwrite_strategy (FilesOverwriteStrategy) – Strategy to handle naming conflicts at the target location. Passed to cp_file().
- Raises:
FileNotFoundError – If any of the source paths do not exist, or no files can be found and on_error is "raise".
- Return type:
None
Examples
Copy a single file to a new path:
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> fs.copy(
...     "dr://696935d6d5a04a752419cf6d/finance/employee-list.csv",
...     "dr://696935d6d5a04a752419cf6d/finance/employee-list-backup.csv",
... )
Copy more than one file or directory:
>>> fs.copy(
...     [
...         "dr://696935d6d5a04a752419cf6d/finance/employee-list.csv",
...         "dr://696935d6d5a04a752419cf6d/finance/employee-list-copy.csv",
...     ],
...     [
...         "dr://696935d6d5a04a752419cf6d/finance/employee-list-copy.csv",
...         "dr://696935d6d5a04a752419cf6d/finance/employee-list-copy-2.csv",
...     ],
... )
Copy a single file into a directory:
>>> fs.copy(
...     "dr://696935d6d5a04a752419cf6d/finance/report.pdf",
...     "dr://696935d6d5a04a752419cf6d/archive/",
... )
Recursively copy the contents of a directory to another directory:
>>> fs.copy(
...     "dr://696935d6d5a04a752419cf6d/budgets/",
...     "dr://696935d6d5a04a752419cf6d/archive/budgets-2024/",
...     recursive=True,
... )
Copy all CSV files in a directory and its subdirectories up to a maximum depth of 2:
>>> fs.copy(
...     "dr://696935d6d5a04a752419cf6d/data/**/*.csv",
...     "dr://696935d6d5a04a752419cf6d/archive/data-2024/",
...     recursive=True,
...     maxdepth=2,
... )
Copy all text files in a directory into a new directory:
>>> fs.copy(
...     "dr://696935d6d5a04a752419cf6d/data/*.txt",
...     "dr://696935d6d5a04a752419cf6d/archive/",
...     recursive=True,
... )
Copy a directory recursively, skipping files that already exist at the target:
>>> from datarobot.enums import FilesOverwriteStrategy
>>> fs.copy(
...     "dr://696935d6d5a04a752419cf6d/budgets/",
...     "dr://696935d6d5a04a752419cf6d/archive/",
...     recursive=True,
...     overwrite_strategy=FilesOverwriteStrategy.SKIP,
... )
- rm_file(path, **kwargs)
Delete a file or directory at the given path(s). Completes silently if the file does not exist.
- Parameters:
path (Union[str, List[str]]) – Path(s) of the file(s) to delete. Paths ending with a forward slash (/) are treated as directories and deleted recursively.
kwargs (Any) – Additional keyword arguments for future proofing.
- Return type:
None
Examples
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> fs.rm_file("dr://696935d6d5a04a752419cf6d/finance/employee-list.csv")
>>> fs.rm_file([
...     "dr://696935d6d5a04a752419cf6d/finance/employee-list.csv",
...     "dr://696935d6d5a04a752419cf6d/finance/fy-2024/budgets/Q2_budget_2024.pdf"
... ])
- rm_directory(path, **kwargs)
Recursively delete a directory at the given path(s). Completes silently if the directory does not exist. Uses rm_file() internally.
Soft-deletes the catalog item directory when requested. Use Files.un_delete() if you need to restore a deleted catalog item.
- Parameters:
path (Union[str, List[str]]) – One or more directory paths to delete recursively. Paths must end with a forward slash (/) to be treated as directories.
kwargs (Any) – Additional keyword arguments for future proofing.
- Raises:
ValueError – If any of the provided paths do not end with a forward slash (/).
- Return type:
None
Examples
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> fs.rm_directory("dr://696935d6d5a04a752419cf6d/finance/fy-2024/")
>>> fs.rm_directory([
...     "dr://696935d6d5a04a752419cf6d/finance/fy-2024/",
...     "dr://696935d6d5a04a752419cf6d/"
... ])
- rm(path, recursive=False, maxdepth=None, **kwargs)
Delete files or directories. Completes silently if the file or directory does not exist.
Soft-deletes the catalog item directory when requested. Use Files.un_delete() if you need to restore a deleted catalog item. If all files in a directory are deleted, the directory itself is deleted implicitly, as the DataRobot file system does not support empty directories.
- Parameters:
path (Union[str, List[str]]) – One or more file or directory paths to delete. Paths ending with a forward slash (/) are treated as directories.
recursive (bool) – Whether to recurse into directories when targeting files to delete. If False, only deletes the files targeted.
maxdepth (Optional[int]) – Depth passed to find() and glob() when targeting files for deletion. Used to limit recursion into directories when finding files to delete. If None, no limit is applied.
kwargs (Any) – Additional keyword arguments for future proofing. Passed to rm_file().
- Return type:
None
Examples
Delete file:
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> fs.rm("dr://696935d6d5a04a752419cf6d/finance/employee-list.csv")
Delete directory recursively:
>>> fs.rm("dr://696935d6d5a04a752419cf6d/finance/fy-2024/", recursive=True)
Delete contents of catalog item folder recursively up to a maximum depth of 2:
>>> fs.rm("dr://696935d6d5a04a752419cf6d/", recursive=True, maxdepth=2)
Delete catalog item folder:
>>> fs.rm("dr://696935d6d5a04a752419cf6d/")
Delete .csv files in a directory and its subdirectories up to a maximum depth of 3:
>>> fs.rm("dr://696935d6d5a04a752419cf6d/finance/**/*.csv", recursive=True, maxdepth=3)
- create_catalog_item_dir(**kwargs)
Create a new empty catalog item directory and return its id.
- Parameters:
kwargs (Any) – Additional keyword arguments for future proofing.
- Returns:
The id of the newly created catalog item.
- Return type:
str
Examples
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> catalog_id = fs.create_catalog_item_dir()
>>> fs.ls(f"dr://{catalog_id}/")
[]
- mv_file(path1, path2, *, overwrite_strategy=FilesOverwriteStrategy.REPLACE, **kwargs)
Move a single file or directory from path1 to path2.
- Parameters:
path1 (str) – Source path. Format: dr://<catalog_id>/path. Directories should end with /.
path2 (str) – Destination path. Format: dr://<catalog_id>/path. Directories should end with /.
overwrite_strategy (FilesOverwriteStrategy) – Strategy for overwriting existing paths. Defaults to REPLACE, in line with fsspec.
kwargs (Any) – Additional keyword arguments passed to cp_file() and rm_file() when moving across catalogs.
- Return type:
None
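Examples

A sketch of moving a single file within a catalog item; the id and paths are illustrative:

>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> fs.mv_file(
...     "dr://696935d6d5a04a752419cf6d/drafts/report.txt",
...     "dr://696935d6d5a04a752419cf6d/final/report.txt",
... )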
- mv(path1, path2, recursive=False, maxdepth=None, *, overwrite_strategy=FilesOverwriteStrategy.REPLACE, **kwargs)
Move files or directories from path1 to path2. path1 may contain glob patterns.
- Parameters:
path1 (Union[str, List[str]]) – Source path(s). Format: dr://<catalog_id>/path. A string (file, directory, or glob pattern) or a list of explicit paths.
path2 (Union[str, List[str]]) – Destination path(s). Format: dr://<catalog_id>/path. A single path when path1 is a string. When path1 is a list, either a single directory (ending with /; each source maps to path2/basename) or a list of paths. When both are lists, truncates to the shorter length (matches fsspec).
recursive (bool) – If True, move directories recursively.
maxdepth (Optional[int]) – If not None, maximum directory depth when resolving path1. None means no limit.
overwrite_strategy (FilesOverwriteStrategy) – Strategy for overwriting existing paths. Defaults to REPLACE, in line with fsspec.
kwargs (Any) – Additional keyword arguments passed to expand_path when resolving paths and to mv_file() when performing the move.
- Raises:
ValueError – If multiple sources are moved to a single file destination (not a directory).
- Return type:
None
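Examples

A sketch of recursive and glob-based moves; the id and paths are illustrative:

>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> fs.mv(
...     "dr://696935d6d5a04a752419cf6d/staging/",
...     "dr://696935d6d5a04a752419cf6d/archive/",
...     recursive=True,
... )
>>> fs.mv(
...     "dr://696935d6d5a04a752419cf6d/data/*.csv",
...     "dr://696935d6d5a04a752419cf6d/csv-files/",
... )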
- clone_catalog_item_dir(path_or_id, files_to_omit=None, **kwargs)
Clone a catalog item directory (copy all contents) and return the ID of the cloned catalog item.
- Parameters:
path_or_id (str) – Path or ID of the catalog item directory to clone.
files_to_omit (Optional[List[str]]) – List of files to omit when cloning. Provide paths relative to the root of the catalog item directory.
kwargs (Any) – Additional keyword arguments passed to Files.clone().
- Returns:
The ID of the cloned catalog item.
- Return type:
str
Examples
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> fs.ls("dr://696935d6d5a04a752419cf6d/", detail=False)
['696935d6d5a04a752419cf6d/folder/', '696935d6d5a04a752419cf6d/file.txt']
>>> clone_id = fs.clone_catalog_item_dir("dr://696935d6d5a04a752419cf6d/")
>>> clone_id
'696935d6d5a04a752419cf6d-clone'
>>> fs.ls(f"dr://{clone_id}/", detail=False)
['696935d6d5a04a752419cf6d-clone/folder/', '696935d6d5a04a752419cf6d-clone/file.txt']
Clone a catalog item directory and omit a file:
>>> fs.clone_catalog_item_dir("dr://696935d6d5a04a752419cf6d/", files_to_omit=["file.txt"])
'696935d6d5a04a752419cf6d-clone'
>>> fs.ls("dr://696935d6d5a04a752419cf6d-clone/", detail=False)
['696935d6d5a04a752419cf6d-clone/folder/']
- put_from_url(path, url, unpack_archive_files=True, overwrite_strategy=FilesOverwriteStrategy.RENAME, *, upload_timeout=600, wait_for_completion=True, **kwargs)
Load file(s) from a URL into a directory in the DataRobot file system.
- Parameters:
path (str) – DataRobot path to the directory (catalog root or a folder inside it).
url (str) – The URL of the file or archive to load. Must be accessible by the DataRobot server.
unpack_archive_files (bool) – If True, extract archive contents into the directory. If False, upload the file as-is. Defaults to True.
upload_timeout (int) – Maximum time in seconds to wait for the upload to complete.
wait_for_completion (bool) – If True, block until the upload completes. Defaults to True.
overwrite_strategy (FilesOverwriteStrategy) – How to handle name conflicts with existing files. Defaults to FilesOverwriteStrategy.RENAME.
kwargs (Any) – Additional keyword arguments for future proofing.
- Return type:
None
Examples
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> catalog_id = fs.create_catalog_item_dir()
>>> fs.put_from_url(f"dr://{catalog_id}/data/", "https://example.com/file.png")
>>> fs.ls(f"dr://{catalog_id}/data/")
[{'name': 'file.png', 'size': 12345, 'type': 'file', ...}]
- Raises:
AsyncTimeoutError – If wait_for_completion is True and the upload takes longer than upload_timeout seconds.
FileExistsError – If overwrite_strategy is FilesOverwriteStrategy.ERROR and a file with the same name already exists.
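The “<filename> (n).ext” pattern behind FilesOverwriteStrategy.RENAME can be sketched in a few lines. This is an illustration of the documented naming behavior, not DataRobot’s actual implementation; rename_with_suffix is a hypothetical helper.

```python
import os

def rename_with_suffix(filename, existing):
    """Illustrate the documented "<filename> (n).ext" rename pattern."""
    if filename not in existing:
        return filename
    stem, ext = os.path.splitext(filename)
    n = 1
    # Keep incrementing n until the candidate name is free.
    while f"{stem} ({n}){ext}" in existing:
        n += 1
    return f"{stem} ({n}){ext}"

print(rename_with_suffix("file.txt", {"file.txt"}))  # file (1).txt
```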
- put_from_data_source(path, data_source_id, credential_id=None, credential_data=None, unpack_archive_files=True, overwrite_strategy=FilesOverwriteStrategy.RENAME, *, upload_timeout=600, wait_for_completion=True, **kwargs)
Upload one or more files from a data source into a directory in the DataRobot file system.
- Parameters:
path (str) – Directory path to upload files under. Cannot be the root directory.
data_source_id (str) – The ID of the DataSource to use as the source of data.
credential_id (Optional[str]) – The ID of the Credential to use for authentication.
credential_data (Optional[Dict[str, str]]) – The credentials to authenticate with the database, to use instead of a credential ID.
unpack_archive_files (bool) – Whether to unpack archive files (zip, tar, tar.gz, tgz) upon upload.
overwrite_strategy (FilesOverwriteStrategy) – Strategy for handling naming conflicts when writing to a path where a file already exists. Use FilesOverwriteStrategy.RENAME to rename an uploaded file using the “<filename> (n).ext” pattern. Use FilesOverwriteStrategy.REPLACE to overwrite the existing file. Use FilesOverwriteStrategy.SKIP to skip uploading if a file already exists at the target path. Use FilesOverwriteStrategy.ERROR to raise FileExistsError if a file already exists at the target path.
upload_timeout (int) – Maximum time in seconds to wait for the upload to complete.
wait_for_completion (bool) – If True, block until the upload completes. If False, return after starting the upload.
kwargs (Any) – Additional keyword arguments for future proofing.
- Raises:
ValueError – If the directory path is invalid.
FileNotFoundError – If the directory path does not exist.
AsyncTimeoutError – If wait_for_completion is True and the upload takes longer than upload_timeout seconds.
- Return type:
None
Examples
Upload file or folder from Google Drive.
Note: GDrive paths must use drive, folder and file IDs. Example:
/<drive_id>/<folder_id>/<file_id> for a file, or /<drive_id>/<folder_id> for a folder.
>>> import datarobot as dr
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> gcp_cred = dr.Credential.create_gcp(
...     name='GDrive Credentials',
...     gcp_key={  # Or load from keyfile
...         "type": "service_account",
...         "private_key_id": "...",
...         "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
...         "client_email": "user@project.iam.gserviceaccount.com",
...         "client_id": "...",
...     },
... )
>>> gdrive_connector = next(
...     c for c in dr.Connector.list() if c.connector_type == "gdrive"
... )
>>> gdrive_datastore = dr.DataStore.create(
...     data_store_type=dr.enums.DataStoreTypes.DR_CONNECTOR_V1,
...     canonical_name='GDrive DataStore',
...     fields=[{'id': 'gdrive.drive_name', 'name': 'Drive Name', 'value': 'My Drive'}],
...     connector_id=gdrive_connector.id,
... )
>>> path = "/<drive_id>/<folder_id>/<file_id>"  # or "/<drive_id>/<folder_id>" for a folder
>>> gdrive_datasource = dr.DataSource.create(
...     data_source_type=dr.enums.DataStoreTypes.DR_CONNECTOR_V1,
...     canonical_name='GDrive DataSource for my documents',
...     params=dr.DataSourceParameters(data_store_id=gdrive_datastore.id, path=path),
... )
>>> fs.put_from_data_source(
...     "dr://<catalog-id>/my_gdrive_documents/",
...     gdrive_datasource.id,
...     credential_id=gcp_cred.credential_id,  # Can omit if using default credentials set up with the DataStore
... )
>>> print(fs.ls("dr://<catalog-id>/my_gdrive_documents/", detail=False))
['<catalog-id>/my_gdrive_documents/file.txt']
Upload file or folder from AWS S3 bucket:
>>> import datarobot as dr
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> cred = dr.Credential.create_s3(
...     name="AWS S3 Credentials",
...     aws_access_key_id="...",
...     aws_secret_access_key="...",
...     aws_session_token="...",
... )
>>> s3_connector = next(
...     c for c in dr.Connector.list() if c.connector_type == "s3"
... )
>>> s3_datastore = dr.DataStore.create(
...     data_store_type=dr.enums.DataStoreTypes.DR_CONNECTOR_V1,
...     canonical_name='S3 DataStore',
...     fields=[
...         {"id": "fs.defaultFS", "name": "Bucket Name", "value": "my-bucket-name"},
...         {"id": "fs.rootDirectory", "name": "Prefix", "value": "/"},
...         {"id": "fs.s3.awsRegion", "name": "S3 Bucket Region", "value": "us-east-1"},
...     ],
...     connector_id=s3_connector.id,
... )
>>> s3_datasource = dr.DataSource.create(
...     data_source_type=dr.enums.DataStoreTypes.DR_CONNECTOR_V1,
...     canonical_name='S3 DataSource for my files',
...     params=dr.DataSourceParameters(
...         data_store_id=s3_datastore.id,
...         path="path/to/my/file.txt",  # or "path/to/my/folder/"
...     ),
... )
>>> fs.put_from_data_source(
...     "dr://<catalog-id>/my_s3_files/",
...     s3_datasource.id,
...     credential_id=cred.credential_id,  # Can omit if using default credentials set up with the DataStore
... )
>>> print(fs.ls("dr://<catalog-id>/my_s3_files/", detail=False))
['<catalog-id>/my_s3_files/file.txt']
Upload file or folder from SharePoint:
Note: Sharepoint paths must use the following format:
/<HOSTNAME>,<SITE_COLLECTION_ID>,<SITE_ID/WEB_ID>/<DRIVE_ID>/<FILE_OR_FOLDER_ITEM_ID>
Example: /mydomain.sharepoint.com,4732d...8b01b0,eb0d3...e42f/b!8tQyRyn.....TowMA13__nTU/01MAJ...EYJTAOR6/
>>> import datarobot as dr
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> cred = dr.Credential.create_azure_service_principal(
...     name="Azure Service Principal Credential for Sharepoint",
...     client_id="...",
...     client_secret="...",
...     azure_tenant_id="...",
... )
>>> sharepoint_connector = next(
...     c for c in dr.Connector.list() if c.connector_type == "sharepoint"
... )
>>> sharepoint_datastore = dr.DataStore.create(
...     data_store_type=dr.enums.DataStoreTypes.DR_CONNECTOR_V1,
...     canonical_name='Sharepoint DataStore',
...     fields=[],
...     connector_id=sharepoint_connector.id,
... )
>>> path = "/<HOSTNAME>,<SITE_COLLECTION_ID>,<SITE_ID/WEB_ID>/<DRIVE_ID>/<FILE_OR_FOLDER_ITEM_ID>"
>>> sharepoint_datasource = dr.DataSource.create(
...     data_source_type=dr.enums.DataStoreTypes.DR_CONNECTOR_V1,
...     canonical_name='Sharepoint DataSource',
...     params=dr.DataSourceParameters(
...         data_store_id=sharepoint_datastore.id,
...         path=path,
...     ),
... )
>>> fs.put_from_data_source(
...     "dr://<catalog-id>/my_sharepoint_files/",
...     sharepoint_datasource.id,
...     credential_id=cred.credential_id,
... )
>>> print(fs.ls("dr://<catalog-id>/my_sharepoint_files/", detail=False))
['<catalog-id>/my_sharepoint_files/my_file.txt']
- open(path, mode='rb', block_size=None, cache_options=None, compression=None, overwrite_strategy=FilesOverwriteStrategy.REPLACE, unpack_archive_files=False, upload_timeout=600, **kwargs)
Open a file in the DataRobot file system. Supports read modes ‘r’, ‘rb’ and write modes ‘w’, ‘wb’, ‘xb’.
- Parameters:
path (str) – Path in the DataRobot file system to open.
mode (str) – Mode to open the file in: ‘r’ or ‘rb’ for reading; ‘w’, ‘wb’ or ‘xb’ for writing.
block_size (Optional[int]) – Buffer size in bytes for reading and writing.
cache_options (Optional[Dict[str, Any]]) – Extra arguments to pass through to the cache.
compression (Optional[str]) – If given, open the file using this compression codec. Can be either a compression name (a key in fsspec.compression.compr) or “infer” to guess the compression from the filename suffix.
overwrite_strategy (FilesOverwriteStrategy) – Strategy for handling naming conflicts when writing to a path where a file already exists. Use FilesOverwriteStrategy.RENAME to rename an uploaded file using the “<filename> (n).ext” pattern. Use FilesOverwriteStrategy.REPLACE to overwrite the existing file. Use FilesOverwriteStrategy.SKIP to skip uploading if a file already exists at the target path. Use FilesOverwriteStrategy.ERROR to raise FileExistsError if a file already exists at the target path.
unpack_archive_files (bool) – If True, automatically unpack archive files (zip, tar, tar.gz, tgz) upon upload.
upload_timeout (int) – Maximum time in seconds to wait for the file upload to complete.
kwargs (Any) – Additional keyword arguments passed to DataRobotFile or TextFileWrapper.
- Raises:
IsADirectoryError – If attempting to open a directory for reading.
FileNotFoundError – If attempting to open a non-existent file for reading.
ValueError – If an unsupported file mode is provided, an invalid path is passed, or the file is too big to download.
FileExistsError – If attempting to write to a path where a file already exists and the overwrite strategy is set to FilesOverwriteStrategy.ERROR, or mode is set to ‘xb’.
- Returns:
A file-like object for reading or writing.
- Return type:
Examples
Open a file for reading:
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> with fs.open("dr://696935d6d5a04a752419cf6d/notes/agenda.txt", mode="r") as f:
...     data = f.read()
Read first 20 bytes from a file then skip to byte 100 and read the next 30 bytes:
>>> with fs.open("dr://696935d6d5a04a752419cf6d/figures/plot.png", mode="rb") as f:
...     first_20_bytes = f.read(20)
...     f.seek(100)
...     next_30_bytes = f.read(30)
- touch(path, truncate=True, **kwargs)
Create an empty file at the given path.
DataRobotFileSystem does not support updating timestamps of existing files.
- Parameters:
path (str) – Path to the file to create.
truncate (bool) – Whether to replace the existing file with an empty one. This must always be set to True.
kwargs (Any) – Additional keyword arguments passed to open().
- Raises:
NotImplementedError – If attempting to update the timestamp of an existing file with truncate set to False.
- Return type:
None
Examples
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> fs.touch("dr://696935d6d5a04a752419cf6d/notes/agenda.txt")
- read_block(fn, offset, length, delimiter=None)
Read a block of bytes from a file.
Starting at offset of the file, read length bytes. If delimiter is set, the read starts and stops at delimiter boundaries that follow the locations offset and offset + length. If offset is zero then we start at zero. The bytestring returned WILL include the end delimiter string. If offset + length is beyond the end of the file, reads to the end of the file.
- Parameters:
fn (str) – Filepath to read from.
offset (int) – Byte offset to start the read from.
length (Optional[int]) – Number of bytes to read. If None, read to the end of the file.
delimiter (Optional[bytes]) – Ensure reading starts and stops at delimiter bytestring boundaries.
- Return type:
bytes
Examples
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> fs.read_block("dr://696935d6d5a04a752419cf6d/data/file.txt", 0, 13)
b'Alice, 100\nBo'
>>> fs.read_block("dr://696935d6d5a04a752419cf6d/data/file.txt", 0, 13, delimiter=b'\n')
b'Alice, 100\nBob, 200\n'
Use length=None to read to the end of the file.
>>> fs.read_block("dr://696935d6d5a04a752419cf6d/data/file.txt", 0, None, delimiter=b'\n')
b'Alice, 100\nBob, 200\nCharlie, 300'
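The delimiter-boundary behavior above can be sketched with plain bytes operations. This is an illustrative reimplementation (read_block_sketch is a hypothetical helper that reads the whole stream, so it is only suitable for small files), not the actual DataRobot code path:

```python
import io

def read_block_sketch(f, offset, length, delimiter=None):
    """Sketch of read_block: with a delimiter, the read starts and stops
    at the delimiter boundaries that follow offset and offset + length,
    and the end delimiter is included in the output."""
    data = f.read()  # whole stream: illustration only
    start = offset
    if delimiter and offset != 0:
        # Advance the start to just past the next delimiter.
        i = data.find(delimiter, offset)
        start = len(data) if i == -1 else i + len(delimiter)
    if length is None:
        end = len(data)
    else:
        end = offset + length
        if delimiter:
            # Extend the end to include the next delimiter.
            i = data.find(delimiter, end)
            end = len(data) if i == -1 else i + len(delimiter)
    return data[start:min(end, len(data))]

buf = io.BytesIO(b"Alice, 100\nBob, 200\nCharlie, 300")
print(read_block_sketch(buf, 0, 13, delimiter=b"\n"))  # b'Alice, 100\nBob, 200\n'
```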
- put_file(lpath, rpath, callback=<fsspec.callbacks.NoOpCallback object>, mode='overwrite', raise_error_on_directory=True, **kwargs)
Upload a single file from local to DataRobot file system.
- Parameters:
lpath (str) – Local file path.
rpath (str) – DataRobot file system path.
callback (Callback) – Callback to track progress of the file transfer. Not supported, as DataRobotFileSystem does not support buffered uploads.
mode (str) – Mode to open the file in: ‘overwrite’ or ‘create’.
raise_error_on_directory (bool) – Whether to raise an exception if the local path is a directory. DataRobot file system does not support creating empty directories. If False, the function does nothing and returns silently.
kwargs (Any) – Keyword arguments passed to open().
- Raises:
FileExistsError – If the file already exists and mode is set to ‘create’.
NotImplementedError – If attempting to upload a directory and raise_error_on_directory is True.
ValueError – If attempting to upload a file to an invalid path.
- Return type:
None
Examples
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> fs.put_file(
...     "/Users/username/local/path/to/file.txt",
...     "dr://696935d6d5a04a752419cf6d/my/new/file_copy.txt",
... )
- put(lpath, rpath, recursive=False, callback=<fsspec.callbacks.NoOpCallback object>, maxdepth=None, **kwargs)
Upload local file(s) to DataRobot file system.
Copies a specific file or tree of files (if recursive=True). If rpath ends with a “/”, it will be assumed to be a directory, and target files will go within. Calls put_file() for each source path.
- Parameters:
lpath (Union[str, List[str]]) – Local file path or list of local file paths to upload.
rpath (Union[str, List[str]]) – DataRobot file system path or list of DataRobot file system paths to upload to.
recursive (bool) – Whether to recursively target local files to upload.
callback (Callback) – Callback to track progress of the file transfer. Not supported, as DataRobotFileSystem does not support buffered uploads.
maxdepth (Optional[int]) – Maximum depth to recurse when targeting local files to upload.
kwargs (Any) – Additional keyword arguments passed to put_file().
raise_error_on_directory – Whether to raise an exception for local directory paths. DataRobot file system does not support creating empty directories. Defaults to False, so invocations of put_file() for local directory paths do nothing and return silently.
- Return type:
None
Examples
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> fs.put(
...     "/Users/username/local/path/to/file.txt",
...     "dr://696935d6d5a04a752419cf6d/my/new/file_copy.txt",
... )
Upload a directory recursively:
>>> fs.put(
...     "/Users/username/local/path/to/directory",
...     "dr://696935d6d5a04a752419cf6d/my/new/directory/",
...     recursive=True,
... )
Upload all PDF files in a directory:
>>> fs.put(
...     "/Users/username/local/my/documents/**/*.pdf",
...     "dr://696935d6d5a04a752419cf6d/my-pdf-documents/",
...     recursive=True,
... )
Upload multiple files at once:
>>> fs.put(
...     ["/Users/username/local/path/to/file1.txt", "/Users/username/local/path/to/file2.txt"],
...     ["dr://696935d6d5a04a752419cf6d/my/new/file1.txt", "dr://696935d6d5a04a752419cf6d/my/new/file2.txt"],
... )
- get_mapper(root='', missing_exceptions=None)
Create a key/value mutable store based on this file-system.
Creates a MutableMapping interface to the DataRobot file system at the given root path.
- Parameters:
root (str) – Path in the DataRobot file system to use as the root for the map.
missing_exceptions (Optional[Tuple[Type[Exception], ...]]) – Exceptions to convert to KeyError if raised when working with the file system.
- Returns:
A key/value mutable store based on this file-system.
- Return type:
Examples
>>> from datarobot.fs import DataRobotFileSystem, DataRobotFSMap
>>> fs = DataRobotFileSystem()
>>> root_map = fs.get_mapper()
>>> map = fs.get_mapper("dr://696935d6d5a04a752419cf6d/")
Retrieve file contents from file system using map:
>>> map["file.txt"]
b'Hello, world!'
>>> "folder/path/file.txt" in map
True
>>> file_count = len(map)
>>> file_count
3
>>> [file for file in map]
["file.txt", "folder/path/file.txt", "another/folder/file.txt"]
>>> map.getitems(["file.txt", "folder/path/file.txt", "another/folder/file.txt"])
{
    "file.txt": b"Hello, world!",
    "folder/path/file.txt": b"Hello, world!",
    "another/folder/file.txt": b"Hello, world!",
}
Set file contents in file system using map:
>>> map["file.txt"] = b"Hello, world!"
>>> map["folder/path/new_file.txt"] = b"This is a new file!"
>>> map.setitems({
...     "another/folder/file.txt": b"Hello, world!",
...     "folder/path/new_file.txt": b"This is a new file!",
... })
Delete files from file system using map:
>>> del map["file.txt"]
>>> map.delitems(["folder/path/new_file.txt", "another/folder/file.txt"])
>>> map.pop("file.txt", "default_value_if_file_does_not_exist")
b'Hello, world!'
>>> map.pop("folder/path/non_existent_file.txt", "default_value_if_file_does_not_exist")
'default_value_if_file_does_not_exist'
Clear all files under the map root. This may have unintended consequences as DataRobot file system does not support empty directories:
>>> map.clear()
>>> len(map)
0
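The caveat about empty directories follows from the key-value model described at the top of this page: a simulated directory exists only while at least one key lives under it. A minimal stdlib sketch (store and list_dirs are illustrative, not part of the API):

```python
# A flat key-value store standing in for a catalog item's contents.
store = {
    "folder/a.txt": b"one",
    "folder/b.txt": b"two",
}

def list_dirs(store):
    # A simulated directory exists only while some key lives under it.
    return {k.rsplit("/", 1)[0] for k in store if "/" in k}

print(list_dirs(store))  # {'folder'}
del store["folder/a.txt"]
print(list_dirs(store))  # {'folder'}  -- one file still remains
del store["folder/b.txt"]
print(list_dirs(store))  # set()  -- the directory vanished with its last file
```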
- mkdir(*args, **kwargs)
Not supported as DataRobotFileSystem does not support empty directories.
- Return type:
None
- makedirs(*args, **kwargs)
Not supported as DataRobotFileSystem does not support empty directories.
- Return type:
None
- rmdir(*args, **kwargs)
Not supported as DataRobotFileSystem does not support empty directories.
- Return type:
None
- modified(*args, **kwargs)
DataRobotFileSystem does not currently expose file modification timestamp.
- Return type:
datetime
- cat_ranges(paths, starts, ends, max_gap=None, on_error='return', **kwargs)
Get the contents of byte ranges from one or more files
- Parameters:
paths (list) – A list of filepaths on this filesystem.
starts (int or list) – Byte limits of the read. If a single int is given, the same value is used for all the specified files.
ends (int or list) – Byte limits of the read. If a single int is given, the same value is used for all the specified files.
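The per-path byte windows can be sketched as follows, with an in-memory dict standing in for DataRobot paths (files and cat_ranges_sketch are hypothetical):

```python
# In-memory stand-ins for DataRobot file paths (illustrative only).
files = {"a.txt": b"hello world", "b.txt": b"goodbye"}

def cat_ranges_sketch(paths, starts, ends):
    # A single int for starts/ends applies to every path, per the docs.
    if isinstance(starts, int):
        starts = [starts] * len(paths)
    if isinstance(ends, int):
        ends = [ends] * len(paths)
    return [files[p][s:e] for p, s, e in zip(paths, starts, ends)]

print(cat_ranges_sketch(["a.txt", "b.txt"], 0, 5))  # [b'hello', b'goodb']
```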
- checksum(path)
Unique value for current version of file
If the checksum is the same from one moment to another, the contents are guaranteed to be the same. If the checksum changes, the contents might have changed.
This should normally be overridden; default will probably capture creation/modification timestamp (which would be good) or maybe access timestamp (which would be bad)
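A content hash is one way to satisfy the stated guarantee that equal checksums imply equal contents (ignoring the negligible chance of a hash collision). This is an illustration only, not how DataRobotFileSystem necessarily computes its checksum:

```python
import hashlib

def content_checksum(data: bytes) -> str:
    # Same bytes -> same checksum; different bytes -> (almost surely) different.
    return hashlib.sha256(data).hexdigest()

print(content_checksum(b"v1") == content_checksum(b"v1"))  # True
print(content_checksum(b"v1") == content_checksum(b"v2"))  # False
```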
- classmethod clear_instance_cache()
Clear the cache of filesystem instances.
Notes
Unless overridden by setting the cachable class attribute to False, the filesystem class stores a reference to newly created instances. This prevents Python’s normal rules around garbage collection from working, since the instance’s refcount will not drop to zero until clear_instance_cache is called.
- cp(path1, path2, **kwargs)
Alias of AbstractFileSystem.copy.
- classmethod current()
Return the most recently instantiated FileSystem
If no instance has been created, then create one with defaults
- delete(path, recursive=False, maxdepth=None)
Alias of AbstractFileSystem.rm.
- disk_usage(path, total=True, maxdepth=None, **kwargs)
Alias of AbstractFileSystem.du.
- download(rpath, lpath, recursive=False, **kwargs)
Alias of AbstractFileSystem.get.
- end_transaction()
Finish write transaction, non-context version
- exists(path, **kwargs)
Is there a file at the given path
- expand_path(path, recursive=False, maxdepth=None, **kwargs)
Turn one or more globs or directories into a list of all matching paths to files or directories.
kwargs are passed to glob or find, which may in turn call ls.
- static from_dict(dct)
Recreate a filesystem instance from dictionary representation.
See .to_dict() for the expected structure of the input.
- Parameters:
dct (Dict[str, Any])
- Return type:
file system instance, not necessarily of this particular class.
Warning
This can import arbitrary modules (as determined by the cls key). Make sure you haven’t installed any modules that may execute malicious code at import time.
- static from_json(blob)
Recreate a filesystem instance from JSON representation.
See .to_json() for the expected structure of the input.
- Parameters:
blob (str)
- Return type:
file system instance, not necessarily of this particular class.
Warning
This can import arbitrary modules (as determined by the cls key). Make sure you haven’t installed any modules that may execute malicious code at import time.
- property fsid
Persistent filesystem id that can be used to compare filesystems across sessions.
- get(rpath, lpath, recursive=False, callback=<fsspec.callbacks.NoOpCallback object>, maxdepth=None, **kwargs)
Copy file(s) to local.
Copies a specific file or tree of files (if recursive=True). If lpath ends with a “/”, it will be assumed to be a directory, and target files will go within. Can submit a list of paths, which may be glob-patterns and will be expanded.
Calls get_file for each source.
- get_file(rpath, lpath, callback=<fsspec.callbacks.NoOpCallback object>, outfile=None, **kwargs)
Copy single remote file to local
- head(path, size=1024)
Get the first size bytes from the file.
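head (and its counterpart tail, below) reduce to simple byte slicing; a sketch on a bytes object rather than a DataRobot path (head_sketch/tail_sketch are hypothetical helpers):

```python
data = b"0123456789"

def head_sketch(data: bytes, size: int = 1024) -> bytes:
    # First `size` bytes of the content.
    return data[:size]

def tail_sketch(data: bytes, size: int = 1024) -> bytes:
    # Last `size` bytes of the content.
    return data[-size:]

print(head_sketch(data, 3))  # b'012'
print(tail_sketch(data, 3))  # b'789'
```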
- invalidate_cache(path=None)
Discard any cached directory information
- Parameters:
path (string or None) – If None, clear all cached listings; otherwise, clear listings at or under the given path.
- isdir(path)
Is this entry directory-like?
- isfile(path)
Is this entry file-like?
- lexists(path, **kwargs)
If there is a file at the given path (including broken links)
- listdir(path, detail=True, **kwargs)
Alias of AbstractFileSystem.ls.
- makedir(path, create_parents=True, **kwargs)
Alias of AbstractFileSystem.mkdir.
- mkdirs(path, exist_ok=False)
Alias of AbstractFileSystem.makedirs.
- move(path1, path2, **kwargs)
Alias of AbstractFileSystem.mv.
- pipe(path, value=None, **kwargs)
Put value into path
(counterpart to cat)
- Parameters:
path (string or dict(str, bytes)) – If a string, a single remote location to put value bytes; if a dict, a mapping of {path: bytes value}.
value (bytes, optional) – If using a single path, these are the bytes to put there. Ignored if path is a dict.
- pipe_file(path, value, mode='overwrite', **kwargs)
Set the bytes of given file
- read_bytes(path, start=None, end=None, **kwargs)
Alias of AbstractFileSystem.cat_file.
- read_text(path, encoding=None, errors=None, newline=None, **kwargs)
Get the contents of the file as a string.
- Parameters:
path (str) – URL of the file on this filesystem.
encoding (same as open)
errors (same as open)
newline (same as open)
- rename(path1, path2, **kwargs)
Alias of AbstractFileSystem.mv.
- size(path)
Size in bytes of file
- sizes(paths)
Size in bytes of each file in a list of paths
- start_transaction()
Begin write transaction for deferring files, non-context version
- stat(path, **kwargs)
Alias of AbstractFileSystem.info.
- tail(path, size=1024)
Get the last size bytes from the file.
- to_dict(*, include_password=True)
JSON-serializable dictionary representation of this filesystem instance.
- Parameters:
include_password (bool, default True) – Whether to include the password (if any) in the output.
- Return type:
dict[str, Any]
- Returns:
Dictionary with keys cls (the python location of this class), protocol (text name of this class’s protocol, first one in case of multiple), args (positional args, usually empty), and all other keyword arguments as their own keys.
Warning
Serialized filesystems may contain sensitive information which have been passed to the constructor, such as passwords and tokens. Make sure you store and send them in a secure environment!
- to_json(*, include_password=True)
JSON representation of this filesystem instance.
- Parameters:
include_password (bool, default True) – Whether to include the password (if any) in the output.
- Return type:
str
- Returns:
JSON string with keys cls (the python location of this class), protocol (text name of this class’s protocol, first one in case of multiple), args (positional args, usually empty), and all other keyword arguments as their own keys.
Warning
Serialized filesystems may contain sensitive information which have been passed to the constructor, such as passwords and tokens. Make sure you store and send them in a secure environment!
- property transaction
A context within which files are committed together upon exit
Requires the file class to implement .commit() and .discard() for the normal and exception cases.
- transaction_type
alias of
Transaction
- ukey(path)
Hash of file properties, to tell if it has changed
- unstrip_protocol(name)
Format FS-specific path to generic, including protocol
- Return type:
str
- upload(lpath, rpath, recursive=False, **kwargs)
Alias of AbstractFileSystem.put.
- walk(path, maxdepth=None, topdown=True, on_error='omit', **kwargs)
Return all files under the given path.
List all files, recursing into subdirectories; output is iterator-style, like os.walk(). For a simple list of files, find() is available.
When topdown is True, the caller can modify the dirnames list in-place (perhaps using del or slice assignment), and walk() will only recurse into the subdirectories whose names remain in dirnames; this can be used to prune the search, impose a specific order of visiting, or even to inform walk() about directories the caller creates or renames before it resumes walk() again. Modifying dirnames when topdown is False has no effect. (See os.walk.)
Note that the “files” outputted will include anything that is not a directory, such as links.
- Parameters:
path (str) – Root to recurse into.
maxdepth (int) – Maximum recursion depth. None means limitless, but not recommended on link-based file-systems.
topdown (bool (True)) – Whether to walk the directory tree from the top downwards or from the bottom upwards.
on_error ("omit", "raise", a callable) – If omit (default), a path with an exception will simply be empty; if raise, an underlying exception will be raised; if a callable, it will be called with a single OSError instance as argument.
kwargs (passed to ls)
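The topdown pruning described above follows os.walk semantics, so it can be demonstrated on a local directory tree (the directory and file names here are arbitrary):

```python
import os
import tempfile

# Build a small local tree: root/keep/a.txt and root/skip/b.txt.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "keep"))
    os.makedirs(os.path.join(root, "skip"))
    open(os.path.join(root, "keep", "a.txt"), "w").close()
    open(os.path.join(root, "skip", "b.txt"), "w").close()

    visited = []
    for dirpath, dirnames, filenames in os.walk(root, topdown=True):
        # Prune in place: walk() never descends into "skip".
        dirnames[:] = [d for d in dirnames if d != "skip"]
        visited.extend(filenames)

print(visited)  # ['a.txt']
```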
- write_bytes(path, value, **kwargs)
Alias of AbstractFileSystem.pipe_file.
- write_text(path, value, encoding=None, errors=None, newline=None, **kwargs)
Write the text to the given file.
An existing file will be overwritten.
- Parameters:
path (str) – URL of the file on this filesystem.
value (str) – Text to write.
encoding (same as open)
errors (same as open)
newline (same as open)
- class datarobot.fs.file_system.DataRobotFile
Bases:
AbstractBufferedFile
File-like object for reading and writing files in the DataRobot file system.
Supports read modes ‘r’, ‘rb’ and write modes ‘w’, ‘wb’, ‘xb’. DataRobot file system buffers writes in memory only before uploading on close.
- Variables:
path (str) – File path in the DataRobot file system.
mode (str) – File mode, either ‘rb’, ‘wb’, or ‘xb’.
fs (DataRobotFileSystem) – The DataRobot file system instance.
blocksize (int) – Block size for reading files.
autocommit (bool) – Whether to automatically commit changes on close.
loc (int) – Current position in the file.
closed (bool) – Whether the file is closed.
forced (bool) – Whether the file is in forced mode.
offset (Optional[int]) – Content length of the file.
buffer (io.BytesIO) – In-memory buffer when writing.
overwrite_strategy – Strategy to handle file naming conflicts when writing files.
unpack_archive_files – Whether to unpack archive files (zip, tar, tar.gz, tgz) upon upload.
upload_timeout – Maximum time in seconds to wait for file upload to complete.
See also
- write(data)
Write data to buffer.
- Parameters:
data (bytes) – Data to write as bytes.
- Returns:
Number of bytes written.
- Return type:
int
- Raises:
ValueError – If the file is not in write mode, is closed, or has been force-flushed.
- flush(force=False)
Write the buffered data to the DataRobot file system if force is True.
Notes
Since DataRobot file system does not support multipart uploads, calling flush without force does not upload any data.
- Parameters:
force (bool) – Whether to force flush and upload data. Disallows further writing to this file.
- Raises:
ValueError – If the file is closed or if force flush has already been called.
- Return type:
None
- upload()
Alias of flush(force=True).
- Return type:
None
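The buffer-in-memory, upload-on-close behavior of write/flush/close can be sketched as follows; BufferedWriteFile and upload_bytes are hypothetical stand-ins for DataRobotFile and its upload call:

```python
import io

class BufferedWriteFile:
    """Sketch: buffer writes in memory, upload once on close."""

    def __init__(self, upload_bytes):
        self._buf = io.BytesIO()
        self._upload = upload_bytes
        self.closed = False

    def write(self, data: bytes) -> int:
        if self.closed:
            raise ValueError("I/O operation on closed file")
        return self._buf.write(data)

    def flush(self, force=False):
        # Without force, nothing is sent: no multipart uploads.
        if force:
            self._upload(self._buf.getvalue())

    def close(self):
        if not self.closed:
            self.flush(force=True)
            self.closed = True

uploaded = []
f = BufferedWriteFile(uploaded.append)
f.write(b"hello ")
f.write(b"world")
f.close()
print(uploaded)  # [b'hello world']
```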
- close()
Close file. Finalizes writes, discards cache.
- Return type:
None
- property url: str
A signed URL for the file.
- commit()
Move from temp to final destination
- discard()
Throw away temporary file
- fileno()
Returns underlying file descriptor if one exists.
OSError is raised if the IO object does not use a file descriptor.
- info()
File information about this path
- isatty()
Return whether this is an ‘interactive’ stream.
Return False if it can’t be determined.
- read(length=-1)
Return data from cache, or fetch pieces as necessary
- Parameters:
length (int (-1)) – Number of bytes to read; if <0, all remaining bytes.
- readable()
Whether opened for reading
- readinto(b)
mirrors builtin file’s readinto method
https://docs.python.org/3/library/io.html#io.RawIOBase.readinto
- readline()
Read until and including the first occurrence of newline character
Note that, because of character encoding, this is not necessarily a true line ending.
- readlines()
Return all data, split by the newline character, including the newline character
- readuntil(char=b'\n', blocks=None)
Return data between current position and first occurrence of char
char is included in the output, except if the end of the file is encountered first.
- Parameters:
char (bytes) – Thing to find.
blocks (None or int) – How much to read in each go. Defaults to the file blocksize, which may mean a new read on every call.
- seek(loc, whence=0)
Set current file location
- Parameters:
loc (int) – Byte location.
whence ({0, 1, 2}) – From start of file, current location, or end of file, respectively.
- seekable()
Whether is seekable (only in read mode)
- tell()
Current file location
- truncate()
Truncate file to size bytes.
File pointer is left unchanged. Size defaults to the current IO position as reported by tell(). Returns the new size.
- property use_range_headers: bool
Whether to use range headers when reading data from file URL.
- writable()
Whether opened for writing
- writelines(lines, /)
Write a list of lines to stream.
Line separators are not added, so it is usual for each of the lines provided to have a line separator at the end.
- property is_datarobot_url_for_read: bool
Whether the file URL is a DataRobot URL.
- property read_client: Session
Session client to use for reading data from file URL. Supports unauthenticated clients for URLs outside DataRobot with embedded authentication.
- class datarobot.fs.file_system.DataRobotFSMap
Bases:
FSMap
Wrap a DataRobotFileSystem instance as a mutable mapping. The keys of the mapping become files under the given root, and the values (which must be bytes) the contents of those files.
- Parameters:
root (str) – The root path in the DataRobot file system to create the mapper for.
fs (DataRobotFileSystem) – The DataRobot file system instance.
missing_exceptions (Optional[Tuple[Type[Exception], ...]]) – Exceptions to convert to KeyError when accessing the file system.
Examples
>>> from datarobot.fs import DataRobotFileSystem, DataRobotFSMap
>>> fs = DataRobotFileSystem()
>>> map = DataRobotFSMap("dr://696935d6d5a04a752419cf6d/", fs)
Retrieve file contents from file system using map:
>>> map["file.txt"]
b'Hello, world!'
>>> "folder/path/file.txt" in map
True
>>> file_count = len(map)
>>> file_count
3
>>> [file for file in map]
["file.txt", "folder/path/file.txt", "another/folder/file.txt"]
>>> map.getitems(["file.txt", "folder/path/file.txt", "another/folder/file.txt"])
{
    "file.txt": b"Hello, world!",
    "folder/path/file.txt": b"Hello, world!",
    "another/folder/file.txt": b"Hello, world!",
}
Set file contents in file system using map:
>>> map["file.txt"] = b"Hello, world!"
>>> map["folder/path/new_file.txt"] = b"This is a new file!"
>>> map.setitems({
...     "another/folder/file.txt": b"Hello, world!",
...     "folder/path/new_file.txt": b"This is a new file!",
... })
Delete files from file system using map:
>>> del map["file.txt"]
>>> map.delitems(["folder/path/new_file.txt", "another/folder/file.txt"])
>>> map.pop("file.txt", "default_value_if_file_does_not_exist")
b'Hello, world!'
>>> map.pop("folder/path/non_existent_file.txt", "default_value_if_file_does_not_exist")
'default_value_if_file_does_not_exist'
Clear all files under the map root directory. This may have unintended consequences as DataRobot file system does not support empty directories:
>>> map.clear()
>>> len(map)
0
- delitems(keys)
Remove multiple keys from the store
- property dirfs
dirfs instance that can be used with the same keys as the mapper
- get(k[, d]) → D[k] if k in D, else d. d defaults to None.
- getitems(keys, on_error='raise')
Fetch multiple items from the store
If the backend is async-able, this might proceed concurrently
- Parameters:
keys (list(str)) – The keys to be fetched.
on_error ("raise", "omit", "return") – If raise, an underlying exception will be raised (converted to KeyError if the type is in self.missing_exceptions); if omit, keys with exceptions will simply not be included in the output; if “return”, all keys are included in the output, but the value will be bytes or an exception instance.
- Return type:
dict(key, bytes | exception)
- items() → a set-like object providing a view on D's items
- keys() → a set-like object providing a view on D's keys
- pop(key, default=None)
Pop data
- popitem() → (k, v), remove and return some (key, value) pair as a 2-tuple; but raise KeyError if D is empty.
- setdefault(k[, d]) → D.get(k, d), also set D[k] = d if k not in D
- setitems(values_dict)
Set the values of multiple items in the store
- Parameters:
values_dict (dict(str, bytes))
- update([E, ]**F) → None. Update D from mapping/iterable E and F.
If E present and has a .keys() method, does: for k in E: D[k] = E[k] If E present and lacks .keys() method, does: for (k, v) in E: D[k] = v In either case, this is followed by: for k, v in F.items(): D[k] = v
- values() → an object providing a view on D's values
- clear()
Remove all keys below root. Empties out the mapping.
Notes
May delete more directories than expected as DataRobot file system does not support empty directories.
- Return type:
None
Enum and Helpers
- class datarobot.fs.file_system.FileInfo
Information about a file or directory in DataRobot File System.
- Variables:
name – The path of the file or directory. Does not include the protocol prefix.
size – The size of the file in bytes. For directories, this is 0.
type – The type of the item, either ‘file’ or ‘directory’.
format – The file format (e.g., ‘csv’, ‘pdf’) if the item is a file; None for directories.
created_at – The file creation timestamp if the item is a file; None for directories.
- class datarobot.enums.FilesOverwriteStrategy