DataRobot File System
DataRobot’s file system uses containers or “buckets” to store one or more files using a key-value storage approach, where the file’s path is the key and its contents the value. Each container is listed as an item under Data Assets (Data Catalog). We refer to the container as a catalog item.
The following should be kept in mind when working with the DataRobot file system:
Permissions are attached to the catalog item containing the files. All files inside a catalog item share the same permissions.
Since the DR file system uses key-value pairs to store files inside containers, directory structures are simulated and may change as their contents change. Most operations in the DataRobot file system support directory paths.
DR file system does not support empty directories.
To create a directory, simply upload a file to a path that contains the directory name, e.g. <directory>/file.txt. A directory is deleted once all files inside it are deleted.
While the DR file system does not support empty directories, a catalog item may be empty.
The DR file system simulates a top-level directory structure by giving each catalog item its own directory named according to its id. Files inside the catalog item will appear as paths inside its directory.
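Because directories are derived from flat keys, a "directory" listing is just prefix-splitting over the stored paths. The following is a minimal, self-contained sketch of that idea in plain Python; `store` and `ls` are illustrative stand-ins, not DataRobot code:

```python
# A flat key-value store standing in for a catalog item:
# keys are file paths, values are file contents.
store = {
    "finance/fy-2024/budget.pdf": b"...",
    "finance/notes.txt": b"...",
    "readme.md": b"...",
}

def ls(store, prefix=""):
    """List immediate children of `prefix`: files as-is, directories with a trailing slash."""
    entries = set()
    for key in store:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        # Everything before the first "/" is the immediate child name;
        # if a "/" is present, the child is a (simulated) directory.
        head, sep, _ = rest.partition("/")
        entries.add(head + "/" if sep else head)
    return sorted(entries)
```

Deleting every key under a prefix makes that "directory" vanish from listings, which is exactly why the DR file system cannot hold empty directories.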
- class datarobot.fs.file_system.DataRobotFileSystem
Bases:
AbstractFileSystem

fsspec implementation of DataRobot's file system.
- File paths are of the form:
dr://<catalog_item_id>/path/to/file.txt or <catalog_item_id>/path/to/file.txt
- Variables:
protocol (str) – The protocol prefix for the DataRobot file system. Can be removed with _strip_protocol().
root_marker (str) – The root path of the DataRobot file system.
Examples
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
List all catalog items in the file system:
>>> fs.ls("")
['696935d6d5a04a752419cf6d/', '69691fc3d5a04a752419cf5c/']
Create a new catalog item to hold your files:
>>> catalog_id = fs.create_catalog_item_dir()
>>> fs.put_file("local/path/to/file.txt", f"dr://{catalog_id}/file.txt")
>>> fs.ls(f"dr://{catalog_id}/")
['file.txt']
Find all PDF files you’ve uploaded to your catalog item:
>>> fs.glob(f"dr://{catalog_id}/**/*.pdf")
['696935d6d5a04a752419cf6d/file.pdf', '696935d6d5a04a752419cf6d/finance/fy-2024/budgets/Q2_budget_2024.pdf']
Copy, move or delete your files:
>>> fs.copy(f"dr://{catalog_id}/file.txt", f"dr://{catalog_id}/file_copy.txt")
>>> fs.move(f"dr://{catalog_id}/file_copy.txt", f"dr://{catalog_id}/file_moved.txt")
>>> fs.rm(f"dr://{catalog_id}/file_moved.txt")
Open files for reading or writing:
>>> with fs.open(f"dr://{catalog_id}/new_file.txt", mode="w") as f:
...     f.write("Hello, world!")
>>> with fs.open(f"dr://{catalog_id}/new_file.txt", mode="r") as f:
...     data = f.read()
...     print(data)
Hello, world!
- classmethod _strip_protocol(path)
Turn path from fully-qualified to DR file system specific.
- Parameters:
path (str) – File path in the DataRobot file system.
- Returns:
Validated file path without the protocol prefix.
- Return type:
str
Examples
>>> from datarobot.fs import DataRobotFileSystem
>>> DataRobotFileSystem._strip_protocol("dr://12345/path/to/file.txt")
'12345/path/to/file.txt'
>>> DataRobotFileSystem._strip_protocol("dr://12345/path/")
'12345/path/'
>>> DataRobotFileSystem._strip_protocol("dr:///12345/")
'12345/'
>>> DataRobotFileSystem._strip_protocol("dr://")
''
- _split_path(path)
Split the given path into catalog ID and internal file path. Internal paths can be empty.
- Parameters:
path (str) – File path in the DataRobot file system.
- Returns:
A tuple of catalog ID and the internal file path.
- Return type:
Tuple[str, str]
- Raises:
ValueError – If the path format is invalid.
Examples
>>> fs = DataRobotFileSystem()
>>> fs._split_path("dr://12345/path/to/file.txt")
('12345', 'path/to/file.txt')
>>> fs._split_path("dr:///12345/")
('12345', '')
>>> fs._split_path("12345/folder/")
('12345', 'folder/')
- ls(path, detail=True, **kwargs)
List files and folders at the given directory path. Use info() for information about a specific file.
If detail is True, returns a list of dictionaries with file details including name (path), size and type. If detail is False, returns a list of file and folder paths as strings.
- Parameters:
path (str) – Path in the DataRobot file system to list.
detail (bool) – Whether to return detailed information.
kwargs (Any) – Additional keyword arguments for future proofing.
version_id (str) – Version ID of the catalog item to target. If not provided, the latest version is used.
- Returns:
paths – List of dicts with file and folder details if detail is True, otherwise list of paths.
- Return type:
List[FileInfo] or List[str]
- Raises:
FileNotFoundError – If the specified path does not exist.
Examples
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> fs.ls("dr://", detail=False)
['696935d6d5a04a752419cf6d/', 'abcdef1234567890abcdef12/']
>>> fs.ls("dr://696935d6d5a04a752419cf6d/finance/")
[
    {
        'name': '696935d6d5a04a752419cf6d/finance/fy-2024/',
        'size': 0,
        'type': 'directory',
        'format': None
    },
    {
        'name': '696935d6d5a04a752419cf6d/finance/employee-list.csv',
        'size': 2048,
        'type': 'file',
        'format': 'csv'
    },
]
- info(path, **kwargs)
Get details about a file or directory.
For info about a directory path append a forward slash (/) at the end of the path. Paths without a trailing slash can return info about files or directories. If both a file and directory share the same path, the file info is returned.
- Parameters:
path (str) – Path in the DataRobot file system to get information about.
version_id – Optional version ID of the catalog item to target. If not provided, the latest version is used.
kwargs (Any) – Additional keyword arguments passed to ls().
- Returns:
info – A dictionary with file or directory details including name (path), size and type.
- Return type:
FileInfo
- Raises:
FileNotFoundError – If the specified path does not exist.
ValueError – If the path is invalid. Root path is not allowed.
Examples
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> fs.info("dr://696935d6d5a04a752419cf6d/finance/employee-list.csv")
{
    'name': '696935d6d5a04a752419cf6d/finance/employee-list.csv',
    'size': 2048,
    'type': 'file',
    'format': 'csv',
    'created_at': datetime.datetime(2026, 3, 6, 10, 5, 16, 805655)
}
>>> fs.info("dr://696935d6d5a04a752419cf6d/finance/")
{
    'name': '696935d6d5a04a752419cf6d/finance/',
    'size': 0,
    'type': 'directory',
    'format': None,
    'created_at': None
}
>>> fs.info("dr://696935d6d5a04a752419cf6d/my_folder")
{
    'name': '696935d6d5a04a752419cf6d/my_folder/',
    'size': 0,
    'type': 'directory',
    'format': None,
    'created_at': None
}
- created(path)
Return the created timestamp of a file as a datetime.datetime.
- Parameters:
path (str) – Path in the DataRobot file system to get information about.
- Returns:
created – A datetime.datetime timestamp of when the file was created, or None if a directory.
- Return type:
Optional[datetime.datetime]
- Raises:
FileNotFoundError – If the specified path does not exist.
ValueError – If the path is invalid. Root path is not allowed.
Examples
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> fs.created("dr://696935d6d5a04a752419cf6d/finance/employee-list.csv")
datetime.datetime(2026, 3, 6, 10, 5, 16, 805655)
- du(path, total=True, maxdepth=None, withdirs=False, **kwargs)
Retrieve space used by files and optionally directories at a path.
Notes
Directory size does not include the size of its contents and is set to zero.
- Parameters:
path (str) – The path to retrieve file space usage for.
total (bool) – Whether to sum all file sizes.
maxdepth (Optional[int]) – Maximum number of directory levels to descend when searching for files. Use None for unlimited.
withdirs (bool) – Whether to include directory paths in the output.
kwargs (Any) – Additional keyword arguments passed to find().
- Returns:
If total is True, the number of bytes of all files in the path. If total is False, a dictionary mapping paths to their size.
- Return type:
int or Dict[str, int]
Examples
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> fs.du("dr://696935d6d5a04a752419cf6d/finance/yellow.txt")
2048
>>> fs.du("dr://696935d6d5a04a752419cf6d/", total=False)
{'696935d6d5a04a752419cf6d/file.txt': 102, '696935d6d5a04a752419cf6d/finance/yellow.txt': 2048}
>>> fs.du("dr://696935d6d5a04a752419cf6d/", total=False, maxdepth=1, withdirs=True)
{'696935d6d5a04a752419cf6d/file.txt': 102, '696935d6d5a04a752419cf6d/finance/': 0}
- find(path, maxdepth=None, withdirs=False, detail=False, **kwargs)
List all files below path. If withdirs is True, include directories as well.
Like the POSIX find command without conditions.
- Parameters:
path (str) – The path to search from. Note that unlike the glob method, this method does not support glob patterns and treats the path as a literal directory path to search under or a filename to match.
maxdepth (Optional[int]) – If not None, the maximum number of levels to descend.
withdirs (bool) – Whether to include directory paths in the output.
kwargs (Any) – Passed to ls().
- Returns:
If detail is False, a list of file (and optionally directory) paths. If detail is True, a dictionary mapping paths to their info dictionaries.
- Return type:
List[str] or Dict[str, FileInfo]
Examples
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> fs.find("dr://696935d6d5a04a752419cf6d/", withdirs=True)
[
    '696935d6d5a04a752419cf6d/',
    '696935d6d5a04a752419cf6d/finance/',
    '696935d6d5a04a752419cf6d/finance/budgets/',
    '696935d6d5a04a752419cf6d/finance/budgets/Q2_budget_2024.pdf',
    '696935d6d5a04a752419cf6d/finance/employee-list.csv'
]
>>> fs.find("dr://696935d6d5a04a752419cf6d/finance/", maxdepth=1)
['696935d6d5a04a752419cf6d/finance/employee-list.csv']
>>> fs.find("dr://696935d6d5a04a752419cf6d/finance", maxdepth=1, withdirs=True, detail=True)
{
    '696935d6d5a04a752419cf6d/finance/': {
        'name': '696935d6d5a04a752419cf6d/finance/',
        'size': 0,
        'type': 'directory',
        'format': None,
        'created_at': None
    },
    '696935d6d5a04a752419cf6d/finance/employee-list.csv': {
        'name': '696935d6d5a04a752419cf6d/finance/employee-list.csv',
        'size': 2048,
        'type': 'file',
        'format': 'csv',
        'created_at': datetime.datetime(2026, 3, 6, 10, 5, 16, 805655)
    },
    '696935d6d5a04a752419cf6d/finance/budgets/': {
        'name': '696935d6d5a04a752419cf6d/finance/budgets/',
        'size': 0,
        'type': 'directory',
        'format': None,
        'created_at': None
    },
}
- glob(path, maxdepth=None, detail=False, **kwargs)
Find files by glob-matching.
Pattern matching capabilities for finding files that match the given pattern.
- Parameters:
path (str) – The glob pattern to match against.
maxdepth (Optional[int]) – Maximum depth for '**' patterns. Applied on the first '**' found. Must be at least 1 if provided.
detail (bool) – Whether to return detailed information.
kwargs (Any) – Additional arguments passed to find().
- Returns:
If detail is False, a list of file and directory paths. If detail is True, a dictionary mapping paths to their info dictionaries.
- Return type:
List[str] or Dict[str, FileInfo]
Notes
Supported patterns:
‘*’: Matches any sequence of characters within a single directory level
‘**’: Matches any number of directory levels (must be an entire path component)
‘?’: Matches exactly one character
‘[abc]’: Matches any character in the set
‘[a-z]’: Matches any character in the range
‘[!cat]’: Matches any character NOT in the set {c, a, t}
Special behaviors:
If the path ends with ‘/’, only folders are returned
Consecutive ‘*’ characters are compressed into a single ‘*’
Empty brackets ‘[]’ never match anything
Negated empty brackets ‘[!]’ match any single character
Special characters in character classes are escaped properly
Limitations:
‘**’ must be a complete path component (e.g., ‘a/**/b’, not ‘a**b’)
No brace expansion (‘{a, b}.txt’)
No extended glob patterns (‘+(pattern)’, ‘!(pattern)’)
Examples
Find all files and directories directly under the specified path.
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> fs.glob("dr://696935d6d5a04a752419cf6d/finance/*", detail=False)
[
    '696935d6d5a04a752419cf6d/finance/budgets/',
    '696935d6d5a04a752419cf6d/finance/employee-list.csv'
]
Find only directories directly under the specified path.
>>> fs.glob("dr://696935d6d5a04a752419cf6d/finance/*/", detail=False)
['696935d6d5a04a752419cf6d/finance/budgets/']
Find any budget directories with a 4-digit year in their name.
>>> fs.glob("dr://696935d6d5a04a752419cf6d/finance/budgets/*-202[0-9]/", detail=False)
[
    '696935d6d5a04a752419cf6d/finance/budgets/fy-2024/',
    '696935d6d5a04a752419cf6d/finance/budgets/fy-2023/'
]
Find all .csv files at a maximum depth of 2 levels.
>>> fs.glob("dr://696935d6d5a04a752419cf6d/**/*.csv", maxdepth=2, detail=False)
[
    '696935d6d5a04a752419cf6d/finance/employee-list.csv',
    '696935d6d5a04a752419cf6d/sales/data.csv'
]
- tree(path='', recursion_limit=2, max_display=25, display_size=False, prefix='', is_last=True, first=True, indent_size=4)
Return a tree-like structure string of the DataRobot file system from the given path.
- Parameters:
path (str) – Path in the DataRobot file system to display the tree from.
recursion_limit (int) – Maximum depth of directory traversal.
max_display (int) – Maximum number of items to display per directory.
display_size (bool) – Whether to display file sizes.
prefix (str) – Current line prefix for visual tree structure.
is_last (bool) – Whether the current item is last in its level.
first (bool) – Whether this is the first call (displays root path).
indent_size (int) – Number of spaces per indent.
- Returns:
tree_str – A string representing the tree structure of the file system.
- Return type:
str
Examples
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> print(fs.tree("dr://696935d6d5a04a752419cf6d/", recursion_limit=5))
696935d6d5a04a752419cf6d/
└── finance/
    ├── fy-2024/
    │   └── budgets/
    │       └── Q2_budget_2024.pdf
    └── employee-list.csv
- cat_file(path, start=None, end=None, **kwargs)
Fetch a single file’s contents.
- Parameters:
path (str) – File path in the DataRobot file system to read.
start (Optional[int]) – Optional starting byte position to read from. If negative, counts from the end of the file.
end (Optional[int]) – Optional ending byte position to read to. If negative, counts from the end of the file.
kwargs (Any) – Keyword arguments passed to DataRobotFileSystem.open().
- Returns:
The contents of the file as bytes.
- Return type:
bytes
Examples
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> fs.cat_file("dr://696935d6d5a04a752419cf6d/finance/report.txt")
b'Q2 Financial Report...'
Read a range of bytes from a file:
>>> fs.cat_file("dr://696935d6d5a04a752419cf6d/finance/report.txt", start=10, end=20)
b'Financial Report...'
- cat(path, recursive=False, on_error='raise', **kwargs)
Fetch (potentially multiple) path’s contents.
- Parameters:
path (Union[str, List[str]]) – File or directory path(s) in the DataRobot file system to read. Can include glob patterns.
recursive (bool) – If True, assume the path(s) are directories, and get contents of all contained files.
on_error (Union[Literal['raise'], Literal['omit'], Literal['return']]) – If "raise", an underlying exception will be raised (converted to KeyError if the type is in self.missing_exceptions); if "omit", keys with exceptions will simply not be included in the output; if "return", all keys are included in the output, but the value will be bytes or an exception instance.
kwargs (Any) – Additional keyword arguments passed to cat_file().
- Returns:
If a single file path is provided, returns the file contents as bytes. If multiple paths are provided or the path is otherwise expanded, returns a dictionary mapping each path to its contents as bytes or an exception instance if on_error is set to “return”.
- Return type:
bytes or Dict[str, bytes] or Dict[str, Union[bytes, Exception]]
Examples
Read a single file:
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> fs.cat("dr://696935d6d5a04a752419cf6d/finance/report.txt")
b'Q2 Financial Report...'
Read multiple files and all files in a directory:
>>> fs.cat(
...     ["dr://696935d6d5a04a752419cf6d/finance/summary.txt", "dr://696935d6d5a04a752419cf6d/reports/"],
...     recursive=True
... )
{
    '696935d6d5a04a752419cf6d/finance/summary.txt': b'Summary...',
    '696935d6d5a04a752419cf6d/reports/report_2024.txt': b'2024 Report...',
    '696935d6d5a04a752419cf6d/reports/report_2025.txt': b'2025 Report...'
}
Read all CSV files matching a glob pattern:
>>> fs.cat("dr://696935d6d5a04a752419cf6d/data/**/*.csv")
{
    '696935d6d5a04a752419cf6d/data/sales.csv': b'date,amount\n2024-01-01,1000\n...',
    '696935d6d5a04a752419cf6d/data/archive/old_sales.csv': b'date,amount\n2023-01-01,950\n...'
}
- sign(path, expiration=100, version_id=None, **kwargs)
Create a signed URL for the given file path. Optionally specify a version ID to retrieve a signed URL for an earlier version of the file from that version of the catalog directory.
- Parameters:
path (str) – File path in the DataRobot file system to sign.
expiration (int) – Number of seconds until the signed URL expires.
version_id (Optional[str]) – Version ID of the catalog directory to target. If not provided, the latest version is used.
kwargs (Any) – Additional keyword arguments for future proofing.
- Returns:
A signed URL granting temporary access to the file.
- Return type:
str
- Raises:
FileNotFoundError – If the specified file does not exist.
IsADirectoryError – If the specified path is a directory.
ValueError – If the path format is invalid.
Examples
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> signed_url = fs.sign(
...     "dr://696935d6d5a04a752419cf6d/finance/budgets/Q2_budget_2024.pdf",
...     expiration=300,
... )
- cp_file(path1, path2, overwrite_strategy=FilesOverwriteStrategy.RENAME, max_wait=600, wait_for_completion=True, **kwargs)
Copy a file or directory from path1 to path2.
Copies directories recursively. Specify an overwrite strategy to handle file naming conflicts at the target location. Note that copying between catalog item directories is an asynchronous operation. Cannot create a new catalog item directory by copying files into a non-existent catalog item directory.
- Parameters:
path1 (str) – Source file or directory path. Directory paths should end with a forward slash (/).
path2 (str) – Target file or directory path. Directory paths should end with a forward slash (/).
overwrite_strategy (FilesOverwriteStrategy) – Strategy to handle naming conflicts at the target location.
max_wait (int) – Maximum time in seconds to wait for the copy operation to complete when copying between catalog items.
wait_for_completion (bool) – Whether to wait for the copy operation to complete before returning when copying between catalog items.
kwargs (Any) – Additional keyword arguments for future proofing.
- Raises:
FileNotFoundError – If the source path does not exist or either catalog item directory does not exist.
ValueError – If attempting to copy a directory to a file path.
FileExistsError – If the target file or directory already exists and overwrite strategy is set to ERROR.
- Return type:
None
Examples
Copy a file to a new file path:
>>> from datarobot.fs import DataRobotFileSystem
>>> from datarobot.enums import FilesOverwriteStrategy
>>> fs = DataRobotFileSystem()
>>> fs.cp_file(
...     "dr://696935d6d5a04a752419cf6d/fy-2024/budgets/Q2_budget_2024.pdf",
...     "dr://69691fc3d5a04a752419cf5/fy-2024/budgets-copy.pdf",
... )
Copy file into a directory, replace existing file if present:
>>> fs.cp_file(
...     "dr://696935d6d5a04a752419cf6d/fy-2024/budgets/Q2_budget_2024.pdf",
...     "dr://69691fc3d5a04a752419cf5/fy-2024/budgets/",
...     overwrite_strategy=FilesOverwriteStrategy.OVERWRITE,
... )
Copy the contents of a directory into another directory:
>>> fs.cp_file(
...     "dr://696935d6d5a04a752419cf6d/fy-2024/budgets/",
...     "dr://69691fc3d5a04a752419cf5/archive/budgets-2024/",
... )
- cp_directory(path1, path2, overwrite_strategy=FilesOverwriteStrategy.RENAME, max_wait=600, wait_for_completion=True, **kwargs)
Copy a directory recursively from path1 to path2.
Validates that both paths are directories by checking for trailing slashes (/). Calls cp_file() internally.
- Parameters:
path1 (str) – Source directory path. Must end with a forward slash (/).
path2 (str) – Target directory path. Must end with a forward slash (/).
overwrite_strategy (FilesOverwriteStrategy) – Strategy to handle naming conflicts at the target location.
max_wait (int) – Maximum time in seconds to wait for the copy operation to complete when copying between catalog items.
wait_for_completion (bool) – Whether to wait for the copy operation to complete before returning when copying between catalog items.
kwargs (Any) – Additional keyword arguments passed to cp_file().
- Return type:
None
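Examples

A sketch of copying a directory between catalog items; the ids, paths, and chosen strategy below are illustrative, not output from a real deployment:

>>> from datarobot.fs import DataRobotFileSystem
>>> from datarobot.enums import FilesOverwriteStrategy
>>> fs = DataRobotFileSystem()
>>> fs.cp_directory(
...     "dr://696935d6d5a04a752419cf6d/fy-2024/budgets/",
...     "dr://69691fc3d5a04a752419cf5c/archive/budgets-2024/",
...     overwrite_strategy=FilesOverwriteStrategy.SKIP,
... )

Both paths must end with a forward slash (/); otherwise validation fails.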
- copy(path1, path2, recursive=False, maxdepth=None, on_error=None, **kwargs)
Copy files or directories between two locations in the DataRobot file system.
- Parameters:
path1 (Union[str, List[str]]) – Source file or directory path(s). Supports glob patterns. If specifying a directory, recursive should be True.
path2 (Union[str, List[str]]) – Target file or directory path(s).
recursive (bool) – Whether to copy directory contents recursively.
maxdepth (Optional[int]) – Maximum depth to recurse when finding files to copy.
on_error (Optional[Literal['raise', 'ignore']]) – If "raise", any file-not-found exceptions will be raised. If "ignore", any file-not-found exceptions will be skipped and ignored. Defaults to "raise" unless recursive is True, in which case the default is "ignore".
kwargs (Any) – Additional keyword arguments passed to cp_file().
overwrite_strategy (FilesOverwriteStrategy) – Strategy to handle naming conflicts at the target location. Passed to cp_file().
- Raises:
FileNotFoundError – If any of the source paths do not exist, or no files can be found and on_error is "raise".
- Return type:
None
Examples
Copy a single file to a new path:
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> fs.copy(
...     "dr://696935d6d5a04a752419cf6d/finance/employee-list.csv",
...     "dr://696935d6d5a04a752419cf6d/finance/employee-list-backup.csv",
... )
Copy more than one file or directory:
>>> fs.copy(
...     [
...         "dr://696935d6d5a04a752419cf6d/finance/employee-list.csv",
...         "dr://696935d6d5a04a752419cf6d/finance/employee-list-copy.csv",
...     ],
...     [
...         "dr://696935d6d5a04a752419cf6d/finance/employee-list-copy.csv",
...         "dr://696935d6d5a04a752419cf6d/finance/employee-list-copy-2.csv",
...     ],
... )
Copy a single file into a directory:
>>> fs.copy(
...     "dr://696935d6d5a04a752419cf6d/finance/report.pdf",
...     "dr://696935d6d5a04a752419cf6d/archive/",
... )
Recursively copy the contents of a directory to another directory:
>>> fs.copy(
...     "dr://696935d6d5a04a752419cf6d/budgets/",
...     "dr://696935d6d5a04a752419cf6d/archive/budgets-2024/",
...     recursive=True,
... )
Copy all CSV files in a directory and its subdirectories up to a maximum depth of 2:
>>> fs.copy(
...     "dr://696935d6d5a04a752419cf6d/data/**/*.csv",
...     "dr://696935d6d5a04a752419cf6d/archive/data-2024/",
...     recursive=True,
...     maxdepth=2,
... )
Copy all text files in a directory into a new directory:
>>> fs.copy(
...     "dr://696935d6d5a04a752419cf6d/data/*.txt",
...     "dr://696935d6d5a04a752419cf6d/archive/",
...     recursive=True,
... )
Copy a directory recursively, skipping files that already exist at the target:
>>> from datarobot.enums import FilesOverwriteStrategy
>>> fs.copy(
...     "dr://696935d6d5a04a752419cf6d/budgets/",
...     "dr://696935d6d5a04a752419cf6d/archive/",
...     recursive=True,
...     overwrite_strategy=FilesOverwriteStrategy.SKIP,
... )
- rm_file(path, **kwargs)
Delete a file or directory at the given path(s). Completes silently if the file does not exist.
- Parameters:
path (Union[str, List[str]]) – Path(s) of the file(s) to delete. Paths ending with a forward slash (/) are treated as directories and deleted recursively.
kwargs (Any) – Additional keyword arguments for future proofing.
- Return type:
None
Examples
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> fs.rm_file("dr://696935d6d5a04a752419cf6d/finance/employee-list.csv")
>>> fs.rm_file([
...     "dr://696935d6d5a04a752419cf6d/finance/employee-list.csv",
...     "dr://696935d6d5a04a752419cf6d/finance/fy-2024/budgets/Q2_budget_2024.pdf"
... ])
- rm_directory(path, **kwargs)
Recursively delete a directory at the given path(s). Completes silently if the directory does not exist. Uses rm_file() internally.
Soft-deletes the catalog item directory when requested. Use Files.un_delete() if you need to restore a deleted catalog item.
- Parameters:
path (Union[str, List[str]]) – One or more directory paths to delete recursively. Paths must end with a forward slash (/) to be treated as directories.
kwargs (Any) – Additional keyword arguments for future proofing.
- Raises:
ValueError – If any of the provided paths do not end with a forward slash (/).
- Return type:
None
Examples
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> fs.rm_directory("dr://696935d6d5a04a752419cf6d/finance/fy-2024/")
>>> fs.rm_directory([
...     "dr://696935d6d5a04a752419cf6d/finance/fy-2024/",
...     "dr://696935d6d5a04a752419cf6d/"
... ])
- rm(path, recursive=False, maxdepth=None, **kwargs)
Delete files or directories. Completes silently if the file or directory does not exist.
Soft-deletes the catalog item directory when requested. Use Files.un_delete() if you need to restore a deleted catalog item. If all files in a directory are deleted, the directory itself is deleted implicitly, as the DataRobot file system does not support empty directories.
- Parameters:
path (Union[str, List[str]]) – One or more file or directory paths to delete. Paths ending with a forward slash (/) are treated as directories.
recursive (bool) – Whether to recurse into directories when targeting files to delete. If False, only deletes the files targeted.
maxdepth (Optional[int]) – Depth passed to find() and glob() when targeting files for deletion. Used to limit recursion into directories when finding files to delete. If None, no limit is applied.
kwargs (Any) – Additional keyword arguments for future proofing. Passed to rm_file().
- Return type:
None
Examples
Delete file:
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> fs.rm("dr://696935d6d5a04a752419cf6d/finance/employee-list.csv")
Delete directory recursively:
>>> fs.rm("dr://696935d6d5a04a752419cf6d/finance/fy-2024/", recursive=True)
Delete contents of catalog item folder recursively up to a maximum depth of 2:
>>> fs.rm("dr://696935d6d5a04a752419cf6d/", recursive=True, maxdepth=2)
Delete catalog item folder:
>>> fs.rm("dr://696935d6d5a04a752419cf6d/")
Delete .csv files in a directory and its subdirectories up to a maximum depth of 3:
>>> fs.rm("dr://696935d6d5a04a752419cf6d/finance/**/*.csv", recursive=True, maxdepth=3)
- create_catalog_item_dir(**kwargs)
Create a new empty catalog item directory and return its id.
- Parameters:
kwargs (Any) – Additional keyword arguments for future proofing.
- Returns:
The id of the newly created catalog item.
- Return type:
str
Examples
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> catalog_id = fs.create_catalog_item_dir()
>>> fs.ls(f"dr://{catalog_id}/")
[]
- mv_file(path1, path2, *, overwrite_strategy=FilesOverwriteStrategy.REPLACE, **kwargs)
Move a single file or directory from path1 to path2.
- Parameters:
path1 (str) – Source path. Format: dr://<catalog_id>/path. Directories should end with /.
path2 (str) – Destination path. Format: dr://<catalog_id>/path. Directories should end with /.
overwrite_strategy (FilesOverwriteStrategy) – Strategy for overwriting existing paths. Defaults to REPLACE, in line with fsspec.
kwargs (Any) – Additional keyword arguments passed to cp_file() and rm_file() when moving across catalogs.
- Return type:
None
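Examples

A sketch of moving a single file within a catalog item; the id and paths are illustrative:

>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> fs.mv_file(
...     "dr://696935d6d5a04a752419cf6d/drafts/report.txt",
...     "dr://696935d6d5a04a752419cf6d/final/report.txt",
... )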
- mv(path1, path2, recursive=False, maxdepth=None, *, overwrite_strategy=FilesOverwriteStrategy.REPLACE, **kwargs)
Move files or directories from path1 to path2. path1 may contain glob patterns.
- Parameters:
path1 (Union[str, List[str]]) – Source path(s). Format: dr://<catalog_id>/path. A string (file, directory, or glob pattern) or a list of explicit paths.
path2 (Union[str, List[str]]) – Destination path(s). Format: dr://<catalog_id>/path. A single path when path1 is a string. When path1 is a list, either a single directory (ending with /; each source maps to path2/basename) or a list of paths. When both are lists, truncates to the shorter length (matches fsspec).
recursive (bool) – If True, move directories recursively.
maxdepth (Optional[int]) – If not None, maximum directory depth when resolving path1. None means no limit.
overwrite_strategy (FilesOverwriteStrategy) – Strategy for overwriting existing paths. Defaults to REPLACE, in line with fsspec.
kwargs (Any) – Additional keyword arguments passed to expand_path when resolving paths and to mv_file() when performing the move.
- Raises:
ValueError – If multiple sources are moved to a single file destination (not a directory).
- Return type:
None
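Examples

A sketch of recursive and glob-based moves; the id and paths are illustrative:

>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> fs.mv(
...     "dr://696935d6d5a04a752419cf6d/staging/",
...     "dr://696935d6d5a04a752419cf6d/archive/",
...     recursive=True,
... )
>>> fs.mv(
...     "dr://696935d6d5a04a752419cf6d/data/*.csv",
...     "dr://696935d6d5a04a752419cf6d/csv-files/",
... )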
- clone_catalog_item_dir(path_or_id, files_to_omit=None, **kwargs)
Clone a catalog item directory (copy all contents) and return the ID of the cloned catalog item.
- Parameters:
path_or_id (str) – Path or ID of the catalog item directory to clone.
files_to_omit (Optional[List[str]]) – List of files to omit when cloning. Provide paths relative to the root of the catalog item directory.
kwargs (Any) – Additional keyword arguments passed to Files.clone().
- Returns:
The ID of the cloned catalog item.
- Return type:
str
Examples
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> fs.ls("dr://696935d6d5a04a752419cf6d/", detail=False)
['696935d6d5a04a752419cf6d/folder/', '696935d6d5a04a752419cf6d/file.txt']
>>> clone_id = fs.clone_catalog_item_dir("dr://696935d6d5a04a752419cf6d/")
>>> clone_id
'696935d6d5a04a752419cf6d-clone'
>>> fs.ls(f"dr://{clone_id}/", detail=False)
['696935d6d5a04a752419cf6d-clone/folder/', '696935d6d5a04a752419cf6d-clone/file.txt']
Clone a catalog item directory and omit a file:
>>> fs.clone_catalog_item_dir("dr://696935d6d5a04a752419cf6d/", files_to_omit=["file.txt"])
'696935d6d5a04a752419cf6d-clone'
>>> fs.ls("dr://696935d6d5a04a752419cf6d-clone/", detail=False)
['696935d6d5a04a752419cf6d-clone/folder/']
- put_from_url(path, url, unpack_archive_files=True, overwrite_strategy=FilesOverwriteStrategy.RENAME, *, upload_timeout=600, wait_for_completion=True, **kwargs)
Load file(s) from a URL into a directory in the DataRobot file system.
- Parameters:
path (str) – DataRobot path to the directory (catalog root or a folder inside it).
url (str) – The URL of the file or archive to load. Must be accessible by the DataRobot server.
unpack_archive_files (bool) – If True, extract archive contents into the directory. If False, upload the file as-is. Defaults to True.
upload_timeout (int) – Maximum time in seconds to wait for the upload to complete.
wait_for_completion (bool) – If True, block until the upload completes. Defaults to True.
overwrite_strategy (FilesOverwriteStrategy) – How to handle name conflicts with existing files. Defaults to FilesOverwriteStrategy.RENAME.
kwargs (Any) – Additional keyword arguments for future proofing.
- Return type:
None
Examples
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> catalog_id = fs.create_catalog_item_dir()
>>> fs.put_from_url(f"dr://{catalog_id}/data/", "https://example.com/file.png")
>>> fs.ls(f"dr://{catalog_id}/data/")
[{'name': 'file.png', 'size': 12345, 'type': 'file', ...}]
- Raises:
AsyncTimeoutError – If wait_for_completion is True and the upload takes longer than upload_timeout seconds.
FileExistsError – If overwrite_strategy is FilesOverwriteStrategy.ERROR and a file with the same name already exists.
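The “<filename> (n).ext” pattern behind FilesOverwriteStrategy.RENAME can be sketched in a few lines. This is an illustration of the documented naming behavior, not DataRobot’s actual implementation; rename_with_suffix is a hypothetical helper.

```python
import os

def rename_with_suffix(filename, existing):
    """Illustrate the documented "<filename> (n).ext" rename pattern."""
    if filename not in existing:
        return filename
    stem, ext = os.path.splitext(filename)
    n = 1
    # Keep incrementing n until the candidate name is free.
    while f"{stem} ({n}){ext}" in existing:
        n += 1
    return f"{stem} ({n}){ext}"

print(rename_with_suffix("file.txt", {"file.txt"}))  # file (1).txt
```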
- put_from_data_source(path, data_source_id, credential_id=None, credential_data=None, unpack_archive_files=True, overwrite_strategy=FilesOverwriteStrategy.RENAME, *, upload_timeout=600, wait_for_completion=True, **kwargs)
Upload one or more files from a data source into a directory in the DataRobot file system.
- Parameters:
path (str) – Directory path to upload files under. Cannot be the root directory.
data_source_id (str) – The ID of the DataSource to use as the source of data.
credential_id (Optional[str]) – The ID of the Credential to use for authentication.
credential_data (Optional[Dict[str, str]]) – The credentials to authenticate with the database, to use instead of a credential ID.
unpack_archive_files (bool) – Whether to unpack archive files (zip, tar, tar.gz, tgz) upon upload.
overwrite_strategy (FilesOverwriteStrategy) – Strategy for handling naming conflicts when writing to a path where a file already exists. Use FilesOverwriteStrategy.RENAME to rename an uploaded file using the “<filename> (n).ext” pattern. Use FilesOverwriteStrategy.REPLACE to overwrite the existing file. Use FilesOverwriteStrategy.SKIP to skip uploading if a file already exists at the target path. Use FilesOverwriteStrategy.ERROR to raise FileExistsError if a file already exists at the target path.
upload_timeout (int) – Maximum time in seconds to wait for the upload to complete.
wait_for_completion (bool) – If True, block until the upload completes. If False, return after starting the upload.
kwargs (Any) – Additional keyword arguments for future proofing.
- Raises:
ValueError – If the directory path is invalid.
FileNotFoundError – If the directory path does not exist.
AsyncTimeoutError – If wait_for_completion is True and the upload takes longer than upload_timeout seconds.
- Return type:
None
Examples
Upload file or folder from Google Drive.
Note: GDrive paths must use drive, folder and file IDs. Example:
/<drive_id>/<folder_id>/<file_id> for a file, or /<drive_id>/<folder_id> for a folder.
>>> import datarobot as dr
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> gcp_cred = dr.Credential.create_gcp(
...     name='GDrive Credentials',
...     gcp_key={  # Or load from keyfile
...         "type": "service_account",
...         "private_key_id": "...",
...         "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
...         "client_email": "user@project.iam.gserviceaccount.com",
...         "client_id": "...",
...     },
... )
>>> gdrive_connector = next(
...     c for c in dr.Connector.list() if c.connector_type == "gdrive"
... )
>>> gdrive_datastore = dr.DataStore.create(
...     data_store_type=dr.enums.DataStoreTypes.DR_CONNECTOR_V1,
...     canonical_name='GDrive DataStore',
...     fields=[{'id': 'gdrive.drive_name', 'name': 'Drive Name', 'value': 'My Drive'}],
...     connector_id=gdrive_connector.id,
... )
>>> path = "/<drive_id>/<folder_id>/<file_id>"  # or "/<drive_id>/<folder_id>" for a folder
>>> gdrive_datasource = dr.DataSource.create(
...     data_source_type=dr.enums.DataStoreTypes.DR_CONNECTOR_V1,
...     canonical_name='GDrive DataSource for my documents',
...     params=dr.DataSourceParameters(data_store_id=gdrive_datastore.id, path=path),
... )
>>> fs.put_from_data_source(
...     "dr://<catalog-id>/my_gdrive_documents/",
...     gdrive_datasource.id,
...     credential_id=gcp_cred.credential_id,  # Can omit if using default credentials set up with the DataStore
... )
>>> print(fs.ls("dr://<catalog-id>/my_gdrive_documents/", detail=False))
['<catalog-id>/my_gdrive_documents/file.txt']
Upload file or folder from AWS S3 bucket:
>>> import datarobot as dr
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> cred = dr.Credential.create_s3(
...     name="AWS S3 Credentials",
...     aws_access_key_id="...",
...     aws_secret_access_key="...",
...     aws_session_token="...",
... )
>>> s3_connector = next(
...     c for c in dr.Connector.list() if c.connector_type == "s3"
... )
>>> s3_datastore = dr.DataStore.create(
...     data_store_type=dr.enums.DataStoreTypes.DR_CONNECTOR_V1,
...     canonical_name='S3 DataStore',
...     fields=[
...         {"id": "fs.defaultFS", "name": "Bucket Name", "value": "my-bucket-name"},
...         {"id": "fs.rootDirectory", "name": "Prefix", "value": "/"},
...         {"id": "fs.s3.awsRegion", "name": "S3 Bucket Region", "value": "us-east-1"},
...     ],
...     connector_id=s3_connector.id,
... )
>>> s3_datasource = dr.DataSource.create(
...     data_source_type=dr.enums.DataStoreTypes.DR_CONNECTOR_V1,
...     canonical_name='S3 DataSource for my files',
...     params=dr.DataSourceParameters(
...         data_store_id=s3_datastore.id,
...         path="path/to/my/file.txt",  # or "path/to/my/folder/"
...     ),
... )
>>> fs.put_from_data_source(
...     "dr://<catalog-id>/my_s3_files/",
...     s3_datasource.id,
...     credential_id=cred.credential_id,  # Can omit if using default credentials set up with the DataStore
... )
>>> print(fs.ls("dr://<catalog-id>/my_s3_files/", detail=False))
['<catalog-id>/my_s3_files/file.txt']
Upload file or folder from SharePoint:
Note: Sharepoint paths must use the following format:
/<HOSTNAME>,<SITE_COLLECTION_ID>,<SITE_ID/WEB_ID>/<DRIVE_ID>/<FILE_OR_FOLDER_ITEM_ID>
Example: /mydomain.sharepoint.com,4732d...8b01b0,eb0d3...e42f/b!8tQyRyn.....TowMA13__nTU/01MAJ...EYJTAOR6/
>>> import datarobot as dr
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> cred = dr.Credential.create_azure_service_principal(
...     name="Azure Service Principal Credential for Sharepoint",
...     client_id="...",
...     client_secret="...",
...     azure_tenant_id="...",
... )
>>> sharepoint_connector = next(
...     c for c in dr.Connector.list() if c.connector_type == "sharepoint"
... )
>>> sharepoint_datastore = dr.DataStore.create(
...     data_store_type=dr.enums.DataStoreTypes.DR_CONNECTOR_V1,
...     canonical_name='Sharepoint DataStore',
...     fields=[],
...     connector_id=sharepoint_connector.id,
... )
>>> path = "/<HOSTNAME>,<SITE_COLLECTION_ID>,<SITE_ID/WEB_ID>/<DRIVE_ID>/<FILE_OR_FOLDER_ITEM_ID>"
>>> sharepoint_datasource = dr.DataSource.create(
...     data_source_type=dr.enums.DataStoreTypes.DR_CONNECTOR_V1,
...     canonical_name='Sharepoint DataSource',
...     params=dr.DataSourceParameters(
...         data_store_id=sharepoint_datastore.id,
...         path=path,
...     ),
... )
>>> fs.put_from_data_source(
...     "dr://<catalog-id>/my_sharepoint_files/",
...     sharepoint_datasource.id,
...     credential_id=cred.credential_id,
... )
>>> print(fs.ls("dr://<catalog-id>/my_sharepoint_files/", detail=False))
['<catalog-id>/my_sharepoint_files/my_file.txt']
- open(path, mode='rb', block_size=None, cache_options=None, compression=None, overwrite_strategy=FilesOverwriteStrategy.REPLACE, unpack_archive_files=False, upload_timeout=600, **kwargs)
Open a file in the DataRobot file system. Supports read modes ‘r’, ‘rb’ and write modes ‘w’, ‘wb’, ‘xb’.
- Parameters:
path (str) – Path in the DataRobot file system to open.
mode (str) – Mode to open the file in: ‘r’ or ‘rb’ for reading; ‘w’, ‘wb’ or ‘xb’ for writing.
block_size (Optional[int]) – Buffer size in bytes for reading and writing.
cache_options (Optional[Dict[str, Any]]) – Extra arguments to pass through to the cache.
compression (Optional[str]) – If given, open the file using this compression codec. Can be either a compression name (a key in fsspec.compression.compr) or “infer” to guess the compression from the filename suffix.
overwrite_strategy (FilesOverwriteStrategy) – Strategy for handling naming conflicts when writing to a path where a file already exists. Use FilesOverwriteStrategy.RENAME to rename an uploaded file using the “<filename> (n).ext” pattern. Use FilesOverwriteStrategy.REPLACE to overwrite the existing file. Use FilesOverwriteStrategy.SKIP to skip uploading if a file already exists at the target path. Use FilesOverwriteStrategy.ERROR to raise FileExistsError if a file already exists at the target path.
unpack_archive_files (bool) – If True, automatically unpack archive files (zip, tar, tar.gz, tgz) upon upload.
upload_timeout (int) – Maximum time in seconds to wait for the file upload to complete.
kwargs (Any) – Additional keyword arguments passed to DataRobotFile or TextFileWrapper.
- Raises:
IsADirectoryError – If attempting to open a directory for reading.
FileNotFoundError – If attempting to open a non-existent file for reading.
ValueError – If an unsupported file mode is provided, an invalid path is passed, or the file is too big to download.
FileExistsError – If attempting to write to a path where a file already exists and the overwrite strategy is set to FilesOverwriteStrategy.ERROR, or mode is set to ‘xb’.
- Returns:
A file-like object for reading or writing.
- Return type:
Examples
Open a file for reading:
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> with fs.open("dr://696935d6d5a04a752419cf6d/notes/agenda.txt", mode="r") as f:
...     data = f.read()
Read first 20 bytes from a file then skip to byte 100 and read the next 30 bytes:
>>> with fs.open("dr://696935d6d5a04a752419cf6d/figures/plot.png", mode="rb") as f:
...     first_20_bytes = f.read(20)
...     f.seek(100)
...     next_30_bytes = f.read(30)
- touch(path, truncate=True, **kwargs)
Create an empty file at the given path.
DataRobotFileSystem does not support updating timestamps of existing files.
- Parameters:
path (str) – Path to the file to create.
truncate (bool) – Whether to replace the existing file with an empty one. This must always be set to True.
kwargs (Any) – Additional keyword arguments passed to open().
- Raises:
NotImplementedError – If attempting to update the timestamp of an existing file with truncate set to False.
- Return type:
None
Examples
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> fs.touch("dr://696935d6d5a04a752419cf6d/notes/agenda.txt")
- read_block(fn, offset, length, delimiter=None)
Read a block of bytes from a file.
Starting at offset of the file, read length bytes. If delimiter is set, the read starts and stops at delimiter boundaries that follow the locations offset and offset + length. If offset is zero then we start at zero. The bytestring returned WILL include the end delimiter string. If offset + length is beyond the end of the file, reads to the end of the file.
- Parameters:
fn (str) – Filepath to read from.
offset (int) – Byte offset to start the read from.
length (Optional[int]) – Number of bytes to read. If None, read to the end of the file.
delimiter (Optional[bytes]) – Ensure reading starts and stops at delimiter bytestring boundaries.
- Return type:
bytes
Examples
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> fs.read_block("dr://696935d6d5a04a752419cf6d/data/file.txt", 0, 13)
b'Alice, 100\nBo'
>>> fs.read_block("dr://696935d6d5a04a752419cf6d/data/file.txt", 0, 13, delimiter=b'\n')
b'Alice, 100\nBob, 200\n'
Use length=None to read to the end of the file.
>>> fs.read_block("dr://696935d6d5a04a752419cf6d/data/file.txt", 0, None, delimiter=b'\n')
b'Alice, 100\nBob, 200\nCharlie, 300'
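The delimiter-boundary behavior above can be sketched with plain bytes operations. This is an illustrative reimplementation (read_block_sketch is a hypothetical helper that reads the whole stream, so it is only suitable for small files), not the actual DataRobot code path:

```python
import io

def read_block_sketch(f, offset, length, delimiter=None):
    """Sketch of read_block: with a delimiter, the read starts and stops
    at the delimiter boundaries that follow offset and offset + length,
    and the end delimiter is included in the output."""
    data = f.read()  # whole stream: illustration only
    start = offset
    if delimiter and offset != 0:
        # Advance the start to just past the next delimiter.
        i = data.find(delimiter, offset)
        start = len(data) if i == -1 else i + len(delimiter)
    if length is None:
        end = len(data)
    else:
        end = offset + length
        if delimiter:
            # Extend the end to include the next delimiter.
            i = data.find(delimiter, end)
            end = len(data) if i == -1 else i + len(delimiter)
    return data[start:min(end, len(data))]

buf = io.BytesIO(b"Alice, 100\nBob, 200\nCharlie, 300")
print(read_block_sketch(buf, 0, 13, delimiter=b"\n"))  # b'Alice, 100\nBob, 200\n'
```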
- put_file(lpath, rpath, callback=<fsspec.callbacks.NoOpCallback object>, mode='overwrite', raise_error_on_directory=True, **kwargs)
Upload a single file from local to DataRobot file system.
- Parameters:
lpath (str) – Local file path.
rpath (str) – DataRobot file system path.
callback (Callback) – Callback to track progress of the file transfer. Not supported, as DataRobotFileSystem does not support buffered uploads.
mode (str) – Mode to open the file in: ‘overwrite’ or ‘create’.
raise_error_on_directory (bool) – Whether to raise an exception if the local path is a directory. DataRobot file system does not support creating empty directories. If False, the function does nothing and returns silently.
kwargs (Any) – Keyword arguments passed to open().
- Raises:
FileExistsError – If the file already exists and mode is set to ‘create’.
NotImplementedError – If attempting to upload a directory and raise_error_on_directory is True.
ValueError – If attempting to upload a file to an invalid path.
- Return type:
None
Examples
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> fs.put_file(
...     "/Users/username/local/path/to/file.txt",
...     "dr://696935d6d5a04a752419cf6d/my/new/file_copy.txt",
... )
- put(lpath, rpath, recursive=False, callback=<fsspec.callbacks.NoOpCallback object>, maxdepth=None, **kwargs)
Upload local file(s) to DataRobot file system.
Copies a specific file or tree of files (if recursive=True). If rpath ends with a “/”, it will be assumed to be a directory, and target files will go within. Calls put_file() for each source path.
- Parameters:
lpath (Union[str, List[str]]) – Local file path or list of local file paths to upload.
rpath (Union[str, List[str]]) – DataRobot file system path or list of DataRobot file system paths to upload to.
recursive (bool) – Whether to recursively target local files to upload.
callback (Callback) – Callback to track progress of the file transfer. Not supported, as DataRobotFileSystem does not support buffered uploads.
maxdepth (Optional[int]) – Maximum depth to recurse when targeting local files to upload.
kwargs (Any) – Additional keyword arguments passed to put_file().
raise_error_on_directory – Whether to raise an exception for local directory paths. DataRobot file system does not support creating empty directories. Defaults to False, so invocations of put_file() for local directory paths do nothing and return silently.
- Return type:
None
Examples
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> fs.put(
...     "/Users/username/local/path/to/file.txt",
...     "dr://696935d6d5a04a752419cf6d/my/new/file_copy.txt",
... )
Upload a directory recursively:
>>> fs.put(
...     "/Users/username/local/path/to/directory",
...     "dr://696935d6d5a04a752419cf6d/my/new/directory/",
...     recursive=True,
... )
Upload all PDF files in a directory:
>>> fs.put(
...     "/Users/username/local/my/documents/**/*.pdf",
...     "dr://696935d6d5a04a752419cf6d/my-pdf-documents/",
...     recursive=True,
... )
Upload multiple files at once:
>>> fs.put(
...     ["/Users/username/local/path/to/file1.txt", "/Users/username/local/path/to/file2.txt"],
...     ["dr://696935d6d5a04a752419cf6d/my/new/file1.txt", "dr://696935d6d5a04a752419cf6d/my/new/file2.txt"],
... )
- get_mapper(root='', missing_exceptions=None)
Create a key/value mutable store based on this file-system.
Creates a MutableMapping interface to the DataRobot file system at the given root path.
- Parameters:
root (str) – Path in the DataRobot file system to use as the root for the map.
missing_exceptions (Optional[Tuple[Type[Exception], ...]]) – Exceptions to convert to KeyError if raised when working with the file system.
- Returns:
A key/value mutable store based on this file-system.
- Return type:
Examples
>>> from datarobot.fs import DataRobotFileSystem, DataRobotFSMap
>>> fs = DataRobotFileSystem()
>>> root_map = fs.get_mapper()
>>> map = fs.get_mapper("dr://696935d6d5a04a752419cf6d/")
Retrieve file contents from file system using map:
>>> map["file.txt"]
b'Hello, world!'
>>> "folder/path/file.txt" in map
True
>>> file_count = len(map)
>>> file_count
3
>>> [file for file in map]
["file.txt", "folder/path/file.txt", "another/folder/file.txt"]
>>> map.getitems(["file.txt", "folder/path/file.txt", "another/folder/file.txt"])
{
    "file.txt": b"Hello, world!",
    "folder/path/file.txt": b"Hello, world!",
    "another/folder/file.txt": b"Hello, world!",
}
Set file contents in file system using map:
>>> map["file.txt"] = b"Hello, world!"
>>> map["folder/path/new_file.txt"] = b"This is a new file!"
>>> map.setitems({
...     "another/folder/file.txt": b"Hello, world!",
...     "folder/path/new_file.txt": b"This is a new file!",
... })
Delete files from file system using map:
>>> del map["file.txt"]
>>> map.delitems(["folder/path/new_file.txt", "another/folder/file.txt"])
>>> map.pop("file.txt", "default_value_if_file_does_not_exist")
b'Hello, world!'
>>> map.pop("folder/path/non_existent_file.txt", "default_value_if_file_does_not_exist")
'default_value_if_file_does_not_exist'
Clear all files under the map root. This may have unintended consequences as DataRobot file system does not support empty directories:
>>> map.clear()
>>> len(map)
0
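The caveat about empty directories follows from the key-value model described at the top of this page: a simulated directory exists only while at least one key lives under it. A minimal stdlib sketch (store and list_dirs are illustrative, not part of the API):

```python
# A flat key-value store standing in for a catalog item's contents.
store = {
    "folder/a.txt": b"one",
    "folder/b.txt": b"two",
}

def list_dirs(store):
    # A simulated directory exists only while some key lives under it.
    return {k.rsplit("/", 1)[0] for k in store if "/" in k}

print(list_dirs(store))  # {'folder'}
del store["folder/a.txt"]
print(list_dirs(store))  # {'folder'}  -- one file still remains
del store["folder/b.txt"]
print(list_dirs(store))  # set()  -- the directory vanished with its last file
```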
- mkdir(*args, **kwargs)
Not supported as DataRobotFileSystem does not support empty directories.
- Return type:
None
- makedirs(*args, **kwargs)
Not supported as DataRobotFileSystem does not support empty directories.
- Return type:
None
- rmdir(*args, **kwargs)
Not supported as DataRobotFileSystem does not support empty directories.
- Return type:
None
- modified(*args, **kwargs)
DataRobotFileSystem does not currently expose file modification timestamp.
- Return type:
datetime
- cat_ranges(paths, starts, ends, max_gap=None, on_error='return', **kwargs)
Get the contents of byte ranges from one or more files
- Parameters:
paths (list) – A list of filepaths on this filesystem.
starts (int or list) – Byte limits of the read. If a single int is given, the same value is used for all the specified files.
ends (int or list) – Byte limits of the read. If a single int is given, the same value is used for all the specified files.
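The per-path byte windows can be sketched as follows, with an in-memory dict standing in for DataRobot paths (files and cat_ranges_sketch are hypothetical):

```python
# In-memory stand-ins for DataRobot file paths (illustrative only).
files = {"a.txt": b"hello world", "b.txt": b"goodbye"}

def cat_ranges_sketch(paths, starts, ends):
    # A single int for starts/ends applies to every path, per the docs.
    if isinstance(starts, int):
        starts = [starts] * len(paths)
    if isinstance(ends, int):
        ends = [ends] * len(paths)
    return [files[p][s:e] for p, s, e in zip(paths, starts, ends)]

print(cat_ranges_sketch(["a.txt", "b.txt"], 0, 5))  # [b'hello', b'goodb']
```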
- checksum(path)
Unique value for current version of file
If the checksum is the same from one moment to another, the contents are guaranteed to be the same. If the checksum changes, the contents might have changed.
This should normally be overridden; default will probably capture creation/modification timestamp (which would be good) or maybe access timestamp (which would be bad)
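A content hash is one way to satisfy the stated guarantee that equal checksums imply equal contents (ignoring the negligible chance of a hash collision). This is an illustration only, not how DataRobotFileSystem necessarily computes its checksum:

```python
import hashlib

def content_checksum(data: bytes) -> str:
    # Same bytes -> same checksum; different bytes -> (almost surely) different.
    return hashlib.sha256(data).hexdigest()

print(content_checksum(b"v1") == content_checksum(b"v1"))  # True
print(content_checksum(b"v1") == content_checksum(b"v2"))  # False
```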
- classmethod clear_instance_cache()
Clear the cache of filesystem instances.
Notes
Unless overridden by setting the cachable class attribute to False, the filesystem class stores a reference to newly created instances. This prevents Python’s normal rules around garbage collection from working, since the instance’s refcount will not drop to zero until clear_instance_cache is called.
- cp(path1, path2, **kwargs)
Alias of AbstractFileSystem.copy.
- classmethod current()
Return the most recently instantiated FileSystem
If no instance has been created, then create one with defaults
- delete(path, recursive=False, maxdepth=None)
Alias of AbstractFileSystem.rm.
- disk_usage(path, total=True, maxdepth=None, **kwargs)
Alias of AbstractFileSystem.du.
- download(rpath, lpath, recursive=False, **kwargs)
Alias of AbstractFileSystem.get.
- end_transaction()
Finish write transaction, non-context version
- exists(path, **kwargs)
Is there a file at the given path
- expand_path(path, recursive=False, maxdepth=None, **kwargs)
Turn one or more globs or directories into a list of all matching paths to files or directories.
kwargs are passed to glob or find, which may in turn call ls.
- static from_dict(dct)
Recreate a filesystem instance from dictionary representation.
See .to_dict() for the expected structure of the input.
- Parameters:
dct (Dict[str, Any])
- Return type:
file system instance, not necessarily of this particular class.
Warning
This can import arbitrary modules (as determined by the cls key). Make sure you haven’t installed any modules that may execute malicious code at import time.
- static from_json(blob)
Recreate a filesystem instance from JSON representation.
See .to_json() for the expected structure of the input.
- Parameters:
blob (str)
- Return type:
file system instance, not necessarily of this particular class.
Warning
This can import arbitrary modules (as determined by the cls key). Make sure you haven’t installed any modules that may execute malicious code at import time.
- property fsid
Persistent filesystem id that can be used to compare filesystems across sessions.
- get(rpath, lpath, recursive=False, callback=<fsspec.callbacks.NoOpCallback object>, maxdepth=None, **kwargs)
Copy file(s) to local.
Copies a specific file or tree of files (if recursive=True). If lpath ends with a “/”, it will be assumed to be a directory, and target files will go within. Can submit a list of paths, which may be glob-patterns and will be expanded.
Calls get_file for each source.
- get_file(rpath, lpath, callback=<fsspec.callbacks.NoOpCallback object>, outfile=None, **kwargs)
Copy single remote file to local
- head(path, size=1024)
Get the first size bytes from the file.
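head (and its counterpart tail, below) reduce to simple byte slicing; a sketch on a bytes object rather than a DataRobot path (head_sketch/tail_sketch are hypothetical helpers):

```python
data = b"0123456789"

def head_sketch(data: bytes, size: int = 1024) -> bytes:
    # First `size` bytes of the content.
    return data[:size]

def tail_sketch(data: bytes, size: int = 1024) -> bytes:
    # Last `size` bytes of the content.
    return data[-size:]

print(head_sketch(data, 3))  # b'012'
print(tail_sketch(data, 3))  # b'789'
```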
- invalidate_cache(path=None)
Discard any cached directory information
- Parameters:
path (string or None) – If None, clear all cached listings; otherwise, clear listings at or under the given path.
- isdir(path)
Is this entry directory-like?
- isfile(path)
Is this entry file-like?
- lexists(path, **kwargs)
If there is a file at the given path (including broken links)
- listdir(path, detail=True, **kwargs)
Alias of AbstractFileSystem.ls.
- makedir(path, create_parents=True, **kwargs)
Alias of AbstractFileSystem.mkdir.
- mkdirs(path, exist_ok=False)
Alias of AbstractFileSystem.makedirs.
- move(path1, path2, **kwargs)
Alias of AbstractFileSystem.mv.
- pipe(path, value=None, **kwargs)
Put value into path
(counterpart to cat)
- Parameters:
path (string or dict(str, bytes)) – If a string, a single remote location to put value bytes; if a dict, a mapping of {path: bytes value}.
value (bytes, optional) – If using a single path, these are the bytes to put there. Ignored if path is a dict.
- pipe_file(path, value, mode='overwrite', **kwargs)
Set the bytes of given file
- read_bytes(path, start=None, end=None, **kwargs)
Alias of AbstractFileSystem.cat_file.
- read_text(path, encoding=None, errors=None, newline=None, **kwargs)
Get the contents of the file as a string.
- Parameters:
path (str) – URL of the file on this filesystem.
encoding (same as open)
errors (same as open)
newline (same as open)
- rename(path1, path2, **kwargs)
Alias of AbstractFileSystem.mv.
- size(path)
Size in bytes of file
- sizes(paths)
Size in bytes of each file in a list of paths
- start_transaction()
Begin write transaction for deferring files, non-context version
- stat(path, **kwargs)
Alias of AbstractFileSystem.info.
- tail(path, size=1024)
Get the last size bytes from the file.
- to_dict(*, include_password=True)
JSON-serializable dictionary representation of this filesystem instance.
- Parameters:
include_password (bool, default True) – Whether to include the password (if any) in the output.
- Return type:
dict[str, Any]
- Returns:
Dictionary with keys cls (the python location of this class), protocol (text name of this class’s protocol, first one in case of multiple), args (positional args, usually empty), and all other keyword arguments as their own keys.
Warning
Serialized filesystems may contain sensitive information which have been passed to the constructor, such as passwords and tokens. Make sure you store and send them in a secure environment!
- to_json(*, include_password=True)
JSON representation of this filesystem instance.
- Parameters:
include_password (bool, default True) – Whether to include the password (if any) in the output.
- Return type:
str
- Returns:
JSON string with keys cls (the python location of this class), protocol (text name of this class’s protocol, first one in case of multiple), args (positional args, usually empty), and all other keyword arguments as their own keys.
Warning
Serialized filesystems may contain sensitive information which have been passed to the constructor, such as passwords and tokens. Make sure you store and send them in a secure environment!
- property transaction
A context within which files are committed together upon exit
Requires the file class to implement .commit() and .discard() for the normal and exception cases.
- transaction_type
alias of
Transaction
- ukey(path)
Hash of file properties, to tell if it has changed
- unstrip_protocol(name)
Format FS-specific path to generic, including protocol
- Return type:
str
- upload(lpath, rpath, recursive=False, **kwargs)
Alias of AbstractFileSystem.put.
- walk(path, maxdepth=None, topdown=True, on_error='omit', **kwargs)
Return all files under the given path.
List all files, recursing into subdirectories; output is iterator-style, like os.walk(). For a simple list of files, find() is available.
When topdown is True, the caller can modify the dirnames list in-place (perhaps using del or slice assignment), and walk() will only recurse into the subdirectories whose names remain in dirnames; this can be used to prune the search, impose a specific order of visiting, or even to inform walk() about directories the caller creates or renames before it resumes walk() again. Modifying dirnames when topdown is False has no effect. (See os.walk.)
Note that the “files” outputted will include anything that is not a directory, such as links.
- Parameters:
path (str) – Root to recurse into.
maxdepth (int) – Maximum recursion depth. None means limitless, but not recommended on link-based file-systems.
topdown (bool (True)) – Whether to walk the directory tree from the top downwards or from the bottom upwards.
on_error ("omit", "raise", a callable) – If omit (default), a path with an exception will simply be empty; if raise, an underlying exception will be raised; if a callable, it will be called with a single OSError instance as argument.
kwargs (passed to ls)
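The topdown pruning described above follows os.walk semantics, so it can be demonstrated on a local directory tree (the directory and file names here are arbitrary):

```python
import os
import tempfile

# Build a small local tree: root/keep/a.txt and root/skip/b.txt.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "keep"))
    os.makedirs(os.path.join(root, "skip"))
    open(os.path.join(root, "keep", "a.txt"), "w").close()
    open(os.path.join(root, "skip", "b.txt"), "w").close()

    visited = []
    for dirpath, dirnames, filenames in os.walk(root, topdown=True):
        # Prune in place: walk() never descends into "skip".
        dirnames[:] = [d for d in dirnames if d != "skip"]
        visited.extend(filenames)

print(visited)  # ['a.txt']
```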
- write_bytes(path, value, **kwargs)
Alias of AbstractFileSystem.pipe_file.
- write_text(path, value, encoding=None, errors=None, newline=None, **kwargs)
Write the text to the given file.
An existing file will be overwritten.
- Parameters:
path (str) – URL of the file on this filesystem.
value (str) – Text to write.
encoding (same as open)
errors (same as open)
newline (same as open)
- class datarobot.fs.file_system.DataRobotFile
Bases:
AbstractBufferedFile
File-like object for reading and writing files in the DataRobot file system.
Supports read modes ‘r’, ‘rb’ and write modes ‘w’, ‘wb’, ‘xb’. DataRobot file system buffers writes in memory only before uploading on close.
- Variables:
path (str) – File path in the DataRobot file system.
mode (str) – File mode, either ‘rb’, ‘wb’, or ‘xb’.
fs (DataRobotFileSystem) – The DataRobot file system instance.
blocksize (int) – Block size for reading files.
autocommit (bool) – Whether to automatically commit changes on close.
loc (int) – Current position in the file.
closed (bool) – Whether the file is closed.
forced (bool) – Whether the file is in forced mode.
offset (Optional[int]) – Content length of the file.
buffer (io.BytesIO) – In-memory buffer when writing.
overwrite_strategy – Strategy to handle file naming conflicts when writing files.
unpack_archive_files – Whether to unpack archive files (zip, tar, tar.gz, tgz) upon upload.
upload_timeout – Maximum time in seconds to wait for file upload to complete.
See also
- write(data)
Write data to buffer.
- Parameters:
data (bytes) – Data to write as bytes.
- Returns:
Number of bytes written.
- Return type:
int
- Raises:
ValueError – If the file is not in write mode, is closed, or has been force-flushed.
- flush(force=False)
Write the buffered data to the DataRobot file system if force is True.
Notes
Since DataRobot file system does not support multipart uploads, calling flush without force does not upload any data.
- Parameters:
force (bool) – Whether to force flush and upload data. Disallows further writing to this file.
- Raises:
ValueError – If the file is closed or if force flush has already been called.
- Return type:
None
- upload()
Alias of flush(force=True).
- Return type:
None
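The buffer-in-memory, upload-on-close behavior of write/flush/close can be sketched as follows; BufferedWriteFile and upload_bytes are hypothetical stand-ins for DataRobotFile and its upload call:

```python
import io

class BufferedWriteFile:
    """Sketch: buffer writes in memory, upload once on close."""

    def __init__(self, upload_bytes):
        self._buf = io.BytesIO()
        self._upload = upload_bytes
        self.closed = False

    def write(self, data: bytes) -> int:
        if self.closed:
            raise ValueError("I/O operation on closed file")
        return self._buf.write(data)

    def flush(self, force=False):
        # Without force, nothing is sent: no multipart uploads.
        if force:
            self._upload(self._buf.getvalue())

    def close(self):
        if not self.closed:
            self.flush(force=True)
            self.closed = True

uploaded = []
f = BufferedWriteFile(uploaded.append)
f.write(b"hello ")
f.write(b"world")
f.close()
print(uploaded)  # [b'hello world']
```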
- close()
Close file. Finalizes writes, discards cache.
- Return type:
None
- property url: str
A signed URL for the file.
- commit()
Move from temp to final destination
- discard()
Throw away temporary file
- fileno()
Returns underlying file descriptor if one exists.
OSError is raised if the IO object does not use a file descriptor.
- info()
File information about this path
- isatty()
Return whether this is an ‘interactive’ stream.
Return False if it can’t be determined.
- read(length=-1)
Return data from cache, or fetch pieces as necessary
- Parameters:
length (int (-1)) – Number of bytes to read; if <0, all remaining bytes.
- readable()
Whether opened for reading
- readinto(b)
mirrors builtin file’s readinto method
https://docs.python.org/3/library/io.html#io.RawIOBase.readinto
- readline()
Read until and including the first occurrence of newline character
Note that, because of character encoding, this is not necessarily a true line ending.
- readlines()
Return all data, split by the newline character, including the newline character
- readuntil(char=b'\n', blocks=None)
Return data between current position and first occurrence of char
char is included in the output, except if the end of the file is encountered first.
- Parameters:
char (bytes) – Thing to find.
blocks (None or int) – How much to read in each go. Defaults to the file blocksize, which may mean a new read on every call.
- seek(loc, whence=0)
Set current file location
- Parameters:
loc (int) – Byte location.
whence ({0, 1, 2}) – From start of file, current location, or end of file, respectively.
- seekable()
Whether is seekable (only in read mode)
- tell()
Current file location
- truncate()
Truncate file to size bytes.
File pointer is left unchanged. Size defaults to the current IO position as reported by tell(). Returns the new size.
- property use_range_headers: bool
Whether to use range headers when reading data from file URL.
- writable()
Whether opened for writing
- writelines(lines, /)
Write a list of lines to stream.
Line separators are not added, so it is usual for each of the lines provided to have a line separator at the end.
- property is_datarobot_url_for_read: bool
Whether the file URL is a DataRobot URL.
- property read_client: Session
Session client to use for reading data from file URL. Supports unauthenticated clients for URLs outside DataRobot with embedded authentication.
- class datarobot.fs.file_system.DataRobotFSMap
Bases:
FSMap
Wrap a DataRobotFileSystem instance as a mutable mapping. The keys of the mapping become files under the given root, and the values (which must be bytes) the contents of those files.
- Parameters:
root (str) – The root path in the DataRobot file system to create the mapper for.
fs (DataRobotFileSystem) – The DataRobot file system instance.
missing_exceptions (Optional[Tuple[Type[Exception], ...]]) – Exceptions to convert to KeyError when accessing the file system.
Examples
>>> from datarobot.fs import DataRobotFileSystem, DataRobotFSMap
>>> fs = DataRobotFileSystem()
>>> map = DataRobotFSMap("dr://696935d6d5a04a752419cf6d/", fs)
Retrieve file contents from file system using map:
>>> map["file.txt"]
b'Hello, world!'
>>> "folder/path/file.txt" in map
True
>>> file_count = len(map)
>>> file_count
3
>>> [file for file in map]
["file.txt", "folder/path/file.txt", "another/folder/file.txt"]
>>> map.getitems(["file.txt", "folder/path/file.txt", "another/folder/file.txt"])
{
    "file.txt": b"Hello, world!",
    "folder/path/file.txt": b"Hello, world!",
    "another/folder/file.txt": b"Hello, world!",
}
Set file contents in file system using map:
>>> map["file.txt"] = b"Hello, world!"
>>> map["folder/path/new_file.txt"] = b"This is a new file!"
>>> map.setitems({
...     "another/folder/file.txt": b"Hello, world!",
...     "folder/path/new_file.txt": b"This is a new file!",
... })
Delete files from file system using map:
>>> del map["file.txt"]
>>> map.delitems(["folder/path/new_file.txt", "another/folder/file.txt"])
>>> map.pop("file.txt", "default_value_if_file_does_not_exist")
b'Hello, world!'
>>> map.pop("folder/path/non_existent_file.txt", "default_value_if_file_does_not_exist")
'default_value_if_file_does_not_exist'
Clear all files under the map root directory. This may have unintended consequences as DataRobot file system does not support empty directories:
>>> map.clear()
>>> len(map)
0
- delitems(keys)
Remove multiple keys from the store
- property dirfs
dirfs instance that can be used with the same keys as the mapper
- get(k[, d]) → D[k] if k in D, else d. d defaults to None.
- getitems(keys, on_error='raise')
Fetch multiple items from the store
If the backend is async-able, this might proceed concurrently
- Parameters:
keys (list(str)) – The keys to be fetched.
on_error ("raise", "omit", "return") – If raise, an underlying exception will be raised (converted to KeyError if the type is in self.missing_exceptions); if omit, keys with exceptions will simply not be included in the output; if “return”, all keys are included in the output, but the value will be bytes or an exception instance.
- Return type:
dict(key, bytes | exception)
- items() → a set-like object providing a view on D's items
- keys() → a set-like object providing a view on D's keys
- pop(key, default=None)
Pop data
- popitem() → (k, v), remove and return some (key, value) pair as a 2-tuple; but raise KeyError if D is empty.
- setdefault(k[, d]) → D.get(k, d), also set D[k] = d if k not in D
- setitems(values_dict)
Set the values of multiple items in the store
- Parameters:
values_dict (dict(str, bytes))
- update([E, ]**F) → None. Update D from mapping/iterable E and F.
If E present and has a .keys() method, does: for k in E: D[k] = E[k] If E present and lacks .keys() method, does: for (k, v) in E: D[k] = v In either case, this is followed by: for k, v in F.items(): D[k] = v
- values() → an object providing a view on D's values
- clear()
Remove all keys below root. Empties out the mapping.
Notes
May delete more directories than expected as DataRobot file system does not support empty directories.
- Return type:
None
Enum and Helpers
- class datarobot.fs.file_system.FileInfo
Information about a file or directory in DataRobot File System.
- Variables:
name – The path of the file or directory. Does not include the protocol prefix.
size – The size of the file in bytes. For directories, this is 0.
type – The type of the item, either ‘file’ or ‘directory’.
format – The file format (e.g., ‘csv’, ‘pdf’) if the item is a file; None for directories.
created_at – The file creation timestamp if the item is a file; None for directories.
- class datarobot.enums.FilesOverwriteStrategy