Data Client
-
class citrination_client.data.client.DataClient
(api_key, host='https://citrination.com', suppress_warnings=False, proxies=None)¶ Client encapsulating data management behavior.
-
__init__
(api_key, host='https://citrination.com', suppress_warnings=False, proxies=None)¶ Constructor.
- Parameters
api_key (str) – A user's API key, as a string
host (str) – The base URL of the citrination site, e.g. https://citrination.com
suppress_warnings (bool) – Whether or not usage warnings should be printed to stdout
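As a minimal sketch of how the constructor arguments might be assembled, the helper below builds the keyword arguments from environment variables. The helper name client_kwargs_from_env and the variable names CITRINATION_API_KEY and CITRINATION_HOST are assumptions made for this illustration; they are not part of the library.

```python
import os


def client_kwargs_from_env(env=None):
    """Build DataClient constructor keyword arguments from environment variables.

    The variable names CITRINATION_API_KEY and CITRINATION_HOST are
    hypothetical; substitute whatever your deployment defines.
    """
    env = os.environ if env is None else env
    api_key = env.get("CITRINATION_API_KEY")
    if not api_key:
        raise ValueError("CITRINATION_API_KEY is not set")
    return {
        "api_key": api_key,
        # Fall back to the documented default host.
        "host": env.get("CITRINATION_HOST", "https://citrination.com"),
        "suppress_warnings": False,
        "proxies": None,
    }


# Usage, assuming the citrination_client package is installed:
#   from citrination_client.data.client import DataClient
#   client = DataClient(**client_kwargs_from_env())
```

Keeping the key out of source code is the only design point here; the constructor itself is called exactly as documented above.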
-
create_dataset
(name=None, description=None, public=False)¶ Create a new data set.
- Parameters
name (str) – name of the dataset
description (str) – description for the dataset
public (bool) – A boolean indicating whether or not the dataset should be public.
- Returns
The newly created dataset.
- Return type
Dataset
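A short sketch of calling create_dataset with explicit keyword arguments follows. The wrapper create_private_dataset and its name check are illustrative additions; client is assumed to be a DataClient instance (or any object exposing the same method).

```python
def create_private_dataset(client, name, description=""):
    """Create a non-public dataset and return the resulting Dataset object."""
    if not name:
        # Local guard added for this sketch; the library may accept name=None.
        raise ValueError("a dataset name is required")
    # Pass every argument by keyword so the call mirrors the documented signature.
    return client.create_dataset(name=name, description=description, public=False)


# Usage, assuming `client` is a DataClient:
#   dataset = create_private_dataset(client, "Alloy hardness", "Vickers measurements")
```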
-
create_dataset_version
(dataset_id)¶ Create a new data set version.
- Parameters
dataset_id (int) – The ID of the dataset for which the version must be bumped.
- Returns
The new dataset version.
- Return type
DatasetVersion
-
delete_dataset
(dataset_id)¶ Delete a dataset by id. This will only work if you are the owner of the dataset.
- Parameters
dataset_id – The ID of the dataset to delete.
-
download_files
(dataset_files, destination='.')¶ Downloads file(s) to a local destination.
- Parameters
dataset_files (list of DatasetFile) – The dataset files to download
destination (str) – The path to the desired local download destination
chunk (bool) – Whether or not to chunk the file. Default True
-
get_data_view_ids
(dataset_id)¶ Returns a list of ids for data views that are built upon the provided dataset.
- Parameters
dataset_id (int) – The ID of the dataset whose data views should be listed.
- Returns
The list of data view ids that use the given dataset.
- Return type
list of int
-
get_dataset_file
(dataset_id, file_path, version=None)¶ Retrieves a dataset file matching a provided file path
- Parameters
dataset_id (int) – The id of the dataset to retrieve file from
file_path (str) – The file path within the dataset
version (int) – The dataset version to look for the file in. If nothing is supplied, the latest dataset version will be searched
- Returns
A dataset file matching the filepath provided
- Return type
DatasetFile
-
get_dataset_files
(dataset_id, glob='.', is_dir=False, version_number=None)¶ Retrieves URLs for the files matched by a glob or a path to a directory in a given dataset.
- Parameters
dataset_id (int) – The id of the dataset to retrieve files from
glob (str) – A regex used to select one or more files in the dataset
is_dir (bool) – Whether or not the supplied pattern should be treated as a directory to search in
version_number (int) – The version number of the dataset to retrieve files from
- Returns
A list of dataset files whose paths match the provided pattern.
- Return type
list of DatasetFile
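get_dataset_files returns DatasetFile objects, which is the input download_files expects, so the two calls can be chained. The sketch below shows one way to do that; download_matching is a hypothetical helper and client is assumed to be a DataClient.

```python
def download_matching(client, dataset_id, pattern, destination="."):
    """Download every file in a dataset whose path matches `pattern`.

    Returns the number of DatasetFile objects handed to download_files.
    """
    files = client.get_dataset_files(dataset_id, glob=pattern)
    if files:
        # Skip the download call entirely when nothing matched.
        client.download_files(files, destination=destination)
    return len(files)


# Usage, assuming `client` is a DataClient and 1234 is a dataset you can read:
#   n = download_matching(client, 1234, "results", destination="./downloads")
```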
-
get_ingest_status
(dataset_id)¶ Returns the current status of dataset ingestion. If any file uploaded to the dataset is in an error/failure state, this endpoint returns error/failure. If any files are still processing, it returns processing.
- Parameters
dataset_id – Dataset identifier
- Returns
Status of dataset ingestion as a string
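Because get_ingest_status reports a point-in-time status, callers typically poll it. The following is a sketch under a stated assumption: any status containing "processing" (case-insensitive) means ingestion is still running. The exact status strings are not listed in this reference, and wait_for_ingest is a hypothetical helper.

```python
import time


def wait_for_ingest(client, dataset_id, poll_seconds=5.0, max_polls=60, sleep=time.sleep):
    """Poll get_ingest_status until it stops reporting a processing state.

    Assumption: a status containing "processing" (case-insensitive) means
    ingestion is still running; other values are returned to the caller as-is.
    """
    status = None
    for _ in range(max_polls):
        status = client.get_ingest_status(dataset_id)
        if "processing" not in str(status).lower():
            return status
        sleep(poll_seconds)
    raise TimeoutError(
        "ingest still processing after %d polls (last status: %r)" % (max_polls, status)
    )


# Usage, assuming `client` is a DataClient:
#   final_status = wait_for_ingest(client, 1234)
```

The returned value should still be inspected, since an error/failure status also ends the loop.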
-
get_pif
(dataset_id, uid, dataset_version=None, pif_version=None)¶ Retrieves a PIF from a given dataset.
- Parameters
dataset_id (int) – The id of the dataset to retrieve PIF from
uid (str) – The uid of the PIF to retrieve
dataset_version (int) – The dataset version to look for the PIF in. If nothing is supplied, the latest dataset version will be searched.
pif_version (int) – The version of the PIF to look for. If nothing is supplied, the current PIF version will be returned.
- Returns
A Pif object
- Return type
Pif
-
get_pif_with_metadata
(dataset_id, uid, dataset_version=None, pif_version=None)¶ Retrieves a PIF from a given dataset, along with information regarding the dataset it belongs to, its version number, and when it was last updated.
- Parameters
dataset_id (int) – The id of the dataset to retrieve PIF from
uid (str) – The uid of the PIF to retrieve
dataset_version (int) – The dataset version to look for the PIF in. If nothing is supplied, the latest dataset version will be searched.
pif_version (int) – The version of the PIF to look for. If nothing is supplied, the current PIF version will be returned.
- Returns
A dict with two keys: pif (Pif) and metadata (dict)
- Return type
dict
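Since the return value is a dict with the keys pif and metadata, it can be unpacked directly. The sketch below assumes client is a DataClient; fetch_pif_and_metadata is a hypothetical convenience wrapper, and the contents of the metadata dict are not assumed.

```python
def fetch_pif_and_metadata(client, dataset_id, uid, dataset_version=None, pif_version=None):
    """Call get_pif_with_metadata and return its two parts as a (pif, metadata) tuple."""
    result = client.get_pif_with_metadata(
        dataset_id, uid, dataset_version=dataset_version, pif_version=pif_version
    )
    # Both keys are documented in the Returns entry above.
    return result["pif"], result["metadata"]


# Usage, assuming `client` is a DataClient:
#   pif, metadata = fetch_pif_and_metadata(client, 1234, "some-uid")
```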
-
ingester_logs
(file_id)¶ Returns Ingester Logs for a given file if they are available
-
list_files
(dataset_id, glob='.', is_dir=False)¶ List matched filenames in a dataset on Citrination.
- Parameters
dataset_id (int) – The ID of the dataset to search for files.
glob (str) – A pattern which will be matched against files in the dataset.
is_dir (bool) – A boolean indicating whether or not the pattern should match against the beginning of paths in the dataset.
- Returns
A list of filepaths in the dataset matching the provided glob.
- Return type
list of strings
-
list_ingesters
()¶ Retrieves the list of available ingesters
- Returns
The list of ingesters available for ingestion
- Return type
IngesterList
-
matched_file_count
(dataset_id, glob='.', is_dir=False)¶ Returns the number of files matching a pattern in a dataset.
- Parameters
dataset_id (int) – The ID of the dataset to search for files.
glob (str) – A pattern which will be matched against files in the dataset.
is_dir (bool) – A boolean indicating whether or not the pattern should match against the beginning of paths in the dataset.
- Returns
The number of matching files
- Return type
int
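list_files and matched_file_count accept the same pattern arguments, so a cheap count can gate the more expensive listing. This is a sketch only; describe_matches is a hypothetical helper and client is assumed to be a DataClient.

```python
def describe_matches(client, dataset_id, pattern=".", is_dir=False):
    """Return the match count and the sorted matching file paths for a dataset."""
    count = client.matched_file_count(dataset_id, glob=pattern, is_dir=is_dir)
    # Only request the full listing when at least one file matched.
    paths = client.list_files(dataset_id, glob=pattern, is_dir=is_dir) if count else []
    return {"count": count, "paths": sorted(paths)}


# Usage, assuming `client` is a DataClient:
#   summary = describe_matches(client, 1234, pattern="raw/", is_dir=True)
```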
-
update_dataset
(dataset_id, name=None, description=None, public=None)¶ Update a data set.
- Parameters
dataset_id (int) – The ID of the dataset to update
name (str) – name of the dataset
description (str) – description for the dataset
public (bool) – A boolean indicating whether or not the dataset should be public.
- Returns
The updated dataset.
- Return type
Dataset
-
upload
(dataset_id, source_path, dest_path=None)¶ Upload a file, specifying source and optionally destination paths of a file (acts as the scp command)
- Parameters
dataset_id (Union[int, str]) – The ID of the dataset to upload the file to.
source_path (str) – The path to the file on the source host
dest_path (str) – The path to the file where the contents of the upload will be written (on the dest host)
- Returns
The result of the upload process
- Return type
UploadResult
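upload returns an UploadResult rather than raising on partial failure, so callers may want to check successful() explicitly. A minimal sketch, assuming client is a DataClient; upload_or_raise is a hypothetical helper.

```python
def upload_or_raise(client, dataset_id, source_path, dest_path=None):
    """Upload a file or directory and raise if any part of the upload failed."""
    result = client.upload(dataset_id, source_path, dest_path=dest_path)
    if not result.successful():
        raise RuntimeError(
            "upload of %r to dataset %r did not fully succeed" % (source_path, dataset_id)
        )
    return result


# Usage, assuming `client` is a DataClient:
#   result = upload_or_raise(client, 1234, "data/run1.csv", dest_path="runs/run1.csv")
```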
-
upload_with_ingester
(dataset_id, source_path, ingester, ingester_arguments=[], dest_path=None)¶ Upload a file using a particular ingester, specifying source and optionally destination paths of a file (acts as the scp command)
- Parameters
dataset_id (Union[int, str]) – The ID of the dataset to upload the file to.
source_path (str) – The path to the file on the source host
ingester (citrination_client.data.ingest.Ingester) – The ingester being used
ingester_arguments (list of dict) – ingester arguments (optional), arguments should contain keys “name” and “value”
dest_path (str) – The path to the file where the contents of the upload will be written (on the dest host)
- Returns
The result of the upload process
- Return type
UploadResult
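The ingester_arguments parameter is a list of dicts, each with the keys "name" and "value". The sketch below converts an ordinary mapping into that shape; build_ingester_arguments and the argument names in the usage comment are hypothetical, and valid argument names depend on the ingester being used.

```python
def build_ingester_arguments(values):
    """Convert a {name: value} mapping into the list-of-dicts form described above."""
    return [{"name": name, "value": value} for name, value in values.items()]


# Usage, assuming `client` is a DataClient and `ingester` is an Ingester
# obtained elsewhere (for example from the list returned by list_ingesters):
#   args = build_ingester_arguments({"example_argument": "example_value"})
#   result = client.upload_with_ingester(1234, "data.csv", ingester, ingester_arguments=args)
```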
-
upload_with_template_csv_ingester
(dataset_id, source_path, dest_path=None)¶ Upload a file using the template CSV ingester, specifying source and optionally destination paths of a file (acts as the scp command)
- Parameters
dataset_id (Union[int, str]) – The ID of the dataset to upload the file to.
source_path (str) – The path to the file on the source host
dest_path (str) – The path to the file where the contents of the upload will be written (on the dest host)
- Returns
The result of the upload process
- Return type
UploadResult
-
-
class citrination_client.data.upload_result.UploadResult
The result of an attempted upload. Keeps track of the failures and successes if multiple files were uploaded (for instance, if a directory was uploaded).
-
__init__
()¶ Constructor.
-
add_failure
(filepath, reason)¶ Registers a file as a failure to upload.
- Parameters
filepath (str) – The path to the file which was to be uploaded.
reason (str) – The reason the file failed to upload
-
add_success
(filepath, id, dest_path)¶ Registers a file as successfully uploaded.
- Parameters
filepath (str) – The path to the successfully uploaded file.
id (Union[str, int]) – The id of the successfully uploaded file.
dest_path (str) – The destination path to the successfully uploaded file.
-
successful
()¶ Indicates whether or not the entire upload was successful.
- Returns
Whether or not the upload was successful
- Return type
bool
-