Data Client

class citrination_client.data.client.DataClient(api_key, host='https://citrination.com', suppress_warnings=False, proxies=None)

Client encapsulating data management behavior.

__init__(api_key, host='https://citrination.com', suppress_warnings=False, proxies=None)

Constructor.

Parameters
  • api_key (str) – A user's API key, as a string

  • host (str) – The base URL of the citrination site, e.g. https://citrination.com

  • suppress_warnings (bool) – Whether or not usage warnings should be printed to stdout
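
As an illustration of the constructor arguments above, the following sketch gathers them from environment variables before building a client. The variable names CITRINATION_API_KEY and CITRINATION_HOST and both helper functions are assumptions made for this example; only the DataClient import path and its keyword arguments come from this reference.

```python
import os


def client_settings_from_env(environ=None):
    """Collect DataClient constructor arguments from environment variables.

    CITRINATION_API_KEY and CITRINATION_HOST are assumed variable names,
    not something this reference defines.
    """
    environ = os.environ if environ is None else environ
    api_key = environ.get("CITRINATION_API_KEY")
    if not api_key:
        raise RuntimeError("CITRINATION_API_KEY is not set")
    return {
        "api_key": api_key,
        # Fall back to the documented default host.
        "host": environ.get("CITRINATION_HOST", "https://citrination.com"),
        "suppress_warnings": False,
    }


def make_data_client(environ=None):
    """Build a DataClient from the settings above.

    The import is deferred so that client_settings_from_env can be used
    without the citrination_client package installed.
    """
    from citrination_client.data.client import DataClient

    return DataClient(**client_settings_from_env(environ))
```

Keeping the key out of source code is the only design point here; any other way of supplying api_key works equally well.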

create_dataset(name=None, description=None, public=False)

Create a new data set.

Parameters
  • name (str) – name of the dataset

  • description (str) – description for the dataset

  • public (bool) – A boolean indicating whether or not the dataset should be public.

Returns

The newly created dataset.

Return type

Dataset

create_dataset_version(dataset_id)

Create a new data set version.

Parameters

dataset_id (int) – The ID of the dataset for which the version must be bumped.

Returns

The new dataset version.

Return type

DatasetVersion

delete_dataset(dataset_id)

Delete a dataset by id. This will only work if you are the owner of the dataset.

Parameters

dataset_id – The ID of the dataset to delete.

download_files(dataset_files, destination='.')

Downloads file(s) to a local destination.

Parameters
  • dataset_files (list of DatasetFile) – The dataset file(s) to download

  • destination (str) – The path to the desired local download destination

  • chunk (bool) – Whether or not to chunk the file. Default True

get_data_view_ids(dataset_id)

Returns a list of ids for data views that are built upon the provided dataset.

Parameters

dataset_id (int) – The ID of the dataset whose data views should be listed.

Returns

The list of data view ids that use the given dataset.

Return type

list of int

get_dataset_file(dataset_id, file_path, version=None)

Retrieves a dataset file matching a provided file path.

Parameters
  • dataset_id (int) – The id of the dataset to retrieve the file from

  • file_path (str) – The file path within the dataset

  • version (int) – The dataset version to look for the file in. If nothing is supplied, the latest dataset version will be searched

Returns

A dataset file matching the filepath provided

Return type

DatasetFile

get_dataset_files(dataset_id, glob='.', is_dir=False, version_number=None)

Retrieves URLs for the files matched by a glob or a path to a directory in a given dataset.

Parameters
  • dataset_id (int) – The id of the dataset to retrieve files from

  • glob (str) – A regex used to select one or more files in the dataset

  • is_dir (bool) – Whether or not the supplied pattern should be treated as a directory to search in

  • version_number (int) – The version number of the dataset to retrieve files from

Returns

A list of dataset files whose paths match the provided pattern.

Return type

list of DatasetFile
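
A minimal sketch of how get_dataset_files and download_files might be combined. The helper name download_matching is hypothetical, and client is assumed to be a DataClient or any object exposing the same two methods; the pattern to pass depends on how your dataset paths are laid out.

```python
def download_matching(client, dataset_id, pattern, destination="."):
    """Download every dataset file matching `pattern` into `destination`.

    Returns the number of files handed to download_files.
    """
    files = client.get_dataset_files(dataset_id, glob=pattern)
    if not files:
        # Nothing matched, so skip the download call entirely.
        return 0
    client.download_files(files, destination=destination)
    return len(files)
```

Returning the count lets the caller distinguish "nothing matched" from a successful download without inspecting the destination directory.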

get_ingest_status(dataset_id)

Returns the current status of dataset ingestion. If any file uploaded to a dataset is in an error/failure state, this endpoint will return error/failure. If any files are still processing, it will return processing.

Parameters

dataset_id – Dataset identifier

Returns

Status of dataset ingestion as a string
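
Because ingestion can still be processing when an upload call returns, callers often poll this method. The sketch below is one way to do that; the helper name wait_for_ingest, the timeout defaults, and the check for the substring "processing" are all assumptions, since this reference does not list the exact status strings.

```python
import time


def wait_for_ingest(client, dataset_id, timeout=600.0, poll_interval=5.0,
                    sleep=time.sleep):
    """Poll get_ingest_status until it stops reporting "processing".

    The substring test is an assumption about the status text; adjust it
    to the strings your server actually returns.
    Returns the final status string, or raises TimeoutError.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = client.get_ingest_status(dataset_id)
        if "processing" not in str(status).lower():
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "dataset %s still processing after %.0f s" % (dataset_id, timeout)
            )
        sleep(poll_interval)
```

The returned status may still describe an error or failure, so the caller should inspect it rather than treat any return as success.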

get_pif(dataset_id, uid, dataset_version=None, pif_version=None)

Retrieves a PIF from a given dataset.

Parameters
  • dataset_id (int) – The id of the dataset to retrieve the PIF from

  • uid (str) – The uid of the PIF to retrieve

  • dataset_version (int) – The dataset version to look for the PIF in. If nothing is supplied, the latest dataset version will be searched.

  • pif_version (int) – The version of the PIF to look for. If nothing is supplied, the current PIF version will be returned.

Returns

A Pif object

Return type

Pif

get_pif_with_metadata(dataset_id, uid, dataset_version=None, pif_version=None)

Retrieves a PIF from a given dataset, along with information regarding the dataset it belongs to, its version number, and when it was last updated.

Parameters
  • dataset_id (int) – The id of the dataset to retrieve the PIF from

  • uid (str) – The uid of the PIF to retrieve

  • dataset_version (int) – The dataset version to look for the PIF in. If nothing is supplied, the latest dataset version will be searched.

  • pif_version (int) – The version of the PIF to look for. If nothing is supplied, the current PIF version will be returned.

Returns

A dict with two keys: pif (Pif) and metadata (dict)

Return type

dict
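
The returned dict can be unpacked as in the sketch below. The helper name fetch_pif_and_metadata is hypothetical; the two keys are the ones documented above, and nothing is assumed about the contents of metadata.

```python
def fetch_pif_and_metadata(client, dataset_id, uid):
    """Return (pif, metadata) for a PIF in the latest dataset version."""
    result = client.get_pif_with_metadata(dataset_id, uid)
    # The reference documents exactly two keys: "pif" and "metadata".
    return result["pif"], result["metadata"]
```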

ingester_logs(file_id)

Returns Ingester Logs for a given file if they are available

list_files(dataset_id, glob='.', is_dir=False)

List matched filenames in a dataset on Citrination.

Parameters
  • dataset_id (int) – The ID of the dataset to search for files.

  • glob (str) – A pattern which will be matched against files in the dataset.

  • is_dir (bool) – A boolean indicating whether or not the pattern should match against the beginning of paths in the dataset.

Returns

A list of filepaths in the dataset matching the provided glob.

Return type

list of strings

list_ingesters()

Retrieves the list of available ingesters

Returns

The list of ingesters available for ingestion

Return type

IngesterList

matched_file_count(dataset_id, glob='.', is_dir=False)

Returns the number of files matching a pattern in a dataset.

Parameters
  • dataset_id (int) – The ID of the dataset to search for files.

  • glob (str) – A pattern which will be matched against files in the dataset.

  • is_dir (bool) – A boolean indicating whether or not the pattern should match against the beginning of paths in the dataset.

Returns

The number of matching files

Return type

int
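
One possible use of matched_file_count is to check how many files a pattern selects before requesting the full listing with list_files. The helper name list_files_capped and the max_files limit are assumptions made for this sketch.

```python
def list_files_capped(client, dataset_id, pattern=".", max_files=1000):
    """List matching file paths, refusing unexpectedly large result sets."""
    count = client.matched_file_count(dataset_id, glob=pattern)
    if count == 0:
        return []
    if count > max_files:
        raise ValueError(
            "pattern %r matches %d files (limit %d)" % (pattern, count, max_files)
        )
    return client.list_files(dataset_id, glob=pattern)
```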

update_dataset(dataset_id, name=None, description=None, public=None)

Update a data set.

Parameters
  • dataset_id (int) – The ID of the dataset to update

  • name (str) – name of the dataset

  • description (str) – description for the dataset

  • public (bool) – A boolean indicating whether or not the dataset should be public.

Returns

The updated dataset.

Return type

Dataset
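
A short sketch of changing a single property with update_dataset. The helper name make_dataset_public is hypothetical, and the sketch assumes that arguments left at their None defaults are not modified, which the signature suggests but this reference does not state.

```python
def make_dataset_public(client, dataset_id):
    """Mark a dataset as public and return the updated dataset.

    Assumes name and description are left unchanged when omitted.
    """
    return client.update_dataset(dataset_id, public=True)
```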

upload(dataset_id, source_path, dest_path=None)

Upload a file, specifying the source path and, optionally, the destination path (similar to the scp command).

Parameters
  • dataset_id (Union[int, str]) – The ID of the dataset to upload to.

  • source_path (str) – The path to the file on the source host

  • dest_path (str) – The path to the file where the contents of the upload will be written (on the dest host)

Returns

The result of the upload process

Return type

UploadResult
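
Since upload returns an UploadResult rather than raising on failure, callers may want to check successful() explicitly. The wrapper below is a sketch of that pattern; the name upload_or_raise and the choice of RuntimeError are assumptions.

```python
def upload_or_raise(client, dataset_id, source_path, dest_path=None):
    """Upload a file or directory and raise if any part of it failed."""
    result = client.upload(dataset_id, source_path, dest_path=dest_path)
    if not result.successful():
        raise RuntimeError(
            "upload of %r to dataset %s had failures" % (source_path, dataset_id)
        )
    return result
```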

upload_with_ingester(dataset_id, source_path, ingester, ingester_arguments=[], dest_path=None)

Upload a file using a particular ingester, specifying the source path and, optionally, the destination path (similar to the scp command).

Parameters
  • dataset_id (Union[int, str]) – The ID of the dataset to upload to.

  • source_path (str) – The path to the file on the source host

  • ingester (citrination_client.data.ingest.Ingester) – The ingester being used

  • ingester_arguments (list of dict) – Optional ingester arguments; each dict should contain the keys “name” and “value”

  • dest_path (str) – The path to the file where the contents of the upload will be written (on the dest host)

Returns

The result of the upload process

Return type

UploadResult
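
The ingester_arguments parameter expects a list of dicts with "name" and "value" keys. The sketch below builds that list from an ordinary mapping; the helper name build_ingester_arguments is hypothetical, and valid argument names depend on the ingester you select.

```python
def build_ingester_arguments(options):
    """Convert {name: value} into the documented list-of-dicts form."""
    return [{"name": name, "value": value} for name, value in options.items()]
```

The result can be passed as ingester_arguments=build_ingester_arguments(...) alongside an ingester taken from list_ingesters().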

upload_with_template_csv_ingester(dataset_id, source_path, dest_path=None)

Upload a file using the template CSV ingester, specifying the source path and, optionally, the destination path (similar to the scp command).

Parameters
  • dataset_id (Union[int, str]) – The ID of the dataset to upload to.

  • source_path (str) – The path to the file on the source host

  • dest_path (str) – The path to the file where the contents of the upload will be written (on the dest host)

Returns

The result of the upload process

Return type

UploadResult

class citrination_client.data.upload_result.UploadResult

The result of an attempted upload. Keeps track of the failures and successes if multiple files were uploaded (for instance, if a directory was uploaded).

__init__()

Constructor.

add_failure(filepath, reason)

Registers a file as a failure to upload.

Parameters
  • filepath (str) – The path to the file which was to be uploaded.

  • reason (str) – The reason the file failed to upload

add_success(filepath, id, dest_path)

Registers a file as successfully uploaded.

Parameters
  • filepath (str) – The path to the successfully uploaded file.

  • id (Union[str, int]) – The id of the successfully uploaded file.

  • dest_path (str) – The destination path to the successfully uploaded file.

successful()

Indicates whether or not the entire upload was successful.

Returns

Whether or not the upload was successful

Return type

bool