tokio.cli.archive_collectdes module

Dump a lot of data out of Elasticsearch using the Python API and native scrolling support. Output either as native JSON from Elasticsearch or as serialized TOKIO TimeSeries (TTS) HDF5 files.

tokio.cli.archive_collectdes.dataset2metadataset_key(dataset_key)[source]

Return the metadataset name corresponding to a dataset name

Parameters:dataset_key (str) – Name of a dataset
Returns:Name of the corresponding metadataset
Return type:str
tokio.cli.archive_collectdes.main(argv=None)[source]

Entry point for the CLI interface

tokio.cli.archive_collectdes.metadataset2dataset_key(metadataset_name)[source]

Return the dataset name corresponding to a metadataset name

Metadatasets are not ever stored in the HDF5 and instead are only used to store data needed to correctly calculate dataset values. This function maps a metadataset name to its corresponding dataset name.

Parameters:metadataset_name (str) – Name of a metadataset
Returns:Name of the corresponding dataset, or None if metadataset_name does not appear to be a metadataset name.
Return type:str or None
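The two key-mapping functions above are inverses of one another. As an illustration only, the sketch below assumes a hypothetical convention in which a metadataset key is its dataset key with an underscore prefixed to the final path component; the real pytokio naming scheme may differ.

```python
# Hypothetical sketch of the dataset <-> metadataset key mapping.
# The underscore-prefix convention here is an assumption for
# illustration, not necessarily pytokio's actual scheme.

def dataset2metadataset_key(dataset_key):
    """Map a dataset key to its (hypothetical) metadataset key."""
    parts = dataset_key.rsplit('/', 1)
    parts[-1] = '_' + parts[-1]
    return '/'.join(parts)

def metadataset2dataset_key(metadataset_name):
    """Invert the mapping; return None for non-metadataset names."""
    parts = metadataset_name.rsplit('/', 1)
    if not parts[-1].startswith('_'):
        return None  # does not look like a metadataset name
    parts[-1] = parts[-1][1:]
    return '/'.join(parts)

print(dataset2metadataset_key('datatargets/readrates'))   # datatargets/_readrates
print(metadataset2dataset_key('datatargets/_readrates'))  # datatargets/readrates
print(metadataset2dataset_key('datatargets/readrates'))   # None
```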
tokio.cli.archive_collectdes.normalize_cpu_datasets(inserts, datasets)[source]

Normalize CPU load datasets

Divide each element of CPU datasets by the number of CPUs counted at each point in time. Necessary because these measurements are reported on a per-core basis, but not all cores may be reported for each timestamp.

Parameters:
  • inserts (list of tuples) – list of inserts that were used to populate datasets
  • datasets (dict of TimeSeries) – all of the datasets being populated
Returns:

Nothing
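The normalization described above amounts to an elementwise division of each CPU load value by the number of cores that actually reported at that timestamp. A minimal numpy sketch of that arithmetic, with made-up values standing in for the real inserts and TimeSeries matrices:

```python
import numpy as np

# Hypothetical sketch: cpu_sum[t, host] holds summed per-core CPU load
# for each timestamp/host, and core_count[t, host] holds how many cores
# actually reported at that timestamp.  Dividing elementwise yields the
# average load per reporting core, which is the effect of
# normalize_cpu_datasets.
cpu_sum = np.array([[200.0, 150.0],
                    [100.0,  90.0]])
core_count = np.array([[4, 3],
                       [2, 3]])

# guard against timestamps where no cores reported at all
normalized = np.where(core_count > 0,
                      cpu_sum / np.maximum(core_count, 1),
                      0.0)
print(normalized)  # each element is now load per reporting core
```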

tokio.cli.archive_collectdes.pages_to_hdf5(pages, output_file, init_start, init_end, query_start, query_end, timestep, num_servers, devices_per_server, threads=1)[source]

Take pages from an Elasticsearch query and store them in output_file, an HDF5 file

Parameters:
  • pages (list) – A list of page objects (dictionaries)
  • output_file (str) – Path to an HDF5 file in which page data should be stored
  • init_start (datetime.datetime) – Lower bound of time (inclusive) to be stored in the output_file. Used when creating a non-existent HDF5 file.
  • init_end (datetime.datetime) – Upper bound of time (inclusive) to be stored in the output_file. Used when creating a non-existent HDF5 file.
  • query_start (datetime.datetime) – Retrieve data greater than or equal to this time from Elasticsearch
  • query_end (datetime.datetime) – Upper bound of time for data retrieved from Elasticsearch
  • timestep (int) – Time, in seconds, between successive sample intervals to be used when initializing output_file
  • num_servers (int) – Number of discrete servers in the cluster. Used when initializing output_file.
  • devices_per_server (int) – Number of SSDs per server. Used when initializing output_file.
  • threads (int) – Number of parallel threads to utilize when parsing the Elasticsearch output
tokio.cli.archive_collectdes.process_page(page)[source]

Go through a list of docs and insert their data into a numpy matrix. In the future this should be a flush function attached to the CollectdEs connector class.

Parameters:page (dict) – A single page of output from an Elasticsearch scroll query. Should contain a hits key.
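An Elasticsearch scroll page is shaped like a standard search response body, with the documents under hits.hits. A minimal stand-in for such a page (the _source field names below are illustrative of collectd-style documents, not a guaranteed schema):

```python
# Minimal stand-in for one page of Elasticsearch scroll output.
# The hits.hits nesting is the standard Elasticsearch response shape;
# the _source field names are hypothetical examples.
page = {
    'hits': {
        'hits': [
            {'_source': {'hostname': 'nvme01',
                         'plugin': 'disk',
                         'read': 1048576,
                         'write': 524288}},
        ]
    }
}

# process_page iterates over documents roughly like this:
for doc in page['hits']['hits']:
    source = doc['_source']
    print(source['hostname'], source['read'], source['write'])
```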
tokio.cli.archive_collectdes.reset_timeseries(timeseries, start, end, value=-0.0)[source]

Zero out a region of a tokio.timeseries.TimeSeries dataset

Parameters:
  • timeseries (tokio.timeseries.TimeSeries) – TimeSeries dataset in which a region of data should be zeroed out
  • start (datetime.datetime) – Time at which zeroing of all columns in timeseries should begin (inclusive)
  • end (datetime.datetime) – Time at which zeroing of all columns in timeseries should end (exclusive)
  • value – Value to which every element being reset should be set
Returns:

Nothing
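The zeroing semantics can be sketched with a plain numpy matrix: map the start and end times to row indices (assuming a fixed timestep, as tokio.timeseries.TimeSeries does) and overwrite the half-open row range. The index arithmetic below is a simplified stand-in for the real TimeSeries internals.

```python
import datetime
import numpy as np

# Sketch of reset_timeseries semantics: rows are timesteps, columns are
# servers/devices; overwrite rows in [start, end) with `value`.
timestep = 10                              # seconds per row
t0 = datetime.datetime(2019, 1, 1)         # time of row 0
dataset = np.ones((6, 3))                  # stand-in for the TimeSeries matrix

start = t0 + datetime.timedelta(seconds=20)
end = t0 + datetime.timedelta(seconds=40)
i0 = int((start - t0).total_seconds()) // timestep   # first row to reset
i1 = int((end - t0).total_seconds()) // timestep     # end row (exclusive)
dataset[i0:i1, :] = -0.0                   # matches the default value=-0.0

print(dataset[2, 0], dataset[4, 0])  # -0.0 1.0
```

Using -0.0 rather than 0.0 leaves the reset cells distinguishable (via the sign bit) from cells that legitimately recorded a zero measurement.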

tokio.cli.archive_collectdes.update_datasets(inserts, datasets)[source]

Insert list of tuples into a dataset

Insert a list of tuples into a tokio.timeseries.TimeSeries object serially

Parameters:
  • inserts (list of tuples) –

    List of tuples which should be serially inserted into a dataset. The tuples can be of the form

    • dataset name (str)
    • timestamp (datetime.datetime)
    • column name (str)
    • value

    or

    • dataset name (str)
    • timestamp (datetime.datetime)
    • column name (str)
    • value
    • reducer name (str)

    where

    • dataset name is the key used to retrieve a target tokio.timeseries.TimeSeries object from the datasets argument
    • timestamp and column name reference the element to be updated
    • value is the new value to insert into the given (timestamp, column name) location within the dataset
    • reducer name is either None (replace whatever value currently exists in the (timestamp, column name) location) or ‘sum’ (add value to the existing value)
  • datasets (dict) – Dictionary mapping dataset names (str) to tokio.timeseries.TimeSeries objects
Returns:

Number of elements in inserts that were not inserted because their timestamps fell outside the range of the dataset to be updated.

Return type:

int
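The insert semantics described above can be sketched as a loop over the tuples, with 4-tuples replacing values and 5-tuples carrying a ‘sum’ reducer that accumulates. A dict of dicts stands in here for the real tokio.timeseries.TimeSeries objects, so this is a behavioral sketch rather than the library's implementation.

```python
import datetime

# Dict-of-dicts stand-in for {dataset name: TimeSeries}.
datasets = {'datatargets/readrates': {}}

inserts = [
    # 4-tuple: plain replacement
    ('datatargets/readrates', datetime.datetime(2019, 1, 1), 'nvme01', 100.0),
    # 5-tuple: 'sum' reducer adds to the existing value
    ('datatargets/readrates', datetime.datetime(2019, 1, 1), 'nvme01', 50.0, 'sum'),
]

errors = 0
for insert in inserts:
    name, timestamp, column, value = insert[:4]
    reducer = insert[4] if len(insert) == 5 else None
    if name not in datasets:
        errors += 1          # analogous to an out-of-range insert
        continue
    target = datasets[name]
    key = (timestamp, column)
    if reducer == 'sum' and key in target:
        target[key] += value  # accumulate into the existing element
    else:
        target[key] = value   # replace (or first write)

print(datasets['datatargets/readrates'][(datetime.datetime(2019, 1, 1), 'nvme01')])  # 150.0
```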