tokio.cli.archive_collectdes module¶
Dump a lot of data out of Elasticsearch using the Python API and native scrolling support. Output either as native JSON from Elasticsearch or as serialized TOKIO TimeSeries (TTS) HDF5 files.
-
tokio.cli.archive_collectdes.dataset2metadataset_key(dataset_key)[source]¶
Return the metadataset name corresponding to a dataset name.
Parameters: dataset_key (str) – Name of a dataset
Returns: Name of the corresponding metadataset
Return type: str
-
tokio.cli.archive_collectdes.metadataset2dataset_key(metadataset_name)[source]¶
Return the dataset name corresponding to a metadataset name.
Metadatasets are never stored in the HDF5 file; they are only used to store data needed to correctly calculate dataset values. This function maps a metadataset name to its corresponding dataset name.
Parameters: metadataset_name (str) – Name of a metadataset
Returns: Name of the corresponding dataset, or None if metadataset_name does not appear to be a metadataset name
Return type: str or None
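The two functions above are inverses of one another. The actual naming convention is internal to archive_collectdes, but the round-trip behavior can be sketched as follows, assuming, purely for illustration, that a metadataset is named by prefixing the dataset's final path component with an underscore (the real mapping may differ):

```python
# Hypothetical sketch only: the real dataset/metadataset naming convention is
# internal to tokio.cli.archive_collectdes.  Here we assume a metadataset is
# the dataset name with its last path component prefixed by an underscore.

def dataset2metadataset_key(dataset_key):
    """Map a dataset name to its (hypothetical) metadataset name."""
    parts = dataset_key.rsplit('/', 1)
    parts[-1] = '_' + parts[-1]
    return '/'.join(parts)

def metadataset2dataset_key(metadataset_name):
    """Invert the mapping; return None for non-metadataset names."""
    parts = metadataset_name.rsplit('/', 1)
    if not parts[-1].startswith('_'):
        return None  # does not look like a metadataset name
    parts[-1] = parts[-1][1:]
    return '/'.join(parts)

print(dataset2metadataset_key('datatargets/readrates'))
print(metadataset2dataset_key('datatargets/_readrates'))
```

Whatever the real convention, the key property is that `metadataset2dataset_key` returns None for names that are not metadatasets, which lets callers filter mixed key lists.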
-
tokio.cli.archive_collectdes.normalize_cpu_datasets(inserts, datasets)[source]¶
Normalize CPU load datasets.
Divide each element of the CPU datasets by the number of CPUs counted at each point in time. This is necessary because these measurements are reported on a per-core basis, but not all cores may be reported for every timestamp.
Parameters: - inserts (list of tuples) – List of inserts that were used to populate datasets
- datasets (dict of TimeSeries) – All of the datasets being populated
Returns: Nothing
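The normalization idea can be sketched with plain dicts standing in for the real TimeSeries objects: each (timestamp, column) sum of per-core values is divided by the number of cores that actually reported at that point. The data and helper name below are illustrative, not pytokio internals:

```python
# Simplified sketch of CPU-load normalization: values arrive summed per-core,
# so each (timestamp, column) sum is divided by the count of cores that
# actually reported there.  Dicts stand in for TimeSeries; data is made up.

def normalize_cpu_dataset(sums, counts):
    """Divide each summed CPU value by its per-point core count."""
    normalized = {}
    for key, total in sums.items():
        n = counts.get(key, 0)
        normalized[key] = total / n if n else 0.0
    return normalized

# Two cores reported at t=0, but only one core reported at t=60.
sums   = {(0, 'nid00001'): 150.0, (60, 'nid00001'): 40.0}
counts = {(0, 'nid00001'): 2,     (60, 'nid00001'): 1}
print(normalize_cpu_dataset(sums, counts))
```

Dividing by a per-point count rather than a fixed core total is the crux: a timestamp where only one of two cores reported still yields a correct per-core average.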
-
tokio.cli.archive_collectdes.pages_to_hdf5(pages, output_file, init_start, init_end, query_start, query_end, timestep, num_servers, devices_per_server, threads=1)[source]¶
Store pages from an Elasticsearch query in an HDF5 file.
Take pages from an Elasticsearch query and store them in output_file.
Parameters: - pages (list) – A list of page objects (dictionaries)
- output_file (str) – Path to an HDF5 file in which page data should be stored
- init_start (datetime.datetime) – Lower bound of time (inclusive) to be stored in output_file. Used when creating a non-existent HDF5 file.
- init_end (datetime.datetime) – Upper bound of time (inclusive) to be stored in output_file. Used when creating a non-existent HDF5 file.
- query_start (datetime.datetime) – Retrieve data greater than or equal to this time from Elasticsearch
- query_end (datetime.datetime) – Upper bound of time for data retrieved from Elasticsearch
- timestep (int) – Time, in seconds, between successive sample intervals to be used when initializing output_file
- num_servers (int) – Number of discrete servers in the cluster. Used when initializing output_file.
- devices_per_server (int) – Number of SSDs per server. Used when initializing output_file.
- threads (int) – Number of parallel threads to use when parsing the Elasticsearch output
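The initialization parameters together imply the shape of a freshly created HDF5 dataset. A small sketch, under the assumption (not stated in the docstring) that each device contributes one column and each timestep one row:

```python
# Hedged sketch of the dataset shape implied by pages_to_hdf5's init
# parameters.  Assumption for illustration: one column per SSD (num_servers *
# devices_per_server) and one row per timestep between init_start and
# init_end.  implied_shape is a hypothetical helper, not pytokio API.
import datetime

def implied_shape(init_start, init_end, timestep, num_servers, devices_per_server):
    """Return (rows, columns) a freshly initialized dataset would need."""
    seconds = int((init_end - init_start).total_seconds())
    rows = seconds // timestep  # one row per sample interval
    cols = num_servers * devices_per_server  # one column per device
    return rows, cols

start = datetime.datetime(2019, 1, 1, 0, 0, 0)
end = datetime.datetime(2019, 1, 2, 0, 0, 0)
print(implied_shape(start, end, timestep=10, num_servers=288, devices_per_server=2))
```

This is why init_start/init_end and timestep are only consulted when the HDF5 file does not yet exist: an existing file already fixes the dataset dimensions.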
-
tokio.cli.archive_collectdes.process_page(page)[source]¶
Go through a list of docs and insert their data into a numpy matrix. In the future this should be a flush function attached to the CollectdEs connector class.
Parameters: page (dict) – A single page of output from an Elasticsearch scroll query. Should contain a hits key.
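The hits layout is standard for Elasticsearch search/scroll responses: each page carries a hits key whose hits list holds the documents, with the payload under _source. A minimal sketch of walking one page (the collectd field names in the fake document are illustrative, not necessarily what pytokio's connector emits):

```python
# Minimal sketch of iterating one Elasticsearch scroll page.  The
# hits -> hits -> _source nesting is standard Elasticsearch; the field names
# in the fake document below are only illustrative.

def extract_docs(page):
    """Yield the _source payload of every document in a scroll page."""
    for hit in page.get('hits', {}).get('hits', []):
        yield hit['_source']

fake_page = {
    'hits': {
        'hits': [
            {'_source': {'hostname': 'nid00001', 'plugin': 'disk', 'value': 123.0}},
            {'_source': {'hostname': 'nid00002', 'plugin': 'disk', 'value': 456.0}},
        ]
    }
}
for doc in extract_docs(fake_page):
    print(doc['hostname'], doc['value'])
```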
-
tokio.cli.archive_collectdes.reset_timeseries(timeseries, start, end, value=-0.0)[source]¶
Zero out a region of a tokio.timeseries.TimeSeries dataset.
Parameters: - timeseries (tokio.timeseries.TimeSeries) – Dataset from which a subset of data should be zeroed
- start (datetime.datetime) – Time (inclusive) at which zeroing of all columns in timeseries should begin
- end (datetime.datetime) – Time (exclusive) at which zeroing of all columns in timeseries should end
- value – Value to which every element in the region should be reset
Returns: Nothing
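The reset operation can be sketched on a bare numpy matrix standing in for a tokio.timeseries.TimeSeries: rows map to timestamps, and every column of each row whose timestamp falls in [start, end) is overwritten with the fill value. reset_region and the integer timestamps are illustrative stand-ins:

```python
# Sketch of reset_timeseries semantics on a plain numpy matrix.  Rows map to
# timestamps; start is inclusive and end is exclusive, as documented above.
# reset_region is a hypothetical helper, not pytokio API.
import numpy as np

def reset_region(matrix, timestamps, start, end, value=-0.0):
    """Set all columns of rows whose timestamp lies in [start, end) to value."""
    for row, ts in enumerate(timestamps):
        if start <= ts < end:  # end is exclusive
            matrix[row, :] = value

data = np.ones((4, 2))
timestamps = [0, 10, 20, 30]
reset_region(data, timestamps, start=10, end=30)
print(data)  # rows for t=10 and t=20 are reset; t=0 and t=30 untouched
```

A default of -0.0 rather than 0.0 is a common trick for marking elements as "explicitly reset" while still comparing equal to zero.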
-
tokio.cli.archive_collectdes.update_datasets(inserts, datasets)[source]¶
Insert a list of tuples into a dataset.
Insert a list of tuples serially into a tokio.timeseries.TimeSeries object.
Parameters: - inserts (list of tuples) – List of tuples to be serially inserted into a dataset. The tuples can be of the form
  - dataset name (str)
  - timestamp (datetime.datetime)
  - column name (str)
  - value
  or
  - dataset name (str)
  - timestamp (datetime.datetime)
  - column name (str)
  - value
  - reducer name (str)
  where
  - dataset name is the key used to retrieve a target tokio.timeseries.TimeSeries object from the datasets argument
  - timestamp and column name reference the element to be updated
  - value is the new value to insert into the given (timestamp, column name) location within the dataset
  - reducer name is None (to simply replace whatever value currently exists in the (timestamp, column name) location) or ‘sum’ (to add value to the existing value)
- datasets (dict) – Dictionary mapping dataset names (str) to tokio.timeseries.TimeSeries objects
Returns: Number of elements in inserts that were not inserted because their timestamp value was outside the range of the dataset to be updated
Return type: int
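The serial insert loop described above can be sketched with dicts standing in for TimeSeries objects. The tuples follow the two documented shapes, (dataset, timestamp, column, value) and (dataset, timestamp, column, value, reducer); the dict layout and integer timestamps are illustrative only:

```python
# Sketch of update_datasets' insert loop.  Dicts stand in for the real
# tokio.timeseries.TimeSeries objects; a 'valid_timestamps' set models the
# dataset's time range.  This is an assumption-laden sketch, not pytokio code.

def update_datasets(inserts, datasets):
    """Apply inserts serially; return the count of out-of-range timestamps."""
    errors = 0
    for insert in inserts:
        dataset_name, timestamp, column, value = insert[:4]
        reducer = insert[4] if len(insert) > 4 else None
        dataset = datasets[dataset_name]
        if timestamp not in dataset['valid_timestamps']:
            errors += 1  # timestamp outside the dataset's time range
            continue
        key = (timestamp, column)
        if reducer == 'sum':
            dataset['values'][key] = dataset['values'].get(key, 0.0) + value
        else:  # reducer is None: overwrite whatever value is there
            dataset['values'][key] = value
    return errors

datasets = {'readrates': {'valid_timestamps': {0, 10}, 'values': {}}}
inserts = [
    ('readrates', 0, 'nid00001', 5.0),          # plain overwrite
    ('readrates', 0, 'nid00001', 3.0, 'sum'),   # accumulate onto 5.0
    ('readrates', 99, 'nid00001', 1.0),         # out of range, rejected
]
print(update_datasets(inserts, datasets))  # number of rejected inserts
```

The 'sum' reducer is what makes per-core CPU measurements accumulate correctly before normalize_cpu_datasets divides them back down.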