tokio.connectors.nersc_lfsstate module
Tools to parse and index the outputs of Lustre’s lfs and lctl commands
to quantify Lustre fullness and health. Assumes inputs are generated by NERSC’s
Lustre health monitoring cron jobs which periodically issue the following:
echo "BEGIN $(date +%s)" >> osts.txt
/usr/bin/lfs df >> osts.txt
echo "BEGIN $(date +%s)" >> ost-map.txt
/usr/sbin/lctl dl -t >> ost-map.txt
Accepts ASCII text files, or gzip-compressed text files.
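The input layout described above (repeated "BEGIN <epoch>" markers, each followed by one command's output, in a plain or gzip-compressed file) can be read with a short sketch like the following. The function names open_maybe_gzip and split_snapshots are hypothetical and are not part of this module; this only illustrates the file format, not how the connector is implemented.

```python
import gzip


def open_maybe_gzip(path):
    """Open a plain-text or gzip-compressed file for reading as text.

    Hypothetical helper: detects gzip by its two-byte magic number.
    """
    with open(path, "rb") as probe:
        magic = probe.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")
    return open(path, "rt")


def split_snapshots(lines):
    """Group lines under the 'BEGIN <epoch>' marker that precedes them.

    Returns a dict mapping integer timestamps to lists of output lines.
    Lines that appear before the first marker are ignored.
    """
    snapshots = {}
    current = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("BEGIN "):
            current = int(line.split()[1])
            snapshots[current] = []
        elif current is not None and line.strip():
            snapshots[current].append(line)
    return snapshots
```

With a file such as osts.txt, the two helpers would be combined as split_snapshots(open_maybe_gzip("osts.txt")), giving one list of lfs df lines per cron invocation.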
-
class tokio.connectors.nersc_lfsstate.NerscLfsOstFullness(cache_file=None)
Bases: dict
Subclass of dictionary that self-populates with Lustre OST fullness.
-
__init__(cache_file=None)
Load the fullness of OSTs.
Parameters: cache_file (str, optional) – Path to a cache file to load instead of issuing the lfs df command
-
__repr__()
Serialize OST fullness into a format that resembles lfs df.
Returns: Serialization of the OST fullness in a format similar to lfs df. Columns are
- Name of OST (e.g., snx11025-OST0001_UUID)
- Total kibibytes on OST
- Used kibibytes on OST
- Available kibibytes on OST
- Percent capacity used
- Mount point, role, and OST ID
Return type: str
-
_save_cache(output)
Serialize object into a form resembling the output of lfs df.
Parameters: output (file) – File-like object into which resulting text should be written.
-
-
class tokio.connectors.nersc_lfsstate.NerscLfsOstMap(cache_file=None)
Bases: dict
Subclass of dictionary that self-populates with Lustre OST-OSS mapping.
-
__init__(cache_file=None)
Load the mapping of OSTs to OSSes.
Parameters: cache_file (str, optional) – Path to a cache file to load instead of issuing the lctl dl -t command
-
__repr__()
Serialize OST map into a format that resembles lctl dl -t.
Returns: Serialization of the OST to OSS mapping in a format similar to lctl dl -t. Fixed-width columns are
- index: OST/MDT index
- status: up/down status
- role: osc, mdc, etc.
- role_id: name with unique identifier for target
- uuid: UUID of target
- ref_count: number of references to target
- nid: LNET identifier of the target
Return type: str
-
_save_cache(output)
Serialize object into a form resembling the output of lctl dl -t.
Parameters: output (file) – File-like object into which resulting text should be written.
-
get_failovers()
Determine OSSes which are likely affected by a failover.
Figure out the OSTs that are probably failed over and, for each time stamp and file system, return a list of abnormal OSSes and the expected number of OSTs per OSS.
Returns: Dictionary keyed by timestamps and whose values are dicts of the form:
{ 'mode': int, 'abnormal_ips': [list of str] }
where mode refers to the statistical mode of OSTs per OSS, and abnormal_ips is a list of strings containing the IP addresses of OSSes whose OST counts are not equal to the mode for that time stamp.
Return type: dict
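The mode-based heuristic described above can be illustrated with a small standalone sketch. It assumes a hypothetical input, a dict mapping OST names to the IP address of the OSS currently serving them for one timestamp and file system. The function name find_abnormal_osses is made up for illustration and this is not the code of get_failovers() itself.

```python
from collections import Counter


def find_abnormal_osses(ost_to_oss):
    """Flag OSSes whose OST count differs from the most common count.

    ost_to_oss: dict mapping OST name -> OSS IP address (hypothetical input).
    Returns a dict shaped like one value of the get_failovers() result.
    """
    # Count how many OSTs each OSS is serving.
    osts_per_oss = Counter(ost_to_oss.values())
    # The statistical mode is the most frequently seen OSTs-per-OSS count.
    mode = Counter(osts_per_oss.values()).most_common(1)[0][0]
    abnormal = sorted(ip for ip, count in osts_per_oss.items() if count != mode)
    return {"mode": mode, "abnormal_ips": abnormal}


# Made-up example: 10.0.0.3 serves four OSTs while its peers serve two each.
sample = {
    "OST0000": "10.0.0.1", "OST0001": "10.0.0.1",
    "OST0002": "10.0.0.2", "OST0003": "10.0.0.2",
    "OST0004": "10.0.0.3", "OST0005": "10.0.0.3",
    "OST0006": "10.0.0.3", "OST0007": "10.0.0.3",
}
result = find_abnormal_osses(sample)
```

In this sketch an OSS that has lost all of its OSTs does not appear in the input at all, so only the OSS that picked up the extra targets is reported.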
-
-
tokio.connectors.nersc_lfsstate._REX_LFS_DF = <_sre.SRE_Pattern object>
Regular expression to extract OST fullness levels.
Matches output of lfs df which takes the form:
snx11035-OST0000_UUID 90767651352 54512631228 35277748388 61% /scratch2[OST:0]
where the columns are
- OST/MDT UID
- kibibytes total
- kibibytes in use
- kibibytes available
- percent fullness
- file system mount, role, and ID
Carries the implicit assumption that all OSTs are prefixed with snx.
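A pattern for the line format and columns listed above might look like the sketch below. The name LFS_DF_SKETCH and its group names are hypothetical, and the actual value of _REX_LFS_DF is not shown on this page, so this is an approximation of the idea rather than the module's own expression. Like the original, it assumes targets are prefixed with snx.

```python
import re

# Hypothetical pattern; the module's real _REX_LFS_DF may differ.
LFS_DF_SKETCH = re.compile(
    r"^\s*(?P<target>snx\S+?)_UUID"      # OST/MDT name without the _UUID suffix
    r"\s+(?P<total_kib>\d+)"             # kibibytes total
    r"\s+(?P<used_kib>\d+)"              # kibibytes in use
    r"\s+(?P<avail_kib>\d+)"             # kibibytes available
    r"\s+(?P<pct>\d+)%"                  # percent fullness
    r"\s+(?P<mount>\S+?)"                # file system mount point
    r"\[(?P<role>OST|MDT):(?P<index>\d+)\]\s*$"  # role and ID
)

line = ("snx11035-OST0000_UUID 90767651352 54512631228 "
        "35277748388 61% /scratch2[OST:0]")
fields = LFS_DF_SKETCH.match(line).groupdict()
```

Header and summary lines from lfs df do not begin with snx followed by a _UUID suffix, so match() returns None for them and they can be skipped.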
-
tokio.connectors.nersc_lfsstate._REX_OST_MAP = <_sre.SRE_Pattern object>
Regular expression to match OSC/MDC lines.
Matches output of lctl dl -t which takes the form:
351 UP osc snx11025-OST0007-osc-ffff8875ac1e7c00 3f30f170-90e6-b332-b141-a6d4a94a1829 5 10.100.100.12@o2ib1
Intentionally skips MGC, LOV, and LMV lines.
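As a rough illustration of matching only OSC/MDC lines, the sketch below restricts the role column to osc or mdc so that other device types fail to match. The name OST_MAP_SKETCH and its group names are hypothetical, the mgc line is a made-up example, and the module's actual _REX_OST_MAP may be written differently.

```python
import re

# Hypothetical pattern; the module's real _REX_OST_MAP may differ.
OST_MAP_SKETCH = re.compile(
    r"^\s*(?P<index>\d+)"        # device index
    r"\s+(?P<status>\S+)"        # up/down status
    r"\s+(?P<role>osc|mdc)"      # only OSC and MDC devices are accepted
    r"\s+(?P<role_id>\S+)"       # name with unique identifier for target
    r"\s+(?P<uuid>\S+)"          # UUID of target
    r"\s+(?P<ref_count>\d+)"     # number of references to target
    r"\s+(?P<nid>\S+)\s*$"       # LNET identifier of the target
)

osc_line = ("351 UP osc snx11025-OST0007-osc-ffff8875ac1e7c00 "
            "3f30f170-90e6-b332-b141-a6d4a94a1829 5 10.100.100.12@o2ib1")
# Made-up MGC line, included only to show that it is skipped.
mgc_line = ("0 UP mgc MGC10.100.100.1@o2ib1 "
            "00000000-0000-0000-0000-000000000000 5 10.100.100.1@o2ib1")

osc = OST_MAP_SKETCH.match(osc_line).groupdict()
oss_ip = osc["nid"].split("@")[0]  # IP portion of the LNET identifier
skipped = OST_MAP_SKETCH.match(mgc_line)
```

Splitting the nid on "@" yields the OSS IP address, which is the kind of value that get_failovers() reports in abnormal_ips.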