Add prometheus metrics for dirsize and limits #76
While working on 2i2c-org/infrastructure#7166, I realized
that for alerts on a per-user level to be useful, it's critical that we have information about
per-user limits. Currently limits are overridden per-user, and in the future we may have
groups too.
While my initial suggestion
to improve the experience was to run dirsize-exporter in the same pod (so mounted with XFS, not NFS),
that won't give us per-user limits. We should still do that, since dirsize-exporter offers
additional info (the total number of files and the oldest file in a dir, both helpful)
that isn't present in the XFS info.
This PR adds:
total_size_bytes (total used size) and hard_limit_bytes (hard limit) per directory.
dirsize, so that existing graphs in grafana will work regardless of the source of the metrics. This is important since not everyone
using jupyterhub will be using jupyterhub-home-nfs
using pod annotations. This allows us to scrape multiple ports on the same pod, so
we can continue using node-exporter for disk metrics while getting usage metrics
out of jupyterhub-home-nfs, and in the future getting other metrics out of dirsize
exporter
testing.
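As a rough illustration of the metrics described above, here is a minimal stdlib-only sketch that renders per-directory usage and hard limits in the Prometheus text exposition format. It is not the code in this PR: the `render_metrics` helper, the `directory` label name, and the exact `dirsize_` prefixing are assumptions made for the example.

```python
def escape_label(value: str) -> str:
    """Escape a label value per the Prometheus text exposition format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metrics(usage: dict[str, tuple[int, int]], prefix: str = "dirsize") -> str:
    """Render gauges for each directory.

    `usage` maps a directory name to (total_size_bytes, hard_limit_bytes).
    The metric prefix and the `directory` label are assumptions for this sketch.
    """
    metrics = [
        ("total_size_bytes", "Total used size of the directory in bytes", 0),
        ("hard_limit_bytes", "Hard quota limit of the directory in bytes", 1),
    ]
    lines = []
    for name, help_text, index in metrics:
        full_name = f"{prefix}_{name}"
        lines.append(f"# HELP {full_name} {help_text}")
        lines.append(f"# TYPE {full_name} gauge")
        # Sort directories so the output is stable between scrapes.
        for directory in sorted(usage):
            value = usage[directory][index]
            lines.append(f'{full_name}{{directory="{escape_label(directory)}"}} {value}')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_metrics({"alice": (1024, 4096), "bob": (2048, 4096)}), end="")
```

With both series available, a per-user alert can compare the used size against the hard limit for the same directory instead of relying on a single global threshold.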
An alternative I considered is to add support for XFS quotas in prometheus-dirsize-exporter.
However, since home-nfs does not keep the projid and projects files in their standard locations,
that would be a little more complex. Plus we may not have info about which dirs to report
on and which to skip (we only want to report on the paths we manage). So I kept it in here.
Happy to reconsider too.
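For context on what the alternative would involve, here is a hedged sketch of mapping XFS project names to directories from the contents of the projid (`name:id`) and projects (`id:path`) files. The helper names are hypothetical, the file contents are passed in as strings because their location in home-nfs is non-standard, and skipping `#` lines is an assumption about how the files are written.

```python
def _entries(text: str):
    """Yield (left, right) pairs from colon-separated lines, skipping blanks and '#' lines."""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        left, sep, right = line.partition(":")
        if sep:  # ignore malformed lines that have no colon
            yield left, right


def directories_by_project(projid_text: str, projects_text: str) -> dict[str, str]:
    """Map each project name to its directory by joining projid and projects on the id."""
    # projects file: "<id>:<path>"
    path_by_id = {int(pid): path for pid, path in _entries(projects_text)}
    # projid file: "<name>:<id>"
    result = {}
    for name, pid in _entries(projid_text):
        path = path_by_id.get(int(pid))
        if path is not None:  # only report projects that have a known directory
            result[name] = path
    return result
```

An exporter taking this route would still need to be told where these files live and which of the resulting paths it should report on, which is the extra complexity mentioned above.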
We should add an option to https://github.com/2i2c-org/prometheus-dirsize-exporter to
allow disabling the dirsize total size bytes metric so we don't have duplicate metrics.
Validated that this works and prometheus does pick up the metrics:
TODO