I use some batch scripts in my Proxmox installation. They run from cron.hourly and cron.daily and check for viruses and the RAM/CPU load of my LXC containers. An email is sent when a condition is met.
What are your tips or solutions that avoid unnecessary load on disk I/O or CPU time? Let's keep it simple.
Edit: a lot of great input about possible solutions. In addition, TIL that “keep it simple” means a lot of different things to different people.😉
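For anyone wondering what such an hourly check can look like, here is a minimal sketch in Python rather than the actual scripts from the post. It reads only the load average and /proc/meminfo, so it should cost almost nothing in disk I/O or CPU time. The thresholds, the function names, the mail addresses and the local mail server on localhost:25 are all assumptions for illustration. It checks the host it runs on, not each LXC container separately, and it does no virus scanning.

```python
import os
import smtplib
from email.message import EmailMessage

# Hypothetical thresholds; tune them for your own host.
LOAD_PER_CPU_LIMIT = 1.5   # 5-minute load average divided by CPU count
MEM_USED_LIMIT = 0.90      # fraction of RAM in use


def mem_used_fraction(meminfo_text):
    """Return the fraction of RAM in use, parsed from /proc/meminfo content."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])  # first token is the number
    return 1.0 - fields["MemAvailable"] / fields["MemTotal"]


def check(load5, cpu_count, meminfo_text):
    """Return a list of alert strings; an empty list means nothing to report."""
    alerts = []
    if load5 / cpu_count > LOAD_PER_CPU_LIMIT:
        alerts.append(f"high load: {load5:.2f} on {cpu_count} CPUs")
    used = mem_used_fraction(meminfo_text)
    if used > MEM_USED_LIMIT:
        alerts.append(f"high memory use: {used:.0%}")
    return alerts


def send_mail(alerts, to_addr="root@localhost"):
    """Send the alerts through a local mail server (assumed on localhost:25)."""
    msg = EmailMessage()
    msg["Subject"] = "Host alert"
    msg["From"] = "monitor@localhost"  # placeholder sender address
    msg["To"] = to_addr
    msg.set_content("\n".join(alerts))
    with smtplib.SMTP("localhost") as smtp:
        smtp.send_message(msg)


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        meminfo = f.read()
    _, load5, _ = os.getloadavg()  # (1 min, 5 min, 15 min)
    found = check(load5, os.cpu_count() or 1, meminfo)
    if found:  # mail only when a condition is met
        send_mail(found)
```

Dropped into cron.hourly as an executable file, this stays silent unless a threshold is crossed. Keeping the decision logic in `check` separate from the mail step makes it easy to try out with made-up numbers before trusting it.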
I’ll keep it very simple: I don’t.
If I’m trying to do something and I notice an issue, then I’ll investigate it. But if it’s not affecting anything, is it really a problem?
I was kind of the same, but I still collected metrics, because I just love graphs.
Over time I ended up setting alerts for failures I wish I had known about earlier. Some examples:
What do you use to collect these metrics?
I use Telegraf for most of the metrics.