Assistance Needed on Monitoring Agent Efficiency and Reliability

Dear Team,

I’ve been exploring the development of agents for monitoring and alerting. Each agent would use a custom function, registered as an external tool, that periodically checks the Prometheus endpoint.

To give a brief overview: once my service is deployed to production, the agents will monitor its behavior through the Prometheus endpoint. If any anomalies are detected, they will alert the relevant personnel.
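To make the idea concrete, here is a rough sketch of the polling loop I have in mind. It is only an illustration under assumptions: the URL, the metric name, the threshold values, and all function names (`fetch_metrics`, `parse_metrics`, `find_anomalies`, `run_agent`) are hypothetical, and the parser handles only plain `name value` lines rather than the full Prometheus text format.

```python
import threading
import urllib.request
from typing import Callable, Dict, List

# Hypothetical endpoint; replace with the service's real metrics URL.
METRICS_URL = "http://localhost:8000/metrics"


def fetch_metrics(url: str = METRICS_URL, timeout: float = 5.0) -> str:
    """Return the raw text served by the metrics endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_metrics(text: str) -> Dict[str, float]:
    """Parse 'name value' lines, skipping comments and malformed lines.

    Simplification: assumes samples carry no trailing timestamp.
    """
    metrics: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.rsplit(None, 1)
        if len(parts) != 2:
            continue
        try:
            metrics[parts[0]] = float(parts[1])
        except ValueError:
            continue
    return metrics


def find_anomalies(metrics: Dict[str, float],
                   thresholds: Dict[str, float]) -> List[str]:
    """Return one message per metric whose value exceeds its threshold."""
    return [
        f"{name}={metrics[name]} exceeds {limit}"
        for name, limit in thresholds.items()
        if name in metrics and metrics[name] > limit
    ]


def run_agent(fetch: Callable[[], str],
              alert: Callable[[str], None],
              thresholds: Dict[str, float],
              interval: float,
              stop: threading.Event) -> None:
    """Poll until `stop` is set; a failed scrape is itself reported as an alert."""
    while not stop.is_set():
        try:
            problems = find_anomalies(parse_metrics(fetch()), thresholds)
        except Exception as exc:
            problems = [f"scrape failed: {exc}"]
        for message in problems:
            alert(message)
        # Sleeps for `interval` seconds but wakes immediately on shutdown.
        stop.wait(interval)
```

In this sketch the loop would run in a background thread, for example `threading.Thread(target=run_agent, args=(fetch_metrics, print, {"http_errors_total": 10.0}, 30.0, stop), daemon=True).start()`, and calling `stop.set()` would shut it down.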

I have a few questions:

  • Will this setup efficiently monitor the health of the underlying system?
  • Are there any potential challenges or setbacks I should be aware of?
  • If possible, could you share any references or insights on how an agent runs in the background so that it stays up until it is instructed to shut down?

I would greatly appreciate any guidance you can provide.

Folks, could someone please acknowledge this?