Health Check and Alerts

Health Checks

You can automate Health Checks on EdgeXR’s platform or configure them manually. For example, you can configure a Health Check so that it does not run for app instances. Health Checks verify the performance of a specific component and, where possible, verify that the component is functioning within its designated normal tolerances.

Health Checks and QoS checks are built into EdgeXR’s platform. Once the results of the Health Checks are available, the EdgeXR team is notified and can take appropriate actions to address any errors or warning conditions detected by the Health Checks.

For example, periodic tests are performed between the EdgeXR Global Controller and the regional controllers and cloudlets. These tests can confirm that the cloudlets and controllers are active and responding. Additionally, the latency between these components is recorded and monitored for performance.
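The periodic tests described above can be pictured with a minimal sketch that runs one probe, reports whether the target responded, and records the latency. This is an illustration only, not EdgeXR's implementation: the names run_health_check and tcp_probe, the result fields, and the use of a TCP connection as the probe are all assumptions.

```python
import socket
import time
from typing import Callable, Dict


def tcp_probe(host: str, port: int, timeout_s: float = 2.0) -> Callable[[], None]:
    """Hypothetical probe: succeeds if a TCP connection can be opened."""
    def probe() -> None:
        # socket.create_connection raises OSError if the target is unreachable.
        with socket.create_connection((host, port), timeout=timeout_s):
            pass
    return probe


def run_health_check(
    probe: Callable[[], None],
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, object]:
    """Run one probe; report whether it responded and how long it took."""
    start = clock()
    try:
        probe()
        healthy = True
    except Exception:
        # Any failure to respond is treated as "not healthy" in this sketch.
        healthy = False
    latency_ms = (clock() - start) * 1000.0
    return {"healthy": healthy, "latency_ms": latency_ms}
```

A scheduler could call run_health_check(tcp_probe(host, port)) at a fixed interval for each controller or cloudlet and store the returned latency for monitoring. The clock parameter exists only so the function can be exercised without real waiting.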

Alerts generated by the Health Checks are treated the same as any other alert; they can be sent to any defined alert receivers provided that the RBAC security is satisfied. Managing alerts this way allows operators to receive notifications of possible up-stack issues that could potentially affect their cloudlet(s).

Alerts

Alerts are generated when criteria that the user defines are met. Alerts enable you to monitor performance and counteract irregularities within the system, helping you proactively mitigate performance or functional issues. A notification is sent through Slack, PagerDuty, or email, depending on the delivery method configured by the user. When the issue or condition is resolved, an additional notification is sent to the user indicating that the issue has been fixed.

Some alerts depend on Health Checks, which the user can enable on the EdgeXR Edge-Cloud Console. See below for examples of alerts.

Supported alerts

  • Cloudlet is offline

  • Resource limits have been exceeded

Severity levels

Alert subscriptions can be filtered based on a severity level. Currently, you can select one of three severity levels.

  • Info: Normal operational messages that require no action.

  • Warning: May indicate that an error will occur if action is not taken.

  • Error: Error conditions.
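As a rough illustration of severity-based filtering, the sketch below keeps only alerts at or above a chosen level. The text does not say whether the platform matches a level exactly or treats it as a minimum threshold; this sketch assumes a threshold. The function name, the dictionary layout, and the alert name CloudletResourceUsage are hypothetical.

```python
from typing import Dict, List

# Ordering of the three severity levels, lowest to highest.
SEVERITY_RANK = {"info": 0, "warning": 1, "error": 2}


def filter_by_min_severity(alerts: List[Dict[str, str]], minimum: str) -> List[Dict[str, str]]:
    """Return the alerts whose severity is at or above `minimum`."""
    threshold = SEVERITY_RANK[minimum.lower()]
    return [a for a in alerts if SEVERITY_RANK[a["severity"].lower()] >= threshold]


alerts = [
    {"name": "AppInstDown", "severity": "error"},
    {"name": "CloudletResourceUsage", "severity": "warning"},  # hypothetical name
    {"name": "Heartbeat", "severity": "info"},                 # hypothetical name
]
selected = filter_by_min_severity(alerts, "warning")
```

With these inputs, selected contains the AppInstDown and CloudletResourceUsage entries, and the Info-level entry is dropped.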

AlertManager

The AlertManager is a global component of EdgeXR’s product and is responsible for distributing alerts to cloud operators. Alerts are consolidated at the regional level, where each regional controller receives alerts via a notification.

The image below illustrates the AlertManager workflow. A cloud operator can create an alert receiver and set up their preferred notification method through the Edge-Cloud Console. Once an alert receiver is created, the receiver is pushed to the EdgeXR platform. When an alert is triggered, the AlertManager within the platform sends an alert notification to the cloud operator for mitigation.

Alert Receiver Workflow

Alert management

The EdgeXR platform provides a flexible alerting interface that includes the following:

  • RBAC support for users, roles, and organizations that control access to alerts. Any user who can view a resource that generates an alert can create or delete an alert receiver for that resource. However, since alerts are raised and cleared by the platform, users cannot create custom alerts.

  • Flexibility to manage the delivery of alerts to different “alert receivers” based on user configuration. We currently support the delivery of alerts to your Slack or email account.

AlertManager and EdgeXR APIs

The AlertManager is designed to be configurable via the EdgeXR APIs, both directly and through the mcctl utility program, providing flexibility for users integrating with their existing monitoring systems.

Action                      API Route

Create an Alert Receiver    api/v1/auth/alertreceiver/create

Delete an Alert Receiver    api/v1/auth/alertreceiver/delete

Show all Alert Receivers    api/v1/auth/alertreceiver/show

For detailed AlertReceiver API examples, please refer to the mcctl Utility Reference guide.
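For readers calling the API directly, the following minimal sketch shows how a request to the create route might be assembled. Only the route itself comes from the table above. The POST method, the JSON body, the bearer-token header, the host name, and every field in the receiver payload are assumptions made for illustration; consult the API reference for the actual schema.

```python
import json
import urllib.request
from typing import Any, Dict

CREATE_ROUTE = "api/v1/auth/alertreceiver/create"  # route from the table above


def build_create_receiver_request(
    base_url: str, token: str, receiver: Dict[str, Any]
) -> urllib.request.Request:
    """Build (but do not send) a request to create an alert receiver."""
    url = base_url.rstrip("/") + "/" + CREATE_ROUTE
    body = json.dumps(receiver).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",  # assumed HTTP method
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + token,  # assumed auth scheme
        },
    )


# Hypothetical host, token, and payload fields, for illustration only.
req = build_create_receiver_request(
    "https://console.example.com/",
    "TOKEN",
    {"name": "ops-email", "type": "email", "severity": "error"},
)
```

Passing req to urllib.request.urlopen would send it. The sketch stops short of that so it can be run without network access or credentials.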

Set up alert receivers and notification methods through the console

While you can use the mcctl tool and the commands provided to set up your alerts and notification preferences, we recommend using the Edge-Cloud Console to set up your alert receivers for ease of use.

  1. From the left navigation, select Alert Receivers, which will bring you to the Alert Receivers page.

  2. Select the plus icon in the top right to open the Create Receiver page.

  3. Additional fields appear depending on your selections. Populate all the required fields, then select Create.

  4. Your new Alert Receiver will appear on the Alert Receivers page. When you select the Alert icon, information about the alert is displayed.

Create Alert Receiver screen

Resolving alerts

An alert is considered resolved when the condition that triggered it has not been active for a period of time. For example, when the Health Check probe is no longer failing to reach the application, the AppInstDown alert is resolved, and a notification about the resolution is sent to the user through a configured alert receiver.
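The resolution rule can be sketched as a small state tracker: an alert fires when its condition becomes active and resolves once the condition has stayed inactive for a quiet period. The class name, the method name, and the 60-second quiet period used in the example are assumptions; the platform's actual timing is not stated here.

```python
from typing import Optional


class AlertState:
    """Tracks one alert and reports "fired" / "resolved" transitions."""

    def __init__(self, name: str, quiet_period_s: float) -> None:
        self.name = name
        self.quiet_period_s = quiet_period_s  # assumed, configurable value
        self.firing = False
        self.last_active_at: Optional[float] = None

    def observe(self, condition_active: bool, now: float) -> Optional[str]:
        """Record one observation; return a transition name or None."""
        if condition_active:
            self.last_active_at = now
            if not self.firing:
                self.firing = True
                return "fired"
            return None
        # Condition is inactive: resolve only after the quiet period elapses.
        if self.firing and now - self.last_active_at >= self.quiet_period_s:
            self.firing = False
            return "resolved"
        return None


state = AlertState("AppInstDown", quiet_period_s=60)
events = [
    state.observe(True, now=0),    # probe fails: alert fires
    state.observe(True, now=10),   # still failing: no new event
    state.observe(False, now=30),  # recovered, but only 20 s of quiet
    state.observe(False, now=70),  # 60 s of quiet: alert resolves
]
```

Each "fired" or "resolved" transition is the point at which a notification would be handed to the configured alert receiver.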

View Audit Logs

Historical activities performed by you and others within your organization are logged and can be viewed from the Edge-Cloud Console. These logs are used for diagnostic purposes or error correction, and each activity is logged by date and time. You can trace events through three sub-sections: Raw Viewer, Request, and Response. These sections provide valuable information if you require support from EdgeXR. Copy the traceid from the Raw Viewer section and email it to [email protected].

To view the audit logs, go to the Organizations page and select Audit under the Actions menu.
The following actions may be performed on this page:

  • Filter logs by region

  • Filter logs by time range

Contact Support

You can email [email protected] for assistance in resolving product issues. To help expedite your request, copy the traceid, which can be found on the Audit logs page, into your email along with a brief description of your issue.