- 01 Apr 2024
Monitoring
To maximize availability and ensure consistent performance, Itential recommends the following practices for monitoring Itential Automation Platform (IAP). The examples below show how to set up monitoring of the platform and receive actionable alerts.
Basic Monitoring
The /health/status route requires no login and can be used for basic monitoring. A load balancer can use this route to determine whether an individual node is healthy. IAP returns a 200 when the node is healthy; any other response, or no response at all, should be interpreted as unhealthy. It is recommended to set the timeout for this request to 1 second.
Effective with IAP 2023.1, the GET /status API is deprecated and targeted for replacement in 2023.2. The replacement API, /health/status, reports the health of all apps and adapters. Refer to the posted notice: Deprecation of Status API.
Request
GET /health/status HTTP/1.1
Host: *some-iap-host*:3000
Accept: */*
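A load-balancer-style probe of this route can be sketched with the Python standard library. This is a minimal sketch, assuming the node is reachable over plain HTTP at the given base URL; `is_node_healthy` is a hypothetical helper name, not part of IAP. It treats only an HTTP 200 received within the recommended 1-second timeout as healthy, and anything else (another status code, a refused connection, or a timeout) as unhealthy.

```python
import http.client
import urllib.request


def is_node_healthy(base_url: str, timeout: float = 1.0) -> bool:
    """Probe GET /health/status and report whether the node looks healthy.

    Hypothetical helper: only an HTTP 200 counts as healthy. Any other
    status, a connection failure, or exceeding `timeout` seconds is
    treated as unhealthy.
    """
    request = urllib.request.Request(
        base_url.rstrip("/") + "/health/status",
        headers={"Accept": "*/*"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # urlopen raises HTTPError (a subclass of URLError, itself an
        # OSError) for non-2xx codes; refused connections and socket
        # timeouts also surface as OSError subclasses.
        return False


# Example: is_node_healthy("http://some-iap-host:3000")
```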
Healthy Response
The following example shows a healthy response (i.e., the apps, adapters, Redis, MongoDB, Vault, and RabbitMQ are all healthy).
In the example, the serverId and serverName values are customizations; you are responsible for modifying them to suit your particular environment. See the Related Reading for important information.
{
"host":"automation-platform-20231-stable-0",
"serverId":"582672c5a088e6f0ca182723e1471805179faf272e1b5750e12bc06f962b13ee",
"serverName":null,
"services":[
{
"service":"redis",
"status":"running"
},
{
"service":"mongo",
"status":"running"
},
{
"service":"vault",
"status":"running"
},
{
"service":"rabbitmq",
"status":"running"
}
],
"timestamp":1711633372837,
"apps":"running",
"adapters":"running"
}
Note: A healthy response from IAP 2023.2.x will not include "service":"rabbitmq".
Unhealthy Response
If any applications have failed, the response reports a degraded (unhealthy) state, as in the following example, where "apps" is "degraded". The failed applications can be viewed from the Admin Essentials - Alerts dashboard.
{
"host":"automation-platform-20231-stable-0",
"serverId":"582672c5a088e6f0ca182723e1471805179faf272e1b5750e12bc06f962b13ee",
"serverName":null,
"services":[
{
"service":"redis",
"status":"running"
},
{
"service":"mongo",
"status":"running"
},
{
"service":"vault",
"status":"running"
},
{
"service":"rabbitmq",
"status":"running"
}
],
"timestamp":1711633372837,
"apps":"degraded",
"adapters":"running"
}
Note: An unhealthy response from IAP 2023.2.x will not include "service":"rabbitmq".
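When the body of /health/status is collected as well, it can be scanned to report which component is unhealthy. The sketch below is illustrative only and assumes that "running" is the sole healthy value, as in the examples above; `find_problems` is a hypothetical helper name. Services that are absent from the response (for example RabbitMQ on IAP 2023.2.x) are simply not reported.

```python
import json
from typing import Any


def find_problems(status: dict[str, Any]) -> list[str]:
    """List components in a /health/status body that are not "running"."""
    problems = []
    # Each entry in "services" has a "service" name and a "status".
    for entry in status.get("services", []):
        if entry.get("status") != "running":
            problems.append(f"service {entry.get('service')}: {entry.get('status')}")
    # "apps" and "adapters" are summary fields such as "running" or "degraded".
    for key in ("apps", "adapters"):
        if status.get(key) != "running":
            problems.append(f"{key}: {status.get(key)}")
    return problems


body = json.loads(
    '{"services": [{"service": "redis", "status": "running"}],'
    ' "apps": "degraded", "adapters": "running"}'
)
print(find_problems(body))  # ['apps: degraded']
```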
Application and Adapter Monitoring
Itential also recommends monitoring individual applications and adapters to determine whether they are healthy. IAP provides two routes for this, /health/applications and /health/adapters, and each requires a valid session token.
Application Monitoring
For each application in the results array, pay careful attention to the state field. If it is not RUNNING, the application should be considered unhealthy. The uptime, memoryUsage, and cpuUsage fields can be tracked to see how the application's resource consumption changes, for better or worse, over time.
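One way to track these fields is to compare two successive samples of the same entry. This sketch rests on assumptions the IAP documentation does not confirm: uptime is in seconds, memoryUsage values are bytes, and cpuUsage values are microseconds (the field names mirror Node.js process.memoryUsage() and process.cpuUsage()). `resource_delta` is a hypothetical helper; it returns None when uptime did not increase, which likely means the process restarted between samples and the deltas would be misleading.

```python
from typing import Any, Optional


def resource_delta(old: dict[str, Any], new: dict[str, Any]) -> Optional[dict[str, float]]:
    """Compare two samples of the same application or adapter entry."""
    # Use the reported uptime as the sampling interval (assumed seconds).
    elapsed = new["uptime"] - old["uptime"]
    if elapsed <= 0:
        return None  # uptime went backwards: probably a restart
    # Total CPU time consumed between samples (assumed microseconds).
    cpu_us = (new["cpuUsage"]["user"] + new["cpuUsage"]["system"]) - (
        old["cpuUsage"]["user"] + old["cpuUsage"]["system"]
    )
    return {
        "elapsed_s": elapsed,
        "heap_used_delta_bytes": new["memoryUsage"]["heapUsed"] - old["memoryUsage"]["heapUsed"],
        "rss_delta_bytes": new["memoryUsage"]["rss"] - old["memoryUsage"]["rss"],
        # Share of one CPU core used over the interval.
        "cpu_percent": 100.0 * cpu_us / (elapsed * 1_000_000),
    }
```

A steadily growing heap_used_delta_bytes across many samples is the kind of trend this tracking is meant to surface.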
Request
GET /health/applications?token=*some-iap-token* HTTP/1.1
Host: *some-iap-host*:3000
Accept: */*
Response
{
"results": [
{
"id": "AdminEssentials",
"package_id": "@itential/app-admin_essentials",
"version": "3.5.64-2021.2.50.0",
"type": "Application",
"description": "Itential Automation Platform's administration suite.",
"state": "RUNNING",
"connection": null,
"uptime": 320.914691654,
"memoryUsage": {
"rss": 52662272,
"heapTotal": 29638656,
"heapUsed": 27756488,
"external": 38208460,
"arrayBuffers": 36697945
},
"cpuUsage": {
"user": 1284858,
"system": 167948
},
"pid": 15283,
"logger": {
"console": "info",
"file": "info",
"syslog": "warning"
},
"routePrefix": "admin",
"prevUptime": 206.380886528
},
...more applications...
]
}
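The application check can be sketched as a fetch followed by a filter on the state field. This is a minimal sketch under stated assumptions: `fetch_results` and `unhealthy_applications` are hypothetical helper names, the session token is passed as the token query parameter as in the request example above, and the 5-second timeout is an arbitrary choice.

```python
import json
import urllib.parse
import urllib.request
from typing import Any


def fetch_results(base_url: str, route: str, token: str, timeout: float = 5.0) -> list[dict[str, Any]]:
    """GET an IAP health route and return its "results" array."""
    query = urllib.parse.urlencode({"token": token})
    url = f"{base_url.rstrip('/')}{route}?{query}"
    request = urllib.request.Request(url, headers={"Accept": "*/*"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # json.load accepts the binary response stream directly.
        return json.load(response)["results"]


def unhealthy_applications(results: list[dict[str, Any]]) -> list[str]:
    """Return the ids of applications whose state is not RUNNING."""
    return [app["id"] for app in results if app.get("state") != "RUNNING"]


# Example:
# results = fetch_results("http://some-iap-host:3000", "/health/applications", "some-iap-token")
# print(unhealthy_applications(results))
```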
Adapter Monitoring
For each adapter in the results array, pay careful attention to the state field. If it is not RUNNING, the adapter should be considered unhealthy. Adapters also have a connection.state field that indicates whether the adapter is currently passing its health check against the target system. Any value other than ONLINE should be considered unhealthy. The uptime, memoryUsage, and cpuUsage fields can be tracked to see how the adapter's resource consumption changes, for better or worse, over time.
Request
GET /health/adapters?token=*some-iap-token* HTTP/1.1
Host: *some-iap-host*:3000
Accept: */*
Response
{
"results": [
{
"id": "local_aaa",
"package_id": "@itential/adapter-local_aaa",
"version": "4.3.1-2021.2.3.0",
"type": "Adapter",
"description": "Simple AAA against local persistence store.",
"state": "RUNNING",
"connection": {
"state": "ONLINE"
},
"uptime": 804.227121682,
"memoryUsage": {
"rss": 62124032,
"heapTotal": 36909056,
"heapUsed": 34986384,
"external": 56421712,
"arrayBuffers": 54911197
},
"cpuUsage": {
"user": 1989815,
"system": 231054
},
"pid": 15568,
"logger": {
"console": "info",
"file": "info",
"syslog": "warning"
},
"routePrefix": "local_aaa",
"prevUptime": 426.752105274
},
...more adapters...
]
}
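The two adapter rules, state must be RUNNING and connection.state must be ONLINE, can be applied to the results array from GET /health/adapters however it is fetched. In this sketch, `unhealthy_adapters` is a hypothetical helper, and it is an assumption that a null or missing connection on an adapter should be flagged. The OFFLINE value in the example is illustrative, not a documented IAP state.

```python
from typing import Any


def unhealthy_adapters(results: list[dict[str, Any]]) -> dict[str, str]:
    """Map each unhealthy adapter id to the reason it was flagged."""
    problems = {}
    for adapter in results:
        state = adapter.get("state")
        # "connection" may be null; treat a missing state as not ONLINE.
        conn_state = (adapter.get("connection") or {}).get("state")
        if state != "RUNNING":
            problems[adapter["id"]] = f"state={state}"
        elif conn_state != "ONLINE":
            problems[adapter["id"]] = f"connection.state={conn_state}"
    return problems


sample = [
    {"id": "local_aaa", "state": "RUNNING", "connection": {"state": "ONLINE"}},
    {"id": "hypothetical_adapter", "state": "RUNNING", "connection": {"state": "OFFLINE"}},
]
print(unhealthy_adapters(sample))  # {'hypothetical_adapter': 'connection.state=OFFLINE'}
```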
SNMP Traps
IAP also emits SNMP traps that can be used to raise alerts when certain events occur. More information is available in the SNMP Notification Types article.