Itential Automation Platform

Troubleshooting Guide

This guide helps you identify issues that may affect application function. Because network environments vary, the scenarios in this guide are provided only as examples. If you are unable to identify the source of a problem or resolve it, please open a support ticket in the ISD portal.

⚠ Note: This Troubleshooting Guide is a living document and subject to change with the release of new versions of IAP.

Health Checks

Health check monitoring represents overall system status by using two colors.

Green (Good)

This color is displayed when:

  • The system is passing health checks and monitoring is not reporting any problems or major issues.
  • The system is completing requests normally.
  • The uptime status is current and being updated.

Red (Down)

This color is displayed when:

  • System monitoring is reporting request failures or connection issues for an environment.

If an adapter shows both green and red, the adapter is up but not connected to its endpoint.

Figure 1: Troubleshooting Adapters

Profile

You can check application status using Profiles.

  1. From Admin Essentials, select the active Profile, and then open the Applications tab. The page displays the application details in a wide table.

  2. Check the application count in the lower right corner to confirm that the number of services matches the profile.

    Figure 2: Troubleshooting Applications

  3. Check the Indexing tab to verify that all Indexes are up and running.

    Figure 3: Troubleshooting Indexes

  4. Using Automation Studio, verify that you can open or save an automation and run a task. If you are not able to, restart Workflow Engine and then repeat the process.

Prerequisite Checklist

When you are running automations in production, it is important to know that your system is available and responsive.

  1. Verify that RAM and CPU meet the recommended specifications. If memory usage is consistently high, add more memory.

  2. Make sure dependencies have been installed with recommended versions (RabbitMQ, Redis, MongoDB, etc.).

  3. Check Release Notes for the latest Maintenance Release to view any changes or updates that may have occurred.

  4. Check the status of all dependencies (RabbitMQ, Redis, and MongoDB) using the status commands below:

    systemctl status mongod.service
    systemctl status rabbitmq-server
    systemctl status redis
  5. If the dependencies have not started, use the commands below to start them:

    systemctl start mongod.service
    systemctl start rabbitmq-server
    systemctl start redis
  6. Check to make sure the status of the pronghorn service is up. If not, use this start command:

    systemctl start pronghorn
  7. To check the status of the local host, browse to your domain name followed by /status:

    https://(your domain)/status
  8. Check for any zombie (defunct) or stalled processes related to IAP using this command:

    ps -eo pid,ppid,stat,comm | awk '$3 ~ /^Z/'
  9. Clear any zombie processes found by terminating or restarting their parent process (zombies cannot be killed directly), and then restart the service.
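
The dependency and zombie-process checks in this checklist can be combined into a small script. The following is a minimal sketch, assuming the systemd unit names used in this guide (mongod, rabbitmq-server, redis, pronghorn). The helper names report_services and filter_zombies are hypothetical and are not part of IAP.

```shell
#!/bin/sh
# Sketch only: report_services and filter_zombies are illustrative helper
# names, not IAP tooling. Unit names are the ones used in this guide.

# Print "<unit>: active" or "<unit>: NOT active" for each unit given.
# Returns non-zero if any unit is not active.
report_services() {
    rc=0
    for unit in "$@"; do
        if systemctl is-active --quiet "$unit"; then
            echo "$unit: active"
        else
            echo "$unit: NOT active"
            rc=1
        fi
    done
    return "$rc"
}

# Read "pid ppid stat command" rows on stdin and print only the rows
# whose process state starts with Z (zombie / defunct).
filter_zombies() {
    awk '$3 ~ /^Z/ { print "zombie pid=" $1 " parent=" $2 " cmd=" $4 }'
}

# Example invocation on the IAP host:
#   report_services mongod rabbitmq-server redis pronghorn
#   ps -e -o pid= -o ppid= -o stat= -o comm= | filter_zombies
```

Because report_services returns non-zero when any unit is down, it can be used as a gate in a larger health script. The parent PID printed by filter_zombies identifies the process to restart.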

Logging

Logging provides useful data to evaluate system health, more easily debug errors, and capture events that happen during runtime.

  1. Check the application logs or journal logs to verify if the application is up and working correctly, or to determine if an app or adapter is down or has no connection in Admin Essentials.

    • Application logs are commonly found at /var/log/pronghorn.
    • To view journal logs, use the command journalctl -fu pronghorn.
  2. From Admin Essentials, navigate to Applications → AGManager → loggerProps. Three log types can be viewed: application, console, syslog.

    Figure 4: Logger Props

Log Type       Subject of Logging
Application    Pertains to AG Manager and log files for the application.
Console        Pertains to log files showing everything under IAP Standard-out (STD_OUT).
Syslog         Pertains to system logs that can be produced in any specified host or localhost.

To configure Syslog, go to the Configuration guide.

To configure alarmProps for more logging options, see the IAP Profiles guide.

To configure alarmProps using various SNMP Traps, see SNMP Notification Types.

Log Levels

Log levels are defined in loggerProps, and each log level has an associated severity. The most severe log level is error.

Log Level    Description
Error        Errors or failures that impact functionality.
Warn         Issues or unexpected behavior that does not impact functionality.
Info         Successful status change; should be limited to one message per successful action.
Debug        Major events such as successful data retrieval from an external system or the completion of a function.
Trace        Minor events within functions. These are "breadcrumbs" within a function.
Spam         Excessive or repetitive messages, large text files, or large quantities of data such as search results; information that, though relevant, would clutter the log file and render it unusable.

Figure 5: Log Settings

Production environments should have the log_level set to warn or info. The debug, trace, and spam log levels generate a large amount of log data and additional server load. Only set a production server to debug mode when detailed log tracing is operationally necessary.
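
To illustrate why the lower levels produce so much more output, the sketch below treats the configured log level as a severity threshold, using the order in the table above (error is most severe, spam is least). That threshold behavior is an assumption based on how leveled loggers commonly work and is not confirmed here as IAP's implementation. The function names level_rank and would_log are hypothetical.

```shell
#!/bin/sh
# Sketch only: models log levels as a threshold. The ordering follows the
# Log Level table; the threshold semantics are an assumption.

# Map a level name to a rank; a lower rank means a more severe level.
level_rank() {
    case "$1" in
        error) echo 0 ;;
        warn)  echo 1 ;;
        info)  echo 2 ;;
        debug) echo 3 ;;
        trace) echo 4 ;;
        spam)  echo 5 ;;
        *)     return 1 ;;
    esac
}

# would_log CONFIGURED_LEVEL MESSAGE_LEVEL
# Succeeds when a message at MESSAGE_LEVEL would be written while the
# logger is configured at CONFIGURED_LEVEL. Returns 2 for unknown levels.
would_log() {
    configured=$(level_rank "$1") || return 2
    message=$(level_rank "$2") || return 2
    [ "$message" -le "$configured" ]
}

# Example:
#   would_log info debug && echo written || echo suppressed
```

Under this model, a server configured at info writes error, warn, and info messages, while a server configured at spam writes everything.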

For systemd operating systems, the system journal manages console logging. The system journal may also contain application life cycle error messages that cannot be saved to the IAP file logs.

The systemd journal shows IAP startup and deployment messages.

Additional systemd logging information is stored in the systemd journal or at /var/log/messages on System V (“System Five”) hosts.

Monitor the system journal for errors and warnings using the shell command journalctl -f.

Filter log messages to show only IAP logs using the shell command journalctl -f -u pronghorn.service.

Monitor IAP logs with a tail follow shell command, such as tail -F /var/log/pronghorn/pronghorn.log.
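
The monitoring commands above can be narrowed to problem lines only. The following is a minimal sketch that assumes the level name (for example, error or warn) appears as text in each log line; the actual IAP log line format is not described in this guide, so adjust the pattern to match your logs. The helper names filter_problems and count_problems are hypothetical.

```shell
#!/bin/sh
# Sketch only: assumes the words "error" or "warn" appear in problem lines.

# Pass through only the lines that mention an error or warning
# (case-insensitive).
filter_problems() {
    grep -iE 'error|warn'
}

# Print how many lines on stdin mention an error or warning.
count_problems() {
    grep -ciE 'error|warn'
}

# Example invocations on the IAP host:
#   tail -F /var/log/pronghorn/pronghorn.log | filter_problems
#   journalctl -u pronghorn.service | count_problems
```

A plain text match like this will also catch lines that merely contain the word "error" in a message body, so treat the count as a rough indicator rather than an exact figure.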

In addition to the IAP logs, it is essential to capture logs for dependencies such as MongoDB, RabbitMQ, and Redis. The configuration file for each dependency contains the location of its log files. If an ISD ticket is opened, the service desk may request a copy of these logs.

For more information on Log Settings, please see Event Logs.

Connectivity

To troubleshoot network-connection issues:

  1. Verify your profiles or service_configs to ensure they have been configured correctly within the database being used. For example:

    • Host Name (of the endpoint)
    • Port Number
    • User Credentials
  2. Verify your properties are configured correctly and ensure that there are no network issues between IAP and the other dependencies or endpoints.

  3. Use telnet to perform a basic connectivity test between the endpoints. For example, to test connectivity from the IAP server to the mongo-server (default port 27017), first check properties.json for the correct hostname and port number, and then run the matching command below. If the port is blocked by a firewall, the network engineering team will need to open the port.

    telnet mongo-server-hostname 27017
    telnet redis-server-hostname 6379
    telnet redis-sentinels-hostname 26379
    telnet rabbit-server-hostname 5672
  4. Use the following command to verify that an application's port is up and listening:

    lsof -Pi |grep LISTEN
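
On hosts where telnet is not installed, the same connectivity tests can be scripted. The following is a minimal sketch that uses the /dev/tcp pseudo-device provided by bash together with the timeout command, so it assumes both are available on the IAP server. The helper names check_port and report_port are hypothetical, and the hostnames in the example are the placeholders used in the telnet commands above.

```shell
#!/bin/sh
# Sketch only: requires bash (for /dev/tcp) and timeout(1) on the host.

# check_port HOST PORT [TIMEOUT_SECONDS]
# Succeeds if a TCP connection to HOST:PORT can be opened within the
# timeout (default 3 seconds).
check_port() {
    timeout "${3:-3}" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Print a one-line result for HOST PORT.
report_port() {
    if check_port "$1" "$2"; then
        echo "$1:$2 reachable"
    else
        echo "$1:$2 unreachable"
    fi
}

# Example invocations (replace the placeholder hostnames):
#   report_port mongo-server-hostname 27017
#   report_port redis-server-hostname 6379
#   report_port redis-sentinels-hostname 26379
#   report_port rabbit-server-hostname 5672
```

An "unreachable" result does not distinguish between a closed port, a firewall drop, and a DNS failure, so follow up with the network engineering team if the service itself is confirmed to be running.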

Additional Troubleshooting Network-Connection Resources

It’s important to have additional resources and best practices in place for network-connection troubleshooting.

Authentication

IAP should be configured to use only one of four different authentication methods.

  • LDAP
  • Azure AD
  • RADIUS
  • Local AAA (Lab and Development environments)

LDAP

Verify the LDAP user is connected to the LDAP server and that a connection to AD (Active Directory) can be made.

Run ldapsearch to verify the IP address of the LDAP Server, Port Number, Base DN, Username, Domain, Password, and Common Name.

    ldapsearch -x -H ldaps://<IP Address of LDAP Server>:<Port> -b '<Base DN>' -s sub -D '<UserName>@<Domain>' -w '<Password>' 'cn=<Common Name>'
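
Passing the password with -w exposes it in the shell history and process list. As an alternative, the sketch below wraps the same search but reads the bind password from a file using the -y option of the OpenLDAP ldapsearch client. It assumes the OpenLDAP client tools are installed; the function name ldap_check and the example values are hypothetical.

```shell
#!/bin/sh
# Sketch only: assumes the OpenLDAP ldapsearch client. ldap_check is an
# illustrative helper name.

# ldap_check URI BASE_DN BIND_DN PASSWORD_FILE COMMON_NAME
# Runs a subtree search for cn=COMMON_NAME using simple authentication.
#   -y reads the bind password from PASSWORD_FILE (entire file contents).
#   -LLL prints plain LDIF without comments.
#   1.1 requests no attributes, so only the matching DN is returned.
ldap_check() {
    ldapsearch -x -LLL \
        -H "$1" \
        -b "$2" -s sub \
        -D "$3" \
        -y "$4" \
        "cn=$5" 1.1
}

# Example (all values are placeholders):
#   printf '%s' 'secret' > /root/.ldap_pw && chmod 600 /root/.ldap_pw
#   ldap_check ldaps://ldap.example.com:636 'dc=example,dc=com' \
#       'svcuser@example.com' /root/.ldap_pw jdoe
```

Create the password file with printf rather than echo, because -y uses the complete file contents as the password, including any trailing newline.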

For more information on LDAP connections and configuration, please see LDAP Adapter.

Azure AD

Verify that Azure AD has been set up and configured properly. Double-check the Name, Supported Account Types, and Redirect URL.

For more information on Azure AD set up and configuration, please see Azure Adapter.

RADIUS

Verify that RADIUS is configured properly. For more information on RADIUS, please see RADIUS Adapter.

Local AAA

The Local AAA adapter may be used in lab and development environments to locally authenticate users against a MongoDB collection inside the local AAA database.

Make sure your configuration and credentials are correct and that the Local AAA is connected to the database.

For more information on Local AAA, please see Local AAA Adapter.

High Availability (HA) Issues

High Availability (HA) allows users to run more than one IAP instance in a data center. For example, you may have a data center on the East Coast with 6 active instances of IAP running virtually, and a backup data center on the West Coast with 6 instances of IAP running in a passive state for disaster recovery.

Check the Server Id located in the Task History modal under the Metrics tab to verify that all available instances are running correctly.

Figure 6: Server Id

The Server Id will specify the MAC address of the server used to process the task being worked. If there is an error or failure, make sure there are no issues with the connection to the server.

Node Memory Usage

Use Admin Essentials to evaluate your Core Memory usage.

Figure 7: Core Memory

From the Profile view you can also check the memory usage for both Applications and Adapters. You can also compare the memory usage here to the memory being used in your local server controls.

Figure 8: Applications Memory

If the memory for an app keeps growing over time and does not decrease, there may be a memory leak. Please open an ISD ticket with Itential for any product apps or adapters showing higher memory use than expected.
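
To compare the figures in Admin Essentials against the server itself, memory can be sampled from the process table over time. The following is a minimal sketch that sums resident memory (RSS, in KiB) for processes whose command line contains a given pattern. It assumes IAP processes can be identified by the word "pronghorn" in their command line, which may not hold on your system; adjust the pattern as needed. The helper name sum_rss_kb is hypothetical.

```shell
#!/bin/sh
# Sketch only: the "pronghorn" match pattern is an assumption about how
# IAP processes appear in ps output.

# sum_rss_kb PATTERN
# Reads "rss_kb command-line..." rows on stdin and prints the total RSS
# of the rows whose command line contains PATTERN (case-insensitive).
sum_rss_kb() {
    awk -v pat="$1" '
        {
            rss = $1
            $1 = ""                       # leave only the command line in $0
            if (index(tolower($0), tolower(pat)) > 0) total += rss
        }
        END { print total + 0 }
    '
}

# Example: record one timestamped sample per minute on the IAP host.
#   while true; do
#       printf '%s %s\n' "$(date +%FT%T)" \
#           "$(ps -e -o rss= -o args= | sum_rss_kb pronghorn)"
#       sleep 60
#   done
```

A total that climbs steadily across many samples without ever dropping is consistent with the leak pattern described above and is useful evidence to attach to an ISD ticket.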