High-availability architecture

A Highly Available Architecture (HA2) is an Itential architecture in which all components are redundant and can gracefully tolerate at least one catastrophic failure. It is the recommended architecture for production and testing environments.

Architecture overview

The Itential Platform application performs many reads and writes against the database and is sensitive to high latencies. All components must be installed in the same data center and have authentication enabled.

The minimum HA2 architecture is composed of nine VMs (eight if the optional IAG server is omitted):

  • Two Itential Platform servers
  • Three MongoDB servers
  • Three Redis servers
  • One IAG server (optional)

Figure: Itential Highly Available Architecture, showing two Itential Platform servers, three MongoDB servers, three Redis servers, and one optional IAG server

Components

Itential Platform instances communicate with one another through Redis and share data through MongoDB. Adding a new Itential Platform node and pointing it to the same MongoDB and Redis deployment is sufficient to achieve high availability. Each node begins performing work as soon as it is added and configured.

Each Itential Platform server must have the following configuration:

  • MongoDB connection strings must contain a reference to all members of the replica set
  • Redis configurations must specify the list of all known Redis Sentinels and their Sentinel username and password (connections to HA Redis occur through Sentinels, not directly to Redis)
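As an illustration of the first requirement, the sketch below builds a standard MongoDB connection string that names every replica set member, so the driver can find the new primary after a failover. The hostnames, the replica set name `rs0`, and the `authSource` default are assumptions for the example; use the values from your own deployment and the Itential Platform configuration format.

```python
from urllib.parse import quote_plus


def build_mongo_uri(members, user, password, replica_set,
                    database="itential", auth_source="itential"):
    """Return a MongoDB URI that lists every replica set member.

    members: list of (host, port) tuples, one per replica set member.
    auth_source: database in which the user was created (an assumption here;
    adjust it to match where your "itential" user is defined).
    """
    if not members:
        raise ValueError("list every replica set member, not just one")
    hosts = ",".join(f"{host}:{port}" for host, port in members)
    # Escape credentials so characters such as '@' or ':' do not break the URI.
    creds = f"{quote_plus(user)}:{quote_plus(password)}"
    return (f"mongodb://{creds}@{hosts}/{database}"
            f"?replicaSet={quote_plus(replica_set)}&authSource={quote_plus(auth_source)}")


if __name__ == "__main__":
    # Hypothetical hosts and replica set name.
    print(build_mongo_uri(
        [("mongo1.example.com", 27017),
         ("mongo2.example.com", 27017),
         ("mongo3.example.com", 27017)],
        user="itential", password="change-me", replica_set="rs0"))
```

Listing only one member would still connect while that member is up, but it makes the initial connection depend on a single host.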

Highly available MongoDB

MongoDB replica sets operate in a primary/secondary model in which data written to the primary replicates to the secondaries. If the primary MongoDB node fails, the remaining members detect the failure and hold an election for a new primary. Until a new primary is elected, which usually takes a few seconds, the replica set may not accept reads and writes. Once a new primary is elected, the Itential Platform application resumes normal operation. Operators do not need to take action during the election.
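An election succeeds only when a strict majority of voting members can vote, which is why the design calls for three MongoDB servers rather than two. The small sketch below shows the arithmetic, assuming every member is a voting member; the same majority rule applies to the three Redis Sentinels when they authorize a failover.

```python
def majority(voting_members: int) -> int:
    """Votes needed to elect a primary: strictly more than half of the voters."""
    return voting_members // 2 + 1


def tolerated_failures(voting_members: int) -> int:
    """How many members can fail while the rest can still form a majority."""
    return voting_members - majority(voting_members)


if __name__ == "__main__":
    for n in (2, 3, 5):
        print(f"{n} members: majority {majority(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

Two members tolerate no failures, three tolerate one, and five tolerate two, so three is the smallest replica set that meets the "at least one failure" goal.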

Itential’s MongoDB cluster must meet the following requirements:

  • All replica set members must be defined in the Itential Platform config
  • Authentication between the replica members must be done with either a shared key or X.509 certificate
  • The database must have an admin user able to perform any operation
  • The database must have an “itential” user that is granted the least amount of privileges required by the Itential Platform application (Itential Platform must be configured to use this user account)
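For the shared-key option, every member must use the same keyfile, which MongoDB requires to be 6 to 1024 base64 characters and not readable by group or others. The sketch below mirrors the `openssl rand -base64 756` plus `chmod 400` approach from MongoDB's documentation; the function name and output path are hypothetical. The same file would then be copied to all three members and referenced by `security.keyFile` in each `mongod.conf`.

```python
import base64
import os
import secrets


def write_keyfile(path: str, num_bytes: int = 756) -> str:
    """Write a random base64 keyfile readable only by its owner; return its contents.

    756 random bytes encode to 1008 base64 characters, inside MongoDB's
    6-1024 character limit for keyfiles.
    """
    key = base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")
    if not 6 <= len(key) <= 1024:
        raise ValueError("MongoDB keyfiles must be 6-1024 characters long")
    # Create a new file with owner-read-only permissions; mongod refuses
    # keyfiles whose permissions are too open.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o400)
    with os.fdopen(fd, "w") as handle:
        handle.write(key + "\n")
    os.chmod(path, 0o400)  # enforce the mode even if the umask changed it
    return key
```

For example, `write_keyfile("/path/to/mongo-keyfile")` on one host, followed by copying the file (preserving its mode and setting the owner to the mongod user) to the other members.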

For more information, see the MongoDB Replication documentation.

Highly available Redis

Redis clusters operate in a primary/replica model in which data written to the primary replicates to the replicas. If the primary Redis node fails, the Redis Sentinels detect the failure and promote one of the replicas to be the new primary (failover). Until the new primary is promoted, which usually takes a few seconds, Redis may not accept reads and writes. Once the new primary is promoted, the Itential Platform application resumes normal operation. Operators do not need to take action during failover.

Itential’s Redis cluster must meet the following requirements:

  • All Redis nodes must be defined in the Itential Platform profile configuration
  • Authentication between the replica members is done with users defined in the Redis config file
  • Redis must have an admin user able to perform any operation
  • Redis must have an “itential” user that is granted the least amount of privileges required by the application (Itential Platform must be configured to use this user account)
  • Redis must have a replication user that is granted the least amount of privileges required by the replication process
  • Redis Sentinel must be included to monitor the Redis cluster and must be colocated with Redis
  • Redis Sentinel must have an admin user able to perform any Sentinel task
  • Redis nodes must maintain low-latency connections to one another to avoid replication failures
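The Sentinel requirements translate into a few `sentinel.conf` directives on each Sentinel host. The sketch below generates them with the quorum set to a majority of the three Sentinels and authenticates to Redis as the `sentineluser` account described under Required user accounts. The master name `mymaster`, the IP address, the password, and the 5000 ms detection delay are hypothetical values, and `sentinel auth-user` requires Redis 6.2 or later.

```python
def sentinel_conf(master_name: str, master_host: str, master_port: int,
                  auth_user: str, auth_pass: str,
                  sentinel_count: int = 3, down_after_ms: int = 5000) -> str:
    """Return sentinel.conf directives for one monitored Redis primary.

    The quorum is a majority of the Sentinels, so a single Sentinel cannot
    declare the primary down on its own.
    """
    quorum = sentinel_count // 2 + 1
    lines = [
        f"sentinel monitor {master_name} {master_host} {master_port} {quorum}",
        # How long the primary must be unreachable before a Sentinel flags it.
        f"sentinel down-after-milliseconds {master_name} {down_after_ms}",
        # Credentials the Sentinel uses to authenticate to the Redis nodes.
        f"sentinel auth-user {master_name} {auth_user}",
        f"sentinel auth-pass {master_name} {auth_pass}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(sentinel_conf("mymaster", "10.0.0.11", 6379, "sentineluser", "change-me"))
```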

For more information, see the Redis Replication documentation.

Required user accounts

The validated designs are opinionated installations of Itential and its dependencies. The following user accounts are required by the dependencies.

MongoDB

Account | Description
admin | Has full root access to the mongo database. Can read and write to any logical database. Can be used to issue admin commands like forcing an election and configuring replica sets. This is NOT used by the Itential application but is created for admin purposes.
itential | Has read and write access to the "itential" database only. This is the account used by the Itential Platform application.
localaaa | Has read and write access to the "LocalAAA" database. This is used by the Local AAA adapter for local, non-LDAP logins.
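The MongoDB accounts in this table map onto built-in roles: `root` for admin and `readWrite` on a single database for the itential and localaaa accounts. The sketch below builds the corresponding `createUser` command documents; it assumes all three users are created in the `admin` database, and the passwords are placeholders. With PyMongo, each document could be sent with `client.admin.command(command)`.

```python
import json


def create_user_commands(passwords):
    """Build createUser command documents for the admin, itential, and localaaa accounts.

    Roles follow the account table: root for admin, readWrite on one
    database for each application account.
    """
    return [
        {"createUser": "admin", "pwd": passwords["admin"],
         "roles": [{"role": "root", "db": "admin"}]},
        {"createUser": "itential", "pwd": passwords["itential"],
         "roles": [{"role": "readWrite", "db": "itential"}]},
        {"createUser": "localaaa", "pwd": passwords["localaaa"],
         "roles": [{"role": "readWrite", "db": "LocalAAA"}]},
    ]


if __name__ == "__main__":
    demo = {"admin": "change-me", "itential": "change-me", "localaaa": "change-me"}
    for command in create_user_commands(demo):
        # Mask the password when printing.
        print(json.dumps({**command, "pwd": "***"}))
```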

Redis

Account | Description
admin | Has full root access to the Redis database, all channels, all keys, all commands. This is NOT used by the Itential application but is created for admin purposes.
itential | Has full access to the Redis database, all channels, all keys, EXCEPT the following commands: asking, cluster, readonly, readwrite, bgrewriteaof, bgsave, failover, flushall, flushdb, psync, replconf, replicaof, save, shutdown, sync. This is the account used by the Itential Platform application.
repluser | Has access to the minimum set of commands to perform replication: psync, replconf, ping.
admin (Sentinel) | Full root access to Redis Sentinel. This is NOT used by the Itential application but is created for admin purposes of Redis Sentinel.
sentineluser | Has access to the minimum set of commands to perform sentinel monitoring: multi, slaveof, ping, exec, subscribe, `config
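The Redis account descriptions can be expressed as ACL rules in a users ACL file (loaded through the `aclfile` directive in `redis.conf`). The sketch below generates lines for the admin, itential, and repluser accounts, storing SHA-256 password hashes (`#<hex>`) rather than clear text; the passwords are placeholders. Replicas would then authenticate as repluser through the `masteruser` and `masterauth` directives in `redis.conf`.

```python
import hashlib

# Commands the itential account must not run, copied from the account table.
ITENTIAL_DENIED = [
    "asking", "cluster", "readonly", "readwrite", "bgrewriteaof", "bgsave",
    "failover", "flushall", "flushdb", "psync", "replconf", "replicaof",
    "save", "shutdown", "sync",
]


def hashed(password: str) -> str:
    """ACL password rule that stores a SHA-256 hash instead of clear text."""
    return "#" + hashlib.sha256(password.encode("utf-8")).hexdigest()


def acl_lines(passwords: dict) -> list:
    """Return users.acl lines for the admin, itential, and repluser accounts."""
    denied = " ".join(f"-{command}" for command in ITENTIAL_DENIED)
    return [
        # All keys (~*), all channels (&*), all commands (+@all).
        f"user admin on {hashed(passwords['admin'])} ~* &* +@all",
        # Same access, then remove the denied commands (rules apply left to right).
        f"user itential on {hashed(passwords['itential'])} ~* &* +@all {denied}",
        # Only the commands replication needs; no key patterns are granted.
        f"user repluser on {hashed(passwords['repluser'])} +psync +replconf +ping",
    ]


if __name__ == "__main__":
    demo = {"admin": "change-me", "itential": "change-me", "repluser": "change-me"}
    print("\n".join(acl_lines(demo)))
```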

Network requirements

In an environment where components are installed on more than one host, the following network traffic flows must be allowed. All ports use TCP unless otherwise noted. Not every port needs to be open in every supported architecture, and secure ports are required only when explicitly configured.

Source | Destination | Port | Description
Desktop Devices | Itential Platform | 3000 | Web browser connections to Itential Platform over HTTP
Desktop Devices | Itential Platform | 3443 | Web browser connections to Itential Platform over HTTPS
Desktop Devices | IAG | 8083 | Web browser connections to IAG over HTTP
Desktop Devices | IAG | 8443 | Web browser connections to IAG over HTTPS
Desktop Devices | HashiCorp Vault | 8200 | Web browser connections to HashiCorp Vault
Itential Platform | MongoDB | 27017 | Itential Platform connects to MongoDB
Itential Platform | Redis | 6379 | Itential Platform connects to Redis
Itential Platform | Redis | 26379 | Itential Platform connects to Redis Sentinel (HA installations only)
Itential Platform | IAG | 8083 | Itential Platform connects to IAG over HTTP
Itential Platform | IAG | 8443 | Itential Platform connects to IAG over HTTPS
Itential Platform | HashiCorp Vault | 8200 | Itential Platform connects to HashiCorp Vault
Itential Platform | LDAP | 389 | Itential Platform connects to LDAP (when the LDAP adapter is used for authentication)
Itential Platform | LDAP | 636 | Itential Platform connects to LDAP over TLS (when the LDAP adapter is used for authentication)
Itential Platform | RADIUS | 1812 (UDP) | Itential Platform connects to RADIUS (when the RADIUS adapter is used for authentication)
MongoDB | MongoDB | 27017 | Each MongoDB node communicates with the other MongoDB nodes for replication (HA installations only)
Redis | Redis | 6379 | Each Redis node communicates with the other Redis nodes for replication (HA installations only)
Redis | Redis | 26379 | Each Redis node uses Redis Sentinel to monitor the Redis processes (HA installations only)
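Before installation, it can help to confirm that the TCP flows in this table are actually open from each source host. The sketch below attempts a TCP connection to each destination and reports which ones succeed; the helper names and hostnames are hypothetical, and RADIUS (UDP 1812) cannot be checked this way because UDP has no connection handshake.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unresolvable host, and so on
        return False


def check_flows(flows):
    """Map each (host, port) pair to True (reachable) or False (blocked or down)."""
    return {(host, port): tcp_reachable(host, port) for host, port in flows}
```

For example, from an Itential Platform server: `check_flows([("mongo1.example.com", 27017), ("redis1.example.com", 6379), ("redis1.example.com", 26379)])`. A `False` result means the port is blocked, the service is not listening, or the name does not resolve.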

Hardware requirements

Processor

Processor specification requirements:

  • Second generation or better Intel Xeon Platinum 8000 series processors
  • Third generation or better AMD EPYC 7000 series processors

Memory

Memory specification requirement:

  • DDR5 DRAM 3200 MHz or higher

Storage

Storage performance requirements (IOPS measured at a 16 KiB block size):

  • 20000+ IOPS
  • Non-spinning media (SSD, NVMe)
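Storage vendors often quote throughput rather than IOPS. The arithmetic below converts the IOPS requirement into the data rate it implies at the stated 16 KiB block size, which is useful when comparing against a datasheet; the function name is illustrative.

```python
def throughput_mib_per_s(iops: int, block_kib: int = 16) -> float:
    """Data rate implied by an IOPS figure at a fixed block size, in MiB/s."""
    return iops * block_kib / 1024  # KiB/s converted to MiB/s


if __name__ == "__main__":
    print(throughput_mib_per_s(20000))  # 312.5
```

At 20,000 IOPS of 16 KiB each, the storage must sustain roughly 312.5 MiB/s.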

Network

Network speed requirement:

  • 10 Gbps or higher

In some deployments, you can add dedicated network interfaces that route specific traffic to specific external systems. This routing is configured at the OS level (custom interfaces and routes) and must be managed by the system administrator. For example, NSO traffic could be separated from traffic destined for Redis and MongoDB.

Hypervisor/host OS settings

These settings are strongly recommended for high load applications of Itential Platform:

  • CPU affinity settings or similar functionality to prevent CPU starvation
  • Full memory reservation
  • One physical CPU per VM is preferred
  • Huge pages enabled for memory (except on MongoDB servers)
  • Memory compression disabled
  • Minimum CPU allocation (reservation) for the scheduler, set according to the CPU clock speed

Example: for an Itential Platform VM with 16 vCPUs on a server with a nominal clock speed of 2.5 GHz:

CPU clock reservation = 16 vCPU × 2.5 GHz = 40 GHz

Follow your hypervisor's recommendations when configuring CPU reservations. In most cases, the total CPU reservation for all VMs on a host cannot exceed 90% of the host's capacity, because 10% is reserved by the host itself.
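The reservation formula and the 90% ceiling can be combined into a quick capacity check when planning which VMs share a host. This is a sketch that assumes host capacity is physical cores multiplied by the nominal clock speed; the function names and the 48-core host are hypothetical.

```python
def reservation_ghz(vcpus: int, clock_ghz: float) -> float:
    """CPU clock reservation for one VM: vCPUs x nominal clock speed."""
    return vcpus * clock_ghz


def fits_on_host(reservations_ghz, host_cores: int, clock_ghz: float,
                 host_share: float = 0.10) -> bool:
    """True if total reservations stay within the capacity left after the host's share."""
    usable_ghz = host_cores * clock_ghz * (1 - host_share)
    return sum(reservations_ghz) <= usable_ghz


if __name__ == "__main__":
    platform_vm = reservation_ghz(16, 2.5)  # 40 GHz, as in the example
    print(fits_on_host([platform_vm, platform_vm], host_cores=48, clock_ghz=2.5))
```

With a 48-core, 2.5 GHz host (120 GHz total, 108 GHz usable), two 40 GHz Itential Platform reservations fit but three do not.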

MongoDB discourages the use of Transparent Huge Pages (THP), so do not enable them on MongoDB servers.
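To confirm the THP setting on a MongoDB host, read `/sys/kernel/mm/transparent_hugepage/enabled`, where the kernel shows the active mode in brackets (for example `always madvise [never]`). The sketch below parses that value; the helper names are hypothetical, and making the setting persistent (for example with the `transparent_hugepage=never` kernel boot parameter) is left to your OS configuration process.

```python
THP_PATH = "/sys/kernel/mm/transparent_hugepage/enabled"


def active_thp_mode(text: str) -> str:
    """Return the bracketed (active) mode from the kernel's THP setting."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active mode found in {text!r}")


def thp_enabled(path: str = THP_PATH) -> bool:
    """True unless THP is set to 'never' ('always' and 'madvise' both allow THP)."""
    with open(path) as handle:
        return active_thp_mode(handle.read()) != "never"
```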

Server specifications

For production environments, install each Itential Platform component on its own server to properly support high availability (HA). Disk paths that reference pronghorn (seen in older deployments) should be changed to itential.

Itential Platform server

Spec | Requirement | Production ENV
CPU | 64-bit x86 CPU cores | 16
OS | RHEL | 8/9
OS | Rocky | 8/9
RAM | DDR5 DRAM 3200 MHz | 64 GB
Disk (Solid State Media, SSD, NVMe) | Total | 250 GB
Disk | /var/log/itential | 100 GB
Disk | /opt/itential | 100 GB
Disk | / | 50 GB

MongoDB server

Spec | Requirement | Production ENV
CPU | 64-bit x86 CPU cores | 16
OS | RHEL | 8/9
OS | Rocky | 8/9
RAM | DDR5 DRAM 3200 MHz | 128 GB
Disk (Solid State Media, SSD, NVMe) | Total | 1000 GB
Disk | /var/log/mongodb | 100 GB
Disk | /var/lib/mongo | 850 GB
Disk | / | 50 GB

Redis server

Spec | Requirement | Production ENV
CPU | 64-bit x86 CPU cores | 8
OS | RHEL | 8/9
OS | Rocky | 8/9
RAM | DDR5 DRAM 3200 MHz | 32 GB
Disk (Solid State Media, SSD, NVMe) | Total | 100 GB
Disk | /var/log/redis | 10 GB
Disk | /var/lib/redis | 50 GB
Disk | / | 40 GB

IAG server

Spec | Requirement | Production ENV
CPU | 64-bit x86 CPU cores | 16
OS | RHEL | 8/9
OS | Rocky | 8/9
RAM | DDR5 DRAM 3200 MHz | 32 GB
Disk (Solid State Media, SSD, NVMe) | Total | 80 GB
Disk | /var/log/automation-gateway | 10 GB
Disk | /var/lib/automation-gateway | 50 GB
Disk | /opt/automation-gateway | 10 GB
Disk | / | 10 GB
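After provisioning, the disk rows in these tables can be checked on each host. The sketch below compares the size of the filesystem holding each path against the table minimums (shown here for the Itential Platform server). It assumes the tables use decimal gigabytes, and because `shutil.disk_usage` reports the filesystem that contains a path, it does not confirm that each path is a separate partition.

```python
import shutil

GB = 1000 ** 3  # assumption: the spec tables use decimal gigabytes

# Minimum sizes (GB) copied from the Itential Platform server table.
ITENTIAL_PLATFORM_DISKS = {"/var/log/itential": 100, "/opt/itential": 100, "/": 50}


def disk_shortfalls(requirements):
    """Return (path, required_gb, actual_gb) for each path that is missing or too small.

    actual_gb is None when the path does not exist on this host.
    """
    problems = []
    for path, required_gb in requirements.items():
        try:
            total_bytes = shutil.disk_usage(path).total
        except FileNotFoundError:
            problems.append((path, required_gb, None))
            continue
        if total_bytes < required_gb * GB:
            problems.append((path, required_gb, round(total_bytes / GB, 1)))
    return problems


if __name__ == "__main__":
    for path, need, have in disk_shortfalls(ITENTIAL_PLATFORM_DISKS):
        found = have if have is not None else "missing path"
        print(f"{path}: need {need} GB, found {found}")
```

The same function works for the MongoDB, Redis, and IAG servers by passing a dictionary built from their tables.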

Troubleshoot

Troubleshoot common issues related to high availability architecture.

Identify failed task servers

If task execution fails in an HA Itential Platform environment, you can determine the specific Itential Platform server that attempted to execute the task by referencing the Server ID of the failed task.

Server ID configuration

Each Itential Platform server in an HA environment has a unique Server ID property that is defined in one of two ways:

  • Manually via the serverName property of the properties.json file located in the Itential Platform installation directory
  • Automatically by combining the MAC address and Itential Platform port values and hashing them

When a server attempts to execute a task, its Server ID is added to the Task Details panel of that task. For a failed task, use this Server ID to identify the server that attempted it, then verify that the server has no connection issues.
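To match a Server ID from the Task Details panel to a host, it helps to know which servers set `serverName` manually. The sketch below reads it from `properties.json` in the installation directory; it assumes `serverName` is a top-level key, and the function name and install path are hypothetical. A `None` result means the server relies on the automatically generated ID (MAC address and port hashed together), which does not appear in the file.

```python
import json
from pathlib import Path


def configured_server_name(install_dir: str):
    """Return serverName from properties.json, or None if it is not set.

    Assumes serverName is a top-level key in properties.json.
    """
    properties = Path(install_dir) / "properties.json"
    with properties.open() as handle:
        return json.load(handle).get("serverName")
```

Running `configured_server_name("/path/to/itential")` on each node and recording the results gives a simple Server ID to host map to consult when a task fails.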