Active-standby architecture

An active/standby architecture (ASA) is an Itential architecture where all components are redundant and can gracefully tolerate at least one catastrophic failure while providing redundancy for the primary data center. This architecture is the recommended architecture for production environments that must adhere to strict business continuity and uptime demands. It builds on the HA2 architecture, essentially using two HA2 installs in geographically redundant locations with a larger MongoDB replica set that is also geographically redundant.

Architecture overview

Because the Itential Platform application performs many reads and writes against the database and is sensitive to high latency, all active components must run in the same data center. The MongoDB replication process ensures that data written to the primary node in the active data center replicates to the geographically redundant MongoDB nodes in the secondary data center. All components must have authentication enabled.

The minimum ASA architecture is composed of 16 VMs:

  • Four Itential Platform servers
  • Five MongoDB servers
  • Six Redis servers
  • One IAG server
Figure: Itential Highly Available Architecture (four Platform servers, five MongoDB servers, six Redis servers, and one IAG server across geographically redundant data centers)

Highly available Itential Platform

Itential Platform instances communicate with one another through Redis and share data through MongoDB. Adding a new Itential Platform node and pointing it to the correct MongoDB and Redis deployments is sufficient to achieve high availability. Each node can perform work as soon as it is added and configured.

Itential Platforms must have the following configurations:

  • MongoDB connection strings must contain a reference to all members of the replica set
  • Redis configurations must specify the list of all known Redis Sentinels and their Sentinel username and password (connections to HA Redis occur through Sentinels, not directly to Redis)
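The two connection requirements above can be sketched in Python. This is a minimal illustration, assuming hypothetical hostnames (`mongo1` to `mongo5`, `redis1` to `redis6`), a replica set named `rs0`, and the default ports from the network requirements table. It builds a MongoDB connection string that lists every replica set member, checks that no member is missing, and lists the Sentinel addresses. It does not reproduce the exact Redis section of an Itential Platform profile.

```python
from urllib.parse import urlencode


def build_mongo_uri(members: list[str], replica_set: str, port: int = 27017) -> str:
    """Return a MongoDB connection string that names every replica set member."""
    if not members:
        raise ValueError("at least one replica set member is required")
    hosts = ",".join(f"{host}:{port}" for host in members)
    return f"mongodb://{hosts}/?{urlencode({'replicaSet': replica_set})}"


def sentinel_addresses(hosts: list[str], port: int = 26379) -> list[tuple[str, int]]:
    """Return a (host, port) pair for every known Redis Sentinel."""
    return [(host, port) for host in hosts]


def missing_members(uri: str, expected: list[str]) -> list[str]:
    """List expected hosts absent from the host section of a mongodb:// URI.

    Assumes the URI carries no inline credentials (user:pass@).
    """
    host_section = uri.split("://", 1)[1].split("/", 1)[0]
    present = {entry.rsplit(":", 1)[0] for entry in host_section.split(",")}
    return [host for host in expected if host not in present]


# Hypothetical hosts matching the five MongoDB and six Redis servers.
MONGO_HOSTS = [f"mongo{i}" for i in range(1, 6)]
SENTINEL_HOSTS = [f"redis{i}" for i in range(1, 7)]

if __name__ == "__main__":
    uri = build_mongo_uri(MONGO_HOSTS, "rs0")
    print(uri)
    print(missing_members(uri, MONGO_HOSTS))  # [] when every member is listed
    print(sentinel_addresses(SENTINEL_HOSTS))
```

Running `missing_members` against the URL in an existing profile is a quick way to catch a connection string that still lists only part of the replica set.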

Configure standby site servers

Applies to Itential Automation Platform 2023.1 and later.

In an ASA, the secondary data center (standby site) contains Itential Platform servers that you must configure to remain passive until a failover event occurs. This configuration prevents both data centers from processing workloads simultaneously.

Configure all Itential Platform servers in the standby site with the following settings.

Disable task worker and job processing

Disable task worker and job start on all Itential Platform servers in the standby site. This prevents standby servers from claiming workflow tasks or starting jobs, which ensures only the active site processes workloads.

Method 1: Pre-configure properties.json

Set the following properties in the properties.json file before you start the Itential Platform:

  • processTasksOnStart: false - Prevents the Task Worker from processing tasks on startup
  • processJobsOnStart: false - Prevents jobs from starting on startup

File location: /opt/pronghorn/current/properties.json

Standby site configuration:

```json
{
  "processTasksOnStart": false,
  "processJobsOnStart": false,
  "pathProps": {
    "sdk_dir": "/opt/pronghorn-applications",
    "encrypted": true
  },
  "id": "StandbyProfile",
  "mongoProps": {
    "credentials": {
      "passwd": "itentialPassword",
      "user": "itentialUser"
    },
    "db": "pronghorn",
    "url": "mongodb://mongo1:27017,mongo2:27017,mongo3:27017,mongo4:27017,mongo5:27017/?replicaSet=rs0"
  }
}
```

Active site configuration (for comparison):

```json
{
  "processTasksOnStart": true,
  "processJobsOnStart": true,
  "pathProps": {
    "sdk_dir": "/opt/pronghorn-applications",
    "encrypted": true
  },
  "id": "ActiveProfile",
  "mongoProps": {
    "credentials": {
      "passwd": "itentialPassword",
      "user": "itentialUser"
    },
    "db": "pronghorn",
    "url": "mongodb://mongo1:27017,mongo2:27017,mongo3:27017,mongo4:27017,mongo5:27017/?replicaSet=rs0"
  }
}
```

Pre-configuring these settings in properties.json is the recommended approach for standby sites.
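In the two examples above, the standby and active profiles differ only in the two processing flags and the profile `id`, so one way to keep the sites consistent is to derive the standby file from the active one. The following Python sketch assumes the JSON layout shown above; the function names and file paths are hypothetical. It sets `processTasksOnStart` and `processJobsOnStart` to `false`, sets a new `id`, and refuses to write a profile that would still let a standby server claim tasks or start jobs.

```python
import copy
import json

# The two flags that must be false on every standby-site server.
STANDBY_FLAGS = ("processTasksOnStart", "processJobsOnStart")


def make_standby_profile(active: dict, standby_id: str = "StandbyProfile") -> dict:
    """Return a copy of an active profile with task and job processing disabled."""
    standby = copy.deepcopy(active)  # never mutate the active profile
    for flag in STANDBY_FLAGS:
        standby[flag] = False
    standby["id"] = standby_id
    return standby


def standby_problems(profile: dict) -> list[str]:
    """List flags that are not explicitly false.

    Missing flags are reported too, because this sketch does not assume a default.
    """
    return [flag for flag in STANDBY_FLAGS if profile.get(flag) is not False]


def write_standby_file(active_path: str, standby_path: str) -> None:
    """Read an active properties.json and write the derived standby profile."""
    with open(active_path) as f:
        standby = make_standby_profile(json.load(f))
    problems = standby_problems(standby)
    if problems:
        raise ValueError(f"standby profile still enables: {problems}")
    with open(standby_path, "w") as f:
        json.dump(standby, f, indent=2)
```

`standby_problems` can also be run on its own against the properties.json already deployed on each standby server as a pre-start check.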

Method 2: Use UI toggles

If the Itential Platform is already running, you can disable task worker and job start through the UI.

2. Toggle off job processing: under Operation Execution, toggle off Accept New Jobs and Execute Job Tasks.

Figure: Job and task toggle confirmation

Stop Operations Manager

Stop Operations Manager to prevent the standby site from executing scheduled triggers, API triggers, and event triggers. If Operations Manager runs on both sites, triggers fire twice and cause duplicate workflow executions.

2. Locate Operations Manager in the applications list.
3. Stop the application: click Stop.
4. Verify the stopped status: confirm that the application status shows stopped, which is indicated by the Play icon.

Figure: Operations Manager stopped state

Configure these settings on all Active/Standby deployments running version 2023.1 or later.

Highly available MongoDB

MongoDB replica sets operate in a primary/secondary model: data written to the primary replicates to the secondaries. If the primary MongoDB node fails, the replica set detects the failure and holds an election for a new primary. Until a new primary is elected, usually within a few seconds, the replica set cannot accept writes, and reads that require the primary also fail. Once a new primary is elected, the Itential Platform application resumes normal operation. Operators do not need to take action during this election.

To prevent a split-brain scenario during an election, this architecture requires an odd number of voting members split across three data centers or regions: two in the primary region, two in the secondary region, and one in a tertiary region. If any single region is lost, at least three of the five voting members remain, which is still a majority, so the replica set can elect a new primary. The replica set configuration must also use member priorities to influence the election and ensure that the primary shifts to the secondary region in the case of a disaster.
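The voting arithmetic behind the 2/2/1 layout can be checked directly: a replica set can elect a primary only while a strict majority of its voting members is reachable. The following Python sketch uses the region names from this section as labels, removes one region at a time, and reports whether the remaining voters still form a majority. The two-region split at the end is a hypothetical contrast case, not part of this architecture.

```python
# Voting members per region in the 2/2/1 layout described above.
LAYOUT = {"primary": 2, "secondary": 2, "tertiary": 1}


def majority(voters: int) -> int:
    """Smallest vote count that is a strict majority of `voters`."""
    return voters // 2 + 1


def survives_region_loss(layout: dict[str, int]) -> dict[str, bool]:
    """For each region, report whether the remaining voters still form a majority."""
    total = sum(layout.values())
    needed = majority(total)
    return {region: total - count >= needed for region, count in layout.items()}


if __name__ == "__main__":
    print(majority(sum(LAYOUT.values())))          # 3
    print(survives_region_loss(LAYOUT))            # True for every region
    # Hypothetical two-region 3/2 split, for contrast.
    print(survives_region_loss({"a": 3, "b": 2}))  # {'a': False, 'b': True}
```

The contrast case shows why the third region matters: with only two regions, losing the larger one leaves a minority that cannot elect a primary.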

Itential’s MongoDB replica set must meet the following requirements:

  • All replica set members must be defined in the Itential Platform config
  • Authentication between the replica members must be done with either a shared key or X.509 certificate
  • The database must have an admin user able to perform any operation
  • The database must have an “itential” user that is granted the least amount of privileges required by the Itential Platform application (Itential Platform must be configured to use this user account)
  • The replica set configuration must leverage the priority settings to influence voting as follows:

| MongoDB Node | Priority Setting |
| --- | --- |
| Primary Region Database 1 | 10 |
| Primary Region Database 2 | 10 |
| Secondary Region Database 1 | 5 |
| Secondary Region Database 2 | 5 |
| Tertiary Region Database 3 | 1 |
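The priority table maps onto the `members` array of a MongoDB replica set configuration document, the document passed to `rs.initiate()` or `rs.reconfig()` in `mongosh` (or to `client.admin.command("replSetInitiate", config)` with PyMongo). The Python sketch below builds that document with hypothetical hostnames and the replica set name `rs0`, then shows which surviving member has the highest priority after a region is lost. Priority expresses preference only: an election still needs a voting majority, and a lower-priority member may briefly become primary before a higher-priority member takes over.

```python
# Hypothetical hosts; regions and priorities follow the table above.
MEMBERS = [
    ("mongo1:27017", "primary", 10),
    ("mongo2:27017", "primary", 10),
    ("mongo3:27017", "secondary", 5),
    ("mongo4:27017", "secondary", 5),
    ("mongo5:27017", "tertiary", 1),
]


def replica_set_config(name: str, members: list[tuple[str, str, int]]) -> dict:
    """Build a replica set configuration document with one entry per member."""
    return {
        "_id": name,
        "members": [
            {"_id": i, "host": host, "priority": priority}
            for i, (host, _region, priority) in enumerate(members)
        ],
    }


def preferred_primary(members: list[tuple[str, str, int]], lost_region: str) -> str:
    """Highest-priority host outside the lost region (the first one wins a tie)."""
    survivors = [m for m in members if m[1] != lost_region]
    return max(survivors, key=lambda m: m[2])[0]


if __name__ == "__main__":
    print(replica_set_config("rs0", MEMBERS))
    print(preferred_primary(MEMBERS, "primary"))  # mongo3:27017
```

If the primary region is lost, either secondary-region member (priority 5) is preferred over the tertiary member (priority 1), which is the behavior the table is designed to produce.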


Highly available Redis

Redis operates in a primary/replica model: data written to the primary replicates to the replicas. If the primary Redis node fails, the Redis Sentinels detect the failure and promote one of the replicas to primary. Until the new primary is promoted, usually within a few seconds, Redis may not accept reads and writes. Once a new primary is identified, the Itential Platform application resumes normal operation. Operators do not need to take action during this failover.

Itential’s Redis deployment must meet the following requirements:

  • All Redis nodes must be defined in the Itential Platform profile configuration
  • Authentication between the replica members is done with users defined in the Redis config file
  • Redis must have an admin user able to perform any operation
  • Redis must have an “itential” user that is granted the least amount of privileges required by the application (Itential Platform must be configured to use this user account)
  • Redis must have a replication user that is granted the least amount of privileges required by the replication process
  • Redis Sentinel must be included to monitor the Redis cluster and must be colocated with Redis
  • Redis Sentinel must have an admin user able to perform any Sentinel task
  • Redis nodes must maintain a low latency connection between nodes to avoid replication failures
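The replication and Sentinel requirements translate into a few standard Redis directives: `replicaof`, `masteruser`, and `masterauth` in `redis.conf` on each replica, and `sentinel monitor`, `sentinel auth-user`, and `sentinel auth-pass` in `sentinel.conf`. The Python sketch below renders those lines. The master name `itentialmaster`, the addresses, the quorum value, and the passwords are hypothetical placeholders; choose the quorum for your own Sentinel layout. IP addresses are used because Sentinel resolves hostnames only when `sentinel resolve-hostnames yes` is set.

```python
def replica_conf(primary_ip: str, repl_user: str, repl_password: str,
                 port: int = 6379) -> str:
    """redis.conf lines for a replica that authenticates to the primary as the replication user."""
    return "\n".join([
        f"replicaof {primary_ip} {port}",
        f"masteruser {repl_user}",
        f"masterauth {repl_password}",
    ])


def sentinel_conf(master_name: str, primary_ip: str, quorum: int,
                  monitor_user: str, monitor_password: str, port: int = 6379) -> str:
    """sentinel.conf lines that monitor the primary using a dedicated ACL user."""
    return "\n".join([
        f"sentinel monitor {master_name} {primary_ip} {port} {quorum}",
        f"sentinel auth-user {master_name} {monitor_user}",
        f"sentinel auth-pass {master_name} {monitor_password}",
    ])


if __name__ == "__main__":
    # Placeholder values; 192.0.2.0/24 is a documentation-only address range,
    # and the quorum of 2 is a placeholder, not a recommendation.
    print(replica_conf("192.0.2.11", "repluser", "CHANGE_ME"))
    print(sentinel_conf("itentialmaster", "192.0.2.11", 2, "sentineluser", "CHANGE_ME"))
```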

For more information, see Redis Replication documentation.

Required user accounts

The validated designs are opinionated installations of Itential and its dependencies. The following user accounts are required by the dependencies.

MongoDB

| Account | Description |
| --- | --- |
| admin | Has full root access to the mongo database. Can read and write to any logical database. Can be used to issue admin commands like forcing an election and configuring replica sets. This is NOT used by the Itential application but is created for admin purposes. |
| itential | Has read and write access to the "itential" database only. This is the account used by the Itential Platform application. |
| localaaa | Has read and write access to the "LocalAAA" database. This is used by the Local AAA adapter for local, non-LDAP logins. |
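The three MongoDB accounts can be expressed as `createUser` command documents using MongoDB built-in roles: `root` for `admin`, and `readWrite` scoped to a single database for `itential` and `localaaa`. The following Python sketch builds those documents; the passwords are placeholders. The database names come from the table above; if your profile's `db` value differs (the sample properties.json on this page uses `pronghorn`), pass that name as `app_db`. With PyMongo, each document could be run against the `admin` database with `client.admin.command(doc)`; in `mongosh`, the equivalent is `db.getSiblingDB("admin").runCommand(doc)`.

```python
def create_user_command(user: str, password: str, roles: list[dict]) -> dict:
    """Build a createUser command document; the command name must be the first key."""
    return {"createUser": user, "pwd": password, "roles": roles}


def itential_accounts(passwords: dict[str, str], app_db: str = "itential") -> list[dict]:
    """createUser documents for the three accounts in the table above."""
    return [
        # Full root access, for administrators only.
        create_user_command("admin", passwords["admin"],
                            [{"role": "root", "db": "admin"}]),
        # Read and write on the application database only.
        create_user_command("itential", passwords["itential"],
                            [{"role": "readWrite", "db": app_db}]),
        # Read and write on the LocalAAA database used by the Local AAA adapter.
        create_user_command("localaaa", passwords["localaaa"],
                            [{"role": "readWrite", "db": "LocalAAA"}]),
    ]
```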

Redis

| Account | Description |
| --- | --- |
| admin | Has full root access to the Redis database, all channels, all keys, all commands. This is NOT used by the Itential application but is created for admin purposes. |
| itential | Has full access to the Redis database, all channels, all keys, EXCEPT the following commands: asking, cluster, readonly, readwrite, bgrewriteaof, bgsave, failover, flushall, flushdb, psync, replconf, replicaof, save, shutdown, sync. This is the account used by the Itential Platform application. |
| repluser | Has access to the minimum set of commands to perform replication: psync, replconf, ping. |
| admin (Sentinel) | Full root access to Redis Sentinel. This is NOT used by the Itential application but is created for admin purposes of Redis Sentinel. |
| sentineluser | Has access to the minimum set of commands to perform sentinel monitoring: multi, slaveof, ping, exec, subscribe, `config |
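The Redis account descriptions map onto ACL rules in the `user <name> on >password <rules>` form accepted in an ACL file or in `redis.conf`. The Python sketch below renders rules for `admin`, `itential`, and `repluser` from the command lists in the table; the passwords are placeholders. The `sentineluser` row's command list is not reproduced in full above, so that user is not generated here.

```python
# Commands denied to the application user, copied from the table above.
ITENTIAL_DENIED = [
    "asking", "cluster", "readonly", "readwrite", "bgrewriteaof", "bgsave",
    "failover", "flushall", "flushdb", "psync", "replconf", "replicaof",
    "save", "shutdown", "sync",
]
# Minimum commands needed by the replication user.
REPLICATION_COMMANDS = ["psync", "replconf", "ping"]


def acl_rule(user: str, password: str, rules: list[str]) -> str:
    """Render one enabled ACL user with a single password and the given rules."""
    return " ".join(["user", user, "on", f">{password}", *rules])


def redis_acl_rules(passwords: dict[str, str]) -> list[str]:
    """ACL lines for the admin, itential, and repluser accounts."""
    return [
        # All keys (~*), all channels (&*), all commands.
        acl_rule("admin", passwords["admin"], ["~*", "&*", "+@all"]),
        # Everything, then subtract the denied commands.
        acl_rule("itential", passwords["itential"],
                 ["~*", "&*", "+@all", *(f"-{c}" for c in ITENTIAL_DENIED)]),
        # Only the replication commands; new users start with no commands.
        acl_rule("repluser", passwords["repluser"],
                 [f"+{c}" for c in REPLICATION_COMMANDS]),
    ]
```

Rules are applied left to right, which is why the `itential` line grants `+@all` first and then removes individual commands.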

Network requirements

In an environment where components are installed on more than one host, the following network traffic flows need to be allowed. All ports and networking specs are TCP protocol unless otherwise noted. Not all ports will need to be open for every supported architecture. Secure ports are only required when explicitly configured.

| Source | Destination | Port | Description |
| --- | --- | --- | --- |
| Desktop Devices | Itential Platform | 3000 | Web browser connections to Itential Platform over HTTP |
| Desktop Devices | Itential Platform | 3443 | Web browser connections to Itential Platform over HTTPS |
| Desktop Devices | IAG | 8083 | Web browser connections to IAG over HTTP |
| Desktop Devices | IAG | 8443 | Web browser connections to IAG over HTTPS |
| Desktop Devices | HashiCorp Vault | 8200 | Web browser connections to HashiCorp Vault |
| Itential Platform | MongoDB | 27017 | Itential Platform connects to MongoDB |
| Itential Platform | Redis | 6379 | Itential Platform connects to Redis |
| Itential Platform | Redis | 26379 | Itential Platform connects to Redis Sentinel (HA installations only) |
| Itential Platform | IAG | 8083 | Itential Platform connects to IAG over HTTP |
| Itential Platform | IAG | 8443 | Itential Platform connects to IAG over HTTPS |
| Itential Platform | HashiCorp Vault | 8200 | Itential Platform connects to HashiCorp Vault |
| Itential Platform | LDAP | 389 | Itential Platform connects to LDAP (when LDAP adapter is used for authentication) |
| Itential Platform | LDAP | 636 | Itential Platform connects to LDAP with TLS (when LDAP adapter is used for authentication) |
| Itential Platform | RADIUS | 1812 | Itential Platform connects to RADIUS (when RADIUS adapter is used for authentication; uses UDP) |
| MongoDB | MongoDB | 27017 | Each MongoDB talks to other MongoDBs for replication (HA installations only) |
| Redis | Redis | 6379 | Each Redis talks to other Redis sources for replication (HA installations only) |
| Redis | Redis | 26379 | Each Redis uses Redis Sentinel to monitor the Redis processes (HA installations only) |
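Before installing, it can help to confirm that the required TCP flows are open from each source host. The following Python sketch uses only the standard library to attempt a TCP connection to each destination and port; the hostnames in `PLATFORM_FLOWS` are hypothetical placeholders for your own servers. It checks TCP only, so it cannot verify the UDP RADIUS flow, and a successful connection shows only that something is listening, not that it is the expected service.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or name resolution failed
        return False


# Hypothetical destinations checked from an Itential Platform server.
PLATFORM_FLOWS = [
    ("mongo1", 27017),
    ("redis1", 6379),
    ("redis1", 26379),
    ("iag1", 8443),
]


def unreachable(flows: list[tuple[str, int]], timeout: float = 3.0) -> list[tuple[str, int]]:
    """Return the (host, port) pairs that did not accept a TCP connection."""
    return [(host, port) for host, port in flows if not tcp_port_open(host, port, timeout)]


if __name__ == "__main__":
    print(unreachable(PLATFORM_FLOWS))
```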

Hardware requirements

Processor

Processor specification requirements:

  • Second generation or better Intel Xeon Platinum 8000 series processors
  • Third generation or better AMD EPYC 7000 series processors

Memory

Memory specification requirement:

  • DDR5 DRAM 3200 MHz or higher

Storage

Storage performance requirements (IOPS measured at a 16 KiB block size):

  • 20000+ IOPS
  • Non-spinning media (SSD, NVMe)

Network

Network speed requirement:

  • 10 Gbps or higher

In some deployments, you can add dedicated network interfaces that route specific traffic to specific external systems. This routing is configured at the OS level (custom interfaces and routes) and must be managed by the system administrator. For example, you might separate NSO traffic from traffic destined for Redis and MongoDB.

Hypervisor/host OS settings

These settings are strongly recommended for high load applications of Itential Platform:

  • CPU affinity settings or similar functionality to prevent CPU starvation
  • Full memory reservation
  • One physical CPU per VM is preferred
  • Huge pages enabled for memory (except on MongoDB servers)
  • Memory compression disabled
  • Minimum CPU reservation for the scheduler, sized according to the CPU clock speed

Example: For an Itential Platform VM with 16 vCPUs on a server with a nominal speed of 2.5 GHz:

CPU clock reservation = 16 vCPU × 2.5 GHz = 40 GHz

Follow hypervisor recommendations when performing CPU reservations. In most cases the total of all CPU reservations for all VMs on a host cannot be more than 90% of the host capacity as 10% is reserved by the host itself.
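The reservation arithmetic can be checked per host. The following Python sketch computes each VM's reservation as vCPUs × nominal clock, as in the example above, and compares the total with 90% of host capacity. The host size (64 physical cores at 2.5 GHz) and the VM mix are hypothetical; use your hypervisor's own figures and headroom rules.

```python
def vm_reservation_ghz(vcpus: int, clock_ghz: float) -> float:
    """CPU clock reservation for one VM: vCPUs x nominal clock speed."""
    return vcpus * clock_ghz


def fits_on_host(vm_vcpus: list[int], clock_ghz: float, host_cores: int,
                 usable_fraction: float = 0.9) -> tuple[float, float, bool]:
    """Return (reserved GHz, usable GHz, fits) for a set of VMs on one host.

    usable_fraction=0.9 reflects the roughly 10% the host keeps for itself.
    """
    reserved = sum(vm_reservation_ghz(v, clock_ghz) for v in vm_vcpus)
    usable = host_cores * clock_ghz * usable_fraction
    return reserved, usable, reserved <= usable


if __name__ == "__main__":
    # Hypothetical host: 64 physical cores at 2.5 GHz nominal.
    print(vm_reservation_ghz(16, 2.5))          # 40.0
    print(fits_on_host([16, 16, 16], 2.5, 64))  # (120.0, 144.0, True)
    print(fits_on_host([16] * 4, 2.5, 64))      # (160.0, 144.0, False)
```

The last case shows that four fully reserved 16-vCPU VMs exceed the 90% limit on this hypothetical host even though they match its core count.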

MongoDB discourages the use of Transparent Huge Pages.
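On Linux, the current Transparent Huge Pages (THP) mode is exposed in `/sys/kernel/mm/transparent_hugepage/enabled`, with the active value in brackets (for example, `always madvise [never]`). The Python sketch below reads that file and reports the active mode so you can confirm the setting on each MongoDB server. How you persist the setting (kernel boot parameter, tuned profile, or systemd unit) depends on your OS build and is not shown; check the MongoDB production notes for the version you run.

```python
from pathlib import Path
from typing import Optional

THP_PATH = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def active_thp_mode(text: str) -> str:
    """Return the bracketed (active) value, e.g. 'never' from 'always madvise [never]'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active THP mode found in {text!r}")


def current_thp_mode(path: Path = THP_PATH) -> Optional[str]:
    """Read the live THP mode, or return None if the file does not exist."""
    try:
        return active_thp_mode(path.read_text())
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    print(current_thp_mode())
```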

Server specifications

For production environments, all Itential Platform components should be installed on their own individual servers to properly support High Availability (HA). Disk references to pronghorn (seen in older deployments) should be changed to itential.

Itential Platform server

| Spec | Requirement | Production ENV |
| --- | --- | --- |
| CPU | 64-bit x86 CPU cores | 16 |
| OS | RHEL | 8/9 |
| | Rocky | 8/9 |
| RAM | DDR5 DRAM 3200 MHz | 64 GB |
| Disk (Solid State Media, SSD, NVMe) | Total | 250 GB |
| | /var/log/itential | 100 GB |
| | /opt/itential | 100 GB |
| | / | 50 GB |

MongoDB server

| Spec | Requirement | Production ENV |
| --- | --- | --- |
| CPU | 64-bit x86 CPU cores | 16 |
| OS | RHEL | 8/9 |
| | Rocky | 8/9 |
| RAM | DDR5 DRAM 3200 MHz | 128 GB |
| Disk (Solid State Media, SSD, NVMe) | Total | 1000 GB |
| | /var/log/mongodb | 100 GB |
| | /var/lib/mongo | 850 GB |
| | / | 50 GB |

Redis server

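| Spec | Requirement | Production ENV |
| --- | --- | --- |
| CPU | 64-bit x86 CPU cores | 8 |
| OS | RHEL | 8/9 |
| | Rocky | 8/9 |
| RAM | DDR5 DRAM 3200 MHz | 32 GB |
| Disk (Solid State Media, SSD, NVMe) | Total | 100 GB |
| | /var/log/redis | 10 GB |
| | /var/lib/redis | 50 GB |
| | / | 40 GB |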

IAG server

| Spec | Requirement | Production ENV |
| --- | --- | --- |
| CPU | 64-bit x86 CPU cores | 16 |
| OS | RHEL | 8/9 |
| | Rocky | 8/9 |
| RAM | DDR5 DRAM 3200 MHz | 32 GB |
| Disk (Solid State Media, SSD, NVMe) | Total | 80 GB |
| | /var/log/automation-gateway | 10 GB |
| | /var/lib/automation-gateway | 50 GB |
| | /opt/automation-gateway | 10 GB |
| | / | 10 GB |
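The disk layouts above can be verified on each server before installation. The following Python sketch uses `shutil.disk_usage` to compare each path's file system size with the minimums from the Itential Platform server table; treating the table's GB as decimal gigabytes is an assumption. A path that is not its own mount point reports the size of the file system that contains it, so confirm the mounts themselves as well. To check another server type, swap in the paths and sizes from its table.

```python
import shutil
from typing import Callable, Optional

GB = 10**9  # assumption: table sizes are decimal gigabytes; use 2**30 for GiB

# Minimum sizes from the Itential Platform server table.
PLATFORM_PARTITIONS_GB = {
    "/var/log/itential": 100,
    "/opt/itential": 100,
    "/": 50,
}


def undersized(requirements: dict[str, int],
               usage: Callable = shutil.disk_usage) -> dict[str, Optional[float]]:
    """Map each path that is too small to its size in GB (None if the path is missing)."""
    short: dict[str, Optional[float]] = {}
    for path, minimum_gb in requirements.items():
        try:
            total_gb = usage(path).total / GB
        except FileNotFoundError:
            short[path] = None
            continue
        if total_gb < minimum_gb:
            short[path] = round(total_gb, 1)
    return short


if __name__ == "__main__":
    print(undersized(PLATFORM_PARTITIONS_GB))  # {} when every path is large enough
```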