Active-standby architecture

The active-standby architecture is Itential’s most resilient validated deployment model. It is designed for organizations that operate automation at a scale where unplanned downtime carries measurable business risk, whether that risk is regulatory, contractual, or operational.

This architecture provides geographic redundancy across three data centers and is capable of surviving the complete loss of any single data center without operator intervention. We recommend deploying this architecture if you have strict uptime requirements and formal business continuity programs.

This architecture requires significant infrastructure investment but delivers high resilience. The sections below describe the component footprint, hardware requirements, failover behavior, and operational expectations in detail so that infrastructure, operations, and security teams can assess readiness and plan accordingly.

Figure: Itential active-standby architecture. Three-data-center architecture with redundant Platform, Redis, and MongoDB nodes across active and standby sites.

How this compares to other Itential architectures

Architecture	Data Centers	Redundancy	Recommended For
Standalone	1	None	Development/Lab
High Availability (HA)	1	Component-level	Non-critical production
Active-standby	3	Full geographic redundancy	Mission-critical production

Architecture overview

An active-standby architecture (ASA) deploys Itential Platform, MongoDB, and Redis across three geographically distributed data centers with full redundancy. This architecture builds on two geographically redundant HA2 installations and uses a larger geographically redundant MongoDB replica set.

The Itential Platform performs frequent reads and writes to the database, so low latency between the active Platform instances and MongoDB is critical. All active components run in the same data center. MongoDB and Redis replication processes replicate data from the primary node in the active data center to secondary data centers. All components require authentication.

The minimum active-standby deployment requires 18 virtual machines (VMs) distributed across three data centers. The table below summarizes the full server inventory.

Component	Quantity	Per-VM RAM	Per-VM Storage	Notes
Itential Platform	4	64 GB	250 GB	2 active, 2 standby
MongoDB	5	128 GB	1,000 GB	2 primary DC, 2 secondary DC, 1 arbiter
Redis	4	32 GB	100 GB	2 per primary/secondary DC
Redis Sentinel	3	2 GB	20 GB	1 per data center
Itential Gateway	2	8 GB	20 GB	1 per primary/secondary DC
Total	18

All servers require solid-state storage (SSD or NVMe) capable of at least 20,000 IOPS and network connectivity of 10 Gbps or higher. For complete hardware specifications, see Server specifications.
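As a rough capacity check, the per-VM figures in the inventory table can be multiplied out to an aggregate RAM and storage footprint. The sketch below is arithmetic only; the row encoding and the function name `totals` are illustrative, and the result should be treated as a planning estimate.

```shell
#!/bin/sh
# Inventory rows as "component:quantity:ram_gb:storage_gb", taken from the table.
INVENTORY="platform:4:64:250 mongodb:5:128:1000 redis:4:32:100 sentinel:3:2:20 gateway:2:8:20"

# Print the total RAM and storage across all VMs in the inventory.
totals() {
  ram=0
  disk=0
  for row in $INVENTORY; do
    rest=${row#*:}        # quantity:ram:storage
    qty=${rest%%:*}
    rest=${rest#*:}       # ram:storage
    ram_gb=${rest%%:*}
    disk_gb=${rest#*:}
    ram=$((ram + qty * ram_gb))
    disk=$((disk + qty * disk_gb))
  done
  echo "ram_gb=$ram storage_gb=$disk"
}

totals
```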

In addition to the servers above, the architecture requires the following supporting infrastructure:

A Global Traffic Manager (GTM) load balancer for routing between data centers during failover
Local Traffic Manager (LTM) load balancers in each active data center
Network routing that permits inter-component traffic flows (see Network requirements)

Highly available Itential Platform

Itential Platform instances do not address one another directly; they coordinate through Redis. To achieve high availability, add Itential Platform nodes and configure each one to access the same MongoDB and Redis instances. Each instance begins performing work as soon as it is added and configured.

Configure Itential Platform instances with the following:

MongoDB connection strings that reference all members of the replica set
Redis configurations that specify the list of all known Redis Sentinels and their Sentinel username and password (connections use Redis Sentinels rather than direct Redis connections)
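The first requirement can be sketched as a small helper that joins every replica set member into one standard MongoDB connection string. The host names, the replica set name `rs0`, and the helper name `mongo_uri` are placeholders chosen for illustration, not values from a real deployment.

```shell
#!/bin/sh
# Build a MongoDB connection string that lists every replica set member.
# Usage: mongo_uri <replica-set-name> <database> <host:port>...
mongo_uri() {
  rs_name=$1
  db_name=$2
  shift 2
  hosts=""
  for member in "$@"; do
    # Join members with commas, without a leading comma.
    hosts="${hosts:+$hosts,}$member"
  done
  printf 'mongodb://%s/%s?replicaSet=%s\n' "$hosts" "$db_name" "$rs_name"
}

# Hypothetical host names for the five members across three data centers.
mongo_uri rs0 itential \
  mongo1.dc1.example.com:27017 mongo2.dc1.example.com:27017 \
  mongo1.dc2.example.com:27017 mongo2.dc2.example.com:27017 \
  mongo1.dc3.example.com:27017
```

Listing every member lets the driver discover the current primary even when the first host in the list is unreachable.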

Highly available databases

Both MongoDB and Redis use a primary/secondary replication model. When a primary node fails, the replica set initiates an election for a new primary. The replica set cannot accept reads and writes until the new primary is selected, typically within a few seconds. Once a new primary is identified, the Itential Platform resumes normal operation. Operators do not need to take action during elections.

MongoDB configuration

MongoDB clusters operate in a primary/secondary model where data written to the primary replicates to secondary nodes. To prevent split-brain scenarios during elections, the architecture requires an odd number of replica set members distributed across three data centers: 2 in the primary region, 2 in the secondary region, and 1 in a tertiary region. When the primary or secondary region is lost, three of the five voting members remain, which is still the majority needed to elect a primary. The replica set configuration must also set member priorities so that the primary MongoDB role shifts to the secondary region in the case of a disaster.

Configure Itential’s MongoDB cluster with the following requirements:

All replica set members must be defined in the Itential Platform config
Configure authentication between replica members using either a shared key or X.509 certificate
Create an admin user with full access to perform any operation
Create an “itential” user with least-privilege access to the Itential database only (configure Itential Platform to use this user account)
Use the priority settings to influence voting as follows:

MongoDB Node	Priority Setting
Primary Region Database 1	10
Primary Region Database 2	10
Secondary Region Database 1	5
Secondary Region Database 2	5
Tertiary Region Database 3	1
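To see how these priorities steer an election, the sketch below removes one region at a time, checks that a majority of the five members survives, and reports the highest priority left among the survivors. It is a simplified model under stated assumptions: real elections also depend on replication state and member health, and the region labels and function name are illustrative.

```shell
#!/bin/sh
# Members as "region:priority", using the values from the priority table.
MEMBERS="primary:10 primary:10 secondary:5 secondary:5 tertiary:1"

# Print the surviving member count and the highest surviving priority
# when the given region is lost.
after_loss() {
  lost=$1
  total=0
  alive=0
  best=0
  for m in $MEMBERS; do
    region=${m%%:*}
    prio=${m##*:}
    total=$((total + 1))
    if [ "$region" != "$lost" ]; then
      alive=$((alive + 1))
      if [ "$prio" -gt "$best" ]; then best=$prio; fi
    fi
  done
  majority=$((total / 2 + 1))
  if [ "$alive" -ge "$majority" ]; then
    echo "lost=$lost voters=$alive/$total majority=yes top_priority=$best"
  else
    echo "lost=$lost voters=$alive/$total majority=no"
  fi
}

for r in primary secondary tertiary; do after_loss "$r"; done
```

With the primary region removed, the highest remaining priority (5) belongs to the secondary region, which is the behavior the table is meant to produce.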

Redis configuration

To avoid single points of failure, arrange Redis data-bearing nodes in pairs across the primary and secondary data centers as a replica set. Place Sentinels in all 3 data centers so that a data center outage cannot reduce Sentinel availability below a majority. A majority of Sentinels must always be available for failover to work.
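The majority rule is easy to check for a given Sentinel layout. The following sketch compares one Sentinel per data center with a hypothetical layout that places two Sentinels in one data center and none in the third; the function name is illustrative.

```shell
#!/bin/sh
# Succeed only if a strict majority of Sentinels survives the loss of
# any single data center.
# Arguments: number of Sentinels in DC1, DC2, and DC3.
survives_any_dc_loss() {
  total=$(( $1 + $2 + $3 ))
  for in_lost_dc in "$1" "$2" "$3"; do
    alive=$((total - in_lost_dc))
    if [ "$alive" -le $((total / 2)) ]; then
      return 1
    fi
  done
  return 0
}

if survives_any_dc_loss 1 1 1; then
  echo "1/1/1 layout: majority survives any single data center loss"
fi
if ! survives_any_dc_loss 2 1 0; then
  echo "2/1/0 layout: losing DC1 leaves 1 of 3 Sentinels, no majority"
fi
```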

Configure Itential’s Redis replica sets with the following requirements:

Define all Redis nodes in the Itential Platform configuration
Configure authentication between replica members using user credentials in the Redis configuration file
Create an admin user with full Redis access
Create an “itential” user with least-privilege access required by the Itential Platform
Create a replication user with least-privilege access for the replication process
Include Redis Sentinel to monitor the Redis cluster
Redis Sentinel may be collocated with Redis but is not required to be collocated
Create an admin user for Redis Sentinel with full access to perform any Sentinel task
Maintain low-latency connections between Redis nodes to prevent replication failures
Configure Redis priority settings to make primary member elections deterministic. The scale is the opposite of MongoDB: in Redis, the lowest value is the most preferred.

Redis node	Priority Setting
Primary region Redis 1	10
Primary region Redis 2	10
Secondary region Redis 1	50
Secondary region Redis 2	50
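In Redis, this preference is set per node with the `replica-priority` directive in the Redis configuration file. The sketch below prints the directive for each node in the table; the node labels and the function name are hypothetical.

```shell
#!/bin/sh
# Print the replica-priority directive for a node in the given region.
# Sentinel prefers lower values; a value of 0 excludes a node from promotion.
replica_priority_line() {
  case $1 in
    primary)   echo "replica-priority 10" ;;
    secondary) echo "replica-priority 50" ;;
    *)         echo "unknown region: $1" >&2; return 1 ;;
  esac
}

for node in primary:redis1 primary:redis2 secondary:redis1 secondary:redis2; do
  region=${node%%:*}
  name=${node##*:}
  printf '# %s region, %s\n' "$region" "$name"
  replica_priority_line "$region"
done
```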

Redis requires careful latency management. If latency between Redis nodes exceeds 10 ms, replication lag and failover issues can occur. Keep all Redis nodes within a single region or use high-bandwidth, low-latency interconnects between regions.

What happens during a data center outage?

The active-standby architecture is designed so that the loss of any single data center triggers automatic recovery without requiring operator action. The specific behavior depends on which data center is lost.

Loss of the primary data center

If Data Center 1 (active) becomes unavailable, the MongoDB replica set detects the loss of its two highest-priority members and holds an election. The two MongoDB nodes in the secondary data center carry sufficient priority to assume the primary role. Redis Sentinel, with members still available in Data Centers 2 and 3, similarly promotes the secondary Redis node to primary. The GTM load balancer then routes traffic to Data Center 2, where a standby set of Platform and Gateway nodes is running and ready to accept work. The Platform nodes in Data Center 2 resume processing once both MongoDB and Redis primaries are established in that data center.

Loss of the secondary data center

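If Data Center 2 becomes unavailable, the members in Data Centers 1 and 3 still form a majority for both the MongoDB replica set (three of five members) and Redis Sentinel (two of three). The primary data center continues normal operation without disruption.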

Loss of the tertiary data center

Data Center 3 hosts only a Redis Sentinel and a MongoDB arbiter. Losing it does not cause a primary election in either MongoDB or Redis because a majority of voting members (in Data Centers 1 and 2) remain available. No failover occurs and operations continue uninterrupted.

Recovery time expectations

Automatic recovery is handled by MongoDB and Redis election mechanisms. Under normal network conditions, elections complete within a few seconds. During that window the platform may not be able to accept new work or commit data. No manual intervention is required. Once elections complete, the platform resumes automatically.

Define your own Recovery Time Objective (RTO) and Recovery Point Objective (RPO) targets based on your specific operational requirements and validate those targets during initial deployment testing. The architecture is designed to support RTOs measured in seconds for component-level failures and minutes for full data center failures, contingent on external load balancer configuration.

Backup and recovery

Replication provides infrastructure redundancy, not data protection. Backups are essential to guard against logical corruption and accidental deletion.

Geographic replication is not a substitute for backup. Replication propagates all writes, including unintended ones, to all members of the replica set. Implement and test a backup strategy appropriate to your recovery objectives.

MongoDB is the system of record for all platform data including workflows, jobs, inventory, and configuration. Itential recommends a regular backup schedule using MongoDB-native tooling (mongodump or MongoDB Ops Manager) with backups stored outside the primary replica set, preferably in a location not subject to the same failure scenarios as the three data centers. Consider point-in-time recovery using oplog tailing for low RPO requirements.
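A scheduled backup along these lines might start from the sketch below. It only prints the `mongodump` command so the options can be reviewed before anything runs; the connection string, backup directory, and archive naming are placeholders. The `--oplog` option captures writes that occur while the dump is running and applies to full-instance dumps, so the account used typically needs backup privileges beyond a single database.

```shell
#!/bin/sh
# Print a mongodump command for a compressed, timestamped archive.
# Usage: backup_command <connection-uri> <backup-dir> <timestamp>
backup_command() {
  printf 'mongodump --uri="%s" --oplog --gzip --archive="%s/itential-%s.archive.gz"\n' \
    "$1" "$2" "$3"
}

# Hypothetical values; credentials belong in a secret store, not in the script.
backup_command "mongodb://mongo1.dc1.example.com:27017/?replicaSet=rs0" \
  /backups/mongodb "$(date +%Y%m%dT%H%M%S)"
```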

For more information, see Back up and restore MongoDB.

Redis functions as a transient message and job queue layer. Redis data is reconstructed by the platform on reconnection and does not require the same backup posture as MongoDB. However, if you use Redis persistence (AOF or RDB), ensure that persistence files are included in your broader backup strategy.

Test backup procedures on a scheduled basis, not only at initial deployment.

Operational overhead

The ongoing operational effort this architecture requires differs between normal operations and incident response.

Normal operations

Under steady-state conditions this architecture requires relatively little hands-on management. The platform, MongoDB, and Redis all handle routine internal events — such as replica lag recovery and leader election — without operator intervention. The primary operational responsibilities are:

Monitoring cluster health across all three data centers (see Monitoring)
Applying operating system and component patches on a scheduled basis using rolling procedures that avoid taking down a majority of replica set members at the same time
Periodically testing failover scenarios in non-production environments to validate that GTM routing and Sentinel configuration remain correct as the environment evolves
Managing TLS certificates and credential rotation across all components on their respective expiration schedules

Required skills

Operating this architecture competently requires staff with working knowledge of MongoDB replica set administration, Redis Sentinel configuration, Linux system administration, and your load balancer and network infrastructure. Familiarity with your monitoring stack is also required. Itential Professional Services can assist with initial deployment and knowledge transfer.

What operators don’t need to do

Primary elections in MongoDB and Redis are fully automatic. Operators do not need to manually promote a replica to primary during a failover. The platform reconnects to new primaries automatically once elections complete.

Monitoring

Establish monitoring coverage for the following conditions at minimum. Without visibility into these signals, degraded states can go undetected until they compound into an outage.

MongoDB

Monitor replica set member health and replication lag across all five nodes. A lagging secondary that has not caught up with the primary is a risk factor in any subsequent failover. Alert on replication lag exceeding a threshold appropriate to your RPO. Monitor available disk space on all data-bearing nodes, particularly the /var/lib/mongo partition, which holds the data files.
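As a minimal sketch of the alerting rule, the function below compares a measured lag with a threshold, both in whole seconds. How the lag is collected is left open; it could, for example, be derived from the `optimeDate` values reported by `rs.status()`. The 30-second threshold and the function name are illustrative and should be replaced with a value that fits your RPO.

```shell
#!/bin/sh
# Classify replication lag against an alert threshold (both in seconds).
# Returns non-zero when the lag exceeds the threshold.
lag_status() {
  lag=$1
  threshold=$2
  if [ "$lag" -gt "$threshold" ]; then
    echo "ALERT: replication lag ${lag}s exceeds ${threshold}s"
    return 1
  fi
  echo "OK: replication lag ${lag}s"
}

lag_status 4 30
lag_status 95 30 || true
```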

Redis

Monitor Sentinel-reported primary/secondary topology to confirm the expected node holds the primary role at all times. Alert on any unexpected role change, which may indicate a silent failover occurred. Monitor replication lag between Redis nodes and connection counts from the platform.

Platform nodes

Monitor application logs for job processing errors and connectivity failures to MongoDB or Redis. Elevated error rates are early indicators of a dependency problem.

Infrastructure

Monitor network latency between data centers. The platform is sensitive to latency in its MongoDB and Redis connections. Elevated cross-datacenter latency is a leading indicator of replication problems.

Configure the workflow engine

The installation process handles setting the appropriate configurations in MongoDB and Redis. There are a few other things to consider in Itential Platform — most importantly, knowing what state the Workflow Engine is in at any given moment. In an ASA, the secondary data center (standby site) contains Itential Platform servers that you must configure to remain passive until a failover event occurs. This configuration prevents both data centers from processing workloads simultaneously.

Two settings control the state of the Workflow Engine. Each can be set in the properties file or through the corresponding environment variable. The properties file is typically found at /etc/itential/platform.properties.

Property File Variable	Environment Variable	Value	Description
task_worker_enabled	ITENTIAL_TASK_WORKER_ENABLED	`false`	If `true`, starts working tasks immediately after server startup. If `false`, the task worker must be enabled manually via the UI or API.
job_worker_enabled	ITENTIAL_JOB_WORKER_ENABLED	`false`	If `true`, allows jobs to be started after server startup. If `false`, API calls to start jobs return an error until enabled manually via the UI or API.

Initialize both properties to false in the secondary data center so that it remains passive and does not start or process jobs or tasks. During a failover event, after both MongoDB and Redis have successfully completed their elections, set these to true to activate the secondary workflow engines and resume automations. Manage both properties with a RESTful API call — start the task worker first, then the job worker. Both APIs require a valid session token and must be run against each Itential Platform server individually; they cannot be executed through a load balancer.

$ # Activate the task workers
$ curl -X POST 'http://<hostname>:<port>/workflow_engine/activate' \
>   -H 'Authorization: Bearer <token>' \
>   -H 'Content-Type: application/json'
$ 
$ # Activate the job workers
$ curl -X POST 'http://<hostname>:<port>/workflow_engine/jobWorker/activate' \
>   -H 'Authorization: Bearer <token>' \
>   -H 'Content-Type: application/json'

After making these requests the secondary data center processes automations. At this point, disable these on the previously active data center:

$ # Deactivate the task workers
$ curl -X POST 'http://<hostname>:<port>/workflow_engine/deactivate' \
>   -H 'Authorization: Bearer <token>' \
>   -H 'Content-Type: application/json'
$ 
$ # Deactivate the job workers
$ curl -X POST 'http://<hostname>:<port>/workflow_engine/jobWorker/deactivate' \
>   -H 'Authorization: Bearer <token>' \
>   -H 'Content-Type: application/json'
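Because each call must target every Itential Platform server directly, the switchover lends itself to a short loop. The sketch below prints the requests instead of sending them, so the order can be reviewed first: task worker, then job worker, for each server. The host names, port 3000, the `TOKEN` variable, and the function name `engine_calls` are placeholders.

```shell
#!/bin/sh
# Print the two workflow engine calls for one server, task worker first.
# Usage: engine_calls <activate|deactivate> <host:port>
engine_calls() {
  for path in "workflow_engine/$1" "workflow_engine/jobWorker/$1"; do
    printf 'curl -X POST "http://%s/%s" -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json"\n' \
      "$2" "$path"
  done
}

# Hypothetical servers; each is addressed directly, not through a load balancer.
for server in platform1.dc2.example.com:3000 platform2.dc2.example.com:3000; do
  engine_calls activate "$server"
done
for server in platform1.dc1.example.com:3000 platform2.dc1.example.com:3000; do
  engine_calls deactivate "$server"
done
```

Printing the commands first keeps the change deliberate; the output can be run once MongoDB and Redis have completed their elections.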

Itential recommends setting both properties to false in both the active and secondary data centers. Setting all Itential Platform servers to disable jobs and tasks gives you a known state whenever instances stop and restart, making it a deliberate action to enable job and task processing even in the primary data center.

Server specifications

For production environments, all Itential Platform components should be installed on their own individual servers to properly support High Availability (HA). Disk references to pronghorn (seen in older deployments) should be changed to itential.

Itential Platform server

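Spec	Requirement	Production ENV
CPU	64-bit x86 CPU cores	16
OS	RHEL	8/9
OS	Rocky	8/9
RAM	DDR5 DRAM 3200 MHz	64 GB
Disk (Solid State Media, SSD, NVMe)	Total	250 GB
Disk (Solid State Media, SSD, NVMe)	`/var/log/itential`	100 GB
Disk (Solid State Media, SSD, NVMe)	`/opt/itential`	100 GB
Disk (Solid State Media, SSD, NVMe)	`/`	50 GB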

MongoDB server

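Spec	Requirement	Production ENV
CPU	64-bit x86 CPU cores	16
OS	RHEL	8/9
OS	Rocky	8/9
RAM	DDR5 DRAM 3200 MHz	128 GB
Disk (Solid State Media, SSD, NVMe)	Total	1000 GB
Disk (Solid State Media, SSD, NVMe)	`/var/log/mongodb`	100 GB
Disk (Solid State Media, SSD, NVMe)	`/var/lib/mongo`	850 GB
Disk (Solid State Media, SSD, NVMe)	`/`	50 GB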

Redis server

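Spec	Requirement	Production ENV
CPU	64-bit x86 CPU cores	8
OS	RHEL	8/9
OS	Rocky	8/9
RAM	DDR5 DRAM 3200 MHz	32 GB
Disk (Solid State Media, SSD, NVMe)	Total	100 GB
Disk (Solid State Media, SSD, NVMe)	`/var/log/redis`	10 GB
Disk (Solid State Media, SSD, NVMe)	`/var/lib/redis`	50 GB
Disk (Solid State Media, SSD, NVMe)	`/`	40 GB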

Gateway server

The following applies to a simple All-In-One implementation of Itential Gateway. For more information about alternative Gateway architectures, see Choose a deployment architecture.

Spec	Requirement	Production ENV
CPU	64-bit x86 CPU cores	4
OS	RHEL	8/9
OS	Rocky	8/9
RAM	DDR5 DRAM 3200 MHz	8 GB
Disk	Solid State Media (SSD, NVMe)	20 GB

Hardware requirements

Processor

Processor specification requirements:

Second generation or better Intel Xeon Platinum 8000 series processors
Third generation or better AMD EPYC 7000 series processors

Memory

Memory specification requirement:

DDR5 DRAM 3200 MHz or higher

Storage

Storage performance requirements in IOPS (16 KiB):

20,000+ IOPS
Non-spinning media (SSD, NVMe)

Network

Network speed requirement:

10 Gbps or higher

In some cases, you can add dedicated network interfaces that route specific traffic to specific external systems, for example to separate NSO traffic from traffic destined for Redis and MongoDB. This routing is configured at the OS level (custom interfaces and routes) and must be managed by the system administrator.

Hypervisor/host OS settings

These settings are strongly recommended for high load applications of Itential Platform:

CPU affinity settings or similar functionality to prevent CPU starvation
Full memory reservation
One physical CPU per VM is preferred
Huge pages for memory support enabled (except MongoDB)
Memory compression disabled
Minimal CPU allocation settings for scheduler according to CPU clock

Example: Assuming an Itential Platform VM on a server capable of 2.5 GHz nominal speed:

CPU clock reservation = 16 vCPU × 2.5 GHz = 40 GHz

Follow hypervisor recommendations when performing CPU reservations. In most cases the total of all CPU reservations for all VMs on a host cannot be more than 90% of the host capacity as 10% is reserved by the host itself.
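The reservation arithmetic and the 90% guideline can be checked with a few lines of shell. The sketch below works in MHz to stay within integer arithmetic; the 64-core host and the function names are hypothetical, and the actual limit should come from your hypervisor's documentation.

```shell
#!/bin/sh
# CPU clock reservation in MHz for one VM: vCPUs x nominal clock in MHz.
cpu_reservation_mhz() {
  echo $(( $1 * $2 ))
}

# Succeed when the summed reservations fit within 90% of host capacity.
# Usage: fits_host <host-capacity-mhz> <reservation-mhz>...
fits_host() {
  capacity=$1
  shift
  reserved=0
  for r in "$@"; do reserved=$((reserved + r)); done
  [ $((reserved * 100)) -le $((capacity * 90)) ]
}

platform=$(cpu_reservation_mhz 16 2500)   # 16 vCPU x 2.5 GHz
echo "Platform VM reservation: ${platform} MHz"

# Hypothetical host: 64 cores x 2500 MHz = 160000 MHz of capacity.
if fits_host 160000 "$platform" "$platform" "$platform"; then
  echo "three Platform-sized VMs fit within the 90% limit"
fi
if ! fits_host 160000 "$platform" "$platform" "$platform" "$platform"; then
  echo "four Platform-sized VMs exceed the 90% limit"
fi
```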

MongoDB discourages the use of Transparent Huge Pages in versions 7 and earlier. Starting with version 8, MongoDB recommends enabling Transparent Huge Pages.

Network requirements

In an environment where components are installed on more than one host, the following network traffic flows need to be allowed. All ports and networking specs are TCP protocol unless otherwise noted. Not all ports will need to be open for every supported architecture. Secure ports are only required when explicitly configured.

Source	Destination	Port	Description
Desktop Devices	Itential Platform	3000	Web browser connections to Itential Platform over HTTP
Desktop Devices	Itential Platform	3443	Web browser connections to Itential Platform over HTTPS
Desktop Devices	HashiCorp Vault	8200	Web browser connections to HashiCorp Vault
Itential Platform	MongoDB	27017	Itential Platform connects to MongoDB
Itential Platform	Redis	6379	Itential Platform connects to Redis
Itential Platform	Redis	26379	Itential Platform connects to Redis Sentinel (HA installations only)
Itential Platform	Gateway 5	50051	Gateway connects to Itential Platform using an mTLS websocket
Itential Platform	HashiCorp Vault	8200	Itential Platform connects to HashiCorp Vault
Itential Platform	LDAP	389	Itential Platform connects to LDAP (when LDAP adapter is used for authentication)
Itential Platform	LDAP	636	Itential Platform connects to LDAP with TLS (when LDAP adapter is used for authentication)
Itential Platform	RADIUS	1812	Itential Platform connects to RADIUS (when RADIUS adapter is used for authentication; uses UDP)
MongoDB	MongoDB	27017	Each MongoDB talks to other MongoDBs for replication (HA installations only)
Redis	Redis	6379	Each Redis talks to other Redis sources for replication (HA installations only)
Redis	Redis	26379	Each Redis uses Redis Sentinel to monitor the Redis processes (HA installations only)

Required user accounts in dependencies

The validated designs are opinionated installations of Itential and its dependencies. The following user accounts are required by the dependencies.

MongoDB

Account	Description
`admin`	Has full root access to the `mongo` database. Can read and write to any logical database. Can be used to issue admin commands like forcing an election and configuring replica sets. This is NOT used by the Itential application but is created for admin purposes.
`itential`	Has read and write access to the `"itential"` database only. This is the account used by the Itential Platform application.
`monitor`	Has read only access to the `mongo` database. This is used by the monitoring systems to capture MongoDB metrics for observability.

Redis

Account	Description
`admin`	Has full root access to the Redis database, all channels, all keys, all commands. This is NOT used by the Itential application but is created for admin purposes.
`itential`	Has full access to the Redis database, all channels, all keys, EXCEPT the following commands: `asking`, `cluster`, `readonly`, `readwrite`, `bgrewriteaof`, `bgsave`, `failover`, `flushall`, `flushdb`, `psync`, `replconf`, `replicaof`, `save`, `shutdown`, `sync`. This is the account used by the Itential Platform application.
`repluser`	Has access to the minimum set of commands to perform replication: `psync`, `replconf`, `ping`.
`admin` (Sentinel)	Full root access to Redis Sentinel. This is NOT used by the Itential application but is created for admin purposes of Redis Sentinel.
`sentineluser`	Has access to the minimum set of commands to perform sentinel monitoring: `multi`, `slaveof`, `ping`, `exec`, `subscribe`, `config
`monitor`	Has access to the minimum set of commands to expose metrics for observability: `-@all +@connection +memory -readonly +strlen +config
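The `itential` and `repluser` rows translate naturally into Redis ACL rules. The sketch below generates one rule line per account from the command lists in the table; the passwords are placeholders, the function names are illustrative, and the exact rule set used by an Itential installation may differ.

```shell
#!/bin/sh
# Commands withheld from the itential account, as listed in the Redis accounts table.
DENIED="asking cluster readonly readwrite bgrewriteaof bgsave failover flushall flushdb psync replconf replicaof save shutdown sync"

# Print an ACL rule granting all keys, channels, and commands minus the denied list.
# Usage: itential_acl <password>
itential_acl() {
  rule="user itential on >$1 ~* &* +@all"
  for cmd in $DENIED; do
    rule="$rule -$cmd"
  done
  echo "$rule"
}

# Print an ACL rule limited to the replication commands.
# Usage: repluser_acl <password>
repluser_acl() {
  echo "user repluser on >$1 +psync +replconf +ping"
}

# Placeholder passwords; real values belong in a secret store.
itential_acl 'CHANGE_ME'
repluser_acl 'CHANGE_ME'
```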