Active-standby architecture


The active-standby architecture is Itential’s most resilient validated deployment model. It is designed for organizations that operate automation at a scale where unplanned downtime carries measurable business risk, whether that risk is regulatory, contractual, or operational.

This architecture provides geographic redundancy across three data centers and is capable of surviving the complete loss of any single data center without operator intervention. We recommend deploying this architecture if you have strict uptime requirements and formal business continuity programs.

This architecture requires significant infrastructure investment but delivers high resilience. The sections below describe the component footprint, hardware requirements, failover behavior, and operational expectations in detail so that infrastructure, operations, and security teams can assess readiness and plan accordingly.

Figure: Itential active-standby architecture. A three-data-center architecture with redundant Platform, Redis, and MongoDB nodes across active and standby sites.

How this compares to other Itential architectures

| Architecture | Data Centers | Redundancy | Recommended For |
| --- | --- | --- | --- |
| Standalone | 1 | None | Development/Lab |
| High Availability (HA) | 1 | Component-level | Non-critical production |
| Active-standby | 3 | Full geographic redundancy | Mission-critical production |

Architecture overview

An active-standby architecture (ASA) deploys Itential Platform, MongoDB, and Redis across three geographically distributed data centers with full redundancy. This architecture builds on two geographically redundant HA2 installations and uses a larger geographically redundant MongoDB replica set.

The Itential Platform performs frequent reads and writes to the database, so low latency between the active Platform instances and MongoDB is critical. All active components run in the same data center. MongoDB and Redis replication processes replicate data from the primary node in the active data center to secondary data centers. All components require authentication.

The minimum active-standby deployment requires 17 virtual machines (VMs) distributed across three data centers. The table below summarizes the full server inventory.

| Component | Quantity | Per-VM RAM | Per-VM Storage | Notes |
| --- | --- | --- | --- | --- |
| Itential Platform | 4 | 64 GB | 250 GB | 2 active, 2 standby |
| MongoDB | 5 | 128 GB | 1,000 GB | 2 primary DC, 2 secondary DC, 1 arbiter |
| Redis | 4 | 32 GB | 100 GB | 2 per data center |
| Redis Sentinel | 3 | 2 GB | 20 GB | 1 per data center |
| Automation Gateway | 2 | 8 GB | 20 GB | 1 per primary/secondary DC |
| Total | 17 | | | |

All servers require solid-state storage (SSD or NVMe) capable of at least 20,000 IOPS and network connectivity of 10 Gbps or higher. For complete hardware specifications, see Server specifications.

The deployment also requires the following:

  • A Global Traffic Manager (GTM) load balancer for routing between data centers during failover
  • Local Traffic Manager (LTM) load balancers in each active data center
  • Network routing that permits inter-component traffic flows (see Network requirements)

Highly available Itential Platform

Itential Platform instances do not connect to one another directly; they communicate indirectly through Redis. To achieve high availability, add a new Itential Platform node and configure it to access the same MongoDB and Redis instances as the existing nodes. Each instance can begin performing work as soon as it is added and configured.

Configure Itential Platform instances with the following:

  • MongoDB connection strings that reference all members of the replica set
  • Redis configurations that specify the list of all known Redis Sentinels and their Sentinel username and password (connections use Redis Sentinels rather than direct Redis connections)
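As a minimal sketch of the first requirement, the function below joins every replica set member into one connection string in the standard MongoDB URI format. The host names, the database name, and the replica set name rs0 are hypothetical, and the Itential Platform property that receives the string is not shown here.

```bash
#!/usr/bin/env bash
# Build a MongoDB connection string that lists every replica set member.
# Host names, database name, and replica set name are illustrative assumptions.

mongo_uri() {
  # Usage: mongo_uri <database> <replica-set-name> <host:port> [<host:port> ...]
  local db="$1" rs="$2"
  shift 2
  local hosts
  hosts=$(IFS=,; echo "$*")   # join the remaining arguments with commas
  printf 'mongodb://%s/%s?replicaSet=%s\n' "$hosts" "$db" "$rs"
}
```

For example, `mongo_uri itential rs0 dc1-mongo-1:27017 dc1-mongo-2:27017 dc2-mongo-1:27017 dc2-mongo-2:27017 dc3-mongo-1:27017` prints a single URI that names all five members, so the driver can find the current primary after an election.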

Highly available databases

Both MongoDB and Redis use a primary/secondary replication model. When a primary node fails, the replica set initiates an election for a new primary. The replica set cannot accept reads and writes until the new primary is selected, typically within a few seconds. Once a new primary is identified, the Itential Platform resumes normal operation. Operators do not need to take action during elections.

MongoDB configuration

MongoDB clusters operate in a primary/secondary model where data written to the primary replicates to secondary nodes. To prevent split-brain scenarios during elections, the architecture requires an odd number of replica set members distributed across three data centers: 2 in the primary region, 2 in the secondary region, and 1 in a tertiary region. If the primary or secondary region is lost, three voting members remain, which is still a majority of the five-member set. Set member priorities in the replica set configuration so that the primary role shifts to the secondary region if the primary region is lost.
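The majority arithmetic behind this layout can be checked with a short sketch. The function names are hypothetical; the member counts come from the paragraph above.

```bash
#!/usr/bin/env bash
# Majority arithmetic for replica set elections (illustrative helper names).

majority() {
  # Smallest number of votes that is more than half of $1.
  echo $(( $1 / 2 + 1 ))
}

can_elect_primary() {
  # Usage: can_elect_primary <total-voting-members> <members-lost>
  # Succeeds when the surviving members still form a majority.
  local total="$1" lost="$2"
  [ $(( total - lost )) -ge "$(majority "$total")" ]
}
```

With five members, `can_elect_primary 5 2` succeeds because losing a two-member region leaves three votes. A four-member set split evenly across two data centers would fail the same check (`can_elect_primary 4 2`), which is why the fifth member sits in a third data center.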

Configure Itential’s MongoDB cluster with the following requirements:

  • All replica set members must be defined in the Itential Platform config
  • Configure authentication between replica members using either a shared key or X.509 certificate
  • Create an admin user with full access to perform any operation
  • Create an “itential” user with least-privilege access to the Itential database only (configure Itential Platform to use this user account)
  • Use the priority settings to influence voting as follows:
| MongoDB Node | Priority Setting |
| --- | --- |
| Primary Region Database 1 | 10 |
| Primary Region Database 2 | 10 |
| Secondary Region Database 1 | 5 |
| Secondary Region Database 2 | 5 |
| Tertiary Region Database 3 | 1 |

Learn more:

  • MongoDB Replication documentation
  • Replica Set Elections

Redis configuration

To avoid single points of failure, arrange Redis data-bearing nodes in pairs across both data centers as a replica set. Deploy Sentinels in all three data centers so that the loss of one data center cannot reduce Sentinel availability below a majority. A majority of Sentinels must always be available for failover to work.

Configure Itential’s Redis replica sets with the following requirements:

  • Define all Redis nodes in the Itential Platform configuration
  • Configure authentication between replica members using user credentials in the Redis configuration file
  • Create an admin user with full Redis access
  • Create an “itential” user with least-privilege access required by the Itential Platform
  • Create a replication user with least-privilege access for the replication process
  • Include Redis Sentinel to monitor the Redis cluster
  • Redis Sentinel may be collocated with Redis but is not required to be collocated
  • Create an admin user for Redis Sentinel with full access to perform any Sentinel task
  • Maintain low-latency connections between Redis nodes to prevent replication failures
  • Configure Redis priority settings to make primary member elections deterministic. The scale is the reverse of MongoDB's: the lowest value is the most preferred.

| Redis Node | Priority Setting |
| --- | --- |
| Primary region Redis 1 | 10 |
| Primary region Redis 2 | 10 |
| Secondary region Redis 1 | 50 |
| Secondary region Redis 2 | 50 |
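To illustrate how these values steer a failover, the sketch below picks the preferred candidate from a list of surviving replicas. It models only the priority comparison: the node names are hypothetical, ties go to the first entry listed, and Redis itself applies further tie-breakers that are not modeled here. A priority of 0 is skipped because Redis treats it as "never promote".

```bash
#!/usr/bin/env bash
# Simplified model of priority-based replica selection (not Sentinel's full algorithm).

preferred_replica() {
  # Reads "name priority" pairs on stdin, one per line.
  # Prints the name with the lowest non-zero priority; the first entry wins ties.
  awk '$2 > 0 && (best == "" || $2 < min) { min = $2; best = $1 }
       END { if (best != "") print best }'
}
```

If Primary region Redis 1 fails while it holds the primary role, feeding the three survivors to the function (`printf '%s\n' 'dc1-redis-2 10' 'dc2-redis-1 50' 'dc2-redis-2 50' | preferred_replica`) selects the remaining primary-region node, so the primary role stays in the active data center whenever possible.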

Redis requires careful latency management. If latency between Redis nodes exceeds 10 ms, replication lag and failover issues can occur. Keep all Redis nodes within a single region or use high-bandwidth, low-latency interconnects between regions.

Learn more:

  • Redis Replication documentation
  • Redis Sentinel documentation

What happens during a data center outage?

The active-standby architecture is designed so that the loss of any single data center triggers automatic recovery without requiring operator action. The specific behavior depends on which data center is lost.

Loss of the primary data center

If Data Center 1 (active) becomes unavailable, the MongoDB replica set detects the loss of its two highest-priority members and holds an election. The two MongoDB nodes in the secondary data center carry sufficient priority to assume the primary role. Redis Sentinel, with members still available in Data Centers 2 and 3, similarly promotes the secondary Redis node to primary. The GTM load balancer then routes traffic to Data Center 2, where a standby set of Platform and Gateway nodes is running and ready to accept work. The Platform nodes in Data Center 2 resume processing once both MongoDB and Redis primaries are established in that data center.

Loss of the secondary data center

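If Data Center 2 becomes unavailable, the members remaining in Data Centers 1 and 3 still form a majority for both the MongoDB replica set (3 of 5 members) and Redis Sentinel (2 of 3 Sentinels). The primary data center continues normal operation without disruption.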

Loss of the tertiary data center

Data Center 3 hosts only a Redis Sentinel and a MongoDB arbiter. Losing it does not cause a primary election in either MongoDB or Redis because a majority of voting members (in Data Centers 1 and 2) remain available. No failover occurs and operations continue uninterrupted.

Recovery time expectations

Automatic recovery is handled by MongoDB and Redis election mechanisms. Under normal network conditions, elections complete within a few seconds. During that window the platform may not be able to accept new work or commit data. No manual intervention is required. Once elections complete, the platform resumes automatically.

Define your own Recovery Time Objective (RTO) and Recovery Point Objective (RPO) targets based on your specific operational requirements and validate those targets during initial deployment testing. The architecture is designed to support RTOs measured in seconds for component-level failures and minutes for full data center failures, contingent on external load balancer configuration.

Backup and recovery

Replication provides infrastructure redundancy, not data protection. Backups are essential to guard against logical corruption and accidental deletion.

Geographic replication is not a substitute for backup. Replication propagates all writes, including unintended ones, to all members of the replica set. Implement and test a backup strategy appropriate to your recovery objectives.

MongoDB is the system of record for all platform data including workflows, jobs, inventory, and configuration. Itential recommends a regular backup schedule using MongoDB-native tooling (mongodump or MongoDB Ops Manager) with backups stored outside the primary replica set, preferably in a location not subject to the same failure scenarios as the three data centers. Consider point-in-time recovery using oplog tailing for low RPO requirements.

For more information, see Back up and restore MongoDB.
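As one possible starting point, the sketch below prints a mongodump command for a full, compressed archive that also captures oplog entries written during the dump. It only prints the command for review. The host names, output directory, and file naming are assumptions, and the connection should use an account with backup privileges rather than the itential application account. The URI deliberately names no database, because the `--oplog` option applies only to full dumps.

```bash
#!/usr/bin/env bash
# Print (do not run) a mongodump invocation for a full replica set backup.
# The URI, output directory, and file naming scheme are illustrative assumptions.

backup_command() {
  # Usage: backup_command <connection-uri> <output-directory>
  local uri="$1" out_dir="$2"
  local stamp
  stamp=$(date -u +%Y%m%dT%H%M%SZ)   # UTC timestamp for the archive name
  printf "mongodump --uri='%s' --oplog --gzip --archive=%s/itential-%s.archive.gz\n" \
    "$uri" "$out_dir" "$stamp"
}
```

Calling `backup_command 'mongodb://dc1-mongo-1:27017,dc2-mongo-1:27017/?replicaSet=rs0' /backups` prints a command that can be reviewed and then scheduled. The archive should be copied to storage outside the three data centers.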

Redis functions as a transient message and job queue layer. Redis data is reconstructed by the platform on reconnection and does not require the same backup posture as MongoDB. However, if you use Redis persistence (AOF or RDB), ensure that persistence files are included in your broader backup strategy.

Test backup procedures on a scheduled basis, not only at initial deployment.

Operational overhead

The ongoing operational effort this architecture requires depends on whether you are considering normal operations or incident response.

Normal operations

Under steady-state conditions this architecture requires relatively little hands-on management. The platform, MongoDB, and Redis all handle routine internal events — such as replica lag recovery and leader election — without operator intervention. The primary operational responsibilities are:

  • Monitoring cluster health across all three data centers (see Monitoring)
  • Applying operating system and component patches on a scheduled basis using rolling procedures that never take a majority of replica set members offline at the same time
  • Periodically testing failover scenarios in non-production environments to validate that GTM routing and Sentinel configuration remain correct as the environment evolves
  • Managing TLS certificates and credential rotation across all components on their respective expiration schedules

Required skills

Operating this architecture competently requires staff with working knowledge of MongoDB replica set administration, Redis Sentinel configuration, Linux system administration, and your load balancer and network infrastructure. Familiarity with your monitoring stack is also required. Itential Professional Services can assist with initial deployment and knowledge transfer.

What operators don’t need to do

Primary elections in MongoDB and Redis are fully automatic. Operators do not need to manually promote a replica to primary during a failover. The platform reconnects to new primaries automatically once elections complete.

Monitoring

Establish monitoring coverage for the following conditions at minimum. Without visibility into these signals, degraded states can go undetected until they compound into an outage.

MongoDB

Monitor replica set member health and replication lag across all five nodes. A lagging secondary that has not caught up with the primary is a risk factor in any subsequent failover. Alert on replication lag exceeding a threshold appropriate to your RPO. Monitor available disk space on all data-bearing nodes, particularly the /var/lib/mongo partition, which holds the data files.
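The disk space check can be approximated with standard tools until a full monitoring agent is in place. This is a minimal sketch: the 80 percent threshold in the usage note is an assumption, and the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Report filesystem usage for a path and warn above a threshold.

disk_used_pct() {
  # Prints the used-space percentage (integer, no % sign) of the
  # filesystem that holds the given path.
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

disk_alert() {
  # Usage: disk_alert <path> <threshold-percent>
  # Returns 1 and prints a warning when usage meets or exceeds the threshold.
  local used
  used=$(disk_used_pct "$1")
  [ -n "$used" ] || { echo "ERROR: cannot read usage for $1" >&2; return 2; }
  if [ "$used" -ge "$2" ]; then
    echo "WARN: $1 is ${used}% full (threshold $2%)"
    return 1
  fi
}
```

On a MongoDB data-bearing node, `disk_alert /var/lib/mongo 80` could run from a scheduled job, with the non-zero exit status feeding whatever alerting mechanism is in use.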

Redis

Monitor Sentinel-reported primary/secondary topology to confirm the expected node holds the primary role at all times. Alert on any unexpected role change, which may indicate a silent failover occurred. Monitor replication lag between Redis nodes and connection counts from the platform.

Platform nodes

Monitor application logs for job processing errors and connectivity failures to MongoDB or Redis. Elevated error rates are early indicators of a dependency problem.

Infrastructure

Monitor network latency between data centers. The platform is sensitive to latency in its MongoDB and Redis connections. Elevated cross-datacenter latency is a leading indicator of replication problems.

Configure the workflow engine

The installation process handles setting the appropriate configurations in MongoDB and Redis. There are a few other things to consider in Itential Platform — most importantly, knowing what state the Workflow Engine is in at any given moment. In an ASA, the secondary data center (standby site) contains Itential Platform servers that you must configure to remain passive until a failover event occurs. This configuration prevents both data centers from processing workloads simultaneously.

There are two settings for controlling the state of the Workflow Engine, both found in either the configuration file or in corresponding environment variables. The properties file is typically found at /etc/itential/platform.properties.

| Property File Variable | Environment Variable | Value | Description |
| --- | --- | --- | --- |
| task_worker_enabled | ITENTIAL_TASK_WORKER_ENABLED | false | If true, starts working tasks immediately after server startup. If false, the task worker must be enabled manually via the UI or API. |
| job_worker_enabled | ITENTIAL_JOB_WORKER_ENABLED | false | If true, allows jobs to be started after server startup. If false, API calls to start jobs return an error until enabled manually via the UI or API. |
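As a minimal sketch, the environment-variable form of these two settings looks like the following. How the variables reach the Platform process (a service unit, a container definition, or a shell profile) is an assumption that depends on your installation, and the workers_disabled helper is hypothetical.

```bash
#!/usr/bin/env bash
# Keep both workers disabled at startup so that enabling them is a deliberate step.
export ITENTIAL_TASK_WORKER_ENABLED=false
export ITENTIAL_JOB_WORKER_ENABLED=false

workers_disabled() {
  # Succeeds only when both variables are explicitly set to "false".
  [ "${ITENTIAL_TASK_WORKER_ENABLED:-}" = "false" ] &&
  [ "${ITENTIAL_JOB_WORKER_ENABLED:-}" = "false" ]
}
```

A pre-start check such as `workers_disabled || echo "workers would start automatically"` can catch a node that was accidentally configured to begin processing as soon as it boots.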

Initialize both properties to false in the secondary data center so that it remains passive and does not start or process jobs or tasks. During a failover event, after both MongoDB and Redis have successfully completed their elections, set these to true to activate the secondary workflow engines and resume automations. Manage both properties with a RESTful API call — start the task worker first, then the job worker. Both APIs require a valid session token and must be run against each Itential Platform server individually; they cannot be executed through a load balancer.

# Activate the task workers
curl -X POST 'http://<hostname>:<port>/workflow_engine/activate' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json'

# Activate the job workers
curl -X POST 'http://<hostname>:<port>/workflow_engine/jobWorker/activate' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json'

After these requests succeed, the secondary data center processes automations. At this point, deactivate the task and job workers on the previously active data center:

# Deactivate the task workers
curl -X POST 'http://<hostname>:<port>/workflow_engine/deactivate' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json'

# Deactivate the job workers
curl -X POST 'http://<hostname>:<port>/workflow_engine/jobWorker/deactivate' \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json'

Itential recommends setting both properties to false in both the active and secondary data centers. Setting all Itential Platform servers to disable jobs and tasks gives you a known state whenever instances stop and restart, making it a deliberate action to enable job and task processing even in the primary data center.
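Because the activation and deactivation calls must be sent to each server individually, a small wrapper can reduce mistakes during a failover. The sketch below is one way to do this. It reuses the endpoint paths from the curl examples in this section, while the host names, the ITENTIAL_TOKEN variable, and the DRY_RUN switch are assumptions introduced for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical failover helper. Endpoint paths come from the curl examples in
# this section; host lists, ITENTIAL_TOKEN, and DRY_RUN are assumptions.

engine_call() {
  # Usage: engine_call <host:port> <path-under-workflow_engine>
  # With DRY_RUN=1 the request is printed instead of sent.
  local url="http://$1/workflow_engine/$2"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "POST $url"
  else
    curl --fail --silent --show-error -X POST "$url" \
      -H "Authorization: Bearer ${ITENTIAL_TOKEN:?set ITENTIAL_TOKEN first}" \
      -H 'Content-Type: application/json'
  fi
}

activate_site() {
  # Task worker first, then job worker, on every server given.
  local server
  for server in "$@"; do
    engine_call "$server" activate || return 1
    engine_call "$server" jobWorker/activate || return 1
  done
}

deactivate_site() {
  # Same order as the deactivation example: task worker, then job worker.
  local server
  for server in "$@"; do
    engine_call "$server" deactivate || return 1
    engine_call "$server" jobWorker/deactivate || return 1
  done
}
```

Running `DRY_RUN=1 activate_site dc2-platform-1:3000 dc2-platform-2:3000` lists the requests without sending them, which is a convenient way to rehearse the sequence. In a real failover, run activate_site against the standby servers only after the MongoDB and Redis elections have completed, then run deactivate_site against the previously active servers once they are reachable.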

Server specifications

For production environments, all Itential Platform components should be installed on their own individual servers to properly support High Availability (HA). Disk references to pronghorn (seen in older deployments) should be changed to itential.

Itential Platform server

| Spec | Requirement | Production ENV |
| --- | --- | --- |
| CPU | 64-bit x86 CPU cores | 16 |
| OS | RHEL | 8/9 |
| | Rocky | 8/9 |
| RAM | DDR5 DRAM 3200 MHz | 64 GB |
| Disk (Solid State Media, SSD, NVMe) | Total | 250 GB |
| | /var/log/itential | 100 GB |
| | /opt/itential | 100 GB |
| | / | 50 GB |

MongoDB server

| Spec | Requirement | Production ENV |
| --- | --- | --- |
| CPU | 64-bit x86 CPU cores | 16 |
| OS | RHEL | 8/9 |
| | Rocky | 8/9 |
| RAM | DDR5 DRAM 3200 MHz | 128 GB |
| Disk (Solid State Media, SSD, NVMe) | Total | 1000 GB |
| | /var/log/mongodb | 100 GB |
| | /var/lib/mongo | 850 GB |
| | / | 50 GB |

Redis server

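| Spec | Requirement | Production ENV |
| --- | --- | --- |
| CPU | 64-bit x86 CPU cores | 8 |
| OS | RHEL | 8/9 |
| | Rocky | 8/9 |
| RAM | DDR5 DRAM 3200 MHz | 32 GB |
| Disk (Solid State Media, SSD, NVMe) | Total | 100 GB |
| | /var/log/redis | 10 GB |
| | /var/lib/redis | 50 GB |
| | / | 40 GB |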

IAG server

The following applies to a simple All-In-One implementation of Itential Automation Gateway (IAG). For more information about alternative IAG architectures, see Choose a deployment architecture.

| Spec | Requirement | Production ENV |
| --- | --- | --- |
| CPU | 64-bit x86 CPU cores | 4 |
| OS | RHEL | 8/9 |
| | Rocky | 8/9 |
| RAM | DDR5 DRAM 3200 MHz | 8 GB |
| Disk | Solid State Media (SSD, NVMe) | 20 GB |

Hardware requirements

Processor

Processor specification requirements:

  • Second generation or better Intel Xeon Platinum 8000 series processors
  • Third generation or better AMD EPYC 7000 series processors

Memory

Memory specification requirement:

  • DDR5 DRAM 3200 MHz or higher

Storage

Storage performance requirements in IOPS (16 KiB):

  • 20000+ IOPS
  • Non-spinning media (SSD, NVMe)

Network

Network speed requirement:

  • 10 Gbps or higher

In some cases, consider adding dedicated network interfaces that route specific traffic to specific external systems. This routing is configured at the OS level (custom interfaces and routes) and must be managed by the system administrator. For example, you might separate NSO traffic from traffic destined for Redis and MongoDB.

Hypervisor/host OS settings

These settings are strongly recommended for high load applications of Itential Platform:

  • CPU affinity settings or similar functionality to prevent CPU starvation
  • Full memory reservation
  • One physical CPU per VM is preferred
  • Huge pages for memory support enabled (except MongoDB)
  • Memory compression disabled
  • Minimal CPU allocation settings for scheduler according to CPU clock

Example: Assuming an Itential Platform VM on a server with a nominal clock speed of 2.5 GHz:

CPU clock reservation = 16 vCPU × 2.5 GHz = 40 GHz

Follow hypervisor recommendations when performing CPU reservations. In most cases the total of all CPU reservations for all VMs on a host cannot be more than 90% of the host capacity as 10% is reserved by the host itself.
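The reservation arithmetic and the 90% guideline can be checked with integer math in MHz. This is a sketch under assumed numbers: the 64-core host in the usage note is hypothetical, and the function names are introduced here for illustration.

```bash
#!/usr/bin/env bash
# CPU reservation arithmetic in MHz (illustrative helper names).

cpu_reservation_mhz() {
  # Usage: cpu_reservation_mhz <vcpus> <nominal-clock-mhz>
  echo $(( $1 * $2 ))
}

fits_on_host() {
  # Usage: fits_on_host <host-cores> <nominal-clock-mhz> <reservation-mhz>...
  # Succeeds when the summed reservations stay within 90% of host capacity.
  local capacity=$(( $1 * $2 )) total=0 r
  shift 2
  for r in "$@"; do
    total=$(( total + r ))
  done
  [ $(( total * 100 )) -le $(( capacity * 90 )) ]
}
```

`cpu_reservation_mhz 16 2500` prints 40000, matching the 40 GHz example above. On a hypothetical 64-core, 2.5 GHz host (160,000 MHz), three such VMs fit within the 144,000 MHz limit, while a fourth would exceed it.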

MongoDB discourages the use of Transparent Huge Pages in versions 7 and earlier. Starting with version 8, MongoDB recommends enabling Transparent Huge Pages.

Network requirements

In an environment where components are installed on more than one host, the following network traffic flows need to be allowed. All ports and networking specs are TCP protocol unless otherwise noted. Not all ports will need to be open for every supported architecture. Secure ports are only required when explicitly configured.

| Source | Destination | Port | Description |
| --- | --- | --- | --- |
| Desktop Devices | Itential Platform | 3000 | Web browser connections to Itential Platform over HTTP |
| Desktop Devices | Itential Platform | 3443 | Web browser connections to Itential Platform over HTTPS |
| Desktop Devices | HashiCorp Vault | 8200 | Web browser connections to HashiCorp Vault |
| Itential Platform | MongoDB | 27017 | Itential Platform connects to MongoDB |
| Itential Platform | Redis | 6379 | Itential Platform connects to Redis |
| Itential Platform | Redis | 26379 | Itential Platform connects to Redis Sentinel (HA installations only) |
| Itential Platform | IAG 5 | 50051 | IAG connects to Itential Platform using a mTLS websocket |
| Itential Platform | HashiCorp Vault | 8200 | Itential Platform connects to HashiCorp Vault |
| Itential Platform | LDAP | 389 | Itential Platform connects to LDAP (when LDAP adapter is used for authentication) |
| Itential Platform | LDAP | 636 | Itential Platform connects to LDAP with TLS (when LDAP adapter is used for authentication) |
| Itential Platform | RADIUS | 1812 | Itential Platform connects to RADIUS (when RADIUS adapter is used for authentication; uses UDP) |
| MongoDB | MongoDB | 27017 | Each MongoDB talks to other MongoDBs for replication (HA installations only) |
| Redis | Redis | 6379 | Each Redis talks to other Redis sources for replication (HA installations only) |
| Redis | Redis | 26379 | Each Redis uses Redis Sentinel to monitor the Redis processes (HA installations only) |
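On RHEL and Rocky hosts that use firewalld (an assumption; your environment may manage host firewalls differently), the inbound ports in the table above can be turned into commands for review. The sketch below only prints the commands. It opens ports to all sources, so restricting access by source address would need additional rules that are not shown.

```bash
#!/usr/bin/env bash
# Print firewalld commands for a host's inbound ports (nothing is executed).

open_port_commands() {
  # Usage: open_port_commands <port/protocol> [<port/protocol> ...]
  local p
  for p in "$@"; do
    echo "firewall-cmd --permanent --add-port=$p"
  done
  echo "firewall-cmd --reload"
}
```

For example, `open_port_commands 6379/tcp 26379/tcp` covers a host that runs both Redis and a collocated Sentinel, and `open_port_commands 27017/tcp` covers a MongoDB host.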

Required user accounts in dependencies

The validated designs are opinionated installations of Itential and its dependencies. The following user accounts are required by the dependencies.

MongoDB

| Account | Description |
| --- | --- |
| admin | Has full root access to the mongo database. Can read and write to any logical database. Can be used to issue admin commands like forcing an election and configuring replica sets. This is NOT used by the Itential application but is created for admin purposes. |
| itential | Has read and write access to the "itential" database only. This is the account used by the Itential Platform application. |
| monitor | Has read only access to the mongo database. This is used by the monitoring systems to capture MongoDB metrics for observability. |

Redis

| Account | Description |
| --- | --- |
| admin | Has full root access to the Redis database, all channels, all keys, all commands. This is NOT used by the Itential application but is created for admin purposes. |
| itential | Has full access to the Redis database, all channels, all keys, EXCEPT the following commands: asking, cluster, readonly, readwrite, bgrewriteaof, bgsave, failover, flushall, flushdb, psync, replconf, replicaof, save, shutdown, sync. This is the account used by the Itential Platform application. |
| repluser | Has access to the minimum set of commands to perform replication: psync, replconf, ping. |
| admin (Sentinel) | Full root access to Redis Sentinel. This is NOT used by the Itential application but is created for admin purposes of Redis Sentinel. |
| sentineluser | Has access to the minimum set of commands to perform sentinel monitoring: multi, slaveof, ping, exec, subscribe, `config |
| monitor | Has access to the minimum set of commands to expose metrics for observability: `-@all +@connection +memory -readonly +strlen +config |
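The itential and repluser descriptions above map onto Redis ACL rules. The sketch below generates redis.conf `user` lines for those two accounts only, since their command lists are given in full. The passwords are placeholders, the function names are hypothetical, and the channel pattern `&*` assumes a Redis version with ACL support for Pub/Sub channels.

```bash
#!/usr/bin/env bash
# Generate redis.conf "user" lines from the account descriptions above.
# Passwords are placeholders; a SHA-256 hash (#<hash>) can replace >password.

itential_acl_line() {
  # Usage: itential_acl_line <password>
  # All keys (~*), all channels (&*), all commands (+@all), minus the
  # commands listed as excluded for the itential account.
  local denied="asking cluster readonly readwrite bgrewriteaof bgsave failover flushall flushdb psync replconf replicaof save shutdown sync"
  local rules="" cmd
  for cmd in $denied; do
    rules="$rules -$cmd"
  done
  echo "user itential on >$1 ~* &* +@all$rules"
}

repluser_acl_line() {
  # Usage: repluser_acl_line <password>
  # Minimum command set for replication, as listed for repluser.
  echo "user repluser on >$1 +psync +replconf +ping"
}
```

The printed lines can be reviewed and then placed in the Redis configuration file or an ACL file. Generating them from the exclusion list keeps the rule text and the documented policy in step.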