This guide provides step-by-step procedures for deploying a single gateway cluster with distributed service execution, where gateway servers handle management functions and dedicated runner nodes handle service execution.
Overview
A distributed execution cluster consists of:
Shared database (etcd or DynamoDB): Stores cluster data and coordinates communication between gateway servers
Gateway server(s): Manage automation resources and delegate execution to runners
Runner nodes: Execute automation services and report results back to gateway server(s)
Gateway client: Sends requests to the gateway server for processing
Prerequisites
Before starting the deployment:
Ensure all nodes have IAG installed
Verify network connectivity between all components
Have administrative access to configure each component
Understand your desired cluster topology and node assignments
Step 1: Configure the shared database
Choose either etcd or Amazon DynamoDB based on your infrastructure preferences and requirements. Both databases store cluster data and enable coordination between cluster nodes.
Option A: Configure etcd database
The etcd database stores data in a persistent key-value store and enables coordination between cluster nodes.
Verify successful authentication before proceeding
Ensure client has appropriate permissions for intended operations
Step 5: Verify cluster deployment
Test cluster connectivity
Check runner registration:
Run iagctl get runners from the gateway client
Verify all expected runners appear in the output
Confirm runners show as online and available
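The registration check above can be scripted. The following is a minimal sketch, assuming only that the output of iagctl get runners contains each runner's name somewhere in its text. The output format is not specified in this guide, and the helper names and runner names used here are hypothetical.

```python
import re
import subprocess


def fetch_runner_listing() -> str:
    """Run the `iagctl get runners` command from this guide and return its raw output."""
    result = subprocess.run(
        ["iagctl", "get", "runners"],
        capture_output=True,
        text=True,
        check=True,  # raise if the command exits with a non-zero status
    )
    return result.stdout


def missing_runners(listing: str, expected: list[str]) -> list[str]:
    """Return the expected runner names that never appear as a whole token in the listing."""
    # Assumption: runner names consist of letters, digits, '_', '.', and '-'.
    # Matching whole tokens avoids treating "runner-1" as present when only
    # "runner-10" is listed.
    tokens = set(re.findall(r"[\w.-]+", listing))
    return [name for name in expected if name not in tokens]
```

For example, missing_runners(fetch_runner_listing(), ["runner-1", "runner-2"]) returns an empty list when both hypothetical names appear in the output. This only confirms presence; whether a runner is online still has to be read from the output itself.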
Verify cluster status:
Check that all runners are registered with the same cluster ID
Confirm gateway server recognizes all runners
Review logs for any connectivity issues
Test service execution
Execute test services:
Run automation requests through the gateway client
Monitor execution across different runner nodes
Verify round-robin distribution is working correctly
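One way to check the round-robin behavior is to record which runner handled each consecutive test request and confirm that the sequence repeats in a fixed rotation. The sketch below assumes strict rotation over a stable set of runners and sequential (not concurrent) test requests. How the sequence is collected, for example from runner logs, is left to you, and the function name is hypothetical.

```python
def is_round_robin(assignments: list[str]) -> bool:
    """Return True if the observed runner sequence cycles in a fixed order.

    `assignments` lists the runner that handled each request, in request order.
    """
    # Distinct runners in first-seen order; this is the presumed rotation.
    rotation = list(dict.fromkeys(assignments))
    if not rotation:
        return True  # nothing observed, nothing to contradict
    size = len(rotation)
    # Every request i must have gone to the runner at position i modulo the rotation size.
    return all(runner == rotation[i % size] for i, runner in enumerate(assignments))
```

A sequence such as runner-1, runner-2, runner-3, runner-1, runner-2 passes, while runner-1, runner-1, runner-2 does not. A failed check under concurrent load does not necessarily indicate a fault, since request ordering may differ from dispatch ordering.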
Monitor execution logs:
Observe logs on runner nodes during service execution
Confirm services execute on the expected runner nodes
Verify results are returned correctly to the client
Performance verification
Test load distribution:
Execute multiple concurrent requests
Verify load is distributed across available runners
Monitor resource utilization on runner nodes
Validate failover behavior:
Temporarily disable a runner node
Confirm execution continues on remaining runners
Verify automatic redistribution of load
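The failover test above can be summarized from the same kind of observation data: the runner that handled each request sent while one node was disabled. The following is a rough sketch under that assumption. The function name, the report keys, and the runner names are hypothetical and not part of the product.

```python
from collections import Counter


def failover_report(assignments: list[str], disabled: str, remaining: list[str]) -> dict:
    """Summarize where requests went while the `disabled` runner was offline."""
    counts = Counter(assignments)  # missing runners count as 0
    per_runner = [counts[r] for r in remaining]
    return {
        # Should be 0 if the gateway stopped dispatching to the disabled runner.
        "sent_to_disabled": counts[disabled],
        # Remaining runners that received no work at all.
        "idle_remaining": sorted(r for r in remaining if counts[r] == 0),
        # Difference between the busiest and least busy remaining runner.
        "spread": (max(per_runner) - min(per_runner)) if per_runner else 0,
    }
```

A healthy result has sent_to_disabled equal to 0, an empty idle_remaining list, and a small spread. What counts as "small" depends on the number of requests in the test.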
Troubleshooting common issues
Unable to register runners
Check database connectivity: Verify all nodes can connect to the shared database (etcd or DynamoDB)
Verify cluster ID: Ensure all nodes use the same GATEWAY_APPLICATION_CLUSTER_ID
Review network configuration: Check firewall rules and network connectivity
For DynamoDB: Verify AWS credentials and IAM permissions
Examine logs: Look for specific error messages in runner startup logs
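A mismatched cluster ID is easy to miss by eye. The sketch below compares GATEWAY_APPLICATION_CLUSTER_ID across nodes, assuming each node's settings are available as KEY=VALUE text (for example, an environment file). That storage format, the node names, and the helper names are assumptions for illustration.

```python
from typing import Optional

CLUSTER_ID_KEY = "GATEWAY_APPLICATION_CLUSTER_ID"


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blanks, comments, and an `export ` prefix."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")  # drop surrounding quotes
    return values


def cluster_id_by_node(env_by_node: dict[str, str]) -> dict[str, Optional[str]]:
    """Map each node name to its cluster ID, or None if the variable is not set."""
    return {node: parse_env(text).get(CLUSTER_ID_KEY) for node, text in env_by_node.items()}


def ids_consistent(ids: dict[str, Optional[str]]) -> bool:
    """True only if every node has the variable set and all values are identical."""
    values = set(ids.values())
    return len(values) == 1 and None not in values
```

Printing the result of cluster_id_by_node alongside ids_consistent makes it clear which node carries the odd value.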
Gateway server doesn’t recognize runners
Verify database configuration: Ensure gateway server and runners use identical database settings
Check distributed execution: Confirm GATEWAY_SERVER_DISTRIBUTED_EXECUTION is set to true
Review cluster membership: Verify cluster ID consistency across all nodes
For DynamoDB: Ensure consistent AWS region and table configuration
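The distributed execution setting can be checked with a few lines of code run in the gateway server's environment. This is a minimal sketch, assuming the setting is supplied as a process environment variable and that the comparison with "true" may ignore case and surrounding whitespace. If your deployment sets it in a configuration file instead, this check does not apply.

```python
import os
from typing import Mapping, Optional

SETTING = "GATEWAY_SERVER_DISTRIBUTED_EXECUTION"


def distributed_execution_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if the setting is present and equals "true"."""
    # Default to the current process environment when no mapping is passed.
    source = os.environ if env is None else env
    # Assumption: case and surrounding whitespace are not significant.
    return source.get(SETTING, "").strip().lower() == "true"
```

Passing an explicit mapping, such as values parsed from another node's settings, allows the same check to run without touching the local environment.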
Client connection issues
Verify server accessibility: Confirm client can reach gateway server hostname/IP
Check authentication: Ensure client authentication is properly configured
Review network settings: Verify firewall rules allow client-server communication
Review TLS configuration: Confirm that TLS is configured consistently on the client and the gateway server
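For the accessibility check, a plain TCP connection attempt from the client host separates network problems from authentication or TLS problems. The sketch below uses only the Python standard library. The host and port you pass are specific to your deployment and are not defined in this guide.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        # create_connection resolves the hostname and tries each resulting address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

A True result shows only that the port accepts connections. It does not validate TLS settings or client authentication, which still need to be reviewed separately.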
Post-deployment tasks
Monitoring setup
Implement cluster monitoring: Set up monitoring for all cluster components
Configure log aggregation: Centralize logs for easier troubleshooting
Set up alerting: Create alerts for runner failures or connectivity issues
Documentation and maintenance
Document cluster configuration: Record all configuration settings and topology
Create operational procedures: Document startup, shutdown, and maintenance procedures
Plan scaling procedures: Prepare processes for adding or removing runner nodes
Security hardening
Implement access controls: Configure appropriate authentication and authorization
Plan security updates: Establish procedures for applying security patches
Your distributed execution cluster is now ready for production use. Regular monitoring and maintenance help sustain its performance and reliability.