Deploying distributed execution clusters

This guide provides step-by-step procedures for deploying a single gateway cluster with distributed service execution, where gateway servers handle management functions and dedicated runner nodes handle service execution.

Overview

A distributed execution cluster consists of:

Shared database (etcd or DynamoDB): Stores cluster data and coordinates communication between gateway servers
Gateway server(s): Manages automation resources and delegates execution to runners
Runner nodes: Execute automation services and report results back to gateway server(s)
Gateway client: Sends requests to the gateway server for processing

Prerequisites

Before starting the deployment:

Ensure all nodes have IAG installed
Verify network connectivity between all components
Have administrative access to configure each component
Understand your desired cluster topology and node assignments

Step 1: Configure the shared database

Choose between etcd or Amazon DynamoDB based on your infrastructure preferences and requirements. Both databases store cluster data and enable coordination between cluster nodes.

Option A: Configure etcd database

etcd keeps cluster data in a persistent key-value store that every gateway server and runner in the cluster reads from and writes to.

Set up etcd

Install and configure etcd following the official etcd documentation
Configure etcd server or cluster using the etcd database configuration procedures
Verify etcd is running and accessible from all planned gateway cluster nodes
Record connection details (hostname and port) for use in subsequent configuration steps
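
One way to check the "accessible from all planned gateway cluster nodes" step is a plain TCP connection test run from each node. The Python sketch below assumes only that etcd accepts TCP connections on the recorded host and port. The hostname etcd1.example.com is a placeholder, and 2379 is etcd's default client port, which your deployment may override. A successful connection shows that the port is reachable, not that etcd is healthy or that authentication will succeed.

```python
import socket


def parse_endpoint(endpoint: str) -> tuple[str, int]:
    """Split a 'hostname:port' string into (hostname, port)."""
    host, _, port = endpoint.rpartition(":")
    if not host or not port.isdecimal():
        raise ValueError(f"expected hostname:port, got {endpoint!r}")
    return host, int(port)


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Example call (placeholder endpoint, not executed here):
#   can_connect(*parse_endpoint("etcd1.example.com:2379"))
```

Running the check from every planned gateway server and runner, rather than from a single workstation, catches firewall rules that apply to only some nodes.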

Security Considerations

Configure appropriate authentication and authorization for etcd access
Ensure network security between etcd and gateway cluster nodes
Consider TLS encryption for etcd communications in production environments

Option B: Configure Amazon DynamoDB table

Amazon DynamoDB provides a managed NoSQL database service that can serve as the shared database for gateway clusters.

Set up DynamoDB

Create a DynamoDB table following the DynamoDB table configuration procedures
Configure AWS credentials with appropriate permissions for all gateway cluster nodes
Verify connectivity from all planned gateway cluster nodes to AWS DynamoDB service
Record AWS region and table configuration for use in subsequent configuration steps

Security Considerations

Configure IAM roles and policies with least privilege access
Enable encryption at rest and in transit
Consider VPC endpoints for enhanced security
Monitor access through AWS CloudTrail

Step 2: Configure the gateway server

The gateway server manages automation resources and coordinates execution across runner nodes.

Database Connection Configuration

Choose the appropriate configuration based on your selected database option.

For etcd database

Set the etcd connection:
- Configure GATEWAY_STORE_ETCD_HOSTS to the hostname:port of your etcd server
- For etcd clusters, use a space-separated list: hostname1:port hostname2:port hostname3:port
Verify etcd store variables:
- Review all GATEWAY_STORE_ETCD_* configuration variables
- Ensure they match your etcd setup and security requirements
- For more information, see Store variables.
Handle data migration (if applicable):
- If migrating from a local database to etcd, use the iagctl db migrate command
- For more information, see iagctl db migrate
- Plan migration during a maintenance window to avoid service disruption
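
The space-separated format for GATEWAY_STORE_ETCD_HOSTS is easy to get wrong when the list is assembled by hand. The following Python sketch builds and checks the value. The hostnames are placeholders and format_etcd_hosts is a hypothetical helper, not part of IAG.

```python
def format_etcd_hosts(endpoints: list[str]) -> str:
    """Join hostname:port entries into a single space-separated string."""
    for endpoint in endpoints:
        host, sep, port = endpoint.rpartition(":")
        if not (host and sep and port.isdecimal()):
            raise ValueError(f"expected hostname:port, got {endpoint!r}")
    return " ".join(endpoints)


# Placeholder three-node etcd cluster.
hosts = format_etcd_hosts(
    ["etcd1.example.com:2379", "etcd2.example.com:2379", "etcd3.example.com:2379"]
)
print(f"GATEWAY_STORE_ETCD_HOSTS={hosts}")
```

Because the value contains spaces, it needs quoting wherever your shell or service definition would otherwise split it into separate words.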

For DynamoDB database

Set the DynamoDB connection:
- Configure GATEWAY_STORE_DYNAMODB_* variables according to your AWS setup
- Set the appropriate AWS region and table names
- Configure AWS credentials through environment variables, IAM roles, or credential files
Verify DynamoDB store variables:
- Review all GATEWAY_STORE_DYNAMODB_* configuration variables
- Ensure proper AWS permissions and connectivity
- For more information, see Store variables.
Handle data migration (if applicable):
- If migrating from a local database to DynamoDB, use the iagctl db migrate command
- For more information, see iagctl db migrate
- Plan migration during a maintenance window to avoid service disruption

Cluster Configuration

Set the cluster ID:
- Configure GATEWAY_APPLICATION_CLUSTER_ID to your desired cluster identifier
- Use a descriptive name that reflects the cluster's purpose or environment
- Note: Changing the cluster ID creates a new namespace in the database
Configure application mode:
- Set GATEWAY_APPLICATION_MODE to server
- This designates the node as a gateway server rather than a runner

Distributed Execution Setup

Enable distributed execution:
- Set GATEWAY_SERVER_DISTRIBUTED_EXECUTION to true
- This enables round-robin distribution of execution requests to registered runners
- Runners must share the same database (etcd or DynamoDB) and cluster ID
Configure server variables:
- Verify all GATEWAY_SERVER_* configuration variables are properly set
- Ensure configuration allows gateway clients to connect and send requests
- Pay particular attention to network and security settings
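
Taken together, the cluster and distributed execution settings above amount to a handful of variables on the gateway server. The Python sketch below collects them for an etcd-backed cluster and renders them as KEY=value lines. The cluster ID prod-automation and the etcd endpoint are placeholder values, and how the variables reach the process (shell environment, service definition, or container settings) depends on your installation.

```python
def server_settings(cluster_id: str, etcd_hosts: str) -> dict[str, str]:
    """Return the gateway server variables named in this step."""
    if not cluster_id.strip():
        raise ValueError("cluster ID must not be empty")
    return {
        "GATEWAY_STORE_ETCD_HOSTS": etcd_hosts,
        "GATEWAY_APPLICATION_CLUSTER_ID": cluster_id,
        "GATEWAY_APPLICATION_MODE": "server",
        "GATEWAY_SERVER_DISTRIBUTED_EXECUTION": "true",
    }


def render_env(settings: dict[str, str]) -> str:
    """Render settings as KEY=value lines, one per variable."""
    return "\n".join(f"{key}={value}" for key, value in settings.items())


# Placeholder values for illustration only.
print(render_env(server_settings("prod-automation", "etcd1.example.com:2379")))
```

The remaining GATEWAY_SERVER_* and GATEWAY_STORE_ETCD_* variables are left out on purpose, since their values depend on your network and security setup.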

Start the Gateway Server

Launch the gateway server:
- Start the server using one of the following methods:
  - Use the systemd service (if installed via installer): systemctl start iagctl
  - Run directly from the CLI (if installed without a service): iagctl server
  - Start your container (if using containerized deployment)
- The server will begin listening for requests from configured gateway clients
- Monitor logs for successful startup and database connection confirmation

Step 3: Configure runner nodes

Runner nodes handle the actual execution of automation services delegated by gateway servers.

Database and Cluster Configuration

Configure database connection:
- Set all database configuration variables to match the gateway server exactly
- For etcd: Use identical GATEWAY_STORE_ETCD_* values
- For DynamoDB: Use identical GATEWAY_STORE_DYNAMODB_* values
- Matching values ensure the runner joins the same cluster as the gateway server
Set cluster membership:
- Configure GATEWAY_APPLICATION_CLUSTER_ID to the same value as the gateway server
- This ensures the runner joins the correct cluster namespace
Set application mode:
- Configure GATEWAY_APPLICATION_MODE to runner
- This designates the node as an execution-only runner
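
Since a runner must mirror the gateway server's database settings and cluster ID, it can help to compare the two configurations before starting the runner. The Python sketch below assumes you have already collected each node's variables into a dictionary. find_mismatches is a hypothetical helper, and the sample values are placeholders.

```python
SHARED_PREFIXES = ("GATEWAY_STORE_",)  # database settings must be identical
SHARED_KEYS = ("GATEWAY_APPLICATION_CLUSTER_ID",)  # cluster ID must be identical


def find_mismatches(server: dict[str, str], runner: dict[str, str]) -> list[str]:
    """List variables that should match between server and runner but do not."""
    keys = {
        key
        for key in set(server) | set(runner)
        if key.startswith(SHARED_PREFIXES) or key in SHARED_KEYS
    }
    problems = [
        f"{key}: server={server.get(key)!r} runner={runner.get(key)!r}"
        for key in sorted(keys)
        if server.get(key) != runner.get(key)
    ]
    if runner.get("GATEWAY_APPLICATION_MODE") != "runner":
        problems.append("GATEWAY_APPLICATION_MODE on the runner is not 'runner'")
    return problems


# Placeholder configurations with a deliberate cluster ID typo on the runner.
server = {
    "GATEWAY_STORE_ETCD_HOSTS": "etcd1.example.com:2379",
    "GATEWAY_APPLICATION_CLUSTER_ID": "prod-automation",
    "GATEWAY_APPLICATION_MODE": "server",
}
runner = {
    "GATEWAY_STORE_ETCD_HOSTS": "etcd1.example.com:2379",
    "GATEWAY_APPLICATION_CLUSTER_ID": "prod-automaton",
    "GATEWAY_APPLICATION_MODE": "runner",
}
for problem in find_mismatches(server, runner):
    print(problem)
```

An empty result means the compared variables agree. It does not prove the runner can reach the database.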

Runner Communication Setup

Configure runner variables:
- Set appropriate values for all GATEWAY_RUNNER_* configuration variables
- Ensure runner nodes can communicate with the gateway server
- Configure any specific execution environment requirements

Start Runner Nodes

Launch each runner:
- Start each runner using one of the following methods:
  - Use the systemd service (if installed via installer): systemctl start iagctl
  - Run directly from the CLI (if installed without a service): iagctl runner
  - Start your container (if using containerized deployment)
- Monitor logs for successful startup
- Look for the INFO level log message: registered runner with database
- This confirms successful registration with the cluster
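
When several runners start at once, scanning their logs for the registration message by hand becomes tedious. The Python sketch below searches log lines for the message text quoted above. Only that text comes from this guide. The timestamp and field layout in the sample lines are assumptions, as is where your logs are written (journal, file, or container output).

```python
REGISTRATION_MESSAGE = "registered runner with database"  # text quoted in this guide


def runner_registered(log_lines) -> bool:
    """Return True if any log line contains the registration message."""
    return any(REGISTRATION_MESSAGE in line for line in log_lines)


# Hypothetical log lines; the real layout may differ.
sample = [
    "2024-01-01T00:00:00Z INFO starting runner",
    "2024-01-01T00:00:01Z INFO registered runner with database",
]
print(runner_registered(sample))  # prints True
```

The function accepts any iterable of strings, so an open log file can be passed in directly.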

Step 4: Configure the gateway client

The gateway client sends automation requests to the gateway server for processing and execution.

Client Connection Setup

Configure server connection:
- Set GATEWAY_CLIENT_HOST to the hostname or IP address of the gateway server
- Set GATEWAY_CLIENT_PORT to the port of the gateway server
- Ensure the hostname is resolvable and accessible from the client
Verify client configuration:
- Review all GATEWAY_CLIENT_* configuration variables
- Ensure proper network connectivity and authentication settings
- Configure any required security parameters
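
A quick sanity check on the two connection variables can catch an unset host or an out-of-range port before the first login attempt. The Python sketch below assumes the client reads GATEWAY_CLIENT_HOST and GATEWAY_CLIENT_PORT from a mapping such as os.environ. client_problems is a hypothetical helper, and the sample values are placeholders.

```python
def client_problems(env) -> list[str]:
    """Return a list of problems found in the client connection variables."""
    host = env.get("GATEWAY_CLIENT_HOST", "").strip()
    port = env.get("GATEWAY_CLIENT_PORT", "").strip()
    problems = []
    if not host:
        problems.append("GATEWAY_CLIENT_HOST is not set")
    # A valid TCP port is a whole number from 1 to 65535.
    if not port.isdecimal() or not 1 <= int(port) <= 65535:
        problems.append(f"GATEWAY_CLIENT_PORT is not a valid port: {port!r}")
    return problems


# Placeholder values; pass os.environ to check the current environment instead.
print(client_problems({"GATEWAY_CLIENT_HOST": "gateway.example.com",
                       "GATEWAY_CLIENT_PORT": "70000"}))
```

This validates the values only. Whether the hostname resolves and the port is open still has to be confirmed from the client machine.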

Client Authentication

Authenticate with the server:
- Follow the login guide to authenticate the client
- Verify successful authentication before proceeding
- Ensure client has appropriate permissions for intended operations

Step 5: Verify cluster deployment

Test Cluster Connectivity

Check runner registration:
- Run iagctl get runners from the gateway client
- Verify all expected runners appear in the output
- Confirm runners show as online and available
Verify cluster status:
- Check that all runners are registered with the same cluster ID
- Confirm gateway server recognizes all runners
- Review logs for any connectivity issues
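
Comparing the registered runners against your planned topology is simple set arithmetic. This guide does not show the output format of iagctl get runners, so the Python sketch below takes runner names you have already collected from that output. The names and both helper functions are hypothetical.

```python
def missing_runners(expected, observed) -> list[str]:
    """Runners that were planned but do not appear in the observed list."""
    return sorted(set(expected) - set(observed))


def unexpected_runners(expected, observed) -> list[str]:
    """Runners that appear in the observed list but were not planned."""
    return sorted(set(observed) - set(expected))


# Placeholder names: the planned topology versus what the cluster reports.
expected = ["runner-a", "runner-b", "runner-c"]
observed = ["runner-a", "runner-c", "runner-old"]
print("missing:", missing_runners(expected, observed))
print("unexpected:", unexpected_runners(expected, observed))
```

An unexpected runner is often a node left over from an earlier deployment that still points at the same database and cluster ID.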

Test Service Execution

Execute test services:
- Run automation requests through the gateway client
- Monitor execution across different runner nodes
- Verify round-robin distribution is working correctly
Monitor execution logs:
- Observe logs on runner nodes during service execution
- Confirm services execute on the expected runner nodes
- Verify results are returned correctly to the client
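
To judge whether round-robin distribution is working, it helps to know what an ideal rotation looks like. The Python sketch below models strict rotation over a fixed set of runners and checks whether observed per-runner counts are consistent with it. It illustrates the concept and is not IAG's scheduler. Real counts can deviate when runners join, leave, or are unavailable during the test. The runner names are placeholders.

```python
from collections import Counter
from itertools import cycle


def simulate_round_robin(runners: list[str], request_count: int) -> Counter:
    """Assign requests to runners in strict rotation and count the assignments."""
    if not runners:
        raise ValueError("at least one runner is required")
    rotation = cycle(runners)
    return Counter(next(rotation) for _ in range(request_count))


def is_balanced(counts, runners: list[str]) -> bool:
    """True if per-runner counts differ by at most one, as strict rotation implies."""
    per_runner = [counts.get(name, 0) for name in runners]
    return max(per_runner) - min(per_runner) <= 1


runners = ["runner-a", "runner-b", "runner-c"]
ideal = simulate_round_robin(runners, 7)
print(dict(ideal))
print(is_balanced(ideal, runners))
```

To apply the check to a real test run, tally how many executions each runner's logs show and pass that tally to is_balanced.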

Performance Verification

Test load distribution:
- Execute multiple concurrent requests
- Verify load is distributed across available runners
- Monitor resource utilization on runner nodes
Validate failover behavior:
- Temporarily disable a runner node
- Confirm execution continues on remaining runners
- Verify automatic redistribution of load
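
For the concurrent request test, a small thread pool is usually enough to generate overlapping requests from one client machine. The Python sketch below assumes you supply send_request, a hypothetical function that submits one automation request through the gateway client and returns its result. How a request is submitted is outside the scope of this guide, so the example uses a stand-in.

```python
from concurrent.futures import ThreadPoolExecutor


def run_concurrently(send_request, request_count: int, workers: int = 8) -> list:
    """Call send_request(i) for each i in range(request_count) on a thread pool.

    Results are returned in request order, not completion order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(send_request, range(request_count)))


def fake_request(index: int) -> str:
    """Stand-in for a real request; replace with your own submission logic."""
    return f"request-{index} ok"


print(run_concurrently(fake_request, 3))
```

If send_request raises an exception, pool.map re-raises it when that result is read, so a failed request stops the run rather than passing silently.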

Troubleshooting Common Issues

Runners Not Registering

Check database connectivity: Verify all nodes can connect to the shared database (etcd or DynamoDB)
Verify cluster ID: Ensure all nodes use the same GATEWAY_APPLICATION_CLUSTER_ID
Review network configuration: Check firewall rules and network connectivity
For DynamoDB: Verify AWS credentials and IAM permissions
Examine logs: Look for specific error messages in runner startup logs

Gateway Server Not Recognizing Runners

Verify database configuration: Ensure gateway server and runners use identical database settings
Check distributed execution: Confirm GATEWAY_SERVER_DISTRIBUTED_EXECUTION is set to true
Review cluster membership: Verify cluster ID consistency across all nodes
For DynamoDB: Ensure consistent AWS region and table configuration

Client Connection Issues

Verify server accessibility: Confirm client can reach gateway server hostname/IP
Check authentication: Ensure client authentication is properly configured
Review network settings: Verify firewall rules allow client-server communication
Review TLS configuration: Confirm the client's TLS settings match what the gateway server expects

Post-Deployment Tasks

Monitoring Setup

Implement cluster monitoring: Set up monitoring for all cluster components
Configure log aggregation: Centralize logs for easier troubleshooting
Set up alerting: Create alerts for runner failures or connectivity issues

Documentation and Maintenance

Document cluster configuration: Record all configuration settings and topology
Create operational procedures: Document startup, shutdown, and maintenance procedures
Plan scaling procedures: Prepare processes for adding or removing runner nodes

Security Hardening

Implement access controls: Configure appropriate authentication and authorization
Plan security updates: Establish procedures for applying security patches

Once the verification steps and post-deployment tasks are complete, your distributed execution cluster is ready for production use. Regular monitoring and maintenance help keep it performing reliably.