Deploying distributed execution clusters


This guide provides step-by-step procedures for deploying a single gateway cluster with distributed service execution, where gateway servers handle management functions and dedicated runner nodes handle service execution.

Overview

A distributed execution cluster consists of:

  • Shared database (etcd or DynamoDB): Stores cluster data and coordinates communication between cluster nodes
  • Gateway server(s): Manage automation resources and delegate execution to runners
  • Runner nodes: Execute automation services and report results back to gateway server(s)
  • Gateway client: Sends requests to the gateway server for processing

Prerequisites

Before starting the deployment:

  • Ensure all nodes have IAG installed
  • Verify network connectivity between all components
  • Have administrative access to configure each component
  • Understand your desired cluster topology and node assignments

Step 1: Configure the shared database

Choose between etcd or Amazon DynamoDB based on your infrastructure preferences and requirements. Both databases store cluster data and enable coordination between cluster nodes.

Option A: Configure etcd database

The etcd database stores data in a persistent key-value store and enables coordination between cluster nodes.

Set up etcd

  • Install and configure etcd following the official etcd documentation
  • Configure etcd server or cluster using the etcd database configuration procedures
  • Verify etcd is running and accessible from all planned gateway cluster nodes
  • Record connection details (hostname and port) for use in subsequent configuration steps
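
One way to verify that etcd is running and reachable is etcdctl's endpoint health check, run from each planned gateway server and runner node. This is a minimal sketch: the hostnames are hypothetical, port 2379 is etcd's default client port, and a secured etcd would also need the relevant TLS or authentication flags.

```shell
# Hypothetical endpoints; replace with your own etcd hostnames and ports.
ETCD_ENDPOINTS="etcd1.example.com:2379,etcd2.example.com:2379,etcd3.example.com:2379"

# Ask every listed endpoint whether it is healthy.
# $1 is a comma-separated list of host:port endpoints.
check_etcd_health() {
  etcdctl --endpoints="$1" endpoint health
}
```

Calling check_etcd_health "$ETCD_ENDPOINTS" on a node should report each endpoint as healthy before you continue. Note that etcdctl takes a comma-separated endpoint list.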

Security Considerations

  • Configure appropriate authentication and authorization for etcd access
  • Ensure network security between etcd and gateway cluster nodes
  • Consider TLS encryption for etcd communications in production environments

Option B: Configure Amazon DynamoDB table

Amazon DynamoDB provides a managed NoSQL database service that can serve as the shared database for gateway clusters.

Set up DynamoDB

  • Create a DynamoDB table following the DynamoDB table configuration procedures
  • Configure AWS credentials with appropriate permissions for all gateway cluster nodes
  • Verify connectivity from all planned gateway cluster nodes to AWS DynamoDB service
  • Record AWS region and table configuration for use in subsequent configuration steps
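
Connectivity and credentials can be checked from each node with the AWS CLI, assuming it is installed there. The sketch below uses a hypothetical table name and region. It also assumes the node's credentials permit dynamodb:DescribeTable, which may be more than the gateway itself needs.

```shell
# Hypothetical table name and region; substitute the values you recorded.
IAG_TABLE="iag-cluster"
IAG_REGION="us-east-1"

# Print the table status as seen from this node.
# "ACTIVE" means the table exists and is reachable with these credentials.
dynamodb_table_status() {
  aws dynamodb describe-table \
    --table-name "$1" \
    --region "$2" \
    --query 'Table.TableStatus' \
    --output text
}
```

Running dynamodb_table_status "$IAG_TABLE" "$IAG_REGION" on every planned cluster node helps catch credential or network problems before the gateway is started.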

Security Considerations

  • Configure IAM roles and policies with least privilege access
  • Enable encryption at rest and in transit
  • Consider VPC endpoints for enhanced security
  • Monitor access through AWS CloudTrail

Step 2: Configure the gateway server

The gateway server manages automation resources and coordinates execution across runner nodes.

Database Connection Configuration

Choose the appropriate configuration based on your selected database option.

For etcd database

  1. Set the etcd connection:
    • Configure GATEWAY_STORE_ETCD_HOSTS to the hostname:port of your etcd server
    • For etcd clusters, use a space-separated list: hostname1:port hostname2:port hostname3:port
  2. Verify etcd store variables:
    • Review all GATEWAY_STORE_ETCD_* configuration variables
    • Ensure they match your etcd setup and security requirements
    • For more information, see Store variables.
  3. Handle data migration (if applicable):
    • If migrating from a local database to etcd, use the iagctl db migrate command
    • For more information, see iagctl db migrate
    • Plan migration during a maintenance window to avoid service disruption
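
The etcd connection setting from step 1 might look like the following. This sketch assumes the variables are supplied as environment variables to the process or service that launches the gateway. The hostnames are hypothetical.

```shell
# Single etcd server (2379 is etcd's default client port).
export GATEWAY_STORE_ETCD_HOSTS="etcd1.example.com:2379"

# Three-member etcd cluster: space-separated hostname:port entries.
export GATEWAY_STORE_ETCD_HOSTS="etcd1.example.com:2379 etcd2.example.com:2379 etcd3.example.com:2379"

# Sanity check: print one entry per line and flag anything without a port.
for entry in $GATEWAY_STORE_ETCD_HOSTS; do
  case "$entry" in
    *:*) echo "ok: $entry" ;;
    *)   echo "missing port: $entry" ;;
  esac
done
```

The second export overrides the first; keep only the form that matches your etcd deployment.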

For DynamoDB database

  1. Set the DynamoDB connection:
    • Configure GATEWAY_STORE_DYNAMODB_* variables according to your AWS setup
    • Set the appropriate AWS region and table names
    • Configure AWS credentials through environment variables, IAM roles, or credential files
  2. Verify DynamoDB store variables:
    • Review all GATEWAY_STORE_DYNAMODB_* configuration variables
    • Ensure proper AWS permissions and connectivity
    • For more information, see Store variables.
  3. Handle data migration (if applicable):
    • If migrating from a local database to DynamoDB, use the iagctl db migrate command
    • For more information, see iagctl db migrate
    • Plan migration during a maintenance window to avoid service disruption

Cluster Configuration

  1. Set the cluster ID:
    • Configure GATEWAY_APPLICATION_CLUSTER_ID to your desired cluster identifier
    • Use a descriptive name that reflects the cluster's purpose or environment
    • Note: Changing the cluster ID creates a new namespace in the database
  2. Configure application mode:
    • Set GATEWAY_APPLICATION_MODE to server
    • This designates the node as a gateway server rather than a runner

Distributed Execution Setup

  1. Enable distributed execution:
    • Set GATEWAY_SERVER_DISTRIBUTED_EXECUTION to true
    • This enables round-robin distribution of execution requests to registered runners
    • Runners must share the same database (etcd or DynamoDB) and cluster ID
  2. Configure server variables:
    • Verify all GATEWAY_SERVER_* configuration variables are properly set
    • Ensure configuration allows gateway clients to connect and send requests
    • Pay particular attention to network and security settings
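
Taken together, the cluster and distributed execution settings for a gateway server could be expressed as follows. The cluster ID is a hypothetical example, and the sketch assumes the settings are provided as environment variables. The remaining GATEWAY_SERVER_* variables are not shown.

```shell
# Hypothetical cluster identifier; choose a name that reflects the environment.
export GATEWAY_APPLICATION_CLUSTER_ID="prod-automation"

# Run this node as a gateway server rather than a runner.
export GATEWAY_APPLICATION_MODE="server"

# Distribute execution requests round-robin to registered runners.
export GATEWAY_SERVER_DISTRIBUTED_EXECUTION="true"
```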

Start the gateway server

  1. Launch the gateway server:
    • Start the server using one of the following methods:
      • Use the systemd service (if installed via installer): systemctl start iagctl
      • Run directly from the CLI (if installed without a service): iagctl server
      • Start your container (if using containerized deployment)
    • The server will begin listening for requests from configured gateway clients
    • Monitor logs for successful startup and database connection confirmation

Step 3: Configure runner nodes

Runner nodes handle the actual execution of automation services delegated by gateway servers.

Database and Cluster Configuration

  1. Configure database connection:
    • Set all database configuration variables to match the gateway server exactly
    • For etcd: Use identical GATEWAY_STORE_ETCD_* values
    • For DynamoDB: Use identical GATEWAY_STORE_DYNAMODB_* values
    • Any mismatch in these values can prevent the runner from joining the cluster
  2. Set cluster membership:
    • Configure GATEWAY_APPLICATION_CLUSTER_ID to the same value as the gateway server
    • This ensures the runner joins the correct cluster namespace
  3. Set application mode:
    • Configure GATEWAY_APPLICATION_MODE to runner
    • This designates the node as an execution-only runner
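
Because a runner must match the gateway server on every database setting and on the cluster ID, it can help to compare the two configurations mechanically. The sketch below assumes each node's settings have been saved to a plain file of VARIABLE=value lines. The file names are hypothetical and this is not an IAG feature.

```shell
# Print the settings that must match across nodes, sorted for comparison.
# $1 is a file of VARIABLE=value lines, optionally prefixed with "export ".
shared_settings() {
  grep -E '^(export )?GATEWAY_(STORE_|APPLICATION_CLUSTER_ID=)' "$1" \
    | sed 's/^export //' \
    | sort
}

# Succeed only when both files carry identical shared settings.
same_cluster_settings() {
  [ "$(shared_settings "$1")" = "$(shared_settings "$2")" ]
}
```

For example, same_cluster_settings server.env runner.env || echo "settings differ" flags a mismatch. GATEWAY_APPLICATION_MODE is deliberately excluded from the comparison, since it is server on one node and runner on the other.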

Runner Communication Setup

  1. Configure runner variables:
    • Set appropriate values for all GATEWAY_RUNNER_* configuration variables
    • Ensure runner nodes can communicate with the gateway server
    • Configure any specific execution environment requirements

Start Runner Nodes

  1. Launch each runner:
    • Start each runner using one of the following methods:
      • Use the systemd service (if installed via installer): systemctl start iagctl
      • Run directly from the CLI (if installed without a service): iagctl runner
      • Start your container (if using containerized deployment)
    • Monitor logs for successful startup
    • Look for the INFO level log message: registered runner with database
    • This confirms successful registration with the cluster
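
The registration check can be scripted by searching the runner's log output for the message quoted above. This is a minimal sketch that reads log lines from standard input. Where the logs actually live depends on how the runner was installed.

```shell
# Read log lines on stdin; succeed if the registration message is present.
runner_registered() {
  grep -qF 'registered runner with database'
}
```

On a systemd installation, and assuming the service logs to the journal under the iagctl unit, something like journalctl -u iagctl --no-pager | runner_registered && echo "runner registered" would confirm registration.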

Step 4: Configure the gateway client

The gateway client sends automation requests to the gateway server for processing and execution.

Client Connection Setup

  1. Configure server connection:
    • Set GATEWAY_CLIENT_HOSTS to the hostname or IP address of the gateway server
    • Ensure the hostname is resolvable and accessible from the client
  2. Verify client configuration:
    • Review all GATEWAY_CLIENT_* configuration variables
    • Ensure proper network connectivity and authentication settings
    • Configure any required security parameters
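
A client-side setting might look like this, again assuming environment variables are used. The hostname is hypothetical, and the helper relies on getent, which is available on most Linux systems.

```shell
# Hypothetical gateway server hostname; an IP address also works.
export GATEWAY_CLIENT_HOSTS="gateway.example.com"

# Succeed if the given hostname resolves from this machine (Linux, via getent).
resolves() {
  getent hosts "$1" >/dev/null
}
```

Running resolves "$GATEWAY_CLIENT_HOSTS" || echo "cannot resolve gateway server" before logging in catches name resolution problems early.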

Client Authentication

  1. Authenticate with the server:
    • Follow the login guide to authenticate the client
    • Verify successful authentication before proceeding
    • Ensure client has appropriate permissions for intended operations

Step 5: Verify cluster deployment

Test Cluster Connectivity

  1. Check runner registration:
    • Run iagctl get runners from the gateway client
    • Verify all expected runners appear in the output
    • Confirm runners show as online and available
  2. Verify cluster status:
    • Check that all runners are registered with the same cluster ID
    • Confirm gateway server recognizes all runners
    • Review logs for any connectivity issues
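
The runner registration check can be automated by comparing the output of iagctl get runners against the runners you expect. The sketch below assumes only that each runner's name appears as a whole word somewhere in that output. The exact output format is not specified here, and the runner names are hypothetical.

```shell
# Succeed only if every expected runner name appears in the command output.
# Arguments: the runner names you expect to be registered.
all_runners_listed() {
  local output name
  output=$(iagctl get runners) || return 1
  for name in "$@"; do
    if ! printf '%s\n' "$output" | grep -qwF -- "$name"; then
      echo "missing: $name"
      return 1
    fi
  done
}
```

For example, all_runners_listed runner-a runner-b && echo "all runners present". This checks presence only; confirming that runners are online still requires reading the command output.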

Test Service Execution

  1. Execute test services:
    • Run automation requests through the gateway client
    • Monitor execution across different runner nodes
    • Verify round-robin distribution is working correctly
  2. Monitor execution logs:
    • Observe logs on runner nodes during service execution
    • Confirm services execute on the expected runner nodes
    • Verify results are returned correctly to the client
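
To judge whether distribution looks round-robin, one option is to note which runner handled each test request and tally the results. The sketch below assumes you have already collected one runner name per executed request, for example from the runner logs. How you collect them is left open.

```shell
# Read one runner name per line (one line per executed request) and print
# a count per runner, busiest first.
tally_runners() {
  sort | uniq -c | sort -rn
}
```

With round-robin distribution across healthy runners, the counts should be roughly equal once the number of requests is a multiple of the number of runners.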

Performance Verification

  1. Test load distribution:
    • Execute multiple concurrent requests
    • Verify load is distributed across available runners
    • Monitor resource utilization on runner nodes
  2. Validate failover behavior:
    • Temporarily disable a runner node
    • Confirm execution continues on remaining runners
    • Verify automatic redistribution of load

Troubleshooting Common Issues

Runners Not Registering

  • Check database connectivity: Verify all nodes can connect to the shared database (etcd or DynamoDB)
  • Verify cluster ID: Ensure all nodes use the same GATEWAY_APPLICATION_CLUSTER_ID
  • Review network configuration: Check firewall rules and network connectivity
  • For DynamoDB: Verify AWS credentials and IAM permissions
  • Examine logs: Look for specific error messages in runner startup logs

Gateway Server Not Recognizing Runners

  • Verify database configuration: Ensure gateway server and runners use identical database settings
  • Check distributed execution: Confirm GATEWAY_SERVER_DISTRIBUTED_EXECUTION is set to true
  • Review cluster membership: Verify cluster ID consistency across all nodes
  • For DynamoDB: Ensure consistent AWS region and table configuration

Client Connection Issues

  • Verify server accessibility: Confirm client can reach gateway server hostname/IP
  • Check authentication: Ensure client authentication is properly configured
  • Review network settings: Verify firewall rules allow client-server communication
  • Review TLS configuration: Confirm the client's TLS settings match what the gateway server expects

Post-Deployment Tasks

Monitoring Setup

  1. Implement cluster monitoring: Set up monitoring for all cluster components
  2. Configure log aggregation: Centralize logs for easier troubleshooting
  3. Set up alerting: Create alerts for runner failures or connectivity issues

Documentation and Maintenance

  1. Document cluster configuration: Record all configuration settings and topology
  2. Create operational procedures: Document startup, shutdown, and maintenance procedures
  3. Plan scaling procedures: Prepare processes for adding or removing runner nodes

Security Hardening

  1. Implement access controls: Configure appropriate authentication and authorization
  2. Plan security updates: Establish procedures for applying security patches

Your distributed execution cluster is now ready for production use. Regular monitoring and maintenance will ensure optimal performance and reliability.