Manual Failover
  • 01 Apr 2024

Manual Failover Procedure in a Disaster Recovery Scenario

⚠ Note

The information in this article applies only to IAP versions that require RabbitMQ as a dependency. This includes IAP 2022.1.x and IAP 2023.1.x.

Itential Automation Platform (IAP) is designed so that multiple IAP servers can run in a clustered, High Availability (HA) fashion when deployed within the same datacenter. When further assurance is needed, a disaster recovery system can be deployed in a second datacenter, where it stands ready to assume processing duties.

This document provides the steps required to perform a manual failover from the Primary Active site to the Standby site when an IAP cluster uses the Large: High Availability (HA) and Disaster Recovery (Active/Standby) architecture (also known as HA-6). At the end of this procedure, the Standby system becomes the active, processing system.

For component information along with architectural sizing for High Availability in IAP, refer to High Availability Architecture.

For HA with Disaster Recovery (DR), you need to disable (pause) Task Worker and shut down the Operations ("Ops") Manager and Automation Catalog apps on the DR system.

Alternatively, the IAP servers in a DR setup can be shut off entirely, in which case activating the DR site means starting IAP. The Active and Standby sites are then configured identically (Task Worker "on", Ops Manager "on", Automation Catalog "on", etc.). This setup is a feasible alternative to maintaining the state of individual processes within IAP.

Configure Active and Standby Servers

This procedure assumes that each IAP server in Standby mode was previously configured and started with the correct processTasksOnStart setting in its properties.json file. As a recommendation, confirm that the Active and Standby servers differ in their default startup task-processing behavior by checking the value of processTasksOnStart in each running server's properties.json file.

Server     Property               Expected Value
Standby    processTasksOnStart    false
Active     processTasksOnStart    true (default value)

Example properties.json

The following example properties.json file has processTasksOnStart configured for Standby mode. The properties.json file is normally located in the /opt/pronghorn/current directory.

{
   "processTasksOnStart": false,
   "pathProps": {
      "description": "File Path Variables",
      "sdk_dir": "/opt/pronghorn-applications",
      "encrypted": true
   },
   "id": "Profile2",
   "mongoProps": {
      "credentials": {
         "passwd": "itentialPassword",
         "user": "itentialUser"
      },
      "db": "pronghorn",
      "url": "mongodb://127.0.0.1:27017"
   }
}

Conversely, on the server in Active mode, processTasksOnStart is set to true in the properties.json file.
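
The check described in this section can be scripted. The following is a minimal sketch, not an Itential tool: the function name check_process_tasks_on_start and the role labels "active" and "standby" are hypothetical, and treating a missing key as true is an assumption based on the default value listed in the table above.

```python
import json
from pathlib import Path

# Expected processTasksOnStart value per role, taken from the table in this article.
EXPECTED = {"active": True, "standby": False}


def check_process_tasks_on_start(properties_path, role):
    """Return (ok, actual) for processTasksOnStart in the given properties.json."""
    props = json.loads(Path(properties_path).read_text(encoding="utf-8"))
    # Assumption: a missing key falls back to the documented default of true.
    actual = props.get("processTasksOnStart", True)
    # Identity comparison rejects look-alikes such as the string "false" or 0.
    return actual is EXPECTED[role], actual
```

For example, calling check_process_tasks_on_start("/opt/pronghorn/current/properties.json", "standby") on a Standby server returns (True, False) when the file matches the example above.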

How to Enable/Disable Task Execution

To manually fail over from the IAP Active node to the Standby node, disable TaskWorker on the Active node from the IAP UI, and then enable it on the IAP Standby node. Note that in a disaster recovery (DR) situation, disabling TaskWorker on the Active node as the first step may not be possible.

Stopping and starting TaskWorker must be done on each running IAP instance. Doing this from only one running instance will not propagate across the cluster.

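Enabling or disabling Task Execution (on each corresponding node) can be performed from the IAP UI in either of two ways: from Admin Essentials (steps 1 and 2) or from Current Operations (steps 3 and 4). The remaining steps apply to both.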

  1. Navigate to Admin Essentials from the IAP home page.

  2. Click the Pause Task Execution button under "Running". A message banner appears indicating that Automation (Workflow) Engine was suspended.

    Figure 1: Admin Essentials
    01-AdminEssentialsPauseTaskExecution-21.2.png

  3. Alternatively, click the Current Operations link to open the console for Active Jobs and Running Tasks.

  4. Slide the Suspend Workflow Engine toggle switch (upper-right) to the right to disable Workflow Engine on the Active node. Use the same switch to enable Workflow Engine (normally on the Standby node).

    Figure 2: Current Operations
    02-CurrentOps-SuspendWFE-Switch-21.2.png

  5. Verify that the following message banner and success notification appear when Workflow Engine is disabled (normally on the IAP Active node in a manual failover scenario).

    Figure 3: Task Execution Suspended
    03-TaskExecutionSuspended-21.2.png

  6. Verify that the message banner disappears from the node when TaskWorker is enabled (normally on the IAP Standby node in a manual failover scenario). The success notification indicates that Automation (Workflow) Engine was enabled (activated).

    Figure 4: Automation (Workflow) Engine Activated
    04-TaskExecutionEnabledActive-21.2.png

  7. Isolate the Active node from the network until it is ready to return to its original Active node role. While TaskWorker is enabled on the Standby node, a restarted Active node must not be able to reach the rest of the network.
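
The ordering of the steps above, including the DR caveat that the Active site may be unreachable, can be summarized in a short sketch. This is only a planning aid under stated assumptions: it does not call any IAP API, and the function name plan_manual_failover and the host names are hypothetical.

```python
def plan_manual_failover(active_nodes, standby_nodes, active_reachable=True):
    """Return the ordered (node, action) pairs for a manual failover."""
    plan = []
    if active_reachable:
        # Pausing must be repeated on every running Active instance;
        # it does not propagate across the cluster.
        for node in active_nodes:
            plan.append((node, "pause task execution"))
    # Enabling is likewise done on each Standby instance individually.
    for node in standby_nodes:
        plan.append((node, "enable task execution"))
    # Keep the former Active nodes off the network until they resume their role.
    for node in active_nodes:
        plan.append((node, "isolate from network"))
    return plan


# Hypothetical host names; the Active site is assumed down, as in a real DR event.
for node, action in plan_manual_failover(
    ["iap-active-1", "iap-active-2"],
    ["iap-standby-1", "iap-standby-2"],
    active_reachable=False,
):
    print(f"{node}: {action}")
```

With active_reachable=False the pause step is omitted, which mirrors the note that disabling TaskWorker on the Active node first may not be possible.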

Task Execution Service Config

In some scenarios, you may need a setting under services_configs that disables TaskWorker at startup.

On the Active node, the startup property for TaskWorker should be activate: true. On the Standby (DR) node, it should be activate: false.

Below is an example of the parameters that may need to be added to the TaskWorker Configuration under the Applications collection in Admin Essentials. You will also need two different profiles, one for Active and one for Standby (DR). The Active IAP profile uses TaskWorker-ACTIVE, while the Standby IAP profile uses TaskWorker-STBY.

TaskWorker-Active

{
    "loggerProps": {
        "description": "Logging",
        "log_max_files": 100,
        "log_max_file_size": 1048576,
        "log_level": "warn",
        "log_directory": "/var/log/pronghorn",
        "log_filename": "TaskWorker.log",
        "console_level": "warn"
    },
    "isEncrypted": true,
    "model": "@itential/app-task_worker",
    "name": "TaskWorker-ACTIVE",
    "type": "Application",
    "properties": {
        "activate": true
    },
    "rabbitmq": {
        "protocol": "amqp",
        "port": 5672,
        "username": "guest",
        "password": "guest",
        "locale": "en_US",
        "frameMax": 0,
        "heartbeat": 0,
        "vhost": "/",
        "certPath": "",
        "keyPath": "",
        "passphrase": "guest",
        "caPath": "",
        "hosts": [
            "localhost"
        ]
    }
}

TaskWorker-Standby (DR)

{
    "loggerProps": {
        "description": "Logging",
        "log_max_files": 100,
        "log_max_file_size": 1048576,
        "log_level": "warn",
        "log_directory": "/var/log/pronghorn",
        "log_filename": "TaskWorker.log",
        "console_level": "warn"
    },
    "isEncrypted": true,
    "model": "@itential/app-task_worker",
    "name": "TaskWorker-STBY",
    "type": "Application",
    "properties": {
        "activate": false
    },
    "rabbitmq": {
        "protocol": "amqp",
        "port": 5672,
        "username": "guest",
        "password": "guest",
        "locale": "en_US",
        "frameMax": 0,
        "heartbeat": 0,
        "vhost": "/",
        "certPath": "",
        "keyPath": "",
        "passphrase": "guest",
        "caPath": "",
        "hosts": [
            "localhost"
        ]
    }
}

Figure 5: Workflow Engine Service Config
05-WorkflowEngineConfig-21.2.png
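
In the two example profiles above, only name and properties.activate differ. One way to keep them from drifting apart is to derive the Standby profile from the Active one and then confirm what changed. The sketch below assumes the profiles are handled as plain dictionaries; the helper names make_standby_profile and differing_keys are hypothetical, and the sample profile is trimmed to a few keys for brevity.

```python
import copy


def make_standby_profile(active_profile, standby_name="TaskWorker-STBY"):
    """Copy the Active TaskWorker profile and switch it to Standby settings."""
    standby = copy.deepcopy(active_profile)  # leave the Active profile untouched
    standby["name"] = standby_name
    standby.setdefault("properties", {})["activate"] = False
    return standby


def differing_keys(a, b, prefix=""):
    """List dotted key paths whose values differ between two nested dicts."""
    diffs = []
    for key in sorted(set(a) | set(b)):
        path = f"{prefix}{key}"
        va, vb = a.get(key), b.get(key)
        if isinstance(va, dict) and isinstance(vb, dict):
            diffs.extend(differing_keys(va, vb, path + "."))
        elif va != vb:
            diffs.append(path)
    return diffs


# Trimmed version of the TaskWorker-ACTIVE example from this article.
active = {
    "name": "TaskWorker-ACTIVE",
    "type": "Application",
    "properties": {"activate": True},
}
standby = make_standby_profile(active)
print(differing_keys(active, standby))  # ['name', 'properties.activate']
```

Any additional path in the printed list would indicate an unintended difference between the two profiles.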

