Configure disaster recovery
To configure disaster recovery, you must provision a replica to serve as a backup during failovers. If your primary server is permanently disabled, you can then promote the replica.
The puppet infrastructure commands, which are used to configure and manage disaster recovery, require a valid admin RBAC token and must be run from a root session. Running them with elevated privileges via sudo puppet infrastructure is not sufficient. Instead, start a root session by running sudo su -, and then run the puppet infrastructure command. For details about these commands, run puppet infrastructure help <ACTION>, for example, puppet infrastructure help provision.
Provision and enable a replica
Provisioning a replica duplicates specific components and services from the primary server to the replica. Enabling a replica activates most of its duplicated services and components, and instructs agents and infrastructure nodes how to communicate in a failover scenario.
Before you begin:
- Apply the disaster recovery system and software requirements.
- Ensure you have a valid admin RBAC token.
- Ensure Code Manager is enabled and configured on your primary server.
- If you set tuning parameters for your primary server in the console, move them to Hiera. Using Hiera ensures that the configuration is applied to both your primary server and replica.
- Back up your classifier hierarchy, because enabling a replica alters classification.
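The prerequisites above and the provisioning step can be scripted roughly as follows. This is a minimal sketch, not the PE tooling: the functions run_cmd, require_root_session, backup_classifier, and provision_replica and the variables DRY_RUN and RBAC_TOKEN are hypothetical. It assumes your admin RBAC token is available from puppet-access show (after puppet-access login) and that the node classifier API listens on port 4433 of the primary server; only the puppet infrastructure provision replica <NAME> --enable form comes from this page.

```shell
#!/bin/sh
# Minimal sketch: hypothetical helpers for the provisioning prerequisites
# and the provision step. Set DRY_RUN=1 to print commands instead of running them.

# Run a command, or only print it when DRY_RUN=1.
run_cmd() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Heuristic check for a real root session: uid 0 and no SUDO_USER,
# which sudo sets but a login shell from 'sudo su -' normally clears.
require_root_session() {
  if [ "$(id -u)" -ne 0 ] || [ -n "${SUDO_USER:-}" ]; then
    echo "Start a root session with 'sudo su -', then rerun." >&2
    return 1
  fi
}

# Back up all node groups before enabling the replica alters classification.
# Assumption: classifier API on port 4433; token from RBAC_TOKEN if set,
# otherwise from 'puppet-access show'.
backup_classifier() {
  primary="$1"
  outfile="${2:-classifier-groups-backup.json}"
  token="${RBAC_TOKEN:-$(puppet-access show)}"
  run_cmd curl --fail --cacert /etc/puppetlabs/puppet/ssl/certs/ca.pem \
    -H "X-Authentication: ${token}" -o "$outfile" \
    "https://${primary}:4433/classifier-api/v1/groups"
}

# Provision and enable the replica. Extra flags, such as
# --skip-agent-config, are passed through unchanged.
provision_replica() {
  if [ "$#" -eq 0 ]; then
    echo "usage: provision_replica <REPLICA NODE NAME> [FLAGS...]" >&2
    return 1
  fi
  replica="$1"
  shift
  if [ "${DRY_RUN:-0}" != "1" ]; then
    require_root_session || return 1
  fi
  run_cmd puppet infrastructure provision replica "$replica" --enable "$@"
}
```

In a root session on the primary server, you might run backup_classifier with the primary server's hostname and then provision_replica replica.example.com. In a multi-region installation, append --skip-agent-config as a trailing argument.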
Managing agent communication in multi-region installations
Typically, when you enable a replica using puppet infrastructure enable replica, the configuration tool automatically sets the same communication parameters for all agents. In multi-region installations, with load balancers or compilers in multiple locations, you must manually configure agent communication settings so that agents fail over to the appropriate load balancer or compiler.
Use the --skip-agent-config flag when you provision and enable a replica, for example:
puppet infrastructure provision replica example.puppet.com --enable --skip-agent-config
To manually configure which load balancer or compiler agents communicate with, use one of these options:
- CSR attributes
  - For each node, include a CSR attribute that identifies the location of the node, for example pp_region or pp_datacenter.
  - Create child groups off of the PE Agent node group for each location.
  - In each child node group, include the puppet_enterprise::profile::agent class and set the server_list parameter to the appropriate load balancer or compiler hostname.
  - In each child node group, add a rule that uses the trusted fact created from the CSR attribute.
- Hiera
  - For each node or group of nodes, create a key/value pair that sets the puppet_enterprise::profile::agent::server_list parameter to be used by the PE Agent node group.
- Custom method that sets the server_list parameter in puppet.conf.
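For the CSR attributes option, the per-node attribute and the matching child-group rule could be produced as sketched below. The helpers write_region_attribute and region_rule are hypothetical; the sketch assumes the default csr_attributes.yaml location, the pp_region extension (substitute pp_datacenter if that fits your layout), and the node classifier's rule syntax for trusted facts. Puppet reads csr_attributes.yaml only when the agent generates its certificate request, so the file must exist before that happens.

```shell
#!/bin/sh
# Minimal sketch: hypothetical helpers for the CSR attributes option.

# Write the node's region into csr_attributes.yaml so that it is embedded in
# the certificate request and later exposed as trusted.extensions.pp_region.
write_region_attribute() {
  region="$1"
  file="${2:-/etc/puppetlabs/puppet/csr_attributes.yaml}"
  cat > "$file" <<EOF
---
extension_requests:
  pp_region: "${region}"
EOF
}

# Print a node group rule that matches nodes by that trusted fact, for use
# in the child group created under PE Agent for this region.
region_rule() {
  printf '["=", ["trusted", "extensions", "pp_region"], "%s"]\n' "$1"
}
```

For example, run write_region_attribute us-east on each node in that region before it requests its certificate, and enter the output of region_rule us-east as the rule of the us-east child group.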
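The Hiera and custom puppet.conf options both come down to a list of servers. This minimal sketch uses two hypothetical helpers: hiera_server_list prints Hiera data for the puppet_enterprise::profile::agent::server_list key (assuming the value is an array of HOST:PORT strings, placed at whatever node or group level of your hierarchy applies), and set_agent_server_list writes the server_list setting with puppet config set. Both assume the default Puppet server port, 8140.

```shell
#!/bin/sh
# Minimal sketch: hypothetical helpers for the Hiera and puppet.conf options.
# Assumption: each server_list entry is HOST:PORT with the default port 8140.

# Print Hiera data that sets server_list for one location.
hiera_server_list() {
  printf 'puppet_enterprise::profile::agent::server_list:\n'
  for server in "$@"; do
    printf '  - "%s:8140"\n' "$server"
  done
}

# Custom method: set server_list in the [agent] section of puppet.conf.
# Set DRY_RUN=1 to print the command instead of running it.
set_agent_server_list() {
  list=""
  for server in "$@"; do
    list="${list:+${list},}${server}:8140"
  done
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "puppet config set server_list ${list} --section agent"
  else
    puppet config set server_list "$list" --section agent
  fi
}
```

For example, hiera_server_list lb-east.example.com prints a two-line YAML fragment to add to the data for your eastern nodes, while set_agent_server_list lb-east.example.com lb-west.example.com writes a comma-separated list directly on one agent.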
Promote a replica
If your primary server can’t be restored, you can promote the replica to establish it as the new, permanent primary server.
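Because promotion is permanent, a wrapper can require explicit confirmation that the primary server is unrecoverable. This is a minimal sketch: promote_replica, run_cmd, DRY_RUN, and PRIMARY_UNRECOVERABLE are hypothetical, and it assumes the action is invoked as puppet infrastructure promote replica from a root session on the replica; confirm this with puppet infrastructure help promote before relying on it.

```shell
#!/bin/sh
# Minimal sketch: hypothetical helper for promoting the replica.
# Assumption: run in a root session on the replica; verify the action name
# with "puppet infrastructure help promote".

# Run a command, or only print it when DRY_RUN=1.
run_cmd() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

promote_replica() {
  # Promotion is permanent, so require explicit confirmation that the
  # failed primary server cannot be restored.
  if [ "${PRIMARY_UNRECOVERABLE:-no}" != "yes" ]; then
    echo "Set PRIMARY_UNRECOVERABLE=yes only if the primary cannot be restored." >&2
    return 1
  fi
  run_cmd puppet infrastructure promote replica
}
```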
Enable a new replica using a failed primary server
After promoting a replica, you can use your old primary server as a new replica, effectively swapping the roles of your failed primary server and promoted replica.
The puppet infrastructure run command uses built-in Bolt plans to automate certain management tasks. To use this command, you must be able to connect over SSH from your primary server to any nodes that the command modifies. You can establish an SSH connection using key forwarding, a local key file, or keys specified in .ssh/config on your primary server. For more information, see Bolt OpenSSH configuration options.
To view all available parameters, use the --help flag. The logs for all puppet infrastructure run Bolt plans are located at /var/log/puppetlabs/installer/bolt_info.log.
You must be able to reach the failed primary server via SSH from the current primary server.
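Before running the plan, it can help to confirm SSH access and know where to look afterward. In this minimal sketch, the helpers ssh_config_entry, check_ssh, show_bolt_log, and run_cmd are hypothetical, and the root user and key path in the .ssh/config entry are example values. BatchMode makes ssh fail instead of prompting for a password, which matches the non-interactive way Bolt connects. The log path is the one given above.

```shell
#!/bin/sh
# Minimal sketch: hypothetical helpers for the SSH prerequisite of
# puppet infrastructure run. Set DRY_RUN=1 to print commands only.

# Run a command, or only print it when DRY_RUN=1.
run_cmd() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Print an entry to append to /root/.ssh/config on the current primary.
# The user and key path are example values.
ssh_config_entry() {
  printf 'Host %s\n  User root\n  IdentityFile %s\n' "$1" "${2:-/root/.ssh/id_rsa}"
}

# Fail fast, without a password prompt, if the failed primary is unreachable.
check_ssh() {
  run_cmd ssh -o BatchMode=yes -o ConnectTimeout=10 "$1" true
}

# Show the end of the Bolt plan log after a run.
show_bolt_log() {
  run_cmd tail -n "${1:-50}" /var/log/puppetlabs/installer/bolt_info.log
}
```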
On the current primary server, as the root user, run puppet infrastructure run enable_ha_failover, specifying these parameters:
- host: Hostname of the failed primary server. This node becomes your new replica.
- topology: Architecture used in your environment, either mono (standard) or mono-with-compile (large).
- replication_timeout_secs: Optional. The number of seconds allowed to complete provisioning and enabling of the new replica before the command fails.
- tmpdir: Optional. Path to a directory to use for uploading and executing temporary files.
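The parameters listed above can be assembled into a single call with a small wrapper. In this minimal sketch, run_enable_ha_failover, run_cmd, and the variables DRY_RUN, HA_REPLICATION_TIMEOUT_SECS, and HA_TMPDIR are hypothetical. The wrapper appends the optional parameters only when they are set and rejects topology values other than mono and mono-with-compile.

```shell
#!/bin/sh
# Minimal sketch: hypothetical wrapper around the enable_ha_failover plan.
# Set DRY_RUN=1 to print the command instead of running it.

# Run a command, or only print it when DRY_RUN=1.
run_cmd() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

run_enable_ha_failover() {
  host="${1:-}"
  topology="${2:-mono}"
  if [ -z "$host" ]; then
    echo "usage: run_enable_ha_failover <FAILED_PRIMARY_HOSTNAME> [mono|mono-with-compile]" >&2
    return 1
  fi
  case "$topology" in
    mono|mono-with-compile) ;;
    *) echo "topology must be mono or mono-with-compile" >&2; return 1 ;;
  esac
  # Required parameters first, then optional ones only if provided.
  set -- puppet infrastructure run enable_ha_failover "host=${host}" "topology=${topology}"
  if [ -n "${HA_REPLICATION_TIMEOUT_SECS:-}" ]; then
    set -- "$@" "replication_timeout_secs=${HA_REPLICATION_TIMEOUT_SECS}"
  fi
  if [ -n "${HA_TMPDIR:-}" ]; then
    set -- "$@" "tmpdir=${HA_TMPDIR}"
  fi
  run_cmd "$@"
}
```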
For example:
puppet infrastructure run enable_ha_failover host=<FAILED_PRIMARY_HOSTNAME> topology=mono
Forget a replica
Forgetting a replica cleans up classification and database state, preventing degraded performance over time.
Before you begin, ensure that you have a valid admin RBAC token and that the replica you want to remove is permanently offline.
Run the forget command whenever a replica node is destroyed, even if you plan to replace it with a replica with the same name.
- On the primary server, as the root user, run puppet infrastructure forget <REPLICA NODE NAME>
- Delete your secret key file from the replica, because leaving sensitive information on a replica poses a security risk. The path to the secret key file is /etc/puppetlabs/orchestration-services/conf.d/secrets/keys.json
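The two forget steps can be sketched as separate helpers, because they run on different hosts. In this minimal sketch, forget_replica, remove_secret_key, run_cmd, and DRY_RUN are hypothetical; forget_replica runs in a root session on the primary server, and remove_secret_key runs on the replica, assuming its file system is still accessible.

```shell
#!/bin/sh
# Minimal sketch: hypothetical helpers for the two forget steps.
# Set DRY_RUN=1 to print commands instead of running them.

# Run a command, or only print it when DRY_RUN=1.
run_cmd() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Step 1: on the primary server, in a root session.
forget_replica() {
  run_cmd puppet infrastructure forget "$1"
}

# Step 2: on the replica, delete the orchestration secret key file.
remove_secret_key() {
  key_file="${1:-/etc/puppetlabs/orchestration-services/conf.d/secrets/keys.json}"
  if [ ! -e "$key_file" ]; then
    echo "No secret key file at $key_file"
    return 0
  fi
  run_cmd rm -f -- "$key_file"
}
```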
Reinitialize a replica
If you encounter certain errors on your replica after provisioning, you can reinitialize the replica. Reinitializing destroys and re-creates replica databases, as specified.
Reinitialization is not intended to fix slow queries or intermittent failures. Reinitialize your replica only if it’s not operational or you see replication errors.
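A wrapper for reinitialization might look like the sketch below. It assumes that the action is invoked as puppet infrastructure reinitialize replica from a root session on the replica, and that an optional --db <DATABASE> flag limits it to one database. Both are assumptions to verify with puppet infrastructure help reinitialize. The function name reinitialize_replica, run_cmd, and DRY_RUN are hypothetical.

```shell
#!/bin/sh
# Minimal sketch: hypothetical wrapper for reinitializing a replica.
# Assumption: "--db <DATABASE>" limits reinitialization to one database;
# verify with "puppet infrastructure help reinitialize".

# Run a command, or only print it when DRY_RUN=1.
run_cmd() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

reinitialize_replica() {
  if [ -n "${1:-}" ]; then
    run_cmd puppet infrastructure reinitialize replica --db "$1"
  else
    run_cmd puppet infrastructure reinitialize replica
  fi
}
```

With no argument, the wrapper reinitializes everything the command covers by default. Passing a database name, for example pe-rbac, targets only the database that shows replication errors.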