iProtect™ Stand-by Server

This manual represents the knowledge at the time of publication. TKH Security works continuously to improve its products. For the most recent technical information, please contact your consultant or dealer.


1. Introduction

This document describes the stand-by functionality of iProtect. The cold stand-by functionality has been available in iProtect for many years; from the iProtect 8.03 release a warm stand-by functionality has also been implemented.


2. Used network ports, services and license

The Standby functionality becomes available with a license. The type of Standby functionality is determined by a value within the license.

Description                   License number   Value
Cold stand-by functionality   5080             1
Warm stand-by functionality   5080             2

iProtect versions up to and including 10.0 require this license to be installed on both the primary and the backup server.

From iProtect 10.01 onward only one license is required: the backup server takes over the license from the primary server. After a fail-over there is a grace period of 30 days to activate the license.
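The license rules above can be summarised in a short sketch. It is illustrative only: the function names are hypothetical, and it assumes that iProtect version strings always have the form major.minor (for example 10.01) and can be compared numerically.

```python
from datetime import date, timedelta


def parse_version(text):
    """Turn a version string such as '10.01' into a comparable (major, minor) tuple."""
    major, minor = text.split(".")
    return int(major), int(minor)


def licenses_required(version):
    """Number of stand-by licenses needed for one primary/backup pair."""
    # Up to and including 10.0: the license is installed on both servers.
    # From 10.01: one license, which the backup server takes over.
    return 2 if parse_version(version) <= (10, 0) else 1


def grace_period_end(failover_day):
    """Last day of the 30-day grace period that starts at a fail-over."""
    return failover_day + timedelta(days=30)
```

For example, `licenses_required("10.01")` gives 1, while `licenses_required("9.05")` gives 2.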

2.1 Used network ports and service

Port/service   Involved standby type   Function                                          Remarks
20000/tcp      Cold and Warm           Status check between primary and backup server    Used for iProtect version 10.02 or lower
20001/tcp      Cold and Warm           Status check between primary and backup server    Used for iProtect version 10.03 or higher
VRRP service   Warm                    Broadcast protocol to check keepalived software
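When a firewall sits between the two servers, it can help to verify that the status-check port is reachable. The sketch below picks the port from the iProtect version and attempts a plain TCP connection. The function names are hypothetical, and the manual does not state which server listens on the port, so the direction of the test is an assumption.

```python
import socket


def status_port(version):
    """Return the status-check TCP port for an iProtect version such as '10.02'."""
    major, minor = (int(part) for part in version.split("."))
    # 10.02 or lower uses 20000/tcp, 10.03 or higher uses 20001/tcp.
    return 20000 if (major, minor) <= (10, 2) else 20001


def port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

With the example addresses used later in this manual, `port_reachable("192.168.1.3", status_port("10.03"))` would test port 20001 on the backup server.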

 


3. Stand-by functionality

This chapter describes the various standby functionalities:

Primary iProtect Server
The primary iProtect server is the iProtect server that controls the hardware in the system and that users log into. It is the operational system.

Backup iProtect Server
The backup iProtect server is a second iProtect server that is periodically updated with the same data as the primary iProtect server, but it does not communicate with the hardware in the system. Usually, the users do not log into this system, because they can’t perform any actions.

Cold Stand-by
A Cold Stand-by server is a backup iProtect server that receives the full database from the primary iProtect server once every 24 hours. It immediately restores this database and has all the iProtect processes up and running. However, there is no communication with the hardware and users. When the primary iProtect server fails, an operator can activate the cold stand-by server, and from that moment on this server starts communicating with the hardware. Because the server contains the data from the last backup, this becomes the actual state of the system: all changes made between the moment of the backup and the moment of the switch-over are lost.

Warm Stand-by
In the warm stand-by configuration the backup iProtect server receives updates from the primary iProtect server at short intervals, typically every minute. It immediately restores these updates and has all the iProtect processes up and running, except for the GUI. Once every 24 hours the backup server receives a full backup of the primary iProtect database (as in the cold stand-by configuration). There is no communication with the hardware and users. When the primary iProtect server fails the warm stand-by server is automatically activated and from that moment on this server starts the communication with the hardware. Because the server contains the data from the backup and the intermediate updates, only a very limited amount of data will be lost, i.e. only the changes within the last minute before the failure. The automatic fail-over mechanism is described in chapter 4.

From iProtect version 10.02 the SSL/TLS certificates are also synchronized to the backup server. The certificate must therefore cover every DNS name and IP-address used for the primary and the backup server: add them all to the SAN (Subject Alternative Name). This setting can be found in the iProtect maintenance page: iProtect → Certificate → Configuration

3.1 Cold stand-by configuration

The primary and the standby server MUST run the same iProtect version.

On the primary server we must define the IP-address and the atlas credentials of the backup server (the login of the Maintenance page of the backup server).

This can be defined with the menu option: Installation > Settings > System parameters > tab Backup. Go to Backup server, where the parameters can be filled in.

  1. Daily backup time: this defines the time at which the daily backup will be made. After this backup has
    been made the backup is also transferred to the backup server.

  2. Server IP Address: This is the IP-address of the backup server

  3. Password: This is the password of the atlas account on the backup server.

3.1.1 Software version, before 9.05

For iProtect versions lower than 9.05 the following procedure must be followed:

  1. The backup server can be configured by starting a backup on the primary server via the maintenance
    page.

  2. When the installation on the backup server is completed the license file (with the standby server
    license) must be installed on the backup server. This is done via the Install > Configuration >Upload
    license in the maintenance page of the backup server.

3.1.2 Software version, from and after 9.05

For iProtect version 9.05 and higher, a different procedure has to be followed due to the encryption of the backup. The procedure below is based on a new iProtect installation:

  1. Install the same iProtect version as the primary server on the backup server.

  2. Create a fresh backup on the primary server.

  3. Download the key backup file from the serverbox of the primary server (iProtect >Backup-Download->Key Backup)

  4. Download the encrypted backup file from the primary server (iProtect >Backup-Download->Encrypted Backup)

  5. Ask Service & Support for a restore.taz file; they need the key backup file to generate the restore.taz file.

  6. Upload the restore.taz file on the backup server (Install > Upgrade > Configuration > Upload iProtect)

  7. Upload the encrypted backup file you downloaded from the primary server to the backup server (Install > Upgrade > Configuration > Upload iProtect)

  8. If iProtect version 10.00 or lower is used, upload the license via Install > Configuration > Upload license in the maintenance page of the backup server.

After the backup server has been installed and the manual backup has been restored, the synchronization between the primary server and the backup server can be started by making a new backup on the primary server. A backup can be made through the maintenance page of the primary server (iProtect > Backup > General > Start). The progress can be monitored via the last transaction in iProtect.

The state of the server can be verified by checking the Database status in the Installation > Settings > System parameters menu option. For the primary server the status should be In Operation and for the standby server this should be Standby. Also, in the lower left corner of the user interface of the backup server the message Standby!!!! should be visible.

3.1.3 Events

When the transfer of the full backup starts from the primary to the iProtect backup server, the primary server logs an event: Start standby upgrade.

When the backup server has successfully restored the backup, the event Standby upgrade done is logged with the message Cold standby dataset restore done.

The backup server restarts the iProtect application after the backup from the primary server has been restored.

3.2 Warm stand-by configuration

On the primary server we must define the IP-address and the atlas account of the backup server. This can be defined with the menu option: Installation > Settings > System parameters. On the tab Backup/Backup server the parameters can be filled in.

  1. Daily backup time: this defines the time at which the daily backup will be made. After this backup has
    been made the backup is also transferred to the backup server.

  2. Server IP Address: This is the IP-address of the backup server.

  3. Password: This is the password of the atlas account on the backup server.

  4. On the tab Installation > Settings > System parameters > Hardware the checkbox Deactivate lines
    on backup/restore should be UNCHECKED. Otherwise, the lines are not activated after the standby
    server becomes the primary server.

3.2.1 Below is the step-by-step plan to create a standby configuration for the first time.

  1. Install the same iProtect version as the primary server on the backup server.

  2. Create the automatic failover configuration on both servers (see the configuration details in chapter 4).

  3. Create a fresh backup on the primary server and wait until the backup process is completed. The progress can be monitored via the last transactions in iProtect.

  4. Download the key backup file from the serverbox of the primary server (iProtect > Backup-Download-> Key Backup)

  5. Download the encrypted backup file from the primary server (iProtect > Backup-Download->Encrypted Backup)

  6. Ask Service & Support for a restore.taz file; they need the key backup file to generate the restore.taz file.

  7. Upload the restore.taz file on the backup server (Install > Upgrade > Configuration > Upload iProtect)

  8. Upload the encrypted backup file you downloaded from the primary server to the backup server (Install > Upgrade > Configuration > Upload iProtect).

  9. Activate the keepalived service on the primary server (Server > Services > Configuration). The role must be set to Master and Service Active to yes.

  10. Activate the keepalived service on the backup server (Server > Services > Configuration). The role must be set to Backup and Service Active to yes.

After the backup server has been installed and the manual backup has been restored, the synchronization between the primary server and the backup server can be started by making a backup on the primary server. A backup can be made through the maintenance page of the primary server
(iProtect->Backup->General->Start). The progress can be monitored via the last transaction in iProtect.

To check if the warm backup process is running correctly, you can check the /home/backup/increment folder
on the primary iProtect server. In this folder all changes in the database of the last minute are stored before
they are transmitted to the backup server. These files should change every minute.
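This check can also be scripted. The sketch below looks at the modification times of the files in a folder and reports whether the newest one is recent enough. The function names and the 120-second threshold (two one-minute cycles) are assumptions for illustration, not values taken from iProtect.

```python
import time
from pathlib import Path


def newest_change_age(folder, now=None):
    """Seconds since the most recently modified file in folder, or None if it holds no files."""
    now = time.time() if now is None else now
    mtimes = [p.stat().st_mtime for p in Path(folder).iterdir() if p.is_file()]
    if not mtimes:
        return None
    return now - max(mtimes)


def increments_look_fresh(folder, max_age_seconds=120):
    """True if at least one file in folder changed within the last max_age_seconds."""
    age = newest_change_age(folder)
    return age is not None and age <= max_age_seconds
```

On the primary server this could be called as `increments_look_fresh("/home/backup/increment")`; a result of False suggests that the incremental updates have stalled and deserve a closer look.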

3.2.2 Events

When the transfer of the full backup starts from the primary to the backup iProtect server, the primary iProtect server logs an event: Start standby upgrade.

When the backup server has successfully restored the backup, the event Standby upgrade done is logged with the message Warm standby dataset restore done.

The fail-over of the iProtect system is logged by the transaction: State change database => Server state: Failed over. Give this event priority “hard” if you want an alarm on failover.

When a failover has occurred, you will see the message "Failed over" in the lower left corner of the user interface.

The notification “Failed over” disappears after you reconfigure the failover configuration (see chapter 5.4 for how to reconfigure the failover configuration after a failover).


4. Automatic fail-over configuration

For the warm standby functionality, the automatic fail-over functionality must also be configured. This functionality makes sure that when the primary iProtect server fails, the backup server is automatically activated as primary server, with a 1 minute delay to cover short network/system issues.

For this functionality we use the keepalived package of Linux. This package is installed by default on iPuntu 2.0 and higher, as well as iProtect-setup on Ubuntu 20.04 and higher. The keepalived software uses the VRRP protocol to check if the primary server is up and running. To achieve this, broadcast messages are used.

This fail-over mechanism uses an additional (virtual) IP-address. This extra IP-address will be used as the IP-address that is used in iProtect (for users and devices). So, we are not using the “real” IP-address of the
iProtect servers for this. When the fail-over takes place, the virtual IP address of the primary server is transferred to the backup server so all the external interfaces with the iProtect server remain operational and
the users can use the same URL to log into the iProtect server after fail-over.

Thus, we need at least 3 IP-addresses in the same LAN for this functionality:

  • The real IP-address of the primary iProtect server (192.168.1.2 in figure)

  • The real IP-address of the backup server (192.168.1.3 in figure)

  • The virtual IP-address that will be used in the iProtect URL and by the devices.
    (192.168.1.1 in figure)
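Before configuring the fail-over, the address plan can be sanity-checked with a few lines of Python. This is a minimal sketch using the standard ipaddress module; the function name is hypothetical and the /24 prefix length is an assumption that matches the class C examples in this manual.

```python
import ipaddress


def check_ip_plan(primary, backup, virtual, prefix_length=24):
    """Return a list of problems with the three addresses; an empty list means none were found."""
    addresses = [ipaddress.ip_address(a) for a in (primary, backup, virtual)]
    problems = []
    if len(set(addresses)) != 3:
        problems.append("the three addresses must be distinct")
    # Derive the network of each address; strict=False allows host bits to be set.
    networks = {
        ipaddress.ip_network(f"{a}/{prefix_length}", strict=False) for a in addresses
    }
    if len(networks) != 1:
        problems.append("the addresses are not all in the same network")
    return problems
```

For the example addresses above, `check_ip_plan("192.168.1.2", "192.168.1.3", "192.168.1.1")` returns an empty list.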

An extra package with the fail-over configuration files (iPuntu-failover-config-vX.taz) needs to be installed
as well via the maintenance page.

4.1 Fail-over setups

iProtect supports two different kinds of fail-over setups:

• A 2-server solution, with the primary and the backup iProtect server.
• A 3-server solution, that has a separate witness server, besides the 2 iProtect servers.

4.1.1 2-server setup

In the 2-server setup the backup server monitors the availability of the primary iProtect
server. If the backup server loses contact with the primary iProtect server, it switches from
standby mode to operational mode and becomes the new primary iProtect server.
The fail-over can be caused by:

  • A crash of the iProtect software on the primary server

  • A crash of the primary server hardware

  • Loss of network between the backup server and the primary server

 

4.1.2 3-server setup

In the 3-server setup, a third server is used: the so-called witness server. When the network between the primary iProtect server and the backup server is lost, the backup server cannot be certain whether the primary server is still running. It could be the case that the network interface of the standby server is broken and that the iProtect system is functioning normally. This situation can be solved by introducing a witness server that checks the availability of both the primary and the backup server. In case of a conflict between backup and primary server (for example, the backup server thinks the primary server is down while the primary server is up), the witness server decides what the correct situation is.

4.2 Configuration

4.2.1 Primary server setup

First, we must define the virtual IP-address in the failover configuration files.

  • Log in to the primary iProtect server using an ssh client, e.g. PuTTY.

  • Go to the failover directory: cd /home/atlas/failover

  • Edit both the keepalived.MASTER.conf and the keepalived.BACKUP.conf file

  • Change the following part of the file:
    virtual_ipaddress {
    192.168.152.200/24 brd 192.168.152.255 dev eth0
    }

  • Replace the first IP-address (192.168.152.200) with the IP-address that will be used for the virtual
    IP-address.

  • The second number (/24) is the prefix length of the network this IP-address is on (/24 for a class C network).

  • The third number (192.168.152.255) is the broadcast address on this network: For class C network
    the first 3 numbers are the same as the IP-address and the 4th number is 255.

  • So if our virtual IP-address is 192.168.1.1 on a class C network the line should be changed to:
    192.168.1.1/24 brd 192.168.1.255 dev eth0

  • Save the file and do the same for the keepalived.BACKUP.conf file.
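For networks that are not a plain class C network, the broadcast address is easy to get wrong by hand. The sketch below derives the complete line for the virtual_ipaddress block with Python's standard ipaddress module. The function name is hypothetical; the output format simply mirrors the example line above.

```python
import ipaddress


def virtual_ipaddress_line(address, prefix_length=24, device="eth0"):
    """Build the 'ADDRESS/PREFIX brd BROADCAST dev DEVICE' line for keepalived."""
    interface = ipaddress.ip_interface(f"{address}/{prefix_length}")
    # The broadcast address is the highest address in the interface's network.
    broadcast = interface.network.broadcast_address
    return f"{interface.with_prefixlen} brd {broadcast} dev {device}"


print(virtual_ipaddress_line("192.168.1.1"))
# prints: 192.168.1.1/24 brd 192.168.1.255 dev eth0
```

The same function covers other prefix lengths, for example `virtual_ipaddress_line("10.0.5.9", 16, "eth1")`.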

In the Maintenance page, for the 2-server setup, only the Keepalived part needs to be configured.

This functionality is configured in the Server > Services > Configuration area of the Maintenance page of the
iProtect server.

The Role of this server is Master, and with Activate set to Yes, the failover functionality is started. The iProtect
system is now available on the virtual IP-address that has been configured in the keepalived configuration files. The Current State shows the state the server is in at the moment.

4.2.2 Backup server setup

First, we must define the virtual IP-address in the failover configuration files.

  • Log in to the backup iProtect server using an ssh client, e.g. PuTTY.

  • Go to the failover directory: cd /home/atlas/failover

  • Edit both the keepalived.MASTER.conf and the keepalived.BACKUP.conf file.

  • Change the following part of the file:
    virtual_ipaddress {
    192.168.152.200/24 brd 192.168.152.255 dev eth0
    }

  • Replace the first IP-address (192.168.152.200) with the IP-address that will be used for the virtual IP-address.

  • The second number (/24) is the prefix length of the network this IP-address is on (/24 for a class C network).

  • The third number (192.168.152.255) is the broadcast address on this network: For class C network
    the first 3 numbers are the same as the IP-address and the 4th number is 255.

  • So if our virtual IP-address is 192.168.1.1 on a class C network the line should be changed to:
    192.168.1.1/24 brd 192.168.1.255 dev eth0

  • If needed change the interface name (line 21 and 30 in the script)

global_defs {
    enable_script_security
    script_user atlas atlas
}

vrrp_sync_group VG1 {
    group {
        VI_1
    }
    notify /home/atlas/failover/failover.pl
}

vrrp_script failover {
    script "/home/atlas/failover/failover.pl check"
    interval 15
    fall 4
}

vrrp_instance VI_1 {
    state MASTER
    interface eth1
    virtual_router_id 51
    priority 100
    advert_int 20
    authentication {
        auth_type PASS
        auth_pass password
    }
    virtual_ipaddress {
        192.168.152.200/24 brd 192.168.152.255 dev eth1
    }
    track_script {
        failover
    }
}

Save the file and do the same for the keepalived.BACKUP.conf file.
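The timing values in this configuration can be related to the 1 minute fail-over delay mentioned in chapter 4, although the manual does not state that this is where the delay comes from. The sketch below works through the arithmetic under two assumptions: that the tracking script is declared failed after `fall` consecutive failed runs spaced `interval` seconds apart, and that keepalived uses the VRRP version 2 timers of RFC 3768. The function names are hypothetical, and the priority of the backup server is not shown in this manual, so the value 100 is only a placeholder.

```python
def script_detection_seconds(interval, fall):
    """Approximate time before a tracking script is considered failed."""
    return interval * fall


def master_down_seconds(advert_int, priority):
    """VRRPv2 Master_Down_Interval: 3 * advert_int plus a skew of (256 - priority) / 256 seconds."""
    return 3 * advert_int + (256 - priority) / 256


# Values taken from the configuration above: interval 15, fall 4, advert_int 20.
print(script_detection_seconds(15, 4))   # 60
print(master_down_seconds(20, 100))      # 60.609375 (priority 100 is a placeholder)
```

Both figures come out at roughly one minute. Lowering `interval`, `fall` or `advert_int` would shorten the delay, at the cost of reacting to shorter network or system hiccups.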

In the Maintenance page, for the 2-server setup, only the Keepalived part needs to be configured.

This functionality is configured in the Server >Services > Configuration area of the
Maintenance page of the iProtect server.

The Role of this server is Backup, and with Activate set to Yes, the failover functionality is started. The backup iProtect system can only be reached on the URL of the maintenance page or cockpit panel with the real IP-address of the backup server. The iProtect GUI is NOT running!

4.2.3 Witness server setup

The witness server can be a small iProtect server (4GB memory is more than enough). The maintenance page
should be installed on this server but iProtect must not be running. It can be disabled permanently in the services part of cockpit panel or in iProtect version <=10.01 with the command disable77.

To configure the server as a witness server the Server->Services->Configuration area of the Maintenance page is used.

Witness server
For the witness server the Role should be set to Server. The Protocol that will be used to test if the master
and backup server are running should be set to Pong, which is basically HTTPS.
Furthermore, the real IP-address of the master and the backup server must be entered.

If the IP-address of the witness server is 192.168.1.4, the setup in our example becomes:

Role: Server
Protocol: Pong
IP-address Master: 192.168.1.2
IP-address Backup: 192.168.1.3

Master + Backup servers
For the master and the backup server the Role should be set to Client. The IP-address of the witness server
must be entered. Also, the username and password of the maintenance page of the witness server must be
entered.

So in our example:
Role: Client
IP-address witness: 192.168.1.4
Username: atlas
Password: *******


5. Important remarks

The warm standby and failover features of the iProtect system cause the system to behave differently from a normal iProtect system. This chapter addresses a number of important issues to keep in mind when dealing with a warm standby system.

5.1 Stopping / restarting iProtect

Because we have a failover mechanism, we cannot simply stop the primary iProtect system when we want to stop iProtect.

The backup iProtect server would take over the control of the lines.

When you want to stop the iProtect system the following procedure should be followed:

  1. Stop the keepalived process on the backup server by selecting activate NO in the maintenance page

  2. Stop the keepalived process on the primary server by selecting activate NO in the maintenance page

  3. It is now safe to stop the iProtect processes or to shutdown/ reboot the server, without the backup server being activated.

5.2 Upgrading iProtect

When upgrading the iProtect system, it is enough to upgrade only the primary iProtect server.
The backup server will be upgraded automatically (including license) with the next full backup.

5.3 Using the proper IP address

In the warm standby setup, the iProtect system uses 3 or 4 IP-addresses in the same network. It is important
to always use the correct IP-address.

  • The users of the iProtect system should use the Virtual IP-address, so that when the system switches
    from primary to the backup server they still can use the same URL.

  • External interfaces to the database should use the Virtual IP-address, so that when the system
    switches from primary to the backup server it can connect to the same URL.

  • Maintenance on the primary and backup server via the maintenance page or putty should use the
    Real IP-address of the server.

To list all the IP-addresses of an interface (eth0), including the virtual IP-address, you can use the command: “ip addr show eth0”. In our example the output shows that 192.168.1.1 is present as the secondary (virtual) IP-address.

5.4 Reconfigure fail-over mechanism

The fail-over mechanism is a one-time mechanism: after a fail-over has occurred, the system must be reconfigured to activate the mechanism again.
To enable the warm standby again after the old primary server is repaired, it is best to make the old primary server the backup server.

The procedure for this is:

  1. Add the old primary server as the backup server in iProtect. So in our example:
    192.168.1.2 should be filled in as the Backup server in the Backup tab of Installation->Settings->System parameters.

  2. Set the keepalived service on the new primary server to the role Master (the service will already be running and the status will already be master; this adjustment ensures that the server is also given the correct priority).

  3. Activate the keepalived service in the role Backup in the maintenance page of the new backup server (the old primary server).

  4. Make a backup on the new primary server. This is automatically transferred to the new backup server,
    which becomes the warm standby server.

  5. Activate the license on the new primary server.

The progress of the backup can be monitored via the last transaction in iProtect.

To check if the warm backup process is running correctly, you can check the /home/backup/increment folder on the primary iProtect server. In this folder all changes in the database of the last minute are stored before they are transmitted to the backup server. These files should change every minute.

5.5 Lines

In iProtect the lines with underlying nodes automatically fail-over to the new primary iProtect server. Because the key stores and certificates are also transferred with the database, they can continue to communicate securely with the iProtect system. If the lines are connected via a separate network interface on the iProtect server (technical network), no special fail-over configuration is required for this interface. However, it is required that the backup server also has a second network interface in the same technical network.

5.6 Expected behavior

With warm standby many different scenarios can occur, and it can be difficult to predict the behavior of the system. Here we describe a number of likely scenarios and the response of the system.

5.6.1 iProtect process crashes

When one of the iProtect processes crashes (kp77db, kp77ln, kp77usr, kp77trans) keepalived will detect this and the primary system will go into FAULT status. The other iProtect processes on this system will be terminated. This is detected by the standby system and this system will go into MASTER state and iProtect on this system will go from standby to operational. The virtual IP address of the system will go over from the
primary server to the standby server. The old primary server will go into BACKUP state. If the old primary server is rebooted, or if by accident the iProtect software is started again, the keepalived software will prevent it from becoming operational.
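The rule described above can be written down as a small sketch. The process names are taken from the text; whether that list is exhaustive is not stated, and the function names are hypothetical. In a real system the check is performed by keepalived and its tracking script, not by this code.

```python
# Process names as listed in this section (assumed, not confirmed, to be the full list).
MONITORED_PROCESSES = ("kp77db", "kp77ln", "kp77usr", "kp77trans")


def missing_processes(running):
    """Return the monitored process names that are absent from the running names."""
    running = set(running)
    return [name for name in MONITORED_PROCESSES if name not in running]


def expected_primary_state(running):
    """FAULT as soon as one monitored process is missing, otherwise MASTER."""
    return "FAULT" if missing_processes(running) else "MASTER"
```

For example, `expected_primary_state(["kp77db", "kp77ln", "kp77usr"])` gives FAULT because kp77trans is missing.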

5.6.2 Primary server stops

When the primary server stops, the keepalived process on the standby server will detect this and this system will go into MASTER state and iProtect on this system will go from standby to operational. The virtual IP address of the system will go over from the primary server to the standby server.

If the old primary server is started again, the keepalived software will prevent it from becoming operational.

5.6.3 Primary server freezes temporarily

When the primary server freezes temporarily (e.g. due to a snapshot on the VM server) it will not respond in time to the standby server. This is detected by the standby system and this system will go into MASTER state and iProtect on this system will go from standby to operational. The virtual IP address of the system will go over from the primary server to the standby server.

If the primary server then “unfreezes” it is still in MASTER state, but it will detect that there is another master
server in the network and it will go to BACKUP state and deactivate the iProtect processes.

5.6.4 The network between primary and standby is interrupted

When the network between the primary and standby server is interrupted, the primary server will not respond in time to the standby server. This is detected by the standby system. When there is a witness server in the system, the standby system will ask the witness whether it sees the master server. If the witness sees the master, nothing will happen. If the witness server also cannot see the master, the standby server will go into MASTER state and iProtect on this system will go from standby to operational. The virtual IP address of the system will go over from the primary server to the stand-by server.

If the network connection between primary and stand-by server is restored again, the primary server is still in MASTER state, but it will detect that there is another master server in the network and it will go to BACKUP state and deactivate the iProtect processes.
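The take-over decision of the standby server in the 2-server and 3-server setups can be summarised as a small decision function. This is a sketch of the behavior as described in this manual, not the actual implementation. The function and parameter names are hypothetical, and the case where the witness itself cannot be reached is not described in the manual and is therefore not modelled.

```python
def backup_should_take_over(backup_sees_primary, witness_configured=False,
                            witness_sees_primary=None):
    """Decide whether the standby server should go into MASTER state."""
    if backup_sees_primary:
        # The primary answers in time: nothing happens.
        return False
    if not witness_configured:
        # 2-server setup: losing contact with the primary is enough.
        return True
    if witness_sees_primary is None:
        raise ValueError("witness answer required when a witness is configured")
    # 3-server setup: take over only if the witness cannot see the primary either.
    return not witness_sees_primary
```

With a witness that still sees the primary, `backup_should_take_over(False, True, True)` returns False, so an interruption that only affects the link between the two iProtect servers does not trigger a fail-over.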

5.6.5 Both servers fail at the same time

When both servers have failed due to, for example, a power failure, and both servers start up again afterwards, the old master server will boot without iProtect processes and the old backup server will become active in the master state.