How failover works

Technical Article | TA-20201002-TP-36


Introduction

A ‘Failover’ server, or ‘Hot Standby’ server, monitors one or more servers in the same installation. The failover server actively checks whether the monitored servers are still functioning. When it detects that a server is no longer responding, it automatically takes over that server, loading the settings of the failed server and thereby emulating its configuration. When the original server is back online, the failover server automatically shuts down the emulated server and returns to monitoring.

Both procedures are completely transparent to the end user. The user will notice that a server has failed (because its video disappears), but in the meantime the failover server is starting up. The viewer station is automatically notified that the video must now be requested from the failover server, and the video returns without any user interaction.

 

Timeouts

The failover server uses the following default timeout values to decide whether the failover procedure should be started:

Type              | Default timeout | Description
Connected         | 5 minutes       | A network connection has been made, but the first startup message has not been received
Started           | 5 minutes       | The startup message has been received, but the first keep-alive message has not been received
Keep alive        | 30 seconds      | No new keep-alive message has been received
Normal shutdown   | 15 minutes      | The server has been shut down normally, but has not restarted yet
Abnormal shutdown | 5 minutes       | The server has exited with an error and has not restarted yet

The values can be modified in the failover settings.
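The timeout check can be sketched as a lookup from a monitored server's last known state to its timeout. This is a minimal sketch, assuming the failover server tracks each server's state and the time elapsed since it entered that state (or since the last keep-alive arrived); `ServerState`, `DEFAULT_TIMEOUTS` and `should_start_failover` are hypothetical names, not part of the product.

```python
from datetime import timedelta
from enum import Enum


class ServerState(Enum):
    """Last known state of a monitored server (hypothetical; mirrors the table rows)."""
    CONNECTED = "connected"                  # connection made, no startup message yet
    STARTED = "started"                      # startup message received, no keep-alive yet
    KEEP_ALIVE = "keep_alive"                # keep-alive messages are being received
    NORMAL_SHUTDOWN = "normal_shutdown"      # shut down normally, not restarted yet
    ABNORMAL_SHUTDOWN = "abnormal_shutdown"  # exited with an error, not restarted yet


# Defaults from the table; in the product these are editable in the failover settings.
DEFAULT_TIMEOUTS = {
    ServerState.CONNECTED: timedelta(minutes=5),
    ServerState.STARTED: timedelta(minutes=5),
    ServerState.KEEP_ALIVE: timedelta(seconds=30),
    ServerState.NORMAL_SHUTDOWN: timedelta(minutes=15),
    ServerState.ABNORMAL_SHUTDOWN: timedelta(minutes=5),
}


def should_start_failover(state, elapsed, timeouts=DEFAULT_TIMEOUTS):
    """Return True once `elapsed` (time since the state was entered or the
    last keep-alive arrived) has reached the timeout for `state`."""
    return elapsed >= timeouts[state]
```

For example, `should_start_failover(ServerState.KEEP_ALIVE, timedelta(seconds=45))` returns `True`, while `should_start_failover(ServerState.NORMAL_SHUTDOWN, timedelta(minutes=10))` returns `False`. Passing a different `timeouts` dictionary stands in for changing the values in the failover settings.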

Failover procedures

The procedures below assume that the failover server is actively checking one or more servers, holds a recent copy of the settings of all monitored servers, and no longer receives keep-alive messages from one of them.

Slave server fails

  • Clients see ‘Server Connection Lost’ in their video panels

  • After 30 seconds (the default keep-alive timeout), the failover server takes over the failed server by loading its settings

  • The management server is automatically informed that the IP address of the failed server must be replaced with that of the failover server

  • All connected clients are automatically told to log in again on the management server to update their server list

  • Clients log in again and the cameras of the failed server are displayed

  • The failover server constantly checks whether the failed server is back online

Slave server is restored

  • The slave server starts; on startup, it reads the video data stored on the failover server

  • The failover server stops the takeover procedure

  • The management server is automatically informed that the failover server is offline and the restored server is online

  • All connected clients are automatically told to log in again on the management server to update their server list

  • Clients log in again and the cameras of the restored server are displayed

  • The failover server again constantly checks all monitored servers
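Both slave procedures come down to editing the server list that the management server hands to clients when they log in again. The sketch below models that list as a name-to-IP dictionary; `ServerDirectory`, the server names and the IP addresses are hypothetical illustrations, not the product's data model.

```python
class ServerDirectory:
    """Hypothetical model of the server list kept by the management server."""

    def __init__(self, servers):
        # server name -> IP address clients should use to request its video
        self._servers = dict(servers)
        # server name -> original IP address, kept while a takeover is active
        self._original_ips = {}

    def take_over(self, name, failover_ip):
        """Replace the failed server's IP address with that of the failover server."""
        self._original_ips[name] = self._servers[name]
        self._servers[name] = failover_ip

    def restore(self, name):
        """Put the restored server's own IP address back in the list."""
        self._servers[name] = self._original_ips.pop(name)

    def server_list(self):
        """The list a client receives when it logs in again."""
        return dict(self._servers)
```

For example, after `take_over("recorder-1", "10.0.0.50")` a client that logs in again sees `recorder-1` at `10.0.0.50`; after `restore("recorder-1")` it sees the original address again.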

Management server fails

  • Clients see ‘Server Connection Lost’ in their video panels

  • After 30 seconds (the default keep-alive timeout), the failover server takes over the failed management server by loading its settings

  • The slave servers are automatically told to change the management server address to that of the failover server

  • All connected clients are automatically told to log in again on the new management server

  • Clients log in again and the cameras of the failed server are displayed

  • The failover server constantly checks whether the failed server is back online
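The redirection in this procedure follows a fixed order: first the slave servers, then the clients. A minimal sketch, assuming each slave and client simply stores the management server address it uses; `SlaveServer`, `ClientSession` and `switch_management_server` are hypothetical names, and "logging in again" is reduced to recording the new address.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SlaveServer:
    name: str
    management_address: str


@dataclass
class ClientSession:
    user: str
    management_address: Optional[str] = None  # None means not logged in


def switch_management_server(slaves: List[SlaveServer],
                             clients: List[ClientSession],
                             new_address: str) -> None:
    """Point all slaves, then all clients, at a new management server address."""
    # Step 1: slave servers start reporting to the new management server.
    for slave in slaves:
        slave.management_address = new_address
    # Step 2: clients log in again on the new management server
    # (modelled here as recording the address they log in to).
    for client in clients:
        client.management_address = new_address
```

The same function covers the restore procedure below it: call it with the failover server's address on failure, and with the original management server's address once that server is back.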

Management server is restored

  • The management server starts; on startup, it reads the video data stored on the failover server and synchronizes the events stored during the failover period with the database

  • The failover server stops the takeover procedure

  • The slave servers are automatically told to change the management server address back to that of the restored management server

  • All connected clients are automatically told to log in again on the restored management server

  • Clients log in again and the cameras of the restored server are displayed

  • The failover server again constantly checks all monitored servers
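The event synchronization step on restore can be sketched as an idempotent merge, so that running it twice does not create duplicates. This is an assumption about how such a merge could work, not a description of the product's database logic: `Event`, `synchronize_events` and the dictionary standing in for the database are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    event_id: str
    timestamp: datetime
    description: str


def synchronize_events(database, failover_events):
    """Merge events stored during the failover period into `database`
    (a dict keyed by event_id). Returns the number of events added."""
    added = 0
    # Insert in chronological order; skip events the database already has.
    for event in sorted(failover_events, key=lambda e: e.timestamp):
        if event.event_id not in database:
            database[event.event_id] = event
            added += 1
    return added
```

Keying on an event identifier makes the merge safe to repeat, which matters if the management server restarts again before synchronization completes.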

Storage

The failover server can monitor multiple servers at the same time, but can take over only one server at a time. The video data for each monitored server is stored in a separate folder, named after the IP address of that monitored server. This folder is automatically shared so that the monitored server can access its video data. Video data is never synchronized with the monitored server.

The monitored server, while online, always owns all of its video data, both locally and on the failover server. This means that when the failover server is offline, the monitored server manages its own video data on the failover server. The storage space on the failover server should be regarded as replacement storage, not as additional storage.
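The folder layout and the one-takeover-at-a-time rule described above can be sketched together. This is a minimal sketch, assuming IPv4 addresses and a Windows-style share; the storage root `D:\FailoverStorage` and the names `folder_for` and `FailoverStorage` are hypothetical, not the product's actual paths or API.

```python
import ipaddress
from pathlib import PureWindowsPath
from typing import Dict, Iterable, Optional

# Hypothetical storage root; the real location is configured on the failover server.
STORAGE_ROOT = PureWindowsPath(r"D:\FailoverStorage")


def folder_for(server_ip: str) -> PureWindowsPath:
    """Folder for one monitored server, named after its IPv4 address."""
    ip = ipaddress.IPv4Address(server_ip)  # raises ValueError for a malformed address
    return STORAGE_ROOT / str(ip)


class FailoverStorage:
    """One folder per monitored server, with at most one active takeover."""

    def __init__(self, monitored_ips: Iterable[str]):
        self.folders: Dict[str, PureWindowsPath] = {ip: folder_for(ip) for ip in monitored_ips}
        self.active: Optional[str] = None

    def start_takeover(self, server_ip: str) -> PureWindowsPath:
        """Begin taking over a server; return the folder its video is written to."""
        if server_ip not in self.folders:
            raise KeyError(f"{server_ip} is not monitored")
        if self.active is not None:
            raise RuntimeError(f"already taking over {self.active}")
        self.active = server_ip
        return self.folders[server_ip]

    def stop_takeover(self) -> None:
        """Called when the original server is back online."""
        self.active = None
```

With two monitored servers, taking over `10.0.0.11` returns `D:\FailoverStorage\10.0.0.11`; a second takeover attempt raises an error until `stop_takeover()` is called.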

For example, suppose the camera channels are set to a maximum of 10 days of recording, and the monitored server was down during the first three days, so the failover server was storing the video:

Date (day)              |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8 |  9 | 10 | 11 | 12 | 13 | 14
Storage duration        |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8 |  9 | 10 | 11 |  - |  - |  -
Data on original server |  F |  F |  F |  S |  S |  S |  S |  S |  S |  S |  W |  - |  - |  -
Data on failover server |  S |  S |  S |  - |  - |  - |  - |  - |  - |  - |  - |  - |  - |  -

S: stored, F: failed, D: deleting, W: writing, -: no data

The monitored server constantly checks whether the maximum storage duration has been reached. This means that on days 11 to 14 it starts deleting the video data on the failover server:

Date (day)              |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8 |  9 | 10 | 11 | 12 | 13 | 14
Storage duration        |  - |  - |  - |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8 |  9 | 10 | 11
Data on original server |  F |  F |  F |  D |  S |  S |  S |  S |  S |  S |  S |  S |  S |  W
Data on failover server |  D |  D |  D |  - |  - |  - |  - |  - |  - |  - |  - |  - |  - |  -

S: stored, F: failed, D: deleting, W: writing, -: no data
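The deletion pattern in this example can be sketched as a schedule that records, for each recording day, where its video lives and when it is removed. The rule used here (video recorded on day x is deleted on day x + 10, whichever server holds it) is an assumption inferred from the example, chosen because it reproduces the deletions on days 11 to 14; `deletion_schedule` and `deleted_by` are hypothetical helpers.

```python
from typing import Dict, Iterable, List, Tuple


def deletion_schedule(days: Iterable[int], failed_days: Iterable[int],
                      retention_days: int) -> Dict[int, Tuple[str, int]]:
    """Map each recording day to (where its video is stored, day it is deleted)."""
    failed = set(failed_days)
    schedule = {}
    for day in days:
        # While the monitored server was down, the failover server recorded the video.
        location = "failover" if day in failed else "original"
        # Assumption: video from day x is deleted on day x + retention_days,
        # regardless of location, because the monitored server owns it in both places.
        schedule[day] = (location, day + retention_days)
    return schedule


def deleted_by(schedule: Dict[int, Tuple[str, int]], today: int) -> List[int]:
    """Recording days whose deletion has started on or before `today`."""
    return sorted(day for day, (_, delete_on) in schedule.items() if delete_on <= today)
```

With `schedule = deletion_schedule(range(1, 15), {1, 2, 3}, 10)`, days 1 to 3 are stored on the failover server and deleted on days 11 to 13, day 4 is stored on the original server and deleted on day 14, and `deleted_by(schedule, 14)` returns `[1, 2, 3, 4]`, matching the D cells above.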

For the operator, this process is completely transparent.