Reliability, Availability, Maintainability, and Safety (RAMS)

ESB is proven to run in the field continuously for years with 100% up-time.

The 4 components of ESB implement High Availability and Disaster Recovery processes. Each one as a separate entity.

  • ESB LOGIC and RabbitMQ utilize Microsoft Failover Cluster.

  • Elastic Search has its own Cluster High Availability.

  • Classic Apache ActiveMQ has its own cluster High Availability.

Regarding High Availability timings:

  • ESB Logic works in Active-Hot Standby mode. Failover time is less than 1 second. During this time clients sending information to ESB Logic will fail and retry (TCP retries failed packets). Once the standby is responding – the retries will succeed and all traffic flow continues as normal with no loss.

  • RabbitMQ works in Active-Hot Standby mode. Once the active fails, it fails traffic is diverted immediately to the Hot Standby with 0 packet loss.

  • Elastic Search cluster is configured in Remote Cluster Synchronization mode. Thus, all logs are replicated in real time to the second cluster (in SDC). Once the Active node fails – traffic is immediately diverted to the second node with 0 log loss.

ESB Failover Clustering

ESB Failover Clustering

ESB Logic

The solution uses two cluster members, one in the MDC (Main Data Center) and one in the SDC (Secondary Data Center). ESB Logic is a Windows-based application that runs on the Windows Server operating system. It leverages the Microsoft failover cluster mechanism to detect and identify any malfunctions of the operating system and/or the application itself by querying the application services.

ESB Logic High Availability is of the Active/Hot Standby type. By default, the Active node is running in the MDC and the Hot Standby node is in the SDC. Switching between Active and Hot Standby (and vice versa) is managed by the Microsoft failover cluster, using a Virtual IP mechanism. The Virtual IP is assigned to one of the two cluster members at any given time. Clients always access the Virtual IP and are unaware of whether it is assigned to the MDC or SDC member.

The Microsoft failover cluster uses a quorum (voting) mechanism to decide which node should be Active. In a setup with two cluster members, best practice is to add a “witness” cluster member. This member only participates in the voting process to determine the Active node and does not have application capabilities.

Synchronization strategy: The configuration file, which contains only setup data/configuration, will be synchronized manually between the cluster nodes as part of ongoing maintenance, not in response to a failure.

RabbitMQ

Similarly to the ESB Logic, RabbitMQ uses two cluster members, one in the MDC (Main Data Center) and one in the SDC (Secondary Data Center). It is a Windows-based application that runs on the Windows Server operating system. It leverages the Microsoft failover cluster mechanism to detect and identify any malfunctions of the operating system and/or the application itself by querying the application services.

RabbitMQ High Availability is of the Active/Hot Standby type. By default, the Active node is running in the MDC and the Hot Standby node is in the SDC. Switching between Active and Hot Standby (and vice versa) is managed by the Microsoft failover cluster, using a Virtual IP mechanism. The Virtual IP is assigned to one of the two cluster members at any given time. Clients always access the Virtual IP and are unaware of whether it is assigned to the MDC or SDC member.

The Microsoft failover cluster uses a quorum (voting) mechanism to decide which node should be Active. In a setup with two cluster members, best practice is to add a “witness” cluster member. This member only participates in the voting process to determine the Active node and does not have application capabilities.

Synchronization strategy: No synchronization is required. Topics are set up automatically on demand on each node.

Apache ActiveMQ

The solution uses two ActiveMQ cluster nodes (servers) configured as a “Network of Brokers” to provide a high availability (HA) solution. More details can be found here: https://activemq.apache.org/networks-of-brokers

Clients will use the failover:// transport protocol. They will primarily connect to the master broker in the MDC (Main Data Center). If the master broker fails, the clients will automatically switch to the slave broker in the SDC (Secondary Data Center).