This post is focused on the interface monitoring functionality in redundancy groups.
Redundancy groups (RGs) in an SRX chassis cluster provide high availability. An RG fails over from one node to the other when a failure is detected. You can configure the cluster to monitor the physical state of interfaces (interface monitoring), to check the reachability of IP addresses (IP monitoring), or both.
Combining these options lets you define precisely which circumstances count as a failure. For example: a physical failure of a single interface, a physical failure of multiple interfaces, a single unreachable IP address, multiple unreachable IP addresses, a physical failure of one interface together with one unreachable IP address, etc.
In this post we will look at a couple of the interface monitoring options. Please see the example setup below:
- RG1 contains the redundant Ethernet interface reth0 (ge-0/0/4 and ge-5/0/4 are its child interfaces).
- RG2 contains the redundant Ethernet interface reth1 (ge-0/0/5, ge-0/0/6, ge-5/0/5 and ge-5/0/6 are its child interfaces). The cluster forms two LAG interfaces: one on node0 (ge-0/0/5 and ge-0/0/6) and the other on node1 (ge-5/0/5 and ge-5/0/6).
- The ge-0/0/3 and ge-5/0/3 interfaces are uplinks. A dynamic routing protocol is used for uplink path selection.
Each monitored interface is assigned a numerical value (range 0-255) called a weight. The failover is triggered when the cumulated weight of all failed interfaces is 255 or more. The configured weight of the monitored interfaces is therefore crucial: it defines whether the failure of a single interface (with a weight of 255) causes the failover, or whether multiple interfaces (each with a weight of less than 255) need to fail at the same time.
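The threshold rule can be sketched in a few lines of Python. This is only a model of the arithmetic described above, not Junos code. The function name `failover_triggered` and the sample weights are assumptions made for illustration.

```python
FAILOVER_THRESHOLD = 255  # cumulated weight at which the RG fails over


def failover_triggered(weights, failed):
    """Return True if the failed interfaces' cumulated weight reaches the threshold.

    weights: dict mapping monitored interface name -> configured weight (0-255)
    failed:  iterable of interface names that are currently down
    """
    # Interfaces that are not monitored contribute nothing.
    cumulated = sum(weights[name] for name in failed if name in weights)
    return cumulated >= FAILOVER_THRESHOLD


# Hypothetical weights: each interface alone is enough to trigger the failover.
single = {"ge-0/0/4": 255, "ge-5/0/4": 255}
print(failover_triggered(single, ["ge-0/0/4"]))  # True

# Hypothetical weights: both interfaces must fail at the same time.
pair = {"ge-0/0/5": 128, "ge-0/0/6": 128}
print(failover_triggered(pair, ["ge-0/0/5"]))              # False
print(failover_triggered(pair, ["ge-0/0/5", "ge-0/0/6"]))  # True
```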
In our example, failure of any reth0 child interface causes the failover of RG1.
The same approach can also be used for RG2. However, because reth1 has four child interfaces, another option exists: trigger the failover only when the whole LAG fails, i.e. when no active LAG links remain on the node. In our case this requires both child interfaces on one node to fail at the same time. To achieve this, the weight of each child interface has to be less than 255, while the cumulated weight of the two child interfaces has to be 255 or more. For example: 200 and 200, 150 and 150, 200 and 100, 254 and 99, etc.
Please have a look at our example redundancy group configuration:
The “show chassis cluster status” command displays the RG state, and the “show chassis cluster interfaces” command lists details about the monitored interfaces.
Failure of any single reth1 child interface does not change the ownership of RG2. It remains primary on node1.
However, failure of both reth1 child interfaces on node1 results in RG2 transitioning to node0.
This approach can be generalized. If the failure of N or more interfaces should trigger the failover, their weights need to fulfill the following criteria: the cumulated weight of N interfaces is 255 or more, but at the same time the cumulated weight of N-1 interfaces is less than 255.
For instance, consider the following examples:
- Three or more interfaces (N=3) should trigger the failover. The cumulated weight of 3 interfaces is 255 or more, but the cumulated weight of 2 interfaces is less than 255. Possible options are: (100, 100, 100) or (120, 120, 120), etc.
- Four or more interfaces (N=4) should trigger the failover. The cumulated weight of 4 interfaces is 255 or more, but the cumulated weight of 3 interfaces is less than 255. Possible options are: (80, 80, 80, 80) or (70, 70, 70, 70), etc.
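The N / N-1 criterion can be checked mechanically. The sketch below is a hypothetical helper (`fails_over_at_n` is not a Junos command or API) that tests a list of weights against the rule. It compares the lightest combination of N interfaces and the heaviest combination of N-1 interfaces, so it also covers unequal weights.

```python
FAILOVER_THRESHOLD = 255  # cumulated weight at which the RG fails over


def fails_over_at_n(weights, n):
    """Return True if any n failures trigger the failover but no n-1 failures do.

    weights: list of configured weights, one per monitored interface
    n:       number of simultaneous failures that should trigger the failover
    """
    if not 1 <= n <= len(weights):
        raise ValueError("n must be between 1 and the number of interfaces")
    ordered = sorted(weights)
    # Weakest case for n failures: the n lightest interfaces fail.
    lightest_n = sum(ordered[:n])
    # Strongest case for n-1 failures: the n-1 heaviest interfaces fail.
    heaviest_n_minus_1 = sum(ordered[len(ordered) - (n - 1):])
    return lightest_n >= FAILOVER_THRESHOLD and heaviest_n_minus_1 < FAILOVER_THRESHOLD


# Weight sets taken from the examples in this post.
print(fails_over_at_n([100, 100, 100], 3))     # True
print(fails_over_at_n([80, 80, 80, 80], 4))    # True
print(fails_over_at_n([254, 99], 2))           # True
# A weight of 255 fails over on its own, so it cannot be used when N=2.
print(fails_over_at_n([255, 255], 2))          # False
```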
Furthermore, RGs can monitor reth child interfaces that belong to other RGs, or interfaces that do not belong to any reth/RG at all (called local interfaces).
A single interface can be monitored by multiple RGs and have a different weight in each of them. If its weight is 255 in several RGs, its failure causes all of those RGs to fail over simultaneously.
In our example, ge-0/0/3 and ge-5/0/3 are local interfaces monitored by both RG1 and RG2. Both RGs assign a weight of 255 to those interfaces. If one uplink fails, both RG1 and RG2 transition to the node with the remaining uplink. This helps to avoid transit traffic traversing the data link between the nodes.
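To illustrate how one interface can affect several RGs at once, here is a minimal Python model in which each RG keeps its own weight table. The uplink weights of 255 come from the example above. The weights of 128 for the reth1 child interfaces and the function name `groups_failing_over` are assumptions for illustration only. The sketch also ignores which node is currently primary for each RG.

```python
FAILOVER_THRESHOLD = 255  # cumulated weight at which an RG fails over

# Per-RG monitoring tables: RG name -> {interface: weight}.
# The 128 weights are hypothetical; the 255 uplink weights follow the example.
monitors = {
    "RG1": {"ge-0/0/3": 255, "ge-5/0/3": 255},
    "RG2": {"ge-0/0/3": 255, "ge-5/0/3": 255, "ge-5/0/5": 128, "ge-5/0/6": 128},
}


def groups_failing_over(monitors, failed):
    """Return the sorted names of RGs whose failed weight reaches the threshold."""
    failed = set(failed)
    result = []
    for rg, weights in monitors.items():
        # Each RG only counts the interfaces it monitors, with its own weights.
        cumulated = sum(w for name, w in weights.items() if name in failed)
        if cumulated >= FAILOVER_THRESHOLD:
            result.append(rg)
    return sorted(result)


print(groups_failing_over(monitors, ["ge-0/0/3"]))              # ['RG1', 'RG2']
print(groups_failing_over(monitors, ["ge-5/0/5"]))              # []
print(groups_failing_over(monitors, ["ge-5/0/5", "ge-5/0/6"]))  # ['RG2']
```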
The cluster status below was taken after the previously failed interfaces (ge-5/0/5 and ge-5/0/6) had recovered. RG1 remains primary on node0.
Now, if the ge-0/0/3 interface fails, RG1 and RG2 fail over to node1.
The “show chassis cluster information” command is very useful for troubleshooting because it displays detailed information about the chassis cluster. The command accepts several parameters that provide further details about the cluster. For instance, the “interface-monitor” parameter reveals the history of monitored interfaces. Please keep in mind that the command is hidden in Junos release 11.4.
This post focused on the interface monitoring functionality in an SRX chassis cluster. It lets you define monitored interfaces and their weights on a per-redundancy-group basis. This makes it quite flexible and capable of accommodating various failover scenarios.