CLUEXIT Decision

OpenVMS clustering, cluster management utilities, protocols, building and managing clusters of any scale.

Topic author
pweaver
Member
Posts: 9
Joined: Wed May 13, 2020 1:37 pm

CLUEXIT Decision

Post by pweaver » Mon Nov 18, 2024 2:24 pm

Hi,

A long, long time ago I recall reading about the decision tree a cluster that uses Ethernet for SCS communication goes through when SCS communication is lost and one node must do a CLUEXIT. The topic recently came up when a customer had a switch change go bad and one cluster member had to CLUEXIT. I dusted off some old books I have but could not find what I think I remember reading. It might have been in the old DSN Link database, or maybe it was a DECUS presentation; it was a long time ago.

I think the decision was basically: if a node thinks it has lost SCS communication, the node with the highest SCSSYSTEMID volunteers to CLUEXIT. I also think the article mentioned certain conditions under which the highest SCSSYSTEMID is not guaranteed to be the one that volunteers.

TL;DR: Does anyone know how a cluster decides which node CLUEXITs when SCS communication is lost?


arne_v
Senior Member
Posts: 568
Joined: Fri Apr 17, 2020 7:31 pm
Location: Rhode Island, USA

Re: CLUEXIT Decision

Post by arne_v » Mon Nov 18, 2024 2:43 pm

How many nodes with how many votes and what quorum?

Usually I would expect a scenario like:

nodes A, B, and C each have 1 vote and quorum is 2 => if A and B can communicate but C cannot communicate with either, then C is out

In the old days we also had the joy of quorum disks.
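The three-node example above follows directly from the documented OpenVMS quorum formula. Here is a minimal sketch of that arithmetic (function names are mine; the real logic lives in the connection manager):

```python
# Sketch of OpenVMS quorum arithmetic: QUORUM = (EXPECTED_VOTES + 2) / 2,
# truncated to an integer. Illustrative only.
def quorum(expected_votes: int) -> int:
    return (expected_votes + 2) // 2

def partition_survives(partition_votes: int, expected_votes: int) -> bool:
    # A partition keeps running only if the votes it can see meet quorum.
    return partition_votes >= quorum(expected_votes)

# Nodes A, B, C with 1 vote each: EXPECTED_VOTES = 3, so quorum = 2.
# The A+B partition (2 votes) survives; isolated C (1 vote) does not.
```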
Arne
arne@vajhoej.dk
VMS user since 1986


abrsvc
Contributor
Posts: 15
Joined: Wed Apr 01, 2020 8:12 am

Re: CLUEXIT Decision

Post by abrsvc » Tue Nov 19, 2024 7:17 am

A CLUEXIT in this case is a voluntary crash by the node that is attempting to re-enter the cluster. If the time away from the cluster exceeds RECNXINTERVAL, then the re-entering node will crash. This prevents stale data from causing problems, for example due to outstanding I/Os. A common misconception is that this crash happens whenever a node leaves. A node can leave and re-enter a cluster within RECNXINTERVAL without any problems. There will likely be cluster transitions for both the removal and the reconnection, but the node will survive.
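The rule above reduces to a simple timer comparison. A hedged sketch (names are illustrative, not the actual kernel logic):

```python
# Illustrative model of the RECNXINTERVAL rule described above:
# a returning node survives only if it reconnects within the window.
def rejoin_outcome(seconds_disconnected: float, recnxinterval: float) -> str:
    if seconds_disconnected > recnxinterval:
        # The cluster has already removed the node; it must CLUEXIT
        # to avoid acting on stale state (e.g. outstanding I/Os).
        return "CLUEXIT"
    # The VC re-forms; cluster transitions are logged, but the node survives.
    return "reconnect"
```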

Dan


volkerhalle
Master
Posts: 210
Joined: Fri Aug 14, 2020 11:31 am

Re: CLUEXIT Decision

Post by volkerhalle » Tue Nov 19, 2024 2:01 pm

There are various other scenarios, which lead to CLUEXIT crashes.

The good old Digital/Compaq/HP CANASTA rules (internal crashdump analysis tool) describe all those scenarios. VSI should have a copy of those rules available internally.

Volker.
[Previous CANASTA maintainer]


baldrick
Member
Posts: 5
Joined: Fri Jul 26, 2024 5:42 am
Location: Wigan, UK

Re: CLUEXIT Decision

Post by baldrick » Thu Nov 21, 2024 12:59 pm

I spent a while understanding this through a conversation with VMS engineering in the Compaq days.

While the SCSSYSTEMID does reflect who the membership request goes to, it doesn't play a part in the CLUEXIT. (But on joining a cluster, ALL members have to be able to form a VC (Virtual Circuit) to all others joined or trying to join directly, not just to the one named in the join request.)

You don't describe the customer's scenario, so all I can assume is that both nodes/sides were hung.

In a heterogeneous cluster of 2 members with the same votes each, loss of connection will hang both. Both will therefore experience a VC failure, and subsequently the allowed time for the next "hello" before a node is declared "removed" (RECNXINTERVAL) is exceeded.

When RECNXINTERVAL is exceeded, ALL the local connection managers perform a cluster reconfiguration (source: VAXcluster Principles, Roy G. Davis). Nodes will be removed from the local node's 'picture' of the cluster.

When connectivity returns it is a race condition: the first node to say "I'm back, where are you?" wins. The message exchange goes something like this: "Hi, I'm a cluster member of ABC"; reply: "Sorry, I removed you"; and the late member will crash itself without regaining quorum. Another situation: if one of the nodes reboots, then "my" cluster incarnation is later than "yours"; while it may be the same cluster, its formation or transition is later than the one known by the node attempting to join, so all it can do is CLUEXIT.

Clearly the race depends on many factors, not just CPU speed: the network may delay a packet so that another is received first, and clock skew between the systems may also affect it.

There is an implication, which I have not tried, that because RECNXINTERVAL affects when the reconfiguration occurs, a longer RECNXINTERVAL may give a later timestamp to the cluster transition and therefore be given precedence. The documentation does state that all cluster (membership) parameters should be identical on all members.

I believe there are some nuances around the number of votes when a cluster has more than 2 nodes; typically, in a cluster of more than 3 nodes, the side with more votes will 'win'.

The decision making process is not affected by the SCSSYSTEMID because you have to assume no node can communicate with any other so the rules are implemented the same way.

We've probably all at some point seen "sending cluster request to ZZZZZZ" repeated. The usual scenario is that node ZZZZZZ has connections to more nodes than the node asking to join, so the joiner will be denied until it can connect with the same nodes that ZZZZZZ is connected to. There is another, subtler reason: if the EXPECTED_VOTES of the joining member would cause an insufficient-quorum condition, it will be prevented from joining.
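The two join-denial conditions above can be sketched as follows. This is a simplified model under my own assumptions (function and parameter names are made up; the real checks are inside the connection manager):

```python
# Illustrative sketch of the two join-denial conditions: connectivity
# parity with the sponsoring member, and no insufficient-quorum result.
def join_allowed(joiner_reachable: set, member_reachable: set,
                 joiner_expected_votes: int, cluster_votes: int) -> bool:
    # The joiner must be able to form a VC to every node the sponsoring
    # member (ZZZZZZ) is connected to, not just to ZZZZZZ itself.
    if not member_reachable <= joiner_reachable:
        return False
    # The joiner's EXPECTED_VOTES must not raise quorum above the votes
    # actually present, or the cluster would hang on an insufficient quorum.
    new_quorum = (joiner_expected_votes + 2) // 2
    return cluster_votes >= new_quorum
```

For example, a joiner that reaches {A, B, C} when the member reaches {A, B} passes the connectivity check; one that reaches only {A} does not.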

We have on multiple occasions increased RECNXINTERVAL to "hours" when the network teams were upgrading switches, reconfiguring networks, etc., simply to avoid CLUEXITs. Needless to say the cluster hangs; we tell the customer to expect that and be patient. As none of the nodes exceeds RECNXINTERVAL, no reconfiguration occurs, and the systems just carry on where they left off prior to the VC closing (and the network being interrupted). We increase the shadowing timeouts too, to prevent mount verification as well.

I am not aware of the document you refer to. Bear in mind the original clustering rules (which the connection managers abide by) were set up when it was typical for systems to have multiple paths of communication: not just NI (Network Interconnect) but CI and DSSI, and indeed multiple networks are also possible. Memory Channel and FDDI were also available. Each interconnect is assigned a software priority (as of OpenVMS 7.2): 1. Memory Channel, 2. CI (Computer Interconnect), 3. DSSI, 4. FDDI, 5. NI (Ethernet). It makes sense, as you expect the network (or "notwork", as I'll occasionally affectionately call it) to be potentially the most unreliable. (Source: "[OpenVMS] A Discussion Of How VMScluster Members Communicate", April 2000)

I will close by saying that a CLUEXIT is a positive thing, primarily because it prevents potential data corruption, which is worse than the inconvenience of the brief loss of a system.
Nic Clews, dxc, UK


Topic author
pweaver
Member
Posts: 9
Joined: Wed May 13, 2020 1:37 pm

Re: CLUEXIT Decision

Post by pweaver » Thu Nov 28, 2024 2:29 pm

Thanks to everyone for all the replies and details. I was sure I recalled a document or presentation saying that when two nodes use Ethernet for SCS traffic and the SCS traffic stops (in this particular case, the network switch the two Alphas were plugged into was accidentally misconfigured), the node with the higher SCSSYSTEMID would CLUEXIT first. Based on the replies here, I guess my memory is wrong.


volkerhalle
Master
Posts: 210
Joined: Fri Aug 14, 2020 11:31 am

Re: CLUEXIT Decision

Post by volkerhalle » Sun Dec 01, 2024 1:32 am

Here is the CLUEXIT decision tree as documented in a CANASTA rule:

This node was selected to crash because a connect message was
received that indicated one of the following conditions:

1- The other node has removed this node from its cluster and
it has quorum,

If neither node has quorum...

2- the other node has more available votes,
3- or the other node has connection to more nodes,
4- or the votes and nodes are equal and the other node has a higher system ID.
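The four rules above can be modeled as a simple ordered comparison. A hedged sketch (the names and structure are my own; the actual decision is made inside the connection manager when the connect message arrives):

```python
# Illustrative model of the CANASTA-documented CLUEXIT tiebreak.
from dataclasses import dataclass

@dataclass
class NodeState:
    has_quorum: bool    # does this side's partition hold quorum?
    votes: int          # available votes this side can see
    connections: int    # number of nodes this side still reaches
    system_id: int      # SCSSYSTEMID

def local_node_should_cluexit(local: NodeState, remote: NodeState) -> bool:
    """True if the local node crashes on receiving the remote's connect message."""
    if remote.has_quorum:
        return True                     # rule 1: remote removed us and has quorum
    if local.has_quorum:
        return False                    # we kept quorum; the other side goes
    # Neither side has quorum:
    if remote.votes != local.votes:
        return remote.votes > local.votes                   # rule 2
    if remote.connections != local.connections:
        return remote.connections > local.connections       # rule 3
    return remote.system_id > local.system_id               # rule 4
```

Note that rule 4 only comes into play after quorum, votes, and connectivity are all tied, which matches the earlier observation that the highest system ID is not guaranteed to be the one that volunteers.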

Volker.
