QMAN$MASTER.DAT on shadow set in cluster causes crashes on x86-64
Posted: Mon Apr 17, 2023 6:25 pm
(Not sure if this belongs in the Clustering forum, X86-64 forum, or here...)
I have a three-system cluster running OpenVMS E9.2-1 for x86-64 on each node. Each node has its own system disk, and each also has a second disk that is a member of a shadow set (SHDT1, DSA0:) spanning all three nodes.
All of the cluster documentation says the queue manager's QMAN$MASTER.DAT file must be accessible at the same location from all cluster members. I create a directory for the queue manager file on the shadow set volume and define the QMAN$MASTER logical name cluster-wide.
Now, from one cluster member (VMSX01), I start the queue manager.
As expected, this creates QMAN$MASTER.DAT in SHDT1:[CLUSTER$CONFIG.Q].
All three cluster members agree that the queue manager is running on node VMSX01; SHOW QUEUE/MANAGER produces exactly the same output on each.
I run ENABLE AUTOSTART/QUEUES on all three cluster members.
Now I create the SYS$BATCH queue from node VMSX01.
The other two cluster members also show "Batch queue SYS$BATCH, idle, on VMSX01::" when I run "show queue sys$batch" on them.
So at first glance, everything appears to be working as expected. However, if I try to stop the queue manager (this also happens if I try to shut down the cluster), VMSX01 crashes.
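After a few seconds, the console for VMSX01 shows an INCONSTATE ("Inconsistent I/O data base") bugcheck in the QUEUE_MANAGER process. The full command sequence and console output are below.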
Am I doing something wrong by putting the QMAN$MASTER.DAT file on a shadow set? If so, how are you supposed to share the QMAN$MASTER.DAT file in a cluster without shared disks?
Or is this a bug?
Thanks,
Matthew
Shadow set configuration:
```
$ show dev shdt1
Device                  Device           Error    Volume         Free  Trans Mnt
 Name                   Status           Count     Label        Blocks Count Cnt
DSA0:                   Mounted              0  SHDT1         50182695     5   3
$1$DKA100:    (VMSX01)  ShadowSetMember      0  (member of DSA0:)
$2$DKA100:    (VMSX02)  ShadowSetMember      0  (member of DSA0:)
$3$DKA100:    (VMSX03)  ShadowSetMember      0  (member of DSA0:)
```
Creating the directory and defining QMAN$MASTER cluster-wide:
```
$ CREATE/DIR SHDT1:[CLUSTER$CONFIG.Q]
$ DEFINE/SYSTEM/EXECUTIVE/CLUSTER QMAN$MASTER SHDT1:[CLUSTER$CONFIG.Q]
```
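To confirm that every member resolves QMAN$MASTER to the same directory and has DSA0: mounted, a cluster-wide check along these lines may help. This is a sketch, assuming the account has the privileges SYSMAN needs to run DO commands on the other nodes; the /FULL display should also show which logical name table supplies the translation on each node.

```
$ ! Run from any node; SYSMAN's DO executes the DCL line on every cluster member
$ RUN SYS$SYSTEM:SYSMAN
SYSMAN> SET ENVIRONMENT/CLUSTER
SYSMAN> DO SHOW LOGICAL/FULL QMAN$MASTER
SYSMAN> DO SHOW DEVICE DSA0:
SYSMAN> EXIT
```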
Starting the queue manager on VMSX01:
```
$ START/QUEUE/MANAGER/NEW_VERSION
```
Resulting directory contents:
```
$ dir shdt1:[cluster$config.q]
Directory SHDT1:[CLUSTER$CONFIG.Q]
QMAN$MASTER.DAT;1
Total of 1 file.
```
Queue manager status (identical on all three nodes):
```
$ show queue/manager
Queue manager SYS$QUEUE_MANAGER, running, on VMSX01::
```
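The short SHOW QUEUE/MANAGER display does not say where the running queue manager keeps its database. A sketch of a fuller check, assuming it is run on each node: the /FULL form is expected to add details such as the database location and the nodes the manager is allowed to run on.

```
$ ! /FULL should include the database location and the manager's node list
$ SHOW QUEUE/MANAGERS/FULL
```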
Creating SYS$BATCH from VMSX01:
```
$ init /queue /start /autostart_on=(vmsx01::,vmsx02::,vmsx03::) /batch sys$batch
$ show queue sys$batch
Batch queue SYS$BATCH, idle, on VMSX01::
```
Stopping the queue manager:
```
$ stop/queue/manager/cluster
$ show queue/manager
Queue manager SYS$QUEUE_MANAGER, stopping, on VMSX01::
```
Console output on VMSX01:
```
VSI Dump Kernel SYSBOOT Jan 23 2023 14:03:45
**** OpenVMS x86_64 Operating System E9.2-1 - BUGCHECK ****
** Bugcheck code = 0000019C: INCONSTATE, Inconsistent I/O data base
** Crash Time: 17-APR-2023 20:54:37.33
** Crash CPU: 00000000 Primary CPU: 00000000 Node Name: VMSX01
** Highest CPU number: 00000001
** Active CPUs: 00000000.00000003
** Current Process: "QUEUE_MANAGER"
** Current PSB ID: 00000001
** Image Name: $1$DKA0:[SYS0.SYSCOMMON.][SYSEXE]QMAN$QUEUE_MANAGER.EXE;1
** Dumping error logs to the system disk ($1$DKA0:)
** Error logs dumped to $1$DKA0:[SYS0.SYSEXE]SYS$ERRLOG.DMP
** (used 52 out of 64 available blocks)
** Dumping memory to the system disk ($1$DKA0:)
```
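For a bug report, the summary SDA extracts from the dump is usually more informative than the console banner. A minimal sketch, assuming the dump kernel wrote the memory dump to the default SYS$SYSTEM:SYSDUMP.DMP (the console output above only says "the system disk ($1$DKA0:)"):

```
$ ! After VMSX01 reboots; the dump file path is an assumption (the default location)
$ ANALYZE/CRASH_DUMP SYS$SYSTEM:SYSDUMP.DMP
SDA> SHOW CRASH
SDA> SHOW STACK
SDA> EXIT
```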