MountVerify after heavy $CRMPSC and $EXPREG

Post by johnharney » Wed Nov 29, 2023 10:49 am

As part of our product's process startup, subprocesses are spawned to do their own startup; they call $EXPREG and then $CRMPSC[_GPFILE_64] for 14+ million pagelets, in both P0 and P2 space. Most sections are backed by the pagefile; a few aren't.

From a working IA64 system running this software:
1842 Global Sections Used, 14248960/34039136 Global Pagelets Used/Unused
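
For scale, here is a quick conversion of those pagelet counts into bytes. It is only a sketch: it assumes the standard 512-byte OpenVMS pagelet, and the function name is made up for illustration.

```python
PAGELET_BYTES = 512  # assumed: standard OpenVMS pagelet size


def pagelets_to_gib(pagelets: int) -> float:
    """Convert a pagelet count to GiB (2**30 bytes)."""
    return pagelets * PAGELET_BYTES / 2**30


used = 14_248_960    # "Global Pagelets Used" from the line above
unused = 34_039_136  # "Global Pagelets Unused" from the line above

print(f"used:   {used * PAGELET_BYTES:,} bytes = {pagelets_to_gib(used):.2f} GiB")
print(f"unused: {unused * PAGELET_BYTES:,} bytes = {pagelets_to_gib(unused):.2f} GiB")
```

Under that assumption, the global pagelets in use come to roughly 6.8 GiB.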

After "a while" (41 seconds CPU yesterday) one or two (mostly random) spawned subprocesses go into RWMPB, and the disk with the page file goes into MountVerify (mounted) state. The disk never recovers. This is on the console:

%%%%%%%%%%%  OPCOM  28-NOV-2023 19:43:12.76  %%%%%%%%%%%
Device BBIRCH$DKA200: is offline.
Mount verification is in progress.

If we shrink some of the sections we're creating/mapping, we get "farther" into the big startup.

We put the pagefile on its own disk as part of troubleshooting; if the pagefile is on the system disk or the program disk, the whole system or product gets unresponsive in a hurry. Having the pagefile by itself helps isolate the problem to... the pagefile (or its disk), if we're reading the tea leaves correctly. The pagefile is the only thing on DKA200/PDISK.

This happens on all flavors of 9.2-1 (v100, v200 too). This is ESXi as host; 16GB RAM in the VM; using SATA disks.

When we add a SCSI controller, and put the pagefile on a SCSI disk, "It works!!" Mount verification is kicked, but "it" recovers, and well, all is happy!

%%%%%%%%%%%  OPCOM  29-NOV-2023 10:31:31.37  %%%%%%%%%%%
Device CBIRCH$DKA200: is offline.
Mount verification is in progress.

%%%%%%%%%%%  OPCOM  29-NOV-2023 10:31:31.38  %%%%%%%%%%%
Mount verification has completed for device CBIRCH$DKA200:

We did eventually run down the product, and MountVerify occurred on the TDISK (which was still SATA). "It" never recovered.

We built a system with all SCSI, and while we get MountVerify events, they all eventually resolve, and things work.
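
To put numbers on "eventually resolve", a small script along these lines could pair up the OPCOM messages from a console capture and report how long each mount verification lasted. This is only a sketch: it assumes the exact message wording shown in the console output above, and the function and variable names are made up for illustration.

```python
import re
from datetime import datetime

MONTHS = {name: num for num, name in enumerate(
    "JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC".split(), start=1)}

# Banner line, e.g. "%%%%%%%%%%%  OPCOM  29-NOV-2023 10:31:31.37  %%%%%%%%%%%"
BANNER = re.compile(
    r"OPCOM\s+(\d{1,2})-([A-Z]{3})-(\d{4}) (\d{2}):(\d{2}):(\d{2})\.(\d{2})")
OFFLINE = re.compile(r"Device (\S+): is offline")
DONE = re.compile(r"Mount verification has completed for device (\S+):")


def parse_banner(line):
    """Return the banner timestamp as a datetime, or None if not a banner."""
    m = BANNER.search(line)
    if not m:
        return None
    day, mon, year, hh, mm, ss, cc = m.groups()
    # OPCOM prints hundredths of a second; datetime wants microseconds.
    return datetime(int(year), MONTHS[mon], int(day),
                    int(hh), int(mm), int(ss), int(cc) * 10_000)


def mount_verify_events(log_text):
    """Return (resolved, pending).

    resolved: list of (device, seconds from "offline" to "completed").
    pending:  dict of device -> time it went offline, never completed.
    """
    stamp = None
    pending = {}
    resolved = []
    for line in log_text.splitlines():
        ts = parse_banner(line)
        if ts is not None:
            stamp = ts
            continue
        if stamp is None:
            continue  # message text before any banner: no timestamp to use
        m = OFFLINE.search(line)
        if m:
            pending.setdefault(m.group(1), stamp)
            continue
        m = DONE.search(line)
        if m:
            start = pending.pop(m.group(1), None)
            if start is not None:
                resolved.append((m.group(1), (stamp - start).total_seconds()))
    return resolved, pending


SAMPLE = """\
%%%%%%%%%%%  OPCOM  28-NOV-2023 19:43:12.76  %%%%%%%%%%%
Device BBIRCH$DKA200: is offline.
Mount verification is in progress.

%%%%%%%%%%%  OPCOM  29-NOV-2023 10:31:31.37  %%%%%%%%%%%
Device CBIRCH$DKA200: is offline.
Mount verification is in progress.

%%%%%%%%%%%  OPCOM  29-NOV-2023 10:31:31.38  %%%%%%%%%%%
Mount verification has completed for device CBIRCH$DKA200:
"""

resolved, pending = mount_verify_events(SAMPLE)
for device, seconds in resolved:
    print(f"{device}: recovered after {seconds:.2f} s")
for device, since in pending.items():
    print(f"{device}: offline since {since}, no completion message seen")
```

Fed the two console snippets from this post, it reports the SCSI case (CBIRCH) as recovered and leaves the SATA case (BBIRCH) in the pending list.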

We're not exactly sure what to ask or look at next (yet); we're not in a position to submit a bug report (our junk is huge; oo-er), so any questions, thoughts, ideas etc are very much appreciated. Trouble-shooting ideas? How to look deeper in $ ANALYZE/SYS? We could surely crash the system and send a dump, but again, we're not sure it's the next step.

While MountVerify is nice when it resolves, it seems it really shouldn't be happening with this amount of reproducibility. Just a SWAG.

Thanks for reading this far! What more can we show you?
\john and rod

BBIRCH_harney> sh dev d

Device                  Device           Error   Volume          Free  Trans Mnt
 Name                   Status           Count    Label         Blocks Count Cnt
BBIRCH$DMM0:            Offline              0
BBIRCH$DKA0:            Mounted              0 B_X86VMSV921  203157504   362   1
BBIRCH$DKA100:          Mounted              0 TDISK         267970208    10   1
BBIRCH$DKA200:          MountVerify          0 PDISK         113090112     2   1
                        mounted

BBIRCH_harney> sh mem
              System Memory Resources on 28-NOV-2023 16:05:01.04

Physical Memory Usage (pages):     Total        Free      In Use    Modified
  Main Memory (15.74GB)          2063975     1642441      169646      251888

Extended File Cache  (Time of last reset: 22-NOV-2023 20:14:31.02)
 Allocated (MBytes)            411.16    Maximum size (MBytes)          8062.39
 Free (MBytes)                   0.21    Minimum size (MBytes)             3.12
 In use (MBytes)               410.94    Percentage Read I/Os                46%
 Read hit rate                     85%   Write hit rate                       0%
 Read I/O count                102463    Write I/O count                 116678
 Read hit count                 87694    Write hit count                      0
 Reads bypassing cache           3230    Writes bypassing cache           88027
 Files cached open                330    Files cached closed               1689
 Vols in Full XFC mode              0    Vols in VIOC Compatible mode         4
 Vols in No Caching mode            0    Vols in Perm. No Caching mode        0

Granularity Hint Regions (pages):   Total        Free      In Use    Released
  S0 Execlet data                    2048        1742         306           0
  S0 Executive data                  5632         382        5250           0
  S0 Executive RO data               1024         833         191           0
  S0 Resident image code             3072        2963         109           0
  S0 Resident image data              512         512           0           0
  S0 Resident RO image data          1024        1024           0           0
  S2 Execlet code                    4096        1513        2583           0
  S2 Execlet data                    4096        4096           0           0
  S2 Executive data                  1024           0        1024           0
  S2 Resident image code             4096         268        3828           0
  S2 Resident image data              512         512           0           0

Slot Usage (slots):                Total        Free    Resident     Swapped
  Process Entry Slots                980         937          43           0

Dynamic Memory Usage:              Total        Free      In Use     Largest
  Nonpaged Dynamic Memory (MB)     28.00       21.48        6.51       19.29
  USB Addressable Memory  (KB)   1024.00     1022.87        1.12     1022.87
  Paged Dynamic Memory    (MB)     12.40        7.24        5.16        7.22
  Lock Manager Dyn Memory (MB)      1.82        0.33        1.48
  S2 Dynamic Memory Usage (MB)      7.97        7.63        0.34        7.63

Buffer Object Usage (pages):                  In Use        Peak
  32-bit System Space Windows (S0/S1)              5           5
  64-bit System Space Windows (S2)                 0           0
  Physical pages locked by buffer objects          5           0

Memory Reservations (pages):       Group    Reserved      In Use        Type
  Total (0 bytes reserved)                         0           0

Paging File Usage (8KB pages):                 Index        Free        Size
  DISK$PDISK:[SYS0.SYSEXE]PAGEFILE.SYS;1                                        
                                                 254     1811308     2499992
  Total committed paging file usage:                                  961020

Of the physical pages in use, 89446 pages are permanently allocated to OpenVMS.
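
For anyone who prefers the page counts in the SHOW MEMORY output as sizes, here is a rough conversion. It is a sketch, not anything official: the 8KB unit is stated for the paging file section, and applying it to the Main Memory line is an assumption; the helper name is made up.

```python
PAGE_BYTES = 8 * 1024  # "Paging File Usage (8KB pages)" in the output above


def pages_to_gib(pages: int) -> float:
    """Convert an 8KB page count to GiB (2**30 bytes)."""
    return pages * PAGE_BYTES / 2**30


size = 2_499_992         # page file "Size"
free = 1_811_308         # page file "Free"
committed = 961_020      # "Total committed paging file usage"
main_memory = 2_063_975  # "Main Memory" total (assumed to be 8KB pages too)

in_use = size - free     # pages of the page file not reported as free

for label, pages in [
    ("page file size", size),
    ("page file free", free),
    ("page file in use", in_use),
    ("committed", committed),
    ("main memory", main_memory),
]:
    print(f"{label:<18}{pages:>10,} pages  {pages_to_gib(pages):6.2f} GiB")
```

The main-memory figure works out to about 15.7 GiB, in line with the 15.74GB in the header, so the 8KB unit looks right for that line as well.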


Post by volkerhalle » Thu Nov 30, 2023 1:32 am

John and Rod,

I've posted some SDA suggestions in the previous topic for this problem:

viewtopic.php?f=37&t=8888

If you can reliably reproduce the problem, you could at least try the SDA DKLOG extension to find out WHY the disk is going into mount verification.

Volker.


Post by duncanmorris » Thu Nov 30, 2023 6:02 am

Hi, we also had mount verify issues with VMware ESXi V7.1 with the vmdk volumes presented using a SATA controller. This was seen on all versions of VMS.

Our VMware provider subsequently moved to use local SCSI disks on the ESXi host, and presented the vmdk volumes with the parallel SCSI controller. These are now performing satisfactorily.

When we had the SATA controller, the volumes were thin provisioned, but since moving to the SCSI controller the volumes are Thick Provision Eager Zeroed.

Unfortunately, I was unable to test whether it was the thin/thick provisioning which sorted the issue, or whether the change to the SCSI controller was the magic bullet.


Post by marty.stu » Thu Nov 30, 2023 6:57 am

Hi all,

We will add a note regarding the provisioning mode in the upcoming installation guide update, but we only support thick provisioning/fixed size volumes for now.
Run to the bedroom, In the suitcase on the left You'll find my favorite axe.

Post by imiller » Thu Nov 30, 2023 8:00 am

Interesting.
I've found with OpenVMS V8.4-2 I64 systems using storage provided by 3PAR arrays that performance with thin provisioned volumes was poor, so I only use fully provisioned volumes. I assumed this was due to how the VMS file system works.
Ian Miller
[ personal opinion only. usual disclaimers apply. Do not taunt happy fun ball ].


Post by pjacobi » Fri Dec 08, 2023 10:17 am

Thin Provisioning is not supported in VMS. There will be a Release Note not to use Thin Provisioning.

Paul A. Jacobi
VMS Software
