Lock Manager Memory increasing suddenly

Post by dirk.bogaerts » Sun Dec 05, 2021 3:39 pm

Situation & Setup:
OpenVMS V8.4
3 identical RX2600 – 48GB mem - with only internal disks in RAID
Cluster consisting of:
  • 2 Production servers:
    • Node1: (active) running the application and Oracle database, shared Queue management
    • Node2: (standby) running some housekeeping tasks, third-party Backup via network, Defrag, sharing some batch queues (and sharing all print Q’s) with Node1
    • The production application Shadow disks have shadowset members mounted on both nodes; system disks are local on each node with common part on a separate shadow disk mounted on both nodes, for amongst others, the shared Queue manager
  • 1 Development server – Node3:
    • Quorum/voting member for the cluster
    • The production applications disks have a third shadowset member on this node, but these are not mounted
    • Local non-shadow disks for system & application development (physical disks are double the size on this member, this being the only physical HW difference)
The cluster went into production 4.5 years ago and has not been rebooted since. The systems were pre-tested with simulated application load and tuned accordingly before the first go-live. Everything has been running perfectly stable ever since.
No changes have been made to the systems in the past couple of months, and no major changes (e.g. SYSGEN) in the previous years.

Problem:
Since last Friday morning, the Lock Manager Dynamic Memory on Node1 has been slowly increasing and has not stopped since: from roughly 73 MB in use (87 MB size) in the preceding weeks, to 84 MB (91 MB) on Friday morning (triggering an alert), creeping up to 88 MB (95 MB) about 48 hours later.
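
As a rough illustration of the growth described above, the two 'in use' readings can be turned into a daily rate. This is only a sketch in Python: the function name `growth_per_day` and the sample list are made up for the example, and a straight line through two readings says nothing about whether the trend will continue.

```python
def growth_per_day(samples):
    """Least-squares slope of (hours, MB in use) samples, scaled to MB per day."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_mb = sum(mb for _, mb in samples) / n
    sxy = sum((t - mean_t) * (mb - mean_mb) for t, mb in samples)
    sxx = sum((t - mean_t) ** 2 for t, _ in samples)
    return sxy / sxx * 24.0


# Readings from the post: 84 MB on Friday morning, 88 MB about 48 hours later.
samples = [(0.0, 84.0), (48.0, 88.0)]
rate = growth_per_day(samples)
print(f"approx. {rate:.1f} MB/day")
```

More readings (for example one per hour from a monitoring job) can be appended to `samples` to smooth out the daily cleanup dip.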

Investigation so far (I have limited experience with system internals and performance analysis):
  • SDA ‘Show Lock’ analysis on Node1:
    • 114K entries in total
    • 105K from ‘Resource name’ CACHE$cmDISK4
    • 1K from next most occurring resource CACHE$cmCOMMON (disk containing, amongst others, the Q mgr.)
  • DISK4 is the disk with the ‘miscellaneous’ files containing temporary work files, .log files, etc. :
    • 230+K files in total
    • 20K new files created per day
    • approx. 200-400 open files on DISK4 on Node1 (< 10 on Node2)
    • daily cleanup of older file versions at 4PM
    • disk less than 50% full
    • monthly defragged, last run was 1 week ago
  • details of Show Mem included at the end.
Initial findings:
Monitoring the Lock Mgr memory during the day, I discovered a significant drop of 4 – 5 MB ‘in use’ after the daily file-cleanup housekeeping job had run (purging approx. 20K old file versions). But this drop only partially compensates for the daily growth. The files that are purged are mostly files that have not been in use/opened for several days (except by the daily VMS full backup with /IGN=INTER, which runs every other day on Node1 or Node2).
SDA ‘Show Lock’ analysis on Node1 after daily cleanup:
• 104K entries in total (down from 114K)
• 94K from ‘Resource name’ CACHE$cmDISK4 (down from 105K)
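
The before/after figures allow a back-of-the-envelope estimate of how much Lock Manager memory goes with each entry. The sketch below (Python) takes "K" as 1000 and assumes the memory readings and the SDA counts were taken close enough together to be compared; an "entry" here may well cover more than one data structure, so the result is an order of magnitude, not a structure size.

```python
MB = 1024 * 1024

entries_before = 114_000   # "114K entries" before the cleanup (K taken as 1000)
entries_after = 104_000    # "104K entries" after the cleanup
freed_entries = entries_before - entries_after

# The post reports a drop of 4 to 5 MB 'in use' after the cleanup job.
low = 4 * MB / freed_entries
high = 5 * MB / freed_entries
print(f"roughly {low:.0f} to {high:.0f} bytes per lock entry")
```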

Questions:
1. Is it possible to calculate/estimate how much longer the Lock Mgr. memory can continue to grow?

2. Can the huge number of Lock entries for Disk4 be explained somehow, given the low number of files open on Disk4 at any given time? How can purging mostly old/unused files decrease the number of Lock table entries?

3. Most importantly, any ideas on how to fix this sudden growth in Lock entries?


______________________________________________________________

Code:

 
              System Memory Resources on  5-DEC-2021 20:50:55.36              
                                                                              
Nonpaged Dynamic Memory      (Lists + Variable)                               
  Current Size (MB)                75.53   Current Size (Pagelets)    154704  
  Initial Size (MB)                57.66   Initial Size (Pagelets)    118096  
  Maximum Size (MB)               261.00   Maximum Size (Pagelets)    534528  
  Free Space (MB)                  19.14   Space in Use (MB)           56.39  
  Largest Var Block (KB)          270.50   Smallest Var Block (bytes)    192  
  Number of Free Blocks            41296   Free Blocks LEQU 64 bytes       0  
  Free Blocks on Lookasides        41291   Lookaside Space (MB)        18.79  
                                                                              
Bus Addressable Memory       (Lists + Variable)                               
  Current Size (MB)                 2.46   Current Size (Pagelets)      5056  
  Initial Size (MB)                 2.46   Initial Size (Pagelets)      5056  
  Free Space (MB)                   2.26   Space in Use (MB)            0.19  
  Largest Var Block (MB)            2.26   Smallest Var Block (bytes)     64  
  Number of Free Blocks               14   Free Blocks LEQU 64 bytes       1  
  Free Blocks on Lookasides            5   Lookaside Space (bytes)       384  
                                                                              
Paged Dynamic Memory         (Lists + Variable)                               
  Current Size (MB)                12.33   Current Size (Pagelets)     25264  
  Free Space (MB)                   6.24   Space in Use (MB)            6.09  
  Largest Var Block (MB)            4.84   Smallest Var Block (bytes)     16  
  Number of Free Blocks            13737   Free Blocks LEQU 64 bytes      22  
  Free Blocks on Lookasides        13731   Lookaside Space (MB)         1.39  
                                                                              
Lock Manager Dynamic Memory                                                   
  Current Size (MB)                95.12   Current Size (Pages)        12176  
  Free Space (MB)                   9.90   Hits                   1308210732  
  Space in Use (MB)                85.21   Misses                      10452  
  Number of Empty Pages              685   Expansions                  20568  
  Number of Free Packets           39119   Packet Size (bytes)             0  




              System Memory Resources on  5-DEC-2021 20:49:59.90             
                                                                             
Physical Memory Usage (pages):     Total        Free      In Use    Modified 
  Main Memory (47.99GB)          6291024      616526     5649442       25056 
                                                                             
Extended File Cache  (Time of last reset: 16-MAR-2017 12:05:28.56)           
 Allocated (GBytes)              8.56    Maximum size (GBytes)            23.
 Free (GBytes)                   1.21    Minimum size (GBytes)             0.
 In use (GBytes)                 7.34    Percentage Read I/Os                
 Read hit rate                     97%   Write hit rate                      
 Read I/O count           43642926446    Write I/O count             28671504
 Read hit count           42482109518    Write hit count                     
 Reads bypassing cache       88175862    Writes bypassing cache      20298857
 Files cached open                663    Files cached closed              710
 Vols in Full XFC mode              0    Vols in VIOC Compatible mode        
 Vols in No Caching mode            0    Vols in Perm. No Caching mode       
                                                                             
Granularity Hint Regions (pages):   Total        Free      In Use    Released
  Execlet code region                8192        4427        3765           0
  Execlet data region                2048        1123         925           0
  S0S1 Executive data region         7721           0        7721           0
  S0S1 Resident image code region   65536       61810        3726           0
  S0S1 Resident image data region    2048        1888         160           0
                                                                             
Slot Usage (slots):                Total        Free    Resident     Swapped 
  Process Entry Slots               1017         782         235           0 
  Balance Set Slots                 1015         782         233           0 
                                                                             
Dynamic Memory Usage:              Total        Free      In Use     Largest 
  Nonpaged Dynamic Memory (MB)     75.53       19.10       56.43        0.26 
  Bus Addressable Memory  (MB)      2.46        2.26        0.19        2.26 
  Paged Dynamic Memory    (MB)     12.33        6.19        6.14        4.84 
  Lock Manager Dyn Memory (MB)     95.12        9.86       85.25             
                                                                             
Buffer Object Usage (pages):                  In Use        Peak             
  32-bit System Space Windows (S0/S1)              6          19             
  64-bit System Space Windows (S2)                 0           0             
  Physical pages locked by buffer objects          6          19             
                                                                             
Memory Reservations (pages):       Group    Reserved      In Use        Type 
  ORA_SGA                         SYSGBL        3200        3200  Page Table 
  ORA_SGA                         SYSGBL     3145728     3145728   Allocated 
  ORA_SGA                         SYSGBL      131072      130803   Allocated 
  Total (25.02 GBytes reserved)              3280000     3279731             
                                                                             
Write Bitmap (WBM) Memory Summary                                            
  Local bitmap count:     4     Local bitmap memory usage (KB)       1016.00 
  Master bitmap count:    5     Master bitmap memory usage (MB)         1.19

Post by arne_v » Sun Dec 05, 2021 8:40 pm

Any chance of rebooting the cluster one night?

My hypothesis: some very rare combination of events triggered a bug somewhere that led to some sort of bad state that is causing these symptoms. A reboot will fix the bad state, and maybe (just maybe - Murphy's law may apply) it will be many years before the same rare combination of events happens again.

Alternatively you may need to get an SDA guru to analyze the running system and determine the specific cause. I believe both VSI and certain 3rd party companies have such people.
Arne
arne@vajhoej.dk
VMS user since 1986


Post by dirk.bogaerts » Mon Dec 06, 2021 4:23 am

Hello Arne, thanks for the feedback.

It's a critical 24/7 system; any planned (short) downtime needs to be coordinated a week beforehand. So if anybody has insight into my first question, it would help my planning:
1. Is it possible to calculate/estimate, how much longer the Lock Mgr. memory can continue to grow?

As a shorter intervention than rebooting, I was thinking about dismounting Disk4 on Node1 (or even clusterwide) after briefly stopping all user activity. As stated before, I have no real knowledge of OpenVMS internals, but it would seem logical to me that on dismounting Disk4, all related entries would be cleared out of the Lock Table... Is this supposition correct?

PS: I will create a ticket with VSI support shortly (but any suggestions/tips still very welcome :-) )

Post by volkerhalle » Mon Dec 06, 2021 10:57 am

Dirk,

lock manager data structures seem to be allocated from the Lock Manager Pool Zone in S2 space.

SDA> SHOW LOCK/SUMMARY lists a summary of the Lock Manager Poolzone

The following items may be important here:
...
Number of Pages: 00000xxx (nnnn.)
Maximum Number of Pages: 000xxxxx (nnnnn.)
...

SDA> SHOW LOCK/POOL displays the Lock Manager Pool Zone pages.

The maximum number of pages seems to be somehow related to the SYSGEN parameter LOCKIDTBL. LOCKIDTBL_MAX is obsolete beginning with OpenVMS Version 7.1. There are mechanisms in place to purge empty Lock Manager Pool Zone pages.

Also have a look at $ SHOW MEM/CACHE=VOLUME=<device-name-of-DISK4>

How many locks does the XFC cache report as 'acquired'? How many open and/or closed files are being cached? Do these numbers also increase?

Volker.


Post by dirk.bogaerts » Mon Dec 06, 2021 1:21 pm

Volker, thank you for your reply.

Code:

SDA> SHOW LOCK/SUMMARY 
Lock Manager Poolzone:                                           
    Number of Pages:                      000011A4   (4516.)      
    Maximum Number of Pages:              00108411   (1082385.)   

Parameter Name            Current    Default       Min.      Max.   Unit  Dynamic 
--------------            -------    -------   -------    -------   ----   ------- 
LOCKIDTBL                   60000       3840      1792   16776959 Entries        

Code:

$ SHOW MEM/CACHE=VOLUME=dsa4  
_DSA4: (DISK$DISK4), Caching mode is VIOC Compatible                      
    Open files                149     Closed files             58986      
    Files ever opened   977445469     Files ever deposed   975540283      
    Allocated pages        900734     Locks acquired           59098      
    Total QIOs         3489009263     Read hits           1911124778      
    Virtual reads      2062934761     Virtual writes      1426074524      
    Read hit rate              92 %   Read aheads          412561483      
    Read throughs      1977129974     Write throughs       381222011      
    Read arounds         85804787     Write arounds       1044852513      
    Ave Disk I/O Resp Time incl cache hits (microseconds)          0
The number of open files is in the range that I would expect.
The number of closed files depends on what timeframe we're talking about, as approx. 20K files are created per day.
I haven't really been monitoring the XFC on a per-volume level (only the overall results), as we don't have any performance issues. But I'll keep an eye on these numbers from now on.
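
One thing the display above suggests is that 'Locks acquired' tracks the number of cached files (open plus closed) and not the number of open files. A quick check with the values copied from the output, as a Python sketch with variable names made up for the illustration:

```python
open_files = 149
closed_files = 58_986
locks_acquired = 59_098

cached_files = open_files + closed_files
difference = cached_files - locks_acquired
print(f"cached files: {cached_files}, locks acquired: {locks_acquired}, "
      f"difference: {difference}")
```

If that reading is right, the lock count on this volume would follow the number of closed files still held in the cache, which is in line with the drop seen after the daily purge. This is an inference from one snapshot, not a statement about XFC internals.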

As general feedback: in the meantime VSI Support has reassured me that I won't run into a Lock Mgr. memory shortage anytime soon, which was my main concern. It also looks like the Lock Mgr. memory in use is stabilizing, and even settling in the low eighties (during the weekend it was more in the high eighties). Nevertheless, it used to be in the low seventies for a very long time, and I have not yet found out why it started to increase by 15-20% on Friday.

Maybe one last question for the esteemed forum wizards: if things start to go south again really quickly, would a dismount/clus of the disk concerned be a quick & good remedy to trigger a cleanup of the Lock Mgr. memory (as this disk makes up 95% of the total entry count)? I would prefer this quick action to a reboot (even if that is the std. troubleshooting action in the Windows world - which luckily isn't mine ;-) )

Thanks again!

Post by volkerhalle » Mon Dec 06, 2021 1:45 pm

Dirk,

Number of Pages: 000011A4 (4516.)
Maximum Number of Pages: 00108411 (1082385.)


This should let you sleep well ;-)
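
To put those two numbers in proportion, here is a small Python sketch using only the values from the SHOW LOCK/SUMMARY output above. It compares page counts only; no page size is assumed, and no conversion to MB is attempted.

```python
pages_now = 0x000011A4   # "Number of Pages" from SDA> SHOW LOCK/SUMMARY
pages_max = 0x00108411   # "Maximum Number of Pages"

# The hex values agree with the decimal values SDA prints next to them.
assert pages_now == 4516 and pages_max == 1082385

used_pct = 100.0 * pages_now / pages_max
headroom_factor = pages_max / pages_now
print(f"{used_pct:.2f}% of the maximum in use; "
      f"room to grow about {headroom_factor:.0f}x")
```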

A new feature in OpenVMS V8.4 is the $ SET VOLUME/CACHE=... command. It lets you dynamically disable or enable caching on a volume (/CACHE=[NO]DATA) and even purge/delete all XFC cache entries on a volume (/CACHE=CLEAR_DATA).

Volker.

Post by volkerhalle » Wed Dec 08, 2021 5:44 am

dirk.bogaerts wrote:
Sun Dec 05, 2021 3:39 pm
  • SDA ‘Show Lock’ analysis on Node1:
    • 114K entries in total
    • 105K from ‘Resource name’ CACHE$cmDISK4
Dirk,

the XFC resource/lock names are constructed as 'CACHE$cm' 'label-of-volume' 'File-ID' for each file.
To find the file name associated with an XFC cache lock, use:

$ DUMP/ID=%X'File-ID'/HEADER/BLOCK=COUNT=0 DISK$<label-of-volume>

Example:

Resource: 6D632445 48434143 CACHE$cm Status: FLOCK VALBLKR VALBLKW
Length 24 32385641 48504C41 ALPHAV82
Kernel mode 000021D3 20535953 SYS Ó!..

$ DUMP/ID=%X21d3/HEADER/BLOCK=COUNT=0 DISK$ALPHAV82SYS
Dump of file _DSA0:[VMS$COMMON.SYSEXE]TCPIP$TRACEROUT on 8-DEC-2021 11:42:41.36
E.EXE;1
...
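
The manual decoding in the example can also be scripted. The following Python sketch rebuilds the resource name from the hex longwords of the SDA display and prints the matching DUMP command. The field widths (8-byte prefix, 12-byte space-padded label, 4-byte file ID) are read off this single example with Length 24 and are an assumption, as is the function name `decode_xfc_resource`.

```python
import struct


def decode_xfc_resource(rows):
    """Decode an SDA resource-name dump into (prefix, volume label, file ID).

    rows: the hex longwords of each dump line, as printed by SDA.
    """
    raw = b""
    for row in rows:
        # SDA prints the lowest-addressed longword on the right, so walk
        # each row right to left and store every longword little-endian.
        for word in reversed(row.split()):
            raw += struct.pack("<I", int(word, 16))
    prefix = raw[:8].decode("ascii")            # 'CACHE$cm'
    label = raw[8:20].decode("ascii").rstrip()  # space-padded volume label
    (file_id,) = struct.unpack("<I", raw[20:24])
    return prefix, label, file_id


prefix, label, file_id = decode_xfc_resource(
    ["6D632445 48434143", "32385641 48504C41", "000021D3 20535953"]
)
command = f"$ DUMP/ID=%X{file_id:X}/HEADER/BLOCK=COUNT=0 DISK${label}"
print(prefix, label, hex(file_id))
print(command)
```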

Volker.


Post by dirk.bogaerts » Thu Dec 09, 2021 1:41 pm

Dear Volker, thank you for the additional information.

In the meantime I've tested (on the development platform)
$ set volume/cache=clear diskname
which massively cleaned up the Lock Memory (and of course XFC), with Lock Mem. now 90% empty instead of 90% full.

I've checked a few CACHE$cm Lock table entries in detail (on the rather busy Production system), and what continues to surprise me are entries with a 'Last access date' that is 4 - 5 days old. I would have expected the Lock Table to clean itself up instead of expanding, and to contain only (recent) active locks. But that expectation was due to my lack of 'internals' knowledge. :)
With the help of VSI Support, I also created an overview of entries with a high lock count, but there were only a handful of those.

Anyway, case closed. Thanks everybody for the help!
