OpenVMS x86-64 V9.2-1: Significant TOY Slippage over time


Topic author
bobgezelter
Contributor
Posts: 13
Joined: Tue Oct 26, 2021 8:19 am
Reputation: 0
Location: Flushing, New York, USA
Status: Offline
Contact:

OpenVMS x86-64 V9.2-1: Significant TOY Slippage over time

Post by bobgezelter » Fri Jun 23, 2023 11:04 am

I have observed multiple occurrences of accumulating TOY skew while running OpenVMS x86-64 under VirtualBox. The underlying Windows host is not particularly busy. In the most recent episode, the time skew was over 1 hour 50 minutes(!) in approximately 24 hours. The skew appears to be episodic rather than uniform.

The hardware/software configuration is:

Host CPU: Dell Latitude E6420 (i5-2520M@2.5GHz; Turbo to 3.26GHz; Typical usage: 30% @ .90GHz); 16GB; 2TB Seagate Barracuda

Host OS: Windows 10 Professional, all patches to date
Virtual Machine: Oracle VirtualBox 7.0.6 r155176 (Qt5.15.2)
Basic OpenVMS x86-64 install; only VSI-supplied basic products (OVMS Community License; TCPIP; DECnet IV)

Examples:

Host Time (EDT)    VMS Time (UTC)

Bootstrap 1:
09:38              1338Z
15:25              1857Z
16:26              1958Z
16:47              2019Z
18:46              2218Z
22:49              0219Z

Bootstrap 2:
10:25              1425Z
10:31              1431Z
23:22              0309Z
09:23              1211Z
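
To make the size of the slippage in these samples easier to see, here is a minimal sketch that converts each host reading to UTC and subtracts the VMS reading. It assumes EDT is UTC-4, that the VMS clock trails the host by less than 24 hours, and that the readings carry no date (so the midnight rollover is folded away). The function and variable names are illustrative only.

```python
from datetime import datetime, timedelta

EDT_OFFSET = timedelta(hours=-4)  # assumption: EDT is UTC-4

# (host time in EDT, VMS time in UTC) pairs copied from the table above
BOOTSTRAP_1 = [("09:38", "1338"), ("15:25", "1857"), ("16:26", "1958"),
               ("16:47", "2019"), ("18:46", "2218"), ("22:49", "0219")]
BOOTSTRAP_2 = [("10:25", "1425"), ("10:31", "1431"), ("23:22", "0309"),
               ("09:23", "1211")]


def lag_minutes(host_edt, vms_utc):
    """Minutes by which the VMS clock trails the host, ignoring the date."""
    host_as_utc = datetime.strptime(host_edt, "%H:%M") - EDT_OFFSET
    vms = datetime.strptime(vms_utc, "%H%M")
    minutes = int((host_as_utc - vms).total_seconds() // 60)
    return minutes % (24 * 60)  # fold away the midnight rollover


for label, samples in (("Bootstrap 1", BOOTSTRAP_1), ("Bootstrap 2", BOOTSTRAP_2)):
    print(label, [lag_minutes(h, v) for h, v in samples])
```

For these samples the lag is 0, 28, 28, 28, 28, 30 minutes after the first bootstrap and 0, 0, 13, 72 minutes after the second: the clock holds steady for hours and then loses time in bursts.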

This is a serious flaw for multiple reasons:
- The slippage is not at a constant rate. Processes that sequence events based on time cannot be run reliably if the clock reference varies unpredictably. At best this triggers timeouts and correctable errors. At worst, it can compromise the safety of equipment operation.

- Time comparisons between OpenVMS-recorded events and other logs are not reliable.

- Unreliable time stamps compromise the ability to use OpenVMS log time stamps in legal proceedings. Having testified in court in matters involving software-generated logs, I can say that unreliable Time of Year recording significantly undermines the reliability of such logs.

- SET TIME from DCL has no effect :cry:

It should be noted that guest Ubuntu LTS 20 virtual instances do not encounter corresponding difficulties.

- Bob Gezelter, http://www.rlgsc.com <gezelter@rlgsc.com>
Last edited by bobgezelter on Fri Jun 23, 2023 12:24 pm, edited 1 time in total.


mjvms27
Contributor
Posts: 23
Joined: Wed May 17, 2023 2:11 pm
Reputation: 0
Status: Offline

Re: OpenVMS x86-64 V9.2-1: Significant TOY Slippage over time

Post by mjvms27 » Mon Jun 26, 2023 9:55 am

Is this potentially of any assistance?
https://docs.oracle.com/en/virtualizati ... imers.html
bobgezelter wrote:
Fri Jun 23, 2023 11:04 am
It should be noted that guest Ubuntu LTS 20 virtual instances do not encounter corresponding difficulties.
Are the Ubuntu LTS 20 virtual instances running the VirtualBox Guest Additions for Linux?
Last edited by mjvms27 on Mon Jun 26, 2023 10:13 am, edited 1 time in total.



Re: OpenVMS x86-64 V9.2-1: Significant TOY Slippage over time

Post by bobgezelter » Mon Jun 26, 2023 12:23 pm

Clair,

I have some supplemental data.

I exported and restored the VM to another machine here. The second machine is a GEEKOM IT-11 (i7-1195G7@2.9GHz; 4 cores/8 threads; 16GB; 512GB NVMe SSD). It has now been running for nearly two days with minimal slippage. The OpenVMS VM instance in both cases is limited to 2 processors.

The original Latitude has been rebooted a few times. However, it maintained minimal slippage for the first four hours or so. When I rebooted, I had neglected to restart Firefox (about 10 active windows). Restarting Firefox took an extended time (there was a 600GB backup running). When I checked the time a short while later, the OpenVMS instance was approximately 15 minutes behind the Windows 10 host TOY. I have handwritten logs of the time deltas if they would be helpful. Presently, the delta is 56 minutes.

If there is something I can run on my OpenVMS instance(s) that would produce useful evidence, please let me know.

- Bob
cgrant wrote:
Mon Jun 26, 2023 6:39 am
I rarely use VBox these days but I thought I'd take a look at Bob's time issue. Sure enough, I have the same time loss problem. Easily reproduced. V9.2-1, VBox 7.0.6 r155176, Windows 11, Lenovo ThinkBook.

What's different between John and Bob/Clair? Don't know.

I will enter an official problem report and we will get on it.

BTW: I have KVM on a DL380 and ESXi on a DL580 guests that have been running for days with no time issue.

Clair
Added in 9 minutes 54 seconds:
mjvms27,

Yes, the Ubuntu LTS20 instances have the latest Guest Additions installed.

Since OpenVMS 9.2-1 only operates as a virtualized instance, I would defer on that question to Clair et al.


- Bob Gezelter, http://www.rlgsc.com


arne_v
Master
Posts: 347
Joined: Fri Apr 17, 2020 7:31 pm
Reputation: 0
Location: Rhode Island, USA
Status: Offline
Contact:

Re: OpenVMS x86-64 V9.2-1: Significant TOY Slippage over time

Post by arne_v » Mon Jun 26, 2023 4:48 pm

I am not a "close to HW and OS" person (any longer).

But I got an idea.

Could VMS be falling behind because the VM is running on laptops with power saving enabled, with the problem arising when the laptop, having little to do, reduces the CPU clock speed to save battery power?
Arne
arne@vajhoej.dk
VMS user since 1986


cgrant
VSI Expert
Contributor
Posts: 18
Joined: Mon Aug 09, 2021 9:01 am
Reputation: 0
Status: Offline

Re: OpenVMS x86-64 V9.2-1: Significant TOY Slippage over time

Post by cgrant » Mon Jun 26, 2023 5:38 pm

We agree. Laptop power management is something we are looking into.

Clair


Re: OpenVMS x86-64 V9.2-1: Significant TOY Slippage over time

Post by arne_v » Mon Jun 26, 2023 7:38 pm

I believe it is possible to disable that feature, so it should be possible to test.

If it gets confirmed then:
* easy short term workaround by telling everybody to disable that in release notes
* potentially tricky long term solution (I say potentially tricky, because I think this is a fundamental VMS design going back to VAX)

If the hypothesis is falsified, then back to square one.
Arne
arne@vajhoej.dk
VMS user since 1986



Re: OpenVMS x86-64 V9.2-1: Significant TOY Slippage over time

Post by bobgezelter » Tue Jun 27, 2023 5:02 am

Arne,

Interesting thought. The power management issue goes far beyond laptops. Heat is the major pollutant (byproduct) produced by server farms. Power consumption by servers, together with the "hotel" load for cooling, is a, if not the, major expense.

If anything, the mobile-style power management features are likely to be adopted by data centers. Clock frequency reduction is a cheap kill. Idling cores also saves power. Reshuffling instances onto fewer servers is likely better, but is a far more costly process.

In any event, a quick check of my non-laptop GEEKOM IT-11 (i7-1195G7@2.9GHz; 4 cores/8 threads) shows the clock frequency varying, with the median somewhere in the 1.36GHz range. With http://www.cnn.com up in Firefox, clock variation is more frequent, with excursions as high as 3.65GHz.

The OpenVMS instance on the IT-11 does not exhibit the TOY slippage. I will see what happens in a few hours.

- Bob



- Bob Gezelter, http://www.rlgsc.com



Re: OpenVMS x86-64 V9.2-1: Significant TOY Slippage over time

Post by cgrant » Tue Jun 27, 2023 6:16 am

Has anyone ever seen this time loss on anything other than a laptop? It will make a difference in how we prioritize this.

One of the things we have always known is that some hypervisors provide an interface for the host and guest to communicate. Our thought has been that time management is an area where this could be important. These interfaces have just not made it to the top of the priority list.



Re: OpenVMS x86-64 V9.2-1: Significant TOY Slippage over time

Post by bobgezelter » Tue Jun 27, 2023 2:50 pm

I concur with Arne in part and disagree in part.

I suspect that there are at least two separate issues here.

The first is when the host hibernates/sleeps and/or the virtual instance is suspended. In that case, the effective clock rate is instantaneously zero. When the clock is restarted, one has to ensure that the virtual instance re-establishes the reference to the external TOY and executes all timed events. The closest past analog to this in the RSX-11M/M-PLUS/VMS clade is probably the infrequently exercised power failure code. Ideally, one processes the timer queue in time order and updates the OVMS TOY clock from the hardware TOY clock (in the case of a virtual instance, the host clock). I reported this issue at the beginning of the 9.0 field test.
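
As a rough illustration of that catch-up step (this is not OpenVMS code; the function name, the tuple layout, and the use of plain integers for time are all assumptions made for the sketch), a resume handler could adopt the host time and then fire every overdue timer in due-time order:

```python
import heapq


def resume_after_suspend(timer_queue, host_now):
    """Adopt the host clock and fire overdue timers in due-time order.

    timer_queue: a heap of (due_time, name) tuples (hypothetical layout).
    host_now:    the reference time read from the host after resume.
    Returns (new_guest_time, names_of_fired_timers).
    """
    fired = []
    # The heap always exposes the earliest due time first, so overdue
    # entries come out in time order.
    while timer_queue and timer_queue[0][0] <= host_now:
        _due, name = heapq.heappop(timer_queue)
        fired.append(name)
    return host_now, fired


# Hypothetical example: the guest was suspended at t=100 and resumed at t=200.
queue = [(105, "a"), (300, "c"), (120, "b")]
heapq.heapify(queue)
guest_time, fired = resume_after_suspend(queue, host_now=200)
print(guest_time, fired, queue)
```

In this example timers "a" and "b" fire in that order, the guest clock jumps to 200, and timer "c" (due at 300) stays queued.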

Changing clock frequency is, I suspect, an almost entirely different issue. I do not have my IDSM close at hand, but my recollection is that there is already code to deal with incompletely processed clock ticks. These can occur when one or more interrupt routines run too long. Even at the slowest processing rate across the range of x86 CPU speeds, e.g., with ALL processes idle, today's x86-64 processors still run faster than most VAX processors. My Dell Latitude E6420 has a "throttled down" clock frequency of 0.9GHz. I can imagine that high interrupt processing loads could overwhelm the CPU for a few ticks. If there is a presumption about the maximum number of ticks that can be queued, that could be a problem.
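
A toy model of that last concern, purely as a sketch: the 10 ms tick period, the `max_pending` cap, and the function name are assumptions for illustration and are not taken from the OpenVMS sources. If a clock advances only by counting ticks, and at most `max_pending` unserviced ticks are remembered, then any stall longer than the cap silently loses time:

```python
def simulate_tick_clock(service_delays, tick_ms=10, max_pending=1):
    """Return (true_elapsed_ms, guest_elapsed_ms) for a tick-counting clock.

    service_delays: for each time the tick handler runs, how many tick
                    periods elapsed since it last ran (1 = on time).
    max_pending:    hypothetical cap on how many unserviced ticks are
                    remembered; any beyond the cap are dropped.
    """
    true_ms = 0
    guest_ms = 0
    for periods in service_delays:
        true_ms += periods * tick_ms
        credited = min(periods, max_pending)  # excess ticks are lost
        guest_ms += credited * tick_ms
    return true_ms, guest_ms


# 95 ticks serviced on time, then 5 stalls of 6 tick periods each.
delays = [1] * 95 + [6] * 5
print(simulate_tick_clock(delays, max_pending=1))
print(simulate_tick_clock(delays, max_pending=8))
```

With a cap of 1 the guest clock credits 1000 ms against 1250 ms of real time; with a cap of 8 nothing is lost. Loss that happens only during stalls would look episodic rather than uniform, which is consistent with the observations reported earlier, though it does not establish the cause.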
- Bob Gezelter, http://www.rlgsc.com



Re: OpenVMS x86-64 V9.2-1: Significant TOY Slippage over time

Post by mjvms27 » Wed Jun 28, 2023 4:45 pm

There likely isn't anything that would prevent an OpenVMS VM on VirtualBox from doing what the existing VirtualBox Guest Additions for the VirtualBox-supported platforms do to communicate with the host, but it would likely involve a VM-side device driver (or equivalent), with the OpenVMS VM code written and tested to interact with it.

For the curious...

Here is an article on how the Guest Additions work and how the VM and host communicate:
https://wiki.osdev.org/VirtualBox_Guest_Additions

Here is a link to a browse location for the VirtualBox Guest Additions code related to synchronizing time:
https://www.virtualbox.org/browser/vbox ... meSync.cpp

The top folder of the Guest Additions Code can be browsed here:
https://www.virtualbox.org/browser/vbox ... Additions/

The VirtualBox source code can be browsed here:
https://www.virtualbox.org/browser/vbox/trunk


jonesd
Valued Contributor
Posts: 78
Joined: Mon Aug 09, 2021 7:59 pm
Reputation: 0
Status: Offline

Re: OpenVMS x86-64 V9.2-1: Significant TOY Slippage over time

Post by jonesd » Wed Jun 28, 2023 5:26 pm

mjvms27 wrote:
Wed Jun 28, 2023 4:45 pm
Here is a link to a browse location for the VirtualBox Guest Additions code related to synchronizing time:
https://www.virtualbox.org/browser/vbox ... meSync.cpp
The crux of that service thread is calls to a function named VbglR3GetHostTime, which probably goes down at least another 2 or 3 levels before you actually get to the real communication with the host.
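
For readers wondering what a guest might do with a host timestamp once it has one, here is a generic sketch of the decision step. It is not the VirtualBox implementation: the function name and both thresholds are hypothetical values chosen for illustration, and the host time is simply passed in as a number rather than fetched through any real Guest Additions call.

```python
def plan_adjustment(host_ms, guest_ms,
                    min_adjust_ms=100,                 # hypothetical threshold
                    step_threshold_ms=20 * 60 * 1000):  # hypothetical threshold
    """Decide how a guest could react to a measured host/guest time difference.

    Returns (action, drift_ms), where action is one of:
      "none" - drift is too small to bother correcting
      "slew" - correct gradually so time never jumps
      "step" - drift is so large that the clock is set outright
    """
    drift_ms = host_ms - guest_ms
    if abs(drift_ms) < min_adjust_ms:
        return "none", drift_ms
    if abs(drift_ms) >= step_threshold_ms:
        return "step", drift_ms
    return "slew", drift_ms


# A guest that is 56 minutes behind its host, as reported earlier in the thread.
print(plan_adjustment(host_ms=56 * 60 * 1000, guest_ms=0))
```

The split between slewing and stepping matters for the concerns raised at the top of the thread: a step fixes the clock at once but makes time jump, which timer-driven processes and log comparisons have to tolerate.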
