OpenVMS x86-64 V9.2-1: Significant TOY Slippage over time
-
Topic author - Contributor
- Posts: 13
- Joined: Tue Oct 26, 2021 8:19 am
- Reputation: 0
- Location: Flushing, New York, USA
- Status: Offline
- Contact:
OpenVMS x86-64 V9.2-1: Significant TOY Slippage over time
I have observed multiple occurrences of accumulating TOY skew running OpenVMS x86-64 under VirtualBox. The underlying Windows host is not particularly busy. In the most recent episode, the time skew is over 1 hour 50 minutes(!) in approximately 24 hours. The skew appears not to be uniform, but episodic.
The hardware/software configuration is:
Host CPU: Dell Latitude E6420 (i5-2520M@2.5GHz; Turbo to 3.26GHz; Typical usage: 30% @ .90GHz); 16GB; 2TB Seagate Barracuda
Host OS: Windows 10 Professional, all patches to date
Virtual Machine: Oracle VirtualBox 7.0.6 r155176 (Qt5.15.2)
Basic OpenVMS x86-64 install; only VSI-supplied basic products (OVMS Community License; TCPIP; DECnet IV)
Examples:
Host Time (EDT)   VMS Time (UTC)
Bootstrap 1:
09:38             1338Z
15:25             1857Z
16:26             1958Z
16:47             2019Z
18:46             2218Z
22:49             0219Z
Bootstrap 2:
10:25             1425Z
10:31             1431Z
23:22             0309Z
09:23             1211Z
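The skew implied by each pair of readings above can be worked out directly. The following is a minimal sketch, assuming the host clock is on EDT (UTC-4) and that both readings in a pair were taken at the same instant; the function and variable names are illustrative only.

```python
# Assumption: host readings are EDT (UTC-4); VMS readings are UTC.
EDT_OFFSET_MIN = 4 * 60
DAY_MIN = 24 * 60


def to_minutes(hhmm):
    """Convert 'HH:MM' or 'HHMM' (optional trailing 'Z') to minutes past midnight."""
    digits = hhmm.rstrip("Z").replace(":", "")
    return int(digits[:2]) * 60 + int(digits[2:])


def skew_minutes(host_edt, vms_utc):
    """Minutes by which the VMS clock trails the host (positive = VMS behind)."""
    host_utc = (to_minutes(host_edt) + EDT_OFFSET_MIN) % DAY_MIN
    diff = (host_utc - to_minutes(vms_utc)) % DAY_MIN
    # Fold into (-12h, +12h] so a small lead is not reported as a ~24h lag.
    return diff - DAY_MIN if diff > DAY_MIN // 2 else diff


SAMPLES = {
    "Bootstrap 1": [("09:38", "1338Z"), ("15:25", "1857Z"), ("16:26", "1958Z"),
                    ("16:47", "2019Z"), ("18:46", "2218Z"), ("22:49", "0219Z")],
    "Bootstrap 2": [("10:25", "1425Z"), ("10:31", "1431Z"),
                    ("23:22", "0309Z"), ("09:23", "1211Z")],
}

for boot, pairs in SAMPLES.items():
    print(boot, [skew_minutes(host, vms) for host, vms in pairs])
```

Under those assumptions, the VMS clock trails the host by 0, 28, 28, 28, 28, and 30 minutes for Bootstrap 1, and by 0, 0, 13, and 72 minutes for Bootstrap 2, which is consistent with the skew being episodic rather than uniform.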
This is a serious flaw for multiple reasons:
- The slippage is not at a constant ratio. Processes that sequence events based on time cannot be run reliably if the clock reference varies unpredictably. At best it triggers timeouts and correctable errors. At worst, it can compromise the safety of equipment operation.
- Time comparisons between OpenVMS-recorded events and other logs are not reliable.
- Unreliable time stamps compromise the ability to use OpenVMS log time stamps in legal proceedings. Having testified in court in cases involving software-generated logs, I can attest that unreliable Time of Year recording significantly undermines the reliability of such logs.
- SET TIME from DCL has no effect.
It should be noted that guest Ubuntu LTS 20 virtual instances do not encounter corresponding difficulties.
- Bob Gezelter, http://www.rlgsc.com <gezelter@rlgsc.com>
Last edited by bobgezelter on Fri Jun 23, 2023 12:24 pm, edited 1 time in total.
- Bob Gezelter, http://www.rlgsc.com
Re: OpenVMS x86-64 V9.2-1: Significant TOY Slippage over time
Is this potentially of any assistance?
https://docs.oracle.com/en/virtualizati ... imers.html
bobgezelter wrote: ↑Fri Jun 23, 2023 11:04 am
It should be noted that guest Ubuntu LTS 20 virtual instances do not encounter corresponding difficulties.
Are the Ubuntu LTS 20 virtual instances running the VirtualBox Guest Additions for Linux?
Last edited by mjvms27 on Mon Jun 26, 2023 10:13 am, edited 1 time in total.
Re: OpenVMS x86-64 V9.2-1: Significant TOY Slippage over time
Clair,
I have some supplemental data.
I exported and restored the VM to another machine here. The second machine is a GEEKOM IT-11 (i7-1195G7@2.9GHz; 4 cores/8 threads; 16GB; 512GB NVMe SSD). It has now been running for nearly two days with minimal slippage. The OpenVMS VM instance in both cases is limited to 2 processors.
The original Latitude has been rebooted a few times. However, it maintained minimal slippage for four hours or so. When I rebooted, I neglected to restart Firefox (about 10 active windows). Restarting Firefox took an extended time (there was a 600GB backup running). When I checked the time a short while later, the OpenVMS instance was approximately 15 minutes behind the Windows 10 host TOY. I have the handwritten logs of time deltas if they would be helpful. Presently, the delta is 56 minutes.
If there is something I can run on my OpenVMS instance(s) that would produce useful evidence, please let me know.
- Bob
mjvms27,
Yes, the Ubuntu LTS20 instances have the latest Guest Additions installed.
Since OpenVMS 9.2-1 only operates as a virtualized instance, I would defer on that question to Clair et al.
Added in 9 minutes 54 seconds:
cgrant wrote: ↑Mon Jun 26, 2023 6:39 am
I rarely use VBox these days but I thought I'd take a look at Bob's time issue. Sure enough, I have the same time loss problem. Easily reproduced. V9.2-1, VBox 7.0.6 r155176, Windows 11, Lenovo ThinkBook.
What's different between John and Bob/Clair? Don't know.
I will enter an official problem report and we will get on it.
BTW: I have KVM on a DL380 and ESXi on a DL580 guests that have been running for days with no time issue.
Clair
mjvms27 wrote: ↑Mon Jun 26, 2023 9:55 am
Is this potentially of any assistance?
https://docs.oracle.com/en/virtualizati ... imers.html
bobgezelter wrote: ↑Fri Jun 23, 2023 11:04 am
It should be noted that guest Ubuntu LTS 20 virtual instances do not encounter corresponding difficulties.
Are the Ubuntu LTS 20 virtual instances running the VirtualBox Guest Additions for Linux?
- Bob Gezelter, http://www.rlgsc.com
-
- Master
- Posts: 391
- Joined: Fri Apr 17, 2020 7:31 pm
- Reputation: 0
- Location: Rhode Island, USA
- Status: Offline
- Contact:
Re: OpenVMS x86-64 V9.2-1: Significant TOY Slippage over time
I am not a "close to HW and OS" person (any longer).
But I got an idea.
Could VMS falling behind time-wise be due to the VM running on laptops with power saving enabled, with the problem arising when the laptop, with not much going on, reduces the CPU clock speed to save battery power?
-
- VSI Expert
- Contributor
- Posts: 18
- Joined: Mon Aug 09, 2021 9:01 am
- Reputation: 0
- Status: Offline
Re: OpenVMS x86-64 V9.2-1: Significant TOY Slippage over time
We agree. Laptop power management is something we are looking into.
Clair
Re: OpenVMS x86-64 V9.2-1: Significant TOY Slippage over time
I believe it is possible to disable that feature, so it should be possible to test.
If it gets confirmed then:
* easy short term workaround by telling everybody to disable that in release notes
* potentially tricky long term solution (I say potentially tricky, because I think this is a fundamental VMS design going back to VAX)
If the hypothesis gets falsified, then back to square one.
Re: OpenVMS x86-64 V9.2-1: Significant TOY Slippage over time
Arne,
Interesting thought. The power management issue goes far beyond laptops. Heat is the major pollutant (byproduct) produced by server farms. Power consumption by servers, together with the "hotel" load for cooling, is a, if not the, major expense.
If anything, the mobile-style power management features are likely to be adopted by data centers. Clock frequency reduction is a cheap kill. Idling cores also saves power. Reshuffling instances onto fewer servers is likely better, but is a far more costly process.
In any event, a quick check of my non-laptop GEEKOM IT-11 (i7-1195G7@2.9GHz; 4 cores/8 threads) shows the clock frequency varying, with the median somewhere in the 1.36GHz range. With http://www.cnn.com up in Firefox, clock variation is more frequent, with excursions as high as 3.65GHz.
The OpenVMS instance on the IT-11 does not exhibit the TOY slippage. I will see what happens in a few hours.
- Bob
arne_v wrote: ↑Mon Jun 26, 2023 7:38 pm
I believe it is possible to disable that feature, so it should be possible to test.
If it get confirmed then:
* easy short term workaround by telling everybody to disable that in release notes
* potentially tricky long term solution (I say potentially tricky, because I think this is a fundamental VMS design going back to VAX)
If the hypothesis get falsified, then back to square one.
- Bob Gezelter, http://www.rlgsc.com
Re: OpenVMS x86-64 V9.2-1: Significant TOY Slippage over time
Has anyone ever seen this time loss on anything other than a laptop? It will make a difference in how we prioritize this.
One of the things we have always known is that some hypervisors provide an interface for the host and guest to communicate. Our thought has been that time management is an area where this could be important. These interfaces have just not made it to the top of the priority list.
Re: OpenVMS x86-64 V9.2-1: Significant TOY Slippage over time
I concur with Arne in part and disagree in part.
I suspect that there are at least two separate issues here.
The first is when the host hibernates/sleeps and/or the virtual instance is suspended. In that case, the effective clock rate is instantaneously zero. When the clock is restarted, one has to ensure that the virtual instance re-establishes the reference to the external TOY and executes all timed events. The closest past analog to this in the RSX-11M/M-PLUS/VMS clade is the infrequently exercised power failure code. Ideally, one processes the timer queue in time order and updates the OVMS TOY clock from the hardware TOY clock (in the case of a virtual instance, the host clock). I reported this issue at the beginning of the 9.0 field test.
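The resume sequence described above can be sketched as follows. This is a hypothetical model, not OpenVMS code: the timer queue is a simple heap of (due time, name) pairs, and `reference_now` stands in for whatever value the host-provided TOY clock would return after the resume.

```python
import heapq


def resume_from_suspend(timer_queue, reference_now):
    """Re-base the guest clock on the reference time and fire overdue timers.

    timer_queue   -- heap of (due_time, name) tuples (hypothetical structure)
    reference_now -- time read from the external TOY source after resume
    Returns (new_guest_time, names_fired_in_due_time_order).
    """
    fired = []
    # Pop every entry that came due while the instance was stopped,
    # earliest first, so dependent events still run in sequence.
    while timer_queue and timer_queue[0][0] <= reference_now:
        _due, name = heapq.heappop(timer_queue)
        fired.append(name)
    return reference_now, fired


queue = []
for entry in [(130, "flush"), (105, "poll"), (900, "backup"), (110, "retry")]:
    heapq.heappush(queue, entry)

# Guest stopped at t=100; the reference clock now reads t=500.
guest_time, fired = resume_from_suspend(queue, 500)
print(guest_time, fired, queue)
```

In this example the three entries that came due during the gap fire in due-time order, and the one still in the future stays queued.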
Changing clock frequency is, I suspect, almost an entirely different issue. I do not have my IDSM close at hand, but my recollection is that there is already code to deal with incompletely processed clock ticks. These can occur when interrupt routines run too long. With the range of CPU speeds on x86, at the slowest processing rate, e.g., with ALL processes idle, today's x86-64 processors are still running faster than most VAX processors. My Dell Latitude E6420 has a "throttled down" clock frequency of 0.9GHz. I can imagine that high interrupt processing loads could conceivably overwhelm the CPU for a few ticks. If there is a presumption on the maximum number of ticks that can be queued, that could be a problem.
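To illustrate how a cap on queued ticks could translate into lost time, here is a toy model. It is a sketch under stated assumptions, not a description of the actual OpenVMS timer code: `max_pending` is a hypothetical limit on how many missed ticks are replayed after a long stall, and the 10 ms tick length is chosen only for the example.

```python
TICK_MS = 10  # illustrative tick length, not an OpenVMS value


def count_ticks(elapsed_per_service, max_pending):
    """Compare real elapsed ticks with ticks the guest actually counts.

    elapsed_per_service -- tick periods that passed before each servicing
                           of the clock interrupt (1 = serviced on time)
    max_pending         -- hypothetical cap on ticks credited per servicing
    Returns (true_ticks, counted_ticks).
    """
    true_ticks = 0
    counted_ticks = 0
    for elapsed in elapsed_per_service:
        true_ticks += elapsed
        # Ticks beyond the cap are dropped and never recovered.
        counted_ticks += min(elapsed, max_pending)
    return true_ticks, counted_ticks


# 1000 on-time ticks, one stall spanning 50 tick periods, 1000 more on time.
history = [1] * 1000 + [50] + [1] * 1000
true_ticks, counted_ticks = count_ticks(history, max_pending=4)
lost_ms = (true_ticks - counted_ticks) * TICK_MS
print(true_ticks, counted_ticks, lost_ms)
```

In this model each stall longer than the cap permanently drops the excess ticks, so the guest clock falls behind in steps rather than at a steady rate.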
- Bob Gezelter, http://www.rlgsc.com
Re: OpenVMS x86-64 V9.2-1: Significant TOY Slippage over time
There likely isn't anything that would prevent an OpenVMS VM on VirtualBox from doing what the existing VirtualBox Guest Additions for the VirtualBox-supported platforms do to communicate with the host, but it would likely involve a VM-side device driver (or equivalent), with OpenVMS code written and tested to interact with it.
For the curious...
Here is an article on how the Guest Additions work and how the VM and host communicate:
https://wiki.osdev.org/VirtualBox_Guest_Additions
Here is a link to a browse location for the VirtualBox Guest Additions code related to synchronizing time:
https://www.virtualbox.org/browser/vbox ... meSync.cpp
The top folder of the Guest Additions Code can be browsed here:
https://www.virtualbox.org/browser/vbox ... Additions/
The VirtualBox source code can be browsed here:
https://www.virtualbox.org/browser/vbox/trunk
Re: OpenVMS x86-64 V9.2-1: Significant TOY Slippage over time
mjvms27 wrote: ↑Wed Jun 28, 2023 4:45 pm
Here is a link to a browse location for the VirtualBox Guest Additions code related to synchronizing time:
https://www.virtualbox.org/browser/vbox ... meSync.cpp
The crux of that service thread is calls to a function named VbglR3GetHostTime, which probably goes down at least another 2 or 3 levels before you actually get to the real communication with the host.
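For orientation, the general shape of one host-to-guest time synchronization step can be sketched as below. This is not the VirtualBox implementation: the function name, the parameter names, and the threshold values are all hypothetical, and the host time is passed in as a plain number where a real guest-side service would obtain it from the hypervisor (in VirtualBox's case, through something like VbglR3GetHostTime).

```python
def plan_correction(host_ms, guest_ms,
                    min_adjust_ms=100,                  # hypothetical: ignore smaller drift
                    step_threshold_ms=20 * 60 * 1000):  # hypothetical: step beyond this
    """Decide how to correct the guest clock for one sync pass.

    Returns (action, drift_ms) where action is 'none', 'slew' (adjust
    gradually), or 'step' (set the clock outright). A positive drift
    means the guest is behind the host.
    """
    drift_ms = host_ms - guest_ms
    if abs(drift_ms) < min_adjust_ms:
        return ("none", 0)
    if abs(drift_ms) >= step_threshold_ms:
        return ("step", drift_ms)
    return ("slew", drift_ms)


MIN = 60 * 1000
print(plan_correction(1_000_000, 999_950))       # 50 ms drift
print(plan_correction(100 * MIN, 85 * MIN))      # guest 15 minutes behind
print(plan_correction(100 * MIN, 28 * MIN))      # guest 72 minutes behind
```

With these illustrative thresholds, the 15-minute lag reported earlier in the thread would be slewed and a 72-minute lag would be stepped; a real service would choose its own limits.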