Static link time on X86

OpenVMS x86 Field Test questions, reports, and feedback.

Post by mgdaniel » Wed Jan 25, 2023 5:35 pm

Why does static linking an image take several tens (even hundreds) of times longer on X86 than on Itanium or Alpha?

To illustrate: the DCL procedure listed below performs a static link of the WASD server against its own object files and the VSI SSL111 V1.1-1S SSL111$LIBSSL32.OLB and SSL111$LIBCRYPTO32.OLB.

Code: Select all

$ sec1 = f$cvtime(,,"secondofyear")
$ cpu1 = f$getjpi(0,"cputim")
$ @build_httpd /out=nl: ssl111 link
$ sec2 = f$cvtime(,,"secondofyear")
$ cpu2 = f$getjpi(0,"cputim")
$ write sys$output "elapsed:", sec2 - sec1, " cpu:", cpu2 - cpu1

The procedure was run three times (to fill the file cache). The (perhaps infamous) VUPS.COM output shows the platform and a measure of expected "performance" across an Itanium, an Alpha, and two independent X86 systems.

Code: Select all

HP rx2660 (1.40GHz/6.0MB) with 4 CPU and 14335MB running VMS V8.4-2L3
INFO: Preventing endless loop (10$) on fast CPUs
Approximate System VUPs Rating : 486.0 ( min: 480.4 max: 490.6 )
elapsed:1 cpu:103

Digital Personal WorkStation with 1 CPU and 1536MB running VMS V8.4-2L1
Approximate System VUPs Rating : 150.4 ( min: 149.6 max: 150.8 )
elapsed:5 cpu:390

innotek GmbH VirtualBox with 2 CPU and 7680MB running VMS V9.2
INFO: Preventing endless loop (10$) on fast CPUs
Approximate System VUPs Rating : 605.5 ( min: 604.6 max: 607.4 )
%ILINK-I-THREADUPCALLS, user thread upcalls automatically enabled
elapsed:131 cpu:13071

innotek GmbH VirtualBox with 2 CPU and 7574MB running VMS V9.2
Approximate System VUPs Rating : 282.2 ( min: 281.4 max: 283.0 )
%ILINK-I-THREADUPCALLS, user thread upcalls automatically enabled
elapsed:913 cpu:91035

Linking with the X86 SSL111 shared images SSL111$LIBSSL_SHR32.EXE and SSL111$LIBCRYPTO_SHR32.EXE takes a small fraction of the time. TIA.


Post by sms » Wed Jan 25, 2023 10:33 pm

Code: Select all

> Why does static linking an image take several tens (even hundreds) of
> times longer on X86 than on Itanium or Alpha?

   Did you determine that "static" made a difference?  How many
different jobs have you tested?

   How similar/different are the process quotas?  "7574MB" sounds good,
but how much of it can this process use?  I/O speed?

   I might run the job in batch, and look at the "Accounting
information" in the log file.  CPU speed is not the only determinant of
the time to complete a task.

Post by volkerhalle » Thu Jan 26, 2023 2:06 am

Mark,

could you add DIRIO and PAGEFLTS to your measurements? What does MONITOR SYSTEM indicate?

If you compare VUPS between your rx2660 (486) and DPWS (150) and the consumed CPU time for the link operation (103 / 390), the results seem comparable: 486/150=3.2 and 390/103=3.8

Not so for the 2 x86-64 systems: VUPS 602/282=2.14 and CPU 91035/13071=6.96
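
The comparison above can be reproduced with a few lines of arithmetic. The following is a minimal Python sketch, not part of the original post. It uses the VUPS ratings and CPU tick counts as reported in the first listing of this thread; the labels `x86_a` and `x86_b` are hypothetical names for the two VirtualBox systems.

```python
# VUPS ratings and link CPU ticks copied from the first listing in the thread.
# The keys are illustrative labels, not names used by the original posters.
systems = {
    "rx2660": {"vups": 486.0, "cpu_ticks": 103},
    "pws":    {"vups": 150.4, "cpu_ticks": 390},
    "x86_a":  {"vups": 605.5, "cpu_ticks": 13071},
    "x86_b":  {"vups": 282.2, "cpu_ticks": 91035},
}


def ratios(fast, slow):
    """Return (VUPS ratio, link CPU-tick ratio) for a faster/slower pair."""
    vups_ratio = systems[fast]["vups"] / systems[slow]["vups"]
    cpu_ratio = systems[slow]["cpu_ticks"] / systems[fast]["cpu_ticks"]
    return vups_ratio, cpu_ratio


for fast, slow in [("rx2660", "pws"), ("x86_a", "x86_b")]:
    vups_ratio, cpu_ratio = ratios(fast, slow)
    print(f"{fast} vs {slow}: VUPS x{vups_ratio:.2f}, link CPU x{cpu_ratio:.2f}")
```

If the link scaled with VUPS, the two ratios in each pair would be close. That holds for the Itanium/Alpha pair but not for the two X86 systems.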

What's the difference between those 2 x86-64 systems? Faster CPU? Different kinds of disks?

I've been using VUPS mainly to compare the CPU capacity of emulated systems. For simple testing of the IO capacity, I've been using:

$ SPAWN/NOWAIT BACKUP/PHY/NOCRC/GROUP=0 disk: NLA0:X:X/SAVE
$ MONITOR DISK

Let it run for a while and note the average DISK-IO of the disk under load.

Then stop MONITOR and STOP the sub-process running BACKUP/PHYSICAL.

I've been using this simple disk-IO test because it can be run on any OpenVMS system without downloading or installing any complicated IO performance test tools.

AFAIK, performance is not yet the main concern for VSI.

Volker.


Post by mgdaniel » Thu Jan 26, 2023 12:22 pm

> AFAIK, performance is not yet the main concern for VSI.

Sure, and I understand this. However, the gap in link time between the rx2660 (elapsed:1 cpu:103) and the Intel NUC 6 core i7 1.10GHz 32GB X86 (elapsed:131 cpu:13071) is so chalk-and-cheese that something else has to be going on.

> If you compare VUPS between your rx2660 (486) and DPWS (150) and the consumed CPU time for the link operation (103 / 390), the results seem comparable: 486/150=3.2 and 390/103=3.8
>
> Not so for the 2 x86-64 systems: VUPS 602/282=2.14 and CPU 91035/13071=6.96

I noticed this myself but had not expressed it as eloquently as above.

Might just reflect an older generation of i7 (Dell) and a difference between 3.4 and 1.10 GHz clock speed (see below).

> What's the difference between those 2 x86-64 systems? Faster CPU? Different kinds of disks?

BXNUC10i7FNH4 6 core i7 1.10GHz 32GB X86
Dell Optiplex 9020 SFF i7-4770 QC 3.4GHz 16GB Windows 10 Pro
Both using SSDs.

The differences between the X86 systems are not of concern. Each just reflects the level of investment in a real development system (NUC on Linux) versus a casual development machine (Dell on Win10). I had access to two very different X86 systems and wanted to demonstrate that they both behave in a similar fashion.

To reiterate: the issue is the seemingly significant difference between X86 and non-X86 platform linking. The rx2660 completes the link in a second, the 20+ year old PWS in 4 seconds, and the state-of-the-art X86 systems in 2+ minutes and 15+ minutes.

The four accounts used to run these links on each system are quite comparable:

Code: Select all

Maxjobs:         0  Fillm:       128  Bytlm:        256000
Maxacctjobs:     0  Shrfillm:      0  Pbytlm:            0
Maxdetach:       0  BIOlm:       150  JTquota:        4096
Prclm:          10  DIOlm:       150  WSdef:          4096
Prio:            4  ASTlm:       300  WSquo:          8192
Queprio:         0  TQElm:       100  WSextent:      16384
CPU:        (none)  Enqlm:      4000  Pgflquo:      700000

I have added the suggested data and the disk performance measurement.

Code: Select all

HP rx2660 (1.40GHz/6.0MB) with 4 CPU and 14335MB running VMS V8.4-2L3
INFO: Preventing endless loop (10$) on fast CPUs
Approximate System VUPs Rating : 487.2 ( min: 487.2 max: 487.2 )
elapsed:1 cpu:101 diocnt:0 pageflts:5655
    CUR        AVE        MIN        MAX
5817.33    6123.96    5817.33    6567.00

Digital Personal WorkStation with 1 CPU and 1536MB running VMS V8.4-2L1
Approximate System VUPs Rating : 151.0 ( min: 151.0 max: 151.0 )
elapsed:4 cpu:387 diocnt:0 pageflts:1497
    CUR        AVE        MIN        MAX
1016.06    1016.34    1014.40    1018.06

! BXNUC10i7FNH4 6 core i7 1.10GHz 32GB
innotek GmbH VirtualBox with 2 CPU and 7680MB running VMS V9.2
INFO: Preventing endless loop (10$) on fast CPUs
Approximate System VUPs Rating : 620.8 ( min: 616.6 max: 625.0 )
%ILINK-I-THREADUPCALLS, user thread upcalls automatically enabled
elapsed:131 cpu:13083 diocnt:0 pageflts:29298
    CUR        AVE        MIN        MAX
9729.00    9098.80    8171.51   10274.18

! Dell Optiplex 9020 SFF i7-4770 QC 3.4Ghz 16GB Windows 10 Pro
innotek GmbH VirtualBox with 2 CPU and 7574MB running VMS V9.2
INFO: Preventing endless loop (10$) on fast CPUs
Approximate System VUPs Rating : 291.0 ( min: 289.4 max: 291.8 )
%ILINK-I-THREADUPCALLS, user thread upcalls automatically enabled
elapsed:912 cpu:90834 diocnt:0 pageflts:30048
    CUR        AVE        MIN        MAX
5965.84    4766.99    4415.99    5965.84

Thanks for your interest, Volker.

Added in 19 minutes 47 seconds:
sms wrote:
Wed Jan 25, 2023 10:33 pm

It takes only a few minutes to learn some basic formatting, Steve. There are buttons.

Code: Select all

> Why does static linking an image take several tens (even hundreds) of
> times longer on X86 than on Itanium or Alpha?

Code: Select all

   Did you determine that "static" made a difference?  How many
different jobs have you tested?

Yes, and a single job.

Code: Select all

   How similar/different are the process quotas?  "7574MB" sounds good,
but how much of it can this process use?  I/O speed?

The account quotas across the systems are comparable. Certainly makes my 1.5GB PWS pale. The additional data are addressed in the response to Volker.

Code: Select all

   I might run the job in batch, and look at the "Accounting
information" in the log file.  CPU speed is not the only determinant of
the time to complete a task.

Sure, it's not. It might account for 2x, 3x, even 10x depending on the platform but not in this context for 100x, 200x, ... On my humble PWS 500 it takes seconds, not minutes.

Thanks for your input.

Post by volkerhalle » Thu Jan 26, 2023 1:55 pm

Mark,

DIRIO=0? Are you sure? Even if all the reads came from cache, the linker has to write the image file...

Have you run MONITOR SYSTEM, at least on the x86-64 systems, where the link is running for a while? What's your visual impression of the MONITOR SYSTEM data: CPU 100%? High Disk-IO? High Pagefault rate? Just describe what you're seeing. It would be hard to document the results in text form, but maybe try MONI SYSTEM/AVERAGE?

The CPU load appears to be 100% in every case: CPUTIM is in 10 ms units and, divided by 100, always matches Elapsed, which is in seconds. So the question is, what does the CPU do? 'Real work' = USER mode or 'overhead' = EXEC/KERNEL?

How about MONITOR MODE? Mostly USER or lots of EXEC and KERNEL mode?
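
The "CPU is at 100%" observation follows directly from the units. As a minimal Python sketch (not part of the original post), the following converts the reported CPUTIM ticks to seconds and divides by the elapsed time. The figures are those reported earlier in the thread; the labels and the helper name `cpu_utilisation` are hypothetical. Because elapsed time is only measured to whole seconds, short runs can come out slightly above or below 100%.

```python
TICK_SECONDS = 0.01  # CPUTIM is reported in 10 ms ticks


def cpu_utilisation(cpu_ticks, elapsed_seconds):
    """Fraction of one CPU kept busy over the elapsed interval."""
    return cpu_ticks * TICK_SECONDS / elapsed_seconds


# (CPU ticks, elapsed seconds) as reported earlier in the thread.
# The labels are illustrative only.
runs = {
    "rx2660": (101, 1),
    "PWS": (387, 4),
    "X86 (NUC)": (13083, 131),
    "X86 (Dell)": (90834, 912),
}

for name, (ticks, elapsed) in runs.items():
    print(f"{name}: {cpu_utilisation(ticks, elapsed):.0%} of one CPU")
```

A result near 100% says the link is CPU-bound on a single CPU; it does not say which mode (USER, EXEC, KERNEL) the time is spent in, which is why MONITOR MODE is the next thing to look at.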

Volker.


Post by mgdaniel » Thu Jan 26, 2023 5:15 pm

> DIRIO=0? Are you sure? Even if all the reads came from cache, the linker has to write the image file...

Alarms should have gone off with that obviously bogus result. Here are the correct data:

Code: Select all

HP rx2660 (1.40GHz/6.0MB) with 4 CPU and 14335MB running VMS V8.4-2L3
INFO: Preventing endless loop (10$) on fast CPUs
Approximate System VUPs Rating : 486.1 ( min: 483.8 max: 487.2 )
elapsed:1 cpu:103 dirio:4209 pageflts:5658

Digital Personal WorkStation with 1 CPU and 1536MB running VMS V8.4-2L1
Approximate System VUPs Rating : 136.2 ( min: 136.2 max: 136.2 )
elapsed:4 cpu:396 dirio:7176 pageflts:1453

! BXNUC10i7FNH4 6 core i7 1.10GHz 32GB
innotek GmbH VirtualBox with 2 CPU and 7680MB running VMS V9.2
Approximate System VUPs Rating : 623.3 ( min: 620.8 max: 625.0 )
%ILINK-I-THREADUPCALLS, user thread upcalls automatically enabled
elapsed:132 cpu:13093 dirio:4841 pageflts:29219

! Dell Optiplex 9020 SFF i7-4770 QC 3.4Ghz 16GB Windows 10 Pro
innotek GmbH VirtualBox with 2 CPU and 7574MB running VMS V9.2
Approximate System VUPs Rating : 290.2 ( min: 290.2 max: 290.2 )
%ILINK-I-THREADUPCALLS, user thread upcalls automatically enabled
elapsed:909 cpu:90809 dirio:4847 pageflts:29755

> Have you run MONITOR SYSTEM, at least on the x86-64 systems, where the link is running for a while? What's your visual impression of the MONITOR SYSTEM data: CPU 100%? High Disk-IO? High Pagefault rate?

Observations:

BXNUC10i7FNH4 6 core i7 1.10GHz 32GB
VirtualBox with 2 CPU and 7680MB running VMS V9.2
CPU sits at 100% (of 200)
Page Fault Rate reaches maximum (4732) quickly then drops away
Direct I/O Rate reaches maximum (1806) then drops away
User mode 100% (of 200)
Idle time mostly 100%
Interrupt State mostly 0 occasionally flicks 1 to 8 (from Idle)
No other modes obvious

The page faults generated by the two X86 systems are comparable.
The direct I/O counts are also comparable.
The Dell consumes roughly 7x the CPU ticks of the NUC (90809 vs 13093), perhaps in line with their respective generations.

Although the Itanium and Alpha hardware incurs notably fewer page faults, the stark difference is the number of CPU ticks consumed by the Itanium and Alpha compared to the X86 running on the NUC and the Dell.
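
To put numbers on that difference, here is a minimal Python sketch (an editorial illustration, not something run by the posters) that scales each system's corrected figures by the rx2660's. The helper name `relative_to` and the dictionary labels are hypothetical.

```python
# (CPU ticks, direct I/O, page faults) from the corrected listing in this post.
runs = {
    "rx2660 (Itanium)": (103, 4209, 5658),
    "PWS (Alpha)": (396, 7176, 1453),
    "NUC (X86)": (13093, 4841, 29219),
    "Dell (X86)": (90809, 4847, 29755),
}


def relative_to(baseline, data):
    """Scale every metric by the corresponding value of the baseline system."""
    base = data[baseline]
    return {
        name: tuple(value / ref for value, ref in zip(values, base))
        for name, values in data.items()
    }


scaled = relative_to("rx2660 (Itanium)", runs)
for name, (cpu, dirio, faults) in scaled.items():
    print(f"{name:18s} cpu x{cpu:6.1f}  dirio x{dirio:4.2f}  pageflts x{faults:4.2f}")
```

Direct I/O on the X86 systems is within about 15% of the Itanium figure and page faults are roughly 5x, while CPU ticks are two to three orders of magnitude higher, so neither I/O nor paging volume alone accounts for the gap.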

Post by volkerhalle » Fri Jan 27, 2023 2:39 am

Mark,

does the DELL i7-4770 show about the same results with MONITOR SYSTEM and MONITOR MODE?

It looks like one now also has to become an expert on Intel architectures, to choose a well-performing VSI OpenVMS x86-64 system ;-)

For predicting performance on emulated systems (mostly for CHARON-AXP), I've been using PassMark CPU benchmark values for the underlying CPUs, to try to correlate them with the VUPS data measured on the physical systems to be emulated. I've looked up the PassMark results and divided them by the number of cores of that specific processor to get PM/core. DCL VUPS.COM is single-threaded, as is the linker.

(my) i5-9600K @3.7 GHz delivered 730 VUPS (with V9.2). PM: 10763 6 cores -> 1793 PM/core
NUC i7-10710U @1.1 GHz (10th Gen) delivered 620 VUPS. PM: 9885 6 cores -> 1647 PM/core
DELL i7-4770 @3.4 GHz (4th Gen) delivered 291 VUPS. PM: 7064 4 cores -> 1766 PM/core

While these systems seem to be comparable PassMark-wise, the VUPS data on the 4th Gen i7-4770 is pretty bad and the link operation performance is even worse.
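
The PM/core figures above, and how far VUPS departs from them, can be checked with a short calculation. This is a minimal Python sketch using only the numbers quoted in this post; the helper name `per_core` and the "VUPS per PM/core" ratio are illustrative choices, not an established metric.

```python
# (VUPS, PassMark score, cores) as quoted in the post above.
cpus = {
    "i5-9600K": (730, 10763, 6),
    "i7-10710U (NUC)": (620, 9885, 6),
    "i7-4770 (DELL)": (291, 7064, 4),
}


def per_core(passmark, cores):
    """PassMark score divided evenly across cores."""
    return passmark / cores


for name, (vups, passmark, cores) in cpus.items():
    pm_core = per_core(passmark, cores)
    print(f"{name:16s} PM/core {pm_core:7.1f}  VUPS per PM/core {vups / pm_core:.3f}")
```

If VUPS tracked PM/core, the last column would be roughly constant across the three processors; a markedly lower value for one CPU singles it out as underperforming relative to its PassMark rating.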

I've also used PRIME_SIEVE.C benchmarking in the past. Assuming you have access to a C compiler on V9.2, you could also try this on both of your OpenVMS x86-64 systems. I used this to 'calibrate' VUPS for I64. If you also have a C compiler on Windows, why not try PRIME_SIEVE natively as well, so you get a feel for the 'real' CPU performance of the underlying system.

I seem to remember a tool from VSI (look for https://groups.google.com/g/comp.os.vms ... CwAJ?hl=de) to be run on the physical Intel processor, to determine whether all the instructions necessary to run OpenVMS x86-64 are available. Weren't there also 'optional' instructions, which could be emulated if not available? Worth checking on the old DELL...

Volker.


Post by mgdaniel » Fri Jan 27, 2023 8:01 am

> does the DELL i7-4770 show about the same results with MONITOR SYSTEM and MONITOR MODE?

Yes, it does, though the link runs for more like 15 minutes.

Code: Select all

%ILINK-I-THREADUPCALLS, user thread upcalls automatically enabled
elapsed:972 cpu:97076 dirio:4848 pageflts:28837
CPU sits at 100% (of 200)
Page Fault Rate reaches maximum (3806) quickly then drops away
Direct I/O Rate reaches maximum (1386) then drops away
User mode 100% (of 200)
Idle time mostly 100%
Interrupt State mostly 0 occasionally flicks only to 1 (from Idle)
No other modes obvious

> I seem to remember a tool from VSI (look for https://groups.google.com/g/comp.os.vms ... CwAJ?hl=de) to be run on the physical Intel processor, to determine whether all the instructions necessary to run OpenVMS x86-64 are available. Weren't there also 'optional' instructions, which could be emulated if not available? Worth checking on the old DELL...

Code: Select all

OpenVMS 9.x compatibility quick-check

Vendor ID : GenuineIntel
CPU name  : Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz

Necessary for OpenVMS 9.x:
XSAVE     : Yes
TSC       : Yes
APIC      : Yes
MTTR      : Yes
Optional for OpenVMS 9.x:
PCID      : Yes
X2APIC    : Yes
XSAVEOPT  : Yes

Volker, this still begs the question: why are the virtualised static links so slow, and why do they consume so many more CPU ticks?

Maybe it's time for VSI to chime in?

Post by volkerhalle » Fri Jan 27, 2023 8:13 am

Mark,

is it worth your time to try to run a PRIME_SIEVE.C test? This test does almost no disk I/O and is CPU-only (pure integer math). It could quickly confirm differences between the virtualized OpenVMS x86-64 systems (although with non-optimizing compilers), and show the real capacity of the CPU if run natively under Windows.
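
For readers unfamiliar with this kind of test, the following is a minimal Python sketch of a sieve of Eratosthenes timed by CPU seconds. It is not the PRIME_SIEVE.C program referred to above, and an interpreted Python run is not comparable with compiled C results; it only illustrates a workload that is pure integer work with no disk I/O. The function name `count_primes` and the limit of 1,000,000 are assumptions chosen for illustration.

```python
import time


def count_primes(limit):
    """Count the primes below `limit` using a sieve of Eratosthenes."""
    if limit < 2:
        return 0
    is_prime = bytearray([1]) * limit  # one flag per integer 0..limit-1
    is_prime[0] = is_prime[1] = 0
    n = 2
    while n * n < limit:
        if is_prime[n]:
            # Strike out multiples of n, starting at n*n.
            for multiple in range(n * n, limit, n):
                is_prime[multiple] = 0
        n += 1
    return sum(is_prime)


# process_time() measures CPU time of this process, not wall-clock time.
start = time.process_time()
found = count_primes(1_000_000)
cpu_seconds = time.process_time() - start
print(f"{found} primes below 1,000,000 in {cpu_seconds:.2f} CPU seconds")
```

Running the same source on each guest and natively on the host gives a rough, like-for-like measure of single-threaded integer throughput.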

Volker.


Post by mgdaniel » Fri Jan 27, 2023 8:24 am

> is it worth your time to try to run a PRIME_SIEVE.C test?

I can do that (via the Itanium cross-compiler). Don't (believe I) have access to any native Win10 compiler.

There are a few out there

https://www.google.com/search?client=sa ... 8&oe=UTF-8

Any particular variety? How to interpret the results? Or just publish here?

(Still think VSI may be able to shed some light.)
