python performance on OpenVMS I64

Topic author
imiller
Master
Posts: 147
Joined: Fri Jun 28, 2019 8:45 am
Reputation: 0
Location: South Tyneside, UK

python performance on OpenVMS I64

Post by imiller » Fri Aug 04, 2023 12:13 pm

HPE OpenVMS V8.4
VSI I64VMS PYTHON A3.10-0RELEASE005
VSI I64VMS PYTHWHLS A1.1-6FIX06
HP Integrity rx2800 i2 (1.33GHz/4.0MB)
8 CPUs, 32Gb RAM

I've started to try running some Python on VMS instead of on Windows. The Python reads performance data from CSV files in zip archives. So far it's going OK. The lack of matplotlib.pyplot is stopping one script from running, but I've got several running now with small adjustments.

One that I have running is much slower on VMS than on my Windows 10 laptop (Intel Core i7-9850H CPU @ 2.60GHz).

The line that takes a long time is the following, which reads a large CSV file (around 388666 bytes) into a pandas DataFrame.

pingdata = pd.read_csv(TextIOWrapper(cf, encoding="utf-8"),
                       header=1, index_col=['Datetime'], parse_dates=['Datetime'],
                       names=['Datetime', 'Delay'], engine='c')

The process running the python script runs at 100% on one CPU.

Any thoughts?
Ian Miller
[ personal opinion only. usual disclaimers apply. Do not taunt happy fun ball ].
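
One way to narrow down a slow CSV load like the one described above is to time each stage separately, without pandas. The following is only a sketch using the standard library: the function name, the archive and member names, the single header row and the date format are all assumptions for illustration, not details of the original script.

```python
import csv
import io
import time
import zipfile
from datetime import datetime


def time_csv_stages(zip_path, member, date_format="%Y-%m-%d %H:%M:%S"):
    """Time unzip, decode, CSV splitting and date parsing separately.

    zip_path, member and date_format are placeholders (assumptions);
    adjust them to the real archive. Returns elapsed seconds per stage
    plus the number of data rows parsed.
    """
    timings = {}

    start = time.perf_counter()
    with zipfile.ZipFile(zip_path) as archive:
        raw = archive.read(member)  # decompress the whole member into memory
    timings["unzip"] = time.perf_counter() - start

    start = time.perf_counter()
    text = raw.decode("utf-8")
    timings["decode"] = time.perf_counter() - start

    start = time.perf_counter()
    rows = list(csv.reader(io.StringIO(text)))
    timings["split"] = time.perf_counter() - start

    start = time.perf_counter()
    # Assumption: the first row is a header and column 0 holds the timestamp.
    stamps = [datetime.strptime(row[0], date_format) for row in rows[1:] if row]
    timings["dates"] = time.perf_counter() - start

    timings["rows"] = len(stamps)
    return timings
```

Running this on both systems against the same archive would show whether the gap is spread evenly across stages or concentrated in one of them, such as date parsing.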

arne_v
Master
Posts: 347
Joined: Fri Apr 17, 2020 7:31 pm
Reputation: 0
Location: Rhode Island, USA

Re: python performance on OpenVMS I64

Post by arne_v » Fri Aug 04, 2023 1:45 pm

Your Itanium must be about 8 years older than your laptop.

VMS IO is notoriously slow compared to other OS.

The code says encoding="utf-8" and Windows is a lot more UTF-8 ready than VMS.

Likely the Itanium uses HDD while the laptop uses SSD.

Maybe it is just the small differences adding up.

Added in 3 minutes 27 seconds:
BTW, I have some benchmark code in both C (and a few other native languages) and Python.

Email me if you want a copy.

It could provide you numbers for:

C/Windows
Python/Windows
C/VMS
Python/VMS

for non-IO operations.

It may reveal whether:
* your VMS system is just slower than your Windows system in general
* Python on VMS performs poorly
* it must be IO related
Arne
arne@vajhoej.dk
VMS user since 1986


sms
Master
Posts: 349
Joined: Fri Aug 21, 2020 5:18 pm
Reputation: 0

Re: python performance on OpenVMS I64

Post by sms » Fri Aug 04, 2023 2:47 pm

> [...] it runs on VMS much slower than on my Windows 10 laptop [...]

   Define "much slower"?

> [...] a large csv file (around 388666 bytes) [...]

   380KB doesn't seem especially large to me.

      write sys$output f$file_attribute( "<your_file_name>", "RFM")

Anything other than STMLF might cause some sloth, especially if anyone
asks about the file size, say.  (Having to read a file to determine its
size can waste some time.)

> The process running the python script runs at 100% on one CPU.

   Without knowing where the code is spending its time, it's tough to
say how it might be improved.


> Your Itanium must be about 8 years older than your laptop.
> [...]

   And all of that.

>  VMS IO is notoriously slow compared to other OS.

   Especially when it's done carelessly, as some of the RMS default
values are sub-optimal for many tasks.  Some years ago, Info-ZIP Zip and
UnZip got noticeably faster at writing large files by setting some RMS
values to larger-than-default values.  (A larger "extend quantity" was
especially helpful, as I recall.)  I have a dim recollection of
complaining to HP about the FTP client suffering the same way.  I don't
recall if file _reading_ suffered as much, but a little playing around
with SET RMS_DEFAULT might be educational.

>  Likely the Itanium uses HDD while the laptop uses SSD.

   In such a case, there are limits to what software can do.
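
To find out where a CPU-bound Python script spends its time, the standard-library profiler can wrap the suspect call. This is a minimal sketch: `profile_call` and `sample_workload` are hypothetical names, and the workload is only a stand-in for the real CSV-loading function.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile.

    Returns (result, report), where report lists the `top` entries
    sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


def sample_workload(n=50_000):
    """Hypothetical stand-in for the real data-loading function."""
    return sum(int(str(i)) for i in range(n))
```

Calling `profile_call(sample_workload)` and printing the second element of the result shows the report. Time spent inside a C extension shows up as a single built-in entry, so this identifies the expensive call but not what happens inside it.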

Topic author
imiller
Master
Posts: 147
Joined: Fri Jun 28, 2019 8:45 am
Reputation: 0
Location: South Tyneside, UK

Re: python performance on OpenVMS I64

Post by imiller » Mon Aug 07, 2023 6:18 am

I expected it to be slower, but it is 10 times slower: around 2 mins to process a week of data on Windows versus around 20 mins on VMS I64.
It's CPU bound. Hopefully when Python on OpenVMS x86 arrives I shall be able to run it quicker.
Ian Miller
[ personal opinion only. usual disclaimers apply. Do not taunt happy fun ball ].

arne_v
Master
Posts: 347
Joined: Fri Apr 17, 2020 7:31 pm
Reputation: 0
Location: Rhode Island, USA

Re: python performance on OpenVMS I64

Post by arne_v » Mon Aug 07, 2023 7:14 pm

There seems to be "something" regarding Python.

I ran my benchmark and integer performance looks like:

                          C     Python 3    ratio (C/Python)
Itanium                 246         0.64         384
Desktop PC              725        11             66
ratio (PC/Itanium)      2.9        17.2
Arne
arne@vajhoej.dk
VMS user since 1986
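
For readers who want a comparable number from their own systems, a pure-Python integer loop can be timed as below. This is not the benchmark that produced the table above; it is a hypothetical sketch, and the function name and arithmetic are arbitrary choices for illustration.

```python
import time


def integer_benchmark(iterations=1_000_000):
    """Time a pure-Python integer loop.

    Returns (checksum, rate), where rate is millions of loop
    iterations per second. The timer starts and stops inside the
    function, so interpreter start-up and imports are excluded.
    """
    start = time.perf_counter()
    total = 0
    for i in range(iterations):
        total = (total + i * 3) % 1_000_003
    elapsed = time.perf_counter() - start
    rate = iterations / elapsed / 1e6 if elapsed > 0 else float("inf")
    return total, rate
```

The checksum exists only so that the loop result is used and can be compared across systems; the rate is the figure to compare.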

neilrieck
Contributor
Posts: 21
Joined: Tue Jan 10, 2023 10:41 am
Reputation: 0
Location: Waterloo, Ontario, Canada

Re: python performance on OpenVMS I64

Post by neilrieck » Sun Aug 13, 2023 6:48 am

"I think" I may have discovered a problem with the way that CPython 3.10 is implemented on OpenVMS, and "it might" explain your performance observation.

Here's what is supposed to happen during normal operation on any OS:
(1) Every time that the PVM (python virtual machine) executes a script, it must be first compiled to bytecode in memory before execution. The bytecode is discarded when the PVM exits
(2) Starting with python3, whenever an import statement is encountered, the PVM looks for the source file (.py) then looks for a compiled file (.pyc) in a subfolder [.__pycache__] under the source file location. The PVM will compare the timestamp of the source file to the timestamp of bytecode file. If the source file is newer, then it must be compiled before it is executed.
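
As a rough illustration of the check in step (2): in CPython 3.7 and later, a default (timestamp-based) .pyc stores the source file's modification time and size in its 16-byte header, and the cached file is reused only when both values match the source. The sketch below reads that header and compares it with the source file. The function name `pyc_status` is hypothetical, and nothing here is specific to the OpenVMS port.

```python
import importlib.util
import os
import struct


def pyc_status(source_path):
    """Compare a module's cached .pyc header with its source file.

    Header layout (CPython 3.7+): 4-byte magic, 4-byte flags, then for
    timestamp-based files a 4-byte source mtime and a 4-byte source size.
    """
    pyc_path = importlib.util.cache_from_source(source_path)
    if not os.path.exists(pyc_path):
        return {"pyc_path": pyc_path, "exists": False}

    with open(pyc_path, "rb") as handle:
        header = handle.read(16)
    if len(header) < 16:
        return {"pyc_path": pyc_path, "exists": True, "truncated": True}

    magic = header[:4]
    flags, stored_mtime, stored_size = struct.unpack("<III", header[4:16])
    source = os.stat(source_path)
    return {
        "pyc_path": pyc_path,
        "exists": True,
        "truncated": False,
        "magic_ok": magic == importlib.util.MAGIC_NUMBER,
        # Bit 0 set means a hash-based .pyc; the two match fields below
        # are then meaningless because bytes 8-15 hold a source hash.
        "hash_based": bool(flags & 0x1),
        "mtime_match": stored_mtime == (int(source.st_mtime) & 0xFFFFFFFF),
        "size_match": stored_size == (source.st_size & 0xFFFFFFFF),
    }
```

If `mtime_match` or `size_match` comes back false for a .pyc that was just written, the interpreter would treat the cache as stale on the next import and compile again.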

Now create these two demo scripts (see below), then invoke the second one. I noticed that a new "hello_world.cpython-310.pyc" is being created during every execution, "which may indicate" that a bug exists in the bytecode recompilation trigger logic.

Furthermore, this may also be happening with the precompiled libraries under python$root but the user does not have the privs to update the bytecode files in that other location so everything happens in memory then is discarded.

I believe this to be true because I was testing a small script the other day which happens to import requests, which is dependent upon five other libraries. Starting this small script always introduces a 2-3 second delay so I suspect I am recompiling all the dependent code every time.

# ===========================================
# title   : hello_world.py
# usage   :
# 1) python hello_world.py (compiles to memory every time)
# 2) import from another script (should only compile once)
#    note: see 'hello_world_loader.py' in this folder
# ===========================================
def main():
    print("hello world")

if __name__ == "__main__":
    main()

# ===========================================
# title   : hello_world_loader.py
# usage   : python hello_world_loader.py
# ===========================================
import hello_world
hello_world.main()
Last edited by neilrieck on Sun Aug 13, 2023 6:50 am, edited 1 time in total.
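
The re-creation of the .pyc can also be checked from a script instead of by eye. The sketch below runs a loader script twice in a fresh interpreter and reports whether the imported module's cached bytecode file changed between runs. The function name is hypothetical, and comparing modification times is an assumption about how a rewrite would show up on a given file system.

```python
import importlib.util
import os
import subprocess
import sys


def pyc_rewritten_on_second_run(loader_script, imported_source):
    """Run loader_script twice and compare the imported module's .pyc.

    Returns True if the .pyc modification time changed between the two
    runs, False if it stayed the same, and None if no .pyc was written.
    """
    loader_script = os.path.abspath(loader_script)
    pyc_path = importlib.util.cache_from_source(os.path.abspath(imported_source))
    workdir = os.path.dirname(loader_script)

    def run_and_stat():
        # A fresh interpreter each time, so nothing is cached in memory.
        subprocess.run(
            [sys.executable, loader_script],
            cwd=workdir,
            check=True,
            stdout=subprocess.DEVNULL,
        )
        if not os.path.exists(pyc_path):
            return None
        return os.stat(pyc_path).st_mtime_ns

    first = run_and_stat()
    second = run_and_stat()
    if first is None or second is None:
        return None
    return first != second
```

With the two demo scripts in the current directory, `pyc_rewritten_on_second_run("hello_world_loader.py", "hello_world.py")` would be expected to return False on a system where the cache is reused.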


jonesd
Valued Contributor
Posts: 78
Joined: Mon Aug 09, 2021 7:59 pm
Reputation: 0

Re: python performance on OpenVMS I64

Post by jonesd » Sun Aug 13, 2023 2:30 pm

neilrieck wrote: Sun Aug 13, 2023 6:48 am
> (2) Starting with python3, whenever an import statement is encountered, the PVM looks for the source file (.py) then looks for a compiled file (.pyc) in a subfolder [.__pycache__] under the source file location. The PVM will compare the timestamp of the source file to the timestamp of bytecode file. If the source file is newer, then it must be compiled before it is executed.
I'm pretty sure that behavior pre-dates python3. Not only does it look for pre-compiled bytecode, it also looks for cases where the module is a DLL that implements the module in a compiled language like C. Under 2.7, at least, it also looks for cases where the library root can be a zip file, so it may be searching zip archives as well. A module can also be the name of a sub-directory that implements it with multiple files.

Compared to unix, VMS is notoriously slow at file operations (compounded by the fact that the CRTL stat("/d1/d2/xxx") function has to search for both "xxx." and "xxx.DIR"), so the myriad of files opened at Python startup is tolerable under Linux but far less so under OpenVMS.
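
To see how much of a start-up delay comes from imports, CPython 3.7 and later can report per-module import times with the `-X importtime` option, which writes one line per imported module to stderr. The sketch below runs that in a subprocess and returns the slowest entries. The function name `import_costs` is hypothetical, and the parsing assumes the usual "self | cumulative | name" line layout.

```python
import subprocess
import sys


def import_costs(module_name, top=5):
    """Return the `top` slowest imports as (cumulative microseconds, name).

    Runs `python -X importtime -c "import <module_name>"` in a fresh
    interpreter and parses the report written to stderr.
    """
    completed = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module_name}"],
        capture_output=True,
        text=True,
        check=True,
    )
    prefix = "import time:"
    rows = []
    for line in completed.stderr.splitlines():
        if not line.startswith(prefix):
            continue
        fields = [field.strip() for field in line[len(prefix):].split("|")]
        if len(fields) != 3 or not fields[1].isdigit():
            continue  # skips the column-header line
        rows.append((int(fields[1]), fields[2]))
    rows.sort(reverse=True)
    return rows[:top]
```

For example, `import_costs("requests")` would list the most expensive modules pulled in by that one import, which helps separate file-system overhead at start-up from time spent in the script itself.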


sodjan
Active Contributor
Posts: 40
Joined: Mon Apr 24, 2023 3:51 am
Reputation: 0

Re: python performance on OpenVMS I64

Post by sodjan » Mon Aug 14, 2023 7:56 am

More specifically, Python 2 (VMS) looks for a .pyc file in *the same* directory as the .py file.
If the .pyc file is not found and the user has write access to the directory, the .py is "compiled" and the .pyc file is saved.

The Python install routine runs a full COMPILE on all .py files in the Python distribution, since most normal users do not have write access to PYTHON_ROOT.

No idea how Python 3 (VMS) works (only Alpha here...).

neilrieck
Contributor
Posts: 21
Joined: Tue Jan 10, 2023 10:41 am
Reputation: 0
Location: Waterloo, Ontario, Canada

Re: python performance on OpenVMS I64

Post by neilrieck » Mon Aug 14, 2023 10:31 am

jonesd wrote: Sun Aug 13, 2023 2:30 pm
> neilrieck wrote: Sun Aug 13, 2023 6:48 am
> > (2) Starting with python3, whenever an import statement is encountered, the PVM looks for the source file (.py) then looks for a compiled file (.pyc) in a subfolder [.__pycache__] under the source file location. The PVM will compare the timestamp of the source file to the timestamp of bytecode file. If the source file is newer, then it must be compiled before it is executed.
> I'm pretty sure that behavior pre-dates python3. Not only does it look for pre-compiled bytecode, it also looks for cases where the module is a DLL that implements the module in a compiled language like C. Under 2.7, at least, it also looks for cases where the library root can be a zip file, so it may be searching zip archives as well. A module can also be the name of a sub-directory that implements it with multiple files.
>
> Compared to unix, VMS is notoriously slow at file operations (compounded by the fact that the CRTL stat("/d1/d2/xxx") function has to search for both "xxx." and "xxx.DIR"), so the myriad of files opened at Python startup is tolerable under Linux but far less so under OpenVMS.

(1) You are correct that VMS file operations are slower than UNIX or Linux. That said, the demo I posted is creating a new pyc every execution BUT this is only supposed to happen if the py file is newer. These scripts work properly on CentOS.

(2) I didn't report these previously, but if you switch your process to "case sensitive" then none of this stuff works. I suspect this is because the associated logicals and symbols are "all uppercase" but should be mixed case.

e.g. show logical python$root
   "PYTHON$ROOT" = "KAWC09$DKB0:[SYS0.SYSCOMMON.PYTHON.]" (LNM$SYSTEM_TABLE)
but should be:
   "PYTHON$ROOT" = "KAWC09$DKB0:[SYS0.SYSCOMMON.python.]" (LNM$SYSTEM_TABLE)

Caveats:
* Since stuff under here (like bin) is also lower case, the symbol would need to be changed as well
* Whatever is fixed here would also need to be fixed in the wheels package
Last edited by neilrieck on Mon Aug 14, 2023 1:59 pm, edited 1 time in total.

arne_v
Master
Posts: 347
Joined: Fri Apr 17, 2020 7:31 pm
Reputation: 0
Location: Rhode Island, USA

Re: python performance on OpenVMS I64

Post by arne_v » Mon Aug 14, 2023 9:22 pm

I tested and saw:

JFP Python 2.7 / VMS Alpha: .pyc in [] and reuse

VSI Python 3.10 / VMS Itanium: .pyc in [.__pycache__] and new version at every run

Looks like a bug.

Added in 1 minute 57 seconds:
Note that my benchmark code above:
* does not use any modules, so there is no .pyc file at all
* measures between two points within the code, so it does not include any startup time
so that result is not related to .pyc files.
Arne
arne@vajhoej.dk
VMS user since 1986
