-
Topic author
neliasen
- Member
- Posts: 8
- Joined: Mon Aug 28, 2023 5:23 am
- Reputation: 0
-
Status:
Offline
Post
by neliasen » Mon Sep 04, 2023 4:05 am
Hello
My OpenVMS V9.2-1 system hangs or crashes after a few days. Running "anal/crash" afterwards gives me the following:
Crashdump Summary Information:
------------------------------
Crash Time: 1-SEP-2023 23:45:23.60
Bugcheck Type: CPUSANITY, CPU sanity timer expired
Node: VMS2 (Standalone)
CPU Type: QEMU Standard PC (Q35 + ICH9, 2009)
VMS Version: V9.2-1
Current Process: NULL
Current Image: <not available>
Failing PC: FFFF8300.07DF3D04 SYSTEM_PRIMITIVES_4_MIN+8008DD04 (INTERVAL_TIMER + 000023C4 / line 52943)
Failing PS: 00000000.00000000
Module: SYSTEM_PRIMITIVES_4_MIN (Link Date/Time: 26-JUL-2023 22:09:03.27)
Offset: 8008DD04
Boot Time: 1-SEP-2023 04:48:25.00
System Uptime: 0 18:56:58.60
Crash/Primary CPU: 0./0.
System/CPU Type: 0000
Saved Processes: 13
Pagesize: 8 KByte (8192 bytes)
Physical Memory: 7915 MByte (2621440 PFNs, discontiguous memory)
Dumpfile Pagelets: 0 blocks
Dump Flags: olddump,writecomp,errlogcomp
Dump Type: raw,full,shared_mem
EXE$GL_FLAGS: init,bugdump
Paging Files: 1 Pagefile and 0 Swapfiles installed
and "clue crash" gives me:
Crashdump Summary Information:
------------------------------
Failing Instruction:
INTERVAL_TIMER + 000023C4 / line 52943: jmpb -2F
I am a bit lost here. Any ideas?
Last edited by
marty.stu on Mon Sep 04, 2023 4:59 am, edited 1 time in total.
-
imiller
- Master
- Posts: 155
- Joined: Fri Jun 28, 2019 8:45 am
- Reputation: 0
- Location: South Tyneside, UK
-
Status:
Offline
Post
by imiller » Mon Sep 04, 2023 7:16 am
In SYS$ERRORLOG there should be a file called CLUE$*.LIS (the name includes the node name and the date and time of the crash). Raise an issue on the Support Portal and attach this file together with the output of running VSI$SUPPORT.COM (you can find this DCL procedure zipped on the Support Portal).
Ian Miller
[ personal opinion only. usual disclaimers apply. Do not taunt happy fun ball ].
-
volkerhalle
- Master
- Posts: 200
- Joined: Fri Aug 14, 2020 11:31 am
- Reputation: 0
-
Status:
Offline
Post
by volkerhalle » Tue Sep 05, 2023 6:41 am
In an OpenVMS SMP system, each CPU is monitoring the sanity timer of the neighboring CPU.
If that other CPU is stuck/hung/HALTed, a CPUSANITY crash is taken.
Look at the crash with SDA> CLUE CONFIG to determine the state of the other CPUs.
Volker.
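To make the mechanism described in this post concrete, here is a toy model in Python. It is only a sketch: it assumes that on every clock tick a live CPU resets its own counter and counts down its neighbor's, which is one plausible reading of "each CPU is monitoring the sanity timer of the neighboring CPU". The names `Cpu`, `SANITY_TICKS`, `tick` and `run` are made up for illustration and are not OpenVMS data structures or parameters.

```python
from dataclasses import dataclass

SANITY_TICKS = 3  # hypothetical timeout in clock ticks, not a real OpenVMS value


@dataclass
class Cpu:
    cpu_id: int
    alive: bool = True
    sanity: int = SANITY_TICKS


def tick(cpus):
    """Advance one clock tick; return (watcher_id, stuck_id) or None."""
    n = len(cpus)
    for cpu in cpus:
        if not cpu.alive:
            continue  # a hung CPU runs no timer code at all
        cpu.sanity = SANITY_TICKS             # show that this CPU is still alive
        neighbor = cpus[(cpu.cpu_id + 1) % n]
        neighbor.sanity -= 1                  # count down the neighbor's timer
        if neighbor.sanity <= 0:
            return cpu.cpu_id, neighbor.cpu_id  # the watcher would bugcheck here
    return None


def run(cpus, max_ticks):
    """Run up to max_ticks ticks; return (tick_number, (watcher, stuck)) or None."""
    for t in range(1, max_ticks + 1):
        hit = tick(cpus)
        if hit:
            return t, hit
    return None


healthy = run([Cpu(0), Cpu(1)], max_ticks=100)
hung = run([Cpu(0), Cpu(1, alive=False)], max_ticks=100)
print(healthy, hung)
```

In this model a healthy pair never trips the timer, while a stuck CPU 1 is reported by CPU 0 after `SANITY_TICKS` ticks. That would fit the dump in this thread, where the crash CPU is 0, but the real implementation may differ in its details.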
-
by neliasen » Tue Sep 05, 2023 8:09 am
I get the following:
$ anal/crash SYS$SYSTEM:SYSDUMP.DMP
OpenVMS system dump analyzer
...analyzing an x86-64 interleaved memory dump...
Dump taken on 1-SEP-2023 23:45:23.60 using version V9.2-1
CPUSANITY, CPU sanity timer expired
SDA> clue config
System Configuration:
---------------------
System Information:
System Type QEMU Standard PC (Q35 + ICH9, 2009) Primary CPU ID 0.
Cycle Time 0.44 nsec (2295 MHz) Pagesize 8192 Byte
%CLUE-W-NOSYMBIOS, cannot access SMBIOS table
System Processor Configuration:
-------------------------------
CPU ID 0 CPU State rc,pa,pp,cv,pv,pmv,pl
CPU Type unknown 00000000.00000000
Halt PC 00000000.00000000 Halt PS 00000000.00000000
Halt code Bootstrap or Powerfail Halt Req. Default, No Action
Slot VA FFFFFFFF.8CF3F000 CPUDB VA FFFFFFFF.82000000
Package Unknown Core Unknown
Thread id Unknown Cothread id None
FW Usage Unknown CPU die Unknown
ACPI CPU id 00000000.00000000 Serial Num
LID 00000000.00000000 CFG flags Unknown
CPU ID 1 CPU State bip,pa,pp,cv,pv,pmv,pl
CPU Type unknown 00000000.00000000
Halt PC 00000000.00000000 Halt PS 00000000.00000000
Halt code Bootstrap or Powerfail Halt Req. Default, No Action
Slot VA FFFFFFFF.8CF3F430 CPUDB VA FFFFFFFF.8D35A000
Package Unknown Core Unknown
Thread id Unknown Cothread id None
FW Usage Unknown CPU die Unknown
ACPI CPU id 00000000.00000001 Serial Num
LID 00000000.00000001 CFG flags Unknown
-
by volkerhalle » Tue Sep 05, 2023 8:23 am
So the state of CPU 1 is the problem. It shows:
CPU State bip,pa,pp,cv,pv,pmv,pl
In a running system, it would show:
CPU State rc,pa,pp,cv,pv,pmv,pl
According to LIB.REQ:
macro SLOT$V_BIP = 264,0,1,0 %; ! Bootstrap in progress
Volker.
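For readers unfamiliar with the `264,0,1,0` notation quoted from LIB.REQ, the sketch below shows how such a field could be read from a raw buffer. It assumes the four numbers follow the usual BLISS field-selector layout (byte offset, starting bit, width in bits, sign-extension flag); the thread itself only confirms the comment "Bootstrap in progress". The helper `extract_field` and the zero-filled `slot` buffer are hypothetical stand-ins, not real OpenVMS structures.

```python
def extract_field(block: bytes, offset: int, position: int, size: int, sign: int = 0) -> int:
    """Read a bit field from a little-endian byte buffer."""
    nbytes = (position + size + 7) // 8           # bytes needed to cover the field
    raw = int.from_bytes(block[offset:offset + nbytes], "little")
    value = (raw >> position) & ((1 << size) - 1)
    if sign and value & (1 << (size - 1)):        # optional sign extension
        value -= 1 << size
    return value


# Field selector as quoted from LIB.REQ: offset 264, bit 0, 1 bit wide, unsigned.
SLOT_V_BIP = (264, 0, 1, 0)

slot = bytearray(512)   # hypothetical, zero-filled stand-in for a per-CPU slot
slot[264] |= 0x01       # pretend the "bootstrap in progress" bit is set

print(extract_field(bytes(slot), *SLOT_V_BIP))  # prints 1
```

Read this way, `bip` is a single flag bit at byte offset 264 of the per-CPU slot, and SDA prints its name when the bit is set.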
-
by neliasen » Tue Sep 05, 2023 10:31 am
Hello
Sounds very reasonable.
But why is there a difference between the two CPUs?
I have not tinkered with any special flags for the CPUs (nor anything else, for that matter).
-
by volkerhalle » Tue Sep 05, 2023 10:43 am
The 2 CPUs could be in different states, e.g. CPU 1 could have been halted after a software error.
What's the state of the 2nd CPU when your system is running normally?
$ ANALYZE/SYSTEM
SDA> CLUE CONFIG
SDA> EXIT
Do you record the console output on OPA0 (maybe with a PuTTY logfile)? There may be some messages when the system gets into this state.
It could also be some problem with the VM and the 2nd vCPU. Do you have logfiles to check?
Volker.
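The comparison suggested here (CLUE CONFIG from the crash dump versus the running system) can also be done mechanically once both outputs are saved as text. The following Python sketch is one way to do it. It assumes the `CPU ID n CPU State ...` line format shown in this thread and treats `rc` as the marker of a running CPU, as the earlier post does. The function names are made up for illustration.

```python
import re

# Matches lines such as "CPU ID 1 CPU State bip,pa,pp,cv,pv,pmv,pl"
CPU_LINE = re.compile(r"CPU ID\s+(\d+)\s+CPU State\s+(\S+)")


def cpu_states(clue_config_text):
    """Map CPU ID -> set of state flags, as printed by CLUE CONFIG."""
    return {int(m.group(1)): set(m.group(2).split(","))
            for m in CPU_LINE.finditer(clue_config_text)}


def not_running(states, running_flag="rc"):
    """List CPUs whose state does not include the running flag."""
    return sorted(cpu for cpu, flags in states.items() if running_flag not in flags)


def changed_flags(a, b):
    """For CPUs present in both outputs, return the flags that differ."""
    return {cpu: a[cpu] ^ b[cpu] for cpu in a.keys() & b.keys() if a[cpu] != b[cpu]}


# State lines copied from the outputs posted in this thread.
crash = """\
CPU ID 0 CPU State rc,pa,pp,cv,pv,pmv,pl
CPU ID 1 CPU State bip,pa,pp,cv,pv,pmv,pl
"""
live = """\
CPU ID 0 CPU State rc,pa,pp,cv,pv,pmv,pl
CPU ID 1 CPU State rc,pa,pp,cv,pv,pmv,pl
"""

print(not_running(cpu_states(crash)))
print(changed_flags(cpu_states(crash), cpu_states(live)))
```

With the lines from this thread, only CPU 1 is reported as not running, and the only flags that differ between the two outputs are `bip` and `rc`.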
-
by neliasen » Wed Sep 06, 2023 6:33 am
Running "analyze /system" and then "clue config" now shows that the CPU states are identical:
System Processor Configuration:
-------------------------------
CPU ID 0 CPU State rc,pa,pp,cv,pv,pmv,pl
CPU Type unknown 00000000.00000000
Halt PC 00000000.00000000 Halt PS 00000000.00000000
Halt code Bootstrap or Powerfail Halt Req. Default, No Action
Slot VA FFFFFFFF.8CF3F000 CPUDB VA FFFFFFFF.82000000
Package Unknown Core Unknown
Thread id Unknown Cothread id None
FW Usage Unknown CPU die Unknown
ACPI CPU id 00000000.00000000 Serial Num
LID 00000000.00000000 CFG flags Unknown
CPU ID 1 CPU State rc,pa,pp,cv,pv,pmv,pl
CPU Type unknown 00000000.00000000
Halt PC 00000000.00000000 Halt PS 00000000.00000000
Halt code Bootstrap or Powerfail Halt Req. Default, No Action
Slot VA FFFFFFFF.8CF3F430 CPUDB VA FFFFFFFF.8D35A000
Package Unknown Core Unknown
Thread id Unknown Cothread id None
FW Usage Unknown CPU die Unknown
ACPI CPU id 00000000.00000001 Serial Num
LID 00000000.00000001 CFG flags Unknown
and the "analyze /crash" shows the following:
Crashdump Summary Information:
------------------------------
Crash Time: 1-SEP-2023 23:45:23.60
Bugcheck Type: CPUSANITY, CPU sanity timer expired
Node: VMS2 (Standalone)
CPU Type: QEMU Standard PC (Q35 + ICH9, 2009)
VMS Version: V9.2-1
Current Process: NULL
Current Image: <not available>
Failing PC: FFFF8300.07DF3D04 SYSTEM_PRIMITIVES_4_MIN+8008DD04 (INTERVAL_TIMER + 000023C4 / line 52943)
Failing PS: 00000000.00000000
Module: SYSTEM_PRIMITIVES_4_MIN (Link Date/Time: 26-JUL-2023 22:09:03.27)
Offset: 8008DD04
Boot Time: 1-SEP-2023 04:48:25.00
System Uptime: 0 18:56:58.60
Crash/Primary CPU: 0./0.
System/CPU Type: 0000
Saved Processes: 13
Pagesize: 8 KByte (8192 bytes)
Physical Memory: 7915 MByte (2621440 PFNs, discontiguous memory)
Dumpfile Pagelets: 0 blocks
Dump Flags: olddump,writecomp,errlogcomp
Dump Type: raw,full,shared_mem
EXE$GL_FLAGS: init,bugdump
Paging Files: 1 Pagefile and 0 Swapfiles installed
-
by volkerhalle » Wed Sep 06, 2023 6:42 am
The state of both CPUs in the running system looks normal.
The question is how CPU 1 got into that 'bip' state and probably stopped updating its sanity timer.
Could you try to log the output of the OPA0: console to some log file? And then look at the messages preceding the next CPUSANITY crash?
Is there any log file from QEMU?
Volker.
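Once the console output is being captured to a plain-text file, pulling out the messages that precede the crash is easy to script. The Python sketch below assumes only that the log is line-oriented text and that the bugcheck line contains the string `CPUSANITY`. The function name and the sample lines are placeholders, not real console output.

```python
from collections import deque


def lines_before(lines, marker="CPUSANITY", context=20):
    """Return up to `context` lines preceding the first line containing `marker`.

    Returns None if the marker never appears.
    """
    recent = deque(maxlen=context)  # sliding window of the most recent lines
    for line in lines:
        if marker in line:
            return list(recent)
        recent.append(line.rstrip("\n"))
    return None


# Placeholder input; a real run would iterate over the captured console log.
sample = [f"placeholder console line {i}\n" for i in range(1, 6)]
sample.append("... CPUSANITY ...\n")

for line in lines_before(sample, context=3):
    print(line)
```

For a real capture, passing an open file object (for example one opened with `errors="replace"` to tolerate stray terminal bytes) instead of `sample` works the same way, since the function only iterates over lines.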
-
by neliasen » Thu Sep 07, 2023 7:30 am
I'll try to make OPA0: log to some file, and also get some log from QEMU (none found right now).
I just thought that having different flags set for the CPUs was, in normal cases, not possible!