getting system health state

Topic author
willemgrooters
Valued Contributor
Posts: 86
Joined: Fri Jul 12, 2019 1:59 pm
Reputation: 0
Location: Netherlands
Status: Offline
Contact:

getting system health state

Post by willemgrooters » Tue Jan 04, 2022 10:54 am

Is there a way for a program to get the "physical health state" of an Alpha (PWS, DS10) or Itanium (RX2620) system: CPU and memory temperature, fan state, power state, etc.? I want to monitor these and signal "extreme" situations BEFORE they occur.
The DS10 has an RMC in snoop mode, which lets me display a status (when it responds, which is not always the case), but that can only be done via the console.


jonesd
Active Contributor
Posts: 31
Joined: Mon Aug 09, 2021 7:59 pm
Reputation: 0
Status: Offline

Re: getting system health state

Post by jonesd » Tue Jan 04, 2022 2:29 pm

On my DS10L, I can get the CPU temperature by examining f$getsyi("TEMPERATURE_VECTOR"). The last 2 digits are a hex encoding of the temperature in Celsius. I haven't found any other CPUs on which this works. Since the 10L is a 1U box, keeping it cool is a problem.
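
A minimal DCL sketch of the decoding described above, assuming the item returns a hex string whose last two digits are the temperature in Celsius (as observed on the DS10L only). The symbol names and the 50 C threshold are hypothetical and chosen for illustration; other systems may return a different layout or no usable value.

```dcl
$! Sketch only: decode the CPU temperature from TEMPERATURE_VECTOR.
$! Assumption: the last two hex digits are degrees Celsius (seen on a DS10L).
$ limit = 50                                   ! hypothetical alarm threshold, degrees C
$ tv = f$getsyi("TEMPERATURE_VECTOR")          ! hex string
$ hex = f$extract(f$length(tv) - 2, 2, tv)     ! last two hex digits
$ temp = %x'hex'                               ! convert hex digits to an integer
$ write sys$output "CPU temperature: ''temp' C"
$ if temp .ge. limit then write sys$output "WARNING: CPU temperature at or above ''limit' C"
```

Run from a periodic batch job, a check like this could raise a warning before the box overheats; the right threshold depends on the hardware.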


dirk.bogaerts
Member
Posts: 6
Joined: Thu Feb 18, 2021 9:50 am
Reputation: 0
Status: Offline

Re: getting system health state

Post by dirk.bogaerts » Wed Jan 05, 2022 4:38 am

This script used to work on my DS20E Alphas, but it no longer works on my current Integrity servers. I currently use CockpitMgr, which does all the hardware monitoring.

I added an extra check to the "env_check" script:

---
$ activecpu_cnt = f$getsyi("ACTIVECPU_CNT")    
$ availcpu_cnt  = f$getsyi("AVAILCPU_CNT")     
----
$ gosub cpu_check    
----
$cpu_check:                                                                     
$ if availcpu_cnt  .gt. 1                                                      
$ then                                                                         
$   if activecpu_cnt .lt. availcpu_cnt                                         
$   then write sys$output                                                     -
            "CPU is BAD : avail ''availcpu_cnt' / active ''activecpu_cnt'"     
$   else write sys$output "CPUs are Good"                                      
$   endif                                                                      
$ else                                                                         
$   write sys$output "CPU is Good"                                             
$ endif                                                                        
$ return                                                                        
----


Topic author
willemgrooters
Valued Contributor
Posts: 86
Joined: Fri Jul 12, 2019 1:59 pm
Reputation: 0
Location: Netherlands
Status: Offline
Contact:

Re: getting system health state

Post by willemgrooters » Wed Jan 05, 2022 11:39 am

Thanks - great script, it gives the information I needed. It would be nice if CockpitMgr were available to community members in some form.
Last edited by willemgrooters on Wed Jan 05, 2022 11:41 am, edited 1 time in total.

imiller
Active Contributor
Posts: 46
Joined: Fri Jun 28, 2019 8:45 am
Reputation: 0
Location: Reading, UK
Status: Offline
Contact:

Re: getting system health state

Post by imiller » Mon Feb 27, 2023 10:59 am

What I'm doing is using DCL and Kermit scripting to connect to the iLO and issue a PS command to get the power supply status and temperature, then checking the output. This works for Alpha and I64.
Ian Miller
[ personal opinion only. usual disclaimers apply. Do not taunt happy fun ball ].
