sigsetjmp/siglongjmp and open file


Post by theoldman » Wed May 22, 2024 2:43 pm

Overview (test program attached):

1. sigsetjmp(jmpbuf,1);
2. fopen a file
3. signal(SIGALRM,TOSTMER_TRAP);
4. copy jmpbuf to static so TOSTMER_TRAP can use it
5. alarm(secs)
6. Alarm pops; TOSTMER_TRAP is called, resets SIGALRM, and calls siglongjmp; the program then jumps (goto) to its end
7. Attempt to fclose() file, get "mount point busy"
8. Attempt to remove() file, get "file locked by another process" even though this is the only process using it.

Somehow the siglongjmp back to the sigsetjmp point is causing the file descriptor to get confused. Note that FILE *fp is in a structure and so is not in a register saved by sigsetjmp. This works on the various Unix varieties (Linux, Solaris, AIX).

Any insight is appreciated,
Regards,
Paul
Attachments
vmstmr3.c
(1.38 KiB)


Post by hein » Thu May 23, 2024 8:51 am

[Major edit]
Looking at the code, it confirms a simple, unshared C open, which means there is hardly any RMS involved.
It's just an OPEN for block I/O, with the CRTL doing 16 KB writes when the buffer is full or flushed.

The "mount point busy" can be read as 'unflushed data to deal with'.
Now you may wonder why fclose does not do it all, and I think that is a fair question, but this is how it is.

Maybe there is an 'auto-flush-on-close' environment variable, but the solid fix is to add

fflush(moo.fp) # with accoutrements as deemed needed

just before the fclose.
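
Spelled out with some of those accoutrements, the idea might look like the sketch below. The helper name flush_and_close is hypothetical; the only point is that the flush result is checked and reported separately from the close result.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: flush explicitly so a flush failure is reported
   separately from a close failure. Returns 0 on success, -1 otherwise. */
int flush_and_close(FILE *fp)
{
    int rc = 0;

    assert(fp != NULL);

    if (fflush(fp) != 0) {
        fprintf(stderr, "fflush failed: %s\n", strerror(errno));
        rc = -1;
    }
    if (fclose(fp) != 0) {              /* always close, even if the flush failed */
        fprintf(stderr, "fclose failed: %s\n", strerror(errno));
        rc = -1;
    }
    return rc;
}
```

In the test program this would be called as flush_and_close(moo.fp) in place of the bare fclose.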

Hein.

-------- old reply -----------


>>> 7. Attempt to fclose() file, get "mount point busy"

We need to figure out what the underlying RMS error is.
I may find time with your code.
Without altering code, one can try setting a debugger breakpoint after the CLOSE, then: ANALYZE/SYSTEM; SET PROC x; SHOW PROC/RMS=(FAB,RAB)

I did, and it was all marked as success; see below.

>>> 8. Attempt to remove() file, get "file locked by another process" even though this is the only process using it.

This is, and has been since the beginning of time, a poorly worded message text. "Another process" is really another 'channel', which can be, and often is, in your own process.

Hein

FAB Address:    00054D10
-----------
BID:            03             3.
BLN:            50            80.
STS:            00010001                ALQ:            00000000
STV:            000000D0                DEQ:            0000
FAC:            5B      PUT,GET,UPD,TRN,BRO
SHR:            00
ORG:            00      Sequential
RAT:            02      CR
RFM:            05      STMLF

RAB Address:    000575A0
-----------
BID:            01             1.       ISI:            0001
ROP:            00000E00        RAH,WBH,BIO
CTX:            00054BB0                RAC:            00      SEQ
STS:            00010001                RFA:            00000C41,0000
STV:            00000000
USZ:            4000       16384.       UBF:            00058000
Hein.
Last edited by hein on Thu May 23, 2024 10:15 am, edited 2 times in total.


Post by theoldman » Fri May 24, 2024 2:25 pm

Hi Hein,
I had tried the fflush() before the close in an earlier version of the test case (attached) and it gave me

operation already in progress

I tried allocating the MOO structure instead of using a stack variable; no change. So far all I can think of is that the program state reset done by siglongjmp *somehow* interacts with the FILE data structure, which seems rather far-fetched.

Thank you for taking the time to look at this. The issue that I am trying to solve is to not leave "debris" files when the program exits.
Paul
Attachments
vmstmr3.c
(1.56 KiB)


Post by hein » Sat May 25, 2024 12:11 am

Well, it turns out I have only intermittent success with fflush.
In general it fails for me as well with "operation already in progress!"
It seems to matter a bit whether I wait a second before calling it, but waiting long does not guarantee success. Odd!
It appears to be a timing thing, but not I/O related?
For this case the C RTL uses RMS block I/O, not record I/O, so multi-block count and write-behind settings do not matter.
RMS never shows an error like 'busy' in the RAB, but I suppose it could return that in R0 if SYS$WRITE is called while a SYS$WRITE is already active; such a write would take milliseconds, though, not the seconds I waited for.

I tried with: HP C V7.3-009 on OpenVMS Alpha V8.4, DECC$SHR "V8.4-00"

FWIW the C RTL manual says:
"The close function tries to write buffered data by using an implicit call to fflush....
If your program needs to recover from errors when flushing buffered data, it
should make an explicit call to fsync (or fflush) before calling close."

NOTE: When I mentioned "16KB writes", know that this is directly related to one's process SET RMS/BLOCK setting.

That's all I have for now.
Hein


Post by alexwong » Sat May 25, 2024 3:41 am

I'm a big fan of a library called exceptions4c, which emulates try-catch-finally in C. It's basically syntactic-sugar macros for the sigxxxjmp functions and works very well. It handles self-defined exceptions and Unix signals as well. The library can be found at https://github.com/guillermocalvo/exceptions4c . With it, the file closed successfully without fflush. Tested on x86 9.2-2 vmdk only.

<copy e4c.c, e4c.h to your source dir>
$ cc vmstmr3,e4c
$ lin vmstmr3,e4c
$ r vmstmr3
timeout value: 3
Waiting...

Timeout occurred!
Close successful!
Attachments
VMSTMR3.C
(2.11 KiB)
Last edited by alexwong on Sat May 25, 2024 3:44 am, edited 1 time in total.


Post by hein » Sat May 25, 2024 11:34 am

Good to hear that.
But be aware that I _thought_ I had a workaround which did fine on the first test, but failed most of the time afterwards.
So maybe run it a few times?
And as always, did you verify that it failed in your environment before you made changes?
Hein.


Post by alexwong » Sat May 25, 2024 7:34 pm

That's a valid point; I assumed OP was on an x86 VM. I re-ran his code on my VM and it did fail as well, with:

Timeout occurred!
Waiting...
Close error -1 bad file number!
Remove error -1 file currently locked by another user!

The root cause of the problem is stack invalidation in main thread after siglongjmp(). See discussion at
https://stackoverflow.com/questions/796 ... mp-longjmp

A quick work-around is to declare the moo variable as global but I think refactoring the code with e4c is much cleaner.
Attachments
VMSTMR3_original_fixed.C
(1.67 KiB)


Post by jonesd » Sun May 26, 2024 11:05 am

Add a call to decc$set_reentrancy(C$C_AST) to your main() function (include <reentrancy.h> header file as well).

Added in 10 hours 40 minutes 18 seconds:
alexwong wrote (Sat May 25, 2024 7:34 pm):
>>> The root cause of the problem is stack invalidation in main thread after siglongjmp(). See discussion at
>>> https://stackoverflow.com/questions/796 ... mp-longjmp
>>>
>>> A quick work-around is to declare the moo variable as global but I think refactoring the code with e4c is much cleaner.

The root cause is that the RTL checks for re-entrant calls to the I/O functions and fails them unless you explicitly tell it to behave in a re-entrant-safe manner. A secondary problem is that the compiler may optimize away storage for the pointer variable moop, keeping it only in a register that the long jump blows away. Declaring the variable global or static will force allocation, as will compiling the code /noopt or refactoring when moop is initialized.
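
A minimal sketch of the reentrancy call, guarded so that it still compiles on other platforms. The wrapper name enable_ast_reentrancy and the -1 sentinel for non-VMS builds are assumptions made for this sketch; decc$set_reentrancy, C$C_AST and <reentrancy.h> are the names given above.

```c
#ifdef __VMS
#include <reentrancy.h>                 /* decc$set_reentrancy, C$C_AST */
#endif

/* Hypothetical wrapper. On OpenVMS it asks the C RTL for AST reentrancy and
   returns whatever decc$set_reentrancy returns. Elsewhere it does nothing
   and returns -1, a sentinel chosen only for this sketch. */
int enable_ast_reentrancy(void)
{
#ifdef __VMS
    return decc$set_reentrancy(C$C_AST);
#else
    return -1;                          /* not OpenVMS: nothing to configure */
#endif
}
```

The call would go at the top of main(), before the first I/O function is used.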
Last edited by jonesd on Sun May 26, 2024 10:12 pm, edited 1 time in total.


Post by alexwong » Mon May 27, 2024 7:55 am

Interesting...I tried this,

MOO moo; // declare in main
if (argc > 1) timeout = atoi(argv[1]);
int status=decc$set_reentrancy (C$C_AST);

Still failed with the same errors, but worked if moo is declared global (there is no moop). Seems like a stack-overwrite problem, but I'd like to learn more about this. I've read the C RTL doc and need help relating it to the issue here.
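
As background for the local-versus-global difference, here is a small sketch of the standard C rule involved: an automatic variable that is modified between setjmp and longjmp has a dependable value afterwards only if it is declared volatile, while objects with static storage are not affected. The names below are made up for illustration, and the sketch does not claim to explain the failure seen here.

```c
#include <setjmp.h>

static jmp_buf env;
static int global_counter;              /* static storage: not held only in a register */

/* Returns the sum of both counters as seen after the longjmp. */
int counters_after_longjmp(void)
{
    volatile int local_counter = 0;     /* volatile forces the update to memory */

    global_counter = 0;
    if (setjmp(env) == 0) {
        local_counter = 21;             /* both modified after setjmp ...       */
        global_counter = 21;
        longjmp(env, 1);                /* ... and before longjmp               */
    }
    /* Without volatile, local_counter would be indeterminate here. */
    return local_counter + global_counter;
}
```

With the volatile qualifier in place the function returns 42; dropping it leaves the result up to the optimizer.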


Post by theoldman » Mon May 27, 2024 12:39 pm

Thank you Alexwong, Jonesd, and Hein for your time and ideas; I appreciate them greatly and will look into your suggestions. Our OpenVMS customers run on AXP, IA, and x86 (we have VAX customers, but they do not use the affected products there). The actual code is a bit more complex, but the concept is the same.

1. Main allocates a control block and eventually calls function A.

2. A allocates a main control block that is used in all calls beneath it and hooks it into Main's cb.

3. It calls the alarm setter which sets up the sigsetjmp jmpbuf, the SIGALRM, calls alarm and returns.

4. A then runs along, going deeply into functions, and eventually gets to one that needs a file, function B.

5. B allocates a cb and puts its address into an array of pointers hanging off of A's main cb.

6. B calls another function which opens the file and puts its file pointer into B's cb.

7. B runs along until it is interrupted by the alarm which calls the siglongjmp and puts us back into A right after where sigsetjmp was called. A returns an error to its caller.

8. A gets called to clean up the mess and eventually tries to fclose the file.

In banging on this earlier it did occur to me that perhaps the alarm interrupt pops in the middle of the fwrite and something that needs to be written to the file system internal cb gets messed up due to the siglongjmp. In the "real" code we are not pounding fwrites; there is much activity in between. So if that was the case I would expect this error to be sporadic, not perfectly reproducible. I am going to simplify my test to do just a few fwrites, then wait on getchar() for the alarm to pop and see if fclose() works.

Thank you again for your time and interest.
Paul
