sigsetjmp/siglongjmp and open file
-
Topic author - Member
- Posts: 6
- Joined: Fri Feb 26, 2021 5:48 pm
- Reputation: 0
- Status: Offline
Overview (test program attached):
1. sigsetjmp(jmpbuf,1);
2. fopen a file
3. signal(SIGALRM,TOSTMER_TRAP);
4. copy jmpbuf to static so TOSTMER_TRAP can use it
5. alarm(secs)
6. Alarm pops, TOSTMER_TRAP is called, resets SIGALRM, calls siglongjmp, and control goes to the end of the program
7. Attempt to fclose() file, get "mount point busy"
8. Attempt to remove() file, get "file locked by another process" even though this is the only process using it.
Somehow the siglongjmp back to the sigsetjmp point is causing the file descriptor to get confused. Note that FILE *fp is in a structure and so is not in a register saved by sigsetjmp. This works on the various unix varieties (linux, solaris, aix).
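For readers without the attachment, the numbered steps can be sketched roughly as below. This is not the attached vmstmr3.c: the names MOO, g_moo, timer_trap and timeout_then_close are hypothetical, the jump buffer is filled directly in static storage instead of being copied, and the wait uses pause() rather than a write loop so that the handler never interrupts a stdio call.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical stand-in for the structure that holds the FILE pointer. */
typedef struct { FILE *fp; } MOO;

static sigjmp_buf g_jmpbuf;   /* static so the handler can reach it */
static MOO g_moo;             /* static storage survives the jump */

/* SIGALRM handler: jump back to the sigsetjmp point. */
static void timer_trap(int sig)
{
    (void)sig;
    siglongjmp(g_jmpbuf, 1);
}

/* Returns 0 if both fclose() and remove() succeed, -1 otherwise. */
int timeout_then_close(const char *path, unsigned secs)
{
    int rc;

    g_moo.fp = NULL;
    if (sigsetjmp(g_jmpbuf, 1) == 0) {        /* step 1 */
        g_moo.fp = fopen(path, "w");          /* step 2 */
        if (g_moo.fp == NULL)
            return -1;
        fputs("moo\n", g_moo.fp);
        signal(SIGALRM, timer_trap);          /* step 3 */
        alarm(secs);                          /* step 5 */
        for (;;)
            pause();                          /* wait for the alarm */
    }

    /* Step 6: we arrive here through siglongjmp. */
    signal(SIGALRM, SIG_DFL);
    rc = fclose(g_moo.fp);                    /* step 7 */
    if (rc != 0)
        perror("fclose");
    if (remove(path) != 0) {                  /* step 8 */
        perror("remove");
        rc = -1;
    }
    return rc;
}
```

Calling timeout_then_close("moo.tmp", 3) from main() waits about three seconds and then attempts the close and remove.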
Any insight is appreciated,
Regards,
Paul
- Attachments
-
- vmstmr3.c
- (1.38 KiB) Downloaded 34 times
Re: sigsetjmp/siglongjmp and open file
[Major edit]
The code confirms a simple, unshared C open, which means there is hardly any RMS involved.
It's just an OPEN for block I/O, with the CRTL doing 16KB writes when the buffer is full or flushed.
The "mount point busy" can be read as 'unflushed data to deal with'
Now you may wonder why fclose does not do it all, and I think that is a fair question, but this is how it is.
Maybe there is an 'auto-flush-on-close' environment variable, but the solid fix is to add
fflush(moo.fp) # with accoutrements as deemed needed
just before the fclose.
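A minimal sketch of that suggestion, assuming a hypothetical helper named flush_and_close (it is not part of the test program). It reports the flush and the close separately so that it is clear which of the two fails:

```c
#include <stdio.h>

/* Flush buffered data explicitly, then close.
   Returns 0 only if both calls succeed. */
int flush_and_close(FILE *fp)
{
    int rc = 0;

    if (fflush(fp) != 0) {
        perror("fflush");
        rc = -1;
    }
    /* Close even after a failed flush so the stream is released. */
    if (fclose(fp) != 0) {
        perror("fclose");
        rc = -1;
    }
    return rc;
}
```

In the test program this would be called as flush_and_close(moo.fp) in place of the bare fclose.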
Hein.
-------- old reply -----------
>>> 7. Attempt to fclose() file, get "mount point busy"
We need to figure out what the underlying RMS error is.
I may find time with your code.
Without altering code, one can try set debugger breakpoint after CLOSE; ANALYZE/SYSTEM; SET PROC x; SHOW PROC/RMS=(FAB,RAB)
I did, and it was all marked as success. See below.
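From inside the program, one way to get closer to the underlying status is to print errno together with the VMS-specific value. The helper below is only a sketch: report_stdio_error is a hypothetical name, and the vaxc$errno line assumes the OpenVMS C RTL exposes that variable through <errno.h>; it is compiled only when __VMS is defined.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Print the C-level error for a failed stdio call and return errno.
   On OpenVMS, also print vaxc$errno (assumed to hold the VMS
   condition value behind the C error). */
int report_stdio_error(const char *what)
{
    int saved = errno;   /* capture before fprintf can change it */

    fprintf(stderr, "%s failed: errno=%d (%s)\n",
            what, saved, strerror(saved));
#ifdef __VMS
    fprintf(stderr, "  vaxc$errno=0x%08X\n", (unsigned)vaxc$errno);
#endif
    return saved;
}
```

It would be called right after a failing call, for example: if (fclose(moo.fp) != 0) report_stdio_error("fclose");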
>>> 8. Attempt to remove() file, get "file locked by another process" even though this is the only process using it.
This is, and has been since the beginning of time, a poorly worded message text. "Another process" is really another "channel", which can be, and often is, in your own process.
Hein
FAB Address: 00054D10
-----------
BID: 03 3.
BLN: 50 80.
STS: 00010001 ALQ: 00000000
STV: 000000D0 DEQ: 0000
FAC: 5B PUT,GET,UPD,TRN,BRO
SHR: 00
ORG: 00 Sequential
RAT: 02 CR
RFM: 05 STMLF
RAB Address: 000575A0
-----------
BID: 01 1. ISI: 0001
ROP: 00000E00 RAH,WBH,BIO
CTX: 00054BB0 RAC: 00 SEQ
STS: 00010001 RFA: 00000C41,0000
STV: 00000000
USZ: 4000 16384. UBF: 00058000
Last edited by hein on Thu May 23, 2024 10:15 am, edited 2 times in total.
Re: sigsetjmp/siglongjmp and open file
Hi Hein,
I had tried the fflush() before the close in an earlier version of the test case (attached) and it gave me
operation already in progress
I tried allocating the MOO structure instead of a stack variable, no change. So far all I can think of is that the program state reset done by siglongjmp *somehow* interacts with the FILE data structure which seems rather farfetched.
Thank you for taking the time to look at this. The issue that I am trying to solve is to not leave "debris" files when the program exits.
Paul
- Attachments
-
- vmstmr3.c
- (1.56 KiB) Downloaded 28 times
Re: sigsetjmp/siglongjmp and open file
Well, it turns out I have only intermittent success with fflush.
In general it fails for me as well with "operation already in progress!"
It seems to matter a bit whether I wait a second before calling it, but waiting long does not guarantee success. Odd!
It appears to be a timing thing, but not IO related?
For this case the C-RTL uses RMS Block IO, not record IO, so multi-block count and write-behind settings do not matter.
RMS never shows an error like 'busy' in the RAB, but I suppose it could return that in R0 if SYS$WRITE is called while a SYS$WRITE is already active. Such a write would take milliseconds, though, not the seconds I waited.
I tried with : HP C V7.3-009 on OpenVMS Alpha V8.4 DECC$SHR "V8.4-00"
FWIW the C RTL manual says:
"The close function tries to write buffered data by using an implicit call to fflush....
If your program needs to recover from errors when flushing buffered data, it
should make an explicit call to fsync (or fflush) before calling close."
NOTE: When I mentioned "16KB writes", note that this is directly related to one's process SET RMS/BLOCK setting.
That's all I have for now.
Hein
-
- Active Contributor
- Posts: 27
- Joined: Tue Apr 23, 2024 6:28 am
- Reputation: 0
- Status: Offline
Re: sigsetjmp/siglongjmp and open file
I'm a big fan of a library called exceptions4c, which emulates try-catch-finally in C. It's basically syntactic-sugar macros over the sigxxxjmp functions and works very well. It handles self-defined exceptions and Unix signals as well. The library can be found at https://github.com/guillermocalvo/exceptions4c . With it, the file closed successfully without fflush. Tested on an x86 9.2-2 vmdk only.
<copy e4c.c, e4c.h to your source dir>
$ cc vmstmr3,e4c
$ lin vmstmr3,e4c
$ r vmstmr3
timeout value: 3
Waiting...
Timeout occurred!
Close successful!
- Attachments
-
- VMSTMR3.C
- (2.11 KiB) Downloaded 26 times
Last edited by alexwong on Sat May 25, 2024 3:44 am, edited 1 time in total.
Re: sigsetjmp/siglongjmp and open file
Good to hear that.
But be aware that I _thought_ I had a workaround which worked on the first test but failed most of the time afterwards.
So maybe run it a few times?
And as always, did you verify that it failed in your environment before you made changes?
Hein.
Re: sigsetjmp/siglongjmp and open file
That's a valid point, I assumed OP was on an x86 vm. Re-ran his code on my vm and it did fail as well with:
Timeout occurred!
Waiting...
Close error -1 bad file number!
Remove error -1 file currently locked by another user!
The root cause of the problem is stack invalidation in main thread after siglongjmp(). See discussion at
https://stackoverflow.com/questions/796 ... mp-longjmp
A quick work-around is to declare the moo variable as global but I think refactoring the code with e4c is much cleaner.
- Attachments
-
- VMSTMR3_original_fixed.C
- (1.67 KiB) Downloaded 22 times
Re: sigsetjmp/siglongjmp and open file
Add a call to decc$set_reentrancy(C$C_AST) to your main() function (include <reentrancy.h> header file as well).
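A sketch of what that call could look like, guarded so the same source still builds elsewhere. The wrapper name enable_ast_reentrancy is hypothetical; decc$set_reentrancy, C$C_AST and <reentrancy.h> are taken from the suggestion above, and the code assumes the compiler defines __VMS on OpenVMS.

```c
#ifdef __VMS
#include <reentrancy.h>   /* declares decc$set_reentrancy and C$C_AST */
#endif

/* Ask the C RTL for AST-level reentrancy on OpenVMS.
   Returns 1 if the request was made, 0 on other platforms. */
int enable_ast_reentrancy(void)
{
#ifdef __VMS
    decc$set_reentrancy(C$C_AST);
    return 1;
#else
    return 0;
#endif
}
```

The call belongs near the top of main(), before any stream is opened.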
Added in 10 hours 40 minutes 18 seconds:
alexwong wrote (Sat May 25, 2024 7:34 pm):
>>> The root cause of the problem is stack invalidation in main thread after siglongjmp(). See discussion at
>>> https://stackoverflow.com/questions/796 ... mp-longjmp
>>> A quick work-around is to declare the moo variable as global but I think refactoring the code with e4c is much cleaner.
The root cause is that the RTL checks for re-entrant calls to the I/O functions and fails them unless you explicitly tell it to behave in a re-entrant-safe manner. A secondary problem is that the compiler may optimize away storage for the pointer variable moop, keeping it only in a register that the long jump blows away. Declaring the variable global or static will force allocation, as will compiling the code /noopt or refactoring when moop is initialized.
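The register point can be illustrated without signals or files. In standard C, a non-volatile automatic variable that is modified after setjmp has an indeterminate value once longjmp returns; volatile, static or global storage avoids that. The sketch below uses plain setjmp/longjmp and hypothetical names (env, bail, value_after_jump):

```c
#include <setjmp.h>

static jmp_buf env;

/* Unwind to the setjmp point. */
static void bail(void)
{
    longjmp(env, 1);
}

/* Returns the value stored between setjmp and longjmp. */
int value_after_jump(void)
{
    /* volatile forces the store to memory, so the value is still
       defined after the jump; without it the compiler may keep the
       variable in a register that longjmp restores to its old state. */
    volatile int progress = 0;

    if (setjmp(env) == 0) {
        progress = 42;   /* modified after setjmp */
        bail();          /* does not return */
    }
    return progress;
}
```

The same reasoning applies to a FILE pointer assigned after sigsetjmp: keeping it in static, global or volatile storage makes sure the cleanup code sees the value that fopen returned.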
Last edited by jonesd on Sun May 26, 2024 10:12 pm, edited 1 time in total.
Re: sigsetjmp/siglongjmp and open file
Interesting...I tried this,
MOO moo; // declare in main
if (argc > 1) timeout = atoi(argv[1]);
int status=decc$set_reentrancy (C$C_AST);
Still failed with the same errors, but worked if moo is declared global (there is no moop). Seems like a stack-overwrite problem, but I'd like to learn more about this. I've read the C RTL doc and need help relating it to the issue here.
Re: sigsetjmp/siglongjmp and open file
Thank you Alexwong, Jonesd, and Hein for your time and ideas, I appreciate them greatly and will look into your suggestions. Our OpenVMS customers run on AXP, IA, and x86 (we have VAX customers but they do not use the affected products there). The actual code is a bit more complex but the concept is the same.
1. Main allocates a control block and eventually calls function A.
2. A allocates a main control block that is used in all calls beneath it and hooks it into Main's cb.
3. It calls the alarm setter which sets up the sigsetjmp jmpbuf, the SIGALRM, calls alarm and returns.
4. A then runs along, going deeply into functions, and eventually gets to one that needs a file, function B.
5. B allocates a cb and puts its address into an array of pointers hanging off of A's main cb.
6. B calls another function which opens the file and puts its file pointer into B's cb.
7. B runs along until it is interrupted by the alarm which calls the siglongjmp and puts us back into A right after where sigsetjmp was called. A returns an error to its caller.
8. A gets called to clean up the mess and eventually tries to fclose the file.
In banging on this earlier it did occur to me that perhaps the alarm interrupt pops in the middle of the fwrite and something that needs to be written to the file system internal cb gets messed up due to the siglongjmp. In the "real" code we are not pounding fwrites; there is much activity in between. So if that was the case I would expect this error to be sporadic, not perfectly reproducible. I am going to simplify my test to do just a few fwrites, then wait on getchar() for the alarm to pop and see if fclose() works.
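If the interrupted fwrite does turn out to matter, one alternative (a different technique from the siglongjmp approach described above, shown only for comparison) is to have the handler set a flag and let the writing code notice it between calls, so that no stdio call is ever abandoned halfway. The names g_timed_out, on_alarm and write_until_timeout are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t g_timed_out = 0;

/* SIGALRM handler: only record that the timeout happened. */
static void on_alarm(int sig)
{
    (void)sig;
    g_timed_out = 1;
}

/* Write records until the alarm fires, then close and remove the file.
   Returns 0 if both fclose() and remove() succeed, -1 otherwise. */
int write_until_timeout(const char *path, unsigned secs)
{
    struct timespec nap = { 0, 10 * 1000 * 1000 };   /* 10 ms */
    FILE *fp = fopen(path, "w");
    int rc;

    if (fp == NULL)
        return -1;

    g_timed_out = 0;
    signal(SIGALRM, on_alarm);
    alarm(secs);

    while (!g_timed_out) {
        if (fwrite("moo\n", 1, 4, fp) != 4)
            break;
        nanosleep(&nap, NULL);   /* stands in for other work between writes */
    }

    alarm(0);                    /* cancel any pending alarm */
    rc = fclose(fp);
    if (remove(path) != 0)
        rc = -1;
    return rc;
}
```

The cost is that deeply nested code has to check the flag and return on its own, which may not fit the control-block structure described in the steps above.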
Thank you again for your time and interest.
Paul