Bit as a key
Topic author - Contributor
Bit as a key
We have a file that uses the bits of a field to determine what to do with an account in particular situations.
I want to skip all the accounts where the 5th bit is set to a certain value, and I was wondering whether it's possible to create an alternate key on that specific bit of the field so those accounts can be bypassed.
The field is one character in length, but I'm only interested in the 5th bit.
Re: Bit as a key
Most programming languages can test a single bit. Below is an example in Fortran (the language I know best):
      program bit5
      character*1 ch
      open(unit=1, file='your file', status='old')
      read(1, '(a)') ch
! ICHAR gives the character code; BTEST numbers bits from 0,
! so the 5th bit (value 16) is bit position 5-1 = 4.
      if (btest(ichar(ch), 5-1)) then
!        ... action for 5th bit set
      else
!        ... action for 5th bit not set
      endif
      close(1)
      end
Last edited by joukj on Thu Mar 30, 2023 2:41 am, edited 1 time in total.
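The same test in C, for readers not using Fortran: a minimal sketch, not taken from the thread, that checks the 5th bit with a shift and mask and skips matching accounts during a sequential pass. The record layout (`struct account` with `id` and `flags`) is hypothetical, and an in-memory array stands in for records read from the file. Bits are counted from 1 at the least significant end, so the 5th bit has the value 16, matching BTEST position 5-1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record layout; only the one-character flags field matters. */
struct account {
    long id;
    unsigned char flags;   /* the one-character field holding the bits */
};

/* Test bit n of a byte, counting from 1 at the least significant bit,
   so n = 5 tests the bit with value 16 (like Fortran BTEST(x, 5-1)). */
static int bit_is_set(unsigned char field, int n)
{
    assert(n >= 1 && n <= 8);
    return (field >> (n - 1)) & 1;
}

/* Sequential pass over records already read into memory: count the
   accounts that are processed, i.e. NOT skipped because their 5th bit
   equals skip_value (0 or 1). */
static size_t count_processed(const struct account *recs, size_t nrecs,
                              int skip_value)
{
    size_t processed = 0;
    for (size_t i = 0; i < nrecs; i++) {
        if (bit_is_set(recs[i].flags, 5) == skip_value)
            continue;              /* skip this account */
        processed++;               /* real per-account work would go here */
    }
    return processed;
}
```

Every record is still read; the bit test only decides what to do with it, which is why this is not the same thing as an RMS key.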
Re: Bit as a key
> We have a file [...]
And you're reading it how, exactly? Some high-level language
built-ins? RMS system services? Other?
> [...] alternate key on the specific bit in the field [...]
I know nothing, but I don't see how you'd specify it. According to
the "VSI OpenVMS Record Management Services Reference Manual":
https://vmssoftware.com/docs/VSI_RMS_Ref_Manual_23Jul19.pdf
7.10. RAB$B_KSZ Field
The key size (KSZ) field contains a numeric value equal to the
size, in bytes, of the record key pointed to by the RAB$L_KBF
field.
How would you specify a key size of one bit, "in bytes"?
Knowing what I know, I'd guess that you've compacted your data too
much to make that kind of selection easy.
"See the Guide to OpenVMS File Applications for more information
about accessing indexed records."
> Most programming languages have this option. [...]
You can examine a bit in any language where you can do arithmetic,
but that's not using it as an RMS key. (Which is how I read the
original question.)
VSI Expert - Valued Contributor
Re: Bit as a key
It's -possible-, but probably not the way you'd normally think to do it. As the previous poster stated, the smallest part of a record you can define as a key is a single byte, in your case the character field you referenced. If it's really important to be able to access those records via that bit, and not just as part of other queries, the solution I see is to create a new field in the record that is populated with the value of that fifth bit only. This means corresponding changes to the application to keep the field up to date.
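A minimal C sketch of keeping such a derived field up to date, assuming a hypothetical record (`struct account_rec`) with the original `flags` byte plus a new one-byte `bit5_key` field on which the alternate key would be defined. All updates go through one routine (`set_flags`, also hypothetical) that refreshes the derived byte; the 'Y'/'N' encoding is an arbitrary choice.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record: original flags byte plus a new one-byte key field. */
struct account_rec {
    char id[10];
    unsigned char flags;   /* original field; only the 5th bit matters here */
    char bit5_key;         /* new field: 'Y' if the 5th bit is set, else 'N' */
};

#define BIT5_MASK 0x10u    /* 5th bit counting from 1 = value 16 */

/* Derive the key byte from the flags byte. */
static char bit5_key_from_flags(unsigned char flags)
{
    return (flags & BIT5_MASK) ? 'Y' : 'N';
}

/* Single update path: change flags and keep the derived key field in step,
   so an alternate key on bit5_key never disagrees with the flags byte. */
static void set_flags(struct account_rec *rec, unsigned char new_flags)
{
    assert(rec != NULL);
    rec->flags = new_flags;
    rec->bit5_key = bit5_key_from_flags(new_flags);
}
```

A one-time conversion of the existing file would call `bit5_key_from_flags` once per record to populate the new field.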
Master - Location: Rhode Island, USA
Re: Bit as a key
As several posters have already stated, index keys are defined by a byte offset and a byte length, so you cannot simply add an index on a single bit of the file.
I see 3 options:
A) Live with the cost of a sequential scan of the entire file whenever you need to search on those bit values. If the bit in question is on/off 50%/50% and randomly distributed, the overhead of a sequential scan is probably not that big (everything would have to be read from disk into memory anyway).
B) Convert the file: for example, change the 1 byte (holding 8 bit values) into 8 bytes and create indexes on the bytes you need indexes on. That is a clean solution, but every application using the file will need to change its file definition and its handling of this data from bits to bytes (changing all applications may or may not be a problem).
C) Create a new, separate file containing the primary key from the original file plus the 8 bytes, indexed as above. Existing applications run as always; you then run an index generator that populates the new file from the original file, and the application that needs fast access uses the new file. The huge drawback is that this index is not updated automatically and can be inaccurate if it is not repopulated before use.
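Option B's conversion step amounts to unpacking the flag byte into one byte per bit. A minimal C sketch, assuming bits are numbered 1 to 8 from the least significant end and that each unpacked byte holds the character '0' or '1' (an illustrative encoding); `unpack_flags` and `pack_flags` are hypothetical helper names. The same unpacking would also serve an index generator for option C.

```c
#include <assert.h>

/* Unpack one flag byte into 8 one-byte fields: out[0] is bit 1 (value 1),
   ..., out[7] is bit 8 (value 128). Each output byte is '0' or '1'. */
static void unpack_flags(unsigned char flags, char out[8])
{
    for (int n = 0; n < 8; n++)
        out[n] = ((flags >> n) & 1) ? '1' : '0';
}

/* Inverse, for code that still expects the packed byte. */
static unsigned char pack_flags(const char in[8])
{
    unsigned char flags = 0;
    for (int n = 0; n < 8; n++) {
        assert(in[n] == '0' || in[n] == '1');
        if (in[n] == '1')
            flags |= (unsigned char)(1u << n);
    }
    return flags;
}
```

After conversion the 5th bit lives in its own byte (`out[4]`), which can be the subject of a one-byte alternate key.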
Re: Bit as a key
>> I want to skip all the accounts where the 5th bit is set to a certain value
Yeah, it seems we all made those stupid bit-saving design choices back in the day, and now we are stuck with them.
>> and was wondering if it's possible to create an alternate key on the specific bit in the field I'm trying to bypass.
NO can do.
>> The field is one character in length but I'm only focused on the 5th bit.
Are bits 6, 7 or 8 in use? If not, you are in luck: you could use the byte value 16 as a cut-off point (every value from 16 up has the 5th bit set). If they are, you could consider multiple ranges, such as 48 (32+16) through 63 for when bit 6 is set.
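The cut-off idea can be worked out mechanically: given which bits can ever be set, list the contiguous ranges of byte values that have the 5th bit set. A key on the whole one-byte field could then be read range by range. A minimal C sketch; `used_mask` (the bits that may be non-zero) is an assumption the caller has to supply from knowledge of the data, and the function and type names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct byte_range { unsigned lo, hi; };   /* inclusive range of byte values */

/* Collect the contiguous ranges of byte values v (0..255) that can occur
   (v uses only bits in used_mask) and have the 5th bit (value 16) set.
   Returns the number of ranges written to out (at most max_out). */
static size_t bit5_ranges(unsigned used_mask, struct byte_range *out,
                          size_t max_out)
{
    size_t count = 0;
    int in_range = 0;
    for (unsigned v = 0; v <= 256; v++) {   /* v == 256 closes a final range */
        int match = v < 256 && (v & ~used_mask) == 0 && (v & 0x10u) != 0;
        if (match && !in_range) {           /* a new range starts at v */
            assert(count < max_out);
            out[count].lo = v;
            in_range = 1;
        } else if (!match && in_range) {    /* the range ended at v - 1 */
            out[count].hi = v - 1;
            count++;
            in_range = 0;
        }
    }
    return count;
}
```

With only bits 1 to 5 in use (`used_mask` 0x1F) this yields the single range 16 to 31, i.e. everything from 16 up; with bit 6 also in use (0x3F) it yields 16 to 31 and 48 to 63.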
But back to the main question: what is the goal? Faster processing? Fewer resources while processing? How many rows are we talking about? If it is not hundreds of thousands, then it's unlikely to be worth the effort; and if it is hundreds of thousands, then an alternate key with just two values is utterly useless unless combined with a 'null key value', for which NO index entries are made.
It's all about the distribution. In my experience, even with a 90/10 distribution it is better to just read everything and discard 90% than to read the 10% by key. This is because reading BY PRIMARY KEY typically gets 10+, possibly 100, rows per I/O and will likely prime the read-ahead in the XFC and storage caches. Reading by alternate key, on the other hand, is likely to need a random access for each row.
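The distribution argument can be put into rough numbers. A back-of-the-envelope C sketch, assuming one I/O per data bucket of `rows_per_bucket` rows for a full scan and one random I/O per selected row for the keyed read (index I/Os and cache effects are ignored); the function names and example figures are illustrative, not measurements.

```c
#include <assert.h>

/* Full sequential scan: each data bucket is read once and holds about
   rows_per_bucket rows, so the I/O count is rows / rows_per_bucket, rounded up. */
static unsigned long long scan_ios(unsigned long long rows,
                                   unsigned long long rows_per_bucket)
{
    assert(rows_per_bucket > 0);
    return (rows + rows_per_bucket - 1) / rows_per_bucket;
}

/* Read by alternate key: assume one random data I/O per selected row;
   index-bucket I/Os and cache hits are ignored in this rough model. */
static unsigned long long keyed_ios(unsigned long long rows,
                                    unsigned percent_selected)
{
    assert(percent_selected <= 100);
    return rows * percent_selected / 100;
}
```

For 1,000,000 rows, a scan at 50 rows per bucket needs about 20,000 I/Os, while reading 10% of the rows by alternate key needs about 100,000 random I/Os; even at only 10 rows per bucket the scan merely breaks even (100,000 each), and sequential I/O also benefits from read-ahead.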
Now let's say this is a tasks file and the bit signifies 'done'. In that case there are likely thousands, if not millions, of 'done' tasks accumulated over time, and maybe just tens or hundreds of tasks still to be done. Here a key IS useful, but you'll have to convert the application's record structure and make it a BYTE, not a bit, which is unlikely to be an acceptable solution after 20? 30? 40? years in production.
If you do redesign, recompile, rebuild, convert, and deploy, then make sure that the byte value chosen for 'done' is declared as the NULL KEY value in RMS/FDL to avoid getting 'duplicate chains'. Those are disastrous for insert performance, often causing an additional I/O for every 1000 rows already present. Yes, I've seen cases where adding a row to a 10-million-row file caused 50,000 I/Os and took many seconds to find where to insert that next 'done' row pointer, at the end of the chain of all 'done' row pointers (7 bytes each).
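A toy model of why the NULL KEY value matters: with a null key value, records whose key byte equals that value get no alternate-index entry at all, so only the few 'not done' records are indexed; without it, every 'done' record adds a pointer to one long duplicate chain. A minimal C sketch that counts index entries and estimates how many buckets such a chain spans, using the 7-byte pointer size from the post and a hypothetical bucket size; it does not model RMS's actual bucket layout, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define PTR_BYTES 7u   /* size of one duplicate pointer, per the post above */

/* Count alternate-index entries for a one-byte status key.
   With use_null_key, records whose key equals null_value are not indexed. */
static size_t index_entries(const unsigned char *keys, size_t n,
                            int use_null_key, unsigned char null_value)
{
    size_t entries = 0;
    for (size_t i = 0; i < n; i++)
        if (!(use_null_key && keys[i] == null_value))
            entries++;
    return entries;
}

/* Rough number of buckets a duplicate chain of dup_count pointers spans,
   assuming bucket_bytes of usable space per bucket (hypothetical figure). */
static unsigned long long chain_buckets(unsigned long long dup_count,
                                        unsigned long long bucket_bytes)
{
    assert(bucket_bytes >= PTR_BYTES);
    unsigned long long per_bucket = bucket_bytes / PTR_BYTES;
    return (dup_count + per_bucket - 1) / per_bucket;
}
```

With a hypothetical 14-block (7,168-byte) bucket holding 1,024 pointers, a chain of 10 million 'done' pointers spans about 9,766 buckets, roughly one bucket per thousand rows.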
Hope this helps some,
Hein.
Master - Location: Rhode Island, USA
Re: Bit as a key
The typical main application on a VMS system is old ('main application' meaning the application that is the reason the VMS system is there, as opposed to the various secondary applications that have been added over time because the VMS system was there and needed various integrations/functionality).
My gut feeling is that the majority is from 1980-1995.
That was a different time. Much slower CPUs, much less memory and much smaller disks: XX MHz with XX MB of RAM and a bunch of XXX MB disks (as opposed to N cores at X GHz, XXX GB of RAM and a bunch of X TB disks today).
Choices were made to work in that environment.
When more powerful hardware arrived, many of those choices should have been revisited and the code changed.
But getting funding to redo something that already works is not always easy.
So we have these cases.