Discussion:
CRTL and RMS vs SSIO
Greg Tinkler
2021-10-06 02:06:29 UTC
Permalink
I notice that SSIO (beta) is included in an upcoming V9.1 field test. So I read up on the issues it is trying to solve.

One concerning thing was having the CRTL (via SSIO) access XFC directly. From an architectural point of view this is wrong on so many levels, but if that is what needs to happen, then open it up so RMS and other code bases can use it.

The main reason stated was the need to do byte offset/count I/Os. Well, let's solve that first: change RMS by adding SYS$READB and SYS$WRITEB. These would be useful to all code using RMS.
SYS$READB would read from a byte offset for a count, returning the latest data from that byte range.
SYS$WRITEB would write from a byte offset for a count, updating the latest copy of the underlying blocks.

SYS$WRITEB needs to use the latest copy of the data, and could use the new SSIO interface to XFC, but RMS has its own methods for this.
It may seem like a big ask getting all the latest blocks, but if you think about it, it only needs to re-read the first and last blocks, and only if it does not already have the latest copy. There is no need at all if the offset starts at the beginning of a block and the write fills the last block.

By having these as part of RMS we ensure the blocks/buffers are coordinated, so any other user of RMS will see our changes, and we get theirs.

This seems to be at the core of the CRTL issue: it does NOT use RMS, nor does it synchronize its blocks/buffers, leading to the lost-update problem.

So with this ‘simple’ addition the CRTL could be altered to use RMS for all file I/O.

An extra that could be added: if the file is RFM=fixed, and the C code uses it that way with the same record length, then use SYS$GET/SYS$PUT so it will play nicely with RMS access to those files.

Anyway, just my 2 cents’ worth.

gt down under
Stephen Hoffman
2021-10-06 03:09:14 UTC
Permalink
Post by Greg Tinkler
I notice that SSIO (beta) is included in an upcoming V9.1 field test.
So I read up on the issues it is trying to solve.
One concerning thing was to have CRTL (via SSIO) access directly to
XFC. From an architectural point of view this is wrong at so many
levels,...
Off the top, some of the various existing stuff that breaks layering on
OpenVMS includes HBVS volume shadowing, MOUNT, and byte-range locking.

IP as a layered product is broken layering.

The C select() call is a fine mess of mis-layering.

The XQP design is mis-layering.

There are other examples.

There are examples of breaking layering to advantage, such as ZFS
else-platform.

All discussions of layering and esthetics aside, I presume the primary
purpose of the SSIO project is to permit porting PostgreSQL to OpenVMS,
posthaste.
--
Pure Personal Opinion | HoffmanLabs LLC
Greg Tinkler
2021-10-06 03:32:50 UTC
Permalink
Post by Stephen Hoffman
Off the top, some of the various existing stuff that breaks layering on
OpenVMS includes HBVS volume shadowing, MOUNT, and byte-range locking.
IP as a layered product is broken layering.
The C select() call is a fine mess of mis-layering.
The XQP design is mis-layering.
There are other examples.
There are examples of breaking layering to advantage, such as ZFS
else-platform.
All discussions of layering and esthetics aside, I presume the primary
purpose of the SSIO project is to permit porting PostgreSQL to OpenVMS,
posthaste.
Yup, exactly, hence get CRTL to use RMS which does work.

Re byte-range locking, why not just use hierarchical lock granularity (as Rdb does) to do the job? It is very efficient and has worked for decades, and there is no need to change the VMS DLM. Sure, it may be nice to have an API that does this for us, but hey, we are programmers.

gt
Stephen Hoffman
2021-10-06 15:09:20 UTC
Permalink
Post by Greg Tinkler
Yup, exactly, hence get CRTL to use RMS which does work.
For this case, RMS really doesn't work at all well. Says why right
there in the name, too. Record management, not stream management.

C and IP have both been tussling with mismatched assumptions within the
OpenVMS file system since the instantiation of C on OpenVMS, too.

Lately, I've been tussling with the record-oriented assumptions within
OpenVMS. Records just never got as far along as objects. And RMS
records are an unmitigated joy around upgrades and mixed-version
clusters.

The various stream-format files are one of the ensuing compromises here.
Post by Greg Tinkler
Re byte range locking, why not just use locking granularity (aka Rdb)
to do the job. Very efficient and has worked for decades, and no need
to change VMS DLM.
The use of Oracle Rdb isn't viable as a dependency for many folks, and
lock granularity doesn't work at all well for arbitrary and overlapping
locking ranges.
Post by Greg Tinkler
Sure it may be nice to have an API that does this for us, but hey we
are programmers.
I don't want us each writing and debugging and maintaining
range-locking code for what is part of the C standard library, but you
do you.

As much as I'd like a general range-locking solution here in DLM, and
with adding (better?) stream I/O support into RMS, and as much as I'd
like to see OO API support added, and IP integration, and app and app
security integration with sandboxes, packaging, and package management,
and a whole pile of other badly-needed work, I'd infer that the folks
at VSI really want PostgreSQL as an available database option soonest.

There's a very long history of "can-kicking" here, and a whole lot of
that is almost inherent and inevitable given the upward-compatibility
goals for the platform, with the resulting miasma far less visible to
those of us that have used OpenVMS for the past decade or three or
more, but front and center for any new developer looking at the
APIs, and for any wholly new 64-bit app work.
--
Pure Personal Opinion | HoffmanLabs LLC
Craig A. Berry
2021-10-06 12:40:07 UTC
Permalink
Post by Greg Tinkler
An extra that could be added, if the file is RFM=fixed, and the C
code uses it that way with the same record length then use the
SYS$GET/SYS$PUT so it will play nicely with an RMS access to those files.
I don't know the degree to which the current plan corresponds to the
original plan from a decade or so ago, but back then only stream files
were going to be supported by SSIO, which makes sense since the whole
point is locking byte ranges.
David Jones
2021-10-06 13:18:55 UTC
Permalink
Post by Craig A. Berry
Post by Greg Tinkler
An extra that could be added, if the file is RFM=fixed, and the C
code uses it that way with the same record length then use the
SYS$GET/SYS$PUT so it will play nicely with an RMS access to those files.
I don't know the degree to which the current plan corresponds to the
original plan from a decade or so ago, but back then only stream files
were going to be supported by SSIO, which makes sense since the whole
point is locking byte ranges.
Open source software ports often come with the restriction that they only work
with stream-LF files. Maybe they should add a flag to directory files that, if set,
only allows them to contain stream-LF or directory files.

I keep a stmlf.fdl file in my login directory to use for copying (i.e. convert/fdl=...)
text files to NFS shares.
John Dallman
2021-10-06 19:04:00 UTC
Permalink
Post by David Jones
Open source software ports often comes with the restriction that it
only works with stream-LF files. Maybe they should add a flag to
directory files that if set only allows it to contain stream-LF
or directory files.
People used to UNIX or Windows generally find the other VMS file types
baffling and confusing. I got used to the idea, but never made use of
them, since my employers already had fewer customers on VMS than they did
UNIX when I joined, and the disparity only increased.

John
Greg Tinkler
2021-10-07 01:25:57 UTC
Permalink
What a good conversation, some feedback.
Post by Arne Vajhøj
To be honest then I think the safest way to implement this is
to put lots of restrictions on when it is doable.
* No cluster support (announcement already states that!)
* Only FIX 512, STMLF and UDF are supported
* no mixing with traditional RMS calls
My point is that SSIO seems to be focused on just PostgreSQL, whereas an RMS solution is much easier to program, uses well-tested code, and is already cluster-ready, putting the team ahead of the game rather than building in issues for the future.
Post by Arne Vajhøj
I've a database product, a rather old product. At the time it was
implemented it was rather useful. But there was a locking issue. The
DLM locks resource names. The database would support I/O transfers of 1
to 127 disk blocks. How would one lock 127 contiguous disk blocks? The
blunt force method could be taking out 127 locks, not an optimum
solution. Having numeric range locking back in 1984 would have been
quite useful.
Yup, DLM uses resource names, but they can be hierarchical, like a B-Tree index. Also, the resources need only exist when needed, and are removed when not. The lock tree size depends on the lock contention.

This is why I made reference to Rdb: it uses this technique, and it is probably not the only one. NB each level controls a range of resources, and each level can have its own fan-out factor. The depth and lowest level are always dependent on the application’s requirements.

FYI, I am pretty sure RMS uses the RFA to lock a record; this is an implied range of one record.
Post by Arne Vajhøj
No matter what the disk can do then the VMS file system is still
block oriented and I believe the system services take block offsets
not byte offsets.
All disks are block based, even on Unix. With some SSDs, yes, you can do byte transfers, but this should be left to the driver to optimise. Also, with x86-64 it will be virtualised, so what the..
Post by Stephen Hoffman
For this case, RMS really doesn't work at all well. Says why right
there in the name, too. Record management, not stream management.
Well, yes and no. If you think about it, most Unix text I/O is record based, i.e. LF-terminated, and binary I/O is fixed-length records, though not necessarily the same length in every file.

RMS $GET and $PUT are record based, while $READ and $WRITE are block based; what is missing is $READB and $WRITEB, not just for the CRTL but useful for various applications.

RMS ISAM with fixed-length records is a pain. I have long argued that ISAM should support variable-length records; I don’t care if they are VFC or STMLF, and I would allow for both, as VFC could allow for binary variable-length records.

Likewise, the keys on an ISAM file should be able to be variable, based on a separator, e.g. “,” or <tab> or a combination.
Post by Stephen Hoffman
The use of Oracle Rdb isn't viable as a dependency for many folks, and
lock granularity doesn't work at all well for arbitrary and overlapping
locking ranges.
I think you will find that a B-Tree-style dynamic resource tree, similar to what Rdb uses, will work well. Any ‘byte range’ implementation will need some index to find interesting locks; DLM uses a hash, which is as efficient as you can get.
Post by Stephen Hoffman
Post by Greg Tinkler
Sure it may be nice to have an API that does this for us, but hey we
are programmers.
I don't want us each writing and debugging and maintaining
range-locking code for what is part of the C standard library, but you
do you.
NO, quite the opposite. I believe there is a POSIX standard for a locking API, and as VMS, sorry, OpenVMS, wishes to maintain its POSIX stamp, it should implement those APIs using the DLM underneath. NB the DLM is also already cluster-capable, but you know that.
Post by John Dallman
People used to UNIX or Windows generally find the other VMS file types
baffling and confusing.
I have always wondered why the CRTL did not have some smarts to present VFC records as STMLF and vice versa, effectively hiding the internal record structures. This could be done via open() using the VMS extension “rfm=stmlf”, which should be the default unless it is a binary file (“rfm=udf”). If the file is VFC then the CRTL could do the translation. Wishful thinking.

gt down under
Arne Vajhøj
2021-10-07 01:48:24 UTC
Permalink
Post by Greg Tinkler
What a good conversation, some feedback.
Post by Arne Vajhøj
To be honest then I think the safest way to implement this is
to put lots of restrictions on when it is doable.
* No cluster support (announcement already states that!)
* Only FIX 512, STMLF and UDF are supported
* no mixing with traditional RMS calls
My point is SSIO seems to be focused on just PostgreSQL, whereas an
RMS solution is much much easier to program, uses well tested code,
and is already cluster ready putting the team ahead of the game and
not building issues for the future.
I very much doubt that a full RMS solution is much easier.

:-)
Post by Greg Tinkler
Post by Arne Vajhøj
For this case, RMS really doesn't work at all well. Says why right
there in the name, too. Record management, not stream management.
Well yes and no. If you think about it most Unix text IO is record,
ie LF terminated, and binary is fixed records not necessarily the
same length in the file.
RMS for $GET and $PUT are record based, but $READ and $WRITE are
block based, missing is $READB and $WRITEB, not just for CRTL but
useful for various applications.
RMS ISAM with fixed length records is a pain, I have long argued ISAM
should support variable length records, don’t care if they are VFC or
STMLF, I would allow for both as VFC could allow for binary variable
length records.
????

Index-sequential files and the RMS API support variable length.

Not all language APIs on top of RMS do.
Post by Greg Tinkler
Post by Arne Vajhøj
The use of Oracle Rdb isn't viable as a dependency for many folks, and
lock granularity doesn't work at all well for arbitrary and overlapping
locking ranges.
I think you will find that a B-Tree-style dynamic resource tree, similar to
what Rdb uses, will work well. Any ‘byte range’ implementation will
need some index to find interesting locks, DLM uses hash which is as
efficient as you can get.
Hash is effective for finding exact matches but useless for finding
other matches aka "starting with". For those a tree is better.

Arne
Lawrence D’Oliveiro
2021-10-07 02:00:36 UTC
Permalink
Post by Greg Tinkler
All disks are block based, even on Unix.
The difference being, on *nix systems, the responsibility for blocking and deblocking is left to the filesystem layer. So if a file is n bytes long, and n mod «sector size» ≠ 0, the application never sees what is in the padding bytes, if any.

Some filesystems even implement “tail packing”, which means the leftover bits of multiple files can share the same block, all transparently to the application, minimizing fragmentation.

By the way, Linus Torvalds did apparently use a VMS system at some point. (Must have been after his Sinclair QL days.) Guess what reason he gave, when asked why he hated it ...
Post by Greg Tinkler
RMS ISAM with fixed length records is a pain, I have long argued ISAM should support
variable length records ...
Given that nowadays an SQL-based RDBMS like SQLite can offer full support for transactions, joins and subqueries (missing only more multi-user-type features like locking and replication), and yet still be resource-light enough to fit in your mobile phone, I would say the time for application developers to be grubbing about in ISAM files is past.
Arne Vajhøj
2021-10-07 15:50:50 UTC
Permalink
Post by Lawrence D’Oliveiro
Given that nowadays an SQL-based RDBMS like SQLite can offer full
support for transactions, joins and subqueries (missing only more
multi-user-type features like locking and replication), and yet still
be resource-light enough to fit in your mobile phone, I would say the
time for application developers to be grubbing about in ISAM files is
past.
There are still cases where it makes sense. RMS index-sequential files
are really a NoSQL key-value store in modern terminology; such stores
are still used, and new ones are even being developed (like RocksDB).

But the default should change.

"use index-sequential file unless good reason to use relational database"

=>

"use relational database unless good reason to use
index-sequential file"

Arne
Dave Froble
2021-10-07 17:25:30 UTC
Permalink
Post by Arne Vajhøj
Post by Lawrence D’Oliveiro
Given that nowadays an SQL-based RDBMS like SQLite can offer full
support for transactions, joins and subqueries (missing only more
multi-user-type features like locking and replication), and yet still
be resource-light enough to fit in your mobile phone, I would say the
time for application developers to be grubbing about in ISAM files is
past.
There are still cases where it makes sense. RMS index-sequential files
are really a NoSQL Key Value Store in modern terminology and
they are still used and new ones even being developed (like
RocksDB).
But the default should change.
"use index-sequential file unless good reason to use relational database"
=>
"use relational database unless good reason to use
index-sequential file"
Arne
I'd suggest there should not be a "default". Rather, make good
thoughtful decisions. Have valid reasons for any decisions or choices.
--
David Froble Tel: 724-529-0450
Dave Froble Enterprises, Inc. E-Mail: ***@tsoft-inc.com
DFE Ultralights, Inc.
170 Grimplin Road
Vanderbilt, PA 15486
Dave Froble
2021-10-07 04:10:28 UTC
Permalink
Post by Greg Tinkler
What a good conversation, some feedback.
Post by Arne Vajhøj
To be honest then I think the safest way to implement this is
to put lots of restrictions on when it is doable.
* No cluster support (announcement already states that!)
* Only FIX 512, STMLF and UDF are supported
* no mixing with traditional RMS calls
My point is SSIO seems to be focused on just PostgreSQL, whereas an RMS solution is much much easier to program, uses well tested code, and is already cluster ready putting the team ahead of the game and not building issues for the future.
RMS is a bit too high level for what's being discussed.

But yeah, the real issue is that SSIO was aimed (it seems) at
PostgreSQL. In my opinion, that is poor software architecture and design.
Post by Greg Tinkler
Post by Arne Vajhøj
I've a database product, a rather old product. At the time it was
implemented it was rather useful. But there was a locking issue. The
DLM locks resource names. The database would support I/O transfers of 1
to 127 disk blocks. How would one lock 127 contiguous disk blocks? The
blunt force method could be taking out 127 locks, not an optimum
solution. Having numeric range locking back in 1984 would have been
quite useful.
Yup, DLM uses resource names, but they can be hierarchical, like a B-Tree index. Also, the resources need only exist when needed, and are removed when not. The lock tree size depends on the lock contention.
Well the perceived issue is what happens when taking out locks, and at
some point there is a conflict. Say needing 127 blocks locked, and the
conflict is on the last block. That means 126 locks to be released, and
perhaps try again.

In reality, the large I/O buffer capability is rarely used, and then
it's usually with exclusive file access, which precludes the need for
block locks, just the file lock. For random access, single block
locking and I/O is good. Larger I/O buffers are usually used for
sequential access, both read only, and updating.
Post by Greg Tinkler
This is why I made reference to Rdb: it uses this technique, and it is probably not the only one. NB each level controls a range of resources, and each level can have its own fan-out factor. The depth and lowest level are always dependent on the application’s requirements.
FYI I am pretty sure RMS uses RFA to lock a record, this is an implied range of 1 record.
RMS has some interesting internals, basically below application usage.

Global buffers
Multiple buffers
Multi-block count

RMS can (I believe, it's been a long while) keep track of file usage,
and provide data from an RMS buffer to a user's buffer. No disk
activity required. Writes of course must go to disk. But even so, the
data can still be in the updated global buffers for use by multiple tasks.
Post by Greg Tinkler
Post by Arne Vajhøj
No matter what the disk can do then the VMS file system is still
block oriented and I believe the system services take block offsets
not byte offsets.
All disks are block based, even on Unix. With some SSDs, yes, you can do byte transfers, but this should be left to the driver to optimise. Also, with x86-64 it will be virtualised, so what the..
As long as storage is block oriented, then regardless of the numeric
range of bytes, all blocks encompassing the byte range will need to be
read, including locking, and written. This usually will include data
outside the byte range.
Post by Greg Tinkler
Post by Stephen Hoffman
For this case, RMS really doesn't work at all well. Says why right
there in the name, too. Record management, not stream management.
Ayep. RMS is record based.
Post by Greg Tinkler
Well, yes and no. If you think about it, most Unix text I/O is record based, i.e. LF-terminated, and binary I/O is fixed-length records, though not necessarily the same length in every file.
RMS $GET and $PUT are record based, while $READ and $WRITE are block based; what is missing is $READB and $WRITEB, not just for the CRTL but useful for various applications.
Forget RMS, I/O would be at the QIO level.
Post by Greg Tinkler
RMS ISAM with fixed length records is a pain, I have long argued ISAM should support variable length records, don’t care if they are VFC or STMLF, I would allow for both as VFC could allow for binary variable length records.
RMS keyed files can have variable record lengths.
RMS relative files require fixed-length records (if I remember correctly).
RMS sequential files can have variable record lengths.
Post by Greg Tinkler
Likewise the keys on an ISAM file should be able to be variable based on a separator e.g “,” or <tab> or a combination.
Post by Stephen Hoffman
The use of Oracle Rdb isn't viable as a dependency for many folks, and
lock granularity doesn't work at all well for arbitrary and overlapping
locking ranges.
I think you will find that a B-Tree-style dynamic resource tree, similar to what Rdb uses, will work well. Any ‘byte range’ implementation will need some index to find interesting locks; DLM uses a hash, which is as efficient as you can get.
Post by Stephen Hoffman
Post by Greg Tinkler
Sure it may be nice to have an API that does this for us, but hey we
are programmers.
I don't want us each writing and debugging and maintaining
range-locking code for what is part of the C standard library, but you
do you.
NO, quite the opposite. I believe there is a POSIX standard for a locking API, and as VMS, sorry OpenVMS, wishes to maintain its POSIX stamp it should use these API’s using DLM underneath. NB DLM is also already cluster based, but you know that.
Post by John Dallman
People used to UNIX or Windows generally find the other VMS file types
baffling and confusing.
That is because, without additional apps, Unix I/O is a stream of bytes.
There is no concept of records, such as that provided by RMS.

Frankly, (and yes, I'm biased), I find records reasonable, and a stream
of bytes baffling and confusing. Guess it's what one is used to.
Post by Greg Tinkler
I have always wondered why the CRTL did not have some smarts to present VFC records as STMLF and vice versa, effectively hiding the internal record structures. This could be done via open() using the VMS extension “rfm=stmlf”, which should be the default unless it is a binary file (“rfm=udf”). If the file is VFC then the CRTL could do the translation. Wishful thinking.
I would suggest the use of "VMS" in the above, rather than "CRTL". That
is, unless one considers the CRTL to be VMS ...
Post by Greg Tinkler
gt down under
--
David Froble Tel: 724-529-0450
Dave Froble Enterprises, Inc. E-Mail: ***@tsoft-inc.com
DFE Ultralights, Inc.
170 Grimplin Road
Vanderbilt, PA 15486
Lawrence D’Oliveiro
2021-10-07 07:54:51 UTC
Permalink
Post by Dave Froble
Frankly, (and yes, I'm biased), I find records reasonable, and a stream
of bytes baffling and confusing. Guess it's what one is used to.
Trouble is, there are many binary file formats that do not map easily to a simple sequence of records (of whatever delimitation). Consider the IFF family of file formats, for example: these are built out of chunks, and certain chunk types can contain other chunks.

For another example, consider file formats like TIFF and TTF, where there is a directory that identifies the location and size of the various major pieces. Oh, and PDF comes under this as well.

And then there are text-based format families, like XML, JSON, YAML, TOML ...
Arne Vajhøj
2021-10-07 14:48:58 UTC
Permalink
Post by Dave Froble
Frankly, (and yes, I'm biased), I find records reasonable, and a
stream of bytes baffling and confusing. Guess it's what one is used
to.
Post by Lawrence D’Oliveiro
Trouble is, there are many binary file formats that do not map easily
to a simple sequence of records (of whatever delimitation). Consider
the IFF family of file formats, for example: these are built out of
chunks, and certain chunk types can contain other chunks.
For another example, consider file formats like TIFF and TTF, where
there is a directory that identifies the location and size of the
various major pieces. Oh, and PDF comes under this as well.
The whole record thing is mostly for text files and RMS-based
database-style usage.

Even on VMS then true binary files are usually FIX 512 (or in rare
cases UDF) with the structure handled entirely by the application.

Attempts to do otherwise often end up with 32K problems.
Post by Lawrence D’Oliveiro
And then there are text-based format families, like XML, JSON, YAML, TOML ...
Different. Both on *nix and VMS that is a separate structure
on top of the basic file format.

Arne
David Jones
2021-10-07 15:37:04 UTC
Permalink
Post by Lawrence D’Oliveiro
Trouble is, there are many binary file formats that do not map easily to a simple sequence of records (of whatever delimitation). Consider the IFF family of file formats, for example: these are built out of chunks, and certain chunk types can contain other chunks.
Whatever happened to Compound Document Architecture (CDA)? It always struck me as an effort (now abandoned) toward an object oriented file structure.
Stephen Hoffman
2021-10-07 16:08:45 UTC
Permalink
Post by David Jones
Post by Lawrence D’Oliveiro
Trouble is, there are many binary file formats that do not map easily
to a simple sequence of records (of whatever delimitation). Consider
the IFF family of file formats, for example: these are built out of
chunks, and certain chunk types can contain other chunks.
Whatever happened to Compound Document Architecture (CDA)? It always
struck me as an effort (now abandoned) toward an object oriented file
structure.
DEC ceded the desktop app business.

The modern equivalent to CDA is PDF.
--
Pure Personal Opinion | HoffmanLabs LLC
Lawrence D’Oliveiro
2021-10-08 00:55:14 UTC
Permalink
Post by David Jones
Whatever happened to Compound Document Architecture (CDA)? It always
struck me as an effort (now abandoned) toward an object oriented file structure.
Then there was Bento, which Apple was fond of for a while (back in the days of the OpenDoc-versus-OLE2 war).

Seems like nobody cares about live embedding and compound documents now. Probably turned out to be too complex for most users to handle.

One interesting modern trend is the use of ZIP archives as a document metaformat. For example, an ODF file (ISO 26300) is essentially a ZIP archive. There is this interesting convention that the first element of the archive shall be named “mimetype”, and its content shall be uncompressed. This allows file sniffers to pick up the MIME type info at a fixed offset near the start of the file.
Arne Vajhøj
2021-10-08 00:59:48 UTC
Permalink
Post by Lawrence D’Oliveiro
One interesting modern trend is the use of ZIP archives as a document
metaformat. For example, an ODF file (ISO 26300) is essentially a ZIP
archive.
ODF, OOXML, a bunch of Java stuff (jar, war, rar, ear) etc..

Arne
Stephen Hoffman
2021-10-07 15:51:31 UTC
Permalink
Post by Lawrence D’Oliveiro
Post by Dave Froble
Frankly, (and yes, I'm biased), I find records reasonable, and a stream
of bytes baffling and confusing. Guess it's what one is used to.
Trouble is, there are many binary file formats that do not map easily
to a simple sequence of records (of whatever delimitation). Consider
the IFF family of file formats, for example: these are built out of
chunks, and certain chunk types can contain other chunks.
For another example, consider file formats like TIFF and TTF, where
there is a directory that identifies the location and size of the
various major pieces. Oh, and PDF comes under this as well.
And then there are text-based format families, like XML, JSON, YAML, TOML ...
There are many examples. It's far easier to map a whole executable
image into virtual memory, or to use file system calls to load the whole
image into virtual memory. (This is an app design I never would
have considered on a VAX.)

For a number of apps and designs, I find RMS problematic for its
fondness for records in the lower parts of the I/O stack "funnel",
and problematic again at somewhat higher levels of that "funnel" for
what little RMS can do with the database records it wants to enforce:
its lack of marshaling and unmarshaling for apps needing those
services, among other sorts of designs, and all the usual "fun" of
making changes to the contents and formats of RMS records within apps.

Trying to make all apps fit within one NoSQL database really isn't all
that great of a solution. Getting PostgreSQL, SQLite, and other
databases better integrated is helpful. Longer-term and as I'd
mentioned in another reply, demoting 32-bit RMS to "just another local
database" status, too.

And to be absolutely clear here: if an app developer needs a NoSQL
database and as many apps can, having 32-bit RMS is entirely useful. At
least until the app developer needs to make changes or additions to the
record structures, when 32-bit RMS starts showing its age. A problem
related to how we now have roughly two-dozen files necessary within a
cluster configuration.
--
Pure Personal Opinion | HoffmanLabs LLC
Craig A. Berry
2021-10-07 12:50:19 UTC
Permalink
Post by Dave Froble
the real issue is that SSIO was aimed (it seems) at
PostgreSQL.
And Apache, and Samba, and other things that have been explicitly
mentioned as having needed app-specific workarounds due to the absence
of shared stream I/O support. SSIO *is* the general-purpose solution
that you seem to be lamenting the lack of.
Arne Vajhøj
2021-10-07 13:40:07 UTC
Permalink
Post by Craig A. Berry
the real issue is that SSIO was aimed (it seems) at PostgreSQL.
And Apache, and Samba, and other things that have been explicitly
mentioned as having needed app-specific workarounds due to the absence
of shared stream I/O support. SSIO *is* the general-purpose solution
that you seem to be lamenting the lack of.
Samba I totally get.

Multiple PC's writing to a file on a Samba share would create
some interesting scenarios.

But why does Apache need it?

It should read files to serve - and since it is serving VMS files,
I think it should be as VMSish as possible, so totally standard RMS.
And it should write sequential text files like access.log.

What am I missing?

Arne
Craig A. Berry
2021-10-07 13:51:08 UTC
Permalink
Post by Arne Vajhøj
Post by Craig A. Berry
the real issue is that SSIO was aimed (it seems) at PostgreSQL.
And Apache, and Samba, and other things that have been explicitly
mentioned as having needed app-specific workarounds due to the absence
of shared stream I/O support. SSIO *is* the general-purpose solution
that you seem to be lamenting the lack of.
Samba I totally get.
Multiple PC's writing to a file on a Samba share would create
some interesting scenarios.
But why does Apache need it?
It should read files to serve - and since it is serving VMS files,
I think it should be as VMSish as possible, so totally standard RMS.
And it should write sequential text files like access.log.
What am I missing?
log files (and probably the fact that multiple worker processes can be
writing to the same logs). And I forgot to mention that Java needs it
too. See:

<http://de.openvms.org/TUD2012/opensource_and_unix_portability.pdf>

Page 16 says:

• Java (CIFS too) uses a work-around
− Does open+read/write+close for every read/write!
− Restores current file offset after each close+open
− Significant performance issue
• Oracle problem with log and trace files
− Single writer with multiple readers
• Apache’s use of log files sub-optimal
− V1.3 places restriction
− V2.0 uses a work-around
Arne Vajhøj
2021-10-07 14:01:09 UTC
Permalink
Post by Craig A. Berry
Post by Arne Vajhøj
Post by Craig A. Berry
the real issue is that SSIO was aimed (it seems) at PostgreSQL.
And Apache, and Samba, and other things that have been explicitly
mentioned as having needed app-specific workarounds due to the absence
of shared stream I/O support. SSIO *is* the general-purpose solution
that you seem to be lamenting the lack of.
Samba I totally get.
Multiple PC's writing to a file on a Samba share would create
some interesting scenarios.
But why does Apache need it?
It should read files to serve - and since it is serving VMS files
then I think it be as VMSish as possible so totally standard RMS.
And it should write sequential text files like access.log.
What am I missing?
log files (and probably the fact that multiple worker processes can be
writing to the same logs).
I still don't get it.

I thought SSIO was about shared access to byte streams.

Writing to log files should be fine using good old record based
writes (somewhere down the call stack SYS$PUT).
Post by Craig A. Berry
  And I forgot to mention that Java needs it
<http://de.openvms.org/TUD2012/opensource_and_unix_portability.pdf>
• Java (CIFS too) uses a work-around
  − Does open+read/write+close for every read/write!
  − Restores current file offset after each close+open
  − Significant performance issue
In this context does "Java" mean "Tomcat"?

Arne
Craig A. Berry
2021-10-07 16:12:00 UTC
Permalink
Post by Arne Vajhøj
Post by Craig A. Berry
Post by Arne Vajhøj
Post by Craig A. Berry
the real issue is that SSIO was aimed (it seems) at PostgreSQL.
And Apache, and Samba, and other things that have been explicitly
mentioned as having needed app-specific workarounds due to the absence
of shared stream I/O support. SSIO *is* the general-purpose solution
that you seem to be lamenting the lack of.
Samba I totally get.
Multiple PC's writing to a file on a Samba share would create
some interesting scenarios.
But why does Apache need it?
It should read files to serve - and since it is serving VMS files
then I think it be as VMSish as possible so totally standard RMS.
And it should write sequential text files like access.log.
What am I missing?
log files (and probably the fact that multiple worker processes can be
writing to the same logs).
I still don't get it.
I thought SSIO was about shared access to byte streams.
Writing to log files should be fine using good old record based
writes (somewhere down the call stack SYS$PUT).
Don't ask me, ask the authors of the document to which I linked. Or the
folks at VSI who inherited their work. I may be wrong and it's not
about log files, but suppose it is. If you start from the premise that
the log files are stream-oriented and you have multiple writers and
multiple readers at the same time, then that's pretty much the
definition of shared access to a byte stream. Doing it differently for a
platform that prefers records would be extra cost and extra maintenance.
Post by Arne Vajhøj
Post by Craig A. Berry
                           And I forgot to mention that Java needs it
<http://de.openvms.org/TUD2012/opensource_and_unix_portability.pdf>
• Java (CIFS too) uses a work-around
   − Does open+read/write+close for every read/write!
   − Restores current file offset after each close+open
   − Significant performance issue
In this context does "Java" mean "Tomcat"?
You know as much as I do -- probably more ;-).
Arne Vajhøj
2021-10-07 17:27:22 UTC
Permalink
Post by Craig A. Berry
Post by Arne Vajhøj
I still don't get it.
Don't ask me, ask the authors of the document to which I linked. Or the
folks at VSI who inherited their work.
I know - I should not shoot the messenger. Sorry.

Arne
Dave Froble
2021-10-07 16:27:09 UTC
Permalink
Post by Arne Vajhøj
I still don't get it.
I thought SSIO was about shared access to byte streams.
That is a bit of tunnel vision.

Locking numeric ranges could be used for many other things. Such a
capability should be generic, not just for a single purpose.

That's the problem I see, the tunnel vision when approaching the issue,
rather than the vision to see just how useful the capability could be.

Craig's post points that out.
--
David Froble Tel: 724-529-0450
Dave Froble Enterprises, Inc. E-Mail: ***@tsoft-inc.com
DFE Ultralights, Inc.
170 Grimplin Road
Vanderbilt, PA 15486
Arne Vajhøj
2021-10-07 16:59:57 UTC
Permalink
Post by Dave Froble
Post by Arne Vajhøj
I still don't get it.
I thought SSIO was about shared access to byte streams.
That is a bit of tunnel vision.
Not really. More like the definition.

<quote>
SSIO
====
Shared Stream IO feature provides POSIX compliant read/write to byte
stream files.
With the SSIO feature, data consistency is guaranteed when multiple
processes are performing a Read/Write to non-overlapping byte ranges
within the same block boundary.
</quote>
Post by Dave Froble
Locking numeric ranges could be used for many other things.  Such a
capability should be generic, not just for a single purpose.
I agree that range locking is a useful feature for many other purposes
than SSIO.
Post by Dave Froble
That's the problem I see, the tunnel vision when approaching the issue,
rather than the vision to see just how useful the capability could be.
Craig's post points that out.
It listed some project that could benefit from SSIO besides
PostgreSQL.

And I just don't understand some of the examples since they
sound traditional record oriented to me.

Arne
Dave Froble
2021-10-07 16:18:28 UTC
Permalink
Post by Craig A. Berry
the real issue is that SSIO was aimed (it seems) at PostgreSQL.
And Apache, and Samba, and other things that have been explicitly
mentioned as having needed app-specific workarounds due to the absence
of shared stream I/O support. SSIO *is* the general-purpose solution
that you seem to be lamenting the lack of.
A while back we were discussing doing away with I/O to buffers, and
accessing the data in place. Slower access perhaps, but doing away with
the reading and writing to/from buffers. Haven't heard much about that
lately. I don't get out much.

Such type of activity would really benefit from having the capability of
locking just the required data, and, would need the capability of
reading and writing just the required data.

I'm aware of how useful something like SSIO would be. I'm just appalled
by the design and implementation. As mentioned, it seems aimed at just
a few current uses, and totally ignores how useful it would be for many
more future uses. This is rather consistent with the long time apathy
with which VMS has been treated. It's more a patch than an enhancement.
This is what I lament.
--
David Froble Tel: 724-529-0450
Dave Froble Enterprises, Inc. E-Mail: ***@tsoft-inc.com
DFE Ultralights, Inc.
170 Grimplin Road
Vanderbilt, PA 15486
Stephen Hoffman
2021-10-07 17:01:49 UTC
Permalink
Post by Dave Froble
A while back we were discussing doing away with I/O to buffers, and
accessing the data in place. Slower access perhaps, but doing away
with the reading and writing to/from buffers. Haven't heard much about
that lately. I don't get out much.
Ayup. Nonvolatile byte-addressable storage hardware is available now,
and is in use in various applications.

Compatible memory hardware will be rather more available for OpenVMS
x86-64, for folks interested in investigating this for their apps.

Carving out a hunk of persistent storage will be an interesting topic for
app developers on OpenVMS, though I can think of a couple of ways to
try.

Here's an HPE overview from a few years ago on the topic:
https://www.pdl.cmu.edu/SDI/2016/slides/keeton-2016-10-19-memory-driven-computing.pdf


I see some B-Tree work for this area in a newer paper, and a number of
other discussions.
Post by Dave Froble
Such type of activity would really benefit from having the capability
of locking just the required data, and, would need the capability of
reading and writing just the required data.
Locking access to the contents of a global section, or locking access
to hardware-backed storage for external devices, is the same issue.

Whether DLM overhead is too high for that to be workable is another
discussion that the app developers will want to ponder.
Post by Dave Froble
I'm aware of how useful something like SSIO would be. I'm just
appalled by the design and implementation. As mentioned, it seems
aimed at just a few current uses, and totally ignores how useful it
would be for many more future uses. This is rather consistent with the
long time apathy with which VMS has been treated. It's more a patch
than an enhancement. This is what I lament.
Alas, there's no other outcome when upward-compatibility is an
overarching goal for the platform.
--
Pure Personal Opinion | HoffmanLabs LLC
Dave Froble
2021-10-07 21:03:53 UTC
Permalink
Post by Stephen Hoffman
Post by Dave Froble
A while back we were discussing doing away with I/O to buffers, and
accessing the data in place. Slower access perhaps, but doing away
with the reading and writing to/from buffers. Haven't heard much
about that lately. I don't get out much.
Ayup. Nonvolatile byte-addressable storage hardware is available now,
and is in use in various applications.
Compatible memory hardware will be rather more available for OpenVMS
x86-64, for folks interested in investigating this for their apps.
Carving out a hunk of persistent storage will be interesting topic for
app developers on OpenVMS, though I can think of a couple of ways to try.
https://www.pdl.cmu.edu/SDI/2016/slides/keeton-2016-10-19-memory-driven-computing.pdf
I see some B-Tree work for this area in a newer paper, and a number of
other discussions.
Post by Dave Froble
Such type of activity would really benefit from having the capability
of locking just the required data, and, would need the capability of
reading and writing just the required data.
Locking access to the contents of a global section, or locking access to
hardware-backed storage for external devices, is the same issue.
Whether DLM overhead is too high for that to be workable is another
discussion that the app developers will want to ponder.
Post by Dave Froble
I'm aware of how useful something like SSIO would be. I'm just
appalled by the design and implementation. As mentioned, it seems
aimed at just a few current uses, and totally ignores how useful it
would be for many more future uses. This is rather consistent with
the long time apathy with which VMS has been treated. It's more a
patch than an enhancement. This is what I lament.
Alas, there's no other outcome when upward-compatibility is an
overarching goal for the platform.
Now I'm just a dumb polock, wandered down out of the woods. But I just
don't see where upward compatibility has anything to do with
enhancements to the DLM. If existing calls continue to work as before,
and only when an optional extra parameter would enable new capabilities,
then upward compatibility just cannot be an issue. At least for this.

The optional parameter might be a "lock type", and if not present,
existing logic would be used, and if present, new code could be executed
to process the new lock type. Stuff a couple of quadwords into the
resource name for the numeric range. It would add one new piece of data
to the DLM data structure(s).
--
David Froble Tel: 724-529-0450
Dave Froble Enterprises, Inc. E-Mail: ***@tsoft-inc.com
DFE Ultralights, Inc.
170 Grimplin Road
Vanderbilt, PA 15486
Stephen Hoffman
2021-10-08 14:51:03 UTC
Permalink
Post by Dave Froble
Post by Stephen Hoffman
Post by Dave Froble
I'm aware of how useful something like SSIO would be. I'm just
appalled by the design and implementation. As mentioned, it seems
aimed at just a few current uses, and totally ignores how useful it
would be for many more future uses. This is rather consistent with the
long time apathy with which VMS has been treated. It's more a patch
than an enhancement. This is what I lament.
Alas, there's no other outcome when upward-compatibility is an
overarching goal for the platform.
Now I''m just a dumb polock, wandered down out of the woods. But I
just don't see where upward compatibility has anything to do with
enhancements to the DLM. If existing calls continue to work as before,
and only when an optional extra parameter would enable new
capabilities, then upward compatibility just cannot be an issue. At
least for this.
I was building on the "long term apathy" and "more patch than
enhancement" comments, with the increasing difficulties even making
comparatively minor or isolated changes and updates.

Larger changes can be Really Difficult with ~40 years of accumulated
dependencies around, assuming the developers and schedule and funding
are all available. (q.v. Hyrum's Law.)

There are sections of OpenVMS that would best be ripped out and
replaced, or refactored, or re-architected, but that can't happen or
can't easily happen while staying compatible with existing apps.

DLM itself needs better abstractions as some of the more common tasks
are just absurdly involved to program within the existing API. Tasks
such as selecting a primary app server for a host or a cluster, for
instance. This is less of an issue for experienced OpenVMS programmers
and for those with access to examples (cost and schedule and budget and
ongoing support discussions aside), but this sequence is not something
at all obvious to less-experienced developers. And even with
experienced developers, mistakes still happen. And within a wider view,
this DLM primary support is building local process and job control
support, which is an omission I've commented on before.
--
Pure Personal Opinion | HoffmanLabs LLC
Dave Froble
2021-10-08 18:50:59 UTC
Permalink
Post by Stephen Hoffman
Post by Dave Froble
Post by Stephen Hoffman
Post by Dave Froble
I'm aware of how useful something like SSIO would be. I'm just
appalled by the design and implementation. As mentioned, it seems
aimed at just a few current uses, and totally ignores how useful it
would be for many more future uses. This is rather consistent with
the long time apathy with which VMS has been treated. It's more a
patch than an enhancement. This is what I lament.
Alas, there's no other outcome when upward-compatibility is an
overarching goal for the platform.
Now I''m just a dumb polock, wandered down out of the woods. But I
just don't see where upward compatibility has anything to do with
enhancements to the DLM. If existing calls continue to work as
before, and only when an optional extra parameter would enable new
capabilities, then upward compatibility just cannot be an issue. At
least for this.
I was building on the "long term apathy" and "more patch than
enhancement" comments, with the increasing difficulties even making
comparatively minor or isolated changes and updates.
Larger changes can be Really Difficult with ~40 years of accumulated
dependencies around, assuming the developers and schedule and funding
are all available. (q.v. Hyrum's Law.)
Hyrum's Law and such point to the need for good software architecture.
(I always have to use the spell checker on that word.)

If intelligent and structured use of something like VMS is followed,
enhancements should not be much of an issue. It is when people do
things they really should not that the problems arise. Compatibility with
well designed tools should not be an issue. Going off on one's own, and
making assumptions about things which are not guaranteed to remain
as-is, is where such problems occur, for the most part.

As a simple example:

If Stat% and 1%

vs

If Stat% and SS$_NORMAL

That causes a problem, if the VMS developers decide that "1" is no
longer what it used to be. The problem is not compatibility, the
problem is not using the approved constant.

Now while breaking customers code can be bad for business, the dumb
polock can say "fuck 'em, enhance the product and break their erroneous
code".

:-)
--
David Froble Tel: 724-529-0450
Dave Froble Enterprises, Inc. E-Mail: ***@tsoft-inc.com
DFE Ultralights, Inc.
170 Grimplin Road
Vanderbilt, PA 15486
Simon Clubley
2021-10-08 18:19:27 UTC
Permalink
Post by Dave Froble
Now I''m just a dumb polock, wandered down out of the woods. But I just
don't see where upward compatibility has anything to do with
enhancements to the DLM. If existing calls continue to work as before,
and only when an optional extra parameter would enable new capabilities,
then upward compatibility just cannot be an issue. At least for this.
Is there a version number on the current inter-node DLM messages ?

If not, how can you change the DLM message structure in a compatible way ?

If yes, what happens when an older node sees a later format DLM message ?
You would at least need a compatibility kit to be installed on the older
nodes.
Post by Dave Froble
The optional parameter might be a "lock type", and if not present,
existing logic would be used, and if present, new code could be executed
to process the new lock type. Stuff a couple of quadwords into the
resource name for the numeric range. It would add one new piece of data
to the DLM data structure(s).
What about the DLM messages sent between nodes ?

Simon.
--
Simon Clubley, ***@remove_me.eisner.decus.org-Earth.UFP
Walking destinations on a map are further away than they appear.
Dave Froble
2021-10-08 19:32:20 UTC
Permalink
Post by Simon Clubley
Post by Dave Froble
Now I''m just a dumb polock, wandered down out of the woods. But I just
don't see where upward compatibility has anything to do with
enhancements to the DLM. If existing calls continue to work as before,
and only when an optional extra parameter would enable new capabilities,
then upward compatibility just cannot be an issue. At least for this.
Is there a version number on the current inter-node DLM messages ?
Good question, and if not, perhaps such could be implemented. However,
what I envision should not affect usage of the existing resource name lock.
Post by Simon Clubley
If not, how can you change the DLM message structure in a compatible way ?
If yes, what happens when an older node sees a later format DLM message ?
You would at least need a compatibility kit to be installed on the older
nodes.
Perhaps.
Post by Simon Clubley
Post by Dave Froble
The optional parameter might be a "lock type", and if not present,
existing logic would be used, and if present, new code could be executed
to process the new lock type. Stuff a couple of quadwords into the
resource name for the numeric range. It would add one new piece of data
to the DLM data structure(s).
What about the DLM messages sent between nodes ?
Simon.
First, re-read what I posted. Specifically "if not present, existing
logic would be used".

Not sure what you're calling "DLM message".

The only data item I'd see added to the lock database would be the "lock
type", and that could be done in a manner such that it does not affect
lock database information that does not have the new structure definitions.

Perhaps it could be arranged that when using the new data structure(s),
that it would be mandatory to update all nodes in a cluster. Perhaps
some type of version would disallow usage of dissimilar versions.

Note that any node or cluster that wished to use numeric range locking
would have to have the enhancement installed. If not using it, then
nothing changes.

This could be done as a VMS DLM enhancement. I'm rather sure of that.
Whether the desire to do so might be a different issue.
--
David Froble Tel: 724-529-0450
Dave Froble Enterprises, Inc. E-Mail: ***@tsoft-inc.com
DFE Ultralights, Inc.
170 Grimplin Road
Vanderbilt, PA 15486
Simon Clubley
2021-10-08 20:36:10 UTC
Permalink
Post by Dave Froble
Not sure what you're calling "DLM message".
DLM-related cluster traffic.

Anything you propose not only has to be compatible at API level,
but also in physical DLM messages on the wire.

Simon.
--
Simon Clubley, ***@remove_me.eisner.decus.org-Earth.UFP
Walking destinations on a map are further away than they appear.
Dave Froble
2021-10-09 00:57:25 UTC
Permalink
Post by Simon Clubley
Post by Dave Froble
Not sure what you're calling "DLM message".
DLM-related cluster traffic.
Anything you propose not only has to be compatible at API level,
but also in physical DLM messages on the wire.
Simon.
If the single new piece of data is not used, then nothing changes.

If it is in use, then the nodes in question would already have the
enhancement installed.
--
David Froble Tel: 724-529-0450
Dave Froble Enterprises, Inc. E-Mail: ***@tsoft-inc.com
DFE Ultralights, Inc.
170 Grimplin Road
Vanderbilt, PA 15486
Greg Tinkler
2021-10-07 12:54:54 UTC
Permalink
Post by Dave Froble
Well the perceived issue is what happens when taking out locks, and at
some point there is a conflict. Say needing 127 blocks locked, and the
conflict is on the last block. That means 126 locks to be released, and
perhaps try again.
Maybe, maybe not. It depends on the locking fan-out factors for the differing levels. It is possible that only 1 lock is needed, maybe more; the worst case would be 127. NB there are also blocking ASTs (BLAST) to assist with managing the lock promotions/demotions.
Post by Dave Froble
As long as storage is block oriented, then regardless of the numeric
range of bytes, all blocks encompassing the byte range will need to be
read, including locking, and written. This usually will include data
outside the byte range.
Yup, as is the case on Unix... let the drivers worry about how and why this is done, block/byte, whatever the IO device needs.
Post by Dave Froble
Forget RMS, I/O would be at the QIO level.
Why? Underneath RMS is QIO; what RMS gives us is the coordination of the buffers/buckets/clumps/blocks across the cluster to ensure no lost updates, as per the example used to justify SSIO.
Post by Dave Froble
RMS keyed files can have variable record lengths.
True, VAR only, not VFC or STM*, but with fixed-length key fields at fixed offsets in the record.
Post by Dave Froble
RMS relative files require fixed length records. (if I remember correctly)
Yup, they are implicitly fixed length.

===
Have been thinking about the byte range locking. As most of the use will be for locking ranges in a file, it should be integrated with RMS, i.e. RMS should have an API to allow this, as it already does the locking down to the buffer/bucket/clump/block. Just add another 1 or 2 layers of lock tree and you have it. And it will all be cluster wide, and compatible with other users of RMS.

gt
Dave Froble
2021-10-07 16:50:39 UTC
Permalink
Post by Greg Tinkler
Post by Dave Froble
Well the perceived issue is what happens when taking out locks, and at
some point there is a conflict. Say needing 127 blocks locked, and the
conflict is on the last block. That means 126 locks to be released, and
perhaps try again.
Maybe, maybe not. It depends on the locking fan out factors for the differing levels. It is possible that only 1 lock is needed, may be more, the wort case would be 127. NB there is also BLAST to assist with managing the lock promotion/demotions.
Post by Dave Froble
As long as storage is block oriented, then regardless of the numeric
range of bytes, all blocks encompassing the byte range will need to be
read, including locking, and written. This usually will include data
outside the byte range.
Yup, as is the case on Unix...let the drivers worry about how and why this is done, block/byte what ever the IO device needs.
Post by Dave Froble
Forget RMS, I/O would be at the QIO level.
Why? Underneath RMS is QIO, what RMS gives us the the coordination of the buffers/buckets/clumps/block across the cluster to ensure not lost updates, as per the example used to justify SSIO.
Too limited and specific purpose. RMS might be able to make use of some
capabilities, but so might other applications.

RMS does some things well, and doesn't have some capabilities that it
perhaps should have. Data field definitions in records comes to mind.
Post by Greg Tinkler
Post by Dave Froble
RMS keyed files can have variable record lengths.
True, VAR only not VFC or STM*, but fixed length key fields, with fixed offsets in the record
Post by Dave Froble
RMS relative files require fixed length records. (if I remember correctly)
Yup, there are implicitly fixed length.
===
Have been thinking about the byte range locking. As most of the use will be for locking ranges in a file it should be integrated with RMS, i.e. RMS should have an API to allow this as it already does the locking to the buffer/bucket/clump/block. Just need another 1 or 2 layers of lock tree and you have it. And it all be cluster wide, and it will be compatible with other users of RMS.
Short sighted thinking. Numeric range locking might be useful in many
applications.
--
David Froble Tel: 724-529-0450
Dave Froble Enterprises, Inc. E-Mail: ***@tsoft-inc.com
DFE Ultralights, Inc.
170 Grimplin Road
Vanderbilt, PA 15486
Simon Clubley
2021-10-07 12:26:36 UTC
Permalink
Post by Greg Tinkler
Post by Stephen Hoffman
For this case, RMS really doesn't work at all well. Says why right
there in the name, too. Record management, not stream management.
Well yes and no. If you think about it most Unix text IO is record, ie LF terminated, and binary is fixed records not necessarily the same length in the file.
How do you find byte 12,335,456 in a variable length RMS sequential file
without reading from the start of the file ?

That's why there are restrictions on RMS supported file formats in an
application in some cases.
Post by Greg Tinkler
I always wondered why the CRTL did not have some smarts to present VFC
records as STMLF and vice-versa, effectively hiding the internal record
structures. This could be done via open using the VMS extension "rfm=stmlf",
which should be the default unless it is a binary file ("rfm=unf"). If the file
is VFC then the CRTL could do the translation. Wishful thinking.
This could not be the default. What if LF characters are part of the
existing data record itself ? You have just destroyed the meaning of
the file in that case.

Simon.
--
Simon Clubley, ***@remove_me.eisner.decus.org-Earth.UFP
Walking destinations on a map are further away than they appear.
Greg Tinkler
2021-10-07 12:42:39 UTC
Permalink
Post by Simon Clubley
How do you find byte 12,335,456 in a variable length RMS sequential file
without reading from the start of the file ?
That's why there are restrictions on RMS supported file formats in an
application in some cases.
The same way it is done on Unix: calculate the block offset, go get it, and extract the byte. No difference, and nothing to do with the underlying format.
Post by Simon Clubley
Post by Greg Tinkler
I always wondered why the CRTL did not have some smarts to present VFC
records as STMLF and vice-versa, effectively hiding the internal record
structures. This could be done via open using the VMS extension "rfm=stmlf",
which should be the default unless it is a binary file ("rfm=unf"). If the file
is VFC then the CRTL could do the translation. Wishful thinking.
This could not be the default. What if LF characters are part of the
existing data record itself ? You have just destroyed the meaning of
the file in that case.
Please read what I wrote: if the file has been opened "b" then don't; otherwise we need to assume it is stmLF. Yup, probably another logical to set the default, but I'm pretty sure if you create a new file using the CRTL with the defaults then it will be stmLF anyway.

gt
Simon Clubley
2021-10-07 12:59:16 UTC
Permalink
Post by Greg Tinkler
Post by Simon Clubley
How do you find byte 12,335,456 in a variable length RMS sequential file
without reading from the start of the file ?
That's why there are restrictions on RMS supported file formats in an
application in some cases.
The same way it is done on Unix, calculate the block offset, go get it, and extract the byte. no difference and nothing to do with the underlying format.
You don't know the block offset without scanning the file when it comes
to some RMS file formats.

IOW, data byte 12,335,456 will not be the same thing as file byte 12,335,456
unless you restrict yourself to record formats that do not have embedded
record metadata.

Simon.
--
Simon Clubley, ***@remove_me.eisner.decus.org-Earth.UFP
Walking destinations on a map are further away than they appear.
Arne Vajhøj
2021-10-07 13:34:23 UTC
Permalink
Post by Simon Clubley
Post by Greg Tinkler
Post by Simon Clubley
How do you find byte 12,335,456 in a variable length RMS sequential file
without reading from the start of the file ?
That's why there are restrictions on RMS supported file formats in an
application in some cases.
The same way it is done on Unix, calculate the block offset, go get it, and extract the byte. no difference and nothing to do with the underlying format.
You don't know the block offset without scanning the file when it comes
to some RMS file formats.
IOW, data byte 12,335,456 will not be the same thing as file byte 12,335,456
unless you restrict yourself to record formats that do not have embedded
record metadata.
Yes.

And it does not get better when using standard C IO.

I suspect that the variable length file output below will
surprise a few *nix developers.

$ type var.txt
A
BB
CCC
$ type stmlf.txt
A
BB
CCC
$ type process.c
#include <stdio.h>
#include <sys/stat.h>

void sequential(const char *fnm, int mode)
{
FILE *fp;
int ix, c;
printf("%s sequential read (%s):", fnm, mode ? "binary" : "text");
fp = fopen(fnm, mode ? "rb" : "r");
ix = 0;
while((c = fgetc(fp)) >= 0)
{
ix++;
if(c >= 0)
{
printf(" %d=%02X", ix, c);
}
else
{
printf(" %d=-1", ix);
}
}
printf("\n");
fclose(fp);
}

void direct(const char *fnm, int mode, int siz)
{
FILE *fp;
int ix, c;
printf("%s direct read (%s):", fnm, mode ? "binary" : "text");
fp = fopen(fnm, mode ? "rb" : "r");
for(ix = 0; ix < siz; ix++)
{
fseek(fp, ix, SEEK_SET);
c = fgetc(fp);
if(c >= 0)
{
printf(" %d=%02X", ix + 1, c);
}
else
{
printf(" %d=-1", ix + 1);
}
}
printf("\n");
fclose(fp);
}

int main(int argc,char *argv[])
{
struct stat buf;
stat(argv[1], &buf);
printf("%s size = %d bytes\n", argv[1], (int)buf.st_size);
sequential(argv[1], 0);
sequential(argv[1], 1);
direct(argv[1], 0, (int)buf.st_size);
direct(argv[1], 1, (int)buf.st_size);
return 0;
}
$ cc process
$ link process
$ mcr sys$disk:[]process var.txt
var.txt size = 14 bytes
var.txt sequential read (text): 1=41 2=0A 3=42 4=42 5=0A 6=43 7=43 8=43 9=0A
var.txt sequential read (binary): 1=41 2=42 3=42 4=43 5=43 6=43
var.txt direct read (text): 1=41 2=-1 3=02 4=-1 5=42 6=-1 7=-1 8=-1 9=43
10=-1 11=-1 12=-1 13=FF 14=-1
var.txt direct read (binary): 1=41 2=-1 3=02 4=-1 5=42 6=-1 7=-1 8=-1
9=43 10=-1 11=-1 12=-1 13=FF 14=-1
$ mcr sys$disk:[]process stmlf.txt
stmlf.txt size = 9 bytes
stmlf.txt sequential read (text): 1=41 2=0A 3=42 4=42 5=0A 6=43 7=43
8=43 9=0A
stmlf.txt sequential read (binary): 1=41 2=0A 3=42 4=42 5=0A 6=43 7=43
8=43 9=0A
stmlf.txt direct read (text): 1=41 2=0A 3=42 4=42 5=0A 6=43 7=43 8=43 9=0A
stmlf.txt direct read (binary): 1=41 2=0A 3=42 4=42 5=0A 6=43 7=43 8=43 9=0A

Arne
Arne Vajhøj
2021-10-07 14:42:26 UTC
Permalink
        fseek(fp, ix, SEEK_SET);
var.txt size = 14 bytes
var.txt sequential read (text): 1=41 2=0A 3=42 4=42 5=0A 6=43 7=43 8=43 9=0A
var.txt sequential read (binary): 1=41 2=42 3=42 4=43 5=43 6=43
var.txt direct read (text): 1=41 2=-1 3=02 4=-1 5=42 6=-1 7=-1 8=-1 9=43
10=-1 11=-1 12=-1 13=FF 14=-1
var.txt direct read (binary): 1=41 2=-1 3=02 4=-1 5=42 6=-1 7=-1 8=-1
9=43 10=-1 11=-1 12=-1 13=FF 14=-1
$ mcr sys$disk:[]process stmlf.txt
stmlf.txt size = 9 bytes
stmlf.txt sequential read (text): 1=41 2=0A 3=42 4=42 5=0A 6=43 7=43
8=43 9=0A
stmlf.txt sequential read (binary): 1=41 2=0A 3=42 4=42 5=0A 6=43 7=43
8=43 9=0A
stmlf.txt direct read (text): 1=41 2=0A 3=42 4=42 5=0A 6=43 7=43 8=43 9=0A
stmlf.txt direct read (binary): 1=41 2=0A 3=42 4=42 5=0A 6=43 7=43 8=43 9=0A
In all fairness, I believe there is some documentation
somewhere stating that fseek is only supported to the
beginning of a record. I cannot find it right now,
but I believe I once saw it somewhere.

Arne
Craig A. Berry
2021-10-07 16:01:37 UTC
Permalink
Post by Arne Vajhøj
         fseek(fp, ix, SEEK_SET);
var.txt size = 14 bytes
var.txt sequential read (text): 1=41 2=0A 3=42 4=42 5=0A 6=43 7=43 8=43 9=0A
var.txt sequential read (binary): 1=41 2=42 3=42 4=43 5=43 6=43
var.txt direct read (text): 1=41 2=-1 3=02 4=-1 5=42 6=-1 7=-1 8=-1
9=43 10=-1 11=-1 12=-1 13=FF 14=-1
var.txt direct read (binary): 1=41 2=-1 3=02 4=-1 5=42 6=-1 7=-1 8=-1
9=43 10=-1 11=-1 12=-1 13=FF 14=-1
$ mcr sys$disk:[]process stmlf.txt
stmlf.txt size = 9 bytes
stmlf.txt sequential read (text): 1=41 2=0A 3=42 4=42 5=0A 6=43 7=43
8=43 9=0A
stmlf.txt sequential read (binary): 1=41 2=0A 3=42 4=42 5=0A 6=43 7=43
8=43 9=0A
stmlf.txt direct read (text): 1=41 2=0A 3=42 4=42 5=0A 6=43 7=43 8=43 9=0A
stmlf.txt direct read (binary): 1=41 2=0A 3=42 4=42 5=0A 6=43 7=43 8=43 9=0A
In all fairness then I believe there are some documentation
somewhere that states that fseek is only supported to
beginning of a record. I cannot find it right now,
but I believe I once saw it somewhere.
Is this what you're looking for?

$ help crtl fseek description

CRTL

fseek

Description

The fseek function can position a fixed-length record-access
file with no carriage control or a stream-access file on any
byte offset, but can position all other files only on record
boundaries.

The available Standard I/O functions position a variable-length
or VFC record file at its first byte, at the end-of-file, or on
a record boundary. Therefore, the arguments given to fseek must
specify any of the following:

o The beginning or end of the file

o A 0 offset from the current position (an arbitrary record
boundary)

o The position returned by a previous, valid ftell call
Arne Vajhøj
2021-10-07 16:52:21 UTC
Permalink
Post by Craig A. Berry
Post by Arne Vajhøj
         fseek(fp, ix, SEEK_SET);
var.txt size = 14 bytes
var.txt sequential read (text): 1=41 2=0A 3=42 4=42 5=0A 6=43 7=43 8=43 9=0A
var.txt sequential read (binary): 1=41 2=42 3=42 4=43 5=43 6=43
var.txt direct read (text): 1=41 2=-1 3=02 4=-1 5=42 6=-1 7=-1 8=-1
9=43 10=-1 11=-1 12=-1 13=FF 14=-1
var.txt direct read (binary): 1=41 2=-1 3=02 4=-1 5=42 6=-1 7=-1 8=-1
9=43 10=-1 11=-1 12=-1 13=FF 14=-1
$ mcr sys$disk:[]process stmlf.txt
stmlf.txt size = 9 bytes
stmlf.txt sequential read (text): 1=41 2=0A 3=42 4=42 5=0A 6=43 7=43
8=43 9=0A
stmlf.txt sequential read (binary): 1=41 2=0A 3=42 4=42 5=0A 6=43
7=43 8=43 9=0A
stmlf.txt direct read (text): 1=41 2=0A 3=42 4=42 5=0A 6=43 7=43 8=43 9=0A
stmlf.txt direct read (binary): 1=41 2=0A 3=42 4=42 5=0A 6=43 7=43 8=43 9=0A
In all fairness then I believe there are some documentation
somewhere that states that fseek is only supported to
beginning of a record. I cannot find it right now,
but I believe I once saw it somewhere.
Is this what you're looking for?
$ help crtl fseek description
CRTL
  fseek
    Description
         The fseek function can position a fixed-length record-access
         file with no carriage control or a stream-access file on any
         byte offset, but can position all other files only on record
         boundaries.
         The available Standard I/O functions position a variable-length
         or VFC record file at its first byte, at the end-of-file, or on
         a record boundary. Therefore, the arguments given to fseek must
         o  The beginning or end of the file
         o  A 0 offset from the current position (an arbitrary record
            boundary)
         o  The position returned by a previous, valid ftell call
YES.

And shame on me, because I only checked help crtl fseek arguments.

Arne
Dave Froble
2021-10-07 17:00:35 UTC
Permalink
Post by Arne Vajhøj
I suspect that the variable length file output below will
surprise a few *nix developers.
Why do you post C code examples that confuse me and give me a headache?

:-)

Then again, Basic code examples might confuse Unix developers ...
--
David Froble Tel: 724-529-0450
Dave Froble Enterprises, Inc. E-Mail: ***@tsoft-inc.com
DFE Ultralights, Inc.
170 Grimplin Road
Vanderbilt, PA 15486
Arne Vajhøj
2021-10-07 17:09:33 UTC
Permalink
Post by Dave Froble
Post by Arne Vajhøj
I suspect that the variable length file output below will
surprise a few *nix developers.
Why do you post C code examples that confuse me and give me a headache?
:-)
Then again, Basic code examples might confuse Unix developers ...
Sorry about the headache.

But the topic was identical code on *nix and VMS trying to
access a random position in a file.

C is available on both *nix and VMS so it was rather
obvious.

VMS Basic is not available on *nix.

I don't think there are quite the same options
in VMS Basic as in C for this, but I expect all the
options available in VMS Basic to produce the naturally
expected result.

Arne
Simon Clubley
2021-10-07 17:53:34 UTC
Permalink
Post by Dave Froble
Post by Arne Vajhøj
I suspect that the variable length file output below will
surprise a few *nix developers.
Why do you post C code examples that confuse me and give me a headache?
:-)
Then again, Basic code examples might confuse Unix developers ...
Some of them might be aware of Basic.

Back in the later MS-DOS days, Microsoft used to ship a Basic
interpreter for free with MS-DOS and (apparently some Windows versions):

https://en.wikipedia.org/wiki/QBasic

I've just discovered there's a version of Microsoft QuickBasic for Linux:

https://en.wikipedia.org/wiki/FreeBASIC

which I did not know about.

Just been reminded that Gorillas.bas was released 30 years ago.

I am now depressed. :-)

Simon.
--
Simon Clubley, ***@remove_me.eisner.decus.org-Earth.UFP
Walking destinations on a map are further away than they appear.
Arne Vajhøj
2021-10-07 18:13:45 UTC
Permalink
Post by Simon Clubley
Post by Dave Froble
Then again, Basic code examples might confuse Unix developers ...
Some of them might be aware of Basic.
Back in the later MS-DOS days, Microsoft used to ship a Basic
https://en.wikipedia.org/wiki/QBasic
GW-Basic came with DOS 1-4 and QBasic with DOS 5-6 and early Windows I
believe.

GW-Basic source code is now available at:
https://github.com/microsoft/GW-BASIC
Post by Simon Clubley
https://en.wikipedia.org/wiki/FreeBASIC
which I did not know about.
I would still not expect many Linux people to know Basic.

And besides VMS Basic is somewhat different from MS Basic flavors.

Arne
Dave Froble
2021-10-07 16:57:28 UTC
Permalink
Post by Simon Clubley
Post by Greg Tinkler
Post by Simon Clubley
How do you find byte 12,335,456 in a variable length RMS sequential file
without reading from the start of the file ?
That's why there are restrictions on RMS supported file formats in an
application in some cases.
The same way it is done on Unix, calculate the block offset, go get it, and extract the byte. no difference and nothing to do with the underlying format.
You don't know the block offset without scanning the file when it comes
to some RMS file formats.
IOW, data byte 12,335,456 will not be the same thing as file byte 12,335,456
unless you restrict yourself to record formats that do not have embedded
record metadata.
Simon.
I'm guessing Unix files don't have metadata and such. So the comparison
is not valid.

For a non-RMS file, yes, the location can be calculated. But not so for
an RMS file with record characteristics included in the records.

Since Unix doesn't have RMS files, perhaps that confused Greg.
--
David Froble Tel: 724-529-0450
Dave Froble Enterprises, Inc. E-Mail: ***@tsoft-inc.com
DFE Ultralights, Inc.
170 Grimplin Road
Vanderbilt, PA 15486
Simon Clubley
2021-10-07 18:07:00 UTC
Permalink
Post by Dave Froble
Post by Simon Clubley
Post by Greg Tinkler
Post by Simon Clubley
How do you find byte 12,335,456 in a variable length RMS sequential file
without reading from the start of the file ?
That's why there are restrictions on RMS supported file formats in an
application in some cases.
The same way it is done on Unix, calculate the block offset, go get it, and extract the byte. no difference and nothing to do with the underlying format.
You don't know the block offset without scanning the file when it comes
to some RMS file formats.
IOW, data byte 12,335,456 will not be the same thing as file byte 12,335,456
unless you restrict yourself to record formats that do not have embedded
record metadata.
I'm guessing Unix files don't have metadata and such. So the comparison
is not valid.
No, Unix doesn't. At Unix filesystem level, files are just a stream of bytes.

The next layer up on Unix is the C RTL. There's nothing like RMS
between the filesystem and the C RTL on Unix.

Simon.
--
Simon Clubley, ***@remove_me.eisner.decus.org-Earth.UFP
Walking destinations on a map are further away than they appear.
Arne Vajhøj
2021-10-07 18:18:19 UTC
Permalink
Post by Simon Clubley
Post by Dave Froble
I'm guessing Unix files don't have metadata and such. So the comparison
is not valid.
No, Unix doesn't. At Unix filesystem level, files are just a stream of bytes.
The next layer up on Unix is the C RTL. There's nothing like RMS
between the filesystem and the C RTL on Unix.
The Unix file systems do not have metadata about how the
bytes are to be read/interpreted (like VMS: ORG, RFM, RAT,
MRS etc.). They do have some general metadata (owner,
protection, size, timestamp).

Arne
Lawrence D’Oliveiro
2021-10-08 01:10:01 UTC
Permalink
Post by Simon Clubley
Post by Dave Froble
I'm guessing Unix files don't have metadata and such.
No, Unix doesn't. At Unix filesystem level, files are just a stream of bytes.
Some Linux filesystems have the concept of “extended attributes” <https://manpages.debian.org/buster/manpages/xattr.7.en.html>. Some are reserved for security purposes, others are user-defined.
Simon Clubley
2021-10-08 18:23:44 UTC
Permalink
Post by Simon Clubley
Post by Dave Froble
I'm guessing Unix files don't have metadata and such.
No, Unix doesn't. At Unix filesystem level, files are just a stream of bytes.
Some Linux filesystems have the concept of “extended attributes” <https://manpages.debian.org/buster/manpages/xattr.7.en.html>. Some are reserved for security purposes, others are user-defined.
That's true and I do use them. However, at filesystem level, the
file data itself is just a stream of bytes without any embedded metadata
(unlike on VMS).

Simon.
--
Simon Clubley, ***@remove_me.eisner.decus.org-Earth.UFP
Walking destinations on a map are further away than they appear.
Stephen Hoffman
2021-10-07 15:34:48 UTC
Permalink
Post by Greg Tinkler
My point is SSIO seems to be focused on just PostgreSQL, whereas an RMS
solution is much much easier to program, uses well tested code, and is
already cluster ready putting the team ahead of the game and not
building issues for the future.
...

Fix the existing RMS data corruption in 32-bit RMS and/or in the C
library, and get PostgreSQL available on OpenVMS soonest. I expect this
is the priority for VSI.

Everything else is aspirational.

Integrate stream file access support at the XQP and allow C and C++ and
other non-punched-card-style app designs and stream- and OO-focused
languages to optionally bypass RMS entirely.

Better integrate and document the existing range-locking support
available within DLM.

And in aggregate, stop trying to make the current 32-bit RMS NoSQL
database more complex than it already is, and re-architect such that
32-bit RMS NoSQL database becomes just another available database, and
preferably while providing room for 64-bit RMS rather than trying
another OpenVMS Alpha V7.0-style 32-/64-bit or FAB/RAB/RAB64/NAM/NAML
design, and make 32- or (hypothetical) 64-bit RMS not the sole
persistent-storage "funnel" for structured file access for apps running
on OpenVMS, short of those few using XQP or LOG_IO or PHY_IO. Existing
RMS apps are already headed for "fun" as part of the upcoming 64-bit
LBN work for VSI and for apps, and a whole lot of those apps just won't
make it past messes similar to apps still tied to ODS-2 naming. I'd
wager that most existing apps don't yet fully support ODS-5 naming,
UTF-8 and all, too. Similar app messes with latent 32-bit RMS
dependencies.
--
Pure Personal Opinion | HoffmanLabs LLC
Dave Froble
2021-10-07 17:16:03 UTC
Permalink
Post by Stephen Hoffman
Post by Greg Tinkler
My point is SSIO seems to be focused on just PostgreSQL, whereas an
RMS solution is much much easier to program, uses well tested code,
and is already cluster ready putting the team ahead of the game and
not building issues for the future.
...
Fix the existing RMS data corruption in 32-bit RMS and/or in the C
library, and get PostgreSQL available on OpenVMS soonest. I expect this
is the priority for VSI.
Most likely.
Post by Stephen Hoffman
Everything else is aspirational.
Integrate stream file access support at the XQP and allow C and C++ and
other non-punched-card-style app designs and stream- and OO-focused
languages to optionally bypass RMS entirely.
I don't use C, so I don't know much about it. But isn't this capability
already available? Even RMS has the BLOCK I/O capability, at least from
Basic.

As far as I know, QIO doesn't know a thing about RMS. Well, the
directory structure does know RMS, and to an extent is RMS.
Post by Stephen Hoffman
Better integrate and document the existing range-locking support
available within DLM.
Yes, for sure. And if needed, make it much better.
Post by Stephen Hoffman
And in aggregate, stop trying to make the current 32-bit RMS NoSQL
database more complex than it already is, and re-architect such that
32-bit RMS NoSQL database becomes just another available database, and
preferably while providing room for 64-bit RMS rather than trying
another OpenVMS Alpha V7.0-style 32-/64-bit or FAB/RAB/RAB64/NAM/NAML
design, and make 32- or (hypothetical) 64-bit RMS not the sole
persistent-storage "funnel" for structured file access for apps running
on OpenVMS, short of those few using XQP or LOG_IO or PHY_IO. Existing
RMS apps are already headed for "fun" as part of the upcoming 64-bit LBN
work for VSI and for apps, and a whole lot of those apps just won't make
it past messes similar to apps still tied to ODS-2 naming. I'd wager
that most existing apps don't yet fully support ODS-5 naming, UTF-8 and
all, too. Similar app messes with latent 32-bit RMS dependencies.
Oh, no, Steve. That is much too logical and reasonable. Can't have
that. We must ensure that things stay totally screwed up.

Don't know how far work had progressed on alternate file systems. Might
or might not help to make RMS "just another capability". But, doing
what you suggest would go a long way toward making VMS more useful in
the future.

I've got the suspicion that VMS clusters, while good, create some of the
problems in attempting to add new capabilities to VMS. Need I mention
"MOUNT"? Better segregation might help to add new and different
capabilities. Not sure how easy that might be.
--
David Froble Tel: 724-529-0450
Dave Froble Enterprises, Inc. E-Mail: ***@tsoft-inc.com
DFE Ultralights, Inc.
170 Grimplin Road
Vanderbilt, PA 15486
Arne Vajhøj
2021-10-07 17:25:30 UTC
Permalink
Post by Stephen Hoffman
Integrate stream file access support at the XQP and allow C and C++ and
other non-punched-card-style app designs and stream- and OO-focused
languages to optionally bypass RMS entirely.
I don't use C, so I don't know much about it.  But isn't this capability
already available?  Even RMS has the BLOCK I/O capability, at least from
Basic.
C/C++ and most newer languages have a "stream view" of files while
RMS has a "record view" of files.

If they used different file systems everything would be fine.

If all text files are STMLF then it works and the "stream view"
and the "record view" produces consistent results.

But trying to mix on variable length or VFC files becomes
a minefield.

I know you don't like C, but try look at the example I posted.
Some of the outputs are very weird.

Arne
Dave Froble
2021-10-07 21:13:54 UTC
Permalink
Post by Arne Vajhøj
Post by Dave Froble
Post by Stephen Hoffman
Integrate stream file access support at the XQP and allow C and C++ and
other non-punched-card-style app designs and stream- and OO-focused
languages to optionally bypass RMS entirely.
I don't use C, so I don't know much about it. But isn't this
capability already available? Even RMS has the BLOCK I/O capability,
at least from Basic.
C/C++ and most newer languages have a "stream view" of files while
RMS has a "record view" of files.
If they used different file systems everything would be fine.
If all text files are STMLF then it works and the "stream view"
and the "record view" produces consistent results.
But trying to mix on variable length or VFC files becomes
a minefield.
I know you don't like C, but try look at the example I posted.
Some of the outputs are very weird.
Arne
I have some understanding of RMS files. I've been known to do recovery
work on corrupted RMS files. Have to have some knowledge of RMS to do
that. But, it sure isn't any fun ...
--
David Froble Tel: 724-529-0450
Dave Froble Enterprises, Inc. E-Mail: ***@tsoft-inc.com
DFE Ultralights, Inc.
170 Grimplin Road
Vanderbilt, PA 15486
Simon Clubley
2021-10-07 18:28:04 UTC
Permalink
Post by Dave Froble
Don't know how far work had progressed on alternate file systems. Might
or might not help to make RMS "just another capability". But, doing
what you suggest would go a long way toward making VMS more useful in
the future.
I've got the suspicion that VMS clusters, while good, create some of the
problems in attempting to add new capabilities to VMS. Need I mention
"MOUNT"? Better segregation might help to add new and different
capabilities. Not sure how easy that might be.
VMS clusters at conceptual level are not the problem. They offer
some very nice functionality that only recently is beginning to
appear elsewhere. They were literally a generation ahead of what
was available elsewhere when they were released.

The problem is how VMS was designed in those early days before
modular and layered computing really took off.

The VMS filesystem code, including MOUNT as you say, is a _horrible_
monolithic mass of closely interlinked code without any clear
boundaries between them that allow people (including end users) to
easily plug in new functionality and new filesystems.

The same is true for VMS CLIs BTW. DCL is tightly bound into VMS
in a horrible way it should not be. On Linux, both the command
shell and filesystem architectures are vastly cleaner and more
modular than they are on VMS.

However, if VMS had been designed in a later era, there would be
absolutely nothing stopping VMS having a cleaner internal architecture
_and_ also having world-leading cluster capabilities that are only
now just being equalled elsewhere.

IOW, it's not clustering that's the problem - it's the fact that
VMS wasn't implemented 5 to 10 years later than it was.

Simon.
--
Simon Clubley, ***@remove_me.eisner.decus.org-Earth.UFP
Walking destinations on a map are further away than they appear.
Stephen Hoffman
2021-10-07 18:34:19 UTC
Permalink
IOW, it's not clustering that's the problem - it's the fact that VMS
wasn't implemented 5 to 10 years later than it was.
...Or that OpenVMS and its apps weren't later migrated to DEC MICA.
Which is kinda-sorta what you're referring to.
--
Pure Personal Opinion | HoffmanLabs LLC
Dave Froble
2021-10-07 21:17:32 UTC
Permalink
Post by Simon Clubley
Post by Dave Froble
Don't know how far work had progressed on alternate file systems. Might
or might not help to make RMS "just another capability". But, doing
what you suggest would go a long way toward making VMS more useful in
the future.
I've got the suspicion that VMS clusters, while good, create some of the
problems in attempting to add new capabilities to VMS. Need I mention
"MOUNT"? Better segregation might help to add new and different
capabilities. Not sure how easy that might be.
VMS clusters at conceptual level are not the problem. They offer
some very nice functionality that only recently is beginning to
appear elsewhere. They were literally a generation ahead of what
was available elsewhere when they were released.
The problem is how VMS was designed in those early days before
modular and layered computing really took off.
The VMS filesystem code, including MOUNT as you say, is a _horrible_
monolithic mass of closely interlinked code without any clear
boundaries between them that allow people (including end users) to
easily plug in new functionality and new filesystems.
The same is true for VMS CLIs BTW. DCL is tightly bound into VMS
in a horrible way it should not be. On Linux, both the command
shell and filesystem architectures are vastly cleaner and more
modular than they are on VMS.
However, if VMS had been designed in a later era, there would be
absolutely nothing stopping VMS having a cleaner internal architecture
_and_ also having world-leading cluster capabilities that are only
now just being equalled elsewhere.
IOW, it's not clustering that's the problem - it's the fact that
VMS wasn't implemented 5 to 10 years later than it was.
Simon.
You may have noticed that I didn't blame VMS clusters for the problem.
Rather how some things are so rigid, and more so because of support for
some things that involve clusters. Makes new stuff sometimes much
harder, as you mentioned.
--
David Froble Tel: 724-529-0450
Dave Froble Enterprises, Inc. E-Mail: ***@tsoft-inc.com
DFE Ultralights, Inc.
170 Grimplin Road
Vanderbilt, PA 15486
Lawrence D’Oliveiro
2021-10-08 01:19:33 UTC
Permalink
Post by Simon Clubley
The same is true for VMS CLIs BTW. DCL is tightly bound into VMS
in a horrible way it should not be. On Linux, both the command
shell and filesystem architectures are vastly cleaner and more
modular than they are on VMS.
Fundamental difference in mindset: process creation in VMS is expensive and to be avoided if possible, while on *nix systems it’s something you do as naturally as breathing.

And of course the VMS mindset continued over into Windows NT...
Post by Simon Clubley
However, if VMS had been designed in a later era ...
Note that Unix predates VMS. Folks at DEC would have been aware of it right from the early days, since it was born on DEC hardware.
Stephen Hoffman
2021-10-08 15:37:24 UTC
Permalink
Post by Lawrence D’Oliveiro
The same is true for VMS CLIs BTW. DCL is tightly bound into VMS in a
horrible way it should not be. On Linux, both the command shell and
filesystem architectures are vastly cleaner and more modular than they
are on VMS.
Fundamental difference in mindset: process creation in VMS is expensive
and to be avoided if possible, while on *nix systems it’s something you
do as naturally as breathing.
VAX-era wisdom, and one still clung to. Creating new processes on
OpenVMS never got as lightweight as on Unix, but the overhead has
become negligible on modern systems for all but industrial-scale
creation-deletion.

Having looked at this back in the VAX era, the slow process creations
our apps were incurring were arising from inefficiencies within the DCL
spawn-related processing, and not from within the OpenVMS process
creation overhead.

Once that was identified and the obvious work-around implemented,
spawns were pretty speedy even VAX-era.

To Simon's comment, how DCL gets mapped into process address space is
just ugly, too. And hard to debug.
--
Pure Personal Opinion | HoffmanLabs LLC
Lawrence D’Oliveiro
2021-10-08 22:55:03 UTC
Permalink
Post by Stephen Hoffman
Having looked at this back in the VAX era, the slow process creations
our apps were incurring were arising from inefficiencies within the DCL
spawn-related processing, and not from within the OpenVMS process
creation overhead.
Once that was identified and the obvious work-around implemented,
spawns were pretty speedy even VAX-era.
To Simon's comment, how DCL gets mapped into process address space is
just ugly, too. And hard to debug.
But the whole reason why DCL maps into a process in this way, so that user-mode code can be repeatedly loaded, run and then wiped from the same process, was precisely to avoid multiple process creations. Now you are saying that the DCL mechanism itself contributes to the overhead of process creations!

But “spawn” is still not the same as “fork”. Sure, in *nix, the “fork” followed by “exec” idiom is common, but lots of forks are done without an accompanying exec (I’ve done a few myself). In the early days of Unix, the “vfork” hack was invented to speed things up in the fork+exec case, but this was later discovered to be unnecessary: not (so much) because hardware had become faster, but it was recognized that the bottleneck of giving the child process its own copy of non-shared writable memory could be avoided/postponed by just copying the relevant page table entries and setting a “copy-on-write” flag on them.

What do you know, vfork(2) was actually specified in POSIX, and Linux still supports it <https://manpages.debian.org/bullseye/manpages-dev/vfork.2.en.html>.
Greg Tinkler
2021-10-09 00:04:09 UTC
Permalink
<snip>
Post by Dave Froble
The optional parameter might be a "lock type", and if not present,
existing logic would be used, and if present, new code could be executed
to process the new lock type. Stuff a couple of quadwords into the
resource name for the numeric range. It would add one new piece of data
to the DLM data structure(s).
What would be useful is a name space for the lock e.g. RMS ...

Well, there is the group ID from the UIC, and there is $set_resource_domain(); both can be useful, but neither is a solution.

Having a name space that can be local machine or cluster wide could be very useful for some applications. But that is a much longer term idea.

At present the resource name is limited to 31 characters. That was fine in the 32-bit era, but in the 64-bit era, with GFS2 handling petabytes and moving into the exabyte and possibly the zettabyte range, VMS needs to prepare if it is to survive the next 40 years.

First let's move onto x86-64. Yes, it would be good to have easier building of open source code, and the main issues as I understand it are:

- file IO: moving to RMS will fix most if not all of that
- fork:
  - for fork/exec, spawn is fine; on modern systems it costs a bit of CPU...
  - for file access, more of a problem, but not often used
- directory and file naming: some work is in place for this

So the main issue is file IO, so change CRTL to use RMS.

gt
Dave Froble
2021-10-09 01:09:30 UTC
Permalink
Post by Greg Tinkler
<snip>
Post by Dave Froble
The optional parameter might be a "lock type", and if not present,
existing logic would be used, and if present, new code could be executed
to process the new lock type. Stuff a couple of quadwords into the
resource name for the numeric range. It would add one new piece of data
to the DLM data structure(s).
What would be useful is a name space for the lock e.g. RMS ...
Well there is the group id from UIC, and there is $set_resource_domain(), both can be useful but not a solution.
Having a name space that can be local machine or cluster wide could be very useful for some applications. But that is a much longer term idea.
At present the resource name is limited to 31 char, ok in the 32 era but in 64 bit era and looking at GFS2 for peta byte moving into the exabyte and possibly the Zettabyta range, if VMS is to survive the next 40 years it needs to prepare.
First lets move onto X86_64. Yes it would be good to have easier building of open source code, and the main issues as I understand it are
file IO, moving to RMS will fix most if not all of that
fork
for fork/exec - spawn is fine, no modern systems a bit of CPU...
for file access more of problem, but not often used
directory and filenaming
some work is in place for this
So the main issue is file IO, so change CRTL to use RMS.
gt
Let me ask this as a question, because I really don't know.

Doesn't C already use RMS for file I/O ?

It has been my impression that all the VMS languages use RMS for file
I/O. But I don't get out much.
--
David Froble Tel: 724-529-0450
Dave Froble Enterprises, Inc. E-Mail: ***@tsoft-inc.com
DFE Ultralights, Inc.
170 Grimplin Road
Vanderbilt, PA 15486
Arne Vajhøj
2021-10-09 01:18:53 UTC
Permalink
Post by Dave Froble
Post by Greg Tinkler
So the main issue is file IO, so change CRTL to use RMS.
Let me ask this as a question, because I really don't know.
Doesn't C already use RMS for file I/O ?
It has been my impression that all the VMS languages use RMS for file
I/O.  But I don't get out much.
It has previously been claimed that:

other languages use a very thin layer on top of SYS$GET and SYS$PUT

C use a much thicker layer on top of SYS$READ and SYS$WRITE

I don't know if it is true or not.

Arne
Vitaly Pustovetov
2021-10-09 07:00:47 UTC
Permalink
Post by Greg Tinkler
So the main issue is file IO, so change CRTL to use RMS.
gt
CRTL uses RMS for file I/O. But there is an issue with concurrent access of multiple processes to the same file in stream mode. And we had a choice - 1) rewrite half of Postgres by inserting file locking; 2) add a new SSIO (Shared Stream IO) service to VMS.
Greg Tinkler
2021-10-09 10:19:42 UTC
Permalink
Post by Vitaly Pustovetov
Post by Greg Tinkler
So the main issue is file IO, so change CRTL to use RMS.
gt
CRTL uses RMS for file I/O. But there is an issue with concurrent access of multiple processes to the same file in stream mode.
And we had a choice - 1) rewrite half of Postgres by inserting file locking; 2) add a new SSIO (Shared Stream IO) service to VMS.
Sort of.

RMS has worked, and has been working for 40+ years, and does not have these concurrent access issues! NB: there is no such thing at the OS level as a stream of anything; everything is clumps of data buffered in some way, and the API that accesses that data from the higher levels may be stream based. In this case it is CRTL's role to translate the clumps of data into/from the stream API.

So the other choice, 3): fix CRTL to use RMS correctly, and the problems will go away. The engineering effort would not be great. I don't have access to the code base, but assuming that stdio uses unixio, then it is a matter of fixing 5 routines. This would also allow all the other ports to work with minimal changes in the file access area. If you want to know more, contact me directly.

Longer term the SSIO may be useful to RMS, which is where it belongs.

Sorry if the above is a little blunt. I appreciate the efforts people have put in over the years, but some of us have been using and coding VMS for a very long time, and I really want VMS to be successful and easy to port to. This has been a good opportunity for me to look more into CRTL and RMS, and see the problems that have been there for decades.

Locking is an interesting area. I still feel the current DLM is more than capable of doing the 'lock a byte range' in a way that can be used with the current RMS locking. Longer term, DLM needs some changes, but they are about sizes of resource names, scoping of resource names, and the ability to scan for child resources by name.

gt
David Jones
2021-10-09 12:25:11 UTC
Permalink
So the other choice, 3), fix CRTL to use RMS correctly, and the problems will go away. Engineering effort would not be great. I don't have access to the code base, but assuming that stdio uses unixio then it is fixing 5 routines. This would also allow all the other ports to work with minimal changes in the file access area. If you what to know more contact me directly.
Longer term the SSIO may be useful to RMS, which is where it belongs.
I don't think the CRTL can do it with just the capabilities RMS gives it currently (or they would have fixed it already). Maintaining coherence of where end-of-file is for multiple writers is a difficult problem.

The CRTL does not layer stdio file access on top of unixio primitives.
Vitaly Pustovetov
2021-10-09 15:48:29 UTC
Permalink
Post by Greg Tinkler
RMS worked and has been working of 40+years and does not have these concurrent access issues!
No, you are wrong. RMS works fine with record-based files, but not streams. You can write a program even in MACRO and you will still have the same issues. This is a documented feature of RMS.
Stephen Hoffman
2021-10-09 16:47:11 UTC
Permalink
Post by Greg Tinkler
Post by Vitaly Pustovetov
So the main issue is file IO, so change CRTL to use RMS.
CRTL uses RMS for file I/O. But there is an issue with concurrent
access of multiple processes to the same file in stream mode. And we
had a choice - 1) rewrite half of Postgres by inserting file locking;
2) add a new SSIO (Shared Stream IO) service to VMS.
Sort of.
RMS worked and has been working of 40+years and does not have these
concurrent access issues! NB there is no such thing at an OS level of
stream anything, every thing is clumps of data being buffered is some
way, the API that accesses that data from the higher levels may be
stream based. In this case it is CRTL's role to translate the clumps
of data into/from stream API.
RMS is a pretty good database, for its time. Alas, it's become rather
more dated, with an API design that is complex and limiting, and in
competitive terms RMS is badly feature-limited.

If you need a key-value store and where the developer entirely owns the
fields used within the punched cards, and where y'all can fit your
files in 2 TiB (or bound volume sets, gag), RMS is still a fine choice.

For stream access to data, removing the punched-card assumptions and
file and cluster locking and the rest effectively removes all of RMS
from the discussion; in such a case, RMS really isn't used, either in
name or in general.

As for whether or not there are streams of data, the abstraction
exists. The difference at the app level is whether the operating system
and its default file system enforces the use of a punched-card
abstraction. C does not expect that. Classic OpenVMS apps do.
Post by Greg Tinkler
So the other choice, 3), fix CRTL to use RMS correctly, and the
problems will go away. Engineering effort would not be great. I don't
have access to the code base, but assuming that stdio uses unixio then
it is fixing 5 routines. This would also allow all the other ports to
work with minimal changes in the file access area. If you what to know
more contact me directly.
Punched cards and punched-card-based assumptions are rather more
pernicious within OpenVMS and clustering, and mailboxes, and various
other areas, alas. For those of us steeped in OpenVMS, the effects of
these assumptions can be invisible.
Post by Greg Tinkler
Longer term the SSIO may be useful to RMS, which is where it belongs.
Longer-term, SSIO belongs in XQP, and RMS needs a demotion to "just
another of the available databases on OpenVMS" atop the XQP, and/or
atop some replacement XQP and/or FUSE for different file systems, and
this with various other common databases present.

SSIO and similar work aside, that demotion of RMS and the related and
substantial investment in new file system and database work are not
going to happen any time soon. Getting the device drivers and XQP to
64-bit storage addressing was reportedly one part of the work involved
(and was once targeted for V8.5), while getting RMS to 64-bit
addressing was a separate and subsequent feature. Getting RMS to 64-bit
storage addressing was and is and will be a rather larger investment,
too. Both VAFS and GFS have been discussed here, but VSI has been busy
with and increasingly focused on the port and port-related work.

SSIO is unrelated to the other file system work pending here.

Back to RMS and SSIO, apps that don't expect punched-card semantics can
and variously do perform their own coordination, so sharing the
underlying files with apps that do expect punched cards is unnecessary,
and counterproductive.
Post by Greg Tinkler
Sorry if the above is a little blunt, I appreciate the efforts people
have put in over the years, but some of us have using and coding VMS
for a very long time, and I really want VMS to be successful and easy
to port to. This has been a good opportunity for me to look more into
CRTL and RMS, and see the problems that have been there for decades.
Part of the problem was that thirty years ago, senior DEC leadership
and OpenVMS development leadership was unwilling or unable to foresee
the directions that computing was headed, and fallout from that era
continues to reverberate through to this day around C and IP and RMS.
And around where OpenVMS has found itself in recent years. As long as
we're being blunt.

One of the problems that OpenVMS has here is RMS. While RMS was and is
very useful, it's just not a competitive database in 2021, and too many
of its punched-card assumptions have permeated the platform. That, and
the primary RMS API is just bad for making any sort of significant
changes. This is another area very much like the addition of 64-bit
virtual addressing on OpenVMS; where providing compatibility for 32-bit
virtual apps makes an already complex environment (RMS) vastly more
complex (mixed 32-bit and 64-bit storage addressing within RMS).
Post by Greg Tinkler
locking is an interesting area, I still feel the current DLM is more
than capable of doing the 'lock a byte range' in a way that can be used
with the current RMS locking. Longer term DLM needs some changes but
they are about sizes of resource names, scoping of resource names,
ability to scan for children resources by name.
C and DLM already implement range locking on OpenVMS.
--
Pure Personal Opinion | HoffmanLabs LLC
Dave Froble
2021-10-09 18:00:45 UTC
Permalink
Post by Stephen Hoffman
C and DLM already implement range locking on OpenVMS.
I'd really like to see the documentation and how to use it.
--
David Froble Tel: 724-529-0450
Dave Froble Enterprises, Inc. E-Mail: ***@tsoft-inc.com
DFE Ultralights, Inc.
170 Grimplin Road
Vanderbilt, PA 15486
Stephen Hoffman
2021-10-09 20:57:34 UTC
Permalink
Post by Dave Froble
Post by Stephen Hoffman
C and DLM already implement range locking on OpenVMS.
I'd really like to see the documentation and how to use it.
Alas, entirely undocumented, per the previous comments around here.
--
Pure Personal Opinion | HoffmanLabs LLC
Dave Froble
2021-10-09 22:47:38 UTC
Permalink
Post by Stephen Hoffman
Post by Dave Froble
Post by Stephen Hoffman
C and DLM already implement range locking on OpenVMS.
I'd really like to see the documentation and how to use it.
Alas, entirely undocumented, per the previous comments around here.
Then, does it really exist?
--
David Froble Tel: 724-529-0450
Dave Froble Enterprises, Inc. E-Mail: ***@tsoft-inc.com
DFE Ultralights, Inc.
170 Grimplin Road
Vanderbilt, PA 15486
Vitaly Pustovetov
2021-10-09 21:22:29 UTC
Permalink
Post by Dave Froble
Post by Stephen Hoffman
C and DLM already implement range locking on OpenVMS.
I'd really like to see the documentation and how to use it.
"File Locking
The C RTL supports byte-range file locking using the F_GETLK, F_SETLK, and F_SETLKW
commands of the fcntl function, as defined in the X/Open specification. Byte-range file locking is
supported across OpenVMS clusters. You can only use offsets that fit into 32-bit unsigned integers.
When a shared lock is set on a segment of a file, other processes on the cluster are able to set shared
locks on that segment or a portion of it. A shared lock prevents any other process from setting
an exclusive lock on any portion of the protected area. A request for a shared lock fails if the file
descriptor was not opened with read access....."(c)VSI C Run-Time Library Reference Manual
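The X/Open byte-range semantics the manual excerpt describes can be sketched on any POSIX system. Here is a minimal illustration, assuming Python's fcntl module as a stand-in for the CRTL's fcntl (the file name lockdemo.dat and the 512-byte range are arbitrary): a holder takes an exclusive lock on one range, and a forked child finds the overlapping range blocked while a non-overlapping range locks fine.

```python
import fcntl
import os

path = "lockdemo.dat"
with open(path, "wb") as f:
    f.write(b"\0" * 1024)

holder = open(path, "r+b")
# Exclusive lock on bytes 0..511 (one disk "block"), non-blocking.
fcntl.lockf(holder, fcntl.LOCK_EX | fcntl.LOCK_NB, 512, 0, os.SEEK_SET)

pid = os.fork()
if pid == 0:
    # Child process with its own descriptor. POSIX record locks are
    # per-process, so a conflicting exclusive lock on the same byte
    # range must fail immediately under LOCK_NB.
    child = open(path, "r+b")
    try:
        fcntl.lockf(child, fcntl.LOCK_EX | fcntl.LOCK_NB, 512, 0, os.SEEK_SET)
        os._exit(1)  # the overlapping lock unexpectedly succeeded
    except OSError:
        pass
    # A non-overlapping range (bytes 512..1023) locks without conflict.
    fcntl.lockf(child, fcntl.LOCK_EX | fcntl.LOCK_NB, 512, 512, os.SEEK_SET)
    os._exit(0)

_, status = os.waitpid(pid, 0)
holder.close()
os.remove(path)
print("child exit:", os.WEXITSTATUS(status))
```

On OpenVMS the same F_SETLK/F_SETLKW model is what the CRTL layers over the DLM underneath, per the excerpt; this sketch only shows the visible behavior, not that implementation.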
Stephen Hoffman
2021-10-09 22:27:45 UTC
Permalink
Post by Vitaly Pustovetov
On 10/9/2021 12:47 PM, Stephen Hoffman wrote: C and DLM already
implement range locking on OpenVMS.
I'd really like to see the documentation and how to use it.
"File Locking
The C RTL supports byte-range...
That comment was in reference to the DLM range-locking API; the
(un)documentation for what's implemented underneath those C calls
within CRTL and DLM.
--
Pure Personal Opinion | HoffmanLabs LLC
Arne Vajhøj
2021-10-09 18:22:02 UTC
Permalink
Post by Greg Tinkler
Post by Vitaly Pustovetov
So the main issue is file IO, so change CRTL to use RMS.
CRTL uses RMS for file I/O. But there is an issue with concurrent
access of multiple processes to the same file in stream mode. And we
had a choice - 1) rewrite half of Postgres by inserting file locking;
2) add a new SSIO (Shared Stream IO) service to VMS.
Sort of.
RMS worked and has been working of 40+years and does not have these
concurrent access issues!  NB there is no such thing at an OS level of
stream anything, every thing is clumps of data being buffered is some
way, the API that accesses that data from the higher levels may be
stream based.  In this case it is CRTL's role to translate the clumps
of data into/from stream API.
RMS is a pretty good database, for its time.  Alas, its become rather
more dated, with an API design that is complex and limiting, and in
competitive terms RMS is badly feature-limited.
If you need a key-value store and where the developer entirely owns the
fields used within the punched cards, and where y'all can fit your files
in 2 TiB (or bound volume sets, gag), RMS is still a fine choice.
Hoff I think you are muddying the water here.

This discussion has so far been about ORG:SEQ files.

ORG:IDX files are a Key Value Store. But that is a totally
different topic.

Arne
Stephen Hoffman
2021-10-09 20:55:20 UTC
Permalink
Post by Arne Vajhøj
RMS is a pretty good database, for its time.  Alas, its become rather
more dated, with an API design that is complex and limiting, and in
competitive terms RMS is badly feature-limited.
If you need a key-value store and where the developer entirely owns the
fields used within the punched cards, and where y'all can fit your
files in 2 TiB (or bound volume sets, gag), RMS is still a fine choice.
Hoff I think you are muddying the water here.
This discussion has so far been about ORG:SEQ files.
ORG:IDX files are a Key Value Store. But that is a totally different topic.
And here I was trying to explicitly not slag on RMS and its
capabilities, as that'd solely serve to provoke a torrent of folks
quite reasonably pointing out that RMS is perfect for {app}.
--
Pure Personal Opinion | HoffmanLabs LLC
Dave Froble
2021-10-09 22:55:17 UTC
Permalink
Post by Stephen Hoffman
Post by Arne Vajhøj
Post by Stephen Hoffman
RMS is a pretty good database, for its time. Alas, its become rather
more dated, with an API design that is complex and limiting, and in
competitive terms RMS is badly feature-limited.
If you need a key-value store and where the developer entirely owns
the fields used within the punched cards, and where y'all can fit
your files in 2 TiB (or bound volume sets, gag), RMS is still a fine
choice.
Hoff I think you are muddying the water here.
This discussion has so far been about ORG:SEQ files.
ORG:IDX files are a Key Value Store. But that is a totally different topic.
And here I was trying to explicitly not slag on RMS and its
capabilities, as that'd solely serve provoke a torrent of folks quite
reasonably pointing out that RMS is perfect for {app}.
Which it is, for those apps that need and use its capabilities. Well,
maybe not "perfect". There is that definition of data fields that is
so lacking in RMS. What I believe you call "marshaling".
--
David Froble Tel: 724-529-0450
Dave Froble Enterprises, Inc. E-Mail: ***@tsoft-inc.com
DFE Ultralights, Inc.
170 Grimplin Road
Vanderbilt, PA 15486
Dave Froble
2021-10-09 22:52:22 UTC
Permalink
Post by Arne Vajhøj
Post by Stephen Hoffman
Post by Greg Tinkler
Post by Vitaly Pustovetov
So the main issue is file IO, so change CRTL to use RMS.
CRTL uses RMS for file I/O. But there is an issue with concurrent
access of multiple processes to the same file in stream mode. And we
had a choice - 1) rewrite half of Postgres by inserting file
locking; 2) add a new SSIO (Shared Stream IO) service to VMS.
Sort of.
RMS worked and has been working of 40+years and does not have these
concurrent access issues! NB there is no such thing at an OS level
of stream anything, every thing is clumps of data being buffered is
some way, the API that accesses that data from the higher levels may
be stream based. In this case it is CRTL's role to translate the
clumps of data into/from stream API.
RMS is a pretty good database, for its time. Alas, its become rather
more dated, with an API design that is complex and limiting, and in
competitive terms RMS is badly feature-limited.
If you need a key-value store and where the developer entirely owns
the fields used within the punched cards, and where y'all can fit your
files in 2 TiB (or bound volume sets, gag), RMS is still a fine choice.
Hoff I think you are muddying the water here.
This discussion has so far been about ORG:SEQ files.
ORG:IDX files are a Key Value Store. But that is a totally
different topic.
Arne
No, it is not. The OP declared that RMS should be used for that.

You are correct that we're concerned about stream files, but claims
about RMS have been part of the discussion.
--
David Froble Tel: 724-529-0450
Dave Froble Enterprises, Inc. E-Mail: ***@tsoft-inc.com
DFE Ultralights, Inc.
170 Grimplin Road
Vanderbilt, PA 15486
Arne Vajhøj
2021-10-09 23:00:26 UTC
Permalink
Post by Arne Vajhøj
RMS is a pretty good database, for its time.  Alas, its become rather
more dated, with an API design that is complex and limiting, and in
competitive terms RMS is badly feature-limited.
If you need a key-value store and where the developer entirely owns
the fields used within the punched cards, and where y'all can fit your
files in 2 TiB (or bound volume sets, gag), RMS is still a fine choice.
Hoff I think you are muddying the water here.
This discussion has so far been about ORG:SEQ files.
ORG:IDX files are a Key Value Store. But that is a totally
different topic.
No, it is not.  The OP declared that RMS should be used for that.
You are correct that we're concerned about stream files, but claims
about RMS have been part of the discussion.
RMS is very much in scope for the discussion.

But considering files a stream of bytes and the SSIO
feature are only relevant for ORG:SEQ files.

Arne

Phillip Helbig (undress to reply)
2021-10-09 18:13:24 UTC
Permalink
Post by Stephen Hoffman
RMS is a pretty good database, for its time. Alas, its become rather
more dated,
From database to datedbase. :-)
Arne Vajhøj
2021-10-09 18:18:36 UTC
Permalink
Post by Greg Tinkler
every thing is clumps of data being buffered is some way, the API that
accesses that data from the higher levels may be stream based.  In
this case it is CRTL's role to translate the clumps of data into/from
stream API.
So, how do Pascal, Fortran, Cobol, Basic, and such do it?
They do not treat files as streams of bytes - they treat files
as sequences of records.

The underlying problem is that the two paradigms are pretty
incompatible. It is not easy for CRTL to translate a sequence
of records to a stream of bytes in a consistent and meaningful
manner.
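The mismatch between the two paradigms can be sketched. The layout below is an assumed simplification, loosely modeled on the RMS VAR format (a 2-byte little-endian length word per record, padded to an even boundary; real RMS has many more cases), and shows the kind of translation a CRTL-like layer must do to present records as a byte stream:

```python
import struct

def records_to_var(records):
    """Serialize records with a length word each, word-aligned."""
    out = bytearray()
    for r in records:
        out += struct.pack("<H", len(r)) + r
        if len(r) % 2:
            out += b"\0"          # pad byte to an even boundary
    return bytes(out)

def var_to_stream(blob):
    """What a CRTL-like layer does: strip length words, insert LFs."""
    lines, i = [], 0
    while i < len(blob):
        (n,) = struct.unpack_from("<H", blob, i)
        i += 2
        lines.append(blob[i:i + n])
        i += n + (n % 2)          # skip the pad byte for odd lengths
    return b"\n".join(lines) + b"\n"

recs = [b"ABC", b"Hello"]
print(var_to_stream(records_to_var(recs)))   # b'ABC\nHello\n'
```

Going the other way is where it gets hard: a stream writer that seeks to an arbitrary byte offset has no record boundary to respect, which is exactly the inconsistency being discussed.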

Arne
Dave Froble
2021-10-09 22:41:16 UTC
Permalink
Post by Arne Vajhøj
Post by Greg Tinkler
every thing is clumps of data being buffered is some way, the API
that accesses that data from the higher levels may be stream based.
In this case it is CRTL's role to translate the clumps of data
into/from stream API.
So, how does Pascal, Fortran, Cobol, Basic, and such do it?
They do not treat files as streams of bytes - they treat files
as sequences of records.
The underlying problem is that the two paradigms are pretty
incompatible. It is not easy for CRTL to translate a sequence
of records to a stream of bytes in a consistent and meaningful
manner.
Arne
Which is why Steve suggested ODS2/ODS5 become just another file
system.

Which is why Steve suggested RMS become just another database
product. Well, if ODS? wants to use it for directories, Ok.

But even if another "application" handles other files, there is still
the issue of today's disks being block based (Ok, punched card if you
must) devices.

Stream formats are alien enough to today's VMS that they would be much
better served by dedicated tools designed for them. (And it sure
isn't RMS!)

Then there is the interesting question of what the next format to come
along might be.
--
David Froble Tel: 724-529-0450
Dave Froble Enterprises, Inc. E-Mail: ***@tsoft-inc.com
DFE Ultralights, Inc.
170 Grimplin Road
Vanderbilt, PA 15486
Stephen Hoffman
2021-10-07 18:30:50 UTC
Permalink
Post by Dave Froble
Post by Stephen Hoffman
Post by Greg Tinkler
My point is SSIO seems to be focused on just PostgreSQL, whereas an RMS
solution is much much easier to program, uses well tested code, and is
already cluster ready putting the team ahead of the game and not
building issues for the future.
...
Fix the existing RMS data corruption in 32-bit RMS and/or in the C
library, and get PostgreSQL available on OpenVMS soonest. I expect this
is the priority for VSI.
Most likely.
Post by Stephen Hoffman
Everything else is aspirational.
Integrate stream file access support at the XQP and allow C and C++ and
other non-punched-card-style app designs and stream- and OO-focused
languages to optionally bypass RMS entirely.
I don't use C, so I don't know much about it. But isn't this
capability already available?
The C standard functions—the equivalent of the BASIC calls OPEN, READ,
WRITE, et al—are via RMS. There's no knob to tell C "don't do that".

The C default sequential file format creation format on OpenVMS is RMS
VFC, which has been a perpetual source of confusion and consternation
for users new to C on OpenVMS.
Post by Dave Froble
Even RMS has the BLOCK I/O capability, at least from Basic.
C doesn't do sector I/O within the standard library, though the native
platform calls are easily available.
Post by Dave Froble
As far as I know, QIO doesn't know a thing about RMS. Well, the
directory structure does know RMS, and to an extent is RMS.
$qio (and $io_perform) offer sector access through RMS (virtual),
record access through RMS (virtual), or access to device through the
file system (IO$_ACPCONTROL XQP), or direct access to the device driver
and device (logical and physical I/O).

The VIRT_IO virtual I/O paths through RMS and through the XQP are
cluster-aware, while the LOG_IO logical and PHY_IO physical I/O paths
are not.

RMS provides record locking for cluster coordination, while the XQP
provides coordination for the on-disk file system.
Post by Dave Froble
Post by Stephen Hoffman
Better integrate and document the existing range-locking support
available within DLM.
Yes, for sure. And if needed, make it much better.
Post by Stephen Hoffman
And in aggregate, stop trying to make the current 32-bit RMS NoSQL
database more complex than it already is, and re-architect such that
32-bit RMS NoSQL database becomes just another available database, and
preferably while providing room for 64-bit RMS rather than trying
another OpenVMS Alpha V7.0-style 32-/64-bit or FAB/RAB/RAB64/NAM/NAML
design, and make 32- or (hypothetical) 64-bit RMS not the sole
persistent-storage "funnel" for structured file access for apps running
on OpenVMS, short of those few using XQP or LOG_IO or PHY_IO. Existing
RMS apps are already headed for "fun" as part of the upcoming 64-bit
LBN work for VSI and for apps, and a whole lot of those apps just won't
make it past messes similar to apps still tied to ODS-2 naming. I'd
wager that most existing apps don't yet fully support ODS-5 naming,
UTF-8 and all, too. Similar app messes with latent 32-bit RMS
dependencies.
Oh, no, Steve. That is much too logical and reasonable. Can't have
that. We must ensure that things stay totally screwed up.
I'd prefer an approach where there's some opportunity to ease new work
and new APIs into production, and to also retire overtly-busted APIs.

Oracle Rdb was really good at that migration and for as far as that
went, but most other apps and OpenVMS itself have not managed to copy
that. Not successfully.
Post by Dave Froble
Don't know how far work had progressed on alternate file systems.
Might or might not help to make RMS "just another capability". But,
doing what you suggest would go a long way toward making VMS more
useful in the future.
I've got the suspicion that VMS clusters, while good, create some of
the problems in attempting to add new capabilities to VMS. Need I
mention "MOUNT"? Better segregation might help to add new and
different capabilities. Not sure how easy that might be.
Oracle Rdb and some other databases have cluster access locking,
whether using DLM or database-level locking.

Other databases can be single-host.

The SQLite port to OpenVMS supports DLM and clustering.

PostgreSQL has been adding replication and clustering:
https://www.postgresql.org/docs/9.5/different-replication-solutions.html

Whether an OpenVMS port of PostgreSQL can incorporate DLM calls is
fodder for future discussions, once the SSIO prerequisite becomes
available and a hypothetical future PostgreSQL port becomes stable. A
stable PostgreSQL will interest some folks, with adoptions depending on
both intrinsic interest and, um, potential extrinsic factors not yet in
evidence.

And no, you need not mention MOUNT, having necessarily (re)written what
MOUNT provides on several occasions.
--
Pure Personal Opinion | HoffmanLabs LLC
Arne Vajhøj
2021-10-07 18:36:28 UTC
Permalink
Post by Stephen Hoffman
https://www.postgresql.org/docs/9.5/different-replication-solutions.html
Whether an OpenVMS port of PostgreSQL can incorporate DLM calls is
fodder for future discussions, once the SSIO prerequisite becomes
available and a hypothetical future PostgreSQL port becomes stable. A
stable PostgreSQL will interest some folks, with adoptions depending on
both intrinsic interest and, um, potential extrinsic factors not yet in
evidence.
PostgreSQL clusters are active/passive.

All updates and typically all reads go to the active node,
and updates get replicated from the active node to the passive nodes.

I believe it is possible to have the passive nodes support
reading.

But with only the active node taking updates then there
is no need for DLM.

(VMS people may not even call such a config a cluster, but ...)

Arne
Stephen Hoffman
2021-10-07 19:09:10 UTC
Permalink
PostgreSQL clusters are active/passive. ...
For folks interested in this general topic area with PostgreSQL around
failover and replication, please see the PostgreSQL documentation for
details.

Here's an updated link from what I'd posted earlier:
https://www.postgresql.org/docs/14/different-replication-solutions.html

If there's interest in adding what OpenVMS calls clustering within any
hypothetical future PostgreSQL port, use of the DLM will undoubtedly be
considered.

nb: PostgreSQL uses the term "cluster" for something entirely different
and unrelated to OpenVMS clustering.
--
Pure Personal Opinion | HoffmanLabs LLC
Craig A. Berry
2021-10-07 23:15:19 UTC
Permalink
Post by Stephen Hoffman
I don't use C, so I don't know much about it.  But isn't this
capability already available?
The C standard functions—the equivalent of the BASIC calls OPEN, READ,
WRITE, et al—are via RMS. There's no knob to tell C "don't do that".
You're pretending that you don't know about the foo="bar" options on the
CRTL open/fopen/creat calls. Yes, it's all via RMS, but you can tell it
to do or not do certain things. And the feature logicals, of course, but
it might be dinnertime in your time zone and I wouldn't want to give you
indigestion :-).

But from BASIC, yes, I think you have to write wrappers around the CRTL
functions and then call them from BASIC, or at least that's what I did
the one time I had to write stream files from BASIC.
Dave Froble
2021-10-08 00:22:00 UTC
Permalink
Post by Craig A. Berry
Post by Stephen Hoffman
Post by Dave Froble
I don't use C, so I don't know much about it. But isn't this
capability already available?
The C standard functions—the equivalent of the BASIC calls OPEN, READ,
WRITE, et al—are via RMS. There's no knob to tell C "don't do that".
You're pretending that you don't know about the foo="bar" options on the
CRTL open/fopen/creat calls. Yes, it's all via RMS, but you can tell it
to do or not do certain things. And the feature logicals, of course, but
it might be dinnertime in your time zone and I wouldn't want to give you
indigestion :-).
But from BASIC, yes, I think you have to write wrappers around the CRTL
functions and then call them from BASIC, or at least that's what I did
the one time I had to write stream files from BASIC.
I'd ask, why not call the RMS routines?

No, messing with FABs and RABs and such is not one of my favorite things
to do. But it sure is doable.

Now perhaps the naming doesn't mean the same thing, but:

OPEN
Syntax

      OPEN file-spec1 [ FOR INPUT | FOR OUTPUT ] AS [ FILE ] chnl-exp1
           [, open-clause ]...

      open-clause:  [ ORGANIZATION ] { VIRTUAL    }
                                     { UNDEFINED  } [ STREAM   ]
                                     { INDEXED    } [ VARIABLE ]
                                     { SEQUENTIAL } [ FIXED    ]
                                     { RELATIVE   }

Basic help seems to imply that stream files can be created ...

Perhaps I should actually try it, much as it entails work ...

Itanic> t zz.bas
1 Open "ZZ.ZZ" For Output as File #1%, &
Organization Sequential Stream, &
Recordsize 32767

Print #1%, Num1$(Z%) For Z% = 1% to 5%

Close #1%

End
Itanic> t zz.zz
1
2
3
4
5
Itanic> dir/full zz.zz

Directory DKB0:[DFE]

ZZ.ZZ;1 File ID: (6678,7,0)
Size: 1/16 Owner: [DFE]
Created: 7-OCT-2021 20:22:50.29
Modified: 7-OCT-2021 20:22:50.36 (1)
Expires: <None specified>
Backup: <No backup recorded>
Effective: <None specified>
Recording: <None specified>
Accessed: 7-OCT-2021 20:22:50.29
Attr Mod: 7-OCT-2021 20:22:50.36
Data Mod: 7-OCT-2021 20:22:50.29
Linkcount: 1
File organization: Sequential
Shelved state: Online
Caching attribute: Writethrough
File attributes: Allocation: 16, Extend: 0, Global buffer count: 0,
No version limit
Record format: Stream, maximum 32767 bytes, longest 1 byte
Record attributes: Carriage return carriage control
RMS attributes: None
Journaling enabled: None
File protection: System:RWED, Owner:RWED, Group:RE, World:
Access Cntrl List: None
Client attributes: None
--
David Froble Tel: 724-529-0450
Dave Froble Enterprises, Inc. E-Mail: ***@tsoft-inc.com
DFE Ultralights, Inc.
170 Grimplin Road
Vanderbilt, PA 15486
Lawrence D’Oliveiro
2021-10-08 01:06:11 UTC
Permalink
As far as I know, QIO doesn't know a thing about RMS. Well, the
directory structure does know RMS, and to an extent is RMS.
Last I checked*, on VMS, directories were just files, and there was no protection against processes with write access screwing up their contents. For some reason that was not considered to be a vital part of filesystem integrity.

RMS implements the full file/directory name syntax, but the management of name entries inside directories is an ACP/XQP function.

*decades ago, admittedly
Lawrence D’Oliveiro
2021-10-08 01:01:05 UTC
Permalink
Post by Stephen Hoffman
Integrate stream file access support at the XQP and allow C and C++ and
other non-punched-card-style app designs and stream- and OO-focused
languages to optionally bypass RMS entirely.
I assume you mean “bypass RMS for non-block-level I/O”, since it was always possible for nonprivileged code to do direct ACP/XQP calls like IO$_ACCESS, READ/WRITEVBLK and friends.

(You soon appreciate how much work $PARSE is doing for you...)
Lawrence D’Oliveiro
2021-10-07 01:51:31 UTC
Permalink
Post by John Dallman
People used to UNIX or Windows generally find the other VMS file types
baffling and confusing.
One question I never saw answered (because I never came across examples of files to check it) was whether in “VFC” files, the record count included the fixed header or not? And was that the same or different in the on-disk format versus the in-memory RMS structure with the “RSZ” (“RAB$W_RSZ”?) field?

By the way, I knew FORTRAN carriage control is now an anachronism, but I didn’t realize that it is now considered so obsolete that compilers won’t support it any more.
Arne Vajhøj
2021-10-07 01:59:41 UTC
Permalink
Post by Lawrence D’Oliveiro
Post by John Dallman
People used to UNIX or Windows generally find the other VMS file types
baffling and confusing.
One question I never saw answered (because I never came across
examples of files to check it) was whether in “VFC” files, the record
count included the fixed header or not? And was that the same or
different in the on-disk format versus the in-memory RMS structure
with the “RSZ” (“RAB$W_RSZ”?) field?
Try it!

$ open/write z.z z.z
$ write z.z "ABC"
$ close z.z
$ dir/full z.z

Directory DISK2:[ARNE]

z.z;1 File ID: (5295,236,0)
...
Record format: VFC, 2 byte header, maximum 0 bytes, longest 3 bytes
...
$ dump z.z

Dump of file DISK2:[ARNE]z.z;1 on 6-OCT-2021 21:54:39.48
File ID (5295,236,0) End of file block 1 / Allocated 16

Virtual block number 1 (00000001), 512 (0200) bytes

 00000000 00000000 00000000 00000000 00000000 0000FFFF 00434241 8D010005 ....ABC......................... 000000

Arne
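Decoding the dump above helps answer the original question. VMS DUMP prints each longword right to left, so "8D010005 00434241 0000FFFF" is the byte sequence 05 00 01 8D 41 42 43 00 FF FF: a length word of 5 (which covers the 2-byte VFC header plus the 3 data bytes "ABC"), the header bytes, the data, a pad byte, and the FFFF terminator. A sketch of that reading (whether the in-memory RAB$W_RSZ matches the on-disk count is the separate half of Lawrence's question, not settled by this dump):

```python
import struct

# First bytes of the dumped block, in natural byte order.
blob = bytes([0x05, 0x00, 0x01, 0x8D, 0x41, 0x42, 0x43, 0x00, 0xFF, 0xFF])

(length,) = struct.unpack_from("<H", blob, 0)  # on-disk record length word
header = blob[2:4]                             # 2-byte VFC header
data = blob[4:2 + length]                      # remaining length-word bytes
print(length, header.hex(), data)              # 5 018d b'ABC'
```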
Simon Clubley
2021-10-07 12:12:55 UTC
Permalink
Post by John Dallman
Post by David Jones
Open source software ports often comes with the restriction that it
only works with stream-LF files. Maybe they should add a flag to
directory files that if set only allows it to contain stream-LF
or directory files.
People used to UNIX or Windows generally find the other VMS file types
baffling and confusing. I got used to the idea, but never made use of
them, since my employers already had fewer customers on VMS than they did
UNIX when I joined, and the disparity only increased.
That's because asking Unix/Windows people to learn about VMS records
and file structures is like asking a VMS person to learn how to work
with records and files on z/OS using traditional z/OS methods.

It is something so very, very, different from what they are used to.

Simon.
--
Simon Clubley, ***@remove_me.eisner.decus.org-Earth.UFP
Walking destinations on a map are further away than they appear.
Lawrence D’Oliveiro
2021-10-07 01:45:41 UTC
Permalink
Post by David Jones
Open source software ports often comes with the restriction that it only works
with stream-LF files.
I would say that’s partially true. Typically there are options to treat files as “text” or “binary”. A “binary” file is just a stream of arbitrary 8-bit bytes, which are supposed to be read or written without any imposition of record boundaries, sector-size rounding or special treatment of any byte values. A “text” file is assumed to be broken up into lines. It is true that LF is the traditional Unix line delimiter. But enlightened toolkits like Python are capable of reading text files in “universal newline” mode, so for example if you copy a text file created on MS-DOS (line delimiter = CR+LF, because CP/M did it that way, for no rational reason) in binary mode onto a Linux system, your Python text-processing script running on the latter can cope with it without a hiccup.
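That universal-newline behavior is easy to demonstrate (a minimal sketch; the file name dosfile.txt is arbitrary):

```python
import os

# Write a DOS-style (CR+LF) text file in binary mode...
with open("dosfile.txt", "wb") as f:
    f.write(b"line one\r\nline two\r\n")

# ...and read it back in text mode. Python's default newline=None is
# "universal newline" mode: \r\n, \r, and \n all come back as \n.
with open("dosfile.txt", "r") as f:
    lines = f.read().splitlines()

os.remove("dosfile.txt")
print(lines)  # ['line one', 'line two']
```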
Dave Froble
2021-10-06 13:45:19 UTC
Permalink
Post by Craig A. Berry
Post by Greg Tinkler
An extra that could be added, if the file is RFM=fixed, and the C
code uses it that way with the same record length then use the
SYS$GET/SYS$PUT so it will play nicely with an RMS access to those files.
I don't know the degree to which the current plan corresponds to the
original plan from a decade or so ago, but back then only stream files
were going to be supported by SSIO, which makes sense since the whole
point is locking byte ranges.
It has been my impression that for quite some time at HP, work on
specific requests tended to be very specific to that request, and failed
to consider capabilities as general to VMS.

The approach to SSIO appears to be an example of this. Basically, do
the least required to achieve the specific result. In the case of SSIO
the result appears to be rather useless, at least so far.

For some years I've advocated a more general enhancement to the VMS DLM,
specifically, numeric range locking. Such would address a basic issue
I've had with the VMS DLM for a rather long time.

I've a database product, a rather old product. At the time it was
implemented it was rather useful. But there was a locking issue. The
DLM locks resource names. The database would support I/O transfers of 1
to 127 disk blocks. How would one lock 127 contiguous disk blocks? The
blunt-force method would be taking out 127 locks, hardly an optimal
solution. Having numeric range locking back in 1984 would have been
quite useful.
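(To illustrate the idea, here is a hedged sketch of what numeric range locking would mean: one lock names a block range, and the lock manager detects conflicts by interval overlap, instead of one lock per block. None of these class or method names are real DLM interfaces; this is a toy model of the semantics, not an implementation.)

```python
# Hypothetical sketch of numeric range locking: a single lock covers a
# [start, end] block range (inclusive), and conflicts are detected by
# interval overlap rather than by matching resource names block-by-block.

class RangeLockManager:
    def __init__(self):
        self.granted = []  # list of (owner, start, end), ends inclusive

    @staticmethod
    def _overlaps(s1, e1, s2, e2):
        # Two inclusive ranges overlap unless one ends before the other starts.
        return not (e1 < s2 or e2 < s1)

    def acquire(self, owner, start, end):
        for held_owner, s, e in self.granted:
            if held_owner != owner and self._overlaps(start, end, s, e):
                return False  # another owner holds an overlapping range
        self.granted.append((owner, start, end))
        return True

mgr = RangeLockManager()
# One lock covers a 127-block transfer starting at block 100 ...
assert mgr.acquire("writer", 100, 226)
# ... so another accessor touching any block in that range is refused ...
assert not mgr.acquire("reader", 200, 210)
# ... while a disjoint range is granted.
assert mgr.acquire("reader", 300, 310)
```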

I've also suggested in the past that a simple enhancement to the DLM,
specifically the addition of a "type of lock" with the capability of
adding logic for specific "types" would solve the locking part of SSIO
and do so as a part of VMS, not as part of the CRTL.

As for byte range I/O, I'm not sure what is and isn't possible with disk
drives. It has been my impression that only whole block transfers are
possible. Perhaps I've been wrong. Perhaps SSDs have more flexibility.

Not really an issue for me anymore.
--
David Froble Tel: 724-529-0450
Dave Froble Enterprises, Inc. E-Mail: ***@tsoft-inc.com
DFE Ultralights, Inc.
170 Grimplin Road
Vanderbilt, PA 15486
Arne Vajhøj
2021-10-06 14:37:05 UTC
Permalink
Post by Dave Froble
Post by Craig A. Berry
Post by Greg Tinkler
An extra that could be added, if the file is RFM=fixed, and the C
code  uses it that way with the same record length then use the
SYS$GET/SYS$PUT so it will play nicely with an RMS access to those files.
I don't know the degree to which the current plan corresponds to the
original plan from a decade or so ago, but back then only stream files
were going to be supported by SSIO, which makes sense since the whole
point is locking byte ranges.
It has been my impression that for quite some time at HP, work on
specific requests tended to be very specific to that request, and failed
to consider capabilities as general to VMS.
The approach to SSIO appears to be an example of this.  Basically, do
the least required to achieve the specific result.  In the case of SSIO
the result appears to be rather useless, at least so far.
General is better than specific.

When not considering resources.

My impression is that VSI engineering resources are very limited - and
several orders of magnitude smaller than DEC's 40 years ago.

So when they have the choice of solving something 80% for 200 hours of
effort or 100% for 1000 hours of effort then ...
Post by Dave Froble
For some years I've advocated a more general enhancement to the VMS DLM,
specifically, numeric range locking.  Such would address a basic issue
I've had with the VMS DLM for a rather long time.
I've also suggested in the past that a simple enhancement to the DLM,
specifically the addition of a "type of lock" with the capability of
adding logic for specific "types" would solve the locking part of SSIO
and do so as a part of VMS, not as part of the CRTL.
That would make sense to me.

But I do not count.
Post by Dave Froble
As for byte range I/O, I'm not sure what is and isn't possible with disk
drives.  It has been my impression that only whole block transfers are
possible.  Perhaps I've been wrong.  Perhaps SSDs have more flexibility.
No matter what the disk can do, the VMS file system is still
block oriented, and I believe the system services take block offsets,
not byte offsets.
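(That block orientation is exactly the bookkeeping a byte-offset interface, whether SSIO or a hypothetical SYS$READB, would have to do on top of virtual block I/O. A hedged Python sketch, assuming 512-byte blocks and the VMS convention that virtual block numbers start at 1; the helper name is illustrative, not a real service.)

```python
# Map a (byte_offset, byte_count) request onto whole 512-byte blocks,
# as a byte-granular read would need before slicing out the wanted bytes.

BLOCK_SIZE = 512

def byte_range_to_blocks(offset, count):
    """Return (first_vbn, block_count, skip) for a byte-range request.
    VMS virtual block numbers start at 1; `skip` is the offset of the
    first wanted byte within the first transferred block."""
    first_block = offset // BLOCK_SIZE                  # zero-based index
    last_block = (offset + count - 1) // BLOCK_SIZE
    return first_block + 1, last_block - first_block + 1, offset % BLOCK_SIZE

# A 10-byte read at byte offset 510 straddles a block boundary, so two
# whole blocks must be transferred even though only 10 bytes are wanted.
vbn, nblocks, skip = byte_range_to_blocks(510, 10)
print(vbn, nblocks, skip)  # 1 2 510
```

The same arithmetic shows why a byte-granular write is a read-modify-write: unless the range starts on a block boundary and fills the last block, the first and last blocks must be read (and be current) before being rewritten.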

Arne
Arne Vajhøj
2021-10-06 13:01:17 UTC
Permalink
Post by Greg Tinkler
I notice that SSIO (beta) in included in an up coming V9.1 field
test. So I read up on the issues it is trying to solve.
One concerning thing was to have CRTL (via SSIO) access directly to
XFC. From an architectural point of view this is wrong at so many
levels, but if that is what needs to happen then open it up so RMS
and other code bases can use it.
The main reason stated was the need to do byte offset/count IO’s.
Well lets solve that first, change RMS by adding SYS$READB and
SYS$WRITEB. These would be useful to all code using RMS. SYS$READB
read from byte offset for count, return latest data from that byte
range. SYS$WRITEB write from byte offset for count, update latest
copy of underlying blocks.
By having these as part of RMS we want to ensure the blocks/buffers
are coordinated so any other user of RMS will see the changes, and we
get their changes.
This seems to be at the core of the CRTL issue, it does NOT use RMS,
nor does it synchronize its blocks/buffers, leading to the lost
update problem.
So with this ‘simple’ addition the CRTL should be altered to use RMS for all file IO.
An extra that could be added, if the file is RFM=fixed, and the C
code uses it that way with the same record length then use the
SYS$GET/SYS$PUT so it will play nicely with an RMS access to those
files.
To be honest, I think the safest way to implement this is
to put lots of restrictions on when it can be used.

Examples:
* No cluster support (announcement already states that!)
* Only FIX 512, STMLF and UDF are supported
* no mixing with traditional RMS calls

Some applications coming over from *nix, most notably PostgreSQL, need
this. But trying to cover all types of cases would be a lot of
work.

Arne