Discussion:
wrong file format
Phillip Helbig (undress to reply)
2020-12-22 23:42:19 UTC
I have a damaged file which looks fine except that I see (in EDT)
explicit carriage returns at the end of each line AND extra blank lines
between each pair of lines. None of the standard SET FILE/ATTR
incantations seems to work.

Long story and it's late. :-| Maybe someone has a suggestion.
Dave Froble
2020-12-22 23:49:53 UTC
Post by Phillip Helbig (undress to reply)
I have a damaged file which looks fine except that I see (in EDT)
explicit carriage returns at the end of each line AND extra blank lines
between each pair of lines. None of the standard SET FILE/ATTR
incantations seems to work.
Long story and it's late. :-| Maybe someone has a suggestion.
You determine what the line terminators are, then write a program to
read the file, remove the terminators, and write the records. Should
not take much more than 10 lines of code, if that.
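Dave's "10 lines of code" program might look like the following sketch in Python (not a VMS/RMS-aware tool; it assumes the stray terminators are <CR> bytes and that every blank line is an artifact of the damage rather than real content):

```python
# Sketch of the "read, strip terminators, rewrite" program. Assumes the
# stray terminators are <CR> bytes and that every empty line is an
# artifact of the damage, not real content.

def strip_terminators(data: bytes) -> bytes:
    # Split on LF, drop trailing CRs, and discard the empty records the
    # doubled terminators produced.
    lines = [ln.rstrip(b"\r") for ln in data.split(b"\n")]
    return b"\n".join(ln for ln in lines if ln) + b"\n"

# Content of the kind Phillip describes: <CR> at the end of each line
# plus an extra blank line after it.
damaged = b"first line\r\n\nsecond line\r\n\n"
print(strip_terminators(damaged))
```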
--
David Froble Tel: 724-529-0450
Dave Froble Enterprises, Inc. E-Mail: ***@tsoft-inc.com
DFE Ultralights, Inc.
170 Grimplin Road
Vanderbilt, PA 15486
Phillip Helbig (undress to reply)
2020-12-23 09:40:18 UTC
Post by Dave Froble
You determine what the line terminators are, then write a program to
read the file, remove the terminators, and write the records. Should
not take much more than 10 lines of code, if that.
Yes, if all else fails, I could try that. It's a big file, though.
geze...@rlgsc.com
2020-12-22 23:56:50 UTC
Post by Phillip Helbig (undress to reply)
I have a damaged file which looks fine except that I see (in EDT)
explicit carriage returns at the end of each line AND extra blank lines
between each pair of lines. None of the standard SET FILE/ATTR
incantations seems to work.
Long story and it's late. :-| Maybe someone has a suggestion.
Phillip,

WADR, for a start, please post the output of DIRECTORY/FULL.

If I cannot fix the problem with a simple SET FILE/ATTRIBUTE, I frequently find that TECO allows me to quickly write a command string to fix things.

- Bob Gezelter, http://www.rlgsc.com
Phillip Helbig (undress to reply)
2020-12-23 09:42:20 UTC
Post by ***@rlgsc.com
WADR, for a start, please post the output of DIRECTORY/FULL.
Will do if I can't solve the problem quickly. The RFM is STM, though as
I mentioned EDT sees <CR> at the end of each non-empty line, and after
each such line there is an extra empty line.
Post by ***@rlgsc.com
If I cannot fix the problem with a simple SET FILE/ATTRIBUTE, I
frequently find that TECO allows me to quickly write a command string to
fix things.
There are some cases where just opening the file in TECO and saving it
does the trick. :-)

To paraphrase Arthur C. Clarke, any sufficiently baroque editor is
indistinguishable from magic.
Jan-Erik Söderholm
2020-12-23 00:04:58 UTC
Post by Phillip Helbig (undress to reply)
I have a damaged file which looks fine except that I see (in EDT)
explicit carriage returns at the end of each line AND extra blank lines
between each pair of lines. None of the standard SET FILE/ATTR
incantations seems to work.
Long story and it's late. :-| Maybe someone has a suggestion.
Trying to SET FILE/ATTR to anything that does not match the
actual file content will usually give weird results.

EDT tries to read the file according to the file attribs, not
according to what the file actually contains.

What are the "standard SET FILE/ATTR incantations"? A secret?

You might try a DUMP and see how the records/lines actually are
terminated and set the file attribs accordingly.

I'm sure that you are aware that SET FILE/ATTR of course does not
change any of the content of the file, right?

B.t.w., what does a "damaged" file mean here? Dropped on the floor?
Is it an original file foreign to VMS?

Lots of loose ends here...
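Jan-Erik's DUMP suggestion can be mimicked outside VMS; a minimal hex dump in Python (formatting made up, purely illustrative) makes the terminator bytes visible (0D = <CR>, 0A = <LF>):

```python
# Minimal hex-dump helper to inspect how lines are actually terminated;
# a stand-in for DCL's DUMP, with invented formatting.

def mini_dump(data: bytes, width: int = 16) -> list:
    rows = []
    for off in range(0, len(data), width):
        chunk = data[off:off + width]
        hexpart = " ".join("%02X" % b for b in chunk)
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        rows.append("%08X  %-47s  %s" % (off, hexpart, text))
    return rows

for row in mini_dump(b"001\r\n\r\n002\r\n\r\n"):
    print(row)
```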
Phillip Helbig (undress to reply)
2020-12-23 09:44:10 UTC
Post by Jan-Erik Söderholm
Trying to SET FILE/ATTR to anything that does not match the
actual file content will usually give weird results.
Right.
Post by Jan-Erik Söderholm
EDT tries to read the file according to the file attribs, not
according to what the file actually contains.
Right.
Post by Jan-Erik Söderholm
What are the "standard SET FILE/ATTR incantations"? A secret?
Almost always, SET FILE/ATTR=RFM:<whatever> does the trick.
Post by Jan-Erik Söderholm
You might try a DUMP and see how the records/lines actually are
terminated and set the file attribs accordingly.
Yes, if a simple fix doesn't work, I'll have to dig deeper.
Post by Jan-Erik Söderholm
I'm sure that you are aware that SET FILE/ATTR of course does not
change any of the content of the file, right?
Right.
Post by Jan-Erik Söderholm
B.t.w., what does a "damaged" file mean here? Dropped on the floor?
Is it an original file foreign to VMS?
From VMS to VMS via the internet, but not a standard problem one
encounters in such situations. :-|
Henry Crun
2020-12-23 12:01:39 UTC
Post by Phillip Helbig (undress to reply)
Post by Jan-Erik Söderholm
Trying to SET FILE/ATTR to anything that does not match the
actual file content will usually give weird results.
Right.
Post by Jan-Erik Söderholm
EDT tries to read the file according to the file attribs, not
according to what the file actually contains.
Right.
Post by Jan-Erik Söderholm
What are the "standard SET FILE/ATTR incantations"? A secret?
Almost always, SET FILE/ATTR=RFM:<whatever> does the trick.
Post by Jan-Erik Söderholm
You might try a DUMP and see how the records/lines actually are
terminated and set the file attribs accordingly.
Yes, if a simple fix doesn't work, I'll have to dig deeper.
Post by Jan-Erik Söderholm
I'm sure that you are aware that SET FILE/ATTR of course does not
change any of the content of the file, right?
Right.
Post by Jan-Erik Söderholm
B.t.w., what does a "damaged" file mean here? Dropped on the floor?
Is it an original file foreign to VMS?
From VMS to VMS via the internet, but not a standard problem one
encounters in such situations. :-|
I have solved similar problems using the TPU LEARN and REPEAT commands
(on files of several hundred thousand lines...)
--
Mike R.
Home: http://alpha.mike-r.com/
QOTD: http://alpha.mike-r.com/qotd.php
No Micro$oft products were used in the URLs above, or in preparing this message.
Recommended reading: http://www.catb.org/~esr/faqs/smart-questions.html#before
and: http://alpha.mike-r.com/jargon/T/top-post.html
Missile address: N31.7624/E34.9691
Phillip Helbig (undress to reply)
2020-12-23 13:00:12 UTC
Post by Henry Crun
I have solved similar problems using the TPU LEARN and REPEAT commands
(om files of several hundred thousand lines...)
One of the few occasions when I use TPU instead of EDT. The file is
about 75 MB. :-(
Jan-Erik Söderholm
2020-12-23 14:04:53 UTC
Post by Henry Crun
Post by Phillip Helbig (undress to reply)
Post by Jan-Erik Söderholm
Trying to SET FILE/ATTR to anything that does not match the
actual file content will usually give weird results.
Right.
Post by Jan-Erik Söderholm
EDT tries to read the file according to the file attribs, not
according to what the file actually contains.
Right.
Post by Jan-Erik Söderholm
What are the "standard SET FILE/ATTR incantations"? A secret?
Almost always, SET FILE/ATTR=RFM:<whatever> does the trick.
Post by Jan-Erik Söderholm
You might try a DUMP and see how the records/lines actually are
terminated and set the file attribs accordingly.
Yes, if a simple fix doesn't work, I'll have to dig deeper.
Post by Jan-Erik Söderholm
I'm sure that you are aware that SET FILE/ATTR of course does not
change any of the content of the file, right?
Right.
Post by Jan-Erik Söderholm
B.t.w., what does a "damaged" file mean here? Dropped on the floor?
Is it an original file foreign to VMS?
 From VMS to VMS via the internet, but not a standard problem one
encounters in such situations.  :-|
I have solved similar problems using the TPU LEARN and REPEAT commands
(on files of several hundred thousand lines...)
Well, *IF* it is the actual *content* that is "wrong". If not, that
is not the right solution. Why mess with the content if it is OK?

But if there is an issue with the content, that can probably be fixed
with a short DCL or Python script also.
Scott Dorsey
2020-12-24 16:15:12 UTC
Post by Jan-Erik Söderholm
Well, *IF* it is the actual *content* that is "wrong". If not, that
is not the right solution. Why mess with the content if it is OK?
Because it will take five minutes to write a short script to change the
contents, and it is likely to take longer than that to figure out what
the correct file format is and to change it.

I assume currently it is appearing to be a stream_lf file?
--scott
--
"C'est un Nagra. C'est suisse, et tres, tres precis."
Craig A. Berry
2020-12-24 17:57:11 UTC
Post by Scott Dorsey
Post by Jan-Erik Söderholm
Well, *IF* it is the actual *content* that is "wrong". If not, that
is not the right solution. Why mess with the content if it is OK?
Because it will take five minutes to write a short script to change the
contents, and it is likely to take longer than that to figure out what
the correct file format is and to change it.
I assume currently it is appearing to be a stream_lf file?
He said the record format was stm, not stmlf. If it were stmlf he
wouldn't be having these problems. Apparently if you write CRLF to a
file with stream record format it creates two lines but leaves the CR as
data in the first line. See my previously-posted reproducer.
Phillip Helbig (undress to reply)
2020-12-24 18:13:13 UTC
Post by Scott Dorsey
Post by Jan-Erik Söderholm
Well, *IF* it is the actual *content* that is "wrong". If not, that
is not the right solution. Why mess with the content if it is OK?
Because it will take five minutes to write a short script to change the
contents, and it is likely to take longer than that to figure out what
the correct file format is and to change it.
I assume currently it is appearing to be a stream_lf file?
--scott
DIR/FULL says it is STM.
Hein RMS van den Heuvel
2020-12-23 15:32:28 UTC
Post by Phillip Helbig (undress to reply)
I have a damaged file which looks fine except that I see (in EDT)
explicit carriage returns at the end of each line AND extra blank lines
between each pair of lines. None of the standard SET FILE/ATTR
incantations seems to work.
Long story and it's late. :-| Maybe someone has a suggestion.
sure... Give us good and clear information to work with.
Specifically
1) - DUMP/BLOCK=COUNT=1/WIDTH=80
2) - DUMP/RECORD=COUNT=3/WIDTH=80
3) - $ PIPE DIR/FULL X.X | search sys$pipe org,"record "

Most surprises are caused by the RFM=STM format being rather odd.
RTFM:
- http://h30266.www3.hpe.com/odl/axpos/opsys/vmsos84/4523/4523pro_007.html
---> "FAB$C_STM
Indicates stream record format. Records are delimited by FF, VT, LF, or CR LF, and all leading zeros are ignored. This format applies to sequential files only and cannot be used with the block spanning option."

So if a file has <LF><CR> instead of <CR><LF>, the pair is not recognized as a delimiter, and stray LF and CR bytes end up as record data!

$ delete *.tmp ! Critical as some C based tools pick up file attributes from prior file versions.
$ perl -e "while ($i++ < 3){printf qq(%03d\n\r),$i}" > nr.tmp
$ set file/attr=rfm=stm *.tmp
$ edit/edt nr.tmp
001<LF>
<CR>002<LF>
<CR>003<LF>
<CR>

Record number 1 0A313030 001.............
Record number 2 0A 3230300D .002............
Record number 3 0A 3330300D .003............
Record number 4 0D ................
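A rough Python model of the delimiter rule quoted above (CR LF as a pair, or a single FF, VT, or LF, ends a record) shows how the <LF><CR> data falls apart: the bare LF acts as a delimiter and the following CR is carried along as data. This is a simplification for illustration, not full RMS behaviour.

```python
import re

# CR LF as a pair, or a single FF / VT / LF, ends a record (simplified
# reading of the FAB$C_STM description; not full RMS behaviour).
STM_DELIM = re.compile(rb"\r\n|[\x0c\x0b\n]")

def stm_records(data: bytes) -> list:
    return STM_DELIM.split(data)

# The reproducer above wrote "%03d\n\r" -- LF before CR, so every
# record after the first starts with a stray CR.
print(stm_records(b"001\n\r002\n\r003\n\r"))
```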

Cheers,
Hein.
Michael Moroney
2020-12-23 16:41:43 UTC
Post by Phillip Helbig (undress to reply)
I have a damaged file which looks fine except that I see (in EDT)
explicit carriage returns at the end of each line AND extra blank lines
between each pair of lines. None of the standard SET FILE/ATTR
incantations seems to work.
Long story and it's late. :-| Maybe someone has a suggestion.
Try the following:

$ COPY NL: NEWFILE.DAT
$ APPEND OLDFILE.DAT NEWFILE.DAT
(ignore incompatible file type warning)

If NEWFILE.DAT still has stray <CR> and/or <LF> they can be removed with EDT.
Hein RMS van den Heuvel
2020-12-23 17:58:17 UTC
Post by Michael Moroney
$ COPY NL: NEWFILE.DAT
$ APPEND OLDFILE.DAT NEWFILE.DAT
Good. The copy command creates an empty file with default attributes: a variable-length sequential file.

Silly, lesser-known FDL hints:
1)
An empty FDL file also defaults to a variable length sequential file.
Therefore you can use things like: CONVERT/STAT/FDL=NL: OLDFILE.DAT NEWFILE.DAT

2)
Instead of an FDL file you can specify an FDL string on most DCL commands.
For example to create a stream-lf file, you can use:
$ convert/fdl="record; format stream_lf" OLDFILE.DAT NEWFILE.DAT
To create a trivial indexed file to play with, with 10 byte key length starting at default byte 0:
$ create/fdl="file; org ind; key 0; seg0_l 10" tmp.tmp

Back to Philip's question.
Post by Michael Moroney
I see (in EDT) explicit carriage returns at the end of each line
So maybe we can just make those the terminator and then remove all LF bytes?
$ set file/att=rfm=stmcr x.x
now see what you have and deal with it.

I tried removing the linefeeds with simple Perl one-liners, but they end up either giving two records per input record (one empty) or a single record with no terminators. Perl seems to remember stray LF bytes as line ends.

You can possibly get there using:
$ perl -015 -l012 -pe "s/\n//g" nr.tmp > new.tmp

The -015 makes CR the input line terminator
The -l012 makes LF the output terminator added to each record as defined by the CR's
The -p adds an implicit simple while loop processing the input stream until EOF
The -e is an inline script "s/\n//g" telling it to replace any and all LF's in the records with nothing.
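The same transformation can be sketched in Python, mirroring the three switches: split on <CR> as the input terminator, delete LF bytes inside each record, and join with <LF> as the output terminator. A sketch, not a byte-exact replacement for the Perl one-liner:

```python
# Python mirror of the Perl one-liner above.

def cr_records_to_lf(data: bytes) -> bytes:
    records = data.split(b"\r")                         # -015: CR ends input records
    cleaned = [r.replace(b"\n", b"") for r in records]  # -e "s/\n//g": drop LFs
    return b"\n".join(cleaned)                          # -l012: LF ends output records

print(cr_records_to_lf(b"001\n\r002\n\r003\n\r"))
```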

Cheers,
Hein
Craig A. Berry
2020-12-23 21:09:04 UTC
Post by Hein RMS van den Heuvel
Post by Michael Moroney
$ COPY NL: NEWFILE.DAT
$ APPEND OLDFILE.DAT NEWFILE.DAT
Good. The copy command creates an empty default file = variable length sequential file.
Silly, less known, FDL hints
1)
An empty FDL file also defaults to a variable length sequential file.
Therefore you can use things like: CONVERT/STAT/FDL=NL: OLDFILE.DAT NEWFILE.DAT
2)
Instead of an FDL file you can specify an FDL string on most DCL commands.
$ convert/fdl="record; format stream_lf" OLDFILE.DAT NEWFILE.DAT
$ create/fdl="file; org ind; key 0; seg0_l 10" tmp.tmp
Back to Philip's question.
Post by Michael Moroney
I see (in EDT) explicit carriage returns at the end of each line
So maybe we can just make those the terminator and then remove all LF bytes?
$ set file/att=rfm=stmcr x.x
now see what you have and deal with it.
I tried removing the linefeeds with simple Perl one-liners, but they end up either giving two records per input record (one empty) or a single record with no terminators. Perl seems to remember stray LF bytes as line ends.
$ perl -015 -l012 -pe "s/\n//g" nr.tmp > new.tmp
The -015 makes CR the input line terminator
The -l012 makes LF the output terminator added to each record as defined by the CR's
The -p adds an implicit simple while loop processing the input stream until EOF
The -e is an inline script "s/\n//g" telling it to replace any and all LF's in the records to be replaced by nothing.
Phillip's description of the file was:

"The RFM is STM, though as I mentioned EDT sees <CR> at the end of each
non-empty line, and after each such line there is an extra empty line."

Modifying Hein's earlier suggestion, I can reliably create a file that
matches Phillip's description like so:

$ perl -"MVMS::Stdio=vmsopen" -we "$f=vmsopen('>nr.tmp','rfm=stm'); for (1..3){printf $f qq(%03d\r\n), $_}"

and it can be fixed with the following in-place substitution:

$ perl -pi -e "$_ =~ s/\r\n$//m;" nr.tmp

The "/m" modifier on the substitution matches across line boundaries, so
the CRLF and the adjacent EOL (which shows up as a blank line) match and
get removed; the print implicit in -p then reintroduces the newline
that was removed.

Sticking with rfm=stmlf or rfm=stmcr rather than just stm would probably
avoid a lot of trouble for reasons Hein explained earlier.
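Craig's in-place substitution can be approximated in Python. Assuming the damaged byte sequence at each record end is CR, LF, plus the extra end-of-line that shows up as the blank line, one substitution collapses it to a single newline:

```python
import re

def drop_crlf_pairs(data: bytes) -> bytes:
    # "\r\n\n": the CR LF written as data plus the real record boundary
    # behind it (the "blank line"); collapse each to one newline.
    return re.sub(rb"\r\n\n", b"\n", data)

print(drop_crlf_pairs(b"001\r\n\n002\r\n\n"))
```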
Michael Moroney
2020-12-23 18:49:33 UTC
Post by Michael Moroney
Post by Phillip Helbig (undress to reply)
I have a damaged file which looks fine except that I see (in EDT)
explicit carriage returns at the end of each line AND extra blank lines
between each pair of lines. None of the standard SET FILE/ATTR
incantations seems to work.
Long story and it's late. :-| Maybe someone has a suggestion.
$ COPY NL: NEWFILE.DAT
$ APPEND OLDFILE.DAT NEWFILE.DAT
(ignore incompatible file type warning)
If NEWFILE.DAT still has stray <CR> and/or <LF> they can be removed with EDT.
More details: Look at the file with EDT screen mode. If the file looks correct
other than <CR> or <LF> characters, do the following:

[PF1][KP7]s/^M//w/notype[KP Enter]
[PF1][KP7]s/^J//w/notype[KP Enter]

where ^M and ^J are control-M and control-J respectively, not the physical
characters.

Keep the old file in case this destroys something.
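Per record, the two EDT substitutes amount to deleting every control-M (CR) and control-J (LF) byte left in the data; in Python terms:

```python
# What the two EDT substitutes do to each record: delete every CR (^M)
# and LF (^J) byte that is sitting inside the data.

def scrub(record: bytes) -> bytes:
    return record.replace(b"\r", b"").replace(b"\n", b"")

print(scrub(b"\r001\r\n"))
```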
Chris
2020-12-23 18:33:53 UTC
Post by Phillip Helbig (undress to reply)
I have a damaged file which looks fine except that I see (in EDT)
explicit carriage returns at the end of each line AND extra blank lines
between each pair of lines. None of the standard SET FILE/ATTR
incantations seems to work.
Long story and it's late. :-| Maybe someone has a suggestion.
Maybe not what you want to hear, but you could fix that in a line
or two in unix / linux, using the primitive ed line editor. or
sed stream editor.

Perhaps vms has a similar utility ?...

Chris
Steven Schweda
2020-12-24 19:52:01 UTC
[...] None of the standard SET FILE/ATTR
incantations seems to work.
help convert file

$! 12 December 1999. SMS.
$!
$! CONVERT a file to StreamLF record format.
$!
$ convert 'p1' 'p2' /fdl = sys$input:
RECORD
FORMAT stream_lf
$!

Between "the standard SET FILE/ATTR" and CONVERT
(occasionally multiple times each), I've seldom needed to
write a simple conversion program, but it's also possible to
do that.
[...] Maybe someone has a suggestion.
Nike, Inc. has one: https://en.wikipedia.org/wiki/Just_Do_It
Steven Schweda
2020-12-24 22:53:52 UTC
One more possibility: Zip the thing, possibly with "-ll,"
and use "unzip -a[a]" to extract it. Try it with different
initial attributes. If nothing else, with all the possible
combinations, it'll kill some (more) time.
Phil Howell
2020-12-25 02:50:10 UTC
Post by Phillip Helbig (undress to reply)
I have a damaged file which looks fine except that I see (in EDT)
explicit carriage returns at the end of each line AND extra blank lines
between each pair of lines. None of the standard SET FILE/ATTR
incantations seems to work.
Long story and it's late. :-| Maybe someone has a suggestion.
I used to use eveplus (from a DECUS tape),
and its TRIM command was sometimes useful.
Also more suggestions in this thread
https://comp.os.vms.narkive.com/h3kFgdYH/need-quick-edt-or-tpu-script
Dirk Munk
2020-12-28 22:32:25 UTC
Post by Phillip Helbig (undress to reply)
I have a damaged file which looks fine except that I see (in EDT)
explicit carriage returns at the end of each line AND extra blank lines
between each pair of lines. None of the standard SET FILE/ATTR
incantations seems to work.
Long story and it's late. :-| Maybe someone has a suggestion.
It seems your file is not a stream file, so changing attributes will not
help.

You can try the following:
1. Convert the file to a stream-lf file.
2. Set the attributes of the resulting file to stream cr-lf.
3. Convert the file again to a stream-lf file; it will replace all the
cr-lf record separators with lf separators.
4. Convert the file to any file type you like; with some luck all the
blank lines will have disappeared.
Phillip Helbig (undress to reply)
2020-12-28 22:47:34 UTC
Post by Dirk Munk
1. Convert the file to a stream-lf file.
2. Set the attributes of the resulting file to stream cr-lf.
3. Convert the file again to a stream-lf file; it will replace all the
cr-lf record separators with lf separators.
4. Convert the file to any file type you like; with some luck all the
blank lines will have disappeared.
I tried for about 45 minutes---all the suggestions posted here! It was
about 100 MB, so not all were quick to check.

In the end, I managed to transfer it again (don't ask!) and somehow,
magically, it was OK.
Dirk Munk
2020-12-29 13:27:23 UTC
Post by Phillip Helbig (undress to reply)
Post by Dirk Munk
1. Convert the file to a stream-lf file.
2. Set the attributes of the resulting file to stream cr-lf.
3. Convert the file again to a stream-lf file; it will replace all the
cr-lf record separators with lf separators.
4. Convert the file to any file type you like; with some luck all the
blank lines will have disappeared.
I tried for about 45 minutes---all the suggestions posted here! It was
about 100 MB, so not all were quick to check.
In the end, I managed to transfer it again (don't ask!) and somehow,
magically, it was OK.
I've dealt with problems like these before, usually caused by
applications that were not written for VMS.

You need to have a bit of a feeling for the different file types of VMS
to fix these problems, but if you have that, it's very simple to solve
these little puzzles.
Phillip Helbig (undress to reply)
2020-12-29 13:35:45 UTC
Post by Dirk Munk
Post by Phillip Helbig (undress to reply)
I tried for about 45 minutes---all the suggestions posted here! It was
about 100 MB, so not all were quick to check.
In the end, I managed to transfer it again (don't ask!) and somehow,
magically, it was OK.
I've dealt with problems like these before, usually caused by
applications that were not written for VMS.
You need to have a bit of a feeling for the different file types of VMS
to fix these problems, but if you have that, it's very simple to solve
these little puzzles.
I don't know how many times I've used SET FILE/ATTR or CONVERT or TECO
to fix things like this. I can usually look at the contents, look at
DIR/FULL, and see what needs to be done if they don't match, but this
was somehow different.
Jan-Erik Söderholm
2020-12-29 14:21:26 UTC
Post by Phillip Helbig (undress to reply)
Post by Dirk Munk
Post by Phillip Helbig (undress to reply)
I tried for about 45 minutes---all the suggestions posted here! It was
about 100 MB, so not all were quick to check.
In the end, I managed to transfer it again (don't ask!) and somehow,
magically, it was OK.
I've dealt with problems like these before, usually caused by
applications that were not written for VMS.
You need to have a bit of a feeling for the different file types of VMS
to fix these problems, but if you have that, it's very simple to solve
these little puzzles.
I don't know how many times I've used SET FILE/ATTR or CONVERT or TECO
to fix things like this. I can usually look at the contents, look at
DIR/FULL, and see what needs to be done if they don't match, but this
was somehow different.
Nice that it was fixed! And no, I do not believe in magic... :-)
Bill Gunshannon
2020-12-29 14:45:27 UTC
Post by Jan-Erik Söderholm
Post by Phillip Helbig (undress to reply)
Post by Dirk Munk
I tried for about 45 minutes---all the suggestions posted here!  It was
about 100 MB, so not all were quick to check.
In the end, I managed to transfer it again (don't ask!) and somehow,
magically, it was OK.
I've dealt with problems like these before, usually caused by
applications that were not written for VMS.
You need to have a bit of a feeling for the different file types of VMS
to fix these problems, but if you have that, it's very simple to solve
these little puzzles.
I don't know how many times I've used SET FILE/ATTR or CONVERT or TECO
to fix things like this.  I can usually look at the contents, look at
DIR/FULL, and see what needs to be done if they don't match, but this
was somehow different.
Nice that it was fixed! And no, I do not believe in magic... :-)
And, just so people don't think, based on earlier comments, that
Unix is somehow immune, I frequently have to remove "^M" characters
from text files on Unix. Unix's only saving grace in this regard is
that the solution is trivial. :-)

bill
Dirk Munk
2020-12-29 22:05:39 UTC
Post by Bill Gunshannon
Post by Jan-Erik Söderholm
Post by Phillip Helbig (undress to reply)
Post by Dirk Munk
I tried for about 45 minutes---all the suggestions posted here!  It was
about 100 MB, so not all were quick to check.
In the end, I managed to transfer it again (don't ask!) and somehow,
magically, it was OK.
I've dealt with problems like these before, usually caused by
applications that were not written for VMS.
You need to have a bit of a feeling for the different file types of VMS
to fix these problems, but if you have that, it's very simple to solve
these little puzzles.
I don't know how many times I've used SET FILE/ATTR or CONVERT or TECO
to fix things like this.  I can usually look at the contents, look at
DIR/FULL, and see what needs to be done if they don't match, but this
was somehow different.
Nice that it was fixed! And no, I do not believe in magic... :-)
And, just so people don't think, based on earlier comments, that
Unix is somehow immune, I frequently have to remove "^M" characters
from text files on Unix. Unix's only saving grace in this regard is
that the solution is trivial.  :-)
bill
No, of course Unix is not immune. Using <lf> or <cr><lf> (Windows) as a
record terminator is a rather silly idea. It means that you can't use those
characters in a record, and you have to scan the contents of a file for
those characters. Simply writing the length of a record at the beginning
of that record is a far better solution.
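The length-prefixed layout Dirk describes can be sketched in Python. The 2-byte little-endian count here is illustrative (VMS variable-length records use a similar word-length count, but this is not an RMS implementation):

```python
import struct

# Length-prefixed records: a 2-byte count in front of each record means
# CR and LF can appear freely in the data; no terminator scanning needed.

def pack_records(records) -> bytes:
    return b"".join(struct.pack("<H", len(r)) + r for r in records)

def unpack_records(data: bytes):
    records, pos = [], 0
    while pos < len(data):
        (n,) = struct.unpack_from("<H", data, pos)
        records.append(data[pos + 2:pos + 2 + n])
        pos += 2 + n
    return records

blob = pack_records([b"plain text", b"binary \r\n data, no problem"])
print(unpack_records(blob))
```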
Jan-Erik Söderholm
2020-12-29 23:32:02 UTC
Post by Dirk Munk
Post by Bill Gunshannon
Post by Jan-Erik Söderholm
Post by Phillip Helbig (undress to reply)
Post by Dirk Munk
I tried for about 45 minutes---all the suggestions posted here!  It was
about 100 MB, so not all were quick to check.
In the end, I managed to transfer it again (don't ask!) and somehow,
magically, it was OK.
I've dealt with problems like these before, usually caused by
applications that were not written for VMS.
You need to have a bit of a feeling for the different file types of VMS
to fix these problems, but if you have that, it's very simple to solve
these little puzzles.
I don't know how many times I've used SET FILE/ATTR or CONVERT or TECO
to fix things like this.  I can usually look at the contents, look at
DIR/FULL, and see what needs to be done if they don't match, but this
was somehow different.
Nice that it was fixed! And no, I do not believe in magic... :-)
And, just so people don't think, based on earlier comments, that
Unix is somehow immune, I frequently have to remove "^M" characters
from text files on Unix. Unix's only saving grace in this regard is
that the solution is trivial.  :-)
bill
No, of course Unix is not immune. Using <lf> or <cr><lf> (Windows) as a record
terminator is a rather silly idea. It means that you can't use those
characters in a record, and you have to scan the contents of a file for
those characters. Simply writing the length of a record at the beginning of
that record is a far better solution.
Having a <LF> or a <CR> in text files seems rather logical to me.
What else, if you want either a line feed or a carriage return?

But yes, there are other ways of specifying and delimiting a "line of text",
if you have a system supporting that.

Now, if that "record" is something other than a "line of text"...
Dirk Munk
2020-12-30 12:59:15 UTC
Post by Jan-Erik Söderholm
Post by Dirk Munk
Post by Bill Gunshannon
Post by Jan-Erik Söderholm
Post by Phillip Helbig (undress to reply)
Post by Dirk Munk
Post by Phillip Helbig (undress to reply)
I tried for about 45 minutes---all the suggestions posted here!
It was
about 100 MB, so not all were quick to check.
In the end, I managed to transfer it again (don't ask!) and somehow,
magically, it was OK.
I've dealt with problems like these before, usually caused by
applications that were not written for VMS.
You need to have a bit of a feeling for the different file types of VMS
to fix these problems, but if you have that, it's very simple to solve
these little puzzles.
I don't know how many times I've used SET FILE/ATTR or CONVERT or TECO
to fix things like this.  I can usually look at the contents, look at
DIR/FULL, and see what needs to be done if they don't match, but this
was somehow different.
Nice that it was fixed! And no, I do not believe in magic... :-)
And, just so people don't think, based on earlier comments, that
Unix is somehow immune, I frequently have to remove "^M" characters
from text files on Unix. Unix's only saving grace in this regard is
that the solution is trivial.  :-)
bill
No, of course Unix is not immune. Using <lf> or <cr><lf> (Windows) as a
record terminator is a rather silly idea. It means that you can't use
those characters in a record, and you have to scan the contents of a
file for those characters. Simply writing the length of a record at
the beginning of that record is a far better solution.
Having a <LF> or a <CR> in text files seems rather logical to me.
What else, if you want either a line feed or a carriage return?
But yes, there are other ways of specifying and delimiting a "line of text",
if you have a system supporting that.
Now, if that "record" is something else than a "line of text"...
The problem is that in Unix and Windows land there is no difference
between the metadata of a file, and the actual contents of a file. The
metadata should define the file and the records in the file, that should
be completely separate from the actual data contents of the file.

Suppose I have a file with binary data, and one byte has the (ASCII)
value of <lf>; then Unix will use it as a record separator, even
if it is in the middle of the actual data of that record.

Suppose you have a VMS file with a fixed record size. That file has no
record separators whatsoever; it is one long stream of data. VMS can
calculate where the records start and end in the file. Suppose it
consists of sets of three records of 100 bytes that belong together.
Then you can change the attributes of that file to records of 300 bytes,
and in one read operation you will have all the data that belongs
together. I've actually used this in the past.
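That regrouping trick is easy to model: a fixed-length file is one unbroken byte stream, so reading it as 100-byte or 300-byte records is only a change of slice size. Sizes here are illustrative:

```python
# Fixed-length records have no separators; the record size alone decides
# where records begin and end in the byte stream.

def fixed_records(data: bytes, size: int):
    return [data[i:i + size] for i in range(0, len(data), size)]

stream = bytes(600)                      # e.g. six 100-byte records
print(len(fixed_records(stream, 100)))   # read as 100-byte records
print(len(fixed_records(stream, 300)))   # same bytes, regrouped in threes
```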

Suppose you want to print such a file, then VMS will send a <cr> and a
<lf> to the printer after each record. Simple.

The DEC software engineers understood very well why it is a bad idea to
mix up the contents of a file with the structure of a file, and that's why
they did not use stream files as standard RMS files in applications.
They are just there for compatibility with Unix, Windows, etc.
hb
2020-12-30 14:13:50 UTC
Post by Dirk Munk
Suppose I have a file with binary data, and one byte has the binary
(ascii) value of <lf>, then Unix will use it as a record separator, even
if it is in the middle of the actual data of that record.
No. Unix does not know what a record separator is. It's the application
that knows what's in a file and whether an <lf> is a newline to be
interpreted as a record separator or just data. And if you have binary
data, you usually do not read it with fgets() or any other read function
that stops reading after a newline.
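The same distinction exists in any language; in Python terms, a binary read returns the LF byte as plain data, and only a line-oriented read treats it as a separator:

```python
import io

raw = b"\x01\x02\n\x03"        # binary data that happens to contain 0x0A
f = io.BytesIO(raw)
assert f.read() == raw          # binary read: LF is just another byte
f.seek(0)
assert f.readline() == b"\x01\x02\n"  # line read: LF acts as a separator
print("ok")
```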
Post by Dirk Munk
Suppose you have a VMS file with fixed record size. That file has no
records separators what so ever, it is one long stream of data. VMS can
calculate where the records start and end in the file. Suppose it
consists out of sets of three records of 100 bytes that belong together.
Then you can change the attributes of that file to records of 300 bytes,
and in one read operation you will have all the data that belongs
together. I've actually used this in the past.
You obviously had an even record size and never had the record attribute
BLOCK_SPAN set to "no" (FDL syntax).
Bill Gunshannon
2020-12-30 19:55:03 UTC
Permalink
Post by Dirk Munk
Post by Jan-Erik Söderholm
Post by Dirk Munk
Post by Bill Gunshannon
Post by Jan-Erik Söderholm
Post by Phillip Helbig (undress to reply)
Post by Dirk Munk
Post by Phillip Helbig (undress to reply)
I tried for about 45 minutes---all the suggestions posted here!
It was
about 100 MB, so not all were quick to check.
In the end, I managed to transfer it again (don't ask!) and somehow,
magically, it was OK.
I've dealt with problems like these before, usually caused by
applications that were not written for VMS.
You need to have a bit of a feeling for the different file types of VMS
to fix these problems, but if you have that, it's very simple to solve
these little puzzles.
I don't know how many times I've used SET FILE/ATTR or CONVERT or TECO
to fix things like this.  I can usually look at the contents, look at
DIR/FULL, and see what needs to be done if they don't match, but this
was somehow different.
Nice that it was fixed! And no, I do not believe in magic... :-)
And, just so people don't think, based on earlier comments, that
Unix is somehow immune, I frequently have to remove "^M" characters
from text files on Unix. Unix's only saving grace in this regard is
that the solution is trivial.  :-)
bill
No, of course Unix is not immune. Using <lf> (Unix) or <cr><lf> (Windows)
as a record terminator is a rather silly idea. It means that you can't use
those characters in a record, and you have to scan the contents of a
file for those characters. Simply writing the length of a record at
the beginning of that record is a far better solution.
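The length-prefix idea described above, which is essentially what the RMS variable-length record format does on disk, can be sketched as follows. The two-byte little-endian prefix is an assumption chosen for the illustration, not the actual RMS on-disk layout.

```python
import io
import struct

def write_records(buf, records):
    """Write each record as a 2-byte length prefix followed by its data."""
    for rec in records:
        buf.write(struct.pack("<H", len(rec)))
        buf.write(rec)

def read_records(buf):
    """Read records back; any byte value, including LF, can appear in data."""
    out = []
    while hdr := buf.read(2):
        (n,) = struct.unpack("<H", hdr)
        out.append(buf.read(n))
    return out

buf = io.BytesIO()
write_records(buf, [b"record with \n inside", b"", b"plain"])
buf.seek(0)
records = read_records(buf)
```

Note that no scanning for terminator bytes is needed on the way back in; the prefix says exactly how much to read.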
Having a <LF> or a <CR> in text files seems rather logical to me.
What else, if you want either a line feed or a carriage return?
But yes, there are other ways to specify and delimit a "line of text",
if you have a system supporting that.
Now, if that "record" is something other than a "line of text"...
The problem is that in Unix and Windows land there is no difference
between the metadata of a file and the actual contents of a file. The
metadata should define the file and the records in the file, and it
should be completely separate from the actual data contents of the file.
Can't speak for Windows, but Unix has no meta-data. Unix has only one
file type, a stream of bytes. Everything else is application layer.
Post by Dirk Munk
Suppose I have a file with binary data, and one byte has the binary
(ascii) value of <lf>, then Unix will use it as a record separator, even
if it is in the middle of the actual data of that record.
Unix has no records. If you cat the file it will line break at the <lf>.
If you od -c the file it will identify the <lf> as just that.
Post by Dirk Munk
Suppose you have a VMS file with a fixed record size. That file has no
record separators whatsoever; it is one long stream of data. VMS can
calculate where the records start and end in the file. Suppose it
consists of sets of three 100-byte records that belong together.
Then you can change the attributes of that file to records of 300 bytes,
and in one read operation you will have all the data that belongs
together. I've actually used this in the past.
And that would be an application concept, not really an OS thing.
Post by Dirk Munk
Suppose you want to print such a file, then VMS will send a <cr> and a
<lf> to the printer after each record. Simple.
VMS won't. Whatever application actually prints it will.
Post by Dirk Munk
The DEC software engineers understood very well why it is a bad idea to
mix up contents of a file with the structure of a file, and that's why
they did not use stream files as standard RMS files in applications.
They are just there for compatibility with Unix, Windows etc.
And Unix made all files streams of bytes and lets the applications
decide what to do with them. Not really an OS problem.

bill
Arne Vajhøj
2020-12-31 01:09:29 UTC
Permalink
Post by Bill Gunshannon
Post by Dirk Munk
Suppose I have a file with binary data, and one byte has the binary
(ascii) value of <lf>, then Unix will use it as a record separator,
even if it is in the middle of the actual data of that record.
Unix has no records. If you cat the file it will line break at the <lf>.
If you od -c the file it will identify the <lf> as just that.
There is a convention and some library functions that assume
text files consist of LF-delimited lines.
Post by Bill Gunshannon
Post by Dirk Munk
Suppose you have a VMS file with a fixed record size. That file has no
record separators whatsoever; it is one long stream of data. VMS can
calculate where the records start and end in the file. Suppose it
consists of sets of three 100-byte records that belong together.
Then you can change the attributes of that file to records of 300 bytes,
and in one read operation you will have all the data that belongs
together. I've actually used this in the past.
And that would be an application concept, not really an OS thing.
On *nix it is an application thing. But on VMS it is really an
OS thing.

RMS will use the record format and record length when reading.

On *nix it is 300 bytes that can be read as 3 x 100 bytes or
1 x 300 bytes.

But on VMS, 3 records of 100 bytes are different from
1 record of 300 bytes. SYS$GET will behave differently in
the two cases.
Post by Bill Gunshannon
Post by Dirk Munk
Suppose you want to print such a file, then VMS will send a <cr> and a
<lf> to the printer after each record. Simple.
VMS won't.  Whatever application actually prints it will.
Something will send the CR LF.

But VMS will actually tell it where to put them, as SYS$GET calls (or
some language RTL on top of them) will return either 3 lines or 1 line.

Arne
Dirk Munk
2020-12-31 11:07:20 UTC
Permalink
Post by Bill Gunshannon
Post by Dirk Munk
Post by Jan-Erik Söderholm
Post by Dirk Munk
Post by Bill Gunshannon
Post by Jan-Erik Söderholm
Post by Phillip Helbig (undress to reply)
Post by Dirk Munk
Post by Phillip Helbig (undress to reply)
I tried for about 45 minutes---all the suggestions posted here!
It was
about 100 MB, so not all were quick to check.
In the end, I managed to transfer it again (don't ask!) and somehow,
magically, it was OK.
I've dealt with problems like these before, usually caused by
applications that were not written for VMS.
You need to have a bit of a feeling for the different file types of VMS
to fix these problems, but if you have that, it's very simple to solve
these little puzzles.
I don't know how many times I've used SET FILE/ATTR or CONVERT or TECO
to fix things like this.  I can usually look at the contents, look at
DIR/FULL, and see what needs to be done if they don't match, but this
was somehow different.
Nice that it was fixed! And no, I do not believe in magic... :-)
And, just so people don't think, based on earlier comments, that
Unix is somehow immune, I frequently have to remove "^M" characters
from text files on Unix. Unix's only saving grace in this regard is
that the solution is trivial.  :-)
bill
No, of course Unix is not immune. Using <lf> (Unix) or <cr><lf>
(Windows) as a record terminator is a rather silly idea. It means that
you can't use those characters in a record, and you have to scan the
contents of a file for those characters. Simply writing the length of a
record at the beginning of that record is a far better solution.
Having a <LF> or a <CR> in text files seems rather logical to me.
What else, if you want either a line feed or a carriage return?
But yes, there are other ways to specify and delimit a "line of text",
if you have a system supporting that.
Now, if that "record" is something other than a "line of text"...
The problem is that in Unix and Windows land there is no difference
between the metadata of a file, and the actual contents of a file. The
metadata should define the file and the records in the file, that
should be completely separate from the actual data contents of the file.
Can't speak for Windows, but Unix has no meta-data. Unix has only one
file type, a stream of bytes.  Everything else is application layer.
Which means you don't have a clue about the contents of a file until
you know the internals of the application. Standard VMS applications
produce structured files, so you only have to worry about the data
contents. It is possible to write your own applications using the files
of another application. The application can be in any language, because
RMS is the layer between the application and the file. This is a
structured approach, instead of producing a diarrhea of bytes and
calling it a file.
Post by Bill Gunshannon
Post by Dirk Munk
Suppose I have a file with binary data, and one byte has the binary
(ascii) value of <lf>, then Unix will use it as a record separator,
even if it is in the middle of the actual data of that record.
Unix has no records. If you cat the file it will line break at the <lf>.
If you od -c the file it will identify the <lf> as just that.
Wonderful. However, it is clear that in many applications the notion of
a data record is present, and that the <lf> is used as a record
separator, even if Unix formally doesn't have records.
Post by Bill Gunshannon
Post by Dirk Munk
Suppose you have a VMS file with a fixed record size. That file has no
record separators whatsoever; it is one long stream of data. VMS can
calculate where the records start and end in the file. Suppose it
consists of sets of three 100-byte records that belong together.
Then you can change the attributes of that file to records of 300 bytes,
and in one read operation you will have all the data that belongs
together. I've actually used this in the past.
And that would be an application concept, not really an OS thing.
Actually not, since this can only be done because of the way RMS stores
data, and RMS is part of the OS.
Post by Bill Gunshannon
Post by Dirk Munk
Suppose you want to print such a file, then VMS will send a <cr> and a
<lf> to the printer after each record. Simple.
VMS won't.  Whatever application actually prints it will.
Obviously, this is functionality of the spooler, and that is part of VMS.
Post by Bill Gunshannon
Post by Dirk Munk
The DEC software engineers understood very well why it is a bad idea
to mix up contents of a file with the structure of a file, and that's
why they did not use stream files as standard RMS files in
applications. They are just there for compatibility with Unix, Windows etc.
And Unix made all files streams of bytes and lets the applications
decide what to do with them.  Not really an OS problem.
Exchanging data between applications is rather important. Those
applications can be written in many languages and can come from different
sources. It is obvious that well-structured files are paramount for
exchanging data between applications. That is why something like RMS is
in fact a very modern approach to structured software engineering,
instead of producing an unstructured diarrhea of bytes and calling it
a file.
Bill Gunshannon
2020-12-31 15:41:42 UTC
Permalink
Post by Dirk Munk
Post by Bill Gunshannon
Post by Dirk Munk
Post by Jan-Erik Söderholm
Post by Dirk Munk
Post by Bill Gunshannon
Post by Jan-Erik Söderholm
Post by Phillip Helbig (undress to reply)
Post by Dirk Munk
Post by Phillip Helbig (undress to reply)
I tried for about 45 minutes---all the suggestions posted
here! It was
about 100 MB, so not all were quick to check.
In the end, I managed to transfer it again (don't ask!) and somehow,
magically, it was OK.
I've dealt with problems like these before, usually caused by
applications that were not written for VMS.
You need to have a bit of a feeling for the different file types of VMS
to fix these problems, but if you have that, it's very simple to solve
these little puzzles.
I don't know how many times I've used SET FILE/ATTR or CONVERT or TECO
to fix things like this.  I can usually look at the contents, look at
DIR/FULL, and see what needs to be done if they don't match, but this
was somehow different.
Nice that it was fixed! And no, I do not believe in magic... :-)
And, just so people don't think, based on earlier comments, that
Unix is somehow immune, I frequently have to remove "^M" characters
from text files on Unix. Unix's only saving grace in this regard is
that the solution is trivial.  :-)
bill
No, of course Unix is not immune. Using <lf> (Unix) or <cr><lf>
(Windows) as a record terminator is a rather silly idea. It means that
you can't use those characters in a record, and you have to scan the
contents of a file for those characters. Simply writing the length of a
record at the beginning of that record is a far better solution.
Having a <LF> or a <CR> in text files seems rather logical to me.
What else, if you want either a line feed or a carriage return?
But yes, there are other ways to specify and delimit a "line of text",
if you have a system supporting that.
Now, if that "record" is something other than a "line of text"...
The problem is that in Unix and Windows land there is no difference
between the metadata of a file, and the actual contents of a file.
The metadata should define the file and the records in the file, that
should be completely separate from the actual data contents of the file.
Can't speak for Windows, but Unix has no meta-data. Unix has only one
file type, a stream of bytes.  Everything else is application layer.
Which means you don't have a clue about the contents of a file, until
you know the internals of the application.
Well, that isn't exactly true. Certain file types do have clues.
And, at least under Unix, there is an application, file(1), that will
do a very good job of identifying what a file is. It is even possible
to add your own hints if they exist and if you so desire.
Post by Dirk Munk
Standard VMS applications
produce structured files, so you only have to worry about the data
contents. It is possible to write your own applications using the files
of another application. The application can be in any language, because
RMS is the layer between the application and the file. This is a
structured approach, instead of producing a diarrhea of bytes, and
calling it a file.
Post by Bill Gunshannon
Post by Dirk Munk
Suppose I have a file with binary data, and one byte has the binary
(ascii) value of <lf>, then Unix will use it as a record separator,
even if it is in the middle of the actual data of that record.
Unix has no records. If you cat the file it will line break at the <lf>.
If you od -c the file it will identify the <lf> as just that.
Wonderful. However, it is clear that in many applications the notion of
a data record is present, and that the <lf> is used as record separator,
even if Unix formally doesn't have records.
Again, that is more of a C'ism than a Unix'ism. If I write an
application that uses ^M instead of ^J it will work just fine.
And there is no reason why I couldn't have ^J as a valid,
non-record-terminating character in the file.
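That claim is easy to demonstrate with a short sketch (Python here, but nothing about it is Unix-specific): records end at ^M (CR, 0x0D) by application convention, and ^J (LF, 0x0A) is ordinary data inside a record.

```python
data = b"first\nrecord\rsecond\r"   # LF embedded in the first record

def split_records(raw, terminator=b"\r"):
    """Split on the application-defined terminator."""
    parts = raw.split(terminator)
    # A trailing terminator leaves an empty final element; drop it.
    return parts[:-1] if raw.endswith(terminator) else parts

records = split_records(data)
# The embedded LF in records[0] did not end a record.
```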
Post by Dirk Munk
Post by Bill Gunshannon
Post by Dirk Munk
Suppose you have a VMS file with a fixed record size. That file has no
record separators whatsoever; it is one long stream of data. VMS can
calculate where the records start and end in the file. Suppose it
consists of sets of three 100-byte records that belong together.
Then you can change the attributes of that file to records of 300 bytes,
and in one read operation you will have all the data that belongs
together. I've actually used this in the past.
And that would be an application concept, not really an OS thing.
Actually not, since this can only be done because of the way RMS stores
data, and RMS is part of the OS.
See, there is where we differ in opinion. I see RMS as an
application that just happens to ship with VMS. Like editors,
compilers and other pieces that ship with the OS but are
not really part of it. Surely VMS will run without RMS present.
Not all applications need to access files at all.
Post by Dirk Munk
Post by Bill Gunshannon
Post by Dirk Munk
Suppose you want to print such a file, then VMS will send a <cr> and
a <lf> to the printer after each record. Simple.
VMS won't.  Whatever application actually prints it will.
Obviously, this is functionality of the spooler, and that is part of VMS.
Post by Bill Gunshannon
Post by Dirk Munk
The DEC software engineers understood very well why it is a bad idea
to mix up contents of a file with the structure of a file, and that's
why they did not use stream files as standard RMS files in
applications. They are just there for compatibility with Unix, Windows etc.
And Unix made all files streams of bytes and lets the applications
decide what to do with them.  Not really an OS problem.
Exchanging data between applications is rather important. Those
applications can be written in many languages, can come from different
sources. It is obvious that well structured files are paramount for
exchanging data between applications. That is why something like RMS is
in fact a very modern approach to structured software engineering,
instead of producing an unstructured diarrhea of bytes and calling it
a file.
Some see it otherwise. Unix tends to leave more control for the
developer and not try to handcuff them with someone else's concept
of how things should be done.

bill
Dave Froble
2020-12-31 18:09:03 UTC
Permalink
Post by Bill Gunshannon
Post by Dirk Munk
Post by Bill Gunshannon
Post by Dirk Munk
Post by Jan-Erik Söderholm
Post by Dirk Munk
Post by Bill Gunshannon
Post by Jan-Erik Söderholm
Post by Phillip Helbig (undress to reply)
Post by Dirk Munk
Post by Phillip Helbig (undress to reply)
I tried for about 45 minutes---all the suggestions posted
here! It was
about 100 MB, so not all were quick to check.
In the end, I managed to transfer it again (don't ask!) and somehow,
magically, it was OK.
I've dealt with problems like these before, usually caused by
applications that were not written for VMS.
You need to have a bit of a feeling for the different file types of VMS
to fix these problems, but if you have that, it's very simple to solve
these little puzzles.
I don't know how many times I've used SET FILE/ATTR or CONVERT or TECO
to fix things like this. I can usually look at the contents, look at
DIR/FULL, and see what needs to be done if they don't match, but this
was somehow different.
Nice that it was fixed! And no, I do not believe in magic... :-)
And, just so people don't think, based on earlier comments, that
Unix is somehow immune, I frequently have to remove "^M" characters
from text files on Unix. Unix's only saving grace in this regard is
that the solution is trivial. :-)
bill
No, of course Unix is not immune. Using <lf> (Unix) or <cr><lf>
(Windows) as a record terminator is a rather silly idea. It means that
you can't use those characters in a record, and you have to scan the
contents of a file for those characters. Simply writing the length of a
record at the beginning of that record is a far better solution.
Having a <LF> or a <CR> in text files seems rather logical to me.
What else, if you want either a line feed or a carriage return?
But yes, there are other ways to specify and delimit a "line of text",
if you have a system supporting that.
Now, if that "record" is something other than a "line of text"...
The problem is that in Unix and Windows land there is no difference
between the metadata of a file, and the actual contents of a file.
The metadata should define the file and the records in the file,
that should be completely separate from the actual data contents of
the file.
Can't speak for Windows, but Unix has no meta-data. Unix has only one
file type, a stream of bytes. Everything else is application layer.
Which means you don't have a clue about the contents of a file, until
you know the internals of the application.
Well, that isn't exactly true. Certain file types do have clues.
And, at least under Unix, there is an application that will do a
very good job of identifying what the file is. It is even possible
to add your own hints if they exist and if you so desire.
Post by Dirk Munk
Standard VMS applications
produce structured files, so you only have to worry about the data
contents. It is possible to write your own applications using the
files of another application. The application can be in any language,
because RMS is the layer between the application and the file. This is
a structured approach, instead of producing a diarrhea of bytes, and
calling it a file.
Post by Bill Gunshannon
Post by Dirk Munk
Suppose I have a file with binary data, and one byte has the binary
(ascii) value of <lf>, then Unix will use it as a record separator,
even if it is in the middle of the actual data of that record.
Unix has no records. If you cat the file it will line break at the <lf>.
If you od -c the file it will identify the <lf> as just that.
Wonderful. However, it is clear that in many applications the notion
of a data record is present, and that the <lf> is used as record
separator, even if Unix formally doesn't have records.
Again, that is more of a C'ism than a Unix'ism. If I write an
application that uses ^M instead of ^J it will work just fine.
And there is no reason why I couldn't have ^J as a valid,
non-record-terminating character in the file.
Post by Dirk Munk
Post by Bill Gunshannon
Post by Dirk Munk
Suppose you have a VMS file with a fixed record size. That file has no
record separators whatsoever; it is one long stream of data. VMS can
calculate where the records start and end in the file. Suppose it
consists of sets of three 100-byte records that belong together.
Then you can change the attributes of that file to records of 300 bytes,
and in one read operation you will have all the data that belongs
together. I've actually used this in the past.
And that would be an application concept, not really an OS thing.
Actually not, since this can only be done because of the way RMS
stores data, and RMS is part of the OS.
See, there is where we differ in opinion. I see RMS as an
application that just happens to ship with VMS. Like editors,
compilers and other pieces that ship with the OS but are
not really part of it. Surely VMS will run without RMS present.
Not all applications need to access files at all.
Post by Dirk Munk
Post by Bill Gunshannon
Post by Dirk Munk
Suppose you want to print such a file, then VMS will send a <cr> and
a <lf> to the printer after each record. Simple.
VMS won't. Whatever application actually prints it will.
Obviously, this is functionality of the spooler, and that is part of VMS.
Post by Bill Gunshannon
Post by Dirk Munk
The DEC software engineers understood very well why it is a bad idea
to mix up contents of a file with the structure of a file, and
that's why they did not use stream files as standard RMS files in
applications. They are just there for compatibility with Unix, Windows etc.
And Unix made all files streams of bytes and lets the applications
decide what to do with them. Not really an OS problem.
Exchanging data between applications is rather important. Those
applications can be written in many languages, can come from different
sources. It is obvious that well structured files are paramount for
exchanging data between applications. That is why something like RMS
is in fact a very modern approach to structured software engineering,
instead of producing an unstructured diarrhea of bytes and calling
it a file.
Some see it otherwise. Unix tends to leave more control for the
developer and not try to handcuff them with someone else's concept
of how things should be done.
bill
This whole topic is an excuse for "discussions".

Just about everyone can have their own opinion about just what an OS is,
and come up with arguments to support that opinion. But it really
doesn't matter. "Environment" is as good a name as any for what VMS,
Unix, WEENDOZE, and such are.

The VMS environment does include utilities (OK, call them applications;
it doesn't matter) which are considered by most to be part of that
particular OS.

For example, where my experience comes from, the concept of "records" is
sort of fundamental. But I understand that may not be so for certain
uses. I'd guess many real-time uses would not have records, databases,
and other things, just what is required for the specific task.
So have the discussions, if it is felt necessary. Doesn't really matter.
--
David Froble Tel: 724-529-0450
Dave Froble Enterprises, Inc. E-Mail: ***@tsoft-inc.com
DFE Ultralights, Inc.
170 Grimplin Road
Vanderbilt, PA 15486
Dirk Munk
2020-12-31 20:58:15 UTC
Permalink
Post by Dirk Munk
Post by Bill Gunshannon
Post by Dirk Munk
Post by Jan-Erik Söderholm
Post by Dirk Munk
Post by Bill Gunshannon
Post by Jan-Erik Söderholm
Post by Phillip Helbig (undress to reply)
Post by Dirk Munk
Post by Phillip Helbig (undress to reply)
I tried for about 45 minutes---all the suggestions posted
here! It was
about 100 MB, so not all were quick to check.
In the end, I managed to transfer it again (don't ask!) and somehow,
magically, it was OK.
I've dealt with problems like these before, usually caused by
applications that were not written for VMS.
You need to have a bit of a feeling for the different file types of VMS
to fix these problems, but if you have that, it's very simple to solve
these little puzzles.
I don't know how many times I've used SET FILE/ATTR or CONVERT or TECO
to fix things like this.  I can usually look at the contents, look at
DIR/FULL, and see what needs to be done if they don't match, but this
was somehow different.
Nice that it was fixed! And no, I do not believe in magic... :-)
And, just so people don't think, based on earlier comments, that
Unix is somehow immune, I frequently have to remove "^M" characters
from text files on Unix. Unix's only saving grace in this regard is
that the solution is trivial.  :-)
bill
No, of course Unix is not immune. Using <lf> (Unix) or <cr><lf>
(Windows) as a record terminator is a rather silly idea. It means that
you can't use those characters in a record, and you have to scan the
contents of a file for those characters. Simply writing the length of a
record at the beginning of that record is a far better solution.
Having a <LF> or a <CR> in text files seems rather logical to me.
What else, if you want either a line feed or a carriage return?
But yes, there are other ways to specify and delimit a "line of text",
if you have a system supporting that.
Now, if that "record" is something other than a "line of text"...
The problem is that in Unix and Windows land there is no difference
between the metadata of a file, and the actual contents of a file.
The metadata should define the file and the records in the file,
that should be completely separate from the actual data contents of
the file.
Can't speak for Windows, but Unix has no meta-data. Unix has only one
file type, a stream of bytes.  Everything else is application layer.
Which means you don't have a clue about the contents of a file, until
you know the internals of the application.
Well, that isn't exactly true.  Certain file types do have clues.
And, at least under Unix, there is an application that will do a
very good job of identifying what the file is.  It is even possible
to add your own hints if they exist and if you so desire.
Nice, but suppose you have a COBOL compiler on Unix; then it will have
to set up its own file structures for all the file types COBOL supports,
like indexed files. What will that application do with those files? RMS
will tell you the structure of the file; you don't have to guess it.
Post by Dirk Munk
                                           Standard VMS applications
produce structured files, so you only have to worry about the data
contents. It is possible to write your own applications using the
files of another application. The application can be in any language,
because RMS is the layer between the application and the file. This is
a structured approach, instead of producing a diarrhea of bytes, and
calling it a file.
Post by Bill Gunshannon
Post by Dirk Munk
Suppose I have a file with binary data, and one byte has the binary
(ascii) value of <lf>, then Unix will use it as a record separator,
even if it is in the middle of the actual data of that record.
Unix has no records. If you cat the file it will line break at the <lf>.
If you od -c the file it will identify the <lf> as just that.
Wonderful. However, it is clear that in many applications the notion
of a data record is present, and that the <lf> is used as record
separator, even if Unix formally doesn't have records.
Again, that is more of a C'ism than a Unix'ism. If I write an
application that uses ^M instead of ^J it will work just fine.
And there is no reason why I couldn't have ^J as a valid,
non-record-terminating character in the file.
Sure you can. But the standard (used for instance by FTP ASCII
transfers) is <lf>.
Post by Dirk Munk
Post by Bill Gunshannon
Post by Dirk Munk
Suppose you have a VMS file with a fixed record size. That file has no
record separators whatsoever; it is one long stream of data. VMS can
calculate where the records start and end in the file. Suppose it
consists of sets of three 100-byte records that belong together.
Then you can change the attributes of that file to records of 300 bytes,
and in one read operation you will have all the data that belongs
together. I've actually used this in the past.
And that would be an application concept, not really an OS thing.
Actually not, since this can only be done because of the way RMS
stores data, and RMS is part of the OS.
See, there is where we differ in opinion.  I see RMS as an
application that just happens to ship with VMS.  Like editors,
compilers and other pieces that ship with the OS but are
not really part of it.  Surely VMS will run without RMS present.
Not all applications need to access files at all.
No, RMS is more like middleware. How do you think that VMS could read
and write its own files if RMS is not present?
Post by Dirk Munk
Post by Bill Gunshannon
Post by Dirk Munk
Suppose you want to print such a file, then VMS will send a <cr> and
a <lf> to the printer after each record. Simple.
VMS won't.  Whatever application actually prints it will.
Obviously, this is functionality of the spooler, and that is part of VMS.
Post by Bill Gunshannon
Post by Dirk Munk
The DEC software engineers understood very well why it is a bad idea
to mix up contents of a file with the structure of a file, and
that's why they did not use stream files as standard RMS files in
applications. They are just there for compatibility with Unix, Windows etc.
And Unix made all files streams of bytes and lets the applications
decide what to do with them.  Not really an OS problem.
Exchanging data between applications is rather important. Those
applications can be written in many languages, can come from different
sources. It is obvious that well structured files are paramount for
exchanging data between applications. That is why something like RMS
is in fact a very modern approach to structured software engineering,
instead of producing an unstructured diarrhea of bytes and calling
it a file.
Some see it otherwise.  Unix tends to leave more control for the
developer and not try to handcuff them with someone else's concept
of how things should be done.
If you must, you can do that with VMS as well. However, in 99.9% of all
applications, RMS with all of its functionality will give you exactly
what you need. The point is, Unix doesn't have something like that. With
VMS you have the choice; with Unix, you don't.
Arne Vajhøj
2020-12-31 21:13:21 UTC
Permalink
Post by Dirk Munk
Post by Dirk Munk
Post by Bill Gunshannon
Post by Dirk Munk
The problem is that in Unix and Windows land there is no difference
between the metadata of a file, and the actual contents of a file.
The metadata should define the file and the records in the file,
that should be completely separate from the actual data contents of
the file.
Can't speak for Windows, but Unix has no meta-data. Unix has only one
file type, a stream of bytes.  Everything else is application layer.
Which means you don't have a clue about the contents of a file, until
you know the internals of the application.
Well, that isn't exactly true.  Certain file types do have clues.
And, at least under Unix, there is an application that will do a
very good job of identifying what the file is.  It is even possible
to add your own hints if they exist and if you so desire.
Nice, but suppose you have a Cobol compiler on Unix, then it will have
to set up its own file system with all the files Cobol supports, like
indexed files. What will that application do with those files? RMS will
tell you the structure of the file, you don't have to guess it.
RMS will always have the information about the record format.

For index-sequential files RMS will have information about the keys, but
it will not have information about the non-key part (which can actually
be different for different records).
Post by Dirk Munk
Post by Dirk Munk
Post by Bill Gunshannon
Post by Dirk Munk
Suppose I have a file with binary data, and one byte has the binary
(ascii) value of <lf>, then Unix will use it as a record separator,
even if it is in the middle of the actual data of that record.
Unix has no records. If you cat the file it will line break at the <lf>.
If you od -c the file it will identify the <lf> as just that.
Wonderful. However, it is clear that in many applications the notion
of a data record is present, and that the <lf> is used as record
separator, even if Unix formally doesn't have records.
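The ambiguity Dirk is pointing at can be shown in a couple of lines (Python used purely for illustration): a binary record that happens to contain the byte 0x0A gets split in two by anything that treats LF as a separator.

```python
# A single 3-byte binary "record" whose middle byte is LF (0x0A).
record = bytes([0x01, 0x0A, 0x02])

# Any line-oriented read that treats LF as a record separator
# sees two fragments instead of one record.
pieces = record.split(b"\n")
assert pieces == [b"\x01", b"\x02"]
```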
Again, that is  more of a C'ism than a Unix'ism.  If I write an
application that uses ^M instead of ^J it will work just fine.
and, there is no reason why I couldn't have ^J as a valid, non-
record terminating character in the file.
Sure you can. But the standard (used for instance by FTP ASCII
transfers) is <lf>.
The *nix standard is definitely LF.

But most network protocols including FTP use CR LF.

FTP RFC:

<quote>
In accordance with the NVT standard, the <CRLF> sequence
should be used where necessary to denote the end of a line
of text.
</quote>

<quote>
If this division is necessary,
the FTP implementation should use the end-of-line sequence,
<CRLF> for ASCII, or <NL> for EBCDIC text files, as the
delimiter.
</quote>

Arne
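The NVT-ASCII convention Arne quotes from the FTP RFC amounts to a pair of simple byte substitutions on the way to and from the wire. A minimal sketch (Python, illustrative only; real FTP clients do this inside their ASCII-mode transfer code):

```python
def to_netascii(data: bytes) -> bytes:
    # Normalize any existing CR LF first, then expand every bare LF
    # to the CR LF sequence required on the wire.
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")

def from_netascii(data: bytes) -> bytes:
    # Collapse the wire's CR LF back to the local LF terminator.
    return data.replace(b"\r\n", b"\n")

text = b"line one\nline two\n"
wire = to_netascii(text)
assert wire == b"line one\r\nline two\r\n"
assert from_netascii(wire) == text
```

A client that forgets the second step (or applies it twice) produces exactly the kind of stray-CR files discussed elsewhere in this thread.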
Phillip Helbig (undress to reply)
2020-12-31 22:18:59 UTC
Permalink
Post by Arne Vajhøj
The *nix standard is definitely LF.
But most network protocols including FTP use CR LF.
<quote>
In accordance with the NVT standard, the <CRLF> sequence
should be used where necessary to denote the end of a line
of text.
</quote>
Nearly every internet application protocol (HTTP, FTP, NNTP, SMTP) specifies
<CR><LF> as the line terminator and nearly every unix-derived application
screws it up at some point in its development.
Phillip Helbig (undress to reply)
2020-12-31 23:41:17 UTC
Permalink
Post by Phillip Helbig (undress to reply)
Post by Arne Vajhøj
The *nix standard is definitely LF.
But most network protocols including FTP use CR LF.
<quote>
In accordance with the NVT standard, the <CRLF> sequence
should be used where necessary to denote the end of a line
of text.
</quote>
Nearly every internet application protocol (HTTP, FTP, NNTP, SMTP) specifies
<CR><LF> as the line terminator and nearly every unix-derived application
screws it up at some point in its development.
I forgot the attribution; that's a quote from Dave Jones (author of the
OSU HTTP server).
Simon Clubley
2021-01-01 03:43:26 UTC
Permalink
Post by Dirk Munk
Nice, but suppose you have a Cobol compiler on Unix, then it will have
to set up its own file system with all the files Cobol supports, like
indexed files. What will that application do with those files? RMS will
tell you the structure of the file, you don't have to guess it.
RMS only does this at a relatively meaningless level.

For example, unlike with SQL, you don't know the data field names,
their size and data type just by looking at the RMS data.

For example, there's nothing in the RMS metadata which tells you the
SYSUAF.DAT field names, sizes and data types.

Simon.
--
Simon Clubley, ***@remove_me.eisner.decus.org-Earth.UFP
Walking destinations on a map are further away than they appear.
Bill Gunshannon
2021-01-01 03:54:45 UTC
Permalink
Post by Dirk Munk
Post by Dirk Munk
Post by Bill Gunshannon
Post by Dirk Munk
Post by Jan-Erik Söderholm
Post by Dirk Munk
Post by Bill Gunshannon
Post by Jan-Erik Söderholm
Post by Phillip Helbig (undress to reply)
Post by Dirk Munk
Post by Phillip Helbig (undress to reply)
I tried for about 45 minutes---all the suggestions posted
here! It was
about 100 MB, so not all were quick to check.
In the end, I managed to transfer it again (don't ask!) and somehow,
magically, it was OK.
I've dealt with problems like these before, usually caused by
applications that were not written for VMS.
You need to have a bit of a feeling for the different file types of VMS
to fix these problems, but if you have that, it's very simple to solve
these little puzzles.
I don't know how many times I've used SET FILE/ATTR or CONVERT or TECO
to fix things like this.  I can usually look at the contents, look at
DIR/FULL, and see what needs to be done if they don't match, but this
was somehow different.
Nice that it was fixed! And no, I do not believe in magic... :-)
And, just so people don't think, based on earlier comments, that
Unix is somehow immune, I frequently have to remove "^M" characters
from text files on Unix. Unix's only saving grace in this regard is
that the solution is trivial.  :-)
bill
No, of course Unix is not immune. Using <lf> or <cr> (Windows) as
record terminator is a rather silly idea. It means that you can't
use those characters in a record, and you have to scan the
contents of a file for those characters. Simply writing the
length of a record at the beginning of that record is a far better
solution.
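The length-prefixed scheme Dirk describes here (write the record length ahead of the data, so no byte value has to be reserved as a terminator) is essentially how RMS variable-length records are laid out on disk. A rough sketch (Python, illustrative only):

```python
import struct

def pack_records(records):
    # Prefix each record with a 2-byte little-endian length, so the
    # data itself may contain any byte value, including LF.
    out = bytearray()
    for rec in records:
        out += struct.pack("<H", len(rec))
        out += rec
    return bytes(out)

def unpack_records(data):
    # Walk the stream: read a length, then exactly that many bytes.
    # No scanning of the contents for delimiters is ever needed.
    records, pos = [], 0
    while pos < len(data):
        (length,) = struct.unpack_from("<H", data, pos)
        pos += 2
        records.append(data[pos:pos + length])
        pos += length
    return records

# A record with an embedded LF survives the round trip intact:
blob = pack_records([b"abc\ndef", b"xyz"])
assert unpack_records(blob) == [b"abc\ndef", b"xyz"]
```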
Having a <LF> or a <CR> in text files seems rather logical to me.
What else, if you want either a line feed or a carriage return?
But yes, there are other ways to specify and delimit a "line of text",
if you have a system supporting that.
Now, if that "record" is something else than a "line of text"...
The problem is that in Unix and Windows land there is no difference
between the metadata of a file, and the actual contents of a file.
The metadata should define the file and the records in the file;
that should be completely separate from the actual data contents of
the file.
Can't speak for Windows, but Unix has no meta-data. Unix has only one
file type, a stream of bytes.  Everything else is application layer.
Which means you don't have a clue about the contents of a file, until
you know the internals of the application.
Well, that isn't exactly true.  Certain file types do have clues.
And, at least under Unix, there is an application that will do a
very good job of identifying what the file is.  It is even possible
to add your own hints if they exist and if you so desire.
Nice, but suppose you have a Cobol compiler on Unix, then it will have
to set up its own file system with all the files Cobol supports,
Don't know what you mean by "set up its own file system". COBOL will
open the files and if necessary create them for output files. To Unix
the files will be streams of bytes. To COBOL Programs they will be
sequential, line sequential, direct or indexed.
Post by Dirk Munk
like
indexed files. What will that application do with those files? RMS will
tell you the structure of the file, you don't have to guess it.
I use GnuCOBOL. Sequential files show up as "ASCII text", as does the
COBOL Source File. Indexed files report as "Berkeley DB" as that was the
option I chose for indexed files when I built GnuCOBOL. Other COBOL
compilers (like MicroFocus) may differ. Of course, the executable shows
up as "ELF 64-bit LSB shared object". If I wanted to put in the
effort I could probably get it to identify the source as COBOL source
but I see no reason to bother.
Post by Dirk Munk
Post by Dirk Munk
                                           Standard VMS applications
produce structured files, so you only have to worry about the data
contents. It is possible to write your own applications using the
files of another application. The application can be in any language,
because RMS is the layer between the application and the file. This
is a structured approach, instead of producing a diarrhea of bytes,
and calling it a file.
Post by Bill Gunshannon
Post by Dirk Munk
Suppose I have a file with binary data, and one byte has the binary
(ascii) value of <lf>, then Unix will use it as a record separator,
even if it is in the middle of the actual data of that record.
Unix has no records. If you cat the file it will line break at the <lf>.
If you od -c the file it will identify the <lf> as just that.
Wonderful. However, it is clear that in many applications the notion
of a data record is present, and that the <lf> is used as record
separator, even if Unix formally doesn't have records.
Again, that is  more of a C'ism than a Unix'ism.  If I write an
application that uses ^M instead of ^J it will work just fine.
and, there is no reason why I couldn't have ^J as a valid, non-
record terminating character in the file.
Sure you can. But the standard (used for instance by FTP ASCII
transfers) is <lf>.
OK, so what? The default has to be something and that has been around
a long time. And there are a lot of applications written to that
standard. But I am not forced to use it. How much freedom do you
have?
Post by Dirk Munk
Post by Dirk Munk
Post by Bill Gunshannon
Post by Dirk Munk
Suppose you have a VMS file with fixed record size. That file has
no record separators whatsoever; it is one long stream of data.
VMS can calculate where the records start and end in the file.
Suppose it consists of sets of three records of 100 bytes that
belong together. Then you can change the attributes of that file to
records of 300 bytes, and in one read operation you will have all
the data that belongs together. I've actually used this in the past.
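The trick Dirk describes is a metadata-only change on VMS (SET FILE/ATTRIBUTES rewrites the record size, not the data). The arithmetic behind it can be sketched like this (Python standing in for RMS, illustrative only):

```python
def split_fixed(data, record_size):
    # A fixed-length file has no separators: record N simply starts
    # at byte N * record_size. Changing record_size from 100 to 300
    # regroups the same bytes into triples without rewriting anything.
    return [data[i:i + record_size]
            for i in range(0, len(data), record_size)]

data = bytes(600)                        # six 100-byte records on disk
assert len(split_fixed(data, 100)) == 6
assert len(split_fixed(data, 300)) == 2  # same bytes, three records per read
```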
And that would be an application concept, not really an OS thing.
Actually not, since this can only be done because of the way RMS
stores data, and RMS is part of the OS.
See, there is where we differ in opinion.  I see RMS as an
application that just happens to ship with VMS.  Like editors,
compilers and other pieces that ship with the OS but are
not really part of it.  Surely VMS will run without RMS present.
Not all applications need to access files at all.
No, RMS is more like middleware. How do you think that VMS could read
and write its own files if RMS is not present?
Does a call to printf/scanf in a C program use RMS?
(Really, can someone answer that question?)
Post by Dirk Munk
Post by Dirk Munk
Post by Bill Gunshannon
Post by Dirk Munk
Suppose you want to print such a file, then VMS will send a <cr>
and a <lf> to the printer after each record. Simple.
VMS won't.  Whatever application actually prints it will.
Obviously, this is functionality of the spooler, and that is part of VMS.
Post by Bill Gunshannon
Post by Dirk Munk
The DEC software engineers understood very well why it is a bad
idea to mix up contents of a file with the structure of a file, and
that's why they did not use stream files as standard RMS files in
applications. They are just there for compatibility with Unix, Windows etc.
And Unix made all files streams of bytes and lets the applications
decide what to do with them.  Not really an OS problem.
Exchanging data between applications is rather important. Those
applications can be written in many languages, can come from
different sources. It is obvious that well structured files are
paramount for exchanging data between applications. That is why
something like RMS is in fact a very modern approach to structured
software engineering, instead of producing an unstructured diarrhea
of bytes, and calling it a file.
Some see it otherwise.  Unix tends to leave more control for the
developer and not try and handcuff them with someone else's concept
of how things should be done.
If you must, you can do that with VMS as well. However, in 99.9% of all
applications, RMS with all of its functionality will give you exactly
what you need. The point is, Unix doesn't have something like that. With
VMS you have the choice, with Unix, you don't.
Really? Then why did Phillip have the problem that started this whole
discussion? VMS did not give him what he needed. It butchered a file
he brought over from somewhere on the web. Of course, Unix will do
that too, but Unix never told you it wouldn't. :-)

bill
Dave Froble
2021-01-01 04:59:30 UTC
Permalink
Post by Bill Gunshannon
Does a call to printf/scanf in a C program use RMS?
(Really, can someone answer that question?)
I think I can handle this one.

If using RMS for file I/O on VMS, and yes, as far as I know, C and most
other VMS languages do so, then the files are RMS files. Indexed (fixed
and variable length), relative, and sequential (fixed and variable
length). RMS has been modified to also support the stream file type(s).

For example, in Basic, a file open can specify the file organization.
I'm assuming that other languages also allow this, and RMS also supports
this.

So if one is doing a write to a disk file on VMS, it will use RMS and be
an RMS file.

There is lower level file I/O available on VMS, which I'm rather
familiar with, but the languages as far as I know all use RMS for file I/O.
Post by Bill Gunshannon
Really? then why did Phillip have the problem that started this whole
discussion? VMS did not give him what he needed. I butchered a file
he brought over from somewhere on the web. Of course, Unix will do
that too, but Unix never told you it wouldn't. :-)
That was not specifically a VMS problem. As Phillip finally confessed,
he was pulling a file using HTTP, which can vary widely, based upon what
the sender chooses to do.

Garbage coming in, garbage stored in the output file.
--
David Froble Tel: 724-529-0450
Dave Froble Enterprises, Inc. E-Mail: ***@tsoft-inc.com
DFE Ultralights, Inc.
170 Grimplin Road
Vanderbilt, PA 15486
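The symptom Phillip described in the opening post (an explicit CR visible at the end of every line, plus an extra blank line between lines) is the classic signature of a CR LF conversion applied twice. A repair along the lines of the ten-line program Dave suggested earlier in the thread might look like this (Python for illustration; the exact byte pattern is an assumption about this particular file):

```python
def repair(data: bytes) -> bytes:
    # A doubly-converted file arrives as "text\r\n\n" per line: the
    # stray CR shows up in the editor and the duplicated LF appears
    # as an extra blank line. Undo both steps.
    data = data.replace(b"\r\n\n", b"\n")  # undo the double conversion
    return data.replace(b"\r\n", b"\n")    # any remaining CR LF pairs

damaged = b"line one\r\n\nline two\r\n\n"
assert repair(damaged) == b"line one\nline two\n"
```

On VMS the result would then be written back with ordinary variable-length records, at which point the usual SET FILE/ATTRIBUTES checks should come out clean.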
Dirk Munk
2021-01-01 15:46:05 UTC
Permalink
Post by Dirk Munk
Post by Dirk Munk
Post by Bill Gunshannon
Post by Dirk Munk
Post by Jan-Erik Söderholm
Post by Dirk Munk
Post by Bill Gunshannon
Post by Jan-Erik Söderholm
Post by Phillip Helbig (undress to reply)
Post by Dirk Munk
Post by Phillip Helbig (undress to reply)
I tried for about 45 minutes---all the suggestions posted
here! It was
about 100 MB, so not all were quick to check.
In the end, I managed to transfer it again (don't ask!) and
somehow,
magically, it was OK.
I've dealt with problems like these before, usually caused by
applications that were not written for VMS.
You need to have a bit of a feeling for the different file
types of VMS
to fix these problems, but if you have that, it's very
simple to solve
these little puzzles.
I don't know how many times I've used SET FILE/ATTR or CONVERT or TECO
to fix things like this.  I can usually look at the contents, look at
DIR/FULL, and see what needs to be done if they don't match, but this
was somehow different.
Nice that it was fixed! And no, I do not believe in magic... :-)
And, just so people don't think, based on earlier comments, that
Unix is somehow immune, I frequently have to remove "^M" characters
from text files on Unix. Unix's only saving grace in this regard is
that the solution is trivial.  :-)
bill
No, of course Unix is not immune. Using <lf> or <cr> (Windows)
as record terminator is a rather silly idea. It means that you
can't use those characters in a record, and you have to scan the
contents of a file for those characters. Simply writing the
length of a record at the beginning of that record is a far better
solution.
Having a <LF> or a <CR> in text files seems rather logical to me.
What else, if you want either a line feed or a carriage return?
But yes, there are other ways to specify and delimit a "line of text",
if you have a system supporting that.
Now, if that "record" is something else than a "line of text"...
The problem is that in Unix and Windows land there is no
difference between the metadata of a file, and the actual contents
of a file. The metadata should define the file and the records in
the file; that should be completely separate from the actual data
contents of the file.
Can't speak for Windows, but Unix has no meta-data. Unix has only one
file type, a stream of bytes.  Everything else is application layer.
Which means you don't have a clue about the contents of a file,
until you know the internals of the application.
Well, that isn't exactly true.  Certain file types do have clues.
And, at least under Unix, there is an application that will do a
very good job of identifying what the file is.  It is even possible
to add your own hints if they exist and if you so desire.
Nice, but suppose you have a Cobol compiler on Unix, then it will have
to set up its own file system with all the files Cobol supports,
Don't know what you mean by "set up its own file system".  COBOL will
open the files and if necessary create them for output files.  To Unix
the files will be streams of bytes.  To COBOL Programs they will be
sequential, line sequential, direct or indexed.
Yes indeed. And all those files created by those Cobol programs can only
be used by other Cobol programs created by that compiler.

Compare that with VMS, where I can read and write those files by any
other program, written in any other language, or even with DCL if the
type of data in the files allows it.
Post by Dirk Munk
                                                                  like
indexed files. What will that application do with those files? RMS
will tell you the structure of the file, you don't have to guess it.
I use GnuCOBOL.  Sequential files show up as "ASCII text", as does the
COBOL Source File.  Indexed files report as "Berkeley DB" as that was the
option I chose for indexed files when I built GnuCOBOL.  Other COBOL
compilers (like MicroFocus) may differ. Of course, the executable shows
up as "ELF 64-bit LSB shared object".  If I wanted to put in the
effort I could probably get it to identify the source as COBOL source
but I  see no reason to bother.
Cobol source files are always text files of course. But again, with VMS
all the file types are offered by RMS, and can be used by any language
or even DCL.
Post by Dirk Munk
Post by Dirk Munk
                                           Standard VMS applications
produce structured files, so you only have to worry about the data
contents. It is possible to write your own applications using the
files of another application. The application can be in any
language, because RMS is the layer between the application and the
file. This is a structured approach, instead of producing a diarrhea
of bytes, and calling it a file.
Post by Bill Gunshannon
Post by Dirk Munk
Suppose I have a file with binary data, and one byte has the
binary (ascii) value of <lf>, then Unix will use it as a record
separator, even if it is in the middle of the actual data of that
record.
Unix has no records. If you cat the file it will line break at the <lf>.
If you od -c the file it will identify the <lf> as just that.
Wonderful. However, it is clear that in many applications the notion
of a data record is present, and that the <lf> is used as record
separator, even if Unix formally doesn't have records.
Again, that is  more of a C'ism than a Unix'ism.  If I write an
application that uses ^M instead of ^J it will work just fine.
and, there is no reason why I couldn't have ^J as a valid, non-
record terminating character in the file.
Sure you can. But the standard (used for instance by FTP ASCII
transfers) is <lf>.
OK, so what?  The default has to be something and that has been around
a long time.  And there are a lot of applications written to that
standard.  But I am not forced to use it.  How  much freedom do you
have?
RMS stores the DATA. How it stores the data is normally something you
don't care about. Standard RMS does not confuse your data with record
delimiters like <cr>, <lf>, or whatever other delimiter you want to use.
That is the point, not which freedom I have in the choice of delimiter.
Post by Dirk Munk
Post by Dirk Munk
Post by Bill Gunshannon
Post by Dirk Munk
Suppose you have a VMS file with fixed record size. That file has
no record separators whatsoever; it is one long stream of data.
VMS can calculate where the records start and end in the file.
Suppose it consists of sets of three records of 100 bytes that
belong together. Then you can change the attributes of that file
to records of 300 bytes, and in one read operation you will have
all the data that belongs together. I've actually used this in the
past.
And that would be an application concept, not really an OS thing.
Actually not, since this can only be done because of the way RMS
stores data, and RMS is part of the OS.
See, there is where we differ in opinion.  I see RMS as an
application that just happens to ship with VMS.  Like editors,
compilers and other pieces that ship with the OS but are
not really part of it.  Surely VMS will run without RMS present.
Not all applications need to access files at all.
No, RMS is more like middleware. How do you think that VMS could read
and write its own files if RMS is not present?
Does a call to printf/scanf in a C program use RMS?
(Really, can someone answer that question?)
Post by Dirk Munk
Post by Dirk Munk
Post by Bill Gunshannon
Post by Dirk Munk
Suppose you want to print such a file, then VMS will send a <cr>
and a <lf> to the printer after each record. Simple.
VMS won't.  Whatever application actually prints it will.
Obviously, this is functionality of the spooler, and that is part of VMS.
Post by Bill Gunshannon
Post by Dirk Munk
The DEC software engineers understood very well why it is a bad
idea to mix up contents of a file with the structure of a file,
and that's why they did not use stream files as standard RMS files
in applications. They are just there for compatibility with Unix,
Windows etc.
And Unix made all files streams of bytes and lets the applications
decide what to do with them.  Not really an OS problem.
Exchanging data between applications is rather important. Those
applications can be written in many languages, can come from
different sources. It is obvious that well structured files are
paramount for exchanging data between applications. That is why
something like RMS is in fact a very modern approach to structured
software engineering, instead of producing an unstructured
diarrhea of bytes, and calling it a file.
Some see it otherwise.  Unix tends to leave more control for the
developer and not try and handcuff them with someone else's concept
of how things should be done.
If you must, you can do that with VMS as well. However, in 99.9% of
all applications, RMS with all of its functionality will give you
exactly what you need. The point is, Unix doesn't have something like
that. With VMS you have the choice, with Unix, you don't.
Really?  Then why did Phillip have the problem that started this whole
discussion?  VMS did not give him what he needed.  It butchered a file
he brought over from somewhere on the web.  Of course, Unix will do
that too, but Unix never told you it wouldn't.  :-)
Phillip had the problem that those silly record delimiters you need in
stream files turned up in his data. By using some RMS file manipulations
they could be removed.
Bill Gunshannon
2021-01-01 16:13:15 UTC
Permalink
Post by Dirk Munk
Post by Dirk Munk
Post by Dirk Munk
Post by Bill Gunshannon
Post by Dirk Munk
Post by Jan-Erik Söderholm
Post by Dirk Munk
Post by Bill Gunshannon
Post by Jan-Erik Söderholm
Post by Phillip Helbig (undress to reply)
Post by Dirk Munk
Post by Phillip Helbig (undress to reply)
I tried for about 45 minutes---all the suggestions posted
here! It was
about 100 MB, so not all were quick to check.
In the end, I managed to transfer it again (don't ask!)
and somehow,
magically, it was OK.
I've dealt with problems like these before, usually caused by
applications that were not written for VMS.
You need to have a bit of a feeling for the different file
types of VMS
to fix these problems, but if you have that, it's very
simple to solve
these little puzzles.
I don't know how many times I've used SET FILE/ATTR or
CONVERT or TECO
to fix things like this.  I can usually look at the
contents, look at
DIR/FULL, and see what needs to be done if they don't match, but this
was somehow different.
Nice that it was fixed! And no, I do not believe in magic... :-)
And, just so people don't think, based on earlier comments, that
Unix is somehow immune, I frequently have to remove "^M" characters
from text files on Unix. Unix's only saving grace in this regard is
that the solution is trivial.  :-)
bill
No, of course Unix is not immune. Using <lf> or <cr> (Windows)
as record terminator is a rather silly idea. It means that you
can't use those characters in a record, and you have to scan
the contents of a file for those characters. Simply writing the
length of a record at the beginning of that record is a far
better solution.
Having a <LF> or a <CR> in text files seems rather logical to me.
What else, if you want either a line feed or a carriage return?
But yes, there are other ways to specify and delimit a "line of text",
if you have a system supporting that.
Now, if that "record" is something else than a "line of text"...
The problem is that in Unix and Windows land there is no
difference between the metadata of a file, and the actual
contents of a file. The metadata should define the file and the
records in the file; that should be completely separate from the
actual data contents of the file.
Can't speak for Windows, but Unix has no meta-data. Unix has only one
file type, a stream of bytes.  Everything else is application layer.
Which means you don't have a clue about the contents of a file,
until you know the internals of the application.
Well, that isn't exactly true.  Certain file types do have clues.
And, at least under Unix, there is an application that will do a
very good job of identifying what the file is.  It is even possible
to add your own hints if they exist and if you so desire.
Nice, but suppose you have a Cobol compiler on Unix, then it will
have to set up its own file system with all the files Cobol supports,
Don't know what you mean by "set up its own file system".  COBOL will
open the files and if necessary create them for output files.  To Unix
the files will be streams of bytes.  To COBOL Programs they will be
sequential, line sequential, direct or indexed.
Yes indeed. And all those files created by those Cobol programs can only
be used by other Cobol programs created by that compiler.
Why on earth would you think that? They can be used by any
program using any language you wish to program in. As long
as you know the format and contents of the file, which you
also need to know to access them with COBOL. If they were
written as PIC X(80) (or whatever length) you can just cat
them and see the contents.
Post by Dirk Munk
Compare that with VMS, where I can read and write those files by any
other program, written in any other language, or even with DCL if the
type of data in the files allows it.
Nothing different between them.
Post by Dirk Munk
Post by Dirk Munk
like indexed files. What will that application do with those files?
RMS will tell you the structure of the file, you don't have to guess it.
I use GnuCOBOL.  Sequential files show up as "ASCII text", as does the
COBOL Source File.  Indexed files report as "Berkeley DB" as that was the
option I chose for indexed files when I built GnuCOBOL.  Other COBOL
compilers (like MicroFocus) may differ. Of course, the executable shows
up as "ELF 64-bit LSB shared object".  If I wanted to put in the
effort I could probably get it to identify the source as COBOL source
but I  see no reason to bother.
Cobol source files are always text files of course. But again, with VMS
all the file types are offered by RMS, and can be used by any language
or even DCL.
And, as I stated above the exact same is true of Unix and OS-9 and
RT-11 and Windows and probably every other OS.
Post by Dirk Munk
Post by Dirk Munk
Post by Dirk Munk
                                           Standard VMS
applications produce structured files, so you only have to worry
about the data contents. It is possible to write your own
applications using the files of another application. The
application can be in any language, because RMS is the layer
between the application and the file. This is a structured
approach, instead of producing a diarrhea of bytes, and calling it
a file.
Post by Bill Gunshannon
Post by Dirk Munk
Suppose I have a file with binary data, and one byte has the
binary (ascii) value of <lf>, then Unix will use it as a record
separator, even if it is in the middle of the actual data of that
record.
Unix has no records. If you cat the file it will line break at the <lf>.
If you od -c the file it will identify the <lf> as just that.
Wonderful. However, it is clear that in many applications the
notion of a data record is present, and that the <lf> is used as
record separator, even if Unix formally doesn't have records.
Again, that is  more of a C'ism than a Unix'ism.  If I write an
application that uses ^M instead of ^J it will work just fine.
and, there is no reason why I couldn't have ^J as a valid, non-
record terminating character in the file.
Sure you can. But the standard (used for instance by FTP ASCII
transfers) is <lf>.
OK, so what?  The default has to be something and that has been around
a long time.  And there are a lot of applications written to that
standard.  But I am not forced to use it.  How  much freedom do you
have?
RMS stores the DATA. How it stores the data is normally something you
don't care about. Standard RMS does not confuse your data with record
delimiters like <cr>, <lf>, or whatever other delimiter you want to use.
That is the point, not which freedom I have in the choice of delimiter.
Post by Dirk Munk
Post by Dirk Munk
Post by Bill Gunshannon
Post by Dirk Munk
Suppose you have a VMS file with fixed record size. That file has
no record separators whatsoever; it is one long stream of
data. VMS can calculate where the records start and end in the
file. Suppose it consists of sets of three records of 100
bytes that belong together. Then you can change the attributes of
that file to records of 300 bytes, and in one read operation you
will have all the data that belongs together. I've actually used
this in the past.
And that would be an application concept, not really an OS thing.
Actually not, since this can only be done because of the way RMS
stores data, and RMS is part of the OS.
See, there is where we differ in opinion.  I see RMS as an
application that just happens to ship with VMS.  Like editors,
compilers and other pieces that ship with the OS but are
not really part of it.  Surely VMS will run without RMS present.
Not all applications need to access files at all.
No, RMS is more like middleware. How do you think that VMS could read
and write its own files if RMS is not present?
Does a call to printf/scanf in a C program use RMS?
(Really, can someone answer that question?)
Post by Dirk Munk
Post by Dirk Munk
Post by Bill Gunshannon
Post by Dirk Munk
Suppose you want to print such a file, then VMS will send a <cr>
and a <lf> to the printer after each record. Simple.
VMS won't.  Whatever application actually prints it will.
Obviously, this is functionality of the spooler, and that is part of VMS.
Post by Bill Gunshannon
Post by Dirk Munk
The DEC software engineers understood very well why it is a bad
idea to mix up contents of a file with the structure of a file,
and that's why they did not use stream files as standard RMS
files in applications. They are just there for compatibility with
Unix, Windows etc.
And Unix made all files streams of bytes and lets the applications
decide what to do with them.  Not really an OS problem.
Exchanging data between applications is rather important. Those
applications can be written in many languages, can come from
different sources. It is obvious that well structured files are
paramount for exchanging data between applications. That is why
something like RMS is in fact a very modern approach to structured
software engineering, instead of producing an unstructured
diarrhea of bytes, and calling it a file.
Some see it otherwise.  Unix tends to leave more control for the
developer and not try and handcuff them with someone else's concept
of how things should be done.
If you must, you can do that with VMS as well. However, in 99.9% of
all applications, RMS with all of its functionality will give you
exactly what you need. The point is, Unix doesn't have something like
that. With VMS you have the choice, with Unix, you don't.
Really?  then why did Phillip have the problem that started this whole
discussion?  VMS did not give him what he needed.  I butchered a file
he brought over from somewhere on the web.  Of course, Unix will do
that too, but Unix never told you it wouldn't.  :-)
Phillip had the problem that those silly record delimiters you need in
stream files turned up in his data. By using some RMS file manipulations
they could be removed.
It probably comes as a surprise to you, but a VMS Web Server would
have sent the data with exactly the same <CR><LF> characters embedded
in the text. The problem was most likely the receiving client, on VMS,
did not properly handle converting the incoming NETASCII data into
VMS data.

bill
Dirk Munk
2021-01-01 16:52:50 UTC
Permalink
Post by Dirk Munk
Post by Dirk Munk
Post by Dirk Munk
Post by Bill Gunshannon
Post by Dirk Munk
Post by Jan-Erik Söderholm
Post by Dirk Munk
Post by Bill Gunshannon
Post by Jan-Erik Söderholm
Post by Phillip Helbig (undress to reply)
Post by Dirk Munk
Post by Phillip Helbig (undress to reply)
I tried for about 45 minutes---all the suggestions posted
here! It was
about 100 MB, so not all were quick to check.
In the end, I managed to transfer it again (don't ask!)
and somehow,
magically, it was OK.
I've dealt with problems like these before, usually caused by
applications that were not written for VMS.
You need to have a bit of a feeling for the different file
types of VMS
to fix these problems, but if you have that, it's very
simple to solve
these little puzzles.
I don't know how many times I've used SET FILE/ATTR or
CONVERT or TECO
to fix things like this.  I can usually look at the
contents, look at
DIR/FULL, and see what needs to be done if they don't
match, but this
was somehow different.
Nice that it was fixed! And no, I do not believe in magic... :-)
And, just so people don't think, based on earlier comments, that
Unix is somehow immune, I frequently have to remove "^M" characters
from text files on Unix. Unix's only saving grace in this regard is
that the solution is trivial.  :-)
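The "trivial" Unix-side fix alluded to here is commonly `tr -d '\r'` or dos2unix; as a sketch, the same cleanup in Python (the `strip_cr` helper is a hypothetical name):

```python
# Strip stray carriage returns (^M) from text data: normalize CRLF
# (and lone CR) line endings to bare LF, the Unix convention.

def strip_cr(data: bytes) -> bytes:
    """Convert CRLF and lone-CR line endings to LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")

assert strip_cr(b"one\r\ntwo\r\n") == b"one\ntwo\n"
```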
bill
No, of course Unix is not immune. Using <lf> or <cr> (Windows)
as record terminator is a rather silly idea. It means that you
can't use those characters in a record, and you have to scan
the contents of a file for those characters. Simply writing
the length of a record at the beginning of that record is a far
better solution.
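A minimal sketch of that length-prefix scheme (RMS's variable-length record format likewise stores a leading byte count; the two-byte little-endian field below is just an illustrative choice):

```python
# Length-prefixed records: each record's byte length precedes its data,
# so any byte value (including LF or CR) is legal inside a record.
import struct

def write_records(records):
    """Serialize records as <2-byte length><data> pairs."""
    out = bytearray()
    for rec in records:
        out += struct.pack("<H", len(rec))  # little-endian 16-bit count
        out += rec
    return bytes(out)

def read_records(data):
    """Recover the records by walking the length prefixes."""
    records, pos = [], 0
    while pos < len(data):
        (length,) = struct.unpack_from("<H", data, pos)
        pos += 2
        records.append(data[pos:pos + length])
        pos += length
    return records

recs = [b"plain text", b"binary \n with a line feed \r inside"]
assert read_records(write_records(recs)) == recs  # round-trips intact
```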
Having a <LF> or a <CR> in text files seems rather logical to me.
What else, if you want either a line feed or a carriage return?
But yes, there are other ways to specify and delimit a "line of text",
if you have a system supporting that.
Now, if that "record" is something else than a "line of text"...
The problem is that in Unix and Windows land there is no
difference between the metadata of a file, and the actual
contents of a file. The metadata should define the file and the
records in the file, that should be completely separate from the
actual data contents of the file.
Can't speak for Windows, but Unix has no meta-data. Unix has only one
file type, a stream of bytes.  Everything else is application layer.
Which means you don't have a clue about the contents of a file,
until you know the internals of the application.
Well, that isn't exactly true.  Certain file types do have clues.
And, at least under Unix, there is an application that will do a
very good job of identifying what the file is.  It is even possible
to add your own hints if they exist and if you so desire.
Nice, but suppose you have a Cobol compiler on Unix, then it will
have to set up its own file system with all the files Cobol supports,
Don't know what you mean by "set up its own file system".  COBOL will
open the files and, if necessary, create them for output.  To Unix
the files will be streams of bytes.  To COBOL programs they will be
sequential, line sequential, direct or indexed.
Yes indeed. And all those files created by those Cobol programs can
only be used by other Cobol programs created by that compiler.
Why on earth would you think that?  They can be used by any
program using any language you wish to program in.  As long
as you know the format and contents of the file, which you
also need to know to access them with COBOL.  If they were
written as PIC X(80) (or whatever length) you can just cat
them and see the contents.
And if you have an indexed sequential file, how do you distinguish
between indexes and data? And in a relative file, how can you see which
record is still valid, and which is not? Those files have structures,
and you can't use them without a means to handle the structures.
Post by Dirk Munk
Compare that with VMS, where I can read and write those files by any
other program, written in any other language, or even with DCL if the
type of data in the files allows it.
Nothing different between them.
Post by Dirk Munk
Post by Dirk Munk
like indexed files. What will that application do with those files?
RMS will tell you the structure of the file, you don't have to guess it.
I use GnuCOBOL.  Sequential files show up as "ASCII text", as does the
COBOL Source File.  Indexed files report as "Berkeley DB" as that was the
option I chose for indexed files when I built GnuCOBOL.  Other COBOL
compiler (like MicroFocus) may differ. Of course, the executable shows
up as "ELF 64-bit LSB shared object".  If I wanted to put in the
effort I could probably get it to identify the source as COBOL source
but I  see no reason to bother.
Cobol source files are always text files of course. But again, with
VMS all the file types are offered by RMS, and can be used by any
language or even DCL.
And, as I stated above the exact same is true of Unix and OS-9 and
RT-11 and Windows and probably every other OS.
No, Unix does not offer indexed sequential files etc. as standardized
part of the operating system. They can be added as part of a compiler,
like Cobol, but then those files can only be used by applications
written with that compiler, unless the same filesystem is used by
another compiler.

It seems that Windows does have more file types as part of the OS, but
that feature is very well hidden.
Post by Dirk Munk
Post by Dirk Munk
Post by Dirk Munk
                                           Standard VMS
applications produce structured files, so you only have to worry
about the data contents. It is possible to write your own
applications using the files of another application. The
application can be in any language, because RMS is the layer
between the application and the file. This is a structured
approach, instead of producing a diarrhea of bytes, and calling it
a file.
Post by Bill Gunshannon
Post by Dirk Munk
Suppose I have a file with binary data, and one byte has the
binary (ascii) value of <lf>, then Unix will use it as a record
separator, even if it is in the middle of the actual data of
that record.
Unix has no records. If you cat the file it will line break at the <lf>.
If you od -c the file it will identify the <lf> as just that.
Wonderful. However, it is clear that in many applications the
notion of a data record is present, and that the <lf> is used as
record separator, even if Unix formally doesn't have records.
Again, that is  more of a C'ism than a Unix'ism.  If I write an
application that uses ^M instead of ^J it will work just fine.
And there is no reason why I couldn't have ^J as a valid,
non-record-terminating character in the file.
Sure you can. But the standard (used for instance by FTP ASCII
transfers) is <lf>.
OK, so what?  The default has to be something and that has been around
a long time.  And there are a lot of applications written to that
standard.  But I am not forced to use it.  How  much freedom do you
have?
RMS stores the DATA. How it stores the data is normally something you
don't care about. Standard RMS does not confuse your data with record
delimiters like <cr>, <lf>, or which other delimiter you want to use.
That is the point, not which freedom I have in the choice of delimiter.
Post by Dirk Munk
Post by Dirk Munk
Post by Bill Gunshannon
Post by Dirk Munk
Suppose you have a VMS file with fixed record size. That file
has no record separators whatsoever; it is one long stream of
data. VMS can calculate where the records start and end in the
file. Suppose it consists of sets of three records of 100
bytes that belong together. Then you can change the attributes
of that file to records of 300 bytes, and in one read operation
you will have all the data that belongs together. I've actually
used this in the past.
And that would be an application concept, not really an OS thing.
Actually not, since this can only be done because of the way RMS
stores data, and RMS is part of the OS.
See, there is where we differ in opinion.  I see RMS as an
application that just happens to ship with VMS.  Like editors,
compilers and other pieces that ship with the OS but are
not really part of it.  Surely VMS will run without RMS present.
Not all applications need to access files at all.
No, RMS is more like middleware. How do you think that VMS could
read and write its own files if RMS is not present?
Does a call to printf/scanf in a C program use RMS?
(Really, can someone answer that question?)
Post by Dirk Munk
Post by Dirk Munk
Post by Bill Gunshannon
Post by Dirk Munk
Suppose you want to print such a file, then VMS will send a <cr>
and a <lf> to the printer after each record. Simple.
VMS won't.  Whatever application actually prints it will.
Obviously, this is functionality of the spooler, and that is part of VMS.
Post by Bill Gunshannon
Post by Dirk Munk
The DEC software engineers understood very well why it is a bad
idea to mix up contents of a file with the structure of a file,
and that's why they did not use stream files as standard RMS
files in applications. They are just there for compatibility with
Unix, Windows etc.
And Unix made all files streams of bytes and lets the applications
decide what to do with them.  Not really an OS problem.
Exchanging data between applications is rather important. Those
applications can be written in many languages, can come from
different sources. It is obvious that well structured files are
paramount for exchanging data between applications. That is why
something like RMS is in fact a very modern approach to structured
software engineering, instead of producing an unstructured
diarrhea of bytes, and calling it a file.
Some see it otherwise.  Unix tends to leave more control for the
developer and not try and handcuff them with someone else's concept
of how things should be done.
If you must, you can do that with VMS as well. However, in 99.9% of
all applications, RMS with all of its functionality will give you
exactly what you need. The point is, Unix doesn't have something like
that. With VMS you have the choice, with Unix, you don't.
Really?  then why did Phillip have the problem that started this whole
discussion?  VMS did not give him what he needed.  I butchered a file
he brought over from somewhere on the web.  Of course, Unix will do
that too, but Unix never told you it wouldn't.  :-)
Phillip had the problem that those silly record delimiters you need in
stream files turned up in his data. By using some RMS file
manipulations they could be removed.
It probably comes as a surprise to you, but a VMS Web Server would
have sent the data with exactly the same <CR><LF> characters embedded
in the text. The problem was most likely the receiving client, on VMS,
did not properly handle converting the incoming NETASCII data into
VMS data.
bill
Stephen Hoffman
2021-01-01 18:21:58 UTC
Permalink
Post by Dirk Munk
No, Unix does not offer indexed sequential files etc. as standardized
part of the operating system. They can be added as part of a compiler,
like Cobol, but then those files can only be used by applications
written with that compiler, unless the same filesystem is used by
another compiler.
Donno. The Unix distro I use includes database support. And most apps
will use that or will use another readily-available database, typically
one supported by the language(s) involved. And given typical COBOL app
packaging, should the COBOL app be using something not integrated with
the system, it's almost certainly been loaded using the local package
manager, and it's then available for other apps. Locally, poking inside
bundled apps is a little more work, but most known databases can be very
quickly identified using file magic; with one command, if the database
used is not already apparent from the database filename used.
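The file-magic check mentioned here can be sketched in a few lines of Python. The SQLite header string is a documented magic value; the `identify` helper is a hypothetical name, and a real magic database (as used by file(1)) carries thousands of entries:

```python
# Sketch of magic-byte identification, the same idea file(1) uses.

MAGIC = {
    b"SQLite format 3\x00": "SQLite 3 database",  # documented 16-byte header
}

def identify(prefix: bytes) -> str:
    """Match the start of a file against known magic values."""
    for magic, name in MAGIC.items():
        if prefix.startswith(magic):
            return name
    return "unknown"

assert identify(b"SQLite format 3\x00" + b"\x00" * 84) == "SQLite 3 database"
assert identify(b"just some text") == "unknown"
```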

But again, directly accessing the database underneath some unrelated
app is not something most folks are even doing, outside of reverse
engineering that particular app.

And reverse-engineering some unrelated app's RMS indexed file is Less
Than Fun. RMS might be consistent, but reverse-engineering RMS database
record formats is not.

As I'm presently working on something similar, reverse-engineering one
of the more common sorts of macOS databases—on OpenVMS—is easier than
reversing native RMS file formats on OpenVMS, too.

So... RMS database integration was a wonderful thing in the 1980s and
1990s, integrated databases are nice, but it's less of a differentiator
in this era given the prevalence of databases.

For those few that do need to reverse-engineer a data store, I'd
certainly prolly prefer a data store other than RMS due to the
difficulties inherent in RMS database record-level reversing. And if I
do have the record definitions from the COBOL or other code, accessing
that data store in any of the common databases is little different from
accessing RMS databases.

And yes, I would prefer to see SQLite integrated into OpenVMS, or
MariaDB, or—with SSIO resolved—PostgreSQL. But installing some database
onto a Unix system is not an intractable hurdle. It's usually one or
two package manager commands, if not already loaded.
--
Pure Personal Opinion | HoffmanLabs LLC
Arne Vajhøj
2021-01-01 20:35:43 UTC
Permalink
Post by Dirk Munk
Post by Bill Gunshannon
Post by Dirk Munk
Post by Dirk Munk
like indexed files. What will that application do with those files?
RMS will tell you the structure of the file, you don't have to guess it.
I use GnuCOBOL.  Sequential files show up as "ASCII text", as does the
COBOL Source File.  Indexed files report as "Berkeley DB" as that was the
option I chose for indexed files when I built GnuCOBOL.  Other COBOL
compiler (like MicroFocus) may differ. Of course, the executable shows
up as "ELF 64-bit LSB shared object".  If I wanted to put in the
effort I could probably get it to identify the source as COBOL source
but I  see no reason to bother.
Cobol source files are always text files of course. But again, with
VMS all the file types are offered by RMS, and can be used by any
language or even DCL.
And, as I stated above the exact same is true of Unix and OS-9 and
RT-11 and Windows and probably every other OS.
No, Unix does not offer indexed sequential files etc. as standardized
part of the operating system. They can be added as part of a compiler,
like Cobol, but then those files can only be used by applications
written with that compiler, unless the same filesystem is used by
another compiler.
File system does not matter - from the file system perspective,
ISAM files are also just a stream of bytes on *nix.

But yes different compilers use different ISAM libraries. So
to access files using a different compiler, either that
compiler must use the same library *or* the program must
use the library API explicitly.
Post by Dirk Munk
It seems that Windows does have more file types as part of the OS, but
that feature is very well hidden.
What are you referring to?

Arne
Dirk Munk
2021-01-01 20:52:59 UTC
Permalink
Post by Arne Vajhøj
Post by Dirk Munk
Post by Bill Gunshannon
Post by Dirk Munk
Post by Dirk Munk
like indexed files. What will that application do with those
files? RMS will tell you the structure of the file, you don't have
to guess it.
I use GnuCOBOL.  Sequential files show up as "ASCII text", as does the
COBOL Source File.  Indexed files report as "Berkeley DB" as that was the
option I chose for indexed files when I built GnuCOBOL.  Other COBOL
compiler (like MicroFocus) may differ. Of course, the executable shows
up as "ELF 64-bit LSB shared object".  If I wanted to put in the
effort I could probably get it to identify the source as COBOL source
but I  see no reason to bother.
Cobol source files are always text files of course. But again, with
VMS all the file types are offered by RMS, and can be used by any
language or even DCL.
And, as I stated above the exact same is true of Unix and OS-9 and
RT-11 and Windows and probably every other OS.
No, Unix does not offer indexed sequential files etc. as standardized
part of the operating system. They can be added as part of a compiler,
like Cobol, but then those files can only be used by applications
written with that compiler, unless the same filesystem is used by
another compiler.
File system does not matter - from the file system perspective,
ISAM files are also just a stream of bytes on *nix.
But yes different compilers use different ISAM libraries. So
to access files using a different compiler, either that
compiler must use the same library *or* the program must
use the library API explicitly.
That is what I mean. You can only use those files by using the ISAM
library, and the ISAM library is not a part of Unix. It is running on
Unix, but not part of it, and you have the choice of several ISAM libraries.

With VMS the standard ISAM library is RMS.
Post by Arne Vajhøj
Post by Dirk Munk
It seems that Windows does have more file types as part of the OS, but
that feature is very well hidden.
What are you referring to?
It seems that Windows has a standard ISAM library, and the Windows
operating system is even using it.
Arne Vajhøj
2021-01-01 21:00:32 UTC
Permalink
Post by Dirk Munk
Post by Arne Vajhøj
Post by Dirk Munk
It seems that Windows does have more file types as part of the OS,
but that feature is very well hidden.
What are you referring to?
It seems that Windows has a standard ISAM library, and the Windows
operating system is even using it.
Are you thinking about:

https://en.wikipedia.org/wiki/Extensible_Storage_Engine

?

It does ship with Windows, but it is not used much outside
MS stuff.

Arne
Dirk Munk
2021-01-01 22:07:46 UTC
Permalink
Post by Arne Vajhøj
Post by Dirk Munk
Post by Arne Vajhøj
Post by Dirk Munk
It seems that Windows does have more file types as part of the OS,
but that feature is very well hidden.
What are you referring to?
It seems that Windows has a standard ISAM library, and the Windows
operating system is even using it.
https://en.wikipedia.org/wiki/Extensible_Storage_Engine
?
It does ship with Windows, but it is not used much outside
MS stuff.
True, that's why I said it is a bit hidden. I assume that if it were
better known, other applications could and would use it as well.
Arne Vajhøj
2021-01-03 01:46:09 UTC
Permalink
Post by Dirk Munk
Post by Arne Vajhøj
Post by Dirk Munk
Post by Arne Vajhøj
Post by Dirk Munk
It seems that Windows does have more file types as part of the OS,
but that feature is very well hidden.
What are you referring to?
It seems that Windows has a standard ISAM library, and the Windows
operating system is even using it.
https://en.wikipedia.org/wiki/Extensible_Storage_Engine
?
It does ship with Windows, but it is not used much outside
MS stuff.
True, that's why I said it is a bit hidden. I assume that if it were
better known, other applications could and would use it as well.
I suspect that the main reason for it not being used so much
is that most applications go with a more advanced database:
relational, document store, etc.

Arne
Dave Froble
2021-01-01 16:39:02 UTC
Permalink
Post by Dirk Munk
Yes indeed. And all those files created by those Cobol programs can only
be used by other Cobol programs created by that compiler.
I've got to challenge this statement. The files are just data, and many
compilers can produce code that could use the data. Might require some
special coding, but, it's possible.
--
David Froble Tel: 724-529-0450
Dave Froble Enterprises, Inc. E-Mail: ***@tsoft-inc.com
DFE Ultralights, Inc.
170 Grimplin Road
Vanderbilt, PA 15486
Dirk Munk
2021-01-01 16:55:46 UTC
Permalink
Post by Dirk Munk
Yes indeed. And all those files created by those Cobol programs can only
be used by other Cobol programs created by that compiler.
I've got to challenge this statement.  The files are just data, and many
compilers can produce code that could use the data.  Might require some
special coding, but, it's possible.
No, the files do not just contain data. They contain indexes, pointers
etc., and you have to be able to use them.
Chris Townley
2021-01-01 17:20:37 UTC
Permalink
Post by Dirk Munk
Post by Dirk Munk
Yes indeed. And all those files created by those Cobol programs can only
be used by other Cobol programs created by that compiler.
I've got to challenge this statement.  The files are just data, and
many compilers can produce code that could use the data.  Might
require some special coding, but, it's possible.
No, the files do not just contain data. They contain indexes, pointers
etc., and you have to be able to use them.
I managed it. We had a VMS application written (externally) in COBOL
with massive data files. Every month I exported to non-packed files, and
stripped out the data I needed for reporting using DCL

Worked for me!

Chris
Dirk Munk
2021-01-01 17:30:48 UTC
Permalink
Post by Chris Townley
Post by Dirk Munk
Post by Dirk Munk
Yes indeed. And all those files created by those Cobol programs can only
be used by other Cobol programs created by that compiler.
I've got to challenge this statement.  The files are just data, and
many compilers can produce code that could use the data.  Might
require some special coding, but, it's possible.
No, the files do not just contain data. They contain indexes, pointers
etc., and you have to be able to use them.
I managed it. We had a VMS application written (externally) in COBOL
with massive data files. Every month I exported to non-packed files, and
stripped out the data I needed for reporting using DCL
Worked for me!
Chris
Yes, of course that works on VMS ! You prove my point.

But it does not work like this on Unix. My remark was about Cobol files
on Unix.
Arne Vajhøj
2021-01-01 20:28:33 UTC
Permalink
Post by Dirk Munk
Yes indeed. And all those files created by those Cobol programs can only
be used by other Cobol programs created by that compiler.
I've got to challenge this statement.  The files are just data, and many
compilers can produce code that could use the data.  Might require some
special coding, but, it's possible.
Of course data can be read. And with the correct library then data
can be interpreted correctly.

But there is a difference between VMS and non-VMS.

In the VMS world it is:

Cobol program----Cobol RTL----|
Pascal program---Pascal RTL---|--RMS--RMS index sequential file format
Fortran program--Fortran RTL--|

In the non-VMS world it is:

Cobol vendor A program--Cobol vendor A RTL--vendor A ISAM file format
Cobol vendor B program--Cobol vendor B RTL--vendor B ISAM file format
Cobol vendor C program--Cobol vendor C RTL--vendor C ISAM file format

where A, B and C ISAM file formats are different.

Heck - GNU Cobol even comes in two flavors with two different ISAM file
formats.

Arne
Bill Gunshannon
2021-01-01 20:32:50 UTC
Permalink
Post by Arne Vajhøj
Post by Dirk Munk
Yes indeed. And all those files created by those Cobol programs can only
be used by other Cobol programs created by that compiler.
I've got to challenge this statement.  The files are just data, and
many compilers can produce code that could use the data.  Might
require some special coding, but, it's possible.
Of course data can be read. And with the correct library then data
can be interpreted correctly.
But there is a difference between VMS and non-VMS.
Cobol program----Cobol RTL----|
Pascal program---Pascal RTL---|--RMS--RMS index sequential file format
Fortran program--Fortran RTL--|
Cobol vendor A program--Cobol vendor A RTL--vendor A ISAM file format
Cobol vendor B program--Cobol vendor B RTL--vendor B ISAM file format
Cobol vendor C program--Cobol vendor C RTL--vendor C ISAM file format
where A, B and C ISAM file formats are different.
Heck - GNU Cobol even comes in two flavors with two different ISAM file
formats.
Yes, but they are well documented and use underlying systems (like
Berkeley DB) that work just fine with other languages and applications.

bill
Dirk Munk
2021-01-01 21:56:34 UTC
Permalink
Post by Bill Gunshannon
Post by Arne Vajhøj
Post by Dirk Munk
Yes indeed. And all those files created by those Cobol programs can only
be used by other Cobol programs created by that compiler.
I've got to challenge this statement.  The files are just data, and
many compilers can produce code that could use the data.  Might
require some special coding, but, it's possible.
Of course data can be read. And with the correct library then data
can be interpreted correctly.
But there is a difference between VMS and non-VMS.
Cobol program----Cobol RTL----|
Pascal program---Pascal RTL---|--RMS--RMS index sequential file format
Fortran program--Fortran RTL--|
Cobol vendor A program--Cobol vendor A RTL--vendor A ISAM file format
Cobol vendor B program--Cobol vendor B RTL--vendor B ISAM file format
Cobol vendor C program--Cobol vendor C RTL--vendor C ISAM file format
where A, B and C ISAM file formats are different.
Heck - GNU Cobol even comes in two flavors with two different ISAM
file formats.
Yes, but they are well documented and use underlying systems (like
Berkeley DB) that work just fine with other languages and applications.
That's not the point. VMS has a standard ISAM library (RMS), and all
languages on VMS and even DCL use it. That is a lot easier than having
to figure out which compiler can use which ISAM library, and if your
application supplier uses that library.
Arne Vajhøj
2021-01-03 01:58:00 UTC
Permalink
Post by Bill Gunshannon
Post by Arne Vajhøj
Post by Dirk Munk
Yes indeed. And all those files created by those Cobol programs can only
be used by other Cobol programs created by that compiler.
I've got to challenge this statement.  The files are just data, and
many compilers can produce code that could use the data.  Might
require some special coding, but, it's possible.
Of course data can be read. And with the correct library then data
can be interpreted correctly.
But there is a difference between VMS and non-VMS.
Cobol program----Cobol RTL----|
Pascal program---Pascal RTL---|--RMS--RMS index sequential file format
Fortran program--Fortran RTL--|
Cobol vendor A program--Cobol vendor A RTL--vendor A ISAM file format
Cobol vendor B program--Cobol vendor B RTL--vendor B ISAM file format
Cobol vendor C program--Cobol vendor C RTL--vendor C ISAM file format
where A, B and C ISAM file formats are different.
Heck - GNU Cobol even comes in two flavors with two different ISAM
file formats.
Yes, but they are well documented and use underlying systems (like
Berkeley DB) that work just fine with other languages and applications.
True.

BDB may be a bit more well-documented than VBISAM. But on the
other hand VBISAM's LGPL may be a lot more attractive than
BDB's AGPL.

Arne
Dave Froble
2021-01-01 23:57:17 UTC
Permalink
Post by Arne Vajhøj
Post by Dave Froble
Post by Dirk Munk
Yes indeed. And all those files created by those Cobol programs can only
be used by other Cobol programs created by that compiler.
I've got to challenge this statement. The files are just data, and
many compilers can produce code that could use the data. Might
require some special coding, but, it's possible.
Of course data can be read. And with the correct library then data
can be interpreted correctly.
But there is a difference between VMS and non-VMS.
Cobol program----Cobol RTL----|
Pascal program---Pascal RTL---|--RMS--RMS index sequential file format
Fortran program--Fortran RTL--|
Cobol vendor A program--Cobol vendor A RTL--vendor A ISAM file format
Cobol vendor B program--Cobol vendor B RTL--vendor B ISAM file format
Cobol vendor C program--Cobol vendor C RTL--vendor C ISAM file format
where A, B and C ISAM file formats are different.
Heck - GNU Cobol even comes in two flavors with two different ISAM file
formats.
Arne
Doesn't anybody here write their own file handling code ???
--
David Froble Tel: 724-529-0450
Dave Froble Enterprises, Inc. E-Mail: ***@tsoft-inc.com
DFE Ultralights, Inc.
170 Grimplin Road
Vanderbilt, PA 15486
Arne Vajhøj
2021-01-02 03:22:04 UTC
Permalink
Post by Dave Froble
Post by Arne Vajhøj
Post by Dirk Munk
Yes indeed. And all those files created by those Cobol programs can only
be used by other Cobol programs created by that compiler.
I've got to challenge this statement.  The files are just data, and
many compilers can produce code that could use the data.  Might
require some special coding, but, it's possible.
Of course data can be read. And with the correct library then data
can be interpreted correctly.
But there is a difference between VMS and non-VMS.
Cobol program----Cobol RTL----|
Pascal program---Pascal RTL---|--RMS--RMS index sequential file format
Fortran program--Fortran RTL--|
Cobol vendor A program--Cobol vendor A RTL--vendor A ISAM file format
Cobol vendor B program--Cobol vendor B RTL--vendor B ISAM file format
Cobol vendor C program--Cobol vendor C RTL--vendor C ISAM file format
where A, B and C ISAM file formats are different.
Heck - GNU Cobol even comes in two flavors with two different ISAM file
formats.
Doesn't anybody here write their own file handling code ???
I dabbled in such stuff (B trees) 20-30 years ago.

But today I would say that this stuff is for database engine
developers and that developers pick and use something.

Arne
Simon Clubley
2021-01-02 14:58:03 UTC
Permalink
Post by Dave Froble
Doesn't anybody here write their own file handling code ???
Unless you are writing directly to the controller registers, then
all you are doing is writing a custom abstraction layer on top
of an existing API.

Simon.
--
Simon Clubley, ***@remove_me.eisner.decus.org-Earth.UFP
Walking destinations on a map are further away than they appear.
Stephen Hoffman
2021-01-02 16:13:39 UTC
Permalink
Post by Dave Froble
Doesn't anybody here write their own file handling code ???
I've written file system and volume system code, and have most recently
loaded the whole file to be parsed into memory and parsed it there as
RMS was just getting in the way, but generally no.
We've been moving up the stack from machine code to assembler, and from
assembler to 3GLs, and from RMS to databases, etc., since this whole IT
road-show got started.
Yes, some few of us need to deal with the arcana of x86-64, AArch64,
RISC-V, EPIC, or Alpha, but the vast majority of us don't need to
design and write and support that code.
--
Pure Personal Opinion | HoffmanLabs LLC
Arne Vajhøj
2021-01-03 02:04:45 UTC
Permalink
Post by Bill Gunshannon
Post by Dirk Munk
Post by Dirk Munk
Post by Bill Gunshannon
Post by Dirk Munk
Suppose you have a VMS file with fixed record size. That file has
no record separators whatsoever; it is one long stream of data.
VMS can calculate where the records start and end in the file.
Suppose it consists of sets of three records of 100 bytes that
belong together. Then you can change the attributes of that file
to records of 300 bytes, and in one read operation you will have
all the data that belongs together. I've actually used this in the
past.
And that would be an application concept, not really an OS thing.
Actually not, since this can only be done because of the way RMS
stores data, and RMS is part of the OS.
See, there is where we differ in opinion.  I see RMS as an
application that just happens to ship with VMS.  Like editors,
compilers and other pieces that ship with the OS but are
not really part of it.  Surely VMS will run without RMS present.
Not all applications need to access files at all.
No, RMS is more like middleware. How do you think that VMS could read
and write its own files if RMS is not present?
Does a call to printf/scanf in a C program use RMS?
(Really, can someone answer that question?)
You will need someone with access to source to confirm.

But:
* the traditional story is that all language RTL IO goes
through RMS - including C RTL (even though that one
supposedly is a bit more complicated than other
languages)
* as a rule of thumb: if the IO works both with a terminal
and a file then it is likely going through RMS - printf and
scanf do

(it is of course possible to test the device type and issue
different SYS$QIO calls accordingly - but that seems
like reinventing the wheel/RMS to me)

Arne
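Arne's parenthetical above - test the device type, then issue different low-level calls - is easy to sketch portably. A minimal Python illustration of that DIY dispatch (the labels and function name are invented for illustration; this is not any VMS API):

```python
import sys

def pick_io_path(stream):
    """Crude device-type dispatch: the test a DIY replacement for the
    RTL/RMS 'works on terminals and files alike' behavior would have
    to repeat everywhere.  Returns a label for the low-level path."""
    try:
        if stream.isatty():
            return "terminal-io"   # e.g. a terminal-class QIO on VMS
    except (AttributeError, ValueError):
        pass
    return "file-io"               # e.g. block/virtual-block IO on a file

# Usage: printf/scanf-style routines get this dispatch for free from
# the RTL; doing it yourself means every utility repeats this test.
if __name__ == "__main__":
    print(pick_io_path(sys.stdout))
```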
Craig A. Berry
2021-01-03 04:01:08 UTC
Permalink
Post by Arne Vajhøj
Post by Bill Gunshannon
Does a call to printf/scanf in a C program use RMS?
(Really, can someone answer that question?)
You will need someone with access to source to confirm.
* the traditional story is that all language RTL IO goes
  through RMS - including C RTL (even though that one
  supposedly is a bit more complicated than other
  languages)
* as a rule of thumb: if the IO works both with a terminal
  and a file then it is likely going through RMS - printf and
  scanf do
I believe calls such as read() and write() may be implemented via $QIO
if the filehandle is a socket and perhaps other things that are not a
regular file or terminal. But otherwise your inference concurs with
mine. fwrite() is infamously (mis)implemented such that it introduces a
record boundary for each character in the string passed to it, at least
for the idiomatic case where the size is passed as 1 and the number of
items is the number of bytes in the string.
Hein RMS van den Heuvel
2021-01-03 20:13:40 UTC
Permalink
Post by Arne Vajhøj
* the traditional story is that all language RTL IO goes
through RMS - including C RTL (even though that one
supposedly is a bit more complicated than other
languages)
Last century, yes.
Today for the simple sequential file COBRTL, CRTL and SORT switched away from record IO to roll their own with BlockIO ($READ, $WRITE).
I don't recall them doing direct $QIO calls. BlockIO is a tiny layer over QIO adding error handling mostly.
Record IO is way too expensive for high volume IO - going in and out of exec mode 'all the time', probing the argument blocks and buffers and so on.
Block IO incurs this overhead typically much less frequently to the point where it does not really matter ( 10 ... 50 times less is my guess).

It's all easy enough to verify for your exact situation with a tiny program doing the IO calls you would like to understand.
Code in a pause after a few IOs and activate ANALYZE /SYSTEM ... SET PROC ... SHOW PROC/RMS=(RAB,FAB,BDBSUM[,IFAB,IRAB,FSB])

Hein
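Hein's overhead argument is essentially arithmetic: record IO pays the expensive boundary crossing once per record, block IO once per filled block. A rough Python sketch (the record and block sizes below are made-up examples; the real ratio depends on MBC and record size):

```python
def io_transitions(n_records, record_len, block_len):
    """Count expensive boundary crossings (exec-mode calls, argument
    probing, ...) for record-at-a-time IO vs block IO, ignoring
    partial-record and bookkeeping details."""
    if record_len > block_len:
        raise ValueError("record larger than block buffer")
    per_record = n_records                       # one call per record
    records_per_block = block_len // record_len  # whole records per block
    per_block = -(-n_records // records_per_block)  # ceiling division
    return per_record, per_block

# Example: 100 000 records of 100 bytes against a 127-block (65 024
# byte) buffer -- the per-call overhead is paid a few hundred times
# less often, well past the point where it stops mattering.
if __name__ == "__main__":
    rec, blk = io_transitions(100_000, 100, 127 * 512)
    print(rec, blk)
```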
Arne Vajhøj
2021-01-03 20:44:47 UTC
Permalink
Post by Hein RMS van den Heuvel
Post by Arne Vajhøj
* the traditional story is that all language RTL IO goes
through RMS - including C RTL (even though that one
supposedly is a bit more complicated than other
languages)
Last century, yes.
Today for the simple sequential file COBRTL, CRTL and SORT switched away from record IO to roll their own with BlockIO ($READ, $WRITE).
I don't recall them doing direct $QIO calls. BlockIO is a tiny layer over QIO adding error handling mostly.
Record IO is way too expensive for high volume IO - going in and out of exec mode 'all the time', probing the argument blocks and buffers and so on.
Block IO incurs this overhead typically much less frequently to the point where it does not really matter ( 10 ... 50 times less is my guess).
Block IO is still RMS.

But a lot more DIY.

Besides the error handling, doesn't $WRITE also handle stuff like
extending the file? ($QIOW WRITEVBLK does not).

Arne
Dirk Munk
2021-01-03 23:42:40 UTC
Permalink
Post by Hein RMS van den Heuvel
Post by Arne Vajhøj
* the traditional story is that all language RTL IO goes
through RMS - including C RTL (even though that one
supposedly is a bit more complicated than other
languages)
Last century, yes.
Today for the simple sequential file COBRTL, CRTL and SORT switched away from record IO to roll their own with BlockIO ($READ, $WRITE).
I don't recall them doing direct $QIO calls. BlockIO is a tiny layer over QIO adding error handling mostly.
Record IO is way too expensive for high volume IO - going in and out of exec mode 'all the time', probing the argument blocks and buffers and so on.
Block IO incurs this overhead typically much less frequently to the point where it does not really matter ( 10 ... 50 times less is my guess).
It's all easy enough to verify for your exact situation with a tiny program doing the IO calls you would like to understand.
Code in a pause after a few IOs and activate ANALYZE /SYSTEM ... SET PROC ... SHOW PROC/RMS=(RAB,FAB,BDBSUM[,IFAB,IRAB,FSB])
Hein
That switch must have been a long time ago for COBRTL. As long as I can
remember COBOL on RSX and VMS had the "reserve nn areas" clause in the
File control paragraph of the Input-output section of the Environment
division to speed up file access, and that made IO much faster. As far
as I know that clause reserved nn blocks for that file.
Hein RMS van den Heuvel
2021-01-04 17:34:38 UTC
Permalink
Post by Dirk Munk
Post by Hein RMS van den Heuvel
Post by Arne Vajhøj
* the traditional story is that all language RTL IO goes
through RMS - including C RTL (even though that one
supposedly is a bit more complicated than other
languages)
Last century, yes.
Today for the simple sequential file COBRTL, CRTL and SORT switched away from record IO to roll their own with BlockIO ($READ, $WRITE).
I don't recall them doing direct $QIO calls. BlockIO is a tiny layer over QIO adding error handling mostly.
Record IO is way too expensive for high volume IO - going in and out of exec mode 'all the time', probing the argument blocks and buffers and so on.
Block IO incurs this overhead typically much less frequently to the point where it does not really matter ( 10 ... 50 times less is my guess).
It's all easy enough to verify for your exact situation with a tiny program doing the IO calls you would like to understand.
Code in a pause after a few IOs and activate ANALYZE /SYSTEM ... SET PROC ... SHOW PROC/RMS=(RAB,FAB,BDBSUM[,IFAB,IRAB,FSB])
Hein
That switch must have been a long time ago for COBRTL. As long as I can
remember COBOL on RSX and VMS had the "reserve nn areas" clause in the
File control paragraph of the Input-output section of the Environment
division to speed up file access, and that made IO much faster. As far
as I know that clause reserved nn blocks for that file.
Oh boy, we sure know how to divert and drag on a discussion huh. :-)

BLOCK CONTAINS ---> MBC --> Multi Block Count
RESERVE AREAS --> MBF --> Multi Buffer Count.
Both are specific to RMS Record IO, not directly applicable for BlockIO.
If not specified defaults as per SHOW RMS are used.
Applications can (and some do) of course use GETJPI/GETSYI to also listen to SET RMS requests.

Actually I just ran a few quick tests on EISNER (VSI COBOL V3.1-0007 on OpenVMS Alpha V8.4-2L2)
I cannot get Cobol to use BlockIO for Read, only for WRITE.


SDA> show proc/rms=(rab,fab,bdbsum)

Read:
FAC: 42 GET,BRO
ROP: 00101600 RAH,WBH,CDK,NLK
BDBSUM:
Address USERS SIZE NUMB VBN
7B077600 0 00007600 00007600 00000001 ! Used: BLOCK CONTAINS 30000 CHARACTERS,
7B079D70 0 00007600 00007600 0000003C ! Read Ahead already filled this after reading the first record
:
As many buffers as MBF (RESERVE X AREAS) requested or process/system default if MBF=0.
Note: if you test this on a small file, the buffer size is maxed out by file size!

Write:
FAC: 61 PUT,BIO,BRO
ROP: 00001E00 RAH,WBH,BIO,CDK
RFA: 00000000,0000
BDBSUM:
Address USERS SIZE
7B079D70 0 00000000 ! Only 1 faked out buffer - no bytes - RESERVE x AREAS not visible here.

Cheers,
Hein

Arne Vajhøj
2020-12-31 21:29:00 UTC
Permalink
Post by Dirk Munk
Post by Bill Gunshannon
Post by Dirk Munk
Suppose you have a VMS file with fixed record size. That file has no
record separators whatsoever; it is one long stream of data. VMS
can calculate where the records start and end in the file. Suppose
it consists of sets of three records of 100 bytes that belong
together. Then you can change the attributes of that file to records
of 300 bytes, and in one read operation you will have all the data
that belongs together. I've actually used this in the past.
And that would be an application concept, not really an OS thing.
Actually not, since this can only be done because of the way RMS
stores data, and RMS is part of the OS.
See, there is where we differ in opinion.  I see RMS as an
application that just happens to ship with VMS.  Like editors,
compilers and other pieces that ship with the OS but are
not really part of it.  Surely VMS will run without RMS present.
Not all applications need to access files at all.
I would say RMS is part of VMS.

It comes with VMS. It gets loaded with VMS. It runs in a privileged mode.
There has never been a VMS without it. There is no documented way of
replacing it. VMS relies on it for many features like reading SYSUAF.
The metadata used by RMS is in the ODS formats, and it is also
available via system services (SYS$QIO and the FAT block).

VMS without RMS would not be VMS.

Arne
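The fixed-record trick Dirk describes in the quoted text above (re-declaring three 100-byte records as one 300-byte record) works because fixed-length records need no in-band separators: record boundaries are pure arithmetic over one long stream of bytes. A Python sketch of that arithmetic, with the sizes from Dirk's example (this only illustrates the framing math, not the actual SET FILE/ATTR mechanics):

```python
def regroup(data: bytes, old_len: int, factor: int):
    """Reinterpret a separator-free stream of fixed-length records
    (old_len bytes each) as records of old_len * factor bytes --
    the data itself is never touched, only the framing changes."""
    new_len = old_len * factor
    if len(data) % new_len:
        raise ValueError("stream is not a whole number of new records")
    return [data[i:i + new_len] for i in range(0, len(data), new_len)]

# Six 100-byte records read back as two 300-byte records, so each
# read returns one complete set of three related records.
if __name__ == "__main__":
    stream = b"".join(bytes([i]) * 100 for i in range(6))
    groups = regroup(stream, 100, 3)
    print(len(groups), len(groups[0]))
```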
Stephen Hoffman
2020-12-31 22:29:53 UTC
Permalink
Post by Dirk Munk
Post by Bill Gunshannon
Post by Dirk Munk
Post by Dirk Munk
No, of course Unix is not immune. Using <lf> or <cr> (Windows) as
record terminator is a rather silly idea. It means that you can't use
those characters in a record, and you have to scan the contents of a
file for those characters. Simply writing the length of a record at the
beginning of that record is a far better solution.
Having a <LF> or a <CR> in text files seems rather logical to me. What
else, if you want either a line feed or a carriage return?
But yes, there are other ways to specify and delimit a "line of
text", if you have a system supporting that.
Now, if that "record" is something else than a "line of text"...
The problem is that in Unix and Windows land there is no difference
between the metadata of a file, and the actual contents of a file. The
metadata should define the file and the records in the file, that
should be completely separate from the actual data contents of the file.
Can't speak for Windows, but Unix has no meta-data. Unix has only one
file type, a stream of bytes.  Everything else is application layer.
Which means you don't have a clue about the contents of a file, until
you know the internals of the application. Standard VMS applications
produce structured files, so you only have to worry about the data
contents. It is possible to write your own applications using the files
of another application. The application can be in any language, because
RMS is the layer between the application and the file. This is a
structured approach, instead of producing a diarrhea of bytes, and
calling it a file.
Most of us use file magic to see what sort of app file we're looking
at. And those same file magic tools common on Unix are also pretty good
at identifying common OpenVMS-format files, too.

RMS is a database that emulates punched cards and provides the related
sorting and arrays from that era, and it still works pretty well where
punched cards and related access is a workable data store for an app.

RMS is problematic when it comes to updates, modifications, or pretty
much anything past punched cards. And RMS knows nothing about the data
and data format and encoding used in the record.

Nobody's suggesting removing RMS. But a database that tops out with
key-value access and DEC MCS support is not going to be viewed as a
product differentiator.
Post by Dirk Munk
Post by Bill Gunshannon
Post by Dirk Munk
Suppose I have a file with binary data, and one byte has the binary
(ascii) value of <lf>, then Unix will use it as a record separator,
even if it is in the middle of the actual data of that record.
As stated below, Unix doesn't work that way any more than PRINT
SYS$SYSTEM:AUTHORIZE.EXE "works".
Post by Dirk Munk
Post by Bill Gunshannon
Unix has no records. If you cat the file it will line break at the
<lf>. If you od -c the file it will identify the <lf> as just that.
Wonderful. However, it is clear that in many applications the notion of
a data record is present, and that the <lf> is used as record
separator, even if Unix formally doesn't have records.
I'm finding myself poking around in the databases of other apps rather
less often. OpenVMS or elsewhere.

Interestingly—when poking around in an app data store is
required—that's often gotten much easier on Unix, as the data stores
are increasingly using the local analog to RMS.

That local equivalent tends to be SQLite, in the environments I'm often
working in. And SQLite provides much better clues about "record"
formats and fields and field relationships, too.
Post by Dirk Munk
Post by Bill Gunshannon
Post by Dirk Munk
Suppose you have a VMS file with fixed record size. That file has no
records separators what so ever, it is one long stream of data. VMS can
calculate where the records start and end in the file. Suppose it
consists out of sets of three records of 100 bytes that belong
together. Then you can change the attributes of that file to records of
300 bytes, and in one read operation you will have all the data that
belongs together. I've actually used this in the past.
And that would be an application concept, not really an OS thing.
Actually not, since this can only be done because of the way RMS stores
data, and RMS is part of the OS.
Which is part and parcel of what some of us have been grumbling about
for years; that OpenVMS, well, stopped enhancing its data access in the
1980s, and given the efforts to embed the Rdb database support run-time
ended with the sale of Rdb to Oracle.

SQLite would be one option here for future integration alongside RMS,
and there are others.
Post by Dirk Munk
Post by Bill Gunshannon
Post by Dirk Munk
Suppose you want to print such a file, then VMS will send a <cr> and a
<lf> to the printer after each record. Simple.
VMS won't.  Whatever application actually prints it will.
Obviously, this is functionality of the spooler, and that is part of VMS.
OpenVMS printing really shouldn't be printing non-printable files—see
file magic above, see PRINT SYS$SYSTEM:AUTHORIZE.EXE above, etc.
Printing needs work. But here we are.
Post by Dirk Munk
Post by Bill Gunshannon
Post by Dirk Munk
The DEC software engineers understood very well why it is a bad idea to
mix up contents of a file with the structure of a file, and that's why
they did not use stream files as standard RMS files in applications.
They are just there for compatibly with Unix, Windows etc.
And Unix made all files streams of bytes and lets the applications
decide what to do with them.  Not really an OS problem.
Ayup.. And I've found the Unix approach works very well. OpenVMS folks
do the same wad-of-bytes thing here with the add-on databases, too;
with Oracle Rdb, SQLite, and other such.
Post by Dirk Munk
Exchanging data between applications is rather important. Those
applications can be written in many languages, can come from different
sources. It is obvious that well structured files are paramount for
exchanging data between applications. That is why something like RMS is
in fact a very modern approach to structured software engineering,
instead of producing an unstructured diarrhea of bytes, and calling
it a file.
Can't say I really want to document my internal data store as my
import-export API, but you do you. If it's SQLite, that access does
work decently well, as SQLite databases are themselves quite portable
including across endianness differences. But even on OpenVMS, poking
directly into an app's RMS database is close to reverse-engineering,
and not really an interface that most app developers want to support.
Poking directly into SYSUAF isn't recommended, even if SYSUAF currently
uses RMS; providing a supported and documented data import-export is
much more typical. That interchange format might be YAML or XML or any
number of other interfaces, and frameworks and tooling are available
for all of the common formats. Put differently, we've moved from
abstracting at the RMS or SQLite or other layer to a higher-level
abstraction or data import-export interface.

And RMS has other gaps here beyond its inability to import and export
its common files, too. RMS never got around to providing support for
consistent live backups, though various add-on databases do. RMS lacks
data definitions within records, too. CDD/Repository went to Oracle,
and that and SDL and other data definition tooling. RMS gets you the
record yes, but is less than useful with the app data within the
record, and the character encoding for the data, and related details.
And RMS itself—and BACKUP has similar issues—stinks at identifying and
repairing issues of metadata. Which was the reason for this thread.

Again, RMS was great in the last millennium. Its age is showing.
There's absolutely no reason to remove RMS, but RMS is comparatively
limited. But if RMS works for you, have at. Streams-of-bytes file
systems, and SQLite and other databases, and other tooling work well
for others. RMS is just not something I miss, when working on other
platforms.
--
Pure Personal Opinion | HoffmanLabs LLC
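Stephen's contrast is easy to make concrete: where an RMS file gives you records of opaque bytes, a SQLite file carries its own field definitions, which any client can query. A small Python sketch using the standard-library sqlite3 module (the table name and columns are invented for illustration):

```python
import os
import sqlite3

def describe_table(db_path, table):
    """Ask a SQLite database to describe its own 'records' -- the
    self-describing field metadata an RMS file does not carry."""
    con = sqlite3.connect(db_path)
    try:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt, pk)
        return [(name, ctype) for _, name, ctype, *_ in
                con.execute(f"PRAGMA table_info({table})")]
    finally:
        con.close()

if __name__ == "__main__":
    con = sqlite3.connect("parts.db")   # hypothetical example database
    con.execute("CREATE TABLE IF NOT EXISTS part "
                "(id INTEGER PRIMARY KEY, name TEXT, qty INTEGER)")
    con.commit()
    con.close()
    print(describe_table("parts.db", "part"))
    os.remove("parts.db")
```

The point is not the API but the property: the "record" format and field relationships travel with the file, across platforms and endianness.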
Dirk Munk
2021-01-01 14:09:36 UTC
Permalink
Post by Dirk Munk
Post by Bill Gunshannon
Post by Dirk Munk
Post by Jan-Erik Söderholm
Post by Dirk Munk
No, of course Unix is not immune. Using <lf> or <cr> (Windows) as
record terminator is a rather silly idea. It means that you can't
use those characters in a record, and you have to scan the
contents of a file for those characters. Simply writing the length
of a record at the beginning of that record is a far better solution.
Having a <LF> or a <CR> in text files seems rather logical to me.
What else, if you want either a line feed or a carriage return?
But yes, there are other ways to specify and delimit a "line of
text", if you have a system supporting that.
Now, if that "record" is something else than a "line of text"...
The problem is that in Unix and Windows land there is no difference
between the metadata of a file, and the actual contents of a file.
The metadata should define the file and the records in the file,
that should be completely separate from the actual data contents of
the file.
Can't speak for Windows, but Unix has no meta-data. Unix has only one
file type, a stream of bytes.  Everything else is application layer.
Which means you don't have a clue about the contents of a file, until
you know the internals of the application. Standard VMS applications
produce structured files, so you only have to worry about the data
contents. It is possible to write your own applications using the
files of another application. The application can be in any language,
because RMS is the layer between the application and the file. This is
a structured approach, instead of producing a diarrhea of bytes, and
calling it a file.
Most of us use file magic to see what sort of app file we're looking at.
And those same file magic tools common on Unix are also pretty good at
identifying common OpenVMS-format files, too.
RMS is a database that emulates punched cards and provides the related
sorting and arrays from that era, and it still works pretty well where
punched cards and related access is a workable data store for an app.
RMS offers all kinds of sequential file types, relative files, and
indexed sequential files. The latter may be called database files, but
sequential files surely not. You may compare them with punched cards,
and I would compare Unix files with a container of shreds that is left
over from punching cards.
RMS is problematic when it comes to updates, modifications, or pretty
much anything past punched cards.
Never had any problems with that; on the contrary, I wrote many
programs that updated, modified, and extended RMS files.
And RMS knows nothing about the data
and data format and encoding used in the record.
True, that could be improved.
Nobody's suggesting removing RMS. But a database that tops out with
key-value access and DEC MCS support is not going to be viewed as a
product differentiator.
Post by Dirk Munk
Post by Bill Gunshannon
Post by Dirk Munk
Suppose I have a file with binary data, and one byte has the binary
(ascii) value of <lf>, then Unix will use it as a record separator,
even if it is in the middle of the actual data of that record.
As stated below, Unix doesn't work that way any more than PRINT
SYS$SYSTEM:AUTHORIZE.EXE "works".
Unfortunately, many Unix utilities work that way if they expect ASCII files.
Post by Dirk Munk
Post by Bill Gunshannon
Unix has no records. If you cat the file it will line break at the
<lf>. If you od -c the file it will identify the <lf> as just that.
Wonderful. However, it is clear that in many applications the notion
of a data record is present, and that the <lf> is used as record
separator, even if Unix formally doesn't have records.
I'm finding myself poking around in the databases of other apps rather
less often. OpenVMS or elsewhere.
Interestingly—when poking around in an app data store is required—that's
often gotten much easier on Unix, as the data stores are increasingly
using the local analog to RMS.
That local equivalent tends to be SQLite, in the environments I'm often
working in. And SQLite provides much better clues about "record" formats
and fields and field relationships, too.
Sure, great stuff. But that is a database! I've often used indexed
sequential files in DCL, simple read and write commands, no need for
SQL. And in many cases indexed sequential files are fine (and fast) as
database files in applications as well.
Post by Dirk Munk
Post by Bill Gunshannon
Post by Dirk Munk
Suppose you have a VMS file with fixed record size. That file has no
records separators what so ever, it is one long stream of data. VMS
can calculate where the records start and end in the file. Suppose
it consists out of sets of three records of 100 bytes that belong
together. Then you can change the attributes of that file to records
of 300 bytes, and in one read operation you will have all the data
that belongs together. I've actually used this in the past.
And that would be an application concept, not really an OS thing.
Actually not, since this can only be done because of the way RMS
stores data, and RMS is part of the OS.
Which is part and parcel of what some of us have been grumbling about
for years; that OpenVMS, well, stopped enhancing its data access in the
1980s, and given the efforts to embed the Rdb database support run-time
ended with the sale of Rdb to Oracle.
Yes, that was a monumental stupidity. Leave it to managers to be that
stupid. Rdb should have been an integral part of VMS. Perhaps VSI can
buy it back from Oracle?

RMS should be updated as well. More functionality, like record
descriptions, larger record sizes, etc., would be great. Perhaps if we ask
Hein nicely, he can do it.
SQLite would be one option here for future integration alongside RMS,
and there are others.
I have no problem with that.
Post by Dirk Munk
Post by Bill Gunshannon
Post by Dirk Munk
Suppose you want to print such a file, then VMS will send a <cr> and
a <lf> to the printer after each record. Simple.
VMS won't.  Whatever application actually prints it will.
Obviously, this is functionality of the spooler, and that is part of VMS.
OpenVMS printing really shouldn't be printing non-printable files—see
file magic above, see PRINT SYS$SYSTEM:AUTHORIZE.EXE above, etc.
Printing needs work. But here we are.
These are not non-printable files. The contents are plain ASCII; the
spooler just adds a <cr> and a <lf> when a record is sent to the
printer. This has been standard since RSX!
Post by Dirk Munk
Post by Bill Gunshannon
Post by Dirk Munk
The DEC software engineers understood very well why it is a bad idea
to mix up contents of a file with the structure of a file, and
that's why they did not use stream files as standard RMS files in
applications. They are just there for compatibly with Unix, Windows etc.
And Unix made all files streams of bytes and lets the applications
decide what to do with them.  Not really an OS problem.
Ayup..  And I've found the Unix approach works very well. OpenVMS folks
do the same wad-of-bytes thing here with the add-on databases, too; with
Oracle Rdb, SQLite, and other such.
Database files also offer structured data. I would not expect databases
to be implemented on top of RMS; that would be a bit silly. Databases
like Rdb, SQLite, Oracle, DBMS etc. can be seen as an alternative to
RMS, with more and other functionality.
Post by Dirk Munk
Exchanging data between applications is rather important. Those
applications can be written in many languages, can come from different
sources. It is obvious that well structured files are paramount for
exchanging data between applications. That is why something like RMS
is in fact a very modern approach to structured software engineering,
instead of producing an unstructured diarrhea of bytes, and calling
it a file.
Can't say I really want to document my internal data store as my
import-export API, but you do you.  If it's SQLite, that access does
work decently well, as SQLite databases are themselves quite portable
including across endianness differences. But even on OpenVMS, poking
directly into an app's RMS database is close to reverse-engineering, and
not really an interface that most app developers want to support. Poking
directly into SYSUAF isn't recommended, even if SYSUAF currently uses
RMS; providing a supported and documented data import-export is much
more typical. That interchange format might be YAML or XML or any number
of other interfaces, and frameworks and tooling are available for all of
the common formats. Put differently, we've moved from abstracting at the
RMS or SQLite or other layer to a higher-level abstraction or data
import-export interface.
And RMS has other gaps here beyond its inability to import and export
its common files, too. RMS never got around to providing support for
consistent live backups, though various add-on databases do. RMS lacks
data definitions within records, too. CDD/Repository went to Oracle, and
that and SDL and other data definition tooling. RMS gets you the record
yes, but is less than useful with the app data within the record, and
the character encoding for the data, and related details. And RMS
itself—and BACKUP has similar issues—stinks at identifying and repairing
issues of metadata. Which was the reason for this thread.
The reason for this thread was that record delimiters (<cr> and <lf>)
ended up as data in records. With the solution I offered, they were
removed from the data. My point is that standard RMS sequential files
offer envelopes that contain the data; RMS does not have to inspect the
data to find record delimiters like <cr> and <lf>.
Again, RMS was great in the last millennium. Its age is showing. There's
absolutely no reason to remove RMS, but RMS is comparatively limited.
But if RMS works for you, have at. Streams-of-bytes file systems, and
SQLite and other databases, and other tooling work well for others. RMS
is just not something I miss, when working on other platforms.
Sure, RMS is limited, it is not a database. But it offers structured
data storage, even for simple sequential files. And I always prefer
structured data over a diarrhea of bytes.

Can RMS be improved? Sure it can, and it should.
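For the problem that started the thread - records that picked up literal <cr> terminators plus extra blank lines - the repair Dave suggested early on really is only a few lines once the framing is known. A hedged Python sketch (it assumes the damage is exactly trailing carriage returns plus empty records; inspect the file with DUMP first to confirm):

```python
def repair_records(records):
    """Strip a stray trailing <cr> (and <lf>) from each record and
    drop the empty records left behind -- the 'explicit carriage
    returns plus extra blank lines' symptom from the original post.
    Assumes that is the *only* damage; verify with DUMP first."""
    cleaned = []
    for rec in records:
        rec = rec.rstrip(b"\r\n")   # terminators that leaked into the data
        if rec:                      # the spurious blank record between pairs
            cleaned.append(rec)
    return cleaned

# Two real records, each followed by a CR terminator and a blank record:
if __name__ == "__main__":
    damaged = [b"first line\r", b"", b"second line\r", b""]
    print(repair_records(damaged))
```

Reading and rewriting the records themselves is left to whatever record interface you use; the cleanup logic is the whole program.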
Phillip Helbig (undress to reply)
2020-12-31 08:21:13 UTC
Permalink
Post by Dirk Munk
The problem is that in Unix and Windows land there is no difference
between the metadata of a file, and the actual contents of a file. The
metadata should define the file and the records in the file, that should
be completely separate from the actual data contents of the file.
The unix mantra is "everything is a stream of bytes". Including the
user, presumably.
Post by Dirk Munk
The DEC software engineers understood very well why it is a bad idea to
mix up contents of a file with the structure of a file, and that's why
they did not use stream files as standard RMS files in applications.
They are just there for compatibly with Unix, Windows etc.
Exactly. There are so many things which DEC just got right.
Dave Froble
2020-12-31 18:13:49 UTC
Permalink
Post by Phillip Helbig (undress to reply)
Post by Dirk Munk
The problem is that in Unix and Windows land there is no difference
between the metadata of a file, and the actual contents of a file. The
metadata should define the file and the records in the file, that should
be completely separate from the actual data contents of the file.
The unix mantra is "everything is a stream of bytes". Including the
user, presumably.
Post by Dirk Munk
The DEC software engineers understood very well why it is a bad idea to
mix up contents of a file with the structure of a file, and that's why
they did not use stream files as standard RMS files in applications.
They are just there for compatibly with Unix, Windows etc.
Exactly. There are so many things which DEC just got right.
That's one opinion. But it's not valid in all cases. Perhaps in the
things you do, but not in the things others do.

I do remember when the VMS environment did not include all that it now
does. I happened to play a rather small part in the addition of
capabilities back in the early 1980s, so I'm rather aware of the
evolution of VMS.

I am used to the way DEC did so many things, mainly because that is
what I learned to use. I hope that I can understand there is more than
one method for doing things. Yes, I do.
--
David Froble Tel: 724-529-0450
Dave Froble Enterprises, Inc. E-Mail: ***@tsoft-inc.com
DFE Ultralights, Inc.
170 Grimplin Road
Vanderbilt, PA 15486
Phillip Helbig (undress to reply)
2020-12-30 06:34:51 UTC
Permalink
Post by Dirk Munk
Post by Bill Gunshannon
Post by Jan-Erik Söderholm
Post by Phillip Helbig (undress to reply)
Post by Dirk Munk
Post by Phillip Helbig (undress to reply)
In the end, I managed to transfer it again (don't ask!) and somehow,
magically, it was OK.
I've dealt with problems like these before, usually caused by
applications that were not written for VMS.
You need to have a bit of a feeling for the different file types of VMS
to fix these problems, but if you have that, it's very simple to solve
these little puzzles.
I don't know how many times I've used SET FILE/ATTR or CONVERT or TECO
to fix things like this.  I can usually look at the contents, look at
DIR/FULL, and see what needs to be done if they don't match, but this
was somehow different.
Nice that it was fixed! And no, I do not believe in magic... :-)
And, just so people don't think, based on earlier comments, that
Unix is somehow immune, I frequently have to remove "^M" characters
from text files on Unix. Unix's only saving grace in this regard is
that the solution is trivial.  :-)
No, of course Unix is not immune. Using <lf> or <cr> (Windows) as record
terminator is a rather silly idea. It means that you can't use those
characters in a record, and you have to scan the contents of a file for
those characters. Simply writing the length of a record at the beginning
of that record is a far better solution.
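The contrast Dirk describes can be sketched in a few lines. This is an illustrative Python toy, not RMS itself: each record is preceded by a 2-byte length count (similar in spirit to RMS variable-length records), so the data may freely contain CR or LF without confusing the reader.

```python
import io
import struct

def write_records(buf, records):
    # Length-prefixed format: a 2-byte little-endian byte count,
    # then the record bytes. No terminator scanning is needed, so
    # CR/LF inside a record is just data.
    for rec in records:
        buf.write(struct.pack("<H", len(rec)))
        buf.write(rec)

def read_records(buf):
    records = []
    while True:
        hdr = buf.read(2)
        if not hdr:
            break
        (count,) = struct.unpack("<H", hdr)
        records.append(buf.read(count))
    return records

buf = io.BytesIO()
write_records(buf, [b"plain text", b"embedded \r\n is fine"])
buf.seek(0)
print(read_records(buf))  # both records come back intact, CR/LF and all
```

A terminator-based format would have had to either forbid those bytes in a record or escape them; here they round-trip untouched.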
Nearly every internet application protocol (HTTP, FTP, NNTP, SMTP) specifies
<CR><LF> as the line terminator and nearly every unix-derived application
screws it up at some point in its development.

---David Jones
Bill Gunshannon
2020-12-30 19:57:35 UTC
Permalink
Post by Phillip Helbig (undress to reply)
Post by Dirk Munk
Post by Bill Gunshannon
Post by Jan-Erik Söderholm
Post by Phillip Helbig (undress to reply)
Post by Dirk Munk
Post by Phillip Helbig (undress to reply)
In the end, I managed to transfer it again (don't ask!) and somehow,
magically, it was OK.
I've dealt with problems like these before, usually caused by
applications that were not written for VMS.
You need to have a bit of a feeling for the different file types of VMS
to fix these problems, but if you have that, it's very simple to solve
these little puzzles.
I don't know how many times I've used SET FILE/ATTR or CONVERT or TECO
to fix things like this.  I can usually look at the contents, look at
DIR/FULL, and see what needs to be done if they don't match, but this
was somehow different.
Nice that it was fixed! And no, I do not believe in magic... :-)
And, just so people don't think, based on earlier comments, that
Unix is somehow immune, I frequently have to remove "^M" characters
from text files on Unix. Unix's only saving grace in this regard is
that the solution is trivial.  :-)
No, of course Unix is not immune. Using <lf> or <cr> (Windows) as record
terminator is a rather silly idea. It means that you can't use those
characters in a record, and you have to scan the contents of a file for
those characters. Simply writing the length of a record at the beginning
of that record is a far better solution.
Nearly every internet application protocol (HTTP, FTP, NNTP, SMTP) specifies
<CR><LF> as the line terminator and nearly every unix-derived application
screws it up at some point in its development.
---David Jones
That's NETASCII. It standardized the moving of text files and left it
up to the hosts on either end to ensure the file was in the format it
wanted.

bill
Simon Clubley
2020-12-30 14:22:23 UTC
Permalink
Post by Bill Gunshannon
And, just so people don't think, based on earlier comments, that
Unix is somehow immune, I frequently have to remove "^M" characters
from text files on Unix. Unix's only saving grace in this regard is
that the solution is trivial. :-)
And that a dedicated command to fix it generally comes with modern
Unix variants...

Simon.
--
Simon Clubley, ***@remove_me.eisner.decus.org-Earth.UFP
Walking destinations on a map are further away than they appear.
Dave Froble
2020-12-30 17:43:18 UTC
Permalink
Post by Simon Clubley
Post by Bill Gunshannon
And, just so people don't think, based on earlier comments, that
Unix is somehow immune, I frequently have to remove "^M" characters
from text files on Unix. Unix's only saving grace in this regard is
that the solution is trivial. :-)
And that a dedicated command to fix it generally comes with modern
Unix variants...
Sounds like, instead of fixing a bug, a fix for the result of the bug is
provided.

Am I the only one who has a problem with this?
--
David Froble Tel: 724-529-0450
Dave Froble Enterprises, Inc. E-Mail: ***@tsoft-inc.com
DFE Ultralights, Inc.
170 Grimplin Road
Vanderbilt, PA 15486
Bill Gunshannon
2020-12-30 20:00:39 UTC
Permalink
Post by Dave Froble
Post by Simon Clubley
Post by Bill Gunshannon
And, just so people don't think, based on earlier comments, that
Unix is somehow immune, I frequently have to remove "^M" characters
from text files on Unix. Unix's only saving grace in this regard is
that the solution is trivial.  :-)
And that a dedicated command to fix it generally comes with modern
Unix variants...
Sounds like, instead of fixing a bug, a fix for the result of the bug is
provided.
Am I the only one who has a problem with this?
What, exactly is the bug?

bill
Dave Froble
2020-12-31 00:17:18 UTC
Permalink
Post by Bill Gunshannon
Post by Dave Froble
Post by Simon Clubley
Post by Bill Gunshannon
And, just so people don't think, based on earlier comments, that
Unix is somehow immune, I frequently have to remove "^M" characters
from text files on Unix. Unix's only saving grace in this regard is
that the solution is trivial. :-)
And that a dedicated command to fix it generally comes with modern
Unix variants...
Sounds like, instead of fixing a bug, a fix for the result of the bug
is provided.
Am I the only one who has a problem with this?
What, exactly is the bug?
bill
Don't know. I thought you implied a bug.
--
David Froble Tel: 724-529-0450
Dave Froble Enterprises, Inc. E-Mail: ***@tsoft-inc.com
DFE Ultralights, Inc.
170 Grimplin Road
Vanderbilt, PA 15486
Bill Gunshannon
2020-12-31 01:10:12 UTC
Permalink
Post by Bill Gunshannon
Post by Dave Froble
Post by Simon Clubley
Post by Bill Gunshannon
And, just so people don't think, based on earlier comments, that
Unix is somehow immune, I frequently have to remove "^M" characters
from text files on Unix. Unix's only saving grace in this regard is
that the solution is trivial.  :-)
And that a dedicated command to fix it generally comes with modern
Unix variants...
Sounds like, instead of fixing a bug, a fix for the result of the bug
is provided.
Am I the only one who has a problem with this?
What, exactly is the bug?
bill
Don't know.  I thought you implied a bug.
No, the discussion was about files with extraneous characters
like ^M in a text file moved to a Unix system, usually by using
FTP in BINARY mode rather than ASCII.

I would love to know where the original file that started this
discussion came from and how it ended up on a VMS Systems.

bill
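The "trivial" Unix-side fix mentioned earlier in the thread, stripping the stray carriage returns, amounts to one substitution. A minimal sketch (Python here purely for illustration; `tr -d '\r'` or dos2unix do the same job):

```python
def strip_cr(data: bytes) -> bytes:
    # Remove every CR byte left behind by a CR+LF-terminated file
    # that was transferred in binary mode to an LF-only system.
    return data.replace(b"\r", b"")

print(strip_cr(b"line one\r\nline two\r\n"))
```

Note this assumes the CRs are all unwanted; a file that legitimately contains bare CR data would need a more careful pass.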
Dave Froble
2020-12-31 18:17:21 UTC
Permalink
Post by Bill Gunshannon
I would love to know where the original file that started this
discussion came from and how it ended up on a VMS Systems.
bill
Well, yes, Phillip is rather good at providing mysteries for us to
figure out. It sure would have been reasonable for him to lay out
precisely the entire sequence of events from start to finish.
--
David Froble Tel: 724-529-0450
Dave Froble Enterprises, Inc. E-Mail: ***@tsoft-inc.com
DFE Ultralights, Inc.
170 Grimplin Road
Vanderbilt, PA 15486
Phillip Helbig (undress to reply)
2020-12-31 18:58:26 UTC
Permalink
Post by Dave Froble
Post by Bill Gunshannon
I would love to know where the original file that started this
discussion came from and how it ended up on a VMS Systems.
Well, yes, Phillip is rather good at providing mysteries for us to
figure out. It sure would have been reasonable for him to lay out
precisely the entire sequence of events from start to finish.
That would take more time than it is worth. Suffice it to say that they
were uploaded via a shaky http connection.
Bill Gunshannon
2020-12-31 19:09:15 UTC
Permalink
Post by Phillip Helbig (undress to reply)
Post by Dave Froble
Post by Bill Gunshannon
I would love to know where the original file that started this
discussion came from and how it ended up on a VMS Systems.
Well, yes, Phillip is rather good at providing mysteries for us to
figure out. It sure would have been reasonable for him to lay out
precisely the entire sequence of events from start to finish.
That would take more time than it is worth. Suffice it to say that they
were uploaded via a shaky http connection.
Believe it or not, that goes a long way towards explaining what
happened and what you got. HTTP does not have a concept of
ASCII/BINARY.  Transferring "text" files from non-VMS systems
to VMS systems using HTTP is bound to result in at least the
CR+LF|LF+CR|CR|LF problem and, depending on the sending system,
even stranger results are possible.

bill
Arne Vajhøj
2020-12-31 20:05:04 UTC
Permalink
Post by Bill Gunshannon
Post by Dave Froble
Post by Bill Gunshannon
I would love to know where the original file that started this
discussion came from and how it ended up on a VMS Systems.
Well, yes, Phillip is rather good at providing mysteries for us to
figure out.  It sure would have been reasonable for him to lay out
precisely the entire sequence of events from start to finish.
That would take more time than it is worth.  Suffice it to say that they
were uploaded via a shaky http connection.
Believe it or not, that goes a long way towards explaining what
happened and what you got.  HTTP does not have a concept of
ASCII/BINARY.  Transferring "text" files from none VMS systems
to VMS systems using HTTP is bound to result in at least the
CR+LF|LF+CR|CR|LF problem and, depending on the sending system
even stranger results are possible,
HTTP has the Content-Type header.

If that is text/plain or text/something then the client
knows it is text.

And a VMS client can treat it appropriately including
writing a file with rfm:var.

Arne
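Arne's point can be sketched without any network at all. The hypothetical `normalize_body` helper below shows the decision a Content-Type-aware client could make: text bodies get their line endings normalized (after which a VMS client could emit rfm:var records), anything else passes through as opaque bytes.

```python
def normalize_body(content_type: str, body: bytes) -> bytes:
    """Decide how to treat a downloaded body from its Content-Type.

    text/* bodies get CR+LF and bare CR normalized to LF, which a
    record-oriented client could then split into records; other
    media types are binary and must be passed through untouched.
    """
    media_type = content_type.split(";")[0].strip().lower()
    if media_type.startswith("text/"):
        return body.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return body

print(normalize_body("text/plain; charset=utf-8", b"a\r\nb\r\n"))
print(normalize_body("application/octet-stream", b"a\r\nb"))
```

The first call yields LF-only text; the second is returned byte-for-byte, since "fixing" line endings in a binary body would corrupt it.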
Dave Froble
2021-01-01 01:13:08 UTC
Permalink
Post by Arne Vajhøj
Post by Bill Gunshannon
Post by Phillip Helbig (undress to reply)
Post by Dave Froble
Post by Bill Gunshannon
I would love to know where the original file that started this
discussion came from and how it ended up on a VMS Systems.
Well, yes, Phillip is rather good at providing mysteries for us to
figure out. It sure would have been reasonable for him to lay out
precisely the entire sequence of events from start to finish.
That would take more time than it is worth. Suffice it to say that they
were uploaded via a shaky http connection.
Believe it or not, that goes a long way towards explaining what
happened and what you got. HTTP does not have a concept of
ASCII/BINARY. Transferring "text" files from none VMS systems
to VMS systems using HTTP is bound to result in at least the
CR+LF|LF+CR|CR|LF problem and, depending on the sending system
even stranger results are possible,
HTTP has the Content-Type header.
Most of the "optional" HTTP headers are just that, optional, and I've
seen way too many headers that were sorely lacking in sufficient header
data.
--
David Froble Tel: 724-529-0450
Dave Froble Enterprises, Inc. E-Mail: ***@tsoft-inc.com
DFE Ultralights, Inc.
170 Grimplin Road
Vanderbilt, PA 15486
Arne Vajhøj
2021-01-01 01:30:42 UTC
Permalink
Post by Dave Froble
Post by Arne Vajhøj
Post by Bill Gunshannon
Post by Dave Froble
Post by Bill Gunshannon
I would love to know where the original file that started this
discussion came from and how it ended up on a VMS Systems.
Well, yes, Phillip is rather good at providing mysteries for us to
figure out.  It sure would have been reasonable for him to lay out
precisely the entire sequence of events from start to finish.
That would take more time than it is worth.  Suffice it to say that they
were uploaded via a shaky http connection.
Believe it or not, that goes a long way towards explaining what
happened and what you got.  HTTP does not have a concept of
ASCII/BINARY.  Transferring "text" files from none VMS systems
to VMS systems using HTTP is bound to result in at least the
CR+LF|LF+CR|CR|LF problem and, depending on the sending system
even stranger results are possible,
HTTP has the Content-Type header.
Most of the "optional" HTTP headers are just that, optional, and I've
seen way too many headers that were sorely lacking in sufficient header
data.
I think that header is pretty likely to be there.

HTTP RFC says:

<quote>
Any HTTP/1.1 message containing an entity-body SHOULD include a
Content-Type header field defining the media type of that body.
</quote>

And most web stuff will not work without that header.

Any web server will have to send that or be considered
totally broken.

Arne
Dave Froble
2021-01-01 05:03:43 UTC
Permalink
Post by Arne Vajhøj
Post by Dave Froble
Post by Arne Vajhøj
Post by Bill Gunshannon
Post by Phillip Helbig (undress to reply)
Post by Dave Froble
Post by Bill Gunshannon
I would love to know where the original file that started this
discussion came from and how it ended up on a VMS Systems.
Well, yes, Phillip is rather good at providing mysteries for us to
figure out. It sure would have been reasonable for him to lay out
precisely the entire sequence of events from start to finish.
That would take more time than it is worth. Suffice it to say that they
were uploaded via a shaky http connection.
Believe it or not, that goes a long way towards explaining what
happened and what you got. HTTP does not have a concept of
ASCII/BINARY. Transferring "text" files from none VMS systems
to VMS systems using HTTP is bound to result in at least the
CR+LF|LF+CR|CR|LF problem and, depending on the sending system
even stranger results are possible,
HTTP has the Content-Type header.
Most of the "optional" HTTP headers are just that, optional, and I've
seen way too many headers that were sorely lacking in sufficient
header data.
I think that header is pretty likely to be there.
<quote>
Any HTTP/1.1 message containing an entity-body SHOULD include a
Content-Type header field defining the media type of that body.
</quote>
Well, there is that word, "should" ....

When I found incoming HTTP packages without the "Content-Length" to tell
the size of the detail part of the package, I was rather disgusted.  So
easy to ensure the amount of data, and then not used.  I ended up
rejecting such packages.  I felt and still feel fully justified.
Post by Arne Vajhøj
And most web stuff will not work without that header.
You might be surprised at what some might do.
Post by Arne Vajhøj
Any web server will have to send that or be considered
totally broken.
Well, there you are wrong, because I've seen it.
--
David Froble Tel: 724-529-0450
Dave Froble Enterprises, Inc. E-Mail: ***@tsoft-inc.com
DFE Ultralights, Inc.
170 Grimplin Road
Vanderbilt, PA 15486
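Rejecting requests whose body arrives without a declared length, as Dave describes doing, might look roughly like this. A hedged sketch, not any particular server's code: the function names and the simple POST/PUT heuristic are invented for illustration, and a real server would also honor Transfer-Encoding (chunked bodies carry no Content-Length).

```python
def check_request(method, headers):
    """Return an HTTP status code to reject with, or None to accept.

    A request that carries a body but declares neither Content-Length
    nor Transfer-Encoding draws 411 (Length Required).
    """
    has_body = method in ("POST", "PUT")          # crude heuristic
    norm = {k.lower(): v for k, v in headers.items()}
    if has_body and "content-length" not in norm \
            and "transfer-encoding" not in norm:
        return 411  # Length Required
    return None  # acceptable as far as length framing goes
```

So a bare POST with no length is bounced with 411, while a GET, or a POST that declares its length, passes.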
Arne Vajhøj
2021-01-01 15:08:01 UTC
Permalink
Post by Dave Froble
Post by Arne Vajhøj
Post by Dave Froble
Post by Arne Vajhøj
Post by Bill Gunshannon
Post by Dave Froble
Post by Bill Gunshannon
I would love to know where the original file that started this
discussion came from and how it ended up on a VMS Systems.
Well, yes, Phillip is rather good at providing mysteries for us to
figure out.  It sure would have been reasonable for him to lay out
precisely the entire sequence of events from start to finish.
That would take more time than it is worth.  Suffice it to say that they
were uploaded via a shaky http connection.
Believe it or not, that goes a long way towards explaining what
happened and what you got.  HTTP does not have a concept of
ASCII/BINARY.  Transferring "text" files from none VMS systems
to VMS systems using HTTP is bound to result in at least the
CR+LF|LF+CR|CR|LF problem and, depending on the sending system
even stranger results are possible,
HTTP has the Content-Type header.
Most of the "optional" HTTP headers are just that, optional, and I've
seen way too many headers that were sorely lacking in sufficient
header data.
I think that header is pretty likely to be there.
<quote>
Any HTTP/1.1 message containing an entity-body SHOULD include a
Content-Type header field defining the media type of that body.
</quote>
Well, there is that word, "should" ....
I believe that is the second strongest requirement after MUST.
Post by Dave Froble
When I found incoming HTTP packages without the "Content_Length" to tell
the size of the detail part of the package, I was rather disgusted.  So
easy to insure the amount of data, and then not used.  I ended up
rejecting such packages.  I felt and still feel fully justified.
Content-Length is more complicated than Content-Type.

RFC:

<quote>
3.If a Content-Length header field (section 14.13) is present, its
decimal value in OCTETs represents both the entity-length and the
transfer-length. The Content-Length header field MUST NOT be sent
if these two lengths are different (i.e., if a Transfer-Encoding
header field is present). If a message is received with both a
Transfer-Encoding header field and a Content-Length header field,
the latter MUST be ignored.
</quote>

<quote>
For compatibility with HTTP/1.0 applications, HTTP/1.1 requests
containing a message-body MUST include a valid Content-Length header
field unless the server is known to be HTTP/1.1 compliant. If a
request contains a message-body and a Content-Length is not given,
the server SHOULD respond with 400 (bad request) if it cannot
determine the length of the message, or with 411 (length required) if
it wishes to insist on receiving a valid Content-Length.
</quote>
Post by Dave Froble
Post by Arne Vajhøj
And most web stuff will not work without that header.
You might be surprised at what some might do.
Post by Arne Vajhøj
Any web server will have to send that or be considered
totally broken.
Well, there you are wrong, because I've seen it.
What server does not send Content-Type?

Arne
Phillip Helbig (undress to reply)
2020-12-31 22:16:28 UTC
Permalink
Post by Arne Vajhøj
Post by Bill Gunshannon
Post by Phillip Helbig (undress to reply)
That would take more time than it is worth. Suffice it to say that they
were uploaded via a shaky http connection.
Believe it or not, that goes a long way towards explaining what
happened and what you got. HTTP does not have a concept of
ASCII/BINARY. Transferring "text" files from none VMS systems
to VMS systems using HTTP is bound to result in at least the
CR+LF|LF+CR|CR|LF problem and, depending on the sending system
even stranger results are possible,
HTTP has the Content-Type header.
If that is text/plain or text/something then the client
knows it is text.
And a VMS client can treat it appropriately including
writing a file with rfm:var.
The question is why sometimes it worked but occasionally not.
Bill Gunshannon
2021-01-01 03:59:03 UTC
Permalink
Post by Phillip Helbig (undress to reply)
Post by Arne Vajhøj
Post by Bill Gunshannon
Post by Phillip Helbig (undress to reply)
That would take more time than it is worth. Suffice it to say that they
were uploaded via a shaky http connection.
Believe it or not, that goes a long way towards explaining what
happened and what you got. HTTP does not have a concept of
ASCII/BINARY. Transferring "text" files from none VMS systems
to VMS systems using HTTP is bound to result in at least the
CR+LF|LF+CR|CR|LF problem and, depending on the sending system
even stranger results are possible,
HTTP has the Content-Type header.
If that is text/plain or text/something then the client
knows it is text.
And a VMS client can treat it appropriately including
writing a file with rfm:var.
The question is why sometimes it worked but occasionally not.
I would be looking at the receiving client for the answer to that.

bill
Arne Vajhøj
2020-12-31 20:07:32 UTC
Permalink
Post by Phillip Helbig (undress to reply)
Post by Dave Froble
Post by Bill Gunshannon
I would love to know where the original file that started this
discussion came from and how it ended up on a VMS Systems.
Well, yes, Phillip is rather good at providing mysteries for us to
figure out. It sure would have been reasonable for him to lay out
precisely the entire sequence of events from start to finish.
That would take more time than it is worth. Suffice it to say that they
were uploaded via a shaky http connection.
With my understanding of the term "shaky connection", that
should not be able to mess up the line format in what
actually got downloaded.

Arne
Dave Froble
2021-01-01 01:09:53 UTC
Permalink
Post by Phillip Helbig (undress to reply)
Post by Dave Froble
Post by Bill Gunshannon
I would love to know where the original file that started this
discussion came from and how it ended up on a VMS Systems.
Well, yes, Phillip is rather good at providing mysteries for us to
figure out. It sure would have been reasonable for him to lay out
precisely the entire sequence of events from start to finish.
That would take more time than it is worth. Suffice it to say that they
were uploaded via a shaky http connection.
How do you expect anyone to come up with solutions to your problems, if
describing the problem takes more time than it is worth? If you cannot do
that much, better to just keep it to yourself.
--
David Froble Tel: 724-529-0450
Dave Froble Enterprises, Inc. E-Mail: ***@tsoft-inc.com
DFE Ultralights, Inc.
170 Grimplin Road
Vanderbilt, PA 15486
Bill Gunshannon
2020-12-30 19:59:49 UTC
Permalink
Post by Simon Clubley
Post by Bill Gunshannon
And, just so people don't think, based on earlier comments, that
Unix is somehow immune, I frequently have to remove "^M" characters
from text files on Unix. Unix's only saving grace in this regard is
that the solution is trivial. :-)
And that a dedicated command to fix it generally comes with modern
Unix variants...
Simon.
vi does the job just fine.

bill
Simon Clubley
2020-12-30 14:25:55 UTC
Permalink
Post by Phillip Helbig (undress to reply)
In the end, I managed to transfer it again (don't ask!) and somehow,
magically, it was OK.
If you still have the original version around, what does TECO do
to it if you read it in and write it out again ?

Simon.
--
Simon Clubley, ***@remove_me.eisner.decus.org-Earth.UFP
Walking destinations on a map are further away than they appear.
Phillip Helbig (undress to reply)
2020-12-31 08:22:26 UTC
Permalink
Post by Simon Clubley
Post by Phillip Helbig (undress to reply)
In the end, I managed to transfer it again (don't ask!) and somehow,
magically, it was OK.
If you still have the original version around, what does TECO do
to it if you read it in and write it out again ?
In order to avoid confusion, I deleted the bad version. However, it
didn't look like the files which I've just opened and closed in TECO in
order to fix them.
Phillip Helbig (undress to reply)
2020-12-31 17:36:53 UTC
Permalink
Post by Phillip Helbig (undress to reply)
Post by Simon Clubley
Post by Phillip Helbig (undress to reply)
In the end, I managed to transfer it again (don't ask!) and somehow,
magically, it was OK.
If you still have the original version around, what does TECO do
to it if you read it in and write it out again ?
In order to avoid confusion, I deleted the bad version. However, it
didn't look like the files which I've just opened and closed in TECO in
order to fix them.
Same problem today, but with a much smaller file (30 instead of 200,000
blocks). I loaded it into TECO then exited, then deleted the extra
blank lines with a TECO learn sequence. Still some problem. I then
went to the post-TECO file and noticed that a few lines had been broken
at the wrong place (they are supposed to be all the same length). I
fixed those by hand with EDT (not possible with 100 MB), then TPU again.
In the end there was still something not 110% correct, but I could still
use the file so all is well now. :-)
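The repair Phillip describes, deleting the spurious blank lines and rejoining records broken at the wrong place, can be done in one pass when the records are known to be fixed width. A sketch under that assumption (the function name and the single-width parameter are invented for illustration):

```python
def fix_records(raw: bytes, width: int) -> list[bytes]:
    """Rebuild fixed-width records from a mangled text transfer.

    Strips every CR/LF terminator and empty line, then re-slices
    the remaining byte stream into records of the known width --
    which also heals lines that were broken mid-record.
    """
    pieces = raw.replace(b"\r", b"\n").split(b"\n")
    data = b"".join(pieces)  # terminators and blank lines gone
    return [data[i:i + width] for i in range(0, len(data), width)]

# "EF"/"GH" were one record broken in the middle; it comes back whole.
print(fix_records(b"ABCD\r\n\r\nEF\r\nGH\r\n", 4))
```

This only works because the record length is known and the record data itself contains no CR/LF; variable-length records would need the length recovered some other way.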
Bill Gunshannon
2020-12-31 17:50:56 UTC
Permalink
Post by Phillip Helbig (undress to reply)
Post by Phillip Helbig (undress to reply)
Post by Simon Clubley
Post by Phillip Helbig (undress to reply)
In the end, I managed to transfer it again (don't ask!) and somehow,
magically, it was OK.
If you still have the original version around, what does TECO do
to it if you read it in and write it out again ?
In order to avoid confusion, I deleted the bad version. However, it
didn't look like the files which I've just opened and closed in TECO in
order to fix them.
Same problem today, but with a much smaller file (30 instead of 200,000
blocks). I loaded it into TECO then exited, then deleted the extra
blank lines with a TECO learn sequence. Still some problem. I then
went to the post-TECO file and noticed that a few lines had been broken
at the wrong place (they are supposed to be all the same length). I
fixed those by hand with EDT (not possible with 100 MB), then TPU again.
In the end there was still something not 110% correct, but I could still
use the file so all is well now. :-)
Can you tell us where these files come from and how they get on the
VMS System?

bill