Discussion:
backup archive format saved to disk
Douglas Tutty
2006-12-04 20:20:16 UTC
Permalink
Hello,

I'm going to be backing up to a portable ruggedized hard drive.
Currently, my backups end up in tar.bz2 format.

It would be nice if there was some redundancy in the data stream to
handle blocks that go bad while the drive is in storage (e.g. archive).

How is this handled on tape? Is it built-into the hardware
compression?

Do I need to put a file system on a disk partition if I'm only saving
one archive file or can I just write the archive to the partition
directly (and read it back) as if it was a scsi tape?

Is there an archive or compression format that includes the ability to
not only detect errors but to correct them? (e.g. store ECC data
elsewhere in the file) If there was, and I could write it directly to
the disk, then that would solve the blocks-failing-while-drive-stored
issue.

Thanks,

Doug.
Tyler MacDonald
2006-12-05 01:44:57 UTC
Permalink
Post by Douglas Tutty
Is there an archive or compression format that includes the ability to
not only detect errors but to correct them? (e.g. store ECC data
elsewhere in the file) If there was, and I could write it directly to
the disk, then that would solve the blocks-failing-while-drive-stored
issue.
.PAR files are popular on usenet... The "par2" and "parchive" debian
packages may be of help there.
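
For example (untested; the redundancy level and file names are only
placeholders), something like:

  # create recovery data covering backup.tar.bz2 at roughly 10% redundancy
  par2 create -r10 backup.tar.bz2
  # later, check the archive against the recovery files
  par2 verify backup.tar.bz2.par2
  # and, if damage is found, attempt a repair
  par2 repair backup.tar.bz2.par2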

Cheers,
Tyler
Johannes Wiedersich
2006-12-05 10:45:03 UTC
Permalink
Post by Douglas Tutty
I'm going to be backing up to a portable ruggedized hard drive.
Currently, my backups end up in tar.bz2 format.
It would be nice if there was some redundancy in the data stream to
handle blocks that go bad while the drive is in storage (e.g. archive).
How is this handled on tape? Is it built-into the hardware
compression?
Do I need to put a file system on a disk partition if I'm only saving
one archive file or can I just write the archive to the partition
directly (and read it back) as if it was a scsi tape?
Is there an archive or compression format that includes the ability to
not only detect errors but to correct them? (e.g. store ECC data
elsewhere in the file) If there was, and I could write it directly to
the disk, then that would solve the blocks-failing-while-drive-stored
issue.
Now, to something completely different....
If data integrity is your concern, then maybe a better solution than
compression is to copy all your data with rsync or another backup tool
that 'mirrors' your files instead of packing them all together in one
large file. If something goes wrong with this large file you might lose
the backup of all your files. If something goes wrong with the
transmission of one file in the rsync case you will only 'lose' the
backup of that one file and can just restart the rsync command.

Well, at least I much prefer to spend a bit more on storage and have all
my files copied individually. It adds the benefit that it is
straightforward to verify the integrity of the backup via 'diff -r'.
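
For example (a minimal sketch; the mount point is a placeholder):

  # mirror /home onto the backup drive, preserving permissions and hard links
  rsync -aH --delete /home/ /mnt/backup/home/
  # verify the mirror against the original, file by file
  diff -r /home /mnt/backup/home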

As far as redundancy is concerned I would prefer to use a second disk
(and while you are at it store it in a different location, miles away
from the other). I have one backup at home and another one at my
mother's house, adding several layers of security to my data.

Johannes

NB: Are you using a journal for your fs?
Mike McCarty
2006-12-05 23:47:23 UTC
Permalink
Post by Johannes Wiedersich
Post by Douglas Tutty
I'm going to be backing up to a portable ruggedized hard drive.
Currently, my backups end up in tar.bz2 format.
It would be nice if there was some redundancy in the data stream to
handle blocks that go bad while the drive is in storage (e.g. archive).
How is this handled on tape? Is it built-into the hardware
compression?
Do I need to put a file system on a disk partition if I'm only saving
one archive file or can I just write the archive to the partition
directly (and read it back) as if it was a scsi tape?
Is there an archive or compression format that includes the ability to
not only detect errors but to correct them? (e.g. store ECC data
elsewhere in the file) If there was, and I could write it directly to
the disk, then that would solve the blocks-failing-while-drive-stored
issue.
Now, to something completely different....
If data integrity is your concern, then maybe a better solution than
compression is to copy all your data with rsync or another backup tool
that 'mirrors' your files instead of packing them all together in one
large file. If something goes wrong with this large file you might lose
the backup of all your files. If something goes wrong with the
[snip]

My understanding of the BZ2 format is that it compresses individual
blocks independently, and that the loss of a block will not compromise
the entire archive, only those files which are contained in a given
block.

Your other points are (as usual) well taken.

Mike
--
p="p=%c%s%c;main(){printf(p,34,p,34);}";main(){printf(p,34,p,34);}
This message made from 100% recycled bits.
You have found the bank of Larn.
I can explain it for you, but I can't understand it for you.
I speak only for myself, and I am unanimous in that!
Douglas Tutty
2006-12-06 00:08:54 UTC
Permalink
Post by Mike McCarty
Post by Johannes Wiedersich
Post by Douglas Tutty
I'm going to be backing up to a portable ruggedized hard drive.
Currently, my backups end up in tar.bz2 format.
It would be nice if there was some redundancy in the data stream to
handle blocks that go bad while the drive is in storage (e.g. archive).
How is this handled on tape? Is it built-into the hardware
compression?
Do I need to put a file system on a disk partition if I'm only saving
one archive file or can I just write the archive to the partition
directly (and read it back) as if it was a scsi tape?
Is there an archive or compression format that includes the ability to
not only detect errors but to correct them? (e.g. store ECC data
elsewhere in the file) If there was, and I could write it directly to
the disk, then that would solve the blocks-failing-while-drive-stored
issue.
Now, to something completely different....
If data integrity is your concern, then maybe a better solution than
compression is to copy all your data with rsync or another backup tool
that 'mirrors' your files instead of packing them all together in one
large file. If something goes wrong with this large file you might lose
the backup of all your files. If something goes wrong with the
[snip]
My understanding of the BZ2 format is that it compresses individual
blocks independently, and that the loss of a block will not compromise
the entire archive, only those files which are contained in a given
block.
Yes. But I don't want to lose any data at all.

I've looked at par2. It looks interesting. For me, the question is how
to implement it for archiving onto a drive since the ECC data are
separate files rather than being included within one data stream.

Separate files suggest that they be on a file system, and we're back to
where we started since I haven't found a parfs.

I suppose I could use par2 to create the ECC files, then feed the ECC
files one at a time, followed by the main data file, followed by the ECC
files again.

I'll check out with my zip drive if I can write a tar file directly to
disk without a fs (unless someone knows the answer).
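
What I have in mind is something like this (untested; /dev/sdb1 is a
placeholder and its contents would be overwritten):

  # write the compressed archive straight onto the partition, no file system
  tar -cjf /dev/sdb1 /home /etc
  # list it back the same way; expect a warning about trailing garbage,
  # since whatever follows the end of the archive on the partition is junk
  tar -tjf /dev/sdb1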

Doug.
Andrew Sackville-West
2006-12-06 00:15:18 UTC
Permalink
Post by Douglas Tutty
Yes. But I don't want to lose any data at all.
there is no way to guarantee this. you could improve your odds by
having multiple storage locations with multiple copies and a rigorous
method for routinely testing the backup media for corruption and
making new replacement copies of the backups to prevent future loss.

For example, make multiple identical backups. sprinkle them in various
locations. on a periodic, routine basis, test those backups for
possible corruption. If they're clean, make a new copy anyway to put in
rotation, throwing away the old ones after so many periods. If you
find a corrupt one, get one of your clean ones to reproduce it and
start over.
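
one concrete way to do the routine test (a sketch; the paths are
placeholders) is to keep a checksum manifest for each copy and re-check
it on a schedule:

  # when the backup is made, record a checksum for every file on the medium
  find /mnt/backup -type f -exec md5sum {} + > backup.md5
  # keep the manifest somewhere other than the medium it describes
  # at each inspection, re-read everything and compare against the manifest
  md5sum -c backup.md5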

there is no way, using only one physical storage medium, to guarantee
no loss of data.

maybe I'm reading this wrong, but it seems to be what you're asking for.
A
Tyler MacDonald
2006-12-05 17:31:18 UTC
Permalink
Post by Andrew Sackville-West
For example, make multiple identical backups. sprinkle them in various
locations. on a periodic, routine basis, test those backups for
possible corruption. If they're clean, make a new copy anyway to put in
rotation, throwing away the old ones after so many periods. If you
find a corrupt one, get one of your clean ones to reproduce it and
start over.
Here's what I do with my systems:

I use backup2l to make incremental backups to a partition in /dump.
These backups are then GPG-encrypted, with the key of the owner of each
server. They are then rsynced to a central repository on one of the servers,
and from there rsynced down to my home system.

So each server's backup data is always in three locations: its own
machine, the repo, and my home machine.
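
The encrypt-and-replicate step amounts to roughly this (a simplified
sketch, not my actual configuration; the key ID, host names and file
names are made up):

  # encrypt the newest archive to the server owner's public key
  gpg --encrypt --recipient admin@example.com /dump/all.101.tar.gz
  # push the encrypted archives to the central repo, then pull them home
  rsync -av /dump/*.gpg repo.example.com:/srv/backups/$(hostname)/
  rsync -av repo.example.com:/srv/backups/ /home/tyler/backups/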

When the /dump partition starts to get a bit full somewhere, I
create a DVD image of some of the tarballs and burn off 4 copies. Two stay
at home, one goes to my friend who is managing the repo, and one gets
mailed to a friend in Austria.

This system works well, but that's mainly because we have less than
300GB of data that needs to be backed up and we have long backup cycles -- a
new level-1 backup is generated maybe once every six months.

If anyone wants to check out the backup2l.conf and associated files,
let me know and I'll send it to you off-list.

Cheers,
Tyler
Mike McCarty
2006-12-06 01:03:06 UTC
Permalink
Tyler MacDonald wrote:

[snip]
Post by Tyler MacDonald
When the /dump partition starts to get a bit full somewhere, I
create a DVD image of some of the tarballs and burn off 4 copies. Two stay
at home, one goes to my friend who is managing the repo, and one gets
mailed to a friend in Austria.
You are effectively using a BCH code with a Hamming distance of 2. This
same distance can be achieved with only three copies. Now, if you have
a way to know which copy got corrupted (IOW, you have three states
for any given bit: 0, 1, missing, as opposed to just 0, 1) then your
method is better, because correcting missing bits takes less information
than correcting corrupted bits. But if you have no way to detect missing
bits, then you are doing effectively no better than you could with just
three discs.

[snip]

Mike
--
p="p=%c%s%c;main(){printf(p,34,p,34);}";main(){printf(p,34,p,34);}
This message made from 100% recycled bits.
You have found the bank of Larn.
I can explain it for you, but I can't understand it for you.
I speak only for myself, and I am unanimous in that!
h***@topoi.pooq.com
2006-12-06 14:21:51 UTC
Permalink
Post by Tyler MacDonald
I use backup2l to make incremental backups to a partition in /dump.
These backups are then GPG-encrypted, with the key of the owner of each
server.
Thereby ensuring that the entire backup depends on the survivability of
the private GPG keys of the owners of the servers. If they lose their
keys, the entire backup system is worthless. How are the keys backed
up?

-- hendrik
Mike McCarty
2006-12-06 00:58:35 UTC
Permalink
Post by Andrew Sackville-West
Post by Douglas Tutty
Yes. But I don't want to lose any data at all.
there is no way to guarantee this. you could improve your odds by
having multiple storage locations with multiple copies and a rigorous
method for routinely testing the backup media for corruption and
making new replacement copies of the backups to prevent future loss.
For example, make multiple identical backups. sprinkle them in various
locations. on a periodic, routine basis, test those backups for
possible corruption. If they're clean, make a new copy anyway to put in
rotation, throwing away the old ones after so many periods. If you
Respectfully, I disagree with this last recommendation. You are
suggesting that he continually keep his backup media on the
infant mortality portion of the Weibull distribution. The usual
way for devices which are not subjected to periodic high stress
to fail is to have an infant mortality rate which is high, but falls
down to a low level, then begins to rise again with wearout. In this
case, wearout would be eventual degradation of the metallization
layer in the disc.
Post by Andrew Sackville-West
find a corrupt one, get one of your clean ones to reproduce it and
start over.
Be sure to use an odd number of copies. Don't want no tied votes
on whether a given bit is a 0 or a 1 :-)
Post by Andrew Sackville-West
there is no way, using only one physical storage medium, to guarantee
no loss of data.
There is no way, using any number of physical storage media, to
guarantee no loss of data.

On any storage medium, if the probability of error in a data bit is less
than 50%, then given any e > 0 there exists an FEC method which reduces
the probability of data loss to be less than e.

If the probability of error on any given bit is greater than 50%,
then there is no way, by adding additional information, to make
the eventual error rate be less than a single copy. The additional
bits are more likely to be in error than the original.
Post by Andrew Sackville-West
maybe I'm reading this wrong, but it seems to be what you're asking for.
There is no way to guarantee that every bit of computer information
on the Earth is not destroyed by a comet strike :-)

Mike
--
p="p=%c%s%c;main(){printf(p,34,p,34);}";main(){printf(p,34,p,34);}
This message made from 100% recycled bits.
You have found the bank of Larn.
I can explain it for you, but I can't understand it for you.
I speak only for myself, and I am unanimous in that!
Andrew Sackville-West
2006-12-06 07:33:25 UTC
Permalink
Post by Mike McCarty
Post by Andrew Sackville-West
Post by Douglas Tutty
Yes. But I don't want to lose any data at all.
there is no way to guarantee this. you could improve your odds by
having multiple storage locations with multiple copies and a rigorous
method for routinely testing the backup media for corruption and
making new replacement copies of the backups to prevent future loss.
For example, make multiple identical backups. sprinkle them in various
locations. on a periodic, routine basis, test those backups for
possible corruption. If they're clean, make a new copy anyway to put in
rotation, throwing away the old ones after so many periods. If you
Respectfully, I disagree with this last recommendation. You are
suggesting that he continually keep his backup media on the
infant mortality portion of the Weibull distribution. The usual
way for devices which are not subjected to periodic high stress
to fail is to have an infant mortality rate which is high, but falls
down to a low level, then begins to rise again with wearout. In this
case, wearout would be eventual degradation of the metallization
layer in the disc.
good point and one I hadn't thought of, though it's obvious now that
you mention it. I'm thinking of moldy floppies and aging CDs that
definitely fall apart over time with probably less of the typical
bathtub shape and more of a slippery slope...
Post by Mike McCarty
Post by Andrew Sackville-West
find a corrupt one, get one of your clean ones to reproduce it and
start over.
Be sure to use an odd number of copies. Don't want no tied votes
on whether a given bit is a 0 or a 1 :-)
:)
Post by Mike McCarty
Post by Andrew Sackville-West
there is no way, using only one physical storage medium, to guarantee
no loss of data.
There is no way, using any number of physical storage media, to
guarantee no loss of data.
absolutely right.
Post by Mike McCarty
On any storage medium, if the probability of error in a data bit is less
than 50%, then given any e > 0 there exists an FEC method which reduces
the probability of data loss to be less than e.
If the probability of error on any given bit is greater than 50%,
then there is no way, by adding additional information, to make
the eventual error rate be less than a single copy. The additional
bits are more likely to be in error than the original.
Post by Andrew Sackville-West
maybe I'm reading this wrong, but it seems to be what you're asking for.
There is no way to guarantee that every bit of computer information
on the Earth is not destroyed by a comet strike :-)
That's why I'm hoping the universe is circular. I can catch the
broadcast on the way back ;-)

A

ps. Mike, I got one of those bounces the other day and the whois
pointed to US DOD servers. heh.
h***@topoi.pooq.com
2006-12-06 14:30:48 UTC
Permalink
Post by Mike McCarty
Post by Andrew Sackville-West
Post by Douglas Tutty
Yes. But I don't want to lose any data at all.
there is no way to guarantee this. you could improve your odds by
having multiple storage locations with multiple copies and a rigorous
method for routinely testing the backup media for corruption and
making new replacement copies of the backups to prevent future loss.
For example, make multiple identical backups. sprinkle them in various
locations. on a periodic, routine basis, test those backups for
possible corruption. If they're clean, make a new copy anyway to put in
rotation, throwing away the old ones after so many periods. If you
Respectfully, I disagree with this last recommendation. You are
suggesting that he continually keep his backup media on the
infant mortality portion of the Weibull distribution. The usual
way for devices which are not subjected to periodic high stress
to fail is to have an infant mortality rate which is high, but falls
down to a low level, then begins to rise again with wearout. In this
case, wearout would be eventual degradation of the metallization
layer in the disc.
Post by Andrew Sackville-West
find a corrupt one, get one of your clean ones to reproduce it and
start over.
Be sure to use an odd number of copies. Don't want no tied votes
on whether a given bit is a 0 or a 1 :-)
Post by Andrew Sackville-West
there is no way, using only one physical storage medium, to guarantee
no loss of data.
There is no way, using any number of physical storage media, to
guarantee no loss of data.
On any storage medium, if the probability of error in a data bit is less
than 50%, then given any e > 0 there exists an FEC method which reduces
the probability of data loss to be less than e.
If the probability of error on any given bit is greater than 50%,
then there is no way, by adding additional information, to make
the eventual error rate be less than a single copy. The additional
bits are more likely to be in error than the original.
Speaking pedantically, if the probability of error is greater than 50%,
you can complement every bit and get a probability less than 50%.

-- hendrik
Mike McCarty
2006-12-06 18:57:40 UTC
Permalink
Post by h***@topoi.pooq.com
Speaking pedantically, if the probability of error is greater than 50%,
you can complement every bit and get a probability less than 50%.
No, not so. Because on the channels we are discussing, the bits
have three states: 0, 1, and "unable to read". I don't know how
to complement a bit which is in the state "unable to read". The
best I can do is assign it an arbitrary value of 0 or 1.

In some circumstances, this can actually work, and is one of the
reasons that "missing" bits are easier to correct. What one can
do is go through all possible assignments of 0 and 1 to each bit
which is missing. Along the way, one may encounter a combination
which is correctable to a code word (or may actually *be* a code
word). If that happens, one may actually be able to correct. This
can be time consuming, however, as you may guess.

It's true that I spoke simply, because I didn't want to get into
more technical depth than we have already. This whole discussion
is getting pretty far afield from Debian.

Another reason this is not true, is that one has to know
*in advance* that the channel flips bits (not erases them)
in order to do that sort of error correction. In that case, one
has a mis-designed channel. The only channels which I've seen
considered which cause errors on a very high proportion of
bits create missing bits. Anything else would be, well,
crazy.

Mike
--
p="p=%c%s%c;main(){printf(p,34,p,34);}";main(){printf(p,34,p,34);}
This message made from 100% recycled bits.
You have found the bank of Larn.
I can explain it for you, but I can't understand it for you.
I speak only for myself, and I am unanimous in that!
Mike McCarty
2006-12-06 00:27:10 UTC
Permalink
Post by Douglas Tutty
Post by Mike McCarty
Post by Johannes Wiedersich
Post by Douglas Tutty
I'm going to be backing up to a portable ruggedized hard drive.
Currently, my backups end up in tar.bz2 format.
[snip]
Post by Douglas Tutty
Post by Mike McCarty
Post by Johannes Wiedersich
Now, to something completely different....
If data integrity is your concern, then maybe a better solution than
compression is to copy all your data with rsync or another backup tool
that 'mirrors' your files instead of packing them all together in one
large file. If something goes wrong with this large file you might lose
the backup of all your files. If something goes wrong with the
[snip]
My understanding of the BZ2 format is that it compresses individual
blocks independently, and that the loss of a block will not compromise
the entire archive, only those files which are contained in a given
block.
Yes. But I don't want to lose any data at all.
Of course not. I was responding to Johannes' statement that one
risks entire loss. This is true with, for example, gzip of a tar,
but not with bzip2.
Post by Douglas Tutty
I've looked at par2. It looks interesting. For me, the question is how
to implement it for archiving onto a drive since the ECC data are
separate files rather than being included within one data stream.
You could implement your own FEC. A very simple form of FEC is simply
three copies, which you can do by hand. Another possibility is simply
have two copies of the BZ2 and read any bad blocks from the other
copy. This corresponds more closely to the request retransmission
model than FEC, but is reasonable in this circumstance.
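
As a crude sketch of the three-copy idea (untested; device names are
placeholders):

  # write the same archive to three partitions of the archive drive,
  # and keep its exact size so the copies can be compared later
  stat -c %s backup.tar.bz2 > backup.size
  for part in /dev/sdb1 /dev/sdb2 /dev/sdb3; do
      dd if=backup.tar.bz2 of=$part bs=64k
  done
  # on retrieval, any two copies that agree over that length outvote the third
  cmp -n "$(cat backup.size)" /dev/sdb1 /dev/sdb2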

One thing to bear in mind is that, no matter how good an FEC method
you use, you are going to have to store about 2x redundant data
to get anything out of it. IOW, the data + parity information is going
to be about 3x the size of the data alone for any reasonable ability
to correct anything.
Post by Douglas Tutty
Separate files suggest that they be on a file system, and we're back to
where we started since I haven't found a parfs.
I don't understand this statement. If you have a means to create FEC
checksums, and a way to store those, and a way to use the FEC checksums
along with a damaged copy of the file to reconstruct it, then why
do you need some special kind of FS to store it?
Post by Douglas Tutty
I suppose I could use par2 to create the ECC files, then feed the ECC
files one at a time, followed by the main data file, followed by the ECC
files again.
Why two copies of the FEC information?
Post by Douglas Tutty
I'll check out with my zip drive if I can write a tar file directly to
disk without a fs (unless someone knows the answer).
Why do you insist on not having a FS? Even if you don't have an FS,
I don't see why you want to separate the FEC information, unless you
don't have a program which can manage the information you're trying
to store. If that be the case, then the FEC information won't do
any good anyway.

Mike
--
p="p=%c%s%c;main(){printf(p,34,p,34);}";main(){printf(p,34,p,34);}
This message made from 100% recycled bits.
You have found the bank of Larn.
I can explain it for you, but I can't understand it for you.
I speak only for myself, and I am unanimous in that!
Ron Johnson
2006-12-06 00:57:38 UTC
Permalink
Post by Mike McCarty
Post by Douglas Tutty
Post by Mike McCarty
Post by Douglas Tutty
I'm going to be backing up to a portable ruggedized hard drive.
Currently, my backups end up in tar.bz2 format.
[snip]
[snip]
Post by Mike McCarty
Post by Douglas Tutty
Post by Mike McCarty
[snip]
[snip]
Post by Mike McCarty
Post by Douglas Tutty
I've looked at par2. It looks interesting. For me, the question is how
to implement it for archiving onto a drive since the ECC data are
separate files rather than being included within one data stream.
You could implement your own FEC. A very simple form of FEC is simply
Yes, but *why*? Tape storage systems have been using ECC for decades.

There's a whole lot of "Linux people" whose knowledge of computer
history seems to have started in 1991, and thus all the many lessons
learned in 30 years of computing are lost.

- --
Ron Johnson, Jr.
Jefferson LA USA

Is "common sense" really valid?
For example, it is "common sense" to white-power racists that
whites are superior to blacks, and that those with brown skins
are mud people.
However, that "common sense" is obviously wrong.
Mike McCarty
2006-12-06 01:12:11 UTC
Permalink
Post by Mike McCarty
You could implement your own FEC. A very simple form of FEC is simply
Yes, but *why*? Tape storage systems have been using ECC for decades.
You are the only one who can answer this question. AFAIK, tape systems
have *not* been using FEC. The systems I've used have implemented
ED, but not EDAC.
There's a whole lot of "Linux people" whose knowledge of computer
history seems to have started in 1991, and thus all the many lessons
learned in 30 years of computing are lost.
I can't help that.

How about stating some measurable goals in the form of requirements
instead of just complaining that some systems don't meet your needs?
Then perhaps some of us could suggest possible solutions. At present,
I feel like whatever I suggest is going to be met with another objection
that an as-yet-unstated requirement is not being met.

You seem to have just added a requirement that what you use be something
which was used on tape before 1991.

Mike
--
p="p=%c%s%c;main(){printf(p,34,p,34);}";main(){printf(p,34,p,34);}
This message made from 100% recycled bits.
You have found the bank of Larn.
I can explain it for you, but I can't understand it for you.
I speak only for myself, and I am unanimous in that!
Douglas Tutty
2006-12-06 01:33:41 UTC
Permalink
Post by Ron Johnson
Post by Mike McCarty
You could implement your own FEC. A very simple form of FEC is simply
Yes, but *why*? Tape storage systems have been using ECC for decades.
There's a whole lot of "Linux people" whose knowledge of computer
history seems to have started in 1991, and thus all the many lessons
learned in 30 years of computing are lost.
Hi Ron,

I'm hoping someone who can remember computer history prior to 1991 can
give some perspective.

I think (__please__ correct me if I'm wrong) that the tape systems had
the ECC as part of the hardware. Write a plain datastream to the drive
and the drive did the ECC part transparent to the user. Read the data
and a bad block gets fixed by the hardware ECC.

I'm told that modern hard drives also do ECC but I can't find out how
that is implemented. I'm told that if a block starts to fail (whatever
that means) then the data is transferred to a new unallocated block,
transparent to the rest of the computer. Only if the drive runs out of
unallocated blocks does it give errors.

The question is: if a block is successfully written now, and the drive is
not used for 5 years before a read is attempted, is the drive able to
retrieve that data using ECC (as a tape drive could)?

Since I don't __know__ that it can, I'm assuming that it can't. I'm
playing my own devil's advocate and trying to find out how to plan to be
able to read successfully off a drive with bad blocks after years of
sitting on a shelf.
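
One way to take some of the guesswork out of it (a sketch, assuming the
smartmontools package; /dev/sdb is a placeholder) is to exercise the
shelved drive on a schedule rather than hope:

  # run the drive's own long self-test, which reads the whole surface
  smartctl -t long /dev/sdb
  # a few hours later, check the result and the reallocated-sector counts
  smartctl -a /dev/sdb
  # or simply force a full read of the archive partition from the host side
  dd if=/dev/sdb1 of=/dev/null bs=1M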

I'm focusing on the one-drive issue because this is one drive sitting in
a bank vault. This is __archive__ (just like tape). I have backup
procedures as a separate issue. One of the places that backup data goes
to is the bank vault archive.

In the absence of an all-in-one archive format, I'll use tar (which can
detect errors just not fix them) to take care of names, owners,
permissions, etc. Then that tar needs to be made ECC and compressed.
If I want to throw in a monkey, I'll consider encryption.

Yes, tape drives do that. It's probably why they cost so much. Hard
drives are much cheaper and are supposed to be able to hang on to their
data (Seagate gives a 5 year warranty). But having Seagate give me a
new drive when I can't get my data off after 4 years is cold comfort.

The other problem with tapes is their fragility. Drop a DLT and I'm
told that it's toast. Put that tape in the drive and I'm told it can
damage the drive. A laptop drive in a ruggedized enclosure is much more
robust and has a wider environmental range.

Perhaps what I'm looking for doesn't exist. If it doesn't, I'll start
work on it.

As far as computer history prior to 1991, I could never get the hang of
C. I'll stick with fortran77.

Doug.
Mike McCarty
2006-12-06 01:55:40 UTC
Permalink
Post by Douglas Tutty
Post by Ron Johnson
Post by Mike McCarty
You could implement your own FEC. A very simple form of FEC is simply
Yes, but *why*? Tape storage systems have been using ECC for decades.
There's a whole lot of "Linux people" whose knowledge of computer
history seems to have started in 1991, and thus all the many lessons
learned in 30 years of computing are lost.
Hi Ron,
I'm hoping someone who can remember computer history prior to 1991 can
give some perspective.
I think (__please__ correct me if I'm wrong) that the tape systems had
the ECC as part of the hardware. Write a plain datastream to the drive
and the drive did the ECC part transparent to the user. Read the data
and a bad block gets fixed by the hardware ECC.
The 9-track tape drives had two forms of EDAC. One was the fact that
each byte was written with a 9th check bit (even parity, IIRC), allowing
some detection; the other was some FEC on a block basis, IIRC.
Post by Douglas Tutty
I'm told that modern hard drives also do ECC but I can't find out how
that is implemented. I'm told that if a block starts to fail (whatever
that means) then the data is transferred to a new unallocated block,
transparent to the rest of the computer. Only if the drive runs out of
unallocated blocks does it give errors.
That's the way fixed discs work, usually with an 11-bit BCH code, and also
the way DAT works. Whatever other tapes may be in use, I don't know what
they do. CDs also use some FEC (two Reed-Solomon codes, one for
thumbprint correction, and one for long-burst errors), but I don't know
whether they do that when using a data format, as opposed to music.
Post by Douglas Tutty
The question is: if a block is successfully written now, and the drive is
not used for 5 years before a read is attempted, is the drive able to
retrieve that data using ECC (as a tape drive could)?
I thought the question was "How can I be sure I can get my data back?"
So far, some people have suggested a few techniques to accomplish that,
but all I've seen is complaints in response.

I guess I don't know what the question is.
Post by Douglas Tutty
Since I don't __know__ that it can, I'm assuming that it can't. I'm
playing my own devil's advocate and trying to find out how to plan to be
able to read successfully off a drive with bad blocks after years of
sitting on a shelf.
IF this is what your goal is, then, as I pointed out, you can implement
your own FEC.
Post by Douglas Tutty
I'm focusing on the one-drive issue because this is one drive sitting in
a bank vault. This is __archive__ (just like tape). I have backup
procedures as a separate issue. One of the places that backup data goes
to is the bank vault archive.
If the issue is a drive, then you need more than one drive. If the
drive itself fails, then you are SOL.
Post by Douglas Tutty
In the absence of an all-in-one archive format, I'll use tar (which can
detect errors just not fix them) to take care of names, owners,
permissions, etc. Then that tar needs to be made ECC and compressed.
If I want to throw in a monkey, I'll consider encryption.
Yes, tape drives do that. It's probably why they cost so much. Hard
drives are much cheaper and are supposed to be able to hang on to their
data (Seagate gives a 5 year warranty). But having Seagate give me a
new drive when I can't get my data off after 4 years is cold comfort.
Hard disc drives use FEC (usually a BCH 11 bit redundant code).
If you want to be able to read more reliably than the FEC already
present, you'll have to add your own.
Post by Douglas Tutty
The other problem with tapes is their fragility. Drop a DLT and I'm
told that it's toast. Put that tape in the drive and I'm told it can
The most common cause of that is edge damage.
Post by Douglas Tutty
damage the drive. A laptop drive in a ruggedized enclosure is much more
robust and has a wider environmental range.
Perhaps what I'm looking for doesn't exist. If it doesn't, I'll start
work on it.
Hmm. You going to become an expert at designing ECC? I suggest you take
a course in abstract algebra first.
Post by Douglas Tutty
As far as computer history prior to 1991, I could never get the hang of
C. I'll stick with fortran77.
Which programming language one uses is less important than the algorithm
implemented.

Mike
--
p="p=%c%s%c;main(){printf(p,34,p,34);}";main(){printf(p,34,p,34);}
This message made from 100% recycled bits.
You have found the bank of Larn.
I can explain it for you, but I can't understand it for you.
I speak only for myself, and I am unanimous in that!
Douglas Tutty
2006-12-06 02:13:06 UTC
Permalink
Post by Mike McCarty
Post by Douglas Tutty
The question is: if a block is successfully written now, and the drive is
not used for 5 years before a read is attempted, is the drive able to
retrieve that data using ECC (as a tape drive could)?
I thought the question was "How can I be sure I can get my data back?"
So far, some people have suggested a few techniques to accomplish that,
but all I've seen is complaints in response.
I'm not complaining, Mike. Also, note who's saying what; there are a few
voices in this conversation.

Please don't take my questions the wrong way.
I am very grateful for the wisdom. I'm just trying to tease apart
where failures can occur and what can mitigate them.
Post by Mike McCarty
I guess I don't know what the question is.
Post by Douglas Tutty
Since I don't __know__ that it can, I'm assuming that it can't. I'm
playing my own devil's advocate and trying to find out how to plan to be
able to read successfully off a drive with bad blocks after years of
sitting on a shelf.
IF this is what your goal is, then, as I pointed out, you can implement
your own FEC.
Yes, I could. The original question was to see if one existed already.
Post by Mike McCarty
Post by Douglas Tutty
I'm focusing on the one-drive issue because this is one drive sitting in
a bank vault. This is __archive__ (just like tape). I have backup
procedures as a separate issue. One of the places that backup data goes
to is the bank vault archive.
If the issue is a drive, then you need more than one drive. If the
drive itself fails, then you are SOL.
So drive failures are atomic? I.e. if in 5 years I go to read a drive
and it has errors, everything is toast? I'm wanting a data-stream
format (call it a file system, an archive format, whatever) that can
withstand those errors.
Post by Mike McCarty
Post by Douglas Tutty
In the absence of an all-in-one archive format, I'll use tar (which can
detect errors just not fix them) to take care of names, owners,
permissions, etc. Then that tar needs to be made ECC and compressed.
If I want to throw in a monkey, I'll consider encryption.
Yes, tape drives do that. It's probably why they cost so much. Hard
drives are much cheaper and are supposed to be able to hang on to their
data (Seagate gives a 5 year warranty). But having Seagate give me a
new drive when I can't get my data off after 4 years is cold comfort.
Hard disc drives use FEC (usually a BCH 11 bit redundant code).
If you want to be able to read more reliably than the FEC already
present, you'll have to add your own.
Post by Douglas Tutty
The other problem with tapes is their fragility. Drop a DLT and I'm
told that it's toast. Put that tape in the drive and I'm told it can
The most common cause of that is edge damage.
Post by Douglas Tutty
damage the drive. A laptop drive in a ruggedized enclosure is much more
robust and has a wider environmental range.
Perhaps what I'm looking for doesn't exist. If it doesn't, I'll start
work on it.
Hmm. You going to become an expert at designing ECC? I suggest you take
a course in abstract algebra first.
I can (or at least I used to be able to) do the algebra, but that's not
the issue. There are programs like par2 that do the ECC stuff but put
it in separate files. If I went that route, I'd just have to pack it all
together.
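
Packing it together is simple enough, since tar doesn't care what it
contains (a sketch; file names and the device are placeholders):

  # generate the recovery files, then bundle data and recovery data
  # into a single stream written to the archive partition
  par2 create -r15 backup.tar.bz2
  tar -cf /dev/sdb1 backup.tar.bz2 backup.tar.bz2*.par2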
Post by Mike McCarty
Post by Douglas Tutty
As far as computer history prior to 1991, I could never get the hang of
C. I'll stick with fortran77.
Which programming language one uses is less important than the algorithm
implemented.
True.

Thanks.

Doug.
Mike McCarty
2006-12-06 03:43:55 UTC
Permalink
[snip]
Post by Douglas Tutty
I'm not complaining, Mike. Also, note who's saying what; there are a few
voices in this conversation.
Sorry, did I miss an attribution? If so, then I apologize.
Post by Douglas Tutty
Please don't take my questions the wrong way.
I am very grateful for the wisdom. I'm just trying to tease apart
where failures can occur and what can mitigate them.
Fair enough. See my other message which describes hypothetical
data recovery on a damaged set of CDROMs.
Post by Douglas Tutty
Post by Mike McCarty
I guess I don't know what the question is.
Post by Douglas Tutty
Since I don't __know__ that it can, I'm assuming that it can't. I'm
playing my own devil's advocate and trying to find out how to plan to be
able to read successfully off a drive with bad blocks after years of
sitting on a shelf.
IF this is what your goal is, then, as I pointed out, you can implement
your own FEC.
Yes, I could. The original question was to see if one existed already.
Yes, FEC is used on all modern technology data storage that I know of,
with the possible exception of CDROMs. I haven't studied the low level
data storage format they use to know whether they use any FEC when
storing data as opposed to music. I know the music format uses nested
Reed-Solomon codes. For all I know, the ISO format has FEC embedded
in it as part of the FS, though I doubt it.
Post by Douglas Tutty
Post by Mike McCarty
Post by Douglas Tutty
I'm focusing on the one-drive issue because this is one drive sitting in
a bank vault. This is __archive__ (just like tape). I have backup
procedures as a separate issue. One of the places that backup data goes
to is the bank vault archive.
If the issue is a drive, then you need more than one drive. If the
drive itself fails, then you are SOL.
So drive failures are atomic? I.e. if in 5 years I go to read a drive
If in X years you plug the drive into your machine, and smoke
pours out, you are going to have difficulty reading any medium
you may put into it. You are banking on the format of the tape
or whatever not changing in that time. If the format becomes
obsolete, and your drive fails, then you are SOL.

[snip]
Post by Douglas Tutty
Post by Mike McCarty
Hmm. You going to become an expert at designing ECC? I suggest you take
a course in abstract algebra first.
I can (or at least I used to be able to) do the algebra, but that's not
the issue. There are programs like par2 that do the ECC stuff but put
it in separate files. If I went that route, I'd just have to pack it all
together.
I don't understand that. It shouldn't matter where the correction bits
get stored, so long as the number of bits that get damaged is less than
what the code can correct.

Mike
--
p="p=%c%s%c;main(){printf(p,34,p,34);}";main(){printf(p,34,p,34);}
This message made from 100% recycled bits.
You have found the bank of Larn.
I can explain it for you, but I can't understand it for you.
I speak only for myself, and I am unanimous in that!
Douglas Tutty
2006-12-06 04:15:17 UTC
Permalink
Post by Mike McCarty
Yes, FEC is used on all modern technology data storage that I know of,
with the possible exception of CDROMs. I haven't studied the low level
data storage format they use to know whether they use any FEC when
storing data as opposed to music. I know the music format uses nested
Reed-Solomon codes. For all I know, the ISO format has FEC embedded
in it as part of the FS, though I doubt it.
If FEC is used on all media (except CDROM), is there any value in adding
my own FEC layer over top or should I just format the drive JFS and copy
my tar.bz2 backup file to it and be done? (remembering that the drive in
the bank is only one of the sets of data I keep).
Post by Mike McCarty
Post by Douglas Tutty
Post by Mike McCarty
Post by Douglas Tutty
I'm focusing on the one-drive issue because this is one drive sitting in
a bank vault. This is __archive__ (just like tape). I have backup
procedures as a separate issue. One of the places that backup data goes
to is the bank vault archive.
If the issue is a drive, then you need more than one drive. If the
drive itself fails, then you are SOL.
So drive failures are atomic? I.e. if in 5 years I go to read a drive
If in X years you plug the drive into your machine, and smoke
pours out, you are going to have difficulty reading any medium
you may put into it. You are banking on the format of the tape
or whatever not changing in that time. If the format becomes
obsolete, and your drive fails, then you are SOL.
True.

Thanks,

Doug.
Mike McCarty
2006-12-06 05:09:19 UTC
Permalink
Post by Douglas Tutty
If FEC is used on all media (except CDROM), is there any value in adding
my own FEC layer over top or should I just format the drive JFS and copy
my tar.bz2 backup file to it and be done? (remembering that the drive in
the bank is only one of the sets of data I keep).
Only you can answer that question.

I didn't say that FEC isn't used on CDROMs, I said I don't know
whether it be used. It's something worth investigating, though!

Mike
--
p="p=%c%s%c;main(){printf(p,34,p,34);}";main(){printf(p,34,p,34);}
This message made from 100% recycled bits.
You have found the bank of Larn.
I can explain it for you, but I can't understand it for you.
I speak only for myself, and I am unanimous in that!
Ron Johnson
2006-12-06 05:42:03 UTC
Permalink
Post by Mike McCarty
[snip]
[snip]
Post by Mike McCarty
If in X years you plug the drive into your machine, and smoke
pours out, you are going to have difficulty reading any medium
you may put into it. You are banking on the format of the tape
or whatever not changing in that time. If the format becomes
obsolete, and your drive fails, then you are SOL.
Sadly, this is a perfect argument for why businesses go with
"popular" instead of "best": you can still buy 9-track tape drives
that read IBM and DEC formatted tapes, as well as 3480/3490 tapes
and DEC TK50 tapes. There are a few companies that can even still
read DECtapes.

It's the esoteric stuff (3rd-tier and specialized equipment) that's
impossible to read now.

- --
Ron Johnson, Jr.
Jefferson LA USA

Is "common sense" really valid?
For example, it is "common sense" to white-power racists that
whites are superior to blacks, and that those with brown skins
are mud people.
However, that "common sense" is obviously wrong.
h***@topoi.pooq.com
2006-12-06 14:45:46 UTC
Permalink
Post by Mike McCarty
Post by Douglas Tutty
I'm focusing on the one-drive issue because this is one drive sitting in
a bank vault. This is __archive__ (just like tape). I have backup
procedures as a separate issue. One of the places that backup data goes
to is the bank vault archive.
If the issue is a drive, then you need more than one drive. If the
drive itself fails, then you are SOL.
And maybe a second bank with a separate vault.

-- hendrik
Ron Johnson
2006-12-06 03:20:45 UTC
Permalink
Post by Douglas Tutty
Post by Ron Johnson
Post by Mike McCarty
You could implement your own FEC. A very simple form of FEC is simply
Yes, but *why*? Tape storage systems have been using ECC for decades.
There's a whole lot of "Linux people" whose knowledge of computer
history seems to have started in 1991, and thus all the many lessons
learned in 30 years of computing are lost.
Hi Ron,
I'm hoping someone who can remember computer history prior to 1991 can
give some perspective.
I think (__please__ correct me if I'm wrong) that the tape systems had
the ECC as part of the hardware. Write a plain datastream to the drive
and the drive did the ECC part transparent to the user. Read the data
and a bad block gets fixed by the hardware ECC.
I'm told that modern hard drives also do ECC but I can't find out how
that is implemented. I'm told that if a block starts to fail (whatever
that means) then the data is transferred to a new unallocated block,
transparent to the rest of the computer. Only if the drive runs out of
unallocated blocks does it give errors.
The question is: if a block is successfully written now, and the drive is
not used for 5 years before a read is attempted, is the drive able to
retrieve that data using ECC (as a tape drive could)?
Mike is correct, disk drive blocks do have ECC.

Remember, though, that drives are delicate mechanisms, and so the
problem I see is the lubricating oil possibly thickening, and thus
the drive not spinning up properly. Hopefully the bad spin-up would
not cause the r/w head to gouge the platter. Otherwise, the data
could still be retrieved, easily, for a price, from a data recovery
company.

[snip]
Post by Douglas Tutty
In the absence of an all-in-one archive format, I'll use tar (which can
detect errors just not fix them) to take care of names, owners,
permissions, etc. Then that tar needs to be made ECC and compressed.
If I want to throw in a monkey, I'll consider encryption.
Remember what "tar" means: Tape ARchive. It's designed as a
container file.

OTOH, if you're backing up a hard disk, you could do file-by-file
backups, compressing the big, compressible files, and leaving alone
the not-so-compressible files. Thus, if a sector goes blooey,
you've still got most of your data.
Post by Douglas Tutty
Yes, tape drives do that. It's probably why they cost so much. Hard
And lower production volumes.
Post by Douglas Tutty
drives are much cheaper and are supposed to be able to hang on to their
data (Seagate gives a 5 year warranty). But having Seagate give me a
new drive when I can't get my data off after 4 years is cold comfort.
The other problem with tapes is their fragility. Drop a DLT and I'm
told that it's toast. Put that tape in the drive and I'm told it can
damage the drive.
We've used DLT drives for years, and never had that problem.
Post by Douglas Tutty
A laptop drive in a ruggedized enclosure is much more
robust and has a wider environmental range.
Drop a HDD and you've got worse problems.
Post by Douglas Tutty
Perhaps what I'm looking for doesn't exist. If it doesn't, I'll start
work on it.
As far as computer history prior to 1991, I could never get the hang of
C. I'll stick with fortran77.
Give me VAX COBOL. But then, I've always been on the DP side.

- --
Ron Johnson, Jr.
Jefferson LA USA

Is "common sense" really valid?
For example, it is "common sense" to white-power racists that
whites are superior to blacks, and that those with brown skins
are mud people.
However, that "common sense" is obviously wrong.
Ron Johnson
2006-12-06 03:31:02 UTC
Permalink
[snip]
Post by Ron Johnson
Post by Douglas Tutty
The question is: if a block is successfully written now, and the drive is
not used for 5 years before a read is attempted, is the drive able to
retrieve that data using ECC (as a tape drive could)?
Mike is correct, disk drive blocks do have ECC.
Remember, though, that drives are delicate mechanisms, and so the
problem I see is the lubricating oil possibly thickening, and thus
the drive not spinning up properly. Hopefully the bad spin-up would
not cause the r/w head to gouge the platter. Otherwise, the data
could still be retrieved, easily, for a price, from a data recovery
company.
Forgot to mention: for important data, we make multiple copies of
the data, so if one of the tapes has too many errors for the EDAC to
handle, the other tape hopefully won't be bad.

I'd do the same thing with disks.

- --
Ron Johnson, Jr.
Jefferson LA USA

Is "common sense" really valid?
For example, it is "common sense" to white-power racists that
whites are superior to blacks, and that those with brown skins
are mud people.
However, that "common sense" is obviously wrong.
Douglas Tutty
2006-12-06 01:46:03 UTC
Permalink
Post by Mike McCarty
Post by Douglas Tutty
I've looked at par2. It looks interesting. For me, the question is how
to implement it for archiving onto a drive since the ECC data are
separate files rather than being included within one data stream.
You could implement your own FEC. A very simple form of FEC is simply
three copies, which you can do by hand. Another possibility is simply
have two copies of the BZ2 and read any bad blocks from the other
copy. This corresponds more closely to the request retransmission
model than FEC, but is reasonable in this circumstance.
One thing to bear in mind is that, no matter how good an FEC method
you use, you are going to have to store about 2x redundant data
to get anything out of it. IOW, the data + parity information is going
to be about 3x the size of the data alone for any reasonable ability
to correct anything.
Par2 seems to be able to do it at about 15%. It comes down to number
theory and how many corrupted data blocks one needs to be able to
handle. If 100 % of the data blocks are unavailable (worst case) then
you need 100% redundant data (i.e. raid1).
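
For reference (a sketch; the numbers are only examples), the redundancy
level and the number of recovery volumes are options at creation time:

  # about 15% recovery data, split across 7 recovery files
  par2 create -r15 -n7 backup.tar.bz2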
Post by Mike McCarty
Post by Douglas Tutty
Separate files suggest that they be on a file system, and we're back to
where we started since I haven't found a parfs.
I don't understand this statement. If you have a means to create FEC
checksums, and a way to store those, and a way to use the FEC checksums
along with a damaged copy of the file to reconstruct it, then why
do you need some special kind of FS to store it?
My statement refers to using par2, which doesn't touch the input file(s)
but generates the error-correcting data as separate files.

What does FEC stand for? I think ECC stands for Error Checking and
Correcting.
Post by Mike McCarty
Post by Douglas Tutty
I suppose I could use par2 to create the ECC files, then feed the ECC
files one at a time, followed by the main data file, followed by the ECC
files again.
Why two copies of the FEC information?
What if two blocks on the drive fail, one containing data, the other
containing the ECC info?
Post by Mike McCarty
Post by Douglas Tutty
I'll check out with my zip drive if I can write a tar file directly to
disk without a fs (unless someone knows the answer).
Why do you insist on not having a FS? Even if you don't have an FS,
I don't see why you want to separate the FEC information, unless you
don't have a program which can manage the information you're trying
to store. If that be the case, then the FEC information won't do
any good anyway.
I don't insist on not having a FS. But how well does a FS work when bad
blocks crop up? If it doesn't incorporate ECC itself then it either
drops the data from the bad blocks or at worst can't be mounted. The
question is, do I need a FS? If I don't, isn't it just one more
potential point of failure?

Thank you all for the discussion.

Doug.
Mike McCarty
2006-12-06 03:07:33 UTC
Permalink
[snip]
Post by Douglas Tutty
Post by Mike McCarty
One thing to bear in mind is that, no matter how good an FEC method
you use, you are going to have to store about 2x redundant data
to get anything out of it. IOW, the data + parity information is going
to be about 3x the size of the data alone for any reasonable ability
to correct anything.
Par2 seems to be able to do it at about 15%. It comes down to number
theory and how many corrupted data blocks one needs to be able to
handle. If 100 % of the data blocks are unavailable (worst case) then
you need 100% redundant data (i.e. raid1).
15% to do what?

I have designed some BCH FEC codes for a few systems, so I think I
have a reasonable feel for what is involved.

What you describe is correct only if each bit has only three values:
0, 1, and missing. If the transmission channel can only change a bit
from 0 to missing, or 1 to missing, but no other values, then 100%
redundancy is adequate for single error correction. If other types of
damage may occur, then 100% redundancy is not adequate. A distance 2
code is adequate if the only changes the channel introduces is missing
bits. But if not all bits may be distinguished as damaged, then at least
a distance 3 code is required, which needs more than 100% redundancy.
Post by Douglas Tutty
Post by Mike McCarty
Post by Douglas Tutty
Separate files suggest that they be on a file system, and we're back to
where we started since I haven't found a parfs.
I don't understand this statement. If you have a means to create FEC
checksums, and a way to store those, and a way to use the FEC checksums
along with a damaged copy of the file to reconstruct it, then why
do you need some special kind of FS to store it?
My statement refers to using par2, which doesn't touch the input file(s)
but generates the error-correcting data as separate files.
Wherever they get stored is irrelevant, except insofar as it may aid
the code in burst detection and correction.
Post by Douglas Tutty
What does FEC stand for? I think ECC stands for Error Checking and
Correcting.
FEC = Forward Error Correction. When a transmission channel makes
error detection with request for retransmission infeasible (like
with space missions, or when the data are recorded, and no other
copy exists to use as a retransmission source, for examples) then
one uses some form of FEC. ECC = Error Correcting Code, which refers
to the code itself, not the technique. EDAC = Error Detection And
Correction, which refers to any number of techniques which may
include error detection with request for retransmit, or FEC,
for examples.
Post by Douglas Tutty
Post by Mike McCarty
Post by Douglas Tutty
I suppose I could use par2 to create the ECC files, then feed the ECC
files one at a time, followed by the main data file, followed by the ECC
files again.
Why two copies of the FEC information?
What if two blocks on the drive fail, one containing data, the other
containing the ECC info?
Then the information in the check and data bits is used to correct them.

In a properly designed code the check bits are themselves part of the
correctable data, so that errors in them are correctable. The check bits
are not treated any differently from any other bits. They are all just
data. If the total number of bits which are damaged does not exceed
the ability of the code to correct, then they are all recovered.
Post by Douglas Tutty
Post by Mike McCarty
Post by Douglas Tutty
I'll check out with my zip drive if I can write a tar file directly to
disk without a fs (unless someone knows the answer).
Why do you insist on not having a FS? Even if you don't have an FS,
I don't see why you want to separate the FEC information, unless you
don't have a program which can manage the information you're trying
to store. If that be the case, then the FEC information won't do
any good anyway.
I don't insist on not having a FS. But how well does a FS work when bad
blocks crop up? If it doesn't incorporate ECC itself then it either
drops the data from the bad blocks or at worst can't be mounted. The
question is, do I need a FS? If I don't, isn't it just one more
potential point of failure?
How well does the disc work with bad blocks? If you have errors which
the disc itself cannot correct, then you are going to have to do
very low level recovery, indeed. That is why I suggested not doing that,
but rather use your own FEC, by having redundant copies of the entire
disc. In this wise, you don't have to go about trying to recover
whatever high level information may be on the disc, and fixing the
data storage format itself. If you do that, then it doesn't matter
whether there is a file system present, and if it is present whether
it can recover from corrupted blocks. You do the low level recovery,
then whatever data were on the discs are recovered.

To put it another way, if the disc is unable to read its platters,
then it can't and you aren't going to get data, anyway, for those
sectors. It's better then, not to try to layer on top of something
that is going to lose large blocks by trying to do it on the
same device, but rather to rely on a separate device. To get data
for those sectors, you'd have to issue low level commands to the
controller, instructing it to do long-reads and ignore errors.

I've done that once with 360K floppies, and recovered data that
way, but I wouldn't want to go through learning how to do that
again for whatever hard disc I might have. For one thing, modern
discs do sector remapping, and I'd have to go through all that
rigamarole of finding out (if the manufacturer would even disclose)
how that takes place, and how to instruct the disc to ignore it,
and what the FEC code used on the sectors is, etc.

An added benefit of this is that it essentially creates a code which
has the ability to correct bursts as long as the whole disc, which is
quite long, indeed.

To put it another way, suppose we use CDROMs to store our information
in ISO format, and make three copies. Suppose that we get read errors
on the discs later. The easy way to handle this is to rip the ISO
images from the three discs, errors and all. Then we use the majority
rule to examine each bit. If two of the images agree that a bit is
a 1, then we put out a 1. If two of the images agree that a bit is
a 0, then we put out a 0. When we are through, then if there are
not any double errors, we have a correct ISO image which may be
mounted, or written to new media or whatever. It doesn't matter
whether the data form a file system, nor whether the file system
have any ability to correct errors. What matters is that, whatever
the format of the data on the storage medium, we can recover it.
We don't try to repair the file system, we repair the underlying bits.
Repairing the file system is too laborious.
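
[A rough sketch of that majority rule in Python, voting byte-wise rather
than bit-wise, which corrects the same failures as long as no offset is
bad in more than one rip; the file names are invented:]

    def majority_merge(rips, out_path, chunk=1 << 20):
        """Two-out-of-three vote across three rips of the same image."""
        files = [open(p, "rb") for p in rips]
        try:
            with open(out_path, "wb") as out:
                while True:
                    blocks = [f.read(chunk) for f in files]
                    if not blocks[0]:
                        break
                    # if any two rips agree on a byte, that value wins; if all
                    # three disagree, it is a double error and cannot be fixed
                    out.write(bytes(a if a == b or a == c else b
                                    for a, b, c in zip(*blocks)))
        finally:
            for f in files:
                f.close()

    # majority_merge(["rip1.iso", "rip2.iso", "rip3.iso"], "recovered.iso")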

If we can't recover an ISO image (or other file system image), then we
can't recover raw data written to disc, either. If we can recover a raw
disc write, then we can also recover an ISO image, or any other file
system format. It's all just bits recorded on a single long spiral on
the disc. To the bits themselves, it makes no difference if they got
there to make up a file system, or got there as a raw image. They are
all raw images, in the end. It's just a matter of a raw image of what.
A raw image of a file system is still just bits.

So, ISTM that whether any file system be used for storage is irrelevant,
so one might as well go ahead and use a file system for ease of mount
and read, unless the overhead of the directory entries is something
one wants to avoid. But in that case, one is going to lose permissions,
and dates, etc.

OTOH, if you want to save space, then a raw image of a compressed
archive file like a tarball will be smaller. But that is a separate
issue from data recovery.

Mike
--
p="p=%c%s%c;main(){printf(p,34,p,34);}";main(){printf(p,34,p,34);}
This message made from 100% recycled bits.
You have found the bank of Larn.
I can explain it for you, but I can't understand it for you.
I speak only for myself, and I am unanimous in that!
Douglas Tutty
2006-12-06 04:01:49 UTC
Permalink
Thanks Mike,

If I can attempt to summarize a portion of what you said:

If the issue is resistance to data block errors, it doesn't
	matter if I use a file system or not, so I may as well use a file
	system; then, if I have difficulty, rip multiple copies of the file
system bit by bit and do majority rules.

There's a package (forget the name) that will do this
with files: take multiple damaged copies and make one
good copy if possible.


Does the kernel software-raid in raid1 do this? Would there be any
advantage/disadvantage to putting three partitions on the drive and
setting them up as raid1? (and record the partition table [sfdisk -d]
separately)?


Googling this topic, I find sporadic posts on different forums wishing
for something like this, but there doesn't seem to be anything
off-the-shelf for Linux. It seems to be what the data security
companies get paid for (e.g. the Veritas filesystem). Do you know of
anything?

I understand your description of FEC and I guess that's what we're
talking about. In the absence of a filesystem that does it, I want a
program that takes a data stream (e.g. a tar.bz2 archive) and embeds FEC
data in it so it can be stored, then later can take that data and
generate the original data stream.

Do I understand you correctly that the FEC-embedded data stream to be
effective will be three times the size of the input stream?

Does it matter if this FEC data is embedded with the data or appended?

If this doesn't exist for linux, do you know of any open-source
non-linux implementations that just need some type of porting? I've
found a couple of technical papers discussing the algorithms
(Reed-Solomon) used in the par2 archive that I'll study.

Thanks,

Doug.
Mike McCarty
2006-12-06 05:05:12 UTC
Permalink
Post by Douglas Tutty
Thanks Mike,
If the issue is resistance to data block errors, it doesn't
matter if I use a file system or not so I may as well use a file
system then if have difficulty, rip multiple copies of the file
system bit by bit and do majority rules.
Well, not quite. "Majority rules" is a simple BCH code which anyone
could do with a simple program. All you need is an odd number of
copies. The distance of the code is the number of copies, so the
number of correctable errors (per bit, I mean) is (n-1)/2. So, with
three copies, one can correct any single error, with five copies
one can correct double errors (in a single bit), etc. It's not
the most efficient, but it is very very simple.
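
[The same idea per byte rather than per bit, for any odd number of
copies; as stated above, it recovers every position that is wrong in at
most (n-1)/2 of the n copies. A minimal sketch:]

    from collections import Counter

    def vote(copies):
        """Majority vote over an odd number of copies of the same data."""
        assert len(copies) % 2 == 1, "use an odd number of copies"
        return bytes(Counter(column).most_common(1)[0][0]
                     for column in zip(*copies))

    # five copies tolerate two bad copies at any given position:
    data = b"backup payload"
    copies = [bytearray(data) for _ in range(5)]
    copies[0][3] ^= 0xFF
    copies[2][3] ^= 0x55
    assert vote(copies) == data
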
Post by Douglas Tutty
There's a package (forget the name) that will do this
with files: take multiple damaged copies and make one
good copy if possible.
Multiple copies on a single medium are not, IMO, advisable. There are
errors which can occur which prevent reading a single bit anywhere.
If you sit on a CDROM, you may make it completely unreadable :-)
Post by Douglas Tutty
Does the kernel software-raid in raid1 do this? Would there be any
advantage/disadvantage to putting three partitions on the drive and
setting them up as raid1? (and record the partition table [sfdisk -d]
separately)?
I am not familiar with the internals of Linux' software RAID.
Post by Douglas Tutty
Googling this topic, I find sporatic posts on different forums whishing
for something like this but there doesn't seem to be anything
off-the-shelf for linux. It seems to be what the data security
companies get paid for (e.g. the veritas filesystem). Do you know of
anything?
I have used Veritas on some telephony equipment, and it worked well
enough.
Post by Douglas Tutty
I understand you description of FEC and I guess that's what we're
talking about. In the absence of a filesystem that does it, I want a
program that takes a data stream (e.g. a tar.bz2 archive) and imbeds FEC
data in it so it can be stored, then later can take that data and
generate the origional data stream.
Do I understand you correctly that the FEC-embedded data stream to be
effective will be three times the size of the input stream?
That is correct. The larger the Hamming distance used in the code, the
more redundancy is required. It is unfortunate that the very best codes
we have been able to design fall quite a bit short of what is the
theoretical limit, which is itself somewhat disappointing to those
with naive expectations. Generally, one uses about 1/3 data and 2/3
check bits.
Post by Douglas Tutty
Does it matter if this FEC data is embeddded with the data or appended?
Depends on what you mean by the question. First, you presuppose that
in an arbitrary code, the original data survive intact, and some
additional redundancy is added. This would be called a systematic
code. Not all codes are systematic, though the very best practical
block[1] codes we have can all be put into systematic form. The very
best block codes we've cooked up for experimentation purposes are *not*
systematic, and the original data do not survive in any distinct form
apart from the entire code word. IOW, in these codes there aren't
distinguishable "data bits" and "check bits", there are just
"code bits", and without going through the decoding process, there
is no simple way to get the data bits back. AFAIK, none of these
experimental codes has ever been applied in a practical system,
as coding and decoding are too problematic.[2] But they are much
more compact codes, and more closely approach the theoretical
limits on the total code word size required to achieve a given
degree of correction ability.

Second, one of the best ways to make a code which can correct
bursts as well as independent bit errors (many media are
susceptible to bursts) is to use interleaved codes.

Here's a simple example. Let's go back to the "transmit each bit
three times" example I gave of CDROMs. That code is an example
of an interleaved code, which is one of the reasons it has such
good burst correcting capabilities. If we used a non-interleaved
code, we would simply record each bit three times on the disc,
and let it span multiple discs. Then it would have single bit
correcting ability, but no burst correcting ability.

If, OTOH, we have three discs which are each a complete copy,
then we can recover from any single burst which is less or equal to
any single disc in length. This is an interleaved code.

If we use 5x transmission, we can recover from any double bit
error, and any burst of up to two bits as well. But if we use
5 separate discs, once again we can recover from any single burst
which spans up to two discs.

The codes used on CDs for music are interleaved Reed-Solomon codes,
one used for "fingerprints" (local errors) and another which spans
about half the circumference for "track errors" or whatever they
are called. Long bursts, anyway. That's AIUI; I'm not an expert on
CDs by any means.
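
[A toy illustration of the burst point, not a model of the real CD
codes: store a message either with each byte tripled in place, or as
three complete copies back to back (the layout called interleaved
above), and see what a contiguous burst does to each:]

    data = bytes(range(40))
    n = len(data)

    inline = bytes(b for b in data for _ in range(3))   # each byte stored 3x in a row
    copies = data * 3                                   # three complete copies

    def burst(buf, start, length):
        out = bytearray(buf)
        for i in range(start, start + length):
            out[i] ^= 0xFF
        return bytes(out)

    def vote3(a, b, c):
        return bytes(x if x == y or x == z else y for x, y, z in zip(a, b, c))

    # a burst of just 3 bytes wipes out all copies of one byte in the inline layout
    hit = burst(inline, 9, 3)
    assert vote3(hit[0::3], hit[1::3], hit[2::3]) != data

    # a burst as long as a whole copy never costs more than one copy of any
    # position in the back-to-back layout, so the vote still recovers everything
    hit = burst(copies, 9, n)
    assert vote3(hit[:n], hit[n:2*n], hit[2*n:]) == data
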
Post by Douglas Tutty
If this doesn't exist for linux, do you know of any open-source
non-linux implementations that just need some type of porting? I've
found a couple of technical papers discussing the algorithms
(Reed-Solomon) used in the par2 archive that I'll study.
I do not. Most implementations are Reed-Solomon, or some other
variation of BCH codes (of which the RS codes are a subset),
and are closely guarded secrets.

Well, I guess there are some RAID systems which are open source
and which could be used for that sort of thing. Again, using
the idea of CDROMs, one could make a bunch of them look like
a RAID system. I suppose that the software would require some
serious hacking to make it able to deal with that sort of set
up.

[1] I've been presuming block codes as opposed to stream
codes in all that I've written so far, though most of
what I've said also applies to stream codes.

[2] These codes all require enormous look-up tables and
CPU time, or even larger look-up tables if one wants to
use less CPU. The code words are essentially random bits
which seemingly cannot be generated by some algorithmic
procedure. Any code which would be capable of encoding
any reasonable amount of data would require a table which
wouldn't fit into any reasonable amount of memory. IOW,
a code capable of encoding 128 byte blocks, would require
perhaps another 200 bytes of check information, and the
code word would be perhaps 320 bytes long. To do decoding would
require a table of 2560 bits in each entry, and would require
2^128 entries. Hardly a practical code. These values
(which I made up) are extrapolations from what we observe
in creating some small block codes which are efficient
in code word size. This all points up that the word
"efficient" has different meanings in different contexts.

Mike
--
p="p=%c%s%c;main(){printf(p,34,p,34);}";main(){printf(p,34,p,34);}
This message made from 100% recycled bits.
You have found the bank of Larn.
I can explain it for you, but I can't understand it for you.
I speak only for myself, and I am unanimous in that!
h***@topoi.pooq.com
2006-12-06 15:08:50 UTC
Permalink
Post by Douglas Tutty
Thanks Mike,
If the issue is resistance to data block errors, it doesn't
matter if I use a file system or not so I may as well use a file
system then if have difficulty, rip multiple copies of the file
system bit by bit and do majority rules.
There's a package (forget the name) that will do this
with files: take multiple damaged copies and make one
good copy if possible.
Does the kernel software-raid in raid1 do this? Would there be any
advantage/disadvantage to putting three partitions on the drive and
setting them up as raid1? (and record the partition table [sfdisk -d]
separately)?
If the drive electronics fails, for example, or a piece of abrasive
dirt is on the head during a seek, you lose all three partitions.

Better to have one partition on each of three separate drives.

My strategy?

* RAID1 with two drives
* reiserfs on the RAID (although I have been told that reiser has bad
resistance to power failures, I haven't changed yet; it's wonderfully
resilient to the software crashes I've been experiencing)
* backup by copying everything onto a dismountable hard disk and keeping
it on a shelf
* critical data kept in textual form and checked into monotone, which is
to be sync'ed to monotone repositories elsewhere (still setting this
up).

-- hendrik
Douglas Tutty
2006-12-06 17:33:45 UTC
Permalink
Post by h***@topoi.pooq.com
If the drive electronics fails, for example, or a piece of abrasive
dirt is on the head during a seekm you lose all three partitions.
Better to have one partition on each of three separate drives.
My strategy?
* RAID1 with two drives
* reiserfs on the RAID (although I have been told that reiser has bad
resistance to power failures, I haven't changed yet; it's wonderfully
resilient to the software crashes I've been experiencing)
* backup by copying everything onto a dismountable hard disk and keeping
it on a shelf
* critical data kept in textual form and checked into monotone, which is
to be sync'ed to monotone repositories elsewhere (still setting this
up).
This is similar to my approach except that I don't use monotone; I keep
absolutely critical data in several formats on different media in
different locations (one copy to my parents, for example).

It's the drive-on-the-shelf(-in-the-bank) issue I'm focusing on. What is
the best way to protect the data on that drive?

Since I use raid1 for my 80 GB drives, I can add that external drive to
the array to get a bootable snapshot, but is there a better way? Maybe
there's not but I figured that I'd check first.

As someone else noted, the data-security companies keep this stuff as
closely guarded secrets because it's their bread-and-butter. If a
virtual-tape-server is no better than your own home-brew Linux raid
setup, then why spend the extra money?

Doug.
Douglas Tutty
2006-12-05 22:01:36 UTC
Permalink
Post by Johannes Wiedersich
Post by Douglas Tutty
I'm going to be backing up to a portable ruggedized hard drive.
Currently, my backups end up in tar.bz2 format.
It would be nice if there was some redundancy in the data stream to
handle blocks that go bad while the drive is in storage (e.g. archive).
How is this handled on tape? Is it built-into the hardware
compression?
Do I need to put a file system on a disk partition if I'm only saving
one archive file or can I just write the archive to the partition
directly (and read it back) as if it was a scsi tape?
Is there an archive or compression format that includes the ability to
not only detect errors but to correct them? (e.g. store ECC data
elsewhere in the file) If there was, and I could write it directly to
the disk, then that would solve the blocks-failing-while-drive-stored
issue.
Now, to something completely different....
If data integrity is your concern, than maybe a better solution than
compression is to copy all your data with rsync or another backup tool
that 'mirrors' your files instead of packing them all together in one
large file. If something goes wrong with this large file you might loose
the backup of all your files. If something goes wrong with the
transmission of one file in the rsync case you will only 'loose' the
backup of that one file and just restart the rsync command.
Well, at least I much prefer to spend a bit more on storage and have all
my files copied individually. It adds the benefit that it is
straightforward to verify the integrity of the backup via 'diff -r'.
As far as redundancy is concerned I would prefer to use a second disk
(and while you are at it store it in a different location, miles away
from the other). I have one backup at home and another one at my
mother's house, adding several layers of security to my data.
Johannes
Thanks Johannes,

Yes I use JFS for my file systems. I have raid1 on my main drives. I
will have one portable drive at home, so several layers of backup here.
The issue is off-site backup and that's where the disk in the bank comes
in.

The problem is that a journal on a hard disk only protects the
filesystem from an inconsistent state due to power failure. It does
nothing to protect the data if it was written correctly 5 years ago and
never mounted since. If a block or two goes bad then that piece of data
is lost. It could make the filesystem unmountable.

I haven't been able to find a free filesystem that provides redundancy.
The companies that pioneered disk-based virtual tape servers
have their own (e.g. Veritas). This is why I'm looking at archive
formats.

The idea is that a format with built-in error-correcting would scatter
the redundancy around the disk so that if a few blocks are bad, the data
can still be retrieved.

Even raid1 doesn't accomplish this. With raid1 and two disks, if bad
blocks appear on both disks, even in different spots on
each drive, as far as I can tell raid1 can't create a virtual pristine
partition out of several damaged ones.

Searching aptitude, there seem to be a few packages that address this
issue obliquely (given two corrupted archives, can create a single
pristine archive) but need two complete archive sets. I have to look at
the par spec.

Basically, I want to do for my archives what ECC does for memory. With
ECC memory, for every 8 bits, there's one extra bit of storage. It can
fix single-bit errors. If I'm remembering my math right, ECC adds 15%
to the size of an archive __prior_to_compression__. It's impossible to
do with less than 1:1 (100%) on compressed data. It's therefore best
done from __within__ the compression algorithm. Take a block of data
from the input stream, make the ECC data, compress the block of data,
append the ECC, and spit this to the output stream and write the ECC
data to an ECC stream. At the end of the input stream, take the ECC
stream, make ECC data for that, compress the ECC stream, append the ECC
for that, spit this to the output stream.

If par doesn't do what I need and I can't find an alternative, I'll just
write my own, modeled first in python, then done in Fortran77 for speed.
If I go to all this trouble, I'd probably throw in AES for good measure.
It would make a fun project but I hate reinventing perfectly good
wheels. Then again, I know people who jump out of perfectly good
airplanes. Go figure.

Doug.
Johannes Wiedersich
2006-12-06 13:52:29 UTC
Permalink
Post by Douglas Tutty
The idea is that a format with built-in error-correcting would scatter
the redundancy around the disk so that if a few blocks are bad, the data
can still be retreived.
Point taken.
Post by Douglas Tutty
Even raid1 doesn't accomplish this. With raid1 and two disks, if both
disks have bad blocks appear, even if they are on different spots on
each drive, as far as I can tell raid1 can't create a virtual pristine
partition out of several damaged ones.
Question: how likely is it that both disks develop bad blocks, while
none of them is damaged? I'm no expert on this, but I guess a better
strategy might be to rotate backups on two disks, and use (and check:
fsck and smartctl) them regularly.

Leaving two identical disks in a bank vault for years and taking them
out, I guess the probability is rather high that either both work
perfectly or both are damaged and can't be read at all. There might be
cases in between, where your approach would make sense, but I doubt
there are many. Indeed, I twice had a disk where one partition (or part of
the disk) was physically damaged but it was possible to read data from
another partition; most of the time, though, the state of the disk is digital
(all works or nothing works). In one of the two cases a large part of
the disk (at the end of the cylinders) was damaged as far as I could
tell, so even error correction probably would not have saved data in
that region (for the second case, I can't tell).
Post by Douglas Tutty
If par doesn't do what I need and I can't find an alternative, I'll just
write my own, modeled first in python, then done in Fortran77 for speed.
If I go to all this trouble, I'd probably throw in AES for good measure.
It would make a fun project but I hate reinventing perfectly good
wheels. Then again, I know people who jump out of perfectly good
airplanes. Go figure.
As I said, I'm no expert on this, but I think the effort might be better
put elsewhere. Your approach gives some advantage in a special case of
failure, i.e. if a disk has single points of failure, but doesn't add
security in case the whole drive goes bad.

On debian there is a program 'dvdisaster' [1] that adds error correction
data to cd and dvd images, so that potentially one could reconstruct
partially defective media. Since the program ignores the filesystem,
maybe you could use it or modify it for your needs. This approach is
based on the assumption that a cd is rather likely to be partially
damaged by scratches etc. IMHO it only makes sense, if the likelihood of
_partial_ damage is large enough to justify the effort. This might not
be the case for hard disks.

HTH, YMMV,
Johannes

[1] http://www.dvdisaster.com/
Andrew Sackville-West
2006-12-06 16:43:44 UTC
Permalink
Post by Johannes Wiedersich
Question: how likely is it that both disks develop bad blocks, while
none of them is damaged? I'm no expert on this, but I guess a better
fsck and smartctl) them reguarly.
if the chance of a disk failure is (say) 1% in the time allotted, then
the chance of having a failure with two disks is 2%. The chance of any one
particular disk failing is still 1%; it's the odds of a failure in the
system as a whole that goes up. So with more disks you're more likely
to have failures of some kind, but the per-disk failure stays the same
and the odds of losing ALL of them goes the other way. The odds of
losing BOTH disks is .1%. The question becomes, which one has
failed...

A
Johannes Wiedersich
2006-12-06 18:38:01 UTC
Permalink
Post by Andrew Sackville-West
if the chance of a disk failure is (say) 1% in the time alloted, then
the chance of having a failure with disks is 2%. THe change of any one
particular disk failing is still 1%, it the odds of A failure in the
system as a whole that goes up. So with more disks you're more likely
to have failures of some kind, but the per disk failure stays the same
and the odds of losing ALL of them goes the other way. The odds of
losing BOTH disks is .1%. the question becomes, which one has
failed...
...the one with the bad blocks, I would guess. I would just diff the
disks and look at the files that differ manually. Opening your data with
the appropriate application (be it jpgs, txt or office documents....)
will usually tell you which one is the damaged file at a glance. (Of
course, one could also save md5 sums or the like.) I would guess that
the case where one gets more than a handful of differing files
from both, while at the same time both disks are damaged yet partially
accessible, is extremely unlikely. Most of the damaged disks I have seen
so far did not work at all, so being lucky enough that both disks fail
gracefully seems very unlikely to me.

If I put the same disks in a bank vault for years, I guess the
probability for the first few months will be 1 % for each. After a long
enough time closer to the end of the lifetime of the disks the situation
may be different and each disk has a failure probability of, say 50%. So
in 25 % of all cases you lose data. (If you leave them in the vault for
longer this risk will not only get close to 100%, sooner or later it
will be straight at 100%, whenever that will be.)

In order to increase your chances of at least one surviving disk, it is
essential that you check your disks regularly and replace the failing
one before the second one fails. Ultimately, the disk will die of things
like the polymers that hold the magnetic particles with your data in
place degrading with age, or some degrading glue inside the housing that
keeps important parts together, or some other 'old age' related problem.

So you have to check your disks in much shorter intervals than the
average lifetime of a disk.

Since you have to check your disks regularly anyway, I think the better
strategy is to rotate backups while you are at it. If there's an
undetected error from the backup process (or
something like an undetected damaged file from your original data) you
have the chance to revert to the previous backup. Bad blocks are not the
only way to lose data/backups.

Just my 2ct.

Johannes
Mike McCarty
2006-12-06 19:11:06 UTC
Permalink
Post by Andrew Sackville-West
Post by Johannes Wiedersich
Question: how likely is it that both disks develop bad blocks, while
none of them is damaged? I'm no expert on this, but I guess a better
fsck and smartctl) them reguarly.
if the chance of a disk failure is (say) 1% in the time alloted, then
the chance of having a failure with disks is 2%. THe change of any one
I don't follow this reasoning. Are you presuming independence of the
failures and identical probabilities? If so, then this is the way to
compute it:

Let p be the probability of failure of each disc, independently of the
other. There are four mutually exclusive events which comprise the
space. Both discs may fail [Pr = p^2]. The first disc may fail, while
the second does not [Pr = p(1-p)]. The second disc may fail, while the
first does not [Pr = (1-p)p]. Both discs may survive [Pr = (1-p)(1-p)].

So, the probability that at least one disc fails is 1-(1-p)(1-p).
For p = 0.01, that is 0.0199.

I'll grant you this is not markedly different from 2%, but it is also
not simply 2p.
Post by Andrew Sackville-West
particular disk failing is still 1%, it the odds of A failure in the
system as a whole that goes up. So with more disks you're more likely
to have failures of some kind, but the per disk failure stays the same
and the odds of losing ALL of them goes the other way. The odds of
losing BOTH disks is .1%. the question becomes, which one has
failed...
I don't follow this reasoning. The probability of both discs failing
(if they do so independently) is not 0.1%, but rather 0.01%. A partially
failed disc is usually easy to detect, since they have FEC on them. A
completely failed disc is even easier to detect :-)

Mike
--
p="p=%c%s%c;main(){printf(p,34,p,34);}";main(){printf(p,34,p,34);}
This message made from 100% recycled bits.
You have found the bank of Larn.
I can explain it for you, but I can't understand it for you.
I speak only for myself, and I am unanimous in that!
Andrew Sackville-West
2006-12-06 19:44:50 UTC
Permalink
Post by Mike McCarty
Post by Andrew Sackville-West
Post by Johannes Wiedersich
Question: how likely is it that both disks develop bad blocks, while
none of them is damaged? I'm no expert on this, but I guess a better
fsck and smartctl) them reguarly.
if the chance of a disk failure is (say) 1% in the time alloted, then
the chance of having a failure with disks is 2%. THe change of any one
I don't follow this reasoning. Are you presuming independence of the
failures and identical probabilities? If so, then this is the way to
Let p be the probability of failure of each disc, independently of the
other. There are four mutually independent events which comprise the
space. Both discs may fail [Pr = p^2]. The first disc may fail, while
the second does not [Pr = p(1-p)]. The second disc may fail, while the
first does not [Pr = (1-p)p]. Both discs may survive [Pr = (1-p)(1-p)].
So, the probability that at least one disc fails is 1-(1-p)(1-p).
For p = 0.01, that is 0.0199.
--------------------^^^^

you aren't using an old pentium are you? ;-)
Post by Mike McCarty
I'll grant you this is not markedly different from 2%, but it is also
not simply 2p.
huh. been a long time since math class for me. I was under the
impression that the probability was 2p: the chance of one failing is added
to the chance of the other failing to give the chance of both
failing. But I'll take your word for it :)
Post by Mike McCarty
Post by Andrew Sackville-West
particular disk failing is still 1%, it the odds of A failure in the
system as a whole that goes up. So with more disks you're more likely
to have failures of some kind, but the per disk failure stays the same
and the odds of losing ALL of them goes the other way. The odds of
losing BOTH disks is .1%. the question becomes, which one has
failed... ^^
^^
-------------------------^^
Post by Mike McCarty
I don't follow this reasoning. The probability of both discs failing
(if they do so independently) is not 0.1%, but rather 0.01%. A partially
failed disc is usually easy to detect, since they have FEC on them. A
completely failed disc is even easier to detect :-)
yeah, that was me shuffling decimals before coffee. I was doing p^2
(.01*.01=.0001 and turning that into .1%). I should know better than
to think that early in the morning.

A
Mike McCarty
2006-12-07 14:41:23 UTC
Permalink
[Mike wrote]
Post by Andrew Sackville-West
Post by Mike McCarty
Let p be the probability of failure of each disc, independently of the
other. There are four mutually independent events which comprise the
space. Both discs may fail [Pr = p^2]. The first disc may fail, while
the second does not [Pr = p(1-p)]. The second disc may fail, while the
first does not [Pr = (1-p)p]. Both discs may survive [Pr = (1-p)(1-p)].
So, the probability that at least one disc fails is 1-(1-p)(1-p).
For p = 0.01, that is 0.0199.
--------------------^^^^
you aren't using an old pentium are you? ;-)
If p = 0.01, then (1-p) = 0.99, and (1-p)(1-p) = 0.9801, so that
1 - (1-p)(1-p) = 0.0199 exactly.
Post by Andrew Sackville-West
Post by Mike McCarty
I'll grant you this is not markedly different from 2%, but it is also
not simply 2p.
huh. been a long time since math class for me. I was under the
impression that the probability was 2p. chance of one failing is added
to the chance of the other failing give the chance of both
failing. But I'll take your word for it :)
If the probability of the first one failing is 60% and the probability
of the second one failing is 60%, then I suppose you believe that the
probability of one or the other or both failing is 120%.

The probability in this case works out to 1-(0.4)(0.4) = 0.84.

The way to think of it is that it is 1 - Pr(none fails).
This extends to any number of independent events.
The problem with trying to add those probabilities is that
the events are not disjoint, so the probabilities do not add.

You can however decompose it into events A, B, and C which are
disjoint, as A = {disc one fails, but disc two does not},
B = {disc two fails, but disc one does not} and
C = {both discs fail}. This is all the ways one or more discs
fail, and the events are disjoint.

Pr(A) = p(1-p)
Pr(B) = (1-p)p
Pr(C) = p^2

So, Pr(at least one disc fails) = p - p^2 + p - p^2 + p^2
= 2p - p^2

This technique gets more complicated as the number of events
increases.

NB: 1 - (1-p)^2 = 1 - (1 - 2p + p^2) = 2p - p^2, as computed
above.
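
[The algebra is easy to sanity-check numerically; a throwaway Python
check, nothing more:]

    p = 0.01
    print(1 - (1 - p) ** 2)   # 0.0199..., i.e. 1 - Pr(neither fails)
    print(2 * p - p ** 2)     # the expanded form above, same value

    # quick simulation for the doubters
    import random
    trials = 1_000_000
    hits = sum(random.random() < p or random.random() < p for _ in range(trials))
    print(hits / trials)      # close to 0.0199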

[snip]
Post by Andrew Sackville-West
Post by Mike McCarty
I don't follow this reasoning. The probability of both discs failing
(if they do so independently) is not 0.1%, but rather 0.01%. A partially
failed disc is usually easy to detect, since they have FEC on them. A
completely failed disc is even easier to detect :-)
yeah, that was me shuffling decimals before coffee. I was doing p^2
(.01*.01=.0001 and turning that into .1%). I should know better than
to think that early in the morning.
:-)

Mike
--
p="p=%c%s%c;main(){printf(p,34,p,34);}";main(){printf(p,34,p,34);}
This message made from 100% recycled bits.
You have found the bank of Larn.
I can explain it for you, but I can't understand it for you.
I speak only for myself, and I am unanimous in that!
Andrew Sackville-West
2006-12-07 15:59:22 UTC
Permalink
Post by Mike McCarty
[Mike wrote]
Post by Andrew Sackville-West
Post by Mike McCarty
Let p be the probability of failure of each disc, independently of the
other. There are four mutually independent events which comprise the
space. Both discs may fail [Pr = p^2]. The first disc may fail, while
the second does not [Pr = p(1-p)]. The second disc may fail, while the
first does not [Pr = (1-p)p]. Both discs may survive [Pr = (1-p)(1-p)].
So, the probability that at least one disc fails is 1-(1-p)(1-p).
For p = 0.01, that is 0.0199.
--------------------^^^^
you aren't using an old pentium are you? ;-)
If p = 0.01, then (1-p) = 0.99, and (1-p)(1-p) = 0.9801, so that
1 - (1-p)(1-p) = 0.0199 exactly.
that was a joke...
Post by Mike McCarty
Post by Andrew Sackville-West
Post by Mike McCarty
I'll grant you this is not markedly different from 2%, but it is also
not simply 2p.
huh. been a long time since math class for me. I was under the
impression that the probability was 2p. chance of one failing is added
to the chance of the other failing give the chance of both
failing. But I'll take your word for it :)
If the probability of the first one failing is 60% and the probability
of the second one failing is 60%, then I suppose you believe that the
probability of one or the other or both failing is 120%.
no I don't. consider me thinking aloud.
Post by Mike McCarty
The probability in this case works out to 1-(0.4)(0.4) = 0.84.
The way to think of it is that it is 1 - Pr(none fails).
This extends to any number of independent events.
The problem with trying to add those probabilities is that
the events are not disjoint, so the probabilities do not add.
it's starting to come back to me.
Post by Mike McCarty
You can however decompose it into events A, B, and C which are
disjoint, as A = {disc one fails, but disc two does not},
B = {disc two fails, but disc A one does not} and
C = {both discs fail}. This is all the ways one or more discs
fail, and the events are disjoint.
Pr(A) = p(1-p)
Pr(B) = (1-p)p
Pr(C) = p^2
So, Pr(at least one disc fails) = p - p^2 + p - p^2 + p^2
= 2p - p^2
This technique gets more complicated as the number of events
increases.
NB: 1 - (1-p)^2 = 1 - (1 - 2p + p^2) = 2p - p^2, as computed
above.
doesn't this involve pascal's triangle somewhere? my brain is too
fuzzy for this stuff. I need to get a good book and retrain my brain.

thanks for the refresher!

A
Mike McCarty
2006-12-07 22:09:23 UTC
Permalink
Post by Andrew Sackville-West
Post by Mike McCarty
[Mike wrote]
Post by Andrew Sackville-West
Post by Mike McCarty
So, the probability that at least one disc fails is 1-(1-p)(1-p).
For p = 0.01, that is 0.0199.
--------------------^^^^
you aren't using an old pentium are you? ;-)
If p = 0.01, then (1-p) = 0.99, and (1-p)(1-p) = 0.9801, so that
1 - (1-p)(1-p) = 0.0199 exactly.
that was a joke...
Sorry, missed that.

[snip]
Post by Andrew Sackville-West
Post by Mike McCarty
If the probability of the first one failing is 60% and the probability
of the second one failing is 60%, then I suppose you believe that the
probability of one or the other or both failing is 120%.
no I don't. consider me thinking aloud.
Now you missed *my* joke! Sorry, I shoulda put in a smiley.

[snip]
Post by Andrew Sackville-West
Post by Mike McCarty
NB: 1 - (1-p)^2 = 1 - (1 - 2p + p^2) = 2p - p^2, as computed
above.
doesn't this involve pascal's triangle somewhere? my brain is too
fuzzy for this stuff. I need to get a good book and retrain my brain.
Everything involves Pascal's triangle somewhere :-)
Post by Andrew Sackville-West
thanks for the refresher!
Sorry, my grad schooling is in Mathematical Probability and Statistics.
I almost got a Ph.D (a kid suddenly came along, and I had to get a
real job), but was in a program which only made Ph.Ds,
so I didn't wind up with an MSc. along the way, which I have certainly
exceeded the normal requirements for :-(

Mike
--
p="p=%c%s%c;main(){printf(p,34,p,34);}";main(){printf(p,34,p,34);}
This message made from 100% recycled bits.
You have found the bank of Larn.
I can explain it for you, but I can't understand it for you.
I speak only for myself, and I am unanimous in that!
Douglas Tutty
2006-12-08 02:08:38 UTC
Permalink
Post by Mike McCarty
Everything involves Pascal's triangle somewhere :-)
Post by Andrew Sackville-West
thanks for the refresher!
Sorry, my grad schooling is in Mathematical Probability and Statistics.
I almost got a Ph.D (a kid suddenly came along, and I had to get a
real job), but was in a program which only made Ph.Ds,
so I didn't wind up with an MSc. along the way, which I have certainly
exceeded the normal requirements for :-(
Please don't apologize, Mike,

It's refreshing to look at how the hardware works (or doesn't). I passed
(but really failed statistics). I was doing my degree in nursing.
Going into the final exam I needed 180% to pass. I passed; what was the
probability? Go figure. It came back to bite me in my pathology final:
the prof 'bell curved' the results because too many people failed the
course. He printed raw score and bell score. My raw was 98%. My bell
was 65%. I couldn't find the error in his bell (apparently his software
'wrapped' people who were beyond some deviation somewhere). Go figure.

I hate statistics. I'll take the low-tech approach: the drive will
fail. If it hard-fails, it's dead and nothing I do ahead of time will
change that, so have another copy somewhere else. If it soft-fails, and
I have prepared FEC somehow, I could get it back, because there's a chance
that both drives only soft-fail.
Douglas Tutty
2006-12-07 00:46:16 UTC
Permalink
Post by Mike McCarty
Post by Andrew Sackville-West
Post by Johannes Wiedersich
Question: how likely is it that both disks develop bad blocks, while
none of them is damaged? I'm no expert on this, but I guess a better
fsck and smartctl) them reguarly.
if the chance of a disk failure is (say) 1% in the time alloted, then
the chance of having a failure with disks is 2%. THe change of any one
I don't follow this reasoning. Are you presuming independence of the
failures and identical probabilities? If so, then this is the way to
Let p be the probability of failure of each disc, independently of the
other. There are four mutually independent events which comprise the
space. Both discs may fail [Pr = p^2]. The first disc may fail, while
the second does not [Pr = p(1-p)]. The second disc may fail, while the
first does not [Pr = (1-p)p]. Both discs may survive [Pr = (1-p)(1-p)].
So, the probability that at least one disc fails is 1-(1-p)(1-p).
For p = 0.01, that is 0.0199.
I'll grant you this is not markedly different from 2%, but it is also
not simply 2p.
Post by Andrew Sackville-West
particular disk failing is still 1%, it the odds of A failure in the
system as a whole that goes up. So with more disks you're more likely
to have failures of some kind, but the per disk failure stays the same
and the odds of losing ALL of them goes the other way. The odds of
losing BOTH disks is .1%. the question becomes, which one has
failed...
I don't follow this reasoning. The probability of both discs failing
(if they do so independently) is not 0.1%, but rather 0.01%. A partially
failed disc is usually easy to detect, since they have FEC on them. A
completely failed disc is even easier to detect :-)
Mike,

Without expending any mathematical energy, could you recompute your two
probabilities based on a set of three disks instead of 2? I'm guessing
that the probability of one disk failing goes up but the probability of
all three failing drops substantially (the famous triple-redundancy
theory).

I'm assuming that a partially failed disk will return good data (because
of the FEC) and that an error notice ends up in syslog (do you know the
severity)?

How does a raid1 array handle a partially failing disk? Does it just
take the good data and carry on until the drive completely fails, or does
mdadm also get involved in issuing a warning of a failing drive?

Thanks,

Doug.
Mike McCarty
2006-12-07 15:04:44 UTC
Permalink
Post by Douglas Tutty
Mike,
Without expending any mathematical energy, could you recompute your two
probabilities based on a set of three disks instead of 2? I'm guessing
that the probability of one disk failing goes up but the probability of
all three failing drops substantially (the famious tripple-redundancy
theory).
It's very easy. Suppose we have events E1, E2, E3, ... En. Suppose that
they occur independently. Suppose that Pr(Ek) = Pk. Then the probability
that at least one of the Ek occurs is

1 - (1-P1)(1-P2)(1-P3)...(1-Pn)

If the probabilites are all the same, Pk = p for all k, then it is

1 - (1-p)^n

In the case we just discussed where p = 0.01 and using n = 3 per your
request, we get

Pr(one or more of three discs fails) = 1 - 0.99^3 = 0.029701
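
[In code form, a trivial sketch of the two quantities being asked about:]

    def p_any_fail(p, n):
        """Probability that at least one of n independent discs fails."""
        return 1 - (1 - p) ** n

    def p_all_fail(p, n):
        """Probability that all n independent discs fail."""
        return p ** n

    for n in (2, 3):
        print(n, p_any_fail(0.01, n), p_all_fail(0.01, n))
    # n=2: ~0.0199    and 0.0001
    # n=3: ~0.029701  and 0.000001
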
Post by Douglas Tutty
I'm assuming that a partialy failed disk will return good data (because
of the FEC) and that an error notice ends up in syslog (do you know the
severity)?
Depends on how "partially" it fails. It may be "partially" failed so
that some sectors are readable, and others are not.

I'm not a Linux expert, so I don't know the answer to your question.
Post by Douglas Tutty
How does a raid1 array handle a partially failing disk? Does it just
take the good data and carry on until the drive completly fails or does
mdadm also get involved in issuing a warning of a failing drive?
Umm,

RAID 0 striping, results in speedy access only, not true RAID
RAID 1 writes redundant copies of data to two (or more) discs
RAID 3 writes redundant data to three (or more) discs and reserves
one disc as "parity"; requires at least four discs

I've implemented a RAID 1 system in a fully redundant system (dual
discs, dual controllers, each controller able to control each
disc, both controllers connected to independent computers with
separate power supplies). That's the way I did it. Writes went
to both discs, and did not return to the requesting app until
both writes were complete unless a no wait I/O was requested,
in which case I provided notification when the write completed.

Reads simply used the disc on the same side using the controller
on the same side as the requesting CPU, unless one disc failed.
If I got read error reports, then I used a leaky bucket. If the
leaky bucket overflowed, then I placed the suspect disc out of
service, asserted a frame alarm, issued an Information or Problem
Report (IPR) and continued with just one disc. When the disc
got replaced, then I'd format it, and start equalizing. I started
at the bottom of disc, and kept a high water mark. Any writes
below the high water mark were done redundantly, while writes
above the high water mark were done simplex. When the high
water mark reached the top of disc, then it was put back into
service, and the alarm was abated.
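
[For anyone curious, the leaky-bucket part is easy to sketch; the
numbers below are invented, not taken from the system described above:]

    import time

    class LeakyBucket:
        """Each read error adds to the bucket; the bucket drains at a steady
        rate. Overflow means errors are arriving faster than they drain,
        so the disc is declared suspect and taken out of service."""

        def __init__(self, capacity=10.0, drain_per_second=0.1):
            self.capacity = capacity
            self.drain_per_second = drain_per_second
            self.level = 0.0
            self.last = time.monotonic()

        def report_error(self):
            now = time.monotonic()
            self.level = max(0.0, self.level - (now - self.last) * self.drain_per_second)
            self.last = now
            self.level += 1.0
            return self.level > self.capacity   # True => fail the disc, raise the alarm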

I did the same sort of thing with the controllers, placing
one out of service if it was deemed to be failing, and all
reads and writes took place through the still functioning
controller, until the failing one got replaced.

For more information, I suggest that you consult
http://www.sohoconsult.ch/raid/raid.html
for some information on how the various levels of RAID work.

Mike
--
p="p=%c%s%c;main(){printf(p,34,p,34);}";main(){printf(p,34,p,34);}
This message made from 100% recycled bits.
You have found the bank of Larn.
I can explain it for you, but I can't understand it for you.
I speak only for myself, and I am unanimous in that!
h***@topoi.pooq.com
2006-12-06 14:16:31 UTC
Permalink
Post by Douglas Tutty
Post by Johannes Wiedersich
Post by Douglas Tutty
I'm going to be backing up to a portable ruggedized hard drive.
Currently, my backups end up in tar.bz2 format.
It would be nice if there was some redundancy in the data stream to
handle blocks that go bad while the drive is in storage (e.g. archive).
How is this handled on tape? Is it built-into the hardware
compression?
Do I need to put a file system on a disk partition if I'm only saving
one archive file or can I just write the archive to the partition
directly (and read it back) as if it was a scsi tape?
Is there an archive or compression format that includes the ability to
not only detect errors but to correct them? (e.g. store ECC data
elsewhere in the file) If there was, and I could write it directly to
the disk, then that would solve the blocks-failing-while-drive-stored
issue.
Now, to something completely different....
If data integrity is your concern, than maybe a better solution than
compression is to copy all your data with rsync or another backup tool
that 'mirrors' your files instead of packing them all together in one
large file. If something goes wrong with this large file you might loose
the backup of all your files. If something goes wrong with the
transmission of one file in the rsync case you will only 'loose' the
backup of that one file and just restart the rsync command.
Well, at least I much prefer to spend a bit more on storage and have all
my files copied individually. It adds the benefit that it is
straightforward to verify the integrity of the backup via 'diff -r'.
As far as redundancy is concerned I would prefer to use a second disk
(and while you are at it store it in a different location, miles away
from the other). I have one backup at home and another one at my
mother's house, adding several layers of security to my data.
Johannes
Thanks Johannes,
Yes I use JFS for my file systems. I have raid1 on my main drives. I
will have one portable drive at home, so several layers of backup here.
The issue is off-site backup and that's where the disk in the bank comes
in.
The problem is that a journal on a hard disk only protects the
filesystem from an inconsistant state due to power failure. It does
nothing to protect the data if it was written correctly 5 years ago and
never mounted since. If a block or two goes bad then that piece of data
is lost. It could make the filesystem unmountable.
I haven't been able to find a filesystem that provides redundancy that
is free. The companies that pioneered disk-based virtual tape serves
have their own (e.g. Veritas). This is why I'm looking at archive
formats.
The idea is that a format with built-in error-correcting would scatter
the redundancy around the disk so that if a few blocks are bad, the data
can still be retreived.
Even raid1 doesn't accomplish this. With raid1 and two disks, if both
disks have bad blocks appear, even if they are on different spots on
each drive, as far as I can tell raid1 can't create a virtual pristine
partition out of several damaged ones.
Searching aptitude, there seem to be a few packages that address this
issue obliquely (given two corrupted archives, can create a single
pristine archive) but need two complete archive sets. I have to look at
the par spec.
Basically, I want to do for my archives what ECC does for memory. With
ECC memory, for every 8 bits, there's one extra bit of storage. It can
fix single-bit errors. If I'm remembering my math right, ECC adds 15%
to the size of an archive __prior_to_compression__. Its impossible to
do with less than 1:1 (100%) on compressed data. Its therefore best
done from __within__ the compression algorithm. Take a block of data
from the input stream, make the ECC data, compress the block of data,
append the ECC, and spit this to the output stream and write the ECC
data to an ECC stream. At the end of the input stream, take the ECC
stream, make ECC data for that, compress the ECC stream, append the ECC
for that, spit this to the output stream.
You need to add the ECC *after* the compression. ECC adds redundancy
that allows one to recover from a small amount of damage.

If you add ECC before compression, and, say, a single bit gets changed
in the compressed archive, decompressing it will likely not yield a
block with a small amount of damage; it will more likely yield total
gibberish -- and ECC on that is not likely to help.

If you add ECC after compression, and a single bit gets changed, then
ECC will make it possible to correct the compressed block, after which
decompression will work.

If you want to be able to recover data despite damage, it is in general
not wise to compress it, since different parts will be damaged
independently, and the undamaged parts will still be readable.
Squeezing out redundancy makes different parts of the data dependent on
one another for interpretation.
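
[A quick way to see why the order matters; a minimal Python sketch, with
the exception caught broadly since the exact error bz2 raises may vary:]

    import bz2

    original = b"some backup payload " * 1000
    packed = bz2.compress(original)

    # flip one bit in the middle of the *compressed* stream
    damaged = bytearray(packed)
    damaged[len(damaged) // 2] ^= 0x01

    try:
        bz2.decompress(bytes(damaged))
        print("decompressed anyway; the flipped bit landed somewhere harmless")
    except Exception as exc:
        # the usual outcome: one flipped bit and the whole archive is
        # undecodable, which is why any ECC has to wrap the compressed bytes
        print("decompression failed:", exc)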

-- hendrik
Reid Priedhorsky
2006-12-07 03:02:37 UTC
Permalink
Post by h***@topoi.pooq.com
If you want to be able to recover data despite damage, it is in general
not wise to compress it, since different parts will be damaged
independently, and the undamaged parts will still be readable.
Squeezing out redundancy makes different parts of the data dependent on
one another for interpretation.
No, you _should_ compress it and then use some of the space you saved to
add some carefully chosen redundancy which will allow you to reconstruct
everything, not just some things, in case of failure. (E.g., using par2.)

Scenario A: Compression

Suppose you have 100 megabytes of files, uncompressed. You create a tar
archive and compress it down to 75M. A failure occurs, and 2M of data
are lost. The archive becomes impossible to decompress, and you lose
everything. You are very sad.

Scenario B: No compression

Suppose you have 100 megabytes of files, uncompressed. A failure occurs,
and 2M of data is lost. All files intersecting the broken region are
destroyed (modulo any Herculean effort one is willing to put into
reconstruction). You are sad, but not as sad as Scenario A.

Scenario C: Compression plus redundancy

Suppose you have 100 megabytes of files, uncompressed. You create a tar
archive and compress it down to 75M. You then create 10M of redundancy
using (e.g.) par2, for a total of 85M. A failure occurs, and 2M of data
is lost. You use par2 to reconstruct the archive, and nothing is lost.
(You can do this regardless of whether data, redundancy, or both are
destroyed.) You are happy.
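
[For what it is worth, Scenario C maps directly onto the par2 tools
mentioned earlier in the thread. A sketch, assuming the Debian par2
package is installed; option names may differ between versions, and the
file name is made up:]

    import subprocess

    archive = "backup.tar.bz2"

    # create roughly 10% recovery data alongside the archive
    subprocess.run(["par2", "create", "-r10", archive + ".par2", archive], check=True)

    # later, after suspected damage: verify, then repair if verification complains
    subprocess.run(["par2", "verify", archive + ".par2"])
    subprocess.run(["par2", "repair", archive + ".par2"])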

HTH,

Reid
Douglas Tutty
2006-12-07 14:16:11 UTC
Permalink
Post by Reid Priedhorsky
No, you _should_ compress it and then use some of the space you saved to
add some carefully chosen redundancy which will allow you to reconstruct
everything, not just some things, in case of failure. (E.g., using par2.)
Scenario C: Compression plus redundancy
Suppose you have 100 megabytes of files, uncompressed. You create a tar
archive and compress it down to 75M. You then create 10M of redundancy
using (e.g.) par2, for a total of 85M. A failure occurs, and 2M of data
is lost. You use par2 to reconstruct the archive, and nothing is lost.
(You can do this regardless of whether data, redundancy, or both are
destroyed.) You are happy.
Hi Reid,

I've been looking at par2. The question remains how to apply it to data
stored on media where the potential failure is one of media, not
transmission. If I only protect the tar.bz2 file and a media failure
occurs, how could I have set up the par2 redundancy files to allow me to
recover the data?

Apparently, hard disks use FEC themselves so that they either can fix
the data or there is too much damage and the drive is inaccessible. It
seems to be an all-or-nothing proposition. If someone has experience
of FEC drive failures that refutes this I'd be very interested.

The only disk failures I have experienced are on older drives without
FEC that for a given sector return an error about bad CRC but one can
carry on and read the rest of the disk. It was from this perspective
that I proposed the question that led to this thread.

If drives are atomic in this way, it seems that the only way to achieve
redundancy is through multiple copies (either manually done or via
raid1).

I'm still hoping that someone who knows how Linux software raid works can
tell me how it decides that a drive has failed. This question was posed
in a thread about raid1 internals.

Thanks,

Doug.
Ron Johnson
2006-12-07 14:36:39 UTC
Permalink
[snip]
Post by Douglas Tutty
Apparently, hard disks use FEC themselves so that they either can fix
the data or there is too much damage and the drive is inaccessible. It
seems to be an all-or-nothing propositition. If someone has experience
of FEC drive failures that refutes this I'd be very interested.
The only disk failures I have experienced are on older drives without
FEC that for a given sector return an error about bad CRC but one can
carry on and read the rest of the disk. It was from this perspective
that I proposed the question that led to this thread.
If drives are atomic in this way, it seems that the only way to achieve
But I don't think they are. Depending on the problem, drives that
go bad can spit out scary messages to syslog for weeks before they die.

Of course, it all depends on the problem. If the drive electronics
or mechanics die, you'll have to send it off to a data recovery company.

Drives just have more that can go wrong: electronics, mechanicals
and media. Tapes just have, well, tape: the media. If a drive goes
bad, you call the vendor and they come out and repair it (most
likely via a jerk-and-switch).

However, the cost of a tape drive plus support contract might
outweigh the cost of sending a dud HDD to a data recovery company.

BTW, how much data do you have to archive, and at what frequency?
One time only, or weekly, monthly, quarterly? Is this personal
data, or company data?
Post by Douglas Tutty
redundancy is through multiple copies (either manually done or via
But you should do that anyway. How important *is* your data?
Post by Douglas Tutty
raid1).
RAID is *not* for archives!!!
Post by Douglas Tutty
I'm still hoping that someone who knows how linux software raid work can
tell me how it decides that a drive has failed. This question was posed
in a thread about raid1 internals.
--
Ron Johnson, Jr.
Jefferson LA USA

Is "common sense" really valid?
For example, it is "common sense" to white-power racists that
whites are superior to blacks, and that those with brown skins
are mud people.
However, that "common sense" is obviously wrong.
Douglas Tutty
2006-12-07 15:07:08 UTC
Permalink
[snip]
Post by Douglas Tutty
Apparently, hard disks use FEC themselves so that they either can fix
the data or there is too much damage and the drive is inaccessible. It
seems to be an all-or-nothing propositition. If someone has experience
of FEC drive failures that refutes this I'd be very interested.
The only disk failures I have experienced are on older drives without
FEC that for a given sector return an error about bad CRC but one can
carry on and read the rest of the disk. It was from this perspective
that I proposed the question that led to this thread.
If drives are atomic in this way, it seems that the only way to achieve
But I don't think they are. Depending on the problem, drives that
go bad can spit out scary messages to syslog for weeks before they die.
Right, but for a disconnected drive sitting on a shelf? Also, during
those scary messages, do I still get all the data (in effect the drive
is fully functional, just complaining), or are some blocks unreadable? If
the former, that counts as atomic.
Of course, it all depends on the problem. If the drive electronics
or mechanics die, you'll have to send it off to a data recovery company.
Drives just have more that can go wrong: electronics, mechanicals
and media. Tapes just have, well, tape: the media. If a drive goes
bad, you call the vendor and they come out and repair it (most
likely via a jerk-and-switch).
Right, but tape media have a mechanical aspect to them. They also have a
narrow environmental storage range compared to disks.
However, the cost of a tape drive plus support contract might
outweigh the cost of sending a dud HDD to a data recovery company.
Or of a third hard drive.
BTW, how much data do you have to archive, and at what frequency?
One time only, or weekly, monthly, quarterly? Is this personal
data, or company data?
Personal data. Design archive size is 80 GB but want it to scale well.

What I've found is that for the same money as a DLT tape, I can get a
2.5" seagate hard drive. A ruggedized enclosure is $30 (Addonics
Jupiter). The only cost that corresponds to a tape drive unit is the
interface cable to connect the drive enclosure. I can keep a USB cable
in the bank with the archive so that I can use it with any computer, and
an eSATA cable here (eSATA will _eventually_ be hot plug I hope).
Post by Douglas Tutty
redundancy is through multiple copies (either manually done or via
But you should do that anyway. How important *is* your data?
Post by Douglas Tutty
raid1).
RAID is *not* for archives!!!
When I say raid1, I'm referring to having the archive drive added to a
raid1 array while it syncs, then removed from the array and put into
storage. Out of the array it can function as an independent drive if
need be, but can also be used to recreate the array if the original
array drives are destroyed.

Besides, virtual tape storage units seem to use raid.

So the question is, do I use a filesystem like JFS on the drive, or use
tar.bz2 with par2 files appended directly to the drive?

If the drives are atomic, then I may as well use a filesystem for
convenience. If failures after storage are likely to show up as some
unreadable and unrecoverable blocks, then looking at some redundancy in
the data stream itself may be useful.

Thanks,

Doug.
Ron Johnson
2006-12-07 17:18:20 UTC
Permalink
Post by Douglas Tutty
Post by Mike McCarty
[snip]
[snip]
Post by Douglas Tutty
Personal data. Design archive size is 80 GB, but I want it to scale well.
IMO, you're making a mountain out of a molehill.
Post by Douglas Tutty
What I've found is that for the same money as a DLT tape, I can get a
2.5" seagate hard drive. A ruggedized enclosure is $30 (Addonics
Jupiter). The only cost that corresponds to a tape drive unit is the
interface cable to connect the drive enclosure. I can keep a USB cable
in the bank with the archive so that I can use it with any computer, and
an eSATA cable here (eSATA will _eventually_ be hot plug I hope).
2.5" drives in a plain-old USB enclosure are all you need. Buy 3 of
them. One goes to the bank vault, another to your parents' house,
another in your bedroom closet.

Mike McCarty
2006-12-07 22:27:57 UTC
Permalink
Post by Ron Johnson
RAID is *not* for archives!!!
RAID was not designed for archives. I can see no reason why
it wouldn't work for that. RAID 1, for example, is simply
making two (or more) copies of the data. Are you saying that
making more than one copy of a backup is not a reasonable
approach?

I see no reason why RAID 5 could not be used when data
span multiple CDROMs, for example. Let's consider the
case where the data span two CDROMs, and one wants
some assurance that if one of the media fails, the
data will still be recoverable. Then one can write the
two CDROMs, and then write another which is the bitwise
XOR of the other two CDROMs. If any one of them fails,
all data are recoverable. This only takes 1.5x as many
discs, whereas making two copies would take 2x.
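
To make that concrete with a single pair of bytes (the numbers are
purely illustrative): if disc one holds 0x5A at some offset and disc
two holds 0x3C, the third disc holds 0x5A XOR 0x3C = 0x66. Lose disc
one and you get it back as 0x66 XOR 0x3C = 0x5A; lose disc two and it
is 0x66 XOR 0x5A = 0x3C. Applied byte by byte, the same identity
recovers a whole disc.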

Essentially, with RAID 5, one uses some version of a
systematic BCH code with the check bits stored across
the additional drives. Then, if one or more of the drives
should fail, the data on the others could be used to
reconstruct the data. I see no reason why this could not
be applied to CDROMs, or any other archive medium.

The main advantages would be that one would essentially
have burst error correction of the size of the disc
(this being, with the FEC on the disc, if any, an
interleaved code in effect), which is enormous, indeed,
and economy in storage over using multiple copies, as
illustrated above.

Mike
--
p="p=%c%s%c;main(){printf(p,34,p,34);}";main(){printf(p,34,p,34);}
This message made from 100% recycled bits.
You have found the bank of Larn.
I can explain it for you, but I can't understand it for you.
I speak only for myself, and I am unanimous in that!
Ron Johnson
2006-12-07 22:36:02 UTC
Permalink
Post by Mike McCarty
Post by Ron Johnson
RAID is *not* for archives!!!
RAID was not designed for archives. I can see no reason why
it wouldn't work for that. RAID 1, for example, is simply
making two (or more) copies of the data. Are you saying that
making more than one copy of a backup is not a reasonable
approach?
[snip]
Post by Mike McCarty
The main advantages would be that one would essentially
have burst error correction of the size of the disc
(this being, with the FEC on the disc, if any, an
interleaved code in effect), which is enormous, indeed,
and economy in storage over using multiple copies, as
illustrated above.
I'd only trust "RAID archiving" if the controller and a rescue CD
were also stored in the "archive location" along with the hard drives.

Mike McCarty
2006-12-07 23:36:10 UTC
Permalink
Post by Mike McCarty
Post by Mike McCarty
Post by Ron Johnson
RAID is *not* for archives!!!
RAID was not designed for archives. I can see no reason why
it wouldn't work for that. RAID 1, for example, is simply
making two (or more) copies of the data. Are you saying that
making more than one copy of a backup is not a reasonable
approach?
[snip]
[I wrote]
Post by Mike McCarty
Post by Mike McCarty
The main advantages would be that one would essentially
have burst error correction of the size of the disc
(this being, with the FEC on the disc, if any, an
interleaved code in effect), which is enormous, indeed,
and economy in storage over using multiple copies, as
illustrated above.
I'd only trust "RAID archiving" if the controller and a rescue CD
were also stored in the "archive location" along with the hard drives.
I gave two examples of RAID archiving which required no
special controller, and which would need no special rescue CD
to use. You snipped those, and didn't answer my question.
I'll ask the question again, and then I'll supply you with
a simple software RAID implementation which I wrote in
less than 15 minutes.

RAID level 1 is simply multiple copies. Do you think that
making multiple copies of the backup is an unreasonable
way to protect data from loss?

One way of doing RAID 5 with three discs is to write data
to each of two discs, and write the bitwise XOR of the
data on the two discs to a third. This requires no special
controller, it simply requires a tiny program. Here's one
in completely portable C:

#include <stdlib.h>
#include <stdio.h>

int main(void) {
    FILE *Input1, *Input2, *Output;
    int Chr1, Chr2;

    Input1 = fopen("disc1.raw","rb");
    Input2 = fopen("disc2.raw","rb");
    Output = fopen("disc3.raw","wb");
    if (!Input1 || !Input2 || !Output)
        fprintf(stderr,"unable to open files\n"), exit(EXIT_FAILURE);
    /* XOR the two images byte by byte for as long as both last */
    while ((Chr1 = fgetc(Input1)) != EOF && (Chr2 = fgetc(Input2)) != EOF)
        fputc(Chr1 ^ Chr2,Output);
    if (Chr1 != EOF) {
        /* Input1 is the longer image: the byte already read must not be
           dropped, and the rest is XORed against zero padding, i.e. copied */
        fputc(Chr1,Output);
        while ((Chr1 = fgetc(Input1)) != EOF)
            fputc(Chr1,Output);
    } else {
        /* Input2 may be the longer image: copy its tail, if any */
        while ((Chr2 = fgetc(Input2)) != EOF)
            fputc(Chr2,Output);
    }
    return EXIT_SUCCESS;
}

Code compiled but not tested. The files disc1.raw and disc2.raw
are the inputs, disc3.raw gets written. This program works both
to create the image, and to recover from a fault in one of the
discs. The shorter disc image is padded with bytes with all bits
off to fit the length of the longer. For recovery this would
need some change, of course, to truncate the output as necessary.

This program has, of course, no error checking or other fancies.
It is a proof-of-concept prototype only. But it shows just how
simple a RAID system can be.
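
For completeness, here is roughly what the recovery pass could look
like -- again compiled in my head rather than tested, and it assumes
the length of the lost image is known and gets passed on the command
line, since the XOR scheme above records it nowhere:

#include <stdlib.h>
#include <stdio.h>

/* Recover disc1.raw from disc2.raw plus the parity image disc3.raw.
   The original length of disc1.raw is given as argv[1], because the
   parity pass above does not store it anywhere. */
int main(int argc, char *argv[]) {
    FILE *Survivor, *Parity, *Output;
    long Wanted, Written = 0;
    int ChrS, ChrP;

    if (argc != 2) {
        fprintf(stderr,"usage: recover <length-of-lost-image>\n");
        return EXIT_FAILURE;
    }
    Wanted   = atol(argv[1]);
    Survivor = fopen("disc2.raw","rb");
    Parity   = fopen("disc3.raw","rb");
    Output   = fopen("disc1.raw","wb");
    if (!Survivor || !Parity || !Output) {
        fprintf(stderr,"unable to open files\n");
        return EXIT_FAILURE;
    }
    while (Written < Wanted && (ChrP = fgetc(Parity)) != EOF) {
        /* bytes past the end of the survivor were XORed with zero padding */
        ChrS = fgetc(Survivor);
        if (ChrS == EOF)
            ChrS = 0;
        fputc(ChrS ^ ChrP,Output);
        Written++;
    }
    return EXIT_SUCCESS;
}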

Mike
Ron Johnson
2006-12-08 00:14:01 UTC
Permalink
Post by Mike McCarty
Post by Mike McCarty
Post by Mike McCarty
Post by Ron Johnson
RAID is *not* for archives!!!
RAID was not designed for archives. I can see no reason why
it wouldn't work for that. RAID 1, for example, is simply
making two (or more) copies of the data. Are you saying that
making more than one copy of a backup is not a reasonable
approach?
[snip]
[I wrote]
Post by Mike McCarty
Post by Mike McCarty
The main advantages would be that one would essentially
have burst error correction of the size of the disc
(this being, with the FEC on the disc, if any, an
interleaved code in effect), which is enormous, indeed,
and economy in storage over using multiple copies, as
illustrated above.
I'd only trust "RAID archiving" if the controller and a rescue CD
were also stored in the "archive location" along with the hard drives.
I gave two examples of RAID archiving which required no
special controller, and which would need no special rescue CD
to use. You snipped those, and didn't answer my question.
I'll ask the question again, and then I'll supply you with
a simple software RAID implementation which I wrote in
less than 15 minutes.
RAID level 1 is simply multiple copies. Do you think that
making multiple copies of the backup is an unreasonable
way to protect data from loss?
One way of doing RAID 5 with three discs is to write data
to each of two discs, and write the bitwise XOR of the
data on the two discs to a third. This requires no special
controller, it simply requires a tiny program. Here's one
Separate parity disk is RAID-4, not -5.
Post by Mike McCarty
#include <stdlib.h>
#include <stdio.h>
int main(void) {
FILE *Input1, *Input2, *Output;
int Chr1, Chr2;
Input1 = fopen("disc1.raw","rb");
Input2 = fopen("disc2.raw","rb");
Output = fopen("disc3.raw","wb");
if (!Input1 || !Input2 || !Output)
fprintf(stderr,"unable to open files\n"), exit(EXIT_FAILURE);
while ((Chr1 = fgetc(Input1)) != EOF && (Chr2 = fgetc(Input2)) != EOF)
fputc(Chr1 ^ Chr2,Output);
if (Chr1 != Chr2) {
if (Chr1 == EOF)
Input1 = Input2;
while ((Chr1 = fgetc(Input1)) != EOF)
fputc(Chr1,Output);
}
return EXIT_SUCCESS;
}
Shouldn't these be raw devices instead of files on filesystems?
Post by Mike McCarty
Code compiled but not tested. The files disc1.raw and disc2.raw
are the inputs, disc3.raw gets written. This program works both
to create the image, and to recover from a fault in one of the
discs. The shorter disc image is padded with bytes with all bits
off to fit the length of the longer. For recovery this would
need some change, of course, to truncate the output as necessary.
This program has, of course, no error checking or other fancies.
It is a proof-of-concept prototype only. But it shows just how
simple a RAID system can be.
What you've done is create a parity disk. Useful, but not RAID.

Mike McCarty
2006-12-08 00:52:15 UTC
Permalink
Post by Ron Johnson
Post by Mike McCarty
Post by Ron Johnson
I'd only trust "RAID archiving" if the controller and a rescue CD
were also stored in the "archive location" along with the hard drives.
I gave two examples of RAID archiving which required no
special controller, and which would need no special rescue CD
to use. You snipped those, and didn't answer my question.
I'll ask the question again, and then I'll supply you with
a simple software RAID implementation which I wrote in
less than 15 minutes.
RAID level 1 is simply multiple copies. Do you think that
making multiple copies of the backup is an unreasonable
way to protect data from loss?
One way of doing RAID 5 with three discs is to write data
to each of two discs, and write the bitwise XOR of the
data on the two discs to a third. This requires no special
controller, it simply requires a tiny program. Here's one
Separate parity disk is RAID-4, not -5.
Not according to my information.

See http://www.sohoconsult.ch/raid/raid.html
Post by Ron Johnson
Post by Mike McCarty
#include <stdlib.h>
#include <stdio.h>
int main(void) {
FILE *Input1, *Input2, *Output;
int Chr1, Chr2;
Input1 = fopen("disc1.raw","rb");
Input2 = fopen("disc2.raw","rb");
Output = fopen("disc3.raw","wb");
if (!Input1 || !Input2 || !Output)
fprintf(stderr,"unable to open files\n"), exit(EXIT_FAILURE);
while ((Chr1 = fgetc(Input1)) != EOF && (Chr2 = fgetc(Input2)) != EOF)
fputc(Chr1 ^ Chr2,Output);
if (Chr1 != Chr2) {
if (Chr1 == EOF)
Input1 = Input2;
while ((Chr1 = fgetc(Input1)) != EOF)
fputc(Chr1,Output);
}
return EXIT_SUCCESS;
}
Shouldn't these be raw devices instead of files on filesystems?
We're talking about images to be written to some medium as an example
for backup. In any case, the names in the quotes can be changed
to "/dev/hda", "/dev/hdb" and "/dev/hdc" if you insist on
using direct devices.
Post by Ron Johnson
Post by Mike McCarty
Code compiled but not tested. The files disc1.raw and disc2.raw
are the inputs, disc3.raw gets written. This program works both
to create the image, and to recover from a fault in one of the
discs. The shorter disc image is padded with bytes with all bits
off to fit the length of the longer. For recovery this would
need some change, of course, to truncate the output as necessary.
This program has, of course, no error checking or other fancies.
It is a proof-of-concept prototype only. But it shows just how
simple a RAID system can be.
What you've done is create a parity disk. Useful, but not RAID.
Pardon? Perhaps you are using a definition of RAID I'm not
familiar with. "Redundant Array of Inexpensive Discs"
(some have changed that to "Independent".)

I'll admit that this is not the usual way it is implemented,
but it is certainly RAID to my way of thinking, and conforms to the
definitions I've seen, and uses the same kind of redundant data.

Mike
Ron Johnson
2006-12-08 02:37:12 UTC
Permalink
[snip]
Post by Mike McCarty
Post by Ron Johnson
One way of doing RAID 5 with three discs is to write data to
each of two discs, and write the bitwise XOR of the data on
the two discs to a third. This requires no special
controller, it simply requires a tiny program. Here's one in
Separate parity disk is RAID-4, not -5.
Not according to my information.
See http://www.sohoconsult.ch/raid/raid.html
http://en.wikipedia.org/wiki/RAID#RAID_4

A RAID 4 uses block-level striping with a dedicated
parity disk. RAID 4 looks similar to RAID 5 except
that it does not use distributed parity,

http://en.wikipedia.org/wiki/RAID#RAID_5

A RAID 5 uses block-level striping with parity data
distributed across all member disks.

This is how it is implemented on the "enterprise" systems we have at
work.
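
For a four-disk set the difference looks roughly like this (block
numbers are illustrative only, and real RAID-5 implementations vary
in how they rotate the parity):

RAID-4 (dedicated parity disk):     RAID-5 (rotating parity):
  disk1 disk2 disk3 disk4             disk1 disk2 disk3 disk4
  D0    D1    D2    P0                D0    D1    D2    P0
  D3    D4    D5    P1                D3    D4    P1    D5
  D6    D7    D8    P2                D6    P2    D7    D8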

What your code looks like is RAID-3.

http://en.wikipedia.org/wiki/RAID#RAID_3

A RAID 3 uses byte-level striping with a dedicated
parity disk. RAID 3 is very rare in practice. One
of the side-effects of RAID 3 is that it generally
cannot service multiple requests simultaneously. This
comes about because any single block of data will, by
definition, be spread across all members of the set
and will reside in the same location. So, any I/O
operation requires activity on every disk.

[snip]
Post by Mike McCarty
Post by Ron Johnson
What you've done is create a parity disk. Useful, but not
RAID.
Pardon? Perhaps you are using a definition of RAID I'm not
familiar with. "Redundant Array of Inexpensive Discs" (some have
changed that to "Independent".)
I'll admit that this is not the usual way it is implemented, but
it is certainly RAID to my way of thinking, and conforms to the
definitions I've seen, and uses the same kind of redundant data.
Integral to the implementation of RAID is that the disk is divided
into large blocks, and there are parity blocks for data blocks.

Mike McCarty
2006-12-08 02:42:11 UTC
Permalink
Ron Johnson wrote:

[snip]
Post by Ron Johnson
What your code looks like is RAID-3.
http://en.wikipedia.org/wiki/RAID#RAID_3
A RAID 3 uses byte-level striping with a dedicated
parity disk. RAID 3 is very rare in practice. One
of the side-effects of RAID 3 is that it generally
cannot service multiple requests simultaneously. This
comes about because any single block of data will, by
definition, be spread across all members of the set
and will reside in the same location. So, any I/O
operation requires activity on every disk.
[snip]
Ok, fair enough. But I didn't do any striping.
Post by Ron Johnson
Post by Mike McCarty
Post by Ron Johnson
What you've done is create a parity disk. Useful, but not
RAID.
Pardon? Perhaps you are using a definition of RAID I'm not
familiar with. "Redundant Array of Inexpensive Discs" (some have
changed that to "Independent".)
I'll admit that this is not the usual way it is implemented, but
it is certainly RAID to my way of thinking, and conforms to the
definitions I've seen, and uses the same kind of redundant data.
Integral to the implementation of RAID is that the disk is divided
into large blocks, and there are parity blocks for data blocks.
I don't like arguments. That isn't what I've been led to believe.
Raid 1 doesn't do that, AFAIK. I won't respond further, as this
is getting way away from the OP's question.

Mike
Ron Johnson
2006-12-08 05:23:05 UTC
Permalink
Post by Mike McCarty
[snip]
[snip]
Post by Mike McCarty
I don't like arguments. That isn't what I've been led to believe.
Raid 1 doesn't do that, AFAIK. I won't respond further, as this
is getting way away from the OPs question.
Anyway, yours is a good idea for making data more secure than plain
old redundancy.

Douglas Tutty
2006-12-08 01:51:29 UTC
Permalink
Post by Mike McCarty
Post by Ron Johnson
RAID is *not* for archives!!!
RAID was not designed for archives. I can see no reason why
it wouldn't work for that. RAID 1, for example, is simply
making two (or more) copies of the data. Are you saying that
making more than one copy of a backup is not a reasonable
approach?
[snip]
Post by Mike McCarty
The main advantages would be that one would essentially
have burst error correction of the size of the disc
(this being, with the FEC on the disc, if any, an
interleaved code in effect), which is enormous, indeed,
and economy in storage over using multiple copies, as
illustrated above.
I'd only trust "RAID archiving" if the controller and a rescue CD
were also stored in the "archive location" along with the hard drives.
The case I'm talking about for raid is my current raid1 system:
vanilla Etch, LVM on MD. The "controller" is the linux kernel. Each of
my existing disks can boot on its own (grub can't read two MBRs at once,
but my bios can boot either drive since I installed grub into the MBR of
both). That takes care of /boot on md0.

md1 is vg0 with lvm on top. It's all taken care of nicely by the kernel
and the initramfs.

For backing up this 80 GB pair of drives, I would use an 80 GB laptop
drive partitioned exactly the same. Add each partition into its
corresponding raid1 array and get it to sync. Install grub onto its
MBR. Remove it from the array and you have a perfect copy. If disaster
strikes, buy a new computer with new drives, hook up the laptop drive
and boot. Partition the new drives and add them to the (degraded)
array. Then remove the laptop drive from the array. You're back where
you started (except for hardware changes). (Thanks to Len Sorensen on
the amd64 list for this idea).

This is neat and elegant but doesn't address the issue of the backup
drive's failure if such failure is non-atomic. Actually, it means
forcing drive failure to be atomic since there is no FEC underlying the
md arrays except the drive's ECC. I wonder how long it would take my
Athlon 3800+ to calculate an SHA-sum of the raw partition? Write the
sum on the outside of the drive. Then I can boot it up in ro
(init=/bin/sh) and calculate it again. If it's the same there should be
no files missing or corrupted.
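
I don't normally write C, but something along these lines (an untested
sketch cribbed from the OpenSSL man pages; it assumes libcrypto is
installed and gets linked with -lcrypto, and the device name is only an
example) ought to do the calculation -- it is essentially what sha1sum
does when pointed at a device:

#include <stdio.h>
#include <openssl/sha.h>

/* Hash a raw partition (or any file) with SHA-1 and print the digest.
   Illustration only: device name and buffer size are arbitrary choices. */
int main(int argc, char *argv[]) {
    const char *dev = (argc > 1) ? argv[1] : "/dev/hda1";
    unsigned char buf[1 << 16], digest[SHA_DIGEST_LENGTH];
    FILE *in = fopen(dev, "rb");
    SHA_CTX ctx;
    size_t n;
    int i;

    if (!in) {
        perror(dev);
        return 1;
    }
    SHA1_Init(&ctx);
    while ((n = fread(buf, 1, sizeof buf, in)) > 0)
        SHA1_Update(&ctx, buf, n);
    SHA1_Final(digest, &ctx);
    for (i = 0; i < SHA_DIGEST_LENGTH; i++)
        printf("%02x", digest[i]);
    printf("  %s\n", dev);
    fclose(in);
    return 0;
}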

As far as a rescue CD goes, we run into the problem of the longevity of
CD-ROM media. Yes I would have an Etch netinst CD (on a 2" disk) but I
would also have a USB stick set up with hd-media and the full CD1.iso (so
I have the installation manual, readmes, etc). I don't know how long a
USB stick lasts in storage; they haven't been around for 10 years (yes I
know that EEPROM has been around for a while). Hard drives have been
around for a while.

Doug.
Mike McCarty
2006-12-07 15:16:35 UTC
Permalink
Post by Douglas Tutty
Post by Reid Priedhorsky
No, you _should_ compress it and then use some of the space you saved to
add some carefully chosen redundancy which will allow you to reconstruct
everything, not just some things, in case of failure. (E.g., using par2.)
Scenario C: Compression plus redundancy
Suppose you have 100 megabytes of files, uncompressed. You create a tar
archive and compress it down to 75M. You then create 10M of redundancy
using (e.g.) par2, for a total of 85M. A failure occurs, and 2M of data
is lost. You use par2 to reconstruct the archive, and nothing is lost.
(You can do this regardless of whether data, redundancy, or both are
destroyed.) You are happy.
Hi Reid,
I've been looking at par2. The question remains how to apply it to data
stored on media where the potential failure is one of media, not
transmission. If I only protect the tar.bz2 file and a media failure
occurs, how could I have set up the par2 redundancy files to allow me to
recover the data?
Apparently, hard disks use FEC themselves so that they either can fix
the data or there is too much damage and the drive is inaccessible. It
The sector is unreadable. It is possible to command discs to do "long
reads" which return the data in the sector along with the check bits,
and possibly recover some of the data if one can recognize other
redundancy present in it. For example, ASCII text is often so redundant
that it can be reconstructed with even half of the data corrupted.
Post by Douglas Tutty
seems to be an all-or-nothing proposition. If someone has experience
of FEC drive failures that refutes this I'd be very interested.
I'm not sure that what I wrote above is a refutation of what you said.
Post by Douglas Tutty
The only disk failures I have experienced are on older drives without
Hmm? Discs have had FEC on them since 1982 at least (when I got involved
with them).
Post by Douglas Tutty
FEC that for a given sector return an error about bad CRC but one can
carry on and read the rest of the disk. It was from this perspective
that I proposed the question that led to this thread.
There are many kinds of disc failures. If you have a failing bearing,
then you may have speed variations that the data resynchronizer cannot
overcome. If this happens, you may have sporadic ability to read any
given sector, and if you try often enough, you may get everything off
the disc.

If you have a medium failure, then you may have weak or migrating bits
which are sometimes readable without error, and sometimes not. Or you
may have a hard error. If the FEC can correct it, then the data get
recovered, and the sector gets remapped. This happens without any
notification, unless you ask the disc for its information.

If the power on the disc fails, then the disc is unreadable in any
fashion at all.
Post by Douglas Tutty
If drives are atomic in this way, it seems that the only way to achieve
redundancy is through multiple copies (either manually done or via
raid1).
Well, that's not quite the case. But if you want to protect against
single point failures, like the uC on the disc goes South, or the
power supply on it takes a nosedive, then the only way to do that
is via redundancy.
Post by Douglas Tutty
I'm still hoping that someone who knows how linux software raid works can
tell me how it decides that a drive has failed. This question was posed
in a thread about raid1 internals.
Your best bet may be to get the code and read it.

Mike
h***@topoi.pooq.com
2006-12-07 17:26:13 UTC
Permalink
Post by Douglas Tutty
Post by Reid Priedhorsky
No, you _should_ compress it and then use some of the space you saved to
add some carefully chosen redundancy which will allow you to reconstruct
everything, not just some things, in case of failure. (E.g., using par2.)
Scenario C: Compression plus redundancy
Suppose you have 100 megabytes of files, uncompressed. You create a tar
archive and compress it down to 75M. You then create 10M of redundancy
using (e.g.) par2, for a total of 85M. A failure occurs, and 2M of data
is lost. You use par2 to reconstruct the archive, and nothing is lost.
(You can do this regardless of whether data, redundancy, or both are
destroyed.) You are happy.
Hi Reid,
I've been looking at par2. The question remains how to apply it to data
stored on media where the potential failure is one of media not
transmission. If I only protect the tar.bz2 file and a media failure
occurs, how could I have set up the par2 redundancy files to allow me to
recover the data.
Apparently, hard disks use FEC themselves so that they either can fix
the data or there is too much damage and the drive is inaccessible. It
seems to be an all-or-nothing proposition. If someone has experience
of FEC drive failures that refutes this I'd be very interested.
The only disk failures I have experienced are on older drives without
FEC that for a given sector return an error about bad CRC but one can
carry on and read the rest of the disk. It was from this perspective
that I proposed the question that led to this thread.
If drives are atomic in this way, it seems that the only way to achieve
redundancy is through multiple copies (either manually done or via
raid1).
I'm still hoping that someone who knows how linux software raid works can
tell me how it decides that a drive has failed. This question was posed
in a thread about raid1 internals.
Thanks,
Doug.
I quite agree. But in the absence of error-correction codes,
uncompressed is better.

And if your error-correction software should happen to be unusable in several
years, your errors will not be easy to correct.

Did you ever write any code in the 1970's that can't be run any more?
I did.

-- hendrik
Douglas Tutty
2006-12-07 21:39:53 UTC
Permalink
Post by h***@topoi.pooq.com
Post by Douglas Tutty
Post by Reid Priedhorsky
No, you _should_ compress it and then use some of the space you saved to
add some carefully chosen redundancy which will allow you to reconstruct
everything, not just some things, in case of failure. (E.g., using par2.)
Scenario C: Compression plus redundancy
Suppose you have 100 megabytes of files, uncompressed. You create a tar
archive and compress it down to 75M. You then create 10M of redundancy
using (e.g.) par2, for a total of 85M. A failure occurs, and 2M of data
is lost. You use par2 to reconstruct the archive, and nothing is lost.
(You can do this regardless of whether data, redundancy, or both are
destroyed.) You are happy.
Hi Reid,
I've been looking at par2. The question remains how to apply it to data
stored on media where the potential failure is one of media not
transmission. If I only protect the tar.bz2 file and a media failure
occurs, how could I have set up the par2 redundancy files to allow me to
recover the data.
Apparently, hard disks use FEC themselves so that they either can fix
the data or there is too much damage and the drive is inaccessible. It
seems to be an all-or-nothing proposition. If someone has experience
of FEC drive failures that refutes this I'd be very interested.
The only disk failures I have experienced are on older drives without
FEC that for a given sector return an error about bad CRC but one can
carry on and read the rest of the disk. It was from this perspective
that I proposed the question that led to this thread.
If drives are atomic in this way, it seems that the only way to achieve
redundancy is through multiple copies (either manually done or via
raid1).
I'm still hoping that someone who knows how linux software raid works can
tell me how it decides that a drive has failed. This question was posed
in a thread about raid1 internals.
I quite agree. But in the absence of error-correction codes,
uncompressed is better.
And if your error-correction software should happen to be unusable in several
years, your errors will not be easy to correct.
Did you ever write any code in the 1970's that can't be run any more?
I did.
Thanks Hendrik,

I understand what you mean about compression and that seems to be the
consensus not just here but in general: without error correction, don't
compress, so that recovery is possible; with error correction, compress
to save space.

I was 4 years old in 1970. However, later (I forget the year) I had a
Timex/Sinclair 1000. The programs I wrote in Basic can and have
sometimes been ported to my Sharp PC-1401's Basic, and later ported to
python on linux. The facilitator, as always, is well-documented,
non-cryptic code. However, the OS I made for the first computer I built
from scratch was written in Z-80 machine code in hex, so it isn't any
good for anything else. Anything I write now is either in python or
fortran77. I strive to keep everything portable and minimize the use of
add-ons. Pure fortran77 should always be able to be compiled on pretty
much anything. If a python interpreter goes out of style, at least the
source forms a great prototype for a new port. I don't do C or perl, and
I only program sh like a dos bat file (if flow-control is needed I
switch to python), which I suppose makes me an oddity on *NIX.

That's why I was looking for an existing archive format with built-in FEC.
Anything I cobble together would have to be backed up separately so that
restoration would be possible. I __really__ wish that FEC were a
standard option of tar, cpio, or afio, being readily available. Par2 is
available as a package but will it always be? If I just archive its
executable, it may not work with whatever the libs-du-jour are, so I may
try to learn how to compile it from source, statically linked.

All I really want is an archive medium (and format) as robust as
pigment-on-parchment, that can store 80 GB in about 300 cubic
centimetres (the data density of a 2.5" drive in a ruggedized
enclosure). I guess this is the holy grail of data storage and is
both the bread and butter of the big specialized companies and the
reason that banks still print everything out somewhere.

I wonder what NASA did for their deep-space probes like Voyager? The
recent stuff seems to be disposable (e.g. how long will this one last?),
but Voyager was meant to keep on running. They used some sort of gold
pressed record for ETI to read but I wonder what they used for the
computer's OS and data-storage in-between downloads?

Thanks

Doug.
h***@topoi.pooq.com
2006-12-08 16:20:59 UTC
Permalink
Post by Douglas Tutty
I wonder what NASA did for their deep-space probes like Voyager? The
recent stuff seems to be disposable (e.g. how long will this one last?),
but Voyager was meant to keep on running. They used some sort of gold
pressed record for ETI to read but I wonder what they used for the
computer's OS and data-storage in-between downloads?
A few years ago I heard that they had stored a lot of their early data on
magnetic tapes, which were deteriorating and in need of copying to new
media, but that there was no funding available to do this.

-- hendrik
Ron Johnson
2006-12-07 23:23:16 UTC
Permalink
[snip]
Post by h***@topoi.pooq.com
I quite agree. But in the absence of error-correction codes,
uncompressed is better.
And if your error-correction software should happen to be unusable in several
years, your errors will not be easy to correct.
That's why you write such s/w in generic ANSI C. (I presume there's
some "strict" switch in g77, too.)
Post by h***@topoi.pooq.com
Did you ever write any code in the 1970's that can't be run any more?
I did.
Shame on you for not writing in a portable language. Go COBOL!!!

h***@topoi.pooq.com
2006-12-08 16:26:19 UTC
Permalink
Post by h***@topoi.pooq.com
Did you ever write any code in the 1970's that can't be run any more?
I did.
Shame on you for not writing in a portable language. Go COBOL!!!
I actually did my non-surviving code in assembler for the IBM 1620, a
decimal machine. I believe I had a Fortran II compiler available --
that was in the days before Fortran had been standardized.

In the 70's I wrote most of an Algol 68 compiler in Algol W. The intent
was to rewrite it in Algol 68 when it was done. Conversion would
probably be done (mostly) mechanically, but funding ran out shortly
before the first compiler was quite finished.

I looked at it again a few years ago -- some bit rot has occurred in
the lexical analyser, but most of it is still readable.

-- hendrik
Mike McCarty
2006-12-08 18:22:29 UTC
Permalink
Post by h***@topoi.pooq.com
Post by h***@topoi.pooq.com
Did you ever write any code in the 1970's that can't be run any more?
I did.
Shame on you for not writing in a portable language. Go COBOL!!!
I actually did my non-surviving code in assembler for the IBM 1620, a
decimal machine. I believe I had a Fortran II compiler available --
that was in the days before Fortran had been standardized.
Well, *nominally* a decimal machine. IIRC, part of the bootstrap process
was to load in the math tables. It was actually possible to make
it into a sort-of octal machine by loading octal tables.
Post by h***@topoi.pooq.com
In the 70's I wrote most of an Algol 68 compiler in Algol W. The intent
was to rewrite it in Algol 68 when it was done. Conversion would
probably be done (mostly) mechanically, but funding ran out shortly
before the first compiler was quite finished.
Yeah, I did a compiler for Modula II several years ago which died on
the vine like that. It's kinda a disappointment for the engineers
when the thing they are working on doesn't actually ever get built.
Post by h***@topoi.pooq.com
I looked at it again a few years ago -- some bit rot has occurred in
the lexical analyser, but most of it is still readable.
Huh. Interesting. I've got floppies which are more than 10 yrs old
which read perfectly.

Mike
Ron Johnson
2006-12-08 19:04:14 UTC
Permalink
[snip]
Post by Mike McCarty
Post by h***@topoi.pooq.com
I looked at it again a few years ago -- some bit rot has occurred in
the lexical analyser, but most of it is still readable.
Huh. Interesting. I've got floppies which are more than 10 yrs old
which read perfectly.
By "lexical analyser", I *think* he meant, "I've forgotten some of
the trickier syntax".

Mike McCarty
2006-12-08 20:08:12 UTC
Permalink
[snip]
Post by Mike McCarty
Post by h***@topoi.pooq.com
I looked at it again a few years ago -- some bit rot has occurred in
the lexical analyser, but most of it is still readable.
Huh. Interesting. I've got floppies which are more than 10 yrs old
which read perfectly.
By "lexical analyser", I *think* he meant, "I've forgotten some of
the trickier syntax".
I'm sure he could answer that better than you or I. However,
"bit rot" normally means that some of the bits on the magnetic
medium have either flaked off, or have migrated (a real problem
on low-coercivity magnetic media) and consequently have become
unreadable by the drive.

A lexical analyzer is part of a compiler. A compiler typically
comprises a lexical analyzer (tokenizer), semantic analyzer, symbol
manager, code generator, machine independent optimizer, code assembler,
and machine dependent optimizer. Some compilers also have an
assembler in them (not what is meant by a code assembler; that
turns the machine independent intermediate code into machine
dependent code, and does instruction scheduling).

Mike
h***@topoi.pooq.com
2006-12-10 13:34:37 UTC
Permalink
[snip]
Post by Mike McCarty
Post by h***@topoi.pooq.com
I looked at it again a few years ago -- some bit rot has occurred in
the lexical analyser, but most of it is still readable.
Huh. Interesting. I've got floppies which are more than 10 yrs old
which read perfectly.
By "lexical analyser", I *think* he meant, "I've forgotten some of
the trickier syntax".
I meant, the compiler contained a lexical analyser, and there were some
irrecoverable bad blocks on the magnetic tape that contained the source
code for that lexical analyser.

If there were a prospect of reviving the compiler (I suspect it's not
worth the effort except as a historical artifact), filling in the
missing code would be a very small part of the project. Rewriting the
code generator to generate other than IBM 360 code would involve more
work, as well as rewriting the whole thing in another programming
language so that it can be compiled and used on today's systems.

But to succeed in today's language market, it would probably have to be
transmogrified into some kind of object-oriented Algol 68, which
would be a very different thing.

-- hendrik
Douglas Tutty
2006-12-10 18:27:29 UTC
Permalink
Post by h***@topoi.pooq.com
I meant, the compiler contained a lexical analyser, and there were some
irrecoverable bad blocks on the magnetic tape that contained the source
code for that lexical analyser.
If there was a prospect of reviving the compiler (I suspect it's not
worth the effort except as a historical artifact), filling in the
missing code would be a very small part of the project. Rewriting the
code generator to generate other than IBM 360 code would involve more
work, as well as rewriting the whole thing in another programming
language so that it can be compiled and used on today's systems.
But to succeed in today's language market, it would probably have to be
transmogrified into some kind of object-oriented Algol 68, which
would be a very different thing.
I've yet to see the appeal of OO. Then again I've never seen Algol. I
don't do C (too many punctuational snares); ditto perl; ditto bash;
machine/assembler isn't portable. To me that leaves Fortran and Python
(I don't tend to use the OO nature of python unless I can help it).

I guess what I'm wistful for are all the lessons learned about how to
run a computer installation during the mainframe era that are
inaccessible now except by picking your brain. The focus recently has
been on server-farm data centers, web interfaces, etc. Now, IBM is
marketing server consolidation and virtualizing everything, but that's
not the same.

The last time I was at UofT, I checked out the applicable library and
came up empty. Yes, every engineer has to learn Fortran so there were
books on that, and there were mathematic-centric texts on computer
design but there was nothing on the historical operating culture, the
accumulated wisdom of the age.

I'm not saying that I wish the CRT hadn't been invented or that I want
to try document processing using punch cards, but 99 % of what I use a
computer for can be done on any *NIX computer of any era. (My 486
usually sits at 97% idle, 2% of which is top.)


With regard to your other post, I hope this hasn't become much to-do
about naught.

Doug.
h***@topoi.pooq.com
2006-12-11 02:44:36 UTC
Permalink
Post by Douglas Tutty
Post by h***@topoi.pooq.com
I meant, the compiler contained a lexical analyser, and there were some
irrecoverable bad blocks on the magnetic tape that contained the source
code for that lexical analyser.
If there was a prospect of reviving the compiler (I suspect it's not
worth the effort except as a historical artifact), filling in the
missing code would be a very small part of the project. Rewriting the
code generator to generate other than IBM 360 code would involve more
work, as well as rewriting the whole thing in another programming
language so that it can be compiled and used on today's systems.
But to succeed in today's language market, it would probably have to be
transmogrified into some kind of object-oriented Algol 68, which
would be a very different thing.
I've yet to see the appeal of OO. Then again I've never seen Algol. I
don't do C (too many punctuational snares); ditto perl; ditto bash;
machine/assembler isn't portable. To me that leaves Fortran and Python
(I don't tend to use the OO nature of python unless I can help it).
Much of the advantage of OO can be obtained by:
* strong type checking (yes, really bulletproof strong type checking)
* garbage collection, so you won't accidentally free storage you really need
* the ancillary run-time checks you need to make sure you don't break
the run-time model of the language (such as checks on subscript bounds)

This tends to be enough that run-time errors can be reported at the logical
level of the language you are using, instead of hexadecimal gibberish.
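
As a trivial illustration (a sketch only), this is the sort of check
such a language performs on every subscript for free; in C you only get
it if you remember to write, and call, something like it yourself:

#include <stdio.h>
#include <stdlib.h>

/* A bounds-checked accessor: report the error at the level of the
   language instead of trampling memory or dumping core. */
static int checked_get(const int *a, size_t len, size_t i) {
    if (i >= len) {
        fprintf(stderr, "subscript %lu out of range 0..%lu\n",
                (unsigned long)i, (unsigned long)(len - 1));
        exit(EXIT_FAILURE);
    }
    return a[i];
}

int main(void) {
    int table[4] = { 10, 20, 30, 40 };

    printf("%d\n", checked_get(table, 4, 3));  /* fine: prints 40 */
    printf("%d\n", checked_get(table, 4, 7));  /* caught: clean diagnostic */
    return 0;
}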

C++ does *not* have these advantages.

-- hendrik
Ron Johnson
2006-12-11 03:57:22 UTC
Permalink
Post by h***@topoi.pooq.com
Post by Douglas Tutty
Post by h***@topoi.pooq.com
I meant, the compiler contained a lexical analyser, and there were some
irrecoverable bad blocks on the magnetic tape that contained the source
code for that lexical analyser.
If there was a prospect of reviving the compiler (I suspect it's not
worth the effort except as a historical artifact), filling in the
missing code would be a very small part of the project. Rewriting the
code generator to generate other than IBM 360 code would involve more
work, as well as rewriting the whole thing in another programming
language so that it can be compiled and used on today's systems.
But to succeed in today's language market, it would probably have to be
transmogrified into some kind of object-oriented Algol 68, which
would be a very different thing.
I've yet to see the appeal of OO. Then again I've never seen Algol. I
don't do C (too many punctuational snares); ditto perl; ditto bash;
machine/assembler isn't portable. To me that leaves Fortran and Python
(I don't tend to use the OO nature of python unless I can help it).
* strong type checking (yes, really bulletproof strong type checking)
* garbage collection, so you won't accidentally free storage you really need
* the ancillary run-time checks you need to make sure you don't break
the run-time model of the language (such as checks on subscript bounds)
I assume you are making the point that lots of non-fashionable
languages can do this... Heck, VAX Basic did that in the mid-1980s.
Post by h***@topoi.pooq.com
This tends to be enough that run-time errors can be reported at the logical
level of the language you are using, instead of hexadecimal gibberish.
C++ does *not* have these advantages.
Miles Bader
2006-12-12 00:41:44 UTC
Permalink
Post by Douglas Tutty
I've yet to see the appeal of OO. Then again I've never seen Algol. I
* strong type checking * garbage collection * ancillary run-time checks
Those have nothing to do with OOP (that is to say, they are orthogonal
to it).

OOP's main advantages would seem to be:

(1) improvement of modularity by keeping code related to a particular
type in one place even in the presence of hierarchical type
relationships,

(2) easy sharing of common code that often results from such type
relationships, and

(3) making it simpler to code generic algorithms by taking advantage of
these hierarchical type relationships.
C++ does *not* have these advantages.
It has many other advantages however, including those from OOP, and more
unusually, a notational power that makes certain sorts of programs
_much_ easier to write/read (part of this is the fact that doing so can
be done _efficiently_ -- it's very common to see e.g. java programs
which

-Miles
--
The car has become... an article of dress without which we feel uncertain,
unclad, and incomplete. [Marshall McLuhan, Understanding Media, 1964]
Miles Bader
2006-12-12 00:52:50 UTC
Permalink
Whoops, chopped off my last paragraph; I meant:

It has many other advantages however, including those from OOP, and more
unusually, a notational power that makes certain sorts of programs
_much_ easier to write/read. [Part of this is the fact that doing so
can be done _efficiently_ -- it's very common to see e.g. java programs
which are hard to read and have subtle bugs because of the tricks
they're playing to avoid heap allocating temporary objects. Because C++
allows using value (or value-like) semantics instead in many cases, you
don't need so many tricks, and it can greatly improve the
maintainability of the code.]

-MIles
--
Quidquid latine dictum sit, altum viditur.
h***@topoi.pooq.com
2006-12-12 15:20:28 UTC
Permalink
Post by Miles Bader
It has many other advantages however, including those from OOP, and more
unusually, a notational power that makes certain sorts of programs
_much_ easier to write/read. [Part of this is the fact that doing so
can be done _efficiently_ -- it's very common to see e.g. java programs
which are hard to read and have subtle bugs because of the tricks
they're playing to avoid heap allocating temporary objects. Because C++
allows using value (or value-like) semantics instead in many cases, you
don't need so many tricks, and it can greatly improve the
maintainability of the code.]
I know, I know. Those tricks are a pain in the butt.

Eiffel eliminates that problem with its "expanded" classes.
Modula-3 avoids that problem by having data structures that are *not*
made of objects (in the technical OO sense) and that can be placed off
the heap, and in other objects.

Modula-3 even goes the whole way to low-level system programming with
its "unsafe" features. The difference between these and C++ or C is
that you can't use them by accident; you have to explicitly mark the
code that uses them as "unsafe".

Although I find these languages wordy, I still think it a great pity
that C++ took off instead of them.

- hendrik
Mike McCarty
2006-12-12 21:23:49 UTC
Permalink
[snip]
Post by h***@topoi.pooq.com
Eiffel eliminates that problem with its "expanded" classes.
Modula-3 avoids that problem by having data structures that are *not*
made of objects (in the technical OO sense) and that can be placed off
the heap, and in other objects.
Modula-3 even goes the whole way to low-level system programming with
its "unsafe" features. The difference between these and C++ or C is
that you can't use them by accident; you have to explicitly mark the
code that uses them as "unsafe".
Modula-3 I'm not familiar with. There were two problems with Modula-II
(1) it was named Modula-II instead of Pascal-II
(2) it came along 10 years too late

When C took over from Pascal, it was evident to all with eyes to see
that it was an inferior language /as a language/ to Pascal. However,
Pascal was also deliberately hamstrung. The language was designed for
beginning programmers, and had so many restraints and safety nets
that it couldn't be used for systems programming. Another issue
is that the language definition specified p-code as the output,
but one can leave that aside.

What one cannot leave aside, for systems programming, is the places
where strong typing could not be broken when one needed to,
and where separate compilation was not supported.

Another flaw in Pascal was that it was based on the successive
refinement model for software development, which was a failure.
In particular, nested procedures are a bad idea. So are local
variables hiding global variables, but C also has that defect.
But these features of the language can just not be used. No one
forces you to write nested procedures.

But when C came along, Pascal was just not up to systems programming.
The only other alternative was assembler. C, bad as it is, is
superior to assembler.

Had Modula-II come along in a timely manner, and been named Pascal-II
so people would have had a "warm fuzzy" feeling of familiarity,
then C would, I believe, have been the backwater, and not Modula-II.
Post by h***@topoi.pooq.com
Although I find these languages wordy, I still think it a great pity
that C++ took off instead of them.
Well, you've got my take on why that happened.

Mike
Ron Johnson
2006-12-12 21:56:37 UTC
Permalink
Post by Mike McCarty
[snip]
Post by h***@topoi.pooq.com
Eiffel eliminates that problem with its "expanded" classes.
Modula-3 avoids that problem by having data structures that are *not*
made of objects (in the technical OO sense) and that can be placed off
the heap, and in other objects.
Modula-3 even goes the whole way to low-level system programming with
its "unsafe" features. The difference between these and C++ or C is
that you can't use them by accident; you have to explicitly mark the
code that uses them as "unsafe".
Modula-3 I'm not familiar with. There were two problems with Modula-II
(1) it was named Modula-II instead of Pascal-II
(2) it came along 10 years too late
When C took over from Pascal, it was evident to all with eyes to see
that it was an inferior language /as a language/ to Pascal. However,
Pascal was also deliberately hamstrung. The language was designed for
beginning programmers, and had so many restraints and safety nets
that it couldn't be used for systems programming. Another issue
is that the language definition specified p-code as the output,
but one can leave that aside.
What one cannot leave aside, for systems programming, is the places
where strong typing could not be broken when one needed to,
and where separate compilation was not supported.
Another flaw in Pascal was that it was based on the successive
refinement model for software development, which was a failure.
In particular, nested procedures are a bad idea. So are local
variables hiding global variables, but C also has that defect.
But these features of the language can just not be used. No one
forces you to write nested procedures.
But when C came along, Pascal was just not up to systems programming.
The only other alternative was assembler. C, bad as it is, is
superior to assembler.
Had Modula-II come along in a timely manner, and been named Pascal-II
so people would have had a "warm fuzzy" feeling of familiarity,
then C would, I believe, have been the backwater, and not Modula-II.
Post by h***@topoi.pooq.com
Although I find these languages wordy, I still think it a great pity
that C++ took off instead of them.
Well, you've got my take on why that happened.
My recollection of the 1980s MS-DOS world was that Turbo Pascal's
problems were its small memory model and lack of modules until
v4.0, by which time C had already taken over.

Mike McCarty
2006-12-12 22:30:34 UTC
Permalink
Post by Ron Johnson
My recollection of the 1980s MS-DOS world was that Turbo Pascal's
problems were its small memory model and lack of modules until
v4.0, by which time C had already taken over.
Who said anything about MSDOS? C took over when CP/M was the rage.
"Modules" are just what I mentioned with respect to "separate
compilation".

The issue with Pascal is that it is completely unsuited to
systems programming altogether, because it has no escape
route from the strong typing, no provision for separate
compilation, and uses interpreted p-code.

It is impossible to write a string concatenation routine
in Pascal because of the strong typing. It is impossible
to unpack a floating point number into its component parts
because of the strong typing. These are just two examples
of how Pascal is completely unsuited for systems programming.
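
Here, for illustration only, is the kind of thing I mean, in C --
pulling an IEEE-754 single-precision float apart into its fields. It
assumes 32-bit unsigned int and IEEE-754 floats, which is exactly the
sort of machine-level assumption standard Pascal gives you no sanctioned
way to exploit:

#include <stdio.h>
#include <string.h>

/* Unpack a float into sign, exponent and mantissa fields.
   Assumes unsigned int and float are both 32 bits, IEEE-754 format. */
int main(void) {
    float f = -118.625f;
    unsigned int bits;

    memcpy(&bits, &f, sizeof bits);            /* the type-punning step */
    printf("sign     %u\n", (bits >> 31) & 0x1u);
    printf("exponent %u\n", (bits >> 23) & 0xFFu);
    printf("mantissa 0x%06X\n", bits & 0x7FFFFFu);
    return 0;
}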

Without separate compilation, programs are individual
separate things. This prevents code sharing, which is
one of the prime reasons for having an OS. So this is
another reason Pascal cannot be used for systems programming.

Since the output of a Pascal compiler is required to be
p-code, it needs an interpreter. So, we're back to
writing the interpreter in assembler. Furthermore, the
OS normally cannot take the performance hit that interpretation
imposes. So, again, Pascal is completely unsuited for
systems programming.

Modula-II addressed all these issues, in a way superior
to C. It just had the wrong name and came along 10 years
too late.

Mike
Ron Johnson
2006-12-12 23:53:17 UTC
Permalink
Post by Mike McCarty
Post by Ron Johnson
My recollection of the 1980s MS-DOS world was that Turbo Pascal's
problems were its small memory model and lack of modules until
v4.0, by which time C had already taken over.
Who said anything about MSDOS? C took over when CP/M was the rage.
"Modules" are just what I mentioned with respect to "separate
compilation".
The issue with Pascal is that it is completely unsuited to
systems programming altogether, because it has no escape
route from the strong typing, no provision for separate
compilation, and uses interpreted p-code.
I'm not a systems programmer, I'm a DP programmer. Thus, I don't
give a Rat's Arse whether my language of choice is good for system
programming. In fact, I *like* B&D languages. Why? Not needing to
worry about pointers and heaps and array under/overflows trampling
over core means that my jobs die less often, which is A Good Thing.


h***@topoi.pooq.com
2006-12-12 23:00:26 UTC
Permalink
Post by Mike McCarty
[snip]
Post by h***@topoi.pooq.com
Eiffel eliminates that problem with its "expanded" classes.
Modula-3 avoids that problem by having data structures that are *not*
made of objects (in the technical OO sense) and that can be placed off
the heap, and in other objects.
Modula-3 even goes the whole way to low-level system programming with
its "unsafe" features. The difference between these and C++ or C is
that you can't use them by accident; you have to explicitly mark the
code that uses them as "unsafe".
Modula-3 I'm not familiar with. There were two problems with Modula-II
(1) it was named Modula-II instead of Pascal-II
Yeah. Wirth even wrote a paper about the mistakes made in the design of
Pascal, and then with Modula he repeated many of them.
Post by Mike McCarty
(2) it came along 10 years too late
When C took over from Pascal, it was evident to all with eyes to see
that it was an inferior language /as a language/ to Pascal. However,
Pascal was also deliberately hamstrung. The language was designed for
beginning programmers, and had so many restraints and safety nets
that it couldn't be used for systems programming.
It was originally designed for the 60-bit CDC processors.
A lot of its weirder restrictions derive directly from implementation
limits. Roughly speaking, it likes its data to fit in 60-bit words.
Arrays and records seemed to be an exception to this, but so-called sets
were not.

Roughly speaking, Pascal encourages you to think in abstract,
machine-independent concepts, and then restricts the implementation so
you can't.
Post by Mike McCarty
Another issue
is that the language definition specified p-code as the output,
but one can leave that aside.
The language definition never specified p-code. The first
implementation went straight to CDC Cyber machine code.

p-code was defined for a so-called portable Pascal compiler.
Better technologies for portable object code existed, but Wirth chose to
ignore them.
Post by Mike McCarty
What one cannot leave aside, for systems programming, is the places
where strong typing could not be broken when one needed to,
and where separate compilation was not supported.
But strong typing could be broken, and that is one of the real flaws in
Pascal. You just use record variants. You end up breaking it all the
time without intending to. Horrible. And when you need to break it,
it's syntactically and semantically awkward, and not at all obvious what
you're doing.
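
The C analogue of that variant-record trick, as a sketch assuming 32-bit
IEEE-754 floats: a union lets the same storage be read back under a
different type, which is both how you unpack a float into its parts and
how the type checking quietly disappears.

    #include <stdio.h>
    #include <stdint.h>

    union f2u {
        float    f;   /* write the value through this member...    */
        uint32_t u;   /* ...and read the raw bits through this one */
    };

    int main(void)
    {
        union f2u v;
        v.f = -1.5f;

        uint32_t bits     = v.u;
        uint32_t sign     = bits >> 31;          /* 1 bit   */
        uint32_t exponent = (bits >> 23) & 0xFF; /* 8 bits  */
        uint32_t mantissa = bits & 0x7FFFFF;     /* 23 bits */

        printf("sign=%u exponent=%u mantissa=0x%06X\n",
               (unsigned)sign, (unsigned)exponent, (unsigned)mantissa);
        return 0;
    }
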
Post by Mike McCarty
Another flaw in Pascal was that it was based on the successive
refinement model for software development, which was a failure.
In particular, nested procedures are a bad idea.
I don't think nested procedures have anything to do with successive
refinement. I've found them quite useful, especially when you want to
pass procedures as parameters to other procedures. Pascal actually got
the semantics of this somewhat right.
Post by Mike McCarty
So are local
variables hiding global variables, but C also has that defect.
Occasionally useful, but you don't want to get it by accident.
Post by Mike McCarty
But these features of the language can just not be used. No one
forces you to write nested procedures.
But when C came along, Pascal was just not up to systems programming.
The only other alternative was assembler.
There were, I believe, alternatives to C even in the early 70's.
C spread because it was the implementation language for Unix.
Post by Mike McCarty
C, bad as it is, is
superior to assembler.
For most purposes, yes. But it has severe deficiencies when it comes to
initialized static data, and for anything involving data structures
that contain code.
Post by Mike McCarty
Had Modula-II come along in a timely manner, and been named Pascal-II
so people would have had a "warm fuzzy" feeling of familiarity,
then C would, I believe, have been the backwater, and not Modula-II.
I think it was intended to give people a warm fuzzy feeling because they
were familiar with Modula.

Modula III is a different language entirely. It is superficially
similar to the earlier Modulas, but
(1) it was not designed by Wirth
(2) it *was* strongly typed and had a garbage collector
(3) it has a mechanism for object types and inheritance which did
*not* replace the older, well-known data structures. And yes, it did
allow breaking type security, but you had to be explicit about doing
that.
Post by Mike McCarty
Post by h***@topoi.pooq.com
Although I find these languages wordy, I still think it a great pity
that C++ took off instead of them.
Well, you've got my take on why that happened.
Your take was about C. Modula 3 was a mature systems language with
object-oriented stuff in the 80's, before C++ was really properly off
the ground.

-- hendrik
Post by Mike McCarty
Mike
--
p="p=%c%s%c;main(){printf(p,34,p,34);}";main(){printf(p,34,p,34);}
Is this legal C? Isn't the type of p implicitly int? Am I missing
something?
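
(With old-style implicit int, the declaration reads as int p = "...",
initialized with a pointer -- something pre-ANSI compilers tolerated but
that is not strictly conforming C89 or C99, so the sig does lean on
lenient compilers. One sketch of a variant that declares p explicitly and
still prints its own source, relying on the same C89 leniencies for
main() and the implicit declaration of printf():

    char*p="char*p=%c%s%c;main(){printf(p,34,p,34);}";main(){printf(p,34,p,34);}

Any comment or #include added would also have to appear inside the
string, since the program must reproduce its source byte for byte.)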

-- hendrik
Miles Bader
2006-12-12 23:17:25 UTC
Permalink
Post by Mike McCarty
But when C came along, Pascal was just not up to systems programming.
The only other alternative was assembler.
There were tons of "systems pascal" variants around, and lots of systems
programming was done in pascal (e.g., the "spice" OS, predecessor to
Mach, and other stuff done on the perq).

The reason C won was because (1) it was associated with unix, and people
(well, researchers) liked unix, and (2) people _liked_ C -- despite a
few weird points, it was a refreshingly practical and elegant language
compared to the "wordy pedantic schoolmarm" tradition that pascal came
from. C was designed by _programmers_.

-Miles
--
Somebody has to do something, and it's just incredibly pathetic that it
has to be us. -- Jerry Garcia
h***@topoi.pooq.com
2006-12-12 15:12:51 UTC
Permalink
Post by Miles Bader
Post by Douglas Tutty
I've yet to see the appeal of OO. Then again I've never seen Algol. I
* strong type checking
* garbage collection
* ancillary run-time checks
Those have nothing to do with OOP (that is to say, they are orthogonal
to it).
That's right. Strong, strict type-checking and garbage collection
were known technologies and were embedded in programming languages
before OO. But they seem to have entered the main stream of popular
language use along with OO languages.
Post by Miles Bader
(1) improvement of modularity by keeping code related to a particular
type in one place even in the presence of hierarchical type
relationships,
Yes.
Post by Miles Bader
(2) easy sharing of common code that often results from such type
relationships, and
Actually, this sharing gets in the way of (1).
Post by Miles Bader
(3) making it simpler to code generic algorithms by taking advantage of
these hierarchical type relationships.
No argument here.

I may have been condescending, but it did not seem to me that these
advantages were what was likely to be appreciated by a Fortran
programmer.
Post by Miles Bader
C++ does *not* have these advantages.
It has many other advantages however, including those from OOP, and more
unusually, a notational power that makes certain sorts of programs
_much_ easier to write/read (part of this is the fact that doing so can
be done _efficiently_ -- it's very common to see e.g. java programs
which
In any case, in my experience (which included writing a parser and
static type-checker for C++ in C++), these advantages of C++ do not
outweigh the lack of secure type checking and garbage collection.

-- hendrik
Ron Johnson
2006-12-12 16:11:32 UTC
Permalink
[snip]
Post by Miles Bader
(3) making it simpler to code generic algorithms by taking advantage of
these hierarchical type relationships.
After a while, the exceptions and exceptions to exceptions etc etc
make the sub-classing inheritance trees really ugly and impossible
to debug.

Or maybe I just work in a messy industry...

- --
Ron Johnson, Jr.
Jefferson LA USA

Is "common sense" really valid?
For example, it is "common sense" to white-power racists that
whites are superior to blacks, and that those with brown skins
are mud people.
However, that "common sense" is obviously wrong.
Miles Bader
2006-12-12 23:23:10 UTC
Permalink
Post by Ron Johnson
After a while, the exceptions and exceptions to exceptions etc etc
make the sub-classing inheritance trees really ugly and impossible
to debug.
Or maybe I just work in a messy industry...
Probably. OOP is not a magic bullet, and bad programmers will still
produce bad programs (and classes and libraries and ...).

However OOP does offer a genuinely useful tool.

-Miles
--
`...the Soviet Union was sliding in to an economic collapse so comprehensive
that in the end its factories produced not goods but bads: finished products
less valuable than the raw materials they were made from.' [The Economist]
Douglas Tutty
2006-12-09 02:27:58 UTC
Permalink
Post by h***@topoi.pooq.com
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Post by h***@topoi.pooq.com
Did you ever write any code in the 1970's that can't be run any more?
I did.
Shame on you for not writing in a portable language. Go COBOL!!!
I actually did my non-surviving code in assembler for the IBM 1620, a
decimal machine. I believe I had a Fortran II compiler available --
that was in the days before Fortran had been standardized.
In the 70's I wrote most of an Algol 68 compiler in Algol W. The intent
was to rewrite it in Algol 68 when it was done. Conversion would
probably be done (mostly) mechanically, but funding ran out shortly
before the first compiler was quite finished.
I wish there was an interactive museum where old-tech guys like me could
go play with the old stuff. My sense is that programmers of your
generation wrote tighter code than is common today since you had such
limited (compared to now) memory, processing, etc, into which to
shoe-horn things.

I'm glad you guys are taking the time to share your wisdom and
experience here. Thank you.

Doug.
Ron Johnson
2006-12-09 14:10:24 UTC
Permalink
[snip]
Post by Douglas Tutty
I wish there was an interactive museum where old-tech guys like me could
go play with the old stuff. My sense is that programmers of your
generation wrote tighter code than is common today since you had such
limited (compared to now) memory, processing, etc, into which to
shoe-horn things.
Remember, though, that (except for esoteric boxen like LISP
machines) systems were simpler back then. Mainly libraries written
in assembly language. And no OO or GUI gunk!!!
Post by Douglas Tutty
I'm glad you guys are taking the time to share your wisdom and
experience here. Thank you.
- --
Ron Johnson, Jr.
Jefferson LA USA

Is "common sense" really valid?
For example, it is "common sense" to white-power racists that
whites are superior to blacks, and that those with brown skins
are mud people.
However, that "common sense" is obviously wrong.
h***@topoi.pooq.com
2006-12-10 13:49:18 UTC
Permalink
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
[snip]
Post by Douglas Tutty
I wish there was an interactive museum where old-tech guys like me could
go play with the old stuff. My sense is that programmers of your
generation wrote tighter code than is common today since you had such
limited (compared to now) memory, processing, etc, into which to
shoe-horn things.
Remember, though, that (except for esoteric boxen like LISP
machines) systems were simpler back then. Mainly libraries written
in assembly language. And no OO or GUI gunk!!!
To a large extent, that was *because* of limited memory and processing.

Algol 68 was a very good basic conception, and -- after the initial
Report had been Revised (which took a few years) -- an excellent overall
design. Its input-output system was overly complex, and there were some
lexical issues that had not been tied down completely (character set
chaos still reigned then), but my experience with it was good. I had
the opportunity to use it in the late 70's, and found that it was not
unusual to write programs with tricky data structures of a thousand
lines and more and have them run correctly the first time they got
through the compiler's static checks.

I have not seen the like until I encountered Modula 3 and Eiffel,
decades later. For reasons I have not yet understood, Java doesn't come
close in this regard, even though it has garbage collection and strong
static type checking like the others. Still, Java is far, far better
for writing reliable code than C++.

The things I miss in Modula 3 and Eiffel are
compactness of notation
any statement can be syntactically embedded in an expression.

These are primarily notational issues, yet a lot of virtual ink has been
spilled about the complete conceptual difference between statements and
expressions.
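
A small C illustration of that distinction: the choice below can be
written as an if statement, which cannot sit inside an expression, or as
a conditional expression, which is one of the few expression forms C
offers; in an expression-oriented language like Algol 68, essentially any
construct can appear in expression position and yield a value.

    #include <stdio.h>

    int main(void)
    {
        int n = 7;
        int parity;

        /* statement form: if is a statement, usable only at
           statement level */
        if (n % 2 == 0)
            parity = 0;
        else
            parity = 1;

        /* expression form: ?: embeds the same choice inside an
           expression */
        parity = (n % 2 == 0) ? 0 : 1;

        printf("parity=%d\n", parity);
        return 0;
    }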

I've noticed the same kind of disputes in natural languages. For
example, English speakers usually perceive a clear semantic difference
between "many" and "much". Yet it's possible to give a purely syntactic
rule to distinguish them -- you use "many" when modifying a plural noun,
and "much" for a singular one.

-- hendrik
Post by Douglas Tutty
I'm glad you guys are taking the time to share your wisdom and
experience here. Thank you.
Glad to do so.

-- hendrik
Mike McCarty
2006-12-11 09:19:07 UTC
Permalink
***@topoi.pooq.com wrote:

[snip]
Post by h***@topoi.pooq.com
I've noticed the same kind of disputes in natural languages. For
example, English speakers usually perceive a clear semantic difference
between "many" and "much". Yet it's possible to give a purely syntactic
rule to distinguish them -- you use "many" when modifying a plural noun,
and "much" for a singular one.
This is not true. For example, I have said "I've eaten too much
beans". "I've eaten too many beans", though it isn't something
I've said, *could* be said, and would not mean quite the same
thing. Another place where this doesn't work is with "grits".
One never has "many" grits. One *could* speak of "many grits",
I suppose, but that would not mean the same thing as "much grits".
Another one is "oats". One does not have many oats. If one were
to ask "How many oats do you have?" it would mean "How many
varieties do you have?", and not "What quantity do you have?"

Asking "How much beans did you eat?" means "what quantity", perhaps
in servings, or ounces weight, or volumetric like cups, but "How many
beans did you eat?" means "How many different varieties of beans did you
eat?" (like in a seven bean salad) or "Give me an exact count of how
many beans you ate." (like 50).

The issue is whether the quantity is considered to be continuous, or to
be discrete. Usually, when one speaks of a quantity of discrete objects,
one uses a plural noun. Likewise, usually when one speaks of a quantity
of something considered to be continuous, one uses a singular noun.
But this is not always the case.

Another way to think of it is this: If one *counts* the amount, then
one uses "many", if one *measures* the amount, then one uses "much".
This is regardless of whether the noun used be plural or singular.
One does not actually *count* the number of beans he has eaten, so
one uses "much beans" and not "many beans". "Many beans" means one
needs to count something, like varieties, or make an actual count
of the number of beans eaten. Most native speakers would be somewhat
confused upon being asked "How many beans did you eat?" He wouldn't
know the exact count, and would wonder why anyone would want to know
it, anyway, so would wonder what was really being asked.

Peas also fall into this category. I don't know whether I could
find examples which are not related to food, but believe me, the
issue is if you ask "how many" then you want an actual count,
and anything not counted is not a "many", but rather a "much".

I consider myself a native speaker, since I started when I was about
three years old. (Spanish is the first language I spoke.)

Mike
--
p="p=%c%s%c;main(){printf(p,34,p,34);}";main(){printf(p,34,p,34);}
This message made from 100% recycled bits.
You have found the bank of Larn.
I can explain it for you, but I can't understand it for you.
I speak only for myself, and I am unanimous in that!
Håkon Alstadheim
2006-12-11 10:10:44 UTC
Permalink
Post by Mike McCarty
Peas also fall into this category. I don't know whether I could
find examples which are not related to food, but believe me, the
issue is if you ask "how many" then you want an actual count,
and anything not counted is not a "many", but rather a "much".
How much timber, how many logs.
How much water, how many litres
How much pain, how many wounds.
How much confusion, how many words, how much verbiage.
--
Håkon Alstadheim priv: +47 74 82 60 27
7510 Skatval mob: +47 47 35 39 38
http://alstadheim.priv.no/hakon/ job: +47 93 41 70 55
Mike McCarty
2006-12-11 10:52:57 UTC
Permalink
Post by Håkon Alstadheim
Post by Mike McCarty
Peas also fall into this category. I don't know whether I could
find examples which are not related to food, but believe me, the
issue is if you ask "how many" then you want an actual count,
and anything not counted is not a "many", but rather a "much".
How much timber, how many logs.
How much water, how many litres
How much pain, how many wounds.
How much confusion, how many words, how much verbiage.
Yep. But that still doesn't show the use of "much" with a plural
noun. Hmm. "Verbiage" may be sort of a "mass noun", as is possibly
"confusion".

It does illustrate the "counting" aspect of the use of "many".
It is certainly true that we use "much water", and "many liters
of water". The first is *measured*, the second is *counted*.
Likewise "timber" and "logs". In fact, that is an excellent
example.

Spanish has an interesting distinction sometimes between
masculine and feminine words which are otherwise the same,
like "la radio" and "el radio". Or, more like this case,
"len~o" versus "len~a". One is a firewood log, whereas
the other is firewood.

Mike
--
p="p=%c%s%c;main(){printf(p,34,p,34);}";main(){printf(p,34,p,34);}
This message made from 100% recycled bits.
You have found the bank of Larn.
I can explain it for you, but I can't understand it for you.
I speak only for myself, and I am unanimous in that!
h***@topoi.pooq.com
2006-12-11 13:46:31 UTC
Permalink
Post by Mike McCarty
[snip]
Post by h***@topoi.pooq.com
I've noticed the same kind of disputes in natural languages. For
example, English speakers usually perceive a clear semantic difference
between "many" and "much". Yet it's possible to give a purely syntactic
rule to distinguish them -- you use "many" when modifying a plural noun,
and "much" for a singular one.
This is not true. For example, I have said "I've eaten too much
beans". "I've eaten too many beans", though it isn't something
I've said, *could* be said, and would not mean quite the same
thing. Another place where this doesn't work is with "grits".
One never has "many" grits. One *could* speak of "many grits",
I suppose, but that would not mean the same thing as "much grits".
Another one is "oats". One does not have many oats. If one were
to ask "How many oats do you have?" it would mean "How many
varieties do you have?", and not "What quantity do you have?"
Asking "How much beans did you eat?" means "what quantity", perhaps
in servings, or ounces weight, or volumetric like cups, but "How many
beans did you eat?" means "How many different varieties of beans did you
eat?" (like in a seven bean salad) or "Give me an exact count of how
many beans you ate." (like 50).
The issue is whether the quantity is considered to be continuous, or to
be discrete. Usually, when one speaks of a quantity of discrete objects,
one uses a plural noun. Likewise, usually when one speaks of a quantity
of something considered to be continuous, one uses a singular noun.
But this is not always the case.
Another way to think of it is this: If one *counts* the amount, then
one uses "many", if one *measures* the amount, then one uses "much".
This is regardless of whether the noun used be plural or singular.
One does not actually *count* the number of beans he has eaten, so
one uses "much beans" and not "many beans". "Many beans" means one
needs to count something, like varieties, or make an actual count
of the number of beans eaten. Most native speakers would be somewhat
confused upon being asked "How many beans did you eat?" He wouldn't
know the exact count, and would wonder why anyone would want to know
it, anyway, so would wonder what was really being asked.
Peas also fall into this category. I don't know whether I could
find examples which are not related to food, but believe me, the
issue is if you ask "how many" then you want an actual count,
and anything not counted is not a "many", but rather a "much".
I consider myself a native speaker, since I started when I was about
three years old. (Spanish is the first language I spoke.)
Interesting set of examples. Thank you. They come close to seemingly
plural singular nouns, like "a people", but not quite!

Dutch is my first language -- I started English when I was five.

-- hendrik
Chris Bannister
2006-12-12 09:16:55 UTC
Permalink
Post by Mike McCarty
[snip]
Post by h***@topoi.pooq.com
I've noticed the same kind of disputes in natural languages. For
example, English speakers usually perceive a clear semantic difference
between "many" and "much". Yet it's possible to give a purely syntactic
rule to distinguish them -- you use "many" when modifying a plural noun,
and "much" for a singular one.
Agreed.
Post by Mike McCarty
This is not true. For example, I have said "I've eaten too much
beans". "I've eaten too many beans", though it isn't something
I've said, *could* be said, and would not mean quite the same
Correct is "I've eaten too many beans".
--
Chris.
======
" ... the official version cannot be abandoned because the implication of
rejecting it is far too disturbing: that we are subject to a government
conspiracy of `X-Files' proportions and insidiousness."
Letter to the LA Times Magazine, September 18, 2005.
Mike McCarty
2006-12-07 23:39:50 UTC
Permalink
Post by h***@topoi.pooq.com
I quite agree. But in the absence of error-correction codes,
uncompressed is better.
And if your error-correction software should happen to be unusable in several
years, your errors will not be easy to correct.
Even with FEC uncompressed may be better. OTOH, fewer bits to fail
is an advantage.
Post by h***@topoi.pooq.com
Did you ever write any code in the 1970's that can't be run any more?
I did.
I wrote some machine language programs for the IBM 1401 in 1969.
Does that count as programs that can't be run any more?

Mike
--
p="p=%c%s%c;main(){printf(p,34,p,34);}";main(){printf(p,34,p,34);}
This message made from 100% recycled bits.
You have found the bank of Larn.
I can explain it for you, but I can't understand it for you.
I speak only for myself, and I am unanimous in that!
Ron Johnson
2006-12-08 00:18:43 UTC
Permalink
[snip]
Post by Mike McCarty
I wrote some machine language programs for the IBM 1401 in 1969.
Does that count as programs that can't be run any more?
$ wajig show simh
[snip]
Description: Emulators for 32 different computers
This is the SIMH set of emulators for 32 different computers:
DEC PDP-1, PDP-4, PDP-7, PDP-8, PDP-9,
DEC PDP-10, PDP-11, PDP-15,
Data General Nova, Eclipse,
GRI-909,
Honeywell 316, 516,
HP 2100,
IBM System 3 Model 10, 1401,
IBM 1620 Model 1, IBM 1620 Model 2,
Interdata 3, 4, 5, 70, 80, 7/16, 8/16, 8/16E,
Interdata 7/32, 8/32,
SDS 940,
LGP-21, LGP-30,
DEC VAX (but cannot include the microcode due to copyright)
Tag: hardware::emulation, role::sw:utility


- --
Ron Johnson, Jr.
Jefferson LA USA

Is "common sense" really valid?
For example, it is "common sense" to white-power racists that
whites are superior to blacks, and that those with brown skins
are mud people.
However, that "common sense" is obviously wrong.
Douglas Tutty
2006-12-08 01:25:00 UTC
Permalink
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
[snip]
Post by Mike McCarty
I wrote some machine language programs for the IBM 1401 in 1969.
Does that count as programs that can't be run any more?
$ wajig show simh
[snip]
Description: Emulators for 32 different computers
DEC PDP-1, PDP-4, PDP-7, PDP-8, PDP-9,
DEC PDP-10, PDP-11, PDP-15,
Data General Nova, Eclipse,
GRI-909,
Honeywell 316, 516,
HP 2100,
IBM System 3 Model 10, 1401,
IBM 1620 Model 1, IBM 1620 Model 2,
Interdata 3, 4, 5, 70, 80, 7/16, 8/16, 8/16E,
Interdata 7/32, 8/32,
SDS 940,
LGP-21, LGP-30,
DEC VAX (but cannot include the microcode due to copyright)
You mean there's no emulator that lets me run Fortran for the 704? I
SO loved conditional gotos :-)
Ron Johnson
2006-12-08 02:41:54 UTC
Permalink
[snip]
Post by Douglas Tutty
You mean there's no emulator that lets me run Fortran for the 704? I
SO loved conditional gotos :-)
Sheah, computed GOTOs are great!! I use them all the time scripting
batch jobs in OpenVMS. They make code 10x easier to write and read.

- --
Ron Johnson, Jr.
Jefferson LA USA

Is "common sense" really valid?
For example, it is "common sense" to white-power racists that
whites are superior to blacks, and that those with brown skins
are mud people.
However, that "common sense" is obviously wrong.
h***@topoi.pooq.com
2006-12-08 16:29:11 UTC
Permalink
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
[snip]
Post by Douglas Tutty
You mean there's no emulator that lets me run Fortran for the 704? I
SO loved conditional gotos :-)
Sheah, computed GOTOs are great!! I use them all the time scripting
batch jobs in OpenVMS. They make code 10x easier to write and read.
Datamation once published an article describing the computed COME FROM
statement. :-)

-- hendrik
Andrew Sackville-West
2006-12-08 00:18:18 UTC
Permalink
Post by Mike McCarty
Post by h***@topoi.pooq.com
I quite agree. But in the absence of error-correction codes,
uncompressed is better.
And if your error-correction software should happen to be unusable in
several years, your errors will not be easy to correct.
Even with FEC uncompressed may be better. OTOH, fewer bits to fail
is an advantage.
there's a math problem for you, though it's obviously just a scaling of
the other. So if you get 25% compression (compressed is 75% the size of
the original), how does the smaller number of bits change the probability
of losses? Please, though, don't figure it out. Just speculation.
Post by Mike McCarty
Post by h***@topoi.pooq.com
Did you ever write any code in the 1970's that can't be run any more?
I did.
I wrote some machine language programs for the IBM 1401 in 1969.
Does that count as programs that can't be run any more?
my oldest is a 6510 assembler rewrite of the kernel for my C64. I'm such
a young thing ;-)

A
Douglas Tutty
2006-12-08 01:21:48 UTC
Permalink
Post by Mike McCarty
Post by h***@topoi.pooq.com
I quite agree. But in the absence of error-correction codes,
uncompressed is better.
And if your error-correction software should happen to be unusable in
several years, your errors will not be easy to correct.
Even with FEC uncompressed may be better. OTOH, fewer bits to fail
is an advantage.
Post by h***@topoi.pooq.com
Did you ever write any code in the 1970's that can't be run any more?
I did.
I wrote some machine language programs for the IBM 1401 in 1969.
Does that count as programs that can't be run any more?
Not if there's still an IBM 1401 around. You may not have one in
your spare bedroom, but it doesn't mean that there isn't one somewhere.

I'm assuming the IBM 1401 is a tad bigger than my Sharp PC-1401 (4 KB
ram, basic only).

Don't you wish you could run linux on the IBM 1401?
Ron Johnson
2006-12-08 02:47:23 UTC
Permalink
[snip]
Post by Douglas Tutty
Don't you wish you could run linux on the IBM 1401?
No. They were a PITA.

- --
Ron Johnson, Jr.
Jefferson LA USA

Is "common sense" really valid?
For example, it is "common sense" to white-power racists that
whites are superior to blacks, and that those with brown skins
are mud people.
However, that "common sense" is obviously wrong.