Discussion:
Data corruption issues possibly involving cgd(4)
Nino Dehne
2007-01-16 05:59:10 UTC
Permalink
Hi there,

I am currently experiencing data corruption using 4.0_BETA2 from around
mid-december.

cpu0 at mainbus0: apid 0 (boot processor)
cpu0: AMD Unknown K7 (Athlon) (686-class), 2000.30 MHz, id 0x40fb2
cpu0: features ffdbfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu0: features ffdbfbff<PGE,MCA,CMOV,PAT,PSE36,MPC,NOX,MMXX,MMX>
cpu0: features ffdbfbff<FXSR,SSE,SSE2,B27,HTT,LONG,3DNOW2,3DNOW>
cpu0: features2 2001<SSE3>
cpu0: "AMD Athlon(tm) 64 X2 Dual Core Processor 3600+"
cpu0: I-cache 64 KB 64B/line 2-way, D-cache 64 KB 64B/line 2-way
cpu0: L2 cache 256 KB 64B/line 16-way
cpu0: ITLB 32 4 KB entries fully associative, 8 4 MB entries fully associative
cpu0: DTLB 32 4 KB entries fully associative, 8 4 MB entries fully associative
cpu0: AMD Power Management features: 3f<STC,TM,TTP,VID,FID,TS>
cpu0: AMD PowerNow! Technology 2000 MHz
cpu0: available frequencies (Mhz): 1000 1800 2000
cpu0: calibrating local timer
cpu0: apic clock running at 200 MHz
cpu0: 8 page colors
cpu1 at mainbus0: apid 1 (application processor)
cpu1: not started

ACPI is enabled. This is an Athlon 64 X2 3600+ EE on an ASRock ALiveSATA2-GLAN
and an MDT 512M stick of DDR2-800 RAM. The box has 5 drives in a RAID5 using
raid(4) and a cgd(4) on top of that.

The issue manifests as follows:

1) Repeatedly hashing a large file residing on the crypted partition
occasionally yields a bad checksum. The problem can be reproduced by
repeatedly checking a large .rar file or .flac file as well.
The file is large enough not to fit in RAM and disks are active 100% of
the time.
Sometimes the wrong hash occurs on the 5th run, sometimes 20 runs are
needed. Sometimes two bad runs occur in succession. (A sketch of such a
hashing loop follows this list.)
2) same as 1) but the file fits into RAM so that subsequent hashes don't
hit the disk: the problem does _not_ occur. Tested with over 2000 runs of
md5 <file>.
3) same as 1) but the file resides on a non-cgd partition on a RAID1 using
raid(4): the problem also does _not_ occur. I aborted the hashing after
100 runs where the problem would have shown up with certainty in 1).
4) memtest86+ runs without errors.
5) mprime[1] runs without errors.
6) build.sh release not involving the cgd partition runs without errors.
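For illustration, a hashing loop of the kind described in 1) and 2) might
look roughly like this (the file path is a placeholder; the exact commands
used were not posted):

    REF=`md5 /path/to/largefile`    # reference checksum, computed once
    i=0
    while :; do
            i=`expr $i + 1`
            H=`md5 /path/to/largefile`
            echo "run $i: $H"
            if [ "$H" != "$REF" ]; then
                    echo "mismatch on run $i"
                    break
            fi
    done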

Since I noticed sys/arch/x86/x86/errata.c on HEAD, at first I thought the
CPU might be affected by one of the errata it handles. So I tried booting a
-current GENERIC.MPACPI kernel using boot -d. That did not turn up anything
in dmesg.

SMP vs. UP kernel makes no difference.
Setting machdep.powernow.frequency.target to 1000 or 2000 makes no difference.

Judging from 2) to 6) I can exclude heating issues or something related
to concurrent hash calculations and disk access. envstat reports CPU below
40°C at all times.

Please help, I'm at a loss.

Best regards,

ND


[1] http://www.mersenne.org/freesoft.htm
Roland Dowdeswell
2007-01-16 06:10:21 UTC
Permalink
On 1168927089 seconds since the Beginning of the UNIX epoch
Post by Nino Dehne
Hi there,
ACPI is enabled. This is an Athlon 64 X2 3600+ EE on an ASRock ALiveSATA2-GLAN
and an MDT 512M stick of DDR2-800 RAM. The box has 5 drives in a RAID5 using
raid(4) and a cgd(4) on top of that.
This is the correct layering.
Post by Nino Dehne
1) Repeatedly hashing a large file residing on the crypted partition
occasionally yields a bad checksum. The problem can be reproduced by
repeatedly checking a large .rar file or .flac file as well.
The file is large enough not to fit in RAM and disks are active 100% of
the time.
Sometimes the wrong hash occurs at the 5th run, sometimes 20 runs are
needed. Sometimes two bad runs occur in succession.
2) same as 1) but the file fits into RAM so that subsequent hashes don't
hit the disk: the problem does _not_ occur. Tested with over 2000 runs of
md5 <file>.
3) same as 1) but the file resides on a non-cgd partition on a RAID1 using
raid(4): the problem also does _not_ occur. I aborted the hashing after
100 runs where the problem would have shown up with certainty in 1).
4) memtest86+ runs without errors.
5) mprime[1] runs without errors.
6) build.sh release not involving the cgd partition runs without errors.
Okay, so CGD does live under the buffer cache so (2) will not be
causing any encryption to occur.

The only thing that I can think of is that, a number of years ago,
there were some kinds of memory errors that only occurred under
particular usage patterns (e.g. gcc) and which memtest did not catch.
Otherwise, it does seem that CGD might be the obvious culprit---but
that said, there's nothing in the code path, I think, that is not
deterministic.

Can you reproduce this issue on another system or is it just this
one?
Post by Nino Dehne
Since I noticed sys/arch/x86/x86/errata.c on HEAD, at first I thought the
CPU might be affected by it. So I tried booting a -current GENERIC.MPACPI
kernel using boot -d. This did not give anything in dmesg.
SMP vs. UP kernel makes no difference.
Setting machdep.powernow.frequency.target to 1000 or 2000 makes no difference.
Judging from 2) to 6) I can exclude heating issues or something related
to concurrent hash calculations and disk access. envstat reports CPU below
40°C at all times.
Please help, I'm at a loss.
Best regards,
ND
[1] http://www.mersenne.org/freesoft.htm
--
Roland Dowdeswell http://www.Imrryr.ORG/~elric/
Nino Dehne
2007-01-16 06:20:20 UTC
Permalink
Hi Roland,
Post by Roland Dowdeswell
Post by Nino Dehne
1) Repeatedly hashing a large file residing on the crypted partition
occasionally yields a bad checksum. The problem can be reproduced by
repeatedly checking a large .rar file or .flac file as well.
The file is large enough not to fit in RAM and disks are active 100% of
the time.
Sometimes the wrong hash occurs at the 5th run, sometimes 20 runs are
needed. Sometimes two bad runs occur in succession.
2) same as 1) but the file fits into RAM so that subsequent hashes don't
hit the disk: the problem does _not_ occur. Tested with over 2000 runs of
md5 <file>.
3) same as 1) but the file resides on a non-cgd partition on a RAID1 using
raid(4): the problem also does _not_ occur. I aborted the hashing after
100 runs where the problem would have shown up with certainty in 1).
4) memtest86+ runs without errors.
5) mprime[1] runs without errors.
6) build.sh release not involving the cgd partition runs without errors.
Okay, so CGD does live under the buffer cache so (2) will not be
causing any encryption to occur.
That's what I figured as well.
Post by Roland Dowdeswell
The only thing that I can think of might be that there are some
kinds of memory errors that occurred a number of years ago under
particular usage patterns, e.g. gcc, which memtest did not catch.
Otherwise, it does seem that CGD might be the obvious culprit---but
that said, there's nothing in the code path, I think, that is not
deterministic.
Can you reproduce this issue on another system or is it just this
one?
Transferring the system to other hardware will be a bit of a hassle. I will
see to it.

Thanks and regards,

ND
Daniel Carosone
2007-01-16 06:24:38 UTC
Permalink
Post by Nino Dehne
I am currently experiencing data corruption using 4.0_BETA2 from around
mid-december.
I'm sorry to hear it. You certainly seem to have taken the obvious
steps to eliminate other possible sources of the problem.

I can offer you some reassurance that I don't see the same problem,
and haven't ever in a long history of using cgd. I have just tried to
specifically reproduce your test on a plain cgd-on-wd, without hitting
a different hash value.

I'm not sure where your problem lies, but it's not a simple one.
Post by Nino Dehne
ACPI is enabled. This is an Athlon 64 X2 3600+ EE on an ASRock ALiveSATA2-GLAN
and an MDT 512M stick of DDR2-800 RAM. The box has 5 drives in a RAID5 using
raid(4) and a cgd(4) on top of that.
..
Post by Nino Dehne
3) same as 1) but the file resides on a non-cgd partition on a RAID1 using
raid(4): the problem also does _not_ occur. I aborted the hashing after
100 runs where the problem would have shown up with certainty in 1).
any chance you could test with a RAID5 - ideally from the same RAID5 -
without cgd? It could be a controller or drive problem, or even a
power supply problem when all drives are active. RAID1 won't
necessarily hit those conditions, especially for read.

You could probably achieve the same result dd'ing a constant chunk of
encrypted data off the raid(4) device to checksum, avoiding the need
to destroy or remake filesystems. If you reproduce the problem like
this, you have also eliminated filesystem bugs.
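For example, something like the following (illustrative only; the RAID5 set
is assumed to be raid0 here, and the transfer size is arbitrary):

    dd if=/dev/rraid0d bs=65536 count=16384 2>/dev/null | md5

run in a loop and compared against the first checksum would read the same
1 GB of encrypted data straight off the RAID device each time.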

For comparison, Sun's ZFS has shown up these kinds of problems (power,
controller concurrency) in marginal hardware on a number of occasions.
Post by Nino Dehne
Please help, I'm at a loss.
It's a tricky one, but the above would be my next guess, and the next
useful thing to try to eliminate.

--
Dan.
Nino Dehne
2007-01-16 07:01:16 UTC
Permalink
Post by Daniel Carosone
any chance you could test with a RAID5 - ideally from the same RAID5 -
without cgd? It could be a controller or drive problem, or even a
power supply problem when all drives are active. RAID1 won't
necessarily hit those conditions, especially for read.
I'm getting your drift. While I can't make a filesystem on the RAID5
directly, see below.
Post by Daniel Carosone
You could probably achive the same result dd'ing a constant chunk of
encrypted data off the raid(4) device to checksum, avoiding the need
to destroy or remake filesystems. If you reproduce the problem like
this, you have also eliminated filesystem bugs.
Excellent advice, thanks. Unfortunately, I can't reproduce the issue this
way.

After 50 runs of dd if=/dev/rcgd0d bs=65536 count=4096 | md5 and no error
I aborted the test. Replacing rcgd0d with cgd0a made no difference.
While not necessary IMO, I tried the same with rraid1d, no errors either
after 50 runs. For comparison, a loop on the filesystem on the cgd aborted
after the 14th run now.

So the issue doesn't seem to be related to the power supply either and
frankly, it's starting to freak me out.
Post by Daniel Carosone
Post by Nino Dehne
Please help, I'm at a loss.
It's a tricky one, but the above would be my next guess, and the next
useful thing to try to eliminate.
So there, I'm even more at a loss now. :)

Thanks for the help. Best regards,

ND
Bernd Ernesti
2007-01-16 07:29:13 UTC
Permalink
Post by Nino Dehne
Post by Daniel Carosone
any chance you could test with a RAID5 - ideally from the same RAID5 -
without cgd? It could be a controller or drive problem, or even a
power supply problem when all drives are active. RAID1 won't
necessarily hit those conditions, especially for read.
I'm getting your drift. While I can't make a filesystem on the RAID5
directly, see below.
Can you check the s.m.a.r.t. status of your drives?

atactl wdX smart status

or use
smartctl -a /dev/wdXd
from pkgsrc/sysutils/smartmontools

Bernd
Nino Dehne
2007-01-16 07:37:09 UTC
Permalink
Post by Bernd Ernesti
Can you check the s.m.a.r.t. status of your drives?
atactl wdX smart status
They all check fine. Besides, what would happen if raid(4) got fed a
corrupted block? I'm under the impression I would see more serious errors
far earlier than observing silent data corruption several layers above, no?

Best regards,

ND
Daniel Carosone
2007-01-16 20:23:56 UTC
Permalink
Post by Nino Dehne
Post by Bernd Ernesti
Can you check the s.m.a.r.t. status of your drives?
atactl wdX smart status
They all check fine. Besides, what would happen if raid(4) got fed a
corrupted block? I'm under the impression I would see more serious errors
far earlier than observing silent data corruption several layers above, no?
Actually, no. raid(4) doesn't check parity unless it gets an error;
you want zfs for that..
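(For reference, RAIDframe's userland tool only tracks whether parity is
known to be in sync, e.g.

    raidctl -p raid0    # report parity status
    raidctl -P raid0    # rewrite parity if it is not known to be up to date

and as far as I know neither re-reads the data and compares it against the
stored parity the way a scrub would, so silent corruption on one component
can go unnoticed on reads. The device name raid0 is assumed.)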

And if it does turn out to be power, the induced corruption could be
happening in a number of places, including some (like main memory) that
SMART won't see.

--
Dan.
David Laight
2007-01-16 09:28:41 UTC
Permalink
Post by Nino Dehne
After 50 runs of dd if=/dev/rcgd0d bs=65536 count=4096 | md5 and no error
I aborted the test. Replacing rcgd0d with cgd0a made no difference.
While not necessary IMO, I tried the same with rraid1d, no errors either
after 50 runs. For comparison, a loop on the filesystem on the cgd aborted
after the 14th run now.
So the issue doesn't seem to be related to the power supply either and
frankly, it's starting to freak me out.
The 'dd' will be doing sequential reads, whereas the fs version will be doing
considerable numbers of seeks. It is the seeks that cause the disks to
draw current bursts from the psu - so don't discount that.

David
--
David Laight: ***@l8s.co.uk
Daniel Carosone
2007-01-16 20:45:04 UTC
Permalink
Post by Nino Dehne
After 50 runs of dd if=/dev/rcgd0d bs=65536 count=4096 | md5 and no error
I aborted the test. Replacing rcgd0d with cgd0a made no difference.
Interesting.
Post by Nino Dehne
While not necessary IMO, I tried the same with rraid1d, no errors either
after 50 runs.
Which was actually the test I had in mind, to eliminate cgd(4) but
keep the rest the same. Your test above seems to suggest the problem
is somewhere else than cgd alone, which is good.
Post by Nino Dehne
For comparison, a loop on the filesystem on the cgd aborted
after the 14th run now.
So the issue doesn't seem to be related to the power supply either and
frankly, it's starting to freak me out.
I sympathise, but this is progress. You've already done a number of
important things to eliminate certain causes, and now we're
eliminating more and narrowing in on the culprit.
Post by Nino Dehne
The 'dd' will be doing sequential reads, whereas the fs version will be doing
considerable numbers of seeks. It is the seeks that cause the disks to
draw current bursts from the psu - so don't discount that.
And this is a most excellent and important point. Could you try
repeating the test with one or more of these variations to force
seeking:

two concurrent dd's, one with a large skip= to land elsewhere on the
platters

dd from raid and a concurrent fsck -n of the cgd filesystem

multiple concurrent fsck -n's, to see if they ever report different
errors. -n is especially important here, both because of the
concurrency and if they're going to find spurious errors

If this produces the problem, it's a great result, because combined
with your previous test it clearly isolates seeking and thus almost
certainly power as the problem. You've done the test that eliminated
the seeks, now you need to add the seeks and eliminate cgd. After
that, you might try the same test on all of the individual drives in
parallel, to eliminate the raid(4) software, if you really want to
prove the point.

If it doesn't produce the problem, I don't immediately see any other
culprits consistent with the data so far, and I might start getting a
little freaked out too... :-)

--
Dan.
Nino Dehne
2007-01-17 05:06:23 UTC
Permalink
Post by Daniel Carosone
Post by David Laight
The 'dd' will be doing sequential reads, whereas the fs version will be doing
considerable numbers of seeks. It is the seeks that cause the disks to
draw current bursts from the psu - so don't discount that.
And this is a most excellent and important point. Could you try
repeating the test with one or more of these variations to force
two concurrent dd's, one with a large skip= to land elsewhere on the
platters
OK, I have done some extensive tests now. It doesn't look good though:

1) Run memtest86+ again. There's a new version 1.70 with better support
for K8 and DDR2 memory: 4 passes without errors.
2) machdep.powernow.frequency.target = 2000 to maximize power draw.
3) I'm on SMP kernel again. Start 2 instances of

gzip -9c </dev/zero | gzip -dc >/dev/null

i.e. 4 gzip processes are running.
4) Additionally, run

while true; do dd if=/dev/rcgd0d bs=65536 count=1024 2>/dev/null | md5; done

and

while true; do dd if=/dev/rcgd0d bs=65536 count=1024 skip=123456 2>/dev/null | md5; done

concurrently. After 100 runs each, not a single mismatch occurred.
cgd0a was not mounted to eliminate filesystem changes affecting the
checksums. Disks were active all the time and top showed no buffer
usage increase, so caching was definitely not involved. The first dd
even slowed down as expected when the second one was started. All this
was done in single-user mode, BTW.

After that, I killed the second dd and tried different blocksizes
but was faced with serious trouble:

While the first dd was running I tried a

dd if=/dev/rcgd0d bs=123405 count=1024 skip=56454

This gave me:

dd: /dev/rcgd0d: Invalid argument
0+0 records in
0+0 records out
0 bytes transferred in 0.002 secs (0 bytes/sec)

Then I tried

dd if=/dev/rcgd0d bs=32000 count=1024 skip=56454 | md5

This panicked the box!

cgd0: error 22
uvm_fault(0xc03c62c0, 0xca596000, 1) -> 0xe
kernel: supervisor trap page fault, code=0
Stopped in pid 31.1 (raidio1) at netbsd:BF_cbc_encrypt+0xed: movl 0(%esi),%eax
db{0}> trace
uvm_fault(0xc03dbee0, 0, 1) -> 0xe
kernel: supervisor trap page fault, code=0
Faulted in DDB; continuing...
db{0}>

5) A watt meter showed 175W usage during 2)-4) for a whole bunch of
hardware including the server. The hardware minus the server is
drawing ~42W, i.e. the server was drawing around 133W during these
tests. The power supply is only some weeks old and is a bequiet
BQT E5-350W.

After I was done with that, I mounted cgd0a and hashed the usual file
in a loop. Result: mismatch at the 3rd try. This was on an idle box, i.e.
no 3) or 4) running.
Post by Daniel Carosone
dd from raid and a concurrent fsck -n of the cgd filesystem
multiple concurrent fsck -n's, to see if they ever report different
errors. -n is especially important here, both because of the
concurrency and if they're going to find spurious errors
This was actually not possible:

# fsck -n /home
** /dev/rcgd0a (NO WRITE)
** File system is clean; not checking
# fsck -pn /home
NO WRITE ACCESS
/dev/rcgd0a: UNEXPECTED INCONSISTENCY; RUN fsck_ffs MANUALLY.

I also don't want to mess with the filesystem further. I'm already trying
to minimize access to it for fear of permanent corruption.

Now I'm not sure what to make of this. The cgd/raid panic looks creepy but
I'm not sure how to interpret it.

Does this help you?

In either case, thanks a lot for your help and best regards,

ND
Daniel Carosone
2007-01-17 05:27:37 UTC
Permalink
Post by Nino Dehne
3) I'm on SMP kernel again. Start 2 instances of
gzip -9c </dev/zero | gzip -dc >/dev/null
i.e. 4 gzip processes are running.
4) Additionally, run
while true; do dd if=/dev/rcgd0d bs=65536 count=1024 2>/dev/null | md5
and
while true; do dd if=/dev/rcgd0d bs=65536 count=1024 skip=123456 2>/dev/null | md5
concurrently. After 100 runs each, not a single mismatch occurred.
Hmm. I can only really suspect the filesystem at this point. How large is it?
Post by Nino Dehne
After that, I killed the second dd and tried different blocksizes
While the first dd was running I tried a
dd if=/dev/rcgd0d bs=123405 count=1024 skip=56454
dd: /dev/rcgd0d: Invalid argument
Yeah, bigger than MAXPHYS on a raw device, probably.
Post by Nino Dehne
Then I tried
dd if=/dev/rcgd0d bs=32000 count=1024 skip=56454 | md5
This panicked the box!
Oops. Not good at all but probably a separate issue (non power-of-two
read from raw device).
Post by Nino Dehne
cgd0: error 22
uvm_fault(0xc03c62c0, 0xca596000, 1) -> 0xe
EINVAL, followed by something else that hasn't taken an error path
from there. All yours, Roland :)
Post by Nino Dehne
5) A watt meter showed 175W usage during 2)-4) for a whole bunch of
hardware including the server. The hardware minus the server is
drawing ~42W, i.e. the server was drawing around 133W during these
tests. The power supply is only some weeks old and is a bequiet
BQT E5-350W.
After I was done with that, I mounted cgd0a and hashed the usual file
in a loop. Result: mismatch at the 3rd try. This was on an idle box, i.e.
no 3) or 4) running.
Hrm. I suspect the filesystem at this point, since you seem to have
eliminated seeking, power, and the cgd device.

Can you tell us more about the fs? ufs1, ufs2? lfs? :-)
What size, block and frag size, etc?

Another thing you might do is compare successive dump(8)s of the
filesystem.
Post by Nino Dehne
Post by Daniel Carosone
dd from raid and a concurrent fsck -n of the cgd filesystem
multiple concurrent fsck -n's, to see if they ever report different
errors. -n is especially important here, both because of the
concurrency and if they're going to find spurious errors
# fsck -n /home
** /dev/rcgd0a (NO WRITE)
** File system is clean; not checking
# fsck -pn /home
NO WRITE ACCESS
/dev/rcgd0a: UNEXPECTED INCONSISTENCY; RUN fsck_ffs MANUALLY.
Oh, you wanted -fn not -pn.
Post by Nino Dehne
I also don't want to mess with the filesystem further. I'm already trying
to minimize access to it in fear of permanent corruption.
Valid fear, and hence -n.
Post by Nino Dehne
Now I'm not sure what to make of this. The cgd/raid panic looks creepy but
I'm not sure how to interpret it.
Does this help you?
In either case, thanks a lot for your help and best regards,
You're most welcome, and have found at least one concrete problem
already for your methodical efforts. I'm very curious now what the
other problem might be, keep at it..

--
Dan.
Nino Dehne
2007-01-17 05:39:30 UTC
Permalink
Post by Daniel Carosone
Hmm. I can only really suspect the filesystem at this point. How large is it?
Filesystem 1K-blocks Used Avail Capacity Mounted on
/dev/cgd0a 1173977144 1106331136 8947152 99% /home

or

/dev/cgd0a 1.1T 1.0T 8.5G 99% /home
Post by Daniel Carosone
Hrm. I suspect the filesystem at this point, since you seem to have
eliminated seeking power and the cgd device.
Can you tell us more about the fs? ufs1, ufs2? lfs? :-)
What size, block and frag size, etc?
# dumpfs /dev/cgd0a
file system: /dev/cgd0a
endian little-endian
location 65536 (-b 128)
magic 19540119 (UFS2) time Wed Jan 17 06:29:04 2007
superblock location 65536 id [ 44a83fa3 10976c51 ]
cylgrp dynamic inodes FFSv2 sblock FFSv2 fslevel 5
nbfree 1055930 ndir 48350 nifree 35933852 nffree 8311
ncg 354 size 147896952 blocks 146747143
bsize 65536 shift 16 mask 0xffff0000
fsize 8192 shift 13 mask 0xffffe000
frag 8 shift 3 fsbtodb 4
bpg 52224 fpg 417792 ipg 103424
minfree 5% optim time maxcontig 1 maxbpg 8192
symlinklen 120 contigsumsize 0
maxfilesize 0x00800400200bffff
nindir 8192 inopb 256
avgfilesize 16384 avgfpdir 64
sblkno 16 cblkno 24 iblkno 32 dblkno 3264
sbsize 8192 cgsize 65536
csaddr 3264 cssize 8192
cgrotor 0 fmod 0 ronly 0 clean 0x02
flags soft-updates
fsmnt /home
volname swuid 0
cs[].cs_(nbfree,ndir,nifree,nffree):
[...]

I hope this includes the info you want. I initialized it back then using
a simple newfs -O 2 under 3.0 I think.
Post by Daniel Carosone
Another thing you might do is compare sucessive dump(8)s of the
filesystem.
I will see what I can do. I never used dump(8) before.
Post by Daniel Carosone
Oh, you wanted -fn not -pn.
Oops, I even checked the man page. No idea how I ended up with -p. I will
have another test session tomorrow morning when the box is not used as much.
It will have to serve some stuff over the day.
Post by Daniel Carosone
You're most welcome, and have found at least one concrete problem
already for your methodical efforts. I'm very curious now what the
other problem might be, keep at it..
I'll keep you posted. Thanks again.

Regards,

ND
Pavel Cahyna
2007-01-17 09:18:30 UTC
Permalink
Post by Daniel Carosone
Post by Nino Dehne
While the first dd was running I tried a
dd if=/dev/rcgd0d bs=123405 count=1024 skip=56454
dd: /dev/rcgd0d: Invalid argument
Yeah, bigger than MAXPHYS on a raw device, probably.
More importantly, not being a multiple of 512.
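(For reference: 123405 = 241 * 512 + 13, so that read is not sector-aligned
and the character device rejects it with EINVAL; 32000 = 62 * 512 + 256 is
not a multiple of 512 either, which presumably explains the cgd0 "error 22"
(EINVAL) seen just before the panic, the panic itself then being the
unhandled error path Daniel mentioned.)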

Pavel
Nino Dehne
2007-01-16 21:06:26 UTC
Permalink
Post by David Laight
Post by Nino Dehne
After 50 runs of dd if=/dev/rcgd0d bs=65536 count=4096 | md5 and no error
I aborted the test. Replacing rcgd0d with cgd0a made no difference.
While not necessary IMO, I tried the same with rraid1d, no errors either
after 50 runs. For comparison, a loop on the filesystem on the cgd aborted
after the 14th run now.
So the issue doesn't seem to be related to the power supply either and
frankly, it's starting to freak me out.
The 'dd' will be doing sequential reads, whereas the fs version will be doing
considerable numbers of seeks. It is the seeks that cause the disks to
draw current bursts from the psu - so don't discount that.
Good point. To account for that, I repeatedly cat'ed the test file on the
cgd partition to /dev/null. At the same time, I hashed the first 64M of rcgd0d
in a loop. I used 64M instead of 256M because the disk thrashing was really
bad. I also set the CPU frequency to its maximum to maximize the power the
system draws.

The results were as follows:

f7abc41f7514946306a6aeddca8cb704
[about 70 occurrences of the same checksum]
f7abc41f7514946306a6aeddca8cb704
102e34f6c25d4fc135da03e26d4feff0
cfc1dc011ccff0e82fc5aa5a69173bd0
cfc1dc011ccff0e82fc5aa5a69173bd0
cfc1dc011ccff0e82fc5aa5a69173bd0

I attribute the checksum change to changes on the filesystem, since that was
obviously mounted while doing the test. Getting over 70 equal checksums and
then 3 equal other checksums in a row with flaky hardware seems highly
improbable to me.

In comparison, a loop of hashes on the file itself afterwards gave the
following result:

82d964b8d0cd2f60041067fc9263c1d7
82d964b8d0cd2f60041067fc9263c1d7
686d81e7362114475427b7fff2aec4fb
82d964b8d0cd2f60041067fc9263c1d7
82d964b8d0cd2f60041067fc9263c1d7
82d964b8d0cd2f60041067fc9263c1d7
82d964b8d0cd2f60041067fc9263c1d7

i.e. mismatch at the 3rd run. I seriously doubt that the 70+ successful runs
on the rcgd0d device were pure luck.

Please, anyone. :(

Best regards,

ND
Daniel Carosone
2007-01-16 21:33:20 UTC
Permalink
Post by Nino Dehne
Post by David Laight
considerable numbers of seeks. It is the seeks that cause the disks to
draw current bursts from the psu - so don't discount that.
Good point. To accommodate to that, I repeatedly cat'ed the test file on the
cgd partition to /dev/null. At the same time, I hashed the first 64M of rcgd0d
in a loop. I used 64M instead of 256M because the disk thrashing was really
bad. I also set the CPU frequency to its maximum to maximize the power the
system draws.
a cpu-hog process would help here too..
Post by Nino Dehne
I attribute the checksum change to changes on the filesystem, since that was
obviously mounted while doing the test.
Probably, yeah; I gave some suggestions for ways to avoid this a
moment ago, too.
Post by Nino Dehne
Getting over 70 equal checksums and then 3 equal other checksums in
a row with flaky hardware seems highly improbable to me.
Or the 64m is fitting in cache most of the time, and the bad read was
cached and thus repeated?
Post by Nino Dehne
i.e. mismatch at the 3rd run. I seriously doubt that the 70+ successful runs
on the rcgd0d device were pure luck.
Please try some of the other variants I suggested. Perhaps try
varying the block size of the dd, too. If these eliminate seeking,
then the next possible culprit is probably the filesystem :-/.

--
Dan.
Daniel Carosone
2007-01-16 21:38:29 UTC
Permalink
Post by Daniel Carosone
Post by Nino Dehne
I also set the CPU frequency to its maximum to maximize the power the
system draws.
a cpu-hog process would help here too..
to elaborate: it's a tricky balance. with the file reads, you're
seeking, but not a whole lot, and you're using most of your cpu for
cgd and md5 calculation. if you induce *too* much seek contention
with some of these artificial tests, you may in fact starve the cpu of
data for this calculation and actually use less power.

--
Dan.
Nino Dehne
2007-01-16 21:45:10 UTC
Permalink
Post by Daniel Carosone
Post by Nino Dehne
Post by David Laight
considerable numbers of seeks. It is the seeks that cause the disks to
draw current bursts from the psu - so don't discount that.
Good point. To accommodate to that, I repeatedly cat'ed the test file on the
cgd partition to /dev/null. At the same time, I hashed the first 64M of rcgd0d
in a loop. I used 64M instead of 256M because the disk thrashing was really
bad. I also set the CPU frequency to its maximum to maximize the power the
system draws.
a cpu-hog process would help here too..
While doing the above, the CPU is about 0%-8% idle. I'm still running a
UP kernel.
Post by Daniel Carosone
Post by Nino Dehne
I attribute the checksum change to changes on the filesystem, since that was
obviously mounted while doing the test.
Probably, yeah; I gave some suggestions for ways to avoid this a
moment ago, too.
I'll have a look. Your other mail just arrived due to connectivity problems
earlier.
Post by Daniel Carosone
Post by Nino Dehne
Getting over 70 equal checksums and then 3 equal other checksums in
a row with flaky hardware seems highly improbable to me.
Or the 64m is fitting in cache most of the time, and the bad read was
cached and thus repeated?
Just doing the hashing from rcgd0d leaves the disks active 100%. I think
dd from a raw device is not cached.
Post by Daniel Carosone
Post by Nino Dehne
i.e. mismatch at the 3rd run. I seriously doubt that the 70+ successful runs
on the rcgd0d device were pure luck.
Please try some of the other variants I suggested. Perhaps try
varying the block size of the dd, too. If these eliminate seeking,
then the next possible culprit is probably the filesystem :-/.
Gonna do this right away.

Thanks and regards,

ND
Thilo Jeremias
2007-01-17 13:31:47 UTC
Permalink
Is the changed checksum always deterministically the same?
That is, is this a systematic error, or (as I would guess for
drive/cable/power etc. problems) is it always a different checksum,
i.e. are there more than two distinct checksums?

If it is deterministic, it probably always happens at a certain block,
so it might then help to isolate the location of the fault
in order to find the cause.
--
my 5 cts'

good luck

thilo
Post by Nino Dehne
Post by Daniel Carosone
Post by Nino Dehne
Post by David Laight
considerable numbers of seeks. It is the seeks that cause the disks to
draw current bursts from the psu - so don't discount that.
Good point. To accommodate to that, I repeatedly cat'ed the test file on the
cgd partition to /dev/null. At the same time, I hashed the first 64M of rcgd0d
in a loop. I used 64M instead of 256M because the disk thrashing was really
bad. I also set the CPU frequency to its maximum to maximize the power the
system draws.
a cpu-hog process would help here too..
While doing the above, the CPU is about 0%-8% idle. I'm still running a
UP kernel.
Post by Daniel Carosone
Post by Nino Dehne
I attribute the checksum change to changes on the filesystem, since that was
obviously mounted while doing the test.
Probably, yeah; I gave some suggestions for ways to avoid this a
moment ago, too.
I'll have a look. Your other mail just arrived due to connectivity problems
earlier.
Post by Daniel Carosone
Post by Nino Dehne
Getting over 70 equal checksums and then 3 equal other checksums in
a row with flaky hardware seems highly improbable to me.
Or the 64m is fitting in cache most of the time, and the bad read was
cached and thus repeated?
Just doing the hashing from rcgd0d leaves the disks active 100%. I think
dd from a raw device is not cached.
Post by Daniel Carosone
Post by Nino Dehne
i.e. mismatch at the 3rd run. I seriously doubt that the 70+ successful runs
on the rcgd0d device were pure luck.
Please try some of the other variants I suggested. Perhaps try
varying the block size of the dd, too. If these eliminate seeking,
then the next possible culprit is probably the filesystem :-/.
Gonna do this right away.
Thanks and regards,
ND
Steven M. Bellovin
2007-01-17 15:02:49 UTC
Permalink
On Wed, 17 Jan 2007 23:30:55 +1000
Post by Thilo Jeremias
is the changed checksum always deterministicly the same?
Meaning is this a systematic error, or
(Where I would guess for drive/cable/power etc problems) is it always
a different checksum (I mean are there more than two checksums)
If it is deterministic, it probably just happens at a certain block,
so it might help then to isolate the location where the fault is to
find the cause
Is there any chance the two different mirrors -- you did say RAID,
right, though I confess I don't remember which variant -- have
different versions of the block? That shouldn't happen, of course, but
if it did it would explain the problem.


--Steve Bellovin, http://www.cs.columbia.edu/~smb
Daniel Carosone
2007-01-17 20:32:21 UTC
Permalink
Post by Steven M. Bellovin
Is there any chance the two different mirrors -- you did say RAID,
right, though I confess I don't remember which variant -- have
different versions of the block? That shouldn't happen, of course, but
if it did it would explain the problem.
It's RAID5, from the original post. One of the first ideas I had and
eliminated, alas.

Nino, are you running a kernel with DIAGNOSTIC and/or DEBUG? Looking
at the cgd panic you found, I'm guessing not, because the path we see
to that problem would most likely have involved one or more DIAGNOSTIC
messages.

If you're able, adding those options would probably be a very good
idea at this point, especially as filesystem issues are looking more
and more likely. The combination of ffsv2, >1Tb, and older kernels
smells fishy to me, and any additional clues they may provide could be
vital. Reproducing that combination on a test machine, without cgd
and R5, would also be a good idea if feasible.

--
Dan.
Nino Dehne
2007-01-17 22:59:31 UTC
Permalink
Post by Daniel Carosone
Nino, are you running a kernel with DIAGNOSTIC and/or DEBUG? Looking
at the cgd panic you found, I'm guessing not, because the path we see
to that problem would have involved one or more likely DIAGNOSTIC
messages.
Not yet, but that just went on my list of things to try.
Post by Daniel Carosone
The combination of ffsv2, >1Tb, and older kernels
smells fishy to me, and any additional clues they may provide could be
vital. Reproducing that combination on a test machine, without cgd
and R5, would also be a good idea if feasible.
Unfortunately, I'll have to pass. That filesystem is the only one of that
size I have access to. Perhaps someone else is running that combination.
It shouldn't be too unlikely.

My plans for today:

1) Boot DIAGNOSTIC+DEBUG kernel
2) Run fsck -f[1]
3) Last resort: transfer disks to my desktop machine and try to reproduce
the problem

Best regards,

ND


[1] That fs went through several real fscks recently as I was fighting
some stubborn disk controller[2]. I never noticed anything unusual.
Still gonna try, though.
[2] A SiI0680 cmdide(4) controller was apparently causing lockups during
heavy I/O. The disks are now master+slave on an onboard viaide(4)
(4 disks) and master on a PCI hptide(4) (the 5th disk).
Nino Dehne
2007-01-18 08:22:10 UTC
Permalink
Post by Nino Dehne
Post by Daniel Carosone
Nino, are you running a kernel with DIAGNOSTIC and/or DEBUG? Looking
at the cgd panic you found, I'm guessing not, because the path we see
to that problem would have involved one or more likely DIAGNOSTIC
messages.
Not yet, but that just went on my list of things to try.
I'm now running the system with those options. I didn't try to provoke
the cgd panic yet, though. Parity recalculation is a lengthy process.
Post by Nino Dehne
1) Boot DIAGNOSTIC+DEBUG kernel
2) Run fsck -f[1]
I ran fsck -fn 10 times in a row, with 4 gzips running concurrently.
Nothing. Output looked like this each time:

** /dev/rcgd0a (NO WRITE)
** File system is already clean
** Last Mounted on /home
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
678270 files, 138286777 used, 8460366 free (8334 frags, 1056504 blocks, 0.0% fragmentation)

What I did manage was to get two samples of what the corruption looks like.
I copied a ~650M file to /var/tmp, where the corruption has never occurred
so far. I verified this by hashing it 100 times without an error. The file
is a .rar file, so I could also verify its integrity.

I then wrote a little script that copied the same file from cgd0a to /var/tmp
over and over again with a different name, hashing it and aborting if the hash
mismatched the predetermined value.
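A minimal sketch of such a script (the file names here are placeholders,
not the ones actually used; it assumes the verified good copy already sits
in /var/tmp):

    #!/bin/sh
    SRC=/home/path/to/testfile.rar      # file on the cgd filesystem (placeholder)
    GOOD=/var/tmp/testfile.good         # verified good copy (placeholder)
    REF=`md5 < "$GOOD"`                 # reference checksum
    n=0
    while :; do
            n=`expr $n + 1`
            cp "$SRC" "/var/tmp/copy.$n"
            H=`md5 < "/var/tmp/copy.$n"`
            if [ "$H" != "$REF" ]; then
                    echo "mismatch on run $n, bad copy kept as /var/tmp/copy.$n"
                    break
            fi
            rm "/var/tmp/copy.$n"
    done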

Then I ran cmp -l /var/tmp/<good file> /var/tmp/<bad file>:

503124993 246 310
503124994 132 251
503124995 230 221
503124996 211 351
503124997 51 46
503124998 214 173
503124999 374 122
503125000 144 331
503125001 134 141
503125002 150 336
503125003 46 247
503125004 266 153
503125217 257 211
503125218 303 217
503125219 111 14
503125220 70 227
503125221 2 316
503125222 343 340
503125223 207 372
503125224 350 210
503125229 100 67
503125230 64 145
503125231 262 327
503125232 205 146

Another run of the script got me another sample. cmp -l:

502883433 167 363
502883434 141 126
502883435 26 11
502883436 311 67
502883437 25 153
502883438 302 103
502883439 145 40
502883440 103 71
502883445 346 174
502883446 45 60
502883447 333 262

I managed to get both samples with under 20 runs of the script.

File size is exactly 678765312 bytes. For good measure I hashed both the
good and the bad copy 100 times while they were on /var: no mismatch from
their actual hash values.

As a wild guess, I resolved all IRQ conflicts on the machine. The extra
IDE controller shares an interrupt with one of the USB controllers, so I
disabled USB temporarily.

Also, since all disks have separate scratch partitions on them besides the
respective RAID component, I did the usual hashing loop on the disk that's
connected to the separate controller[1].

Neither step helped resolve the issue.
Post by Nino Dehne
3) Last resort: transfer disks to my desktop machine and try to reproduce
the problem
That will have to wait. I will see to reproducing the cgd panic while at it.

Best regards,

ND


[1]:
hptide0 at pci0 dev 9 function 0
hptide0: Triones/Highpoint HPT371 IDE Controller
hptide0: bus-master DMA support present
hptide0: primary channel wired to native-PCI mode
hptide0: using ioapic0 pin 16 (irq 7) for native-PCI interrupt
Daniel Carosone
2007-01-21 21:47:22 UTC
Permalink
/*
* tech-kern@ added and subject changed, in the hopes of recruiting
* some ffs-expert help. After some comprehensive testing and
* elimination this is looking very much like a ffsv2 bug to me.
*
* Quick background, more details available in the current-users
* archives:
* - ~1.1Tb ffs2 on cgd on raidframe R5 on 5x wd(4) NetBSD 3.x
* - reproducible occasional data corruption reading files
* - not reproducible reading wd, raid, or cgd devices
* - hardware, memory, power, etc pretty well eliminated
*/
Post by Nino Dehne
Post by Nino Dehne
Post by Daniel Carosone
Nino, are you running a kernel with DIAGNOSTIC and/or DEBUG? Looking
at the cgd panic you found, I'm guessing not, because the path we see
to that problem would have involved one or more likely DIAGNOSTIC
messages.
Not yet, but that just went on my list of things to try.
I'm now running the system with those options. I didn't try to provoke
the cgd panic yet, though. Parity recalculation is a lengthy process.
Sure. While a test run of that when you get a chance could be helpful
to confirm the specific diagnosis of that problem, it's a separate
issue from your data corruption.
Post by Nino Dehne
Post by Nino Dehne
1) Boot DIAGNOSTIC+DEBUG kernel
2) Run fsck -f[1]
I ran fsck -fn 10 times in a row, with 4 gzips running concurrently.
** /dev/rcgd0a (NO WRITE)
** File system is already clean
** Last Mounted on /home
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
678270 files, 138286777 used, 8460366 free (8334 frags, 1056504 blocks, 0.0% fragmentation)
The only real explanation is that live access via the filesystem code
is causing the problem; every other path at every other layer below
that has been unable to provoke the issue, and at least fsck isn't
recognising any on-disk corruption.
Post by Nino Dehne
503124993 246 310
503124994 132 251
503124995 230 221
503124996 211 351
503124997 51 46
503124998 214 173
503124999 374 122
503125000 144 331
503125001 134 141
503125002 150 336
503125003 46 247
503125004 266 153
503125217 257 211
503125218 303 217
503125219 111 14
503125220 70 227
503125221 2 316
503125222 343 340
503125223 207 372
503125224 350 210
503125229 100 67
503125230 64 145
503125231 262 327
503125232 205 146
502883433 167 363
502883434 141 126
502883435 26 11
502883436 311 67
502883437 25 153
502883438 302 103
502883439 145 40
502883440 103 71
502883445 346 174
502883446 45 60
502883447 333 262
I managed to get both samples with under 20 runs of the script.
It's a small sample, but the coincidence of the offset and the short range
at which the corruption occurs is rather interesting. Especially if
this pattern is repeated, and for different large source files, it
continues to strongly suggest filesystem issues to me.

Can you try to provoke the problem with a file a little smaller than
this? Perhaps also with files much larger, to see if there's ever
corruption further along than this. My off-hand guess is that this is
near a boundary where the next level of indirection blocks kicks in.
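(For reference, plugging in the dumpfs output from earlier (bsize 65536,
nindir 8192, and 12 direct block pointers), the single-indirect range ends
at 12 * 64 KB + 8192 * 64 KB = 537657344 bytes. The ~679 MB file does cross
that double-indirect boundary, though the corrupted offsets around 503 MB
still fall somewhat short of it; this assumes I have the FFS block layout
right.)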

The next thing to try, if you can, is a -current (or 4-beta) kernel to
see if any filesystem fixes since 3.x have been missed.

Is there an ffs doctor in the house? This is reminding me more and
more of problems der Mouse reported seeing some time ago.
Post by Nino Dehne
As a wild guess, I resolved all IRQ conflicts on the machine.
[..]
Both steps helped nothing to resolve the issue.
These were unlikely at this point, but thanks for going to the effort
of eliminating them.

--
Dan.
Daniel Carosone
2007-01-21 22:06:50 UTC
Permalink
Post by Daniel Carosone
* - ~1.1Tb ffs2 on cgd on raidframe R5 on 5x wd(4) NetBSD 3.x
Apologies. It's 4.0_BETA2 from mid december.
Post by Daniel Carosone
The next thing to try, if you can, is a -current (or 4-beta) kernel to
see if any filesystem fixes since 3.x have been missed.
.. which makes this less relevant, but it would still be useful to
confirm with a recent -4 or current.

--
Dan.
Pavel Cahyna
2007-01-21 23:16:53 UTC
Permalink
Post by Daniel Carosone
/*
* some ffs-expert help. After some comprehensive testing and
* elimination this is looking very much like a ffsv2 bug to me.
*
* Quick background, more details available in the current-users
* - ~1.1Tb ffs2 on cgd on raidframe R5 on 5x wd(4) NetBSD 3.x
* - reproducible occasional data corruption reading files
The next step could be to try ffsv1, if dumping and recreating a 1.1TB
filesystem is feasible.

According to the archives, ffsv1 on >1TB used to work fine, as long as the
filesystem is smaller than 2TB.

Pavel
Nino Dehne
2007-01-24 18:36:48 UTC
Permalink
Hi there,

first, I'm feeling really stupid and I'm terribly sorry to have caused
such an uproar. It appears that the issue _was_ hardware-based after all.
At least that's how things look currently. Let me explain:

Before messing around further I wanted to try the setup in my desktop
box. So I swapped disks, using a different add-on controller than in
the server and also using different cables.

The issue didn't show up. A bit let down that the new server hardware
might be flaky, and not knowing exactly which part of it, I tried running
the same setup in the desktop with the add-on controller from the server
(an HPT371, single-channel). This brought back the dreaded
no-panic-no-nothing lockups I had already experienced in the server. Back then, I
used both the HPT and an additional SiI0680 cmdide(4) controller so that
all disks had their dedicated channel. Seeing those lockups on the desktop
now immediately raised a flag.

It dawned on me that the cause of the lockups earlier might not have been
the cmdide(4) controller I ripped out but instead the hptide(4) one. The
cmdide(4) had other issues in the desktop box, though (lost interrupts).

I swapped all disks back to the server and replaced the HPT with a Promise
Fasttrak100. And what can I say, 200 runs without a single error. I will
watch things closely but I'm confident.

I still don't understand the symptoms fully, though.
Post by Daniel Carosone
Post by Nino Dehne
As a wild guess, I resolved all IRQ conflicts on the machine.
[..]
Both steps helped nothing to resolve the issue.
These were unlikely at this point, but thanks for going to the effort
of eliminating them.
As it turned out, nothing seems to be unlikely. :/ I would have never
expected the controller to be flaky either. Especially not when I do huge
transfers from a raw device without an error. Do you think there might
still be a bug in NetBSD, but instead of the FFS code it's hptide(4) with
that specific controller?

Anyway, thanks a lot for your efforts everyone and sorry for the trouble.

Best regards,

ND
Nino Dehne
2007-01-17 22:22:05 UTC
Permalink
Post by Thilo Jeremias
is the changed checksum always deterministicly the same?
Meaning is this a systematic error, or
(Where I would guess for drive/cable/power etc problems) is it always a
different checksum (I mean are there more than two checksums)
From what I can see, the wrong checksum is always different. I didn't
write them down but in one run I had 3 errors and they were all random.

Regards,

ND
Brian Buhrow
2007-01-21 23:40:02 UTC
Permalink
Hello Dan. I've been following this thread somewhat closely, though
not with agonizing detail. I have a NetBSD-3.x system with a 2.1TB FFSV2
filesystem built atop a 7-disk RAID 5 set using SATA (wd) disks. Are you
suggesting that if I run a file off that filesystem through MD5 multiple
times, I'll see corruption? If that's the case, I'll be
happy to run some tests with variously sized files and let you know the
results. Anecdotally speaking, I've been running this system for about 9
months, and I've not seen any problems whatsoever with the filesystem or
the files on it.
So, what test should I perform to help expose this problem?
-thanks
-Brian

On Jan 22, 9:05am, Daniel Carosone wrote:
} Subject: Re: Data corruption issues, probably involving ffs2 and >1Tb
}
} On Mon, Jan 22, 2007 at 08:45:19AM +1100, Daniel Carosone wrote:
} > * - ~1.1Tb ffs2 on cgd on raidframe R5 on 5x wd(4) NetBSD 3.x
}
} Apologies. It's 4.0_BETA2 from mid december.
}
} > The next thing to try, if you can, is a -current (or 4-beta) kernel to
} > see if any filesystem fixes since 3.x have been missed.
}
} .. which makes this less relevant, but it would still be useful to
} confirm with a recent -4 or current.
}
} --
} Dan.
}
-- End of excerpt from Daniel Carosone
Daniel Carosone
2007-01-22 00:01:45 UTC
Permalink
Post by Brian Buhrow
I have a NetBSD-3.x system with a 2.1TB FFSV2 filesystem built atop
a 7 disk raid 5 set using SATA (wd) disks. Are you suggesting that
if I run a file off of that filesystem through MD5 multiple times,
that you think I'll see corruption? If that's the case, I'll be
happy to run some tests, with variously sized files, and let you
know the results.
Yes please.. make sure that you're testing repeated reads from disk
rather than through cache, especially for the smaller files. You
might most easily do this by having a collection of files of different
sizes, and reading through them each in turn per run.

--
Dan.
Brian Buhrow
2007-01-23 00:49:33 UTC
Permalink
Hello. I've set up a test, using the shell script below, which I
expect to take a few days to complete. I'll let you know the results when
they're in.
If you see anything wrong with this shell script in terms of its
methodology, let me know.
-thanks
-Brian


Usage:
cd to top of tree to be tested.
/path/to/script >& /var/tmp/script.log &
..
wait
..

#!/bin/sh
#$Id: test-ffsv2.sh,v 1.1 2007/01/23 00:42:32 buhrow Exp $
#NAME: Brian Buhrow
#DATE: January 22, 2007
#The purpose of this shell script is to determine if ffsV2 as implemented
#on NetBSD-3.x systems, is corrupting files somehow. This could test files
#on any filesystem, it just happens that FFSV2 may be a problem, so this is
#the impetus to write this script.

#Now, let's define constants and do work.

PATH=/bin:/usr/bin:/sbin:/usr/sbin; export PATH
MAXRUNS=1002

#The basic methodology of this script is to create a list of files to
#check, capture an MD5 checksum for each file, and then repeatedly run each
#file through MD5 again and see if its output differs from the original
#captured output. Because we use find to get a list of files to check,
#Be sure you run this on filesystem trees where you know how many files
#you'll be scanning. Otherwise, you might scan more than you bargained on.

fillist=`find . -type f -print |tr '\012' ' '`

#Now, we'll do our runs. The first run is comprised of collecting the MD5
#signatures, but other than that, there's nothing special about it.
currun=1

while [ $currun -lt $MAXRUNS ]; do
	for curfil in $fillist; do
		hashname=`echo $curfil |sed 's+/+-+g' |sed 's+^\.-++'`
		hashname="/var/tmp/$hashname"
		if [ ! -s "$hashname" ]; then
			destfile=$hashname
			initrun="true"
		else
			destfile="/var/tmp/testmd5.$$"
			initrun="false"
		fi
		/bin/rm -f $destfile
		md5 $curfil > $destfile
		if [ X"$initrun" != X"true" ]; then
			diff -u $hashname $destfile
			if [ $? -ne 0 ]; then
				echo "Differences encountered for $curfil!!!"
			fi
			echo "$curfil($currun)"
		fi
	done
	currun=`echo $currun |awk '{ count = $1 + 1;print count }'`
done

echo "All finished!!!"
exit 0
Brian Buhrow
2007-01-24 18:46:37 UTC
Permalink
Hello. I'm happy to report that my md5 tests, which I've been running
over the last 72 hours, have yielded no errors whatsoever. These have
been using the pdcsata(4) driver, which seems to work fine.
Glad you found the trouble. It's possible there's a bug in the
hptide(4) driver; it's also possible that there's a bug in the specific
revision of the HPT chipset you have in your card, which the driver doesn't
work around. Are there many folks using the hptide(4) driver, and, if so,
what revisions of chips are they using it with?
-Brian

On Jan 24, 7:35pm, Nino Dehne wrote:
} Subject: Re: Data corruption issues, probably involving ffs2 and >1Tb (SOL
} Hi there,
}
} first, I'm feeling really stupid and I'm terribly sorry to have caused
} such an uproar. It appears that the issue _was_ hardware-based after all.
} At least that's how things look currently. Let me explain:
}
} Before messing around further I wanted to try the setup in my desktop
} box. So I swapped disks, using a different add-on controller than in
} the server and also using different cables.
}
} The issue didn't show up. OK, a bit let down that the new server hardware
} might be flaky and not knowing exactly which part of it, I tried running
} the same setup in the desktop with the add-on controller from the server
} (HPT371 single-channel). This brought back the dreaded no-panic-no-nothing-
} lockups I had experienced in the server earlier already. Back then, I
} used both the HPT and an additional SiI0680 cmdide(4) controller so that
} all disks had their dedicated channel. Seeing those lockups on the desktop
} now immediately raised a flag.
}
} It dawned on me that the cause of the lockups earlier might not have been
} the cmdide(4) controller I ripped out but instead the hptide(4) one. The
} cmdide(4) had other issues in the desktop box, though (lost interrupts).
}
} I swapped all disks back to the server and replaced the HPT with a Promise
} Fasttrak100. And what can I say, 200 runs without a single error. I will
} watch things closely but I'm confident.
}
} I still don't understand the symptoms fully, though.
}
} On Mon, Jan 22, 2007 at 08:45:19AM +1100, Daniel Carosone wrote:
} > > As a wild guess, I resolved all IRQ conflicts on the machine.
} > > [..]
} > > Both steps helped nothing to resolve the issue.
} >
} > These were unlikely at this point, but thanks for going to the effort
} > of eliminating them.
}
} As it turned out, nothing seems to be unlikely. :/ I would have never
} expected the controller to be flaky either. Especially not when I do huge
} transfers from a raw device without an error. Do you think there might
} still be a bug in NetBSD, but instead of the FFS code it's hptide(4) with
} that specific controller?
}
} Anyway, thanks a lot for your efforts everyone and sorry for the trouble.
}
} Best regards,
}
} ND
-- End of excerpt from Nino Dehne
Greg Oster
2007-01-26 23:03:40 UTC
Permalink
Post by Brian Buhrow
Hello. I'm happy to report that my md5 tests, which I've been running
over the last 72 hours, have yielded no errors what soever. These have
been using the pdcsata(4) driver, which seems to work fine.
Glad you found the trouble. It's possible there's a bug in the
hptide(4) driver, it's also possible that there's a bug in the specific
revision of the hpt chipset you have in your card, wich the driver doesn't
work around. Are there many folks using the hptide(4) driver, and, if so,
what revisions of chips are they using it with?
I've been running disks on this:

hptide0 at pci0 dev 19 function 0
hptide0: Triones/Highpoint HPT370 IDE Controller
hptide0: bus-master DMA support present
hptide0: primary channel wired to native-PCI mode
hptide0: using irq 10 for native-PCI interrupt

for *ages* now. (I had problems with lockups early on before I upgraded
the bios on the thing... but since then it's been 100% solid and used
to drive disks under RAIDframe sets..)

Later...

Greg Oster
Post by Brian Buhrow
} Subject: Re: Data corruption issues, probably involving ffs2 and >1Tb (SOL
} Hi there,
}
} first, I'm feeling really stupid and I'm terribly sorry to have caused
} such an uproar. It appears that the issue _was_ hardware-based after all.
}
} Before messing around further I wanted to try the setup in my desktop
} box. So I swapped disks, using a different add-on controller than in
} the server and also using different cables.
}
} The issue didn't show up. OK, a bit let down that the new server hardware
} might be flaky and not knowing exactly which part of it, I tried running
} the same setup in the desktop with the add-on controller from the server
} (HPT371 single-channel). This brought back the dreaded no-panic-no-nothing-
} lockups I had experienced in the server earlier already. Back then, I
} used both the HPT and an additional SiI0680 cmdide(4) controller so that
} all disks had their dedicated channel. Seeing those lockups on the desktop
} now immediately raised a flag.
}
} It dawned on me that the cause of the lockups earlier might not have been
} the cmdide(4) controller I ripped out but instead the hptide(4) one. The
} cmdide(4) had other issues in the desktop box, though (lost interrupts).
}
} I swapped all disks back to the server and replaced the HPT with a Promise
} Fasttrak100. And what can I say, 200 runs without a single error. I will
} watch things closely but I'm confident.
}
} I still don't understand the symptoms fully, though.
}
} > > As a wild guess, I resolved all IRQ conflicts on the machine.
} > > [..]
} > > Both steps helped nothing to resolve the issue.
} >
} > These were unlikely at this point, but thanks for going to the effort
} > of eliminating them.
}
} As it turned out, nothing seems to be unlikely. :/ I would have never
} expected the controller to be flaky either. Especially not when I do huge
} transfers from a raw device without an error. Do you think there might
} still be a bug in NetBSD, but instead of the FFS code it's hptide(4) with
} that specific controller?
}
} Anyway, thanks a lot for your efforts everyone and sorry for the trouble.
}
} Best regards,
}
} ND
-- End of excerpt from Nino Dehne
John Nemeth
2009-01-06 11:04:19 UTC
Permalink
On May 29, 1:39pm, Daniel Carosone wrote:
} On Mon, Jan 05, 2009 at 10:29:52PM -0800, John Nemeth wrote:
} > I am trying to run MySQL 5 on it, but it fails initialisation. I
} > also tried PostgreSQL last week and it failed in a similar way, so I
} > don't think the problem is with MySQL. mysqld only links against
} > system libraries so that rules out other packages as being the
} > problem...
}
} Wrong cpuflags, in particular related to fp ?

Thanks! That was it. Guess it was a pkgsrc issue of sorts. I
use devel/cpuflags which currently reports:

-----

P4-3679GHz: {67} cpuflags
-mfpmath=sse -msse2 -march=prescott

-----

It used to say "-mfpmath=sse -msse2 -march=pentium4". My CPU is:

-----

P4-3679GHz: {69} cpuctl identify 0
cpu0: Intel Pentium 4 (686-class), 3519.84 MHz, id 0xf29
cpu0: features 0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu0: features 0xbfebfbff<PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX>
cpu0: features 0xbfebfbff<FXSR,SSE,SSE2,SS,HTT,TM,SBF>
cpu0: features2 0x4400<CID,xTPR>
cpu0: "Intel(R) Pentium(R) 4 CPU 3.20GHz"
cpu0: I-cache 12K uOp cache 8-way, D-cache 8KB 64B/line 4-way
cpu0: L2 cache 1MB 64B/line 8-way
cpu0: ITLB 4K/4M: 64 entries
cpu0: DTLB 4K/4M: 64 entries
cpu0: Initial APIC ID 0
cpu0: Cluster/Package ID 0
cpu0: SMT ID 0
cpu0: family 0f model 02 extfamily 00 extmodel 00

-----

I don't know the code names well enough to know if this qualifies
as prescott or not (it is a hyperthreading CPU). Given what the GCC
manpage says and lack of SSE3 in the above, I'm guessing it doesn't.

-----

prescott
Improved version of Intel Pentium4 CPU with MMX, SSE, SSE2 and
SSE3 instruction set support.

-----

I changed it back to -march=pentium4 and retested. It worked.

}-- End of excerpt from Daniel Carosone
David Brownlee
2009-01-09 12:56:02 UTC
Permalink
Thanks for catching this - Intel appear to have re-used cpu
branding strings between Northwood and Prescott pentium4s.
Which is just... special and annoying.

cpuflags now handles this by explicitly testing for SSE3 support
to distinguish between '-march=prescott' and '-march=pentium4'.
Fixed in v1.32; apologies for the pain...
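For illustration, a check of that kind could be as simple as the following,
going by the cpuctl output quoted elsewhere in this thread (the actual
cpuflags logic may well differ):

    if cpuctl identify 0 | grep -q SSE3; then
            echo "-march=prescott"
    else
            echo "-march=pentium4"
    fi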
Post by John Nemeth
} > I am trying to run MySQL 5 on it, but it fails initialisation. I
} > also tried PostgreSQL last week and it failed in a similar way, so I
} > don't think the problem is with MySQL. mysqld only links against
} > system libraries so that rules out other packages as being the
} > problem...
}
} Wrong cpuflags, in particular related to fp ?
Thanks! That was it. Guess it was a pkgsrc issue of sorts. I
-----
P4-3679GHz: {67} cpuflags
-mfpmath=sse -msse2 -march=prescott
-----
-----
P4-3679GHz: {69} cpuctl identify 0
cpu0: Intel Pentium 4 (686-class), 3519.84 MHz, id 0xf29
cpu0: features 0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu0: features 0xbfebfbff<PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX>
cpu0: features 0xbfebfbff<FXSR,SSE,SSE2,SS,HTT,TM,SBF>
cpu0: features2 0x4400<CID,xTPR>
cpu0: "Intel(R) Pentium(R) 4 CPU 3.20GHz"
cpu0: I-cache 12K uOp cache 8-way, D-cache 8KB 64B/line 4-way
cpu0: L2 cache 1MB 64B/line 8-way
cpu0: ITLB 4K/4M: 64 entries
cpu0: DTLB 4K/4M: 64 entries
cpu0: Initial APIC ID 0
cpu0: Cluster/Package ID 0
cpu0: SMT ID 0
cpu0: family 0f model 02 extfamily 00 extmodel 00
-----
I don't know the code names well enough to know if this qualifies
as prescott or not (it is a hyperthreading CPU). Given what the GCC
manpage says and lack of SSE3 in the above, I'm guessing it doesn't.
-----
prescott
Improved version of Intel Pentium4 CPU with MMX, SSE, SSE2 and
SSE3 instruction set support.
-----
I changed it back to -march=pentium4 and retested. It worked.
}-- End of excerpt from Daniel Carosone
--
David/absolute -- www.NetBSD.org: No hype required --
Geert Hendrickx
2009-01-09 15:12:37 UTC
Permalink
Post by David Brownlee
Thanks for catching this - Intel appear to have re-used cpu branding
strings between Northwood and Prescott pentium4s. Which is just...
special and annoying.
cpuflags now handles this by explicitly testing for SSE3 support to
distinguish between '-march=prescott' and '-march=pentium4' fixed in
v1.32 and apologies for the pain...
There seem to be similar cases like this:

cpu_name="Intel Pentium III (Katmai) (686-class), 2660.15 MHz, id 0x10677"
cpu_brand="Intel(R) Core(TM)2 Quad CPU Q9450 @ 2.66GHz"
cpu_family=6
cpu_model=7
cpu_extfamily=0
cpu_extmodel=1

Here cpuflags suggests "-mfpmath=sse -msse3 -march=prescott", while I use
"-march=nocona" (gcc 4.1).

Geert
David Brownlee
2009-01-12 22:13:24 UTC
Permalink
Post by Geert Hendrickx
Thanks for catching this - Intel appear to have re-used cpu branding
strings between Northwood and Prescott pentium4s. Which is just...
special and annoying.
cpuflags now handles this by explicitly testing for SSE3 support to
distinguish between '-march=prescott' and '-march=pentium4' fixed in
v1.32 and apologies for the pain...
cpu_name="Intel Pentium III (Katmai) (686-class), 2660.15 MHz, id 0x10677"
cpu_family=6
cpu_model=7
cpu_extfamily=0
cpu_extmodel=1
Here cpuflags suggests "-mfpmath=sse -msse3 -march=prescott", while I use
"-march=nocona" (gcc 4.1).
I have the fallback for 'core2' as 'pentium-m' which I think
may be better for instruction scheduling, but omits the
SSE3 support of 'nocona'. On balance I think 'nocona' is
probably a better choice.

Could you test cpuflags 1.34? (The issue with it not picking
it up as core2 to start with should be resolved.)
Geert Hendrickx
2009-01-13 08:09:39 UTC
Permalink
Post by David Brownlee
I have the fallback for 'core2' as 'pentium-m' which I think
better for instruction scheduling, but omits the SSE3 support of
'nocona'. On balance I think 'nocona' is probably a better choice.
Could you test cpuflags 1.34 (the issue with it not picking it up as
core2 to start with should be resolved)
Yes, cpuflags-1.34 gives "-mfpmath=sse -msse3 -march=nocona".

Btw, according to the gcc(1) manpage, -mfpmath=sse is the default on x86-64,
and -msse3 is implied by -march=nocona.
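For what it's worth, one quick way to confirm which extensions a given
-march implies (illustrative; any reasonably recent gcc):

    gcc -march=nocona -E -dM -x c /dev/null | grep __SSE3__
    gcc -march=pentium4 -E -dM -x c /dev/null | grep __SSE3__

The first should print the __SSE3__ define, the second nothing.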

But I think you're right that pentium-m + SSE3 could perhaps be a better
match for this CPU.

Geert
