Discussion:
[arch-general] Reliability test for hard drives and SSD
Andrey Ponomarenko via arch-general
2018-03-03 07:20:51 UTC
Hi there!

Good news for all interested in hardware compatibility and reliability.

I've started a new project to estimate the reliability of hard drives and SSDs in real-life conditions, based on the SMART data reports collected by Linux users in the Linux-Hardware.org database since 2014. The initial data (SMART reports), analysis methods and results are publicly shared in a new GitHub repository: https://github.com/linuxhw/SMART. Everyone can contribute to the report by uploading probes of their computers with the hw-probe tool!

The primary aim of the project is to find the drives with the longest "power on hours" and the minimal number of errors. The following formula is used to measure reliability: Power_On_Hours / (1 + Number_Of_Errors), i.e. the time to the first error/between errors.

Please be careful when reading the results table. Pay attention not only to the rating, but also to the number of checked samples of a model. If the rating is low, look at the number of power-on days and the number of errors that occurred. New drive models appear at the end of the rating table and move to the top in the case of long error-free operation.
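To make the rating concrete, here is a small sketch of how the Power_On_Hours / (1 + Number_Of_Errors) formula behaves, using made-up numbers rather than figures from the actual table:

```shell
# Hypothetical drive: 3 years of power-on time (26280 hours)
# with 2 errors logged in SMART.
hours=26280
errors=2
# Integer rating in "hours per error interval", per the formula above.
rating=$(( hours / (1 + errors) ))
echo "$rating"   # prints 8760
```

An error-free drive with the same hours would score the full 26280, so even a couple of logged errors quickly pull a model down the table.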

Thanks to the ROSA, Arch, Fedora, Ubuntu, Debian, Mint, openSUSE and Gentoo users, and others, who have made this work possible by contributing to the database!
Ralf Mardorf
2018-03-03 07:55:58 UTC
Post by Andrey Ponomarenko via arch-general
The primary aim of the project is to find drives with longest "power
on hours" and minimal number of errors.
Pardon, but this is idiotic, since the HDDs and SSDs with the longest
power-on hours and minimal number of errors are already discontinued
by the time you want to buy a new drive. Your statistic is at best useful
to find out which HDDs and SSDs fail much too early, but then consider
the usage, too. Some people turn their computers on and off several
times a day, which isn't good for an HDD, while others have got uptimes
of more than a year. Some people maintain SSDs with kid gloves, OTOH
others use SSDs in the same way as they use/d HDDs.
Btw, I'm using my SSDs without kid gloves, and when I wanted to order
additional SSDs of the same kind, I couldn't get them anymore, let
alone the best HDDs I ever used.
Ralf Mardorf
2018-03-03 08:13:47 UTC
Post by Ralf Mardorf
Post by Andrey Ponomarenko via arch-general
The primary aim of the project is to find drives with longest "power
on hours" and minimal number of errors.
Pardon, but this is idiotic, since the HDDs and SSDs with the longest
power-on hours and minimal number of errors are already discontinued
by the time you want to buy a new drive. Your statistic is at best useful
to find out which HDDs and SSDs fail much too early, but then consider
the usage, too. Some people turn their computers on and off several
times a day, which isn't good for an HDD, while others have got uptimes
of more than a year. Some people maintain SSDs with kid gloves, OTOH
others use SSDs in the same way as they use/d HDDs.
Btw, I'm using my SSDs without kid gloves, and when I wanted to order
additional SSDs of the same kind, I couldn't get them anymore, let
alone the best HDDs I ever used.
Not to mention what crap such as GVFS does to green external HDDs :D.

[***@archlinux ~]$ pacman -Qi gvfs | grep Description
Description : Dummy package

A dummy package is required to satisfy insane, useless dependencies.

My external WD Green drive stays asleep; it does not spin down and up
again and again. It did, before I replaced GVFS with a dummy package.
GVFS is just one of several HDD killers.
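For reference, a dummy package like the one queried above can be built from a minimal PKGBUILD along these lines. This is only a sketch: the pkgver value is a placeholder and must be high enough to satisfy whatever versioned dependencies your installed packages actually declare:

```shell
# PKGBUILD sketch for an empty replacement package that satisfies
# a gvfs dependency without installing the real gvfs.
pkgname=gvfs
pkgver=99        # placeholder version; assumed high enough for all dependents
pkgrel=1
pkgdesc="Dummy package"
arch=('any')
license=('GPL')
```

Build and install it with makepkg -si; pacman -Qi gvfs then reports the "Dummy package" description shown above.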
--
$ pacman -Q linux{,-rt-securityink,-rt,-rt-pussytoes,-rt-cornflower}
linux 4.15.6-1
linux-rt-securityink 4.14.20_rt17-1
linux-rt 4.14.12_rt10-1
linux-rt-pussytoes 4.14.8_rt9-2
linux-rt-cornflower 4.11.12_rt16-1
Guus Snijders via arch-general
2018-03-03 11:16:30 UTC
Post by Ralf Mardorf
Post by Andrey Ponomarenko via arch-general
The primary aim of the project is to find drives with longest "power
on hours" and minimal number of errors.
Pardon, but this is idiotic, since the HDDs and SSDs with the longest
power-on hours and minimal number of errors are already discontinued
by the time you want to buy a new drive. Your statistic is at best useful
to find out which HDDs and SSDs fail much too early, but then consider
the usage, too.
Actually, it could be very useful to have real insight into these figures.

Similar reports by Backblaze (a storage company) have been very helpful in
deciding which HDDs to buy (or skip!) at $work.
Also, this potentially gives *real* insight into SSD reliability; most other
sources are "promises" by the manufacturer, some testers and perhaps random
people who happened to run into a problem.

Even if most models will indeed have been discontinued by now.


Mvg, Guus Snijders
Ralf Mardorf
2018-03-03 16:14:18 UTC
On Sat, 03 Mar 2018 11:16:30 +0000, Guus Snijders via arch-general
Post by Guus Snijders via arch-general
Post by Ralf Mardorf
Post by Andrey Ponomarenko via arch-general
The primary aim of the project is to find drives with longest "power
on hours" and minimal number of errors.
Pardon, but this is idiotic, since the HDDs and SSDs with the longest
power-on hours and minimal number of errors are already discontinued
by the time you want to buy a new drive. Your statistic is at best useful
to find out which HDDs and SSDs fail much too early, but then consider
the usage, too.
Actually, it could be very useful to have real insight into these figures.
Similar reports by Backblaze (a storage company) have been very
helpful in deciding which HDDs to buy (or skip!) at $work.
Also, this potentially gives *real* insight into SSD reliability; most
other sources are "promises" by the manufacturer, some testers and
perhaps random people who happened to run into a problem.
Even if most models will indeed have been discontinued by now.
Actually, the strategy of companies such as Google obviously is to use
consumer drives instead of enterprise drives, different models,
from different vendors and most likely different batches, too, since
statistics aren't helpful at all. Statistics don't even say anything useful
about drive vendors; as far as I know, no vendor is better than another,
just specific _discontinued_ product lines of different vendors
were (/still are :) better than others.
Ralf Mardorf
2018-03-03 16:30:30 UTC
Also very helpful could be https://en.wikipedia.org/wiki/Ouija; the
result is useless, too, but it's a good step in the right direction to
notice self-delusion.
Guus Snijders via arch-general
2018-03-03 17:06:46 UTC
Post by Ralf Mardorf
Also very helpful could be https://en.wikipedia.org/wiki/Ouija; the
result is useless, too, but it's a good step in the right direction to
notice self-delusion.
When it comes to predicting stats: agreed.
This is the other way around; historic data on how $device *has* behaved.

Something which can help with replacing drives early, etc. In our case: a
certain model of HDD was reported with a high error rate. Sure enough, once
one of them failed, 2 others were quick to follow. Thanks to the warning from
this data, we were prepared and could simply swap the disks without any downtime.
Oh yeah, the replacements were from the same brand, just another model.
Superstition? Perhaps, but so far, so good ;).



Mvg, Guus Snijders
Ralf Mardorf
2018-03-03 21:25:08 UTC
Post by Guus Snijders via arch-general
Something which can help with replacing early etc. In our case: a
certain model HDD was reported with a high error rate.
I didn't think of that. I agree that a statistic could be helpful to
replace drives before they fail.
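The raw numbers behind such a statistic come from SMART attributes, which smartmontools exposes via `sudo smartctl -A /dev/sda` on a real system. A sketch of pulling Power_On_Hours out of that output, using a captured sample line here so it runs without root or a physical drive:

```shell
# Sample attribute row as printed by `smartctl -A` (captured so the
# sketch is self-contained; on a real system, pipe smartctl output in).
sample='  9 Power_On_Hours   0x0032   095   095   000    Old_age   Always       -       26280'
# The raw value is the last field of the attribute row.
hours=$(echo "$sample" | awk '{print $NF}')
echo "$hours"   # prints 26280
```

Watching attributes such as Reallocated_Sector_Ct the same way gives early warning of the kind of cascading failures described above.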
Andrey Ponomarenko via arch-general
2018-08-10 08:09:23 UTC
Post by Andrey Ponomarenko via arch-general
Hi there!
Good news for all interested in hardware compatibility and reliability.
I've started a new project to estimate the reliability of hard drives and SSDs in real-life conditions, based on the SMART data reports collected by Linux users in the Linux-Hardware.org database since 2014. The initial data (SMART reports), analysis methods and results are publicly shared in a new GitHub repository: https://github.com/linuxhw/SMART. Everyone can contribute to the report by uploading probes of their computers with the hw-probe tool!
The primary aim of the project is to find the drives with the longest "power on hours" and the minimal number of errors. The following formula is used to measure reliability: Power_On_Hours / (1 + Number_Of_Errors), i.e. the time to the first error/between errors.
Please be careful when reading the results table. Pay attention not only to the rating, but also to the number of checked samples of a model. If the rating is low, look at the number of power-on days and the number of errors that occurred. New drive models appear at the end of the rating table and move to the top in the case of long error-free operation.
Hi,

I've just built an Arch Linux package for hw-probe. See https://github.com/linuxhw/hw-probe/blob/master/INSTALL.md#install-on-arch-linux.

The command to replenish the database:

sudo hw-probe -all -upload

One can also use a lightweight all-in-one AppImage, without the need to install anything on the system: https://github.com/linuxhw/hw-probe#appimage

Thank you.
