Discussion:
Looking for 3.5" SSD for ZIL
Stephan Budach
2010-12-22 10:36:48 UTC
Hello all,

I am shopping around for 3.5" SSDs that I can mount into my storage and
use as ZIL drives.
As of yet, I have only found 3.5" models with the Sandforce 1200, which
was not recommended on this list.
Does anyone maybe know of a model that has the Sandforce 1500 and is
3.5"? Or any other 3.5" SSD that he/she can recommend?

Cheers,
budy
Khushil Dep
2010-12-22 11:10:09 UTC
We've always bought 2.5" drives and adapters for the Supermicro cradles. Works
well, no issues to report here.

Normally Intel or Samsung, though we also use STECH.

---
W. A. Khushil Dep - ***@gmail.com - 07905374843

Visit my blog at http://www.khushil.com/
Pasi Kärkkäinen
2010-12-22 11:41:12 UTC
On Wed, Dec 22, 2010 at 11:36:48AM +0100, Stephan Budach wrote:
> Hello all,
>
> I am shopping around for 3.5" SSDs that I can mount into my storage and
> use as ZIL drives.
> As of yet, I have only found 3.5" models with the Sandforce 1200, which
> was not recommended on this list.
>

I think the "recommendation" was not to use SSDs at all for ZIL,
not just specifically Sandforce controllers?

-- Pasi

> Does anyone maybe know of a model that has the Sandforce 1500 and is 3.5"?
> Or any other 3.5" SSD that he/she can recommend?
>
> Cheers,
> budy

> _______________________________________________
> zfs-discuss mailing list
> zfs-***@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Stephan Budach
2010-12-22 13:03:37 UTC
Am 22.12.10 12:41, schrieb Pasi Kärkkäinen:
> On Wed, Dec 22, 2010 at 11:36:48AM +0100, Stephan Budach wrote:
>> Hello all,
>>
>> I am shopping around for 3.5" SSDs that I can mount into my storage and
>> use as ZIL drives.
>> As of yet, I have only found 3.5" models with the Sandforce 1200, which
>> was not recommended on this list.
>>
> I think the "recommendation" was not to use SSDs at all for ZIL,
> not just specifically Sandforce controllers?
>
> -- Pasi
>
>
I think the recommendation has been either Intel X25 or Sandforce
1500-based SSDs.

Cheers,
budy
Jabbar
2010-12-22 13:43:35 UTC
Hello,

I was thinking of buying a couple of SSDs until I found out that TRIM is
only supported with SATA drives. I'm not sure if TRIM will work with ZFS. I
was concerned that without TRIM support the SSD life and write throughput will
get affected.

Does anybody have any thoughts on this?

--
Thanks

A Jabbar Azam
Ray Van Dolson
2010-12-22 15:29:14 UTC
On Wed, Dec 22, 2010 at 05:43:35AM -0800, Jabbar wrote:
> Hello,
>
> I was thinking of buying a couple of SSD's until I found out that Trim is only
> supported with SATA drives. I'm not sure if TRIM will work with ZFS. I was
> concerned that with trim support the SSD life and write throughput will get
> affected.
>
> Doesn't anybody have any thoughts on this?

Have been using X25-Es as ZIL for over a year. Cheap enough to
replace a drive when they last that long... (still not seeing any
reason to replace our current batch yet either).

Ray
David Magda
2010-12-22 15:53:26 UTC
On Dec 22, 2010, at 08:43, Jabbar wrote:

> I was thinking of buying a couple of SSD's until I found out that
> Trim is only supported with SATA drives. I'm not sure if TRIM will
> work with ZFS. I was concerned that with trim support the SSD life
> and write throughput will get affected.
>
> Doesn't anybody have any thoughts on this?

Basic support for TRIM was added to b146, but ZFS does not make use of
it yet:

http://bugs.opensolaris.org/view_bug.do?bug_id=6866610
http://sparcv9.blogspot.com/2010/07/sata-trim-command-in-b146.html

The statement "only supported with SATA drives" is a bit confusing. SATA
is merely a protocol between the host and the storage unit. Whether
the storage unit is an SSD or spinning rust is irrelevant, as either
can talk to the outside world via SATA. Similarly, the SCSI
world (which now uses SAS as the transport layer) has a
corresponding UNMAP command.
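To make the point concrete: the discard command a host sends depends on the command set the transport carries, not on whether flash or spinning rust sits behind it. The sketch below is an illustrative lookup table, not any real driver's API; the opcodes are the standard ones (ATA DATA SET MANAGEMENT 06h with the TRIM bit, SCSI UNMAP 42h, WRITE SAME(16) 93h with its UNMAP bit), while the table and function names are made up for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class DiscardCommand:
    name: str
    opcode: int  # command opcode as defined by the ATA or SCSI standard


# Which "these blocks are no longer in use" commands each command set offers.
DISCARD_COMMANDS: Dict[str, List[DiscardCommand]] = {
    "ATA": [DiscardCommand("DATA SET MANAGEMENT (TRIM bit set)", 0x06)],
    "SCSI": [
        DiscardCommand("UNMAP", 0x42),
        DiscardCommand("WRITE SAME(16) (UNMAP bit set)", 0x93),
    ],
}

# The transport only decides which command set is spoken on the wire.
TRANSPORT_TO_COMMAND_SET: Dict[str, str] = {
    "SATA": "ATA",
    "SAS": "SCSI",
    "iSCSI": "SCSI",
}


def discard_commands_for(transport: str) -> List[DiscardCommand]:
    """Return the discard commands available over a given transport."""
    return DISCARD_COMMANDS[TRANSPORT_TO_COMMAND_SET[transport]]


if __name__ == "__main__":
    for transport in ("SATA", "SAS"):
        names = ", ".join(c.name for c in discard_commands_for(transport))
        print(f"{transport}: {names}")
```

Nothing in the table mentions the storage medium, which is the point being made above.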
Christopher George
2010-12-22 15:05:16 UTC
> I'm not sure if TRIM will work with ZFS.

Neither ZFS nor the ZIL code in particular supports TRIM.

> I was concerned that with trim support the SSD life and
> write throughput will get affected.

Your concerns about sustainable write performance (IOPS)
for a Flash-based SSD are valid; the resulting degradation
will vary depending on the controller used.

Best regards,

Christopher George
Founder/CTO
www.ddrdrive.com
--
This message posted from opensolaris.org
Erik Trimble
2010-12-23 06:36:00 UTC
On 12/22/2010 7:05 AM, Christopher George wrote:
>> I'm not sure if TRIM will work with ZFS.
> Neither ZFS nor the ZIL code in particular support TRIM.
>
>> I was concerned that with trim support the SSD life and
>> write throughput will get affected.
> Your concerns about sustainable write performance (IOPS)
> for a Flash based SSD are valid, the resulting degradation
> will vary depending on the controller used.
>
> Best regards,
>
> Christopher George
> Founder/CTO
> www.ddrdrive.com

Christopher is correct, in that SSDs will suffer from (non-trivial)
performance degradation after they've exhausted their free list and
haven't been told to reclaim emptied space. True battery-backed DRAM is
the only permanent solution currently available which never runs into
this problem. Even TRIM-supported SSDs eventually need reconditioning.

However, this *can* be overcome by frequently re-formatting the SSD (not
the Solaris format, but a low-level format using a vendor-supplied
utility). It's generally a simple thing, but requires pulling the SSD
from the server, connecting it to either a Linux or Windows box, running
the reformatter, then replacing the SSD. Which is a PITA.

But, still a bit cheaper than buying a DDRdrive. <wink>


--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
Christopher George
2010-12-23 07:29:40 UTC
> It's generally a simple thing, but requires pulling the SSD from the
> server, connecting it to either a Linux or Windows box, running
> the reformatter, then replacing the SSD. Which, is a PITA.

This procedure is more commonly known as a "Secure Erase", and it
will return a Flash-based SSD to its original or "new" performance.

But as demonstrated in my presentation comparing Flash to DRAM
based SSDs for ZIL accelerator applicability, the most dramatic write
IOPS degradation occurs in less than 10 minutes of sustained use.

For reference: http://www.ddrdrive.com/zil_accelerator.pdf

So for the tested devices (OCZ Vertex 2 EX / Vertex 2 Pro) to come
close to matching the vendor promised random write IOPS, one
would have to remove the log device from the pool and Secure Erase
after every ~10 minutes of sustained ZIL use.
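A minimal sketch of one such recondition cycle, written as a dry run that only builds the command lines and executes nothing. It assumes a pool version that supports removing log devices, and it assumes the Linux box uses the generic hdparm ATA Secure Erase sequence (set a temporary user password, then issue the erase) rather than a vendor tool; the pool name, device paths and password are hypothetical placeholders.

```python
import shlex
from typing import List


def secure_erase_plan(pool: str, slog_dev: str, linux_dev: str,
                      password: str = "tmp") -> List[List[str]]:
    """Build (but do not run) the commands for one slog recondition cycle.

    All arguments are hypothetical placeholders; nothing here touches a device.
    """
    return [
        # 1. On the ZFS host: detach the log device so ZFS stops using it.
        ["zpool", "remove", pool, slog_dev],
        # 2. On a Linux box: ATA Secure Erase requires a user password first.
        ["hdparm", "--user-master", "u", "--security-set-pass", password, linux_dev],
        # 3. Issue SECURITY ERASE UNIT; the drive marks all of its flash as clean.
        ["hdparm", "--user-master", "u", "--security-erase", password, linux_dev],
        # 4. Back on the ZFS host: re-add the drive as a log device.
        ["zpool", "add", pool, "log", slog_dev],
    ]


if __name__ == "__main__":
    for cmd in secure_erase_plan("tank", "c2t5d0", "/dev/sdb"):
        print(shlex.join(cmd))
```

Four steps, two of them on a different machine, repeated every ~10 minutes of sustained use, is the cost being weighed here.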

Would having to perform a Secure Erase every hour, day, or even
week really be the most cost-effective use of an administrator's time?

Best regards,

Christopher George
Founder/CTO
www.ddrdrive.com
Eric D. Mudama
2010-12-23 17:45:02 UTC
On Wed, Dec 22 at 23:29, Christopher George wrote:
>Would having to perform a Secure Erase every hour, day, or even
>week really be the most cost effective use of an administrators time?

You're assuming that the "into an empty device" performance is
required by their application.

For many users, the worst-case steady-state of the device (6k IOPS on
the Vertex2 EX, depending on workload, as per slide 48 in your
presentation) is so much faster than a rotating drive (50x faster,
assuming a rotating drive with its cache disabled manages roughly 100 IOPS
with queueing) that it'll still provide a huge performance boost when
used as a ZIL in their system.
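The arithmetic behind this, using only figures quoted in the thread: about 6k steady-state IOPS for the Vertex 2 EX, about 100 IOPS for a rotating drive with its cache disabled, and the 50K 4KB random-write vendor figure Christopher cites in his reply. On these two numbers the ratio works out to 60x, so the 50x above is, if anything, conservative. The function names are illustrative.

```python
def speedup(ssd_iops: float, disk_iops: float) -> float:
    """How many times more sync writes per second the SSD sustains."""
    return ssd_iops / disk_iops


def fraction_of_spec(steady_iops: float, spec_iops: float) -> float:
    """Steady-state IOPS as a fraction of the vendor's 'up to' figure."""
    return steady_iops / spec_iops


def ms_per_op(iops: float) -> float:
    """Average service time per operation, in milliseconds."""
    return 1000.0 / iops


if __name__ == "__main__":
    steady, disk, spec = 6_000, 100, 50_000  # figures quoted in the thread
    print(f"vs rotating disk: {speedup(steady, disk):.0f}x")
    print(f"fraction of spec: {fraction_of_spec(steady, spec):.0%}")
    print(f"per op: disk {ms_per_op(disk):.1f} ms, SSD {ms_per_op(steady):.2f} ms")
```

This prints 60x, 12%, and 10.0 ms versus 0.17 ms: both sides of the argument hold at once, since the degraded SSD is far below its spec yet far above a disk.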

For a huge ZFS box providing tens of ZFS filesystems in a pool all
with huge user loads, sure, a RAM based device makes sense, but it's
overkill for some large percentage of ZFS users, I imagine.

--
Eric D. Mudama
***@mail.bounceswoosh.org
Christopher George
2010-12-23 18:49:31 UTC
> You're assuming that the "into an empty device" performance is
> required by their application.

My assumption was stated in the paragraph prior, i.e. vendor-promised
random write IOPS. Based on the inquiries we receive, most *actually*
expect an OCZ SSD to perform as specified, which is 50K 4KB
random writes for both the Vertex 2 EX and the Vertex 2 Pro.

The point I was trying to make is that Secure Erase is not a viable solution
to the write IOPS degradation of the SSDs listed above, relative to their
published specifications.

I think we can all agree, if "Secure Erase" could magically solve the
problem it would already be implemented by the SSD controller.

> For many users, the worst-case steady-state of the device (6k IOPS on
> the Vertex2 EX, depending on workload, as per slide 48 in your
> presentation) is so much faster than a rotating drive (50x faster,
> assuming that cache disabled on a rotating drive is roughly 100
> IOPS with queueing), that it'll still provide a huge performance boost
> when used as a ZIL in their system.

I agree 100%. I never intended to insinuate otherwise :-)

Best regards,

Christopher George
Founder/CTO
www.ddrdrive.com
Eric D. Mudama
2010-12-23 21:12:49 UTC
On Thu, Dec 23 at 10:49, Christopher George wrote:
>My assumption was stated in the paragraph prior, i.e. vendor promised
>random write IOPS. Based on the inquires we receive, most *actually*
>expect an OCZ SSD to perform as specified which is 50K 4KB
>random writes for both the Vertex 2 EX and the Vertex 2 Pro.

Okay, I understand where you're coming from.

Yes, buyers must be aware of the test methodologies behind published
benchmark results, especially those the vendors themselves use to sell
drives. "Up to" is generally a poor thing on which to base a buying
decision.

--eric

--
Eric D. Mudama
***@mail.bounceswoosh.org
Fred Liu
2010-12-23 07:29:35 UTC
ACARD 9010 is good enough in this aspect, if you need extremely high iops...

Fred
Fred Liu
2010-12-23 07:30:25 UTC
ACARD 9010 is good enough in this aspect, if you DON'T need extremely high IOPS...

Sorry for the typo.

Fred
Garrett D'Amore
2010-12-23 15:21:48 UTC
We should get the reformatter(s) ported to illumos/solaris, if source is available. Something to consider.

- Garrett
Deano
2010-12-23 15:35:29 UTC
If anybody knows of any source for the secure erase/reformatter tools, I'll
happily volunteer to do the port and then maintain it.

I'm currently in talks with several SSD and driver chip hardware peeps
regarding getting datasheets for some SSD products, for the purpose of
better support under the OI/Solaris driver model, but these things can take a
while to obtain, so if anybody knows of existing open source versions I'll
jump on it.

Thanks,

Deano
Ray Van Dolson
2010-12-23 15:46:14 UTC
On Thu, Dec 23, 2010 at 07:35:29AM -0800, Deano wrote:
> If anybody does know of any source to the secure erase/reformatters,
> I’ll happily volunteer to do the port and then maintain it.
>
> I’m currently in talks with several SSD and driver chip hardware
> peeps with regard getting datasheets for some SSD products etc. for
> the purpose of better support under the OI/Solaris driver model but
> these things can take a while to obtain, so if anybody knows of
> existing open source versions I’ll jump on it.
>
> Thanks,
> Deano

A tool to help the end user know *when* they should run the reformatter
tool would be helpful too.

I know we can just wait until performance "degrades", but it would be
nice to see what % of blocks are in use, etc.
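A minimal sketch of the kind of check described here. The drive does not expose its internal free-block state, so this uses a proxy: bytes written since the last Secure Erase compared with the raw flash capacity (user capacity plus spare area), following Erik's observation in this thread that the drop-off comes once the host has written slightly more than that. Where the written-bytes counter comes from (for example a vendor-specific SMART attribute) is an assumption, and the class name, threshold and capacities are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SlogWearEstimate:
    """Hypothetical helper: estimate how much of an SSD's clean flash is used up."""
    raw_capacity_bytes: int          # user-visible capacity + spare area (assumed known)
    bytes_written_since_erase: int   # e.g. from a vendor SMART counter (assumption)

    def clean_fraction_used(self) -> float:
        """Share of never-written flash consumed since the last Secure Erase (max 1.0)."""
        return min(1.0, self.bytes_written_since_erase / self.raw_capacity_bytes)

    def due_for_recondition(self, threshold: float = 1.0) -> bool:
        """True once the estimate reaches the threshold (1.0 = every cell written once)."""
        return self.clean_fraction_used() >= threshold


if __name__ == "__main__":
    GiB = 2**30
    est = SlogWearEstimate(raw_capacity_bytes=40 * GiB, bytes_written_since_erase=30 * GiB)
    print(f"{est.clean_fraction_used():.0%} used, due: {est.due_for_recondition()}")
```

This is only a heuristic: TRIM or overwrites can give the controller back clean space that a simple byte counter cannot see.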

Ray
Deano
2010-12-23 15:57:53 UTC
In an ideal world, if we could obtain details on how to reset/format blocks of an SSD, we could do it automatically, running behind the ZIL. As a log, it's going in one direction, so a background task could clean up behind it, making the performance drop-off over time a non-issue for the ZIL. A first step may be calling UNMAP/TRIM on those blocks (which, I was surprised to find in the source, is already coded up in the SATA driver, just not used yet), but really a reset would be better.

But as you say, a tool to say if it needs doing would be a good start. They certainly exist in closed-source form...
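A toy model of this idea: treat the log device as a circular region and, as soon as the oldest log record is no longer needed (its transaction group has reached the main pool), issue a discard for exactly those blocks, so the drive always has clean space ahead of the writer. The fixed circular layout is a simplification of whatever the real ZIL allocator does, and the `trim` callback is a hypothetical stand-in for the unused TRIM/UNMAP code in the SATA driver, not its actual interface.

```python
from collections import deque
from typing import Callable, Deque, Tuple


class TrailingTrimLog:
    """Toy circular log whose cleaner discards blocks right behind the tail."""

    def __init__(self, nblocks: int, trim: Callable[[int, int], None]):
        self.nblocks = nblocks
        self.head = 0                                 # next block to write
        self.live: Deque[Tuple[int, int]] = deque()   # (start, length), oldest first
        self.trim = trim                              # hypothetical discard(start, count)

    def append(self, length: int) -> int:
        """Write a record of `length` blocks at the head; return its start block."""
        in_use = sum(n for _, n in self.live)
        if length <= 0 or in_use + length > self.nblocks:
            raise ValueError("record does not fit in the free part of the log")
        start = self.head
        self.live.append((start, length))
        self.head = (start + length) % self.nblocks
        return start

    def retire_oldest(self) -> None:
        """Oldest record is no longer needed: discard its blocks immediately."""
        start, length = self.live.popleft()
        first = min(length, self.nblocks - start)     # part before the wrap point
        self.trim(start, first)
        if length > first:                            # part that wrapped to block 0
            self.trim(0, length - first)


if __name__ == "__main__":
    log = TrailingTrimLog(10, trim=lambda s, n: print(f"TRIM blocks {s}..{s + n - 1}"))
    log.append(4)
    log.append(4)
    log.retire_oldest()   # discards 0..3
    log.append(4)         # wraps: occupies 8, 9, 0, 1
    log.retire_oldest()   # discards 4..7
    log.retire_oldest()   # discards 8..9 and 0..1
```

Because records retire in write order, the live records always form one contiguous run from tail to head, so a simple "used blocks" count is enough to know whether a new record fits.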

Deano
Erik Trimble
2010-12-23 17:14:12 UTC
AFAIK, all the reformatter utilities are closed-source, direct from the
SSD manufacturer. They talk directly to the drive firmware, so they're
decidedly implementation-specific (I'd be flabbergasted if one worked
on two different manufacturers' SSDs, even if they used the same basic
controller). Many are DOS-based.

As Christopher noted, you'll get a drop-off in performance as soon as
you collect enough sync writes to have written (in the aggregate)
slightly more than the total capacity of the SSD (including the "extra"
that most SSDs now have).
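A back-of-the-envelope version of this point: at a sustained write rate, the time until the host has written the whole raw capacity is capacity divided by throughput. The 100 GB raw capacity below is a hypothetical round number, and 50K 4KB writes/s is the vendor "up to" figure quoted earlier in the thread, so this is a best-case bound; real ZIL writes vary in size and rate.

```python
def seconds_until_dropoff(raw_capacity_bytes: float, iops: float,
                          io_size_bytes: float) -> float:
    """Seconds of sustained writes until the host has written the full raw capacity."""
    bytes_per_second = iops * io_size_bytes
    return raw_capacity_bytes / bytes_per_second


if __name__ == "__main__":
    secs = seconds_until_dropoff(100e9, 50_000, 4096)  # hypothetical 100 GB raw flash
    print(f"{secs:.0f} s, about {secs / 60:.1f} minutes")
```

That comes to roughly 8 minutes, the same order of magnitude as the "less than 10 minutes of sustained use" Christopher reported.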

That said, I would expect full TRIM support to possibly make this
better, as it could free up partially-used pages more frequently, and
thus increase the time before performance drops (which is due to the
page remapping/reshuffling demands on the SSD controller).

But, yes, SSDs are inherently slower than DRAM. Their utility is
entirely dependent on what your use case (and performance demands) are.
The longer-term solution is to have SSDs change how they are designed,
moving away from the current one-page-of-multiple-blocks as the atomic
entity of writing, and straight to a one-block-per-page setup. Don't
hold your breath.
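One way to see why the erase granularity matters. In the usual NAND terminology, data is programmed a page at a time but can only be erased in much larger erase blocks of many pages, so before the controller can reuse a block it must copy that block's still-valid pages elsewhere. The sketch below uses a simplified greedy garbage-collection model (an assumption, not any vendor's actual algorithm); the page counts are illustrative.

```python
def gc_write_amplification(pages_per_erase_block: int, valid_pages: int) -> float:
    """Flash pages written per host page when reclaiming one erase block.

    Reclaiming the block means copying its `valid_pages` elsewhere, erasing it,
    and then filling the freed pages with new host data.
    """
    if not 0 <= valid_pages < pages_per_erase_block:
        raise ValueError("the victim block needs at least one invalid page")
    freed = pages_per_erase_block - valid_pages   # room for new host data
    flash_writes = freed + valid_pages            # host data + relocated pages
    return flash_writes / freed


if __name__ == "__main__":
    print(gc_write_amplification(128, 96))  # mostly-valid block: 4.0
    print(gc_write_amplification(128, 0))   # fully stale (overwritten or TRIMmed): 1.0
    print(gc_write_amplification(1, 0))     # erase unit == write unit: always 1.0
```

TRIM helps by turning valid pages into stale ones the controller no longer has to copy; an erase unit the size of a single write unit would make the ratio always 1.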



--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
Eric D. Mudama
2010-12-23 17:47:01 UTC
On Thu, Dec 23 at 9:14, Erik Trimble wrote:
>The longer-term solution is to have SSDs change how they are
>designed, moving away from the current one-page-of-multiple-blocks as
>the atomic entity of writing, and straight to a one-block-per-page
>setup. Don't hold your breath.

Will never happen using NAND technology.

Non-NAND SSDs may or may not have similar or related limitations.

--
Eric D. Mudama
***@mail.bounceswoosh.org
Christopher George
2010-12-23 16:46:15 UTC
> However, this *can* be overcome by frequently re-formatting the SSD (not
> the Solaris format, a low-level format using a vendor-supplied utility).

For those looking to "Secure Erase" an OCZ SandForce-based SSD to reclaim
performance, the following OCZ Forum thread might be of interest:

http://www.ocztechnologyforum.com/forum/showthread.php?75773-Secure-Erase-TRIM-and-anything-else-Sandforce

OCZ uses the term "DuraClass" as a catch-all for algorithms controlling wear
leveling, drive longevity... There is a direct correlation between Secure Erase
frequency and expected SSD lifetime.

Thread #1 detailing a recommended frequency of Secure Erase use:

"3) Secure erase a drive every 6 months to free up previously read only
blocks, secure erase every 2 days to get round Duraclass and you will kill the
drive very quickly"

Thread #5 explaining DuraClass and relationship to TRIM:

"Duraclass is limiting the speed of the drive NOT TRIM. TRIM is used along
with wear levelling."

Thread #6 provides more details of DuraClass and TRIM:

"Now Duraclass monitors all writes and control's encryption and compression,
this is what effects the speed of the blocks being written to..NOT the fact they
have been TRIM'd or not TRIM'd."

"You guys have become fixated at TRIM not speeding up the drive and forget
that Duraclass controls all writes incurred by the drive once a GC map has
been written."

Above excerpts written by an OCZ-employed thread moderator (Tony).

Best regards,

Christopher George
Founder/CTO
www.ddrdrive.com
Deano
2010-12-23 17:11:51 UTC
Secure Erase is currently an entire-drive function: it writes all the cells,
resetting them. It also updates the firmware GC maps so it knows the drive is
clean. TRIM just gives the firmware more information, namely that a block is
unused (as normally a delete just updates an index table, and the firmware has
no way of knowing which cells are no longer needed by the OS).

Currently firmware is meant to help conventional file system usage. However,
the ZIL isn't normal usage, and as such *IF*, and it's a big if, we can
effectively bypass the firmware trying to be clever, or at least help it be
clever, then we can avoid the degradation over time. In particular, if we could
secure erase a few cells at once as required, the lifetime would be much
longer. I'd even argue that taking wear leveling out of the drive's hands
would be useful in the ZIL case. The other thing is that the slowdown only
occurs once the SSD fills and has to start getting clever about where to put
things and which cells to change; for a ZIL, that is again something we could
avoid in software fairly easily.

It's also worth putting this in perspective: a complete secure erase every
night to restore performance to your ZIL would still let the SSD last for
*years*. And given how cheap some SSDs are, it is probably still cheaper to
effectively burn the ZIL out and just replace it once a year. Maybe not a
classic level of RAID, but the very essence of the idea: lots of cheap can be
better than expensive if you know what you are doing.
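The "years" claim can be checked with a rough endurance model: divide the rated program/erase cycles by the number of full-drive cycles used per day. The sketch assumes a nightly Secure Erase costs one full P/E cycle, and treats the rated cycle count, the ZIL's daily write volume (in full-drive writes) and the write-amplification factor as inputs you would have to supply; the numbers in the example are hypothetical.

```python
def years_of_life(pe_cycles: float, zil_drive_writes_per_day: float,
                  erases_per_day: float = 1.0,
                  write_amplification: float = 1.0) -> float:
    """Rough flash lifetime in years under a simple full-drive-cycle model."""
    cycles_per_day = zil_drive_writes_per_day * write_amplification + erases_per_day
    return pe_cycles / cycles_per_day / 365


if __name__ == "__main__":
    # Hypothetical: 5,000 rated P/E cycles, ZIL writes one drive's worth per day,
    # plus one Secure Erase each night.
    print(f"{years_of_life(5_000, 1.0):.1f} years")
```

With these inputs it prints 6.8 years; the result is dominated by the ZIL's own write volume, so a busier log shortens it far more than the nightly erase does.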

Bye,
Deano
Eric D. Mudama
2010-12-23 17:51:10 UTC
On Thu, Dec 23 at 17:11, Deano wrote:
>Currently firmware is meant to help conventional file system usage. However
>ZIL isn't normal usage and as such *IF* and it's a big if, we can
>effectively bypass the firmware trying to be clever or at least help it be
>clever then we can avoid the downgrade over time. In particular if we could
>secure erase a few cells as once as required, the lifetime would be much
>longer, I'd even argue that taking the wear leveling off the drives hand
>would be useful in the ZIL case.

In most cases, an SSD knows something isn't valuable when it is
overwritten. If the allocator for the ZIL would rewrite to sectors
no-longer-needed, instead of walking sequentially across the entire
available LBA space, slowdown of a ZIL would likely never occur on a
NAND SSD, since the drive would always have a good idea which sectors
were free and which were still in use.
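A toy comparison of the two allocation strategies described here. Each write fills one LBA and becomes unneeded once a fixed number of newer records exist (a stand-in for a ZIL record becoming obsolete when its transaction group commits). A sequential allocator keeps touching fresh LBAs across the whole device, while an allocator that reuses freed LBAs keeps overwriting a small working set, so the drive learns which data is stale without any TRIM. The model and its parameters are illustrative assumptions, not how the ZIL allocator actually works.

```python
from collections import deque
from typing import Deque, List, Set


def distinct_lbas_touched(n_writes: int, lba_space: int, live_window: int,
                          reuse_freed: bool) -> int:
    """Count how many different LBAs a toy log allocator ever writes.

    Assumes lba_space > live_window.
    """
    touched: Set[int] = set()
    free: List[int] = []            # LBAs whose records are no longer needed
    live: Deque[int] = deque()      # LBAs holding still-needed records, oldest first
    next_lba = 0
    for _ in range(n_writes):
        if reuse_freed and free:
            lba = free.pop()        # overwrite a stale LBA: the SSD sees an overwrite
        else:
            lba = next_lba          # walk sequentially across the LBA space
            next_lba = (next_lba + 1) % lba_space
        touched.add(lba)
        live.append(lba)
        if len(live) > live_window:  # oldest record is now obsolete
            released = live.popleft()
            if reuse_freed:
                free.append(released)
    return len(touched)


if __name__ == "__main__":
    for reuse in (False, True):
        n = distinct_lbas_touched(100_000, 1_000_000, live_window=64, reuse_freed=reuse)
        print(f"reuse_freed={reuse}: {n} distinct LBAs")
```

The sequential allocator touches 100,000 distinct LBAs, while the reusing one settles on just 65 (the live window plus one), so from the drive's point of view almost all of its flash stays free.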

--
Eric D. Mudama
***@mail.bounceswoosh.org
Pasi Kärkkäinen
2010-12-22 21:55:11 UTC
On Wed, Dec 22, 2010 at 01:43:35PM +0000, Jabbar wrote:
> Hello,
>
> I was thinking of buying a couple of SSD's until I found out that Trim is
> only supported with SATA drives.
>

Yes, because TRIM is an ATA command; SATA means Serial ATA.
SCSI (SAS) drives have the equivalent "UNMAP" command, or "WRITE SAME" with the UNMAP bit set.

-- Pasi
Krunal Desai
2010-12-22 14:55:56 UTC
Permalink
> As of yet, I have only found 3.5" models with the Sandforce 1200, which was
> not recommended on this list.

I actually bought an SF-1200-based OCZ Agility 2 (60G) for use as a
ZIL/L2ARC (haven't installed it yet, however; I definitely jumped the gun
on this purchase...) based on some recommendations from fellow users.
Why are these not recommended? Is it performance-related, or more
"workload will degrade and kill this thing in no time" related?

--khd
David Magda
2010-12-22 16:20:44 UTC
Permalink
On Dec 22, 2010, at 09:55, Krunal Desai wrote:

> I actually bought a SF-1200 based OCZ Agility 2 (60G) for use as a
> ZIL/L2ARC (haven't installed it yet however, definitely jumped the gun
> on this purchase...) based on some recommendations from fellow users.
> Why are these not recommended? Is it performance related, or more
> "workload will degrade and kill this thing in no time" relate?


There are two main reasons why SF-1500-based devices are generally recommended:

First, they usually come with a supercap or other
battery system that helps preserve the buffers if the power goes out.

Second, they generally respect 'flush cache' commands. When a SYNC
command is sent to many other disks/SSDs, they answer back "yes, the
data is on stable storage" when in fact it is not. Lying to ZFS about
what's on stable storage causes problems when the power goes out.
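What a synchronous write asks of the storage stack can be sketched in Python: the caller treats a record as committed only after os.fsync() returns. On ZFS, a request like this is what gets committed to the ZIL. The point above is that a device which acknowledges the flush before the data is really on stable storage makes fsync() return a promise the hardware cannot keep. The function name and record format are hypothetical.

```python
import os

def append_sync(path, record: bytes) -> None:
    """Append one record and return only after the OS reports it durable."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        view = memoryview(record)
        while view:                     # os.write may write fewer bytes than asked
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)                    # the "flush cache" guarantee is requested here
    finally:
        os.close(fd)
```

Nothing in this code can detect a drive that lies about flushing; fsync() can only report what the device acknowledged.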

This applies to slog devices. These shortcomings don't matter (as much?)
for cache / L2ARC devices, since they're mostly read-only. However, for
mostly-write I/O they can cause problems when it comes to pool recovery.

Some recent threads on the subject:

http://mail.opensolaris.org/pipermail/zfs-discuss/2010-May/thread.html#41326
http://mail.opensolaris.org/pipermail/zfs-discuss/2010-May/thread.html#41588
http://mail.opensolaris.org/pipermail/zfs-discuss/2010-June/thread.html#42298

SandForce has recently announced the 2000-series chip set, of which
the SF-2500 and SF-2600 are labelled as "enterprise":

http://www.sandforce.com/index.php?id=21

Note that a slog / ZIL device doesn't have to be very big (at
most half of physical RAM), so if your system has 16 GB of memory,
your ZIL will be at most 8 GB:

http://tinyurl.com/34ac5vv
http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#Separate_Log_Devices

Which is why you may want to purchase less storage but go with a
"better" SSD. For L2ARC, bigger can be construed to be better.
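The sizing rule of thumb above, written out as arithmetic. It encodes only what this message and the linked guide state (a slog needs at most half of physical RAM), not any ZFS tunable, and the function names are hypothetical. The 60 GB figure below is simply the Agility 2 capacity mentioned earlier in the thread.

```python
def max_slog_size_gb(physical_ram_gb: float) -> float:
    """Upper bound on useful slog size per the rule of thumb: half of RAM."""
    if physical_ram_gb <= 0:
        raise ValueError("physical RAM must be positive")
    return physical_ram_gb / 2

def unused_capacity_gb(device_gb: float, physical_ram_gb: float) -> float:
    """Capacity of a dedicated slog device that the log would never use."""
    return max(0.0, device_gb - max_slog_size_gb(physical_ram_gb))
```

With 16 GB of RAM, max_slog_size_gb(16) gives 8.0, and unused_capacity_gb(60, 16) gives 52.0, which is the argument for buying a smaller but "better" device.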
Christopher George
2010-12-22 17:40:53 UTC
Permalink
> I actually bought a SF-1200 based OCZ Agility 2 (60G)...
> Why are these not recommended?

The OCZ Agility 2 or any SF-1200-based SSD is an excellent choice for
the L2ARC, as its on-board volatile memory does *not* need power protection:
the L2ARC contents are not required to survive a host power loss
(at this time). Also, checksum fallback to the pool provides data redundancy.
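The checksum fallback can be illustrated with a toy read path: a cached copy is used only if its checksum matches the one recorded for the block; otherwise the block is read from the pool. Here zlib.crc32 stands in for ZFS's block checksums, and the dictionaries standing in for the L2ARC and the pool are hypothetical.

```python
import zlib

def read_block(block_id, expected_crc, l2arc, pool):
    """Return (data, source). The cache copy is trusted only if it verifies."""
    cached = l2arc.get(block_id)
    if cached is not None and zlib.crc32(cached) == expected_crc:
        return cached, "l2arc"
    l2arc.pop(block_id, None)          # discard a missing or corrupt cache entry
    # Fall back to the pool, which holds the authoritative (redundant) copy.
    return pool[block_id], "pool"
```

Losing or corrupting a cache entry therefore costs only a slower read, which is why the L2ARC's volatile memory can go unprotected.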

The ZIL accelerator's requirements differ from the L2ARC's, as its very
purpose is to guarantee that *all* data written to the log can be replayed
(on next reboot) in case of host failure.
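A toy model of that replay guarantee: each log record carries a checksum, and on the next boot records are replayed in order until the first one that fails to verify, which marks where writing stopped. This mirrors the general idea of a checksummed intent log, not the ZIL's actual on-disk format; the record layout (4-byte length, 4-byte CRC32, payload) is an assumption.

```python
import struct
import zlib

HEADER = struct.Struct("<II")   # payload length, CRC32 of payload (assumed layout)

def encode(payload: bytes) -> bytes:
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload

def replay(log: bytes) -> list:
    """Return payloads of all intact records, stopping at the first bad one."""
    records, offset = [], 0
    while offset + HEADER.size <= len(log):
        length, crc = HEADER.unpack_from(log, offset)
        start = offset + HEADER.size
        payload = log[start:start + length]
        if len(payload) != length or zlib.crc32(payload) != crc:
            break                         # torn or never-written record: end of log
        records.append(payload)
        offset = start + length
    return records
```

The model also shows why a device that drops acknowledged records from volatile cache breaks the guarantee: those records are simply not there to replay.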

Best regards,

Christopher George
Founder/CTO
www.ddrdrive.com
Krunal Desai
2010-12-22 17:51:32 UTC
Permalink
> The ZIL accelerator's requirements differ from the L2ARC, as it's very
> purpose is to guarantee *all* data written to the log can be replayed
> (on next reboot) in case of host failure.

Ah, so this is why, say, a super-capacitor-backed SSD can be very
helpful, as it will have some backup power present. Luckily, my use
case is not a high-availability server, but a NAS in my basement. I've
got it attached to a UPS with very conservative shut-down timing. Or
are there other host failures aside from power that a ZIL would be
vulnerable to (system hard-locks?)?
Christopher George
2010-12-22 18:31:22 UTC
Permalink
> got it attached to a UPS with very conservative shut-down timing. Or
> are there other host failures aside from power a ZIL would be
> vulnerable too (system hard-locks?)?

Correct, a system hard-lock is another example...

Best regards,

Christopher George
Founder/CTO
www.ddrdrive.com
Bill Werner
2010-12-23 04:57:23 UTC
Permalink
> > got it attached to a UPS with very conservative shut-down timing. Or
> > are there other host failures aside from power a ZIL would be
> > vulnerable too (system hard-locks?)?
>
> Correct, a system hard-lock is another example...

How about comparing a non-battery-backed ZIL device to running a ZFS dataset with sync=disabled? Which is more risky?

This has been an educational thread for me... I was not aware that SSDs had some DRAM in front of the flash; is that right?
Richard Elling
2010-12-24 20:34:33 UTC
Permalink
On Dec 22, 2010, at 8:57 PM, Bill Werner <***@cubbyhole.com> wrote:

>>> got it attached to a UPS with very conservative shut-down timing. Or
>>> are there other host failures aside from power a ZIL would be
>>> vulnerable too (system hard-locks?)?
>>
>> Correct, a system hard-lock is another example...
>
> How about comparing a non-battery backed ZIL to running a ZFS dataset with sync=disabled. Which is more risky?

Disabling the ZIL is always more risky. But more importantly, disabling
the ZIL is a policy decision. If the user is happy with that policy, then
they should be happy with the consequence.
-- richard

Christopher George
2010-12-23 06:04:51 UTC
Permalink
> How about comparing a non-battery backed ZIL to running a
> ZFS dataset with sync=disabled. Which is more risky?

Most likely, the 3.5" SSD's on-board volatile (not power protected)
memory would be small relative to the transaction group (txg) size
and thus less "risky" than sync=disabled.
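This comparison can be made concrete as a back-of-the-envelope bound on data lost in a crash. With sync=disabled, synchronous writes are acknowledged immediately and only reach stable storage when their transaction group commits, so the exposure is roughly the writes accepted during one txg interval. With an SSD whose cache is not power protected, this simplified model bounds the loss by that cache. The write rate, txg interval and cache size used below are illustrative assumptions, not measured or default values.

```python
def worst_case_loss_mb(write_rate_mb_s, txg_interval_s, volatile_cache_mb):
    """Return (sync_disabled_loss, unprotected_ssd_loss) in MB (rough model)."""
    sync_disabled = write_rate_mb_s * txg_interval_s  # writes since last txg commit
    unprotected_ssd = volatile_cache_mb               # bounded by the on-board cache
    return sync_disabled, unprotected_ssd
```

For example, worst_case_loss_mb(100, 5, 64) returns (500, 64): under these assumed numbers, sync=disabled exposes several times more acknowledged data than the SSD's small cache.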

Best regards,

Christopher George
Founder/CTO
www.ddrdrive.com
Erik Trimble
2010-12-23 06:36:49 UTC
Permalink
On 12/22/2010 10:04 PM, Christopher George wrote:
>> How about comparing a non-battery backed ZIL to running a
>> ZFS dataset with sync=disabled. Which is more risky?
> Most likely, the 3.5" SSD's on-board volatile (not power protected)
> memory would be small relative to the transaction group (txg) size
> and thus less "risky" than sync=disabled.
>
> Best regards,
>
> Christopher George
> Founder/CTO
> www.ddrdrive.com


To the OP: First off, what do you mean by "sync=disabled"??? There is
no such parameter for a mount option or attribute for ZFS, nor is there
for exporting anything in NFS, nor for client-side NFS mounts.

If you meant "disable the ZIL", well, DON'T.

http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Disabling_the_ZIL_.28Don.27t.29

Moreover, disabling the ZIL on a per-dataset basis is not possible.



As noted in the ETG (Evil Tuning Guide), disabling the ZIL can cause
NFS client-side corruption. If you absolutely must turn it off, however,
you will get more reliable transactions than with a non-SuperCap'd SSD,
by virtue of the fact that any sync write on such a fileserver will not
return as complete until the data has reached backing store. That, in
most cases, will tank (no pun intended) your synchronous performance.
About the only case where it won't cripple performance is when your
backing store uses some sort of NVRAM to buffer writes to the disks (as
most large array controllers do, but make sure that cache is battery
backed). But even there, it can be relatively simple to overwhelm the
very limited cache on such a controller, in which case your performance
tanks again.
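How quickly such a controller cache is overwhelmed is simple arithmetic: once writes arrive faster than the disks drain them, the cache fills at the difference of the two rates. The sizes and rates below are illustrative assumptions, and the model ignores bursts and write coalescing.

```python
def seconds_until_cache_full(cache_mb, incoming_mb_s, drain_mb_s):
    """Time for a write-back cache to fill when writes outpace the disks
    behind it; None if the disks keep up and the cache never fills."""
    net = incoming_mb_s - drain_mb_s
    if net <= 0:
        return None
    return cache_mb / net
```

A 1000 MB cache taking 300 MB/s of writes in front of disks that absorb 200 MB/s lasts 10 seconds; after that, writes proceed at disk speed again.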

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
Christopher George
2010-12-23 07:47:44 UTC
Permalink
> To the OP: First off, what do you mean by "sync=disabled"???

I believe he is referring to ZIL synchronicity (PSARC/2010/108).

http://arc.opensolaris.org/caselog/PSARC/2010/108/20100401_neil.perrin

The following presentation by Robert Milkowski does an excellent job of
placing it in a larger context:

http://www.oug.org/files/presentations/zfszilsynchronicity.pdf
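As I understand that case, it adds a per-dataset sync property with the values standard, always and disabled (set with, e.g., `zfs set sync=disabled pool/dataset`, where pool/dataset is a placeholder). A minimal sketch of those semantics, reduced to one question: must a given write be committed to the ZIL before it is acknowledged? The function and its arguments are hypothetical.

```python
VALID = {"standard", "always", "disabled"}

def commits_to_zil(sync_property: str, caller_requested_sync: bool) -> bool:
    """Does this write wait for a ZIL commit before returning?"""
    if sync_property not in VALID:
        raise ValueError(f"unknown sync value: {sync_property!r}")
    if sync_property == "always":
        return True                      # every write is treated as synchronous
    if sync_property == "disabled":
        return False                     # sync requests ignored; data waits for the txg
    return caller_requested_sync         # standard: honour O_DSYNC, fsync() and similar
```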

Best regards,

Christopher George
Founder/CTO
www.ddrdrive.com
Khushil Dep
2010-12-25 18:19:50 UTC
Permalink
"Friends don't let friends disable the ZIL" - right Richard? :-)
On 24 Dec 2010 20:34, "Richard Elling" <***@gmail.com> wrote:
Richard Elling
2010-12-26 06:22:57 UTC
Permalink
On Dec 25, 2010, at 10:19 AM, Khushil Dep wrote:
> "Friends don't let friends disable the ZIL" - right Richard? :-)
>
>

Or, if you care about your data enough to bother with RAID, don't
disable the ZIL :-)
-- richard
Saxon, Will
2010-12-28 00:16:43 UTC
Permalink
> -----Original Message-----
> From: zfs-discuss-***@opensolaris.org [mailto:zfs-discuss-
> ***@opensolaris.org] On Behalf Of Stephan Budach
> Sent: Wednesday, December 22, 2010 5:37 AM
> To: zfs-***@opensolaris.org
> Subject: [zfs-discuss] Looking for 3.5" SSD for ZIL
>
> Hello all,
>
> I am shopping around for 3.5" SSDs that I can mount into my storage and use
> as ZIL drives.
> As of yet, I have only found 3.5" models with the Sandforce 1200, which was
> not recommended on this list.
> Does anyone maybe know of a model that has the Sandforce 1500 and is
> 3.5"? Or any other 3.5" SSD that he/she can recommend?

I have not personally used one, but I have received recommendations for the STEC ZeusRAM. It is a 3.5" SAS RAM-based, flash-backed SSD. I was quoted a shade under $3K USD for an 8GB unit. My understanding is that these have to be ordered through an integrator and there is a significant lead time.

There are 2.5"-to-3.5" adapter shells that you should be able to use for any/all of the 2.5" SSDs on the market.

-Will
Jordan McQuown
2010-12-28 03:32:10 UTC
Permalink
Does anyone have a contact from whom I could purchase an STEC SSD?

Thank you,

Jordan