Discussion:
Feature proposal: differential pools
Andrew
2006-07-26 06:02:14 UTC
Since ZFS is COW, can I have a read-only pool (on a central file server, or on a DVD, etc) with a separate block-differential pool on my local hard disk to store writes?
This way, the pool in use can be read-write, even if the main pool itself is read-only, without having to make a full local copy of that read-only pool in order to be able to write to it, and without having to use messy filesystem-level union filesystem features.

This would also be useful for live-system bootable DVDs. The writable block-differential pool could be kept just in system memory, giving a fully functional but non-persistent read-write pool without touching the system's hard disk; or it could be stored on a small flash thumbdrive that the user carries along with the DVD, giving a persistent read-write pool, again without touching the system's hard disk.

For yet another feature, this ability to copy newly written blocks to a separate differential pool could be used even if those new blocks are still written back to the main pool as usual; in this case, the differential pool would serve as a real-time differential backup. For example, I could make a full backup of my laptop's hard disk onto DVDs, and then while in use have a thumbdrive plugged into the laptop. All updates to the hard disk would be copied to the thumbdrive, and when the thumbdrive fills up, it can be copied to a DVD and then erased. If the laptop's hard disk dies, I can reconstruct the system's disk state right up to the moment that it died by restoring all the DVDs to a new hard disk and then restoring the current contents of the thumbdrive. This would effectively provide the redundancy benefits of a full mirror of the laptop's hard disk, but without having to lug along an entire full-size second hard disk, since I only have to carry a thumbdrive big enough to hold the amount of differential data I expect to generate.
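To make the restore order concrete, here is a minimal Python sketch. It models each backup layer (the full DVD set, each DVD burned from a full thumbdrive, and whatever is still on the thumbdrive) as a hypothetical mapping from block number to contents. Nothing here is a ZFS interface; it only shows that replaying the layers oldest-first yields the disk state at the moment of failure.

```python
from typing import Dict, List

# Hypothetical model: a disk is a mapping of block number -> contents, and each
# backup layer holds only the blocks written during its period.
Blocks = Dict[int, bytes]


def restore(layers: List[Blocks]) -> Blocks:
    """Rebuild the disk by replaying layers oldest-first.

    Later layers overwrite earlier ones, so the result holds the most
    recent write of every block.
    """
    disk: Blocks = {}
    for layer in layers:
        disk.update(layer)
    return disk


full_dvd = {0: b"boot", 1: b"etc-v1", 2: b"home-v1"}  # initial full backup
dvd_increment = {1: b"etc-v2"}                         # thumbdrive filled up, burned to DVD
thumbdrive_now = {2: b"home-v2", 3: b"new-file"}       # still on the thumbdrive at failure

recovered = restore([full_dvd, dvd_increment, thumbdrive_now])
print(recovered)  # {0: b'boot', 1: b'etc-v2', 2: b'home-v2', 3: b'new-file'}
```

The toy ignores freed blocks; it only illustrates why the DVDs must be replayed in order before the thumbdrive contents.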

Finally, with the upcoming hard disk/flash disk combo drives in laptops, using the flash disk as the differential pool for the main hard disk pool (instead of writing the differential data immediately back to the main pool) would allow persistent writes without spinning up the sleeping hard disk; the differential pool could be flushed to the main pool later, when the hard disk has to spin up anyway to service a read. (This feature is independent of using an external thumbdrive to mirror differential data, and both features could be used at the same time.)
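A toy model of the combo-drive policy, assuming a hypothetical `HybridDrive` object with `write`/`read` methods: writes land in flash while the disk sleeps, and the backlog is flushed only when a read of a block not held in flash forces a spin-up. This illustrates the proposed behaviour; it is not an existing driver.

```python
from typing import Dict, Optional


class HybridDrive:
    """Hypothetical model: flash absorbs writes while the disk sleeps."""

    def __init__(self) -> None:
        self.disk: Dict[int, bytes] = {}
        self.flash: Dict[int, bytes] = {}  # pending writes, not yet on disk
        self.spinning = False
        self.spin_ups = 0

    def write(self, block: int, data: bytes) -> None:
        if self.spinning:
            self.disk[block] = data
        else:
            self.flash[block] = data  # persistent without waking the disk

    def read(self, block: int) -> Optional[bytes]:
        if block in self.flash:  # newest copy may still be in flash
            return self.flash[block]
        if not self.spinning:
            self._spin_up()
        return self.disk.get(block)

    def _spin_up(self) -> None:
        self.spinning = True
        self.spin_ups += 1
        self.disk.update(self.flash)  # piggy-back the flush on this spin-up
        self.flash.clear()

    def sleep(self) -> None:
        self.spinning = False


drive = HybridDrive()
drive.write(1, b"a")
drive.write(2, b"b")
print(drive.read(1), drive.spin_ups)  # b'a' 0  (served from flash, disk still asleep)
print(drive.read(7), drive.spin_ups)  # None 1  (miss forces a spin-up and a flush)
```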

All of these features would be enabled by allowing pool writes to be redirected to another destination (the differential pool) separate from the pool itself, and keeping track of the txg number at which the redirection began so that pool read requests will be sent to the right place.
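The redirection rule can be sketched in a few lines. Each ZFS block pointer records the transaction group (txg) in which its block was written, so one plausible reading of the proposal is: blocks born at or after the txg where redirection began live in the differential pool, and older blocks are read from the read-only main pool. `DifferentialPool` and `BlockPtr` below are hypothetical stand-ins, not ZFS structures.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class BlockPtr:
    """Hypothetical stand-in for a block pointer: location plus birth txg."""
    offset: int
    birth_txg: int


class DifferentialPool:
    """Hypothetical overlay: read-only base pool plus writable differential pool."""

    def __init__(self, base: Dict[int, bytes], redirect_txg: int) -> None:
        self.base = base                   # DVD / file server; never modified
        self.diff: Dict[int, bytes] = {}   # RAM, thumbdrive or local disk
        self.redirect_txg = redirect_txg   # txg at which redirection began
        self.txg = redirect_txg
        self.next_offset = 0

    def write(self, data: bytes) -> BlockPtr:
        # Copy-on-write: always allocate a fresh block in the differential pool.
        bp = BlockPtr(self.next_offset, self.txg)
        self.diff[bp.offset] = data
        self.next_offset += 1
        return bp

    def sync(self) -> None:
        self.txg += 1  # close the current transaction group

    def read(self, bp: BlockPtr) -> bytes:
        # The birth txg alone decides which pool holds the block.
        if bp.birth_txg >= self.redirect_txg:
            return self.diff[bp.offset]
        return self.base[bp.offset]


pool = DifferentialPool({0: b"old0", 1: b"old1"}, redirect_txg=100)
print(pool.read(BlockPtr(1, 42)))  # b'old1' (born before redirection: main pool)
print(pool.read(pool.write(b"new")))  # b'new' (born at txg 100: differential pool)
```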


This message posted from opensolaris.org
Matthew Ahrens
2006-07-27 18:38:02 UTC
Post by Andrew
Since ZFS is COW, can I have a read-only pool (on a central file
server, or on a DVD, etc) with a separate block-differential pool on
my local hard disk to store writes?
This way, the pool in use can be read-write, even if the main pool
itself is read-only, without having to make a full local copy of that
read-only pool in order to be able to write to it, and without having
to use messy filesystem-level union filesystem features.
These are some interesting use cases. I'll have to ponder how they
could best be implemented in ZFS.

Some of these cases can be solved simply by having a read-only device in
the pool (e.g., live-boot DVDs). However, cases where you want to
transfer the data between devices are somewhat nontrivial (e.g., hard
drive + flash memory), at least until we have 4852783 "reduce pool capacity".

I've filed RFE 6453741 "want mostly-read-only devices" to remember this
request.

--matt
Henk Langeveld
2006-07-27 23:11:16 UTC
Post by Matthew Ahrens
Post by Andrew
Since ZFS is COW, can I have a read-only pool (on a central file
server, or on a DVD, etc) with a separate block-differential pool on
my local hard disk to store writes?
This way, the pool in use can be read-write, even if the main pool
itself is read-only, without having to make a full local copy of that
read-only pool in order to be able to write to it, and without having
to use messy filesystem-level union filesystem features.
These are some interesting use cases. I'll have to ponder how they
could be best implemented in ZFS.
Some of these cases can be solved simply by having a read-only device in
the pool (eg. live-boot DVDs). However, cases where you want to
transfer the data between devices are somewhat nontrivial (eg. hard
drive + flash memory), at least until we have 4852783 "reduce pool capacity".
I've filed RFE 6453741 "want mostly-read-only devices" to remember this
request.
I can see where this will eventually lead...

I've seen several scenarios where 4852783 becomes essential. How would you implement
such a thing? First you mark the vdev (am I saying that correctly?) as "evicting".
Then you start a full scrub/rewrite of the whole pool, with any "evicting"
components unavailable for writing, so the data is forced to be copied elsewhere.

Once the scrub finishes, you change the state of the device to "evicted" and
remove it from the pool.

The odd thing is that with a read-only or read-mostly device, you cannot put these
marks on the original disk in the first place. This is slightly contrary to
the concept of zfs storing all of its configuration and metadata on disk.

To me this implies that before you can actually start scrubbing the pool, you
first have to copy ALL metadata from the evicted device to the other devices in
the pool.
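Henk's eviction sequence can be written down as a small state machine. The sketch below is hypothetical throughout (the `Pool` class, the `VdevState` values and the meta/data split are illustrative, not ZFS code): it keeps the "evicting"/"evicted" marks off the read-only device, copies all metadata before the data rewrite pass, and only then drops the device from the pool.

```python
from enum import Enum
from itertools import cycle
from typing import Dict, List, Set, Tuple

Block = Tuple[str, bytes]  # (kind, payload); kind is "meta" or "data"


class VdevState(Enum):
    ONLINE = "online"
    EVICTING = "evicting"
    EVICTED = "evicted"


class Pool:
    """Hypothetical toy pool used only to illustrate eviction."""

    def __init__(self, layout: Dict[str, List[Block]], readonly: Set[str]) -> None:
        self.blocks = {v: list(blks) for v, blks in layout.items()}
        self.readonly = set(readonly)
        self.state = {v: VdevState.ONLINE for v in layout}
        self.log: List[str] = []

    def _targets(self) -> List[str]:
        # Only online, writable vdevs may receive rewritten blocks.
        return [v for v, s in self.state.items()
                if s is VdevState.ONLINE and v not in self.readonly]

    def _mark(self, vdev: str, state: VdevState) -> None:
        # A read-only vdev cannot hold its own mark, so the state is kept in
        # pool-wide config that would live on the writable vdevs.
        self.state[vdev] = state
        self.log.append(f"{vdev}: {state.value}")

    def _move(self, victim: str, kind: str) -> None:
        targets = cycle(self._targets())
        pending: List[Block] = []
        for blk in self.blocks[victim]:
            if blk[0] == kind:
                self.blocks[next(targets)].append(blk)  # copy elsewhere
            else:
                pending.append(blk)
        # Bookkeeping only: the victim device itself is never written to.
        self.blocks[victim] = pending

    def evict(self, victim: str) -> None:
        if not [v for v in self._targets() if v != victim]:
            raise RuntimeError("no writable vdev can receive the evicted blocks")
        self._mark(victim, VdevState.EVICTING)
        self._move(victim, "meta")  # first: ALL metadata
        self._move(victim, "data")  # then: full rewrite pass over the data
        self._mark(victim, VdevState.EVICTED)
        del self.blocks[victim]     # finally: drop the device from the pool
        del self.state[victim]


pool = Pool({"dvd": [("data", b"d1"), ("meta", b"m1")], "disk": []}, readonly={"dvd"})
pool.evict("dvd")
print(pool.blocks)  # {'disk': [('meta', b'm1'), ('data', b'd1')]}
```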

One consequence of this whole process is yet another
method of installing the OS:

- boot from a r/o zfs source image (dvd)
- add sufficient storage to capture changes.
- mark session as one of:
  o transient - discard change-pool on halt/crash/reboot
  o persistent - retain changes on reboot
  o permanent - make the pool bootable independently from the original r/o image.
- followed, in parallel, by:
  o evict the source image
  o configure the system, and purge any history.
- if desired, reboot...
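The three session modes differ only in what survives a reboot and whether the read-only image is still needed afterwards. A minimal restatement as a lookup table, with hypothetical names (`Session`, `RebootOutcome`, `reboot`):

```python
from enum import Enum
from typing import Dict, NamedTuple


class Session(Enum):
    TRANSIENT = "transient"
    PERSISTENT = "persistent"
    PERMANENT = "permanent"


class RebootOutcome(NamedTuple):
    keeps_changes: bool  # does the change pool survive a halt/crash/reboot?
    needs_image: bool    # is the r/o source image still required to boot?


# Hypothetical table restating the three modes in the list above.
OUTCOMES = {
    Session.TRANSIENT: RebootOutcome(keeps_changes=False, needs_image=True),
    Session.PERSISTENT: RebootOutcome(keeps_changes=True, needs_image=True),
    Session.PERMANENT: RebootOutcome(keeps_changes=True, needs_image=False),
}


def reboot(mode: Session, change_pool: Dict[int, bytes]) -> Dict[int, bytes]:
    """Return the change pool as it looks after a reboot in the given mode."""
    return dict(change_pool) if OUTCOMES[mode].keeps_changes else {}
```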

A final implication is that on such a system, you cannot ever use the
original source image, unless you tell the system explicitly NOT to import
any zfs devices besides the image.

The architecture of zfs is fascinating.


Cheers,
Henk
Malahat Qureshi
2006-07-28 01:17:03 UTC
Is there any way (a "workaround") to boot off a zfs disk?


regards,

Malahat
Matthew Ahrens
2006-07-28 02:25:55 UTC
Post by Malahat Qureshi
Is there any way (a "workaround") to boot off a zfs disk?
Yes, see
http://blogs.sun.com/roller/page/tabriz?entry=are_you_ready_to_rumble

--matt
Brian Hechinger
2006-07-28 12:47:38 UTC
Post by Matthew Ahrens
Post by Malahat Qureshi
Is there any way (a "workaround") to boot off a zfs disk?
Yes, see
http://blogs.sun.com/roller/page/tabriz?entry=are_you_ready_to_rumble
I followed those directions with snv_38 and was unsuccessful; I wonder
what I did wrong.

Sadly it's my work desktop and I had to stop screwing around with that
and actually get work done. :)

I think I'll just wait until you can install directly to ZFS.

Any ETA on that, btw?

-brian
Lori Alt
2006-07-28 15:47:48 UTC
Post by Brian Hechinger
Post by Matthew Ahrens
Post by Malahat Qureshi
Is there any way (a "workaround") to boot off a zfs disk?
Yes, see
http://blogs.sun.com/roller/page/tabriz?entry=are_you_ready_to_rumble
I followed those directions with snv_38 and was unsuccessful; I wonder
what I did wrong.
Sadly it's my work desktop and I had to stop screwing around with that
and actually get work done. :)
I think I'll just wait until you can install directly to ZFS.
Any ETA on that, btw?
While the official release of zfs-boot won't be out
until Update 4 at least, we're working right now on
getting enough pieces available through OpenSolaris
so that users can put together a boot CD/DVD/image
that will directly install a system with a zfs
root. I can't give an exact date, but we're pretty
close. I expect it within weeks, not months.

Lori
Brian Hechinger
2006-07-28 18:30:31 UTC
Post by Lori Alt
While the official release of zfs-boot won't be out
until Update 4 at least, we're working right now on
getting enough pieces available through OpenSolaris
so that users can put together a boot CD/DVD/image
that will directly install a system with a zfs
root. I can't give an exact date, but we're pretty
close. I expect it within weeks, not months.
That's great, Lori!! I could wait months, but I'd rather wait weeks. :)

What about Express? I don't have a problem running Express. In fact,
the desktop at work (a Dell POS) has snv_38 on it and the work laptop (a
Dell POS, see a pattern here? *G*) has snv_40 on it. ;)

That being said, I'm (hopefully safely) assuming that if this makes it
into Update 4 it will include support for zfs-root/install on SPARC as
well as x86?

That would make my U80 at home very, very happy. :)

-brian
Lori Alt
2006-07-28 20:26:24 UTC
Post by Brian Hechinger
Post by Lori Alt
While the official release of zfs-boot won't be out
until Update 4 at least, we're working right now on
getting enough pieces available through OpenSolaris
so that users can put together a boot CD/DVD/image
that will directly install a system with a zfs
root. I can't give an exact date, but we're pretty
close. I expect it within weeks, not months.
That's great, Lori!! I could wait months, but I'd rather wait weeks. :)
What about Express?
Probably not any time soon. If it makes U4,
I think that would make it available in Express late
this year.
Post by Brian Hechinger
That being said, I'm (hopefully safely) assuming that if this makes it
into Update 4 it will include support for zfs-root/install on SPARC as
well as x86?
We expect to release SPARC support at the same time as x86.

Lori
Brian Hechinger
2006-08-15 19:22:41 UTC
Post by Lori Alt
Post by Brian Hechinger
What about Express?
Probably not any time soon. If it makes U4,
I think that would make it available in Express late
this year.
Is there a specific Nevada build you are going to target? I'd love to
start testing this as soon as possible. I have both SPARC and x86 here
to play with.
Post by Lori Alt
Post by Brian Hechinger
That being said, I'm (hopefully safely) assuming that if this makes it
into Update 4 it will include support for zfs-root/install on SPARC as
well as x86?
We expect to release SPARC support at the same time as x86.
Most excellent.

-brian
Lori Alt
2006-08-15 22:30:24 UTC
Post by Brian Hechinger
Post by Lori Alt
Post by Brian Hechinger
What about Express?
Probably not any time soon. If it makes U4,
I think that would make it available in Express late
this year.
Is there a specific Nevada build you are going to target? I'd love to
start testing this as soon as possible. I have both SPARC and x86 here
to play with.
You need more than a Nevada build. You also need the
installation code. We're working on an OpenSolaris
community web page for zfs-boot. On that web page
will be links to files that can be downloaded for
putting together a netinstall image or a DVD for
installing a system with a zfs root file system.
We hope to have that available in the next few weeks.

lori
_______________________________________________
zfs-discuss mailing list
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Dick Davies
2006-08-16 06:58:49 UTC
Post by Lori Alt
Post by Brian Hechinger
Post by Lori Alt
Post by Brian Hechinger
What about Express?
Probably not any time soon. If it makes U4,
I think that would make it available in Express late
this year.
Is there a specific Nevada build you are going to target? I'd love to
start testing this as soon as possible. I have both SPARC and x86 here
to play with.
You need more than a Nevada build. You also need the
installation code. We're working on an OpenSolaris
community web page for zfs-boot. On that web page
will be links to files that can be downloaded for
putting together a netinstall image or a DVD for
installing a system with a zfs root file system.
We hope to have that available in the next few weeks.
That's excellent news, Lori; thanks to everyone who's working
on this. Are you planning to use a single pool,
or an 'os pool/application pool' split?

As an aside, is there a general method to generate bootable
opensolaris DVDs? The only way I know of to get opensolaris onto a machine
is to install SXCR and then BFU on top.
--
Rasputin :: Jack of All Trades - Master of Nuns
http://number9.hellooperator.net/
Joerg Schilling
2006-08-16 13:14:06 UTC
Post by Dick Davies
As an aside, is there a general method to generate bootable
opensolaris DVDs? The only way I know of getting opensolaris on
is installing sxcr and then BFUing on top.
A year ago, I published a toolkit to create bootable SchilliX CDs/DVDs.
Would this help?

Jörg
--
EMail:***@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
***@cs.tu-berlin.de (uni)
***@fokus.fraunhofer.de (work) Blog: http://schily.blogspot.com/
URL: http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
Dick Davies
2006-08-16 13:42:32 UTC
Post by Joerg Schilling
Post by Dick Davies
As an aside, is there a general method to generate bootable
opensolaris DVDs? The only way I know of getting opensolaris on
is installing sxcr and then BFUing on top.
A year ago, I did publish a toolkit to create bootable SchilliX CDs/DVDs.
Would this help?
Definitely - but I'm just curious to be honest. I just wanted
to burn the appropriate thing onto the 'i boot opensolaris' blank dvds I
got sent the other day :)
Lori Alt
2006-08-17 17:57:25 UTC
Post by Dick Davies
Post by Lori Alt
Post by Brian Hechinger
Post by Lori Alt
Post by Brian Hechinger
What about Express?
Probably not any time soon. If it makes U4,
I think that would make it available in Express late
this year.
Is there a specific Nevada build you are going to target? I'd love to
start testing this as soon as possible. I have both SPARC and x86 here
to play with.
You need more than a Nevada build. You also need the
installation code. We're working on an OpenSolaris
community web page for zfs-boot. On that web page
will be links to files that can be downloaded for
putting together a netinstall image or a DVD for
installing a system with a zfs root file system.
We hope to have that available in the next few weeks.
That's excellent news Lori, thanks to everyone who's working
on this. Are you planning to use a single pool,
or an 'os pool/application pool' split?
I'm not quite sure what you mean by an "os pool/application pool"
split. I would have thought that the *executables* for an application
would normally be installed somewhere in the Solaris namespace
(i.e., the name space established by a Solaris installation: root, /usr,
/opt, /var, and so on) and would thus be part of the "personality"
of a system. Data, on the other hand, is typically NOT part of that
namespace. So your databases, etc. would often be installed somewhere
else. Thus I think of the most important split as the "os pool/data pool"
split. Maybe that's what you meant. If so, then the answer to your
question is:

The plan right now is to encourage (though not mandate) separate
pools for the Solaris name space and for data. There are three reasons
for this:

1. There will be some restrictions on root pools, at least initially
and perhaps permanently, that you would probably not want to place
on pools used for data. For example, no RAID-Z or concatenation of
vdevs. These might be relaxed at some time, but right now, limitations
in the boot PROMs cause us to place restrictions on the devices
you can place in a root pool. (root mirroring WILL be supported,
however).

2. Data is theoretically shareable and transferable among different
instruction set architectures. System software is not.

3. There are advantages to separating the "personality" of a machine
from its data. If they are separated, one can be modified (i.e., patched,
upgraded, moved from one kind of storage to another, etc.) without having
to affect the other.
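Restriction 1 can be phrased as a simple check on a proposed root pool layout. This is a hypothetical sketch of the rules as stated in this message (a single device or a mirror is allowed; RAID-Z and multiple concatenated top-level vdevs are rejected), not the installer's actual validation code, and the `c0t0d0s0`-style device names are only examples.

```python
from typing import List, Tuple

# Hypothetical layout format: (vdev type, member devices),
# e.g. ("mirror", ["c0t0d0s0", "c0t1d0s0"]).
Vdev = Tuple[str, List[str]]


def check_root_pool(vdevs: List[Vdev]) -> List[str]:
    """Return the reasons a proposed root pool breaks the initial restrictions."""
    if not vdevs:
        return ["a root pool needs at least one device"]
    problems: List[str] = []
    if len(vdevs) > 1:
        # Several top-level vdevs means blocks are spread across them,
        # so boot code might have to read from more than one device.
        problems.append("concatenating top-level vdevs is not allowed")
    for kind, _devices in vdevs:
        if kind.startswith("raidz"):
            problems.append(f"{kind} is not allowed in a root pool")
        elif kind not in ("disk", "mirror"):
            problems.append(f"unsupported vdev type {kind!r}")
    return problems


print(check_root_pool([("mirror", ["c0t0d0s0", "c0t1d0s0"])]))  # []
print(check_root_pool([("raidz", ["c0t0d0", "c0t1d0", "c0t2d0"])]))
```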
Post by Dick Davies
As an aside, is there a general method to generate bootable
opensolaris DVDs? The only way I know of getting opensolaris on
is installing sxcr and then BFUing on top.
I actually don't know much about that right now. Someone else might have to
answer that question for you.

Lori
Dick Davies
2006-08-18 08:28:24 UTC
Post by Lori Alt
Post by Dick Davies
That's excellent news Lori, thanks to everyone who's working
on this. Are you planning to use a single pool,
or an 'os pool/application pool' split?
Thus I think of the most important split as the "os pool/data pool"
split. Maybe that's what you meant.
That's it, yes :)
I should probably have said service rather than application.
Post by Lori Alt
...... limitations
in the boot PROMs cause us to place restrictions on the devices
you can place in a root pool. (root mirroring WILL be supported,
however).
Does boot prom support mean this will be SPARC only? That's
interesting (last time I tried Tabriz' hack, it was x86 only).

Or is x86 zfs root going to need a grub /boot partition on one
of the disks?
Lori Alt
2006-08-18 16:56:45 UTC
Post by Dick Davies
Post by Lori Alt
Post by Dick Davies
That's excellent news Lori, thanks to everyone who's working
on this. Are you planning to use a single pool,
or an 'os pool/application pool' split?
Thus I think of the most important split as the "os pool/data pool"
split. Maybe that's what you meant.
That's it, yes :)
I should probably have said service rather than application.
Post by Lori Alt
...... limitations
in the boot PROMs cause us to place restrictions on the devices
you can place in a root pool. (root mirroring WILL be supported,
however).
Does boot prom support mean this will be SPARC only? That's
interesting (last time I tried Tabriz' hack, it was x86 only).
No, zfs boot will be supported on both x86 and SPARC. SPARC's
OBP and various x86 BIOSes both have restrictions on the devices
that can be accessed at boot time, so we need to limit the
devices in a root pool on both architectures.
Post by Dick Davies
Or is x86 zfs root going to need a grub /boot partition on one
of the disks?
On x86, each disk capable of booting the system (which means each
disk in a root pool) will have grub installed on it in a disk
slice which occupies the first few blocks of the disk. It's not
the same as the old /boot partition, because all the slice
contains is grub. It doesn't contain a file system.
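Because every disk in the root pool carries its own copy of grub, the firmware can boot from whichever disk is still alive. A minimal model of that, assuming a hypothetical `Disk` record and a BIOS that simply walks its boot order (device names are illustrative):

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Disk:
    name: str
    alive: bool
    has_grub_slice: bool = True  # each root-pool disk gets its own grub copy


def pick_boot_disk(bios_order: List[Disk]) -> Optional[str]:
    """Hypothetical BIOS walk: the first live disk with grub installed wins."""
    for disk in bios_order:
        if disk.alive and disk.has_grub_slice:
            return disk.name
    return None  # nothing bootable is left


root_mirror = [Disk("c0d0", alive=False), Disk("c1d0", alive=True)]
print(pick_boot_disk(root_mirror))  # c1d0
```

In this model there is no single point of failure as long as one disk in the root mirror survives.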

Lori
Dick Davies
2006-08-18 17:13:23 UTC
Post by Lori Alt
No, zfs boot will be supported on both x86 and sparc. Sparc's
OBP, and various x86 BIOS's both have restrictions on the devices
that can be accessed at boot time, so we need to limit the
devices in a root pool on both architectures.
Gotcha. I wasn't sure if you were proposing requiring a custom
BIOS on x86, but I take it (from your next point)
you're just chainloading a ZFS-aware grub.
Post by Lori Alt
Post by Dick Davies
Or is x86 zfs root going to need a grub /boot partition on one
of the disks?
On x86, each disk capable of booting the system (which means each
disk in a root pool) will have grub installed on it in a disk
slice which occupies the first few blocks of the disk. It's not
the same as the old /boot partition, because all the slice
contains is grub. It doesn't contain a file system.
I think that was really what I was getting at. So long as one
of the disks is still alive and the BIOS can boot off it, you'd
be all right? That sounds perfect - the implementation is really
not that important to me, so long as there's no single point of
failure.

Thanks for your time, and have a good weekend.
Torrey McMahon
2006-08-18 17:22:26 UTC
Post by Lori Alt
No, zfs boot will be supported on both x86 and sparc. Sparc's
OBP, and various x86 BIOS's both have restrictions on the devices
that can be accessed at boot time, so we need to limit the
devices in a root pool on both architectures.
Hi Lori.

Can you expand a bit on the above? What sort of limitations are you
referring to? (Boot time? Topology?)
Lori Alt
2006-08-18 17:27:09 UTC
Post by Torrey McMahon
Post by Lori Alt
No, zfs boot will be supported on both x86 and sparc. Sparc's
OBP, and various x86 BIOS's both have restrictions on the devices
that can be accessed at boot time, so we need to limit the
devices in a root pool on both architectures.
Hi Lori.
Can you expand a bit on the above? What sort of limitations are you
referring to? (Boot time? Topology?)
The limitation is mainly about the *number* of disks
that can be accessed at one time. If we were going to
support booting off a set of disks in a RAID-Z
configuration, the early boot code would have to
read some blocks from one disk, and then some blocks
from another disk, and so on. There are difficulties
doing that when using the capabilities of OBP
or the BIOS to do I/O. (and if you want me to be more
specific about what THOSE difficulties are, I'd
have to get someone who knows more about BIOS and
OBP to answer the question.) But with straight
mirroring, there's no such problem because any disk
in the mirror can supply all of the disk blocks needed
to boot.
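The difference between mirroring and RAID-Z for early boot code can be shown with a simplified single-parity layout: a block is split into `ndisks - 1` data columns plus one XOR parity column, so reading it needs data from several disks, while any one side of a mirror holds the whole block. This is an illustrative model only; real RAID-Z uses variable-width stripes, and the function names are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def raidz1_columns(block: bytes, ndisks: int) -> List[bytes]:
    """Split a block into ndisks-1 equal data columns plus one XOR parity
    column (simplified: real RAID-Z uses variable-width stripes)."""
    if ndisks < 2:
        raise ValueError("need at least two disks")
    ncols = ndisks - 1
    width = -(-len(block) // ncols)  # ceiling division
    padded = block.ljust(width * ncols, b"\0")
    cols = [padded[i * width:(i + 1) * width] for i in range(ncols)]
    return cols + [reduce(xor, cols)]


def read_raidz1(columns: List[Optional[bytes]], length: int) -> bytes:
    """Reassemble a block; None marks a disk the boot code could not read."""
    missing = [i for i, c in enumerate(columns) if c is None]
    if len(missing) > 1:
        raise OSError(f"block needs {len(columns) - 1} of {len(columns)} disks")
    cols = list(columns)
    if missing:
        # Any single missing column is the XOR of all the others.
        cols[missing[0]] = reduce(xor, [c for c in cols if c is not None])
    return b"".join(cols[:-1])[:length]


def read_mirror(copies: List[Optional[bytes]]) -> bytes:
    """Any one readable side of a mirror holds the whole block."""
    for copy in copies:
        if copy is not None:
            return copy
    raise OSError("no mirror side readable")


block = b"unix kernel bits"
cols = raidz1_columns(block, 4)
print(read_mirror([None, block]))               # one disk is enough
print(read_raidz1([cols[0], None, cols[2], cols[3]], len(block)))  # needs 3 of 4
```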

lori
Bennett, Steve
2006-08-18 18:01:51 UTC
Post by Lori Alt
The limitation is mainly about the *number* of disks
that can be accessed at one time.
...
But with straight mirroring, there's no such problem
because any disk in the mirror can supply all of the
disk blocks needed to boot.
Does that mean that these restrictions will go away once replication can
be varied on a per-dataset (or per-file) basis? You could have all your
'essential to boot' files mirrored across all disks, then raidz2 the
rest...

Steve.
Lori Alt
2006-08-18 18:39:06 UTC
Post by Bennett, Steve
Post by Lori Alt
The limitation is mainly about the *number* of disks
that can be accessed at one time.
...
But with straight mirroring, there's no such problem
because any disk in the mirror can supply all of the
disk blocks needed to boot.
Does that mean that these restrictions will go away once replication can
be varied on a per dataset (or per file) basis? You could have all your
'essential to boot' files mirrored across all disks, then raidz2 the
rest...
Maybe. It depends on how per-file replication is implemented.
I don't think we've made any design decisions at this time that
would prevent that from working in the future.

Lori
Tabriz Leman
2006-08-18 17:41:05 UTC
Post by Torrey McMahon
Post by Lori Alt
No, zfs boot will be supported on both x86 and sparc. Sparc's
OBP, and various x86 BIOS's both have restrictions on the devices
that can be accessed at boot time, so we need to limit the
devices in a root pool on both architectures.
Hi Lori.
Can you expand a bit on the above? What sort of limitations are you
referring to? (Boot time? Topology?)
I think what Lori is referring to here is that we need to limit the
root pool to BIOS/OBP-visible devices; not all devices are visible from
the BIOS/OBP (fibre channel devices, for example).

Tabriz
Torrey McMahon
2006-08-19 21:22:56 UTC
Post by Tabriz Leman
Post by Torrey McMahon
Post by Lori Alt
No, zfs boot will be supported on both x86 and sparc. Sparc's
OBP, and various x86 BIOS's both have restrictions on the devices
that can be accessed at boot time, so we need to limit the
devices in a root pool on both architectures.
Hi Lori.
Can you expand a bit on the above? What sort of limitations are you
referring to? (Boot time? Topology?)
I think what Lori is referring to here is that we need to limit the
root pool to BIOS/OBP-visible devices; not all devices are visible from
the BIOS/OBP (fibre channel devices, for example).
Actually, OBP and BIOS should have access to fabric devices. I know OBP
does, and I've seen docs that mention BIOS does (look in the x86 fabric
boot docs, for example).

I think the problem is more about, as Lori mentioned in her email, the
ability to traverse a ZFS pool before the ZFS drivers have been loaded.