Discussion:
[PATCH V7] Add support for BTRFS raid5/6 to GRUB
Goffredo Baroncelli
2018-09-19 18:40:31 UTC
Permalink
Hi All,

the aim of this patches set is to provide support for a BTRFS raid5/6
filesystem in GRUB.

The first patch, implements the basic support for raid5/6. I.e this works when
all the disks are present.

The next 5 patches, are preparatory ones.

The 7th patch implements the raid5 recovery for btrfs (i.e. handling the
disappearing of 1 disk).
The 8th patch makes the code for handling the raid6 recovery more generic.
The last one implements the raid6 recovery for btrfs (i.e. handling the
disappearing up to two disks).

I tested the code in grub-emu, and it works both with all the disks,
and with some disks missing. I checked the crc32 calculated from grub and
from linux and these matched. Finally I checked if the support for md raid6
still works properly, and it does (with all drives and with up to 2 drives
missing)

Comments are welcome.

Changelog
v1: initial support for btrfs raid5/6. No recovery allowed
v2: full support for btrfs raid5/6. Recovery allowed
v3: some minor cleanup suggested by Daniel Kiper; reusing the
original raid6 recovery code of grub
v4: Several spell fix; better description of the RAID layout
in btrfs, and the variables which describes the stripe
positioning; split the patch #5 in two (#5 and #6)
v5: Several spell fix; improved code comment in patch #1, small
clean up in the code
v6: Small cleanup; improved the wording in the RAID6 layout
description; in the function raid6_recover_read_buffer() avoid
a unnecessary memcpy in case of invalid data;
v7: - patch 2,3,5,6,8 received an Review-by Daniel, and were unchanged from
the last time (only minor cleanup in the commit description requested by
Daniel)
- patch 7 received some small update rearranging a for(), and some
bracket around if()
- patch 4, received an update message which explains better why NULL
is stored in data->devices_attached[]
- patch 9, received a blank line to separate better a code line from
a previous comment. A description of 'parities_pos' was added
- patch 1, received a major update about the variable meaning description
in the comment. However I suspect that we need some further review to reach
a fully agreement about this text. NB: the update are relate only to comments

BR
G.Baroncelli

--
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5
Goffredo Baroncelli
2018-09-19 18:40:36 UTC
Permalink
From: Goffredo Baroncelli <***@inwind.it>

A portion of the logging code is moved outside of internal for(;;). The part
that is left inside is the one which depends on the internal for(;;) index.

This is a preparatory patch. The next one will refactor the code inside
the for(;;) into an another function.

Signed-off-by: Goffredo Baroncelli <***@inwind.it>
Reviewed-by: Daniel Kiper <***@oracle.com>
---
grub-core/fs/btrfs.c | 25 ++++++++++++++-----------
1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/grub-core/fs/btrfs.c b/grub-core/fs/btrfs.c
index 6e42c33f6..ee134c167 100644
--- a/grub-core/fs/btrfs.c
+++ b/grub-core/fs/btrfs.c
@@ -863,6 +863,18 @@ grub_btrfs_read_logical (struct grub_btrfs_data *data, grub_disk_addr_t addr,

for (j = 0; j < 2; j++)
{
+ grub_dprintf ("btrfs", "chunk 0x%" PRIxGRUB_UINT64_T
+ "+0x%" PRIxGRUB_UINT64_T
+ " (%d stripes (%d substripes) of %"
+ PRIxGRUB_UINT64_T ")\n",
+ grub_le_to_cpu64 (key->offset),
+ grub_le_to_cpu64 (chunk->size),
+ grub_le_to_cpu16 (chunk->nstripes),
+ grub_le_to_cpu16 (chunk->nsubstripes),
+ grub_le_to_cpu64 (chunk->stripe_length));
+ grub_dprintf ("btrfs", "reading laddr 0x%" PRIxGRUB_UINT64_T "\n",
+ addr);
+
for (i = 0; i < redundancy; i++)
{
struct grub_btrfs_chunk_stripe *stripe;
@@ -875,20 +887,11 @@ grub_btrfs_read_logical (struct grub_btrfs_data *data, grub_disk_addr_t addr,

paddr = grub_le_to_cpu64 (stripe->offset) + stripe_offset;

- grub_dprintf ("btrfs", "chunk 0x%" PRIxGRUB_UINT64_T
- "+0x%" PRIxGRUB_UINT64_T
- " (%d stripes (%d substripes) of %"
- PRIxGRUB_UINT64_T ") stripe %" PRIxGRUB_UINT64_T
+ grub_dprintf ("btrfs", "stripe %" PRIxGRUB_UINT64_T
" maps to 0x%" PRIxGRUB_UINT64_T "\n",
- grub_le_to_cpu64 (key->offset),
- grub_le_to_cpu64 (chunk->size),
- grub_le_to_cpu16 (chunk->nstripes),
- grub_le_to_cpu16 (chunk->nsubstripes),
- grub_le_to_cpu64 (chunk->stripe_length),
stripen, stripe->offset);
grub_dprintf ("btrfs", "reading paddr 0x%" PRIxGRUB_UINT64_T
- " for laddr 0x%" PRIxGRUB_UINT64_T "\n", paddr,
- addr);
+ "\n", paddr);

dev = find_device (data, stripe->device_id);
if (!dev)
--
2.19.0
Goffredo Baroncelli
2018-09-19 18:40:32 UTC
Permalink
From: Goffredo Baroncelli <***@inwind.it>

Signed-off-by: Goffredo Baroncelli <***@inwind.it>
---
grub-core/fs/btrfs.c | 66 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 66 insertions(+)

diff --git a/grub-core/fs/btrfs.c b/grub-core/fs/btrfs.c
index be195448d..56c42746d 100644
--- a/grub-core/fs/btrfs.c
+++ b/grub-core/fs/btrfs.c
@@ -119,6 +119,8 @@ struct grub_btrfs_chunk_item
#define GRUB_BTRFS_CHUNK_TYPE_RAID1 0x10
#define GRUB_BTRFS_CHUNK_TYPE_DUPLICATED 0x20
#define GRUB_BTRFS_CHUNK_TYPE_RAID10 0x40
+#define GRUB_BTRFS_CHUNK_TYPE_RAID5 0x80
+#define GRUB_BTRFS_CHUNK_TYPE_RAID6 0x100
grub_uint8_t dummy2[0xc];
grub_uint16_t nstripes;
grub_uint16_t nsubstripes;
@@ -764,6 +766,70 @@ grub_btrfs_read_logical (struct grub_btrfs_data *data, grub_disk_addr_t addr,
stripe_offset = low + chunk_stripe_length
* high;
csize = chunk_stripe_length - low;
+ break;
+ }
+ case GRUB_BTRFS_CHUNK_TYPE_RAID5:
+ case GRUB_BTRFS_CHUNK_TYPE_RAID6:
+ {
+ grub_uint64_t nparities, block_nr, high, low;
+
+ redundancy = 1; /* no redundancy for now */
+
+ if (grub_le_to_cpu64 (chunk->type) & GRUB_BTRFS_CHUNK_TYPE_RAID5)
+ {
+ grub_dprintf ("btrfs", "RAID5\n");
+ nparities = 1;
+ }
+ else
+ {
+ grub_dprintf ("btrfs", "RAID6\n");
+ nparities = 2;
+ }
+
+ /*
+ * A RAID 6 layout consists of several blocks spread on the disks.
+ * The raid terminology is used to call all the blocks of a row
+ * "stripe". Unfortunately the BTRFS terminology confuses block
+ * and stripe.
+ *
+ * Disk0 Disk1 Disk2 Disk3
+ *
+ * A1 B1 P1 Q1
+ * Q2 A2 B2 P2
+ * P3 Q3 A3 B3
+ * [...]
+ *
+ * Note that the placement of the parities depends on row index.
+ * In the code below:
+ * - block_nr is the block number without the parities
+ * (A1 = 0, B1 = 1, A2 = 2, B2 = 3, ...),
+ * - high is the row number (0 for A1...Q1, 1 for Q2...P2, ...),
+ * - stripen is the disk number (0 for A1,Q2,P3, 1 for B1...),
+ * - off is the logical address to read
+ * - chunk_stripe_length is the size of a block (typically 64k),
+ * - nstripes is the number of disks,
+ * - low is the offset of the data inside a stripe,
+ * - stripe_offset is the disk offset,
+ * - csize is the "potential" data to read. It will be reduced to
+ * size if the latter is smaller.
+ */
+ block_nr = grub_divmod64 (off, chunk_stripe_length, &low);
+
+ /*
+ * stripen is computed without the parities (0 for A1, A2, A3...
+ * 1 for B1, B2...).
+ */
+ high = grub_divmod64 (block_nr, nstripes - nparities, &stripen);
+
+ /*
+ * stripen is recomputed considering the parities (0 for A1, 1 for
+ * A2, 2 for A3....).
+ */
+ grub_divmod64 (high + stripen, nstripes, &stripen);
+
+ stripe_offset = low + chunk_stripe_length * high;
+ csize = chunk_stripe_length - low;
+
break;
}
default:
--
2.19.0
Daniel Kiper
2018-09-25 15:31:42 UTC
Permalink
Post by Goffredo Baroncelli
---
grub-core/fs/btrfs.c | 66 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 66 insertions(+)
diff --git a/grub-core/fs/btrfs.c b/grub-core/fs/btrfs.c
index be195448d..56c42746d 100644
--- a/grub-core/fs/btrfs.c
+++ b/grub-core/fs/btrfs.c
@@ -119,6 +119,8 @@ struct grub_btrfs_chunk_item
#define GRUB_BTRFS_CHUNK_TYPE_RAID1 0x10
#define GRUB_BTRFS_CHUNK_TYPE_DUPLICATED 0x20
#define GRUB_BTRFS_CHUNK_TYPE_RAID10 0x40
+#define GRUB_BTRFS_CHUNK_TYPE_RAID5 0x80
+#define GRUB_BTRFS_CHUNK_TYPE_RAID6 0x100
grub_uint8_t dummy2[0xc];
grub_uint16_t nstripes;
grub_uint16_t nsubstripes;
@@ -764,6 +766,70 @@ grub_btrfs_read_logical (struct grub_btrfs_data *data, grub_disk_addr_t addr,
stripe_offset = low + chunk_stripe_length
* high;
csize = chunk_stripe_length - low;
+ break;
+ }
+ {
+ grub_uint64_t nparities, block_nr, high, low;
+
+ redundancy = 1; /* no redundancy for now */
+
+ if (grub_le_to_cpu64 (chunk->type) & GRUB_BTRFS_CHUNK_TYPE_RAID5)
+ {
+ grub_dprintf ("btrfs", "RAID5\n");
+ nparities = 1;
+ }
+ else
+ {
+ grub_dprintf ("btrfs", "RAID6\n");
+ nparities = 2;
+ }
+
+ /*
+ * A RAID 6 layout consists of several blocks spread on the disks.
+ * The raid terminology is used to call all the blocks of a row
+ * "stripe". Unfortunately the BTRFS terminology confuses block
Stripe is data set or parity (parity stripe) on one disk. Block has
different meaning. Please stick to btrfs terminology and say it clearly
in the comment. And even add a link to btrfs wiki page to ease reading.

I think about this one:
https://btrfs.wiki.kernel.org/index.php/Manpage/mkfs.btrfs#BLOCK_GROUPS.2C_CHUNKS.2C_RAID
Post by Goffredo Baroncelli
+ * and stripe.
I do not think so. Or at least not so much...
Post by Goffredo Baroncelli
+ *
+ * Disk0 Disk1 Disk2 Disk3
+ *
+ * A1 B1 P1 Q1
+ * Q2 A2 B2 P2
+ * P3 Q3 A3 B3
+ * [...]
+ *
+ * Note that the placement of the parities depends on row index.
+ * - block_nr is the block number without the parities
Well, it seems to me that the btrfs code introduces confusion not the
spec itself. I would leave code as is but s/block number/stripe number/.
Post by Goffredo Baroncelli
+ * (A1 = 0, B1 = 1, A2 = 2, B2 = 3, ...),
+ * - high is the row number (0 for A1...Q1, 1 for Q2...P2, ...),
+ * - stripen is the disk number (0 for A1,Q2,P3, 1 for B1...),
s/disk number/disk number in a row/
Post by Goffredo Baroncelli
+ * - off is the logical address to read
+ * - chunk_stripe_length is the size of a block (typically 64k),
s/a block/a stripe/
Post by Goffredo Baroncelli
+ * - nstripes is the number of disks,
s/number of disks/number of disks in a row/

I miss the description of nparities here...
Post by Goffredo Baroncelli
+ * - low is the offset of the data inside a stripe,
+ * - stripe_offset is the disk offset,
s/the disk offset/the data offset in an array/?
Post by Goffredo Baroncelli
+ * - csize is the "potential" data to read. It will be reduced to
+ * size if the latter is smaller.
+ */
+ block_nr = grub_divmod64 (off, chunk_stripe_length, &low);
+
+ /*
+ * stripen is computed without the parities (0 for A1, A2, A3...
+ * 1 for B1, B2...).
+ */
+ high = grub_divmod64 (block_nr, nstripes - nparities, &stripen);
This is clear...
Post by Goffredo Baroncelli
+ /*
+ * stripen is recomputed considering the parities (0 for A1, 1 for
+ * A2, 2 for A3....).
+ */
+ grub_divmod64 (high + stripen, nstripes, &stripen);
... but this looks strange... You add disk number to row number. Hmmm...
It looks that it works but this is not obvious at first sight. Could you
explain that?
Post by Goffredo Baroncelli
+ stripe_offset = low + chunk_stripe_length * high;
+ csize = chunk_stripe_length - low;
+
break;
}
Daniel
Goffredo Baroncelli
2018-09-26 20:40:32 UTC
Permalink
Post by Daniel Kiper
Post by Goffredo Baroncelli
---
grub-core/fs/btrfs.c | 66 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 66 insertions(+)
diff --git a/grub-core/fs/btrfs.c b/grub-core/fs/btrfs.c
index be195448d..56c42746d 100644
--- a/grub-core/fs/btrfs.c
+++ b/grub-core/fs/btrfs.c
@@ -119,6 +119,8 @@ struct grub_btrfs_chunk_item
#define GRUB_BTRFS_CHUNK_TYPE_RAID1 0x10
#define GRUB_BTRFS_CHUNK_TYPE_DUPLICATED 0x20
#define GRUB_BTRFS_CHUNK_TYPE_RAID10 0x40
+#define GRUB_BTRFS_CHUNK_TYPE_RAID5 0x80
+#define GRUB_BTRFS_CHUNK_TYPE_RAID6 0x100
grub_uint8_t dummy2[0xc];
grub_uint16_t nstripes;
grub_uint16_t nsubstripes;
@@ -764,6 +766,70 @@ grub_btrfs_read_logical (struct grub_btrfs_data *data, grub_disk_addr_t addr,
stripe_offset = low + chunk_stripe_length
* high;
csize = chunk_stripe_length - low;
+ break;
+ }
+ {
+ grub_uint64_t nparities, block_nr, high, low;
+
+ redundancy = 1; /* no redundancy for now */
+
+ if (grub_le_to_cpu64 (chunk->type) & GRUB_BTRFS_CHUNK_TYPE_RAID5)
+ {
+ grub_dprintf ("btrfs", "RAID5\n");
+ nparities = 1;
+ }
+ else
+ {
+ grub_dprintf ("btrfs", "RAID6\n");
+ nparities = 2;
+ }
+
+ /*
+ * A RAID 6 layout consists of several blocks spread on the disks.
+ * The raid terminology is used to call all the blocks of a row
+ * "stripe". Unfortunately the BTRFS terminology confuses block
Stripe is data set or parity (parity stripe) on one disk. Block has
different meaning. Please stick to btrfs terminology and say it clearly
in the comment. And even add a link to btrfs wiki page to ease reading.
https://btrfs.wiki.kernel.org/index.php/Manpage/mkfs.btrfs#BLOCK_GROUPS.2C_CHUNKS.2C_RAID
Post by Goffredo Baroncelli
+ * and stripe.
I do not think so. Or at least not so much...
Trust me, generally speaking stripe is the "row" in the disks (without the parity); looking at the ext3 man page:

....
stride=stride-size
Configure the filesystem for a RAID array with
stride-size filesystem blocks. This is the number of
blocks read or written to disk before moving to the
next disk, which is sometimes referred to as the
chunk size. This mostly affects placement of
filesystem metadata like bitmaps at mke2fs time to
avoid placing them on a single disk, which can hurt
performance. It may also be used by the block allo‐
cator.

stripe_width=stripe-width
Configure the filesystem for a RAID array with
stripe-width filesystem blocks per stripe. This is
typically stride-size * N, where N is the number of
data-bearing disks in the RAID (e.g. for RAID 5
there is one parity disk, so N will be the number of
disks in the array minus 1). This allows the block
allocator to prevent read-modify-write of the parity
in a RAID stripe if possible when the data is writ‐
ten.

....
Looking at the RAID5 wikipedia page, it seems that the term "stripe" is coherent with the ext3 man page.

I suspect that the confusion is due to the fact that in RAID1 a stripe is in ONE disk (because the others are like parities). In BTRFS the RAID5/6 code uses the structure of RAID1 with some minimal extensions...

To be clear, I don't have problem to be adherent to the BTRFS terminology. However I found this very confusing because I was used to a different terminology. I am bit worried about the fact that grub uses both MD/DM code and BTRFS code; a quick look to the code (eg ./grub-core/disk/diskfilter.c) shows that the stripe_size field seems to be related to a disks row without parities.

And... yes in BTRFS "chunk" is a completely different beast than what it is reported in the ext3 man page :-)
Post by Daniel Kiper
Post by Goffredo Baroncelli
+ *
+ * Disk0 Disk1 Disk2 Disk3
+ *
+ * A1 B1 P1 Q1
+ * Q2 A2 B2 P2
+ * P3 Q3 A3 B3
+ * [...]
+ *
+ * Note that the placement of the parities depends on row index.
+ * - block_nr is the block number without the parities
Well, it seems to me that the btrfs code introduces confusion not the
spec itself. I would leave code as is but s/block number/stripe number/.
Ok I will replace the two terms. However I have to put a warning that this is a "BTRFS" terminology :-)
Post by Daniel Kiper
Post by Goffredo Baroncelli
+ * (A1 = 0, B1 = 1, A2 = 2, B2 = 3, ...),
+ * - high is the row number (0 for A1...Q1, 1 for Q2...P2, ...),
+ * - stripen is the disk number (0 for A1,Q2,P3, 1 for B1...),
s/disk number/disk number in a row/
This value doesn't depend by the row. So "number of disk" is more correct
Post by Daniel Kiper
Post by Goffredo Baroncelli
+ * - off is the logical address to read
+ * - chunk_stripe_length is the size of a block (typically 64k),
s/a block/a stripe/
Post by Goffredo Baroncelli
+ * - nstripes is the number of disks,
s/number of disks/number of disks in a row/
ditto
Post by Daniel Kiper
I miss the description of nparities here...
Right:
+ * - nparities is the number of parities (1 for RAID5, 2 for RAID6);
+ * used only in RAID5/6 code.
Post by Daniel Kiper
Post by Goffredo Baroncelli
+ * - low is the offset of the data inside a stripe,
+ * - stripe_offset is the disk offset,
s/the disk offset/the data offset in an array/?
Yes
Post by Daniel Kiper
Post by Goffredo Baroncelli
+ * - csize is the "potential" data to read. It will be reduced to
+ * size if the latter is smaller.
+ */
+ block_nr = grub_divmod64 (off, chunk_stripe_length, &low);
+
+ /*
+ * stripen is computed without the parities (0 for A1, A2, A3...
+ * 1 for B1, B2...).
+ */
+ high = grub_divmod64 (block_nr, nstripes - nparities, &stripen);
This is clear...
Post by Goffredo Baroncelli
+ /*
+ * stripen is recomputed considering the parities (0 for A1, 1 for
+ * A2, 2 for A3....).
+ */
+ grub_divmod64 (high + stripen, nstripes, &stripen);
... but this looks strange... You add disk number to row number. Hmmm...
It looks that it works but this is not obvious at first sight. Could you
explain that?
What about
+ /*
+ * stripen is recomputed considering the parities: different row have
+ * a different offset, we have to add to stripen the number of row ("high") in
+ * modulo nstripes (0 for A1, 1 for A2, 2 for A3....).
+ */
Post by Daniel Kiper
Post by Goffredo Baroncelli
+ stripe_offset = low + chunk_stripe_length * high;
+ csize = chunk_stripe_length - low;
+
break;
}
Daniel
--
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5
Daniel Kiper
2018-09-27 15:47:59 UTC
Permalink
Post by Goffredo Baroncelli
Post by Daniel Kiper
Post by Goffredo Baroncelli
---
grub-core/fs/btrfs.c | 66 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 66 insertions(+)
diff --git a/grub-core/fs/btrfs.c b/grub-core/fs/btrfs.c
index be195448d..56c42746d 100644
--- a/grub-core/fs/btrfs.c
+++ b/grub-core/fs/btrfs.c
@@ -119,6 +119,8 @@ struct grub_btrfs_chunk_item
#define GRUB_BTRFS_CHUNK_TYPE_RAID1 0x10
#define GRUB_BTRFS_CHUNK_TYPE_DUPLICATED 0x20
#define GRUB_BTRFS_CHUNK_TYPE_RAID10 0x40
+#define GRUB_BTRFS_CHUNK_TYPE_RAID5 0x80
+#define GRUB_BTRFS_CHUNK_TYPE_RAID6 0x100
grub_uint8_t dummy2[0xc];
grub_uint16_t nstripes;
grub_uint16_t nsubstripes;
@@ -764,6 +766,70 @@ grub_btrfs_read_logical (struct grub_btrfs_data *data, grub_disk_addr_t addr,
stripe_offset = low + chunk_stripe_length
* high;
csize = chunk_stripe_length - low;
+ break;
+ }
+ {
+ grub_uint64_t nparities, block_nr, high, low;
+
+ redundancy = 1; /* no redundancy for now */
+
+ if (grub_le_to_cpu64 (chunk->type) & GRUB_BTRFS_CHUNK_TYPE_RAID5)
+ {
+ grub_dprintf ("btrfs", "RAID5\n");
+ nparities = 1;
+ }
+ else
+ {
+ grub_dprintf ("btrfs", "RAID6\n");
+ nparities = 2;
+ }
+
+ /*
+ * A RAID 6 layout consists of several blocks spread on the disks.
+ * The raid terminology is used to call all the blocks of a row
+ * "stripe". Unfortunately the BTRFS terminology confuses block
Stripe is data set or parity (parity stripe) on one disk. Block has
different meaning. Please stick to btrfs terminology and say it clearly
in the comment. And even add a link to btrfs wiki page to ease reading.
https://btrfs.wiki.kernel.org/index.php/Manpage/mkfs.btrfs#BLOCK_GROUPS.2C_CHUNKS.2C_RAID
Post by Goffredo Baroncelli
+ * and stripe.
I do not think so. Or at least not so much...
....
stride=stride-size
Configure the filesystem for a RAID array with
stride-size filesystem blocks. This is the number of
blocks read or written to disk before moving to the
next disk, which is sometimes referred to as the
chunk size. This mostly affects placement of
filesystem metadata like bitmaps at mke2fs time to
avoid placing them on a single disk, which can hurt
performance. It may also be used by the block allo???
cator.
stripe_width=stripe-width
Configure the filesystem for a RAID array with
stripe-width filesystem blocks per stripe. This is
typically stride-size * N, where N is the number of
data-bearing disks in the RAID (e.g. for RAID 5
there is one parity disk, so N will be the number of
disks in the array minus 1). This allows the block
allocator to prevent read-modify-write of the parity
in a RAID stripe if possible when the data is writ???
ten.
....
Looking at the RAID5 wikipedia page, it seems that the term "stripe"
is coherent with the ext3 man page.
Ugh... It looks that I have messed up things. Sorry about that.
Post by Goffredo Baroncelli
I suspect that the confusion is due to the fact that in RAID1 a stripe
is in ONE disk (because the others are like parities). In BTRFS the
RAID5/6 code uses the structure of RAID1 with some minimal
extensions...
To be clear, I don't have problem to be adherent to the BTRFS
terminology. However I found this very confusing because I was used to
a different terminology. I am bit worried about the fact that grub
Yeah, I have the same feeling. However, I think that in btrfs code we
should stay with btrfs terms. Though I think that it make sense to
underline differences between btrfs and well known RAID here.
Post by Goffredo Baroncelli
uses both MD/DM code and BTRFS code; a quick look to the code (eg
./grub-core/disk/diskfilter.c) shows that the stripe_size field seems
to be related to a disks row without parities.
And... yes in BTRFS "chunk" is a completely different beast than what
it is reported in the ext3 man page :-)
As I said above, please say it in the comment. This will ease reading
for people who are not used to btrfs terms.
Post by Goffredo Baroncelli
Post by Daniel Kiper
Post by Goffredo Baroncelli
+ *
+ * Disk0 Disk1 Disk2 Disk3
+ *
+ * A1 B1 P1 Q1
+ * Q2 A2 B2 P2
+ * P3 Q3 A3 B3
+ * [...]
+ *
+ * Note that the placement of the parities depends on row index.
+ * - block_nr is the block number without the parities
Well, it seems to me that the btrfs code introduces confusion not the
spec itself. I would leave code as is but s/block number/stripe number/.
Ok I will replace the two terms. However I have to put a warning that this is a "BTRFS" terminology :-)
Yep, and please explain the differences at the beginning of the comment.
Post by Goffredo Baroncelli
Post by Daniel Kiper
Post by Goffredo Baroncelli
+ * (A1 = 0, B1 = 1, A2 = 2, B2 = 3, ...),
+ * - high is the row number (0 for A1...Q1, 1 for Q2...P2, ...),
+ * - stripen is the disk number (0 for A1,Q2,P3, 1 for B1...),
s/disk number/disk number in a row/
This value doesn't depend by the row. So "number of disk" is more correct
Yes, but without "row" it is a bit confusing because it suggests that
it is an arbitrary number. Even if you give an example next to the
description. So, I am insisting on adding "in a row" here.
Post by Goffredo Baroncelli
Post by Daniel Kiper
Post by Goffredo Baroncelli
+ * - off is the logical address to read
+ * - chunk_stripe_length is the size of a block (typically 64k),
s/a block/a stripe/
Taking into account discussion above I am not sure right now which one
is correct. Please double check and fix it if it is needed.
Post by Goffredo Baroncelli
Post by Daniel Kiper
Post by Goffredo Baroncelli
+ * - nstripes is the number of disks,
s/number of disks/number of disks in a row/
ditto
As above, Is it total number of disks in array? I do not think so.
Hence, I think that "in a row" helps a bit. Even if it is not very
precise. However, if you come up with something better I am not
against it.
Post by Goffredo Baroncelli
Post by Daniel Kiper
I miss the description of nparities here...
+ * - nparities is the number of parities (1 for RAID5, 2 for RAID6);
+ * used only in RAID5/6 code.
LGTM.
Post by Goffredo Baroncelli
Post by Daniel Kiper
Post by Goffredo Baroncelli
+ * - low is the offset of the data inside a stripe,
+ * - stripe_offset is the disk offset,
s/the disk offset/the data offset in an array/?
Yes
Post by Daniel Kiper
Post by Goffredo Baroncelli
+ * - csize is the "potential" data to read. It will be reduced to
+ * size if the latter is smaller.
+ */
+ block_nr = grub_divmod64 (off, chunk_stripe_length, &low);
+
+ /*
+ * stripen is computed without the parities (0 for A1, A2, A3...
+ * 1 for B1, B2...).
+ */
+ high = grub_divmod64 (block_nr, nstripes - nparities, &stripen);
This is clear...
Post by Goffredo Baroncelli
+ /*
+ * stripen is recomputed considering the parities (0 for A1, 1 for
+ * A2, 2 for A3....).
+ */
+ grub_divmod64 (high + stripen, nstripes, &stripen);
... but this looks strange... You add disk number to row number. Hmmm...
It looks that it works but this is not obvious at first sight. Could you
explain that?
What about
+ /*
+ * stripen is recomputed considering the parities: different row have
+ * a different offset, we have to add to stripen the number of row ("high") in
+ * modulo nstripes (0 for A1, 1 for A2, 2 for A3....).
+ */
This is better but not much. You are repeating what code does.
I am especially interested in why this math is correct. It is not
obvious at first sight. If it is not it should be explained.
Otherwise we will forget in a few months why it is correct.

Daniel
Goffredo Baroncelli
2018-09-19 18:40:40 UTC
Permalink
From: Goffredo Baroncelli <***@inwind.it>

Add the RAID 6 recovery, in order to use a RAID 6 filesystem even if some
disks (up to two) are missing. This code use the md RAID 6 code already
present in grub.

Signed-off-by: Goffredo Baroncelli <***@inwind.it>
---
grub-core/fs/btrfs.c | 54 +++++++++++++++++++++++++++++++++++++++-----
1 file changed, 48 insertions(+), 6 deletions(-)

diff --git a/grub-core/fs/btrfs.c b/grub-core/fs/btrfs.c
index 55a7eeffc..400cd56b6 100644
--- a/grub-core/fs/btrfs.c
+++ b/grub-core/fs/btrfs.c
@@ -30,6 +30,7 @@
#include <grub/i18n.h>
#include <grub/btrfs.h>
#include <grub/crypto.h>
+#include <grub/diskfilter.h>

GRUB_MOD_LICENSE ("GPLv3+");

@@ -705,11 +706,36 @@ rebuild_raid5 (char *dest, struct raid56_buffer *buffers,
}
}

+static grub_err_t
+raid6_recover_read_buffer (void *data, int disk_nr,
+ grub_uint64_t addr __attribute__ ((unused)),
+ void *dest, grub_size_t size)
+{
+ struct raid56_buffer *buffers = data;
+
+ if (!buffers[disk_nr].data_is_valid)
+ return grub_errno = GRUB_ERR_READ_ERROR;
+
+ grub_memcpy(dest, buffers[disk_nr].buf, size);
+
+ return grub_errno = GRUB_ERR_NONE;
+}
+
+static void
+rebuild_raid6 (struct raid56_buffer *buffers, grub_uint64_t nstripes,
+ grub_uint64_t csize, grub_uint64_t parities_pos, void *dest,
+ grub_uint64_t stripen)
+
+{
+ grub_raid6_recover_gen (buffers, nstripes, stripen, parities_pos,
+ dest, 0, csize, 0, raid6_recover_read_buffer);
+}
+
static grub_err_t
raid56_read_retry (struct grub_btrfs_data *data,
struct grub_btrfs_chunk_item *chunk,
- grub_uint64_t stripe_offset,
- grub_uint64_t csize, void *buf)
+ grub_uint64_t stripe_offset, grub_uint64_t stripen,
+ grub_uint64_t csize, void *buf, grub_uint64_t parities_pos)
{
struct raid56_buffer *buffers;
grub_uint64_t nstripes = grub_le_to_cpu16 (chunk->nstripes);
@@ -787,6 +813,15 @@ raid56_read_retry (struct grub_btrfs_data *data,
ret = GRUB_ERR_READ_ERROR;
goto cleanup;
}
+ else if (failed_devices > 2 && (chunk_type & GRUB_BTRFS_CHUNK_TYPE_RAID6))
+ {
+ grub_dprintf ("btrfs",
+ "not enough disks for raid6: total %" PRIuGRUB_UINT64_T
+ ", missing %" PRIuGRUB_UINT64_T "\n",
+ nstripes, failed_devices);
+ ret = GRUB_ERR_READ_ERROR;
+ goto cleanup;
+ }
else
grub_dprintf ("btrfs",
"enough disks for RAID 5 rebuilding: total %"
@@ -797,7 +832,7 @@ raid56_read_retry (struct grub_btrfs_data *data,
if (chunk_type & GRUB_BTRFS_CHUNK_TYPE_RAID5)
rebuild_raid5 (buf, buffers, nstripes, csize);
else
- grub_dprintf ("btrfs", "called rebuild_raid6(), NOT IMPLEMENTED\n");
+ rebuild_raid6 (buffers, nstripes, csize, parities_pos, buf, stripen);

cleanup:
if (buffers)
@@ -886,9 +921,11 @@ grub_btrfs_read_logical (struct grub_btrfs_data *data, grub_disk_addr_t addr,
unsigned redundancy = 1;
unsigned i, j;
int is_raid56;
+ grub_uint64_t parities_pos = 0;

- is_raid56 = !!(grub_le_to_cpu64 (chunk->type) &
- GRUB_BTRFS_CHUNK_TYPE_RAID5);
+ is_raid56 = !!(grub_le_to_cpu64 (chunk->type) &
+ (GRUB_BTRFS_CHUNK_TYPE_RAID5 |
+ GRUB_BTRFS_CHUNK_TYPE_RAID6));

if (grub_le_to_cpu64 (chunk->size) <= off)
{
@@ -1015,6 +1052,8 @@ grub_btrfs_read_logical (struct grub_btrfs_data *data, grub_disk_addr_t addr,
* - stripe_offset is the disk offset,
* - csize is the "potential" data to read. It will be reduced to
* size if the latter is smaller.
+ * - parities_pos is the position of the parity inside a row (
+ * 2 for P1, 3 for P2...)
*/
block_nr = grub_divmod64 (off, chunk_stripe_length, &low);

@@ -1030,6 +1069,9 @@ grub_btrfs_read_logical (struct grub_btrfs_data *data, grub_disk_addr_t addr,
*/
grub_divmod64 (high + stripen, nstripes, &stripen);

+ grub_divmod64 (high + nstripes - nparities, nstripes,
+ &parities_pos);
+
stripe_offset = low + chunk_stripe_length * high;
csize = chunk_stripe_length - low;

@@ -1081,7 +1123,7 @@ grub_btrfs_read_logical (struct grub_btrfs_data *data, grub_disk_addr_t addr,
grub_errno = GRUB_ERR_NONE;
if (err != GRUB_ERR_NONE)
err = raid56_read_retry (data, chunk, stripe_offset,
- csize, buf);
+ stripen, csize, buf, parities_pos);
}
if (err == GRUB_ERR_NONE)
break;
--
2.19.0
Daniel Kiper
2018-09-25 19:20:46 UTC
Permalink
Post by Goffredo Baroncelli
Add the RAID 6 recovery, in order to use a RAID 6 filesystem even if some
disks (up to two) are missing. This code use the md RAID 6 code already
present in grub.
---
grub-core/fs/btrfs.c | 54 +++++++++++++++++++++++++++++++++++++++-----
1 file changed, 48 insertions(+), 6 deletions(-)
diff --git a/grub-core/fs/btrfs.c b/grub-core/fs/btrfs.c
index 55a7eeffc..400cd56b6 100644
--- a/grub-core/fs/btrfs.c
+++ b/grub-core/fs/btrfs.c
@@ -30,6 +30,7 @@
#include <grub/i18n.h>
#include <grub/btrfs.h>
#include <grub/crypto.h>
+#include <grub/diskfilter.h>
GRUB_MOD_LICENSE ("GPLv3+");
@@ -705,11 +706,36 @@ rebuild_raid5 (char *dest, struct raid56_buffer *buffers,
}
}
+static grub_err_t
+raid6_recover_read_buffer (void *data, int disk_nr,
+ grub_uint64_t addr __attribute__ ((unused)),
+ void *dest, grub_size_t size)
+{
+ struct raid56_buffer *buffers = data;
+
+ if (!buffers[disk_nr].data_is_valid)
+ return grub_errno = GRUB_ERR_READ_ERROR;
+
+ grub_memcpy(dest, buffers[disk_nr].buf, size);
+
+ return grub_errno = GRUB_ERR_NONE;
+}
+
+static void
+rebuild_raid6 (struct raid56_buffer *buffers, grub_uint64_t nstripes,
+ grub_uint64_t csize, grub_uint64_t parities_pos, void *dest,
+ grub_uint64_t stripen)
+
+{
+ grub_raid6_recover_gen (buffers, nstripes, stripen, parities_pos,
+ dest, 0, csize, 0, raid6_recover_read_buffer);
+}
+
static grub_err_t
raid56_read_retry (struct grub_btrfs_data *data,
struct grub_btrfs_chunk_item *chunk,
- grub_uint64_t stripe_offset,
- grub_uint64_t csize, void *buf)
+ grub_uint64_t stripe_offset, grub_uint64_t stripen,
+ grub_uint64_t csize, void *buf, grub_uint64_t parities_pos)
{
struct raid56_buffer *buffers;
grub_uint64_t nstripes = grub_le_to_cpu16 (chunk->nstripes);
@@ -787,6 +813,15 @@ raid56_read_retry (struct grub_btrfs_data *data,
ret = GRUB_ERR_READ_ERROR;
goto cleanup;
}
+ else if (failed_devices > 2 && (chunk_type & GRUB_BTRFS_CHUNK_TYPE_RAID6))
+ {
+ grub_dprintf ("btrfs",
+ "not enough disks for raid6: total %" PRIuGRUB_UINT64_T
+ ", missing %" PRIuGRUB_UINT64_T "\n",
+ nstripes, failed_devices);
+ ret = GRUB_ERR_READ_ERROR;
+ goto cleanup;
+ }
else
grub_dprintf ("btrfs",
"enough disks for RAID 5 rebuilding: total %"
@@ -797,7 +832,7 @@ raid56_read_retry (struct grub_btrfs_data *data,
if (chunk_type & GRUB_BTRFS_CHUNK_TYPE_RAID5)
rebuild_raid5 (buf, buffers, nstripes, csize);
else
- grub_dprintf ("btrfs", "called rebuild_raid6(), NOT IMPLEMENTED\n");
+ rebuild_raid6 (buffers, nstripes, csize, parities_pos, buf, stripen);
if (buffers)
@@ -886,9 +921,11 @@ grub_btrfs_read_logical (struct grub_btrfs_data *data, grub_disk_addr_t addr,
unsigned redundancy = 1;
unsigned i, j;
int is_raid56;
+ grub_uint64_t parities_pos = 0;
- is_raid56 = !!(grub_le_to_cpu64 (chunk->type) &
- GRUB_BTRFS_CHUNK_TYPE_RAID5);
+ is_raid56 = !!(grub_le_to_cpu64 (chunk->type) &
+ (GRUB_BTRFS_CHUNK_TYPE_RAID5 |
+ GRUB_BTRFS_CHUNK_TYPE_RAID6));
if (grub_le_to_cpu64 (chunk->size) <= off)
{
@@ -1015,6 +1052,8 @@ grub_btrfs_read_logical (struct grub_btrfs_data *data, grub_disk_addr_t addr,
* - stripe_offset is the disk offset,
* - csize is the "potential" data to read. It will be reduced to
* size if the latter is smaller.
+ * - parities_pos is the position of the parity inside a row (
s/inside/in/
Post by Goffredo Baroncelli
+ * 2 for P1, 3 for P2...)
*/
block_nr = grub_divmod64 (off, chunk_stripe_length, &low);
@@ -1030,6 +1069,9 @@ grub_btrfs_read_logical (struct grub_btrfs_data *data, grub_disk_addr_t addr,
*/
grub_divmod64 (high + stripen, nstripes, &stripen);
+ grub_divmod64 (high + nstripes - nparities, nstripes,
+ &parities_pos);
I think that this math requires a bit of explanation in the comment
before grub_divmod64(). Especially I am interested in why high +
nstripes - nparities works as expected.

Daniel
Goffredo Baroncelli
2018-09-26 19:56:07 UTC
Permalink
[....]
Post by Goffredo Baroncelli
* - stripe_offset is the disk offset,
* - csize is the "potential" data to read. It will be reduced to
* size if the latter is smaller.
+ * - parities_pos is the position of the parity inside a row (
s/inside/in/>
Post by Goffredo Baroncelli
+ * 2 for P1, 3 for P2...)
+ * - nparities is the number of parities (1 for RAID5, 2 for RAID6);
+ * used only in RAID5/6 code.
Post by Goffredo Baroncelli
*/
block_nr = grub_divmod64 (off, chunk_stripe_length, &low);
@@ -1030,6 +1069,9 @@ grub_btrfs_read_logical (struct grub_btrfs_data *data, grub_disk_addr_t addr,
*/
grub_divmod64 (high + stripen, nstripes, &stripen);
+ grub_divmod64 (high + nstripes - nparities, nstripes,
+ &parities_pos);
I think that this math requires a bit of explanation in the comment
before grub_divmod64(). Especially I am interested in why high +
nstripes - nparities works as expected.
What about

/*
* parities_pos is equal to "(high - nparities) % nstripes" (see the diagram above).
* However "high - nparities" might be negative (eg when high == 0) leading to an
* incorrect computation.
* Instead "high + nstripes - nparities" is always positive and in modulo nstripes is
* equal to "(high - nparities) % nstripes
*/
Daniel
BR
G.Baroncelli
--
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5
Daniel Kiper
2018-09-27 16:20:38 UTC
Permalink
Post by Goffredo Baroncelli
[....]
Post by Goffredo Baroncelli
* - stripe_offset is the disk offset,
* - csize is the "potential" data to read. It will be reduced to
* size if the latter is smaller.
+ * - parities_pos is the position of the parity inside a row (
s/inside/in/>
Post by Goffredo Baroncelli
+ * 2 for P1, 3 for P2...)
+ * - nparities is the number of parities (1 for RAID5, 2 for RAID6);
+ * used only in RAID5/6 code.
Post by Goffredo Baroncelli
*/
block_nr = grub_divmod64 (off, chunk_stripe_length, &low);
@@ -1030,6 +1069,9 @@ grub_btrfs_read_logical (struct grub_btrfs_data *data, grub_disk_addr_t addr,
*/
grub_divmod64 (high + stripen, nstripes, &stripen);
+ grub_divmod64 (high + nstripes - nparities, nstripes,
+ &parities_pos);
I think that this math requires a bit of explanation in the comment
before grub_divmod64(). Especially I am interested in why high +
nstripes - nparities works as expected.
What about
/*
* parities_pos is equal to "(high - nparities) % nstripes" (see the diagram above).
* However "high - nparities" might be negative (eg when high == 0) leading to an
* incorrect computation.
* Instead "high + nstripes - nparities" is always positive and in modulo nstripes is
* equal to "(high - nparities) % nstripes
*/
LGTM.

Daniel
Goffredo Baroncelli
2018-09-19 18:40:37 UTC
Permalink
From: Goffredo Baroncelli <***@inwind.it>

Move the code in charge to read the data from disk into a separate
function. This helps to separate the error handling logic (which depends on
the different raid profiles) from the read from disk logic.
Refactoring this code increases the general readability too.

This is a preparatory patch, to help the adding of the RAID 5/6 recovery
code.

Signed-off-by: Goffredo Baroncelli <***@inwind.it>
Reviewed-by: Daniel Kiper <***@oracle.com>
---
grub-core/fs/btrfs.c | 75 ++++++++++++++++++++++++++------------------
1 file changed, 44 insertions(+), 31 deletions(-)

diff --git a/grub-core/fs/btrfs.c b/grub-core/fs/btrfs.c
index ee134c167..5c1ebae77 100644
--- a/grub-core/fs/btrfs.c
+++ b/grub-core/fs/btrfs.c
@@ -625,6 +625,46 @@ find_device (struct grub_btrfs_data *data, grub_uint64_t id)
return ctx.dev_found;
}

+static grub_err_t
+btrfs_read_from_chunk (struct grub_btrfs_data *data,
+ struct grub_btrfs_chunk_item *chunk,
+ grub_uint64_t stripen, grub_uint64_t stripe_offset,
+ int redundancy, grub_uint64_t csize,
+ void *buf)
+{
+ struct grub_btrfs_chunk_stripe *stripe;
+ grub_disk_addr_t paddr;
+ grub_device_t dev;
+ grub_err_t err;
+
+ stripe = (struct grub_btrfs_chunk_stripe *) (chunk + 1);
+ /* Right now the redundancy handling is easy.
+ With RAID5-like it will be more difficult. */
+ stripe += stripen + redundancy;
+
+ paddr = grub_le_to_cpu64 (stripe->offset) + stripe_offset;
+
+ grub_dprintf ("btrfs", "stripe %" PRIxGRUB_UINT64_T
+ " maps to 0x%" PRIxGRUB_UINT64_T
+ ". Reading paddr 0x%" PRIxGRUB_UINT64_T "\n",
+ stripen, stripe->offset, paddr);
+
+ dev = find_device (data, stripe->device_id);
+ if (!dev)
+ {
+ grub_dprintf ("btrfs",
+ "couldn't find a necessary member device "
+ "of multi-device filesystem\n");
+ grub_errno = GRUB_ERR_NONE;
+ return GRUB_ERR_READ_ERROR;
+ }
+
+ err = grub_disk_read (dev->disk, paddr >> GRUB_DISK_SECTOR_BITS,
+ paddr & (GRUB_DISK_SECTOR_SIZE - 1),
+ csize, buf);
+ return err;
+}
+
static grub_err_t
grub_btrfs_read_logical (struct grub_btrfs_data *data, grub_disk_addr_t addr,
void *buf, grub_size_t size, int recursion_depth)
@@ -638,7 +678,6 @@ grub_btrfs_read_logical (struct grub_btrfs_data *data, grub_disk_addr_t addr,
grub_err_t err = 0;
struct grub_btrfs_key key_out;
int challoc = 0;
- grub_device_t dev;
struct grub_btrfs_key key_in;
grub_size_t chsize;
grub_disk_addr_t chaddr;
@@ -877,36 +916,10 @@ grub_btrfs_read_logical (struct grub_btrfs_data *data, grub_disk_addr_t addr,

for (i = 0; i < redundancy; i++)
{
- struct grub_btrfs_chunk_stripe *stripe;
- grub_disk_addr_t paddr;
-
- stripe = (struct grub_btrfs_chunk_stripe *) (chunk + 1);
- /* Right now the redundancy handling is easy.
- With RAID5-like it will be more difficult. */
- stripe += stripen + i;
-
- paddr = grub_le_to_cpu64 (stripe->offset) + stripe_offset;
-
- grub_dprintf ("btrfs", "stripe %" PRIxGRUB_UINT64_T
- " maps to 0x%" PRIxGRUB_UINT64_T "\n",
- stripen, stripe->offset);
- grub_dprintf ("btrfs", "reading paddr 0x%" PRIxGRUB_UINT64_T
- "\n", paddr);
-
- dev = find_device (data, stripe->device_id);
- if (!dev)
- {
- grub_dprintf ("btrfs",
- "couldn't find a necessary member device "
- "of multi-device filesystem\n");
- err = grub_errno;
- grub_errno = GRUB_ERR_NONE;
- continue;
- }
-
- err = grub_disk_read (dev->disk, paddr >> GRUB_DISK_SECTOR_BITS,
- paddr & (GRUB_DISK_SECTOR_SIZE - 1),
- csize, buf);
+ err = btrfs_read_from_chunk (data, chunk, stripen,
+ stripe_offset,
+ i, /* redundancy */
+ csize, buf);
if (!err)
break;
grub_errno = GRUB_ERR_NONE;
--
2.19.0
Goffredo Baroncelli
2018-09-19 18:40:33 UTC
Permalink
From: Goffredo Baroncelli <***@inwind.it>

This helper is used in a few places to help the debugging. As
conservative approach the error is only logged.
This does not impact the error handling.

Signed-off-by: Goffredo Baroncelli <***@inwind.it>
Reviewed-by: Daniel Kiper <***@oracle.com>
---
grub-core/fs/btrfs.c | 24 +++++++++++++++++++++++-
1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/grub-core/fs/btrfs.c b/grub-core/fs/btrfs.c
index 56c42746d..4f404f4b2 100644
--- a/grub-core/fs/btrfs.c
+++ b/grub-core/fs/btrfs.c
@@ -77,7 +77,8 @@ struct btrfs_header
{
grub_btrfs_checksum_t checksum;
grub_btrfs_uuid_t uuid;
- grub_uint8_t dummy[0x30];
+ grub_uint64_t bytenr;
+ grub_uint8_t dummy[0x28];
grub_uint32_t nitems;
grub_uint8_t level;
} GRUB_PACKED;
@@ -286,6 +287,25 @@ free_iterator (struct grub_btrfs_leaf_descriptor *desc)
grub_free (desc->data);
}

+static grub_err_t
+check_btrfs_header (struct grub_btrfs_data *data, struct btrfs_header *header,
+ grub_disk_addr_t addr)
+{
+ if (grub_le_to_cpu64 (header->bytenr) != addr)
+ {
+ grub_dprintf ("btrfs", "btrfs_header.bytenr is not equal node addr\n");
+ return grub_error (GRUB_ERR_BAD_FS,
+ "header bytenr is not equal node addr");
+ }
+ if (grub_memcmp (data->sblock.uuid, header->uuid, sizeof(grub_btrfs_uuid_t)))
+ {
+ grub_dprintf ("btrfs", "btrfs_header.uuid doesn't match sblock uuid\n");
+ return grub_error (GRUB_ERR_BAD_FS,
+ "header uuid doesn't match sblock uuid");
+ }
+ return GRUB_ERR_NONE;
+}
+
static grub_err_t
save_ref (struct grub_btrfs_leaf_descriptor *desc,
grub_disk_addr_t addr, unsigned i, unsigned m, int l)
@@ -341,6 +361,7 @@ next (struct grub_btrfs_data *data,

err = grub_btrfs_read_logical (data, grub_le_to_cpu64 (node.addr),
&head, sizeof (head), 0);
+ check_btrfs_header (data, &head, grub_le_to_cpu64 (node.addr));
if (err)
return -err;

@@ -402,6 +423,7 @@ lower_bound (struct grub_btrfs_data *data,
/* FIXME: preread few nodes into buffer. */
err = grub_btrfs_read_logical (data, addr, &head, sizeof (head),
recursion_depth + 1);
+ check_btrfs_header (data, &head, addr);
if (err)
return err;
addr += sizeof (head);
--
2.19.0
Goffredo Baroncelli
2018-09-19 18:40:34 UTC
Permalink
From: Goffredo Baroncelli <***@inwind.it>

The caller knows better if this error is fatal or not, i.e. another disk is
available or not.

This is a preparatory patch.

Signed-off-by: Goffredo Baroncelli <***@inwind.it>
Reviewed-by: Daniel Kiper <***@oracle.com>
---
grub-core/fs/btrfs.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/grub-core/fs/btrfs.c b/grub-core/fs/btrfs.c
index 4f404f4b2..0cdfaf7c0 100644
--- a/grub-core/fs/btrfs.c
+++ b/grub-core/fs/btrfs.c
@@ -604,9 +604,6 @@ find_device (struct grub_btrfs_data *data, grub_uint64_t id, int do_rescan)
grub_device_iterate (find_device_iter, &ctx);
if (!ctx.dev_found)
{
- grub_error (GRUB_ERR_BAD_FS,
- N_("couldn't find a necessary member device "
- "of multi-device filesystem"));
return NULL;
}
data->n_devices_attached++;
@@ -898,6 +895,9 @@ grub_btrfs_read_logical (struct grub_btrfs_data *data, grub_disk_addr_t addr,
dev = find_device (data, stripe->device_id, j);
if (!dev)
{
+ grub_dprintf ("btrfs",
+ "couldn't find a necessary member device "
+ "of multi-device filesystem\n");
err = grub_errno;
grub_errno = GRUB_ERR_NONE;
continue;
--
2.19.0
Daniel Kiper
2018-09-25 17:23:56 UTC
Permalink
Post by Goffredo Baroncelli
The caller knows better if this error is fatal or not, i.e. another disk is
available or not.
This is a preparatory patch.
---
grub-core/fs/btrfs.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/grub-core/fs/btrfs.c b/grub-core/fs/btrfs.c
index 4f404f4b2..0cdfaf7c0 100644
--- a/grub-core/fs/btrfs.c
+++ b/grub-core/fs/btrfs.c
@@ -604,9 +604,6 @@ find_device (struct grub_btrfs_data *data, grub_uint64_t id, int do_rescan)
grub_device_iterate (find_device_iter, &ctx);
if (!ctx.dev_found)
{
- grub_error (GRUB_ERR_BAD_FS,
- N_("couldn't find a necessary member device "
- "of multi-device filesystem"));
return NULL;
}
I think that you can drop curly brackets too.
If you do that you can retain my Reviewed-by.

Daniel
Goffredo Baroncelli
2018-09-19 18:40:35 UTC
Permalink
From: Goffredo Baroncelli <***@inwind.it>

If a device is not found, do not return immediately but
record this failure by storing NULL in data->devices_attached[].
This way we avoid unnecessary devices rescan, and speedup the
reads in case of a degraded array.

Signed-off-by: Goffredo Baroncelli <***@inwind.it>
---
grub-core/fs/btrfs.c | 19 +++++++++----------
1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/grub-core/fs/btrfs.c b/grub-core/fs/btrfs.c
index 0cdfaf7c0..6e42c33f6 100644
--- a/grub-core/fs/btrfs.c
+++ b/grub-core/fs/btrfs.c
@@ -588,7 +588,7 @@ find_device_iter (const char *name, void *data)
}

static grub_device_t
-find_device (struct grub_btrfs_data *data, grub_uint64_t id, int do_rescan)
+find_device (struct grub_btrfs_data *data, grub_uint64_t id)
{
struct find_device_ctx ctx = {
.data = data,
@@ -600,12 +600,9 @@ find_device (struct grub_btrfs_data *data, grub_uint64_t id, int do_rescan)
for (i = 0; i < data->n_devices_attached; i++)
if (id == data->devices_attached[i].id)
return data->devices_attached[i].dev;
- if (do_rescan)
- grub_device_iterate (find_device_iter, &ctx);
- if (!ctx.dev_found)
- {
- return NULL;
- }
+
+ grub_device_iterate (find_device_iter, &ctx);
+
data->n_devices_attached++;
if (data->n_devices_attached > data->n_devices_allocated)
{
@@ -617,7 +614,8 @@ find_device (struct grub_btrfs_data *data, grub_uint64_t id, int do_rescan)
* sizeof (data->devices_attached[0]));
if (!data->devices_attached)
{
- grub_device_close (ctx.dev_found);
+ if (ctx.dev_found)
+ grub_device_close (ctx.dev_found);
data->devices_attached = tmp;
return NULL;
}
@@ -892,7 +890,7 @@ grub_btrfs_read_logical (struct grub_btrfs_data *data, grub_disk_addr_t addr,
" for laddr 0x%" PRIxGRUB_UINT64_T "\n", paddr,
addr);

- dev = find_device (data, stripe->device_id, j);
+ dev = find_device (data, stripe->device_id);
if (!dev)
{
grub_dprintf ("btrfs",
@@ -969,7 +967,8 @@ grub_btrfs_unmount (struct grub_btrfs_data *data)
unsigned i;
/* The device 0 is closed one layer upper. */
for (i = 1; i < data->n_devices_attached; i++)
- grub_device_close (data->devices_attached[i].dev);
+ if (data->devices_attached[i].dev)
+ grub_device_close (data->devices_attached[i].dev);
grub_free (data->devices_attached);
grub_free (data->extent);
grub_free (data);
--
2.19.0
Daniel Kiper
2018-09-25 17:29:31 UTC
Permalink
Post by Goffredo Baroncelli
If a device is not found, do not return immediately but
record this failure by storing NULL in data->devices_attached[].
Still the same question: Where the store happens in the code?
I cannot find it in the patch below. This have to be clarified.

Daniel
Post by Goffredo Baroncelli
This way we avoid unnecessary devices rescan, and speedup the
reads in case of a degraded array.
---
grub-core/fs/btrfs.c | 19 +++++++++----------
1 file changed, 9 insertions(+), 10 deletions(-)
diff --git a/grub-core/fs/btrfs.c b/grub-core/fs/btrfs.c
index 0cdfaf7c0..6e42c33f6 100644
--- a/grub-core/fs/btrfs.c
+++ b/grub-core/fs/btrfs.c
@@ -588,7 +588,7 @@ find_device_iter (const char *name, void *data)
}
static grub_device_t
-find_device (struct grub_btrfs_data *data, grub_uint64_t id, int do_rescan)
+find_device (struct grub_btrfs_data *data, grub_uint64_t id)
{
struct find_device_ctx ctx = {
.data = data,
@@ -600,12 +600,9 @@ find_device (struct grub_btrfs_data *data, grub_uint64_t id, int do_rescan)
for (i = 0; i < data->n_devices_attached; i++)
if (id == data->devices_attached[i].id)
return data->devices_attached[i].dev;
- if (do_rescan)
- grub_device_iterate (find_device_iter, &ctx);
- if (!ctx.dev_found)
- {
- return NULL;
- }
+
+ grub_device_iterate (find_device_iter, &ctx);
+
data->n_devices_attached++;
if (data->n_devices_attached > data->n_devices_allocated)
{
@@ -617,7 +614,8 @@ find_device (struct grub_btrfs_data *data, grub_uint64_t id, int do_rescan)
* sizeof (data->devices_attached[0]));
if (!data->devices_attached)
{
- grub_device_close (ctx.dev_found);
+ if (ctx.dev_found)
+ grub_device_close (ctx.dev_found);
data->devices_attached = tmp;
return NULL;
}
@@ -892,7 +890,7 @@ grub_btrfs_read_logical (struct grub_btrfs_data *data, grub_disk_addr_t addr,
" for laddr 0x%" PRIxGRUB_UINT64_T "\n", paddr,
addr);
- dev = find_device (data, stripe->device_id, j);
+ dev = find_device (data, stripe->device_id);
if (!dev)
{
grub_dprintf ("btrfs",
@@ -969,7 +967,8 @@ grub_btrfs_unmount (struct grub_btrfs_data *data)
unsigned i;
/* The device 0 is closed one layer upper. */
for (i = 1; i < data->n_devices_attached; i++)
- grub_device_close (data->devices_attached[i].dev);
+ if (data->devices_attached[i].dev)
+ grub_device_close (data->devices_attached[i].dev);
grub_free (data->devices_attached);
grub_free (data->extent);
grub_free (data);
--
2.19.0
Goffredo Baroncelli
2018-09-26 19:55:54 UTC
Permalink
Post by Daniel Kiper
Post by Goffredo Baroncelli
If a device is not found, do not return immediately but
record this failure by storing NULL in data->devices_attached[].
Still the same question: Where the store happens in the code?
I cannot find it in the patch below. This have to be clarified.
Daniel
What about the following commit description
---------------------------------------------
Change the behavior of find_device(): before the patch, a read of a missed device might trigger a rescan. However, it is never recorded that a device is missed, so each single read of a missed device might triggers a rescan.
It is the caller who decides if a rescan is performed in case of a missed device. And it does quite often, without considering if in the past a devices was already found as "missed"
This behavior causes a lot of unneeded rescan, causing a huge slowdown in case of a missed device.

After the patch, the "missed device" information is stored in the cache (as a NULL value). A rescan is triggered only if no information at all is found in the cache. This means that only the first time a read of a missed device triggers a rescan.

The change in the code is done removing "return NULL" when the disk is not found. So it is always executed the code which stores in the cache the value returned by grub_device_iterate(): NULL if the device is missed, or a valid data otherwise.
---------------------------------------------
Post by Daniel Kiper
Post by Goffredo Baroncelli
This way we avoid unnecessary devices rescan, and speedup the
reads in case of a degraded array.
---
grub-core/fs/btrfs.c | 19 +++++++++----------
1 file changed, 9 insertions(+), 10 deletions(-)
diff --git a/grub-core/fs/btrfs.c b/grub-core/fs/btrfs.c
index 0cdfaf7c0..6e42c33f6 100644
--- a/grub-core/fs/btrfs.c
+++ b/grub-core/fs/btrfs.c
@@ -588,7 +588,7 @@ find_device_iter (const char *name, void *data)
}
static grub_device_t
-find_device (struct grub_btrfs_data *data, grub_uint64_t id, int do_rescan)
+find_device (struct grub_btrfs_data *data, grub_uint64_t id)
{
struct find_device_ctx ctx = {
.data = data,
@@ -600,12 +600,9 @@ find_device (struct grub_btrfs_data *data, grub_uint64_t id, int do_rescan)
for (i = 0; i < data->n_devices_attached; i++)
if (id == data->devices_attached[i].id)
return data->devices_attached[i].dev;
- if (do_rescan)
- grub_device_iterate (find_device_iter, &ctx);
- if (!ctx.dev_found)
- {
- return NULL;
- }
+
+ grub_device_iterate (find_device_iter, &ctx);
+
data->n_devices_attached++;
if (data->n_devices_attached > data->n_devices_allocated)
{
@@ -617,7 +614,8 @@ find_device (struct grub_btrfs_data *data, grub_uint64_t id, int do_rescan)
* sizeof (data->devices_attached[0]));
if (!data->devices_attached)
{
- grub_device_close (ctx.dev_found);
+ if (ctx.dev_found)
+ grub_device_close (ctx.dev_found);
data->devices_attached = tmp;
return NULL;
}
@@ -892,7 +890,7 @@ grub_btrfs_read_logical (struct grub_btrfs_data *data, grub_disk_addr_t addr,
" for laddr 0x%" PRIxGRUB_UINT64_T "\n", paddr,
addr);
- dev = find_device (data, stripe->device_id, j);
+ dev = find_device (data, stripe->device_id);
if (!dev)
{
grub_dprintf ("btrfs",
@@ -969,7 +967,8 @@ grub_btrfs_unmount (struct grub_btrfs_data *data)
unsigned i;
/* The device 0 is closed one layer upper. */
for (i = 1; i < data->n_devices_attached; i++)
- grub_device_close (data->devices_attached[i].dev);
+ if (data->devices_attached[i].dev)
+ grub_device_close (data->devices_attached[i].dev);
grub_free (data->devices_attached);
grub_free (data->extent);
grub_free (data);
--
2.19.0
--
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5
Daniel Kiper
2018-09-27 16:03:26 UTC
Permalink
Post by Goffredo Baroncelli
Post by Daniel Kiper
Post by Goffredo Baroncelli
If a device is not found, do not return immediately but
record this failure by storing NULL in data->devices_attached[].
Still the same question: Where the store happens in the code?
I cannot find it in the patch below. This have to be clarified.
Daniel
What about the following commit description
---------------------------------------------
Change the behavior of find_device(): before the patch, a read of a
missed device might trigger a rescan. However, it is never recorded
s/might/may/
Post by Goffredo Baroncelli
that a device is missed, so each single read of a missed device might
triggers a rescan. It is the caller who decides if a rescan is
performed in case of a missed device. And it does quite often, without
considering if in the past a devices was already found as "missed"
This behavior causes a lot of unneeded rescan, causing a huge slowdown
in case of a missed device.
After the patch, the "missed device" information is stored in the
cache (as a NULL value). A rescan is triggered only if no information
What do you mean by "cache"? ctx.dev_found? If yes please use latter
instead of former. Or both together if it makes sense.
Post by Goffredo Baroncelli
at all is found in the cache. This means that only the first time a
read of a missed device triggers a rescan.
The change in the code is done removing "return NULL" when the disk is
not found. So it is always executed the code which stores in the cache
cache?
Post by Goffredo Baroncelli
the value returned by grub_device_iterate(): NULL if the device is
missed, or a valid data otherwise.
---------------------------------------------
Otherwise it is much better than earlier one.

Daniel
Goffredo Baroncelli
2018-09-19 18:40:38 UTC
Permalink
From: Goffredo Baroncelli <***@inwind.it>

Add support for recovery for a RAID 5 btrfs profile. In addition
it is added some code as preparatory work for RAID 6 recovery code.

Signed-off-by: Goffredo Baroncelli <***@inwind.it>
---
grub-core/fs/btrfs.c | 169 +++++++++++++++++++++++++++++++++++++++++--
1 file changed, 164 insertions(+), 5 deletions(-)

diff --git a/grub-core/fs/btrfs.c b/grub-core/fs/btrfs.c
index 5c1ebae77..55a7eeffc 100644
--- a/grub-core/fs/btrfs.c
+++ b/grub-core/fs/btrfs.c
@@ -29,6 +29,7 @@
#include <minilzo.h>
#include <grub/i18n.h>
#include <grub/btrfs.h>
+#include <grub/crypto.h>

GRUB_MOD_LICENSE ("GPLv3+");

@@ -665,6 +666,148 @@ btrfs_read_from_chunk (struct grub_btrfs_data *data,
return err;
}

+struct raid56_buffer {
+ void *buf;
+ int data_is_valid;
+};
+
+static void
+rebuild_raid5 (char *dest, struct raid56_buffer *buffers,
+ grub_uint64_t nstripes, grub_uint64_t csize)
+{
+ grub_uint64_t i;
+ int first;
+
+ i = 0;
+ while (buffers[i].data_is_valid && i < nstripes)
+ ++i;
+
+ if (i == nstripes)
+ {
+ grub_dprintf ("btrfs", "called rebuild_raid5(), but all disks are OK\n");
+ return;
+ }
+
+ grub_dprintf ("btrfs", "rebuilding RAID 5 stripe #%" PRIuGRUB_UINT64_T "\n",
+ i);
+
+ for (i = 0, first = 1; i < nstripes; i++)
+ {
+ if (!buffers[i].data_is_valid)
+ continue;
+
+ if (first) {
+ grub_memcpy(dest, buffers[i].buf, csize);
+ first = 0;
+ } else
+ grub_crypto_xor (dest, dest, buffers[i].buf, csize);
+
+ }
+}
+
+static grub_err_t
+raid56_read_retry (struct grub_btrfs_data *data,
+ struct grub_btrfs_chunk_item *chunk,
+ grub_uint64_t stripe_offset,
+ grub_uint64_t csize, void *buf)
+{
+ struct raid56_buffer *buffers;
+ grub_uint64_t nstripes = grub_le_to_cpu16 (chunk->nstripes);
+ grub_uint64_t chunk_type = grub_le_to_cpu64 (chunk->type);
+ grub_err_t ret = GRUB_ERR_NONE;
+ grub_uint64_t i, failed_devices;
+
+ buffers = grub_zalloc (sizeof(*buffers) * nstripes);
+ if (!buffers)
+ {
+ ret = GRUB_ERR_OUT_OF_MEMORY;
+ goto cleanup;
+ }
+
+ for (i = 0; i < nstripes; i++)
+ {
+ buffers[i].buf = grub_zalloc (csize);
+ if (!buffers[i].buf)
+ {
+ ret = GRUB_ERR_OUT_OF_MEMORY;
+ goto cleanup;
+ }
+ }
+
+ for (failed_devices = 0, i = 0; i < nstripes; i++)
+ {
+ struct grub_btrfs_chunk_stripe *stripe;
+ grub_disk_addr_t paddr;
+ grub_device_t dev;
+ grub_err_t err2;
+
+ stripe = (struct grub_btrfs_chunk_stripe *) (chunk + 1);
+ stripe += i;
+
+ paddr = grub_le_to_cpu64 (stripe->offset) + stripe_offset;
+ grub_dprintf ("btrfs", "reading paddr %" PRIxGRUB_UINT64_T
+ " from stripe ID %" PRIxGRUB_UINT64_T "\n", paddr,
+ stripe->device_id);
+
+ dev = find_device (data, stripe->device_id);
+ if (!dev)
+ {
+ buffers[i].data_is_valid = 0;
+ grub_dprintf ("btrfs", "stripe %" PRIuGRUB_UINT64_T " FAILED (dev ID %"
+ PRIxGRUB_UINT64_T ")\n", i, stripe->device_id);
+ failed_devices++;
+ continue;
+ }
+
+ err2 = grub_disk_read (dev->disk, paddr >> GRUB_DISK_SECTOR_BITS,
+ paddr & (GRUB_DISK_SECTOR_SIZE - 1),
+ csize, buffers[i].buf);
+ if (err2 == GRUB_ERR_NONE)
+ {
+ buffers[i].data_is_valid = 1;
+ grub_dprintf ("btrfs", "stripe %" PRIuGRUB_UINT64_T " Ok (dev ID %"
+ PRIxGRUB_UINT64_T ")\n", i, stripe->device_id);
+ }
+ else
+ {
+ buffers[i].data_is_valid = 0;
+ grub_dprintf ("btrfs", "stripe %" PRIuGRUB_UINT64_T
+ " FAILED (dev ID %" PRIxGRUB_UINT64_T ")\n", i,
+ stripe->device_id);
+ failed_devices++;
+ }
+ }
+
+ if (failed_devices > 1 && (chunk_type & GRUB_BTRFS_CHUNK_TYPE_RAID5))
+ {
+ grub_dprintf ("btrfs",
+ "not enough disks for RAID 5: total %" PRIuGRUB_UINT64_T
+ ", missing %" PRIuGRUB_UINT64_T "\n",
+ nstripes, failed_devices);
+ ret = GRUB_ERR_READ_ERROR;
+ goto cleanup;
+ }
+ else
+ grub_dprintf ("btrfs",
+ "enough disks for RAID 5 rebuilding: total %"
+ PRIuGRUB_UINT64_T ", missing %" PRIuGRUB_UINT64_T "\n",
+ nstripes, failed_devices);
+
+ /* if these are enough, try to rebuild the data */
+ if (chunk_type & GRUB_BTRFS_CHUNK_TYPE_RAID5)
+ rebuild_raid5 (buf, buffers, nstripes, csize);
+ else
+ grub_dprintf ("btrfs", "called rebuild_raid6(), NOT IMPLEMENTED\n");
+
+ cleanup:
+ if (buffers)
+ for (i = 0; i < nstripes; i++)
+ grub_free(buffers[i].buf);
+ grub_free(buffers);
+
+ return ret;
+}
+
static grub_err_t
grub_btrfs_read_logical (struct grub_btrfs_data *data, grub_disk_addr_t addr,
void *buf, grub_size_t size, int recursion_depth)
@@ -742,6 +885,10 @@ grub_btrfs_read_logical (struct grub_btrfs_data *data, grub_disk_addr_t addr,
grub_uint16_t nstripes;
unsigned redundancy = 1;
unsigned i, j;
+ int is_raid56;
+
+ is_raid56 = !!(grub_le_to_cpu64 (chunk->type) &
+ GRUB_BTRFS_CHUNK_TYPE_RAID5);

if (grub_le_to_cpu64 (chunk->size) <= off)
{
@@ -914,17 +1061,29 @@ grub_btrfs_read_logical (struct grub_btrfs_data *data, grub_disk_addr_t addr,
grub_dprintf ("btrfs", "reading laddr 0x%" PRIxGRUB_UINT64_T "\n",
addr);

- for (i = 0; i < redundancy; i++)
+ if (!is_raid56)
+ for (i = 0; i < redundancy; i++)
+ {
+ err = btrfs_read_from_chunk (data, chunk, stripen,
+ stripe_offset,
+ i, /* redundancy */
+ csize, buf);
+ if (!err)
+ break;
+ grub_errno = GRUB_ERR_NONE;
+ }
+ else
{
err = btrfs_read_from_chunk (data, chunk, stripen,
stripe_offset,
- i, /* redundancy */
+ 0, /* no mirror */
csize, buf);
- if (!err)
- break;
grub_errno = GRUB_ERR_NONE;
+ if (err != GRUB_ERR_NONE)
+ err = raid56_read_retry (data, chunk, stripe_offset,
+ csize, buf);
}
- if (i != redundancy)
+ if (err == GRUB_ERR_NONE)
break;
}
if (err)
--
2.19.0
Daniel Kiper
2018-09-25 19:10:32 UTC
Permalink
Post by Goffredo Baroncelli
Add support for recovery for a RAID 5 btrfs profile. In addition
it is added some code as preparatory work for RAID 6 recovery code.
---
grub-core/fs/btrfs.c | 169 +++++++++++++++++++++++++++++++++++++++++--
1 file changed, 164 insertions(+), 5 deletions(-)
diff --git a/grub-core/fs/btrfs.c b/grub-core/fs/btrfs.c
index 5c1ebae77..55a7eeffc 100644
--- a/grub-core/fs/btrfs.c
+++ b/grub-core/fs/btrfs.c
@@ -29,6 +29,7 @@
#include <minilzo.h>
#include <grub/i18n.h>
#include <grub/btrfs.h>
+#include <grub/crypto.h>
GRUB_MOD_LICENSE ("GPLv3+");
@@ -665,6 +666,148 @@ btrfs_read_from_chunk (struct grub_btrfs_data *data,
return err;
}
+struct raid56_buffer {
+ void *buf;
+ int data_is_valid;
+};
+
+static void
+rebuild_raid5 (char *dest, struct raid56_buffer *buffers,
+ grub_uint64_t nstripes, grub_uint64_t csize)
+{
+ grub_uint64_t i;
+ int first;
+
+ i = 0;
+ while (buffers[i].data_is_valid && i < nstripes)
+ ++i;
for (i = 0; buffers[i].data_is_valid && i < nstripes; i++);
Post by Goffredo Baroncelli
+ if (i == nstripes)
+ {
+ grub_dprintf ("btrfs", "called rebuild_raid5(), but all disks are OK\n");
+ return;
+ }
+
+ grub_dprintf ("btrfs", "rebuilding RAID 5 stripe #%" PRIuGRUB_UINT64_T "\n",
+ i);
One line here please.
Post by Goffredo Baroncelli
+ for (i = 0, first = 1; i < nstripes; i++)
+ {
+ if (!buffers[i].data_is_valid)
+ continue;
+
+ if (first) {
+ grub_memcpy(dest, buffers[i].buf, csize);
+ first = 0;
+ } else
+ grub_crypto_xor (dest, dest, buffers[i].buf, csize);
+
+ }
Hmmm... I think that this function can be simpler. You can drop first
while/for and "if (i == nstripes)". Then here:

if (first) {
grub_dprintf ("btrfs", "called rebuild_raid5(), but all disks are OK\n");

Am I right?
Post by Goffredo Baroncelli
+}
+
+static grub_err_t
+raid56_read_retry (struct grub_btrfs_data *data,
+ struct grub_btrfs_chunk_item *chunk,
+ grub_uint64_t stripe_offset,
+ grub_uint64_t csize, void *buf)
+{
+ struct raid56_buffer *buffers;
+ grub_uint64_t nstripes = grub_le_to_cpu16 (chunk->nstripes);
+ grub_uint64_t chunk_type = grub_le_to_cpu64 (chunk->type);
+ grub_err_t ret = GRUB_ERR_NONE;
s/GRUB_ERR_NONE/GRUB_ERR_OUT_OF_MEMORY/ and then you can drop at
least two relevant assigments and some curly brackets. Of course
before cleanup label you have to add ret = GRUB_ERR_NONE.
Post by Goffredo Baroncelli
+ grub_uint64_t i, failed_devices;
+
+ buffers = grub_zalloc (sizeof(*buffers) * nstripes);
+ if (!buffers)
+ {
+ ret = GRUB_ERR_OUT_OF_MEMORY;
+ goto cleanup;
+ }
+
+ for (i = 0; i < nstripes; i++)
+ {
+ buffers[i].buf = grub_zalloc (csize);
+ if (!buffers[i].buf)
+ {
+ ret = GRUB_ERR_OUT_OF_MEMORY;
+ goto cleanup;
+ }
+ }
+
+ for (failed_devices = 0, i = 0; i < nstripes; i++)
+ {
+ struct grub_btrfs_chunk_stripe *stripe;
+ grub_disk_addr_t paddr;
+ grub_device_t dev;
+ grub_err_t err2;
s/err2/err/?
Post by Goffredo Baroncelli
+
+ stripe = (struct grub_btrfs_chunk_stripe *) (chunk + 1);
+ stripe += i;
Why not stripe = ((struct grub_btrfs_chunk_stripe *) (chunk + 1)) + i;?

Daniel
Goffredo Baroncelli
2018-09-26 19:55:57 UTC
Permalink
Post by Daniel Kiper
Post by Goffredo Baroncelli
Add support for recovery for a RAID 5 btrfs profile. In addition
it is added some code as preparatory work for RAID 6 recovery code.
---
grub-core/fs/btrfs.c | 169 +++++++++++++++++++++++++++++++++++++++++--
1 file changed, 164 insertions(+), 5 deletions(-)
diff --git a/grub-core/fs/btrfs.c b/grub-core/fs/btrfs.c
index 5c1ebae77..55a7eeffc 100644
--- a/grub-core/fs/btrfs.c
+++ b/grub-core/fs/btrfs.c
@@ -29,6 +29,7 @@
#include <minilzo.h>
#include <grub/i18n.h>
#include <grub/btrfs.h>
+#include <grub/crypto.h>
GRUB_MOD_LICENSE ("GPLv3+");
@@ -665,6 +666,148 @@ btrfs_read_from_chunk (struct grub_btrfs_data *data,
return err;
}
+struct raid56_buffer {
+ void *buf;
+ int data_is_valid;
+};
+
+static void
+rebuild_raid5 (char *dest, struct raid56_buffer *buffers,
+ grub_uint64_t nstripes, grub_uint64_t csize)
+{
+ grub_uint64_t i;
+ int first;
+
+ i = 0;
+ while (buffers[i].data_is_valid && i < nstripes)
+ ++i;
for (i = 0; buffers[i].data_is_valid && i < nstripes; i++);
Post by Goffredo Baroncelli
+ if (i == nstripes)
+ {
+ grub_dprintf ("btrfs", "called rebuild_raid5(), but all disks are OK\n");
+ return;
+ }
+
+ grub_dprintf ("btrfs", "rebuilding RAID 5 stripe #%" PRIuGRUB_UINT64_T "\n",
+ i);
One line here please.
Post by Goffredo Baroncelli
+ for (i = 0, first = 1; i < nstripes; i++)
+ {
+ if (!buffers[i].data_is_valid)
+ continue;
+
+ if (first) {
+ grub_memcpy(dest, buffers[i].buf, csize);
+ first = 0;
+ } else
+ grub_crypto_xor (dest, dest, buffers[i].buf, csize);
+
+ }
Hmmm... I think that this function can be simpler. You can drop first
if (first) {
grub_dprintf ("btrfs", "called rebuild_raid5(), but all disks are OK\n");
Am I right?
Ehm.. no. The "if" is an internal check to avoid BUG. rebuild_raid5() should be called only if some disk is missed.
To perform this control, the code checks if all buffers are valid. Otherwise there is an internal BUG.

Checking "first" is a completely different test.
Post by Daniel Kiper
Post by Goffredo Baroncelli
+}
+
+static grub_err_t
+raid56_read_retry (struct grub_btrfs_data *data,
+ struct grub_btrfs_chunk_item *chunk,
+ grub_uint64_t stripe_offset,
+ grub_uint64_t csize, void *buf)
+{
+ struct raid56_buffer *buffers;
+ grub_uint64_t nstripes = grub_le_to_cpu16 (chunk->nstripes);
+ grub_uint64_t chunk_type = grub_le_to_cpu64 (chunk->type);
+ grub_err_t ret = GRUB_ERR_NONE;
s/GRUB_ERR_NONE/GRUB_ERR_OUT_OF_MEMORY/ and then you can drop at
least two relevant assigments and some curly brackets. Of course
before cleanup label you have to add ret = GRUB_ERR_NONE.
Post by Goffredo Baroncelli
+ grub_uint64_t i, failed_devices;
+
+ buffers = grub_zalloc (sizeof(*buffers) * nstripes);
+ if (!buffers)
+ {
+ ret = GRUB_ERR_OUT_OF_MEMORY;
+ goto cleanup;
+ }
+
+ for (i = 0; i < nstripes; i++)
+ {
+ buffers[i].buf = grub_zalloc (csize);
+ if (!buffers[i].buf)
+ {
+ ret = GRUB_ERR_OUT_OF_MEMORY;
+ goto cleanup;
+ }
+ }
+
+ for (failed_devices = 0, i = 0; i < nstripes; i++)
+ {
+ struct grub_btrfs_chunk_stripe *stripe;
+ grub_disk_addr_t paddr;
+ grub_device_t dev;
+ grub_err_t err2;
s/err2/err/?
Ok
Post by Daniel Kiper
Post by Goffredo Baroncelli
+
+ stripe = (struct grub_btrfs_chunk_stripe *) (chunk + 1);
+ stripe += i;
Why not stripe = ((struct grub_btrfs_chunk_stripe *) (chunk + 1)) + i;?
Make sense
Post by Daniel Kiper
Daniel
--
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5
Daniel Kiper
2018-09-27 16:18:37 UTC
Permalink
Post by Goffredo Baroncelli
Post by Daniel Kiper
Post by Goffredo Baroncelli
Add support for recovery for a RAID 5 btrfs profile. In addition
it is added some code as preparatory work for RAID 6 recovery code.
---
grub-core/fs/btrfs.c | 169 +++++++++++++++++++++++++++++++++++++++++--
1 file changed, 164 insertions(+), 5 deletions(-)
diff --git a/grub-core/fs/btrfs.c b/grub-core/fs/btrfs.c
index 5c1ebae77..55a7eeffc 100644
--- a/grub-core/fs/btrfs.c
+++ b/grub-core/fs/btrfs.c
@@ -29,6 +29,7 @@
#include <minilzo.h>
#include <grub/i18n.h>
#include <grub/btrfs.h>
+#include <grub/crypto.h>
GRUB_MOD_LICENSE ("GPLv3+");
@@ -665,6 +666,148 @@ btrfs_read_from_chunk (struct grub_btrfs_data *data,
return err;
}
+struct raid56_buffer {
+ void *buf;
+ int data_is_valid;
+};
+
+static void
+rebuild_raid5 (char *dest, struct raid56_buffer *buffers,
+ grub_uint64_t nstripes, grub_uint64_t csize)
+{
+ grub_uint64_t i;
+ int first;
+
+ i = 0;
+ while (buffers[i].data_is_valid && i < nstripes)
+ ++i;
for (i = 0; buffers[i].data_is_valid && i < nstripes; i++);
Post by Goffredo Baroncelli
+ if (i == nstripes)
+ {
+ grub_dprintf ("btrfs", "called rebuild_raid5(), but all disks are OK\n");
+ return;
+ }
+
+ grub_dprintf ("btrfs", "rebuilding RAID 5 stripe #%" PRIuGRUB_UINT64_T "\n",
+ i);
One line here please.
Post by Goffredo Baroncelli
+ for (i = 0, first = 1; i < nstripes; i++)
+ {
+ if (!buffers[i].data_is_valid)
+ continue;
+
+ if (first) {
+ grub_memcpy(dest, buffers[i].buf, csize);
+ first = 0;
+ } else
+ grub_crypto_xor (dest, dest, buffers[i].buf, csize);
+
+ }
Hmmm... I think that this function can be simpler. You can drop first
if (first) {
grub_dprintf ("btrfs", "called rebuild_raid5(), but all disks are OK\n");
Am I right?
Ehm.. no. The "if" is an internal check to avoid BUG. rebuild_raid5() should be called only if some disk is missed.
To perform this control, the code checks if all buffers are valid. Otherwise there is an internal BUG.
Something is wrong here. I think that the code checks if it is an invalid
buffer. If there is not then GRUB complains. Right? However, it looks
that I misread the code and made a mistake here. So, please ignore
this change. Though please change while() with for() at the beginning.

Daniel
Goffredo Baroncelli
2018-09-19 18:40:39 UTC
Permalink
From: Goffredo Baroncelli <***@inwind.it>

The original code which handles the recovery of a RAID 6 disks array
assumes that all reads are multiple of 1 << GRUB_DISK_SECTOR_BITS and it
assumes that all the I/O is done via the struct grub_diskfilter_segment.
This is not true for the btrfs code. In order to reuse the native
grub_raid6_recover() code, it is modified to not call
grub_diskfilter_read_node() directly, but to call an handler passed
as an argument.

Signed-off-by: Goffredo Baroncelli <***@inwind.it>
Reviewed-by: Daniel Kiper <***@oracle.com>
---
grub-core/disk/raid6_recover.c | 52 ++++++++++++++++++++++------------
include/grub/diskfilter.h | 9 ++++++
2 files changed, 43 insertions(+), 18 deletions(-)

diff --git a/grub-core/disk/raid6_recover.c b/grub-core/disk/raid6_recover.c
index aa674f6ca..0cf691ddf 100644
--- a/grub-core/disk/raid6_recover.c
+++ b/grub-core/disk/raid6_recover.c
@@ -74,14 +74,26 @@ mod_255 (unsigned x)
}

static grub_err_t
-grub_raid6_recover (struct grub_diskfilter_segment *array, int disknr, int p,
- char *buf, grub_disk_addr_t sector, grub_size_t size)
+raid6_recover_read_node (void *data, int disknr,
+ grub_uint64_t sector,
+ void *buf, grub_size_t size)
+{
+ struct grub_diskfilter_segment *array = data;
+
+ return grub_diskfilter_read_node (&array->nodes[disknr],
+ (grub_disk_addr_t)sector,
+ size >> GRUB_DISK_SECTOR_BITS, buf);
+}
+
+grub_err_t
+grub_raid6_recover_gen (void *data, grub_uint64_t nstripes, int disknr, int p,
+ char *buf, grub_uint64_t sector, grub_size_t size,
+ int layout, raid_recover_read_t read_func)
{
int i, q, pos;
int bad1 = -1, bad2 = -1;
char *pbuf = 0, *qbuf = 0;

- size <<= GRUB_DISK_SECTOR_BITS;
pbuf = grub_zalloc (size);
if (!pbuf)
goto quit;
@@ -91,17 +103,17 @@ grub_raid6_recover (struct grub_diskfilter_segment *array, int disknr, int p,
goto quit;

q = p + 1;
- if (q == (int) array->node_count)
+ if (q == (int) nstripes)
q = 0;

pos = q + 1;
- if (pos == (int) array->node_count)
+ if (pos == (int) nstripes)
pos = 0;

- for (i = 0; i < (int) array->node_count - 2; i++)
+ for (i = 0; i < (int) nstripes - 2; i++)
{
int c;
- if (array->layout & GRUB_RAID_LAYOUT_MUL_FROM_POS)
+ if (layout & GRUB_RAID_LAYOUT_MUL_FROM_POS)
c = pos;
else
c = i;
@@ -109,8 +121,7 @@ grub_raid6_recover (struct grub_diskfilter_segment *array, int disknr, int p,
bad1 = c;
else
{
- if (! grub_diskfilter_read_node (&array->nodes[pos], sector,
- size >> GRUB_DISK_SECTOR_BITS, buf))
+ if (!read_func(data, pos, sector, buf, size))
{
grub_crypto_xor (pbuf, pbuf, buf, size);
grub_raid_block_mulx (c, buf, size);
@@ -128,7 +139,7 @@ grub_raid6_recover (struct grub_diskfilter_segment *array, int disknr, int p,
}

pos++;
- if (pos == (int) array->node_count)
+ if (pos == (int) nstripes)
pos = 0;
}

@@ -139,16 +150,14 @@ grub_raid6_recover (struct grub_diskfilter_segment *array, int disknr, int p,
if (bad2 < 0)
{
/* One bad device */
- if ((! grub_diskfilter_read_node (&array->nodes[p], sector,
- size >> GRUB_DISK_SECTOR_BITS, buf)))
+ if (!read_func(data, p, sector, buf, size))
{
grub_crypto_xor (buf, buf, pbuf, size);
goto quit;
}

grub_errno = GRUB_ERR_NONE;
- if (grub_diskfilter_read_node (&array->nodes[q], sector,
- size >> GRUB_DISK_SECTOR_BITS, buf))
+ if (read_func(data, q, sector, buf, size))
goto quit;

grub_crypto_xor (buf, buf, qbuf, size);
@@ -160,14 +169,12 @@ grub_raid6_recover (struct grub_diskfilter_segment *array, int disknr, int p,
/* Two bad devices */
unsigned c;

- if (grub_diskfilter_read_node (&array->nodes[p], sector,
- size >> GRUB_DISK_SECTOR_BITS, buf))
+ if (read_func(data, p, sector, buf, size))
goto quit;

grub_crypto_xor (pbuf, pbuf, buf, size);

- if (grub_diskfilter_read_node (&array->nodes[q], sector,
- size >> GRUB_DISK_SECTOR_BITS, buf))
+ if (read_func(data, q, sector, buf, size))
goto quit;

grub_crypto_xor (qbuf, qbuf, buf, size);
@@ -190,6 +197,15 @@ quit:
return grub_errno;
}

+static grub_err_t
+grub_raid6_recover (struct grub_diskfilter_segment *array, int disknr, int p,
+ char *buf, grub_disk_addr_t sector, grub_size_t size)
+{
+ return grub_raid6_recover_gen (array, array->node_count, disknr, p, buf,
+ sector, size << GRUB_DISK_SECTOR_BITS,
+ array->layout, raid6_recover_read_node);
+}
+
GRUB_MOD_INIT(raid6rec)
{
grub_raid6_init_table ();
diff --git a/include/grub/diskfilter.h b/include/grub/diskfilter.h
index d89273c1b..8deb1a8c3 100644
--- a/include/grub/diskfilter.h
+++ b/include/grub/diskfilter.h
@@ -189,6 +189,15 @@ typedef grub_err_t (*grub_raid6_recover_func_t) (struct grub_diskfilter_segment
extern grub_raid5_recover_func_t grub_raid5_recover_func;
extern grub_raid6_recover_func_t grub_raid6_recover_func;

+typedef grub_err_t (* raid_recover_read_t)(void *data, int disk_nr,
+ grub_uint64_t addr, void *dest,
+ grub_size_t size);
+
+extern grub_err_t
+grub_raid6_recover_gen (void *data, grub_uint64_t nstripes, int disknr, int p,
+ char *buf, grub_uint64_t sector, grub_size_t size,
+ int layout, raid_recover_read_t read_func);
+
grub_err_t grub_diskfilter_vg_register (struct grub_diskfilter_vg *vg);

grub_err_t
--
2.19.0
Goffredo Baroncelli
2018-09-27 18:34:59 UTC
Permalink
From: Goffredo Baroncelli <***@inwind.it>

Change the behavior of find_device(): before the patch, a read of a missed
device may trigger a rescan. However, it is never recorded that a device
is missed, so each single read of a missed device may triggers a rescan.
It is the caller who decides if a rescan is performed in case of a missed
device. And it does quite often, without considering if in the past a
devices was already found as "missed"
This behavior causes a lot of unneeded rescan, causing a huge slowdown in
case of a missed device.

After the patch, the "missed device" information is stored in the
data->devices_attached[] array (as a NULL value in the field dev). A rescan
is triggered only if no information at all is found. This means that only
the first time a read of a missed device triggers a rescan.

The change in the code is done removing "return NULL" when the disk is not
found. So it is always executed the code which stores in the
data->devices_attached[] array the value returned by grub_device_iterate():
NULL if the device is missed, or a valid data otherwise.

Signed-off-by: Goffredo Baroncelli <***@inwind.it>
---
grub-core/fs/btrfs.c | 17 +++++++++--------
1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/grub-core/fs/btrfs.c b/grub-core/fs/btrfs.c
index 2b1dd7dd4..c07d91657 100644
--- a/grub-core/fs/btrfs.c
+++ b/grub-core/fs/btrfs.c
@@ -588,7 +588,7 @@ find_device_iter (const char *name, void *data)
}

static grub_device_t
-find_device (struct grub_btrfs_data *data, grub_uint64_t id, int do_rescan)
+find_device (struct grub_btrfs_data *data, grub_uint64_t id)
{
struct find_device_ctx ctx = {
.data = data,
@@ -600,10 +600,9 @@ find_device (struct grub_btrfs_data *data, grub_uint64_t id, int do_rescan)
for (i = 0; i < data->n_devices_attached; i++)
if (id == data->devices_attached[i].id)
return data->devices_attached[i].dev;
- if (do_rescan)
- grub_device_iterate (find_device_iter, &ctx);
- if (!ctx.dev_found)
- return NULL;
+
+ grub_device_iterate (find_device_iter, &ctx);
+
data->n_devices_attached++;
if (data->n_devices_attached > data->n_devices_allocated)
{
@@ -615,7 +614,8 @@ find_device (struct grub_btrfs_data *data, grub_uint64_t id, int do_rescan)
* sizeof (data->devices_attached[0]));
if (!data->devices_attached)
{
- grub_device_close (ctx.dev_found);
+ if (ctx.dev_found)
+ grub_device_close (ctx.dev_found);
data->devices_attached = tmp;
return NULL;
}
@@ -898,7 +898,7 @@ grub_btrfs_read_logical (struct grub_btrfs_data *data, grub_disk_addr_t addr,
" for laddr 0x%" PRIxGRUB_UINT64_T "\n", paddr,
addr);

- dev = find_device (data, stripe->device_id, j);
+ dev = find_device (data, stripe->device_id);
if (!dev)
{
grub_dprintf ("btrfs",
@@ -975,7 +975,8 @@ grub_btrfs_unmount (struct grub_btrfs_data *data)
unsigned i;
/* The device 0 is closed one layer upper. */
for (i = 1; i < data->n_devices_attached; i++)
- grub_device_close (data->devices_attached[i].dev);
+ if (data->devices_attached[i].dev)
+ grub_device_close (data->devices_attached[i].dev);
grub_free (data->devices_attached);
grub_free (data->extent);
grub_free (data);
--
2.19.0
Daniel Kiper
2018-10-09 17:56:43 UTC
Permalink
Post by Goffredo Baroncelli
Change the behavior of find_device(): before the patch, a read of a missed
device may trigger a rescan. However, it is never recorded that a device
is missed, so each single read of a missed device may triggers a rescan.
It is the caller who decides if a rescan is performed in case of a missed
device. And it does quite often, without considering if in the past a
devices was already found as "missed"
This behavior causes a lot of unneeded rescan, causing a huge slowdown in
case of a missed device.
After the patch, the "missed device" information is stored in the
data->devices_attached[] array (as a NULL value in the field dev). A rescan
is triggered only if no information at all is found. This means that only
the first time a read of a missed device triggers a rescan.
The change in the code is done removing "return NULL" when the disk is not
found. So it is always executed the code which stores in the
NULL if the device is missed, or a valid data otherwise.
Commit message still begs for improvement. I will send you a proposal shortly.

Daniel
Daniel Kiper
2018-10-11 16:54:11 UTC
Permalink
Post by Daniel Kiper
Post by Goffredo Baroncelli
Change the behavior of find_device(): before the patch, a read of a missed
device may trigger a rescan. However, it is never recorded that a device
is missed, so each single read of a missed device may triggers a rescan.
It is the caller who decides if a rescan is performed in case of a missed
device. And it does quite often, without considering if in the past a
devices was already found as "missed"
This behavior causes a lot of unneeded rescan, causing a huge slowdown in
case of a missed device.
After the patch, the "missed device" information is stored in the
data->devices_attached[] array (as a NULL value in the field dev). A rescan
is triggered only if no information at all is found. This means that only
the first time a read of a missed device triggers a rescan.
The change in the code is done removing "return NULL" when the disk is not
found. So it is always executed the code which stores in the
NULL if the device is missed, or a valid data otherwise.
Commit message still begs for improvement. I will send you a proposal shortly.
Below you can find updated commit message. Please verify it is OK.

Currently read from missing device triggers rescan. However, it is never
recorded that the device is missing. So, each read of a missing device
triggers rescan again and again. This behavior causes a lot of unneeded
rescans leading to huge slowdowns.

This patch fixes above mentioned issue. Information about missing devices
is stored in the data->devices_attached[] array as NULL value in dev
member. Rescan is triggered only if no information is found for a given
device. This means that only first time read triggers rescan.

The patch drops premature return. This way data->devices_attached[] is
filled even when a given device is missing.

Daniel
Goffredo Baroncelli
2018-09-27 18:34:56 UTC
Permalink
From: Goffredo Baroncelli <***@inwind.it>

Signed-off-by: Goffredo Baroncelli <***@inwind.it>
---
grub-core/fs/btrfs.c | 74 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 74 insertions(+)

diff --git a/grub-core/fs/btrfs.c b/grub-core/fs/btrfs.c
index be195448d..9bc6d399d 100644
--- a/grub-core/fs/btrfs.c
+++ b/grub-core/fs/btrfs.c
@@ -119,6 +119,8 @@ struct grub_btrfs_chunk_item
#define GRUB_BTRFS_CHUNK_TYPE_RAID1 0x10
#define GRUB_BTRFS_CHUNK_TYPE_DUPLICATED 0x20
#define GRUB_BTRFS_CHUNK_TYPE_RAID10 0x40
+#define GRUB_BTRFS_CHUNK_TYPE_RAID5 0x80
+#define GRUB_BTRFS_CHUNK_TYPE_RAID6 0x100
grub_uint8_t dummy2[0xc];
grub_uint16_t nstripes;
grub_uint16_t nsubstripes;
@@ -764,6 +766,78 @@ grub_btrfs_read_logical (struct grub_btrfs_data *data, grub_disk_addr_t addr,
stripe_offset = low + chunk_stripe_length
* high;
csize = chunk_stripe_length - low;
+ break;
+ }
+ case GRUB_BTRFS_CHUNK_TYPE_RAID5:
+ case GRUB_BTRFS_CHUNK_TYPE_RAID6:
+ {
+ grub_uint64_t nparities, stripe_nr, high, low;
+
+ redundancy = 1; /* no redundancy for now */
+
+ if (grub_le_to_cpu64 (chunk->type) & GRUB_BTRFS_CHUNK_TYPE_RAID5)
+ {
+ grub_dprintf ("btrfs", "RAID5\n");
+ nparities = 1;
+ }
+ else
+ {
+ grub_dprintf ("btrfs", "RAID6\n");
+ nparities = 2;
+ }
+
+ /*
+ * A RAID 6 layout consists of several stripes spread on the
+ * disks, following a layout like the one below
+ *
+ * Disk0 Disk1 Disk2 Disk3
+ *
+ * A1 B1 P1 Q1
+ * Q2 A2 B2 P2
+ * P3 Q3 A3 B3
+ * [...]
+ *
+ * Note that the placement of the parities depends on row index.
+ * Pay attention that the BTRFS terminolgy may be different
+ * from others RAID implementation (e.g. lvm/dm or md). In BTRFS
+ * a contiguous block of data of a disk (like A1) is called
+ * stripe.
+ * In the code below:
+ * - stripe_nr is the stripe number without the parities
+ * (A1 = 0, B1 = 1, A2 = 2, B2 = 3, ...),
+ * - high is the row number (0 for A1...Q1, 1 for Q2...P2, ...),
+ * - stripen is the disk number in a row (0 for A1,Q2,P3, 1 for
+ * B1...),
+ * - off is the logical address to read,
+ * - chunk_stripe_length is the size of a stripe (typically 64k),
+ * - nstripes is the number of disks of a row
+ * - low is the offset of the data inside a stripe,
+ * - stripe_offset is the data offset in an array,
+ * - csize is the "potential" data to read. It will be reduced to
+ * size if the latter is smaller.
+ * - nparities is the number of parities (1 for RAID5, 2 for
+ * RAID6); used only in RAID5/6 code.
+ */
+ stripe_nr = grub_divmod64 (off, chunk_stripe_length, &low);
+
+ /*
+ * stripen is computed without the parities (0 for A1, A2, A3...
+ * 1 for B1, B2...).
+ */
+ high = grub_divmod64 (stripe_nr, nstripes - nparities, &stripen);
+
+ /*
+ * the stripes are spread across the disks, each row their
+ * position is shifted by 1 place. So to compute the real disk
+ * number occuped by a stripe, we need to sum also the
+ * "row number" in modulo nstripes (0 for A1, 1 for A2, 2 for
+ * A3....).
+ */
+ grub_divmod64 (high + stripen, nstripes, &stripen);
+
+ stripe_offset = low + chunk_stripe_length * high;
+ csize = chunk_stripe_length - low;
+
break;
}
default:
--
2.19.0
Daniel Kiper
2018-10-09 17:51:01 UTC
Permalink
Code LGTM. Though comment begs improvement. I will send you updated
comment for approval shortly.

Daniel
Daniel Kiper
2018-10-11 13:17:00 UTC
Permalink
Post by Daniel Kiper
Code LGTM. Though comment begs improvement. I will send you updated
comment for approval shortly.
Below you can find updated patch. Please check I have not messed up something.

Daniel

From ecefb12a10d39bdd09e1d2b8fbbcbdb1b35274f8 Mon Sep 17 00:00:00 2001
From: Goffredo Baroncelli <***@inwind.it>
Date: Thu, 27 Sep 2018 20:34:56 +0200
Subject: [PATCH 1/1] btrfs: Add support for reading a filesystem with a RAID
5 or RAID 6 profile.

Signed-off-by: Goffredo Baroncelli <***@inwind.it>
Signed-off-by: Daniel Kiper <***@oracle.com>
---
grub-core/fs/btrfs.c | 73 ++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 73 insertions(+)

diff --git a/grub-core/fs/btrfs.c b/grub-core/fs/btrfs.c
index be19544..933a57d 100644
--- a/grub-core/fs/btrfs.c
+++ b/grub-core/fs/btrfs.c
@@ -119,6 +119,8 @@ struct grub_btrfs_chunk_item
#define GRUB_BTRFS_CHUNK_TYPE_RAID1 0x10
#define GRUB_BTRFS_CHUNK_TYPE_DUPLICATED 0x20
#define GRUB_BTRFS_CHUNK_TYPE_RAID10 0x40
+#define GRUB_BTRFS_CHUNK_TYPE_RAID5 0x80
+#define GRUB_BTRFS_CHUNK_TYPE_RAID6 0x100
grub_uint8_t dummy2[0xc];
grub_uint16_t nstripes;
grub_uint16_t nsubstripes;
@@ -766,6 +768,77 @@ grub_btrfs_read_logical (struct grub_btrfs_data *data, grub_disk_addr_t addr,
csize = chunk_stripe_length - low;
break;
}
+ case GRUB_BTRFS_CHUNK_TYPE_RAID5:
+ case GRUB_BTRFS_CHUNK_TYPE_RAID6:
+ {
+ grub_uint64_t nparities, stripe_nr, high, low;
+
+ redundancy = 1; /* no redundancy for now */
+
+ if (grub_le_to_cpu64 (chunk->type) & GRUB_BTRFS_CHUNK_TYPE_RAID5)
+ {
+ grub_dprintf ("btrfs", "RAID5\n");
+ nparities = 1;
+ }
+ else
+ {
+ grub_dprintf ("btrfs", "RAID6\n");
+ nparities = 2;
+ }
+
+ /*
+ * RAID 6 layout consists of several stripes spread over
+ * the disks, e.g.:
+ *
+ * Disk_0 Disk_1 Disk_2 Disk_3
+ * A0 B0 P0 Q0
+ * Q1 A1 B1 P1
+ * P2 Q2 A2 B2
+ *
+ * Note: placement of the parities depend on row number.
+ *
+ * Pay attention that the btrfs terminology may differ from
+ * terminology used in other RAID implementations, e.g. LVM,
+ * dm or md. The main difference is that btrfs calls contiguous
+ * block of data on a given disk, e.g. A0, stripe instead of chunk.
+ *
+ * The variables listed below have following meaning:
+ * - stripe_nr is the stripe number excluding the parities
+ * (A0 = 0, B0 = 1, A1 = 2, B1 = 3, etc.),
+ * - high is the row number (0 for A0...Q0, 1 for Q1...P1, etc.),
+ * - stripen is the disk number in a row (0 for A0, Q1, P2,
+ * 1 for B0, A1, Q2, etc.),
+ * - off is the logical address to read,
+ * - chunk_stripe_length is the size of a stripe (typically 64 KiB),
+ * - nstripes is the number of disks in a row,
+ * - low is the offset of the data inside a stripe,
+ * - stripe_offset is the data offset in an array,
+ * - csize is the "potential" data to read; it will be reduced
+ * to size if the latter is smaller,
+ * - nparities is the number of parities (1 for RAID 5, 2 for
+ * RAID 6); used only in RAID 5/6 code.
+ */
+ stripe_nr = grub_divmod64 (off, chunk_stripe_length, &low);
+
+ /*
+ * stripen is computed without the parities
+ * (0 for A0, A1, A2, 1 for B0, B1, B2, etc.).
+ */
+ high = grub_divmod64 (stripe_nr, nstripes - nparities, &stripen);
+
+ /*
+ * The stripes are spread over the disks. Every each row their
+ * positions are shifted by 1 place. So, the real disks number
+ * change. Hence, we have to take current row number modulo
+ * nstripes into account (0 for A0, 1 for A1, 2 for A2, etc.).
+ */
+ grub_divmod64 (high + stripen, nstripes, &stripen);
+
+ stripe_offset = low + chunk_stripe_length * high;
+ csize = chunk_stripe_length - low;
+
+ break;
+ }
default:
grub_dprintf ("btrfs", "unsupported RAID\n");
return grub_error (GRUB_ERR_NOT_IMPLEMENTED_YET,
--
1.7.10.4
Goffredo Baroncelli
2018-09-27 18:35:01 UTC
Permalink
From: Goffredo Baroncelli <***@inwind.it>

Move the code in charge to read the data from disk into a separate
function. This helps to separate the error handling logic (which depends on
the different raid profiles) from the read from disk logic.
Refactoring this code increases the general readability too.

This is a preparatory patch, to help the adding of the RAID 5/6 recovery
code.

Signed-off-by: Goffredo Baroncelli <***@inwind.it>
Reviewed-by: Daniel Kiper <***@oracle.com>
---
grub-core/fs/btrfs.c | 75 ++++++++++++++++++++++++++------------------
1 file changed, 44 insertions(+), 31 deletions(-)

diff --git a/grub-core/fs/btrfs.c b/grub-core/fs/btrfs.c
index 179a2da9d..554f350c5 100644
--- a/grub-core/fs/btrfs.c
+++ b/grub-core/fs/btrfs.c
@@ -625,6 +625,46 @@ find_device (struct grub_btrfs_data *data, grub_uint64_t id)
return ctx.dev_found;
}

+static grub_err_t
+btrfs_read_from_chunk (struct grub_btrfs_data *data,
+ struct grub_btrfs_chunk_item *chunk,
+ grub_uint64_t stripen, grub_uint64_t stripe_offset,
+ int redundancy, grub_uint64_t csize,
+ void *buf)
+{
+ struct grub_btrfs_chunk_stripe *stripe;
+ grub_disk_addr_t paddr;
+ grub_device_t dev;
+ grub_err_t err;
+
+ stripe = (struct grub_btrfs_chunk_stripe *) (chunk + 1);
+ /* Right now the redundancy handling is easy.
+ With RAID5-like it will be more difficult. */
+ stripe += stripen + redundancy;
+
+ paddr = grub_le_to_cpu64 (stripe->offset) + stripe_offset;
+
+ grub_dprintf ("btrfs", "stripe %" PRIxGRUB_UINT64_T
+ " maps to 0x%" PRIxGRUB_UINT64_T
+ ". Reading paddr 0x%" PRIxGRUB_UINT64_T "\n",
+ stripen, stripe->offset, paddr);
+
+ dev = find_device (data, stripe->device_id);
+ if (!dev)
+ {
+ grub_dprintf ("btrfs",
+ "couldn't find a necessary member device "
+ "of multi-device filesystem\n");
+ grub_errno = GRUB_ERR_NONE;
+ return GRUB_ERR_READ_ERROR;
+ }
+
+ err = grub_disk_read (dev->disk, paddr >> GRUB_DISK_SECTOR_BITS,
+ paddr & (GRUB_DISK_SECTOR_SIZE - 1),
+ csize, buf);
+ return err;
+}
+
static grub_err_t
grub_btrfs_read_logical (struct grub_btrfs_data *data, grub_disk_addr_t addr,
void *buf, grub_size_t size, int recursion_depth)
@@ -638,7 +678,6 @@ grub_btrfs_read_logical (struct grub_btrfs_data *data, grub_disk_addr_t addr,
grub_err_t err = 0;
struct grub_btrfs_key key_out;
int challoc = 0;
- grub_device_t dev;
struct grub_btrfs_key key_in;
grub_size_t chsize;
grub_disk_addr_t chaddr;
@@ -885,36 +924,10 @@ grub_btrfs_read_logical (struct grub_btrfs_data *data, grub_disk_addr_t addr,

for (i = 0; i < redundancy; i++)
{
- struct grub_btrfs_chunk_stripe *stripe;
- grub_disk_addr_t paddr;
-
- stripe = (struct grub_btrfs_chunk_stripe *) (chunk + 1);
- /* Right now the redundancy handling is easy.
- With RAID5-like it will be more difficult. */
- stripe += stripen + i;
-
- paddr = grub_le_to_cpu64 (stripe->offset) + stripe_offset;
-
- grub_dprintf ("btrfs", "stripe %" PRIxGRUB_UINT64_T
- " maps to 0x%" PRIxGRUB_UINT64_T "\n",
- stripen, stripe->offset);
- grub_dprintf ("btrfs", "reading paddr 0x%" PRIxGRUB_UINT64_T
- "\n", paddr);
-
- dev = find_device (data, stripe->device_id);
- if (!dev)
- {
- grub_dprintf ("btrfs",
- "couldn't find a necessary member device "
- "of multi-device filesystem\n");
- err = grub_errno;
- grub_errno = GRUB_ERR_NONE;
- continue;
- }
-
- err = grub_disk_read (dev->disk, paddr >> GRUB_DISK_SECTOR_BITS,
- paddr & (GRUB_DISK_SECTOR_SIZE - 1),
- csize, buf);
+ err = btrfs_read_from_chunk (data, chunk, stripen,
+ stripe_offset,
+ i, /* redundancy */
+ csize, buf);
if (!err)
break;
grub_errno = GRUB_ERR_NONE;
--
2.19.0
Goffredo Baroncelli
2018-09-27 18:35:00 UTC
Permalink
From: Goffredo Baroncelli <***@inwind.it>

A portion of the logging code is moved outside of internal for(;;). The part
that is left inside is the one which depends on the internal for(;;) index.

This is a preparatory patch. The next one will refactor the code inside
the for(;;) into an another function.

Signed-off-by: Goffredo Baroncelli <***@inwind.it>
Reviewed-by: Daniel Kiper <***@oracle.com>
---
grub-core/fs/btrfs.c | 25 ++++++++++++++-----------
1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/grub-core/fs/btrfs.c b/grub-core/fs/btrfs.c
index c07d91657..179a2da9d 100644
--- a/grub-core/fs/btrfs.c
+++ b/grub-core/fs/btrfs.c
@@ -871,6 +871,18 @@ grub_btrfs_read_logical (struct grub_btrfs_data *data, grub_disk_addr_t addr,

for (j = 0; j < 2; j++)
{
+ grub_dprintf ("btrfs", "chunk 0x%" PRIxGRUB_UINT64_T
+ "+0x%" PRIxGRUB_UINT64_T
+ " (%d stripes (%d substripes) of %"
+ PRIxGRUB_UINT64_T ")\n",
+ grub_le_to_cpu64 (key->offset),
+ grub_le_to_cpu64 (chunk->size),
+ grub_le_to_cpu16 (chunk->nstripes),
+ grub_le_to_cpu16 (chunk->nsubstripes),
+ grub_le_to_cpu64 (chunk->stripe_length));
+ grub_dprintf ("btrfs", "reading laddr 0x%" PRIxGRUB_UINT64_T "\n",
+ addr);
+
for (i = 0; i < redundancy; i++)
{
struct grub_btrfs_chunk_stripe *stripe;
@@ -883,20 +895,11 @@ grub_btrfs_read_logical (struct grub_btrfs_data *data, grub_disk_addr_t addr,

paddr = grub_le_to_cpu64 (stripe->offset) + stripe_offset;

- grub_dprintf ("btrfs", "chunk 0x%" PRIxGRUB_UINT64_T
- "+0x%" PRIxGRUB_UINT64_T
- " (%d stripes (%d substripes) of %"
- PRIxGRUB_UINT64_T ") stripe %" PRIxGRUB_UINT64_T
+ grub_dprintf ("btrfs", "stripe %" PRIxGRUB_UINT64_T
" maps to 0x%" PRIxGRUB_UINT64_T "\n",
- grub_le_to_cpu64 (key->offset),
- grub_le_to_cpu64 (chunk->size),
- grub_le_to_cpu16 (chunk->nstripes),
- grub_le_to_cpu16 (chunk->nsubstripes),
- grub_le_to_cpu64 (chunk->stripe_length),
stripen, stripe->offset);
grub_dprintf ("btrfs", "reading paddr 0x%" PRIxGRUB_UINT64_T
- " for laddr 0x%" PRIxGRUB_UINT64_T "\n", paddr,
- addr);
+ "\n", paddr);

dev = find_device (data, stripe->device_id);
if (!dev)
--
2.19.0
Goffredo Baroncelli
2018-09-27 18:35:02 UTC
Permalink
From: Goffredo Baroncelli <***@inwind.it>

Add support for recovery for a RAID 5 btrfs profile. In addition
it is added some code as preparatory work for RAID 6 recovery code.

Signed-off-by: Goffredo Baroncelli <***@inwind.it>
---
grub-core/fs/btrfs.c | 160 +++++++++++++++++++++++++++++++++++++++++--
1 file changed, 155 insertions(+), 5 deletions(-)

diff --git a/grub-core/fs/btrfs.c b/grub-core/fs/btrfs.c
index 554f350c5..db8df0eea 100644
--- a/grub-core/fs/btrfs.c
+++ b/grub-core/fs/btrfs.c
@@ -29,6 +29,7 @@
#include <minilzo.h>
#include <grub/i18n.h>
#include <grub/btrfs.h>
+#include <grub/crypto.h>

GRUB_MOD_LICENSE ("GPLv3+");

@@ -665,6 +666,139 @@ btrfs_read_from_chunk (struct grub_btrfs_data *data,
return err;
}

+struct raid56_buffer {
+ void *buf;
+ int data_is_valid;
+};
+
+static void
+rebuild_raid5 (char *dest, struct raid56_buffer *buffers,
+ grub_uint64_t nstripes, grub_uint64_t csize)
+{
+ grub_uint64_t i;
+ int first;
+
+ for(i = 0; buffers[i].data_is_valid && i < nstripes; i++);
+
+ if (i == nstripes)
+ {
+ grub_dprintf ("btrfs", "called rebuild_raid5(), but all disks are OK\n");
+ return;
+ }
+
+ grub_dprintf ("btrfs", "rebuilding RAID 5 stripe #%" PRIuGRUB_UINT64_T "\n", i);
+
+ for (i = 0, first = 1; i < nstripes; i++)
+ {
+ if (!buffers[i].data_is_valid)
+ continue;
+
+ if (first) {
+ grub_memcpy(dest, buffers[i].buf, csize);
+ first = 0;
+ } else
+ grub_crypto_xor (dest, dest, buffers[i].buf, csize);
+
+ }
+}
+
+static grub_err_t
+raid56_read_retry (struct grub_btrfs_data *data,
+ struct grub_btrfs_chunk_item *chunk,
+ grub_uint64_t stripe_offset,
+ grub_uint64_t csize, void *buf)
+{
+ struct raid56_buffer *buffers;
+ grub_uint64_t nstripes = grub_le_to_cpu16 (chunk->nstripes);
+ grub_uint64_t chunk_type = grub_le_to_cpu64 (chunk->type);
+ grub_err_t ret = GRUB_ERR_OUT_OF_MEMORY;
+ grub_uint64_t i, failed_devices;
+
+ buffers = grub_zalloc (sizeof(*buffers) * nstripes);
+ if (!buffers)
+ goto cleanup;
+
+ for (i = 0; i < nstripes; i++)
+ {
+ buffers[i].buf = grub_zalloc (csize);
+ if (!buffers[i].buf)
+ goto cleanup;
+ }
+
+ for (failed_devices = 0, i = 0; i < nstripes; i++)
+ {
+ struct grub_btrfs_chunk_stripe *stripe;
+ grub_disk_addr_t paddr;
+ grub_device_t dev;
+ grub_err_t err;
+
+ stripe = (struct grub_btrfs_chunk_stripe *) (chunk + 1) + i;
+
+ paddr = grub_le_to_cpu64 (stripe->offset) + stripe_offset;
+ grub_dprintf ("btrfs", "reading paddr %" PRIxGRUB_UINT64_T
+ " from stripe ID %" PRIxGRUB_UINT64_T "\n", paddr,
+ stripe->device_id);
+
+ dev = find_device (data, stripe->device_id);
+ if (!dev)
+ {
+ buffers[i].data_is_valid = 0;
+ grub_dprintf ("btrfs", "stripe %" PRIuGRUB_UINT64_T " FAILED (dev ID %"
+ PRIxGRUB_UINT64_T ")\n", i, stripe->device_id);
+ failed_devices++;
+ continue;
+ }
+
+ err = grub_disk_read (dev->disk, paddr >> GRUB_DISK_SECTOR_BITS,
+ paddr & (GRUB_DISK_SECTOR_SIZE - 1),
+ csize, buffers[i].buf);
+ if (err == GRUB_ERR_NONE)
+ {
+ buffers[i].data_is_valid = 1;
+ grub_dprintf ("btrfs", "stripe %" PRIuGRUB_UINT64_T " Ok (dev ID %"
+ PRIxGRUB_UINT64_T ")\n", i, stripe->device_id);
+ }
+ else
+ {
+ buffers[i].data_is_valid = 0;
+ grub_dprintf ("btrfs", "stripe %" PRIuGRUB_UINT64_T
+ " FAILED (dev ID %" PRIxGRUB_UINT64_T ")\n", i,
+ stripe->device_id);
+ failed_devices++;
+ }
+ }
+
+ if (failed_devices > 1 && (chunk_type & GRUB_BTRFS_CHUNK_TYPE_RAID5))
+ {
+ grub_dprintf ("btrfs",
+ "not enough disks for RAID 5: total %" PRIuGRUB_UINT64_T
+ ", missing %" PRIuGRUB_UINT64_T "\n",
+ nstripes, failed_devices);
+ ret = GRUB_ERR_READ_ERROR;
+ goto cleanup;
+ }
+ else
+ grub_dprintf ("btrfs",
+ "enough disks for RAID 5 rebuilding: total %"
+ PRIuGRUB_UINT64_T ", missing %" PRIuGRUB_UINT64_T "\n",
+ nstripes, failed_devices);
+
+ /* if these are enough, try to rebuild the data */
+ if (chunk_type & GRUB_BTRFS_CHUNK_TYPE_RAID5)
+ rebuild_raid5 (buf, buffers, nstripes, csize);
+ else
+ grub_dprintf ("btrfs", "called rebuild_raid6(), NOT IMPLEMENTED\n");
+
+ ret = GRUB_ERR_NONE;
+ cleanup:
+ if (buffers)
+ for (i = 0; i < nstripes; i++)
+ grub_free(buffers[i].buf);
+ grub_free(buffers);
+
+ return ret;
+}
+
static grub_err_t
grub_btrfs_read_logical (struct grub_btrfs_data *data, grub_disk_addr_t addr,
void *buf, grub_size_t size, int recursion_depth)
@@ -742,6 +876,10 @@ grub_btrfs_read_logical (struct grub_btrfs_data *data, grub_disk_addr_t addr,
grub_uint16_t nstripes;
unsigned redundancy = 1;
unsigned i, j;
+ int is_raid56;
+
+ is_raid56 = !!(grub_le_to_cpu64 (chunk->type) &
+ GRUB_BTRFS_CHUNK_TYPE_RAID5);

if (grub_le_to_cpu64 (chunk->size) <= off)
{
@@ -922,17 +1060,29 @@ grub_btrfs_read_logical (struct grub_btrfs_data *data, grub_disk_addr_t addr,
grub_dprintf ("btrfs", "reading laddr 0x%" PRIxGRUB_UINT64_T "\n",
addr);

- for (i = 0; i < redundancy; i++)
+ if (!is_raid56)
+ for (i = 0; i < redundancy; i++)
+ {
+ err = btrfs_read_from_chunk (data, chunk, stripen,
+ stripe_offset,
+ i, /* redundancy */
+ csize, buf);
+ if (!err)
+ break;
+ grub_errno = GRUB_ERR_NONE;
+ }
+ else
{
err = btrfs_read_from_chunk (data, chunk, stripen,
stripe_offset,
- i, /* redundancy */
+ 0, /* no mirror */
csize, buf);
- if (!err)
- break;
grub_errno = GRUB_ERR_NONE;
+ if (err != GRUB_ERR_NONE)
+ err = raid56_read_retry (data, chunk, stripe_offset,
+ csize, buf);
}
- if (i != redundancy)
+ if (err == GRUB_ERR_NONE)
break;
}
if (err)
--
2.19.0
Daniel Kiper
2018-10-09 18:20:43 UTC
Permalink
Post by Goffredo Baroncelli
Add support for recovery for a RAID 5 btrfs profile. In addition
it is added some code as preparatory work for RAID 6 recovery code.
---
grub-core/fs/btrfs.c | 160 +++++++++++++++++++++++++++++++++++++++++--
1 file changed, 155 insertions(+), 5 deletions(-)
diff --git a/grub-core/fs/btrfs.c b/grub-core/fs/btrfs.c
index 554f350c5..db8df0eea 100644
--- a/grub-core/fs/btrfs.c
+++ b/grub-core/fs/btrfs.c
@@ -29,6 +29,7 @@
#include <minilzo.h>
#include <grub/i18n.h>
#include <grub/btrfs.h>
+#include <grub/crypto.h>
GRUB_MOD_LICENSE ("GPLv3+");
@@ -665,6 +666,139 @@ btrfs_read_from_chunk (struct grub_btrfs_data *data,
return err;
}
+struct raid56_buffer {
+ void *buf;
+ int data_is_valid;
+};
+
+static void
+rebuild_raid5 (char *dest, struct raid56_buffer *buffers,
+ grub_uint64_t nstripes, grub_uint64_t csize)
+{
+ grub_uint64_t i;
+ int first;
+
+ for(i = 0; buffers[i].data_is_valid && i < nstripes; i++);
+
+ if (i == nstripes)
+ {
+ grub_dprintf ("btrfs", "called rebuild_raid5(), but all disks are OK\n");
+ return;
+ }
+
+ grub_dprintf ("btrfs", "rebuilding RAID 5 stripe #%" PRIuGRUB_UINT64_T "\n", i);
+
+ for (i = 0, first = 1; i < nstripes; i++)
+ {
+ if (!buffers[i].data_is_valid)
+ continue;
+
+ if (first) {
+ grub_memcpy(dest, buffers[i].buf, csize);
+ first = 0;
+ } else
+ grub_crypto_xor (dest, dest, buffers[i].buf, csize);
+
+ }
+}
+
+static grub_err_t
+raid56_read_retry (struct grub_btrfs_data *data,
+ struct grub_btrfs_chunk_item *chunk,
+ grub_uint64_t stripe_offset,
+ grub_uint64_t csize, void *buf)
+{
+ struct raid56_buffer *buffers;
+ grub_uint64_t nstripes = grub_le_to_cpu16 (chunk->nstripes);
+ grub_uint64_t chunk_type = grub_le_to_cpu64 (chunk->type);
+ grub_err_t ret = GRUB_ERR_OUT_OF_MEMORY;
+ grub_uint64_t i, failed_devices;
+
+ buffers = grub_zalloc (sizeof(*buffers) * nstripes);
+ if (!buffers)
+ goto cleanup;
+
+ for (i = 0; i < nstripes; i++)
+ {
+ buffers[i].buf = grub_zalloc (csize);
+ if (!buffers[i].buf)
+ goto cleanup;
+ }
+
+ for (failed_devices = 0, i = 0; i < nstripes; i++)
+ {
+ struct grub_btrfs_chunk_stripe *stripe;
+ grub_disk_addr_t paddr;
+ grub_device_t dev;
+ grub_err_t err;
+
+ stripe = (struct grub_btrfs_chunk_stripe *) (chunk + 1) + i;
I think that "chunk + 1" requires short comment why...
Post by Goffredo Baroncelli
+
+ paddr = grub_le_to_cpu64 (stripe->offset) + stripe_offset;
+ grub_dprintf ("btrfs", "reading paddr %" PRIxGRUB_UINT64_T
+ " from stripe ID %" PRIxGRUB_UINT64_T "\n", paddr,
+ stripe->device_id);
+
+ dev = find_device (data, stripe->device_id);
+ if (!dev)
+ {
+ buffers[i].data_is_valid = 0;
+ grub_dprintf ("btrfs", "stripe %" PRIuGRUB_UINT64_T " FAILED (dev ID %"
+ PRIxGRUB_UINT64_T ")\n", i, stripe->device_id);
+ failed_devices++;
+ continue;
+ }
+
+ err = grub_disk_read (dev->disk, paddr >> GRUB_DISK_SECTOR_BITS,
+ paddr & (GRUB_DISK_SECTOR_SIZE - 1),
+ csize, buffers[i].buf);
+ if (err == GRUB_ERR_NONE)
+ {
+ buffers[i].data_is_valid = 1;
+ grub_dprintf ("btrfs", "stripe %" PRIuGRUB_UINT64_T " Ok (dev ID %"
+ PRIxGRUB_UINT64_T ")\n", i, stripe->device_id);
+ }
+ else
+ {
+ buffers[i].data_is_valid = 0;
+ grub_dprintf ("btrfs", "stripe %" PRIuGRUB_UINT64_T
+ " FAILED (dev ID %" PRIxGRUB_UINT64_T ")\n", i,
+ stripe->device_id);
+ failed_devices++;
+ }
+ }
+
+ if (failed_devices > 1 && (chunk_type & GRUB_BTRFS_CHUNK_TYPE_RAID5))
+ {
+ grub_dprintf ("btrfs",
+ "not enough disks for RAID 5: total %" PRIuGRUB_UINT64_T
+ ", missing %" PRIuGRUB_UINT64_T "\n",
+ nstripes, failed_devices);
+ ret = GRUB_ERR_READ_ERROR;
+ goto cleanup;
+ }
+ else
+ grub_dprintf ("btrfs",
+ "enough disks for RAID 5 rebuilding: total %"
+ PRIuGRUB_UINT64_T ", missing %" PRIuGRUB_UINT64_T "\n",
+ nstripes, failed_devices);
+
+ /* if these are enough, try to rebuild the data */
+ if (chunk_type & GRUB_BTRFS_CHUNK_TYPE_RAID5)
+ rebuild_raid5 (buf, buffers, nstripes, csize);
+ else
+ grub_dprintf ("btrfs", "called rebuild_raid6(), NOT IMPLEMENTED\n");
+
+ ret = GRUB_ERR_NONE;
+ if (buffers)
+ for (i = 0; i < nstripes; i++)
+ grub_free(buffers[i].buf);
+ grub_free(buffers);
+
+ return ret;
+}
+
static grub_err_t
grub_btrfs_read_logical (struct grub_btrfs_data *data, grub_disk_addr_t addr,
void *buf, grub_size_t size, int recursion_depth)
@@ -742,6 +876,10 @@ grub_btrfs_read_logical (struct grub_btrfs_data *data, grub_disk_addr_t addr,
grub_uint16_t nstripes;
unsigned redundancy = 1;
unsigned i, j;
+ int is_raid56;
+
+ is_raid56 = !!(grub_le_to_cpu64 (chunk->type) &
+ GRUB_BTRFS_CHUNK_TYPE_RAID5);
if (grub_le_to_cpu64 (chunk->size) <= off)
{
@@ -922,17 +1060,29 @@ grub_btrfs_read_logical (struct grub_btrfs_data *data, grub_disk_addr_t addr,
grub_dprintf ("btrfs", "reading laddr 0x%" PRIxGRUB_UINT64_T "\n",
addr);
- for (i = 0; i < redundancy; i++)
+ if (!is_raid56)
Why not "if (is_raid56)"? This looks more natural here.
Post by Goffredo Baroncelli
+ for (i = 0; i < redundancy; i++)
+ {
+ err = btrfs_read_from_chunk (data, chunk, stripen,
+ stripe_offset,
+ i, /* redundancy */
+ csize, buf);
+ if (!err)
+ break;
+ grub_errno = GRUB_ERR_NONE;
+ }
+ else
{
err = btrfs_read_from_chunk (data, chunk, stripen,
stripe_offset,
- i, /* redundancy */
+ 0, /* no mirror */
csize, buf);
- if (!err)
- break;
grub_errno = GRUB_ERR_NONE;
+ if (err != GRUB_ERR_NONE)
Please be consistent and use "if (err)" here.
Post by Goffredo Baroncelli
+ err = raid56_read_retry (data, chunk, stripe_offset,
+ csize, buf);
}
- if (i != redundancy)
+ if (err == GRUB_ERR_NONE)
if (!err) please...

Daniel
Goffredo Baroncelli
2018-10-10 13:08:37 UTC
Permalink
Post by Daniel Kiper
Post by Goffredo Baroncelli
Add support for recovery for a RAID 5 btrfs profile. In addition
it is added some code as preparatory work for RAID 6 recovery code.
---
grub-core/fs/btrfs.c | 160 +++++++++++++++++++++++++++++++++++++++++--
1 file changed, 155 insertions(+), 5 deletions(-)
diff --git a/grub-core/fs/btrfs.c b/grub-core/fs/btrfs.c
index 554f350c5..db8df0eea 100644
--- a/grub-core/fs/btrfs.c
+++ b/grub-core/fs/btrfs.c
@@ -29,6 +29,7 @@
[...]
Post by Daniel Kiper
Post by Goffredo Baroncelli
+
+ stripe = (struct grub_btrfs_chunk_stripe *) (chunk + 1) + i;
I think that "chunk + 1" requires short comment why...
This is a quite common pattern (a struct followed by an array of struct). However I added a short comment like

/* after the struct grub_btrfs_chunk_item, there is an array of
struct grub_btrfs_chunk_stripe */
stripe = (struct grub_btrfs_chunk_stripe *) (chunk + 1) + i;


[...]
Post by Daniel Kiper
Post by Goffredo Baroncelli
- for (i = 0; i < redundancy; i++)
+ if (!is_raid56)
Why not "if (is_raid56)"? This looks more natural here.
I think that it was due to the fact than before it was only the not "raid5/6" profiles.
Then it was added the "raid5/6" profiles, so I added this case after.

What I dislike is not the order, but the fact that in one case (not raid5), the same function is called several time until it found valid data. In the other case, it is called only the first time, then a completely different path is called....
I am trying a different solution, but in any case the result is a bit ugly....
Post by Daniel Kiper
Post by Goffredo Baroncelli
+ for (i = 0; i < redundancy; i++)
+ {
+ err = btrfs_read_from_chunk (data, chunk, stripen,
+ stripe_offset,
+ i, /* redundancy */
+ csize, buf);
+ if (!err)
+ break;
+ grub_errno = GRUB_ERR_NONE;
+ }
+ else
{
err = btrfs_read_from_chunk (data, chunk, stripen,
stripe_offset,
- i, /* redundancy */
+ 0, /* no mirror */
csize, buf);
- if (!err)
- break;
grub_errno = GRUB_ERR_NONE;
+ if (err != GRUB_ERR_NONE)
Please be consistent and use "if (err)" here.
ok
Post by Daniel Kiper
Post by Goffredo Baroncelli
+ err = raid56_read_retry (data, chunk, stripe_offset,
+ csize, buf);
}
- if (i != redundancy)
+ if (err == GRUB_ERR_NONE)
if (!err) please...
ok
Post by Daniel Kiper
Daniel
_______________________________________________
Grub-devel mailing list
https://lists.gnu.org/mailman/listinfo/grub-devel
--
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5
Goffredo Baroncelli
2018-09-27 18:35:03 UTC
Permalink
From: Goffredo Baroncelli <***@inwind.it>

The original code which handles the recovery of a RAID 6 disks array
assumes that all reads are multiple of 1 << GRUB_DISK_SECTOR_BITS and it
assumes that all the I/O is done via the struct grub_diskfilter_segment.
This is not true for the btrfs code. In order to reuse the native
grub_raid6_recover() code, it is modified to not call
grub_diskfilter_read_node() directly, but to call an handler passed
as an argument.

Signed-off-by: Goffredo Baroncelli <***@inwind.it>
Reviewed-by: Daniel Kiper <***@oracle.com>
---
grub-core/disk/raid6_recover.c | 52 ++++++++++++++++++++++------------
include/grub/diskfilter.h | 9 ++++++
2 files changed, 43 insertions(+), 18 deletions(-)

diff --git a/grub-core/disk/raid6_recover.c b/grub-core/disk/raid6_recover.c
index aa674f6ca..0cf691ddf 100644
--- a/grub-core/disk/raid6_recover.c
+++ b/grub-core/disk/raid6_recover.c
@@ -74,14 +74,26 @@ mod_255 (unsigned x)
}

static grub_err_t
-grub_raid6_recover (struct grub_diskfilter_segment *array, int disknr, int p,
- char *buf, grub_disk_addr_t sector, grub_size_t size)
+raid6_recover_read_node (void *data, int disknr,
+ grub_uint64_t sector,
+ void *buf, grub_size_t size)
+{
+ struct grub_diskfilter_segment *array = data;
+
+ return grub_diskfilter_read_node (&array->nodes[disknr],
+ (grub_disk_addr_t)sector,
+ size >> GRUB_DISK_SECTOR_BITS, buf);
+}
+
+grub_err_t
+grub_raid6_recover_gen (void *data, grub_uint64_t nstripes, int disknr, int p,
+ char *buf, grub_uint64_t sector, grub_size_t size,
+ int layout, raid_recover_read_t read_func)
{
int i, q, pos;
int bad1 = -1, bad2 = -1;
char *pbuf = 0, *qbuf = 0;

- size <<= GRUB_DISK_SECTOR_BITS;
pbuf = grub_zalloc (size);
if (!pbuf)
goto quit;
@@ -91,17 +103,17 @@ grub_raid6_recover (struct grub_diskfilter_segment *array, int disknr, int p,
goto quit;

q = p + 1;
- if (q == (int) array->node_count)
+ if (q == (int) nstripes)
q = 0;

pos = q + 1;
- if (pos == (int) array->node_count)
+ if (pos == (int) nstripes)
pos = 0;

- for (i = 0; i < (int) array->node_count - 2; i++)
+ for (i = 0; i < (int) nstripes - 2; i++)
{
int c;
- if (array->layout & GRUB_RAID_LAYOUT_MUL_FROM_POS)
+ if (layout & GRUB_RAID_LAYOUT_MUL_FROM_POS)
c = pos;
else
c = i;
@@ -109,8 +121,7 @@ grub_raid6_recover (struct grub_diskfilter_segment *array, int disknr, int p,
bad1 = c;
else
{
- if (! grub_diskfilter_read_node (&array->nodes[pos], sector,
- size >> GRUB_DISK_SECTOR_BITS, buf))
+ if (!read_func(data, pos, sector, buf, size))
{
grub_crypto_xor (pbuf, pbuf, buf, size);
grub_raid_block_mulx (c, buf, size);
@@ -128,7 +139,7 @@ grub_raid6_recover (struct grub_diskfilter_segment *array, int disknr, int p,
}

pos++;
- if (pos == (int) array->node_count)
+ if (pos == (int) nstripes)
pos = 0;
}

@@ -139,16 +150,14 @@ grub_raid6_recover (struct grub_diskfilter_segment *array, int disknr, int p,
if (bad2 < 0)
{
/* One bad device */
- if ((! grub_diskfilter_read_node (&array->nodes[p], sector,
- size >> GRUB_DISK_SECTOR_BITS, buf)))
+ if (!read_func(data, p, sector, buf, size))
{
grub_crypto_xor (buf, buf, pbuf, size);
goto quit;
}

grub_errno = GRUB_ERR_NONE;
- if (grub_diskfilter_read_node (&array->nodes[q], sector,
- size >> GRUB_DISK_SECTOR_BITS, buf))
+ if (read_func(data, q, sector, buf, size))
goto quit;

grub_crypto_xor (buf, buf, qbuf, size);
@@ -160,14 +169,12 @@ grub_raid6_recover (struct grub_diskfilter_segment *array, int disknr, int p,
/* Two bad devices */
unsigned c;

- if (grub_diskfilter_read_node (&array->nodes[p], sector,
- size >> GRUB_DISK_SECTOR_BITS, buf))
+ if (read_func(data, p, sector, buf, size))
goto quit;

grub_crypto_xor (pbuf, pbuf, buf, size);

- if (grub_diskfilter_read_node (&array->nodes[q], sector,
- size >> GRUB_DISK_SECTOR_BITS, buf))
+ if (read_func(data, q, sector, buf, size))
goto quit;

grub_crypto_xor (qbuf, qbuf, buf, size);
@@ -190,6 +197,15 @@ quit:
return grub_errno;
}

+static grub_err_t
+grub_raid6_recover (struct grub_diskfilter_segment *array, int disknr, int p,
+ char *buf, grub_disk_addr_t sector, grub_size_t size)
+{
+ return grub_raid6_recover_gen (array, array->node_count, disknr, p, buf,
+ sector, size << GRUB_DISK_SECTOR_BITS,
+ array->layout, raid6_recover_read_node);
+}
+
GRUB_MOD_INIT(raid6rec)
{
grub_raid6_init_table ();
diff --git a/include/grub/diskfilter.h b/include/grub/diskfilter.h
index d89273c1b..8deb1a8c3 100644
--- a/include/grub/diskfilter.h
+++ b/include/grub/diskfilter.h
@@ -189,6 +189,15 @@ typedef grub_err_t (*grub_raid6_recover_func_t) (struct grub_diskfilter_segment
extern grub_raid5_recover_func_t grub_raid5_recover_func;
extern grub_raid6_recover_func_t grub_raid6_recover_func;

+typedef grub_err_t (* raid_recover_read_t)(void *data, int disk_nr,
+ grub_uint64_t addr, void *dest,
+ grub_size_t size);
+
+extern grub_err_t
+grub_raid6_recover_gen (void *data, grub_uint64_t nstripes, int disknr, int p,
+ char *buf, grub_uint64_t sector, grub_size_t size,
+ int layout, raid_recover_read_t read_func);
+
grub_err_t grub_diskfilter_vg_register (struct grub_diskfilter_vg *vg);

grub_err_t
--
2.19.0
Goffredo Baroncelli
2018-09-27 18:34:57 UTC
Permalink
From: Goffredo Baroncelli <***@inwind.it>

This helper is used in a few places to help the debugging. As
conservative approach the error is only logged.
This does not impact the error handling.

Signed-off-by: Goffredo Baroncelli <***@inwind.it>
Reviewed-by: Daniel Kiper <***@oracle.com>
---
grub-core/fs/btrfs.c | 24 +++++++++++++++++++++++-
1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/grub-core/fs/btrfs.c b/grub-core/fs/btrfs.c
index 9bc6d399d..bf0dbce21 100644
--- a/grub-core/fs/btrfs.c
+++ b/grub-core/fs/btrfs.c
@@ -77,7 +77,8 @@ struct btrfs_header
{
grub_btrfs_checksum_t checksum;
grub_btrfs_uuid_t uuid;
- grub_uint8_t dummy[0x30];
+ grub_uint64_t bytenr;
+ grub_uint8_t dummy[0x28];
grub_uint32_t nitems;
grub_uint8_t level;
} GRUB_PACKED;
@@ -286,6 +287,25 @@ free_iterator (struct grub_btrfs_leaf_descriptor *desc)
grub_free (desc->data);
}

+static grub_err_t
+check_btrfs_header (struct grub_btrfs_data *data, struct btrfs_header *header,
+ grub_disk_addr_t addr)
+{
+ if (grub_le_to_cpu64 (header->bytenr) != addr)
+ {
+ grub_dprintf ("btrfs", "btrfs_header.bytenr is not equal node addr\n");
+ return grub_error (GRUB_ERR_BAD_FS,
+ "header bytenr is not equal node addr");
+ }
+ if (grub_memcmp (data->sblock.uuid, header->uuid, sizeof(grub_btrfs_uuid_t)))
+ {
+ grub_dprintf ("btrfs", "btrfs_header.uuid doesn't match sblock uuid\n");
+ return grub_error (GRUB_ERR_BAD_FS,
+ "header uuid doesn't match sblock uuid");
+ }
+ return GRUB_ERR_NONE;
+}
+
static grub_err_t
save_ref (struct grub_btrfs_leaf_descriptor *desc,
grub_disk_addr_t addr, unsigned i, unsigned m, int l)
@@ -341,6 +361,7 @@ next (struct grub_btrfs_data *data,

err = grub_btrfs_read_logical (data, grub_le_to_cpu64 (node.addr),
&head, sizeof (head), 0);
+ check_btrfs_header (data, &head, grub_le_to_cpu64 (node.addr));
if (err)
return -err;

@@ -402,6 +423,7 @@ lower_bound (struct grub_btrfs_data *data,
/* FIXME: preread few nodes into buffer. */
err = grub_btrfs_read_logical (data, addr, &head, sizeof (head),
recursion_depth + 1);
+ check_btrfs_header (data, &head, addr);
if (err)
return err;
addr += sizeof (head);
--
2.19.0
Goffredo Baroncelli
2018-09-27 18:34:58 UTC
Permalink
From: Goffredo Baroncelli <***@inwind.it>

The caller knows better if this error is fatal or not, i.e. another disk is
available or not.

This is a preparatory patch.

Signed-off-by: Goffredo Baroncelli <***@inwind.it>
Reviewed-by: Daniel Kiper <***@oracle.com>
---
grub-core/fs/btrfs.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/grub-core/fs/btrfs.c b/grub-core/fs/btrfs.c
index bf0dbce21..2b1dd7dd4 100644
--- a/grub-core/fs/btrfs.c
+++ b/grub-core/fs/btrfs.c
@@ -603,12 +603,7 @@ find_device (struct grub_btrfs_data *data, grub_uint64_t id, int do_rescan)
if (do_rescan)
grub_device_iterate (find_device_iter, &ctx);
if (!ctx.dev_found)
- {
- grub_error (GRUB_ERR_BAD_FS,
- N_("couldn't find a necessary member device "
- "of multi-device filesystem"));
- return NULL;
- }
+ return NULL;
data->n_devices_attached++;
if (data->n_devices_attached > data->n_devices_allocated)
{
@@ -906,6 +901,9 @@ grub_btrfs_read_logical (struct grub_btrfs_data *data, grub_disk_addr_t addr,
dev = find_device (data, stripe->device_id, j);
if (!dev)
{
+ grub_dprintf ("btrfs",
+ "couldn't find a necessary member device "
+ "of multi-device filesystem\n");
err = grub_errno;
grub_errno = GRUB_ERR_NONE;
continue;
--
2.19.0
Goffredo Baroncelli
2018-09-27 18:35:04 UTC
Permalink
From: Goffredo Baroncelli <***@inwind.it>

Add the RAID 6 recovery, in order to use a RAID 6 filesystem even if some
disks (up to two) are missing. This code use the md RAID 6 code already
present in grub.

Signed-off-by: Goffredo Baroncelli <***@inwind.it>
---
grub-core/fs/btrfs.c | 63 +++++++++++++++++++++++++++++++++++++++-----
1 file changed, 57 insertions(+), 6 deletions(-)

diff --git a/grub-core/fs/btrfs.c b/grub-core/fs/btrfs.c
index db8df0eea..b49f714ef 100644
--- a/grub-core/fs/btrfs.c
+++ b/grub-core/fs/btrfs.c
@@ -30,6 +30,7 @@
#include <grub/i18n.h>
#include <grub/btrfs.h>
#include <grub/crypto.h>
+#include <grub/diskfilter.h>

GRUB_MOD_LICENSE ("GPLv3+");

@@ -702,11 +703,36 @@ rebuild_raid5 (char *dest, struct raid56_buffer *buffers,
}
}

+static grub_err_t
+raid6_recover_read_buffer (void *data, int disk_nr,
+ grub_uint64_t addr __attribute__ ((unused)),
+ void *dest, grub_size_t size)
+{
+ struct raid56_buffer *buffers = data;
+
+ if (!buffers[disk_nr].data_is_valid)
+ return grub_errno = GRUB_ERR_READ_ERROR;
+
+ grub_memcpy(dest, buffers[disk_nr].buf, size);
+
+ return grub_errno = GRUB_ERR_NONE;
+}
+
+static void
+rebuild_raid6 (struct raid56_buffer *buffers, grub_uint64_t nstripes,
+ grub_uint64_t csize, grub_uint64_t parities_pos, void *dest,
+ grub_uint64_t stripen)
+
+{
+ grub_raid6_recover_gen (buffers, nstripes, stripen, parities_pos,
+ dest, 0, csize, 0, raid6_recover_read_buffer);
+}
+
static grub_err_t
raid56_read_retry (struct grub_btrfs_data *data,
struct grub_btrfs_chunk_item *chunk,
- grub_uint64_t stripe_offset,
- grub_uint64_t csize, void *buf)
+ grub_uint64_t stripe_offset, grub_uint64_t stripen,
+ grub_uint64_t csize, void *buf, grub_uint64_t parities_pos)
{
struct raid56_buffer *buffers;
grub_uint64_t nstripes = grub_le_to_cpu16 (chunk->nstripes);
@@ -777,6 +803,15 @@ raid56_read_retry (struct grub_btrfs_data *data,
ret = GRUB_ERR_READ_ERROR;
goto cleanup;
}
+ else if (failed_devices > 2 && (chunk_type & GRUB_BTRFS_CHUNK_TYPE_RAID6))
+ {
+ grub_dprintf ("btrfs",
+ "not enough disks for raid6: total %" PRIuGRUB_UINT64_T
+ ", missing %" PRIuGRUB_UINT64_T "\n",
+ nstripes, failed_devices);
+ ret = GRUB_ERR_READ_ERROR;
+ goto cleanup;
+ }
else
grub_dprintf ("btrfs",
"enough disks for RAID 5 rebuilding: total %"
@@ -787,7 +822,7 @@ raid56_read_retry (struct grub_btrfs_data *data,
if (chunk_type & GRUB_BTRFS_CHUNK_TYPE_RAID5)
rebuild_raid5 (buf, buffers, nstripes, csize);
else
- grub_dprintf ("btrfs", "called rebuild_raid6(), NOT IMPLEMENTED\n");
+ rebuild_raid6 (buffers, nstripes, csize, parities_pos, buf, stripen);

ret = GRUB_ERR_NONE;
cleanup:
@@ -877,9 +912,11 @@ grub_btrfs_read_logical (struct grub_btrfs_data *data, grub_disk_addr_t addr,
unsigned redundancy = 1;
unsigned i, j;
int is_raid56;
+ grub_uint64_t parities_pos = 0;

- is_raid56 = !!(grub_le_to_cpu64 (chunk->type) &
- GRUB_BTRFS_CHUNK_TYPE_RAID5);
+ is_raid56 = !!(grub_le_to_cpu64 (chunk->type) &
+ (GRUB_BTRFS_CHUNK_TYPE_RAID5 |
+ GRUB_BTRFS_CHUNK_TYPE_RAID6));

if (grub_le_to_cpu64 (chunk->size) <= off)
{
@@ -1011,6 +1048,9 @@ grub_btrfs_read_logical (struct grub_btrfs_data *data, grub_disk_addr_t addr,
* size if the latter is smaller.
* - nparities is the number of parities (1 for RAID5, 2 for
* RAID6); used only in RAID5/6 code.
+ * size if the latter is smaller.
+ * - parities_pos is the position of the parity in a row (
+ * 2 for P1, 3 for P2...)
*/
stripe_nr = grub_divmod64 (off, chunk_stripe_length, &low);

@@ -1029,6 +1069,17 @@ grub_btrfs_read_logical (struct grub_btrfs_data *data, grub_disk_addr_t addr,
*/
grub_divmod64 (high + stripen, nstripes, &stripen);

+ /*
+ * parities_pos is equal to "(high - nparities) % nstripes"
+ * (see the diagram above).
+ * However "high - nparities" might be negative (eg when high
+ * == 0) leading to an incorrect computation.
+ * Instead "high + nstripes - nparities" is always positive and
+ * in modulo nstripes is equal to "(high - nparities) % nstripes"
+ */
+ grub_divmod64 (high + nstripes - nparities, nstripes,
+ &parities_pos);
+
stripe_offset = low + chunk_stripe_length * high;
csize = chunk_stripe_length - low;

@@ -1080,7 +1131,7 @@ grub_btrfs_read_logical (struct grub_btrfs_data *data, grub_disk_addr_t addr,
grub_errno = GRUB_ERR_NONE;
if (err != GRUB_ERR_NONE)
err = raid56_read_retry (data, chunk, stripe_offset,
- csize, buf);
+ stripen, csize, buf, parities_pos);
}
if (err == GRUB_ERR_NONE)
break;
--
2.19.0
Daniel Kiper
2018-10-09 18:24:52 UTC
Permalink
Post by Goffredo Baroncelli
Add the RAID 6 recovery, in order to use a RAID 6 filesystem even if some
disks (up to two) are missing. This code use the md RAID 6 code already
present in grub.
Reviewed-by: Daniel Kiper <***@oracle.com>

Daniel
Daniel Kiper
2018-10-11 16:56:37 UTC
Permalink
Post by Goffredo Baroncelli
i All,
the aim of this patches set is to provide support for a BTRFS raid5/6
filesystem in GRUB.
I have sent you updated comment and commit message. Please double check it.
If everything is OK please repost whole series.

Daniel

Loading...