From: Daniel Kiper <dkiper@net-space.pl>
To: kreijack@inwind.it
Cc: Daniel Kiper <dkiper@net-space.pl>,
grub-devel@gnu.org, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 1/9] btrfs: Add support for reading a filesystem with a RAID 5 or RAID 6 profile.
Date: Thu, 27 Sep 2018 17:47:59 +0200
Message-ID: <20180927154759.GA22053@router-fw-old.i.net-space.pl>
In-Reply-To: <01115a89-3397-3cad-73b1-10495e11cb61@libero.it>
On Wed, Sep 26, 2018 at 10:40:32PM +0200, Goffredo Baroncelli wrote:
> On 25/09/2018 17.31, Daniel Kiper wrote:
> > On Wed, Sep 19, 2018 at 08:40:32PM +0200, Goffredo Baroncelli wrote:
> >> From: Goffredo Baroncelli <kreijack@inwind.it>
> >>
> >> Signed-off-by: Goffredo Baroncelli <kreijack@inwind.it>
> >> ---
> >> grub-core/fs/btrfs.c | 66 ++++++++++++++++++++++++++++++++++++++++++++
> >> 1 file changed, 66 insertions(+)
> >>
> >> diff --git a/grub-core/fs/btrfs.c b/grub-core/fs/btrfs.c
> >> index be195448d..56c42746d 100644
> >> --- a/grub-core/fs/btrfs.c
> >> +++ b/grub-core/fs/btrfs.c
> >> @@ -119,6 +119,8 @@ struct grub_btrfs_chunk_item
> >> #define GRUB_BTRFS_CHUNK_TYPE_RAID1 0x10
> >> #define GRUB_BTRFS_CHUNK_TYPE_DUPLICATED 0x20
> >> #define GRUB_BTRFS_CHUNK_TYPE_RAID10 0x40
> >> +#define GRUB_BTRFS_CHUNK_TYPE_RAID5 0x80
> >> +#define GRUB_BTRFS_CHUNK_TYPE_RAID6 0x100
> >> grub_uint8_t dummy2[0xc];
> >> grub_uint16_t nstripes;
> >> grub_uint16_t nsubstripes;
> >> @@ -764,6 +766,70 @@ grub_btrfs_read_logical (struct grub_btrfs_data *data, grub_disk_addr_t addr,
> >> stripe_offset = low + chunk_stripe_length
> >> * high;
> >> csize = chunk_stripe_length - low;
> >> + break;
> >> + }
> >> + case GRUB_BTRFS_CHUNK_TYPE_RAID5:
> >> + case GRUB_BTRFS_CHUNK_TYPE_RAID6:
> >> + {
> >> + grub_uint64_t nparities, block_nr, high, low;
> >> +
> >> + redundancy = 1; /* no redundancy for now */
> >> +
> >> + if (grub_le_to_cpu64 (chunk->type) & GRUB_BTRFS_CHUNK_TYPE_RAID5)
> >> + {
> >> + grub_dprintf ("btrfs", "RAID5\n");
> >> + nparities = 1;
> >> + }
> >> + else
> >> + {
> >> + grub_dprintf ("btrfs", "RAID6\n");
> >> + nparities = 2;
> >> + }
> >> +
> >> + /*
> >> + * A RAID 6 layout consists of several blocks spread on the disks.
> >> + * The raid terminology is used to call all the blocks of a row
> >> + * "stripe". Unfortunately the BTRFS terminology confuses block
> >
> > Stripe is a data set or parity (parity stripe) on one disk. Block has a
> > different meaning. Please stick to btrfs terminology and say it clearly
> > in the comment. You could even add a link to the btrfs wiki page to ease reading.
> >
> > I think about this one:
> > https://btrfs.wiki.kernel.org/index.php/Manpage/mkfs.btrfs#BLOCK_GROUPS.2C_CHUNKS.2C_RAID
> >
> >> + * and stripe.
> >
> > I do not think so. Or at least not so much...
>
> Trust me, generally speaking a stripe is the "row" across the disks (without the parity); looking at the ext3 man page:
>
> ....
> stride=stride-size
>     Configure the filesystem for a RAID array with
>     stride-size filesystem blocks. This is the number of
>     blocks read or written to disk before moving to the
>     next disk, which is sometimes referred to as the
>     chunk size. This mostly affects placement of
>     filesystem metadata like bitmaps at mke2fs time to
>     avoid placing them on a single disk, which can hurt
>     performance. It may also be used by the block
>     allocator.
>
> stripe_width=stripe-width
>     Configure the filesystem for a RAID array with
>     stripe-width filesystem blocks per stripe. This is
>     typically stride-size * N, where N is the number of
>     data-bearing disks in the RAID (e.g. for RAID 5
>     there is one parity disk, so N will be the number of
>     disks in the array minus 1). This allows the block
>     allocator to prevent read-modify-write of the parity
>     in a RAID stripe if possible when the data is
>     written.
>
> ....
> Looking at the RAID5 Wikipedia page, it seems that the term "stripe"
> is consistent with the ext3 man page.
Ugh... It looks like I have messed things up. Sorry about that.
> I suspect that the confusion is due to the fact that in RAID1 a stripe
> is in ONE disk (because the others are like parities). In BTRFS the
> RAID5/6 code uses the structure of RAID1 with some minimal
> extensions...
>
> To be clear, I don't have a problem adhering to the BTRFS
> terminology. However, I found this very confusing because I was used to
> a different terminology.
Yeah, I have the same feeling. However, I think that in btrfs code we
should stay with btrfs terms. Though I think that it makes sense to
underline the differences between btrfs and well-known RAID here.
> I am a bit worried about the fact that grub
> uses both MD/DM code and BTRFS code; a quick look at the code (e.g.
> ./grub-core/disk/diskfilter.c) shows that the stripe_size field seems
> to be related to a disk row without parities.
>
> And... yes, in BTRFS "chunk" is a completely different beast than what
> is described in the ext3 man page :-)
As I said above, please say it in the comment. This will ease reading
for people who are not used to btrfs terms.
> >> + *
> >> + * Disk0 Disk1 Disk2 Disk3
> >> + *
> >> + * A1 B1 P1 Q1
> >> + * Q2 A2 B2 P2
> >> + * P3 Q3 A3 B3
> >> + * [...]
> >> + *
> >> + * Note that the placement of the parities depends on row index.
> >> + * In the code below:
> >> + * - block_nr is the block number without the parities
> >
> > Well, it seems to me that the btrfs code introduces confusion not the
> > spec itself. I would leave code as is but s/block number/stripe number/.
>
> OK, I will replace the two terms. However, I have to add a warning that this is "BTRFS" terminology :-)
Yep, and please explain the differences at the beginning of the comment.
> >> + * (A1 = 0, B1 = 1, A2 = 2, B2 = 3, ...),
> >> + * - high is the row number (0 for A1...Q1, 1 for Q2...P2, ...),
> >> + * - stripen is the disk number (0 for A1,Q2,P3, 1 for B1...),
> >
> > s/disk number/disk number in a row/
>
> This value does not depend on the row, so "disk number" is more correct.
Yes, but without "row" it is a bit confusing because it suggests that
it is an arbitrary number, even if you give an example next to the
description. So, I am insisting on adding "in a row" here.
> >> + * - off is the logical address to read
> >> + * - chunk_stripe_length is the size of a block (typically 64k),
> >
> > s/a block/a stripe/
Taking into account the discussion above, I am not sure right now which
one is correct. Please double check and fix it if needed.
> >> + * - nstripes is the number of disks,
> >
> > s/number of disks/number of disks in a row/
> ditto
As above: is it the total number of disks in the array? I do not think so.
Hence, I think that "in a row" helps a bit, even if it is not very
precise. However, if you come up with something better, I am not
against it.
> > I miss the description of nparities here...
>
> Right:
> + * - nparities is the number of parities (1 for RAID5, 2 for RAID6);
> + * used only in RAID5/6 code.
LGTM.
> >> + * - low is the offset of the data inside a stripe,
> >> + * - stripe_offset is the disk offset,
> >
> > s/the disk offset/the data offset in an array/?
>
> Yes
>
> >
> >> + * - csize is the "potential" data to read. It will be reduced to
> >> + * size if the latter is smaller.
> >> + */
> >> + block_nr = grub_divmod64 (off, chunk_stripe_length, &low);
> >> +
> >> + /*
> >> + * stripen is computed without the parities (0 for A1, A2, A3...
> >> + * 1 for B1, B2...).
> >> + */
> >> + high = grub_divmod64 (block_nr, nstripes - nparities, &stripen);
> >
> > This is clear...
> >
> >> + /*
> >> + * stripen is recomputed considering the parities (0 for A1, 1 for
> >> + * A2, 2 for A3....).
> >> + */
> >> + grub_divmod64 (high + stripen, nstripes, &stripen);
> >
> > ... but this looks strange... You add disk number to row number. Hmmm...
> > It looks that it works but this is not obvious at first sight. Could you
> > explain that?
>
> What about
> + /*
> + * stripen is recomputed considering the parities: different rows have
> + * a different offset, so we have to add the row number ("high") to stripen
> + * modulo nstripes (0 for A1, 1 for A2, 2 for A3....).
> + */
This is better, but not by much. You are repeating what the code does.
I am especially interested in why this math is correct. It is not
obvious at first sight. If it is not obvious, it should be explained.
Otherwise, in a few months we will forget why it is correct.
Daniel