From: Johannes Thumshirn <Johannes.Thumshirn@wdc.com>
To: Qu Wenruo <quwenruo.btrfs@gmx.com>,
	"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: RAID56 discussion related to RST. (Was "Re: [RFC ONLY 0/8] btrfs: introduce raid-stripe-tree")
Date: Wed, 13 Jul 2022 11:43:32 +0000
Message-ID: <PH0PR04MB74164213B5F136059236B78C9B899@PH0PR04MB7416.namprd04.prod.outlook.com>
In-Reply-To: 78daa7e4-7c88-d6c0-ccaa-fb148baf7bc8@gmx.com

On 13.07.22 12:54, Qu Wenruo wrote:
> 
> 
> On 2022/5/16 22:31, Johannes Thumshirn wrote:
>> Introduce a raid-stripe-tree to record writes in a RAID environment.
>>
>> In essence this adds another address translation layer between the logical
>> and the physical addresses in btrfs and is designed to close two gaps. The
>> first is the ominous RAID-write-hole we suffer from with RAID5/6 and the
>> second one is the inability of doing RAID with zoned block devices due to the
>> constraints we have with REQ_OP_ZONE_APPEND writes.
> 
> Here I want to discuss something related to RAID56 and RST.
> 
> One of my long-standing concerns is that P/Q stripes have a higher update
> frequency, so with certain transaction commit/data writeback timing,
> wouldn't the device storing the P/Q stripes run out of space before
> the data stripe devices?

P/Q stripes on a dedicated drive would be RAID4, which we don't have.

> 
> One example is like this: we have a 3-disk RAID5, with RST and the zoned
> allocator (the allocated logical bytenr can only go forward):
> 
> 	0		32K		64K
> Disk 1	|                               | (data stripe)
> Disk 2	|                               | (data stripe)
> Disk 3	|                               | (parity stripe)
> 
> And initially, all the zones in those disks are empty, and their write
> pointers are all at the beginning of the zone. (all data)
> 
> Then we write 0~4K in the range, and writeback happens immediately (can
> be DIO or sync).
> 
> We need to write the 0~4K back to disk 1, and update P for that vertical
> stripe, right? So we get:
> 
> 	0		32K		64K
> Disk 1	|X                              | (data stripe)
> Disk 2	|                               | (data stripe)
> Disk 3	|X                              | (parity stripe)
> 
> Then we write into the 4~8K range, and sync immediately.
> 
> If we go CoW for the P (we have to anyway), what we get is:
> 
> 	0		32K		64K
> Disk 1	|X                              | (data stripe)
> Disk 2	|X                              | (data stripe)
> Disk 3	|XX                             | (parity stripe)
> 
> So now, you can see disk 3 (the zone handling parity) has its write
> pointer moved 8K forward, but both data stripe zones only have their write
> pointers moved 4K forward.
> 
> If we go forward like this, always a 4K write and sync, we will eventually
> hit the following case:
> 
> 	0		32K		64K
> Disk 1	|XXXXXXXXXXXXXXX                | (data stripe)
> Disk 2	|XXXXXXXXXXXXXXX                | (data stripe)
> Disk 3	|XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX| (parity stripe)
> 
> The extent allocator should still think we have 64K of free space to write,
> as we have only really written 64K.
> 
> But the zone for the parity stripe is already exhausted.
> 
> How could we handle such a case?
> As RAID0/1 shouldn't have such a problem at all, the imbalance is purely
> caused by the fact that CoWing P/Q will cause a higher write frequency.
> 
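
The scenario quoted above can be sketched as a small simulation. This is
only a toy model, not btrfs code: the 64K zone size, the round-robin
placement of data blocks, and the name simulate_sync_writes are all
assumptions taken from (or invented for) the diagrams, in which every 4K
sync write appends one block to a data disk and one freshly CoWed block to
the parity disk.

```python
BLOCK = 4 * 1024
ZONE_SIZE = 64 * 1024  # assumed per-disk zone size, matching the diagrams


def simulate_sync_writes(num_writes, data_disks=2):
    """Return (data write pointers, parity write pointer) in bytes.

    Toy model of the quoted example: each 4K sync write appends one block
    to a single data disk (round-robin) and, because parity is CoWed
    rather than overwritten, appends one new block to the parity disk.
    """
    data_wp = [0] * data_disks
    parity_wp = 0
    for i in range(num_writes):
        data_wp[i % data_disks] += BLOCK
        parity_wp += BLOCK
    return data_wp, parity_wp


# 16 writes of 4K each is 64K of data in total.
data_wp, parity_wp = simulate_sync_writes(16)
free_data = sum(ZONE_SIZE - wp for wp in data_wp)
print(data_wp, parity_wp, free_data)
```

Under these assumptions the parity write pointer reaches the end of its
zone while each data zone is only half full, so 64K of data space still
looks free to the allocator.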

Then a new zone for the parity stripe has to be allocated, and the old one
gets reclaimed. That's nothing new. Of course there are some gotchas in the
extent allocator and the active zone management we need to consider, but
overall I do not see where the blocker is here.
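
As a rough illustration of the idea, here is a minimal sketch of switching
to a new parity zone once the current one is full. It is not btrfs code:
the class ParityZoneAllocator, its fields, and the 64K zone size are
hypothetical, and it ignores the extent allocator and active zone limits
mentioned above. A real reclaim would presumably also have to deal with
any parity blocks in the old zone that are still live.

```python
BLOCK = 4 * 1024
ZONE_SIZE = 64 * 1024  # assumed zone size, for illustration only


class ParityZoneAllocator:
    """Hypothetical append-only allocator for CoWed parity blocks."""

    def __init__(self):
        self.active_zone = 0      # index of the zone currently written to
        self.write_pointer = 0    # byte offset inside the active zone
        self.reclaim_queue = []   # full zones waiting to be reclaimed

    def append_parity(self):
        """Return (zone, offset) for the next 4K parity block."""
        if self.write_pointer + BLOCK > ZONE_SIZE:
            # Active zone is exhausted: queue it for reclaim and move on
            # to a fresh zone whose write pointer starts at 0.
            self.reclaim_queue.append(self.active_zone)
            self.active_zone += 1
            self.write_pointer = 0
        offset = self.write_pointer
        self.write_pointer += BLOCK
        return self.active_zone, offset


alloc = ParityZoneAllocator()
locations = [alloc.append_parity() for _ in range(17)]
print(locations[15], locations[16], alloc.reclaim_queue)
```

In this model the 17th parity write lands at the start of a second zone
and the first zone ends up on the reclaim queue.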
