From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Johannes Thumshirn <Johannes.Thumshirn@wdc.com>,
	Qu Wenruo <wqu@suse.com>,
	"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: RAID56 discussion related to RST. (Was "Re: [RFC ONLY 0/8] btrfs: introduce raid-stripe-tree")
Date: Thu, 14 Jul 2022 15:53:04 +0800
Message-ID: <f54f6709-9b31-fc9c-6b5d-10dd43b089ca@gmx.com>
In-Reply-To: <PH0PR04MB7416243FCD419B4BDDB04D8C9B889@PH0PR04MB7416.namprd04.prod.outlook.com>



On 2022/7/14 15:46, Johannes Thumshirn wrote:
> On 14.07.22 09:32, Qu Wenruo wrote:
>>
>>
>> On 2022/7/14 15:08, Johannes Thumshirn wrote:
>>> On 14.07.22 03:08, Qu Wenruo wrote:
>>>> [CASE 2 CURRENT WRITE ORDER, PADDING]
>>>> No different from case 1, except that once we have finished sector 7,
>>>> all zones are exhausted.
>>>>
>>>> Total written bytes: 64K
>>>> Expected written bytes: 128K (nr_data * 64K)
>>>> Efficiency: 1 / nr_data.
>>>>
>>> I'm sorry, but I have to disagree.
>>> If we're writing less than 64k, the rest of the stripe gets filled up with 0s:
>>>
>>>          0                               64K
>>> Disk 1 | D1| 0 | 0 | 0 | 0 | 0 | 0 | 0 | (Data stripe)
>>> Disk 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | (Data stripe)
>>> Disk 3 | P | P | P | P | P | P | P | P | (Parity stripe)
>>>
>>> So the next write (the CoW) will then be:
>>>
        64K                              128K
>>> Disk 1 | D1| 0 | 0 | 0 | 0 | 0 | 0 | 0 | (Data stripe)
>>> Disk 2 | D2| 0 | 0 | 0 | 0 | 0 | 0 | 0 | (Data stripe)
>>> Disk 3 | P'| P'| P'| P'| P'| P'| P'| P'| (Parity stripe)
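A rough model of the zero-padding scheme in the diagrams above. This is a hypothetical sketch, not btrfs code: it assumes every write that does not fill a full stripe is padded with zeros up to a whole full stripe, that each device contributes `stripe_len` bytes (64K, as in the diagram) per full stripe, and that parity occupies `nr_parity` devices. The helper names `padded_write_cost` and `space_efficiency` are made up for illustration.

```python
# Hypothetical model of zero-padded full-stripe writes (not btrfs code).
KiB = 1024


def padded_write_cost(write_bytes: int, nr_data: int, nr_parity: int,
                      stripe_len: int = 64 * KiB) -> int:
    """Raw device bytes consumed when a write is zero-padded to whole full stripes."""
    data_per_full_stripe = nr_data * stripe_len
    # Round the write up to a whole number of full stripes (ceiling division).
    nr_full_stripes = -(-write_bytes // data_per_full_stripe)
    # Every full stripe occupies stripe_len on each data and parity device.
    return nr_full_stripes * (nr_data + nr_parity) * stripe_len


def space_efficiency(write_bytes: int, nr_data: int, nr_parity: int,
                     stripe_len: int = 64 * KiB) -> float:
    """Fraction of the consumed raw space that holds user data."""
    return write_bytes / padded_write_cost(write_bytes, nr_data, nr_parity,
                                           stripe_len)


if __name__ == "__main__":
    # 3-device RAID5 as in the diagram: 2 data stripes + 1 parity stripe.
    for size in (4 * KiB, 64 * KiB, 128 * KiB):
        print(f"{size // KiB:>4}K write -> "
              f"{padded_write_cost(size, 2, 1) // KiB}K raw, "
              f"efficiency {space_efficiency(size, 2, 1):.4f}")
```

Under these assumptions a write that fills the full stripe reaches the usual nr_data / (nr_data + nr_parity) efficiency, while a small write consumes the same 192K of raw space as a full one.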
>>
>> Nope, currently the full stripe write should still go into disk 1, not disk 2.
>> Sorry, I used a bad example from the very beginning.
>>
>> In that case, what we should have is:
>>
>>          0                               64K
>> Disk 1 | D1| D2| 0 | 0 | 0 | 0 | 0 | 0 | (Data stripe)
>> Disk 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | (Data stripe)
>> Disk 3 | P | P | 0 | 0 | 0 | 0 | 0 | 0 | (Parity stripe)
>>
>> In that case, parity still needs two blocks.
>>
>> And when Disk 1 gets filled up, we have no way to write into Disk 2.
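A rough model of the layout above for a 3-device RAID5 on zoned devices. This is a hypothetical sketch, not btrfs code: it assumes 4K sectors, a 64K stripe length, that each sub-stripe write is appended to disk 1's data stripe, and that each one needs a fresh parity sector because parity cannot be overwritten in a sequential zone. The function name `simulate_sub_stripe_writes` is made up for illustration.

```python
from dataclasses import dataclass

KiB = 1024


@dataclass
class Usage:
    data: int = 0      # bytes of user data written
    parity: int = 0    # bytes of parity written
    unusable: int = 0  # data-stripe bytes skipped because parity can no longer cover them


def simulate_sub_stripe_writes(nr_writes: int, nr_data: int = 2,
                               sector: int = 4 * KiB,
                               stripe_len: int = 64 * KiB) -> Usage:
    """Model sequential-zone RAID5 where every write is exactly one sector."""
    usage = Usage()
    data_wp = 0  # write pointer inside disk 1's data stripe
    for _ in range(nr_writes):
        if data_wp == stripe_len:
            # Disk 1's data stripe is full, and so is the parity stripe (one
            # parity sector per data sector). The other data stripes of this
            # full stripe can no longer get parity, so skip to the next one.
            usage.unusable += (nr_data - 1) * stripe_len
            data_wp = 0
        data_wp += sector
        usage.data += sector
        usage.parity += sector  # a new parity sector, since overwrite is impossible
    return usage


if __name__ == "__main__":
    u = simulate_sub_stripe_writes(32)  # two full stripes' worth of 4K writes
    print(u)
    print("data / (data + parity):", u.data / (u.data + u.parity))
```

In this model parity grows one-for-one with data, the same ratio as RAID1, and the data stripe on disk 2 is never used.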
>>
>>>
>>> For zoned we can play this game zone_size/stripe_size times, which on a typical
>>> SMR HDD would be:
>>>
>>> 256M/64K = 4096 times until you fill up a zone.
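The arithmetic behind the 4096 figure, assuming the 256 MiB zone size that is common on SMR HDDs and the 64K btrfs RAID stripe length. The helper name is hypothetical.

```python
KiB = 1024
MiB = 1024 * KiB


def full_stripes_per_zone(zone_size: int, stripe_len: int) -> int:
    """How many stripe_len-sized chunks fit into one sequential zone."""
    return zone_size // stripe_len


if __name__ == "__main__":
    print(full_stripes_per_zone(256 * MiB, 64 * KiB))
```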
>>
>> No difference.
>>
>> You have an extra zone to use, but in the worst case the space
>> efficiency will still be no better than RAID1.
>>
>>>
>>> I.e., if you do stupid things, you get stupid results. C'est la vie.
>>>
>>
>> You still didn't answer the space efficiency problem.
>>
>> RAID56 really relies on overwriting its P/Q stripes.
>
> Nope, btrfs raid56 does this. Another implementation could, for instance,
> buffer each stripe in NVRAM (as described in [1]), or, as Chris
> suggested, in a RAID1 area on the drives, or use variable stripe length
> like ZFS' RAID-Z, and so on.

Not only btrfs raid56: dm-raid56 does this too.

And what you mention is just a variant of a journal: delay the write
until we have a full stripe.
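A minimal sketch of that "delay the write until a full stripe" idea, whether the staging area is NVRAM, a RAID1 area, or a journal. Everything here is hypothetical: `StripeBuffer` is not a btrfs or dm API, parity is plain RAID5 XOR, and a real implementation would have to make the pending bytes persistent before acknowledging the write.

```python
from functools import reduce


class StripeBuffer:
    """Hypothetical staging buffer: holds sub-stripe writes until a full
    stripe of nr_data * stripe_len bytes is available, then emits it
    together with its XOR parity."""

    def __init__(self, nr_data: int, stripe_len: int):
        self.nr_data = nr_data
        self.stripe_len = stripe_len
        self.pending = bytearray()  # staged data (would need to be persistent)
        self.flushed = []           # list of (data_stripes, parity) tuples

    def write(self, data: bytes) -> None:
        self.pending += data
        full = self.nr_data * self.stripe_len
        # Only ever emit complete full stripes; the remainder stays staged.
        while len(self.pending) >= full:
            chunk = bytes(self.pending[:full])
            del self.pending[:full]
            self._write_full_stripe(chunk)

    def _write_full_stripe(self, chunk: bytes) -> None:
        stripes = [chunk[i * self.stripe_len:(i + 1) * self.stripe_len]
                   for i in range(self.nr_data)]
        # RAID5 parity: byte-wise XOR across all data stripes.
        parity = bytes(reduce(lambda a, b: a ^ b, column)
                       for column in zip(*stripes))
        self.flushed.append((stripes, parity))


if __name__ == "__main__":
    buf = StripeBuffer(nr_data=2, stripe_len=4)
    buf.write(b"\x01\x02")                  # sub-stripe: stays pending
    buf.write(b"\x03\x04\x05\x06\x07\x08")  # completes one full stripe
    print(buf.flushed, bytes(buf.pending))
```

With `nr_data=2` and a tiny `stripe_len=4`, the first 2-byte write stays pending, and only once 8 bytes have accumulated is a full stripe plus parity emitted, so the devices never see a partial-stripe parity update.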

>
>> The total amount written really is twice the data written; that's
>> something you cannot avoid.
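A back-of-the-envelope model of that write amplification. It is a simplification, not btrfs code: it assumes each sub-stripe write writes one new parity sector per parity device for every data sector it touches, that full-stripe writes carry `nr_parity / nr_data` parity per data byte, and it ignores read-modify-write reads and metadata. `bytes_written` is a hypothetical helper.

```python
KiB = 1024


def bytes_written(data_bytes: int, nr_data: int, nr_parity: int,
                  full_stripe: bool) -> int:
    """Total device bytes written (data + parity) for data_bytes of user data."""
    if full_stripe:
        # Parity is amortized over nr_data data stripes; data_bytes is
        # assumed to be a multiple of the full-stripe data size.
        return data_bytes + data_bytes * nr_parity // nr_data
    # Sub-stripe: every data sector needs its own parity sector(s).
    return data_bytes * (1 + nr_parity)


if __name__ == "__main__":
    # 3-device RAID5 (2 data + 1 parity)
    print(bytes_written(4 * KiB, 2, 1, full_stripe=False) // KiB)
    print(bytes_written(128 * KiB, 2, 1, full_stripe=True) // KiB)
```

For RAID5 a 4K sub-stripe write costs 8K of device writes, the 2x mentioned above, whereas a full-stripe write only adds nr_parity/nr_data on top of the data.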
>>
>
> Again, if you're doing sub-stripe-size writes, you're asking for stupid things,
> and then there's no reason not to give the user stupid answers.

No, you cannot limit what users do.

As long as btrfs itself supports writes at sectorsize granularity (4K),
you cannot stop users from doing that.

By your argument, I could also say that write-intent is an end-user
problem and doesn't need fixing at all.

That's definitely not the correct way to go. Make users adapt to the
limitation? No, just a big no.

Thanks,
Qu

>
> If a user is concerned about the write or space amplification of sub-stripe
> writes on RAID56, he/she really needs to rethink the architecture.
>
>
>
> [1]
> S. K. Mishra and P. Mohapatra,
> "Performance study of RAID-5 disk arrays with data and parity cache,"
> Proceedings of the 1996 ICPP Workshop on Challenges for Parallel Processing,
> 1996, pp. 222-229 vol.1, doi: 10.1109/ICPP.1996.537164.


Thread overview: 88+ messages
2022-05-16 14:31 [RFC ONLY 0/8] btrfs: introduce raid-stripe-tree Johannes Thumshirn
2022-05-16 14:31 ` [RFC ONLY 1/8] btrfs: add raid stripe tree definitions Johannes Thumshirn
2022-05-17  7:39   ` Qu Wenruo
2022-05-17  7:45     ` Johannes Thumshirn
2022-05-17  7:56       ` Qu Wenruo
2022-05-16 14:31 ` [RFC ONLY 2/8] btrfs: move btrfs_io_context to volumes.h Johannes Thumshirn
2022-05-17  7:42   ` Qu Wenruo
2022-05-17  7:51     ` Johannes Thumshirn
2022-05-17  7:58       ` Qu Wenruo
2022-05-17  8:01         ` Johannes Thumshirn
2022-05-16 14:31 ` [RFC ONLY 3/8] btrfs: read raid-stripe-tree from disk Johannes Thumshirn
2022-05-17  8:09   ` Qu Wenruo
2022-05-17  8:13     ` Johannes Thumshirn
2022-05-17  8:28       ` Qu Wenruo
2022-05-18 11:29         ` Johannes Thumshirn
2022-05-19  8:36           ` Qu Wenruo
2022-05-19  8:39             ` Johannes Thumshirn
2022-05-19 10:37               ` Qu Wenruo
2022-05-19 11:44                 ` Johannes Thumshirn
2022-05-19 11:48                   ` Qu Wenruo
2022-05-19 11:53                     ` Johannes Thumshirn
2022-05-19 13:26                       ` Qu Wenruo
2022-05-19 13:49                         ` Johannes Thumshirn
2022-05-19 22:56                           ` Qu Wenruo
2022-05-20  8:27                             ` Johannes Thumshirn
2022-05-16 14:31 ` [RFC ONLY 4/8] btrfs: add boilerplate code to insert raid extent Johannes Thumshirn
2022-05-17  7:53   ` Qu Wenruo
2022-05-17  8:00   ` Qu Wenruo
2022-05-17  8:05     ` Johannes Thumshirn
2022-05-17  8:09       ` Qu Wenruo
2022-05-16 14:31 ` [RFC ONLY 5/8] btrfs: add code to delete " Johannes Thumshirn
2022-05-17  8:06   ` Qu Wenruo
2022-05-17  8:10     ` Johannes Thumshirn
2022-05-17  8:14       ` Qu Wenruo
2022-05-17  8:20         ` Johannes Thumshirn
2022-05-17  8:31           ` Qu Wenruo
2022-05-16 14:31 ` [RFC ONLY 6/8] btrfs: add code to read " Johannes Thumshirn
2022-05-16 14:55   ` Josef Bacik
2022-05-16 14:31 ` [RFC ONLY 7/8] btrfs: zoned: allow zoned RAID1 Johannes Thumshirn
2022-05-16 14:31 ` [RFC ONLY 8/8] btrfs: add raid stripe tree pretty printer Johannes Thumshirn
2022-05-16 14:58 ` [RFC ONLY 0/8] btrfs: introduce raid-stripe-tree Josef Bacik
2022-05-16 15:04   ` Johannes Thumshirn
2022-05-16 15:10     ` Josef Bacik
2022-05-16 15:47       ` Johannes Thumshirn
2022-05-17  7:23 ` Nikolay Borisov
2022-05-17  7:31   ` Qu Wenruo
2022-05-17  7:41     ` Johannes Thumshirn
2022-05-17  7:32   ` Johannes Thumshirn
2022-07-13 10:54 ` RAID56 discussion related to RST. (Was "Re: [RFC ONLY 0/8] btrfs: introduce raid-stripe-tree") Qu Wenruo
2022-07-13 11:43   ` Johannes Thumshirn
2022-07-13 12:01     ` Qu Wenruo
2022-07-13 12:42       ` Johannes Thumshirn
2022-07-13 13:47         ` Qu Wenruo
2022-07-13 14:01           ` Johannes Thumshirn
2022-07-13 15:24             ` Lukas Straub
2022-07-13 15:28               ` Johannes Thumshirn
2022-07-14  1:08             ` Qu Wenruo
2022-07-14  7:08               ` Johannes Thumshirn
2022-07-14  7:32                 ` Qu Wenruo
2022-07-14  7:46                   ` Johannes Thumshirn
2022-07-14  7:53                     ` Qu Wenruo [this message]
2022-07-15 17:54                     ` Goffredo Baroncelli
2022-07-15 19:08                       ` Thiago Ramon
2022-07-16  0:34                         ` Qu Wenruo
2022-07-16 11:11                           ` Qu Wenruo
2022-07-16 13:52                             ` Thiago Ramon
2022-07-16 14:26                               ` Goffredo Baroncelli
2022-07-17 17:58                                 ` Goffredo Baroncelli
2022-07-17  0:30                               ` Qu Wenruo
2022-07-17 15:18                                 ` Thiago Ramon
2022-07-17 22:01                                   ` Qu Wenruo
2022-07-17 23:00                           ` Zygo Blaxell
2022-07-18  1:04                             ` Qu Wenruo
2022-07-15 20:14                       ` Chris Murphy
2022-07-18  7:33                         ` Johannes Thumshirn
2022-07-18  8:03                           ` Qu Wenruo
2022-07-18 21:49                         ` Forza
2022-07-19  1:19                           ` Qu Wenruo
2022-07-21 14:51                             ` Forza
2022-07-24 11:27                               ` Qu Wenruo
2022-07-25  0:00                             ` Zygo Blaxell
2022-07-25  0:25                               ` Qu Wenruo
2022-07-25  5:41                                 ` Zygo Blaxell
2022-07-25  7:49                                   ` Qu Wenruo
2022-07-25 19:58                               ` Goffredo Baroncelli
2022-07-25 21:29                                 ` Qu Wenruo
2022-07-18  7:30                       ` Johannes Thumshirn
2022-07-19 18:58                         ` Goffredo Baroncelli
