From: Qu Wenruo <wqu@suse.com>
To: Johannes Thumshirn <Johannes.Thumshirn@wdc.com>,
	Qu Wenruo <quwenruo.btrfs@gmx.com>,
	"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: RAID56 discussion related to RST. (Was "Re: [RFC ONLY 0/8] btrfs: introduce raid-stripe-tree")
Date: Thu, 14 Jul 2022 09:08:19 +0800
Message-ID: <96da9455-f30d-b3fc-522b-7cbd08ad3358@suse.com>
In-Reply-To: <PH0PR04MB741638E2A15F4E106D8A6FAF9B899@PH0PR04MB7416.namprd04.prod.outlook.com>



On 2022/7/13 22:01, Johannes Thumshirn wrote:
> On 13.07.22 15:47, Qu Wenruo wrote:
> 
> 
> Ah ok, my apologies. For sub-stripe size writes my idea was to 0-pad up to
> stripe size. Then we can do full CoW of stripes. If we have an older generation
> of a stripe, we can just overwrite it on regular btrfs. On zoned btrfs this
> just accounts for more zone_unusable bytes and waits for the GC to kick in.
> 

Sorry, I guess you still didn't get my point here.

What I'm talking about is how many bytes you can actually write into a
full stripe when CoWing the P/Q stripes.

[TL;DR]

If we CoW P/Q, then in the worst case (always 4K writes, each followed by
a sync), the space efficiency is no better than RAID1.

For many write orders, we can only write 64K (STRIPE_LEN) of data into a
full stripe, no matter what.


!NOTE!
All the following examples use an 8KiB sector size, to keep the diagrams
short.

[CASE 1 CURRENT WRITE ORDER, NO PADDING]
        0                               64K
Disk 1 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | (Data stripe)
Disk 2 | 8 | 9 | a | b | c | d | e | f | (Data stripe)
Disk 3 | P | P | P | P | P | P | P | P | (Parity stripe)

For zoned RST, we can only write 8 sectors before Disk 3 exhausts its
zone, because every time we write a sector into a data stripe, we also
have to write a new P sector.

Total written bytes: 64K
Expected written bytes: 128K (nr_data * 64K)
Efficiency:	1 / nr_data.

The worst.

[CASE 2 CURRENT WRITE ORDER, PADDING]
No different from case 1, except that by the time we finish sector 7,
all zones are exhausted.

Total written bytes: 64K
Expected written bytes: 128K (nr_data * 64K)
Efficiency:	1 / nr_data.

[CASE 3 FULLY UNORDERED, NO PADDING]
This should have the best efficiency, but it is still no better than RAID1.

        0                               64K
Disk 1 | 0 | P | 3 | P | 6 | P | 9 | P |
Disk 2 | P | 2 | P | 5 | P | 8 | P | b |
Disk 3 | 1 | P | 4 | P | 7 | P | a | P |

Total written bytes: 96K
Expected written bytes: 128K (nr_data * 64K)
Efficiency:	3/4 of expected (96K / 128K); 1/2 of raw space (96K / 192K), same as RAID1

This cannot even beat RAID1/RAID10, yet it causes far more metadata just
for the RST.


Whatever the case, we can no longer guarantee that (nr_data * 64K) bytes
of data fit into a full stripe.
In the worst cases it can be far worse than RAID1. I really don't think
that is any good for our extent allocator, or for space efficiency (which
is exactly why users choose RAID56 in the first place).
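A back-of-the-envelope bound for the RAID5 case, assuming the worst case where every data sector write needs its own new P sector, and a full stripe made of (nr_data + 1) zones of 64K each. Let D be the amount of data that fits into the full stripe; since the devices must absorb 2D bytes in total:

```latex
2D \le (n_{\mathrm{data}} + 1) \cdot 64\,\mathrm{K}
\quad\Longrightarrow\quad
D \le \frac{(n_{\mathrm{data}} + 1) \cdot 64\,\mathrm{K}}{2}
\qquad\text{and}\qquad
\frac{D}{(n_{\mathrm{data}} + 1) \cdot 64\,\mathrm{K}} \le \frac{1}{2}
```

For nr_data = 2 this gives D <= 96K, below the expected 128K and matching CASE 3. The raw-space ratio of 1/2 is exactly RAID1's; with a fixed parity disk (CASE 1) the limit is tighter still, D <= 64K.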

[ROOT CAUSE]
If we just check how many writes we really need to submit to each device,
it should be obvious:

When the data stripe on Disk 1 is filled:
        0                               64K
Disk 1 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 64K written
Disk 2 |   |   |   |   |   |   |   |   | 0 written
Disk 3 | P | P | P | P | P | P | P | P | 64K written

When the data stripe on Disk 2 is filled:

        0                               64K
Disk 1 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 64K written
Disk 2 | 8 | 9 | a | b | c | d | e | f | 64K written
Disk 3 | P'| P'| P'| P'| P'| P'| P'| P'| 128K written

For RAID56 partial writes, the total amount written is always 2 * the data
written (here 256K for 128K of data).
Thus for zoned devices, which cannot overwrite in place, the worst-case
space efficiency can never exceed RAID1's.

That is why I have raised this problem against RST again and again.

Thanks,
Qu
