From: Johannes Thumshirn <Johannes.Thumshirn@wdc.com>
To: Nikolay Borisov <nborisov@suse.com>,
	"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: [RFC ONLY 0/8] btrfs: introduce raid-stripe-tree
Date: Tue, 17 May 2022 07:32:47 +0000
Message-ID: <PH0PR04MB741613EDCFC95CCCF5587CBB9BCE9@PH0PR04MB7416.namprd04.prod.outlook.com>
In-Reply-To: 64227525-4507-9a04-942c-e081c6550f69@suse.com

On 17/05/2022 09:23, Nikolay Borisov wrote:
> 
> 
> On 16.05.22 at 17:31, Johannes Thumshirn wrote:
>> Introduce a raid-stripe-tree to record writes in a RAID environment.
>>
>> In essence this adds another address translation layer between the logical
>> and the physical addresses in btrfs and is designed to close two gaps. The
>> first is the ominous RAID-write-hole we suffer from with RAID5/6 and the
>> second one is the inability of doing RAID with zoned block devices due to the
>> constraints we have with REQ_OP_ZONE_APPEND writes.
>>
>> This is an RFC/PoC only, which just shows what the code will look like for a
>> zoned RAID1. Its sole purpose is to facilitate design reviews; it is not
>> intended to be merged yet or, if merged, to be used on an actual file system.
>>
>> Johannes Thumshirn (8):
>>    btrfs: add raid stripe tree definitions
>>    btrfs: move btrfs_io_context to volumes.h
>>    btrfs: read raid-stripe-tree from disk
>>    btrfs: add boilerplate code to insert raid extent
>>    btrfs: add code to delete raid extent
>>    btrfs: add code to read raid extent
>>    btrfs: zoned: allow zoned RAID1
>>    btrfs: add raid stripe tree pretty printer
>>
>>   fs/btrfs/Makefile               |   2 +-
>>   fs/btrfs/ctree.c                |   1 +
>>   fs/btrfs/ctree.h                |  29 ++++
>>   fs/btrfs/disk-io.c              |  12 ++
>>   fs/btrfs/extent-tree.c          |   9 ++
>>   fs/btrfs/file.c                 |   1 -
>>   fs/btrfs/print-tree.c           |  21 +++
>>   fs/btrfs/raid-stripe-tree.c     | 251 ++++++++++++++++++++++++++++++++
>>   fs/btrfs/raid-stripe-tree.h     |  39 +++++
>>   fs/btrfs/volumes.c              |  44 +++++-
>>   fs/btrfs/volumes.h              |  93 ++++++------
>>   fs/btrfs/zoned.c                |  39 +++++
>>   include/uapi/linux/btrfs.h      |   1 +
>>   include/uapi/linux/btrfs_tree.h |  17 +++
>>   14 files changed, 509 insertions(+), 50 deletions(-)
>>   create mode 100644 fs/btrfs/raid-stripe-tree.c
>>   create mode 100644 fs/btrfs/raid-stripe-tree.h
>>
> 
> 
> So if we choose to go with the raid stripe tree, this means we won't need the
> raid56j code that Qu is working on? So it's important that these two
> work streams are synced so we don't duplicate effort, right?
> 

That's the reason for my early RFC here.

I think both solutions have benefits and drawbacks. 

The stripe tree adds complexity, metadata (though at the moment only 16
bytes per drive in the stripe per extent) and another address translation /
lookup layer. In return, it is always able to do CoW, which closes the
write hole. It can also work with zoned devices and the Zone Append
write command.
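
To illustrate the extra translation layer, here is a minimal user-space
sketch. It is not the code from the patch set: all names (demo_stride,
demo_stripe_extent, demo_map_logical) are hypothetical, and the 16-byte
stride layout of a device ID plus a physical offset is an assumption chosen
to match the "16 bytes per drive in the stripe per extent" figure. The point
it shows is that with Zone Append the device chooses the write location, so
each mirror's physical address has to be recorded after the write and looked
up on read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stride: one entry per device holding a copy of the extent. */
struct demo_stride {
	uint64_t devid;    /* device the copy was written to */
	uint64_t physical; /* offset on that device, as reported by the write */
};

/* Assumed layout: two 8-byte fields, i.e. 16 bytes per device per extent. */
static_assert(sizeof(struct demo_stride) == 16, "stride must be 16 bytes");

#define DEMO_NUM_COPIES 2 /* RAID1 with two mirrors, for illustration */

/* Hypothetical stripe-tree entry mapping one logical extent to its copies. */
struct demo_stripe_extent {
	uint64_t logical; /* start of the extent in the logical address space */
	uint64_t length;  /* extent length in bytes */
	struct demo_stride strides[DEMO_NUM_COPIES];
};

/*
 * Translate a logical address to (devid, physical) for the given copy.
 * A real tree would use a keyed B-tree search; a linear scan keeps the
 * sketch short. Returns 0 on success, -1 if the address is not mapped
 * or the copy index is out of range.
 */
static int demo_map_logical(const struct demo_stripe_extent *tbl, size_t n,
			    uint64_t logical, unsigned int copy,
			    uint64_t *devid, uint64_t *physical)
{
	if (copy >= DEMO_NUM_COPIES)
		return -1;

	for (size_t i = 0; i < n; i++) {
		const struct demo_stripe_extent *e = &tbl[i];

		if (logical >= e->logical && logical - e->logical < e->length) {
			uint64_t offset = logical - e->logical;

			*devid = e->strides[copy].devid;
			*physical = e->strides[copy].physical + offset;
			return 0;
		}
	}
	return -1;
}
```

Because the two mirrors may land at different offsets within their zones,
the lookup returns a per-copy physical address instead of deriving it from
a fixed chunk mapping.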

The raid56j code will be simpler in the end, I suspect, but it still doesn't
do full CoW and isn't Zone Append capable. Neither limitation is workable on
zoned filesystems. Given that capacity drives will likely be more and more
zoned drives, even outside of the hyperscale sector, I see this as problematic.

Both Qu and I are aware of each other's patches and I would really like to get
the work converged here. The raid56j code is certainly a stopgap solution for
users who already have a raid56 setup and want to get rid of the write
hole.

Thanks,
	Johannes




