Linux-BTRFS Archive on lore.kernel.org
 help / color / Atom feed
From: Damien Le Moal <Damien.LeMoal@wdc.com>
To: Josef Bacik <josef@toxicpanda.com>, Naohiro Aota <Naohiro.Aota@wdc.com>
Cc: "linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>,
	"David Sterba" <dsterba@suse.com>, "Chris Mason" <clm@fb.com>,
	"Qu Wenruo" <wqu@suse.com>, "Nikolay Borisov" <nborisov@suse.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Hannes Reinecke" <hare@suse.com>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"Matias Bjørling" <mb@lightnvm.io>,
	"Johannes Thumshirn" <jthumshirn@suse.de>,
	"Bart Van Assche" <bvanassche@acm.org>
Subject: Re: [PATCH 11/19] btrfs: introduce submit buffer
Date: Mon, 17 Jun 2019 03:16:05 +0000
Message-ID: <BYAPR04MB5816E9FC012A289CA438E794E7EB0@BYAPR04MB5816.namprd04.prod.outlook.com> (raw)
In-Reply-To: <20190613141457.jws5ca63wfgjf7da@MacBook-Pro-91.local>

Josef,

On 2019/06/13 23:15, Josef Bacik wrote:
> On Fri, Jun 07, 2019 at 10:10:17PM +0900, Naohiro Aota wrote:
>> Sequential allocation is not enough to maintain sequential delivery of
>> write IOs to the device. Various features (async compress, async checksum,
>> ...) of btrfs affect ordering of the IOs. This patch introduces submit
>> buffer to sort WRITE bios belonging to a block group and sort them out
>> sequentially in increasing block address to achieve sequential write
>> sequences with __btrfs_map_bio().
>>
>> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> 
> I hate everything about this.  Can't we just use the plugging infrastructure for
> this and then make sure it re-orders the bios before submitting them?  Also
> what's to prevent the block layer scheduler from re-arranging these io's?
> Thanks,

The block I/O scheduler reorders requests in LBA order, but that happens for a
newly inserted request against pending requests. If there are no pending
requests because all requests were already issued, no ordering happen, and even
worse, if the drive queue is not full yet (e.g. there are free tags), then the
newly inserted request will be dispatched almost immediately, preventing
reordering with subsequent incoming write requests to happen.

The other problem is that the mq-deadline scheduler does not track zone WP
position. Write request issuing is done regardless of the current WP value,
solely based on LBA ordering. This means that mq-deadline will not prevent
out-of-order, or rather, unaligned write requests. These will not be detected
and dispatched whenever possible. The reasons for this are that:
1) the disk user (the FS) has to manage zone WP positions anyway. So duplicating
that management at the block IO scheduler level is inefficient.
2) Adding zone WP management at the block IO scheduler level would also need a
write error processing path to resync the WP value in case of failed writes. But
the user/FS also needs that anyway. Again duplicated functionalities.
3) The block layer will need a timeout to force issue or cancel pending
unaligned write requests. This is necessary in case the drive user stops issuing
writes (for whatever reasons) or the scheduler is being switched. This would
unnecessarily cause write I/O errors or cause deadlocks if the request queue
quiesce mode is entered at the wrong time (and I do not see a good way to deal
with that).

blk-mq is already complicated enough. Adding this to the block IO scheduler will
unnecessarily complicate things further for no real benefits. I would like to
point out the dm-zoned device mapper and f2fs which are both already dealing
with write ordering and write error processing directly. Both are fairly
straightforward but completely different and each optimized for their own structure.

Naohiro changes to btrfs IO scheduler have the same intent, that is, efficiently
integrate and handle write ordering "a la btrfs". Would creating a different
"hmzoned" btrfs IO scheduler help address your concerns ?

Best regards.


-- 
Damien Le Moal
Western Digital Research

  reply index

Thread overview: 79+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-07 13:10 [PATCH v2 00/19] btrfs zoned block device support Naohiro Aota
2019-06-07 13:10 ` [PATCH 01/19] btrfs: introduce HMZONED feature flag Naohiro Aota
2019-06-07 13:10 ` [PATCH 02/19] btrfs: Get zone information of zoned block devices Naohiro Aota
2019-06-13 13:58   ` Josef Bacik
2019-06-18  6:04     ` Naohiro Aota
2019-06-13 13:58   ` Josef Bacik
2019-06-17 18:57   ` David Sterba
2019-06-18  6:42     ` Naohiro Aota
2019-06-27 15:11       ` David Sterba
2019-06-07 13:10 ` [PATCH 03/19] btrfs: Check and enable HMZONED mode Naohiro Aota
2019-06-13 13:57   ` Josef Bacik
2019-06-18  6:43     ` Naohiro Aota
2019-06-07 13:10 ` [PATCH 04/19] btrfs: disable fallocate in " Naohiro Aota
2019-06-07 13:10 ` [PATCH 05/19] btrfs: disable direct IO " Naohiro Aota
2019-06-13 14:00   ` Josef Bacik
2019-06-18  8:17     ` Naohiro Aota
2019-06-07 13:10 ` [PATCH 06/19] btrfs: align dev extent allocation to zone boundary Naohiro Aota
2019-06-07 13:10 ` [PATCH 07/19] btrfs: do sequential extent allocation in HMZONED mode Naohiro Aota
2019-06-13 14:07   ` Josef Bacik
2019-06-18  8:28     ` Naohiro Aota
2019-06-18 13:37       ` Josef Bacik
2019-06-17 22:30   ` David Sterba
2019-06-18  8:49     ` Naohiro Aota
2019-06-27 15:28       ` David Sterba
2019-06-07 13:10 ` [PATCH 08/19] btrfs: make unmirroed BGs readonly only if we have at least one writable BG Naohiro Aota
2019-06-13 14:09   ` Josef Bacik
2019-06-18  7:42     ` Naohiro Aota
2019-06-18 13:35       ` Josef Bacik
2019-06-07 13:10 ` [PATCH 09/19] btrfs: limit super block locations in HMZONED mode Naohiro Aota
2019-06-13 14:12   ` Josef Bacik
2019-06-18  8:51     ` Naohiro Aota
2019-06-17 22:53   ` David Sterba
2019-06-18  9:01     ` Naohiro Aota
2019-06-27 15:35       ` David Sterba
2019-06-28  3:55   ` Anand Jain
2019-06-28  6:39     ` Naohiro Aota
2019-06-28  6:52       ` Anand Jain
2019-06-07 13:10 ` [PATCH 10/19] btrfs: rename btrfs_map_bio() Naohiro Aota
2019-06-07 13:10 ` [PATCH 11/19] btrfs: introduce submit buffer Naohiro Aota
2019-06-13 14:14   ` Josef Bacik
2019-06-17  3:16     ` Damien Le Moal [this message]
2019-06-18  0:00       ` David Sterba
2019-06-18  4:04         ` Damien Le Moal
2019-06-18 13:33       ` Josef Bacik
2019-06-19 10:32         ` Damien Le Moal
2019-06-07 13:10 ` [PATCH 12/19] btrfs: expire submit buffer on timeout Naohiro Aota
2019-06-13 14:15   ` Josef Bacik
2019-06-17  3:19     ` Damien Le Moal
2019-06-07 13:10 ` [PATCH 13/19] btrfs: avoid sync IO prioritization on checksum in HMZONED mode Naohiro Aota
2019-06-13 14:17   ` Josef Bacik
2019-06-07 13:10 ` [PATCH 14/19] btrfs: redirty released extent buffers in sequential BGs Naohiro Aota
2019-06-13 14:24   ` Josef Bacik
2019-06-18  9:09     ` Naohiro Aota
2019-06-07 13:10 ` [PATCH 15/19] btrfs: reset zones of unused block groups Naohiro Aota
2019-06-07 13:10 ` [PATCH 16/19] btrfs: wait existing extents before truncating Naohiro Aota
2019-06-13 14:25   ` Josef Bacik
2019-06-07 13:10 ` [PATCH 17/19] btrfs: shrink delayed allocation size in HMZONED mode Naohiro Aota
2019-06-13 14:27   ` Josef Bacik
2019-06-07 13:10 ` [PATCH 18/19] btrfs: support dev-replace " Naohiro Aota
2019-06-13 14:33   ` Josef Bacik
2019-06-18  9:14     ` Naohiro Aota
2019-06-07 13:10 ` [PATCH 19/19] btrfs: enable to mount HMZONED incompat flag Naohiro Aota
2019-06-07 13:17 ` [PATCH 01/12] btrfs-progs: build: Check zoned block device support Naohiro Aota
2019-06-07 13:17   ` [PATCH 02/12] btrfs-progs: utils: Introduce queue_param Naohiro Aota
2019-06-07 13:17   ` [PATCH 03/12] btrfs-progs: add new HMZONED feature flag Naohiro Aota
2019-06-07 13:17   ` [PATCH 04/12] btrfs-progs: Introduce zone block device helper functions Naohiro Aota
2019-06-07 13:17   ` [PATCH 05/12] btrfs-progs: load and check zone information Naohiro Aota
2019-06-07 13:17   ` [PATCH 06/12] btrfs-progs: avoid writing super block to sequential zones Naohiro Aota
2019-06-07 13:17   ` [PATCH 07/12] btrfs-progs: support discarding zoned device Naohiro Aota
2019-06-07 13:17   ` [PATCH 08/12] btrfs-progs: volume: align chunk allocation to zones Naohiro Aota
2019-06-07 13:17   ` [PATCH 09/12] btrfs-progs: do sequential allocation Naohiro Aota
2019-06-07 13:17   ` [PATCH 10/12] btrfs-progs: mkfs: Zoned block device support Naohiro Aota
2019-06-07 13:17   ` [PATCH 11/12] btrfs-progs: device-add: support HMZONED device Naohiro Aota
2019-06-07 13:17   ` [PATCH 12/12] btrfs-progs: introduce support for dev-place " Naohiro Aota
2019-06-12 17:51 ` [PATCH v2 00/19] btrfs zoned block device support David Sterba
2019-06-13  4:59   ` Naohiro Aota
2019-06-13 13:46     ` David Sterba
2019-06-14  2:07       ` Naohiro Aota
2019-06-17  2:44       ` Damien Le Moal

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=BYAPR04MB5816E9FC012A289CA438E794E7EB0@BYAPR04MB5816.namprd04.prod.outlook.com \
    --to=damien.lemoal@wdc.com \
    --cc=Naohiro.Aota@wdc.com \
    --cc=bvanassche@acm.org \
    --cc=clm@fb.com \
    --cc=dsterba@suse.com \
    --cc=hare@suse.com \
    --cc=josef@toxicpanda.com \
    --cc=jthumshirn@suse.de \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mb@lightnvm.io \
    --cc=nborisov@suse.com \
    --cc=wqu@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-BTRFS Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-btrfs/0 linux-btrfs/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-btrfs linux-btrfs/ https://lore.kernel.org/linux-btrfs \
		linux-btrfs@vger.kernel.org linux-btrfs@archiver.kernel.org
	public-inbox-index linux-btrfs


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-btrfs


AGPL code for this site: git clone https://public-inbox.org/ public-inbox