From: Josef Bacik <josef@toxicpanda.com>
To: Naohiro Aota <naohiro.aota@wdc.com>
Cc: linux-btrfs@vger.kernel.org, David Sterba <dsterba@suse.com>,
	Chris Mason <clm@fb.com>, Nikolay Borisov <nborisov@suse.com>,
	Damien Le Moal <damien.lemoal@wdc.com>,
	Johannes Thumshirn <jthumshirn@suse.de>,
	Hannes Reinecke <hare@suse.com>,
	Anand Jain <anand.jain@oracle.com>,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v6 15/28] btrfs: serialize data allocation and submit IOs
Date: Thu, 19 Dec 2019 09:01:35 -0500
Message-ID: <ce94fc27-0167-087e-28f1-17e885ff5ddb@toxicpanda.com>
In-Reply-To: <20191219065457.rhd4wcycylii33c3@naota.dhcp.fujisawa.hgst.com>

On 12/19/19 1:54 AM, Naohiro Aota wrote:
> On Tue, Dec 17, 2019 at 02:49:44PM -0500, Josef Bacik wrote:
>> On 12/12/19 11:09 PM, Naohiro Aota wrote:
>>> To preserve sequential write pattern on the drives, we must serialize
>>> allocation and submit_bio. This commit adds a per-block group mutex
>>> "zone_io_lock", and find_free_extent_zoned() holds the lock. The lock is kept
>>> even after returning from find_free_extent(). It is released when submitting
>>> the IOs corresponding to the allocation is completed.
>>>
>>> Implementing such behavior under __extent_writepage_io() is almost
>>> impossible because once pages are unlocked we are not sure whether submitting
>>> IOs for an allocated region is finished or not. Instead, this commit adds
>>> run_delalloc_hmzoned() to write out non-compressed data IOs at once using
>>> extent_write_locked_range(). After the write, we can call
>>> btrfs_hmzoned_data_io_unlock() to unlock the block group for new
>>> allocation.
>>>
>>> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
>>
>> Have you actually tested these patches with lock debugging on?  The 
>> submit_compressed_extents stuff is async, so the unlocker will not be
>> the lock owner, and that'll make all sorts of things blow up. This is just 
>> straight up broken.
> 
> Yes, I have run xfstests on this patch series with lockdep and
> KASAN. There was no problem with that.
> 
> For non-compressed writes, both allocation and submit are done in
> run_delalloc_zoned(). Allocation is done in cow_file_range() and
> submit is done in extent_write_locked_range(), so both are in the same
> context, and locking and unlocking are done by the same execution
> context.
> 
> For compressed writes, again, allocation/lock is done under
> cow_file_range() and submit is done in extent_write_locked_range() and
> unlocked all in submit_compressed_extents() (this is called after
> compression), so they are all in the same context and the lock owner
> does the unlock.
> 
>> I would really rather see a hmzoned block scheduler that just doesn't submit 
>> the bios until they are aligned with the WP, that way this intelligence
>> doesn't have to be dealt with at the file system layer. I get allocating in 
>> line with the WP, but this whole forcing us to allocate and submit the bio in 
>> lock step is just nuts, and broken in your subsequent patches.  This whole 
>> approach needs to be reworked. Thanks,
>>
>> Josef
> 
> We tried this approach by modifying mq-deadline to wait if the first
> queued request is not aligned at the write pointer of a zone. However,
> running btrfs without the allocate+submit lock with this modified IO
> scheduler did not work well at all. With write intensive workloads, we
> observed that a very long wait time was very often necessary to get a
> fully sequential stream of requests starting at the write pointer of a
> zone. The wait time we observed was sometimes larger than 60 seconds,
> at which point we gave up.

This is because we will only write out the pages we've been handed but do 
cow_file_range() for a possibly larger delalloc range, so as you say there can 
be a large gap in time between writing one part of the range and writing the 
next part.
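
To make that gap concrete, here is a toy model in C. It is only a sketch: the
struct and function names are hypothetical and do not exist in btrfs. It assumes
one extent was allocated for a whole delalloc range while only a sub-range of
pages was handed to writeback, and it computes how many allocated bytes are
still waiting to be submitted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy type for illustration; not a btrfs structure. */
struct toy_range {
	uint64_t start;	/* byte offset of the range */
	uint64_t len;	/* length in bytes */
};

/*
 * Return how many bytes of the allocated range are NOT covered by the
 * pages handed to writeback, i.e. the part whose IO is still pending.
 */
static uint64_t toy_unsubmitted_bytes(struct toy_range alloc,
				      struct toy_range handed)
{
	uint64_t alloc_end = alloc.start + alloc.len;
	uint64_t handed_end = handed.start + handed.len;
	uint64_t lo, hi;

	/* Guard against wrapped ranges in this toy model. */
	assert(alloc_end >= alloc.start && handed_end >= handed.start);

	lo = handed.start > alloc.start ? handed.start : alloc.start;
	hi = handed_end < alloc_end ? handed_end : alloc_end;

	/* Overlap is [lo, hi) when non-empty; everything else is pending. */
	return alloc.len - (hi > lo ? hi - lo : 0);
}
```

With a 1 MiB allocation and a single 4 KiB page handed in, 1044480 bytes stay
pending until later writeback calls reach them. When the handed range equals
the allocated range, nothing is left pending.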

You actually solve this with your patch, by doing the cow_file_range() and then
following it up with the extent_write_locked_range() for the range you just cow'ed.

There is no need for the locking in this case; you could simply do that and then
have a modified block scheduler that keeps the bios in the correct order.  I
imagine if you just did this with your original block layer approach it would
work fine.  Thanks,

Josef
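
A rough sketch of the "keep the bios in the correct order" idea, written as a
user-space toy in C. This is not mq-deadline code or any real block layer API:
toy_req, toy_zone and toy_dispatch_aligned() are hypothetical names, and the
model assumes a single zone and requests that never overlap. A request is
issued only when it starts exactly at the zone's write pointer; anything else
is held back until the hole in front of it has been filled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy types; these are not kernel structures. */
struct toy_req {
	uint64_t start;		/* first sector of the write */
	uint64_t len;		/* length in sectors */
	int dispatched;		/* non-zero once issued to the device */
};

struct toy_zone {
	uint64_t wp;		/* current write pointer, in sectors */
};

/*
 * Issue every pending request that can be written sequentially from the
 * write pointer, in any arrival order. Requests that would leave a hole
 * stay queued. Returns the number of requests dispatched by this call.
 */
static size_t toy_dispatch_aligned(struct toy_zone *zone,
				   struct toy_req *reqs, size_t nr)
{
	size_t dispatched = 0;
	int progress = 1;

	assert(zone != NULL && (reqs != NULL || nr == 0));

	/* Rescan until no queued request starts at the write pointer. */
	while (progress) {
		progress = 0;
		for (size_t i = 0; i < nr; i++) {
			if (reqs[i].dispatched || reqs[i].start != zone->wp)
				continue;
			reqs[i].dispatched = 1;
			zone->wp += reqs[i].len;	/* device advances the WP */
			dispatched++;
			progress = 1;
		}
	}
	return dispatched;
}
```

For example, with the write pointer at sector 0 and queued requests starting at
sectors 8, 0 and 24 (8 sectors each), the call issues the requests at 0 and 8
and leaves the one at 24 queued until a write for sectors 16..23 arrives. If
each allocation is followed right away by submission of the whole range, as
described above, such holes should be short-lived.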
