Linux-BTRFS Archive on lore.kernel.org
From: Dennis Zhou <dennis@kernel.org>
To: Dennis Zhou <dennis@kernel.org>
Cc: David Sterba <dsterba@suse.cz>, David Sterba <dsterba@suse.com>,
	Chris Mason <clm@fb.com>, Josef Bacik <josef@toxicpanda.com>,
	Omar Sandoval <osandov@osandov.com>,
	kernel-team@fb.com, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH v6 00/22] btrfs: async discard support
Date: Wed, 18 Dec 2019 20:03:37 -0600
Message-ID: <20191219020337.GA25072@dennisz-mbp.dhcp.thefacebook.com>
In-Reply-To: <20191218000600.GB2823@dennisz-mbp>

On Tue, Dec 17, 2019 at 07:06:00PM -0500, Dennis Zhou wrote:
> On Tue, Dec 17, 2019 at 03:55:41PM +0100, David Sterba wrote:
> > On Fri, Dec 13, 2019 at 04:22:09PM -0800, Dennis Zhou wrote:
> > > Hello,
> > > 
> > > Dave reported a lockdep issue [1]. I'm a bit surprised as I can't repro
> > > it, but it obviously is right. I believe I fixed the issue by moving the
> > > fully trimmed check outside of the block_group lock. I mistakenly
> > > thought the btrfs_block_group lock subsumed the btrfs_free_space_ctl
> > > tree_lock. This clearly isn't the case.
> > > 
> > > Changes in v6:
> > >  - Move the fully trimmed check outside of the block_group lock.
> > 
> > v6 passed fstests, with some weird test failures that don't seem to be
> > related to the patchset.
> 
> Yay!
> 
> > 
> > Meanwhile, I manually tested how discard behaves. The workload was
> > a series of linux git checkouts of various release tags (i.e., this should
> > provide some freed extents and eventually coalesce them into larger
> > chunks to discard), then a simple large file copy, sync, remove, sync.
> > 
> > The discards going down to the device followed the default maximum
> > size (64M), but I observed that only one range was discarded every 10
> > seconds, even though the other stats showed many more extents to discard.
> > 
> > For the large file, it took about 5-10 cycles to send all the trimmed
> > ranges; discardable_extents decreased by one each time until it
> > reached ... -1. At this point the discardable bytes were -16384, so
> > there's some accounting problem.
> > 
> > This also happened when I deleted everything from the filesystem and ran
> > a full balance.
> > 

Also, were these both on fresh file systems, i.e., does it seem reproducible
for you?

> 
> Oh no :(. I've been trying to repro with some limited checking out and
> syncing, then removing everything and running balance. The counts
> come out to 0 for me. I'll try harder to repro this and fix it.
> 
> > Regarding the slow IO submission, I tried increasing the iops value
> > (default 10), but 100 and 1000 made no difference. Increasing the maximum
> > discard request size to 128M worked (when an extent that long was
> > ready). I was expecting a burst of about 4 consecutive IOs after a 600MB
> > file is deleted. I did not try to tweak bps_limit because there was
> > nothing to limit.
> > 
> 
> Ah, so there's actually a maximum delay of 10 seconds between discards.
> The delay is calculated by spreading the discardable extents over a
> 6-hour target, so if we only have 6 extents, we'd discard 1 per hour
> (roughly, given it decays), but this is clamped to 10 seconds.
> 
> At least on our servers, this seems to discard at a reasonable rate: it
> avoids performance penalties during heavy read and write activity while
> keeping write amplification reasonable. Also, metadata blocks aren't
> tracked, so when a whole metadata block group is freed (minus
> relocation), discards will trickle out slightly slower than expected.
> 
> 
> > So this is something to fix, but otherwise the patchset seems to be ok
> > for adding to misc-next. Given the end-of-year timing and that
> > we're already at rc2, this will be the main feature in 5.6.
> 
> I'll report back if I continue having trouble reproing it.
> 

I spent the day trying to repro against ext/dzhou-async-discard-v6
without any luck... I've been running the following:

$ mkfs.btrfs -f /dev/nvme0n1
$ mount -t btrfs -o discard=async /dev/nvme0n1 mnt
$ cd mnt
$ bash ../age_btrfs.sh .

where age_btrfs.sh is from [1].

If I delete arbitrary subvolumes, sync, and then run balance:
$ btrfs balance start --full-balance .
It all seems to settle back to 0 after some time. I haven't seen a negative
case on either of my 2 boxes. I've also tried unmounting and remounting,
then deleting more and removing more free space items.

I'm still considering how this can happen. Possibly a bad load of the free
space cache followed by freeing the block group? Being off by just 1, with
the error not accumulating, suggests a real corner case here.

Adding asserts in btrfs_discard_update_discardable() might give us
insight into which callsite is responsible for going below 0.

[1] https://github.com/osandov/osandov-linux/blob/master/scripts/age_btrfs.sh

Thanks,
Dennis


Thread overview: 49+ messages
2019-12-14  0:22 Dennis Zhou
2019-12-14  0:22 ` [PATCH 01/22] bitmap: genericize percpu bitmap region iterators Dennis Zhou
2019-12-14  0:22 ` [PATCH 02/22] btrfs: rename DISCARD opt to DISCARD_SYNC Dennis Zhou
2019-12-14  0:22 ` [PATCH 03/22] btrfs: keep track of which extents have been discarded Dennis Zhou
2019-12-14  0:22 ` [PATCH 04/22] btrfs: keep track of cleanliness of the bitmap Dennis Zhou
2019-12-14  0:22 ` [PATCH 05/22] btrfs: add the beginning of async discard, discard workqueue Dennis Zhou
2019-12-14  0:22 ` [PATCH 06/22] btrfs: handle empty block_group removal Dennis Zhou
2019-12-14  0:22 ` [PATCH 07/22] btrfs: discard one region at a time in async discard Dennis Zhou
2019-12-14  0:22 ` [PATCH 08/22] btrfs: add removal calls for sysfs debug/ Dennis Zhou
2019-12-18 11:45   ` Anand Jain
2019-12-14  0:22 ` [PATCH 09/22] btrfs: make UUID/debug have its own kobject Dennis Zhou
2019-12-18 11:45   ` Anand Jain
2019-12-14  0:22 ` [PATCH 10/22] btrfs: add discard sysfs directory Dennis Zhou
2019-12-18 11:45   ` Anand Jain
2019-12-14  0:22 ` [PATCH 11/22] btrfs: track discardable extents for async discard Dennis Zhou
2019-12-14  0:22 ` [PATCH 12/22] btrfs: keep track of discardable_bytes Dennis Zhou
2019-12-14  0:22 ` [PATCH 13/22] btrfs: calculate discard delay based on number of extents Dennis Zhou
2019-12-30 16:50   ` David Sterba
2020-01-02 16:45     ` Dennis Zhou
2019-12-14  0:22 ` [PATCH 14/22] btrfs: add bps discard rate limit Dennis Zhou
2019-12-30 17:58   ` David Sterba
2020-01-02 16:46     ` Dennis Zhou
2019-12-14  0:22 ` [PATCH 15/22] btrfs: limit max discard size for async discard Dennis Zhou
2019-12-30 18:00   ` David Sterba
2019-12-30 18:08   ` David Sterba
2020-01-02 16:48     ` Dennis Zhou
2019-12-14  0:22 ` [PATCH 16/22] btrfs: make max async discard size tunable Dennis Zhou
2019-12-30 18:05   ` David Sterba
2020-01-02 16:50     ` Dennis Zhou
2019-12-14  0:22 ` [PATCH 17/22] btrfs: have multiple discard lists Dennis Zhou
2019-12-14  0:22 ` [PATCH 18/22] btrfs: only keep track of data extents for async discard Dennis Zhou
2019-12-30 17:39   ` David Sterba
2020-01-02 16:55     ` Dennis Zhou
2019-12-14  0:22 ` [PATCH 19/22] btrfs: keep track of discard reuse stats Dennis Zhou
2019-12-30 17:33   ` David Sterba
2020-01-02 16:57     ` Dennis Zhou
2019-12-14  0:22 ` [PATCH 20/22] btrfs: add async discard header Dennis Zhou
2019-12-14  0:22 ` [PATCH 21/22] btrfs: increase the metadata allowance for the free_space_cache Dennis Zhou
2019-12-14  0:22 ` [PATCH 22/22] btrfs: make smaller extents more likely to go into bitmaps Dennis Zhou
2019-12-17 14:55 ` [PATCH v6 00/22] btrfs: async discard support David Sterba
2019-12-18  0:06   ` Dennis Zhou
2019-12-19  2:03     ` Dennis Zhou [this message]
2019-12-19 20:06       ` David Sterba
2019-12-19 21:22         ` Dennis Zhou
2019-12-19 20:34     ` David Sterba
2019-12-19 21:17       ` Dennis Zhou
2019-12-30 18:13 ` David Sterba
2019-12-30 18:49   ` Dennis Zhou
2020-01-02 13:22     ` David Sterba
