Re: [6.2 regression][bisected]discard storm on idle since v6.1-rc8-59-g63a7cb130718 discard=async

From: Sergei Trofimovich <slyich@gmail.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: Josef Bacik <josef@toxicpanda.com>,
	Christopher Price <pricechrispy@gmail.com>,
	anand.jain@oracle.com, boris@bur.io, clm@fb.com,
	dsterba@suse.com, linux-btrfs@vger.kernel.org,
	regressions@leemhuis.info, regressions@lists.linux.dev
Subject: Re: [6.2 regression][bisected]discard storm on idle since v6.1-rc8-59-g63a7cb130718 discard=async
Date: Thu, 23 Mar 2023 22:26:06 +0000
Message-ID: <20230323222606.20d10de2@nz>
In-Reply-To: <ZBq+ktWm2gZR/sgq@infradead.org>

On Wed, 22 Mar 2023 01:38:42 -0700
Christoph Hellwig <hch@infradead.org> wrote:

> On Tue, Mar 21, 2023 at 05:26:49PM -0400, Josef Bacik wrote:
> > We got the defaults based on our testing with our workloads inside of
> > FB.  Clearly this isn't representative of a normal desktop usage, but
> > we've also got a lot of workloads so figured if it made the whole
> > fleet happy it would probably be fine everywhere.
> > 
> > That being said this is tunable for a reason, your workload seems to
> > generate a lot of freed extents and discards.  We can probably mess
> > with the async stuff to maybe pause discarding if there's no other
> > activity happening on the device at the moment, but tuning it to let
> > more discards through at a time is also completely valid.  Thanks,  
> 
> FYI, discard performance differs a lot between different SSDs.
> It used to be pretty horrible for most devices early on, and then a
> certain hyperscaler started requiring decent performance for enterprise
> drives, so many of them are good now.  A lot less so for the typical
> consumer drive, especially at the lower end of the spectrum.
> 
> And that's just NVMe; the still-shipping SATA SSDs are a different
> story.  Not helped by the fact that we don't even support ranged
> discards for them in Linux.

Josef, what did you use as a signal to decide which value was good
enough? Did you crank up the number until the discard backlog cleared
in a reasonable time?

I still don't understand what I should take into account when changing
the default, or whether I should change it at all. Is it fine if the
discard backlog takes a week to clear? (Or a day? An hour? A minute?)
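
To put rough numbers on that question, here is a minimal sketch of the
arithmetic. It assumes one discard request per queued extent and no new
extents arriving while the backlog drains; the backlog sizes are
hypothetical, and only the 10 IOPS figure comes from this thread.

```python
def drain_time_seconds(backlog_discards: int, iops_limit: float) -> float:
    """Seconds needed to issue `backlog_discards` requests at `iops_limit`.

    Simplification (an assumption, not btrfs behaviour): one request per
    queued extent, and nothing new is queued in the meantime.
    """
    if backlog_discards < 0:
        raise ValueError("backlog must be non-negative")
    if iops_limit <= 0:
        raise ValueError("iops_limit must be positive")
    return backlog_discards / iops_limit


def humanize(seconds: float) -> str:
    """Render a duration using the largest unit that fits."""
    units = (("week", 604800), ("day", 86400), ("hour", 3600), ("minute", 60))
    for unit, size in units:
        if seconds >= size:
            return f"{seconds / size:.1f} {unit}(s)"
    return f"{seconds:.1f} second(s)"


# Hypothetical backlog sizes, drained at 10 IOPS.
for backlog in (600, 36_000, 864_000, 6_048_000):
    print(backlog, humanize(drain_time_seconds(backlog, 10)))
```

At 10 IOPS that is 864,000 discards per day, so a backlog of about six
million extents would take roughly a week under these assumptions.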

Is it fine to send discards as fast as the device allows instead of
limiting them to 10 IOPS? Does the IOPS limit account for a device wear
tradeoff? Then a low limit makes sense. Or is the IOPS limit just a way
to reserve most of the bandwidth for non-discard workloads? Then I would
say unlimited IOPS would make more sense as the btrfs default.
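
For anyone who wants to see where their own system stands, here is a
hedged sketch that reads the current limit and backlog and estimates the
drain time. It assumes the per-filesystem sysfs directory
/sys/fs/btrfs/<UUID>/discard/ exposes `iops_limit` and
`discardable_extents` as plain integers, and that one discard is issued
per extent. The helper names are mine, and returning None for a
non-positive limit is a choice made in this sketch, not a statement
about how btrfs treats that value.

```python
from pathlib import Path
from typing import Optional


def read_int(path: Path) -> int:
    """Read a sysfs-style file holding a single integer."""
    return int(path.read_text().strip())


def estimate_drain_seconds(discard_dir: Path) -> Optional[float]:
    """Estimate seconds to drain the backlog described by `discard_dir`.

    Assumes `discard_dir` contains `discardable_extents` and `iops_limit`
    files. Returns None when the limit is not positive, because this
    sketch cannot estimate a rate in that case.
    """
    extents = max(read_int(discard_dir / "discardable_extents"), 0)
    iops_limit = read_int(discard_dir / "iops_limit")
    if iops_limit <= 0:
        return None
    return extents / iops_limit
```

Usage would be along the lines of
estimate_drain_seconds(Path("/sys/fs/btrfs/<UUID>/discard")), with the
filesystem's own UUID filled in.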

-- 

  Sergei
