linux-btrfs.vger.kernel.org archive mirror
From: Damien Le Moal <dlemoal@fastmail.com>
To: Christoph Hellwig <hch@infradead.org>, Roman Mamedov <rm@romanrm.net>
Cc: Linux regressions mailing list <regressions@lists.linux.dev>,
	Sergei Trofimovich <slyich@gmail.com>,
	Josef Bacik <josef@toxicpanda.com>,
	Christopher Price <pricechrispy@gmail.com>,
	anand.jain@oracle.com, boris@bur.io, clm@fb.com,
	dsterba@suse.com, linux-btrfs@vger.kernel.org
Subject: Re: [6.2 regression][bisected]discard storm on idle since v6.1-rc8-59-g63a7cb130718 discard=async
Date: Wed, 5 Apr 2023 08:37:24 +0900
Message-ID: <852ad310-092e-169c-d98a-9317aa0b4268@fastmail.com>
In-Reply-To: <ZCxP/ll7YjPdb9Ou@infradead.org>

On 4/5/23 01:27, Christoph Hellwig wrote:
> On Tue, Apr 04, 2023 at 09:20:27PM +0500, Roman Mamedov wrote:
>> SSDs do not physically erase blocks on discard, that would be very slow.
>>
>> Instead they nuke corresponding records in the Flash translation layer (FTL)
>> tables, so that the discarded areas point "nowhere" instead of the actual
>> stored blocks. And when facing such pointers on trying to resolve read
>> requests, the controller knows to just return zeroes.
> 
> Of course they don't erase blocks on every discard (although if you look
> long enough you'll probably find a worst case implementation that does
> this anyway).  But you still need to persist your FTL changes, and the
> zeroing if any was done, by the time you get a FLUSH command, because
> without that you'd return different data when reading a block after a
> powerfail (i.e. the old data) than before (zeros or a pattern), which is
> a no-go.
> 
>> Of course there can be varying behaviors per SSD, e.g. I know of some that
>> return random garbage instead of zeroes, and some which for a puzzling reason
>> prefer to return the byte FF instead.
> 
> All of that is valid behavior per the relevant standards.  
> 
>> But I think the 1st point above should
>> be universal, pretty certain there are none where a discard/TRIM would take
>> comparable time to "dd if=/dev/zero of=/dev/ssd" (making it unusable in
>> practice).
> 
> This is wishful thinking :)  SSDs generally optimize the fast path very
> heavily, so slow path commands, even when they should in theory be faster
> due to the underlying optimizations, might not be, as they are processed
> in software instead of hardware offloads, moved to slower cores, etc.
> 
> For discard things have gotten a lot better in the last years, but for
> many older devices performance can be outright horrible.
> 
> For SATA SSDs the fact that classic TRIM isn't a queued command adds
> insult to injury as it always means draining the queue first and not
> issuing any I/O until the TRIM command is done.  There is an FPDMA
> version now, but I don't think it was all that widely implemented before
> SATA SSDs fell out of favour.

Not to mention that many of the SATA SSDs actually implementing queued trim are
buggy. See ATA horkage flag ATA_HORKAGE_NO_NCQ_TRIM and the many consumer SSDs
that need that flag.
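
For illustration only, here is a minimal standalone sketch of the idea behind
such a quirk flag: the host keeps a table of known-bad models and falls back
to non-queued TRIM when a drive matches, even if the drive advertises queued
TRIM support. This is not the libata code. The names QUIRK_NO_NCQ_TRIM,
quirk_table, lookup_quirks() and use_queued_trim(), and the model strings,
are all hypothetical, and plain prefix matching is an assumption made to keep
the example short.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical quirk bit, modelled on the idea of ATA_HORKAGE_NO_NCQ_TRIM. */
#define QUIRK_NO_NCQ_TRIM (1u << 0)

struct quirk_entry {
	const char *model_prefix;	/* made-up model-name prefix */
	unsigned int quirks;		/* quirk bits applied on a match */
};

/* Made-up entries; a real table would list actual drive models. */
static const struct quirk_entry quirk_table[] = {
	{ "ExampleVendor SSD A", QUIRK_NO_NCQ_TRIM },
	{ "ExampleVendor SSD B", QUIRK_NO_NCQ_TRIM },
};

/* Return the quirk bits for a model string, or 0 if it is not listed. */
static unsigned int lookup_quirks(const char *model)
{
	size_t n = sizeof(quirk_table) / sizeof(quirk_table[0]);

	for (size_t i = 0; i < n; i++) {
		const struct quirk_entry *e = &quirk_table[i];

		if (strncmp(model, e->model_prefix, strlen(e->model_prefix)) == 0)
			return e->quirks;
	}
	return 0;
}

/*
 * Decide whether TRIM may be sent as a queued command. A drive that
 * advertises queued TRIM but is listed as buggy gets the non-queued
 * variant, which means draining the queue first.
 */
static bool use_queued_trim(const char *model, bool advertises_queued_trim)
{
	if (!advertises_queued_trim)
		return false;
	return !(lookup_quirks(model) & QUIRK_NO_NCQ_TRIM);
}
```

With this sketch, use_queued_trim("ExampleVendor SSD A 512GB", true) is
false, while an unlisted model that advertises queued TRIM gets true.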



