linux-block.vger.kernel.org archive mirror
From: Matthew Wilcox <willy@infradead.org>
To: Keith Busch <keith.busch@intel.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
	Ric Wheeler <ricwheeler@gmail.com>,
	Dave Chinner <david@fromorbit.com>,
	lsf-pc@lists.linux-foundation.org,
	linux-xfs <linux-xfs@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	linux-ext4 <linux-ext4@vger.kernel.org>,
	linux-btrfs <linux-btrfs@vger.kernel.org>,
	linux-block@vger.kernel.org
Subject: Re: [LSF/MM TOPIC] More async operations for file systems - async discard?
Date: Wed, 27 Feb 2019 05:24:51 -0800
Message-ID: <20190227132451.GL11592@bombadil.infradead.org>
In-Reply-To: <20190222164504.GB10066@localhost.localdomain>

On Fri, Feb 22, 2019 at 09:45:05AM -0700, Keith Busch wrote:
> On Thu, Feb 21, 2019 at 09:51:12PM -0500, Martin K. Petersen wrote:
> > 
> > Keith,
> > 
> > > With respect to fs block sizes, one thing making discards suck is that
> > > many high capacity SSDs' physical page sizes are larger than the fs
> > > block size, and a sub-page discard is worse than doing nothing.
> > 
> > That ties into the whole zeroing as a side-effect thing.
> > 
> > The devices really need to distinguish between discard-as-a-hint where
> > it is free to ignore anything that's not a whole multiple of whatever
> > the internal granularity is, and the WRITE ZEROES use case where the end
> > result needs to be deterministic.
> 
> Exactly, yes, considering the deterministic zeroing behavior. For devices
> supporting that, sub-page discards turn into a read-modify-write instead
> of invalidating the page.  That increases WAF instead of improving it
> as intended, and large page SSDs are most likely to have relatively poor
> write endurance in the first place.
> 
> We have NVMe spec changes in the pipeline so devices can report this
> granularity. But my real concern isn't with discard per se, but more
> with the writes since we don't support "sector" sizes greater than the
> system's page size. This is a bit of a different topic from where this
> thread started, though.

I don't understand how reporting a larger discard granularity helps.
Sure, if the file was written block-by-block in that large granularity
to begin with, then the drive can invalidate an entire page.  But if
even one page of that, say, 256kB block was rewritten, then discarding
the 256kB block will need to discard 252kB from one erase block and 4kB
from another erase block.
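
To make the arithmetic above concrete, here is a toy model, not a description
of any real FTL. It assumes a 4kB mapping unit, that the initial 256kB write
landed entirely in one erase block (labelled 0), and that a log-structured
device placed the single rewritten page in a different erase block (labelled
1). The names `PAGE`, `EXTENT` and `discard_footprint` are hypothetical and
exist only for this sketch.

```python
from collections import Counter

PAGE = 4 * 1024      # assumed mapping unit / fs block size, in bytes
EXTENT = 256 * 1024  # the "256kB block" from the example, in bytes

def discard_footprint(mapping, start, length):
    """Bytes invalidated per erase block when [start, start+length) is discarded.

    `mapping` is a hypothetical logical-page -> erase-block table.
    """
    hit = Counter()
    for lpn in range(start // PAGE, (start + length) // PAGE):
        hit[mapping[lpn]] += PAGE
    return dict(hit)

# Initial sequential write: all 64 logical pages land in erase block 0.
mapping = {lpn: 0 for lpn in range(EXTENT // PAGE)}

# One page is rewritten later; the new copy is assumed to land in erase block 1.
mapping[17] = 1

footprint = discard_footprint(mapping, 0, EXTENT)
print({eb: nbytes // 1024 for eb, nbytes in footprint.items()})
```

Under these assumptions the discard of one aligned 256kB range touches two
erase blocks, 252kB in one and 4kB in the other, regardless of the
granularity the device advertises.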

So it looks like you really just want to report a larger "optimal IO
size", which I thought we already had.
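
For reference, the limits Linux already exports can be read from sysfs. The
sketch below reads a few of the existing queue attributes, which the kernel
reports in bytes. The helper name `read_queue_limits` and the device name
"nvme0n1" are placeholders for illustration, and an attribute that is
missing or unreadable is returned as None.

```python
from pathlib import Path

# Queue attributes exported under /sys/block/<dev>/queue/, all in bytes.
QUEUE_ATTRS = (
    "physical_block_size",
    "minimum_io_size",
    "optimal_io_size",
    "discard_granularity",
)

def read_queue_limits(dev, sysfs_root="/sys/block"):
    """Return {attribute: value in bytes, or None if unavailable} for one device."""
    queue = Path(sysfs_root) / dev / "queue"
    limits = {}
    for name in QUEUE_ATTRS:
        try:
            limits[name] = int((queue / name).read_text().strip())
        except (OSError, ValueError):
            limits[name] = None
    return limits

# "nvme0n1" is a placeholder; substitute a device present on the system.
for name, value in read_queue_limits("nvme0n1").items():
    print(f"{name}: {value}")
```

An optimal_io_size of 0 means the device did not report one.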

Thread overview: 15+ messages
2019-02-17 20:36 [LSF/MM TOPIC] More async operations for file systems - async discard? Ric Wheeler
2019-02-17 21:09 ` Dave Chinner
2019-02-17 23:42   ` Ric Wheeler
2019-02-18  2:22     ` Dave Chinner
2019-02-18 22:30       ` Ric Wheeler
2019-02-20 23:47     ` Keith Busch
2019-02-21 20:08       ` Dave Chinner
2019-02-21 23:55       ` Jeff Mahoney
2019-02-22  3:01         ` Martin K. Petersen
2019-02-22  6:15           ` Roman Mamedov
2019-02-22 14:12             ` Martin K. Petersen
2019-02-22  2:51       ` Martin K. Petersen
2019-02-22 16:45         ` Keith Busch
2019-02-27 11:40           ` Ric Wheeler
2019-02-27 13:24           ` Matthew Wilcox [this message]
