linux-fsdevel.vger.kernel.org archive mirror
From: Dave Chinner <david@fromorbit.com>
To: Theodore Ts'o <tytso@mit.edu>
Cc: Matthew Wilcox <willy@infradead.org>,
	lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org,
	linux-mm <linux-mm@kvack.org>
Subject: Re: [LSF/MM/BPF TOPIC] untorn buffered writes
Date: Thu, 29 Feb 2024 12:07:02 +1100
Message-ID: <Zd/YtmTBUN7jFg4X@dread.disaster.area>
In-Reply-To: <20240228233354.GC177082@mit.edu>

On Wed, Feb 28, 2024 at 05:33:54PM -0600, Theodore Ts'o wrote:
> On Wed, Feb 28, 2024 at 02:11:06PM +0000, Matthew Wilcox wrote:
> > I'm not entirely sure that it does become a mess.  If our implementation
> > of this ensures that each write ends up in a single folio (even if the
> > entire folio is larger than the write), then we will have satisfied the
> > semantics of the flag.
> 
> What if we do a 32k write which spans two folios?  And what
> if the physical pages for those 32k in the buffer cache are not
> contiguous?  Are you going to have to join the two 16k folios
> together, or maybe two 8k folios and an 16k folio, and relocate pages
> to make a contiguous 32k folio when we do a buffered RWF_ATOMIC write
> of size 32k?

RWF_ATOMIC defines constraints that a 32kB write must be 32kB
aligned. So the only way a 32kB write would span two folios is if
a 16kB write had already been done in this space.
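
As a minimal sketch of that constraint in plain C: the helper names
below are made up for illustration and are not kernel API, and the
power-of-two length rule is an assumption on my part. It also assumes
every folio in the range has the same size and is naturally aligned
in the file.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if len is a non-zero power of two. */
static bool is_pow2(uint64_t len)
{
	return len != 0 && (len & (len - 1)) == 0;
}

/* Assumed rule: an atomic write is naturally aligned to its own length. */
static bool atomic_write_is_aligned(uint64_t off, uint64_t len)
{
	return is_pow2(len) && (off & (len - 1)) == 0;
}

/*
 * With uniformly sized, naturally aligned folios, the write stays in one
 * folio when its first and last byte map to the same folio index.
 */
static bool fits_in_one_folio(uint64_t off, uint64_t len, uint64_t folio_size)
{
	assert(len != 0 && folio_size != 0);
	return off / folio_size == (off + len - 1) / folio_size;
}
```

With these, an aligned 32kB write at offset 32kB fits in one 32kB
folio, but spans two folios if the range is already populated with
16kB folios, which is the case described above.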

We are already dealing with this problem for bs > ps with the min
order mapping constraint. We can deal with this easily when we set
the inode as supporting atomic writes. Doing that already ensures
physical extent allocation alignment, so we can also set the
mapping folio order at this time to ensure that we only allocate
RWF_ATOMIC compatible aligned/sized folios....
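
A rough sketch of how such a minimum order could be derived from the
largest atomic write size. The function name and parameters are
hypothetical, and this is userspace arithmetic only, not the kernel
code that sets the mapping constraint.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: smallest folio order such that
 * (page_size << order) covers atomic_bytes.
 */
static unsigned int min_folio_order(uint64_t atomic_bytes, uint64_t page_size)
{
	uint64_t pages;
	unsigned int order = 0;

	assert(page_size != 0);

	/* Round the byte count up to whole pages without overflowing. */
	pages = atomic_bytes / page_size + (atomic_bytes % page_size != 0);

	while (((uint64_t)1 << order) < pages)
		order++;

	return order;
}
```

For 4kB pages, a 32kB atomic write size gives order 3, and anything
at or below a single page gives order 0.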

> > I think we'd be better off treating RWF_ATOMIC like it's a bs>PS device.

Which is why Willy says this...

> > That takes two somewhat special cases and makes them use the same code
> > paths, which probably means fewer bugs as both camps will be testing
> > the same code.
> 
> But for a bs > PS device, where the logical block size is greater than
> the page size, you don't need the RWF_ATOMIC flag at all.

Yes we do - hardware already supports REQ_ATOMIC sizes larger than
64kB filesystem blocks, i.e. RWF_ATOMIC is not restricted to 64kB
or any specific filesystem block size, and can always be larger than
the filesystem block size.

> All direct
> I/O writes *must* be a multiple of the logical sector size, and
> buffered writes, if they are smaller than the block size, *must* be
> handled as a read-modify-write, since you can't send writes to the
> device smaller than the logical sector size.

The filesystem will likely need to constrain minimum RWF_ATOMIC
sizes to a single filesystem block. That's the whole point of having
the statx interface - the application is going to have to query the
supported min/max atomic write sizes and adjust to those.
Applications will not be able to use 2kB RWF_ATOMIC writes on a 4kB
block size filesystem, and it's no different with larger filesystem
block sizes.
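
To make the application side concrete, here is a small sketch of the
check an application would apply once it has the limits. The struct
and its field names are hypothetical stand-ins for whatever statx
reports, and the power-of-two rule is again an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the limits an application reads via statx. */
struct atomic_limits {
	uint64_t unit_min;	/* smallest supported atomic write, in bytes */
	uint64_t unit_max;	/* largest supported atomic write, in bytes */
};

/* True if a write of len bytes is within the reported atomic limits. */
static bool atomic_write_size_ok(const struct atomic_limits *lim, uint64_t len)
{
	assert(lim->unit_min != 0 && lim->unit_min <= lim->unit_max);

	if (len < lim->unit_min || len > lim->unit_max)
		return false;

	/* Assumed rule: atomic write lengths are powers of two. */
	return (len & (len - 1)) == 0;
}
```

With unit_min = 4096 on a 4kB block size filesystem, a 2kB write is
rejected and the application has to fall back or resize, exactly as
it would with a larger filesystem block size.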

-Dave.

-- 
Dave Chinner
david@fromorbit.com
