From: Matthew Wilcox <willy@infradead.org>
To: Luis Chamberlain <mcgrof@kernel.org>
Cc: James Bottomley <James.Bottomley@hansenpartnership.com>,
	Keith Busch <kbusch@kernel.org>, Theodore Ts'o <tytso@mit.edu>,
	lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, linux-block@vger.kernel.org
Subject: Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
Date: Sun, 5 Mar 2023 05:02:43 +0000
Message-ID: <ZAQicyYR0kZgrzIr@casper.infradead.org>
In-Reply-To: <ZAQXduwAcAtIZHkB@bombadil.infradead.org>

On Sat, Mar 04, 2023 at 08:15:50PM -0800, Luis Chamberlain wrote:
> On Sat, Mar 04, 2023 at 04:39:02PM +0000, Matthew Wilcox wrote:
> > I'm getting more and more
> > comfortable with the idea that "Linux doesn't support block sizes >
> > PAGE_SIZE on 32-bit machines" is an acceptable answer.
> 
> First of all, filesystems would need to add support for block
> sizes > PAGE_SIZE, and that takes effort. It is also a support
> question.
> 
> I think garnering consensus from filesystem developers that we don't
> want to support block sizes > PAGE_SIZE on 32-bit systems would be a
> good thing to review at LSFMM or even on this list. I highly doubt
> anyone is interested in that support.

Agreed.

> > XFS already works with arbitrary-order folios. 
> 
> But block sizes > PAGE_SIZE is work which is still not merged. It
> *can* be with time. That would allow one to muck with larger block
> sizes than 4k on x86-64 for instance. Without this, you can't play
> ball.

Do you mean that XFS is checking that fs block size <= PAGE_SIZE and
that check needs to be dropped?  If so, I don't see where that happens.

Or do you mean that the blockdev "filesystem" needs to be enhanced to
support large folios?  That's going to be kind of a pain because it
uses buffer_heads.  And ext4 depends on it using buffer_heads.  So,
yup, more work needed than I remembered (but as I said, it's FS side,
not block layer or driver work).

Or were you referring to the NVMe PAGE_SIZE sanity check that Keith
mentioned upthread?

> > The only needed piece is
> > specifying to the VFS that there's a minimum order for this particular
> > inode, and having the VFS honour that everywhere.
> 
> Other than the above too, don't we still also need to figure out what
> fs APIs would incur larger order folios? And then what about corner cases
> with the page cache?
> 
> I was hoping some of these nooks and crannies could be explored with tmpfs.

I think we're exploring all those with XFS.  Or at least, many of
them.  A lot of the folio conversion patches you see flowing past
are pure efficiency gains -- no need to convert between pages and
folios implicitly; do the explicit conversions and save instructions.
Most of the correctness issues were found & fixed a long time ago when
PMD support was added to tmpfs.  One notable exception would be the
writeback path, since tmpfs doesn't do writeback; it has that special
thing it does with swap.

tmpfs is a rather special case as far as its use of the filesystem APIs
goes, but I suspect I've done most of the needed work to have it work
with arbitrary-order folios instead of just PTE and PMD sizes.  There
are probably some left-over assumptions that I didn't find yet.  Maybe
in the swap path, for example ;-)
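
To make the "minimum order for this particular inode" idea quoted above
concrete, here is a rough userspace sketch of the arithmetic only.  It
assumes 4 KiB pages and power-of-two block sizes.  The names
sketch_min_order() and sketch_align_index() are made up for illustration;
they are not kernel APIs and say nothing about how the VFS would actually
store or enforce the value.

```c
#include <assert.h>

/* Assumption: 4 KiB pages, as on x86-64.  Hypothetical names throughout. */
#define SKETCH_PAGE_SHIFT 12u
#define SKETCH_PAGE_SIZE  (1ul << SKETCH_PAGE_SHIFT)

/*
 * Smallest folio order whose size covers one filesystem block.
 * Order 0 is a single page; order n is 2^n contiguous pages.
 */
unsigned int sketch_min_order(unsigned long block_size)
{
	unsigned int order = 0;

	/* Block sizes are assumed to be non-zero powers of two. */
	assert(block_size != 0 && (block_size & (block_size - 1)) == 0);

	while ((SKETCH_PAGE_SIZE << order) < block_size)
		order++;
	return order;
}

/*
 * Round a page cache index down to the first index of the min-order
 * folio that would contain it, so a folio never straddles a block.
 */
unsigned long sketch_align_index(unsigned long index, unsigned int min_order)
{
	return index & ~((1ul << min_order) - 1);
}
```

With these assumptions a 16 KiB block size gives a minimum order of 2, and
page index 7 would belong to the folio starting at index 4.  "Honouring
that everywhere" would then mean that no path allocates or splits a folio
below that order for the inode in question.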
