From: Matthew Wilcox <willy@infradead.org>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: Andreas Dilger <adilger@dilger.ca>,
	Johannes Weiner <hannes@cmpxchg.org>,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: Discontiguous folios/pagesets
Date: Mon, 30 Aug 2021 19:43:01 +0100
Message-ID: <YS0mtYZ+PEAaM7pI@casper.infradead.org>
In-Reply-To: <20210830182818.GA9892@magnolia>

On Mon, Aug 30, 2021 at 11:28:18AM -0700, Darrick J. Wong wrote:
> On Sat, Aug 28, 2021 at 01:27:29PM -0600, Andreas Dilger wrote:
> > On Aug 28, 2021, at 1:04 PM, Matthew Wilcox <willy@infradead.org> wrote:
> > >
> > > The current folio work is focused on permitting the VM to use
> > > physically contiguous chunks of memory. Both Darrick and Johannes
> > > have pointed out the advantages of supporting logically-contiguous,
> > > physically-discontiguous chunks of memory. Johannes wants to be able to
> > > use order-0 allocations to allocate larger folios, getting the benefit
> > > of managing the memory in larger chunks without requiring the memory
> > > allocator to be able to find contiguous chunks. Darrick wants to support
> > > non-power-of-two block sizes.
> >
> > What is the use case for non-power-of-two block sizes? The main question
> > is whether that use case is important enough to add the complexity and
> > overhead in order to support it?
>
> For copy-on-write to a XFS realtime volume where the allocation extent
> size (we support bigalloc too! :P) is not a power of two (e.g. you set
> up a 4 disk raid5 with 64k stripes, now the extent size is 192k).
>
> Granted, I don't think folios handling 192k chunks is absolutely
> *required* for folios; the only hard requirement is that if any page in
> a 192k extent becomes dirty, the rest have to get written out all the
> same time, and the cow remap can only happen after the last page
> finishes writeback.

I /think/ "all pages get written out at the same time" is basically the
same thing as "support a non-power-of-two block size".

If we only have page A in the cache at the time it's going to be written
back, we have to read in pages B and C in order to calculate the parity P.
That will annoy writeback-because-we're-low-on-memory; I know we allow
a certain amount of allocation to happen in the writeback path, but
requiring 128kB to be allocated is a bit much.
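
To make the 128kB figure concrete, here is a minimal sketch, assuming the
4-disk raid5 with 64k stripe units from Darrick's example (three data chunks
plus one parity chunk per stripe) and plain XOR parity. The names CHUNK_SIZE,
DATA_DISKS, raid5_parity() and bytes_to_read() are made up for illustration
and are not kernel or md interfaces.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK_SIZE (64 * 1024)	/* 64k stripe unit, as in the example */
#define DATA_DISKS 3		/* 4-disk raid5: 3 data chunks + 1 parity */

/* XOR all data chunks of one stripe together to produce the parity chunk. */
static void raid5_parity(const uint8_t *const data[DATA_DISKS], uint8_t *parity)
{
	memset(parity, 0, CHUNK_SIZE);
	for (size_t d = 0; d < DATA_DISKS; d++)
		for (size_t i = 0; i < CHUNK_SIZE; i++)
			parity[i] ^= data[d][i];
}

/*
 * Bytes that have to be read (and therefore allocated) before parity can
 * be computed, when only `cached` of the data chunks are in the cache.
 */
static size_t bytes_to_read(size_t cached)
{
	return (DATA_DISKS - cached) * CHUNK_SIZE;
}
```

With only chunk A cached, bytes_to_read(1) comes to two 64k chunks, which is
the 128kB allocation mentioned above.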

So we have to allow page A being dirty to pin pages B and C in the cache.
I suppose that's possible; we could make (clean) pages B and C follow
page A on the LRU, so they're going to still be in RAM at the time that
page A is written back. I don't fully understand how the LRU works,
but I assume it'd be a nightmare to ensure that A, B and C all move
around the system in the same way. Much easier to ensure that ABC stay
linked together and all get written back at once.
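
As a rough sketch of what keeping A, B and C together means for a writeback
range: with a 192k extent the usual power-of-two mask trick does not apply,
so rounding a dirty byte range out to extent boundaries needs division.
EXTENT_SIZE, struct byte_range and extent_align() are hypothetical names
chosen for this example, not XFS or page cache interfaces, and len is
assumed to be non-zero.

```c
#include <stdint.h>

#define EXTENT_SIZE (192u * 1024)	/* 3 x 64k; not a power of two */

struct byte_range {
	uint64_t start;	/* inclusive */
	uint64_t end;	/* exclusive */
};

/*
 * Expand the dirty range [pos, pos + len) so that it covers whole extents.
 * Division is used because EXTENT_SIZE - 1 is not a usable bit mask.
 */
static struct byte_range extent_align(uint64_t pos, uint64_t len)
{
	struct byte_range r;

	r.start = (pos / EXTENT_SIZE) * EXTENT_SIZE;
	r.end = ((pos + len + EXTENT_SIZE - 1) / EXTENT_SIZE) * EXTENT_SIZE;
	return r;
}
```

For instance, a single dirty 4k page at offset 200k expands to the whole
second extent, bytes 196608 up to 393216.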