* Folios for anonymous memory
From: Ryan Roberts @ 2023-02-15 12:38 UTC
  To: Matthew Wilcox; +Cc: Linux-MM, Catalin Marinas, Mark Rutland, Ruben Ayrapetyan

Hi Matthew, all,

I’ve recently been looking into some potential performance improvements, and
think that folios could help with making these improvements a reality. I’m
hoping that you can answer some questions to help figure out if this makes sense.

First, a quick summary of my benchmarking: I’ve been running a Kernel
Compilation test as well as the Speedometer browser performance benchmark (among
others), while trying to better understand the impact of page size on both HW
and SW. To do this, I’ve hacked the arm64 arch code to separate the HW page size
(4K) from the kernel page size (16K). Then I ran 3 kernels (baseline-4k,
baseline-16k, and my hacked up hybrid-16k-4k) - all based on v6.1 - with the aim
of determining the speedups due solely to SW overhead reduction (baseline-4k ->
hybrid-16k-4k), and the speedups due to HW overhead reduction (baseline-4k ->
(baseline-16k - hybrid-16k-4k)).

Results as follows:

Kernel Compilation:
Speed up due to SW overhead reduction: 6.5%
Speed up due to HW overhead reduction: 5.0%
Total speed up: 11.5%

Speedometer 2.0:
Speed up due to SW overhead reduction: 5.3%
Speed up due to HW overhead reduction: 5.1%
Total speed up: 10.4%
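
(Roughly speaking, the two components are additive by construction: hybrid-16k-4k
isolates the SW effect against baseline-4k, and the remainder of the
baseline-4k -> baseline-16k delta is attributed to the HW side, hence
6.5% + 5.0% = 11.5% and 5.3% + 5.1% = 10.4%.)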

Digging into the reasons for the SW-side speedup, it boils down to less
book-keeping - 4x fewer page faults, 4x fewer pages to manage locks/refcounts/…
for, which leads to faster abort and syscall handling. I think these phenomena
are well understood in the Folio context? Although for these workloads, the
memory is primarily anonymous.

I’d like to figure out how to realise some of these benefits in a kernel that
still maintains a 4K page user ABI. Reading over old threads and LWN, and watching
Matthew’s talk at OSS last summer, it sounds like this is exactly what folios
are intended to solve?

So a few questions:

- I’ve seen folios for anon memory listed as future work; what’s the current
status? Is anyone looking at this? It’s something that I would be interested to
take a look at if not (although don’t take that as an actual commitment yet!).

- My understanding is that as of v6.0, at least, XFS was the only FS supporting
large folios? Has that picture changed? Is there any likelihood of seeing ext4
and f2fs support anytime soon?

- Matthew mentioned in the talk that he had data showing memory fragmentation
becoming less of an issue as more users were allocating large folios. Is that data
or the experimental approach public?

Thanks,
Ryan



* Re: Folios for anonymous memory
From: Matthew Wilcox @ 2023-02-15 15:13 UTC
  To: Ryan Roberts; +Cc: Linux-MM, Catalin Marinas, Mark Rutland, Ruben Ayrapetyan

On Wed, Feb 15, 2023 at 12:38:13PM +0000, Ryan Roberts wrote:
> Kernel Compilation:
> Speed up due to SW overhead reduction: 6.5%
> Speed up due to HW overhead reduction: 5.0%
> Total speed up: 11.5%
> 
> Speedometer 2.0:
> Speed up due to SW overhead reduction: 5.3%
> Speed up due to HW overhead reduction: 5.1%
> Total speed up: 10.4%
> 
> Digging into the reasons for the SW-side speedup, it boils down to less
> book-keeping - 4x fewer page faults, 4x fewer pages to manage locks/refcounts/…
> for, which leads to faster abort and syscall handling. I think these phenomena
> are well understood in the Folio context? Although for these workloads, the
> memory is primarily anonymous.

All of that tracks pretty well with what I've found.  Although I haven't
been conducting exactly the same experiments, and different hardware is
going to have different properties, it all seems about right.

> I’d like to figure out how to realise some of these benefits in a kernel that
> still maintains a 4K page user ABI. Reading over old threads and LWN, and watching
> Matthew’s talk at OSS last summer, it sounds like this is exactly what folios
> are intended to solve?

Yes, it's exactly what folios are supposed to achieve -- opportunistic use
of larger memory allocations & TLB sizes when the stars align.

> So a few questions:
> 
> - I’ve seen folios for anon memory listed as future work; what’s the current
> status? Is anyone looking at this? It’s something that I would be interested to
> take a look at if not (although don’t take that as an actual commitment yet!).

There are definitely people _looking_ at it.  I don't think anyone's
committed to it, and I don't think there's anyone 50 patches into a 100
patch series to make it work ;-)  I think there are a lot of unanswered
questions about how best to do it.

> - My understanding is that as of v6.0, at least, XFS was the only FS supporting
> large folios? Has that picture changed? Is there any likelihood of seeing ext4
> and f2fs support anytime soon?

We have some progress on that front.  In addition to XFS, AFS, EROFS
and tmpfs currently enable support for large folios.  I've heard tell
of NFS support coming soon.  I'm pretty sure CIFS is looking into it.
The OCFS2 maintainers are interested.  You can find the current state
of fs support by grepping for mapping_set_large_folios().
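
The opt-in itself is a one-liner at inode set-up time.  As a sketch (the
filesystem name is made up; XFS does the equivalent when it sets up an
inode):

/*
 * Illustrative only: "examplefs" is a stand-in name.  XFS does the
 * equivalent in its inode set-up path.
 */
static void examplefs_setup_inode(struct inode *inode)
{
	/* ... usual inode / address_space initialisation ... */

	/* Allow the page cache to use folios larger than order-0
	 * for this file's mapping. */
	mapping_set_large_folios(inode->i_mapping);
}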

People are working on it from the f2fs side:
https://lore.kernel.org/linux-fsdevel/Y5D8wYGpp%2F95ShTV@bombadil.infradead.org/

ext4 is being more conservative.  I posted a patch series to convert
ext4 to use order-0 folios instead of pages (enabling large folios
will be more work), but I don't have any significant responses to
that yet:
https://lore.kernel.org/linux-fsdevel/20230126202415.1682629-1-willy@infradead.org/

> - Matthew mentioned in the talk that he had data showing memory fragmentation
> becoming less of an issue as more users were allocating large folios. Is that data
> or the experimental approach public?

I'm not sure I have data on that front; more of an argument from first
principles -- page cache is the easiest form of memory to reclaim
since it's usually clean.  If the filesystems using the page cache are
allocating large folios, it's easier to find larger chunks of memory.
Also every time a fs tries to allocate large folios and fails, it'll
poke the compaction code to try to create larger chunks of memory.

There are also memory allocation patterns to consider.  At some point, all
our low-order pools will be empty and we'll have to break up an order-10
page.  If we're allocating individual pages for the filesystem, we'll
happily allocate the first few, but then the radix tree in which we store
the pages will have to allocate a new node from slab.  Slab allocates
28 nodes from an order-2 page allocation, so you'll almost instantly get
a case where this order-10 page will never be reassembled.  Unless your
system is configured with a movable memory zone (which will segregate slab
allocations from page cache allocations), and my laptop certainly isn't.
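
(For reference, the 28 comes from an xa_node being 576 bytes on 64-bit, so a
16KiB order-2 slab holds 16384 / 576 = 28 of them.)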

I don't want you to get the impression that all the work going on is
targeted at filesystem folios.  There's a lot of infrastructure that's
being converted from pages to folios and being reexamined at the same
time to be sure it handles arbitrary-order folios correctly.  Right
now, I'm working on the architecture support for inserting multiple
consecutive PTEs at the same time:
https://lore.kernel.org/linux-arch/20230211033948.891959-1-willy@infradead.org/
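
To give a flavour of the shape of that interface (names paraphrased; the
series itself is authoritative), the idea is to replace N separate
set_pte_at() calls with one call covering N consecutive pages of a folio,
so each architecture can batch whatever maintenance it needs:

/*
 * Sketch only, not the actual patch: install @nr consecutive PTEs
 * mapping consecutive page frames starting at @pfn.  A real
 * implementation batches the arch-side cache/TLB maintenance rather
 * than looping naively.
 */
static void set_ptes_sketch(struct mm_struct *mm, unsigned long addr,
			    pte_t *ptep, unsigned long pfn,
			    pgprot_t prot, unsigned int nr)
{
	unsigned int i;

	for (i = 0; i < nr; i++)
		set_pte_at(mm, addr + i * PAGE_SIZE, ptep + i,
			   pfn_pte(pfn + i, prot));
}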

Thanks for reaching out.  We have a Zoom call on alternate Fridays,
so if you're free at 5pm UK time (yes, I know ... trying to fit in both
California and central Europe leads to awkward times for phone calls),
I can send you the meeting details.



* Re: Folios for anonymous memory
From: Ryan Roberts @ 2023-02-15 16:51 UTC
  To: Matthew Wilcox; +Cc: Linux-MM, Catalin Marinas, Mark Rutland, Ruben Ayrapetyan

Thanks for the fast response - I appreciate it!

>>
>> - I’ve seen folios for anon memory listed as future work; what’s the current
>> status? Is anyone looking at this? It’s something that I would be interested to
>> take a look at if not (although don’t take that as an actual commitment yet!).
>
> There are definitely people _looking_ at it.  I don't think anyone's
> committed to it, and I don't think there's anyone 50 patches into a 100
> patch series to make it work ;-)  I think there are a lot of unanswered
> questions about how best to do it.

Is there any list outlining those questions? Having had a quick look at
do_anonymous_page(), which is where a bunch of my overheads seem to be coming
from (and having the luxury of not being intimately familiar with mm ;-) ), it
looks like it would be doable to convert it to allocate order-2 folios, for
example. I guess a lot of the difficulty is in figuring out heuristics for
choosing the right folio size for a given fault? And then thinking about handling
COW, swap, parallel faults, etc.?
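
Just to show where my head is at, the allocation half looks doable with
existing APIs. This is only a sketch, with all the hard parts (choosing
the order, the PTE batch install, rmap/refcounting, COW, swap, racing
faults) waved away:

/*
 * Sketch only: try to allocate an order-2 (16K with 4K base pages)
 * folio for an anonymous fault.  Returns NULL (caller falls back to
 * order-0) if the naturally aligned block would stray outside the VMA.
 * Assumes vma_alloc_folio() as in v6.1.
 */
static struct folio *alloc_anon_folio_order2(struct vm_area_struct *vma,
					     unsigned long addr)
{
	const int order = 2;
	unsigned long haddr = addr & ~((PAGE_SIZE << order) - 1);

	if (haddr < vma->vm_start ||
	    haddr + (PAGE_SIZE << order) > vma->vm_end)
		return NULL;

	/* __GFP_ZERO: no separate clear step needed for anon memory. */
	return vma_alloc_folio(GFP_HIGHUSER_MOVABLE | __GFP_ZERO, order,
			       vma, haddr, false);
}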


> I don't want you to get the impression that all the work going on is
> targeted at filesystem folios.  There's a lot of infrastructure that's
> being converted from pages to folios and being reexamined at the same
> time to be sure it handles arbitrary-order folios correctly.  Right
> now, I'm working on the architecture support for inserting multiple
> consecutive PTEs at the same time:
> https://lore.kernel.org/linux-arch/20230211033948.891959-1-willy@infradead.org/

Yep - I'm aware of that, thanks!

>
> Thanks for reaching out.  We have a Zoom call on alternate Fridays,
> so if you're free at 5pm UK time (yes, I know ... trying to fit in both
> California and central Europe leads to awkward times for phone calls),
> I can send you the meeting details.

Yes, I'd be keen to join, although not the easiest time for me. I can't do this
week, but would like to join on an ad-hoc basis if that's ok? When is the next call?

Thanks,
Ryan



