* [RFC] mm: page-flags.h: remove the bias against tail pages
@ 2024-03-25 4:55 John Hubbard
2024-03-25 5:24 ` Matthew Wilcox
0 siblings, 1 reply; 3+ messages in thread
From: John Hubbard @ 2024-03-25 4:55 UTC (permalink / raw)
To: Andrew Morton
Cc: LKML, linux-mm, John Hubbard, David Hildenbrand, Matthew Wilcox,
Mike Rapoport, Theodore Ts'o, Vishal Moola,
Peter Collingbourne
commit 1d798ca3f1643 ("mm: make compound_head() robust") added
page->compound_head and the associated "unlikely" check for a tail page
in compound_head():
if (unlikely(head & 1))
return (struct page *) (head - 1);
return page;
That worked nicely in 2015. However, in the 8.5 years since then, things
have changed: folios and huge pages are heavily used, with more uses
coming. See for example the various THP enhancements being proposed. And
hugetlbfs remains alive and well. And large folios are being plumbed
into everything.
With that in mind, remove the "unlikely" attribute when checking for a
tail page in compound_head(), and let normal CPU branch prediction do
what it may.
Cc: David Hildenbrand <david@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Peter Collingbourne <pcc@google.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
Hi,
Is this reasonable? I haven't gone out and gathered test data, because
the original patch to create this just assumed that compound pages were
uncommon, and so now it's time to stop making that assumption. I think
that's sufficient reasoning here to leave out the compiler hint, right?
thanks,
John Hubbard
NVIDIA
include/linux/page-flags.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 652d77805e99..ae9509c6736c 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -246,7 +246,7 @@ static inline unsigned long _compound_head(const struct page *page)
{
unsigned long head = READ_ONCE(page->compound_head);
- if (unlikely(head & 1))
+ if (head & 1)
return head - 1;
return (unsigned long)page_fixed_fake_head(page);
}
--
2.44.0
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [RFC] mm: page-flags.h: remove the bias against tail pages
2024-03-25 4:55 [RFC] mm: page-flags.h: remove the bias against tail pages John Hubbard
@ 2024-03-25 5:24 ` Matthew Wilcox
2024-03-26 3:21 ` John Hubbard
0 siblings, 1 reply; 3+ messages in thread
From: Matthew Wilcox @ 2024-03-25 5:24 UTC (permalink / raw)
To: John Hubbard
Cc: Andrew Morton, LKML, linux-mm, David Hildenbrand, Mike Rapoport,
Theodore Ts'o, Vishal Moola, Peter Collingbourne
On Sun, Mar 24, 2024 at 09:55:19PM -0700, John Hubbard wrote:
> commit 1d798ca3f1643 ("mm: make compound_head() robust") added
> page->compound_head and the associated "unlikely" check for a tail page
> in compound_head():
>
> if (unlikely(head & 1))
> return (struct page *) (head - 1);
> return page;
>
> That worked nicely in 2015. However, in the 8.5 years since then, things
> have changed: folios and huge pages are heavily used, with more uses
> coming. See for example the various THP enhancements being proposed. And
> hugetlbfs remains alive and well. And large folios are being plumbed
> into everything.
>
> With that in mind, remove the "unlikely" attribute when checking for a
> tail page in compound_head(), and let normal CPU branch prediction do
> what it may.
> Is this reasonable? I haven't gone out and gathered test data, because
> the original patch to create this just assumed that compound pages were
> uncommon, and so now it's time to stop making that assumption. I think
> that's sufficient reasoning here to leave out the compiler hint, right?
It's complicated. On the one hand, it's "more likely" because there are
more tail pages than there are head pages or order-0 pages. On the
other hand, a _lot_ of the time we call compound_head(), it's done with
a non-tail page because we tend to pass around head pages (eg,
pmd_page() on hugetlbfs, or looking up a folio in the page cache and
passing &folio->page to some function that's not yet converted.
On the third hand, does the compiler really do much with the annotation?
Before your patch:
27d6: a8 01 test $0x1,%al
27d8: 75 02 jne 27dc <clear_refs_pte_range+0x9c>
27da: eb 59 jmp 2835 <clear_refs_pte_range+0xf5>
27dc: 49 8b 44 24 08 mov 0x8(%r12),%rax
27e1: a8 01 test $0x1,%al
27e3: 75 6f jne 2854 <clear_refs_pte_range+0x114>
27e5: eb 73 jmp 285a <clear_refs_pte_range+0x11a>
With your patch:
1ee6: a8 01 test $0x1,%al
1ee8: 75 02 jne 1eec <clear_refs_pte_range+0x9c>
1eea: eb 5f jmp 1f4b <clear_refs_pte_range+0xfb>
1eec: 49 8b 44 24 08 mov 0x8(%r12),%rax
1ef1: a8 01 test $0x1,%al
1ef3: 75 50 jne 1f45 <clear_refs_pte_range+0xf5>
1ef5: eb 6c jmp 1f63 <clear_refs_pte_range+0x113>
Looks pretty much the same. bloat-o-meter says:
$ ./scripts/bloat-o-meter before.o after.o
add/remove: 0/0 grow/shrink: 2/4 up/down: 32/-48 (-16)
Function old new delta
gather_stats.constprop 730 753 +23
smaps_hugetlb_range 635 644 +9
smaps_page_accumulate 342 338 -4
clear_refs_pte_range 339 328 -11
pagemap_hugetlb_range 422 407 -15
smaps_pte_range 1406 1388 -18
Total: Before=20066, After=20050, chg -0.08%
(I was looking at clear_refs_pte_range above). This seems marginal.
The benefits of removing a call to compound_head are much less
ambiguous:
$ ./scripts/bloat-o-meter before.o .build/fs/proc/task_mmu.o
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-101 (-101)
Function old new delta
clear_refs_pte_range 339 238 -101
Total: Before=20066, After=19965, chg -0.50%
I'd describe that as replacing four calls to compound_head() with two:
- page = pmd_page(*pmd);
+ folio = page_folio(pmd_page(*pmd));
/* Clear accessed and referenced bits. */
pmdp_test_and_clear_young(vma, addr, pmd);
- test_and_clear_page_young(page);
- ClearPageReferenced(page);
+ folio_test_clear_young(folio);
+ folio_clear_referenced(folio);
...
- page = vm_normal_page(vma, addr, ptent);
- if (!page)
+ folio = vm_normal_folio(vma, addr, ptent);
+ if (!folio)
continue;
/* Clear accessed and referenced bits. */
ptep_test_and_clear_young(vma, addr, pte);
- test_and_clear_page_young(page);
- ClearPageReferenced(page);
+ folio_test_clear_young(folio);
+ folio_clear_referenced(folio);
I'm not saying this patch is necessarily wrong, I just think it's
"not proven".
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [RFC] mm: page-flags.h: remove the bias against tail pages
2024-03-25 5:24 ` Matthew Wilcox
@ 2024-03-26 3:21 ` John Hubbard
0 siblings, 0 replies; 3+ messages in thread
From: John Hubbard @ 2024-03-26 3:21 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Andrew Morton, LKML, linux-mm, David Hildenbrand, Mike Rapoport,
Theodore Ts'o, Vishal Moola, Peter Collingbourne
On 3/24/24 10:24 PM, Matthew Wilcox wrote:
...
> It's complicated. On the one hand, it's "more likely" because there are
> more tail pages than there are head pages or order-0 pages. On the
> other hand, a _lot_ of the time we call compound_head(), it's done with
> a non-tail page because we tend to pass around head pages (eg,
ah yes, that's true.
> pmd_page() on hugetlbfs, or looking up a folio in the page cache and
> passing &folio->page to some function that's not yet converted.
>
> On the third hand, does the compiler really do much with the annotation?
>
> Before your patch:
>
> 27d6: a8 01 test $0x1,%al
> 27d8: 75 02 jne 27dc <clear_refs_pte_range+0x9c>
I should have thought to check this. Usually I'll see a change between je/jne
if __builtin_expect is doing its job. Here it is, oddly, missing in action.
Maybe I'll look a little closer into why that is...
> 27da: eb 59 jmp 2835 <clear_refs_pte_range+0xf5>
> 27dc: 49 8b 44 24 08 mov 0x8(%r12),%rax
> 27e1: a8 01 test $0x1,%al
> 27e3: 75 6f jne 2854 <clear_refs_pte_range+0x114>
> 27e5: eb 73 jmp 285a <clear_refs_pte_range+0x11a>
>
> With your patch:
>
> 1ee6: a8 01 test $0x1,%al
> 1ee8: 75 02 jne 1eec <clear_refs_pte_range+0x9c>
> 1eea: eb 5f jmp 1f4b <clear_refs_pte_range+0xfb>
> 1eec: 49 8b 44 24 08 mov 0x8(%r12),%rax
> 1ef1: a8 01 test $0x1,%al
> 1ef3: 75 50 jne 1f45 <clear_refs_pte_range+0xf5>
> 1ef5: eb 6c jmp 1f63 <clear_refs_pte_range+0x113>
>
> Looks pretty much the same. bloat-o-meter says:
>
> $ ./scripts/bloat-o-meter before.o after.o
> add/remove: 0/0 grow/shrink: 2/4 up/down: 32/-48 (-16)
> Function old new delta
> gather_stats.constprop 730 753 +23
> smaps_hugetlb_range 635 644 +9
> smaps_page_accumulate 342 338 -4
> clear_refs_pte_range 339 328 -11
> pagemap_hugetlb_range 422 407 -15
> smaps_pte_range 1406 1388 -18
> Total: Before=20066, After=20050, chg -0.08%
>
> (I was looking at clear_refs_pte_range above). This seems marginal.
> The benefits of removing a call to compound_head are much less
> ambiguous:
>
> $ ./scripts/bloat-o-meter before.o .build/fs/proc/task_mmu.o
> add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-101 (-101)
> Function old new delta
> clear_refs_pte_range 339 238 -101
> Total: Before=20066, After=19965, chg -0.50%
>
> I'd describe that as replacing four calls to compound_head() with two:
>
> - page = pmd_page(*pmd);
> + folio = page_folio(pmd_page(*pmd));
>
> /* Clear accessed and referenced bits. */
> pmdp_test_and_clear_young(vma, addr, pmd);
> - test_and_clear_page_young(page);
> - ClearPageReferenced(page);
> + folio_test_clear_young(folio);
> + folio_clear_referenced(folio);
> ...
> - page = vm_normal_page(vma, addr, ptent);
> - if (!page)
> + folio = vm_normal_folio(vma, addr, ptent);
> + if (!folio)
> continue;
>
> /* Clear accessed and referenced bits. */
> ptep_test_and_clear_young(vma, addr, pte);
> - test_and_clear_page_young(page);
> - ClearPageReferenced(page);
> + folio_test_clear_young(folio);
> + folio_clear_referenced(folio);
>
> I'm not saying this patch is necessarily wrong, I just think it's
> "not proven".
I appreciate your looking at this and explaining the analysis steps
you used!
thanks,
--
John Hubbard
NVIDIA
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2024-03-26 3:21 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-25 4:55 [RFC] mm: page-flags.h: remove the bias against tail pages John Hubbard
2024-03-25 5:24 ` Matthew Wilcox
2024-03-26 3:21 ` John Hubbard
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).