linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Ryan Roberts <ryan.roberts@arm.com>
To: David Hildenbrand <david@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	Yu Zhao <yuzhao@google.com>,
	"Yin, Fengwei" <fengwei.yin@intel.com>
Cc: linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org
Subject: Re: [RFC v2 PATCH 00/17] variable-order, large folios for anonymous memory
Date: Wed, 26 Apr 2023 11:41:48 +0100	[thread overview]
Message-ID: <6857912b-4afd-7fb5-b11b-ebe0e32298c2@arm.com> (raw)
In-Reply-To: <c1e63072-4b7b-fcbd-8de5-9dbcb6ad60fc@redhat.com>

Hi David,

On 17/04/2023 16:44, David Hildenbrand wrote:

>>>>>
>>>>> So what should be safe is replacing all sub-pages of a folio that are marked
>>>>> "maybe shared" by a new folio under PT lock. However, I wonder if it's really
>>>>> worth the complexity. For THP we were happy so far to *not* optimize this,
>>>>> implying that maybe we shouldn't worry about optimizing the fork() case for
>>>>> now
>>>>> that heavily.
>>>>
>>>> I don't have the exact numbers to hand, but I'm pretty sure I remember enabling
>>>> large copies was contributing a measurable amount to the performance
>>>> improvement. (Certainly, the zero-page copy case, is definitely a big
>>>> contributer). I don't have access to the HW at the moment but can rerun later
>>>> with and without to double check.
>>>
>>> In which test exactly? Some micro-benchmark?
>>
>> The kernel compile benchmark that I quoted numbers for in the cover letter. I
>> have some trace points (not part of the submitted series) that tell me how many
>> mappings of each order we get for each code path. I'm pretty sure I remember all
>> of these 4 code paths contributing non-negligible amounts.
> 
> Interesting! It would be great to see if there is an actual difference after
> patch #10 was applied without the other COW replacement.
> 

Sorry about the delay. I now have some numbers for this...

I rearranged the patch order so that all the "utility" stuff (new rmap
functions, etc) are first (1, 2, 3, 4, 5, 8, 9, 11, 12, 13), followed by a
couple of general improvements (7, 17), which should be dormant until we have
the final patches, then finally (6, 10, 14, 15), which implement large anon
folios the allocate, reuse, copy-non-zero and copy-zero paths respectively. I've
dropped patch 16 and fixed the copy-exclusive bug you spotted (by ensuring we
never replace an exclusive page).

I've measured performance at the following locations in the patch set:

- baseline: none of my patches applied
- utility: has utility and general improvement patches applied
- alloc: utility + 6
- reuse: utility + 6 + 10
- copy: utility + 6 + 10 + 14
- zero-alloc: utility + 6 + 19 + 14 + 15

The test is `make defconfig && time make -jN Image` for a clean checkout of
v6.3-rc3. The first result is thrown away, and the next 3 are kept. I saw some
per-boot variance (probably down to kaslr, etc). So have booted each kernel 7
times for a total of 3x7=21 samples per kernel. Then I've taken the mean:


jobs=8:

| label      |   real |   user |   kernel |
|:-----------|-------:|-------:|---------:|
| baseline   |   0.0% |   0.0% |     0.0% |
| utility    |  -2.7% |  -2.8% |    -3.1% |
| alloc      |  -6.0% |  -2.3% |   -24.1% |
| reuse      |  -9.5% |  -5.8% |   -28.5% |
| copy       | -10.6% |  -6.9% |   -29.4% |
| zero-alloc |  -9.2% |  -5.1% |   -29.8% |


jobs=160:

| label      |   real |   user |   kernel |
|:-----------|-------:|-------:|---------:|
| baseline   |   0.0% |   0.0% |     0.0% |
| utility    |  -1.8% |  -0.0% |    -7.7% |
| alloc      |  -6.0% |   1.8% |   -20.9% |
| reuse      |  -7.8% |  -1.6% |   -24.1% |
| copy       |  -7.8% |  -2.5% |   -26.8% |
| zero-alloc |  -7.7% |   1.5% |   -29.4% |


So it looks like patch 10 (reuse) is making a difference, but copy and
zero-alloc are not adding a huge amount, as you hypothesized. Personally I would
prefer not to drop those patches though, as it will all help towards utilization
of contiguous PTEs on arm64, which is the second part of the change that I'm now
working on.


For the final config ("zero-alloc") I also collected stats on how many
operations each of the 4 paths was performing, using ftrace and histograms.
"pnr" is the number of pages allocated/reused/copied, and "fnr" is the number of
pages in the source folio):


do_anonymous_page:

{ pnr:          1 } hitcount:    2749722
{ pnr:          4 } hitcount:     387832
{ pnr:          8 } hitcount:     409628
{ pnr:         16 } hitcount:    4296115

pages: 76315914
faults: 7843297
pages per fault: 9.7


wp_page_reuse (anon):

{ pnr:          1, fnr:          1 } hitcount:      47887
{ pnr:          3, fnr:          4 } hitcount:          2
{ pnr:          4, fnr:          4 } hitcount:       6131
{ pnr:          6, fnr:          8 } hitcount:          1
{ pnr:          7, fnr:          8 } hitcount:         10
{ pnr:          8, fnr:          8 } hitcount:       3794
{ pnr:          1, fnr:         16 } hitcount:         36
{ pnr:          2, fnr:         16 } hitcount:         23
{ pnr:          3, fnr:         16 } hitcount:          5
{ pnr:          4, fnr:         16 } hitcount:          9
{ pnr:          5, fnr:         16 } hitcount:          8
{ pnr:          6, fnr:         16 } hitcount:          9
{ pnr:          7, fnr:         16 } hitcount:          3
{ pnr:          8, fnr:         16 } hitcount:         24
{ pnr:          9, fnr:         16 } hitcount:          2
{ pnr:         10, fnr:         16 } hitcount:          1
{ pnr:         11, fnr:         16 } hitcount:          9
{ pnr:         12, fnr:         16 } hitcount:          2
{ pnr:         13, fnr:         16 } hitcount:         27
{ pnr:         14, fnr:         16 } hitcount:          2
{ pnr:         15, fnr:         16 } hitcount:         54
{ pnr:         16, fnr:         16 } hitcount:       6673

pages: 211393
faults: 64712
pages per fault: 3.3


wp_page_copy (anon):

{ pnr:          1, fnr:          1 } hitcount:      81242
{ pnr:          4, fnr:          4 } hitcount:       5974
{ pnr:          1, fnr:          8 } hitcount:          1
{ pnr:          4, fnr:          8 } hitcount:          1
{ pnr:          8, fnr:          8 } hitcount:      12933
{ pnr:          1, fnr:         16 } hitcount:         19
{ pnr:          4, fnr:         16 } hitcount:          3
{ pnr:          8, fnr:         16 } hitcount:          7
{ pnr:         16, fnr:         16 } hitcount:       4106

pages: 274390
faults: 104286
pages per fault: 2.6


wp_page_copy (zero):

{ pnr:          1 } hitcount:     178699
{ pnr:          4 } hitcount:      14498
{ pnr:          8 } hitcount:      23644
{ pnr:         16 } hitcount:     257940

pages: 4552883
faults: 474781
pages per fault: 9.6


Thanks,
Ryan



  parent reply	other threads:[~2023-04-26 10:41 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-14 13:02 [RFC v2 PATCH 00/17] variable-order, large folios for anonymous memory Ryan Roberts
2023-04-14 13:02 ` [RFC v2 PATCH 01/17] mm: Expose clear_huge_page() unconditionally Ryan Roberts
2023-04-14 13:02 ` [RFC v2 PATCH 02/17] mm: pass gfp flags and order to vma_alloc_zeroed_movable_folio() Ryan Roberts
2023-04-14 13:02 ` [RFC v2 PATCH 03/17] mm: Introduce try_vma_alloc_movable_folio() Ryan Roberts
2023-04-17  8:49   ` Yin, Fengwei
2023-04-17 10:11     ` Ryan Roberts
2023-04-14 13:02 ` [RFC v2 PATCH 04/17] mm: Implement folio_add_new_anon_rmap_range() Ryan Roberts
2023-04-14 13:02 ` [RFC v2 PATCH 05/17] mm: Routines to determine max anon folio allocation order Ryan Roberts
2023-04-14 14:09   ` Kirill A. Shutemov
2023-04-14 14:38     ` Ryan Roberts
2023-04-14 15:37       ` Kirill A. Shutemov
2023-04-14 16:06         ` Ryan Roberts
2023-04-14 16:18           ` Matthew Wilcox
2023-04-14 16:31             ` Ryan Roberts
2023-04-14 13:02 ` [RFC v2 PATCH 06/17] mm: Allocate large folios for anonymous memory Ryan Roberts
2023-04-14 13:02 ` [RFC v2 PATCH 07/17] mm: Allow deferred splitting of arbitrary large anon folios Ryan Roberts
2023-04-14 13:02 ` [RFC v2 PATCH 08/17] mm: Implement folio_move_anon_rmap_range() Ryan Roberts
2023-04-14 13:02 ` [RFC v2 PATCH 09/17] mm: Update wp_page_reuse() to operate on range of pages Ryan Roberts
2023-04-14 13:02 ` [RFC v2 PATCH 10/17] mm: Reuse large folios for anonymous memory Ryan Roberts
2023-04-14 13:02 ` [RFC v2 PATCH 11/17] mm: Split __wp_page_copy_user() into 2 variants Ryan Roberts
2023-04-14 13:02 ` [RFC v2 PATCH 12/17] mm: ptep_clear_flush_range_notify() macro for batch operation Ryan Roberts
2023-04-14 13:02 ` [RFC v2 PATCH 13/17] mm: Implement folio_remove_rmap_range() Ryan Roberts
2023-04-14 13:03 ` [RFC v2 PATCH 14/17] mm: Copy large folios for anonymous memory Ryan Roberts
2023-04-14 13:03 ` [RFC v2 PATCH 15/17] mm: Convert zero page to large folios on write Ryan Roberts
2023-04-14 13:03 ` [RFC v2 PATCH 16/17] mm: mmap: Align unhinted maps to highest anon folio order Ryan Roberts
2023-04-17  8:25   ` Yin, Fengwei
2023-04-17 10:13     ` Ryan Roberts
2023-04-14 13:03 ` [RFC v2 PATCH 17/17] mm: Batch-zap large anonymous folio PTE mappings Ryan Roberts
2023-04-17  8:04 ` [RFC v2 PATCH 00/17] variable-order, large folios for anonymous memory Yin, Fengwei
2023-04-17 10:19   ` Ryan Roberts
2023-04-17  8:19 ` Yin, Fengwei
2023-04-17 10:28   ` Ryan Roberts
2023-04-17 10:54 ` David Hildenbrand
2023-04-17 11:43   ` Ryan Roberts
2023-04-17 14:05     ` David Hildenbrand
2023-04-17 15:38       ` Ryan Roberts
2023-04-17 15:44         ` David Hildenbrand
2023-04-17 16:15           ` Ryan Roberts
2023-04-26 10:41           ` Ryan Roberts [this message]
2023-05-17 13:58             ` David Hildenbrand
2023-05-18 11:23               ` Ryan Roberts
2023-04-19 10:12       ` Ryan Roberts
2023-04-19 10:51         ` David Hildenbrand
2023-04-19 11:13           ` Ryan Roberts

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6857912b-4afd-7fb5-b11b-ebe0e32298c2@arm.com \
    --to=ryan.roberts@arm.com \
    --cc=akpm@linux-foundation.org \
    --cc=david@redhat.com \
    --cc=fengwei.yin@intel.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-mm@kvack.org \
    --cc=willy@infradead.org \
    --cc=yuzhao@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).