From: Raghavendra K T <raghavendra.kt@amd.com>
To: Peter Xu <peterx@redhat.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Hugh Dickins <hughd@google.com>,
	"Kirill A . Shutemov" <kirill@shutemov.name>,
	Alistair Popple <apopple@nvidia.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Minchan Kim <minchan@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@redhat.com>,
	Andi Kleen <andi.kleen@intel.com>,
	Nadav Amit <nadav.amit@gmail.com>,
	Huang Ying <ying.huang@intel.com>,
	Vlastimil Babka <vbabka@suse.cz>
Subject: Re: [PATCH v4 0/7] mm: Remember a/d bits for migration entries
Date: Mon, 21 Nov 2022 10:45:45 +0530	[thread overview]
Message-ID: <dfc6bdde-7e5d-44e8-8549-7d61a0f18bb5@amd.com> (raw)
In-Reply-To: <20220811161331.37055-1-peterx@redhat.com>

On 8/11/2022 9:43 PM, Peter Xu wrote:
> v4:
> - Added r-bs for Ying
> - Some cosmetic changes here and there [Ying]
> - Fix smaps to only dump PFN for pfn swap entries for both pte/pmd [Ying]
> - Remove max_swapfile_size(), export swapfile_maximum_size variable [Ying]
> - In migrate_vma_collect_pmd() only read A/D if pte_present()
> 
> rfc: https://lore.kernel.org/all/20220729014041.21292-1-peterx@redhat.com
> v1:  https://lore.kernel.org/all/20220803012159.36551-1-peterx@redhat.com
> v2:  https://lore.kernel.org/all/20220804203952.53665-1-peterx@redhat.com
> v3:  https://lore.kernel.org/all/20220809220100.20033-1-peterx@redhat.com
> 
> Problem
> =======
> 
> When migrating a page, we currently always mark the migrated page as old
> and clean.
> 
> However that could lead to at least two problems:
> 
>    (1) We lose the real hot/cold information that we could have preserved.
>        That information shouldn't change even if the backing page is
>        changed after the migration.
> 
>    (2) There is always extra overhead on the first access to any migrated
>        page, because the hardware MMU needs cycles to set the young bit
>        again on reads, and the dirty bit on writes, as long as the
>        hardware MMU supports these bits.
> 
> Much recent upstream work has shown that (2) is not trivial and is in
> fact very measurable.  In my test case, reading a 1G chunk of memory -
> jumping in page-size intervals - could take 99ms on a generic x86_64
> system just because of the extra work of setting the young bit, compared
> to 4ms if the young bit is already set.
> 
> This issue was originally reported by Andrea Arcangeli.
> 
> Solution
> ========
> 
> To solve this problem, this patchset tries to remember the young/dirty bits
> in the migration entries and carry them over when recovering the ptes.
> 
> We have the chance to do so because on many systems the swap offset is
> not fully used.  Migration entries use the swp offset to store only the
> PFN, and the PFN normally needs fewer bits than the swp offset provides.
> That means we have some free bits in the swp offset that we can use to
> store things like the A/D bits, and that is how this series approaches
> the problem.
> 
> max_swapfile_size() is used here to detect the per-arch offset length in
> swp entries.  We automatically remember the A/D bits when we find that
> the swp offset field is large enough to keep both the PFN and the extra
> bits.
> 
> Since max_swapfile_size() can be slow, the last two patches cache its
> result, along with swap_migration_ad_supported as a whole.
> 
> Known Issues / TODOs
> ====================
> 
> We still haven't taught madvise() to recognize the new A/D bits in
> migration entries, namely for MADV_COLD/MADV_FREE, e.g. when MADV_COLD
> hits a migration entry.  It is not yet clear whether we should clear the
> A bit or just drop the entry directly.
> 
> We didn't teach idle page tracking about the new migration entries
> either, because that would need a larger rework of the rmap page table
> walk.  However, things should already be better: before this patchset a
> page was always old after migration, so the series fixes a potential
> false negative in idle page tracking for pages migrated before being
> observed.
> 
> The other thing is that the migration A/D bits will not work for private
> device swap entries.  The code is there for completeness, but since
> private device swap entries do not yet have fields to store the A/D
> bits, even if we persist A/D across the switch from a present pte to a
> migration entry, we lose them again when the migration entry is
> converted into a private device swap entry.
> 
> Tests
> =====
> 
> With the patchset applied, the immediate read access test [1] of the
> above 1G chunk after migration shrinks from 99ms to 4ms.  The test moves
> 1G of pages from node 0->1->0 and then reads it in page-size jumps.  The
> test machine has an Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz.
> 
> A similar effect can also be measured when writing the memory for the
> first time after migration.
> 
> After applying the patchset, both the initial read and the initial write
> after a page is migrated perform similarly to before the migration.

I was able to test this on an AMD EPYC 64-core, two-NUMA-node (Milan)
system clocked at 3.72 GHz.

I am seeing a similar improvement for the test mentioned above
(swap-young).

base: (6.0)
--------------
Write (node 0) took 562202 (us)
Read (node 0) took 7790 (us)
Move to node 1 took 474876(us)
Move to node 0 took 642805(us)
Read (node 0) took 81364 (us)
Write (node 0) took 12887 (us)
Read (node 0) took 5202 (us)
Write (node 0) took 4533 (us)
Read (node 0) took 5229 (us)
Write (node 0) took 4558 (us)
Read (node 0) took 5198 (us)
Write (node 0) took 4551 (us)
Read (node 0) took 5218 (us)
Write (node 0) took 4534 (us)

patched
-------------
Write (node 0) took 250232 (us)
Read (node 0) took 3262 (us)
Move to node 1 took 640636(us)
Move to node 0 took 449051(us)
Read (node 0) took 2966 (us)
Write (node 0) took 2720 (us)
Read (node 0) took 2891 (us)
Write (node 0) took 2560 (us)
Read (node 0) took 2899 (us)
Write (node 0) took 2568 (us)
Read (node 0) took 2890 (us)
Write (node 0) took 2568 (us)
Read (node 0) took 2897 (us)
Write (node 0) took 2563 (us)

Please feel free to add, FWIW:
Tested-by: Raghavendra K T <raghavendra.kt@amd.com>

> 
> Patch Layout
> ============
> 
> Patch 1-2:  Cleanups from either previous versions or on swapops.h macros.
> 
> Patch 3-4:  Prepare for the introduction of migration A/D bits
> 
> Patch 5:    The core patch to remember young/dirty bit in swap offsets.
> 
> Patch 6-7:  Cache relevant fields to make migration_entry_supports_ad() fast.
> 
> Please review, thanks.
> 
> [1] https://github.com/xzpeter/clibs/blob/master/misc/swap-young.c
> 
> Peter Xu (7):
>    mm/x86: Use SWP_TYPE_BITS in 3-level swap macros
>    mm/swap: Comment all the ifdef in swapops.h
>    mm/swap: Add swp_offset_pfn() to fetch PFN from swap entry
>    mm/thp: Carry over dirty bit when thp splits on pmd
>    mm: Remember young/dirty bit for page migrations
>    mm/swap: Cache maximum swapfile size when init swap
>    mm/swap: Cache swap migration A/D bits support
> 
>   arch/arm64/mm/hugetlbpage.c           |   2 +-
>   arch/x86/include/asm/pgtable-3level.h |   8 +-
>   arch/x86/mm/init.c                    |   2 +-
>   fs/proc/task_mmu.c                    |  20 +++-
>   include/linux/swapfile.h              |   5 +-
>   include/linux/swapops.h               | 145 +++++++++++++++++++++++---
>   mm/hmm.c                              |   2 +-
>   mm/huge_memory.c                      |  27 ++++-
>   mm/memory-failure.c                   |   2 +-
>   mm/migrate.c                          |   6 +-
>   mm/migrate_device.c                   |   6 ++
>   mm/page_vma_mapped.c                  |   6 +-
>   mm/rmap.c                             |   5 +-
>   mm/swapfile.c                         |  15 ++-
>   14 files changed, 214 insertions(+), 37 deletions(-)
> 

