From: Peter Xu <peterx@redhat.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Huang Ying <ying.huang@intel.com>,
	peterx@redhat.com, Andrea Arcangeli <aarcange@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	"Kirill A . Shutemov" <kirill@shutemov.name>,
	Nadav Amit <nadav.amit@gmail.com>,
	Hugh Dickins <hughd@google.com>,
	David Hildenbrand <david@redhat.com>,
	Vlastimil Babka <vbabka@suse.cz>
Subject: [PATCH RFC 0/4] mm: Remember young bit for migration entries
Date: Thu, 28 Jul 2022 21:40:37 -0400
Message-ID: <20220729014041.21292-1-peterx@redhat.com>

[Marking as RFC; only x86 is supported for now, plan to add a few more
 archs when there's a formal version]

Problem
=======

When migrating a page, right now we always mark the migrated page as old.
The reason is probably that we don't really know whether the page is hot or
cold, so we default to cold, assuming that's the safer choice.

However, that leads to at least two problems:

  (1) We lose the real hot/cold information even though we could have
      preserved it.  That information shouldn't change just because the
      backing page changes during the migration.

  (2) There is always extra overhead on the immediate next access to any
      migrated page, because the hardware MMU needs cycles to set the young
      bit again (as long as the MMU supports setting it).

Much recent upstream work has shown that (2) is not trivial and is actually
very measurable.  In my test case, reading a 1G chunk of memory - jumping
in page size intervals - could take 99ms just because of the extra work of
setting the young bit on a generic x86_64 system, compared to 4ms if the
young bit is already set.

This issue was originally reported by Andrea Arcangeli.

Solution
========

To solve this problem, this patchset tries to remember the young bit in the
migration entries and carry it over when recovering the ptes.

We have the chance to do so because on many systems the swap offset is not
fully used.  Migration entries use the swp offset to store only the PFN,
and the PFN normally needs fewer bits than the swp offset provides.  That
means we have some free bits in the swp offset that we can use to store
things like the young bit, and that's how this series approaches the
problem.
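
To make the idea concrete, here is a minimal user-space sketch of keeping a
young flag in an otherwise unused bit above the PFN.  It is not the code in
this series: the DEMO_* widths and the demo_* helper names are hypothetical
and chosen only for illustration; the real widths are arch dependent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical widths, for illustration only (real values are per-arch). */
#define DEMO_SWP_OFFSET_BITS 58  /* bits available in the swap offset */
#define DEMO_PFN_BITS        40  /* bits needed to hold the largest PFN */

#define DEMO_PFN_MASK  ((UINT64_C(1) << DEMO_PFN_BITS) - 1)
#define DEMO_YOUNG_BIT (UINT64_C(1) << DEMO_PFN_BITS)

/* The trick only works if the offset is wider than the PFN. */
_Static_assert(DEMO_SWP_OFFSET_BITS > DEMO_PFN_BITS,
               "no spare bit above the PFN");

/* Build an offset that carries the PFN plus the young flag. */
static inline uint64_t demo_make_offset(uint64_t pfn, bool young)
{
    assert((pfn & ~DEMO_PFN_MASK) == 0);  /* PFN must fit its field */
    return pfn | (young ? DEMO_YOUNG_BIT : 0);
}

/* Recover only the PFN, ignoring any flag bits stored above it. */
static inline uint64_t demo_offset_pfn(uint64_t offset)
{
    return offset & DEMO_PFN_MASK;
}

/* Recover the young flag that was stored alongside the PFN. */
static inline bool demo_offset_young(uint64_t offset)
{
    return (offset & DEMO_YOUNG_BIT) != 0;
}
```

Note that once a flag lives in the offset, the raw offset no longer equals
the PFN, so every reader has to mask it out first.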

One tricky thing here is that even though we're embedding the information
into the swap entry, which seems to be a very generic data structure, the
number of free bits is still arch dependent.  That's not only because the
size of swp_entry_t differs, but also because of the different layouts of
swap ptes on different archs.

Here, this series requires each supporting arch to define an extra macro
called __ARCH_SWP_OFFSET_BITS that represents the size of the swp offset.
With this information, the swap logic can know whether there are extra bits
to use, and then it'll remember the young bit when possible.  By default,
it'll keep the old behavior of marking all migrated pages cold.
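
The opt-in could look roughly like the sketch below.  Only the macro name
__ARCH_SWP_OFFSET_BITS comes from this series; DEMO_MAX_PFN_BITS,
DEMO_SPARE_BITS and the demo_* helpers are hypothetical names, and the
actual checks in the patches may well differ.

```c
#include <stdbool.h>

/* Hypothetical: bits needed for the largest PFN on this configuration. */
#define DEMO_MAX_PFN_BITS 40

/*
 * An arch that knows its swp offset width defines __ARCH_SWP_OFFSET_BITS.
 * Without it we assume there are no spare bits at all.
 */
#ifdef __ARCH_SWP_OFFSET_BITS
#define DEMO_SPARE_BITS (__ARCH_SWP_OFFSET_BITS - DEMO_MAX_PFN_BITS)
#else
#define DEMO_SPARE_BITS 0
#endif

/* True only if at least one bit is left over after storing the PFN. */
static inline bool demo_migration_supports_young(void)
{
    return DEMO_SPARE_BITS > 0;
}

/*
 * Young state to restore after migration: carried over when the arch has
 * room for it, otherwise always "old" (the previous behavior).
 */
static inline bool demo_restored_young(bool was_young)
{
    return demo_migration_supports_young() ? was_young : false;
}
```

Built without the macro, demo_restored_young() always returns false, which
matches the default of treating every migrated page as cold.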

Tests
=====

With the patchset applied, the immediate read access test [1] on the above
1G chunk after migration shrinks from 99ms to 4ms.  The test is done by
moving 1G of pages from node 0->1->0 and then reading it in page size jumps.
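
For readers who don't want to open [1], the "read in page size jumps" part
can be sketched as below.  This is not the test program itself: it covers
only the timed read loop, leaves out the node 0->1->0 migration entirely,
and demo_read_stride_ns() is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime() */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/*
 * Read one byte from each page of buf and return the elapsed time in
 * nanoseconds.  The bytes are summed into *sum_out so that the reads
 * cannot be optimized away.
 */
static uint64_t demo_read_stride_ns(const volatile unsigned char *buf,
                                    size_t len, size_t page_size,
                                    unsigned long *sum_out)
{
    struct timespec t0, t1;
    unsigned long sum = 0;
    int64_t ns;

    assert(page_size != 0);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t off = 0; off < len; off += page_size)
        sum += buf[off];  /* one access per page */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    *sum_out = sum;
    ns = (int64_t)(t1.tv_sec - t0.tv_sec) * 1000000000LL
         + (t1.tv_nsec - t0.tv_nsec);
    return (uint64_t)ns;
}
```

Running this loop right after the pages have been migrated is where the
cost of the MMU setting the young bit on every first access would show up.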

Currently __ARCH_SWP_OFFSET_BITS is only defined on x86 for this series, and
it has only been tested on x86_64 with an Intel(R) Xeon(R) CPU E5-2630 v4 @
2.20GHz.

Patch Layout
============

Patch 1:  Add swp_offset_pfn() and apply it to all pfn swap entries.  We
          should also stop treating swp_offset() as the PFN, because it can
          contain more information starting from the next patch.
Patch 2:  The core patch to remember young bit in swap offsets.
Patch 3:  A cleanup for x86 32 bits pgtable.h.
Patch 4:  Define __ARCH_SWP_OFFSET_BITS on x86, enable young bit for
          migration.

Please review, thanks.

[1] https://github.com/xzpeter/clibs/blob/master/misc/swap-young.c

Peter Xu (4):
  mm/swap: Add swp_offset_pfn() to fetch PFN from swap entry
  mm: Remember young bit for page migrations
  mm/x86: Use SWP_TYPE_BITS in 3-level swap macros
  mm/x86: Define __ARCH_SWP_OFFSET_BITS

 arch/arm64/mm/hugetlbpage.c           |  2 +-
 arch/x86/include/asm/pgtable-2level.h |  6 ++
 arch/x86/include/asm/pgtable-3level.h | 15 +++--
 arch/x86/include/asm/pgtable_64.h     |  5 ++
 include/linux/swapops.h               | 85 +++++++++++++++++++++++++--
 mm/hmm.c                              |  2 +-
 mm/huge_memory.c                      | 10 +++-
 mm/memory-failure.c                   |  2 +-
 mm/migrate.c                          |  4 +-
 mm/migrate_device.c                   |  2 +
 mm/page_vma_mapped.c                  |  6 +-
 mm/rmap.c                             |  3 +-
 12 files changed, 122 insertions(+), 20 deletions(-)

-- 
2.32.0


