All of lore.kernel.org
 help / color / mirror / Atom feed
From: Nadav Amit <nadav.amit@gmail.com>
To: Peter Xu <peterx@redhat.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
	linux-mm@kvack.org, Andrea Arcangeli <aarcange@redhat.com>,
	Andi Kleen <andi.kleen@intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@redhat.com>,
	Hugh Dickins <hughd@google.com>,
	Huang Ying <ying.huang@intel.com>,
	"Kirill A . Shutemov" <kirill@shutemov.name>,
	Vlastimil Babka <vbabka@suse.cz>
Subject: Re: [PATCH 2/2] mm: Remember young bit for page migrations
Date: Wed, 3 Aug 2022 00:42:54 -0700	[thread overview]
Message-ID: <EEBB9DDB-3D67-42BD-BD16-4264F40760CD@gmail.com> (raw)
In-Reply-To: <20220803012159.36551-3-peterx@redhat.com>

On Aug 2, 2022, at 6:21 PM, Peter Xu <peterx@redhat.com> wrote:

> When page migration happens, we always ignore the young bit settings in the
> old pgtable, and marking the page as old in the new page table using either
> pte_mkold() or pmd_mkold().
> 
> That's fine from functional-wise, but that's not friendly to page reclaim
> because the moving page can be actively accessed within the procedure.  Not
> to mention hardware setting the young bit can bring quite some overhead on
> some systems, e.g. x86_64 needs a few hundreds nanoseconds to set the bit.
> 
> Actually we can easily remember the young bit configuration and recover the
> information after the page is migrated.  To achieve it, define a new bit in
> the migration swap offset field to show whether the old pte has young bit
> set or not.  Then when removing/recovering the migration entry, we can
> recover the young bit even if the page changed.
> 
> One thing to mention is that here we used max_swapfile_size() to detect how
> many swp offset bits we have, and we'll only enable this feature if we know
> the swp offset can be big enough to store both the PFN value and the young
> bit.  Otherwise the young bit is dropped like before.

I gave it some more thought and I am less confident whether this is the best
solution. Not sure it is not either, so I am raising an alternative with
pros and cons.

An alternative would be to propagate the access bit into the page (i.e.,
using folio_set_young()) and then set it back into the PTE later (i.e.,
based on folio_test_young()). It might even seem that in general it is
better to always set the page access bit if folio_test_young().

This can be simpler and more performant. Setting the access-bit would not
impact reclaim decisions (as the page is already considered young), would
not induce overheads on clearing the access-bit (no TLB flush is needed at
least on x86), and would save the time the CPU takes to set the access bit
if the page is ever accessed (on x86).

It may also improve the preciseness of page-idle mechanisms and the
interaction with it. IIUC, page-idle does not consider migration entries, so
the user would not get indication that pages under migration are not idle.
When page-idle is reset, migrated pages might be later reinstated as
“accessed”, giving wrong indication that the pages are not-idle, when in
fact they are.

On the negative side, I am not sure whether other archs, that might require
a TLB flush for resetting the access-bit, and the overhead of doing atomic
operation to clear the access-bit, would not induce more overhead than they
would save.

One more unrelated point - note that remove_migration_pte() would always set
a clean PTE even when the old one was dirty…


  reply	other threads:[~2022-08-03  7:43 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-03  1:21 [PATCH 0/2] mm: Remember young bit for migration entries Peter Xu
2022-08-03  1:21 ` [PATCH 1/2] mm/swap: Add swp_offset_pfn() to fetch PFN from swap entry Peter Xu
2022-08-03  1:21 ` [PATCH 2/2] mm: Remember young bit for page migrations Peter Xu
2022-08-03  7:42   ` Nadav Amit [this message]
2022-08-03 16:45     ` Peter Xu
2022-08-04  6:42       ` Nadav Amit
2022-08-04 17:07         ` Peter Xu
2022-08-04 17:16           ` Nadav Amit
2022-08-03 12:21   ` kernel test robot
2022-08-03 21:47     ` Peter Xu
2022-08-03 21:47       ` Peter Xu
2022-08-03 13:53   ` kernel test robot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=EEBB9DDB-3D67-42BD-BD16-4264F40760CD@gmail.com \
    --to=nadav.amit@gmail.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=andi.kleen@intel.com \
    --cc=david@redhat.com \
    --cc=hughd@google.com \
    --cc=kirill@shutemov.name \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=peterx@redhat.com \
    --cc=vbabka@suse.cz \
    --cc=ying.huang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.