From: Yang Shi <shy828301@gmail.com>
To: Michal Hocko <mhocko@suse.com>
Cc: Zi Yan <ziy@nvidia.com>,
	nao.horiguchi@gmail.com,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Hugh Dickins <hughd@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linux MM <linux-mm@kvack.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] mm: mempolicy: don't have to split pmd for huge zero page
Date: Mon, 7 Jun 2021 10:00:01 -0700
Message-ID: <CAHbLzkowcskM=p==-q48Ca12D=h9SgqUuUB4NknRNR=64TyXCw@mail.gmail.com>
In-Reply-To: <YL265A86DQe5Rgon@dhcp22.suse.cz>

On Sun, Jun 6, 2021 at 11:21 PM Michal Hocko <mhocko@suse.com> wrote:
>
> On Fri 04-06-21 13:35:13, Yang Shi wrote:
> > When trying to migrate pages to obey mempolicy, the huge zero page is
> > split then the page table walk at PTE level just skips zero page.  So it
> > seems pointless to split huge zero page, it could be just skipped like
> > base zero page.
>
> My THP knowledge is not the best but this is incorrect AFAICS. Huge zero
> page is not split. We do split the pmd which is mapping the said page. I
> suspect you refer to vm_normal_page when talking about a zero page but
> please be aware that huge zero page is not a normal zero page. It is
> allocated dynamically (see get_huge_zero_page).

For a normal huge page, yes, split_huge_pmd() just splits the pmd. But
when a huge zero pmd is split, the base zero pfn is inserted into the
PTEs. Please check out __split_huge_zero_page_pmd().
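
To illustrate the point, here is a toy user-space model of that split.
It is not kernel code: every name in it (model_pmd, MODEL_ZERO_PFN,
model_split_huge_zero_pmd() and so on) is made up for this sketch, and
the 512 entries per pmd assume 4K base pages with 2M huge pages. It only
shows the effect described above: after the split, each PTE refers to
the base zero pfn rather than to a subpage of the huge zero page.

```c
/* Toy user-space model, NOT kernel code. All names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PTRS_PER_PMD  512         /* assumed: 2M huge page / 4K base page */
#define MODEL_ZERO_PFN      0x1234UL    /* stand-in for the base zero page pfn */
#define MODEL_HUGE_ZERO_PFN 0x200000UL  /* stand-in for the huge zero page pfn */

struct model_pmd {
	bool is_huge;                          /* pmd maps a huge page directly */
	unsigned long huge_pfn;                /* meaningful while is_huge is true */
	unsigned long pte[MODEL_PTRS_PER_PMD]; /* meaningful after the split */
};

/*
 * Model of splitting a huge zero pmd: every PTE is filled with the
 * base zero pfn, and the pmd no longer maps a huge page.
 */
static void model_split_huge_zero_pmd(struct model_pmd *pmd)
{
	assert(pmd->is_huge && pmd->huge_pfn == MODEL_HUGE_ZERO_PFN);

	for (size_t i = 0; i < MODEL_PTRS_PER_PMD; i++)
		pmd->pte[i] = MODEL_ZERO_PFN;
	pmd->is_huge = false;
}

/* Count how many PTEs under the pmd refer to the base zero pfn. */
static size_t model_count_zero_ptes(const struct model_pmd *pmd)
{
	size_t n = 0;

	for (size_t i = 0; i < MODEL_PTRS_PER_PMD; i++)
		if (pmd->pte[i] == MODEL_ZERO_PFN)
			n++;
	return n;
}
```

Calling model_split_huge_zero_pmd() on a pmd initialized with
is_huge = true and huge_pfn = MODEL_HUGE_ZERO_PFN leaves all 512 model
PTEs pointing at MODEL_ZERO_PFN.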

I should make this point clearer in the commit log. Sorry for the confusion.

>
> So in the end your patch disables mbind of zero pages to a target node
> and that is a regression.

Do we really migrate the zero page? IIUC the zero page is just skipped by
the vm_normal_page() check in queue_pages_pte_range(), isn't it?

>
> Have you tested the patch?

No, just a build test. I thought this change was straightforward.

>
> > Set ACTION_CONTINUE to prevent walk_page_range() from splitting the
> > pmd for this case.
>
> Btw. this changelog is missing a problem statement. I suspect there is
> no actual problem that it should fix and it is likely driven by reading
> the code. Right?

The actual problem is that splitting a huge zero pmd is pointless. Yes,
the patch is driven by code inspection.

The behavior before the patch for a huge zero page is:
  1. split the huge zero pmd (insert the base zero pfn into the PTEs)
  2. walk the PTEs
  3. skip the zero pfn

So why not just skip the huge zero page in the first place?
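
A toy user-space model of the two flows, as I understand them, is
below. It is not kernel code and has not been run against a kernel: the
names (model_walk_before(), model_walk_after(), struct model_result) are
hypothetical, and the zero pfn check stands in for the vm_normal_page()
check mentioned earlier. Under those assumptions both flows queue
nothing for a huge zero pmd, and the second one avoids the split and the
PTE walk.

```c
/* Toy user-space model, NOT kernel code. All names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PTRS_PER_PMD 512       /* assumed: 2M huge page / 4K base page */
#define MODEL_ZERO_PFN     0x1234UL  /* stand-in for the base zero page pfn */

struct model_result {
	bool   split;         /* was the pmd split? */
	size_t ptes_visited;  /* PTEs examined by the walk */
	size_t queued;        /* pages queued for migration */
};

/* Flow before the patch: split the huge zero pmd, then walk the PTEs. */
static struct model_result model_walk_before(void)
{
	unsigned long pte[MODEL_PTRS_PER_PMD];
	struct model_result res = { .split = true };

	/* Step 1: the split fills every PTE with the base zero pfn. */
	for (size_t i = 0; i < MODEL_PTRS_PER_PMD; i++)
		pte[i] = MODEL_ZERO_PFN;

	/* Steps 2 and 3: walk the PTEs and skip each zero pfn. */
	for (size_t i = 0; i < MODEL_PTRS_PER_PMD; i++) {
		res.ptes_visited++;
		if (pte[i] == MODEL_ZERO_PFN)
			continue;   /* models the zero page being skipped */
		res.queued++;
	}
	return res;
}

/* Flow with the patch: skip the huge zero pmd without splitting it. */
static struct model_result model_walk_after(void)
{
	struct model_result res = { .split = false };

	return res;   /* nothing visited, nothing queued */
}
```

In this model, model_walk_before() and model_walk_after() both end with
queued == 0; they differ only in split and ptes_visited.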

>
> > Signed-off-by: Yang Shi <shy828301@gmail.com>
> > ---
> >  mm/mempolicy.c | 9 +++++----
> >  1 file changed, 5 insertions(+), 4 deletions(-)
> >
> > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > index b5f4f584009b..205c1a768775 100644
> > --- a/mm/mempolicy.c
> > +++ b/mm/mempolicy.c
> > @@ -436,7 +436,8 @@ static inline bool queue_pages_required(struct page *page,
> >
> >  /*
> >   * queue_pages_pmd() has four possible return values:
> > - * 0 - pages are placed on the right node or queued successfully.
> > + * 0 - pages are placed on the right node or queued successfully, or
> > + *     special page is met, i.e. huge zero page.
> >   * 1 - there is unmovable page, and MPOL_MF_MOVE* & MPOL_MF_STRICT were
> >   *     specified.
> >   * 2 - THP was split.
> > @@ -460,8 +461,7 @@ static int queue_pages_pmd(pmd_t *pmd, spinlock_t *ptl, unsigned long addr,
> >       page = pmd_page(*pmd);
> >       if (is_huge_zero_page(page)) {
> >               spin_unlock(ptl);
> > -             __split_huge_pmd(walk->vma, pmd, addr, false, NULL);
> > -             ret = 2;
> > +             walk->action = ACTION_CONTINUE;
> >               goto out;
> >       }
> >       if (!queue_pages_required(page, qp))
> > @@ -488,7 +488,8 @@ static int queue_pages_pmd(pmd_t *pmd, spinlock_t *ptl, unsigned long addr,
> >   * and move them to the pagelist if they do.
> >   *
> >   * queue_pages_pte_range() has three possible return values:
> > - * 0 - pages are placed on the right node or queued successfully.
> > + * 0 - pages are placed on the right node or queued successfully, or
> > + *     special page is met, i.e. zero page.
> >   * 1 - there is unmovable page, and MPOL_MF_MOVE* & MPOL_MF_STRICT were
> >   *     specified.
> >   * -EIO - only MPOL_MF_STRICT was specified and an existing page was already
> > --
> > 2.26.2
>
> --
> Michal Hocko
> SUSE Labs
