From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
To: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>,
Andrea Arcangeli <aarcange@redhat.com>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"kirill.shutemov@linux.intel.com"
<kirill.shutemov@linux.intel.com>,
"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
"mhocko@suse.com" <mhocko@suse.com>,
"will.deacon@arm.com" <will.deacon@arm.com>
Subject: Re: [PATCH] mm/thp: Correctly differentiate between mapped THP and PMD migration entry
Date: Thu, 18 Oct 2018 02:17:42 +0000 [thread overview]
Message-ID: <20181018021741.GA3603@hori1.linux.bs1.fc.nec.co.jp> (raw)
In-Reply-To: <5A0A88EF-4B86-4173-A506-DE19BDB786B8@cs.rutgers.edu>
On Tue, Oct 16, 2018 at 10:31:50AM -0400, Zi Yan wrote:
> On 15 Oct 2018, at 0:06, Anshuman Khandual wrote:
>
> > On 10/15/2018 06:23 AM, Zi Yan wrote:
> >> On 12 Oct 2018, at 4:00, Anshuman Khandual wrote:
> >>
> >>> On 10/10/2018 06:13 PM, Zi Yan wrote:
> >>>> On 10 Oct 2018, at 0:05, Anshuman Khandual wrote:
> >>>>
> >>>>> On 10/09/2018 07:28 PM, Zi Yan wrote:
> >>>>>> cc: Naoya Horiguchi (who proposed to use !_PAGE_PRESENT && !_PAGE_PSE for x86
> >>>>>> PMD migration entry check)
> >>>>>>
> >>>>>> On 8 Oct 2018, at 23:58, Anshuman Khandual wrote:
> >>>>>>
> >>>>>>> A normal mapped THP page at PMD level should be correctly differentiated
> >>>>>>> from a PMD migration entry while walking the page table. A mapped THP would
> >>>>>>> additionally check positive for pmd_present() along with pmd_trans_huge()
> >>>>>>> as compared to a PMD migration entry. This just adds a new conditional test
> >>>>>>> differentiating the two while walking the page table.
> >>>>>>>
> >>>>>>> Fixes: 616b8371539a6 ("mm: thp: enable thp migration in generic path")
> >>>>>>> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> >>>>>>> ---
> >>>>>>> On X86, pmd_trans_huge() and is_pmd_migration_entry() are always mutually
> >>>>>>> exclusive which makes the current conditional block work for both mapped
> >>>>>>> and migration entries. This is not same with arm64 where pmd_trans_huge()
> >>>>>>
> >>>>>> !pmd_present() && pmd_trans_huge() is used to represent THPs under splitting,
> >>>>>
> >>>>> Not really if we just look at code in the conditional blocks.
> >>>>
> >>>> Yeah, I explained it wrong above. Sorry about that.
> >>>>
> >>>> In x86, pmd_present() checks (_PAGE_PRESENT | _PAGE_PROTNONE | _PAGE_PSE),
> >>>> thus, it returns true even if the present bit is cleared but PSE bit is set.
> >>>
> >>> Okay.
> >>>
> >>>> This is done so, because THPs under splitting are regarded as present in the kernel
> >>>> but not present when a hardware page table walker checks it.
> >>>
> >>> Okay.
> >>>
> >>>>
> >>>> For PMD migration entry, which should be regarded as not present, if PSE bit
> >>>> is set, which makes pmd_trans_huge() returns true, like ARM64 does, all
> >>>> PMD migration entries will be regarded as present
> >>>
> >>> Okay to make pmd_present() return false pmd_trans_huge() has to return false
> >>> as well. Is there anything which can be done to get around this problem on
> >>> X86 ? pmd_trans_huge() returning true for a migration entry sounds logical.
> >>> Otherwise we would revert the condition block order to accommodate both the
> >>> implementation for pmd_trans_huge() as suggested by Kirill before or just
> >>> consider this patch forward.
> >>>
> >>> Because I am not really sure yet about the idea of getting pmd_present()
> >>> check into pmd_trans_huge() on arm64 just to make it fit into this semantics
> >>> as suggested by Will. If a PMD is trans huge page or not should not depend on
> >>> whether it is present or not.
> >>
> >> In terms of THPs, we have three cases: a present THP, a THP under splitting,
> >> and a THP under migration. pmd_present() and pmd_trans_huge() both return true
> >> for a present THP and a THP under splitting, because they discover _PAGE_PSE bit
> >
> > Then how do we differentiate between a mapped THP and a splitting THP.
>
> AFAIK, in x86, there is no distinction between a mapped THP and a splitting THP
> using helper functions.
>
> A mapped THP has _PAGE_PRESENT bit and _PAGE_PSE bit set, whereas a splitting THP
> has only _PAGE_PSE bit set. But both pmd_present() and pmd_trans_huge() return
> true as long as _PAGE_PSE bit is set.
>
> >
> >> is set for both cases, whereas they both return false for a THP under migration.
> >> You want to change them to make pmd_trans_huge() returns true for a THP under migration
> >> instead of false to help ARM64’s support for THP migration.
> > I am just trying to understand the rationale behind this semantics and see where
> > it should be fixed.
> >
> > I think the fundamental problem here is that THP under split has been difficult
> > to be re-presented through the available helper functions and in turn PTE bits.
> >
> > The following checks
> >
> > 1) pmd_present()
> > 2) pmd_trans_huge()
> >
> > Represent three THP states
> >
> > 1) Mapped THP (pmd_present && pmd_trans_huge)
> > 2) Splitting THP (pmd_present && pmd_trans_huge)
> > 3) Migrating THP (!pmd_present && !pmd_trans_huge)
> >
> > The problem is if we make pmd_trans_huge() return true for all the three states
> > which sounds logical because they are all still trans huge PMD, then pmd_present()
> > can only represent two states not three as required.
>
> We are on the same page about representing three THP states in x86.
> I also agree with you that it is logical to use three distinct representations
> for these three states, i.e. splitting THP could be changed to (!pmd_present && pmd_trans_huge).
I think that the behavior of pmd_trans_huge() for non-present pmd is
undefined by its nature. IOW, it's no use determining whether it's thp or
not for non-existing pages because it does not exist :)
So I think that the right direction is to make sure that pmd_trans_huge() is
never checked for non-present pmd, just like Kirill's suggestion. And maybe
we have some room for engineering to ensure it (rather than just commenting it).
Thanks,
Naoya Horiguchi
next prev parent reply other threads:[~2018-10-18 2:18 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-10-09 3:58 [PATCH] mm/thp: Correctly differentiate between mapped THP and PMD migration entry Anshuman Khandual
2018-10-09 13:04 ` Kirill A. Shutemov
2018-10-09 13:18 ` Will Deacon
2018-10-12 8:02 ` Anshuman Khandual
2018-10-15 8:32 ` Aneesh Kumar K.V
2018-10-16 13:16 ` Anshuman Khandual
2018-10-09 13:42 ` Anshuman Khandual
2018-10-09 13:58 ` Zi Yan
2018-10-10 4:05 ` Anshuman Khandual
2018-10-10 12:43 ` Zi Yan
2018-10-12 8:00 ` Anshuman Khandual
2018-10-15 0:53 ` Zi Yan
2018-10-15 4:06 ` Anshuman Khandual
2018-10-16 14:31 ` Zi Yan
2018-10-18 2:17 ` Naoya Horiguchi [this message]
2018-11-02 5:22 ` Anshuman Khandual
2018-10-25 8:10 ` Anshuman Khandual
2018-10-25 18:45 ` Zi Yan
2018-10-26 1:39 ` Anshuman Khandual
2018-10-17 2:09 ` Andrea Arcangeli
2018-10-22 14:00 ` Zi Yan
2018-11-02 6:15 ` Anshuman Khandual
2018-11-06 0:35 ` Will Deacon
2018-11-06 9:51 ` Anshuman Khandual
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20181018021741.GA3603@hori1.linux.bs1.fc.nec.co.jp \
--to=n-horiguchi@ah.jp.nec.com \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=anshuman.khandual@arm.com \
--cc=kirill.shutemov@linux.intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@suse.com \
--cc=will.deacon@arm.com \
--cc=zi.yan@cs.rutgers.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).