linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* pmd_modify() semantics
@ 2015-10-13 13:58 Vineet Gupta
  2015-10-13 16:06 ` Kirill A. Shutemov
  0 siblings, 1 reply; 3+ messages in thread
From: Vineet Gupta @ 2015-10-13 13:58 UTC (permalink / raw)
  To: Kirill A. Shutemov, Andrew Morton; +Cc: lkml, linux-mm, Minchan Kim

Hi Kirill,

I'm running LTP tests on the new ARC THP code and thp03 seems to be triggering mm
spew.

--------------->8---------------------
[ARCLinux]# ./ltp-thp03-extract
PID 60
bad pmd bf1c4600 be600231
../mm/pgtable-generic.c:34: bad pgd be600231.
bad pmd bf1c4604 bd800231
../mm/pgtable-generic.c:34: bad pgd bd800231.
BUG: Bad rss-counter state mm:bf12e900 idx:1 val:512
BUG: non-zero nr_ptes on freeing mm: 2
--------------->8---------------------

I know what exactly is happening and the likely fix, but would want to get some
thoughts from you if possible.

background: ARC is software page walked with PGD -> PTE -> page for normal and PMD
-> page for THP case. A vanilla PGD doesn't have any flags - only pointer to PTE

A reduced version of thp03 allocates a THP, dirties it, followed by
mprotect(PROT_NONE).
At the time of mprotect() -> change_huge_pmd() -> pmd_modify() needs to change
some of the bits.

The issue is ARC implementation of pmd_modify() based on pte variant, which
retains the soft pte bits (dirty and accessed).

static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
{
    return pte_pmd(pte_modify(pmd_pte(pmd), newprot));
}

Obvious fix is to rewrite pmd_modify() so that it clears out all pte type flags
but that assumes PMD is becoming PGD (a vanilla PGD on ARC doesn't have any
flags). Can we have pmd_modify() ever be called for NOT splitting pmd e.g.
mprotect Write to Read which won't split the THP like it does now and simply
changes the prot flags. My proposed version of pmd_modify() will loose the dirty bit.

In short, what are the semantics of pmd_modify() - essentially does it imply pmd
is being split so are free to make it like PGD.

TIA,
-Vineet

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: pmd_modify() semantics
  2015-10-13 13:58 pmd_modify() semantics Vineet Gupta
@ 2015-10-13 16:06 ` Kirill A. Shutemov
  2015-10-14  6:33   ` Vineet Gupta
  0 siblings, 1 reply; 3+ messages in thread
From: Kirill A. Shutemov @ 2015-10-13 16:06 UTC (permalink / raw)
  To: Vineet Gupta
  Cc: Kirill A. Shutemov, Andrew Morton, lkml, linux-mm, Minchan Kim

On Tue, Oct 13, 2015 at 01:58:39PM +0000, Vineet Gupta wrote:
> Hi Kirill,
> 
> I'm running LTP tests on the new ARC THP code and thp03 seems to be triggering mm
> spew.
> 
> --------------->8---------------------
> [ARCLinux]# ./ltp-thp03-extract
> PID 60
> bad pmd bf1c4600 be600231
> ../mm/pgtable-generic.c:34: bad pgd be600231.
> bad pmd bf1c4604 bd800231
> ../mm/pgtable-generic.c:34: bad pgd bd800231.
> BUG: Bad rss-counter state mm:bf12e900 idx:1 val:512
> BUG: non-zero nr_ptes on freeing mm: 2
> --------------->8---------------------
> 
> I know what exactly is happening and the likely fix, but would want to get some
> thoughts from you if possible.
> 
> background: ARC is software page walked with PGD -> PTE -> page for normal and PMD
> -> page for THP case. A vanilla PGD doesn't have any flags - only pointer to PTE
> 
> A reduced version of thp03 allocates a THP, dirties it, followed by
> mprotect(PROT_NONE).
> At the time of mprotect() -> change_huge_pmd() -> pmd_modify() needs to change
> some of the bits.
> 
> The issue is ARC implementation of pmd_modify() based on pte variant, which
> retains the soft pte bits (dirty and accessed).
> 
> static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
> {
>     return pte_pmd(pte_modify(pmd_pte(pmd), newprot));
> }
> 
> Obvious fix is to rewrite pmd_modify() so that it clears out all pte type flags
> but that assumes PMD is becoming PGD (a vanilla PGD on ARC doesn't have any
> flags). Can we have pmd_modify() ever be called for NOT splitting pmd e.g.
> mprotect Write to Read which won't split the THP like it does now and simply
> changes the prot flags. My proposed version of pmd_modify() will loose the dirty bit.

Hm? pmd_modify() is nothing to do with splitting. The mprotect() codepath
you've mentioned above calls pmd_modify() only if the THP is fully in
mprotect range.

> In short, what are the semantics of pmd_modify() - essentially does it imply pmd
> is being split so are free to make it like PGD.

No, pmd_modify() cannot make such assumption. That's just not true -- we
don't split PMD in such codepath. And even if we do, we construct new PMD
entry from scratch instead of modifying existing one.

So the semantics of pmd_modify(): you can assume that the entry is
pmd_large(), going to stay this way and you need to touch only
protection-related bit.

-- 
 Kirill A. Shutemov

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: pmd_modify() semantics
  2015-10-13 16:06 ` Kirill A. Shutemov
@ 2015-10-14  6:33   ` Vineet Gupta
  0 siblings, 0 replies; 3+ messages in thread
From: Vineet Gupta @ 2015-10-14  6:33 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Kirill A. Shutemov, Andrew Morton, lkml, linux-mm, Minchan Kim

On Tuesday 13 October 2015 09:37 PM, Kirill A. Shutemov wrote:
> On Tue, Oct 13, 2015 at 01:58:39PM +0000, Vineet Gupta wrote:
>> Hi Kirill,
>>
>> I'm running LTP tests on the new ARC THP code and thp03 seems to be triggering mm
>> spew.
>>
>> --------------->8---------------------
>> [ARCLinux]# ./ltp-thp03-extract
>> PID 60
>> bad pmd bf1c4600 be600231
>> ../mm/pgtable-generic.c:34: bad pgd be600231.
>> bad pmd bf1c4604 bd800231
>> ../mm/pgtable-generic.c:34: bad pgd bd800231.
>> BUG: Bad rss-counter state mm:bf12e900 idx:1 val:512
>> BUG: non-zero nr_ptes on freeing mm: 2
>> --------------->8---------------------
>>
>> I know what exactly is happening and the likely fix, but would want to get some
>> thoughts from you if possible.
>>
>> background: ARC is software page walked with PGD -> PTE -> page for normal and PMD
>> -> page for THP case. A vanilla PGD doesn't have any flags - only pointer to PTE
>>
>> A reduced version of thp03 allocates a THP, dirties it, followed by
>> mprotect(PROT_NONE).
>> At the time of mprotect() -> change_huge_pmd() -> pmd_modify() needs to change
>> some of the bits.
>>
>> The issue is ARC implementation of pmd_modify() based on pte variant, which
>> retains the soft pte bits (dirty and accessed).
>>
>> static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
>> {
>>     return pte_pmd(pte_modify(pmd_pte(pmd), newprot));
>> }
>>
>> Obvious fix is to rewrite pmd_modify() so that it clears out all pte type flags
>> but that assumes PMD is becoming PGD (a vanilla PGD on ARC doesn't have any
>> flags). Can we have pmd_modify() ever be called for NOT splitting pmd e.g.
>> mprotect Write to Read which won't split the THP like it does now and simply
>> changes the prot flags. My proposed version of pmd_modify() will loose the dirty bit.
> Hm? pmd_modify() is nothing to do with splitting. The mprotect() codepath
> you've mentioned above calls pmd_modify() only if the THP is fully in
> mprotect range.

Indeed my mental picture of this was messed up - specially because behind the
back, pmd_modify() for ARC (based on pte_modify()) was buggered to clear the huge
page bit itself :-) So we had a THP PMD which would start failing for
pmd_trans_huge() and thus treated like a normal PGD. But it had the leftover PMD
soft bits, which triggered the MM spew.

The localized fix is below, while better fix is to make pte_modify() only clear
R-W-X bits (currently it clears everything except soft accessed/dirty bits)

 static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
 {
-       return pte_pmd(pte_modify(pmd_pte(pmd), newprot));
+        /*
+         * open-coded pte_modify() additionally retaining HW_SZ bit
+         * otherwise, pmd_trans_huge() checks start failing
+         */
+        return __pmd((pmd_val(pmd) & (_PAGE_CHG_MASK | _PAGE_HW_SZ)) |
pgprot_val(newprot));
 }


>
>> In short, what are the semantics of pmd_modify() - essentially does it imply pmd
>> is being split so are free to make it like PGD.
> No, pmd_modify() cannot make such assumption. That's just not true -- we
> don't split PMD in such codepath. And even if we do, we construct new PMD
> entry from scratch instead of modifying existing one.
>
> So the semantics of pmd_modify(): you can assume that the entry is
> pmd_large(), going to stay this way and you need to touch only
> protection-related bit.

Thx !

-Vineet

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2015-10-14  6:34 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-10-13 13:58 pmd_modify() semantics Vineet Gupta
2015-10-13 16:06 ` Kirill A. Shutemov
2015-10-14  6:33   ` Vineet Gupta

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).