From: Martin Schwidefsky <schwidefsky@de.ibm.com>
To: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>,
	Christian Borntraeger <borntraeger@de.ibm.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Paul Mackerras <paulus@samba.org>,
	linuxppc-dev@lists.ozlabs.org,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will.deacon@arm.com>,
	linux-arm-kernel@lists.infradead.org,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	linux-s390@vger.kernel.org,
	Sebastian Ott <sebott@linux.vnet.ibm.com>
Subject: Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)
Date: Wed, 24 Feb 2016 09:22:08 +0100
Message-ID: <20160224092208.49e013ff@mschwide>
In-Reply-To: <20160223191907.25719a4d@thinkpad>

On Tue, 23 Feb 2016 19:19:07 +0100
Gerald Schaefer <gerald.schaefer@de.ibm.com> wrote:

> On Tue, 23 Feb 2016 13:32:21 +0300
> "Kirill A. Shutemov" <kirill@shutemov.name> wrote:
> 
> > On Fri, Feb 12, 2016 at 06:16:40PM +0100, Gerald Schaefer wrote:
> > > On Fri, 12 Feb 2016 16:57:27 +0100
> > > Christian Borntraeger <borntraeger@de.ibm.com> wrote:
> > > 
> > > > > I'm also confused that pmd_none() is equal to !pmd_present() on s390. Hm?
> > > > 
> > > > Don't know, Gerald or Martin?
> > > 
> > > The implementation frequently changes depending on how many new bits Martin
> > > needs to squeeze out :-)
> > > We don't have a _PAGE_PRESENT bit for pmds, so pmd_present() just checks if the
> > > entry is not empty. pmd_none() of course does the opposite, it checks if it is
> > > empty.
> > 
> > I still worry about pmd_present(). It looks wrong to me. I wonder if
> > the patch below makes a difference.
> > 
> > The theory is that the splitting bit effectively masked a bogus pmd_present():
> > we had pmd_trans_splitting() in all code paths and that prevented mm from
> > touching the pmd. Once pmd_trans_splitting() has gone, mm proceeds with the
> > pmd where it shouldn't and here's a boom.
> 
> Well, I don't think pmd_present() == true is bogus for a trans_huge pmd under
> splitting, after all there is a page behind the pmd. Also, if it was
> bogus, and it would need to be false, why should it be marked !pmd_present()
> only at the pmdp_invalidate() step before the pmd_populate()? It clearly
> is pmd_present() before that, on all architectures, and if there was any
> problem/race with that, setting it to !pmd_present() at this stage would
> only (marginally) reduce the race window.
> 
> BTW, PowerPC and Sparc seem to do the same thing in pmdp_invalidate(),
> i.e. they do not set pmd_present() == false, only mark it so that it would
> not generate a new TLB entry, just like on s390. After all, the function
> is called pmdp_invalidate(), and I think the comment in mm/huge_memory.c
> before that call is just a little ambiguous in its wording. When it says
> "mark the pmd notpresent" it probably means "mark it so that it will not
> generate a new TLB entry", which is also what the comment is really about:
> prevent huge and small entries in the TLB for the same page at the same
> time.

If I am not mistaken, this is true for x86 as well. The generic implementation
of pmdp_invalidate() sets a new pmd that has been modified with
pmd_mknotpresent(). On x86 this function removes the _PAGE_PRESENT and
_PAGE_PROTNONE bits from the entry. The _PAGE_PSE bit stays set, and that
makes pmd_present() return true.
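
To make the x86 behaviour concrete, here is a minimal userspace sketch of the
bit logic described above. It is not kernel code: the model_* names, the
MODEL_PAGE_* bit positions and the exact form of the present check are
assumptions made for illustration, and the TLB flush that a real invalidate
would need is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy pmd entry: only the flag bits matter for this illustration. */
typedef struct { uint64_t val; } model_pmd_t;

/* Hypothetical bit positions, loosely modelled on x86 (assumption). */
#define MODEL_PAGE_PRESENT  (UINT64_C(1) << 0)
#define MODEL_PAGE_PSE      (UINT64_C(1) << 7)
#define MODEL_PAGE_PROTNONE (UINT64_C(1) << 8)

/* Clears PRESENT and PROTNONE, as described for pmd_mknotpresent on x86. */
static inline model_pmd_t model_pmd_mknotpresent(model_pmd_t pmd)
{
	pmd.val &= ~(MODEL_PAGE_PRESENT | MODEL_PAGE_PROTNONE);
	return pmd;
}

/* Assumed present check: any of PRESENT, PROTNONE or PSE counts. */
static inline bool model_pmd_present(model_pmd_t pmd)
{
	return (pmd.val & (MODEL_PAGE_PRESENT | MODEL_PAGE_PROTNONE |
			   MODEL_PAGE_PSE)) != 0;
}

/* Model of the generic invalidate step: store the modified entry back. */
static inline void model_pmdp_invalidate(model_pmd_t *pmdp)
{
	*pmdp = model_pmd_mknotpresent(*pmdp);
	/* A real implementation would also flush the TLB here. */
}

/* A huge pmd stays "present" after invalidation because PSE survives. */
static inline void check_invalidated_huge_pmd(void)
{
	model_pmd_t pmd = { MODEL_PAGE_PRESENT | MODEL_PAGE_PSE };

	model_pmdp_invalidate(&pmd);
	assert((pmd.val & MODEL_PAGE_PRESENT) == 0);
	assert(model_pmd_present(pmd));
}
```

In this model only an entry without the PSE bit becomes !present after the
invalidate step, which is the point made above for huge pmds.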

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.

