From: Christian Borntraeger <borntraeger@de.ibm.com>
To: Will Deacon <will.deacon@arm.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>,
	Gerald Schaefer <gerald.schaefer@de.ibm.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Paul Mackerras <paulus@samba.org>,
	linuxppc-dev@lists.ozlabs.org,
	Catalin Marinas <catalin.marinas@arm.com>,
	linux-arm-kernel@lists.infradead.org,
	Martin Schwidefsky <schwidefsky@de.ibm.com>,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	linux-s390@vger.kernel.org,
	Sebastian Ott <sebott@linux.vnet.ibm.com>
Subject: Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)
Date: Wed, 24 Feb 2016 11:51:47 +0100
Message-ID: <56CD8B43.9070509@de.ibm.com>
In-Reply-To: <20160224104139.GC28310@arm.com>

On 02/24/2016 11:41 AM, Will Deacon wrote:
> On Wed, Feb 24, 2016 at 11:16:34AM +0100, Christian Borntraeger wrote:
>> On 02/23/2016 09:22 PM, Will Deacon wrote:
>>> On Tue, Feb 23, 2016 at 10:33:45PM +0300, Kirill A. Shutemov wrote:
>>>> On Tue, Feb 23, 2016 at 07:19:07PM +0100, Gerald Schaefer wrote:
>>>>> I'll check with Martin, maybe it is actually trivial, then we can
>>>>> do a quick test to rule that one out.
>>>>
>>>> Oh. I found a bug in __split_huge_pmd_locked(). Although, not sure if it's
>>>> _the_ bug.
>>>>
>>>> pmdp_invalidate() is called for the wrong address :-/
>>>> I guess that can be destructive on the architecture, right?
>>>
>>> FWIW, arm64 ignores the address parameter for set_pmd_at, so this would
>>> only result in the TLBI nuking the wrong entries, which is going to be
>>> tricky to observe in practice given that we install a table entry
>>> immediately afterwards that maps the same pages. If s390 does more here
>>> (I see some magic asm using the address), that could be the answer...
>>
>> This patch does not change the address for set_pmd_at, it does that for the 
>> pmdp_invalidate here (by keeping haddr at the start of the pmd)
>>
>> --->    pmdp_invalidate(vma, haddr, pmd);
>>         pmd_populate(mm, pmd, pgtable);
> 
> On arm64, pmdp_invalidate looks like:
> 
> void pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
> 		     pmd_t *pmdp)
> {
> 	pmd_t entry = *pmdp;
> 	set_pmd_at(vma->vm_mm, address, pmdp, pmd_mknotpresent(entry));
> 	flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
> }
> 
> so that's the set_pmd_at call I was referring to.
> 
> On s390, that address ends up in __pmdp_idte[_local], but I don't know
> what .insn rrf,0xb98e0000,%2,%3,0,{0,1} does ;)

It invalidates the pmd entry and clears the TLB entries for that entry.

> 
>> Without that fix we would clearly have stale tlb entries, no?
> 
> Yes, but AFAIU the sequence on arm64 is:
> 
> 1.  trans huge mapping (block mapping in arm64 speak)
> 2.  faulting entry (pmd_mknotpresent)
> 3.  tlb invalidation
> 4.  table entry mapping the same pages as (1).
> 
> so if the microarchitecture we're on can tolerate a mixture of block
> mappings and page mappings mapping the same VA to the same PA, then the
> lack of TLB maintenance would go unnoticed. There are certainly systems
> where that could cause an issue, but I believe the one I've been testing
> on would be ok.

So in essence you are saying that it does not matter if flush_pmd_tlb_range
flushes the wrong range, as long as the right range is flushed later on when
the pages really go away. Yes, then it really might be ok for arm64.
