* [PATCH] MIPS: Fix page table corruption on THP permission changes.
@ 2016-06-16 22:50 David Daney
  2016-06-17 12:00   ` Aaro Koskinen
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: David Daney @ 2016-06-16 22:50 UTC (permalink / raw)
  To: linux-mips, ralf; +Cc: David Daney, stable

From: David Daney <david.daney@cavium.com>

When the core THP code is modifying the permissions of a huge page it
calls pmd_modify(), which unfortunately was clearing the _PAGE_HUGE bit
of the page table entry.  The result can be kernel messages like:

mm/memory.c:397: bad pmd 000000040080004d.
mm/memory.c:397: bad pmd 00000003ff00004d.
mm/memory.c:397: bad pmd 000000040100004d.

or:

------------[ cut here ]------------
WARNING: at mm/mmap.c:3200 exit_mmap+0x150/0x158()
Modules linked in: ipv6 at24 octeon3_ethernet octeon_srio_nexus m25p80
CPU: 12 PID: 1295 Comm: pmderr Not tainted 3.10.87-rt80-Cavium-Octeon #4
Stack : 0000000040808000 0000000014009ce1 0000000000400004 ffffffff81076ba0
          0000000000000000 0000000000000000 ffffffff85110000 0000000000000119
          0000000000000004 0000000000000000 0000000000000119 43617669756d2d4f
          0000000000000000 ffffffff850fda40 ffffffff85110000 0000000000000000
          0000000000000000 0000000000000009 ffffffff809207a0 0000000000000c80
          ffffffff80f1bf20 0000000000000001 000000ffeca36828 0000000000000001
          0000000000000000 0000000000000001 000000ffeca7e700 ffffffff80886924
          80000003fd7a0000 80000003fd7a39b0 80000003fdea8000 ffffffff80885780
          80000003fdea8000 ffffffff80f12218 000000000000000c 000000000000050f
          0000000000000000 ffffffff80865c4c 0000000000000000 0000000000000000
          ...
Call Trace:
[<ffffffff80865c4c>] show_stack+0x6c/0xf8
[<ffffffff80885780>] warn_slowpath_common+0x78/0xa8
[<ffffffff809207a0>] exit_mmap+0x150/0x158
[<ffffffff80882d44>] mmput+0x5c/0x110
[<ffffffff8088b450>] do_exit+0x230/0xa68
[<ffffffff8088be34>] do_group_exit+0x54/0x1d0
[<ffffffff8088bfc0>] __wake_up_parent+0x0/0x18

---[ end trace c7b38293191c57dc ]---
BUG: Bad rss-counter state mm:80000003fa168000 idx:1 val:1536

Fix by not clearing _PAGE_HUGE bit.

Signed-off-by: David Daney <david.daney@cavium.com>
Cc: stable@vger.kernel.org
---
 arch/mips/include/asm/pgtable.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/mips/include/asm/pgtable.h b/arch/mips/include/asm/pgtable.h
index a6b611f..477b1b1 100644
--- a/arch/mips/include/asm/pgtable.h
+++ b/arch/mips/include/asm/pgtable.h
@@ -632,7 +632,7 @@ static inline struct page *pmd_page(pmd_t pmd)
 
 static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
 {
-	pmd_val(pmd) = (pmd_val(pmd) & _PAGE_CHG_MASK) | pgprot_val(newprot);
+	pmd_val(pmd) = (pmd_val(pmd) & (_PAGE_CHG_MASK | _PAGE_HUGE)) | pgprot_val(newprot);
 	return pmd;
 }
 
-- 
1.7.11.7
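
The masking change in the hunk above can be tried outside the kernel.
What follows is only a minimal sketch: the DEMO_* bit positions and the
demo_* function names are hypothetical, and DEMO_CHG_MASK keeps just the
PFN, whereas the real _PAGE_CHG_MASK also keeps some other bits and the
actual bit layout depends on the kernel configuration.

```c
#include <stdint.h>

/* Hypothetical bit layout, for illustration only. */
#define DEMO_PAGE_PRESENT (UINT64_C(1) << 0)
#define DEMO_PAGE_WRITE   (UINT64_C(1) << 1)
#define DEMO_PAGE_HUGE    (UINT64_C(1) << 2)
#define DEMO_PFN_MASK     (~UINT64_C(0) << 12)

/* Bits that survive a permission change; simplified to the PFN only. */
#define DEMO_CHG_MASK     DEMO_PFN_MASK

/* Old behaviour: every bit outside the change mask is dropped, HUGE included. */
static uint64_t demo_pmd_modify_old(uint64_t pmd, uint64_t newprot)
{
	return (pmd & DEMO_CHG_MASK) | newprot;
}

/* Patched behaviour: the HUGE bit is carried over next to the change mask. */
static uint64_t demo_pmd_modify_new(uint64_t pmd, uint64_t newprot)
{
	return (pmd & (DEMO_CHG_MASK | DEMO_PAGE_HUGE)) | newprot;
}
```

Passing a huge, writable entry and a read-only newprot to both functions
shows the difference: the old variant returns an entry that no longer
looks like a huge page, the new variant keeps the HUGE bit while still
dropping the write permission.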



* Re: [PATCH] MIPS: Fix page table corruption on THP permission changes.
@ 2016-06-17 12:00   ` Aaro Koskinen
  0 siblings, 0 replies; 9+ messages in thread
From: Aaro Koskinen @ 2016-06-17 12:00 UTC (permalink / raw)
  To: David Daney; +Cc: linux-mips, ralf, David Daney, stable

Hi,

On Thu, Jun 16, 2016 at 03:50:31PM -0700, David Daney wrote:
> From: David Daney <david.daney@cavium.com>
> 
> When the core THP code is modifying the permissions of a huge page it
> calls pmd_modify(), which unfortunately was clearing the _PAGE_HUGE bit
> of the page table entry.  The result can be kernel messages like:
> 
> mm/memory.c:397: bad pmd 000000040080004d.

[...]

> BUG: Bad rss-counter state mm:80000003fa168000 idx:1 val:1536
> 
> Fix by not clearing _PAGE_HUGE bit.
> 
> Signed-off-by: David Daney <david.daney@cavium.com>
> Cc: stable@vger.kernel.org
> ---
>  arch/mips/include/asm/pgtable.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/mips/include/asm/pgtable.h b/arch/mips/include/asm/pgtable.h
> index a6b611f..477b1b1 100644
> --- a/arch/mips/include/asm/pgtable.h
> +++ b/arch/mips/include/asm/pgtable.h
> @@ -632,7 +632,7 @@ static inline struct page *pmd_page(pmd_t pmd)
>  
>  static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
>  {
> -	pmd_val(pmd) = (pmd_val(pmd) & _PAGE_CHG_MASK) | pgprot_val(newprot);
> +	pmd_val(pmd) = (pmd_val(pmd) & (_PAGE_CHG_MASK | _PAGE_HUGE)) | pgprot_val(newprot);
>  	return pmd;
>  }

The fix looks correct, but unfortunately at least EBH5600 still keeps
crashing with THP enabled. :-(

[  606.429974] Got mcheck at 000000ffebed8c2c
[  606.442262] CPU: 6 PID: 6767 Comm: ld Not tainted 4.7.0-rc3-octeon-distro.git-v2.17-27-g5cc128c-12208-g7d9ecdf #1
[  606.473026] task: 800000041f384880 ti: 80000000ed7b0000 task.ti: 80000000ed7b0000
[  606.495454] $ 0   : 0000000000000000 3e000000038ac006 000000ffebba7028 000000ffebb9f020
[  606.519588] $ 4   : 0000000001529d94 00000001204f4236 0000000000000000 0000000000000000
[  606.543722] $ 8   : 0000000000000001 7efefefefefefeff ffa0a0998d9e9c8b 8101010101010100
[  606.567856] $12   : 4040404040404040 ffffffff84080018 0000000000000000 6162002e74657874
[  606.591991] $16   : 000000012032a7d0 00000001204f4229 00000001201483f0 0000000000000000
[  606.616125] $20   : 0000000000000000 000000000000000c 00000000053cd125 00000001204edb70
[  606.640259] $24   : 0000000000000034 000000ffebed8b50                                  
[  606.664393] $28   : 000000ffebfac000 000000ffff808160 00000001204b9ad0 000000ffebed9cc8
[  606.688528] Hi    : 0000000000001001
[  606.699237] Lo    : 00000000000014f4
[  606.709951] epc   : 000000ffebed8c2c 0xffebed8c2c
[  606.724048] ra    : 000000ffebed9cc8 0xffebed9cc8
[  606.738144] Status: 00308cf3	KX SX UX USER EXL IE 
[  606.752704] Cause : 00800060 (ExcCode 18)
[  606.764717] PrId  : 000d0409 (Cavium Octeon+)
[  606.777770] Index    : 80000000
[  606.787178] PageMask : 1fe000
[  606.796064] EntryHi  : 000000012032a095
[  606.807555] EntryLo0 : 00000000038a8006
[  606.819046] EntryLo1 : 00000000038ac006
[  606.830535] Wired    : 0
[  606.838120] PageGrain: e0000000
[  606.847525] 
[  606.851986] Index: 40 pgmask=4kb va=0ffebba6000 asid=95
	[ri=0 xi=1 pa=0041d2b2000 c=0 d=1 v=1 g=0] [ri=0 xi=1 pa=0041d2b3000 c=0 d=1 v=1 g=0]
[  606.890740] Index: 41 pgmask=4kb va=0ffebbb6000 asid=95
	[ri=0 xi=1 pa=0041d26e000 c=0 d=1 v=1 g=0] [ri=0 xi=1 pa=0041d26f000 c=0 d=1 v=1 g=0]
[  606.929492] Index: 42 pgmask=4kb va=00120148000 asid=95
	[ri=0 xi=0 pa=0041d6b7000 c=0 d=1 v=1 g=0] [ri=0 xi=0 pa=0041dcd1000 c=0 d=1 v=1 g=0]
[  606.968241] Index: 43 pgmask=4kb va=0012012c000 asid=95
	[ri=0 xi=1 pa=000e30e9000 c=0 d=1 v=1 g=0] [ri=0 xi=0 pa=0041e5f8000 c=0 d=1 v=1 g=0]
[  607.006990] Index: 44 pgmask=4kb va=001204ec000 asid=95
	[ri=0 xi=0 pa=000e317e000 c=0 d=1 v=1 g=0] [ri=0 xi=0 pa=000e32cf000 c=0 d=1 v=1 g=0]
[  607.045743] Index: 45 pgmask=4kb va=001204fe000 asid=95
	[ri=0 xi=0 pa=000e4206000 c=0 d=1 v=1 g=0] [ri=0 xi=0 pa=000e308f000 c=0 d=1 v=1 g=0]
[  607.084493] Index: 46 pgmask=4kb va=001204f4000 asid=95
	[ri=0 xi=0 pa=000e31d0000 c=0 d=1 v=1 g=0] [ri=0 xi=0 pa=000e2874000 c=0 d=1 v=1 g=0]
[  607.123243] Index: 47 pgmask=4kb va=0ffebd3c000 asid=95
	[ri=0 xi=0 pa=000ef2fc000 c=0 d=0 v=1 g=0] [ri=0 xi=0 pa=000ef01f000 c=0 d=0 v=1 g=0]
[  607.161992] Index: 48 pgmask=4kb va=0ffebf28000 asid=95
	[ri=0 xi=0 pa=000e3adf000 c=0 d=0 v=1 g=0] [ri=0 xi=0 pa=000e3ade000 c=0 d=0 v=1 g=0]
[  607.200741] Index: 49 pgmask=4kb va=0ffff808000 asid=95
	[ri=0 xi=0 pa=000e34a8000 c=0 d=1 v=1 g=0] [ri=0 xi=0 pa=000e43bb000 c=0 d=1 v=1 g=0]
[  607.239489] Index: 50 pgmask=4kb va=0ffebfa4000 asid=95
	[ri=0 xi=1 pa=000e35c6000 c=0 d=1 v=1 g=0] [ri=0 xi=1 pa=000e31eb000 c=0 d=1 v=1 g=0]
[  607.278238] Index: 51 pgmask=4kb va=0ffebed8000 asid=95
	[ri=0 xi=0 pa=000e3dce000 c=0 d=0 v=1 g=0] [ri=0 xi=0 pa=000e49ed000 c=0 d=0 v=1 g=0]
[  607.316985] Index: 52 pgmask=4kb va=00120274000 asid=95
	[ri=0 xi=0 pa=00000000000 c=0 d=0 v=0 g=0] [ri=0 xi=1 pa=00000000000 c=2 d=1 v=1 g=0]
[  607.355734] 
[  607.360192] 
Code: de100000  12000014  00000000 <de020010> 1456fffb  df9991d0  de040008  0320f809  0220282d 
[  607.389654] Kernel panic - not syncing: Caught Machine Check exception - caused by multiple matching entries in the TLB.
[  607.422806] ---[ end Kernel panic - not syncing: Caught Machine Check exception - caused by multiple matching entries in the TLB.

*** NMI Watchdog interrupt on Core 0x0 ***

A.


* Re: [PATCH] MIPS: Fix page table corruption on THP permission changes.
@ 2016-06-17 16:22     ` David Daney
  0 siblings, 0 replies; 9+ messages in thread
From: David Daney @ 2016-06-17 16:22 UTC (permalink / raw)
  To: Aaro Koskinen, ralf; +Cc: David Daney, linux-mips, David Daney, stable

On 06/17/2016 05:00 AM, Aaro Koskinen wrote:
> Hi,
>
> On Thu, Jun 16, 2016 at 03:50:31PM -0700, David Daney wrote:
>> From: David Daney <david.daney@cavium.com>
>>
>> When the core THP code is modifying the permissions of a huge page it
>> calls pmd_modify(), which unfortunately was clearing the _PAGE_HUGE bit
>> of the page table entry.  The result can be kernel messages like:
>>
>> mm/memory.c:397: bad pmd 000000040080004d.
>
> [...]
>
>> BUG: Bad rss-counter state mm:80000003fa168000 idx:1 val:1536
>>
>> Fix by not clearing _PAGE_HUGE bit.
>>
>> Signed-off-by: David Daney <david.daney@cavium.com>
>> Cc: stable@vger.kernel.org
>> ---
>>   arch/mips/include/asm/pgtable.h | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/mips/include/asm/pgtable.h b/arch/mips/include/asm/pgtable.h
>> index a6b611f..477b1b1 100644
>> --- a/arch/mips/include/asm/pgtable.h
>> +++ b/arch/mips/include/asm/pgtable.h
>> @@ -632,7 +632,7 @@ static inline struct page *pmd_page(pmd_t pmd)
>>
>>   static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
>>   {
>> -	pmd_val(pmd) = (pmd_val(pmd) & _PAGE_CHG_MASK) | pgprot_val(newprot);
>> +	pmd_val(pmd) = (pmd_val(pmd) & (_PAGE_CHG_MASK | _PAGE_HUGE)) | pgprot_val(newprot);
>>   	return pmd;
>>   }
>
> The fix looks correct, but unfortunately at least EBH5600 still keeps
> crashing with THP enabled. :-(

OK, I think this patch is still necessary as it fixes other types of 
failures.

Your testing shows that even with this applied there still remain problems.

We need to carefully audit all the code in 
arch/mips/include/asm/pgtable.h that deals with huge page PTEs, to make 
sure that the _PAGE_HUGE bit is being set when necessary.

If the entry in the PMD were to get its _PAGE_HUGE bit erroneously
cleared, the TLB exception handlers would load garbage into the TLB, which
could easily result in an MCheck.

David.
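
One way to picture the kind of audit described above is a check that a
PMD-transforming helper leaves the huge bit alone. This is a rough
user-space sketch, not kernel code: the AUDIT_* bit positions, the
audit_* names, and the two sample helpers are all hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, chosen only for this sketch. */
#define AUDIT_PAGE_WRITE (UINT64_C(1) << 1)
#define AUDIT_PAGE_HUGE  (UINT64_C(1) << 2)
#define AUDIT_PFN_MASK   (~UINT64_C(0) << 12)

typedef uint64_t (*audit_pmd_op)(uint64_t pmd);

/* A well-behaved helper: clears only the write bit. */
static uint64_t audit_pmd_wrprotect(uint64_t pmd)
{
	return pmd & ~AUDIT_PAGE_WRITE;
}

/* A misbehaving helper: keeps the PFN but drops every flag, HUGE included. */
static uint64_t audit_pmd_strip_flags(uint64_t pmd)
{
	return pmd & AUDIT_PFN_MASK;
}

/* Returns true if op leaves the huge bit set on an entry that had it set. */
static bool audit_preserves_huge(audit_pmd_op op, uint64_t pmd)
{
	if (!(pmd & AUDIT_PAGE_HUGE))
		return true;	/* nothing to preserve */
	return (op(pmd) & AUDIT_PAGE_HUGE) != 0;
}
```

With a huge entry as input, audit_preserves_huge() accepts
audit_pmd_wrprotect and rejects audit_pmd_strip_flags, which is the
pattern an audit of the real helpers would be looking for.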



>
> [  606.429974] Got mcheck at 000000ffebed8c2c
> [  606.442262] CPU: 6 PID: 6767 Comm: ld Not tainted 4.7.0-rc3-octeon-distro.git-v2.17-27-g5cc128c-12208-g7d9ecdf #1
> [  606.473026] task: 800000041f384880 ti: 80000000ed7b0000 task.ti: 80000000ed7b0000
> [  606.495454] $ 0   : 0000000000000000 3e000000038ac006 000000ffebba7028 000000ffebb9f020
> [  606.519588] $ 4   : 0000000001529d94 00000001204f4236 0000000000000000 0000000000000000
> [  606.543722] $ 8   : 0000000000000001 7efefefefefefeff ffa0a0998d9e9c8b 8101010101010100
> [  606.567856] $12   : 4040404040404040 ffffffff84080018 0000000000000000 6162002e74657874
> [  606.591991] $16   : 000000012032a7d0 00000001204f4229 00000001201483f0 0000000000000000
> [  606.616125] $20   : 0000000000000000 000000000000000c 00000000053cd125 00000001204edb70
> [  606.640259] $24   : 0000000000000034 000000ffebed8b50
> [  606.664393] $28   : 000000ffebfac000 000000ffff808160 00000001204b9ad0 000000ffebed9cc8
> [  606.688528] Hi    : 0000000000001001
> [  606.699237] Lo    : 00000000000014f4
> [  606.709951] epc   : 000000ffebed8c2c 0xffebed8c2c
> [  606.724048] ra    : 000000ffebed9cc8 0xffebed9cc8
> [  606.738144] Status: 00308cf3	KX SX UX USER EXL IE
> [  606.752704] Cause : 00800060 (ExcCode 18)
> [  606.764717] PrId  : 000d0409 (Cavium Octeon+)
> [  606.777770] Index    : 80000000
> [  606.787178] PageMask : 1fe000
> [  606.796064] EntryHi  : 000000012032a095
> [  606.807555] EntryLo0 : 00000000038a8006
> [  606.819046] EntryLo1 : 00000000038ac006
> [  606.830535] Wired    : 0
> [  606.838120] PageGrain: e0000000
> [  606.847525]
> [  606.851986] Index: 40 pgmask=4kb va=0ffebba6000 asid=95
> 	[ri=0 xi=1 pa=0041d2b2000 c=0 d=1 v=1 g=0] [ri=0 xi=1 pa=0041d2b3000 c=0 d=1 v=1 g=0]
> [  606.890740] Index: 41 pgmask=4kb va=0ffebbb6000 asid=95
> 	[ri=0 xi=1 pa=0041d26e000 c=0 d=1 v=1 g=0] [ri=0 xi=1 pa=0041d26f000 c=0 d=1 v=1 g=0]
> [  606.929492] Index: 42 pgmask=4kb va=00120148000 asid=95
> 	[ri=0 xi=0 pa=0041d6b7000 c=0 d=1 v=1 g=0] [ri=0 xi=0 pa=0041dcd1000 c=0 d=1 v=1 g=0]
> [  606.968241] Index: 43 pgmask=4kb va=0012012c000 asid=95
> 	[ri=0 xi=1 pa=000e30e9000 c=0 d=1 v=1 g=0] [ri=0 xi=0 pa=0041e5f8000 c=0 d=1 v=1 g=0]
> [  607.006990] Index: 44 pgmask=4kb va=001204ec000 asid=95
> 	[ri=0 xi=0 pa=000e317e000 c=0 d=1 v=1 g=0] [ri=0 xi=0 pa=000e32cf000 c=0 d=1 v=1 g=0]
> [  607.045743] Index: 45 pgmask=4kb va=001204fe000 asid=95
> 	[ri=0 xi=0 pa=000e4206000 c=0 d=1 v=1 g=0] [ri=0 xi=0 pa=000e308f000 c=0 d=1 v=1 g=0]
> [  607.084493] Index: 46 pgmask=4kb va=001204f4000 asid=95
> 	[ri=0 xi=0 pa=000e31d0000 c=0 d=1 v=1 g=0] [ri=0 xi=0 pa=000e2874000 c=0 d=1 v=1 g=0]
> [  607.123243] Index: 47 pgmask=4kb va=0ffebd3c000 asid=95
> 	[ri=0 xi=0 pa=000ef2fc000 c=0 d=0 v=1 g=0] [ri=0 xi=0 pa=000ef01f000 c=0 d=0 v=1 g=0]
> [  607.161992] Index: 48 pgmask=4kb va=0ffebf28000 asid=95
> 	[ri=0 xi=0 pa=000e3adf000 c=0 d=0 v=1 g=0] [ri=0 xi=0 pa=000e3ade000 c=0 d=0 v=1 g=0]
> [  607.200741] Index: 49 pgmask=4kb va=0ffff808000 asid=95
> 	[ri=0 xi=0 pa=000e34a8000 c=0 d=1 v=1 g=0] [ri=0 xi=0 pa=000e43bb000 c=0 d=1 v=1 g=0]
> [  607.239489] Index: 50 pgmask=4kb va=0ffebfa4000 asid=95
> 	[ri=0 xi=1 pa=000e35c6000 c=0 d=1 v=1 g=0] [ri=0 xi=1 pa=000e31eb000 c=0 d=1 v=1 g=0]
> [  607.278238] Index: 51 pgmask=4kb va=0ffebed8000 asid=95
> 	[ri=0 xi=0 pa=000e3dce000 c=0 d=0 v=1 g=0] [ri=0 xi=0 pa=000e49ed000 c=0 d=0 v=1 g=0]
> [  607.316985] Index: 52 pgmask=4kb va=00120274000 asid=95
> 	[ri=0 xi=0 pa=00000000000 c=0 d=0 v=0 g=0] [ri=0 xi=1 pa=00000000000 c=2 d=1 v=1 g=0]
> [  607.355734]
> [  607.360192]
> Code: de100000  12000014  00000000 <de020010> 1456fffb  df9991d0  de040008  0320f809  0220282d
> [  607.389654] Kernel panic - not syncing: Caught Machine Check exception - caused by multiple matching entries in the TLB.
> [  607.422806] ---[ end Kernel panic - not syncing: Caught Machine Check exception - caused by multiple matching entries in the TLB.
>
> *** NMI Watchdog interrupt on Core 0x0 ***
>
> A.
>



* Re: [PATCH] MIPS: Fix page table corruption on THP permission changes.
@ 2016-06-23 12:09   ` Aaro Koskinen
  0 siblings, 0 replies; 9+ messages in thread
From: Aaro Koskinen @ 2016-06-23 12:09 UTC (permalink / raw)
  To: David Daney; +Cc: linux-mips, ralf, David Daney, stable

Hi,

On Thu, Jun 16, 2016 at 03:50:31PM -0700, David Daney wrote:
> From: David Daney <david.daney@cavium.com>
> 
> When the core THP code is modifying the permissions of a huge page it
> calls pmd_modify(), which unfortunately was clearing the _PAGE_HUGE bit
> of the page table entry.  The result can be kernel messages like:
> 
> mm/memory.c:397: bad pmd 000000040080004d.
> mm/memory.c:397: bad pmd 00000003ff00004d.
> mm/memory.c:397: bad pmd 000000040100004d.
> 
> or:
> 
> ------------[ cut here ]------------
> WARNING: at mm/mmap.c:3200 exit_mmap+0x150/0x158()
> Modules linked in: ipv6 at24 octeon3_ethernet octeon_srio_nexus m25p80
> CPU: 12 PID: 1295 Comm: pmderr Not tainted 3.10.87-rt80-Cavium-Octeon #4
> Stack : 0000000040808000 0000000014009ce1 0000000000400004 ffffffff81076ba0
>           0000000000000000 0000000000000000 ffffffff85110000 0000000000000119
>           0000000000000004 0000000000000000 0000000000000119 43617669756d2d4f
>           0000000000000000 ffffffff850fda40 ffffffff85110000 0000000000000000
>           0000000000000000 0000000000000009 ffffffff809207a0 0000000000000c80
>           ffffffff80f1bf20 0000000000000001 000000ffeca36828 0000000000000001
>           0000000000000000 0000000000000001 000000ffeca7e700 ffffffff80886924
>           80000003fd7a0000 80000003fd7a39b0 80000003fdea8000 ffffffff80885780
>           80000003fdea8000 ffffffff80f12218 000000000000000c 000000000000050f
>           0000000000000000 ffffffff80865c4c 0000000000000000 0000000000000000
>           ...
> Call Trace:
> [<ffffffff80865c4c>] show_stack+0x6c/0xf8
> [<ffffffff80885780>] warn_slowpath_common+0x78/0xa8
> [<ffffffff809207a0>] exit_mmap+0x150/0x158
> [<ffffffff80882d44>] mmput+0x5c/0x110
> [<ffffffff8088b450>] do_exit+0x230/0xa68
> [<ffffffff8088be34>] do_group_exit+0x54/0x1d0
> [<ffffffff8088bfc0>] __wake_up_parent+0x0/0x18
> 
> ---[ end trace c7b38293191c57dc ]---
> BUG: Bad rss-counter state mm:80000003fa168000 idx:1 val:1536
> 
> Fix by not clearing _PAGE_HUGE bit.
> 
> Signed-off-by: David Daney <david.daney@cavium.com>
> Cc: stable@vger.kernel.org

Tested-by: Aaro Koskinen <aaro.koskinen@nokia.com>

A.

> ---
>  arch/mips/include/asm/pgtable.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/mips/include/asm/pgtable.h b/arch/mips/include/asm/pgtable.h
> index a6b611f..477b1b1 100644
> --- a/arch/mips/include/asm/pgtable.h
> +++ b/arch/mips/include/asm/pgtable.h
> @@ -632,7 +632,7 @@ static inline struct page *pmd_page(pmd_t pmd)
>  
>  static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
>  {
> -	pmd_val(pmd) = (pmd_val(pmd) & _PAGE_CHG_MASK) | pgprot_val(newprot);
> +	pmd_val(pmd) = (pmd_val(pmd) & (_PAGE_CHG_MASK | _PAGE_HUGE)) | pgprot_val(newprot);
>  	return pmd;
>  }
>  
> -- 
> 1.7.11.7
> 
> 


* Re: [PATCH] MIPS: Fix page table corruption on THP permission changes.
@ 2016-06-23 12:09   ` Aaro Koskinen
  0 siblings, 0 replies; 9+ messages in thread
From: Aaro Koskinen @ 2016-06-23 12:09 UTC (permalink / raw)
  To: David Daney; +Cc: linux-mips, ralf, David Daney, stable

Hi,

On Thu, Jun 16, 2016 at 03:50:31PM -0700, David Daney wrote:
> From: David Daney <david.daney@cavium.com>
> 
> When the core THP code is modifying the permissions of a huge page it
> calls pmd_modify(), which unfortunately was clearing the _PAGE_HUGE bit
> of the page table entry.  The result can be kernel messages like:
> 
> mm/memory.c:397: bad pmd 000000040080004d.
> mm/memory.c:397: bad pmd 00000003ff00004d.
> mm/memory.c:397: bad pmd 000000040100004d.
> 
> or:
> 
> ------------[ cut here ]------------
> WARNING: at mm/mmap.c:3200 exit_mmap+0x150/0x158()
> Modules linked in: ipv6 at24 octeon3_ethernet octeon_srio_nexus m25p80
> CPU: 12 PID: 1295 Comm: pmderr Not tainted 3.10.87-rt80-Cavium-Octeon #4
> Stack : 0000000040808000 0000000014009ce1 0000000000400004 ffffffff81076ba0
>           0000000000000000 0000000000000000 ffffffff85110000 0000000000000119
>           0000000000000004 0000000000000000 0000000000000119 43617669756d2d4f
>           0000000000000000 ffffffff850fda40 ffffffff85110000 0000000000000000
>           0000000000000000 0000000000000009 ffffffff809207a0 0000000000000c80
>           ffffffff80f1bf20 0000000000000001 000000ffeca36828 0000000000000001
>           0000000000000000 0000000000000001 000000ffeca7e700 ffffffff80886924
>           80000003fd7a0000 80000003fd7a39b0 80000003fdea8000 ffffffff80885780
>           80000003fdea8000 ffffffff80f12218 000000000000000c 000000000000050f
>           0000000000000000 ffffffff80865c4c 0000000000000000 0000000000000000
>           ...
> Call Trace:
> [<ffffffff80865c4c>] show_stack+0x6c/0xf8
> [<ffffffff80885780>] warn_slowpath_common+0x78/0xa8
> [<ffffffff809207a0>] exit_mmap+0x150/0x158
> [<ffffffff80882d44>] mmput+0x5c/0x110
> [<ffffffff8088b450>] do_exit+0x230/0xa68
> [<ffffffff8088be34>] do_group_exit+0x54/0x1d0
> [<ffffffff8088bfc0>] __wake_up_parent+0x0/0x18
> 
> ---[ end trace c7b38293191c57dc ]---
> BUG: Bad rss-counter state mm:80000003fa168000 idx:1 val:1536
> 
> Fix by not clearing _PAGE_HUGE bit.
> 
> Signed-off-by: David Daney <david.daney@cavium.com>
> Cc: stable@vger.kernel.org

Tested-by: Aaro Koskinen <aaro.koskinen@nokia.com>

A.

> ---
>  arch/mips/include/asm/pgtable.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/mips/include/asm/pgtable.h b/arch/mips/include/asm/pgtable.h
> index a6b611f..477b1b1 100644
> --- a/arch/mips/include/asm/pgtable.h
> +++ b/arch/mips/include/asm/pgtable.h
> @@ -632,7 +632,7 @@ static inline struct page *pmd_page(pmd_t pmd)
>  
>  static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
>  {
> -	pmd_val(pmd) = (pmd_val(pmd) & _PAGE_CHG_MASK) | pgprot_val(newprot);
> +	pmd_val(pmd) = (pmd_val(pmd) & (_PAGE_CHG_MASK | _PAGE_HUGE)) | pgprot_val(newprot);
>  	return pmd;
>  }
>  
> -- 
> 1.7.11.7
> 
> 

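The effect of the one-line change quoted above can be sketched outside the kernel. This is only an illustration under stated assumptions: the DEMO_* bit positions and the demo_* helpers are hypothetical, and DEMO_CHG_MASK keeps only the frame-number bits, whereas the real _PAGE_CHG_MASK and _PAGE_HUGE values depend on the MIPS kernel configuration. The point is just that a bit left out of the preserved mask is lost unless newprot happens to set it again.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit layout for illustration; NOT the real MIPS definitions. */
#define DEMO_PAGE_PRESENT (UINT64_C(1) << 0)
#define DEMO_PAGE_WRITE   (UINT64_C(1) << 1)
#define DEMO_PAGE_HUGE    (UINT64_C(1) << 4)
#define DEMO_PFN_MASK     (~UINT64_C(0) << 12)
/* Simplified stand-in for _PAGE_CHG_MASK: bits kept across a permission change. */
#define DEMO_CHG_MASK     (DEMO_PFN_MASK)

/* Mirrors the removed line: every bit outside the change mask is dropped. */
static uint64_t demo_pmd_modify_old(uint64_t pmd, uint64_t newprot)
{
	return (pmd & DEMO_CHG_MASK) | newprot;
}

/* Mirrors the added line: the huge bit is preserved as well. */
static uint64_t demo_pmd_modify_fixed(uint64_t pmd, uint64_t newprot)
{
	return (pmd & (DEMO_CHG_MASK | DEMO_PAGE_HUGE)) | newprot;
}

static int demo_is_huge(uint64_t pmd)
{
	return (pmd & DEMO_PAGE_HUGE) != 0;
}

/* Start from a writable huge entry and drop write permission. */
static void demo_self_check(void)
{
	uint64_t pmd = UINT64_C(0x40080000) | DEMO_PAGE_HUGE |
		       DEMO_PAGE_PRESENT | DEMO_PAGE_WRITE;
	uint64_t newprot = DEMO_PAGE_PRESENT;

	/* Old mask: the entry no longer looks like a huge page. */
	assert(!demo_is_huge(demo_pmd_modify_old(pmd, newprot)));
	/* Fixed mask: the huge bit survives the permission change. */
	assert(demo_is_huge(demo_pmd_modify_fixed(pmd, newprot)));
}
```

Calling demo_self_check() from a small main() exercises both variants. In this toy layout the old variant yields an entry that is present but no longer marked huge, which is the kind of inconsistency the "bad pmd" messages in the patch description point at.
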

* Re: [PATCH] MIPS: Fix page table corruption on THP permission changes.
  2016-06-16 22:50 [PATCH] MIPS: Fix page table corruption on THP permission changes David Daney
  2016-06-17 12:00   ` Aaro Koskinen
  2016-06-23 12:09   ` Aaro Koskinen
@ 2016-07-05 15:10 ` Ralf Baechle
  2016-08-16  8:11   ` Joshua Kinard
  2 siblings, 1 reply; 9+ messages in thread
From: Ralf Baechle @ 2016-07-05 15:10 UTC (permalink / raw)
  To: David Daney; +Cc: linux-mips, David Daney, stable

On Thu, Jun 16, 2016 at 03:50:31PM -0700, David Daney wrote:

> From: David Daney <david.daney@cavium.com>
> 
> When the core THP code is modifying the permissions of a huge page it
> calls pmd_modify(), which unfortunately was clearing the _PAGE_HUGE bit
> of the page table entry.  The result can be kernel messages like:
> 
> mm/memory.c:397: bad pmd 000000040080004d.
> mm/memory.c:397: bad pmd 00000003ff00004d.
> mm/memory.c:397: bad pmd 000000040100004d.
> 
> or:
> 
> ------------[ cut here ]------------
> WARNING: at mm/mmap.c:3200 exit_mmap+0x150/0x158()
> Modules linked in: ipv6 at24 octeon3_ethernet octeon_srio_nexus m25p80
> CPU: 12 PID: 1295 Comm: pmderr Not tainted 3.10.87-rt80-Cavium-Octeon #4
> Stack : 0000000040808000 0000000014009ce1 0000000000400004 ffffffff81076ba0
>           0000000000000000 0000000000000000 ffffffff85110000 0000000000000119
>           0000000000000004 0000000000000000 0000000000000119 43617669756d2d4f
>           0000000000000000 ffffffff850fda40 ffffffff85110000 0000000000000000
>           0000000000000000 0000000000000009 ffffffff809207a0 0000000000000c80
>           ffffffff80f1bf20 0000000000000001 000000ffeca36828 0000000000000001
>           0000000000000000 0000000000000001 000000ffeca7e700 ffffffff80886924
>           80000003fd7a0000 80000003fd7a39b0 80000003fdea8000 ffffffff80885780
>           80000003fdea8000 ffffffff80f12218 000000000000000c 000000000000050f
>           0000000000000000 ffffffff80865c4c 0000000000000000 0000000000000000
>           ...
> Call Trace:
> [<ffffffff80865c4c>] show_stack+0x6c/0xf8
> [<ffffffff80885780>] warn_slowpath_common+0x78/0xa8
> [<ffffffff809207a0>] exit_mmap+0x150/0x158
> [<ffffffff80882d44>] mmput+0x5c/0x110
> [<ffffffff8088b450>] do_exit+0x230/0xa68
> [<ffffffff8088be34>] do_group_exit+0x54/0x1d0
> [<ffffffff8088bfc0>] __wake_up_parent+0x0/0x18
> 
> ---[ end trace c7b38293191c57dc ]---
> BUG: Bad rss-counter state mm:80000003fa168000 idx:1 val:1536
> 
> Fix by not clearing _PAGE_HUGE bit.

I resolved the conflict with my recent other fix for pmd_modify
and just applied and pushed this.

  Ralf


* Re: [PATCH] MIPS: Fix page table corruption on THP permission changes.
  2016-07-05 15:10 ` Ralf Baechle
@ 2016-08-16  8:11   ` Joshua Kinard
  0 siblings, 0 replies; 9+ messages in thread
From: Joshua Kinard @ 2016-08-16  8:11 UTC (permalink / raw)
  To: Ralf Baechle, David Daney; +Cc: linux-mips, David Daney, stable

On 07/05/2016 11:10, Ralf Baechle wrote:
> On Thu, Jun 16, 2016 at 03:50:31PM -0700, David Daney wrote:
> 
>> From: David Daney <david.daney@cavium.com>
>>
>> When the core THP code is modifying the permissions of a huge page it
>> calls pmd_modify(), which unfortunately was clearing the _PAGE_HUGE bit
>> of the page table entry.  The result can be kernel messages like:
>>
>> mm/memory.c:397: bad pmd 000000040080004d.
>> mm/memory.c:397: bad pmd 00000003ff00004d.
>> mm/memory.c:397: bad pmd 000000040100004d.
>>
>> or:
>>
>> ------------[ cut here ]------------
>> WARNING: at mm/mmap.c:3200 exit_mmap+0x150/0x158()
>> Modules linked in: ipv6 at24 octeon3_ethernet octeon_srio_nexus m25p80
>> CPU: 12 PID: 1295 Comm: pmderr Not tainted 3.10.87-rt80-Cavium-Octeon #4
>> Stack : 0000000040808000 0000000014009ce1 0000000000400004 ffffffff81076ba0
>>           0000000000000000 0000000000000000 ffffffff85110000 0000000000000119
>>           0000000000000004 0000000000000000 0000000000000119 43617669756d2d4f
>>           0000000000000000 ffffffff850fda40 ffffffff85110000 0000000000000000
>>           0000000000000000 0000000000000009 ffffffff809207a0 0000000000000c80
>>           ffffffff80f1bf20 0000000000000001 000000ffeca36828 0000000000000001
>>           0000000000000000 0000000000000001 000000ffeca7e700 ffffffff80886924
>>           80000003fd7a0000 80000003fd7a39b0 80000003fdea8000 ffffffff80885780
>>           80000003fdea8000 ffffffff80f12218 000000000000000c 000000000000050f
>>           0000000000000000 ffffffff80865c4c 0000000000000000 0000000000000000
>>           ...
>> Call Trace:
>> [<ffffffff80865c4c>] show_stack+0x6c/0xf8
>> [<ffffffff80885780>] warn_slowpath_common+0x78/0xa8
>> [<ffffffff809207a0>] exit_mmap+0x150/0x158
>> [<ffffffff80882d44>] mmput+0x5c/0x110
>> [<ffffffff8088b450>] do_exit+0x230/0xa68
>> [<ffffffff8088be34>] do_group_exit+0x54/0x1d0
>> [<ffffffff8088bfc0>] __wake_up_parent+0x0/0x18
>>
>> ---[ end trace c7b38293191c57dc ]---
>> BUG: Bad rss-counter state mm:80000003fa168000 idx:1 val:1536
>>
>> Fix by not clearing _PAGE_HUGE bit.
> 
> I resolved the conflict with my recent other fix for pmd_modify
> and just applied and pushed this.
> 
>   Ralf

Eh, it looks like I've stumbled into another odd corner case of THP.  Only
affects the SGI Octane so far.  So it might be an Octane bug, but I'm at a loss
to explain why/how.

If I have THP/HugeTLBFS enabled, BUT disable only CONFIG_CPU_IDLE_GOV_LADDER
(while keeping the Menu governor and basic idle support in), then on userland
boot, there's about a 1-in-2 chance it'll start to throw instruction bus
errors.  If I keep the ladder governor compiled in, no bus errors.

The other way to trigger it, regardless of the above condition, is to modify
arch/mips/kernel/idle.c and force the R1x000 CPUs to use 'r4k_wait' for
cpu_wait.  Compile and run that, and I get an IBE on virtually every boot.

If I disable THP/HugeTLBFS, then with either of the conditions above, the
system appears to boot fine.  I honestly have no idea if the R10000 family of
CPUs even supports the 'wait' instruction, as I can't find any solid
documentation (except for one vague NEC reference) that suggests it does, but
I am not seeing any illegal instruction issues arising out of its use, unless
the R10k treats it as a nop or such.

That said, THP does appear to work now on both IP27 and IP30.  IP27 seems to
run it fine w/o the CPU idle framework at all.  Doesn't hit very often in
/proc/vmstat, though.

Thoughts?

-- 
Joshua Kinard
Gentoo/MIPS
kumba@gentoo.org
6144R/F5C6C943 2015-04-27
177C 1972 1FB8 F254 BAD0 3E72 5C63 F4E3 F5C6 C943

"The past tempts us, the present confuses us, the future frightens us.  And our
lives slip away, moment by moment, lost in that vast, terrible in-between."

--Emperor Turhan, Centauri Republic


end of thread, other threads:[~2016-08-16  8:19 UTC | newest]

Thread overview: 9+ messages
2016-06-16 22:50 [PATCH] MIPS: Fix page table corruption on THP permission changes David Daney
2016-06-17 12:00 ` Aaro Koskinen
2016-06-17 12:00   ` Aaro Koskinen
2016-06-17 16:22   ` David Daney
2016-06-17 16:22     ` David Daney
2016-06-23 12:09 ` Aaro Koskinen
2016-06-23 12:09   ` Aaro Koskinen
2016-07-05 15:10 ` Ralf Baechle
2016-08-16  8:11   ` Joshua Kinard
