linux-kernel.vger.kernel.org archive mirror
* [PATCH] x86/mm: Handle physical-virtual alignment mismatch in phys_p4d_init()
@ 2019-06-20 11:22 Kirill A. Shutemov
  2019-06-20 14:42 ` Dave Hansen
  2019-06-21  9:02 ` Baoquan He
  0 siblings, 2 replies; 10+ messages in thread
From: Kirill A. Shutemov @ 2019-06-20 11:22 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Dave Hansen, Andy Lutomirski, Peter Zijlstra
  Cc: x86, linux-kernel, Kirill A. Shutemov, Kyle Pelton, Baoquan He

Kyle has reported that the kernel sometimes crashes when it boots in
5-level paging mode with KASLR enabled:

[    0.000000] WARNING: CPU: 0 PID: 0 at arch/x86/mm/init_64.c:87 phys_p4d_init+0x1d4/0x1ea
[    0.000000] Modules linked in:
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.2.0-rc5+ #27
[    0.000000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.4
[    0.000000] RIP: 0010:phys_p4d_init+0x1d4/0x1ea
[    0.000000] Code: 20 be 43 94 ff 45 cc 4c 89 e6 e9 90 fe ff ff 48 89 d8 eb 1d 48 8b 4d d0 48 8b 7d c0 45 0f b6 3
[    0.000000] RSP: 0000:ffffffff94403c70 EFLAGS: 00000046 ORIG_RAX: 0000000000000000
[    0.000000] RAX: 0000000000000000 RBX: 0000000340000000 RCX: 0000000000000007
[    0.000000] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[    0.000000] RBP: ffffffff94403cb8 R08: 0000000000000023 R09: ffffffff947fa680
[    0.000000] R10: 0000000000000008 R11: 0000000000000031 R12: 0000010000000000
[    0.000000] R13: 0000000340000000 R14: ffffffff94403d01 R15: ff484d01962014d0
[    0.000000] FS:  0000000000000000(0000) GS:ffffffff9468a000(0000) knlGS:0000000000000000
[    0.000000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.000000] CR2: ff484d0196201000 CR3: 0000000315b3e000 CR4: 00000000000016b0
[    0.000000] Call Trace:
[    0.000000]  __kernel_physical_mapping_init+0x10a/0x35c
[    0.000000]  kernel_physical_mapping_init+0xe/0x10
[    0.000000]  init_memory_mapping+0x1aa/0x3b0
[    0.000000]  init_range_memory_mapping+0xc8/0x116
[    0.000000]  init_mem_mapping+0x225/0x2eb
[    0.000000]  setup_arch+0x6ff/0xcf5
[    0.000000]  start_kernel+0x64/0x53b
[    0.000000]  ? copy_bootdata+0x1f/0xce
[    0.000000]  x86_64_start_reservations+0x24/0x26
[    0.000000]  x86_64_start_kernel+0x8a/0x8d
[    0.000000]  secondary_startup_64+0xb6/0xc0
[    0.000000] random: get_random_bytes called from print_oops_end_marker+0x3f/0x60 with crng_init=0
...
[    0.000000] BUG: unable to handle page fault for address: ff484d019580eff8
[    0.000000] #PF: supervisor read access in kernel mode
[    0.000000] #PF: error_code(0x0000) - not-present page
[    0.000000] BAD
[    0.000000] Oops: 0000 [#1] SMP NOPTI
[    0.000000] CPU: 0 PID: 0 Comm: swapper Tainted: G        W         5.2.0-rc5+ #27
[    0.000000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.4
[    0.000000] RIP: 0010:fill_pud+0x13/0x130
[    0.000000] Code: 48 83 c4 08 b8 f4 ff ff ff 5b 41 5c 41 5d 41 5e 41 5f 5d c3 31 c0 c3 e8 2b ee d8 00 55 48 89 4
[    0.000000] RSP: 0000:ffffffff94403dc0 EFLAGS: 00000006
[    0.000000] RAX: 00000000000001ff RBX: ffffffffff5fd000 RCX: 0000000000000030
[    0.000000] RDX: 00000000000001ff RSI: ffffffffff5fd000 RDI: ff484d019580eff8
[    0.000000] RBP: ffffffff94403de0 R08: 0000000000000078 R09: 0000000000000002
[    0.000000] R10: ffffffff94403cb0 R11: 000000000000005b R12: ff484d019580eff8
[    0.000000] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[    0.000000] FS:  0000000000000000(0000) GS:ffffffff9468a000(0000) knlGS:0000000000000000
[    0.000000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.000000] CR2: ff484d019580aa40 CR3: 000000031580a000 CR4: 00000000000016b0
[    0.000000] Call Trace:
[    0.000000]  set_pte_vaddr_p4d+0x2e/0x50
[    0.000000]  set_pte_vaddr+0x6f/0xb0
[    0.000000]  __native_set_fixmap+0x28/0x40
[    0.000000]  native_set_fixmap+0x39/0x70
[    0.000000]  register_lapic_address+0x49/0xb6
[    0.000000]  early_acpi_boot_init+0xa5/0xde
[    0.000000]  setup_arch+0x944/0xcf5
[    0.000000]  start_kernel+0x64/0x53b
[    0.000000]  ? copy_bootdata+0x1f/0xce
[    0.000000]  x86_64_start_reservations+0x24/0x26
[    0.000000]  x86_64_start_kernel+0x8a/0x8d
[    0.000000]  secondary_startup_64+0xb6/0xc0
[    0.000000] Modules linked in:
[    0.000000] CR2: ff484d019580eff8

Kyle bisected the issue to commit b569c1843498 ("x86/mm/KASLR: Reduce
randomization granularity for 5-level paging to 1GB").

The commit relaxes KASLR alignment requirements and it can lead to
mismatch bentween 'i' and 'p4d_index(vaddr)' inside the loop in
phys_p4d_init(). The mismatch in its turn leads to clearing wrong p4d
entry and eventually to the oops.

The fix is to make phys_p4d_init() walk the virtual address space, not
the physical one.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-and-tested-by: Kyle Pelton <kyle.d.pelton@intel.com>
Fixes: b569c1843498 ("x86/mm/KASLR: Reduce randomization granularity for 5-level paging to 1GB")
Cc: Baoquan He <bhe@redhat.com>
---
 arch/x86/mm/init_64.c | 39 ++++++++++++++++-----------------------
 1 file changed, 16 insertions(+), 23 deletions(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 693aaf28d5fe..4628ac9105a2 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -671,41 +671,34 @@ static unsigned long __meminit
 phys_p4d_init(p4d_t *p4d_page, unsigned long paddr, unsigned long paddr_end,
 	      unsigned long page_size_mask, bool init)
 {
-	unsigned long paddr_next, paddr_last = paddr_end;
-	unsigned long vaddr = (unsigned long)__va(paddr);
-	int i = p4d_index(vaddr);
+	unsigned long vaddr, vaddr_start, vaddr_end, vaddr_next, paddr_last;
+
+	paddr_last = paddr_end;
+	vaddr = (unsigned long)__va(paddr);
+	vaddr_end = (unsigned long)__va(paddr_end);
+	vaddr_start = vaddr;
 
 	if (!pgtable_l5_enabled())
 		return phys_pud_init((pud_t *) p4d_page, paddr, paddr_end,
 				     page_size_mask, init);
 
-	for (; i < PTRS_PER_P4D; i++, paddr = paddr_next) {
-		p4d_t *p4d;
+	for (; vaddr < vaddr_end; vaddr = vaddr_next) {
+		p4d_t *p4d = p4d_page + p4d_index(vaddr);
 		pud_t *pud;
 
-		vaddr = (unsigned long)__va(paddr);
-		p4d = p4d_page + p4d_index(vaddr);
-		paddr_next = (paddr & P4D_MASK) + P4D_SIZE;
+		vaddr_next = (vaddr & P4D_MASK) + P4D_SIZE;
 
-		if (paddr >= paddr_end) {
-			if (!after_bootmem &&
-			    !e820__mapped_any(paddr & P4D_MASK, paddr_next,
-					     E820_TYPE_RAM) &&
-			    !e820__mapped_any(paddr & P4D_MASK, paddr_next,
-					     E820_TYPE_RESERVED_KERN))
-				set_p4d_init(p4d, __p4d(0), init);
-			continue;
-		}
-
-		if (!p4d_none(*p4d)) {
-			pud = pud_offset(p4d, 0);
-			paddr_last = phys_pud_init(pud, paddr, paddr_end,
-						   page_size_mask, init);
+		if (p4d_val(*p4d)) {
+			pud = (pud_t *)p4d_page_vaddr(*p4d);
+			paddr_last = phys_pud_init(pud, __pa(vaddr),
+						   __pa(vaddr_end),
+						   page_size_mask,
+						   init);
 			continue;
 		}
 
 		pud = alloc_low_page();
-		paddr_last = phys_pud_init(pud, paddr, paddr_end,
+		paddr_last = phys_pud_init(pud, __pa(vaddr), __pa(vaddr_end),
 					   page_size_mask, init);
 
 		spin_lock(&init_mm.page_table_lock);
-- 
2.21.0



* Re: [PATCH] x86/mm: Handle physical-virtual alignment mismatch in phys_p4d_init()
  2019-06-20 11:22 [PATCH] x86/mm: Handle physical-virtual alignment mismatch in phys_p4d_init() Kirill A. Shutemov
@ 2019-06-20 14:42 ` Dave Hansen
  2019-06-20 23:15   ` Kirill A. Shutemov
  2019-06-21  9:02 ` Baoquan He
  1 sibling, 1 reply; 10+ messages in thread
From: Dave Hansen @ 2019-06-20 14:42 UTC (permalink / raw)
  To: Kirill A. Shutemov, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Dave Hansen, Andy Lutomirski,
	Peter Zijlstra
  Cc: x86, linux-kernel, Kirill A. Shutemov, Kyle Pelton, Baoquan He

On 6/20/19 4:22 AM, Kirill A. Shutemov wrote:
> The commit relaxes KASLR alignment requirements and it can lead to
> mismatch bentween 'i' and 'p4d_index(vaddr)' inside the loop in
> phys_p4d_init(). The mismatch in its turn leads to clearing wrong p4d
> entry and eventually to the oops.

Just curious, but what does it relax the requirement to and from?

I'm just not clearly spotting the actual bug.


* Re: [PATCH] x86/mm: Handle physical-virtual alignment mismatch in phys_p4d_init()
  2019-06-20 14:42 ` Dave Hansen
@ 2019-06-20 23:15   ` Kirill A. Shutemov
  2019-06-20 23:27     ` Dave Hansen
  2019-06-23 11:27     ` Thomas Gleixner
  0 siblings, 2 replies; 10+ messages in thread
From: Kirill A. Shutemov @ 2019-06-20 23:15 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Kirill A. Shutemov, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Dave Hansen, Andy Lutomirski,
	Peter Zijlstra, x86, linux-kernel, Kyle Pelton, Baoquan He

On Thu, Jun 20, 2019 at 02:42:55PM +0000, Dave Hansen wrote:
> On 6/20/19 4:22 AM, Kirill A. Shutemov wrote:
> > The commit relaxes KASLR alignment requirements and it can lead to
> > mismatch bentween 'i' and 'p4d_index(vaddr)' inside the loop in
> > phys_p4d_init(). The mismatch in its turn leads to clearing wrong p4d
> > entry and eventually to the oops.
> 
> Just curious, but what does it relax the requirement to and from?
> 
> I'm just not clearly spotting the actual bug.

Before the commit PAGE_OFFSET is always aligned to P4D_SIZE if we boot in
5-level paging mode. But now only PUD_SIZE alignment is guaranteed.

For phys_p4d_init() it means that paddr_next after the first iteration
can belong to the same p4d entry.

In the case I was able to reproduce, vaddr on the first iteration is
0xff4228027fe00000 and paddr is 0x33fe00000. On the second iteration vaddr
becomes 0xff42287f40000000 and paddr is 0x8000000000. The vaddr in both
cases belongs to the same p4d entry.

It screws up the 'i' count: we miss the last entry in the page table
completely. And it confuses the branch under the 'paddr >= paddr_end'
condition: a p4d entry can be cleared where it must not be.
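
To illustrate with a stand-alone userspace sketch (not kernel code; the
5-level constants P4D_SHIFT == 39 / PTRS_PER_P4D == 512 and the two
addresses from the reproducer are simply hard-coded):

#include <stdio.h>

#define P4D_SHIFT	39
#define PTRS_PER_P4D	512UL

static unsigned long p4d_index(unsigned long vaddr)
{
	return (vaddr >> P4D_SHIFT) & (PTRS_PER_P4D - 1);
}

int main(void)
{
	unsigned long vaddr1 = 0xff4228027fe00000UL;	/* 1st iteration, paddr 0x33fe00000  */
	unsigned long vaddr2 = 0xff42287f40000000UL;	/* 2nd iteration, paddr 0x8000000000 */

	/* Both print 80: the two iterations land in the same p4d entry... */
	printf("%lu\n", p4d_index(vaddr1));
	printf("%lu\n", p4d_index(vaddr2));

	/*
	 * ...but the old loop has already bumped 'i' to 81 by the second
	 * iteration, so 'i' and p4d_index(vaddr) no longer match.
	 */
	return 0;
}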

The patch makes phys_p4d_init() work like __kernel_physical_mapping_init()
which deals with phys-virt mismatch already.

I hope this explanation makes any sense :P

-- 
 Kirill A. Shutemov


* Re: [PATCH] x86/mm: Handle physical-virtual alignment mismatch in phys_p4d_init()
  2019-06-20 23:15   ` Kirill A. Shutemov
@ 2019-06-20 23:27     ` Dave Hansen
  2019-06-23 11:27     ` Thomas Gleixner
  1 sibling, 0 replies; 10+ messages in thread
From: Dave Hansen @ 2019-06-20 23:27 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Kirill A. Shutemov, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Dave Hansen, Andy Lutomirski,
	Peter Zijlstra, x86, linux-kernel, Kyle Pelton, Baoquan He

On 6/20/19 4:15 PM, Kirill A. Shutemov wrote:
> I hope this explanation makes any sense :P

Yes, thanks for the details!


* Re: [PATCH] x86/mm: Handle physical-virtual alignment mismatch in phys_p4d_init()
  2019-06-20 11:22 [PATCH] x86/mm: Handle physical-virtual alignment mismatch in phys_p4d_init() Kirill A. Shutemov
  2019-06-20 14:42 ` Dave Hansen
@ 2019-06-21  9:02 ` Baoquan He
  2019-06-21 10:54   ` Kirill A. Shutemov
  1 sibling, 1 reply; 10+ messages in thread
From: Baoquan He @ 2019-06-21  9:02 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Dave Hansen, Andy Lutomirski, Peter Zijlstra, x86, linux-kernel,
	Kirill A. Shutemov, Kyle Pelton

Hi Kirill,

On 06/20/19 at 02:22pm, Kirill A. Shutemov wrote:
> Kyle has reported that the kernel sometimes crashes when it boots in
> 5-level paging mode with KASLR enabled:

This is a great finding, thanks for the fix. I once modified the code to
make it accommodate PMD-level randomization, and this phys_p4d_init()
part was included. Not sure why I missed it when I later switched to
PUD-level randomization for 5-level.

https://github.com/baoquan-he/linux/commit/dc91f5292bf1f55666c9139b6621d830b5b38aa5

Have some concerns, please check.

> [    0.000000] WARNING: CPU: 0 PID: 0 at arch/x86/mm/init_64.c:87 phys_p4d_init+0x1d4/0x1ea
...... 
> Kyle bisected the issue to commit b569c1843498 ("x86/mm/KASLR: Reduce
> randomization granularity for 5-level paging to 1GB").
> 
> The commit relaxes KASLR alignment requirements and it can lead to
> mismatch bentween 'i' and 'p4d_index(vaddr)' inside the loop in
           ^ between
> phys_p4d_init(). The mismatch in its turn leads to clearing wrong p4d
> entry and eventually to the oops.
> 
> The fix is to make phys_p4d_init() walk the virtual address space, not
> the physical one.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Reported-and-tested-by: Kyle Pelton <kyle.d.pelton@intel.com>
> Fixes: b569c1843498 ("x86/mm/KASLR: Reduce randomization granularity for 5-level paging to 1GB")
> Cc: Baoquan He <bhe@redhat.com>
> ---
>  arch/x86/mm/init_64.c | 39 ++++++++++++++++-----------------------
>  1 file changed, 16 insertions(+), 23 deletions(-)
> 
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index 693aaf28d5fe..4628ac9105a2 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -671,41 +671,34 @@ static unsigned long __meminit
>  phys_p4d_init(p4d_t *p4d_page, unsigned long paddr, unsigned long paddr_end,
>  	      unsigned long page_size_mask, bool init)
>  {
> -	unsigned long paddr_next, paddr_last = paddr_end;
> -	unsigned long vaddr = (unsigned long)__va(paddr);
> -	int i = p4d_index(vaddr);
> +	unsigned long vaddr, vaddr_start, vaddr_end, vaddr_next, paddr_last;
> +
> +	paddr_last = paddr_end;
> +	vaddr = (unsigned long)__va(paddr);
> +	vaddr_end = (unsigned long)__va(paddr_end);
> +	vaddr_start = vaddr;

The variable vaddr_start is not used in this patch; redundant?

>  
>  	if (!pgtable_l5_enabled())
>  		return phys_pud_init((pud_t *) p4d_page, paddr, paddr_end,
>  				     page_size_mask, init);
>  
> -	for (; i < PTRS_PER_P4D; i++, paddr = paddr_next) {
> -		p4d_t *p4d;
> +	for (; vaddr < vaddr_end; vaddr = vaddr_next) {
> +		p4d_t *p4d = p4d_page + p4d_index(vaddr);
>  		pud_t *pud;
>  
> -		vaddr = (unsigned long)__va(paddr);
> -		p4d = p4d_page + p4d_index(vaddr);
> -		paddr_next = (paddr & P4D_MASK) + P4D_SIZE;
> +		vaddr_next = (vaddr & P4D_MASK) + P4D_SIZE;
>  

The code block below zeroes p4d entries which are not covered by the
current memory range and have not been mapped already. It's cleared away
in this patch; could you also mention that in the log, and tell why it
doesn't matter now?

If it doesn't matter, should we clear away the similar code in
phys_pud_init/phys_pmd_init/phys_pte_init? Maybe a prep patch to do the
clean-up?

Thanks
Baoquan

> -		if (paddr >= paddr_end) {
> -			if (!after_bootmem &&
> -			    !e820__mapped_any(paddr & P4D_MASK, paddr_next,
> -					     E820_TYPE_RAM) &&
> -			    !e820__mapped_any(paddr & P4D_MASK, paddr_next,
> -					     E820_TYPE_RESERVED_KERN))
> -				set_p4d_init(p4d, __p4d(0), init);
> -			continue;
> -		}
> -
> -		if (!p4d_none(*p4d)) {
> -			pud = pud_offset(p4d, 0);
> -			paddr_last = phys_pud_init(pud, paddr, paddr_end,
> -						   page_size_mask, init);
> +		if (p4d_val(*p4d)) {
> +			pud = (pud_t *)p4d_page_vaddr(*p4d);
> +			paddr_last = phys_pud_init(pud, __pa(vaddr),
> +						   __pa(vaddr_end),
> +						   page_size_mask,
> +						   init);
>  			continue;
>  		}
>  
>  		pud = alloc_low_page();
> -		paddr_last = phys_pud_init(pud, paddr, paddr_end,
> +		paddr_last = phys_pud_init(pud, __pa(vaddr), __pa(vaddr_end),
>  					   page_size_mask, init);
>  
>  		spin_lock(&init_mm.page_table_lock);
> -- 
> 2.21.0
> 


* Re: [PATCH] x86/mm: Handle physical-virtual alignment mismatch in phys_p4d_init()
  2019-06-21  9:02 ` Baoquan He
@ 2019-06-21 10:54   ` Kirill A. Shutemov
  2019-06-24 10:07     ` Baoquan He
  0 siblings, 1 reply; 10+ messages in thread
From: Kirill A. Shutemov @ 2019-06-21 10:54 UTC (permalink / raw)
  To: Baoquan He
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Dave Hansen, Andy Lutomirski, Peter Zijlstra, x86, linux-kernel,
	Kirill A. Shutemov, Kyle Pelton

On Fri, Jun 21, 2019 at 05:02:49PM +0800, Baoquan He wrote:
> Hi Kirill,
> 
> On 06/20/19 at 02:22pm, Kirill A. Shutemov wrote:
> > Kyle has reported that the kernel sometimes crashes when it boots in
> > 5-level paging mode with KASLR enabled:
> 
> This is a great finding, thanks for the fix. I once modified the code to
> make it accommodate PMD-level randomization, and this phys_p4d_init()
> part was included. Not sure why I missed it when I later switched to
> PUD-level randomization for 5-level.
> 
> https://github.com/baoquan-he/linux/commit/dc91f5292bf1f55666c9139b6621d830b5b38aa5
> 
> Have some concerns, please check.
> 
> > [    0.000000] WARNING: CPU: 0 PID: 0 at arch/x86/mm/init_64.c:87 phys_p4d_init+0x1d4/0x1ea
> ...... 
> > Kyle bisected the issue to commit b569c1843498 ("x86/mm/KASLR: Reduce
> > randomization granularity for 5-level paging to 1GB").
> > 
> > The commit relaxes KASLR alignment requirements and it can lead to
> > mismatch bentween 'i' and 'p4d_index(vaddr)' inside the loop in
>            ^ between
> > phys_p4d_init(). The mismatch in its turn leads to clearing wrong p4d
> > entry and eventually to the oops.
> > 
> > The fix is to make phys_p4d_init() walk the virtual address space, not
> > the physical one.
> > 
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > Reported-and-tested-by: Kyle Pelton <kyle.d.pelton@intel.com>
> > Fixes: b569c1843498 ("x86/mm/KASLR: Reduce randomization granularity for 5-level paging to 1GB")
> > Cc: Baoquan He <bhe@redhat.com>
> > ---
> >  arch/x86/mm/init_64.c | 39 ++++++++++++++++-----------------------
> >  1 file changed, 16 insertions(+), 23 deletions(-)
> > 
> > diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> > index 693aaf28d5fe..4628ac9105a2 100644
> > --- a/arch/x86/mm/init_64.c
> > +++ b/arch/x86/mm/init_64.c
> > @@ -671,41 +671,34 @@ static unsigned long __meminit
> >  phys_p4d_init(p4d_t *p4d_page, unsigned long paddr, unsigned long paddr_end,
> >  	      unsigned long page_size_mask, bool init)
> >  {
> > -	unsigned long paddr_next, paddr_last = paddr_end;
> > -	unsigned long vaddr = (unsigned long)__va(paddr);
> > -	int i = p4d_index(vaddr);
> > +	unsigned long vaddr, vaddr_start, vaddr_end, vaddr_next, paddr_last;
> > +
> > +	paddr_last = paddr_end;
> > +	vaddr = (unsigned long)__va(paddr);
> > +	vaddr_end = (unsigned long)__va(paddr_end);
> > +	vaddr_start = vaddr;
> 
> The variable vaddr_start is not used in this patch; redundant?

Yep. I'll drop it in v2.

> >  	if (!pgtable_l5_enabled())
> >  
> >  		return phys_pud_init((pud_t *) p4d_page, paddr, paddr_end,
> >  				     page_size_mask, init);
> >  
> > -	for (; i < PTRS_PER_P4D; i++, paddr = paddr_next) {
> > -		p4d_t *p4d;
> > +	for (; vaddr < vaddr_end; vaddr = vaddr_next) {
> > +		p4d_t *p4d = p4d_page + p4d_index(vaddr);
> >  		pud_t *pud;
> >  
> > -		vaddr = (unsigned long)__va(paddr);
> > -		p4d = p4d_page + p4d_index(vaddr);
> > -		paddr_next = (paddr & P4D_MASK) + P4D_SIZE;
> > +		vaddr_next = (vaddr & P4D_MASK) + P4D_SIZE;
> >  
> 
> The code block below zeroes p4d entries which are not covered by the
> current memory range and have not been mapped already. It's cleared away
> in this patch; could you also mention that in the log, and tell why it
> doesn't matter now?
> 
> If it doesn't matter, should we clear away the similar code in
> phys_pud_init/phys_pmd_init/phys_pte_init? Maybe a prep patch to do the
> clean-up?

It only matters for the levels that contain page table entries that can
point to pages, not page tables. There are no p4d or pgd huge pages on x86.
Otherwise we only leak page tables without any benefit.

We might have this on all levels under the p?d_large() condition and not
touch page tables at all.
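
Roughly like this in phys_pud_init(), for example (just an untested sketch
of the idea, reusing the helpers that block already uses; the pmd/pte
levels would look the same):

	if (paddr >= paddr_end) {
		if (!after_bootmem &&
		    !e820__mapped_any(paddr & PUD_MASK, paddr_next,
				      E820_TYPE_RAM) &&
		    !e820__mapped_any(paddr & PUD_MASK, paddr_next,
				      E820_TYPE_RESERVED_KERN)) {
			/*
			 * Only wipe entries that map a large page directly.
			 * Clearing an entry that points to a PMD page table
			 * would just leak that table.
			 */
			if (pud_large(*pud))
				set_pud_init(pud, __pud(0), init);
		}
		continue;
	}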

BTW, it all becomes rather risky this late in the release cycle. Maybe
we should revert the original patch and try again later with a more
comprehensive solution?

-- 
 Kirill A. Shutemov


* Re: [PATCH] x86/mm: Handle physical-virtual alignment mismatch in phys_p4d_init()
  2019-06-20 23:15   ` Kirill A. Shutemov
  2019-06-20 23:27     ` Dave Hansen
@ 2019-06-23 11:27     ` Thomas Gleixner
  1 sibling, 0 replies; 10+ messages in thread
From: Thomas Gleixner @ 2019-06-23 11:27 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Dave Hansen, Kirill A. Shutemov, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	x86, linux-kernel, Kyle Pelton, Baoquan He

On Fri, 21 Jun 2019, Kirill A. Shutemov wrote:
> On Thu, Jun 20, 2019 at 02:42:55PM +0000, Dave Hansen wrote:
> > On 6/20/19 4:22 AM, Kirill A. Shutemov wrote:
> > > The commit relaxes KASLR alignment requirements and it can lead to
> > > mismatch bentween 'i' and 'p4d_index(vaddr)' inside the loop in
> > > phys_p4d_init(). The mismatch in its turn leads to clearing wrong p4d
> > > entry and eventually to the oops.
> > 
> > Just curious, but what does it relax the requirement to and from?
> > 
> > I'm just not clearly spotting the actual bug.
> 
> Before the commit PAGE_OFFSET is always aligned to P4D_SIZE if we boot in
> 5-level paging mode. But now only PUD_SIZE alignment is guaranteed.
> 
> For phys_p4d_init() it means that paddr_next after the first iteration
> can belong to the same p4d entry.
> 
> In the case I was able to reproduce, vaddr on the first iteration is
> 0xff4228027fe00000 and paddr is 0x33fe00000. On the second iteration vaddr
> becomes 0xff42287f40000000 and paddr is 0x8000000000. The vaddr in both
> cases belongs to the same p4d entry.
> 
> It screws up the 'i' count: we miss the last entry in the page table
> completely. And it confuses the branch under the 'paddr >= paddr_end'
> condition: a p4d entry can be cleared where it must not be.
> 
> The patch makes phys_p4d_init() work like __kernel_physical_mapping_init()
> which deals with phys-virt mismatch already.
> 
> I hope this explanation makes any sense :P

Yes, and it should have been in the changelog right away. It's much more
useful than the 63 lines of untrimmed backtrace noise, which should be
condensed to the really valuable information:

  WARNING: CPU: 0 PID: 0 at arch/x86/mm/init_64.c:87 phys_p4d_init+0x1d4/0x1ea
  RIP: 0010:phys_p4d_init+0x1d4/0x1ea
  Call Trace:
   __kernel_physical_mapping_init+0x10a/0x35c
   kernel_physical_mapping_init+0xe/0x10
   init_memory_mapping+0x1aa/0x3b0
   init_range_memory_mapping+0xc8/0x116
   init_mem_mapping+0x225/0x2eb
   setup_arch+0x6ff/0xcf5
   start_kernel+0x64/0x53b
   ? copy_bootdata+0x1f/0xce
   x86_64_start_reservations+0x24/0x26
   x86_64_start_kernel+0x8a/0x8d
   secondary_startup_64+0xb6/0xc0

which later causes:

  BUG: unable to handle page fault for address: ff484d019580eff8
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  BAD
  Oops: 0000 [#1] SMP NOPTI
  RIP: 0010:fill_pud+0x13/0x130
  Call Trace:
   set_pte_vaddr_p4d+0x2e/0x50
   set_pte_vaddr+0x6f/0xb0
   __native_set_fixmap+0x28/0x40
   native_set_fixmap+0x39/0x70
   register_lapic_address+0x49/0xb6
   early_acpi_boot_init+0xa5/0xde
   setup_arch+0x944/0xcf5
   start_kernel+0x64/0x53b

Hmm?

Thanks,

	tglx



* Re: [PATCH] x86/mm: Handle physical-virtual alignment mismatch in phys_p4d_init()
  2019-06-21 10:54   ` Kirill A. Shutemov
@ 2019-06-24 10:07     ` Baoquan He
  2019-06-24 12:23       ` Kirill A. Shutemov
  0 siblings, 1 reply; 10+ messages in thread
From: Baoquan He @ 2019-06-24 10:07 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Dave Hansen, Andy Lutomirski, Peter Zijlstra, x86, linux-kernel,
	Kirill A. Shutemov, Kyle Pelton

On 06/21/19 at 01:54pm, Kirill A. Shutemov wrote:
> > The code block below zeroes p4d entries which are not covered by the
> > current memory range and have not been mapped already. It's cleared away
> > in this patch; could you also mention that in the log, and tell why it
> > doesn't matter now?
> > 
> > If it doesn't matter, should we clear away the similar code in
> > phys_pud_init/phys_pmd_init/phys_pte_init? Maybe a prep patch to do the
> > clean-up?
> 
> It only matters for the levels that contain page table entries that can
> point to pages, not page tables. There are no p4d or pgd huge pages on x86.
> Otherwise we only leak page tables without any benefit.

Ah, I checked the git history and didn't find why it was added. I just have
a superficial knowledge of the clearing, but in a low-efficiency way.

> 
> We might have this on all levels under the p?d_large() condition and not
> touch page tables at all.

I see.

> 
> BTW, it all becomes rather risky this late in the release cycle. Maybe
> we should revert the original patch and try again later with a more
> comprehensive solution?

It wasn't all added at one time. I am fine with your current change; it
would be much better if you mention it in the log and also add a code
comment above the clearing code. Reverting and trying again later with a
more comprehensive solution is also fine with me, though that needs a
little more effort.

Thanks
Baoquan


* Re: [PATCH] x86/mm: Handle physical-virtual alignment mismatch in phys_p4d_init()
  2019-06-24 10:07     ` Baoquan He
@ 2019-06-24 12:23       ` Kirill A. Shutemov
  2019-06-24 13:43         ` Baoquan He
  0 siblings, 1 reply; 10+ messages in thread
From: Kirill A. Shutemov @ 2019-06-24 12:23 UTC (permalink / raw)
  To: Baoquan He
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Dave Hansen, Andy Lutomirski, Peter Zijlstra, x86, linux-kernel,
	Kirill A. Shutemov, Kyle Pelton

On Mon, Jun 24, 2019 at 06:07:42PM +0800, Baoquan He wrote:
> On 06/21/19 at 01:54pm, Kirill A. Shutemov wrote:
> > > The code block below zeroes p4d entries which are not covered by the
> > > current memory range and have not been mapped already. It's cleared away
> > > in this patch; could you also mention that in the log, and tell why it
> > > doesn't matter now?
> > > 
> > > If it doesn't matter, should we clear away the similar code in
> > > phys_pud_init/phys_pmd_init/phys_pte_init? Maybe a prep patch to do the
> > > clean-up?
> > 
> > It only matters for the levels that contain page table entries that can
> > point to pages, not page tables. There are no p4d or pgd huge pages on x86.
> > Otherwise we only leak page tables without any benefit.
> 
> Ah, I checked the git history and didn't find why it was added. I just have
> a superficial knowledge of the clearing, but in a low-efficiency way.
> 
> > 
> > We might have this on all levels under the p?d_large() condition and not
> > touch page tables at all.
> 
> I see.
> 
> > 
> > BTW, it all becomes rather risky this late in the release cycle. Maybe
> > we should revert the original patch and try again later with a more
> > comprehensive solution?
> 
> It wasn't all added at one time. I am fine with your current change; it
> would be much better if you mention it in the log and also add a code
> comment above the clearing code. Reverting and trying again later with a
> more comprehensive solution is also fine with me, though that needs a
> little more effort.

I've decided to keep the block for now. We can remove it later, once the fix is in.
I'll post it soon.

-- 
 Kirill A. Shutemov


* Re: [PATCH] x86/mm: Handle physical-virtual alignment mismatch in phys_p4d_init()
  2019-06-24 12:23       ` Kirill A. Shutemov
@ 2019-06-24 13:43         ` Baoquan He
  0 siblings, 0 replies; 10+ messages in thread
From: Baoquan He @ 2019-06-24 13:43 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Dave Hansen, Andy Lutomirski, Peter Zijlstra, x86, linux-kernel,
	Kirill A. Shutemov, Kyle Pelton

On 06/24/19 at 03:23pm, Kirill A. Shutemov wrote:
> On Mon, Jun 24, 2019 at 06:07:42PM +0800, Baoquan He wrote:
> > On 06/21/19 at 01:54pm, Kirill A. Shutemov wrote:
> > > > The code block below zeroes p4d entries which are not covered by the
> > > > current memory range and have not been mapped already. It's cleared away
> > > > in this patch; could you also mention that in the log, and tell why it
> > > > doesn't matter now?
> > > > 
> > > > If it doesn't matter, should we clear away the similar code in
> > > > phys_pud_init/phys_pmd_init/phys_pte_init? Maybe a prep patch to do the
> > > > clean-up?
> > > 
> > > It only matters for the levels that contain page table entries that can
> > > point to pages, not page tables. There are no p4d or pgd huge pages on x86.
> > > Otherwise we only leak page tables without any benefit.
> > 
> > Ah, I checked the git history and didn't find why it was added. I just have
> > a superficial knowledge of the clearing, but in a low-efficiency way.
> > 
> > > 
> > > We might have this on all levels under the p?d_large() condition and not
> > > touch page tables at all.
> > 
> > I see.
> > 
> > > 
> > > BTW, it all becomes rather risky this late in the release cycle. Maybe
> > > we should revert the original patch and try again later with a more
> > > comprehensive solution?
> > 
> > It wasn't all added at one time. I am fine with your current change; it
> > would be much better if you mention it in the log and also add a code
> > comment above the clearing code. Reverting and trying again later with a
> > more comprehensive solution is also fine with me, though that needs a
> > little more effort.
> 
> I've decided to keep the block for now. We can remove it later, once the fix is in.
> I'll post it soon.

That's great; a clear commit log and code comment will make that code
more understandable.


