linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/1] x86: Fix .bss corruption
@ 2023-06-14 16:38 Lee Jones
  2023-06-14 16:38 ` [PATCH v2 1/1] x86/mm/KASLR: Store pud_page_tramp into entry rather than page Lee Jones
  0 siblings, 1 reply; 6+ messages in thread
From: Lee Jones @ 2023-06-14 16:38 UTC (permalink / raw)
  To: lee, dave.hansen, luto, peterz, tglx, mingo, bp, x86, hpa, linux-mm
  Cc: linux-kernel

- Report

The following highly indeterministic kernel panic was reported to occur every
few hundred boots:

    Kernel panic - not syncing: Fatal exception
     RIP: 0010:alloc_ucounts+0x68/0x280
     RSP: 0018:ffffb53ac13dfe98 EFLAGS: 00010006
     RAX: 0000000000000000 RBX: 000000015b804063 RCX: ffff9bb60d5aa500
     RDX: 0000000000000001 RSI: 00000000000003fb RDI: 0000000000000001
     RBP: ffffb53ac13dfec0 R08: 0000000000000000 R09: ffffd53abf600000
     R10: ffff9bb60b4bdd80 R11: ffffffffbcde7bb0 R12: ffffffffbf04e710
     R13: ffffffffbf405160 R14: 00000000000003fb R15: ffffffffbf1afc60
     FS:  00007f5be44d5000(0000) GS:ffff9bb6b9cc0000(0000) knlGS:0000000000000000
     CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
     CR2: 000000015b80407b CR3: 00000001038ea000 CR4: 00000000000406a0
     DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
     DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
     Call Trace:
      <TASK>
      __sys_setuid+0x1a0/0x290
      __x64_sys_setuid+0xc/0x10
      do_syscall_64+0x43/0x90
      ? asm_exc_page_fault+0x8/0x30
      entry_SYSCALL_64_after_hwframe+0x44/0xae

- Problem

The issue was eventually tracked down to the attempted dereference of a value
located in a corrupted hash table.  ucounts_hashtable is an array of 1024
struct hlists.  Each element is the head of its own linked list where previous
ucount allocations are stored.  The [20]th element of ucounts_hashtable was
being consistently trashed on each and every boot.  However the indeterminism
comes from it being accessed only every few hundred boots.

The issue disappeared, or was at least unidentifiable when !(LTO=full) or when
memory base randomisation (a.k.a. KASLR) was disabled, rending GDB all but
impossible to use effectively.

The cause of the corruption was uncovered using a verity of different debugging
techniques and was eventually tracked down to page table manipulation in early
architecture setup.

The following line in arch/x86/realmode/init.c [0] allocates a variable, just 8
Bytes in size, to "hold the pgd entry used on booting additional CPUs":

  pgd_t trampoline_pgd_entry;

The address of that variable is then passed from init_trampoline_kaslr() via a
call to set_pgd() [1] to have a value (not too important here) assigned to it.
Numerous abstractions take place, eventually leading to native_set_p4d(), an
inline function [2] contained in arch/x86/include/asm/pgtable_64.h.

From here, intentionally or otherwise, a call to pti_set_user_pgtbl() is made.
This is where the out-of-bounds write eventually occurs.  It is not known (by
me) why this function is called.  The returned result is subsequently used as a
value to write using the WRITE_ONCE macro.  Perhaps the premature write is not
intended.  This is what I hope to find out.

A little way down in pti_set_user_pgtbl() [3] the following line occurs:

  kernel_to_user_pgdp(pgdp)->pgd = pgd.pgd

The kernel_to_user_pgdp() part takes the address of pgdp (a.k.a.
trampoline_pgd_entry) and ends up flipping the 12th bit, essentially adding 4k
(0x1000) to the address.  Then the part at the end assigns our value (still not
important here) to it.  However, if we remember that only 8 Bytes was allocated
(globally) for trampoline_pgd_entry, then means we just stored the value into
the outlands (what we now know to be allocated to another global storage user
ucounts_hashtable).

[0] https://elixir.bootlin.com/linux/latest/source/arch/x86/realmode/init.c#L18
[1] https://elixir.bootlin.com/linux/latest/source/arch/x86/mm/kaslr.c#L178
[2] https://elixir.bootlin.com/linux/latest/source/arch/x86/include/asm/pgtable_64.h#L142
[3] https://elixir.bootlin.com/linux/latest/source/arch/x86/mm/pti.c#L142

Lee Jones (1):
  x86/mm/KASLR: Store pud_page_tramp into entry rather than page

 arch/x86/mm/kaslr.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

-- 
2.41.0.162.gfafddb0af9-goog


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH v2 1/1] x86/mm/KASLR: Store pud_page_tramp into entry rather than page
  2023-06-14 16:38 [PATCH v2 0/1] x86: Fix .bss corruption Lee Jones
@ 2023-06-14 16:38 ` Lee Jones
  2023-06-14 17:48   ` Nick Desaulniers
  2023-06-16 17:00   ` Lee Jones
  0 siblings, 2 replies; 6+ messages in thread
From: Lee Jones @ 2023-06-14 16:38 UTC (permalink / raw)
  To: lee, dave.hansen, luto, peterz, tglx, mingo, bp, x86, hpa, linux-mm
  Cc: linux-kernel

set_pgd() expects to be passed whole pages to operate on, whereas
trampoline_pgd_entry is, as the name suggests, an entry.  The
ramifications for using set_pgd() here are that the following thread of
execution will not only place the suggested value into the
trampoline_pgd_entry (8-Byte globally stored [.bss]) variable, PTI will
also attempt to replicate that value into the non-existent neighboring
user page (located +4k away), leading to the corruption of other global
[.bss] stored variables.

Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Lee Jones <lee@kernel.org>
---
 arch/x86/mm/kaslr.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
index 557f0fe25dff4..37db264866b64 100644
--- a/arch/x86/mm/kaslr.c
+++ b/arch/x86/mm/kaslr.c
@@ -172,10 +172,10 @@ void __meminit init_trampoline_kaslr(void)
 		set_p4d(p4d_tramp,
 			__p4d(_KERNPG_TABLE | __pa(pud_page_tramp)));
 
-		set_pgd(&trampoline_pgd_entry,
-			__pgd(_KERNPG_TABLE | __pa(p4d_page_tramp)));
+		trampoline_pgd_entry =
+			__pgd(_KERNPG_TABLE | __pa(p4d_page_tramp));
 	} else {
-		set_pgd(&trampoline_pgd_entry,
-			__pgd(_KERNPG_TABLE | __pa(pud_page_tramp)));
+		trampoline_pgd_entry =
+			__pgd(_KERNPG_TABLE | __pa(pud_page_tramp));
 	}
 }
-- 
2.41.0.162.gfafddb0af9-goog


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 1/1] x86/mm/KASLR: Store pud_page_tramp into entry rather than page
  2023-06-14 16:38 ` [PATCH v2 1/1] x86/mm/KASLR: Store pud_page_tramp into entry rather than page Lee Jones
@ 2023-06-14 17:48   ` Nick Desaulniers
  2023-06-14 18:43     ` Lee Jones
  2023-06-16 17:00   ` Lee Jones
  1 sibling, 1 reply; 6+ messages in thread
From: Nick Desaulniers @ 2023-06-14 17:48 UTC (permalink / raw)
  To: Lee Jones
  Cc: dave.hansen, luto, peterz, tglx, mingo, bp, x86, hpa, linux-mm,
	linux-kernel

On Wed, Jun 14, 2023 at 05:38:54PM +0100, Lee Jones wrote:
> set_pgd() expects to be passed whole pages to operate on, whereas
> trampoline_pgd_entry is, as the name suggests, an entry.  The
> ramifications for using set_pgd() here are that the following thread of
> execution will not only place the suggested value into the
> trampoline_pgd_entry (8-Byte globally stored [.bss]) variable, PTI will
> also attempt to replicate that value into the non-existent neighboring
> user page (located +4k away), leading to the corruption of other global
> [.bss] stored variables.
> 
> Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
> Signed-off-by: Lee Jones <lee@kernel.org>

Nice work tracking this one down!

Fixes: 0925dda5962e ("x86/mm/KASLR: Use only one PUD entry for real mode trampoline")
Cc: <stable@vger.kernel.org>

> ---
>  arch/x86/mm/kaslr.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
> index 557f0fe25dff4..37db264866b64 100644
> --- a/arch/x86/mm/kaslr.c
> +++ b/arch/x86/mm/kaslr.c
> @@ -172,10 +172,10 @@ void __meminit init_trampoline_kaslr(void)
>  		set_p4d(p4d_tramp,
>  			__p4d(_KERNPG_TABLE | __pa(pud_page_tramp)));
>  
> -		set_pgd(&trampoline_pgd_entry,
> -			__pgd(_KERNPG_TABLE | __pa(p4d_page_tramp)));
> +		trampoline_pgd_entry =
> +			__pgd(_KERNPG_TABLE | __pa(p4d_page_tramp));
>  	} else {
> -		set_pgd(&trampoline_pgd_entry,
> -			__pgd(_KERNPG_TABLE | __pa(pud_page_tramp)));
> +		trampoline_pgd_entry =
> +			__pgd(_KERNPG_TABLE | __pa(pud_page_tramp));
>  	}
>  }
> -- 
> 2.41.0.162.gfafddb0af9-goog
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 1/1] x86/mm/KASLR: Store pud_page_tramp into entry rather than page
  2023-06-14 17:48   ` Nick Desaulniers
@ 2023-06-14 18:43     ` Lee Jones
  0 siblings, 0 replies; 6+ messages in thread
From: Lee Jones @ 2023-06-14 18:43 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: dave.hansen, luto, peterz, tglx, mingo, bp, x86, hpa, linux-mm,
	linux-kernel

On Wed, 14 Jun 2023, Nick Desaulniers wrote:

> On Wed, Jun 14, 2023 at 05:38:54PM +0100, Lee Jones wrote:
> > set_pgd() expects to be passed whole pages to operate on, whereas
> > trampoline_pgd_entry is, as the name suggests, an entry.  The
> > ramifications for using set_pgd() here are that the following thread of
> > execution will not only place the suggested value into the
> > trampoline_pgd_entry (8-Byte globally stored [.bss]) variable, PTI will
> > also attempt to replicate that value into the non-existent neighboring
> > user page (located +4k away), leading to the corruption of other global
> > [.bss] stored variables.
> > 
> > Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
> > Signed-off-by: Lee Jones <lee@kernel.org>
> 
> Nice work tracking this one down!
> 
> Fixes: 0925dda5962e ("x86/mm/KASLR: Use only one PUD entry for real mode trampoline")
> Cc: <stable@vger.kernel.org>

I was planning on backporting this myself once it hits Mainline.

Happy to submit a RESEND with those pieces added if required though.

Or perhaps someone would be kind enough to add them on merge?

> > ---
> >  arch/x86/mm/kaslr.c | 8 ++++----
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
> > index 557f0fe25dff4..37db264866b64 100644
> > --- a/arch/x86/mm/kaslr.c
> > +++ b/arch/x86/mm/kaslr.c
> > @@ -172,10 +172,10 @@ void __meminit init_trampoline_kaslr(void)
> >  		set_p4d(p4d_tramp,
> >  			__p4d(_KERNPG_TABLE | __pa(pud_page_tramp)));
> >  
> > -		set_pgd(&trampoline_pgd_entry,
> > -			__pgd(_KERNPG_TABLE | __pa(p4d_page_tramp)));
> > +		trampoline_pgd_entry =
> > +			__pgd(_KERNPG_TABLE | __pa(p4d_page_tramp));
> >  	} else {
> > -		set_pgd(&trampoline_pgd_entry,
> > -			__pgd(_KERNPG_TABLE | __pa(pud_page_tramp)));
> > +		trampoline_pgd_entry =
> > +			__pgd(_KERNPG_TABLE | __pa(pud_page_tramp));
> >  	}
> >  }
> > -- 
> > 2.41.0.162.gfafddb0af9-goog
> > 

-- 
Lee Jones [李琼斯]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 1/1] x86/mm/KASLR: Store pud_page_tramp into entry rather than page
  2023-06-14 16:38 ` [PATCH v2 1/1] x86/mm/KASLR: Store pud_page_tramp into entry rather than page Lee Jones
  2023-06-14 17:48   ` Nick Desaulniers
@ 2023-06-16 17:00   ` Lee Jones
  2023-06-19 10:43     ` Lee Jones
  1 sibling, 1 reply; 6+ messages in thread
From: Lee Jones @ 2023-06-16 17:00 UTC (permalink / raw)
  To: dave.hansen, luto, peterz, tglx, mingo, bp, x86, hpa, linux-mm
  Cc: linux-kernel

On Wed, 14 Jun 2023, Lee Jones wrote:

> set_pgd() expects to be passed whole pages to operate on, whereas
> trampoline_pgd_entry is, as the name suggests, an entry.  The
> ramifications for using set_pgd() here are that the following thread of
> execution will not only place the suggested value into the
> trampoline_pgd_entry (8-Byte globally stored [.bss]) variable, PTI will
> also attempt to replicate that value into the non-existent neighboring
> user page (located +4k away), leading to the corruption of other global
> [.bss] stored variables.
> 
> Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
> Signed-off-by: Lee Jones <lee@kernel.org>
> ---
>  arch/x86/mm/kaslr.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)

Is there any more you need from me at this point?

Would you like me to resubmit with the Fixes: tag applied, or is
someone happy to apply it on merge?

> diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
> index 557f0fe25dff4..37db264866b64 100644
> --- a/arch/x86/mm/kaslr.c
> +++ b/arch/x86/mm/kaslr.c
> @@ -172,10 +172,10 @@ void __meminit init_trampoline_kaslr(void)
>  		set_p4d(p4d_tramp,
>  			__p4d(_KERNPG_TABLE | __pa(pud_page_tramp)));
>  
> -		set_pgd(&trampoline_pgd_entry,
> -			__pgd(_KERNPG_TABLE | __pa(p4d_page_tramp)));
> +		trampoline_pgd_entry =
> +			__pgd(_KERNPG_TABLE | __pa(p4d_page_tramp));
>  	} else {
> -		set_pgd(&trampoline_pgd_entry,
> -			__pgd(_KERNPG_TABLE | __pa(pud_page_tramp)));
> +		trampoline_pgd_entry =
> +			__pgd(_KERNPG_TABLE | __pa(pud_page_tramp));
>  	}
>  }
> -- 
> 2.41.0.162.gfafddb0af9-goog
> 

-- 
Lee Jones [李琼斯]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 1/1] x86/mm/KASLR: Store pud_page_tramp into entry rather than page
  2023-06-16 17:00   ` Lee Jones
@ 2023-06-19 10:43     ` Lee Jones
  0 siblings, 0 replies; 6+ messages in thread
From: Lee Jones @ 2023-06-19 10:43 UTC (permalink / raw)
  To: dave.hansen, luto, peterz, tglx, mingo, bp, x86, hpa, linux-mm
  Cc: linux-kernel

On Fri, 16 Jun 2023, Lee Jones wrote:

> On Wed, 14 Jun 2023, Lee Jones wrote:
> 
> > set_pgd() expects to be passed whole pages to operate on, whereas
> > trampoline_pgd_entry is, as the name suggests, an entry.  The
> > ramifications for using set_pgd() here are that the following thread of
> > execution will not only place the suggested value into the
> > trampoline_pgd_entry (8-Byte globally stored [.bss]) variable, PTI will
> > also attempt to replicate that value into the non-existent neighboring
> > user page (located +4k away), leading to the corruption of other global
> > [.bss] stored variables.
> > 
> > Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
> > Signed-off-by: Lee Jones <lee@kernel.org>
> > ---
> >  arch/x86/mm/kaslr.c | 8 ++++----
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> Is there any more you need from me at this point?
> 
> Would you like me to resubmit with the Fixes: tag applied, or is
> someone happy to apply it on merge?

Superstar, thanks Dave.

-- 
Lee Jones [李琼斯]

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2023-06-19 10:44 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-14 16:38 [PATCH v2 0/1] x86: Fix .bss corruption Lee Jones
2023-06-14 16:38 ` [PATCH v2 1/1] x86/mm/KASLR: Store pud_page_tramp into entry rather than page Lee Jones
2023-06-14 17:48   ` Nick Desaulniers
2023-06-14 18:43     ` Lee Jones
2023-06-16 17:00   ` Lee Jones
2023-06-19 10:43     ` Lee Jones

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).