linux-kernel.vger.kernel.org archive mirror
* [PATCH] x86/mm: Fix boot with some memory above MAXMEM
@ 2020-05-11 19:17 Kirill A. Shutemov
  2020-05-25  4:49 ` Kirill A. Shutemov
  0 siblings, 1 reply; 8+ messages in thread
From: Kirill A. Shutemov @ 2020-05-11 19:17 UTC (permalink / raw)
  To: Dave Hansen, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin
  Cc: Dan Williams, Tony Luck, x86, linux-mm, linux-kernel,
	Kirill A. Shutemov, Dave Hansen, stable

A 5-level paging capable machine can have memory above 46-bit in the
physical address space. This memory is only addressable in the 5-level
paging mode: we don't have enough virtual address space to create direct
mapping for such memory in the 4-level paging mode.

Currently, boot fails completely with a NULL pointer dereference in
subsection_map_init().

Instead, skip creating a memblock for such memory and notify the user
that some memory is not addressable.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Cc: stable@vger.kernel.org # v4.14
---

Tested with a hacked QEMU: https://gist.github.com/kiryl/d45eb54110944ff95e544972d8bdac1d
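
For readers following along, the clamping described above can be sketched
in plain userspace C. This is an illustration only, not the kernel code:
SKETCH_MAXMEM, struct sketch_entry and sketch_clamp() are hypothetical
names, the limit is hard-coded to 46 bits (the 4-level case), and the
sketch treats the limit as an exclusive bound, which differs from the
MAXMEM - 1 used in the diff below.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the 4-level paging limit, modeled as a plain constant. */
#define SKETCH_MAXMEM (1ULL << 46)

/* Hypothetical stand-in for an e820 table entry. */
struct sketch_entry {
	uint64_t addr;
	uint64_t size;
};

/*
 * Return how many bytes of the entry lie below SKETCH_MAXMEM and add
 * the remainder, if any, to *not_addressable.
 */
static uint64_t sketch_clamp(const struct sketch_entry *e,
			     uint64_t *not_addressable)
{
	uint64_t end, usable;

	/* The sketch assumes addr + size does not wrap around. */
	assert(e->addr + e->size >= e->addr);

	/* Entirely above the limit: nothing is usable. */
	if (e->addr >= SKETCH_MAXMEM) {
		*not_addressable += e->size;
		return 0;
	}

	/* Clip the exclusive end of the range to the limit. */
	end = e->addr + e->size;
	if (end > SKETCH_MAXMEM)
		end = SKETCH_MAXMEM;

	usable = end - e->addr;
	*not_addressable += e->size - usable;
	return usable;
}
```

An entry that straddles the limit contributes only its lower part, and
the rest is accumulated into the not-addressable total that the patch
reports.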

---
 arch/x86/kernel/e820.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index c5399e80c59c..d320d37d0f95 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -1280,8 +1280,8 @@ void __init e820__memory_setup(void)
 
 void __init e820__memblock_setup(void)
 {
+	u64 size, end, not_addressable = 0;
 	int i;
-	u64 end;
 
 	/*
 	 * The bootstrap memblock region count maximum is 128 entries
@@ -1307,7 +1307,22 @@ void __init e820__memblock_setup(void)
 		if (entry->type != E820_TYPE_RAM && entry->type != E820_TYPE_RESERVED_KERN)
 			continue;
 
-		memblock_add(entry->addr, entry->size);
+		if (entry->addr >= MAXMEM) {
+			not_addressable += entry->size;
+			continue;
+		}
+
+		end = min_t(u64, end, MAXMEM - 1);
+		size = end - entry->addr;
+		not_addressable += entry->size - size;
+		memblock_add(entry->addr, size);
+	}
+
+	if (not_addressable) {
+		pr_err("%lldGB of physical memory is not addressable in the paging mode\n",
+		       not_addressable >> 30);
+		if (!pgtable_l5_enabled())
+			pr_err("Consider enabling 5-level paging\n");
 	}
 
 	/* Throw away partial pages: */
-- 
2.26.2


* Re: [PATCH] x86/mm: Fix boot with some memory above MAXMEM
  2020-05-11 19:17 [PATCH] x86/mm: Fix boot with some memory above MAXMEM Kirill A. Shutemov
@ 2020-05-25  4:49 ` Kirill A. Shutemov
  2020-05-25 14:59   ` Mike Rapoport
  0 siblings, 1 reply; 8+ messages in thread
From: Kirill A. Shutemov @ 2020-05-25  4:49 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Dave Hansen, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin, Dan Williams,
	Tony Luck, x86, linux-mm, linux-kernel, Dave Hansen, stable

On Mon, May 11, 2020 at 10:17:21PM +0300, Kirill A. Shutemov wrote:
> A 5-level paging capable machine can have memory above 46-bit in the
> physical address space. This memory is only addressable in the 5-level
> paging mode: we don't have enough virtual address space to create direct
> mapping for such memory in the 4-level paging mode.
> 
> Currently, we fail boot completely: NULL pointer dereference in
> subsection_map_init().
> 
> Skip creating a memblock for such memory instead and notify user that
> some memory is not addressable.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Reviewed-by: Dave Hansen <dave.hansen@intel.com>
> Cc: stable@vger.kernel.org # v4.14
> ---

Gentle ping.

It's not urgent, but it's a bug fix. Please consider applying.

> Tested with a hacked QEMU: https://gist.github.com/kiryl/d45eb54110944ff95e544972d8bdac1d
> 
> ---
>  arch/x86/kernel/e820.c | 19 +++++++++++++++++--
>  1 file changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
> index c5399e80c59c..d320d37d0f95 100644
> --- a/arch/x86/kernel/e820.c
> +++ b/arch/x86/kernel/e820.c
> @@ -1280,8 +1280,8 @@ void __init e820__memory_setup(void)
>  
>  void __init e820__memblock_setup(void)
>  {
> +	u64 size, end, not_addressable = 0;
>  	int i;
> -	u64 end;
>  
>  	/*
>  	 * The bootstrap memblock region count maximum is 128 entries
> @@ -1307,7 +1307,22 @@ void __init e820__memblock_setup(void)
>  		if (entry->type != E820_TYPE_RAM && entry->type != E820_TYPE_RESERVED_KERN)
>  			continue;
>  
> -		memblock_add(entry->addr, entry->size);
> +		if (entry->addr >= MAXMEM) {
> +			not_addressable += entry->size;
> +			continue;
> +		}
> +
> +		end = min_t(u64, end, MAXMEM - 1);
> +		size = end - entry->addr;
> +		not_addressable += entry->size - size;
> +		memblock_add(entry->addr, size);
> +	}
> +
> +	if (not_addressable) {
> +		pr_err("%lldGB of physical memory is not addressable in the paging mode\n",
> +		       not_addressable >> 30);
> +		if (!pgtable_l5_enabled())
> +			pr_err("Consider enabling 5-level paging\n");
>  	}
>  
>  	/* Throw away partial pages: */
> -- 
> 2.26.2
> 
> 

-- 
 Kirill A. Shutemov

* Re: [PATCH] x86/mm: Fix boot with some memory above MAXMEM
  2020-05-25  4:49 ` Kirill A. Shutemov
@ 2020-05-25 14:59   ` Mike Rapoport
  2020-05-25 15:08     ` Kirill A. Shutemov
  0 siblings, 1 reply; 8+ messages in thread
From: Mike Rapoport @ 2020-05-25 14:59 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Kirill A. Shutemov, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Dan Williams, Tony Luck, x86, linux-mm, linux-kernel,
	Dave Hansen, stable

On Mon, May 25, 2020 at 07:49:02AM +0300, Kirill A. Shutemov wrote:
> On Mon, May 11, 2020 at 10:17:21PM +0300, Kirill A. Shutemov wrote:
> > A 5-level paging capable machine can have memory above 46-bit in the
> > physical address space. This memory is only addressable in the 5-level
> > paging mode: we don't have enough virtual address space to create direct
> > mapping for such memory in the 4-level paging mode.
> > 
> > Currently, we fail boot completely: NULL pointer dereference in
> > subsection_map_init().
> > 
> > Skip creating a memblock for such memory instead and notify user that
> > some memory is not addressable.
> > 
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > Reviewed-by: Dave Hansen <dave.hansen@intel.com>
> > Cc: stable@vger.kernel.org # v4.14
> > ---
> 
> Gentle ping.
> 
> It's not urgent, but it's a bug fix. Please consider applying.
> 
> > Tested with a hacked QEMU: https://gist.github.com/kiryl/d45eb54110944ff95e544972d8bdac1d
> > 
> > ---
> >  arch/x86/kernel/e820.c | 19 +++++++++++++++++--
> >  1 file changed, 17 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
> > index c5399e80c59c..d320d37d0f95 100644
> > --- a/arch/x86/kernel/e820.c
> > +++ b/arch/x86/kernel/e820.c
> > @@ -1280,8 +1280,8 @@ void __init e820__memory_setup(void)
> >  
> >  void __init e820__memblock_setup(void)
> >  {
> > +	u64 size, end, not_addressable = 0;
> >  	int i;
> > -	u64 end;
> >  
> >  	/*
> >  	 * The bootstrap memblock region count maximum is 128 entries
> > @@ -1307,7 +1307,22 @@ void __init e820__memblock_setup(void)
> >  		if (entry->type != E820_TYPE_RAM && entry->type != E820_TYPE_RESERVED_KERN)
> >  			continue;
> >  
> > -		memblock_add(entry->addr, entry->size);
> > +		if (entry->addr >= MAXMEM) {
> > +			not_addressable += entry->size;
> > +			continue;
> > +		}
> > +
> > +		end = min_t(u64, end, MAXMEM - 1);
> > +		size = end - entry->addr;
> > +		not_addressable += entry->size - size;
> > +		memblock_add(entry->addr, size);
> > +	}
> > +
> > +	if (not_addressable) {
> > +		pr_err("%lldGB of physical memory is not addressable in the paging mode\n",
> > +		       not_addressable >> 30);
> > +		if (!pgtable_l5_enabled())
> > +			pr_err("Consider enabling 5-level paging\n");

Could this happen at all when l5 is enabled?
Does it mean we need kmap() for 64-bit?

> >  	}
> >  
> >  	/* Throw away partial pages: */
> > -- 
> > 2.26.2
> > 
> > 
> 
> -- 
>  Kirill A. Shutemov
> 

-- 
Sincerely yours,
Mike.

* Re: [PATCH] x86/mm: Fix boot with some memory above MAXMEM
  2020-05-25 14:59   ` Mike Rapoport
@ 2020-05-25 15:08     ` Kirill A. Shutemov
  2020-05-25 15:58       ` Mike Rapoport
  2020-05-26 14:27       ` Dave Hansen
  0 siblings, 2 replies; 8+ messages in thread
From: Kirill A. Shutemov @ 2020-05-25 15:08 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Kirill A. Shutemov, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Dan Williams, Tony Luck, x86, linux-mm, linux-kernel,
	Dave Hansen, stable

On Mon, May 25, 2020 at 05:59:43PM +0300, Mike Rapoport wrote:
> On Mon, May 25, 2020 at 07:49:02AM +0300, Kirill A. Shutemov wrote:
> > On Mon, May 11, 2020 at 10:17:21PM +0300, Kirill A. Shutemov wrote:
> > > A 5-level paging capable machine can have memory above 46-bit in the
> > > physical address space. This memory is only addressable in the 5-level
> > > paging mode: we don't have enough virtual address space to create direct
> > > mapping for such memory in the 4-level paging mode.
> > > 
> > > Currently, we fail boot completely: NULL pointer dereference in
> > > subsection_map_init().
> > > 
> > > Skip creating a memblock for such memory instead and notify user that
> > > some memory is not addressable.
> > > 
> > > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > > Reviewed-by: Dave Hansen <dave.hansen@intel.com>
> > > Cc: stable@vger.kernel.org # v4.14
> > > ---
> > 
> > Gentle ping.
> > 
> > It's not urgent, but it's a bug fix. Please consider applying.
> > 
> > > Tested with a hacked QEMU: https://gist.github.com/kiryl/d45eb54110944ff95e544972d8bdac1d
> > > 
> > > ---
> > >  arch/x86/kernel/e820.c | 19 +++++++++++++++++--
> > >  1 file changed, 17 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
> > > index c5399e80c59c..d320d37d0f95 100644
> > > --- a/arch/x86/kernel/e820.c
> > > +++ b/arch/x86/kernel/e820.c
> > > @@ -1280,8 +1280,8 @@ void __init e820__memory_setup(void)
> > >  
> > >  void __init e820__memblock_setup(void)
> > >  {
> > > +	u64 size, end, not_addressable = 0;
> > >  	int i;
> > > -	u64 end;
> > >  
> > >  	/*
> > >  	 * The bootstrap memblock region count maximum is 128 entries
> > > @@ -1307,7 +1307,22 @@ void __init e820__memblock_setup(void)
> > >  		if (entry->type != E820_TYPE_RAM && entry->type != E820_TYPE_RESERVED_KERN)
> > >  			continue;
> > >  
> > > -		memblock_add(entry->addr, entry->size);
> > > +		if (entry->addr >= MAXMEM) {
> > > +			not_addressable += entry->size;
> > > +			continue;
> > > +		}
> > > +
> > > +		end = min_t(u64, end, MAXMEM - 1);
> > > +		size = end - entry->addr;
> > > +		not_addressable += entry->size - size;
> > > +		memblock_add(entry->addr, size);
> > > +	}
> > > +
> > > +	if (not_addressable) {
> > > +		pr_err("%lldGB of physical memory is not addressable in the paging mode\n",
> > > +		       not_addressable >> 30);
> > > +		if (!pgtable_l5_enabled())
> > > +			pr_err("Consider enabling 5-level paging\n");
> 
> Could this happen at all when l5 is enabled?
> Does it mean we need kmap() for 64-bit?

It's future-proofing. Who knows what paging modes we would have in the
future.

-- 
 Kirill A. Shutemov

* Re: [PATCH] x86/mm: Fix boot with some memory above MAXMEM
  2020-05-25 15:08     ` Kirill A. Shutemov
@ 2020-05-25 15:58       ` Mike Rapoport
  2020-05-26 14:27       ` Dave Hansen
  1 sibling, 0 replies; 8+ messages in thread
From: Mike Rapoport @ 2020-05-25 15:58 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Kirill A. Shutemov, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Dan Williams, Tony Luck, x86, linux-mm, linux-kernel,
	Dave Hansen, stable

On Mon, May 25, 2020 at 06:08:20PM +0300, Kirill A. Shutemov wrote:
> On Mon, May 25, 2020 at 05:59:43PM +0300, Mike Rapoport wrote:
> > On Mon, May 25, 2020 at 07:49:02AM +0300, Kirill A. Shutemov wrote:
> > > On Mon, May 11, 2020 at 10:17:21PM +0300, Kirill A. Shutemov wrote:
> > > > A 5-level paging capable machine can have memory above 46-bit in the
> > > > physical address space. This memory is only addressable in the 5-level
> > > > paging mode: we don't have enough virtual address space to create direct
> > > > mapping for such memory in the 4-level paging mode.
> > > > 
> > > > Currently, we fail boot completely: NULL pointer dereference in
> > > > subsection_map_init().
> > > > 
> > > > Skip creating a memblock for such memory instead and notify user that
> > > > some memory is not addressable.
> > > > 
> > > > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > > > Reviewed-by: Dave Hansen <dave.hansen@intel.com>
> > > > Cc: stable@vger.kernel.org # v4.14
> > > > ---
> > > 
> > > Gentle ping.
> > > 
> > > It's not urgent, but it's a bug fix. Please consider applying.
> > > 
> > > > Tested with a hacked QEMU: https://gist.github.com/kiryl/d45eb54110944ff95e544972d8bdac1d
> > > > 
> > > > ---
> > > >  arch/x86/kernel/e820.c | 19 +++++++++++++++++--
> > > >  1 file changed, 17 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
> > > > index c5399e80c59c..d320d37d0f95 100644
> > > > --- a/arch/x86/kernel/e820.c
> > > > +++ b/arch/x86/kernel/e820.c
> > > > @@ -1280,8 +1280,8 @@ void __init e820__memory_setup(void)
> > > >  
> > > >  void __init e820__memblock_setup(void)
> > > >  {
> > > > +	u64 size, end, not_addressable = 0;
> > > >  	int i;
> > > > -	u64 end;
> > > >  
> > > >  	/*
> > > >  	 * The bootstrap memblock region count maximum is 128 entries
> > > > @@ -1307,7 +1307,22 @@ void __init e820__memblock_setup(void)
> > > >  		if (entry->type != E820_TYPE_RAM && entry->type != E820_TYPE_RESERVED_KERN)
> > > >  			continue;
> > > >  
> > > > -		memblock_add(entry->addr, entry->size);
> > > > +		if (entry->addr >= MAXMEM) {
> > > > +			not_addressable += entry->size;
> > > > +			continue;
> > > > +		}
> > > > +
> > > > +		end = min_t(u64, end, MAXMEM - 1);
> > > > +		size = end - entry->addr;
> > > > +		not_addressable += entry->size - size;
> > > > +		memblock_add(entry->addr, size);
> > > > +	}
> > > > +
> > > > +	if (not_addressable) {
> > > > +		pr_err("%lldGB of physical memory is not addressable in the paging mode\n",
> > > > +		       not_addressable >> 30);
> > > > +		if (!pgtable_l5_enabled())
> > > > +			pr_err("Consider enabling 5-level paging\n");
> > 
> > Could this happen at all when l5 is enabled?
> > Does it mean we need kmap() for 64-bit?
> 
> It's future-proofing. Who knows what paging modes we would have in the
> future.

Then maybe

	pr_err("%lldGB of physical memory is not addressable in the %s paging mode\n",
	       not_addressable >> 30, pgtable_l5_enabled() ? "5-level" : "4-level");

"the paging mode" on its own sounds a bit awkward to me.

> -- 
>  Kirill A. Shutemov

-- 
Sincerely yours,
Mike.

* Re: [PATCH] x86/mm: Fix boot with some memory above MAXMEM
  2020-05-25 15:08     ` Kirill A. Shutemov
  2020-05-25 15:58       ` Mike Rapoport
@ 2020-05-26 14:27       ` Dave Hansen
  2020-06-02 23:18         ` Kirill A. Shutemov
  1 sibling, 1 reply; 8+ messages in thread
From: Dave Hansen @ 2020-05-26 14:27 UTC (permalink / raw)
  To: Kirill A. Shutemov, Mike Rapoport
  Cc: Kirill A. Shutemov, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Dan Williams, Tony Luck, x86, linux-mm, linux-kernel, stable

On 5/25/20 8:08 AM, Kirill A. Shutemov wrote:
>>>> +	if (not_addressable) {
>>>> +		pr_err("%lldGB of physical memory is not addressable in the paging mode\n",
>>>> +		       not_addressable >> 30);
>>>> +		if (!pgtable_l5_enabled())
>>>> +			pr_err("Consider enabling 5-level paging\n");
>> Could this happen at all when l5 is enabled?
>> Does it mean we need kmap() for 64-bit?
> It's future-proofing. Who knows what paging modes we would have in the
> future.

Future-proofing and firmware-proofing. :)

In any case, are we *really* limited to 52 bits of physical memory with
5-level paging?  Previously, we said we were limited to 46 bits, and now
we're saying that the limit is 52 with 5-level paging:

#define MAX_PHYSMEM_BITS (pgtable_l5_enabled() ? 52 : 46)

The 46 was fine with the 48 bits of address space on 4-level paging
systems since we need 1/2 of the address space for userspace, 1/4 for
the direct map and 1/4 for the vmalloc-and-friends area.  At 46 bits of
address space, we fill up the direct map.

The hardware designers know this and never enumerated a MAXPHYADDR from
CPUID which was higher than what we could cover with 46 bits.  It was
nice and convenient that these two separate things matched:
1. The amount of physical address space addressable in a direct map
   consuming 1/4 of the virtual address space.
2. The CPU-enumerated MAXPHYADDR which among other things dictates how
   much physical address space is addressable in a PTE.

But, with 5-level paging, things are a little different.  The limit in
addressable memory because of running out of the direct map actually
happens at 55 bits (57-2=55, analogous to the 4-level 48-2=46).

So shouldn't it technically be this:

#define MAX_PHYSMEM_BITS (pgtable_l5_enabled() ? 55 : 46)

?
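
The arithmetic above can be written out as a tiny C sketch. The helper
name sketch_direct_map_bits() is hypothetical, and the only assumption
is the layout described in this mail: the direct map takes 1/4 of the
virtual address space, which costs two address bits.

```c
#include <assert.h>

/*
 * Physical address bits coverable by a direct map that occupies one
 * quarter of a virtual address space of the given width.
 */
static unsigned int sketch_direct_map_bits(unsigned int virt_bits)
{
	/* A quarter of 2^virt_bits is 2^(virt_bits - 2). */
	assert(virt_bits > 2);
	return virt_bits - 2;
}
```

With 48 virtual bits this gives 46, and with 57 virtual bits it gives
55, matching the numbers quoted above.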

* Re: [PATCH] x86/mm: Fix boot with some memory above MAXMEM
  2020-05-26 14:27       ` Dave Hansen
@ 2020-06-02 23:18         ` Kirill A. Shutemov
  2020-06-03 19:18           ` Dave Hansen
  0 siblings, 1 reply; 8+ messages in thread
From: Kirill A. Shutemov @ 2020-06-02 23:18 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Mike Rapoport, Kirill A. Shutemov, Dave Hansen, Andy Lutomirski,
	Peter Zijlstra, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Dan Williams, Tony Luck, x86, linux-mm,
	linux-kernel, stable

On Tue, May 26, 2020 at 07:27:15AM -0700, Dave Hansen wrote:
> On 5/25/20 8:08 AM, Kirill A. Shutemov wrote:
> >>>> +	if (not_addressable) {
> >>>> +		pr_err("%lldGB of physical memory is not addressable in the paging mode\n",
> >>>> +		       not_addressable >> 30);
> >>>> +		if (!pgtable_l5_enabled())
> >>>> +			pr_err("Consider enabling 5-level paging\n");
> >> Could this happen at all when l5 is enabled?
> >> Does it mean we need kmap() for 64-bit?
> > > It's future-proofing. Who knows what paging modes we would have in the
> > future.
> 
> Future-proofing and firmware-proofing. :)
> 
> In any case, are we *really* limited to 52 bits of physical memory with
> 5-level paging?

Yes. It's architectural. SDM says "MAXPHYADDR is at most 52" (Vol 3A,
4.1.4).

I guess it can be extended with an opt-in feature and relevant changes to
page table structure. But as of today there's no such thing.

> Previously, we said we were limited to 46 bits, and now
> we're saying that the limit is 52 with 5-level paging:
> 
> #define MAX_PHYSMEM_BITS (pgtable_l5_enabled() ? 52 : 46)
> 
> The 46 was fine with the 48 bits of address space on 4-level paging
> systems since we need 1/2 of the address space for userspace, 1/4 for
> the direct map and 1/4 for the vmalloc-and-friends area.  At 46 bits of
> address space, we fill up the direct map.
> 
> The hardware designers know this and never enumerated a MAXPHYADDR from
> CPUID which was higher than what we could cover with 46 bits.  It was
> nice and convenient that these two separate things matched:
> 1. The amount of physical address space addressable in a direct map
>    consuming 1/4 of the virtual address space.
> 2. The CPU-enumerated MAXPHYADDR which among other things dictates how
>    much physical address space is addressable in a PTE.
> 
> But, with 5-level paging, things are a little different.  The limit in
> addressable memory because of running out of the direct map actually
> happens at 55 bits (57-2=55, analogous to the 4-level 48-2=46).
> 
> So shouldn't it technically be this:
> 
> #define MAX_PHYSMEM_BITS (pgtable_l5_enabled() ? 55 : 46)
> 
> ?

Bits above 52 are ignored in the page table entries and accessible to
software. Some of them got claimed by HW features (XD-bit, protection
keys), but such features require explicit opt-in on the software side.

The kernel could claim bits 53-55 for the physical address, but it doesn't
get us anything: if future HW provided such a feature, it would require
opt-in. On the other hand, claiming them now means we cannot use them for
other purposes as SW bits. I don't see the point.
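
To make the two limits explicit, here is a small userspace sketch in C.
The names SKETCH_ARCH_MAXPHYADDR and sketch_phys_bits() are hypothetical.
It assumes only what is stated in this thread: the direct map covers
(virtual bits - 2) bits, and MAXPHYADDR is architecturally at most 52.

```c
#include <assert.h>

/* Architectural cap quoted from the SDM in this thread. */
#define SKETCH_ARCH_MAXPHYADDR 52u

/*
 * Usable physical address bits: the smaller of what the direct map can
 * cover and what the architecture allows in a page table entry.
 */
static unsigned int sketch_phys_bits(unsigned int virt_bits)
{
	unsigned int direct_map_bits;

	assert(virt_bits > 2);
	direct_map_bits = virt_bits - 2;

	return direct_map_bits < SKETCH_ARCH_MAXPHYADDR ?
	       direct_map_bits : SKETCH_ARCH_MAXPHYADDR;
}
```

For 48 virtual bits the direct map is the binding limit (46). For 57
virtual bits the architectural cap is (52). Those are the two values in
the existing MAX_PHYSMEM_BITS definition.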

-- 
 Kirill A. Shutemov

* Re: [PATCH] x86/mm: Fix boot with some memory above MAXMEM
  2020-06-02 23:18         ` Kirill A. Shutemov
@ 2020-06-03 19:18           ` Dave Hansen
  0 siblings, 0 replies; 8+ messages in thread
From: Dave Hansen @ 2020-06-03 19:18 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Mike Rapoport, Kirill A. Shutemov, Dave Hansen, Andy Lutomirski,
	Peter Zijlstra, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Dan Williams, Tony Luck, x86, linux-mm,
	linux-kernel, stable

On 6/2/20 4:18 PM, Kirill A. Shutemov wrote:
> On Tue, May 26, 2020 at 07:27:15AM -0700, Dave Hansen wrote:
>> On 5/25/20 8:08 AM, Kirill A. Shutemov wrote:
>>>>>> +	if (not_addressable) {
>>>>>> +		pr_err("%lldGB of physical memory is not addressable in the paging mode\n",
>>>>>> +		       not_addressable >> 30);
>>>>>> +		if (!pgtable_l5_enabled())
>>>>>> +			pr_err("Consider enabling 5-level paging\n");
>>>> Could this happen at all when l5 is enabled?
>>>> Does it mean we need kmap() for 64-bit?
>>> It's future-proofing. Who knows what paging modes we would have in the
>>> future.
>>
>> Future-proofing and firmware-proofing. :)
>>
>> In any case, are we *really* limited to 52 bits of physical memory with
>> 5-level paging?
> 
> Yes. It's architectural. SDM says "MAXPHYADDR is at most 52" (Vol 3A,
> 4.1.4).

Right you are.

I'm glad it's in the architecture.  Makes all of this a lot easier!

>> So shouldn't it technically be this:
>>
>> #define MAX_PHYSMEM_BITS (pgtable_l5_enabled() ? 55 : 46)
>>
>> ?
> 
> Bits above 52 are ignored in the page table entries and accessible to
> software. Some of them got claimed by HW features (XD-bit, protection
> keys), but such features require explicit opt-in on the software side.
> 
> The kernel could claim bits 53-55 for the physical address, but it doesn't
> get us anything: if future HW provided such a feature, it would require
> opt-in. On the other hand, claiming them now means we cannot use them for
> other purposes as SW bits. I don't see the point.

Yep, agreed.

Thread overview: 8+ messages
2020-05-11 19:17 [PATCH] x86/mm: Fix boot with some memory above MAXMEM Kirill A. Shutemov
2020-05-25  4:49 ` Kirill A. Shutemov
2020-05-25 14:59   ` Mike Rapoport
2020-05-25 15:08     ` Kirill A. Shutemov
2020-05-25 15:58       ` Mike Rapoport
2020-05-26 14:27       ` Dave Hansen
2020-06-02 23:18         ` Kirill A. Shutemov
2020-06-03 19:18           ` Dave Hansen