kvmarm.lists.cs.columbia.edu archive mirror
* [PATCH] arm64: kvm: handle 52-bit VA regions correctly under nVHE
@ 2021-03-30 11:21 Ard Biesheuvel
  2021-03-30 12:44 ` Marc Zyngier
  0 siblings, 1 reply; 8+ messages in thread
From: Ard Biesheuvel @ 2021-03-30 11:21 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: will, maz, anshuman.khandual, catalin.marinas, kernel-team, kvmarm

Commit f4693c2716b35d08 ("arm64: mm: extend linear region for 52-bit VA
configurations") introduced a new layout for the 52-bit VA space, in
order to maximize the space available to the linear region. After this
change, the kernel VA space is no longer split 1:1 down the middle, and
as it turns out, this violates an assumption in the KVM init code when
it chooses the layout for the nVHE EL2 mapping.

Given that EFI does not support 52-bit VA addressing (as it only
supports 4k pages), and that in general, loaders cannot assume that the
kernel being loaded supports 52-bit VA/PA addressing in the first place,
we can safely assume that the kernel, and therefore the .idmap section,
will be 48-bit addressable on 52-bit VA capable systems.

So in this case, organize the nVHE EL2 address space as a 2^48 byte
window starting at address 0x0, containing the ID map and the
hypervisor's private mappings, followed by a contiguous 2^52 - 2^48 byte
linear region. (Note that EL1's linear region is 2^52 - 2^47 bytes in
size, so it is slightly larger, but this only matters on systems where
the DRAM footprint in the physical memory map exceeds 3968 TB)

Fixes: f4693c2716b35d08 ("arm64: mm: extend linear region for 52-bit VA configurations")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 Documentation/arm64/booting.rst |  6 +++---
 arch/arm64/kvm/va_layout.c      | 18 ++++++++++++++----
 2 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/Documentation/arm64/booting.rst b/Documentation/arm64/booting.rst
index 7552dbc1cc54..418ec9b63d2c 100644
--- a/Documentation/arm64/booting.rst
+++ b/Documentation/arm64/booting.rst
@@ -121,8 +121,8 @@ Header notes:
 			  to the base of DRAM, since memory below it is not
 			  accessible via the linear mapping
 			1
-			  2MB aligned base may be anywhere in physical
-			  memory
+			  2MB aligned base may be anywhere in the 48-bit
+			  addressable physical memory region
   Bits 4-63	Reserved.
   ============= ===============================================================
 
@@ -132,7 +132,7 @@ Header notes:
   depending on selected features, and is effectively unbound.
 
 The Image must be placed text_offset bytes from a 2MB aligned base
-address anywhere in usable system RAM and called there. The region
+address in 48-bit addressable system RAM and called there. The region
 between the 2 MB aligned base address and the start of the image has no
 special significance to the kernel, and may be used for other purposes.
 At least image_size bytes from the start of the image must be free for
diff --git a/arch/arm64/kvm/va_layout.c b/arch/arm64/kvm/va_layout.c
index 978301392d67..e9ab449de197 100644
--- a/arch/arm64/kvm/va_layout.c
+++ b/arch/arm64/kvm/va_layout.c
@@ -62,9 +62,19 @@ __init void kvm_compute_layout(void)
 	phys_addr_t idmap_addr = __pa_symbol(__hyp_idmap_text_start);
 	u64 hyp_va_msb;
 
-	/* Where is my RAM region? */
-	hyp_va_msb  = idmap_addr & BIT(vabits_actual - 1);
-	hyp_va_msb ^= BIT(vabits_actual - 1);
+	/*
+	 * On LVA capable hardware, the kernel is guaranteed to reside
+	 * in the 48-bit addressable part of physical memory, and so
+	 * the idmap will be located there as well. Put the EL2 linear
+	 * region right after it, where it can grow upward to fill the
+	 * entire 52-bit VA region.
+	 */
+	if (vabits_actual > VA_BITS_MIN) {
+		hyp_va_msb = BIT(VA_BITS_MIN);
+	} else {
+		hyp_va_msb  = idmap_addr & BIT(vabits_actual - 1);
+		hyp_va_msb ^= BIT(vabits_actual - 1);
+	}
 
 	tag_lsb = fls64((u64)phys_to_virt(memblock_start_of_DRAM()) ^
 			(u64)(high_memory - 1));
@@ -72,7 +82,7 @@ __init void kvm_compute_layout(void)
 	va_mask = GENMASK_ULL(tag_lsb - 1, 0);
 	tag_val = hyp_va_msb;
 
-	if (IS_ENABLED(CONFIG_RANDOMIZE_BASE) && tag_lsb != (vabits_actual - 1)) {
+	if (IS_ENABLED(CONFIG_RANDOMIZE_BASE) && tag_lsb < (vabits_actual - 1)) {
 		/* We have some free bits to insert a random tag. */
 		tag_val |= get_random_long() & GENMASK_ULL(vabits_actual - 2, tag_lsb);
 	}
-- 
2.31.0.291.g576ba9dcdaf-goog

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


* Re: [PATCH] arm64: kvm: handle 52-bit VA regions correctly under nVHE
  2021-03-30 11:21 [PATCH] arm64: kvm: handle 52-bit VA regions correctly under nVHE Ard Biesheuvel
@ 2021-03-30 12:44 ` Marc Zyngier
  2021-03-30 12:49   ` Ard Biesheuvel
  0 siblings, 1 reply; 8+ messages in thread
From: Marc Zyngier @ 2021-03-30 12:44 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: will, anshuman.khandual, catalin.marinas, kernel-team, kvmarm,
	linux-arm-kernel

On Tue, 30 Mar 2021 12:21:26 +0100,
Ard Biesheuvel <ardb@kernel.org> wrote:
> 
> Commit f4693c2716b35d08 ("arm64: mm: extend linear region for 52-bit VA
> configurations") introduced a new layout for the 52-bit VA space, in
> order to maximize the space available to the linear region. After this
> change, the kernel VA space is no longer split 1:1 down the middle, and
> as it turns out, this violates an assumption in the KVM init code when
> it chooses the layout for the nVHE EL2 mapping.
> 
> Given that EFI does not support 52-bit VA addressing (as it only
> supports 4k pages), and that in general, loaders cannot assume that the
> kernel being loaded supports 52-bit VA/PA addressing in the first place,
> we can safely assume that the kernel, and therefore the .idmap section,
> will be 48-bit addressable on 52-bit VA capable systems.
> 
> So in this case, organize the nVHE EL2 address space as a 2^48 byte
> window starting at address 0x0, containing the ID map and the
> hypervisor's private mappings, followed by a contiguous 2^52 - 2^48 byte
> linear region. (Note that EL1's linear region is 2^52 - 2^47 bytes in
> size, so it is slightly larger, but this only matters on systems where
> the DRAM footprint in the physical memory map exceeds 3968 TB)

So if I have memory in the [2^52 - 2^48, 2^52 - 2^47] range, not
necessarily because I have that much memory, but because my system has
multiple memory banks, one of which lands on that spot, I cannot map
such memory at EL2. We'll explode at run time.

Can we keep the private mapping to 47 bits and restore the missing
chunk to the linear mapping? Of course, it means that the linear map
is now potentially not linear anymore, so we'd have to guarantee that
the kernel lands in the first 2^47 bytes instead. Crap.

> 
> Fixes: f4693c2716b35d08 ("arm64: mm: extend linear region for 52-bit VA configurations")
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
>  Documentation/arm64/booting.rst |  6 +++---
>  arch/arm64/kvm/va_layout.c      | 18 ++++++++++++++----
>  2 files changed, 17 insertions(+), 7 deletions(-)
> 
> diff --git a/Documentation/arm64/booting.rst b/Documentation/arm64/booting.rst
> index 7552dbc1cc54..418ec9b63d2c 100644
> --- a/Documentation/arm64/booting.rst
> +++ b/Documentation/arm64/booting.rst
> @@ -121,8 +121,8 @@ Header notes:
>  			  to the base of DRAM, since memory below it is not
>  			  accessible via the linear mapping
>  			1
> -			  2MB aligned base may be anywhere in physical
> -			  memory
> +			  2MB aligned base may be anywhere in the 48-bit
> +			  addressable physical memory region
>    Bits 4-63	Reserved.
>    ============= ===============================================================
>  
> @@ -132,7 +132,7 @@ Header notes:
>    depending on selected features, and is effectively unbound.
>  
>  The Image must be placed text_offset bytes from a 2MB aligned base
> -address anywhere in usable system RAM and called there. The region
> +address in 48-bit addressable system RAM and called there. The region
>  between the 2 MB aligned base address and the start of the image has no
>  special significance to the kernel, and may be used for other purposes.
>  At least image_size bytes from the start of the image must be free for
> diff --git a/arch/arm64/kvm/va_layout.c b/arch/arm64/kvm/va_layout.c
> index 978301392d67..e9ab449de197 100644
> --- a/arch/arm64/kvm/va_layout.c
> +++ b/arch/arm64/kvm/va_layout.c
> @@ -62,9 +62,19 @@ __init void kvm_compute_layout(void)
>  	phys_addr_t idmap_addr = __pa_symbol(__hyp_idmap_text_start);
>  	u64 hyp_va_msb;
>  
> -	/* Where is my RAM region? */
> -	hyp_va_msb  = idmap_addr & BIT(vabits_actual - 1);
> -	hyp_va_msb ^= BIT(vabits_actual - 1);
> +	/*
> +	 * On LVA capable hardware, the kernel is guaranteed to reside
> +	 * in the 48-bit addressable part of physical memory, and so
> +	 * the idmap will be located there as well. Put the EL2 linear
> +	 * region right after it, where it can grow upward to fill the
> +	 * entire 52-bit VA region.
> +	 */
> +	if (vabits_actual > VA_BITS_MIN) {
> +		hyp_va_msb = BIT(VA_BITS_MIN);
> +	} else {
> +		hyp_va_msb  = idmap_addr & BIT(vabits_actual - 1);
> +		hyp_va_msb ^= BIT(vabits_actual - 1);
> +	}
>  
>  	tag_lsb = fls64((u64)phys_to_virt(memblock_start_of_DRAM()) ^
>  			(u64)(high_memory - 1));
> @@ -72,7 +82,7 @@ __init void kvm_compute_layout(void)
>  	va_mask = GENMASK_ULL(tag_lsb - 1, 0);
>  	tag_val = hyp_va_msb;
>  
> -	if (IS_ENABLED(CONFIG_RANDOMIZE_BASE) && tag_lsb != (vabits_actual - 1)) {
> +	if (IS_ENABLED(CONFIG_RANDOMIZE_BASE) && tag_lsb < (vabits_actual - 1)) {
>  		/* We have some free bits to insert a random tag. */
>  		tag_val |= get_random_long() & GENMASK_ULL(vabits_actual - 2, tag_lsb);
>  	}

It seems __create_hyp_private_mapping() still refers to (VA_BITS - 1)
to choose where to allocate the IO mappings, and
__pkvm_create_private_mapping() relies on similar assumptions, based
on what hyp_create_idmap() sets up.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

* Re: [PATCH] arm64: kvm: handle 52-bit VA regions correctly under nVHE
  2021-03-30 12:44 ` Marc Zyngier
@ 2021-03-30 12:49   ` Ard Biesheuvel
  2021-03-30 13:04     ` Marc Zyngier
  0 siblings, 1 reply; 8+ messages in thread
From: Ard Biesheuvel @ 2021-03-30 12:49 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Will Deacon, Anshuman Khandual, Catalin Marinas,
	Android Kernel Team, kvmarm, Linux ARM

On Tue, 30 Mar 2021 at 14:44, Marc Zyngier <maz@kernel.org> wrote:
>
> On Tue, 30 Mar 2021 12:21:26 +0100,
> Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> > Commit f4693c2716b35d08 ("arm64: mm: extend linear region for 52-bit VA
> > configurations") introduced a new layout for the 52-bit VA space, in
> > order to maximize the space available to the linear region. After this
> > change, the kernel VA space is no longer split 1:1 down the middle, and
> > as it turns out, this violates an assumption in the KVM init code when
> > it chooses the layout for the nVHE EL2 mapping.
> >
> > Given that EFI does not support 52-bit VA addressing (as it only
> > supports 4k pages), and that in general, loaders cannot assume that the
> > kernel being loaded supports 52-bit VA/PA addressing in the first place,
> > we can safely assume that the kernel, and therefore the .idmap section,
> > will be 48-bit addressable on 52-bit VA capable systems.
> >
> > So in this case, organize the nVHE EL2 address space as a 2^48 byte
> > window starting at address 0x0, containing the ID map and the
> > hypervisor's private mappings, followed by a contiguous 2^52 - 2^48 byte
> > linear region. (Note that EL1's linear region is 2^52 - 2^47 bytes in
> > size, so it is slightly larger, but this only matters on systems where
> > the DRAM footprint in the physical memory map exceeds 3968 TB)
>
> So if I have memory in the [2^52 - 2^48, 2^52 - 2^47] range, not
> necessarily because I have that much memory, but because my system has
> multiple memory banks, one of which lands on that spot, I cannot map
> such memory at EL2. We'll explode at run time.
>
> Can we keep the private mapping to 47 bits and restore the missing
> chunk to the linear mapping? Of course, it means that the linear map
> is now potentially not linear anymore, so we'd have to guarantee that
> the kernel lands in the first 2^47 bytes instead. Crap.
>

Yeah. The linear region needs to be contiguous. Alternatively, we
could restrict the upper address limit for loading the kernel to 47
bits.

> >
> > Fixes: f4693c2716b35d08 ("arm64: mm: extend linear region for 52-bit VA configurations")
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > ---
> >  Documentation/arm64/booting.rst |  6 +++---
> >  arch/arm64/kvm/va_layout.c      | 18 ++++++++++++++----
> >  2 files changed, 17 insertions(+), 7 deletions(-)
> >
> > diff --git a/Documentation/arm64/booting.rst b/Documentation/arm64/booting.rst
> > index 7552dbc1cc54..418ec9b63d2c 100644
> > --- a/Documentation/arm64/booting.rst
> > +++ b/Documentation/arm64/booting.rst
> > @@ -121,8 +121,8 @@ Header notes:
> >                         to the base of DRAM, since memory below it is not
> >                         accessible via the linear mapping
> >                       1
> > -                       2MB aligned base may be anywhere in physical
> > -                       memory
> > +                       2MB aligned base may be anywhere in the 48-bit
> > +                       addressable physical memory region
> >    Bits 4-63  Reserved.
> >    ============= ===============================================================
> >
> > @@ -132,7 +132,7 @@ Header notes:
> >    depending on selected features, and is effectively unbound.
> >
> >  The Image must be placed text_offset bytes from a 2MB aligned base
> > -address anywhere in usable system RAM and called there. The region
> > +address in 48-bit addressable system RAM and called there. The region
> >  between the 2 MB aligned base address and the start of the image has no
> >  special significance to the kernel, and may be used for other purposes.
> >  At least image_size bytes from the start of the image must be free for
> > diff --git a/arch/arm64/kvm/va_layout.c b/arch/arm64/kvm/va_layout.c
> > index 978301392d67..e9ab449de197 100644
> > --- a/arch/arm64/kvm/va_layout.c
> > +++ b/arch/arm64/kvm/va_layout.c
> > @@ -62,9 +62,19 @@ __init void kvm_compute_layout(void)
> >       phys_addr_t idmap_addr = __pa_symbol(__hyp_idmap_text_start);
> >       u64 hyp_va_msb;
> >
> > -     /* Where is my RAM region? */
> > -     hyp_va_msb  = idmap_addr & BIT(vabits_actual - 1);
> > -     hyp_va_msb ^= BIT(vabits_actual - 1);
> > +     /*
> > +      * On LVA capable hardware, the kernel is guaranteed to reside
> > +      * in the 48-bit addressable part of physical memory, and so
> > +      * the idmap will be located there as well. Put the EL2 linear
> > +      * region right after it, where it can grow upward to fill the
> > +      * entire 52-bit VA region.
> > +      */
> > +     if (vabits_actual > VA_BITS_MIN) {
> > +             hyp_va_msb = BIT(VA_BITS_MIN);
> > +     } else {
> > +             hyp_va_msb  = idmap_addr & BIT(vabits_actual - 1);
> > +             hyp_va_msb ^= BIT(vabits_actual - 1);
> > +     }
> >
> >       tag_lsb = fls64((u64)phys_to_virt(memblock_start_of_DRAM()) ^
> >                       (u64)(high_memory - 1));
> > @@ -72,7 +82,7 @@ __init void kvm_compute_layout(void)
> >       va_mask = GENMASK_ULL(tag_lsb - 1, 0);
> >       tag_val = hyp_va_msb;
> >
> > -     if (IS_ENABLED(CONFIG_RANDOMIZE_BASE) && tag_lsb != (vabits_actual - 1)) {
> > +     if (IS_ENABLED(CONFIG_RANDOMIZE_BASE) && tag_lsb < (vabits_actual - 1)) {
> >               /* We have some free bits to insert a random tag. */
> >               tag_val |= get_random_long() & GENMASK_ULL(vabits_actual - 2, tag_lsb);
> >       }
>
> It seems __create_hyp_private_mapping() still refers to (VA_BITS - 1)
> to choose where to allocate the IO mappings, and
> __pkvm_create_private_mapping() relies on similar assumptions, based
> on what hyp_create_idmap() sets up.
>

That was probably broken already then, given that it should refer to
vabits_actual. I'll address that in a separate patch.

* Re: [PATCH] arm64: kvm: handle 52-bit VA regions correctly under nVHE
  2021-03-30 12:49   ` Ard Biesheuvel
@ 2021-03-30 13:04     ` Marc Zyngier
  2021-03-30 13:15       ` Ard Biesheuvel
  0 siblings, 1 reply; 8+ messages in thread
From: Marc Zyngier @ 2021-03-30 13:04 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Will Deacon, Anshuman Khandual, Catalin Marinas,
	Android Kernel Team, kvmarm, Linux ARM

On Tue, 30 Mar 2021 13:49:18 +0100,
Ard Biesheuvel <ardb@kernel.org> wrote:
> 
> On Tue, 30 Mar 2021 at 14:44, Marc Zyngier <maz@kernel.org> wrote:
> >
> > On Tue, 30 Mar 2021 12:21:26 +0100,
> > Ard Biesheuvel <ardb@kernel.org> wrote:
> > >
> > > Commit f4693c2716b35d08 ("arm64: mm: extend linear region for 52-bit VA
> > > configurations") introduced a new layout for the 52-bit VA space, in
> > > order to maximize the space available to the linear region. After this
> > > change, the kernel VA space is no longer split 1:1 down the middle, and
> > > as it turns out, this violates an assumption in the KVM init code when
> > > it chooses the layout for the nVHE EL2 mapping.
> > >
> > > Given that EFI does not support 52-bit VA addressing (as it only
> > > supports 4k pages), and that in general, loaders cannot assume that the
> > > kernel being loaded supports 52-bit VA/PA addressing in the first place,
> > > we can safely assume that the kernel, and therefore the .idmap section,
> > > will be 48-bit addressable on 52-bit VA capable systems.
> > >
> > > So in this case, organize the nVHE EL2 address space as a 2^48 byte
> > > window starting at address 0x0, containing the ID map and the
> > > hypervisor's private mappings, followed by a contiguous 2^52 - 2^48 byte
> > > linear region. (Note that EL1's linear region is 2^52 - 2^47 bytes in
> > > size, so it is slightly larger, but this only matters on systems where
> > > the DRAM footprint in the physical memory map exceeds 3968 TB)
> >
> > So if I have memory in the [2^52 - 2^48, 2^52 - 2^47] range, not
> > necessarily because I have that much memory, but because my system has
> > multiple memory banks, one of which lands on that spot, I cannot map
> > such memory at EL2. We'll explode at run time.
> >
> > Can we keep the private mapping to 47 bits and restore the missing
> > chunk to the linear mapping? Of course, it means that the linear map
> > is now potentially not linear anymore, so we'd have to guarantee that
> > the kernel lands in the first 2^47 bytes instead. Crap.
> >
> 
> Yeah. The linear region needs to be contiguous. Alternatively, we
> could restrict the upper address limit for loading the kernel to 47
> bits.

Is that something we can do retroactively? We could mandate it for
LVA systems only, but that's a bit odd.

[...]

> > It seems __create_hyp_private_mapping() still refers to (VA_BITS - 1)
> > to choose where to allocate the IO mappings, and
> > __pkvm_create_private_mapping() relies on similar assumptions, based
> > on what hyp_create_idmap() sets up.
> >
> 
> That was probably broken already then, given that it should refer to
> vabits_actual. I'll address that in a separate patch.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

* Re: [PATCH] arm64: kvm: handle 52-bit VA regions correctly under nVHE
  2021-03-30 13:04     ` Marc Zyngier
@ 2021-03-30 13:15       ` Ard Biesheuvel
  2021-03-30 13:56         ` Marc Zyngier
  0 siblings, 1 reply; 8+ messages in thread
From: Ard Biesheuvel @ 2021-03-30 13:15 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Will Deacon, Anshuman Khandual, Catalin Marinas,
	Android Kernel Team, kvmarm, Linux ARM

On Tue, 30 Mar 2021 at 15:04, Marc Zyngier <maz@kernel.org> wrote:
>
> On Tue, 30 Mar 2021 13:49:18 +0100,
> Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> > On Tue, 30 Mar 2021 at 14:44, Marc Zyngier <maz@kernel.org> wrote:
> > >
> > > On Tue, 30 Mar 2021 12:21:26 +0100,
> > > Ard Biesheuvel <ardb@kernel.org> wrote:
> > > >
> > > > Commit f4693c2716b35d08 ("arm64: mm: extend linear region for 52-bit VA
> > > > configurations") introduced a new layout for the 52-bit VA space, in
> > > > order to maximize the space available to the linear region. After this
> > > > change, the kernel VA space is no longer split 1:1 down the middle, and
> > > > as it turns out, this violates an assumption in the KVM init code when
> > > > it chooses the layout for the nVHE EL2 mapping.
> > > >
> > > > Given that EFI does not support 52-bit VA addressing (as it only
> > > > supports 4k pages), and that in general, loaders cannot assume that the
> > > > kernel being loaded supports 52-bit VA/PA addressing in the first place,
> > > > we can safely assume that the kernel, and therefore the .idmap section,
> > > > will be 48-bit addressable on 52-bit VA capable systems.
> > > >
> > > > So in this case, organize the nVHE EL2 address space as a 2^48 byte
> > > > window starting at address 0x0, containing the ID map and the
> > > > hypervisor's private mappings, followed by a contiguous 2^52 - 2^48 byte
> > > > linear region. (Note that EL1's linear region is 2^52 - 2^47 bytes in
> > > > size, so it is slightly larger, but this only matters on systems where
> > > > the DRAM footprint in the physical memory map exceeds 3968 TB)
> > >
> > > So if I have memory in the [2^52 - 2^48, 2^52 - 2^47] range, not
> > > necessarily because I have that much memory, but because my system has
> > > multiple memory banks, one of which lands on that spot, I cannot map
> > > such memory at EL2. We'll explode at run time.
> > >
> > > Can we keep the private mapping to 47 bits and restore the missing
> > > chunk to the linear mapping? Of course, it means that the linear map
> > > is now potentially not linear anymore, so we'd have to guarantee that
> > > the kernel lands in the first 2^47 bytes instead. Crap.
> > >
> >
> > Yeah. The linear region needs to be contiguous. Alternatively, we
> > could restrict the upper address limit for loading the kernel to 47
> > bits.
>
> Is that something we can do retroactively? We could mandate it for
> LVA systems only, but that's a bit odd.
>

Yeah, especially given the fact that LVA systems will be VHE capable
and may therefore not care in the first place.

On systems that have memory that high, EFI is likely to load the
kernel there, as it usually allocates from the top down, and it tries
to avoid having to move it around unless asked to (via KASLR), in
which case it will currently randomize over the entire available
memory space.

So this is going to add a special case for a corner^2 case, i.e., nVHE
on 52-bit/64k pages with more than 3968 TB distance between the start
and end of DRAM. Ugh.

It seems to me that the only way to solve this is to permit the idmap
and the hyp linear region to overlap, and use the 2^47 byte window at
the top of the address space for the hyp private mappings instead of
the one at the bottom.

* Re: [PATCH] arm64: kvm: handle 52-bit VA regions correctly under nVHE
  2021-03-30 13:15       ` Ard Biesheuvel
@ 2021-03-30 13:56         ` Marc Zyngier
  2021-03-30 13:58           ` Ard Biesheuvel
  0 siblings, 1 reply; 8+ messages in thread
From: Marc Zyngier @ 2021-03-30 13:56 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Will Deacon, Anshuman Khandual, Catalin Marinas,
	Android Kernel Team, kvmarm, Linux ARM

On Tue, 30 Mar 2021 14:15:19 +0100,
Ard Biesheuvel <ardb@kernel.org> wrote:
> 
> On Tue, 30 Mar 2021 at 15:04, Marc Zyngier <maz@kernel.org> wrote:
> >
> > On Tue, 30 Mar 2021 13:49:18 +0100,
> > Ard Biesheuvel <ardb@kernel.org> wrote:
> > >
> > > On Tue, 30 Mar 2021 at 14:44, Marc Zyngier <maz@kernel.org> wrote:
> > > >
> > > > On Tue, 30 Mar 2021 12:21:26 +0100,
> > > > Ard Biesheuvel <ardb@kernel.org> wrote:
> > > > >
> > > > > Commit f4693c2716b35d08 ("arm64: mm: extend linear region for 52-bit VA
> > > > > configurations") introduced a new layout for the 52-bit VA space, in
> > > > > order to maximize the space available to the linear region. After this
> > > > > change, the kernel VA space is no longer split 1:1 down the middle, and
> > > > > as it turns out, this violates an assumption in the KVM init code when
> > > > > it chooses the layout for the nVHE EL2 mapping.
> > > > >
> > > > > Given that EFI does not support 52-bit VA addressing (as it only
> > > > > supports 4k pages), and that in general, loaders cannot assume that the
> > > > > kernel being loaded supports 52-bit VA/PA addressing in the first place,
> > > > > we can safely assume that the kernel, and therefore the .idmap section,
> > > > > will be 48-bit addressable on 52-bit VA capable systems.
> > > > >
> > > > > So in this case, organize the nVHE EL2 address space as a 2^48 byte
> > > > > window starting at address 0x0, containing the ID map and the
> > > > > hypervisor's private mappings, followed by a contiguous 2^52 - 2^48 byte
> > > > > linear region. (Note that EL1's linear region is 2^52 - 2^47 bytes in
> > > > > size, so it is slightly larger, but this only matters on systems where
> > > > > the DRAM footprint in the physical memory map exceeds 3968 TB)
> > > >
> > > > So if I have memory in the [2^52 - 2^48, 2^52 - 2^47] range, not
> > > > necessarily because I have that much memory, but because my system has
> > > > multiple memory banks, one of which lands on that spot, I cannot map
> > > > such memory at EL2. We'll explode at run time.
> > > >
> > > > Can we keep the private mapping to 47 bits and restore the missing
> > > > chunk to the linear mapping? Of course, it means that the linear map
> > > > is now potentially not linear anymore, so we'd have to guarantee that
> > > > the kernel lands in the first 2^47 bytes instead. Crap.
> > > >
> > >
> > > Yeah. The linear region needs to be contiguous. Alternatively, we
> > > could restrict the upper address limit for loading the kernel to 47
> > > bits.
> >
> > Is that something we can do retroactively? We could mandate it for
> > LVA systems only, but that's a bit odd.
> >
> 
> Yeah, especially given the fact that LVA systems will be VHE capable
> and may therefore not care in the first place.
> 
> On systems that have memory that high, EFI is likely to load the
> kernel there, as it usually allocates from the top down, and it tries
> to avoid having to move it around unless asked to (via KASLR), in
> which case it will currently randomize over the entire available
> memory space.
> 
> So it is going to add a special case for a corner^2 case, i.e., nVHE
> on 52-bit/64k pages with more than 3968 TB distance between the start
> and end of DRAM. Ugh.

Yeah. I'd rather we ignore that memory altogether, but I don't think
we can.

> It seems to me that the only way to solve this is to permit the idmap
> and the hyp linear region to overlap, and use the 2^47 byte window at
> the top of the address space for the hyp private mappings instead of
> the one at the bottom.

But that's the hard problem I want to avoid thinking of.

We need to ensure that there is no EL1 VA that is congruent with the
idmap over the kern_hyp_va() transformation. It means imposing
restrictions over the EL1 linear map, and prevent any allocation that
would result in this overlap (and that is including text).

How do we do that?

Frankly, I think we need to start looking into enabling VHE for the
nVHE /behaviour/. Having a single TTBR on these systems is just
insane.

	M.

-- 
Without deviation from the norm, progress is not possible.

* Re: [PATCH] arm64: kvm: handle 52-bit VA regions correctly under nVHE
  2021-03-30 13:56         ` Marc Zyngier
@ 2021-03-30 13:58           ` Ard Biesheuvel
  2021-03-30 14:24             ` Marc Zyngier
  0 siblings, 1 reply; 8+ messages in thread
From: Ard Biesheuvel @ 2021-03-30 13:58 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Will Deacon, Anshuman Khandual, Catalin Marinas,
	Android Kernel Team, kvmarm, Linux ARM

On Tue, 30 Mar 2021 at 15:56, Marc Zyngier <maz@kernel.org> wrote:
>
> On Tue, 30 Mar 2021 14:15:19 +0100,
> Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> > On Tue, 30 Mar 2021 at 15:04, Marc Zyngier <maz@kernel.org> wrote:
> > >
> > > On Tue, 30 Mar 2021 13:49:18 +0100,
> > > Ard Biesheuvel <ardb@kernel.org> wrote:
> > > >
> > > > On Tue, 30 Mar 2021 at 14:44, Marc Zyngier <maz@kernel.org> wrote:
> > > > >
> > > > > On Tue, 30 Mar 2021 12:21:26 +0100,
> > > > > Ard Biesheuvel <ardb@kernel.org> wrote:
> > > > > >
> > > > > > Commit f4693c2716b35d08 ("arm64: mm: extend linear region for 52-bit VA
> > > > > > configurations") introduced a new layout for the 52-bit VA space, in
> > > > > > order to maximize the space available to the linear region. After this
> > > > > > change, the kernel VA space is no longer split 1:1 down the middle, and
> > > > > > as it turns out, this violates an assumption in the KVM init code when
> > > > > > it chooses the layout for the nVHE EL2 mapping.
> > > > > >
> > > > > > Given that EFI does not support 52-bit VA addressing (as it only
> > > > > > supports 4k pages), and that in general, loaders cannot assume that the
> > > > > > kernel being loaded supports 52-bit VA/PA addressing in the first place,
> > > > > > we can safely assume that the kernel, and therefore the .idmap section,
> > > > > > will be 48-bit addressable on 52-bit VA capable systems.
> > > > > >
> > > > > > So in this case, organize the nVHE EL2 address space as a 2^48 byte
> > > > > > window starting at address 0x0, containing the ID map and the
> > > > > > hypervisor's private mappings, followed by a contiguous 2^52 - 2^48 byte
> > > > > > linear region. (Note that EL1's linear region is 2^52 - 2^47 bytes in
> > > > > > size, so it is slightly larger, but this only matters on systems where
> > > > > > the DRAM footprint in the physical memory map exceeds 3968 TB)
> > > > >
> > > > > So if I have memory in the [2^52 - 2^48, 2^52 - 2^47] range, not
> > > > > necessarily because I have that much memory, but because my system has
> > > > > multiple memory banks, one of which lands on that spot, I cannot map
> > > > > such memory at EL2. We'll explode at run time.
> > > > >
> > > > > Can we keep the private mapping to 47 bits and restore the missing
> > > > > chunk to the linear mapping? Of course, it means that the linear map
> > > > > is now potentially not linear anymore, so we'd have to guarantee that
> > > > > the kernel lands in the first 2^47 bytes instead. Crap.
> > > > >
> > > >
> > > > Yeah. The linear region needs to be contiguous. Alternatively, we
> > > > could restrict the upper address limit for loading the kernel to 47
> > > > bits.
> > >
> > > Is that something we can do retroactively? We could mandate it for
> > > LVA systems only, but that's a bit odd.
> > >
> >
> > Yeah, especially given the fact that LVA systems will be VHE capable
> > and may therefore not care in the first place.
> >
> > On systems that have memory that high, EFI is likely to load the
> > kernel there, as it usually allocates from the top down, and it tries
> > to avoid having to move it around unless asked to (via KASLR), in
> > which case it will currently randomize over the entire available
> > memory space.
> >
> > So this means adding a special case for a corner^2 case, i.e., nVHE
> > on 52-bit/64k pages with more than 3968 TB distance between the start
> > and end of DRAM. Ugh.
>
> Yeah. I'd rather we ignore that memory altogether, but I don't think
> we can.
>
> > It seems to me that the only way to solve this is to permit the idmap
> > and the hyp linear region to overlap, and use the 2^47 byte window at
> > the top of the address space for the hyp private mappings instead of
> > the one at the bottom.
>
> But that's the hard problem I want to avoid thinking of.
>
> We need to ensure that there is no EL1 VA that is congruent with the
> idmap over the kern_hyp_va() transformation. It means imposing
> restrictions over the EL1 linear map, and prevent any allocation that
> would result in this overlap (and that is including text).
>
> How do we do that?
>

A phys to virt offset of 0x0 is perfectly acceptable, no? The only
difference is that the idmapped bits are in another part of the VA
space.

> Frankly, I think we need to start looking into enabling VHE for the
> nVHE /behaviour/. Having a single TTBR on these systems is just
> insane.
>
>         M.
>
> --
> Without deviation from the norm, progress is not possible.
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


* Re: [PATCH] arm64: kvm: handle 52-bit VA regions correctly under nVHE
  2021-03-30 13:58           ` Ard Biesheuvel
@ 2021-03-30 14:24             ` Marc Zyngier
  0 siblings, 0 replies; 8+ messages in thread
From: Marc Zyngier @ 2021-03-30 14:24 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Will Deacon, Anshuman Khandual, Catalin Marinas,
	Android Kernel Team, kvmarm, Linux ARM

On Tue, 30 Mar 2021 14:58:39 +0100,
Ard Biesheuvel <ardb@kernel.org> wrote:
> 
> On Tue, 30 Mar 2021 at 15:56, Marc Zyngier <maz@kernel.org> wrote:
> >
> > On Tue, 30 Mar 2021 14:15:19 +0100,
> > Ard Biesheuvel <ardb@kernel.org> wrote:

[...]

> > > It seems to me that the only way to solve this is to permit the idmap
> > > and the hyp linear region to overlap, and use the 2^47 byte window at
> > > the top of the address space for the hyp private mappings instead of
> > > the one at the bottom.
> >
> > But that's the hard problem I want to avoid thinking of.
> >
> > We need to ensure that there is no EL1 VA that is congruent with the
> > idmap over the kern_hyp_va() transformation. It means imposing
> > restrictions over the EL1 linear map, and prevent any allocation that
> > would result in this overlap (and that is including text).
> >
> > How do we do that?
> >
> 
> A phys to virt offset of 0x0 is perfectly acceptable, no? The only
> difference is that the idmapped bits are in another part of the VA
> space.

What do we lose by doing that? If that's acceptable for LVA, why don't
we do it across the board? It feels like KASLR and EL2 randomisation
are in the way...

	M.

-- 
Without deviation from the norm, progress is not possible.


Thread overview: 8+ messages
2021-03-30 11:21 [PATCH] arm64: kvm: handle 52-bit VA regions correctly under nVHE Ard Biesheuvel
2021-03-30 12:44 ` Marc Zyngier
2021-03-30 12:49   ` Ard Biesheuvel
2021-03-30 13:04     ` Marc Zyngier
2021-03-30 13:15       ` Ard Biesheuvel
2021-03-30 13:56         ` Marc Zyngier
2021-03-30 13:58           ` Ard Biesheuvel
2021-03-30 14:24             ` Marc Zyngier
