* Re: [PATCH] powerpc: mm: Limit rma_size to 1TB when running without HV mode
2019-07-10 5:20 [PATCH] powerpc: mm: Limit rma_size to 1TB when running without HV mode Suraj Jitindar Singh
@ 2019-07-10 7:32 ` Satheesh Rajendran
2019-07-10 14:21 ` David Gibson
` (2 subsequent siblings)
3 siblings, 0 replies; 7+ messages in thread
From: Satheesh Rajendran @ 2019-07-10 7:32 UTC (permalink / raw)
To: Suraj Jitindar Singh; +Cc: linuxppc-dev, kvm-ppc, david
On Wed, Jul 10, 2019 at 03:20:18PM +1000, Suraj Jitindar Singh wrote:
> The virtual real mode addressing (VRMA) mechanism is used when a
> partition is using HPT (Hash Page Table) translation and performs
> real mode accesses (MSR[IR|DR] = 0) in non-hypervisor mode. In this
> mode effective address bits 0:23 are treated as zero (i.e. the access
> is aliased to 0) and the access is performed using an implicit 1TB SLB
> entry.
>
> The size of the RMA (Real Memory Area) is communicated to the guest as
> the size of the first memory region in the device tree. Because of
> the mechanism described above, it can be expected not to exceed 1TB. In the
> event that the host erroneously represents the RMA as being larger than
> 1TB, guest accesses in real mode to memory addresses above 1TB will be
> aliased down to below 1TB. This means that a memory access performed in
> real mode may differ from one performed in virtual mode for the same memory
> address, which would likely have unintended consequences.
>
> To avoid this outcome, have the guest explicitly limit the size of the
> RMA to the current maximum, which is 1TB. This means that even if the
> first memory block is larger than 1TB, only the first 1TB should be
> accessed in real mode.
>
> Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
> ---
> arch/powerpc/mm/book3s64/hash_utils.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
Hi,
Tested this patch; a Power8 compat guest now boots fine with more than 1024G
of memory on a Power9 host.
Tested-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Host: P9; kernel: 5.2.0-00915-g5ad18b2e60b7
Before this patch:
The guest crashes:
[0.000000] BUG: Kernel NULL pointer dereference at 0x00000028
[0.000000] Faulting instruction address: 0xc00000000102caa0
[0.000000] Oops: Kernel access of bad area, sig: 11 [#1]
[0.000000] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
[0.000000] Modules linked in:
[0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.2.0-03135-ge9a83bd23220 #24
[0.000000] NIP: c00000000102caa0 LR: c00000000102ca84 CTR: 0000000000000000
[0.000000] REGS: c000000001603ba0 TRAP: 0380 Not tainted (5.2.0-03135-ge9a83bd23220)
[0.000000] MSR: 8000000000001033 <SF,ME,IR,DR,RI,LE> CR: 24000428 XER: 20000000
[0.000000] CFAR: c00000000102c1d8 IRQMASK: 1
[0.000000] GPR00: c00000000102ca84 c000000001603e30 c000000001605300 0000010000000000
[0.000000] GPR04: 0000000000000000 0000000000000000 c00000ffffff8000 c000000001863dc8
[0.000000] GPR08: 0000000000002028 0000000000000000 c00000ffffff8000 0000000000000009
[0.000000] GPR12: 0000000000000000 c0000000018f0000 000000007dc5fef0 00000000012e1220
[0.000000] GPR16: 00000000012e10a0 fffffffffffffffd 000000007dc5fef0 000000000130fcc0
[0.000000] GPR20: 0000000000000014 0000000001a80000 000000002fff0000 fffffffffffffffd
[0.000000] GPR24: 0000000001d0000c c000000000000000 c000000001641ed8 c000000001641b78
[0.000000] GPR28: 0000000000000000 0000000000000000 0000010000000000 0000000000000000
[0.000000] NIP [c00000000102caa0] emergency_stack_init+0xb8/0x118
[0.000000] LR [c00000000102ca84] emergency_stack_init+0x9c/0x118
[0.000000] Call Trace:
[0.000000] [c000000001603e30] [c00000000102ca84] emergency_stack_init+0x9c/0x118 (unreliable)
[0.000000] [c000000001603e80] [c00000000102bd54] setup_arch+0x2fc/0x388
[0.000000] [c000000001603ef0] [c000000001023ccc] start_kernel+0xa4/0x660
[0.000000] [c000000001603f90] [c00000000000b774] start_here_common+0x1c/0x528
[0.000000] Instruction dump:
[0.000000] 7ffc07b4 7fc3f378 7bfd1f24 7f84e378 4bfff6e9 3f620004 3b7bc878 7f84e378
[0.000000] 39434000 7fc3f378 e93b0000 7d29e82a <f9490028> 4bfff6c5 e93b0000 7f84e378
[0.000000] random: get_random_bytes called from print_oops_end_marker+0x6c/0xa0 with crng_init=0
[0.000000] ---[ end trace 0000000000000000 ]---
[0.000000]
[0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
-------------------------
With this patch:
# virsh start --console p8
Domain p8 started
Connected to domain p8
..
..
Fedora 27 (Twenty Seven)
Kernel 5.2.0-03136-gf709b0494ad9 on an ppc64le (hvc0)
atest-guest login:
# free -g
total used free shared buff/cache available
Mem: 1028 0 1027 0 0 1025
Swap: 0 0
Regards,
-Satheesh.
>
> diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
> index 28ced26f2a00..4d0e2cce9cd5 100644
> --- a/arch/powerpc/mm/book3s64/hash_utils.c
> +++ b/arch/powerpc/mm/book3s64/hash_utils.c
> @@ -1901,11 +1901,19 @@ void hash__setup_initial_memory_limit(phys_addr_t first_memblock_base,
> *
> * For guests on platforms before POWER9, we clamp the it limit to 1G
> * to avoid some funky things such as RTAS bugs etc...
> + * On POWER9 we limit to 1TB in case the host erroneously told us that
> + * the RMA was >1TB. Effective address bits 0:23 are treated as zero
> + * (meaning the access is aliased to zero i.e. addr = addr % 1TB)
> + * for virtual real mode addressing and so it doesn't make sense to
> + * have an area larger than 1TB as it can't be addressed.
> */
> if (!early_cpu_has_feature(CPU_FTR_HVMODE)) {
> ppc64_rma_size = first_memblock_size;
> if (!early_cpu_has_feature(CPU_FTR_ARCH_300))
> ppc64_rma_size = min_t(u64, ppc64_rma_size, 0x40000000);
> + else
> + ppc64_rma_size = min_t(u64, ppc64_rma_size,
> + 1UL << SID_SHIFT_1T);
>
> /* Finally limit subsequent allocations */
> memblock_set_current_limit(ppc64_rma_size);
> --
> 2.13.6
>
* Re: [PATCH] powerpc: mm: Limit rma_size to 1TB when running without HV mode
From: David Gibson @ 2019-07-10 14:21 UTC (permalink / raw)
To: Suraj Jitindar Singh; +Cc: linuxppc-dev, kvm-ppc
On Wed, Jul 10, 2019 at 03:20:18PM +1000, Suraj Jitindar Singh wrote:
> The virtual real mode addressing (VRMA) mechanism is used when a
> partition is using HPT (Hash Page Table) translation and performs
> real mode accesses (MSR[IR|DR] = 0) in non-hypervisor mode. In this
> mode effective address bits 0:23 are treated as zero (i.e. the access
> is aliased to 0) and the access is performed using an implicit 1TB SLB
> entry.
>
> The size of the RMA (Real Memory Area) is communicated to the guest as
> the size of the first memory region in the device tree. Because of
> the mechanism described above, it can be expected not to exceed 1TB. In the
> event that the host erroneously represents the RMA as being larger than
> 1TB, guest accesses in real mode to memory addresses above 1TB will be
> aliased down to below 1TB. This means that a memory access performed in
> real mode may differ from one performed in virtual mode for the same memory
> address, which would likely have unintended consequences.
>
> To avoid this outcome, have the guest explicitly limit the size of the
> RMA to the current maximum, which is 1TB. This means that even if the
> first memory block is larger than 1TB, only the first 1TB should be
> accessed in real mode.
>
> Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Although I'd really like to also see some comments added in
allocate_paca_ptrs() explaining the constraints there.
Oh, also, basing this on the non-compat PVR is bogus, but it's still
better than what we had.
> ---
> arch/powerpc/mm/book3s64/hash_utils.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
> index 28ced26f2a00..4d0e2cce9cd5 100644
> --- a/arch/powerpc/mm/book3s64/hash_utils.c
> +++ b/arch/powerpc/mm/book3s64/hash_utils.c
> @@ -1901,11 +1901,19 @@ void hash__setup_initial_memory_limit(phys_addr_t first_memblock_base,
> *
> * For guests on platforms before POWER9, we clamp the it limit to 1G
> * to avoid some funky things such as RTAS bugs etc...
> + * On POWER9 we limit to 1TB in case the host erroneously told us that
> + * the RMA was >1TB. Effective address bits 0:23 are treated as zero
> + * (meaning the access is aliased to zero i.e. addr = addr % 1TB)
> + * for virtual real mode addressing and so it doesn't make sense to
> + * have an area larger than 1TB as it can't be addressed.
> */
> if (!early_cpu_has_feature(CPU_FTR_HVMODE)) {
> ppc64_rma_size = first_memblock_size;
> if (!early_cpu_has_feature(CPU_FTR_ARCH_300))
> ppc64_rma_size = min_t(u64, ppc64_rma_size, 0x40000000);
> + else
> + ppc64_rma_size = min_t(u64, ppc64_rma_size,
> + 1UL << SID_SHIFT_1T);
>
> /* Finally limit subsequent allocations */
> memblock_set_current_limit(ppc64_rma_size);
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
* Re: [PATCH] powerpc: mm: Limit rma_size to 1TB when running without HV mode
From: Michael Ellerman @ 2019-07-12 13:09 UTC (permalink / raw)
To: Suraj Jitindar Singh, linuxppc-dev; +Cc: sjitindarsingh, kvm-ppc, david
Suraj Jitindar Singh <sjitindarsingh@gmail.com> writes:
> The virtual real mode addressing (VRMA) mechanism is used when a
> partition is using HPT (Hash Page Table) translation and performs
> real mode accesses (MSR[IR|DR] = 0) in non-hypervisor mode. In this
> mode effective address bits 0:23 are treated as zero (i.e. the access
> is aliased to 0) and the access is performed using an implicit 1TB SLB
> entry.
>
> The size of the RMA (Real Memory Area) is communicated to the guest as
> the size of the first memory region in the device tree. Because of
> the mechanism described above, it can be expected not to exceed 1TB. In the
> event that the host erroneously represents the RMA as being larger than
> 1TB, guest accesses in real mode to memory addresses above 1TB will be
> aliased down to below 1TB. This means that a memory access performed in
> real mode may differ from one performed in virtual mode for the same memory
> address, which would likely have unintended consequences.
>
> To avoid this outcome, have the guest explicitly limit the size of the
> RMA to the current maximum, which is 1TB. This means that even if the
> first memory block is larger than 1TB, only the first 1TB should be
> accessed in real mode.
>
> Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
I added:
Fixes: c3ab300ea555 ("powerpc: Add POWER9 cputable entry")
Cc: stable@vger.kernel.org # v4.6+
Which is not exactly correct, but probably good enough?
cheers
> diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
> index 28ced26f2a00..4d0e2cce9cd5 100644
> --- a/arch/powerpc/mm/book3s64/hash_utils.c
> +++ b/arch/powerpc/mm/book3s64/hash_utils.c
> @@ -1901,11 +1901,19 @@ void hash__setup_initial_memory_limit(phys_addr_t first_memblock_base,
> *
> * For guests on platforms before POWER9, we clamp the it limit to 1G
> * to avoid some funky things such as RTAS bugs etc...
> + * On POWER9 we limit to 1TB in case the host erroneously told us that
> + * the RMA was >1TB. Effective address bits 0:23 are treated as zero
> + * (meaning the access is aliased to zero i.e. addr = addr % 1TB)
> + * for virtual real mode addressing and so it doesn't make sense to
> + * have an area larger than 1TB as it can't be addressed.
> */
> if (!early_cpu_has_feature(CPU_FTR_HVMODE)) {
> ppc64_rma_size = first_memblock_size;
> if (!early_cpu_has_feature(CPU_FTR_ARCH_300))
> ppc64_rma_size = min_t(u64, ppc64_rma_size, 0x40000000);
> + else
> + ppc64_rma_size = min_t(u64, ppc64_rma_size,
> + 1UL << SID_SHIFT_1T);
>
> /* Finally limit subsequent allocations */
> memblock_set_current_limit(ppc64_rma_size);
> --
> 2.13.6
* Re: [PATCH] powerpc: mm: Limit rma_size to 1TB when running without HV mode
From: Suraj Jitindar Singh @ 2019-07-15 1:58 UTC (permalink / raw)
To: Michael Ellerman, linuxppc-dev; +Cc: kvm-ppc, david
On Fri, 2019-07-12 at 23:09 +1000, Michael Ellerman wrote:
> Suraj Jitindar Singh <sjitindarsingh@gmail.com> writes:
> > The virtual real mode addressing (VRMA) mechanism is used when a
> > partition is using HPT (Hash Page Table) translation and performs
> > real mode accesses (MSR[IR|DR] = 0) in non-hypervisor mode. In this
> > mode effective address bits 0:23 are treated as zero (i.e. the access
> > is aliased to 0) and the access is performed using an implicit 1TB SLB
> > entry.
> >
> > The size of the RMA (Real Memory Area) is communicated to the guest as
> > the size of the first memory region in the device tree. Because of
> > the mechanism described above, it can be expected not to exceed 1TB. In the
> > event that the host erroneously represents the RMA as being larger than
> > 1TB, guest accesses in real mode to memory addresses above 1TB will be
> > aliased down to below 1TB. This means that a memory access performed in
> > real mode may differ from one performed in virtual mode for the same memory
> > address, which would likely have unintended consequences.
> >
> > To avoid this outcome, have the guest explicitly limit the size of the
> > RMA to the current maximum, which is 1TB. This means that even if the
> > first memory block is larger than 1TB, only the first 1TB should be
> > accessed in real mode.
> >
> > Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
>
> I added:
>
> Fixes: c3ab300ea555 ("powerpc: Add POWER9 cputable entry")
> Cc: stable@vger.kernel.org # v4.6+
>
>
> Which is not exactly correct, but probably good enough?
I think we actually want:
Fixes: c610d65c0ad0 ("powerpc/pseries: lift RTAS limit for hash")
That is the commit that actually broke it and caused the issue to present
itself.
>
> cheers
>
> > diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
> > index 28ced26f2a00..4d0e2cce9cd5 100644
> > --- a/arch/powerpc/mm/book3s64/hash_utils.c
> > +++ b/arch/powerpc/mm/book3s64/hash_utils.c
> > @@ -1901,11 +1901,19 @@ void hash__setup_initial_memory_limit(phys_addr_t first_memblock_base,
> > *
> > * For guests on platforms before POWER9, we clamp the it limit to 1G
> > * to avoid some funky things such as RTAS bugs etc...
> > + * On POWER9 we limit to 1TB in case the host erroneously told us that
> > + * the RMA was >1TB. Effective address bits 0:23 are treated as zero
> > + * (meaning the access is aliased to zero i.e. addr = addr % 1TB)
> > + * for virtual real mode addressing and so it doesn't make sense to
> > + * have an area larger than 1TB as it can't be addressed.
> > */
> > if (!early_cpu_has_feature(CPU_FTR_HVMODE)) {
> > ppc64_rma_size = first_memblock_size;
> > if (!early_cpu_has_feature(CPU_FTR_ARCH_300))
> > ppc64_rma_size = min_t(u64, ppc64_rma_size, 0x40000000);
> > + else
> > + ppc64_rma_size = min_t(u64, ppc64_rma_size,
> > + 1UL << SID_SHIFT_1T);
> >
> > /* Finally limit subsequent allocations */
> > memblock_set_current_limit(ppc64_rma_size);
> > --
> > 2.13.6
* Re: [PATCH] powerpc: mm: Limit rma_size to 1TB when running without HV mode
From: Michael Ellerman @ 2019-07-15 2:23 UTC (permalink / raw)
To: Suraj Jitindar Singh, linuxppc-dev; +Cc: kvm-ppc, david
Suraj Jitindar Singh <sjitindarsingh@gmail.com> writes:
> On Fri, 2019-07-12 at 23:09 +1000, Michael Ellerman wrote:
>> Suraj Jitindar Singh <sjitindarsingh@gmail.com> writes:
>> > The virtual real mode addressing (VRMA) mechanism is used when a
>> > partition is using HPT (Hash Page Table) translation and performs
>> > real mode accesses (MSR[IR|DR] = 0) in non-hypervisor mode. In this
>> > mode effective address bits 0:23 are treated as zero (i.e. the access
>> > is aliased to 0) and the access is performed using an implicit 1TB SLB
>> > entry.
>> >
>> > The size of the RMA (Real Memory Area) is communicated to the guest as
>> > the size of the first memory region in the device tree. Because of
>> > the mechanism described above, it can be expected not to exceed 1TB. In the
>> > event that the host erroneously represents the RMA as being larger than
>> > 1TB, guest accesses in real mode to memory addresses above 1TB will be
>> > aliased down to below 1TB. This means that a memory access performed in
>> > real mode may differ from one performed in virtual mode for the same memory
>> > address, which would likely have unintended consequences.
>> >
>> > To avoid this outcome, have the guest explicitly limit the size of the
>> > RMA to the current maximum, which is 1TB. This means that even if the
>> > first memory block is larger than 1TB, only the first 1TB should be
>> > accessed in real mode.
>> >
>> > Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
>>
>> I added:
>>
>> Fixes: c3ab300ea555 ("powerpc: Add POWER9 cputable entry")
>> Cc: stable@vger.kernel.org # v4.6+
>>
>>
>> Which is not exactly correct, but probably good enough?
>
> I think we actually want:
> Fixes: c610d65c0ad0 ("powerpc/pseries: lift RTAS limit for hash")
>
> That is the commit that actually broke it and caused the issue to present
> itself.
Thanks, I used that instead.
cheers
* Re: [PATCH] powerpc: mm: Limit rma_size to 1TB when running without HV mode
From: Michael Ellerman @ 2019-07-18 13:56 UTC (permalink / raw)
To: Suraj Jitindar Singh, linuxppc-dev; +Cc: kvm-ppc, sjitindarsingh, david
On Wed, 2019-07-10 at 05:20:18 UTC, Suraj Jitindar Singh wrote:
> The virtual real mode addressing (VRMA) mechanism is used when a
> partition is using HPT (Hash Page Table) translation and performs
> real mode accesses (MSR[IR|DR] = 0) in non-hypervisor mode. In this
> mode effective address bits 0:23 are treated as zero (i.e. the access
> is aliased to 0) and the access is performed using an implicit 1TB SLB
> entry.
>
> The size of the RMA (Real Memory Area) is communicated to the guest as
> the size of the first memory region in the device tree. Because of
> the mechanism described above, it can be expected not to exceed 1TB. In the
> event that the host erroneously represents the RMA as being larger than
> 1TB, guest accesses in real mode to memory addresses above 1TB will be
> aliased down to below 1TB. This means that a memory access performed in
> real mode may differ from one performed in virtual mode for the same memory
> address, which would likely have unintended consequences.
>
> To avoid this outcome, have the guest explicitly limit the size of the
> RMA to the current maximum, which is 1TB. This means that even if the
> first memory block is larger than 1TB, only the first 1TB should be
> accessed in real mode.
>
> Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
> Tested-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Applied to powerpc fixes, thanks.
https://git.kernel.org/powerpc/c/da0ef93310e67ae6902efded60b6724dab27a5d1
cheers