linuxppc-dev.lists.ozlabs.org archive mirror

* [PATCH v2 0/9] improve early structure allocations (paca, lppaca, etc)
@ 2017-08-12 11:34 Nicholas Piggin
  2017-08-13  1:33 ` [PATCH v2 1/9] KVM: PPC: Book3S HV: Fix H_REGISTER_VPA VPA size validation Nicholas Piggin
                   ` (8 more replies)
  0 siblings, 9 replies; 28+ messages in thread
From: Nicholas Piggin @ 2017-08-12 11:34 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin, kvm-ppc

I last posted this series here,

http://marc.info/?l=linuxppc-embedded&m=150068630827162&w=2

Since then it's become apparent that NUMA allocation support was not
quite right and will require some fiddly rejigging of the early dt
parsing to make it work. So I've dropped those NUMA patches from the
series for now.

The series relaxes memory allocation limits on various platforms
(e.g., powernv has no RMA restriction, radix has no bolted-SLB
restriction). It also avoids unnecessary allocations (powernv does not
require the lppaca, radix does not require SLB shadows).

Finally, it moves the paca and lppaca allocations from a single big
array to a table of pointers. This gets us to the point where we could
easily allocate them on the local node if we had the topology info
available at allocation time (which we don't at the moment).
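
For illustration, the data structure change is roughly this (a sketch,
not the exact code in the patches):

  /* before: one flat array, allocated up-front */
  struct paca_struct *paca;              /* paca[cpu].field */

  /* after: a table of pointers, each paca allocated separately */
  struct paca_struct **paca_ptrs;        /* paca_ptrs[cpu]->field */

The extra indirection is what makes per-node placement of each entry
possible later.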

I also found what I think is a small KVM bug (patch 1).

Thanks,
Nick

Nicholas Piggin (9):
  KVM: PPC: Book3S HV: Fix H_REGISTER_VPA VPA size validation
  powerpc/powernv: powernv platform is not constrained by RMA
  powerpc/powernv: Remove real mode access limit for early allocations
  powerpc/64s/radix: Remove bolted-SLB address limit for per-cpu stacks
  powerpc/64s: Relax PACA address limitations
  powerpc/64s/radix: Do not allocate SLB shadow structures
  powerpc/64s: do not allocate lppaca if we are not virtualized
  powerpc/64: Use a table of paca pointers and allocate pacas
    individually
  powerpc/64: Use a table of lppaca pointers and allocate lppacas
    individually

 arch/powerpc/include/asm/kvm_ppc.h           |   8 +-
 arch/powerpc/include/asm/lppaca.h            |  26 ++--
 arch/powerpc/include/asm/paca.h              |  12 +-
 arch/powerpc/include/asm/pmc.h               |  10 +-
 arch/powerpc/include/asm/smp.h               |   4 +-
 arch/powerpc/kernel/asm-offsets.c            |   7 ++
 arch/powerpc/kernel/crash.c                  |   2 +-
 arch/powerpc/kernel/head_64.S                |  12 +-
 arch/powerpc/kernel/machine_kexec_64.c       |  30 +++--
 arch/powerpc/kernel/paca.c                   | 178 +++++++++++++++++----------
 arch/powerpc/kernel/prom.c                   |  10 +-
 arch/powerpc/kernel/setup.h                  |   4 +
 arch/powerpc/kernel/setup_64.c               |  36 +++---
 arch/powerpc/kernel/smp.c                    |  10 +-
 arch/powerpc/kvm/book3s_hv.c                 |  22 ++--
 arch/powerpc/kvm/book3s_hv_builtin.c         |   2 +-
 arch/powerpc/kvm/book3s_hv_interrupts.S      |   3 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S      |   5 +-
 arch/powerpc/mm/hash_utils_64.c              |  24 ++--
 arch/powerpc/mm/numa.c                       |   4 +-
 arch/powerpc/mm/pgtable-radix.c              |  33 ++---
 arch/powerpc/mm/tlb-radix.c                  |   2 +-
 arch/powerpc/platforms/85xx/smp.c            |   8 +-
 arch/powerpc/platforms/cell/smp.c            |   4 +-
 arch/powerpc/platforms/powernv/idle.c        |  13 +-
 arch/powerpc/platforms/powernv/opal.c        |   7 +-
 arch/powerpc/platforms/powernv/setup.c       |   4 +-
 arch/powerpc/platforms/powernv/smp.c         |   2 +-
 arch/powerpc/platforms/powernv/subcore.c     |   2 +-
 arch/powerpc/platforms/pseries/hotplug-cpu.c |   2 +-
 arch/powerpc/platforms/pseries/kexec.c       |   7 +-
 arch/powerpc/platforms/pseries/lpar.c        |   4 +-
 arch/powerpc/platforms/pseries/setup.c       |   2 +-
 arch/powerpc/platforms/pseries/smp.c         |   4 +-
 arch/powerpc/sysdev/xics/icp-native.c        |   2 +-
 arch/powerpc/xmon/xmon.c                     |   2 +-
 36 files changed, 301 insertions(+), 206 deletions(-)

-- 
2.13.3

* [PATCH v2 1/9] KVM: PPC: Book3S HV: Fix H_REGISTER_VPA VPA size validation
  2017-08-12 11:34 [PATCH v2 0/9] improve early structure allocations (paca, lppaca, etc) Nicholas Piggin
@ 2017-08-13  1:33 ` Nicholas Piggin
  2017-08-15 11:24   ` Michael Ellerman
  2017-08-31  3:41   ` Paul Mackerras
  2017-08-13  1:33 ` [PATCH v2 2/9] powerpc/powernv: powernv platform is not constrained by RMA Nicholas Piggin
                   ` (7 subsequent siblings)
  8 siblings, 2 replies; 28+ messages in thread
From: Nicholas Piggin @ 2017-08-13  1:33 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin, kvm-ppc

KVM currently validates the size of the VPA registered by the client
against sizeof(struct lppaca); however, we align (and therefore size)
that struct to 1kB to avoid it crossing a 4kB boundary in the client.

PAPR calls for sizes >= 640 bytes to be accepted. Hard-code this
minimum, with a comment explaining why.
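
For reference, the boundary math (a sketch, not part of the patch):
the structure is sized and aligned to 1kB, and since 4096 is a
multiple of 1024, a 1kB-aligned 1kB object can only start at offsets
0x000, 0x400, 0x800 or 0xc00 within a 4kB page, so it never straddles
a page boundary:

  /* illustrative compile-time check, not in this patch */
  BUILD_BUG_ON(sizeof(struct lppaca) != 0x400);

Only the first 640 bytes are architected, which is why a client that
registers a 640-byte VPA is valid and must be accepted.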

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kvm/book3s_hv.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 359c79cdf0cc..1182cfd79857 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -485,7 +485,13 @@ static unsigned long do_h_register_vpa(struct kvm_vcpu *vcpu,
 
 	switch (subfunc) {
 	case H_VPA_REG_VPA:		/* register VPA */
-		if (len < sizeof(struct lppaca))
+		/*
+		 * The size of our lppaca is 1kB because of the way we align
+		 * it for the guest to avoid crossing a 4kB boundary. We only
+		 * use 640 bytes of the structure though, so we should accept
+		 * clients that set a size of 640.
+		 */
+		if (len < 640)
 			break;
 		vpap = &tvcpu->arch.vpa;
 		err = 0;
-- 
2.13.3

* [PATCH v2 2/9] powerpc/powernv: powernv platform is not constrained by RMA
  2017-08-12 11:34 [PATCH v2 0/9] improve early structure allocations (paca, lppaca, etc) Nicholas Piggin
  2017-08-13  1:33 ` [PATCH v2 1/9] KVM: PPC: Book3S HV: Fix H_REGISTER_VPA VPA size validation Nicholas Piggin
@ 2017-08-13  1:33 ` Nicholas Piggin
  2017-08-31 11:36   ` [v2, " Michael Ellerman
  2017-08-13  1:33 ` [PATCH v2 3/9] powerpc/powernv: Remove real mode access limit for early allocations Nicholas Piggin
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 28+ messages in thread
From: Nicholas Piggin @ 2017-08-13  1:33 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin, kvm-ppc

Remove the incorrect comment about real mode address restrictions on
powernv (bare metal), and the unnecessary clamping to ppc64_rma_size.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/platforms/powernv/opal.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c
index cad6b57ce494..caf60583293a 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -162,12 +162,9 @@ int __init early_init_dt_scan_recoverable_ranges(unsigned long node,
 			sizeof(struct mcheck_recoverable_range);
 
 	/*
-	 * Allocate a buffer to hold the MC recoverable ranges. We would be
-	 * accessing them in real mode, hence it needs to be within
-	 * RMO region.
+	 * Allocate a buffer to hold the MC recoverable ranges.
 	 */
-	mc_recoverable_range =__va(memblock_alloc_base(size, __alignof__(u64),
-							ppc64_rma_size));
+	mc_recoverable_range = __va(memblock_alloc(size, __alignof__(u64)));
 	memset(mc_recoverable_range, 0, size);
 
 	for (i = 0; i < mc_recoverable_range_len; i++) {
-- 
2.13.3

* [PATCH v2 3/9] powerpc/powernv: Remove real mode access limit for early allocations
  2017-08-12 11:34 [PATCH v2 0/9] improve early structure allocations (paca, lppaca, etc) Nicholas Piggin
  2017-08-13  1:33 ` [PATCH v2 1/9] KVM: PPC: Book3S HV: Fix H_REGISTER_VPA VPA size validation Nicholas Piggin
  2017-08-13  1:33 ` [PATCH v2 2/9] powerpc/powernv: powernv platform is not constrained by RMA Nicholas Piggin
@ 2017-08-13  1:33 ` Nicholas Piggin
  2017-08-14 12:49   ` Michael Ellerman
  2017-08-13  1:33 ` [PATCH v2 4/9] powerpc/64s/radix: Remove bolted-SLB address limit for per-cpu stacks Nicholas Piggin
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 28+ messages in thread
From: Nicholas Piggin @ 2017-08-13  1:33 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin, kvm-ppc

This removes the RMA limit on the powernv platform, which constrains
early allocations such as PACAs and stacks. Other restrictions still
apply (such as bolted-SLB limits), but real mode addressing itself has
no constraints on powernv.
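
At allocation sites, the RMA clamp then becomes a no-op on bare metal
(a sketch of the min() pattern used later in the series):

  limit = min(safe_limit, ppc64_rma_size);
  /* with ppc64_rma_size = ULONG_MAX in HV mode, min() always picks
   * safe_limit, so only the bolted SLB/TLB limits still apply */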

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/mm/hash_utils_64.c | 24 +++++++++++++++---------
 arch/powerpc/mm/pgtable-radix.c | 33 +++++++++++++++++----------------
 2 files changed, 32 insertions(+), 25 deletions(-)

diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 7a20669c19e7..d3da19cc4867 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -1824,16 +1824,22 @@ void hash__setup_initial_memory_limit(phys_addr_t first_memblock_base,
 	 */
 	BUG_ON(first_memblock_base != 0);
 
-	/* On LPAR systems, the first entry is our RMA region,
-	 * non-LPAR 64-bit hash MMU systems don't have a limitation
-	 * on real mode access, but using the first entry works well
-	 * enough. We also clamp it to 1G to avoid some funky things
-	 * such as RTAS bugs etc...
-	 */
-	ppc64_rma_size = min_t(u64, first_memblock_size, 0x40000000);
+	if (!early_cpu_has_feature(CPU_FTR_HVMODE)) {
+		/*
+		 * On virtualized systems, the first entry is our RMA region,
+		 * non-LPAR 64-bit hash MMU systems don't have a limitation
+		 * on real mode access.
+		 *
+		 * We also clamp it to 1G to avoid some funky things
+		 * such as RTAS bugs etc...
+		 */
+		ppc64_rma_size = min_t(u64, first_memblock_size, 0x40000000);
 
-	/* Finally limit subsequent allocations */
-	memblock_set_current_limit(ppc64_rma_size);
+		/* Finally limit subsequent allocations */
+		memblock_set_current_limit(ppc64_rma_size);
+	} else {
+		ppc64_rma_size = ULONG_MAX;
+	}
 }
 
 #ifdef CONFIG_DEBUG_FS
diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
index 671a45d86c18..61ca17d81737 100644
--- a/arch/powerpc/mm/pgtable-radix.c
+++ b/arch/powerpc/mm/pgtable-radix.c
@@ -598,22 +598,23 @@ void radix__setup_initial_memory_limit(phys_addr_t first_memblock_base,
 	 * physical on those processors
 	 */
 	BUG_ON(first_memblock_base != 0);
-	/*
-	 * We limit the allocation that depend on ppc64_rma_size
-	 * to first_memblock_size. We also clamp it to 1GB to
-	 * avoid some funky things such as RTAS bugs.
-	 *
-	 * On radix config we really don't have a limitation
-	 * on real mode access. But keeping it as above works
-	 * well enough.
-	 */
-	ppc64_rma_size = min_t(u64, first_memblock_size, 0x40000000);
-	/*
-	 * Finally limit subsequent allocations. We really don't want
-	 * to limit the memblock allocations to rma_size. FIXME!! should
-	 * we even limit at all ?
-	 */
-	memblock_set_current_limit(first_memblock_base + first_memblock_size);
+
+	if (!early_cpu_has_feature(CPU_FTR_HVMODE)) {
+		/*
+		 * On virtualized systems, the first entry is our RMA region,
+		 * non-LPAR 64-bit hash MMU systems don't have a limitation
+		 * on real mode access.
+		 *
+		 * We also clamp it to 1G to avoid some funky things
+		 * such as RTAS bugs etc...
+		 */
+		ppc64_rma_size = min_t(u64, first_memblock_size, 0x40000000);
+
+		/* Finally limit subsequent allocations */
+		memblock_set_current_limit(ppc64_rma_size);
+	} else {
+		ppc64_rma_size = ULONG_MAX;
+	}
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG
-- 
2.13.3

* [PATCH v2 4/9] powerpc/64s/radix: Remove bolted-SLB address limit for per-cpu stacks
  2017-08-12 11:34 [PATCH v2 0/9] improve early structure allocations (paca, lppaca, etc) Nicholas Piggin
                   ` (2 preceding siblings ...)
  2017-08-13  1:33 ` [PATCH v2 3/9] powerpc/powernv: Remove real mode access limit for early allocations Nicholas Piggin
@ 2017-08-13  1:33 ` Nicholas Piggin
  2017-08-31 11:36   ` [v2, " Michael Ellerman
  2017-08-13  1:33 ` [PATCH v2 5/9] powerpc/64s: Relax PACA address limitations Nicholas Piggin
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 28+ messages in thread
From: Nicholas Piggin @ 2017-08-13  1:33 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin, kvm-ppc

The radix MMU does not take SLB or TLB interrupts when accessing the
kernel linear address space, so remove this restriction for radix
mode.
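
Under hash, stacks must stay within a bolted segment because the
exception entry paths cannot tolerate an SLB miss on the stack; radix
translates the linear map via the hardware page-table walker, so no
such limit applies. Roughly (a sketch of the resulting limit, not the
literal diff below):

  limit = early_radix_enabled() ? ULONG_MAX    /* no bolted-SLB ceiling */
                                : 1UL << SID_SHIFT_1T; /* first 1T segment (hash) */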

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/setup_64.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index af23d4b576ec..7393bac3c7f4 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -564,6 +564,9 @@ static __init u64 safe_stack_limit(void)
 	/* Other BookE, we assume the first GB is bolted */
 	return 1ul << 30;
 #else
+	if (early_radix_enabled())
+		return ULONG_MAX;
+
 	/* BookS, the first segment is bolted */
 	if (mmu_has_feature(MMU_FTR_1T_SEGMENT))
 		return 1UL << SID_SHIFT_1T;
@@ -578,7 +581,8 @@ void __init irqstack_early_init(void)
 
 	/*
 	 * Interrupt stacks must be in the first segment since we
-	 * cannot afford to take SLB misses on them.
+	 * cannot afford to take SLB misses on them. They are not
+	 * accessed in realmode.
 	 */
 	for_each_possible_cpu(i) {
 		softirq_ctx[i] = (struct thread_info *)
@@ -649,8 +653,9 @@ void __init emergency_stack_init(void)
 	 * aligned.
 	 *
 	 * Since we use these as temporary stacks during secondary CPU
-	 * bringup, we need to get at them in real mode. This means they
-	 * must also be within the RMO region.
+	 * bringup, machine check, system reset, and HMI, we need to get
+	 * at them in real mode. This means they must also be within the RMO
+	 * region.
 	 *
 	 * The IRQ stacks allocated elsewhere in this file are zeroed and
 	 * initialized in kernel/irq.c. These are initialized here in order
-- 
2.13.3

* [PATCH v2 5/9] powerpc/64s: Relax PACA address limitations
  2017-08-12 11:34 [PATCH v2 0/9] improve early structure allocations (paca, lppaca, etc) Nicholas Piggin
                   ` (3 preceding siblings ...)
  2017-08-13  1:33 ` [PATCH v2 4/9] powerpc/64s/radix: Remove bolted-SLB address limit for per-cpu stacks Nicholas Piggin
@ 2017-08-13  1:33 ` Nicholas Piggin
  2017-08-13  1:33 ` [PATCH v2 6/9] powerpc/64s/radix: Do not allocate SLB shadow structures Nicholas Piggin
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 28+ messages in thread
From: Nicholas Piggin @ 2017-08-13  1:33 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin, kvm-ppc

Book3S radix mode has no SLB interrupt limitation, and hash mode has
a 1T limitation on modern CPUs, so PACA allocation limits can be
lifted.

Update the paca allocation limits accordingly, and share the TLB/SLB
limit calculation with the stack allocation code.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/paca.c     | 13 +++++--------
 arch/powerpc/kernel/setup.h    |  4 ++++
 arch/powerpc/kernel/setup_64.c |  7 ++++---
 3 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index 8d63627e067f..64401f551765 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -18,6 +18,8 @@
 #include <asm/pgtable.h>
 #include <asm/kexec.h>
 
+#include "setup.h"
+
 #ifdef CONFIG_PPC_BOOK3S
 
 /*
@@ -199,16 +201,11 @@ void __init allocate_pacas(void)
 	u64 limit;
 	int cpu;
 
-	limit = ppc64_rma_size;
-
-#ifdef CONFIG_PPC_BOOK3S_64
 	/*
-	 * We can't take SLB misses on the paca, and we want to access them
-	 * in real mode, so allocate them within the RMA and also within
-	 * the first segment.
+	 * We access pacas in real mode, and cannot take faults on them when
+	 * in virtual mode, so allocate them accordingly.
 	 */
-	limit = min(0x10000000ULL, limit);
-#endif
+	limit = min(safe_kva_limit(), ppc64_rma_size);
 
 	paca_size = PAGE_ALIGN(sizeof(struct paca_struct) * nr_cpu_ids);
 
diff --git a/arch/powerpc/kernel/setup.h b/arch/powerpc/kernel/setup.h
index cfba134b3024..b97dfb50298c 100644
--- a/arch/powerpc/kernel/setup.h
+++ b/arch/powerpc/kernel/setup.h
@@ -45,6 +45,10 @@ void emergency_stack_init(void);
 static inline void emergency_stack_init(void) { };
 #endif
 
+#ifdef CONFIG_PPC64
+u64 safe_kva_limit(void);
+#endif
+
 /*
  * Having this in kvm_ppc.h makes include dependencies too
  * tricky to solve for setup-common.c so have it here.
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 7393bac3c7f4..35ad5f28f0c1 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -555,10 +555,11 @@ void __init initialize_cache_info(void)
  * used to allocate interrupt or emergency stacks for which our
  * exception entry path doesn't deal with being interrupted.
  */
-static __init u64 safe_stack_limit(void)
+__init u64 safe_kva_limit(void)
 {
 #ifdef CONFIG_PPC_BOOK3E
 	/* Freescale BookE bolts the entire linear mapping */
+	/* XXX: BookE ppc64_rma_limit setup seems to disagree? */
 	if (mmu_has_feature(MMU_FTR_TYPE_FSL_E))
 		return linear_map_top;
 	/* Other BookE, we assume the first GB is bolted */
@@ -576,7 +577,7 @@ static __init u64 safe_stack_limit(void)
 
 void __init irqstack_early_init(void)
 {
-	u64 limit = safe_stack_limit();
+	u64 limit = safe_kva_limit();
 	unsigned int i;
 
 	/*
@@ -661,7 +662,7 @@ void __init emergency_stack_init(void)
 	 * initialized in kernel/irq.c. These are initialized here in order
 	 * to have emergency stacks available as early as possible.
 	 */
-	limit = min(safe_stack_limit(), ppc64_rma_size);
+	limit = min(safe_kva_limit(), ppc64_rma_size);
 
 	for_each_possible_cpu(i) {
 		struct thread_info *ti;
-- 
2.13.3

* [PATCH v2 6/9] powerpc/64s/radix: Do not allocate SLB shadow structures
  2017-08-12 11:34 [PATCH v2 0/9] improve early structure allocations (paca, lppaca, etc) Nicholas Piggin
                   ` (4 preceding siblings ...)
  2017-08-13  1:33 ` [PATCH v2 5/9] powerpc/64s: Relax PACA address limitations Nicholas Piggin
@ 2017-08-13  1:33 ` Nicholas Piggin
  2017-08-31 11:36   ` [v2, " Michael Ellerman
  2017-08-13  1:33 ` [PATCH v2 7/9] powerpc/64s: do not allocate lppaca if we are not virtualized Nicholas Piggin
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 28+ messages in thread
From: Nicholas Piggin @ 2017-08-13  1:33 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin, kvm-ppc

The SLB shadow structures are unused in radix mode, so do not allocate
them there.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/paca.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index 64401f551765..354a955ca377 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -101,18 +101,27 @@ static inline void free_lppacas(void) { }
  * If you make the number of persistent SLB entries dynamic, please also
  * update PR KVM to flush and restore them accordingly.
  */
-static struct slb_shadow *slb_shadow;
+static struct slb_shadow * __initdata slb_shadow;
 
 static void __init allocate_slb_shadows(int nr_cpus, int limit)
 {
 	int size = PAGE_ALIGN(sizeof(struct slb_shadow) * nr_cpus);
+
+	if (early_radix_enabled())
+		return;
+
 	slb_shadow = __va(memblock_alloc_base(size, PAGE_SIZE, limit));
 	memset(slb_shadow, 0, size);
 }
 
 static struct slb_shadow * __init init_slb_shadow(int cpu)
 {
-	struct slb_shadow *s = &slb_shadow[cpu];
+	struct slb_shadow *s;
+
+	if (early_radix_enabled())
+		return NULL;
+
+	s = &slb_shadow[cpu];
 
 	/*
 	 * When we come through here to initialise boot_paca, the slb_shadow
-- 
2.13.3

* [PATCH v2 7/9] powerpc/64s: do not allocate lppaca if we are not virtualized
  2017-08-12 11:34 [PATCH v2 0/9] improve early structure allocations (paca, lppaca, etc) Nicholas Piggin
                   ` (5 preceding siblings ...)
  2017-08-13  1:33 ` [PATCH v2 6/9] powerpc/64s/radix: Do not allocate SLB shadow structures Nicholas Piggin
@ 2017-08-13  1:33 ` Nicholas Piggin
  2017-10-13 22:47   ` Paul Mackerras
  2017-08-13  1:33 ` [PATCH v2 8/9] powerpc/64: Use a table of paca pointers and allocate pacas individually Nicholas Piggin
  2017-08-13  1:33 ` [PATCH v2 9/9] powerpc/64: Use a table of lppaca pointers and allocate lppacas individually Nicholas Piggin
  8 siblings, 1 reply; 28+ messages in thread
From: Nicholas Piggin @ 2017-08-13  1:33 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin, kvm-ppc

The "lppaca" is a structure registered with the hypervisor. This
is unnecessary when running on non-virtualised platforms. One field
from the lppaca (pmcregs_in_use) is also used by the host, so move
the host part out into the paca (lppaca field is still updated in
guest mode).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/include/asm/paca.h         |  8 ++++++--
 arch/powerpc/include/asm/pmc.h          | 10 +++++++++-
 arch/powerpc/kernel/asm-offsets.c       |  7 +++++++
 arch/powerpc/kernel/paca.c              | 16 +++++++++++++---
 arch/powerpc/kernel/prom.c              | 10 +++++++---
 arch/powerpc/kvm/book3s_hv_interrupts.S |  3 +--
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |  5 ++---
 7 files changed, 45 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index dc88a31cc79a..de47c5a4f132 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -57,7 +57,7 @@ struct task_struct;
  * processor.
  */
 struct paca_struct {
-#ifdef CONFIG_PPC_BOOK3S
+#ifdef CONFIG_PPC_PSERIES
 	/*
 	 * Because hw_cpu_id, unlike other paca fields, is accessed
 	 * routinely from other CPUs (from the IRQ code), we stick to
@@ -66,7 +66,8 @@ struct paca_struct {
 	 */
 
 	struct lppaca *lppaca_ptr;	/* Pointer to LpPaca for PLIC */
-#endif /* CONFIG_PPC_BOOK3S */
+#endif /* CONFIG_PPC_PSERIES */
+
 	/*
 	 * MAGIC: the spinlock functions in arch/powerpc/lib/locks.c 
 	 * load lock_token and paca_index with a single lwz
@@ -158,6 +159,9 @@ struct paca_struct {
 	u64 saved_r1;			/* r1 save for RTAS calls or PM */
 	u64 saved_msr;			/* MSR saved here by enter_rtas */
 	u16 trap_save;			/* Used when bad stack is encountered */
+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+	u8 pmcregs_in_use;		/* pseries puts this in lppaca */
+#endif
 	u8 soft_enabled;		/* irq soft-enable flag */
 	u8 irq_happened;		/* irq happened while soft-disabled */
 	u8 io_sync;			/* writel() needs spin_unlock sync */
diff --git a/arch/powerpc/include/asm/pmc.h b/arch/powerpc/include/asm/pmc.h
index 5a9ede4962cb..7b672a72cb0b 100644
--- a/arch/powerpc/include/asm/pmc.h
+++ b/arch/powerpc/include/asm/pmc.h
@@ -31,10 +31,18 @@ void ppc_enable_pmcs(void);
 
 #ifdef CONFIG_PPC_BOOK3S_64
 #include <asm/lppaca.h>
+#include <asm/firmware.h>
 
 static inline void ppc_set_pmu_inuse(int inuse)
 {
-	get_lppaca()->pmcregs_in_use = inuse;
+#ifdef CONFIG_PPC_PSERIES
+	if (firmware_has_feature(FW_FEATURE_LPAR))
+		get_lppaca()->pmcregs_in_use = inuse;
+#endif
+
+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+	get_paca()->pmcregs_in_use = inuse;
+#endif
 }
 
 extern void power4_enable_pmcs(void);
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 6e95c2c19a7e..831b277c91c7 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -221,12 +221,19 @@ int main(void)
 	OFFSET(PACA_EXMC, paca_struct, exmc);
 	OFFSET(PACA_EXSLB, paca_struct, exslb);
 	OFFSET(PACA_EXNMI, paca_struct, exnmi);
+#ifdef CONFIG_PPC_PSERIES
 	OFFSET(PACALPPACAPTR, paca_struct, lppaca_ptr);
+#endif
 	OFFSET(PACA_SLBSHADOWPTR, paca_struct, slb_shadow_ptr);
 	OFFSET(SLBSHADOW_STACKVSID, slb_shadow, save_area[SLB_NUM_BOLTED - 1].vsid);
 	OFFSET(SLBSHADOW_STACKESID, slb_shadow, save_area[SLB_NUM_BOLTED - 1].esid);
 	OFFSET(SLBSHADOW_SAVEAREA, slb_shadow, save_area);
+#ifdef CONFIG_PPC_PSERIES
 	OFFSET(LPPACA_PMCINUSE, lppaca, pmcregs_in_use);
+#endif
+#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+	OFFSET(PACA_PMCINUSE, paca_struct, pmcregs_in_use);
+#endif
 	OFFSET(LPPACA_DTLIDX, lppaca, dtl_idx);
 	OFFSET(LPPACA_YIELDCOUNT, lppaca, yield_count);
 	OFFSET(PACA_DTL_RIDX, paca_struct, dtl_ridx);
diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index 354a955ca377..5afd96980679 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -20,7 +20,7 @@
 
 #include "setup.h"
 
-#ifdef CONFIG_PPC_BOOK3S
+#ifdef CONFIG_PPC_PSERIES
 
 /*
  * The structure which the hypervisor knows about - this structure
@@ -47,6 +47,9 @@ static long __initdata lppaca_size;
 
 static void __init allocate_lppacas(int nr_cpus, unsigned long limit)
 {
+	if (!firmware_has_feature(FW_FEATURE_LPAR))
+		return;
+
 	if (nr_cpus <= NR_LPPACAS)
 		return;
 
@@ -60,6 +63,9 @@ static struct lppaca * __init new_lppaca(int cpu)
 {
 	struct lppaca *lp;
 
+	if (!firmware_has_feature(FW_FEATURE_LPAR))
+		return NULL;
+
 	if (cpu < NR_LPPACAS)
 		return &lppaca[cpu];
 
@@ -73,6 +79,9 @@ static void __init free_lppacas(void)
 {
 	long new_size = 0, nr;
 
+	if (!firmware_has_feature(FW_FEATURE_LPAR))
+		return;
+
 	if (!lppaca_size)
 		return;
 	nr = num_possible_cpus() - NR_LPPACAS;
@@ -157,9 +166,10 @@ EXPORT_SYMBOL(paca);
 
 void __init initialise_paca(struct paca_struct *new_paca, int cpu)
 {
-#ifdef CONFIG_PPC_BOOK3S
+#ifdef CONFIG_PPC_PSERIES
 	new_paca->lppaca_ptr = new_lppaca(cpu);
-#else
+#endif
+#ifdef CONFIG_PPC_BOOK3E
 	new_paca->kernel_pgd = swapper_pg_dir;
 #endif
 	new_paca->lock_token = 0x8000;
diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index f83056297441..6f29c6d1cb76 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -728,6 +728,13 @@ void __init early_init_devtree(void *params)
 	 * FIXME .. and the initrd too? */
 	move_device_tree();
 
+	/*
+	 * Now try to figure out if we are running on LPAR and so on
+	 * This must run before allocate_pacas() in order to allocate
+	 * lppacas or not.
+	 */
+	pseries_probe_fw_features();
+
 	allocate_pacas();
 
 	DBG("Scanning CPUs ...\n");
@@ -758,9 +765,6 @@ void __init early_init_devtree(void *params)
 #endif
 	epapr_paravirt_early_init();
 
-	/* Now try to figure out if we are running on LPAR and so on */
-	pseries_probe_fw_features();
-
 #ifdef CONFIG_PPC_PS3
 	/* Identify PS3 firmware */
 	if (of_flat_dt_is_compatible(of_get_flat_dt_root(), "sony,ps3"))
diff --git a/arch/powerpc/kvm/book3s_hv_interrupts.S b/arch/powerpc/kvm/book3s_hv_interrupts.S
index dc54373c8780..0e8493033288 100644
--- a/arch/powerpc/kvm/book3s_hv_interrupts.S
+++ b/arch/powerpc/kvm/book3s_hv_interrupts.S
@@ -79,8 +79,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 	li	r5, 0
 	mtspr	SPRN_MMCRA, r5
 	isync
-	ld	r3, PACALPPACAPTR(r13)	/* is the host using the PMU? */
-	lbz	r5, LPPACA_PMCINUSE(r3)
+	lbz	r5, PACA_PMCINUSE(r13)	/* is the host using the PMU? */
 	cmpwi	r5, 0
 	beq	31f			/* skip if not */
 	mfspr	r5, SPRN_MMCR1
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index c52184a8efdf..b838348e3a2b 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -99,8 +99,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
 	mtspr	SPRN_SPRG_VDSO_WRITE,r3
 
 	/* Reload the host's PMU registers */
-	ld	r3, PACALPPACAPTR(r13)	/* is the host using the PMU? */
-	lbz	r4, LPPACA_PMCINUSE(r3)
+	lbz	r4, PACA_PMCINUSE(r13) /* is the host using the PMU? */
 	cmpwi	r4, 0
 	beq	23f			/* skip if not */
 BEGIN_FTR_SECTION
@@ -1671,7 +1670,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 	mtspr	SPRN_MMCRA, r7
 	isync
 	beq	21f			/* if no VPA, save PMU stuff anyway */
-	lbz	r7, LPPACA_PMCINUSE(r8)
+	lbz	r7, PACA_PMCINUSE(r13)
 	cmpwi	r7, 0			/* did they ask for PMU stuff to be saved? */
 	bne	21f
 	std	r3, VCPU_MMCR(r9)	/* if not, set saved MMCR0 to FC */
-- 
2.13.3

* [PATCH v2 8/9] powerpc/64: Use a table of paca pointers and allocate pacas individually
  2017-08-12 11:34 [PATCH v2 0/9] improve early structure allocations (paca, lppaca, etc) Nicholas Piggin
                   ` (6 preceding siblings ...)
  2017-08-13  1:33 ` [PATCH v2 7/9] powerpc/64s: do not allocate lppaca if we are not virtualized Nicholas Piggin
@ 2017-08-13  1:33 ` Nicholas Piggin
  2017-08-15 15:50   ` Nicholas Piggin
  2017-08-15 17:30   ` kbuild test robot
  2017-08-13  1:33 ` [PATCH v2 9/9] powerpc/64: Use a table of lppaca pointers and allocate lppacas individually Nicholas Piggin
  8 siblings, 2 replies; 28+ messages in thread
From: Nicholas Piggin @ 2017-08-13  1:33 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin, kvm-ppc

Change the paca array into an array of pointers to pacas. Allocate
pacas individually.

This allows flexibility in where the PACAs are allocated. Future work
will allocate them node-local. Platforms that don't have address
limits on PACAs could also defer PACA allocation until later in boot,
rather than allocating all possible ones up-front and then freeing the
unused ones.

This adds slightly more overhead (one additional indirection) for
cross-CPU paca references, but those aren't too common.
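
For a cross-CPU reference, the difference is one extra load (sketch):

  /* before: index a flat array off a single base pointer */
  state = paca[cpu].kexec_state;

  /* after: load the entry from the table, then dereference it */
  state = paca_ptrs[cpu]->kexec_state;

A CPU's own paca is still reached directly through r13, so local
accesses are unaffected.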

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/include/asm/kvm_ppc.h           |  8 ++--
 arch/powerpc/include/asm/lppaca.h            |  2 +-
 arch/powerpc/include/asm/paca.h              |  4 +-
 arch/powerpc/include/asm/smp.h               |  4 +-
 arch/powerpc/kernel/crash.c                  |  2 +-
 arch/powerpc/kernel/head_64.S                | 12 +++--
 arch/powerpc/kernel/machine_kexec_64.c       | 22 ++++-----
 arch/powerpc/kernel/paca.c                   | 68 +++++++++++++++++++---------
 arch/powerpc/kernel/setup_64.c               | 18 ++++----
 arch/powerpc/kernel/smp.c                    | 10 ++--
 arch/powerpc/kvm/book3s_hv.c                 | 21 +++++----
 arch/powerpc/kvm/book3s_hv_builtin.c         |  2 +-
 arch/powerpc/mm/tlb-radix.c                  |  2 +-
 arch/powerpc/platforms/85xx/smp.c            |  8 ++--
 arch/powerpc/platforms/cell/smp.c            |  4 +-
 arch/powerpc/platforms/powernv/idle.c        | 13 +++---
 arch/powerpc/platforms/powernv/setup.c       |  4 +-
 arch/powerpc/platforms/powernv/smp.c         |  2 +-
 arch/powerpc/platforms/powernv/subcore.c     |  2 +-
 arch/powerpc/platforms/pseries/hotplug-cpu.c |  2 +-
 arch/powerpc/platforms/pseries/lpar.c        |  4 +-
 arch/powerpc/platforms/pseries/setup.c       |  2 +-
 arch/powerpc/platforms/pseries/smp.c         |  4 +-
 arch/powerpc/sysdev/xics/icp-native.c        |  2 +-
 arch/powerpc/xmon/xmon.c                     |  2 +-
 25 files changed, 127 insertions(+), 97 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index ba5fadd6f3c9..49da5d47c693 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -428,15 +428,15 @@ struct openpic;
 extern void kvm_cma_reserve(void) __init;
 static inline void kvmppc_set_xics_phys(int cpu, unsigned long addr)
 {
-	paca[cpu].kvm_hstate.xics_phys = (void __iomem *)addr;
+	paca_ptrs[cpu]->kvm_hstate.xics_phys = (void __iomem *)addr;
 }
 
 static inline void kvmppc_set_xive_tima(int cpu,
 					unsigned long phys_addr,
 					void __iomem *virt_addr)
 {
-	paca[cpu].kvm_hstate.xive_tima_phys = (void __iomem *)phys_addr;
-	paca[cpu].kvm_hstate.xive_tima_virt = virt_addr;
+	paca_ptrs[cpu]->kvm_hstate.xive_tima_phys = (void __iomem *)phys_addr;
+	paca_ptrs[cpu]->kvm_hstate.xive_tima_virt = virt_addr;
 }
 
 static inline u32 kvmppc_get_xics_latch(void)
@@ -450,7 +450,7 @@ static inline u32 kvmppc_get_xics_latch(void)
 
 static inline void kvmppc_set_host_ipi(int cpu, u8 host_ipi)
 {
-	paca[cpu].kvm_hstate.host_ipi = host_ipi;
+	paca_ptrs[cpu]->kvm_hstate.host_ipi = host_ipi;
 }
 
 static inline void kvmppc_fast_vcpu_kick(struct kvm_vcpu *vcpu)
diff --git a/arch/powerpc/include/asm/lppaca.h b/arch/powerpc/include/asm/lppaca.h
index d0a2a2f99564..6e4589eee2da 100644
--- a/arch/powerpc/include/asm/lppaca.h
+++ b/arch/powerpc/include/asm/lppaca.h
@@ -103,7 +103,7 @@ struct lppaca {
 
 extern struct lppaca lppaca[];
 
-#define lppaca_of(cpu)	(*paca[cpu].lppaca_ptr)
+#define lppaca_of(cpu)	(*paca_ptrs[cpu]->lppaca_ptr)
 
 /*
  * We are using a non architected field to determine if a partition is
diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index de47c5a4f132..f332f92996ab 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -228,10 +228,10 @@ struct paca_struct {
 	struct sibling_subcore_state *sibling_subcore_state;
 #endif
 #endif
-};
+} ____cacheline_aligned;
 
 extern void copy_mm_to_paca(struct mm_struct *mm);
-extern struct paca_struct *paca;
+extern struct paca_struct **paca_ptrs;
 extern void initialise_paca(struct paca_struct *new_paca, int cpu);
 extern void setup_paca(struct paca_struct *new_paca);
 extern void allocate_pacas(void);
diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index 8ea98504f900..1100574bcccd 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -164,12 +164,12 @@ static inline const struct cpumask *cpu_sibling_mask(int cpu)
 #ifdef CONFIG_PPC64
 static inline int get_hard_smp_processor_id(int cpu)
 {
-	return paca[cpu].hw_cpu_id;
+	return paca_ptrs[cpu]->hw_cpu_id;
 }
 
 static inline void set_hard_smp_processor_id(int cpu, int phys)
 {
-	paca[cpu].hw_cpu_id = phys;
+	paca_ptrs[cpu]->hw_cpu_id = phys;
 }
 #else
 /* 32-bit */
diff --git a/arch/powerpc/kernel/crash.c b/arch/powerpc/kernel/crash.c
index cbabb5adccd9..99eb8fd87d6f 100644
--- a/arch/powerpc/kernel/crash.c
+++ b/arch/powerpc/kernel/crash.c
@@ -230,7 +230,7 @@ static void __maybe_unused crash_kexec_wait_realmode(int cpu)
 		if (i == cpu)
 			continue;
 
-		while (paca[i].kexec_state < KEXEC_STATE_REAL_MODE) {
+		while (paca_ptrs[i]->kexec_state < KEXEC_STATE_REAL_MODE) {
 			barrier();
 			if (!cpu_possible(i) || !cpu_online(i) || (msecs <= 0))
 				break;
diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
index 0ddc602b33a4..f71f468ebe7f 100644
--- a/arch/powerpc/kernel/head_64.S
+++ b/arch/powerpc/kernel/head_64.S
@@ -386,19 +386,21 @@ generic_secondary_common_init:
 	 * physical cpu id in r24, we need to search the pacas to find
 	 * which logical id maps to our physical one.
 	 */
-	LOAD_REG_ADDR(r13, paca)	/* Load paca pointer		 */
-	ld	r13,0(r13)		/* Get base vaddr of paca array	 */
+	LOAD_REG_ADDR(r8, paca_ptrs)	/* Load paca_ptrs pointer	 */
+	ld	r8,0(r8)		/* Get base vaddr of array	 */
 #ifndef CONFIG_SMP
-	addi	r13,r13,PACA_SIZE	/* know r13 if used accidentally */
+	li	r13,0			/* kill r13 if used accidentally */
 	b	kexec_wait		/* wait for next kernel if !SMP	 */
 #else
 	LOAD_REG_ADDR(r7, nr_cpu_ids)	/* Load nr_cpu_ids address       */
 	lwz	r7,0(r7)		/* also the max paca allocated 	 */
 	li	r5,0			/* logical cpu id                */
-1:	lhz	r6,PACAHWCPUID(r13)	/* Load HW procid from paca      */
+1:
+	sldi	r9,r5,3			/* get paca_ptrs[] index from cpu id */
+	ldx	r13,r8,r9		/* r13 = paca_ptrs[cpu id]       */
+	lhz	r6,PACAHWCPUID(r13)	/* Load HW procid from paca      */
 	cmpw	r6,r24			/* Compare to our id             */
 	beq	2f
-	addi	r13,r13,PACA_SIZE	/* Loop to next PACA on miss     */
 	addi	r5,r5,1
 	cmpw	r5,r7			/* Check if more pacas exist     */
 	blt	1b
diff --git a/arch/powerpc/kernel/machine_kexec_64.c b/arch/powerpc/kernel/machine_kexec_64.c
index 5c12e21d0d1a..700cd25fbd28 100644
--- a/arch/powerpc/kernel/machine_kexec_64.c
+++ b/arch/powerpc/kernel/machine_kexec_64.c
@@ -168,24 +168,25 @@ static void kexec_prepare_cpus_wait(int wait_state)
 	 * are correctly onlined.  If somehow we start a CPU on boot with RTAS
 	 * start-cpu, but somehow that CPU doesn't write callin_cpu_map[] in
 	 * time, the boot CPU will timeout.  If it does eventually execute
-	 * stuff, the secondary will start up (paca[].cpu_start was written) and
-	 * get into a peculiar state.  If the platform supports
-	 * smp_ops->take_timebase(), the secondary CPU will probably be spinning
-	 * in there.  If not (i.e. pseries), the secondary will continue on and
-	 * try to online itself/idle/etc. If it survives that, we need to find
-	 * these possible-but-not-online-but-should-be CPUs and chaperone them
-	 * into kexec_smp_wait().
+	 * stuff, the secondary will start up (paca_ptrs[]->cpu_start was
+	 * written) and get into a peculiar state.
+	 * If the platform supports smp_ops->take_timebase(), the secondary CPU
+	 * will probably be spinning in there.  If not (i.e. pseries), the
+	 * secondary will continue on and try to online itself/idle/etc. If it
+	 * survives that, we need to find these
+	 * possible-but-not-online-but-should-be CPUs and chaperone them into
+	 * kexec_smp_wait().
 	 */
 	for_each_online_cpu(i) {
 		if (i == my_cpu)
 			continue;
 
-		while (paca[i].kexec_state < wait_state) {
+		while (paca_ptrs[i]->kexec_state < wait_state) {
 			barrier();
 			if (i != notified) {
 				printk(KERN_INFO "kexec: waiting for cpu %d "
 				       "(physical %d) to enter %i state\n",
-				       i, paca[i].hw_cpu_id, wait_state);
+				       i, paca_ptrs[i]->hw_cpu_id, wait_state);
 				notified = i;
 			}
 		}
@@ -327,8 +328,7 @@ void default_machine_kexec(struct kimage *image)
 	 */
 	memcpy(&kexec_paca, get_paca(), sizeof(struct paca_struct));
 	kexec_paca.data_offset = 0xedeaddeadeeeeeeeUL;
-	paca = (struct paca_struct *)RELOC_HIDE(&kexec_paca, 0) -
-		kexec_paca.paca_index;
+	paca_ptrs[kexec_paca.paca_index] = &kexec_paca;
 	setup_paca(&kexec_paca);
 
 	/* XXX: If anyone does 'dynamic lppacas' this will also need to be
diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index 5afd96980679..780c65a847d4 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -161,8 +161,8 @@ static void __init allocate_slb_shadows(int nr_cpus, int limit) { }
  * processors.  The processor VPD array needs one entry per physical
  * processor (not thread).
  */
-struct paca_struct *paca;
-EXPORT_SYMBOL(paca);
+struct paca_struct **paca_ptrs __read_mostly;
+EXPORT_SYMBOL(paca_ptrs);
 
 void __init initialise_paca(struct paca_struct *new_paca, int cpu)
 {
@@ -213,11 +213,13 @@ void setup_paca(struct paca_struct *new_paca)
 
 }
 
-static int __initdata paca_size;
+static int __initdata paca_nr_cpu_ids;
+static int __initdata paca_ptrs_size;
 
 void __init allocate_pacas(void)
 {
 	u64 limit;
+	unsigned long size = 0;
 	int cpu;
 
 	/*
@@ -226,13 +228,25 @@ void __init allocate_pacas(void)
 	 */
 	limit = min(safe_kva_limit(), ppc64_rma_size);
 
-	paca_size = PAGE_ALIGN(sizeof(struct paca_struct) * nr_cpu_ids);
+	paca_nr_cpu_ids = nr_cpu_ids;
 
-	paca = __va(memblock_alloc_base(paca_size, PAGE_SIZE, limit));
-	memset(paca, 0, paca_size);
+	paca_ptrs_size = sizeof(struct paca_struct *) * nr_cpu_ids;
+	paca_ptrs = __va(memblock_alloc_base(paca_ptrs_size, 0, limit));
+	memset(paca_ptrs, 0, paca_ptrs_size);
 
-	printk(KERN_DEBUG "Allocated %u bytes for %d pacas at %p\n",
-		paca_size, nr_cpu_ids, paca);
+	size += paca_ptrs_size;
+
+	for (cpu = 0; cpu < nr_cpu_ids; cpu++) {
+		unsigned long pa;
+
+		pa = memblock_alloc_base(sizeof(struct paca_struct),
+						L1_CACHE_BYTES, limit);
+		paca_ptrs[cpu] = __va(pa);
+
+		size += sizeof(struct paca_struct);
+	}
+
+	printk(KERN_DEBUG "Allocated %lu bytes for %d pacas\n", size, nr_cpu_ids);
 
 	allocate_lppacas(nr_cpu_ids, limit);
 
@@ -240,26 +254,38 @@ void __init allocate_pacas(void)
 
 	/* Can't use for_each_*_cpu, as they aren't functional yet */
 	for (cpu = 0; cpu < nr_cpu_ids; cpu++)
-		initialise_paca(&paca[cpu], cpu);
+		initialise_paca(paca_ptrs[cpu], cpu);
 }
 
 void __init free_unused_pacas(void)
 {
-	int new_size;
-
-	new_size = PAGE_ALIGN(sizeof(struct paca_struct) * nr_cpu_ids);
-
-	if (new_size >= paca_size)
-		return;
-
-	memblock_free(__pa(paca) + new_size, paca_size - new_size);
-
-	printk(KERN_DEBUG "Freed %u bytes for unused pacas\n",
-		paca_size - new_size);
+	unsigned long size = 0;
+	int new_ptrs_size;
+	int cpu;
 
-	paca_size = new_size;
+	for (cpu = 0; cpu < paca_nr_cpu_ids; cpu++) {
+		if (!cpu_possible(cpu)) {
+			unsigned long pa = __pa(paca_ptrs[cpu]);
+			memblock_free(pa, sizeof(struct paca_struct));
+			paca_ptrs[cpu] = NULL;
+			size += sizeof(struct paca_struct);
+		}
+	}
+
+	new_ptrs_size = sizeof(struct paca_struct *) * nr_cpu_ids;
+	if (new_ptrs_size < paca_ptrs_size) {
+		memblock_free(__pa(paca_ptrs) + new_ptrs_size,
+					paca_ptrs_size - new_ptrs_size);
+		size += paca_ptrs_size - new_ptrs_size;
+	}
+
+	if (size)
+		printk(KERN_DEBUG "Freed %lu bytes for unused pacas\n", size);
 
 	free_lppacas();
+
+	paca_nr_cpu_ids = nr_cpu_ids;
+	paca_ptrs_size = new_ptrs_size;
 }
 
 void copy_mm_to_paca(struct mm_struct *mm)
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 35ad5f28f0c1..19439ca61d3e 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -108,7 +108,7 @@ void __init setup_tlb_core_data(void)
 		if (cpu_first_thread_sibling(boot_cpuid) == first)
 			first = boot_cpuid;
 
-		paca[cpu].tcd_ptr = &paca[first].tcd;
+		paca_ptrs[cpu]->tcd_ptr = &paca_ptrs[first]->tcd;
 
 		/*
 		 * If we have threads, we need either tlbsrx.
@@ -300,7 +300,7 @@ void __init early_setup(unsigned long dt_ptr)
 	early_init_devtree(__va(dt_ptr));
 
 	/* Now we know the logical id of our boot cpu, setup the paca. */
-	setup_paca(&paca[boot_cpuid]);
+	setup_paca(paca_ptrs[boot_cpuid]);
 	fixup_boot_paca();
 
 	/*
@@ -604,15 +604,15 @@ void __init exc_lvl_early_init(void)
 	for_each_possible_cpu(i) {
 		sp = memblock_alloc(THREAD_SIZE, THREAD_SIZE);
 		critirq_ctx[i] = (struct thread_info *)__va(sp);
-		paca[i].crit_kstack = __va(sp + THREAD_SIZE);
+		paca_ptrs[i]->crit_kstack = __va(sp + THREAD_SIZE);
 
 		sp = memblock_alloc(THREAD_SIZE, THREAD_SIZE);
 		dbgirq_ctx[i] = (struct thread_info *)__va(sp);
-		paca[i].dbg_kstack = __va(sp + THREAD_SIZE);
+		paca_ptrs[i]->dbg_kstack = __va(sp + THREAD_SIZE);
 
 		sp = memblock_alloc(THREAD_SIZE, THREAD_SIZE);
 		mcheckirq_ctx[i] = (struct thread_info *)__va(sp);
-		paca[i].mc_kstack = __va(sp + THREAD_SIZE);
+		paca_ptrs[i]->mc_kstack = __va(sp + THREAD_SIZE);
 	}
 
 	if (cpu_has_feature(CPU_FTR_DEBUG_LVL_EXC))
@@ -669,20 +669,20 @@ void __init emergency_stack_init(void)
 		ti = __va(memblock_alloc_base(THREAD_SIZE, THREAD_SIZE, limit));
 		memset(ti, 0, THREAD_SIZE);
 		emerg_stack_init_thread_info(ti, i);
-		paca[i].emergency_sp = (void *)ti + THREAD_SIZE;
+		paca_ptrs[i]->emergency_sp = (void *)ti + THREAD_SIZE;
 
 #ifdef CONFIG_PPC_BOOK3S_64
 		/* emergency stack for NMI exception handling. */
 		ti = __va(memblock_alloc_base(THREAD_SIZE, THREAD_SIZE, limit));
 		memset(ti, 0, THREAD_SIZE);
 		emerg_stack_init_thread_info(ti, i);
-		paca[i].nmi_emergency_sp = (void *)ti + THREAD_SIZE;
+		paca_ptrs[i]->nmi_emergency_sp = (void *)ti + THREAD_SIZE;
 
 		/* emergency stack for machine check exception handling. */
 		ti = __va(memblock_alloc_base(THREAD_SIZE, THREAD_SIZE, limit));
 		memset(ti, 0, THREAD_SIZE);
 		emerg_stack_init_thread_info(ti, i);
-		paca[i].mc_emergency_sp = (void *)ti + THREAD_SIZE;
+		paca_ptrs[i]->mc_emergency_sp = (void *)ti + THREAD_SIZE;
 #endif
 	}
 }
@@ -738,7 +738,7 @@ void __init setup_per_cpu_areas(void)
 	delta = (unsigned long)pcpu_base_addr - (unsigned long)__per_cpu_start;
 	for_each_possible_cpu(cpu) {
                 __per_cpu_offset[cpu] = delta + pcpu_unit_offsets[cpu];
-		paca[cpu].data_offset = __per_cpu_offset[cpu];
+		paca_ptrs[cpu]->data_offset = __per_cpu_offset[cpu];
 	}
 }
 #endif
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index cf0e1245b8cc..e0360a48eff4 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -121,8 +121,8 @@ int smp_generic_kick_cpu(int nr)
 	 * cpu_start field to become non-zero After we set cpu_start,
 	 * the processor will continue on to secondary_start
 	 */
-	if (!paca[nr].cpu_start) {
-		paca[nr].cpu_start = 1;
+	if (!paca_ptrs[nr]->cpu_start) {
+		paca_ptrs[nr]->cpu_start = 1;
 		smp_mb();
 		return 0;
 	}
@@ -613,7 +613,7 @@ void smp_prepare_boot_cpu(void)
 {
 	BUG_ON(smp_processor_id() != boot_cpuid);
 #ifdef CONFIG_PPC64
-	paca[boot_cpuid].__current = current;
+	paca_ptrs[boot_cpuid]->__current = current;
 #endif
 	set_numa_node(numa_cpu_lookup_table[boot_cpuid]);
 	current_set[boot_cpuid] = task_thread_info(current);
@@ -704,8 +704,8 @@ static void cpu_idle_thread_init(unsigned int cpu, struct task_struct *idle)
 	struct thread_info *ti = task_thread_info(idle);
 
 #ifdef CONFIG_PPC64
-	paca[cpu].__current = idle;
-	paca[cpu].kstack = (unsigned long)ti + THREAD_SIZE - STACK_FRAME_OVERHEAD;
+	paca_ptrs[cpu]->__current = idle;
+	paca_ptrs[cpu]->kstack = (unsigned long)ti + THREAD_SIZE - STACK_FRAME_OVERHEAD;
 #endif
 	ti->cpu = cpu;
 	secondary_ti = current_set[cpu] = ti;
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 1182cfd79857..f24406de4ebc 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -163,7 +163,7 @@ static bool kvmppc_ipi_thread(int cpu)
 
 #if defined(CONFIG_PPC_ICP_NATIVE) && defined(CONFIG_SMP)
 	if (cpu >= 0 && cpu < nr_cpu_ids) {
-		if (paca[cpu].kvm_hstate.xics_phys) {
+		if (paca_ptrs[cpu]->kvm_hstate.xics_phys) {
 			xics_wake_cpu(cpu);
 			return true;
 		}
@@ -2117,7 +2117,7 @@ static int kvmppc_grab_hwthread(int cpu)
 	struct paca_struct *tpaca;
 	long timeout = 10000;
 
-	tpaca = &paca[cpu];
+	tpaca = paca_ptrs[cpu];
 
 	/* Ensure the thread won't go into the kernel if it wakes */
 	tpaca->kvm_hstate.kvm_vcpu = NULL;
@@ -2150,7 +2150,7 @@ static void kvmppc_release_hwthread(int cpu)
 {
 	struct paca_struct *tpaca;
 
-	tpaca = &paca[cpu];
+	tpaca = paca_ptrs[cpu];
 	tpaca->kvm_hstate.hwthread_req = 0;
 	tpaca->kvm_hstate.kvm_vcpu = NULL;
 	tpaca->kvm_hstate.kvm_vcore = NULL;
@@ -2216,7 +2216,7 @@ static void kvmppc_start_thread(struct kvm_vcpu *vcpu, struct kvmppc_vcore *vc)
 		vcpu->arch.thread_cpu = cpu;
 		cpumask_set_cpu(cpu, &kvm->arch.cpu_in_guest);
 	}
-	tpaca = &paca[cpu];
+	tpaca = paca_ptrs[cpu];
 	tpaca->kvm_hstate.kvm_vcpu = vcpu;
 	tpaca->kvm_hstate.ptid = cpu - vc->pcpu;
 	/* Order stores to hstate.kvm_vcpu etc. before store to kvm_vcore */
@@ -2242,7 +2242,7 @@ static void kvmppc_wait_for_nap(void)
 		 * for any threads that still have a non-NULL vcore ptr.
 		 */
 		for (i = 1; i < n_threads; ++i)
-			if (paca[cpu + i].kvm_hstate.kvm_vcore)
+			if (paca_ptrs[cpu + i]->kvm_hstate.kvm_vcore)
 				break;
 		if (i == n_threads) {
 			HMT_medium();
@@ -2252,7 +2252,7 @@ static void kvmppc_wait_for_nap(void)
 	}
 	HMT_medium();
 	for (i = 1; i < n_threads; ++i)
-		if (paca[cpu + i].kvm_hstate.kvm_vcore)
+		if (paca_ptrs[cpu + i]->kvm_hstate.kvm_vcore)
 			pr_err("KVM: CPU %d seems to be stuck\n", cpu + i);
 }
 
@@ -2743,7 +2743,7 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
 		smp_wmb();
 	}
 	for (thr = 0; thr < controlled_threads; ++thr)
-		paca[pcpu + thr].kvm_hstate.kvm_split_mode = sip;
+		paca_ptrs[pcpu + thr]->kvm_hstate.kvm_split_mode = sip;
 
 	/* Initiate micro-threading (split-core) if required */
 	if (cmd_bit) {
@@ -4255,7 +4255,7 @@ static int kvm_init_subcore_bitmap(void)
 		int node = cpu_to_node(first_cpu);
 
 		/* Ignore if it is already allocated. */
-		if (paca[first_cpu].sibling_subcore_state)
+		if (paca_ptrs[first_cpu]->sibling_subcore_state)
 			continue;
 
 		sibling_subcore_state =
@@ -4270,7 +4270,8 @@ static int kvm_init_subcore_bitmap(void)
 		for (j = 0; j < threads_per_core; j++) {
 			int cpu = first_cpu + j;
 
-			paca[cpu].sibling_subcore_state = sibling_subcore_state;
+			paca_ptrs[cpu]->sibling_subcore_state =
+						sibling_subcore_state;
 		}
 	}
 	return 0;
@@ -4297,7 +4298,7 @@ static int kvmppc_book3s_init_hv(void)
 
 	/*
 	 * We need a way of accessing the XICS interrupt controller,
-	 * either directly, via paca[cpu].kvm_hstate.xics_phys, or
+	 * either directly, via paca_ptrs[cpu]->kvm_hstate.xics_phys, or
 	 * indirectly, via OPAL.
 	 */
 #ifdef CONFIG_SMP
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
index 90644db9d38e..b57e42ab37bb 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -251,7 +251,7 @@ void kvmhv_rm_send_ipi(int cpu)
 	    return;
 
 	/* Else poke the target with an IPI */
-	xics_phys = paca[cpu].kvm_hstate.xics_phys;
+	xics_phys = paca_ptrs[cpu]->kvm_hstate.xics_phys;
 	if (xics_phys)
 		__raw_rm_writeb(IPI_PRIORITY, xics_phys + XICS_MFRR);
 	else
diff --git a/arch/powerpc/mm/tlb-radix.c b/arch/powerpc/mm/tlb-radix.c
index 16ae1bbe13f0..92ed44a97dcb 100644
--- a/arch/powerpc/mm/tlb-radix.c
+++ b/arch/powerpc/mm/tlb-radix.c
@@ -486,7 +486,7 @@ extern void radix_kvm_prefetch_workaround(struct mm_struct *mm)
 		for (; sib <= cpu_last_thread_sibling(cpu) && !flush; sib++) {
 			if (sib == cpu)
 				continue;
-			if (paca[sib].kvm_hstate.kvm_vcpu)
+			if (paca_ptrs[sib]->kvm_hstate.kvm_vcpu)
 				flush = true;
 		}
 		if (flush)
diff --git a/arch/powerpc/platforms/85xx/smp.c b/arch/powerpc/platforms/85xx/smp.c
index f51fd35f4618..7e966f4cf19a 100644
--- a/arch/powerpc/platforms/85xx/smp.c
+++ b/arch/powerpc/platforms/85xx/smp.c
@@ -147,7 +147,7 @@ static void qoriq_cpu_kill(unsigned int cpu)
 	for (i = 0; i < 500; i++) {
 		if (is_cpu_dead(cpu)) {
 #ifdef CONFIG_PPC64
-			paca[cpu].cpu_start = 0;
+			paca_ptrs[cpu]->cpu_start = 0;
 #endif
 			return;
 		}
@@ -328,7 +328,7 @@ static int smp_85xx_kick_cpu(int nr)
 		return ret;
 
 done:
-	paca[nr].cpu_start = 1;
+	paca_ptrs[nr]->cpu_start = 1;
 	generic_set_cpu_up(nr);
 
 	return ret;
@@ -409,14 +409,14 @@ void mpc85xx_smp_kexec_cpu_down(int crash_shutdown, int secondary)
 	}
 
 	if (disable_threadbit) {
-		while (paca[disable_cpu].kexec_state < KEXEC_STATE_REAL_MODE) {
+		while (paca_ptrs[disable_cpu]->kexec_state < KEXEC_STATE_REAL_MODE) {
 			barrier();
 			now = mftb();
 			if (!notified && now - start > 1000000) {
 				pr_info("%s/%d: waiting for cpu %d to enter KEXEC_STATE_REAL_MODE (%d)\n",
 					__func__, smp_processor_id(),
 					disable_cpu,
-					paca[disable_cpu].kexec_state);
+					paca_ptrs[disable_cpu]->kexec_state);
 				notified = true;
 			}
 		}
diff --git a/arch/powerpc/platforms/cell/smp.c b/arch/powerpc/platforms/cell/smp.c
index f84d52a2db40..1aeac5761e0b 100644
--- a/arch/powerpc/platforms/cell/smp.c
+++ b/arch/powerpc/platforms/cell/smp.c
@@ -83,7 +83,7 @@ static inline int smp_startup_cpu(unsigned int lcpu)
 	pcpu = get_hard_smp_processor_id(lcpu);
 
 	/* Fixup atomic count: it exited inside IRQ handler. */
-	task_thread_info(paca[lcpu].__current)->preempt_count	= 0;
+	task_thread_info(paca_ptrs[lcpu]->__current)->preempt_count	= 0;
 
 	/*
 	 * If the RTAS start-cpu token does not exist then presume the
@@ -126,7 +126,7 @@ static int smp_cell_kick_cpu(int nr)
 	 * cpu_start field to become non-zero After we set cpu_start,
 	 * the processor will continue on to secondary_start
 	 */
-	paca[nr].cpu_start = 1;
+	paca_ptrs[nr]->cpu_start = 1;
 
 	return 0;
 }
diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
index 2abee070373f..d974a5c877c4 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -79,7 +79,7 @@ static int pnv_save_sprs_for_deep_states(void)
 
 	for_each_possible_cpu(cpu) {
 		uint64_t pir = get_hard_smp_processor_id(cpu);
-		uint64_t hsprg0_val = (uint64_t)&paca[cpu];
+		uint64_t hsprg0_val = (uint64_t)paca_ptrs[cpu];
 
 		rc = opal_slw_set_reg(pir, SPRN_HSPRG0, hsprg0_val);
 		if (rc != 0)
@@ -172,12 +172,12 @@ static void pnv_alloc_idle_core_states(void)
 		for (j = 0; j < threads_per_core; j++) {
 			int cpu = first_cpu + j;
 
-			paca[cpu].core_idle_state_ptr = core_idle_state;
-			paca[cpu].thread_idle_state = PNV_THREAD_RUNNING;
-			paca[cpu].thread_mask = 1 << j;
+			paca_ptrs[cpu]->core_idle_state_ptr = core_idle_state;
+			paca_ptrs[cpu]->thread_idle_state = PNV_THREAD_RUNNING;
+			paca_ptrs[cpu]->thread_mask = 1 << j;
 			if (!cpu_has_feature(CPU_FTR_POWER9_DD1))
 				continue;
-			paca[cpu].thread_sibling_pacas =
+			paca_ptrs[cpu]->thread_sibling_pacas =
 				kmalloc_node(paca_ptr_array_size,
 					     GFP_KERNEL, node);
 		}
@@ -676,7 +676,8 @@ static int __init pnv_init_idle_states(void)
 			for (i = 0; i < threads_per_core; i++) {
 				int j = base_cpu + i;
 
-				paca[j].thread_sibling_pacas[idx] = &paca[cpu];
+				paca_ptrs[j]->thread_sibling_pacas[idx] =
+					paca_ptrs[cpu];
 			}
 		}
 	}
diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
index 897aa1400eb8..be563e913b43 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -204,7 +204,7 @@ static void pnv_kexec_wait_secondaries_down(void)
 			if (i != notified) {
 				printk(KERN_INFO "kexec: waiting for cpu %d "
 				       "(physical %d) to enter OPAL\n",
-				       i, paca[i].hw_cpu_id);
+				       i, paca_ptrs[i]->hw_cpu_id);
 				notified = i;
 			}
 
@@ -216,7 +216,7 @@ static void pnv_kexec_wait_secondaries_down(void)
 			if (timeout-- == 0) {
 				printk(KERN_ERR "kexec: timed out waiting for "
 				       "cpu %d (physical %d) to enter OPAL\n",
-				       i, paca[i].hw_cpu_id);
+				       i, paca_ptrs[i]->hw_cpu_id);
 				break;
 			}
 		}
diff --git a/arch/powerpc/platforms/powernv/smp.c b/arch/powerpc/platforms/powernv/smp.c
index 40dae96f7e20..e4e48fdf4c1f 100644
--- a/arch/powerpc/platforms/powernv/smp.c
+++ b/arch/powerpc/platforms/powernv/smp.c
@@ -70,7 +70,7 @@ static int pnv_smp_kick_cpu(int nr)
 	 * If we already started or OPAL is not supported, we just
 	 * kick the CPU via the PACA
 	 */
-	if (paca[nr].cpu_start || !firmware_has_feature(FW_FEATURE_OPAL))
+	if (paca_ptrs[nr]->cpu_start || !firmware_has_feature(FW_FEATURE_OPAL))
 		goto kick;
 
 	/*
diff --git a/arch/powerpc/platforms/powernv/subcore.c b/arch/powerpc/platforms/powernv/subcore.c
index 596ae2e98040..45563004feda 100644
--- a/arch/powerpc/platforms/powernv/subcore.c
+++ b/arch/powerpc/platforms/powernv/subcore.c
@@ -280,7 +280,7 @@ void update_subcore_sibling_mask(void)
 		int offset = (tid / threads_per_subcore) * threads_per_subcore;
 		int mask = sibling_mask_first_cpu << offset;
 
-		paca[cpu].subcore_sibling_mask = mask;
+		paca_ptrs[cpu]->subcore_sibling_mask = mask;
 
 	}
 }
diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 6afd1efd3633..2245b8e47969 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -226,7 +226,7 @@ static void pseries_cpu_die(unsigned int cpu)
 	 * done here.  Change isolate state to Isolate and
 	 * change allocation-state to Unusable.
 	 */
-	paca[cpu].cpu_start = 0;
+	paca_ptrs[cpu]->cpu_start = 0;
 }
 
 /*
diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index 495ba4e7336d..eb0064d50ac6 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -99,7 +99,7 @@ void vpa_init(int cpu)
 	 * reports that.  All SPLPAR support SLB shadow buffer.
 	 */
 	if (!radix_enabled() && firmware_has_feature(FW_FEATURE_SPLPAR)) {
-		addr = __pa(paca[cpu].slb_shadow_ptr);
+		addr = __pa(paca_ptrs[cpu]->slb_shadow_ptr);
 		ret = register_slb_shadow(hwcpu, addr);
 		if (ret)
 			pr_err("WARNING: SLB shadow buffer registration for "
@@ -111,7 +111,7 @@ void vpa_init(int cpu)
 	/*
 	 * Register dispatch trace log, if one has been allocated.
 	 */
-	pp = &paca[cpu];
+	pp = paca_ptrs[cpu];
 	dtl = pp->dispatch_log;
 	if (dtl) {
 		pp->dtl_ridx = 0;
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index b5d86426e97b..e0fc426e2ce2 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -242,7 +242,7 @@ static int alloc_dispatch_logs(void)
 		return 0;
 
 	for_each_possible_cpu(cpu) {
-		pp = &paca[cpu];
+		pp = paca_ptrs[cpu];
 		dtl = kmem_cache_alloc(dtl_cache, GFP_KERNEL);
 		if (!dtl) {
 			pr_warn("Failed to allocate dispatch trace log for cpu %d\n",
diff --git a/arch/powerpc/platforms/pseries/smp.c b/arch/powerpc/platforms/pseries/smp.c
index 24785f63fb40..942274c109ee 100644
--- a/arch/powerpc/platforms/pseries/smp.c
+++ b/arch/powerpc/platforms/pseries/smp.c
@@ -109,7 +109,7 @@ static inline int smp_startup_cpu(unsigned int lcpu)
 	}
 
 	/* Fixup atomic count: it exited inside IRQ handler. */
-	task_thread_info(paca[lcpu].__current)->preempt_count	= 0;
+	task_thread_info(paca_ptrs[lcpu]->__current)->preempt_count	= 0;
 #ifdef CONFIG_HOTPLUG_CPU
 	if (get_cpu_current_state(lcpu) == CPU_STATE_INACTIVE)
 		goto out;
@@ -162,7 +162,7 @@ static int smp_pSeries_kick_cpu(int nr)
 	 * cpu_start field to become non-zero After we set cpu_start,
 	 * the processor will continue on to secondary_start
 	 */
-	paca[nr].cpu_start = 1;
+	paca_ptrs[nr]->cpu_start = 1;
 #ifdef CONFIG_HOTPLUG_CPU
 	set_preferred_offline_state(nr, CPU_STATE_ONLINE);
 
diff --git a/arch/powerpc/sysdev/xics/icp-native.c b/arch/powerpc/sysdev/xics/icp-native.c
index 2bfb9968d562..68663723ba22 100644
--- a/arch/powerpc/sysdev/xics/icp-native.c
+++ b/arch/powerpc/sysdev/xics/icp-native.c
@@ -164,7 +164,7 @@ void icp_native_cause_ipi_rm(int cpu)
 	 * Just like the cause_ipi functions, it is required to
 	 * include a full barrier before causing the IPI.
 	 */
-	xics_phys = paca[cpu].kvm_hstate.xics_phys;
+	xics_phys = paca_ptrs[cpu]->kvm_hstate.xics_phys;
 	mb();
 	__raw_rm_writeb(IPI_PRIORITY, xics_phys + XICS_MFRR);
 }
diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 08e367e3e8c3..e36935dc5017 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -2247,7 +2247,7 @@ static void dump_one_paca(int cpu)
 	catch_memory_errors = 1;
 	sync();
 
-	p = &paca[cpu];
+	p = paca_ptrs[cpu];
 
 	printf("paca for cpu 0x%x @ %p:\n", cpu, p);
 
-- 
2.13.3

* [PATCH v2 9/9] powerpc/64: Use a table of lppaca pointers and allocate lppacas individually
  2017-08-12 11:34 [PATCH v2 0/9] improve early structure allocations (paca, lppaca, etc) Nicholas Piggin
                   ` (7 preceding siblings ...)
  2017-08-13  1:33 ` [PATCH v2 8/9] powerpc/64: Use a table of paca pointers and allocate pacas individually Nicholas Piggin
@ 2017-08-13  1:33 ` Nicholas Piggin
  8 siblings, 0 replies; 28+ messages in thread
From: Nicholas Piggin @ 2017-08-13  1:33 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin, kvm-ppc

Similarly to the previous patch, allocate LPPACAs individually.

We no longer allocate lppacas in an array, so this patch removes the 1kB
static alignment for the structure and instead enforces the PAPR alignment
requirements at allocation time (a 1kB-aligned, 1kB-sized block can never
cross a 4kB boundary). We cannot reduce the 1kB allocation size, however,
due to existing KVM hypervisors.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/include/asm/lppaca.h      | 24 +++++------
 arch/powerpc/kernel/machine_kexec_64.c |  8 +++-
 arch/powerpc/kernel/paca.c             | 76 ++++++++++++++++++----------------
 arch/powerpc/kvm/book3s_hv.c           |  9 +---
 arch/powerpc/mm/numa.c                 |  4 +-
 arch/powerpc/platforms/pseries/kexec.c |  7 +++-
 6 files changed, 68 insertions(+), 60 deletions(-)

diff --git a/arch/powerpc/include/asm/lppaca.h b/arch/powerpc/include/asm/lppaca.h
index 6e4589eee2da..457a81f0fb58 100644
--- a/arch/powerpc/include/asm/lppaca.h
+++ b/arch/powerpc/include/asm/lppaca.h
@@ -36,14 +36,16 @@
 #include <asm/mmu.h>
 
 /*
- * We only have to have statically allocated lppaca structs on
- * legacy iSeries, which supports at most 64 cpus.
- */
-#define NR_LPPACAS	1
-
-/*
- * The Hypervisor barfs if the lppaca crosses a page boundary.  A 1k
- * alignment is sufficient to prevent this
+ * The lppaca is the "virtual processor area" registered with the hypervisor,
+ * H_REGISTER_VPA etc.
+ *
+ * According to PAPR, the structure is 640 bytes long, must be L1 cache line
+ * aligned, and must not cross a 4kB boundary. Its size field must be at
+ * least 640 bytes (but may be more).
+ *
+ * Pre-v4.14 KVM hypervisors reject the VPA if its size field is smaller than
+ * 1kB, so we dynamically allocate 1kB and set size to 1kB, but keep the
+ * structure as the canonical 640 byte size.
  */
 struct lppaca {
 	/* cacheline 1 contains read-only data */
@@ -97,11 +99,9 @@ struct lppaca {
 
 	__be32	page_ins;		/* CMO Hint - # page ins by OS */
 	u8	reserved11[148];
-	volatile __be64 dtl_idx;		/* Dispatch Trace Log head index */
+	volatile __be64 dtl_idx;	/* Dispatch Trace Log head index */
 	u8	reserved12[96];
-} __attribute__((__aligned__(0x400)));
-
-extern struct lppaca lppaca[];
+} ____cacheline_aligned;
 
 #define lppaca_of(cpu)	(*paca_ptrs[cpu]->lppaca_ptr)
 
diff --git a/arch/powerpc/kernel/machine_kexec_64.c b/arch/powerpc/kernel/machine_kexec_64.c
index 700cd25fbd28..fb6acf822088 100644
--- a/arch/powerpc/kernel/machine_kexec_64.c
+++ b/arch/powerpc/kernel/machine_kexec_64.c
@@ -329,11 +329,15 @@ void default_machine_kexec(struct kimage *image)
 	memcpy(&kexec_paca, get_paca(), sizeof(struct paca_struct));
 	kexec_paca.data_offset = 0xedeaddeadeeeeeeeUL;
 	paca_ptrs[kexec_paca.paca_index] = &kexec_paca;
+
 	setup_paca(&kexec_paca);
 
-	/* XXX: If anyone does 'dynamic lppacas' this will also need to be
-	 * switched to a static version!
+	/*
+	 * The lppaca should be unregistered at this point. In the case
+	 * of a crash, none of the lppacas are unregistered so there is
+	 * not much we can do about it here.
 	 */
+
 	/*
 	 * On Book3S, the copy must happen with the MMU off if we are either
 	 * using Radix page tables or we are not in an LPAR since we can
diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index 780c65a847d4..b3fcd5df84ee 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -19,44 +19,53 @@
 #include <asm/kexec.h>
 
 #include "setup.h"
+static int __initdata paca_nr_cpu_ids;
 
 #ifdef CONFIG_PPC_PSERIES
 
 /*
- * The structure which the hypervisor knows about - this structure
- * should not cross a page boundary.  The vpa_init/register_vpa call
- * is now known to fail if the lppaca structure crosses a page
- * boundary.  The lppaca is also used on POWER5 pSeries boxes.
- * The lppaca is 640 bytes long, and cannot readily
- * change since the hypervisor knows its layout, so a 1kB alignment
- * will suffice to ensure that it doesn't cross a page boundary.
+ * See asm/lppaca.h for more detail.
+ *
+ * lppaca structures must be 1kB in size, L1 cache line aligned,
+ * and must not cross a 4kB boundary. A 1kB size and 1kB alignment will satisfy
+ * these requirements.
  */
-struct lppaca lppaca[] = {
-	[0 ... (NR_LPPACAS-1)] = {
+static inline void init_lppaca(struct lppaca *lppaca)
+{
+	BUILD_BUG_ON(sizeof(struct lppaca) != 640);
+
+	*lppaca = (struct lppaca) {
 		.desc = cpu_to_be32(0xd397d781),	/* "LpPa" */
-		.size = cpu_to_be16(sizeof(struct lppaca)),
+		.size = cpu_to_be16(0x400),
 		.fpregs_in_use = 1,
 		.slb_count = cpu_to_be16(64),
 		.vmxregs_in_use = 0,
-		.page_ins = 0,
-	},
+		.page_ins = 0, };
 };
 
-static struct lppaca *extra_lppacas;
-static long __initdata lppaca_size;
+static struct lppaca ** __initdata lppaca_ptrs;
+
+static long __initdata lppaca_ptrs_size;
 
 static void __init allocate_lppacas(int nr_cpus, unsigned long limit)
 {
+	size_t size = 0x400;
+	int cpu;
+
+	BUILD_BUG_ON(size < sizeof(struct lppaca));
+
 	if (!firmware_has_feature(FW_FEATURE_LPAR))
 		return;
 
-	if (nr_cpus <= NR_LPPACAS)
-		return;
+	lppaca_ptrs_size = sizeof(struct lppaca *) * nr_cpu_ids;
+	lppaca_ptrs = __va(memblock_alloc_base(lppaca_ptrs_size, 0, limit));
+
+	for (cpu = 0; cpu < nr_cpu_ids; cpu++) {
+		unsigned long pa;
 
-	lppaca_size = PAGE_ALIGN(sizeof(struct lppaca) *
-				 (nr_cpus - NR_LPPACAS));
-	extra_lppacas = __va(memblock_alloc_base(lppaca_size,
-						 PAGE_SIZE, limit));
+		pa = memblock_alloc_base(size, 0x400, limit);
+		lppaca_ptrs[cpu] = __va(pa);
+	}
 }
 
 static struct lppaca * __init new_lppaca(int cpu)
@@ -66,32 +75,28 @@ static struct lppaca * __init new_lppaca(int cpu)
 	if (!firmware_has_feature(FW_FEATURE_LPAR))
 		return NULL;
 
-	if (cpu < NR_LPPACAS)
-		return &lppaca[cpu];
-
-	lp = extra_lppacas + (cpu - NR_LPPACAS);
-	*lp = lppaca[0];
+	lp = lppaca_ptrs[cpu];
+	init_lppaca(lp);
 
 	return lp;
 }
 
 static void __init free_lppacas(void)
 {
-	long new_size = 0, nr;
+	int cpu;
 
 	if (!firmware_has_feature(FW_FEATURE_LPAR))
 		return;
 
-	if (!lppaca_size)
-		return;
-	nr = num_possible_cpus() - NR_LPPACAS;
-	if (nr > 0)
-		new_size = PAGE_ALIGN(nr * sizeof(struct lppaca));
-	if (new_size >= lppaca_size)
-		return;
+	for (cpu = 0; cpu < paca_nr_cpu_ids; cpu++) {
+		if (!cpu_possible(cpu)) {
+			unsigned long pa = __pa(lppaca_ptrs[cpu]);
+			memblock_free(pa, 0x400);
+			lppaca_ptrs[cpu] = NULL;
+		}
+	}
 
-	memblock_free(__pa(extra_lppacas) + new_size, lppaca_size - new_size);
-	lppaca_size = new_size;
+	memblock_free(__pa(lppaca_ptrs), lppaca_ptrs_size);
 }
 
 #else
@@ -213,7 +218,6 @@ void setup_paca(struct paca_struct *new_paca)
 
 }
 
-static int __initdata paca_nr_cpu_ids;
 static int __initdata paca_ptrs_size;
 
 void __init allocate_pacas(void)
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index f24406de4ebc..a3cd052476bf 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -485,13 +485,8 @@ static unsigned long do_h_register_vpa(struct kvm_vcpu *vcpu,
 
 	switch (subfunc) {
 	case H_VPA_REG_VPA:		/* register VPA */
-		/*
-		 * The size of our lppaca is 1kB because of the way we align
-		 * it for the guest to avoid crossing a 4kB boundary. We only
-		 * use 640 bytes of the structure though, so we should accept
-		 * clients that set a size of 640.
-		 */
-		if (len < 640)
+		BUILD_BUG_ON(sizeof(struct lppaca) != 640);
+		if (len < sizeof(struct lppaca))
 			break;
 		vpap = &tvcpu->arch.vpa;
 		err = 0;
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index b95c584ce19d..55e3fa5fcfb0 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1168,7 +1168,7 @@ static void setup_cpu_associativity_change_counters(void)
 	for_each_possible_cpu(cpu) {
 		int i;
 		u8 *counts = vphn_cpu_change_counts[cpu];
-		volatile u8 *hypervisor_counts = lppaca[cpu].vphn_assoc_counts;
+		volatile u8 *hypervisor_counts = lppaca_of(cpu).vphn_assoc_counts;
 
 		for (i = 0; i < distance_ref_points_depth; i++)
 			counts[i] = hypervisor_counts[i];
@@ -1194,7 +1194,7 @@ static int update_cpu_associativity_changes_mask(void)
 	for_each_possible_cpu(cpu) {
 		int i, changed = 0;
 		u8 *counts = vphn_cpu_change_counts[cpu];
-		volatile u8 *hypervisor_counts = lppaca[cpu].vphn_assoc_counts;
+		volatile u8 *hypervisor_counts = lppaca_of(cpu).vphn_assoc_counts;
 
 		for (i = 0; i < distance_ref_points_depth; i++) {
 			if (hypervisor_counts[i] != counts[i]) {
diff --git a/arch/powerpc/platforms/pseries/kexec.c b/arch/powerpc/platforms/pseries/kexec.c
index 6681ac97fb18..2ad3e3919b25 100644
--- a/arch/powerpc/platforms/pseries/kexec.c
+++ b/arch/powerpc/platforms/pseries/kexec.c
@@ -22,7 +22,12 @@
 
 void pseries_kexec_cpu_down(int crash_shutdown, int secondary)
 {
-	/* Don't risk a hypervisor call if we're crashing */
+	/*
+	 * Don't risk a hypervisor call if we're crashing
+	 * XXX: Why? The hypervisor is not crashing. It might be better
+	 * to at least attempt to unregister, to avoid the hypervisor stepping
+	 * on our memory.
+	 */
 	if (firmware_has_feature(FW_FEATURE_SPLPAR) && !crash_shutdown) {
 		int ret;
 		int cpu = smp_processor_id();
-- 
2.13.3

* Re: [PATCH v2 3/9] powerpc/powernv: Remove real mode access limit for early allocations
  2017-08-13  1:33 ` [PATCH v2 3/9] powerpc/powernv: Remove real mode access limit for early allocations Nicholas Piggin
@ 2017-08-14 12:49   ` Michael Ellerman
  2017-08-14 13:10     ` Benjamin Herrenschmidt
  2017-08-14 13:13     ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 28+ messages in thread
From: Michael Ellerman @ 2017-08-14 12:49 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev; +Cc: kvm-ppc, Nicholas Piggin, aneesh.kumar

Nicholas Piggin <npiggin@gmail.com> writes:

> This removes the RMA limit on powernv platform, which constrains
> early allocations such as PACAs and stacks. There are still other
> restrictions that must be followed, such as bolted SLB limits, but
> real mode addressing has no constraints.
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>  arch/powerpc/mm/hash_utils_64.c | 24 +++++++++++++++---------
>  arch/powerpc/mm/pgtable-radix.c | 33 +++++++++++++++++----------------

I missed that we'd duplicated this logic for radix vs hash [yes I know I
merged the commit that did it :)]

> diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
> index 671a45d86c18..61ca17d81737 100644
> --- a/arch/powerpc/mm/pgtable-radix.c
> +++ b/arch/powerpc/mm/pgtable-radix.c
> @@ -598,22 +598,23 @@ void radix__setup_initial_memory_limit(phys_addr_t first_memblock_base,
>  	 * physical on those processors
>  	 */
>  	BUG_ON(first_memblock_base != 0);
> -	/*
> -	 * We limit the allocation that depend on ppc64_rma_size
> -	 * to first_memblock_size. We also clamp it to 1GB to
> -	 * avoid some funky things such as RTAS bugs.

That comment about RTAS is 7 years old, and I'm pretty sure it was a
historical note when it was written.

I'm inclined to drop it and if we discover new bugs with RTAS on Power9
then we can always put it back.

> -	 *
> -	 * On radix config we really don't have a limitation
> -	 * on real mode access. But keeping it as above works
> -	 * well enough.

Ergh.

> -	 */
> -	ppc64_rma_size = min_t(u64, first_memblock_size, 0x40000000);
> -	/*
> -	 * Finally limit subsequent allocations. We really don't want
> -	 * to limit the memblock allocations to rma_size. FIXME!! should
> -	 * we even limit at all ?
> -	 */

So I think we should just delete this function entirely.

Any objections?

cheers

* Re: [PATCH v2 3/9] powerpc/powernv: Remove real mode access limit for early allocations
  2017-08-14 12:49   ` Michael Ellerman
@ 2017-08-14 13:10     ` Benjamin Herrenschmidt
  2017-08-14 14:51       ` Nicholas Piggin
  2017-08-14 13:13     ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 28+ messages in thread
From: Benjamin Herrenschmidt @ 2017-08-14 13:10 UTC (permalink / raw)
  To: Michael Ellerman, Nicholas Piggin, linuxppc-dev
  Cc: Suraj Jitindar Singh, kvm-ppc, aneesh.kumar

On Mon, 2017-08-14 at 22:49 +1000, Michael Ellerman wrote:
> Nicholas Piggin <npiggin@gmail.com> writes:
> 
> > This removes the RMA limit on powernv platform, which constrains
> > early allocations such as PACAs and stacks. There are still other
> > restrictions that must be followed, such as bolted SLB limits, but
> > real mode addressing has no constraints.

For radix, should we consider making the PACAs chip/node-local?

> > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> > ---
> >  arch/powerpc/mm/hash_utils_64.c | 24 +++++++++++++++---------
> >  arch/powerpc/mm/pgtable-radix.c | 33 +++++++++++++++++----------------
> 
> I missed that we'd duplicated this logic for radix vs hash [yes I know I
> merged the commit that did it :)]
> 
> > diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
> > index 671a45d86c18..61ca17d81737 100644
> > --- a/arch/powerpc/mm/pgtable-radix.c
> > +++ b/arch/powerpc/mm/pgtable-radix.c
> > @@ -598,22 +598,23 @@ void radix__setup_initial_memory_limit(phys_addr_t first_memblock_base,
> >  	 * physical on those processors
> >  	 */
> >  	BUG_ON(first_memblock_base != 0);
> > -	/*
> > -	 * We limit the allocation that depend on ppc64_rma_size
> > -	 * to first_memblock_size. We also clamp it to 1GB to
> > -	 * avoid some funky things such as RTAS bugs.
> 
> That comment about RTAS is 7 years old, and I'm pretty sure it was a
> historical note when it was written.
> 
> I'm inclined to drop it and if we discover new bugs with RTAS on Power9
> then we can always put it back.
> 
> > -	 *
> > -	 * On radix config we really don't have a limitation
> > -	 * on real mode access. But keeping it as above works
> > -	 * well enough.
> 
> Ergh.
> 
> > -	 */
> > -	ppc64_rma_size = min_t(u64, first_memblock_size, 0x40000000);
> > -	/*
> > -	 * Finally limit subsequent allocations. We really don't want
> > -	 * to limit the memblock allocations to rma_size. FIXME!! should
> > -	 * we even limit at all ?
> > -	 */
> 
> So I think we should just delete this function entirely.
> 
> Any objections?
> 
> cheers

* Re: [PATCH v2 3/9] powerpc/powernv: Remove real mode access limit for early allocations
  2017-08-14 12:49   ` Michael Ellerman
  2017-08-14 13:10     ` Benjamin Herrenschmidt
@ 2017-08-14 13:13     ` Benjamin Herrenschmidt
  2017-08-15 12:10       ` Nicholas Piggin
  2017-09-12 10:13       ` Aneesh Kumar K.V
  1 sibling, 2 replies; 28+ messages in thread
From: Benjamin Herrenschmidt @ 2017-08-14 13:13 UTC (permalink / raw)
  To: Michael Ellerman, Nicholas Piggin, linuxppc-dev
  Cc: Suraj Jitindar Singh, kvm-ppc, aneesh.kumar

On Mon, 2017-08-14 at 22:49 +1000, Michael Ellerman wrote:
> > -     /*
> > -      * We limit the allocation that depend on ppc64_rma_size
> > -      * to first_memblock_size. We also clamp it to 1GB to
> > -      * avoid some funky things such as RTAS bugs.
> 
> That comment about RTAS is 7 years old, and I'm pretty sure it was a
> historical note when it was written.
> 
> I'm inclined to drop it and if we discover new bugs with RTAS on Power9
> then we can always put it back.

Aren't we using a 32-bit RTAS? (AFAIK there's a 64-bit one, we just
never used it.) In this case we need to at least clamp to 2G (don't
trust RTAS to handle unsigned properly).
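
Roughly, as a sketch ("limit" here standing for whatever early
allocation limit we end up computing):

	/*
	 * Keep anything RTAS may touch below 2GB, in case it treats
	 * addresses as signed 32-bit quantities.
	 */
	limit = min_t(u64, limit, 0x80000000ul);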

> > -      *
> > -      * On radix config we really don't have a limitation
> > -      * on real mode access. But keeping it as above works
> > -      * well enough.
> 
> Ergh.
> 
> > -      */
> > -     ppc64_rma_size = min_t(u64, first_memblock_size, 0x40000000);
> > -     /*
> > -      * Finally limit subsequent allocations. We really don't want
> > -      * to limit the memblock allocations to rma_size. FIXME!! should
> > -      * we even limit at all ?
> > -      */
> 
> So I think we should just delete this function entirely.
> 
> Any objections?

Well.. RTAS is quite sucky ... 

Ben.

> cheers

* Re: [PATCH v2 3/9] powerpc/powernv: Remove real mode access limit for early allocations
  2017-08-14 13:10     ` Benjamin Herrenschmidt
@ 2017-08-14 14:51       ` Nicholas Piggin
  0 siblings, 0 replies; 28+ messages in thread
From: Nicholas Piggin @ 2017-08-14 14:51 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Michael Ellerman, linuxppc-dev, Suraj Jitindar Singh, kvm-ppc,
	aneesh.kumar

On Mon, 14 Aug 2017 23:10:50 +1000
Benjamin Herrenschmidt <benh@au1.ibm.com> wrote:

> On Mon, 2017-08-14 at 22:49 +1000, Michael Ellerman wrote:
> > Nicholas Piggin <npiggin@gmail.com> writes:
> >   
> > > This removes the RMA limit on powernv platform, which constrains
> > > early allocations such as PACAs and stacks. There are still other
> > > restrictions that must be followed, such as bolted SLB limits, but
> > > real mode addressing has no constraints.  
> 
> For radix, should we consider making the PACAs chip/node local ?

Yes, that's the main goal of the series. I had NUMAization patches
at the end, but dropped them for now because some of them need
topology information that's not available yet (that's why I was asking
about moving unflattening earlier in boot, though we may be able to
move the allocations later too).

Thanks,
Nick

* Re: [PATCH v2 1/9] KVM: PPC: Book3S HV: Fix H_REGISTER_VPA VPA size validation
  2017-08-13  1:33 ` [PATCH v2 1/9] KVM: PPC: Book3S HV: Fix H_REGISTER_VPA VPA size validation Nicholas Piggin
@ 2017-08-15 11:24   ` Michael Ellerman
  2017-08-31  3:41   ` Paul Mackerras
  1 sibling, 0 replies; 28+ messages in thread
From: Michael Ellerman @ 2017-08-15 11:24 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev; +Cc: kvm-ppc, Nicholas Piggin

Nicholas Piggin <npiggin@gmail.com> writes:

> KVM currently validates the size of the VPA registered by the client
> against sizeof(struct lppaca), however we align (and therefore size)
> that struct to 1kB to avoid crossing a 4kB boundary in the client.
>
> PAPR calls for sizes >= 640 bytes to be accepted. Hard code this with
> a comment.
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>  arch/powerpc/kvm/book3s_hv.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)

This one should go via Paul.

Hopefully he can just pick it up.

cheers

> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 359c79cdf0cc..1182cfd79857 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -485,7 +485,13 @@ static unsigned long do_h_register_vpa(struct kvm_vcpu *vcpu,
>  
>  	switch (subfunc) {
>  	case H_VPA_REG_VPA:		/* register VPA */
> -		if (len < sizeof(struct lppaca))
> +		/*
> +		 * The size of our lppaca is 1kB because of the way we align
> +		 * it for the guest to avoid crossing a 4kB boundary. We only
> +		 * use 640 bytes of the structure though, so we should accept
> +		 * clients that set a size of 640.
> +		 */
> +		if (len < 640)
>  			break;
>  		vpap = &tvcpu->arch.vpa;
>  		err = 0;
> -- 
> 2.13.3

* Re: [PATCH v2 3/9] powerpc/powernv: Remove real mode access limit for early allocations
  2017-08-14 13:13     ` Benjamin Herrenschmidt
@ 2017-08-15 12:10       ` Nicholas Piggin
  2017-08-15 12:48         ` Benjamin Herrenschmidt
  2017-09-12 10:13       ` Aneesh Kumar K.V
  1 sibling, 1 reply; 28+ messages in thread
From: Nicholas Piggin @ 2017-08-15 12:10 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Michael Ellerman, linuxppc-dev, Suraj Jitindar Singh, kvm-ppc,
	aneesh.kumar

On Mon, 14 Aug 2017 23:13:07 +1000
Benjamin Herrenschmidt <benh@au1.ibm.com> wrote:

> On Mon, 2017-08-14 at 22:49 +1000, Michael Ellerman wrote:
> > > -     /*
> > > -      * We limit the allocation that depend on ppc64_rma_size
> > > -      * to first_memblock_size. We also clamp it to 1GB to
> > > -      * avoid some funky things such as RTAS bugs.  
> > 
> > That comment about RTAS is 7 years old, and I'm pretty sure it was a
> > historical note when it was written.
> > 
> > I'm inclined to drop it and if we discover new bugs with RTAS on Power9
> > then we can always put it back.  
> 
> Aren't we using a 32-bit RTAS? (AFAIK there's a 64-bit one, we just
> never used it.) In this case we need to at least clamp to 2G (don't
> trust RTAS to handle unsigned properly).

Is there any allocation not covered by RTAS_INSTANTIATE_MAX?

Thanks,
Nick

* Re: [PATCH v2 3/9] powerpc/powernv: Remove real mode access limit for early allocations
  2017-08-15 12:10       ` Nicholas Piggin
@ 2017-08-15 12:48         ` Benjamin Herrenschmidt
  2017-08-15 13:02           ` Nicholas Piggin
  0 siblings, 1 reply; 28+ messages in thread
From: Benjamin Herrenschmidt @ 2017-08-15 12:48 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Michael Ellerman, linuxppc-dev, Suraj Jitindar Singh, kvm-ppc,
	aneesh.kumar

On Tue, 2017-08-15 at 22:10 +1000, Nicholas Piggin wrote:
> On Mon, 14 Aug 2017 23:13:07 +1000
> Benjamin Herrenschmidt <benh@au1.ibm.com> wrote:
> 
> > On Mon, 2017-08-14 at 22:49 +1000, Michael Ellerman wrote:
> > > > -     /*
> > > > -      * We limit the allocation that depend on ppc64_rma_size
> > > > -      * to first_memblock_size. We also clamp it to 1GB to
> > > > -      * avoid some funky things such as RTAS bugs.  
> > > 
> > > That comment about RTAS is 7 years old, and I'm pretty sure it was a
> > > historical note when it was written.
> > > 
> > > I'm inclined to drop it and if we discover new bugs with RTAS on Power9
> > > then we can always put it back.  
> > 
> > Aren't we using a 32-bit RTAS? (AFAIK there's a 64-bit one, we just
> > never used it.) In this case we need to at least clamp to 2G (don't
> > trust RTAS to handle unsigned properly).
> 
> Is there any allocation not covered by RTAS_INSTANTIATE_MAX?

Not sure, we have to audit. Talking about all this with mpe today, I
think we just need to make sure that anything that has a restriction
uses a specific identifier for *that* restriction rather than just
blindly "rma". For example, seg0_limit for segment 0 in HPT. In the
case of PACAs, we would create a specific limit that is
min(seg0_limit, rma) for pseries and -1 for powernv, etc.

The RMA limit can then become either strictly a pseries thing, or be
initialized to -1 on powernv (or max mem).
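
Roughly, an untested sketch (early_paca_limit() and seg0_limit are
made-up names, for illustration only):

	static u64 early_paca_limit(void)
	{
		/* pseries: bolted SLB segment 0 and the RMA both apply */
		if (firmware_has_feature(FW_FEATURE_LPAR))
			return min(seg0_limit, ppc64_rma_size);

		/* powernv: no real-mode addressing restriction */
		return (u64)-1;
	}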

Ben.

* Re: [PATCH v2 3/9] powerpc/powernv: Remove real mode access limit for early allocations
  2017-08-15 12:48         ` Benjamin Herrenschmidt
@ 2017-08-15 13:02           ` Nicholas Piggin
  0 siblings, 0 replies; 28+ messages in thread
From: Nicholas Piggin @ 2017-08-15 13:02 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Michael Ellerman, linuxppc-dev, Suraj Jitindar Singh, kvm-ppc,
	aneesh.kumar

On Tue, 15 Aug 2017 22:48:22 +1000
Benjamin Herrenschmidt <benh@au1.ibm.com> wrote:

> On Tue, 2017-08-15 at 22:10 +1000, Nicholas Piggin wrote:
> > On Mon, 14 Aug 2017 23:13:07 +1000
> > Benjamin Herrenschmidt <benh@au1.ibm.com> wrote:
> >   
> > > On Mon, 2017-08-14 at 22:49 +1000, Michael Ellerman wrote:  
> > > > > -     /*
> > > > > -      * We limit the allocation that depend on ppc64_rma_size
> > > > > -      * to first_memblock_size. We also clamp it to 1GB to
> > > > > -      * avoid some funky things such as RTAS bugs.    
> > > > 
> > > > That comment about RTAS is 7 years old, and I'm pretty sure it was a
> > > > historical note when it was written.
> > > > 
> > > > I'm inclined to drop it and if we discover new bugs with RTAS on Power9
> > > > then we can always put it back.    
> > > 
> > > Aren't we using a 32-bit RTAS? (AFAIK there's a 64-bit one, we just
> > > never used it.) In this case we need to at least clamp to 2G (don't
> > > trust RTAS to handle unsigned properly).
> > 
> > Is there any allocation not covered by RTAS_INSTANTIATE_MAX?  
> 
> Not sure, we have to audit.

Okay.

> Talking about all this with mpe today, I
> think we just need to make sure that anything that has a restriction
> uses a specific identifier for *that* restriction rather than just
> blindly "rma". For example, seg0_limit for segment 0 in HPT. In the
> case of PACAs, we would create a specific limit that is
> min(seg0_limit,rma) for pseries and -1 for powernv. etc..
> 
> The RMA limit can then become either strictly a pseries thing, or be
> initialized to -1 on powernv (or max mem).

Right, I'm trying to get there with the patch. Really it breaks down
into two different things: RMA for real mode (which is unlimited on
powernv), and non-faulting (of any kind, SLB or TLB on booke) for
virtual mode.

RTAS is still squished in there with the RMA, but we do have the RTAS
instantiate limit, so we should try to move that restriction out to
there.
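
i.e. something like this (a sketch; rtas_initialize() already computes
its region along these lines):

	/*
	 * Constrain RTAS's own instantiation to the RTAS addressing
	 * limit, rather than clamping every early allocation.
	 */
	rtas_region = min(ppc64_rma_size, (u64)RTAS_INSTANTIATE_MAX);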

Thanks,
Nick

* Re: [PATCH v2 8/9] powerpc/64: Use a table of paca pointers and allocate pacas individually
  2017-08-13  1:33 ` [PATCH v2 8/9] powerpc/64: Use a table of paca pointers and allocate pacas individually Nicholas Piggin
@ 2017-08-15 15:50   ` Nicholas Piggin
  2017-08-15 17:30   ` kbuild test robot
  1 sibling, 0 replies; 28+ messages in thread
From: Nicholas Piggin @ 2017-08-15 15:50 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: kvm-ppc, Michael Ellerman

On Sun, 13 Aug 2017 11:33:45 +1000
Nicholas Piggin <npiggin@gmail.com> wrote:

> Change the paca array into an array of pointers to pacas. Allocate
> pacas individually.
> 
> This allows flexibility in where the PACAs are allocated. Future work
> will allocate them node-local. Platforms that don't have address limits
> on PACAs would be able to defer PACA allocations until later in boot
> rather than allocate all possible ones up-front then freeing unused.
> 
> This is slightly more overhead (one additional indirection) for cross
> CPU paca references, but those aren't too common.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

An incremental fix is required for this: a !SMP typo bug, and pmac
init code that was not switched from the paca array to the paca_ptrs
array.

In the first case I just got rid of the r13 thing completely.
I think the idea of the code was to give secondaries a different
r13 so they wouldn't overwrite CPU0's paca if they accidentally
used it.

---
 arch/powerpc/kernel/head_64.S | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
index f71f468ebe7f..9db29719fc32 100644
--- a/arch/powerpc/kernel/head_64.S
+++ b/arch/powerpc/kernel/head_64.S
@@ -386,12 +386,11 @@ generic_secondary_common_init:
 	 * physical cpu id in r24, we need to search the pacas to find
 	 * which logical id maps to our physical one.
 	 */
-	LOAD_REG_ADDR(r8, paca_ptrs)	/* Load paca_ptrs pointe	 */
-	ld	r8,0(r8)		/* Get base vaddr of array	 */
 #ifndef CONFIG_SMP
-	li	r13,r13,0		/* kill r13 if used accidentally */
 	b	kexec_wait		/* wait for next kernel if !SMP	 */
 #else
+	LOAD_REG_ADDR(r8, paca_ptrs)	/* Load paca_ptrs pointer	 */
+	ld	r8,0(r8)		/* Get base vaddr of array	 */
 	LOAD_REG_ADDR(r7, nr_cpu_ids)	/* Load nr_cpu_ids address       */
 	lwz	r7,0(r7)		/* also the max paca allocated 	 */
 	li	r5,0			/* logical cpu id                */
@@ -752,10 +751,10 @@ _GLOBAL(pmac_secondary_start)
 	mtmsrd	r3			/* RI on */
 
 	/* Set up a paca value for this processor. */
-	LOAD_REG_ADDR(r4,paca)		/* Load paca pointer		*/
-	ld	r4,0(r4)		/* Get base vaddr of paca array	*/
-	mulli	r13,r24,PACA_SIZE	/* Calculate vaddr of right paca */
-	add	r13,r13,r4		/* for this processor.		*/
+	LOAD_REG_ADDR(r4,paca_ptrs)	/* Load paca pointer		*/
+	ld	r4,0(r4)		/* Get base vaddr of paca_ptrs array */
+	sldi	r5,r24,3		/* get paca_ptrs[] index from cpu id */
+	ldx	r13,r4,r5		/* r13 = paca_ptrs[cpu id]       */
 	SET_PACA(r13)			/* Save vaddr of paca in an SPRG*/
 
 	/* Mark interrupts soft and hard disabled (they might be enabled
-- 
2.13.3

* Re: [PATCH v2 8/9] powerpc/64: Use a table of paca pointers and allocate pacas individually
  2017-08-13  1:33 ` [PATCH v2 8/9] powerpc/64: Use a table of paca pointers and allocate pacas individually Nicholas Piggin
  2017-08-15 15:50   ` Nicholas Piggin
@ 2017-08-15 17:30   ` kbuild test robot
  1 sibling, 0 replies; 28+ messages in thread
From: kbuild test robot @ 2017-08-15 17:30 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: kbuild-all, linuxppc-dev, kvm-ppc, Nicholas Piggin

[-- Attachment #1: Type: text/plain, Size: 1050 bytes --]

Hi Nicholas,

[auto build test ERROR on v4.13-rc5]
[also build test ERROR on next-20170815]
[cannot apply to powerpc/next scottwood/next]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Nicholas-Piggin/KVM-PPC-Book3S-HV-Fix-H_REGISTER_VPA-VPA-size-validation/20170815-221907
config: powerpc-defconfig (attached as .config)
compiler: powerpc64-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
        wget https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=powerpc 

All errors (new ones prefixed by >>):

   arch/powerpc/kernel/head_64.o: In function `pmac_secondary_start':
>> (.text+0x3762): undefined reference to `paca'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 23578 bytes --]

* Re: [PATCH v2 1/9] KVM: PPC: Book3S HV: Fix H_REGISTER_VPA VPA size validation
  2017-08-13  1:33 ` [PATCH v2 1/9] KVM: PPC: Book3S HV: Fix H_REGISTER_VPA VPA size validation Nicholas Piggin
  2017-08-15 11:24   ` Michael Ellerman
@ 2017-08-31  3:41   ` Paul Mackerras
  1 sibling, 0 replies; 28+ messages in thread
From: Paul Mackerras @ 2017-08-31  3:41 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: linuxppc-dev, kvm-ppc

On Sun, Aug 13, 2017 at 11:33:38AM +1000, Nicholas Piggin wrote:
> KVM currently validates the size of the VPA registered by the client
> against sizeof(struct lppaca), however we align (and therefore size)
> that struct to 1kB to avoid crossing a 4kB boundary in the client.
> 
> PAPR calls for sizes >= 640 bytes to be accepted. Hard code this with
> a comment.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

Thanks, patch applied to my kvm-ppc-next branch.

Paul.

* Re: [v2, 4/9] powerpc/64s/radix: Remove bolted-SLB address limit for per-cpu stacks
  2017-08-13  1:33 ` [PATCH v2 4/9] powerpc/64s/radix: Remove bolted-SLB address limit for per-cpu stacks Nicholas Piggin
@ 2017-08-31 11:36   ` Michael Ellerman
  0 siblings, 0 replies; 28+ messages in thread
From: Michael Ellerman @ 2017-08-31 11:36 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev; +Cc: kvm-ppc, Nicholas Piggin

On Sun, 2017-08-13 at 01:33:41 UTC, Nicholas Piggin wrote:
> Radix MMU does not take SLB or TLB interrupts when accessing kernel
> linear address. Remove this restriction for radix mode.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/d55071905ee1719094c66dd3c40e2a

cheers

* Re: [v2, 6/9] powerpc/64s/radix: Do not allocate SLB shadow structures
  2017-08-13  1:33 ` [PATCH v2 6/9] powerpc/64s/radix: Do not allocate SLB shadow structures Nicholas Piggin
@ 2017-08-31 11:36   ` Michael Ellerman
  0 siblings, 0 replies; 28+ messages in thread
From: Michael Ellerman @ 2017-08-31 11:36 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev; +Cc: kvm-ppc, Nicholas Piggin

On Sun, 2017-08-13 at 01:33:43 UTC, Nicholas Piggin wrote:
> These are unused in radix mode.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/b68b1d7487195d17bdd7e06f183acf

cheers

* Re: [v2, 2/9] powerpc/powernv: powernv platform is not constrained by RMA
  2017-08-13  1:33 ` [PATCH v2 2/9] powerpc/powernv: powernv platform is not constrained by RMA Nicholas Piggin
@ 2017-08-31 11:36   ` Michael Ellerman
  0 siblings, 0 replies; 28+ messages in thread
From: Michael Ellerman @ 2017-08-31 11:36 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev; +Cc: kvm-ppc, Nicholas Piggin

On Sun, 2017-08-13 at 01:33:39 UTC, Nicholas Piggin wrote:
> Remove incorrect comment about real mode address restrictions on
> powernv (bare metal), and unnecessary clamping to ppc64_rma_size.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/76b42e28beb5ed615b093e2abb28a7

cheers

* Re: [PATCH v2 3/9] powerpc/powernv: Remove real mode access limit for early allocations
  2017-08-14 13:13     ` Benjamin Herrenschmidt
  2017-08-15 12:10       ` Nicholas Piggin
@ 2017-09-12 10:13       ` Aneesh Kumar K.V
  2017-09-12 10:26         ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 28+ messages in thread
From: Aneesh Kumar K.V @ 2017-09-12 10:13 UTC (permalink / raw)
  To: benh, Michael Ellerman, Nicholas Piggin, linuxppc-dev
  Cc: Suraj Jitindar Singh, kvm-ppc

Benjamin Herrenschmidt <benh@au1.ibm.com> writes:

> On Mon, 2017-08-14 at 22:49 +1000, Michael Ellerman wrote:
>> > -     /*
>> > -      * We limit the allocation that depend on ppc64_rma_size
>> > -      * to first_memblock_size. We also clamp it to 1GB to
>> > -      * avoid some funky things such as RTAS bugs.
>> 
>> That comment about RTAS is 7 years old, and I'm pretty sure it was a
>> historical note when it was written.
>> 
>> I'm inclined to drop it and if we discover new bugs with RTAS on Power9
>> then we can always put it back.
>
> Aren't we using a 32-bit RTAS? (AFAIK there's a 64-bit one, we just
> never used it.) In this case we need to at least clamp to 2G (don't
> trust RTAS to handle unsigned properly).
>

Yes. I added the limit to radix after I observed that we return from
RTAS with MSR[SF] = 0.

IIRC it was a PACA access that was causing the crash on return from RTAS.

Hmm, the commit message also explains that:

powerpc/mm/radix: Limit paca allocation in radix

On return from RTAS we access the paca variables and we have 64 bit
disabled. This requires us to limit paca in 32 bit range.

Fix this by setting ppc64_rma_size to first_memblock_size/1G range.
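
That is, the clamp in question:

	ppc64_rma_size = min_t(u64, first_memblock_size, 0x40000000);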

-aneesh

* Re: [PATCH v2 3/9] powerpc/powernv: Remove real mode access limit for early allocations
  2017-09-12 10:13       ` Aneesh Kumar K.V
@ 2017-09-12 10:26         ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 28+ messages in thread
From: Benjamin Herrenschmidt @ 2017-09-12 10:26 UTC (permalink / raw)
  To: Aneesh Kumar K.V, Michael Ellerman, Nicholas Piggin, linuxppc-dev
  Cc: Suraj Jitindar Singh, kvm-ppc

On Tue, 2017-09-12 at 15:43 +0530, Aneesh Kumar K.V wrote:
> Yes. I added the limit to radix after I observed that we have MSR[SF] =
> 0.
> 
> IIRC it was PACA access that was causing it to crash on return from RTAS.
> 
> hmm the commit also explains that.
> 
> powerpc/mm/radix: Limit paca allocation in radix
> 
> On return from RTAS we access the paca variables and we have 64 bit
> disabled. This requires us to limit paca in 32 bit range.
> 
> Fix this by setting ppc64_rma_size to first_memblock_size/1G range.

That should be fixable with better use of temporaries...

Ben.

* Re: [PATCH v2 7/9] powerpc/64s: do not allocate lppaca if we are not virtualized
  2017-08-13  1:33 ` [PATCH v2 7/9] powerpc/64s: do not allocate lppaca if we are not virtualized Nicholas Piggin
@ 2017-10-13 22:47   ` Paul Mackerras
  2017-10-14  3:54     ` Nicholas Piggin
  0 siblings, 1 reply; 28+ messages in thread
From: Paul Mackerras @ 2017-10-13 22:47 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: linuxppc-dev, kvm-ppc

On Sun, Aug 13, 2017 at 11:33:44AM +1000, Nicholas Piggin wrote:
> The "lppaca" is a structure registered with the hypervisor. This
> is unnecessary when running on non-virtualised platforms. One field
> from the lppaca (pmcregs_in_use) is also used by the host, so move
> the host part out into the paca (lppaca field is still updated in
> guest mode).

There is an error in the patch, see below...

> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> index c52184a8efdf..b838348e3a2b 100644
> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> @@ -99,8 +99,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
>  	mtspr	SPRN_SPRG_VDSO_WRITE,r3
>  
>  	/* Reload the host's PMU registers */
> -	ld	r3, PACALPPACAPTR(r13)	/* is the host using the PMU? */
> -	lbz	r4, LPPACA_PMCINUSE(r3)
> +	lbz	r4, PACA_PMCINUSE(r13) /* is the host using the PMU? */
>  	cmpwi	r4, 0
>  	beq	23f			/* skip if not */
>  BEGIN_FTR_SECTION
> @@ -1671,7 +1670,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>  	mtspr	SPRN_MMCRA, r7
>  	isync
>  	beq	21f			/* if no VPA, save PMU stuff anyway */
> -	lbz	r7, LPPACA_PMCINUSE(r8)
> +	lbz	r7, PACA_PMCINUSE(r13)

We really do need to check the guest's flag not the host's here, since
we're deciding whether to save the PMU state to the vcpu struct.
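
In C terms it's roughly this (a sketch; save_guest_pmu() is a made-up
name):

	/*
	 * On the guest exit path, the guest's VPA flag (not the host's
	 * paca copy) tells us whether guest PMU state must be saved
	 * into the vcpu struct.
	 */
	struct lppaca *vpa = vcpu->arch.vpa.pinned_addr;

	if (vpa && vpa->pmcregs_in_use)
		save_guest_pmu(vcpu);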

Paul.

* Re: [PATCH v2 7/9] powerpc/64s: do not allocate lppaca if we are not virtualized
  2017-10-13 22:47   ` Paul Mackerras
@ 2017-10-14  3:54     ` Nicholas Piggin
  0 siblings, 0 replies; 28+ messages in thread
From: Nicholas Piggin @ 2017-10-14  3:54 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev, kvm-ppc

On Sat, 14 Oct 2017 09:47:59 +1100
Paul Mackerras <paulus@ozlabs.org> wrote:

> On Sun, Aug 13, 2017 at 11:33:44AM +1000, Nicholas Piggin wrote:
> > The "lppaca" is a structure registered with the hypervisor. This
> > is unnecessary when running on non-virtualised platforms. One field
> > from the lppaca (pmcregs_in_use) is also used by the host, so move
> > the host part out into the paca (lppaca field is still updated in
> > guest mode).  
> 
> There is an error in the patch, see below...
> 
> > diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> > index c52184a8efdf..b838348e3a2b 100644
> > --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> > +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> > @@ -99,8 +99,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
> >  	mtspr	SPRN_SPRG_VDSO_WRITE,r3
> >  
> >  	/* Reload the host's PMU registers */
> > -	ld	r3, PACALPPACAPTR(r13)	/* is the host using the PMU? */
> > -	lbz	r4, LPPACA_PMCINUSE(r3)
> > +	lbz	r4, PACA_PMCINUSE(r13) /* is the host using the PMU? */
> >  	cmpwi	r4, 0
> >  	beq	23f			/* skip if not */
> >  BEGIN_FTR_SECTION
> > @@ -1671,7 +1670,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
> >  	mtspr	SPRN_MMCRA, r7
> >  	isync
> >  	beq	21f			/* if no VPA, save PMU stuff anyway */
> > -	lbz	r7, LPPACA_PMCINUSE(r8)
> > +	lbz	r7, PACA_PMCINUSE(r13)  
> 
> We really do need to check the guest's flag not the host's here, since
> we're deciding whether to save the PMU state to the vcpu struct.

Okay I'll fix that up.

Thanks,
Nick

end of thread

Thread overview: 28+ messages
2017-08-12 11:34 [PATCH v2 0/9] improve early structure allocations (paca, lppaca, etc) Nicholas Piggin
2017-08-13  1:33 ` [PATCH v2 1/9] KVM: PPC: Book3S HV: Fix H_REGISTER_VPA VPA size validation Nicholas Piggin
2017-08-15 11:24   ` Michael Ellerman
2017-08-31  3:41   ` Paul Mackerras
2017-08-13  1:33 ` [PATCH v2 2/9] powerpc/powernv: powernv platform is not constrained by RMA Nicholas Piggin
2017-08-31 11:36   ` [v2, " Michael Ellerman
2017-08-13  1:33 ` [PATCH v2 3/9] powerpc/powernv: Remove real mode access limit for early allocations Nicholas Piggin
2017-08-14 12:49   ` Michael Ellerman
2017-08-14 13:10     ` Benjamin Herrenschmidt
2017-08-14 14:51       ` Nicholas Piggin
2017-08-14 13:13     ` Benjamin Herrenschmidt
2017-08-15 12:10       ` Nicholas Piggin
2017-08-15 12:48         ` Benjamin Herrenschmidt
2017-08-15 13:02           ` Nicholas Piggin
2017-09-12 10:13       ` Aneesh Kumar K.V
2017-09-12 10:26         ` Benjamin Herrenschmidt
2017-08-13  1:33 ` [PATCH v2 4/9] powerpc/64s/radix: Remove bolted-SLB address limit for per-cpu stacks Nicholas Piggin
2017-08-31 11:36   ` [v2, " Michael Ellerman
2017-08-13  1:33 ` [PATCH v2 5/9] powerpc/64s: Relax PACA address limitations Nicholas Piggin
2017-08-13  1:33 ` [PATCH v2 6/9] powerpc/64s/radix: Do not allocate SLB shadow structures Nicholas Piggin
2017-08-31 11:36   ` [v2, " Michael Ellerman
2017-08-13  1:33 ` [PATCH v2 7/9] powerpc/64s: do not allocate lppaca if we are not virtualized Nicholas Piggin
2017-10-13 22:47   ` Paul Mackerras
2017-10-14  3:54     ` Nicholas Piggin
2017-08-13  1:33 ` [PATCH v2 8/9] powerpc/64: Use a table of paca pointers and allocate pacas individually Nicholas Piggin
2017-08-15 15:50   ` Nicholas Piggin
2017-08-15 17:30   ` kbuild test robot
2017-08-13  1:33 ` [PATCH v2 9/9] powerpc/64: Use a table of lppaca pointers and allocate lppacas individually Nicholas Piggin
