* [PATCH 1/3] powerpc/mm: Don't alias user region to other regions below PAGE_OFFSET
@ 2016-09-02 11:47 Paul Mackerras
  2016-09-02 11:49 ` [PATCH 2/3] powerpc/mm: Preserve CFAR value on SLB miss caused by access to bogus address Paul Mackerras
                   ` (3 more replies)
  0 siblings, 4 replies; 28+ messages in thread
From: Paul Mackerras @ 2016-09-02 11:47 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Aneesh Kumar K.V

In commit c60ac5693c47 ("powerpc: Update kernel VSID range", 2013-03-13)
we lost a check on the region number (the top four bits of the effective
address) for addresses below PAGE_OFFSET.  That commit replaced a check
that the top 18 bits were all zero with a check that bits 46 - 59 were
zero (performed for all addresses, not just user addresses).

This means that userspace can access an address like 0x1000_0xxx_xxxx_xxxx
and we will insert a valid SLB entry for it.  The VSID used will be the
same as if the top 4 bits were 0, but the page size will be some random
value obtained by indexing beyond the end of the mm_ctx_high_slices_psize
array in the paca.  If that page size is the same as would be used for
region 0, then userspace just has an alias of the region 0 space.  If the
page size is different, then no HPTE will be found for the access, and
the process will get a SIGSEGV (since hash_page_mm() will refuse to create
a HPTE for the bogus address).

The access beyond the end of the mm_ctx_high_slices_psize can be at most
5.5MB past the array, and so will be in RAM somewhere.  Since the access
is a load performed in real mode, it won't fault or crash the kernel.
At most this bug could perhaps leak a little bit of information about
blocks of 32 bytes of memory located at offsets of i * 512kB past the
paca->mm_ctx_high_slices_psize array, for 1 <= i <= 11.
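
To make the arithmetic concrete, here is an illustrative stand-alone
sketch (not kernel code), assuming the usual 1TB high slices and 4-bit
psize entries:

#include <stdio.h>

/*
 * A 2^46-byte address space needs 64 high-slice entries at 4 bits
 * each, i.e. a 32-byte array.  Non-zero region bits (here region 1)
 * push the derived byte offset i * 512kB past the end of it.
 */
int main(void)
{
	unsigned long ea  = 0x1000000000000000ul;	/* region bits = 1 */
	unsigned long idx = ea >> 40;			/* 1TB high-slice index */
	unsigned long off = idx / 2;			/* 4 bits per entry */

	printf("slice index %#lx -> byte offset %#lx (512kB)\n", idx, off);
	return 0;
}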

Cc: stable@vger.kernel.org # v3.10+
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
 arch/powerpc/mm/slb_low.S | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/slb_low.S b/arch/powerpc/mm/slb_low.S
index dfdb90c..9f19834 100644
--- a/arch/powerpc/mm/slb_low.S
+++ b/arch/powerpc/mm/slb_low.S
@@ -113,7 +113,12 @@ BEGIN_FTR_SECTION
 END_MMU_FTR_SECTION_IFCLR(MMU_FTR_1T_SEGMENT)
 	b	slb_finish_load_1T
 
-0:
+0:	/*
+	 * For userspace addresses, make sure this is region 0.
+	 */
+	cmpdi	r9, 0
+	bne	8f
+
 	/* when using slices, we extract the psize off the slice bitmaps
 	 * and then we need to get the sllp encoding off the mmu_psize_defs
 	 * array.
-- 
2.7.4


* [PATCH 2/3] powerpc/mm: Preserve CFAR value on SLB miss caused by access to bogus address
  2016-09-02 11:47 [PATCH 1/3] powerpc/mm: Don't alias user region to other regions below PAGE_OFFSET Paul Mackerras
@ 2016-09-02 11:49 ` Paul Mackerras
  2016-09-04 11:30   ` Aneesh Kumar K.V
  2016-09-13 12:16   ` [2/3] " Michael Ellerman
  2016-09-02 11:50   ` Paul Mackerras
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 28+ messages in thread
From: Paul Mackerras @ 2016-09-02 11:49 UTC (permalink / raw)
  To: linuxppc-dev

Currently, if userspace or the kernel accesses a completely bogus address,
for example with any of bits 46-59 set, we first take an SLB miss interrupt,
install a corresponding SLB entry with VSID 0, retry the instruction, then
take a DSI/ISI interrupt because there is no HPT entry mapping the address.
However, by the time of the second interrupt, the Come-From Address Register
(CFAR) has been overwritten by the rfid instruction at the end of the SLB
miss interrupt handler.  Since bogus accesses can often be caused by a
function return after the stack has been overwritten, the CFAR value would
be very useful as it could indicate which function it was whose return had
led to the bogus address.
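
As a contrived user-space illustration of that failure mode (not part
of the patch; whether the saved LR is really clobbered depends on the
stack layout the compiler produces):

#include <string.h>

/*
 * Hypothetical bug: the overrun writes past buf[] and clobbers the
 * link register saved in the caller's frame, so the return branches
 * to a junk address such as 0xabab... (bits 46-59 set).  The first
 * interrupt taken is then the SLB miss, where the CFAR still points
 * at the blr whose return went wrong.
 */
void victim(void)
{
	unsigned long buf[4];

	memset(buf, 0xab, 8 * sizeof(buf));	/* writes 256 bytes into a 32-byte buffer */
}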

This patch adds code to create a full exception frame in the SLB miss handler
in the case of a bogus address, rather than inserting an SLB entry with a
zero VSID field.  Then we call a new slb_miss_bad_addr() function in C code,
which delivers a signal for a user access or creates an oops for a kernel
access.  In the latter case the oops message will show the CFAR value at the
time of the access.

In the case of the radix MMU, a segment miss interrupt indicates an access
outside the ranges mapped by the page tables.  Previously this was handled
by the code for an unrecoverable SLB miss (one with MSR[RI] = 0), which is
not really correct.  With this patch, we now handle these interrupts with
slb_miss_bad_addr(), which is much more consistent.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
 arch/powerpc/kernel/exceptions-64s.S | 40 ++++++++++++++++++++++++++++++------
 arch/powerpc/kernel/traps.c          | 11 ++++++++++
 arch/powerpc/mm/slb_low.S            |  8 +++-----
 3 files changed, 48 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index df6d45e..a2526b0 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -175,6 +175,7 @@ data_access_slb_pSeries:
 	std	r3,PACA_EXSLB+EX_R3(r13)
 	mfspr	r3,SPRN_DAR
 	mfspr	r12,SPRN_SRR1
+	crset	4*cr6+eq
 #ifndef CONFIG_RELOCATABLE
 	b	slb_miss_realmode
 #else
@@ -201,6 +202,7 @@ instruction_access_slb_pSeries:
 	std	r3,PACA_EXSLB+EX_R3(r13)
 	mfspr	r3,SPRN_SRR0		/* SRR0 is faulting address */
 	mfspr	r12,SPRN_SRR1
+	crclr	4*cr6+eq
 #ifndef CONFIG_RELOCATABLE
 	b	slb_miss_realmode
 #else
@@ -767,6 +769,7 @@ data_access_slb_relon_pSeries:
 	std	r3,PACA_EXSLB+EX_R3(r13)
 	mfspr	r3,SPRN_DAR
 	mfspr	r12,SPRN_SRR1
+	crset	4*cr6+eq
 #ifndef CONFIG_RELOCATABLE
 	b	slb_miss_realmode
 #else
@@ -792,6 +795,7 @@ instruction_access_slb_relon_pSeries:
 	std	r3,PACA_EXSLB+EX_R3(r13)
 	mfspr	r3,SPRN_SRR0		/* SRR0 is faulting address */
 	mfspr	r12,SPRN_SRR1
+	crclr	4*cr6+eq
 #ifndef CONFIG_RELOCATABLE
 	b	slb_miss_realmode
 #else
@@ -1389,6 +1393,7 @@ unrecover_mce:
  * r3 has the faulting address
  * r9 - r13 are saved in paca->exslb.
  * r3 is saved in paca->slb_r3
+ * cr6.eq is set for a D-SLB miss, clear for an I-SLB miss
  * We assume we aren't going to take any exceptions during this procedure.
  */
 slb_miss_realmode:
@@ -1399,29 +1404,31 @@ slb_miss_realmode:
 
 	stw	r9,PACA_EXSLB+EX_CCR(r13)	/* save CR in exc. frame */
 	std	r10,PACA_EXSLB+EX_LR(r13)	/* save LR */
+	std	r3,PACA_EXSLB+EX_DAR(r13)
 
+	crset	4*cr0+eq
 #ifdef CONFIG_PPC_STD_MMU_64
 BEGIN_MMU_FTR_SECTION
 	bl	slb_allocate_realmode
 END_MMU_FTR_SECTION_IFCLR(MMU_FTR_TYPE_RADIX)
 #endif
-	/* All done -- return from exception. */
 
 	ld	r10,PACA_EXSLB+EX_LR(r13)
 	ld	r3,PACA_EXSLB+EX_R3(r13)
 	lwz	r9,PACA_EXSLB+EX_CCR(r13)	/* get saved CR */
-
 	mtlr	r10
+
+	beq	8f		/* if bad address, make full stack frame */
+
 	andi.	r10,r12,MSR_RI	/* check for unrecoverable exception */
-BEGIN_MMU_FTR_SECTION
 	beq-	2f
-FTR_SECTION_ELSE
-	b	2f
-ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
+
+	/* All done -- return from exception. */
 
 .machine	push
 .machine	"power4"
 	mtcrf	0x80,r9
+	mtcrf	0x02,r9		/* I/D indication is in cr6 */
 	mtcrf	0x01,r9		/* slb_allocate uses cr0 and cr7 */
 .machine	pop
 
@@ -1451,6 +1458,27 @@ unrecov_slb:
 	bl	unrecoverable_exception
 	b	1b
 
+8:	mfspr	r11,SPRN_SRR0
+	ld	r10,PACAKBASE(r13)
+	LOAD_HANDLER(r10,bad_addr_slb)
+	mtspr	SPRN_SRR0,r10
+	ld	r10,PACAKMSR(r13)
+	mtspr	SPRN_SRR1,r10
+	rfid
+	b	.
+
+bad_addr_slb:
+	EXCEPTION_PROLOG_COMMON(0x380, PACA_EXSLB)
+	RECONCILE_IRQ_STATE(r10, r11)
+	ld	r3, PACA_EXSLB+EX_DAR(r13)
+	std	r3, _DAR(r1)
+	beq	cr6, 2f
+	li	r10, 0x480		/* fix trap number for I-SLB miss */
+	std	r10, _TRAP(r1)
+2:	bl	save_nvgprs
+	addi	r3, r1, STACK_FRAME_OVERHEAD
+	bl	slb_miss_bad_addr
+	b	ret_from_except
 
 #ifdef CONFIG_PPC_970_NAP
 power4_fixup_nap:
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 2cb5892..a80478b 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -1309,6 +1309,17 @@ bail:
 	exception_exit(prev_state);
 }
 
+void slb_miss_bad_addr(struct pt_regs *regs)
+{
+	enum ctx_state prev_state = exception_enter();
+
+	if (user_mode(regs))
+		_exception(SIGSEGV, regs, SEGV_BNDERR, regs->dar);
+	else
+		bad_page_fault(regs, regs->dar, SIGSEGV);
+	exception_exit(prev_state);
+}
+
 void StackOverflow(struct pt_regs *regs)
 {
 	printk(KERN_CRIT "Kernel stack overflow in process %p, r1=%lx\n",
diff --git a/arch/powerpc/mm/slb_low.S b/arch/powerpc/mm/slb_low.S
index 9f19834..e2974fc 100644
--- a/arch/powerpc/mm/slb_low.S
+++ b/arch/powerpc/mm/slb_low.S
@@ -178,11 +178,9 @@ BEGIN_FTR_SECTION
 END_MMU_FTR_SECTION_IFSET(MMU_FTR_1T_SEGMENT)
 	b	slb_finish_load
 
-8:	/* invalid EA */
-	li	r10,0			/* BAD_VSID */
-	li	r9,0			/* BAD_VSID */
-	li	r11,SLB_VSID_USER	/* flags don't much matter */
-	b	slb_finish_load
+8:	/* invalid EA - return an error indication */
+	crset	4*cr0+eq		/* indicate failure */
+	blr
 
 /*
  * Finish loading of an SLB entry and return
-- 
2.7.4


* [PATCH 3/3] powerpc/mm: Speed up computation of base and actual page size for a HPTE
  2016-09-02 11:47 [PATCH 1/3] powerpc/mm: Don't alias user region to other regions below PAGE_OFFSET Paul Mackerras
@ 2016-09-02 11:50   ` Paul Mackerras
  2016-09-02 11:50   ` Paul Mackerras
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 28+ messages in thread
From: Paul Mackerras @ 2016-09-02 11:50 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: kvm, kvm-ppc

This replaces a 2-D search through an array with a simple 8-bit table
lookup for determining the actual and/or base page size for a HPT entry.

The encoding in the second doubleword of the HPTE is designed to encode
the actual and base page sizes without using any more bits than would be
needed for a 4k page number, by using between 1 and 8 low-order bits of
the RPN (real page number) field to encode the page sizes.  A single
"large page" bit in the first doubleword indicates that these low-order
bits are to be interpreted like this.

We can determine the page sizes by using the low-order 8 bits of the RPN
to look up a 256-entry table.  For actual page sizes less than 1MB, some
of the upper bits of these 8 bits are going to be real address bits, but
we can cope with that by replicating the entries for those smaller page
sizes.
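
For example (an illustrative stand-alone sketch, not the patch itself,
assuming LP_SHIFT = 12 and LP_BITS = 8): for a 64kB actual page only
the low 16 - 12 = 4 LP bits carry encoding, so the fill loop stores the
same value at penc, penc + 0x10, ..., penc + 0xf0, and the lookup
becomes independent of the RPN bits.

#include <stdio.h>

int main(void)
{
	unsigned char table[256] = { 0 };
	int shift = 16 - 12;	/* 64kB page shift minus LP_SHIFT */
	int penc = 1;		/* hypothetical encoding value */
	int ap = 2, bp = 2;	/* hypothetical 64k/64k size indices */
	int p;

	for (p = penc; p < 256; p += 1 << shift)
		table[p] = (ap << 4) | bp;

	/* the same entry is found whatever the high (RPN) bits are */
	printf("table[0x01] = %#x, table[0xf1] = %#x\n",
	       table[0x01], table[0xf1]);
	return 0;
}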

While we're at it, let's move the hpte_page_size() and hpte_base_page_size()
functions from a KVM-specific header to a header for 64-bit HPT systems,
since this computation doesn't have anything specifically to do with KVM.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
 arch/powerpc/include/asm/book3s/64/mmu-hash.h | 37 ++++++++++++
 arch/powerpc/include/asm/kvm_book3s_64.h      | 87 +++------------------------
 arch/powerpc/include/asm/mmu.h                |  1 +
 arch/powerpc/mm/hash_native_64.c              | 42 +------------
 arch/powerpc/mm/hash_utils_64.c               | 37 ++++++++++++
 5 files changed, 84 insertions(+), 120 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
index 287a656..e407af2 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -245,6 +245,43 @@ static inline int segment_shift(int ssize)
 }
 
 /*
+ * This array is indexed by the LP field of the HPTE second dword.
+ * Since this field may contain some RPN bits, some entries are
+ * replicated so that we get the same value irrespective of RPN.
+ * The top 4 bits are the page size index (MMU_PAGE_*) for the
+ * actual page size, the bottom 4 bits are the base page size.
+ */
+extern u8 hpte_page_sizes[1 << LP_BITS];
+
+static inline unsigned long __hpte_page_size(unsigned long h, unsigned long l,
+					     bool is_base_size)
+{
+	unsigned int i, lp;
+
+	if (!(h & HPTE_V_LARGE))
+		return 1ul << 12;
+
+	/* Look at the 8 bit LP value */
+	lp = (l >> LP_SHIFT) & ((1 << LP_BITS) - 1);
+	i = hpte_page_sizes[lp];
+	if (!i)
+		return 0;
+	if (!is_base_size)
+		i >>= 4;
+	return 1ul << mmu_psize_defs[i & 0xf].shift;
+}
+
+static inline unsigned long hpte_page_size(unsigned long h, unsigned long l)
+{
+	return __hpte_page_size(h, l, 0);
+}
+
+static inline unsigned long hpte_base_page_size(unsigned long h, unsigned long l)
+{
+	return __hpte_page_size(h, l, 1);
+}
+
+/*
  * The current system page and segment sizes
  */
 extern int mmu_kernel_ssize;
diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 88d17b4..4ffd5a1 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -20,6 +20,8 @@
 #ifndef __ASM_KVM_BOOK3S_64_H__
 #define __ASM_KVM_BOOK3S_64_H__
 
+#include <asm/book3s/64/mmu-hash.h>
+
 #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
 static inline struct kvmppc_book3s_shadow_vcpu *svcpu_get(struct kvm_vcpu *vcpu)
 {
@@ -97,56 +99,20 @@ static inline void __unlock_hpte(__be64 *hpte, unsigned long hpte_v)
 	hpte[0] = cpu_to_be64(hpte_v);
 }
 
-static inline int __hpte_actual_psize(unsigned int lp, int psize)
-{
-	int i, shift;
-	unsigned int mask;
-
-	/* start from 1 ignoring MMU_PAGE_4K */
-	for (i = 1; i < MMU_PAGE_COUNT; i++) {
-
-		/* invalid penc */
-		if (mmu_psize_defs[psize].penc[i] == -1)
-			continue;
-		/*
-		 * encoding bits per actual page size
-		 *        PTE LP     actual page size
-		 *    rrrr rrrz		>=8KB
-		 *    rrrr rrzz		>=16KB
-		 *    rrrr rzzz		>=32KB
-		 *    rrrr zzzz		>=64KB
-		 * .......
-		 */
-		shift = mmu_psize_defs[i].shift - LP_SHIFT;
-		if (shift > LP_BITS)
-			shift = LP_BITS;
-		mask = (1 << shift) - 1;
-		if ((lp & mask) == mmu_psize_defs[psize].penc[i])
-			return i;
-	}
-	return -1;
-}
-
 static inline unsigned long compute_tlbie_rb(unsigned long v, unsigned long r,
 					     unsigned long pte_index)
 {
-	int b_psize = MMU_PAGE_4K, a_psize = MMU_PAGE_4K;
+	int i, b_psize = MMU_PAGE_4K, a_psize = MMU_PAGE_4K;
 	unsigned int penc;
 	unsigned long rb = 0, va_low, sllp;
 	unsigned int lp = (r >> LP_SHIFT) & ((1 << LP_BITS) - 1);
 
 	if (v & HPTE_V_LARGE) {
-		for (b_psize = 0; b_psize < MMU_PAGE_COUNT; b_psize++) {
-
-			/* valid entries have a shift value */
-			if (!mmu_psize_defs[b_psize].shift)
-				continue;
-
-			a_psize = __hpte_actual_psize(lp, b_psize);
-			if (a_psize != -1)
-				break;
-		}
+		i = hpte_page_sizes[lp];
+		b_psize = i & 0xf;
+		a_psize = i >> 4;
 	}
+
 	/*
 	 * Ignore the top 14 bits of va
 	 * v have top two bits covering segment size, hence move
@@ -215,45 +181,6 @@ static inline unsigned long compute_tlbie_rb(unsigned long v, unsigned long r,
 	return rb;
 }
 
-static inline unsigned long __hpte_page_size(unsigned long h, unsigned long l,
-					     bool is_base_size)
-{
-
-	int size, a_psize;
-	/* Look at the 8 bit LP value */
-	unsigned int lp = (l >> LP_SHIFT) & ((1 << LP_BITS) - 1);
-
-	/* only handle 4k, 64k and 16M pages for now */
-	if (!(h & HPTE_V_LARGE))
-		return 1ul << 12;
-	else {
-		for (size = 0; size < MMU_PAGE_COUNT; size++) {
-			/* valid entries have a shift value */
-			if (!mmu_psize_defs[size].shift)
-				continue;
-
-			a_psize = __hpte_actual_psize(lp, size);
-			if (a_psize != -1) {
-				if (is_base_size)
-					return 1ul << mmu_psize_defs[size].shift;
-				return 1ul << mmu_psize_defs[a_psize].shift;
-			}
-		}
-
-	}
-	return 0;
-}
-
-static inline unsigned long hpte_page_size(unsigned long h, unsigned long l)
-{
-	return __hpte_page_size(h, l, 0);
-}
-
-static inline unsigned long hpte_base_page_size(unsigned long h, unsigned long l)
-{
-	return __hpte_page_size(h, l, 1);
-}
-
 static inline unsigned long hpte_rpn(unsigned long ptel, unsigned long psize)
 {
 	return ((ptel & HPTE_R_RPN) & ~(psize - 1)) >> PAGE_SHIFT;
diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h
index e2fb408..b78e8d3 100644
--- a/arch/powerpc/include/asm/mmu.h
+++ b/arch/powerpc/include/asm/mmu.h
@@ -271,6 +271,7 @@ static inline bool early_radix_enabled(void)
 #define MMU_PAGE_16G	13
 #define MMU_PAGE_64G	14
 
+/* N.B. we need to change the type of hpte_page_sizes if this gets to be > 16 */
 #define MMU_PAGE_COUNT	15
 
 #ifdef CONFIG_PPC_BOOK3S_64
diff --git a/arch/powerpc/mm/hash_native_64.c b/arch/powerpc/mm/hash_native_64.c
index 0e4e965..83ddc0e 100644
--- a/arch/powerpc/mm/hash_native_64.c
+++ b/arch/powerpc/mm/hash_native_64.c
@@ -493,36 +493,6 @@ static void native_hugepage_invalidate(unsigned long vsid,
 }
 #endif
 
-static inline int __hpte_actual_psize(unsigned int lp, int psize)
-{
-	int i, shift;
-	unsigned int mask;
-
-	/* start from 1 ignoring MMU_PAGE_4K */
-	for (i = 1; i < MMU_PAGE_COUNT; i++) {
-
-		/* invalid penc */
-		if (mmu_psize_defs[psize].penc[i] == -1)
-			continue;
-		/*
-		 * encoding bits per actual page size
-		 *        PTE LP     actual page size
-		 *    rrrr rrrz		>=8KB
-		 *    rrrr rrzz		>=16KB
-		 *    rrrr rzzz		>=32KB
-		 *    rrrr zzzz		>=64KB
-		 * .......
-		 */
-		shift = mmu_psize_defs[i].shift - LP_SHIFT;
-		if (shift > LP_BITS)
-			shift = LP_BITS;
-		mask = (1 << shift) - 1;
-		if ((lp & mask) == mmu_psize_defs[psize].penc[i])
-			return i;
-	}
-	return -1;
-}
-
 static void hpte_decode(struct hash_pte *hpte, unsigned long slot,
 			int *psize, int *apsize, int *ssize, unsigned long *vpn)
 {
@@ -538,16 +508,8 @@ static void hpte_decode(struct hash_pte *hpte, unsigned long slot,
 		size   = MMU_PAGE_4K;
 		a_size = MMU_PAGE_4K;
 	} else {
-		for (size = 0; size < MMU_PAGE_COUNT; size++) {
-
-			/* valid entries have a shift value */
-			if (!mmu_psize_defs[size].shift)
-				continue;
-
-			a_size = __hpte_actual_psize(lp, size);
-			if (a_size != -1)
-				break;
-		}
+		size = hpte_page_sizes[lp] & 0xf;
+		a_size = hpte_page_sizes[lp] >> 4;
 	}
 	/* This works for all page sizes, and for 256M and 1T segments */
 	if (cpu_has_feature(CPU_FTR_ARCH_300))
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 0821556..e4ec99c 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -93,6 +93,9 @@ static unsigned long _SDR1;
 struct mmu_psize_def mmu_psize_defs[MMU_PAGE_COUNT];
 EXPORT_SYMBOL_GPL(mmu_psize_defs);
 
+u8 hpte_page_sizes[1 << LP_BITS];
+EXPORT_SYMBOL_GPL(hpte_page_sizes);
+
 struct hash_pte *htab_address;
 unsigned long htab_size_bytes;
 unsigned long htab_hash_mask;
@@ -564,8 +567,42 @@ static void __init htab_scan_page_sizes(void)
 #endif /* CONFIG_HUGETLB_PAGE */
 }
 
+/*
+ * Fill in the hpte_page_sizes[] array.
+ * We go through the mmu_psize_defs[] array looking for all the
+ * supported base/actual page size combinations.  Each combination
+ * has a unique pagesize encoding (penc) value in the low bits of
+ * the LP field of the HPTE.  For actual page sizes less than 1MB,
+ * some of the upper LP bits are used for RPN bits, meaning that
+ * we need to fill in several entries in hpte_page_sizes[].
+ */
+static void init_hpte_page_sizes(void)
+{
+	long int ap, bp;
+	long int shift, penc;
+
+	for (bp = 0; bp < MMU_PAGE_COUNT; ++bp) {
+		if (!mmu_psize_defs[bp].shift)
+			continue;	/* not a supported page size */
+		for (ap = bp; ap < MMU_PAGE_COUNT; ++ap) {
+			penc = mmu_psize_defs[bp].penc[ap];
+			if (penc == -1)
+				continue;
+			shift = mmu_psize_defs[ap].shift - LP_SHIFT;
+			if (shift <= 0)
+				continue;	/* should never happen */
+			while (penc < (1 << LP_BITS)) {
+				hpte_page_sizes[penc] = (ap << 4) | bp;
+				penc += 1 << shift;
+			}
+		}
+	}
+}
+
 static void __init htab_init_page_sizes(void)
 {
+	init_hpte_page_sizes();
+
 	if (!debug_pagealloc_enabled()) {
 		/*
 		 * Pick a size for the linear mapping. Currently, we only
-- 
2.7.4


* Re: [PATCH 1/3] powerpc/mm: Don't alias user region to other regions below PAGE_OFFSET
  2016-09-02 11:47 [PATCH 1/3] powerpc/mm: Don't alias user region to other regions below PAGE_OFFSET Paul Mackerras
  2016-09-02 11:49 ` [PATCH 2/3] powerpc/mm: Preserve CFAR value on SLB miss caused by access to bogus address Paul Mackerras
  2016-09-02 11:50   ` Paul Mackerras
@ 2016-09-02 12:22 ` Aneesh Kumar K.V
  2016-09-03  9:54   ` Paul Mackerras
  2016-09-08  9:47 ` [1/3] " Michael Ellerman
  3 siblings, 1 reply; 28+ messages in thread
From: Aneesh Kumar K.V @ 2016-09-02 12:22 UTC (permalink / raw)
  To: Paul Mackerras, linuxppc-dev


Hi Paul,

Really nice catch. Was this found by code analysis or do we have any
reported issue around this?

Paul Mackerras <paulus@ozlabs.org> writes:

> In commit c60ac5693c47 ("powerpc: Update kernel VSID range", 2013-03-13)
> we lost a check on the region number (the top four bits of the effective
> address) for addresses below PAGE_OFFSET.  That commit replaced a check
> that the top 18 bits were all zero with a check that bits 46 - 59 were
> zero (performed for all addresses, not just user addresses).

To make review easy for others, here is the relevant diff from that commit.

 _GLOBAL(slb_allocate_realmode)
-       /* r3 = faulting address */
+       /*
+        * check for bad kernel/user address
+        * (ea & ~REGION_MASK) >= PGTABLE_RANGE
+        */
+       rldicr. r9,r3,4,(63 - 46 - 4)
+       bne-    8f
 
        srdi    r9,r3,60                /* get region */

......
And because we were doing the above check, I removed
.........

 BEGIN_FTR_SECTION
        b       slb_finish_load
 END_MMU_FTR_SECTION_IFCLR(MMU_FTR_1T_SEGMENT)
        b       slb_finish_load_1T
 
-0:     /* user address: proto-VSID = context << 15 | ESID. First check
-        * if the address is within the boundaries of the user region
-        */
-       srdi.   r9,r10,USER_ESID_BITS
-       bne-    8f                      /* invalid ea bits set */
-
-
+0:


>
> This means that userspace can access an address like 0x1000_0xxx_xxxx_xxxx
> and we will insert a valid SLB entry for it.  The VSID used will be the
> same as if the top 4 bits were 0, but the page size will be some random
> value obtained by indexing beyond the end of the mm_ctx_high_slices_psize
> array in the paca.  If that page size is the same as would be used for
> region 0, then userspace just has an alias of the region 0 space.  If the
> page size is different, then no HPTE will be found for the access, and
> the process will get a SIGSEGV (since hash_page_mm() will refuse to create
> a HPTE for the bogus address).
>
> The access beyond the end of the mm_ctx_high_slices_psize can be at most
> 5.5MB past the array, and so will be in RAM somewhere.  Since the access
> is a load performed in real mode, it won't fault or crash the kernel.
> At most this bug could perhaps leak a little bit of information about
> blocks of 32 bytes of memory located at offsets of i * 512kB past the
> paca->mm_ctx_high_slices_psize array, for 1 <= i <= 11.


Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

>
> Cc: stable@vger.kernel.org # v3.10+
> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
> ---
>  arch/powerpc/mm/slb_low.S | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/mm/slb_low.S b/arch/powerpc/mm/slb_low.S
> index dfdb90c..9f19834 100644
> --- a/arch/powerpc/mm/slb_low.S
> +++ b/arch/powerpc/mm/slb_low.S
> @@ -113,7 +113,12 @@ BEGIN_FTR_SECTION
>  END_MMU_FTR_SECTION_IFCLR(MMU_FTR_1T_SEGMENT)
>  	b	slb_finish_load_1T
>
> -0:
> +0:	/*
> +	 * For userspace addresses, make sure this is region 0.
> +	 */
> +	cmpdi	r9, 0
> +	bne	8f
> +
>  	/* when using slices, we extract the psize off the slice bitmaps
>  	 * and then we need to get the sllp encoding off the mmu_psize_defs
>  	 * array.
> -- 
> 2.7.4


* Re: [PATCH 1/3] powerpc/mm: Don't alias user region to other regions below PAGE_OFFSET
  2016-09-02 12:22 ` [PATCH 1/3] powerpc/mm: Don't alias user region to other regions below PAGE_OFFSET Aneesh Kumar K.V
@ 2016-09-03  9:54   ` Paul Mackerras
  2016-09-04 11:31     ` Aneesh Kumar K.V
  0 siblings, 1 reply; 28+ messages in thread
From: Paul Mackerras @ 2016-09-03  9:54 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: linuxppc-dev

On Fri, Sep 02, 2016 at 05:52:16PM +0530, Aneesh Kumar K.V wrote:
> 
> Hi Paul,
> 
> Really nice catch. Was this found by code analysis or do we have any
> reported issue around this?

I found it by code analysis.

I haven't been able to find any really bad consequence, beyond leaking
some information about kernel memory.  Can you find any worse
consequence?

Paul.


* Re: [PATCH 3/3] powerpc/mm: Speed up computation of base and actual page size for a HPTE
  2016-09-02 11:50   ` Paul Mackerras
@ 2016-09-04 11:28     ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 28+ messages in thread
From: Aneesh Kumar K.V @ 2016-09-04 11:16 UTC (permalink / raw)
  To: Paul Mackerras, linuxppc-dev; +Cc: kvm-ppc, kvm

Paul Mackerras <paulus@ozlabs.org> writes:

> +/*
> + * Fill in the hpte_page_sizes[] array.
> + * We go through the mmu_psize_defs[] array looking for all the
> + * supported base/actual page size combinations.  Each combination
> + * has a unique pagesize encoding (penc) value in the low bits of
> + * the LP field of the HPTE.  For actual page sizes less than 1MB,
> + * some of the upper LP bits are used for RPN bits, meaning that
> + * we need to fill in several entries in hpte_page_sizes[].
> + */


Maybe we can put the details of the upper LP bits used for RPN here,
i.e. add the below to the comment?

		/*
		 * encoding bits per actual page size
		 *        PTE LP     actual page size
		 *    rrrr rrrz		>=8KB
		 *    rrrr rrzz		>=16KB
		 *    rrrr rzzz		>=32KB
		 *    rrrr zzzz		>=64KB
		 * .......
		 */


> +static void init_hpte_page_sizes(void)
> +{
> +	long int ap, bp;
> +	long int shift, penc;
> +
> +	for (bp = 0; bp < MMU_PAGE_COUNT; ++bp) {
> +		if (!mmu_psize_defs[bp].shift)
> +			continue;	/* not a supported page size */
> +		for (ap = bp; ap < MMU_PAGE_COUNT; ++ap) {
> +			penc = mmu_psize_defs[bp].penc[ap];
> +			if (penc == -1)
> +				continue;
> +			shift = mmu_psize_defs[ap].shift - LP_SHIFT;
> +			if (shift <= 0)
> +				continue;	/* should never happen */
> +			while (penc < (1 << LP_BITS)) {
> +				hpte_page_sizes[penc] = (ap << 4) | bp;
> +				penc += 1 << shift;
> +			}

Can you add a comment around that while loop? i.e. something like:
/*
 * if we are using all LP_BITS in penc, fill the array such that we
 * replicate the ap and bp information, ignoring those bits.  They will
 * be filled by RPN bits in the HPTE.
 */


> +		}
> +	}
> +}
> +
>  static void __init htab_init_page_sizes(void)
>  {
> +	init_hpte_page_sizes();
> +
>  	if (!debug_pagealloc_enabled()) {
>  		/*
>  		 * Pick a size for the linear mapping. Currently, we only
> -- 

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

-aneesh


* Re: [PATCH 2/3] powerpc/mm: Preserve CFAR value on SLB miss caused by access to bogus address
  2016-09-02 11:49 ` [PATCH 2/3] powerpc/mm: Preserve CFAR value on SLB miss caused by access to bogus address Paul Mackerras
@ 2016-09-04 11:30   ` Aneesh Kumar K.V
  2016-09-07  5:52     ` Paul Mackerras
  2016-09-13 12:16   ` [2/3] " Michael Ellerman
  1 sibling, 1 reply; 28+ messages in thread
From: Aneesh Kumar K.V @ 2016-09-04 11:30 UTC (permalink / raw)
  To: Paul Mackerras, linuxppc-dev

Paul Mackerras <paulus@ozlabs.org> writes:

> Currently, if userspace or the kernel accesses a completely bogus address,
> for example with any of bits 46-59 set, we first take an SLB miss interrupt,
> install a corresponding SLB entry with VSID 0, retry the instruction, then
> take a DSI/ISI interrupt because there is no HPT entry mapping the address.
> However, by the time of the second interrupt, the Come-From Address Register
> (CFAR) has been overwritten by the rfid instruction at the end of the SLB
> miss interrupt handler.  Since bogus accesses can often be caused by a
> function return after the stack has been overwritten, the CFAR value would
> be very useful as it could indicate which function it was whose return had
> led to the bogus address.
>
> This patch adds code to create a full exception frame in the SLB miss handler
> in the case of a bogus address, rather than inserting an SLB entry with a
> zero VSID field.  Then we call a new slb_miss_bad_addr() function in C code,
> which delivers a signal for a user access or creates an oops for a kernel
> access.  In the latter case the oops message will show the CFAR value at the
> time of the access.
>
> In the case of the radix MMU, a segment miss interrupt indicates an access
> outside the ranges mapped by the page tables.  Previously this was handled
> by the code for an unrecoverable SLB miss (one with MSR[RI] = 0), which is
> not really correct.  With this patch, we now handle these interrupts with
> slb_miss_bad_addr(), which is much more consistent.
>
> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

> ---
>  arch/powerpc/kernel/exceptions-64s.S | 40 ++++++++++++++++++++++++++++++------
>  arch/powerpc/kernel/traps.c          | 11 ++++++++++
>  arch/powerpc/mm/slb_low.S            |  8 +++-----
>  3 files changed, 48 insertions(+), 11 deletions(-)
>
> diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
> index df6d45e..a2526b0 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -175,6 +175,7 @@ data_access_slb_pSeries:
>  	std	r3,PACA_EXSLB+EX_R3(r13)
>  	mfspr	r3,SPRN_DAR
>  	mfspr	r12,SPRN_SRR1
> +	crset	4*cr6+eq
>  #ifndef CONFIG_RELOCATABLE
>  	b	slb_miss_realmode
>  #else
> @@ -201,6 +202,7 @@ instruction_access_slb_pSeries:
>  	std	r3,PACA_EXSLB+EX_R3(r13)
>  	mfspr	r3,SPRN_SRR0		/* SRR0 is faulting address */
>  	mfspr	r12,SPRN_SRR1
> +	crclr	4*cr6+eq
>  #ifndef CONFIG_RELOCATABLE
>  	b	slb_miss_realmode
>  #else
> @@ -767,6 +769,7 @@ data_access_slb_relon_pSeries:
>  	std	r3,PACA_EXSLB+EX_R3(r13)
>  	mfspr	r3,SPRN_DAR
>  	mfspr	r12,SPRN_SRR1
> +	crset	4*cr6+eq
>  #ifndef CONFIG_RELOCATABLE
>  	b	slb_miss_realmode
>  #else
> @@ -792,6 +795,7 @@ instruction_access_slb_relon_pSeries:
>  	std	r3,PACA_EXSLB+EX_R3(r13)
>  	mfspr	r3,SPRN_SRR0		/* SRR0 is faulting address */
>  	mfspr	r12,SPRN_SRR1
> +	crclr	4*cr6+eq
>  #ifndef CONFIG_RELOCATABLE
>  	b	slb_miss_realmode
>  #else
> @@ -1389,6 +1393,7 @@ unrecover_mce:
>   * r3 has the faulting address
>   * r9 - r13 are saved in paca->exslb.
>   * r3 is saved in paca->slb_r3
> > + * cr6.eq is set for a D-SLB miss, clear for an I-SLB miss
>   * We assume we aren't going to take any exceptions during this procedure.
>   */
>  slb_miss_realmode:
> @@ -1399,29 +1404,31 @@ slb_miss_realmode:
>  
>  	stw	r9,PACA_EXSLB+EX_CCR(r13)	/* save CR in exc. frame */
>  	std	r10,PACA_EXSLB+EX_LR(r13)	/* save LR */
> +	std	r3,PACA_EXSLB+EX_DAR(r13)


We already have that in EX_R3(r13), right?  Any specific reason we can't
use that?  Is this because we find it overwritten by
EXCEPTION_PROLOG_COMMON in bad_addr_slb?  But we do set the right r3
before calling bad_addr_slb via

  	ld	r3,PACA_EXSLB+EX_R3(r13)

>  
> +	crset	4*cr0+eq
>  #ifdef CONFIG_PPC_STD_MMU_64
>  BEGIN_MMU_FTR_SECTION
>  	bl	slb_allocate_realmode
>  END_MMU_FTR_SECTION_IFCLR(MMU_FTR_TYPE_RADIX)
>  #endif
> -	/* All done -- return from exception. */
>  
>  	ld	r10,PACA_EXSLB+EX_LR(r13)
>  	ld	r3,PACA_EXSLB+EX_R3(r13)
>  	lwz	r9,PACA_EXSLB+EX_CCR(r13)	/* get saved CR */
> -
>  	mtlr	r10
> +
> +	beq	8f		/* if bad address, make full stack frame */
> +
>  	andi.	r10,r12,MSR_RI	/* check for unrecoverable exception */
> -BEGIN_MMU_FTR_SECTION
>  	beq-	2f
> -FTR_SECTION_ELSE
> -	b	2f
> -ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
> +
> +	/* All done -- return from exception. */
>  
>  .machine	push
>  .machine	"power4"
>  	mtcrf	0x80,r9
> +	mtcrf	0x02,r9		/* I/D indication is in cr6 */
>  	mtcrf	0x01,r9		/* slb_allocate uses cr0 and cr7 */
>  .machine	pop
>  
> @@ -1451,6 +1458,27 @@ unrecov_slb:
>  	bl	unrecoverable_exception
>  	b	1b
>  
> +8:	mfspr	r11,SPRN_SRR0
> +	ld	r10,PACAKBASE(r13)
> +	LOAD_HANDLER(r10,bad_addr_slb)
> +	mtspr	SPRN_SRR0,r10
> +	ld	r10,PACAKMSR(r13)
> +	mtspr	SPRN_SRR1,r10
> +	rfid
> +	b	.
> +
> +bad_addr_slb:
> +	EXCEPTION_PROLOG_COMMON(0x380, PACA_EXSLB)
> +	RECONCILE_IRQ_STATE(r10, r11)
> +	ld	r3, PACA_EXSLB+EX_DAR(r13)
> +	std	r3, _DAR(r1)
> +	beq	cr6, 2f
> +	li	r10, 0x480		/* fix trap number for I-SLB miss */
> +	std	r10, _TRAP(r1)
> +2:	bl	save_nvgprs
> +	addi	r3, r1, STACK_FRAME_OVERHEAD
> +	bl	slb_miss_bad_addr
> +	b	ret_from_except
>  
>  #ifdef CONFIG_PPC_970_NAP
>  power4_fixup_nap:
> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> index 2cb5892..a80478b 100644
> --- a/arch/powerpc/kernel/traps.c
> +++ b/arch/powerpc/kernel/traps.c
> @@ -1309,6 +1309,17 @@ bail:
>  	exception_exit(prev_state);
>  }
>  
> +void slb_miss_bad_addr(struct pt_regs *regs)
> +{
> +	enum ctx_state prev_state = exception_enter();
> +
> +	if (user_mode(regs))
> +		_exception(SIGSEGV, regs, SEGV_BNDERR, regs->dar);
> +	else
> +		bad_page_fault(regs, regs->dar, SIGSEGV);
> +	exception_exit(prev_state);
> +}
> +
>  void StackOverflow(struct pt_regs *regs)
>  {
>  	printk(KERN_CRIT "Kernel stack overflow in process %p, r1=%lx\n",
> diff --git a/arch/powerpc/mm/slb_low.S b/arch/powerpc/mm/slb_low.S
> index 9f19834..e2974fc 100644
> --- a/arch/powerpc/mm/slb_low.S
> +++ b/arch/powerpc/mm/slb_low.S
> @@ -178,11 +178,9 @@ BEGIN_FTR_SECTION
>  END_MMU_FTR_SECTION_IFSET(MMU_FTR_1T_SEGMENT)
>  	b	slb_finish_load
>  
> -8:	/* invalid EA */
> -	li	r10,0			/* BAD_VSID */
> -	li	r9,0			/* BAD_VSID */
> -	li	r11,SLB_VSID_USER	/* flags don't much matter */
> -	b	slb_finish_load
> +8:	/* invalid EA - return an error indication */
> +	crset	4*cr0+eq		/* indicate failure */
> +	blr
>  
>  /*
>   * Finish loading of an SLB entry and return
> -- 
> 2.7.4


* Re: [PATCH 1/3] powerpc/mm: Don't alias user region to other regions below PAGE_OFFSET
  2016-09-03  9:54   ` Paul Mackerras
@ 2016-09-04 11:31     ` Aneesh Kumar K.V
  0 siblings, 0 replies; 28+ messages in thread
From: Aneesh Kumar K.V @ 2016-09-04 11:31 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

Paul Mackerras <paulus@ozlabs.org> writes:

> On Fri, Sep 02, 2016 at 05:52:16PM +0530, Aneesh Kumar K.V wrote:
>> 
>> Hi Paul,
>> 
>> Really nice catch. Was this found by code analysis or do we have any
>> reported issue around this?
>
> I found it by code analysis.
>
> I haven't been able to find any really bad consequence, beyond leaking
> some information about kernel memory.  Can you find any worse
> consequence?
>

No, considering the Linux page table is not going to have a mapping
for this address.

-aneesh


* Re: [PATCH 3/3] powerpc/mm: Speed up computation of base and actual page size for a HPTE
  2016-09-02 11:50   ` Paul Mackerras
@ 2016-09-05  5:16     ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 28+ messages in thread
From: Aneesh Kumar K.V @ 2016-09-05  5:04 UTC (permalink / raw)
  To: Paul Mackerras, linuxppc-dev; +Cc: kvm-ppc, kvm

> +static void init_hpte_page_sizes(void)
> +{
> +	long int ap, bp;
> +	long int shift, penc;
> +
> +	for (bp = 0; bp < MMU_PAGE_COUNT; ++bp) {
> +		if (!mmu_psize_defs[bp].shift)
> +			continue;	/* not a supported page size */
> +		for (ap = bp; ap < MMU_PAGE_COUNT; ++ap) {
> +			penc = mmu_psize_defs[bp].penc[ap];
> +			if (penc == -1)
> +				continue;
> +			shift = mmu_psize_defs[ap].shift - LP_SHIFT;
> +			if (shift <= 0)
> +				continue;	/* should never happen */
> +			while (penc < (1 << LP_BITS)) {
> +				hpte_page_sizes[penc] = (ap << 4) | bp;
> +				penc += 1 << shift;
> +			}
> +		}
> +	}
> +}
> +

Going through this again, it is confusing.  How are we differentiating
between the penc values below?

 0000 000z		>=8KB (z = 1)
 0000 zzzz		>=64KB (zzzz = 0001)

Those are made up 'z' values.

-aneesh


* Re: [PATCH 3/3] powerpc/mm: Speed up computation of base and actual page size for a HPTE
  2016-09-05  5:16     ` Aneesh Kumar K.V
@ 2016-09-07  5:07       ` Paul Mackerras
  -1 siblings, 0 replies; 28+ messages in thread
From: Paul Mackerras @ 2016-09-07  5:07 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: linuxppc-dev, kvm-ppc, kvm

On Mon, Sep 05, 2016 at 10:34:16AM +0530, Aneesh Kumar K.V wrote:
> > +static void init_hpte_page_sizes(void)
> > +{
> > +	long int ap, bp;
> > +	long int shift, penc;
> > +
> > +	for (bp = 0; bp < MMU_PAGE_COUNT; ++bp) {
> > +		if (!mmu_psize_defs[bp].shift)
> > +			continue;	/* not a supported page size */
> > +		for (ap = bp; ap < MMU_PAGE_COUNT; ++ap) {
> > +			penc = mmu_psize_defs[bp].penc[ap];
> > +			if (penc == -1)
> > +				continue;
> > +			shift = mmu_psize_defs[ap].shift - LP_SHIFT;
> > +			if (shift <= 0)
> > +				continue;	/* should never happen */
> > +			while (penc < (1 << LP_BITS)) {
> > +				hpte_page_sizes[penc] = (ap << 4) | bp;
> > +				penc += 1 << shift;
> > +			}
> > +		}
> > +	}
> > +}
> > +
> 
> Going through this again, it is confusing.  How are we differentiating
> between the penc values below?
> 
>  0000 000z		>=8KB (z = 1)
>  0000 zzzz		>=64KB (zzzz = 0001)
> 
> Those are made up 'z' values.

That wouldn't be a valid set of page encodings.  If the page encoding
for 8kB pages is z=1, then the encodings for all larger page sizes
would have to have a 0 in the least significant bit.  In fact none of
the POWER processors has an 8kB page size; the smallest implemented
large page size is 64kB.  Consequently the first level of decoding of
the page size on these CPUs can look at the bottom 4 bits.

The 00000000 encoding is used for 16MB pages, because 16MB was the
first large page size implemented back in the POWER4+ days, and there
was no page size field at that time, so these 8 bits were reserved and
set to zero by OSes at that time.  For compatibility, the 00000000
encoding continues to be used, so the encodings for other page sizes
always have at least one 1 in the zzzz bits.
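
For concreteness, here is a stand-alone sketch (made-up encodings, not
kernel code) of the collision that would arise if 8kB really did use
z = 1:

#include <stdio.h>

struct enc {
	int shift;		/* number of low LP bits used */
	unsigned int penc;	/* encoding value in those bits */
};

/* two encodings collide if they agree on the bits both of them use */
static int collides(struct enc a, struct enc b)
{
	int bits = a.shift < b.shift ? a.shift : b.shift;
	unsigned int mask = (1u << bits) - 1;

	return (a.penc & mask) == (b.penc & mask);
}

int main(void)
{
	struct enc e8k  = { 1, 0x1 };	/* hypothetical 8kB: z = 1 */
	struct enc e64k = { 4, 0x1 };	/* hypothetical 64kB: zzzz = 0001 */

	/* prints "collide": with z = 1 taken by 8kB, no larger page
	 * size could use an odd zzzz value */
	printf("%s\n", collides(e8k, e64k) ? "collide" : "ok");
	return 0;
}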

Paul.

* Re: [PATCH 2/3] powerpc/mm: Preserve CFAR value on SLB miss caused by access to bogus address
  2016-09-04 11:30   ` Aneesh Kumar K.V
@ 2016-09-07  5:52     ` Paul Mackerras
  0 siblings, 0 replies; 28+ messages in thread
From: Paul Mackerras @ 2016-09-07  5:52 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: linuxppc-dev

On Sun, Sep 04, 2016 at 05:00:13PM +0530, Aneesh Kumar K.V wrote:
[snip]
> > @@ -1389,6 +1393,7 @@ unrecover_mce:
> >   * r3 has the faulting address
> >   * r9 - r13 are saved in paca->exslb.
> >   * r3 is saved in paca->slb_r3
> > + * cr6.eq is set for a D-SLB miss, clear for an I-SLB miss
> >   * We assume we aren't going to take any exceptions during this procedure.
> >   */
> >  slb_miss_realmode:
> > @@ -1399,29 +1404,31 @@ slb_miss_realmode:
> >  
> >  	stw	r9,PACA_EXSLB+EX_CCR(r13)	/* save CR in exc. frame */
> >  	std	r10,PACA_EXSLB+EX_LR(r13)	/* save LR */
> > +	std	r3,PACA_EXSLB+EX_DAR(r13)
> 
> 
> We already have that in EX_R3(r13), right?  Any specific reason we can't

No, what's in EX_R3(r13) is the original value of r3.  What's in r3
now is the faulting address.  We save that here so we can put it in
regs->dar later on.

Paul.


* [PATCH v2 3/3] powerpc/mm: Speed up computation of base and actual page size for a HPTE
  2016-09-02 11:50   ` Paul Mackerras
                     ` (2 preceding siblings ...)
  (?)
@ 2016-09-07  6:17   ` Paul Mackerras
  2016-09-08 10:08       ` Paul Mackerras
  -1 siblings, 1 reply; 28+ messages in thread
From: Paul Mackerras @ 2016-09-07  6:17 UTC (permalink / raw)
  To: linuxppc-dev

This replaces a 2-D search through an array with a simple 8-bit table
lookup for determining the actual and/or base page size for a HPT entry.

The encoding in the second doubleword of the HPTE is designed to encode
the actual and base page sizes without using any more bits than would be
needed for a 4k page number, by using between 1 and 8 low-order bits of
the RPN (real page number) field to encode the page sizes.  A single
"large page" bit in the first doubleword indicates that these low-order
bits are to be interpreted like this.

We can determine the page sizes by using the low-order 8 bits of the RPN
to look up a 256-entry table.  For actual page sizes less than 1MB, some
of the upper bits of these 8 bits are going to be real address bits, but
we can cope with that by replicating the entries for those smaller page
sizes.

While we're at it, let's move the hpte_page_size() and hpte_base_page_size()
functions from a KVM-specific header to a header for 64-bit HPT systems,
since this computation doesn't have anything specifically to do with KVM.
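
As a worked illustration of the replication (a sketch only: the penc
value of 1 for 64kB pages and the page size index of 4 used below are
assumed stand-ins, not the real kernel values), with LP_SHIFT = 12 a
64kB actual page size gives shift = 16 - 12 = 4, so the fill loop steps
through the 256-entry table in strides of 16, writing the same byte at
LP values 0x01, 0x11, ..., 0xf1:

	#include <stdio.h>

	#define LP_BITS		8
	#define LP_SHIFT	12

	int main(void)
	{
		unsigned char table[1 << LP_BITS] = { 0 };
		int ap = 4, bp = 4;		/* stand-in page size index for 64K/64K */
		int shift = 16 - LP_SHIFT;	/* 64K page shift is 16 */
		int penc = 1;			/* assumed example encoding */

		for (; penc < (1 << LP_BITS); penc += 1 << shift)
			table[penc] = (ap << 4) | bp;

		for (int i = 0; i < (1 << LP_BITS); i++)
			if (table[i])
				printf("LP 0x%02x -> 0x%02x\n", i, table[i]);
		return 0;
	}

Every one of those 16 entries then decodes to the same actual and base
page size, which is what lets the lookup ignore the rrrr bits.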

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
---
v2: added more comments as suggested by Aneesh

 arch/powerpc/include/asm/book3s/64/mmu-hash.h | 37 ++++++++++++
 arch/powerpc/include/asm/kvm_book3s_64.h      | 87 +++------------------------
 arch/powerpc/include/asm/mmu.h                |  1 +
 arch/powerpc/mm/hash_native_64.c              | 42 +------------
 arch/powerpc/mm/hash_utils_64.c               | 55 +++++++++++++++++
 5 files changed, 102 insertions(+), 120 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
index 287a656..e407af2 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -245,6 +245,43 @@ static inline int segment_shift(int ssize)
 }
 
 /*
+ * This array is indexed by the LP field of the HPTE second dword.
+ * Since this field may contain some RPN bits, some entries are
+ * replicated so that we get the same value irrespective of RPN.
+ * The top 4 bits are the page size index (MMU_PAGE_*) for the
+ * actual page size, the bottom 4 bits are the base page size.
+ */
+extern u8 hpte_page_sizes[1 << LP_BITS];
+
+static inline unsigned long __hpte_page_size(unsigned long h, unsigned long l,
+					     bool is_base_size)
+{
+	unsigned int i, lp;
+
+	if (!(h & HPTE_V_LARGE))
+		return 1ul << 12;
+
+	/* Look at the 8 bit LP value */
+	lp = (l >> LP_SHIFT) & ((1 << LP_BITS) - 1);
+	i = hpte_page_sizes[lp];
+	if (!i)
+		return 0;
+	if (!is_base_size)
+		i >>= 4;
+	return 1ul << mmu_psize_defs[i & 0xf].shift;
+}
+
+static inline unsigned long hpte_page_size(unsigned long h, unsigned long l)
+{
+	return __hpte_page_size(h, l, 0);
+}
+
+static inline unsigned long hpte_base_page_size(unsigned long h, unsigned long l)
+{
+	return __hpte_page_size(h, l, 1);
+}
+
+/*
  * The current system page and segment sizes
  */
 extern int mmu_kernel_ssize;
diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 88d17b4..4ffd5a1 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -20,6 +20,8 @@
 #ifndef __ASM_KVM_BOOK3S_64_H__
 #define __ASM_KVM_BOOK3S_64_H__
 
+#include <asm/book3s/64/mmu-hash.h>
+
 #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
 static inline struct kvmppc_book3s_shadow_vcpu *svcpu_get(struct kvm_vcpu *vcpu)
 {
@@ -97,56 +99,20 @@ static inline void __unlock_hpte(__be64 *hpte, unsigned long hpte_v)
 	hpte[0] = cpu_to_be64(hpte_v);
 }
 
-static inline int __hpte_actual_psize(unsigned int lp, int psize)
-{
-	int i, shift;
-	unsigned int mask;
-
-	/* start from 1 ignoring MMU_PAGE_4K */
-	for (i = 1; i < MMU_PAGE_COUNT; i++) {
-
-		/* invalid penc */
-		if (mmu_psize_defs[psize].penc[i] == -1)
-			continue;
-		/*
-		 * encoding bits per actual page size
-		 *        PTE LP     actual page size
-		 *    rrrr rrrz		>=8KB
-		 *    rrrr rrzz		>=16KB
-		 *    rrrr rzzz		>=32KB
-		 *    rrrr zzzz		>=64KB
-		 * .......
-		 */
-		shift = mmu_psize_defs[i].shift - LP_SHIFT;
-		if (shift > LP_BITS)
-			shift = LP_BITS;
-		mask = (1 << shift) - 1;
-		if ((lp & mask) == mmu_psize_defs[psize].penc[i])
-			return i;
-	}
-	return -1;
-}
-
 static inline unsigned long compute_tlbie_rb(unsigned long v, unsigned long r,
 					     unsigned long pte_index)
 {
-	int b_psize = MMU_PAGE_4K, a_psize = MMU_PAGE_4K;
+	int i, b_psize = MMU_PAGE_4K, a_psize = MMU_PAGE_4K;
 	unsigned int penc;
 	unsigned long rb = 0, va_low, sllp;
 	unsigned int lp = (r >> LP_SHIFT) & ((1 << LP_BITS) - 1);
 
 	if (v & HPTE_V_LARGE) {
-		for (b_psize = 0; b_psize < MMU_PAGE_COUNT; b_psize++) {
-
-			/* valid entries have a shift value */
-			if (!mmu_psize_defs[b_psize].shift)
-				continue;
-
-			a_psize = __hpte_actual_psize(lp, b_psize);
-			if (a_psize != -1)
-				break;
-		}
+		i = hpte_page_sizes[lp];
+		b_psize = i & 0xf;
+		a_psize = i >> 4;
 	}
+
 	/*
 	 * Ignore the top 14 bits of va
 	 * v have top two bits covering segment size, hence move
@@ -215,45 +181,6 @@ static inline unsigned long compute_tlbie_rb(unsigned long v, unsigned long r,
 	return rb;
 }
 
-static inline unsigned long __hpte_page_size(unsigned long h, unsigned long l,
-					     bool is_base_size)
-{
-
-	int size, a_psize;
-	/* Look at the 8 bit LP value */
-	unsigned int lp = (l >> LP_SHIFT) & ((1 << LP_BITS) - 1);
-
-	/* only handle 4k, 64k and 16M pages for now */
-	if (!(h & HPTE_V_LARGE))
-		return 1ul << 12;
-	else {
-		for (size = 0; size < MMU_PAGE_COUNT; size++) {
-			/* valid entries have a shift value */
-			if (!mmu_psize_defs[size].shift)
-				continue;
-
-			a_psize = __hpte_actual_psize(lp, size);
-			if (a_psize != -1) {
-				if (is_base_size)
-					return 1ul << mmu_psize_defs[size].shift;
-				return 1ul << mmu_psize_defs[a_psize].shift;
-			}
-		}
-
-	}
-	return 0;
-}
-
-static inline unsigned long hpte_page_size(unsigned long h, unsigned long l)
-{
-	return __hpte_page_size(h, l, 0);
-}
-
-static inline unsigned long hpte_base_page_size(unsigned long h, unsigned long l)
-{
-	return __hpte_page_size(h, l, 1);
-}
-
 static inline unsigned long hpte_rpn(unsigned long ptel, unsigned long psize)
 {
 	return ((ptel & HPTE_R_RPN) & ~(psize - 1)) >> PAGE_SHIFT;
diff --git a/arch/powerpc/include/asm/mmu.h b/arch/powerpc/include/asm/mmu.h
index e2fb408..b78e8d3 100644
--- a/arch/powerpc/include/asm/mmu.h
+++ b/arch/powerpc/include/asm/mmu.h
@@ -271,6 +271,7 @@ static inline bool early_radix_enabled(void)
 #define MMU_PAGE_16G	13
 #define MMU_PAGE_64G	14
 
+/* N.B. we need to change the type of hpte_page_sizes if this gets to be > 16 */
 #define MMU_PAGE_COUNT	15
 
 #ifdef CONFIG_PPC_BOOK3S_64
diff --git a/arch/powerpc/mm/hash_native_64.c b/arch/powerpc/mm/hash_native_64.c
index 0e4e965..83ddc0e 100644
--- a/arch/powerpc/mm/hash_native_64.c
+++ b/arch/powerpc/mm/hash_native_64.c
@@ -493,36 +493,6 @@ static void native_hugepage_invalidate(unsigned long vsid,
 }
 #endif
 
-static inline int __hpte_actual_psize(unsigned int lp, int psize)
-{
-	int i, shift;
-	unsigned int mask;
-
-	/* start from 1 ignoring MMU_PAGE_4K */
-	for (i = 1; i < MMU_PAGE_COUNT; i++) {
-
-		/* invalid penc */
-		if (mmu_psize_defs[psize].penc[i] == -1)
-			continue;
-		/*
-		 * encoding bits per actual page size
-		 *        PTE LP     actual page size
-		 *    rrrr rrrz		>=8KB
-		 *    rrrr rrzz		>=16KB
-		 *    rrrr rzzz		>=32KB
-		 *    rrrr zzzz		>=64KB
-		 * .......
-		 */
-		shift = mmu_psize_defs[i].shift - LP_SHIFT;
-		if (shift > LP_BITS)
-			shift = LP_BITS;
-		mask = (1 << shift) - 1;
-		if ((lp & mask) == mmu_psize_defs[psize].penc[i])
-			return i;
-	}
-	return -1;
-}
-
 static void hpte_decode(struct hash_pte *hpte, unsigned long slot,
 			int *psize, int *apsize, int *ssize, unsigned long *vpn)
 {
@@ -538,16 +508,8 @@ static void hpte_decode(struct hash_pte *hpte, unsigned long slot,
 		size   = MMU_PAGE_4K;
 		a_size = MMU_PAGE_4K;
 	} else {
-		for (size = 0; size < MMU_PAGE_COUNT; size++) {
-
-			/* valid entries have a shift value */
-			if (!mmu_psize_defs[size].shift)
-				continue;
-
-			a_size = __hpte_actual_psize(lp, size);
-			if (a_size != -1)
-				break;
-		}
+		size = hpte_page_sizes[lp] & 0xf;
+		a_size = hpte_page_sizes[lp] >> 4;
 	}
 	/* This works for all page sizes, and for 256M and 1T segments */
 	if (cpu_has_feature(CPU_FTR_ARCH_300))
diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 0821556..ef3ae89 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -93,6 +93,9 @@ static unsigned long _SDR1;
 struct mmu_psize_def mmu_psize_defs[MMU_PAGE_COUNT];
 EXPORT_SYMBOL_GPL(mmu_psize_defs);
 
+u8 hpte_page_sizes[1 << LP_BITS];
+EXPORT_SYMBOL_GPL(hpte_page_sizes);
+
 struct hash_pte *htab_address;
 unsigned long htab_size_bytes;
 unsigned long htab_hash_mask;
@@ -564,8 +567,60 @@ static void __init htab_scan_page_sizes(void)
 #endif /* CONFIG_HUGETLB_PAGE */
 }
 
+/*
+ * Fill in the hpte_page_sizes[] array.
+ * We go through the mmu_psize_defs[] array looking for all the
+ * supported base/actual page size combinations.  Each combination
+ * has a unique pagesize encoding (penc) value in the low bits of
+ * the LP field of the HPTE.  For actual page sizes less than 1MB,
+ * some of the upper LP bits are used for RPN bits, meaning that
+ * we need to fill in several entries in hpte_page_sizes[].
+ *
+ * In diagrammatic form, with r = RPN bits and z = page size bits:
+ *        PTE LP     actual page size
+ *    rrrr rrrz		>=8KB
+ *    rrrr rrzz		>=16KB
+ *    rrrr rzzz		>=32KB
+ *    rrrr zzzz		>=64KB
+ *    ...
+ *
+ * The zzzz bits are implementation-specific but are chosen so that
+ * no encoding for a larger page size uses the same value in its
+ * low-order N bits as the encoding for the 2^(12+N) byte page size
+ * (if it exists).
+ */
+static void init_hpte_page_sizes(void)
+{
+	long int ap, bp;
+	long int shift, penc;
+
+	for (bp = 0; bp < MMU_PAGE_COUNT; ++bp) {
+		if (!mmu_psize_defs[bp].shift)
+			continue;	/* not a supported page size */
+		for (ap = bp; ap < MMU_PAGE_COUNT; ++ap) {
+			penc = mmu_psize_defs[bp].penc[ap];
+			if (penc == -1)
+				continue;
+			shift = mmu_psize_defs[ap].shift - LP_SHIFT;
+			if (shift <= 0)
+				continue;	/* should never happen */
+			/*
+			 * For page sizes less than 1MB, this loop
+			 * replicates the entry for all possible values
+			 * of the rrrr bits.
+			 */
+			while (penc < (1 << LP_BITS)) {
+				hpte_page_sizes[penc] = (ap << 4) | bp;
+				penc += 1 << shift;
+			}
+		}
+	}
+}
+
 static void __init htab_init_page_sizes(void)
 {
+	init_hpte_page_sizes();
+
 	if (!debug_pagealloc_enabled()) {
 		/*
 		 * Pick a size for the linear mapping. Currently, we only
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [1/3] powerpc/mm: Don't alias user region to other regions below PAGE_OFFSET
  2016-09-02 11:47 [PATCH 1/3] powerpc/mm: Don't alias user region to other regions below PAGE_OFFSET Paul Mackerras
                   ` (2 preceding siblings ...)
  2016-09-02 12:22 ` [PATCH 1/3] powerpc/mm: Don't alias user region to other regions below PAGE_OFFSET Aneesh Kumar K.V
@ 2016-09-08  9:47 ` Michael Ellerman
  3 siblings, 0 replies; 28+ messages in thread
From: Michael Ellerman @ 2016-09-08  9:47 UTC (permalink / raw)
  To: Paul Mackerras, linuxppc-dev; +Cc: Aneesh Kumar K.V

On Fri, 2016-02-09 at 11:47:59 UTC, Paul Mackerras wrote:
> In commit c60ac5693c47 ("powerpc: Update kernel VSID range", 2013-03-13)
> we lost a check on the region number (the top four bits of the effective
> address) for addresses below PAGE_OFFSET.  That commit replaced a check
> that the top 18 bits were all zero with a check that bits 46 - 59 were
> zero (performed for all addresses, not just user addresses).
> 
> This means that userspace can access an address like 0x1000_0xxx_xxxx_xxxx
> and we will insert a valid SLB entry for it.  The VSID used will be the
> same as if the top 4 bits were 0, but the page size will be some random
> value obtained by indexing beyond the end of the mm_ctx_high_slices_psize
> array in the paca.  If that page size is the same as would be used for
> region 0, then userspace just has an alias of the region 0 space.  If the
> page size is different, then no HPTE will be found for the access, and
> the process will get a SIGSEGV (since hash_page_mm() will refuse to create
> a HPTE for the bogus address).
> 
> The access beyond the end of the mm_ctx_high_slices_psize can be at most
> 5.5MB past the array, and so will be in RAM somewhere.  Since the access
> is a load performed in real mode, it won't fault or crash the kernel.
> At most this bug could perhaps leak a little bit of information about
> blocks of 32 bytes of memory located at offsets of i * 512kB past the
> paca->mm_ctx_high_slices_psize array, for 1 <= i <= 11.
> 
> Cc: stable@vger.kernel.org # v3.10+
> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/f077aaf0754bcba0fffdbd925b

cheers

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 3/3] powerpc/mm: Speed up computation of base and actual page size for a HPTE
  2016-09-07  6:17   ` [PATCH v2 " Paul Mackerras
@ 2016-09-08 10:08       ` Paul Mackerras
  0 siblings, 0 replies; 28+ messages in thread
From: Paul Mackerras @ 2016-09-08 10:08 UTC (permalink / raw)
  To: Michael Ellerman, Paolo Bonzini; +Cc: linuxppc-dev, kvm, kvm-ppc

On Wed, Sep 07, 2016 at 04:17:09PM +1000, Paul Mackerras wrote:
> This replaces a 2-D search through an array with a simple 8-bit table
> lookup for determining the actual and/or base page size for a HPT entry.
> 
> The encoding in the second doubleword of the HPTE is designed to encode
> the actual and base page sizes without using any more bits than would be
> needed for a 4k page number, by using between 1 and 8 low-order bits of
> the RPN (real page number) field to encode the page sizes.  A single
> "large page" bit in the first doubleword indicates that these low-order
> bits are to be interpreted like this.
> 
> We can determine the page sizes by using the low-order 8 bits of the RPN
> to look up a 256-entry table.  For actual page sizes less than 1MB, some
> of the upper bits of these 8 bits are going to be real address bits, but
> we can cope with that by replicating the entries for those smaller page
> sizes.
> 
> While we're at it, let's move the hpte_page_size() and hpte_base_page_size()
> functions from a KVM-specific header to a header for 64-bit HPT systems,
> since this computation doesn't have anything specifically to do with KVM.
> 
> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
> ---
> v2: added more comments as suggested by Aneesh
> 
>  arch/powerpc/include/asm/book3s/64/mmu-hash.h | 37 ++++++++++++
>  arch/powerpc/include/asm/kvm_book3s_64.h      | 87 +++------------------------
>  arch/powerpc/include/asm/mmu.h                |  1 +
>  arch/powerpc/mm/hash_native_64.c              | 42 +------------
>  arch/powerpc/mm/hash_utils_64.c               | 55 +++++++++++++++++

This of course touches two maintainers' areas.  Michael and Paolo, how
do you want to proceed here?  Can this just go through Michael's tree?
Or should I make a topic branch off Linus' tree that you can both
pull, or should I split the patch into two (i.e. everything except the
kvm_book3s_64.h change in the first patch, and the kvm_book3s_64.h
change in the second) and get Michael to put the first one in a topic
branch that I can then pull and apply the second patch onto?

Thanks,
Paul.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 3/3] powerpc/mm: Speed up computation of base and actual page size for a HPTE
  2016-09-08 10:08       ` Paul Mackerras
@ 2016-09-08 10:16         ` Paolo Bonzini
  0 siblings, 0 replies; 28+ messages in thread
From: Paolo Bonzini @ 2016-09-08 10:16 UTC (permalink / raw)
  To: Paul Mackerras, Michael Ellerman; +Cc: linuxppc-dev, kvm, kvm-ppc



On 08/09/2016 12:08, Paul Mackerras wrote:
>> > 
>> >  arch/powerpc/include/asm/book3s/64/mmu-hash.h | 37 ++++++++++++
>> >  arch/powerpc/include/asm/kvm_book3s_64.h      | 87 +++------------------------
>> >  arch/powerpc/include/asm/mmu.h                |  1 +
>> >  arch/powerpc/mm/hash_native_64.c              | 42 +------------
>> >  arch/powerpc/mm/hash_utils_64.c               | 55 +++++++++++++++++
> This of course touches two maintainers' areas.  Michael and Paolo, how
> do you want to proceed here?  Can this just go through Michael's tree?
> Or should I make a topic branch off Linus' tree that you can both
> pull, or should I split the patch into two (i.e. everything except the
> kvm_book3s_64.h change in the first patch, and the kvm_book3s_64.h
> change in the second) and get Michael to put the first one in a topic
> branch that I can then pull and apply the second patch onto?

This patch seems separate from the other two (I can't really tell since
there wasn't a cover letter on linuxppc-dev).  Can you place it in a
pull request for both Michael and myself?

Paolo

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 3/3] powerpc/mm: Speed up computation of base and actual page size for a HPTE
  2016-09-08 10:16         ` Paolo Bonzini
@ 2016-09-12  0:58           ` Paul Mackerras
  0 siblings, 0 replies; 28+ messages in thread
From: Paul Mackerras @ 2016-09-12  0:58 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Michael Ellerman, linuxppc-dev, kvm, kvm-ppc

On Thu, Sep 08, 2016 at 12:16:00PM +0200, Paolo Bonzini wrote:
> 
> 
> On 08/09/2016 12:08, Paul Mackerras wrote:
> >> > 
> >> >  arch/powerpc/include/asm/book3s/64/mmu-hash.h | 37 ++++++++++++
> >> >  arch/powerpc/include/asm/kvm_book3s_64.h      | 87 +++------------------------
> >> >  arch/powerpc/include/asm/mmu.h                |  1 +
> >> >  arch/powerpc/mm/hash_native_64.c              | 42 +------------
> >> >  arch/powerpc/mm/hash_utils_64.c               | 55 +++++++++++++++++
> > This of course touches two maintainers' areas.  Michael and Paolo, how
> > do you want to proceed here?  Can this just go through Michael's tree?
> > Or should I make a topic branch off Linus' tree that you can both
> > pull, or should I split the patch into two (i.e. everything except the
> > kvm_book3s_64.h change in the first patch, and the kvm_book3s_64.h
> > change in the second) and get Michael to put the first one in a topic
> > branch that I can then pull and apply the second patch onto?
> 
> This patch seems separate from the other two (I can't really tell since
> there wasn't a cover letter on linuxppc-dev).  Can you place it in a
> pull request for both Michael and myself?

Yes, it is separate.  I have put it in a new kvm-ppc-infrastructure
branch, which I have merged into my kvm-ppc-next branch (since there
are some other patches on that branch which are prerequisites for some
patches in kvm-ppc-next).  Michael can pull kvm-ppc-infrastructure
when he wants to.  I'll send a pull request for kvm-ppc-next tomorrow
assuming today's linux-next merge doesn't cause any problems.

Paul.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 3/3] powerpc/mm: Speed up computation of base and actual page size for a HPTE
  2016-09-08 10:16         ` Paolo Bonzini
@ 2016-09-12  3:03           ` Michael Ellerman
  0 siblings, 0 replies; 28+ messages in thread
From: Michael Ellerman @ 2016-09-12  3:03 UTC (permalink / raw)
  To: Paolo Bonzini, Paul Mackerras; +Cc: linuxppc-dev, kvm, kvm-ppc

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 08/09/2016 12:08, Paul Mackerras wrote:
>>> > 
>>> >  arch/powerpc/include/asm/book3s/64/mmu-hash.h | 37 ++++++++++++
>>> >  arch/powerpc/include/asm/kvm_book3s_64.h      | 87 +++------------------------
>>> >  arch/powerpc/include/asm/mmu.h                |  1 +
>>> >  arch/powerpc/mm/hash_native_64.c              | 42 +------------
>>> >  arch/powerpc/mm/hash_utils_64.c               | 55 +++++++++++++++++
>> This of course touches two maintainers' areas.  Michael and Paolo, how
>> do you want to proceed here?  Can this just go through Michael's tree?
>> Or should I make a topic branch off Linus' tree that you can both
>> pull, or should I split the patch into two (i.e. everything except the
>> kvm_book3s_64.h change in the first patch, and the kvm_book3s_64.h
>> change in the second) and get Michael to put the first one in a topic
>> branch that I can then pull and apply the second patch onto?
>
> This patch seems separate from the other two (I can't really tell since
> there wasn't a cover letter on linuxppc-dev).

Yeah. I've merged 1/3 as a fix, and will take 2/3 into next.

> Can you place it in a pull request for both Michael and myself?

Paul and I talked about this offline, he's going to create a topic
branch with this in it.

I'll hold off merging it until closer to the merge window, and I'll
merge it then if we are actually seeing conflicts between the PPC & KVM
trees caused by this.

cheers

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 3/3] powerpc/mm: Speed up computation of base and actual page size for a HPTE
  2016-09-12  3:03           ` Michael Ellerman
@ 2016-09-12  9:45             ` Paolo Bonzini
  0 siblings, 0 replies; 28+ messages in thread
From: Paolo Bonzini @ 2016-09-12  9:45 UTC (permalink / raw)
  To: Michael Ellerman, Paul Mackerras; +Cc: linuxppc-dev, kvm, kvm-ppc



On 12/09/2016 05:03, Michael Ellerman wrote:
>> > Can you place it in a pull request for both Michael and myself?
> Paul and I talked about this offline, he's going to create a topic
> branch with this in it.
> 
> I'll hold off merging it until closer to the merge window, and I'll
> merge it then if we are actually seeing conflicts between the PPC & KVM
> trees caused by this.

Sounds like best of both worlds.  Thanks!

Paolo

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [2/3] powerpc/mm: Preserve CFAR value on SLB miss caused by access to bogus address
  2016-09-02 11:49 ` [PATCH 2/3] powerpc/mm: Preserve CFAR value on SLB miss caused by access to bogus address Paul Mackerras
  2016-09-04 11:30   ` Aneesh Kumar K.V
@ 2016-09-13 12:16   ` Michael Ellerman
  1 sibling, 0 replies; 28+ messages in thread
From: Michael Ellerman @ 2016-09-13 12:16 UTC (permalink / raw)
  To: Paul Mackerras, linuxppc-dev

On Fri, 2016-02-09 at 11:49:21 UTC, Paul Mackerras wrote:
> Currently, if userspace or the kernel accesses a completely bogus address,
> for example with any of bits 46-59 set, we first take an SLB miss interrupt,
> install a corresponding SLB entry with VSID 0, retry the instruction, then
> take a DSI/ISI interrupt because there is no HPT entry mapping the address.
> However, by the time of the second interrupt, the Come-From Address Register
> (CFAR) has been overwritten by the rfid instruction at the end of the SLB
> miss interrupt handler.  Since bogus accesses can often be caused by a
> function return after the stack has been overwritten, the CFAR value would
> be very useful as it could indicate which function it was whose return had
> led to the bogus address.
> 
> This patch adds code to create a full exception frame in the SLB miss handler
> in the case of a bogus address, rather than inserting an SLB entry with a
> zero VSID field.  Then we call a new slb_miss_bad_addr() function in C code,
> which delivers a signal for a user access or creates an oops for a kernel
> access.  In the latter case the oops message will show the CFAR value at the
> time of the access.
> 
> In the case of the radix MMU, a segment miss interrupt indicates an access
> outside the ranges mapped by the page tables.  Previously this was handled
> by the code for an unrecoverable SLB miss (one with MSR[RI] = 0), which is
> not really correct.  With this patch, we now handle these interrupts with
> slb_miss_bad_addr(), which is much more consistent.
> 
> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/f0f558b131db0e793fd90aac5d

cheers

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2016-09-13 12:16 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-02 11:47 [PATCH 1/3] powerpc/mm: Don't alias user region to other regions below PAGE_OFFSET Paul Mackerras
2016-09-02 11:49 ` [PATCH 2/3] powerpc/mm: Preserve CFAR value on SLB miss caused by access to bogus address Paul Mackerras
2016-09-04 11:30   ` Aneesh Kumar K.V
2016-09-07  5:52     ` Paul Mackerras
2016-09-13 12:16   ` [2/3] " Michael Ellerman
2016-09-02 11:50 ` [PATCH 3/3] powerpc/mm: Speed up computation of base and actual page size for a HPTE Paul Mackerras
2016-09-02 11:50   ` Paul Mackerras
2016-09-04 11:16   ` Aneesh Kumar K.V
2016-09-04 11:28     ` Aneesh Kumar K.V
2016-09-05  5:04   ` Aneesh Kumar K.V
2016-09-05  5:16     ` Aneesh Kumar K.V
2016-09-07  5:07     ` Paul Mackerras
2016-09-07  5:07       ` Paul Mackerras
2016-09-07  6:17   ` [PATCH v2 " Paul Mackerras
2016-09-08 10:08     ` Paul Mackerras
2016-09-08 10:08       ` Paul Mackerras
2016-09-08 10:16       ` Paolo Bonzini
2016-09-08 10:16         ` Paolo Bonzini
2016-09-12  0:58         ` Paul Mackerras
2016-09-12  0:58           ` Paul Mackerras
2016-09-12  3:03         ` Michael Ellerman
2016-09-12  3:03           ` Michael Ellerman
2016-09-12  9:45           ` Paolo Bonzini
2016-09-12  9:45             ` Paolo Bonzini
2016-09-02 12:22 ` [PATCH 1/3] powerpc/mm: Don't alias user region to other regions below PAGE_OFFSET Aneesh Kumar K.V
2016-09-03  9:54   ` Paul Mackerras
2016-09-04 11:31     ` Aneesh Kumar K.V
2016-09-08  9:47 ` [1/3] " Michael Ellerman
