* [PATCH 01/24] powerpc/mm: Move exception_enter/exit to a do_page_fault wrapper
@ 2017-07-19  4:49 Benjamin Herrenschmidt
  2017-07-19  4:49 ` [PATCH 02/24] powerpc/mm: Pre-filter SRR1 bits before do_page_fault() Benjamin Herrenschmidt
                   ` (23 more replies)
  0 siblings, 24 replies; 45+ messages in thread
From: Benjamin Herrenschmidt @ 2017-07-19  4:49 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aneesh.kumar, Benjamin Herrenschmidt

This will allow simplifying the returns from do_page_fault().
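
For clarity, the resulting shape is a thin wrapper around the renamed
__do_page_fault() (a sketch consolidated from the diff below, not new
code):

	int do_page_fault(struct pt_regs *regs, unsigned long address,
			  unsigned long error_code)
	{
		enum ctx_state prev_state = exception_enter();
		int rc = __do_page_fault(regs, address, error_code);

		exception_exit(prev_state);
		return rc;
	}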

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 arch/powerpc/mm/fault.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 4c422632047b..faddc87d0205 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -195,10 +195,9 @@ static int mm_fault_error(struct pt_regs *regs, unsigned long addr, int fault)
  * The return value is 0 if the fault was handled, or the signal
  * number if this is a kernel fault that can't be handled here.
  */
-int do_page_fault(struct pt_regs *regs, unsigned long address,
-			    unsigned long error_code)
+static int __do_page_fault(struct pt_regs *regs, unsigned long address,
+			   unsigned long error_code)
 {
-	enum ctx_state prev_state = exception_enter();
 	struct vm_area_struct * vma;
 	struct mm_struct *mm = current->mm;
 	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
@@ -523,6 +522,15 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
 	rc = SIGSEGV;
 
 bail:
+	return rc;
+}
+NOKPROBE_SYMBOL(__do_page_fault);
+
+int do_page_fault(struct pt_regs *regs, unsigned long address,
+		  unsigned long error_code)
+{
+	enum ctx_state prev_state = exception_enter();
+	int rc = __do_page_fault(regs, address, error_code);
 	exception_exit(prev_state);
 	return rc;
 }
-- 
2.13.3

* [PATCH 02/24] powerpc/mm: Pre-filter SRR1 bits before do_page_fault()
  2017-07-19  4:49 [PATCH 01/24] powerpc/mm: Move exception_enter/exit to a do_page_fault wrapper Benjamin Herrenschmidt
@ 2017-07-19  4:49 ` Benjamin Herrenschmidt
  2017-07-22 16:43   ` LEROY Christophe
  2017-07-19  4:49 ` [PATCH 03/24] powerpc/6xx: Handle DABR match before calling do_page_fault Benjamin Herrenschmidt
                   ` (22 subsequent siblings)
  23 siblings, 1 reply; 45+ messages in thread
From: Benjamin Herrenschmidt @ 2017-07-19  4:49 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aneesh.kumar, Benjamin Herrenschmidt

By filtering the relevant SRR1 bits in the assembly rather than
in do_page_fault() itself, we avoid a conditional branch (since we
already come from different paths for data and instruction faults).
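
For reference, the 0x4820 immediate is the upper halfword of the
0x48200000 mask that the C code used to apply (see the fault.c hunk
below). A rough decode, using DSISR/SRR1 bit names that a later patch in
this series adds (the macro name here is purely illustrative):

	/* Sketch: SRR1 bits kept for an instruction fault, == 0x48200000 */
	#define ISI_SRR1_FILTER	(DSISR_NOHPTE    | /* 0x40000000: no PTE found     */ \
				 DSISR_PROTFAULT | /* 0x08000000: protection fault  */ \
				 DSISR_KEYFAULT)   /* 0x00200000: key/segment fault */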

This will allow more simplifications later

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 arch/powerpc/kernel/head_32.S  |  2 +-
 arch/powerpc/kernel/head_8xx.S |  4 ++--
 arch/powerpc/mm/fault.c        | 14 ++------------
 3 files changed, 5 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S
index a5301248b098..f11d1cd6e314 100644
--- a/arch/powerpc/kernel/head_32.S
+++ b/arch/powerpc/kernel/head_32.S
@@ -409,7 +409,7 @@ InstructionAccess:
 	mr	r4,r12			/* SRR0 is fault address */
 	bl	hash_page
 1:	mr	r4,r12
-	mr	r5,r9
+	andis.	r5,r9,0x4820		/* Filter relevant SRR1 bits */
 	EXC_XFER_LITE(0x400, handle_page_fault)
 
 /* External interrupt */
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index c032fe8c2d26..da3afa2c1658 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -569,8 +569,8 @@ _ENTRY(DTLBMiss_jmp)
 InstructionTLBError:
 	EXCEPTION_PROLOG
 	mr	r4,r12
-	mr	r5,r9
-	andis.	r10,r5,0x4000
+	andis.	r5,r9,0x4820		/* Filter relevant SRR1 bits */
+	andis.	r10,r9,0x4000
 	beq+	1f
 	tlbie	r4
 itlbie:
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index faddc87d0205..f04bc9f6b134 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -203,23 +203,13 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
 	int code = SEGV_MAPERR;
 	int is_write = 0;
-	int trap = TRAP(regs);
- 	int is_exec = trap == 0x400;
+ 	int is_exec = TRAP(regs) == 0x400;
 	int is_user = user_mode(regs);
 	int fault;
 	int rc = 0, store_update_sp = 0;
 
 #if !(defined(CONFIG_4xx) || defined(CONFIG_BOOKE))
-	/*
-	 * Fortunately the bit assignments in SRR1 for an instruction
-	 * fault and DSISR for a data fault are mostly the same for the
-	 * bits we are interested in.  But there are some bits which
-	 * indicate errors in DSISR but can validly be set in SRR1.
-	 */
-	if (is_exec)
-		error_code &= 0x48200000;
-	else
-		is_write = error_code & DSISR_ISSTORE;
+	is_write = error_code & DSISR_ISSTORE;
 #else
 	is_write = error_code & ESR_DST;
 #endif /* CONFIG_4xx || CONFIG_BOOKE */
-- 
2.13.3

* [PATCH 03/24] powerpc/6xx: Handle DABR match before calling do_page_fault
  2017-07-19  4:49 [PATCH 01/24] powerpc/mm: Move exception_enter/exit to a do_page_fault wrapper Benjamin Herrenschmidt
  2017-07-19  4:49 ` [PATCH 02/24] powerpc/mm: Pre-filter SRR1 bits before do_page_fault() Benjamin Herrenschmidt
@ 2017-07-19  4:49 ` Benjamin Herrenschmidt
  2017-08-03  0:19   ` Michael Ellerman
  2017-07-19  4:49 ` [PATCH 04/24] powerpc/mm: Update definitions of DSISR bits Benjamin Herrenschmidt
                   ` (21 subsequent siblings)
  23 siblings, 1 reply; 45+ messages in thread
From: Benjamin Herrenschmidt @ 2017-07-19  4:49 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aneesh.kumar, Benjamin Herrenschmidt

On legacy 6xx 32-bit processors, we checked for the DABR match bit
in DSISR from do_page_fault(), in the middle of a pile of ifdef's
because all other CPU types do it in assembly prior to calling
do_page_fault. Fix that.
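
In C terms, the check being moved into the assembly glue is the one
removed from fault.c further down, i.e. roughly:

	if (error_code & DSISR_DABRMATCH) {
		do_break(regs, address, error_code);	/* data breakpoint hit */
		return;	/* was "goto bail" */
	}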

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 arch/powerpc/kernel/entry_32.S | 11 +++++++++++
 arch/powerpc/mm/fault.c        |  9 ---------
 2 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index 8587059ad848..7331df033804 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -586,6 +586,8 @@ ppc_swapcontext:
 handle_page_fault:
 	stw	r4,_DAR(r1)
 	addi	r3,r1,STACK_FRAME_OVERHEAD
+	andis.  r0,r5,DSISR_DABRMATCH@h
+	bne-    handle_dabr_fault
 	bl	do_page_fault
 	cmpwi	r3,0
 	beq+	ret_from_except
@@ -599,6 +601,15 @@ handle_page_fault:
 	bl	bad_page_fault
 	b	ret_from_except_full
 
+	/* We have a data breakpoint exception - handle it */
+handle_dabr_fault:
+	SAVE_NVGPRS(r1)
+	lwz	r0,_TRAP(r1)
+	clrrwi	r0,r0,1
+	stw	r0,_TRAP(r1)
+	bl      do_break
+	b	ret_from_except_full
+
 /*
  * This routine switches between two different tasks.  The process
  * state of one is saved on its kernel stack.  Then the state
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index f04bc9f6b134..f257965b54b5 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -242,15 +242,6 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 		goto bail;
 	}
 
-#if !(defined(CONFIG_4xx) || defined(CONFIG_BOOKE) || \
-      defined(CONFIG_PPC_BOOK3S_64) || defined(CONFIG_PPC_8xx))
-  	if (error_code & DSISR_DABRMATCH) {
-		/* breakpoint match */
-		do_break(regs, address, error_code);
-		goto bail;
-	}
-#endif
-
 	/* We restore the interrupt state now */
 	if (!arch_irq_disabled_regs(regs))
 		local_irq_enable();
-- 
2.13.3

* [PATCH 04/24] powerpc/mm: Update definitions of DSISR bits
  2017-07-19  4:49 [PATCH 01/24] powerpc/mm: Move exception_enter/exit to a do_page_fault wrapper Benjamin Herrenschmidt
  2017-07-19  4:49 ` [PATCH 02/24] powerpc/mm: Pre-filter SRR1 bits before do_page_fault() Benjamin Herrenschmidt
  2017-07-19  4:49 ` [PATCH 03/24] powerpc/6xx: Handle DABR match before calling do_page_fault Benjamin Herrenschmidt
@ 2017-07-19  4:49 ` Benjamin Herrenschmidt
  2017-07-19  4:49 ` [PATCH 05/24] powerpc/mm: Update bits used to skip hash_page Benjamin Herrenschmidt
                   ` (20 subsequent siblings)
  23 siblings, 0 replies; 45+ messages in thread
From: Benjamin Herrenschmidt @ 2017-07-19  4:49 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aneesh.kumar, Benjamin Herrenschmidt

This updates the definitions for the various DSISR bits to
match both some historical bits and the new bits added on
POWER9.

In addition, we define some masks corresponding to the "bad"
faults on Book3S, and some masks corresponding to the bits
that match between DSISR and SRR1 for a DSI and an ISI.

This comes with a small code update: DSISR_PGDIRFAULT becomes
DSISR_PRTABLE_FAULT to match architecture 3.0B.
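
As an illustration (not part of this patch), later patches in this series
use the grouped masks along the lines of:

	/* sketch: a "bad" fault gets reported, not hashed or retried */
	if (unlikely(error_code & DSISR_BAD_FAULT_64S))
		return SIGBUS;	/* or raise SIGBUS for a user access */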

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 arch/powerpc/include/asm/reg.h         | 69 +++++++++++++++++++++++++++++-----
 arch/powerpc/kvm/book3s_64_mmu_radix.c |  4 +-
 2 files changed, 61 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index a3b6575c7842..73be2f71dbbb 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -272,16 +272,65 @@
 #define SPRN_DAR	0x013	/* Data Address Register */
 #define SPRN_DBCR	0x136	/* e300 Data Breakpoint Control Reg */
 #define SPRN_DSISR	0x012	/* Data Storage Interrupt Status Register */
-#define   DSISR_NOHPTE		0x40000000	/* no translation found */
-#define   DSISR_PROTFAULT	0x08000000	/* protection fault */
-#define   DSISR_BADACCESS	0x04000000	/* bad access to CI or G */
-#define   DSISR_ISSTORE		0x02000000	/* access was a store */
-#define   DSISR_DABRMATCH	0x00400000	/* hit data breakpoint */
-#define   DSISR_NOSEGMENT	0x00200000	/* SLB miss */
-#define   DSISR_KEYFAULT	0x00200000	/* Key fault */
-#define   DSISR_UNSUPP_MMU	0x00080000	/* Unsupported MMU config */
-#define   DSISR_SET_RC		0x00040000	/* Failed setting of R/C bits */
-#define   DSISR_PGDIRFAULT      0x00020000      /* Fault on page directory */
+#define   DSISR_BAD_DIRECT_ST	0x80000000 /* Obsolete: Direct store error */
+#define   DSISR_NOHPTE		0x40000000 /* no translation found */
+#define   DSISR_ATTR_CONFLICT	0x20000000 /* P9: Process vs. Partition attr */
+#define   DSISR_NOEXEC_OR_G	0x10000000 /* Alias of SRR1 bit, see below */
+#define   DSISR_PROTFAULT	0x08000000 /* protection fault */
+#define   DSISR_BADACCESS	0x04000000 /* bad access to CI or G */
+#define   DSISR_ISSTORE		0x02000000 /* access was a store */
+#define   DSISR_DABRMATCH	0x00400000 /* hit data breakpoint */
+#define   DSISR_NOSEGMENT	0x00200000 /* STAB miss (unsupported) */
+#define   DSISR_KEYFAULT	0x00200000 /* Storage Key fault */
+#define   DSISR_BAD_EXT_CTRL	0x00100000 /* Obsolete: External ctrl error */
+#define   DSISR_UNSUPP_MMU	0x00080000 /* P9: Unsupported MMU config */
+#define   DSISR_SET_RC		0x00040000 /* P9: Failed setting of R/C bits */
+#define   DSISR_PRTABLE_FAULT   0x00020000 /* P9: Fault on process table */
+#define   DSISR_ICSWX_NO_CT     0x00004000 /* P7: icswx unavailable cp type */
+#define   DSISR_BAD_COPYPASTE   0x00000008 /* P9: Copy/Paste on wrong memtype */
+#define   DSISR_BAD_AMO		0x00000004 /* P9: Incorrect AMO opcode */
+#define   DSISR_BAD_CI_LDST	0x00000002 /* P8: Bad HV CI load/store */
+
+/*
+ * DSISR_NOEXEC_OR_G doesn't actually exist. This bit is always
+ * 0 on DSIs. However, on ISIs, the corresponding bit in SRR1
+ * indicates an attempt at executing from a no-execute PTE
+ * or segment or from a guarded page.
+ *
+ * We add a definition here for completeness as we alias
+ * DSISR and SRR1 in do_page_fault.
+ */
+
+/*
+ * DSISR bits that are treated as a fault. Any bit set
+ * here will skip hash_page, and cause do_page_fault to
+ * trigger a SIGBUS or SIGSEGV:
+ */
+#define   DSISR_BAD_FAULT_32S	(DSISR_BAD_DIRECT_ST	| \
+				 DSISR_BADACCESS	| \
+				 DSISR_BAD_EXT_CTRL)
+#define	  DSISR_BAD_FAULT_64S	(DSISR_BAD_FAULT_32S	| \
+				 DSISR_ATTR_CONFLICT	| \
+				 DSISR_KEYFAULT		| \
+				 DSISR_UNSUPP_MMU	| \
+				 DSISR_PRTABLE_FAULT	| \
+				 DSISR_ICSWX_NO_CT	| \
+				 DSISR_BAD_COPYPASTE	| \
+				 DSISR_BAD_AMO		| \
+				 DSISR_BAD_CI_LDST)
+/*
+ * These bits are equivalent in SRR1 and DSISR for 0x400
+ * instruction access interrupts on Book3S
+ */
+#define   DSISR_SRR1_MATCH_32S	(DSISR_NOHPTE		| \
+				 DSISR_NOEXEC_OR_G	| \
+				 DSISR_PROTFAULT)
+#define   DSISR_SRR1_MATCH_64S	(DSISR_SRR1_MATCH_32S	| \
+				 DSISR_KEYFAULT		| \
+				 DSISR_UNSUPP_MMU	| \
+				 DSISR_SET_RC		| \
+				 DSISR_PRTABLE_FAULT)
+
 #define SPRN_TBRL	0x10C	/* Time Base Read Lower Register (user, R/O) */
 #define SPRN_TBRU	0x10D	/* Time Base Read Upper Register (user, R/O) */
 #define SPRN_CIR	0x11B	/* Chip Information Register (hyper, R/0) */
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index f6b3e67c5762..6d677c79eeb1 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -322,13 +322,13 @@ int kvmppc_book3s_radix_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	gpa = vcpu->arch.fault_gpa & ~0xfffUL;
 	gpa &= ~0xF000000000000000ul;
 	gfn = gpa >> PAGE_SHIFT;
-	if (!(dsisr & DSISR_PGDIRFAULT))
+	if (!(dsisr & DSISR_PRTABLE_FAULT))
 		gpa |= ea & 0xfff;
 	memslot = gfn_to_memslot(kvm, gfn);
 
 	/* No memslot means it's an emulated MMIO region */
 	if (!memslot || (memslot->flags & KVM_MEMSLOT_INVALID)) {
-		if (dsisr & (DSISR_PGDIRFAULT | DSISR_BADACCESS |
+		if (dsisr & (DSISR_PRTABLE_FAULT | DSISR_BADACCESS |
 			     DSISR_SET_RC)) {
 			/*
 			 * Bad address in guest page table tree, or other
-- 
2.13.3

* [PATCH 05/24] powerpc/mm: Update bits used to skip hash_page
  2017-07-19  4:49 [PATCH 01/24] powerpc/mm: Move exception_enter/exit to a do_page_fault wrapper Benjamin Herrenschmidt
                   ` (2 preceding siblings ...)
  2017-07-19  4:49 ` [PATCH 04/24] powerpc/mm: Update definitions of DSISR bits Benjamin Herrenschmidt
@ 2017-07-19  4:49 ` Benjamin Herrenschmidt
  2017-07-19  4:49 ` [PATCH 06/24] powerpc/mm: Use symbolic constants for filtering SRR1 bits on ISIs Benjamin Herrenschmidt
                   ` (19 subsequent siblings)
  23 siblings, 0 replies; 45+ messages in thread
From: Benjamin Herrenschmidt @ 2017-07-19  4:49 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aneesh.kumar, Benjamin Herrenschmidt

We test a number of bits from DSISR/SRR1 before deciding
to call hash_page(). If any of these is set, we go directly
to do_page_fault() as the bits indicate a fault that needs
to be handled there (no hashing needed).

This updates the current open-coded masks to use the new
DSISR definitions.

This *does* change the masks actually used in two ways:

 - We used to test various bits that were defined as "always 0"
in the architecture and could be repurposed for something
else. From now on, we just ignore such bits.

 - We were missing some new bits defined on P9
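
One detail worth noting: DSISR_BAD_FAULT_64S now has bits set in both
halfwords (e.g. DSISR_BAD_COPYPASTE and DSISR_BAD_CI_LDST live in the low
halfword), so the 64-bit code below builds the full mask with lis/ori
rather than a single andis. immediate. DSISR_BAD_FAULT_32S only has
upper-halfword bits, so head_32.S can keep the andis. form. In C, both
tests amount to (sketch only):

	if (dsisr & DSISR_BAD_FAULT_64S)	/* DSISR_BAD_FAULT_32S on 32-bit */
		goto handle_page_fault;		/* don't try to insert an HPTE */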

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 arch/powerpc/kernel/exceptions-64s.S | 6 ++++--
 arch/powerpc/kernel/head_32.S        | 2 +-
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index e6d8354d79ef..0e6aedd7066d 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1477,8 +1477,10 @@ USE_TEXT_SECTION()
  */
 	.balign	IFETCH_ALIGN_BYTES
 do_hash_page:
-#ifdef CONFIG_PPC_STD_MMU_64
-	andis.	r0,r4,0xa450		/* weird error? */
+#ifdef CONFIG_PPC_STD_MMU_64
+	lis	r0,DSISR_BAD_FAULT_64S@h
+	ori	r0,r0,DSISR_BAD_FAULT_64S@l
+	and.	r0,r4,r0		/* weird error? */
 	bne-	handle_page_fault	/* if not, try to insert a HPTE */
 	CURRENT_THREAD_INFO(r11, r1)
 	lwz	r0,TI_PREEMPT(r11)	/* If we're in an "NMI" */
diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S
index f11d1cd6e314..86ad3eeadf67 100644
--- a/arch/powerpc/kernel/head_32.S
+++ b/arch/powerpc/kernel/head_32.S
@@ -388,7 +388,7 @@ DataAccess:
 	EXCEPTION_PROLOG
 	mfspr	r10,SPRN_DSISR
 	stw	r10,_DSISR(r11)
-	andis.	r0,r10,0xa470		/* weird error? */
+	andis.	r0,r10,DSISR_BAD_FAULT_32S@h
 	bne	1f			/* if not, try to put a PTE */
 	mfspr	r4,SPRN_DAR		/* into the hash table */
 	rlwinm	r3,r10,32-15,21,21	/* DSISR_STORE -> _PAGE_RW */
-- 
2.13.3

* [PATCH 06/24] powerpc/mm: Use symbolic constants for filtering SRR1 bits on ISIs
  2017-07-19  4:49 [PATCH 01/24] powerpc/mm: Move exception_enter/exit to a do_page_fault wrapper Benjamin Herrenschmidt
                   ` (3 preceding siblings ...)
  2017-07-19  4:49 ` [PATCH 05/24] powerpc/mm: Update bits used to skip hash_page Benjamin Herrenschmidt
@ 2017-07-19  4:49 ` Benjamin Herrenschmidt
  2017-07-19  4:49 ` [PATCH 07/24] powerpc/mm: Move out definition of CPU specific is_write bits Benjamin Herrenschmidt
                   ` (18 subsequent siblings)
  23 siblings, 0 replies; 45+ messages in thread
From: Benjamin Herrenschmidt @ 2017-07-19  4:49 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aneesh.kumar, Benjamin Herrenschmidt

This uses the newly defined constants rather than open-coded
numbers. There is a side effect on 64-bit, which is to pass through
some of the new P9 bits that we didn't pass through before.
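
A note on the @h idiom used below: CONST@h is the upper halfword of the
constant, and andis. ANDs an immediate against the upper halfword of the
source register. Roughly, in C (register names as in the diff, sketch
only):

	/* andis. r5,r9,DSISR_SRR1_MATCH_32S@h */
	r5 = r9 & (DSISR_SRR1_MATCH_32S & 0xffff0000UL);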

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 arch/powerpc/kernel/exceptions-64s.S | 2 +-
 arch/powerpc/kernel/head_32.S        | 4 ++--
 arch/powerpc/kernel/head_8xx.S       | 4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 0e6aedd7066d..3e447e017da2 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -541,7 +541,7 @@ EXC_COMMON_BEGIN(instruction_access_common)
 	RECONCILE_IRQ_STATE(r10, r11)
 	ld	r12,_MSR(r1)
 	ld	r3,_NIP(r1)
-	andis.	r4,r12,0x5820
+	andis.	r4,r12,DSISR_SRR1_MATCH_64S@h
 	li	r5,0x400
 	std	r3,_DAR(r1)
 	std	r4,_DSISR(r1)
diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S
index 86ad3eeadf67..463c78286b66 100644
--- a/arch/powerpc/kernel/head_32.S
+++ b/arch/powerpc/kernel/head_32.S
@@ -403,13 +403,13 @@ DataAccess:
 	DO_KVM  0x400
 InstructionAccess:
 	EXCEPTION_PROLOG
-	andis.	r0,r9,0x4000		/* no pte found? */
+	andis.	r0,r9,SRR1_ISI_NOPT@h	/* no pte found? */
 	beq	1f			/* if so, try to put a PTE */
 	li	r3,0			/* into the hash table */
 	mr	r4,r12			/* SRR0 is fault address */
 	bl	hash_page
 1:	mr	r4,r12
-	andis.	r5,r9,0x4820		/* Filter relevant SRR1 bits */
+	andis.	r5,r9,DSISR_SRR1_MATCH_32S@h /* Filter relevant SRR1 bits */
 	EXC_XFER_LITE(0x400, handle_page_fault)
 
 /* External interrupt */
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index da3afa2c1658..07ddced6bab3 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -569,8 +569,8 @@ _ENTRY(DTLBMiss_jmp)
 InstructionTLBError:
 	EXCEPTION_PROLOG
 	mr	r4,r12
-	andis.	r5,r9,0x4820		/* Filter relevant SRR1 bits */
-	andis.	r10,r9,0x4000
+	andis.	r5,r9,DSISR_SRR1_MATCH_32S@h /* Filter relevant SRR1 bits */
+	andis.	r10,r9,SRR1_ISI_NOPT@h
 	beq+	1f
 	tlbie	r4
 itlbie:
-- 
2.13.3

* [PATCH 07/24] powerpc/mm: Move out definition of CPU specific is_write bits
  2017-07-19  4:49 [PATCH 01/24] powerpc/mm: Move exception_enter/exit to a do_page_fault wrapper Benjamin Herrenschmidt
                   ` (4 preceding siblings ...)
  2017-07-19  4:49 ` [PATCH 06/24] powerpc/mm: Use symbolic constants for filtering SRR1 bits on ISIs Benjamin Herrenschmidt
@ 2017-07-19  4:49 ` Benjamin Herrenschmidt
  2017-07-22 16:40   ` LEROY Christophe
  2017-07-19  4:49 ` [PATCH 08/24] powerpc/mm: Move error_code checks for bad faults earlier Benjamin Herrenschmidt
                   ` (17 subsequent siblings)
  23 siblings, 1 reply; 45+ messages in thread
From: Benjamin Herrenschmidt @ 2017-07-19  4:49 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aneesh.kumar, Benjamin Herrenschmidt

Define a common page_fault_is_write() helper and use it

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 arch/powerpc/mm/fault.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index f257965b54b5..26ec0dd4f419 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -183,6 +183,16 @@ static int mm_fault_error(struct pt_regs *regs, unsigned long addr, int fault)
 }
 
 /*
+ * Define the correct "is_write" bit in error_code based
+ * on the processor family
+ */
+#if (defined(CONFIG_4xx) || defined(CONFIG_BOOKE))
+#define page_fault_is_write(__err)	((__err) & ESR_DST)
+#else
+#define page_fault_is_write(__err)	((__err) & DSISR_ISSTORE)
+#endif
+
+/*
  * For 600- and 800-family processors, the error_code parameter is DSISR
  * for a data fault, SRR1 for an instruction fault. For 400-family processors
  * the error_code parameter is ESR for a data fault, 0 for an instruction
@@ -202,18 +212,12 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 	struct mm_struct *mm = current->mm;
 	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
 	int code = SEGV_MAPERR;
-	int is_write = 0;
  	int is_exec = TRAP(regs) == 0x400;
 	int is_user = user_mode(regs);
+	int is_write = page_fault_is_write(error_code);
 	int fault;
 	int rc = 0, store_update_sp = 0;
 
-#if !(defined(CONFIG_4xx) || defined(CONFIG_BOOKE))
-	is_write = error_code & DSISR_ISSTORE;
-#else
-	is_write = error_code & ESR_DST;
-#endif /* CONFIG_4xx || CONFIG_BOOKE */
-
 #ifdef CONFIG_PPC_ICSWX
 	/*
 	 * we need to do this early because this "data storage
-- 
2.13.3

* [PATCH 08/24] powerpc/mm: Move error_code checks for bad faults earlier
  2017-07-19  4:49 [PATCH 01/24] powerpc/mm: Move exception_enter/exit to a do_page_fault wrapper Benjamin Herrenschmidt
                   ` (5 preceding siblings ...)
  2017-07-19  4:49 ` [PATCH 07/24] powerpc/mm: Move out definition of CPU specific is_write bits Benjamin Herrenschmidt
@ 2017-07-19  4:49 ` Benjamin Herrenschmidt
  2017-07-19  4:49 ` [PATCH 09/24] powerpc/mm: Overhaul handling of bad page faults Benjamin Herrenschmidt
                   ` (16 subsequent siblings)
  23 siblings, 0 replies; 45+ messages in thread
From: Benjamin Herrenschmidt @ 2017-07-19  4:49 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aneesh.kumar, Benjamin Herrenschmidt

There's no point looking for the VMA etc. when we already know
we are going to fail.

This adds some code to set "code" for the si_code but that will
be gone in subsequent patches.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 arch/powerpc/mm/fault.c | 35 ++++++++++++++++++++---------------
 1 file changed, 20 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 26ec0dd4f419..e2f3144a48b9 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -237,6 +237,26 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 	if (unlikely(debugger_fault_handler(regs)))
 		goto bail;
 
+#if defined(CONFIG_6xx)
+	if (error_code & 0x95700000) {
+		/* an error such as lwarx to I/O controller space,
+		   address matching DABR, eciwx, etc. */
+		code = SEGV_ACCERR;
+		goto bad_area_nosemaphore;
+	}
+#endif /* CONFIG_6xx */
+#if defined(CONFIG_8xx)
+        /* The MPC8xx seems to always set 0x80000000, which is
+         * "undefined".  Of those that can be set, this is the only
+         * one which seems bad.
+         */
+	if (error_code & 0x10000000) {
+                /* Guarded storage error. */
+		code = SEGV_ACCERR;
+		goto bad_area_nosemaphore;
+	}
+#endif /* CONFIG_8xx */
+
 	/*
 	 * The kernel should never take an execute fault nor should it
 	 * take a page fault to a kernel address.
@@ -351,21 +371,6 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 
 good_area:
 	code = SEGV_ACCERR;
-#if defined(CONFIG_6xx)
-	if (error_code & 0x95700000)
-		/* an error such as lwarx to I/O controller space,
-		   address matching DABR, eciwx, etc. */
-		goto bad_area;
-#endif /* CONFIG_6xx */
-#if defined(CONFIG_8xx)
-        /* The MPC8xx seems to always set 0x80000000, which is
-         * "undefined".  Of those that can be set, this is the only
-         * one which seems bad.
-         */
-	if (error_code & 0x10000000)
-                /* Guarded storage error. */
-		goto bad_area;
-#endif /* CONFIG_8xx */
 
 	if (is_exec) {
 		/*
-- 
2.13.3

* [PATCH 09/24] powerpc/mm: Overhaul handling of bad page faults
  2017-07-19  4:49 [PATCH 01/24] powerpc/mm: Move exception_enter/exit to a do_page_fault wrapper Benjamin Herrenschmidt
                   ` (6 preceding siblings ...)
  2017-07-19  4:49 ` [PATCH 08/24] powerpc/mm: Move error_code checks for bad faults earlier Benjamin Herrenschmidt
@ 2017-07-19  4:49 ` Benjamin Herrenschmidt
  2017-07-19  4:49 ` [PATCH 10/24] powerpc/mm: Move debugger check to notify_page_fault() Benjamin Herrenschmidt
                   ` (15 subsequent siblings)
  23 siblings, 0 replies; 45+ messages in thread
From: Benjamin Herrenschmidt @ 2017-07-19  4:49 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aneesh.kumar, Benjamin Herrenschmidt

A bad page fault is when the HW signals an error such as a bad
copy/paste, an AMO error, or some other type of error that will
not be fixed by updating the PTE.

Use a page_fault_is_bad() helper to check for bad page faults, thus
removing the per-processor-family open-coding in __do_page_fault(),
and trigger a SIGBUS rather than a SIGSEGV, which is more appropriate.
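
The per-family definitions added below boil down to:

	/* Summary of the new page_fault_is_bad() macro (see the hunk below):
	 *   4xx / BookE  : always 0
	 *   8xx          : (err) & 0x10000000	(guarded storage error)
	 *   CONFIG_PPC64 : (err) & DSISR_BAD_FAULT_64S
	 *   other 32-bit : (err) & DSISR_BAD_FAULT_32S
	 */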

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 arch/powerpc/mm/fault.c | 32 ++++++++++++++------------------
 1 file changed, 14 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index e2f3144a48b9..4470500b4871 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -188,8 +188,16 @@ static int mm_fault_error(struct pt_regs *regs, unsigned long addr, int fault)
  */
 #if (defined(CONFIG_4xx) || defined(CONFIG_BOOKE))
 #define page_fault_is_write(__err)	((__err) & ESR_DST)
+#define page_fault_is_bad(__err)	(0)
 #else
 #define page_fault_is_write(__err)	((__err) & DSISR_ISSTORE)
+#if defined(CONFIG_8xx)
+#define page_fault_is_bad(__err)	((__err) & 0x10000000)
+#elif defined(CONFIG_PPC64)
+#define page_fault_is_bad(__err)	((__err) & DSISR_BAD_FAULT_64S)
+#else
+#define page_fault_is_bad(__err)	((__err) & DSISR_BAD_FAULT_32S)
+#endif
 #endif
 
 /*
@@ -237,25 +245,13 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 	if (unlikely(debugger_fault_handler(regs)))
 		goto bail;
 
-#if defined(CONFIG_6xx)
-	if (error_code & 0x95700000) {
-		/* an error such as lwarx to I/O controller space,
-		   address matching DABR, eciwx, etc. */
-		code = SEGV_ACCERR;
-		goto bad_area_nosemaphore;
-	}
-#endif /* CONFIG_6xx */
-#if defined(CONFIG_8xx)
-        /* The MPC8xx seems to always set 0x80000000, which is
-         * "undefined".  Of those that can be set, this is the only
-         * one which seems bad.
-         */
-	if (error_code & 0x10000000) {
-                /* Guarded storage error. */
-		code = SEGV_ACCERR;
-		goto bad_area_nosemaphore;
+	if (unlikely(page_fault_is_bad(error_code))) {
+		if (is_user)
+			_exception(SIGBUS, regs, BUS_OBJERR, address);
+		else
+			rc = SIGBUS;
+		goto bail;
 	}
-#endif /* CONFIG_8xx */
 
 	/*
 	 * The kernel should never take an execute fault nor should it
-- 
2.13.3

* [PATCH 10/24] powerpc/mm: Move debugger check to notify_page_fault()
  2017-07-19  4:49 [PATCH 01/24] powerpc/mm: Move exception_enter/exit to a do_page_fault wrapper Benjamin Herrenschmidt
                   ` (7 preceding siblings ...)
  2017-07-19  4:49 ` [PATCH 09/24] powerpc/mm: Overhaul handling of bad page faults Benjamin Herrenschmidt
@ 2017-07-19  4:49 ` Benjamin Herrenschmidt
  2017-07-19  4:49 ` [PATCH 11/24] powerpc/mm: Simplify returns from __do_page_fault Benjamin Herrenschmidt
                   ` (14 subsequent siblings)
  23 siblings, 0 replies; 45+ messages in thread
From: Benjamin Herrenschmidt @ 2017-07-19  4:49 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aneesh.kumar, Benjamin Herrenschmidt

This unclutters the main path.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 arch/powerpc/mm/fault.c | 21 ++++++++-------------
 1 file changed, 8 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 4470500b4871..2f825ae68f20 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -47,27 +47,25 @@
 
 #include "icswx.h"
 
-#ifdef CONFIG_KPROBES
-static inline int notify_page_fault(struct pt_regs *regs)
+static inline bool notify_page_fault(struct pt_regs *regs)
 {
-	int ret = 0;
+	bool ret = false;
 
+#ifdef CONFIG_KPROBES
 	/* kprobe_running() needs smp_processor_id() */
 	if (!user_mode(regs)) {
 		preempt_disable();
 		if (kprobe_running() && kprobe_fault_handler(regs, 11))
-			ret = 1;
+			ret = true;
 		preempt_enable();
 	}
+#endif /* CONFIG_KPROBES */
+
+	if (unlikely(debugger_fault_handler(regs)))
+		ret = true;
 
 	return ret;
 }
-#else
-static inline int notify_page_fault(struct pt_regs *regs)
-{
-	return 0;
-}
-#endif
 
 /*
  * Check whether the instruction at regs->nip is a store using
@@ -242,9 +240,6 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 	if (notify_page_fault(regs))
 		goto bail;
 
-	if (unlikely(debugger_fault_handler(regs)))
-		goto bail;
-
 	if (unlikely(page_fault_is_bad(error_code))) {
 		if (is_user)
 			_exception(SIGBUS, regs, BUS_OBJERR, address);
-- 
2.13.3

* [PATCH 11/24] powerpc/mm: Simplify returns from __do_page_fault
  2017-07-19  4:49 [PATCH 01/24] powerpc/mm: Move exception_enter/exit to a do_page_fault wrapper Benjamin Herrenschmidt
                   ` (8 preceding siblings ...)
  2017-07-19  4:49 ` [PATCH 10/24] powerpc/mm: Move debugger check to notify_page_fault() Benjamin Herrenschmidt
@ 2017-07-19  4:49 ` Benjamin Herrenschmidt
  2017-07-19  4:49 ` [PATCH 12/24] powerpc/mm: Fix reporting of kernel execute faults Benjamin Herrenschmidt
                   ` (13 subsequent siblings)
  23 siblings, 0 replies; 45+ messages in thread
From: Benjamin Herrenschmidt @ 2017-07-19  4:49 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aneesh.kumar, Benjamin Herrenschmidt

Now that we have moved the exception state handling to a wrapper, we can
just return directly rather than "goto bail".

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 arch/powerpc/mm/fault.c | 39 ++++++++++++++++-----------------------
 1 file changed, 16 insertions(+), 23 deletions(-)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 2f825ae68f20..e8d6acc888c5 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -233,39 +233,36 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 	if (error_code & ICSWX_DSI_UCT) {
 		rc = acop_handle_fault(regs, address, error_code);
 		if (rc)
-			goto bail;
+			return rc;
 	}
 #endif /* CONFIG_PPC_ICSWX */
 
 	if (notify_page_fault(regs))
-		goto bail;
+		return 0;
 
 	if (unlikely(page_fault_is_bad(error_code))) {
-		if (is_user)
+		if (is_user) {
 			_exception(SIGBUS, regs, BUS_OBJERR, address);
-		else
-			rc = SIGBUS;
-		goto bail;
+			return 0;
+		}
+		return SIGBUS;
 	}
 
 	/*
 	 * The kernel should never take an execute fault nor should it
 	 * take a page fault to a kernel address.
 	 */
-	if (!is_user && (is_exec || (address >= TASK_SIZE))) {
-		rc = SIGSEGV;
-		goto bail;
-	}
+	if (!is_user && (is_exec || (address >= TASK_SIZE)))
+		return SIGSEGV;
 
 	/* We restore the interrupt state now */
 	if (!arch_irq_disabled_regs(regs))
 		local_irq_enable();
 
 	if (faulthandler_disabled() || mm == NULL) {
-		if (!is_user) {
-			rc = SIGSEGV;
-			goto bail;
-		}
+		if (!is_user)
+			return SIGSEGV;
+
 		/* faulthandler_disabled() in user mode is really bad,
 		   as is current->mm == NULL. */
 		printk(KERN_EMERG "Page fault in user mode with "
@@ -454,9 +451,8 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 			goto bad_area_nosemaphore;
 		rc = mm_fault_error(regs, address, fault);
 		if (rc >= MM_FAULT_RETURN)
-			goto bail;
-		else
-			rc = 0;
+			return rc;
+		rc = 0;
 	}
 
 	/*
@@ -483,7 +479,7 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 			      regs, address);
 	}
 
-	goto bail;
+	return rc;
 
 bad_area:
 	up_read(&mm->mmap_sem);
@@ -492,7 +488,7 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 	/* User mode accesses cause a SIGSEGV */
 	if (is_user) {
 		_exception(SIGSEGV, regs, code, address);
-		goto bail;
+		return 0;
 	}
 
 	if (is_exec && (error_code & DSISR_PROTFAULT))
@@ -500,10 +496,7 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 				   " page (%lx) - exploit attempt? (uid: %d)\n",
 				   address, from_kuid(&init_user_ns, current_uid()));
 
-	rc = SIGSEGV;
-
-bail:
-	return rc;
+	return SIGSEGV;
 }
 NOKPROBE_SYMBOL(__do_page_fault);
 
-- 
2.13.3

* [PATCH 12/24] powerpc/mm: Fix reporting of kernel execute faults
  2017-07-19  4:49 [PATCH 01/24] powerpc/mm: Move exception_enter/exit to a do_page_fault wrapper Benjamin Herrenschmidt
                   ` (9 preceding siblings ...)
  2017-07-19  4:49 ` [PATCH 11/24] powerpc/mm: Simplify returns from __do_page_fault Benjamin Herrenschmidt
@ 2017-07-19  4:49 ` Benjamin Herrenschmidt
  2018-11-07  8:35   ` Christophe LEROY
  2017-07-19  4:49 ` [PATCH 13/24] powerpc/mm: Make bad_area* helper functions Benjamin Herrenschmidt
                   ` (12 subsequent siblings)
  23 siblings, 1 reply; 45+ messages in thread
From: Benjamin Herrenschmidt @ 2017-07-19  4:49 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aneesh.kumar, Benjamin Herrenschmidt

We currently test for is_exec and DSISR_PROTFAULT but that doesn't
make sense as this is the wrong error bit to test for an execute
permission failure.

In fact, we had code that would return early if we had an exec
fault in kernel mode so I think that was just dead code anyway.

Finally the location of that test is awkward and prevents further
simplifications.

So move that test, along with the existing early test for kernel
exec faults and out-of-range accesses, into a single
bad_kernel_fault() helper. While at it, test the correct error
bits.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 arch/powerpc/mm/fault.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index e8d6acc888c5..aead07cf8a5b 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -180,6 +180,20 @@ static int mm_fault_error(struct pt_regs *regs, unsigned long addr, int fault)
 	return MM_FAULT_CONTINUE;
 }
 
+/* Is this a bad kernel fault ? */
+static bool bad_kernel_fault(bool is_exec, unsigned long error_code,
+			     unsigned long address)
+{
+	if (is_exec && (error_code & (DSISR_NOEXEC_OR_G | DSISR_KEYFAULT))) {
+		printk_ratelimited(KERN_CRIT "kernel tried to execute"
+				   " exec-protected page (%lx) -"
+				   " exploit attempt? (uid: %d)\n",
+				   address, from_kuid(&init_user_ns,
+						      current_uid()));
+	}
+	return is_exec || (address >= TASK_SIZE);
+}
+
 /*
  * Define the correct "is_write" bit in error_code based
  * on the processor family
@@ -252,7 +266,7 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 	 * The kernel should never take an execute fault nor should it
 	 * take a page fault to a kernel address.
 	 */
-	if (!is_user && (is_exec || (address >= TASK_SIZE)))
+	if (unlikely(!is_user && bad_kernel_fault(is_exec, error_code, address)))
 		return SIGSEGV;
 
 	/* We restore the interrupt state now */
@@ -491,11 +505,6 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 		return 0;
 	}
 
-	if (is_exec && (error_code & DSISR_PROTFAULT))
-		printk_ratelimited(KERN_CRIT "kernel tried to execute NX-protected"
-				   " page (%lx) - exploit attempt? (uid: %d)\n",
-				   address, from_kuid(&init_user_ns, current_uid()));
-
 	return SIGSEGV;
 }
 NOKPROBE_SYMBOL(__do_page_fault);
-- 
2.13.3

* [PATCH 13/24] powerpc/mm: Make bad_area* helper functions
  2017-07-19  4:49 [PATCH 01/24] powerpc/mm: Move exception_enter/exit to a do_page_fault wrapper Benjamin Herrenschmidt
                   ` (10 preceding siblings ...)
  2017-07-19  4:49 ` [PATCH 12/24] powerpc/mm: Fix reporting of kernel execute faults Benjamin Herrenschmidt
@ 2017-07-19  4:49 ` Benjamin Herrenschmidt
  2017-07-19  4:49 ` [PATCH 14/24] powerpc/mm: Rework mm_fault_error() Benjamin Herrenschmidt
                   ` (11 subsequent siblings)
  23 siblings, 0 replies; 45+ messages in thread
From: Benjamin Herrenschmidt @ 2017-07-19  4:49 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aneesh.kumar, Benjamin Herrenschmidt

Instead of using goto labels, call those functions and return.

This gets us closer to x86 and allows us to shrink do_page_fault()
even more.

The main difference with x86 is that those functions return a value
which we then return from do_page_fault(). That value is our
return value from do_page_fault() which we use to generate
kernel faults.
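
For context, the return value is consumed by the assembly glue (see
handle_page_fault in entry_32.S, quoted in patch 3 of this series); in
rough C terms:

	rc = do_page_fault(regs, address, error_code);
	if (rc)			/* kernel fault we could not fix up */
		bad_page_fault(regs, address, rc);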

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 arch/powerpc/mm/fault.c | 78 +++++++++++++++++++++++++++++++------------------
 1 file changed, 50 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index aead07cf8a5b..3903b55fccf8 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -108,6 +108,45 @@ static int store_updates_sp(struct pt_regs *regs)
  * do_page_fault error handling helpers
  */
 
+static int
+__bad_area_nosemaphore(struct pt_regs *regs, unsigned long address, int si_code)
+{
+	/*
+	 * If we are in kernel mode, bail out with a SEGV, this will
+	 * be caught by the assembly which will restore the non-volatile
+	 * registers before calling bad_page_fault()
+	 */
+	if (!user_mode(regs))
+		return SIGSEGV;
+
+	_exception(SIGSEGV, regs, si_code, address);
+
+	return 0;
+}
+
+static noinline int bad_area_nosemaphore(struct pt_regs *regs, unsigned long address)
+{
+	return __bad_area_nosemaphore(regs, address, SEGV_MAPERR);
+}
+
+static int __bad_area(struct pt_regs *regs, unsigned long address, int si_code)
+{
+	struct mm_struct *mm = current->mm;
+
+	/*
+	 * Something tried to access memory that isn't in our memory map..
+	 * Fix it, but check if it's kernel or user first..
+	 */
+	up_read(&mm->mmap_sem);
+
+	return __bad_area_nosemaphore(regs, address, si_code);
+}
+
+static noinline int bad_area(struct pt_regs *regs, unsigned long address)
+{
+	return __bad_area(regs, address, SEGV_MAPERR);
+}
+
 #define MM_FAULT_RETURN		0
 #define MM_FAULT_CONTINUE	-1
 #define MM_FAULT_ERR(sig)	(sig)
@@ -231,7 +270,6 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 	struct vm_area_struct * vma;
 	struct mm_struct *mm = current->mm;
 	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
-	int code = SEGV_MAPERR;
  	int is_exec = TRAP(regs) == 0x400;
 	int is_user = user_mode(regs);
 	int is_write = page_fault_is_write(error_code);
@@ -317,7 +355,7 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 	 */
 	if (!down_read_trylock(&mm->mmap_sem)) {
 		if (!is_user && !search_exception_tables(regs->nip))
-			goto bad_area_nosemaphore;
+			return bad_area_nosemaphore(regs, address);
 
 retry:
 		down_read(&mm->mmap_sem);
@@ -332,11 +370,11 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 
 	vma = find_vma(mm, address);
 	if (!vma)
-		goto bad_area;
+		return bad_area(regs, address);
 	if (vma->vm_start <= address)
 		goto good_area;
 	if (!(vma->vm_flags & VM_GROWSDOWN))
-		goto bad_area;
+		return bad_area(regs, address);
 
 	/*
 	 * N.B. The POWER/Open ABI allows programs to access up to
@@ -351,7 +389,7 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 		/* get user regs even if this fault is in kernel mode */
 		struct pt_regs *uregs = current->thread.regs;
 		if (uregs == NULL)
-			goto bad_area;
+			return bad_area(regs, address);
 
 		/*
 		 * A user-mode access to an address a long way below
@@ -366,14 +404,12 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 		 * expand the stack rather than segfaulting.
 		 */
 		if (address + 2048 < uregs->gpr[1] && !store_update_sp)
-			goto bad_area;
+			return bad_area(regs, address);
 	}
 	if (expand_stack(vma, address))
-		goto bad_area;
+		return bad_area(regs, address);
 
 good_area:
-	code = SEGV_ACCERR;
-
 	if (is_exec) {
 		/*
 		 * Allow execution from readable areas if the MMU does not
@@ -388,16 +424,16 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 		if (!(vma->vm_flags & VM_EXEC) &&
 		    (cpu_has_feature(CPU_FTR_NOEXECUTE) ||
 		     !(vma->vm_flags & (VM_READ | VM_WRITE))))
-			goto bad_area;
+			return bad_area(regs, address);
 	/* a write */
 	} else if (is_write) {
 		if (!(vma->vm_flags & VM_WRITE))
-			goto bad_area;
+			return bad_area(regs, address);
 		flags |= FAULT_FLAG_WRITE;
 	/* a read */
 	} else {
 		if (!(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE)))
-			goto bad_area;
+			return bad_area(regs, address);
 	}
 #ifdef CONFIG_PPC_STD_MMU
 	/*
@@ -462,11 +498,10 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 
 	if (unlikely(fault & (VM_FAULT_RETRY|VM_FAULT_ERROR))) {
 		if (fault & VM_FAULT_SIGSEGV)
-			goto bad_area_nosemaphore;
+			return bad_area_nosemaphore(regs, address);
 		rc = mm_fault_error(regs, address, fault);
 		if (rc >= MM_FAULT_RETURN)
 			return rc;
-		rc = 0;
 	}
 
 	/*
@@ -492,20 +527,7 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 		perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1,
 			      regs, address);
 	}
-
-	return rc;
-
-bad_area:
-	up_read(&mm->mmap_sem);
-
-bad_area_nosemaphore:
-	/* User mode accesses cause a SIGSEGV */
-	if (is_user) {
-		_exception(SIGSEGV, regs, code, address);
-		return 0;
-	}
-
-	return SIGSEGV;
+	return 0;
 }
 NOKPROBE_SYMBOL(__do_page_fault);
 
-- 
2.13.3

* [PATCH 14/24] powerpc/mm: Rework mm_fault_error()
  2017-07-19  4:49 [PATCH 01/24] powerpc/mm: Move exception_enter/exit to a do_page_fault wrapper Benjamin Herrenschmidt
                   ` (11 preceding siblings ...)
  2017-07-19  4:49 ` [PATCH 13/24] powerpc/mm: Make bad_area* helper functions Benjamin Herrenschmidt
@ 2017-07-19  4:49 ` Benjamin Herrenschmidt
  2017-07-19  4:49 ` [PATCH 15/24] powerpc/mm: Move CMO accounting out of do_page_fault into a helper Benjamin Herrenschmidt
                   ` (10 subsequent siblings)
  23 siblings, 0 replies; 45+ messages in thread
From: Benjamin Herrenschmidt @ 2017-07-19  4:49 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aneesh.kumar, Benjamin Herrenschmidt

First, handle the normal retry failure in do_page_fault itself,
since it's a simple return statement. That allows us to remove
the "continue" special return code from mm_fault_error().

Once that's done, we can have an implementation much closer to
x86 where we only call mm_fault_error() if VM_FAULT_ERROR is set
and directly return.
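
The net result is the x86-like structure (sketch of the new call site,
matching the diff below), with VM_FAULT_RETRY handled inline beforehand
and the MM_FAULT_* return codes gone:

	if (unlikely(fault & VM_FAULT_ERROR))
		return mm_fault_error(regs, address, fault);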

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 arch/powerpc/mm/fault.c | 66 +++++++++++++++++++++----------------------------
 1 file changed, 28 insertions(+), 38 deletions(-)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 3903b55fccf8..0b217947d34f 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -147,10 +147,6 @@ static noinline int bad_area(struct pt_regs *regs, unsigned long address)
 	return __bad_area(regs, address, SEGV_MAPERR);
 }
 
-#define MM_FAULT_RETURN		0
-#define MM_FAULT_CONTINUE	-1
-#define MM_FAULT_ERR(sig)	(sig)
-
 static int do_sigbus(struct pt_regs *regs, unsigned long address,
 		     unsigned int fault)
 {
@@ -158,7 +154,7 @@ static int do_sigbus(struct pt_regs *regs, unsigned long address,
 	unsigned int lsb = 0;
 
 	if (!user_mode(regs))
-		return MM_FAULT_ERR(SIGBUS);
+		return SIGBUS;
 
 	current->thread.trap_nr = BUS_ADRERR;
 	info.si_signo = SIGBUS;
@@ -179,25 +175,17 @@ static int do_sigbus(struct pt_regs *regs, unsigned long address,
 #endif
 	info.si_addr_lsb = lsb;
 	force_sig_info(SIGBUS, &info, current);
-	return MM_FAULT_RETURN;
+	return 0;
 }
 
 static int mm_fault_error(struct pt_regs *regs, unsigned long addr, int fault)
 {
 	/*
-	 * Pagefault was interrupted by SIGKILL. We have no reason to
-	 * continue the pagefault.
+	 * Kernel page fault interrupted by SIGKILL. We have no reason to
+	 * continue processing.
 	 */
-	if (fatal_signal_pending(current)) {
-		/* Coming from kernel, we need to deal with uaccess fixups */
-		if (user_mode(regs))
-			return MM_FAULT_RETURN;
-		return MM_FAULT_ERR(SIGKILL);
-	}
-
-	/* No fault: be happy */
-	if (!(fault & VM_FAULT_ERROR))
-		return MM_FAULT_CONTINUE;
+	if (fatal_signal_pending(current) && !user_mode(regs))
+		return SIGKILL;
 
 	/* Out of memory */
 	if (fault & VM_FAULT_OOM) {
@@ -206,17 +194,18 @@ static int mm_fault_error(struct pt_regs *regs, unsigned long addr, int fault)
 		 * made us unable to handle the page fault gracefully.
 		 */
 		if (!user_mode(regs))
-			return MM_FAULT_ERR(SIGKILL);
+			return SIGSEGV;
 		pagefault_out_of_memory();
-		return MM_FAULT_RETURN;
+	} else {
+		if (fault & (VM_FAULT_SIGBUS|VM_FAULT_HWPOISON|
+			     VM_FAULT_HWPOISON_LARGE))
+			return do_sigbus(regs, addr, fault);
+		else if (fault & VM_FAULT_SIGSEGV)
+			return bad_area_nosemaphore(regs, addr);
+		else
+			BUG();
 	}
-
-	if (fault & (VM_FAULT_SIGBUS|VM_FAULT_HWPOISON|VM_FAULT_HWPOISON_LARGE))
-		return do_sigbus(regs, addr, fault);
-
-	/* We don't understand the fault code, this is fatal */
-	BUG();
-	return MM_FAULT_CONTINUE;
+	return 0;
 }
 
 /* Is this a bad kernel fault ? */
@@ -274,7 +263,7 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 	int is_user = user_mode(regs);
 	int is_write = page_fault_is_write(error_code);
 	int fault;
-	int rc = 0, store_update_sp = 0;
+	int store_update_sp = 0;
 
 #ifdef CONFIG_PPC_ICSWX
 	/*
@@ -283,7 +272,7 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 	 * look at it
 	 */
 	if (error_code & ICSWX_DSI_UCT) {
-		rc = acop_handle_fault(regs, address, error_code);
+		int rc = acop_handle_fault(regs, address, error_code);
 		if (rc)
 			return rc;
 	}
@@ -492,18 +481,19 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 			if (!fatal_signal_pending(current))
 				goto retry;
 		}
-		/* We will enter mm_fault_error() below */
-	} else
-		up_read(&current->mm->mmap_sem);
 
-	if (unlikely(fault & (VM_FAULT_RETRY|VM_FAULT_ERROR))) {
-		if (fault & VM_FAULT_SIGSEGV)
-			return bad_area_nosemaphore(regs, address);
-		rc = mm_fault_error(regs, address, fault);
-		if (rc >= MM_FAULT_RETURN)
-			return rc;
+		/*
+		 * User mode? Just return to handle the fatal exception otherwise
+		 * return to bad_page_fault
+		 */
+		return is_user ? 0 : SIGBUS;
 	}
 
+	up_read(&current->mm->mmap_sem);
+
+	if (unlikely(fault & VM_FAULT_ERROR))
+		return mm_fault_error(regs, address, fault);
+
 	/*
 	 * Major/minor page fault accounting.
 	 */
-- 
2.13.3

* [PATCH 15/24] powerpc/mm: Move CMO accounting out of do_page_fault into a helper
  2017-07-19  4:49 [PATCH 01/24] powerpc/mm: Move exception_enter/exit to a do_page_fault wrapper Benjamin Herrenschmidt
                   ` (12 preceding siblings ...)
  2017-07-19  4:49 ` [PATCH 14/24] powerpc/mm: Rework mm_fault_error() Benjamin Herrenschmidt
@ 2017-07-19  4:49 ` Benjamin Herrenschmidt
  2017-07-19  4:49 ` [PATCH 16/24] powerpc/mm: Cosmetic fix to page fault accounting Benjamin Herrenschmidt
                   ` (9 subsequent siblings)
  23 siblings, 0 replies; 45+ messages in thread
From: Benjamin Herrenschmidt @ 2017-07-19  4:49 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aneesh.kumar, Benjamin Herrenschmidt

It makes do_page_fault() more readable. No functional change.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 arch/powerpc/mm/fault.c | 29 ++++++++++++++++++-----------
 1 file changed, 18 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 0b217947d34f..5ccbf30d8aef 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -222,6 +222,23 @@ static bool bad_kernel_fault(bool is_exec, unsigned long error_code,
 	return is_exec || (address >= TASK_SIZE);
 }
 
+#ifdef CONFIG_PPC_SMLPAR
+static inline void cmo_account_page_fault(void)
+{
+	if (firmware_has_feature(FW_FEATURE_CMO)) {
+		u32 page_ins;
+
+		preempt_disable();
+		page_ins = be32_to_cpu(get_lppaca()->page_ins);
+		page_ins += 1 << PAGE_FACTOR;
+		get_lppaca()->page_ins = cpu_to_be32(page_ins);
+		preempt_enable();
+	}
+}
+#else
+static inline void cmo_account_page_fault(void) { }
+#endif /* CONFIG_PPC_SMLPAR */
+
 /*
  * Define the correct "is_write" bit in error_code based
  * on the processor family
@@ -501,17 +518,7 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 		current->maj_flt++;
 		perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1,
 			      regs, address);
-#ifdef CONFIG_PPC_SMLPAR
-		if (firmware_has_feature(FW_FEATURE_CMO)) {
-			u32 page_ins;
-
-			preempt_disable();
-			page_ins = be32_to_cpu(get_lppaca()->page_ins);
-			page_ins += 1 << PAGE_FACTOR;
-			get_lppaca()->page_ins = cpu_to_be32(page_ins);
-			preempt_enable();
-		}
-#endif /* CONFIG_PPC_SMLPAR */
+		cmo_account_page_fault();
 	} else {
 		current->min_flt++;
 		perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1,
-- 
2.13.3

* [PATCH 16/24] powerpc/mm: Cosmetic fix to page fault accounting
  2017-07-19  4:49 [PATCH 01/24] powerpc/mm: Move exception_enter/exit to a do_page_fault wrapper Benjamin Herrenschmidt
                   ` (13 preceding siblings ...)
  2017-07-19  4:49 ` [PATCH 15/24] powerpc/mm: Move CMO accounting out of do_page_fault into a helper Benjamin Herrenschmidt
@ 2017-07-19  4:49 ` Benjamin Herrenschmidt
  2017-07-19  4:49 ` [PATCH 17/24] powerpc/mm: Move the DSISR_PROTFAULT sanity check Benjamin Herrenschmidt
                   ` (8 subsequent siblings)
  23 siblings, 0 replies; 45+ messages in thread
From: Benjamin Herrenschmidt @ 2017-07-19  4:49 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aneesh.kumar, Benjamin Herrenschmidt

No need to break those lines; they aren't that long.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 arch/powerpc/mm/fault.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 5ccbf30d8aef..bd5d668b47ff 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -516,13 +516,11 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 	 */
 	if (fault & VM_FAULT_MAJOR) {
 		current->maj_flt++;
-		perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1,
-			      regs, address);
+		perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1, regs, address);
 		cmo_account_page_fault();
 	} else {
 		current->min_flt++;
-		perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1,
-			      regs, address);
+		perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1, regs, address);
 	}
 	return 0;
 }
-- 
2.13.3

* [PATCH 17/24] powerpc/mm: Move the DSISR_PROTFAULT sanity check
  2017-07-19  4:49 [PATCH 01/24] powerpc/mm: Move exception_enter/exit to a do_page_fault wrapper Benjamin Herrenschmidt
                   ` (14 preceding siblings ...)
  2017-07-19  4:49 ` [PATCH 16/24] powerpc/mm: Cosmetic fix to page fault accounting Benjamin Herrenschmidt
@ 2017-07-19  4:49 ` Benjamin Herrenschmidt
  2017-07-19  4:49 ` [PATCH 18/24] powerpc/mm: Move/simplify faulthandler_disabled() and !mm check Benjamin Herrenschmidt
                   ` (7 subsequent siblings)
  23 siblings, 0 replies; 45+ messages in thread
From: Benjamin Herrenschmidt @ 2017-07-19  4:49 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aneesh.kumar, Benjamin Herrenschmidt

This has a page of comments explaining what's going on right in
the middle of do_page_fault(), which makes things a bit hard to
follow. Move it to a helper instead. Also do the test earlier,
as there's no point waiting until after we have found the VMA.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 arch/powerpc/mm/fault.c | 75 +++++++++++++++++++++++++++----------------------
 1 file changed, 42 insertions(+), 33 deletions(-)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index bd5d668b47ff..6f3a2437008a 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -239,6 +239,45 @@ static inline void cmo_account_page_fault(void)
 static inline void cmo_account_page_fault(void) { }
 #endif /* CONFIG_PPC_SMLPAR */
 
+#ifdef CONFIG_PPC_STD_MMU
+static void sanity_check_fault(bool is_write, unsigned long error_code)
+{
+	/*
+	 * For hash translation mode, we should never get a
+	 * PROTFAULT. Any update to pte to reduce access will result in us
+	 * removing the hash page table entry, thus resulting in a DSISR_NOHPTE
+	 * fault instead of DSISR_PROTFAULT.
+	 *
+	 * A pte update to relax the access will not result in a hash page table
+	 * entry invalidate and hence can result in DSISR_PROTFAULT.
+	 * ptep_set_access_flags() doesn't do a hpte flush. This is why we have
+	 * the special !is_write in the below conditional.
+	 *
+	 * For platforms that doesn't supports coherent icache and do support
+	 * per page noexec bit, we do setup things such that we do the
+	 * sync between D/I cache via fault. But that is handled via low level
+	 * hash fault code (hash_page_do_lazy_icache()) and we should not reach
+	 * here in such case.
+	 *
+	 * For wrong access that can result in PROTFAULT, the above vma->vm_flags
+	 * check should handle those and hence we should fall to the bad_area
+	 * handling correctly.
+	 *
+	 * For embedded with per page exec support that doesn't support coherent
+	 * icache we do get PROTFAULT and we handle that D/I cache sync in
+	 * set_pte_at while taking the noexec/prot fault. Hence this is WARN_ON
+	 * is conditional for server MMU.
+	 *
+	 * For radix, we can get prot fault for autonuma case, because radix
+	 * page table will have them marked noaccess for user.
+	 */
+	if (!radix_enabled() && !is_write)
+		WARN_ON_ONCE(error_code & DSISR_PROTFAULT);
+}
+#else
+static void sanity_check_fault(bool is_write, unsigned long error_code) { }
+#endif /* CONFIG_PPC_STD_MMU */
+
 /*
  * Define the correct "is_write" bit in error_code based
  * on the processor family
@@ -306,6 +345,9 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 		return SIGBUS;
 	}
 
+	/* Additional sanity check(s) */
+	sanity_check_fault(is_write, error_code);
+
 	/*
 	 * The kernel should never take an execute fault nor should it
 	 * take a page fault to a kernel address.
@@ -441,39 +483,6 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 		if (!(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE)))
 			return bad_area(regs, address);
 	}
-#ifdef CONFIG_PPC_STD_MMU
-	/*
-	 * For hash translation mode, we should never get a
-	 * PROTFAULT. Any update to pte to reduce access will result in us
-	 * removing the hash page table entry, thus resulting in a DSISR_NOHPTE
-	 * fault instead of DSISR_PROTFAULT.
-	 *
-	 * A pte update to relax the access will not result in a hash page table
-	 * entry invalidate and hence can result in DSISR_PROTFAULT.
-	 * ptep_set_access_flags() doesn't do a hpte flush. This is why we have
-	 * the special !is_write in the below conditional.
-	 *
-	 * For platforms that doesn't supports coherent icache and do support
-	 * per page noexec bit, we do setup things such that we do the
-	 * sync between D/I cache via fault. But that is handled via low level
-	 * hash fault code (hash_page_do_lazy_icache()) and we should not reach
-	 * here in such case.
-	 *
-	 * For wrong access that can result in PROTFAULT, the above vma->vm_flags
-	 * check should handle those and hence we should fall to the bad_area
-	 * handling correctly.
-	 *
-	 * For embedded with per page exec support that doesn't support coherent
-	 * icache we do get PROTFAULT and we handle that D/I cache sync in
-	 * set_pte_at while taking the noexec/prot fault. Hence this is WARN_ON
-	 * is conditional for server MMU.
-	 *
-	 * For radix, we can get prot fault for autonuma case, because radix
-	 * page table will have them marked noaccess for user.
-	 */
-	if (!radix_enabled() && !is_write)
-		WARN_ON_ONCE(error_code & DSISR_PROTFAULT);
-#endif /* CONFIG_PPC_STD_MMU */
 
 	/*
 	 * If for any reason at all we couldn't handle the fault,
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 18/24] powerpc/mm: Move/simplify faulthandler_disabled() and !mm check
  2017-07-19  4:49 [PATCH 01/24] powerpc/mm: Move exception_enter/exit to a do_page_fault wrapper Benjamin Herrenschmidt
                   ` (15 preceding siblings ...)
  2017-07-19  4:49 ` [PATCH 17/24] powerpc/mm: Move the DSISR_PROTFAULT sanity check Benjamin Herrenschmidt
@ 2017-07-19  4:49 ` Benjamin Herrenschmidt
  2017-07-19  4:49 ` [PATCH 19/24] powerpc/mm: Add a bunch of (un)likely annotations to do_page_fault Benjamin Herrenschmidt
                   ` (6 subsequent siblings)
  23 siblings, 0 replies; 45+ messages in thread
From: Benjamin Herrenschmidt @ 2017-07-19  4:49 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aneesh.kumar, Benjamin Herrenschmidt

Do the check before we re-enable interrupts and clean the code
up a bit.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 arch/powerpc/mm/fault.c | 27 +++++++++++++--------------
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 6f3a2437008a..c5859ffd3a96 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -355,24 +355,23 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 	if (unlikely(!is_user && bad_kernel_fault(is_exec, error_code, address)))
 		return SIGSEGV;
 
+	/*
+	 * If we're in an interrupt, have no user context or are running
+	 * in a region with pagefaults disabled then we must not take the fault
+	 */
+	if (unlikely(faulthandler_disabled() || !mm)) {
+		if (is_user)
+			printk_ratelimited(KERN_ERR "Page fault in user mode"
+					   " with faulthandler_disabled()=%d"
+					   " mm=%p\n",
+					   faulthandler_disabled(), mm);
+		return bad_area_nosemaphore(regs, address);
+	}
+
 	/* We restore the interrupt state now */
 	if (!arch_irq_disabled_regs(regs))
 		local_irq_enable();
 
-	if (faulthandler_disabled() || mm == NULL) {
-		if (!is_user)
-			return SIGSEGV;
-
-		/* faulthandler_disabled() in user mode is really bad,
-		   as is current->mm == NULL. */
-		printk(KERN_EMERG "Page fault in user mode with "
-		       "faulthandler_disabled() = %d mm = %p\n",
-		       faulthandler_disabled(), mm);
-		printk(KERN_EMERG "NIP = %lx  MSR = %lx\n",
-		       regs->nip, regs->msr);
-		die("Weird page fault", regs, SIGSEGV);
-	}
-
 	perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);
 
 	/*
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 19/24] powerpc/mm: Add a bunch of (un)likely annotations to do_page_fault
  2017-07-19  4:49 [PATCH 01/24] powerpc/mm: Move exception_enter/exit to a do_page_fault wrapper Benjamin Herrenschmidt
                   ` (16 preceding siblings ...)
  2017-07-19  4:49 ` [PATCH 18/24] powerpc/mm: Move/simplify faulthandler_disabled() and !mm check Benjamin Herrenschmidt
@ 2017-07-19  4:49 ` Benjamin Herrenschmidt
  2017-07-19  4:49 ` [PATCH 20/24] powerpc/mm: Set fault flags earlier Benjamin Herrenschmidt
                   ` (5 subsequent siblings)
  23 siblings, 0 replies; 45+ messages in thread
From: Benjamin Herrenschmidt @ 2017-07-19  4:49 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aneesh.kumar, Benjamin Herrenschmidt

Mostly for the failure cases

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 arch/powerpc/mm/fault.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index c5859ffd3a96..f1abdc90e330 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -400,7 +400,7 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 	 * source.  If this is invalid we can skip the address space check,
 	 * thus avoiding the deadlock.
 	 */
-	if (!down_read_trylock(&mm->mmap_sem)) {
+	if (unlikely(!down_read_trylock(&mm->mmap_sem))) {
 		if (!is_user && !search_exception_tables(regs->nip))
 			return bad_area_nosemaphore(regs, address);
 
@@ -416,11 +416,11 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 	}
 
 	vma = find_vma(mm, address);
-	if (!vma)
+	if (unlikely(!vma))
 		return bad_area(regs, address);
-	if (vma->vm_start <= address)
+	if (likely(vma->vm_start <= address))
 		goto good_area;
-	if (!(vma->vm_flags & VM_GROWSDOWN))
+	if (unlikely(!(vma->vm_flags & VM_GROWSDOWN)))
 		return bad_area(regs, address);
 
 	/*
@@ -453,7 +453,7 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 		if (address + 2048 < uregs->gpr[1] && !store_update_sp)
 			return bad_area(regs, address);
 	}
-	if (expand_stack(vma, address))
+	if (unlikely(expand_stack(vma, address)))
 		return bad_area(regs, address);
 
 good_area:
@@ -468,18 +468,18 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 		 * below wouldn't be valid on those processors. This -may-
 		 * break programs compiled with a really old ABI though.
 		 */
-		if (!(vma->vm_flags & VM_EXEC) &&
-		    (cpu_has_feature(CPU_FTR_NOEXECUTE) ||
-		     !(vma->vm_flags & (VM_READ | VM_WRITE))))
+		if (unlikely(!(vma->vm_flags & VM_EXEC) &&
+			     (cpu_has_feature(CPU_FTR_NOEXECUTE) ||
+			      !(vma->vm_flags & (VM_READ | VM_WRITE)))))
 			return bad_area(regs, address);
 	/* a write */
 	} else if (is_write) {
-		if (!(vma->vm_flags & VM_WRITE))
+		if (unlikely(!(vma->vm_flags & VM_WRITE)))
 			return bad_area(regs, address);
 		flags |= FAULT_FLAG_WRITE;
 	/* a read */
 	} else {
-		if (!(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE)))
+		if (unlikely(!(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE))))
 			return bad_area(regs, address);
 	}
 
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 20/24] powerpc/mm: Set fault flags earlier
  2017-07-19  4:49 [PATCH 01/24] powerpc/mm: Move exception_enter/exit to a do_page_fault wrapper Benjamin Herrenschmidt
                   ` (17 preceding siblings ...)
  2017-07-19  4:49 ` [PATCH 19/24] powerpc/mm: Add a bunch of (un)likely annotations to do_page_fault Benjamin Herrenschmidt
@ 2017-07-19  4:49 ` Benjamin Herrenschmidt
  2017-07-19  4:49 ` [PATCH 21/24] powerpc/mm: Move page fault VMA access checks to a helper Benjamin Herrenschmidt
                   ` (4 subsequent siblings)
  23 siblings, 0 replies; 45+ messages in thread
From: Benjamin Herrenschmidt @ 2017-07-19  4:49 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aneesh.kumar, Benjamin Herrenschmidt

Move out the code that sets FAULT_FLAG_WRITE so the block that checks
access permissions can be extracted. While at it, also set
FAULT_FLAG_INSTRUCTION, which will be used for protection keys.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 arch/powerpc/mm/fault.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index f1abdc90e330..6a938fc8c5fb 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -384,6 +384,10 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 
 	if (is_user)
 		flags |= FAULT_FLAG_USER;
+	if (is_write)
+		flags |= FAULT_FLAG_WRITE;
+	if (is_exec)
+		flags |= FAULT_FLAG_INSTRUCTION;
 
 	/* When running in the kernel we expect faults to occur only to
 	 * addresses in user space.  All other faults represent errors in the
@@ -476,7 +480,6 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 	} else if (is_write) {
 		if (unlikely(!(vma->vm_flags & VM_WRITE)))
 			return bad_area(regs, address);
-		flags |= FAULT_FLAG_WRITE;
 	/* a read */
 	} else {
 		if (unlikely(!(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE))))
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 21/24] powerpc/mm: Move page fault VMA access checks to a helper
  2017-07-19  4:49 [PATCH 01/24] powerpc/mm: Move exception_enter/exit to a do_page_fault wrapper Benjamin Herrenschmidt
                   ` (18 preceding siblings ...)
  2017-07-19  4:49 ` [PATCH 20/24] powerpc/mm: Set fault flags earlier Benjamin Herrenschmidt
@ 2017-07-19  4:49 ` Benjamin Herrenschmidt
  2017-07-19  4:49 ` [PATCH 22/24] powerpc/mm: Don't lose "major" fault indication on retry Benjamin Herrenschmidt
                   ` (3 subsequent siblings)
  23 siblings, 0 replies; 45+ messages in thread
From: Benjamin Herrenschmidt @ 2017-07-19  4:49 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aneesh.kumar, Benjamin Herrenschmidt

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 arch/powerpc/mm/fault.c | 57 ++++++++++++++++++++++++++++---------------------
 1 file changed, 33 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 6a938fc8c5fb..2291686d24c6 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -222,6 +222,37 @@ static bool bad_kernel_fault(bool is_exec, unsigned long error_code,
 	return is_exec || (address >= TASK_SIZE);
 }
 
+static bool access_error(bool is_write, bool is_exec,
+			 struct vm_area_struct *vma)
+{
+	/*
+	 * Allow execution from readable areas if the MMU does not
+	 * provide separate controls over reading and executing.
+	 *
+	 * Note: That code used to not be enabled for 4xx/BookE.
+	 * It is now as I/D cache coherency for these is done at
+	 * set_pte_at() time and I see no reason why the test
+	 * below wouldn't be valid on those processors. This -may-
+	 * break programs compiled with a really old ABI though.
+	 */
+	if (is_exec) {
+		return !(vma->vm_flags & VM_EXEC) &&
+			(cpu_has_feature(CPU_FTR_NOEXECUTE) ||
+			 !(vma->vm_flags & (VM_READ | VM_WRITE)));
+	}
+
+	if (is_write) {
+		if (unlikely(!(vma->vm_flags & VM_WRITE)))
+			return true;
+		return false;
+	}
+
+	if (unlikely(!(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE))))
+		return true;
+
+	return false;
+}
+
 #ifdef CONFIG_PPC_SMLPAR
 static inline void cmo_account_page_fault(void)
 {
@@ -461,30 +492,8 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 		return bad_area(regs, address);
 
 good_area:
-	if (is_exec) {
-		/*
-		 * Allow execution from readable areas if the MMU does not
-		 * provide separate controls over reading and executing.
-		 *
-		 * Note: That code used to not be enabled for 4xx/BookE.
-		 * It is now as I/D cache coherency for these is done at
-		 * set_pte_at() time and I see no reason why the test
-		 * below wouldn't be valid on those processors. This -may-
-		 * break programs compiled with a really old ABI though.
-		 */
-		if (unlikely(!(vma->vm_flags & VM_EXEC) &&
-			     (cpu_has_feature(CPU_FTR_NOEXECUTE) ||
-			      !(vma->vm_flags & (VM_READ | VM_WRITE)))))
-			return bad_area(regs, address);
-	/* a write */
-	} else if (is_write) {
-		if (unlikely(!(vma->vm_flags & VM_WRITE)))
-			return bad_area(regs, address);
-	/* a read */
-	} else {
-		if (unlikely(!(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE))))
-			return bad_area(regs, address);
-	}
+	if (unlikely(access_error(is_write, is_exec, vma)))
+		return bad_area(regs, address);
 
 	/*
 	 * If for any reason at all we couldn't handle the fault,
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 22/24] powerpc/mm: Don't lose "major" fault indication on retry
  2017-07-19  4:49 [PATCH 01/24] powerpc/mm: Move exception_enter/exit to a do_page_fault wrapper Benjamin Herrenschmidt
                   ` (19 preceding siblings ...)
  2017-07-19  4:49 ` [PATCH 21/24] powerpc/mm: Move page fault VMA access checks to a helper Benjamin Herrenschmidt
@ 2017-07-19  4:49 ` Benjamin Herrenschmidt
  2017-07-19  4:49 ` [PATCH 23/24] powerpc/mm: Cleanup check for stack expansion Benjamin Herrenschmidt
                   ` (2 subsequent siblings)
  23 siblings, 0 replies; 45+ messages in thread
From: Benjamin Herrenschmidt @ 2017-07-19  4:49 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aneesh.kumar, Benjamin Herrenschmidt

If the first iteration returns VM_FAULT_MAJOR but the second
one doesn't, we fail to account the fault as a major fault.

This fixes it and brings the code in line with x86.
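
To illustrate the pattern (a sketch only, not a literal excerpt; the
retry label and the code in between are elided):

	fault = handle_mm_fault(vma, address, flags);
	major |= fault & VM_FAULT_MAJOR;	/* latch it before any retry */

	if (unlikely(fault & VM_FAULT_RETRY)) {
		flags |= FAULT_FLAG_TRIED;
		goto retry;		/* second pass may return a minor fault */
	}
	...
	if (major)			/* not: if (fault & VM_FAULT_MAJOR) */
		current->maj_flt++;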

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 arch/powerpc/mm/fault.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 2291686d24c6..a229fd2d82d6 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -349,7 +349,7 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
  	int is_exec = TRAP(regs) == 0x400;
 	int is_user = user_mode(regs);
 	int is_write = page_fault_is_write(error_code);
-	int fault;
+	int fault, major = 0;
 	int store_update_sp = 0;
 
 #ifdef CONFIG_PPC_ICSWX
@@ -501,6 +501,7 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 	 * the fault.
 	 */
 	fault = handle_mm_fault(vma, address, flags);
+	major |= fault & VM_FAULT_MAJOR;
 
 	/*
 	 * Handle the retry right now, the mmap_sem has been released in that
@@ -534,7 +535,7 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 	/*
 	 * Major/minor page fault accounting.
 	 */
-	if (fault & VM_FAULT_MAJOR) {
+	if (major) {
 		current->maj_flt++;
 		perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1, regs, address);
 		cmo_account_page_fault();
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 23/24] powerpc/mm: Cleanup check for stack expansion
  2017-07-19  4:49 [PATCH 01/24] powerpc/mm: Move exception_enter/exit to a do_page_fault wrapper Benjamin Herrenschmidt
                   ` (20 preceding siblings ...)
  2017-07-19  4:49 ` [PATCH 22/24] powerpc/mm: Don't lose "major" fault indication on retry Benjamin Herrenschmidt
@ 2017-07-19  4:49 ` Benjamin Herrenschmidt
  2017-07-21 16:59   ` LEROY Christophe
  2017-07-19  4:49 ` [PATCH 24/24] powerpc: Remove old unused icswx based coprocessor support Benjamin Herrenschmidt
  2017-08-07 10:41 ` [01/24] powerpc/mm: Move exception_enter/exit to a do_page_fault wrapper Michael Ellerman
  23 siblings, 1 reply; 45+ messages in thread
From: Benjamin Herrenschmidt @ 2017-07-19  4:49 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aneesh.kumar, Benjamin Herrenschmidt

When hitting below a VM_GROWSDOWN vma (typically growing the stack),
we check whether it's a valid stack-growing instruction and we
check the distance to GPR1. This is largely open coded with lots
of comments, so move it out to a helper.

While at it, make store_update_sp a boolean.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 arch/powerpc/mm/fault.c | 84 ++++++++++++++++++++++++++++---------------------
 1 file changed, 48 insertions(+), 36 deletions(-)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index a229fd2d82d6..c2720ebb6a62 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -71,15 +71,15 @@ static inline bool notify_page_fault(struct pt_regs *regs)
  * Check whether the instruction at regs->nip is a store using
  * an update addressing form which will update r1.
  */
-static int store_updates_sp(struct pt_regs *regs)
+static bool store_updates_sp(struct pt_regs *regs)
 {
 	unsigned int inst;
 
 	if (get_user(inst, (unsigned int __user *)regs->nip))
-		return 0;
+		return false;
 	/* check for 1 in the rA field */
 	if (((inst >> 16) & 0x1f) != 1)
-		return 0;
+		return false;
 	/* check major opcode */
 	switch (inst >> 26) {
 	case 37:	/* stwu */
@@ -87,7 +87,7 @@ static int store_updates_sp(struct pt_regs *regs)
 	case 45:	/* sthu */
 	case 53:	/* stfsu */
 	case 55:	/* stfdu */
-		return 1;
+		return true;
 	case 62:	/* std or stdu */
 		return (inst & 3) == 1;
 	case 31:
@@ -99,10 +99,10 @@ static int store_updates_sp(struct pt_regs *regs)
 		case 439:	/* sthux */
 		case 695:	/* stfsux */
 		case 759:	/* stfdux */
-			return 1;
+			return true;
 		}
 	}
-	return 0;
+	return false;
 }
 /*
  * do_page_fault error handling helpers
@@ -222,6 +222,43 @@ static bool bad_kernel_fault(bool is_exec, unsigned long error_code,
 	return is_exec || (address >= TASK_SIZE);
 }
 
+static bool bad_stack_expansion(struct pt_regs *regs, unsigned long address,
+				struct vm_area_struct *vma,
+				bool store_update_sp)
+{
+	/*
+	 * N.B. The POWER/Open ABI allows programs to access up to
+	 * 288 bytes below the stack pointer.
+	 * The kernel signal delivery code writes up to about 1.5kB
+	 * below the stack pointer (r1) before decrementing it.
+	 * The exec code can write slightly over 640kB to the stack
+	 * before setting the user r1.  Thus we allow the stack to
+	 * expand to 1MB without further checks.
+	 */
+	if (address + 0x100000 < vma->vm_end) {
+		/* get user regs even if this fault is in kernel mode */
+		struct pt_regs *uregs = current->thread.regs;
+		if (uregs == NULL)
+			return true;
+
+		/*
+		 * A user-mode access to an address a long way below
+		 * the stack pointer is only valid if the instruction
+		 * is one which would update the stack pointer to the
+		 * address accessed if the instruction completed,
+		 * i.e. either stwu rs,n(r1) or stwux rs,r1,rb
+		 * (or the byte, halfword, float or double forms).
+		 *
+		 * If we don't check this then any write to the area
+		 * between the last mapped region and the stack will
+		 * expand the stack rather than segfaulting.
+		 */
+		if (address + 2048 < uregs->gpr[1] && !store_update_sp)
+			return true;
+	}
+	return false;
+}
+
 static bool access_error(bool is_write, bool is_exec,
 			 struct vm_area_struct *vma)
 {
@@ -350,7 +387,7 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 	int is_user = user_mode(regs);
 	int is_write = page_fault_is_write(error_code);
 	int fault, major = 0;
-	int store_update_sp = 0;
+	bool store_update_sp = false;
 
 #ifdef CONFIG_PPC_ICSWX
 	/*
@@ -458,36 +495,11 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 	if (unlikely(!(vma->vm_flags & VM_GROWSDOWN)))
 		return bad_area(regs, address);
 
-	/*
-	 * N.B. The POWER/Open ABI allows programs to access up to
-	 * 288 bytes below the stack pointer.
-	 * The kernel signal delivery code writes up to about 1.5kB
-	 * below the stack pointer (r1) before decrementing it.
-	 * The exec code can write slightly over 640kB to the stack
-	 * before setting the user r1.  Thus we allow the stack to
-	 * expand to 1MB without further checks.
-	 */
-	if (address + 0x100000 < vma->vm_end) {
-		/* get user regs even if this fault is in kernel mode */
-		struct pt_regs *uregs = current->thread.regs;
-		if (uregs == NULL)
-			return bad_area(regs, address);
+	/* The stack is being expanded, check if it's valid */
+	if (unlikely(bad_stack_expansion(regs, address, vma, store_update_sp)))
+		return bad_area(regs, address);
 
-		/*
-		 * A user-mode access to an address a long way below
-		 * the stack pointer is only valid if the instruction
-		 * is one which would update the stack pointer to the
-		 * address accessed if the instruction completed,
-		 * i.e. either stwu rs,n(r1) or stwux rs,r1,rb
-		 * (or the byte, halfword, float or double forms).
-		 *
-		 * If we don't check this then any write to the area
-		 * between the last mapped region and the stack will
-		 * expand the stack rather than segfaulting.
-		 */
-		if (address + 2048 < uregs->gpr[1] && !store_update_sp)
-			return bad_area(regs, address);
-	}
+	/* Try to expand it */
 	if (unlikely(expand_stack(vma, address)))
 		return bad_area(regs, address);
 
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 24/24] powerpc: Remove old unused icswx based coprocessor support
  2017-07-19  4:49 [PATCH 01/24] powerpc/mm: Move exception_enter/exit to a do_page_fault wrapper Benjamin Herrenschmidt
                   ` (21 preceding siblings ...)
  2017-07-19  4:49 ` [PATCH 23/24] powerpc/mm: Cleanup check for stack expansion Benjamin Herrenschmidt
@ 2017-07-19  4:49 ` Benjamin Herrenschmidt
  2017-08-07 10:41 ` [01/24] powerpc/mm: Move exception_enter/exit to a do_page_fault wrapper Michael Ellerman
  23 siblings, 0 replies; 45+ messages in thread
From: Benjamin Herrenschmidt @ 2017-07-19  4:49 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aneesh.kumar, Benjamin Herrenschmidt

We have a whole pile of unused code to maintain the ACOP register,
allocate coprocessor PIDs and handle ACOP faults. This mechanism
was used for the HFI adapter on POWER7 which is dead and gone and
whose driver never went upstream. It was used on some A2 core based
stuff that also never saw the light of day.

Take out all that code.

There is still some POWER8 coprocessor code that uses icswx but it's
kernel only and thus doesn't use any of that infrastructure.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 arch/powerpc/include/asm/book3s/64/mmu.h |   5 -
 arch/powerpc/include/asm/mmu_context.h   |   6 -
 arch/powerpc/include/asm/reg_booke.h     |   3 -
 arch/powerpc/mm/Makefile                 |   2 -
 arch/powerpc/mm/fault.c                  |  15 --
 arch/powerpc/mm/icswx.c                  | 292 -------------------------------
 arch/powerpc/mm/icswx.h                  |  68 -------
 arch/powerpc/mm/icswx_pid.c              |  87 ---------
 arch/powerpc/mm/mmu_context_book3s64.c   |  18 --
 arch/powerpc/platforms/Kconfig.cputype   |  38 ----
 10 files changed, 534 deletions(-)
 delete mode 100644 arch/powerpc/mm/icswx.c
 delete mode 100644 arch/powerpc/mm/icswx.h
 delete mode 100644 arch/powerpc/mm/icswx_pid.c

diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h
index 5b4023c616f7..1a220cdff923 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -97,11 +97,6 @@ typedef struct {
 #ifdef CONFIG_PPC_SUBPAGE_PROT
 	struct subpage_prot_table spt;
 #endif /* CONFIG_PPC_SUBPAGE_PROT */
-#ifdef CONFIG_PPC_ICSWX
-	struct spinlock *cop_lockp; /* guard acop and cop_pid */
-	unsigned long acop;	/* mask of enabled coprocessor types */
-	unsigned int cop_pid;	/* pid value used with coprocessors */
-#endif /* CONFIG_PPC_ICSWX */
 #ifdef CONFIG_PPC_64K_PAGES
 	/* for 4K PTE fragment support */
 	void *pte_frag;
diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
index 26a587a7f50f..f14aec069d76 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -107,12 +107,6 @@ static inline void switch_mm_irqs_off(struct mm_struct *prev,
 	if (prev == next)
 		return;
 
-#ifdef CONFIG_PPC_ICSWX
-	/* Switch coprocessor context only if prev or next uses a coprocessor */
-	if (prev->context.acop || next->context.acop)
-		switch_cop(next);
-#endif /* CONFIG_PPC_ICSWX */
-
 	/* We must stop all altivec streams before changing the HW
 	 * context
 	 */
diff --git a/arch/powerpc/include/asm/reg_booke.h b/arch/powerpc/include/asm/reg_booke.h
index 737e012ef56e..eb2a33d5df26 100644
--- a/arch/powerpc/include/asm/reg_booke.h
+++ b/arch/powerpc/include/asm/reg_booke.h
@@ -221,10 +221,7 @@
 #define SPRN_CSRR0	SPRN_SRR2 /* Critical Save and Restore Register 0 */
 #define SPRN_CSRR1	SPRN_SRR3 /* Critical Save and Restore Register 1 */
 #endif
-
-#ifdef CONFIG_PPC_ICSWX
 #define SPRN_HACOP	0x15F	/* Hypervisor Available Coprocessor Register */
-#endif
 
 /* Bit definitions for CCR1. */
 #define	CCR1_DPC	0x00000100 /* Disable L1 I-Cache/D-Cache parity checking */
diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
index 7414034df1c3..fa6a6ba34355 100644
--- a/arch/powerpc/mm/Makefile
+++ b/arch/powerpc/mm/Makefile
@@ -22,8 +22,6 @@ ifeq ($(CONFIG_PPC_STD_MMU_64),y)
 obj-$(CONFIG_PPC_4K_PAGES)	+= hash64_4k.o
 obj-$(CONFIG_PPC_64K_PAGES)	+= hash64_64k.o
 endif
-obj-$(CONFIG_PPC_ICSWX)		+= icswx.o
-obj-$(CONFIG_PPC_ICSWX_PID)	+= icswx_pid.o
 obj-$(CONFIG_40x)		+= 40x_mmu.o
 obj-$(CONFIG_44x)		+= 44x_mmu.o
 obj-$(CONFIG_PPC_8xx)		+= 8xx_mmu.o
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index c2720ebb6a62..fbea5446b064 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -45,8 +45,6 @@
 #include <asm/siginfo.h>
 #include <asm/debug.h>
 
-#include "icswx.h"
-
 static inline bool notify_page_fault(struct pt_regs *regs)
 {
 	bool ret = false;
@@ -389,19 +387,6 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 	int fault, major = 0;
 	bool store_update_sp = false;
 
-#ifdef CONFIG_PPC_ICSWX
-	/*
-	 * we need to do this early because this "data storage
-	 * interrupt" does not update the DAR/DEAR so we don't want to
-	 * look at it
-	 */
-	if (error_code & ICSWX_DSI_UCT) {
-		int rc = acop_handle_fault(regs, address, error_code);
-		if (rc)
-			return rc;
-	}
-#endif /* CONFIG_PPC_ICSWX */
-
 	if (notify_page_fault(regs))
 		return 0;
 
diff --git a/arch/powerpc/mm/icswx.c b/arch/powerpc/mm/icswx.c
deleted file mode 100644
index 1fa794d7d59f..000000000000
--- a/arch/powerpc/mm/icswx.c
+++ /dev/null
@@ -1,292 +0,0 @@
-/*
- *  ICSWX and ACOP Management
- *
- *  Copyright (C) 2011 Anton Blanchard, IBM Corp. <anton@samba.org>
- *
- *  This program is free software; you can redistribute it and/or
- *  modify it under the terms of the GNU General Public License
- *  as published by the Free Software Foundation; either version
- *  2 of the License, or (at your option) any later version.
- *
- */
-
-#include <linux/sched.h>
-#include <linux/kernel.h>
-#include <linux/errno.h>
-#include <linux/types.h>
-#include <linux/mm.h>
-#include <linux/spinlock.h>
-#include <linux/module.h>
-#include <linux/uaccess.h>
-
-#include "icswx.h"
-
-/*
- * The processor and its L2 cache cause the icswx instruction to
- * generate a COP_REQ transaction on PowerBus. The transaction has no
- * address, and the processor does not perform an MMU access to
- * authenticate the transaction. The command portion of the PowerBus
- * COP_REQ transaction includes the LPAR_ID (LPID) and the coprocessor
- * Process ID (PID), which the coprocessor compares to the authorized
- * LPID and PID held in the coprocessor, to determine if the process
- * is authorized to generate the transaction.  The data of the COP_REQ
- * transaction is 128-byte or less in size and is placed in cacheable
- * memory on a 128-byte cache line boundary.
- *
- * The task to use a coprocessor should use use_cop() to mark the use
- * of the Coprocessor Type (CT) and context switching. On a server
- * class processor, the PID register is used only for coprocessor
- * management + * and so a coprocessor PID is allocated before
- * executing icswx + * instruction. Drop_cop() is used to free the
- * coprocessor PID.
- *
- * Example:
- * Host Fabric Interface (HFI) is a PowerPC network coprocessor.
- * Each HFI have multiple windows. Each HFI window serves as a
- * network device sending to and receiving from HFI network.
- * HFI immediate send function uses icswx instruction. The immediate
- * send function allows small (single cache-line) packets be sent
- * without using the regular HFI send FIFO and doorbell, which are
- * much slower than immediate send.
- *
- * For each task intending to use HFI immediate send, the HFI driver
- * calls use_cop() to obtain a coprocessor PID for the task.
- * The HFI driver then allocate a free HFI window and save the
- * coprocessor PID to the HFI window to allow the task to use the
- * HFI window.
- *
- * The HFI driver repeatedly creates immediate send packets and
- * issues icswx instruction to send data through the HFI window.
- * The HFI compares the coprocessor PID in the CPU PID register
- * to the PID held in the HFI window to determine if the transaction
- * is allowed.
- *
- * When the task to release the HFI window, the HFI driver calls
- * drop_cop() to release the coprocessor PID.
- */
-
-void switch_cop(struct mm_struct *next)
-{
-#ifdef CONFIG_PPC_ICSWX_PID
-	mtspr(SPRN_PID, next->context.cop_pid);
-#endif
-	mtspr(SPRN_ACOP, next->context.acop);
-}
-
-/**
- * Start using a coprocessor.
- * @acop: mask of coprocessor to be used.
- * @mm: The mm the coprocessor to associate with. Most likely current mm.
- *
- * Return a positive PID if successful. Negative errno otherwise.
- * The returned PID will be fed to the coprocessor to determine if an
- * icswx transaction is authenticated.
- */
-int use_cop(unsigned long acop, struct mm_struct *mm)
-{
-	int ret;
-
-	if (!cpu_has_feature(CPU_FTR_ICSWX))
-		return -ENODEV;
-
-	if (!mm || !acop)
-		return -EINVAL;
-
-	/* The page_table_lock ensures mm_users won't change under us */
-	spin_lock(&mm->page_table_lock);
-	spin_lock(mm->context.cop_lockp);
-
-	ret = get_cop_pid(mm);
-	if (ret < 0)
-		goto out;
-
-	/* update acop */
-	mm->context.acop |= acop;
-
-	sync_cop(mm);
-
-	/*
-	 * If this is a threaded process then there might be other threads
-	 * running. We need to send an IPI to force them to pick up any
-	 * change in PID and ACOP.
-	 */
-	if (atomic_read(&mm->mm_users) > 1)
-		smp_call_function(sync_cop, mm, 1);
-
-out:
-	spin_unlock(mm->context.cop_lockp);
-	spin_unlock(&mm->page_table_lock);
-
-	return ret;
-}
-EXPORT_SYMBOL_GPL(use_cop);
-
-/**
- * Stop using a coprocessor.
- * @acop: mask of coprocessor to be stopped.
- * @mm: The mm the coprocessor associated with.
- */
-void drop_cop(unsigned long acop, struct mm_struct *mm)
-{
-	int free_pid;
-
-	if (!cpu_has_feature(CPU_FTR_ICSWX))
-		return;
-
-	if (WARN_ON_ONCE(!mm))
-		return;
-
-	/* The page_table_lock ensures mm_users won't change under us */
-	spin_lock(&mm->page_table_lock);
-	spin_lock(mm->context.cop_lockp);
-
-	mm->context.acop &= ~acop;
-
-	free_pid = disable_cop_pid(mm);
-	sync_cop(mm);
-
-	/*
-	 * If this is a threaded process then there might be other threads
-	 * running. We need to send an IPI to force them to pick up any
-	 * change in PID and ACOP.
-	 */
-	if (atomic_read(&mm->mm_users) > 1)
-		smp_call_function(sync_cop, mm, 1);
-
-	if (free_pid != COP_PID_NONE)
-		free_cop_pid(free_pid);
-
-	spin_unlock(mm->context.cop_lockp);
-	spin_unlock(&mm->page_table_lock);
-}
-EXPORT_SYMBOL_GPL(drop_cop);
-
-static int acop_use_cop(int ct)
-{
-	/* There is no alternate policy, yet */
-	return -1;
-}
-
-/*
- * Get the instruction word at the NIP
- */
-static u32 acop_get_inst(struct pt_regs *regs)
-{
-	u32 inst;
-	u32 __user *p;
-
-	p = (u32 __user *)regs->nip;
-	if (!access_ok(VERIFY_READ, p, sizeof(*p)))
-		return 0;
-
-	if (__get_user(inst, p))
-		return 0;
-
-	return inst;
-}
-
-/**
- * @regs: registers at time of interrupt
- * @address: storage address
- * @error_code: Fault code, usually the DSISR or ESR depending on
- *		processor type
- *
- * Return 0 if we are able to resolve the data storage fault that
- * results from a CT miss in the ACOP register.
- */
-int acop_handle_fault(struct pt_regs *regs, unsigned long address,
-		      unsigned long error_code)
-{
-	int ct;
-	u32 inst = 0;
-
-	if (!cpu_has_feature(CPU_FTR_ICSWX)) {
-		pr_info("No coprocessors available");
-		_exception(SIGILL, regs, ILL_ILLOPN, address);
-	}
-
-	if (!user_mode(regs)) {
-		/* this could happen if the HV denies the
-		 * kernel access, for now we just die */
-		die("ICSWX from kernel failed", regs, SIGSEGV);
-	}
-
-	/* Some implementations leave us a hint for the CT */
-	ct = ICSWX_GET_CT_HINT(error_code);
-	if (ct < 0) {
-		/* we have to peek at the instruction word to figure out CT */
-		u32 ccw;
-		u32 rs;
-
-		inst = acop_get_inst(regs);
-		if (inst == 0)
-			return -1;
-
-		rs = (inst >> (31 - 10)) & 0x1f;
-		ccw = regs->gpr[rs];
-		ct = (ccw >> 16) & 0x3f;
-	}
-
-	/*
-	 * We could be here because another thread has enabled acop
-	 * but the ACOP register has yet to be updated.
-	 *
-	 * This should have been taken care of by the IPI to sync all
-	 * the threads (see smp_call_function(sync_cop, mm, 1)), but
-	 * that could take forever if there are a significant amount
-	 * of threads.
-	 *
-	 * Given the number of threads on some of these systems,
-	 * perhaps this is the best way to sync ACOP rather than whack
-	 * every thread with an IPI.
-	 */
-	if ((acop_copro_type_bit(ct) & current->active_mm->context.acop) != 0) {
-		sync_cop(current->active_mm);
-		return 0;
-	}
-
-	/* check for alternate policy */
-	if (!acop_use_cop(ct))
-		return 0;
-
-	/* at this point the CT is unknown to the system */
-	pr_warn("%s[%d]: Coprocessor %d is unavailable\n",
-		current->comm, current->pid, ct);
-
-	/* get inst if we don't already have it */
-	if (inst == 0) {
-		inst = acop_get_inst(regs);
-		if (inst == 0)
-			return -1;
-	}
-
-	/* Check if the instruction is the "record form" */
-	if (inst & 1) {
-		/*
-		 * the instruction is "record" form so we can reject
-		 * using CR0
-		 */
-		regs->ccr &= ~(0xful << 28);
-		regs->ccr |= ICSWX_RC_NOT_FOUND << 28;
-
-		/* Move on to the next instruction */
-		regs->nip += 4;
-	} else {
-		/*
-		 * There is no architected mechanism to report a bad
-		 * CT so we could either SIGILL or report nothing.
-		 * Since the non-record version should only bu used
-		 * for "hints" or "don't care" we should probably do
-		 * nothing.  However, I could see how some people
-		 * might want an SIGILL so it here if you want it.
-		 */
-#ifdef CONFIG_PPC_ICSWX_USE_SIGILL
-		_exception(SIGILL, regs, ILL_ILLOPN, address);
-#else
-		regs->nip += 4;
-#endif
-	}
-
-	return 0;
-}
-EXPORT_SYMBOL_GPL(acop_handle_fault);
diff --git a/arch/powerpc/mm/icswx.h b/arch/powerpc/mm/icswx.h
deleted file mode 100644
index 6dedc08e62c8..000000000000
--- a/arch/powerpc/mm/icswx.h
+++ /dev/null
@@ -1,68 +0,0 @@
-#ifndef _ARCH_POWERPC_MM_ICSWX_H_
-#define _ARCH_POWERPC_MM_ICSWX_H_
-
-/*
- *  ICSWX and ACOP Management
- *
- *  Copyright (C) 2011 Anton Blanchard, IBM Corp. <anton@samba.org>
- *
- *  This program is free software; you can redistribute it and/or
- *  modify it under the terms of the GNU General Public License
- *  as published by the Free Software Foundation; either version
- *  2 of the License, or (at your option) any later version.
- *
- */
-
-#include <asm/mmu_context.h>
-
-/* also used to denote that PIDs are not used */
-#define COP_PID_NONE 0
-
-static inline void sync_cop(void *arg)
-{
-	struct mm_struct *mm = arg;
-
-	if (mm == current->active_mm)
-		switch_cop(current->active_mm);
-}
-
-#ifdef CONFIG_PPC_ICSWX_PID
-extern int get_cop_pid(struct mm_struct *mm);
-extern int disable_cop_pid(struct mm_struct *mm);
-extern void free_cop_pid(int free_pid);
-#else
-#define get_cop_pid(m) (COP_PID_NONE)
-#define disable_cop_pid(m) (COP_PID_NONE)
-#define free_cop_pid(p)
-#endif
-
-/*
- * These are implementation bits for architected registers.  If this
- * ever becomes architecture the should be moved to reg.h et. al.
- */
-/* UCT is the same bit for Server and Embedded */
-#define ICSWX_DSI_UCT		0x00004000  /* Unavailable Coprocessor Type */
-
-#ifdef CONFIG_PPC_BOOK3E
-/* Embedded implementation gives us no hints as to what the CT is */
-#define ICSWX_GET_CT_HINT(x) (-1)
-#else
-/* Server implementation contains the CT value in the DSISR */
-#define ICSWX_DSISR_CTMASK	0x00003f00
-#define ICSWX_GET_CT_HINT(x)	(((x) & ICSWX_DSISR_CTMASK) >> 8)
-#endif
-
-#define ICSWX_RC_STARTED	0x8	/* The request has been started */
-#define ICSWX_RC_NOT_IDLE	0x4	/* No coprocessor found idle */
-#define ICSWX_RC_NOT_FOUND	0x2	/* No coprocessor found */
-#define ICSWX_RC_UNDEFINED	0x1	/* Reserved */
-
-extern int acop_handle_fault(struct pt_regs *regs, unsigned long address,
-			     unsigned long error_code);
-
-static inline u64 acop_copro_type_bit(unsigned int type)
-{
-	return 1ULL << (63 - type);
-}
-
-#endif /* !_ARCH_POWERPC_MM_ICSWX_H_ */
diff --git a/arch/powerpc/mm/icswx_pid.c b/arch/powerpc/mm/icswx_pid.c
deleted file mode 100644
index 91e30eb7d054..000000000000
--- a/arch/powerpc/mm/icswx_pid.c
+++ /dev/null
@@ -1,87 +0,0 @@
-/*
- *  ICSWX and ACOP/PID Management
- *
- *  Copyright (C) 2011 Anton Blanchard, IBM Corp. <anton@samba.org>
- *
- *  This program is free software; you can redistribute it and/or
- *  modify it under the terms of the GNU General Public License
- *  as published by the Free Software Foundation; either version
- *  2 of the License, or (at your option) any later version.
- *
- */
-
-#include <linux/sched.h>
-#include <linux/kernel.h>
-#include <linux/errno.h>
-#include <linux/types.h>
-#include <linux/mm.h>
-#include <linux/spinlock.h>
-#include <linux/idr.h>
-#include <linux/module.h>
-#include "icswx.h"
-
-#define COP_PID_MIN (COP_PID_NONE + 1)
-#define COP_PID_MAX (0xFFFF)
-
-static DEFINE_SPINLOCK(mmu_context_acop_lock);
-static DEFINE_IDA(cop_ida);
-
-static int new_cop_pid(struct ida *ida, int min_id, int max_id,
-		       spinlock_t *lock)
-{
-	int index;
-	int err;
-
-again:
-	if (!ida_pre_get(ida, GFP_KERNEL))
-		return -ENOMEM;
-
-	spin_lock(lock);
-	err = ida_get_new_above(ida, min_id, &index);
-	spin_unlock(lock);
-
-	if (err == -EAGAIN)
-		goto again;
-	else if (err)
-		return err;
-
-	if (index > max_id) {
-		spin_lock(lock);
-		ida_remove(ida, index);
-		spin_unlock(lock);
-		return -ENOMEM;
-	}
-
-	return index;
-}
-
-int get_cop_pid(struct mm_struct *mm)
-{
-	int pid;
-
-	if (mm->context.cop_pid == COP_PID_NONE) {
-		pid = new_cop_pid(&cop_ida, COP_PID_MIN, COP_PID_MAX,
-				  &mmu_context_acop_lock);
-		if (pid >= 0)
-			mm->context.cop_pid = pid;
-	}
-	return mm->context.cop_pid;
-}
-
-int disable_cop_pid(struct mm_struct *mm)
-{
-	int free_pid = COP_PID_NONE;
-
-	if ((!mm->context.acop) && (mm->context.cop_pid != COP_PID_NONE)) {
-		free_pid = mm->context.cop_pid;
-		mm->context.cop_pid = COP_PID_NONE;
-	}
-	return free_pid;
-}
-
-void free_cop_pid(int free_pid)
-{
-	spin_lock(&mmu_context_acop_lock);
-	ida_remove(&cop_ida, free_pid);
-	spin_unlock(&mmu_context_acop_lock);
-}
diff --git a/arch/powerpc/mm/mmu_context_book3s64.c b/arch/powerpc/mm/mmu_context_book3s64.c
index 183a67baeccd..5588f3ce647d 100644
--- a/arch/powerpc/mm/mmu_context_book3s64.c
+++ b/arch/powerpc/mm/mmu_context_book3s64.c
@@ -29,8 +29,6 @@
 #include <asm/kvm_book3s_asm.h>
 #endif
 
-#include "icswx.h"
-
 static DEFINE_SPINLOCK(mmu_context_lock);
 static DEFINE_IDA(mmu_context_ida);
 
@@ -169,16 +167,6 @@ int init_new_context(struct task_struct *tsk, struct mm_struct *mm)
 		return index;
 
 	mm->context.id = index;
-#ifdef CONFIG_PPC_ICSWX
-	mm->context.cop_lockp = kmalloc(sizeof(spinlock_t), GFP_KERNEL);
-	if (!mm->context.cop_lockp) {
-		__destroy_context(index);
-		subpage_prot_free(mm);
-		mm->context.id = MMU_NO_CONTEXT;
-		return -ENOMEM;
-	}
-	spin_lock_init(mm->context.cop_lockp);
-#endif /* CONFIG_PPC_ICSWX */
 
 #ifdef CONFIG_PPC_64K_PAGES
 	mm->context.pte_frag = NULL;
@@ -230,12 +218,6 @@ void destroy_context(struct mm_struct *mm)
 #ifdef CONFIG_SPAPR_TCE_IOMMU
 	WARN_ON_ONCE(!list_empty(&mm->context.iommu_group_mem_list));
 #endif
-#ifdef CONFIG_PPC_ICSWX
-	drop_cop(mm->context.acop, mm);
-	kfree(mm->context.cop_lockp);
-	mm->context.cop_lockp = NULL;
-#endif /* CONFIG_PPC_ICSWX */
-
 	if (radix_enabled()) {
 		/*
 		 * Radix doesn't have a valid bit in the process table
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index 2f629e0551e9..9539620a48d4 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -271,44 +271,6 @@ config VSX
 
 	  If in doubt, say Y here.
 
-config PPC_ICSWX
-	bool "Support for PowerPC icswx coprocessor instruction"
-	depends on PPC_BOOK3S_64
-	default n
-	---help---
-
-	  This option enables kernel support for the PowerPC Initiate
-	  Coprocessor Store Word (icswx) coprocessor instruction on POWER7
-	  and POWER8 processors. POWER9 uses new copy/paste instructions
-	  to invoke the coprocessor.
-
-	  This option is only useful if you have a processor that supports
-	  the icswx coprocessor instruction. It does not have any effect
-	  on processors without the icswx coprocessor instruction.
-
-	  This option slightly increases kernel memory usage.
-
-	  If in doubt, say N here.
-
-config PPC_ICSWX_PID
-	bool "icswx requires direct PID management"
-	depends on PPC_ICSWX
-	default y
-	---help---
-	  The PID register in server is used explicitly for ICSWX.  In
-	  embedded systems PID management is done by the system.
-
-config PPC_ICSWX_USE_SIGILL
-	bool "Should a bad CT cause a SIGILL?"
-	depends on PPC_ICSWX
-	default n
-	---help---
-	  Should a bad CT used for "non-record form ICSWX" cause an
-	  illegal instruction signal or should it be silent as
-	  architected.
-
-	  If in doubt, say N here.
-
 config SPE_POSSIBLE
 	def_bool y
 	depends on E200 || (E500 && !PPC_E500MC)
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH 23/24] powerpc/mm: Cleanup check for stack expansion
  2017-07-19  4:49 ` [PATCH 23/24] powerpc/mm: Cleanup check for stack expansion Benjamin Herrenschmidt
@ 2017-07-21 16:59   ` LEROY Christophe
  2017-07-24 10:47     ` Michael Ellerman
  0 siblings, 1 reply; 45+ messages in thread
From: LEROY Christophe @ 2017-07-21 16:59 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: aneesh.kumar, linuxppc-dev

Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> When hitting below a VM_GROWSDOWN vma (typically growing the stack),
> we check whether it's a valid stack-growing instruction and we
> check the distance to GPR1. This is largely open coded with lots
> of comments, so move it out to a helper.

Did you have a look at the following patch? It's been waiting for
application for some weeks now.
https://patchwork.ozlabs.org/patch/771869
It limits the number of calls to get_user().
Can you have a look and merge it with your series?

>
> While at it, make store_update_sp a boolean.

My patch requires a tristate here
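
Roughly, the idea (a sketch only, with a hypothetical helper name, not
the code in the patch linked above): cache the answer so get_user()
runs at most once per fault, and only when the access really lands far
below the stack.

	/* store_update_sp: -1 = instruction not read yet, 0 = no, 1 = yes */
	static bool stack_store_checked(struct pt_regs *regs, int *store_update_sp)
	{
		if (*store_update_sp < 0)
			*store_update_sp = store_updates_sp(regs);
		return *store_update_sp != 0;
	}

The real patch also has to make sure the get_user() done by
store_updates_sp() cannot deadlock on mmap_sem.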

Christophe

>
> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> ---
>  arch/powerpc/mm/fault.c | 84 ++++++++++++++++++++++++++++---------------------
>  1 file changed, 48 insertions(+), 36 deletions(-)
>
> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
> index a229fd2d82d6..c2720ebb6a62 100644
> --- a/arch/powerpc/mm/fault.c
> +++ b/arch/powerpc/mm/fault.c
> @@ -71,15 +71,15 @@ static inline bool notify_page_fault(struct pt_regs *regs)
>   * Check whether the instruction at regs->nip is a store using
>   * an update addressing form which will update r1.
>   */
> -static int store_updates_sp(struct pt_regs *regs)
> +static bool store_updates_sp(struct pt_regs *regs)
>  {
>  	unsigned int inst;
>
>  	if (get_user(inst, (unsigned int __user *)regs->nip))
> -		return 0;
> +		return false;
>  	/* check for 1 in the rA field */
>  	if (((inst >> 16) & 0x1f) != 1)
> -		return 0;
> +		return false;
>  	/* check major opcode */
>  	switch (inst >> 26) {
>  	case 37:	/* stwu */
> @@ -87,7 +87,7 @@ static int store_updates_sp(struct pt_regs *regs)
>  	case 45:	/* sthu */
>  	case 53:	/* stfsu */
>  	case 55:	/* stfdu */
> -		return 1;
> +		return true;
>  	case 62:	/* std or stdu */
>  		return (inst & 3) == 1;
>  	case 31:
> @@ -99,10 +99,10 @@ static int store_updates_sp(struct pt_regs *regs)
>  		case 439:	/* sthux */
>  		case 695:	/* stfsux */
>  		case 759:	/* stfdux */
> -			return 1;
> +			return true;
>  		}
>  	}
> -	return 0;
> +	return false;
>  }
>  /*
>   * do_page_fault error handling helpers
> @@ -222,6 +222,43 @@ static bool bad_kernel_fault(bool is_exec, unsigned long error_code,
>  	return is_exec || (address >= TASK_SIZE);
>  }
>
> +static bool bad_stack_expansion(struct pt_regs *regs, unsigned long address,
> +				struct vm_area_struct *vma,
> +				bool store_update_sp)
> +{
> +	/*
> +	 * N.B. The POWER/Open ABI allows programs to access up to
> +	 * 288 bytes below the stack pointer.
> +	 * The kernel signal delivery code writes up to about 1.5kB
> +	 * below the stack pointer (r1) before decrementing it.
> +	 * The exec code can write slightly over 640kB to the stack
> +	 * before setting the user r1.  Thus we allow the stack to
> +	 * expand to 1MB without further checks.
> +	 */
> +	if (address + 0x100000 < vma->vm_end) {
> +		/* get user regs even if this fault is in kernel mode */
> +		struct pt_regs *uregs = current->thread.regs;
> +		if (uregs == NULL)
> +			return true;
> +
> +		/*
> +		 * A user-mode access to an address a long way below
> +		 * the stack pointer is only valid if the instruction
> +		 * is one which would update the stack pointer to the
> +		 * address accessed if the instruction completed,
> +		 * i.e. either stwu rs,n(r1) or stwux rs,r1,rb
> +		 * (or the byte, halfword, float or double forms).
> +		 *
> +		 * If we don't check this then any write to the area
> +		 * between the last mapped region and the stack will
> +		 * expand the stack rather than segfaulting.
> +		 */
> +		if (address + 2048 < uregs->gpr[1] && !store_update_sp)
> +			return true;
> +	}
> +	return false;
> +}
> +
>  static bool access_error(bool is_write, bool is_exec,
>  			 struct vm_area_struct *vma)
>  {
> @@ -350,7 +387,7 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
>  	int is_user = user_mode(regs);
>  	int is_write = page_fault_is_write(error_code);
>  	int fault, major = 0;
> -	int store_update_sp = 0;
> +	bool store_update_sp = false;
>
>  #ifdef CONFIG_PPC_ICSWX
>  	/*
> @@ -458,36 +495,11 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
>  	if (unlikely(!(vma->vm_flags & VM_GROWSDOWN)))
>  		return bad_area(regs, address);
>
> -	/*
> -	 * N.B. The POWER/Open ABI allows programs to access up to
> -	 * 288 bytes below the stack pointer.
> -	 * The kernel signal delivery code writes up to about 1.5kB
> -	 * below the stack pointer (r1) before decrementing it.
> -	 * The exec code can write slightly over 640kB to the stack
> -	 * before setting the user r1.  Thus we allow the stack to
> -	 * expand to 1MB without further checks.
> -	 */
> -	if (address + 0x100000 < vma->vm_end) {
> -		/* get user regs even if this fault is in kernel mode */
> -		struct pt_regs *uregs = current->thread.regs;
> -		if (uregs == NULL)
> -			return bad_area(regs, address);
> +	/* The stack is being expanded, check if it's valid */
> +	if (unlikely(bad_stack_expansion(regs, address, vma, store_update_sp)))
> +		return bad_area(regs, address);
>
> -		/*
> -		 * A user-mode access to an address a long way below
> -		 * the stack pointer is only valid if the instruction
> -		 * is one which would update the stack pointer to the
> -		 * address accessed if the instruction completed,
> -		 * i.e. either stwu rs,n(r1) or stwux rs,r1,rb
> -		 * (or the byte, halfword, float or double forms).
> -		 *
> -		 * If we don't check this then any write to the area
> -		 * between the last mapped region and the stack will
> -		 * expand the stack rather than segfaulting.
> -		 */
> -		if (address + 2048 < uregs->gpr[1] && !store_update_sp)
> -			return bad_area(regs, address);
> -	}
> +	/* Try to expand it */
>  	if (unlikely(expand_stack(vma, address)))
>  		return bad_area(regs, address);
>
> --
> 2.13.3

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 07/24] powerpc/mm: Move out definition of CPU specific is_write bits
  2017-07-19  4:49 ` [PATCH 07/24] powerpc/mm: Move out definition of CPU specific is_write bits Benjamin Herrenschmidt
@ 2017-07-22 16:40   ` LEROY Christophe
  2017-07-23  1:06     ` Benjamin Herrenschmidt
  2017-07-24 11:58     ` Michael Ellerman
  0 siblings, 2 replies; 45+ messages in thread
From: LEROY Christophe @ 2017-07-22 16:40 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: aneesh.kumar, linuxppc-dev

Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> Define a common page_fault_is_write() helper and use it
>
> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> ---
>  arch/powerpc/mm/fault.c | 18 +++++++++++-------
>  1 file changed, 11 insertions(+), 7 deletions(-)
>
> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
> index f257965b54b5..26ec0dd4f419 100644
> --- a/arch/powerpc/mm/fault.c
> +++ b/arch/powerpc/mm/fault.c
> @@ -183,6 +183,16 @@ static int mm_fault_error(struct pt_regs *regs, unsigned long addr, int fault)
>  }
>
>  /*
> + * Define the correct "is_write" bit in error_code based
> + * on the processor family
> + */
> +#if (defined(CONFIG_4xx) || defined(CONFIG_BOOKE))
> +#define page_fault_is_write(__err)	((__err) & ESR_DST)
> +#else
> +#define page_fault_is_write(__err)	((__err) & DSISR_ISSTORE)
> +#endif

Doesn't the Linux kernel coding style prefer static inline functions
over macros?

> +
> +/*
>   * For 600- and 800-family processors, the error_code parameter is DSISR
>   * for a data fault, SRR1 for an instruction fault. For 400-family processors
>   * the error_code parameter is ESR for a data fault, 0 for an instruction
> @@ -202,18 +212,12 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
>  	struct mm_struct *mm = current->mm;
>  	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
>  	int code = SEGV_MAPERR;
> -	int is_write = 0;
>   	int is_exec = TRAP(regs) == 0x400;
>  	int is_user = user_mode(regs);
> +	int is_write = page_fault_is_write(error_code);
>  	int fault;
>  	int rc = 0, store_update_sp = 0;
>
> -#if !(defined(CONFIG_4xx) || defined(CONFIG_BOOKE))
> -	is_write = error_code & DSISR_ISSTORE;
> -#else
> -	is_write = error_code & ESR_DST;
> -#endif /* CONFIG_4xx || CONFIG_BOOKE */
> -
>  #ifdef CONFIG_PPC_ICSWX
>  	/*
>  	 * we need to do this early because this "data storage
> --
> 2.13.3

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 02/24] powerpc/mm: Pre-filter SRR1 bits before do_page_fault()
  2017-07-19  4:49 ` [PATCH 02/24] powerpc/mm: Pre-filter SRR1 bits before do_page_fault() Benjamin Herrenschmidt
@ 2017-07-22 16:43   ` LEROY Christophe
  2017-07-23  1:10     ` Benjamin Herrenschmidt
  2017-07-24 13:48     ` Michael Ellerman
  0 siblings, 2 replies; 45+ messages in thread
From: LEROY Christophe @ 2017-07-22 16:43 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: aneesh.kumar, linuxppc-dev

Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> By filtering the relevant SRR1 bits in the assembly rather than
> in do_page_fault() itself, we avoid a conditional branch (since we
> already come from different path for data and instruction faults).
>
> This will allow more simplifications later
>
> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> ---
>  arch/powerpc/kernel/head_32.S  |  2 +-
>  arch/powerpc/kernel/head_8xx.S |  4 ++--
>  arch/powerpc/mm/fault.c        | 14 ++------------
>  3 files changed, 5 insertions(+), 15 deletions(-)
>
> diff --git a/arch/powerpc/kernel/head_32.S b/arch/powerpc/kernel/head_32.S
> index a5301248b098..f11d1cd6e314 100644
> --- a/arch/powerpc/kernel/head_32.S
> +++ b/arch/powerpc/kernel/head_32.S
> @@ -409,7 +409,7 @@ InstructionAccess:
>  	mr	r4,r12			/* SRR0 is fault address */
>  	bl	hash_page
>  1:	mr	r4,r12
> -	mr	r5,r9
> +	andis.	r5,r9,0x4820		/* Filter relevant SRR1 bits */
>  	EXC_XFER_LITE(0x400, handle_page_fault)
>
>  /* External interrupt */
> diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
> index c032fe8c2d26..da3afa2c1658 100644
> --- a/arch/powerpc/kernel/head_8xx.S
> +++ b/arch/powerpc/kernel/head_8xx.S
> @@ -569,8 +569,8 @@ _ENTRY(DTLBMiss_jmp)
>  InstructionTLBError:
>  	EXCEPTION_PROLOG
>  	mr	r4,r12
> -	mr	r5,r9
> -	andis.	r10,r5,0x4000
> +	andis.	r5,r9,0x4820		/* Filter relevant SRR1 bits */
> +	andis.	r10,r9,0x4000
>  	beq+	1f
>  	tlbie	r4
>  itlbie:
> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
> index faddc87d0205..f04bc9f6b134 100644
> --- a/arch/powerpc/mm/fault.c
> +++ b/arch/powerpc/mm/fault.c
> @@ -203,23 +203,13 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
>  	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
>  	int code = SEGV_MAPERR;
>  	int is_write = 0;
> -	int trap = TRAP(regs);
> - 	int is_exec = trap == 0x400;
> + 	int is_exec = TRAP(regs) == 0x400;

Don't we have a tab/space issue here?

>  	int is_user = user_mode(regs);
>  	int fault;
>  	int rc = 0, store_update_sp = 0;
>
>  #if !(defined(CONFIG_4xx) || defined(CONFIG_BOOKE))
> -	/*
> -	 * Fortunately the bit assignments in SRR1 for an instruction
> -	 * fault and DSISR for a data fault are mostly the same for the
> -	 * bits we are interested in.  But there are some bits which
> -	 * indicate errors in DSISR but can validly be set in SRR1.
> -	 */
> -	if (is_exec)
> -		error_code &= 0x48200000;
> -	else
> -		is_write = error_code & DSISR_ISSTORE;
> +	is_write = error_code & DSISR_ISSTORE;
>  #else
>  	is_write = error_code & ESR_DST;
>  #endif /* CONFIG_4xx || CONFIG_BOOKE */
> --
> 2.13.3

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 07/24] powerpc/mm: Move out definition of CPU specific is_write bits
  2017-07-22 16:40   ` LEROY Christophe
@ 2017-07-23  1:06     ` Benjamin Herrenschmidt
  2017-07-24 11:58     ` Michael Ellerman
  1 sibling, 0 replies; 45+ messages in thread
From: Benjamin Herrenschmidt @ 2017-07-23  1:06 UTC (permalink / raw)
  To: LEROY Christophe; +Cc: aneesh.kumar, linuxppc-dev

On Sat, 2017-07-22 at 18:40 +0200, LEROY Christophe wrote:
> Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
> 
> > Define a common page_fault_is_write() helper and use it
> > 
> > Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> > ---
> >  arch/powerpc/mm/fault.c | 18 +++++++++++-------
> >  1 file changed, 11 insertions(+), 7 deletions(-)
> > 
> > diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
> > index f257965b54b5..26ec0dd4f419 100644
> > --- a/arch/powerpc/mm/fault.c
> > +++ b/arch/powerpc/mm/fault.c
> > @@ -183,6 +183,16 @@ static int mm_fault_error(struct pt_regs *regs,  
> > unsigned long addr, int fault)
> >  }
> > 
> >  /*
> > + * Define the correct "is_write" bit in error_code based
> > + * on the processor family
> > + */
> > +#if (defined(CONFIG_4xx) || defined(CONFIG_BOOKE))
> > +#define page_fault_is_write(__err)	((__err) & ESR_DST)
> > +#else
> > +#define page_fault_is_write(__err)	((__err) & DSISR_ISSTORE)
> > +#endif
> 
> Doesn't linux kernel codying style make preference to static inline  
> functions instead of macros ?

It doesn't really matter; yes, it could be done that way, but then I
would have to add a !! to make them proper bools, unless I keep things
as int, etc. I'm not sure how well gcc will do with it.
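
For reference, the static inline variant being discussed would look
something like this (an untested sketch, using the same bits as the
macro above):

	static inline bool page_fault_is_write(unsigned long error_code)
	{
	#if defined(CONFIG_4xx) || defined(CONFIG_BOOKE)
		return !!(error_code & ESR_DST);
	#else
		return !!(error_code & DSISR_ISSTORE);
	#endif
	}
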
> 
> > +
> > +/*
> >   * For 600- and 800-family processors, the error_code parameter is DSISR
> >   * for a data fault, SRR1 for an instruction fault. For 400-family  
> > processors
> >   * the error_code parameter is ESR for a data fault, 0 for an instruction
> > @@ -202,18 +212,12 @@ static int __do_page_fault(struct pt_regs  
> > *regs, unsigned long address,
> >  	struct mm_struct *mm = current->mm;
> >  	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
> >  	int code = SEGV_MAPERR;
> > -	int is_write = 0;
> >   	int is_exec = TRAP(regs) == 0x400;
> >  	int is_user = user_mode(regs);
> > +	int is_write = page_fault_is_write(error_code);
> >  	int fault;
> >  	int rc = 0, store_update_sp = 0;
> > 
> > -#if !(defined(CONFIG_4xx) || defined(CONFIG_BOOKE))
> > -	is_write = error_code & DSISR_ISSTORE;
> > -#else
> > -	is_write = error_code & ESR_DST;
> > -#endif /* CONFIG_4xx || CONFIG_BOOKE */
> > -
> >  #ifdef CONFIG_PPC_ICSWX
> >  	/*
> >  	 * we need to do this early because this "data storage
> > --
> > 2.13.3
> 
> 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 02/24] powerpc/mm: Pre-filter SRR1 bits before do_page_fault()
  2017-07-22 16:43   ` LEROY Christophe
@ 2017-07-23  1:10     ` Benjamin Herrenschmidt
  2017-07-24 13:48     ` Michael Ellerman
  1 sibling, 0 replies; 45+ messages in thread
From: Benjamin Herrenschmidt @ 2017-07-23  1:10 UTC (permalink / raw)
  To: LEROY Christophe; +Cc: aneesh.kumar, linuxppc-dev

On Sat, 2017-07-22 at 18:43 +0200, LEROY Christophe wrote:
> 
> > @@ -203,23 +203,13 @@ static int __do_page_fault(struct pt_regs  
> > *regs, unsigned long address,
> >  	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
> >  	int code = SEGV_MAPERR;
> >  	int is_write = 0;
> > -	int trap = TRAP(regs);
> > - 	int is_exec = trap == 0x400;
> > + 	int is_exec = TRAP(regs) == 0x400;
> 
> Don't we have a tab/space issue here ?

There does indeed seem to be an extra space before the tab at the beginning
of the line, though it looks like it was already there in the original
code :-) I didn't notice it and thus didn't fix it. If I respin I'll
take care of it.

> >  	int is_user = user_mode(regs);
> >  	int fault;
> >  	int rc = 0, store_update_sp = 0;
> > 
> >  #if !(defined(CONFIG_4xx) || defined(CONFIG_BOOKE))
> > -	/*
> > -	 * Fortunately the bit assignments in SRR1 for an instruction
> > -	 * fault and DSISR for a data fault are mostly the same for the
> > -	 * bits we are interested in.  But there are some bits which
> > -	 * indicate errors in DSISR but can validly be set in SRR1.
> > -	 */
> > -	if (is_exec)
> > -		error_code &= 0x48200000;
> > -	else
> > -		is_write = error_code & DSISR_ISSTORE;
> > +	is_write = error_code & DSISR_ISSTORE;
> >  #else
> >  	is_write = error_code & ESR_DST;
> >  #endif /* CONFIG_4xx || CONFIG_BOOKE */
> > --
> > 2.13.3
> 
> 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 23/24] powerpc/mm: Cleanup check for stack expansion
  2017-07-21 16:59   ` LEROY Christophe
@ 2017-07-24 10:47     ` Michael Ellerman
  2017-07-24 17:34       ` LEROY Christophe
  0 siblings, 1 reply; 45+ messages in thread
From: Michael Ellerman @ 2017-07-24 10:47 UTC (permalink / raw)
  To: LEROY Christophe, Benjamin Herrenschmidt; +Cc: linuxppc-dev, aneesh.kumar

LEROY Christophe <christophe.leroy@c-s.fr> writes:

> Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
>
>> When hitting below a VM_GROWSDOWN vma (typically growing the stack),
>> we check whether it's a valid stack-growing instruction and we
>> check the distance to GPR1. This is largely open coded with lots
>> of comments, so move it out to a helper.
>
> Did you have a look at the following patch ? It's been waiting for
> application for some weeks now.
> https://patchwork.ozlabs.org/patch/771869

I actually merged it last merge window, but found I had no good way to
test it, so I took it out again until I can write a test case for it.

The way I realised it wasn't being tested was by removing all the
store_updates_sp logic entirely and having my system run happily for
several days :}
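
For context, the check in question is roughly of this shape (a
simplified sketch from memory, not the exact fault.c code, with uregs
being the user-mode pt_regs and store_update_sp the result of the early
get_user() + instruction decode being discussed):

	if (address + 0x100000 < vma->vm_end) {
		/*
		 * An access a long way below the stack pointer is only
		 * valid if the faulting instruction is a store-with-update
		 * to r1 (stwu/stwux and friends), i.e. a real stack frame
		 * allocation; otherwise refuse to grow the stack.
		 */
		if (address + 2048 < uregs->gpr[1] && !store_update_sp)
			goto bad_area;
	}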

cheers

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 07/24] powerpc/mm: Move out definition of CPU specific is_write bits
  2017-07-22 16:40   ` LEROY Christophe
  2017-07-23  1:06     ` Benjamin Herrenschmidt
@ 2017-07-24 11:58     ` Michael Ellerman
  1 sibling, 0 replies; 45+ messages in thread
From: Michael Ellerman @ 2017-07-24 11:58 UTC (permalink / raw)
  To: LEROY Christophe, Benjamin Herrenschmidt; +Cc: linuxppc-dev, aneesh.kumar

LEROY Christophe <christophe.leroy@c-s.fr> writes:

> Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
>
>> Define a common page_fault_is_write() helper and use it
>>
>> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>> ---
>>  arch/powerpc/mm/fault.c | 18 +++++++++++-------
>>  1 file changed, 11 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
>> index f257965b54b5..26ec0dd4f419 100644
>> --- a/arch/powerpc/mm/fault.c
>> +++ b/arch/powerpc/mm/fault.c
>> @@ -183,6 +183,16 @@ static int mm_fault_error(struct pt_regs *regs,
>> unsigned long addr, int fault)
>>  }
>>
>>  /*
>> + * Define the correct "is_write" bit in error_code based
>> + * on the processor family
>> + */
>> +#if (defined(CONFIG_4xx) || defined(CONFIG_BOOKE))
>> +#define page_fault_is_write(__err)	((__err) & ESR_DST)
>> +#else
>> +#define page_fault_is_write(__err)	((__err) & DSISR_ISSTORE)
>> +#endif
>
> Doesn't the Linux kernel coding style prefer static inline
> functions over macros ?

In general yes. Especially for things that look like functions.

Ben was worried the code gen would be worse if it was a static inline
returning bool, I'll apply it and have a look.

cheers

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 02/24] powerpc/mm: Pre-filter SRR1 bits before do_page_fault()
  2017-07-22 16:43   ` LEROY Christophe
  2017-07-23  1:10     ` Benjamin Herrenschmidt
@ 2017-07-24 13:48     ` Michael Ellerman
  1 sibling, 0 replies; 45+ messages in thread
From: Michael Ellerman @ 2017-07-24 13:48 UTC (permalink / raw)
  To: LEROY Christophe, Benjamin Herrenschmidt; +Cc: linuxppc-dev, aneesh.kumar

LEROY Christophe <christophe.leroy@c-s.fr> writes:
> Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
>> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
>> index faddc87d0205..f04bc9f6b134 100644
>> --- a/arch/powerpc/mm/fault.c
>> +++ b/arch/powerpc/mm/fault.c
>> @@ -203,23 +203,13 @@ static int __do_page_fault(struct pt_regs
>> *regs, unsigned long address,
>>  	unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
>>  	int code = SEGV_MAPERR;
>>  	int is_write = 0;
>> -	int trap = TRAP(regs);
>> - 	int is_exec = trap == 0x400;
>> + 	int is_exec = TRAP(regs) == 0x400;
>
> Don't we have a tab/space issue here ?

We do:

  Warnings for commit 0f9b89d759ba 'powerpc/mm: Pre-filter SRR1 bits before do_page_fault()'
    Lines beginning with space in arch/powerpc/mm/fault.c
      + 	int is_exec = TRAP(regs) == 0x400;

I can fix it up.

cheers

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 23/24] powerpc/mm: Cleanup check for stack expansion
  2017-07-24 10:47     ` Michael Ellerman
@ 2017-07-24 17:34       ` LEROY Christophe
  2017-07-25 11:19         ` Michael Ellerman
  0 siblings, 1 reply; 45+ messages in thread
From: LEROY Christophe @ 2017-07-24 17:34 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: aneesh.kumar, linuxppc-dev, Benjamin Herrenschmidt

Michael Ellerman <mpe@ellerman.id.au> wrote:

> LEROY Christophe <christophe.leroy@c-s.fr> writes:
>
>> Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
>>
>>> When hitting below a VM_GROWSDOWN vma (typically growing the stack),
>>> we check whether it's a valid stack-growing instruction and we
>>> check the distance to GPR1. This is largely open coded with lots
>>> of comments, so move it out to a helper.
>>
>> Did you have a look at the following patch ? It's been waiting for
>> application for some weeks now.
>> https://patchwork.ozlabs.org/patch/771869
>
> I actually merged it last merge window, but found I had no good way to
> test it, so I took it out again until I can write a test case for it.
>
> The way I realised it wasn't being tested was by removing all the
> store_updates_sp logic entirely and having my system run happily for
> several days :}

Which demonstrates how unlikely this is, hence doing that get_user()
at every fault is a waste of time.

How do you plan to handle that in parallel with Ben's series ?

I'll be back from vacation next week and may help find a way to
test that. (A test program using alloca() ?)
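
Something along these lines, perhaps? A rough, untested sketch; the
assumption is that a large alloca() makes the compiler move r1 down
with a store-with-update well below the current stack pointer, which
is exactly the kind of instruction store_updates_sp() looks for:

#include <alloca.h>
#include <stdio.h>

int main(void)
{
	/* ~1MB of dynamic stack allocation */
	char *buf = alloca(1024 * 1025);

	sprintf(buf, "Hello world !\n");
	printf("%s", buf);
	return 0;
}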

Christophe

>
> cheers

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 23/24] powerpc/mm: Cleanup check for stack expansion
  2017-07-24 17:34       ` LEROY Christophe
@ 2017-07-25 11:19         ` Michael Ellerman
  2017-07-31 11:37           ` Christophe LEROY
  0 siblings, 1 reply; 45+ messages in thread
From: Michael Ellerman @ 2017-07-25 11:19 UTC (permalink / raw)
  To: LEROY Christophe; +Cc: aneesh.kumar, linuxppc-dev, Benjamin Herrenschmidt

LEROY Christophe <christophe.leroy@c-s.fr> writes:

> Michael Ellerman <mpe@ellerman.id.au> wrote:
>
>> LEROY Christophe <christophe.leroy@c-s.fr> writes:
>>
>>> Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
>>>
>>>> When hitting below a VM_GROWSDOWN vma (typically growing the stack),
>>>> we check whether it's a valid stack-growing instruction and we
>>>> check the distance to GPR1. This is largely open coded with lots
>>>> of comments, so move it out to a helper.
>>>
>>> Did you have a look at the following patch ? It's been waiting for
>>> application for some weeks now.
>>> https://patchwork.ozlabs.org/patch/771869
>>
>> I actually merged it last merge window, but found I had no good way to
>> test it, so I took it out again until I can write a test case for it.
>>
>> The way I realised it wasn't being tested was by removing all the
>> store_updates_sp logic entirely and having my system run happily for
>> several days :}
>
> Which demonstrates how unlikely this is, hence doing that get_user()
> at every fault is a waste of time.

Yes I agree.

> How do you plan to handle that in parallel with Ben's series ?

Not sure :)

> I'll be back from vacation next week and may help find a way to
> test that. (A test program using alloca() ?)

I was thinking hand-crafted asm, but that might be a pain to get working
for 32 & 64-bit, in which case alloca() might work.

cheers

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 23/24] powerpc/mm: Cleanup check for stack expansion
  2017-07-25 11:19         ` Michael Ellerman
@ 2017-07-31 11:37           ` Christophe LEROY
  0 siblings, 0 replies; 45+ messages in thread
From: Christophe LEROY @ 2017-07-31 11:37 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: aneesh.kumar, linuxppc-dev, Benjamin Herrenschmidt



On 25/07/2017 at 13:19, Michael Ellerman wrote:
> LEROY Christophe <christophe.leroy@c-s.fr> writes:
> 
>> Michael Ellerman <mpe@ellerman.id.au> wrote:
>>
>>> LEROY Christophe <christophe.leroy@c-s.fr> writes:
>>>
>>>> Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
>>>>
>>>>> When hitting below a VM_GROWSDOWN vma (typically growing the stack),
>>>>> we check whether it's a valid stack-growing instruction and we
>>>>> check the distance to GPR1. This is largely open coded with lots
>>>>> of comments, so move it out to a helper.
>>>>
>>>> Did you have a look at the following patch ? It's been waiting for
>>>> application for some weeks now.
>>>> https://patchwork.ozlabs.org/patch/771869
>>>
>>> I actually merged it last merge window, but found I had no good way to
>>> test it, so I took it out again until I can write a test case for it.
>>>
>>> The way I realised it wasn't being tested was by removing all the
>>> store_updates_sp logic entirely and having my system run happily for
>>> several days :}
>>
>> Which demonstrates how unlikely this is, hence doing that get_user()
>> at every fault is a waste of time.
> 
> Yes I agree.
> 
>> How do you plan to handle that in parallel with Ben's series ?
> 
> Not sure :)
> 
>> I'll be back from vacation next week and may help find a way to
>> test that. (A test program using alloca() ?)
> 
> I was thinking hand-crafted asm, but that might be a pain to get working
> for 32 & 64-bit, in which case alloca() might work.

No need for anything very sophisticated indeed.
The following app does the trick. If I modify store_updates_sp() to
always return 0, the app gets a SIGSEGV.

#include <stdlib.h>
#include <stdio.h>

int main(int argc, char **argv)
{
	/*
	 * ~1MB of locals: the prologue must store-with-update r1 far
	 * below the current stack pointer, so the frame allocation
	 * faults well below the existing stack vma.
	 */
	char buf[1024 * 1025];

	sprintf(buf, "Hello world !\n");
	printf("%s", buf);

	exit(0);
}

Christophe

> 
> cheers
> 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 03/24] powerpc/6xx: Handle DABR match before calling do_page_fault
  2017-07-19  4:49 ` [PATCH 03/24] powerpc/6xx: Handle DABR match before calling do_page_fault Benjamin Herrenschmidt
@ 2017-08-03  0:19   ` Michael Ellerman
  2017-08-03  1:00     ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 45+ messages in thread
From: Michael Ellerman @ 2017-08-03  0:19 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, linuxppc-dev; +Cc: aneesh.kumar

Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:

> On legacy 6xx 32-bit processors, we checked for the DABR match bit
> in DSISR from do_page_fault(), in the middle of a pile of ifdef's
> because all other CPU types do it in assembly prior to calling
> do_page_fault. Fix that.
>
> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> ---
>  arch/powerpc/kernel/entry_32.S | 11 +++++++++++
>  arch/powerpc/mm/fault.c        |  9 ---------
>  2 files changed, 11 insertions(+), 9 deletions(-)
>
> diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
> index 8587059ad848..7331df033804 100644
> --- a/arch/powerpc/kernel/entry_32.S
> +++ b/arch/powerpc/kernel/entry_32.S
> @@ -586,6 +586,8 @@ ppc_swapcontext:
>  handle_page_fault:
>  	stw	r4,_DAR(r1)
>  	addi	r3,r1,STACK_FRAME_OVERHEAD
> +	andis.  r0,r5,DSISR_DABRMATCH@h
> +	bne-    handle_dabr_fault
>  	bl	do_page_fault
>  	cmpwi	r3,0
>  	beq+	ret_from_except
> @@ -599,6 +601,15 @@ handle_page_fault:
>  	bl	bad_page_fault
>  	b	ret_from_except_full
>  
> +	/* We have a data breakpoint exception - handle it */
> +handle_dabr_fault:
> +	SAVE_NVGPRS(r1)
> +	lwz	r0,_TRAP(r1)
> +	clrrwi	r0,r0,1
> +	stw	r0,_TRAP(r1)
> +	bl      do_break
> +	b	ret_from_except_full

This breaks ~35% of the defconfigs.

eg. ppc44x_defconfig
http://kisskb.ellerman.id.au/kisskb/buildresult/13113408/

  arch/powerpc/kernel/entry_32.S:(.text+0x7ac): undefined reference to `do_break'

For now I've just wrapped the asm above in:

+#if !(defined(CONFIG_4xx) || defined(CONFIG_BOOKE) || defined(CONFIG_PPC_8xx))

But would be happy if there was something less gross.

cheers

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 03/24] powerpc/6xx: Handle DABR match before calling do_page_fault
  2017-08-03  0:19   ` Michael Ellerman
@ 2017-08-03  1:00     ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 45+ messages in thread
From: Benjamin Herrenschmidt @ 2017-08-03  1:00 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev; +Cc: aneesh.kumar

On Thu, 2017-08-03 at 10:19 +1000, Michael Ellerman wrote:
>   arch/powerpc/kernel/entry_32.S:(.text+0x7ac): undefined reference to `do_break'
> 
> For now I've just wrapped the asm above in:
> 
> +#if !(defined(CONFIG_4xx) || defined(CONFIG_BOOKE) || defined(CONFIG_PPC_8xx))
> 
> But would be happy if there was something less gross.

#ifdef CONFIG_6xx...
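
i.e. roughly this (a sketch only, assuming do_break() is built exactly
for the classic 6xx/book3s-32 parts; note the guard only covers the
DABR test and the handler, the bl do_page_fault itself stays outside):

handle_page_fault:
	stw	r4,_DAR(r1)
	addi	r3,r1,STACK_FRAME_OVERHEAD
#ifdef CONFIG_6xx
	andis.	r0,r5,DSISR_DABRMATCH@h
	bne-	handle_dabr_fault
#endif
	bl	do_page_fault
	/* ... rest of handle_page_fault unchanged ... */

#ifdef CONFIG_6xx
	/* We have a data breakpoint exception - handle it */
handle_dabr_fault:
	SAVE_NVGPRS(r1)
	lwz	r0,_TRAP(r1)
	clrrwi	r0,r0,1
	stw	r0,_TRAP(r1)
	bl	do_break
	b	ret_from_except_full
#endif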

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [01/24] powerpc/mm: Move exception_enter/exit to a do_page_fault wrapper
  2017-07-19  4:49 [PATCH 01/24] powerpc/mm: Move exception_enter/exit to a do_page_fault wrapper Benjamin Herrenschmidt
                   ` (22 preceding siblings ...)
  2017-07-19  4:49 ` [PATCH 24/24] powerpc: Remove old unused icswx based coprocessor support Benjamin Herrenschmidt
@ 2017-08-07 10:41 ` Michael Ellerman
  2017-08-07 16:37   ` Christophe LEROY
  23 siblings, 1 reply; 45+ messages in thread
From: Michael Ellerman @ 2017-08-07 10:41 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, linuxppc-dev; +Cc: aneesh.kumar

On Wed, 2017-07-19 at 04:49:23 UTC, Benjamin Herrenschmidt wrote:
> This will allow simplifying the returns from do_page_fault
> 
> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/7afad422ac61067473d5f3d20bbd54

cheers

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [01/24] powerpc/mm: Move exception_enter/exit to a do_page_fault wrapper
  2017-08-07 10:41 ` [01/24] powerpc/mm: Move exception_enter/exit to a do_page_fault wrapper Michael Ellerman
@ 2017-08-07 16:37   ` Christophe LEROY
  2017-08-08  2:16     ` Michael Ellerman
  0 siblings, 1 reply; 45+ messages in thread
From: Christophe LEROY @ 2017-08-07 16:37 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt; +Cc: linuxppc-dev, aneesh.kumar



On 07/08/2017 at 12:41, Michael Ellerman wrote:
> On Wed, 2017-07-19 at 04:49:23 UTC, Benjamin Herrenschmidt wrote:
>> This will allow simplifying the returns from do_page_fault
>>
>> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> 
> Series applied to powerpc next, thanks.
> 
> https://git.kernel.org/powerpc/c/7afad422ac61067473d5f3d20bbd54
> 


Boot failure on the 8xx:

[    6.029556] Failed to execute /init (error -14)
[    6.034623] Starting init: /bin/sh exists but couldn't execute it 
(error -14)
[    6.041489] Kernel panic - not syncing: No working init found.  Try 
passing init= option to kernel. See Linux 
Documentation/admin-guide/init.rst for guidance.
[    6.055518] CPU: 0 PID: 1 Comm: init Not tainted 
4.13.0-rc3-s3k-dev-00143-g7aa62e972a56 #56
[    6.063745] Call Trace:
[    6.066224] [c60f1ed0] [c001a624] panic+0x108/0x250 (unreliable)
[    6.072140] [c60f1f30] [c0002640] rootfs_mount+0x0/0x58
[    6.077311] [c60f1f40] [c000cb80] ret_from_kernel_thread+0x5c/0x64
[    6.083405] Rebooting in 180 seconds..


Bisected to c433ec0455f921eaf8dd0262a718ce6f8ad62ea2 ("powerpc/mm: 
Pre-filter SRR1 bits before do_page_fault()")


Will see tomorrow what's wrong with that.

Christophe

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [01/24] powerpc/mm: Move exception_enter/exit to a do_page_fault wrapper
  2017-08-07 16:37   ` Christophe LEROY
@ 2017-08-08  2:16     ` Michael Ellerman
  2017-08-08  6:45       ` Christophe LEROY
  0 siblings, 1 reply; 45+ messages in thread
From: Michael Ellerman @ 2017-08-08  2:16 UTC (permalink / raw)
  To: Christophe LEROY, Michael Ellerman, Benjamin Herrenschmidt
  Cc: linuxppc-dev, aneesh.kumar

Christophe LEROY <christophe.leroy@c-s.fr> writes:

> On 07/08/2017 at 12:41, Michael Ellerman wrote:
>> On Wed, 2017-07-19 at 04:49:23 UTC, Benjamin Herrenschmidt wrote:
>>> This will allow simplifying the returns from do_page_fault
>>>
>>> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>>
>> Series applied to powerpc next, thanks.
>>
>> https://git.kernel.org/powerpc/c/7afad422ac61067473d5f3d20bbd54
>
> Boot failure on the 8xx:
>
> [    6.029556] Failed to execute /init (error -14)
> [    6.034623] Starting init: /bin/sh exists but couldn't execute it
> (error -14)
> [    6.041489] Kernel panic - not syncing: No working init found.  Try
> passing init= option to kernel. See Linux
> Documentation/admin-guide/init.rst for guidance.
> [    6.055518] CPU: 0 PID: 1 Comm: init Not tainted
> 4.13.0-rc3-s3k-dev-00143-g7aa62e972a56 #56
> [    6.063745] Call Trace:
> [    6.066224] [c60f1ed0] [c001a624] panic+0x108/0x250 (unreliable)
> [    6.072140] [c60f1f30] [c0002640] rootfs_mount+0x0/0x58
> [    6.077311] [c60f1f40] [c000cb80] ret_from_kernel_thread+0x5c/0x64
> [    6.083405] Rebooting in 180 seconds..
>
>
> Bisected to c433ec0455f921eaf8dd0262a718ce6f8ad62ea2 ("powerpc/mm:
> Pre-filter SRR1 bits before do_page_fault()")

Sorry about that. I don't have a way to test 8xx.

> Will see tomorrow what's wrong with that.

Thanks.

cheers

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [01/24] powerpc/mm: Move exception_enter/exit to a do_page_fault wrapper
  2017-08-08  2:16     ` Michael Ellerman
@ 2017-08-08  6:45       ` Christophe LEROY
  2017-08-08 10:00         ` Michael Ellerman
  0 siblings, 1 reply; 45+ messages in thread
From: Christophe LEROY @ 2017-08-08  6:45 UTC (permalink / raw)
  To: Michael Ellerman, Michael Ellerman, Benjamin Herrenschmidt
  Cc: linuxppc-dev, aneesh.kumar



On 08/08/2017 at 04:16, Michael Ellerman wrote:
> Christophe LEROY <christophe.leroy@c-s.fr> writes:
> 
>> On 07/08/2017 at 12:41, Michael Ellerman wrote:
>>> On Wed, 2017-07-19 at 04:49:23 UTC, Benjamin Herrenschmidt wrote:
>>>> This will allow simplifying the returns from do_page_fault
>>>>
>>>> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>>>
>>> Series applied to powerpc next, thanks.
>>>
>>> https://git.kernel.org/powerpc/c/7afad422ac61067473d5f3d20bbd54
>>
>> Boot failure on the 8xx:
>>
>> [    6.029556] Failed to execute /init (error -14)
>> [    6.034623] Starting init: /bin/sh exists but couldn't execute it
>> (error -14)
>> [    6.041489] Kernel panic - not syncing: No working init found.  Try
>> passing init= option to kernel. See Linux
>> Documentation/admin-guide/init.rst for guidance.
>> [    6.055518] CPU: 0 PID: 1 Comm: init Not tainted
>> 4.13.0-rc3-s3k-dev-00143-g7aa62e972a56 #56
>> [    6.063745] Call Trace:
>> [    6.066224] [c60f1ed0] [c001a624] panic+0x108/0x250 (unreliable)
>> [    6.072140] [c60f1f30] [c0002640] rootfs_mount+0x0/0x58
>> [    6.077311] [c60f1f40] [c000cb80] ret_from_kernel_thread+0x5c/0x64
>> [    6.083405] Rebooting in 180 seconds..
>>
>>
>> Bisected to c433ec0455f921eaf8dd0262a718ce6f8ad62ea2 ("powerpc/mm:
>> Pre-filter SRR1 bits before do_page_fault()")
> 
> Sorry about that. I don't have a way to test 8xx.

Looks like I concluded too quickly last night; the above commit is
indeed the last good one. The faulty one is
d300627c6a53693fb01479b59b0cdd293761b1fa ("powerpc/6xx: Handle DABR match
before calling do_page_fault")

> 
>> Will see tomorrow what's wrong with that.
> 
> Thanks.

This is because the 'bl do_page_fault' has been enclosed inside the 
#ifdef CONFIG_6xx

See patch at https://patchwork.ozlabs.org/patch/799039/

Christophe

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [01/24] powerpc/mm: Move exception_enter/exit to a do_page_fault wrapper
  2017-08-08  6:45       ` Christophe LEROY
@ 2017-08-08 10:00         ` Michael Ellerman
  0 siblings, 0 replies; 45+ messages in thread
From: Michael Ellerman @ 2017-08-08 10:00 UTC (permalink / raw)
  To: Christophe LEROY, Michael Ellerman, Benjamin Herrenschmidt
  Cc: linuxppc-dev, aneesh.kumar

Christophe LEROY <christophe.leroy@c-s.fr> writes:

> On 08/08/2017 at 04:16, Michael Ellerman wrote:
>> Christophe LEROY <christophe.leroy@c-s.fr> writes:
>>
>>> On 07/08/2017 at 12:41, Michael Ellerman wrote:
>>>> On Wed, 2017-07-19 at 04:49:23 UTC, Benjamin Herrenschmidt wrote:
>>>>> This will allow simplifying the returns from do_page_fault
>>>>>
>>>>> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>>>>
>>>> Series applied to powerpc next, thanks.
>>>>
>>>> https://git.kernel.org/powerpc/c/7afad422ac61067473d5f3d20bbd54
>>>
>>> Boot failure on the 8xx:
>>>
>>> [    6.029556] Failed to execute /init (error -14)
>>> [    6.034623] Starting init: /bin/sh exists but couldn't execute it
>>> (error -14)
>>> [    6.041489] Kernel panic - not syncing: No working init found.  Try
>>> passing init= option to kernel. See Linux
>>> Documentation/admin-guide/init.rst for guidance.
>>> [    6.055518] CPU: 0 PID: 1 Comm: init Not tainted
>>> 4.13.0-rc3-s3k-dev-00143-g7aa62e972a56 #56
>>> [    6.063745] Call Trace:
>>> [    6.066224] [c60f1ed0] [c001a624] panic+0x108/0x250 (unreliable)
>>> [    6.072140] [c60f1f30] [c0002640] rootfs_mount+0x0/0x58
>>> [    6.077311] [c60f1f40] [c000cb80] ret_from_kernel_thread+0x5c/0x64
>>> [    6.083405] Rebooting in 180 seconds..
>>>
>>>
>>> Bisected to c433ec0455f921eaf8dd0262a718ce6f8ad62ea2 ("powerpc/mm:
>>> Pre-filter SRR1 bits before do_page_fault()")
>>
>> Sorry about that. I don't have a way to test 8xx.
>
> Looks like I concluded too quickly last night; the above commit is
> indeed the last good one. The faulty one is
> d300627c6a53693fb01479b59b0cdd293761b1fa ("powerpc/6xx: Handle DABR match
> before calling do_page_fault")
>
>>
>>> Will see tomorrow what's wrong with that.
>>
>> Thanks.
>
> This is because the 'bl do_page_fault' has been enclosed inside the
> #ifdef CONFIG_6xx
>
> See patch at https://patchwork.ozlabs.org/patch/799039/

Oops :}

Thanks for the fix.

cheers

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 12/24] powerpc/mm: Fix reporting of kernel execute faults
  2017-07-19  4:49 ` [PATCH 12/24] powerpc/mm: Fix reporting of kernel execute faults Benjamin Herrenschmidt
@ 2018-11-07  8:35   ` Christophe LEROY
  2018-11-07 10:39     ` Benjamin Herrenschmidt
       [not found]     ` <87zhtr5d1v.fsf@linux.ibm.com>
  0 siblings, 2 replies; 45+ messages in thread
From: Christophe LEROY @ 2018-11-07  8:35 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, linuxppc-dev; +Cc: aneesh.kumar, Nicholas Piggin

Hi Ben,

I have an issue on the 8xx with this change

On 19/07/2017 at 06:49, Benjamin Herrenschmidt wrote:
> We currently test for is_exec and DSISR_PROTFAULT but that doesn't
> make sense as this is the wrong error bit to test for an execute
> permission failure.

On the 8xx, on an exec permission failure, this is the correct bit; see
the extract below from the reference manual:

Note that only one of bits 1, 3, and 4 will be set.
  Bit 1: 1 if the translation of an attempted access is not in the
         translation tables, otherwise 0.
  Bit 3: 1 if the fetch access was to guarded memory when MSR[IR] = 1,
         otherwise 0.
  Bit 4: 1 if the access is not permitted by the protection mechanism,
         otherwise 0.

So on the 8xx, bit 3 is not DSISR_NOEXEC_OR_G but only DSISR_G.
When the PPP bits are set to No-Execute, we really get bit 4, which is
DSISR_PROTFAULT.

> 
> In fact, we had code that would return early if we had an exec
> fault in kernel mode so I think that was just dead code anyway.
> 
> Finally the location of that test is awkward and prevents further
> simplifications.
> 
> So instead move that test into a helper along with the existing
> early test for kernel exec faults and out of range accesses,
> and put it all in a "bad_kernel_fault()" helper. While at it
> test the correct error bits.
> 
> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> ---
>   arch/powerpc/mm/fault.c | 21 +++++++++++++++------
>   1 file changed, 15 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
> index e8d6acc888c5..aead07cf8a5b 100644
> --- a/arch/powerpc/mm/fault.c
> +++ b/arch/powerpc/mm/fault.c
> @@ -180,6 +180,20 @@ static int mm_fault_error(struct pt_regs *regs, unsigned long addr, int fault)
>   	return MM_FAULT_CONTINUE;
>   }
>   
> +/* Is this a bad kernel fault ? */
> +static bool bad_kernel_fault(bool is_exec, unsigned long error_code,
> +			     unsigned long address)
> +{
> +	if (is_exec && (error_code & (DSISR_NOEXEC_OR_G | DSISR_KEYFAULT))) {

> Do you mind if we add DSISR_PROTFAULT here as well ?

Christophe

> +		printk_ratelimited(KERN_CRIT "kernel tried to execute"
> +				   " exec-protected page (%lx) -"
> +				   "exploit attempt? (uid: %d)\n",
> +				   address, from_kuid(&init_user_ns,
> +						      current_uid()));
> +	}
> +	return is_exec || (address >= TASK_SIZE);
> +}
> +
>   /*
>    * Define the correct "is_write" bit in error_code based
>    * on the processor family
> @@ -252,7 +266,7 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
>   	 * The kernel should never take an execute fault nor should it
>   	 * take a page fault to a kernel address.
>   	 */
> -	if (!is_user && (is_exec || (address >= TASK_SIZE)))
> +	if (unlikely(!is_user && bad_kernel_fault(is_exec, error_code, address)))
>   		return SIGSEGV;
>   
>   	/* We restore the interrupt state now */
> @@ -491,11 +505,6 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
>   		return 0;
>   	}
>   
> -	if (is_exec && (error_code & DSISR_PROTFAULT))
> -		printk_ratelimited(KERN_CRIT "kernel tried to execute NX-protected"
> -				   " page (%lx) - exploit attempt? (uid: %d)\n",
> -				   address, from_kuid(&init_user_ns, current_uid()));
> -
>   	return SIGSEGV;
>   }
>   NOKPROBE_SYMBOL(__do_page_fault);
> 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 12/24] powerpc/mm: Fix reporting of kernel execute faults
  2018-11-07  8:35   ` Christophe LEROY
@ 2018-11-07 10:39     ` Benjamin Herrenschmidt
       [not found]     ` <87zhtr5d1v.fsf@linux.ibm.com>
  1 sibling, 0 replies; 45+ messages in thread
From: Benjamin Herrenschmidt @ 2018-11-07 10:39 UTC (permalink / raw)
  To: Christophe LEROY, linuxppc-dev; +Cc: aneesh.kumar, Nicholas Piggin

On Wed, 2018-11-07 at 09:35 +0100, Christophe LEROY wrote:
> Hi Ben,
> 
> I have an issue on the 8xx with this change

Ah ouch...

 .../...
  
> > +/* Is this a bad kernel fault ? */
> > +static bool bad_kernel_fault(bool is_exec, unsigned long error_code,
> > +			     unsigned long address)
> > +{
> > +	if (is_exec && (error_code & (DSISR_NOEXEC_OR_G | DSISR_KEYFAULT))) {
> 
> Do you mind if we add DSISR_PROTFAULT here as well ?

Off the top of my mind, I don't see a problem with that... but it would
definitely require an explanation comment.
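
Something like this, perhaps (a rough sketch only; the comment wording
is based on Christophe's description of the 8xx behaviour above, not on
anything tested):

/* Is this a bad kernel fault ? */
static bool bad_kernel_fault(bool is_exec, unsigned long error_code,
			     unsigned long address)
{
	/*
	 * Book3s flags a kernel execute fault with DSISR_NOEXEC_OR_G
	 * (or DSISR_KEYFAULT), but the 8xx reports an execute permission
	 * failure as DSISR_PROTFAULT (its bit 3 only covers guarded
	 * memory), so accept that bit here as well.
	 */
	if (is_exec && (error_code & (DSISR_NOEXEC_OR_G | DSISR_KEYFAULT |
				      DSISR_PROTFAULT))) {
		printk_ratelimited(KERN_CRIT "kernel tried to execute"
				   " exec-protected page (%lx) -"
				   "exploit attempt? (uid: %d)\n",
				   address, from_kuid(&init_user_ns,
						      current_uid()));
	}
	return is_exec || (address >= TASK_SIZE);
}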

> > +		printk_ratelimited(KERN_CRIT "kernel tried to execute"
> > +				   " exec-protected page (%lx) -"
> > +				   "exploit attempt? (uid: %d)\n",
> > +				   address, from_kuid(&init_user_ns,
> > +						      current_uid()));
> > +	}
> > +	return is_exec || (address >= TASK_SIZE);
> > +}
> > +
> >   /*
> >    * Define the correct "is_write" bit in error_code based
> >    * on the processor family
> > @@ -252,7 +266,7 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
> >   	 * The kernel should never take an execute fault nor should it
> >   	 * take a page fault to a kernel address.
> >   	 */
> > -	if (!is_user && (is_exec || (address >= TASK_SIZE)))
> > +	if (unlikely(!is_user && bad_kernel_fault(is_exec, error_code, address)))
> >   		return SIGSEGV;
> >   
> >   	/* We restore the interrupt state now */
> > @@ -491,11 +505,6 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
> >   		return 0;
> >   	}
> >   
> > -	if (is_exec && (error_code & DSISR_PROTFAULT))
> > -		printk_ratelimited(KERN_CRIT "kernel tried to execute NX-protected"
> > -				   " page (%lx) - exploit attempt? (uid: %d)\n",
> > -				   address, from_kuid(&init_user_ns, current_uid()));
> > -
> >   	return SIGSEGV;
> >   }
> >   NOKPROBE_SYMBOL(__do_page_fault);
> > 


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 12/24] powerpc/mm: Fix reporting of kernel execute faults
       [not found]     ` <87zhtr5d1v.fsf@linux.ibm.com>
@ 2018-11-30  6:08       ` Christophe LEROY
  0 siblings, 0 replies; 45+ messages in thread
From: Christophe LEROY @ 2018-11-30  6:08 UTC (permalink / raw)
  To: Aneesh Kumar K.V; +Cc: linuxppc-dev



On 30/11/2018 at 06:50, Aneesh Kumar K.V wrote:
> Christophe LEROY <christophe.leroy@c-s.fr> writes:
> 
>> Hi Ben,
>>
>> I have an issue on the 8xx with this change
>>
>>> On 19/07/2017 at 06:49, Benjamin Herrenschmidt wrote:
>>> We currently test for is_exec and DSISR_PROTFAULT but that doesn't
>>> make sense as this is the wrong error bit to test for an execute
>>> permission failure.
>>
>> On the 8xx, on an exec permission failure, this is the correct bit; see
>> the extract below from the reference manual:
>>
>> Note that only one of bits 1, 3, and 4 will be set.
>>   Bit 1: 1 if the translation of an attempted access is not in the
>>          translation tables, otherwise 0.
>>   Bit 3: 1 if the fetch access was to guarded memory when MSR[IR] = 1,
>>          otherwise 0.
>>   Bit 4: 1 if the access is not permitted by the protection mechanism,
>>          otherwise 0.
>>
>> So on the 8xx, bit 3 is not DSISR_NOEXEC_OR_G but only DSISR_G.
>> When the PPP bits are set to No-Execute, we really get bit 4, which is
>> DSISR_PROTFAULT.
> 
> 
> Do you have an url for the document? I am wondering whether we can get
> Documentation/powerpc/cpu_families.txt updated with these urls?

https://www.nxp.com/docs/en/reference-manual/MPC885RM.pdf

Christophe

> 
>>
>>>
>>> In fact, we had code that would return early if we had an exec
>>> fault in kernel mode so I think that was just dead code anyway.
>>>
>>> Finally the location of that test is awkward and prevents further
>>> simplifications.
>>>
>>> So instead move that test into a helper along with the existing
>>> early test for kernel exec faults and out of range accesses,
>>> and put it all in a "bad_kernel_fault()" helper. While at it
>>> test the correct error bits.
>>>
>>> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>>> ---
>>>    arch/powerpc/mm/fault.c | 21 +++++++++++++++------
>>>    1 file changed, 15 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
>>> index e8d6acc888c5..aead07cf8a5b 100644
>>> --- a/arch/powerpc/mm/fault.c
>>> +++ b/arch/powerpc/mm/fault.c
>>> @@ -180,6 +180,20 @@ static int mm_fault_error(struct pt_regs *regs, unsigned long addr, int fault)
>>>    	return MM_FAULT_CONTINUE;
>>>    }
>>>    
>>> +/* Is this a bad kernel fault ? */
>>> +static bool bad_kernel_fault(bool is_exec, unsigned long error_code,
>>> +			     unsigned long address)
>>> +{
>>> +	if (is_exec && (error_code & (DSISR_NOEXEC_OR_G | DSISR_KEYFAULT))) {
>>
>> Do you mind if we add DSISR_PROTFAULT here as well ?
>>
>> Christophe
>>
>>> +		printk_ratelimited(KERN_CRIT "kernel tried to execute"
>>> +				   " exec-protected page (%lx) -"
>>> +				   "exploit attempt? (uid: %d)\n",
>>> +				   address, from_kuid(&init_user_ns,
>>> +						      current_uid()));
>>> +	}
>>> +	return is_exec || (address >= TASK_SIZE);
>>> +}
>>> +
>>>    /*
>>>     * Define the correct "is_write" bit in error_code based
>>>     * on the processor family
>>> @@ -252,7 +266,7 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
>>>    	 * The kernel should never take an execute fault nor should it
>>>    	 * take a page fault to a kernel address.
>>>    	 */
>>> -	if (!is_user && (is_exec || (address >= TASK_SIZE)))
>>> +	if (unlikely(!is_user && bad_kernel_fault(is_exec, error_code, address)))
>>>    		return SIGSEGV;
>>>    
>>>    	/* We restore the interrupt state now */
>>> @@ -491,11 +505,6 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
>>>    		return 0;
>>>    	}
>>>    
>>> -	if (is_exec && (error_code & DSISR_PROTFAULT))
>>> -		printk_ratelimited(KERN_CRIT "kernel tried to execute NX-protected"
>>> -				   " page (%lx) - exploit attempt? (uid: %d)\n",
>>> -				   address, from_kuid(&init_user_ns, current_uid()));
>>> -
>>>    	return SIGSEGV;
>>>    }
>>>    NOKPROBE_SYMBOL(__do_page_fault);
>>>

^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2018-11-30  6:10 UTC | newest]

Thread overview: 45+ messages
2017-07-19  4:49 [PATCH 01/24] powerpc/mm: Move exception_enter/exit to a do_page_fault wrapper Benjamin Herrenschmidt
2017-07-19  4:49 ` [PATCH 02/24] powerpc/mm: Pre-filter SRR1 bits before do_page_fault() Benjamin Herrenschmidt
2017-07-22 16:43   ` LEROY Christophe
2017-07-23  1:10     ` Benjamin Herrenschmidt
2017-07-24 13:48     ` Michael Ellerman
2017-07-19  4:49 ` [PATCH 03/24] powerpc/6xx: Handle DABR match before calling do_page_fault Benjamin Herrenschmidt
2017-08-03  0:19   ` Michael Ellerman
2017-08-03  1:00     ` Benjamin Herrenschmidt
2017-07-19  4:49 ` [PATCH 04/24] powerpc/mm: Update definitions of DSISR bits Benjamin Herrenschmidt
2017-07-19  4:49 ` [PATCH 05/24] powerpc/mm: Update bits used to skip hash_page Benjamin Herrenschmidt
2017-07-19  4:49 ` [PATCH 06/24] powerpc/mm: Use symbolic constants for filtering SRR1 bits on ISIs Benjamin Herrenschmidt
2017-07-19  4:49 ` [PATCH 07/24] powerpc/mm: Move out definition of CPU specific is_write bits Benjamin Herrenschmidt
2017-07-22 16:40   ` LEROY Christophe
2017-07-23  1:06     ` Benjamin Herrenschmidt
2017-07-24 11:58     ` Michael Ellerman
2017-07-19  4:49 ` [PATCH 08/24] powerpc/mm: Move error_code checks for bad faults earlier Benjamin Herrenschmidt
2017-07-19  4:49 ` [PATCH 09/24] powerpc/mm: Overhaul handling of bad page faults Benjamin Herrenschmidt
2017-07-19  4:49 ` [PATCH 10/24] powerpc/mm: Move debugger check to notify_page_fault() Benjamin Herrenschmidt
2017-07-19  4:49 ` [PATCH 11/24] powerpc/mm: Simplify returns from __do_page_fault Benjamin Herrenschmidt
2017-07-19  4:49 ` [PATCH 12/24] powerpc/mm: Fix reporting of kernel execute faults Benjamin Herrenschmidt
2018-11-07  8:35   ` Christophe LEROY
2018-11-07 10:39     ` Benjamin Herrenschmidt
     [not found]     ` <87zhtr5d1v.fsf@linux.ibm.com>
2018-11-30  6:08       ` Christophe LEROY
2017-07-19  4:49 ` [PATCH 13/24] powerpc/mm: Make bad_area* helper functions Benjamin Herrenschmidt
2017-07-19  4:49 ` [PATCH 14/24] powerpc/mm: Rework mm_fault_error() Benjamin Herrenschmidt
2017-07-19  4:49 ` [PATCH 15/24] powerpc/mm: Move CMO accounting out of do_page_fault into a helper Benjamin Herrenschmidt
2017-07-19  4:49 ` [PATCH 16/24] powerpc/mm: Cosmetic fix to page fault accounting Benjamin Herrenschmidt
2017-07-19  4:49 ` [PATCH 17/24] powerpc/mm: Move the DSISR_PROTFAULT sanity check Benjamin Herrenschmidt
2017-07-19  4:49 ` [PATCH 18/24] powerpc/mm: Move/simplify faulthandler_disabled() and !mm check Benjamin Herrenschmidt
2017-07-19  4:49 ` [PATCH 19/24] powerpc/mm: Add a bunch of (un)likely annotations to do_page_fault Benjamin Herrenschmidt
2017-07-19  4:49 ` [PATCH 20/24] powerpc/mm: Set fault flags earlier Benjamin Herrenschmidt
2017-07-19  4:49 ` [PATCH 21/24] powerpc/mm: Move page fault VMA access checks to a helper Benjamin Herrenschmidt
2017-07-19  4:49 ` [PATCH 22/24] powerpc/mm: Don't lose "major" fault indication on retry Benjamin Herrenschmidt
2017-07-19  4:49 ` [PATCH 23/24] powerpc/mm: Cleanup check for stack expansion Benjamin Herrenschmidt
2017-07-21 16:59   ` LEROY Christophe
2017-07-24 10:47     ` Michael Ellerman
2017-07-24 17:34       ` LEROY Christophe
2017-07-25 11:19         ` Michael Ellerman
2017-07-31 11:37           ` Christophe LEROY
2017-07-19  4:49 ` [PATCH 24/24] powerpc: Remove old unused icswx based coprocessor support Benjamin Herrenschmidt
2017-08-07 10:41 ` [01/24] powerpc/mm: Move exception_enter/exit to a do_page_fault wrapper Michael Ellerman
2017-08-07 16:37   ` Christophe LEROY
2017-08-08  2:16     ` Michael Ellerman
2017-08-08  6:45       ` Christophe LEROY
2017-08-08 10:00         ` Michael Ellerman
