* [PATCH v7 1/3] powerpc/mm: Move get_user() out of store_updates_sp()
@ 2018-05-22 14:02 Christophe Leroy
  2018-05-22 14:02 ` [PATCH v7 2/3] powerpc/mm: Only read faulting instruction when necessary in do_page_fault() Christophe Leroy
  2018-05-22 14:02 ` [PATCH v7 3/3] powerpc/mm: Use instruction symbolic names in store_updates_sp() Christophe Leroy
  0 siblings, 2 replies; 8+ messages in thread
From: Christophe Leroy @ 2018-05-22 14:02 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, npiggin
  Cc: linux-kernel, linuxppc-dev

In preparation for the following patch, which will focus on calling
that get_user() only when necessary, this patch moves the call to
get_user() used to read the faulting instruction out of
store_updates_sp().

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
 v7: New

 arch/powerpc/mm/fault.c | 18 +++++++-----------
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index c01d627e687a..fcbb34431da2 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -66,15 +66,11 @@ static inline bool notify_page_fault(struct pt_regs *regs)
 }
 
 /*
- * Check whether the instruction at regs->nip is a store using
+ * Check whether the instruction inst is a store using
  * an update addressing form which will update r1.
  */
-static bool store_updates_sp(struct pt_regs *regs)
+static bool store_updates_sp(unsigned int inst)
 {
-	unsigned int inst;
-
-	if (get_user(inst, (unsigned int __user *)regs->nip))
-		return false;
 	/* check for 1 in the rA field */
 	if (((inst >> 16) & 0x1f) != 1)
 		return false;
@@ -235,7 +231,7 @@ static bool bad_kernel_fault(bool is_exec, unsigned long error_code,
 
 static bool bad_stack_expansion(struct pt_regs *regs, unsigned long address,
 				struct vm_area_struct *vma,
-				bool store_update_sp)
+				unsigned int inst)
 {
 	/*
 	 * N.B. The POWER/Open ABI allows programs to access up to
@@ -264,7 +260,7 @@ static bool bad_stack_expansion(struct pt_regs *regs, unsigned long address,
 		 * between the last mapped region and the stack will
 		 * expand the stack rather than segfaulting.
 		 */
-		if (address + 2048 < uregs->gpr[1] && !store_update_sp)
+		if (address + 2048 < uregs->gpr[1] && !store_updates_sp(inst))
 			return true;
 	}
 	return false;
@@ -403,7 +399,7 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 	int is_user = user_mode(regs);
 	int is_write = page_fault_is_write(error_code);
 	int fault, major = 0;
-	bool store_update_sp = false;
+	unsigned int inst = 0;
 
 	if (notify_page_fault(regs))
 		return 0;
@@ -455,7 +451,7 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 	 * mmap_sem held
 	 */
 	if (is_write && is_user)
-		store_update_sp = store_updates_sp(regs);
+		get_user(inst, (unsigned int __user *)regs->nip);
 
 	if (is_user)
 		flags |= FAULT_FLAG_USER;
@@ -503,7 +499,7 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 		return bad_area(regs, address);
 
 	/* The stack is being expanded, check if it's valid */
-	if (unlikely(bad_stack_expansion(regs, address, vma, store_update_sp)))
+	if (unlikely(bad_stack_expansion(regs, address, vma, inst)))
 		return bad_area(regs, address);
 
 	/* Try to expand it */
-- 
2.13.3
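
A quick illustration of the two fields store_updates_sp() examines after this
change: the rA field and the major opcode, extracted by (inst >> 16) & 0x1f and
inst >> 26 respectively. The stand-alone sketch below is not part of the patch;
the instruction words are hand-assembled examples (stwu r1,-16(r1) and
stw r3,0(r4)):

#include <stdio.h>

/*
 * Illustrative only: decode the two fields checked above.
 * 0x9421fff0 is the classic prologue instruction "stwu r1,-16(r1)":
 * major opcode 37 (stwu) in the top 6 bits, rA = 1 in bits 11-15.
 * 0x90640000 is "stw r3,0(r4)", which does not update r1.
 */
static int updates_r1(unsigned int inst)
{
	if (((inst >> 16) & 0x1f) != 1)		/* rA field must be r1 */
		return 0;
	return (inst >> 26) == 37;		/* major opcode 37: stwu */
}

int main(void)
{
	printf("stwu r1,-16(r1) -> %d\n", updates_r1(0x9421fff0));	/* 1 */
	printf("stw  r3,0(r4)   -> %d\n", updates_r1(0x90640000));	/* 0 */
	return 0;
}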


* [PATCH v7 2/3] powerpc/mm: Only read faulting instruction when necessary in do_page_fault()
  2018-05-22 14:02 [PATCH v7 1/3] powerpc/mm: Move get_user() out of store_updates_sp() Christophe Leroy
@ 2018-05-22 14:02 ` Christophe Leroy
  2018-05-22 14:38   ` Nicholas Piggin
  2018-05-22 14:02 ` [PATCH v7 3/3] powerpc/mm: Use instruction symbolic names in store_updates_sp() Christophe Leroy
  1 sibling, 1 reply; 8+ messages in thread
From: Christophe Leroy @ 2018-05-22 14:02 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, npiggin
  Cc: linux-kernel, linuxppc-dev

Commit a7a9dcd882a67 ("powerpc: Avoid taking a data miss on every
userspace instruction miss") has shown that limiting the read of
faulting instruction to likely cases improves performance.

This patch goes further into this direction by limiting the read
of the faulting instruction to the only cases where it is likely
needed.

On an MPC885, with the same benchmark app as in the commit referred
above, we see a reduction of about 3900 dTLB misses (approx 3%):

Before the patch:
 Performance counter stats for './fault 500' (10 runs):

         683033312      cpu-cycles                                                    ( +-  0.03% )
            134538      dTLB-load-misses                                              ( +-  0.03% )
             46099      iTLB-load-misses                                              ( +-  0.02% )
             19681      faults                                                        ( +-  0.02% )

       5.389747878 seconds time elapsed                                          ( +-  0.06% )

With the patch:

 Performance counter stats for './fault 500' (10 runs):

         682112862      cpu-cycles                                                    ( +-  0.03% )
            130619      dTLB-load-misses                                              ( +-  0.03% )
             46073      iTLB-load-misses                                              ( +-  0.05% )
             19681      faults                                                        ( +-  0.01% )

       5.381342641 seconds time elapsed                                          ( +-  0.07% )

The proper work of the huge stack expansion was tested with the
following app:

int main(int argc, char **argv)
{
	char buf[1024 * 1025];

	sprintf(buf, "Hello world !\n");
	printf(buf);

	exit(0);
}

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
 v7: Following comment from Nicholas on v6 on possibility of the page getting removed from the pagetables
     between the fault and the read, I have reworked the patch in order to do the get_user() in
     __do_page_fault() directly in order to reduce complexity compared to version v5

 v6: Rebased on latest powerpc/merge branch ; Using __get_user_inatomic() instead of get_user() in order
     to move it inside the semaphored area. That removes all the complexity of the patch.

 v5: Reworked to fit after Benh do_fault improvement and rebased on top of powerpc/merge (65152902e43fef)

 v4: Rebased on top of powerpc/next (f718d426d7e42e) and doing access_ok() verification before __get_user_xxx()

 v3: Do a first try with pagefault disabled before releasing the semaphore

 v2: Changes 'if (cond1) if (cond2)' by 'if (cond1 && cond2)'

 arch/powerpc/mm/fault.c | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index fcbb34431da2..dc64b8e06477 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -450,9 +450,6 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 	 * can result in fault, which will cause a deadlock when called with
 	 * mmap_sem held
 	 */
-	if (is_write && is_user)
-		get_user(inst, (unsigned int __user *)regs->nip);
-
 	if (is_user)
 		flags |= FAULT_FLAG_USER;
 	if (is_write)
@@ -498,6 +495,26 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 	if (unlikely(!(vma->vm_flags & VM_GROWSDOWN)))
 		return bad_area(regs, address);
 
+	if (unlikely(is_write && is_user && address + 0x100000 < vma->vm_end &&
+		     !inst)) {
+		unsigned int __user *nip = (unsigned int __user *)regs->nip;
+
+		if (likely(access_ok(VERIFY_READ, nip, sizeof(inst)))) {
+			int res;
+
+			pagefault_disable();
+			res = __get_user_inatomic(inst, nip);
+			pagefault_enable();
+			if (unlikely(res)) {
+				up_read(&mm->mmap_sem);
+				res = __get_user(inst, nip);
+				if (!res && inst)
+					goto retry;
+				return bad_area_nosemaphore(regs, address);
+			}
+		}
+	}
+
 	/* The stack is being expanded, check if it's valid */
 	if (unlikely(bad_stack_expansion(regs, address, vma, inst)))
 		return bad_area(regs, address);
-- 
2.13.3
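
The flow added above can be pictured as: try a non-faulting read of the
instruction while mmap_sem is held; if that fails, drop the semaphore, read
again with faults allowed, and retry the fault handling with the instruction
cached in inst. A rough userspace sketch of that shape follows; the rwlock and
the two read helpers are invented stand-ins for mmap_sem, __get_user_inatomic()
and __get_user(), not the kernel APIs:

#include <pthread.h>
#include <stdbool.h>

static pthread_rwlock_t mmap_lock = PTHREAD_RWLOCK_INITIALIZER;

/*
 * Stand-in for __get_user_inatomic(): in the kernel this fails instead of
 * faulting when the page is not mapped; here it always succeeds.
 */
static bool try_read_nofault(const unsigned int *addr, unsigned int *out)
{
	*out = *addr;
	return true;
}

/* Stand-in for __get_user(): allowed to fault the page in. */
static bool read_with_faults(const unsigned int *addr, unsigned int *out)
{
	*out = *addr;
	return true;
}

static int handle_fault(const unsigned int *nip)
{
	unsigned int inst = 0;

retry:
	pthread_rwlock_rdlock(&mmap_lock);
	/* ... look up the VMA, decide the stack-expansion check is needed ... */
	if (!inst && !try_read_nofault(nip, &inst)) {
		/* Must not fault while holding the lock: drop it first. */
		pthread_rwlock_unlock(&mmap_lock);
		if (!read_with_faults(nip, &inst) || !inst)
			return -1;	/* give up: the segfault path */
		goto retry;		/* inst is cached, take the lock again */
	}
	/* ... inst is now available for the store_updates_sp() check ... */
	pthread_rwlock_unlock(&mmap_lock);
	return 0;
}

int main(void)
{
	unsigned int word = 0x9421fff0;	/* pretend this is the faulting instruction */

	return handle_fault(&word) ? 1 : 0;
}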


* [PATCH v7 3/3] powerpc/mm: Use instruction symbolic names in store_updates_sp()
  2018-05-22 14:02 [PATCH v7 1/3] powerpc/mm: Move get_user() out of store_updates_sp() Christophe Leroy
  2018-05-22 14:02 ` [PATCH v7 2/3] powerpc/mm: Only read faulting instruction when necessary in do_page_fault() Christophe Leroy
@ 2018-05-22 14:02 ` Christophe Leroy
  1 sibling, 0 replies; 8+ messages in thread
From: Christophe Leroy @ 2018-05-22 14:02 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, npiggin
  Cc: linux-kernel, linuxppc-dev

Use symbolic names defined in asm/ppc-opcode.h
instead of hardcoded values.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
 v7: New

 arch/powerpc/include/asm/ppc-opcode.h |  1 +
 arch/powerpc/mm/fault.c               | 26 +++++++++++++-------------
 2 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
index 18883b8a6dac..4436887bc415 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -162,6 +162,7 @@
 /* VMX Vector Store Instructions */
 #define OP_31_XOP_STVX          231
 
+#define OP_31   31
 #define OP_LWZ  32
 #define OP_STFS 52
 #define OP_STFSU 53
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index dc64b8e06477..20f696c10bb3 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -76,23 +76,23 @@ static bool store_updates_sp(unsigned int inst)
 		return false;
 	/* check major opcode */
 	switch (inst >> 26) {
-	case 37:	/* stwu */
-	case 39:	/* stbu */
-	case 45:	/* sthu */
-	case 53:	/* stfsu */
-	case 55:	/* stfdu */
+	case OP_STWU:
+	case OP_STBU:
+	case OP_STHU:
+	case OP_STFSU:
+	case OP_STFDU:
 		return true;
-	case 62:	/* std or stdu */
+	case OP_STD:	/* std or stdu */
 		return (inst & 3) == 1;
-	case 31:
+	case OP_31:
 		/* check minor opcode */
 		switch ((inst >> 1) & 0x3ff) {
-		case 181:	/* stdux */
-		case 183:	/* stwux */
-		case 247:	/* stbux */
-		case 439:	/* sthux */
-		case 695:	/* stfsux */
-		case 759:	/* stfdux */
+		case OP_31_XOP_STDUX:
+		case OP_31_XOP_STWUX:
+		case OP_31_XOP_STBUX:
+		case OP_31_XOP_STHUX:
+		case OP_31_XOP_STFSUX:
+		case OP_31_XOP_STFDUX:
 			return true;
 		}
 	}
-- 
2.13.3
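
For reference, the minor opcode that the OP_31 branch extracts with
(inst >> 1) & 0x3ff occupies bits 21-30 of the X-form encoding. A small
hand-rolled sketch follows; the xform() helper and the hand-computed expected
values in the comments are illustrative only, with just OP_31 and
OP_31_XOP_STWUX mirroring the constants used above:

#include <stdio.h>

#define OP_31		31
#define OP_31_XOP_STWUX	183

/* Illustrative X-form assembler: opcode | rS | rA | rB | XO (Rc = 0). */
static unsigned int xform(unsigned int op, unsigned int rs, unsigned int ra,
			  unsigned int rb, unsigned int xo)
{
	return (op << 26) | (rs << 21) | (ra << 16) | (rb << 11) | (xo << 1);
}

int main(void)
{
	/* "stwux r1,r1,r9": a store that updates r1 through rA = 1 */
	unsigned int inst = xform(OP_31, 1, 1, 9, OP_31_XOP_STWUX);

	printf("inst     = 0x%08x\n", inst);		/* 0x7c21496e */
	printf("major op = %u\n", inst >> 26);		/* 31 */
	printf("rA field = %u\n", (inst >> 16) & 0x1f);	/* 1 */
	printf("minor op = %u\n", (inst >> 1) & 0x3ff);	/* 183 */
	return 0;
}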


* Re: [PATCH v7 2/3] powerpc/mm: Only read faulting instruction when necessary in do_page_fault()
  2018-05-22 14:02 ` [PATCH v7 2/3] powerpc/mm: Only read faulting instruction when necessary in do_page_fault() Christophe Leroy
@ 2018-05-22 14:38   ` Nicholas Piggin
  2018-05-22 14:50     ` Christophe LEROY
  0 siblings, 1 reply; 8+ messages in thread
From: Nicholas Piggin @ 2018-05-22 14:38 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linux-kernel, linuxppc-dev

On Tue, 22 May 2018 16:02:56 +0200 (CEST)
Christophe Leroy <christophe.leroy@c-s.fr> wrote:

> Commit a7a9dcd882a67 ("powerpc: Avoid taking a data miss on every
> userspace instruction miss") has shown that limiting the read of
> faulting instruction to likely cases improves performance.
> 
> This patch goes further into this direction by limiting the read
> of the faulting instruction to the only cases where it is likely
> needed.
> 
> On an MPC885, with the same benchmark app as in the commit referred
> above, we see a reduction of about 3900 dTLB misses (approx 3%):
> 
> Before the patch:
>  Performance counter stats for './fault 500' (10 runs):
> 
>          683033312      cpu-cycles                                                    ( +-  0.03% )
>             134538      dTLB-load-misses                                              ( +-  0.03% )
>              46099      iTLB-load-misses                                              ( +-  0.02% )
>              19681      faults                                                        ( +-  0.02% )
> 
>        5.389747878 seconds time elapsed                                          ( +-  0.06% )
> 
> With the patch:
> 
>  Performance counter stats for './fault 500' (10 runs):
> 
>          682112862      cpu-cycles                                                    ( +-  0.03% )
>             130619      dTLB-load-misses                                              ( +-  0.03% )
>              46073      iTLB-load-misses                                              ( +-  0.05% )
>              19681      faults                                                        ( +-  0.01% )
> 
>        5.381342641 seconds time elapsed                                          ( +-  0.07% )
> 
> The proper work of the huge stack expansion was tested with the
> following app:
> 
> int main(int argc, char **argv)
> {
> 	char buf[1024 * 1025];
> 
> 	sprintf(buf, "Hello world !\n");
> 	printf(buf);
> 
> 	exit(0);
> }
> 
> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> ---
>  v7: Following comment from Nicholas on v6 on possibility of the page getting removed from the pagetables
>      between the fault and the read, I have reworked the patch in order to do the get_user() in
>      __do_page_fault() directly in order to reduce complexity compared to version v5

This is looking better, thanks.

> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
> index fcbb34431da2..dc64b8e06477 100644
> --- a/arch/powerpc/mm/fault.c
> +++ b/arch/powerpc/mm/fault.c
> @@ -450,9 +450,6 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
>  	 * can result in fault, which will cause a deadlock when called with
>  	 * mmap_sem held
>  	 */
> -	if (is_write && is_user)
> -		get_user(inst, (unsigned int __user *)regs->nip);
> -
>  	if (is_user)
>  		flags |= FAULT_FLAG_USER;
>  	if (is_write)
> @@ -498,6 +495,26 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
>  	if (unlikely(!(vma->vm_flags & VM_GROWSDOWN)))
>  		return bad_area(regs, address);
>  
> +	if (unlikely(is_write && is_user && address + 0x100000 < vma->vm_end &&
> +		     !inst)) {
> +		unsigned int __user *nip = (unsigned int __user *)regs->nip;
> +
> +		if (likely(access_ok(VERIFY_READ, nip, sizeof(inst)))) {
> +			int res;
> +
> +			pagefault_disable();
> +			res = __get_user_inatomic(inst, nip);
> +			pagefault_enable();
> +			if (unlikely(res)) {
> +				up_read(&mm->mmap_sem);
> +				res = __get_user(inst, nip);
> +				if (!res && inst)
> +					goto retry;

You're handling error here but the previous code did not?

> +				return bad_area_nosemaphore(regs, address);
> +			}
> +		}
> +	}

Would it be nicer to move all this up into bad_stack_expansion().
It would need a way to handle the retry and insn, but I think it
would still look better.

Thanks,
Nick


* Re: [PATCH v7 2/3] powerpc/mm: Only read faulting instruction when necessary in do_page_fault()
  2018-05-22 14:38   ` Nicholas Piggin
@ 2018-05-22 14:50     ` Christophe LEROY
  2018-05-23  6:29         ` Nicholas Piggin
  0 siblings, 1 reply; 8+ messages in thread
From: Christophe LEROY @ 2018-05-22 14:50 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linux-kernel, linuxppc-dev



On 22/05/2018 at 16:38, Nicholas Piggin wrote:
> On Tue, 22 May 2018 16:02:56 +0200 (CEST)
> Christophe Leroy <christophe.leroy@c-s.fr> wrote:
> 
>> Commit a7a9dcd882a67 ("powerpc: Avoid taking a data miss on every
>> userspace instruction miss") has shown that limiting the read of
>> faulting instruction to likely cases improves performance.
>>
>> This patch goes further into this direction by limiting the read
>> of the faulting instruction to the only cases where it is likely
>> needed.
>>
>> On an MPC885, with the same benchmark app as in the commit referred
>> above, we see a reduction of about 3900 dTLB misses (approx 3%):
>>
>> Before the patch:
>>   Performance counter stats for './fault 500' (10 runs):
>>
>>           683033312      cpu-cycles                                                    ( +-  0.03% )
>>              134538      dTLB-load-misses                                              ( +-  0.03% )
>>               46099      iTLB-load-misses                                              ( +-  0.02% )
>>               19681      faults                                                        ( +-  0.02% )
>>
>>         5.389747878 seconds time elapsed                                          ( +-  0.06% )
>>
>> With the patch:
>>
>>   Performance counter stats for './fault 500' (10 runs):
>>
>>           682112862      cpu-cycles                                                    ( +-  0.03% )
>>              130619      dTLB-load-misses                                              ( +-  0.03% )
>>               46073      iTLB-load-misses                                              ( +-  0.05% )
>>               19681      faults                                                        ( +-  0.01% )
>>
>>         5.381342641 seconds time elapsed                                          ( +-  0.07% )
>>
>> The proper work of the huge stack expansion was tested with the
>> following app:
>>
>> int main(int argc, char **argv)
>> {
>> 	char buf[1024 * 1025];
>>
>> 	sprintf(buf, "Hello world !\n");
>> 	printf(buf);
>>
>> 	exit(0);
>> }
>>
>> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
>> ---
>>   v7: Following comment from Nicholas on v6 on possibility of the page getting removed from the pagetables
>>       between the fault and the read, I have reworked the patch in order to do the get_user() in
>>       __do_page_fault() directly in order to reduce complexity compared to version v5
> 
> This is looking better, thanks.
> 
>> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
>> index fcbb34431da2..dc64b8e06477 100644
>> --- a/arch/powerpc/mm/fault.c
>> +++ b/arch/powerpc/mm/fault.c
>> @@ -450,9 +450,6 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
>>   	 * can result in fault, which will cause a deadlock when called with
>>   	 * mmap_sem held
>>   	 */
>> -	if (is_write && is_user)
>> -		get_user(inst, (unsigned int __user *)regs->nip);
>> -
>>   	if (is_user)
>>   		flags |= FAULT_FLAG_USER;
>>   	if (is_write)
>> @@ -498,6 +495,26 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
>>   	if (unlikely(!(vma->vm_flags & VM_GROWSDOWN)))
>>   		return bad_area(regs, address);
>>   
>> +	if (unlikely(is_write && is_user && address + 0x100000 < vma->vm_end &&
>> +		     !inst)) {
>> +		unsigned int __user *nip = (unsigned int __user *)regs->nip;
>> +
>> +		if (likely(access_ok(VERIFY_READ, nip, sizeof(inst)))) {
>> +			int res;
>> +
>> +			pagefault_disable();
>> +			res = __get_user_inatomic(inst, nip);
>> +			pagefault_enable();
>> +			if (unlikely(res)) {
>> +				up_read(&mm->mmap_sem);
>> +				res = __get_user(inst, nip);
>> +				if (!res && inst)
>> +					goto retry;
> 
> You're handling error here but the previous code did not?

The previous code did in store_updates_sp()

When I moved get_user() out of that function in the preceding patch, I did 
consider that if get_user() fails, inst will remain 0, which means that 
store_updates_sp() will return false if ever called.

Now, as the semaphore has been released, we really need to do something, 
because if we goto retry unconditionally, we may end up in an infinite 
loop, and we can't let it continue either as the semaphore is not held 
anymore.

> 
>> +				return bad_area_nosemaphore(regs, address);
>> +			}
>> +		}
>> +	}
> 
> Would it be nicer to move all this up into bad_stack_expansion().
> It would need a way to handle the retry and insn, but I think it
> would still look better.

That's what I did in v5 indeed, but it looked too complex to me at the 
end. Can you have a look at it 
(https://patchwork.ozlabs.org/patch/799053/) and tell me if you feel it 
better than v7, or if you have any suggestion to improve based on v5 
and/or v7 ?

Thanks
Christophe

> 
> Thanks,
> Nick
> 


* Re: [PATCH v7 2/3] powerpc/mm: Only read faulting instruction when necessary in do_page_fault()
  2018-05-22 14:50     ` Christophe LEROY
@ 2018-05-23  6:29         ` Nicholas Piggin
  0 siblings, 0 replies; 8+ messages in thread
From: Nicholas Piggin @ 2018-05-23  6:29 UTC (permalink / raw)
  To: Christophe LEROY
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linux-kernel, linuxppc-dev

On Tue, 22 May 2018 16:50:55 +0200
Christophe LEROY <christophe.leroy@c-s.fr> wrote:

> On 22/05/2018 at 16:38, Nicholas Piggin wrote:
> > On Tue, 22 May 2018 16:02:56 +0200 (CEST)
> > Christophe Leroy <christophe.leroy@c-s.fr> wrote:
> >   
> >> Commit a7a9dcd882a67 ("powerpc: Avoid taking a data miss on every
> >> userspace instruction miss") has shown that limiting the read of
> >> faulting instruction to likely cases improves performance.
> >>
> >> This patch goes further into this direction by limiting the read
> >> of the faulting instruction to the only cases where it is likely
> >> needed.
> >>
> >> On an MPC885, with the same benchmark app as in the commit referred
> >> above, we see a reduction of about 3900 dTLB misses (approx 3%):
> >>
> >> Before the patch:
> >>   Performance counter stats for './fault 500' (10 runs):
> >>
> >>           683033312      cpu-cycles                                                    ( +-  0.03% )
> >>              134538      dTLB-load-misses                                              ( +-  0.03% )
> >>               46099      iTLB-load-misses                                              ( +-  0.02% )
> >>               19681      faults                                                        ( +-  0.02% )
> >>
> >>         5.389747878 seconds time elapsed                                          ( +-  0.06% )
> >>
> >> With the patch:
> >>
> >>   Performance counter stats for './fault 500' (10 runs):
> >>
> >>           682112862      cpu-cycles                                                    ( +-  0.03% )
> >>              130619      dTLB-load-misses                                              ( +-  0.03% )
> >>               46073      iTLB-load-misses                                              ( +-  0.05% )
> >>               19681      faults                                                        ( +-  0.01% )
> >>
> >>         5.381342641 seconds time elapsed                                          ( +-  0.07% )
> >>
> >> The proper work of the huge stack expansion was tested with the
> >> following app:
> >>
> >> int main(int argc, char **argv)
> >> {
> >> 	char buf[1024 * 1025];
> >>
> >> 	sprintf(buf, "Hello world !\n");
> >> 	printf(buf);
> >>
> >> 	exit(0);
> >> }
> >>
> >> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> >> ---
> >>   v7: Following comment from Nicholas on v6 on possibility of the page getting removed from the pagetables
> >>       between the fault and the read, I have reworked the patch in order to do the get_user() in
> >>       __do_page_fault() directly in order to reduce complexity compared to version v5  
> > 
> > This is looking better, thanks.
> >   
> >> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
> >> index fcbb34431da2..dc64b8e06477 100644
> >> --- a/arch/powerpc/mm/fault.c
> >> +++ b/arch/powerpc/mm/fault.c
> >> @@ -450,9 +450,6 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
> >>   	 * can result in fault, which will cause a deadlock when called with
> >>   	 * mmap_sem held
> >>   	 */
> >> -	if (is_write && is_user)
> >> -		get_user(inst, (unsigned int __user *)regs->nip);
> >> -
> >>   	if (is_user)
> >>   		flags |= FAULT_FLAG_USER;
> >>   	if (is_write)
> >> @@ -498,6 +495,26 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
> >>   	if (unlikely(!(vma->vm_flags & VM_GROWSDOWN)))
> >>   		return bad_area(regs, address);
> >>   
> >> +	if (unlikely(is_write && is_user && address + 0x100000 < vma->vm_end &&
> >> +		     !inst)) {
> >> +		unsigned int __user *nip = (unsigned int __user *)regs->nip;
> >> +
> >> +		if (likely(access_ok(VERIFY_READ, nip, sizeof(inst)))) {
> >> +			int res;
> >> +
> >> +			pagefault_disable();
> >> +			res = __get_user_inatomic(inst, nip);
> >> +			pagefault_enable();
> >> +			if (unlikely(res)) {
> >> +				up_read(&mm->mmap_sem);
> >> +				res = __get_user(inst, nip);
> >> +				if (!res && inst)
> >> +					goto retry;  
> > 
> > You're handling error here but the previous code did not?  
> 
> The previous code did in store_updates_sp()
> 
> When I moved get_user() out of that function in the preceding patch, I did 
> consider that if get_user() fails, inst will remain 0, which means that 
> store_updates_sp() will return false if ever called.

Well, it handles it just by saying no, the store does not update SP.
Yours now segfaults it, doesn't it?

I don't think that's a bad idea, I think it should go in a patch by
itself though. In theory we can have execute but not read, I guess
that's not really going to work here either way and I don't know if
Linux exposes it ever.

> 
> Now, as the semaphore has been released, we really need to do something, 
> because if we goto retry unconditionally, we may end up in an infinite 
> loop, and we can't let it continue either as the semaphore is not held 
> anymore.
>
> >   
> >> +				return bad_area_nosemaphore(regs, address);
> >> +			}
> >> +		}
> >> +	}  
> > 
> > Would it be nicer to move all this up into bad_stack_expansion().
> > It would need a way to handle the retry and insn, but I think it
> > would still look better.  
> 
> That's what I did in v5 indeed, but it looked too complex to me at the 
> end. Can you have a look at it 
> (https://patchwork.ozlabs.org/patch/799053/) and tell me if you feel it 
> better than v7, or if you have any suggestion to improve based on v5 
> and/or v7 ?

Yeah I'm kind of liking that direction a bit more. I took your code
and hacked on it a bit more... Completely untested but I wonder what
you think?

We can put almost all the checking logic out of the main fault
path, and the retry stuff can fit into the unlikely failure
path. Also we only get_user at the last minute.

It does use fault_in_pages_readable which in theory means you might
repeat the loop if the page gets faulted out between retry, but that
same pattern exists in places in the filesystem code. Basically it
would be edge case thrashing and if it persists then the system is
already finished. So I think it's okay. Just makes the retry loop a
bit simpler.

Any thoughts?

Thanks,
Nick


diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index c01d627e687a..f0d36ec949b3 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -69,12 +69,8 @@ static inline bool notify_page_fault(struct pt_regs *regs)
  * Check whether the instruction at regs->nip is a store using
  * an update addressing form which will update r1.
  */
-static bool store_updates_sp(struct pt_regs *regs)
+static bool store_updates_sp(unsigned int inst)
 {
-	unsigned int inst;
-
-	if (get_user(inst, (unsigned int __user *)regs->nip))
-		return false;
 	/* check for 1 in the rA field */
 	if (((inst >> 16) & 0x1f) != 1)
 		return false;
@@ -233,10 +229,23 @@ static bool bad_kernel_fault(bool is_exec, unsigned long error_code,
 	return is_exec || (address >= TASK_SIZE);
 }
 
-static bool bad_stack_expansion(struct pt_regs *regs, unsigned long address,
+static bool bad_stack_expand(struct pt_regs *regs, unsigned long address,
 				struct vm_area_struct *vma,
-				bool store_update_sp)
+				bool *retry)
 {
+	unsigned int __user *nip = (unsigned int __user *)regs->nip;
+	struct pt_regs *uregs = current->thread.regs;
+	unsigned int inst;
+	int res;
+
+	/*
+	 * We want to do this outside mmap_sem, because reading code around nip
+	 * can result in fault, which will cause a deadlock when called with
+	 * mmap_sem held
+	 */
+	if (is_write && is_user)
+		store_update_sp = store_updates_sp(regs);
+
 	/*
 	 * N.B. The POWER/Open ABI allows programs to access up to
 	 * 288 bytes below the stack pointer.
@@ -246,26 +255,46 @@ static bool bad_stack_expansion(struct pt_regs *regs, unsigned long address,
 	 * before setting the user r1.  Thus we allow the stack to
 	 * expand to 1MB without further checks.
 	 */
-	if (address + 0x100000 < vma->vm_end) {
-		/* get user regs even if this fault is in kernel mode */
-		struct pt_regs *uregs = current->thread.regs;
-		if (uregs == NULL)
-			return true;
+	if (address + 0x100000 >= vma->vm_end)
+		return false;
 
-		/*
-		 * A user-mode access to an address a long way below
-		 * the stack pointer is only valid if the instruction
-		 * is one which would update the stack pointer to the
-		 * address accessed if the instruction completed,
-		 * i.e. either stwu rs,n(r1) or stwux rs,r1,rb
-		 * (or the byte, halfword, float or double forms).
-		 *
-		 * If we don't check this then any write to the area
-		 * between the last mapped region and the stack will
-		 * expand the stack rather than segfaulting.
-		 */
-		if (address + 2048 < uregs->gpr[1] && !store_update_sp)
-			return true;
+	/* get user regs even if this fault is in kernel mode */
+	if (unlikely(uregs == NULL)) {
+		*must_retry = false;
+		return true;
+	}
+
+	/*
+	 * A user-mode access to an address a long way below
+	 * the stack pointer is only valid if the instruction
+	 * is one which would update the stack pointer to the
+	 * address accessed if the instruction completed,
+	 * i.e. either stwu rs,n(r1) or stwux rs,r1,rb
+	 * (or the byte, halfword, float or double forms).
+	 *
+	 * If we don't check this then any write to the area
+	 * between the last mapped region and the stack will
+	 * expand the stack rather than segfaulting.
+	 */
+	if (address + 2048 >= uregs->gpr[1])
+		return false;
+
+	if (unlikely(!access_ok(VERIFY_READ, nip, sizeof(inst)))) {
+		*must_retry = true;
+		return true;
+	}
+
+	pagefault_disable();
+	res = __get_user_inatomic(inst, nip);
+	pagefault_enable();
+	if (unlikely(res)) {
+		*must_retry = true;
+		return true;
+	}
+
+	if (!store_updates_sp(inst)) {
+		*must_retry = true;
+		return true;
 	}
 	return false;
 }
@@ -403,7 +432,7 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 	int is_user = user_mode(regs);
 	int is_write = page_fault_is_write(error_code);
 	int fault, major = 0;
-	bool store_update_sp = false;
+	bool must_retry;
 
 	if (notify_page_fault(regs))
 		return 0;
@@ -449,14 +478,6 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 		return bad_key_fault_exception(regs, address,
 					       get_mm_addr_key(mm, address));
 
-	/*
-	 * We want to do this outside mmap_sem, because reading code around nip
-	 * can result in fault, which will cause a deadlock when called with
-	 * mmap_sem held
-	 */
-	if (is_write && is_user)
-		store_update_sp = store_updates_sp(regs);
-
 	if (is_user)
 		flags |= FAULT_FLAG_USER;
 	if (is_write)
@@ -503,8 +524,17 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
 		return bad_area(regs, address);
 
 	/* The stack is being expanded, check if it's valid */
-	if (unlikely(bad_stack_expansion(regs, address, vma, store_update_sp)))
+	if (unlikely(bad_stack_expand(regs, address, vma, &must_retry))) {
+		if (must_retry) {
+			up_read(&mm->mmap_sem);
+			if (fault_in_pages_readable(address, sizeof(unsigned int)))
+				return bad_area_nosemaphore(regs, address);
+			goto retry;
+		}
+
 		return bad_area(regs, address);
+	}
+
 
 	/* Try to expand it */
 	if (unlikely(expand_stack(vma, address)))



* Re: [PATCH v7 2/3] powerpc/mm: Only read faulting instruction when necessary in do_page_fault()
  2018-05-23  6:29         ` Nicholas Piggin
@ 2018-05-23  7:00         ` Christophe LEROY
  -1 siblings, 0 replies; 8+ messages in thread
From: Christophe LEROY @ 2018-05-23  7:00 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linux-kernel, linuxppc-dev



On 23/05/2018 at 08:29, Nicholas Piggin wrote:
> On Tue, 22 May 2018 16:50:55 +0200
> Christophe LEROY <christophe.leroy@c-s.fr> wrote:
> 
>> On 22/05/2018 at 16:38, Nicholas Piggin wrote:
>>> On Tue, 22 May 2018 16:02:56 +0200 (CEST)
>>> Christophe Leroy <christophe.leroy@c-s.fr> wrote:
>>>    
>>>> Commit a7a9dcd882a67 ("powerpc: Avoid taking a data miss on every
>>>> userspace instruction miss") has shown that limiting the read of
>>>> faulting instruction to likely cases improves performance.
>>>>
>>>> This patch goes further into this direction by limiting the read
>>>> of the faulting instruction to the only cases where it is likely
>>>> needed.
>>>>
>>>> On an MPC885, with the same benchmark app as in the commit referred
>>>> above, we see a reduction of about 3900 dTLB misses (approx 3%):
>>>>
>>>> Before the patch:
>>>>    Performance counter stats for './fault 500' (10 runs):
>>>>
>>>>            683033312      cpu-cycles                                                    ( +-  0.03% )
>>>>               134538      dTLB-load-misses                                              ( +-  0.03% )
>>>>                46099      iTLB-load-misses                                              ( +-  0.02% )
>>>>                19681      faults                                                        ( +-  0.02% )
>>>>
>>>>          5.389747878 seconds time elapsed                                          ( +-  0.06% )
>>>>
>>>> With the patch:
>>>>
>>>>    Performance counter stats for './fault 500' (10 runs):
>>>>
>>>>            682112862      cpu-cycles                                                    ( +-  0.03% )
>>>>               130619      dTLB-load-misses                                              ( +-  0.03% )
>>>>                46073      iTLB-load-misses                                              ( +-  0.05% )
>>>>                19681      faults                                                        ( +-  0.01% )
>>>>
>>>>          5.381342641 seconds time elapsed                                          ( +-  0.07% )
>>>>
>>>> The proper work of the huge stack expansion was tested with the
>>>> following app:
>>>>
>>>> int main(int argc, char **argv)
>>>> {
>>>> 	char buf[1024 * 1025];
>>>>
>>>> 	sprintf(buf, "Hello world !\n");
>>>> 	printf(buf);
>>>>
>>>> 	exit(0);
>>>> }
>>>>
>>>> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
>>>> ---
>>>>    v7: Following comment from Nicholas on v6 on possibility of the page getting removed from the pagetables
>>>>        between the fault and the read, I have reworked the patch in order to do the get_user() in
>>>>        __do_page_fault() directly in order to reduce complexity compared to version v5
>>>
>>> This is looking better, thanks.
>>>    
>>>> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
>>>> index fcbb34431da2..dc64b8e06477 100644
>>>> --- a/arch/powerpc/mm/fault.c
>>>> +++ b/arch/powerpc/mm/fault.c
>>>> @@ -450,9 +450,6 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
>>>>    	 * can result in fault, which will cause a deadlock when called with
>>>>    	 * mmap_sem held
>>>>    	 */
>>>> -	if (is_write && is_user)
>>>> -		get_user(inst, (unsigned int __user *)regs->nip);
>>>> -
>>>>    	if (is_user)
>>>>    		flags |= FAULT_FLAG_USER;
>>>>    	if (is_write)
>>>> @@ -498,6 +495,26 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
>>>>    	if (unlikely(!(vma->vm_flags & VM_GROWSDOWN)))
>>>>    		return bad_area(regs, address);
>>>>    
>>>> +	if (unlikely(is_write && is_user && address + 0x100000 < vma->vm_end &&
>>>> +		     !inst)) {
>>>> +		unsigned int __user *nip = (unsigned int __user *)regs->nip;
>>>> +
>>>> +		if (likely(access_ok(VERIFY_READ, nip, sizeof(inst)))) {
>>>> +			int res;
>>>> +
>>>> +			pagefault_disable();
>>>> +			res = __get_user_inatomic(inst, nip);
>>>> +			pagefault_enable();
>>>> +			if (unlikely(res)) {
>>>> +				up_read(&mm->mmap_sem);
>>>> +				res = __get_user(inst, nip);
>>>> +				if (!res && inst)
>>>> +					goto retry;
>>>
>>> You're handling error here but the previous code did not?
>>
>> The previous code did in store_updates_sp()
>>
>> When I moved get_user() out of that function in the preceding patch, I did
>> consider that if get_user() fails, inst will remain 0, which means that
>> store_updates_sp() will return false if ever called.
> 
> Well, it handles it just by saying no, the store does not update SP.
> Yours now segfaults it, doesn't it?

Yes, it segfaults the same way as before, as it tells the expansion is bad.

> 
> I don't think that's a bad idea, I think it should go in a patch by
> itself though. In theory we can have execute but not read, I guess
> that's not really going to work here either way and I don't know if
> Linux exposes it ever.

I don't understand what you mean; that's not different from before, is it?

> 
>>
>> Now, as the semaphore has been released, we really need to do something,
>> because if we goto retry unconditionally, we may end up in an infinite
>> loop, and we can't let it continue either as the semaphore is not held
>> anymore.
>>
>>>    
>>>> +				return bad_area_nosemaphore(regs, address);
>>>> +			}
>>>> +		}
>>>> +	}
>>>
>>> Would it be nicer to move all this up into bad_stack_expansion().
>>> It would need a way to handle the retry and insn, but I think it
>>> would still look better.
>>
>> That's what I did in v5 indeed, but it looked too complex to me at the
>> end. Can you have a look at it
>> (https://patchwork.ozlabs.org/patch/799053/) and tell me if you feel it
>> better than v7, or if you have any suggestion to improve based on v5
>> and/or v7 ?
> 
> Yeah I'm kind of liking that direction a bit more. I took your code
> and hacked on it a bit more... Completely untested but I wonder what
> you think?
> 
> We can put almost all the checking logic out of the main fault
> path, and the retry stuff can fit into the unlikely failure
> path. Also we only get_user at the last minute.
> 
> It does use fault_in_pages_readable which in theory means you might
> repeat the loop if the page gets faulted out between retry, but that
> same pattern exists in places in the filesystem code. Basically it
> would be edge case trashing and if it persists then the system is
> already finished. So I think it's okay. Just makes the retry loop a
> bit simpler.
> 
> Any thoughts?

Indeed, after writing you I looked at it once more and I think I ended 
up with something rather similar to what you are proposing here.
The complexity in v5 was because I left the get_user() in 
store_updates_sp(). By moving it up into bad_stack_expansion(), it looks 
better.
The main difference I see between your proposal and my v8 is that I do 
the up_read() in bad_stack_expansion(). Maybe that's not a good idea.

I'll release it in a few minutes, let me know what you think about it.

Thanks,
Christophe

> 
> Thanks,
> Nick
> 
> 
> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
> index c01d627e687a..f0d36ec949b3 100644
> --- a/arch/powerpc/mm/fault.c
> +++ b/arch/powerpc/mm/fault.c
> @@ -69,12 +69,8 @@ static inline bool notify_page_fault(struct pt_regs *regs)
>    * Check whether the instruction at regs->nip is a store using
>    * an update addressing form which will update r1.
>    */
> -static bool store_updates_sp(struct pt_regs *regs)
> +static bool store_updates_sp(unsigned int inst)
>   {
> -	unsigned int inst;
> -
> -	if (get_user(inst, (unsigned int __user *)regs->nip))
> -		return false;
>   	/* check for 1 in the rA field */
>   	if (((inst >> 16) & 0x1f) != 1)
>   		return false;
> @@ -233,10 +229,23 @@ static bool bad_kernel_fault(bool is_exec, unsigned long error_code,
>   	return is_exec || (address >= TASK_SIZE);
>   }
>   
> -static bool bad_stack_expansion(struct pt_regs *regs, unsigned long address,
> +static bool bad_stack_expand(struct pt_regs *regs, unsigned long address,
>   				struct vm_area_struct *vma,
> -				bool store_update_sp)
> +				bool *retry)
>   {
> +	unsigned int __user *nip = (unsigned int __user *)regs->nip;
> +	struct pt_regs *uregs = current->thread.regs;
> +	unsigned int inst;
> +	int res;
> +
> +	/*
> +	 * We want to do this outside mmap_sem, because reading code around nip
> +	 * can result in fault, which will cause a deadlock when called with
> +	 * mmap_sem held
> +	 */
> +	if (is_write && is_user)
> +		store_update_sp = store_updates_sp(regs);
> +
>   	/*
>   	 * N.B. The POWER/Open ABI allows programs to access up to
>   	 * 288 bytes below the stack pointer.
> @@ -246,26 +255,46 @@ static bool bad_stack_expansion(struct pt_regs *regs, unsigned long address,
>   	 * before setting the user r1.  Thus we allow the stack to
>   	 * expand to 1MB without further checks.
>   	 */
> -	if (address + 0x100000 < vma->vm_end) {
> -		/* get user regs even if this fault is in kernel mode */
> -		struct pt_regs *uregs = current->thread.regs;
> -		if (uregs == NULL)
> -			return true;
> +	if (address + 0x100000 >= vma->vm_end)
> +		return false;
>   
> -		/*
> -		 * A user-mode access to an address a long way below
> -		 * the stack pointer is only valid if the instruction
> -		 * is one which would update the stack pointer to the
> -		 * address accessed if the instruction completed,
> -		 * i.e. either stwu rs,n(r1) or stwux rs,r1,rb
> -		 * (or the byte, halfword, float or double forms).
> -		 *
> -		 * If we don't check this then any write to the area
> -		 * between the last mapped region and the stack will
> -		 * expand the stack rather than segfaulting.
> -		 */
> -		if (address + 2048 < uregs->gpr[1] && !store_update_sp)
> -			return true;
> +	/* get user regs even if this fault is in kernel mode */
> +	if (unlikely(uregs == NULL)) {
> +		*must_retry = false;
> +		return true;
> +	}
> +
> +	/*
> +	 * A user-mode access to an address a long way below
> +	 * the stack pointer is only valid if the instruction
> +	 * is one which would update the stack pointer to the
> +	 * address accessed if the instruction completed,
> +	 * i.e. either stwu rs,n(r1) or stwux rs,r1,rb
> +	 * (or the byte, halfword, float or double forms).
> +	 *
> +	 * If we don't check this then any write to the area
> +	 * between the last mapped region and the stack will
> +	 * expand the stack rather than segfaulting.
> +	 */
> +	if (address + 2048 >= uregs->gpr[1])
> +		return false;
> +
> +	if (unlikely(!access_ok(VERIFY_READ, nip, sizeof(inst)))) {
> +		*must_retry = true;
> +		return true;
> +	}
> +
> +	pagefault_disable();
> +	res = __get_user_inatomic(inst, nip);
> +	pagefault_enable();
> +	if (unlikely(res)) {
> +		*must_retry = true;
> +		return true;
> +	}
> +
> +	if (!store_updates_sp(inst)) {
> +		*must_retry = true;
> +		return true;
>   	}
>   	return false;
>   }
> @@ -403,7 +432,7 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
>   	int is_user = user_mode(regs);
>   	int is_write = page_fault_is_write(error_code);
>   	int fault, major = 0;
> -	bool store_update_sp = false;
> +	bool must_retry;
>   
>   	if (notify_page_fault(regs))
>   		return 0;
> @@ -449,14 +478,6 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
>   		return bad_key_fault_exception(regs, address,
>   					       get_mm_addr_key(mm, address));
>   
> -	/*
> -	 * We want to do this outside mmap_sem, because reading code around nip
> -	 * can result in fault, which will cause a deadlock when called with
> -	 * mmap_sem held
> -	 */
> -	if (is_write && is_user)
> -		store_update_sp = store_updates_sp(regs);
> -
>   	if (is_user)
>   		flags |= FAULT_FLAG_USER;
>   	if (is_write)
> @@ -503,8 +524,17 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
>   		return bad_area(regs, address);
>   
>   	/* The stack is being expanded, check if it's valid */
> -	if (unlikely(bad_stack_expansion(regs, address, vma, store_update_sp)))
> +	if (unlikely(bad_stack_expand(regs, address, vma, &must_retry))) {
> +		if (must_retry) {
> +			up_read(&mm->mmap_sem);
> +			if (fault_in_pages_readable(address, sizeof(unsigned int)))
> +				return bad_area_nosemaphore(regs, address);
> +			goto retry;
> +		}
> +
>   		return bad_area(regs, address);
> +	}
> +
>   
>   	/* Try to expand it */
>   	if (unlikely(expand_stack(vma, address)))
> 


