linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] powerpc/lpar: Don't crash on H_PROTECT errors
@ 2016-02-29 11:52 Anshuman Khandual
  2016-02-29 16:50 ` Tyrel Datwyler
  2016-02-29 23:01 ` Michael Ellerman
  0 siblings, 2 replies; 7+ messages in thread
From: Anshuman Khandual @ 2016-02-29 11:52 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aneesh.kumar, mpe

There are certain condition in which H_PROTECT can return error code
other than H_NOT_FOUND and H_SUCCESS. One such being an attempt to
update an hpte owned by adjunct partition. Return 0 in that case so
that user space will retry the access. In adjunct case this mean we
will not make much progress in the user space. But atleast we get a
chance to kill the task and avoid taking down the entire box.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/pseries/lpar.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index 477290a..31bcdaf 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -310,7 +310,7 @@ static long pSeries_lpar_hpte_updatepp(unsigned long slot,
 
 	pr_devel("ok\n");
 
-	BUG_ON(lpar_rc != H_SUCCESS);
+	WARN_RATELIMIT(lpar_rc != H_SUCCESS, "H_PROTECT returned %lu\n", lpar_rc);
 
 	return 0;
 }
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] powerpc/lpar: Don't crash on H_PROTECT errors
  2016-02-29 11:52 [PATCH] powerpc/lpar: Don't crash on H_PROTECT errors Anshuman Khandual
@ 2016-02-29 16:50 ` Tyrel Datwyler
  2016-03-01  4:34   ` Anshuman Khandual
  2016-03-01  5:27   ` Aneesh Kumar K.V
  2016-02-29 23:01 ` Michael Ellerman
  1 sibling, 2 replies; 7+ messages in thread
From: Tyrel Datwyler @ 2016-02-29 16:50 UTC (permalink / raw)
  To: Anshuman Khandual, linuxppc-dev; +Cc: aneesh.kumar

On 02/29/2016 03:52 AM, Anshuman Khandual wrote:
> There are certain condition in which H_PROTECT can return error code
> other than H_NOT_FOUND and H_SUCCESS. One such being an attempt to
> update an hpte owned by adjunct partition. Return 0 in that case so
> that user space will retry the access. In adjunct case this mean we
> will not make much progress in the user space. But atleast we get a
> chance to kill the task and avoid taking down the entire box.
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
> ---
>  arch/powerpc/platforms/pseries/lpar.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
> index 477290a..31bcdaf 100644
> --- a/arch/powerpc/platforms/pseries/lpar.c
> +++ b/arch/powerpc/platforms/pseries/lpar.c
> @@ -310,7 +310,7 @@ static long pSeries_lpar_hpte_updatepp(unsigned long slot,
>  
>  	pr_devel("ok\n");
>  
> -	BUG_ON(lpar_rc != H_SUCCESS);
> +	WARN_RATELIMIT(lpar_rc != H_SUCCESS, "H_PROTECT returned %lu\n", lpar_rc);

In the event that we don't get H_NOT_FOUND (which is handled earlier in
the function) or H_SUCCESS this patch assumes H_RESOURCE. It fails to
handle H_PARAMETER which is also a valid return from the H_PROTECT
hypercall.

-Tyrel

>  
>  	return 0;
>  }
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: powerpc/lpar: Don't crash on H_PROTECT errors
  2016-02-29 11:52 [PATCH] powerpc/lpar: Don't crash on H_PROTECT errors Anshuman Khandual
  2016-02-29 16:50 ` Tyrel Datwyler
@ 2016-02-29 23:01 ` Michael Ellerman
  2016-03-01  5:27   ` Anshuman Khandual
  1 sibling, 1 reply; 7+ messages in thread
From: Michael Ellerman @ 2016-02-29 23:01 UTC (permalink / raw)
  To: Anshuman Khandual, linuxppc-dev; +Cc: aneesh.kumar

On Mon, 2016-29-02 at 11:52:32 UTC, Anshuman Khandual wrote:
> There are certain condition in which H_PROTECT can return error code
> other than H_NOT_FOUND and H_SUCCESS. One such being an attempt to
> update an hpte owned by adjunct partition. Return 0 in that case so
> that user space will retry the access. In adjunct case this mean we
> will not make much progress in the user space. But atleast we get a
> chance to kill the task and avoid taking down the entire box.

Why is it OK to do nothing and return 0?

The function's contract is that it either does the update or returns -1, you
can't change that without auditing all callers - and describing that in the
change log.

cheers

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] powerpc/lpar: Don't crash on H_PROTECT errors
  2016-02-29 16:50 ` Tyrel Datwyler
@ 2016-03-01  4:34   ` Anshuman Khandual
  2016-03-01  5:27   ` Aneesh Kumar K.V
  1 sibling, 0 replies; 7+ messages in thread
From: Anshuman Khandual @ 2016-03-01  4:34 UTC (permalink / raw)
  To: Tyrel Datwyler, linuxppc-dev; +Cc: aneesh.kumar

On 02/29/2016 10:20 PM, Tyrel Datwyler wrote:
> On 02/29/2016 03:52 AM, Anshuman Khandual wrote:
>> There are certain condition in which H_PROTECT can return error code
>> other than H_NOT_FOUND and H_SUCCESS. One such being an attempt to
>> update an hpte owned by adjunct partition. Return 0 in that case so
>> that user space will retry the access. In adjunct case this mean we
>> will not make much progress in the user space. But atleast we get a
>> chance to kill the task and avoid taking down the entire box.
>>
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
>> Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/platforms/pseries/lpar.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
>> index 477290a..31bcdaf 100644
>> --- a/arch/powerpc/platforms/pseries/lpar.c
>> +++ b/arch/powerpc/platforms/pseries/lpar.c
>> @@ -310,7 +310,7 @@ static long pSeries_lpar_hpte_updatepp(unsigned long slot,
>>  
>>  	pr_devel("ok\n");
>>  
>> -	BUG_ON(lpar_rc != H_SUCCESS);
>> +	WARN_RATELIMIT(lpar_rc != H_SUCCESS, "H_PROTECT returned %lu\n", lpar_rc);
> 
> In the event that we don't get H_NOT_FOUND (which is handled earlier in
> the function) or H_SUCCESS this patch assumes H_RESOURCE. It fails to
> handle H_PARAMETER which is also a valid return from the H_PROTECT
> hypercall.

Right, I guess its okay to BUG_ON() if we detect H_PARAMETER return code
as the kernel should not have created wrong arguments to the hcall in
the first place. Although we will still want to return 0 from here for
H_RESOURCE return code with a WARN_RATELIMIT message. Will change the
code accordingly next time around.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] powerpc/lpar: Don't crash on H_PROTECT errors
  2016-02-29 16:50 ` Tyrel Datwyler
  2016-03-01  4:34   ` Anshuman Khandual
@ 2016-03-01  5:27   ` Aneesh Kumar K.V
  2016-03-01  5:46     ` Anshuman Khandual
  1 sibling, 1 reply; 7+ messages in thread
From: Aneesh Kumar K.V @ 2016-03-01  5:27 UTC (permalink / raw)
  To: Tyrel Datwyler, Anshuman Khandual, linuxppc-dev

Tyrel Datwyler <tyreld@linux.vnet.ibm.com> writes:

> On 02/29/2016 03:52 AM, Anshuman Khandual wrote:
>> There are certain condition in which H_PROTECT can return error code
>> other than H_NOT_FOUND and H_SUCCESS. One such being an attempt to
>> update an hpte owned by adjunct partition. Return 0 in that case so
>> that user space will retry the access. In adjunct case this mean we
>> will not make much progress in the user space. But atleast we get a
>> chance to kill the task and avoid taking down the entire box.
>> 
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
>> Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/platforms/pseries/lpar.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
>> index 477290a..31bcdaf 100644
>> --- a/arch/powerpc/platforms/pseries/lpar.c
>> +++ b/arch/powerpc/platforms/pseries/lpar.c
>> @@ -310,7 +310,7 @@ static long pSeries_lpar_hpte_updatepp(unsigned long slot,
>>  
>>  	pr_devel("ok\n");
>>  
>> -	BUG_ON(lpar_rc != H_SUCCESS);
>> +	WARN_RATELIMIT(lpar_rc != H_SUCCESS, "H_PROTECT returned %lu\n", lpar_rc);
>
> In the event that we don't get H_NOT_FOUND (which is handled earlier in
> the function) or H_SUCCESS this patch assumes H_RESOURCE. It fails to
> handle H_PARAMETER which is also a valid return from the H_PROTECT
> hypercall.

One of the possible thing we could do is sent SIGBUS to the application
rather than taking down the system ?

-aneesh

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: powerpc/lpar: Don't crash on H_PROTECT errors
  2016-02-29 23:01 ` Michael Ellerman
@ 2016-03-01  5:27   ` Anshuman Khandual
  0 siblings, 0 replies; 7+ messages in thread
From: Anshuman Khandual @ 2016-03-01  5:27 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev; +Cc: aneesh.kumar

On 03/01/2016 04:31 AM, Michael Ellerman wrote:
> On Mon, 2016-29-02 at 11:52:32 UTC, Anshuman Khandual wrote:
>> There are certain condition in which H_PROTECT can return error code
>> other than H_NOT_FOUND and H_SUCCESS. One such being an attempt to
>> update an hpte owned by adjunct partition. Return 0 in that case so
>> that user space will retry the access. In adjunct case this mean we
>> will not make much progress in the user space. But atleast we get a
>> chance to kill the task and avoid taking down the entire box.
> 
> Why is it OK to do nothing and return 0?
> 
> The function's contract is that it either does the update or returns -1, you

Right, the semantics of the function will change with it. The callers of
the function include these places.

(1) __hash_page_4K   (arch/powerpc/mm/hash64_4k.c)
(2) __hash_page_4K   (arch/powerpc/mm/hash64_64k.c)
(3) __hash_page_64K  (arch/powerpc/mm/hash64_64k.c)
(4) __hash_page_huge (arch/powerpc/mm/hugetlbpage-hash64.c)
(5) __hash_page_thp  (arch/powerpc/mm/hugepage-hash64.c)

All of them get called from hash_page_mm (in turn from hash_page)
which is triggered from page fault interrupt (0x300 or 0x400). A
return value of 0 from these individual HPTE management functions
will return 0 from hash_page_mm as well. This returns the control
back to the user space/kernel which would have caused the page
fault but without actually setting the right HPTE. This will cause
fault again. The semantics change does not affect these callers
in any bad way but fakes a correct page fault handling if I am not
missing anything.

But only __hash_page_4K and __hash_page_64K functions get called
from hash_preload (in turn from update_mmu_cache) which is called
from a lot of code path including linux page fault handling path
through handle_pte_fault. Inside hash_preload, the return value
from __hash_page_4K(|64K) function is checked only for -1, detecting
which hash_failure_debug is called. The return value is not passed
up in the call chain. With H_RESOURCE, hash_preload will pretend as
if it did what it was suppose to do and but none of it's callers will
explicitly know about it other than the WARN_RATELIMIT message. I
believed that hash_preload is a best attempt approach and anything
missed here will be corrected with future page faults when the process
starts executing. Please correct me if I am wrong.

> can't change that without auditing all callers - and describing that in the
> change log.

Did not get it, you would like to have something like the above code
path description in the change log ?

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] powerpc/lpar: Don't crash on H_PROTECT errors
  2016-03-01  5:27   ` Aneesh Kumar K.V
@ 2016-03-01  5:46     ` Anshuman Khandual
  0 siblings, 0 replies; 7+ messages in thread
From: Anshuman Khandual @ 2016-03-01  5:46 UTC (permalink / raw)
  To: Aneesh Kumar K.V, Tyrel Datwyler, linuxppc-dev

On 03/01/2016 10:57 AM, Aneesh Kumar K.V wrote:
> Tyrel Datwyler <tyreld@linux.vnet.ibm.com> writes:
> 
>> > On 02/29/2016 03:52 AM, Anshuman Khandual wrote:
>>> >> There are certain condition in which H_PROTECT can return error code
>>> >> other than H_NOT_FOUND and H_SUCCESS. One such being an attempt to
>>> >> update an hpte owned by adjunct partition. Return 0 in that case so
>>> >> that user space will retry the access. In adjunct case this mean we
>>> >> will not make much progress in the user space. But atleast we get a
>>> >> chance to kill the task and avoid taking down the entire box.
>>> >> 
>>> >> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
>>> >> Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
>>> >> ---
>>> >>  arch/powerpc/platforms/pseries/lpar.c | 2 +-
>>> >>  1 file changed, 1 insertion(+), 1 deletion(-)
>>> >> 
>>> >> diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
>>> >> index 477290a..31bcdaf 100644
>>> >> --- a/arch/powerpc/platforms/pseries/lpar.c
>>> >> +++ b/arch/powerpc/platforms/pseries/lpar.c
>>> >> @@ -310,7 +310,7 @@ static long pSeries_lpar_hpte_updatepp(unsigned long slot,
>>> >>  
>>> >>  	pr_devel("ok\n");
>>> >>  
>>> >> -	BUG_ON(lpar_rc != H_SUCCESS);
>>> >> +	WARN_RATELIMIT(lpar_rc != H_SUCCESS, "H_PROTECT returned %lu\n", lpar_rc);
>> >
>> > In the event that we don't get H_NOT_FOUND (which is handled earlier in
>> > the function) or H_SUCCESS this patch assumes H_RESOURCE. It fails to
>> > handle H_PARAMETER which is also a valid return from the H_PROTECT
>> > hypercall.
> One of the possible thing we could do is sent SIGBUS to the application
> rather than taking down the system ?

But in case of H_PARAMETER, its the kernel which generated wrong arguments
which was in it's control, hence BUG_ON will not more appropriate for it ?

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2016-03-01  5:46 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-29 11:52 [PATCH] powerpc/lpar: Don't crash on H_PROTECT errors Anshuman Khandual
2016-02-29 16:50 ` Tyrel Datwyler
2016-03-01  4:34   ` Anshuman Khandual
2016-03-01  5:27   ` Aneesh Kumar K.V
2016-03-01  5:46     ` Anshuman Khandual
2016-02-29 23:01 ` Michael Ellerman
2016-03-01  5:27   ` Anshuman Khandual

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).