All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] drm/amdgpu: correctly report gpu recover status
@ 2019-12-18  3:25 Evan Quan
  2019-12-18  9:56 ` Christian König
  0 siblings, 1 reply; 4+ messages in thread
From: Evan Quan @ 2019-12-18  3:25 UTC (permalink / raw)
  To: amd-gfx; +Cc: Evan Quan

Knowing whether gpu recovery was performed successfully or not
is important for our BACO development.

Change-Id: I0e3ca4dcb65a053eb26bc55ad7431e4a42e160de
Signed-off-by: Evan Quan <evan.quan@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index e9efee04ca23..5dff5c0dd882 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -743,9 +743,7 @@ static int amdgpu_debugfs_gpu_recover(struct seq_file *m, void *data)
 	struct amdgpu_device *adev = dev->dev_private;
 
 	seq_printf(m, "gpu recover\n");
-	amdgpu_device_gpu_recover(adev, NULL);
-
-	return 0;
+	return amdgpu_device_gpu_recover(adev, NULL);
 }
 
 static const struct drm_info_list amdgpu_debugfs_fence_list[] = {
-- 
2.24.0

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] drm/amdgpu: correctly report gpu recover status
  2019-12-18  3:25 [PATCH] drm/amdgpu: correctly report gpu recover status Evan Quan
@ 2019-12-18  9:56 ` Christian König
  2019-12-19  1:48   ` Quan, Evan
  0 siblings, 1 reply; 4+ messages in thread
From: Christian König @ 2019-12-18  9:56 UTC (permalink / raw)
  To: Evan Quan, amd-gfx

Am 18.12.19 um 04:25 schrieb Evan Quan:
> Knowing whether gpu recovery was performed successfully or not
> is important for our BACO development.
>
> Change-Id: I0e3ca4dcb65a053eb26bc55ad7431e4a42e160de
> Signed-off-by: Evan Quan <evan.quan@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 4 +---
>   1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> index e9efee04ca23..5dff5c0dd882 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> @@ -743,9 +743,7 @@ static int amdgpu_debugfs_gpu_recover(struct seq_file *m, void *data)
>   	struct amdgpu_device *adev = dev->dev_private;
>   
>   	seq_printf(m, "gpu recover\n");
> -	amdgpu_device_gpu_recover(adev, NULL);
> -
> -	return 0;
> +	return amdgpu_device_gpu_recover(adev, NULL);

NAK, what we could do here is the following:

r = amdgpu_device_gpu_recover(....);
seq_printf(m, "gpu recover %d\n", r);

But returning the error code from the GPU recovery to userspace doesn't 
make to much sense.

Christian.

>   }
>   
>   static const struct drm_info_list amdgpu_debugfs_fence_list[] = {

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 4+ messages in thread

* RE: [PATCH] drm/amdgpu: correctly report gpu recover status
  2019-12-18  9:56 ` Christian König
@ 2019-12-19  1:48   ` Quan, Evan
  2020-01-01 11:17     ` Christian König
  0 siblings, 1 reply; 4+ messages in thread
From: Quan, Evan @ 2019-12-19  1:48 UTC (permalink / raw)
  To: Koenig, Christian, amd-gfx

Hi Christian,

Here is some background for this change:
I'm debugging a random failure issue on baco reset.
I used a while loop to run the continuous baco reset tests and hope it can exit immediately on failure occurred.
However, due to wrong return value, it did not. And as you can image, the failure scene was ruined.

I can add this "seq_printf(m, "gpu recover %d\n", r);".
But still what I care more(which is also the easiest way to me) is the correct return value of the API.

Regards,
Evan
> -----Original Message-----
> From: Christian König <ckoenig.leichtzumerken@gmail.com>
> Sent: Wednesday, December 18, 2019 5:57 PM
> To: Quan, Evan <Evan.Quan@amd.com>; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH] drm/amdgpu: correctly report gpu recover status
> 
> Am 18.12.19 um 04:25 schrieb Evan Quan:
> > Knowing whether gpu recovery was performed successfully or not is
> > important for our BACO development.
> >
> > Change-Id: I0e3ca4dcb65a053eb26bc55ad7431e4a42e160de
> > Signed-off-by: Evan Quan <evan.quan@amd.com>
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 4 +---
> >   1 file changed, 1 insertion(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > index e9efee04ca23..5dff5c0dd882 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > @@ -743,9 +743,7 @@ static int amdgpu_debugfs_gpu_recover(struct
> seq_file *m, void *data)
> >   	struct amdgpu_device *adev = dev->dev_private;
> >
> >   	seq_printf(m, "gpu recover\n");
> > -	amdgpu_device_gpu_recover(adev, NULL);
> > -
> > -	return 0;
> > +	return amdgpu_device_gpu_recover(adev, NULL);
> 
> NAK, what we could do here is the following:
> 
> r = amdgpu_device_gpu_recover(....);
> seq_printf(m, "gpu recover %d\n", r);
> 
> But returning the error code from the GPU recovery to userspace doesn't make
> to much sense.
> 
> Christian.
> 
> >   }
> >
> >   static const struct drm_info_list amdgpu_debugfs_fence_list[] = {

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] drm/amdgpu: correctly report gpu recover status
  2019-12-19  1:48   ` Quan, Evan
@ 2020-01-01 11:17     ` Christian König
  0 siblings, 0 replies; 4+ messages in thread
From: Christian König @ 2020-01-01 11:17 UTC (permalink / raw)
  To: Quan, Evan, amd-gfx

Hi Evan,

> But still what I care more(which is also the easiest way to me) is the correct return value of the API.
Well exactly that's the point ther return value is not correct for the API.

For example when the GPU reset function would return -EFAULT your 
program which reads the debugfs file would crash with a segmentation 
fault. That is not correct behavior.

In other words the result of the GPU reset can't be used as result of 
the debugfs read.

Regards,
Christian.

Am 19.12.19 um 02:48 schrieb Quan, Evan:
> Hi Christian,
>
> Here is some background for this change:
> I'm debugging a random failure issue on baco reset.
> I used a while loop to run the continuous baco reset tests and hope it can exit immediately on failure occurred.
> However, due to wrong return value, it did not. And as you can image, the failure scene was ruined.
>
> I can add this "seq_printf(m, "gpu recover %d\n", r);".
> But still what I care more(which is also the easiest way to me) is the correct return value of the API.
>
> Regards,
> Evan
>> -----Original Message-----
>> From: Christian König <ckoenig.leichtzumerken@gmail.com>
>> Sent: Wednesday, December 18, 2019 5:57 PM
>> To: Quan, Evan <Evan.Quan@amd.com>; amd-gfx@lists.freedesktop.org
>> Subject: Re: [PATCH] drm/amdgpu: correctly report gpu recover status
>>
>> Am 18.12.19 um 04:25 schrieb Evan Quan:
>>> Knowing whether gpu recovery was performed successfully or not is
>>> important for our BACO development.
>>>
>>> Change-Id: I0e3ca4dcb65a053eb26bc55ad7431e4a42e160de
>>> Signed-off-by: Evan Quan <evan.quan@amd.com>
>>> ---
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 4 +---
>>>    1 file changed, 1 insertion(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>> index e9efee04ca23..5dff5c0dd882 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>> @@ -743,9 +743,7 @@ static int amdgpu_debugfs_gpu_recover(struct
>> seq_file *m, void *data)
>>>    	struct amdgpu_device *adev = dev->dev_private;
>>>
>>>    	seq_printf(m, "gpu recover\n");
>>> -	amdgpu_device_gpu_recover(adev, NULL);
>>> -
>>> -	return 0;
>>> +	return amdgpu_device_gpu_recover(adev, NULL);
>> NAK, what we could do here is the following:
>>
>> r = amdgpu_device_gpu_recover(....);
>> seq_printf(m, "gpu recover %d\n", r);
>>
>> But returning the error code from the GPU recovery to userspace doesn't make
>> to much sense.
>>
>> Christian.
>>
>>>    }
>>>
>>>    static const struct drm_info_list amdgpu_debugfs_fence_list[] = {

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2020-01-01 11:17 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-18  3:25 [PATCH] drm/amdgpu: correctly report gpu recover status Evan Quan
2019-12-18  9:56 ` Christian König
2019-12-19  1:48   ` Quan, Evan
2020-01-01 11:17     ` Christian König

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.