* [PATCH] drm/amdgpu: no job timeout setting on compute queues
@ 2018-03-16  4:52 Evan Quan
       [not found] ` <1521175952-21758-1-git-send-email-evan.quan-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Evan Quan @ 2018-03-16  4:52 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Alexander.Deucher-5C7GfCeVMHo, Evan Quan

Under some heavy compute test (dgemm) environments, it may take
the ASIC more than 50 seconds to finish a single dispatched job,
which triggers the timeout. This is quite annoying, although it
does not seem to cause any real problems.
As a quick workaround, we choose not to enforce the timeout
setting on compute queues.

Change-Id: I210011a90898617367e897a90e9f8fb2639281a3
Signed-off-by: Evan Quan <evan.quan@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index 008e198..455a81e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -435,7 +435,9 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring,
 	if (ring->funcs->type != AMDGPU_RING_TYPE_KIQ) {
 		r = drm_sched_init(&ring->sched, &amdgpu_sched_ops,
 				   num_hw_submission, amdgpu_job_hang_limit,
-				   msecs_to_jiffies(amdgpu_lockup_timeout), ring->name);
+				   (ring->funcs->type == AMDGPU_RING_TYPE_COMPUTE) ?
+				   MAX_SCHEDULE_TIMEOUT : msecs_to_jiffies(amdgpu_lockup_timeout),
+				   ring->name);
 		if (r) {
 			DRM_ERROR("Failed to create scheduler on ring %s.\n",
 				  ring->name);
-- 
2.7.4
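
For readers without a kernel tree at hand, the selection made by the hunk above can be modelled in plain userspace C. This is only a sketch under stated assumptions: every `sketch_`/`SKETCH_` name is hypothetical, `SKETCH_HZ` is an arbitrary tick rate, and `SKETCH_MAX_SCHEDULE_TIMEOUT` merely stands in for the kernel's "wait forever" sentinel. It is not the driver code.

```c
#include <limits.h>

/* Hypothetical stand-ins for kernel definitions (assumptions for illustration). */
#define SKETCH_HZ 250L                        /* assumed tick rate */
#define SKETCH_MAX_SCHEDULE_TIMEOUT LONG_MAX  /* stand-in for "never time out" */

/* Hypothetical ring types, loosely mirroring the AMDGPU_RING_TYPE_* names. */
enum sketch_ring_type {
	SKETCH_RING_GFX,
	SKETCH_RING_COMPUTE,
	SKETCH_RING_SDMA
};

/* Convert milliseconds to ticks, rounding up (simplified model). */
long sketch_msecs_to_jiffies(long msecs)
{
	return (msecs * SKETCH_HZ + 999L) / 1000L;
}

/*
 * Pick the scheduler timeout for a ring: compute rings get the
 * "never time out" sentinel, every other ring gets the configured
 * lockup timeout converted to ticks.
 */
long sketch_ring_timeout(enum sketch_ring_type type, long lockup_timeout_ms)
{
	return (type == SKETCH_RING_COMPUTE) ?
		SKETCH_MAX_SCHEDULE_TIMEOUT :
		sketch_msecs_to_jiffies(lockup_timeout_ms);
}
```

With a 10000 ms lockup timeout, a gfx ring in this model gets a finite tick count while a compute ring gets the sentinel, which is the behavioural difference the patch introduces.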

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


* Re: [PATCH] drm/amdgpu: no job timeout setting on compute queues
       [not found] ` <1521175952-21758-1-git-send-email-evan.quan-5C7GfCeVMHo@public.gmane.org>
@ 2018-03-16 16:14   ` Deucher, Alexander
       [not found]     ` <CY4PR12MB1653E72F547ABAECAC0A91A3F7D70-rpdhrqHFk06apTa93KjAaQdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Deucher, Alexander @ 2018-03-16 16:14 UTC (permalink / raw)
  To: Quan, Evan, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW


Since GPU reset is not enabled yet anyway, a timeout will just print a message. Can we just change amdgpu_lockup_timeout to MAX_SCHEDULE_TIMEOUT until we enable GPU reset?


Alex


* Re: [PATCH] drm/amdgpu: no job timeout setting on compute queues
       [not found]     ` <CY4PR12MB1653E72F547ABAECAC0A91A3F7D70-rpdhrqHFk06apTa93KjAaQdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
@ 2018-03-16 16:25       ` Michel Dänzer
       [not found]         ` <bc7b30b9-7230-fdfe-4fd1-0f4dcd26a38d-otUistvHUpPR7s880joybQ@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Michel Dänzer @ 2018-03-16 16:25 UTC (permalink / raw)
  To: Deucher, Alexander, Quan, Evan; +Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On 2018-03-16 05:14 PM, Deucher, Alexander wrote:
> Since GPU reset is not enabled yet anyway, a timeout will just print a
> message, can we just change amdgpu_lockup_timeout to
> MAX_SCHEDULE_TIMEOUT until we enable GPU reset?

Wouldn't that be rather surprising for somebody who sets
amdgpu.gpu_recovery=1 ?


-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer

* Re: [PATCH] drm/amdgpu: no job timeout setting on compute queues
       [not found]         ` <bc7b30b9-7230-fdfe-4fd1-0f4dcd26a38d-otUistvHUpPR7s880joybQ@public.gmane.org>
@ 2018-03-16 17:17           ` Deucher, Alexander
       [not found]             ` <CY4PR12MB16535036642155291C39F030F7D70-rpdhrqHFk06apTa93KjAaQdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Deucher, Alexander @ 2018-03-16 17:17 UTC (permalink / raw)
  To: Michel Dänzer, Quan, Evan; +Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW


How about something like this:

/* Disable timeout when GPU reset is disabled to avoid confusing timeout messages in the kernel log */
if (amdgpu_gpu_recovery == 0 || amdgpu_gpu_recovery == -1)
	amdgpu_lockup_timeout = MAX_SCHEDULE_TIMEOUT;


Alex
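
The suggestion above can be tried out as a small standalone C model. This is a sketch only: the struct and function names are hypothetical, the field is declared `long` so that the sentinel fits (the real parameter's type is not shown in this thread), and reading `gpu_recovery` as 1 = enabled, 0 = disabled, -1 = auto is an assumption based on how the values are used in the discussion.

```c
#include <limits.h>

/* Hypothetical stand-in for the kernel's "wait forever" sentinel. */
#define SKETCH_MAX_SCHEDULE_TIMEOUT LONG_MAX

/* Hypothetical container for the two module parameters discussed here. */
struct sketch_params {
	int gpu_recovery;     /* assumed: 1 = enabled, 0 = disabled, -1 = auto */
	long lockup_timeout;  /* modelled as long so the sentinel fits */
};

/*
 * Disable the job timeout when GPU reset is not explicitly enabled,
 * so that no timeout message is logged in that configuration.
 */
void sketch_apply_timeout_policy(struct sketch_params *p)
{
	if (p->gpu_recovery == 0 || p->gpu_recovery == -1)
		p->lockup_timeout = SKETCH_MAX_SCHEDULE_TIMEOUT;
}
```

In this model a user who sets recovery to 1 keeps the configured timeout, which is the case raised in the earlier reply.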


* RE: [PATCH] drm/amdgpu: no job timeout setting on compute queues
       [not found]             ` <CY4PR12MB16535036642155291C39F030F7D70-rpdhrqHFk06apTa93KjAaQdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
@ 2018-03-19  1:46               ` Quan, Evan
       [not found]                 ` <DM5PR1201MB2489E0DEA55F61E4314CE1EEE4D40-grEf7a3NxMAAZHT/xKzwlGrFom/aUZj6nBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Quan, Evan @ 2018-03-19  1:46 UTC (permalink / raw)
  To: Deucher, Alexander, Michel Dänzer
  Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW


That's fine with me. I will update the patch accordingly.

Regards,
Evan

* RE: [PATCH] drm/amdgpu: no job timeout setting on compute queues
       [not found]                 ` <DM5PR1201MB2489E0DEA55F61E4314CE1EEE4D40-grEf7a3NxMAAZHT/xKzwlGrFom/aUZj6nBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
@ 2018-03-26 12:52                   ` Liu, Monk
  2018-03-26 12:55                   ` Liu, Monk
  1 sibling, 0 replies; 10+ messages in thread
From: Liu, Monk @ 2018-03-26 12:52 UTC (permalink / raw)
  To: Quan, Evan, Deucher, Alexander, Michel Dänzer
  Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW


Please don’t do it for SR-IOV

/Monk


* RE: [PATCH] drm/amdgpu: no job timeout setting on compute queues
       [not found]                 ` <DM5PR1201MB2489E0DEA55F61E4314CE1EEE4D40-grEf7a3NxMAAZHT/xKzwlGrFom/aUZj6nBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
  2018-03-26 12:52                   ` Liu, Monk
@ 2018-03-26 12:55                   ` Liu, Monk
       [not found]                     ` <BLUPR12MB0449F32EA2C9877B7C0BCA2984AD0-7LeqcoF/hwpTIQvHjXdJlwdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
  1 sibling, 1 reply; 10+ messages in thread
From: Liu, Monk @ 2018-03-26 12:55 UTC (permalink / raw)
  To: Quan, Evan, Deucher, Alexander, Michel Dänzer
  Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW


Besides, if some compute shader takes more than 50 seconds, you can just set the lockup timeout to 50s.
Why change the logic on the KMD side?

I don't think it's a good idea to disable the timeout message for the compute ring; we have virtualization end users
who still want those messages printed out.

Can you do it this way?
In amdgpu_job_timeout, can you use DRM_WARN instead of DRM_ERROR for jobs that belong to the CPC engine?

/Monk
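
The alternative proposed above (keep the timeout, lower the severity of the message for compute jobs) might look roughly like the following userspace model. It is a sketch under assumptions: all names are hypothetical, and treating "CPC engine" as the compute ring is an interpretation of the message rather than something the thread spells out.

```c
/* Hypothetical ring types, loosely mirroring the AMDGPU_RING_TYPE_* names. */
enum sketch_ring_type {
	SKETCH_RING_GFX,
	SKETCH_RING_COMPUTE,
	SKETCH_RING_SDMA
};

/* Hypothetical severities standing in for DRM_ERROR and DRM_WARN. */
enum sketch_log_level {
	SKETCH_LOG_ERROR,
	SKETCH_LOG_WARN
};

/*
 * Choose the severity of the job-timeout message: compute jobs are
 * reported as warnings, jobs on every other ring remain errors.
 */
enum sketch_log_level sketch_timeout_log_level(enum sketch_ring_type type)
{
	return (type == SKETCH_RING_COMPUTE) ? SKETCH_LOG_WARN : SKETCH_LOG_ERROR;
}

/* Human-readable name for a severity, for use in a log prefix. */
const char *sketch_log_level_name(enum sketch_log_level level)
{
	return (level == SKETCH_LOG_WARN) ? "warn" : "error";
}
```

The point of this design is that the message is still emitted for every ring, so virtualization users keep their diagnostics; only the severity changes.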


* Re: [PATCH] drm/amdgpu: no job timeout setting on compute queues
       [not found]                     ` <BLUPR12MB0449F32EA2C9877B7C0BCA2984AD0-7LeqcoF/hwpTIQvHjXdJlwdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
@ 2018-03-26 15:19                       ` Deucher, Alexander
       [not found]                         ` <BN6PR12MB18098A39176CC8C063A9BF7AF7AD0-/b2+HYfkarSEx6ez0IUAagdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Deucher, Alexander @ 2018-03-26 15:19 UTC (permalink / raw)
  To: Liu, Monk, Quan, Evan, Michel Dänzer
  Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW


That's fine with me too.  Or make them DRM_INFO.


Alex


* RE: [PATCH] drm/amdgpu: no job timeout setting on compute queues
       [not found]                         ` <BN6PR12MB18098A39176CC8C063A9BF7AF7AD0-/b2+HYfkarSEx6ez0IUAagdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
@ 2018-03-27  1:33                           ` Quan, Evan
       [not found]                             ` <DM5PR1201MB248935C8971B1D451391F294E4AC0-grEf7a3NxMAAZHT/xKzwlGrFom/aUZj6nBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Quan, Evan @ 2018-03-27  1:33 UTC (permalink / raw)
  To: Deucher, Alexander, Liu, Monk, Michel Dänzer
  Cc: Koenig, Christian, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW


Hi Monk,

That's a fix reached after a long discussion (I believe you were in that mail thread).
50s is for the dgemm test. We are not sure whether it's enough for other compute use cases.
If virtualization still needs these messages, I believe we can list that as an exception:

-                                  (ring->funcs->type == AMDGPU_RING_TYPE_COMPUTE) ?
+                                  /* for non-sriov case, no timeout enforce on compute ring */
+                                  ((ring->funcs->type == AMDGPU_RING_TYPE_COMPUTE) && !amdgpu_sriov_vf(ring->adev)) ?

Regards,
Evan
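
The revised condition above, with SR-IOV carved out as an exception, can be modelled in standalone C as follows. This is a sketch only: the `sketch_`/`SKETCH_` names are hypothetical, the tick rate is an assumed value, and the `is_sriov_vf` flag stands in for the result of `amdgpu_sriov_vf(ring->adev)` in the fragment.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel definitions (assumptions for illustration). */
#define SKETCH_HZ 250L                        /* assumed tick rate */
#define SKETCH_MAX_SCHEDULE_TIMEOUT LONG_MAX  /* stand-in for "never time out" */

/* Hypothetical ring types, loosely mirroring the AMDGPU_RING_TYPE_* names. */
enum sketch_ring_type {
	SKETCH_RING_GFX,
	SKETCH_RING_COMPUTE
};

/* Convert milliseconds to ticks, rounding up (simplified model). */
long sketch_msecs_to_jiffies(long msecs)
{
	return (msecs * SKETCH_HZ + 999L) / 1000L;
}

/*
 * For the non-SR-IOV case, do not enforce a timeout on compute rings.
 * Under SR-IOV, and for all other rings, keep the configured timeout.
 */
long sketch_ring_timeout(enum sketch_ring_type type, bool is_sriov_vf,
			 long lockup_timeout_ms)
{
	if (type == SKETCH_RING_COMPUTE && !is_sriov_vf)
		return SKETCH_MAX_SCHEDULE_TIMEOUT;
	return sketch_msecs_to_jiffies(lockup_timeout_ms);
}
```

A compute ring on bare metal gets the sentinel in this model, while the same ring in an SR-IOV virtual function keeps a finite timeout and therefore still produces the timeout message.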

* RE: [PATCH] drm/amdgpu: no job timeout setting on compute queues
       [not found]                             ` <DM5PR1201MB248935C8971B1D451391F294E4AC0-grEf7a3NxMAAZHT/xKzwlGrFom/aUZj6nBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
@ 2018-03-27  3:22                               ` Liu, Monk
  0 siblings, 0 replies; 10+ messages in thread
From: Liu, Monk @ 2018-03-27  3:22 UTC (permalink / raw)
  To: Quan, Evan, Deucher, Alexander, Michel Dänzer
  Cc: Koenig, Christian, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW


OK, fine by me.

/Monk
