From: "Quan, Evan" <Evan.Quan-5C7GfCeVMHo@public.gmane.org>
To: "Koenig,
	Christian" <Christian.Koenig-5C7GfCeVMHo@public.gmane.org>,
	"amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org"
	<amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org>
Cc: "Deucher, Alexander" <Alexander.Deucher-5C7GfCeVMHo@public.gmane.org>
Subject: RE: [PATCH] drm/amdgpu: disable job timeout on GPU reset disabled
Date: Tue, 20 Mar 2018 02:11:24 +0000
Message-ID: <DM5PR1201MB248999A09FAAC36F90204EF9E4AB0@DM5PR1201MB2489.namprd12.prod.outlook.com>
In-Reply-To: <d7a88e66-6533-9c12-c36c-9b3ea569e354-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 2283 bytes --]

Hi Christian,

The messages printed on timeout are errors, not just warnings, although we did not see any real problem (in the dgemm special case). That is why we call them confusing.
And I suppose you want a fix like my previous patch (see attachment).
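
For illustration only, the per-ring choice made in the attached patch can be modelled in standalone C. This is a rough sketch, not driver code: `enum ring_type`, `NO_TIMEOUT`, `ms_to_ticks()` and `ring_timeout_ticks()` are hypothetical names standing in for `ring->funcs->type`, `MAX_SCHEDULE_TIMEOUT`, `msecs_to_jiffies()` and the expression passed to `drm_sched_init()`, and the tick rate is passed in explicitly instead of coming from the kernel configuration.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-ins for the driver's ring types; not the real enum. */
enum ring_type { RING_GFX, RING_COMPUTE, RING_KIQ };

/* Stand-in for the kernel's MAX_SCHEDULE_TIMEOUT, which is LONG_MAX. */
#define NO_TIMEOUT LONG_MAX

/* Simplified model of msecs_to_jiffies(): convert ms to ticks, rounding up. */
static long ms_to_ticks(long ms, long hz)
{
	assert(ms >= 0 && hz > 0);
	return (ms * hz + 999) / 1000;
}

/*
 * Pick the scheduler timeout for a ring: compute rings never time out,
 * every other ring uses the configured lockup timeout.
 */
static long ring_timeout_ticks(enum ring_type type, long lockup_timeout_ms,
			       long hz)
{
	if (type == RING_COMPUTE)
		return NO_TIMEOUT;
	return ms_to_ticks(lockup_timeout_ms, hz);
}
```

With a 10000 ms lockup timeout and an assumed tick rate of 250 Hz, a gfx ring would get 2500 ticks while a compute ring would get `NO_TIMEOUT`. A separate, finite compute timeout would replace the `NO_TIMEOUT` branch with a second `ms_to_ticks()` call on its own parameter.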

Regards,
Evan
> -----Original Message-----
> From: Christian König [mailto:ckoenig.leichtzumerken@gmail.com]
> Sent: Monday, March 19, 2018 5:42 PM
> To: Quan, Evan <Evan.Quan@amd.com>; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander <Alexander.Deucher@amd.com>
> Subject: Re: [PATCH] drm/amdgpu: disable job timeout on GPU reset
> disabled
> 
> On 19.03.2018 at 07:08, Evan Quan wrote:
> > Under some heavy compute workloads (the dgemm test), it takes the
> > ASIC over 10+ seconds to finish a single dispatched job, which
> > triggers the timeout. This is quite confusing, although it does not
> > seem to cause any real problems.
> > As a quick workaround, we choose to disable the timeout when GPU reset
> > is disabled.
> 
> NAK, I enabled those warnings intentionally, even when GPU recovery is
> disabled, to have a hint in the logs about what goes wrong.
> 
> Please only increase the timeout for the compute queue and/or add a
> separate timeout for them.
> 
> Regards,
> Christian.
> 
> 
> >
> > Change-Id: I3a95d856ba4993094dc7b6269649e470c5b053d2
> > Signed-off-by: Evan Quan <evan.quan@amd.com>
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 7 +++++++
> >   1 file changed, 7 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > index 8bd9c3f..9d6a775 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > @@ -861,6 +861,13 @@ static void amdgpu_device_check_arguments(struct amdgpu_device *adev)
> >   		amdgpu_lockup_timeout = 10000;
> >   	}
> >
> > +	/*
> > +	 * Disable timeout when GPU reset is disabled to avoid confusing
> > +	 * timeout messages in the kernel log.
> > +	 */
> > +	if (amdgpu_gpu_recovery == 0 || amdgpu_gpu_recovery == -1)
> > +		amdgpu_lockup_timeout = INT_MAX;
> > +
> >   	adev->firmware.load_type = amdgpu_ucode_get_load_type(adev, amdgpu_fw_load_type);
> >   }
> >


[-- Attachment #2: Type: message/rfc822, Size: 4878 bytes --]

From: "Quan, Evan" <Evan.Quan-5C7GfCeVMHo@public.gmane.org>
To: "amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org" <amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org>
Cc: "Deucher, Alexander" <Alexander.Deucher-5C7GfCeVMHo@public.gmane.org>, "Quan, Evan" <Evan.Quan-5C7GfCeVMHo@public.gmane.org>
Subject: [PATCH] drm/amdgpu: no job timeout setting on compute queues
Date: Fri, 16 Mar 2018 04:52:32 +0000
Message-ID: <1521175952-21758-1-git-send-email-evan.quan-5C7GfCeVMHo@public.gmane.org>

Under some heavy compute tests (dgemm), it may take the ASIC
over 50+ seconds to finish a single dispatched job, which
triggers the timeout. This is quite annoying, although it
does not seem to cause any real problems.
As a quick workaround, we choose not to enforce the timeout
setting on compute queues.

Change-Id: I210011a90898617367e897a90e9f8fb2639281a3
Signed-off-by: Evan Quan <evan.quan-5C7GfCeVMHo@public.gmane.org>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index 008e198..455a81e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -435,7 +435,9 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring,
        if (ring->funcs->type != AMDGPU_RING_TYPE_KIQ) {
                r = drm_sched_init(&ring->sched, &amdgpu_sched_ops,
                                   num_hw_submission, amdgpu_job_hang_limit,
-                                  msecs_to_jiffies(amdgpu_lockup_timeout), ring->name);
+                                  (ring->funcs->type == AMDGPU_RING_TYPE_COMPUTE) ?
+                                  MAX_SCHEDULE_TIMEOUT : msecs_to_jiffies(amdgpu_lockup_timeout),
+                                  ring->name);
                if (r) {
                        DRM_ERROR("Failed to create scheduler on ring %s.\n",
                                  ring->name);
--
2.7.4

