All of lore.kernel.org
 help / color / mirror / Atom feed
From: Daniel Vetter <daniel@ffwll.ch>
To: "Christian König" <christian.koenig@amd.com>
Cc: Rob Clark <robdclark@gmail.com>,
	Rob Clark <robdclark@chromium.org>,
	Jonathan Marek <jonathan@marek.ca>,
	David Airlie <airlied@linux.ie>,
	freedreno <freedreno@lists.freedesktop.org>,
	Vladimir Lypak <vladimir.lypak@gmail.com>,
	Abhinav Kumar <quic_abhinavk@quicinc.com>,
	dri-devel <dri-devel@lists.freedesktop.org>,
	Bjorn Andersson <bjorn.andersson@linaro.org>,
	Akhil P Oommen <quic_akhilpo@quicinc.com>,
	linux-arm-msm <linux-arm-msm@vger.kernel.org>,
	Sean Paul <sean@poorly.run>,
	open list <linux-kernel@vger.kernel.org>,
	AngeloGioacchino Del Regno 
	<angelogioacchino.delregno@collabora.com>
Subject: Re: [PATCH 2/3] drm/msm/gpu: Park scheduler threads for system suspend
Date: Thu, 17 Mar 2022 18:29:29 +0100	[thread overview]
Message-ID: <YjNv+csmFyjNYc5b@phenom.ffwll.local> (raw)
In-Reply-To: <915537e2-ac5b-ab0e-3697-2b16a9ec8f91@amd.com>

On Thu, Mar 17, 2022 at 05:44:57PM +0100, Christian König wrote:
> Am 17.03.22 um 17:18 schrieb Rob Clark:
> > On Thu, Mar 17, 2022 at 9:04 AM Christian König
> > <christian.koenig@amd.com> wrote:
> > > Am 17.03.22 um 16:10 schrieb Rob Clark:
> > > > [SNIP]
> > > > userspace frozen != kthread frozen .. that is what this patch is
> > > > trying to address, so we aren't racing between shutting down the hw
> > > > and the scheduler shoveling more jobs at us.
> > > Well exactly that's the problem. The scheduler is supposed to shoveling
> > > more jobs at us until it is empty.
> > > 
> > > Thinking more about it we will then keep some dma_fence instance
> > > unsignaled and that is and extremely bad idea since it can lead to
> > > deadlocks during suspend.
> > Hmm, perhaps that is true if you need to migrate things out of vram?
> > It is at least not a problem when vram is not involved.
> 
> No, it's much wider than that.
> 
> See what can happen is that the memory management shrinkers want to wait for
> a dma_fence during suspend.
> 
> And if you stop the scheduler they will just wait forever.
> 
> What you need to do instead is to drain the scheduler, e.g. call
> drm_sched_entity_flush() with a proper timeout for each entity you have
> created.

Yeah I think properly flushing the scheduler and stopping it and cutting
all drivers over to that sounds like the right approach. Generally suspend
shouldn't be such a critical path that this will hurt us, all the other io
queues get flushed too afaik.

Resume is the thing that needs to go real fast.

So a patch set to move all drivers that open code the kthread_park to the
right scheduler function sounds like the right idea here to me.
-Daniel

> 
> Regards,
> Christian.
> 
> > 
> > > So this patch here is an absolute clear NAK from my side. If amdgpu is
> > > doing something similar that is a severe bug and needs to be addressed
> > > somehow.
> > I think amdgpu's use of kthread_park is not related to suspend, but
> > didn't look too closely.
> > 
> > And perhaps the solution for this problem is more complex in the case
> > of amdgpu, I'm not super familiar with the constraints there.  But I
> > think it is a fine solution for integrated GPUs.
> > 
> > BR,
> > -R
> > 
> > > Regards,
> > > Christian.
> > > 
> > > > BR,
> > > > -R
> > > > 
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

WARNING: multiple messages have this Message-ID (diff)
From: Daniel Vetter <daniel@ffwll.ch>
To: "Christian König" <christian.koenig@amd.com>
Cc: Rob Clark <robdclark@chromium.org>,
	Jonathan Marek <jonathan@marek.ca>,
	David Airlie <airlied@linux.ie>,
	linux-arm-msm <linux-arm-msm@vger.kernel.org>,
	Vladimir Lypak <vladimir.lypak@gmail.com>,
	Abhinav Kumar <quic_abhinavk@quicinc.com>,
	dri-devel <dri-devel@lists.freedesktop.org>,
	Bjorn Andersson <bjorn.andersson@linaro.org>,
	Akhil P Oommen <quic_akhilpo@quicinc.com>,
	Sean Paul <sean@poorly.run>,
	freedreno <freedreno@lists.freedesktop.org>,
	open list <linux-kernel@vger.kernel.org>,
	AngeloGioacchino Del Regno
	<angelogioacchino.delregno@collabora.com>
Subject: Re: [PATCH 2/3] drm/msm/gpu: Park scheduler threads for system suspend
Date: Thu, 17 Mar 2022 18:29:29 +0100	[thread overview]
Message-ID: <YjNv+csmFyjNYc5b@phenom.ffwll.local> (raw)
In-Reply-To: <915537e2-ac5b-ab0e-3697-2b16a9ec8f91@amd.com>

On Thu, Mar 17, 2022 at 05:44:57PM +0100, Christian König wrote:
> Am 17.03.22 um 17:18 schrieb Rob Clark:
> > On Thu, Mar 17, 2022 at 9:04 AM Christian König
> > <christian.koenig@amd.com> wrote:
> > > Am 17.03.22 um 16:10 schrieb Rob Clark:
> > > > [SNIP]
> > > > userspace frozen != kthread frozen .. that is what this patch is
> > > > trying to address, so we aren't racing between shutting down the hw
> > > > and the scheduler shoveling more jobs at us.
> > > Well exactly that's the problem. The scheduler is supposed to shoveling
> > > more jobs at us until it is empty.
> > > 
> > > Thinking more about it we will then keep some dma_fence instance
> > > unsignaled and that is and extremely bad idea since it can lead to
> > > deadlocks during suspend.
> > Hmm, perhaps that is true if you need to migrate things out of vram?
> > It is at least not a problem when vram is not involved.
> 
> No, it's much wider than that.
> 
> See what can happen is that the memory management shrinkers want to wait for
> a dma_fence during suspend.
> 
> And if you stop the scheduler they will just wait forever.
> 
> What you need to do instead is to drain the scheduler, e.g. call
> drm_sched_entity_flush() with a proper timeout for each entity you have
> created.

Yeah I think properly flushing the scheduler and stopping it and cutting
all drivers over to that sounds like the right approach. Generally suspend
shouldn't be such a critical path that this will hurt us, all the other io
queues get flushed too afaik.

Resume is the thing that needs to go real fast.

So a patch set to move all drivers that open code the kthread_park to the
right scheduler function sounds like the right idea here to me.
-Daniel

> 
> Regards,
> Christian.
> 
> > 
> > > So this patch here is an absolute clear NAK from my side. If amdgpu is
> > > doing something similar that is a severe bug and needs to be addressed
> > > somehow.
> > I think amdgpu's use of kthread_park is not related to suspend, but
> > didn't look too closely.
> > 
> > And perhaps the solution for this problem is more complex in the case
> > of amdgpu, I'm not super familiar with the constraints there.  But I
> > think it is a fine solution for integrated GPUs.
> > 
> > BR,
> > -R
> > 
> > > Regards,
> > > Christian.
> > > 
> > > > BR,
> > > > -R
> > > > 
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

  reply	other threads:[~2022-03-17 17:29 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-10 23:46 [PATCH 0/3] drm/msm/gpu: More system suspend fixes Rob Clark
2022-03-10 23:46 ` Rob Clark
2022-03-10 23:46 ` [PATCH 1/3] drm/msm/gpu: Rename runtime suspend/resume functions Rob Clark
2022-03-10 23:46   ` Rob Clark
2022-03-11  9:26   ` AngeloGioacchino Del Regno
2022-03-11  9:26     ` AngeloGioacchino Del Regno
2022-03-10 23:46 ` [PATCH 2/3] drm/msm/gpu: Park scheduler threads for system suspend Rob Clark
2022-03-10 23:46   ` Rob Clark
2022-03-17  9:59   ` Daniel Vetter
2022-03-17  9:59     ` Daniel Vetter
2022-03-17 10:06     ` Christian König
2022-03-17 14:58       ` Matthew Brost
2022-03-17 14:58         ` Matthew Brost
2022-03-17 15:10       ` Rob Clark
2022-03-17 15:10         ` Rob Clark
2022-03-17 16:04         ` Christian König
2022-03-17 16:04           ` Christian König
2022-03-17 16:18           ` Rob Clark
2022-03-17 16:18             ` Rob Clark
2022-03-17 16:44             ` Christian König
2022-03-17 16:44               ` Christian König
2022-03-17 17:29               ` Daniel Vetter [this message]
2022-03-17 17:29                 ` Daniel Vetter
2022-03-17 17:35               ` Rob Clark
2022-03-17 17:35                 ` Rob Clark
2022-03-17 18:10                 ` Andrey Grodzovsky
2022-03-17 18:10                   ` Andrey Grodzovsky
2022-03-17 18:25                   ` Rob Clark
2022-03-17 18:25                     ` Rob Clark
2022-03-17 19:49                     ` Andrey Grodzovsky
2022-03-17 19:49                       ` Andrey Grodzovsky
2022-03-17 20:35                       ` Rob Clark
2022-03-17 20:35                         ` Rob Clark
2022-03-18 16:04                         ` Andrey Grodzovsky
2022-03-18 16:04                           ` Andrey Grodzovsky
2022-03-18 16:20                           ` Rob Clark
2022-03-18 16:20                             ` Rob Clark
2022-03-18 16:27                             ` Andrey Grodzovsky
2022-03-18 16:27                               ` Andrey Grodzovsky
2022-03-18 17:22                               ` Rob Clark
2022-03-18 17:22                                 ` Rob Clark
2022-03-18 20:14                                 ` Andrey Grodzovsky
2022-03-18 20:14                                   ` Andrey Grodzovsky
2022-03-17 17:46           ` Andrey Grodzovsky
2022-03-17 17:46             ` Andrey Grodzovsky
2022-03-10 23:46 ` [PATCH 3/3] drm/msm/gpu: Remove mutex from wait_event condition Rob Clark
2022-03-10 23:46   ` Rob Clark
2022-03-17 20:45   ` [Freedreno] " Akhil P Oommen
2022-03-17 20:45     ` Akhil P Oommen
2022-03-17 21:07     ` Rob Clark
2022-03-17 21:07       ` Rob Clark
2022-03-18  2:55       ` Hillf Danton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YjNv+csmFyjNYc5b@phenom.ffwll.local \
    --to=daniel@ffwll.ch \
    --cc=airlied@linux.ie \
    --cc=angelogioacchino.delregno@collabora.com \
    --cc=bjorn.andersson@linaro.org \
    --cc=christian.koenig@amd.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=freedreno@lists.freedesktop.org \
    --cc=jonathan@marek.ca \
    --cc=linux-arm-msm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=quic_abhinavk@quicinc.com \
    --cc=quic_akhilpo@quicinc.com \
    --cc=robdclark@chromium.org \
    --cc=robdclark@gmail.com \
    --cc=sean@poorly.run \
    --cc=vladimir.lypak@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.