From: Rob Clark <robdclark@gmail.com>
To: dri-devel@lists.freedesktop.org
Cc: freedreno@lists.freedesktop.org, linux-arm-msm@vger.kernel.org,
	"Emma Anholt" <emma@anholt.net>,
	"Yiwei Zhang" <zzyiwei@chromium.org>,
	"Rob Clark" <robdclark@chromium.org>,
	"Rob Clark" <robdclark@gmail.com>,
	"Sean Paul" <sean@poorly.run>,
	"Abhinav Kumar" <quic_abhinavk@quicinc.com>,
	"David Airlie" <airlied@linux.ie>,
	"Daniel Vetter" <daniel@ffwll.ch>,
	"Jonathan Marek" <jonathan@marek.ca>,
	"Jordan Crouse" <jordan@cosmicpenguin.net>,
	"Christian König" <christian.koenig@amd.com>,
	"Akhil P Oommen" <quic_akhilpo@quicinc.com>,
	"Stephen Boyd" <swboyd@chromium.org>,
	"Yangtao Li" <tiny.windzz@gmail.com>,
	linux-kernel@vger.kernel.org (open list)
Subject: [PATCH v2 2/2] drm/msm/gpu: Track global faults per address-space
Date: Tue, 1 Feb 2022 08:16:12 -0800
Message-ID: <20220201161618.778455-3-robdclark@gmail.com>
In-Reply-To: <20220201161618.778455-1-robdclark@gmail.com>

From: Rob Clark <robdclark@chromium.org>

Other processes don't need to know about faults that they are isolated
from by virtue of address space isolation.  They are only interested in
whether some of their state might have been corrupted.

But to be safe, also track unattributed faults.  This case should really
never happen unless there is a kernel bug (and that would never happen,
right?)

v2: Instead of adding a new param, just change the behavior of the
    existing param to match what userspace actually wants [anholt]

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5934
Signed-off-by: Rob Clark <robdclark@chromium.org>
---
 drivers/gpu/drm/msm/adreno/adreno_gpu.c | 2 +-
 drivers/gpu/drm/msm/msm_gem.h           | 3 +++
 drivers/gpu/drm/msm/msm_gpu.c           | 8 +++++++-
 drivers/gpu/drm/msm/msm_gpu.h           | 5 ++++-
 4 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index caa9076197de..58dfb23cf2af 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -269,7 +269,7 @@ int adreno_get_param(struct msm_gpu *gpu, struct msm_file_private *ctx,
 		*value = 0;
 		return 0;
 	case MSM_PARAM_FAULTS:
-		*value = gpu->global_faults;
+		*value = gpu->global_faults + ctx->aspace->faults;
 		return 0;
 	case MSM_PARAM_SUSPENDS:
 		*value = gpu->suspend_count;
diff --git a/drivers/gpu/drm/msm/msm_gem.h b/drivers/gpu/drm/msm/msm_gem.h
index 54ca0817d807..af612add5264 100644
--- a/drivers/gpu/drm/msm/msm_gem.h
+++ b/drivers/gpu/drm/msm/msm_gem.h
@@ -35,6 +35,9 @@ struct msm_gem_address_space {
 	 * will be non-NULL:
 	 */
 	struct pid *pid;
+
+	/* @faults: the number of GPU hangs associated with this address space */
+	int faults;
 };
 
 struct msm_gem_vma {
diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index 2c1049c0ea14..942bf41403ff 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -370,8 +370,8 @@ static void recover_worker(struct kthread_work *work)
 		struct task_struct *task;
 
 		/* Increment the fault counts */
-		gpu->global_faults++;
 		submit->queue->faults++;
+		submit->aspace->faults++;
 
 		task = get_pid_task(submit->pid, PIDTYPE_PID);
 		if (task) {
@@ -389,6 +389,12 @@ static void recover_worker(struct kthread_work *work)
 		} else {
 			msm_rd_dump_submit(priv->hangrd, submit, NULL);
 		}
+	} else {
+		/*
+		 * We couldn't attribute this fault to any particular context,
+		 * so increment the global fault count instead.
+		 */
+		gpu->global_faults++;
 	}
 
 	/* Record the crash state */
diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
index ba8407231340..c99627fc99dd 100644
--- a/drivers/gpu/drm/msm/msm_gpu.h
+++ b/drivers/gpu/drm/msm/msm_gpu.h
@@ -200,7 +200,10 @@ struct msm_gpu {
 	/* does gpu need hw_init? */
 	bool needs_hw_init;
 
-	/* number of GPU hangs (for all contexts) */
+	/**
+	 * global_faults: number of GPU hangs not attributed to a particular
+	 * address space
+	 */
 	int global_faults;
 
 	void __iomem *mmio;
-- 
2.34.1
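The accounting this patch introduces can be illustrated with a small userspace model. This is only a sketch, not kernel code: the `model_*` types and functions are hypothetical names invented for illustration, and it assumes a hang is charged to an address space exactly when the recovery path found a faulting submit, with everything else going to the global counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for msm_gem_address_space and msm_gpu;
 * only the fault counters touched by the patch are modelled. */
struct model_aspace {
	int faults;		/* hangs attributed to this address space */
};

struct model_gpu {
	int global_faults;	/* hangs not attributed to any address space */
};

/*
 * Models the counting done in recover_worker() after the patch:
 * a non-NULL aspace stands for "a faulting submit was found".
 */
static void model_record_hang(struct model_gpu *gpu, struct model_aspace *aspace)
{
	assert(gpu != NULL);

	if (aspace)
		aspace->faults++;
	else
		gpu->global_faults++;
}

/*
 * Models the MSM_PARAM_FAULTS case of adreno_get_param() after the
 * patch: the caller sees unattributed hangs plus its own hangs.
 */
static int model_get_faults(const struct model_gpu *gpu,
			    const struct model_aspace *aspace)
{
	return gpu->global_faults + aspace->faults;
}
```

Under this model, after one hang charged to address space A and one unattributed hang, a query made through A reports 2, while a query made through an unrelated address space B reports 1. B never observes the hang that was isolated to A.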