From: Danilo Krummrich <dakr@redhat.com>
To: airlied@gmail.com, daniel@ffwll.ch, tzimmermann@suse.de,
	mripard@kernel.org, corbet@lwn.net, christian.koenig@amd.com,
	bskeggs@redhat.com, Liam.Howlett@oracle.com,
	matthew.brost@intel.com, boris.brezillon@collabora.com,
	alexdeucher@gmail.com, ogabbay@kernel.org, bagasdotme@gmail.com,
	willy@infradead.org, jason@jlekstrand.net
Cc: dri-devel@lists.freedesktop.org, nouveau@lists.freedesktop.org,
	linux-doc@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, Danilo Krummrich <dakr@redhat.com>
Subject: [PATCH drm-next v4 10/14] drm/nouveau: fence: fail to emit when fence context is killed
Date: Wed,  7 Jun 2023 00:31:26 +0200
Message-ID: <20230606223130.6132-11-dakr@redhat.com>
In-Reply-To: <20230606223130.6132-1-dakr@redhat.com>

The new VM_BIND UAPI implementation introduced in subsequent commits
will allow asynchronous jobs to process push buffers and emit fences.

If a fence context is killed, e.g. due to a channel fault, jobs which
are already queued for execution might still emit new fences. In such a
case a job would hang forever.

To fix that, make nouveau_fence_emit() fail with -ENODEV on a killed
fence context, which unblocks the job.

Signed-off-by: Danilo Krummrich <dakr@redhat.com>
---
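For readers who want to follow the control flow outside the kernel, the
check added below can be approximated with a small userspace model. This
is only a sketch under stated assumptions: struct model_fence_ctx,
model_ctx_init(), model_ctx_kill() and model_fence_emit() are hypothetical
names that do not exist in nouveau, a pthread mutex stands in for the
fctx->lock spinlock, and a plain counter stands in for the pending fence
list.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct nouveau_fence_chan. */
struct model_fence_ctx {
	pthread_mutex_t lock;	/* stands in for the fctx->lock spinlock */
	bool killed;		/* set once by kill, never cleared */
	unsigned int pending;	/* fences emitted but not yet signaled */
};

static void model_ctx_init(struct model_fence_ctx *ctx)
{
	pthread_mutex_init(&ctx->lock, NULL);
	ctx->killed = false;
	ctx->pending = 0;
}

/*
 * Models the kill path: drop all pending fences, then mark the context
 * as killed while still holding the lock.
 */
static void model_ctx_kill(struct model_fence_ctx *ctx)
{
	pthread_mutex_lock(&ctx->lock);
	ctx->pending = 0;
	ctx->killed = true;
	pthread_mutex_unlock(&ctx->lock);
}

/*
 * Models the emit path: refuse to add a fence to a killed context and
 * report -ENODEV, so the caller can fail instead of waiting forever.
 */
static int model_fence_emit(struct model_fence_ctx *ctx)
{
	pthread_mutex_lock(&ctx->lock);
	if (ctx->killed) {
		pthread_mutex_unlock(&ctx->lock);
		return -ENODEV;
	}
	ctx->pending++;
	pthread_mutex_unlock(&ctx->lock);
	return 0;
}
```

In this model, an emit before model_ctx_kill() returns 0 and an emit
after it returns -ENODEV. Setting and testing the flag under the same
lock means an emitter cannot observe a live context after the kill path
has already flushed the pending fences.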
 drivers/gpu/drm/nouveau/nouveau_fence.c | 7 +++++++
 drivers/gpu/drm/nouveau/nouveau_fence.h | 2 +-
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c
index e946408f945b..77c739a55b19 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fence.c
+++ b/drivers/gpu/drm/nouveau/nouveau_fence.c
@@ -96,6 +96,7 @@ nouveau_fence_context_kill(struct nouveau_fence_chan *fctx, int error)
 		if (nouveau_fence_signal(fence))
 			nvif_event_block(&fctx->event);
 	}
+	fctx->killed = 1;
 	spin_unlock_irqrestore(&fctx->lock, flags);
 }
 
@@ -229,6 +230,12 @@ nouveau_fence_emit(struct nouveau_fence *fence, struct nouveau_channel *chan)
 		dma_fence_get(&fence->base);
 		spin_lock_irq(&fctx->lock);
 
+		if (unlikely(fctx->killed)) {
+			spin_unlock_irq(&fctx->lock);
+			dma_fence_put(&fence->base);
+			return -ENODEV;
+		}
+
 		if (nouveau_fence_update(chan, fctx))
 			nvif_event_block(&fctx->event);
 
diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.h b/drivers/gpu/drm/nouveau/nouveau_fence.h
index 7c73c7c9834a..2c72d96ef17d 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fence.h
+++ b/drivers/gpu/drm/nouveau/nouveau_fence.h
@@ -44,7 +44,7 @@ struct nouveau_fence_chan {
 	char name[32];
 
 	struct nvif_event event;
-	int notify_ref, dead;
+	int notify_ref, dead, killed;
 };
 
 struct nouveau_fence_priv {
-- 
2.40.1

