All of lore.kernel.org
 help / color / mirror / Atom feed
* [Intel-gfx] [PATCH 1/8] drm/i915/gt: Mark timeline->cacheline as destroyed after rcu grace period
@ 2020-03-23  9:28 Chris Wilson
  2020-03-23  9:28 ` [Intel-gfx] [PATCH 2/8] drm/i915: Avoid live-lock with i915_vma_parked() Chris Wilson
                   ` (8 more replies)
  0 siblings, 9 replies; 22+ messages in thread
From: Chris Wilson @ 2020-03-23  9:28 UTC (permalink / raw)
  To: intel-gfx

Since we take advantage of RCU for some i915_active objects, like the
intel_timeline_cacheline, we need to delay the i915_active_fini until
after the RCU grace period and we perform the kfree -- that is until
after all RCU protected readers.

<3> [108.204873] ODEBUG: assert_init not available (active state 0) object type: i915_active hint: __cacheline_active+0x0/0x80 [i915]
<4> [108.207377] WARNING: CPU: 3 PID: 2342 at lib/debugobjects.c:488 debug_print_object+0x67/0x90
<4> [108.207400] Modules linked in: vgem snd_hda_codec_hdmi x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul snd_hda_intel ghash_clmulni_intel snd_intel_dspcfg snd_hda_codec ax88179_178a snd_hwdep usbnet btusb snd_hda_core btrtl mii btbcm btintel snd_pcm bluetooth ecdh_generic ecc i915 i2c_hid pinctrl_sunrisepoint pinctrl_intel intel_lpss_pci prime_numbers
<4> [108.207587] CPU: 3 PID: 2342 Comm: gem_exec_parall Tainted: G     U            5.6.0-rc6-CI-Patchwork_17047+ #1
<4> [108.207609] Hardware name: Google Soraka/Soraka, BIOS MrChromebox-4.10 08/25/2019
<4> [108.207639] RIP: 0010:debug_print_object+0x67/0x90
<4> [108.207668] Code: 83 c2 01 8b 4b 14 4c 8b 45 00 89 15 87 d2 8a 02 8b 53 10 4c 89 e6 48 c7 c7 38 2b 32 82 48 8b 14 d5 80 2f 07 82 e8 49 d5 b7 ff <0f> 0b 5b 83 05 c3 f6 22 01 01 5d 41 5c c3 83 05 b8 f6 22 01 01 c3
<4> [108.207692] RSP: 0018:ffffc90000e7f890 EFLAGS: 00010282
<4> [108.207723] RAX: 0000000000000000 RBX: ffffc90000e7f8b0 RCX: 0000000000000001
<4> [108.207747] RDX: 0000000080000001 RSI: ffff88817ada8cb8 RDI: 00000000ffffffff
<4> [108.207770] RBP: ffffffffa0341cc0 R08: ffff88816b5a8948 R09: 0000000000000000
<4> [108.207792] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff82322d54
<4> [108.207814] R13: ffffffffa0341cc0 R14: ffffffff83df9568 R15: ffff88816064f400
<4> [108.207839] FS:  00007f437d753700(0000) GS:ffff88817ad80000(0000) knlGS:0000000000000000
<4> [108.207863] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [108.207887] CR2: 00007f2ad1fb5000 CR3: 00000001725d8004 CR4: 00000000003606e0
<4> [108.207907] Call Trace:
<4> [108.207959]  debug_object_assert_init+0x15c/0x180
<4> [108.208475]  ? i915_active_acquire_if_busy+0x10/0x50 [i915]
<4> [108.208513]  ? rcu_read_lock_held+0x4d/0x60
<4> [108.208970]  i915_active_acquire_if_busy+0x10/0x50 [i915]
<4> [108.209380]  intel_timeline_read_hwsp+0x81/0x540 [i915]
<4> [108.210262]  __emit_semaphore_wait+0x45/0x1b0 [i915]
<4> [108.210726]  ? i915_request_await_dma_fence+0x143/0x560 [i915]
<4> [108.211156]  i915_request_await_dma_fence+0x28a/0x560 [i915]
<4> [108.211633]  i915_request_await_object+0x24a/0x3f0 [i915]
<4> [108.212102]  eb_submit.isra.47+0x58f/0x920 [i915]
<4> [108.212622]  i915_gem_do_execbuffer+0x1706/0x2c70 [i915]
<4> [108.213071]  ? i915_gem_execbuffer2_ioctl+0xc0/0x470 [i915]

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/intel_timeline.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.c b/drivers/gpu/drm/i915/gt/intel_timeline.c
index 91debbc97c9a..3779c2ae0d65 100644
--- a/drivers/gpu/drm/i915/gt/intel_timeline.c
+++ b/drivers/gpu/drm/i915/gt/intel_timeline.c
@@ -119,6 +119,15 @@ static void __idle_hwsp_free(struct intel_timeline_hwsp *hwsp, int cacheline)
 	spin_unlock_irqrestore(&gt->hwsp_lock, flags);
 }
 
+static void __rcu_cacheline_free(struct rcu_head *rcu)
+{
+	struct intel_timeline_cacheline *cl =
+		container_of(rcu, typeof(*cl), rcu);
+
+	i915_active_fini(&cl->active);
+	kfree(cl);
+}
+
 static void __idle_cacheline_free(struct intel_timeline_cacheline *cl)
 {
 	GEM_BUG_ON(!i915_active_is_idle(&cl->active));
@@ -127,8 +136,7 @@ static void __idle_cacheline_free(struct intel_timeline_cacheline *cl)
 	i915_vma_put(cl->hwsp->vma);
 	__idle_hwsp_free(cl->hwsp, ptr_unmask_bits(cl->vaddr, CACHELINE_BITS));
 
-	i915_active_fini(&cl->active);
-	kfree_rcu(cl, rcu);
+	call_rcu(&cl->rcu, __rcu_cacheline_free);
 }
 
 __i915_active_call
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Intel-gfx] [PATCH 2/8] drm/i915: Avoid live-lock with i915_vma_parked()
  2020-03-23  9:28 [Intel-gfx] [PATCH 1/8] drm/i915/gt: Mark timeline->cacheline as destroyed after rcu grace period Chris Wilson
@ 2020-03-23  9:28 ` Chris Wilson
  2020-03-23 10:09   ` Tvrtko Ursulin
  2020-03-23  9:28 ` [Intel-gfx] [PATCH 3/8] drm/i915: Extend intel_wakeref to support delayed puts Chris Wilson
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 22+ messages in thread
From: Chris Wilson @ 2020-03-23  9:28 UTC (permalink / raw)
  To: intel-gfx

Abuse^W Take advantage that we know we are inside the GT wakeref and
that prevents any client execbuf from reopening the i915_vma in order to
claim all the vma to close without having to drop the spinlock to free
each one individually. By keeping the spinlock, we do not have to
restart if we run concurrently with i915_gem_free_objects -- which
causes them both to restart continually and make very very slow
progress.

Closes: https://gitlab.freedesktop.org/drm/intel/issues/1361
Fixes: 77853186e547 ("drm/i915: Claim vma while under closed_lock in i915_vma_parked()")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_vma.c | 29 ++++++++++++++---------------
 1 file changed, 14 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 5b3efb43a8ef..08699fa069aa 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -1097,6 +1097,7 @@ void i915_vma_release(struct kref *ref)
 void i915_vma_parked(struct intel_gt *gt)
 {
 	struct i915_vma *vma, *next;
+	LIST_HEAD(closed);
 
 	spin_lock_irq(&gt->closed_lock);
 	list_for_each_entry_safe(vma, next, &gt->closed_vma, closed_link) {
@@ -1108,28 +1109,26 @@ void i915_vma_parked(struct intel_gt *gt)
 		if (!kref_get_unless_zero(&obj->base.refcount))
 			continue;
 
-		if (i915_vm_tryopen(vm)) {
-			list_del_init(&vma->closed_link);
-		} else {
+		if (!i915_vm_tryopen(vm)) {
 			i915_gem_object_put(obj);
-			obj = NULL;
+			continue;
 		}
 
-		spin_unlock_irq(&gt->closed_lock);
+		list_move(&vma->closed_link, &closed);
+	}
+	spin_unlock_irq(&gt->closed_lock);
 
-		if (obj) {
-			__i915_vma_put(vma);
-			i915_gem_object_put(obj);
-		}
+	/* As the GT is held idle, no vma can be reopened as we destroy them */
+	list_for_each_entry_safe(vma, next, &closed, closed_link) {
+		struct drm_i915_gem_object *obj = vma->obj;
+		struct i915_address_space *vm = vma->vm;
 
-		i915_vm_close(vm);
+		INIT_LIST_HEAD(&vma->closed_link);
+		__i915_vma_put(vma);
 
-		/* Restart after dropping lock */
-		spin_lock_irq(&gt->closed_lock);
-		next = list_first_entry(&gt->closed_vma,
-					typeof(*next), closed_link);
+		i915_gem_object_put(obj);
+		i915_vm_close(vm);
 	}
-	spin_unlock_irq(&gt->closed_lock);
 }
 
 static void __i915_vma_iounmap(struct i915_vma *vma)
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Intel-gfx] [PATCH 3/8] drm/i915: Extend intel_wakeref to support delayed puts
  2020-03-23  9:28 [Intel-gfx] [PATCH 1/8] drm/i915/gt: Mark timeline->cacheline as destroyed after rcu grace period Chris Wilson
  2020-03-23  9:28 ` [Intel-gfx] [PATCH 2/8] drm/i915: Avoid live-lock with i915_vma_parked() Chris Wilson
@ 2020-03-23  9:28 ` Chris Wilson
  2020-03-23 10:13   ` Tvrtko Ursulin
  2020-03-23 10:32   ` [Intel-gfx] [PATCH] " Chris Wilson
  2020-03-23  9:28 ` [Intel-gfx] [PATCH 4/8] drm/i915/gt: Delay release of engine-pm after last retirement Chris Wilson
                   ` (6 subsequent siblings)
  8 siblings, 2 replies; 22+ messages in thread
From: Chris Wilson @ 2020-03-23  9:28 UTC (permalink / raw)
  To: intel-gfx

In some cases we want to hold onto the wakeref for a little after the
last user so that we can avoid having to drop and then immediately
reacquire it. Allow the last user to specify if they would like to keep
the wakeref alive for a short hysteresis.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/intel_engine_pm.h   |  6 ++++++
 drivers/gpu/drm/i915/gt/intel_gt_requests.c |  2 +-
 drivers/gpu/drm/i915/intel_wakeref.c        | 11 ++++++-----
 drivers/gpu/drm/i915/intel_wakeref.h        | 10 ++++++++--
 4 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.h b/drivers/gpu/drm/i915/gt/intel_engine_pm.h
index e52c2b0cb245..418df0a13145 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_pm.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.h
@@ -37,6 +37,12 @@ static inline void intel_engine_pm_put_async(struct intel_engine_cs *engine)
 	intel_wakeref_put_async(&engine->wakeref);
 }
 
+static inline void intel_engine_pm_put_delay(struct intel_engine_cs *engine,
+					     unsigned long delay)
+{
+	intel_wakeref_put_delay(&engine->wakeref, delay);
+}
+
 static inline void intel_engine_pm_flush(struct intel_engine_cs *engine)
 {
 	intel_wakeref_unlock_wait(&engine->wakeref);
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
index 24c99d0838af..835ec184763e 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
@@ -38,7 +38,7 @@ static bool flush_submission(struct intel_gt *gt)
 	for_each_engine(engine, gt, id) {
 		intel_engine_flush_submission(engine);
 		active |= flush_work(&engine->retire_work);
-		active |= flush_work(&engine->wakeref.work);
+		active |= flush_delayed_work(&engine->wakeref.work);
 	}
 
 	return active;
diff --git a/drivers/gpu/drm/i915/intel_wakeref.c b/drivers/gpu/drm/i915/intel_wakeref.c
index 8fbf6f4d3f26..2977bc0428e2 100644
--- a/drivers/gpu/drm/i915/intel_wakeref.c
+++ b/drivers/gpu/drm/i915/intel_wakeref.c
@@ -70,11 +70,11 @@ static void ____intel_wakeref_put_last(struct intel_wakeref *wf)
 
 void __intel_wakeref_put_last(struct intel_wakeref *wf, unsigned long flags)
 {
-	INTEL_WAKEREF_BUG_ON(work_pending(&wf->work));
+	INTEL_WAKEREF_BUG_ON(delayed_work_pending(&wf->work));
 
 	/* Assume we are not in process context and so cannot sleep. */
 	if (flags & INTEL_WAKEREF_PUT_ASYNC || !mutex_trylock(&wf->mutex)) {
-		schedule_work(&wf->work);
+		mod_delayed_work(system_wq, &wf->work, flags >> 1);
 		return;
 	}
 
@@ -83,7 +83,7 @@ void __intel_wakeref_put_last(struct intel_wakeref *wf, unsigned long flags)
 
 static void __intel_wakeref_put_work(struct work_struct *wrk)
 {
-	struct intel_wakeref *wf = container_of(wrk, typeof(*wf), work);
+	struct intel_wakeref *wf = container_of(wrk, typeof(*wf), work.work);
 
 	if (atomic_add_unless(&wf->count, -1, 1))
 		return;
@@ -104,8 +104,9 @@ void __intel_wakeref_init(struct intel_wakeref *wf,
 	atomic_set(&wf->count, 0);
 	wf->wakeref = 0;
 
-	INIT_WORK(&wf->work, __intel_wakeref_put_work);
-	lockdep_init_map(&wf->work.lockdep_map, "wakeref.work", &key->work, 0);
+	INIT_DELAYED_WORK(&wf->work, __intel_wakeref_put_work);
+	lockdep_init_map(&wf->work.work.lockdep_map,
+			 "wakeref.work", &key->work, 0);
 }
 
 int intel_wakeref_wait_for_idle(struct intel_wakeref *wf)
diff --git a/drivers/gpu/drm/i915/intel_wakeref.h b/drivers/gpu/drm/i915/intel_wakeref.h
index 7d1e676b71ef..e87532e282d2 100644
--- a/drivers/gpu/drm/i915/intel_wakeref.h
+++ b/drivers/gpu/drm/i915/intel_wakeref.h
@@ -41,7 +41,7 @@ struct intel_wakeref {
 	struct intel_runtime_pm *rpm;
 	const struct intel_wakeref_ops *ops;
 
-	struct work_struct work;
+	struct delayed_work work;
 };
 
 struct intel_wakeref_lockclass {
@@ -154,6 +154,12 @@ intel_wakeref_put_async(struct intel_wakeref *wf)
 	__intel_wakeref_put(wf, INTEL_WAKEREF_PUT_ASYNC);
 }
 
+static inline void
+intel_wakeref_put_delay(struct intel_wakeref *wf, unsigned long delay)
+{
+	__intel_wakeref_put(wf, INTEL_WAKEREF_PUT_ASYNC | delay << 1);
+}
+
 /**
  * intel_wakeref_lock: Lock the wakeref (mutex)
  * @wf: the wakeref
@@ -194,7 +200,7 @@ intel_wakeref_unlock_wait(struct intel_wakeref *wf)
 {
 	mutex_lock(&wf->mutex);
 	mutex_unlock(&wf->mutex);
-	flush_work(&wf->work);
+	flush_delayed_work(&wf->work);
 }
 
 /**
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Intel-gfx] [PATCH 4/8] drm/i915/gt: Delay release of engine-pm after last retirement
  2020-03-23  9:28 [Intel-gfx] [PATCH 1/8] drm/i915/gt: Mark timeline->cacheline as destroyed after rcu grace period Chris Wilson
  2020-03-23  9:28 ` [Intel-gfx] [PATCH 2/8] drm/i915: Avoid live-lock with i915_vma_parked() Chris Wilson
  2020-03-23  9:28 ` [Intel-gfx] [PATCH 3/8] drm/i915: Extend intel_wakeref to support delayed puts Chris Wilson
@ 2020-03-23  9:28 ` Chris Wilson
  2020-03-23 10:14   ` Tvrtko Ursulin
  2020-03-23  9:28 ` [Intel-gfx] [PATCH 5/8] drm/i915: Rely on direct submission to the queue Chris Wilson
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 22+ messages in thread
From: Chris Wilson @ 2020-03-23  9:28 UTC (permalink / raw)
  To: intel-gfx

Keep the engine-pm awake until the next jiffie, to avoid immediate
ping-pong under moderate load. (Forcing the idle barrier excerbates the
moderate load, dramatically increasing the driver overhead.) On the
other hand, delaying the idle-barrier slightly incurs longer rc6-off
and so more power consumption.

Closes: https://gitlab.freedesktop.org/drm/intel/issues/848
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/intel_context.c | 2 +-
 drivers/gpu/drm/i915/i915_active.c      | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index e4aece20bc80..622ff425fce9 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -350,7 +350,7 @@ void intel_context_enter_engine(struct intel_context *ce)
 void intel_context_exit_engine(struct intel_context *ce)
 {
 	intel_timeline_exit(ce->timeline);
-	intel_engine_pm_put(ce->engine);
+	intel_engine_pm_put_delay(ce->engine, 1);
 }
 
 int intel_context_prepare_remote_request(struct intel_context *ce,
diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
index c4048628188a..a0d31f7bfb42 100644
--- a/drivers/gpu/drm/i915/i915_active.c
+++ b/drivers/gpu/drm/i915/i915_active.c
@@ -818,7 +818,7 @@ void i915_active_acquire_barrier(struct i915_active *ref)
 
 		GEM_BUG_ON(!intel_engine_pm_is_awake(engine));
 		llist_add(barrier_to_ll(node), &engine->barrier_tasks);
-		intel_engine_pm_put(engine);
+		intel_engine_pm_put_delay(engine, 1);
 	}
 }
 
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Intel-gfx] [PATCH 5/8] drm/i915: Rely on direct submission to the queue
  2020-03-23  9:28 [Intel-gfx] [PATCH 1/8] drm/i915/gt: Mark timeline->cacheline as destroyed after rcu grace period Chris Wilson
                   ` (2 preceding siblings ...)
  2020-03-23  9:28 ` [Intel-gfx] [PATCH 4/8] drm/i915/gt: Delay release of engine-pm after last retirement Chris Wilson
@ 2020-03-23  9:28 ` Chris Wilson
  2020-03-23 10:29   ` Tvrtko Ursulin
  2020-03-23  9:28 ` [Intel-gfx] [PATCH 6/8] drm/i915/execlists: Pull tasklet interrupt-bh local to direct submission Chris Wilson
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 22+ messages in thread
From: Chris Wilson @ 2020-03-23  9:28 UTC (permalink / raw)
  To: intel-gfx

Drop the pretense of kicking the tasklet (used only for the defunct guc
submission backend, it should just take ownership of the submit!) and so
remove the bh-kicking from around submission.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 2 --
 drivers/gpu/drm/i915/gt/intel_lrc.c            | 5 +----
 drivers/gpu/drm/i915/i915_request.c            | 2 --
 3 files changed, 1 insertion(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 36d069504836..c2bd5accde0c 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -2348,9 +2348,7 @@ static void eb_request_add(struct i915_execbuffer *eb)
 		__i915_request_skip(rq);
 	}
 
-	local_bh_disable();
 	__i915_request_queue(rq, &attr);
-	local_bh_enable(); /* Kick the execlists tasklet if just scheduled */
 
 	/* Try to clean up the client's timeline after submitting the request */
 	if (prev)
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index f09dd87324b9..210f60e14ef4 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -2891,10 +2891,7 @@ static void __submit_queue_imm(struct intel_engine_cs *engine)
 	if (reset_in_progress(execlists))
 		return; /* defer until we restart the engine following reset */
 
-	if (execlists->tasklet.func == execlists_submission_tasklet)
-		__execlists_submission_tasklet(engine);
-	else
-		tasklet_hi_schedule(&execlists->tasklet);
+	__execlists_submission_tasklet(engine);
 }
 
 static void submit_queue(struct intel_engine_cs *engine,
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index c0df71d7d0ff..3388c5b610c5 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1444,9 +1444,7 @@ void i915_request_add(struct i915_request *rq)
 	if (list_empty(&rq->sched.signalers_list))
 		attr.priority |= I915_PRIORITY_WAIT;
 
-	local_bh_disable();
 	__i915_request_queue(rq, &attr);
-	local_bh_enable(); /* Kick the execlists tasklet if just scheduled */
 
 	mutex_unlock(&tl->mutex);
 }
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Intel-gfx] [PATCH 6/8] drm/i915/execlists: Pull tasklet interrupt-bh local to direct submission
  2020-03-23  9:28 [Intel-gfx] [PATCH 1/8] drm/i915/gt: Mark timeline->cacheline as destroyed after rcu grace period Chris Wilson
                   ` (3 preceding siblings ...)
  2020-03-23  9:28 ` [Intel-gfx] [PATCH 5/8] drm/i915: Rely on direct submission to the queue Chris Wilson
@ 2020-03-23  9:28 ` Chris Wilson
  2020-03-23  9:28 ` [Intel-gfx] [PATCH 7/8] drm/i915: Immediately execute the fenced work Chris Wilson
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 22+ messages in thread
From: Chris Wilson @ 2020-03-23  9:28 UTC (permalink / raw)
  To: intel-gfx

We dropped calling process_csb prior to handling direct submission in
order to avoid the nesting of spinlocks and lift process_csb() and the
majority of the tasklet out of irq-off. However, we do want to avoid
ksoftirqd latency in the fast path, so try and pull the interrupt-bh
local to direct submission if we can acquire the tasklet's lock.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
---
 drivers/gpu/drm/i915/gt/intel_lrc.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 210f60e14ef4..82dee2141b46 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -2891,6 +2891,12 @@ static void __submit_queue_imm(struct intel_engine_cs *engine)
 	if (reset_in_progress(execlists))
 		return; /* defer until we restart the engine following reset */
 
+	/* Hopefully we clear execlists->pending[] to let us through */
+	if (execlists->pending[0] && tasklet_trylock(&execlists->tasklet)) {
+		process_csb(engine);
+		tasklet_unlock(&execlists->tasklet);
+	}
+
 	__execlists_submission_tasklet(engine);
 }
 
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Intel-gfx] [PATCH 7/8] drm/i915: Immediately execute the fenced work
  2020-03-23  9:28 [Intel-gfx] [PATCH 1/8] drm/i915/gt: Mark timeline->cacheline as destroyed after rcu grace period Chris Wilson
                   ` (4 preceding siblings ...)
  2020-03-23  9:28 ` [Intel-gfx] [PATCH 6/8] drm/i915/execlists: Pull tasklet interrupt-bh local to direct submission Chris Wilson
@ 2020-03-23  9:28 ` Chris Wilson
  2020-03-23 10:37   ` Tvrtko Ursulin
  2020-03-23 11:04   ` [Intel-gfx] [PATCH] " Chris Wilson
  2020-03-23  9:28 ` [Intel-gfx] [PATCH 8/8] drm/i915/gem: Avoid gem_context->mutex for simple vma lookup Chris Wilson
                   ` (2 subsequent siblings)
  8 siblings, 2 replies; 22+ messages in thread
From: Chris Wilson @ 2020-03-23  9:28 UTC (permalink / raw)
  To: intel-gfx

If the caller allows and we do not have to wait for any signals,
immediately execute the work within the caller's process. By doing so we
avoid the overhead of scheduling a new task, and the latency in
executing it, at the cost of pulling that work back into the immediate
context. (Sometimes we still prefer to offload the task to another cpu,
especially if we plan on executing many such tasks in parallel for this
client.)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c |  2 +-
 drivers/gpu/drm/i915/i915_sw_fence_work.c      |  5 ++++-
 drivers/gpu/drm/i915/i915_sw_fence_work.h      | 12 ++++++++++++
 drivers/gpu/drm/i915/i915_vma.c                |  2 +-
 4 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index c2bd5accde0c..e80c6f613feb 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -1784,7 +1784,7 @@ static int eb_parse_pipeline(struct i915_execbuffer *eb,
 	dma_resv_add_excl_fence(shadow->resv, &pw->base.dma);
 	dma_resv_unlock(shadow->resv);
 
-	dma_fence_work_commit(&pw->base);
+	dma_fence_work_commit_imm(&pw->base);
 	return 0;
 
 err_batch_unlock:
diff --git a/drivers/gpu/drm/i915/i915_sw_fence_work.c b/drivers/gpu/drm/i915/i915_sw_fence_work.c
index 997b2998f1f2..a3a81bb8f2c3 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence_work.c
+++ b/drivers/gpu/drm/i915/i915_sw_fence_work.c
@@ -38,7 +38,10 @@ fence_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
 
 		if (!f->dma.error) {
 			dma_fence_get(&f->dma);
-			queue_work(system_unbound_wq, &f->work);
+			if (test_bit(DMA_FENCE_WORK_IMM, &f->dma.flags))
+				fence_work(&f->work);
+			else
+				queue_work(system_unbound_wq, &f->work);
 		} else {
 			fence_complete(f);
 		}
diff --git a/drivers/gpu/drm/i915/i915_sw_fence_work.h b/drivers/gpu/drm/i915/i915_sw_fence_work.h
index 3a22b287e201..0719d661dc9c 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence_work.h
+++ b/drivers/gpu/drm/i915/i915_sw_fence_work.h
@@ -32,6 +32,10 @@ struct dma_fence_work {
 	const struct dma_fence_work_ops *ops;
 };
 
+enum {
+	DMA_FENCE_WORK_IMM = DMA_FENCE_FLAG_USER_BITS,
+};
+
 void dma_fence_work_init(struct dma_fence_work *f,
 			 const struct dma_fence_work_ops *ops);
 int dma_fence_work_chain(struct dma_fence_work *f, struct dma_fence *signal);
@@ -41,4 +45,12 @@ static inline void dma_fence_work_commit(struct dma_fence_work *f)
 	i915_sw_fence_commit(&f->chain);
 }
 
+static inline void dma_fence_work_commit_imm(struct dma_fence_work *f)
+{
+	if (atomic_read(&f->chain.pending) <= 1)
+		__set_bit(DMA_FENCE_WORK_IMM, &f->dma.flags);
+
+	dma_fence_work_commit(f);
+}
+
 #endif /* I915_SW_FENCE_WORK_H */
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 08699fa069aa..191577a98390 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -980,7 +980,7 @@ int i915_vma_pin(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
 	mutex_unlock(&vma->vm->mutex);
 err_fence:
 	if (work)
-		dma_fence_work_commit(&work->base);
+		dma_fence_work_commit_imm(&work->base);
 	if (wakeref)
 		intel_runtime_pm_put(&vma->vm->i915->runtime_pm, wakeref);
 err_pages:
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Intel-gfx] [PATCH 8/8] drm/i915/gem: Avoid gem_context->mutex for simple vma lookup
  2020-03-23  9:28 [Intel-gfx] [PATCH 1/8] drm/i915/gt: Mark timeline->cacheline as destroyed after rcu grace period Chris Wilson
                   ` (5 preceding siblings ...)
  2020-03-23  9:28 ` [Intel-gfx] [PATCH 7/8] drm/i915: Immediately execute the fenced work Chris Wilson
@ 2020-03-23  9:28 ` Chris Wilson
  2020-03-23 10:38   ` Tvrtko Ursulin
  2020-03-23  9:43 ` [Intel-gfx] [PATCH 1/8] drm/i915/gt: Mark timeline->cacheline as destroyed after rcu grace period Matthew Auld
  2020-03-23 13:06 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for series starting with [1/8] drm/i915/gt: Mark timeline->cacheline as destroyed after rcu grace period (rev3) Patchwork
  8 siblings, 1 reply; 22+ messages in thread
From: Chris Wilson @ 2020-03-23  9:28 UTC (permalink / raw)
  To: intel-gfx

As we store the handle lookup inside a radix tree, we do not need the
gem_context->mutex except until we need to insert our lookup into the
common radix tree. This takes a small bit of rearranging to ensure that
the lut we insert into the tree is ready prior to actually inserting it
(as soon as it is exposed via the radixtree, it is visible to any other
submission).

v2: For brownie points, remove the goto spaghetti.
v3: Tighten up the closed-handle checks.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 136 +++++++++++-------
 1 file changed, 87 insertions(+), 49 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index e80c6f613feb..c643eec4dca0 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -481,7 +481,7 @@ eb_add_vma(struct i915_execbuffer *eb,
 
 	GEM_BUG_ON(i915_vma_is_closed(vma));
 
-	ev->vma = i915_vma_get(vma);
+	ev->vma = vma;
 	ev->exec = entry;
 	ev->flags = entry->flags;
 
@@ -728,77 +728,117 @@ static int eb_select_context(struct i915_execbuffer *eb)
 	return 0;
 }
 
-static int eb_lookup_vmas(struct i915_execbuffer *eb)
+static int __eb_add_lut(struct i915_execbuffer *eb,
+			u32 handle, struct i915_vma *vma)
 {
-	struct radix_tree_root *handles_vma = &eb->gem_context->handles_vma;
-	struct drm_i915_gem_object *obj;
-	unsigned int i, batch;
+	struct i915_gem_context *ctx = eb->gem_context;
+	struct i915_lut_handle *lut;
 	int err;
 
-	if (unlikely(i915_gem_context_is_closed(eb->gem_context)))
-		return -ENOENT;
+	lut = i915_lut_handle_alloc();
+	if (unlikely(!lut))
+		return -ENOMEM;
 
-	INIT_LIST_HEAD(&eb->relocs);
-	INIT_LIST_HEAD(&eb->unbound);
+	i915_vma_get(vma);
+	if (!atomic_fetch_inc(&vma->open_count))
+		i915_vma_reopen(vma);
+	lut->handle = handle;
+	lut->ctx = ctx;
+
+	/* Check that the context hasn't been closed in the meantime */
+	err = -EINTR;
+	if (!mutex_lock_interruptible(&ctx->mutex)) {
+		err = -ENOENT;
+		if (likely(!i915_gem_context_is_closed(ctx)))
+			err = radix_tree_insert(&ctx->handles_vma, handle, vma);
+		if (err == 0) { /* And nor has this handle */
+			struct drm_i915_gem_object *obj = vma->obj;
+
+			i915_gem_object_lock(obj);
+			if (idr_find(&eb->file->object_idr, handle) == obj) {
+				list_add(&lut->obj_link, &obj->lut_list);
+			} else {
+				radix_tree_delete(&ctx->handles_vma, handle);
+				err = -ENOENT;
+			}
+			i915_gem_object_unlock(obj);
+		}
+		mutex_unlock(&ctx->mutex);
+	}
+	if (unlikely(err))
+		goto err;
 
-	batch = eb_batch_index(eb);
+	return 0;
 
-	for (i = 0; i < eb->buffer_count; i++) {
-		u32 handle = eb->exec[i].handle;
-		struct i915_lut_handle *lut;
+err:
+	atomic_dec(&vma->open_count);
+	i915_vma_put(vma);
+	i915_lut_handle_free(lut);
+	return err;
+}
+
+static struct i915_vma *eb_lookup_vma(struct i915_execbuffer *eb, u32 handle)
+{
+	do {
+		struct drm_i915_gem_object *obj;
 		struct i915_vma *vma;
+		int err;
 
-		vma = radix_tree_lookup(handles_vma, handle);
+		rcu_read_lock();
+		vma = radix_tree_lookup(&eb->gem_context->handles_vma, handle);
+		if (likely(vma))
+			vma = i915_vma_tryget(vma);
+		rcu_read_unlock();
 		if (likely(vma))
-			goto add_vma;
+			return vma;
 
 		obj = i915_gem_object_lookup(eb->file, handle);
-		if (unlikely(!obj)) {
-			err = -ENOENT;
-			goto err_vma;
-		}
+		if (unlikely(!obj))
+			return ERR_PTR(-ENOENT);
 
 		vma = i915_vma_instance(obj, eb->context->vm, NULL);
 		if (IS_ERR(vma)) {
-			err = PTR_ERR(vma);
-			goto err_obj;
+			i915_gem_object_put(obj);
+			return vma;
 		}
 
-		lut = i915_lut_handle_alloc();
-		if (unlikely(!lut)) {
-			err = -ENOMEM;
-			goto err_obj;
-		}
+		err = __eb_add_lut(eb, handle, vma);
+		if (likely(!err))
+			return vma;
 
-		err = radix_tree_insert(handles_vma, handle, vma);
-		if (unlikely(err)) {
-			i915_lut_handle_free(lut);
-			goto err_obj;
-		}
+		i915_gem_object_put(obj);
+		if (err != -EEXIST)
+			return ERR_PTR(err);
+	} while (1);
+}
 
-		/* transfer ref to lut */
-		if (!atomic_fetch_inc(&vma->open_count))
-			i915_vma_reopen(vma);
-		lut->handle = handle;
-		lut->ctx = eb->gem_context;
+static int eb_lookup_vmas(struct i915_execbuffer *eb)
+{
+	unsigned int batch = eb_batch_index(eb);
+	unsigned int i;
+	int err = 0;
 
-		i915_gem_object_lock(obj);
-		list_add(&lut->obj_link, &obj->lut_list);
-		i915_gem_object_unlock(obj);
+	INIT_LIST_HEAD(&eb->relocs);
+	INIT_LIST_HEAD(&eb->unbound);
+
+	for (i = 0; i < eb->buffer_count; i++) {
+		struct i915_vma *vma;
+
+		vma = eb_lookup_vma(eb, eb->exec[i].handle);
+		if (IS_ERR(vma)) {
+			err = PTR_ERR(vma);
+			break;
+		}
 
-add_vma:
 		err = eb_validate_vma(eb, &eb->exec[i], vma);
-		if (unlikely(err))
-			goto err_vma;
+		if (unlikely(err)) {
+			i915_vma_put(vma);
+			break;
+		}
 
 		eb_add_vma(eb, i, batch, vma);
 	}
 
-	return 0;
-
-err_obj:
-	i915_gem_object_put(obj);
-err_vma:
 	eb->vma[i].vma = NULL;
 	return err;
 }
@@ -1494,9 +1534,7 @@ static int eb_relocate(struct i915_execbuffer *eb)
 {
 	int err;
 
-	mutex_lock(&eb->gem_context->mutex);
 	err = eb_lookup_vmas(eb);
-	mutex_unlock(&eb->gem_context->mutex);
 	if (err)
 		return err;
 
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [Intel-gfx] [PATCH 1/8] drm/i915/gt: Mark timeline->cacheline as destroyed after rcu grace period
  2020-03-23  9:28 [Intel-gfx] [PATCH 1/8] drm/i915/gt: Mark timeline->cacheline as destroyed after rcu grace period Chris Wilson
                   ` (6 preceding siblings ...)
  2020-03-23  9:28 ` [Intel-gfx] [PATCH 8/8] drm/i915/gem: Avoid gem_context->mutex for simple vma lookup Chris Wilson
@ 2020-03-23  9:43 ` Matthew Auld
  2020-03-23 13:06 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for series starting with [1/8] drm/i915/gt: Mark timeline->cacheline as destroyed after rcu grace period (rev3) Patchwork
  8 siblings, 0 replies; 22+ messages in thread
From: Matthew Auld @ 2020-03-23  9:43 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Intel Graphics Development

On Mon, 23 Mar 2020 at 09:29, Chris Wilson <chris@chris-wilson.co.uk> wrote:
>
> Since we take advantage of RCU for some i915_active objects, like the
> intel_timeline_cacheline, we need to delay the i915_active_fini until
> after the RCU grace period and we perform the kfree -- that is until
> after all RCU protected readers.
>
> <3> [108.204873] ODEBUG: assert_init not available (active state 0) object type: i915_active hint: __cacheline_active+0x0/0x80 [i915]
> <4> [108.207377] WARNING: CPU: 3 PID: 2342 at lib/debugobjects.c:488 debug_print_object+0x67/0x90
> <4> [108.207400] Modules linked in: vgem snd_hda_codec_hdmi x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul snd_hda_intel ghash_clmulni_intel snd_intel_dspcfg snd_hda_codec ax88179_178a snd_hwdep usbnet btusb snd_hda_core btrtl mii btbcm btintel snd_pcm bluetooth ecdh_generic ecc i915 i2c_hid pinctrl_sunrisepoint pinctrl_intel intel_lpss_pci prime_numbers
> <4> [108.207587] CPU: 3 PID: 2342 Comm: gem_exec_parall Tainted: G     U            5.6.0-rc6-CI-Patchwork_17047+ #1
> <4> [108.207609] Hardware name: Google Soraka/Soraka, BIOS MrChromebox-4.10 08/25/2019
> <4> [108.207639] RIP: 0010:debug_print_object+0x67/0x90
> <4> [108.207668] Code: 83 c2 01 8b 4b 14 4c 8b 45 00 89 15 87 d2 8a 02 8b 53 10 4c 89 e6 48 c7 c7 38 2b 32 82 48 8b 14 d5 80 2f 07 82 e8 49 d5 b7 ff <0f> 0b 5b 83 05 c3 f6 22 01 01 5d 41 5c c3 83 05 b8 f6 22 01 01 c3
> <4> [108.207692] RSP: 0018:ffffc90000e7f890 EFLAGS: 00010282
> <4> [108.207723] RAX: 0000000000000000 RBX: ffffc90000e7f8b0 RCX: 0000000000000001
> <4> [108.207747] RDX: 0000000080000001 RSI: ffff88817ada8cb8 RDI: 00000000ffffffff
> <4> [108.207770] RBP: ffffffffa0341cc0 R08: ffff88816b5a8948 R09: 0000000000000000
> <4> [108.207792] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff82322d54
> <4> [108.207814] R13: ffffffffa0341cc0 R14: ffffffff83df9568 R15: ffff88816064f400
> <4> [108.207839] FS:  00007f437d753700(0000) GS:ffff88817ad80000(0000) knlGS:0000000000000000
> <4> [108.207863] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> <4> [108.207887] CR2: 00007f2ad1fb5000 CR3: 00000001725d8004 CR4: 00000000003606e0
> <4> [108.207907] Call Trace:
> <4> [108.207959]  debug_object_assert_init+0x15c/0x180
> <4> [108.208475]  ? i915_active_acquire_if_busy+0x10/0x50 [i915]
> <4> [108.208513]  ? rcu_read_lock_held+0x4d/0x60
> <4> [108.208970]  i915_active_acquire_if_busy+0x10/0x50 [i915]
> <4> [108.209380]  intel_timeline_read_hwsp+0x81/0x540 [i915]
> <4> [108.210262]  __emit_semaphore_wait+0x45/0x1b0 [i915]
> <4> [108.210726]  ? i915_request_await_dma_fence+0x143/0x560 [i915]
> <4> [108.211156]  i915_request_await_dma_fence+0x28a/0x560 [i915]
> <4> [108.211633]  i915_request_await_object+0x24a/0x3f0 [i915]
> <4> [108.212102]  eb_submit.isra.47+0x58f/0x920 [i915]
> <4> [108.212622]  i915_gem_do_execbuffer+0x1706/0x2c70 [i915]
> <4> [108.213071]  ? i915_gem_execbuffer2_ioctl+0xc0/0x470 [i915]
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Intel-gfx] [PATCH 2/8] drm/i915: Avoid live-lock with i915_vma_parked()
  2020-03-23  9:28 ` [Intel-gfx] [PATCH 2/8] drm/i915: Avoid live-lock with i915_vma_parked() Chris Wilson
@ 2020-03-23 10:09   ` Tvrtko Ursulin
  0 siblings, 0 replies; 22+ messages in thread
From: Tvrtko Ursulin @ 2020-03-23 10:09 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 23/03/2020 09:28, Chris Wilson wrote:
> Abuse^W Take advantage that we know we are inside the GT wakeref and
> that prevents any client execbuf from reopening the i915_vma in order to
> claim all the vma to close without having to drop the spinlock to free
> each one individually. By keeping the spinlock, we do not have to
> restart if we run concurrently with i915_gem_free_objects -- which
> causes them both to restart continually and make very very slow
> progress.
> 
> Closes: https://gitlab.freedesktop.org/drm/intel/issues/1361
> Fixes: 77853186e547 ("drm/i915: Claim vma while under closed_lock in i915_vma_parked()")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_vma.c | 29 ++++++++++++++---------------
>   1 file changed, 14 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
> index 5b3efb43a8ef..08699fa069aa 100644
> --- a/drivers/gpu/drm/i915/i915_vma.c
> +++ b/drivers/gpu/drm/i915/i915_vma.c
> @@ -1097,6 +1097,7 @@ void i915_vma_release(struct kref *ref)
>   void i915_vma_parked(struct intel_gt *gt)
>   {
>   	struct i915_vma *vma, *next;
> +	LIST_HEAD(closed);
>   
>   	spin_lock_irq(&gt->closed_lock);
>   	list_for_each_entry_safe(vma, next, &gt->closed_vma, closed_link) {
> @@ -1108,28 +1109,26 @@ void i915_vma_parked(struct intel_gt *gt)
>   		if (!kref_get_unless_zero(&obj->base.refcount))
>   			continue;
>   
> -		if (i915_vm_tryopen(vm)) {
> -			list_del_init(&vma->closed_link);
> -		} else {
> +		if (!i915_vm_tryopen(vm)) {
>   			i915_gem_object_put(obj);
> -			obj = NULL;
> +			continue;
>   		}
>   
> -		spin_unlock_irq(&gt->closed_lock);
> +		list_move(&vma->closed_link, &closed);
> +	}
> +	spin_unlock_irq(&gt->closed_lock);
>   
> -		if (obj) {
> -			__i915_vma_put(vma);
> -			i915_gem_object_put(obj);
> -		}
> +	/* As the GT is held idle, no vma can be reopened as we destroy them */
> +	list_for_each_entry_safe(vma, next, &closed, closed_link) {
> +		struct drm_i915_gem_object *obj = vma->obj;
> +		struct i915_address_space *vm = vma->vm;
>   
> -		i915_vm_close(vm);
> +		INIT_LIST_HEAD(&vma->closed_link);
> +		__i915_vma_put(vma);
>   
> -		/* Restart after dropping lock */
> -		spin_lock_irq(&gt->closed_lock);
> -		next = list_first_entry(&gt->closed_vma,
> -					typeof(*next), closed_link);
> +		i915_gem_object_put(obj);
> +		i915_vm_close(vm);
>   	}
> -	spin_unlock_irq(&gt->closed_lock);
>   }
>   
>   static void __i915_vma_iounmap(struct i915_vma *vma)
> 

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Intel-gfx] [PATCH 3/8] drm/i915: Extend intel_wakeref to support delayed puts
  2020-03-23  9:28 ` [Intel-gfx] [PATCH 3/8] drm/i915: Extend intel_wakeref to support delayed puts Chris Wilson
@ 2020-03-23 10:13   ` Tvrtko Ursulin
  2020-03-23 10:32   ` [Intel-gfx] [PATCH] " Chris Wilson
  1 sibling, 0 replies; 22+ messages in thread
From: Tvrtko Ursulin @ 2020-03-23 10:13 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 23/03/2020 09:28, Chris Wilson wrote:
> In some cases we want to hold onto the wakeref for a little after the
> last user so that we can avoid having to drop and then immediately
> reacquire it. Allow the last user to specify if they would like to keep
> the wakeref alive for a short hysteresis.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/gt/intel_engine_pm.h   |  6 ++++++
>   drivers/gpu/drm/i915/gt/intel_gt_requests.c |  2 +-
>   drivers/gpu/drm/i915/intel_wakeref.c        | 11 ++++++-----
>   drivers/gpu/drm/i915/intel_wakeref.h        | 10 ++++++++--
>   4 files changed, 21 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.h b/drivers/gpu/drm/i915/gt/intel_engine_pm.h
> index e52c2b0cb245..418df0a13145 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_pm.h
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.h
> @@ -37,6 +37,12 @@ static inline void intel_engine_pm_put_async(struct intel_engine_cs *engine)
>   	intel_wakeref_put_async(&engine->wakeref);
>   }
>   
> +static inline void intel_engine_pm_put_delay(struct intel_engine_cs *engine,
> +					     unsigned long delay)
> +{
> +	intel_wakeref_put_delay(&engine->wakeref, delay);
> +}
> +
>   static inline void intel_engine_pm_flush(struct intel_engine_cs *engine)
>   {
>   	intel_wakeref_unlock_wait(&engine->wakeref);
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> index 24c99d0838af..835ec184763e 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> @@ -38,7 +38,7 @@ static bool flush_submission(struct intel_gt *gt)
>   	for_each_engine(engine, gt, id) {
>   		intel_engine_flush_submission(engine);
>   		active |= flush_work(&engine->retire_work);
> -		active |= flush_work(&engine->wakeref.work);
> +		active |= flush_delayed_work(&engine->wakeref.work);
>   	}
>   
>   	return active;
> diff --git a/drivers/gpu/drm/i915/intel_wakeref.c b/drivers/gpu/drm/i915/intel_wakeref.c
> index 8fbf6f4d3f26..2977bc0428e2 100644
> --- a/drivers/gpu/drm/i915/intel_wakeref.c
> +++ b/drivers/gpu/drm/i915/intel_wakeref.c
> @@ -70,11 +70,11 @@ static void ____intel_wakeref_put_last(struct intel_wakeref *wf)
>   
>   void __intel_wakeref_put_last(struct intel_wakeref *wf, unsigned long flags)
>   {
> -	INTEL_WAKEREF_BUG_ON(work_pending(&wf->work));
> +	INTEL_WAKEREF_BUG_ON(delayed_work_pending(&wf->work));
>   
>   	/* Assume we are not in process context and so cannot sleep. */
>   	if (flags & INTEL_WAKEREF_PUT_ASYNC || !mutex_trylock(&wf->mutex)) {
> -		schedule_work(&wf->work);
> +		mod_delayed_work(system_wq, &wf->work, flags >> 1);
>   		return;
>   	}
>   
> @@ -83,7 +83,7 @@ void __intel_wakeref_put_last(struct intel_wakeref *wf, unsigned long flags)
>   
>   static void __intel_wakeref_put_work(struct work_struct *wrk)
>   {
> -	struct intel_wakeref *wf = container_of(wrk, typeof(*wf), work);
> +	struct intel_wakeref *wf = container_of(wrk, typeof(*wf), work.work);
>   
>   	if (atomic_add_unless(&wf->count, -1, 1))
>   		return;
> @@ -104,8 +104,9 @@ void __intel_wakeref_init(struct intel_wakeref *wf,
>   	atomic_set(&wf->count, 0);
>   	wf->wakeref = 0;
>   
> -	INIT_WORK(&wf->work, __intel_wakeref_put_work);
> -	lockdep_init_map(&wf->work.lockdep_map, "wakeref.work", &key->work, 0);
> +	INIT_DELAYED_WORK(&wf->work, __intel_wakeref_put_work);
> +	lockdep_init_map(&wf->work.work.lockdep_map,
> +			 "wakeref.work", &key->work, 0);
>   }
>   
>   int intel_wakeref_wait_for_idle(struct intel_wakeref *wf)
> diff --git a/drivers/gpu/drm/i915/intel_wakeref.h b/drivers/gpu/drm/i915/intel_wakeref.h
> index 7d1e676b71ef..e87532e282d2 100644
> --- a/drivers/gpu/drm/i915/intel_wakeref.h
> +++ b/drivers/gpu/drm/i915/intel_wakeref.h
> @@ -41,7 +41,7 @@ struct intel_wakeref {
>   	struct intel_runtime_pm *rpm;
>   	const struct intel_wakeref_ops *ops;
>   
> -	struct work_struct work;
> +	struct delayed_work work;
>   };
>   
>   struct intel_wakeref_lockclass {
> @@ -154,6 +154,12 @@ intel_wakeref_put_async(struct intel_wakeref *wf)
>   	__intel_wakeref_put(wf, INTEL_WAKEREF_PUT_ASYNC);
>   }
>   
> +static inline void
> +intel_wakeref_put_delay(struct intel_wakeref *wf, unsigned long delay)
> +{
> +	__intel_wakeref_put(wf, INTEL_WAKEREF_PUT_ASYNC | delay << 1);
> +}
> +
>   /**
>    * intel_wakeref_lock: Lock the wakeref (mutex)
>    * @wf: the wakeref
> @@ -194,7 +200,7 @@ intel_wakeref_unlock_wait(struct intel_wakeref *wf)
>   {
>   	mutex_lock(&wf->mutex);
>   	mutex_unlock(&wf->mutex);
> -	flush_work(&wf->work);
> +	flush_delayed_work(&wf->work);
>   }
>   
>   /**
> 

I'd put a note after "#define INTEL_WAKEREF_PUT_ASYNC" that high bits 
are reserved. Could be also worth defining a macro for the delay shift 
value at the same place.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Intel-gfx] [PATCH 4/8] drm/i915/gt: Delay release of engine-pm after last retirement
  2020-03-23  9:28 ` [Intel-gfx] [PATCH 4/8] drm/i915/gt: Delay release of engine-pm after last retirement Chris Wilson
@ 2020-03-23 10:14   ` Tvrtko Ursulin
  0 siblings, 0 replies; 22+ messages in thread
From: Tvrtko Ursulin @ 2020-03-23 10:14 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 23/03/2020 09:28, Chris Wilson wrote:
> Keep the engine-pm awake until the next jiffie, to avoid immediate
> ping-pong under moderate load. (Forcing the idle barrier excerbates the
> moderate load, dramatically increasing the driver overhead.) On the
> other hand, delaying the idle-barrier slightly incurs longer rc6-off
> and so more power consumption.
> 
> Closes: https://gitlab.freedesktop.org/drm/intel/issues/848
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/gt/intel_context.c | 2 +-
>   drivers/gpu/drm/i915/i915_active.c      | 2 +-
>   2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
> index e4aece20bc80..622ff425fce9 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context.c
> +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> @@ -350,7 +350,7 @@ void intel_context_enter_engine(struct intel_context *ce)
>   void intel_context_exit_engine(struct intel_context *ce)
>   {
>   	intel_timeline_exit(ce->timeline);
> -	intel_engine_pm_put(ce->engine);
> +	intel_engine_pm_put_delay(ce->engine, 1);
>   }
>   
>   int intel_context_prepare_remote_request(struct intel_context *ce,
> diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
> index c4048628188a..a0d31f7bfb42 100644
> --- a/drivers/gpu/drm/i915/i915_active.c
> +++ b/drivers/gpu/drm/i915/i915_active.c
> @@ -818,7 +818,7 @@ void i915_active_acquire_barrier(struct i915_active *ref)
>   
>   		GEM_BUG_ON(!intel_engine_pm_is_awake(engine));
>   		llist_add(barrier_to_ll(node), &engine->barrier_tasks);
> -		intel_engine_pm_put(engine);
> +		intel_engine_pm_put_delay(engine, 1);
>   	}
>   }
>   
> 

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Intel-gfx] [PATCH 5/8] drm/i915: Rely on direct submission to the queue
  2020-03-23  9:28 ` [Intel-gfx] [PATCH 5/8] drm/i915: Rely on direct submission to the queue Chris Wilson
@ 2020-03-23 10:29   ` Tvrtko Ursulin
  0 siblings, 0 replies; 22+ messages in thread
From: Tvrtko Ursulin @ 2020-03-23 10:29 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 23/03/2020 09:28, Chris Wilson wrote:
> Drop the pretense of kicking the tasklet (used only for the defunct guc
> submission backend, it should just take ownership of the submit!) and so
> remove the bh-kicking from around submission.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 2 --
>   drivers/gpu/drm/i915/gt/intel_lrc.c            | 5 +----
>   drivers/gpu/drm/i915/i915_request.c            | 2 --
>   3 files changed, 1 insertion(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> index 36d069504836..c2bd5accde0c 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> @@ -2348,9 +2348,7 @@ static void eb_request_add(struct i915_execbuffer *eb)
>   		__i915_request_skip(rq);
>   	}
>   
> -	local_bh_disable();
>   	__i915_request_queue(rq, &attr);
> -	local_bh_enable(); /* Kick the execlists tasklet if just scheduled */
>   
>   	/* Try to clean up the client's timeline after submitting the request */
>   	if (prev)
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> index f09dd87324b9..210f60e14ef4 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> @@ -2891,10 +2891,7 @@ static void __submit_queue_imm(struct intel_engine_cs *engine)
>   	if (reset_in_progress(execlists))
>   		return; /* defer until we restart the engine following reset */
>   
> -	if (execlists->tasklet.func == execlists_submission_tasklet)
> -		__execlists_submission_tasklet(engine);
> -	else
> -		tasklet_hi_schedule(&execlists->tasklet);
> +	__execlists_submission_tasklet(engine);
>   }
>   
>   static void submit_queue(struct intel_engine_cs *engine,
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index c0df71d7d0ff..3388c5b610c5 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -1444,9 +1444,7 @@ void i915_request_add(struct i915_request *rq)
>   	if (list_empty(&rq->sched.signalers_list))
>   		attr.priority |= I915_PRIORITY_WAIT;
>   
> -	local_bh_disable();
>   	__i915_request_queue(rq, &attr);
> -	local_bh_enable(); /* Kick the execlists tasklet if just scheduled */
>   
>   	mutex_unlock(&tl->mutex);
>   }
> 

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Intel-gfx] [PATCH] drm/i915: Extend intel_wakeref to support delayed puts
  2020-03-23  9:28 ` [Intel-gfx] [PATCH 3/8] drm/i915: Extend intel_wakeref to support delayed puts Chris Wilson
  2020-03-23 10:13   ` Tvrtko Ursulin
@ 2020-03-23 10:32   ` Chris Wilson
  2020-03-23 12:24     ` Tvrtko Ursulin
  1 sibling, 1 reply; 22+ messages in thread
From: Chris Wilson @ 2020-03-23 10:32 UTC (permalink / raw)
  To: intel-gfx

In some cases we want to hold onto the wakeref for a little after the
last user so that we can avoid having to drop and then immediately
reacquire it. Allow the last user to specify if they would like to keep
the wakeref alive for a short hysteresis.

v2: Embrace bitfield.h for adjustable flags.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_engine_pm.h   |  6 ++++++
 drivers/gpu/drm/i915/gt/intel_gt_requests.c |  2 +-
 drivers/gpu/drm/i915/intel_wakeref.c        | 12 ++++++-----
 drivers/gpu/drm/i915/intel_wakeref.h        | 22 ++++++++++++++++++---
 4 files changed, 33 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.h b/drivers/gpu/drm/i915/gt/intel_engine_pm.h
index e52c2b0cb245..418df0a13145 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_pm.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.h
@@ -37,6 +37,12 @@ static inline void intel_engine_pm_put_async(struct intel_engine_cs *engine)
 	intel_wakeref_put_async(&engine->wakeref);
 }
 
+static inline void intel_engine_pm_put_delay(struct intel_engine_cs *engine,
+					     unsigned long delay)
+{
+	intel_wakeref_put_delay(&engine->wakeref, delay);
+}
+
 static inline void intel_engine_pm_flush(struct intel_engine_cs *engine)
 {
 	intel_wakeref_unlock_wait(&engine->wakeref);
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
index 24c99d0838af..835ec184763e 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
@@ -38,7 +38,7 @@ static bool flush_submission(struct intel_gt *gt)
 	for_each_engine(engine, gt, id) {
 		intel_engine_flush_submission(engine);
 		active |= flush_work(&engine->retire_work);
-		active |= flush_work(&engine->wakeref.work);
+		active |= flush_delayed_work(&engine->wakeref.work);
 	}
 
 	return active;
diff --git a/drivers/gpu/drm/i915/intel_wakeref.c b/drivers/gpu/drm/i915/intel_wakeref.c
index 8fbf6f4d3f26..dfd87d082218 100644
--- a/drivers/gpu/drm/i915/intel_wakeref.c
+++ b/drivers/gpu/drm/i915/intel_wakeref.c
@@ -70,11 +70,12 @@ static void ____intel_wakeref_put_last(struct intel_wakeref *wf)
 
 void __intel_wakeref_put_last(struct intel_wakeref *wf, unsigned long flags)
 {
-	INTEL_WAKEREF_BUG_ON(work_pending(&wf->work));
+	INTEL_WAKEREF_BUG_ON(delayed_work_pending(&wf->work));
 
 	/* Assume we are not in process context and so cannot sleep. */
 	if (flags & INTEL_WAKEREF_PUT_ASYNC || !mutex_trylock(&wf->mutex)) {
-		schedule_work(&wf->work);
+		mod_delayed_work(system_wq, &wf->work,
+				 FIELD_GET(INTEL_WAKEREF_PUT_DELAY, flags));
 		return;
 	}
 
@@ -83,7 +84,7 @@ void __intel_wakeref_put_last(struct intel_wakeref *wf, unsigned long flags)
 
 static void __intel_wakeref_put_work(struct work_struct *wrk)
 {
-	struct intel_wakeref *wf = container_of(wrk, typeof(*wf), work);
+	struct intel_wakeref *wf = container_of(wrk, typeof(*wf), work.work);
 
 	if (atomic_add_unless(&wf->count, -1, 1))
 		return;
@@ -104,8 +105,9 @@ void __intel_wakeref_init(struct intel_wakeref *wf,
 	atomic_set(&wf->count, 0);
 	wf->wakeref = 0;
 
-	INIT_WORK(&wf->work, __intel_wakeref_put_work);
-	lockdep_init_map(&wf->work.lockdep_map, "wakeref.work", &key->work, 0);
+	INIT_DELAYED_WORK(&wf->work, __intel_wakeref_put_work);
+	lockdep_init_map(&wf->work.work.lockdep_map,
+			 "wakeref.work", &key->work, 0);
 }
 
 int intel_wakeref_wait_for_idle(struct intel_wakeref *wf)
diff --git a/drivers/gpu/drm/i915/intel_wakeref.h b/drivers/gpu/drm/i915/intel_wakeref.h
index 7d1e676b71ef..545c8f277c46 100644
--- a/drivers/gpu/drm/i915/intel_wakeref.h
+++ b/drivers/gpu/drm/i915/intel_wakeref.h
@@ -8,6 +8,7 @@
 #define INTEL_WAKEREF_H
 
 #include <linux/atomic.h>
+#include <linux/bitfield.h>
 #include <linux/bits.h>
 #include <linux/lockdep.h>
 #include <linux/mutex.h>
@@ -41,7 +42,7 @@ struct intel_wakeref {
 	struct intel_runtime_pm *rpm;
 	const struct intel_wakeref_ops *ops;
 
-	struct work_struct work;
+	struct delayed_work work;
 };
 
 struct intel_wakeref_lockclass {
@@ -117,6 +118,11 @@ intel_wakeref_get_if_active(struct intel_wakeref *wf)
 	return atomic_inc_not_zero(&wf->count);
 }
 
+enum {
+	INTEL_WAKEREF_PUT_ASYNC_BIT = 0,
+	__INTEL_WAKEREF_PUT_LAST_BIT__
+};
+
 /**
  * intel_wakeref_put_flags: Release the wakeref
  * @wf: the wakeref
@@ -134,7 +140,9 @@ intel_wakeref_get_if_active(struct intel_wakeref *wf)
  */
 static inline void
 __intel_wakeref_put(struct intel_wakeref *wf, unsigned long flags)
-#define INTEL_WAKEREF_PUT_ASYNC BIT(0)
+#define INTEL_WAKEREF_PUT_ASYNC BIT(INTEL_WAKEREF_PUT_ASYNC_BIT)
+#define INTEL_WAKEREF_PUT_DELAY \
+	GENMASK(BITS_PER_LONG - 1, __INTEL_WAKEREF_PUT_LAST_BIT__)
 {
 	INTEL_WAKEREF_BUG_ON(atomic_read(&wf->count) <= 0);
 	if (unlikely(!atomic_add_unless(&wf->count, -1, 1)))
@@ -154,6 +162,14 @@ intel_wakeref_put_async(struct intel_wakeref *wf)
 	__intel_wakeref_put(wf, INTEL_WAKEREF_PUT_ASYNC);
 }
 
+static inline void
+intel_wakeref_put_delay(struct intel_wakeref *wf, unsigned long delay)
+{
+	__intel_wakeref_put(wf,
+			    INTEL_WAKEREF_PUT_ASYNC |
+			    FIELD_PREP(INTEL_WAKEREF_PUT_DELAY, delay));
+}
+
 /**
  * intel_wakeref_lock: Lock the wakeref (mutex)
  * @wf: the wakeref
@@ -194,7 +210,7 @@ intel_wakeref_unlock_wait(struct intel_wakeref *wf)
 {
 	mutex_lock(&wf->mutex);
 	mutex_unlock(&wf->mutex);
-	flush_work(&wf->work);
+	flush_delayed_work(&wf->work);
 }
 
 /**
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [Intel-gfx] [PATCH 7/8] drm/i915: Immediately execute the fenced work
  2020-03-23  9:28 ` [Intel-gfx] [PATCH 7/8] drm/i915: Immediately execute the fenced work Chris Wilson
@ 2020-03-23 10:37   ` Tvrtko Ursulin
  2020-03-23 10:46     ` Chris Wilson
  2020-03-23 11:04   ` [Intel-gfx] [PATCH] " Chris Wilson
  1 sibling, 1 reply; 22+ messages in thread
From: Tvrtko Ursulin @ 2020-03-23 10:37 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 23/03/2020 09:28, Chris Wilson wrote:
> If the caller allows and we do not have to wait for any signals,
> immediately execute the work within the caller's process. By doing so we
> avoid the overhead of scheduling a new task, and the latency in
> executing it, at the cost of pulling that work back into the immediate
> context. (Sometimes we still prefer to offload the task to another cpu,
> especially if we plan on executing many such tasks in parallel for this
> client.)
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c |  2 +-
>   drivers/gpu/drm/i915/i915_sw_fence_work.c      |  5 ++++-
>   drivers/gpu/drm/i915/i915_sw_fence_work.h      | 12 ++++++++++++
>   drivers/gpu/drm/i915/i915_vma.c                |  2 +-
>   4 files changed, 18 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> index c2bd5accde0c..e80c6f613feb 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> @@ -1784,7 +1784,7 @@ static int eb_parse_pipeline(struct i915_execbuffer *eb,
>   	dma_resv_add_excl_fence(shadow->resv, &pw->base.dma);
>   	dma_resv_unlock(shadow->resv);
>   
> -	dma_fence_work_commit(&pw->base);
> +	dma_fence_work_commit_imm(&pw->base);
>   	return 0;
>   
>   err_batch_unlock:
> diff --git a/drivers/gpu/drm/i915/i915_sw_fence_work.c b/drivers/gpu/drm/i915/i915_sw_fence_work.c
> index 997b2998f1f2..a3a81bb8f2c3 100644
> --- a/drivers/gpu/drm/i915/i915_sw_fence_work.c
> +++ b/drivers/gpu/drm/i915/i915_sw_fence_work.c
> @@ -38,7 +38,10 @@ fence_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
>   
>   		if (!f->dma.error) {
>   			dma_fence_get(&f->dma);
> -			queue_work(system_unbound_wq, &f->work);
> +			if (test_bit(DMA_FENCE_WORK_IMM, &f->dma.flags))
> +				fence_work(&f->work);
> +			else
> +				queue_work(system_unbound_wq, &f->work);
>   		} else {
>   			fence_complete(f);
>   		}
> diff --git a/drivers/gpu/drm/i915/i915_sw_fence_work.h b/drivers/gpu/drm/i915/i915_sw_fence_work.h
> index 3a22b287e201..0719d661dc9c 100644
> --- a/drivers/gpu/drm/i915/i915_sw_fence_work.h
> +++ b/drivers/gpu/drm/i915/i915_sw_fence_work.h
> @@ -32,6 +32,10 @@ struct dma_fence_work {
>   	const struct dma_fence_work_ops *ops;
>   };
>   
> +enum {
> +	DMA_FENCE_WORK_IMM = DMA_FENCE_FLAG_USER_BITS,
> +};
> +
>   void dma_fence_work_init(struct dma_fence_work *f,
>   			 const struct dma_fence_work_ops *ops);
>   int dma_fence_work_chain(struct dma_fence_work *f, struct dma_fence *signal);
> @@ -41,4 +45,12 @@ static inline void dma_fence_work_commit(struct dma_fence_work *f)
>   	i915_sw_fence_commit(&f->chain);
>   }
>   
> +static inline void dma_fence_work_commit_imm(struct dma_fence_work *f)
> +{
> +	if (atomic_read(&f->chain.pending) <= 1)
> +		__set_bit(DMA_FENCE_WORK_IMM, &f->dma.flags);
> +

What is someone bumps pending to 2 at this point?

Regards,

Tvrtko

> +	dma_fence_work_commit(f);
> +}
> +
>   #endif /* I915_SW_FENCE_WORK_H */
> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
> index 08699fa069aa..191577a98390 100644
> --- a/drivers/gpu/drm/i915/i915_vma.c
> +++ b/drivers/gpu/drm/i915/i915_vma.c
> @@ -980,7 +980,7 @@ int i915_vma_pin(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
>   	mutex_unlock(&vma->vm->mutex);
>   err_fence:
>   	if (work)
> -		dma_fence_work_commit(&work->base);
> +		dma_fence_work_commit_imm(&work->base);
>   	if (wakeref)
>   		intel_runtime_pm_put(&vma->vm->i915->runtime_pm, wakeref);
>   err_pages:
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Intel-gfx] [PATCH 8/8] drm/i915/gem: Avoid gem_context->mutex for simple vma lookup
  2020-03-23  9:28 ` [Intel-gfx] [PATCH 8/8] drm/i915/gem: Avoid gem_context->mutex for simple vma lookup Chris Wilson
@ 2020-03-23 10:38   ` Tvrtko Ursulin
  0 siblings, 0 replies; 22+ messages in thread
From: Tvrtko Ursulin @ 2020-03-23 10:38 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 23/03/2020 09:28, Chris Wilson wrote:
> As we store the handle lookup inside a radix tree, we do not need the
> gem_context->mutex except until we need to insert our lookup into the
> common radix tree. This takes a small bit of rearranging to ensure that
> the lut we insert into the tree is ready prior to actually inserting it
> (as soon as it is exposed via the radixtree, it is visible to any other
> submission).
> 
> v2: For brownie points, remove the goto spaghetti.
> v3: Tighten up the closed-handle checks.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 136 +++++++++++-------
>   1 file changed, 87 insertions(+), 49 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> index e80c6f613feb..c643eec4dca0 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> @@ -481,7 +481,7 @@ eb_add_vma(struct i915_execbuffer *eb,
>   
>   	GEM_BUG_ON(i915_vma_is_closed(vma));
>   
> -	ev->vma = i915_vma_get(vma);
> +	ev->vma = vma;
>   	ev->exec = entry;
>   	ev->flags = entry->flags;
>   
> @@ -728,77 +728,117 @@ static int eb_select_context(struct i915_execbuffer *eb)
>   	return 0;
>   }
>   
> -static int eb_lookup_vmas(struct i915_execbuffer *eb)
> +static int __eb_add_lut(struct i915_execbuffer *eb,
> +			u32 handle, struct i915_vma *vma)
>   {
> -	struct radix_tree_root *handles_vma = &eb->gem_context->handles_vma;
> -	struct drm_i915_gem_object *obj;
> -	unsigned int i, batch;
> +	struct i915_gem_context *ctx = eb->gem_context;
> +	struct i915_lut_handle *lut;
>   	int err;
>   
> -	if (unlikely(i915_gem_context_is_closed(eb->gem_context)))
> -		return -ENOENT;
> +	lut = i915_lut_handle_alloc();
> +	if (unlikely(!lut))
> +		return -ENOMEM;
>   
> -	INIT_LIST_HEAD(&eb->relocs);
> -	INIT_LIST_HEAD(&eb->unbound);
> +	i915_vma_get(vma);
> +	if (!atomic_fetch_inc(&vma->open_count))
> +		i915_vma_reopen(vma);
> +	lut->handle = handle;
> +	lut->ctx = ctx;
> +
> +	/* Check that the context hasn't been closed in the meantime */
> +	err = -EINTR;
> +	if (!mutex_lock_interruptible(&ctx->mutex)) {
> +		err = -ENOENT;
> +		if (likely(!i915_gem_context_is_closed(ctx)))
> +			err = radix_tree_insert(&ctx->handles_vma, handle, vma);
> +		if (err == 0) { /* And nor has this handle */
> +			struct drm_i915_gem_object *obj = vma->obj;
> +
> +			i915_gem_object_lock(obj);
> +			if (idr_find(&eb->file->object_idr, handle) == obj) {
> +				list_add(&lut->obj_link, &obj->lut_list);
> +			} else {
> +				radix_tree_delete(&ctx->handles_vma, handle);
> +				err = -ENOENT;
> +			}
> +			i915_gem_object_unlock(obj);
> +		}
> +		mutex_unlock(&ctx->mutex);
> +	}
> +	if (unlikely(err))
> +		goto err;
>   
> -	batch = eb_batch_index(eb);
> +	return 0;
>   
> -	for (i = 0; i < eb->buffer_count; i++) {
> -		u32 handle = eb->exec[i].handle;
> -		struct i915_lut_handle *lut;
> +err:
> +	atomic_dec(&vma->open_count);
> +	i915_vma_put(vma);
> +	i915_lut_handle_free(lut);
> +	return err;
> +}
> +
> +static struct i915_vma *eb_lookup_vma(struct i915_execbuffer *eb, u32 handle)
> +{
> +	do {
> +		struct drm_i915_gem_object *obj;
>   		struct i915_vma *vma;
> +		int err;
>   
> -		vma = radix_tree_lookup(handles_vma, handle);
> +		rcu_read_lock();
> +		vma = radix_tree_lookup(&eb->gem_context->handles_vma, handle);
> +		if (likely(vma))
> +			vma = i915_vma_tryget(vma);
> +		rcu_read_unlock();
>   		if (likely(vma))
> -			goto add_vma;
> +			return vma;
>   
>   		obj = i915_gem_object_lookup(eb->file, handle);
> -		if (unlikely(!obj)) {
> -			err = -ENOENT;
> -			goto err_vma;
> -		}
> +		if (unlikely(!obj))
> +			return ERR_PTR(-ENOENT);
>   
>   		vma = i915_vma_instance(obj, eb->context->vm, NULL);
>   		if (IS_ERR(vma)) {
> -			err = PTR_ERR(vma);
> -			goto err_obj;
> +			i915_gem_object_put(obj);
> +			return vma;
>   		}
>   
> -		lut = i915_lut_handle_alloc();
> -		if (unlikely(!lut)) {
> -			err = -ENOMEM;
> -			goto err_obj;
> -		}
> +		err = __eb_add_lut(eb, handle, vma);
> +		if (likely(!err))
> +			return vma;
>   
> -		err = radix_tree_insert(handles_vma, handle, vma);
> -		if (unlikely(err)) {
> -			i915_lut_handle_free(lut);
> -			goto err_obj;
> -		}
> +		i915_gem_object_put(obj);
> +		if (err != -EEXIST)
> +			return ERR_PTR(err);
> +	} while (1);
> +}
>   
> -		/* transfer ref to lut */
> -		if (!atomic_fetch_inc(&vma->open_count))
> -			i915_vma_reopen(vma);
> -		lut->handle = handle;
> -		lut->ctx = eb->gem_context;
> +static int eb_lookup_vmas(struct i915_execbuffer *eb)
> +{
> +	unsigned int batch = eb_batch_index(eb);
> +	unsigned int i;
> +	int err = 0;
>   
> -		i915_gem_object_lock(obj);
> -		list_add(&lut->obj_link, &obj->lut_list);
> -		i915_gem_object_unlock(obj);
> +	INIT_LIST_HEAD(&eb->relocs);
> +	INIT_LIST_HEAD(&eb->unbound);
> +
> +	for (i = 0; i < eb->buffer_count; i++) {
> +		struct i915_vma *vma;
> +
> +		vma = eb_lookup_vma(eb, eb->exec[i].handle);
> +		if (IS_ERR(vma)) {
> +			err = PTR_ERR(vma);
> +			break;
> +		}
>   
> -add_vma:
>   		err = eb_validate_vma(eb, &eb->exec[i], vma);
> -		if (unlikely(err))
> -			goto err_vma;
> +		if (unlikely(err)) {
> +			i915_vma_put(vma);
> +			break;
> +		}
>   
>   		eb_add_vma(eb, i, batch, vma);
>   	}
>   
> -	return 0;
> -
> -err_obj:
> -	i915_gem_object_put(obj);
> -err_vma:
>   	eb->vma[i].vma = NULL;
>   	return err;
>   }
> @@ -1494,9 +1534,7 @@ static int eb_relocate(struct i915_execbuffer *eb)
>   {
>   	int err;
>   
> -	mutex_lock(&eb->gem_context->mutex);
>   	err = eb_lookup_vmas(eb);
> -	mutex_unlock(&eb->gem_context->mutex);
>   	if (err)
>   		return err;
>   
> 

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Intel-gfx] [PATCH 7/8] drm/i915: Immediately execute the fenced work
  2020-03-23 10:37   ` Tvrtko Ursulin
@ 2020-03-23 10:46     ` Chris Wilson
  2020-03-24 16:13       ` Tvrtko Ursulin
  0 siblings, 1 reply; 22+ messages in thread
From: Chris Wilson @ 2020-03-23 10:46 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2020-03-23 10:37:22)
> 
> On 23/03/2020 09:28, Chris Wilson wrote:
> > If the caller allows and we do not have to wait for any signals,
> > immediately execute the work within the caller's process. By doing so we
> > avoid the overhead of scheduling a new task, and the latency in
> > executing it, at the cost of pulling that work back into the immediate
> > context. (Sometimes we still prefer to offload the task to another cpu,
> > especially if we plan on executing many such tasks in parallel for this
> > client.)
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >   drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c |  2 +-
> >   drivers/gpu/drm/i915/i915_sw_fence_work.c      |  5 ++++-
> >   drivers/gpu/drm/i915/i915_sw_fence_work.h      | 12 ++++++++++++
> >   drivers/gpu/drm/i915/i915_vma.c                |  2 +-
> >   4 files changed, 18 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > index c2bd5accde0c..e80c6f613feb 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > @@ -1784,7 +1784,7 @@ static int eb_parse_pipeline(struct i915_execbuffer *eb,
> >       dma_resv_add_excl_fence(shadow->resv, &pw->base.dma);
> >       dma_resv_unlock(shadow->resv);
> >   
> > -     dma_fence_work_commit(&pw->base);
> > +     dma_fence_work_commit_imm(&pw->base);
> >       return 0;
> >   
> >   err_batch_unlock:
> > diff --git a/drivers/gpu/drm/i915/i915_sw_fence_work.c b/drivers/gpu/drm/i915/i915_sw_fence_work.c
> > index 997b2998f1f2..a3a81bb8f2c3 100644
> > --- a/drivers/gpu/drm/i915/i915_sw_fence_work.c
> > +++ b/drivers/gpu/drm/i915/i915_sw_fence_work.c
> > @@ -38,7 +38,10 @@ fence_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
> >   
> >               if (!f->dma.error) {
> >                       dma_fence_get(&f->dma);
> > -                     queue_work(system_unbound_wq, &f->work);
> > +                     if (test_bit(DMA_FENCE_WORK_IMM, &f->dma.flags))
> > +                             fence_work(&f->work);
> > +                     else
> > +                             queue_work(system_unbound_wq, &f->work);
> >               } else {
> >                       fence_complete(f);
> >               }
> > diff --git a/drivers/gpu/drm/i915/i915_sw_fence_work.h b/drivers/gpu/drm/i915/i915_sw_fence_work.h
> > index 3a22b287e201..0719d661dc9c 100644
> > --- a/drivers/gpu/drm/i915/i915_sw_fence_work.h
> > +++ b/drivers/gpu/drm/i915/i915_sw_fence_work.h
> > @@ -32,6 +32,10 @@ struct dma_fence_work {
> >       const struct dma_fence_work_ops *ops;
> >   };
> >   
> > +enum {
> > +     DMA_FENCE_WORK_IMM = DMA_FENCE_FLAG_USER_BITS,
> > +};
> > +
> >   void dma_fence_work_init(struct dma_fence_work *f,
> >                        const struct dma_fence_work_ops *ops);
> >   int dma_fence_work_chain(struct dma_fence_work *f, struct dma_fence *signal);
> > @@ -41,4 +45,12 @@ static inline void dma_fence_work_commit(struct dma_fence_work *f)
> >       i915_sw_fence_commit(&f->chain);
> >   }
> >   
> > +static inline void dma_fence_work_commit_imm(struct dma_fence_work *f)
> > +{
> > +     if (atomic_read(&f->chain.pending) <= 1)
> > +             __set_bit(DMA_FENCE_WORK_IMM, &f->dma.flags);
> > +
> 
> What is someone bumps pending to 2 at this point?

That's invalid. The sequence is

create a worker
... add all async waits ...
commit

Since the worker fires when pending waits hits zero, you cannot add any
more async waits after commit in a race free manner. You have to play
games, such as "is this fence already signaled? no, let's delay it
again". If you are playing such games, you know already and shouldn't be
trying to execute synchronously/immediately.

A BUG_ON(!dma_fence_signaled(&f->dma)) would suffice to catch most such
races.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Intel-gfx] [PATCH] drm/i915: Immediately execute the fenced work
  2020-03-23  9:28 ` [Intel-gfx] [PATCH 7/8] drm/i915: Immediately execute the fenced work Chris Wilson
  2020-03-23 10:37   ` Tvrtko Ursulin
@ 2020-03-23 11:04   ` Chris Wilson
  1 sibling, 0 replies; 22+ messages in thread
From: Chris Wilson @ 2020-03-23 11:04 UTC (permalink / raw)
  To: intel-gfx

If the caller allows and we do not have to wait for any signals,
immediately execute the work within the caller's process. By doing so we
avoid the overhead of scheduling a new task, and the latency in
executing it, at the cost of pulling that work back into the immediate
context. (Sometimes we still prefer to offload the task to another cpu,
especially if we plan on executing many such tasks in parallel for this
client.)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |  2 +-
 drivers/gpu/drm/i915/i915_sw_fence_work.c     |  5 ++-
 drivers/gpu/drm/i915/i915_sw_fence_work.h     | 33 +++++++++++++++++++
 drivers/gpu/drm/i915/i915_vma.c               |  2 +-
 4 files changed, 39 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index c2bd5accde0c..e80c6f613feb 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -1784,7 +1784,7 @@ static int eb_parse_pipeline(struct i915_execbuffer *eb,
 	dma_resv_add_excl_fence(shadow->resv, &pw->base.dma);
 	dma_resv_unlock(shadow->resv);
 
-	dma_fence_work_commit(&pw->base);
+	dma_fence_work_commit_imm(&pw->base);
 	return 0;
 
 err_batch_unlock:
diff --git a/drivers/gpu/drm/i915/i915_sw_fence_work.c b/drivers/gpu/drm/i915/i915_sw_fence_work.c
index 997b2998f1f2..a3a81bb8f2c3 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence_work.c
+++ b/drivers/gpu/drm/i915/i915_sw_fence_work.c
@@ -38,7 +38,10 @@ fence_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
 
 		if (!f->dma.error) {
 			dma_fence_get(&f->dma);
-			queue_work(system_unbound_wq, &f->work);
+			if (test_bit(DMA_FENCE_WORK_IMM, &f->dma.flags))
+				fence_work(&f->work);
+			else
+				queue_work(system_unbound_wq, &f->work);
 		} else {
 			fence_complete(f);
 		}
diff --git a/drivers/gpu/drm/i915/i915_sw_fence_work.h b/drivers/gpu/drm/i915/i915_sw_fence_work.h
index 3a22b287e201..1e85ecaba105 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence_work.h
+++ b/drivers/gpu/drm/i915/i915_sw_fence_work.h
@@ -13,6 +13,14 @@
 
 #include "i915_sw_fence.h"
 
+#ifdef CONFIG_DRM_I915_DEBUG_GEM /* placeholder */
+#define __DBG_BUG_ON(expr) BUG_ON(expr)
+#define __DBG_WARN_ON(expr) WARN_ON(expr)
+#else
+#define __DBG_BUG_ON(expr) BUILD_BUG_ON_INVALID(expr)
+#define __DBG_WARN_ON(expr) ({ unlikely(!!(expr)); })
+#endif
+
 struct dma_fence_work;
 
 struct dma_fence_work_ops {
@@ -32,6 +40,10 @@ struct dma_fence_work {
 	const struct dma_fence_work_ops *ops;
 };
 
+enum {
+	DMA_FENCE_WORK_IMM = DMA_FENCE_FLAG_USER_BITS,
+};
+
 void dma_fence_work_init(struct dma_fence_work *f,
 			 const struct dma_fence_work_ops *ops);
 int dma_fence_work_chain(struct dma_fence_work *f, struct dma_fence *signal);
@@ -41,4 +53,25 @@ static inline void dma_fence_work_commit(struct dma_fence_work *f)
 	i915_sw_fence_commit(&f->chain);
 }
 
+/**
+ * dma_fence_work_commit_imm: Commit the fence, and if possible execute locally.
+ * @f: the fenced worker
+ *
+ * Instead of always scheduling a worker to execute the callback (see
+ * dma_fence_work_commit()), we try to execute the callback immediately in
+ * the local context. It is required that the fence be committed before it
+ * is published, and that no other threads try to tamper with the number
+ * of asynchronous waits on the fence (or else the callback will be
+ * executed in the wrong context, i.e. not the callers).
+ */
+static inline void dma_fence_work_commit_imm(struct dma_fence_work *f)
+{
+	if (atomic_read(&f->chain.pending) <= 1)
+		__set_bit(DMA_FENCE_WORK_IMM, &f->dma.flags);
+
+	dma_fence_work_commit(f);
+	__DBG_BUG_ON(test_bit(DMA_FENCE_WORK_IMM, &f->dma.flags) &&
+		     !test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &f->dma.flags));
+}
+
 #endif /* I915_SW_FENCE_WORK_H */
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 08699fa069aa..191577a98390 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -980,7 +980,7 @@ int i915_vma_pin(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
 	mutex_unlock(&vma->vm->mutex);
 err_fence:
 	if (work)
-		dma_fence_work_commit(&work->base);
+		dma_fence_work_commit_imm(&work->base);
 	if (wakeref)
 		intel_runtime_pm_put(&vma->vm->i915->runtime_pm, wakeref);
 err_pages:
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915: Extend intel_wakeref to support delayed puts
  2020-03-23 10:32   ` [Intel-gfx] [PATCH] " Chris Wilson
@ 2020-03-23 12:24     ` Tvrtko Ursulin
  0 siblings, 0 replies; 22+ messages in thread
From: Tvrtko Ursulin @ 2020-03-23 12:24 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 23/03/2020 10:32, Chris Wilson wrote:
> In some cases we want to hold onto the wakeref for a little after the
> last user so that we can avoid having to drop and then immediately
> reacquire it. Allow the last user to specify if they would like to keep
> the wakeref alive for a short hysteresis.
> 
> v2: Embrace bitfield.h for adjustable flags.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_engine_pm.h   |  6 ++++++
>   drivers/gpu/drm/i915/gt/intel_gt_requests.c |  2 +-
>   drivers/gpu/drm/i915/intel_wakeref.c        | 12 ++++++-----
>   drivers/gpu/drm/i915/intel_wakeref.h        | 22 ++++++++++++++++++---
>   4 files changed, 33 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.h b/drivers/gpu/drm/i915/gt/intel_engine_pm.h
> index e52c2b0cb245..418df0a13145 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_pm.h
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.h
> @@ -37,6 +37,12 @@ static inline void intel_engine_pm_put_async(struct intel_engine_cs *engine)
>   	intel_wakeref_put_async(&engine->wakeref);
>   }
>   
> +static inline void intel_engine_pm_put_delay(struct intel_engine_cs *engine,
> +					     unsigned long delay)
> +{
> +	intel_wakeref_put_delay(&engine->wakeref, delay);
> +}
> +
>   static inline void intel_engine_pm_flush(struct intel_engine_cs *engine)
>   {
>   	intel_wakeref_unlock_wait(&engine->wakeref);
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> index 24c99d0838af..835ec184763e 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> @@ -38,7 +38,7 @@ static bool flush_submission(struct intel_gt *gt)
>   	for_each_engine(engine, gt, id) {
>   		intel_engine_flush_submission(engine);
>   		active |= flush_work(&engine->retire_work);
> -		active |= flush_work(&engine->wakeref.work);
> +		active |= flush_delayed_work(&engine->wakeref.work);
>   	}
>   
>   	return active;
> diff --git a/drivers/gpu/drm/i915/intel_wakeref.c b/drivers/gpu/drm/i915/intel_wakeref.c
> index 8fbf6f4d3f26..dfd87d082218 100644
> --- a/drivers/gpu/drm/i915/intel_wakeref.c
> +++ b/drivers/gpu/drm/i915/intel_wakeref.c
> @@ -70,11 +70,12 @@ static void ____intel_wakeref_put_last(struct intel_wakeref *wf)
>   
>   void __intel_wakeref_put_last(struct intel_wakeref *wf, unsigned long flags)
>   {
> -	INTEL_WAKEREF_BUG_ON(work_pending(&wf->work));
> +	INTEL_WAKEREF_BUG_ON(delayed_work_pending(&wf->work));
>   
>   	/* Assume we are not in process context and so cannot sleep. */
>   	if (flags & INTEL_WAKEREF_PUT_ASYNC || !mutex_trylock(&wf->mutex)) {
> -		schedule_work(&wf->work);
> +		mod_delayed_work(system_wq, &wf->work,
> +				 FIELD_GET(INTEL_WAKEREF_PUT_DELAY, flags));
>   		return;
>   	}
>   
> @@ -83,7 +84,7 @@ void __intel_wakeref_put_last(struct intel_wakeref *wf, unsigned long flags)
>   
>   static void __intel_wakeref_put_work(struct work_struct *wrk)
>   {
> -	struct intel_wakeref *wf = container_of(wrk, typeof(*wf), work);
> +	struct intel_wakeref *wf = container_of(wrk, typeof(*wf), work.work);
>   
>   	if (atomic_add_unless(&wf->count, -1, 1))
>   		return;
> @@ -104,8 +105,9 @@ void __intel_wakeref_init(struct intel_wakeref *wf,
>   	atomic_set(&wf->count, 0);
>   	wf->wakeref = 0;
>   
> -	INIT_WORK(&wf->work, __intel_wakeref_put_work);
> -	lockdep_init_map(&wf->work.lockdep_map, "wakeref.work", &key->work, 0);
> +	INIT_DELAYED_WORK(&wf->work, __intel_wakeref_put_work);
> +	lockdep_init_map(&wf->work.work.lockdep_map,
> +			 "wakeref.work", &key->work, 0);
>   }
>   
>   int intel_wakeref_wait_for_idle(struct intel_wakeref *wf)
> diff --git a/drivers/gpu/drm/i915/intel_wakeref.h b/drivers/gpu/drm/i915/intel_wakeref.h
> index 7d1e676b71ef..545c8f277c46 100644
> --- a/drivers/gpu/drm/i915/intel_wakeref.h
> +++ b/drivers/gpu/drm/i915/intel_wakeref.h
> @@ -8,6 +8,7 @@
>   #define INTEL_WAKEREF_H
>   
>   #include <linux/atomic.h>
> +#include <linux/bitfield.h>
>   #include <linux/bits.h>
>   #include <linux/lockdep.h>
>   #include <linux/mutex.h>
> @@ -41,7 +42,7 @@ struct intel_wakeref {
>   	struct intel_runtime_pm *rpm;
>   	const struct intel_wakeref_ops *ops;
>   
> -	struct work_struct work;
> +	struct delayed_work work;
>   };
>   
>   struct intel_wakeref_lockclass {
> @@ -117,6 +118,11 @@ intel_wakeref_get_if_active(struct intel_wakeref *wf)
>   	return atomic_inc_not_zero(&wf->count);
>   }
>   
> +enum {
> +	INTEL_WAKEREF_PUT_ASYNC_BIT = 0,
> +	__INTEL_WAKEREF_PUT_LAST_BIT__
> +};
> +
>   /**
>    * intel_wakeref_put_flags: Release the wakeref
>    * @wf: the wakeref
> @@ -134,7 +140,9 @@ intel_wakeref_get_if_active(struct intel_wakeref *wf)
>    */
>   static inline void
>   __intel_wakeref_put(struct intel_wakeref *wf, unsigned long flags)
> -#define INTEL_WAKEREF_PUT_ASYNC BIT(0)
> +#define INTEL_WAKEREF_PUT_ASYNC BIT(INTEL_WAKEREF_PUT_ASYNC_BIT)
> +#define INTEL_WAKEREF_PUT_DELAY \
> +	GENMASK(BITS_PER_LONG - 1, __INTEL_WAKEREF_PUT_LAST_BIT__)
>   {
>   	INTEL_WAKEREF_BUG_ON(atomic_read(&wf->count) <= 0);
>   	if (unlikely(!atomic_add_unless(&wf->count, -1, 1)))
> @@ -154,6 +162,14 @@ intel_wakeref_put_async(struct intel_wakeref *wf)
>   	__intel_wakeref_put(wf, INTEL_WAKEREF_PUT_ASYNC);
>   }
>   
> +static inline void
> +intel_wakeref_put_delay(struct intel_wakeref *wf, unsigned long delay)
> +{
> +	__intel_wakeref_put(wf,
> +			    INTEL_WAKEREF_PUT_ASYNC |
> +			    FIELD_PREP(INTEL_WAKEREF_PUT_DELAY, delay));
> +}
> +
>   /**
>    * intel_wakeref_lock: Lock the wakeref (mutex)
>    * @wf: the wakeref
> @@ -194,7 +210,7 @@ intel_wakeref_unlock_wait(struct intel_wakeref *wf)
>   {
>   	mutex_lock(&wf->mutex);
>   	mutex_unlock(&wf->mutex);
> -	flush_work(&wf->work);
> +	flush_delayed_work(&wf->work);
>   }
>   
>   /**
> 

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Intel-gfx] ✗ Fi.CI.BUILD: failure for series starting with [1/8] drm/i915/gt: Mark timeline->cacheline as destroyed after rcu grace period (rev3)
  2020-03-23  9:28 [Intel-gfx] [PATCH 1/8] drm/i915/gt: Mark timeline->cacheline as destroyed after rcu grace period Chris Wilson
                   ` (7 preceding siblings ...)
  2020-03-23  9:43 ` [Intel-gfx] [PATCH 1/8] drm/i915/gt: Mark timeline->cacheline as destroyed after rcu grace period Matthew Auld
@ 2020-03-23 13:06 ` Patchwork
  8 siblings, 0 replies; 22+ messages in thread
From: Patchwork @ 2020-03-23 13:06 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/8] drm/i915/gt: Mark timeline->cacheline as destroyed after rcu grace period (rev3)
URL   : https://patchwork.freedesktop.org/series/74967/
State : failure

== Summary ==

Applying: drm/i915/gt: Mark timeline->cacheline as destroyed after rcu grace period
Using index info to reconstruct a base tree...
M	drivers/gpu/drm/i915/gt/intel_timeline.c
Falling back to patching base and 3-way merge...
No changes -- Patch already applied.
Applying: drm/i915: Avoid live-lock with i915_vma_parked()
Using index info to reconstruct a base tree...
M	drivers/gpu/drm/i915/i915_vma.c
Falling back to patching base and 3-way merge...
No changes -- Patch already applied.
Applying: drm/i915: Extend intel_wakeref to support delayed puts
Applying: drm/i915/gt: Delay release of engine-pm after last retirement
Applying: drm/i915: Rely on direct submission to the queue
Using index info to reconstruct a base tree...
M	drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
M	drivers/gpu/drm/i915/gt/intel_lrc.c
M	drivers/gpu/drm/i915/i915_request.c
Falling back to patching base and 3-way merge...
Auto-merging drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
No changes -- Patch already applied.
Applying: drm/i915/execlists: Pull tasklet interrupt-bh local to direct submission
Applying: drm/i915: Immediately execute the fenced work
Applying: drm/i915/gem: Avoid gem_context->mutex for simple vma lookup
error: sha1 information is lacking or useless (drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c).
error: could not build fake ancestor
hint: Use 'git am --show-current-patch' to see the failed patch
Patch failed at 0008 drm/i915/gem: Avoid gem_context->mutex for simple vma lookup
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Intel-gfx] [PATCH 7/8] drm/i915: Immediately execute the fenced work
  2020-03-23 10:46     ` Chris Wilson
@ 2020-03-24 16:13       ` Tvrtko Ursulin
  2020-03-24 16:19         ` Chris Wilson
  0 siblings, 1 reply; 22+ messages in thread
From: Tvrtko Ursulin @ 2020-03-24 16:13 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 23/03/2020 10:46, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2020-03-23 10:37:22)
>>
>> On 23/03/2020 09:28, Chris Wilson wrote:
>>> If the caller allows and we do not have to wait for any signals,
>>> immediately execute the work within the caller's process. By doing so we
>>> avoid the overhead of scheduling a new task, and the latency in
>>> executing it, at the cost of pulling that work back into the immediate
>>> context. (Sometimes we still prefer to offload the task to another cpu,
>>> especially if we plan on executing many such tasks in parallel for this
>>> client.)
>>>
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> ---
>>>    drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c |  2 +-
>>>    drivers/gpu/drm/i915/i915_sw_fence_work.c      |  5 ++++-
>>>    drivers/gpu/drm/i915/i915_sw_fence_work.h      | 12 ++++++++++++
>>>    drivers/gpu/drm/i915/i915_vma.c                |  2 +-
>>>    4 files changed, 18 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>>> index c2bd5accde0c..e80c6f613feb 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>>> @@ -1784,7 +1784,7 @@ static int eb_parse_pipeline(struct i915_execbuffer *eb,
>>>        dma_resv_add_excl_fence(shadow->resv, &pw->base.dma);
>>>        dma_resv_unlock(shadow->resv);
>>>    
>>> -     dma_fence_work_commit(&pw->base);
>>> +     dma_fence_work_commit_imm(&pw->base);
>>>        return 0;
>>>    
>>>    err_batch_unlock:
>>> diff --git a/drivers/gpu/drm/i915/i915_sw_fence_work.c b/drivers/gpu/drm/i915/i915_sw_fence_work.c
>>> index 997b2998f1f2..a3a81bb8f2c3 100644
>>> --- a/drivers/gpu/drm/i915/i915_sw_fence_work.c
>>> +++ b/drivers/gpu/drm/i915/i915_sw_fence_work.c
>>> @@ -38,7 +38,10 @@ fence_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
>>>    
>>>                if (!f->dma.error) {
>>>                        dma_fence_get(&f->dma);
>>> -                     queue_work(system_unbound_wq, &f->work);
>>> +                     if (test_bit(DMA_FENCE_WORK_IMM, &f->dma.flags))
>>> +                             fence_work(&f->work);
>>> +                     else
>>> +                             queue_work(system_unbound_wq, &f->work);
>>>                } else {
>>>                        fence_complete(f);
>>>                }
>>> diff --git a/drivers/gpu/drm/i915/i915_sw_fence_work.h b/drivers/gpu/drm/i915/i915_sw_fence_work.h
>>> index 3a22b287e201..0719d661dc9c 100644
>>> --- a/drivers/gpu/drm/i915/i915_sw_fence_work.h
>>> +++ b/drivers/gpu/drm/i915/i915_sw_fence_work.h
>>> @@ -32,6 +32,10 @@ struct dma_fence_work {
>>>        const struct dma_fence_work_ops *ops;
>>>    };
>>>    
>>> +enum {
>>> +     DMA_FENCE_WORK_IMM = DMA_FENCE_FLAG_USER_BITS,
>>> +};
>>> +
>>>    void dma_fence_work_init(struct dma_fence_work *f,
>>>                         const struct dma_fence_work_ops *ops);
>>>    int dma_fence_work_chain(struct dma_fence_work *f, struct dma_fence *signal);
>>> @@ -41,4 +45,12 @@ static inline void dma_fence_work_commit(struct dma_fence_work *f)
>>>        i915_sw_fence_commit(&f->chain);
>>>    }
>>>    
>>> +static inline void dma_fence_work_commit_imm(struct dma_fence_work *f)
>>> +{
>>> +     if (atomic_read(&f->chain.pending) <= 1)
>>> +             __set_bit(DMA_FENCE_WORK_IMM, &f->dma.flags);
>>> +
>>
>> What is someone bumps pending to 2 at this point?
> 
> That's invalid. The sequence is
> 
> create a worker
> ... add all async waits ...
> commit
> 
> Since the worker fires when pending waits hits zero, you cannot add any
> more async waits after commit in a race free manner. You have to play
> games, such as "is this fence already signaled? no, let's delay it
> again". If you are playing such games, you know already and shouldn't be
> trying to execute synchronously/immediately.

> A BUG_ON(!dma_fence_signaled(&f->dma)) would suffice to catch most such
> races.

So the "if the callers allows" from the commit message means something 
like "if pending is only one at the time of commit" ?

Regards,

Tvrtko


_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Intel-gfx] [PATCH 7/8] drm/i915: Immediately execute the fenced work
  2020-03-24 16:13       ` Tvrtko Ursulin
@ 2020-03-24 16:19         ` Chris Wilson
  0 siblings, 0 replies; 22+ messages in thread
From: Chris Wilson @ 2020-03-24 16:19 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2020-03-24 16:13:04)
> 
> On 23/03/2020 10:46, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2020-03-23 10:37:22)
> >>
> >> On 23/03/2020 09:28, Chris Wilson wrote:
> >>> If the caller allows and we do not have to wait for any signals,
> >>> immediately execute the work within the caller's process. By doing so we
> >>> avoid the overhead of scheduling a new task, and the latency in
> >>> executing it, at the cost of pulling that work back into the immediate
> >>> context. (Sometimes we still prefer to offload the task to another cpu,
> >>> especially if we plan on executing many such tasks in parallel for this
> >>> client.)
> >>>
> >>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >>> ---
> >>>    drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c |  2 +-
> >>>    drivers/gpu/drm/i915/i915_sw_fence_work.c      |  5 ++++-
> >>>    drivers/gpu/drm/i915/i915_sw_fence_work.h      | 12 ++++++++++++
> >>>    drivers/gpu/drm/i915/i915_vma.c                |  2 +-
> >>>    4 files changed, 18 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> >>> index c2bd5accde0c..e80c6f613feb 100644
> >>> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> >>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> >>> @@ -1784,7 +1784,7 @@ static int eb_parse_pipeline(struct i915_execbuffer *eb,
> >>>        dma_resv_add_excl_fence(shadow->resv, &pw->base.dma);
> >>>        dma_resv_unlock(shadow->resv);
> >>>    
> >>> -     dma_fence_work_commit(&pw->base);
> >>> +     dma_fence_work_commit_imm(&pw->base);
> >>>        return 0;
> >>>    
> >>>    err_batch_unlock:
> >>> diff --git a/drivers/gpu/drm/i915/i915_sw_fence_work.c b/drivers/gpu/drm/i915/i915_sw_fence_work.c
> >>> index 997b2998f1f2..a3a81bb8f2c3 100644
> >>> --- a/drivers/gpu/drm/i915/i915_sw_fence_work.c
> >>> +++ b/drivers/gpu/drm/i915/i915_sw_fence_work.c
> >>> @@ -38,7 +38,10 @@ fence_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
> >>>    
> >>>                if (!f->dma.error) {
> >>>                        dma_fence_get(&f->dma);
> >>> -                     queue_work(system_unbound_wq, &f->work);
> >>> +                     if (test_bit(DMA_FENCE_WORK_IMM, &f->dma.flags))
> >>> +                             fence_work(&f->work);
> >>> +                     else
> >>> +                             queue_work(system_unbound_wq, &f->work);
> >>>                } else {
> >>>                        fence_complete(f);
> >>>                }
> >>> diff --git a/drivers/gpu/drm/i915/i915_sw_fence_work.h b/drivers/gpu/drm/i915/i915_sw_fence_work.h
> >>> index 3a22b287e201..0719d661dc9c 100644
> >>> --- a/drivers/gpu/drm/i915/i915_sw_fence_work.h
> >>> +++ b/drivers/gpu/drm/i915/i915_sw_fence_work.h
> >>> @@ -32,6 +32,10 @@ struct dma_fence_work {
> >>>        const struct dma_fence_work_ops *ops;
> >>>    };
> >>>    
> >>> +enum {
> >>> +     DMA_FENCE_WORK_IMM = DMA_FENCE_FLAG_USER_BITS,
> >>> +};
> >>> +
> >>>    void dma_fence_work_init(struct dma_fence_work *f,
> >>>                         const struct dma_fence_work_ops *ops);
> >>>    int dma_fence_work_chain(struct dma_fence_work *f, struct dma_fence *signal);
> >>> @@ -41,4 +45,12 @@ static inline void dma_fence_work_commit(struct dma_fence_work *f)
> >>>        i915_sw_fence_commit(&f->chain);
> >>>    }
> >>>    
> >>> +static inline void dma_fence_work_commit_imm(struct dma_fence_work *f)
> >>> +{
> >>> +     if (atomic_read(&f->chain.pending) <= 1)
> >>> +             __set_bit(DMA_FENCE_WORK_IMM, &f->dma.flags);
> >>> +
> >>
> >> What is someone bumps pending to 2 at this point?
> > 
> > That's invalid. The sequence is
> > 
> > create a worker
> > ... add all async waits ...
> > commit
> > 
> > Since the worker fires when pending waits hits zero, you cannot add any
> > more async waits after commit in a race free manner. You have to play
> > games, such as "is this fence already signaled? no, let's delay it
> > again". If you are playing such games, you know already and shouldn't be
> > trying to execute synchronously/immediately.
> 
> > A BUG_ON(!dma_fence_signaled(&f->dma)) would suffice to catch most such
> > races.
> 
> So the "if the callers allows" from the commit message means something 
> like "if pending is only one at the time of commit" ?

If the caller allows, means if it is valid to call the callback from
within the current context (i.e. within all the locks the caller holds).
And then there's the clflush where we not only worry about the locks in
the deep callpath, but also where we know that offloading the slow task
means we can parallelise better.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2020-03-24 16:19 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-23  9:28 [Intel-gfx] [PATCH 1/8] drm/i915/gt: Mark timeline->cacheline as destroyed after rcu grace period Chris Wilson
2020-03-23  9:28 ` [Intel-gfx] [PATCH 2/8] drm/i915: Avoid live-lock with i915_vma_parked() Chris Wilson
2020-03-23 10:09   ` Tvrtko Ursulin
2020-03-23  9:28 ` [Intel-gfx] [PATCH 3/8] drm/i915: Extend intel_wakeref to support delayed puts Chris Wilson
2020-03-23 10:13   ` Tvrtko Ursulin
2020-03-23 10:32   ` [Intel-gfx] [PATCH] " Chris Wilson
2020-03-23 12:24     ` Tvrtko Ursulin
2020-03-23  9:28 ` [Intel-gfx] [PATCH 4/8] drm/i915/gt: Delay release of engine-pm after last retirement Chris Wilson
2020-03-23 10:14   ` Tvrtko Ursulin
2020-03-23  9:28 ` [Intel-gfx] [PATCH 5/8] drm/i915: Rely on direct submission to the queue Chris Wilson
2020-03-23 10:29   ` Tvrtko Ursulin
2020-03-23  9:28 ` [Intel-gfx] [PATCH 6/8] drm/i915/execlists: Pull tasklet interrupt-bh local to direct submission Chris Wilson
2020-03-23  9:28 ` [Intel-gfx] [PATCH 7/8] drm/i915: Immediately execute the fenced work Chris Wilson
2020-03-23 10:37   ` Tvrtko Ursulin
2020-03-23 10:46     ` Chris Wilson
2020-03-24 16:13       ` Tvrtko Ursulin
2020-03-24 16:19         ` Chris Wilson
2020-03-23 11:04   ` [Intel-gfx] [PATCH] " Chris Wilson
2020-03-23  9:28 ` [Intel-gfx] [PATCH 8/8] drm/i915/gem: Avoid gem_context->mutex for simple vma lookup Chris Wilson
2020-03-23 10:38   ` Tvrtko Ursulin
2020-03-23  9:43 ` [Intel-gfx] [PATCH 1/8] drm/i915/gt: Mark timeline->cacheline as destroyed after rcu grace period Matthew Auld
2020-03-23 13:06 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for series starting with [1/8] drm/i915/gt: Mark timeline->cacheline as destroyed after rcu grace period (rev3) Patchwork

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.