All of lore.kernel.org
 help / color / mirror / Atom feed
From: Chris Wilson <chris@chris-wilson.co.uk>
To: intel-gfx@lists.freedesktop.org
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Subject: [Intel-gfx] [PATCH 31/39] drm/i915/gt: Remove virtual breadcrumb before transfer
Date: Wed, 26 Aug 2020 14:28:03 +0100	[thread overview]
Message-ID: <20200826132811.17577-31-chris@chris-wilson.co.uk> (raw)
In-Reply-To: <20200826132811.17577-1-chris@chris-wilson.co.uk>

The issue with stale virtual breadcrumbs remain. Now we have the problem
that if the irq-signaler is still referencing the stale breadcrumb as we
transfer it to a new sibling, the list becomes spaghetti. This is a very
small window, but that doesn't stop it being hit infrequently. To
prevent the lists being tangled (the iterator starting on one engine's
b->signalers but walking onto another list), always decouple the virtual
breadcrumb on schedule-out and make sure that the walker has stepped out
of the lists.

Otherwise, we end up corrupting the lists upon signaling:

<4> [412.623150] list_add corruption. prev->next should be next (ffff888284a98470), but was dead000000000100. (prev=ffff8882586fb870).
<4> [412.623167] WARNING: CPU: 7 PID: 0 at lib/list_debug.c:28 __list_add_valid+0x4d/0x70
<4> [412.623169] Modules linked in: i915(+) vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio mei_hdcp x86_pkg_temp_thermal ax88179_178a usbnet coretemp snd_intel_dspcfg mii snd_hda_codec crct10dif_pclmul crc32_pclmul snd_hwdep snd_hda_core ghash_clmulni_intel snd_pcm e1000e mei_me mei ptp pps_core prime_numbers intel_lpss_pci [last unloaded: i915]
<4> [412.623191] CPU: 7 PID: 0 Comm: swapper/7 Tainted: G     U            5.9.0-rc2-CI-CI_DRM_8925+ #1
<4> [412.623194] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake Y LPDDR4x T4 RVP TLC, BIOS ICLSFWR1.R00.3212.A00.1905212112 05/21/2019
<4> [412.623197] RIP: 0010:__list_add_valid+0x4d/0x70
<4> [412.623200] Code: c3 48 89 d1 48 c7 c7 18 57 33 82 48 89 c2 e8 fa 78 b6 ff 0f 0b 31 c0 c3 48 89 c1 4c 89 c6 48 c7 c7 68 57 33 82 e8 e3 78 b6 ff <0f> 0b 31 c0 c3 48 89 f2 4c 89 c1 48 89 fe 48 c7 c7 b8 57 33 82 e8
<4> [412.623202] RSP: 0018:ffffc90000280e18 EFLAGS: 00010086
<4> [412.623205] RAX: 0000000000000000 RBX: ffff88828dbd8d00 RCX: 0000000000000105
<4> [412.623208] RDX: 0000000000000105 RSI: ffffffff82324b63 RDI: 00000000ffffffff
<4> [412.623210] RBP: ffff8882586fb870 R08: ffff88829a144998 R09: 00000000fffffffe
<4> [412.623212] R10: 00000000c4bbeb58 R11: 00000000eb0f6cb0 R12: ffff888284a98428
<4> [412.623215] R13: ffff88827d7db880 R14: ffff88827d7db848 R15: ffff888284a98470
<4> [412.623217] FS:  0000000000000000(0000) GS:ffff88829c180000(0000) knlGS:0000000000000000
<4> [412.623220] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [412.623222] CR2: 0000561612b6e6e0 CR3: 0000000005610002 CR4: 0000000000770ee0
<4> [412.623224] PKRU: 55555554
<4> [412.623227] Call Trace:
<4> [412.623229]  <IRQ>
<4> [412.623307]  i915_request_enable_breadcrumb+0x278/0x400 [i915]
<4> [412.623391]  __i915_request_submit+0xca/0x2a0 [i915]
<4> [412.623461]  __execlists_submission_tasklet+0x480/0x1830 [i915]
<4> [412.623572]  execlists_submission_tasklet+0xc4/0x130 [i915]
<4> [412.623585]  tasklet_action_common.isra.17+0x198/0x1d0
<4> [412.623606]  __do_softirq+0xe0/0x497
<4> [412.623619]  ? handle_fasteoi_irq+0x150/0x150
<4> [412.623622]  asm_call_on_stack+0xf/0x20
<4> [412.623630]  </IRQ>
<4> [412.623634]  do_softirq_own_stack+0x73/0x90
<4> [412.623645]  irq_exit_rcu+0xa9/0xb0
<4> [412.623648]  common_interrupt+0xdd/0x210
<4> [412.623652]  asm_common_interrupt+0x1e/0x40

So far this has only been seen in our selftests that emphasize lots of
fast sibling switches for the virtual engines.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.c |  5 +++--
 drivers/gpu/drm/i915/gt/intel_lrc.c         | 15 +++++++++++++++
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
index f5f6feed0fa6..5b84f51931d9 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
@@ -461,15 +461,16 @@ void i915_request_cancel_breadcrumb(struct i915_request *rq)
 {
 	struct intel_breadcrumbs *b = READ_ONCE(rq->engine)->breadcrumbs;
 	struct intel_context *ce = rq->context;
+	unsigned long flags;
 	bool release;
 
 	if (!test_and_clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags))
 		return;
 
-	spin_lock(&ce->signal_lock);
+	spin_lock_irqsave(&ce->signal_lock, flags);
 	list_del_rcu(&rq->signal_link);
 	release = remove_signaling_context(b, ce);
-	spin_unlock(&ce->signal_lock);
+	spin_unlock_irqrestore(&ce->signal_lock, flags);
 	if (release)
 		intel_context_put(ce);
 
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index f89e4f62c88d..6e57f30c80f0 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -1387,6 +1387,21 @@ static inline void execlists_schedule_in(struct i915_request *rq, int idx)
 static void kick_siblings(struct i915_request *rq, struct intel_context *ce)
 {
 	struct virtual_engine *ve = container_of(ce, typeof(*ve), context);
+	struct intel_engine_cs *engine = rq->engine;
+
+	/* Flush concurrent rcu iterators in signal_irq_work */
+	if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &rq->fence.flags)) {
+		/*
+		 * After this point, the rq may be transferred to a new
+		 * sibling, so before we clear ce->inflight make sure that
+		 * the context has been removed from the b->signalers and
+		 * furthermore we need to make sure that the concurrent
+		 * iterator in signal_irq_work is no longer following
+		 * ce->signal_link.
+		 */
+		i915_request_cancel_breadcrumb(rq);
+		irq_work_sync(&engine->breadcrumbs->irq_work);
+	}
 
 	if (READ_ONCE(ve->request))
 		tasklet_hi_schedule(&ve->base.execlists.tasklet);
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

  parent reply	other threads:[~2020-08-26 13:28 UTC|newest]

Thread overview: 65+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-26 13:27 [PATCH 01/39] drm/i915/gem: Avoid implicit vmap for highmem on x86-32 Chris Wilson
2020-08-26 13:27 ` [Intel-gfx] " Chris Wilson
2020-08-26 13:27 ` [Intel-gfx] [PATCH 02/39] drm/i915/gem: Use set_pte_at() for assigning the vmapped PTE Chris Wilson
2020-08-26 16:36   ` Matthew Auld
2020-08-26 16:55     ` Chris Wilson
2020-08-26 13:27 ` [PATCH 03/39] drm/i915/gem: Prevent using pgprot_writecombine() if PAT is not supported Chris Wilson
2020-08-26 13:27   ` [Intel-gfx] " Chris Wilson
2020-08-26 13:27 ` [Intel-gfx] [PATCH 04/39] drm/i915/gt: Clear the buffer pool age before use Chris Wilson
2020-08-26 17:15   ` Matthew Auld
2020-08-26 13:27 ` [Intel-gfx] [PATCH 05/39] drm/i915/gt: Widen CSB pointer to u64 for the parsers Chris Wilson
2020-08-26 13:27 ` [PATCH 06/39] drm/i915/gt: Wait for CSB entries on Tigerlake Chris Wilson
2020-08-26 13:27   ` [Intel-gfx] " Chris Wilson
2020-08-28 14:04   ` Mika Kuoppala
2020-08-28 14:04     ` [Intel-gfx] " Mika Kuoppala
2020-08-26 13:27 ` [Intel-gfx] [PATCH 07/39] drm/i915/gt: Apply the CSB w/a for all Chris Wilson
2020-08-26 13:27 ` [Intel-gfx] [PATCH 08/39] drm/i915/gt: Show engine properties in the pretty printer Chris Wilson
2020-08-27 12:25   ` Mika Kuoppala
2020-08-26 13:27 ` [Intel-gfx] [PATCH 09/39] drm/i915/gem: Hold request reference for canceling an active context Chris Wilson
2020-08-26 13:27 ` [PATCH 10/39] drm/i915: Cancel outstanding work after disabling heartbeats on an engine Chris Wilson
2020-08-26 13:27   ` [Intel-gfx] " Chris Wilson
2020-08-29  9:30   ` Mika Kuoppala
2020-08-29  9:30     ` Mika Kuoppala
2020-08-26 13:27 ` [PATCH 11/39] drm/i915/gt: Always send a pulse down the engine after disabling heartbeat Chris Wilson
2020-08-26 13:27   ` [Intel-gfx] " Chris Wilson
2020-08-26 13:27 ` [PATCH 12/39] drm/i915/gem: Always test execution status on closing the context Chris Wilson
2020-08-26 13:27   ` [Intel-gfx] " Chris Wilson
2020-08-26 13:27 ` [Intel-gfx] [PATCH 13/39] drm/i915/gt: Signal cancelled requests Chris Wilson
2020-08-26 13:27 ` [Intel-gfx] [PATCH 14/39] drm/i915/selftests: Finish pending mock requests on cancellation Chris Wilson
2020-08-26 13:27 ` [Intel-gfx] [PATCH 15/39] drm/i915/gt: Retire cancelled requests on unload Chris Wilson
2020-08-26 13:27 ` [Intel-gfx] [PATCH 16/39] drm/i915/gt: Remove defunct intel_virtual_engine_get_sibling() Chris Wilson
2020-08-26 13:27 ` [Intel-gfx] [PATCH 17/39] drm/i915/gt: Defer enabling the breadcrumb interrupt to after submission Chris Wilson
2020-08-26 13:27 ` [Intel-gfx] [PATCH 18/39] drm/i915/gt: Track signaled breadcrumbs outside of the breadcrumb spinlock Chris Wilson
2020-08-26 13:27 ` [Intel-gfx] [PATCH 19/39] drm/i915/gt: Don't cancel the interrupt shadow too early Chris Wilson
2020-08-26 13:27 ` [Intel-gfx] [PATCH 20/39] drm/i915/gt: Free stale request on destroying the virtual engine Chris Wilson
2020-08-26 13:27 ` [Intel-gfx] [PATCH 21/39] drm/i915/gt: Protect context lifetime with RCU Chris Wilson
2020-08-26 13:27 ` [Intel-gfx] [PATCH 22/39] drm/i915/gt: Split the breadcrumb spinlock between global and contexts Chris Wilson
2020-08-26 13:27 ` [Intel-gfx] [PATCH 23/39] drm/i915/gt: Move the breadcrumb to the signaler if completed upon cancel Chris Wilson
2020-08-26 13:27 ` [Intel-gfx] [PATCH 24/39] drm/i915/gt: Decouple completed requests on unwind Chris Wilson
2020-08-26 13:27 ` [Intel-gfx] [PATCH 25/39] drm/i915/gt: Check for a completed last request once Chris Wilson
2020-08-26 13:27 ` [Intel-gfx] [PATCH 26/39] drm/i915/gt: Replace direct submit with direct call to tasklet Chris Wilson
2020-08-26 13:27 ` [Intel-gfx] [PATCH 27/39] drm/i915/gt: ce->inflight updates are now serialised Chris Wilson
2020-08-26 13:28 ` [Intel-gfx] [PATCH 28/39] drm/i915/gt: Use virtual_engine during execlists_dequeue Chris Wilson
2020-08-26 13:28 ` [Intel-gfx] [PATCH 29/39] drm/i915/gt: Decouple inflight virtual engines Chris Wilson
2020-08-26 13:28 ` [Intel-gfx] [PATCH 30/39] drm/i915/gt: Defer schedule_out until after the next dequeue Chris Wilson
2020-08-26 13:28 ` Chris Wilson [this message]
2020-08-26 13:28 ` [Intel-gfx] [PATCH 32/39] drm/i915/gt: Shrink the critical section for irq signaling Chris Wilson
2020-08-26 13:28 ` [Intel-gfx] [PATCH 33/39] drm/i915/gt: Resubmit the virtual engine on schedule-out Chris Wilson
2020-08-26 13:28 ` [Intel-gfx] [PATCH 34/39] drm/i915/gt: Simplify virtual engine handling for execlists_hold() Chris Wilson
2020-08-26 13:28 ` [Intel-gfx] [PATCH 35/39] drm/i915: Encode fence specific waitqueue behaviour into the wait.flags Chris Wilson
2020-09-02 14:02   ` Thomas Hellström (Intel)
2020-09-03  9:50     ` Thomas Hellström (Intel)
2020-09-03 11:32       ` Chris Wilson
2020-09-03 12:08         ` Thomas Hellström (Intel)
2020-08-26 13:28 ` [Intel-gfx] [PATCH 36/39] drm/i915/selftests: Confirm RING_TIMESTAMP / CTX_TIMESTAMP share a clock Chris Wilson
2020-08-26 13:28 ` [Intel-gfx] [PATCH 37/39] drm/i915/gt: Consolidate the CS timestamp clocks Chris Wilson
2020-08-26 13:28 ` [PATCH 38/39] drm/i915: Break up error capture compression loops with cond_resched() Chris Wilson
2020-08-26 13:28   ` [Intel-gfx] " Chris Wilson
2020-08-26 13:28 ` [Intel-gfx] [PATCH 39/39] drm/i915: Reduce GPU error capture mutex hold time Chris Wilson
2020-08-26 13:53 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/39] drm/i915/gem: Avoid implicit vmap for highmem on x86-32 Patchwork
2020-08-26 13:54 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2020-08-26 14:11 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
2020-08-26 14:20 ` [Intel-gfx] [PATCH 01/39] " Matthew Auld
2020-08-26 14:20   ` Matthew Auld
2020-08-26 20:43 ` Harald Arnesen
2020-08-26 20:43   ` [Intel-gfx] " Harald Arnesen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200826132811.17577-31-chris@chris-wilson.co.uk \
    --to=chris@chris-wilson.co.uk \
    --cc=intel-gfx@lists.freedesktop.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.