dri-devel.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH v3] drm/i915: Synchronize active and retire callbacks
       [not found] ` <20200403223528.2570-1-sultan@kerneltoast.com>
@ 2020-04-04  2:41   ` Sultan Alsawaf
       [not found]     ` <20200407064007.7599-1-sultan@kerneltoast.com>
  0 siblings, 1 reply; 9+ messages in thread
From: Sultan Alsawaf @ 2020-04-04  2:41 UTC (permalink / raw)
  To: stable, Jani Nikula, Joonas Lahtinen, Rodrigo Vivi, David Airlie,
	Daniel Vetter, Chris Wilson, Ville Syrjälä,
	Matthew Auld, Tvrtko Ursulin, Mika Kuoppala, Lionel Landwerlin,
	intel-gfx, dri-devel, linux-kernel

On Fri, Apr 03, 2020 at 03:35:15PM -0700, Sultan Alsawaf wrote:
> +			ref->retire(ref);
> +			mutex_unlock(&ref->callback_lock);

Ugh, this patch is still wrong because the mutex unlock after ref->retire() is a
use-after-free. Fun times...

Sultan
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v4] drm/i915: Synchronize active and retire callbacks
       [not found]     ` <20200407064007.7599-1-sultan@kerneltoast.com>
@ 2020-04-14  6:13       ` Sultan Alsawaf
  2020-04-14  8:23         ` Chris Wilson
  0 siblings, 1 reply; 9+ messages in thread
From: Sultan Alsawaf @ 2020-04-14  6:13 UTC (permalink / raw)
  To: Chris Wilson
  Cc: dri-devel, David Airlie, intel-gfx, linux-kernel, stable,
	Rodrigo Vivi, Matthew Auld

Chris,

Could you please take a look at this? This really is quite an important fix.

Thanks,
Sultan
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v4] drm/i915: Synchronize active and retire callbacks
  2020-04-14  6:13       ` [PATCH v4] " Sultan Alsawaf
@ 2020-04-14  8:23         ` Chris Wilson
  2020-04-14 14:43           ` Sultan Alsawaf
  0 siblings, 1 reply; 9+ messages in thread
From: Chris Wilson @ 2020-04-14  8:23 UTC (permalink / raw)
  To: Sultan Alsawaf
  Cc: dri-devel, David Airlie, intel-gfx, linux-kernel, stable,
	Rodrigo Vivi, Matthew Auld

Quoting Sultan Alsawaf (2020-04-14 07:13:12)
> Chris,
> 
> Could you please take a look at this? This really is quite an important fix.

It's crazy. See a266bf420060 for a patch that should be applied to v5.4
-Chris
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v4] drm/i915: Synchronize active and retire callbacks
  2020-04-14  8:23         ` Chris Wilson
@ 2020-04-14 14:43           ` Sultan Alsawaf
  2020-04-20  5:24             ` Sultan Alsawaf
  0 siblings, 1 reply; 9+ messages in thread
From: Sultan Alsawaf @ 2020-04-14 14:43 UTC (permalink / raw)
  To: Chris Wilson
  Cc: dri-devel, David Airlie, intel-gfx, linux-kernel, stable,
	Rodrigo Vivi, Matthew Auld

On Tue, Apr 14, 2020 at 09:23:56AM +0100, Chris Wilson wrote:
> Quoting Sultan Alsawaf (2020-04-14 07:13:12)
> > Chris,
> > 
> > Could you please take a look at this? This really is quite an important fix.
> 
> It's crazy. See a266bf420060 for a patch that should be applied to v5.4
> -Chris

What? a266bf420060 was part of 5.4.0-rc7, so it's already in 5.4. And if you
read the commit message, you would see that the problem in question affects
Linus' tree.

You can break i915 in 5.6 by just adding a small delay:

diff --git a/drivers/gpu/drm/i915/gt/intel_ring.c b/drivers/gpu/drm/i915/gt/intel_ring.c
index 6ff803f397c4..3a7968effdfd 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring.c
@@ -10,6 +10,7 @@
 #include "intel_engine.h"
 #include "intel_ring.h"
 #include "intel_timeline.h"
+#include <linux/delay.h>
 
 unsigned int intel_ring_update_space(struct intel_ring *ring)
 {
@@ -92,6 +93,9 @@ void intel_ring_unpin(struct intel_ring *ring)
 	else
 		i915_gem_object_unpin_map(vma->obj);
 
+	mdelay(1);
+	ring->vaddr = NULL;
+
 	i915_vma_make_purgeable(vma);
 	i915_vma_unpin(vma);
 }

This is how I reproduced the race in question. I can't even reach the greeter on
my laptop with this, because i915 dies before that.

Sultan
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v4] drm/i915: Synchronize active and retire callbacks
  2020-04-14 14:43           ` Sultan Alsawaf
@ 2020-04-20  5:24             ` Sultan Alsawaf
  2020-04-20  8:21               ` Joonas Lahtinen
  0 siblings, 1 reply; 9+ messages in thread
From: Sultan Alsawaf @ 2020-04-20  5:24 UTC (permalink / raw)
  To: Chris Wilson
  Cc: dri-devel, David Airlie, intel-gfx, linux-kernel, stable,
	Rodrigo Vivi, Matthew Auld

Chris,

Could you please look at this in earnest? This is a real bug that crashes my
laptop without any kind of provocation. It is undeniably a bug in i915, and I've
clearly described it in my patch. If you dont like the patch, I'm open to any
suggestions you have for an alternative solution. My goal here is to make i915
better, but it's difficult when communication only goes one way.

Thanks,
Sultan
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v4] drm/i915: Synchronize active and retire callbacks
  2020-04-20  5:24             ` Sultan Alsawaf
@ 2020-04-20  8:21               ` Joonas Lahtinen
  2020-04-20 16:15                 ` Sultan Alsawaf
  0 siblings, 1 reply; 9+ messages in thread
From: Joonas Lahtinen @ 2020-04-20  8:21 UTC (permalink / raw)
  To: Chris Wilson, Sultan Alsawaf
  Cc: dri-devel, David Airlie, intel-gfx, linux-kernel, Matthew Auld,
	Rodrigo Vivi, stable

Quoting Sultan Alsawaf (2020-04-20 08:24:19)
> Chris,
> 
> Could you please look at this in earnest? This is a real bug that crashes my
> laptop without any kind of provocation. It is undeniably a bug in i915, and I've
> clearly described it in my patch. If you dont like the patch, I'm open to any
> suggestions you have for an alternative solution. My goal here is to make i915
> better, but it's difficult when communication only goes one way.

Hi Sultan,

The patch Chris pointed out was not part of 5.4 release. The commit
message describes that it fixes the functions to be tolerant to
running simultaneously. In doing that zeroing of ring->vaddr is
removed so the test to do mdelay(1) and "ring->vaddr = NULL;" is
not correct.

I think you might have used the wrong git command for checking the
patch history:

$ git describe a266bf420060
v5.4-rc7-1996-ga266bf420060 # after -rc7 tag

$ git describe --contains a266bf420060
v5.6-rc1~34^2~21^2~326 # included in v5.6-rc1

And git log to double check:

$ git log --format=oneline kernel.org/stable/linux-5.4.y --grep="drm/i915/gt: Make intel_ring_unpin() safe for concurrent pint"
$ git log --format=oneline kernel.org/stable/linux-5.5.y --grep="drm/i915/gt: Make intel_ring_unpin() safe for concurrent pint"
0725d9a31869e6c80630e99da366ede2848295cc drm/i915/gt: Make intel_ring_unpin() safe for concurrent pint
$ git log --format=oneline kernel.org/stable/linux-5.6.y --grep="drm/i915/gt: Make intel_ring_unpin() safe for concurrent pint"
a754012b9f2323a5d640da7eb7b095ac3b8cd012 drm/i915/execlists: Leave resetting ring to intel_ring
0725d9a31869e6c80630e99da366ede2848295cc drm/i915/gt: Make intel_ring_unpin() safe for concurrent pint
a266bf42006004306dd48a9082c35dfbff153307 drm/i915/gt: Make intel_ring_unpin() safe for concurrent pint

So it seems that the patch got pulled into v5.6 and has been backported
to v5.5 but not v5.4.

Could you try applying the patch to 5.4 and seeing if the problem
persists?

Regards, Joonas
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v4] drm/i915: Synchronize active and retire callbacks
  2020-04-20  8:21               ` Joonas Lahtinen
@ 2020-04-20 16:15                 ` Sultan Alsawaf
  2020-04-21  6:51                   ` Joonas Lahtinen
  0 siblings, 1 reply; 9+ messages in thread
From: Sultan Alsawaf @ 2020-04-20 16:15 UTC (permalink / raw)
  To: Joonas Lahtinen
  Cc: dri-devel, David Airlie, intel-gfx, linux-kernel, Chris Wilson,
	stable, Rodrigo Vivi, Matthew Auld

On Mon, Apr 20, 2020 at 11:21:42AM +0300, Joonas Lahtinen wrote:
> So it seems that the patch got pulled into v5.6 and has been backported
> to v5.5 but not v5.4.

You're right, that's my mistake.

> In doing that zeroing of ring->vaddr is removed so the test to do mdelay(1)
> and "ring->vaddr = NULL;" is not correct.

I'm not so sure about this. Look at where `ring->vaddr` is assigned:
-------------------------------------8<-----------------------------------------
	ret = i915_vma_pin(vma, 0, 0, flags);
	if (unlikely(ret))
		goto err_unpin;

	if (i915_vma_is_map_and_fenceable(vma))
		addr = (void __force *)i915_vma_pin_iomap(vma);
	else
		addr = i915_gem_object_pin_map(vma->obj,
					       i915_coherent_map_type(vma->vm->i915));
	if (IS_ERR(addr)) {
		ret = PTR_ERR(addr);
		goto err_ring;
	}

	i915_vma_make_unshrinkable(vma);

	/* Discard any unused bytes beyond that submitted to hw. */
	intel_ring_reset(ring, ring->emit);

	ring->vaddr = addr;
------------------------------------->8-----------------------------------------

And then the converse of that is done *before* my reproducer patch does
`ring->vaddr = NULL;`:
-------------------------------------8<-----------------------------------------
	i915_vma_unset_ggtt_write(vma);
	if (i915_vma_is_map_and_fenceable(vma))
		i915_vma_unpin_iomap(vma);
	else
		i915_gem_object_unpin_map(vma->obj);

	/* mdelay(1);
	ring->vaddr = NULL; */

	i915_vma_make_purgeable(vma);
	i915_vma_unpin(vma);
------------------------------------->8-----------------------------------------

Isn't the value assigned to `ring->vaddr` trashed by those function calls above
where I've got the mdelay? If so, why would it be correct to let the stale value
sit in `ring->vaddr`?

My interpretation of the zeroing of ring->vaddr being removed by Chris was that
it was an unnecessary step before the ring was getting discarded anyway.

Sultan
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v4] drm/i915: Synchronize active and retire callbacks
  2020-04-20 16:15                 ` Sultan Alsawaf
@ 2020-04-21  6:51                   ` Joonas Lahtinen
  2020-04-21 15:54                     ` Sultan Alsawaf
  0 siblings, 1 reply; 9+ messages in thread
From: Joonas Lahtinen @ 2020-04-21  6:51 UTC (permalink / raw)
  To: Sultan Alsawaf
  Cc: dri-devel, David Airlie, intel-gfx, linux-kernel, Chris Wilson,
	stable, Rodrigo Vivi, Matthew Auld

Quoting Sultan Alsawaf (2020-04-20 19:15:14)
> On Mon, Apr 20, 2020 at 11:21:42AM +0300, Joonas Lahtinen wrote:
> > So it seems that the patch got pulled into v5.6 and has been backported
> > to v5.5 but not v5.4.
> 
> You're right, that's my mistake.

Did applying the patch to v5.4 fix the issue at hand?

Regards, Joonas
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v4] drm/i915: Synchronize active and retire callbacks
  2020-04-21  6:51                   ` Joonas Lahtinen
@ 2020-04-21 15:54                     ` Sultan Alsawaf
  0 siblings, 0 replies; 9+ messages in thread
From: Sultan Alsawaf @ 2020-04-21 15:54 UTC (permalink / raw)
  To: Joonas Lahtinen
  Cc: dri-devel, David Airlie, intel-gfx, linux-kernel, Chris Wilson,
	stable, Rodrigo Vivi, Matthew Auld

On Tue, Apr 21, 2020 at 09:51:37AM +0300, Joonas Lahtinen wrote:
> Quoting Sultan Alsawaf (2020-04-20 19:15:14)
> > On Mon, Apr 20, 2020 at 11:21:42AM +0300, Joonas Lahtinen wrote:
> > > So it seems that the patch got pulled into v5.6 and has been backported
> > > to v5.5 but not v5.4.
> > 
> > You're right, that's my mistake.
> 
> Did applying the patch to v5.4 fix the issue at hand?

Of course the patch doesn't apply as-is because the code's been shuffled around,
but yes. The crashes are gone with that patch, and I don't have the motivation
to check if that patch is actually correct, so hurray, problem solved. I'm not
going to send the backport myself because I'll probably be ignored, so you can
go ahead and do that.

Sultan
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2020-04-22  6:55 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20200403042948.2533-1-sultan@kerneltoast.com>
     [not found] ` <20200403223528.2570-1-sultan@kerneltoast.com>
2020-04-04  2:41   ` [PATCH v3] drm/i915: Synchronize active and retire callbacks Sultan Alsawaf
     [not found]     ` <20200407064007.7599-1-sultan@kerneltoast.com>
2020-04-14  6:13       ` [PATCH v4] " Sultan Alsawaf
2020-04-14  8:23         ` Chris Wilson
2020-04-14 14:43           ` Sultan Alsawaf
2020-04-20  5:24             ` Sultan Alsawaf
2020-04-20  8:21               ` Joonas Lahtinen
2020-04-20 16:15                 ` Sultan Alsawaf
2020-04-21  6:51                   ` Joonas Lahtinen
2020-04-21 15:54                     ` Sultan Alsawaf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).