* [PATCH] drm/i915: Wrap engine->schedule in RCU locks for set-wedge protection
@ 2018-03-03  9:35 Chris Wilson
  2018-03-03 10:13 ` ✓ Fi.CI.BAT: success for " Patchwork
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Chris Wilson @ 2018-03-03  9:35 UTC (permalink / raw)
  To: intel-gfx; +Cc: Mika Kuoppala

Similar to the staged handling of engine->submit_request, we
need to stop adding to the execlists->queue prior to calling
engine->cancel_requests. cancel_requests will move requests from the
queue onto the timeline, so if we add a request onto the queue after that
point, it will be lost.
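
Concretely, the race being closed is of this shape (a schematic only,
not the exact code paths):

    __i915_request_add()                i915_gem_set_wedged()
    --------------------                ---------------------
    if (engine->schedule)
                                        engine->schedule = NULL;
                                        synchronize_rcu();
                                        engine->cancel_requests(engine);
        engine->schedule(rq, prio);
        /* rq joins execlists->queue after it was emptied: lost */

Wrapping the callback invocation in rcu_read_lock()/rcu_read_unlock()
makes the existing grace-period wait in set_wedged block until no such
submitter is still in flight before the queue is cancelled.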

Fixes: af7a8ffad9c5 ("drm/i915: Use rcu instead of stop_machine in set_wedged")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c     | 13 +++++++------
 drivers/gpu/drm/i915/i915_request.c |  2 ++
 2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index a5bd07338b46..8d913d833ab9 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -471,10 +471,11 @@ static void __fence_set_priority(struct dma_fence *fence, int prio)
 
 	rq = to_request(fence);
 	engine = rq->engine;
-	if (!engine->schedule)
-		return;
 
-	engine->schedule(rq, prio);
+	rcu_read_lock();
+	if (engine->schedule)
+		engine->schedule(rq, prio);
+	rcu_read_unlock();
 }
 
 static void fence_set_priority(struct dma_fence *fence, int prio)
@@ -3214,8 +3215,11 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
 	 */
 	for_each_engine(engine, i915, id) {
 		i915_gem_reset_prepare_engine(engine);
+
 		engine->submit_request = nop_submit_request;
+		engine->schedule = NULL;
 	}
+	i915->caps.scheduler = 0;
 
 	/*
 	 * Make sure no one is running the old callback before we proceed with
@@ -3233,11 +3237,8 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
 		 * start to complete all requests.
 		 */
 		engine->submit_request = nop_complete_submit_request;
-		engine->schedule = NULL;
 	}
 
-	i915->caps.scheduler = 0;
-
 	/*
 	 * Make sure no request can slip through without getting completed by
 	 * either this call here to intel_engine_init_global_seqno, or the one
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 2265bb8ff4fa..59a87afd83b6 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1081,8 +1081,10 @@ void __i915_request_add(struct i915_request *request, bool flush_caches)
 	 * decide whether to preempt the entire chain so that it is ready to
 	 * run at the earliest possible convenience.
 	 */
+	rcu_read_lock();
 	if (engine->schedule)
 		engine->schedule(request, request->ctx->priority);
+	rcu_read_unlock();
 
 	local_bh_disable();
 	i915_sw_fence_commit(&request->submit);
-- 
2.16.2


* ✓ Fi.CI.BAT: success for drm/i915: Wrap engine->schedule in RCU locks for set-wedge protection
  2018-03-03  9:35 [PATCH] drm/i915: Wrap engine->schedule in RCU locks for set-wedge protection Chris Wilson
@ 2018-03-03 10:13 ` Patchwork
  2018-03-03 11:45 ` ✓ Fi.CI.IGT: " Patchwork
  2018-03-05 13:59 ` [PATCH] " Mika Kuoppala
  2 siblings, 0 replies; 6+ messages in thread
From: Patchwork @ 2018-03-03 10:13 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Wrap engine->schedule in RCU locks for set-wedge protection
URL   : https://patchwork.freedesktop.org/series/39324/
State : success

== Summary ==

Series 39324v1 drm/i915: Wrap engine->schedule in RCU locks for set-wedge protection
https://patchwork.freedesktop.org/api/1.0/series/39324/revisions/1/mbox/

---- Known issues:

Test debugfs_test:
        Subgroup read_all_entries:
                incomplete -> PASS       (fi-snb-2520m) fdo#103713 +1
Test gem_mmap_gtt:
        Subgroup basic-small-bo-tiledx:
                pass       -> FAIL       (fi-gdg-551) fdo#102575
Test kms_chamelium:
        Subgroup dp-edid-read:
                fail       -> PASS       (fi-kbl-7500u) fdo#102505

fdo#103713 https://bugs.freedesktop.org/show_bug.cgi?id=103713
fdo#102575 https://bugs.freedesktop.org/show_bug.cgi?id=102575
fdo#102505 https://bugs.freedesktop.org/show_bug.cgi?id=102505

fi-bdw-5557u     total:288  pass:267  dwarn:0   dfail:0   fail:0   skip:21  time:414s
fi-bdw-gvtdvm    total:288  pass:264  dwarn:0   dfail:0   fail:0   skip:24  time:424s
fi-blb-e6850     total:288  pass:223  dwarn:1   dfail:0   fail:0   skip:64  time:369s
fi-bsw-n3050     total:288  pass:242  dwarn:0   dfail:0   fail:0   skip:46  time:481s
fi-bwr-2160      total:288  pass:183  dwarn:0   dfail:0   fail:0   skip:105 time:278s
fi-bxt-dsi       total:288  pass:258  dwarn:0   dfail:0   fail:0   skip:30  time:480s
fi-bxt-j4205     total:288  pass:259  dwarn:0   dfail:0   fail:0   skip:29  time:482s
fi-byt-j1900     total:288  pass:253  dwarn:0   dfail:0   fail:0   skip:35  time:463s
fi-byt-n2820     total:288  pass:249  dwarn:0   dfail:0   fail:0   skip:39  time:453s
fi-cfl-8700k     total:288  pass:260  dwarn:0   dfail:0   fail:0   skip:28  time:391s
fi-cfl-s2        total:288  pass:262  dwarn:0   dfail:0   fail:0   skip:26  time:560s
fi-elk-e7500     total:288  pass:229  dwarn:0   dfail:0   fail:0   skip:59  time:415s
fi-gdg-551       total:288  pass:179  dwarn:0   dfail:0   fail:1   skip:108 time:287s
fi-glk-1         total:288  pass:260  dwarn:0   dfail:0   fail:0   skip:28  time:506s
fi-hsw-4770      total:288  pass:261  dwarn:0   dfail:0   fail:0   skip:27  time:387s
fi-ilk-650       total:288  pass:228  dwarn:0   dfail:0   fail:0   skip:60  time:406s
fi-ivb-3520m     total:288  pass:259  dwarn:0   dfail:0   fail:0   skip:29  time:444s
fi-ivb-3770      total:288  pass:255  dwarn:0   dfail:0   fail:0   skip:33  time:411s
fi-kbl-7500u     total:288  pass:263  dwarn:1   dfail:0   fail:0   skip:24  time:444s
fi-kbl-7560u     total:288  pass:269  dwarn:0   dfail:0   fail:0   skip:19  time:492s
fi-kbl-7567u     total:288  pass:268  dwarn:0   dfail:0   fail:0   skip:20  time:445s
fi-kbl-r         total:288  pass:261  dwarn:0   dfail:0   fail:0   skip:27  time:491s
fi-pnv-d510      total:288  pass:222  dwarn:1   dfail:0   fail:0   skip:65  time:580s
fi-skl-6260u     total:288  pass:268  dwarn:0   dfail:0   fail:0   skip:20  time:420s
fi-skl-6600u     total:288  pass:261  dwarn:0   dfail:0   fail:0   skip:27  time:501s
fi-skl-6700hq    total:288  pass:262  dwarn:0   dfail:0   fail:0   skip:26  time:519s
fi-skl-6700k2    total:288  pass:264  dwarn:0   dfail:0   fail:0   skip:24  time:483s
fi-skl-6770hq    total:288  pass:268  dwarn:0   dfail:0   fail:0   skip:20  time:471s
fi-skl-guc       total:288  pass:260  dwarn:0   dfail:0   fail:0   skip:28  time:404s
fi-skl-gvtdvm    total:288  pass:265  dwarn:0   dfail:0   fail:0   skip:23  time:428s
fi-snb-2520m     total:245  pass:211  dwarn:0   dfail:0   fail:0   skip:33 
fi-snb-2600      total:288  pass:248  dwarn:0   dfail:0   fail:0   skip:40  time:391s

c07ef2c77d141730de535958cf9483d269b24676 drm-tip: 2018y-03m-03d-08h-39m-05s UTC integration manifest
854848bd28f7 drm/i915: Wrap engine->schedule in RCU locks for set-wedge protection

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_8226/issues.html

* ✓ Fi.CI.IGT: success for drm/i915: Wrap engine->schedule in RCU locks for set-wedge protection
  2018-03-03  9:35 [PATCH] drm/i915: Wrap engine->schedule in RCU locks for set-wedge protection Chris Wilson
  2018-03-03 10:13 ` ✓ Fi.CI.BAT: success for " Patchwork
@ 2018-03-03 11:45 ` Patchwork
  2018-03-05 13:59 ` [PATCH] " Mika Kuoppala
  2 siblings, 0 replies; 6+ messages in thread
From: Patchwork @ 2018-03-03 11:45 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Wrap engine->schedule in RCU locks for set-wedge protection
URL   : https://patchwork.freedesktop.org/series/39324/
State : success

== Summary ==

---- Known issues:

Test gem_eio:
        Subgroup in-flight-external:
                incomplete -> PASS       (shard-apl) fdo#104945
Test kms_chv_cursor_fail:
        Subgroup pipe-b-256x256-right-edge:
                pass       -> DMESG-WARN (shard-snb) fdo#105185
Test kms_flip:
        Subgroup blocking-wf_vblank:
                fail       -> PASS       (shard-hsw) fdo#103928 +1
        Subgroup dpms-vs-vblank-race-interruptible:
                pass       -> FAIL       (shard-hsw) fdo#103060

fdo#104945 https://bugs.freedesktop.org/show_bug.cgi?id=104945
fdo#105185 https://bugs.freedesktop.org/show_bug.cgi?id=105185
fdo#103928 https://bugs.freedesktop.org/show_bug.cgi?id=103928
fdo#103060 https://bugs.freedesktop.org/show_bug.cgi?id=103060

shard-apl        total:3462 pass:1820 dwarn:1   dfail:0   fail:7   skip:1632 time:11967s
shard-hsw        total:3463 pass:1769 dwarn:1   dfail:0   fail:2   skip:1690 time:11797s
shard-snb        total:3463 pass:1358 dwarn:3   dfail:0   fail:2   skip:2100 time:7021s
Blacklisted hosts:
shard-kbl        total:3462 pass:1943 dwarn:1   dfail:0   fail:6   skip:1511 time:9274s

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_8226/shards.html

* Re: [PATCH] drm/i915: Wrap engine->schedule in RCU locks for set-wedge protection
  2018-03-03  9:35 [PATCH] drm/i915: Wrap engine->schedule in RCU locks for set-wedge protection Chris Wilson
  2018-03-03 10:13 ` ✓ Fi.CI.BAT: success for " Patchwork
  2018-03-03 11:45 ` ✓ Fi.CI.IGT: " Patchwork
@ 2018-03-05 13:59 ` Mika Kuoppala
  2018-03-05 14:34   ` Chris Wilson
  2 siblings, 1 reply; 6+ messages in thread
From: Mika Kuoppala @ 2018-03-05 13:59 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> Similar to the staged handling of engine->submit_request, we
> need to stop adding to the execlists->queue prior to calling
> engine->cancel_requests. cancel_requests will move requests from the
> queue onto the timeline, so if we add a request onto the queue after that
> point, it will be lost.
>
> Fixes: af7a8ffad9c5 ("drm/i915: Use rcu instead of stop_machine in set_wedged")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem.c     | 13 +++++++------
>  drivers/gpu/drm/i915/i915_request.c |  2 ++
>  2 files changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index a5bd07338b46..8d913d833ab9 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -471,10 +471,11 @@ static void __fence_set_priority(struct dma_fence *fence, int prio)
>  
>  	rq = to_request(fence);
>  	engine = rq->engine;
> -	if (!engine->schedule)
> -		return;
>  
> -	engine->schedule(rq, prio);
> +	rcu_read_lock();
> +	if (engine->schedule)
> +		engine->schedule(rq, prio);
> +	rcu_read_unlock();
>  }
>  
>  static void fence_set_priority(struct dma_fence *fence, int prio)
> @@ -3214,8 +3215,11 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
>  	 */
>  	for_each_engine(engine, i915, id) {
>  		i915_gem_reset_prepare_engine(engine);
> +
>  		engine->submit_request = nop_submit_request;
> +		engine->schedule = NULL;

Why are we not using the rcu_assign_pointer and rcu_dereference pair
in the upper part where we check engine->schedule?

Further, is there a risk that we lose sync between the two
assignments? In other words, should we combine both callbacks
behind a single dereferenceable pointer in the engine struct?
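
Something like the following, hypothetically (sketch only; the names
are invented and the member would also need the __rcu annotation):

	struct engine_ops {
		void (*submit_request)(struct i915_request *rq);
		void (*schedule)(struct i915_request *rq, int prio);
	};

	/* reader side */
	const struct engine_ops *ops;

	rcu_read_lock();
	ops = rcu_dereference(engine->ops);
	if (ops->schedule)
		ops->schedule(rq, prio);
	rcu_read_unlock();

	/* set-wedge: both callbacks swapped in one go */
	rcu_assign_pointer(engine->ops, &nop_engine_ops);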

-Mika


>  	}
> +	i915->caps.scheduler = 0;
>  
>  	/*
>  	 * Make sure no one is running the old callback before we proceed with
> @@ -3233,11 +3237,8 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
>  		 * start to complete all requests.
>  		 */
>  		engine->submit_request = nop_complete_submit_request;
> -		engine->schedule = NULL;
>  	}
>  
> -	i915->caps.scheduler = 0;
> -
>  	/*
>  	 * Make sure no request can slip through without getting completed by
>  	 * either this call here to intel_engine_init_global_seqno, or the one
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 2265bb8ff4fa..59a87afd83b6 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -1081,8 +1081,10 @@ void __i915_request_add(struct i915_request *request, bool flush_caches)
>  	 * decide whether to preempt the entire chain so that it is ready to
>  	 * run at the earliest possible convenience.
>  	 */
> +	rcu_read_lock();
>  	if (engine->schedule)
>  		engine->schedule(request, request->ctx->priority);
> +	rcu_read_unlock();
>  
>  	local_bh_disable();
>  	i915_sw_fence_commit(&request->submit);
> -- 
> 2.16.2
>

* Re: [PATCH] drm/i915: Wrap engine->schedule in RCU locks for set-wedge protection
  2018-03-05 13:59 ` [PATCH] " Mika Kuoppala
@ 2018-03-05 14:34   ` Chris Wilson
  2018-03-05 14:35     ` Chris Wilson
  0 siblings, 1 reply; 6+ messages in thread
From: Chris Wilson @ 2018-03-05 14:34 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx

Quoting Mika Kuoppala (2018-03-05 13:59:43)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > Similar to the staged handling of engine->submit_request, we
> > need to stop adding to the execlists->queue prior to calling
> > engine->cancel_requests. cancel_requests will move requests from the
> > queue onto the timeline, so if we add a request onto the queue after that
> > point, it will be lost.
> >
> > Fixes: af7a8ffad9c5 ("drm/i915: Use rcu instead of stop_machine in set_wedged")
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_gem.c     | 13 +++++++------
> >  drivers/gpu/drm/i915/i915_request.c |  2 ++
> >  2 files changed, 9 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index a5bd07338b46..8d913d833ab9 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -471,10 +471,11 @@ static void __fence_set_priority(struct dma_fence *fence, int prio)
> >  
> >       rq = to_request(fence);
> >       engine = rq->engine;
> > -     if (!engine->schedule)
> > -             return;
> >  
> > -     engine->schedule(rq, prio);
> > +     rcu_read_lock();
> > +     if (engine->schedule)
> > +             engine->schedule(rq, prio);
> > +     rcu_read_unlock();
> >  }
> >  
> >  static void fence_set_priority(struct dma_fence *fence, int prio)
> > @@ -3214,8 +3215,11 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
> >        */
> >       for_each_engine(engine, i915, id) {
> >               i915_gem_reset_prepare_engine(engine);
> > +
> >               engine->submit_request = nop_submit_request;
> > +             engine->schedule = NULL;
> 
> Why are we not using the rcu_assign_pointer and rcu_dereference pair
> in the upper part where we check engine->schedule?

We are not using RCU protection. RCU here is being abused as a
free-flowing stop-machine.
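
The only property relied upon is the grace period acting as a barrier;
nothing is freed here, so there is no pointer lifetime for
rcu_dereference() to protect. Schematically:

    submitter                          set-wedge
    ---------                          ---------
    rcu_read_lock();
    if (engine->schedule)              engine->schedule = NULL;
            engine->schedule(rq, prio);
    rcu_read_unlock();                 synchronize_rcu();
                                       /* only returns once every
                                        * read-side section above
                                        * has exited */
                                       engine->cancel_requests(engine);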
 
> Further, is there a risk that we lose sync between the two
> assignments? In other words, should we combine both callbacks
> behind a single dereferenceable pointer in the engine struct?

They are only tied together by how the backend uses them, not by request
flow, so I don't think so.
-Chris

* Re: [PATCH] drm/i915: Wrap engine->schedule in RCU locks for set-wedge protection
  2018-03-05 14:34   ` Chris Wilson
@ 2018-03-05 14:35     ` Chris Wilson
  0 siblings, 0 replies; 6+ messages in thread
From: Chris Wilson @ 2018-03-05 14:35 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx

Quoting Chris Wilson (2018-03-05 14:34:42)
> Quoting Mika Kuoppala (2018-03-05 13:59:43)
> > Chris Wilson <chris@chris-wilson.co.uk> writes:
> > 
> > > Similar to the staged handling of engine->submit_request, we
> > > need to stop adding to the execlists->queue prior to calling
> > > engine->cancel_requests. cancel_requests will move requests from the
> > > queue onto the timeline, so if we add a request onto the queue after that
> > > point, it will be lost.
> > >
> > > Fixes: af7a8ffad9c5 ("drm/i915: Use rcu instead of stop_machine in set_wedged")
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> > > ---
> > >  drivers/gpu/drm/i915/i915_gem.c     | 13 +++++++------
> > >  drivers/gpu/drm/i915/i915_request.c |  2 ++
> > >  2 files changed, 9 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > > index a5bd07338b46..8d913d833ab9 100644
> > > --- a/drivers/gpu/drm/i915/i915_gem.c
> > > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > > @@ -471,10 +471,11 @@ static void __fence_set_priority(struct dma_fence *fence, int prio)
> > >  
> > >       rq = to_request(fence);
> > >       engine = rq->engine;
> > > -     if (!engine->schedule)
> > > -             return;
> > >  
> > > -     engine->schedule(rq, prio);
> > > +     rcu_read_lock();
> > > +     if (engine->schedule)
> > > +             engine->schedule(rq, prio);
> > > +     rcu_read_unlock();
> > >  }
> > >  
> > >  static void fence_set_priority(struct dma_fence *fence, int prio)
> > > @@ -3214,8 +3215,11 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
> > >        */
> > >       for_each_engine(engine, i915, id) {
> > >               i915_gem_reset_prepare_engine(engine);
> > > +
> > >               engine->submit_request = nop_submit_request;
> > > +             engine->schedule = NULL;
> > 
> > Why are we not using the rcu_assign_pointer and rcu_dereference pair
> > in the upper part where we check engine->schedule?
> 
> We are not using RCU protection. RCU here is being abused as a
> free-flowing stop-machine.

I'm sorely tempted to put it back to stop_machine as the races are just
plain weird and proving hard to fix :(
-Chris
