* [PATCH] drm/i915: Wrap engine->schedule in RCU locks for set-wedge protection
@ 2018-03-03 9:35 Chris Wilson
From: Chris Wilson @ 2018-03-03 9:35 UTC (permalink / raw)
To: intel-gfx; +Cc: Mika Kuoppala
Similar to the staging around handling of engine->submit_request, we
need to stop adding to the execlists->queue prior to calling
engine->cancel_requests. cancel_requests will move requests from the
queue onto the timeline, so if we add a request onto the queue after that
point, it will be lost.
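[Editorial note: the race described above can be modelled in userspace with a plain function pointer standing in for engine->schedule. This is a hypothetical sketch, not driver code; the names below (request_add, set_wedged, etc.) are invented for illustration, and the RCU primitives are reduced to comments.]

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model of the race (not driver code; all names
 * below are invented for illustration). engine->schedule is an indirect
 * call that set_wedged() must disable before cancel_requests() drains
 * the queue; readers therefore sample the pointer inside what would be
 * an RCU read-side critical section and tolerate seeing NULL. */

typedef void (*schedule_fn)(int prio);

static int scheduled_prio = -1;

static void real_schedule(int prio)
{
	scheduled_prio = prio;
}

struct engine {
	schedule_fn schedule;
};

/* Reader side, mirroring the hedged call in __i915_request_add(). */
static void request_add(struct engine *e, int prio)
{
	/* rcu_read_lock() in the kernel */
	schedule_fn fn = e->schedule;	/* sample the pointer once */

	if (fn)
		fn(prio);
	/* rcu_read_unlock() */
}

/* Writer side, mirroring i915_gem_set_wedged(): stop new work first. */
static void set_wedged(struct engine *e)
{
	e->schedule = NULL;
	/* synchronize_rcu() would wait here for in-flight readers
	 * before cancel_requests() moves the queue onto the timeline. */
}
```

After set_wedged() returns, a late request_add() degrades to a no-op instead of queueing work that cancel_requests() can no longer see.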
Fixes: af7a8ffad9c5 ("drm/i915: Use rcu instead of stop_machine in set_wedged")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
---
drivers/gpu/drm/i915/i915_gem.c | 13 +++++++------
drivers/gpu/drm/i915/i915_request.c | 2 ++
2 files changed, 9 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index a5bd07338b46..8d913d833ab9 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -471,10 +471,11 @@ static void __fence_set_priority(struct dma_fence *fence, int prio)
rq = to_request(fence);
engine = rq->engine;
- if (!engine->schedule)
- return;
- engine->schedule(rq, prio);
+ rcu_read_lock();
+ if (engine->schedule)
+ engine->schedule(rq, prio);
+ rcu_read_unlock();
}
static void fence_set_priority(struct dma_fence *fence, int prio)
@@ -3214,8 +3215,11 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
*/
for_each_engine(engine, i915, id) {
i915_gem_reset_prepare_engine(engine);
+
engine->submit_request = nop_submit_request;
+ engine->schedule = NULL;
}
+ i915->caps.scheduler = 0;
/*
* Make sure no one is running the old callback before we proceed with
@@ -3233,11 +3237,8 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
* start to complete all requests.
*/
engine->submit_request = nop_complete_submit_request;
- engine->schedule = NULL;
}
- i915->caps.scheduler = 0;
-
/*
* Make sure no request can slip through without getting completed by
* either this call here to intel_engine_init_global_seqno, or the one
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 2265bb8ff4fa..59a87afd83b6 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1081,8 +1081,10 @@ void __i915_request_add(struct i915_request *request, bool flush_caches)
* decide whether to preempt the entire chain so that it is ready to
* run at the earliest possible convenience.
*/
+ rcu_read_lock();
if (engine->schedule)
engine->schedule(request, request->ctx->priority);
+ rcu_read_unlock();
local_bh_disable();
i915_sw_fence_commit(&request->submit);
--
2.16.2
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
* ✓ Fi.CI.BAT: success for drm/i915: Wrap engine->schedule in RCU locks for set-wedge protection
From: Patchwork @ 2018-03-03 10:13 UTC (permalink / raw)
To: Chris Wilson; +Cc: intel-gfx
== Series Details ==
Series: drm/i915: Wrap engine->schedule in RCU locks for set-wedge protection
URL : https://patchwork.freedesktop.org/series/39324/
State : success
== Summary ==
Series 39324v1 drm/i915: Wrap engine->schedule in RCU locks for set-wedge protection
https://patchwork.freedesktop.org/api/1.0/series/39324/revisions/1/mbox/
---- Known issues:
Test debugfs_test:
Subgroup read_all_entries:
incomplete -> PASS (fi-snb-2520m) fdo#103713 +1
Test gem_mmap_gtt:
Subgroup basic-small-bo-tiledx:
pass -> FAIL (fi-gdg-551) fdo#102575
Test kms_chamelium:
Subgroup dp-edid-read:
fail -> PASS (fi-kbl-7500u) fdo#102505
fdo#103713 https://bugs.freedesktop.org/show_bug.cgi?id=103713
fdo#102575 https://bugs.freedesktop.org/show_bug.cgi?id=102575
fdo#102505 https://bugs.freedesktop.org/show_bug.cgi?id=102505
fi-bdw-5557u total:288 pass:267 dwarn:0 dfail:0 fail:0 skip:21 time:414s
fi-bdw-gvtdvm total:288 pass:264 dwarn:0 dfail:0 fail:0 skip:24 time:424s
fi-blb-e6850 total:288 pass:223 dwarn:1 dfail:0 fail:0 skip:64 time:369s
fi-bsw-n3050 total:288 pass:242 dwarn:0 dfail:0 fail:0 skip:46 time:481s
fi-bwr-2160 total:288 pass:183 dwarn:0 dfail:0 fail:0 skip:105 time:278s
fi-bxt-dsi total:288 pass:258 dwarn:0 dfail:0 fail:0 skip:30 time:480s
fi-bxt-j4205 total:288 pass:259 dwarn:0 dfail:0 fail:0 skip:29 time:482s
fi-byt-j1900 total:288 pass:253 dwarn:0 dfail:0 fail:0 skip:35 time:463s
fi-byt-n2820 total:288 pass:249 dwarn:0 dfail:0 fail:0 skip:39 time:453s
fi-cfl-8700k total:288 pass:260 dwarn:0 dfail:0 fail:0 skip:28 time:391s
fi-cfl-s2 total:288 pass:262 dwarn:0 dfail:0 fail:0 skip:26 time:560s
fi-elk-e7500 total:288 pass:229 dwarn:0 dfail:0 fail:0 skip:59 time:415s
fi-gdg-551 total:288 pass:179 dwarn:0 dfail:0 fail:1 skip:108 time:287s
fi-glk-1 total:288 pass:260 dwarn:0 dfail:0 fail:0 skip:28 time:506s
fi-hsw-4770 total:288 pass:261 dwarn:0 dfail:0 fail:0 skip:27 time:387s
fi-ilk-650 total:288 pass:228 dwarn:0 dfail:0 fail:0 skip:60 time:406s
fi-ivb-3520m total:288 pass:259 dwarn:0 dfail:0 fail:0 skip:29 time:444s
fi-ivb-3770 total:288 pass:255 dwarn:0 dfail:0 fail:0 skip:33 time:411s
fi-kbl-7500u total:288 pass:263 dwarn:1 dfail:0 fail:0 skip:24 time:444s
fi-kbl-7560u total:288 pass:269 dwarn:0 dfail:0 fail:0 skip:19 time:492s
fi-kbl-7567u total:288 pass:268 dwarn:0 dfail:0 fail:0 skip:20 time:445s
fi-kbl-r total:288 pass:261 dwarn:0 dfail:0 fail:0 skip:27 time:491s
fi-pnv-d510 total:288 pass:222 dwarn:1 dfail:0 fail:0 skip:65 time:580s
fi-skl-6260u total:288 pass:268 dwarn:0 dfail:0 fail:0 skip:20 time:420s
fi-skl-6600u total:288 pass:261 dwarn:0 dfail:0 fail:0 skip:27 time:501s
fi-skl-6700hq total:288 pass:262 dwarn:0 dfail:0 fail:0 skip:26 time:519s
fi-skl-6700k2 total:288 pass:264 dwarn:0 dfail:0 fail:0 skip:24 time:483s
fi-skl-6770hq total:288 pass:268 dwarn:0 dfail:0 fail:0 skip:20 time:471s
fi-skl-guc total:288 pass:260 dwarn:0 dfail:0 fail:0 skip:28 time:404s
fi-skl-gvtdvm total:288 pass:265 dwarn:0 dfail:0 fail:0 skip:23 time:428s
fi-snb-2520m total:245 pass:211 dwarn:0 dfail:0 fail:0 skip:33
fi-snb-2600 total:288 pass:248 dwarn:0 dfail:0 fail:0 skip:40 time:391s
c07ef2c77d141730de535958cf9483d269b24676 drm-tip: 2018y-03m-03d-08h-39m-05s UTC integration manifest
854848bd28f7 drm/i915: Wrap engine->schedule in RCU locks for set-wedge protection
== Logs ==
For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_8226/issues.html
* ✓ Fi.CI.IGT: success for drm/i915: Wrap engine->schedule in RCU locks for set-wedge protection
From: Patchwork @ 2018-03-03 11:45 UTC (permalink / raw)
To: Chris Wilson; +Cc: intel-gfx
== Series Details ==
Series: drm/i915: Wrap engine->schedule in RCU locks for set-wedge protection
URL : https://patchwork.freedesktop.org/series/39324/
State : success
== Summary ==
---- Known issues:
Test gem_eio:
Subgroup in-flight-external:
incomplete -> PASS (shard-apl) fdo#104945
Test kms_chv_cursor_fail:
Subgroup pipe-b-256x256-right-edge:
pass -> DMESG-WARN (shard-snb) fdo#105185
Test kms_flip:
Subgroup blocking-wf_vblank:
fail -> PASS (shard-hsw) fdo#103928 +1
Subgroup dpms-vs-vblank-race-interruptible:
pass -> FAIL (shard-hsw) fdo#103060
fdo#104945 https://bugs.freedesktop.org/show_bug.cgi?id=104945
fdo#105185 https://bugs.freedesktop.org/show_bug.cgi?id=105185
fdo#103928 https://bugs.freedesktop.org/show_bug.cgi?id=103928
fdo#103060 https://bugs.freedesktop.org/show_bug.cgi?id=103060
shard-apl total:3462 pass:1820 dwarn:1 dfail:0 fail:7 skip:1632 time:11967s
shard-hsw total:3463 pass:1769 dwarn:1 dfail:0 fail:2 skip:1690 time:11797s
shard-snb total:3463 pass:1358 dwarn:3 dfail:0 fail:2 skip:2100 time:7021s
Blacklisted hosts:
shard-kbl total:3462 pass:1943 dwarn:1 dfail:0 fail:6 skip:1511 time:9274s
== Logs ==
For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_8226/shards.html
* Re: [PATCH] drm/i915: Wrap engine->schedule in RCU locks for set-wedge protection
From: Mika Kuoppala @ 2018-03-05 13:59 UTC (permalink / raw)
To: Chris Wilson, intel-gfx
Chris Wilson <chris@chris-wilson.co.uk> writes:
> Similar to the staging around handling of engine->submit_request, we
> need to stop adding to the execlists->queue prior to calling
> engine->cancel_requests. cancel_requests will move requests from the
> queue onto the timeline, so if we add a request onto the queue after that
> point, it will be lost.
>
> Fixes: af7a8ffad9c5 ("drm/i915: Use rcu instead of stop_machine in set_wedged")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> ---
> drivers/gpu/drm/i915/i915_gem.c | 13 +++++++------
> drivers/gpu/drm/i915/i915_request.c | 2 ++
> 2 files changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index a5bd07338b46..8d913d833ab9 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -471,10 +471,11 @@ static void __fence_set_priority(struct dma_fence *fence, int prio)
>
> rq = to_request(fence);
> engine = rq->engine;
> - if (!engine->schedule)
> - return;
>
> - engine->schedule(rq, prio);
> + rcu_read_lock();
> + if (engine->schedule)
> + engine->schedule(rq, prio);
> + rcu_read_unlock();
> }
>
> static void fence_set_priority(struct dma_fence *fence, int prio)
> @@ -3214,8 +3215,11 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
> */
> for_each_engine(engine, i915, id) {
> i915_gem_reset_prepare_engine(engine);
> +
> engine->submit_request = nop_submit_request;
> + engine->schedule = NULL;
Why are we not using an rcu_assign_pointer() and rcu_dereference()
pair in the upper part where we check schedule?
Further, is there a risk that we lose sync between the two
assignments? In other words, should we combine both callbacks
behind a single dereferenceable pointer in the engine struct?
-Mika
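[Editorial note: the publish/read pairing Mika asks about can be modelled in userspace with C11 atomics standing in for the kernel primitives. This is a sketch, not kernel code; every name ending in _model is invented here.]

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Userspace model of the rcu_assign_pointer()/rcu_dereference() pairing
 * (not kernel code; every _model name is invented here). The writer
 * publishes with release ordering and the reader samples with acquire
 * ordering, so a reader sees either the old callback or NULL, never a
 * torn value. */

typedef void (*schedule_fn)(int prio);

static _Atomic(schedule_fn) schedule_model;

static void noop_schedule_model(int prio)
{
	(void)prio;
}

/* models rcu_assign_pointer(engine->schedule, fn) */
static void publish_schedule_model(schedule_fn fn)
{
	atomic_store_explicit(&schedule_model, fn, memory_order_release);
}

/* models rcu_dereference(engine->schedule) */
static schedule_fn read_schedule_model(void)
{
	return atomic_load_explicit(&schedule_model, memory_order_acquire);
}
```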
> }
> + i915->caps.scheduler = 0;
>
> /*
> * Make sure no one is running the old callback before we proceed with
> @@ -3233,11 +3237,8 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
> * start to complete all requests.
> */
> engine->submit_request = nop_complete_submit_request;
> - engine->schedule = NULL;
> }
>
> - i915->caps.scheduler = 0;
> -
> /*
> * Make sure no request can slip through without getting completed by
> * either this call here to intel_engine_init_global_seqno, or the one
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 2265bb8ff4fa..59a87afd83b6 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -1081,8 +1081,10 @@ void __i915_request_add(struct i915_request *request, bool flush_caches)
> * decide whether to preempt the entire chain so that it is ready to
> * run at the earliest possible convenience.
> */
> + rcu_read_lock();
> if (engine->schedule)
> engine->schedule(request, request->ctx->priority);
> + rcu_read_unlock();
>
> local_bh_disable();
> i915_sw_fence_commit(&request->submit);
> --
> 2.16.2
>
* Re: [PATCH] drm/i915: Wrap engine->schedule in RCU locks for set-wedge protection
From: Chris Wilson @ 2018-03-05 14:34 UTC (permalink / raw)
To: Mika Kuoppala, intel-gfx
Quoting Mika Kuoppala (2018-03-05 13:59:43)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
>
> > Similar to the staging around handling of engine->submit_request, we
> > need to stop adding to the execlists->queue prior to calling
> > engine->cancel_requests. cancel_requests will move requests from the
> > queue onto the timeline, so if we add a request onto the queue after that
> > point, it will be lost.
> >
> > Fixes: af7a8ffad9c5 ("drm/i915: Use rcu instead of stop_machine in set_wedged")
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> > ---
> > drivers/gpu/drm/i915/i915_gem.c | 13 +++++++------
> > drivers/gpu/drm/i915/i915_request.c | 2 ++
> > 2 files changed, 9 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index a5bd07338b46..8d913d833ab9 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -471,10 +471,11 @@ static void __fence_set_priority(struct dma_fence *fence, int prio)
> >
> > rq = to_request(fence);
> > engine = rq->engine;
> > - if (!engine->schedule)
> > - return;
> >
> > - engine->schedule(rq, prio);
> > + rcu_read_lock();
> > + if (engine->schedule)
> > + engine->schedule(rq, prio);
> > + rcu_read_unlock();
> > }
> >
> > static void fence_set_priority(struct dma_fence *fence, int prio)
> > @@ -3214,8 +3215,11 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
> > */
> > for_each_engine(engine, i915, id) {
> > i915_gem_reset_prepare_engine(engine);
> > +
> > engine->submit_request = nop_submit_request;
> > + engine->schedule = NULL;
>
> Why are we not using an rcu_assign_pointer() and rcu_dereference()
> pair in the upper part where we check schedule?
We are not using RCU protection. RCU here is being abused as a
free-flowing stop-machine.
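[Editorial note: that "free-flowing stop-machine" use of RCU can be sketched in userspace with the grace period reduced to a comment. Hypothetical model, not the driver code; every name below is invented.]

```c
#include <assert.h>

/* Hypothetical userspace model of the two-phase wedge sequence (not the
 * driver code; every name below is invented). Phase 1 swaps in a
 * callback that drops work; the grace period then guarantees no CPU is
 * still running the old callback; phase 2 swaps in a callback that
 * completes everything. */

typedef void (*submit_fn)(int rq);

static int submitted_model, completed_model;

static void real_submit_model(int rq)
{
	(void)rq;
	submitted_model++;
}

static void nop_submit_model(int rq)
{
	(void)rq;	/* request dropped on the floor */
}

static void nop_complete_submit_model(int rq)
{
	(void)rq;
	completed_model++;
}

struct engine_model {
	submit_fn submit_request;
};

static void set_wedged_model(struct engine_model *e)
{
	e->submit_request = nop_submit_model;		/* phase 1 */
	/* synchronize_rcu(): wait out readers of the old callback */
	e->submit_request = nop_complete_submit_model;	/* phase 2 */
}
```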
> Further, is there a risk that we lose sync between the two
> assignments? In other words, should we combine both callbacks
> behind a single dereferenceable pointer in the engine struct?
They are only tied together by how the backend uses them, not by request
flow, so I don't think so.
-Chris
* Re: [PATCH] drm/i915: Wrap engine->schedule in RCU locks for set-wedge protection
From: Chris Wilson @ 2018-03-05 14:35 UTC (permalink / raw)
To: Mika Kuoppala, intel-gfx
Quoting Chris Wilson (2018-03-05 14:34:42)
> Quoting Mika Kuoppala (2018-03-05 13:59:43)
> > Chris Wilson <chris@chris-wilson.co.uk> writes:
> >
> > > Similar to the staging around handling of engine->submit_request, we
> > > need to stop adding to the execlists->queue prior to calling
> > > engine->cancel_requests. cancel_requests will move requests from the
> > > queue onto the timeline, so if we add a request onto the queue after that
> > > point, it will be lost.
> > >
> > > Fixes: af7a8ffad9c5 ("drm/i915: Use rcu instead of stop_machine in set_wedged")
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> > > ---
> > > drivers/gpu/drm/i915/i915_gem.c | 13 +++++++------
> > > drivers/gpu/drm/i915/i915_request.c | 2 ++
> > > 2 files changed, 9 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > > index a5bd07338b46..8d913d833ab9 100644
> > > --- a/drivers/gpu/drm/i915/i915_gem.c
> > > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > > @@ -471,10 +471,11 @@ static void __fence_set_priority(struct dma_fence *fence, int prio)
> > >
> > > rq = to_request(fence);
> > > engine = rq->engine;
> > > - if (!engine->schedule)
> > > - return;
> > >
> > > - engine->schedule(rq, prio);
> > > + rcu_read_lock();
> > > + if (engine->schedule)
> > > + engine->schedule(rq, prio);
> > > + rcu_read_unlock();
> > > }
> > >
> > > static void fence_set_priority(struct dma_fence *fence, int prio)
> > > @@ -3214,8 +3215,11 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
> > > */
> > > for_each_engine(engine, i915, id) {
> > > i915_gem_reset_prepare_engine(engine);
> > > +
> > > engine->submit_request = nop_submit_request;
> > > + engine->schedule = NULL;
> >
> > Why are we not using an rcu_assign_pointer() and rcu_dereference()
> > pair in the upper part where we check schedule?
>
> We are not using RCU protection. RCU here is being abused as a
> free-flowing stop-machine.
I'm sorely tempted to put it back to stop_machine as the races are just
plain weird and proving hard to fix :(
-Chris