* [PATCH] drm/i915: Use rcu instead of stop_machine
@ 2017-10-05 14:09 Daniel Vetter
  2017-10-05 14:30 ` Chris Wilson
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Daniel Vetter @ 2017-10-05 14:09 UTC (permalink / raw)
  To: Intel Graphics Development; +Cc: Daniel Vetter, Daniel Vetter, Mika Kuoppala

stop_machine is not really a locking primitive we should use, except
when the hw folks tell us the hw is broken and that's the only way to
work around it.

This patch here is just a suggestion for how to fix it up, possible
changes needed to make it actually work:

- Set the nop_submit_request first for _all_ engines, before
  proceeding.

- Make sure engine->cancel_requests copes with the possibility that
  not all tests have consistently used the new or old version. I don't
  think this is a problem, since the same can happen really with the
  stop_machine() locking - stop_machine also doesn't give you any kind
  of global ordering against other cpu threads, it just makes them
  stop.

This patch tries to address the locking snafu from

commit 20e4933c478a1ca694b38fa4ac44d99e659941f5
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Nov 22 14:41:21 2016 +0000

    drm/i915: Stop the machine as we install the wedged submit_request handler

Chris said parts of the reasons for going with stop_machine() was that
it's no overhead for the fast-path. But these callbacks use irqsave
spinlocks and do a bunch of MMIO, and rcu_read_lock is _real_ fast.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c                   | 18 +++++-------------
 drivers/gpu/drm/i915/i915_gem_request.c           |  2 ++
 drivers/gpu/drm/i915/selftests/i915_gem_request.c |  2 ++
 3 files changed, 9 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index ab8c6946fea4..0b260e576b4b 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3022,13 +3022,13 @@ static void nop_submit_request(struct drm_i915_gem_request *request)
 
 static void engine_set_wedged(struct intel_engine_cs *engine)
 {
+	engine->submit_request = nop_submit_request;
+
 	/* We need to be sure that no thread is running the old callback as
 	 * we install the nop handler (otherwise we would submit a request
-	 * to hardware that will never complete). In order to prevent this
-	 * race, we wait until the machine is idle before making the swap
-	 * (using stop_machine()).
+	 * to hardware that will never complete).
 	 */
-	engine->submit_request = nop_submit_request;
+	synchronize_rcu();
 
 	/* Mark all executing requests as skipped */
 	engine->cancel_requests(engine);
@@ -3041,9 +3041,8 @@ static void engine_set_wedged(struct intel_engine_cs *engine)
 				       intel_engine_last_submit(engine));
 }
 
-static int __i915_gem_set_wedged_BKL(void *data)
+void i915_gem_set_wedged(struct drm_i915_private *i915)
 {
-	struct drm_i915_private *i915 = data;
 	struct intel_engine_cs *engine;
 	enum intel_engine_id id;
 
@@ -3052,13 +3051,6 @@ static int __i915_gem_set_wedged_BKL(void *data)
 
 	set_bit(I915_WEDGED, &i915->gpu_error.flags);
 	wake_up_all(&i915->gpu_error.reset_queue);
-
-	return 0;
-}
-
-void i915_gem_set_wedged(struct drm_i915_private *dev_priv)
-{
-	stop_machine(__i915_gem_set_wedged_BKL, dev_priv, NULL);
 }
 
 bool i915_gem_unset_wedged(struct drm_i915_private *i915)
diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
index b100b38f1dd2..ef78a85cb845 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/i915_gem_request.c
@@ -556,7 +556,9 @@ submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
 	switch (state) {
 	case FENCE_COMPLETE:
 		trace_i915_gem_request_submit(request);
+		rcu_read_lock();
 		request->engine->submit_request(request);
+		rcu_read_unlock();
 		break;
 
 	case FENCE_FREE:
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_request.c b/drivers/gpu/drm/i915/selftests/i915_gem_request.c
index 78b9f811707f..a999161e8db1 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_request.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_request.c
@@ -215,7 +215,9 @@ static int igt_request_rewind(void *arg)
 	}
 	i915_gem_request_get(vip);
 	i915_add_request(vip);
+	rcu_read_lock();
 	request->engine->submit_request(request);
+	rcu_read_unlock();
 
 	mutex_unlock(&i915->drm.struct_mutex);
 
-- 
2.14.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* Re: [PATCH] drm/i915: Use rcu instead of stop_machine
  2017-10-05 14:09 [PATCH] drm/i915: Use rcu instead of stop_machine Daniel Vetter
@ 2017-10-05 14:30 ` Chris Wilson
  2017-10-05 16:12   ` Daniel Vetter
  2017-10-05 14:55 ` Tvrtko Ursulin
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 12+ messages in thread
From: Chris Wilson @ 2017-10-05 14:30 UTC (permalink / raw)
  To: Intel Graphics Development; +Cc: Daniel Vetter, Daniel Vetter, Mika Kuoppala

Quoting Daniel Vetter (2017-10-05 15:09:48)
> stop_machine is not really a locking primitive we should use, except
> when the hw folks tell us the hw is broken and that's the only way to
> work around it.
> 
> This patch here is just a suggestion for how to fix it up, possible
> changes needed to make it actually work:
> 
> - Set the nop_submit_request first for _all_ engines, before
>   proceeding.
> 
> - Make sure engine->cancel_requests copes with the possibility that
>   not all tests have consistently used the new or old version. I don't
>   think this is a problem, since the same can happen really with the
>   stop_machine() locking - stop_machine also doesn't give you any kind
>   of global ordering against other cpu threads, it just makes them
>   stop.
> 
> This patch tries to address the locking snafu from

There's a locking snafu in the code?
 
> commit 20e4933c478a1ca694b38fa4ac44d99e659941f5
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Tue Nov 22 14:41:21 2016 +0000
> 
>     drm/i915: Stop the machine as we install the wedged submit_request handler
> 
> Chris said parts of the reasons for going with stop_machine() was that
> it's no overhead for the fast-path.

More than that, you don't even have to think about it. It's a one off
event that changes execution paths. I actually never thought about
putting the lock mechanism around the caller (that does prevent the issue
I was dreading of being inside the callback as it changed), it is still
magic that has nothing to do with the code flow. What variable should we
document as being rcu protected, (*engine->submit_request)()?

I'm definitely not sold on having set-wedge dictate terms to the rest of
the code.
-Chris

* Re: [PATCH] drm/i915: Use rcu instead of stop_machine
  2017-10-05 14:09 [PATCH] drm/i915: Use rcu instead of stop_machine Daniel Vetter
  2017-10-05 14:30 ` Chris Wilson
@ 2017-10-05 14:55 ` Tvrtko Ursulin
  2017-10-05 16:24   ` Daniel Vetter
  2017-10-06  8:47 ` ✓ Fi.CI.BAT: success for " Patchwork
  2017-10-06  9:50 ` ✗ Fi.CI.IGT: failure " Patchwork
  3 siblings, 1 reply; 12+ messages in thread
From: Tvrtko Ursulin @ 2017-10-05 14:55 UTC (permalink / raw)
  To: Daniel Vetter, Intel Graphics Development; +Cc: Daniel Vetter, Mika Kuoppala


On 05/10/2017 15:09, Daniel Vetter wrote:
> stop_machine is not really a locking primitive we should use, except
> when the hw folks tell us the hw is broken and that's the only way to
> work around it.
> 
> This patch here is just a suggestion for how to fix it up, possible
> changes needed to make it actually work:
> 
> - Set the nop_submit_request first for _all_ engines, before
>    proceeding.
> 
> - Make sure engine->cancel_requests copes with the possibility that
>    not all tests have consistently used the new or old version. I don't
>    think this is a problem, since the same can happen really with the
>    stop_machine() locking - stop_machine also doesn't give you any kind
>    of global ordering against other cpu threads, it just makes them
>    stop.
> 
> This patch tries to address the locking snafu from
> 
> commit 20e4933c478a1ca694b38fa4ac44d99e659941f5
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Tue Nov 22 14:41:21 2016 +0000
> 
>      drm/i915: Stop the machine as we install the wedged submit_request handler
> 
> Chris said parts of the reasons for going with stop_machine() was that
> it's no overhead for the fast-path. But these callbacks use irqsave
> spinlocks and do a bunch of MMIO, and rcu_read_lock is _real_ fast.
> 
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem.c                   | 18 +++++-------------
>   drivers/gpu/drm/i915/i915_gem_request.c           |  2 ++
>   drivers/gpu/drm/i915/selftests/i915_gem_request.c |  2 ++
>   3 files changed, 9 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index ab8c6946fea4..0b260e576b4b 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3022,13 +3022,13 @@ static void nop_submit_request(struct drm_i915_gem_request *request)
>   
>   static void engine_set_wedged(struct intel_engine_cs *engine)
>   {
> +	engine->submit_request = nop_submit_request;

Should this be rcu_assign_pointer?

> +
>   	/* We need to be sure that no thread is running the old callback as
>   	 * we install the nop handler (otherwise we would submit a request
> -	 * to hardware that will never complete). In order to prevent this
> -	 * race, we wait until the machine is idle before making the swap
> -	 * (using stop_machine()).
> +	 * to hardware that will never complete).
>   	 */
> -	engine->submit_request = nop_submit_request;
> +	synchronize_rcu();

Consumers of this are running in irq disabled or softirq. Does this mean 
we would need synchronize_rcu_bh? Would either guarantee all tasklets 
and irq handlers have exited?

>   	/* Mark all executing requests as skipped */
>   	engine->cancel_requests(engine);
> @@ -3041,9 +3041,8 @@ static void engine_set_wedged(struct intel_engine_cs *engine)
>   				       intel_engine_last_submit(engine));
>   }
>   
> -static int __i915_gem_set_wedged_BKL(void *data)
> +void i915_gem_set_wedged(struct drm_i915_private *i915)
>   {
> -	struct drm_i915_private *i915 = data;
>   	struct intel_engine_cs *engine;
>   	enum intel_engine_id id;
>   
> @@ -3052,13 +3051,6 @@ static int __i915_gem_set_wedged_BKL(void *data)
>   
>   	set_bit(I915_WEDGED, &i915->gpu_error.flags);
>   	wake_up_all(&i915->gpu_error.reset_queue);
> -
> -	return 0;
> -}
> -
> -void i915_gem_set_wedged(struct drm_i915_private *dev_priv)
> -{
> -	stop_machine(__i915_gem_set_wedged_BKL, dev_priv, NULL);
>   }
>   
>   bool i915_gem_unset_wedged(struct drm_i915_private *i915)
> diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
> index b100b38f1dd2..ef78a85cb845 100644
> --- a/drivers/gpu/drm/i915/i915_gem_request.c
> +++ b/drivers/gpu/drm/i915/i915_gem_request.c
> @@ -556,7 +556,9 @@ submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
>   	switch (state) {
>   	case FENCE_COMPLETE:
>   		trace_i915_gem_request_submit(request);
> +		rcu_read_lock();
>   		request->engine->submit_request(request);
> +		rcu_read_unlock();

And _bh for these? Although this already runs with preemption off, but I 
guess it is good for documentation.

Regards,

Tvrtko

>   		break;
>   
>   	case FENCE_FREE:
> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_request.c b/drivers/gpu/drm/i915/selftests/i915_gem_request.c
> index 78b9f811707f..a999161e8db1 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_gem_request.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_request.c
> @@ -215,7 +215,9 @@ static int igt_request_rewind(void *arg)
>   	}
>   	i915_gem_request_get(vip);
>   	i915_add_request(vip);
> +	rcu_read_lock();
>   	request->engine->submit_request(request);
> +	rcu_read_unlock();
>   
>   	mutex_unlock(&i915->drm.struct_mutex);
>   
> 

* Re: [PATCH] drm/i915: Use rcu instead of stop_machine
  2017-10-05 14:30 ` Chris Wilson
@ 2017-10-05 16:12   ` Daniel Vetter
  2017-10-06  8:42     ` Chris Wilson
  0 siblings, 1 reply; 12+ messages in thread
From: Daniel Vetter @ 2017-10-05 16:12 UTC (permalink / raw)
  To: Chris Wilson
  Cc: Daniel Vetter, Daniel Vetter, Intel Graphics Development, Mika Kuoppala

On Thu, Oct 05, 2017 at 03:30:12PM +0100, Chris Wilson wrote:
> Quoting Daniel Vetter (2017-10-05 15:09:48)
> > stop_machine is not really a locking primitive we should use, except
> > when the hw folks tell us the hw is broken and that's the only way to
> > work around it.
> > 
> > This patch here is just a suggestion for how to fix it up, possible
> > changes needed to make it actually work:
> > 
> > - Set the nop_submit_request first for _all_ engines, before
> >   proceeding.
> > 
> > - Make sure engine->cancel_requests copes with the possibility that
> >   not all tests have consistently used the new or old version. I don't
> >   think this is a problem, since the same can happen really with the
> >   stop_machine() locking - stop_machine also doesn't give you any kind
> >   of global ordering against other cpu threads, it just makes them
> >   stop.
> > 
> > This patch tries to address the locking snafu from
> 
> There's a locking snafu in the code?
>  
> > commit 20e4933c478a1ca694b38fa4ac44d99e659941f5
> > Author: Chris Wilson <chris@chris-wilson.co.uk>
> > Date:   Tue Nov 22 14:41:21 2016 +0000
> > 
> >     drm/i915: Stop the machine as we install the wedged submit_request handler
> > 
> > Chris said parts of the reasons for going with stop_machine() was that
> > it's no overhead for the fast-path.
> 
> More than that, you don't even have to think about it. It's a one off
> event that changes execution paths. I actually never thought about
> putting the lock mechanism around the caller (that does prevent the issue
> I was dreading of being inside the callback as it changed), it is still
> magic that has nothing to do with the code flow. What variable should we
> document as being rcu protected, (*engine->submit_request)()?

Yeah, engine->submit_request would be the one, except it shouldn't use
rcu_assign_pointer/rcu_dereference since we don't need those barriers,
ever - the .text doesn't change after all. That's why I didn't splatter
the sparse annotations all over it. In a way we don't need the data
barriers of rcu, but really only the "everyone has passed the critical
section now" part of it.

> I'm definitely not sold on having set-wedge dictate terms to the rest of
> the code.

Well, it sounds like tglx would very much prefer we clean this locking
inversion up on our side and not get them to break the cycle. Not what I
wanted Thomas to look at, but since I botched it and attached the wrong
splat we got the answer for this patch here, and it seems to be "don't do
that".
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [PATCH] drm/i915: Use rcu instead of stop_machine
  2017-10-05 14:55 ` Tvrtko Ursulin
@ 2017-10-05 16:24   ` Daniel Vetter
  2017-10-06  8:30     ` Tvrtko Ursulin
  0 siblings, 1 reply; 12+ messages in thread
From: Daniel Vetter @ 2017-10-05 16:24 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: Daniel Vetter, Intel Graphics Development, Mika Kuoppala, Daniel Vetter

On Thu, Oct 05, 2017 at 03:55:19PM +0100, Tvrtko Ursulin wrote:
> 
> On 05/10/2017 15:09, Daniel Vetter wrote:
> > stop_machine is not really a locking primitive we should use, except
> > when the hw folks tell us the hw is broken and that's the only way to
> > work around it.
> > 
> > This patch here is just a suggestion for how to fix it up, possible
> > changes needed to make it actually work:
> > 
> > - Set the nop_submit_request first for _all_ engines, before
> >    proceeding.
> > 
> > - Make sure engine->cancel_requests copes with the possibility that
> >    not all tests have consistently used the new or old version. I don't
> >    think this is a problem, since the same can happen really with the
> >    stop_machine() locking - stop_machine also doesn't give you any kind
> >    of global ordering against other cpu threads, it just makes them
> >    stop.
> > 
> > This patch tries to address the locking snafu from
> > 
> > commit 20e4933c478a1ca694b38fa4ac44d99e659941f5
> > Author: Chris Wilson <chris@chris-wilson.co.uk>
> > Date:   Tue Nov 22 14:41:21 2016 +0000
> > 
> >      drm/i915: Stop the machine as we install the wedged submit_request handler
> > 
> > Chris said parts of the reasons for going with stop_machine() was that
> > it's no overhead for the fast-path. But these callbacks use irqsave
> > spinlocks and do a bunch of MMIO, and rcu_read_lock is _real_ fast.
> > 
> > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > ---
> >   drivers/gpu/drm/i915/i915_gem.c                   | 18 +++++-------------
> >   drivers/gpu/drm/i915/i915_gem_request.c           |  2 ++
> >   drivers/gpu/drm/i915/selftests/i915_gem_request.c |  2 ++
> >   3 files changed, 9 insertions(+), 13 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index ab8c6946fea4..0b260e576b4b 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -3022,13 +3022,13 @@ static void nop_submit_request(struct drm_i915_gem_request *request)
> >   static void engine_set_wedged(struct intel_engine_cs *engine)
> >   {
> > +	engine->submit_request = nop_submit_request;
> 
> Should this be rcu_assign_pointer?

Those provide additional barriers, needed when you change/allocate the
stuff you're pointing to. We point to immutable functions, so shouldn't be
necessary (and would be confusing imo).

> > +
> >   	/* We need to be sure that no thread is running the old callback as
> >   	 * we install the nop handler (otherwise we would submit a request
> > -	 * to hardware that will never complete). In order to prevent this
> > -	 * race, we wait until the machine is idle before making the swap
> > -	 * (using stop_machine()).
> > +	 * to hardware that will never complete).
> >   	 */
> > -	engine->submit_request = nop_submit_request;
> > +	synchronize_rcu();
> 
> Consumers of this are running in irq disabled or softirq. Does this mean we
> would need synchronize_rcu_bh? Would either guarantee all tasklets and irq
> handlers have exited?

Oh ... tbh I didn't even dig that deep (much less run this stuff). This
really is an RFC so people with real clue could say whether it has a
chance of working or not.

Looking at rcu docs we don't want _bh variants, since rcu_read_lock should
be safe in even hardirq context. _bh and _sched otoh require that all
critical sections are either in bottom halves or hardirq context, since
they treat scheduling of those as a grace period.

Cheers, Daniel

> >   	/* Mark all executing requests as skipped */
> >   	engine->cancel_requests(engine);
> > @@ -3041,9 +3041,8 @@ static void engine_set_wedged(struct intel_engine_cs *engine)
> >   				       intel_engine_last_submit(engine));
> >   }
> > -static int __i915_gem_set_wedged_BKL(void *data)
> > +void i915_gem_set_wedged(struct drm_i915_private *i915)
> >   {
> > -	struct drm_i915_private *i915 = data;
> >   	struct intel_engine_cs *engine;
> >   	enum intel_engine_id id;
> > @@ -3052,13 +3051,6 @@ static int __i915_gem_set_wedged_BKL(void *data)
> >   	set_bit(I915_WEDGED, &i915->gpu_error.flags);
> >   	wake_up_all(&i915->gpu_error.reset_queue);
> > -
> > -	return 0;
> > -}
> > -
> > -void i915_gem_set_wedged(struct drm_i915_private *dev_priv)
> > -{
> > -	stop_machine(__i915_gem_set_wedged_BKL, dev_priv, NULL);
> >   }
> >   bool i915_gem_unset_wedged(struct drm_i915_private *i915)
> > diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
> > index b100b38f1dd2..ef78a85cb845 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_request.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_request.c
> > @@ -556,7 +556,9 @@ submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
> >   	switch (state) {
> >   	case FENCE_COMPLETE:
> >   		trace_i915_gem_request_submit(request);
> > +		rcu_read_lock();
> >   		request->engine->submit_request(request);
> > +		rcu_read_unlock();
> 
> And _bh for these? Although this already runs with preemption off, but I
> guess it is good for documentation.
> 
> Regards,
> 
> Tvrtko
> 
> >   		break;
> >   	case FENCE_FREE:
> > diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_request.c b/drivers/gpu/drm/i915/selftests/i915_gem_request.c
> > index 78b9f811707f..a999161e8db1 100644
> > --- a/drivers/gpu/drm/i915/selftests/i915_gem_request.c
> > +++ b/drivers/gpu/drm/i915/selftests/i915_gem_request.c
> > @@ -215,7 +215,9 @@ static int igt_request_rewind(void *arg)
> >   	}
> >   	i915_gem_request_get(vip);
> >   	i915_add_request(vip);
> > +	rcu_read_lock();
> >   	request->engine->submit_request(request);
> > +	rcu_read_unlock();
> >   	mutex_unlock(&i915->drm.struct_mutex);
> > 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [PATCH] drm/i915: Use rcu instead of stop_machine
  2017-10-05 16:24   ` Daniel Vetter
@ 2017-10-06  8:30     ` Tvrtko Ursulin
  2017-10-06  8:49       ` Daniel Vetter
  0 siblings, 1 reply; 12+ messages in thread
From: Tvrtko Ursulin @ 2017-10-06  8:30 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Daniel Vetter, Intel Graphics Development, Mika Kuoppala, Daniel Vetter


On 05/10/2017 17:24, Daniel Vetter wrote:
> On Thu, Oct 05, 2017 at 03:55:19PM +0100, Tvrtko Ursulin wrote:
>>
>> On 05/10/2017 15:09, Daniel Vetter wrote:
>>> stop_machine is not really a locking primitive we should use, except
>>> when the hw folks tell us the hw is broken and that's the only way to
>>> work around it.
>>>
>>> This patch here is just a suggestion for how to fix it up, possible
>>> changes needed to make it actually work:
>>>
>>> - Set the nop_submit_request first for _all_ engines, before
>>>     proceeding.
>>>
>>> - Make sure engine->cancel_requests copes with the possibility that
>>>     not all tests have consistently used the new or old version. I don't
>>>     think this is a problem, since the same can happen really with the
>>>     stop_machine() locking - stop_machine also doesn't give you any kind
>>>     of global ordering against other cpu threads, it just makes them
>>>     stop.
>>>
>>> This patch tries to address the locking snafu from
>>>
>>> commit 20e4933c478a1ca694b38fa4ac44d99e659941f5
>>> Author: Chris Wilson <chris@chris-wilson.co.uk>
>>> Date:   Tue Nov 22 14:41:21 2016 +0000
>>>
>>>       drm/i915: Stop the machine as we install the wedged submit_request handler
>>>
>>> Chris said parts of the reasons for going with stop_machine() was that
>>> it's no overhead for the fast-path. But these callbacks use irqsave
>>> spinlocks and do a bunch of MMIO, and rcu_read_lock is _real_ fast.
>>>
>>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>>> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
>>> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/i915_gem.c                   | 18 +++++-------------
>>>    drivers/gpu/drm/i915/i915_gem_request.c           |  2 ++
>>>    drivers/gpu/drm/i915/selftests/i915_gem_request.c |  2 ++
>>>    3 files changed, 9 insertions(+), 13 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>>> index ab8c6946fea4..0b260e576b4b 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>> @@ -3022,13 +3022,13 @@ static void nop_submit_request(struct drm_i915_gem_request *request)
>>>    static void engine_set_wedged(struct intel_engine_cs *engine)
>>>    {
>>> +	engine->submit_request = nop_submit_request;
>>
>> Should this be rcu_assign_pointer?
> 
> Those provide additional barriers, needed when you change/allocate the
> stuff you're pointing to. We point to immutable functions, so shouldn't be
> necessary (and would be confusing imo).

Ah ok. Any barriers then? Or synchronize_rcu implies them all?

>>> +
>>>    	/* We need to be sure that no thread is running the old callback as
>>>    	 * we install the nop handler (otherwise we would submit a request
>>> -	 * to hardware that will never complete). In order to prevent this
>>> -	 * race, we wait until the machine is idle before making the swap
>>> -	 * (using stop_machine()).
>>> +	 * to hardware that will never complete).
>>>    	 */
>>> -	engine->submit_request = nop_submit_request;
>>> +	synchronize_rcu();
>>
>> Consumers of this are running in irq disabled or softirq. Does this mean we
>> would need synchronize_rcu_bh? Would either guarantee all tasklets and irq
>> handlers have exited?
> 
> Oh ... tbh I didn't even dig that deep (much less run this stuff). This
> really is an RFC so people with real clue could say whether it has a
> chance of working or not.
> 
> Looking at rcu docs we don't want _bh variants, since rcu_read_lock should
> be safe in even hardirq context. _bh and _sched otoh require that all
> critical sections are either in bottom halves or hardirq context, since
> they treat scheduling of those as a grace period.

rcu_read_unlock might schedule (via preempt_enable) so I don't think we 
can use them from the fence callbacks.

And _bh is indeed only for softirq while we need hard and soft. So I am 
not sure which one we could use.

It sounds to me like any of them would be wrong, and if we wanted to drop
stop_machine we would simply have to use nothing. But then we couldn't be
certain there are no more new requests queued after wedged has been set.
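
To make that window concrete, here is a small single-threaded replay of
one bad interleaving, written as plain userspace C. It is only a sketch:
the struct, the wedged flag and all function names are hypothetical, and
the "other CPU" is simulated by sampling the callback pointer before the
swap and calling it afterwards.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an engine with a swappable submit callback. */
struct engine;
typedef void (*submit_fn)(struct engine *);

struct engine {
	submit_fn submit_request;
	bool wedged;
	int submitted_after_wedge; /* requests that hit "hardware" too late */
};

static void hw_submit(struct engine *e)
{
	if (e->wedged)
		e->submitted_after_wedge++;
}

static void nop_submit(struct engine *e)
{
	(void)e;
}

/* Install the nop handler with no synchronization against callers. */
static void set_wedged_unsynchronized(struct engine *e)
{
	e->submit_request = nop_submit;
	e->wedged = true;
}

/* Replay: a submitter samples the pointer, the wedge runs, the call lands. */
static int replay_race(void)
{
	struct engine e = { .submit_request = hw_submit };

	submit_fn sampled = e.submit_request; /* submitter reads the pointer */
	set_wedged_unsynchronized(&e);        /* wedge completes in between */
	assert(e.submit_request == nop_submit);
	sampled(&e);                          /* stale callback still runs */

	return e.submitted_after_wedge;
}
```

In this replay the stale call is counted once even though the nop handler
is already installed, which is the case some form of waiting on the wedge
side would have to rule out.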

Maybe I am missing something, not sure.
  Regards,

Tvrtko

> Cheers, Daniel
> 
>>>    	/* Mark all executing requests as skipped */
>>>    	engine->cancel_requests(engine);
>>> @@ -3041,9 +3041,8 @@ static void engine_set_wedged(struct intel_engine_cs *engine)
>>>    				       intel_engine_last_submit(engine));
>>>    }
>>> -static int __i915_gem_set_wedged_BKL(void *data)
>>> +void i915_gem_set_wedged(struct drm_i915_private *i915)
>>>    {
>>> -	struct drm_i915_private *i915 = data;
>>>    	struct intel_engine_cs *engine;
>>>    	enum intel_engine_id id;
>>> @@ -3052,13 +3051,6 @@ static int __i915_gem_set_wedged_BKL(void *data)
>>>    	set_bit(I915_WEDGED, &i915->gpu_error.flags);
>>>    	wake_up_all(&i915->gpu_error.reset_queue);
>>> -
>>> -	return 0;
>>> -}
>>> -
>>> -void i915_gem_set_wedged(struct drm_i915_private *dev_priv)
>>> -{
>>> -	stop_machine(__i915_gem_set_wedged_BKL, dev_priv, NULL);
>>>    }
>>>    bool i915_gem_unset_wedged(struct drm_i915_private *i915)
>>> diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
>>> index b100b38f1dd2..ef78a85cb845 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem_request.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem_request.c
>>> @@ -556,7 +556,9 @@ submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
>>>    	switch (state) {
>>>    	case FENCE_COMPLETE:
>>>    		trace_i915_gem_request_submit(request);
>>> +		rcu_read_lock();
>>>    		request->engine->submit_request(request);
>>> +		rcu_read_unlock();
>>
>> And _bh for these? Although this already runs with preemption off, but I
>> guess it is good for documentation.
>>
>> Regards,
>>
>> Tvrtko
>>
>>>    		break;
>>>    	case FENCE_FREE:
>>> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_request.c b/drivers/gpu/drm/i915/selftests/i915_gem_request.c
>>> index 78b9f811707f..a999161e8db1 100644
>>> --- a/drivers/gpu/drm/i915/selftests/i915_gem_request.c
>>> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_request.c
>>> @@ -215,7 +215,9 @@ static int igt_request_rewind(void *arg)
>>>    	}
>>>    	i915_gem_request_get(vip);
>>>    	i915_add_request(vip);
>>> +	rcu_read_lock();
>>>    	request->engine->submit_request(request);
>>> +	rcu_read_unlock();
>>>    	mutex_unlock(&i915->drm.struct_mutex);
>>>
> 

* Re: [PATCH] drm/i915: Use rcu instead of stop_machine
  2017-10-05 16:12   ` Daniel Vetter
@ 2017-10-06  8:42     ` Chris Wilson
  2017-10-06  8:56       ` Daniel Vetter
  0 siblings, 1 reply; 12+ messages in thread
From: Chris Wilson @ 2017-10-06  8:42 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Daniel Vetter, Daniel Vetter, Intel Graphics Development, Mika Kuoppala

Quoting Daniel Vetter (2017-10-05 17:12:35)
> On Thu, Oct 05, 2017 at 03:30:12PM +0100, Chris Wilson wrote:
> > Quoting Daniel Vetter (2017-10-05 15:09:48)
> > > stop_machine is not really a locking primitive we should use, except
> > > when the hw folks tell us the hw is broken and that's the only way to
> > > work around it.
> > > 
> > > This patch here is just a suggestion for how to fix it up, possible
> > > changes needed to make it actually work:
> > > 
> > > - Set the nop_submit_request first for _all_ engines, before
> > >   proceeding.
> > > 
> > > - Make sure engine->cancel_requests copes with the possibility that
> > >   not all tests have consistently used the new or old version. I don't
> > >   think this is a problem, since the same can happen really with the
> > >   stop_machine() locking - stop_machine also doesn't give you any kind
> > >   of global ordering against other cpu threads, it just makes them
> > >   stop.
> > > 
> > > This patch tries to address the locking snafu from
> > 
> > There's a locking snafu in the code?
> >  
> > > commit 20e4933c478a1ca694b38fa4ac44d99e659941f5
> > > Author: Chris Wilson <chris@chris-wilson.co.uk>
> > > Date:   Tue Nov 22 14:41:21 2016 +0000
> > > 
> > >     drm/i915: Stop the machine as we install the wedged submit_request handler
> > > 
> > > Chris said parts of the reasons for going with stop_machine() was that
> > > it's no overhead for the fast-path.
> > 
> > More than that, you don't even have to think about it. It's a one off
> > event that changes execution paths. I actually never thought about
> > putting the lock mechanism around the caller (that does prevent the issue
> > I was dreading of being inside the callback as it changed), it is still
> > magic that has nothing to do with the code flow. What variable should we
> > document as being rcu protected, (*engine->submit_request)()?
> 
> Yeah, engine->submit_request would be the one, except it shouldn't use
> rcu_assign_pointer/rcu_dereference since we don't need those barriers,
> ever - the .text doesn't change after all. That's why I didn't splatter
> the sparse annotations all over it. In a way we don't need the data
> barriers of rcu, but really only the "everyone has passed the critical
> section now" part of it.
> 
> > I'm definitely not sold on having set-wedge dictate terms to the rest of
> > the code.
> 
> Well, it sounds like tglx would very much prefer we clean this locking
> inversion up on our side and not get them to break the cycle. Not what I
> wanted Thomas to look at, but since I botched it and attached the wrong
> splat we got the answer for this patch here, and it seems to be "don't do
> that".

The analogy is with kernel live-patching (which is what we are doing
here). Should every module in the kernel manually markup itself to
support the esoteric feature, or should the feature support the wider
kernel?
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* ✓ Fi.CI.BAT: success for drm/i915: Use rcu instead of stop_machine
  2017-10-05 14:09 [PATCH] drm/i915: Use rcu instead of stop_machine Daniel Vetter
  2017-10-05 14:30 ` Chris Wilson
  2017-10-05 14:55 ` Tvrtko Ursulin
@ 2017-10-06  8:47 ` Patchwork
  2017-10-06  9:50 ` ✗ Fi.CI.IGT: failure " Patchwork
  3 siblings, 0 replies; 12+ messages in thread
From: Patchwork @ 2017-10-06  8:47 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Use rcu instead of stop_machine
URL   : https://patchwork.freedesktop.org/series/31433/
State : success

== Summary ==

Series 31433v1 drm/i915: Use rcu instead of stop_machine
https://patchwork.freedesktop.org/api/1.0/series/31433/revisions/1/mbox/

Test kms_flip:
        Subgroup basic-flip-vs-dpms:
                dmesg-warn -> PASS       (fi-skl-6700hq)

fi-bdw-5557u     total:289  pass:268  dwarn:0   dfail:0   fail:0   skip:21  time:454s
fi-bdw-gvtdvm    total:289  pass:265  dwarn:0   dfail:0   fail:0   skip:24  time:469s
fi-blb-e6850     total:289  pass:223  dwarn:1   dfail:0   fail:0   skip:65  time:392s
fi-bsw-n3050     total:289  pass:243  dwarn:0   dfail:0   fail:0   skip:46  time:559s
fi-bwr-2160      total:289  pass:183  dwarn:0   dfail:0   fail:0   skip:106 time:288s
fi-bxt-dsi       total:289  pass:259  dwarn:0   dfail:0   fail:0   skip:30  time:523s
fi-bxt-j4205     total:289  pass:260  dwarn:0   dfail:0   fail:0   skip:29  time:527s
fi-byt-j1900     total:289  pass:253  dwarn:1   dfail:0   fail:0   skip:35  time:537s
fi-byt-n2820     total:289  pass:249  dwarn:1   dfail:0   fail:0   skip:39  time:534s
fi-cfl-s         total:289  pass:256  dwarn:1   dfail:0   fail:0   skip:32  time:560s
fi-cnl-y         total:289  pass:262  dwarn:0   dfail:0   fail:0   skip:27  time:624s
fi-elk-e7500     total:289  pass:229  dwarn:0   dfail:0   fail:0   skip:60  time:441s
fi-glk-1         total:289  pass:261  dwarn:0   dfail:0   fail:0   skip:28  time:598s
fi-hsw-4770      total:289  pass:262  dwarn:0   dfail:0   fail:0   skip:27  time:438s
fi-hsw-4770r     total:289  pass:262  dwarn:0   dfail:0   fail:0   skip:27  time:419s
fi-ilk-650       total:289  pass:228  dwarn:0   dfail:0   fail:0   skip:61  time:462s
fi-ivb-3520m     total:289  pass:260  dwarn:0   dfail:0   fail:0   skip:29  time:510s
fi-ivb-3770      total:289  pass:260  dwarn:0   dfail:0   fail:0   skip:29  time:479s
fi-kbl-7500u     total:289  pass:264  dwarn:1   dfail:0   fail:0   skip:24  time:517s
fi-kbl-7560u     total:289  pass:270  dwarn:0   dfail:0   fail:0   skip:19  time:592s
fi-kbl-7567u     total:289  pass:265  dwarn:4   dfail:0   fail:0   skip:20  time:499s
fi-kbl-r         total:289  pass:262  dwarn:0   dfail:0   fail:0   skip:27  time:594s
fi-pnv-d510      total:289  pass:222  dwarn:1   dfail:0   fail:0   skip:66  time:654s
fi-skl-6260u     total:289  pass:269  dwarn:0   dfail:0   fail:0   skip:20  time:478s
fi-skl-6700hq    total:289  pass:263  dwarn:0   dfail:0   fail:0   skip:26  time:661s
fi-skl-6700k     total:289  pass:265  dwarn:0   dfail:0   fail:0   skip:24  time:534s
fi-skl-6770hq    total:289  pass:269  dwarn:0   dfail:0   fail:0   skip:20  time:517s
fi-skl-gvtdvm    total:289  pass:266  dwarn:0   dfail:0   fail:0   skip:23  time:476s
fi-snb-2520m     total:289  pass:250  dwarn:0   dfail:0   fail:0   skip:39  time:585s
fi-snb-2600      total:289  pass:249  dwarn:0   dfail:0   fail:0   skip:40  time:431s

e364976751b3f78eb32ea957eb40deb0682393e5 drm-tip: 2017y-10m-06d-06h-41m-16s UTC integration manifest
d571d3c172b7 drm/i915: Use rcu instead of stop_machine

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_5913/

* Re: [PATCH] drm/i915: Use rcu instead of stop_machine
  2017-10-06  8:30     ` Tvrtko Ursulin
@ 2017-10-06  8:49       ` Daniel Vetter
  0 siblings, 0 replies; 12+ messages in thread
From: Daniel Vetter @ 2017-10-06  8:49 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Daniel Vetter, Intel Graphics Development, Mika Kuoppala

On Fri, Oct 6, 2017 at 10:30 AM, Tvrtko Ursulin
<tvrtko.ursulin@linux.intel.com> wrote:
>
> On 05/10/2017 17:24, Daniel Vetter wrote:
>>
>> On Thu, Oct 05, 2017 at 03:55:19PM +0100, Tvrtko Ursulin wrote:
>>>
>>>
>>> On 05/10/2017 15:09, Daniel Vetter wrote:
>>>>
>>>> stop_machine is not really a locking primitive we should use, except
>>>> when the hw folks tell us the hw is broken and that's the only way to
>>>> work around it.
>>>>
>>>> This patch here is just a suggestion for how to fix it up, possible
>>>> changes needed to make it actually work:
>>>>
>>>> - Set the nop_submit_request first for _all_ engines, before
>>>>     proceeding.
>>>>
>>>> - Make sure engine->cancel_requests copes with the possibility that
>>>>     not all tests have consistently used the new or old version. I dont
>>>>     think this is a problem, since the same can happen really with the
>>>>     stop_machine() locking - stop_machine also doesn't give you any kind
>>>>     of global ordering against other cpu threads, it just makes them
>>>>     stop.
>>>>
>>>> This patch tries to address the locking snafu from
>>>>
>>>> commit 20e4933c478a1ca694b38fa4ac44d99e659941f5
>>>> Author: Chris Wilson <chris@chris-wilson.co.uk>
>>>> Date:   Tue Nov 22 14:41:21 2016 +0000
>>>>
>>>>       drm/i915: Stop the machine as we install the wedged submit_request handler
>>>>
>>>> Chris said parts of the reasons for going with stop_machine() was that
>>>> it's no overhead for the fast-path. But these callbacks use irqsave
>>>> spinlocks and do a bunch of MMIO, and rcu_read_lock is _real_ fast.
>>>>
>>>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>>>> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
>>>> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
>>>> ---
>>>>    drivers/gpu/drm/i915/i915_gem.c                   | 18 +++++-------------
>>>>    drivers/gpu/drm/i915/i915_gem_request.c           |  2 ++
>>>>    drivers/gpu/drm/i915/selftests/i915_gem_request.c |  2 ++
>>>>    3 files changed, 9 insertions(+), 13 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>>>> index ab8c6946fea4..0b260e576b4b 100644
>>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>>> @@ -3022,13 +3022,13 @@ static void nop_submit_request(struct drm_i915_gem_request *request)
>>>>    static void engine_set_wedged(struct intel_engine_cs *engine)
>>>>    {
>>>> +       engine->submit_request = nop_submit_request;
>>>
>>>
>>> Should this be rcu_assign_pointer?
>>
>>
>> Those provide additional barriers, needed when you change/allocate the
>> stuff you're pointing to. We point to immutable functions, so shouldn't be
>> necessary (and would be confusing imo).
>
>
> Ah ok. Any barriers then? Or synchronize_rcu implies them all?

Yup, at least for simple load/store.

rcu_dereference/rcu_assign_pointer need additional barriers because
compilers (and some cpus like alpha) might first load the pointer,
then load something through that pointer, then re-load the pointer,
and that could mean you see the memory pointed at in a state before
rcu_assign_pointer has been called.

On x86 it's just READ_ONCE/WRITE_ONCE, but alpha needs actual hw
barriers on top for this.
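
A minimal userspace sketch of the point above, not i915 code. It assumes
C11 relaxed atomics as a rough stand-in for the kernel's
READ_ONCE()/WRITE_ONCE(); the int-based request type and the names
hw_submit_request, set_wedged and submit are hypothetical. Because both
targets are immutable functions, a plain once-only store and load of the
pointer is enough here, with no release/acquire pairing:

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a request; the real driver passes a
 * struct drm_i915_gem_request pointer. */
typedef int (*submit_fn)(int request);

/* Both targets are immutable code, so publishing a pointer to them
 * does not publish any newly initialised data. */
static int hw_submit_request(int request)  { return request; }
static int nop_submit_request(int request) { (void)request; return -1; }

/* Relaxed atomics roughly model WRITE_ONCE()/READ_ONCE(): no tearing,
 * no refetch of the pointer, and no ordering against other memory. */
static _Atomic(submit_fn) submit_request = hw_submit_request;

static void set_wedged(void)
{
	atomic_store_explicit(&submit_request, nop_submit_request,
			      memory_order_relaxed);
	/* The kernel patch follows this store with synchronize_rcu() to wait
	 * for callers still inside the old callback; not modelled here. */
}

static int submit(int request)
{
	/* Load the pointer exactly once, then call through the local copy. */
	submit_fn fn = atomic_load_explicit(&submit_request,
					    memory_order_relaxed);
	return fn(request);
}
```

Before set_wedged() runs, submit() goes through hw_submit_request; after
it, every new call lands in the nop handler. Waiting for callers that
already loaded the old pointer is the separate job of the grace period.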

>>>> +
>>>>         /* We need to be sure that no thread is running the old callback as
>>>>          * we install the nop handler (otherwise we would submit a request
>>>> -        * to hardware that will never complete). In order to prevent this
>>>> -        * race, we wait until the machine is idle before making the swap
>>>> -        * (using stop_machine()).
>>>> +        * to hardware that will never complete).
>>>>          */
>>>> -       engine->submit_request = nop_submit_request;
>>>> +       synchronize_rcu();
>>>
>>>
>>> Consumers of this are running in irq disabled or softirq. Does this mean we
>>> would need synchronize_rcu_bh? Would either guarantee all tasklets and irq
>>> handlers have exited?
>>
>>
>> Oh ... tbh I didn't even digg that deep (much less ran this stuff). This
>> really is an RFC so people with real clue could say whether it has a
>> chance of working or not.
>>
>> Looking at rcu docs we don't want _bh variants, since rcu_read_lock should
>> be safe in even hardirq context. _bh and _sched otoh require that all
>> critical sections are either in bottom halfs or hardirq context, since
>> they treat scheduling of those as a grace period.
>
>
> rcu_read_unlock might schedule (via preempt_enable) so I don't think we can
> use them from the fence callbacks.

Only when it's the outermost preempt_enable, and hard/softirq are
special kinds of preempt_disable/enable. See the implementation.

> And _bh is indeed only for softirq while we need hard and soft. So I am not
> sure which one we could use.

normal/_bh/_sched isn't about where you have your read side critical
section, but what other stuff can run in the read side critical
section, and hence what counts as a quiescent event:

normal -> no preempt -> task switch is an rcu quiescent state
_bh -> no (other) softirq -> softirq completion is a quiescent state
(plus anything where softirq aren't disabled, which can be used to
expedite the grace period)
_sched -> like _bh, but for hardirq

And if you need scheduling within your critical section, then use
srcu. So different rcu variants trade off overhead and speed of the
grace period against what you can get preempted with in the read side
critical sections. They don't make restrictions on where your read
side critical section can be run. E.g. you could do an srcu read side
critical section in a hardirq handler - of course sleeping isn't
allowed anymore, but that's because you run in the hardirq handler,
not because you're in the read side srcu section.

> It sounds to me any would be wrong and if we wanted to drop stop_machine we
> would simply have to use nothing. But then we couldn't be certain there are
> no more new requests queued after wedged has been set.

Just dropping it entirely is definitely not good enough, we've watched
that approach go boom already. We do need some ordering to make sure
we don't start cleaning up while someone else is still executing code
from the old callbacks.

> Maybe I am missing something, not sure.

rcu is hard :-)
-Daniel

>  Regards,
>
> Tvrtko
>
>
>> Cheers, Daniel
>>
>>>>         /* Mark all executing requests as skipped */
>>>>         engine->cancel_requests(engine);
>>>> @@ -3041,9 +3041,8 @@ static void engine_set_wedged(struct intel_engine_cs *engine)
>>>>
>>>> intel_engine_last_submit(engine));
>>>>    }
>>>> -static int __i915_gem_set_wedged_BKL(void *data)
>>>> +void i915_gem_set_wedged(struct drm_i915_private *i915)
>>>>    {
>>>> -       struct drm_i915_private *i915 = data;
>>>>         struct intel_engine_cs *engine;
>>>>         enum intel_engine_id id;
>>>> @@ -3052,13 +3051,6 @@ static int __i915_gem_set_wedged_BKL(void *data)
>>>>         set_bit(I915_WEDGED, &i915->gpu_error.flags);
>>>>         wake_up_all(&i915->gpu_error.reset_queue);
>>>> -
>>>> -       return 0;
>>>> -}
>>>> -
>>>> -void i915_gem_set_wedged(struct drm_i915_private *dev_priv)
>>>> -{
>>>> -       stop_machine(__i915_gem_set_wedged_BKL, dev_priv, NULL);
>>>>    }
>>>>    bool i915_gem_unset_wedged(struct drm_i915_private *i915)
>>>> diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
>>>> index b100b38f1dd2..ef78a85cb845 100644
>>>> --- a/drivers/gpu/drm/i915/i915_gem_request.c
>>>> +++ b/drivers/gpu/drm/i915/i915_gem_request.c
>>>> @@ -556,7 +556,9 @@ submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
>>>>         switch (state) {
>>>>         case FENCE_COMPLETE:
>>>>                 trace_i915_gem_request_submit(request);
>>>> +               rcu_read_lock();
>>>>                 request->engine->submit_request(request);
>>>> +               rcu_read_unlock();
>>>
>>>
>>> And _bh for these? Although this already runs with preemption off, but I
>>> guess it is good for documentation.
>>>
>>> Regards,
>>>
>>> Tvrtko
>>>
>>>>                 break;
>>>>         case FENCE_FREE:
>>>> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_request.c b/drivers/gpu/drm/i915/selftests/i915_gem_request.c
>>>> index 78b9f811707f..a999161e8db1 100644
>>>> --- a/drivers/gpu/drm/i915/selftests/i915_gem_request.c
>>>> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_request.c
>>>> @@ -215,7 +215,9 @@ static int igt_request_rewind(void *arg)
>>>>         }
>>>>         i915_gem_request_get(vip);
>>>>         i915_add_request(vip);
>>>> +       rcu_read_lock();
>>>>         request->engine->submit_request(request);
>>>> +       rcu_read_unlock();
>>>>         mutex_unlock(&i915->drm.struct_mutex);
>>>>
>>
>



-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

* Re: [PATCH] drm/i915: Use rcu instead of stop_machine
  2017-10-06  8:42     ` Chris Wilson
@ 2017-10-06  8:56       ` Daniel Vetter
  2017-10-06 10:01         ` Thomas Gleixner
  0 siblings, 1 reply; 12+ messages in thread
From: Daniel Vetter @ 2017-10-06  8:56 UTC (permalink / raw)
  To: Chris Wilson, Thomas Gleixner
  Cc: Daniel Vetter, Intel Graphics Development, Mika Kuoppala

On Fri, Oct 6, 2017 at 10:42 AM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> Quoting Daniel Vetter (2017-10-05 17:12:35)
>> On Thu, Oct 05, 2017 at 03:30:12PM +0100, Chris Wilson wrote:
>> > Quoting Daniel Vetter (2017-10-05 15:09:48)
>> > > stop_machine is not really a locking primitive we should use, except
>> > > when the hw folks tell us the hw is broken and that's the only way to
>> > > work around it.
>> > >
>> > > This patch here is just a suggestion for how to fix it up, possible
>> > > changes needed to make it actually work:
>> > >
>> > > - Set the nop_submit_request first for _all_ engines, before
>> > >   proceeding.
>> > >
>> > > - Make sure engine->cancel_requests copes with the possibility that
>> > >   not all tests have consistently used the new or old version. I dont
>> > >   think this is a problem, since the same can happen really with the
>> > >   stop_machine() locking - stop_machine also doesn't give you any kind
>> > >   of global ordering against other cpu threads, it just makes them
>> > >   stop.
>> > >
>> > > This patch tries to address the locking snafu from
>> >
>> > There's a locking snafu in the code?
>> >
>> > > commit 20e4933c478a1ca694b38fa4ac44d99e659941f5
>> > > Author: Chris Wilson <chris@chris-wilson.co.uk>
>> > > Date:   Tue Nov 22 14:41:21 2016 +0000
>> > >
>> > >     drm/i915: Stop the machine as we install the wedged submit_request handler
>> > >
>> > > Chris said parts of the reasons for going with stop_machine() was that
>> > > it's no overhead for the fast-path.
>> >
>> > More than that, you don't even have to think about it. It's a one off
>> > event that changes execution paths. I actually never thought about
>> > putting the lock mechanism around the caller (that does prevent the issue
>> > I was dreading of being inside the callback as it changed), it is still
>> > magic that has nothing to do with the code flow. What variable should we
>> > document as being rcu protected, (*engine->submit_request)()?
>>
>> Yeah, engine->submit_request would be the one, except it shouldn't use
>> rcu_assign_pointer/rcu_dereference since we don't need those barriers,
>> ever - the .text doesn't change after all. That's why I didn't splatter
>> the sparse annotations all over it. In a way we don't need the data
>> barriers of rcu, but really only the "everyone has passed the critical
>> section now" part of it.
>>
>> > I'm definitely not sold on having set-wedge dictate terms to the rest of
>> > the code.
>>
>> Well, it sounds like tglx would very much prefer we clean this locking
>> inversion up on our side and not get them to break the cycle. Not what I
>> wanted Thomas to look at, but since I botched it and attached the wrong
>> splat we got the answer for this patch here, and it seems to be "don't do
>> that".
>
> The analogy is with kernel live-patching (which is what we are doing
> here). Should every module in the kernel manually markup itself to
> support the esoteric feature, or should the feature support the wider
> kernel?

This is completely different from kernel live patching, where the
actual instructions get changed while we execute them. You'd need to
annotate every single IP load, which is impossible. This here seems to
be much more a bog standard critical section, restricted to a very small
piece of code (the hw submit functions essentially), and if we can't
get that to work then our locking is seriously in bad shape.
stop_machine really isn't a valid locking mechanism, except in very
extreme cases where there's really no other solution.

If you disagree, then pls convince Thomas Gleixner that he needs to
fix this on his end.

The other bit is that lockdep is warn_once, so every week we have this
in our CI is a week with reduced test coverage. Even if Thomas fixes
this in the cpu hotplug code I expect it'll take longer than to get
some interim solution in here in i915, since it doesn't sound trivial
at all to fix this on the cpu hotplug side either.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

* ✗ Fi.CI.IGT: failure for drm/i915: Use rcu instead of stop_machine
  2017-10-05 14:09 [PATCH] drm/i915: Use rcu instead of stop_machine Daniel Vetter
                   ` (2 preceding siblings ...)
  2017-10-06  8:47 ` ✓ Fi.CI.BAT: success for " Patchwork
@ 2017-10-06  9:50 ` Patchwork
  3 siblings, 0 replies; 12+ messages in thread
From: Patchwork @ 2017-10-06  9:50 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

== Series Details ==

Series: drm/i915: Use rcu instead of stop_machine
URL   : https://patchwork.freedesktop.org/series/31433/
State : failure

== Summary ==

Test kms_cursor_crc:
        Subgroup cursor-size-change:
                dmesg-fail -> PASS       (shard-hsw) fdo#102886 +5
Test perf:
        Subgroup polling:
                pass       -> FAIL       (shard-hsw)

fdo#102886 https://bugs.freedesktop.org/show_bug.cgi?id=102886

shard-hsw        total:2446 pass:1332 dwarn:1   dfail:0   fail:10  skip:1103 time:10192s

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_5913/shards.html

* Re: [PATCH] drm/i915: Use rcu instead of stop_machine
  2017-10-06  8:56       ` Daniel Vetter
@ 2017-10-06 10:01         ` Thomas Gleixner
  0 siblings, 0 replies; 12+ messages in thread
From: Thomas Gleixner @ 2017-10-06 10:01 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Peter Zijlstra, Daniel Vetter, Intel Graphics Development, Mika Kuoppala

On Fri, 6 Oct 2017, Daniel Vetter wrote:
> On Fri, Oct 6, 2017 at 10:42 AM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> >> > > This patch tries to address the locking snafu from
> >> >
> >> > There's a locking snafu in the code?

The locking problem definitely exists. We did not introduce new dependencies
through the hotplug locking rework. They've been there forever.

What's new is that lockdep became smarter and we exposed the hotplug lock
to lockdep fully. So yes, we need to address them no matter what.

While we might have a solution for that devfs issue, there are other ways
to expose this kind of problem when you have code paths which come from user
space and end up taking the hotplug lock.

> >> > > Chris said parts of the reasons for going with stop_machine() was that
> >> > > it's no overhead for the fast-path.
> >> >
> >> > More than that, you don't even have to think about it. It's a one off
> >> > event that changes execution paths. I actually never thought about
> >> > putting the lock mechanism around the caller (that does prevent the issue
> >> > I was dreading of being inside the callback as it changed), it is still
> >> > magic that has nothing to do with the code flow. What variable should we
> >> > document as being rcu protected, (*engine->submit_request)()?

stop_machine() is one of the biggest hammers we have in the kernel. We try
to avoid it wherever we can. It's the last resort for problems which
cannot be solved otherwise.

> >> Well, it sounds like tglx would very much prefer we clean this locking
> >> inversion up on our side and not get them to break the cycle. Not what I
> >> wanted Thomas to look at, but since I botched it and attached the wrong
> >> splat we got the answer for this patch here, and it seems to be "don't do
> >> that".
> >
> > The analogy is with kernel live-patching (which is what we are doing
> > here). Should every module in the kernel manually markup itself to
> > support the esoteric feature, or should the feature support the wider
> > kernel?
>
> This is completely different from kernel live patching, where the
> actual instructions get changed while we execute them. You'd need to
> annotate every single IP load, which is impossible. This here seems to
> be much more a bog standard critical section, restrict to a very small
> piece of code (the hw submit functions essentially), and if we can't
> get that to work then our locking is seriously in bad shape.
> stop_machine really isn't a valid locking mechanism, except in very
> extreme cases where there's really no other solution.

Correct. This has absolutely nothing to do with live patching. It's a bog
standard serialization issue which does not require the big hammer of
stop_machine().

Thanks,

	tglx
