From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751994AbdJFLLB (ORCPT );
	Fri, 6 Oct 2017 07:11:01 -0400
Received: from merlin.infradead.org ([205.233.59.134]:52336 "EHLO
	merlin.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751702AbdJFLK6 (ORCPT );
	Fri, 6 Oct 2017 07:10:58 -0400
Date: Fri, 6 Oct 2017 13:10:39 +0200
From: Peter Zijlstra 
To: Daniel Vetter 
Cc: Intel Graphics Development ,
	LKML ,
	Chris Wilson ,
	Mika Kuoppala ,
	Thomas Gleixner ,
	Marta Lofstedt ,
	Daniel Vetter 
Subject: Re: [PATCH 2/2] drm/i915: Use rcu instead of stop_machine in set_wedged
Message-ID: <20171006111039.pwp7ijxclt4fchxp@hirez.programming.kicks-ass.net>
References: <20171006090637.25545-1-daniel.vetter@ffwll.ch>
	<20171006090637.25545-2-daniel.vetter@ffwll.ch>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20171006090637.25545-2-daniel.vetter@ffwll.ch>
User-Agent: NeoMutt/20170609 (1.8.3)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: 
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Oct 06, 2017 at 11:06:37AM +0200, Daniel Vetter wrote:

> I pondered whether we should annotate engine->submit_request as __rcu
> and use rcu_assign_pointer and rcu_dereference on it. But the reason
> behind those is to make sure the compiler/cpu barriers are there for
> when you have an actual data structure you point at, to make sure all
> the writes are seen correctly on the read side. But we just have a
> function pointer, and .text isn't changed, so no need for these
> barriers and hence no need for annotations.

synchronize_*() provides an smp_mb() on the calling CPU and ensures an
smp_mb() on all other CPUs before completion, such that everybody
agrees on the state prior to calling synchronize_rcu().

So yes, no additional ordering requirements.
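[The distinction above can be sketched in kernel-style C; this is an
illustrative sketch, not the actual i915 code:]

```
/* For a data pointer, the publish/subscribe barriers matter: the writer
 * must order initialization before publication, the reader must order
 * the pointer load before dereferencing what it points at. */
rcu_assign_pointer(dev->state, new_state);	/* writer side */
s = rcu_dereference(dev->state);		/* reader side */

/* For a function pointer the pointed-to bytes (.text) never change, so
 * a plain store/load suffices; synchronize_rcu() then provides all the
 * ordering needed between the store and the post-grace-period work: */
engine->submit_request = nop_submit_request;
synchronize_rcu();
```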
>  drivers/gpu/drm/i915/i915_gem.c                   | 31 +++++++++--------------
>  drivers/gpu/drm/i915/i915_gem_request.c           |  2 ++
>  drivers/gpu/drm/i915/selftests/i915_gem_request.c |  2 ++
>  3 files changed, 16 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index ab8c6946fea4..e79a6ca60265 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3020,16 +3020,8 @@ static void nop_submit_request(struct drm_i915_gem_request *request)
>  	intel_engine_init_global_seqno(request->engine, request->global_seqno);
>  }
>
> +static void engine_complete_requests(struct intel_engine_cs *engine)
>  {
>  	/* Mark all executing requests as skipped */
>  	engine->cancel_requests(engine);
>
> @@ -3041,24 +3033,25 @@ static void engine_set_wedged(struct intel_engine_cs *engine)
>  					 intel_engine_last_submit(engine));
>  }
>
> +void i915_gem_set_wedged(struct drm_i915_private *i915)
>  {
>  	struct intel_engine_cs *engine;
>  	enum intel_engine_id id;
>
>  	for_each_engine(engine, i915, id)
> +		engine->submit_request = nop_submit_request;
>
> +	/* Make sure no one is running the old callback before we proceed with
> +	 * cancelling requests and resetting the completion tracking. Otherwise
> +	 * we might submit a request to the hardware which never completes.
> +	 */

ARGH @ horrid comment style..
http://lkml.iu.edu/hypermail/linux/kernel/1607.1/00627.html :-)

> +	synchronize_rcu();
>
> +	for_each_engine(engine, i915, id)
> +		engine_complete_requests(engine);
>
> +	set_bit(I915_WEDGED, &i915->gpu_error.flags);
> +	wake_up_all(&i915->gpu_error.reset_queue);
>  }
>
>  bool i915_gem_unset_wedged(struct drm_i915_private *i915)
> diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
> index b100b38f1dd2..ef78a85cb845 100644
> --- a/drivers/gpu/drm/i915/i915_gem_request.c
> +++ b/drivers/gpu/drm/i915/i915_gem_request.c
> @@ -556,7 +556,9 @@ submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
>  	switch (state) {
>  	case FENCE_COMPLETE:
>  		trace_i915_gem_request_submit(request);
> +		rcu_read_lock();
>  		request->engine->submit_request(request);
> +		rcu_read_unlock();
>  		break;
>
>  	case FENCE_FREE:
> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_request.c b/drivers/gpu/drm/i915/selftests/i915_gem_request.c
> index 78b9f811707f..a999161e8db1 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_gem_request.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_request.c
> @@ -215,7 +215,9 @@ static int igt_request_rewind(void *arg)
>  	}
>  	i915_gem_request_get(vip);
>  	i915_add_request(vip);
> +	rcu_read_lock();
>  	request->engine->submit_request(request);
> +	rcu_read_unlock();
>
>  	mutex_unlock(&i915->drm.struct_mutex);

Yes, this is a correct and good replacement, however, you said:

> Chris said parts of the reasons for going with stop_machine() was that
> it's no overhead for the fast-path. But these callbacks use irqsave
> spinlocks and do a bunch of MMIO, and rcu_read_lock is _real_ fast.

This means that you could simply do synchronize_sched() without the
addition of rcu_read_lock()s and still be fine.
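[A sketch of that alternative, as I read the suggestion; illustrative
kernel-style code, not the actual i915 call sites. The read side relies
on the submission path already running with preemption disabled (e.g.
under an irqsave spinlock), which is an implicit RCU-sched read-side
critical section:]

```
/* read side: no explicit rcu_read_lock()/rcu_read_unlock() needed,
 * the irqsave spinlock already disables preemption around the call */
spin_lock_irqsave(&engine->timeline->lock, flags);
engine->submit_request(request);
spin_unlock_irqrestore(&engine->timeline->lock, flags);

/* write side: wait for every CPU to leave any preemption-disabled
 * region instead of an explicit RCU read-side section */
for_each_engine(engine, i915, id)
	engine->submit_request = nop_submit_request;
synchronize_sched();
```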