Date: Mon, 10 Apr 2017 10:39:33 +0100
From: Chris Wilson
To: Andrea Arcangeli
Cc: Martin Kepplinger, Joonas Lahtinen, Thorsten Leemhuis,
	daniel.vetter@intel.com, Dave Airlie, intel-gfx@lists.freedesktop.org,
	linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org
Subject: Re: [PATCH 2/5] i915: flush gem obj freeing workqueues to add
	accuracy to the i915 shrinker
Message-ID: <20170410093933.GB6834@nuc-i3427.alporthouse.com>
References: <87pogtplxr.fsf@intel.com>
	<20170406232347.988-1-aarcange@redhat.com>
	<20170406232347.988-3-aarcange@redhat.com>
	<20170407100211.GG10496@nuc-i3427.alporthouse.com>
	<20170407130600.GA5035@redhat.com>
	<20170407153011.GR10496@nuc-i3427.alporthouse.com>
	<20170407164858.GF5035@redhat.com>
In-Reply-To: <20170407164858.GF5035@redhat.com>

On Fri, Apr 07, 2017 at 06:48:58PM +0200, Andrea Arcangeli wrote:
> On Fri, Apr 07, 2017 at 04:30:11PM +0100, Chris Wilson wrote:
> > Not getting hangs is a good sign, but lockdep doesn't like it:
> >
> > [  460.684901] WARNING: CPU: 1 PID: 172 at kernel/workqueue.c:2418 check_flush_dependency+0x92/0x130
> > [  460.684924] workqueue: PF_MEMALLOC task 172(kworker/1:1H) is flushing !WQ_MEM_RECLAIM events:__i915_gem_free_work [i915]
> >
> > If I allocated the workqueue with WQ_MEM_RECLAIM, it complains
> > bitterly as well.
>
> So in PF_MEMALLOC context we can't flush a workqueue with
> !WQ_MEM_RECLAIM.
>
> 	system_wq = alloc_workqueue("events", 0, 0);
>
> My point is that synchronize_rcu_expedited will still push its work
> onto the same system_wq workqueue...
>
> 	/* Marshall arguments & schedule the expedited grace period. */
> 	rew.rew_func = func;
> 	rew.rew_rsp = rsp;
> 	rew.rew_s = s;
> 	INIT_WORK_ONSTACK(&rew.rew_work, wait_rcu_exp_gp);
> 	schedule_work(&rew.rew_work);
>
> It's also using schedule_work, so either the above is a false
> positive, or we still have a problem with synchronize_rcu_expedited;
> it is just no longer a reproducible issue now that we stopped running
> it under the struct_mutex.

We still do have a problem with using synchronize_rcu_expedited() from
the shrinker, as we may be under someone else's mutex that is itself
involved in its own RCU dance.
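
To make the hazard concrete, here is a purely illustrative sketch with
made-up driver names (not a real i915 call chain):

#include <linux/mutex.h>
#include <linux/slab.h>
#include <linux/workqueue.h>

struct foo_device {
	struct mutex lock;
	struct work_struct work;
};

/* Task A: holds a driver mutex, then hits direct reclaim. */
static void *foo_alloc(struct foo_device *foo)
{
	void *ptr;

	mutex_lock(&foo->lock);
	/*
	 * GFP_KERNEL may enter direct reclaim, which runs the i915
	 * shrinker, which does synchronize_rcu_expedited(); the
	 * expedited grace period is driven by wait_rcu_exp_gp queued
	 * on system_wq.
	 */
	ptr = kmalloc(4096, GFP_KERNEL);
	mutex_unlock(&foo->lock);
	return ptr;
}

/* Unrelated work item already queued on the same system_wq pool. */
static void foo_work_fn(struct work_struct *work)
{
	struct foo_device *foo = container_of(work, struct foo_device, work);

	mutex_lock(&foo->lock);		/* blocks behind task A */
	mutex_unlock(&foo->lock);
}

Under memory pressure the pool cannot create a fresh worker and
system_wq has no rescuer, so once its workers are parked on foo->lock
the wait_rcu_exp_gp work never runs and task A never leaves reclaim --
the same inversion check_flush_dependency warns about for flush_work(),
just without the splat.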

> Even synchronize_sched will wait on the system_wq if
> synchronize_rcu_expedited has been issued in parallel by some other
> code; it's just that there's no check for it, because it's not
> invoking flush_work to wait.

Right.

> The deadlock happens if we flush_work() while holding any lock that
> may be taken by any of the workqueues that could be queued there.
> i915 makes sure to flush_work only if the struct_mutex was released
> (not my initial version), but we're not checking whether any of the
> other system_wq work items could be taking a lock that may have been
> held during the allocation that invoked reclaim. I suppose that is
> the problem left, but I don't see how flush_work is special about it;
> synchronize_rcu_expedited would still have the same issue, no?
> (despite no lockdep warning)
>
> I suspect this means synchronize_rcu_expedited() is not usable in
> reclaim context, and lockdep should warn if PF_MEMALLOC is set when
> synchronize_rcu_expedited/synchronize_sched/synchronize_rcu are
> called.

Yes.

> Probably to fix this we should create a private workqueue for both RCU
> and i915 and stop sharing the system_wq with the rest of the system
> (and of course set WQ_MEM_RECLAIM in both workqueues). This makes sure
> that when we call synchronize_rcu_expedited; flush_work from the
> shrinker, we won't risk waiting on other random work that may be
> taking locks held by the code that invoked reclaim during an
> allocation.

We simply do not need to do our own synchronize_rcu* -- it is only used
to flush our slab frees on the off chance that (a) we have any and (b)
we do manage to free a whole slab. It is not the bulk of the memory
that we return to the system from the shrinker. In the other thread, I
stated that we should simply remove it. The kswapd reclaim path should
try to reclaim RCU slabs (by doing a synchronize_sched or equivalent).

> The macro bug of waiting on system_wq 100% of the time while always
> holding the struct_mutex is gone, but we need to perfect this further
> and stop using the system_wq for RCU and i915 shrinker work.

Agreed. My preference is to simply not do it and leave the dangling RCU
to the core reclaim paths.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
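
P.S. A very rough sketch of what I mean by leaving it to the core
reclaim paths -- the placement, the naming and even the choice of
primitive (synchronize_sched() vs rcu_barrier()) are made up here, this
is not a patch:

#include <linux/rcupdate.h>

/* Hypothetical hook, called once per kswapd reclaim pass. */
static void kswapd_reclaim_rcu_frees(void)
{
	/*
	 * Objects freed with kfree_rcu()/call_rcu() and slab pages from
	 * SLAB_DESTROY_BY_RCU caches are only released after a grace
	 * period, so pay for one grace period here, once, instead of
	 * having every shrinker call synchronize_rcu_expedited() from
	 * direct reclaim while potentially under someone else's locks.
	 */
	synchronize_sched();
}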