Date: Mon, 10 Apr 2017 10:39:33 +0100
From: Chris Wilson
To: Andrea Arcangeli
Cc: Martin Kepplinger, Joonas Lahtinen, Thorsten Leemhuis,
	daniel.vetter@intel.com, Dave Airlie, intel-gfx@lists.freedesktop.org,
	linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org
Subject: Re: [PATCH 2/5] i915: flush gem obj freeing workqueues to add
	accuracy to the i915 shrinker
Message-ID: <20170410093933.GB6834@nuc-i3427.alporthouse.com>
References: <87pogtplxr.fsf@intel.com>
	<20170406232347.988-1-aarcange@redhat.com>
	<20170406232347.988-3-aarcange@redhat.com>
	<20170407100211.GG10496@nuc-i3427.alporthouse.com>
	<20170407130600.GA5035@redhat.com>
	<20170407153011.GR10496@nuc-i3427.alporthouse.com>
	<20170407164858.GF5035@redhat.com>
In-Reply-To: <20170407164858.GF5035@redhat.com>

On Fri, Apr 07, 2017 at 06:48:58PM +0200, Andrea Arcangeli wrote:
> On Fri, Apr 07, 2017 at 04:30:11PM +0100, Chris Wilson wrote:
> > Not getting hangs is a good sign, but lockdep doesn't like it:
> >
> > [  460.684901] WARNING: CPU: 1 PID: 172 at kernel/workqueue.c:2418 check_flush_dependency+0x92/0x130
> > [  460.684924] workqueue: PF_MEMALLOC task 172(kworker/1:1H) is flushing !WQ_MEM_RECLAIM events:__i915_gem_free_work [i915]
> >
> > If I allocated the workqueue with WQ_MEM_RECLAIM, it complains
> > bitterly as well.
>
> So in PF_MEMALLOC context we can't flush a workqueue with
> !WQ_MEM_RECLAIM.
>
> 	system_wq = alloc_workqueue("events", 0, 0);
>
> My point is that synchronize_rcu_expedited will still push its work
> onto the same system_wq workqueue...
>
> 	/* Marshall arguments & schedule the expedited grace period. */
> 	rew.rew_func = func;
> 	rew.rew_rsp = rsp;
> 	rew.rew_s = s;
> 	INIT_WORK_ONSTACK(&rew.rew_work, wait_rcu_exp_gp);
> 	schedule_work(&rew.rew_work);
>
> It's also using schedule_work, so either the above is a false
> positive, or we still have a problem with synchronize_rcu_expedited;
> it is just no longer a reproducible issue now that we stopped running
> it under the struct_mutex.

We still do have a problem with using synchronize_rcu_expedited() from
the shrinker, as we may be under someone else's mutex that is itself
involved in its own RCU dance.
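
To make the hazard concrete, here is a purely illustrative sketch with
made-up driver names (not a real i915 call chain):

#include <linux/mutex.h>
#include <linux/slab.h>
#include <linux/workqueue.h>

struct foo_device {
	struct mutex lock;
	struct work_struct work;
};

/* Task A: holds a driver mutex, then hits direct reclaim. */
static void *foo_alloc(struct foo_device *foo)
{
	void *ptr;

	mutex_lock(&foo->lock);
	/*
	 * GFP_KERNEL may enter direct reclaim, which runs the i915
	 * shrinker, which does synchronize_rcu_expedited(); the
	 * expedited grace period is driven by wait_rcu_exp_gp queued
	 * on system_wq.
	 */
	ptr = kmalloc(4096, GFP_KERNEL);
	mutex_unlock(&foo->lock);
	return ptr;
}

/* Unrelated work item already queued on the same system_wq pool. */
static void foo_work_fn(struct work_struct *work)
{
	struct foo_device *foo = container_of(work, struct foo_device, work);

	mutex_lock(&foo->lock);		/* blocks behind task A */
	mutex_unlock(&foo->lock);
}

Under memory pressure the pool cannot create a fresh worker and
system_wq has no rescuer, so once its workers are parked on foo->lock
the wait_rcu_exp_gp work never runs and task A never leaves reclaim --
the same inversion check_flush_dependency warns about for flush_work(),
just without the splat.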

> Even synchronize_sched will wait on the system_wq if
> synchronize_rcu_expedited has been issued in parallel by some other
> code; it's just that there's no check for it, because it's not
> invoking flush_work to wait.

Right.

> The deadlock happens if we flush_work() while holding any lock that
> may be taken by any of the workqueues that could be queued there.
> i915 makes sure to flush_work only if the struct_mutex was released
> (not my initial version), but we're not checking whether any of the
> other system_wq work items could be taking a lock that may have been
> held during the allocation that invoked reclaim. I suppose that is
> the problem left, but I don't see how flush_work is special about it;
> synchronize_rcu_expedited would still have the same issue, no?
> (despite no lockdep warning)
>
> I suspect this means synchronize_rcu_expedited() is not usable in
> reclaim context, and lockdep should warn if PF_MEMALLOC is set when
> synchronize_rcu_expedited/synchronize_sched/synchronize_rcu are
> called.

Yes.

> Probably to fix this we should create a private workqueue for both RCU
> and i915 and stop sharing the system_wq with the rest of the system
> (and of course set WQ_MEM_RECLAIM in both workqueues). This makes sure
> that when we call synchronize_rcu_expedited; flush_work from the
> shrinker, we won't risk waiting on other random work that may be
> taking locks held by the code that invoked reclaim during an
> allocation.

We simply do not need to do our own synchronize_rcu* -- it is only used
to flush our slab frees on the off chance that (a) we have any and (b)
we do manage to free a whole slab. It is not the bulk of the memory
that we return to the system from the shrinker. In the other thread, I
stated that we should simply remove it. The kswapd reclaim path should
try to reclaim RCU slabs (by doing a synchronize_sched or equivalent).

> The macro bug of waiting on system_wq 100% of the time while always
> holding the struct_mutex is gone, but we need to perfect this further
> and stop using the system_wq for RCU and i915 shrinker work.

Agreed. My preference is to simply not do it and leave the dangling RCU
to the core reclaim paths.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
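
P.S. A very rough sketch of what I mean by leaving it to the core
reclaim paths -- the placement, the naming and even the choice of
primitive (synchronize_sched() vs rcu_barrier()) are made up here, this
is not a patch:

#include <linux/rcupdate.h>

/* Hypothetical hook, called once per kswapd reclaim pass. */
static void kswapd_reclaim_rcu_frees(void)
{
	/*
	 * Objects freed with kfree_rcu()/call_rcu() and slab pages from
	 * SLAB_DESTROY_BY_RCU caches are only released after a grace
	 * period, so pay for one grace period here, once, instead of
	 * having every shrinker call synchronize_rcu_expedited() from
	 * direct reclaim while potentially under someone else's locks.
	 */
	synchronize_sched();
}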