Re: On-stack work item completion race? (was Re: XFS crash?)

From: Dave Chinner <david@fromorbit.com>
To: Tejun Heo <tj@kernel.org>
Cc: Austin Schuh <austin@peloton-tech.com>, xfs <xfs@oss.sgi.com>,
	linux-kernel@vger.kernel.org
Subject: Re: On-stack work item completion race? (was Re: XFS crash?)
Date: Wed, 25 Jun 2014 15:56:41 +1000
Message-ID: <20140625055641.GL9508@dastard>
In-Reply-To: <20140624032521.GA12164@htj.dyndns.org>

On Mon, Jun 23, 2014 at 11:25:21PM -0400, Tejun Heo wrote:
> Hello,
> 
> On Tue, Jun 24, 2014 at 01:02:40PM +1000, Dave Chinner wrote:
> > As I understand it, what then happens is that the workqueue code
> > grabs another kworker thread and runs the next work item in its
> > queue. IOWs, work items can block, but doing that does not prevent
> > execution of other work items queued on other work queues or even on
> > the same work queue. Tejun, did I get that correct?
> 
> Yes, as long as the workqueue is under its @max_active limit and has
> access to an existing kworker or can create a new one, it'll start
> executing the next work item immediately; however, the guaranteed
> level of concurrency is 1 even for WQ_MEM_RECLAIM workqueues.  IOW, the
> work items queued on a workqueue must be able to make forward progress
> with a single work item if the work items are being depended upon for
> memory reclaim.

Hmmm - that's different from my understanding of the behaviour
WQ_MEM_RECLAIM originally gave us, i.e. that WQ_MEM_RECLAIM
workqueues had a rescuer thread created to guarantee that the
*workqueue* could make forward progress executing work in a
reclaim context.

The concept that the *work being executed* needs to guarantee
forward progress is something I've never heard stated before.
That worries me a lot, especially with all the memory reclaim
problems that have surfaced in the past couple of months....

> As long as a WQ_MEM_RECLAIM workqueue doesn't depend upon itself,
> forward-progress is guaranteed.
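
The constraint as stated above can be illustrated with a toy
user-space model. This is only a sketch, not kernel code: it assumes
the worst case under memory pressure, where no additional kworker can
be created and each workqueue has exactly one guaranteed execution
context (its rescuer). The names Workqueue, Work, run_until_stalled
and stuck_items are made up for the illustration.

```python
from collections import deque


class Work:
    """A work item that may have to wait for another work item to finish."""

    def __init__(self, name, waits_for=None):
        self.name = name
        self.waits_for = waits_for  # hypothetical dependency on another Work
        self.done = False


class Workqueue:
    """Toy model: a single guaranteed execution context and nothing else."""

    def __init__(self, name):
        self.name = name
        self.pending = deque()
        self.running = None  # the work item occupying the single context


def run_until_stalled(workqueues, max_steps=100):
    """Advance every queue until no progress is possible.

    Returns the sorted names of work items that never completed.
    """
    for _ in range(max_steps):
        progressed = False
        for wq in workqueues:
            # The single context picks up the next item only when it is free.
            if wq.running is None and wq.pending:
                wq.running = wq.pending.popleft()
                progressed = True
            work = wq.running
            # A running item completes once its dependency (if any) is done.
            if work is not None and (work.waits_for is None or work.waits_for.done):
                work.done = True
                wq.running = None
                progressed = True
        if not progressed:
            break
    stuck = []
    for wq in workqueues:
        if wq.running is not None:
            stuck.append(wq.running.name)
        stuck.extend(w.name for w in wq.pending)
    return sorted(stuck)


def stuck_items(same_queue):
    """Queue a parent that waits for a child, on one queue or on two."""
    child = Work("child")
    parent = Work("parent", waits_for=child)
    wq_a = Workqueue("a")
    wq_b = wq_a if same_queue else Workqueue("b")
    wq_a.pending.append(parent)
    wq_b.pending.append(child)
    queues = [wq_a] if same_queue else [wq_a, wq_b]
    return run_until_stalled(queues)
```

With both items on one queue, the parent occupies the only context
while waiting for the child, so stuck_items(same_queue=True) reports
both as stuck. With the child on a second queue,
stuck_items(same_queue=False) reports nothing stuck.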

I can't find any documentation that actually defines what
WQ_MEM_RECLAIM means, so I can't tell when or how this requirement
came about. If it's true, then I suspect most of the WQ_MEM_RECLAIM
workqueues in filesystems violate it. Can you point me at
documentation/commits/code describing the constraints of
WQ_MEM_RECLAIM and the reasons for it?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com