linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [GIT PULL] workqueue fixes for v4.6-rc5
@ 2016-04-27 16:11 Tejun Heo
  2016-04-27 18:49 ` Linus Torvalds
  0 siblings, 1 reply; 3+ messages in thread
From: Tejun Heo @ 2016-04-27 16:11 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, Roman Pen, peter, Jens Axboe

Hello, Linus.

So, it turns out we had a silly bug in the most fundamental part of
workqueue for a very long time.  AFAICS, this dates back to pre-git
era and has quite likely been there from the time workqueue was first
introduced.

A work item uses its PENDING bit to synchronize multiple queuers.
Anyone who wins the PENDING bit owns the pending state of the work
item.  Whether a queuer wins or loses the race, one thing should be
guaranteed - there will soon be at least one execution of the work
item - where "after" means that the execution instance would be able
to see all the changes that the queuer has made prior to the queueing
attempt.

Unfortunately, we were missing a smp_rmb() after clearing PENDING for
execution, so nothing guaranteed visibility of the changes that a
queueing loser has made, which manifested as a reproducible blk-mq
stall.

Lots of kudos to Roman for debugging the problem.  The patch for
-stable is the minimal one.  For v3.7, Peter is working on a patch to
make the code path slightly more efficient and less fragile.

Thanks.

The following changes since commit f7813ad5cbfd1fab2899914281b72a1ba0805c80:

  Merge tag 'for-linus-4.6' of git://git.code.sf.net/p/openipmi/linux-ipmi (2016-03-18 10:21:04 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git for-4.6-fixes

for you to fetch changes up to 346c09f80459a3ad97df1816d6d606169a51001a:

  workqueue: fix ghost PENDING flag while doing MQ IO (2016-04-26 11:23:22 -0400)

----------------------------------------------------------------
Roman Pen (1):
      workqueue: fix ghost PENDING flag while doing MQ IO

 kernel/workqueue.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

-- 
tejun

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [GIT PULL] workqueue fixes for v4.6-rc5
  2016-04-27 16:11 [GIT PULL] workqueue fixes for v4.6-rc5 Tejun Heo
@ 2016-04-27 18:49 ` Linus Torvalds
  2016-04-27 19:03   ` Tejun Heo
  0 siblings, 1 reply; 3+ messages in thread
From: Linus Torvalds @ 2016-04-27 18:49 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Linux Kernel Mailing List, Roman Pen, Peter Hurley, Jens Axboe

On Wed, Apr 27, 2016 at 9:11 AM, Tejun Heo <tj@kernel.org> wrote:
>
> Unfortunately, we were missing a smp_rmb() after clearing PENDING for
> execution, so nothing guaranteed visibility of the changes that a
> queueing loser has made, which manifested as a reproducible blk-mq
> stall.

That explanation makes no sense. A smp_rmb() after a store makes no
sense what-so-ever.

Happily, the code itself does and has a big comment. It's not a
"smp_rmb()", it's a full memory barrier. Which *does* make sense as
serializing a store wrt a following load.

                  Linus

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [GIT PULL] workqueue fixes for v4.6-rc5
  2016-04-27 18:49 ` Linus Torvalds
@ 2016-04-27 19:03   ` Tejun Heo
  0 siblings, 0 replies; 3+ messages in thread
From: Tejun Heo @ 2016-04-27 19:03 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, Roman Pen, Peter Hurley, Jens Axboe

On Wed, Apr 27, 2016 at 11:49:35AM -0700, Linus Torvalds wrote:
> On Wed, Apr 27, 2016 at 9:11 AM, Tejun Heo <tj@kernel.org> wrote:
> >
> > Unfortunately, we were missing a smp_rmb() after clearing PENDING for
> > execution, so nothing guaranteed visibility of the changes that a
> > queueing loser has made, which manifested as a reproducible blk-mq
> > stall.
> 
> That explanation makes no sense. A smp_rmb() after a store makes no
> sense what-so-ever.
> 
> Happily, the code itself does and has a big comment. It's not a
> "smp_rmb()", it's a full memory barrier. Which *does* make sense as
> serializing a store wrt a following load.

Heh, yeah, brainfart.  smp_mb(), obviously.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2016-04-27 19:03 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-04-27 16:11 [GIT PULL] workqueue fixes for v4.6-rc5 Tejun Heo
2016-04-27 18:49 ` Linus Torvalds
2016-04-27 19:03   ` Tejun Heo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).