* [GIT PULL] workqueue fixes for v4.6-rc5
@ 2016-04-27 16:11 Tejun Heo
2016-04-27 18:49 ` Linus Torvalds
0 siblings, 1 reply; 3+ messages in thread
From: Tejun Heo @ 2016-04-27 16:11 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-kernel, Roman Pen, peter, Jens Axboe
Hello, Linus.
So, it turns out we had a silly bug in the most fundamental part of
workqueue for a very long time. AFAICS, this dates back to pre-git
era and has quite likely been there from the time workqueue was first
introduced.
A work item uses its PENDING bit to synchronize multiple queuers.
Anyone who wins the PENDING bit owns the pending state of the work
item. Whether a queuer wins or loses the race, one thing should be
guaranteed - there will soon be at least one execution of the work
item - where "after" means that the execution instance would be able
to see all the changes that the queuer has made prior to the queueing
attempt.
Unfortunately, we were missing a smp_rmb() after clearing PENDING for
execution, so nothing guaranteed visibility of the changes that a
queueing loser has made, which manifested as a reproducible blk-mq
stall.
Lots of kudos to Roman for debugging the problem. The patch for
-stable is the minimal one. For v3.7, Peter is working on a patch to
make the code path slightly more efficient and less fragile.
Thanks.
The following changes since commit f7813ad5cbfd1fab2899914281b72a1ba0805c80:
Merge tag 'for-linus-4.6' of git://git.code.sf.net/p/openipmi/linux-ipmi (2016-03-18 10:21:04 -0700)
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git for-4.6-fixes
for you to fetch changes up to 346c09f80459a3ad97df1816d6d606169a51001a:
workqueue: fix ghost PENDING flag while doing MQ IO (2016-04-26 11:23:22 -0400)
----------------------------------------------------------------
Roman Pen (1):
workqueue: fix ghost PENDING flag while doing MQ IO
kernel/workqueue.c | 29 +++++++++++++++++++++++++++++
1 file changed, 29 insertions(+)
--
tejun
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [GIT PULL] workqueue fixes for v4.6-rc5
2016-04-27 16:11 [GIT PULL] workqueue fixes for v4.6-rc5 Tejun Heo
@ 2016-04-27 18:49 ` Linus Torvalds
2016-04-27 19:03 ` Tejun Heo
0 siblings, 1 reply; 3+ messages in thread
From: Linus Torvalds @ 2016-04-27 18:49 UTC (permalink / raw)
To: Tejun Heo; +Cc: Linux Kernel Mailing List, Roman Pen, Peter Hurley, Jens Axboe
On Wed, Apr 27, 2016 at 9:11 AM, Tejun Heo <tj@kernel.org> wrote:
>
> Unfortunately, we were missing a smp_rmb() after clearing PENDING for
> execution, so nothing guaranteed visibility of the changes that a
> queueing loser has made, which manifested as a reproducible blk-mq
> stall.
That explanation makes no sense. A smp_rmb() after a store makes no
sense what-so-ever.
Happily, the code itself does and has a big comment. It's not a
"smp_rmb()", it's a full memory barrier. Which *does* make sense as
serializing a store wrt a following load.
Linus
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [GIT PULL] workqueue fixes for v4.6-rc5
2016-04-27 18:49 ` Linus Torvalds
@ 2016-04-27 19:03 ` Tejun Heo
0 siblings, 0 replies; 3+ messages in thread
From: Tejun Heo @ 2016-04-27 19:03 UTC (permalink / raw)
To: Linus Torvalds
Cc: Linux Kernel Mailing List, Roman Pen, Peter Hurley, Jens Axboe
On Wed, Apr 27, 2016 at 11:49:35AM -0700, Linus Torvalds wrote:
> On Wed, Apr 27, 2016 at 9:11 AM, Tejun Heo <tj@kernel.org> wrote:
> >
> > Unfortunately, we were missing a smp_rmb() after clearing PENDING for
> > execution, so nothing guaranteed visibility of the changes that a
> > queueing loser has made, which manifested as a reproducible blk-mq
> > stall.
>
> That explanation makes no sense. A smp_rmb() after a store makes no
> sense what-so-ever.
>
> Happily, the code itself does and has a big comment. It's not a
> "smp_rmb()", it's a full memory barrier. Which *does* make sense as
> serializing a store wrt a following load.
Heh, yeah, brainfart. smp_mb(), obviously.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2016-04-27 19:03 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-04-27 16:11 [GIT PULL] workqueue fixes for v4.6-rc5 Tejun Heo
2016-04-27 18:49 ` Linus Torvalds
2016-04-27 19:03 ` Tejun Heo
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).