linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Johannes Berg <johannes@sipsolutions.net>
To: Bart Van Assche <bvanassche@acm.org>, Tejun Heo <tj@kernel.org>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Christoph Hellwig <hch@lst.de>, Sagi Grimberg <sagi@grimberg.me>,
	"linux-nvme @ lists . infradead . org" 
	<linux-nvme@lists.infradead.org>
Subject: Re: [PATCH] Revert "workqueue: re-add lockdep dependencies for flushing"
Date: Mon, 22 Oct 2018 22:28:42 +0200	[thread overview]
Message-ID: <094669f3df1690dec5913c2086f6a6d8c470f685.camel@sipsolutions.net> (raw)
In-Reply-To: <13901aed5074f4b1fbd259d03928efb6ab40c65a.camel@sipsolutions.net>

On Mon, 2018-10-22 at 22:14 +0200, Johannes Berg wrote:

> "False positives" - which in this case really should more accurately be
> called "bad lockdep annotations" - of the kind you report here are
> inherent in how lockdep works, and thus following your argument to the
> ultimate conclusion you should remove lockdep.

> I must also say that I'm disappointed you'd try to do things this way.
> I'd be (have been?) willing to actually help you understand the problem
> and add the annotations, but rather than answer my question ("where do I
> find the right git tree"!) you just send a revert patch.

Actually, wait, you have a different problem in the patch than the one
you reported earlier? Which I haven't even seen at all...

And this one is actually either the same missing subclass annotation, or
completely valid. You have here

 * dio/... workqueue
 * dio->complete_work, running on that workqueue
 * i_mutex_key#16

The lockdep report even more or less tells you what's going on. Perhaps
we need to find a way to make lockdep not print "lock()" but "start()"
or "flush()" for work items ... but if you read it this way, you see:

CPU0			CPU1

lock(i_mutex_key)
			start(dio->complete_work)
			lock(i_mutex_key)
flush(wq dio/...)

which is *clearly* a problem. You can't flush a workqueue under lock
that is (or might be) currently running a work item that takes the same
lock. You've already acquired the lock, and so the complete_work is
waiting for the lock, but you're now in the flush waiting for it to
finish ... pretty much a classic ABBA deadlock, just with workqueues.

It's possible, of course, that multiple instances of the same class are
involved and they're somehow ordered/nested in a way that is in fact
safe, say in this case CPU0 and CPU1 have acquired (or are trying to
acquire) fundamentally different instances of "i_mutex_key#16", or the
dio->complete_work here doesn't actually run on the dio workqueue in
question, but somehow a different one.

With workqueues, it's also possible that something out-of-band, other
than the flush_workqueue(), prevents dio->complete_work from being
running at this point in time - but in that case you might actually be
able to get rid of the flush_workqueue() entirely.

It ought to be possible to annotate this too though, but again, I'm not
an expert on what the correct and safe nesting here would be, and thus
wouldn't be able to say how to do it.

johannes


  reply	other threads:[~2018-10-22 20:29 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-10-22 15:18 [PATCH] Revert "workqueue: re-add lockdep dependencies for flushing" Bart Van Assche
2018-10-22 20:14 ` Johannes Berg
2018-10-22 20:28   ` Johannes Berg [this message]
2018-10-22 20:54     ` Bart Van Assche
2018-10-22 21:04       ` Johannes Berg
2018-10-22 21:26         ` Bart Van Assche
2018-10-23 19:44           ` Johannes Berg
2018-10-23 19:58             ` Johannes Berg
2018-10-23  1:17         ` Bart Van Assche
2018-10-23 19:50           ` Johannes Berg
2018-10-23 20:24             ` Johannes Berg
2018-10-22 21:47   ` Bart Van Assche
2018-10-23  0:43     ` Sagi Grimberg
2018-10-23 19:26       ` Johannes Berg
2018-10-23 21:30         ` Sagi Grimberg

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=094669f3df1690dec5913c2086f6a6d8c470f685.camel@sipsolutions.net \
    --to=johannes@sipsolutions.net \
    --cc=bvanassche@acm.org \
    --cc=hch@lst.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nvme@lists.infradead.org \
    --cc=sagi@grimberg.me \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).