From: Johannes Berg <johannes@sipsolutions.net>
To: Bart Van Assche <bvanassche@acm.org>, Tejun Heo <tj@kernel.org>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Christoph Hellwig <hch@lst.de>, Sagi Grimberg <sagi@grimberg.me>,
"tytso@mit.edu" <tytso@mit.edu>
Subject: Re: [PATCH 3/3] kernel/workqueue: Suppress a false positive lockdep complaint
Date: Thu, 25 Oct 2018 21:51:50 +0200 [thread overview]
Message-ID: <11db45f85e2f763c23ae32834dcd016adfce8ad6.camel@sipsolutions.net> (raw)
In-Reply-To: <1540487511.66186.31.camel@acm.org> (sfid-20181025_191156_926182_FF0802A9)
On Thu, 2018-10-25 at 10:11 -0700, Bart Van Assche wrote:
> On Thu, 2018-10-25 at 19:02 +0200, Johannes Berg wrote:
> > On Thu, 2018-10-25 at 15:05 +0000, Bart Van Assche wrote:
> > > It can happen that the direct I/O queue creates and destroys an empty
> > > workqueue from inside a work function.
> >
> > So, thinking about this more, can you guarantee (somehow) that the
> > workqueue is empty at this point?
>
> In general, no. But for the direct I/O case this can be guaranteed. Please
> have a look at the code in sb_init_dio_done_wq() if you would not yet have
> done this.
Indeed, obviously.
> > Do you know how to reproduce this?
>
> The lockdep complaint in the patch description is easy to reproduce. The
> way I reproduce it is as follows:
>
> git clone https://github.com/osandov/blktests
> (cd blktests && ./check -q nvmeof-mp)
I'm a bit scared by this, looks like that needs a lot of prerequisites?
I'll try it in a VM when it's done compiling with nvme.
Looking at the splat in more detail, I think we have the following:
__generic_file_fsync() takes i_mutex_key#14
__generic_file_fsync() this is called from dio_aio_complete_work()
dio_aio_complete_work() generally runs on the dio/%s workqueue
Lockdep also sees:
ext4_file_write_iter() takes i_mutex_key#14
depending on circumstances, it can then call do_blockdev_direct_IO(),
which needs to ensure the WQ exists for this SB, so it calls
sb_init_dio_done_wq() to allocate the dio/%s workqueue.
Since lockdep neither knows that the instance of the workqueue that was
executing dio_aio_complete_work() must be different from the instance
that's freed, it complains because this creates a circle of
dependencies.
Still, basically what I tried to say before - rather than track whether
a workqueue was ever used, which is error-prone since in other cases in
the kernel the usage might depend on whatever conditions, I think we
should either teach lockdep that this is guaranteed to be a different
workqueue, or perhaps we should just have a "free empty workqueue"
function. I tend to prefer the former as it's more general, so I'd
propose this (combined) patch:
diff --git a/fs/direct-io.c b/fs/direct-io.c
index 093fb54cd316..9ef33d6cba56 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -629,9 +629,16 @@ int sb_init_dio_done_wq(struct super_block *sb)
* This has to be atomic as more DIOs can race to create the workqueue
*/
old = cmpxchg(&sb->s_dio_done_wq, NULL, wq);
- /* Someone created workqueue before us? Free ours... */
+ /*
+ * Someone created workqueue before us? Free ours...
+ * Note the _nested(), that pushes down to the (in this case actually
+ * pointless) flush_workqueue() happening inside, since this function
+ * might be called in contexts that hold the same locks that an fs may
+ * take while being called from dio_aio_complete_work() from another
+ * instance of the workqueue we allocate here.
+ */
if (old)
- destroy_workqueue(wq);
+ destroy_workqueue_nested(wq, SINGLE_DEPTH_NESTING);
return 0;
}
diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 60d673e15632..0b36a7df61d4 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -453,7 +453,12 @@ __alloc_workqueue_key(const char *fmt, unsigned int flags, int max_active,
#define create_singlethread_workqueue(name) \
alloc_ordered_workqueue("%s", __WQ_LEGACY | WQ_MEM_RECLAIM, name)
-extern void destroy_workqueue(struct workqueue_struct *wq);
+extern void destroy_workqueue_nested(struct workqueue_struct *wq, int subclass);
+
+static inline void destroy_workqueue(struct workqueue_struct *wq)
+{
+ destroy_workqueue_nested(wq, 0);
+}
struct workqueue_attrs *alloc_workqueue_attrs(gfp_t gfp_mask);
void free_workqueue_attrs(struct workqueue_attrs *attrs);
@@ -469,8 +474,18 @@ extern bool mod_delayed_work_on(int cpu, struct workqueue_struct *wq,
struct delayed_work *dwork, unsigned long delay);
extern bool queue_rcu_work(struct workqueue_struct *wq, struct rcu_work *rwork);
-extern void flush_workqueue(struct workqueue_struct *wq);
-extern void drain_workqueue(struct workqueue_struct *wq);
+extern void flush_workqueue_nested(struct workqueue_struct *wq, int subclass);
+extern void drain_workqueue_nested(struct workqueue_struct *wq, int subclass);
+
+static inline void flush_workqueue(struct workqueue_struct *wq)
+{
+ flush_workqueue_nested(wq, 0);
+}
+
+static inline void drain_workqueue(struct workqueue_struct *wq)
+{
+ drain_workqueue_nested(wq, 0);
+}
extern int schedule_on_each_cpu(work_func_t func);
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 0280deac392e..6b00e062af96 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2634,13 +2634,14 @@ static bool flush_workqueue_prep_pwqs(struct workqueue_struct *wq,
}
/**
- * flush_workqueue - ensure that any scheduled work has run to completion.
+ * flush_workqueue_nested - ensure that any scheduled work has run to completion.
* @wq: workqueue to flush
+ * @subclass: subclass for lockdep
*
* This function sleeps until all work items which were queued on entry
* have finished execution, but it is not livelocked by new incoming ones.
*/
-void flush_workqueue(struct workqueue_struct *wq)
+void flush_workqueue_nested(struct workqueue_struct *wq, int subclass)
{
struct wq_flusher this_flusher = {
.list = LIST_HEAD_INIT(this_flusher.list),
@@ -2652,7 +2653,7 @@ void flush_workqueue(struct workqueue_struct *wq)
if (WARN_ON(!wq_online))
return;
- lock_map_acquire(&wq->lockdep_map);
+ lock_acquire_exclusive(&wq->lockdep_map, subclass, 0, NULL, _THIS_IP_);
lock_map_release(&wq->lockdep_map);
mutex_lock(&wq->mutex);
@@ -2789,11 +2790,12 @@ void flush_workqueue(struct workqueue_struct *wq)
out_unlock:
mutex_unlock(&wq->mutex);
}
-EXPORT_SYMBOL(flush_workqueue);
+EXPORT_SYMBOL(flush_workqueue_nested);
/**
- * drain_workqueue - drain a workqueue
+ * drain_workqueue_nested - drain a workqueue
* @wq: workqueue to drain
+ * @subclass: lockdep subclass
*
* Wait until the workqueue becomes empty. While draining is in progress,
* only chain queueing is allowed. IOW, only currently pending or running
@@ -2802,7 +2804,7 @@ EXPORT_SYMBOL(flush_workqueue);
* by the depth of chaining and should be relatively short. Whine if it
* takes too long.
*/
-void drain_workqueue(struct workqueue_struct *wq)
+void drain_workqueue_nested(struct workqueue_struct *wq, int subclass)
{
unsigned int flush_cnt = 0;
struct pool_workqueue *pwq;
@@ -2817,7 +2819,7 @@ void drain_workqueue(struct workqueue_struct *wq)
wq->flags |= __WQ_DRAINING;
mutex_unlock(&wq->mutex);
reflush:
- flush_workqueue(wq);
+ flush_workqueue_nested(wq, subclass);
mutex_lock(&wq->mutex);
@@ -2844,7 +2846,7 @@ void drain_workqueue(struct workqueue_struct *wq)
wq->flags &= ~__WQ_DRAINING;
mutex_unlock(&wq->mutex);
}
-EXPORT_SYMBOL_GPL(drain_workqueue);
+EXPORT_SYMBOL_GPL(drain_workqueue_nested);
static bool start_flush_work(struct work_struct *work, struct wq_barrier *barr,
bool from_cancel)
@@ -4141,18 +4143,19 @@ struct workqueue_struct *__alloc_workqueue_key(const char *fmt,
EXPORT_SYMBOL_GPL(__alloc_workqueue_key);
/**
- * destroy_workqueue - safely terminate a workqueue
+ * destroy_workqueue_nested - safely terminate a workqueue
* @wq: target workqueue
+ * @subclass: lockdep subclass
*
* Safely destroy a workqueue. All work currently pending will be done first.
*/
-void destroy_workqueue(struct workqueue_struct *wq)
+void destroy_workqueue_nested(struct workqueue_struct *wq, int subclass)
{
struct pool_workqueue *pwq;
int node;
/* drain it before proceeding with destruction */
- drain_workqueue(wq);
+ drain_workqueue_nested(wq, subclass);
/* sanity checks */
mutex_lock(&wq->mutex);
@@ -4217,7 +4220,7 @@ void destroy_workqueue(struct workqueue_struct *wq)
put_pwq_unlocked(pwq);
}
}
-EXPORT_SYMBOL_GPL(destroy_workqueue);
+EXPORT_SYMBOL_GPL(destroy_workqueue_nested);
/**
* workqueue_set_max_active - adjust max_active of a workqueue
We could avoid the useless subclass argument in the non-lockdep case
with some macro trickery, but for now I haven't done that.
johannes
next prev parent reply other threads:[~2018-10-25 19:52 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-10-25 15:05 [PATCH 0/3] Suppress false positives triggered by workqueue lockdep annotations Bart Van Assche
2018-10-25 15:05 ` [PATCH 1/3] kernel/workqueue: Remove lockdep annotation from __flush_work() Bart Van Assche
2018-10-25 15:31 ` Johannes Berg
2018-10-25 15:57 ` Johannes Berg
2018-10-25 16:01 ` Bart Van Assche
2018-10-25 15:05 ` [PATCH 2/3] kernel/workqueue: Surround work execution with shared lock annotations Bart Van Assche
2018-10-25 16:53 ` Johannes Berg
2018-10-25 17:22 ` Bart Van Assche
2018-10-25 19:17 ` Johannes Berg
2018-10-25 15:05 ` [PATCH 3/3] kernel/workqueue: Suppress a false positive lockdep complaint Bart Van Assche
2018-10-25 15:34 ` Johannes Berg
2018-10-25 15:55 ` Bart Van Assche
2018-10-25 19:59 ` Johannes Berg
2018-10-25 20:21 ` Theodore Y. Ts'o
2018-10-25 20:26 ` Johannes Berg
2018-10-25 15:36 ` Tejun Heo
2018-10-25 15:37 ` Tejun Heo
2018-10-25 20:13 ` Johannes Berg
2018-10-25 15:40 ` Theodore Y. Ts'o
2018-10-25 17:02 ` Johannes Berg
2018-10-25 17:11 ` Bart Van Assche
2018-10-25 19:51 ` Johannes Berg [this message]
2018-10-25 20:39 ` Bart Van Assche
2018-10-25 20:47 ` Johannes Berg
2018-10-25 15:27 ` [PATCH 0/3] Suppress false positives triggered by workqueue lockdep annotations Johannes Berg
2018-10-25 15:47 ` Bart Van Assche
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=11db45f85e2f763c23ae32834dcd016adfce8ad6.camel@sipsolutions.net \
--to=johannes@sipsolutions.net \
--cc=bvanassche@acm.org \
--cc=hch@lst.de \
--cc=linux-kernel@vger.kernel.org \
--cc=sagi@grimberg.me \
--cc=tj@kernel.org \
--cc=tytso@mit.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).