linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/7] btrfs: workqueue fixes+cleanups
@ 2019-09-16 18:30 Omar Sandoval
  2019-09-16 18:30 ` [PATCH 1/7] btrfs: get rid of unnecessary memset() of work item Omar Sandoval
                   ` (7 more replies)
  0 siblings, 8 replies; 14+ messages in thread
From: Omar Sandoval @ 2019-09-16 18:30 UTC (permalink / raw)
  To: linux-btrfs; +Cc: kernel-team

From: Omar Sandoval <osandov@fb.com>

Hello,

This series fixes a few issues with Btrfs' use of workqueues.

The bulk of this series fixes multiple cases of the same bug: a work
item shouldn't be freed while it potentially depends on any other work
items. Currently, we've mostly been side-stepping this issue by having
different work functions for different work types (see commit
9e0af2376434 ("Btrfs: fix task hang under heavy compressed write").
However, there are cases where a work function can depend on itself,
usually through stacked filesystems (e.g., loop devices).

Patch 1 is a trivial cleanup.

Patches 2-5 fix all of the cases of this bug that I could find. Patch 2
is the ordered_free fix I previously sent [1], now with an updated title
and Josef's fix [2] folded in. Patch 3 fixes the deadlock reported with
my previous cleanup, and patches 4 and 5 fix similar issues.

With all of those fixed, patch 6 is my previous cleanup [3] to get rid
of the different work functions. Patch 7 is the same cleanup [4] as from
before.

Based on misc-next.

Thanks!

1: https://lore.kernel.org/linux-btrfs/0bea516a54b26e4e1c42e6fe47548cb48cc4172b.1565112813.git.osandov@fb.com/
2: https://lore.kernel.org/linux-btrfs/20190821151803.cuyq2yzpdnwgwrmb@MacBook-Pro-91.local/
3: https://lore.kernel.org/linux-btrfs/afaf532af471dd1a8a1f4af9de570127529323b0.1565717248.git.osandov@fb.com/
4: https://lore.kernel.org/linux-btrfs/d4fa1870ffce027ada265a67f4e00d397b683241.1565717248.git.osandov@fb.com/

Omar Sandoval (7):
  btrfs: get rid of unnecessary memset() of work item
  btrfs: don't prematurely free work in run_ordered_work()
  btrfs: don't prematurely free work in end_workqueue_fn()
  btrfs: don't prematurely free work in reada_start_machine_worker()
  btrfs: don't prematurely free work in scrub_missing_raid56_worker()
  btrfs: get rid of unique workqueue helper functions
  btrfs: get rid of pointless wtag variable in async-thread.c

 fs/btrfs/async-thread.c      | 107 +++++++++++++++++------------------
 fs/btrfs/async-thread.h      |  33 +----------
 fs/btrfs/block-group.c       |   3 +-
 fs/btrfs/delayed-inode.c     |   4 +-
 fs/btrfs/disk-io.c           |  36 ++++--------
 fs/btrfs/inode.c             |  36 ++++--------
 fs/btrfs/ordered-data.c      |   1 -
 fs/btrfs/qgroup.c            |   3 -
 fs/btrfs/raid56.c            |   5 +-
 fs/btrfs/reada.c             |  13 ++---
 fs/btrfs/scrub.c             |  17 +++---
 fs/btrfs/volumes.c           |   3 +-
 include/trace/events/btrfs.h |   6 +-
 13 files changed, 97 insertions(+), 170 deletions(-)

-- 
2.23.0


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/7] btrfs: get rid of unnecessary memset() of work item
  2019-09-16 18:30 [PATCH 0/7] btrfs: workqueue fixes+cleanups Omar Sandoval
@ 2019-09-16 18:30 ` Omar Sandoval
  2019-09-17 11:46   ` Johannes Thumshirn
  2019-09-16 18:30 ` [PATCH 2/7] btrfs: don't prematurely free work in run_ordered_work() Omar Sandoval
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 14+ messages in thread
From: Omar Sandoval @ 2019-09-16 18:30 UTC (permalink / raw)
  To: linux-btrfs; +Cc: kernel-team

From: Omar Sandoval <osandov@fb.com>

Commit fc97fab0ea59 ("btrfs: Replace fs_info->qgroup_rescan_worker
workqueue with btrfs_workqueue.") converted qgroup_rescan_work to be
initialized with btrfs_init_work(), but it left behind an unnecessary
memset(). Get rid of the memset().

Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/qgroup.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 8d3bd799ac7d..b2b8ceaff467 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -3272,8 +3272,6 @@ qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid,
 	spin_unlock(&fs_info->qgroup_lock);
 	mutex_unlock(&fs_info->qgroup_rescan_lock);
 
-	memset(&fs_info->qgroup_rescan_work, 0,
-	       sizeof(fs_info->qgroup_rescan_work));
 	btrfs_init_work(&fs_info->qgroup_rescan_work,
 			btrfs_qgroup_rescan_helper,
 			btrfs_qgroup_rescan_worker, NULL, NULL);
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/7] btrfs: don't prematurely free work in run_ordered_work()
  2019-09-16 18:30 [PATCH 0/7] btrfs: workqueue fixes+cleanups Omar Sandoval
  2019-09-16 18:30 ` [PATCH 1/7] btrfs: get rid of unnecessary memset() of work item Omar Sandoval
@ 2019-09-16 18:30 ` Omar Sandoval
  2019-09-17 12:04   ` Johannes Thumshirn
  2019-09-16 18:30 ` [PATCH 3/7] btrfs: don't prematurely free work in end_workqueue_fn() Omar Sandoval
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 14+ messages in thread
From: Omar Sandoval @ 2019-09-16 18:30 UTC (permalink / raw)
  To: linux-btrfs; +Cc: kernel-team, Tejun Heo

From: Omar Sandoval <osandov@fb.com>

We hit the following very strange deadlock on a system with Btrfs on a
loop device backed by another Btrfs filesystem:

1. The top (loop device) filesystem queues an async_cow work item from
   cow_file_range_async(). We'll call this work X.
2. Worker thread A starts work X (normal_work_helper()).
3. Worker thread A executes the ordered work for the top filesystem
   (run_ordered_work()).
4. Worker thread A finishes the ordered work for work X and frees X
   (work->ordered_free()).
5. Worker thread A executes another ordered work and gets blocked on I/O
   to the bottom filesystem (still in run_ordered_work()).
6. Meanwhile, the bottom filesystem allocates and queues an async_cow
   work item which happens to be the recently-freed X.
7. The workqueue code sees that X is already being executed by worker
   thread A, so it schedules X to be executed _after_ worker thread A
   finishes (see the find_worker_executing_work() call in
   process_one_work()).

Now, the top filesystem is waiting for I/O on the bottom filesystem, but
the bottom filesystem is waiting for the top filesystem to finish, so we
deadlock.

This happens because we are breaking the workqueue assumption that a
work item cannot be recycled while it still depends on other work. Fix
it by waiting to free the work item until we are done with all of the
related ordered work.

P.S.:

One might ask why the workqueue code doesn't try to detect a recycled
work item. It actually does try by checking whether the work item has
the same work function (find_worker_executing_work()), but in our case
the function is the same. This is the only key that the workqueue code
has available to compare, short of adding an additional, layer-violating
"custom key". Considering that we're the only ones that have ever hit
this, we should just play by the rules.

Unfortunately, we haven't been able to create a minimal reproducer other
than our full container setup using a compress-force=zstd filesystem on
top of another compress-force=zstd filesystem.

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/async-thread.c | 56 ++++++++++++++++++++++++++++++++---------
 1 file changed, 44 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c
index 2e9e13ffbd08..10a04b99798a 100644
--- a/fs/btrfs/async-thread.c
+++ b/fs/btrfs/async-thread.c
@@ -252,16 +252,17 @@ static inline void thresh_exec_hook(struct __btrfs_workqueue *wq)
 	}
 }
 
-static void run_ordered_work(struct __btrfs_workqueue *wq)
+static void run_ordered_work(struct __btrfs_workqueue *wq,
+			     struct btrfs_work *self)
 {
 	struct list_head *list = &wq->ordered_list;
 	struct btrfs_work *work;
 	spinlock_t *lock = &wq->list_lock;
 	unsigned long flags;
+	void *wtag;
+	bool free_self = false;
 
 	while (1) {
-		void *wtag;
-
 		spin_lock_irqsave(lock, flags);
 		if (list_empty(list))
 			break;
@@ -287,16 +288,47 @@ static void run_ordered_work(struct __btrfs_workqueue *wq)
 		list_del(&work->ordered_list);
 		spin_unlock_irqrestore(lock, flags);
 
-		/*
-		 * We don't want to call the ordered free functions with the
-		 * lock held though. Save the work as tag for the trace event,
-		 * because the callback could free the structure.
-		 */
-		wtag = work;
-		work->ordered_free(work);
-		trace_btrfs_all_work_done(wq->fs_info, wtag);
+		if (work == self) {
+			/*
+			 * This is the work item that the worker is currently
+			 * executing.
+			 *
+			 * The kernel workqueue code guarantees non-reentrancy
+			 * of work items. I.e., if a work item with the same
+			 * address and work function is queued twice, the second
+			 * execution is blocked until the first one finishes. A
+			 * work item may be freed and recycled with the same
+			 * work function; the workqueue code assumes that the
+			 * original work item cannot depend on the recycled work
+			 * item in that case (see find_worker_executing_work()).
+			 *
+			 * Note that the work of one Btrfs filesystem may depend
+			 * on the work of another Btrfs filesystem via, e.g., a
+			 * loop device. Therefore, we must not allow the current
+			 * work item to be recycled until we are really done,
+			 * otherwise we break the above assumption and can
+			 * deadlock.
+			 */
+			free_self = true;
+		} else {
+			/*
+			 * We don't want to call the ordered free functions with
+			 * the lock held though. Save the work as tag for the
+			 * trace event, because the callback could free the
+			 * structure.
+			 */
+			wtag = work;
+			work->ordered_free(work);
+			trace_btrfs_all_work_done(wq->fs_info, wtag);
+		}
 	}
 	spin_unlock_irqrestore(lock, flags);
+
+	if (free_self) {
+		wtag = self;
+		self->ordered_free(self);
+		trace_btrfs_all_work_done(wq->fs_info, wtag);
+	}
 }
 
 static void normal_work_helper(struct btrfs_work *work)
@@ -324,7 +356,7 @@ static void normal_work_helper(struct btrfs_work *work)
 	work->func(work);
 	if (need_order) {
 		set_bit(WORK_DONE_BIT, &work->flags);
-		run_ordered_work(wq);
+		run_ordered_work(wq, work);
 	}
 	if (!need_order)
 		trace_btrfs_all_work_done(wq->fs_info, wtag);
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 3/7] btrfs: don't prematurely free work in end_workqueue_fn()
  2019-09-16 18:30 [PATCH 0/7] btrfs: workqueue fixes+cleanups Omar Sandoval
  2019-09-16 18:30 ` [PATCH 1/7] btrfs: get rid of unnecessary memset() of work item Omar Sandoval
  2019-09-16 18:30 ` [PATCH 2/7] btrfs: don't prematurely free work in run_ordered_work() Omar Sandoval
@ 2019-09-16 18:30 ` Omar Sandoval
  2019-09-17 12:05   ` Johannes Thumshirn
  2019-09-16 18:30 ` [PATCH 4/7] btrfs: don't prematurely free work in reada_start_machine_worker() Omar Sandoval
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 14+ messages in thread
From: Omar Sandoval @ 2019-09-16 18:30 UTC (permalink / raw)
  To: linux-btrfs; +Cc: kernel-team

From: Omar Sandoval <osandov@fb.com>

Currently, end_workqueue_fn() frees the end_io_wq entry (which embeds
the work item) and then calls bio_endio(). This is another potential
instance of the bug in "btrfs: don't prematurely free work in
run_ordered_work()".

In particular, the endio call may depend on other work items. For
example, btrfs_end_dio_bio() can call btrfs_subio_endio_read() ->
__btrfs_correct_data_nocsum() -> dio_read_error() ->
submit_dio_repair_bio(), which submits a bio that is also completed
through a end_workqueue_fn() work item. However,
__btrfs_correct_data_nocsum() waits for the newly submitted bio to
complete, thus it depends on another work item.

This example currently usually works because we use different workqueue
helper functions for BTRFS_WQ_ENDIO_DATA and BTRFS_WQ_ENDIO_DIO_REPAIR.
However, it may deadlock with stacked filesystems and is fragile
overall. The proper fix is to free the work item at the very end of the
work function, so let's do that.

Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/disk-io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 044981cf6df9..6388f83b6ee5 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1657,8 +1657,8 @@ static void end_workqueue_fn(struct btrfs_work *work)
 	bio->bi_status = end_io_wq->status;
 	bio->bi_private = end_io_wq->private;
 	bio->bi_end_io = end_io_wq->end_io;
-	kmem_cache_free(btrfs_end_io_wq_cache, end_io_wq);
 	bio_endio(bio);
+	kmem_cache_free(btrfs_end_io_wq_cache, end_io_wq);
 }
 
 static int cleaner_kthread(void *arg)
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 4/7] btrfs: don't prematurely free work in reada_start_machine_worker()
  2019-09-16 18:30 [PATCH 0/7] btrfs: workqueue fixes+cleanups Omar Sandoval
                   ` (2 preceding siblings ...)
  2019-09-16 18:30 ` [PATCH 3/7] btrfs: don't prematurely free work in end_workqueue_fn() Omar Sandoval
@ 2019-09-16 18:30 ` Omar Sandoval
  2019-09-17 12:09   ` Johannes Thumshirn
  2019-09-16 18:30 ` [PATCH 5/7] btrfs: don't prematurely free work in scrub_missing_raid56_worker() Omar Sandoval
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 14+ messages in thread
From: Omar Sandoval @ 2019-09-16 18:30 UTC (permalink / raw)
  To: linux-btrfs; +Cc: kernel-team

From: Omar Sandoval <osandov@fb.com>

Currently, reada_start_machine_worker() frees the reada_machine_work and
then calls __reada_start_machine() to do readahead. This is another
potential instance of the bug in "btrfs: don't prematurely free work in
run_ordered_work()".

There _might_ already be a deadlock here: reada_start_machine_worker()
can depend on itself through stacked filesystems (__read_start_machine()
-> reada_start_machine_dev() -> reada_tree_block_flagged() ->
read_extent_buffer_pages() -> submit_one_bio() ->
btree_submit_bio_hook() -> btrfs_map_bio() -> submit_stripe_bio() ->
submit_bio() onto a loop device can trigger readahead on the lower
filesystem).

Either way, let's fix it by freeing the work at the end.

Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/reada.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/reada.c b/fs/btrfs/reada.c
index ee6f60547a8d..dd4f9c2b7107 100644
--- a/fs/btrfs/reada.c
+++ b/fs/btrfs/reada.c
@@ -752,21 +752,19 @@ static int reada_start_machine_dev(struct btrfs_device *dev)
 static void reada_start_machine_worker(struct btrfs_work *work)
 {
 	struct reada_machine_work *rmw;
-	struct btrfs_fs_info *fs_info;
 	int old_ioprio;
 
 	rmw = container_of(work, struct reada_machine_work, work);
-	fs_info = rmw->fs_info;
-
-	kfree(rmw);
 
 	old_ioprio = IOPRIO_PRIO_VALUE(task_nice_ioclass(current),
 				       task_nice_ioprio(current));
 	set_task_ioprio(current, BTRFS_IOPRIO_READA);
-	__reada_start_machine(fs_info);
+	__reada_start_machine(rmw->fs_info);
 	set_task_ioprio(current, old_ioprio);
 
-	atomic_dec(&fs_info->reada_works_cnt);
+	atomic_dec(&rmw->fs_info->reada_works_cnt);
+
+	kfree(rmw);
 }
 
 static void __reada_start_machine(struct btrfs_fs_info *fs_info)
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 5/7] btrfs: don't prematurely free work in scrub_missing_raid56_worker()
  2019-09-16 18:30 [PATCH 0/7] btrfs: workqueue fixes+cleanups Omar Sandoval
                   ` (3 preceding siblings ...)
  2019-09-16 18:30 ` [PATCH 4/7] btrfs: don't prematurely free work in reada_start_machine_worker() Omar Sandoval
@ 2019-09-16 18:30 ` Omar Sandoval
  2019-09-17 12:10   ` Johannes Thumshirn
  2019-09-16 18:30 ` [PATCH 6/7] btrfs: get rid of unique workqueue helper functions Omar Sandoval
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 14+ messages in thread
From: Omar Sandoval @ 2019-09-16 18:30 UTC (permalink / raw)
  To: linux-btrfs; +Cc: kernel-team

From: Omar Sandoval <osandov@fb.com>

Currently, scrub_missing_raid56_worker() puts and potentially frees
sblock (which embeds the work item) and then submits a bio through
scrub_wr_submit(). This is another potential instance of the bug in
"btrfs: don't prematurely free work in run_ordered_work()". Fix it by
dropping the reference after we submit the bio.

Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/scrub.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index f7d4e03f4c5d..a0770a6aee00 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -2149,14 +2149,13 @@ static void scrub_missing_raid56_worker(struct btrfs_work *work)
 		scrub_write_block_to_dev_replace(sblock);
 	}
 
-	scrub_block_put(sblock);
-
 	if (sctx->is_dev_replace && sctx->flush_all_writes) {
 		mutex_lock(&sctx->wr_lock);
 		scrub_wr_submit(sctx);
 		mutex_unlock(&sctx->wr_lock);
 	}
 
+	scrub_block_put(sblock);
 	scrub_pending_bio_dec(sctx);
 }
 
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 6/7] btrfs: get rid of unique workqueue helper functions
  2019-09-16 18:30 [PATCH 0/7] btrfs: workqueue fixes+cleanups Omar Sandoval
                   ` (4 preceding siblings ...)
  2019-09-16 18:30 ` [PATCH 5/7] btrfs: don't prematurely free work in scrub_missing_raid56_worker() Omar Sandoval
@ 2019-09-16 18:30 ` Omar Sandoval
  2019-09-16 18:30 ` [PATCH 7/7] btrfs: get rid of pointless wtag variable in async-thread.c Omar Sandoval
  2019-09-24 14:45 ` [PATCH 0/7] btrfs: workqueue fixes+cleanups David Sterba
  7 siblings, 0 replies; 14+ messages in thread
From: Omar Sandoval @ 2019-09-16 18:30 UTC (permalink / raw)
  To: linux-btrfs; +Cc: kernel-team, Nikolay Borisov, Filipe Manana

From: Omar Sandoval <osandov@fb.com>

Commit 9e0af2376434 ("Btrfs: fix task hang under heavy compressed
write") worked around the issue that a recycled work item could get a
false dependency on the original work item due to how the workqueue code
guarantees non-reentrancy. It did so by giving different work functions
to different types of work.

However, the fixes in the previous few patches are more complete, as
they prevent a work item from being recycled at all (except for a tiny
window that the kernel workqueue code handles for us). This obsoletes
the previous fix, so we don't need the unique helpers for correctness.
The only other reason to keep them would be so they show up in stack
traces, but they always seem to be optimized to a tail call, so they
don't show up anyways. So, let's just get rid of the extra indirection.

While we're here, rename normal_work_helper() to the more informative
btrfs_work_helper().

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/async-thread.c  | 58 +++++++++-------------------------------
 fs/btrfs/async-thread.h  | 33 ++---------------------
 fs/btrfs/block-group.c   |  3 +--
 fs/btrfs/delayed-inode.c |  4 +--
 fs/btrfs/disk-io.c       | 34 ++++++++---------------
 fs/btrfs/inode.c         | 36 ++++++++-----------------
 fs/btrfs/ordered-data.c  |  1 -
 fs/btrfs/qgroup.c        |  1 -
 fs/btrfs/raid56.c        |  5 ++--
 fs/btrfs/reada.c         |  3 +--
 fs/btrfs/scrub.c         | 14 +++++-----
 fs/btrfs/volumes.c       |  3 +--
 12 files changed, 50 insertions(+), 145 deletions(-)

diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c
index 10a04b99798a..3f3110975f88 100644
--- a/fs/btrfs/async-thread.c
+++ b/fs/btrfs/async-thread.c
@@ -53,16 +53,6 @@ struct btrfs_workqueue {
 	struct __btrfs_workqueue *high;
 };
 
-static void normal_work_helper(struct btrfs_work *work);
-
-#define BTRFS_WORK_HELPER(name)					\
-noinline_for_stack void btrfs_##name(struct work_struct *arg)		\
-{									\
-	struct btrfs_work *work = container_of(arg, struct btrfs_work,	\
-					       normal_work);		\
-	normal_work_helper(work);					\
-}
-
 struct btrfs_fs_info *
 btrfs_workqueue_owner(const struct __btrfs_workqueue *wq)
 {
@@ -89,29 +79,6 @@ bool btrfs_workqueue_normal_congested(const struct btrfs_workqueue *wq)
 	return atomic_read(&wq->normal->pending) > wq->normal->thresh * 2;
 }
 
-BTRFS_WORK_HELPER(worker_helper);
-BTRFS_WORK_HELPER(delalloc_helper);
-BTRFS_WORK_HELPER(flush_delalloc_helper);
-BTRFS_WORK_HELPER(cache_helper);
-BTRFS_WORK_HELPER(submit_helper);
-BTRFS_WORK_HELPER(fixup_helper);
-BTRFS_WORK_HELPER(endio_helper);
-BTRFS_WORK_HELPER(endio_meta_helper);
-BTRFS_WORK_HELPER(endio_meta_write_helper);
-BTRFS_WORK_HELPER(endio_raid56_helper);
-BTRFS_WORK_HELPER(endio_repair_helper);
-BTRFS_WORK_HELPER(rmw_helper);
-BTRFS_WORK_HELPER(endio_write_helper);
-BTRFS_WORK_HELPER(freespace_write_helper);
-BTRFS_WORK_HELPER(delayed_meta_helper);
-BTRFS_WORK_HELPER(readahead_helper);
-BTRFS_WORK_HELPER(qgroup_rescan_helper);
-BTRFS_WORK_HELPER(extent_refs_helper);
-BTRFS_WORK_HELPER(scrub_helper);
-BTRFS_WORK_HELPER(scrubwrc_helper);
-BTRFS_WORK_HELPER(scrubnc_helper);
-BTRFS_WORK_HELPER(scrubparity_helper);
-
 static struct __btrfs_workqueue *
 __btrfs_alloc_workqueue(struct btrfs_fs_info *fs_info, const char *name,
 			unsigned int flags, int limit_active, int thresh)
@@ -302,12 +269,13 @@ static void run_ordered_work(struct __btrfs_workqueue *wq,
 			 * original work item cannot depend on the recycled work
 			 * item in that case (see find_worker_executing_work()).
 			 *
-			 * Note that the work of one Btrfs filesystem may depend
-			 * on the work of another Btrfs filesystem via, e.g., a
-			 * loop device. Therefore, we must not allow the current
-			 * work item to be recycled until we are really done,
-			 * otherwise we break the above assumption and can
-			 * deadlock.
+			 * Note that different types of Btrfs work can depend on
+			 * each other, and one type of work on one Btrfs
+			 * filesystem may even depend on the same type of work
+			 * on another Btrfs filesystem via, e.g., a loop device.
+			 * Therefore, we must not allow the current work item to
+			 * be recycled until we are really done, otherwise we
+			 * break the above assumption and can deadlock.
 			 */
 			free_self = true;
 		} else {
@@ -331,8 +299,10 @@ static void run_ordered_work(struct __btrfs_workqueue *wq,
 	}
 }
 
-static void normal_work_helper(struct btrfs_work *work)
+static void btrfs_work_helper(struct work_struct *normal_work)
 {
+	struct btrfs_work *work = container_of(normal_work, struct btrfs_work,
+					       normal_work);
 	struct __btrfs_workqueue *wq;
 	void *wtag;
 	int need_order = 0;
@@ -362,15 +332,13 @@ static void normal_work_helper(struct btrfs_work *work)
 		trace_btrfs_all_work_done(wq->fs_info, wtag);
 }
 
-void btrfs_init_work(struct btrfs_work *work, btrfs_work_func_t uniq_func,
-		     btrfs_func_t func,
-		     btrfs_func_t ordered_func,
-		     btrfs_func_t ordered_free)
+void btrfs_init_work(struct btrfs_work *work, btrfs_func_t func,
+		     btrfs_func_t ordered_func, btrfs_func_t ordered_free)
 {
 	work->func = func;
 	work->ordered_func = ordered_func;
 	work->ordered_free = ordered_free;
-	INIT_WORK(&work->normal_work, uniq_func);
+	INIT_WORK(&work->normal_work, btrfs_work_helper);
 	INIT_LIST_HEAD(&work->ordered_list);
 	work->flags = 0;
 }
diff --git a/fs/btrfs/async-thread.h b/fs/btrfs/async-thread.h
index 7861c9feba5f..c5bf2b117c05 100644
--- a/fs/btrfs/async-thread.h
+++ b/fs/btrfs/async-thread.h
@@ -29,42 +29,13 @@ struct btrfs_work {
 	unsigned long flags;
 };
 
-#define BTRFS_WORK_HELPER_PROTO(name)					\
-void btrfs_##name(struct work_struct *arg)
-
-BTRFS_WORK_HELPER_PROTO(worker_helper);
-BTRFS_WORK_HELPER_PROTO(delalloc_helper);
-BTRFS_WORK_HELPER_PROTO(flush_delalloc_helper);
-BTRFS_WORK_HELPER_PROTO(cache_helper);
-BTRFS_WORK_HELPER_PROTO(submit_helper);
-BTRFS_WORK_HELPER_PROTO(fixup_helper);
-BTRFS_WORK_HELPER_PROTO(endio_helper);
-BTRFS_WORK_HELPER_PROTO(endio_meta_helper);
-BTRFS_WORK_HELPER_PROTO(endio_meta_write_helper);
-BTRFS_WORK_HELPER_PROTO(endio_raid56_helper);
-BTRFS_WORK_HELPER_PROTO(endio_repair_helper);
-BTRFS_WORK_HELPER_PROTO(rmw_helper);
-BTRFS_WORK_HELPER_PROTO(endio_write_helper);
-BTRFS_WORK_HELPER_PROTO(freespace_write_helper);
-BTRFS_WORK_HELPER_PROTO(delayed_meta_helper);
-BTRFS_WORK_HELPER_PROTO(readahead_helper);
-BTRFS_WORK_HELPER_PROTO(qgroup_rescan_helper);
-BTRFS_WORK_HELPER_PROTO(extent_refs_helper);
-BTRFS_WORK_HELPER_PROTO(scrub_helper);
-BTRFS_WORK_HELPER_PROTO(scrubwrc_helper);
-BTRFS_WORK_HELPER_PROTO(scrubnc_helper);
-BTRFS_WORK_HELPER_PROTO(scrubparity_helper);
-
-
 struct btrfs_workqueue *btrfs_alloc_workqueue(struct btrfs_fs_info *fs_info,
 					      const char *name,
 					      unsigned int flags,
 					      int limit_active,
 					      int thresh);
-void btrfs_init_work(struct btrfs_work *work, btrfs_work_func_t helper,
-		     btrfs_func_t func,
-		     btrfs_func_t ordered_func,
-		     btrfs_func_t ordered_free);
+void btrfs_init_work(struct btrfs_work *work, btrfs_func_t func,
+		     btrfs_func_t ordered_func, btrfs_func_t ordered_free);
 void btrfs_queue_work(struct btrfs_workqueue *wq,
 		      struct btrfs_work *work);
 void btrfs_destroy_workqueue(struct btrfs_workqueue *wq);
diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index bf7e3f23bba7..e34c624db359 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -695,8 +695,7 @@ int btrfs_cache_block_group(struct btrfs_block_group_cache *cache,
 	caching_ctl->block_group = cache;
 	caching_ctl->progress = cache->key.objectid;
 	refcount_set(&caching_ctl->count, 1);
-	btrfs_init_work(&caching_ctl->work, btrfs_cache_helper,
-			caching_thread, NULL, NULL);
+	btrfs_init_work(&caching_ctl->work, caching_thread, NULL, NULL);
 
 	spin_lock(&cache->lock);
 	/*
diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index 1f7f39b10bd0..49ec3402886a 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -1367,8 +1367,8 @@ static int btrfs_wq_run_delayed_node(struct btrfs_delayed_root *delayed_root,
 		return -ENOMEM;
 
 	async_work->delayed_root = delayed_root;
-	btrfs_init_work(&async_work->work, btrfs_delayed_meta_helper,
-			btrfs_async_run_delayed_root, NULL, NULL);
+	btrfs_init_work(&async_work->work, btrfs_async_run_delayed_root, NULL,
+			NULL);
 	async_work->nr = nr;
 
 	btrfs_queue_work(fs_info->delayed_workers, &async_work->work);
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 6388f83b6ee5..e335fa4c4d1d 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -706,43 +706,31 @@ static void end_workqueue_bio(struct bio *bio)
 	struct btrfs_end_io_wq *end_io_wq = bio->bi_private;
 	struct btrfs_fs_info *fs_info;
 	struct btrfs_workqueue *wq;
-	btrfs_work_func_t func;
 
 	fs_info = end_io_wq->info;
 	end_io_wq->status = bio->bi_status;
 
 	if (bio_op(bio) == REQ_OP_WRITE) {
-		if (end_io_wq->metadata == BTRFS_WQ_ENDIO_METADATA) {
+		if (end_io_wq->metadata == BTRFS_WQ_ENDIO_METADATA)
 			wq = fs_info->endio_meta_write_workers;
-			func = btrfs_endio_meta_write_helper;
-		} else if (end_io_wq->metadata == BTRFS_WQ_ENDIO_FREE_SPACE) {
+		else if (end_io_wq->metadata == BTRFS_WQ_ENDIO_FREE_SPACE)
 			wq = fs_info->endio_freespace_worker;
-			func = btrfs_freespace_write_helper;
-		} else if (end_io_wq->metadata == BTRFS_WQ_ENDIO_RAID56) {
+		else if (end_io_wq->metadata == BTRFS_WQ_ENDIO_RAID56)
 			wq = fs_info->endio_raid56_workers;
-			func = btrfs_endio_raid56_helper;
-		} else {
+		else
 			wq = fs_info->endio_write_workers;
-			func = btrfs_endio_write_helper;
-		}
 	} else {
-		if (unlikely(end_io_wq->metadata ==
-			     BTRFS_WQ_ENDIO_DIO_REPAIR)) {
+		if (unlikely(end_io_wq->metadata == BTRFS_WQ_ENDIO_DIO_REPAIR))
 			wq = fs_info->endio_repair_workers;
-			func = btrfs_endio_repair_helper;
-		} else if (end_io_wq->metadata == BTRFS_WQ_ENDIO_RAID56) {
+		else if (end_io_wq->metadata == BTRFS_WQ_ENDIO_RAID56)
 			wq = fs_info->endio_raid56_workers;
-			func = btrfs_endio_raid56_helper;
-		} else if (end_io_wq->metadata) {
+		else if (end_io_wq->metadata)
 			wq = fs_info->endio_meta_workers;
-			func = btrfs_endio_meta_helper;
-		} else {
+		else
 			wq = fs_info->endio_workers;
-			func = btrfs_endio_helper;
-		}
 	}
 
-	btrfs_init_work(&end_io_wq->work, func, end_workqueue_fn, NULL, NULL);
+	btrfs_init_work(&end_io_wq->work, end_workqueue_fn, NULL, NULL);
 	btrfs_queue_work(wq, &end_io_wq->work);
 }
 
@@ -835,8 +823,8 @@ blk_status_t btrfs_wq_submit_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
 	async->mirror_num = mirror_num;
 	async->submit_bio_start = submit_bio_start;
 
-	btrfs_init_work(&async->work, btrfs_worker_helper, run_one_async_start,
-			run_one_async_done, run_one_async_free);
+	btrfs_init_work(&async->work, run_one_async_start, run_one_async_done,
+			run_one_async_free);
 
 	async->bio_offset = bio_offset;
 
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index a0546401bc0a..2eb1d7249f83 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1255,10 +1255,8 @@ static int cow_file_range_async(struct inode *inode, struct page *locked_page,
 		async_chunk[i].write_flags = write_flags;
 		INIT_LIST_HEAD(&async_chunk[i].extents);
 
-		btrfs_init_work(&async_chunk[i].work,
-				btrfs_delalloc_helper,
-				async_cow_start, async_cow_submit,
-				async_cow_free);
+		btrfs_init_work(&async_chunk[i].work, async_cow_start,
+				async_cow_submit, async_cow_free);
 
 		nr_pages = DIV_ROUND_UP(cur_end - start, PAGE_SIZE);
 		atomic_add(nr_pages, &fs_info->async_delalloc_pages);
@@ -2247,8 +2245,7 @@ int btrfs_writepage_cow_fixup(struct page *page, u64 start, u64 end)
 
 	SetPageChecked(page);
 	get_page(page);
-	btrfs_init_work(&fixup->work, btrfs_fixup_helper,
-			btrfs_writepage_fixup_worker, NULL, NULL);
+	btrfs_init_work(&fixup->work, btrfs_writepage_fixup_worker, NULL, NULL);
 	fixup->page = page;
 	btrfs_queue_work(fs_info->fixup_workers, &fixup->work);
 	return -EBUSY;
@@ -3241,7 +3238,6 @@ void btrfs_writepage_endio_finish_ordered(struct page *page, u64 start,
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct btrfs_ordered_extent *ordered_extent = NULL;
 	struct btrfs_workqueue *wq;
-	btrfs_work_func_t func;
 
 	trace_btrfs_writepage_end_io_hook(page, start, end, uptodate);
 
@@ -3250,16 +3246,12 @@ void btrfs_writepage_endio_finish_ordered(struct page *page, u64 start,
 					    end - start + 1, uptodate))
 		return;
 
-	if (btrfs_is_free_space_inode(BTRFS_I(inode))) {
+	if (btrfs_is_free_space_inode(BTRFS_I(inode)))
 		wq = fs_info->endio_freespace_worker;
-		func = btrfs_freespace_write_helper;
-	} else {
+	else
 		wq = fs_info->endio_write_workers;
-		func = btrfs_endio_write_helper;
-	}
 
-	btrfs_init_work(&ordered_extent->work, func, finish_ordered_fn, NULL,
-			NULL);
+	btrfs_init_work(&ordered_extent->work, finish_ordered_fn, NULL, NULL);
 	btrfs_queue_work(wq, &ordered_extent->work);
 }
 
@@ -8195,18 +8187,14 @@ static void __endio_write_update_ordered(struct inode *inode,
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct btrfs_ordered_extent *ordered = NULL;
 	struct btrfs_workqueue *wq;
-	btrfs_work_func_t func;
 	u64 ordered_offset = offset;
 	u64 ordered_bytes = bytes;
 	u64 last_offset;
 
-	if (btrfs_is_free_space_inode(BTRFS_I(inode))) {
+	if (btrfs_is_free_space_inode(BTRFS_I(inode)))
 		wq = fs_info->endio_freespace_worker;
-		func = btrfs_freespace_write_helper;
-	} else {
+	else
 		wq = fs_info->endio_write_workers;
-		func = btrfs_endio_write_helper;
-	}
 
 	while (ordered_offset < offset + bytes) {
 		last_offset = ordered_offset;
@@ -8214,9 +8202,8 @@ static void __endio_write_update_ordered(struct inode *inode,
 							   &ordered_offset,
 							   ordered_bytes,
 							   uptodate)) {
-			btrfs_init_work(&ordered->work, func,
-					finish_ordered_fn,
-					NULL, NULL);
+			btrfs_init_work(&ordered->work, finish_ordered_fn, NULL,
+					NULL);
 			btrfs_queue_work(wq, &ordered->work);
 		}
 		/*
@@ -10085,8 +10072,7 @@ static struct btrfs_delalloc_work *btrfs_alloc_delalloc_work(struct inode *inode
 	init_completion(&work->completion);
 	INIT_LIST_HEAD(&work->list);
 	work->inode = inode;
-	btrfs_init_work(&work->work, btrfs_flush_delalloc_helper,
-			btrfs_run_delalloc_work, NULL, NULL);
+	btrfs_init_work(&work->work, btrfs_run_delalloc_work, NULL, NULL);
 
 	return work;
 }
diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index 24b6c72b9a59..6240a5a1f2c0 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -547,7 +547,6 @@ u64 btrfs_wait_ordered_extents(struct btrfs_root *root, u64 nr,
 		spin_unlock(&root->ordered_extent_lock);
 
 		btrfs_init_work(&ordered->flush_work,
-				btrfs_flush_delalloc_helper,
 				btrfs_run_ordered_extent_work, NULL, NULL);
 		list_add_tail(&ordered->work_list, &works);
 		btrfs_queue_work(fs_info->flush_workers, &ordered->flush_work);
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index b2b8ceaff467..9c3fc31581be 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -3273,7 +3273,6 @@ qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid,
 	mutex_unlock(&fs_info->qgroup_rescan_lock);
 
 	btrfs_init_work(&fs_info->qgroup_rescan_work,
-			btrfs_qgroup_rescan_helper,
 			btrfs_qgroup_rescan_worker, NULL, NULL);
 	return 0;
 }
diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 57a2ac721985..8f47a85944eb 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -190,7 +190,7 @@ static void scrub_parity_work(struct btrfs_work *work);
 
 static void start_async_work(struct btrfs_raid_bio *rbio, btrfs_func_t work_func)
 {
-	btrfs_init_work(&rbio->work, btrfs_rmw_helper, work_func, NULL, NULL);
+	btrfs_init_work(&rbio->work, work_func, NULL, NULL);
 	btrfs_queue_work(rbio->fs_info->rmw_workers, &rbio->work);
 }
 
@@ -1743,8 +1743,7 @@ static void btrfs_raid_unplug(struct blk_plug_cb *cb, bool from_schedule)
 	plug = container_of(cb, struct btrfs_plug_cb, cb);
 
 	if (from_schedule) {
-		btrfs_init_work(&plug->work, btrfs_rmw_helper,
-				unplug_work, NULL, NULL);
+		btrfs_init_work(&plug->work, unplug_work, NULL, NULL);
 		btrfs_queue_work(plug->info->rmw_workers,
 				 &plug->work);
 		return;
diff --git a/fs/btrfs/reada.c b/fs/btrfs/reada.c
index dd4f9c2b7107..1feaeadc8cf5 100644
--- a/fs/btrfs/reada.c
+++ b/fs/btrfs/reada.c
@@ -819,8 +819,7 @@ static void reada_start_machine(struct btrfs_fs_info *fs_info)
 		/* FIXME we cannot handle this properly right now */
 		BUG();
 	}
-	btrfs_init_work(&rmw->work, btrfs_readahead_helper,
-			reada_start_machine_worker, NULL, NULL);
+	btrfs_init_work(&rmw->work, reada_start_machine_worker, NULL, NULL);
 	rmw->fs_info = fs_info;
 
 	btrfs_queue_work(fs_info->readahead_workers, &rmw->work);
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index a0770a6aee00..a7b043fd7a57 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -598,8 +598,8 @@ static noinline_for_stack struct scrub_ctx *scrub_setup_ctx(
 		sbio->index = i;
 		sbio->sctx = sctx;
 		sbio->page_count = 0;
-		btrfs_init_work(&sbio->work, btrfs_scrub_helper,
-				scrub_bio_end_io_worker, NULL, NULL);
+		btrfs_init_work(&sbio->work, scrub_bio_end_io_worker, NULL,
+				NULL);
 
 		if (i != SCRUB_BIOS_PER_SCTX - 1)
 			sctx->bios[i]->next_free = i + 1;
@@ -1720,8 +1720,7 @@ static void scrub_wr_bio_end_io(struct bio *bio)
 	sbio->status = bio->bi_status;
 	sbio->bio = bio;
 
-	btrfs_init_work(&sbio->work, btrfs_scrubwrc_helper,
-			 scrub_wr_bio_end_io_worker, NULL, NULL);
+	btrfs_init_work(&sbio->work, scrub_wr_bio_end_io_worker, NULL, NULL);
 	btrfs_queue_work(fs_info->scrub_wr_completion_workers, &sbio->work);
 }
 
@@ -2203,8 +2202,7 @@ static void scrub_missing_raid56_pages(struct scrub_block *sblock)
 		raid56_add_scrub_pages(rbio, spage->page, spage->logical);
 	}
 
-	btrfs_init_work(&sblock->work, btrfs_scrub_helper,
-			scrub_missing_raid56_worker, NULL, NULL);
+	btrfs_init_work(&sblock->work, scrub_missing_raid56_worker, NULL, NULL);
 	scrub_block_get(sblock);
 	scrub_pending_bio_inc(sctx);
 	raid56_submit_missing_rbio(rbio);
@@ -2742,8 +2740,8 @@ static void scrub_parity_bio_endio(struct bio *bio)
 
 	bio_put(bio);
 
-	btrfs_init_work(&sparity->work, btrfs_scrubparity_helper,
-			scrub_parity_bio_endio_worker, NULL, NULL);
+	btrfs_init_work(&sparity->work, scrub_parity_bio_endio_worker, NULL,
+			NULL);
 	btrfs_queue_work(fs_info->scrub_parity_workers, &sparity->work);
 }
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index a324480bc88b..5d35f3f34bd2 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6665,8 +6665,7 @@ struct btrfs_device *btrfs_alloc_device(struct btrfs_fs_info *fs_info,
 	else
 		generate_random_uuid(dev->uuid);
 
-	btrfs_init_work(&dev->work, btrfs_submit_helper,
-			pending_bios_fn, NULL, NULL);
+	btrfs_init_work(&dev->work, pending_bios_fn, NULL, NULL);
 
 	return dev;
 }
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 7/7] btrfs: get rid of pointless wtag variable in async-thread.c
  2019-09-16 18:30 [PATCH 0/7] btrfs: workqueue fixes+cleanups Omar Sandoval
                   ` (5 preceding siblings ...)
  2019-09-16 18:30 ` [PATCH 6/7] btrfs: get rid of unique workqueue helper functions Omar Sandoval
@ 2019-09-16 18:30 ` Omar Sandoval
  2019-09-24 14:45 ` [PATCH 0/7] btrfs: workqueue fixes+cleanups David Sterba
  7 siblings, 0 replies; 14+ messages in thread
From: Omar Sandoval @ 2019-09-16 18:30 UTC (permalink / raw)
  To: linux-btrfs; +Cc: kernel-team, Nikolay Borisov, Filipe Manana

From: Omar Sandoval <osandov@fb.com>

Commit ac0c7cf8be00 ("btrfs: fix crash when tracepoint arguments are
freed by wq callbacks") added a void pointer, wtag, which is passed into
trace_btrfs_all_work_done() instead of the freed work item. This is
silly for a few reasons:

1. The freed work item still has the same address.
2. work is still in scope after it's freed, so assigning wtag doesn't
   stop anyone from using it.
3. The tracepoint has always taken a void * argument, so assigning wtag
   doesn't actually make things any more type-safe. (Note that the
   original bug in commit bc074524e123 ("btrfs: prefix fsid to all trace
   events") was that the void * was implicitly casted when it was passed
   to btrfs_work_owner() in the trace point itself).

Instead, let's add some clearer warnings as comments.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 fs/btrfs/async-thread.c      | 21 ++++++++-------------
 include/trace/events/btrfs.h |  6 +++---
 2 files changed, 11 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c
index 3f3110975f88..b97ae1b03417 100644
--- a/fs/btrfs/async-thread.c
+++ b/fs/btrfs/async-thread.c
@@ -226,7 +226,6 @@ static void run_ordered_work(struct __btrfs_workqueue *wq,
 	struct btrfs_work *work;
 	spinlock_t *lock = &wq->list_lock;
 	unsigned long flags;
-	void *wtag;
 	bool free_self = false;
 
 	while (1) {
@@ -281,21 +280,19 @@ static void run_ordered_work(struct __btrfs_workqueue *wq,
 		} else {
 			/*
 			 * We don't want to call the ordered free functions with
-			 * the lock held though. Save the work as tag for the
-			 * trace event, because the callback could free the
-			 * structure.
+			 * the lock held.
 			 */
-			wtag = work;
 			work->ordered_free(work);
-			trace_btrfs_all_work_done(wq->fs_info, wtag);
+			/* NB: work must not be dereferenced past this point. */
+			trace_btrfs_all_work_done(wq->fs_info, work);
 		}
 	}
 	spin_unlock_irqrestore(lock, flags);
 
 	if (free_self) {
-		wtag = self;
 		self->ordered_free(self);
-		trace_btrfs_all_work_done(wq->fs_info, wtag);
+		/* NB: self must not be dereferenced past this point. */
+		trace_btrfs_all_work_done(wq->fs_info, self);
 	}
 }
 
@@ -304,7 +301,6 @@ static void btrfs_work_helper(struct work_struct *normal_work)
 	struct btrfs_work *work = container_of(normal_work, struct btrfs_work,
 					       normal_work);
 	struct __btrfs_workqueue *wq;
-	void *wtag;
 	int need_order = 0;
 
 	/*
@@ -318,8 +314,6 @@ static void btrfs_work_helper(struct work_struct *normal_work)
 	if (work->ordered_func)
 		need_order = 1;
 	wq = work->wq;
-	/* Safe for tracepoints in case work gets freed by the callback */
-	wtag = work;
 
 	trace_btrfs_work_sched(work);
 	thresh_exec_hook(wq);
@@ -327,9 +321,10 @@ static void btrfs_work_helper(struct work_struct *normal_work)
 	if (need_order) {
 		set_bit(WORK_DONE_BIT, &work->flags);
 		run_ordered_work(wq, work);
+	} else {
+		/* NB: work must not be dereferenced past this point. */
+		trace_btrfs_all_work_done(wq->fs_info, work);
 	}
-	if (!need_order)
-		trace_btrfs_all_work_done(wq->fs_info, wtag);
 }
 
 void btrfs_init_work(struct btrfs_work *work, btrfs_func_t func,
diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index 5df604de4f11..c60cf281cb86 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -1389,9 +1389,9 @@ DECLARE_EVENT_CLASS(btrfs__work,
 );
 
 /*
- * For situiations when the work is freed, we pass fs_info and a tag that that
- * matches address of the work structure so it can be paired with the
- * scheduling event.
+ * For situations when the work is freed, we pass fs_info and a tag that matches
+ * the address of the work structure so it can be paired with the scheduling
+ * event. DO NOT add anything here that dereferences wtag.
  */
 DECLARE_EVENT_CLASS(btrfs__work__done,
 
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/7] btrfs: get rid of unnecessary memset() of work item
  2019-09-16 18:30 ` [PATCH 1/7] btrfs: get rid of unnecessary memset() of work item Omar Sandoval
@ 2019-09-17 11:46   ` Johannes Thumshirn
  0 siblings, 0 replies; 14+ messages in thread
From: Johannes Thumshirn @ 2019-09-17 11:46 UTC (permalink / raw)
  To: Omar Sandoval; +Cc: linux-btrfs, kernel-team

Looks good,
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
-- 
Johannes Thumshirn                            SUSE Labs Filesystems
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/7] btrfs: don't prematurely free work in run_ordered_work()
  2019-09-16 18:30 ` [PATCH 2/7] btrfs: don't prematurely free work in run_ordered_work() Omar Sandoval
@ 2019-09-17 12:04   ` Johannes Thumshirn
  0 siblings, 0 replies; 14+ messages in thread
From: Johannes Thumshirn @ 2019-09-17 12:04 UTC (permalink / raw)
  To: Omar Sandoval; +Cc: linux-btrfs, kernel-team, Tejun Heo

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
-- 
Johannes Thumshirn                            SUSE Labs Filesystems
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/7] btrfs: don't prematurely free work in end_workqueue_fn()
  2019-09-16 18:30 ` [PATCH 3/7] btrfs: don't prematurely free work in end_workqueue_fn() Omar Sandoval
@ 2019-09-17 12:05   ` Johannes Thumshirn
  0 siblings, 0 replies; 14+ messages in thread
From: Johannes Thumshirn @ 2019-09-17 12:05 UTC (permalink / raw)
  To: Omar Sandoval; +Cc: linux-btrfs, kernel-team

Looks good,
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
-- 
Johannes Thumshirn                            SUSE Labs Filesystems
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/7] btrfs: don't prematurely free work in reada_start_machine_worker()
  2019-09-16 18:30 ` [PATCH 4/7] btrfs: don't prematurely free work in reada_start_machine_worker() Omar Sandoval
@ 2019-09-17 12:09   ` Johannes Thumshirn
  0 siblings, 0 replies; 14+ messages in thread
From: Johannes Thumshirn @ 2019-09-17 12:09 UTC (permalink / raw)
  To: Omar Sandoval; +Cc: linux-btrfs, kernel-team

On Mon, Sep 16, 2019 at 11:30:55AM -0700, Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
> 
> Currently, reada_start_machine_worker() frees the reada_machine_work and
> then calls __reada_start_machine() to do readahead. This is another
> potential instance of the bug in "btrfs: don't prematurely free work in
> run_ordered_work()".
> 
> There _might_ already be a deadlock here: reada_start_machine_worker()
> can depend on itself through stacked filesystems (__read_start_machine()
> -> reada_start_machine_dev() -> reada_tree_block_flagged() ->
> read_extent_buffer_pages() -> submit_one_bio() ->
> btree_submit_bio_hook() -> btrfs_map_bio() -> submit_stripe_bio() ->
> submit_bio() onto a loop device can trigger readahead on the lower
> filesystem).
> 
> Either way, let's fix it by freeing the work at the end.


Also remove the local fs_info variable as it got obsolete.

Anyways,
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
-- 
Johannes Thumshirn                            SUSE Labs Filesystems
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 5/7] btrfs: don't prematurely free work in scrub_missing_raid56_worker()
  2019-09-16 18:30 ` [PATCH 5/7] btrfs: don't prematurely free work in scrub_missing_raid56_worker() Omar Sandoval
@ 2019-09-17 12:10   ` Johannes Thumshirn
  0 siblings, 0 replies; 14+ messages in thread
From: Johannes Thumshirn @ 2019-09-17 12:10 UTC (permalink / raw)
  To: Omar Sandoval; +Cc: linux-btrfs, kernel-team

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
-- 
Johannes Thumshirn                            SUSE Labs Filesystems
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/7] btrfs: workqueue fixes+cleanups
  2019-09-16 18:30 [PATCH 0/7] btrfs: workqueue fixes+cleanups Omar Sandoval
                   ` (6 preceding siblings ...)
  2019-09-16 18:30 ` [PATCH 7/7] btrfs: get rid of pointless wtag variable in async-thread.c Omar Sandoval
@ 2019-09-24 14:45 ` David Sterba
  7 siblings, 0 replies; 14+ messages in thread
From: David Sterba @ 2019-09-24 14:45 UTC (permalink / raw)
  To: Omar Sandoval; +Cc: linux-btrfs, kernel-team

On Mon, Sep 16, 2019 at 11:30:51AM -0700, Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
> 
> Hello,
> 
> This series fixes a few issues with Btrfs' use of workqueues.
> 
> The bulk of this series fixes multiple cases of the same bug: a work
> item shouldn't be freed while it potentially depends on any other work
> items. Currently, we've mostly been side-stepping this issue by having
> different work functions for different work types (see commit
> 9e0af2376434 ("Btrfs: fix task hang under heavy compressed write").
> However, there are cases where a work function can depend on itself,
> usually through stacked filesystems (e.g., loop devices).
> 
> Patch 1 is a trivial cleanup.
> 
> Patches 2-5 fix all of the cases of this bug that I could find. Patch 2
> is the ordered_free fix I previously sent [1], now with an updated title
> and Josef's fix [2] folded in. Patch 3 fixes the deadlock reported with
> my previous cleanup, and patches 4 and 5 fix similar issues.
> 
> With all of those fixed, patch 6 is my previous cleanup [3] to get rid
> of the different work functions. Patch 7 is the same cleanup [4] as from
> before.
> 
> Based on misc-next.
> 
> Thanks!
> 
> 1: https://lore.kernel.org/linux-btrfs/0bea516a54b26e4e1c42e6fe47548cb48cc4172b.1565112813.git.osandov@fb.com/
> 2: https://lore.kernel.org/linux-btrfs/20190821151803.cuyq2yzpdnwgwrmb@MacBook-Pro-91.local/
> 3: https://lore.kernel.org/linux-btrfs/afaf532af471dd1a8a1f4af9de570127529323b0.1565717248.git.osandov@fb.com/
> 4: https://lore.kernel.org/linux-btrfs/d4fa1870ffce027ada265a67f4e00d397b683241.1565717248.git.osandov@fb.com/
> 
> Omar Sandoval (7):
>   btrfs: get rid of unnecessary memset() of work item
>   btrfs: don't prematurely free work in run_ordered_work()
>   btrfs: don't prematurely free work in end_workqueue_fn()
>   btrfs: don't prematurely free work in reada_start_machine_worker()
>   btrfs: don't prematurely free work in scrub_missing_raid56_worker()
>   btrfs: get rid of unique workqueue helper functions
>   btrfs: get rid of pointless wtag variable in async-thread.c

Added to misc-next, thanks.

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2019-09-24 14:45 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-16 18:30 [PATCH 0/7] btrfs: workqueue fixes+cleanups Omar Sandoval
2019-09-16 18:30 ` [PATCH 1/7] btrfs: get rid of unnecessary memset() of work item Omar Sandoval
2019-09-17 11:46   ` Johannes Thumshirn
2019-09-16 18:30 ` [PATCH 2/7] btrfs: don't prematurely free work in run_ordered_work() Omar Sandoval
2019-09-17 12:04   ` Johannes Thumshirn
2019-09-16 18:30 ` [PATCH 3/7] btrfs: don't prematurely free work in end_workqueue_fn() Omar Sandoval
2019-09-17 12:05   ` Johannes Thumshirn
2019-09-16 18:30 ` [PATCH 4/7] btrfs: don't prematurely free work in reada_start_machine_worker() Omar Sandoval
2019-09-17 12:09   ` Johannes Thumshirn
2019-09-16 18:30 ` [PATCH 5/7] btrfs: don't prematurely free work in scrub_missing_raid56_worker() Omar Sandoval
2019-09-17 12:10   ` Johannes Thumshirn
2019-09-16 18:30 ` [PATCH 6/7] btrfs: get rid of unique workqueue helper functions Omar Sandoval
2019-09-16 18:30 ` [PATCH 7/7] btrfs: get rid of pointless wtag variable in async-thread.c Omar Sandoval
2019-09-24 14:45 ` [PATCH 0/7] btrfs: workqueue fixes+cleanups David Sterba

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).