* [PATCH] Btrfs: fix task hang under heavy compressed write
@ 2014-08-12  7:44 Liu Bo
  2014-08-12 14:35 ` [PATCH v2] " Liu Bo
                   ` (3 more replies)
  0 siblings, 4 replies; 22+ messages in thread
From: Liu Bo @ 2014-08-12  7:44 UTC (permalink / raw)
  To: linux-btrfs
  Cc: Chris Mason, miaox, Martin Steigerwald, Marc MERLIN, Torbjørn

This hang has been reported and discussed for a long time; it occurs on
both 3.15 and 3.16.

Btrfs has migrated to the kernel workqueue infrastructure, but the conversion introduced this hang.

Btrfs has a kind of work that is queued in an ordered way, which means that its
ordered_func() must be processed in FIFO order, so it usually looks like --

normal_work_helper(arg)
    work = container_of(arg, struct btrfs_work, normal_work);

    work->func() <---- (we name it work X)
    for ordered_work in wq->ordered_list
            ordered_work->ordered_func()
            ordered_work->ordered_free()

The hang is a rare case.  First, when we look for free space, we get an
uncached block group, then we go to read its free space cache inode for free
space information, so it will

file a readahead request
    btrfs_readpages()
         for page that is not in page cache
                __do_readpage()
                     submit_extent_page()
                           btrfs_submit_bio_hook()
                                 btrfs_bio_wq_end_io()
                                 submit_bio()
                                 end_workqueue_bio() <--(ret by the 1st endio)
                                       queue a work (named work Y) for the 2nd
                                       endio(), the real one

So the hang occurs when work Y's work_struct and work X's work_struct happen
to share the same address.

A bit more explanation:

A,B,C -- struct btrfs_work
arg   -- struct work_struct

kthread:
worker_thread()
    pick up a work_struct from @worklist
    process_one_work(arg)
	worker->current_work = arg;  <-- arg is A->normal_work
	worker->current_func(arg)
		normal_work_helper(arg)
		     A = container_of(arg, struct btrfs_work, normal_work);

		     A->func()
		     A->ordered_func()
		     A->ordered_free()  <-- A gets freed

		     B->ordered_func()
			  submit_compressed_extents()
			      find_free_extent()
				  load_free_space_inode()
				      ...   <-- (the above readahead stack)
				      end_workqueue_bio()
					   btrfs_queue_work(work C)
		     B->ordered_free()

If work A sits early in wq->ordered_list and more ordered works, such as
work B, are queued after it, A's memory can be freed before
normal_work_helper() returns.  That means the kernel workqueue code in
worker_thread() still has worker->current_work pointing at work
A->normal_work, i.e. at arg's address.

Meanwhile, work C is allocated after work A is freed, so work C->normal_work
and work A->normal_work are likely to share the same address (I confirmed this
with ftrace output, so I'm not just guessing; it is rare, though).

When another kthread picks up work C->normal_work to process and finds that
our kthread appears to be processing it (see find_worker_executing_work()),
it treats work C as a collision and skips it, so nobody ever processes work C.

So the situation is that our kthread is waiting forever on work C.

The key point is that the two works must not share an address, so this patch
defers ->ordered_free() and does a batched free to avoid that.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
 fs/btrfs/async-thread.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c
index 5a201d8..2ac01b3 100644
--- a/fs/btrfs/async-thread.c
+++ b/fs/btrfs/async-thread.c
@@ -195,6 +195,7 @@ static void run_ordered_work(struct __btrfs_workqueue *wq)
 	struct btrfs_work *work;
 	spinlock_t *lock = &wq->list_lock;
 	unsigned long flags;
+	LIST_HEAD(free_list);
 
 	while (1) {
 		spin_lock_irqsave(lock, flags);
@@ -219,17 +220,24 @@ static void run_ordered_work(struct __btrfs_workqueue *wq)
 
 		/* now take the lock again and drop our item from the list */
 		spin_lock_irqsave(lock, flags);
-		list_del(&work->ordered_list);
+		list_move_tail(&work->ordered_list, &free_list);
 		spin_unlock_irqrestore(lock, flags);
 
 		/*
 		 * we don't want to call the ordered free functions
 		 * with the lock held though
 		 */
+	}
+	spin_unlock_irqrestore(lock, flags);
+
+	while (!list_empty(&free_list)) {
+		work = list_entry(free_list.next, struct btrfs_work,
+				  ordered_list);
+
+		list_del(&work->ordered_list);
 		work->ordered_free(work);
 		trace_btrfs_all_work_done(work);
 	}
-	spin_unlock_irqrestore(lock, flags);
 }
 
 static void normal_work_helper(struct work_struct *arg)
-- 
1.8.1.4



* [PATCH v2] Btrfs: fix task hang under heavy compressed write
  2014-08-12  7:44 [PATCH] Btrfs: fix task hang under heavy compressed write Liu Bo
@ 2014-08-12 14:35 ` Liu Bo
  2014-08-12 14:57 ` [PATCH] " Chris Mason
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 22+ messages in thread
From: Liu Bo @ 2014-08-12 14:35 UTC (permalink / raw)
  To: linux-btrfs

This hang has been reported and discussed for a long time; it occurs on
both 3.15 and 3.16.

Btrfs has migrated to the kernel workqueue infrastructure, but the conversion introduced this hang.

Btrfs has a kind of work that is queued in an ordered way, which means that its
ordered_func() must be processed in FIFO order, so it usually looks like --

normal_work_helper(arg)
    work = container_of(arg, struct btrfs_work, normal_work);

    work->func() <---- (we name it work X)
    for ordered_work in wq->ordered_list
            ordered_work->ordered_func()
            ordered_work->ordered_free()

The hang is a rare case.  First, when we look for free space, we get an
uncached block group, then we go to read its free space cache inode for free
space information, so it will

file a readahead request
    btrfs_readpages()
         for page that is not in page cache
                __do_readpage()
                     submit_extent_page()
                           btrfs_submit_bio_hook()
                                 btrfs_bio_wq_end_io()
                                 submit_bio()
                                 end_workqueue_bio() <--(ret by the 1st endio)
                                       queue a work (named work Y) for the 2nd
                                       endio(), the real one

So the hang occurs when work Y's work_struct and work X's work_struct happen
to share the same address.

A bit more explanation:

A,B,C -- struct btrfs_work
arg   -- struct work_struct

kthread:
worker_thread()
    pick up a work_struct from @worklist
    process_one_work(arg)
	worker->current_work = arg;  <-- arg is A->normal_work
	worker->current_func(arg)
		normal_work_helper(arg)
		     A = container_of(arg, struct btrfs_work, normal_work);

		     A->func()
		     A->ordered_func()
		     A->ordered_free()  <-- A gets freed

		     B->ordered_func()
			  submit_compressed_extents()
			      find_free_extent()
				  load_free_space_inode()
				      ...   <-- (the above readahead stack)
				      end_workqueue_bio()
					   btrfs_queue_work(work C)
		     B->ordered_free()

If work A sits early in wq->ordered_list and more ordered works, such as
work B, are queued after it, A's memory can be freed before
normal_work_helper() returns.  That means the kernel workqueue code in
worker_thread() still has worker->current_work pointing at work
A->normal_work, i.e. at arg's address.

Meanwhile, work C is allocated after work A is freed, so work C->normal_work
and work A->normal_work are likely to share the same address (I confirmed this
with ftrace output, so I'm not just guessing; it is rare, though).

When another kthread picks up work C->normal_work to process and finds that
our kthread appears to be processing it (see find_worker_executing_work()),
it treats work C as a collision and skips it, so nobody ever processes work C.

So the situation is that our kthread is waiting forever on work C.

The key point is that the two works must not share an address, so this patch
defers ->ordered_free() to avoid that.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
v2:
   This version does not defer every ->ordered_free(); it only defers that of
   the work which triggered this run_ordered_work().  We don't need to defer
   the other ->ordered_free() calls because their works cannot be this kthread
   worker's @current_work.  This also pins less memory during the period.

 fs/btrfs/async-thread.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c
index 5a201d8..9fa7e02 100644
--- a/fs/btrfs/async-thread.c
+++ b/fs/btrfs/async-thread.c
@@ -189,12 +189,14 @@ out:
 	}
 }
 
-static void run_ordered_work(struct __btrfs_workqueue *wq)
+static void run_ordered_work(struct __btrfs_workqueue *wq,
+			     struct btrfs_work *orig)
 {
 	struct list_head *list = &wq->ordered_list;
 	struct btrfs_work *work;
 	spinlock_t *lock = &wq->list_lock;
 	unsigned long flags;
+	bool delay_free = false;
 
 	while (1) {
 		spin_lock_irqsave(lock, flags);
@@ -226,10 +228,19 @@ static void run_ordered_work(struct __btrfs_workqueue *wq)
 		 * we don't want to call the ordered free functions
 		 * with the lock held though
 		 */
-		work->ordered_free(work);
-		trace_btrfs_all_work_done(work);
+		if (work == orig) {
+			delay_free = true;
+		} else {
+			work->ordered_free(work);
+			trace_btrfs_all_work_done(work);
+		}
 	}
 	spin_unlock_irqrestore(lock, flags);
+
+	if (delay_free) {
+		orig->ordered_free(orig);
+		trace_btrfs_all_work_done(orig);
+	}
 }
 
 static void normal_work_helper(struct work_struct *arg)
@@ -256,7 +267,7 @@ static void normal_work_helper(struct work_struct *arg)
 	work->func(work);
 	if (need_order) {
 		set_bit(WORK_DONE_BIT, &work->flags);
-		run_ordered_work(wq);
+		run_ordered_work(wq, work);
 	}
 	if (!need_order)
 		trace_btrfs_all_work_done(work);
-- 
1.8.1.4



* Re: [PATCH] Btrfs: fix task hang under heavy compressed write
  2014-08-12  7:44 [PATCH] Btrfs: fix task hang under heavy compressed write Liu Bo
  2014-08-12 14:35 ` [PATCH v2] " Liu Bo
@ 2014-08-12 14:57 ` Chris Mason
  2014-08-13  0:53   ` Qu Wenruo
  2014-08-13 11:54 ` Martin Steigerwald
  2014-08-15 15:36 ` [PATCH v3] " Liu Bo
  3 siblings, 1 reply; 22+ messages in thread
From: Chris Mason @ 2014-08-12 14:57 UTC (permalink / raw)
  To: Liu Bo, linux-btrfs; +Cc: miaox, Martin Steigerwald, Marc MERLIN, Torbjørn



On 08/12/2014 03:44 AM, Liu Bo wrote:
> This has been reported and discussed for a long time, and this hang occurs in
> both 3.15 and 3.16.
> 
> Btrfs now migrates to use kernel workqueue, but it introduces this hang problem.
> 
> Btrfs has a kind of work queued as an ordered way, which means that its
> ordered_func() must be processed in the way of FIFO, so it usually looks like --

This definitely explains some problems, and I overlooked the part where
all of our workers use the same normal_work().

But I think it actually goes beyond just the ordered work queues.

Process A:
	btrfs_bio_wq_end_io() -> kmalloc a end_io_wq struct at address P
	submit bio
	end bio
	btrfs_queue_work(endio_write_workers)
	worker thread jumps in
	end_workqueue_fn()
		-> kfree(end_io_wq)
		^^^^^ right here end_io_wq can be reused,
		but the worker thread is still processing this work item

Process B:
	btrfs_bio_wq_end() -> kmalloc an end_io_wq struct, reuse P
	submit bio
	end bio ... sometimes this is really fast
	btrfs_queue_work(endio_workers) // lets do a read
		->process_one_work()
		    -> find_worker_executing_work()
		    ^^^^^ now we get in trouble.  our struct P is still
		    active and so find_worker_executing_work() is going
		    to queue up this read completion on the end of the
		    scheduled list for this worker in the generic code.

		    The end result is we can have read IO completions
		    queued up behind write IO completions.

This example uses the bio end io code, but we probably have others.  The
real solution is to have each btrfs workqueue provide its own worker
function, or each caller to btrfs_queue_work to send a unique worker
function down to the generic code.

Thanks Liu, great job finding this.

-chris


* Re: [PATCH] Btrfs: fix task hang under heavy compressed write
  2014-08-12 14:57 ` [PATCH] " Chris Mason
@ 2014-08-13  0:53   ` Qu Wenruo
  0 siblings, 0 replies; 22+ messages in thread
From: Qu Wenruo @ 2014-08-13  0:53 UTC (permalink / raw)
  To: Chris Mason, Liu Bo, linux-btrfs
  Cc: miaox, Martin Steigerwald, Marc MERLIN, Torbjørn


-------- Original Message --------
Subject: Re: [PATCH] Btrfs: fix task hang under heavy compressed write
From: Chris Mason <clm@fb.com>
To: Liu Bo <bo.li.liu@oracle.com>, linux-btrfs <linux-btrfs@vger.kernel.org>
Date: 2014-08-12 22:57
>
> On 08/12/2014 03:44 AM, Liu Bo wrote:
>> This has been reported and discussed for a long time, and this hang occurs in
>> both 3.15 and 3.16.
>>
>> Btrfs now migrates to use kernel workqueue, but it introduces this hang problem.
>>
>> Btrfs has a kind of work queued as an ordered way, which means that its
>> ordered_func() must be processed in the way of FIFO, so it usually looks like --
> This definitely explains some problems, and I overlooked the part where
> all of our workers use the same normal_work()
Oh, my fault, it didn't occur to me that another process could allocate
memory at the same address :(
And the shared normal_work() makes things much worse.
>
> But I think it's actually goes beyond just the ordered work queues.
>
> Process A:
> 	btrfs_bio_wq_end_io() -> kmalloc a end_io_wq struct at address P
> 	submit bio
> 	end bio
> 	btrfs_queue_work(endio_write_workers)
> 	worker thread jumps in
> 	end_workqueue_fn()
> 		-> kfree(end_io_wq)
> 		^^^^^ right here end_io_wq can be reused,
> 		but the worker thread is still processing this work item
>
> Process B:
> 	btrfs_bio_wq_end() -> kmalloc an end_io_wq struct, reuse P
> 	submit bio
> 	end bio ... sometimes this is really fast
> 	btrfs_queue_work(endio_workers) // lets do a read
> 		->process_one_work()
> 		    -> find_worker_executing_work()
> 		    ^^^^^ now we get in trouble.  our struct P is still
> 		    active and so find_worker_executing_work() is going
> 		    to queue up this read completion on the end of the
> 		    scheduled list for this worker in the generic code.
>
> 		    The end result is we can have read IO completions
> 		    queued up behind write IO completions.
>
> This example uses the bio end io code, but we probably have others.  The
> real solution is to have each btrfs workqueue provide its own worker
> function, or each caller to btrfs_queue_work to send a unique worker
> function down to the generic code.
That's true.  Personally I prefer the first one since it affects fewer
callers, but it seems to need macro hacks or something like that to
generate different functions that all do the same thing...

Thanks,
Qu
>
> Thanks Liu, great job finding this.
>
> -chris



* Re: [PATCH] Btrfs: fix task hang under heavy compressed write
  2014-08-12  7:44 [PATCH] Btrfs: fix task hang under heavy compressed write Liu Bo
  2014-08-12 14:35 ` [PATCH v2] " Liu Bo
  2014-08-12 14:57 ` [PATCH] " Chris Mason
@ 2014-08-13 11:54 ` Martin Steigerwald
  2014-08-13 13:27   ` Rich Freeman
  2014-08-13 15:20   ` Liu Bo
  2014-08-15 15:36 ` [PATCH v3] " Liu Bo
  3 siblings, 2 replies; 22+ messages in thread
From: Martin Steigerwald @ 2014-08-13 11:54 UTC (permalink / raw)
  To: Liu Bo; +Cc: linux-btrfs, Chris Mason, miaox, Marc MERLIN, Torbjørn

On Tuesday, 12 August 2014 at 15:44:59, Liu Bo wrote:
> This has been reported and discussed for a long time, and this hang occurs
> in both 3.15 and 3.16.

Liu, is this safe for testing yet?

Thanks,
Martin

> Btrfs now migrates to use kernel workqueue, but it introduces this hang
> problem.
> 
> Btrfs has a kind of work queued as an ordered way, which means that its
> ordered_func() must be processed in the way of FIFO, so it usually looks
> like --
> 
> normal_work_helper(arg)
>     work = container_of(arg, struct btrfs_work, normal_work);
> 
>     work->func() <---- (we name it work X)
>     for ordered_work in wq->ordered_list
>             ordered_work->ordered_func()
>             ordered_work->ordered_free()
> 
> The hang is a rare case, first when we find free space, we get an uncached
> block group, then we go to read its free space cache inode for free space
> information, so it will
> 
> file a readahead request
>     btrfs_readpages()
>          for page that is not in page cache
>                 __do_readpage()
>                      submit_extent_page()
>                            btrfs_submit_bio_hook()
>                                  btrfs_bio_wq_end_io()
>                                  submit_bio()
>                                  end_workqueue_bio() <--(ret by the 1st
> endio) queue a work(named work Y) for the 2nd also the real endio()
> 
> So the hang occurs when work Y's work_struct and work X's work_struct
> happens to share the same address.
> 
> A bit more explanation,
> 
> A,B,C -- struct btrfs_work
> arg   -- struct work_struct
> 
> kthread:
> worker_thread()
>     pick up a work_struct from @worklist
>     process_one_work(arg)
> 	worker->current_work = arg;  <-- arg is A->normal_work
> 	worker->current_func(arg)
> 		normal_work_helper(arg)
> 		     A = container_of(arg, struct btrfs_work, normal_work);
> 
> 		     A->func()
> 		     A->ordered_func()
> 		     A->ordered_free()  <-- A gets freed
> 
> 		     B->ordered_func()
> 			  submit_compressed_extents()
> 			      find_free_extent()
> 				  load_free_space_inode()
> 				      ...   <-- (the above readhead stack)
> 				      end_workqueue_bio()
> 					   btrfs_queue_work(work C)
> 		     B->ordered_free()
> 
> As if work A has a high priority in wq->ordered_list and there are more
> ordered works queued after it, such as B->ordered_func(), its memory could
> have been freed before normal_work_helper() returns, which means that
> kernel workqueue code worker_thread() still has worker->current_work
> pointer to be work A->normal_work's, ie. arg's address.
> 
> Meanwhile, work C is allocated after work A is freed, work C->normal_work
> and work A->normal_work are likely to share the same address(I confirmed
> this with ftrace output, so I'm not just guessing, it's rare though).
> 
> When another kthread picks up work C->normal_work to process, and finds our
> kthread is processing it(see find_worker_executing_work()), it'll think
> work C as a collision and skip then, which ends up nobody processing work C.
> 
> So the situation is that our kthread is waiting forever on work C.
> 
> The key point is that they shouldn't have the same address, so this defers
> ->ordered_free() and does a batched free to avoid that.
> 
> Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> ---
>  fs/btrfs/async-thread.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c
> index 5a201d8..2ac01b3 100644
> --- a/fs/btrfs/async-thread.c
> +++ b/fs/btrfs/async-thread.c
> @@ -195,6 +195,7 @@ static void run_ordered_work(struct __btrfs_workqueue
> *wq) struct btrfs_work *work;
>  	spinlock_t *lock = &wq->list_lock;
>  	unsigned long flags;
> +	LIST_HEAD(free_list);
> 
>  	while (1) {
>  		spin_lock_irqsave(lock, flags);
> @@ -219,17 +220,24 @@ static void run_ordered_work(struct __btrfs_workqueue
> *wq)
> 
>  		/* now take the lock again and drop our item from the list */
>  		spin_lock_irqsave(lock, flags);
> -		list_del(&work->ordered_list);
> +		list_move_tail(&work->ordered_list, &free_list);
>  		spin_unlock_irqrestore(lock, flags);
> 
>  		/*
>  		 * we don't want to call the ordered free functions
>  		 * with the lock held though
>  		 */
> +	}
> +	spin_unlock_irqrestore(lock, flags);
> +
> +	while (!list_empty(&free_list)) {
> +		work = list_entry(free_list.next, struct btrfs_work,
> +				  ordered_list);
> +
> +		list_del(&work->ordered_list);
>  		work->ordered_free(work);
>  		trace_btrfs_all_work_done(work);
>  	}
> -	spin_unlock_irqrestore(lock, flags);
>  }
> 
>  static void normal_work_helper(struct work_struct *arg)

-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7


* Re: [PATCH] Btrfs: fix task hang under heavy compressed write
  2014-08-13 11:54 ` Martin Steigerwald
@ 2014-08-13 13:27   ` Rich Freeman
  2014-08-13 15:20   ` Liu Bo
  1 sibling, 0 replies; 22+ messages in thread
From: Rich Freeman @ 2014-08-13 13:27 UTC (permalink / raw)
  To: Martin Steigerwald
  Cc: Liu Bo, linux-btrfs, Chris Mason, miaox, Marc MERLIN, Torbjørn

On Wed, Aug 13, 2014 at 7:54 AM, Martin Steigerwald <Martin@lichtvoll.de> wrote:
> On Tuesday, 12 August 2014 at 15:44:59, Liu Bo wrote:
>> This has been reported and discussed for a long time, and this hang occurs
>> in both 3.15 and 3.16.
>
> Liu, is this safe for testing yet?
>

I'm more than happy to test this and re-enable lzo (I've been running
fine on 3.16 with it disabled, but had numerous issues when it was
enabled on 3.15 and the rcs).  It would just be helpful to clarify
exactly which patch we should be testing, and which kernel we should
test it against.  No sense generating issue reports that aren't useful.

Rich


* Re: [PATCH] Btrfs: fix task hang under heavy compressed write
  2014-08-13 11:54 ` Martin Steigerwald
  2014-08-13 13:27   ` Rich Freeman
@ 2014-08-13 15:20   ` Liu Bo
  2014-08-14  9:27     ` Martin Steigerwald
  1 sibling, 1 reply; 22+ messages in thread
From: Liu Bo @ 2014-08-13 15:20 UTC (permalink / raw)
  To: Martin Steigerwald
  Cc: linux-btrfs, Chris Mason, miaox, Marc MERLIN, Torbjørn

On Wed, Aug 13, 2014 at 01:54:40PM +0200, Martin Steigerwald wrote:
> On Tuesday, 12 August 2014 at 15:44:59, Liu Bo wrote:
> > This has been reported and discussed for a long time, and this hang occurs
> > in both 3.15 and 3.16.
> 
> Liu, is this safe for testing yet?

Yes, I've confirmed that this hang doesn't occur after running my tests for 2
days (it usually hangs within 2 hours).

But...
As Chris said in the thread, this is more of a workaround; there are other
potential issues that could lead to a similar deadlock.

I'm trying to write a real fix instead of a workaround.

thanks,
-liubo

> 
> Thanks,
> Martin
> 
> > Btrfs now migrates to use kernel workqueue, but it introduces this hang
> > problem.
> > 
> > Btrfs has a kind of work queued as an ordered way, which means that its
> > ordered_func() must be processed in the way of FIFO, so it usually looks
> > like --
> > 
> > normal_work_helper(arg)
> >     work = container_of(arg, struct btrfs_work, normal_work);
> > 
> >     work->func() <---- (we name it work X)
> >     for ordered_work in wq->ordered_list
> >             ordered_work->ordered_func()
> >             ordered_work->ordered_free()
> > 
> > The hang is a rare case, first when we find free space, we get an uncached
> > block group, then we go to read its free space cache inode for free space
> > information, so it will
> > 
> > file a readahead request
> >     btrfs_readpages()
> >          for page that is not in page cache
> >                 __do_readpage()
> >                      submit_extent_page()
> >                            btrfs_submit_bio_hook()
> >                                  btrfs_bio_wq_end_io()
> >                                  submit_bio()
> >                                  end_workqueue_bio() <--(ret by the 1st
> > endio) queue a work(named work Y) for the 2nd also the real endio()
> > 
> > So the hang occurs when work Y's work_struct and work X's work_struct
> > happens to share the same address.
> > 
> > A bit more explanation,
> > 
> > A,B,C -- struct btrfs_work
> > arg   -- struct work_struct
> > 
> > kthread:
> > worker_thread()
> >     pick up a work_struct from @worklist
> >     process_one_work(arg)
> > 	worker->current_work = arg;  <-- arg is A->normal_work
> > 	worker->current_func(arg)
> > 		normal_work_helper(arg)
> > 		     A = container_of(arg, struct btrfs_work, normal_work);
> > 
> > 		     A->func()
> > 		     A->ordered_func()
> > 		     A->ordered_free()  <-- A gets freed
> > 
> > 		     B->ordered_func()
> > 			  submit_compressed_extents()
> > 			      find_free_extent()
> > 				  load_free_space_inode()
> > 				      ...   <-- (the above readhead stack)
> > 				      end_workqueue_bio()
> > 					   btrfs_queue_work(work C)
> > 		     B->ordered_free()
> > 
> > As if work A has a high priority in wq->ordered_list and there are more
> > ordered works queued after it, such as B->ordered_func(), its memory could
> > have been freed before normal_work_helper() returns, which means that
> > kernel workqueue code worker_thread() still has worker->current_work
> > pointer to be work A->normal_work's, ie. arg's address.
> > 
> > Meanwhile, work C is allocated after work A is freed, work C->normal_work
> > and work A->normal_work are likely to share the same address(I confirmed
> > this with ftrace output, so I'm not just guessing, it's rare though).
> > 
> > When another kthread picks up work C->normal_work to process, and finds our
> > kthread is processing it(see find_worker_executing_work()), it'll think
> > work C as a collision and skip then, which ends up nobody processing work C.
> > 
> > So the situation is that our kthread is waiting forever on work C.
> > 
> > The key point is that they shouldn't have the same address, so this defers
> > ->ordered_free() and does a batched free to avoid that.
> > 
> > Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> > ---
> >  fs/btrfs/async-thread.c | 12 ++++++++++--
> >  1 file changed, 10 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c
> > index 5a201d8..2ac01b3 100644
> > --- a/fs/btrfs/async-thread.c
> > +++ b/fs/btrfs/async-thread.c
> > @@ -195,6 +195,7 @@ static void run_ordered_work(struct __btrfs_workqueue
> > *wq) struct btrfs_work *work;
> >  	spinlock_t *lock = &wq->list_lock;
> >  	unsigned long flags;
> > +	LIST_HEAD(free_list);
> > 
> >  	while (1) {
> >  		spin_lock_irqsave(lock, flags);
> > @@ -219,17 +220,24 @@ static void run_ordered_work(struct __btrfs_workqueue
> > *wq)
> > 
> >  		/* now take the lock again and drop our item from the list */
> >  		spin_lock_irqsave(lock, flags);
> > -		list_del(&work->ordered_list);
> > +		list_move_tail(&work->ordered_list, &free_list);
> >  		spin_unlock_irqrestore(lock, flags);
> > 
> >  		/*
> >  		 * we don't want to call the ordered free functions
> >  		 * with the lock held though
> >  		 */
> > +	}
> > +	spin_unlock_irqrestore(lock, flags);
> > +
> > +	while (!list_empty(&free_list)) {
> > +		work = list_entry(free_list.next, struct btrfs_work,
> > +				  ordered_list);
> > +
> > +		list_del(&work->ordered_list);
> >  		work->ordered_free(work);
> >  		trace_btrfs_all_work_done(work);
> >  	}
> > -	spin_unlock_irqrestore(lock, flags);
> >  }
> > 
> >  static void normal_work_helper(struct work_struct *arg)
> 
> -- 
> Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
> GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7


* Re: [PATCH] Btrfs: fix task hang under heavy compressed write
  2014-08-13 15:20   ` Liu Bo
@ 2014-08-14  9:27     ` Martin Steigerwald
  2014-08-15 17:51       ` Martin Steigerwald
  0 siblings, 1 reply; 22+ messages in thread
From: Martin Steigerwald @ 2014-08-14  9:27 UTC (permalink / raw)
  To: bo.li.liu; +Cc: linux-btrfs, Chris Mason, miaox, Marc MERLIN, Torbjørn

On Wednesday, 13 August 2014 at 23:20:46, Liu Bo wrote:
> On Wed, Aug 13, 2014 at 01:54:40PM +0200, Martin Steigerwald wrote:
> > On Tuesday, 12 August 2014 at 15:44:59, Liu Bo wrote:
> > > This has been reported and discussed for a long time, and this hang
> > > occurs
> > > in both 3.15 and 3.16.
> > 
> > Liu, is this safe for testing yet?
> 
> Yes, I've confirmed that this hang doesn't occur by running my tests for 2
> days(usually it hangs in 2 hours).
> 
> But...
> As Chris said in the thread, this is more a workaround, there're other
> potential issues that would lead to similar deadlock.
> 
> I'm trying to write a real fix instead of a workaround.

Thanks, so this one goes together with the compressed-write corruption fix?
I would put them both onto 3.16.1.  For 3.17 I want to wait until rc2, I think.

> thanks,
> -liubo
> 
> > Thanks,
> > Martin
> > 
> > > Btrfs now migrates to use kernel workqueue, but it introduces this hang
> > > problem.
> > > 
> > > Btrfs has a kind of work queued as an ordered way, which means that its
> > > ordered_func() must be processed in the way of FIFO, so it usually looks
> > > like --
> > > 
> > > normal_work_helper(arg)
> > > 
> > >     work = container_of(arg, struct btrfs_work, normal_work);
> > >     
> > >     work->func() <---- (we name it work X)
> > >     for ordered_work in wq->ordered_list
> > >     
> > >             ordered_work->ordered_func()
> > >             ordered_work->ordered_free()
> > > 
> > > The hang is a rare case, first when we find free space, we get an
> > > uncached
> > > block group, then we go to read its free space cache inode for free space
> > > information, so it will
> > >
> > > file a readahead request
> > >     btrfs_readpages()
> > >          for page that is not in page cache
> > >                 __do_readpage()
> > >                      submit_extent_page()
> > >                            btrfs_submit_bio_hook()
> > >                                  btrfs_bio_wq_end_io()
> > >                                  submit_bio()
> > >                                  end_workqueue_bio() <--(ret by the 1st endio)
> > >                                       queue a work(named work Y) for the 2nd
> > >                                       also the real endio()
> > >
> > > So the hang occurs when work Y's work_struct and work X's work_struct happens
> > > to share the same address.
> > >
> > > A bit more explanation,
> > >
> > > A,B,C -- struct btrfs_work
> > > arg   -- struct work_struct
> > >
> > > kthread:
> > > worker_thread()
> > >     pick up a work_struct from @worklist
> > >     process_one_work(arg)
> > > 	worker->current_work = arg;  <-- arg is A->normal_work
> > > 	worker->current_func(arg)
> > > 		normal_work_helper(arg)
> > > 		     A = container_of(arg, struct btrfs_work, normal_work);
> > >
> > > 		     A->func()
> > > 		     A->ordered_func()
> > > 		     A->ordered_free()  <-- A gets freed
> > >
> > > 		     B->ordered_func()
> > > 			  submit_compressed_extents()
> > > 			      find_free_extent()
> > > 				  load_free_space_inode()
> > > 				      ...   <-- (the above readhead stack)
> > > 				      end_workqueue_bio()
> > > 					   btrfs_queue_work(work C)
> > > 		     B->ordered_free()
> > >
> > > As if work A has a high priority in wq->ordered_list and there are more
> > > ordered works queued after it, such as B->ordered_func(), its memory could
> > > have been freed before normal_work_helper() returns, which means that
> > > kernel workqueue code worker_thread() still has worker->current_work
> > > pointer to be work A->normal_work's, ie. arg's address.
> > >
> > > Meanwhile, work C is allocated after work A is freed, work C->normal_work
> > > and work A->normal_work are likely to share the same address(I confirmed
> > > this with ftrace output, so I'm not just guessing, it's rare though).
> > >
> > > When another kthread picks up work C->normal_work to process, and finds our
> > > kthread is processing it(see find_worker_executing_work()), it'll think
> > > work C as a collision and skip then, which ends up nobody processing work C.
> > >
> > > So the situation is that our kthread is waiting forever on work C.
> > >
> > > The key point is that they shouldn't have the same address, so this defers
> > > ->ordered_free() and does a batched free to avoid that.
> > >
> > > Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> > > ---
> > >  fs/btrfs/async-thread.c | 12 ++++++++++--
> > >  1 file changed, 10 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c
> > > index 5a201d8..2ac01b3 100644
> > > --- a/fs/btrfs/async-thread.c
> > > +++ b/fs/btrfs/async-thread.c
> > > @@ -195,6 +195,7 @@ static void run_ordered_work(struct __btrfs_workqueue *wq)
> > >  	struct btrfs_work *work;
> > >  	spinlock_t *lock = &wq->list_lock;
> > >  	unsigned long flags;
> > > +	LIST_HEAD(free_list);
> > >  
> > >  	while (1) {
> > >  		spin_lock_irqsave(lock, flags);
> > > @@ -219,17 +220,24 @@ static void run_ordered_work(struct __btrfs_workqueue *wq)
> > >  		/* now take the lock again and drop our item from the list */
> > >  		spin_lock_irqsave(lock, flags);
> > > -		list_del(&work->ordered_list);
> > > +		list_move_tail(&work->ordered_list, &free_list);
> > >  		spin_unlock_irqrestore(lock, flags);
> > >  
> > >  		/*
> > >  		 * we don't want to call the ordered free functions
> > >  		 * with the lock held though
> > >  		 */
> > > +	}
> > > +	spin_unlock_irqrestore(lock, flags);
> > > +
> > > +	while (!list_empty(&free_list)) {
> > > +		work = list_entry(free_list.next, struct btrfs_work,
> > > +				  ordered_list);
> > > +
> > > +		list_del(&work->ordered_list);
> > >  		work->ordered_free(work);
> > >  		trace_btrfs_all_work_done(work);
> > >  	}
> > > -	spin_unlock_irqrestore(lock, flags);
> > >  }
> > >  
> > >  static void normal_work_helper(struct work_struct *arg)

-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v3] Btrfs: fix task hang under heavy compressed write
  2014-08-12  7:44 [PATCH] Btrfs: fix task hang under heavy compressed write Liu Bo
                   ` (2 preceding siblings ...)
  2014-08-13 11:54 ` Martin Steigerwald
@ 2014-08-15 15:36 ` Liu Bo
  2014-08-15 16:05   ` Chris Mason
                     ` (2 more replies)
  3 siblings, 3 replies; 22+ messages in thread
From: Liu Bo @ 2014-08-15 15:36 UTC (permalink / raw)
  To: linux-btrfs

This has been reported and discussed for a long time, and this hang occurs in
both 3.15 and 3.16.

Btrfs has now migrated to the kernel workqueue, but the migration introduced
this hang.

Btrfs has a kind of work that is queued in an ordered way, which means that its
ordered_func() must be processed in FIFO order, so it usually looks like --

normal_work_helper(arg)
    work = container_of(arg, struct btrfs_work, normal_work);

    work->func() <---- (we name it work X)
    for ordered_work in wq->ordered_list
            ordered_work->ordered_func()
            ordered_work->ordered_free()

The hang is a rare case: first, when we look for free space, we get an uncached
block group; then we go to read its free space cache inode for free space
information, so it will

file a readahead request
    btrfs_readpages()
         for page that is not in page cache
                __do_readpage()
                     submit_extent_page()
                           btrfs_submit_bio_hook()
                                 btrfs_bio_wq_end_io()
                                 submit_bio()
                                 end_workqueue_bio() <--(ret by the 1st endio)
                                      queue a work(named work Y) for the 2nd
                                      also the real endio()

So the hang occurs when work Y's work_struct and work X's work_struct happen
to share the same address.

A bit more explanation,

A,B,C -- struct btrfs_work
arg   -- struct work_struct

kthread:
worker_thread()
    pick up a work_struct from @worklist
    process_one_work(arg)
	worker->current_work = arg;  <-- arg is A->normal_work
	worker->current_func(arg)
		normal_work_helper(arg)
		     A = container_of(arg, struct btrfs_work, normal_work);

		     A->func()
		     A->ordered_func()
		     A->ordered_free()  <-- A gets freed

		     B->ordered_func()
			  submit_compressed_extents()
			      find_free_extent()
				  load_free_space_inode()
				      ...   <-- (the above readahead stack)
				      end_workqueue_bio()
					   btrfs_queue_work(work C)
		     B->ordered_free()

If work A sits early in wq->ordered_list and more ordered works, such as B,
are queued after it, its memory can be freed before normal_work_helper()
returns. That means the kernel workqueue code in worker_thread() still has
worker->current_work pointing at work A->normal_work, i.e. arg's address.

Meanwhile, work C is allocated after work A is freed, and work C->normal_work
and work A->normal_work are likely to share the same address (I confirmed this
with ftrace output, so I'm not just guessing; it's rare, though).

When another kthread picks up work C->normal_work to process and finds that
our kthread is processing it (see find_worker_executing_work()), it treats
work C as a collision and skips it, which ends up with nobody processing
work C.

So the situation is that our kthread is waiting forever on work C.

Besides, there are other cases that can lead to deadlock, but the real problem
is that all btrfs workqueues share one work->func, normal_work_helper. So this
patch gives each workqueue its own helper function, which is just a thin
wrapper of normal_work_helper.

With this patch, I no longer hit the above hang.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
v3:
  - Adopting Chris's advice, this has each workqueue provide its own helper
    function.
  - This is rebased on Chris's latest for-linus branch.

v2:
  - This version does not defer all ->ordered_free(), but only that of the
    work which triggers this run_ordered_work().  We don't need to defer the
    other ->ordered_free() calls because their works cannot be this kthread
    worker's @current_work.  We benefit since this pins less memory during
    the period.

 fs/btrfs/async-thread.c  | 44 +++++++++++++++++++++++++++------
 fs/btrfs/async-thread.h  | 28 ++++++++++++++++++++-
 fs/btrfs/delayed-inode.c |  4 +--
 fs/btrfs/disk-io.c       | 63 ++++++++++++++++++++++++++----------------------
 fs/btrfs/extent-tree.c   |  7 +++---
 fs/btrfs/inode.c         | 35 ++++++++++++++++++---------
 fs/btrfs/ordered-data.c  |  1 +
 fs/btrfs/qgroup.c        |  1 +
 fs/btrfs/raid56.c        |  9 ++++---
 fs/btrfs/reada.c         |  3 ++-
 fs/btrfs/scrub.c         | 14 ++++++-----
 fs/btrfs/volumes.c       |  3 ++-
 12 files changed, 146 insertions(+), 66 deletions(-)

diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c
index 5a201d8..fbd76de 100644
--- a/fs/btrfs/async-thread.c
+++ b/fs/btrfs/async-thread.c
@@ -22,7 +22,6 @@
 #include <linux/list.h>
 #include <linux/spinlock.h>
 #include <linux/freezer.h>
-#include <linux/workqueue.h>
 #include "async-thread.h"
 #include "ctree.h"
 
@@ -55,8 +54,39 @@ struct btrfs_workqueue {
 	struct __btrfs_workqueue *high;
 };
 
-static inline struct __btrfs_workqueue
-*__btrfs_alloc_workqueue(const char *name, int flags, int max_active,
+static void normal_work_helper(struct btrfs_work *work);
+
+#define BTRFS_WORK_HELPER(name)					\
+void btrfs_##name(struct work_struct *arg)				\
+{									\
+	struct btrfs_work *work = container_of(arg, struct btrfs_work,	\
+					       normal_work);		\
+	normal_work_helper(work);					\
+}
+
+BTRFS_WORK_HELPER(worker_helper);
+BTRFS_WORK_HELPER(delalloc_helper);
+BTRFS_WORK_HELPER(flush_delalloc_helper);
+BTRFS_WORK_HELPER(cache_helper);
+BTRFS_WORK_HELPER(submit_helper);
+BTRFS_WORK_HELPER(fixup_helper);
+BTRFS_WORK_HELPER(endio_helper);
+BTRFS_WORK_HELPER(endio_meta_helper);
+BTRFS_WORK_HELPER(endio_meta_write_helper);
+BTRFS_WORK_HELPER(endio_raid56_helper);
+BTRFS_WORK_HELPER(rmw_helper);
+BTRFS_WORK_HELPER(endio_write_helper);
+BTRFS_WORK_HELPER(freespace_write_helper);
+BTRFS_WORK_HELPER(delayed_meta_helper);
+BTRFS_WORK_HELPER(readahead_helper);
+BTRFS_WORK_HELPER(qgroup_rescan_helper);
+BTRFS_WORK_HELPER(extent_refs_helper);
+BTRFS_WORK_HELPER(scrub_helper);
+BTRFS_WORK_HELPER(scrubwrc_helper);
+BTRFS_WORK_HELPER(scrubnc_helper);
+
+static struct __btrfs_workqueue *
+__btrfs_alloc_workqueue(const char *name, int flags, int max_active,
 			 int thresh)
 {
 	struct __btrfs_workqueue *ret = kzalloc(sizeof(*ret), GFP_NOFS);
@@ -232,13 +262,11 @@ static void run_ordered_work(struct __btrfs_workqueue *wq)
 	spin_unlock_irqrestore(lock, flags);
 }
 
-static void normal_work_helper(struct work_struct *arg)
+static void normal_work_helper(struct btrfs_work *work)
 {
-	struct btrfs_work *work;
 	struct __btrfs_workqueue *wq;
 	int need_order = 0;
 
-	work = container_of(arg, struct btrfs_work, normal_work);
 	/*
 	 * We should not touch things inside work in the following cases:
 	 * 1) after work->func() if it has no ordered_free
@@ -262,7 +290,7 @@ static void normal_work_helper(struct work_struct *arg)
 		trace_btrfs_all_work_done(work);
 }
 
-void btrfs_init_work(struct btrfs_work *work,
+void btrfs_init_work(struct btrfs_work *work, btrfs_work_func_t uniq_func,
 		     btrfs_func_t func,
 		     btrfs_func_t ordered_func,
 		     btrfs_func_t ordered_free)
@@ -270,7 +298,7 @@ void btrfs_init_work(struct btrfs_work *work,
 	work->func = func;
 	work->ordered_func = ordered_func;
 	work->ordered_free = ordered_free;
-	INIT_WORK(&work->normal_work, normal_work_helper);
+	INIT_WORK(&work->normal_work, uniq_func);
 	INIT_LIST_HEAD(&work->ordered_list);
 	work->flags = 0;
 }
diff --git a/fs/btrfs/async-thread.h b/fs/btrfs/async-thread.h
index 9c6b66d..e9e31c9 100644
--- a/fs/btrfs/async-thread.h
+++ b/fs/btrfs/async-thread.h
@@ -19,12 +19,14 @@
 
 #ifndef __BTRFS_ASYNC_THREAD_
 #define __BTRFS_ASYNC_THREAD_
+#include <linux/workqueue.h>
 
 struct btrfs_workqueue;
 /* Internal use only */
 struct __btrfs_workqueue;
 struct btrfs_work;
 typedef void (*btrfs_func_t)(struct btrfs_work *arg);
+typedef void (*btrfs_work_func_t)(struct work_struct *arg);
 
 struct btrfs_work {
 	btrfs_func_t func;
@@ -38,11 +40,35 @@ struct btrfs_work {
 	unsigned long flags;
 };
 
+#define BTRFS_WORK_HELPER_PROTO(name)					\
+void btrfs_##name(struct work_struct *arg)
+
+BTRFS_WORK_HELPER_PROTO(worker_helper);
+BTRFS_WORK_HELPER_PROTO(delalloc_helper);
+BTRFS_WORK_HELPER_PROTO(flush_delalloc_helper);
+BTRFS_WORK_HELPER_PROTO(cache_helper);
+BTRFS_WORK_HELPER_PROTO(submit_helper);
+BTRFS_WORK_HELPER_PROTO(fixup_helper);
+BTRFS_WORK_HELPER_PROTO(endio_helper);
+BTRFS_WORK_HELPER_PROTO(endio_meta_helper);
+BTRFS_WORK_HELPER_PROTO(endio_meta_write_helper);
+BTRFS_WORK_HELPER_PROTO(endio_raid56_helper);
+BTRFS_WORK_HELPER_PROTO(rmw_helper);
+BTRFS_WORK_HELPER_PROTO(endio_write_helper);
+BTRFS_WORK_HELPER_PROTO(freespace_write_helper);
+BTRFS_WORK_HELPER_PROTO(delayed_meta_helper);
+BTRFS_WORK_HELPER_PROTO(readahead_helper);
+BTRFS_WORK_HELPER_PROTO(qgroup_rescan_helper);
+BTRFS_WORK_HELPER_PROTO(extent_refs_helper);
+BTRFS_WORK_HELPER_PROTO(scrub_helper);
+BTRFS_WORK_HELPER_PROTO(scrubwrc_helper);
+BTRFS_WORK_HELPER_PROTO(scrubnc_helper);
+
 struct btrfs_workqueue *btrfs_alloc_workqueue(const char *name,
 					      int flags,
 					      int max_active,
 					      int thresh);
-void btrfs_init_work(struct btrfs_work *work,
+void btrfs_init_work(struct btrfs_work *work, btrfs_work_func_t helper,
 		     btrfs_func_t func,
 		     btrfs_func_t ordered_func,
 		     btrfs_func_t ordered_free);
diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index 0816814..054577b 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -1395,8 +1395,8 @@ static int btrfs_wq_run_delayed_node(struct btrfs_delayed_root *delayed_root,
 		return -ENOMEM;
 
 	async_work->delayed_root = delayed_root;
-	btrfs_init_work(&async_work->work, btrfs_async_run_delayed_root,
-			NULL, NULL);
+	btrfs_init_work(&async_work->work, btrfs_delayed_meta_helper,
+			btrfs_async_run_delayed_root, NULL, NULL);
 	async_work->nr = nr;
 
 	btrfs_queue_work(root->fs_info->delayed_workers, &async_work->work);
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index fd9be2d..28098ba 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -39,7 +39,6 @@
 #include "btrfs_inode.h"
 #include "volumes.h"
 #include "print-tree.h"
-#include "async-thread.h"
 #include "locking.h"
 #include "tree-log.h"
 #include "free-space-cache.h"
@@ -714,42 +713,48 @@ static void end_workqueue_bio(struct bio *bio, int err)
 {
 	struct end_io_wq *end_io_wq = bio->bi_private;
 	struct btrfs_fs_info *fs_info;
+	struct btrfs_workqueue *wq;
+	btrfs_work_func_t func;
 
 	fs_info = end_io_wq->info;
 	end_io_wq->error = err;
 
-	if (likely(end_io_wq->metadata != BTRFS_WQ_ENDIO_DIO_REPAIR))
-		btrfs_init_work(&end_io_wq->work, end_workqueue_fn, NULL,
-				NULL);
-	else
+	if (unlikely(end_io_wq->metadata == BTRFS_WQ_ENDIO_DIO_REPAIR &&
+		     !(bio->bi_rw & REQ_WRITE))) {
 		INIT_WORK(&end_io_wq->work.normal_work, dio_end_workqueue_fn);
+		queue_work(system_wq, &end_io_wq->work.normal_work);
+		return;
+	}
 
 	if (bio->bi_rw & REQ_WRITE) {
-		if (end_io_wq->metadata == BTRFS_WQ_ENDIO_METADATA)
-			btrfs_queue_work(fs_info->endio_meta_write_workers,
-					 &end_io_wq->work);
-		else if (end_io_wq->metadata == BTRFS_WQ_ENDIO_FREE_SPACE)
-			btrfs_queue_work(fs_info->endio_freespace_worker,
-					 &end_io_wq->work);
-		else if (end_io_wq->metadata == BTRFS_WQ_ENDIO_RAID56)
-			btrfs_queue_work(fs_info->endio_raid56_workers,
-					 &end_io_wq->work);
-		else
-			btrfs_queue_work(fs_info->endio_write_workers,
-					 &end_io_wq->work);
+		if (end_io_wq->metadata == BTRFS_WQ_ENDIO_METADATA) {
+			wq = fs_info->endio_meta_write_workers;
+			func = btrfs_endio_meta_write_helper;
+		} else if (end_io_wq->metadata == BTRFS_WQ_ENDIO_FREE_SPACE) {
+			wq = fs_info->endio_freespace_worker;
+			func = btrfs_freespace_write_helper;
+		} else if (end_io_wq->metadata == BTRFS_WQ_ENDIO_RAID56) {
+			wq = fs_info->endio_raid56_workers;
+			func = btrfs_endio_raid56_helper;
+		} else {
+			wq = fs_info->endio_write_workers;
+			func = btrfs_endio_write_helper;
+		}
 	} else {
-		if (unlikely(end_io_wq->metadata == BTRFS_WQ_ENDIO_DIO_REPAIR))
-			queue_work(system_wq, &end_io_wq->work.normal_work);
-		else if (end_io_wq->metadata == BTRFS_WQ_ENDIO_RAID56)
-			btrfs_queue_work(fs_info->endio_raid56_workers,
-					 &end_io_wq->work);
-		else if (end_io_wq->metadata)
-			btrfs_queue_work(fs_info->endio_meta_workers,
-					 &end_io_wq->work);
-		else
-			btrfs_queue_work(fs_info->endio_workers,
-					 &end_io_wq->work);
+		if (end_io_wq->metadata == BTRFS_WQ_ENDIO_RAID56) {
+			wq = fs_info->endio_raid56_workers;
+			func = btrfs_endio_raid56_helper;
+		} else if (end_io_wq->metadata) {
+			wq = fs_info->endio_meta_workers;
+			func = btrfs_endio_meta_helper;
+		} else {
+			wq = fs_info->endio_workers;
+			func = btrfs_endio_helper;
+		}
 	}
+
+	btrfs_init_work(&end_io_wq->work, func, end_workqueue_fn, NULL, NULL);
+	btrfs_queue_work(wq, &end_io_wq->work);
 }
 
 /*
@@ -857,7 +862,7 @@ int btrfs_wq_submit_bio(struct btrfs_fs_info *fs_info, struct inode *inode,
 	async->submit_bio_start = submit_bio_start;
 	async->submit_bio_done = submit_bio_done;
 
-	btrfs_init_work(&async->work, run_one_async_start,
+	btrfs_init_work(&async->work, btrfs_worker_helper, run_one_async_start,
 			run_one_async_done, run_one_async_free);
 
 	async->bio_flags = bio_flags;
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index ef0845d..e105558 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -552,7 +552,8 @@ static int cache_block_group(struct btrfs_block_group_cache *cache,
 	caching_ctl->block_group = cache;
 	caching_ctl->progress = cache->key.objectid;
 	atomic_set(&caching_ctl->count, 1);
-	btrfs_init_work(&caching_ctl->work, caching_thread, NULL, NULL);
+	btrfs_init_work(&caching_ctl->work, btrfs_cache_helper,
+			caching_thread, NULL, NULL);
 
 	spin_lock(&cache->lock);
 	/*
@@ -2749,8 +2750,8 @@ int btrfs_async_run_delayed_refs(struct btrfs_root *root,
 		async->sync = 0;
 	init_completion(&async->wait);
 
-	btrfs_init_work(&async->work, delayed_ref_async_start,
-			NULL, NULL);
+	btrfs_init_work(&async->work, btrfs_extent_refs_helper,
+			delayed_ref_async_start, NULL, NULL);
 
 	btrfs_queue_work(root->fs_info->extent_workers, &async->work);
 
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index de1df3a..c591af5 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1111,8 +1111,10 @@ static int cow_file_range_async(struct inode *inode, struct page *locked_page,
 		async_cow->end = cur_end;
 		INIT_LIST_HEAD(&async_cow->extents);
 
-		btrfs_init_work(&async_cow->work, async_cow_start,
-				async_cow_submit, async_cow_free);
+		btrfs_init_work(&async_cow->work,
+				btrfs_delalloc_helper,
+				async_cow_start, async_cow_submit,
+				async_cow_free);
 
 		nr_pages = (cur_end - start + PAGE_CACHE_SIZE) >>
 			PAGE_CACHE_SHIFT;
@@ -1924,7 +1926,8 @@ static int btrfs_writepage_start_hook(struct page *page, u64 start, u64 end)
 
 	SetPageChecked(page);
 	page_cache_get(page);
-	btrfs_init_work(&fixup->work, btrfs_writepage_fixup_worker, NULL, NULL);
+	btrfs_init_work(&fixup->work, btrfs_fixup_helper,
+			btrfs_writepage_fixup_worker, NULL, NULL);
 	fixup->page = page;
 	btrfs_queue_work(root->fs_info->fixup_workers, &fixup->work);
 	return -EBUSY;
@@ -2869,7 +2872,8 @@ static int btrfs_writepage_end_io_hook(struct page *page, u64 start, u64 end,
 	struct inode *inode = page->mapping->host;
 	struct btrfs_root *root = BTRFS_I(inode)->root;
 	struct btrfs_ordered_extent *ordered_extent = NULL;
-	struct btrfs_workqueue *workers;
+	struct btrfs_workqueue *wq;
+	btrfs_work_func_t func;
 
 	trace_btrfs_writepage_end_io_hook(page, start, end, uptodate);
 
@@ -2878,13 +2882,17 @@ static int btrfs_writepage_end_io_hook(struct page *page, u64 start, u64 end,
 					    end - start + 1, uptodate))
 		return 0;
 
-	btrfs_init_work(&ordered_extent->work, finish_ordered_fn, NULL, NULL);
+	if (btrfs_is_free_space_inode(inode)) {
+		wq = root->fs_info->endio_freespace_worker;
+		func = btrfs_freespace_write_helper;
+	} else {
+		wq = root->fs_info->endio_write_workers;
+		func = btrfs_endio_write_helper;
+	}
 
-	if (btrfs_is_free_space_inode(inode))
-		workers = root->fs_info->endio_freespace_worker;
-	else
-		workers = root->fs_info->endio_write_workers;
-	btrfs_queue_work(workers, &ordered_extent->work);
+	btrfs_init_work(&ordered_extent->work, func, finish_ordered_fn, NULL,
+			NULL);
+	btrfs_queue_work(wq, &ordered_extent->work);
 
 	return 0;
 }
@@ -7503,7 +7511,8 @@ again:
 	if (!ret)
 		goto out_test;
 
-	btrfs_init_work(&ordered->work, finish_ordered_fn, NULL, NULL);
+	btrfs_init_work(&ordered->work, btrfs_endio_write_helper,
+			finish_ordered_fn, NULL, NULL);
 	btrfs_queue_work(root->fs_info->endio_write_workers,
 			 &ordered->work);
 out_test:
@@ -8866,7 +8875,9 @@ struct btrfs_delalloc_work *btrfs_alloc_delalloc_work(struct inode *inode,
 	work->inode = inode;
 	work->wait = wait;
 	work->delay_iput = delay_iput;
-	btrfs_init_work(&work->work, btrfs_run_delalloc_work, NULL, NULL);
+	WARN_ON_ONCE(!inode);
+	btrfs_init_work(&work->work, btrfs_flush_delalloc_helper,
+			btrfs_run_delalloc_work, NULL, NULL);
 
 	return work;
 }
diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index 963895c..ac734ec 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -615,6 +615,7 @@ int btrfs_wait_ordered_extents(struct btrfs_root *root, int nr)
 		spin_unlock(&root->ordered_extent_lock);
 
 		btrfs_init_work(&ordered->flush_work,
+				btrfs_flush_delalloc_helper,
 				btrfs_run_ordered_extent_work, NULL, NULL);
 		list_add_tail(&ordered->work_list, &works);
 		btrfs_queue_work(root->fs_info->flush_workers,
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index c11acca..b4a27bf 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -2722,6 +2722,7 @@ qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid,
 	memset(&fs_info->qgroup_rescan_work, 0,
 	       sizeof(fs_info->qgroup_rescan_work));
 	btrfs_init_work(&fs_info->qgroup_rescan_work,
+			btrfs_qgroup_rescan_helper,
 			btrfs_qgroup_rescan_worker, NULL, NULL);
 
 	if (ret) {
diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 6c5c5ae..6a41631 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -1416,7 +1416,8 @@ cleanup:
 
 static void async_rmw_stripe(struct btrfs_raid_bio *rbio)
 {
-	btrfs_init_work(&rbio->work, rmw_work, NULL, NULL);
+	btrfs_init_work(&rbio->work, btrfs_rmw_helper,
+			rmw_work, NULL, NULL);
 
 	btrfs_queue_work(rbio->fs_info->rmw_workers,
 			 &rbio->work);
@@ -1424,7 +1425,8 @@ static void async_rmw_stripe(struct btrfs_raid_bio *rbio)
 
 static void async_read_rebuild(struct btrfs_raid_bio *rbio)
 {
-	btrfs_init_work(&rbio->work, read_rebuild_work, NULL, NULL);
+	btrfs_init_work(&rbio->work, btrfs_rmw_helper,
+			read_rebuild_work, NULL, NULL);
 
 	btrfs_queue_work(rbio->fs_info->rmw_workers,
 			 &rbio->work);
@@ -1665,7 +1667,8 @@ static void btrfs_raid_unplug(struct blk_plug_cb *cb, bool from_schedule)
 	plug = container_of(cb, struct btrfs_plug_cb, cb);
 
 	if (from_schedule) {
-		btrfs_init_work(&plug->work, unplug_work, NULL, NULL);
+		btrfs_init_work(&plug->work, btrfs_rmw_helper,
+				unplug_work, NULL, NULL);
 		btrfs_queue_work(plug->info->rmw_workers,
 				 &plug->work);
 		return;
diff --git a/fs/btrfs/reada.c b/fs/btrfs/reada.c
index 85eb55d..b63ae20 100644
--- a/fs/btrfs/reada.c
+++ b/fs/btrfs/reada.c
@@ -798,7 +798,8 @@ static void reada_start_machine(struct btrfs_fs_info *fs_info)
 		/* FIXME we cannot handle this properly right now */
 		BUG();
 	}
-	btrfs_init_work(&rmw->work, reada_start_machine_worker, NULL, NULL);
+	btrfs_init_work(&rmw->work, btrfs_readahead_helper,
+			reada_start_machine_worker, NULL, NULL);
 	rmw->fs_info = fs_info;
 
 	btrfs_queue_work(fs_info->readahead_workers, &rmw->work);
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 695623e..771f0ab 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -427,8 +427,8 @@ struct scrub_ctx *scrub_setup_ctx(struct btrfs_device *dev, int is_dev_replace)
 		sbio->index = i;
 		sbio->sctx = sctx;
 		sbio->page_count = 0;
-		btrfs_init_work(&sbio->work, scrub_bio_end_io_worker,
-				NULL, NULL);
+		btrfs_init_work(&sbio->work, btrfs_scrub_helper,
+				scrub_bio_end_io_worker, NULL, NULL);
 
 		if (i != SCRUB_BIOS_PER_SCTX - 1)
 			sctx->bios[i]->next_free = i + 1;
@@ -997,8 +997,8 @@ nodatasum_case:
 		fixup_nodatasum->root = fs_info->extent_root;
 		fixup_nodatasum->mirror_num = failed_mirror_index + 1;
 		scrub_pending_trans_workers_inc(sctx);
-		btrfs_init_work(&fixup_nodatasum->work, scrub_fixup_nodatasum,
-				NULL, NULL);
+		btrfs_init_work(&fixup_nodatasum->work, btrfs_scrub_helper,
+				scrub_fixup_nodatasum, NULL, NULL);
 		btrfs_queue_work(fs_info->scrub_workers,
 				 &fixup_nodatasum->work);
 		goto out;
@@ -1624,7 +1624,8 @@ static void scrub_wr_bio_end_io(struct bio *bio, int err)
 	sbio->err = err;
 	sbio->bio = bio;
 
-	btrfs_init_work(&sbio->work, scrub_wr_bio_end_io_worker, NULL, NULL);
+	btrfs_init_work(&sbio->work, btrfs_scrubwrc_helper,
+			 scrub_wr_bio_end_io_worker, NULL, NULL);
 	btrfs_queue_work(fs_info->scrub_wr_completion_workers, &sbio->work);
 }
 
@@ -3232,7 +3233,8 @@ static int copy_nocow_pages(struct scrub_ctx *sctx, u64 logical, u64 len,
 	nocow_ctx->len = len;
 	nocow_ctx->mirror_num = mirror_num;
 	nocow_ctx->physical_for_dev_replace = physical_for_dev_replace;
-	btrfs_init_work(&nocow_ctx->work, copy_nocow_pages_worker, NULL, NULL);
+	btrfs_init_work(&nocow_ctx->work, btrfs_scrubnc_helper,
+			copy_nocow_pages_worker, NULL, NULL);
 	INIT_LIST_HEAD(&nocow_ctx->inodes);
 	btrfs_queue_work(fs_info->scrub_nocow_workers,
 			 &nocow_ctx->work);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index db67c8a..f20c288 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5887,7 +5887,8 @@ struct btrfs_device *btrfs_alloc_device(struct btrfs_fs_info *fs_info,
 	else
 		generate_random_uuid(dev->uuid);
 
-	btrfs_init_work(&dev->work, pending_bios_fn, NULL, NULL);
+	btrfs_init_work(&dev->work, btrfs_submit_helper,
+			pending_bios_fn, NULL, NULL);
 
 	return dev;
 }
-- 
1.8.1.4


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v3] Btrfs: fix task hang under heavy compressed write
  2014-08-15 15:36 ` [PATCH v3] " Liu Bo
@ 2014-08-15 16:05   ` Chris Mason
  2014-08-16  7:28   ` Miao Xie
  2014-08-25 14:58   ` Chris Mason
  2 siblings, 0 replies; 22+ messages in thread
From: Chris Mason @ 2014-08-15 16:05 UTC (permalink / raw)
  To: Liu Bo, linux-btrfs

On 08/15/2014 11:36 AM, Liu Bo wrote:

[ snip ]
> 
> Besides, there are other cases that can lead to deadlock, but the real problem
> is that all btrfs workqueues share one work->func, normal_work_helper. So this
> patch gives each workqueue its own helper function, which is just a thin
> wrapper of normal_work_helper.
> 
> With this patch, I no longer hit the above hang.
> 
> Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> ---
> v3:
>   - Adopting Chris's advice, this has each workqueue provide its own helper
>     function.
>   - This is rebased on the latest Chris's for-linus branch.

This is what I had in mind, thanks Liu! I'll test and queue it up for rc2.

-chris

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] Btrfs: fix task hang under heavy compressed write
  2014-08-14  9:27     ` Martin Steigerwald
@ 2014-08-15 17:51       ` Martin Steigerwald
  0 siblings, 0 replies; 22+ messages in thread
From: Martin Steigerwald @ 2014-08-15 17:51 UTC (permalink / raw)
  To: bo.li.liu; +Cc: linux-btrfs, Chris Mason, miaox, Marc MERLIN, Torbjørn

Am Donnerstag, 14. August 2014, 11:27:06 schrieb Martin Steigerwald:
> Am Mittwoch, 13. August 2014, 23:20:46 schrieb Liu Bo:
> > On Wed, Aug 13, 2014 at 01:54:40PM +0200, Martin Steigerwald wrote:
> > > Am Dienstag, 12. August 2014, 15:44:59 schrieb Liu Bo:
> > > > This has been reported and discussed for a long time, and this hang
> > > > occurs
> > > > in both 3.15 and 3.16.
> > > 
> > > Liu, is this safe for testing yet?
> > 
> > Yes, I've confirmed that this hang doesn't occur by running my tests for 2
> > days(usually it hangs in 2 hours).
> > 
> > But...
> > As Chris said in the thread, this is more a workaround, there're other
> > potential issues that would lead to similar deadlock.
> > 
> > I'm trying to write a real fix instead of a workaround.
> 
> Thanks, so this one goes together with the fixed compressed write corruption
> one? I would put them onto 3.16.1. With 3.17 I want to wait till rc2 I
> think.

Okay, I am testing this patch and the compressed write corruption fix on 3.16.1 
now. The v3 patch seems to be 3.17 material, as it doesn't apply cleanly onto 
3.16.1.

> > thanks,
> > -liubo
> > 
> > > Thanks,
> > > Martin
> > > 
> > > > Btrfs now migrates to use kernel workqueue, but it introduces this
> > > > hang
> > > > problem.
> > > > 
> > > > Btrfs has a kind of work queued as an ordered way, which means that
> > > > its
> > > > ordered_func() must be processed in the way of FIFO, so it usually
> > > > looks
> > > > like --
> > > > 
> > > > normal_work_helper(arg)
> > > > 
> > > >     work = container_of(arg, struct btrfs_work, normal_work);
> > > >     
> > > >     work->func() <---- (we name it work X)
> > > >     for ordered_work in wq->ordered_list
> > > >     
> > > >             ordered_work->ordered_func()
> > > >             ordered_work->ordered_free()
> > > > 
> > > > The hang is a rare case, first when we find free space, we get an
> > > > uncached
> > > > block group, then we go to read its free space cache inode for free
> > > > space
> > > > information, so it will
> > > > 
> > > > file a readahead request
> > > > 
> > > >     btrfs_readpages()
> > > >     
> > > >          for page that is not in page cache
> > > >          
> > > >                 __do_readpage()
> > > >                 
> > > >                      submit_extent_page()
> > > >                      
> > > >                            btrfs_submit_bio_hook()
> > > >                            
> > > >                                  btrfs_bio_wq_end_io()
> > > >                                  submit_bio()
> > > >                                  end_workqueue_bio() <--(ret by the
> > > >                                  1st
> > > > 
> > > > endio) queue a work(named work Y) for the 2nd also the real endio()
> > > > 
> > > > So the hang occurs when work Y's work_struct and work X's work_struct
> > > > happens to share the same address.
> > > > 
> > > > A bit more explanation,
> > > > 
> > > > A,B,C -- struct btrfs_work
> > > > arg   -- struct work_struct
> > > > 
> > > > kthread:
> > > > worker_thread()
> > > > 
> > > >     pick up a work_struct from @worklist
> > > >     process_one_work(arg)
> > > > 	
> > > > 	worker->current_work = arg;  <-- arg is A->normal_work
> > > > 	worker->current_func(arg)
> > > > 	
> > > > 		normal_work_helper(arg)
> > > > 		
> > > > 		     A = container_of(arg, struct btrfs_work, normal_work);
> > > > 		     
> > > > 		     A->func()
> > > > 		     A->ordered_func()
> > > > 		     A->ordered_free()  <-- A gets freed
> > > > 		     
> > > > 		     B->ordered_func()
> > > > 			  
> > > > 			  submit_compressed_extents()
> > > > 			  
> > > > 			      find_free_extent()
> > > > 				  
> > > > 				  load_free_space_inode()
> > > > 				  
> > > > 				      ...   <-- (the above readhead stack)
> > > > 				      end_workqueue_bio()
> > > > 					   
> > > > 					   btrfs_queue_work(work C)
> > > > 		     
> > > > 		     B->ordered_free()
> > > > 
> > > > As if work A has a high priority in wq->ordered_list and there are
> > > > more
> > > > ordered works queued after it, such as B->ordered_func(), its memory
> > > > could
> > > > have been freed before normal_work_helper() returns, which means that
> > > > kernel workqueue code worker_thread() still has worker->current_work
> > > > pointer to be work A->normal_work's, ie. arg's address.
> > > > 
> > > > Meanwhile, work C is allocated after work A is freed, work
> > > > C->normal_work
> > > > and work A->normal_work are likely to share the same address(I
> > > > confirmed
> > > > this with ftrace output, so I'm not just guessing, it's rare though).
> > > > 
> > > > When another kthread picks up work C->normal_work to process, and
> > > > finds
> > > > our
> > > > kthread is processing it(see find_worker_executing_work()), it'll
> > > > think
> > > > work C as a collision and skip then, which ends up nobody processing
> > > > work C.
> > > > 
> > > > So the situation is that our kthread is waiting forever on work C.
> > > > 
> > > > The key point is that they shouldn't have the same address, so this
> > > > defers
> > > > ->ordered_free() and does a batched free to avoid that.
> > > > 
> > > > Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> > > > ---
> > > > 
> > > >  fs/btrfs/async-thread.c | 12 ++++++++++--
> > > >  1 file changed, 10 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c
> > > > index 5a201d8..2ac01b3 100644
> > > > --- a/fs/btrfs/async-thread.c
> > > > +++ b/fs/btrfs/async-thread.c
> > > > @@ -195,6 +195,7 @@ static void run_ordered_work(struct __btrfs_workqueue *wq)
> > > >  	struct btrfs_work *work;
> > > >  	spinlock_t *lock = &wq->list_lock;
> > > >  	unsigned long flags;
> > > > +	LIST_HEAD(free_list);
> > > > 
> > > >  	while (1) {
> > > >  		spin_lock_irqsave(lock, flags);
> > > > @@ -219,17 +220,24 @@ static void run_ordered_work(struct __btrfs_workqueue *wq)
> > > > 
> > > >  		/* now take the lock again and drop our item from the list */
> > > >  		spin_lock_irqsave(lock, flags);
> > > > -		list_del(&work->ordered_list);
> > > > +		list_move_tail(&work->ordered_list, &free_list);
> > > >  		spin_unlock_irqrestore(lock, flags);
> > > > 
> > > >  		/*
> > > >  		 * we don't want to call the ordered free functions
> > > >  		 * with the lock held though
> > > >  		 */
> > > > +	}
> > > > +	spin_unlock_irqrestore(lock, flags);
> > > > +
> > > > +	while (!list_empty(&free_list)) {
> > > > +		work = list_entry(free_list.next, struct btrfs_work,
> > > > +				  ordered_list);
> > > > +
> > > > +		list_del(&work->ordered_list);
> > > >  		work->ordered_free(work);
> > > >  		trace_btrfs_all_work_done(work);
> > > >  	}
> > > > -	spin_unlock_irqrestore(lock, flags);
> > > >  }
> > > > 
> > > >  static void normal_work_helper(struct work_struct *arg)

-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3] Btrfs: fix task hang under heavy compressed write
  2014-08-15 15:36 ` [PATCH v3] " Liu Bo
  2014-08-15 16:05   ` Chris Mason
@ 2014-08-16  7:28   ` Miao Xie
  2014-08-18  7:32     ` Liu Bo
  2014-08-25 14:58   ` Chris Mason
  2 siblings, 1 reply; 22+ messages in thread
From: Miao Xie @ 2014-08-16  7:28 UTC (permalink / raw)
  To: Liu Bo, linux-btrfs

On Fri, 15 Aug 2014 23:36:53 +0800, Liu Bo wrote:
> This has been reported and discussed for a long time, and this hang occurs in
> both 3.15 and 3.16.
> 
> Btrfs has recently migrated to the kernel workqueue, but that migration
> introduced this hang.
> 
> Btrfs has a kind of work that is queued in an ordered way, which means that
> its ordered_func() must be processed FIFO, so it usually looks like --
> 
> normal_work_helper(arg)
>     work = container_of(arg, struct btrfs_work, normal_work);
> 
>     work->func() <---- (we name it work X)
>     for ordered_work in wq->ordered_list
>             ordered_work->ordered_func()
>             ordered_work->ordered_free()
> 
> The hang is a rare case: first, while looking for free space, we get an
> uncached block group, so we go to read its free space cache inode for free
> space information, and that will
> 
> file a readahead request
>     btrfs_readpages()
>          for page that is not in page cache
>                 __do_readpage()
>                      submit_extent_page()
>                            btrfs_submit_bio_hook()
>                                  btrfs_bio_wq_end_io()
>                                  submit_bio()
>                                  end_workqueue_bio() <--(ret by the 1st endio)
>                                       queue a work(named work Y) for the 2nd
>                                       also the real endio()
> 
> So the hang occurs when work Y's work_struct and work X's work_struct happen
> to share the same address.
> 
> A bit more explanation,
> 
> A,B,C -- struct btrfs_work
> arg   -- struct work_struct
> 
> kthread:
> worker_thread()
>     pick up a work_struct from @worklist
>     process_one_work(arg)
> 	worker->current_work = arg;  <-- arg is A->normal_work
> 	worker->current_func(arg)
> 		normal_work_helper(arg)
> 		     A = container_of(arg, struct btrfs_work, normal_work);
> 
> 		     A->func()
> 		     A->ordered_func()
> 		     A->ordered_free()  <-- A gets freed
> 
> 		     B->ordered_func()
> 			  submit_compressed_extents()
> 			      find_free_extent()
> 				  load_free_space_inode()
> 				      ...   <-- (the above readahead stack)
> 				      end_workqueue_bio()
> 					   btrfs_queue_work(work C)
> 		     B->ordered_free()
> 
> If work A sits at the head of wq->ordered_list and there are more ordered
> works queued after it, such as B, A's memory could have been freed before
> normal_work_helper() returns, which means that the kernel workqueue code in
> worker_thread() still has worker->current_work pointing at A->normal_work,
> ie. arg's address.
> 
> Meanwhile, work C is allocated after work A is freed, so C->normal_work
> and A->normal_work are likely to share the same address (I confirmed this
> with ftrace output, so I'm not just guessing; it's rare, though).
> 
> When another kthread picks up work C->normal_work to process and finds that
> our kthread is processing it (see find_worker_executing_work()), it treats
> work C as a collision and skips it, which ends up with nobody processing
> work C.
> 
> So the situation is that our kthread is waiting forever on work C.
> 
> Besides, there are other cases that can lead to deadlock, but the real
> problem is that all btrfs workqueues share one work->func, namely
> normal_work_helper, so this patch makes each workqueue have its own helper
> function, which is only a wrapper of normal_work_helper.
> 
> With this patch, I no longer hit the above hang.
> 
> Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> ---
> v3:
>   - Adopting Chris's advice, this has each workqueue provide its own helper
>     function.
>   - This is rebased on Chris's latest for-linus branch.
> 
> v2:
>   - This changes a bit to not defer all ->ordered_free(), but only defer the
>     work that triggers this run_ordered_work().  Actually we don't need to
>     defer other ->ordered_free() because their work cannot be this kthread
>     worker's @current_work.  We can benefit from it since this can pin less
>     memory during the period.
> 
>  fs/btrfs/async-thread.c  | 44 +++++++++++++++++++++++++++------
>  fs/btrfs/async-thread.h  | 28 ++++++++++++++++++++-
>  fs/btrfs/delayed-inode.c |  4 +--
>  fs/btrfs/disk-io.c       | 63 ++++++++++++++++++++++++++----------------------
>  fs/btrfs/extent-tree.c   |  7 +++---
>  fs/btrfs/inode.c         | 35 ++++++++++++++++++---------
>  fs/btrfs/ordered-data.c  |  1 +
>  fs/btrfs/qgroup.c        |  1 +
>  fs/btrfs/raid56.c        |  9 ++++---
>  fs/btrfs/reada.c         |  3 ++-
>  fs/btrfs/scrub.c         | 14 ++++++-----
>  fs/btrfs/volumes.c       |  3 ++-
>  12 files changed, 146 insertions(+), 66 deletions(-)
> 
> diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c
> index 5a201d8..fbd76de 100644
> --- a/fs/btrfs/async-thread.c
> +++ b/fs/btrfs/async-thread.c
> @@ -22,7 +22,6 @@
>  #include <linux/list.h>
>  #include <linux/spinlock.h>
>  #include <linux/freezer.h>
> -#include <linux/workqueue.h>
>  #include "async-thread.h"
>  #include "ctree.h"
>  
> @@ -55,8 +54,39 @@ struct btrfs_workqueue {
>  	struct __btrfs_workqueue *high;
>  };
>  
> -static inline struct __btrfs_workqueue
> -*__btrfs_alloc_workqueue(const char *name, int flags, int max_active,
> +static void normal_work_helper(struct btrfs_work *work);
> +
> +#define BTRFS_WORK_HELPER(name)					\
> +void btrfs_##name(struct work_struct *arg)				\
> +{									\
> +	struct btrfs_work *work = container_of(arg, struct btrfs_work,	\
> +					       normal_work);		\
> +	normal_work_helper(work);					\
> +}
> +
> +BTRFS_WORK_HELPER(worker_helper);
> +BTRFS_WORK_HELPER(delalloc_helper);
> +BTRFS_WORK_HELPER(flush_delalloc_helper);
> +BTRFS_WORK_HELPER(cache_helper);
> +BTRFS_WORK_HELPER(submit_helper);
> +BTRFS_WORK_HELPER(fixup_helper);
> +BTRFS_WORK_HELPER(endio_helper);
> +BTRFS_WORK_HELPER(endio_meta_helper);
> +BTRFS_WORK_HELPER(endio_meta_write_helper);
> +BTRFS_WORK_HELPER(endio_raid56_helper);
> +BTRFS_WORK_HELPER(rmw_helper);
> +BTRFS_WORK_HELPER(endio_write_helper);
> +BTRFS_WORK_HELPER(freespace_write_helper);
> +BTRFS_WORK_HELPER(delayed_meta_helper);
> +BTRFS_WORK_HELPER(readahead_helper);
> +BTRFS_WORK_HELPER(qgroup_rescan_helper);
> +BTRFS_WORK_HELPER(extent_refs_helper);
> +BTRFS_WORK_HELPER(scrub_helper);
> +BTRFS_WORK_HELPER(scrubwrc_helper);
> +BTRFS_WORK_HELPER(scrubnc_helper);
> +
> +static struct __btrfs_workqueue *
> +__btrfs_alloc_workqueue(const char *name, int flags, int max_active,
>  			 int thresh)
>  {
>  	struct __btrfs_workqueue *ret = kzalloc(sizeof(*ret), GFP_NOFS);
> @@ -232,13 +262,11 @@ static void run_ordered_work(struct __btrfs_workqueue *wq)
>  	spin_unlock_irqrestore(lock, flags);
>  }
>  
> -static void normal_work_helper(struct work_struct *arg)
> +static void normal_work_helper(struct btrfs_work *work)
>  {
> -	struct btrfs_work *work;
>  	struct __btrfs_workqueue *wq;
>  	int need_order = 0;
>  
> -	work = container_of(arg, struct btrfs_work, normal_work);
>  	/*
>  	 * We should not touch things inside work in the following cases:
>  	 * 1) after work->func() if it has no ordered_free
> @@ -262,7 +290,7 @@ static void normal_work_helper(struct work_struct *arg)
>  		trace_btrfs_all_work_done(work);
>  }
>  
> -void btrfs_init_work(struct btrfs_work *work,
> +void btrfs_init_work(struct btrfs_work *work, btrfs_work_func_t uniq_func,
>  		     btrfs_func_t func,
>  		     btrfs_func_t ordered_func,
>  		     btrfs_func_t ordered_free)
> @@ -270,7 +298,7 @@ void btrfs_init_work(struct btrfs_work *work,
>  	work->func = func;
>  	work->ordered_func = ordered_func;
>  	work->ordered_free = ordered_free;
> -	INIT_WORK(&work->normal_work, normal_work_helper);
> +	INIT_WORK(&work->normal_work, uniq_func);


Why not put the function pointer into the btrfs workqueue structure? Then we
wouldn't need to change the interface of btrfs_init_work() (just remove the
INIT_WORK in it), and we could init the normal work from the function pointer
in the btrfs workqueue structure when we queue the work. In this way, we
wouldn't need to change so much, and the code would be clearer.

Thanks
Miao

>  	INIT_LIST_HEAD(&work->ordered_list);
>  	work->flags = 0;
>  }
> diff --git a/fs/btrfs/async-thread.h b/fs/btrfs/async-thread.h
> index 9c6b66d..e9e31c9 100644
> --- a/fs/btrfs/async-thread.h
> +++ b/fs/btrfs/async-thread.h
> @@ -19,12 +19,14 @@
>  
>  #ifndef __BTRFS_ASYNC_THREAD_
>  #define __BTRFS_ASYNC_THREAD_
> +#include <linux/workqueue.h>
>  
>  struct btrfs_workqueue;
>  /* Internal use only */
>  struct __btrfs_workqueue;
>  struct btrfs_work;
>  typedef void (*btrfs_func_t)(struct btrfs_work *arg);
> +typedef void (*btrfs_work_func_t)(struct work_struct *arg);
>  
>  struct btrfs_work {
>  	btrfs_func_t func;
> @@ -38,11 +40,35 @@ struct btrfs_work {
>  	unsigned long flags;
>  };
>  
> +#define BTRFS_WORK_HELPER_PROTO(name)					\
> +void btrfs_##name(struct work_struct *arg)
> +
> +BTRFS_WORK_HELPER_PROTO(worker_helper);
> +BTRFS_WORK_HELPER_PROTO(delalloc_helper);
> +BTRFS_WORK_HELPER_PROTO(flush_delalloc_helper);
> +BTRFS_WORK_HELPER_PROTO(cache_helper);
> +BTRFS_WORK_HELPER_PROTO(submit_helper);
> +BTRFS_WORK_HELPER_PROTO(fixup_helper);
> +BTRFS_WORK_HELPER_PROTO(endio_helper);
> +BTRFS_WORK_HELPER_PROTO(endio_meta_helper);
> +BTRFS_WORK_HELPER_PROTO(endio_meta_write_helper);
> +BTRFS_WORK_HELPER_PROTO(endio_raid56_helper);
> +BTRFS_WORK_HELPER_PROTO(rmw_helper);
> +BTRFS_WORK_HELPER_PROTO(endio_write_helper);
> +BTRFS_WORK_HELPER_PROTO(freespace_write_helper);
> +BTRFS_WORK_HELPER_PROTO(delayed_meta_helper);
> +BTRFS_WORK_HELPER_PROTO(readahead_helper);
> +BTRFS_WORK_HELPER_PROTO(qgroup_rescan_helper);
> +BTRFS_WORK_HELPER_PROTO(extent_refs_helper);
> +BTRFS_WORK_HELPER_PROTO(scrub_helper);
> +BTRFS_WORK_HELPER_PROTO(scrubwrc_helper);
> +BTRFS_WORK_HELPER_PROTO(scrubnc_helper);
> +
>  struct btrfs_workqueue *btrfs_alloc_workqueue(const char *name,
>  					      int flags,
>  					      int max_active,
>  					      int thresh);
> -void btrfs_init_work(struct btrfs_work *work,
> +void btrfs_init_work(struct btrfs_work *work, btrfs_work_func_t helper,
>  		     btrfs_func_t func,
>  		     btrfs_func_t ordered_func,
>  		     btrfs_func_t ordered_free);
> diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
> index 0816814..054577b 100644
> --- a/fs/btrfs/delayed-inode.c
> +++ b/fs/btrfs/delayed-inode.c
> @@ -1395,8 +1395,8 @@ static int btrfs_wq_run_delayed_node(struct btrfs_delayed_root *delayed_root,
>  		return -ENOMEM;
>  
>  	async_work->delayed_root = delayed_root;
> -	btrfs_init_work(&async_work->work, btrfs_async_run_delayed_root,
> -			NULL, NULL);
> +	btrfs_init_work(&async_work->work, btrfs_delayed_meta_helper,
> +			btrfs_async_run_delayed_root, NULL, NULL);
>  	async_work->nr = nr;
>  
>  	btrfs_queue_work(root->fs_info->delayed_workers, &async_work->work);
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index fd9be2d..28098ba 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -39,7 +39,6 @@
>  #include "btrfs_inode.h"
>  #include "volumes.h"
>  #include "print-tree.h"
> -#include "async-thread.h"
>  #include "locking.h"
>  #include "tree-log.h"
>  #include "free-space-cache.h"
> @@ -714,42 +713,48 @@ static void end_workqueue_bio(struct bio *bio, int err)
>  {
>  	struct end_io_wq *end_io_wq = bio->bi_private;
>  	struct btrfs_fs_info *fs_info;
> +	struct btrfs_workqueue *wq;
> +	btrfs_work_func_t func;
>  
>  	fs_info = end_io_wq->info;
>  	end_io_wq->error = err;
>  
> -	if (likely(end_io_wq->metadata != BTRFS_WQ_ENDIO_DIO_REPAIR))
> -		btrfs_init_work(&end_io_wq->work, end_workqueue_fn, NULL,
> -				NULL);
> -	else
> +	if (unlikely(end_io_wq->metadata == BTRFS_WQ_ENDIO_DIO_REPAIR &&
> +		     !(bio->bi_rw & REQ_WRITE))) {
>  		INIT_WORK(&end_io_wq->work.normal_work, dio_end_workqueue_fn);
> +		queue_work(system_wq, &end_io_wq->work.normal_work);
> +		return;
> +	}
>  
>  	if (bio->bi_rw & REQ_WRITE) {
> -		if (end_io_wq->metadata == BTRFS_WQ_ENDIO_METADATA)
> -			btrfs_queue_work(fs_info->endio_meta_write_workers,
> -					 &end_io_wq->work);
> -		else if (end_io_wq->metadata == BTRFS_WQ_ENDIO_FREE_SPACE)
> -			btrfs_queue_work(fs_info->endio_freespace_worker,
> -					 &end_io_wq->work);
> -		else if (end_io_wq->metadata == BTRFS_WQ_ENDIO_RAID56)
> -			btrfs_queue_work(fs_info->endio_raid56_workers,
> -					 &end_io_wq->work);
> -		else
> -			btrfs_queue_work(fs_info->endio_write_workers,
> -					 &end_io_wq->work);
> +		if (end_io_wq->metadata == BTRFS_WQ_ENDIO_METADATA) {
> +			wq = fs_info->endio_meta_write_workers;
> +			func = btrfs_endio_meta_write_helper;
> +		} else if (end_io_wq->metadata == BTRFS_WQ_ENDIO_FREE_SPACE) {
> +			wq = fs_info->endio_freespace_worker;
> +			func = btrfs_freespace_write_helper;
> +		} else if (end_io_wq->metadata == BTRFS_WQ_ENDIO_RAID56) {
> +			wq = fs_info->endio_raid56_workers;
> +			func = btrfs_endio_raid56_helper;
> +		} else {
> +			wq = fs_info->endio_write_workers;
> +			func = btrfs_endio_write_helper;
> +		}
>  	} else {
> -		if (unlikely(end_io_wq->metadata == BTRFS_WQ_ENDIO_DIO_REPAIR))
> -			queue_work(system_wq, &end_io_wq->work.normal_work);
> -		else if (end_io_wq->metadata == BTRFS_WQ_ENDIO_RAID56)
> -			btrfs_queue_work(fs_info->endio_raid56_workers,
> -					 &end_io_wq->work);
> -		else if (end_io_wq->metadata)
> -			btrfs_queue_work(fs_info->endio_meta_workers,
> -					 &end_io_wq->work);
> -		else
> -			btrfs_queue_work(fs_info->endio_workers,
> -					 &end_io_wq->work);
> +		if (end_io_wq->metadata == BTRFS_WQ_ENDIO_RAID56) {
> +			wq = fs_info->endio_raid56_workers;
> +			func = btrfs_endio_raid56_helper;
> +		} else if (end_io_wq->metadata) {
> +			wq = fs_info->endio_meta_workers;
> +			func = btrfs_endio_meta_helper;
> +		} else {
> +			wq = fs_info->endio_workers;
> +			func = btrfs_endio_helper;
> +		}
>  	}
> +
> +	btrfs_init_work(&end_io_wq->work, func, end_workqueue_fn, NULL, NULL);
> +	btrfs_queue_work(wq, &end_io_wq->work);
>  }
>  
>  /*
> @@ -857,7 +862,7 @@ int btrfs_wq_submit_bio(struct btrfs_fs_info *fs_info, struct inode *inode,
>  	async->submit_bio_start = submit_bio_start;
>  	async->submit_bio_done = submit_bio_done;
>  
> -	btrfs_init_work(&async->work, run_one_async_start,
> +	btrfs_init_work(&async->work, btrfs_worker_helper, run_one_async_start,
>  			run_one_async_done, run_one_async_free);
>  
>  	async->bio_flags = bio_flags;
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index ef0845d..e105558 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -552,7 +552,8 @@ static int cache_block_group(struct btrfs_block_group_cache *cache,
>  	caching_ctl->block_group = cache;
>  	caching_ctl->progress = cache->key.objectid;
>  	atomic_set(&caching_ctl->count, 1);
> -	btrfs_init_work(&caching_ctl->work, caching_thread, NULL, NULL);
> +	btrfs_init_work(&caching_ctl->work, btrfs_cache_helper,
> +			caching_thread, NULL, NULL);
>  
>  	spin_lock(&cache->lock);
>  	/*
> @@ -2749,8 +2750,8 @@ int btrfs_async_run_delayed_refs(struct btrfs_root *root,
>  		async->sync = 0;
>  	init_completion(&async->wait);
>  
> -	btrfs_init_work(&async->work, delayed_ref_async_start,
> -			NULL, NULL);
> +	btrfs_init_work(&async->work, btrfs_extent_refs_helper,
> +			delayed_ref_async_start, NULL, NULL);
>  
>  	btrfs_queue_work(root->fs_info->extent_workers, &async->work);
>  
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index de1df3a..c591af5 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -1111,8 +1111,10 @@ static int cow_file_range_async(struct inode *inode, struct page *locked_page,
>  		async_cow->end = cur_end;
>  		INIT_LIST_HEAD(&async_cow->extents);
>  
> -		btrfs_init_work(&async_cow->work, async_cow_start,
> -				async_cow_submit, async_cow_free);
> +		btrfs_init_work(&async_cow->work,
> +				btrfs_delalloc_helper,
> +				async_cow_start, async_cow_submit,
> +				async_cow_free);
>  
>  		nr_pages = (cur_end - start + PAGE_CACHE_SIZE) >>
>  			PAGE_CACHE_SHIFT;
> @@ -1924,7 +1926,8 @@ static int btrfs_writepage_start_hook(struct page *page, u64 start, u64 end)
>  
>  	SetPageChecked(page);
>  	page_cache_get(page);
> -	btrfs_init_work(&fixup->work, btrfs_writepage_fixup_worker, NULL, NULL);
> +	btrfs_init_work(&fixup->work, btrfs_fixup_helper,
> +			btrfs_writepage_fixup_worker, NULL, NULL);
>  	fixup->page = page;
>  	btrfs_queue_work(root->fs_info->fixup_workers, &fixup->work);
>  	return -EBUSY;
> @@ -2869,7 +2872,8 @@ static int btrfs_writepage_end_io_hook(struct page *page, u64 start, u64 end,
>  	struct inode *inode = page->mapping->host;
>  	struct btrfs_root *root = BTRFS_I(inode)->root;
>  	struct btrfs_ordered_extent *ordered_extent = NULL;
> -	struct btrfs_workqueue *workers;
> +	struct btrfs_workqueue *wq;
> +	btrfs_work_func_t func;
>  
>  	trace_btrfs_writepage_end_io_hook(page, start, end, uptodate);
>  
> @@ -2878,13 +2882,17 @@ static int btrfs_writepage_end_io_hook(struct page *page, u64 start, u64 end,
>  					    end - start + 1, uptodate))
>  		return 0;
>  
> -	btrfs_init_work(&ordered_extent->work, finish_ordered_fn, NULL, NULL);
> +	if (btrfs_is_free_space_inode(inode)) {
> +		wq = root->fs_info->endio_freespace_worker;
> +		func = btrfs_freespace_write_helper;
> +	} else {
> +		wq = root->fs_info->endio_write_workers;
> +		func = btrfs_endio_write_helper;
> +	}
>  
> -	if (btrfs_is_free_space_inode(inode))
> -		workers = root->fs_info->endio_freespace_worker;
> -	else
> -		workers = root->fs_info->endio_write_workers;
> -	btrfs_queue_work(workers, &ordered_extent->work);
> +	btrfs_init_work(&ordered_extent->work, func, finish_ordered_fn, NULL,
> +			NULL);
> +	btrfs_queue_work(wq, &ordered_extent->work);
>  
>  	return 0;
>  }
> @@ -7503,7 +7511,8 @@ again:
>  	if (!ret)
>  		goto out_test;
>  
> -	btrfs_init_work(&ordered->work, finish_ordered_fn, NULL, NULL);
> +	btrfs_init_work(&ordered->work, btrfs_endio_write_helper,
> +			finish_ordered_fn, NULL, NULL);
>  	btrfs_queue_work(root->fs_info->endio_write_workers,
>  			 &ordered->work);
>  out_test:
> @@ -8866,7 +8875,9 @@ struct btrfs_delalloc_work *btrfs_alloc_delalloc_work(struct inode *inode,
>  	work->inode = inode;
>  	work->wait = wait;
>  	work->delay_iput = delay_iput;
> -	btrfs_init_work(&work->work, btrfs_run_delalloc_work, NULL, NULL);
> +	WARN_ON_ONCE(!inode);
> +	btrfs_init_work(&work->work, btrfs_flush_delalloc_helper,
> +			btrfs_run_delalloc_work, NULL, NULL);
>  
>  	return work;
>  }
> diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
> index 963895c..ac734ec 100644
> --- a/fs/btrfs/ordered-data.c
> +++ b/fs/btrfs/ordered-data.c
> @@ -615,6 +615,7 @@ int btrfs_wait_ordered_extents(struct btrfs_root *root, int nr)
>  		spin_unlock(&root->ordered_extent_lock);
>  
>  		btrfs_init_work(&ordered->flush_work,
> +				btrfs_flush_delalloc_helper,
>  				btrfs_run_ordered_extent_work, NULL, NULL);
>  		list_add_tail(&ordered->work_list, &works);
>  		btrfs_queue_work(root->fs_info->flush_workers,
> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> index c11acca..b4a27bf 100644
> --- a/fs/btrfs/qgroup.c
> +++ b/fs/btrfs/qgroup.c
> @@ -2722,6 +2722,7 @@ qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid,
>  	memset(&fs_info->qgroup_rescan_work, 0,
>  	       sizeof(fs_info->qgroup_rescan_work));
>  	btrfs_init_work(&fs_info->qgroup_rescan_work,
> +			btrfs_qgroup_rescan_helper,
>  			btrfs_qgroup_rescan_worker, NULL, NULL);
>  
>  	if (ret) {
> diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
> index 6c5c5ae..6a41631 100644
> --- a/fs/btrfs/raid56.c
> +++ b/fs/btrfs/raid56.c
> @@ -1416,7 +1416,8 @@ cleanup:
>  
>  static void async_rmw_stripe(struct btrfs_raid_bio *rbio)
>  {
> -	btrfs_init_work(&rbio->work, rmw_work, NULL, NULL);
> +	btrfs_init_work(&rbio->work, btrfs_rmw_helper,
> +			rmw_work, NULL, NULL);
>  
>  	btrfs_queue_work(rbio->fs_info->rmw_workers,
>  			 &rbio->work);
> @@ -1424,7 +1425,8 @@ static void async_rmw_stripe(struct btrfs_raid_bio *rbio)
>  
>  static void async_read_rebuild(struct btrfs_raid_bio *rbio)
>  {
> -	btrfs_init_work(&rbio->work, read_rebuild_work, NULL, NULL);
> +	btrfs_init_work(&rbio->work, btrfs_rmw_helper,
> +			read_rebuild_work, NULL, NULL);
>  
>  	btrfs_queue_work(rbio->fs_info->rmw_workers,
>  			 &rbio->work);
> @@ -1665,7 +1667,8 @@ static void btrfs_raid_unplug(struct blk_plug_cb *cb, bool from_schedule)
>  	plug = container_of(cb, struct btrfs_plug_cb, cb);
>  
>  	if (from_schedule) {
> -		btrfs_init_work(&plug->work, unplug_work, NULL, NULL);
> +		btrfs_init_work(&plug->work, btrfs_rmw_helper,
> +				unplug_work, NULL, NULL);
>  		btrfs_queue_work(plug->info->rmw_workers,
>  				 &plug->work);
>  		return;
> diff --git a/fs/btrfs/reada.c b/fs/btrfs/reada.c
> index 85eb55d..b63ae20 100644
> --- a/fs/btrfs/reada.c
> +++ b/fs/btrfs/reada.c
> @@ -798,7 +798,8 @@ static void reada_start_machine(struct btrfs_fs_info *fs_info)
>  		/* FIXME we cannot handle this properly right now */
>  		BUG();
>  	}
> -	btrfs_init_work(&rmw->work, reada_start_machine_worker, NULL, NULL);
> +	btrfs_init_work(&rmw->work, btrfs_readahead_helper,
> +			reada_start_machine_worker, NULL, NULL);
>  	rmw->fs_info = fs_info;
>  
>  	btrfs_queue_work(fs_info->readahead_workers, &rmw->work);
> diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
> index 695623e..771f0ab 100644
> --- a/fs/btrfs/scrub.c
> +++ b/fs/btrfs/scrub.c
> @@ -427,8 +427,8 @@ struct scrub_ctx *scrub_setup_ctx(struct btrfs_device *dev, int is_dev_replace)
>  		sbio->index = i;
>  		sbio->sctx = sctx;
>  		sbio->page_count = 0;
> -		btrfs_init_work(&sbio->work, scrub_bio_end_io_worker,
> -				NULL, NULL);
> +		btrfs_init_work(&sbio->work, btrfs_scrub_helper,
> +				scrub_bio_end_io_worker, NULL, NULL);
>  
>  		if (i != SCRUB_BIOS_PER_SCTX - 1)
>  			sctx->bios[i]->next_free = i + 1;
> @@ -997,8 +997,8 @@ nodatasum_case:
>  		fixup_nodatasum->root = fs_info->extent_root;
>  		fixup_nodatasum->mirror_num = failed_mirror_index + 1;
>  		scrub_pending_trans_workers_inc(sctx);
> -		btrfs_init_work(&fixup_nodatasum->work, scrub_fixup_nodatasum,
> -				NULL, NULL);
> +		btrfs_init_work(&fixup_nodatasum->work, btrfs_scrub_helper,
> +				scrub_fixup_nodatasum, NULL, NULL);
>  		btrfs_queue_work(fs_info->scrub_workers,
>  				 &fixup_nodatasum->work);
>  		goto out;
> @@ -1624,7 +1624,8 @@ static void scrub_wr_bio_end_io(struct bio *bio, int err)
>  	sbio->err = err;
>  	sbio->bio = bio;
>  
> -	btrfs_init_work(&sbio->work, scrub_wr_bio_end_io_worker, NULL, NULL);
> +	btrfs_init_work(&sbio->work, btrfs_scrubwrc_helper,
> +			 scrub_wr_bio_end_io_worker, NULL, NULL);
>  	btrfs_queue_work(fs_info->scrub_wr_completion_workers, &sbio->work);
>  }
>  
> @@ -3232,7 +3233,8 @@ static int copy_nocow_pages(struct scrub_ctx *sctx, u64 logical, u64 len,
>  	nocow_ctx->len = len;
>  	nocow_ctx->mirror_num = mirror_num;
>  	nocow_ctx->physical_for_dev_replace = physical_for_dev_replace;
> -	btrfs_init_work(&nocow_ctx->work, copy_nocow_pages_worker, NULL, NULL);
> +	btrfs_init_work(&nocow_ctx->work, btrfs_scrubnc_helper,
> +			copy_nocow_pages_worker, NULL, NULL);
>  	INIT_LIST_HEAD(&nocow_ctx->inodes);
>  	btrfs_queue_work(fs_info->scrub_nocow_workers,
>  			 &nocow_ctx->work);
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index db67c8a..f20c288 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -5887,7 +5887,8 @@ struct btrfs_device *btrfs_alloc_device(struct btrfs_fs_info *fs_info,
>  	else
>  		generate_random_uuid(dev->uuid);
>  
> -	btrfs_init_work(&dev->work, pending_bios_fn, NULL, NULL);
> +	btrfs_init_work(&dev->work, btrfs_submit_helper,
> +			pending_bios_fn, NULL, NULL);
>  
>  	return dev;
>  }
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3] Btrfs: fix task hang under heavy compressed write
  2014-08-16  7:28   ` Miao Xie
@ 2014-08-18  7:32     ` Liu Bo
  0 siblings, 0 replies; 22+ messages in thread
From: Liu Bo @ 2014-08-18  7:32 UTC (permalink / raw)
  To: Miao Xie; +Cc: linux-btrfs

On Sat, Aug 16, 2014 at 03:28:11PM +0800, Miao Xie wrote:
> On Fri, 15 Aug 2014 23:36:53 +0800, Liu Bo wrote:
> > This has been reported and discussed for a long time, and this hang occurs in
> > both 3.15 and 3.16.
> > 
> > Btrfs has recently migrated to the kernel workqueue, but that migration
> > introduced this hang.
> > 
> > Btrfs has a kind of work that is queued in an ordered way, which means that
> > its ordered_func() must be processed FIFO, so it usually looks like --
> > 
> > normal_work_helper(arg)
> >     work = container_of(arg, struct btrfs_work, normal_work);
> > 
> >     work->func() <---- (we name it work X)
> >     for ordered_work in wq->ordered_list
> >             ordered_work->ordered_func()
> >             ordered_work->ordered_free()
> > 
> > The hang is a rare case: first, while looking for free space, we get an
> > uncached block group, so we go to read its free space cache inode for free
> > space information, and that will
> > 
> > file a readahead request
> >     btrfs_readpages()
> >          for page that is not in page cache
> >                 __do_readpage()
> >                      submit_extent_page()
> >                            btrfs_submit_bio_hook()
> >                                  btrfs_bio_wq_end_io()
> >                                  submit_bio()
> >                                  end_workqueue_bio() <--(ret by the 1st endio)
> >                                       queue a work(named work Y) for the 2nd
> >                                       also the real endio()
> > 
> > So the hang occurs when work Y's work_struct and work X's work_struct
> > happen to share the same address.
> > 
> > A bit more explanation,
> > 
> > A,B,C -- struct btrfs_work
> > arg   -- struct work_struct
> > 
> > kthread:
> > worker_thread()
> >     pick up a work_struct from @worklist
> >     process_one_work(arg)
> > 	worker->current_work = arg;  <-- arg is A->normal_work
> > 	worker->current_func(arg)
> > 		normal_work_helper(arg)
> > 		     A = container_of(arg, struct btrfs_work, normal_work);
> > 
> > 		     A->func()
> > 		     A->ordered_func()
> > 		     A->ordered_free()  <-- A gets freed
> > 
> > 		     B->ordered_func()
> > 			  submit_compressed_extents()
> > 			      find_free_extent()
> > 				  load_free_space_inode()
> > 				      ...   <-- (the above readahead stack)
> > 				      end_workqueue_bio()
> > 					   btrfs_queue_work(work C)
> > 		     B->ordered_free()
> > 
> > If work A sits at the head of wq->ordered_list and there are more ordered
> > works queued after it, such as B, A's memory could have been freed before
> > normal_work_helper() returns, which means that the kernel workqueue code in
> > worker_thread() still has worker->current_work pointing at A->normal_work,
> > ie. arg's address.
> > 
> > Meanwhile, work C is allocated after work A is freed, so C->normal_work
> > and A->normal_work are likely to share the same address (I confirmed this
> > with ftrace output, so I'm not just guessing; it's rare, though).
> > 
> > When another kthread picks up work C->normal_work to process and finds that
> > our kthread is processing it (see find_worker_executing_work()), it treats
> > work C as a collision and skips it, which ends up with nobody processing
> > work C.
> > 
> > So the situation is that our kthread is waiting forever on work C.
> > 
> > Besides, there are other cases that can lead to deadlock, but the real
> > problem is that all btrfs workqueues share one work->func, namely
> > normal_work_helper, so this patch makes each workqueue have its own helper
> > function, which is only a wrapper of normal_work_helper.
> > 
> > With this patch, I no longer hit the above hang.
> > 
> > Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> > ---
> > v3:
> >   - Adopting Chris's advice, this has each workqueue provide its own helper
> >     function.
> >   - This is rebased on Chris's latest for-linus branch.
> > 
> > v2:
> >   - This changes a bit to not defer all ->ordered_free(), but only defer the
> >     work that triggers this run_ordered_work().  Actually we don't need to
> >     defer other ->ordered_free() because their work cannot be this kthread
> >     worker's @current_work.  We can benefit from it since this can pin less
> >     memory during the period.
> > 
> >  fs/btrfs/async-thread.c  | 44 +++++++++++++++++++++++++++------
> >  fs/btrfs/async-thread.h  | 28 ++++++++++++++++++++-
> >  fs/btrfs/delayed-inode.c |  4 +--
> >  fs/btrfs/disk-io.c       | 63 ++++++++++++++++++++++++++----------------------
> >  fs/btrfs/extent-tree.c   |  7 +++---
> >  fs/btrfs/inode.c         | 35 ++++++++++++++++++---------
> >  fs/btrfs/ordered-data.c  |  1 +
> >  fs/btrfs/qgroup.c        |  1 +
> >  fs/btrfs/raid56.c        |  9 ++++---
> >  fs/btrfs/reada.c         |  3 ++-
> >  fs/btrfs/scrub.c         | 14 ++++++-----
> >  fs/btrfs/volumes.c       |  3 ++-
> >  12 files changed, 146 insertions(+), 66 deletions(-)
> > 
> > diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c
> > index 5a201d8..fbd76de 100644
> > --- a/fs/btrfs/async-thread.c
> > +++ b/fs/btrfs/async-thread.c
> > @@ -22,7 +22,6 @@
> >  #include <linux/list.h>
> >  #include <linux/spinlock.h>
> >  #include <linux/freezer.h>
> > -#include <linux/workqueue.h>
> >  #include "async-thread.h"
> >  #include "ctree.h"
> >  
> > @@ -55,8 +54,39 @@ struct btrfs_workqueue {
> >  	struct __btrfs_workqueue *high;
> >  };
> >  
> > -static inline struct __btrfs_workqueue
> > -*__btrfs_alloc_workqueue(const char *name, int flags, int max_active,
> > +static void normal_work_helper(struct btrfs_work *work);
> > +
> > +#define BTRFS_WORK_HELPER(name)					\
> > +void btrfs_##name(struct work_struct *arg)				\
> > +{									\
> > +	struct btrfs_work *work = container_of(arg, struct btrfs_work,	\
> > +					       normal_work);		\
> > +	normal_work_helper(work);					\
> > +}
> > +
> > +BTRFS_WORK_HELPER(worker_helper);
> > +BTRFS_WORK_HELPER(delalloc_helper);
> > +BTRFS_WORK_HELPER(flush_delalloc_helper);
> > +BTRFS_WORK_HELPER(cache_helper);
> > +BTRFS_WORK_HELPER(submit_helper);
> > +BTRFS_WORK_HELPER(fixup_helper);
> > +BTRFS_WORK_HELPER(endio_helper);
> > +BTRFS_WORK_HELPER(endio_meta_helper);
> > +BTRFS_WORK_HELPER(endio_meta_write_helper);
> > +BTRFS_WORK_HELPER(endio_raid56_helper);
> > +BTRFS_WORK_HELPER(rmw_helper);
> > +BTRFS_WORK_HELPER(endio_write_helper);
> > +BTRFS_WORK_HELPER(freespace_write_helper);
> > +BTRFS_WORK_HELPER(delayed_meta_helper);
> > +BTRFS_WORK_HELPER(readahead_helper);
> > +BTRFS_WORK_HELPER(qgroup_rescan_helper);
> > +BTRFS_WORK_HELPER(extent_refs_helper);
> > +BTRFS_WORK_HELPER(scrub_helper);
> > +BTRFS_WORK_HELPER(scrubwrc_helper);
> > +BTRFS_WORK_HELPER(scrubnc_helper);
> > +
> > +static struct __btrfs_workqueue *
> > +__btrfs_alloc_workqueue(const char *name, int flags, int max_active,
> >  			 int thresh)
> >  {
> >  	struct __btrfs_workqueue *ret = kzalloc(sizeof(*ret), GFP_NOFS);
> > @@ -232,13 +262,11 @@ static void run_ordered_work(struct __btrfs_workqueue *wq)
> >  	spin_unlock_irqrestore(lock, flags);
> >  }
> >  
> > -static void normal_work_helper(struct work_struct *arg)
> > +static void normal_work_helper(struct btrfs_work *work)
> >  {
> > -	struct btrfs_work *work;
> >  	struct __btrfs_workqueue *wq;
> >  	int need_order = 0;
> >  
> > -	work = container_of(arg, struct btrfs_work, normal_work);
> >  	/*
> >  	 * We should not touch things inside work in the following cases:
> >  	 * 1) after work->func() if it has no ordered_free
> > @@ -262,7 +290,7 @@ static void normal_work_helper(struct work_struct *arg)
> >  		trace_btrfs_all_work_done(work);
> >  }
> >  
> > -void btrfs_init_work(struct btrfs_work *work,
> > +void btrfs_init_work(struct btrfs_work *work, btrfs_work_func_t uniq_func,
> >  		     btrfs_func_t func,
> >  		     btrfs_func_t ordered_func,
> >  		     btrfs_func_t ordered_free)
> > @@ -270,7 +298,7 @@ void btrfs_init_work(struct btrfs_work *work,
> >  	work->func = func;
> >  	work->ordered_func = ordered_func;
> >  	work->ordered_free = ordered_free;
> > -	INIT_WORK(&work->normal_work, normal_work_helper);
> > +	INIT_WORK(&work->normal_work, uniq_func);
> 
> 
> Why not put the function pointer into the btrfs workqueue structure? Then we wouldn't
> need to change the interface of btrfs_init_work() (just remove INIT_WORK from it), and
> we could initialize the normal work with the function pointer from the btrfs workqueue
> structure when we queue the work. That way we wouldn't need to change so much, and the
> code would be clearer.

Yeah, I did think about it, but INIT_WORK() does more than set
work->normal_work.func; it also does other initialization, such as lockdep and
debug object setup, so I decided not to put INIT_WORK() in btrfs_queue_work().

And if we only updated the work->normal_work.func field in btrfs code, I don't
think that would be good, because that is the workqueue API's job.

thanks,
-liubo

> 
> Thanks
> Miao
> 
> >  	INIT_LIST_HEAD(&work->ordered_list);
> >  	work->flags = 0;
> >  }
> > diff --git a/fs/btrfs/async-thread.h b/fs/btrfs/async-thread.h
> > index 9c6b66d..e9e31c9 100644
> > --- a/fs/btrfs/async-thread.h
> > +++ b/fs/btrfs/async-thread.h
> > @@ -19,12 +19,14 @@
> >  
> >  #ifndef __BTRFS_ASYNC_THREAD_
> >  #define __BTRFS_ASYNC_THREAD_
> > +#include <linux/workqueue.h>
> >  
> >  struct btrfs_workqueue;
> >  /* Internal use only */
> >  struct __btrfs_workqueue;
> >  struct btrfs_work;
> >  typedef void (*btrfs_func_t)(struct btrfs_work *arg);
> > +typedef void (*btrfs_work_func_t)(struct work_struct *arg);
> >  
> >  struct btrfs_work {
> >  	btrfs_func_t func;
> > @@ -38,11 +40,35 @@ struct btrfs_work {
> >  	unsigned long flags;
> >  };
> >  
> > +#define BTRFS_WORK_HELPER_PROTO(name)					\
> > +void btrfs_##name(struct work_struct *arg)
> > +
> > +BTRFS_WORK_HELPER_PROTO(worker_helper);
> > +BTRFS_WORK_HELPER_PROTO(delalloc_helper);
> > +BTRFS_WORK_HELPER_PROTO(flush_delalloc_helper);
> > +BTRFS_WORK_HELPER_PROTO(cache_helper);
> > +BTRFS_WORK_HELPER_PROTO(submit_helper);
> > +BTRFS_WORK_HELPER_PROTO(fixup_helper);
> > +BTRFS_WORK_HELPER_PROTO(endio_helper);
> > +BTRFS_WORK_HELPER_PROTO(endio_meta_helper);
> > +BTRFS_WORK_HELPER_PROTO(endio_meta_write_helper);
> > +BTRFS_WORK_HELPER_PROTO(endio_raid56_helper);
> > +BTRFS_WORK_HELPER_PROTO(rmw_helper);
> > +BTRFS_WORK_HELPER_PROTO(endio_write_helper);
> > +BTRFS_WORK_HELPER_PROTO(freespace_write_helper);
> > +BTRFS_WORK_HELPER_PROTO(delayed_meta_helper);
> > +BTRFS_WORK_HELPER_PROTO(readahead_helper);
> > +BTRFS_WORK_HELPER_PROTO(qgroup_rescan_helper);
> > +BTRFS_WORK_HELPER_PROTO(extent_refs_helper);
> > +BTRFS_WORK_HELPER_PROTO(scrub_helper);
> > +BTRFS_WORK_HELPER_PROTO(scrubwrc_helper);
> > +BTRFS_WORK_HELPER_PROTO(scrubnc_helper);
> > +
> >  struct btrfs_workqueue *btrfs_alloc_workqueue(const char *name,
> >  					      int flags,
> >  					      int max_active,
> >  					      int thresh);
> > -void btrfs_init_work(struct btrfs_work *work,
> > +void btrfs_init_work(struct btrfs_work *work, btrfs_work_func_t helper,
> >  		     btrfs_func_t func,
> >  		     btrfs_func_t ordered_func,
> >  		     btrfs_func_t ordered_free);
> > diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
> > index 0816814..054577b 100644
> > --- a/fs/btrfs/delayed-inode.c
> > +++ b/fs/btrfs/delayed-inode.c
> > @@ -1395,8 +1395,8 @@ static int btrfs_wq_run_delayed_node(struct btrfs_delayed_root *delayed_root,
> >  		return -ENOMEM;
> >  
> >  	async_work->delayed_root = delayed_root;
> > -	btrfs_init_work(&async_work->work, btrfs_async_run_delayed_root,
> > -			NULL, NULL);
> > +	btrfs_init_work(&async_work->work, btrfs_delayed_meta_helper,
> > +			btrfs_async_run_delayed_root, NULL, NULL);
> >  	async_work->nr = nr;
> >  
> >  	btrfs_queue_work(root->fs_info->delayed_workers, &async_work->work);
> > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> > index fd9be2d..28098ba 100644
> > --- a/fs/btrfs/disk-io.c
> > +++ b/fs/btrfs/disk-io.c
> > @@ -39,7 +39,6 @@
> >  #include "btrfs_inode.h"
> >  #include "volumes.h"
> >  #include "print-tree.h"
> > -#include "async-thread.h"
> >  #include "locking.h"
> >  #include "tree-log.h"
> >  #include "free-space-cache.h"
> > @@ -714,42 +713,48 @@ static void end_workqueue_bio(struct bio *bio, int err)
> >  {
> >  	struct end_io_wq *end_io_wq = bio->bi_private;
> >  	struct btrfs_fs_info *fs_info;
> > +	struct btrfs_workqueue *wq;
> > +	btrfs_work_func_t func;
> >  
> >  	fs_info = end_io_wq->info;
> >  	end_io_wq->error = err;
> >  
> > -	if (likely(end_io_wq->metadata != BTRFS_WQ_ENDIO_DIO_REPAIR))
> > -		btrfs_init_work(&end_io_wq->work, end_workqueue_fn, NULL,
> > -				NULL);
> > -	else
> > +	if (unlikely(end_io_wq->metadata == BTRFS_WQ_ENDIO_DIO_REPAIR &&
> > +		     !(bio->bi_rw & REQ_WRITE))) {
> >  		INIT_WORK(&end_io_wq->work.normal_work, dio_end_workqueue_fn);
> > +		queue_work(system_wq, &end_io_wq->work.normal_work);
> > +		return;
> > +	}
> >  
> >  	if (bio->bi_rw & REQ_WRITE) {
> > -		if (end_io_wq->metadata == BTRFS_WQ_ENDIO_METADATA)
> > -			btrfs_queue_work(fs_info->endio_meta_write_workers,
> > -					 &end_io_wq->work);
> > -		else if (end_io_wq->metadata == BTRFS_WQ_ENDIO_FREE_SPACE)
> > -			btrfs_queue_work(fs_info->endio_freespace_worker,
> > -					 &end_io_wq->work);
> > -		else if (end_io_wq->metadata == BTRFS_WQ_ENDIO_RAID56)
> > -			btrfs_queue_work(fs_info->endio_raid56_workers,
> > -					 &end_io_wq->work);
> > -		else
> > -			btrfs_queue_work(fs_info->endio_write_workers,
> > -					 &end_io_wq->work);
> > +		if (end_io_wq->metadata == BTRFS_WQ_ENDIO_METADATA) {
> > +			wq = fs_info->endio_meta_write_workers;
> > +			func = btrfs_endio_meta_write_helper;
> > +		} else if (end_io_wq->metadata == BTRFS_WQ_ENDIO_FREE_SPACE) {
> > +			wq = fs_info->endio_freespace_worker;
> > +			func = btrfs_freespace_write_helper;
> > +		} else if (end_io_wq->metadata == BTRFS_WQ_ENDIO_RAID56) {
> > +			wq = fs_info->endio_raid56_workers;
> > +			func = btrfs_endio_raid56_helper;
> > +		} else {
> > +			wq = fs_info->endio_write_workers;
> > +			func = btrfs_endio_write_helper;
> > +		}
> >  	} else {
> > -		if (unlikely(end_io_wq->metadata == BTRFS_WQ_ENDIO_DIO_REPAIR))
> > -			queue_work(system_wq, &end_io_wq->work.normal_work);
> > -		else if (end_io_wq->metadata == BTRFS_WQ_ENDIO_RAID56)
> > -			btrfs_queue_work(fs_info->endio_raid56_workers,
> > -					 &end_io_wq->work);
> > -		else if (end_io_wq->metadata)
> > -			btrfs_queue_work(fs_info->endio_meta_workers,
> > -					 &end_io_wq->work);
> > -		else
> > -			btrfs_queue_work(fs_info->endio_workers,
> > -					 &end_io_wq->work);
> > +		if (end_io_wq->metadata == BTRFS_WQ_ENDIO_RAID56) {
> > +			wq = fs_info->endio_raid56_workers;
> > +			func = btrfs_endio_raid56_helper;
> > +		} else if (end_io_wq->metadata) {
> > +			wq = fs_info->endio_meta_workers;
> > +			func = btrfs_endio_meta_helper;
> > +		} else {
> > +			wq = fs_info->endio_workers;
> > +			func = btrfs_endio_helper;
> > +		}
> >  	}
> > +
> > +	btrfs_init_work(&end_io_wq->work, func, end_workqueue_fn, NULL, NULL);
> > +	btrfs_queue_work(wq, &end_io_wq->work);
> >  }
> >  
> >  /*
> > @@ -857,7 +862,7 @@ int btrfs_wq_submit_bio(struct btrfs_fs_info *fs_info, struct inode *inode,
> >  	async->submit_bio_start = submit_bio_start;
> >  	async->submit_bio_done = submit_bio_done;
> >  
> > -	btrfs_init_work(&async->work, run_one_async_start,
> > +	btrfs_init_work(&async->work, btrfs_worker_helper, run_one_async_start,
> >  			run_one_async_done, run_one_async_free);
> >  
> >  	async->bio_flags = bio_flags;
> > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> > index ef0845d..e105558 100644
> > --- a/fs/btrfs/extent-tree.c
> > +++ b/fs/btrfs/extent-tree.c
> > @@ -552,7 +552,8 @@ static int cache_block_group(struct btrfs_block_group_cache *cache,
> >  	caching_ctl->block_group = cache;
> >  	caching_ctl->progress = cache->key.objectid;
> >  	atomic_set(&caching_ctl->count, 1);
> > -	btrfs_init_work(&caching_ctl->work, caching_thread, NULL, NULL);
> > +	btrfs_init_work(&caching_ctl->work, btrfs_cache_helper,
> > +			caching_thread, NULL, NULL);
> >  
> >  	spin_lock(&cache->lock);
> >  	/*
> > @@ -2749,8 +2750,8 @@ int btrfs_async_run_delayed_refs(struct btrfs_root *root,
> >  		async->sync = 0;
> >  	init_completion(&async->wait);
> >  
> > -	btrfs_init_work(&async->work, delayed_ref_async_start,
> > -			NULL, NULL);
> > +	btrfs_init_work(&async->work, btrfs_extent_refs_helper,
> > +			delayed_ref_async_start, NULL, NULL);
> >  
> >  	btrfs_queue_work(root->fs_info->extent_workers, &async->work);
> >  
> > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> > index de1df3a..c591af5 100644
> > --- a/fs/btrfs/inode.c
> > +++ b/fs/btrfs/inode.c
> > @@ -1111,8 +1111,10 @@ static int cow_file_range_async(struct inode *inode, struct page *locked_page,
> >  		async_cow->end = cur_end;
> >  		INIT_LIST_HEAD(&async_cow->extents);
> >  
> > -		btrfs_init_work(&async_cow->work, async_cow_start,
> > -				async_cow_submit, async_cow_free);
> > +		btrfs_init_work(&async_cow->work,
> > +				btrfs_delalloc_helper,
> > +				async_cow_start, async_cow_submit,
> > +				async_cow_free);
> >  
> >  		nr_pages = (cur_end - start + PAGE_CACHE_SIZE) >>
> >  			PAGE_CACHE_SHIFT;
> > @@ -1924,7 +1926,8 @@ static int btrfs_writepage_start_hook(struct page *page, u64 start, u64 end)
> >  
> >  	SetPageChecked(page);
> >  	page_cache_get(page);
> > -	btrfs_init_work(&fixup->work, btrfs_writepage_fixup_worker, NULL, NULL);
> > +	btrfs_init_work(&fixup->work, btrfs_fixup_helper,
> > +			btrfs_writepage_fixup_worker, NULL, NULL);
> >  	fixup->page = page;
> >  	btrfs_queue_work(root->fs_info->fixup_workers, &fixup->work);
> >  	return -EBUSY;
> > @@ -2869,7 +2872,8 @@ static int btrfs_writepage_end_io_hook(struct page *page, u64 start, u64 end,
> >  	struct inode *inode = page->mapping->host;
> >  	struct btrfs_root *root = BTRFS_I(inode)->root;
> >  	struct btrfs_ordered_extent *ordered_extent = NULL;
> > -	struct btrfs_workqueue *workers;
> > +	struct btrfs_workqueue *wq;
> > +	btrfs_work_func_t func;
> >  
> >  	trace_btrfs_writepage_end_io_hook(page, start, end, uptodate);
> >  
> > @@ -2878,13 +2882,17 @@ static int btrfs_writepage_end_io_hook(struct page *page, u64 start, u64 end,
> >  					    end - start + 1, uptodate))
> >  		return 0;
> >  
> > -	btrfs_init_work(&ordered_extent->work, finish_ordered_fn, NULL, NULL);
> > +	if (btrfs_is_free_space_inode(inode)) {
> > +		wq = root->fs_info->endio_freespace_worker;
> > +		func = btrfs_freespace_write_helper;
> > +	} else {
> > +		wq = root->fs_info->endio_write_workers;
> > +		func = btrfs_endio_write_helper;
> > +	}
> >  
> > -	if (btrfs_is_free_space_inode(inode))
> > -		workers = root->fs_info->endio_freespace_worker;
> > -	else
> > -		workers = root->fs_info->endio_write_workers;
> > -	btrfs_queue_work(workers, &ordered_extent->work);
> > +	btrfs_init_work(&ordered_extent->work, func, finish_ordered_fn, NULL,
> > +			NULL);
> > +	btrfs_queue_work(wq, &ordered_extent->work);
> >  
> >  	return 0;
> >  }
> > @@ -7503,7 +7511,8 @@ again:
> >  	if (!ret)
> >  		goto out_test;
> >  
> > -	btrfs_init_work(&ordered->work, finish_ordered_fn, NULL, NULL);
> > +	btrfs_init_work(&ordered->work, btrfs_endio_write_helper,
> > +			finish_ordered_fn, NULL, NULL);
> >  	btrfs_queue_work(root->fs_info->endio_write_workers,
> >  			 &ordered->work);
> >  out_test:
> > @@ -8866,7 +8875,9 @@ struct btrfs_delalloc_work *btrfs_alloc_delalloc_work(struct inode *inode,
> >  	work->inode = inode;
> >  	work->wait = wait;
> >  	work->delay_iput = delay_iput;
> > -	btrfs_init_work(&work->work, btrfs_run_delalloc_work, NULL, NULL);
> > +	WARN_ON_ONCE(!inode);
> > +	btrfs_init_work(&work->work, btrfs_flush_delalloc_helper,
> > +			btrfs_run_delalloc_work, NULL, NULL);
> >  
> >  	return work;
> >  }
> > diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
> > index 963895c..ac734ec 100644
> > --- a/fs/btrfs/ordered-data.c
> > +++ b/fs/btrfs/ordered-data.c
> > @@ -615,6 +615,7 @@ int btrfs_wait_ordered_extents(struct btrfs_root *root, int nr)
> >  		spin_unlock(&root->ordered_extent_lock);
> >  
> >  		btrfs_init_work(&ordered->flush_work,
> > +				btrfs_flush_delalloc_helper,
> >  				btrfs_run_ordered_extent_work, NULL, NULL);
> >  		list_add_tail(&ordered->work_list, &works);
> >  		btrfs_queue_work(root->fs_info->flush_workers,
> > diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
> > index c11acca..b4a27bf 100644
> > --- a/fs/btrfs/qgroup.c
> > +++ b/fs/btrfs/qgroup.c
> > @@ -2722,6 +2722,7 @@ qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid,
> >  	memset(&fs_info->qgroup_rescan_work, 0,
> >  	       sizeof(fs_info->qgroup_rescan_work));
> >  	btrfs_init_work(&fs_info->qgroup_rescan_work,
> > +			btrfs_qgroup_rescan_helper,
> >  			btrfs_qgroup_rescan_worker, NULL, NULL);
> >  
> >  	if (ret) {
> > diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
> > index 6c5c5ae..6a41631 100644
> > --- a/fs/btrfs/raid56.c
> > +++ b/fs/btrfs/raid56.c
> > @@ -1416,7 +1416,8 @@ cleanup:
> >  
> >  static void async_rmw_stripe(struct btrfs_raid_bio *rbio)
> >  {
> > -	btrfs_init_work(&rbio->work, rmw_work, NULL, NULL);
> > +	btrfs_init_work(&rbio->work, btrfs_rmw_helper,
> > +			rmw_work, NULL, NULL);
> >  
> >  	btrfs_queue_work(rbio->fs_info->rmw_workers,
> >  			 &rbio->work);
> > @@ -1424,7 +1425,8 @@ static void async_rmw_stripe(struct btrfs_raid_bio *rbio)
> >  
> >  static void async_read_rebuild(struct btrfs_raid_bio *rbio)
> >  {
> > -	btrfs_init_work(&rbio->work, read_rebuild_work, NULL, NULL);
> > +	btrfs_init_work(&rbio->work, btrfs_rmw_helper,
> > +			read_rebuild_work, NULL, NULL);
> >  
> >  	btrfs_queue_work(rbio->fs_info->rmw_workers,
> >  			 &rbio->work);
> > @@ -1665,7 +1667,8 @@ static void btrfs_raid_unplug(struct blk_plug_cb *cb, bool from_schedule)
> >  	plug = container_of(cb, struct btrfs_plug_cb, cb);
> >  
> >  	if (from_schedule) {
> > -		btrfs_init_work(&plug->work, unplug_work, NULL, NULL);
> > +		btrfs_init_work(&plug->work, btrfs_rmw_helper,
> > +				unplug_work, NULL, NULL);
> >  		btrfs_queue_work(plug->info->rmw_workers,
> >  				 &plug->work);
> >  		return;
> > diff --git a/fs/btrfs/reada.c b/fs/btrfs/reada.c
> > index 85eb55d..b63ae20 100644
> > --- a/fs/btrfs/reada.c
> > +++ b/fs/btrfs/reada.c
> > @@ -798,7 +798,8 @@ static void reada_start_machine(struct btrfs_fs_info *fs_info)
> >  		/* FIXME we cannot handle this properly right now */
> >  		BUG();
> >  	}
> > -	btrfs_init_work(&rmw->work, reada_start_machine_worker, NULL, NULL);
> > +	btrfs_init_work(&rmw->work, btrfs_readahead_helper,
> > +			reada_start_machine_worker, NULL, NULL);
> >  	rmw->fs_info = fs_info;
> >  
> >  	btrfs_queue_work(fs_info->readahead_workers, &rmw->work);
> > diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
> > index 695623e..771f0ab 100644
> > --- a/fs/btrfs/scrub.c
> > +++ b/fs/btrfs/scrub.c
> > @@ -427,8 +427,8 @@ struct scrub_ctx *scrub_setup_ctx(struct btrfs_device *dev, int is_dev_replace)
> >  		sbio->index = i;
> >  		sbio->sctx = sctx;
> >  		sbio->page_count = 0;
> > -		btrfs_init_work(&sbio->work, scrub_bio_end_io_worker,
> > -				NULL, NULL);
> > +		btrfs_init_work(&sbio->work, btrfs_scrub_helper,
> > +				scrub_bio_end_io_worker, NULL, NULL);
> >  
> >  		if (i != SCRUB_BIOS_PER_SCTX - 1)
> >  			sctx->bios[i]->next_free = i + 1;
> > @@ -997,8 +997,8 @@ nodatasum_case:
> >  		fixup_nodatasum->root = fs_info->extent_root;
> >  		fixup_nodatasum->mirror_num = failed_mirror_index + 1;
> >  		scrub_pending_trans_workers_inc(sctx);
> > -		btrfs_init_work(&fixup_nodatasum->work, scrub_fixup_nodatasum,
> > -				NULL, NULL);
> > +		btrfs_init_work(&fixup_nodatasum->work, btrfs_scrub_helper,
> > +				scrub_fixup_nodatasum, NULL, NULL);
> >  		btrfs_queue_work(fs_info->scrub_workers,
> >  				 &fixup_nodatasum->work);
> >  		goto out;
> > @@ -1624,7 +1624,8 @@ static void scrub_wr_bio_end_io(struct bio *bio, int err)
> >  	sbio->err = err;
> >  	sbio->bio = bio;
> >  
> > -	btrfs_init_work(&sbio->work, scrub_wr_bio_end_io_worker, NULL, NULL);
> > +	btrfs_init_work(&sbio->work, btrfs_scrubwrc_helper,
> > +			 scrub_wr_bio_end_io_worker, NULL, NULL);
> >  	btrfs_queue_work(fs_info->scrub_wr_completion_workers, &sbio->work);
> >  }
> >  
> > @@ -3232,7 +3233,8 @@ static int copy_nocow_pages(struct scrub_ctx *sctx, u64 logical, u64 len,
> >  	nocow_ctx->len = len;
> >  	nocow_ctx->mirror_num = mirror_num;
> >  	nocow_ctx->physical_for_dev_replace = physical_for_dev_replace;
> > -	btrfs_init_work(&nocow_ctx->work, copy_nocow_pages_worker, NULL, NULL);
> > +	btrfs_init_work(&nocow_ctx->work, btrfs_scrubnc_helper,
> > +			copy_nocow_pages_worker, NULL, NULL);
> >  	INIT_LIST_HEAD(&nocow_ctx->inodes);
> >  	btrfs_queue_work(fs_info->scrub_nocow_workers,
> >  			 &nocow_ctx->work);
> > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> > index db67c8a..f20c288 100644
> > --- a/fs/btrfs/volumes.c
> > +++ b/fs/btrfs/volumes.c
> > @@ -5887,7 +5887,8 @@ struct btrfs_device *btrfs_alloc_device(struct btrfs_fs_info *fs_info,
> >  	else
> >  		generate_random_uuid(dev->uuid);
> >  
> > -	btrfs_init_work(&dev->work, pending_bios_fn, NULL, NULL);
> > +	btrfs_init_work(&dev->work, btrfs_submit_helper,
> > +			pending_bios_fn, NULL, NULL);
> >  
> >  	return dev;
> >  }
> > 
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3] Btrfs: fix task hang under heavy compressed write
  2014-08-15 15:36 ` [PATCH v3] " Liu Bo
  2014-08-15 16:05   ` Chris Mason
  2014-08-16  7:28   ` Miao Xie
@ 2014-08-25 14:58   ` Chris Mason
  2014-08-25 15:19     ` Liu Bo
  2014-08-26 10:20     ` Martin Steigerwald
  2 siblings, 2 replies; 22+ messages in thread
From: Chris Mason @ 2014-08-25 14:58 UTC (permalink / raw)
  To: Liu Bo, linux-btrfs

On 08/15/2014 11:36 AM, Liu Bo wrote:
> This has been reported and discussed for a long time, and this hang occurs in
> both 3.15 and 3.16.

[ great description ]

I ran this through tests last week, and an overnight test over the
weekend.  It's in my for-linus branch now, along with everything else I
plan on sending for rc3.

Please double check my merge, I had to undo your rebase onto Miao's patches.

-chris


* Re: [PATCH v3] Btrfs: fix task hang under heavy compressed write
  2014-08-25 14:58   ` Chris Mason
@ 2014-08-25 15:19     ` Liu Bo
  2014-08-26 10:20     ` Martin Steigerwald
  1 sibling, 0 replies; 22+ messages in thread
From: Liu Bo @ 2014-08-25 15:19 UTC (permalink / raw)
  To: Chris Mason; +Cc: linux-btrfs

On Mon, Aug 25, 2014 at 10:58:13AM -0400, Chris Mason wrote:
> On 08/15/2014 11:36 AM, Liu Bo wrote:
> > This has been reported and discussed for a long time, and this hang occurs in
> > both 3.15 and 3.16.
> 
> [ great description ]
> 
> I ran this through tests last week, and an overnight test over the
> weekend.  It's in my for-linus branch now, along with everything else I
> plan on sending for rc3.
> 
> Please double check my merge, I had to undo your rebase onto Miao's patches.

Just checked, looks good ;)

thanks,
-liubo


* Re: [PATCH v3] Btrfs: fix task hang under heavy compressed write
  2014-08-25 14:58   ` Chris Mason
  2014-08-25 15:19     ` Liu Bo
@ 2014-08-26 10:20     ` Martin Steigerwald
  2014-08-26 10:38       ` Liu Bo
  2014-08-26 13:02       ` Chris Mason
  1 sibling, 2 replies; 22+ messages in thread
From: Martin Steigerwald @ 2014-08-26 10:20 UTC (permalink / raw)
  To: Chris Mason; +Cc: Liu Bo, linux-btrfs

Am Montag, 25. August 2014, 10:58:13 schrieb Chris Mason:
> On 08/15/2014 11:36 AM, Liu Bo wrote:
> > This has been reported and discussed for a long time, and this hang occurs
> > in both 3.15 and 3.16.
> 
> [ great description ]
> 
> I ran this through tests last week, and an overnight test over the
> weekend.  It's in my for-linus branch now, along with everything else I
> plan on sending for rc3.
> 
> Please double check my merge, I had to undo your rebase onto Miao's patches.

I would like to test this on 3.17-rc2, what do I need to do to make it apply 
cleanly? That function in disk-io.c looks quite different from what the patch 
assumes at the beginning, so I am not sure how to merge this.

martin@merkaba:~/Computer/Merkaba/Kernel/linux> patch -p1 < "../[PATCH v3] 
Btrfs_fix task hang under heavy compressed write.mbox"   
patching file fs/btrfs/async-thread.c
patching file fs/btrfs/async-thread.h
patching file fs/btrfs/delayed-inode.c
patching file fs/btrfs/disk-io.c
Hunk #2 FAILED at 713.
Hunk #3 succeeded at 827 (offset -29 lines).
1 out of 3 hunks FAILED -- saving rejects to file fs/btrfs/disk-io.c.rej
patching file fs/btrfs/extent-tree.c
patching file fs/btrfs/inode.c
Hunk #1 succeeded at 1096 (offset -15 lines).
Hunk #2 succeeded at 1883 (offset -43 lines).
Hunk #3 succeeded at 2825 (offset -47 lines).
Hunk #4 succeeded at 2835 (offset -47 lines).
Hunk #5 succeeded at 7166 (offset -345 lines).
Hunk #6 succeeded at 8504 (offset -371 lines).
patching file fs/btrfs/ordered-data.c
patching file fs/btrfs/qgroup.c
Hunk #1 succeeded at 2720 (offset -2 lines).
patching file fs/btrfs/raid56.c
patching file fs/btrfs/reada.c
patching file fs/btrfs/scrub.c
Hunk #1 succeeded at 428 (offset 1 line).
Hunk #2 succeeded at 999 (offset 2 lines).
Hunk #3 succeeded at 1616 (offset -8 lines).
Hunk #4 succeeded at 3204 (offset -29 lines).
patching file fs/btrfs/volumes.c
Hunk #1 succeeded at 5800 (offset -87 lines).

Otherwise, I'd wait till rc3, as my current 3.16.2 with v1 of this patch seems 
to behave stable with trees occupying all space:

merkaba:~> btrfs fi sh /home
Label: 'home'  uuid: […]
        Total devices 2 FS bytes used 127.69GiB
        devid    1 size 160.00GiB used 160.00GiB path /dev/dm-0
        devid    2 size 160.00GiB used 160.00GiB path /dev/mapper/sata-home

Ciao,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7


* Re: [PATCH v3] Btrfs: fix task hang under heavy compressed write
  2014-08-26 10:20     ` Martin Steigerwald
@ 2014-08-26 10:38       ` Liu Bo
  2014-08-26 12:04         ` Martin Steigerwald
  2014-08-26 13:02       ` Chris Mason
  1 sibling, 1 reply; 22+ messages in thread
From: Liu Bo @ 2014-08-26 10:38 UTC (permalink / raw)
  To: Martin Steigerwald; +Cc: Chris Mason, linux-btrfs

On Tue, Aug 26, 2014 at 12:20:28PM +0200, Martin Steigerwald wrote:
> Am Montag, 25. August 2014, 10:58:13 schrieb Chris Mason:
> > On 08/15/2014 11:36 AM, Liu Bo wrote:
> > > This has been reported and discussed for a long time, and this hang occurs
> > > in both 3.15 and 3.16.
> > 
> > [ great description ]
> > 
> > I ran this through tests last week, and an overnight test over the
> > weekend.  It's in my for-linus branch now, along with everything else I
> > plan on sending for rc3.
> > 
> > Please double check my merge, I had to undo your rebase onto Miao's patches.
> 
> I would like to test this on 3.17-rc2, what do I need to do to make it apply 
> cleanly? That function in disk-io.c looks quite different from what the patch 
> assumes at the beginning, so I am not sure how to merge this.
> 
> martin@merkaba:~/Computer/Merkaba/Kernel/linux> patch -p1 < "../[PATCH v3] 
> Btrfs_fix task hang under heavy compressed write.mbox"   
> patching file fs/btrfs/async-thread.c
> patching file fs/btrfs/async-thread.h
> patching file fs/btrfs/delayed-inode.c
> patching file fs/btrfs/disk-io.c
> Hunk #2 FAILED at 713.
> Hunk #3 succeeded at 827 (offset -29 lines).
> 1 out of 3 hunks FAILED -- saving rejects to file fs/btrfs/disk-io.c.rej
> patching file fs/btrfs/extent-tree.c
> patching file fs/btrfs/inode.c
> Hunk #1 succeeded at 1096 (offset -15 lines).
> Hunk #2 succeeded at 1883 (offset -43 lines).
> Hunk #3 succeeded at 2825 (offset -47 lines).
> Hunk #4 succeeded at 2835 (offset -47 lines).
> Hunk #5 succeeded at 7166 (offset -345 lines).
> Hunk #6 succeeded at 8504 (offset -371 lines).
> patching file fs/btrfs/ordered-data.c
> patching file fs/btrfs/qgroup.c
> Hunk #1 succeeded at 2720 (offset -2 lines).
> patching file fs/btrfs/raid56.c
> patching file fs/btrfs/reada.c
> patching file fs/btrfs/scrub.c
> Hunk #1 succeeded at 428 (offset 1 line).
> Hunk #2 succeeded at 999 (offset 2 lines).
> Hunk #3 succeeded at 1616 (offset -8 lines).
> Hunk #4 succeeded at 3204 (offset -29 lines).
> patching file fs/btrfs/volumes.c
> Hunk #1 succeeded at 5800 (offset -87 lines).
> 
> Otherwise, I'd wait till rc3, as my current 3.16.2 with v1 of this patch seems 
> to behave stable with trees occupying all space:
> 
> merkaba:~> btrfs fi sh /home
> Label: 'home'  uuid: […]
>         Total devices 2 FS bytes used 127.69GiB
>         devid    1 size 160.00GiB used 160.00GiB path /dev/dm-0
>         devid    2 size 160.00GiB used 160.00GiB path /dev/mapper/sata-home

Hmm, the v3 patch is based on Chris's integration branch, so it's not expected
to apply cleanly.  But if it's urgent, I can provide a patch based on
3.17-rc2.

(rc3 is coming soon though ;-) )

thanks,
-liubo

> 
> Ciao,
> -- 
> Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
> GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7


* Re: [PATCH v3] Btrfs: fix task hang under heavy compressed write
  2014-08-26 10:38       ` Liu Bo
@ 2014-08-26 12:04         ` Martin Steigerwald
  0 siblings, 0 replies; 22+ messages in thread
From: Martin Steigerwald @ 2014-08-26 12:04 UTC (permalink / raw)
  To: bo.li.liu; +Cc: Chris Mason, linux-btrfs

Am Dienstag, 26. August 2014, 18:38:03 schrieb Liu Bo:
> On Tue, Aug 26, 2014 at 12:20:28PM +0200, Martin Steigerwald wrote:
> > Am Montag, 25. August 2014, 10:58:13 schrieb Chris Mason:
> > > On 08/15/2014 11:36 AM, Liu Bo wrote:
> > > > This has been reported and discussed for a long time, and this hang
> > > > occurs
> > > > in both 3.15 and 3.16.
> > > 
> > > [ great description ]
> > > 
> > > I ran this through tests last week, and an overnight test over the
> > > weekend.  It's in my for-linus branch now, along with everything else I
> > > plan on sending for rc3.
> > > 
> > > Please double check my merge, I had to undo your rebase onto Miao's
> > > patches.> 
> > I would like to test this on 3.17-rc2, what do I need to do to make it
> > apply cleanly? That function in disk-io.c looks quite different from what
> > the patch assumes at the beginning, so I am not sure how to merge this.
> > 
> > martin@merkaba:~/Computer/Merkaba/Kernel/linux> patch -p1 < "../[PATCH v3]
> > Btrfs_fix task hang under heavy compressed write.mbox"
> > patching file fs/btrfs/async-thread.c
> > patching file fs/btrfs/async-thread.h
> > patching file fs/btrfs/delayed-inode.c
> > patching file fs/btrfs/disk-io.c
> > Hunk #2 FAILED at 713.
> > Hunk #3 succeeded at 827 (offset -29 lines).
> > 1 out of 3 hunks FAILED -- saving rejects to file fs/btrfs/disk-io.c.rej
> > patching file fs/btrfs/extent-tree.c
> > patching file fs/btrfs/inode.c
> > Hunk #1 succeeded at 1096 (offset -15 lines).
> > Hunk #2 succeeded at 1883 (offset -43 lines).
> > Hunk #3 succeeded at 2825 (offset -47 lines).
> > Hunk #4 succeeded at 2835 (offset -47 lines).
> > Hunk #5 succeeded at 7166 (offset -345 lines).
> > Hunk #6 succeeded at 8504 (offset -371 lines).
> > patching file fs/btrfs/ordered-data.c
> > patching file fs/btrfs/qgroup.c
> > Hunk #1 succeeded at 2720 (offset -2 lines).
> > patching file fs/btrfs/raid56.c
> > patching file fs/btrfs/reada.c
> > patching file fs/btrfs/scrub.c
> > Hunk #1 succeeded at 428 (offset 1 line).
> > Hunk #2 succeeded at 999 (offset 2 lines).
> > Hunk #3 succeeded at 1616 (offset -8 lines).
> > Hunk #4 succeeded at 3204 (offset -29 lines).
> > patching file fs/btrfs/volumes.c
> > Hunk #1 succeeded at 5800 (offset -87 lines).
> > 
> > Otherwise, I'd wait till rc3, as my current 3.16.2 with v1 of this patch
> > seems to behave stable with trees occupying all space:
> > 
> > merkaba:~> btrfs fi sh /home
> > Label: 'home'  uuid: […]
> > 
> >         Total devices 2 FS bytes used 127.69GiB
> >         devid    1 size 160.00GiB used 160.00GiB path /dev/dm-0
> >         devid    2 size 160.00GiB used 160.00GiB path
> >         /dev/mapper/sata-home
> 
> Hmm, the v3 patch is based on Chris's integration branch, so it's not
> expected to apply cleanly.  But if it's urgent, I can provide a patch based
> on 3.17-rc2.
> 
> (rc3 is coming soon though ;-) )

I see.

Well I can stick with 3.16.1 for the time being. But if you want some 
additional testing soonish, feel free to provide a patch.

Thanks,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7


* Re: [PATCH v3] Btrfs: fix task hang under heavy compressed write
  2014-08-26 10:20     ` Martin Steigerwald
  2014-08-26 10:38       ` Liu Bo
@ 2014-08-26 13:02       ` Chris Mason
  2014-08-26 13:20         ` Martin Steigerwald
  1 sibling, 1 reply; 22+ messages in thread
From: Chris Mason @ 2014-08-26 13:02 UTC (permalink / raw)
  To: Martin Steigerwald; +Cc: Liu Bo, linux-btrfs



On 08/26/2014 06:20 AM, Martin Steigerwald wrote:
> Am Montag, 25. August 2014, 10:58:13 schrieb Chris Mason:
>> On 08/15/2014 11:36 AM, Liu Bo wrote:
>>> This has been reported and discussed for a long time, and this hang occurs
>>> in both 3.15 and 3.16.
>>
>> [ great description ]
>>
>> I ran this through tests last week, and an overnight test over the
>> weekend.  It's in my for-linus branch now, along with everything else I
>> plan on sending for rc3.
>>
>> Please double check my merge, I had to undo your rebase onto Miao's patches.
> 
> I would like to test this on 3.17-rc2, what do I need to do to make it apply 
> cleanly? That function in disk-io.c looks quite different from what the patch 
> assumes at the beginning, so I am not sure how to merge this.

You can just pull my for-linus branch, Liu Bo's fix is on top.

-chris


* Re: [PATCH v3] Btrfs: fix task hang under heavy compressed write
  2014-08-26 13:02       ` Chris Mason
@ 2014-08-26 13:20         ` Martin Steigerwald
  2014-08-31 11:48           ` Martin Steigerwald
  0 siblings, 1 reply; 22+ messages in thread
From: Martin Steigerwald @ 2014-08-26 13:20 UTC (permalink / raw)
  To: Chris Mason; +Cc: Liu Bo, linux-btrfs

Am Dienstag, 26. August 2014, 09:02:23 schrieb Chris Mason:
> On 08/26/2014 06:20 AM, Martin Steigerwald wrote:
> > Am Montag, 25. August 2014, 10:58:13 schrieb Chris Mason:
> >> On 08/15/2014 11:36 AM, Liu Bo wrote:
> >>> This has been reported and discussed for a long time, and this hang
> >>> occurs
> >>> in both 3.15 and 3.16.
> >> 
> >> [ great description ]
> >> 
> >> I ran this through tests last week, and an overnight test over the
> >> weekend.  It's in my for-linus branch now, along with everything else I
> >> plan on sending for rc3.
> >> 
> >> Please double check my merge, I had to undo your rebase onto Miao's
> >> patches.
> > I would like to test this on 3.17-rc2, what do I need to do to make it
> > apply cleanly? That function in disk-io.c looks quite different from what
> > the patch assumes at the beginning, so I am not sure how to merge this.
> 
> You can just pull my for-linus branch, Liu Bo's fix is on top.

Thanks, I put the commit on top of your for-linus branch on top of 3.17-rc2. 
Let's see how it goes.
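
For the record, the sequence was roughly the following. It is shown here against a throwaway repo so it is self-contained; the branch name and commit subject merely stand in for your for-linus branch and Liu Bo's fix, and in reality one starts from a clone of the kernel tree rather than an empty repo:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=main init -q

# Base commit stands in for the v3.17-rc2 tag.
git -c user.name=t -c user.email=t@example.com \
    commit -q --allow-empty -m "v3.17-rc2 stand-in"

# "for-linus" stands in for the maintainer branch carrying the fix.
git branch for-linus
git checkout -q for-linus
echo 'fix' > fix.c
git add fix.c
git -c user.name=t -c user.email=t@example.com \
    commit -q -m "Btrfs: fix task hang under heavy compressed write"

# Back on the rc2 base, put the fix commit on top of it.
git checkout -q -
git -c user.name=t -c user.email=t@example.com cherry-pick for-linus
git log --oneline -1    # the fix commit is now the tip
```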

-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7


* Re: [PATCH v3] Btrfs: fix task hang under heavy compressed write
  2014-08-26 13:20         ` Martin Steigerwald
@ 2014-08-31 11:48           ` Martin Steigerwald
  2014-08-31 15:40             ` Liu Bo
  0 siblings, 1 reply; 22+ messages in thread
From: Martin Steigerwald @ 2014-08-31 11:48 UTC (permalink / raw)
  To: Chris Mason; +Cc: Liu Bo, linux-btrfs

Am Dienstag, 26. August 2014, 15:20:21 schrieb Martin Steigerwald:
> Am Dienstag, 26. August 2014, 09:02:23 schrieb Chris Mason:
> > On 08/26/2014 06:20 AM, Martin Steigerwald wrote:
> > > Am Montag, 25. August 2014, 10:58:13 schrieb Chris Mason:
> > >> On 08/15/2014 11:36 AM, Liu Bo wrote:
> > >>> This has been reported and discussed for a long time, and this hang
> > >>> occurs
> > >>> in both 3.15 and 3.16.
> > >> 
> > >> [ great description ]
> > >> 
> > >> I ran this through tests last week, and an overnight test over the
> > >> weekend.  It's in my for-linus branch now, along with everything else I
> > >> plan on sending for rc3.
> > >> 
> > >> Please double check my merge, I had to undo your rebase onto Miao's
> > >> patches.
> > > 
> > > I would like to test this on 3.17-rc2, what do I need to do to make it
> > > apply cleanly? That function in disk-io.c looks quite different from
> > > what
> > > the patch assumes at the beginning, so I am not sure how to merge this.
> > 
> > You can just pull my for-linus branch, Liu Bo's fix is on top.
> 
> Thanks, I put the commit on top of your for-linus branch on top of 3.17-rc2.
> Let's see how it goes.

So far this appears to work well.

I have experienced no hangs.

Tested-by: Martin Steigerwald <martin@lichtvoll.de>

But I bet Linus took it already anyway :). Looking forward to rc3.

Thanks for digging into and fixing this long-standing, annoying bug.

Thanks,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7


* Re: [PATCH v3] Btrfs: fix task hang under heavy compressed write
  2014-08-31 11:48           ` Martin Steigerwald
@ 2014-08-31 15:40             ` Liu Bo
  0 siblings, 0 replies; 22+ messages in thread
From: Liu Bo @ 2014-08-31 15:40 UTC (permalink / raw)
  To: Martin Steigerwald; +Cc: Chris Mason, linux-btrfs

On Sun, Aug 31, 2014 at 01:48:36PM +0200, Martin Steigerwald wrote:
> Am Dienstag, 26. August 2014, 15:20:21 schrieb Martin Steigerwald:
> > Am Dienstag, 26. August 2014, 09:02:23 schrieb Chris Mason:
> > > On 08/26/2014 06:20 AM, Martin Steigerwald wrote:
> > > > Am Montag, 25. August 2014, 10:58:13 schrieb Chris Mason:
> > > >> On 08/15/2014 11:36 AM, Liu Bo wrote:
> > > >>> This has been reported and discussed for a long time, and this hang
> > > >>> occurs
> > > >>> in both 3.15 and 3.16.
> > > >> 
> > > >> [ great description ]
> > > >> 
> > > >> I ran this through tests last week, and an overnight test over the
> > > >> weekend.  It's in my for-linus branch now, along with everything else I
> > > >> plan on sending for rc3.
> > > >> 
> > > >> Please double check my merge, I had to undo your rebase onto Miao's
> > > >> patches.
> > > > 
> > > > I would like to test this on 3.17-rc2, what do I need to do to make it
> > > > apply cleanly? That function in disk-io.c looks quite different from
> > > > what
> > > > the patch assumes at the beginning, so I am not sure how to merge this.
> > > 
> > > You can just pull my for-linus branch, Liu Bo's fix is on top.
> > 
> > Thanks, I put the commit on top of your for-linus branch on top of 3.17-rc2.
> > Let's see how it goes.
> 
> So far this appears to work well.
> 
> I have experienced no hangs.
> 
> Tested-By: Martin Steigerwald <martin@lichtvoll.de>
> 
> But I bet Linus took it already anyways :). Looking forward to rc3.
> 
> Thanks for digging into and fixing this long standing annoying bug.

Thanks for your feedback!

thanks,
-liubo


end of thread, other threads:[~2014-08-31 15:41 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-08-12  7:44 [PATCH] Btrfs: fix task hang under heavy compressed write Liu Bo
2014-08-12 14:35 ` [PATCH v2] " Liu Bo
2014-08-12 14:57 ` [PATCH] " Chris Mason
2014-08-13  0:53   ` Qu Wenruo
2014-08-13 11:54 ` Martin Steigerwald
2014-08-13 13:27   ` Rich Freeman
2014-08-13 15:20   ` Liu Bo
2014-08-14  9:27     ` Martin Steigerwald
2014-08-15 17:51       ` Martin Steigerwald
2014-08-15 15:36 ` [PATCH v3] " Liu Bo
2014-08-15 16:05   ` Chris Mason
2014-08-16  7:28   ` Miao Xie
2014-08-18  7:32     ` Liu Bo
2014-08-25 14:58   ` Chris Mason
2014-08-25 15:19     ` Liu Bo
2014-08-26 10:20     ` Martin Steigerwald
2014-08-26 10:38       ` Liu Bo
2014-08-26 12:04         ` Martin Steigerwald
2014-08-26 13:02       ` Chris Mason
2014-08-26 13:20         ` Martin Steigerwald
2014-08-31 11:48           ` Martin Steigerwald
2014-08-31 15:40             ` Liu Bo
