All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: Blocked tasks on 3.15.1
@ 2014-06-27 10:02 Tomasz Chmielewski
  2014-06-27 13:06 ` Duncan
  0 siblings, 1 reply; 49+ messages in thread
From: Tomasz Chmielewski @ 2014-06-27 10:02 UTC (permalink / raw)
  To: linux-btrfs

> I've been getting blocked tasks on 3.15.1 generally at times when the
> filesystem is somewhat busy (such as doing a backup via scp/clonezilla
> writing to the disk).
> 
> A week ago I had enabled snapper for a day which resulted in a daily
> cleanup of about 8 snapshots at once, which might have contributed,
> but I've been limping along since.

I've started seeing similar on several servers, after upgrading to 3.15 
or 3.15.1. With 3.16-rc1 it was even crashing for me.
I've rolled back to the latest 3.14.x, and it's still behaving fine.

I've signalled it before on the list in "btrfs filesystem hang with 
3.15-rc3 when doing rsync" thread.

-- 
Tomasz Chmielewski
http://www.sslrack.com


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Blocked tasks on 3.15.1
  2014-06-27 10:02 Blocked tasks on 3.15.1 Tomasz Chmielewski
@ 2014-06-27 13:06 ` Duncan
  2014-06-27 15:14   ` Rich Freeman
  0 siblings, 1 reply; 49+ messages in thread
From: Duncan @ 2014-06-27 13:06 UTC (permalink / raw)
  To: linux-btrfs

Tomasz Chmielewski posted on Fri, 27 Jun 2014 12:02:43 +0200 as excerpted:

>> I've been getting blocked tasks on 3.15.1 generally at times when the
>> filesystem is somewhat busy (such as doing a backup via scp/clonezilla
>> writing to the disk).
> 
> I've started seeing similar on several servers, after upgrading to 3.15
> or 3.15.1. With 3.16-rc1 it was even crashing for me.
> I've rolled back to the latest 3.14.x, and it's still behaving fine.
> 
> I've signalled it before on the list in "btrfs filesystem hang with
> 3.15-rc3 when doing rsync" thread.

There is a known btrfs lockup bug that was introduced in the commit-
window btrfs pull for 3.16, that was fixed by a pull I believe the day 
before 3.16-rc2.  So 3.16-pre to rc2 is known-bad tho it'll work for a 
few minutes and didn't do any permanent damage that I could see, here.

But from 3.16-rc2 on, the 3.16-pre series has been working fine for me.

For 3.15, I didn't run the pre-releases as I had another project I was 
focusing on, but I experienced no problems with 3.15 itself.  However, my 
use-case is multiple independent small btrfs on partitioned SSD, sub-100-
GB per btrfs, so I'd be less likely to experience the blocked task issues 
that others reported, mostly on TB+ size spinning rust.

And it /did/ seem to me that the frequency of blocked-task reports were 
higher for 3.15 than for previous kernel series, tho 3.15 worked fine for 
me on small btrfs on SSD, the relatively short time I ran it.

Hopefully that problem's fixed on 3.16-rc2+, but as of yet there's not 
enough 3.16-rc2+ reports out there from folks experiencing issues with 
3.15 blocked tasks to rightfully say.  What CAN be said is that the known 
3.16-series commit-window btrfs lockups bug that DID affect me was fixed 
right before rc2, and I'm running rc2+ just fine, here.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Blocked tasks on 3.15.1
  2014-06-27 13:06 ` Duncan
@ 2014-06-27 15:14   ` Rich Freeman
  2014-06-27 15:52     ` Chris Murphy
  0 siblings, 1 reply; 49+ messages in thread
From: Rich Freeman @ 2014-06-27 15:14 UTC (permalink / raw)
  To: Duncan; +Cc: linux-btrfs

On Fri, Jun 27, 2014 at 9:06 AM, Duncan <1i5t5.duncan@cox.net> wrote:
> Hopefully that problem's fixed on 3.16-rc2+, but as of yet there's not
> enough 3.16-rc2+ reports out there from folks experiencing issues with
> 3.15 blocked tasks to rightfully say.

Any chance that it was backported to 3.15.2?  I'd rather not move to
mainline just for btrfs.

I got another block this morning and failed to capture a log before my
terminals gave out.  I switched back to 3.15.0 for the moment, and
we'll see if that fares any better.

Rich

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Blocked tasks on 3.15.1
  2014-06-27 15:14   ` Rich Freeman
@ 2014-06-27 15:52     ` Chris Murphy
  2014-06-27 17:20       ` Duncan
  2014-06-27 18:33       ` Rich Freeman
  0 siblings, 2 replies; 49+ messages in thread
From: Chris Murphy @ 2014-06-27 15:52 UTC (permalink / raw)
  To: Btrfs BTRFS


On Jun 27, 2014, at 9:14 AM, Rich Freeman <r-btrfs@thefreemanclan.net> wrote:

> On Fri, Jun 27, 2014 at 9:06 AM, Duncan <1i5t5.duncan@cox.net> wrote:
>> Hopefully that problem's fixed on 3.16-rc2+, but as of yet there's not
>> enough 3.16-rc2+ reports out there from folks experiencing issues with
>> 3.15 blocked tasks to rightfully say.
> 
> Any chance that it was backported to 3.15.2?  I'd rather not move to
> mainline just for btrfs.

The backports don't happen that quickly. I'm uncertain about specifics but I think many such fixes need to be demonstrated in mainline before they get backported to stable.


> 
> I got another block this morning and failed to capture a log before my
> terminals gave out.  I switched back to 3.15.0 for the moment, and
> we'll see if that fares any better. 

Yeah I'd start going backwards. The idea of going forwards is to hopefully get you unstuck or extract data where otherwise you can't, it's not really a recommendation for production usage. It's also often useful if you can reproduce the block with a current rc kernel and issue sysrq+w and post that. Then do your regression with an older kernel.

Chris Murphy

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Blocked tasks on 3.15.1
  2014-06-27 15:52     ` Chris Murphy
@ 2014-06-27 17:20       ` Duncan
  2014-06-28  0:22         ` Chris Samuel
  2014-06-27 18:33       ` Rich Freeman
  1 sibling, 1 reply; 49+ messages in thread
From: Duncan @ 2014-06-27 17:20 UTC (permalink / raw)
  To: linux-btrfs

Chris Murphy posted on Fri, 27 Jun 2014 09:52:46 -0600 as excerpted:

> On Jun 27, 2014, at 9:14 AM, Rich Freeman <r-btrfs@thefreemanclan.net>
> wrote:
> 
>> On Fri, Jun 27, 2014 at 9:06 AM, Duncan <1i5t5.duncan@cox.net> wrote:
>>> Hopefully that problem's fixed on 3.16-rc2+, but as of yet there's not
>>> enough 3.16-rc2+ reports out there from folks experiencing issues with
>>> 3.15 blocked tasks to rightfully say.
>> 
>> Any chance that it was backported to 3.15.2?  I'd rather not move to
>> mainline just for btrfs.
> 
> The backports don't happen that quickly.

The lockup bug that affected early 3.16 was introduced in the commit-
window pull for 3.16, so the fix for that shouldn't have needed backported 
(unless the problem commit ended up in stable too, which I doubt but 
don't know for sure).

3.15.0 didn't contain that bug, which affected me, but as I said, there 
did seem to be more blocked-task reports in 3.15, which didn't affect me.

I didn't run 3.15.1, however, staying on 3.15.0 until after 3.16-rc2 
fixed the earlier 3.16-pre series bug that had kept me from the 3.16 
series until then.  So anything that might have affected the 3.15 stable 
series after 3.15.0, I wouldn't know about.

If I'm not mistaken the fix for the 3.16 series bug was:

ea4ebde02e08558b020c4b61bb9a4c0fcf63028e

Btrfs: fix deadlocks with trylock on tree nodes.

But I think the 3.16 commit-window changes introducing the bug weren't 
btrfs specific but instead at the generic vfs level.  If that's the case, 
then it's possible that the bug was there before 3.16's commit window and 
might have been triggering some of the 3.15 reports as well, and the 3.16 
vfs change simply made it much worse.

IOW, I don't know whether that 3.16 series fix will help 3.15 or not, but 
I don't believe it'll hurt, and it /might/ help.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Blocked tasks on 3.15.1
  2014-06-27 15:52     ` Chris Murphy
  2014-06-27 17:20       ` Duncan
@ 2014-06-27 18:33       ` Rich Freeman
  1 sibling, 0 replies; 49+ messages in thread
From: Rich Freeman @ 2014-06-27 18:33 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

On Fri, Jun 27, 2014 at 11:52 AM, Chris Murphy <lists@colorremedies.com> wrote:
> On Jun 27, 2014, at 9:14 AM, Rich Freeman <r-btrfs@thefreemanclan.net> wrote:
>
>>
>> I got another block this morning and failed to capture a log before my
>> terminals gave out.  I switched back to 3.15.0 for the moment, and
>> we'll see if that fares any better.
>
> Yeah I'd start going backwards. The idea of going forwards is to
> hopefully get you unstuck or extract data where otherwise you can't,
> it's not really a recommendation for production usage. It's also often
> useful if you can reproduce the block with a current rc kernel and
> issue sysrq+w and post that. Then do your regression with an older
> kernel.

So, obviously I'm getting my money's worth from the btrfs team, but
neither is always a great option as neither involves me running a
stable kernel.  3.15.0 contains CVE-2014-4014, although I'm running a
version patched for that vulnerability.  If I go back any further I'd
probably have to backport it myself, and I only know about it because
my distro patched that CVE on 3.15.0 before moving to 3.15.1.

Running 3.16 doesn't bother me much from a btrfs standpoint, but it
means I'm getting unstable updates on all the other modules as well.
It is just more to deal with.

I might give 3.15.2 a shot and see what happens, and I can always fall
back to 3.15.0 again.

Rich

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Blocked tasks on 3.15.1
  2014-06-27 17:20       ` Duncan
@ 2014-06-28  0:22         ` Chris Samuel
  2014-06-29 20:02           ` Cody P Schafer
  2014-06-30  2:33           ` Rich Freeman
  0 siblings, 2 replies; 49+ messages in thread
From: Chris Samuel @ 2014-06-28  0:22 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 544 bytes --]

On Fri, 27 Jun 2014 05:20:41 PM Duncan wrote:

> If I'm not mistaken the fix for the 3.16 series bug was:
> 
> ea4ebde02e08558b020c4b61bb9a4c0fcf63028e
> 
> Btrfs: fix deadlocks with trylock on tree nodes.

That patch applies cleanly to 3.15.2 so if it is indeed the fix it should 
probably go to -stable for the next 3.15 release..

Unfortunately my test system died a while ago (hardware problem) and I've not 
been able to resurrect it yet.

cheers,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC


[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 482 bytes --]

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Blocked tasks on 3.15.1
  2014-06-28  0:22         ` Chris Samuel
@ 2014-06-29 20:02           ` Cody P Schafer
  2014-06-29 22:22             ` Cody P Schafer
  2014-06-30 18:11             ` Chris Mason
  2014-06-30  2:33           ` Rich Freeman
  1 sibling, 2 replies; 49+ messages in thread
From: Cody P Schafer @ 2014-06-29 20:02 UTC (permalink / raw)
  To: Chris Samuel; +Cc: linux-btrfs

On Fri, Jun 27, 2014 at 7:22 PM, Chris Samuel <chris@csamuel.org> wrote:
> On Fri, 27 Jun 2014 05:20:41 PM Duncan wrote:
>
>> If I'm not mistaken the fix for the 3.16 series bug was:
>>
>> ea4ebde02e08558b020c4b61bb9a4c0fcf63028e
>>
>> Btrfs: fix deadlocks with trylock on tree nodes.
>
> That patch applies cleanly to 3.15.2 so if it is indeed the fix it should
> probably go to -stable for the next 3.15 release..
>
> Unfortunately my test system died a while ago (hardware problem) and I've not
> been able to resurrect it yet.

I'm also seeing stuck tasks on btrfs (3.14.4, 3.15.1, 3.15.2).
I've also tried 3.15.2 with ea4ebde02e08558b020c4b61bb9a4c applied on
top with similar results.
I've been triggering the hang with 'rsync -hPaHAXx --del /mnt/home/a/
/home/a/' where /mnt/home and /home are 2 separate btrfs filesystems
on 2 separate disks.

dmesg with w-trigger: http://bpaste.net/show/419555/ (3.15.2 + ea4ebde)

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Blocked tasks on 3.15.1
  2014-06-29 20:02           ` Cody P Schafer
@ 2014-06-29 22:22             ` Cody P Schafer
  2014-06-30 18:11             ` Chris Mason
  1 sibling, 0 replies; 49+ messages in thread
From: Cody P Schafer @ 2014-06-29 22:22 UTC (permalink / raw)
  To: Chris Samuel; +Cc: linux-btrfs

On Sun, Jun 29, 2014 at 3:02 PM, Cody P Schafer <dev@codyps.com> wrote:
> On Fri, Jun 27, 2014 at 7:22 PM, Chris Samuel <chris@csamuel.org> wrote:
>> On Fri, 27 Jun 2014 05:20:41 PM Duncan wrote:
>>
>>> If I'm not mistaken the fix for the 3.16 series bug was:
>>>
>>> ea4ebde02e08558b020c4b61bb9a4c0fcf63028e
>>>
>>> Btrfs: fix deadlocks with trylock on tree nodes.
>>
>> That patch applies cleanly to 3.15.2 so if it is indeed the fix it should
>> probably go to -stable for the next 3.15 release..
>>
>> Unfortunately my test system died a while ago (hardware problem) and I've not
>> been able to resurrect it yet.
>
> I'm also seeing stuck tasks on btrfs (3.14.4, 3.15.1, 3.15.2).
> I've also tried 3.15.2 with ea4ebde02e08558b020c4b61bb9a4c applied on
> top with similar results.
> I've been triggering the hang with 'rsync -hPaHAXx --del /mnt/home/a/
> /home/a/' where /mnt/home and /home are 2 separate btrfs filesystems
> on 2 separate disks.
>
> dmesg with w-trigger: http://bpaste.net/show/419555/ (3.15.2 + ea4ebde)

And here's the same thing but with lockdep enabled (in the hope that
that info might be useful)
http://bpaste.net/show/419899/

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Blocked tasks on 3.15.1
  2014-06-28  0:22         ` Chris Samuel
  2014-06-29 20:02           ` Cody P Schafer
@ 2014-06-30  2:33           ` Rich Freeman
  1 sibling, 0 replies; 49+ messages in thread
From: Rich Freeman @ 2014-06-30  2:33 UTC (permalink / raw)
  To: Chris Samuel; +Cc: Btrfs BTRFS

On Fri, Jun 27, 2014 at 8:22 PM, Chris Samuel <chris@csamuel.org> wrote:
> On Fri, 27 Jun 2014 05:20:41 PM Duncan wrote:
>
>> If I'm not mistaken the fix for the 3.16 series bug was:
>>
>> ea4ebde02e08558b020c4b61bb9a4c0fcf63028e
>>
>> Btrfs: fix deadlocks with trylock on tree nodes.
>
> That patch applies cleanly to 3.15.2 so if it is indeed the fix it should
> probably go to -stable for the next 3.15 release..

I can confirm that 3.15.2 definitely has the deadlock problem.  I
tried upgrading just to convince myself of this before patching it and
it only took a few hours before it stopped syncing with the usual
errors.

I applied the patch on Jun 28 around 20:00UTC.  I haven't had a
deadlock since, despite having the file system fairly active with a
few reboots, some deleted snapshots, being assimilated by the new
sysvinit replacement, etc.  That doesn't really prove anything though
- for all I know it will hang a week from now.

However, the patch seems stable so far on 3.15.2.

Rich

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Blocked tasks on 3.15.1
  2014-06-29 20:02           ` Cody P Schafer
  2014-06-29 22:22             ` Cody P Schafer
@ 2014-06-30 18:11             ` Chris Mason
  2014-06-30 18:30               ` Chris Mason
  2014-07-01  3:06               ` Charles Cazabon
  1 sibling, 2 replies; 49+ messages in thread
From: Chris Mason @ 2014-06-30 18:11 UTC (permalink / raw)
  To: Cody P Schafer, Chris Samuel; +Cc: linux-btrfs

On 06/29/2014 04:02 PM, Cody P Schafer wrote:
> On Fri, Jun 27, 2014 at 7:22 PM, Chris Samuel <chris@csamuel.org> wrote:
>> On Fri, 27 Jun 2014 05:20:41 PM Duncan wrote:
>>
>>> If I'm not mistaken the fix for the 3.16 series bug was:
>>>
>>> ea4ebde02e08558b020c4b61bb9a4c0fcf63028e
>>>
>>> Btrfs: fix deadlocks with trylock on tree nodes.
>>
>> That patch applies cleanly to 3.15.2 so if it is indeed the fix it should
>> probably go to -stable for the next 3.15 release..
>>
>> Unfortunately my test system died a while ago (hardware problem) and I've not
>> been able to resurrect it yet.
> 
> I'm also seeing stuck tasks on btrfs (3.14.4, 3.15.1, 3.15.2).
> I've also tried 3.15.2 with ea4ebde02e08558b020c4b61bb9a4c applied on
> top with similar results.
> I've been triggering the hang with 'rsync -hPaHAXx --del /mnt/home/a/
> /home/a/' where /mnt/home and /home are 2 separate btrfs filesystems
> on 2 separate disks.
> 
> dmesg with w-trigger: http://bpaste.net/show/419555
> --

These traces show us waiting for IO, but it doesn't show anyone doing
the IO.  Either we're failing to kick off our work queues or they are
stuck on something else.

Could you please send a sysrq-t and sysrq-l while you're stuck?  That
will show us all the procs and all the CPUs.

-chris

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Blocked tasks on 3.15.1
  2014-06-30 18:11             ` Chris Mason
@ 2014-06-30 18:30               ` Chris Mason
  2014-06-30 23:42                 ` Cody P Schafer
  2014-07-01  3:06               ` Charles Cazabon
  1 sibling, 1 reply; 49+ messages in thread
From: Chris Mason @ 2014-06-30 18:30 UTC (permalink / raw)
  To: Cody P Schafer, Chris Samuel; +Cc: linux-btrfs



On 06/30/2014 02:11 PM, Chris Mason wrote:
> On 06/29/2014 04:02 PM, Cody P Schafer wrote:
>> On Fri, Jun 27, 2014 at 7:22 PM, Chris Samuel <chris@csamuel.org> wrote:
>>> On Fri, 27 Jun 2014 05:20:41 PM Duncan wrote:
>>>
>>>> If I'm not mistaken the fix for the 3.16 series bug was:
>>>>
>>>> ea4ebde02e08558b020c4b61bb9a4c0fcf63028e
>>>>
>>>> Btrfs: fix deadlocks with trylock on tree nodes.
>>>
>>> That patch applies cleanly to 3.15.2 so if it is indeed the fix it should
>>> probably go to -stable for the next 3.15 release..
>>>
>>> Unfortunately my test system died a while ago (hardware problem) and I've not
>>> been able to resurrect it yet.
>>
>> I'm also seeing stuck tasks on btrfs (3.14.4, 3.15.1, 3.15.2).
>> I've also tried 3.15.2 with ea4ebde02e08558b020c4b61bb9a4c applied on
>> top with similar results.
>> I've been triggering the hang with 'rsync -hPaHAXx --del /mnt/home/a/
>> /home/a/' where /mnt/home and /home are 2 separate btrfs filesystems
>> on 2 separate disks.
>>
>> dmesg with w-trigger: http://bpaste.net/show/419555
>> --
> 
> These traces show us waiting for IO, but it doesn't show anyone doing
> the IO.  Either we're failing to kick off our work queues or they are
> stuck on something else.
> 
> Could you please send a sysrq-t and sysrq-l while you're stuck?  That
> will show us all the procs and all the CPUs.

Also, do you have any nodatacow files in here?  Please say yes.

-chris


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Blocked tasks on 3.15.1
  2014-06-30 18:30               ` Chris Mason
@ 2014-06-30 23:42                 ` Cody P Schafer
  2014-07-01 21:04                   ` Chris Mason
  0 siblings, 1 reply; 49+ messages in thread
From: Cody P Schafer @ 2014-06-30 23:42 UTC (permalink / raw)
  To: Chris Mason; +Cc: Chris Samuel, linux-btrfs

On Mon, Jun 30, 2014 at 1:30 PM, Chris Mason <clm@fb.com> wrote:
> On 06/30/2014 02:11 PM, Chris Mason wrote:
>> On 06/29/2014 04:02 PM, Cody P Schafer wrote:
>>> On Fri, Jun 27, 2014 at 7:22 PM, Chris Samuel <chris@csamuel.org> wrote:
>>>> On Fri, 27 Jun 2014 05:20:41 PM Duncan wrote:
>>>>
>>>>> If I'm not mistaken the fix for the 3.16 series bug was:
>>>>>
>>>>> ea4ebde02e08558b020c4b61bb9a4c0fcf63028e
>>>>>
>>>>> Btrfs: fix deadlocks with trylock on tree nodes.
>>>>
>>>> That patch applies cleanly to 3.15.2 so if it is indeed the fix it should
>>>> probably go to -stable for the next 3.15 release..
>>>>
>>>> Unfortunately my test system died a while ago (hardware problem) and I've not
>>>> been able to resurrect it yet.
>>>
>>> I'm also seeing stuck tasks on btrfs (3.14.4, 3.15.1, 3.15.2).
>>> I've also tried 3.15.2 with ea4ebde02e08558b020c4b61bb9a4c applied on
>>> top with similar results.
>>> I've been triggering the hang with 'rsync -hPaHAXx --del /mnt/home/a/
>>> /home/a/' where /mnt/home and /home are 2 separate btrfs filesystems
>>> on 2 separate disks.
>>>
>>> dmesg with w-trigger: http://bpaste.net/show/419555
>>> --
>>
>> These traces show us waiting for IO, but it doesn't show anyone doing
>> the IO.  Either we're failing to kick off our work queues or they are
>> stuck on something else.
>>
>> Could you please send a sysrq-t and sysrq-l while you're stuck?  That
>> will show us all the procs and all the CPUs.
>
> Also, do you have any nodatacow files in here?  Please say yes.
>

kernel log from 3.15.2 + ea4ebde02 showing the blocked tasks,
sysrq-{w,t,l} included
http://bpaste.net/show/423296/

I haven't explicitely created any nodatacow files, is there a quick
way to tell if there are any? Right now I'm doing
`lsattr -R /mnt/home/a/ 2>/dev/null | grep -- '^-*C-* '` to try and check.

(2>/dev/null is hiding lots of "Operation not supported While reading
flags on" warnings)

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Blocked tasks on 3.15.1
  2014-06-30 18:11             ` Chris Mason
  2014-06-30 18:30               ` Chris Mason
@ 2014-07-01  3:06               ` Charles Cazabon
  1 sibling, 0 replies; 49+ messages in thread
From: Charles Cazabon @ 2014-07-01  3:06 UTC (permalink / raw)
  To: linux-btrfs

Chris Mason <clm@fb.com> wrote:
> On 06/29/2014 04:02 PM, Cody P Schafer wrote:
> >> been able to resurrect it yet.
> > 
> > I'm also seeing stuck tasks on btrfs (3.14.4, 3.15.1, 3.15.2).

I'm seeing these with 3.15.2 as well.

> Could you please send a sysrq-t and sysrq-l while you're stuck?  That
> will show us all the procs and all the CPUs.

For what it's worth, http://bpaste.net/show/BswHMVpHlguSrdELgv7e/ is my syslog
covering my most recent "stuck" event, including the results of sysrq-t and
sysrq-l.

Charles
-- 
-----------------------------------------------------------------------
Charles Cazabon
GPL'ed software available at:               http://pyropus.ca/software/
-----------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Blocked tasks on 3.15.1
  2014-06-30 23:42                 ` Cody P Schafer
@ 2014-07-01 21:04                   ` Chris Mason
  2014-07-01 23:05                     ` Cody P Schafer
  0 siblings, 1 reply; 49+ messages in thread
From: Chris Mason @ 2014-07-01 21:04 UTC (permalink / raw)
  To: Cody P Schafer; +Cc: Chris Samuel, linux-btrfs

On 06/30/2014 07:42 PM, Cody P Schafer wrote:
> On Mon, Jun 30, 2014 at 1:30 PM, Chris Mason <clm@fb.com> wrote:
>> On 06/30/2014 02:11 PM, Chris Mason wrote:
>>> On 06/29/2014 04:02 PM, Cody P Schafer wrote:
>>>> On Fri, Jun 27, 2014 at 7:22 PM, Chris Samuel <chris@csamuel.org> wrote:
>>>>> On Fri, 27 Jun 2014 05:20:41 PM Duncan wrote:
>>>>>
>>>>>> If I'm not mistaken the fix for the 3.16 series bug was:
>>>>>>
>>>>>> ea4ebde02e08558b020c4b61bb9a4c0fcf63028e
>>>>>>
>>>>>> Btrfs: fix deadlocks with trylock on tree nodes.
>>>>>
>>>>> That patch applies cleanly to 3.15.2 so if it is indeed the fix it should
>>>>> probably go to -stable for the next 3.15 release..
>>>>>
>>>>> Unfortunately my test system died a while ago (hardware problem) and I've not
>>>>> been able to resurrect it yet.
>>>>
>>>> I'm also seeing stuck tasks on btrfs (3.14.4, 3.15.1, 3.15.2).
>>>> I've also tried 3.15.2 with ea4ebde02e08558b020c4b61bb9a4c applied on
>>>> top with similar results.
>>>> I've been triggering the hang with 'rsync -hPaHAXx --del /mnt/home/a/
>>>> /home/a/' where /mnt/home and /home are 2 separate btrfs filesystems
>>>> on 2 separate disks.
>>>>
>>>> dmesg with w-trigger: https://urldefense.proofpoint.com/v1/url?u=http://bpaste.net/show/419555&k=ZVNjlDMF0FElm4dQtryO4A%3D%3D%0A&r=6%2FL0lzzDhu0Y1hL9xm%2BQyA%3D%3D%0A&m=SAjzDO8AnhJBEWtUi6s8VGVQd2sORQ%2FJz5tWH4nOYWg%3D%0A&s=2c4ff3f7f39b2e6d3dcd4947905df54d6a534b35adf63c55d8c50e28ef5781b6
>>>> --
>>>
>>> These traces show us waiting for IO, but it doesn't show anyone doing
>>> the IO.  Either we're failing to kick off our work queues or they are
>>> stuck on something else.
>>>
>>> Could you please send a sysrq-t and sysrq-l while you're stuck?  That
>>> will show us all the procs and all the CPUs.
>>
>> Also, do you have any nodatacow files in here?  Please say yes.
>>
> 
> kernel log from 3.15.2 + ea4ebde02 showing the blocked tasks,
> sysrq-{w,t,l} included
> https://urldefense.proofpoint.com/v1/url?u=http://bpaste.net/show/423296/&k=ZVNjlDMF0FElm4dQtryO4A%3D%3D%0A&r=6%2FL0lzzDhu0Y1hL9xm%2BQyA%3D%3D%0A&m=SAjzDO8AnhJBEWtUi6s8VGVQd2sORQ%2FJz5tWH4nOYWg%3D%0A&s=5af8bc75059925af242b0eef1f4b94348d233d79968d53ff36b7c2594c9dd6b9
> 
> I haven't explicitely created any nodatacow files, is there a quick
> way to tell if there are any? Right now I'm doing
> `lsattr -R /mnt/home/a/ 2>/dev/null | grep -- '^-*C-* '` to try and check.
> 
> (2>/dev/null is hiding lots of "Operation not supported While reading
> flags on" warnings)
> 

If you haven't turned nodatacow on intentionally, you don't have any
nodatacow files ;)  I have been trying to reproduce this with rsync and
other code that hammers on the ordered writeback, but no luck yet.

Before we spend too much time triggering it again, I'd like you to
please try a patch from Filipe that is in current mainline.  I've cherry
picked on top of 3.15.3 in a branch called v3.15.y:

git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git v3.15.y

-chris


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Blocked tasks on 3.15.1
  2014-07-01 21:04                   ` Chris Mason
@ 2014-07-01 23:05                     ` Cody P Schafer
  2014-07-02 12:27                       ` Cody P Schafer
  0 siblings, 1 reply; 49+ messages in thread
From: Cody P Schafer @ 2014-07-01 23:05 UTC (permalink / raw)
  To: Chris Mason; +Cc: Chris Samuel, linux-btrfs

On Tue, Jul 1, 2014 at 4:04 PM, Chris Mason <clm@fb.com> wrote:
> On 06/30/2014 07:42 PM, Cody P Schafer wrote:
>> On Mon, Jun 30, 2014 at 1:30 PM, Chris Mason <clm@fb.com> wrote:
>>> On 06/30/2014 02:11 PM, Chris Mason wrote:
>>>> On 06/29/2014 04:02 PM, Cody P Schafer wrote:
>>>>> On Fri, Jun 27, 2014 at 7:22 PM, Chris Samuel <chris@csamuel.org> wrote:
>>>>>> On Fri, 27 Jun 2014 05:20:41 PM Duncan wrote:
>>>>>>
>>>>>>> If I'm not mistaken the fix for the 3.16 series bug was:
>>>>>>>
>>>>>>> ea4ebde02e08558b020c4b61bb9a4c0fcf63028e
>>>>>>>
>>>>>>> Btrfs: fix deadlocks with trylock on tree nodes.
>>>>>>
>>>>>> That patch applies cleanly to 3.15.2 so if it is indeed the fix it should
>>>>>> probably go to -stable for the next 3.15 release..
>>>>>>
>>>>>> Unfortunately my test system died a while ago (hardware problem) and I've not
>>>>>> been able to resurrect it yet.
>>>>>
>>>>> I'm also seeing stuck tasks on btrfs (3.14.4, 3.15.1, 3.15.2).
>>>>> I've also tried 3.15.2 with ea4ebde02e08558b020c4b61bb9a4c applied on
>>>>> top with similar results.
>>>>> I've been triggering the hang with 'rsync -hPaHAXx --del /mnt/home/a/
>>>>> /home/a/' where /mnt/home and /home are 2 separate btrfs filesystems
>>>>> on 2 separate disks.
>>>>>
>>>>> dmesg with w-trigger: https://urldefense.proofpoint.com/v1/url?u=http://bpaste.net/show/419555&k=ZVNjlDMF0FElm4dQtryO4A%3D%3D%0A&r=6%2FL0lzzDhu0Y1hL9xm%2BQyA%3D%3D%0A&m=SAjzDO8AnhJBEWtUi6s8VGVQd2sORQ%2FJz5tWH4nOYWg%3D%0A&s=2c4ff3f7f39b2e6d3dcd4947905df54d6a534b35adf63c55d8c50e28ef5781b6
>>>>> --
>>>>
>>>> These traces show us waiting for IO, but it doesn't show anyone doing
>>>> the IO.  Either we're failing to kick off our work queues or they are
>>>> stuck on something else.
>>>>
>>>> Could you please send a sysrq-t and sysrq-l while you're stuck?  That
>>>> will show us all the procs and all the CPUs.
>>>
>>> Also, do you have any nodatacow files in here?  Please say yes.
>>>
>>
>> kernel log from 3.15.2 + ea4ebde02 showing the blocked tasks,
>> sysrq-{w,t,l} included
>> https://urldefense.proofpoint.com/v1/url?u=http://bpaste.net/show/423296/&k=ZVNjlDMF0FElm4dQtryO4A%3D%3D%0A&r=6%2FL0lzzDhu0Y1hL9xm%2BQyA%3D%3D%0A&m=SAjzDO8AnhJBEWtUi6s8VGVQd2sORQ%2FJz5tWH4nOYWg%3D%0A&s=5af8bc75059925af242b0eef1f4b94348d233d79968d53ff36b7c2594c9dd6b9
>>
>> I haven't explicitely created any nodatacow files, is there a quick
>> way to tell if there are any? Right now I'm doing
>> `lsattr -R /mnt/home/a/ 2>/dev/null | grep -- '^-*C-* '` to try and check.
>>
>> (2>/dev/null is hiding lots of "Operation not supported While reading
>> flags on" warnings)
>>
>
> If you haven't turned nodatacow on intentionally, you don't have any
> nodatacow files ;)  I have been trying to reproduce this with rsync and
> other code that hammers on the ordered writeback, but no luck yet.
>
> Before we spend too much time triggering it again, I'd like you to
> please try a patch from Filipe that is in current mainline.  I've cherry
> picked on top of 3.15.3 in a branch called v3.15.y:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git v3.15.y
>
> -chris
>

Will do. The rsync I'm running is processing a lot of chromium cache
files when it hangs (just for a reference), and ends up triggering a
bunch of deletes as well.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Blocked tasks on 3.15.1
  2014-07-01 23:05                     ` Cody P Schafer
@ 2014-07-02 12:27                       ` Cody P Schafer
  2014-07-02 13:58                         ` Chris Mason
  0 siblings, 1 reply; 49+ messages in thread
From: Cody P Schafer @ 2014-07-02 12:27 UTC (permalink / raw)
  To: Chris Mason; +Cc: Chris Samuel, linux-btrfs

On Tue, Jul 1, 2014 at 6:05 PM, Cody P Schafer <dev@codyps.com> wrote:
> On Tue, Jul 1, 2014 at 4:04 PM, Chris Mason <clm@fb.com> wrote:
>> On 06/30/2014 07:42 PM, Cody P Schafer wrote:
>>> On Mon, Jun 30, 2014 at 1:30 PM, Chris Mason <clm@fb.com> wrote:
>>>> On 06/30/2014 02:11 PM, Chris Mason wrote:
>>>>> On 06/29/2014 04:02 PM, Cody P Schafer wrote:
>>>>>> On Fri, Jun 27, 2014 at 7:22 PM, Chris Samuel <chris@csamuel.org> wrote:
>>>>>>> On Fri, 27 Jun 2014 05:20:41 PM Duncan wrote:
>>>>>>>
>>>>>>>> If I'm not mistaken the fix for the 3.16 series bug was:
>>>>>>>>
>>>>>>>> ea4ebde02e08558b020c4b61bb9a4c0fcf63028e
>>>>>>>>
>>>>>>>> Btrfs: fix deadlocks with trylock on tree nodes.
>>>>>>>
>>>>>>> That patch applies cleanly to 3.15.2 so if it is indeed the fix it should
>>>>>>> probably go to -stable for the next 3.15 release..
>>>>>>>
>>>>>>> Unfortunately my test system died a while ago (hardware problem) and I've not
>>>>>>> been able to resurrect it yet.
>>>>>>
>>>>>> I'm also seeing stuck tasks on btrfs (3.14.4, 3.15.1, 3.15.2).
>>>>>> I've also tried 3.15.2 with ea4ebde02e08558b020c4b61bb9a4c applied on
>>>>>> top with similar results.
>>>>>> I've been triggering the hang with 'rsync -hPaHAXx --del /mnt/home/a/
>>>>>> /home/a/' where /mnt/home and /home are 2 separate btrfs filesystems
>>>>>> on 2 separate disks.
>>>>>>
>>>>>> dmesg with w-trigger: https://urldefense.proofpoint.com/v1/url?u=http://bpaste.net/show/419555&k=ZVNjlDMF0FElm4dQtryO4A%3D%3D%0A&r=6%2FL0lzzDhu0Y1hL9xm%2BQyA%3D%3D%0A&m=SAjzDO8AnhJBEWtUi6s8VGVQd2sORQ%2FJz5tWH4nOYWg%3D%0A&s=2c4ff3f7f39b2e6d3dcd4947905df54d6a534b35adf63c55d8c50e28ef5781b6
>>>>>> --
>>>>>
>>>>> These traces show us waiting for IO, but it doesn't show anyone doing
>>>>> the IO.  Either we're failing to kick off our work queues or they are
>>>>> stuck on something else.
>>>>>
>>>>> Could you please send a sysrq-t and sysrq-l while you're stuck?  That
>>>>> will show us all the procs and all the CPUs.
>>>>
>>>> Also, do you have any nodatacow files in here?  Please say yes.
>>>>
>>>
>>> kernel log from 3.15.2 + ea4ebde02 showing the blocked tasks,
>>> sysrq-{w,t,l} included
>>> https://urldefense.proofpoint.com/v1/url?u=http://bpaste.net/show/423296/&k=ZVNjlDMF0FElm4dQtryO4A%3D%3D%0A&r=6%2FL0lzzDhu0Y1hL9xm%2BQyA%3D%3D%0A&m=SAjzDO8AnhJBEWtUi6s8VGVQd2sORQ%2FJz5tWH4nOYWg%3D%0A&s=5af8bc75059925af242b0eef1f4b94348d233d79968d53ff36b7c2594c9dd6b9
>>>
>>> I haven't explicitely created any nodatacow files, is there a quick
>>> way to tell if there are any? Right now I'm doing
>>> `lsattr -R /mnt/home/a/ 2>/dev/null | grep -- '^-*C-* '` to try and check.
>>>
>>> (2>/dev/null is hiding lots of "Operation not supported While reading
>>> flags on" warnings)
>>>
>>
>> If you haven't turned nodatacow on intentionally, you don't have any
>> nodatacow files ;)  I have been trying to reproduce this with rsync and
>> other code that hammers on the ordered writeback, but no luck yet.
>>
>> Before we spend too much time triggering it again, I'd like you to
>> please try a patch from Filipe that is in current mainline.  I've cherry
>> picked on top of 3.15.3 in a branch called v3.15.y:
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git v3.15.y
>
> Will do. The rsync I'm running is processing a lot of chromium cache
> files when it hangs (just for a reference), and ends up triggering a
> bunch of deletes as well.

Still a problem with your v3.15.y (eb97581), here's the log with
sysrq-t and sysrq-l
http://bpaste.net/show/428234/

Also, correction, it's a firefox cache dir rsync that seems to trigger
it (stalls pretty early on and very consistently):

[... snip ...]
.cache/mozilla/firefox/kqtl1tlc.test/Cache/7/1F/F43F9d01
          5.23M 100%   17.82MB/s    0:00:00 (xfr#452, ir-chk=1201/6659)
.cache/mozilla/firefox/kqtl1tlc.test/Cache/7/20/
.cache/mozilla/firefox/kqtl1tlc.test/Cache/7/20/23A66d01
        116.82K 100%  376.50kB/s    0:00:00 (xfr#453, ir-chk=1200/6659)
.cache/mozilla/firefox/kqtl1tlc.test/Cache/7/21/
.cache/mozilla/firefox/kqtl1tlc.test/Cache/7/23/
.cache/mozilla/firefox/kqtl1tlc.test/Cache/7/24/
.cache/mozilla/firefox/kqtl1tlc.test/Cache/7/25/
.cache/mozilla/firefox/kqtl1tlc.test/Cache/7/25/7C836d01
[... stall here ...]

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Blocked tasks on 3.15.1
  2014-07-02 12:27                       ` Cody P Schafer
@ 2014-07-02 13:58                         ` Chris Mason
  2014-07-02 14:15                           ` Chris Mason
  0 siblings, 1 reply; 49+ messages in thread
From: Chris Mason @ 2014-07-02 13:58 UTC (permalink / raw)
  To: Cody P Schafer; +Cc: Chris Samuel, linux-btrfs

On 07/02/2014 08:27 AM, Cody P Schafer wrote:

>> Will do. The rsync I'm running is processing a lot of chromium cache
>> files when it hangs (just for a reference), and ends up triggering a
>> bunch of deletes as well.
> 
> Still a problem with your v3.15.y (eb97581), here's the log with
> sysrq-t and sysrq-l
> https://urldefense.proofpoint.com/v1/url?u=http://bpaste.net/show/428234/&k=ZVNjlDMF0FElm4dQtryO4A%3D%3D%0A&r=6%2FL0lzzDhu0Y1hL9xm%2BQyA%3D%3D%0A&m=9lRzEuxWeyHtsDXvFJNWlf2CgKZWZ1w%2FScqbUMy1jII%3D%0A&s=0daa1232bef652c4f16c9d12cdad408909feaa5069ba3c1888fa4895e01ec3a2
> 
> Also, correction, it's a firefox cache dir rsync that seems to trigger
> it (stalls pretty early on and very consistently):
> 
> [... snip ...]
> .cache/mozilla/firefox/kqtl1tlc.test/Cache/7/1F/F43F9d01
>           5.23M 100%   17.82MB/s    0:00:00 (xfr#452, ir-chk=1201/6659)
> .cache/mozilla/firefox/kqtl1tlc.test/Cache/7/20/
> .cache/mozilla/firefox/kqtl1tlc.test/Cache/7/20/23A66d01
>         116.82K 100%  376.50kB/s    0:00:00 (xfr#453, ir-chk=1200/6659)
> .cache/mozilla/firefox/kqtl1tlc.test/Cache/7/21/
> .cache/mozilla/firefox/kqtl1tlc.test/Cache/7/23/
> .cache/mozilla/firefox/kqtl1tlc.test/Cache/7/24/
> .cache/mozilla/firefox/kqtl1tlc.test/Cache/7/25/
> .cache/mozilla/firefox/kqtl1tlc.test/Cache/7/25/7C836d01
> [... stall here ...]
> 

Ok, and just to clarify, are you actively using the files on the
destination outside of rsync?

-chris

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Blocked tasks on 3.15.1
  2014-07-02 13:58                         ` Chris Mason
@ 2014-07-02 14:15                           ` Chris Mason
  2014-07-17 13:18                             ` Chris Mason
  0 siblings, 1 reply; 49+ messages in thread
From: Chris Mason @ 2014-07-02 14:15 UTC (permalink / raw)
  To: Cody P Schafer; +Cc: Chris Samuel, linux-btrfs



On 07/02/2014 09:58 AM, Chris Mason wrote:
> On 07/02/2014 08:27 AM, Cody P Schafer wrote:
> 
>>> Will do. The rsync I'm running is processing a lot of chromium cache
>>> files when it hangs (just for a reference), and ends up triggering a
>>> bunch of deletes as well.
>>
>> Still a problem with your v3.15.y (eb97581), here's the log with
>> sysrq-t and sysrq-l
>> https://urldefense.proofpoint.com/v1/url?u=http://bpaste.net/show/428234/&k=ZVNjlDMF0FElm4dQtryO4A%3D%3D%0A&r=6%2FL0lzzDhu0Y1hL9xm%2BQyA%3D%3D%0A&m=9lRzEuxWeyHtsDXvFJNWlf2CgKZWZ1w%2FScqbUMy1jII%3D%0A&s=0daa1232bef652c4f16c9d12cdad408909feaa5069ba3c1888fa4895e01ec3a2
>>
>> Also, correction, it's a firefox cache dir rsync that seems to trigger
>> it (stalls pretty early on and very consistently):
>>
>> [... snip ...]
>> .cache/mozilla/firefox/kqtl1tlc.test/Cache/7/1F/F43F9d01
>>           5.23M 100%   17.82MB/s    0:00:00 (xfr#452, ir-chk=1201/6659)
>> .cache/mozilla/firefox/kqtl1tlc.test/Cache/7/20/
>> .cache/mozilla/firefox/kqtl1tlc.test/Cache/7/20/23A66d01
>>         116.82K 100%  376.50kB/s    0:00:00 (xfr#453, ir-chk=1200/6659)
>> .cache/mozilla/firefox/kqtl1tlc.test/Cache/7/21/
>> .cache/mozilla/firefox/kqtl1tlc.test/Cache/7/23/
>> .cache/mozilla/firefox/kqtl1tlc.test/Cache/7/24/
>> .cache/mozilla/firefox/kqtl1tlc.test/Cache/7/25/
>> .cache/mozilla/firefox/kqtl1tlc.test/Cache/7/25/7C836d01
>> [... stall here ...]
>>
> 
> Ok, and just to clarify, are you actively using the files on the
> destination outside of rsync?

Also, do you have compression on?

-chris


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Blocked tasks on 3.15.1
  2014-07-02 14:15                           ` Chris Mason
@ 2014-07-17 13:18                             ` Chris Mason
  2014-07-19  0:33                               ` Blocked tasks on 3.15.1, raid1 btrfs is no ends of trouble for me Marc MERLIN
  2014-07-19 17:38                               ` Blocked tasks on 3.15.1 Cody P Schafer
  0 siblings, 2 replies; 49+ messages in thread
From: Chris Mason @ 2014-07-17 13:18 UTC (permalink / raw)
  To: Cody P Schafer; +Cc: Chris Samuel, linux-btrfs


[ deadlocks during rsync in 3.15 with compression enabled ]

Hi everyone,

I still haven't been able to reproduce this one here, but I'm going
through a series of tests with lzo compression foraced and every
operation forced to ordered.  Hopefully it'll kick it out soon.

While I'm hammering away, could you please try this patch.  If this is
the buy you're hitting, the deadlock will go away and you'll see this
printk in the log.

thanks!

-chris

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 3668048..8ab56df 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8157,6 +8157,13 @@ void btrfs_destroy_inode(struct inode *inode)
 		spin_unlock(&root->fs_info->ordered_root_lock);
 	}
 
+	spin_lock(&root->fs_info->ordered_root_lock);
+	if (!list_empty(&BTRFS_I(inode)->ordered_operations)) {
+		list_del_init(&BTRFS_I(inode)->ordered_operations);
+printk(KERN_CRIT "racing inode deletion with ordered operations!!!!!!!!!!!\n");
+	}
+	spin_unlock(&root->fs_info->ordered_root_lock);
+
 	if (test_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
 		     &BTRFS_I(inode)->runtime_flags)) {
 		btrfs_info(root->fs_info, "inode %llu still on the orphan list",

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: Blocked tasks on 3.15.1, raid1 btrfs is no ends of trouble for me
  2014-07-17 13:18                             ` Chris Mason
@ 2014-07-19  0:33                               ` Marc MERLIN
  2014-07-19  0:44                                 ` Marc MERLIN
  2014-07-19 17:38                               ` Blocked tasks on 3.15.1 Cody P Schafer
  1 sibling, 1 reply; 49+ messages in thread
From: Marc MERLIN @ 2014-07-19  0:33 UTC (permalink / raw)
  To: Chris Mason; +Cc: Cody P Schafer, Chris Samuel, linux-btrfs

On Thu, Jul 17, 2014 at 09:18:07AM -0400, Chris Mason wrote:
> 
> [ deadlocks during rsync in 3.15 with compression enabled ]
> 
> Hi everyone,
> 
> I still haven't been able to reproduce this one here, but I'm going
> through a series of tests with lzo compression foraced and every
> operation forced to ordered.  Hopefully it'll kick it out soon.
> 
> While I'm hammering away, could you please try this patch.  If this is
> the buy you're hitting, the deadlock will go away and you'll see this
> printk in the log.
> 
> thanks!
> 
> -chris
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 3668048..8ab56df 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -8157,6 +8157,13 @@ void btrfs_destroy_inode(struct inode *inode)
>  		spin_unlock(&root->fs_info->ordered_root_lock);
>  	}
>  
> +	spin_lock(&root->fs_info->ordered_root_lock);
> +	if (!list_empty(&BTRFS_I(inode)->ordered_operations)) {
> +		list_del_init(&BTRFS_I(inode)->ordered_operations);
> +printk(KERN_CRIT "racing inode deletion with ordered operations!!!!!!!!!!!\n");
> +	}
> +	spin_unlock(&root->fs_info->ordered_root_lock);
> +
>  	if (test_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
>  		     &BTRFS_I(inode)->runtime_flags)) {
>  		btrfs_info(root->fs_info, "inode %llu still on the orphan list",

I've gotten more blocked messages with your patch:


See also the message I sent about memory leaks, and how enabling
kmemleak gets btrfs to deadlock soon after boot relibably.
https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg35568.html
(this was before your patch though)

With your patch (and without kmemleak):

gargamel:/etc/apache2/sites-enabled# ps -eo pid,etime,wchan:30,args |grep df
 3495    01:48:38 btrfs_statfs                   df -hP -x none -x tmpfs -x iso9660 -x udf -x nfs
 4105    07:48:39 btrfs_statfs                   df -hP -x none -x tmpfs -x iso9660 -x udf -x nfs
12639       48:38 btrfs_statfs                   df -hP -x none -x tmpfs -x iso9660 -x udf -x nfs
12691       48:37 btrfs_statfs                   df
14753    06:48:38 btrfs_statfs                   df -hP -x none -x tmpfs -x iso9660 -x udf -x nfs
17214    10:48:39 btrfs_statfs                   df -hP -x none -x tmpfs -x iso9660 -x udf -x nfs
17526    03:48:38 btrfs_statfs                   df -hP -x none -x tmpfs -x iso9660 -x udf -x nfs
18710    09:48:38 btrfs_statfs                   df -hP -x none -x tmpfs -x iso9660 -x udf -x nfs
23668    05:48:38 btrfs_statfs                   df -hP -x none -x tmpfs -x iso9660 -x udf -x nfs
26675    11:37:42 btrfs_statfs                   df .
26828    02:48:38 btrfs_statfs                   df -hP -x none -x tmpfs -x iso9660 -x udf -x nfs
27515    08:48:38 btrfs_statfs                   df -hP -x none -x tmpfs -x iso9660 -x udf -x nfs

sysrq-w does not show me output for those and I cannot understand why.

Howver, I have found that btrfs raid 1 on top of dmcrypt has given me no ends of trouble.
I lost that filesystem twice due to corruption, and now it hangs my machine (strace finds
that df is hanging on that partition).
gargamel:~# btrfs fi df /mnt/btrfs_raid0
Data, RAID1: total=222.00GiB, used=221.61GiB
Data, single: total=8.00MiB, used=0.00
System, RAID1: total=8.00MiB, used=48.00KiB
System, single: total=4.00MiB, used=0.00
Metadata, RAID1: total=2.00GiB, used=1.10GiB
Metadata, single: total=8.00MiB, used=0.00
unknown, single: total=384.00MiB, used=0.00
gargamel:~# btrfs fi show /mnt/btrfs_raid0
Label: 'btrfs_raid0'  uuid: 74279e10-46e7-4ac4-8216-a291819a6691
        Total devices 2 FS bytes used 222.71GiB
        devid    1 size 836.13GiB used 224.03GiB path /dev/dm-3
        devid    2 size 836.13GiB used 224.01GiB path /dev/mapper/raid0d2

Btrfs v3.14.1


This is not encouraging, I think I'm going to stop using raid1 in btrfs :(

I tried sysrq-t, but the output goes faster than my serial console can
capture it, I can't get you a traceback on those df processes.
the dmesg buffer is too small
I already have 
Kernel log buffer size (16 => 64KB, 17 => 128KB) (LOG_BUF_SHIFT) [17] (NEW) 17
and the kernel config does not let me increase it to something more useful like 24.

Btrfs in 3.15 has been no end of troubles for me on my 2 machines, and I can't even capture
useful info when it happens since my long sysrq dumps get truncated and
flow faster than syslog can capture and relay them it seems.

Do you have any suggestions on how to capture that data better?

In the meantime, kernel log when things started hanging is below. the zm processes 
are indeed accessing that raid1 partition.

[67499.502755] INFO: task btrfs-transacti:2867 blocked for more than 120 seconds.
[67499.526860]       Not tainted 3.15.5-amd64-i915-preempt-20140714cm1 #1
[67499.548624] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[67499.575212] btrfs-transacti D 0000000000000001     0  2867      2 0x00000000
[67499.598611]  ffff8802135e7e10 0000000000000046 ffff880118322158 ffff8802135e7fd8
[67499.623218]  ffff8800c96242d0 00000000000141c0 ffff880213496000 ffff880046d1fd40
[67499.647684]  ffff880118322158 ffff880118322100 0000000000000000 ffff8802135e7e20
[67499.672179] Call Trace:
[67499.681567]  [<ffffffff8161fa5e>] schedule+0x73/0x75
[67499.698510]  [<ffffffff8122ceb9>] btrfs_commit_transaction+0x333/0x8b0
[67499.720117]  [<ffffffff8108489d>] ? finish_wait+0x65/0x65
[67499.738331]  [<ffffffff8122966a>] transaction_kthread+0xf8/0x1ab
[67499.758313]  [<ffffffff81229572>] ? btrfs_cleanup_transaction+0x44c/0x44c
[67499.780605]  [<ffffffff8106b564>] kthread+0xae/0xb6
[67499.798238]  [<ffffffff8106b4b6>] ? __kthread_parkme+0x61/0x61
[67499.817609]  [<ffffffff81628d7c>] ret_from_fork+0x7c/0xb0
[67499.835645]  [<ffffffff8106b4b6>] ? __kthread_parkme+0x61/0x61
[67499.856079] INFO: task zma:5225 blocked for more than 120 seconds.
[67499.877451]       Not tainted 3.15.5-amd64-i915-preempt-20140714cm1 #1
[67499.899807] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[67499.926049] zma             D 0000000000000000     0  5225   4863 0x20020084
[67499.950039]  ffff8801f4117bb0 0000000000000082 ffff880211e5b270 ffff8801f4117fd8
[67499.975054]  ffff8801c74c8590 00000000000141c0 ffff880118322100 ffff8800c9e341e8
[67500.000049]  0000000000000000 ffff8800c9e34000 ffff880046d1ff20 ffff8801f4117bc0
[67500.024086] Call Trace:
[67500.033079]  [<ffffffff8161fa5e>] schedule+0x73/0x75
[67500.050477]  [<ffffffff8122c3b4>] wait_current_trans.isra.15+0x98/0xf4
[67500.071684]  [<ffffffff8108489d>] ? finish_wait+0x65/0x65
[67500.089422]  [<ffffffff8122d8ce>] start_transaction+0x498/0x4fc
[67500.108678]  [<ffffffff8122d94d>] btrfs_start_transaction+0x1b/0x1d
[67500.128989]  [<ffffffff81237b3f>] btrfs_create+0x3c/0x1ce
[67500.147460]  [<ffffffff8129a57d>] ? security_inode_permission+0x1c/0x23
[67500.168738]  [<ffffffff81160430>] ? __inode_permission+0x79/0xa4
[67500.188164]  [<ffffffff811616ee>] vfs_create+0x66/0x8c
[67500.204923]  [<ffffffff81162450>] do_last+0x5af/0xa23
[67500.221393]  [<ffffffff81162afb>] path_openat+0x237/0x4de
[67500.238899]  [<ffffffff816225a9>] ? _raw_spin_lock_irqsave+0x21/0x4a
[67500.259953]  [<ffffffff8162223d>] ? _raw_spin_unlock_irqrestore+0x1f/0x32
[67500.281773]  [<ffffffff81163efa>] do_filp_open+0x3a/0x7f
[67500.298935]  [<ffffffff8162220b>] ? _raw_spin_unlock+0x17/0x2a
[67500.317622]  [<ffffffff8116dedd>] ? __alloc_fd+0xea/0xf9
[67500.334714]  [<ffffffff8115648f>] do_sys_open+0x70/0xff
[67500.351512]  [<ffffffff8119691b>] compat_SyS_open+0x1b/0x1d
[67500.369317]  [<ffffffff8162ab2c>] sysenter_dispatch+0x7/0x21
[67500.387936] INFO: task zma:5245 blocked for more than 120 seconds.
[67500.407543]       Not tainted 3.15.5-amd64-i915-preempt-20140714cm1 #1
[67500.428139] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[67500.452610] zma             D ffff88021371eb80     0  5245   4863 0x20020080
[67500.474872]  ffff8800b2fd3bb0 0000000000000086 ffff880211e5b270 ffff8800b2fd3fd8
[67500.498274]  ffff8801889b61d0 00000000000141c0 ffff880118322100 ffff8800c9e341e8
[67500.521624]  0000000000000000 ffff8800c9e34000 ffff880102fce0c0 ffff8800b2fd3bc0
[67500.544987] Call Trace:
[67500.553292]  [<ffffffff8161fa5e>] schedule+0x73/0x75
[67500.569694]  [<ffffffff8122c3b4>] wait_current_trans.isra.15+0x98/0xf4
[67500.590266]  [<ffffffff8108489d>] ? finish_wait+0x65/0x65
[67500.607468]  [<ffffffff8122d8ce>] start_transaction+0x498/0x4fc
[67500.626205]  [<ffffffff8122d94d>] btrfs_start_transaction+0x1b/0x1d
[67500.645984]  [<ffffffff81237b3f>] btrfs_create+0x3c/0x1ce
[67500.663666]  [<ffffffff8129a57d>] ? security_inode_permission+0x1c/0x23
[67500.684852]  [<ffffffff81160430>] ? __inode_permission+0x79/0xa4
[67500.703862]  [<ffffffff811616ee>] vfs_create+0x66/0x8c
[67500.720746]  [<ffffffff81162450>] do_last+0x5af/0xa23
[67500.736856]  [<ffffffff81162afb>] path_openat+0x237/0x4de
[67500.754513]  [<ffffffff816225a9>] ? _raw_spin_lock_irqsave+0x21/0x4a
[67500.775057]  [<ffffffff8162223d>] ? _raw_spin_unlock_irqrestore+0x1f/0x32
[67500.796364]  [<ffffffff81163efa>] do_filp_open+0x3a/0x7f
[67500.813245]  [<ffffffff8162220b>] ? _raw_spin_unlock+0x17/0x2a
[67500.831682]  [<ffffffff8116dedd>] ? __alloc_fd+0xea/0xf9
[67500.848983]  [<ffffffff8115648f>] do_sys_open+0x70/0xff
[67500.866040]  [<ffffffff8119691b>] compat_SyS_open+0x1b/0x1d
[67500.883677]  [<ffffffff8162ab2c>] sysenter_dispatch+0x7/0x21

-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Blocked tasks on 3.15.1, raid1 btrfs is no ends of trouble for me
  2014-07-19  0:33                               ` Blocked tasks on 3.15.1, raid1 btrfs is no ends of trouble for me Marc MERLIN
@ 2014-07-19  0:44                                 ` Marc MERLIN
  2014-07-19  1:58                                   ` Marc MERLIN
  2014-07-19  1:59                                   ` Chris Samuel
  0 siblings, 2 replies; 49+ messages in thread
From: Marc MERLIN @ 2014-07-19  0:44 UTC (permalink / raw)
  To: Chris Mason; +Cc: Cody P Schafer, Chris Samuel, linux-btrfs

On Fri, Jul 18, 2014 at 05:33:45PM -0700, Marc MERLIN wrote:
> Howver, I have found that btrfs raid 1 on top of dmcrypt has given me no ends of trouble.
> I lost that filesystem twice due to corruption, and now it hangs my machine (strace finds
> that df is hanging on that partition).
> gargamel:~# btrfs fi df /mnt/btrfs_raid0
> Data, RAID1: total=222.00GiB, used=221.61GiB
> Data, single: total=8.00MiB, used=0.00
> System, RAID1: total=8.00MiB, used=48.00KiB
> System, single: total=4.00MiB, used=0.00
> Metadata, RAID1: total=2.00GiB, used=1.10GiB
> Metadata, single: total=8.00MiB, used=0.00
> unknown, single: total=384.00MiB, used=0.00
> gargamel:~# btrfs fi show /mnt/btrfs_raid0
> Label: 'btrfs_raid0'  uuid: 74279e10-46e7-4ac4-8216-a291819a6691
>         Total devices 2 FS bytes used 222.71GiB
>         devid    1 size 836.13GiB used 224.03GiB path /dev/dm-3
>         devid    2 size 836.13GiB used 224.01GiB path /dev/mapper/raid0d2
> 
> Btrfs v3.14.1
> 
> 
> This is not encouraging, I think I'm going to stop using raid1 in btrfs :(

Sorry, this may be a bit misleading. I actually lost 2 filesystems that
were raid0 on top of dmcrypt.
This time it's raid1, and the data isn't lost, but btrfs is tripping all
over itself and taking my whole system apparently because of that
filesystem.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Blocked tasks on 3.15.1, raid1 btrfs is no ends of trouble for me
  2014-07-19  0:44                                 ` Marc MERLIN
@ 2014-07-19  1:58                                   ` Marc MERLIN
  2014-07-19  1:59                                   ` Chris Samuel
  1 sibling, 0 replies; 49+ messages in thread
From: Marc MERLIN @ 2014-07-19  1:58 UTC (permalink / raw)
  To: Chris Mason; +Cc: Cody P Schafer, Chris Samuel, linux-btrfs

TL;DR: 3.15.5 (or .1 when I tried it) just hang over and over again in
multiple ways on my server.
They also hang on my laptop reliably if I enable kmemleak, but otherwise
my laptop mostly survives with 3.15.x without kmemleak (although it does
deadlock eventually, but that could be after days/weeks, not hours).

I reverted to 3.14 on that machine, and everything is good again.

As a note, this is the 3rd time I try to upgrade this server to 3.15 and
everything goes to crap. I then go back to 3.14 and things work again,
not great since btrfs has never been great and stable for me, but it
works well enough.

On Fri, Jul 18, 2014 at 05:44:57PM -0700, Marc MERLIN wrote:
> On Fri, Jul 18, 2014 at 05:33:45PM -0700, Marc MERLIN wrote:
> > Howver, I have found that btrfs raid 1 on top of dmcrypt has given me no ends of trouble.
> > I lost that filesystem twice due to corruption, and now it hangs my machine (strace finds
> > that df is hanging on that partition).
> > gargamel:~# btrfs fi df /mnt/btrfs_raid0
> > Data, RAID1: total=222.00GiB, used=221.61GiB
> > Data, single: total=8.00MiB, used=0.00
> > System, RAID1: total=8.00MiB, used=48.00KiB
> > System, single: total=4.00MiB, used=0.00
> > Metadata, RAID1: total=2.00GiB, used=1.10GiB
> > Metadata, single: total=8.00MiB, used=0.00
> > unknown, single: total=384.00MiB, used=0.00
> > gargamel:~# btrfs fi show /mnt/btrfs_raid0
> > Label: 'btrfs_raid0'  uuid: 74279e10-46e7-4ac4-8216-a291819a6691
> >         Total devices 2 FS bytes used 222.71GiB
> >         devid    1 size 836.13GiB used 224.03GiB path /dev/dm-3
> >         devid    2 size 836.13GiB used 224.01GiB path /dev/mapper/raid0d2
> > 
> > Btrfs v3.14.1
> > 
> > 
> > This is not encouraging, I think I'm going to stop using raid1 in btrfs :(
> 
> Sorry, this may be a bit misleading. I actually lost 2 filesystems that
> were raid0 on top of dmcrypt.
> This time it's raid1, and the data isn't lost, but btrfs is tripping all
> over itself and taking my whole system apparently because of that
> filesystem.

And just to say that I'm wrong at pinning this down, the same 3.15.5
with your patch locked up on my root filesystem on the next boot

This time sysrq-w worked for a change.
Excerpt:

31933	    03:54 btrfs_file_llseek		 tail -n 50 /var/local/src/misterhouse/data/logs/print.log
31960	    32:54 btrfs_file_llseek		 tail -n 50 /var/local/src/misterhouse/data/logs/print.log
32077	    18:54 btrfs_file_llseek		 tail -n 50 /var/local/src/misterhouse/data/logs/print.log

[ 2176.230211] tail	       D ffff8801b3a567c0     0 25396  22031 0x20020080
[ 2176.252788]	ffff88006fed3e20 0000000000000082 00000000000000a8 ffff88006fed3fd8
[ 2176.276039]	ffff8801a542a3d0 00000000000141c0 ffff88020c374e10 ffff88020c374e14
[ 2176.299273]	ffff8801a542a3d0 ffff88020c374e18 00000000ffffffff ffff88006fed3e30
[ 2176.322515] Call Trace:
[ 2176.330739]	[<ffffffff8161fa5e>] schedule+0x73/0x75
[ 2176.346527]	[<ffffffff8161fd1f>] schedule_preempt_disabled+0x18/0x24
[ 2176.367208]	[<ffffffff81620e42>] __mutex_lock_slowpath+0x160/0x1d7
[ 2176.386946]	[<ffffffff81620ed0>] mutex_lock+0x17/0x27
[ 2176.403727]	[<ffffffff8123a33a>] btrfs_file_llseek+0x40/0x205
[ 2176.422603]	[<ffffffff810be59a>] ? from_kgid_munged+0x12/0x1e
[ 2176.441015]	[<ffffffff810482f1>] ? cp_stat64+0x50/0x20b
[ 2176.457841]	[<ffffffff81156627>] vfs_llseek+0x2e/0x30
[ 2176.474606]	[<ffffffff81156c32>] SyS_llseek+0x5b/0xaa
[ 2176.490895]	[<ffffffff8162ab2c>] sysenter_dispatch+0x7/0x21

Full log:
http://marc.merlins.org/tmp/btrfs_hang3.txt

After reboot, it's now hanging on this:
[  362.811392] INFO: task kworker/u8:0:6 blocked for more than 120 seconds.
[  362.831717]       Not tainted 3.15.5-amd64-i915-preempt-20140714cm1 #1
[  362.851516] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  362.875213] kworker/u8:0    D ffff88021265a800     0     6      2 0x00000000
[  362.896672] Workqueue: btrfs-flush_delalloc normal_work_helper
[  362.914260]  ffff8802148cbb60 0000000000000046 ffff8802148cbb30 ffff8802148cbfd8
[  362.936741]  ffff8802148c4150 00000000000141c0 ffff88021f3941c0 ffff8802148c4150
[  362.959195]  ffff8802148cbc00 0000000000000002 ffffffff810fdda8 ffff8802148cbb70
[  362.981602] Call Trace:
[  362.988972]  [<ffffffff810fdda8>] ? wait_on_page_read+0x3c/0x3c
[  363.006769]  [<ffffffff8161fa5e>] schedule+0x73/0x75
[  363.021704]  [<ffffffff8161fc03>] io_schedule+0x60/0x7a
[  363.037414]  [<ffffffff810fddb6>] sleep_on_page+0xe/0x12
[  363.053416]  [<ffffffff8161ff93>] __wait_on_bit_lock+0x46/0x8a
[  363.070980]  [<ffffffff810fde71>] __lock_page+0x69/0x6b
[  363.086722]  [<ffffffff810848d1>] ? autoremove_wake_function+0x34/0x34
[  363.106373]  [<ffffffff81242ab0>] lock_page+0x1e/0x21
[  363.121585]  [<ffffffff812465bb>] extent_write_cache_pages.isra.16.constprop.32+0x10e/0x2c6
[  363.148103]  [<ffffffff81246a19>] extent_writepages+0x4b/0x5c
[  363.166792]  [<ffffffff81230ce4>] ? btrfs_submit_direct+0x3f4/0x3f4
[  363.187074]  [<ffffffff810765ec>] ? get_parent_ip+0xc/0x3c
[  363.204975]  [<ffffffff8122f3fc>] btrfs_writepages+0x28/0x2a
[  363.223367]  [<ffffffff8110873d>] do_writepages+0x1e/0x2c
[  363.240980]  [<ffffffff810ff507>] __filemap_fdatawrite_range+0x55/0x57
[  363.261985]  [<ffffffff810fff50>] filemap_flush+0x1c/0x1e
[  363.279628]  [<ffffffff81231921>] btrfs_run_delalloc_work+0x32/0x69
[  363.299893]  [<ffffffff81252438>] normal_work_helper+0xfe/0x240
[  363.319143]  [<ffffffff81065e29>] process_one_work+0x195/0x2d2
[  363.338123]  [<ffffffff810660cb>] worker_thread+0x136/0x205
[  363.356348]  [<ffffffff81065f95>] ? process_scheduled_works+0x2f/0x2f
[  363.377203]  [<ffffffff8106b564>] kthread+0xae/0xb6
[  363.393396]  [<ffffffff8106b4b6>] ? __kthread_parkme+0x61/0x61
[  363.412469]  [<ffffffff81628d7c>] ret_from_fork+0x7c/0xb0
[  363.430228]  [<ffffffff8106b4b6>] ? __kthread_parkme+0x61/0x61

In the end, I went back to 3.14, and things work again.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Blocked tasks on 3.15.1, raid1 btrfs is no ends of trouble for me
  2014-07-19  0:44                                 ` Marc MERLIN
  2014-07-19  1:58                                   ` Marc MERLIN
@ 2014-07-19  1:59                                   ` Chris Samuel
  2014-07-19  5:40                                     ` Marc MERLIN
  1 sibling, 1 reply; 49+ messages in thread
From: Chris Samuel @ 2014-07-19  1:59 UTC (permalink / raw)
  To: linux-btrfs

On Fri, 18 Jul 2014 05:44:57 PM Marc MERLIN wrote:

> Sorry, this may be a bit misleading. I actually lost 2 filesystems that
> were raid0 on top of dmcrypt.

Stupid question I know, but does this happen without dmcrypt?

cheers,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Blocked tasks on 3.15.1, raid1 btrfs is no ends of trouble for me
  2014-07-19  1:59                                   ` Chris Samuel
@ 2014-07-19  5:40                                     ` Marc MERLIN
  0 siblings, 0 replies; 49+ messages in thread
From: Marc MERLIN @ 2014-07-19  5:40 UTC (permalink / raw)
  To: Chris Samuel; +Cc: linux-btrfs

On Sat, Jul 19, 2014 at 11:59:24AM +1000, Chris Samuel wrote:
> On Fri, 18 Jul 2014 05:44:57 PM Marc MERLIN wrote:
> 
> > Sorry, this may be a bit misleading. I actually lost 2 filesystems that
> > were raid0 on top of dmcrypt.
> 
> Stupid question I know, but does this happen without dmcrypt?

It's not a stupid question: I don't use btrfs without dmcrypt, so I can't
say.
(and I'm not interested in trying :) 

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Blocked tasks on 3.15.1
  2014-07-17 13:18                             ` Chris Mason
  2014-07-19  0:33                               ` Blocked tasks on 3.15.1, raid1 btrfs is no ends of trouble for me Marc MERLIN
@ 2014-07-19 17:38                               ` Cody P Schafer
  2014-07-19 18:23                                 ` Martin Steigerwald
  1 sibling, 1 reply; 49+ messages in thread
From: Cody P Schafer @ 2014-07-19 17:38 UTC (permalink / raw)
  To: Chris Mason; +Cc: Chris Samuel, linux-btrfs

On Thu, Jul 17, 2014 at 8:18 AM, Chris Mason <clm@fb.com> wrote:
>
> [ deadlocks during rsync in 3.15 with compression enabled ]
>
> Hi everyone,
>
> I still haven't been able to reproduce this one here, but I'm going
> through a series of tests with lzo compression foraced and every
> operation forced to ordered.  Hopefully it'll kick it out soon.
>
> While I'm hammering away, could you please try this patch.  If this is
> the buy you're hitting, the deadlock will go away and you'll see this
> printk in the log.
>
> thanks!
>
> -chris
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 3668048..8ab56df 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -8157,6 +8157,13 @@ void btrfs_destroy_inode(struct inode *inode)
>                 spin_unlock(&root->fs_info->ordered_root_lock);
>         }
>
> +       spin_lock(&root->fs_info->ordered_root_lock);
> +       if (!list_empty(&BTRFS_I(inode)->ordered_operations)) {
> +               list_del_init(&BTRFS_I(inode)->ordered_operations);
> +printk(KERN_CRIT "racing inode deletion with ordered operations!!!!!!!!!!!\n");
> +       }
> +       spin_unlock(&root->fs_info->ordered_root_lock);
> +
>         if (test_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
>                      &BTRFS_I(inode)->runtime_flags)) {
>                 btrfs_info(root->fs_info, "inode %llu still on the orphan list",

Thanks Chris.

Running 3.15.6 with this patch applied on top:
 - still causes a hang with `rsync -hPaHAXx --del /mnt/home/nyx/ /home/nyx/`
 - no extra error messages printed (`dmesg | grep racing`) compared to
without the patch

To recap some details (so I can have it all in one place):
 - /home/ is btrfs with compress=lzo
 - /mnt/home is btrfs with no compression enabled.
 - I have _not_ created any nodatacow files.
 - Both filesystems are on different physical disks.
 - Full stack is: sata <-> dmcrypt <-> lvm <-> btrfs (I noticed others
mentioning the use of dmcrypt)

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Blocked tasks on 3.15.1
  2014-07-19 17:38                               ` Blocked tasks on 3.15.1 Cody P Schafer
@ 2014-07-19 18:23                                 ` Martin Steigerwald
  2014-07-22 14:53                                   ` Chris Mason
  0 siblings, 1 reply; 49+ messages in thread
From: Martin Steigerwald @ 2014-07-19 18:23 UTC (permalink / raw)
  To: Cody P Schafer; +Cc: Chris Mason, Chris Samuel, linux-btrfs

Am Samstag, 19. Juli 2014, 12:38:53 schrieb Cody P Schafer:
> On Thu, Jul 17, 2014 at 8:18 AM, Chris Mason <clm@fb.com> wrote:
> > [ deadlocks during rsync in 3.15 with compression enabled ]
> > 
> > Hi everyone,
> > 
> > I still haven't been able to reproduce this one here, but I'm going
> > through a series of tests with lzo compression foraced and every
> > operation forced to ordered.  Hopefully it'll kick it out soon.
> > 
> > While I'm hammering away, could you please try this patch.  If this is
> > the buy you're hitting, the deadlock will go away and you'll see this
> > printk in the log.
> > 
> > thanks!
> > 
> > -chris
> > 
> > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> > index 3668048..8ab56df 100644
> > --- a/fs/btrfs/inode.c
> > +++ b/fs/btrfs/inode.c
> > @@ -8157,6 +8157,13 @@ void btrfs_destroy_inode(struct inode *inode)
> > 
> >                 spin_unlock(&root->fs_info->ordered_root_lock);
> >         
> >         }
> > 
> > +       spin_lock(&root->fs_info->ordered_root_lock);
> > +       if (!list_empty(&BTRFS_I(inode)->ordered_operations)) {
> > +               list_del_init(&BTRFS_I(inode)->ordered_operations);
> > +printk(KERN_CRIT "racing inode deletion with ordered
> > operations!!!!!!!!!!!\n"); +       }
> > +       spin_unlock(&root->fs_info->ordered_root_lock);
> > +
> > 
> >         if (test_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
> >         
> >                      &BTRFS_I(inode)->runtime_flags)) {
> >                 
> >                 btrfs_info(root->fs_info, "inode %llu still on the orphan
> >                 list",
> 
> Thanks Chris.
> 
> Running 3.15.6 with this patch applied on top:
>  - still causes a hang with `rsync -hPaHAXx --del /mnt/home/nyx/ /home/nyx/`
> - no extra error messages printed (`dmesg | grep racing`) compared to
> without the patch

I got same results with 3.16-rc5 + this patch (see thread BTRFS hang with 
3.16-rc5). 3.16-rc4 still is fine with me. No hang whatsoever so far.

> To recap some details (so I can have it all in one place):
>  - /home/ is btrfs with compress=lzo

BTRFS RAID 1 with lzo.

>  - I have _not_ created any nodatacow files.

Me neither.

>  - Full stack is: sata <-> dmcrypt <-> lvm <-> btrfs (I noticed others
> mentioning the use of dmcrypt)

Same, except no dmcrypt.

-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Blocked tasks on 3.15.1
  2014-07-19 18:23                                 ` Martin Steigerwald
@ 2014-07-22 14:53                                   ` Chris Mason
  2014-07-22 15:14                                     ` Torbjørn
                                                       ` (7 more replies)
  0 siblings, 8 replies; 49+ messages in thread
From: Chris Mason @ 2014-07-22 14:53 UTC (permalink / raw)
  To: Martin Steigerwald, Cody P Schafer; +Cc: Chris Samuel, linux-btrfs



On 07/19/2014 02:23 PM, Martin Steigerwald wrote:

>> Running 3.15.6 with this patch applied on top:
>>  - still causes a hang with `rsync -hPaHAXx --del /mnt/home/nyx/ /home/nyx/`
>> - no extra error messages printed (`dmesg | grep racing`) compared to
>> without the patch
> 
> I got same results with 3.16-rc5 + this patch (see thread BTRFS hang with 
> 3.16-rc5). 3.16-rc4 still is fine with me. No hang whatsoever so far.
> 
>> To recap some details (so I can have it all in one place):
>>  - /home/ is btrfs with compress=lzo
> 
> BTRFS RAID 1 with lzo.
> 
>>  - I have _not_ created any nodatacow files.
> 
> Me neither.
> 
>>  - Full stack is: sata <-> dmcrypt <-> lvm <-> btrfs (I noticed others
>> mentioning the use of dmcrypt)
> 
> Same, except no dmcrypt.
> 

Thanks for the help in tracking this down everyone.  We'll get there!
Are you all running multi-disk systems (from a btrfs POV, more than one
device?)  I don't care how many physical drives this maps to, just does
btrfs think there's more than one drive.

-chris

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Blocked tasks on 3.15.1
  2014-07-22 14:53                                   ` Chris Mason
@ 2014-07-22 15:14                                     ` Torbjørn
  2014-07-22 16:46                                     ` Marc MERLIN
                                                       ` (6 subsequent siblings)
  7 siblings, 0 replies; 49+ messages in thread
From: Torbjørn @ 2014-07-22 15:14 UTC (permalink / raw)
  To: Chris Mason; +Cc: linux-btrfs

On 07/22/2014 04:53 PM, Chris Mason wrote:
>
> On 07/19/2014 02:23 PM, Martin Steigerwald wrote:
>
>>> Running 3.15.6 with this patch applied on top:
>>>   - still causes a hang with `rsync -hPaHAXx --del /mnt/home/nyx/ /home/nyx/`
>>> - no extra error messages printed (`dmesg | grep racing`) compared to
>>> without the patch
>> I got same results with 3.16-rc5 + this patch (see thread BTRFS hang with
>> 3.16-rc5). 3.16-rc4 still is fine with me. No hang whatsoever so far.
>>
>>> To recap some details (so I can have it all in one place):
>>>   - /home/ is btrfs with compress=lzo
>> BTRFS RAID 1 with lzo.
>>
>>>   - I have _not_ created any nodatacow files.
>> Me neither.
>>
>>>   - Full stack is: sata <-> dmcrypt <-> lvm <-> btrfs (I noticed others
>>> mentioning the use of dmcrypt)
>> Same, except no dmcrypt.
>>
> Thanks for the help in tracking this down everyone.  We'll get there!
> Are you all running multi-disk systems (from a btrfs POV, more than one
> device?)  I don't care how many physical drives this maps to, just does
> btrfs think there's more than one drive.
>
> -chris
Hi,

In case it's interesting:
 From an earlier email thread with subject: 3.15-rc6 - 
btrfs-transacti:4157 blocked for more than 120

TLDR: yes, btrfs sees multiple devices.

sata <-> dmcrypt <-> btrfs raid10
btrfs raid10 consist of multiple dmcrypt devices from multiple sata devices.

Mount: /dev/mapper/sdu on /mnt/storage type btrfs 
(rw,noatime,space_cache,compress=lzo,inode_cache,subvol=storage)
(yes I know inode_cache is not recommended for general use)

I have a nocow directory in a separate subvolume containing vm-images 
used by kvm.
The same kvm-vms are reading/writing data from that array over nfs.

I'm still holding that system on 3.14. Anything above causes blocks.

--
Torbjørn

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Blocked tasks on 3.15.1
  2014-07-22 14:53                                   ` Chris Mason
  2014-07-22 15:14                                     ` Torbjørn
@ 2014-07-22 16:46                                     ` Marc MERLIN
  2014-07-22 19:42                                     ` Torbjørn
                                                       ` (5 subsequent siblings)
  7 siblings, 0 replies; 49+ messages in thread
From: Marc MERLIN @ 2014-07-22 16:46 UTC (permalink / raw)
  To: Chris Mason; +Cc: Martin Steigerwald, Cody P Schafer, Chris Samuel, linux-btrfs

On Tue, Jul 22, 2014 at 10:53:03AM -0400, Chris Mason wrote:
> Thanks for the help in tracking this down everyone.  We'll get there!
> Are you all running multi-disk systems (from a btrfs POV, more than one
> device?)  I don't care how many physical drives this maps to, just does
> btrfs think there's more than one drive.

In the bugs I sent you, it was a mix of arrays that were
mdraid / dmcrypt / btrfs

I have also one array with:
disk1 / dmcrypt \
                 - btrfs (2 drives visible by btrfs)
disk2 / dmcrypt /

The multidrive setup seemed a bit worse, I just destroyed it and went
back to putting all the drives together with mdadm and showing a single
dmcrypted device to btrfs.

But that is still super unstable on my server with 3.15, while being
somewhat usable with my laptop (it still hangs, but more rarely)
The one difference is that my laptop actually does
disk > dmcrypt > btrfs
while my server does 
disks > mdadm > dmcrypt > btrfs

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Blocked tasks on 3.15.1
  2014-07-22 14:53                                   ` Chris Mason
  2014-07-22 15:14                                     ` Torbjørn
  2014-07-22 16:46                                     ` Marc MERLIN
@ 2014-07-22 19:42                                     ` Torbjørn
  2014-07-22 19:50                                       ` Chris Mason
  2014-07-22 21:13                                     ` Martin Steigerwald
                                                       ` (4 subsequent siblings)
  7 siblings, 1 reply; 49+ messages in thread
From: Torbjørn @ 2014-07-22 19:42 UTC (permalink / raw)
  To: Chris Mason; +Cc: linux-btrfs

On 07/22/2014 04:53 PM, Chris Mason wrote:
>
> On 07/19/2014 02:23 PM, Martin Steigerwald wrote:
>
>>> Running 3.15.6 with this patch applied on top:
>>>   - still causes a hang with `rsync -hPaHAXx --del /mnt/home/nyx/ /home/nyx/`
>>> - no extra error messages printed (`dmesg | grep racing`) compared to
>>> without the patch
>> I got same results with 3.16-rc5 + this patch (see thread BTRFS hang with
>> 3.16-rc5). 3.16-rc4 still is fine with me. No hang whatsoever so far.
>>
>>> To recap some details (so I can have it all in one place):
>>>   - /home/ is btrfs with compress=lzo
>> BTRFS RAID 1 with lzo.
>>
>>>   - I have _not_ created any nodatacow files.
>> Me neither.
>>
>>>   - Full stack is: sata <-> dmcrypt <-> lvm <-> btrfs (I noticed others
>>> mentioning the use of dmcrypt)
>> Same, except no dmcrypt.
>>
> Thanks for the help in tracking this down everyone.  We'll get there!
> Are you all running multi-disk systems (from a btrfs POV, more than one
> device?)  I don't care how many physical drives this maps to, just does
> btrfs think there's more than one drive.
>
> -chris
3.16-rc6 with your patch on top still causes hangs here.
No traces of "racing" in dmesg
Hang is on a btrfs raid 0 consisting of 3 drives.
Full stack is: sata <-> dmcrypt <-> btrfs raid0

Hang was caused by
1. Several rsync -av --inplace --delete <source> <backup subvol>
2. btrfs subvolume snapshot -r <backup subvol> <bacup snap>

The rsync jobs are done one at a time
btrfs is stuck when trying to create the read only snapshot

--
Torbjørn

All output via netconsole.
sysrq-w: 
https://gist.github.com/anonymous/d1837187e261f9a4cbd2#file-gistfile1-txt
sysrq-t: 
https://gist.github.com/anonymous/2bdb73f035ab9918c63d#file-gistfile1-txt

dmesg:
[ 9352.784136] INFO: task btrfs-transacti:3874 blocked for more than 120 
seconds.
[ 9352.784222]       Tainted: G            E 3.16.0-rc6+ #64
[ 9352.784270] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[ 9352.784354] btrfs-transacti D ffff88042fc943c0     0  3874      2 
0x00000000
[ 9352.784413]  ffff8803fb9dfca0 0000000000000002 ffff8800c4214b90 
ffff8803fb9dffd8
[ 9352.784502]  00000000000143c0 00000000000143c0 ffff88041977b260 
ffff8803d29f23a0
[ 9352.784592]  ffff8803d29f23a8 7fffffffffffffff ffff8800c4214b90 
ffff880232e2c0a8
[ 9352.784682] Call Trace:
[ 9352.784726]  [<ffffffff8170eb59>] schedule+0x29/0x70
[ 9352.784774]  [<ffffffff8170df99>] schedule_timeout+0x209/0x280
[ 9352.784827]  [<ffffffff8170874b>] ? __slab_free+0xfe/0x2c3
[ 9352.784879]  [<ffffffff810829f4>] ? wake_up_worker+0x24/0x30
[ 9352.784929]  [<ffffffff8170f656>] wait_for_completion+0xa6/0x160
[ 9352.784981]  [<ffffffff8109d4e0>] ? wake_up_state+0x20/0x20
[ 9352.785049]  [<ffffffffc045b936>] 
btrfs_wait_and_free_delalloc_work+0x16/0x30 [btrfs]
[ 9352.785141]  [<ffffffffc04658be>] 
btrfs_run_ordered_operations+0x1ee/0x2c0 [btrfs]
[ 9352.785260]  [<ffffffffc044bbb7>] btrfs_commit_transaction+0x27/0xa40 
[btrfs]
[ 9352.785324]  [<ffffffffc0447d65>] transaction_kthread+0x1b5/0x240 [btrfs]
[ 9352.785385]  [<ffffffffc0447bb0>] ? 
btrfs_cleanup_transaction+0x560/0x560 [btrfs]
[ 9352.785469]  [<ffffffff8108cc52>] kthread+0xd2/0xf0
[ 9352.785517]  [<ffffffff8108cb80>] ? kthread_create_on_node+0x180/0x180
[ 9352.785571]  [<ffffffff81712dfc>] ret_from_fork+0x7c/0xb0
[ 9352.785620]  [<ffffffff8108cb80>] ? kthread_create_on_node+0x180/0x180
[ 9352.785678] INFO: task kworker/u16:3:6932 blocked for more than 120 
seconds.
[ 9352.785732]       Tainted: G            E 3.16.0-rc6+ #64
[ 9352.785780] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[ 9352.785863] kworker/u16:3   D ffff88042fd943c0     0  6932      2 
0x00000000
[ 9352.785930] Workqueue: btrfs-flush_delalloc normal_work_helper [btrfs]
[ 9352.785983]  ffff88035f1bbb58 0000000000000002 ffff880417e564c0 
ffff88035f1bbfd8
[ 9352.786072]  00000000000143c0 00000000000143c0 ffff8800c1a03260 
ffff88042fd94cd8
[ 9352.786160]  ffff88042ffb4be8 ffff88035f1bbbe0 0000000000000002 
ffffffff81159930
[ 9352.786250] Call Trace:
[ 9352.786292]  [<ffffffff81159930>] ? wait_on_page_read+0x60/0x60
[ 9352.786343]  [<ffffffff8170ee6d>] io_schedule+0x9d/0x130
[ 9352.786393]  [<ffffffff8115993e>] sleep_on_page+0xe/0x20
[ 9352.786443]  [<ffffffff8170f3e8>] __wait_on_bit_lock+0x48/0xb0
[ 9352.786495]  [<ffffffff81159a4a>] __lock_page+0x6a/0x70
[ 9352.786544]  [<ffffffff810b14a0>] ? autoremove_wake_function+0x40/0x40
[ 9352.786607]  [<ffffffffc046711e>] ? flush_write_bio+0xe/0x10 [btrfs]
[ 9352.786669]  [<ffffffffc046b0c0>] 
extent_write_cache_pages.isra.28.constprop.46+0x3d0/0x3f0 [btrfs]
[ 9352.786766]  [<ffffffffc046cd2d>] extent_writepages+0x4d/0x70 [btrfs]
[ 9352.786828]  [<ffffffffc04506f0>] ? btrfs_submit_direct+0x6a0/0x6a0 
[btrfs]
[ 9352.786883]  [<ffffffff810b0d78>] ? __wake_up_common+0x58/0x90
[ 9352.786943]  [<ffffffffc044e1d8>] btrfs_writepages+0x28/0x30 [btrfs]
[ 9352.786997]  [<ffffffff811668ee>] do_writepages+0x1e/0x40
[ 9352.787045]  [<ffffffff8115b409>] __filemap_fdatawrite_range+0x59/0x60
[ 9352.787097]  [<ffffffff8115b4bc>] filemap_flush+0x1c/0x20
[ 9352.787151]  [<ffffffffc045158a>] btrfs_run_delalloc_work+0x5a/0xa0 
[btrfs]
[ 9352.787211]  [<ffffffffc047accf>] normal_work_helper+0x11f/0x2b0 [btrfs]
[ 9352.787266]  [<ffffffff810858d2>] process_one_work+0x182/0x450
[ 9352.787317]  [<ffffffff81086043>] worker_thread+0x123/0x5a0
[ 9352.787464]  [<ffffffff81085f20>] ? rescuer_thread+0x380/0x380
[ 9352.787515]  [<ffffffff8108cc52>] kthread+0xd2/0xf0
[ 9352.787563]  [<ffffffff8108cb80>] ? kthread_create_on_node+0x180/0x180
[ 9352.787616]  [<ffffffff81712dfc>] ret_from_fork+0x7c/0xb0
[ 9352.787666]  [<ffffffff8108cb80>] ? kthread_create_on_node+0x180/0x180
[ 9352.787744] INFO: task kworker/u16:9:8360 blocked for more than 120 
seconds.
[ 9352.787799]       Tainted: G            E 3.16.0-rc6+ #64
[ 9352.787847] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[ 9352.787949] kworker/u16:9   D ffff88042fd543c0     0  8360      2 
0x00000000
[ 9352.788015] Workqueue: btrfs-delalloc normal_work_helper [btrfs]
[ 9352.788139]  ffff88035f20f838 0000000000000002 ffff8803216c1930 
ffff88035f20ffd8
[ 9352.788229]  00000000000143c0 00000000000143c0 ffff8804157e0000 
ffff88042fd54cd8
[ 9352.788317]  ffff88042ffad6e8 ffff88035f20f8c0 0000000000000002 
ffffffff81159930
[ 9352.788406] Call Trace:
[ 9352.788445]  [<ffffffff81159930>] ? wait_on_page_read+0x60/0x60
[ 9352.788495]  [<ffffffff8170ee6d>] io_schedule+0x9d/0x130
[ 9352.788543]  [<ffffffff8115993e>] sleep_on_page+0xe/0x20
[ 9352.788591]  [<ffffffff8170f3e8>] __wait_on_bit_lock+0x48/0xb0
[ 9352.788642]  [<ffffffff8133c8a4>] ? blk_finish_plug+0x14/0x40
[ 9352.788692]  [<ffffffff81159a4a>] __lock_page+0x6a/0x70
[ 9352.788740]  [<ffffffff810b14a0>] ? autoremove_wake_function+0x40/0x40
[ 9352.788792]  [<ffffffff8115ac34>] pagecache_get_page+0x164/0x1f0
[ 9352.788849]  [<ffffffffc048e447>] io_ctl_prepare_pages+0x67/0x180 [btrfs]
[ 9352.788933]  [<ffffffffc048fd0f>] __load_free_space_cache+0x1ff/0x700 
[btrfs]
[ 9352.788998]  [<ffffffffc049030c>] load_free_space_cache+0xfc/0x1c0 
[btrfs]
[ 9352.789059]  [<ffffffffc0430292>] cache_block_group+0x192/0x390 [btrfs]
[ 9352.789113]  [<ffffffff810b1460>] ? prepare_to_wait_event+0x100/0x100
[ 9352.789173]  [<ffffffffc04390a2>] find_free_extent+0xc72/0xcb0 [btrfs]
[ 9352.789233]  [<ffffffffc04392ef>] btrfs_reserve_extent+0xaf/0x1b0 [btrfs]
[ 9352.789295]  [<ffffffffc04526cc>] cow_file_range+0x13c/0x430 [btrfs]
[ 9352.789355]  [<ffffffffc0453c80>] ? 
submit_compressed_extents+0x480/0x480 [btrfs]
[ 9352.789444]  [<ffffffffc04539d2>] 
submit_compressed_extents+0x1d2/0x480 [btrfs]
[ 9352.789533]  [<ffffffffc0451524>] ? async_cow_free+0x24/0x30 [btrfs]
[ 9352.789592]  [<ffffffffc0453c80>] ? 
submit_compressed_extents+0x480/0x480 [btrfs]
[ 9352.789680]  [<ffffffffc0453d06>] async_cow_submit+0x86/0x90 [btrfs]
[ 9352.789737]  [<ffffffffc047ad43>] normal_work_helper+0x193/0x2b0 [btrfs]
[ 9352.789790]  [<ffffffff810858d2>] process_one_work+0x182/0x450
[ 9352.789840]  [<ffffffff81086043>] worker_thread+0x123/0x5a0
[ 9352.789889]  [<ffffffff81085f20>] ? rescuer_thread+0x380/0x380
[ 9352.789938]  [<ffffffff8108cc52>] kthread+0xd2/0xf0
[ 9352.789984]  [<ffffffff8108cb80>] ? kthread_create_on_node+0x180/0x180
[ 9352.790038]  [<ffffffff81712dfc>] ret_from_fork+0x7c/0xb0
[ 9352.790088]  [<ffffffff8108cb80>] ? kthread_create_on_node+0x180/0x180
[ 9352.790144] INFO: task kworker/u16:0:11682 blocked for more than 120 
seconds.
[ 9352.790199]       Tainted: G            E 3.16.0-rc6+ #64
[ 9352.795261] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[ 9352.795347] kworker/u16:0   D ffff88042fd543c0     0 11682      2 
0x00000000
[ 9352.795409] Workqueue: writeback bdi_writeback_workfn (flush-btrfs-3)
[ 9352.795465]  ffff8802161cf9d8 0000000000000002 ffff8803fc3c0000 
ffff8802161cffd8
[ 9352.795554]  00000000000143c0 00000000000143c0 ffff8804197e0000 
ffff88042fd54cd8
[ 9352.795644]  ffff88042ffaa9e8 ffff8802161cfa60 0000000000000002 
ffffffff81159930
[ 9352.795734] Call Trace:
[ 9352.795776]  [<ffffffff81159930>] ? wait_on_page_read+0x60/0x60
[ 9352.795829]  [<ffffffff8170ee6d>] io_schedule+0x9d/0x130
[ 9352.795879]  [<ffffffff8115993e>] sleep_on_page+0xe/0x20
[ 9352.795928]  [<ffffffff8170f3e8>] __wait_on_bit_lock+0x48/0xb0
[ 9352.795979]  [<ffffffff81159a4a>] __lock_page+0x6a/0x70
[ 9352.796029]  [<ffffffff810b14a0>] ? autoremove_wake_function+0x40/0x40
[ 9352.796150]  [<ffffffffc046711e>] ? flush_write_bio+0xe/0x10 [btrfs]
[ 9352.796212]  [<ffffffffc046b0c0>] 
extent_write_cache_pages.isra.28.constprop.46+0x3d0/0x3f0 [btrfs]
[ 9352.796309]  [<ffffffffc046cd2d>] extent_writepages+0x4d/0x70 [btrfs]
[ 9352.796369]  [<ffffffffc04506f0>] ? btrfs_submit_direct+0x6a0/0x6a0 
[btrfs]
[ 9352.796428]  [<ffffffffc044e1d8>] btrfs_writepages+0x28/0x30 [btrfs]
[ 9352.796481]  [<ffffffff811668ee>] do_writepages+0x1e/0x40
[ 9352.796530]  [<ffffffff811f17d0>] __writeback_single_inode+0x40/0x210
[ 9352.796581]  [<ffffffff811f22b7>] writeback_sb_inodes+0x247/0x3e0
[ 9352.796632]  [<ffffffff811f24ef>] __writeback_inodes_wb+0x9f/0xd0
[ 9352.796682]  [<ffffffff811f2763>] wb_writeback+0x243/0x2c0
[ 9352.796731]  [<ffffffff811f4e09>] bdi_writeback_workfn+0x1b9/0x430
[ 9352.796784]  [<ffffffff810858d2>] process_one_work+0x182/0x450
[ 9352.796833]  [<ffffffff81086043>] worker_thread+0x123/0x5a0
[ 9352.796882]  [<ffffffff81085f20>] ? rescuer_thread+0x380/0x380
[ 9352.796933]  [<ffffffff8108cc52>] kthread+0xd2/0xf0
[ 9352.796979]  [<ffffffff8108cb80>] ? kthread_create_on_node+0x180/0x180
[ 9352.797031]  [<ffffffff81712dfc>] ret_from_fork+0x7c/0xb0
[ 9352.797079]  [<ffffffff8108cb80>] ? kthread_create_on_node+0x180/0x180
[ 9352.797134] INFO: task kworker/u16:4:13743 blocked for more than 120 
seconds.
[ 9352.797189]       Tainted: G            E 3.16.0-rc6+ #64
[ 9352.797299] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[ 9352.797383] kworker/u16:4   D ffff88042fc943c0     0 13743      2 
0x00000000
[ 9352.797451] Workqueue: btrfs-flush_delalloc normal_work_helper [btrfs]
[ 9352.797505]  ffff880139843b58 0000000000000002 ffff880191e38000 
ffff880139843fd8
[ 9352.797594]  00000000000143c0 00000000000143c0 ffff88041977b260 
ffff88042fc94cd8
[ 9352.797684]  ffff88042ffb4be8 ffff880139843be0 0000000000000002 
ffffffff81159930
[ 9352.797773] Call Trace:
[ 9352.797813]  [<ffffffff81159930>] ? wait_on_page_read+0x60/0x60
[ 9352.797865]  [<ffffffff8170ee6d>] io_schedule+0x9d/0x130
[ 9352.797914]  [<ffffffff8115993e>] sleep_on_page+0xe/0x20
[ 9352.797963]  [<ffffffff8170f3e8>] __wait_on_bit_lock+0x48/0xb0
[ 9352.798013]  [<ffffffff81159a4a>] __lock_page+0x6a/0x70
[ 9352.798060]  [<ffffffff810b14a0>] ? autoremove_wake_function+0x40/0x40
[ 9352.798119]  [<ffffffffc046711e>] ? flush_write_bio+0xe/0x10 [btrfs]
[ 9352.798176]  [<ffffffffc046b0c0>] 
extent_write_cache_pages.isra.28.constprop.46+0x3d0/0x3f0 [btrfs]
[ 9352.798269]  [<ffffffffc046cd2d>] extent_writepages+0x4d/0x70 [btrfs]
[ 9352.798326]  [<ffffffffc04506f0>] ? btrfs_submit_direct+0x6a0/0x6a0 
[btrfs]
[ 9352.798385]  [<ffffffffc044e1d8>] btrfs_writepages+0x28/0x30 [btrfs]
[ 9352.798437]  [<ffffffff811668ee>] do_writepages+0x1e/0x40
[ 9352.798485]  [<ffffffff8115b409>] __filemap_fdatawrite_range+0x59/0x60
[ 9352.798537]  [<ffffffff8115b4bc>] filemap_flush+0x1c/0x20
[ 9352.798590]  [<ffffffffc045158a>] btrfs_run_delalloc_work+0x5a/0xa0 
[btrfs]
[ 9352.798650]  [<ffffffffc047accf>] normal_work_helper+0x11f/0x2b0 [btrfs]
[ 9352.798703]  [<ffffffff810858d2>] process_one_work+0x182/0x450
[ 9352.798752]  [<ffffffff81086043>] worker_thread+0x123/0x5a0
[ 9352.798801]  [<ffffffff81085f20>] ? rescuer_thread+0x380/0x380
[ 9352.798851]  [<ffffffff8108cc52>] kthread+0xd2/0xf0
[ 9352.798897]  [<ffffffff8108cb80>] ? kthread_create_on_node+0x180/0x180
[ 9352.798949]  [<ffffffff81712dfc>] ret_from_fork+0x7c/0xb0
[ 9352.798997]  [<ffffffff8108cb80>] ? kthread_create_on_node+0x180/0x180
[ 9352.799049] INFO: task btrfs:13823 blocked for more than 120 seconds.
[ 9352.799101]       Tainted: G            E 3.16.0-rc6+ #64
[ 9352.799148] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[ 9352.799229] btrfs           D ffff88042fd943c0     0 13823  13461 
0x00000000
[ 9352.799286]  ffff880177203a80 0000000000000002 ffff8800c1a01930 
ffff880177203fd8
[ 9352.799375]  00000000000143c0 00000000000143c0 ffff8800c2f664c0 
ffff8803d29f2438
[ 9352.799463]  ffff8803d29f2440 7fffffffffffffff ffff8800c1a01930 
ffff880177203bc0
[ 9352.799552] Call Trace:
[ 9352.799590]  [<ffffffff8170eb59>] schedule+0x29/0x70
[ 9352.799637]  [<ffffffff8170df99>] schedule_timeout+0x209/0x280
[ 9352.799687]  [<ffffffff8109d4a3>] ? wake_up_process+0x23/0x40
[ 9352.799736]  [<ffffffff810829f4>] ? wake_up_worker+0x24/0x30
[ 9352.799786]  [<ffffffff810836ab>] ? insert_work+0x6b/0xb0
[ 9352.799877]  [<ffffffff8170f656>] wait_for_completion+0xa6/0x160
[ 9352.799928]  [<ffffffff8109d4e0>] ? wake_up_state+0x20/0x20
[ 9352.799989]  [<ffffffffc045b936>] 
btrfs_wait_and_free_delalloc_work+0x16/0x30 [btrfs]
[ 9352.800080]  [<ffffffffc045bafe>] __start_delalloc_inodes+0x1ae/0x2a0 
[btrfs]
[ 9352.800150]  [<ffffffffc045bc2e>] 
btrfs_start_delalloc_inodes+0x3e/0x120 [btrfs]
[ 9352.800235]  [<ffffffff810b1228>] ? finish_wait+0x58/0x70
[ 9352.800293]  [<ffffffffc047fa24>] btrfs_mksubvol.isra.29+0x1d4/0x570 
[btrfs]
[ 9352.800348]  [<ffffffff810b1460>] ? prepare_to_wait_event+0x100/0x100
[ 9352.800409]  [<ffffffffc047ff46>] 
btrfs_ioctl_snap_create_transid+0x186/0x190 [btrfs]
[ 9352.800501]  [<ffffffffc04800ab>] 
btrfs_ioctl_snap_create_v2+0xeb/0x130 [btrfs]
[ 9352.800592]  [<ffffffffc048218f>] btrfs_ioctl+0xcaf/0x2ae0 [btrfs]
[ 9352.800645]  [<ffffffff8118d933>] ? mmap_region+0x163/0x5d0
[ 9352.800696]  [<ffffffff81057c8c>] ? __do_page_fault+0x20c/0x550
[ 9352.800748]  [<ffffffff811dc840>] do_vfs_ioctl+0x2e0/0x4c0
[ 9352.800798]  [<ffffffff810a0e64>] ? vtime_account_user+0x54/0x60
[ 9352.800850]  [<ffffffff811dcaa1>] SyS_ioctl+0x81/0xa0
[ 9352.800899]  [<ffffffff817130bf>] tracesys+0xe1/0xe6
[ 9354.863323] kvm [4191]: vcpu0 unhandled rdmsr: 0x345
[ 9354.869074] kvm_set_msr_common: 118 callbacks suppressed
[ 9354.869130] kvm [4191]: vcpu0 unhandled wrmsr: 0x680 data 0
[ 9354.869185] kvm [4191]: vcpu0 unhandled wrmsr: 0x6c0 data 0
[ 9354.869240] kvm [4191]: vcpu0 unhandled wrmsr: 0x681 data 0
[ 9354.869294] kvm [4191]: vcpu0 unhandled wrmsr: 0x6c1 data 0
[ 9354.869346] kvm [4191]: vcpu0 unhandled wrmsr: 0x682 data 0
[ 9354.869397] kvm [4191]: vcpu0 unhandled wrmsr: 0x6c2 data 0
[ 9354.869449] kvm [4191]: vcpu0 unhandled wrmsr: 0x683 data 0
[ 9354.869502] kvm [4191]: vcpu0 unhandled wrmsr: 0x6c3 data 0
[ 9354.869556] kvm [4191]: vcpu0 unhandled wrmsr: 0x684 data 0
[ 9354.869610] kvm [4191]: vcpu0 unhandled wrmsr: 0x6c4 data 0
[ 9376.448998] kvm [4191]: vcpu0 unhandled rdmsr: 0x611
[ 9376.449052] kvm [4191]: vcpu0 unhandled rdmsr: 0x639
[ 9376.449100] kvm [4191]: vcpu0 unhandled rdmsr: 0x641
[ 9376.449148] kvm [4191]: vcpu0 unhandled rdmsr: 0x619
[ 9472.705412] INFO: task btrfs-transacti:3874 blocked for more than 120 
seconds.
[ 9472.705499]       Tainted: G            E 3.16.0-rc6+ #64
[ 9472.705547] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[ 9472.705629] btrfs-transacti D ffff88042fc943c0     0  3874      2 
0x00000000
[ 9472.705688]  ffff8803fb9dfca0 0000000000000002 ffff8800c4214b90 
ffff8803fb9dffd8
[ 9472.705778]  00000000000143c0 00000000000143c0 ffff88041977b260 
ffff8803d29f23a0
[ 9472.705868]  ffff8803d29f23a8 7fffffffffffffff ffff8800c4214b90 
ffff880232e2c0a8
[ 9472.705976] Call Trace:
[ 9472.706019]  [<ffffffff8170eb59>] schedule+0x29/0x70
[ 9472.706067]  [<ffffffff8170df99>] schedule_timeout+0x209/0x280
[ 9472.706118]  [<ffffffff8170874b>] ? __slab_free+0xfe/0x2c3
[ 9472.706169]  [<ffffffff810829f4>] ? wake_up_worker+0x24/0x30
[ 9472.706219]  [<ffffffff8170f656>] wait_for_completion+0xa6/0x160
[ 9472.706270]  [<ffffffff8109d4e0>] ? wake_up_state+0x20/0x20
[ 9472.706338]  [<ffffffffc045b936>] 
btrfs_wait_and_free_delalloc_work+0x16/0x30 [btrfs]
[ 9472.706429]  [<ffffffffc04658be>] 
btrfs_run_ordered_operations+0x1ee/0x2c0 [btrfs]
[ 9472.706518]  [<ffffffffc044bbb7>] btrfs_commit_transaction+0x27/0xa40 
[btrfs]
[ 9472.706579]  [<ffffffffc0447d65>] transaction_kthread+0x1b5/0x240 [btrfs]
[ 9472.706637]  [<ffffffffc0447bb0>] ? 
btrfs_cleanup_transaction+0x560/0x560 [btrfs]
[ 9472.706720]  [<ffffffff8108cc52>] kthread+0xd2/0xf0
[ 9472.706767]  [<ffffffff8108cb80>] ? kthread_create_on_node+0x180/0x180
[ 9472.706820]  [<ffffffff81712dfc>] ret_from_fork+0x7c/0xb0
[ 9472.706869]  [<ffffffff8108cb80>] ? kthread_create_on_node+0x180/0x180
[ 9472.706926] INFO: task kworker/u16:3:6932 blocked for more than 120 
seconds.
[ 9472.706980]       Tainted: G            E 3.16.0-rc6+ #64
[ 9472.707029] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[ 9472.707112] kworker/u16:3   D ffff88042fd943c0     0  6932      2 
0x00000000
[ 9472.707179] Workqueue: btrfs-flush_delalloc normal_work_helper [btrfs]
[ 9472.707233]  ffff88035f1bbb58 0000000000000002 ffff880417e564c0 
ffff88035f1bbfd8
[ 9472.707322]  00000000000143c0 00000000000143c0 ffff8800c1a03260 
ffff88042fd94cd8
[ 9472.707413]  ffff88042ffb4be8 ffff88035f1bbbe0 0000000000000002 
ffffffff81159930
[ 9472.707502] Call Trace:
[ 9472.707543]  [<ffffffff81159930>] ? wait_on_page_read+0x60/0x60
[ 9472.707593]  [<ffffffff8170ee6d>] io_schedule+0x9d/0x130
[ 9472.707643]  [<ffffffff8115993e>] sleep_on_page+0xe/0x20
[ 9472.707697]  [<ffffffff8170f3e8>] __wait_on_bit_lock+0x48/0xb0
[ 9472.707770]  [<ffffffff81159a4a>] __lock_page+0x6a/0x70
[ 9472.707820]  [<ffffffff810b14a0>] ? autoremove_wake_function+0x40/0x40
[ 9472.707884]  [<ffffffffc046711e>] ? flush_write_bio+0xe/0x10 [btrfs]
[ 9472.707947]  [<ffffffffc046b0c0>] 
extent_write_cache_pages.isra.28.constprop.46+0x3d0/0x3f0 [btrfs]
[ 9472.713000]  [<ffffffffc046cd2d>] extent_writepages+0x4d/0x70 [btrfs]
[ 9472.713092]  [<ffffffffc04506f0>] ? btrfs_submit_direct+0x6a0/0x6a0 
[btrfs]
[ 9472.713148]  [<ffffffff810b0d78>] ? __wake_up_common+0x58/0x90
[ 9472.713203]  [<ffffffffc044e1d8>] btrfs_writepages+0x28/0x30 [btrfs]
[ 9472.713256]  [<ffffffff811668ee>] do_writepages+0x1e/0x40
[ 9472.713305]  [<ffffffff8115b409>] __filemap_fdatawrite_range+0x59/0x60
[ 9472.713359]  [<ffffffff8115b4bc>] filemap_flush+0x1c/0x20
[ 9472.713425]  [<ffffffffc045158a>] btrfs_run_delalloc_work+0x5a/0xa0 
[btrfs]
[ 9472.713491]  [<ffffffffc047accf>] normal_work_helper+0x11f/0x2b0 [btrfs]
[ 9472.713547]  [<ffffffff810858d2>] process_one_work+0x182/0x450
[ 9472.713598]  [<ffffffff81086043>] worker_thread+0x123/0x5a0
[ 9472.713648]  [<ffffffff81085f20>] ? rescuer_thread+0x380/0x380
[ 9472.713723]  [<ffffffff8108cc52>] kthread+0xd2/0xf0
[ 9472.713794]  [<ffffffff8108cb80>] ? kthread_create_on_node+0x180/0x180
[ 9472.713849]  [<ffffffff81712dfc>] ret_from_fork+0x7c/0xb0
[ 9472.713898]  [<ffffffff8108cb80>] ? kthread_create_on_node+0x180/0x180
[ 9472.713953] INFO: task kworker/u16:9:8360 blocked for more than 120 
seconds.
[ 9472.714028]       Tainted: G            E 3.16.0-rc6+ #64
[ 9472.714076] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[ 9472.714159] kworker/u16:9   D ffff88042fd543c0     0  8360      2 
0x00000000
[ 9472.714225] Workqueue: btrfs-delalloc normal_work_helper [btrfs]
[ 9472.714277]  ffff88035f20f838 0000000000000002 ffff8803216c1930 
ffff88035f20ffd8
[ 9472.714366]  00000000000143c0 00000000000143c0 ffff8804157e0000 
ffff88042fd54cd8
[ 9472.714455]  ffff88042ffad6e8 ffff88035f20f8c0 0000000000000002 
ffffffff81159930
[ 9472.714544] Call Trace:
[ 9472.714584]  [<ffffffff81159930>] ? wait_on_page_read+0x60/0x60
[ 9472.714636]  [<ffffffff8170ee6d>] io_schedule+0x9d/0x130
[ 9472.714684]  [<ffffffff8115993e>] sleep_on_page+0xe/0x20
[ 9472.714734]  [<ffffffff8170f3e8>] __wait_on_bit_lock+0x48/0xb0
[ 9472.714785]  [<ffffffff8133c8a4>] ? blk_finish_plug+0x14/0x40
[ 9472.714835]  [<ffffffff81159a4a>] __lock_page+0x6a/0x70
[ 9472.714884]  [<ffffffff810b14a0>] ? autoremove_wake_function+0x40/0x40
[ 9472.714938]  [<ffffffff8115ac34>] pagecache_get_page+0x164/0x1f0
[ 9472.714999]  [<ffffffffc048e447>] io_ctl_prepare_pages+0x67/0x180 [btrfs]
[ 9472.715088]  [<ffffffffc048fd0f>] __load_free_space_cache+0x1ff/0x700 
[btrfs]
[ 9472.715152]  [<ffffffffc049030c>] load_free_space_cache+0xfc/0x1c0 
[btrfs]
[ 9472.715213]  [<ffffffffc0430292>] cache_block_group+0x192/0x390 [btrfs]
[ 9472.715267]  [<ffffffff810b1460>] ? prepare_to_wait_event+0x100/0x100
[ 9472.715328]  [<ffffffffc04390a2>] find_free_extent+0xc72/0xcb0 [btrfs]
[ 9472.715388]  [<ffffffffc04392ef>] btrfs_reserve_extent+0xaf/0x1b0 [btrfs]
[ 9472.715451]  [<ffffffffc04526cc>] cow_file_range+0x13c/0x430 [btrfs]
[ 9472.715511]  [<ffffffffc0453c80>] ? 
submit_compressed_extents+0x480/0x480 [btrfs]
[ 9472.715602]  [<ffffffffc04539d2>] 
submit_compressed_extents+0x1d2/0x480 [btrfs]
[ 9472.715691]  [<ffffffffc0451524>] ? async_cow_free+0x24/0x30 [btrfs]
[ 9472.715748]  [<ffffffffc0453c80>] ? 
submit_compressed_extents+0x480/0x480 [btrfs]
[ 9472.715835]  [<ffffffffc0453d06>] async_cow_submit+0x86/0x90 [btrfs]
[ 9472.715894]  [<ffffffffc047ad43>] normal_work_helper+0x193/0x2b0 [btrfs]
[ 9472.715948]  [<ffffffff810858d2>] process_one_work+0x182/0x450
[ 9472.715998]  [<ffffffff81086043>] worker_thread+0x123/0x5a0
[ 9472.716047]  [<ffffffff81085f20>] ? rescuer_thread+0x380/0x380
[ 9472.716098]  [<ffffffff8108cc52>] kthread+0xd2/0xf0
[ 9472.716145]  [<ffffffff8108cb80>] ? kthread_create_on_node+0x180/0x180
[ 9472.716197]  [<ffffffff81712dfc>] ret_from_fork+0x7c/0xb0
[ 9472.716246]  [<ffffffff8108cb80>] ? kthread_create_on_node+0x180/0x180
[ 9472.716303] INFO: task kworker/u16:0:11682 blocked for more than 120 
seconds.
[ 9472.716357]       Tainted: G            E 3.16.0-rc6+ #64
[ 9472.716405] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[ 9472.716488] kworker/u16:0   D ffff88042fd543c0     0 11682      2 
0x00000000
[ 9472.716548] Workqueue: writeback bdi_writeback_workfn (flush-btrfs-3)
[ 9472.716610]  ffff8802161cf9d8 0000000000000002 ffff8803fc3c0000 
ffff8802161cffd8
[ 9472.716699]  00000000000143c0 00000000000143c0 ffff8804197e0000 
ffff88042fd54cd8
[ 9472.716788]  ffff88042ffaa9e8 ffff8802161cfa60 0000000000000002 
ffffffff81159930
[ 9472.716878] Call Trace:
[ 9472.716917]  [<ffffffff81159930>] ? wait_on_page_read+0x60/0x60
[ 9472.716968]  [<ffffffff8170ee6d>] io_schedule+0x9d/0x130
[ 9472.717017]  [<ffffffff8115993e>] sleep_on_page+0xe/0x20
[ 9472.717067]  [<ffffffff8170f3e8>] __wait_on_bit_lock+0x48/0xb0
[ 9472.717118]  [<ffffffff81159a4a>] __lock_page+0x6a/0x70
[ 9472.717167]  [<ffffffff810b14a0>] ? autoremove_wake_function+0x40/0x40
[ 9472.717231]  [<ffffffffc046711e>] ? flush_write_bio+0xe/0x10 [btrfs]
[ 9472.717293]  [<ffffffffc046b0c0>] 
extent_write_cache_pages.isra.28.constprop.46+0x3d0/0x3f0 [btrfs]
[ 9472.717402]  [<ffffffffc046cd2d>] extent_writepages+0x4d/0x70 [btrfs]
[ 9472.717465]  [<ffffffffc04506f0>] ? btrfs_submit_direct+0x6a0/0x6a0 
[btrfs]
[ 9472.717553]  [<ffffffffc044e1d8>] btrfs_writepages+0x28/0x30 [btrfs]
[ 9472.717607]  [<ffffffff811668ee>] do_writepages+0x1e/0x40
[ 9472.717657]  [<ffffffff811f17d0>] __writeback_single_inode+0x40/0x210
[ 9472.717710]  [<ffffffff811f22b7>] writeback_sb_inodes+0x247/0x3e0
[ 9472.717784]  [<ffffffff811f24ef>] __writeback_inodes_wb+0x9f/0xd0
[ 9472.717836]  [<ffffffff811f2763>] wb_writeback+0x243/0x2c0
[ 9472.717885]  [<ffffffff811f4e09>] bdi_writeback_workfn+0x1b9/0x430
[ 9472.717938]  [<ffffffff810858d2>] process_one_work+0x182/0x450
[ 9472.717990]  [<ffffffff81086043>] worker_thread+0x123/0x5a0
[ 9472.718039]  [<ffffffff81085f20>] ? rescuer_thread+0x380/0x380
[ 9472.718089]  [<ffffffff8108cc52>] kthread+0xd2/0xf0
[ 9472.718136]  [<ffffffff8108cb80>] ? kthread_create_on_node+0x180/0x180
[ 9472.718188]  [<ffffffff81712dfc>] ret_from_fork+0x7c/0xb0
[ 9472.718237]  [<ffffffff8108cb80>] ? kthread_create_on_node+0x180/0x180

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Blocked tasks on 3.15.1
  2014-07-22 19:42                                     ` Torbjørn
@ 2014-07-22 19:50                                       ` Chris Mason
  2014-07-22 20:10                                         ` Torbjørn
  0 siblings, 1 reply; 49+ messages in thread
From: Chris Mason @ 2014-07-22 19:50 UTC (permalink / raw)
  To: Torbjørn; +Cc: linux-btrfs

On 07/22/2014 03:42 PM, Torbjørn wrote:
> On 07/22/2014 04:53 PM, Chris Mason wrote:
>>
>> On 07/19/2014 02:23 PM, Martin Steigerwald wrote:
>>
>>>> Running 3.15.6 with this patch applied on top:
>>>>   - still causes a hang with `rsync -hPaHAXx --del /mnt/home/nyx/
>>>> /home/nyx/`
>>>> - no extra error messages printed (`dmesg | grep racing`) compared to
>>>> without the patch
>>> I got same results with 3.16-rc5 + this patch (see thread BTRFS hang
>>> with
>>> 3.16-rc5). 3.16-rc4 still is fine with me. No hang whatsoever so far.
>>>
>>>> To recap some details (so I can have it all in one place):
>>>>   - /home/ is btrfs with compress=lzo
>>> BTRFS RAID 1 with lzo.
>>>
>>>>   - I have _not_ created any nodatacow files.
>>> Me neither.
>>>
>>>>   - Full stack is: sata <-> dmcrypt <-> lvm <-> btrfs (I noticed others
>>>> mentioning the use of dmcrypt)
>>> Same, except no dmcrypt.
>>>
>> Thanks for the help in tracking this down everyone.  We'll get there!
>> Are you all running multi-disk systems (from a btrfs POV, more than one
>> device?)  I don't care how many physical drives this maps to, just does
>> btrfs think there's more than one drive.
>>
>> -chris
> 3.16-rc6 with your patch on top still causes hangs here.
> No traces of "racing" in dmesg
> Hang is on a btrfs raid 0 consisting of 3 drives.
> Full stack is: sata <-> dmcrypt <-> btrfs raid0
> 
> Hang was caused by
> 1. Several rsync -av --inplace --delete <source> <backup subvol>
> 2. btrfs subvolume snapshot -r <backup subvol> <bacup snap>
> 
> The rsync jobs are done one at a time
> btrfs is stuck when trying to create the read only snapshot

The trace is similar, but you're stuck trying to read the free space
cache.  This one I saw earlier this morning, but I haven't seen these
parts from the 3.15 bug reports.

Maybe they are related though, I'll dig into the 3.15 bug reports again.

-chris

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Blocked tasks on 3.15.1
  2014-07-22 19:50                                       ` Chris Mason
@ 2014-07-22 20:10                                         ` Torbjørn
  0 siblings, 0 replies; 49+ messages in thread
From: Torbjørn @ 2014-07-22 20:10 UTC (permalink / raw)
  To: Chris Mason; +Cc: linux-btrfs

On 07/22/2014 09:50 PM, Chris Mason wrote:
> On 07/22/2014 03:42 PM, Torbjørn wrote:
>> On 07/22/2014 04:53 PM, Chris Mason wrote:
>>> On 07/19/2014 02:23 PM, Martin Steigerwald wrote:
>>>
>>>>> Running 3.15.6 with this patch applied on top:
>>>>>    - still causes a hang with `rsync -hPaHAXx --del /mnt/home/nyx/
>>>>> /home/nyx/`
>>>>> - no extra error messages printed (`dmesg | grep racing`) compared to
>>>>> without the patch
>>>> I got same results with 3.16-rc5 + this patch (see thread BTRFS hang
>>>> with
>>>> 3.16-rc5). 3.16-rc4 still is fine with me. No hang whatsoever so far.
>>>>
>>>>> To recap some details (so I can have it all in one place):
>>>>>    - /home/ is btrfs with compress=lzo
>>>> BTRFS RAID 1 with lzo.
>>>>
>>>>>    - I have _not_ created any nodatacow files.
>>>> Me neither.
>>>>
>>>>>    - Full stack is: sata <-> dmcrypt <-> lvm <-> btrfs (I noticed others
>>>>> mentioning the use of dmcrypt)
>>>> Same, except no dmcrypt.
>>>>
>>> Thanks for the help in tracking this down everyone.  We'll get there!
>>> Are you all running multi-disk systems (from a btrfs POV, more than one
>>> device?)  I don't care how many physical drives this maps to, just does
>>> btrfs think there's more than one drive.
>>>
>>> -chris
>> 3.16-rc6 with your patch on top still causes hangs here.
>> No traces of "racing" in dmesg
>> Hang is on a btrfs raid 0 consisting of 3 drives.
>> Full stack is: sata <-> dmcrypt <-> btrfs raid0
>>
>> Hang was caused by
>> 1. Several rsync -av --inplace --delete <source> <backup subvol>
>> 2. btrfs subvolume snapshot -r <backup subvol> <bacup snap>
>>
>> The rsync jobs are done one at a time
>> btrfs is stuck when trying to create the read only snapshot
> The trace is similar, but you're stuck trying to read the free space
> cache.  This one I saw earlier this morning, but I haven't seen these
> parts from the 3.15 bug reports.
>
> Maybe they are related though, I'll dig into the 3.15 bug reports again.
>
> -chris
In case it was not clear, this hang was on a different btrfs volume than 
the 3.15 hang (but the same server).
Earlier the affected volume was readable during the hang. This time the 
volume is not readable either.

I'll keep the patched 3.16 running and see if I can trigger something 
similar to the 3.15 hang.

Thanks

--
Torbjørn

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Blocked tasks on 3.15.1
  2014-07-22 14:53                                   ` Chris Mason
                                                       ` (2 preceding siblings ...)
  2014-07-22 19:42                                     ` Torbjørn
@ 2014-07-22 21:13                                     ` Martin Steigerwald
  2014-07-22 21:15                                       ` Chris Mason
  2014-07-23  1:06                                     ` Rich Freeman
                                                       ` (3 subsequent siblings)
  7 siblings, 1 reply; 49+ messages in thread
From: Martin Steigerwald @ 2014-07-22 21:13 UTC (permalink / raw)
  To: Chris Mason; +Cc: Cody P Schafer, Chris Samuel, linux-btrfs

Am Dienstag, 22. Juli 2014, 10:53:03 schrieb Chris Mason:
> On 07/19/2014 02:23 PM, Martin Steigerwald wrote:
> >> Running 3.15.6 with this patch applied on top:
> >>  - still causes a hang with `rsync -hPaHAXx --del /mnt/home/nyx/
> >>  /home/nyx/`
> >> 
> >> - no extra error messages printed (`dmesg | grep racing`) compared to
> >> without the patch
> > 
> > I got same results with 3.16-rc5 + this patch (see thread BTRFS hang with
> > 3.16-rc5). 3.16-rc4 still is fine with me. No hang whatsoever so far.
> > 
> >> To recap some details (so I can have it all in one place):
> >>  - /home/ is btrfs with compress=lzo
> > 
> > BTRFS RAID 1 with lzo.
> > 
> >>  - I have _not_ created any nodatacow files.
> > 
> > Me neither.
> > 
> >>  - Full stack is: sata <-> dmcrypt <-> lvm <-> btrfs (I noticed others
> >> 
> >> mentioning the use of dmcrypt)
> > 
> > Same, except no dmcrypt.
> 
> Thanks for the help in tracking this down everyone.  We'll get there!
> Are you all running multi-disk systems (from a btrfs POV, more than one
> device?)  I don't care how many physical drives this maps to, just does
> btrfs think there's more than one drive.

As I told before I am using BTRFS RAID 1. Two logival volumes on two distinct 
SSDs. RAID is directly in BTRFS, no SoftRAID here (which I wouldn´t want to 
use with SSDs anyway).

-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Blocked tasks on 3.15.1
  2014-07-22 21:13                                     ` Martin Steigerwald
@ 2014-07-22 21:15                                       ` Chris Mason
  2014-07-23 11:13                                         ` Martin Steigerwald
  0 siblings, 1 reply; 49+ messages in thread
From: Chris Mason @ 2014-07-22 21:15 UTC (permalink / raw)
  To: Martin Steigerwald; +Cc: Cody P Schafer, Chris Samuel, linux-btrfs

On 07/22/2014 05:13 PM, Martin Steigerwald wrote:
> Am Dienstag, 22. Juli 2014, 10:53:03 schrieb Chris Mason:
>> On 07/19/2014 02:23 PM, Martin Steigerwald wrote:
>>>> Running 3.15.6 with this patch applied on top:
>>>>  - still causes a hang with `rsync -hPaHAXx --del /mnt/home/nyx/
>>>>  /home/nyx/`
>>>>
>>>> - no extra error messages printed (`dmesg | grep racing`) compared to
>>>> without the patch
>>>
>>> I got same results with 3.16-rc5 + this patch (see thread BTRFS hang with
>>> 3.16-rc5). 3.16-rc4 still is fine with me. No hang whatsoever so far.
>>>
>>>> To recap some details (so I can have it all in one place):
>>>>  - /home/ is btrfs with compress=lzo
>>>
>>> BTRFS RAID 1 with lzo.
>>>
>>>>  - I have _not_ created any nodatacow files.
>>>
>>> Me neither.
>>>
>>>>  - Full stack is: sata <-> dmcrypt <-> lvm <-> btrfs (I noticed others
>>>>
>>>> mentioning the use of dmcrypt)
>>>
>>> Same, except no dmcrypt.
>>
>> Thanks for the help in tracking this down everyone.  We'll get there!
>> Are you all running multi-disk systems (from a btrfs POV, more than one
>> device?)  I don't care how many physical drives this maps to, just does
>> btrfs think there's more than one drive.
> 
> As I told before I am using BTRFS RAID 1. Two logival volumes on two distinct 
> SSDs. RAID is directly in BTRFS, no SoftRAID here (which I wouldn´t want to 
> use with SSDs anyway).
> 

When you say logical volumes, you mean LVM right?  Just making sure I
know all the pieces involved.

-chris

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Blocked tasks on 3.15.1
  2014-07-22 14:53                                   ` Chris Mason
                                                       ` (3 preceding siblings ...)
  2014-07-22 21:13                                     ` Martin Steigerwald
@ 2014-07-23  1:06                                     ` Rich Freeman
  2014-07-23  6:38                                       ` Felix Seidel
  2014-07-23 13:20                                     ` Charles Cazabon
                                                       ` (2 subsequent siblings)
  7 siblings, 1 reply; 49+ messages in thread
From: Rich Freeman @ 2014-07-23  1:06 UTC (permalink / raw)
  To: Chris Mason; +Cc: Martin Steigerwald, Cody P Schafer, Chris Samuel, Btrfs BTRFS

On Tue, Jul 22, 2014 at 10:53 AM, Chris Mason <clm@fb.com> wrote:
>
> Thanks for the help in tracking this down everyone.  We'll get there!
> Are you all running multi-disk systems (from a btrfs POV, more than one
> device?)  I don't care how many physical drives this maps to, just does
> btrfs think there's more than one drive.

I've been away on vacation so I haven't been able to try your latest
patch, but I can try whatever is out there starting this weekend.

I was getting fairly consistent hangs during heavy IO (especially
rsync) on 3.15 with lzo enabled.  This is on raid1 across 5 drives,
directly against the partitions themselves (no dmcrypt, mdadm, lvm,
etc).  I disabled lzo and haven't had problems since.  I'm now running
on mainline without issue, but I think I did see the hang on mainline
when I tried enabling lzo again briefly.

Rich

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Blocked tasks on 3.15.1
  2014-07-23  1:06                                     ` Rich Freeman
@ 2014-07-23  6:38                                       ` Felix Seidel
  0 siblings, 0 replies; 49+ messages in thread
From: Felix Seidel @ 2014-07-23  6:38 UTC (permalink / raw)
  To: Rich Freeman; +Cc: Chris Mason, Btrfs BTRFS

[-- Attachment #1: Type: text/plain, Size: 571 bytes --]

On 23 Jul 2014, at 03:06, Rich Freeman <r-btrfs@thefreemanclan.net> wrote:
> I disabled lzo and haven't had problems since.  I'm now running
> on mainline without issue, but I think I did see the hang on mainline
> when I tried enabling lzo again briefly.

Can confirm. I’m running mainline 3.16rc5 and was experiencing deadlocks
when having LZO enabled. 
Disabled it, now all seems ok.

Using btrfs RAID1 -> dm-crypt -> SATA.

I’ve attached some more dmesg “blocked” messages using kernel versions
3.15.5, 3.14.6 and 3.16rc5 just in case it helps anyone.


[-- Attachment #2: dmesg.txt --]
[-- Type: text/plain, Size: 14924 bytes --]

Jul 18 23:36:58 nas kernel: INFO: task sudo:1214 blocked for more than 120 seconds.
Jul 18 23:36:58 nas kernel:       Tainted: G           O  3.15.5-2-ARCH #1
Jul 18 23:36:58 nas kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 18 23:36:58 nas kernel: sudo            D 0000000000000000     0  1214      1 0x00000004
Jul 18 23:36:58 nas kernel:  ffff88001d0ebc20 0000000000000086 ffff88002cca5bb0 0000000000014700
Jul 18 23:36:58 nas kernel:  ffff88001d0ebfd8 0000000000014700 ffff88002cca5bb0 0000000000000000
Jul 18 23:36:58 nas kernel:  ffff880028ee4000 0000000000000003 00000000284e0d53 0000000000000002
Jul 18 23:36:58 nas kernel: Call Trace:
Jul 18 23:36:58 nas kernel:  [<ffffffff815110dc>] ? __do_page_fault+0x2ec/0x600
Jul 18 23:36:58 nas kernel:  [<ffffffff81509fa9>] schedule+0x29/0x70
Jul 18 23:36:58 nas kernel:  [<ffffffff8150a426>] schedule_preempt_disabled+0x16/0x20
Jul 18 23:36:58 nas kernel:  [<ffffffff8150bda5>] __mutex_lock_slowpath+0xe5/0x230
Jul 18 23:36:58 nas kernel:  [<ffffffff8150bf07>] mutex_lock+0x17/0x30
Jul 18 23:36:58 nas kernel:  [<ffffffff811bfa24>] lookup_slow+0x34/0xc0
Jul 18 23:36:58 nas kernel:  [<ffffffff811c1b73>] path_lookupat+0x723/0x880
Jul 18 23:36:58 nas kernel:  [<ffffffff8114f111>] ? release_pages+0xc1/0x280
Jul 18 23:36:58 nas kernel:  [<ffffffff811bfd97>] ? getname_flags+0x37/0x130
Jul 18 23:36:58 nas kernel:  [<ffffffff811c1cf6>] filename_lookup.isra.30+0x26/0x80
Jul 18 23:36:58 nas kernel:  [<ffffffff811c4fd7>] user_path_at_empty+0x67/0xd0
Jul 18 23:36:58 nas kernel:  [<ffffffff81172b52>] ? unmap_region+0xe2/0x130
Jul 18 23:36:58 nas kernel:  [<ffffffff811c5051>] user_path_at+0x11/0x20
Jul 18 23:36:58 nas kernel:  [<ffffffff811b979a>] vfs_fstatat+0x6a/0xd0
Jul 18 23:36:58 nas kernel:  [<ffffffff811b981b>] vfs_stat+0x1b/0x20
Jul 18 23:36:58 nas kernel:  [<ffffffff811b9df9>] SyS_newstat+0x29/0x60
Jul 18 23:36:58 nas kernel:  [<ffffffff8117501c>] ? vm_munmap+0x4c/0x60
Jul 18 23:36:58 nas kernel:  [<ffffffff81175f92>] ? SyS_munmap+0x22/0x30
Jul 18 23:36:58 nas kernel:  [<ffffffff81515fa9>] system_call_fastpath+0x16/0x1b
---
Jul 19 18:34:17 nas kernel: INFO: task rsync:4900 blocked for more than 120 seconds.
Jul 19 18:34:17 nas kernel:       Tainted: G           O  3.15.5-2-ARCH #1
Jul 19 18:34:17 nas kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 19 18:34:17 nas kernel: rsync           D 0000000000000000     0  4900   4899 0x00000000
Jul 19 18:34:17 nas kernel:  ffff880005947c20 0000000000000082 ffff880034aa4750 0000000000014700
Jul 19 18:34:17 nas kernel:  ffff880005947fd8 0000000000014700 ffff880034aa4750 ffffffff810a5995
Jul 19 18:34:17 nas kernel:  ffff88011fc14700 ffff8800dd828a30 ffff8800cece6a00 ffff880005947bd8
Jul 19 18:34:17 nas kernel: Call Trace:
Jul 19 18:34:17 nas kernel:  [<ffffffff810a5995>] ? set_next_entity+0x95/0xb0
Jul 19 18:34:17 nas kernel:  [<ffffffff810ac0be>] ? pick_next_task_fair+0x46e/0x550
Jul 19 18:34:17 nas kernel:  [<ffffffff810136c1>] ? __switch_to+0x1f1/0x540
Jul 19 18:34:17 nas kernel:  [<ffffffff81509fa9>] schedule+0x29/0x70
Jul 19 18:34:17 nas kernel:  [<ffffffff8150a426>] schedule_preempt_disabled+0x16/0x20
Jul 19 18:34:17 nas kernel:  [<ffffffff8150bda5>] __mutex_lock_slowpath+0xe5/0x230
Jul 19 18:34:17 nas kernel:  [<ffffffff8150bf07>] mutex_lock+0x17/0x30
Jul 19 18:34:17 nas kernel:  [<ffffffff811bfa24>] lookup_slow+0x34/0xc0
Jul 19 18:34:17 nas kernel:  [<ffffffff811c1b73>] path_lookupat+0x723/0x880
Jul 19 18:34:17 nas kernel:  [<ffffffff8150a2bf>] ? io_schedule+0xbf/0xf0
Jul 19 18:34:17 nas kernel:  [<ffffffff8150a7d1>] ? __wait_on_bit_lock+0x91/0xb0
Jul 19 18:34:17 nas kernel:  [<ffffffff811bfd97>] ? getname_flags+0x37/0x130
Jul 19 18:34:17 nas kernel:  [<ffffffff811c1cf6>] filename_lookup.isra.30+0x26/0x80
Jul 19 18:34:17 nas kernel:  [<ffffffff811c4fd7>] user_path_at_empty+0x67/0xd0
Jul 19 18:34:17 nas kernel:  [<ffffffff811c5051>] user_path_at+0x11/0x20
Jul 19 18:34:17 nas kernel:  [<ffffffff811b979a>] vfs_fstatat+0x6a/0xd0
Jul 19 18:34:17 nas kernel:  [<ffffffff811d4414>] ? mntput+0x24/0x40
Jul 19 18:34:17 nas kernel:  [<ffffffff811b983e>] vfs_lstat+0x1e/0x20
Jul 19 18:34:17 nas kernel:  [<ffffffff811b9e59>] SyS_newlstat+0x29/0x60
Jul 19 18:34:17 nas kernel:  [<ffffffff8108a3c4>] ? task_work_run+0xa4/0xe0
Jul 19 18:34:17 nas kernel:  [<ffffffff8150e939>] ? do_device_not_available+0x19/0x20
Jul 19 18:34:17 nas kernel:  [<ffffffff8151760e>] ? device_not_available+0x1e/0x30
Jul 19 18:34:17 nas kernel:  [<ffffffff81515fa9>] system_call_fastpath+0x16/0x1b
---
Jul 19 23:58:39 nas kernel: INFO: task rsync:3407 blocked for more than 120 seconds.
Jul 19 23:58:39 nas kernel:       Not tainted 3.14.6-1-ARCH #1
Jul 19 23:58:39 nas kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 19 23:58:39 nas kernel: rsync           D ffff8800cd9e3110     0  3407   3405 0x00000000
Jul 19 23:58:39 nas kernel:  ffff8800a2f21bc0 0000000000000086 ffff8800a2f21b10 ffff8800cd9e3110
Jul 19 23:58:39 nas kernel:  00000000000146c0 ffff8800a2f21fd8 00000000000146c0 ffff8800cd9e3110
Jul 19 23:58:39 nas kernel:  00000000ffffffef ffff8800dd51bf60 ffffffffa02f4c5c ffff8800a2f21b28
Jul 19 23:58:39 nas kernel: Call Trace:
Jul 19 23:58:39 nas kernel:  [<ffffffffa02f4c5c>] ? __set_extent_bit+0x45c/0x550 [btrfs]
Jul 19 23:58:39 nas kernel:  [<ffffffffa02f4263>] ? free_extent_state+0x43/0xc0 [btrfs]
Jul 19 23:58:39 nas kernel:  [<ffffffffa02f4c5c>] ? __set_extent_bit+0x45c/0x550 [btrfs]
Jul 19 23:58:39 nas kernel:  [<ffffffff8150bb09>] schedule+0x29/0x70
Jul 19 23:58:39 nas kernel:  [<ffffffffa02f5b72>] lock_extent_bits+0x152/0x1e0 [btrfs]
Jul 19 23:58:39 nas kernel:  [<ffffffff810b3ec0>] ? __wake_up_sync+0x20/0x20
Jul 19 23:58:39 nas kernel:  [<ffffffffa02e39b9>] btrfs_evict_inode+0x139/0x520 [btrfs]
Jul 19 23:58:39 nas kernel:  [<ffffffff811d5ac0>] evict+0xb0/0x1c0
Jul 19 23:58:39 nas kernel:  [<ffffffff811d6375>] iput+0xf5/0x1a0
Jul 19 23:58:39 nas kernel:  [<ffffffff811d1bc0>] dentry_kill+0x240/0x290
Jul 19 23:58:39 nas kernel:  [<ffffffff811d1c8b>] dput+0x7b/0x120
Jul 19 23:58:39 nas kernel:  [<ffffffff811cc0da>] SyS_renameat+0x2ca/0x400
Jul 19 23:58:39 nas kernel:  [<ffffffff811bd001>] ? __sb_end_write+0x31/0x60
Jul 19 23:58:39 nas kernel:  [<ffffffff811d9111>] ? mnt_drop_write+0x31/0x50
Jul 19 23:58:39 nas kernel:  [<ffffffff811b81c3>] ? chmod_common+0xe3/0x160
Jul 19 23:58:39 nas kernel:  [<ffffffff811da334>] ? mntput+0x24/0x40
Jul 19 23:58:39 nas kernel:  [<ffffffff811c478e>] ? path_put+0x1e/0x30
Jul 19 23:58:39 nas kernel:  [<ffffffff811b96e2>] ? SyS_fchmodat+0x62/0xc0
Jul 19 23:58:39 nas kernel:  [<ffffffff811cc22b>] SyS_rename+0x1b/0x20
Jul 19 23:58:39 nas kernel:  [<ffffffff81517b29>] system_call_fastpath+0x16/0x1b
---
Jul 21 22:31:02 nas kernel: INFO: task kworker/u8:1:12866 blocked for more than 120 seconds.
Jul 21 22:31:02 nas kernel:       Not tainted 3.16.0-1-mainline #1
Jul 21 22:31:02 nas kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 21 22:31:02 nas kernel: kworker/u8:1    D 0000000000000000     0 12866      2 0x00000000
Jul 21 22:31:02 nas kernel: Workqueue: btrfs-delalloc normal_work_helper [btrfs]
Jul 21 22:31:02 nas kernel:  ffff88010760f5d8 0000000000000046 ffff8800d293bd20 00000000000145c0
Jul 21 22:31:02 nas kernel:  ffff88010760ffd8 00000000000145c0 ffff8800d293bd20 0000000000000001
Jul 21 22:31:02 nas kernel:  ffffffff810b6c30 ffff88010760f5c8 ffff88010d4bc2e8 ffff8800a45b4000
Jul 21 22:31:02 nas kernel: Call Trace:
Jul 21 22:31:02 nas kernel:  [<ffffffff810b6c30>] ? __wake_up_sync+0x20/0x20
Jul 21 22:31:02 nas kernel:  [<ffffffffa0357406>] ? btrfs_bio_counter_dec+0x26/0x60 [btrfs]
Jul 21 22:31:02 nas kernel:  [<ffffffffa0312665>] ? btrfs_map_bio+0x4b5/0x520 [btrfs]
Jul 21 22:31:02 nas kernel:  [<ffffffff8114b750>] ? filemap_fdatawait+0x30/0x30
Jul 21 22:31:02 nas kernel:  [<ffffffff8151e919>] schedule+0x29/0x70
Jul 21 22:31:02 nas kernel:  [<ffffffff8151ebf4>] io_schedule+0x94/0xf0
Jul 21 22:31:02 nas kernel:  [<ffffffff8114b75e>] sleep_on_page+0xe/0x20
Jul 21 22:31:02 nas kernel:  [<ffffffff8151f014>] __wait_on_bit+0x64/0x90
Jul 21 22:31:02 nas kernel:  [<ffffffff8114b509>] wait_on_page_bit+0x89/0xa0
Jul 21 22:31:02 nas kernel:  [<ffffffff810b6c70>] ? autoremove_wake_function+0x40/0x40
Jul 21 22:31:02 nas kernel:  [<ffffffffa030943a>] read_extent_buffer_pages+0x30a/0x350 [btrfs]
Jul 21 22:31:02 nas kernel:  [<ffffffffa02dc1e0>] ? free_root_pointers+0x60/0x60 [btrfs]
Jul 21 22:31:02 nas kernel:  [<ffffffffa02ddd01>] btree_read_extent_buffer_pages.constprop.45+0xb1/0x110 [btrfs]
Jul 21 22:31:02 nas kernel:  [<ffffffffa02de378>] read_tree_block+0x38/0x60 [btrfs]
Jul 21 22:31:02 nas kernel:  [<ffffffffa02be770>] read_block_for_search.isra.29+0x160/0x3b0 [btrfs]
Jul 21 22:31:02 nas kernel:  [<ffffffffa02c429a>] btrfs_next_old_leaf+0x26a/0x460 [btrfs]
Jul 21 22:31:02 nas kernel:  [<ffffffffa02c44a0>] btrfs_next_leaf+0x10/0x20 [btrfs]
Jul 21 22:31:02 nas kernel:  [<ffffffffa030e924>] find_free_dev_extent+0xc4/0x360 [btrfs]
Jul 21 22:31:02 nas kernel:  [<ffffffffa030ed8a>] __btrfs_alloc_chunk+0x1ca/0x810 [btrfs]
Jul 21 22:31:02 nas kernel:  [<ffffffffa03119b4>] btrfs_alloc_chunk+0x34/0x40 [btrfs]
Jul 21 22:31:02 nas kernel:  [<ffffffffa02cc82e>] do_chunk_alloc+0x1de/0x3d0 [btrfs]
Jul 21 22:31:02 nas kernel:  [<ffffffffa02d1e65>] find_free_extent+0xbd5/0xe00 [btrfs]
Jul 21 22:31:02 nas kernel:  [<ffffffffa02d2268>] btrfs_reserve_extent+0x68/0x160 [btrfs]
Jul 21 22:31:02 nas kernel:  [<ffffffffa02ec36b>] cow_file_range+0x13b/0x430 [btrfs]
Jul 21 22:31:02 nas kernel:  [<ffffffffa02ed715>] submit_compressed_extents+0x215/0x4a0 [btrfs]
Jul 21 22:31:02 nas kernel:  [<ffffffffa02ed9a0>] ? submit_compressed_extents+0x4a0/0x4a0 [btrfs]
Jul 21 22:31:02 nas kernel:  [<ffffffffa02eda26>] async_cow_submit+0x86/0x90 [btrfs]
Jul 21 22:31:02 nas kernel:  [<ffffffffa0316735>] normal_work_helper+0x205/0x350 [btrfs]
Jul 21 22:31:02 nas kernel:  [<ffffffff8108afd8>] process_one_work+0x168/0x450
Jul 21 22:31:02 nas kernel:  [<ffffffff8108b79c>] worker_thread+0x1fc/0x550
Jul 21 22:31:02 nas kernel:  [<ffffffff8108b5a0>] ? init_pwq.part.22+0x10/0x10
Jul 21 22:31:02 nas kernel:  [<ffffffff81091d1a>] kthread+0xea/0x100
Jul 21 22:31:02 nas kernel:  [<ffffffff81091c30>] ? kthread_create_on_node+0x1b0/0x1b0
Jul 21 22:31:02 nas kernel:  [<ffffffff8152277c>] ret_from_fork+0x7c/0xb0
Jul 21 22:31:02 nas kernel:  [<ffffffff81091c30>] ? kthread_create_on_node+0x1b0/0x1b0
---
Jul 21 22:31:02 nas kernel: INFO: task afpd:15988 blocked for more than 120 seconds.
Jul 21 22:31:02 nas kernel:       Not tainted 3.16.0-1-mainline #1
Jul 21 22:31:02 nas kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 21 22:31:02 nas kernel: afpd            D 0000000000000001     0 15988  11686 0x00000004
Jul 21 22:31:02 nas kernel:  ffff88010d083c48 0000000000000086 ffff8800d8d88000 00000000000145c0
Jul 21 22:31:02 nas kernel:  ffff88010d083fd8 00000000000145c0 ffff8800d8d88000 ffff8800a4505000
Jul 21 22:31:02 nas kernel:  0000000000000002 00000000000a6fff ffff88010d083c28 ffffffffa02ed342
Jul 21 22:31:02 nas kernel: Call Trace:
Jul 21 22:31:02 nas kernel:  [<ffffffffa02ed342>] ? run_delalloc_range+0x192/0x350 [btrfs]
Jul 21 22:31:02 nas kernel:  [<ffffffff8114b750>] ? filemap_fdatawait+0x30/0x30
Jul 21 22:31:02 nas kernel:  [<ffffffff8151e919>] schedule+0x29/0x70
Jul 21 22:31:02 nas kernel:  [<ffffffff8151ebf4>] io_schedule+0x94/0xf0
Jul 21 22:31:02 nas kernel:  [<ffffffff8114b75e>] sleep_on_page+0xe/0x20
Jul 21 22:31:02 nas kernel:  [<ffffffff8151f138>] __wait_on_bit_lock+0x48/0xb0
Jul 21 22:31:02 nas kernel:  [<ffffffff8114b8b8>] __lock_page+0x78/0x90
Jul 21 22:31:02 nas kernel:  [<ffffffff810b6c70>] ? autoremove_wake_function+0x40/0x40
Jul 21 22:31:02 nas kernel:  [<ffffffffa0305a18>] extent_write_cache_pages.isra.28.constprop.43+0x2d8/0x410 [btr
Jul 21 22:31:02 nas kernel:  [<ffffffffa0307cec>] extent_writepages+0x5c/0x90 [btrfs]
Jul 21 22:31:02 nas kernel:  [<ffffffffa02e9780>] ? btrfs_direct_IO+0x350/0x350 [btrfs]
Jul 21 22:31:02 nas kernel:  [<ffffffffa02e8518>] btrfs_writepages+0x28/0x30 [btrfs]
Jul 21 22:31:02 nas kernel:  [<ffffffff8115913e>] do_writepages+0x1e/0x30
Jul 21 22:31:02 nas kernel:  [<ffffffff8114d22d>] __filemap_fdatawrite_range+0x5d/0x80
Jul 21 22:31:02 nas kernel:  [<ffffffff8114d313>] filemap_fdatawrite_range+0x13/0x20
Jul 21 22:31:02 nas kernel:  [<ffffffffa02f68ad>] btrfs_sync_file+0x5d/0x320 [btrfs]
Jul 21 22:31:02 nas kernel:  [<ffffffff811f1e11>] do_fsync+0x51/0x80
Jul 21 22:31:02 nas kernel:  [<ffffffff811f20c0>] SyS_fsync+0x10/0x20
Jul 21 22:31:02 nas kernel:  [<ffffffff81522829>] system_call_fastpath+0x16/0x1b
---
Jul 21 22:33:02 nas kernel: INFO: task btrfs-transacti:921 blocked for more than 120 seconds.
Jul 21 22:33:02 nas kernel:       Not tainted 3.16.0-1-mainline #1
Jul 21 22:33:02 nas kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 21 22:33:02 nas kernel: btrfs-transacti D 0000000000000000     0   921      2 0x00000000
Jul 21 22:33:02 nas kernel:  ffff8800d27dbe00 0000000000000046 ffff8800d28d5bb0 00000000000145c0
Jul 21 22:33:02 nas kernel:  ffff8800d27dbfd8 00000000000145c0 ffff8800d28d5bb0 ffff880101f72480
Jul 21 22:33:02 nas kernel:  ffffffffa033b5a6 ffff8800d27dbda8 ffffffffa033b864 ffff8800d18f95a0
Jul 21 22:33:02 nas kernel: Call Trace:
Jul 21 22:33:02 nas kernel:  [<ffffffffa033b5a6>] ? btrfs_release_delayed_inode+0x46/0x60 [btrfs]
Jul 21 22:33:02 nas kernel:  [<ffffffffa033b864>] ? __btrfs_update_delayed_inode+0x1a4/0x210 [btrfs]
Jul 21 22:33:02 nas kernel:  [<ffffffff811a570a>] ? kmem_cache_alloc+0x16a/0x170
Jul 21 22:33:02 nas kernel:  [<ffffffffa02bb89a>] ? btrfs_alloc_path+0x1a/0x20 [btrfs]
Jul 21 22:33:02 nas kernel:  [<ffffffffa034344a>] ? btrfs_scrub_pause+0xea/0x110 [btrfs]
Jul 21 22:33:02 nas kernel:  [<ffffffff8151e919>] schedule+0x29/0x70
Jul 21 22:33:02 nas kernel:  [<ffffffffa02e5745>] btrfs_commit_transaction+0x405/0xa50 [btrfs]
Jul 21 22:33:02 nas kernel:  [<ffffffff810b6c30>] ? __wake_up_sync+0x20/0x20
Jul 21 22:33:02 nas kernel:  [<ffffffffa02e1275>] transaction_kthread+0x1e5/0x250 [btrfs]
Jul 21 22:33:02 nas kernel:  [<ffffffffa02e1090>] ? btrfs_cleanup_transaction+0x5a0/0x5a0 [btrfs]
Jul 21 22:33:02 nas kernel:  [<ffffffff81091d1a>] kthread+0xea/0x100
Jul 21 22:33:02 nas kernel:  [<ffffffff81090000>] ? parse_args+0x4b0/0x4e0
Jul 21 22:33:02 nas kernel:  [<ffffffff81091c30>] ? kthread_create_on_node+0x1b0/0x1b0
Jul 21 22:33:02 nas kernel:  [<ffffffff8152277c>] ret_from_fork+0x7c/0xb0
Jul 21 22:33:02 nas kernel:  [<ffffffff81091c30>] ? kthread_create_on_node+0x1b0/0x1b0

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Blocked tasks on 3.15.1
  2014-07-22 21:15                                       ` Chris Mason
@ 2014-07-23 11:13                                         ` Martin Steigerwald
  0 siblings, 0 replies; 49+ messages in thread
From: Martin Steigerwald @ 2014-07-23 11:13 UTC (permalink / raw)
  To: Chris Mason; +Cc: Cody P Schafer, Chris Samuel, linux-btrfs

Am Dienstag, 22. Juli 2014, 17:15:21 schrieb Chris Mason:
> On 07/22/2014 05:13 PM, Martin Steigerwald wrote:
> > Am Dienstag, 22. Juli 2014, 10:53:03 schrieb Chris Mason:
> >> On 07/19/2014 02:23 PM, Martin Steigerwald wrote:
> >>>> Running 3.15.6 with this patch applied on top:
> >>>>  - still causes a hang with `rsync -hPaHAXx --del /mnt/home/nyx/
> >>>>  /home/nyx/`
> >>>> 
> >>>> - no extra error messages printed (`dmesg | grep racing`) compared to
> >>>> without the patch
> >>> 
> >>> I got same results with 3.16-rc5 + this patch (see thread BTRFS hang
> >>> with
> >>> 3.16-rc5). 3.16-rc4 still is fine with me. No hang whatsoever so far.
> >>> 
> >>>> To recap some details (so I can have it all in one place):
> >>>>  - /home/ is btrfs with compress=lzo
> >>> 
> >>> BTRFS RAID 1 with lzo.
> >>> 
> >>>>  - I have _not_ created any nodatacow files.
> >>> 
> >>> Me neither.
> >>> 
> >>>>  - Full stack is: sata <-> dmcrypt <-> lvm <-> btrfs (I noticed others
> >>>> 
> >>>> mentioning the use of dmcrypt)
> >>> 
> >>> Same, except no dmcrypt.
> >> 
> >> Thanks for the help in tracking this down everyone.  We'll get there!
> >> Are you all running multi-disk systems (from a btrfs POV, more than one
> >> device?)  I don't care how many physical drives this maps to, just does
> >> btrfs think there's more than one drive.
> > 
> > As I told before I am using BTRFS RAID 1. Two logival volumes on two
> > distinct SSDs. RAID is directly in BTRFS, no SoftRAID here (which I
> > wouldn´t want to use with SSDs anyway).
> 
> When you say logical volumes, you mean LVM right?  Just making sure I
> know all the pieces involved.

Exactly.

As a recap from the other thread:

merkaba:~> btrfs fi sh /home
Label: 'home'  uuid: […]
        Total devices 2 FS bytes used 123.20GiB
        devid    1 size 160.00GiB used 159.98GiB path /dev/mapper/msata-home
        devid    2 size 160.00GiB used 159.98GiB path /dev/dm-3

Btrfs v3.14.1

merkaba:~#1> btrfs fi df /home
Data, RAID1: total=154.95GiB, used=120.61GiB
System, RAID1: total=32.00MiB, used=48.00KiB
Metadata, RAID1: total=5.00GiB, used=2.59GiB
unknown, single: total=512.00MiB, used=0.00
merkaba:~> df -hT /home
Dateisystem            Typ   Größe Benutzt Verf. Verw% Eingehängt auf
/dev/mapper/msata-home btrfs  320G    247G   69G   79% /home

merkaba:~> file -sk /dev/sata/home
/dev/sata/home: symbolic link to `../dm-3'
merkaba:~> file -sk /dev/dm-3     
/dev/dm-3: BTRFS Filesystem label "home", sectorsize 4096, nodesize 16384, 
leafsize 16384, UUID=[…], 
132303151104/343597383680 bytes used, 2 devices


And LVM layout:

merkaba:~> lsblk
NAME                    MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                       8:0    0 279,5G  0 disk 
├─sda1                    8:1    0     4M  0 part 
├─sda2                    8:2    0   191M  0 part 
├─sda3                    8:3    0   286M  0 part 
└─sda4                    8:4    0   279G  0 part 
  ├─sata-home (dm-3)    254:3    0   160G  0 lvm  
  ├─sata-swap (dm-4)    254:4    0    12G  0 lvm  [SWAP]
  └─sata-debian (dm-5)  254:5    0    30G  0 lvm  
sdb                       8:16   0 447,1G  0 disk 
├─sdb1                    8:17   0   200M  0 part 
├─sdb2                    8:18   0   300M  0 part /boot
└─sdb3                    8:19   0 446,7G  0 part 
  ├─msata-home (dm-0)   254:0    0   160G  0 lvm  
  ├─msata-daten (dm-1)  254:1    0   200G  0 lvm  
  └─msata-debian (dm-2) 254:2    0    30G  0 lvm  
sr0                      11:0    1  1024M  0 rom 

sda is Intel SSD 320 SATA

sdb is Crucial m500 mSATA

Thanks,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Blocked tasks on 3.15.1
  2014-07-22 14:53                                   ` Chris Mason
                                                       ` (4 preceding siblings ...)
  2014-07-23  1:06                                     ` Rich Freeman
@ 2014-07-23 13:20                                     ` Charles Cazabon
  2014-07-25  2:27                                     ` Cody P Schafer
  2014-08-12  2:55                                     ` Charles Cazabon
  7 siblings, 0 replies; 49+ messages in thread
From: Charles Cazabon @ 2014-07-23 13:20 UTC (permalink / raw)
  To: linux-btrfs

Chris Mason <clm@fb.com> wrote:
> 
> Thanks for the help in tracking this down everyone.  We'll get there!
> Are you all running multi-disk systems (from a btrfs POV, more than one
> device?)  I don't care how many physical drives this maps to, just does
> btrfs think there's more than one drive.

Not me, at least - I'm doing the device aggregation down at the LVM level
(sata-dmcrypt-lvm-btrfs stack), so it's presented to btrfs as a single logical
device.

Charles
-- 
-----------------------------------------------------------------------
Charles Cazabon
GPL'ed software available at:               http://pyropus.ca/software/
-----------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Blocked tasks on 3.15.1
  2014-07-22 14:53                                   ` Chris Mason
                                                       ` (5 preceding siblings ...)
  2014-07-23 13:20                                     ` Charles Cazabon
@ 2014-07-25  2:27                                     ` Cody P Schafer
  2014-08-07 15:12                                       ` Tobias Holst
  2014-08-12  2:55                                     ` Charles Cazabon
  7 siblings, 1 reply; 49+ messages in thread
From: Cody P Schafer @ 2014-07-25  2:27 UTC (permalink / raw)
  To: Chris Mason; +Cc: Martin Steigerwald, Chris Samuel, linux-btrfs

On Tue, Jul 22, 2014 at 9:53 AM, Chris Mason <clm@fb.com> wrote:
>
>
> On 07/19/2014 02:23 PM, Martin Steigerwald wrote:
>
>>> Running 3.15.6 with this patch applied on top:
>>>  - still causes a hang with `rsync -hPaHAXx --del /mnt/home/nyx/ /home/nyx/`
>>> - no extra error messages printed (`dmesg | grep racing`) compared to
>>> without the patch
>>
>> I got same results with 3.16-rc5 + this patch (see thread BTRFS hang with
>> 3.16-rc5). 3.16-rc4 still is fine with me. No hang whatsoever so far.
>>
>>> To recap some details (so I can have it all in one place):
>>>  - /home/ is btrfs with compress=lzo
>>
>> BTRFS RAID 1 with lzo.
>>
>>>  - I have _not_ created any nodatacow files.
>>
>> Me neither.
>>
>>>  - Full stack is: sata <-> dmcrypt <-> lvm <-> btrfs (I noticed others
>>> mentioning the use of dmcrypt)
>>
>> Same, except no dmcrypt.
>>
>
> Thanks for the help in tracking this down everyone.  We'll get there!
> Are you all running multi-disk systems (from a btrfs POV, more than one
> device?)  I don't care how many physical drives this maps to, just does
> btrfs think there's more than one drive.

No, both of my btrfs filesystems are single disk.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Blocked tasks on 3.15.1
  2014-07-25  2:27                                     ` Cody P Schafer
@ 2014-08-07 15:12                                       ` Tobias Holst
  2014-08-07 16:05                                         ` Duncan
  0 siblings, 1 reply; 49+ messages in thread
From: Tobias Holst @ 2014-08-07 15:12 UTC (permalink / raw)
  To: linux-btrfs

Hi

Is there anything new on this topic? I am using Ubuntu 14.04.1 and
experiencing the same problem.
- 6 HDDs
- LUKS on every HDD
- btrfs RAID6 over this 6 crypt-devices
No LVM, no nodatacow files.
Mount-options: defaults,compress-force=lzo,space_cache
With the original 3.13-kernel (3.13.0-32-generic) it is working fine.

Then I tried the following kernels from here:
http://kernel.ubuntu.com/~kernel-ppa/mainline/
linux-image-3.14.15-031415-generic_3.14.15-031415.201407311853_amd64.deb
- not even booting, kernel panic at boot.
linux-image-3.15.6-031506-generic_3.15.6-031506.201407172034_amd64.deb,
linux-image-3.15.7-031507-generic_3.15.7-031507.201407281235_amd64.deb,
and linux-image-3.16.0-031600-generic_3.16.0-031600.201408031935_amd64.deb
causing the hangs like described in this thread. When doing big IO
(unpacking a .rar-archive with multiple GB) the filesystem stops
working. Load stays very high but nothing actually happens on the
drives accoding to dstat. htop shows a D (uninterruptible sleep
(usually IO)) at many kworker-threads.
Unmounting of the btrfs-filesystem only works with -l (lazy) option.
Reboot or shutdown doesn't work because of the blocking threads. So
only a power cut works. After the reboot the last written data before
the hang is lost.

I am now back on 3.13.

Regards


2014-07-25 4:27 GMT+02:00 Cody P Schafer <dev@codyps.com>:
>
> On Tue, Jul 22, 2014 at 9:53 AM, Chris Mason <clm@fb.com> wrote:
> >
> >
> > On 07/19/2014 02:23 PM, Martin Steigerwald wrote:
> >
> >>> Running 3.15.6 with this patch applied on top:
> >>>  - still causes a hang with `rsync -hPaHAXx --del /mnt/home/nyx/ /home/nyx/`
> >>> - no extra error messages printed (`dmesg | grep racing`) compared to
> >>> without the patch
> >>
> >> I got same results with 3.16-rc5 + this patch (see thread BTRFS hang with
> >> 3.16-rc5). 3.16-rc4 still is fine with me. No hang whatsoever so far.
> >>
> >>> To recap some details (so I can have it all in one place):
> >>>  - /home/ is btrfs with compress=lzo
> >>
> >> BTRFS RAID 1 with lzo.
> >>
> >>>  - I have _not_ created any nodatacow files.
> >>
> >> Me neither.
> >>
> >>>  - Full stack is: sata <-> dmcrypt <-> lvm <-> btrfs (I noticed others
> >>> mentioning the use of dmcrypt)
> >>
> >> Same, except no dmcrypt.
> >>
> >
> > Thanks for the help in tracking this down everyone.  We'll get there!
> > Are you all running multi-disk systems (from a btrfs POV, more than one
> > device?)  I don't care how many physical drives this maps to, just does
> > btrfs think there's more than one drive.
>
> No, both of my btrfs filesystems are single disk.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Blocked tasks on 3.15.1
  2014-08-07 15:12                                       ` Tobias Holst
@ 2014-08-07 16:05                                         ` Duncan
  0 siblings, 0 replies; 49+ messages in thread
From: Duncan @ 2014-08-07 16:05 UTC (permalink / raw)
  To: linux-btrfs

Tobias Holst posted on Thu, 07 Aug 2014 17:12:17 +0200 as excerpted:

> Is there anything new on this topic? I am using Ubuntu 14.04.1 and
> experiencing the same problem.
> - 6 HDDs - LUKS on every HDD - btrfs RAID6 over this 6 crypt-devices No
> LVM, no nodatacow files.
> Mount-options: defaults,compress-force=lzo,space_cache With the original
> 3.13-kernel (3.13.0-32-generic) it is working fine.

I see you're using compress-force.  See the recent replies to the
"Btrfs: fix compressed write corruption on enospc" thread.

I'm not /sure/ your case is directly related (tho the kworker code is 
pretty new and 3.13 may be working for you due to being before the 
migration to kworkers, supporting the case of it being either the same 
problem or another related to it), but that's certainly one problem 
they've recently traced down... to a bug in the kworker threads code, 
that starts a new worker that can race with the first instead of obeying 
a flag that says keep it on the first worker.

Looks like they're doing patch that takes a slower but safer path to work 
around the kworker bug for now, as that bug was just traced (there was 
another bug, with a patch available originally hiding the ultimate 
problem, but obviously that's only half the fix as it simply revealed 
another bug underneath) and fixing it properly is likely to take some 
time.  Now that it's basically traced the workaround patch should be 
published on-list shortly and should make it into 3.17 and back into the 
stables, altho I'm not sure it'll make it into 3.16.1, etc.

But there's certainly progress. =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Blocked tasks on 3.15.1
  2014-07-22 14:53                                   ` Chris Mason
                                                       ` (6 preceding siblings ...)
  2014-07-25  2:27                                     ` Cody P Schafer
@ 2014-08-12  2:55                                     ` Charles Cazabon
  2014-08-12  2:56                                       ` Liu Bo
  2014-08-12  4:49                                       ` Marc MERLIN
  7 siblings, 2 replies; 49+ messages in thread
From: Charles Cazabon @ 2014-08-12  2:55 UTC (permalink / raw)
  To: linux-btrfs

The blocked tasks issue that got significantly worse in 3.15 -- did anything
go into 3.16 related to this?  I didn't see a single "btrfs" in Linus' 3.16
announcement, so I don't know whether it should be better, the same, or worse
in this respect...

I haven't seen a definite statement about this on this list, either.

Can someone more familiar with the state of development comment on this?

Charles
-- 
-----------------------------------------------------------------------
Charles Cazabon
GPL'ed software available at:               http://pyropus.ca/software/
-----------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Blocked tasks on 3.15.1
  2014-08-12  2:55                                     ` Charles Cazabon
@ 2014-08-12  2:56                                       ` Liu Bo
  2014-08-12  4:18                                         ` Duncan
  2014-08-12  4:49                                       ` Marc MERLIN
  1 sibling, 1 reply; 49+ messages in thread
From: Liu Bo @ 2014-08-12  2:56 UTC (permalink / raw)
  To: linux-btrfs

On Mon, Aug 11, 2014 at 08:55:21PM -0600, Charles Cazabon wrote:
> The blocked tasks issue that got significantly worse in 3.15 -- did anything
> go into 3.16 related to this?  I didn't see a single "btrfs" in Linus' 3.16
> announcement, so I don't know whether it should be better, the same, or worse
> in this respect...
> 
> I haven't seen a definite statement about this on this list, either.
> 
> Can someone more familiar with the state of development comment on this?

Good news is that we've figured out the bug and the patch is already under
testing :-) 

thanks,
-liubo

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Blocked tasks on 3.15.1
  2014-08-12  2:56                                       ` Liu Bo
@ 2014-08-12  4:18                                         ` Duncan
  0 siblings, 0 replies; 49+ messages in thread
From: Duncan @ 2014-08-12  4:18 UTC (permalink / raw)
  To: linux-btrfs

Liu Bo posted on Tue, 12 Aug 2014 10:56:42 +0800 as excerpted:

> On Mon, Aug 11, 2014 at 08:55:21PM -0600, Charles Cazabon wrote:
>> The blocked tasks issue that got significantly worse in 3.15 -- did
>> anything go into 3.16 related to this?  I didn't see a single "btrfs"
>> in Linus' 3.16 announcement, so I don't know whether it should be
>> better, the same, or worse in this respect...
>> 
>> I haven't seen a definite statement about this on this list, either.
>> 
>> Can someone more familiar with the state of development comment on
>> this?
> 
> Good news is that we've figured out the bug and the patch is already
> under testing :-)

IOW, it's not in 3.16.0, but will hopefully make it into 3.16.2 (it'll 
likely be a too late for 3.16.1).

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Blocked tasks on 3.15.1
  2014-08-12  2:55                                     ` Charles Cazabon
  2014-08-12  2:56                                       ` Liu Bo
@ 2014-08-12  4:49                                       ` Marc MERLIN
  2014-08-18 20:34                                         ` James Cloos
  1 sibling, 1 reply; 49+ messages in thread
From: Marc MERLIN @ 2014-08-12  4:49 UTC (permalink / raw)
  To: linux-btrfs

On Mon, Aug 11, 2014 at 08:55:21PM -0600, Charles Cazabon wrote:
> The blocked tasks issue that got significantly worse in 3.15 -- did anything
> go into 3.16 related to this?  I didn't see a single "btrfs" in Linus' 3.16
> announcement, so I don't know whether it should be better, the same, or worse
> in this respect...
> 
> I haven't seen a definite statement about this on this list, either.

Yes, 3.15 is unusable for some workloads, mine included.
Go back to 3.14 until there is a patch in 3.16, which there isn't quite
as for right now, but very soon hopefully.

Note 3.16.0 is actually worse than 3.15 for me.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Blocked tasks on 3.15.1
  2014-08-12  4:49                                       ` Marc MERLIN
@ 2014-08-18 20:34                                         ` James Cloos
  0 siblings, 0 replies; 49+ messages in thread
From: James Cloos @ 2014-08-18 20:34 UTC (permalink / raw)
  To: linux-btrfs

>>>>> "MM" == Marc MERLIN <marc@merlins.org> writes:

MM> Note 3.16.0 is actually worse than 3.15 for me.

Here (a single partition btrfs), 3.16.0 works fine, but 3.17-rc1 fails again.

My /var/log is also a compressed, single-partition btrfs; that doesn't
show the problem with any version.  Just the partition with git, svn and
rsync trees.

Last night's test of 3.17-rc1 showed the problem with the first git
pull, getting stuck reading FETCH_HEAD.  All repos on that fs failed
the same way.

But rebooting back to 3.16.0 let everything work perfectly.

-JimC
-- 
James Cloos <cloos@jhcloos.com>         OpenPGP: 0x997A9F17ED7DAEA6

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: Blocked tasks on 3.15.1
@ 2014-07-20 21:34 Matt
  0 siblings, 0 replies; 49+ messages in thread
From: Matt @ 2014-07-20 21:34 UTC (permalink / raw)
  To: clm; +Cc: linux-btrfs

>[ deadlocks during rsync in 3.15 with compression enabled ]

>Hi everyone,

>I still haven't been able to reproduce this one here, but I'm going
>through a series of tests with lzo compression foraced and every
>operation forced to ordered.  Hopefully it'll kick it out soon.
>
>While I'm hammering away, could you please try this patch.  If this is
>the buy you're hitting, the deadlock will go away and you'll see this
>printk in the log.
>
>thanks!
>
>-chris

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 3668048..8ab56df 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8157,6 +8157,13 @@ void btrfs_destroy_inode(struct inode *inode)
  spin_unlock(&root->fs_info->ordered_root_lock);
  }

+ spin_lock(&root->fs_info->ordered_root_lock);
+ if (!list_empty(&BTRFS_I(inode)->ordered_operations)) {
+ list_del_init(&BTRFS_I(inode)->ordered_operations);
+printk(KERN_CRIT "racing inode deletion with ordered operations!!!!!!!!!!!\n");
+ }
+ spin_unlock(&root->fs_info->ordered_root_lock);
+
  if (test_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
      &BTRFS_I(inode)->runtime_flags)) {
  btrfs_info(root->fs_info, "inode %llu still on the orphan list",
--



Hi Chris,

just had that hang during rsync from /home (ZFS, mirrored) to /bak
(Btrfs w. lzo compression) again with that patch applied, it doesn't
seem to be related to that issue (or patch) - only applicable to my
case, obviously - since search for that string (e.g. "racing") doesn't
show anything in that message:

[16028.169347] INFO: task kworker/u16:2:11956 blocked for more than 180 seconds.
[16028.169349] Tainted: P O 3.14.13_btrfs+_BFS_test27_integration #2
[16028.169350] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[16028.169351] kworker/u16:2 D ffff88081ec13540 0 11956 2 0x00000008
[16028.169356] Workqueue: btrfs-delalloc normal_work_helper
[16028.169358] ffff8806180ab8e0 0000000000000046 0000000000000000
0000000000000004
[16028.169359] 000000000000a000 ffff8806210f16b0 ffff8806180abfd8
ffffffff81e11500
[16028.169360] ffff8806210f16b0 0000000000000206 ffffffff8113e6cc
ffff88081ec135c0
[16028.169362] Call Trace:
[16028.169367] [<ffffffff8113e6cc>] ? delayacct_end+0x7c/0x90
[16028.169370] [<ffffffff811689d0>] ? wait_on_page_read+0x60/0x60
[16028.169374] [<ffffffff819cfc78>] ? io_schedule+0x88/0xe0
[16028.169375] [<ffffffff811689d5>] ? sleep_on_page+0x5/0x10
[16028.169377] [<ffffffff819cfffc>] ? __wait_on_bit_lock+0x3c/0x90
[16028.169378] [<ffffffff81168ac5>] ? __lock_page+0x65/0x70
[16028.169382] [<ffffffff810f5580>] ? autoremove_wake_function+0x30/0x30
[16028.169384] [<ffffffff81169854>] ? __find_lock_page+0x44/0x70
[16028.169385] [<ffffffff811698ca>] ? find_or_create_page+0x2a/0xa0
[16028.169388] [<ffffffff8145a1cf>] ? io_ctl_prepare_pages+0x4f/0x150
[16028.169390] [<ffffffff8145bd45>] ? __load_free_space_cache+0x195/0x5d0
[16028.169392] [<ffffffff8145c26b>] ? load_free_space_cache+0xeb/0x1b0
[16028.169395] [<ffffffff813fd6a1>] ? cache_block_group+0x191/0x390
[16028.169396] [<ffffffff810f5550>] ? prepare_to_wait_event+0xf0/0xf0
[16028.169398] [<ffffffff814085ea>] ? find_free_extent+0x95a/0xdb0
[16028.169400] [<ffffffff81408bf9>] ? btrfs_reserve_extent+0x69/0x150
[16028.169403] [<ffffffff81421116>] ? cow_file_range+0x136/0x420
[16028.169404] [<ffffffff81422493>] ? submit_compressed_extents+0x1f3/0x480
[16028.169406] [<ffffffff81422720>] ? submit_compressed_extents+0x480/0x480
[16028.169407] [<ffffffff8144896b>] ? normal_work_helper+0x1ab/0x330
[16028.169410] [<ffffffff810df26d>] ? process_one_work+0x16d/0x490
[16028.169411] [<ffffffff810dff8b>] ? worker_thread+0x12b/0x410
[16028.169412] [<ffffffff810dfe60>] ? manage_workers.isra.28+0x2c0/0x2c0
[16028.169414] [<ffffffff810e579a>] ? kthread+0xca/0xe0
[16028.169415] [<ffffffff810e56d0>] ? kthread_create_on_node+0x180/0x180
[16028.169417] [<ffffffff819d3c7c>] ? ret_from_fork+0x7c/0xb0
[16028.169418] [<ffffffff810e56d0>] ? kthread_create_on_node+0x180/0x180
[16028.169422] INFO: task btrfs-transacti:12042 blocked for more than
180 seconds.
[16028.169422] Tainted: P O 3.14.13_btrfs+_BFS_test27_integration #2
[16028.169423] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[16028.169423] btrfs-transacti D ffff88081ec13540 0 12042 2 0x00000008
[16028.169425] ffff88009c7adb20 0000000000000046 0000000000000000
ffff88040d84ca68
[16028.169426] 000000000000a000 ffff88061f284ba0 ffff88009c7adfd8
ffffffff81e11500
[16028.169427] ffff88061f284ba0 ffff88061a21dea8 ffffffff811b8c2d
ffff8805fc919e00
[16028.169428] Call Trace:
[16028.169431] [<ffffffff811b8c2d>] ? kmem_cache_alloc_trace+0x14d/0x160
[16028.169433] [<ffffffff813fd632>] ? cache_block_group+0x122/0x390
[16028.169434] [<ffffffff810f5550>] ? prepare_to_wait_event+0xf0/0xf0
[16028.169436] [<ffffffff814085ea>] ? find_free_extent+0x95a/0xdb0
[16028.169437] [<ffffffff81408bf9>] ? btrfs_reserve_extent+0x69/0x150
[16028.169439] [<ffffffff81422fa8>] ? __btrfs_prealloc_file_range+0xe8/0x380
[16028.169441] [<ffffffff8140b6f2>] ? btrfs_write_dirty_block_groups+0x642/0x6d0
[16028.169442] [<ffffffff819cb00c>] ? commit_cowonly_roots+0x173/0x221
[16028.169443] [<ffffffff8141ad19>] ? btrfs_commit_transaction+0x509/0xa30
[16028.169445] [<ffffffff8141b2cb>] ? start_transaction+0x8b/0x5b0
[16028.169446] [<ffffffff81416d65>] ? transaction_kthread+0x1d5/0x240
[16028.169447] [<ffffffff81416b90>] ? btrfs_cleanup_transaction+0x560/0x560
[16028.169448] [<ffffffff810e579a>] ? kthread+0xca/0xe0
[16028.169449] [<ffffffff810e56d0>] ? kthread_create_on_node+0x180/0x180
[16028.169451] [<ffffffff819d3c7c>] ? ret_from_fork+0x7c/0xb0
[16028.169452] [<ffffffff810e56d0>] ? kthread_create_on_node+0x180/0x180
[16028.169454] INFO: task kworker/u16:10:12208 blocked for more than
180 seconds.
[16028.169455] Tainted: P O 3.14.13_btrfs+_BFS_test27_integration #2
[16028.169455] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[16028.169456] kworker/u16:10 D ffff88081ed13540 0 12208 2 0x00000000
[16028.169460] Workqueue: writeback bdi_writeback_workfn (flush-btrfs-183)
[16028.169461] ffff880567b31a60 0000000000000046 ffff880500000000
000000000000000e
[16028.169462] 000000000000a000 ffff88061efff170 ffff880567b31fd8
ffff880799bb0000
[16028.169463] ffff88061efff170 0000000000000206 ffffffff8113e6cc
ffff88081ed135c0
[16028.169464] Call Trace:
[16028.169466] [<ffffffff8113e6cc>] ? delayacct_end+0x7c/0x90
[16028.169467] [<ffffffff811689d0>] ? wait_on_page_read+0x60/0x60
[16028.169468] [<ffffffff819cfc78>] ? io_schedule+0x88/0xe0
[16028.169470] [<ffffffff811689d5>] ? sleep_on_page+0x5/0x10
[16028.169471] [<ffffffff819cfffc>] ? __wait_on_bit_lock+0x3c/0x90
[16028.169472] [<ffffffff81168ac5>] ? __lock_page+0x65/0x70
[16028.169474] [<ffffffff810f5580>] ? autoremove_wake_function+0x30/0x30
[16028.169476] [<ffffffff81175958>] ? pagevec_lookup_tag+0x18/0x20
[16028.169480] [<ffffffff81439468>] ?
extent_write_cache_pages.isra.31.constprop.54+0x268/0x360
[16028.169481] [<ffffffff8143af78>] ? extent_writepages+0x48/0x60
[16028.169483] [<ffffffff8141f190>] ? btrfs_writepage_end_io_hook+0x170/0x170
[16028.169485] [<ffffffff811ee518>] ? __writeback_single_inode+0x38/0x2c0
[16028.169486] [<ffffffff811f20f9>] ? writeback_sb_inodes+0x259/0x430
[16028.169488] [<ffffffff811f235e>] ? __writeback_inodes_wb+0x8e/0xc0
[16028.169489] [<ffffffff811f259b>] ? wb_writeback+0x20b/0x340
[16028.169490] [<ffffffff811f2ae7>] ? bdi_writeback_workfn+0x2f7/0x470
[16028.169492] [<ffffffff810df26d>] ? process_one_work+0x16d/0x490
[16028.169493] [<ffffffff810dff8b>] ? worker_thread+0x12b/0x410
[16028.169495] [<ffffffff810dfe60>] ? manage_workers.isra.28+0x2c0/0x2c0
[16028.169496] [<ffffffff810e579a>] ? kthread+0xca/0xe0
[16028.169497] [<ffffffff810e56d0>] ? kthread_create_on_node+0x180/0x180
[16028.169499] [<ffffffff819d3c7c>] ? ret_from_fork+0x7c/0xb0
[16028.169500] [<ffffffff810e56d0>] ? kthread_create_on_node+0x180/0x180
[16028.169501] INFO: task rsync:12233 blocked for more than 180 seconds.
[16028.169502] Tainted: P O 3.14.13_btrfs+_BFS_test27_integration #2
[16028.169502] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[16028.169503] rsync D ffff88081ec53540 0 12233 12232 0x00000008
[16028.169504] ffff8804030a3d28 0000000000000082 ffff8804030a3e30
ffffffff811d2f01
[16028.169505] 000000000000a000 ffff880799bb69e0 ffff8804030a3fd8
ffff8807fa0fe250
[16028.169507] ffff880799bb69e0 ffff880799bb69e0 ffff8804030a3da8
ffff8807f9c65600
[16028.169508] Call Trace:
[16028.169509] [<ffffffff811d2f01>] ? link_path_walk+0x81/0xe70
[16028.169511] [<ffffffff81419f27>] ? wait_current_trans.isra.20+0x97/0x100
[16028.169513] [<ffffffff810f5550>] ? prepare_to_wait_event+0xf0/0xf0
[16028.169514] [<ffffffff8141b510>] ? start_transaction+0x2d0/0x5b0
[16028.169515] [<ffffffff814208ab>] ? btrfs_dirty_inode+0x3b/0xd0
[16028.169517] [<ffffffff811e3411>] ? setattr_copy+0x91/0x120
[16028.169518] [<ffffffff814254fc>] ? btrfs_setattr+0xac/0x300
[16028.169519] [<ffffffff811e3691>] ? notify_change+0x1f1/0x370
[16028.169521] [<ffffffff811f6b1a>] ? utimes_common+0xca/0x1b0
[16028.169523] [<ffffffff811f6cc8>] ? do_utimes+0xc8/0x140
[16028.169525] [<ffffffff811f6e12>] ? SyS_utimensat+0x52/0x80
[16028.169527] [<ffffffff819d3d26>] ? system_call_fastpath+0x1a/0x1f
[16208.244385] INFO: task kworker/u16:2:11956 blocked for more than 180 seconds.
[16208.244388] Tainted: P O 3.14.13_btrfs+_BFS_test27_integration #2
[16208.244388] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[16208.244389] kworker/u16:2 D ffff88081ec13540 0 11956 2 0x00000008
[16208.244395] Workqueue: btrfs-delalloc normal_work_helper
[16208.244396] ffff8806180ab8e0 0000000000000046 0000000000000000
0000000000000004
[16208.244397] 000000000000a000 ffff8806210f16b0 ffff8806180abfd8
ffffffff81e11500
[16208.244399] ffff8806210f16b0 0000000000000206 ffffffff8113e6cc
ffff88081ec135c0
[16208.244400] Call Trace:
[16208.244405] [<ffffffff8113e6cc>] ? delayacct_end+0x7c/0x90
[16208.244408] [<ffffffff811689d0>] ? wait_on_page_read+0x60/0x60
[16208.244412] [<ffffffff819cfc78>] ? io_schedule+0x88/0xe0
[16208.244413] [<ffffffff811689d5>] ? sleep_on_page+0x5/0x10
[16208.244415] [<ffffffff819cfffc>] ? __wait_on_bit_lock+0x3c/0x90
[16208.244416] [<ffffffff81168ac5>] ? __lock_page+0x65/0x70
[16208.244420] [<ffffffff810f5580>] ? autoremove_wake_function+0x30/0x30
[16208.244421] [<ffffffff81169854>] ? __find_lock_page+0x44/0x70
[16208.244423] [<ffffffff811698ca>] ? find_or_create_page+0x2a/0xa0
[16208.244426] [<ffffffff8145a1cf>] ? io_ctl_prepare_pages+0x4f/0x150
[16208.244428] [<ffffffff8145bd45>] ? __load_free_space_cache+0x195/0x5d0
[16208.244430] [<ffffffff8145c26b>] ? load_free_space_cache+0xeb/0x1b0
[16208.244433] [<ffffffff813fd6a1>] ? cache_block_group+0x191/0x390
[16208.244434] [<ffffffff810f5550>] ? prepare_to_wait_event+0xf0/0xf0
[16208.244436] [<ffffffff814085ea>] ? find_free_extent+0x95a/0xdb0
[16208.244438] [<ffffffff81408bf9>] ? btrfs_reserve_extent+0x69/0x150
[16208.244440] [<ffffffff81421116>] ? cow_file_range+0x136/0x420
[16208.244442] [<ffffffff81422493>] ? submit_compressed_extents+0x1f3/0x480
[16208.244443] [<ffffffff81422720>] ? submit_compressed_extents+0x480/0x480
[16208.244445] [<ffffffff8144896b>] ? normal_work_helper+0x1ab/0x330
[16208.244447] [<ffffffff810df26d>] ? process_one_work+0x16d/0x490
[16208.244449] [<ffffffff810dff8b>] ? worker_thread+0x12b/0x410
[16208.244450] [<ffffffff810dfe60>] ? manage_workers.isra.28+0x2c0/0x2c0
[16208.244452] [<ffffffff810e579a>] ? kthread+0xca/0xe0
[16208.244453] [<ffffffff810e56d0>] ? kthread_create_on_node+0x180/0x180
[16208.244455] [<ffffffff819d3c7c>] ? ret_from_fork+0x7c/0xb0
[16208.244456] [<ffffffff810e56d0>] ? kthread_create_on_node+0x180/0x180
[16208.244459] INFO: task btrfs-transacti:12042 blocked for more than
180 seconds.
[16208.244460] Tainted: P O 3.14.13_btrfs+_BFS_test27_integration #2
[16208.244460] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[16208.244461] btrfs-transacti D ffff88081ec13540 0 12042 2 0x00000008
[16208.244462] ffff88009c7adb20 0000000000000046 0000000000000000
ffff88040d84ca68
[16208.244464] 000000000000a000 ffff88061f284ba0 ffff88009c7adfd8
ffffffff81e11500
[16208.244465] ffff88061f284ba0 ffff88061a21dea8 ffffffff811b8c2d
ffff8805fc919e00
[16208.244466] Call Trace:
[16208.244468] [<ffffffff811b8c2d>] ? kmem_cache_alloc_trace+0x14d/0x160
[16208.244470] [<ffffffff813fd632>] ? cache_block_group+0x122/0x390
[16208.244471] [<ffffffff810f5550>] ? prepare_to_wait_event+0xf0/0xf0
[16208.244473] [<ffffffff814085ea>] ? find_free_extent+0x95a/0xdb0
[16208.244474] [<ffffffff81408bf9>] ? btrfs_reserve_extent+0x69/0x150
[16208.244476] [<ffffffff81422fa8>] ? __btrfs_prealloc_file_range+0xe8/0x380
[16208.244478] [<ffffffff8140b6f2>] ? btrfs_write_dirty_block_groups+0x642/0x6d0
[16208.244479] [<ffffffff819cb00c>] ? commit_cowonly_roots+0x173/0x221
[16208.244481] [<ffffffff8141ad19>] ? btrfs_commit_transaction+0x509/0xa30
[16208.244482] [<ffffffff8141b2cb>] ? start_transaction+0x8b/0x5b0
[16208.244483] [<ffffffff81416d65>] ? transaction_kthread+0x1d5/0x240
[16208.244484] [<ffffffff81416b90>] ? btrfs_cleanup_transaction+0x560/0x560
[16208.244485] [<ffffffff810e579a>] ? kthread+0xca/0xe0
[16208.244487] [<ffffffff810e56d0>] ? kthread_create_on_node+0x180/0x180
[16208.244488] [<ffffffff819d3c7c>] ? ret_from_fork+0x7c/0xb0
[16208.244489] [<ffffffff810e56d0>] ? kthread_create_on_node+0x180/0x180
[16208.244491] INFO: task kworker/u16:10:12208 blocked for more than
180 seconds.
[16208.244491] Tainted: P O 3.14.13_btrfs+_BFS_test27_integration #2
[16208.244492] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[16208.244492] kworker/u16:10 D ffff88081ed13540 0 12208 2 0x00000000
[16208.244496] Workqueue: writeback bdi_writeback_workfn (flush-btrfs-183)
[16208.244497] ffff880567b31a60 0000000000000046 ffff880500000000
000000000000000e
[16208.244499] 000000000000a000 ffff88061efff170 ffff880567b31fd8
ffff880799bb0000
[16208.244500] ffff88061efff170 0000000000000206 ffffffff8113e6cc
ffff88081ed135c0
[16208.244501] Call Trace:
[16208.244502] [<ffffffff8113e6cc>] ? delayacct_end+0x7c/0x90
[16208.244504] [<ffffffff811689d0>] ? wait_on_page_read+0x60/0x60
[16208.244505] [<ffffffff819cfc78>] ? io_schedule+0x88/0xe0
[16208.244507] [<ffffffff811689d5>] ? sleep_on_page+0x5/0x10
[16208.244508] [<ffffffff819cfffc>] ? __wait_on_bit_lock+0x3c/0x90
[16208.244509] [<ffffffff81168ac5>] ? __lock_page+0x65/0x70
[16208.244511] [<ffffffff810f5580>] ? autoremove_wake_function+0x30/0x30
[16208.244513] [<ffffffff81175958>] ? pagevec_lookup_tag+0x18/0x20
[16208.244516] [<ffffffff81439468>] ?
extent_write_cache_pages.isra.31.constprop.54+0x268/0x360
[16208.244518] [<ffffffff8143af78>] ? extent_writepages+0x48/0x60
[16208.244520] [<ffffffff8141f190>] ? btrfs_writepage_end_io_hook+0x170/0x170
[16208.244522] [<ffffffff811ee518>] ? __writeback_single_inode+0x38/0x2c0
[16208.244523] [<ffffffff811f20f9>] ? writeback_sb_inodes+0x259/0x430
[16208.244524] [<ffffffff811f235e>] ? __writeback_inodes_wb+0x8e/0xc0
[16208.244526] [<ffffffff811f259b>] ? wb_writeback+0x20b/0x340
[16208.244527] [<ffffffff811f2ae7>] ? bdi_writeback_workfn+0x2f7/0x470
[16208.244529] [<ffffffff810df26d>] ? process_one_work+0x16d/0x490
[16208.244530] [<ffffffff810dff8b>] ? worker_thread+0x12b/0x410
[16208.244531] [<ffffffff810dfe60>] ? manage_workers.isra.28+0x2c0/0x2c0
[16208.244533] [<ffffffff810e579a>] ? kthread+0xca/0xe0
[16208.244534] [<ffffffff810e56d0>] ? kthread_create_on_node+0x180/0x180
[16208.244535] [<ffffffff819d3c7c>] ? ret_from_fork+0x7c/0xb0
[16208.244536] [<ffffffff810e56d0>] ? kthread_create_on_node+0x180/0x180
[16208.244538] INFO: task rsync:12233 blocked for more than 180 seconds.
[16208.244538] Tainted: P O 3.14.13_btrfs+_BFS_test27_integration #2
[16208.244539] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[16208.244539] rsync D ffff88081ec53540 0 12233 12232 0x00000008
[16208.244541] ffff8804030a3d28 0000000000000082 ffff8804030a3e30
ffffffff811d2f01
[16208.244542] 000000000000a000 ffff880799bb69e0 ffff8804030a3fd8
ffff8807fa0fe250
[16208.244543] ffff880799bb69e0 ffff880799bb69e0 ffff8804030a3da8
ffff8807f9c65600
[16208.244544] Call Trace:
[16208.244546] [<ffffffff811d2f01>] ? link_path_walk+0x81/0xe70
[16208.244547] [<ffffffff81419f27>] ? wait_current_trans.isra.20+0x97/0x100
[16208.244549] [<ffffffff810f5550>] ? prepare_to_wait_event+0xf0/0xf0
[16208.244550] [<ffffffff8141b510>] ? start_transaction+0x2d0/0x5b0
[16208.244552] [<ffffffff814208ab>] ? btrfs_dirty_inode+0x3b/0xd0
[16208.244553] [<ffffffff811e3411>] ? setattr_copy+0x91/0x120
[16208.244555] [<ffffffff814254fc>] ? btrfs_setattr+0xac/0x300
[16208.244556] [<ffffffff811e3691>] ? notify_change+0x1f1/0x370
[16208.244558] [<ffffffff811f6b1a>] ? utimes_common+0xca/0x1b0
[16208.244559] [<ffffffff811f6cc8>] ? do_utimes+0xc8/0x140
[16208.244561] [<ffffffff811f6e12>] ? SyS_utimensat+0x52/0x80
[16208.244563] [<ffffffff819d3d26>] ? system_call_fastpath+0x1a/0x1f
[16388.319440] INFO: task kworker/u16:2:11956 blocked for more than 180 seconds.
[16388.319442] Tainted: P O 3.14.13_btrfs+_BFS_test27_integration #2
[16388.319443] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[16388.319444] kworker/u16:2 D ffff88081ec13540 0 11956 2 0x00000008
[16388.319450] Workqueue: btrfs-delalloc normal_work_helper
[16388.319451] ffff8806180ab8e0 0000000000000046 0000000000000000
0000000000000004
[16388.319452] 000000000000a000 ffff8806210f16b0 ffff8806180abfd8
ffffffff81e11500
[16388.319453] ffff8806210f16b0 0000000000000206 ffffffff8113e6cc
ffff88081ec135c0
[16388.319455] Call Trace:
[16388.319460] [<ffffffff8113e6cc>] ? delayacct_end+0x7c/0x90
[16388.319463] [<ffffffff811689d0>] ? wait_on_page_read+0x60/0x60
[16388.319467] [<ffffffff819cfc78>] ? io_schedule+0x88/0xe0
[16388.319468] [<ffffffff811689d5>] ? sleep_on_page+0x5/0x10
[16388.319469] [<ffffffff819cfffc>] ? __wait_on_bit_lock+0x3c/0x90
[16388.319471] [<ffffffff81168ac5>] ? __lock_page+0x65/0x70
[16388.319475] [<ffffffff810f5580>] ? autoremove_wake_function+0x30/0x30
[16388.319477] [<ffffffff81169854>] ? __find_lock_page+0x44/0x70
[16388.319478] [<ffffffff811698ca>] ? find_or_create_page+0x2a/0xa0
[16388.319481] [<ffffffff8145a1cf>] ? io_ctl_prepare_pages+0x4f/0x150
[16388.319483] [<ffffffff8145bd45>] ? __load_free_space_cache+0x195/0x5d0
[16388.319485] [<ffffffff8145c26b>] ? load_free_space_cache+0xeb/0x1b0
[16388.319488] [<ffffffff813fd6a1>] ? cache_block_group+0x191/0x390
[16388.319489] [<ffffffff810f5550>] ? prepare_to_wait_event+0xf0/0xf0
[16388.319491] [<ffffffff814085ea>] ? find_free_extent+0x95a/0xdb0
[16388.319493] [<ffffffff81408bf9>] ? btrfs_reserve_extent+0x69/0x150
[16388.319496] [<ffffffff81421116>] ? cow_file_range+0x136/0x420
[16388.319497] [<ffffffff81422493>] ? submit_compressed_extents+0x1f3/0x480
[16388.319499] [<ffffffff81422720>] ? submit_compressed_extents+0x480/0x480
[16388.319500] [<ffffffff8144896b>] ? normal_work_helper+0x1ab/0x330
[16388.319503] [<ffffffff810df26d>] ? process_one_work+0x16d/0x490
[16388.319504] [<ffffffff810dff8b>] ? worker_thread+0x12b/0x410
[16388.319505] [<ffffffff810dfe60>] ? manage_workers.isra.28+0x2c0/0x2c0
[16388.319507] [<ffffffff810e579a>] ? kthread+0xca/0xe0
[16388.319508] [<ffffffff810e56d0>] ? kthread_create_on_node+0x180/0x180
[16388.319510] [<ffffffff819d3c7c>] ? ret_from_fork+0x7c/0xb0
[16388.319511] [<ffffffff810e56d0>] ? kthread_create_on_node+0x180/0x180
[16388.319514] INFO: task btrfs-transacti:12042 blocked for more than
180 seconds.
[16388.319515] Tainted: P O 3.14.13_btrfs+_BFS_test27_integration #2
[16388.319515] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[16388.319516] btrfs-transacti D ffff88081ec13540 0 12042 2 0x00000008
[16388.319517] ffff88009c7adb20 0000000000000046 0000000000000000
ffff88040d84ca68
[16388.319519] 000000000000a000 ffff88061f284ba0 ffff88009c7adfd8
ffffffff81e11500
[16388.319520] ffff88061f284ba0 ffff88061a21dea8 ffffffff811b8c2d
ffff8805fc919e00
[16388.319521] Call Trace:
[16388.319524] [<ffffffff811b8c2d>] ? kmem_cache_alloc_trace+0x14d/0x160
[16388.319526] [<ffffffff813fd632>] ? cache_block_group+0x122/0x390
[16388.319527] [<ffffffff810f5550>] ? prepare_to_wait_event+0xf0/0xf0
[16388.319529] [<ffffffff814085ea>] ? find_free_extent+0x95a/0xdb0
[16388.319530] [<ffffffff81408bf9>] ? btrfs_reserve_extent+0x69/0x150
[16388.319532] [<ffffffff81422fa8>] ? __btrfs_prealloc_file_range+0xe8/0x380
[16388.319534] [<ffffffff8140b6f2>] ? btrfs_write_dirty_block_groups+0x642/0x6d0
[16388.319535] [<ffffffff819cb00c>] ? commit_cowonly_roots+0x173/0x221
[16388.319537] [<ffffffff8141ad19>] ? btrfs_commit_transaction+0x509/0xa30
[16388.319538] [<ffffffff8141b2cb>] ? start_transaction+0x8b/0x5b0
[16388.319539] [<ffffffff81416d65>] ? transaction_kthread+0x1d5/0x240
[16388.319540] [<ffffffff81416b90>] ? btrfs_cleanup_transaction+0x560/0x560
[16388.319541] [<ffffffff810e579a>] ? kthread+0xca/0xe0
[16388.319543] [<ffffffff810e56d0>] ? kthread_create_on_node+0x180/0x180
[16388.319544] [<ffffffff819d3c7c>] ? ret_from_fork+0x7c/0xb0
[16388.319545] [<ffffffff810e56d0>] ? kthread_create_on_node+0x180/0x180


but the previous error message I saw seemed related to

http://www.spinics.net/lists/linux-btrfs/msg35145.html

and

http://www.spinics.net/lists/linux-btrfs/msg33628.html


Be aware that this kernel is a highly patched up 3.14.13 with latest
Btrfs integration/for-linus branch - up to

commit abdd2e80a57e5f7278f47913315065f0a3d78d20
Author: Filipe Manana <fdmanana@gmail.com>
Date:   Tue Jun 24 17:46:58 2014 +0100

    Btrfs: fix crash when starting transaction

except

Btrfs: fix broken free space cache after the system crashed

(commit e570fd27f2c5d7eac3876bccf99e9838d7f911a3)

which doesn't seem to apply cleanly for me.

So it's not really representative when looking for other kernel
internals but should show almost 100% similar behavior like a 3.15+
kernel with latest integration/for-linus branch.

Currently I have no reason and plans to migrate to 3.15 since I'm
planning to wait for it to mature a little bit more.


Root is on Btrfs with lzo compression on an Intel SSD.

Last time this happened I had the partition formatted with zlib/gzip
compression. This time it's with lzo and also happening.

The problem is that rsync can't be killed off - so the load will
increase over time, only option being to reboot via Magic SYSRQ Key:

ps aux | grep rsync
root     12233  0.1  0.0  33880  4776 pts/0    D+   22:20   0:03 rsync
-aiP --delete --inplace --stats /home/matt/news/ /bak/matt/news/
root     12234  0.0  0.0      0     0 pts/0    Z+   22:20   0:00
[rsync] <defunct>
root     12579  0.0  0.0  30380  1376 pts/0    S+   23:20   0:00 rsync
-ai --delete --inplace --stats /home/matt/.links/ /bak/matt/.links/
root     12580  0.0  0.0  30352   940 pts/0    D+   23:20   0:00 rsync
-ai --delete --inplace --stats /home/matt/.links/ /bak/matt/.links/
root     12581  0.0  0.0  30352   280 pts/0    S+   23:20   0:00 rsync
-ai --delete --inplace --stats /home/matt/.links/ /bak/matt/.links/
root     12583  0.0  0.0  18916  1000 pts/1    S+   23:21   0:00 grep
--color=auto rsync

/bak is a newly created partition which a few days ago just got
finished getting written to (around 1.5 TB of data).


Any ideas or other patches I could try ?

If I understood correctly

"Btrfs: fix abnormal long waiting in fsync" doesn't apply to the 3.14
kernel base since it's rather new (June 5th, according to
http://code.metager.de/source/history/linux/stable/mm/ )

and "btrfs: test for valid bdev before kobj removal in
btrfs_rm_device" is not related

Keep up the great work !

Btrfs is significantly more resilient than in the past (surviving
hardlocks, etc.) - but this high load related rsync blocked task
behavior (also had this in the past several kernel versions back)
creates headaches and still prevents it from being used on a regular,
efficient, basis. I'd *really* like to have an alternative filesystem
with checksums besides ZFS


Kind Regards

Matt

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Blocked tasks on 3.15.1
@ 2014-06-27  1:37 Rich Freeman
  0 siblings, 0 replies; 49+ messages in thread
From: Rich Freeman @ 2014-06-27  1:37 UTC (permalink / raw)
  To: linux-btrfs

I've been getting blocked tasks on 3.15.1 generally at times when the
filesystem is somewhat busy (such as doing a backup via scp/clonezilla
writing to the disk).

A week ago I had enabled snapper for a day which resulted in a daily
cleanup of about 8 snapshots at once, which might have contributed,
but I've been limping along since.

Here is a pastebin of my dmesg from the hung tasks and a subsequent Alt-SysRq-W:

http://pastebin.com/yYdcxFTE

When this happens the system remains somewhat stable, but no writes to
the disk succeed, and I start getting load averages in the dozens as
tasks start blocking.

On reboot the system generally works fine, though it can hang a day or
two later.

I'm happy to try patches, or try to capture any other output that is
helpful the next time this happens - the system is fairly stable as
long as I capture things someplace other than my btrfs file systems.
I didn't see anything quite like this on the list.  I updated my
kernel around the time this behavior started, and was on 3.15.0
previously (though I haven't tried reverting yet).

Rich

^ permalink raw reply	[flat|nested] 49+ messages in thread

end of thread, other threads:[~2014-08-18 20:45 UTC | newest]

Thread overview: 49+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-27 10:02 Blocked tasks on 3.15.1 Tomasz Chmielewski
2014-06-27 13:06 ` Duncan
2014-06-27 15:14   ` Rich Freeman
2014-06-27 15:52     ` Chris Murphy
2014-06-27 17:20       ` Duncan
2014-06-28  0:22         ` Chris Samuel
2014-06-29 20:02           ` Cody P Schafer
2014-06-29 22:22             ` Cody P Schafer
2014-06-30 18:11             ` Chris Mason
2014-06-30 18:30               ` Chris Mason
2014-06-30 23:42                 ` Cody P Schafer
2014-07-01 21:04                   ` Chris Mason
2014-07-01 23:05                     ` Cody P Schafer
2014-07-02 12:27                       ` Cody P Schafer
2014-07-02 13:58                         ` Chris Mason
2014-07-02 14:15                           ` Chris Mason
2014-07-17 13:18                             ` Chris Mason
2014-07-19  0:33                               ` Blocked tasks on 3.15.1, raid1 btrfs is no ends of trouble for me Marc MERLIN
2014-07-19  0:44                                 ` Marc MERLIN
2014-07-19  1:58                                   ` Marc MERLIN
2014-07-19  1:59                                   ` Chris Samuel
2014-07-19  5:40                                     ` Marc MERLIN
2014-07-19 17:38                               ` Blocked tasks on 3.15.1 Cody P Schafer
2014-07-19 18:23                                 ` Martin Steigerwald
2014-07-22 14:53                                   ` Chris Mason
2014-07-22 15:14                                     ` Torbjørn
2014-07-22 16:46                                     ` Marc MERLIN
2014-07-22 19:42                                     ` Torbjørn
2014-07-22 19:50                                       ` Chris Mason
2014-07-22 20:10                                         ` Torbjørn
2014-07-22 21:13                                     ` Martin Steigerwald
2014-07-22 21:15                                       ` Chris Mason
2014-07-23 11:13                                         ` Martin Steigerwald
2014-07-23  1:06                                     ` Rich Freeman
2014-07-23  6:38                                       ` Felix Seidel
2014-07-23 13:20                                     ` Charles Cazabon
2014-07-25  2:27                                     ` Cody P Schafer
2014-08-07 15:12                                       ` Tobias Holst
2014-08-07 16:05                                         ` Duncan
2014-08-12  2:55                                     ` Charles Cazabon
2014-08-12  2:56                                       ` Liu Bo
2014-08-12  4:18                                         ` Duncan
2014-08-12  4:49                                       ` Marc MERLIN
2014-08-18 20:34                                         ` James Cloos
2014-07-01  3:06               ` Charles Cazabon
2014-06-30  2:33           ` Rich Freeman
2014-06-27 18:33       ` Rich Freeman
  -- strict thread matches above, loose matches on Subject: below --
2014-07-20 21:34 Matt
2014-06-27  1:37 Rich Freeman

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.