All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: Long btrfs hangs during suspend to RAM / BTRFS warning (device dm-0): Aborting unused transaction
       [not found] <20120626193637.GA27856@merlins.org>
@ 2012-06-27  1:38 ` Marc MERLIN
  2012-06-27  5:20   ` Marc MERLIN
  0 siblings, 1 reply; 10+ messages in thread
From: Marc MERLIN @ 2012-06-27  1:38 UTC (permalink / raw)
  To: linux-btrfs

On Tue, Jun 26, 2012 at 12:36:37PM -0700, Marc MERLIN wrote:
> I was fine with btrfs until 3.2.16, but when going to 3.4 or 3.4.4, I'm
> having my system randomly not wanting to suspend to RAM.
> 
> My suspend light started flashing but the system wasn't suspending.
> I was able to unlock X, run top, saw that btrfs-delalloc-??? was taking
> 98% CPU, and any command that touched disk hung 
> (so I could log in, but would not get a shell).
> 
> I did the sysrq commands, which took a long time to work (they stacked
> up with nothing happening and maybe 10mn later, the system unhung itself
> and they ran, maybe too late, not sure).
> 
> Interestingly the system did go to sleep and resumed ok after that, it
> just took 20mn to get there.
> 
> It's all pasted below.
> 
> gandalfthegreat:~# btrfs fi show
> Label: 'btrfs_pool1'  uuid: 873d526c-e911-4234-af1b-239889cd143d
> 	Total devices 1 FS bytes used 214.44GB
> 	devid    1 size 231.02GB used 231.02GB path /dev/dm-0
> 
> Btrfs Btrfs v0.19
> gandalfthegreat:~# 

Now, I'm also seeing these below and I have this again (86% CPU):
 6076 root      20   0     0    0    0 R   86  0.0  29:40.11 btrfs-delalloc-    

How bad is it, doctor?  I think I'll be going back to 3.2.16 for now though.

[100415.369301] BTRFS warning (device dm-0): Aborting unused transaction.
[100415.739337] BTRFS warning (device dm-0): Aborting unused transaction.
[100416.093839] BTRFS warning (device dm-0): Aborting unused transaction.
[100416.433245] BTRFS warning (device dm-0): Aborting unused transaction.
[100709.352445] usb 1-5: USB disconnect, device number 8
[100710.197602] thinkpad_acpi: EC reports that Thermal Table has changed
[100710.199954] ACPI: \_SB_.GDCK - undocking
[100711.358679] wlan0: deauthenticating from 00:24:6c:67:03:d1 by local choice (reason=3)
[100713.044214] e1000e 0000:00:19.0: irq 46 for MSI/MSI-X
[101447.801190] btrfs no csum found for inode 3910588 start 5144576
[101447.802277] btrfs csum failed ino 3910588 off 5144576 csum 2096800889 private 0
[101447.957161] btrfs no csum found for inode 3910588 start 5996544
[101447.957990] btrfs csum failed ino 3910588 off 5996544 csum 2096800889 private 0
[101447.960459] btrfs no csum found for inode 3910588 start 6062080
[101447.961351] btrfs csum failed ino 3910588 off 6062080 csum 2096800889 private 0
[101447.962459] btrfs no csum found for inode 3910588 start 6225920
[101447.963543] btrfs csum failed ino 3910588 off 6225920 csum 2096800889 private 0
[101447.964369] btrfs no csum found for inode 3910588 start 6291456
[101447.965045] btrfs csum failed ino 3910588 off 6291456 csum 2096800889 private 0
[101447.966406] btrfs no csum found for inode 3910588 start 6356992
[101447.967392] btrfs csum failed ino 3910588 off 6356992 csum 2123070294 private 0
[101448.088912] btrfs no csum found for inode 3910588 start 6553600
[101448.088971] btrfs no csum found for inode 3910588 start 6586368
[101448.090058] btrfs csum failed ino 3910588 off 6586368 csum 2096800889 private 0
[101448.090305] btrfs csum failed ino 3910588 off 6553600 csum 4253301504 private 0
[101448.093231] btrfs no csum found for inode 3910588 start 6717440
[101448.094269] btrfs csum failed ino 3910588 off 6717440 csum 2096800889 private 0
[101448.096104] btrfs no csum found for inode 3910588 start 6848512
[101448.097002] btrfs csum failed ino 3910588 off 6848512 csum 3282939717 private 0
[101448.098249] btrfs no csum found for inode 3910588 start 6946816
[101448.101561] btrfs no csum found for inode 3910588 start 7045120
[101448.106352] btrfs no csum found for inode 3910588 start 7176192
[101448.108224] btrfs no csum found for inode 3910588 start 7241728
[101448.111133] btrfs no csum found for inode 3910588 start 7307264
[101448.127081] btrfs no csum found for inode 3910588 start 7569408
[101448.144351] btrfs no csum found for inode 3910588 start 7864320
[101448.155857] btrfs no csum found for inode 3910588 start 8060928
[101448.165868] btrfs no csum found for inode 3910588 start 8257536
[101448.175275] btrfs no csum found for inode 3910588 start 8454144
[101448.176087] btrfs no csum found for inode 3910588 start 8552448
[101448.180927] btrfs no csum found for inode 3910588 start 8847360
[101448.183489] btrfs no csum found for inode 3910588 start 8978432
[101448.192909] btrfs no csum found for inode 3910588 start 9371648
[101448.198959] btrfs no csum found for inode 3910588 start 9568256
[101448.413716] btrfs no csum found for inode 3910588 start 5439488
[101448.413996] btrfs no csum found for inode 3910588 start 5537792
[101448.414379] btrfs no csum found for inode 3910588 start 5603328
[101448.416386] btrfs no csum found for inode 3910588 start 5734400
[101448.424108] btrfs no csum found for inode 3910588 start 5799936
[101448.721649] btrfs no csum found for inode 3910588 start 9699328
[101448.737377] btrfs no csum found for inode 3910588 start 9863168
[101448.743929] btrfs no csum found for inode 3910588 start 10027008
[101448.753186] btrfs no csum found for inode 3910588 start 10158080
[101448.762406] btrfs no csum found for inode 3910588 start 10256384
[101448.765143] btrfs no csum found for inode 3910588 start 10354688
[102230.473794] BTRFS warning (device dm-0): Aborting unused transaction.
[102231.154304] BTRFS warning (device dm-0): Aborting unused transaction.
[102231.816380] BTRFS warning (device dm-0): Aborting unused transaction.
[102231.845170] btrfs no csum found for inode 3910562 start 0
[102231.855415] btrfs_readpage_end_io_hook: 26 callbacks suppressed
[102231.855418] btrfs csum failed ino 3910562 off 0 csum 3145117582 private 0
[102231.999555] btrfs no csum found for inode 3910588 start 294912
[102232.009479] btrfs csum failed ino 3910588 off 294912 csum 697692408 private 0
[102232.142602] btrfs no csum found for inode 3910588 start 360448
[102232.142661] btrfs no csum found for inode 3910588 start 393216
[102232.157527] btrfs csum failed ino 3910588 off 360448 csum 66868660 private 0
[102232.158415] btrfs csum failed ino 3910588 off 393216 csum 230387419 private 0
[102232.968939] BTRFS warning (device dm-0): Aborting unused transaction.
[102234.107914] BTRFS warning (device dm-0): Aborting unused transaction.
[102235.218139] BTRFS warning (device dm-0): Aborting unused transaction.

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Long btrfs hangs during suspend to RAM / BTRFS warning (device dm-0): Aborting unused transaction
  2012-06-27  1:38 ` Long btrfs hangs during suspend to RAM / BTRFS warning (device dm-0): Aborting unused transaction Marc MERLIN
@ 2012-06-27  5:20   ` Marc MERLIN
  2012-06-29 12:36     ` Marc MERLIN
  0 siblings, 1 reply; 10+ messages in thread
From: Marc MERLIN @ 2012-06-27  5:20 UTC (permalink / raw)
  To: linux-btrfs

On Tue, Jun 26, 2012 at 06:38:18PM -0700, Marc MERLIN wrote:
> Now, I'm also seeing these below and I have this again (86% CPU):
>  6076 root      20   0     0    0    0 R   86  0.0  29:40.11 btrfs-delalloc-    
> 
> How bad is it, doctor?  I think I'll be going back to 3.2.16 for now though.

Back to 3.2.16, I'm now seeing this:
[  840.516733] INFO: task VirtualBox:6818 blocked for more than 120 seconds.
[  840.516735] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  840.516736] VirtualBox      D ffff8801fd134080     0  6818   6758 0x00000080
[  840.516740]  ffff8801fd134080 0000000000000086 0000000000000050 ffff880202e7f100
[  840.516744]  0000000000013580 ffff8801c6f0dfd8 ffff8801c6f0dfd8 ffff8801fd134080
[  840.516748]  ffff8801c6f0da68 ffff8801c6f0da68 ffff88020a4e22f0 ffff88023bc13e08
[  840.516752] Call Trace:
[  840.516755]  [<ffffffff810b5c67>] ? __lock_page+0x66/0x66
[  840.516758]  [<ffffffff8134aea4>] ? io_schedule+0x58/0x6f
[  840.516761]  [<ffffffff810b5c6d>] ? sleep_on_page+0x6/0xa
[  840.516764]  [<ffffffff8134b1e5>] ? __wait_on_bit_lock+0x3c/0x85
[  840.516767]  [<ffffffff810b5c62>] ? __lock_page+0x61/0x66
[  840.516770]  [<ffffffff81060051>] ? autoremove_wake_function+0x2a/0x2a
[  840.516785]  [<ffffffffa01838d7>] ? extent_write_cache_pages.isra.13.constprop.22+0xf6/0x278 [btrfs]
[  840.516789]  [<ffffffff810ec9cb>] ? __cache_free.isra.40+0x19/0x1a7
[  840.516792]  [<ffffffff8134ed52>] ? sub_preempt_count+0x83/0x94
[  840.516795]  [<ffffffff8134c2dd>] ? _raw_spin_unlock+0x24/0x30
[  840.516811]  [<ffffffffa0183c4b>] ? extent_writepages+0x40/0x57 [btrfs]
[  840.516826]  [<ffffffffa0177f5f>] ? __btrfs_buffered_write+0x2bb/0x2dc [btrfs]
[  840.516841]  [<ffffffffa016e88a>] ? uncompress_inline.isra.44+0x116/0x116 [btrfs]
[  840.516844]  [<ffffffff810b6aaf>] ? __filemap_fdatawrite_range+0x4b/0x50
[  840.516847]  [<ffffffff810b6ad9>] ? filemap_write_and_wait_range+0x25/0x4d
[  840.516863]  [<ffffffffa01782ce>] ? btrfs_file_aio_write+0x34e/0x490 [btrfs]
[  840.516866]  [<ffffffff8103e092>] ? get_parent_ip+0x9/0x1b
[  840.516882]  [<ffffffffa0177f80>] ? __btrfs_buffered_write+0x2dc/0x2dc [btrfs]
[  840.516886]  [<ffffffff8112f19c>] ? aio_rw_vect_retry+0x70/0x18e
[  840.516888]  [<ffffffff8112f12c>] ? aio_fsync+0x22/0x22
[  840.516891]  [<ffffffff8112fbc7>] ? aio_run_iocb+0x72/0x11c
[  840.516894]  [<ffffffff81130d9a>] ? do_io_submit+0x6a4/0x7f9
[  840.516898]  [<ffffffff813508d2>] ? system_call_fastpath+0x16/0x1b
[ 1187.553635] btrfs: unlinked 8 orphans
[ 3810.200064] e1000e 0000:00:19.0: BAR 0: set to [mem 0xfc000000-0xfc01ffff] (PCI address [0xfc000000-0xfc01ffff])
[ 3810.200071] e1000e 0000:00:19.0: BAR 1: set to [mem 0xfc025000-0xfc025fff] (PCI address [0xfc025000-0xfc025fff])
[ 3810.200076] e1000e 0000:00:19.0: BAR 2: set to [io  0x1840-0x185f] (PCI address [0x1840-0x185f])
[ 3810.200093] e1000e 0000:00:19.0: restoring config space at offset 0xf (was 0x100, writing 0x10b)
[ 3810.200115] e1000e 0000:00:19.0: restoring config space at offset 0x1 (was 0x100000, writing 0x100107)
[ 3810.200147] e1000e 0000:00:19.0: PME# disabled
[ 3810.200224] e1000e 0000:00:19.0: irq 45 for MSI/MSI-X
[ 4671.144685] iwlwifi 0000:03:00.0: Tx aggregation enabled on ra = 2c:b0:5d:3c:7d:f1 tid = 1
[ 4799.384107] btrfs: unlinked 8 orphans
[ 8436.512513] btrfs: unlinked 7 orphans
[11350.749850] btrfs no csum found for inode 3909426 start 0
[11350.750697] btrfs csum failed ino 3909426 off 0 csum 1419704114 private 0
[11652.088805] btrfs no csum found for inode 3910848 start 0
[11652.089524] btrfs csum failed ino 3910848 off 0 csum 3145117582 private 0

My firefox and chrome profiles were corrupted, so I had to restore them from an old snapshot.

I can't prove it, but it looks like my corruption happened right at the same
time than I rebooted to 3.4.4.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Long btrfs hangs during suspend to RAM / BTRFS warning (device dm-0): Aborting unused transaction
  2012-06-27  5:20   ` Marc MERLIN
@ 2012-06-29 12:36     ` Marc MERLIN
  2012-07-02 19:58       ` Marc MERLIN
  0 siblings, 1 reply; 10+ messages in thread
From: Marc MERLIN @ 2012-06-29 12:36 UTC (permalink / raw)
  To: linux-btrfs

On Tue, Jun 26, 2012 at 10:20:12PM -0700, Marc MERLIN wrote:
> On Tue, Jun 26, 2012 at 06:38:18PM -0700, Marc MERLIN wrote:
> > Now, I'm also seeing these below and I have this again (86% CPU):
> >  6076 root      20   0     0    0    0 R   86  0.0  29:40.11 btrfs-delalloc-    
> > 
> > How bad is it, doctor?  I think I'll be going back to 3.2.16 for now though.
 
I reverted to 3.2.16 and haven't had further problems after dropping the
current snapshot that was corrupted in various ways.

Now, I'm not sure when I should upgrade anymore since I haven't heard of
any fixes for what I saw.
Assuming I go forward again, is there something else I could have
provided to help debug?

Marc

> Back to 3.2.16, I'm now seeing this:
> [  840.516733] INFO: task VirtualBox:6818 blocked for more than 120 seconds.
> [  840.516735] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  840.516736] VirtualBox      D ffff8801fd134080     0  6818   6758 0x00000080
> [  840.516740]  ffff8801fd134080 0000000000000086 0000000000000050 ffff880202e7f100
> [  840.516744]  0000000000013580 ffff8801c6f0dfd8 ffff8801c6f0dfd8 ffff8801fd134080
> [  840.516748]  ffff8801c6f0da68 ffff8801c6f0da68 ffff88020a4e22f0 ffff88023bc13e08
> [  840.516752] Call Trace:
> [  840.516755]  [<ffffffff810b5c67>] ? __lock_page+0x66/0x66
> [  840.516758]  [<ffffffff8134aea4>] ? io_schedule+0x58/0x6f
> [  840.516761]  [<ffffffff810b5c6d>] ? sleep_on_page+0x6/0xa
> [  840.516764]  [<ffffffff8134b1e5>] ? __wait_on_bit_lock+0x3c/0x85
> [  840.516767]  [<ffffffff810b5c62>] ? __lock_page+0x61/0x66
> [  840.516770]  [<ffffffff81060051>] ? autoremove_wake_function+0x2a/0x2a
> [  840.516785]  [<ffffffffa01838d7>] ? extent_write_cache_pages.isra.13.constprop.22+0xf6/0x278 [btrfs]
> [  840.516789]  [<ffffffff810ec9cb>] ? __cache_free.isra.40+0x19/0x1a7
> [  840.516792]  [<ffffffff8134ed52>] ? sub_preempt_count+0x83/0x94
> [  840.516795]  [<ffffffff8134c2dd>] ? _raw_spin_unlock+0x24/0x30
> [  840.516811]  [<ffffffffa0183c4b>] ? extent_writepages+0x40/0x57 [btrfs]
> [  840.516826]  [<ffffffffa0177f5f>] ? __btrfs_buffered_write+0x2bb/0x2dc [btrfs]
> [  840.516841]  [<ffffffffa016e88a>] ? uncompress_inline.isra.44+0x116/0x116 [btrfs]
> [  840.516844]  [<ffffffff810b6aaf>] ? __filemap_fdatawrite_range+0x4b/0x50
> [  840.516847]  [<ffffffff810b6ad9>] ? filemap_write_and_wait_range+0x25/0x4d
> [  840.516863]  [<ffffffffa01782ce>] ? btrfs_file_aio_write+0x34e/0x490 [btrfs]
> [  840.516866]  [<ffffffff8103e092>] ? get_parent_ip+0x9/0x1b
> [  840.516882]  [<ffffffffa0177f80>] ? __btrfs_buffered_write+0x2dc/0x2dc [btrfs]
> [  840.516886]  [<ffffffff8112f19c>] ? aio_rw_vect_retry+0x70/0x18e
> [  840.516888]  [<ffffffff8112f12c>] ? aio_fsync+0x22/0x22
> [  840.516891]  [<ffffffff8112fbc7>] ? aio_run_iocb+0x72/0x11c
> [  840.516894]  [<ffffffff81130d9a>] ? do_io_submit+0x6a4/0x7f9
> [  840.516898]  [<ffffffff813508d2>] ? system_call_fastpath+0x16/0x1b
> [ 1187.553635] btrfs: unlinked 8 orphans
> [ 3810.200064] e1000e 0000:00:19.0: BAR 0: set to [mem 0xfc000000-0xfc01ffff] (PCI address [0xfc000000-0xfc01ffff])
> [ 3810.200071] e1000e 0000:00:19.0: BAR 1: set to [mem 0xfc025000-0xfc025fff] (PCI address [0xfc025000-0xfc025fff])
> [ 3810.200076] e1000e 0000:00:19.0: BAR 2: set to [io  0x1840-0x185f] (PCI address [0x1840-0x185f])
> [ 3810.200093] e1000e 0000:00:19.0: restoring config space at offset 0xf (was 0x100, writing 0x10b)
> [ 3810.200115] e1000e 0000:00:19.0: restoring config space at offset 0x1 (was 0x100000, writing 0x100107)
> [ 3810.200147] e1000e 0000:00:19.0: PME# disabled
> [ 3810.200224] e1000e 0000:00:19.0: irq 45 for MSI/MSI-X
> [ 4671.144685] iwlwifi 0000:03:00.0: Tx aggregation enabled on ra = 2c:b0:5d:3c:7d:f1 tid = 1
> [ 4799.384107] btrfs: unlinked 8 orphans
> [ 8436.512513] btrfs: unlinked 7 orphans
> [11350.749850] btrfs no csum found for inode 3909426 start 0
> [11350.750697] btrfs csum failed ino 3909426 off 0 csum 1419704114 private 0
> [11652.088805] btrfs no csum found for inode 3910848 start 0
> [11652.089524] btrfs csum failed ino 3910848 off 0 csum 3145117582 private 0
> 
> My firefox and chrome profiles were corrupted, so I had to restore them from an old snapshot.
> 
> I can't prove it, but it looks like my corruption happened right at the same
> time than I rebooted to 3.4.4.
> 
> Marc
> -- 
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> Microsoft is to operating systems ....
>                                       .... what McDonalds is to gourmet cooking
> Home page: http://marc.merlins.org/  
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Long btrfs hangs during suspend to RAM / BTRFS warning (device dm-0): Aborting unused transaction
  2012-06-29 12:36     ` Marc MERLIN
@ 2012-07-02 19:58       ` Marc MERLIN
  2012-07-04  5:58         ` Liu Bo
  0 siblings, 1 reply; 10+ messages in thread
From: Marc MERLIN @ 2012-07-02 19:58 UTC (permalink / raw)
  To: linux-btrfs

On Fri, Jun 29, 2012 at 05:36:24AM -0700, Marc MERLIN wrote:
> On Tue, Jun 26, 2012 at 10:20:12PM -0700, Marc MERLIN wrote:
> > On Tue, Jun 26, 2012 at 06:38:18PM -0700, Marc MERLIN wrote:
> > > Now, I'm also seeing these below and I have this again (86% CPU):
> > >  6076 root      20   0     0    0    0 R   86  0.0  29:40.11 btrfs-delalloc-    
> > > 
> > > How bad is it, doctor?  I think I'll be going back to 3.2.16 for now though.
>  
> I reverted to 3.2.16 and haven't had further problems after dropping the
> current snapshot that was corrupted in various ways.
> 
> Now, I'm not sure when I should upgrade anymore since I haven't heard of
> any fixes for what I saw.
> Assuming I go forward again, is there something else I could have
> provided to help debug?

Mmmh, ok. I understand that this code comes with no guarantees, and I have
backups, but I'm reporting a problem that lead to corruption (I had multiple
files that were corrupted in my latest snapshot and I had to drop it and
revert to an older snapshot and then out of fear for 3.4.4, went back to
3.2.16).

I didn't see any problems with 3.2.16 (doesn't mean there weren't any, just
that I didn't see any).
Since my filesystem was a bit full, and that triggers problems with btrfs, I
freed up 70GB
gandalfthegreat:~# btrfs fi show
Label: 'btrfs_pool1'  uuid: 873d526c-e911-4234-af1b-239889cd143d
	Total devices 1 FS bytes used 163.01GB
	devid    1 size 231.02GB used 231.02GB path /dev/dm-0

I rebooted with 3.4.4 and started copying data, and for now I've gotten this:
kernel: [  832.108558] btrfs no csum found for inode 3896855 start 0
kernel: [  832.108873] btrfs csum failed ino 3896855 off 0 csum 1150320628 private 0

How bad is this?

More generally, what was missing from my previous report (I gave all the
sysrq I could output) that no one seemed to be able to use it?

Thanks,
Marc

> > Back to 3.2.16, I'm now seeing this:
> > [  840.516733] INFO: task VirtualBox:6818 blocked for more than 120 seconds.
> > [  840.516735] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [  840.516736] VirtualBox      D ffff8801fd134080     0  6818   6758 0x00000080
> > [  840.516740]  ffff8801fd134080 0000000000000086 0000000000000050 ffff880202e7f100
> > [  840.516744]  0000000000013580 ffff8801c6f0dfd8 ffff8801c6f0dfd8 ffff8801fd134080
> > [  840.516748]  ffff8801c6f0da68 ffff8801c6f0da68 ffff88020a4e22f0 ffff88023bc13e08
> > [  840.516752] Call Trace:
> > [  840.516755]  [<ffffffff810b5c67>] ? __lock_page+0x66/0x66
> > [  840.516758]  [<ffffffff8134aea4>] ? io_schedule+0x58/0x6f
> > [  840.516761]  [<ffffffff810b5c6d>] ? sleep_on_page+0x6/0xa
> > [  840.516764]  [<ffffffff8134b1e5>] ? __wait_on_bit_lock+0x3c/0x85
> > [  840.516767]  [<ffffffff810b5c62>] ? __lock_page+0x61/0x66
> > [  840.516770]  [<ffffffff81060051>] ? autoremove_wake_function+0x2a/0x2a
> > [  840.516785]  [<ffffffffa01838d7>] ? extent_write_cache_pages.isra.13.constprop.22+0xf6/0x278 [btrfs]
> > [  840.516789]  [<ffffffff810ec9cb>] ? __cache_free.isra.40+0x19/0x1a7
> > [  840.516792]  [<ffffffff8134ed52>] ? sub_preempt_count+0x83/0x94
> > [  840.516795]  [<ffffffff8134c2dd>] ? _raw_spin_unlock+0x24/0x30
> > [  840.516811]  [<ffffffffa0183c4b>] ? extent_writepages+0x40/0x57 [btrfs]
> > [  840.516826]  [<ffffffffa0177f5f>] ? __btrfs_buffered_write+0x2bb/0x2dc [btrfs]
> > [  840.516841]  [<ffffffffa016e88a>] ? uncompress_inline.isra.44+0x116/0x116 [btrfs]
> > [  840.516844]  [<ffffffff810b6aaf>] ? __filemap_fdatawrite_range+0x4b/0x50
> > [  840.516847]  [<ffffffff810b6ad9>] ? filemap_write_and_wait_range+0x25/0x4d
> > [  840.516863]  [<ffffffffa01782ce>] ? btrfs_file_aio_write+0x34e/0x490 [btrfs]
> > [  840.516866]  [<ffffffff8103e092>] ? get_parent_ip+0x9/0x1b
> > [  840.516882]  [<ffffffffa0177f80>] ? __btrfs_buffered_write+0x2dc/0x2dc [btrfs]
> > [  840.516886]  [<ffffffff8112f19c>] ? aio_rw_vect_retry+0x70/0x18e
> > [  840.516888]  [<ffffffff8112f12c>] ? aio_fsync+0x22/0x22
> > [  840.516891]  [<ffffffff8112fbc7>] ? aio_run_iocb+0x72/0x11c
> > [  840.516894]  [<ffffffff81130d9a>] ? do_io_submit+0x6a4/0x7f9
> > [  840.516898]  [<ffffffff813508d2>] ? system_call_fastpath+0x16/0x1b
> > [ 1187.553635] btrfs: unlinked 8 orphans
> > [ 3810.200064] e1000e 0000:00:19.0: BAR 0: set to [mem 0xfc000000-0xfc01ffff] (PCI address [0xfc000000-0xfc01ffff])
> > [ 3810.200071] e1000e 0000:00:19.0: BAR 1: set to [mem 0xfc025000-0xfc025fff] (PCI address [0xfc025000-0xfc025fff])
> > [ 3810.200076] e1000e 0000:00:19.0: BAR 2: set to [io  0x1840-0x185f] (PCI address [0x1840-0x185f])
> > [ 3810.200093] e1000e 0000:00:19.0: restoring config space at offset 0xf (was 0x100, writing 0x10b)
> > [ 3810.200115] e1000e 0000:00:19.0: restoring config space at offset 0x1 (was 0x100000, writing 0x100107)
> > [ 3810.200147] e1000e 0000:00:19.0: PME# disabled
> > [ 3810.200224] e1000e 0000:00:19.0: irq 45 for MSI/MSI-X
> > [ 4671.144685] iwlwifi 0000:03:00.0: Tx aggregation enabled on ra = 2c:b0:5d:3c:7d:f1 tid = 1
> > [ 4799.384107] btrfs: unlinked 8 orphans
> > [ 8436.512513] btrfs: unlinked 7 orphans
> > [11350.749850] btrfs no csum found for inode 3909426 start 0
> > [11350.750697] btrfs csum failed ino 3909426 off 0 csum 1419704114 private 0
> > [11652.088805] btrfs no csum found for inode 3910848 start 0
> > [11652.089524] btrfs csum failed ino 3910848 off 0 csum 3145117582 private 0
> > 
> > My firefox and chrome profiles were corrupted, so I had to restore them from an old snapshot.
> > 
> > I can't prove it, but it looks like my corruption happened right at the same
> > time than I rebooted to 3.4.4.
> > 
> > Marc
> > -- 
> > "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> > Microsoft is to operating systems ....
> >                                       .... what McDonalds is to gourmet cooking
> > Home page: http://marc.merlins.org/  
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> -- 
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> Microsoft is to operating systems ....
>                                       .... what McDonalds is to gourmet cooking
> Home page: http://marc.merlins.org/
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Long btrfs hangs during suspend to RAM / BTRFS warning (device dm-0): Aborting unused transaction
  2012-07-02 19:58       ` Marc MERLIN
@ 2012-07-04  5:58         ` Liu Bo
  2012-07-04 15:15           ` Marc MERLIN
  0 siblings, 1 reply; 10+ messages in thread
From: Liu Bo @ 2012-07-04  5:58 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: linux-btrfs

On 07/03/2012 03:58 AM, Marc MERLIN wrote:

> On Fri, Jun 29, 2012 at 05:36:24AM -0700, Marc MERLIN wrote:
>> On Tue, Jun 26, 2012 at 10:20:12PM -0700, Marc MERLIN wrote:
>>> On Tue, Jun 26, 2012 at 06:38:18PM -0700, Marc MERLIN wrote:
>>>> Now, I'm also seeing these below and I have this again (86% CPU):
>>>>  6076 root      20   0     0    0    0 R   86  0.0  29:40.11 btrfs-delalloc-    
>>>>
>>>> How bad is it, doctor?  I think I'll be going back to 3.2.16 for now though.
>>  
>> I reverted to 3.2.16 and haven't had further problems after dropping the
>> current snapshot that was corrupted in various ways.
>>
>> Now, I'm not sure when I should upgrade anymore since I haven't heard of
>> any fixes for what I saw.
>> Assuming I go forward again, is there something else I could have
>> provided to help debug?
> 
> Mmmh, ok. I understand that this code comes with no guarantees, and I have
> backups, but I'm reporting a problem that lead to corruption (I had multiple
> files that were corrupted in my latest snapshot and I had to drop it and
> revert to an older snapshot and then out of fear for 3.4.4, went back to
> 3.2.16).
> 


Hi Marc,

Sorry for not replying this earlier.

The dmesg log, sysrq log and stack dump info can usually be very helpful.

>From your report, we can see the csum error and hang on log,
'no csum' is not that bad while hanging-on is serious and dangerous.

so can you please get any 'sysrq + w' log in the hanging-on case and paste them here,
and the log may tell us who blocks other threads.

> I didn't see any problems with 3.2.16 (doesn't mean there weren't any, just
> that I didn't see any).


Feel free to use the latest btrfs upstream, it always contains some fixes.

thanks,
liubo

> Since my filesystem was a bit full, and that triggers problems with btrfs, I
> freed up 70GB
> gandalfthegreat:~# btrfs fi show
> Label: 'btrfs_pool1'  uuid: 873d526c-e911-4234-af1b-239889cd143d
> 	Total devices 1 FS bytes used 163.01GB
> 	devid    1 size 231.02GB used 231.02GB path /dev/dm-0
> 
> I rebooted with 3.4.4 and started copying data, and for now I've gotten this:
> kernel: [  832.108558] btrfs no csum found for inode 3896855 start 0
> kernel: [  832.108873] btrfs csum failed ino 3896855 off 0 csum 1150320628 private 0
> 
> How bad is this?
> 
> More generally, what was missing from my previous report (I gave all the
> sysrq I could output) that no one seemed to be able to use it?
> 

> Thanks,
> Marc
> 
>>> Back to 3.2.16, I'm now seeing this:
>>> [  840.516733] INFO: task VirtualBox:6818 blocked for more than 120 seconds.
>>> [  840.516735] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> [  840.516736] VirtualBox      D ffff8801fd134080     0  6818   6758 0x00000080
>>> [  840.516740]  ffff8801fd134080 0000000000000086 0000000000000050 ffff880202e7f100
>>> [  840.516744]  0000000000013580 ffff8801c6f0dfd8 ffff8801c6f0dfd8 ffff8801fd134080
>>> [  840.516748]  ffff8801c6f0da68 ffff8801c6f0da68 ffff88020a4e22f0 ffff88023bc13e08
>>> [  840.516752] Call Trace:
>>> [  840.516755]  [<ffffffff810b5c67>] ? __lock_page+0x66/0x66
>>> [  840.516758]  [<ffffffff8134aea4>] ? io_schedule+0x58/0x6f
>>> [  840.516761]  [<ffffffff810b5c6d>] ? sleep_on_page+0x6/0xa
>>> [  840.516764]  [<ffffffff8134b1e5>] ? __wait_on_bit_lock+0x3c/0x85
>>> [  840.516767]  [<ffffffff810b5c62>] ? __lock_page+0x61/0x66
>>> [  840.516770]  [<ffffffff81060051>] ? autoremove_wake_function+0x2a/0x2a
>>> [  840.516785]  [<ffffffffa01838d7>] ? extent_write_cache_pages.isra.13.constprop.22+0xf6/0x278 [btrfs]
>>> [  840.516789]  [<ffffffff810ec9cb>] ? __cache_free.isra.40+0x19/0x1a7
>>> [  840.516792]  [<ffffffff8134ed52>] ? sub_preempt_count+0x83/0x94
>>> [  840.516795]  [<ffffffff8134c2dd>] ? _raw_spin_unlock+0x24/0x30
>>> [  840.516811]  [<ffffffffa0183c4b>] ? extent_writepages+0x40/0x57 [btrfs]
>>> [  840.516826]  [<ffffffffa0177f5f>] ? __btrfs_buffered_write+0x2bb/0x2dc [btrfs]
>>> [  840.516841]  [<ffffffffa016e88a>] ? uncompress_inline.isra.44+0x116/0x116 [btrfs]
>>> [  840.516844]  [<ffffffff810b6aaf>] ? __filemap_fdatawrite_range+0x4b/0x50
>>> [  840.516847]  [<ffffffff810b6ad9>] ? filemap_write_and_wait_range+0x25/0x4d
>>> [  840.516863]  [<ffffffffa01782ce>] ? btrfs_file_aio_write+0x34e/0x490 [btrfs]
>>> [  840.516866]  [<ffffffff8103e092>] ? get_parent_ip+0x9/0x1b
>>> [  840.516882]  [<ffffffffa0177f80>] ? __btrfs_buffered_write+0x2dc/0x2dc [btrfs]
>>> [  840.516886]  [<ffffffff8112f19c>] ? aio_rw_vect_retry+0x70/0x18e
>>> [  840.516888]  [<ffffffff8112f12c>] ? aio_fsync+0x22/0x22
>>> [  840.516891]  [<ffffffff8112fbc7>] ? aio_run_iocb+0x72/0x11c
>>> [  840.516894]  [<ffffffff81130d9a>] ? do_io_submit+0x6a4/0x7f9
>>> [  840.516898]  [<ffffffff813508d2>] ? system_call_fastpath+0x16/0x1b
>>> [ 1187.553635] btrfs: unlinked 8 orphans
>>> [ 3810.200064] e1000e 0000:00:19.0: BAR 0: set to [mem 0xfc000000-0xfc01ffff] (PCI address [0xfc000000-0xfc01ffff])
>>> [ 3810.200071] e1000e 0000:00:19.0: BAR 1: set to [mem 0xfc025000-0xfc025fff] (PCI address [0xfc025000-0xfc025fff])
>>> [ 3810.200076] e1000e 0000:00:19.0: BAR 2: set to [io  0x1840-0x185f] (PCI address [0x1840-0x185f])
>>> [ 3810.200093] e1000e 0000:00:19.0: restoring config space at offset 0xf (was 0x100, writing 0x10b)
>>> [ 3810.200115] e1000e 0000:00:19.0: restoring config space at offset 0x1 (was 0x100000, writing 0x100107)
>>> [ 3810.200147] e1000e 0000:00:19.0: PME# disabled
>>> [ 3810.200224] e1000e 0000:00:19.0: irq 45 for MSI/MSI-X
>>> [ 4671.144685] iwlwifi 0000:03:00.0: Tx aggregation enabled on ra = 2c:b0:5d:3c:7d:f1 tid = 1
>>> [ 4799.384107] btrfs: unlinked 8 orphans
>>> [ 8436.512513] btrfs: unlinked 7 orphans
>>> [11350.749850] btrfs no csum found for inode 3909426 start 0
>>> [11350.750697] btrfs csum failed ino 3909426 off 0 csum 1419704114 private 0
>>> [11652.088805] btrfs no csum found for inode 3910848 start 0
>>> [11652.089524] btrfs csum failed ino 3910848 off 0 csum 3145117582 private 0
>>>
>>> My firefox and chrome profiles were corrupted, so I had to restore them from an old snapshot.
>>>
>>> I can't prove it, but it looks like my corruption happened right at the same
>>> time than I rebooted to 3.4.4.
>>>
>>> Marc
>>> -- 
>>> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
>>> Microsoft is to operating systems ....
>>>                                       .... what McDonalds is to gourmet cooking
>>> Home page: http://marc.merlins.org/  
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> -- 
>> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
>> Microsoft is to operating systems ....
>>                                       .... what McDonalds is to gourmet cooking
>> Home page: http://marc.merlins.org/
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Long btrfs hangs during suspend to RAM / BTRFS warning (device dm-0): Aborting unused transaction
  2012-07-04  5:58         ` Liu Bo
@ 2012-07-04 15:15           ` Marc MERLIN
  2012-07-05 13:25             ` Liu Bo
  0 siblings, 1 reply; 10+ messages in thread
From: Marc MERLIN @ 2012-07-04 15:15 UTC (permalink / raw)
  To: Liu Bo, linux-btrfs

On Wed, Jul 04, 2012 at 01:58:31PM +0800, Liu Bo wrote:
> The dmesg log, sysrq log and stack dump info can usually be very helpful.
> 
> From your report, we can see the csum error and hang on log,
> 'no csum' is not that bad while hanging-on is serious and dangerous.
> 
> so can you please get any 'sysrq + w' log in the hanging-on case and paste them here,
> and the log may tell us who blocks other threads.

Hi, thanks for the answer.
 
I dumped all sysrq data, that was in my original Email. Here are two
different sysrq+w runs, as well as aborted transaction messages from that
Email. 
Sorry that the original was a bit long and contained a bunch of sysrq output.

>From doing further testing since then, it does seem that the code just start
doing bad things, including the file corruption I saw, when I'm running low
on free space.

Anything else that would help?

Thanks,
Marc

> [84951.680847] Sched Debug Version: v0.10, 3.4.4-amd64-preempt-noide-20120410 #1
> [84951.680850] ktime                                   : 84951680.847904
> [84951.680851] sched_clk                               : 55820767.832790
> [84951.680853] cpu_clk                                 : 84951680.846676
> [84951.680855] jiffies                                 : 4316130216
> [84951.680857] sched_clock_stable                      : 0
> [84951.680858] 
> [84951.680859] sysctl_sched
> [84951.680860]   .sysctl_sched_latency                    : 12.000000
> [84951.680862]   .sysctl_sched_min_granularity            : 1.500000
> [84951.680864]   .sysctl_sched_wakeup_granularity         : 2.000000
> [84951.680865]   .sysctl_sched_child_runs_first           : 0
> [84951.680867]   .sysctl_sched_features                   : 24119
> [84951.680869]   .sysctl_sched_tunable_scaling            : 1 (logaritmic)
> [84951.680871] 
> [84951.680872] cpu#0, 2527.110 MHz
> [84951.680873]   .nr_running                    : 4
> [84951.680875]   .load                          : 1542
> [84951.680876]   .nr_switches                   : 298015357
> [84951.680878]   .nr_load_updates               : 16858906
> [84951.680879]   .nr_uninterruptible            : -7812
> [84951.680881]   .next_balance                  : 4316.130229
> [84951.680883]   .curr->pid                     : 6859
> [84951.680884]   .clock                         : 84951679.104864
> [84951.680886]   .cpu_load[0]                   : 1277
> [84951.680887]   .cpu_load[1]                   : 885
> [84951.680889]   .cpu_load[2]                   : 728
> [84951.680890]   .cpu_load[3]                   : 693
> [84951.680892]   .cpu_load[4]                   : 731
> [84951.680894] 
> [84951.680894] cfs_rq[0]:/autogroup-69
> [84951.680896]   .exec_clock                    : 0.000000
> [84951.680898]   .MIN_vruntime                  : 0.000001
> [84951.680899]   .min_vruntime                  : 194572.976125
> [84951.680901]   .max_vruntime                  : 0.000001
> [84951.680903]   .spread                        : 0.000000
> [84951.680904]   .spread0                       : -52160826.593685
> [84951.680906]   .nr_spread_over                : 0
> [84951.680907]   .nr_running                    : 0
> [84951.680909]   .load                          : 0
> [84951.680910]   .load_avg                      : 5179.920896
> [84951.680912]   .load_period                   : 8.310703
> [84951.680913]   .load_contrib                  : 623
> [84951.680915]   .load_tg                       : 623
> [84951.680917]   .se->exec_start                : 84951675.234336
> [84951.680918]   .se->vruntime                  : 52355393.270692
> [84951.680920]   .se->sum_exec_runtime          : 241715.485721
> [84951.680922]   .se->load.weight               : 2
> [84951.680923] 
> [84951.680924] cfs_rq[0]:/autogroup-20
> [84951.680925]   .exec_clock                    : 0.000000
> [84951.680927]   .MIN_vruntime                  : 0.000001
> [84951.680929]   .min_vruntime                  : 13027.823103
> [84951.680930]   .max_vruntime                  : 0.000001
> [84951.680932]   .spread                        : 0.000000
> [84951.680934]   .spread0                       : -52342371.746707
> [84951.680935]   .nr_spread_over                : 0
> [84951.680937]   .nr_running                    : 0
> [84951.680938]   .load                          : 0
> [84951.680940]   .load_avg                      : 1279.999872
> [84951.680941]   .load_period                   : 6.444962
> [84951.680943]   .load_contrib                  : 198
> [84951.680944]   .load_tg                       : 257
> [84951.680946]   .se->exec_start                : 84951664.094034
> [84951.680948]   .se->vruntime                  : 52355389.621584
> [84951.680950]   .se->sum_exec_runtime          : 13082.655453
> [84951.680951]   .se->load.weight               : 2
> [84951.680953] 
> [84951.680954] cfs_rq[0]:/autogroup-74
> [84951.680955]   .exec_clock                    : 0.000000
> [84951.680957]   .MIN_vruntime                  : 1323610.934982
> [84951.680959]   .min_vruntime                  : 1323616.934982
> [84951.680960]   .max_vruntime                  : 1323610.934982
> [84951.680962]   .spread                        : 0.000000
> [84951.680964]   .spread0                       : -51031782.634828
> [84951.680965]   .nr_spread_over                : 0
> [84951.680967]   .nr_running                    : 1
> [84951.680968]   .load                          : 1024
> [84951.680970]   .load_avg                      : 252.855291
> [84951.680971]   .load_period                   : 6.216878
> [84951.680973]   .load_contrib                  : 44
> [84951.680974]   .load_tg                       : 991
> [84951.680976]   .se->exec_start                : 84951670.049973
> [84951.680978]   .se->vruntime                  : 52355393.569810
> [84951.680980]   .se->sum_exec_runtime          : 2397898.918671
> [84951.680981]   .se->load.weight               : 532
> [84951.680983] 
> [84951.680984] cfs_rq[0]:/autogroup-28
> [84951.680985]   .exec_clock                    : 0.000000
> [84951.680987]   .MIN_vruntime                  : 0.000001
> [84951.680989]   .min_vruntime                  : 3527635.809340
> [84951.680990]   .max_vruntime                  : 0.000001
> [84951.680992]   .spread                        : 0.000000
> [84951.680993]   .spread0                       : -48827763.760470
> [84951.680995]   .nr_spread_over                : 0
> [84951.680996]   .nr_running                    : 0
> [84951.680998]   .load                          : 0
> [84951.680999]   .load_avg                      : 1058.675911
> [84951.681001]   .load_period                   : 6.328132
> [84951.681002]   .load_contrib                  : 155
> [84951.681004]   .load_tg                       : 1169
> [84951.681006]   .se->exec_start                : 84951678.610201
> [84951.681007]   .se->vruntime                  : 52355394.509141
> [84951.681009]   .se->sum_exec_runtime          : 3528010.809431
> [84951.681011]   .se->load.weight               : 2
> [84951.681013] 
> [84951.681013] cfs_rq[0]:/autogroup-39
> [84951.681015]   .exec_clock                    : 0.000000
> [84951.681016]   .MIN_vruntime                  : 0.000001
> [84951.681018]   .min_vruntime                  : 4931529.000259
> [84951.681020]   .max_vruntime                  : 0.000001
> [84951.681021]   .spread                        : 0.000000
> [84951.681023]   .spread0                       : -47423870.569551
> [84951.681025]   .nr_spread_over                : 0
> [84951.681026]   .nr_running                    : 0
> [84951.681028]   .load                          : 0
> [84951.681029]   .load_avg                      : 4701.742763
> [84951.681031]   .load_period                   : 9.302606
> [84951.681032]   .load_contrib                  : 505
> [84951.681034]   .load_tg                       : 1033
> [84951.681036]   .se->exec_start                : 84951675.470082
> [84951.681037]   .se->vruntime                  : 52355399.569810
> [84951.681039]   .se->sum_exec_runtime          : 4256492.512746
> [84951.681041]   .se->load.weight               : 2
> [84951.681043] 
> [84951.681043] cfs_rq[0]:/autogroup-84
> [84951.681045]   .exec_clock                    : 0.000000
> [84951.681047]   .MIN_vruntime                  : 12773469.982548
> [84951.681048]   .min_vruntime                  : 12773475.929218
> [84951.681050]   .max_vruntime                  : 12773475.929218
> [84951.681052]   .spread                        : 5.946670
> [84951.681053]   .spread0                       : -39581923.640592
> [84951.681055]   .nr_spread_over                : 0
> [84951.681056]   .nr_running                    : 3
> [84951.681058]   .load                          : 1578
> [84951.681059]   .load_avg                      : 2389.940390
> [84951.681061]   .load_period                   : 6.650024
> [84951.681063]   .load_contrib                  : 289
> [84951.681064]   .load_tg                       : 310
> [84951.681066]   .se->exec_start                : 84951679.104864
> [84951.681068]   .se->vruntime                  : 52355396.091688
> [84951.681069]   .se->sum_exec_runtime          : 11567512.453793
> [84951.681071]   .se->load.weight               : 1010
> [84951.681073] 
> [84951.681073] cfs_rq[0]:/
> [84951.681075]   .exec_clock                    : 0.000000
> [84951.681076]   .MIN_vruntime                  : 52355393.569810
> [84951.681078]   .min_vruntime                  : 52355399.569810
> [84951.681080]   .max_vruntime                  : 52355393.569810
> [84951.681082]   .spread                        : 0.000000
> [84951.681083]   .spread0                       : 0.000000
> [84951.681085]   .nr_spread_over                : 0
> [84951.681086]   .nr_running                    : 2
> [84951.681087]   .load                          : 1542
> [84951.681089]   .load_avg                      : 0.000000
> [84951.681091]   .load_period                   : 0.000000
> [84951.681092]   .load_contrib                  : 0
> [84951.681093]   .load_tg                       : 0
> [84951.681095] 
> [84951.681095] rt_rq[0]:
> [84951.681097]   .rt_nr_running                 : 0
> [84951.681098]   .rt_throttled                  : 0
> [84951.681099]   .rt_time                       : 0.000000
> [84951.681101]   .rt_runtime                    : 950.000000
> [84951.681103] 
> [84951.681104] runnable tasks:
> [84951.681104]             task   PID         tree-key  switches  prio     exec-runtime         sum-exec        sum-sleep
> [84951.681105] ----------------------------------------------------------------------------------------------------------
> [84951.681119] R         chrome  6859  12773469.929218  31318750   123               0               0               0.000000               0.000000               0.000000 /autogroup-84
> [84951.681129]           chrome  7268  12773475.929218  72627834   123               0               0               0.000000               0.000000               0.000000 /autogroup-84
> [84951.681135]  Chrome_ChildIOT  7270  12773469.982548  20491354   123               0               0               0.000000               0.000000               0.000000 /autogroup-84
> [84951.681143]  chromium-browse 28255   1323610.934982   2145602   120               0               0               0.000000               0.000000               0.000000 /autogroup-74
> [84951.681152] 
> [84951.681153] cpu#1, 2527.110 MHz
> [84951.681154]   .nr_running                    : 1
> [84951.681156]   .load                          : 1024
> [84951.681157]   .nr_switches                   : 193625464
> [84951.681159]   .nr_load_updates               : 19156541
> [84951.681160]   .nr_uninterruptible            : 7826
> [84951.681162]   .next_balance                  : 4316.130228
> [84951.681163]   .curr->pid                     : 5589
> [84951.681165]   .clock                         : 84951680.008137
> [84951.681167]   .cpu_load[0]                   : 1024
> [84951.681168]   .cpu_load[1]                   : 1024
> [84951.681169]   .cpu_load[2]                   : 1024
> [84951.681171]   .cpu_load[3]                   : 1024
> [84951.681172]   .cpu_load[4]                   : 1029
> [84951.681174] 
> [84951.681175] cfs_rq[1]:/autogroup-74
> [84951.681176]   .exec_clock                    : 0.000000
> [84951.681178]   .MIN_vruntime                  : 0.000001
> [84951.681179]   .min_vruntime                  : 1424197.278711
> [84951.681181]   .max_vruntime                  : 0.000001
> [84951.681183]   .spread                        : 0.000000
> [84951.681184]   .spread0                       : -50931202.291099
> [84951.681186]   .nr_spread_over                : 0
> [84951.681187]   .nr_running                    : 0
> [84951.681189]   .load                          : 0
> [84951.681190]   .load_avg                      : 5245.213184
> [84951.681192]   .load_period                   : 5.523241
> [84951.681194]   .load_contrib                  : 947
> [84951.681195]   .load_tg                       : 991
> [84951.681197]   .se->exec_start                : 84951600.488928
> [84951.681199]   .se->vruntime                  : 71918776.133351
> [84951.681201]   .se->sum_exec_runtime          : 2565479.414558
> [84951.681202]   .se->load.weight               : 2
> [84951.681204] 
> [84951.681205] cfs_rq[1]:/autogroup-20
> [84951.681206]   .exec_clock                    : 0.000000
> [84951.681208]   .MIN_vruntime                  : 0.000001
> [84951.681210]   .min_vruntime                  : 16579.934530
> [84951.681211]   .max_vruntime                  : 0.000001
> [84951.681213]   .spread                        : 0.000000
> [84951.681214]   .spread0                       : -52338819.635280
> [84951.681216]   .nr_spread_over                : 0
> [84951.681217]   .nr_running                    : 0
> [84951.681219]   .load                          : 0
> [84951.681220]   .load_avg                      : 440.283872
> [84951.681222]   .load_period                   : 6.022332
> [84951.681224]   .load_contrib                  : 59
> [84951.681225]   .load_tg                       : 257
> [84951.681227]   .se->exec_start                : 84951604.151757
> [84951.681229]   .se->vruntime                  : 71918779.019518
> [84951.681230]   .se->sum_exec_runtime          : 16600.817243
> [84951.681232]   .se->load.weight               : 2
> [84951.681234] 
> [84951.681234] cfs_rq[1]:/autogroup-39
> [84951.681236]   .exec_clock                    : 0.000000
> [84951.681238]   .MIN_vruntime                  : 0.000001
> [84951.681239]   .min_vruntime                  : 4238682.043047
> [84951.681241]   .max_vruntime                  : 0.000001
> [84951.681242]   .spread                        : 0.000000
> [84951.681244]   .spread0                       : -48116717.526763
> [84951.681246]   .nr_spread_over                : 0
> [84951.681247]   .nr_running                    : 0
> [84951.681248]   .load                          : 0
> [84951.681250]   .load_avg                      : 5176.058368
> [84951.681252]   .load_period                   : 9.746619
> [84951.681253]   .load_contrib                  : 528
> [84951.681255]   .load_tg                       : 1033
> [84951.681257]   .se->exec_start                : 84951600.594975
> [84951.681258]   .se->vruntime                  : 71918775.930447
> [84951.681260]   .se->sum_exec_runtime          : 4311876.124090
> [84951.681262]   .se->load.weight               : 2
> [84951.681264] 
> [84951.681264] cfs_rq[1]:/autogroup-84
> [84951.681266]   .exec_clock                    : 0.000000
> [84951.681267]   .MIN_vruntime                  : 0.000001
> [84951.681269]   .min_vruntime                  : 7070943.272798
> [84951.681271]   .max_vruntime                  : 0.000001
> [84951.681272]   .spread                        : 0.000000
> [84951.681274]   .spread0                       : -45284456.297012
> [84951.681275]   .nr_spread_over                : 0
> [84951.681277]   .nr_running                    : 0
> [84951.681278]   .load                          : 0
> [84951.681280]   .load_avg                      : 5259.999474
> [84951.681281]   .load_period                   : 9.999999
> [84951.681283]   .load_contrib                  : 21
> [84951.681284]   .load_tg                       : 310
> [84951.681286]   .se->exec_start                : 84951649.364685
> [84951.681288]   .se->vruntime                  : 71918824.034522
> [84951.681290]   .se->sum_exec_runtime          : 6502960.496865
> [84951.681292]   .se->load.weight               : 2
> [84951.681293] 
> [84951.681294] cfs_rq[1]:/autogroup-28
> [84951.681295]   .exec_clock                    : 0.000000
> [84951.681297]   .MIN_vruntime                  : 0.000001
> [84951.681299]   .min_vruntime                  : 1961436.099473
> [84951.681301]   .max_vruntime                  : 0.000001
> [84951.681302]   .spread                        : 0.000000
> [84951.681304]   .spread0                       : -50393963.470337
> [84951.681305]   .nr_spread_over                : 0
> [84951.681307]   .nr_running                    : 0
> [84951.681308]   .load                          : 0
> [84951.681310]   .load_avg                      : 5174.934016
> [84951.681311]   .load_period                   : 5.101167
> [84951.681313]   .load_contrib                  : 1014
> [84951.681314]   .load_tg                       : 1169
> [84951.681316]   .se->exec_start                : 84951649.415319
> [84951.681318]   .se->vruntime                  : 71918824.084299
> [84951.681320]   .se->sum_exec_runtime          : 1961721.167843
> [84951.681321]   .se->load.weight               : 2
> [84951.681323] 
> [84951.681323] cfs_rq[1]:/
> [84951.681325]   .exec_clock                    : 0.000000
> [84951.681326]   .MIN_vruntime                  : 0.000001
> [84951.681328]   .min_vruntime                  : 71918860.488271
> [84951.681330]   .max_vruntime                  : 0.000001
> [84951.681331]   .spread                        : 0.000000
> [84951.681333]   .spread0                       : 19563460.918461
> [84951.681334]   .nr_spread_over                : 0
> [84951.681336]   .nr_running                    : 1
> [84951.681337]   .load                          : 1024
> [84951.681339]   .load_avg                      : 0.000000
> [84951.681340]   .load_period                   : 0.000000
> [84951.681342]   .load_contrib                  : 0
> [84951.681343]   .load_tg                       : 0
> [84951.681345] 
> [84951.681345] rt_rq[1]:
> [84951.681346]   .rt_nr_running                 : 0
> [84951.681348]   .rt_throttled                  : 0
> [84951.681349]   .rt_time                       : 0.000000
> [84951.681351]   .rt_runtime                    : 950.000000
> [84951.681353] 
> [84951.681353] runnable tasks:
> [84951.681354]             task   PID         tree-key  switches  prio     exec-runtime         sum-exec        sum-sleep
> [84951.681355] ----------------------------------------------------------------------------------------------------------
> [84951.681370] Rbtrfs-delalloc-  5589  71918860.488271    173262   120               0               0               0.000000               0.000000               0.000000 /



> [84959.028005] Sched Debug Version: v0.10, 3.4.4-amd64-preempt-noide-20120410 #1
> [84959.028005] ktime                                   : 84959073.124471
> [84959.028005] sched_clk                               : 55828156.022356
> [84959.028005] cpu_clk                                 : 84959028.005869
> [84959.028005] jiffies                                 : 4316132062
> [84959.028005] sched_clock_stable                      : 0
> [84959.028005] 
> [84959.028005] sysctl_sched
> [84959.028005]   .sysctl_sched_latency                    : 12.000000
> [84959.028005]   .sysctl_sched_min_granularity            : 1.500000
> [84959.028005]   .sysctl_sched_wakeup_granularity         : 2.000000
> [84959.028005]   .sysctl_sched_child_runs_first           : 0
> [84959.028005]   .sysctl_sched_features                   : 24119
> [84959.028005]   .sysctl_sched_tunable_scaling            : 1 (logaritmic)
> [84959.028005] 
> [84959.028005] cpu#0, 2527.110 MHz
> [84959.028005]   .nr_running                    : 1
> [84959.028005]   .load                          : 1024
> [84959.028005]   .nr_switches                   : 298019849
> [84959.028005]   .nr_load_updates               : 16860702
> [84959.028005]   .nr_uninterruptible            : -7812
> [84959.028005]   .next_balance                  : 4316.132064
> [84959.028005]   .curr->pid                     : 5589
> [84959.028005]   .clock                         : 84959024.005728
> [84959.028005]   .cpu_load[0]                   : 1024
> [84959.028005]   .cpu_load[1]                   : 1024
> [84959.028005]   .cpu_load[2]                   : 1024
> [84959.028005]   .cpu_load[3]                   : 1024
> [84959.028005]   .cpu_load[4]                   : 1024
> [84959.028005] 
> [84959.028005] cfs_rq[0]:/
> [84959.028005]   .exec_clock                    : 0.000000
> [84959.028005]   .MIN_vruntime                  : 0.000001
> [84959.028005]   .min_vruntime                  : 52362263.842363
> [84959.028005]   .max_vruntime                  : 0.000001
> [84959.028005]   .spread                        : 0.000000
> [84959.028005]   .spread0                       : 0.000000
> [84959.028005]   .nr_spread_over                : 0
> [84959.028005]   .nr_running                    : 1
> [84959.028005]   .load                          : 1024
> [84959.028005]   .load_avg                      : 0.000000
> [84959.028005]   .load_period                   : 0.000000
> [84959.028005]   .load_contrib                  : 0
> [84959.028005]   .load_tg                       : 0
> [84959.028005] 
> [84959.028005] rt_rq[0]:
> [84959.028005]   .rt_nr_running                 : 0
> [84959.028005]   .rt_throttled                  : 0
> [84959.028005]   .rt_time                       : 0.000000
> [84959.028005]   .rt_runtime                    : 950.000000
> [84959.028005] 
> [84959.028005] runnable tasks:
> [84959.028005]             task   PID         tree-key  switches  prio     exec-runtime         sum-exec        sum-sleep
> [84959.028005] ----------------------------------------------------------------------------------------------------------
> [84959.028005] Rbtrfs-delalloc-  5589  52362263.842363    173591   120               0               0               0.000000               0.000000               0.000000 /
> [84959.028005] 
> [84959.028005] cpu#1, 2527.110 MHz
> [84959.028005]   .nr_running                    : 0
> [84959.028005]   .load                          : 0
> [84959.028005]   .nr_switches                   : 193660011
> [84959.028005]   .nr_load_updates               : 19157733
> [84959.028005]   .nr_uninterruptible            : 7826
> [84959.028005]   .next_balance                  : 4316.132092
> [84959.028005]   .curr->pid                     : 0
> [84959.028005]   .clock                         : 84959064.238592
> [84959.028005]   .cpu_load[0]                   : 4096
> [84959.028005]   .cpu_load[1]                   : 2072
> [84959.028005]   .cpu_load[2]                   : 1194
> [84959.028005]   .cpu_load[3]                   : 869
> [84959.028005]   .cpu_load[4]                   : 762
> [84959.028005] 
> [84959.028005] cfs_rq[1]:/autogroup-69
> [84959.028005]   .exec_clock                    : 0.000000
> [84959.028005]   .MIN_vruntime                  : 0.000001
> [84959.028005]   .min_vruntime                  : 223361.691158
> [84959.028005]   .max_vruntime                  : 0.000001
> [84959.028005]   .spread                        : 0.000000
> [84959.028005]   .spread0                       : -52138902.151205
> [84959.028005]   .nr_spread_over                : 0
> [84959.028005]   .nr_running                    : 0
> [84959.028005]   .load                          : 0
> [84959.028005]   .load_avg                      : 2590.643456
> [84959.028005]   .load_period                   : 6.494722
> [84959.028005]   .load_contrib                  : 398
> [84959.028005]   .load_tg                       : 398
> [84959.028005]   .se->exec_start                : 84959056.335435
> [84959.028005]   .se->vruntime                  : 71921556.083864
> [84959.028005]   .se->sum_exec_runtime          : 271360.995455
> [84959.028005]   .se->load.weight               : 2
> [84959.028005] 
> [84959.028005] cfs_rq[1]:/autogroup-20
> [84959.028005]   .exec_clock                    : 0.000000
> [84959.028005]   .MIN_vruntime                  : 0.000001
> [84959.028005]   .min_vruntime                  : 16585.443509
> [84959.028005]   .max_vruntime                  : 0.000001
> [84959.028005]   .spread                        : 0.000000
> [84959.028005]   .spread0                       : -52345678.398854
> [84959.028005]   .nr_spread_over                : 0
> [84959.028005]   .nr_running                    : 0
> [84959.028005]   .load                          : 0
> [84959.028005]   .load_avg                      : 347.126752
> [84959.028005]   .load_period                   : 6.020438
> [84959.028005]   .load_contrib                  : 57
> [84959.028005]   .load_tg                       : 57
> [84959.028005]   .se->exec_start                : 84959060.122289
> [84959.028005]   .se->vruntime                  : 71921556.040907
> [84959.028005]   .se->sum_exec_runtime          : 16606.326222
> [84959.028005]   .se->load.weight               : 2
> [84959.028005] 
> [84959.028005] cfs_rq[1]:/autogroup-84
> [84959.028005]   .exec_clock                    : 0.000000
> [84959.028005]   .MIN_vruntime                  : 0.000001
> [84959.028005]   .min_vruntime                  : 7072493.859899
> [84959.028005]   .max_vruntime                  : 0.000001
> [84959.028005]   .spread                        : 0.000000
> [84959.028005]   .spread0                       : -45289769.982464
> [84959.028005]   .nr_spread_over                : 0
> [84959.028005]   .nr_running                    : 0
> [84959.028005]   .load                          : 0
> [84959.028005]   .load_avg                      : 6650.423655
> [84959.028005]   .load_period                   : 8.469275
> [84959.028005]   .load_contrib                  : 785
> [84959.028005]   .load_tg                       : 785
> [84959.028005]   .se->exec_start                : 84959064.185687
> [84959.028005]   .se->vruntime                  : 71921564.865429
> [84959.028005]   .se->sum_exec_runtime          : 6504348.221165
> [84959.028005]   .se->load.weight               : 2
> [84959.028005] 
> [84959.028005] cfs_rq[1]:/autogroup-28
> [84959.028005]   .exec_clock                    : 0.000000
> [84959.028005]   .MIN_vruntime                  : 0.000001
> [84959.028005]   .min_vruntime                  : 1961906.582524
> [84959.028005]   .max_vruntime                  : 0.000001
> [84959.028005]   .spread                        : 0.000000
> [84959.028005]   .spread0                       : -50400357.259839
> [84959.028005]   .nr_spread_over                : 0
> [84959.028005]   .nr_running                    : 0
> [84959.028005]   .load                          : 0
> [84959.028005]   .load_avg                      : 994.827639
> [84959.028005]   .load_period                   : 7.072831
> [84959.028005]   .load_contrib                  : 140
> [84959.028005]   .load_tg                       : 140
> [84959.028005]   .se->exec_start                : 84959064.180949
> [84959.028005]   .se->vruntime                  : 71921559.003595
> [84959.028005]   .se->sum_exec_runtime          : 1962191.771814
> [84959.028005]   .se->load.weight               : 2
> [84959.028005] 
> [84959.028005] cfs_rq[1]:/autogroup-39
> [84959.028005]   .exec_clock                    : 0.000000
> [84959.028005]   .MIN_vruntime                  : 0.000001
> [84959.028005]   .min_vruntime                  : 4239411.104825
> [84959.028005]   .max_vruntime                  : 0.000001
> [84959.028005]   .spread                        : 0.000000
> [84959.028005]   .spread0                       : -48122852.737538
> [84959.028005]   .nr_spread_over                : 0
> [84959.028005]   .nr_running                    : 0
> [84959.028005]   .load                          : 0
> [84959.028005]   .load_avg                      : 7224.556007
> [84959.028005]   .load_period                   : 8.698458
> [84959.028005]   .load_contrib                  : 830
> [84959.028005]   .load_tg                       : 830
> [84959.028005]   .se->exec_start                : 84959060.995273
> [84959.028005]   .se->vruntime                  : 71921558.657276
> [84959.028005]   .se->sum_exec_runtime          : 4312867.647890
> [84959.028005]   .se->load.weight               : 2
> [84959.028005] 
> [84959.028005] cfs_rq[1]:/autogroup-74
> [84959.028005]   .exec_clock                    : 0.000000
> [84959.028005]   .MIN_vruntime                  : 0.000001
> [84959.028005]   .min_vruntime                  : 1424529.029769
> [84959.028005]   .max_vruntime                  : 0.000001
> [84959.028005]   .spread                        : 0.000000
> [84959.028005]   .spread0                       : -50937734.812594
> [84959.028005]   .nr_spread_over                : 0
> [84959.028005]   .nr_running                    : 0
> [84959.028005]   .load                          : 0
> [84959.028005]   .load_avg                      : 6439.299117
> [84959.028005]   .load_period                   : 8.633038
> [84959.028005]   .load_contrib                  : 745
> [84959.028005]   .load_tg                       : 745
> [84959.028005]   .se->exec_start                : 84959061.053899
> [84959.028005]   .se->vruntime                  : 71921559.096778
> [84959.028005]   .se->sum_exec_runtime          : 2566083.226405
> [84959.028005]   .se->load.weight               : 2
> [84959.028005] 
> [84959.028005] cfs_rq[1]:/
> [84959.028005]   .exec_clock                    : 0.000000
> [84959.028005]   .MIN_vruntime                  : 0.000001
> [84959.028005]   .min_vruntime                  : 71921564.865429
> [84959.028005]   .max_vruntime                  : 0.000001
> [84959.028005]   .spread                        : 0.000000
> [84959.028005]   .spread0                       : 19559301.023066
> [84959.028005]   .nr_spread_over                : 0
> [84959.028005]   .nr_running                    : 0
> [84959.028005]   .load                          : 0
> [84959.028005]   .load_avg                      : 0.000000
> [84959.028005]   .load_period                   : 0.000000
> [84959.028005]   .load_contrib                  : 0
> [84959.028005]   .load_tg                       : 0
> [84959.028005] 
> [84959.028005] rt_rq[1]:
> [84959.028005]   .rt_nr_running                 : 0
> [84959.028005]   .rt_throttled                  : 0
> [84959.028005]   .rt_time                       : 0.000000
> [84959.028005]   .rt_runtime                    : 950.000000
> [84959.028005] 
> [84959.028005] runnable tasks:
> [84959.028005]             task   PID         tree-key  switches  prio     exec-runtime         sum-exec        sum-sleep
> [84959.028005] ----------------------------------------------------------------------------------------------------------
> [84959.028005] 
> [84960.376059] INFO: task flush-btrfs-1:399 blocked for more than 120 seconds.



> [85748.849847] BTRFS warning (device dm-0): Aborting unused transaction.
> [85754.690251] BTRFS warning (device dm-0): Aborting unused transaction.
> [85754.787953] BTRFS warning (device dm-0): Aborting unused transaction.
> [85754.871973] BTRFS warning (device dm-0): Aborting unused transaction.
> [85754.971097] BTRFS warning (device dm-0): Aborting unused transaction.
> [85755.242875] BTRFS warning (device dm-0): Aborting unused transaction.
> [85755.522664] BTRFS warning (device dm-0): Aborting unused transaction.
> [85755.670110] BTRFS warning (device dm-0): Aborting unused transaction.
> [85755.811342] BTRFS warning (device dm-0): Aborting unused transaction.
> [85755.971524] BTRFS warning (device dm-0): Aborting unused transaction.
> [85756.175633] BTRFS warning (device dm-0): Aborting unused transaction.
> [85756.329593] BTRFS warning (device dm-0): Aborting unused transaction.
> [85757.064995] BTRFS warning (device dm-0): Aborting unused transaction.
> [85757.210976] BTRFS warning (device dm-0): Aborting unused transaction.
> [85758.058191] BTRFS warning (device dm-0): Aborting unused transaction.
> [85758.454009] BTRFS warning (device dm-0): Aborting unused transaction.
> [85758.742629] BTRFS warning (device dm-0): Aborting unused transaction.
> [85759.082905] BTRFS warning (device dm-0): Aborting unused transaction.
> [85760.355436] BTRFS warning (device dm-0): Aborting unused transaction.
> [85760.795476] BTRFS warning (device dm-0): Aborting unused transaction.
> [85760.921292] BTRFS warning (device dm-0): Aborting unused transaction.
> [85761.041269] BTRFS warning (device dm-0): Aborting unused transaction.
> [85761.160903] BTRFS warning (device dm-0): Aborting unused transaction.
> [85761.265970] BTRFS warning (device dm-0): Aborting unused transaction.
> [85761.376858] BTRFS warning (device dm-0): Aborting unused transaction.
> [85762.271769] BTRFS warning (device dm-0): Aborting unused transaction.
> [85762.888725] BTRFS warning (device dm-0): Aborting unused transaction.


-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Long btrfs hangs during suspend to RAM / BTRFS warning (device dm-0): Aborting unused transaction
  2012-07-04 15:15           ` Marc MERLIN
@ 2012-07-05 13:25             ` Liu Bo
  2012-07-05 14:34               ` Marc MERLIN
  0 siblings, 1 reply; 10+ messages in thread
From: Liu Bo @ 2012-07-05 13:25 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: linux-btrfs

On 07/04/2012 11:15 PM, Marc MERLIN wrote:

> On Wed, Jul 04, 2012 at 01:58:31PM +0800, Liu Bo wrote:
>> > The dmesg log, sysrq log and stack dump info can usually be very helpful.
>> > 
>> > From your report, we can see the csum error and hang on log,
>> > 'no csum' is not that bad while hanging-on is serious and dangerous.
>> > 
>> > so can you please get any 'sysrq + w' log in the hanging-on case and paste them here,
>> > and the log may tell us who blocks other threads.
> 
> Hi, thanks for the answer.
>  
> I dumped all sysrq data, that was in my original Email. Here are two
> different sysrq+w runs, as well as aborted transaction messages from that
> Email. 
> Sorry that the original was a bit long and contained a bunch of sysrq output.
> 
>>From doing further testing since then, it does seem that the code just start
> doing bad things, including the file corruption I saw, when I'm running low
> on free space.
> 
> Anything else that would help?
> 
> Thanks,
> Marc
> 


I'd expect to get some info from the following one, but I fails.

Is it reproducible on your box?

thanks,
liubo

> 
> Back to 3.2.16, I'm now seeing this:
> [  840.516733] INFO: task VirtualBox:6818 blocked for more than 120 seconds.
> [  840.516735] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  840.516736] VirtualBox      D ffff8801fd134080     0  6818   6758 0x00000080
> [  840.516740]  ffff8801fd134080 0000000000000086 0000000000000050 ffff880202e7f100
> [  840.516744]  0000000000013580 ffff8801c6f0dfd8 ffff8801c6f0dfd8 ffff8801fd134080
> [  840.516748]  ffff8801c6f0da68 ffff8801c6f0da68 ffff88020a4e22f0 ffff88023bc13e08
> [  840.516752] Call Trace:
> [  840.516755]  [<ffffffff810b5c67>] ? __lock_page+0x66/0x66
> [  840.516758]  [<ffffffff8134aea4>] ? io_schedule+0x58/0x6f
> [  840.516761]  [<ffffffff810b5c6d>] ? sleep_on_page+0x6/0xa
> [  840.516764]  [<ffffffff8134b1e5>] ? __wait_on_bit_lock+0x3c/0x85
> [  840.516767]  [<ffffffff810b5c62>] ? __lock_page+0x61/0x66
> [  840.516770]  [<ffffffff81060051>] ? autoremove_wake_function+0x2a/0x2a
> [  840.516785]  [<ffffffffa01838d7>] ? extent_write_cache_pages.isra.13.constprop.22+0xf6/0x278 [btrfs]
> [  840.516789]  [<ffffffff810ec9cb>] ? __cache_free.isra.40+0x19/0x1a7
> [  840.516792]  [<ffffffff8134ed52>] ? sub_preempt_count+0x83/0x94
> [  840.516795]  [<ffffffff8134c2dd>] ? _raw_spin_unlock+0x24/0x30
> [  840.516811]  [<ffffffffa0183c4b>] ? extent_writepages+0x40/0x57 [btrfs]
> [  840.516826]  [<ffffffffa0177f5f>] ? __btrfs_buffered_write+0x2bb/0x2dc [btrfs]
> [  840.516841]  [<ffffffffa016e88a>] ? uncompress_inline.isra.44+0x116/0x116 [btrfs]
> [  840.516844]  [<ffffffff810b6aaf>] ? __filemap_fdatawrite_range+0x4b/0x50
> [  840.516847]  [<ffffffff810b6ad9>] ? filemap_write_and_wait_range+0x25/0x4d
> [  840.516863]  [<ffffffffa01782ce>] ? btrfs_file_aio_write+0x34e/0x490 [btrfs]
> [  840.516866]  [<ffffffff8103e092>] ? get_parent_ip+0x9/0x1b
> [  840.516882]  [<ffffffffa0177f80>] ? __btrfs_buffered_write+0x2dc/0x2dc [btrfs]
> [  840.516886]  [<ffffffff8112f19c>] ? aio_rw_vect_retry+0x70/0x18e
> [  840.516888]  [<ffffffff8112f12c>] ? aio_fsync+0x22/0x22
> [  840.516891]  [<ffffffff8112fbc7>] ? aio_run_iocb+0x72/0x11c
> [  840.516894]  [<ffffffff81130d9a>] ? do_io_submit+0x6a4/0x7f9
> [  840.516898]  [<ffffffff813508d2>] ? system_call_fastpath+0x16/0x1b
> [ 1187.553635] btrfs: unlinked 8 orphans
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Long btrfs hangs during suspend to RAM / BTRFS warning (device dm-0): Aborting unused transaction
  2012-07-05 13:25             ` Liu Bo
@ 2012-07-05 14:34               ` Marc MERLIN
  2012-07-18 18:01                 ` Long btrfs hangs during suspend to RAM / BTRFS warning (device Marc MERLIN
  0 siblings, 1 reply; 10+ messages in thread
From: Marc MERLIN @ 2012-07-05 14:34 UTC (permalink / raw)
  To: Liu Bo; +Cc: linux-btrfs

On Thu, Jul 05, 2012 at 09:25:44PM +0800, Liu Bo wrote:
> On 07/04/2012 11:15 PM, Marc MERLIN wrote:
> I'd expect to get some info from the following one, but I fails.
> 
> Is it reproducible on your box?

It was reproducible when I reported it, but not anymore now. I couldn't stay
in a state where I was getting hangs and corruption.

I had problems with btrfs-delalloc- taking most of the CPU and deadlocking
the machine for minutes or even over half an hour. I'm pretty sure it was
related to the fact that I only had 10GB free on my partition (including
lots of snapshots).
Now that I'm back to >40GB free, I haven't had the problem come back.

How about these I reported:
[102231.845170] btrfs no csum found for inode 3910562 start 0
[102231.855415] btrfs_readpage_end_io_hook: 26 callbacks suppressed
[102231.855418] btrfs csum failed ino 3910562 off 0 csum 3145117582 private 0
[102231.999555] btrfs no csum found for inode 3910588 start 294912
[102232.009479] btrfs csum failed ino 3910588 off 294912 csum 697692408 private 0
[102232.142602] btrfs no csum found for inode 3910588 start 360448
[102232.142661] btrfs no csum found for inode 3910588 start 393216
[102232.968939] BTRFS warning (device dm-0): Aborting unused transaction.
[102234.107914] BTRFS warning (device dm-0): Aborting unused transaction.
[102235.218139] BTRFS warning (device dm-0): Aborting unused transaction.

I know I got corruption since I saw some of my files being damaged, but 
are they useful in any way, or not so much?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Long btrfs hangs during suspend to RAM / BTRFS warning (device
  2012-07-05 14:34               ` Marc MERLIN
@ 2012-07-18 18:01                 ` Marc MERLIN
  2012-07-19  1:00                   ` Liu Bo
  0 siblings, 1 reply; 10+ messages in thread
From: Marc MERLIN @ 2012-07-18 18:01 UTC (permalink / raw)
  To: Liu Bo; +Cc: linux-btrfs

On Thu, Jul 05, 2012 at 07:34:17AM -0700, Marc MERLIN wrote:
> On Thu, Jul 05, 2012 at 09:25:44PM +0800, Liu Bo wrote:
> > On 07/04/2012 11:15 PM, Marc MERLIN wrote:
> > I'd expect to get some info from the following one, but I fails.
> > 
> > Is it reproducible on your box?
> 
> It was reproducible when I reported it, but not anymore now. I couldn't stay
> in a state where I was getting hangs and corruption.
> 
> I had problems with btrfs-delalloc- taking most of the CPU and deadlocking
> the machine for minutes or even over half an hour. I'm pretty sure it was
> related to the fact that I only had 10GB free on my partition (including
> lots of snapshots).
> Now that I'm back to >40GB free, I haven't had the problem come back.
> 
> How about these I reported:
> [102231.845170] btrfs no csum found for inode 3910562 start 0
> [102231.855415] btrfs_readpage_end_io_hook: 26 callbacks suppressed
> [102231.855418] btrfs csum failed ino 3910562 off 0 csum 3145117582 private 0
> [102231.999555] btrfs no csum found for inode 3910588 start 294912
> [102232.009479] btrfs csum failed ino 3910588 off 294912 csum 697692408 private 0
> [102232.142602] btrfs no csum found for inode 3910588 start 360448
> [102232.142661] btrfs no csum found for inode 3910588 start 393216
> [102232.968939] BTRFS warning (device dm-0): Aborting unused transaction.
> [102234.107914] BTRFS warning (device dm-0): Aborting unused transaction.
> [102235.218139] BTRFS warning (device dm-0): Aborting unused transaction.
> 
> I know I got corruption since I saw some of my files being damaged, but 
> are they useful in any way, or not so much?

Just to put a lid (maybe) on that thread, I now think that it's very
possible my btrfs hangs when the drive was almost full (10GB remaining
though) were due to my Crucial RealSSD C300, which has just mostly died
yesterday after (I think) the firmware got too many problems trying to remap
blocks (garbage collection) since I wasn't using TRIM.
(yes, I had the latest 007 firmware on the SSD).

I don't know that the SSD caused the corruption I saw in some files and that
went away when I dropped the latest btrfs snapshot, but at this point I'm
pretty sure that the SSD went into emergency garbage collection mode when
its space was running low, and in turn was causing btrfs to hang.

Sorry for the most likely false alarm (except for snapshot corruption which
is likely btrfs related).

I'll post another message about no TRIM with dmcrypt and possible ensuing
SSD problems.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Long btrfs hangs during suspend to RAM / BTRFS warning (device
  2012-07-18 18:01                 ` Long btrfs hangs during suspend to RAM / BTRFS warning (device Marc MERLIN
@ 2012-07-19  1:00                   ` Liu Bo
  0 siblings, 0 replies; 10+ messages in thread
From: Liu Bo @ 2012-07-19  1:00 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: linux-btrfs

On 07/19/2012 02:01 AM, Marc MERLIN wrote:

> 
> Just to put a lid (maybe) on that thread, I now think that it's very
> possible my btrfs hangs when the drive was almost full (10GB remaining
> though) were due to my Crucial RealSSD C300, which has just mostly died
> yesterday after (I think) the firmware got too many problems trying to remap
> blocks (garbage collection) since I wasn't using TRIM.
> (yes, I had the latest 007 firmware on the SSD).
> 
> I don't know that the SSD caused the corruption I saw in some files and that
> went away when I dropped the latest btrfs snapshot, but at this point I'm
> pretty sure that the SSD went into emergency garbage collection mode when
> its space was running low, and in turn was causing btrfs to hang.
> 
> Sorry for the most likely false alarm (except for snapshot corruption which
> is likely btrfs related).
> 
> I'll post another message about no TRIM with dmcrypt and possible ensuing
> SSD problems.
> 
> Marc


Hi Marc,

It's ok, any user report is WELCOME. :)

thanks,
liubo

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2012-07-19  1:41 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20120626193637.GA27856@merlins.org>
2012-06-27  1:38 ` Long btrfs hangs during suspend to RAM / BTRFS warning (device dm-0): Aborting unused transaction Marc MERLIN
2012-06-27  5:20   ` Marc MERLIN
2012-06-29 12:36     ` Marc MERLIN
2012-07-02 19:58       ` Marc MERLIN
2012-07-04  5:58         ` Liu Bo
2012-07-04 15:15           ` Marc MERLIN
2012-07-05 13:25             ` Liu Bo
2012-07-05 14:34               ` Marc MERLIN
2012-07-18 18:01                 ` Long btrfs hangs during suspend to RAM / BTRFS warning (device Marc MERLIN
2012-07-19  1:00                   ` Liu Bo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.