* 3.16.3: fs/btrfs/delayed-inode.c:1410 btrfs_assert_delayed_root_empty
@ 2014-12-28 19:26 Marc MERLIN
  2014-12-28 20:00 ` Roman Mamedov
  0 siblings, 1 reply; 10+ messages in thread
From: Marc MERLIN @ 2014-12-28 19:26 UTC (permalink / raw)
  To: Btrfs BTRFS

Not sure if it's useful to anyone, but there you go. This happened after a forced
power cycle:

BTRFS info (device dm-1): disk space caching is enabled
------------[ cut here ]------------
WARNING: CPU: 2 PID: 778 at fs/btrfs/delayed-inode.c:1410 btrfs_assert_delayed_root_empty+0x32/0x34()
Modules linked in: aes_x86_64 lm85 hwmon_vid dm_snapshot dm_bufio iptable_nat ip_tables nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_conntrack_ftp ipt_MASQUERADE nf_nat x_tables nf_conntrack sg st snd_pcm_oss snd_mixer_oss fuse snd_hda_codec_realtek snd_hda_codec_generic microcode snd_cmipci gameport snd_hda_intel kvm_intel snd_hda_controller kvm snd_hda_codec snd_opl3_lib eeepc_wmi snd_mpu401_uart snd_seq_midi snd_seq_midi_event snd_seq asus_wmi battery snd_rawmidi snd_hwdep sparse_keymap rfkill snd_pcm snd_seq_device tpm_infineon snd_timer tpm_tis rc_ati_x10 asix coretemp tpm i2c_i801 snd processor wmi pl2303 kl5kusb105 libphy ati_remote parport_pc rc_core xhci_hcd intel_rapl keyspan ftdi_sio evdev usbnet soundcore pcspkr parport lpc_ich intel_powerclamp ezusb usbserial x86_pkg_temp_thermal xts gf128mul dm_crypt dm_mod raid456 async_raid6_recov async_pq async_xor async_memcpy async_tx e1000e ptp pps_core crc32c_intel crc32_pclmul sata_sil24 thermal crct10dif_pclmul ehci_pci ehci_hcd ghash_clmulni_intel r8169 cryptd fan mii usbcore usb_common sata_mv
CPU: 2 PID: 778 Comm: btrfs-transacti Tainted: G        W     3.16.7-amd64-i915-volpreempt-20141114 #1
Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3806 08/20/2012
 0000000000000000 ffff8802117abdc8 ffffffff816295db 0000000000000000
 ffff8802117abe00 ffffffff81051e2d ffffffff8127038d ffff880211534000
 ffff880212639980 0000000000000000 ffff8802137eaf00 ffff8802117abe10
Call Trace:
 [<ffffffff816295db>] dump_stack+0x45/0x56
 [<ffffffff81051e2d>] warn_slowpath_common+0x7f/0x98
 [<ffffffff8127038d>] ? btrfs_assert_delayed_root_empty+0x32/0x34
 [<ffffffff81051ef4>] warn_slowpath_null+0x1a/0x1c
 [<ffffffff8127038d>] btrfs_assert_delayed_root_empty+0x32/0x34
 [<ffffffff8122db95>] btrfs_commit_transaction+0x37f/0x867
 [<ffffffff8122a2f1>] transaction_kthread+0xec/0x19f
 [<ffffffff8122a205>] ? btrfs_cleanup_transaction+0x3f3/0x3f3
 [<ffffffff8106cd8f>] kthread+0xae/0xb6
 [<ffffffff8106cce1>] ? __kthread_parkme+0x61/0x61
 [<ffffffff8163007c>] ret_from_fork+0x7c/0xb0
 [<ffffffff8106cce1>] ? __kthread_parkme+0x61/0x61
---[ end trace 32de13ca415f14fa ]---

I'm now getting this on interval as kernel spam.

It doesn't say which of my 4 btrfs volumes this is linked to, which
doesn't make life easier.

Any idea what I should do from here?

Will btrfs scrub, even if it takes about 24H to run for me, tell me
which FS is affected and if so do I run btrfs repair?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901


* Re: 3.16.3: fs/btrfs/delayed-inode.c:1410 btrfs_assert_delayed_root_empty
  2014-12-28 19:26 3.16.3: fs/btrfs/delayed-inode.c:1410 btrfs_assert_delayed_root_empty Marc MERLIN
@ 2014-12-28 20:00 ` Roman Mamedov
  2014-12-28 21:36   ` Marc MERLIN
  2014-12-30  1:06   ` Qu Wenruo
  0 siblings, 2 replies; 10+ messages in thread
From: Roman Mamedov @ 2014-12-28 20:00 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: Btrfs BTRFS

On Sun, 28 Dec 2014 11:26:14 -0800
Marc MERLIN <marc@merlins.org> wrote:

> Not sure if it's useful to anyone, but there you go. This happened after a forced
> power cycle:
> 
> BTRFS info (device dm-1): disk space caching is enabled
> ------------[ cut here ]------------
> WARNING: CPU: 2 PID: 778 at fs/btrfs/delayed-inode.c:1410 btrfs_assert_delayed_root_empty+0x32/0x34()
> Modules linked in: aes_x86_64 lm85 hwmon_vid dm_snapshot dm_bufio iptable_nat ip_tables nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_conntrack_ftp ipt_MASQUERADE nf_nat x_tables nf_conntrack sg st snd_pcm_oss snd_mixer_oss fuse snd_hda_codec_realtek snd_hda_codec_generic microcode snd_cmipci gameport snd_hda_intel kvm_intel snd_hda_controller kvm snd_hda_codec snd_opl3_lib eeepc_wmi snd_mpu401_uart snd_seq_midi snd_seq_midi_event snd_seq asus_wmi battery snd_rawmidi snd_hwdep sparse_keymap rfkill snd_pcm snd_seq_device tpm_infineon snd_timer tpm_tis rc_ati_x10 asix coretemp tpm i2c_i801 snd processor wmi pl2303 kl5kusb105 libphy ati_remote parport_pc rc_core xhci_hcd intel_rapl keyspan ftdi_sio evdev usbnet soundcore pcspkr parport lpc_ich intel_powerclamp ezusb usbserial x86_pkg_temp_thermal xts gf128mul dm_crypt dm_mod raid456 async_raid6_recov async_pq async_xor async_memcpy async_tx e1000e ptp pps_core crc32c_intel crc32_pclmul sata_sil24 thermal crct10dif_pclmul ehci_pci ehci_hcd ghash_clmulni_intel r8169 cryptd fan mii usbcore usb_common sata_mv
> CPU: 2 PID: 778 Comm: btrfs-transacti Tainted: G        W     3.16.7-amd64-i915-volpreempt-20141114 #1
> Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3806 08/20/2012
>  0000000000000000 ffff8802117abdc8 ffffffff816295db 0000000000000000
>  ffff8802117abe00 ffffffff81051e2d ffffffff8127038d ffff880211534000
>  ffff880212639980 0000000000000000 ffff8802137eaf00 ffff8802117abe10
> Call Trace:
>  [<ffffffff816295db>] dump_stack+0x45/0x56
>  [<ffffffff81051e2d>] warn_slowpath_common+0x7f/0x98
>  [<ffffffff8127038d>] ? btrfs_assert_delayed_root_empty+0x32/0x34
>  [<ffffffff81051ef4>] warn_slowpath_null+0x1a/0x1c
>  [<ffffffff8127038d>] btrfs_assert_delayed_root_empty+0x32/0x34
>  [<ffffffff8122db95>] btrfs_commit_transaction+0x37f/0x867
>  [<ffffffff8122a2f1>] transaction_kthread+0xec/0x19f
>  [<ffffffff8122a205>] ? btrfs_cleanup_transaction+0x3f3/0x3f3
>  [<ffffffff8106cd8f>] kthread+0xae/0xb6
>  [<ffffffff8106cce1>] ? __kthread_parkme+0x61/0x61
>  [<ffffffff8163007c>] ret_from_fork+0x7c/0xb0
>  [<ffffffff8106cce1>] ? __kthread_parkme+0x61/0x61
> ---[ end trace 32de13ca415f14fa ]---
> 
> I'm now getting this on interval as kernel spam.
> 
> It doesn't say which of my 4 btrfs volumes this is linked to, which
> doesn't make life easier.
> 
> Any idea what I should do from here?
> 
> Will btrfs scrub, even if it takes about 24H to run for me, tell me
> which FS is affected and if so do I run btrfs repair?

I had this: http://www.spinics.net/lists/linux-btrfs/msg40586.html

1) I determined which btrfs of the multiple ones that I have is the culprit, by
unmounting them one by one and seeing if the dmesg spam disappears;

2) Surprisingly(#1), it was not the one that was heavily operated on in the fashion
described in that message;

3) After that, I ran btrfsck (it did find some errors that looked like this,
repeated dozens of times, with different "root nnnnn" numbers):

root 22730 inode 97339 errors 200, dir isize wrong
root 22730 inode 4044171 errors 200, dir isize wrong
root 22730 inode 4478553 errors 200, dir isize wrong
root 22730 inode 6236418 errors 2000, link count wrong
        unresolved ref dir 105512 index 586340 namelen 48 name [redacted].dat.bak filetype 0 errors 3, no dir item, no dir index
root 22730 inode 6325949 errors 2000, link count wrong
        unresolved ref dir 105512 index 586348 namelen 48 name [redacted].dat.bak filetype 0 errors 3, no dir item, no dir index
root 22730 inode 6326136 errors 2000, link count wrong
        unresolved ref dir 105512 index 586344 namelen 48 name [redacted].dat.bak filetype 0 errors 3, no dir item, no dir index
root 22730 inode 6326291 errors 2000, link count wrong
        unresolved ref dir 104979 index 192292 namelen 16 name downloads.config filetype 0 errors 3, no dir item, no dir index
root 22730 inode 6326292 errors 2000, link count wrong
        unresolved ref dir 4376855 index 19522 namelen 15 name xfce4-panel.xml filetype 0 errors 3, no dir item, no dir index
root 22730 inode 6326296 errors 2000, link count wrong
        unresolved ref dir 104979 index 192295 namelen 18 name azureus.statistics filetype 0 errors 3, no dir item, no dir index
root 22730 inode 6326380 errors 2000, link count wrong
        unresolved ref dir 4478552 index 45107 namelen 11 name Local State filetype 0 errors 3, no dir item, no dir index
root 22730 inode 6326402 errors 2000, link count wrong
        unresolved ref dir 105469 index 233238 namelen 11 name diverse.dat filetype 0 errors 3, no dir item, no dir index
root 22730 inode 6326405 errors 2000, link count wrong
        unresolved ref dir 4478553 index 914792 namelen 17 name TransportSecurity filetype 0 errors 3, no dir item, no dir index

4) then btrfsck --repair;

5) The latter, after three hours flat of 100% CPU usage on a 4 GHz AMD FX, proceeding
very slowly with increasing the "root nnnnn" number and the same error descriptions,
still didn't manage to fix all of them; so I terminated it as I had other work to do on
the machine, rather than sitting around with its key FSes unmounted;

6) Surprisingly(#2), despite apparently not all of the errors having been
fixed, the btrfs_assert_delayed_root_empty messages no longer appear in dmesg.

The current versions of files mentioned (xfce4-panel.xml and parts of the Chromium profile)
were of course corrupted, but I already noticed that and restored them from an earlier snapshot
even before starting the fsck (yes I also had backups, but didn't need them as snapshotted versions
were fine).

-- 
With respect,
Roman


* Re: 3.16.3: fs/btrfs/delayed-inode.c:1410 btrfs_assert_delayed_root_empty
  2014-12-28 20:00 ` Roman Mamedov
@ 2014-12-28 21:36   ` Marc MERLIN
  2014-12-29 15:17     ` Chris Mason
  2014-12-30  1:06   ` Qu Wenruo
  1 sibling, 1 reply; 10+ messages in thread
From: Marc MERLIN @ 2014-12-28 21:36 UTC (permalink / raw)
  To: Roman Mamedov; +Cc: Btrfs BTRFS

On Mon, Dec 29, 2014 at 01:00:47AM +0500, Roman Mamedov wrote:
> > Will btrfs scrub, even if it takes about 24H to run for me, tell me
> > which FS is affected and if so do I run btrfs repair?
> 
> I had this: http://www.spinics.net/lists/linux-btrfs/msg40586.html
> 
> 1) I determined which btrfs of the multiple ones that I have is the culprit, by
> unmounting them one by one and seeing if the dmesg spam disappears;
 
And of course it's the root filesystem on a remote server which I can't
service remotely :-/

> 3) After that, I ran btrfsck (it did find some errors that looked like this,
> repeated dozens of times, with different "root nnnnn" numbers):
 
For the archives, one should use btrfs check --repair directly, btrfsck is
dead.

> 6) Surprisingly(#2), despite apparently not all of the errors having been
> fixed, the btrfs_assert_delayed_root_empty messages no longer appear in dmesg.
> 
> The current versions of files mentioned (xfce4-panel.xml and parts of the Chromium profile)
> were of course corrupted, but I already noticed that and restored them from an earlier snapshot
> even before starting the fsck (yes I also had backups, but didn't need them as snapshotted versions
> were fine).

Thanks for the info. I think for now I'll be forced to leave the broken
FS run as is and will deal with it when I get home.

Dear btrfs-devs: this is one more example of btrfs having a problem with
an inconsistent state that ended up on disk.

I got there this way:
- btrfs on top of dmcrypt on top of md raid1 (sorry too many raid bugs
  in btrfs, so I went back to mdadm at the time)
- kernel bug in a serial driver was causing a loop, so I was forced to
  cycle power remotely
- btrfs got broken as per this mail.
- please please please, all warnings and bugs should still be fixed to
  output what device they happened on. Making the admin guess by trying
  filesystem one by one isn't really a good way.

Anyway, assuming there isn't a core bug in the btrfs "always consistent
state on disk" code, dmcrypt or mdadm prevented a consistent state from
reaching the disks.

Separately, I wish I could just fix this while the filesystem is online.
btrfs scrub ran totally clean with no errors :(
scrub device /dev/mapper/cryptroot (id 1) done
        scrub started at Sun Dec 28 12:07:55 2014 and finished after 512 seconds
        total bytes scrubbed: 25.95GiB with 0 errors

Thankfully the filesystem is still running for now, so it could be worse.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901


* Re: 3.16.3: fs/btrfs/delayed-inode.c:1410 btrfs_assert_delayed_root_empty
  2014-12-28 21:36   ` Marc MERLIN
@ 2014-12-29 15:17     ` Chris Mason
  2014-12-29 15:41       ` Marc MERLIN
  0 siblings, 1 reply; 10+ messages in thread
From: Chris Mason @ 2014-12-29 15:17 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: Roman Mamedov, Btrfs BTRFS

On Sun, Dec 28, 2014 at 4:36 PM, Marc MERLIN <marc@merlins.org> wrote:
> On Mon, Dec 29, 2014 at 01:00:47AM +0500, Roman Mamedov wrote:
>>  > Will btrfs scrub, even if it takes about 24H to run for me, tell me
>>  > which FS is affected and if so do I run btrfs repair?
>> 
>>  I had this: http://www.spinics.net/lists/linux-btrfs/msg40586.html
>> 
>>  1) I determined which btrfs of the multiple ones that I have is the culprit, by
>>  unmounting them one by one and seeing if the dmesg spam disappears;
> 
> And of course it's the root filesystem on a remote server which I can't
> service remotely :-/
> 
>>  3) After that, I ran btrfsck (it did find some errors that looked like this,
>>  repeated dozens of times, with different "root nnnnn" numbers):
> 
> For the archives, one should use btrfs check --repair directly, btrfsck is
> dead.
> 
>>  6) Surprisingly(#2), despite apparently not all of the errors having been
>>  fixed, the btrfs_assert_delayed_root_empty messages no longer appear in dmesg.
>> 
>>  The current versions of files mentioned (xfce4-panel.xml and parts of the Chromium profile)
>>  were of course corrupted, but I already noticed that and restored them from an earlier snapshot
>>  even before starting the fsck (yes I also had backups, but didn't need them as snapshotted versions
>>  were fine).
> 
> Thanks for the info. I think for now I'll be forced to leave the broken
> FS run as is and will deal with it when I get home.
> 
> Dear btrfs-devs: this is one more example of btrfs having a problem with
> an inconsistent state that ended up on disk.
> 
> I got there this way:
> - btrfs on top of dmcrypt on top of md raid1 (sorry too many raid bugs
>   in btrfs, so I went back to mdadm at the time)
> - kernel bug in a serial driver was causing a loop, so I was forced to
>   cycle power remotely
> - btrfs got broken as per this mail.
> - please please please, all warnings and bugs should still be fixed to
>   output what device they happened on. Making the admin guess by trying
>   filesystem one by one isn't really a good way.
> 
> Anyway, assuming there isn't a core bug in the btrfs "always consistent
> state on disk" code, dmcrypt or mdadm prevented a consistent state from
> reaching the disks.
> 
> Separately, I wish I could just fix this while the filesystem is online.
> btrfs scrub ran totally clean with no errors :(
> scrub device /dev/mapper/cryptroot (id 1) done
>         scrub started at Sun Dec 28 12:07:55 2014 and finished after 512 seconds
>         total bytes scrubbed: 25.95GiB with 0 errors
> 
> Thankfully the filesystem is still running for now, so it could be worse.


I've hit this recently on my laptop, and haven't yet been able to 
recreate it on a machine where I can debug things.  The messages are an 
error in the log tree replay code, and I don't think they are actually 
related to any corruptions.  Trying to nail it down today.

-chris





* Re: 3.16.3: fs/btrfs/delayed-inode.c:1410 btrfs_assert_delayed_root_empty
  2014-12-29 15:17     ` Chris Mason
@ 2014-12-29 15:41       ` Marc MERLIN
  2014-12-29 15:57         ` Chris Mason
  0 siblings, 1 reply; 10+ messages in thread
From: Marc MERLIN @ 2014-12-29 15:41 UTC (permalink / raw)
  To: Chris Mason; +Cc: Roman Mamedov, Btrfs BTRFS

On Mon, Dec 29, 2014 at 10:17:00AM -0500, Chris Mason wrote:
> I've hit this recently on my laptop, and haven't yet been able to
> recreate it on a machine where I can debug things.  The messages are
> an error in the log tree replay code, and I don't think they are
> actually related to any corruptions.  Trying to nail it down today.

Thanks for the update and looking at it.

Just to rule things out for me, on your laptop, are you running btrfs
directly on disk, or do you have layers like dmcrypt in the middle?
(having 2 other layers myself, I never know if it's btrfs that could be
to blame, or the other 2 layers not passing data through in atomic bits
like they're supposed to)

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901


* Re: 3.16.3: fs/btrfs/delayed-inode.c:1410 btrfs_assert_delayed_root_empty
  2014-12-29 15:41       ` Marc MERLIN
@ 2014-12-29 15:57         ` Chris Mason
  0 siblings, 0 replies; 10+ messages in thread
From: Chris Mason @ 2014-12-29 15:57 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: Roman Mamedov, Btrfs BTRFS

On Mon, Dec 29, 2014 at 10:41 AM, Marc MERLIN <marc@merlins.org> wrote:
> On Mon, Dec 29, 2014 at 10:17:00AM -0500, Chris Mason wrote:
>>  I've hit this recently on my laptop, and haven't yet been able to
>>  recreate it on a machine where I can debug things.  The messages are
>>  an error in the log tree replay code, and I don't think they are
>>  actually related to any corruptions.  Trying to nail it down today.
> 
> Thanks for the update and looking at it.
> 
> Just to rule things out for me, on your laptop, are you running btrfs
> directly on disk, or do you have layers like dmcrypt in the middle?
> (having 2 other layers myself, I never know if it's btrfs that could be
> to blame, or the other 2 layers not passing data through in atomic bits
> like they're supposed to)

I do have dmcrypt, but I really think this is only in btrfs.

-chris





* Re: 3.16.3: fs/btrfs/delayed-inode.c:1410 btrfs_assert_delayed_root_empty
  2014-12-28 20:00 ` Roman Mamedov
  2014-12-28 21:36   ` Marc MERLIN
@ 2014-12-30  1:06   ` Qu Wenruo
  2014-12-31 18:30     ` [PATCH] Btrfs: don't delay inode ref updates during log replay Chris Mason
  1 sibling, 1 reply; 10+ messages in thread
From: Qu Wenruo @ 2014-12-30  1:06 UTC (permalink / raw)
  To: Roman Mamedov, Marc MERLIN; +Cc: Btrfs BTRFS


-------- Original Message --------
Subject: Re: 3.16.3: fs/btrfs/delayed-inode.c:1410 btrfs_assert_delayed_root_empty
From: Roman Mamedov <rm@romanrm.net>
To: Marc MERLIN <marc@merlins.org>
Date: 2014-12-29 04:00
> On Sun, 28 Dec 2014 11:26:14 -0800
> Marc MERLIN <marc@merlins.org> wrote:
>
>> Not sure if it's useful to anyone, but there you go. This happened after a forced
>> power cycle:
>>
>> BTRFS info (device dm-1): disk space caching is enabled
>> ------------[ cut here ]------------
>> WARNING: CPU: 2 PID: 778 at fs/btrfs/delayed-inode.c:1410 btrfs_assert_delayed_root_empty+0x32/0x34()
>> Modules linked in: aes_x86_64 lm85 hwmon_vid dm_snapshot dm_bufio iptable_nat ip_tables nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_conntrack_ftp ipt_MASQUERADE nf_nat x_tables nf_conntrack sg st snd_pcm_oss snd_mixer_oss fuse snd_hda_codec_realtek snd_hda_codec_generic microcode snd_cmipci gameport snd_hda_intel kvm_intel snd_hda_controller kvm snd_hda_codec snd_opl3_lib eeepc_wmi snd_mpu401_uart snd_seq_midi snd_seq_midi_event snd_seq asus_wmi battery snd_rawmidi snd_hwdep sparse_keymap rfkill snd_pcm snd_seq_device tpm_infineon snd_timer tpm_tis rc_ati_x10 asix coretemp tpm i2c_i801 snd processor wmi pl2303 kl5kusb105 libphy ati_remote parport_pc rc_core xhci_hcd intel_rapl keyspan ftdi_sio evdev usbnet soundcore pcspkr parport lpc_ich intel_powerclamp ezusb usbserial x86_pkg_temp_thermal xts gf128mul dm_crypt dm_mod raid456 async_raid6_recov async_pq async_xor async_memcpy async_tx e1000e ptp pps_core crc32c_intel crc32_pclmul sata_sil24 thermal crct10dif_pclmul ehci_pci ehci_hcd ghash_clmulni_intel r8169 cryptd fan mii usbcore usb_common sata_mv
>> CPU: 2 PID: 778 Comm: btrfs-transacti Tainted: G        W     3.16.7-amd64-i915-volpreempt-20141114 #1
>> Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3806 08/20/2012
>>   0000000000000000 ffff8802117abdc8 ffffffff816295db 0000000000000000
>>   ffff8802117abe00 ffffffff81051e2d ffffffff8127038d ffff880211534000
>>   ffff880212639980 0000000000000000 ffff8802137eaf00 ffff8802117abe10
>> Call Trace:
>>   [<ffffffff816295db>] dump_stack+0x45/0x56
>>   [<ffffffff81051e2d>] warn_slowpath_common+0x7f/0x98
>>   [<ffffffff8127038d>] ? btrfs_assert_delayed_root_empty+0x32/0x34
>>   [<ffffffff81051ef4>] warn_slowpath_null+0x1a/0x1c
>>   [<ffffffff8127038d>] btrfs_assert_delayed_root_empty+0x32/0x34
>>   [<ffffffff8122db95>] btrfs_commit_transaction+0x37f/0x867
>>   [<ffffffff8122a2f1>] transaction_kthread+0xec/0x19f
>>   [<ffffffff8122a205>] ? btrfs_cleanup_transaction+0x3f3/0x3f3
>>   [<ffffffff8106cd8f>] kthread+0xae/0xb6
>>   [<ffffffff8106cce1>] ? __kthread_parkme+0x61/0x61
>>   [<ffffffff8163007c>] ret_from_fork+0x7c/0xb0
>>   [<ffffffff8106cce1>] ? __kthread_parkme+0x61/0x61
>> ---[ end trace 32de13ca415f14fa ]---
>>
>> I'm now getting this on interval as kernel spam.
>>
>> It doesn't say which of my 4 btrfs volumes this is linked to, which
>> doesn't make life easier.
>>
>> Any idea what I should do from here?
>>
>> Will btrfs scrub, even if it takes about 24H to run for me, tell me
>> which FS is affected and if so do I run btrfs repair?
> I had this: http://www.spinics.net/lists/linux-btrfs/msg40586.html
>
> 1) I determined which btrfs of the multiple ones that I have is the culprit, by
> unmounting them one by one and seeing if the dmesg spam disappears;
>
> 2) Surprisingly(#1), it was not the one that was heavily operated on in the fashion
> described in that message;
>
> 3) After that, I ran btrfsck (it did find some errors that looked like this,
> repeated dozens of times, with different "root nnnnn" numbers):
>
> root 22730 inode 97339 errors 200, dir isize wrong
> root 22730 inode 4044171 errors 200, dir isize wrong
> root 22730 inode 4478553 errors 200, dir isize wrong
> root 22730 inode 6236418 errors 2000, link count wrong
>          unresolved ref dir 105512 index 586340 namelen 48 name [redacted].dat.bak filetype 0 errors 3, no dir item, no dir index
> root 22730 inode 6325949 errors 2000, link count wrong
>          unresolved ref dir 105512 index 586348 namelen 48 name [redacted].dat.bak filetype 0 errors 3, no dir item, no dir index
> root 22730 inode 6326136 errors 2000, link count wrong
>          unresolved ref dir 105512 index 586344 namelen 48 name [redacted].dat.bak filetype 0 errors 3, no dir item, no dir index
> root 22730 inode 6326291 errors 2000, link count wrong
>          unresolved ref dir 104979 index 192292 namelen 16 name downloads.config filetype 0 errors 3, no dir item, no dir index
> root 22730 inode 6326292 errors 2000, link count wrong
>          unresolved ref dir 4376855 index 19522 namelen 15 name xfce4-panel.xml filetype 0 errors 3, no dir item, no dir index
> root 22730 inode 6326296 errors 2000, link count wrong
>          unresolved ref dir 104979 index 192295 namelen 18 name azureus.statistics filetype 0 errors 3, no dir item, no dir index
> root 22730 inode 6326380 errors 2000, link count wrong
>          unresolved ref dir 4478552 index 45107 namelen 11 name Local State filetype 0 errors 3, no dir item, no dir index
> root 22730 inode 6326402 errors 2000, link count wrong
>          unresolved ref dir 105469 index 233238 namelen 11 name diverse.dat filetype 0 errors 3, no dir item, no dir index
> root 22730 inode 6326405 errors 2000, link count wrong
>          unresolved ref dir 4478553 index 914792 namelen 17 name TransportSecurity filetype 0 errors 3, no dir item, no dir index
According to the btrfsck output, it seems that at least 3 other users have
already hit this problem, and it seems to be the cause of the nlink problem.

Thanks,
Qu
>
> 4) then btrfsck --repair;
>
> 5) The latter, after three hours flat of 100% CPU usage on a 4 GHz AMD FX, proceeding
> very slowly with increasing the "root nnnnn" number and the same error descriptions,
> still didn't manage to fix all of them; so I terminated it as I had other work to do on
> the machine, rather than sitting around with its key FSes unmounted;
>
> 6) Surprisingly(#2), despite apparently not all of the errors having been
> fixed, the btrfs_assert_delayed_root_empty messages no longer appear in dmesg.
>
> The current versions of files mentioned (xfce4-panel.xml and parts of the Chromium profile)
> were of course corrupted, but I already noticed that and restored them from an earlier snapshot
> even before starting the fsck (yes I also had backups, but didn't need them as snapshotted versions
> were fine).
>




* [PATCH] Btrfs: don't delay inode ref updates during log replay
  2014-12-30  1:06   ` Qu Wenruo
@ 2014-12-31 18:30     ` Chris Mason
  2015-01-01 22:58       ` Marc MERLIN
  0 siblings, 1 reply; 10+ messages in thread
From: Chris Mason @ 2014-12-31 18:30 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Roman Mamedov, Marc MERLIN, Btrfs BTRFS

Commit 1d52c78afbb (Btrfs: try not to ENOSPC on log replay) added a
check to skip delayed inode updates during log replay because it
confuses the enospc code.  But the delayed processing will end up
skipping delayed refs from log replay because the inode itself wasn't
put through the delayed code.

This can end up triggering a warning at commit time:

WARNING: CPU: 2 PID: 778 at fs/btrfs/delayed-inode.c:1410 btrfs_assert_delayed_root_empty+0x32/0x34()

Which is repeated for each commit because we never process the delayed ref.

The fix used here is to change btrfs_delayed_delete_inode_ref to return
an error if we're currently in log replay.  The caller will do the ref
deletion immediately and everything will work properly.

This bug can cause lost files, which fsck will find.  --repair on
btrfs-progs 3.18 will fix them.

Signed-off-by: Chris Mason <clm@fb.com>
cc: stable@vger.kernel.org # v3.18 and any stable series that picked 1d52c78afbbf80b58299e076a159617d6b42fe3c

diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index 054577b..de4e70f 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -1857,6 +1857,14 @@ int btrfs_delayed_delete_inode_ref(struct inode *inode)
 {
 	struct btrfs_delayed_node *delayed_node;
 
+	/*
+	 * we don't do delayed inode updates during log recovery because it
+	 * leads to enospc problems.  This means we also can't do
+	 * delayed inode refs
+	 */
+	if (BTRFS_I(inode)->root->fs_info->log_root_recovering)
+		return -EAGAIN;
+
 	delayed_node = btrfs_get_or_create_delayed_node(inode);
 	if (IS_ERR(delayed_node))
 		return PTR_ERR(delayed_node);


* Re: [PATCH] Btrfs: don't delay inode ref updates during log replay
  2014-12-31 18:30     ` [PATCH] Btrfs: don't delay inode ref updates during log replay Chris Mason
@ 2015-01-01 22:58       ` Marc MERLIN
  2015-01-02  0:44         ` Qu Wenruo
  0 siblings, 1 reply; 10+ messages in thread
From: Marc MERLIN @ 2015-01-01 22:58 UTC (permalink / raw)
  To: Chris Mason, Qu Wenruo, Roman Mamedov, Btrfs BTRFS

Chris, you rule, I applied this to 3.16.7 and my problem went away.

Tested-By: Marc MERLIN <marc@merlins.org>

Happy new year! :)

Marc

On Wed, Dec 31, 2014 at 01:30:13PM -0500, Chris Mason wrote:
> Commit 1d52c78afbb (Btrfs: try not to ENOSPC on log replay) added a
> check to skip delayed inode updates during log replay because it
> confuses the enospc code.  But the delayed processing will end up
> skipping delayed refs from log replay because the inode itself wasn't
> put through the delayed code.
> 
> This can end up triggering a warning at commit time:
> 
> WARNING: CPU: 2 PID: 778 at fs/btrfs/delayed-inode.c:1410 btrfs_assert_delayed_root_empty+0x32/0x34()
> 
> Which is repeated for each commit because we never process the delayed ref.
> 
> The fix used here is to change btrfs_delayed_delete_inode_ref to return
> an error if we're currently in log replay.  The caller will do the ref
> deletion immediately and everything will work properly.
> 
> This bug can cause lost files, which fsck will find.  --repair on
> btrfs-progs 3.18 will fix them.
> 
> Signed-off-by: Chris Mason <clm@fb.com>
> cc: stable@vger.kernel.org # v3.18 and any stable series that picked 1d52c78afbbf80b58299e076a159617d6b42fe3c
> 
> diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
> index 054577b..de4e70f 100644
> --- a/fs/btrfs/delayed-inode.c
> +++ b/fs/btrfs/delayed-inode.c
> @@ -1857,6 +1857,14 @@ int btrfs_delayed_delete_inode_ref(struct inode *inode)
>  {
>  	struct btrfs_delayed_node *delayed_node;
>  
> +	/*
> +	 * we don't do delayed inode updates during log recovery because it
> +	 * leads to enospc problems.  This means we also can't do
> +	 * delayed inode refs
> +	 */
> +	if (BTRFS_I(inode)->root->fs_info->log_root_recovering)
> +		return -EAGAIN;
> +
>  	delayed_node = btrfs_get_or_create_delayed_node(inode);
>  	if (IS_ERR(delayed_node))
>  		return PTR_ERR(delayed_node);
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901


* Re: [PATCH] Btrfs: don't delay inode ref updates during log replay
  2015-01-01 22:58       ` Marc MERLIN
@ 2015-01-02  0:44         ` Qu Wenruo
  0 siblings, 0 replies; 10+ messages in thread
From: Qu Wenruo @ 2015-01-02  0:44 UTC (permalink / raw)
  To: Marc MERLIN, Chris Mason, Roman Mamedov, Btrfs BTRFS

Great! The root cause of the missing files is fixed!

Thanks a lot!
Qu
-------- Original Message --------
Subject: Re: [PATCH] Btrfs: don't delay inode ref updates during log replay
From: Marc MERLIN <marc@merlins.org>
To: Chris Mason <clm@fb.com>, Qu Wenruo <quwenruo@cn.fujitsu.com>,
    Roman Mamedov <rm@romanrm.net>, Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Date: 2015-01-02 06:58
> Chris, you rule, I applied this to 3.16.7 and my problem went away.
>
> Tested-By: Marc MERLIN <marc@merlins.org>
>
> Happy new year! :)
>
> Marc
>
> On Wed, Dec 31, 2014 at 01:30:13PM -0500, Chris Mason wrote:
>> Commit 1d52c78afbb (Btrfs: try not to ENOSPC on log replay) added a
>> check to skip delayed inode updates during log replay because it
>> confuses the enospc code.  But the delayed processing will end up
>> skipping delayed refs from log replay because the inode itself wasn't
>> put through the delayed code.
>>
>> This can end up triggering a warning at commit time:
>>
>> WARNING: CPU: 2 PID: 778 at fs/btrfs/delayed-inode.c:1410 btrfs_assert_delayed_root_empty+0x32/0x34()
>>
>> Which is repeated for each commit because we never process the delayed ref.
>>
>> The fix used here is to change btrfs_delayed_delete_inode_ref to return
>> an error if we're currently in log replay.  The caller will do the ref
>> deletion immediately and everything will work properly.
>>
>> This bug can cause lost files, which fsck will find.  --repair on
>> btrfs-progs 3.18 will fix them.
>>
>> Signed-off-by: Chris Mason <clm@fb.com>
>> cc: stable@vger.kernel.org # v3.18 and any stable series that picked 1d52c78afbbf80b58299e076a159617d6b42fe3c
>>
>> diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
>> index 054577b..de4e70f 100644
>> --- a/fs/btrfs/delayed-inode.c
>> +++ b/fs/btrfs/delayed-inode.c
>> @@ -1857,6 +1857,14 @@ int btrfs_delayed_delete_inode_ref(struct inode *inode)
>>   {
>>   	struct btrfs_delayed_node *delayed_node;
>>   
>> +	/*
>> +	 * we don't do delayed inode updates during log recovery because it
>> +	 * leads to enospc problems.  This means we also can't do
>> +	 * delayed inode refs
>> +	 */
>> +	if (BTRFS_I(inode)->root->fs_info->log_root_recovering)
>> +		return -EAGAIN;
>> +
>>   	delayed_node = btrfs_get_or_create_delayed_node(inode);
>>   	if (IS_ERR(delayed_node))
>>   		return PTR_ERR(delayed_node);

