* recent complete stalls of btrfs (4.6.0-rc4+) -- any advice?
From: Yaroslav Halchenko @ 2016-06-10 23:41 UTC
  To: linux-btrfs

Dear BTRFS developers,

First of all -- thanks for developing BTRFS!  So far it has served really
well, while others were falling (or failing) behind in my initial evaluation
(http://datalad.org/test_fs_analysis.html).  With btrbk, backups are a
breeze.  But unfortunately it still fails completely for me at times.

I know that I should upgrade the kernel, and I will now...  but I
thought I would share these incident reports since they might be of
some value.  I am running Debian jessie with a manually built kernel.
btrfs is used extensively for a metadata-heavy partition (lots of
symlinks, lots of directories with a single file in them -- heavy use of
git-annex), snapshots are taken regularly, etc.

Setup -- btrfs on top of software raids:

# btrfs fi show /mnt/btrfs/
Label: 'tank'  uuid: b5fe7f5e-3478-4293-a42c-bf9ca26ea724
        Total devices 4 FS bytes used 21.07TiB
        devid    2 size 10.92TiB used 5.30TiB path /dev/md10
        devid    3 size 10.92TiB used 5.30TiB path /dev/md11
        devid    4 size 10.92TiB used 5.30TiB path /dev/md12
        devid    5 size 10.92TiB used 5.30TiB path /dev/md13


Within the last 5 days, the beast has stalled twice.  The last signs
were:

* 20160605 -- kernel kaboomed at btrfs level

smaug login: [3675876.734400] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffffa03d0354
[3675876.734400]
[3675876.745680] CPU: 9 PID: 651474 Comm: git Tainted: G        W IO    4.6.0-rc4+ #1
[3675876.753272] Hardware name: Supermicro X10DRi/X10DRI-T, BIOS 1.0b 09/17/2014
[3675876.760431]  0000000000000086 000000005e62edd4 ffffffff813098f5 ffffffff817cd080
[3675876.768104]  ffff880036f23da8 ffffffff811701af ffff881e00000010 ffff880036f23db8
[3675876.775763]  ffff880036f23d50 000000005e62edd4 ffff880036f23d88 ffffffffa03d0354
[3675876.783426] Call Trace:
[3675876.786057]  [<ffffffff813098f5>] ? dump_stack+0x5c/0x77
[3675876.791575]  [<ffffffff811701af>] ? panic+0xdf/0x226
[3675876.796812]  [<ffffffffa03d0354>] ? btrfs_add_link+0x384/0x3e0 [btrfs]
[3675876.803549]  [<ffffffff8107abf7>] ? __stack_chk_fail+0x17/0x30
[3675876.809610]  [<ffffffffa03d0354>] ? btrfs_add_link+0x384/0x3e0 [btrfs]
[3675876.816391]  [<ffffffffa03d1273>] ? btrfs_link+0x143/0x220 [btrfs]
[3675876.822802]  [<ffffffff811fea9f>] ? vfs_link+0x1af/0x280
[3675876.828331]  [<ffffffff812020ba>] ? SyS_link+0x22a/0x260
[3675876.833859]  [<ffffffff815ba436>] ? entry_SYSCALL_64_fastpath+0x1e/0xa8
[3675876.840740] Kernel Offset: disabled
[3675876.854050] ---[ end Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffffa03d0354
[3675876.854050]

* 20160610 -- again, different kaboom

[443370.085059] CPU: 10 PID: 1044513 Comm: git-annex Tainted: G        W IO    4.6.0-rc4+ #1
[443370.093268] Hardware name: Supermicro X10DRi/X10DRI-T, BIOS 1.0b 09/17/2014
[443370.100356] task: ffff8806c463d0c0 ti: ffff8808f9dc8000 task.ti: ffff8808f9dc8000
[443370.107953] RIP: 0010:[<ffff88090f67be10>]  [<ffff88090f67be10>] 0xffff88090f67be10
[443370.115761] RSP: 0018:ffff8808f9dcbe18  EFLAGS: 00010292
[443370.121187] RAX: ffff88103fd95fc0 RBX: ffff8808f9dcc000 RCX: 0000000000000000
[443370.128438] RDX: 00000000ffffffff RSI: ffff8806c463d0c0 RDI: ffff88103fd95fc0
[443370.135693] RBP: ffff8808f9dcbe30 R08: ffff8808f9dc8000 R09: 0000000000000000
[443370.142940] R10: 000000000000000a R11: 0000000000000000 R12: ffff881035beedc8
[443370.150184] R13: ffff880ff1106800 R14: ffff88123d6c0000 R15: ffff88123d6c0068
[443370.157432] FS:  00007f0ab3d83740(0000) GS:ffff88103fd80000(0000) knlGS:0000000000000000
[443370.165645] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[443370.171512] CR2: ffff88090f67be10 CR3: 0000000cf7516000 CR4: 00000000001406e0
[443370.178758] Stack:
[443370.180880]  ffff88069dda93c0 ffffffffa0358700 ffff88069dda93c0 ffff880f00000000
[443370.188490]  ffff8806c463d0c0 ffffffff810bb560 ffff8808f9dcbe48 ffff8808f9dcbe48
[443370.196107]  00000000d5ce3509 ffff88069dda93c0 0000000000000001 ffff8806a64835c8
[443370.203726] Call Trace:
[443370.206310]  [<ffffffffa0358700>] ? btrfs_commit_transaction+0x350/0xa30 [btrfs]
[443370.213826]  [<ffffffff810bb560>] ? wait_woken+0x90/0x90
[443370.219280]  [<ffffffffa036fb6b>] ? btrfs_sync_file+0x2fb/0x3d0 [btrfs]
[443370.226012]  [<ffffffff81222a48>] ? do_fsync+0x38/0x60
[443370.231267]  [<ffffffff81222ccf>] ? SyS_fdatasync+0xf/0x20
[443370.236870]  [<ffffffff815ba436>] ? entry_SYSCALL_64_fastpath+0x1e/0xa8
[443370.243604] Code: 88 ff ff 21 67 5b 81 ff ff ff ff 00 00 6c 3d 12 88 ff ff dd 77 35 a0 ff ff ff ff 00 00 00 00 00 00 00 00 40 e0 91 4b 08 88 ff ff <60> b5 0b 81 ff ff ff ff f0 fd 61 8a 0c 88 ff ff 18 7c 79 3e 00
[443370.264107] RIP  [<ffff88090f67be10>] 0xffff88090f67be10
[443370.271044]  RSP <ffff8808f9dcbe18>
[443370.276177] CR2: ffff88090f67be10
[443370.284979] ---[ end trace 2c4b690b49d17ebd ]---

For the last case, here are more details: the dmesg output, which apparently
shows other tracebacks and errors logged earlier, so it might be of help:

http://www.onerussian.com/tmp/dmesg-nonet.20160610.txt

Have these issues been fixed since 4.6.0-rc4+, or should I
be on the lookout for them to come back?  What other information should I
provide if I run into them again to help you troubleshoot/fix them?
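
One way to make such reports comparable is to gather the same basic
facts after every crash.  Below is a minimal Python sketch of that idea.
Which items the developers actually need is an assumption here, and the
helper names (report_commands, run_command, collect_report) are
hypothetical; the sketch only wraps standard commands (uname, the btrfs
"filesystem show" and "filesystem df" subcommands, dmesg).

```python
import subprocess


def report_commands(mountpoint):
    """Commands whose output is gathered for the report (an assumed set)."""
    return [
        ["uname", "-a"],
        ["btrfs", "--version"],
        ["btrfs", "filesystem", "show", mountpoint],
        ["btrfs", "filesystem", "df", mountpoint],
        ["dmesg"],
    ]


def run_command(argv):
    """Run one command; return its stdout, or a note if it could not be started."""
    try:
        done = subprocess.run(argv, capture_output=True, text=True, check=False)
    except OSError as exc:
        return f"(could not run: {exc})"
    return done.stdout


def collect_report(mountpoint, runner=run_command):
    """Build one text report, each command followed by its output."""
    sections = []
    for argv in report_commands(mountpoint):
        sections.append("$ " + " ".join(argv))
        sections.append(runner(argv).rstrip())
    return "\n".join(sections)
```

Calling collect_report("/mnt/btrfs/") as root would produce a single
text blob to attach to a message; the runner parameter exists only so
the command execution can be replaced when trying the sketch out.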

P.S. Please CC me on the replies

-- 
Yaroslav O. Halchenko
Center for Open Neuroscience     http://centerforopenneuroscience.org
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834                       Fax: +1 (603) 646-1419
WWW:   http://www.linkedin.com/in/yarik        


* Re: recent complete stalls of btrfs (4.6.0-rc4+) -- any advice?
From: Chris Murphy @ 2016-06-11  0:17 UTC
  To: Yaroslav Halchenko; +Cc: Btrfs BTRFS

On Fri, Jun 10, 2016 at 5:41 PM, Yaroslav Halchenko <yoh@onerussian.com> wrote:
> Have these issues been fixed since 4.6.0-rc4+, or should I
> be on the lookout for them to come back?  What other information should I
> provide if I run into them again to help you troubleshoot/fix them?
>
> P.S. Please CC me on the replies


4.6.2 is current, and it's a lot easier to just use that and see if it
still happens than for someone to track down whether it's been fixed
since a six-week-old RC.


-- 
Chris Murphy


* Re: recent complete stalls of btrfs (4.6.0-rc4+) -- any advice?
From: Yaroslav Halchenko @ 2016-06-13  3:46 UTC
  To: Btrfs BTRFS


On Fri, 10 Jun 2016, Chris Murphy wrote:

> > Have these issues been fixed since 4.6.0-rc4+, or should I
> > be on the lookout for them to come back?  What other information should I
> > provide if I run into them again to help you troubleshoot/fix them?

> > P.S. Please CC me on the replies


> 4.6.2 is current and it's a lot easier to just use that and see if it
> still happens than for someone to track down whether it's been fixed
> since a six week old RC.

Dear Chris,

Thank you for the reply!  I am now running v4.7-rc2-300-g3d0f0b6.

The thing is that this issue doesn't happen right away: it takes a
while to develop, and seems to occur only after an intensive load.
So the version I run will always be "X weeks old" if I just keep hopping
to the latest release of master, and it would be an endless goose
chase if left unanalyzed.  That is why I would still appreciate
advice on what specifics to report/attempt if such a crash happens next
time, or maybe someone has an idea of what could have led to
this crash to start with.

-- 
Yaroslav O. Halchenko, Ph.D.
http://neuro.debian.net http://www.pymvpa.org http://www.fail2ban.org
Research Scientist,            Psychological and Brain Sciences Dept.
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834                       Fax: +1 (603) 646-1419
WWW:   http://www.linkedin.com/in/yarik        


* Re: recent complete stalls of btrfs (4.7.0-rc2+) -- any advice?
From: Yaroslav Halchenko @ 2016-08-09 22:19 UTC
  To: Btrfs BTRFS


The beast died on me this morning :-/  The last kern.log message was

    (Fixing recursive fault but reboot is needed!)

One of the tracebacks is the same as before (ending in
btrfs_commit_transaction), so I guess it could be the same issue as
before?  Most probably I will perform the same kernel build/upgrade dance
again, BUT I still hope that someone might either spot a sign of an issue
fixed recently (since v4.7-rc2-300-g3d0f0b6) or, if not, actually look in
detail at what is possibly a new issue that has not been addressed yet.  I
would be "happy" to provide more information or to enable any additional
monitoring needed to capture more in case of the next crash.

I rebooted the box around 11am.  It had been completely unresponsive for
some time before that, but I think it still "somewhat functioned" after the
last traceback reported in the kern.log, which I shared at
http://www.onerussian.com/tmp/kern-smaug-20160809.log (otherwise, journalctl -b
-1 doesn't show any other grave errors).  I also cite the very last oops in
the kern.log here.  Out of academic interest: why does there seem to be ext4
functionality within the stack for btrfs_commit_transaction?  Is some logic
common/reused between the two file systems?  Or is it merely that some
partitions are on ext4 and something in btrfs triggered them as well?

Aug  9 07:46:15 smaug kernel: [5132590.362689] Oops: 0000 [#3] SMP
Aug  9 07:46:15 smaug kernel: [5132590.367913] Modules linked in: uas usb_storage vboxdrv(O) nls_utf8 ufs qnx4 hfsplus hfs minix ntfs vfat msdos fat jfs xfs veth xt_addrtype ipt_MASQUERADE nf_nat_masquerade_ipv4 bridge stp llc cpufreq_stats cpufreq_userspace cpufreq_conservative cpufreq_powersave xt_pkttype nf_log_ipv4 nf_log_common xt_tcpudp ip6table_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat xt_TCPMSS xt_LOG ipt_REJECT nf_reject_ipv4 iptable_mangle xt_multiport xt_state xt_limit xt_conntrack nfsd nf_conntrack_ftp auth_rpcgss oid_registry nfs_acl nfs lockd grace nf_conntrack ip6table_filter ip6_tables iptable_filter ip_tables x_tables fscache sunrpc binfmt_misc intel_rapl sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp ipmi_watchdog ipmi_poweroff ipmi_devintf kvm_intel iTCO_wdt iTCO_vendor_support kvm irqbypass fuse crct10dif_pclmul crc32_pclmul ghash_clmulni_intel drbg ansi_cprng aesni_intel aes_x86_64 lrw gf128mul snd_pcm glue_helper ablk_helper cryptd snd_timer snd soundcore pcspkr evdev joydev ast ttm drm_kms_helper i2c_i801 drm i2c_algo_bit mei_me lpc_ich mfd_core mei ipmi_si ioatdma shpchp wmi ipmi_msghandler ecryptfs cbc tpm_tis tpm acpi_power_meter acpi_pad button sha256_ssse3 sha256_generic hmac encrypted_keys autofs4 ext4 crc16 jbd2 mbcache btrfs dm_mod raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 md_mod ses enclosure sg sd_mod hid_generic usbhid hid crc32c_intel mpt3sas raid_class scsi_transport_sas xhci_pci xhci_hcd ehci_pci ahci ehci_hcd libahci libata usbcore ixgbe scsi_mod usb_common dca ptp pps_core mdio fjes
Aug  9 07:46:15 smaug kernel: [5132590.538375] CPU: 6 PID: 2878531 Comm: git Tainted: G      D W IO    4.7.0-rc2+ #1
Aug  9 07:46:15 smaug kernel: [5132590.547950] Hardware name: Supermicro X10DRi/X10DRI-T, BIOS 1.0b 09/17/2014
Aug  9 07:46:15 smaug kernel: [5132590.557009] task: ffff8817b855b0c0 ti: ffff88000e0dc000 task.ti: ffff88000e0dc000
Aug  9 07:46:15 smaug kernel: [5132590.566572] RIP: 0010:[<ffffffffa0444be3>]  [<ffffffffa0444be3>] jbd2__journal_start+0x33/0x1e0 [jbd2]
Aug  9 07:46:15 smaug kernel: [5132590.578009] RSP: 0018:ffff88000e0df8f0  EFLAGS: 00010282
Aug  9 07:46:15 smaug kernel: [5132590.585427] RAX: ffff88155eae8140 RBX: ffff881ed5a9d128 RCX: 0000000002400040
Aug  9 07:46:15 smaug kernel: [5132590.594678] RDX: 00000000000fd0e4 RSI: 0000000000000002 RDI: ffff882034d0f000
Aug  9 07:46:15 smaug kernel: [5132590.603929] RBP: ffff882034d0f000 R08: 0000000000000001 R09: 0000000000001569
Aug  9 07:46:15 smaug kernel: [5132590.613264] R10: 00000000107aa8b7 R11: fffffffffffffff0 R12: ffff881ed5a9d128
Aug  9 07:46:15 smaug kernel: [5132590.622566] R13: ffff882033909000 R14: ffff881816302a00 R15: ffff881ed5a9d128
Aug  9 07:46:15 smaug kernel: [5132590.631846] FS:  0000000000000000(0000) GS:ffff88207fc80000(0000) knlGS:0000000000000000
Aug  9 07:46:15 smaug kernel: [5132590.642060] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug  9 07:46:15 smaug kernel: [5132590.649898] CR2: 00000000000fd0e4 CR3: 0000000001a06000 CR4: 00000000001406e0
Aug  9 07:46:15 smaug kernel: [5132590.659130] Stack:
Aug  9 07:46:15 smaug kernel: [5132590.663228]  ffffffffa049cc54 0000156902020200 ffff881ed5a9d128 0000000000000801
Aug  9 07:46:15 smaug kernel: [5132590.672811]  ffff881ed5a9d128 ffff882033909000 ffff881816302a00 ffff881ed5a9d128
Aug  9 07:46:15 smaug kernel: [5132590.682392]  ffffffffa0470b9d ffff881ed5a9d128 0000000000000801 ffffffff8121fe67
Aug  9 07:46:15 smaug kernel: [5132590.691981] Call Trace:
Aug  9 07:46:15 smaug kernel: [5132590.696597]  [<ffffffffa049cc54>] ? __ext4_journal_start_sb+0x34/0xf0 [ext4]
Aug  9 07:46:15 smaug kernel: [5132590.705791]  [<ffffffffa0470b9d>] ? ext4_dirty_inode+0x2d/0x60 [ext4]
Aug  9 07:46:15 smaug kernel: [5132590.714340]  [<ffffffff8121fe67>] ? __mark_inode_dirty+0x177/0x360
Aug  9 07:46:15 smaug kernel: [5132590.722623]  [<ffffffff8120e389>] ? generic_update_time+0x79/0xd0
Aug  9 07:46:15 smaug kernel: [5132590.730814]  [<ffffffff8120da8d>] ? file_update_time+0xbd/0x110
Aug  9 07:46:15 smaug kernel: [5132590.738845]  [<ffffffff81175f69>] ? __generic_file_write_iter+0x99/0x1e0
Aug  9 07:46:15 smaug kernel: [5132590.747708]  [<ffffffffa04631b6>] ? ext4_file_write_iter+0x196/0x3d0 [ext4]
Aug  9 07:46:15 smaug kernel: [5132590.756756]  [<ffffffff811f170b>] ? __vfs_write+0xeb/0x160
Aug  9 07:46:15 smaug kernel: [5132590.764301]  [<ffffffff811f2103>] ? __kernel_write+0x53/0x100
Aug  9 07:46:15 smaug kernel: [5132590.772081]  [<ffffffff810ff672>] ? do_acct_process+0x462/0x4e0
Aug  9 07:46:15 smaug kernel: [5132590.780035]  [<ffffffff810ffd4c>] ? acct_process+0xdc/0x100
Aug  9 07:46:15 smaug kernel: [5132590.787648]  [<ffffffff8107e403>] ? do_exit+0x7f3/0xb80
Aug  9 07:46:15 smaug kernel: [5132590.794894]  [<ffffffff8102fa5c>] ? oops_end+0x9c/0xd0
Aug  9 07:46:15 smaug kernel: [5132590.802027]  [<ffffffff81062d35>] ? no_context+0x135/0x390
Aug  9 07:46:15 smaug kernel: [5132590.809496]  [<ffffffff815ca1f8>] ? page_fault+0x28/0x30
Aug  9 07:46:15 smaug kernel: [5132590.816808]  [<ffffffffa0381af0>] ? btrfs_commit_transaction+0x350/0xa30 [btrfs]
Aug  9 07:46:15 smaug kernel: [5132590.826213]  [<ffffffff810ba590>] ? wait_woken+0x90/0x90
Aug  9 07:46:15 smaug kernel: [5132590.833501]  [<ffffffffa039a11b>] ? btrfs_sync_file+0x2fb/0x3e0 [btrfs]
Aug  9 07:46:15 smaug kernel: [5132590.842074]  [<ffffffff81225318>] ? do_fsync+0x38/0x60
Aug  9 07:46:15 smaug kernel: [5132590.849114]  [<ffffffff8122558c>] ? SyS_fsync+0xc/0x10
Aug  9 07:46:15 smaug kernel: [5132590.856096]  [<ffffffff815c81f6>] ? entry_SYSCALL_64_fastpath+0x1e/0xa8
Aug  9 07:46:15 smaug kernel: [5132590.864522] Code: 56 41 55 41 54 55 53 48 89 fd 65 48 8b 04 25 c0 d4 00 00 48 83 ec 10 48 85 ff 48 8b 80 90 06 00 00 74 20 48 85 c0 74 33 48 8b 10 <48> 3b 3a 75 29 83 40 14 01 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e
Aug  9 07:46:15 smaug kernel: [5132590.888065] RIP  [<ffffffffa0444be3>] jbd2__journal_start+0x33/0x1e0 [jbd2]
Aug  9 07:46:15 smaug kernel: [5132590.896830]  RSP <ffff88000e0df8f0>
Aug  9 07:46:15 smaug kernel: [5132590.902039] CR2: 00000000000fd0e4
Aug  9 07:46:15 smaug kernel: [5132590.907032] ---[ end trace 3b9450d000ed06b4 ]---
Aug  9 07:46:15 smaug kernel: [5132590.914612] Fixing recursive fault but reboot is needed!
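
To compare such tracebacks between crashes, the frames can be pulled out
of the log mechanically.  The following is a minimal Python sketch, not
an existing tool: the names FRAME_RE, parse_frames and SAMPLE are made up
for illustration, SAMPLE is an excerpt of the oops above with the syslog
prefix dropped, and labelling frames without a [module] suffix as
"vmlinux" is an assumption.  Frames printed with "?" are ones the kernel
itself marks as not reliably unwound, so the result is a hint rather
than an exact call chain.

```python
import re
from collections import Counter

# Matches frames such as:
#   [<ffffffffa0381af0>] ? btrfs_commit_transaction+0x350/0xa30 [btrfs]
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?:\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)

# Excerpt of the call trace above (innermost frame first).
SAMPLE = """\
Call Trace:
 [<ffffffffa049cc54>] ? __ext4_journal_start_sb+0x34/0xf0 [ext4]
 [<ffffffffa0470b9d>] ? ext4_dirty_inode+0x2d/0x60 [ext4]
 [<ffffffffa04631b6>] ? ext4_file_write_iter+0x196/0x3d0 [ext4]
 [<ffffffff810ff672>] ? do_acct_process+0x462/0x4e0
 [<ffffffff810ffd4c>] ? acct_process+0xdc/0x100
 [<ffffffff8107e403>] ? do_exit+0x7f3/0xb80
 [<ffffffff8102fa5c>] ? oops_end+0x9c/0xd0
 [<ffffffff815ca1f8>] ? page_fault+0x28/0x30
 [<ffffffffa0381af0>] ? btrfs_commit_transaction+0x350/0xa30 [btrfs]
 [<ffffffffa039a11b>] ? btrfs_sync_file+0x2fb/0x3e0 [btrfs]
 [<ffffffff81225318>] ? do_fsync+0x38/0x60
 [<ffffffff8122558c>] ? SyS_fsync+0xc/0x10
"""


def parse_frames(log_text):
    """Return (symbol, module) pairs for the lines following "Call Trace:"."""
    frames = []
    in_trace = False
    for line in log_text.splitlines():
        if "Call Trace:" in line:
            in_trace = True
            continue
        if not in_trace:
            continue
        match = FRAME_RE.search(line)
        if match is None:
            # The first non-frame line (e.g. "Code: ...") ends the trace.
            in_trace = False
            continue
        # Assumption: no [module] suffix means the core kernel image.
        frames.append((match.group("symbol"), match.group("module") or "vmlinux"))
    return frames


frames = parse_frames(SAMPLE)
symbols = [symbol for symbol, _ in frames]
modules = Counter(module for _, module in frames)
```

In this excerpt the ext4 frames sit above do_exit and oops_end, which in
turn sit above page_fault and btrfs_commit_transaction.  One possible
reading is that the task first faulted under btrfs_commit_transaction and
the ext4 frames come from process accounting writing to a file on ext4
while that task was exiting, rather than from logic shared between the
two file systems; the trace alone does not prove this.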

Thank you very much in advance for any ideas/feedback.

Please CC me on the responses


* Re: recent complete stalls of btrfs (4.7.0-rc2+) -- any advice?
From: Yaroslav Halchenko @ 2016-09-09 12:13 UTC
  To: Btrfs BTRFS


On Tue, 09 Aug 2016, Yaroslav Halchenko wrote:

> The beast died on me this morning :-/  The last kern.log message was

>     (Fixing recursive fault but reboot is needed!)

It locked up again, but this time it seems to be a different stack from
before (and without the message above):

(the full list of oopses since boot is at
http://www.onerussian.com/tmp/journal-20160909-oopses.log
)

Sep 09 02:18:33 smaug kernel: ------------[ cut here ]------------
Sep 09 02:18:33 smaug kernel: WARNING: CPU: 4 PID: 2189174 at lib/list_debug.c:33 __list_add+0x86/0xb0
Sep 09 02:18:33 smaug kernel: list_add corruption. prev->next should be next (ffff8820079d6308), but was ffff88181e7e0d28. (prev=ffff8810b209fe10).
Sep 09 02:18:33 smaug kernel: Modules linked in: veth xt_addrtype ipt_MASQUERADE nf_nat_masquerade_ipv4 bridge stp llc pci_stub cpufreq_stats cpufreq_userspace cpufreq_conservative cpufreq_powersave xt_pkttype nf_log_ipv4 nf_log_common xt_tcpudp ip6table_mangle nfsd auth_rpcgss oid_registry nfs_acl iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat xt_TCPMSS xt_LOG ipt_REJECT nf_reject_ipv4 iptable_mangle xt_multiport xt_state xt_limit xt_conntrack nf_conntrack_ftp nfs lockd grace nf_conntrack ip6table_filter ip6_tables iptable_filter ip_tables x_tables fscache sunrpc binfmt_misc ipmi_watchdog intel_rapl sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp ipmi_poweroff ipmi_devintf kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel iTCO_wdt iTCO_vendor_support drbg
Sep 09 02:18:33 smaug kernel:  ansi_cprng snd_pcm snd_timer aesni_intel snd aes_x86_64 soundcore lrw fuse gf128mul glue_helper ablk_helper cryptd pcspkr ast ttm drm_kms_helper joydev drm mei_me evdev i2c_algo_bit i2c_i801 mei shpchp lpc_ich ioatdma mfd_core ipmi_si wmi ipmi_msghandler tpm_tis tpm acpi_pad acpi_power_meter button ecryptfs cbc sha256_ssse3 sha256_generic hmac encrypted_keys autofs4 ext4 crc16 jbd2 mbcache btrfs dm_mod raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 md_mod sg ses enclosure sd_mod hid_generic usbhid hid crc32c_intel ahci libahci mpt3sas raid_class scsi_transport_sas ehci_pci xhci_pci xhci_hcd ehci_hcd libata ixgbe dca usbcore usb_common scsi_mod ptp pps_core mdio fjes [last unloaded: vboxdrv]
Sep 09 02:18:33 smaug kernel: CPU: 4 PID: 2189174 Comm: git-annex Tainted: G        W IO    4.7.0-rc2+ #1
Sep 09 02:18:33 smaug kernel: Hardware name: Supermicro X10DRi/X10DRI-T, BIOS 1.0b 09/17/2014
Sep 09 02:18:33 smaug kernel:  0000000000000286 000000000ab947c2 ffffffff8130c605 ffff881292cfbd28
Sep 09 02:18:33 smaug kernel:  0000000000000000 ffffffff8107a314 ffff881292cfbe10 ffff881292cfbd80
Sep 09 02:18:33 smaug kernel:  ffff8810b209fe10 ffff881037a07a98 ffff881f24b1a800 ffff881037a07800
Sep 09 02:18:33 smaug kernel: Call Trace:
Sep 09 02:18:33 smaug kernel:  [<ffffffff8130c605>] ? dump_stack+0x5c/0x77
Sep 09 02:18:33 smaug kernel:  [<ffffffff8107a314>] ? __warn+0xc4/0xe0
Sep 09 02:18:33 smaug kernel:  [<ffffffff8107a38f>] ? warn_slowpath_fmt+0x5f/0x80
Sep 09 02:18:33 smaug kernel:  [<ffffffffa0407495>] ? btrfs_write_marked_extents+0x95/0x130 [btrfs]
Sep 09 02:18:33 smaug kernel:  [<ffffffff81329f86>] ? __list_add+0x86/0xb0
Sep 09 02:18:33 smaug kernel:  [<ffffffffa044c459>] ? btrfs_sync_log+0x249/0xa80 [btrfs]
Sep 09 02:18:33 smaug kernel:  [<ffffffffa04211ba>] ? btrfs_sync_file+0x39a/0x3e0 [btrfs]
Sep 09 02:18:33 smaug kernel:  [<ffffffff81225318>] ? do_fsync+0x38/0x60
Sep 09 02:18:33 smaug kernel:  [<ffffffff8122559f>] ? SyS_fdatasync+0xf/0x20
Sep 09 02:18:33 smaug kernel:  [<ffffffff815c81f6>] ? entry_SYSCALL_64_fastpath+0x1e/0xa8
Sep 09 02:18:33 smaug kernel: ---[ end trace 125800d45db3ce41 ]---
Sep 09 02:18:34 smaug kernel: general protection fault: 0000 [#1] SMP
Sep 09 02:18:34 smaug kernel: Modules linked in: veth xt_addrtype ipt_MASQUERADE nf_nat_masquerade_ipv4 bridge stp llc pci_stub cpufreq_stats cpufreq_userspace cpufreq_conservative cpufreq_powersave xt_pkttype nf_log_ipv4 nf_log_common xt_tcpudp ip6table_mangle nfsd auth_rpcgss oid_registry nfs_acl iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat xt_TCPMSS xt_LOG ipt_REJECT nf_reject_ipv4 iptable_mangle xt_multiport xt_state xt_limit xt_conntrack nf_conntrack_ftp nfs lockd grace nf_conntrack ip6table_filter ip6_tables iptable_filter ip_tables x_tables fscache sunrpc binfmt_misc ipmi_watchdog intel_rapl sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp ipmi_poweroff ipmi_devintf kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel iTCO_wdt iTCO_vendor_support drbg
Sep 09 02:18:34 smaug kernel:  ansi_cprng snd_pcm snd_timer aesni_intel snd aes_x86_64 soundcore lrw fuse gf128mul glue_helper ablk_helper cryptd pcspkr ast ttm drm_kms_helper joydev drm mei_me evdev i2c_algo_bit i2c_i801 mei shpchp lpc_ich ioatdma mfd_core ipmi_si wmi ipmi_msghandler tpm_tis tpm acpi_pad acpi_power_meter button ecryptfs cbc sha256_ssse3 sha256_generic hmac encrypted_keys autofs4 ext4 crc16 jbd2 mbcache btrfs dm_mod raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 md_mod sg ses enclosure sd_mod hid_generic usbhid hid crc32c_intel ahci libahci mpt3sas raid_class scsi_transport_sas ehci_pci xhci_pci xhci_hcd ehci_hcd libata ixgbe dca usbcore usb_common scsi_mod ptp pps_core mdio fjes [last unloaded: vboxdrv]
Sep 09 02:18:34 smaug kernel: CPU: 5 PID: 2189174 Comm: git-annex Tainted: G        W IO    4.7.0-rc2+ #1
Sep 09 02:18:34 smaug kernel: Hardware name: Supermicro X10DRi/X10DRI-T, BIOS 1.0b 09/17/2014
Sep 09 02:18:34 smaug kernel: task: ffff8810eedb8000 ti: ffff881292cf8000 task.ti: ffff881292cf8000
Sep 09 02:18:34 smaug kernel: RIP: 0010:[<ffffffffa03dce1c>]  [<ffffffffa03dce1c>] btrfs_root_node+0xc/0x50 [btrfs]
Sep 09 02:18:34 smaug kernel: RSP: 0018:ffff881292cfbcd0  EFLAGS: 00010246
Sep 09 02:18:34 smaug kernel: RAX: 0000000000000008 RBX: ffff881033a07800 RCX: ae07f824b05dfc14
Sep 09 02:18:34 smaug kernel: RDX: ffff881292cfbd9f RSI: 0000000000000000 RDI: ffff881033a07800
Sep 09 02:18:34 smaug kernel: RBP: ffff881033a07800 R08: 00000000ffffffff R09: 00000000ffffffff
Sep 09 02:18:34 smaug kernel: R10: 0000000000000001 R11: 00000000ffffffff R12: 0000160000000000
Sep 09 02:18:34 smaug kernel: R13: ffff881033a07800 R14: ffff880000000000 R15: ffff8813031dc820
Sep 09 02:18:34 smaug kernel: FS:  00007efea04d2740(0000) GS:ffff88207fc40000(0000) knlGS:0000000000000000
Sep 09 02:18:34 smaug kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 09 02:18:34 smaug kernel: CR2: 00007f40b5389710 CR3: 0000001b4f38b000 CR4: 00000000001406e0
Sep 09 02:18:34 smaug kernel: Stack:
Sep 09 02:18:34 smaug kernel:  ffff8815e5ff6700 ffffffffa03dce84 ffff8815e5ff6700 0000000000000000
Sep 09 02:18:34 smaug kernel:  ffffffffa03e1f96 ffffffffa03dfe9c ffff881292cfbe1c ffff881292cfbd9f
Sep 09 02:18:34 smaug kernel:  ffff8813031dc820 0000000000000001 0000000200000002 1ffff1020100f7ac
Sep 09 02:18:34 smaug kernel: Call Trace:
Sep 09 02:18:34 smaug kernel:  [<ffffffffa03dce84>] ? btrfs_read_lock_root_node+0x24/0x40 [btrfs]
Sep 09 02:18:34 smaug kernel:  [<ffffffffa03e1f96>] ? btrfs_search_slot+0x726/0x9e0 [btrfs]
Sep 09 02:18:34 smaug kernel:  [<ffffffffa03dfe9c>] ? btrfs_leaf_free_space+0x4c/0xa0 [btrfs]
Sep 09 02:18:34 smaug kernel:  [<ffffffffa02801e6>] ? crc32c_pcl_intel_update+0x26/0x60 [crc32c_intel]
Sep 09 02:18:34 smaug kernel:  [<ffffffffa03fa979>] ? btrfs_lookup_dir_item+0x79/0xc0 [btrfs]
Sep 09 02:18:34 smaug kernel:  [<ffffffffa040f365>] ? __btrfs_unlink_inode+0xb5/0x4a0 [btrfs]
Sep 09 02:18:34 smaug kernel:  [<ffffffffa0413077>] ? btrfs_unlink_inode+0x17/0x40 [btrfs]
Sep 09 02:18:34 smaug kernel:  [<ffffffffa04136a2>] ? btrfs_rmdir+0xf2/0x140 [btrfs]
Sep 09 02:18:34 smaug kernel:  [<ffffffff811feccc>] ? vfs_rmdir+0xac/0x120
Sep 09 02:18:34 smaug kernel:  [<ffffffff812028c0>] ? do_rmdir+0x1e0/0x200
Sep 09 02:18:34 smaug kernel:  [<ffffffff815c81f6>] ? entry_SYSCALL_64_fastpath+0x1e/0xa8
Sep 09 02:18:34 smaug kernel: Code: ff ff 48 89 de 48 8b 3d 93 bc 0c 00 5b e9 8d 81 df e0 f3 c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 53 48 89 fb 48 8b 0b <8b> 71 24 85 f6 74 33 44 8d 46 01 48 8d 79 24 89 f0 f0 44 0f b1 
Sep 09 02:18:34 smaug kernel: RIP  [<ffffffffa03dce1c>] btrfs_root_node+0xc/0x50 [btrfs]
Sep 09 02:18:34 smaug kernel:  RSP <ffff881292cfbcd0>
Sep 09 02:18:34 smaug kernel: ---[ end trace 125800d45db3ce42 ]---
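
When a journal holds several reports like the two above, it can help to
split it into one record per "end trace" marker before comparing them.
This is a minimal Python sketch under stated assumptions: the names
(END_RE, HEADLINE_RE, TASK_RE, split_reports, SAMPLE) are hypothetical,
SAMPLE is a shortened excerpt of the log above with the syslog prefix
dropped, and only the three headline kinds seen in this thread are
recognized.

```python
import re

END_RE = re.compile(r"---\[ end trace (?P<id>[0-9a-f]+) \]---")
HEADLINE_RE = re.compile(r"WARNING: .*|general protection fault: .*|Oops: .*")
TASK_RE = re.compile(r"CPU: \d+ PID: (?P<pid>\d+) Comm: (?P<comm>\S+)")

SAMPLE = """\
------------[ cut here ]------------
WARNING: CPU: 4 PID: 2189174 at lib/list_debug.c:33 __list_add+0x86/0xb0
list_add corruption. prev->next should be next (ffff8820079d6308), but was ffff88181e7e0d28. (prev=ffff8810b209fe10).
CPU: 4 PID: 2189174 Comm: git-annex Tainted: G        W IO    4.7.0-rc2+ #1
---[ end trace 125800d45db3ce41 ]---
general protection fault: 0000 [#1] SMP
CPU: 5 PID: 2189174 Comm: git-annex Tainted: G        W IO    4.7.0-rc2+ #1
RIP: 0010:[<ffffffffa03dce1c>]  [<ffffffffa03dce1c>] btrfs_root_node+0xc/0x50 [btrfs]
---[ end trace 125800d45db3ce42 ]---
"""


def split_reports(log_text):
    """Return one summary dict per report, closed by an "end trace" line."""
    reports, current = [], []
    for line in log_text.splitlines():
        current.append(line)
        end = END_RE.search(line)
        if end is None:
            continue
        report = {"trace_id": end.group("id"), "headline": None,
                  "pid": None, "comm": None}
        for entry in current:
            headline = HEADLINE_RE.search(entry)
            if headline and report["headline"] is None:
                report["headline"] = headline.group(0)
            task = TASK_RE.search(entry)
            if task and report["pid"] is None:
                report["pid"] = int(task.group("pid"))
                report["comm"] = task.group("comm")
        reports.append(report)
        current = []  # start collecting the next report
    return reports


reports = split_reports(SAMPLE)
```

Here both records carry the same PID and consecutive trace ids, so the
general protection fault follows the list_add corruption warning in the
same git-annex process; whether the first caused the second is not
something this sketch can tell.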

Does anything here hint at the problem?

I guess I have to rebuild the bleeding-edge version and try again... heh heh

Please CC me on replies
