* NMI from btrfs cron jobs in 3.11.6
@ 2013-11-13 14:45 Russell Coker
2013-11-13 15:15 ` Duncan
2013-11-28 17:37 ` David Sterba
0 siblings, 2 replies; 6+ messages in thread
From: Russell Coker @ 2013-11-13 14:45 UTC (permalink / raw)
To: linux-btrfs
[-- Attachment #1: Type: text/plain, Size: 3029 bytes --]
Nov 14 00:00:01 workstation /USR/SBIN/CRON[30993]: (root) CMD
(/usr/local/sbin/btrfs-make-snapshot minutes /home >> /var/log/snapshot.log)
Nov 14 00:00:01 workstation /USR/SBIN/CRON[30994]: (root) CMD
(/usr/local/sbin/btrfs-make-snapshot days /home >> /var/log/snapshot.log)
Nov 14 00:00:01 workstation /USR/SBIN/CRON[30995]: (root) CMD
(/usr/local/sbin/btrfs-remove-snapshots 100 100 /home/ / >>
/var/log/snapshot.log)
Below is a grep for NMI in my kernel message log. I'm running Debian kernel
3.11.6-2 on a quad-core AMD64 system. Above is the relevant section of the
cron log. It appears that running two scripts that create snapshots at the
same time as a script that removes maybe 97 snapshots causes an NMI. I've
attached the kernel message log from today, I can give you logs from other
days if they are useful.
Nov 11 00:00:58 workstation kernel: [896466.616001] sending NMI to all CPUs:
Nov 11 00:00:58 workstation kernel: [896466.616016] NMI backtrace for cpu 0
Nov 11 00:00:58 workstation kernel: [896466.616001] NMI backtrace for cpu 2
Nov 11 00:00:58 workstation kernel: [896466.616479] NMI backtrace for cpu 3
Nov 11 00:00:58 workstation kernel: [896466.616637] NMI backtrace for cpu 1
Nov 12 00:00:58 workstation kernel: [982863.428000] sending NMI to all CPUs:
Nov 12 00:00:58 workstation kernel: [982863.428000] NMI backtrace for cpu 1
Nov 12 00:00:58 workstation kernel: [982863.428340] NMI backtrace for cpu 0
Nov 12 00:00:58 workstation kernel: [982863.428647] NMI backtrace for cpu 2
Nov 12 00:00:58 workstation kernel: [982863.428725] NMI backtrace for cpu 3
Nov 13 00:00:57 workstation kernel: [1069265.036001] sending NMI to all CPUs:
Nov 13 00:00:57 workstation kernel: [1069265.036018] NMI backtrace for cpu 1
Nov 13 00:00:57 workstation kernel: [1069265.036341] NMI backtrace for cpu 0
Nov 13 00:00:57 workstation kernel: [1069265.036001] NMI backtrace for cpu 2
Nov 13 00:00:57 workstation kernel: [1069265.036723] NMI backtrace for cpu 3
Nov 13 22:21:04 workstation kernel: [1149672.214144] INFO: NMI handler
(perf_event_nmi_handler) took too long to run: 6.060 msecs
Nov 14 00:00:56 workstation kernel: [1155663.796001] sending NMI to all CPUs:
Nov 14 00:00:56 workstation kernel: [1155663.796001] NMI backtrace for cpu 2
Nov 14 00:00:56 workstation kernel: [1155663.796269] NMI backtrace for cpu 3
Nov 14 00:00:56 workstation kernel: [1155663.796007] NMI backtrace for cpu 1
Nov 14 00:00:56 workstation kernel: [1155663.796581] NMI backtrace for cpu 0
--
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/
[-- Attachment #2: oops.gz --]
[-- Type: application/x-gzip, Size: 2761 bytes --]
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: NMI from btrfs cron jobs in 3.11.6
2013-11-13 14:45 NMI from btrfs cron jobs in 3.11.6 Russell Coker
@ 2013-11-13 15:15 ` Duncan
2013-11-17 14:43 ` Russell Coker
2013-11-28 17:37 ` David Sterba
1 sibling, 1 reply; 6+ messages in thread
From: Duncan @ 2013-11-13 15:15 UTC (permalink / raw)
To: linux-btrfs
Russell Coker posted on Thu, 14 Nov 2013 01:45:29 +1100 as excerpted:
> It appears that running two scripts that create snapshots at the same
> time as a script that removes maybe 97 snapshots causes an NMI.
This is a known bug in 3.11 (and presumably earlier). Snapshot
manipulation concurrency evidently wasn't originally well considered, and
either a recent regression triggered the problem or possibly it always
existed and only recently was reported (I don't know either way, tho the
btrfs devs might).
Either way, there's a patch both to address the problem and adding a
regression test to the xfstests filesystem testing suite to catch future
regressions, but it's too recent to be in 3.11 yet, and indeed, as of my
update yesterday (3.12.0-03510-g9b66bfb), there hadn't yet been a btrfs
pull in the 3.12 commit window yet either, so while 3.12 will likely get
it, it's not there yet as of the above git snapshot.
If interested in more, check recent list threads for the bug report and
proposed patch.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: NMI from btrfs cron jobs in 3.11.6
2013-11-13 15:15 ` Duncan
@ 2013-11-17 14:43 ` Russell Coker
2013-11-17 15:16 ` Marc MERLIN
0 siblings, 1 reply; 6+ messages in thread
From: Russell Coker @ 2013-11-17 14:43 UTC (permalink / raw)
To: linux-btrfs
On Thu, 14 Nov 2013, Duncan <1i5t5.duncan@cox.net> wrote:
> > It appears that running two scripts that create snapshots at the same
> > time as a script that removes maybe 97 snapshots causes an NMI.
>
> This is a known bug in 3.11 (and presumably earlier). Snapshot
> manipulation concurrency evidently wasn't originally well considered, and
> either a recent regression triggered the problem or possibly it always
> existed and only recently was reported (I don't know either way, tho the
> btrfs devs might).
>
> Either way, there's a patch both to address the problem and adding a
> regression test to the xfstests filesystem testing suite to catch future
Thanks for the information.
As an aside, I had it happen on a workstation that I was using to watch a
movie over NFS. Strangely the BTRFS hang caused video playback to pause. I
didn't expect a BTRFS issue like that to stop access to an NFS mount from
another system.
Would it be locking up inside the VFS?
--
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: NMI from btrfs cron jobs in 3.11.6
2013-11-17 14:43 ` Russell Coker
@ 2013-11-17 15:16 ` Marc MERLIN
2013-11-28 17:10 ` David Sterba
0 siblings, 1 reply; 6+ messages in thread
From: Marc MERLIN @ 2013-11-17 15:16 UTC (permalink / raw)
To: Russell Coker, 1i5t5.duncan; +Cc: linux-btrfs
On Mon, Nov 18, 2013 at 01:43:51AM +1100, Russell Coker wrote:
> On Thu, 14 Nov 2013, Duncan <1i5t5.duncan@cox.net> wrote:
> > > It appears that running two scripts that create snapshots at the same
> > > time as a script that removes maybe 97 snapshots causes an NMI.
> >
> > This is a known bug in 3.11 (and presumably earlier). Snapshot
> > manipulation concurrency evidently wasn't originally well considered, and
Not super important, but the bug does go farther back in time, I
reported it here too some time back (although didn't hear anything back
at the time).
But for me I had
/var/local/scr/btrfs_snaps: line 23: 26017 Segmentation fault (core dumped) /sbin/btrfs subvolume
delete "$sub"
with details on the BT pasted in the mail:
http://permalink.gmane.org/gmane.comp.file-systems.btrfs/28770
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: NMI from btrfs cron jobs in 3.11.6
2013-11-17 15:16 ` Marc MERLIN
@ 2013-11-28 17:10 ` David Sterba
0 siblings, 0 replies; 6+ messages in thread
From: David Sterba @ 2013-11-28 17:10 UTC (permalink / raw)
To: Marc MERLIN; +Cc: Russell Coker, 1i5t5.duncan, linux-btrfs
On Sun, Nov 17, 2013 at 07:16:18AM -0800, Marc MERLIN wrote:
> On Mon, Nov 18, 2013 at 01:43:51AM +1100, Russell Coker wrote:
> > On Thu, 14 Nov 2013, Duncan <1i5t5.duncan@cox.net> wrote:
> > > > It appears that running two scripts that create snapshots at the same
> > > > time as a script that removes maybe 97 snapshots causes an NMI.
> > >
> > > This is a known bug in 3.11 (and presumably earlier). Snapshot
> > > manipulation concurrency evidently wasn't originally well considered, and
>
> Not super important, but the bug does go farther back in time, I
> reported it here too some time back (although didn't hear anything back
> at the time).
>
> But for me I had
> /var/local/scr/btrfs_snaps: line 23: 26017 Segmentation fault (core dumped) /sbin/btrfs subvolume
> delete "$sub"
> with details on the BT pasted in the mail:
> http://permalink.gmane.org/gmane.comp.file-systems.btrfs/28770
Kernel 3.2.5
IIRC there was a fix or two regarding concurrency during snapshot
creation and deletion, roughly matching the kernel 3.2 times.
david
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: NMI from btrfs cron jobs in 3.11.6
2013-11-13 14:45 NMI from btrfs cron jobs in 3.11.6 Russell Coker
2013-11-13 15:15 ` Duncan
@ 2013-11-28 17:37 ` David Sterba
1 sibling, 0 replies; 6+ messages in thread
From: David Sterba @ 2013-11-28 17:37 UTC (permalink / raw)
To: Russell Coker; +Cc: linux-btrfs
On Thu, Nov 14, 2013 at 01:45:29AM +1100, Russell Coker wrote:
> Nov 14 00:00:01 workstation /USR/SBIN/CRON[30993]: (root) CMD
> (/usr/local/sbin/btrfs-make-snapshot minutes /home >> /var/log/snapshot.log)
> Nov 14 00:00:01 workstation /USR/SBIN/CRON[30994]: (root) CMD
> (/usr/local/sbin/btrfs-make-snapshot days /home >> /var/log/snapshot.log)
> Nov 14 00:00:01 workstation /USR/SBIN/CRON[30995]: (root) CMD
> (/usr/local/sbin/btrfs-remove-snapshots 100 100 /home/ / >>
> /var/log/snapshot.log)
>
> Below is a grep for NMI in my kernel message log. I'm running Debian kernel
> 3.11.6-2 on a quad-core AMD64 system. Above is the relevant section of the
> cron log. It appears that running two scripts that create snapshots at the
> same time as a script that removes maybe 97 snapshots causes an NMI. I've
> attached the kernel message log from today, I can give you logs from other
> days if they are useful.
Snapshot deletion may touch a lot of data and some operations are slowed
down by that. This may trigger the warning
INFO: task T blocked for more than 120 seconds
If there's no other crash (that would block the system indefinitely),
the warnings disappear. According to the grep output, the warnings
happen at the same daytime, so this is likely the system overload
reason.
You may want to follow the warning's sugeestion:
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message
or increase the timeout.
david
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2013-11-28 17:37 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-11-13 14:45 NMI from btrfs cron jobs in 3.11.6 Russell Coker
2013-11-13 15:15 ` Duncan
2013-11-17 14:43 ` Russell Coker
2013-11-17 15:16 ` Marc MERLIN
2013-11-28 17:10 ` David Sterba
2013-11-28 17:37 ` David Sterba
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.