All of lore.kernel.org
 help / color / mirror / Atom feed
* NMI from btrfs cron jobs in 3.11.6
@ 2013-11-13 14:45 Russell Coker
  2013-11-13 15:15 ` Duncan
  2013-11-28 17:37 ` David Sterba
  0 siblings, 2 replies; 6+ messages in thread
From: Russell Coker @ 2013-11-13 14:45 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 3029 bytes --]

Nov 14 00:00:01 workstation /USR/SBIN/CRON[30993]: (root) CMD 
(/usr/local/sbin/btrfs-make-snapshot minutes /home >> /var/log/snapshot.log)                                                                                                                                                                                                                      
Nov 14 00:00:01 workstation /USR/SBIN/CRON[30994]: (root) CMD 
(/usr/local/sbin/btrfs-make-snapshot days /home >> /var/log/snapshot.log)                                                                                                                                                                                                                         
Nov 14 00:00:01 workstation /USR/SBIN/CRON[30995]: (root) CMD 
(/usr/local/sbin/btrfs-remove-snapshots 100 100 /home/ / >> 
/var/log/snapshot.log)

Below is a grep for NMI in my kernel message log.  I'm running Debian kernel 
3.11.6-2 on a quad-core AMD64 system.  Above is the relevant section of the 
cron log.  It appears that running two scripts that create snapshots at the 
same time as a script that removes maybe 97 snapshots causes an NMI.  I've 
attached the kernel message log from today, I can give you logs from other 
days if they are useful.

Nov 11 00:00:58 workstation kernel: [896466.616001] sending NMI to all CPUs:
Nov 11 00:00:58 workstation kernel: [896466.616016] NMI backtrace for cpu 0
Nov 11 00:00:58 workstation kernel: [896466.616001] NMI backtrace for cpu 2
Nov 11 00:00:58 workstation kernel: [896466.616479] NMI backtrace for cpu 3
Nov 11 00:00:58 workstation kernel: [896466.616637] NMI backtrace for cpu 1
Nov 12 00:00:58 workstation kernel: [982863.428000] sending NMI to all CPUs:
Nov 12 00:00:58 workstation kernel: [982863.428000] NMI backtrace for cpu 1
Nov 12 00:00:58 workstation kernel: [982863.428340] NMI backtrace for cpu 0
Nov 12 00:00:58 workstation kernel: [982863.428647] NMI backtrace for cpu 2
Nov 12 00:00:58 workstation kernel: [982863.428725] NMI backtrace for cpu 3
Nov 13 00:00:57 workstation kernel: [1069265.036001] sending NMI to all CPUs:
Nov 13 00:00:57 workstation kernel: [1069265.036018] NMI backtrace for cpu 1
Nov 13 00:00:57 workstation kernel: [1069265.036341] NMI backtrace for cpu 0
Nov 13 00:00:57 workstation kernel: [1069265.036001] NMI backtrace for cpu 2
Nov 13 00:00:57 workstation kernel: [1069265.036723] NMI backtrace for cpu 3
Nov 13 22:21:04 workstation kernel: [1149672.214144] INFO: NMI handler 
(perf_event_nmi_handler) took too long to run: 6.060 msecs
Nov 14 00:00:56 workstation kernel: [1155663.796001] sending NMI to all CPUs:
Nov 14 00:00:56 workstation kernel: [1155663.796001] NMI backtrace for cpu 2
Nov 14 00:00:56 workstation kernel: [1155663.796269] NMI backtrace for cpu 3
Nov 14 00:00:56 workstation kernel: [1155663.796007] NMI backtrace for cpu 1
Nov 14 00:00:56 workstation kernel: [1155663.796581] NMI backtrace for cpu 0

-- 
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/

[-- Attachment #2: oops.gz --]
[-- Type: application/x-gzip, Size: 2761 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: NMI from btrfs cron jobs in 3.11.6
  2013-11-13 14:45 NMI from btrfs cron jobs in 3.11.6 Russell Coker
@ 2013-11-13 15:15 ` Duncan
  2013-11-17 14:43   ` Russell Coker
  2013-11-28 17:37 ` David Sterba
  1 sibling, 1 reply; 6+ messages in thread
From: Duncan @ 2013-11-13 15:15 UTC (permalink / raw)
  To: linux-btrfs

Russell Coker posted on Thu, 14 Nov 2013 01:45:29 +1100 as excerpted:

> It appears that running two scripts that create snapshots at the same
> time as a script that removes maybe 97 snapshots causes an NMI.

This is a known bug in 3.11 (and presumably earlier).  Snapshot 
manipulation concurrency evidently wasn't originally well considered, and 
either a recent regression triggered the problem or possibly it always 
existed and only recently was reported (I don't know either way, tho the 
btrfs devs might).

Either way, there's a patch both to address the problem and adding a 
regression test to the xfstests filesystem testing suite to catch future 
regressions, but it's too recent to be in 3.11 yet, and indeed, as of my 
update yesterday (3.12.0-03510-g9b66bfb), there hadn't yet been a btrfs 
pull in the 3.12 commit window yet either, so while 3.12 will likely get 
it, it's not there yet as of the above git snapshot.

If interested in more, check recent list threads for the bug report and 
proposed patch.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: NMI from btrfs cron jobs in 3.11.6
  2013-11-13 15:15 ` Duncan
@ 2013-11-17 14:43   ` Russell Coker
  2013-11-17 15:16     ` Marc MERLIN
  0 siblings, 1 reply; 6+ messages in thread
From: Russell Coker @ 2013-11-17 14:43 UTC (permalink / raw)
  To: linux-btrfs

On Thu, 14 Nov 2013, Duncan <1i5t5.duncan@cox.net> wrote:
> > It appears that running two scripts that create snapshots at the same
> > time as a script that removes maybe 97 snapshots causes an NMI.
> 
> This is a known bug in 3.11 (and presumably earlier).  Snapshot 
> manipulation concurrency evidently wasn't originally well considered, and 
> either a recent regression triggered the problem or possibly it always 
> existed and only recently was reported (I don't know either way, tho the 
> btrfs devs might).
> 
> Either way, there's a patch both to address the problem and adding a 
> regression test to the xfstests filesystem testing suite to catch future 

Thanks for the information.

As an aside, I had it happen on a workstation that I was using to watch a 
movie over NFS.  Strangely the BTRFS hang caused video playback to pause.  I 
didn't expect a BTRFS issue like that to stop access to an NFS mount from 
another system.

Would it be locking up inside the VFS?

-- 
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: NMI from btrfs cron jobs in 3.11.6
  2013-11-17 14:43   ` Russell Coker
@ 2013-11-17 15:16     ` Marc MERLIN
  2013-11-28 17:10       ` David Sterba
  0 siblings, 1 reply; 6+ messages in thread
From: Marc MERLIN @ 2013-11-17 15:16 UTC (permalink / raw)
  To: Russell Coker, 1i5t5.duncan; +Cc: linux-btrfs

On Mon, Nov 18, 2013 at 01:43:51AM +1100, Russell Coker wrote:
> On Thu, 14 Nov 2013, Duncan <1i5t5.duncan@cox.net> wrote:
> > > It appears that running two scripts that create snapshots at the same
> > > time as a script that removes maybe 97 snapshots causes an NMI.
> > 
> > This is a known bug in 3.11 (and presumably earlier).  Snapshot 
> > manipulation concurrency evidently wasn't originally well considered, and 

Not super important, but the bug does go farther back in time, I
reported it here too some time back (although didn't hear anything back
at the time).

But for me I had
/var/local/scr/btrfs_snaps: line 23: 26017 Segmentation fault      (core dumped) /sbin/btrfs subvolume
delete "$sub"
with details on the BT pasted in the mail:
http://permalink.gmane.org/gmane.comp.file-systems.btrfs/28770

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: NMI from btrfs cron jobs in 3.11.6
  2013-11-17 15:16     ` Marc MERLIN
@ 2013-11-28 17:10       ` David Sterba
  0 siblings, 0 replies; 6+ messages in thread
From: David Sterba @ 2013-11-28 17:10 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: Russell Coker, 1i5t5.duncan, linux-btrfs

On Sun, Nov 17, 2013 at 07:16:18AM -0800, Marc MERLIN wrote:
> On Mon, Nov 18, 2013 at 01:43:51AM +1100, Russell Coker wrote:
> > On Thu, 14 Nov 2013, Duncan <1i5t5.duncan@cox.net> wrote:
> > > > It appears that running two scripts that create snapshots at the same
> > > > time as a script that removes maybe 97 snapshots causes an NMI.
> > > 
> > > This is a known bug in 3.11 (and presumably earlier).  Snapshot 
> > > manipulation concurrency evidently wasn't originally well considered, and 
> 
> Not super important, but the bug does go farther back in time, I
> reported it here too some time back (although didn't hear anything back
> at the time).
> 
> But for me I had
> /var/local/scr/btrfs_snaps: line 23: 26017 Segmentation fault      (core dumped) /sbin/btrfs subvolume
> delete "$sub"
> with details on the BT pasted in the mail:
> http://permalink.gmane.org/gmane.comp.file-systems.btrfs/28770

Kernel 3.2.5

IIRC there was a fix or two regarding concurrency during snapshot
creation and deletion, roughly matching the kernel 3.2 times.

david

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: NMI from btrfs cron jobs in 3.11.6
  2013-11-13 14:45 NMI from btrfs cron jobs in 3.11.6 Russell Coker
  2013-11-13 15:15 ` Duncan
@ 2013-11-28 17:37 ` David Sterba
  1 sibling, 0 replies; 6+ messages in thread
From: David Sterba @ 2013-11-28 17:37 UTC (permalink / raw)
  To: Russell Coker; +Cc: linux-btrfs

On Thu, Nov 14, 2013 at 01:45:29AM +1100, Russell Coker wrote:
> Nov 14 00:00:01 workstation /USR/SBIN/CRON[30993]: (root) CMD 
> (/usr/local/sbin/btrfs-make-snapshot minutes /home >> /var/log/snapshot.log)                                                                                                                                                                                                                      
> Nov 14 00:00:01 workstation /USR/SBIN/CRON[30994]: (root) CMD 
> (/usr/local/sbin/btrfs-make-snapshot days /home >> /var/log/snapshot.log)                                                                                                                                                                                                                         
> Nov 14 00:00:01 workstation /USR/SBIN/CRON[30995]: (root) CMD 
> (/usr/local/sbin/btrfs-remove-snapshots 100 100 /home/ / >> 
> /var/log/snapshot.log)
> 
> Below is a grep for NMI in my kernel message log.  I'm running Debian kernel 
> 3.11.6-2 on a quad-core AMD64 system.  Above is the relevant section of the 
> cron log.  It appears that running two scripts that create snapshots at the 
> same time as a script that removes maybe 97 snapshots causes an NMI.  I've 
> attached the kernel message log from today, I can give you logs from other 
> days if they are useful.

Snapshot deletion may touch a lot of data and some operations are slowed
down by that. This may trigger the warning

  INFO: task T blocked for more than 120 seconds

If there's no other crash (that would block the system indefinitely),
the warnings disappear. According to the grep output, the warnings
happen at the same daytime, so this is likely the system overload
reason.

You may want to follow the warning's sugeestion:

"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message

or increase the timeout.

david

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2013-11-28 17:37 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-11-13 14:45 NMI from btrfs cron jobs in 3.11.6 Russell Coker
2013-11-13 15:15 ` Duncan
2013-11-17 14:43   ` Russell Coker
2013-11-17 15:16     ` Marc MERLIN
2013-11-28 17:10       ` David Sterba
2013-11-28 17:37 ` David Sterba

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.