From: Dave Jones <davej@redhat.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Dave Chinner <david@fromorbit.com>,
	Oleg Nesterov <oleg@redhat.com>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Linux Kernel <linux-kernel@vger.kernel.org>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	Andrey Vagin <avagin@openvz.org>,
	Steven Rostedt <rostedt@goodmis.org>
Subject: Re: frequent softlockups with 3.10rc6.
Date: Sat, 29 Jun 2013 19:44:49 -0400
Message-ID: <20130629234449.GA30554@redhat.com>
In-Reply-To: <CA+55aFz8rz=c8DsNHpBJW1K=FH-ztuPAMNOoGfM6HQyZByQ9mQ@mail.gmail.com>

On Sat, Jun 29, 2013 at 03:23:48PM -0700, Linus Torvalds wrote:

 > > So with that patch, those two boxes have now been fuzzing away for
 > > over 24hrs without seeing that specific sync related bug.
 > 
 > Ok, so at least that confirms that yes, the problem is the excessive
 > contention on inode_sb_list_lock.
 > 
 > Ugh. There's no way we can do that patch by DaveC for 3.10. Not only
 > is it scary, Andi pointed out that it's actively buggy and will miss
 > inodes that need writeback due to moving things to private lists.
 > 
 > So I suspect we'll have to do 3.10 with this starvation issue in
 > place, and mark for stable backporting whatever eventual fix we find.

Given I'm the only person who seems to have been bitten by this,
I suspect it's not going to be a big deal.  Worst case we can tell
people "yeah, just disable the soft watchdog until this is fixed".

 > > I did see the trace below, but I think that's a different problem..
 > > Not sure who to point at for that one though. Linus?
 > 
 > Hmm.
 > 
 > > [ 1583.293952] RIP: 0010:[<ffffffff810dd856>]  [<ffffffff810dd856>] stop_machine_cpu_stop+0x86/0x110
 > 
 > I'm not sure how sane the watchdog is over stop_machine situations. I
 > think we disable the watchdog for suspend/resume exactly because
 > stop-machine can take almost arbitrarily long. I'm assuming you're
 > stress-testing (perhaps unintentionally) the cpu offlining/onlining
 > and/or memory migration, which is just fundamentally big expensive
 > things.
 >
 > Does the machine recover? Because if it does, I'd be inclined to just
 > ignore it.

It did, after spewing that a few times, followed by this one..

BUG: soft lockup - CPU#2 stuck for 23s! [trinity-child3:2185]
Modules linked in: bridge stp dlci mpoa snd_seq_dummy sctp fuse hidp tun bnep nfnetlink scsi_transport_iscsi rfcomm can_raw can_bcm af_802154 appletalk caif_socket can caif ipt_ULOG x25 rose af_key pppoe pppox ipx phonet irda llc2 ppp_generic slhc p8023 psnap p8022 llc crc_ccitt atm bluetooth netrom ax25 nfc rfkill rds af_rxrpc coretemp hwmon kvm_intel kvm crc32c_intel snd_hda_codec_realtek ghash_clmulni_intel microcode pcspkr snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep usb_debug snd_seq snd_seq_device snd_pcm e1000e snd_page_alloc snd_timer ptp snd pps_core soundcore xfs libcrc32c
irq event stamp: 2291065
hardirqs last  enabled at (2291064): [<ffffffff816edca0>] restore_args+0x0/0x30
hardirqs last disabled at (2291065): [<ffffffff816f67aa>] apic_timer_interrupt+0x6a/0x80
softirqs last  enabled at (2290298): [<ffffffff810542e4>] __do_softirq+0x194/0x440
softirqs last disabled at (2290301): [<ffffffff8105474d>] irq_exit+0xcd/0xe0
CPU: 2 PID: 2185 Comm: trinity-child3 Not tainted 3.10.0-rc7+ #37 [loadavg: 27.02 10.32 6.81 60/194 2646]
task: ffff8801023e4a40 ti: ffff88022c958000 task.ti: ffff88022c958000
RIP: 0010:[<ffffffff81054201>]  [<ffffffff81054201>] __do_softirq+0xb1/0x440
RSP: 0000:ffff880244c03f08  EFLAGS: 00000206
RAX: ffff8801023e4a40 RBX: ffffffff816edca0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8801023e4a40
RBP: ffff880244c03f70 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: ffff880244c03e78
R13: ffffffff816f67af R14: ffff880244c03f70 R15: 0000000000000000
FS:  00007f0f89ffb740(0000) GS:ffff880244c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000002c1b000 CR3: 0000000210a2f000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
Stack:
 0000000a00406040 00000001002e7923 ffff88022c959fd8 ffff88022c959fd8
 ffff88022c959fd8 ffff8801023e4e38 ffff88022c959fd8 ffffffff00000002
 ffff8801023e4a40 0000000000000000 0000000000000006 0000000001807000
Call Trace:
 <IRQ> 
 [<ffffffff8105474d>] irq_exit+0xcd/0xe0
 [<ffffffff816f764b>] smp_apic_timer_interrupt+0x6b/0x9b
 [<ffffffff816f67af>] apic_timer_interrupt+0x6f/0x80
 <EOI> 
 [<ffffffff816edca0>] ? retint_restore_args+0xe/0xe
 [<ffffffff816eacf0>] ? wait_for_completion_interruptible+0x170/0x170
 [<ffffffff816ebd93>] ? preempt_schedule_irq+0x53/0x90
 [<ffffffff816eddb6>] retint_kernel+0x26/0x30
 [<ffffffff81145ba7>] ? user_enter+0x87/0xd0
 [<ffffffff816f1345>] do_page_fault+0x45/0x50
 [<ffffffff816edee2>] page_fault+0x22/0x30
Code: 48 89 45 b8 48 89 45 b0 48 89 45 a8 66 0f 1f 44 00 00 65 c7 04 25 80 0f 1d 00 00 00 00 00 e8 d7 35 06 00 fb 49 c7 c6 00 41 c0 81 <eb> 0e 0f 1f 44 00 00 49 83 c6 08 41 d1 ef 74 6c 41 f6 c7 01 74 

But after that, and one more from stop_machine, it's been quiet since, still chugging along.
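
For picking reports like the one above out of a long dmesg capture, a
minimal Python sketch follows. It assumes the 3.10-era watchdog banner and
x86-64 RIP line formats shown in this message; the names summarize_lockups,
BANNER and RIP are illustrative and not from any existing tool.

```python
import re

# Watchdog banner, e.g.
# "BUG: soft lockup - CPU#2 stuck for 23s! [trinity-child3:2185]"
BANNER = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>.+):(?P<pid>\d+)\]"
)

# x86-64 RIP line as printed above; captures "symbol+offset/size".
# Assumption: the address appears twice in [<...>] brackets, as in this oops.
RIP = re.compile(
    r"RIP: [0-9a-f]{4}:\[<[0-9a-f]+>\]\s+\[<[0-9a-f]+>\]\s+(?P<sym>\S+)"
)


def summarize_lockups(log_text):
    """Return one dict per soft-lockup report found in log_text."""
    reports = []
    current = None
    for line in log_text.splitlines():
        m = BANNER.search(line)
        if m:
            # A new banner starts a new report.
            current = {
                "cpu": int(m["cpu"]),
                "secs": int(m["secs"]),
                "comm": m["comm"],
                "pid": int(m["pid"]),
                "rip": None,
            }
            reports.append(current)
            continue
        m = RIP.search(line)
        # Attach only the first RIP line seen after each banner.
        if m and current is not None and current["rip"] is None:
            current["rip"] = m["sym"]
    return reports
```

Fed a saved log (say, a hypothetical dmesg.txt), it returns one entry per
report, so the stop_machine ones and the __do_softirq one can be told apart
by the "rip" field.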

 > Although it would be interesting to hear what triggers this
 > - normal users - and I'm assuming you're still running trinity as
 > non-root - generally should not be able to trigger stop-machine
 > events..

Yeah, this is running as a user. Those don't sound like things that should
be possible.  What instrumentation could I add to figure out why
that kthread got awakened?

	Dave


