From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Jay Vosburgh <jay.vosburgh@canonical.com>
Cc: Yanko Kaneti <yaneti@declera.com>,
Josh Boyer <jwboyer@fedoraproject.org>,
"Eric W. Biederman" <ebiederm@xmission.com>,
Cong Wang <cwang@twopensource.com>, Kevin Fenzi <kevin@scrye.com>,
netdev <netdev@vger.kernel.org>,
"Linux-Kernel@Vger. Kernel. Org" <linux-kernel@vger.kernel.org>,
mroos@linux.ee, tj@kernel.org
Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?
Date: Fri, 24 Oct 2014 22:16:02 -0700
Message-ID: <20141025051602.GB28247@linux.vnet.ibm.com>
In-Reply-To: <11813.1414211613@famine>
On Fri, Oct 24, 2014 at 09:33:33PM -0700, Jay Vosburgh wrote:
> Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
>
> >On Fri, Oct 24, 2014 at 05:20:48PM -0700, Jay Vosburgh wrote:
> >> Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
> >>
> >> >On Fri, Oct 24, 2014 at 03:59:31PM -0700, Paul E. McKenney wrote:
> >> [...]
> >> >> Hmmm... It sure looks like we have some callbacks stuck here. I clearly
> >> >> need to take a hard look at the sleep/wakeup code.
> >> >>
> >> >> Thank you for running this!!!
> >> >
> >> >Could you please try the following patch? If no joy, could you please
> >> >add rcu:rcu_nocb_wake to the list of ftrace events?
> >>
> >> I tried the patch, it did not change the behavior.
> >>
> >> I enabled the rcu:rcu_barrier and rcu:rcu_nocb_wake tracepoints
> >> and ran it again (with this patch and the first patch from earlier
> >> today); the trace output is a bit on the large side so I put it and the
> >> dmesg log at:
> >>
> >> http://people.canonical.com/~jvosburgh/nocb-wake-dmesg.txt
> >>
> >> http://people.canonical.com/~jvosburgh/nocb-wake-trace.txt
> >
> >Thank you again!
> >
> >Very strange part of the trace. The only signs of CPUs 2 and 3 are:
> >
> > ovs-vswitchd-902 [000] .... 109.896840: rcu_barrier: rcu_sched Begin cpu -1 remaining 0 # 0
> > ovs-vswitchd-902 [000] .... 109.896840: rcu_barrier: rcu_sched Check cpu -1 remaining 0 # 0
> > ovs-vswitchd-902 [000] .... 109.896841: rcu_barrier: rcu_sched Inc1 cpu -1 remaining 0 # 1
> > ovs-vswitchd-902 [000] .... 109.896841: rcu_barrier: rcu_sched OnlineNoCB cpu 0 remaining 1 # 1
> > ovs-vswitchd-902 [000] d... 109.896841: rcu_nocb_wake: rcu_sched 0 WakeNot
> > ovs-vswitchd-902 [000] .... 109.896841: rcu_barrier: rcu_sched OnlineNoCB cpu 1 remaining 2 # 1
> > ovs-vswitchd-902 [000] d... 109.896841: rcu_nocb_wake: rcu_sched 1 WakeNot
> > ovs-vswitchd-902 [000] .... 109.896842: rcu_barrier: rcu_sched OnlineNoCB cpu 2 remaining 3 # 1
> > ovs-vswitchd-902 [000] d... 109.896842: rcu_nocb_wake: rcu_sched 2 WakeNotPoll
> > ovs-vswitchd-902 [000] .... 109.896842: rcu_barrier: rcu_sched OnlineNoCB cpu 3 remaining 4 # 1
> > ovs-vswitchd-902 [000] d... 109.896842: rcu_nocb_wake: rcu_sched 3 WakeNotPoll
> > ovs-vswitchd-902 [000] .... 109.896843: rcu_barrier: rcu_sched Inc2 cpu -1 remaining 4 # 2
> >
> >The pair of WakeNotPoll trace entries says that at that point, RCU believed
> >that CPU 2's and CPU 3's rcuo kthreads did not exist. :-/
>
> On the test system I'm using, CPUs 2 and 3 really do not exist;
> it is a 2 CPU system (Intel Core 2 Duo E8400). I mentioned this in an
> earlier message, but perhaps you missed it in the flurry.
Or forgot it. Either way, thank you for reminding me.
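For a longer trace, the same check can be automated. The sketch below is illustrative Python, not an existing tool. It pulls the rcu_nocb_wake events out of trace lines like the ones quoted above and lists the CPUs that got WakeNotPoll. Following the reading above, those are the CPUs for which RCU saw no rcuo kthread. The regular expression assumes exactly the line layout shown in this trace, and the function names are hypothetical.

```python
import re

# Matches rcu_nocb_wake events in the layout of the trace quoted above, e.g.
# "... 109.896842: rcu_nocb_wake: rcu_sched 2 WakeNotPoll"
NOCB_WAKE = re.compile(r"rcu_nocb_wake: (\S+) (\d+) (\w+)\s*$")


def nocb_wake_reasons(trace_lines):
    """Map (RCU flavor, CPU) to the reason of the last rcu_nocb_wake event seen."""
    reasons = {}
    for line in trace_lines:
        m = NOCB_WAKE.search(line)
        if m:  # rcu_barrier and other events simply do not match
            reasons[(m.group(1), int(m.group(2)))] = m.group(3)
    return reasons


def cpus_without_kthread(trace_lines, flavor="rcu_sched"):
    """CPUs whose last wakeup for `flavor` was WakeNotPoll (read as: no rcuo kthread)."""
    return sorted(cpu for (fl, cpu), reason in nocb_wake_reasons(trace_lines).items()
                  if fl == flavor and reason == "WakeNotPoll")


trace = [
    " ovs-vswitchd-902 [000] .... 109.896841: rcu_barrier: rcu_sched OnlineNoCB cpu 0 remaining 1 # 1",
    " ovs-vswitchd-902 [000] d... 109.896841: rcu_nocb_wake: rcu_sched 0 WakeNot",
    " ovs-vswitchd-902 [000] d... 109.896841: rcu_nocb_wake: rcu_sched 1 WakeNot",
    " ovs-vswitchd-902 [000] d... 109.896842: rcu_nocb_wake: rcu_sched 2 WakeNotPoll",
    " ovs-vswitchd-902 [000] d... 109.896842: rcu_nocb_wake: rcu_sched 3 WakeNotPoll",
]
print(cpus_without_kthread(trace))  # [2, 3]
```

Keeping only the last event per CPU is a deliberate simplification. A full trace could show a CPU's state change over time, so one might prefer to keep every event with its timestamp instead.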
> Looking at the dmesg, the early boot messages seem to be
> confused as to how many CPUs there are, e.g.,
>
> [ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
> [ 0.000000] Hierarchical RCU implementation.
> [ 0.000000] RCU debugfs-based tracing is enabled.
> [ 0.000000] RCU dyntick-idle grace-period acceleration is enabled.
> [ 0.000000] RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=4.
> [ 0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=4
> [ 0.000000] NR_IRQS:16640 nr_irqs:456 0
> [ 0.000000] Offload RCU callbacks from all CPUs
> [ 0.000000] Offload RCU callbacks from CPUs: 0-3.
>
> but later shows 2:
>
> [ 0.233703] x86: Booting SMP configuration:
> [ 0.236003] .... node #0, CPUs: #1
> [ 0.255528] x86: Booted up 1 node, 2 CPUs
>
> In any event, the E8400 is a 2 core CPU with no hyperthreading.
Well, this might explain some of the difficulties. If RCU decides to wait
on CPUs that don't exist, we will of course get a hang. And rcu_barrier()
was definitely expecting four CPUs.
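To make the hang concrete, here is a toy model in hypothetical Python (not kernel code). It loosely follows the counting that the Inc1/OnlineNoCB/Inc2 trace entries suggest. The caller holds one initial reference, and one barrier callback is queued per possible no-CBs CPU. The caller then drops its reference, but only CPUs that actually have an rcuo kthread ever invoke their callback. Any callback queued to a CPU without a kthread therefore keeps the count above zero forever.

```python
def rcu_barrier_remaining(possible_cpus, cpus_with_kthread):
    """Toy model: how many barrier callbacks are still pending at the end.

    Hypothetical simplification, not the kernel's rcu_barrier(): one callback
    is queued per possible no-CBs CPU, but only CPUs that have an rcuo
    kthread ever invoke (and thus retire) their callback.
    """
    count = 1                      # initial reference held by the caller
    queued = []
    for cpu in possible_cpus:
        count += 1                 # one pending barrier callback per CPU
        queued.append(cpu)
    count -= 1                     # caller drops its initial reference
    for cpu in queued:
        if cpu in cpus_with_kthread:
            count -= 1             # callback invoked by that CPU's rcuo kthread
    return count                   # nonzero => the waiter would never be woken


print(rcu_barrier_remaining(range(4), {0, 1}))  # 2: four possible CPUs, two kthreads
print(rcu_barrier_remaining(range(2), {0, 1}))  # 0: possible CPUs match real ones
```

The second call mirrors the experiment proposed below: if only the two real CPUs are considered possible, every queued callback has a kthread to run it and the count drains to zero.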
So what happens if you boot with maxcpus=2? (Or build with
CONFIG_NR_CPUS=2.) I suspect that this might avoid the hang. If so,
I might have some ideas for a real fix.
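Before and after such a reboot, the kernel's view can be checked from userspace. sysfs exposes the possible and online CPU masks as /sys/devices/system/cpu/possible and /sys/devices/system/cpu/online, using the same range format as "0-3". The small sketch below (helper names are hypothetical) reports CPUs that are possible but not online. On this E8400 one might expect [2, 3] if the possible mask matches nr_cpu_ids=4. That is an assumption, not something the logs above show directly.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Parse the kernel's CPU list format, e.g. '0-3' or '0-1,4,6-7', into a set."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def phantom_cpus(sysfs="/sys/devices/system/cpu"):
    """Sorted list of CPUs the kernel considers possible but that are not online."""
    base = Path(sysfs)
    possible = parse_cpu_list((base / "possible").read_text())
    online = parse_cpu_list((base / "online").read_text())
    return sorted(possible - online)
```

The sysfs directory is a parameter so the check can be pointed at saved copies of those two files instead of the live system.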
Thanx, Paul