From: "Paul E. McKenney" <paulmck@kernel.org>
To: donghai qiao <donghai.w.qiao@gmail.com>
Cc: Boqun Feng <boqun.feng@gmail.com>,
rcu@vger.kernel.org, netdev@vger.kernel.org
Subject: Re: RCU: rcu stall issues and an approach to the fix
Date: Mon, 4 Oct 2021 17:59:08 -0700 [thread overview]
Message-ID: <20211005005908.GR880162@paulmck-ThinkPad-P17-Gen-1> (raw)
In-Reply-To: <CAOzhvcO3a-GiipELoztmGWOmABuSC=b5vcBu8bC_Q-aT=Fe5ng@mail.gmail.com>
On Mon, Oct 04, 2021 at 05:22:52PM -0400, donghai qiao wrote:
> Hello Paul,
> Sorry it has been so long..
On this problem, your schedule is my schedule. At least as long as you
are not expecting an instantaneous response. ;-)
> > > Because I am dealing with this issue in multiple kernel versions, sometimes
> > > the configurations in these kernels may differ. Initially, the problem I
> > > described originated on rhel-8, on which the problem occurs more often and
> > > is a bit easier to reproduce than on others.
> >
> > Understood, that does make things more difficult.
> >
> > > Regarding these dynticks* parameters, I collected the data for CPU 0 as below:
> > >   - dynticks = 0x6eab02, which indicated the CPU was not in eqs.
> > >   - dynticks_nesting = 1, which is its initial state, so it said the CPU
> > >     was not in eqs either.
> > >   - dynticks_nmi_nesting = 0x4000000000000004, which meant that this CPU
> > >     had been interrupted when it was in the middle of the first interrupt.
> > > And this is true: the first interrupt was the sched_timer interrupt, and
> > > the second was an NMI sent when another CPU detected the RCU stall on
> > > CPU 0. So it all looks consistent. If the kernel missed an rcu_user_enter()
> > > or rcu_user_exit(), would these items still look consistent? But I'll
> > > investigate that possibility seriously, as you pointed out.
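For reference, here is a minimal sketch (not the exact kernel code, and the
encoding changed in later kernels) of how that ->dynticks snapshot is
interpreted in kernels of roughly this vintage, where a clear 0x2 bit
(RCU_DYNTICK_CTRL_CTR) means the CPU is in an extended quiescent state:

        /*
         * Sketch only: models rcu_dynticks_in_eqs() for kernels that
         * still use the RCU_DYNTICK_CTRL_CTR encoding.
         */
        static bool sketch_in_eqs(int snap)
        {
                return !(snap & 0x2);   /* RCU_DYNTICK_CTRL_CTR */
        }

With dynticks == 0x6eab02 that bit is set, so the CPU is not in eqs;
dynticks_nesting == 1 is the usual non-idle task-context value; and
dynticks_nmi_nesting == 0x4000000000000004 is DYNTICK_IRQ_NONIDLE
(0x4000000000000000) plus 2 for each of two nested interrupts. All of
that is consistent with your description.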
> >
> > So is the initial state non-eqs because it was interrupted from kernel
> > mode? Or because a missing rcu_user_enter() left ->dynticks_nesting
> > incorrectly equal to the value of 1? Or something else?
>
> As far as the original problem is concerned, the user thread was interrupted
> by the timer, so the CPU was not running in nohz mode. But I saw similar
> problems on CPUs running in nohz mode with different configurations.
OK.
> > > > There were some issues of this sort around the v5.8 timeframe. Might
> > > > there be another patch that needs to be backported? Or a patch that
> > > > was backported, but should not have been?
> > >
> > > Good to know that clue. I'll take a look into the log history.
> > >
> > > > Is it possible to bisect this?
> > > >
> > > > Or, again, to run with CONFIG_RCU_EQS_DEBUG=y?
> > >
> > > I am building the latest 5.14 kernel with this config and will give it a
> > > try when the machine is set up, to see how much it helps.
> >
> > Very good, as that will help determine whether or not the problem is
> > due to backporting issues.
>
> I enabled CONFIG_RCU_EQS_DEBUG=y as you suggested and tried it on
> both the latest rhel8 and the later upstream version 5.15.0-r1; it
> turns out that no new warning messages related to this came out. So
> rcu_user_enter()/rcu_user_exit() should be paired correctly.
OK, good.
> > > > Either way, what should happen is that dyntick_save_progress_counter() or
> > > > rcu_implicit_dynticks_qs() should see the rdp->dynticks field indicating
> > > > nohz_full user execution, and then the quiescent state will be supplied
> > > > on behalf of that CPU.
> > >
> > > Agreed. But the counter rdp->dynticks of the CPU can only be updated
> > > by rcu_dynticks_eqs_enter() or rcu_dynticks_eqs_exit() when rcu_eqs_enter()
> > > or rcu_eqs_exit() is called, which in turn depends on a context switch.
> > > So, when a context switch never happens, the counter rdp->dynticks
> > > never advances. That's the thing I am trying to fix here.
> >
> > First, understand the problem. Otherwise, your fix is not so likely
> > to actually fix anything. ;-)
> >
> > If kernel mode was interrupted, there is probably a missing cond_resched().
> > But in sufficiently old kernels, cond_resched() doesn't do anything for
> > RCU unless a context switch actually happened. In some of those kernels,
> > you can use cond_resched_rcu_qs() instead to get RCU's attention. In
> > really old kernels, life is hard and you will need to do some backporting.
> > Or move to newer kernels.
> >
> > In short, if an in-kernel code path runs for long enough without hitting
> > a cond_resched() or similar, that is a bug. The RCU CPU stall warning
> > that you will get is your diagnostic.
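To make the "cond_resched() or similar" point concrete, here is a minimal
sketch of the usual pattern (the item structure and the process_one() helper
are made up purely for illustration):

        /*
         * Sketch: a long-running kernel loop that periodically gives the
         * scheduler (and, in newer kernels, RCU) a chance to make progress.
         */
        struct my_item {
                struct list_head node;
                /* ... payload ... */
        };

        static void process_all_items(struct list_head *head)
        {
                struct my_item *item;

                list_for_each_entry(item, head, node) {
                        process_one(item);      /* hypothetical per-item work */
                        cond_resched();
                }
        }

In sufficiently old kernels, as noted above, cond_resched_rcu_qs() is the
drop-in substitute when cond_resched() alone does not get RCU's attention.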
>
> Probably this is the case. With the test for 5.15.0-r1, I have seen different
> scenarios; among them, the most frequent ones were caused by networking, in
> which a bunch of networking threads were spinning on the same rwlock.
>
> For instance, in one of them, the ticks_this_gp of a rcu_data could go as
> large as 12166 ticks, which is 12+ seconds. The thread on this CPU was doing
> networking work and finally was spinning as a writer on an rwlock which had
> been locked by 16 readers. By the way, there were 70 such writers blocked
> on the same rwlock.
OK, a lock-contention problem. The networking folks have fixed a
very large number of these over the years, though, so I wonder what is
special about this one so that it is just now showing up. I have added
a networking list on CC for their thoughts.
> When examining the readers of the lock, except for the following code path,
> I don't see any other obvious problems, e.g.:
> #5 [ffffad3987254df8] __sock_queue_rcv_skb at ffffffffa49cd2ee
> #6 [ffffad3987254e18] raw_rcv at ffffffffa4ac75c8
> #7 [ffffad3987254e38] raw_local_deliver at ffffffffa4ac7819
> #8 [ffffad3987254e88] ip_protocol_deliver_rcu at ffffffffa4a8dea4
> #9 [ffffad3987254ea8] ip_local_deliver_finish at ffffffffa4a8e074
> #10 [ffffad3987254eb0] __netif_receive_skb_one_core at ffffffffa49f3057
> #11 [ffffad3987254ed0] process_backlog at ffffffffa49f3278
> #12 [ffffad3987254f08] __napi_poll at ffffffffa49f2aba
> #13 [ffffad3987254f30] net_rx_action at ffffffffa49f2f33
> #14 [ffffad3987254fa0] __softirqentry_text_start at ffffffffa50000d0
> #15 [ffffad3987254ff0] do_softirq at ffffffffa40e12f6
>
> In the function ip_local_deliver_finish() in this stack, a lot of the work
> needs to be done by ip_protocol_deliver_rcu(). But this function is invoked
> from within an RCU read-side critical section.
>
> static int ip_local_deliver_finish(struct net *net, struct sock *sk,
>                                    struct sk_buff *skb)
> {
>         __skb_pull(skb, skb_network_header_len(skb));
>
>         rcu_read_lock();
>         ip_protocol_deliver_rcu(net, skb, ip_hdr(skb)->protocol);
>         rcu_read_unlock();
>
>         return 0;
> }
>
> Actually there are multiple chances that this code path can hit spinning
> locks starting from ip_protocol_deliver_rcu(). This kind of usage does not
> look quite right to me. But I'd like to know your opinion on this first?
It is perfectly legal to acquire spinlocks in RCU read-side critical
sections. In fact, this is one of the few ways to safely acquire a
per-object lock while still maintaining good performance and
scalability.
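For example, here is a minimal sketch of the usual pattern, with a made-up
object type and a hypothetical my_lookup() standing in for an RCU-protected
lookup:

        struct my_obj {
                spinlock_t lock;
                /* ... per-object state ... */
        };

        static void update_obj(unsigned long key)
        {
                struct my_obj *obj;

                rcu_read_lock();
                obj = my_lookup(key);           /* hypothetical RCU-protected lookup */
                if (obj) {
                        spin_lock(&obj->lock);  /* fine inside rcu_read_lock() */
                        /* ... update *obj ... */
                        spin_unlock(&obj->lock);
                }
                rcu_read_unlock();
        }

As long as the update side waits for a grace period before freeing objects,
the read-side critical section guarantees that *obj cannot go away before
its lock is acquired.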
My guess is that the thing to track down is the cause of the high contention
on that reader-writer spinlock. Missed patches, misconfiguration, etc.
Thanx, Paul