linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jarod Wilson <jarod@redhat.com>
To: Jakub Kicinski <kuba@kernel.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Mahesh Bandewar <maheshb@google.com>,
	Jay Vosburgh <j.vosburgh@gmail.com>,
	Veaceslav Falico <vfalico@gmail.com>,
	Andy Gospodarek <andy@greyhouse.net>,
	"David S. Miller" <davem@davemloft.net>,
	Thomas Davis <tadavis@lbl.gov>, Netdev <netdev@vger.kernel.org>
Subject: Re: [PATCH net] bonding: reduce rtnl lock contention in mii monitor thread
Date: Wed, 9 Dec 2020 10:08:05 -0500	[thread overview]
Message-ID: <CAKfmpSe1o9_eFu70PPHT9MF5tMYjZqgajCfpDHCnsHQBUdcW0Q@mail.gmail.com> (raw)
In-Reply-To: <20201208113820.179ed5ca@kicinski-fedora-pc1c0hjn.DHCP.thefacebook.com>

On Tue, Dec 8, 2020 at 2:38 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Sat,  5 Dec 2020 18:43:54 -0500 Jarod Wilson wrote:
> > I'm seeing a system get stuck unable to bring a downed interface back up
> > when it's got an updelay value set, behavior which ceased when logging
> > spew was removed from bond_miimon_inspect(). I'm monitoring logs on this
> > system over another network connection, and it seems that the act of
> > spewing logs at all there increases rtnl lock contention, because
> > instrumented code showed bond_mii_monitor() never able to succeed in it's
> > attempts to call rtnl_trylock() to actually commit link state changes,
> > leaving the downed link stuck in BOND_LINK_DOWN. The system in question
> > appears to be fine with the log spew being moved to
> > bond_commit_link_state(), which is called after the successful
> > rtnl_trylock().
>
> But it's not called under rtnl_lock AFAICT. So something else is also
> spewing messages?
>
> While bond_commit_link_state() _is_ called under the lock. So you're
> increasing the retry rate, by putting the slow operation under the
> lock, is that right?

Partially, yes. I probably should have tagged this with RFC instead of
PATCH, tbh. My theory was that the log spew, being sent out *other*
network interfaces when monitoring the system or remote syslog or ssh
was potentially causing some rtnl_lock() calls, so not spewing until
after actually being able to grab the lock would lessen the problem
w/actually acquiring the lock, but I ... don't know offhand how to
verify that theory.


> Also isn't bond_commit_link_state() called from many more places?
> So we're adding new prints, effectively?

Ah. Crap. Yes. bond_set_slave_link_state() is called quite a few
places, and that in turn calls bond_commit_link_state().


> > I'm actually wondering if perhaps we ultimately need/want
> > some bond-specific lock here to prevent racing with bond_close() instead
> > of using rtnl, but this shift of the output appears to work. I believe
> > this started happening when de77ecd4ef02 ("bonding: improve link-status
> > update in mii-monitoring") went in, but I'm not 100% on that.
> >
> > The addition of a case BOND_LINK_BACK in bond_miimon_inspect() is somewhat
> > separate from the fix for the actual hang, but it eliminates a constant
> > "invalid new link 3 on slave" message seen related to this issue, and it's
> > not actually an invalid state here, so we shouldn't be reporting it as an
> > error.
>
> Let's make it a separate patch, then.

Sounds like Jay is confident that bit is valid, and I shouldn't be
ending up in that state, unless something else is going wrong.

-- 
Jarod Wilson
jarod@redhat.com


  reply	other threads:[~2020-12-09 15:10 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-05 23:43 [PATCH net] bonding: reduce rtnl lock contention in mii monitor thread Jarod Wilson
2020-12-08 19:38 ` Jakub Kicinski
2020-12-09 15:08   ` Jarod Wilson [this message]
2020-12-08 20:34 ` Jay Vosburgh
2020-12-09 15:26   ` Jarod Wilson
2020-12-18 23:48   ` Jarod Wilson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAKfmpSe1o9_eFu70PPHT9MF5tMYjZqgajCfpDHCnsHQBUdcW0Q@mail.gmail.com \
    --to=jarod@redhat.com \
    --cc=andy@greyhouse.net \
    --cc=davem@davemloft.net \
    --cc=j.vosburgh@gmail.com \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=maheshb@google.com \
    --cc=netdev@vger.kernel.org \
    --cc=tadavis@lbl.gov \
    --cc=vfalico@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).