From: Jeff Layton <jlayton@kernel.org>
To: intel-wired-lan@lists.osuosl.org, anthony.l.nguyen@intel.com,
	 jesse.brandeburg@intel.com
Cc: Ilya Dryomov <idryomov@gmail.com>, Xiubo Li <xiubli@redhat.com>,
	Venky Shankar <vshankar@redhat.com>
Subject: [Intel-wired-lan] intermittent ixgbe transmit queue timeouts in v5.18 kernels
Date: Thu, 02 Jun 2022 17:37:38 -0400
Message-ID: <8225a14538339c7b38d9da1974ebefaf4db1bc51.camel@kernel.org>

The Ceph project test lab has a fairly large cluster of machines with
ixgbe adapters:

    03:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)

Recently, we've started getting intermittent tx queue timeouts on
these machines. One of them is reported here:

    https://tracker.ceph.com/issues/55823

Usually this happens when we're trying to do a sync and there is a
flurry of transmit activity. Afterward, we see a lot of fallout in
Ceph, culminating in soft lockups.

The kernels we're testing have some patches that are not yet in
mainline, but mostly they are confined to net/ceph and fs/ceph, and
shouldn't really affect hw drivers.

The problem manifested pretty regularly during v5.18, and then I didn't
see it for a while. I had figured it was something that had been fixed,
but I think it was just "luck".

I attempted a bisect a while back and ruled out recent ceph changes as
the cause. Unfortunately, I wasn't able to narrow it down to a
conclusive patch that broke it, but I think it likely crept in during
the initial merge window for v5.18 (pre-rc1).

One other oddity: the test lab often installs bleeding-edge kernels on
old distros (RHEL8 and Ubuntu releases from a similar era). Is it
possible that the firmware that ships with these older distros is not
suitable for the more recent driver in v5.18?

Any thoughts or suggestions on things we can do to fix this?

Thanks,
-- 
Jeff Layton <jlayton@kernel.org>
_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

Thread overview: 3+ messages
2022-06-02 21:37 Jeff Layton [this message]
2022-06-07 21:22 ` [Intel-wired-lan] intermittent ixgbe transmit queue timeouts in v5.18 kernels Switzer, David
2022-06-08 12:44   ` Jeff Layton