From: Ian Kumlien <ian.kumlien@gmail.com>
To: Saeed Mahameed <saeedm@mellanox.com>
Cc: Roi Dayan <roid@mellanox.com>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	Yevgeny Kliteynik <kliteyn@mellanox.com>,
	Leon Romanovsky <leonro@mellanox.com>
Subject: Re: [VXLAN] [MLX5] Lost traffic and issues
Date: Wed, 4 Mar 2020 10:47:52 +0100
Message-ID: <CAA85sZvi+7NLdacfhp=_VA5nAtpbcN6XYwF0+vvtwkFnZK8pBA@mail.gmail.com>
In-Reply-To: <CAA85sZv+6UGXoN-eHysfojK8JtvWnRiJ8xs_QZ6hM=SveQ5CpQ@mail.gmail.com>

On Tue, Mar 3, 2020 at 11:23 AM Ian Kumlien <ian.kumlien@gmail.com> wrote:
> On Mon, Mar 2, 2020 at 11:45 PM Ian Kumlien <ian.kumlien@gmail.com> wrote:
> > On Mon, Mar 2, 2020 at 8:10 PM Saeed Mahameed <saeedm@mellanox.com> wrote:
>
> [... 8< ...]
>
> > > What type of mlx5 configuration you have (Native PV virtualization ?
> > > SRIOV ? legacy mode or switchdev mode ? )
> >
> > We have:
> > tap -> bridge -> ovs -> bond (one legged) -switch-fabric-> <other-end>
> >
> > So a pretty standard openstack setup
>
> Oh, the L3 nodes are also MLX5s (50gbit) and they do report the lag map thing
>
> [   37.389366] mlx5_core 0000:04:00.0 ens1f0: S-tagged traffic will be
> dropped while C-tag vlan stripping is enabled
> [77126.178520] mlx5_core 0000:04:00.0: modify lag map port 1:2 port 2:2
> [77131.485189] mlx5_core 0000:04:00.0 ens1f0: Link down
> [77337.033686] mlx5_core 0000:04:00.0 ens1f0: Link up
> [77344.338901] mlx5_core 0000:04:00.0: modify lag map port 1:1 port 2:2
> [78098.028670] mlx5_core 0000:04:00.0: modify lag map port 1:2 port 2:2
> [78103.479494] mlx5_core 0000:04:00.0 ens1f0: Link down
> [78310.028518] mlx5_core 0000:04:00.0 ens1f0: Link up
> [78317.797155] mlx5_core 0000:04:00.0: modify lag map port 1:1 port 2:2
> [78504.893590] mlx5_core 0000:04:00.0: modify lag map port 1:2 port 2:2
> [78511.277529] mlx5_core 0000:04:00.0 ens1f0: Link down
> [78714.526539] mlx5_core 0000:04:00.0 ens1f0: Link up
> [78720.422078] mlx5_core 0000:04:00.0: modify lag map port 1:1 port 2:2
> [78720.838063] mlx5_core 0000:04:00.0: modify lag map port 1:2 port 2:2
> [78727.226433] mlx5_core 0000:04:00.0 ens1f0: Link down
> [78929.575826] mlx5_core 0000:04:00.0 ens1f0: Link up
> [78935.422600] mlx5_core 0000:04:00.0: modify lag map port 1:1 port 2:2
> [79330.519516] mlx5_core 0000:04:00.0: modify lag map port 1:1 port 2:1
> [79330.831447] mlx5_core 0000:04:00.0: modify lag map port 1:2 port 2:2
> [79336.073520] mlx5_core 0000:04:00.1 ens1f1: Link down
> [79336.279519] mlx5_core 0000:04:00.0: modify lag map port 1:1 port 2:1
> [79541.272469] mlx5_core 0000:04:00.1 ens1f1: Link up
> [79546.664008] mlx5_core 0000:04:00.0: modify lag map port 1:1 port 2:2
> [82107.461831] mlx5_core 0000:04:00.0: modify lag map port 1:1 port 2:1
> [82113.859238] mlx5_core 0000:04:00.1 ens1f1: Link down
> [82320.458475] mlx5_core 0000:04:00.1 ens1f1: Link up
> [82327.774289] mlx5_core 0000:04:00.0: modify lag map port 1:1 port 2:2
> [82490.950671] mlx5_core 0000:04:00.0: modify lag map port 1:1 port 2:1
> [82497.307348] mlx5_core 0000:04:00.1 ens1f1: Link down
> [82705.956583] mlx5_core 0000:04:00.1 ens1f1: Link up
> [82714.055134] mlx5_core 0000:04:00.0: modify lag map port 1:1 port 2:2
> [83100.804620] mlx5_core 0000:04:00.0 ens1f0: Link down
> [83100.860943] mlx5_core 0000:04:00.0: modify lag map port 1:2 port 2:2
> [83319.953296] mlx5_core 0000:04:00.0 ens1f0: Link up
> [83327.984559] mlx5_core 0000:04:00.0: modify lag map port 1:1 port 2:2
> [83924.600444] mlx5_core 0000:04:00.0 ens1f0: Link down
> [83924.656321] mlx5_core 0000:04:00.0: modify lag map port 1:2 port 2:2
> [84312.648630] mlx5_core 0000:04:00.0 ens1f0: Link up
> [84319.571326] mlx5_core 0000:04:00.0: modify lag map port 1:1 port 2:2
> [84946.495374] mlx5_core 0000:04:00.1 ens1f1: Link down
> [84946.588637] mlx5_core 0000:04:00.0: modify lag map port 1:1 port 2:1
> [84946.692596] mlx5_core 0000:04:00.0: modify lag map port 1:2 port 2:2
> [84949.188628] mlx5_core 0000:04:00.0: modify lag map port 1:1 port 2:1
> [85363.543475] mlx5_core 0000:04:00.1 ens1f1: Link up
> [85371.093484] mlx5_core 0000:04:00.0: modify lag map port 1:1 port 2:2
> [624051.460733] mlx5_core 0000:04:00.0: modify lag map port 1:2 port 2:2
> [624053.644769] mlx5_core 0000:04:00.0: modify lag map port 1:1 port 2:1
> [624053.674747] mlx5_core 0000:04:00.0: modify lag map port 1:1 port 2:2
>
> Sorry, it's been a long couple of weeks ;)
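
To make the flapping in the quoted log easier to read, here is a minimal sketch that pairs each "Link down" with the next "Link up" on the same interface and reports how long the port was away. The function name `link_outages` and the regular expression are illustrative assumptions, not part of any mlx5 tooling; the sample lines are copied from the log above.

```python
import re

# Matches lines such as:
#   [77131.485189] mlx5_core 0000:04:00.0 ens1f0: Link down
# A leading "> " quote prefix is tolerated because search() is used.
LINK_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+mlx5_core\s+\S+\s+(\S+):\s+Link (up|down)"
)


def link_outages(lines):
    """Return (interface, down_ts, up_ts, seconds) tuples in log order."""
    down_since = {}  # interface -> timestamp of the last unmatched "Link down"
    outages = []
    for line in lines:
        m = LINK_RE.search(line)
        if not m:
            continue  # e.g. "modify lag map" lines carry no link state
        ts, iface, state = float(m.group(1)), m.group(2), m.group(3)
        if state == "down":
            down_since[iface] = ts
        elif iface in down_since:
            start = down_since.pop(iface)
            outages.append((iface, start, ts, ts - start))
    return outages


sample = """\
[77126.178520] mlx5_core 0000:04:00.0: modify lag map port 1:2 port 2:2
[77131.485189] mlx5_core 0000:04:00.0 ens1f0: Link down
[77337.033686] mlx5_core 0000:04:00.0 ens1f0: Link up
[77344.338901] mlx5_core 0000:04:00.0: modify lag map port 1:1 port 2:2
[79336.073520] mlx5_core 0000:04:00.1 ens1f1: Link down
[79541.272469] mlx5_core 0000:04:00.1 ens1f1: Link up
""".splitlines()

for iface, down, up, secs in link_outages(sample):
    print(f"{iface}: down at {down:.0f}s for {secs:.1f}s")
```

Feeding it the full `dmesg` output instead of `sample` should show whether the outages have a consistent length, which is what the log above suggests.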

I made them one-legged, but it doesn't seem to help.

Someone also posted this:
https://marc.info/?l=linux-netdev&m=158330796503347&w=2

While I don't use IPVS, I do use VXLAN, and if checksums are incorrectly
tagged the NIC might drop the packets?
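
One way to test the checksum theory on a capture is to verify the outer UDP checksum of the VXLAN packets by hand. The following is only a sketch under stated assumptions: IPv4 outer headers, and the helper names `ones_complement_sum`, `udp_checksum_ok` and `build_udp` are made up for illustration. It says nothing about how mlx5 or the kernel marks checksums internally, and a capture taken on the sending host with TX checksum offload enabled can show bad checksums that the NIC would have fixed on the wire.

```python
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum used by the IP/UDP/TCP checksums."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:  # fold the carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return total


def pseudo_header(src_ip: bytes, dst_ip: bytes, udp_len: int) -> bytes:
    # IPv4 pseudo-header: src, dst, zero, protocol 17 (UDP), UDP length
    return src_ip + dst_ip + struct.pack("!BBH", 0, 17, udp_len)


def udp_checksum_ok(src_ip: bytes, dst_ip: bytes, udp_segment: bytes) -> bool:
    """udp_segment is the UDP header plus payload exactly as captured."""
    (stored,) = struct.unpack("!H", udp_segment[6:8])
    if stored == 0:
        return True  # zero means "no checksum" for UDP over IPv4
    data = pseudo_header(src_ip, dst_ip, len(udp_segment)) + udp_segment
    return ones_complement_sum(data) == 0xFFFF


def build_udp(src_ip, dst_ip, sport, dport, payload: bytes) -> bytes:
    """Build a UDP segment with a valid checksum (only used for the demo)."""
    length = 8 + len(payload)
    header = struct.pack("!HHHH", sport, dport, length, 0)
    data = pseudo_header(src_ip, dst_ip, length) + header + payload
    csum = 0xFFFF - ones_complement_sum(data)
    if csum == 0:
        csum = 0xFFFF  # a computed zero is sent as all ones
    return struct.pack("!HHHH", sport, dport, length, csum) + payload


# Demo with made-up addresses; 4789 is the IANA-assigned VXLAN port.
src, dst = bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2])
vxlan_hdr = b"\x08\x00\x00\x00\x00\x00\x2a\x00"  # flags with I bit set, VNI 42
good = build_udp(src, dst, 54321, 4789, vxlan_hdr + b"inner")
bad = good[:-1] + bytes([good[-1] ^ 0x01])  # flip one bit in the payload
print(udp_checksum_ok(src, dst, good), udp_checksum_ok(src, dst, bad))
```

If packets captured on the receiving side fail this check while non-VXLAN traffic passes, that would at least be consistent with the checksum explanation.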

> > > The only change that i could think of is the lag multi-path support we
> > > added, Roi can you please take a look at this ?
> >
> > I'm also trying to get a setup working where i could try reverting changes
> > but so far we've only had this problem with mlx5_core...
> > Also the intermittent but reliable patterns are really weird...
> >
> > All traffic seems fine, except vxlan traffic :/
> >
> > (The problem is that the actual machines that has the issue is in production
> > with 8x V100 nvidia cards... Kinda hard to justify having them "offline" ;))

Thread overview: 5+ messages
2020-02-28 15:02 [VXLAN] [MLX5] Lost traffic and issues Ian Kumlien
2020-03-02 19:10 ` Saeed Mahameed
2020-03-02 22:45   ` Ian Kumlien
2020-03-03 10:23     ` Ian Kumlien
2020-03-04  9:47       ` Ian Kumlien [this message]
