From: Ian Kumlien <ian.kumlien@gmail.com>
To: Saeed Mahameed <saeedm@mellanox.com>
Cc: Roi Dayan <roid@mellanox.com>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	Yevgeny Kliteynik <kliteyn@mellanox.com>,
	Leon Romanovsky <leonro@mellanox.com>
Subject: Re: [VXLAN] [MLX5] Lost traffic and issues
Date: Wed, 4 Mar 2020 10:47:52 +0100
Message-ID: <CAA85sZvi+7NLdacfhp=_VA5nAtpbcN6XYwF0+vvtwkFnZK8pBA@mail.gmail.com>
In-Reply-To: <CAA85sZv+6UGXoN-eHysfojK8JtvWnRiJ8xs_QZ6hM=SveQ5CpQ@mail.gmail.com>

On Tue, Mar 3, 2020 at 11:23 AM Ian Kumlien <ian.kumlien@gmail.com> wrote:
> On Mon, Mar 2, 2020 at 11:45 PM Ian Kumlien <ian.kumlien@gmail.com> wrote:
> > On Mon, Mar 2, 2020 at 8:10 PM Saeed Mahameed <saeedm@mellanox.com> wrote:
>
> [... 8< ...]
>
> > > What type of mlx5 configuration do you have (native PV virtualization?
> > > SR-IOV? Legacy mode or switchdev mode?)
> >
> > We have:
> > tap -> bridge -> ovs -> bond (one legged) -switch-fabric-> <other-end>
> >
> > So a pretty standard openstack setup
>
> Oh, the L3 nodes are also MLX5s (50 Gbit) and they do report the lag map messages:
>
> [   37.389366] mlx5_core 0000:04:00.0 ens1f0: S-tagged traffic will be dropped while C-tag vlan stripping is enabled
> [77126.178520] mlx5_core 0000:04:00.0: modify lag map port 1:2 port 2:2
> [77131.485189] mlx5_core 0000:04:00.0 ens1f0: Link down
> [77337.033686] mlx5_core 0000:04:00.0 ens1f0: Link up
> [77344.338901] mlx5_core 0000:04:00.0: modify lag map port 1:1 port 2:2
> [78098.028670] mlx5_core 0000:04:00.0: modify lag map port 1:2 port 2:2
> [78103.479494] mlx5_core 0000:04:00.0 ens1f0: Link down
> [78310.028518] mlx5_core 0000:04:00.0 ens1f0: Link up
> [78317.797155] mlx5_core 0000:04:00.0: modify lag map port 1:1 port 2:2
> [78504.893590] mlx5_core 0000:04:00.0: modify lag map port 1:2 port 2:2
> [78511.277529] mlx5_core 0000:04:00.0 ens1f0: Link down
> [78714.526539] mlx5_core 0000:04:00.0 ens1f0: Link up
> [78720.422078] mlx5_core 0000:04:00.0: modify lag map port 1:1 port 2:2
> [78720.838063] mlx5_core 0000:04:00.0: modify lag map port 1:2 port 2:2
> [78727.226433] mlx5_core 0000:04:00.0 ens1f0: Link down
> [78929.575826] mlx5_core 0000:04:00.0 ens1f0: Link up
> [78935.422600] mlx5_core 0000:04:00.0: modify lag map port 1:1 port 2:2
> [79330.519516] mlx5_core 0000:04:00.0: modify lag map port 1:1 port 2:1
> [79330.831447] mlx5_core 0000:04:00.0: modify lag map port 1:2 port 2:2
> [79336.073520] mlx5_core 0000:04:00.1 ens1f1: Link down
> [79336.279519] mlx5_core 0000:04:00.0: modify lag map port 1:1 port 2:1
> [79541.272469] mlx5_core 0000:04:00.1 ens1f1: Link up
> [79546.664008] mlx5_core 0000:04:00.0: modify lag map port 1:1 port 2:2
> [82107.461831] mlx5_core 0000:04:00.0: modify lag map port 1:1 port 2:1
> [82113.859238] mlx5_core 0000:04:00.1 ens1f1: Link down
> [82320.458475] mlx5_core 0000:04:00.1 ens1f1: Link up
> [82327.774289] mlx5_core 0000:04:00.0: modify lag map port 1:1 port 2:2
> [82490.950671] mlx5_core 0000:04:00.0: modify lag map port 1:1 port 2:1
> [82497.307348] mlx5_core 0000:04:00.1 ens1f1: Link down
> [82705.956583] mlx5_core 0000:04:00.1 ens1f1: Link up
> [82714.055134] mlx5_core 0000:04:00.0: modify lag map port 1:1 port 2:2
> [83100.804620] mlx5_core 0000:04:00.0 ens1f0: Link down
> [83100.860943] mlx5_core 0000:04:00.0: modify lag map port 1:2 port 2:2
> [83319.953296] mlx5_core 0000:04:00.0 ens1f0: Link up
> [83327.984559] mlx5_core 0000:04:00.0: modify lag map port 1:1 port 2:2
> [83924.600444] mlx5_core 0000:04:00.0 ens1f0: Link down
> [83924.656321] mlx5_core 0000:04:00.0: modify lag map port 1:2 port 2:2
> [84312.648630] mlx5_core 0000:04:00.0 ens1f0: Link up
> [84319.571326] mlx5_core 0000:04:00.0: modify lag map port 1:1 port 2:2
> [84946.495374] mlx5_core 0000:04:00.1 ens1f1: Link down
> [84946.588637] mlx5_core 0000:04:00.0: modify lag map port 1:1 port 2:1
> [84946.692596] mlx5_core 0000:04:00.0: modify lag map port 1:2 port 2:2
> [84949.188628] mlx5_core 0000:04:00.0: modify lag map port 1:1 port 2:1
> [85363.543475] mlx5_core 0000:04:00.1 ens1f1: Link up
> [85371.093484] mlx5_core 0000:04:00.0: modify lag map port 1:1 port 2:2
> [624051.460733] mlx5_core 0000:04:00.0: modify lag map port 1:2 port 2:2
> [624053.644769] mlx5_core 0000:04:00.0: modify lag map port 1:1 port 2:1
> [624053.674747] mlx5_core 0000:04:00.0: modify lag map port 1:1 port 2:2
>
> Sorry, it's been a long couple of weeks ;)
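
The lag map lines above appear to track link flaps: the map moves away from `port 1:1 port 2:2` around each `Link down` and returns to it a few seconds after the matching `Link up` (ens1f0 flaps give `1:2 2:2`, ens1f1 flaps give `1:1 2:1`). Reading `port N:M` as "traffic for LAG port N is transmitted on physical port M" is an assumption inferred from that correlation, not taken from driver documentation. Under that assumption, a small Python sketch (all function names are hypothetical) can pull these events out of `dmesg` output and measure how long the LAG ran degraded:

```python
import re

# Matches dmesg lines such as (any leading "> " quote prefix is ignored):
#   [77126.178520] mlx5_core 0000:04:00.0: modify lag map port 1:2 port 2:2
LAG_MAP_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+mlx5_core\s+\S+:\s+"
    r"modify lag map port 1:(?P<p1>\d+) port 2:(?P<p2>\d+)"
)


def lag_events(lines):
    """Yield (timestamp, port1_target, port2_target) for every lag map change."""
    for line in lines:
        m = LAG_MAP_RE.search(line)
        if m:
            yield float(m.group("ts")), int(m.group("p1")), int(m.group("p2"))


def describe(p1, p2):
    """Readable state, ASSUMING 'port N:M' means 'port N transmits on M'."""
    if (p1, p2) == (1, 2):
        return "balanced"
    if p1 == p2:
        return f"all traffic on physical port {p1}"
    return "swapped"


def degraded_intervals(events):
    """Return (start, end) pairs during which the map was not 1:1 / 2:2.

    An interval still open at the end of the input is not reported.
    """
    intervals, start = [], None
    for ts, p1, p2 in events:
        balanced = (p1, p2) == (1, 2)
        if not balanced and start is None:
            start = ts
        elif balanced and start is not None:
            intervals.append((start, ts))
            start = None
    return intervals


# Usage with four lines copied from the log above.
sample = """\
> [77126.178520] mlx5_core 0000:04:00.0: modify lag map port 1:2 port 2:2
> [77131.485189] mlx5_core 0000:04:00.0 ens1f0: Link down
> [77337.033686] mlx5_core 0000:04:00.0 ens1f0: Link up
> [77344.338901] mlx5_core 0000:04:00.0: modify lag map port 1:1 port 2:2
""".splitlines()

events = list(lag_events(sample))
for ts, p1, p2 in events:
    print(f"{ts:12.3f}  {describe(p1, p2)}")
for start, end in degraded_intervals(events):
    print(f"degraded {start:.1f}s -> {end:.1f}s ({end - start:.1f}s)")
```

The regex uses `search`, so it works on raw `dmesg` output as well as on the quoted lines in this mail; `Link up`/`Link down` lines are skipped because they lack the `modify lag map` text.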

I made them one-legged but it doesn't seem to help

Someone also posted this:
https://marc.info/?l=linux-netdev&m=158330796503347&w=2

While I don't use IPVS, I do use VXLAN, and if checksums are incorrectly tagged,
the NIC might drop the packets?
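
Whether checksums are involved is left open here. One way to test the hypothesis offline is to recompute the outer UDP checksum of captured VXLAN packets. The Python sketch below assumes an IPv4 underlay and a UDP segment (header plus payload) already extracted from a capture; the function names are hypothetical. It uses the RFC 1071 one's complement sum, and because RFC 7348 says the outer VXLAN UDP checksum SHOULD be sent as zero and a zero checksum MUST be accepted, zero is treated as valid. It only checks the on-wire value, not how the kernel marks checksum state for offload.

```python
import ipaddress
import struct


def ones_complement_sum16(data: bytes) -> int:
    """RFC 1071 16-bit one's complement sum with end-around carry."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total


def ipv4_pseudo_header(src_ip: str, dst_ip: str, udp_len: int) -> bytes:
    """IPv4 pseudo-header used by the UDP checksum (protocol 17)."""
    return (ipaddress.IPv4Address(src_ip).packed
            + ipaddress.IPv4Address(dst_ip).packed
            + struct.pack("!BBH", 0, 17, udp_len))


def udp_checksum(src_ip: str, dst_ip: str, udp_segment: bytes) -> int:
    """Compute the value to place in the UDP checksum field."""
    seg = bytearray(udp_segment)
    seg[6:8] = b"\x00\x00"  # checksum field counts as zero while computing
    csum = ~ones_complement_sum16(
        ipv4_pseudo_header(src_ip, dst_ip, len(seg)) + bytes(seg)) & 0xFFFF
    return csum or 0xFFFF  # a computed 0 is transmitted as 0xFFFF


def udp_checksum_ok(src_ip: str, dst_ip: str, udp_segment: bytes) -> bool:
    """Verify a received IPv4 UDP segment; a zero field means 'no checksum'."""
    (csum,) = struct.unpack_from("!H", udp_segment, 6)
    if csum == 0:
        return True  # RFC 7348: a zero outer UDP checksum MUST be accepted
    total = ones_complement_sum16(
        ipv4_pseudo_header(src_ip, dst_ip, len(udp_segment)) + udp_segment)
    return total == 0xFFFF


# Example: UDP to port 4789 carrying an 8-byte VXLAN header (VNI 42) + payload.
payload = struct.pack("!B3xI", 0x08, 42 << 8) + b"inner-frame"
hdr = struct.pack("!HHHH", 54321, 4789, 8 + len(payload), 0)
seg = bytearray(hdr + payload)
seg[6:8] = struct.pack("!H", udp_checksum("192.0.2.1", "192.0.2.2", seg))
print(udp_checksum_ok("192.0.2.1", "192.0.2.2", bytes(seg)))
```

A packet that fails this check on the wire would point at the sender; packets that pass but still vanish would point back at the receive path.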

> > > The only change that i could think of is the lag multi-path support we
> > > added, Roi can you please take a look at this ?
> >
> > I'm also trying to get a setup working where I could try reverting changes,
> > but so far we've only had this problem with mlx5_core...
> > Also, the intermittent but reliable patterns are really weird...
> >
> > All traffic seems fine, except VXLAN traffic :/
> >
> > (The problem is that the actual machines that have the issue are in production
> > with 8x NVIDIA V100 cards... Kinda hard to justify having them "offline" ;))


Thread overview: 5+ messages
2020-02-28 15:02 [VXLAN] [MLX5] Lost traffic and issues Ian Kumlien
2020-03-02 19:10 ` Saeed Mahameed
2020-03-02 22:45   ` Ian Kumlien
2020-03-03 10:23     ` Ian Kumlien
2020-03-04  9:47       ` Ian Kumlien [this message]
