linux-rdma.vger.kernel.org archive mirror
From: Jinpu Wang <jinpu.wang@ionos.com>
To: Leon Romanovsky <leon@kernel.org>, Itay Aveksis <itayav@nvidia.com>
Cc: Jinpu Wang <jinpu.wang@cloud.ionos.com>,
	Jack Wang <xjtuwjp@gmail.com>, Doug Ledford <dledford@redhat.com>,
	Jason Gunthorpe <jgg@ziepe.ca>,
	linux-rdma@vger.kernel.org
Subject: Re: IPoIB child interfaces not working with mlx5
Date: Fri, 7 May 2021 08:53:28 +0200
Message-ID: <CAMGffEmKKMS9dX+PTefsuoD7C350JPZ3hBM6ASm9XuzxvV-7pQ@mail.gmail.com>
In-Reply-To: <YH67EMmq+Rcd0hLJ@unreal>

On Tue, Apr 20, 2021 at 1:29 PM Leon Romanovsky <leon@kernel.org> wrote:
>
> On Tue, Apr 20, 2021 at 11:14:41AM +0200, Jinpu Wang wrote:
> > On Mon, Mar 22, 2021 at 7:56 AM Leon Romanovsky <leon@kernel.org> wrote:
> > >
> > > On Mon, Mar 22, 2021 at 07:08:01AM +0100, Jinpu Wang wrote:
> > > > On Sun, Mar 21, 2021 at 2:07 PM Leon Romanovsky <leon@kernel.org> wrote:
> > > > >
> > > > > On Sat, Mar 20, 2021 at 02:09:50PM +0100, Jack Wang wrote:
> > > > > > > On Sat, Mar 20, 2021 at 12:17, Leon Romanovsky <leon@kernel.org> wrote:
> > > > > >
> > > > > > > On Fri, Mar 19, 2021 at 08:44:29AM +0100, Jinpu Wang wrote:
> > > > > > > > Hi Jason and Leon,
> > > > > > > >
> > > > > > > > > We recently switched from MLNX-OFED to upstream OFED, and we noticed
> > > > > > > > > that IPoIB stops working with upstream kernel 5.4.102 on Mellanox CX-5
> > > > > > > > > HCAs; it works fine on CX-2/CX-3. I also tested the 5.11 kernel and it
> > > > > > > > > behaves the same.
> > > > > > >
> > > > > > > Are you using "enhanced IPoIB" for CX-5 devices? MLX5_CORE_IPOIB?
> > > > > > >
> > > > > > > Thanks
> > > > > >
> > > > > >  Yes.
> > > > >
> > > > > > > Is this expected behavior?
> > > > >
> > > > > Yes, we wanted to make IPoIB behave like any other netdev interfaces and
> > > > > if parent interface isn't enabled, no traffic should pass. More on that,
> > > > > in our internal implementation of enhanced IPoIB, we are reusing same
> > > > > resources for both parent and child, this requires us to wait for "UP"
> > > > > event before allowing traffic.
> > > > >
> > > > > Thanks
> > > > Hi Leon,
> > > >
> > > > Thanks for the clarification, is this behavior documented somewhere?
> > > > is it specific to "enhanced IPoIB" for CX-5?
> > >
> > > It is specific to "enhanced IPoIB" and not to device. I don't know where
> > > we can document it.
> > >
> > > > Will it work differently if without MLX5_CORE_IPOIB enabled?
> > >
> > > Yes, without MLX5_CORE_IPOIB, the devices will work in "legacy IPoIB",
> > > exactly as cx-3. The best thing will be to change IPoIB ULP to behave
> > > like netdev, but we were not comfortable to do it back then due to
> > > user visible nature of such change.
> > >
> > Hi Leon,
> >
> > More testing reveals new problems with MLX5_CORE_IPOIB.
> > With MLX5_CORE_IPOIB, ping works on both hosts, but iperf3 doesn't send any data.

Just an update: we finally found the cause of the failure on our side.

We need to set the child interface to the same MTU as the parent.
jwang@ps401a-913.nst:/mnt/jwang$ ip link list
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 0c:c4:7a:ff:07:ce brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 0c:c4:7a:ff:07:cf brd ff:ff:ff:ff:ff:ff
6: ha_transport: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether f6:ff:16:93:08:8a brd ff:ff:ff:ff:ff:ff
11: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc mq state UP mode DEFAULT group default qlen 1024
    link/infiniband 00:00:00:83:fe:80:00:00:00:00:00:00:98:03:9b:03:00:6c:79:12 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
12: ib1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc mq state UP mode DEFAULT group default qlen 1024
    link/infiniband 00:00:01:58:fe:80:00:00:00:00:00:00:98:03:9b:03:00:6c:79:13 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
13: ib0.dddd@ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc mq state UP mode DEFAULT group default qlen 1024
    link/infiniband 00:00:10:8c:fe:80:00:00:00:00:00:00:98:03:9b:03:00:6c:79:12 brd 00:ff:ff:ff:ff:12:40:1b:dd:dd:00:00:00:00:00:00:ff:ff:ff:ff
14: ib1.dddd@ib1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 4092 qdisc mq state UP mode DEFAULT group default qlen 1024
    link/infiniband 00:00:11:8c:fe:80:00:00:00:00:00:00:98:03:9b:03:00:6c:79:13 brd 00:ff:ff:ff:ff:12:40:1b:dd:dd:00:00:00:00:00:00:ff:ff:ff:ff

Initially, the ib0 MTU was 2044 and the ib0.dddd MTU was 4092.
After I reduced the ib0.dddd MTU to 2044 on both sides, iperf3 works fine.
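
For anyone who wants to check for the same condition, here is a minimal
sketch (not something we run in production) that scans `ip link` output
for child interfaces whose MTU differs from their parent's. The function
names parse_mtus and mtu_mismatches and the SAMPLE text are made up for
illustration; SAMPLE is a trimmed copy of the listing above. It assumes
unwrapped `ip link` output, with each interface on a line of the form
"N: name@parent: <FLAGS> mtu M ...".

```python
import re

# Matches the first line of each `ip link` entry, for example:
#   13: ib0.dddd@ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc mq ...
# Group 1 is the interface name, group 2 the optional parent, group 3 the MTU.
LINK_RE = re.compile(
    r"^\d+:\s+([^:@\s]+)(?:@([^:\s]+))?:\s+<[^>]*>\s+mtu\s+(\d+)"
)


def parse_mtus(ip_link_output):
    """Return {name: (parent_or_None, mtu)} parsed from `ip link` text."""
    links = {}
    for line in ip_link_output.splitlines():
        m = LINK_RE.match(line)
        if m:
            links[m.group(1)] = (m.group(2), int(m.group(3)))
    return links


def mtu_mismatches(ip_link_output):
    """List (child, child_mtu, parent, parent_mtu) where the MTUs differ."""
    links = parse_mtus(ip_link_output)
    mismatches = []
    for name, (parent, mtu) in links.items():
        # Only compare when the parent is itself present in the listing.
        if parent in links and links[parent][1] != mtu:
            mismatches.append((name, mtu, parent, links[parent][1]))
    return mismatches


# Hypothetical sample: first lines of the InfiniBand entries shown above.
SAMPLE = """\
11: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc mq state UP
12: ib1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc mq state UP
13: ib0.dddd@ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc mq state UP
14: ib1.dddd@ib1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 4092 qdisc mq state UP
"""

if __name__ == "__main__":
    for child, child_mtu, parent, parent_mtu in mtu_mismatches(SAMPLE):
        print(f"{child} mtu {child_mtu} != {parent} mtu {parent_mtu}")
```

On the sample it reports only ib1.dddd (4092 against 2044 on ib1), which
matches the listing: ib0.dddd was already lowered to 2044. A flagged
child can then be aligned with something like
`ip link set dev ib1.dddd mtu 2044`.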

Could you explain why the MTU must be exactly the same in enhanced
IPoIB mode? Is there anything else we must treat specially?
I guess it is related to

> > > > > in our internal implementation of enhanced IPoIB, we are reusing same
> > > > > resources for both parent and child, this requires us to wait for "UP"
> > > > > event before allowing traffic.

Thanks!
Jinpu

Thread overview: 10+ messages
2021-03-19  7:44 IPoIB child interfaces not working with mlx5 Jinpu Wang
2021-03-20  9:30 ` Leon Romanovsky
     [not found]   ` <CAD+HZHUHbuBeoB4cCLc78gsmZAEyEr+fiWtpuTrxyzRBzMBf_g@mail.gmail.com>
2021-03-21 13:07     ` Leon Romanovsky
2021-03-22  6:08       ` Jinpu Wang
2021-03-22  6:56         ` Leon Romanovsky
2021-04-20  9:14           ` Jinpu Wang
2021-04-20 11:29             ` Leon Romanovsky
2021-05-07  6:53               ` Jinpu Wang [this message]
2021-05-07  8:03                 ` Zhu Yanjun
2021-05-07  8:11                   ` Jinpu Wang
