All of lore.kernel.org
 help / color / mirror / Atom feed
* Need to set if_index in ib_init_ah_from_wc() ?
@ 2017-01-30  1:21 Roland Dreier
       [not found] ` <CAL1RGDW6iHo2UYKbcJmcG=wCq63jvZB7nvOD=BJZwASSzc7Zhw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 31+ messages in thread
From: Roland Dreier @ 2017-01-30  1:21 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA

I recently debugged an issue where an RDMA-CM / RoCE connection to a
link-local IPv6 address takes 1 second to establish.  The delay comes
when the passive side cm_req_handler() calls
cm_init_av_for_response(), which calls ib_init_ah_from_wc().

That calls into rdma_addr_find_l2_eth_by_grh() with the link-local
address, but with 0 passed in through the if_index parameter.  So
resolving the link-local routing fails (since we don't give the
network stack a device scope).  We end up in the ENODATA case in
rdma_resolve_ip(), so we queue a request to try again with a timeout
of 1000 msec.  Of course, that still fails, so we return failure.  The
CM ignores the return value, but we've delayed sending the REP for a
second.

The fix seems as simple as adding

    if_index = idev->ifindex;

to ib_init_ah_from_wc() before the call to
rdma_addr_find_l2_eth_by_grh() - this can't fail, since we return an
error if finding idev fails.

However, I'm not sure if it's quite that simple - how does this
interact with RoCEv2, or other addressing like IPv4 or non-link-local
IPv6? I'm not sure I understand what the cases where resolved_dev will
be != idev, or if passing in idev->ifindex will break this.

Do I need to add a conditional to limit when I assign if_index?

Thanks!
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Need to set if_index in ib_init_ah_from_wc() ?
       [not found] ` <CAL1RGDW6iHo2UYKbcJmcG=wCq63jvZB7nvOD=BJZwASSzc7Zhw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-01-30 19:54   ` Jason Gunthorpe
       [not found]     ` <20170130195412.GE24466-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 31+ messages in thread
From: Jason Gunthorpe @ 2017-01-30 19:54 UTC (permalink / raw)
  To: Roland Dreier; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Matan Barak

On Sun, Jan 29, 2017 at 05:21:43PM -0800, Roland Dreier wrote:
> I recently debugged an issue where an RDMA-CM / RoCE connection to a
> link-local IPv6 address takes 1 second to establish.  The delay comes
> when the passive side cm_req_handler() calls
> cm_init_av_for_response(), which calls ib_init_ah_from_wc().
> 
> That calls into rdma_addr_find_l2_eth_by_grh() with the link-local
> address, but with 0 passed in through the if_index parameter.  So
> 
> The fix seems as simple as adding
> 
>     if_index = idev->ifindex;
> 
> to ib_init_ah_from_wc() before the call to
> rdma_addr_find_l2_eth_by_grh() - this can't fail, since we return an
> error if finding idev fails.

Hum, it is certainly a big conceptual mistake to not anchor a route
lookup in a netdev at this point.

Both roceev1 and v2 need to use the netdev that was used to create the
UD QP to anchor these lookups, that netdev should be a child of some
kind from the get_netdev (eg a vlan, or macvlan, etc)

This is necessary to make namespaces and other subtle things work properly.

So, right idea, wrong netdev.

> However, I'm not sure if it's quite that simple - how does this
> interact with RoCEv2, or other addressing like IPv4 or non-link-local
> IPv6? I'm not sure I understand what the cases where resolved_dev will
> be != idev, or if passing in idev->ifindex will break this.

I think this brokenness fell out of the above confusion, it should be
removed, rdma_addr_find_dmac_by_grh should not be callable without a
proper netdev.

The != case can happen if the gid table resolve differently than the
route table, but it is conceptually wrong to lookup in the gid table
before doing a route lookup, and I think these apis are poorly
designed to enforce that requirement.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: Need to set if_index in ib_init_ah_from_wc() ?
       [not found]     ` <20170130195412.GE24466-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2017-01-30 20:15       ` Parav Pandit
       [not found]         ` <VI1PR0502MB30081EBFBEE5994B77D58BB9D14B0-o1MPJYiShExKsLr+rGaxW8DSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
  2017-10-18 19:00       ` Parav Pandit
  1 sibling, 1 reply; 31+ messages in thread
From: Parav Pandit @ 2017-01-30 20:15 UTC (permalink / raw)
  To: Jason Gunthorpe, Roland Dreier
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Matan Barak

Hi Jason, Roland,

> -----Original Message-----
> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-
> owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Jason Gunthorpe
> Sent: Monday, January 30, 2017 1:54 PM
> To: Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>
> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Subject: Re: Need to set if_index in ib_init_ah_from_wc() ?
> 
> On Sun, Jan 29, 2017 at 05:21:43PM -0800, Roland Dreier wrote:
> > I recently debugged an issue where an RDMA-CM / RoCE connection to a
> > link-local IPv6 address takes 1 second to establish.  The delay comes
> > when the passive side cm_req_handler() calls
> > cm_init_av_for_response(), which calls ib_init_ah_from_wc().
> >
> > That calls into rdma_addr_find_l2_eth_by_grh() with the link-local
> > address, but with 0 passed in through the if_index parameter.  So
> >
> > The fix seems as simple as adding
> >
> >     if_index = idev->ifindex;
> >
> > to ib_init_ah_from_wc() before the call to
> > rdma_addr_find_l2_eth_by_grh() - this can't fail, since we return an
> > error if finding idev fails.
> 
> Hum, it is certainly a big conceptual mistake to not anchor a route lookup in a
> netdev at this point.
> 
> Both roceev1 and v2 need to use the netdev that was used to create the UD
> QP to anchor these lookups, that netdev should be a child of some kind from
> the get_netdev (eg a vlan, or macvlan, etc)
> 
As you know UD QP is not created with netdev (AH binds to netdev).
So AH and RC QPs are going to be anchored now around GID.

> This is necessary to make namespaces and other subtle things work properly.

Yes. I am currently testing my patches for this support in our setup.
I will post the patches soon.
However fix that Roland pointed out is needed regardless.
I actually hit ifindex case even with IPv4 last week but was unable to reproduce it.

> 
> So, right idea, wrong netdev.
> 
> > However, I'm not sure if it's quite that simple - how does this
> > interact with RoCEv2, or other addressing like IPv4 or non-link-local
> > IPv6? I'm not sure I understand what the cases where resolved_dev will
> > be != idev, or if passing in idev->ifindex will break this.
> 
> I think this brokenness fell out of the above confusion, it should be removed,
> rdma_addr_find_dmac_by_grh should not be callable without a proper
> netdev.
> 
IP route apis are working mostly based on the iface_id as input.
So I am thinking to continue with ifindex input argument instead of changing to netdev pointer.
However there is corner case where ifindex could become invalid between rdma_addr_find_l2_eth_by_grh() and dev_get_by_index().
So I am open for netdev as well. I will review this path one more time for possible netdev input.
Currently rdma_addr_find_l2_eth_by_grh() has ifindex as optional argument, which is removed in my sandbox now.

> The != case can happen if the gid table resolve differently than the route
> table, but it is conceptually wrong to lookup in the gid table before doing a
> route lookup, and I think these apis are poorly designed to enforce that
> requirement.
> 
Gid lookup is still done after the route lookup using get_sgid_index_from_eth().
ib_get_gids_from_rdma_hdr() just gets the GID formatted properly.
I think this part is ok?
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Need to set if_index in ib_init_ah_from_wc() ?
       [not found]         ` <VI1PR0502MB30081EBFBEE5994B77D58BB9D14B0-o1MPJYiShExKsLr+rGaxW8DSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
@ 2017-01-30 21:14           ` Jason Gunthorpe
       [not found]             ` <20170130211420.GB27111-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 31+ messages in thread
From: Jason Gunthorpe @ 2017-01-30 21:14 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Matan Barak

On Mon, Jan 30, 2017 at 08:15:51PM +0000, Parav Pandit wrote:

> > Both roceev1 and v2 need to use the netdev that was used to create the UD
> > QP to anchor these lookups, that netdev should be a child of some kind from
> > the get_netdev (eg a vlan, or macvlan, etc)
> > 
> As you know UD QP is not created with netdev (AH binds to netdev).
> So AH and RC QPs are going to be anchored now around GID.

Well, that is fundamentally incompatible with how the netdev stack
works.

We certainly can't have namespaces with such an insecure
design. UD QPs must not receive packets inapproprate for the netdev
they are associated with, or the collection of netdevs in their
namespace (for * binds)

In my prior message I was assuming that rocee disallowed '*' binds
since that isn't really how RDMA CM works.. (what does that even mean
if you have two rocee devices??)

But this clearly hasn't been thought through on the rocee side, so we
have a nonsense mess :\

> > This is necessary to make namespaces and other subtle things work properly.
> 
> Yes. I am currently testing my patches for this support in our setup.
> I will post the patches soon.
> However fix that Roland pointed out is needed regardless.
> I actually hit ifindex case even with IPv4 last week but was unable to reproduce it.

Well, using idev isn't going to work with child netdevs, etc

The best approach for now is to first consult the GID table to learn
the ingress netdev for the packet that caused the 'wc'.

Then do an ingress route lookup to make sure the packet should have
been accepted (bit late, but OK)

Then do the egreess route lookup to figure out the parameters for the
new AH.

But this is only 'safe' if the caller of ib_init_ah_from_wc is
assuming the UD QP was bound to * - and only if there are no
namespaces.

The best thing to do is introduce required netdev binding for UD QPs
in kernel and then the kernel's ib_init_ah_from_wc can work safely..

> > I think this brokenness fell out of the above confusion, it should be removed,
> > rdma_addr_find_dmac_by_grh should not be callable without a proper
> > netdev.
> > 
> IP route apis are working mostly based on the iface_id as input.
> So I am thinking to continue with ifindex input argument instead of changing to netdev pointer.

ifindex do not quickly re-use so this is fine.

> Currently rdma_addr_find_l2_eth_by_grh() has ifindex as optional
> argument, which is removed in my sandbox now.

Right, should not be optional, but I'm wondering if this routine
should even exist in this form. The two call sites seem reasonably different.

> > The != case can happen if the gid table resolve differently than the route
> > table, but it is conceptually wrong to lookup in the gid table before doing a
> > route lookup, and I think these apis are poorly designed to enforce that
> > requirement.
> > 
> Gid lookup is still done after the route lookup using get_sgid_index_from_eth().
> ib_get_gids_from_rdma_hdr() just gets the GID formatted properly.
> I think this part is ok?

Sort of, but you can't do the route lookup until you get the netdev,
and you can't get a netdev without a gid table lookup. :\

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: Need to set if_index in ib_init_ah_from_wc() ?
       [not found]             ` <20170130211420.GB27111-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2017-01-30 21:52               ` Parav Pandit
       [not found]                 ` <VI1PR0502MB30081EC9B35269636B4C2FBCD14B0-o1MPJYiShExKsLr+rGaxW8DSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
  0 siblings, 1 reply; 31+ messages in thread
From: Parav Pandit @ 2017-01-30 21:52 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Matan Barak

Hi Jason,

> -----Original Message-----
> From: Jason Gunthorpe [mailto:jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org]
> Sent: Monday, January 30, 2017 3:14 PM
> To: Parav Pandit <parav-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Cc: Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org;
> Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Subject: Re: Need to set if_index in ib_init_ah_from_wc() ?
> 
> On Mon, Jan 30, 2017 at 08:15:51PM +0000, Parav Pandit wrote:
> 
> > > Both roceev1 and v2 need to use the netdev that was used to create
> > > the UD QP to anchor these lookups, that netdev should be a child of
> > > some kind from the get_netdev (eg a vlan, or macvlan, etc)
> > >
> > As you know UD QP is not created with netdev (AH binds to netdev).
> > So AH and RC QPs are going to be anchored now around GID.
> 
> Well, that is fundamentally incompatible with how the netdev stack works.
What I meant is, UD QP is not attached to netdev but yes with my patches any QP is attached to namespace, just like regular socket.
So I have qp_net inside ib_qp structure.
Let me finish internal peer reviews before I post the patches.

> 
> We certainly can't have namespaces with such an insecure design. UD QPs
> must not receive packets inapproprate for the netdev they are associated
> with, or the collection of netdevs in their namespace (for * binds)
> 
Haggai and I had exact same discussion few days back when I referred the IB spec. 1.3.1 section C9-46.
So I agree for netdev (mac, vlan). But IP could be of other namespace in incoming packet and that should be ok because we don't have UDP socket style bind() on UD QPs.

> In my prior message I was assuming that rocee disallowed '*' binds since that
> isn't really how RDMA CM works.. (what does that even mean if you have
> two rocee devices??)
> 

> But this clearly hasn't been thought through on the rocee side, so we have a
> nonsense mess :\
> 
> > > This is necessary to make namespaces and other subtle things work
> properly.
> >
> > Yes. I am currently testing my patches for this support in our setup.
> > I will post the patches soon.
> > However fix that Roland pointed out is needed regardless.
> > I actually hit ifindex case even with IPv4 last week but was unable to
> reproduce it.
> 
> Well, using idev isn't going to work with child netdevs, etc
> 
I have further added checks for vlan_dev in presence of wc vlan id present and walking upto upper device.

> The best approach for now is to first consult the GID table to learn the
> ingress netdev for the packet that caused the 'wc'.
> 
> Then do an ingress route lookup to make sure the packet should have been
> accepted (bit late, but OK)
> 
> Then do the egreess route lookup to figure out the parameters for the new
> AH.
> 
> But this is only 'safe' if the caller of ib_init_ah_from_wc is assuming the UD
> QP was bound to * - and only if there are no namespaces.
> 
> The best thing to do is introduce required netdev binding for UD QPs in
> kernel and then the kernel's ib_init_ah_from_wc can work safely..
> 
I am binding to net_ns of the calling process which can have one or more netdev.
This is the right thing to do similar to regular sockets.
Exception if QP1 which will receive packets for all namespaces and will ingress and egress lookups based on the git_addr->net.

> > > I think this brokenness fell out of the above confusion, it should
> > > be removed, rdma_addr_find_dmac_by_grh should not be callable
> > > without a proper netdev.
> > >
> > IP route apis are working mostly based on the iface_id as input.
> > So I am thinking to continue with ifindex input argument instead of
> changing to netdev pointer.
> 
> ifindex do not quickly re-use so this is fine.
> 
Its not quickly reused but when netdevice moves to different namespace, if that ifindex is already exist there, it assigns new one.
But I agree it's pretty corner case and its likely an administrative limitation until we establish good use case.

> > Currently rdma_addr_find_l2_eth_by_grh() has ifindex as optional
> > argument, which is removed in my sandbox now.
> 
> Right, should not be optional, but I'm wondering if this routine should even
> exist in this form. The two call sites seem reasonably different.
> 
> > > The != case can happen if the gid table resolve differently than the
> > > route table, but it is conceptually wrong to lookup in the gid table
> > > before doing a route lookup, and I think these apis are poorly
> > > designed to enforce that requirement.
> > >
> > Gid lookup is still done after the route lookup using
> get_sgid_index_from_eth().
> > ib_get_gids_from_rdma_hdr() just gets the GID formatted properly.
> > I think this part is ok?
> 
> Sort of, but you can't do the route lookup until you get the netdev, and you
> can't get a netdev without a gid table lookup. :\
> 
wc->vlan establishes netdev and its ifindex.
If upper device exist, netdev is established through that in addition to vlan.
netdev establishes namespace.
Namespace establishes gid table search scope and therefore right sgid index.
Same namespace will be used for destination route table lookup as you described in steps above.

I am glad that we are having this discussion right now.

> Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Need to set if_index in ib_init_ah_from_wc() ?
       [not found]                 ` <VI1PR0502MB30081EC9B35269636B4C2FBCD14B0-o1MPJYiShExKsLr+rGaxW8DSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
@ 2017-01-30 22:14                   ` Jason Gunthorpe
       [not found]                     ` <20170130221437.GA28117-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2017-01-31  0:24                   ` Roland Dreier
  1 sibling, 1 reply; 31+ messages in thread
From: Jason Gunthorpe @ 2017-01-30 22:14 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Matan Barak

On Mon, Jan 30, 2017 at 09:52:17PM +0000, Parav Pandit wrote:

> > > > Both roceev1 and v2 need to use the netdev that was used to create
> > > > the UD QP to anchor these lookups, that netdev should be a child of
> > > > some kind from the get_netdev (eg a vlan, or macvlan, etc)
> > > >
> > > As you know UD QP is not created with netdev (AH binds to netdev).
> > > So AH and RC QPs are going to be anchored now around GID.
> > 
> > Well, that is fundamentally incompatible with how the netdev stack works.

> What I meant is, UD QP is not attached to netdev but yes with my
> patches any QP is attached to namespace, just like regular socket.
> So I have qp_net inside ib_qp structure.  Let me finish internal
> peer reviews before I post the patches.

No, regular sockets can be attached to both - '*' bind makes no sense
for RDMA because such a socket will not rx for all ROCEE addreses,
just the ones on the device it is bound to.

For that reason the entire concept of '*' bind should not even exist.

> > We certainly can't have namespaces with such an insecure design. UD QPs
> > must not receive packets inapproprate for the netdev they are associated
> > with, or the collection of netdevs in their namespace (for * binds)
>
> Haggai and I had exact same discussion few days back when I referred
> the IB spec. 1.3.1 section C9-46.  So I agree for netdev (mac,
> vlan).

ipvlan is a thing too..

> But IP could be of other namespace in incoming packet and that
> should be ok because we don't have UDP socket style bind() on UD
> QPs.

That isn't OK from a security standpoint.

You can't secure UD QPs by using the QPN, the NIC has to restrict
incoming packets to only match certain GID table entries, and those
entries must be the netdev the UD QP is bound to or all netdevices in
the UD QPs's namespace (for '*' bind)

Otherwise the NIC does not support rocee namespaces.

> > Well, using idev isn't going to work with child netdevs, etc
 
> I have further added checks for vlan_dev in presence of wc vlan id
> present and walking upto upper device.

That sounds broken, there should never be walking of device or
anything silly like that to determine the ingress netdev. The only
place that is done is when constructing the GID cache.

> > The best thing to do is introduce required netdev binding for UD
> > QPs in kernel and then the kernel's ib_init_ah_from_wc can work
> > safely..
> > 
> I am binding to net_ns of the calling process which can have one or
> more netdev.

But that doesn't make sense for in-kernel users, and those are the
only users that can call ib_init_ah_from_wc...

> This is the right thing to do similar to regular sockets.

Only if the sockets are '*' bound,..

> Exception if QP1 which will receive packets for all namespaces and
> will ingress and egress lookups based on the git_addr->net.

But even here you need to get the ingress netdevice from the gid cache
and use that to deduce what namespace the packet is for.

> > ifindex do not quickly re-use so this is fine.
 
> Its not quickly reused but when netdevice moves to different
> namespace, if that ifindex is already exist there, it assigns new
> one.  But I agree it's pretty corner case and its likely an
> administrative limitation until we establish good use case.

Oh? But in that case an (ifindex,ns) is still a stable reference.

> > Sort of, but you can't do the route lookup until you get the netdev, and you
> > can't get a netdev without a gid table lookup. :\
> > 
> wc->vlan establishes netdev and its ifindex.

Nope, that isn't enough to support all the behaviors.

> If upper device exist, netdev is established through that in
> addition to vlan.  netdev establishes namespace.  Namespace
> establishes gid table search scope and therefore right sgid index.
> Same namespace will be used for destination route table lookup as
> you described in steps above.

No way, that is backwards.

The ingress netdev comes from searching the GID table, period. The GID
table matches the vlan and dgid.

>From there you have the netnamespace to do routing lookups, etc.

> I am glad that we are having this discussion right now.

Can you try and submit a patch to fix Roland's problem please?

Searching the gid table is the way to go for now...

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: Need to set if_index in ib_init_ah_from_wc() ?
       [not found]                     ` <20170130221437.GA28117-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2017-01-31  0:21                       ` Parav Pandit
       [not found]                         ` <VI1PR0502MB3008B16A0F103D729209EA16D14A0-o1MPJYiShExKsLr+rGaxW8DSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
  0 siblings, 1 reply; 31+ messages in thread
From: Parav Pandit @ 2017-01-31  0:21 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Matan Barak

Hi Jason,

> -----Original Message-----
> From: Jason Gunthorpe [mailto:jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org]
> Sent: Monday, January 30, 2017 4:15 PM
> To: Parav Pandit <parav-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Cc: Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org;
> Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Subject: Re: Need to set if_index in ib_init_ah_from_wc() ?
> 
> On Mon, Jan 30, 2017 at 09:52:17PM +0000, Parav Pandit wrote:
> 
> > What I meant is, UD QP is not attached to netdev but yes with my
> > patches any QP is attached to namespace, just like regular socket.
> > So I have qp_net inside ib_qp structure.  Let me finish internal peer
> > reviews before I post the patches.
> 
> No, regular sockets can be attached to both - '*' bind makes no sense for
> RDMA because such a socket will not rx for all ROCEE addreses, just the ones
> on the device it is bound to.
> 
> For that reason the entire concept of '*' bind should not even exist.

I am talking of binding QP or socket to namespace.
My understanding is a given socket can be bound to one and only one network namespace at a time.
If that's not true, socket should have array for net.
Refer to sock_net_set() and sock_net() APIs.
Which namespace it is bound to, its obvious reason not known to the user, regardless of bind(* or specific address).

> 
> > > We certainly can't have namespaces with such an insecure design. UD
> > > QPs must not receive packets inapproprate for the netdev they are
> > > associated with, or the collection of netdevs in their namespace
> > > (for * binds)
> >
> > Haggai and I had exact same discussion few days back when I referred
> > the IB spec. 1.3.1 section C9-46.  So I agree for netdev (mac, vlan).
> 
> ipvlan is a thing too..
Yes. Will think about this further. 
> 
> > But IP could be of other namespace in incoming packet and that should
> > be ok because we don't have UDP socket style bind() on UD QPs.
> 
> That isn't OK from a security standpoint.
> 
> You can't secure UD QPs by using the QPN, the NIC has to restrict incoming
> packets to only match certain GID table entries, and those entries must be
> the netdev the UD QP is bound to or all netdevices in the UD QPs's
> namespace (for '*' bind)
> 
> Otherwise the NIC does not support rocee namespaces.
> 
o.k. Just to highlight you, QP1 is exception to this.

> > > Well, using idev isn't going to work with child netdevs, etc
> 
> > I have further added checks for vlan_dev in presence of wc vlan id
> > present and walking upto upper device.
> 
> That sounds broken, there should never be walking of device or anything silly
> like that to determine the ingress netdev. The only place that is done is when
> constructing the GID cache.
> 
GID table can have two gid entries with same GID content in there, but gid_attr->net can be different.
This is typically for vlan based devices located in two different namespace.
This is the fundamental point to query GID cache based on netdev's net.
Without considering net_ns, GID cache query is equally broken.
Therefore incoming packet's netdev and net namespace derivation is needed from ingress packet.

> > > The best thing to do is introduce required netdev binding for UD QPs
> > > in kernel and then the kernel's ib_init_ah_from_wc can work safely..
> > >
> > I am binding to net_ns of the calling process which can have one or
> > more netdev.
> 
> But that doesn't make sense for in-kernel users, and those are the only users
> that can call ib_init_ah_from_wc...

Its done by default even for sockets. That's how its established that it belongs to init_net.
Refer to sk_alloc().

> 
> > This is the right thing to do similar to regular sockets.
> 
> Only if the sockets are '*' bound,..
> 
I think you mean sockets are '*' bound to address.
Assuming yes, to it, regardless of address property they will be bound to net_ns.
This covers the case of applications who are not using RDMA CM as well.

> > Exception if QP1 which will receive packets for all namespaces and
> > will ingress and egress lookups based on the git_addr->net.
> 
> But even here you need to get the ingress netdevice from the gid cache and
> use that to deduce what namespace the packet is for.
> 
As mentioned above, GID table can have duplicate entries so namespace is identified by the ingress path.
I have reviewed net code before, but I will refer again how it's done on regular Ethernet packets.

> > > ifindex do not quickly re-use so this is fine.
> 
> > Its not quickly reused but when netdevice moves to different
> > namespace, if that ifindex is already exist there, it assigns new one.
> > But I agree it's pretty corner case and its likely an administrative
> > limitation until we establish good use case.
> 
> Oh? But in that case an (ifindex,ns) is still a stable reference.
> 
> > > Sort of, but you can't do the route lookup until you get the netdev,
> > > and you can't get a netdev without a gid table lookup. :\
> > >
> > wc->vlan establishes netdev and its ifindex.
> 
> Nope, that isn't enough to support all the behaviors.
> 
> > If upper device exist, netdev is established through that in addition
> > to vlan.  netdev establishes namespace.  Namespace establishes gid
> > table search scope and therefore right sgid index.
> > Same namespace will be used for destination route table lookup as you
> > described in steps above.
> 
> No way, that is backwards.
> 
> The ingress netdev comes from searching the GID table, period. The GID
> table matches the vlan and dgid.
> 
> From there you have the netnamespace to do routing lookups, etc.
> 
> > I am glad that we are having this discussion right now.
> 
> Can you try and submit a patch to fix Roland's problem please?
> 
> Searching the gid table is the way to go for now...
> 

Let's first agree/disagree for duplicate GID table entries exist for two different namespaces.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Need to set if_index in ib_init_ah_from_wc() ?
       [not found]                 ` <VI1PR0502MB30081EC9B35269636B4C2FBCD14B0-o1MPJYiShExKsLr+rGaxW8DSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
  2017-01-30 22:14                   ` Jason Gunthorpe
@ 2017-01-31  0:24                   ` Roland Dreier
  1 sibling, 0 replies; 31+ messages in thread
From: Roland Dreier @ 2017-01-31  0:24 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Jason Gunthorpe, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Matan Barak

> Exception if QP1 which will receive packets for all namespaces and will ingress and egress lookups based on the git_addr->net.

And to be clear - the issue I am hitting is with QP1 - it is a delay
during CM connection establishment, due to doing ib_init_ah_from_wc()
in the REQ handler.

 - R.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Need to set if_index in ib_init_ah_from_wc() ?
       [not found]                         ` <VI1PR0502MB3008B16A0F103D729209EA16D14A0-o1MPJYiShExKsLr+rGaxW8DSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
@ 2017-01-31  0:53                           ` Jason Gunthorpe
       [not found]                             ` <20170131005340.GA30809-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 31+ messages in thread
From: Jason Gunthorpe @ 2017-01-31  0:53 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Matan Barak

On Tue, Jan 31, 2017 at 12:21:17AM +0000, Parav Pandit wrote:

> > No, regular sockets can be attached to both - '*' bind makes no sense for
> > RDMA because such a socket will not rx for all ROCEE addreses, just the ones
> > on the device it is bound to.
> > 
> > For that reason the entire concept of '*' bind should not even exist.
> 
> I am talking of binding QP or socket to namespace.

Sockets are only bound to a namespace with a '*' bind - otherwise they
are bound to netdevs (and that implies a namespace)

> My understanding is a given socket can be bound to one and only one
> network namespace at a time.

Right, but that isn't what I'm talking about here...

> > That sounds broken, there should never be walking of device or anything silly
> > like that to determine the ingress netdev. The only place that is done is when
> > constructing the GID cache.
>
> GID table can have two gid entries with same GID content in there,
> but gid_attr->net can be different.

A incoming packet *cannot* match two GID table entries - that is by
definition.

Yes, two table entries can have the same GID.

However, it is invalid to search the GID table by GID alone for
rocee. The GID table can only be searched with the full network
headers. For instance (DMAC,VLAN_ID,ROCE Version,GRH.DGID,etc).

This is what the hardware should be doing when it decides if it will
accept a packet or not. Packets that do not match GID table entries
should not be received. Each UD QP should have a list of GID table
entries it will accept packets for. (this addition is necessary for
namespaces)

In IB the matching GID table entry is placed in wc.sgid_index.

I argued that rocee should do the same, but since mlx didn't implement
this in hardware they didn't want to take the performance cost when
building the WC.

So, you have to reconstruct the wc.sgid_index that the hardware used
in software - and this will always match a single GID table entry.

Since GID table entries are associated with a single netdev, this
gives you everything needed to process at ingress.

> Without considering net_ns, GID cache query is equally broken.

Again, you must never, ever, search the GID table with only a GID or
IP address. That is always wrong for rocee..

> > > > The best thing to do is introduce required netdev binding for UD QPs
> > > > in kernel and then the kernel's ib_init_ah_from_wc can work safely..
> > > >
> > > I am binding to net_ns of the calling process which can have one or
> > > more netdev.
> > 
> > But that doesn't make sense for in-kernel users, and those are the only users
> > that can call ib_init_ah_from_wc...
> 
> Its done by default even for sockets. That's how its established
> that it belongs to init_net.  Refer to sk_alloc().

IB is not sockets, we don't have the concept of a '*' bind. It just
cannot be supported by the hardware.

> I think you mean sockets are '*' bound to address.  Assuming yes, to
> it, regardless of address property they will be bound to net_ns.
> This covers the case of applications who are not using RDMA CM as
> well.

More than that, we can't support the entire idea of binding an IP. The
best we can do is SO_BINDTODEVICE - which is what RDMA CM should
actually implement. (it binds to a rdma device, not a netdevice, which
is nonsensical and wrong from an IP stack perspective)

Since that has to be fixed to support namespaces, it should be fixed
properly and emulate SO_BINDTODEVICE semantics, and not something
else.

> > But even here you need to get the ingress netdevice from the gid
> > cache and use that to deduce what namespace the packet is for.
> 
> As mentioned above, GID table can have duplicate entries so
> namespace is identified by the ingress path.  I have reviewed net
> code before, but I will refer again how it's done on regular
> Ethernet packets.

Regular ethernet derives the ingress netdev from the DMAC and VLAN tag
present in the packet. (ignoring ipvlan which is special)

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Need to set if_index in ib_init_ah_from_wc() ?
       [not found]                             ` <20170131005340.GA30809-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2017-01-31  1:49                               ` Jason Gunthorpe
       [not found]                                 ` <20170131014941.GA15387-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2017-02-08 16:02                               ` Matan Barak
  1 sibling, 1 reply; 31+ messages in thread
From: Jason Gunthorpe @ 2017-01-31  1:49 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Matan Barak

On Mon, Jan 30, 2017 at 05:53:40PM -0700, Jason Gunthorpe wrote:

> In IB the matching GID table entry is placed in wc.sgid_index.
> 
> I argued that rocee should do the same, but since mlx didn't implement
> this in hardware they didn't want to take the performance cost when
> building the WC.

Sorry, I realized this reads confusingly after the fact - the argument
was about adding a wc.gid_index - IB unambigously open-codes computing
wc.gid_index ..

I think you should review the thread starting here:

http://www.spinics.net/lists/linux-rdma/msg30976.html

Which is what caused this broken route lookup to be implemented in the
first place.

And here is my first suggestion on how to fix it up:

http://www.spinics.net/lists/linux-rdma/msg31118.html

The fact the fundamental problems I raised during the review were not
properly addressed is what is causing these problems and confusion
now.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: Need to set if_index in ib_init_ah_from_wc() ?
       [not found]                                 ` <20170131014941.GA15387-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2017-01-31  1:57                                   ` Parav Pandit
  0 siblings, 0 replies; 31+ messages in thread
From: Parav Pandit @ 2017-01-31  1:57 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Matan Barak

Hi Jason,

I will go through the posts.

Parav

> -----Original Message-----
> From: Jason Gunthorpe [mailto:jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org]
> Sent: Monday, January 30, 2017 7:50 PM
> To: Parav Pandit <parav-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Cc: Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org;
> Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Subject: Re: Need to set if_index in ib_init_ah_from_wc() ?
> 
> On Mon, Jan 30, 2017 at 05:53:40PM -0700, Jason Gunthorpe wrote:
> 
> > In IB the matching GID table entry is placed in wc.sgid_index.
> >
> > I argued that rocee should do the same, but since mlx didn't implement
> > this in hardware they didn't want to take the performance cost when
> > building the WC.
> 
> Sorry, I realized this reads confusingly after the fact - the argument was
> about adding a wc.gid_index - IB unambigously open-codes computing
> wc.gid_index ..
> 
> I think you should review the thread starting here:
> 
> http://www.spinics.net/lists/linux-rdma/msg30976.html
> 
> Which is what caused this broken route lookup to be implemented in the first
> place.
> 
> And here is my first suggestion on how to fix it up:
> 
> http://www.spinics.net/lists/linux-rdma/msg31118.html
> 
> The fact the fundamental problems I raised during the review were not
> properly addressed is what is causing these problems and confusion now.
> 
> Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Need to set if_index in ib_init_ah_from_wc() ?
       [not found]                             ` <20170131005340.GA30809-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2017-01-31  1:49                               ` Jason Gunthorpe
@ 2017-02-08 16:02                               ` Matan Barak
       [not found]                                 ` <CAAKD3BDbkFHbCi+gHyCXCGV2xi5E9FA+KgwKz+6dBJEtsV0ZkQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  1 sibling, 1 reply; 31+ messages in thread
From: Matan Barak @ 2017-02-08 16:02 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Parav Pandit, Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Matan Barak

>> > That sounds broken, there should never be walking of device or anything silly
>> > like that to determine the ingress netdev. The only place that is done is when
>> > constructing the GID cache.
>>
>> GID table can have two gid entries with same GID content in there,
>> but gid_attr->net can be different.
>
> A incoming packet *cannot* match two GID table entries - that is by
> definition.
>
> Yes, two table entries can have the same GID.
>
> However, it is invalid to search the GID table by GID alone for
> rocee. The GID table can only be searched with the full network
> headers. For instance (DMAC,VLAN_ID,ROCE Version,GRH.DGID,etc).
>
> This is what the hardware should be doing when it decides if it will
> accept a packet or not. Packets that do not match GID table entries
> should not be received. Each UD QP should have a list of GID table
> entries it will accept packets for. (this addition is necessary for
> namespaces)
>
> In IB the matching GID table entry is placed in wc.sgid_index.
>
> I argued that rocee should do the same, but since mlx didn't implement
> this in hardware they didn't want to take the performance cost when
> building the WC.
>
> So, you have to reconstruct the wc.sgid_index that the hardware used
> in software - and this will always match a single GID table entry.
>
> Since GID table entries are associated with a single netdev, this
> gives you everything needed to process at ingress.
>
>> Without considering net_ns, GID cache query is equally broken.
>
> Again, you must never, ever, search the GID table with only a GID or
> IP address. That is always wrong for rocee..
>

Actually, I think one of the problems here is that we insert GIDs
which aren't supported by the hardware.
For example, if we can't supply VNI, why would we add such a GID entry?
The hardware won't be able to strip/build such a header anyway.

So, assuming we only insert a combination of netdevices which could be actually
described by the hardware, we could try to match these attributes in
the gid cache table.
Once we have such a match, we could verify that with the ingress route
(actually, we could use
ingress route starting from get_netdev() as well).

Since RoCE is symmetrical (you don't know if an AH is for rx/tx in UD
and you need to select
a single sgid_index for RC connections), we need to verify that it
matches an egress route
bounded to this netdevice.

Regarding the CM, it could be implemented through the network stack as
well. It'll actually give
the correct netdev. Saying that, I'm not sure all hardwares support that.

As a quick fix, setting ifindex to get_netdev's ifindex in link local
addresses and filter out unsupported
netdevices in the GID cache, seems like a reasonable solution to me.

Matan
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Need to set if_index in ib_init_ah_from_wc() ?
       [not found]                                 ` <CAAKD3BDbkFHbCi+gHyCXCGV2xi5E9FA+KgwKz+6dBJEtsV0ZkQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-02-08 17:40                                   ` Jason Gunthorpe
       [not found]                                     ` <20170208174040.GC30720-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 31+ messages in thread
From: Jason Gunthorpe @ 2017-02-08 17:40 UTC (permalink / raw)
  To: Matan Barak
  Cc: Parav Pandit, Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Matan Barak

On Wed, Feb 08, 2017 at 06:02:44PM +0200, Matan Barak wrote:
 
> Actually, I think one of the problems here is that we insert GIDs
> which aren't supported by the hardware.

That certainly sounds like a problem, yes.

> So, assuming we only insert a combination of netdevices which could
> be actually described by the hardware, we could try to match these
> attributes in the gid cache table.  Once we have such a match, we
> could verify that with the ingress route (actually, we could use
> ingress route starting from get_netdev() as well).

Yes, that is the proper thing to do.

Presumably the hardware does this, it should not be receiving packets
that do not match the GID table. That would be a huge problem when we
talk about namespaces/etc.

> Since RoCE is symmetrical (you don't know if an AH is for rx/tx in
> UD and you need to select a single sgid_index for RC connections),
> we need to verify that it matches an egress route bounded to this
> netdevice.

?? how is an AH involved in RX?

> As a quick fix, setting ifindex to get_netdev's ifindex in link local
> addresses and filter out unsupported
> netdevices in the GID cache, seems like a reasonable solution to me.

That still breaks link local addresses on vlan devices, so it is an
ugly hack, not a solution.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: Need to set if_index in ib_init_ah_from_wc() ?
       [not found]                                     ` <20170208174040.GC30720-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2017-02-09  0:00                                       ` Parav Pandit
       [not found]                                         ` <VI1PR0502MB3008417F3E388B617291147CD1450-o1MPJYiShExKsLr+rGaxW8DSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
  2017-02-09  8:40                                       ` Matan Barak
  1 sibling, 1 reply; 31+ messages in thread
From: Parav Pandit @ 2017-02-09  0:00 UTC (permalink / raw)
  To: Jason Gunthorpe, Matan Barak
  Cc: Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Matan Barak

Hi Jason,

> -----Original Message-----
> From: Jason Gunthorpe [mailto:jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org]
> Sent: Wednesday, February 8, 2017 11:41 AM
> To: Matan Barak <matanb-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
> Cc: Parav Pandit <parav-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>; Roland Dreier
> <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Matan Barak
> <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Subject: Re: Need to set if_index in ib_init_ah_from_wc() ?
> 
> On Wed, Feb 08, 2017 at 06:02:44PM +0200, Matan Barak wrote:
> 
> > Actually, I think one of the problems here is that we insert GIDs
> > which aren't supported by the hardware.
> 
> That certainly sounds like a problem, yes.
> 
> > So, assuming we only insert a combination of netdevices which could be
> > actually described by the hardware, we could try to match these
> > attributes in the gid cache table.  Once we have such a match, we
> > could verify that with the ingress route (actually, we could use
> > ingress route starting from get_netdev() as well).
> 
> Yes, that is the proper thing to do.
> 
> Presumably the hardware does this, it should not be receiving packets that
> do not match the GID table. That would be a huge problem when we talk
> about namespaces/etc.
> 
> > Since RoCE is symmetrical (you don't know if an AH is for rx/tx in UD
> > and you need to select a single sgid_index for RC connections), we
> > need to verify that it matches an egress route bounded to this
> > netdevice.
> 
> ?? how is an AH involved in RX?
> 
> > As a quick fix, setting ifindex to get_netdev's ifindex in link local
> > addresses and filter out unsupported netdevices in the GID cache,
> > seems like a reasonable solution to me.
> 
> That still breaks link local addresses on vlan devices, so it is an ugly hack, not a
> solution.

In presence of vlan, shouldn't we be passing the ifindex of the vlan netdev?


> 
> Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Need to set if_index in ib_init_ah_from_wc() ?
       [not found]                                         ` <VI1PR0502MB3008417F3E388B617291147CD1450-o1MPJYiShExKsLr+rGaxW8DSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
@ 2017-02-09  0:01                                           ` Jason Gunthorpe
       [not found]                                             ` <20170209000157.GA14556-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 31+ messages in thread
From: Jason Gunthorpe @ 2017-02-09  0:01 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Matan Barak, Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Matan Barak

On Thu, Feb 09, 2017 at 12:00:09AM +0000, Parav Pandit wrote:

> > That still breaks link local addresses on vlan devices, so it is an ugly hack, not a
> > solution.
> 
> In presence of vlan, shouldn't we be passing the ifindex of the vlan netdev?

yes, that is exactly my point...

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: Need to set if_index in ib_init_ah_from_wc() ?
       [not found]                                             ` <20170209000157.GA14556-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2017-02-09  0:15                                               ` Parav Pandit
       [not found]                                                 ` <VI1PR0502MB3008153CDFD7F9C7CCAB022CD1450-o1MPJYiShExKsLr+rGaxW8DSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
  0 siblings, 1 reply; 31+ messages in thread
From: Parav Pandit @ 2017-02-09  0:15 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Matan Barak, Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Matan Barak


> -----Original Message-----
> From: Jason Gunthorpe [mailto:jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org]
> Sent: Wednesday, February 8, 2017 6:02 PM
> To: Parav Pandit <parav-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Cc: Matan Barak <matanb-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>; Roland Dreier
> <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Matan Barak
> <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Subject: Re: Need to set if_index in ib_init_ah_from_wc() ?
> 
> On Thu, Feb 09, 2017 at 12:00:09AM +0000, Parav Pandit wrote:
> 
> > > That still breaks link local addresses on vlan devices, so it is an
> > > ugly hack, not a solution.
> >
> > In presence of vlan, shouldn't we be passing the ifindex of the vlan
> netdev?
> 
> yes, that is exactly my point...

Oh ok. I get it. I am on right path to fix it than.

Additionally,
when there is macvlan based slave device present on this vlan device, I will pass the ifindex of that particular netdev.
Now since we don't have MAC address coming in ib_wc nor in IB/RoCEv2 Annex spec, code needs to refer to the
(a) ifaddr of the vlan netdev
and
(b) ifaddr of the slave netdevs 
Compare the DGID of the grh with ifaddr and use that netdev's ifindex for the first matching entry.

Sounds reasonable now?


> 
> Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Need to set if_index in ib_init_ah_from_wc() ?
       [not found]                                     ` <20170208174040.GC30720-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2017-02-09  0:00                                       ` Parav Pandit
@ 2017-02-09  8:40                                       ` Matan Barak
       [not found]                                         ` <CAAKD3BD88Kp2h3+vP4iMy5T4rR6=Y39ZnNXRLM1P-6G6iB=BNQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  1 sibling, 1 reply; 31+ messages in thread
From: Matan Barak @ 2017-02-09  8:40 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Parav Pandit, Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Matan Barak

On Wed, Feb 8, 2017 at 7:40 PM, Jason Gunthorpe
<jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:
> On Wed, Feb 08, 2017 at 06:02:44PM +0200, Matan Barak wrote:
>
>> Actually, I think one of the problems here is that we insert GIDs
>> which aren't supported by the hardware.
>
> That certainly sounds like a problem, yes.
>
>> So, assuming we only insert a combination of netdevices which could
>> be actually described by the hardware, we could try to match these
>> attributes in the gid cache table.  Once we have such a match, we
>> could verify that with the ingress route (actually, we could use
>> ingress route starting from get_netdev() as well).
>
> Yes, that is the proper thing to do.
>
> Presumably the hardware does this, it should not be receiving packets
> that do not match the GID table. That would be a huge problem when we
> talk about namespaces/etc.
>
>> Since RoCE is symmetrical (you don't know if an AH is for rx/tx in
>> UD and you need to select a single sgid_index for RC connections),
>> we need to verify that it matches an egress route bounded to this
>> netdevice.
>
> ?? how is an AH involved in RX?
>

Well, some applications (maybe a lot of them), creates an AH from the first
received packet (of each node) and then just assumes the rest of the packets
comes from the same logical netdev.
Since there is no binding to a netdev, other packets could differ
vastly from this
one. That's what I meant by "symmetrical".

>> As a quick fix, setting ifindex to get_netdev's ifindex in link local
>> addresses and filter out unsupported
>> netdevices in the GID cache, seems like a reasonable solution to me.
>
> That still breaks link local addresses on vlan devices, so it is an
> ugly hack, not a solution.
>

I agree, but currently all link local addresses are broken :(

> Jason

Matan
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Need to set if_index in ib_init_ah_from_wc() ?
       [not found]                                                 ` <VI1PR0502MB3008153CDFD7F9C7CCAB022CD1450-o1MPJYiShExKsLr+rGaxW8DSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
@ 2017-02-09  9:24                                                   ` Matan Barak
       [not found]                                                     ` <CAAKD3BBunKCb6Z-FObOD0covTt3gCXDn3oic4LNbU3J5JHiQJw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 31+ messages in thread
From: Matan Barak @ 2017-02-09  9:24 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Jason Gunthorpe, Roland Dreier,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Matan Barak

On Thu, Feb 9, 2017 at 2:15 AM, Parav Pandit <parav-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
>
>> -----Original Message-----
>> From: Jason Gunthorpe [mailto:jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org]
>> Sent: Wednesday, February 8, 2017 6:02 PM
>> To: Parav Pandit <parav-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>> Cc: Matan Barak <matanb-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>; Roland Dreier
>> <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Matan Barak
>> <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>> Subject: Re: Need to set if_index in ib_init_ah_from_wc() ?
>>
>> On Thu, Feb 09, 2017 at 12:00:09AM +0000, Parav Pandit wrote:
>>
>> > > That still breaks link local addresses on vlan devices, so it is an
>> > > ugly hack, not a solution.
>> >
>> > In presence of vlan, shouldn't we be passing the ifindex of the vlan
>> netdev?
>>
>> yes, that is exactly my point...
>
> Oh ok. I get it. I am on right path to fix it than.
>
> Additionally,
> when there is macvlan based slave device present on this vlan device, I will pass the ifindex of that particular netdev.
> Now since we don't have MAC address coming in ib_wc nor in IB/RoCEv2 Annex spec, code needs to refer to the
> (a) ifaddr of the vlan netdev
> and
> (b) ifaddr of the slave netdevs
> Compare the DGID of the grh with ifaddr and use that netdev's ifindex for the first matching entry.
>
> Sounds reasonable now?
>

Since we don't get the DMAC address, I think the GID cache shouldn't
carry entries which the hardware
can't differentiate upon. It might be ok for some cases in the
transmit side (as you choose the smac
based on the netdev attached to the GID entry, but if you add a vxlan
based interface, you won't
be able to add the appropriate headers). We can leave this as is or
making it symmetrical.

So, when adding a GID, we need to consult the hardware capabilities
regarding the metadata
it can provide in the completion. If the hardware isn't capable of
creating/stripping one of the headers
of this netdev, there's no reason to add it.

If the hardware supports creating/stripping the required headers but
it doesn't support reporting
them in the completion or all fields are supported but there are
conflicting entries, you could
either consult the ingress route before adding these GIDs or add them
both and consult the
ingress route when creating an AH from the completion entry and GRH data.
I think adding them both, selecting the first one and validating with
the ingress route is the way
to go.

The link local address falls into this case as well.

>
>>
>> Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Need to set if_index in ib_init_ah_from_wc() ?
       [not found]                                                     ` <CAAKD3BBunKCb6Z-FObOD0covTt3gCXDn3oic4LNbU3J5JHiQJw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-02-09 18:47                                                       ` Jason Gunthorpe
       [not found]                                                         ` <20170209184720.GA809-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 31+ messages in thread
From: Jason Gunthorpe @ 2017-02-09 18:47 UTC (permalink / raw)
  To: Matan Barak
  Cc: Parav Pandit, Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Matan Barak

On Thu, Feb 09, 2017 at 11:24:17AM +0200, Matan Barak wrote:
> On Thu, Feb 9, 2017 at 2:15 AM, Parav Pandit <parav-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
> >
> >> From: Jason Gunthorpe [mailto:jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org]
> >> Sent: Wednesday, February 8, 2017 6:02 PM
> >> To: Parav Pandit <parav-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> >> Cc: Matan Barak <matanb-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>; Roland Dreier
> >> <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Matan Barak
> >> <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> >> Subject: Re: Need to set if_index in ib_init_ah_from_wc() ?
> >>
> >> On Thu, Feb 09, 2017 at 12:00:09AM +0000, Parav Pandit wrote:
> >>
> >> > > That still breaks link local addresses on vlan devices, so it is an
> >> > > ugly hack, not a solution.
> >> >
> >> > In presence of vlan, shouldn't we be passing the ifindex of the vlan
> >> netdev?
> >>
> >> yes, that is exactly my point...
> >
> > Oh ok. I get it. I am on right path to fix it than.

Use the gid cache to figure out, not seaching netdevs..

> > Additionally,
> > when there is macvlan based slave device present on this vlan device, I will pass the ifindex of that particular netdev.
> > Now since we don't have MAC address coming in ib_wc nor in IB/RoCEv2 Annex spec, code needs to refer to the
> > (a) ifaddr of the vlan netdev
> > and
> > (b) ifaddr of the slave netdevs
> > Compare the DGID of the grh with ifaddr and use that netdev's ifindex for the first matching entry.
> >
> > Sounds reasonable now?
> >
> 
> Since we don't get the DMAC address, I think the GID cache shouldn't
> carry entries which the hardware can't differentiate upon.

Well, more specifically, with this limiatation, the hardware must
*NEVER* receive a packet that does not match the primary MAC of the
port.

Which goes back to my first point: The hardware should not receive
something that is not in the GID cache, period. It sounds like this
basic sanity is being viloated in some current rocee hardware???

If any scenario makes the GID cache ambiguous then it cannot be
allowed. eg apparently macvlan must be denied, which makes this pretty
useless for namespaces.

>From your comments, I think the hardware function is going to have to
be improved to make this sane. I continue to recommend returning the
GID cache index in the WC.

> It might be ok for some cases in the transmit side (as you choose
> the smac based on the netdev attached to the GID entry, but if you
> add a vxlan based interface, you won't be able to add the
> appropriate headers). We can leave this as is or making it
> symmetrical.

Again, it is madness to allow the hardware to receive a packet on a UD
QP that is not present in the GID table, and it is unworkable to have
a WC that doesn't unambiguously refer to a GID Table entry.

So yes, things like vxlan should not be in the gid table if the
hardware cannot cope with it.

> So, when adding a GID, we need to consult the hardware capabilities
> regarding the metadata it can provide in the completion. If the
> hardware isn't capable of creating/stripping one of the headers of
> this netdev, there's no reason to add it.

Yes. This is also why long ago I suggested that the hardware driver
should provide a function to resolve the WC into a GID cach entry and
that function can rely on hardware unique capabilities.

IMHO userspace should not be exposed to this and UD QPs should be
locked by hardware to a single netdev worth of gid cache
entries. Anything weaker invites exploitation when we talk about
namespaces.

> If the hardware supports creating/stripping the required headers but
> it doesn't support reporting them in the completion or all fields
> are supported but there are conflicting entries, you could either
> consult the ingress route before adding these GIDs or add them both
> and consult the

No. Hardware must support all features: create/strip/report/per-QP
filter before the GID cache can have an entry. No subsets can be
permitted.

This probably means existing firmware/hardware/drivers cannot support
macvlan and maybe others, but that is much better than trying to
support it in an unsafe and insecure way.

That probbably answers Parav's earlier question about duplicates in
the gid table: It is a bug today that can even happen.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Need to set if_index in ib_init_ah_from_wc() ?
       [not found]                                         ` <CAAKD3BD88Kp2h3+vP4iMy5T4rR6=Y39ZnNXRLM1P-6G6iB=BNQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-02-09 18:48                                           ` Jason Gunthorpe
  0 siblings, 0 replies; 31+ messages in thread
From: Jason Gunthorpe @ 2017-02-09 18:48 UTC (permalink / raw)
  To: Matan Barak
  Cc: Parav Pandit, Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Matan Barak

On Thu, Feb 09, 2017 at 10:40:38AM +0200, Matan Barak wrote:

> Well, some applications (maybe a lot of them), creates an AH from
> the first received packet (of each node) and then just assumes the
> rest of the packets comes from the same logical netdev.  Since there
> is no binding to a netdev, other packets could differ vastly from
> this one. That's what I meant by "symmetrical".

Such applications should create a UD QP per netdev they want to
operate on.

There is no good reason to have a single 'all device' UD QP in user
space.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: Need to set if_index in ib_init_ah_from_wc() ?
       [not found]                                                         ` <20170209184720.GA809-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2017-02-09 21:55                                                           ` Parav Pandit
       [not found]                                                             ` <VI1PR0502MB30089F94C9F1504E142B86E0D1450-o1MPJYiShExKsLr+rGaxW8DSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
  0 siblings, 1 reply; 31+ messages in thread
From: Parav Pandit @ 2017-02-09 21:55 UTC (permalink / raw)
  To: Jason Gunthorpe, Matan Barak
  Cc: Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Matan Barak

Hi Jason,

> -----Original Message-----
> From: Jason Gunthorpe [mailto:jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org]
> Sent: Thursday, February 9, 2017 12:47 PM
> To: Matan Barak <matanb-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
> Cc: Parav Pandit <parav-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>; Roland Dreier
> <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Matan Barak
> <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Subject: Re: Need to set if_index in ib_init_ah_from_wc() ?
> 
> On Thu, Feb 09, 2017 at 11:24:17AM +0200, Matan Barak wrote:
> > On Thu, Feb 9, 2017 at 2:15 AM, Parav Pandit <parav-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> wrote:
> > >
> > >> From: Jason Gunthorpe [mailto:jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org]
> > >> Sent: Wednesday, February 8, 2017 6:02 PM
> > >> To: Parav Pandit <parav-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > >> Cc: Matan Barak <matanb-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>; Roland Dreier
> > >> <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Matan Barak
> > >> <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > >> Subject: Re: Need to set if_index in ib_init_ah_from_wc() ?
> > >>
> > >> On Thu, Feb 09, 2017 at 12:00:09AM +0000, Parav Pandit wrote:
> > >>
> > >> > > That still breaks link local addresses on vlan devices, so it
> > >> > > is an ugly hack, not a solution.
> > >> >
> > >> > In presence of vlan, shouldn't we be passing the ifindex of the
> > >> > vlan
> > >> netdev?
> > >>
> > >> yes, that is exactly my point...
> > >
> > > Oh ok. I get it. I am on right path to fix it than.
> 
> Use the gid cache to figure out, not seaching netdevs..
> 
I still believe I need to search gid cache based on the netdev.
ib_find_cached_gid_by_port() already supports it.

Help me understand how to use the gid cache for below scenario without netdev.

eth0 has two vlans.
Each vlan device has one macvlan device.
eth0 (mac-A)
|----->eth0.vlan1 (mac-A)
        |---->macvlan1 (mac-B) - IP-1
        |---->macvlan2 (mac-C) - IP-2
|-----> eth0.vlan2 (mac-A)
        |---->macvlan3 (mac-D) - IP-1
        |---->macvlan4 (mac-E) - IP -3
Please note duplicate IP-1 assigned to two macvlan devices of two different vlan.
These two macvlan devices are in two different namespace and therefore duplicate IP is perfectly valid for them.
This means our GID cache will have two GID entries with different netdev.
Macvlan netdev doesn't have vlan property set.

Now when ib_wc arrives for it, it will not have mac address in it, but we will have VLAN.
Which means we can only reach upto eth0.vlan1.
If I seach gid cache purely based on the GID, vlan id, and netdev of eth0.vlan1 there won't a match in GID table.
So in order to have right search in gid cache we need the netdev of the macvlan-1.
So what proposed is,

If (wc_without_vlan()) {
      leaf_netdev = eth0.netdev;
} else {
     /* there is vlan in wc */
    Ifaddr = search_ifaddr(netdev, gid);
    If (ifaddr_found) {
        /* we found matching IP for this netdev, use this netdev further  */
       leaf_netdev = vlan_netdev;
    } else {
          for_each_lower_netdev_of_vlan(slave_ndev, list) {
                 ifaddr = search_ifaddr(netdev, gid);
                 if (ifaddr_found) {
                     /* matching ifaddr found, in macvlan dev, use this netdev further */
                     leaf_netdev = macvlan_netdev;
                     break;
                 }
          }
     gid_index = search_gid_cache(GID, leaf_netdev);
}

This should work with and without namespaces to resolve the right netdev and GID for macvlan, vlan, ipvlan, regular phy dev for linklocal and other ip addresses.
Most hardware will drop frame with mac-B-vlan-2 or other such combinations anyway.

> > > Additionally,
> > > when there is macvlan based slave device present on this vlan device, I
> will pass the ifindex of that particular netdev.
> > > Now since we don't have MAC address coming in ib_wc nor in IB/RoCEv2
> > > Annex spec, code needs to refer to the
> > > (a) ifaddr of the vlan netdev
> > > and
> > > (b) ifaddr of the slave netdevs
> > > Compare the DGID of the grh with ifaddr and use that netdev's ifindex for
> the first matching entry.
> > >
> > > Sounds reasonable now?
> > >
> >
> > Since we don't get the DMAC address, I think the GID cache shouldn't
> > carry entries which the hardware can't differentiate upon.
> 
> Well, more specifically, with this limiatation, the hardware must
> *NEVER* receive a packet that does not match the primary MAC of the port.
> 
> Which goes back to my first point: The hardware should not receive
> something that is not in the GID cache, period. It sounds like this basic sanity
> is being viloated in some current rocee hardware???
> 
> If any scenario makes the GID cache ambiguous then it cannot be allowed. eg
> apparently macvlan must be denied, which makes this pretty useless for
> namespaces.
> 
> From your comments, I think the hardware function is going to have to be
> improved to make this sane. I continue to recommend returning the GID
> cache index in the WC.
> 
> > It might be ok for some cases in the transmit side (as you choose the
> > smac based on the netdev attached to the GID entry, but if you add a
> > vxlan based interface, you won't be able to add the appropriate
> > headers). We can leave this as is or making it symmetrical.
> 
> Again, it is madness to allow the hardware to receive a packet on a UD QP
> that is not present in the GID table, and it is unworkable to have a WC that
> doesn't unambiguously refer to a GID Table entry.
> 
> So yes, things like vxlan should not be in the gid table if the hardware cannot
> cope with it.
> 
> > So, when adding a GID, we need to consult the hardware capabilities
> > regarding the metadata it can provide in the completion. If the
> > hardware isn't capable of creating/stripping one of the headers of
> > this netdev, there's no reason to add it.
> 
> Yes. This is also why long ago I suggested that the hardware driver should
> provide a function to resolve the WC into a GID cach entry and that function
> can rely on hardware unique capabilities.
> 
> IMHO userspace should not be exposed to this and UD QPs should be locked
> by hardware to a single netdev worth of gid cache entries. Anything weaker
> invites exploitation when we talk about namespaces.
> 
> > If the hardware supports creating/stripping the required headers but
> > it doesn't support reporting them in the completion or all fields are
> > supported but there are conflicting entries, you could either consult
> > the ingress route before adding these GIDs or add them both and
> > consult the
> 
> No. Hardware must support all features: create/strip/report/per-QP filter
> before the GID cache can have an entry. No subsets can be permitted.
> 
> This probably means existing firmware/hardware/drivers cannot support
> macvlan and maybe others, but that is much better than trying to support it
> in an unsafe and insecure way.
> 
> That probbably answers Parav's earlier question about duplicates in the gid
> table: It is a bug today that can even happen.
> 
> Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Need to set if_index in ib_init_ah_from_wc() ?
       [not found]                                                             ` <VI1PR0502MB30089F94C9F1504E142B86E0D1450-o1MPJYiShExKsLr+rGaxW8DSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
@ 2017-02-09 22:18                                                               ` Jason Gunthorpe
  0 siblings, 0 replies; 31+ messages in thread
From: Jason Gunthorpe @ 2017-02-09 22:18 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Matan Barak, Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Matan Barak

On Thu, Feb 09, 2017 at 09:55:18PM +0000, Parav Pandit wrote:

> This means our GID cache will have two GID entries with different netdev.

When you load this into the GID cache you need to have four unique
entries for the MAC vlans:

(MAC-B, VLAN1, IP-1)
(MAC-C, VLAN1, IP-2)
(MAC-D, VLAN2, IP-1)
(MAC-E, VLAN2, IP-3)

And unique entries for IPs on the VLAN interfaces

(MAC-A, VLAN1, LL-IP for eth0.vlan1)
(MAC-A, VLAN2, LL-IP for eth0.vlan2)

And unique entries for the base netdevice:

(MAC-A, UNTAGGED, LL-IP for eth0)

All of these are unique keys in the GID cache.

> Now when ib_wc arrives for it, it will not have mac address in it,
> but we will have VLAN.  Which means we can only reach upto
> eth0.vlan1.

As I said before, you fundamentally cannot support macvlan on hardware
that does not provide the dmac or equivalent in the wc. As Matan said,
those IPs should never be added to the GID table in the first place.

> If I seach gid cache purely based on the GID, vlan id, and netdev of
> eth0.vlan1 there won't a match in GID table.

You'd never search the GID table with a netdev as part of the key.

But yes, if the hardware provides only (VLAN_ID, GID) then, by
definition, that hardware only supports DMAC == MAC-A and the gid
table search is (MAC-A, VLAN_ID, GID) and will not match any MACVLAN
entries.

This correct because the hardware does not support macvlan.

>           for_each_lower_netdev_of_vlan(slave_ndev, list) {
>                  ifaddr = search_ifaddr(netdev, gid);

This search is basically what ipvlan does.

It is completely wrong to replace the macvlan semantics with ipvlan
semantics way down in the offload world. If the admin intended to use
ipvlan then they would have done that.

You must stop trying to add fake support for macvlan on hardware that
cannot support it.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: Need to set if_index in ib_init_ah_from_wc() ?
       [not found]     ` <20170130195412.GE24466-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2017-01-30 20:15       ` Parav Pandit
@ 2017-10-18 19:00       ` Parav Pandit
       [not found]         ` <VI1PR0502MB30086B735FD4B5948B521673D14D0-o1MPJYiShExKsLr+rGaxW8DSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
  1 sibling, 1 reply; 31+ messages in thread
From: Parav Pandit @ 2017-10-18 19:00 UTC (permalink / raw)
  To: Devesh Sharma, Selvin Xavier, Somnath Kotur
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Matan Barak, Jason Gunthorpe,
	Roland Dreier

Hi Devesh, Selvin, Somnath,

> -----Original Message-----
> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-
> owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Jason Gunthorpe
> Sent: Monday, January 30, 2017 1:54 PM
> To: Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>
> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Subject: Re: Need to set if_index in ib_init_ah_from_wc() ?
> 
> On Sun, Jan 29, 2017 at 05:21:43PM -0800, Roland Dreier wrote:
> > I recently debugged an issue where an RDMA-CM / RoCE connection to a
> > link-local IPv6 address takes 1 second to establish.  The delay comes
> > when the passive side cm_req_handler() calls
> > cm_init_av_for_response(), which calls ib_init_ah_from_wc().
> >
> > That calls into rdma_addr_find_l2_eth_by_grh() with the link-local
> > address, but with 0 passed in through the if_index parameter.  So
> >
> > The fix seems as simple as adding
> >
> >     if_index = idev->ifindex;
> >
> > to ib_init_ah_from_wc() before the call to
> > rdma_addr_find_l2_eth_by_grh() - this can't fail, since we return an
> > error if finding idev fails.
> 
> Hum, it is certainly a big conceptual mistake to not anchor a route lookup in a
> netdev at this point.
> 
> Both roceev1 and v2 need to use the netdev that was used to create the UD QP
> to anchor these lookups, that netdev should be a child of some kind from the
> get_netdev (eg a vlan, or macvlan, etc)
> 
> This is necessary to make namespaces and other subtle things work properly.
> 
> So, right idea, wrong netdev.
> 
> > However, I'm not sure if it's quite that simple - how does this
> > interact with RoCEv2, or other addressing like IPv4 or non-link-local
> > IPv6? I'm not sure I understand what the cases where resolved_dev will
> > be != idev, or if passing in idev->ifindex will break this.
> 
> I think this brokenness fell out of the above confusion, it should be removed,
> rdma_addr_find_dmac_by_grh should not be callable without a proper netdev.
> 
> The != case can happen if the gid table resolve differently than the route table,
> but it is conceptually wrong to lookup in the gid table before doing a route
> lookup, and I think these apis are poorly designed to enforce that requirement.
> 
> Jason

I have fixed this issue where rdma_addr_find_l2_eth_by_grh() accepts netdev of the GID.
Derivation of GID is based on vlan, and DGID of the incoming GRH, network_type fields.
Finding out the right GID depends above fields from the cached gid table.

Before I post the patch,
We found that bnx_re driver doesn't set IB_WC_WITH_VLAN in the QP1 wc.
Fail to set vlan id and IB_WC_WITH_VLAN flag when incoming QP1 packet in vlan, will fail to search the right GID.

Please also see recent important fix in qedr driver [1] who now correctly sets IB_WC_WITH_VLAN, in case if you need reference.
Most drivers are setting it correctly now.

Will you please fix this issue in bnx_re driver as well - to set IB_WC_WITH_VLAN, vlan_id, SL when vlan is present in QP1 UD receive packets?

Please let me know in case if you need more information.

[1] https://www.spinics.net/lists/linux-rdma/msg55115.html

Parav
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Need to set if_index in ib_init_ah_from_wc() ?
       [not found]         ` <VI1PR0502MB30086B735FD4B5948B521673D14D0-o1MPJYiShExKsLr+rGaxW8DSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
@ 2017-10-19  4:32           ` Devesh Sharma
       [not found]             ` <CANjDDBhJK=s28zrdCASUVSSeem=NVRWwsHT=jWS2N4jGh2Uvfw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 31+ messages in thread
From: Devesh Sharma @ 2017-10-19  4:32 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Selvin Xavier, Somnath Kotur, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Matan Barak, Jason Gunthorpe, Roland Dreier

Hello Parav,

Thanks for you note here. I can supply you a fix, but before that
please find a query inline below:

On Thu, Oct 19, 2017 at 12:30 AM, Parav Pandit <parav-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
> Hi Devesh, Selvin, Somnath,
>
>> -----Original Message-----
>> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-
>> owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Jason Gunthorpe
>> Sent: Monday, January 30, 2017 1:54 PM
>> To: Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>
>> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>> Subject: Re: Need to set if_index in ib_init_ah_from_wc() ?
>>
>> On Sun, Jan 29, 2017 at 05:21:43PM -0800, Roland Dreier wrote:
>> > I recently debugged an issue where an RDMA-CM / RoCE connection to a
>> > link-local IPv6 address takes 1 second to establish.  The delay comes
>> > when the passive side cm_req_handler() calls
>> > cm_init_av_for_response(), which calls ib_init_ah_from_wc().
>> >
>> > That calls into rdma_addr_find_l2_eth_by_grh() with the link-local
>> > address, but with 0 passed in through the if_index parameter.  So
>> >
>> > The fix seems as simple as adding
>> >
>> >     if_index = idev->ifindex;
>> >
>> > to ib_init_ah_from_wc() before the call to
>> > rdma_addr_find_l2_eth_by_grh() - this can't fail, since we return an
>> > error if finding idev fails.
>>
>> Hum, it is certainly a big conceptual mistake to not anchor a route lookup in a
>> netdev at this point.
>>
>> Both roceev1 and v2 need to use the netdev that was used to create the UD QP
>> to anchor these lookups, that netdev should be a child of some kind from the
>> get_netdev (eg a vlan, or macvlan, etc)
>>
>> This is necessary to make namespaces and other subtle things work properly.
>>
>> So, right idea, wrong netdev.
>>
>> > However, I'm not sure if it's quite that simple - how does this
>> > interact with RoCEv2, or other addressing like IPv4 or non-link-local
>> > IPv6? I'm not sure I understand what the cases where resolved_dev will
>> > be != idev, or if passing in idev->ifindex will break this.
>>
>> I think this brokenness fell out of the above confusion, it should be removed,
>> rdma_addr_find_dmac_by_grh should not be callable without a proper netdev.
>>
>> The != case can happen if the gid table resolve differently than the route table,
>> but it is conceptually wrong to lookup in the gid table before doing a route
>> lookup, and I think these apis are poorly designed to enforce that requirement.
>>
>> Jason
>
> I have fixed this issue where rdma_addr_find_l2_eth_by_grh() accepts netdev of the GID.
> Derivation of GID is based on vlan, and DGID of the incoming GRH, network_type fields.
> Finding out the right GID depends above fields from the cached gid table.
>
> Before I post the patch,
> We found that bnx_re driver doesn't set IB_WC_WITH_VLAN in the QP1 wc.
> Fail to set vlan id and IB_WC_WITH_VLAN flag when incoming QP1 packet in vlan, will fail to search the right GID.

I am under impression that IB_WC_WITH_VLAN flags was supposed to be
set only if any vendor's WC reports VLAN-ID, otherwise it had to be
derived from somewhere else. Are we moving away from this notion?


>
> Please also see recent important fix in qedr driver [1] who now correctly sets IB_WC_WITH_VLAN, in case if you need reference.
> Most drivers are setting it correctly now.
>
> Will you please fix this issue in bnx_re driver as well - to set IB_WC_WITH_VLAN, vlan_id, SL when vlan is present in QP1 UD receive packets?
>
> Please let me know in case if you need more information.
>
> [1] https://www.spinics.net/lists/linux-rdma/msg55115.html
>
> Parav
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: Need to set if_index in ib_init_ah_from_wc() ?
       [not found]             ` <CANjDDBhJK=s28zrdCASUVSSeem=NVRWwsHT=jWS2N4jGh2Uvfw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-10-19  4:54               ` Parav Pandit
       [not found]                 ` <VI1PR0502MB300803053059AA7CC0E6DA7DD1420-o1MPJYiShExKsLr+rGaxW8DSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
  2017-10-19  6:20               ` Jason Gunthorpe
  1 sibling, 1 reply; 31+ messages in thread
From: Parav Pandit @ 2017-10-19  4:54 UTC (permalink / raw)
  To: Devesh Sharma
  Cc: Selvin Xavier, Somnath Kotur, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Matan Barak, Jason Gunthorpe, Roland Dreier

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 4609 bytes --]

Hi Devesh,

> -----Original Message-----
> From: Devesh Sharma [mailto:devesh.sharma@broadcom.com]
> Sent: Wednesday, October 18, 2017 11:32 PM
> To: Parav Pandit <parav@mellanox.com>
> Cc: Selvin Xavier <selvin.xavier@broadcom.com>; Somnath Kotur
> <somnath.kotur@broadcom.com>; linux-rdma@vger.kernel.org; Matan Barak
> <matanb@mellanox.com>; Jason Gunthorpe
> <jgunthorpe@obsidianresearch.com>; Roland Dreier
> <roland@purestorage.com>
> Subject: Re: Need to set if_index in ib_init_ah_from_wc() ?
> 
> Hello Parav,
> 
> Thanks for you note here. I can supply you a fix, but before that please find a
> query inline below:

That will be good. More inline below.

> 
> On Thu, Oct 19, 2017 at 12:30 AM, Parav Pandit <parav@mellanox.com> wrote:
> > Hi Devesh, Selvin, Somnath,
> >
> >> -----Original Message-----
> >> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
> >> owner@vger.kernel.org] On Behalf Of Jason Gunthorpe
> >> Sent: Monday, January 30, 2017 1:54 PM
> >> To: Roland Dreier <roland@purestorage.com>
> >> Cc: linux-rdma@vger.kernel.org; Matan Barak <matanb@mellanox.com>
> >> Subject: Re: Need to set if_index in ib_init_ah_from_wc() ?
> >>
> >> On Sun, Jan 29, 2017 at 05:21:43PM -0800, Roland Dreier wrote:
> >> > I recently debugged an issue where an RDMA-CM / RoCE connection to
> >> > a link-local IPv6 address takes 1 second to establish.  The delay
> >> > comes when the passive side cm_req_handler() calls
> >> > cm_init_av_for_response(), which calls ib_init_ah_from_wc().
> >> >
> >> > That calls into rdma_addr_find_l2_eth_by_grh() with the link-local
> >> > address, but with 0 passed in through the if_index parameter.  So
> >> >
> >> > The fix seems as simple as adding
> >> >
> >> >     if_index = idev->ifindex;
> >> >
> >> > to ib_init_ah_from_wc() before the call to
> >> > rdma_addr_find_l2_eth_by_grh() - this can't fail, since we return
> >> > an error if finding idev fails.
> >>
> >> Hum, it is certainly a big conceptual mistake to not anchor a route
> >> lookup in a netdev at this point.
> >>
> >> Both roceev1 and v2 need to use the netdev that was used to create
> >> the UD QP to anchor these lookups, that netdev should be a child of
> >> some kind from the get_netdev (eg a vlan, or macvlan, etc)
> >>
> >> This is necessary to make namespaces and other subtle things work properly.
> >>
> >> So, right idea, wrong netdev.
> >>
> >> > However, I'm not sure if it's quite that simple - how does this
> >> > interact with RoCEv2, or other addressing like IPv4 or
> >> > non-link-local IPv6? I'm not sure I understand what the cases where
> >> > resolved_dev will be != idev, or if passing in idev->ifindex will break this.
> >>
> >> I think this brokenness fell out of the above confusion, it should be
> >> removed, rdma_addr_find_dmac_by_grh should not be callable without a
> proper netdev.
> >>
> >> The != case can happen if the gid table resolve differently than the
> >> route table, but it is conceptually wrong to lookup in the gid table
> >> before doing a route lookup, and I think these apis are poorly designed to
> enforce that requirement.
> >>
> >> Jason
> >
> > I have fixed this issue where rdma_addr_find_l2_eth_by_grh() accepts netdev
> of the GID.
> > Derivation of GID is based on vlan, and DGID of the incoming GRH,
> network_type fields.
> > Finding out the right GID depends above fields from the cached gid table.
> >
> > Before I post the patch,
> > We found that bnx_re driver doesn't set IB_WC_WITH_VLAN in the QP1 wc.
> > Fail to set vlan id and IB_WC_WITH_VLAN flag when incoming QP1 packet in
> vlan, will fail to search the right GID.
> 
> I am under impression that IB_WC_WITH_VLAN flags was supposed to be set
> only if any vendor's WC reports VLAN-ID, otherwise it had to be derived from
> somewhere else. Are we moving away from this notion?

Yes. That is correct. Its optional. However vlan defines L2 segments.
Two L2 segments can have same IP address and therefore GIDs. So Ignoring vlan is not good because ib_find_gid_index and friend functions won't be able to resolve them, which performs vlan + SGID exact lookup.
As discussed in past thread, we like to find the matching SGID first based on GRH + available L2 field such as vlan, and refer to the ndev of the SGID.
So my upcoming patch brings such fundamental fix that resolves IPv6 link local issue and few other similar cases.
N‹§²æìr¸›yúèšØb²X¬¶Ç§vØ^–)Þº{.nÇ+‰·¥Š{±­ÙšŠ{ayº\x1dʇڙë,j\a­¢f£¢·hš‹»öì\x17/oSc¾™Ú³9˜uÀ¦æå‰È&jw¨®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿïêäz¹Þ–Šàþf£¢·hšˆ§~ˆmš

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Need to set if_index in ib_init_ah_from_wc() ?
       [not found]             ` <CANjDDBhJK=s28zrdCASUVSSeem=NVRWwsHT=jWS2N4jGh2Uvfw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2017-10-19  4:54               ` Parav Pandit
@ 2017-10-19  6:20               ` Jason Gunthorpe
  1 sibling, 0 replies; 31+ messages in thread
From: Jason Gunthorpe @ 2017-10-19  6:20 UTC (permalink / raw)
  To: Devesh Sharma
  Cc: Parav Pandit, Selvin Xavier, Somnath Kotur,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Matan Barak, Roland Dreier

On Thu, Oct 19, 2017 at 10:02:17AM +0530, Devesh Sharma wrote:

> > Before I post the patch,
> > We found that bnx_re driver doesn't set IB_WC_WITH_VLAN in the QP1 wc.
> > Fail to set vlan id and IB_WC_WITH_VLAN flag when incoming QP1 packet in vlan, will fail to search the right GID.
> 
> I am under impression that IB_WC_WITH_VLAN flags was supposed to be
> set only if any vendor's WC reports VLAN-ID, otherwise it had to be
> derived from somewhere else. Are we moving away from this notion?

If your hardware does not report the VLAN, then it cannot set the
flag.

The issue here, is that we need a 100% reliable way to go from a WC to
the gid table index that was used to recieve the packet in the
hardware. The spec is defective here, IMHO, because it does not define
how this case should work.

Ideally hardware will report the index directly in the WC, but AFAIK,
no hardware does this.

Parav is looking at mlx hardware which reports the headers, and I
guess, he is planning to search the GID table in software using the
headers from the WC, GRH, etc.

Not sure what broadcom hardware can do here.

But, the basic principle holds, we *MUST* recover the rx GID index the
hardware used. Tell Parav how to do that for your chip.

If brodcom hardware cannot report the GID index directly, and does not
report all the headers (eg vlan) then it *MUST* reject GID table
configurations that create an ambiguity, eg it cannot allow the same
IP on different VLANs. This is horrible and you should try to avoid
that by having a sensible WC..

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: Need to set if_index in ib_init_ah_from_wc() ?
       [not found]                 ` <VI1PR0502MB300803053059AA7CC0E6DA7DD1420-o1MPJYiShExKsLr+rGaxW8DSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
@ 2017-10-23 16:28                   ` Parav Pandit
       [not found]                     ` <VI1PR0502MB30087224F41436F787C4997FD1460-o1MPJYiShExKsLr+rGaxW8DSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
  0 siblings, 1 reply; 31+ messages in thread
From: Parav Pandit @ 2017-10-23 16:28 UTC (permalink / raw)
  To: Parav Pandit, Devesh Sharma
  Cc: Selvin Xavier, Somnath Kotur, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Matan Barak, Jason Gunthorpe, Roland Dreier

Hi Devesh,

> -----Original Message-----
> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
> owner@vger.kernel.org] On Behalf Of Parav Pandit
> Sent: Wednesday, October 18, 2017 11:54 PM
> To: Devesh Sharma <devesh.sharma@broadcom.com>
> Cc: Selvin Xavier <selvin.xavier@broadcom.com>; Somnath Kotur
> <somnath.kotur@broadcom.com>; linux-rdma@vger.kernel.org; Matan Barak
> <matanb@mellanox.com>; Jason Gunthorpe
> <jgunthorpe@obsidianresearch.com>; Roland Dreier
> <roland@purestorage.com>
> Subject: RE: Need to set if_index in ib_init_ah_from_wc() ?
> 
> Hi Devesh,
> 
> > -----Original Message-----
> > From: Devesh Sharma [mailto:devesh.sharma@broadcom.com]
> > Sent: Wednesday, October 18, 2017 11:32 PM
> > To: Parav Pandit <parav@mellanox.com>
> > Cc: Selvin Xavier <selvin.xavier@broadcom.com>; Somnath Kotur
> > <somnath.kotur@broadcom.com>; linux-rdma@vger.kernel.org; Matan Barak
> > <matanb@mellanox.com>; Jason Gunthorpe
> > <jgunthorpe@obsidianresearch.com>; Roland Dreier
> > <roland@purestorage.com>
> > Subject: Re: Need to set if_index in ib_init_ah_from_wc() ?
> >
> > Hello Parav,
> >
> > Thanks for you note here. I can supply you a fix, but before that
> > please find a query inline below:
> 
So I am going forward to fix the ib core as discussed.
Please fix bnx_re driver whenever you find it appropriate in your schedule.

> That will be good. More inline below.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Need to set if_index in ib_init_ah_from_wc() ?
       [not found]                     ` <VI1PR0502MB30087224F41436F787C4997FD1460-o1MPJYiShExKsLr+rGaxW8DSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
@ 2017-10-23 17:39                       ` Leon Romanovsky
       [not found]                         ` <20171023173946.GP2106-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
  0 siblings, 1 reply; 31+ messages in thread
From: Leon Romanovsky @ 2017-10-23 17:39 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Devesh Sharma, Selvin Xavier, Somnath Kotur,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Matan Barak, Jason Gunthorpe,
	Roland Dreier

[-- Attachment #1: Type: text/plain, Size: 2173 bytes --]

On Mon, Oct 23, 2017 at 04:28:43PM +0000, Parav Pandit wrote:
> Hi Devesh,
>
> > -----Original Message-----
> > From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-
> > owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Parav Pandit
> > Sent: Wednesday, October 18, 2017 11:54 PM
> > To: Devesh Sharma <devesh.sharma-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
> > Cc: Selvin Xavier <selvin.xavier-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>; Somnath Kotur
> > <somnath.kotur-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Matan Barak
> > <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>; Jason Gunthorpe
> > <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>; Roland Dreier
> > <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>
> > Subject: RE: Need to set if_index in ib_init_ah_from_wc() ?
> >
> > Hi Devesh,
> >
> > > -----Original Message-----
> > > From: Devesh Sharma [mailto:devesh.sharma-dY08KVG/lbpWk0Htik3J/w@public.gmane.org]
> > > Sent: Wednesday, October 18, 2017 11:32 PM
> > > To: Parav Pandit <parav-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> > > Cc: Selvin Xavier <selvin.xavier-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>; Somnath Kotur
> > > <somnath.kotur-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Matan Barak
> > > <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>; Jason Gunthorpe
> > > <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>; Roland Dreier
> > > <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>
> > > Subject: Re: Need to set if_index in ib_init_ah_from_wc() ?
> > >
> > > Hello Parav,
> > >
> > > Thanks for you note here. I can supply you a fix, but before that
> > > please find a query inline below:
> >
> So I am going forward to fix the ib core as discussed.
> Please fix bnx_re driver whenever you find it appropriate in your schedule.

I would like to have proper deadline here, from one side, I don't want
to wait for Broadcom's fix forever, but from another, I don't want to
forward ANY patch which breaks other vendors.

Thanks

>
> > That will be good. More inline below.
>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Need to set if_index in ib_init_ah_from_wc() ?
       [not found]                         ` <20171023173946.GP2106-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
@ 2017-10-29 18:26                           ` Jason Gunthorpe
  2017-10-30  3:59                           ` Devesh Sharma
  1 sibling, 0 replies; 31+ messages in thread
From: Jason Gunthorpe @ 2017-10-29 18:26 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Parav Pandit, Devesh Sharma, Selvin Xavier, Somnath Kotur,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Matan Barak, Roland Dreier,
	Ram Amrani

On Mon, Oct 23, 2017 at 08:39:46PM +0300, Leon Romanovsky wrote:

> I would like to have proper deadline here, from one side, I don't want
> to wait for Broadcom's fix forever, but from another, I don't want to
> forward ANY patch which breaks other vendors.

I've had the guess that each driver will need to provide a WC to gid
table entry conversion function..

Ideal hardware driver will report the gid table index in their
internal WC. Other hardware like mlx wil have to parse the WC and the
GRH headers and so forth to figure out the gid table index.

The gid table index is security sensitive, so it *MUST* be determined
exactly and correctly.

But we need to understand what the other rocee drivers have done in
this area..

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Need to set if_index in ib_init_ah_from_wc() ?
       [not found]                         ` <20171023173946.GP2106-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
  2017-10-29 18:26                           ` Jason Gunthorpe
@ 2017-10-30  3:59                           ` Devesh Sharma
       [not found]                             ` <CANjDDBiuFYo_3Y=0oGMza1N5SGcNcgKE8y1LscYK6bc44-YxGA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  1 sibling, 1 reply; 31+ messages in thread
From: Devesh Sharma @ 2017-10-30  3:59 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Parav Pandit, Selvin Xavier, Somnath Kotur,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Matan Barak, Jason Gunthorpe,
	Roland Dreier

Hi Leon/Parav,

I have a working patch under QA to support this change. I should be
able to push in a day or two.

-Regards
Devesh

On Mon, Oct 23, 2017 at 11:09 PM, Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> On Mon, Oct 23, 2017 at 04:28:43PM +0000, Parav Pandit wrote:
>> Hi Devesh,
>>
>> > -----Original Message-----
>> > From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-
>> > owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Parav Pandit
>> > Sent: Wednesday, October 18, 2017 11:54 PM
>> > To: Devesh Sharma <devesh.sharma-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
>> > Cc: Selvin Xavier <selvin.xavier-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>; Somnath Kotur
>> > <somnath.kotur-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Matan Barak
>> > <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>; Jason Gunthorpe
>> > <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>; Roland Dreier
>> > <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>
>> > Subject: RE: Need to set if_index in ib_init_ah_from_wc() ?
>> >
>> > Hi Devesh,
>> >
>> > > -----Original Message-----
>> > > From: Devesh Sharma [mailto:devesh.sharma-dY08KVG/lbpWk0Htik3J/w@public.gmane.org]
>> > > Sent: Wednesday, October 18, 2017 11:32 PM
>> > > To: Parav Pandit <parav-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>> > > Cc: Selvin Xavier <selvin.xavier-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>; Somnath Kotur
>> > > <somnath.kotur-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Matan Barak
>> > > <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>; Jason Gunthorpe
>> > > <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>; Roland Dreier
>> > > <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>
>> > > Subject: Re: Need to set if_index in ib_init_ah_from_wc() ?
>> > >
>> > > Hello Parav,
>> > >
>> > > Thanks for you note here. I can supply you a fix, but before that
>> > > please find a query inline below:
>> >
>> So I am going forward to fix the ib core as discussed.
>> Please fix bnx_re driver whenever you find it appropriate in your schedule.
>
> I would like to have proper deadline here, from one side, I don't want
> to wait for Broadcom's fix forever, but from another, I don't want to
> forward ANY patch which breaks other vendors.
>
> Thanks
>
>>
>> > That will be good. More inline below.
>>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Need to set if_index in ib_init_ah_from_wc() ?
       [not found]                             ` <CANjDDBiuFYo_3Y=0oGMza1N5SGcNcgKE8y1LscYK6bc44-YxGA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-10-30  5:32                               ` Leon Romanovsky
  0 siblings, 0 replies; 31+ messages in thread
From: Leon Romanovsky @ 2017-10-30  5:32 UTC (permalink / raw)
  To: Devesh Sharma
  Cc: Parav Pandit, Selvin Xavier, Somnath Kotur,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Matan Barak, Jason Gunthorpe,
	Roland Dreier

[-- Attachment #1: Type: text/plain, Size: 256 bytes --]

On Mon, Oct 30, 2017 at 09:29:01AM +0530, Devesh Sharma wrote:
> Hi Leon/Parav,
>
> I have a working patch under QA to support this change. I should be
> able to push in a day or two.

Thanks

By the way, can you please do not reply in top-posting format?

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2017-10-30  5:32 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-30  1:21 Need to set if_index in ib_init_ah_from_wc() ? Roland Dreier
     [not found] ` <CAL1RGDW6iHo2UYKbcJmcG=wCq63jvZB7nvOD=BJZwASSzc7Zhw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-01-30 19:54   ` Jason Gunthorpe
     [not found]     ` <20170130195412.GE24466-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2017-01-30 20:15       ` Parav Pandit
     [not found]         ` <VI1PR0502MB30081EBFBEE5994B77D58BB9D14B0-o1MPJYiShExKsLr+rGaxW8DSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
2017-01-30 21:14           ` Jason Gunthorpe
     [not found]             ` <20170130211420.GB27111-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2017-01-30 21:52               ` Parav Pandit
     [not found]                 ` <VI1PR0502MB30081EC9B35269636B4C2FBCD14B0-o1MPJYiShExKsLr+rGaxW8DSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
2017-01-30 22:14                   ` Jason Gunthorpe
     [not found]                     ` <20170130221437.GA28117-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2017-01-31  0:21                       ` Parav Pandit
     [not found]                         ` <VI1PR0502MB3008B16A0F103D729209EA16D14A0-o1MPJYiShExKsLr+rGaxW8DSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
2017-01-31  0:53                           ` Jason Gunthorpe
     [not found]                             ` <20170131005340.GA30809-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2017-01-31  1:49                               ` Jason Gunthorpe
     [not found]                                 ` <20170131014941.GA15387-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2017-01-31  1:57                                   ` Parav Pandit
2017-02-08 16:02                               ` Matan Barak
     [not found]                                 ` <CAAKD3BDbkFHbCi+gHyCXCGV2xi5E9FA+KgwKz+6dBJEtsV0ZkQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-02-08 17:40                                   ` Jason Gunthorpe
     [not found]                                     ` <20170208174040.GC30720-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2017-02-09  0:00                                       ` Parav Pandit
     [not found]                                         ` <VI1PR0502MB3008417F3E388B617291147CD1450-o1MPJYiShExKsLr+rGaxW8DSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
2017-02-09  0:01                                           ` Jason Gunthorpe
     [not found]                                             ` <20170209000157.GA14556-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2017-02-09  0:15                                               ` Parav Pandit
     [not found]                                                 ` <VI1PR0502MB3008153CDFD7F9C7CCAB022CD1450-o1MPJYiShExKsLr+rGaxW8DSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
2017-02-09  9:24                                                   ` Matan Barak
     [not found]                                                     ` <CAAKD3BBunKCb6Z-FObOD0covTt3gCXDn3oic4LNbU3J5JHiQJw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-02-09 18:47                                                       ` Jason Gunthorpe
     [not found]                                                         ` <20170209184720.GA809-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2017-02-09 21:55                                                           ` Parav Pandit
     [not found]                                                             ` <VI1PR0502MB30089F94C9F1504E142B86E0D1450-o1MPJYiShExKsLr+rGaxW8DSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
2017-02-09 22:18                                                               ` Jason Gunthorpe
2017-02-09  8:40                                       ` Matan Barak
     [not found]                                         ` <CAAKD3BD88Kp2h3+vP4iMy5T4rR6=Y39ZnNXRLM1P-6G6iB=BNQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-02-09 18:48                                           ` Jason Gunthorpe
2017-01-31  0:24                   ` Roland Dreier
2017-10-18 19:00       ` Parav Pandit
     [not found]         ` <VI1PR0502MB30086B735FD4B5948B521673D14D0-o1MPJYiShExKsLr+rGaxW8DSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
2017-10-19  4:32           ` Devesh Sharma
     [not found]             ` <CANjDDBhJK=s28zrdCASUVSSeem=NVRWwsHT=jWS2N4jGh2Uvfw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-10-19  4:54               ` Parav Pandit
     [not found]                 ` <VI1PR0502MB300803053059AA7CC0E6DA7DD1420-o1MPJYiShExKsLr+rGaxW8DSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
2017-10-23 16:28                   ` Parav Pandit
     [not found]                     ` <VI1PR0502MB30087224F41436F787C4997FD1460-o1MPJYiShExKsLr+rGaxW8DSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
2017-10-23 17:39                       ` Leon Romanovsky
     [not found]                         ` <20171023173946.GP2106-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
2017-10-29 18:26                           ` Jason Gunthorpe
2017-10-30  3:59                           ` Devesh Sharma
     [not found]                             ` <CANjDDBiuFYo_3Y=0oGMza1N5SGcNcgKE8y1LscYK6bc44-YxGA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-10-30  5:32                               ` Leon Romanovsky
2017-10-19  6:20               ` Jason Gunthorpe

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.