* [PATCH RFC 0/3] siw on tunnel devices
@ 2023-05-05 15:41 Chuck Lever
  2023-05-05 15:42 ` [PATCH RFC 1/3] net/tun: Ensure tun devices have a MAC address Chuck Lever
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Chuck Lever @ 2023-05-05 15:41 UTC (permalink / raw)
  To: netdev, linux-rdma; +Cc: BMT, tom

Chalk this one up to yet another crazy idea.

At NFS testing events, we'd like to test NFS/RDMA over the event's
private network. We can do that with iWARP using siw from guests.

If the guest itself is on the VPN, that means siw's slave device
is a tun device. Such devices have no MAC address. That breaks the
RDMA core's ability to find the correct egress device for siw when
given a source IP address.

We've worked around this in the past with various software hacks,
but we'd rather see full support for this capability in stock
kernels.

A direct and perhaps naïve way to do that is to give loopback and
tun devices their own artificial MAC addresses for this purpose.

---

Chuck Lever (3):
      net/tun: Ensure tun devices have a MAC address
      net/lo: Ensure lo devices have a MAC address
      RDMA/siw: Require non-zero 6-byte MACs for soft iWARP


 drivers/infiniband/sw/siw/siw_main.c | 22 +++++++---------------
 drivers/net/loopback.c               |  2 ++
 drivers/net/tun.c                    |  6 +++---
 3 files changed, 12 insertions(+), 18 deletions(-)

--
Chuck Lever



* [PATCH RFC 1/3] net/tun: Ensure tun devices have a MAC address
  2023-05-05 15:41 [PATCH RFC 0/3] siw on tunnel devices Chuck Lever
@ 2023-05-05 15:42 ` Chuck Lever
  2023-05-05 16:59   ` Stephen Hemminger
  2023-05-05 15:42 ` [PATCH RFC 2/3] net/lo: Ensure lo " Chuck Lever
  2023-05-05 15:43 ` [PATCH RFC 3/3] RDMA/siw: Require non-zero 6-byte MACs for soft iWARP Chuck Lever
  2 siblings, 1 reply; 17+ messages in thread
From: Chuck Lever @ 2023-05-05 15:42 UTC (permalink / raw)
  To: netdev, linux-rdma; +Cc: BMT, tom

From: Chuck Lever <chuck.lever@oracle.com>

A non-zero MAC address enables a network device to be assigned as
the underlying device for a virtual RDMA device. Without a non-
zero MAC address, cma_acquire_dev_by_src_ip() is unable to find the
underlying egress device that corresponds to a source IP address,
and rdma_resolve_addr() fails.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 drivers/net/tun.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index d4d0a41a905a..da85abfcd254 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1384,7 +1384,7 @@ static void tun_net_initialize(struct net_device *dev)
 
 		/* Point-to-Point TUN Device */
 		dev->hard_header_len = 0;
-		dev->addr_len = 0;
+		dev->addr_len = ETH_ALEN;
 		dev->mtu = 1500;
 
 		/* Zero header length */
@@ -1399,8 +1399,6 @@ static void tun_net_initialize(struct net_device *dev)
 		dev->priv_flags &= ~IFF_TX_SKB_SHARING;
 		dev->priv_flags |= IFF_LIVE_ADDR_CHANGE;
 
-		eth_hw_addr_random(dev);
-
 		/* Currently tun does not support XDP, only tap does. */
 		dev->xdp_features = NETDEV_XDP_ACT_BASIC |
 				    NETDEV_XDP_ACT_REDIRECT |
@@ -1409,6 +1407,8 @@ static void tun_net_initialize(struct net_device *dev)
 		break;
 	}
 
+	eth_hw_addr_random(dev);
+
 	dev->min_mtu = MIN_MTU;
 	dev->max_mtu = MAX_MTU - dev->hard_header_len;
 }
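
For readers unfamiliar with eth_hw_addr_random(): the address it assigns is
random, unicast, and locally administered. The following is a minimal
userspace sketch of that bit handling only. The name sketch_random_mac() is
hypothetical, and rand() stands in for the kernel's random source purely for
illustration.

```c
#include <stdint.h>
#include <stdlib.h>

#define ETH_ALEN 6

/* Fill addr with a random, unicast, locally administered MAC address. */
static void sketch_random_mac(uint8_t addr[ETH_ALEN])
{
	for (int i = 0; i < ETH_ALEN; i++)
		addr[i] = (uint8_t)(rand() & 0xff);	/* illustrative RNG only */

	addr[0] &= 0xfe;	/* clear the multicast (group) bit */
	addr[0] |= 0x02;	/* set the locally administered bit */
}
```

Because the locally administered bit is always set, an address produced this
way can never be all zeroes.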




* [PATCH RFC 2/3] net/lo: Ensure lo devices have a MAC address
  2023-05-05 15:41 [PATCH RFC 0/3] siw on tunnel devices Chuck Lever
  2023-05-05 15:42 ` [PATCH RFC 1/3] net/tun: Ensure tun devices have a MAC address Chuck Lever
@ 2023-05-05 15:42 ` Chuck Lever
  2023-05-05 16:57   ` Stephen Hemminger
  2023-05-05 15:43 ` [PATCH RFC 3/3] RDMA/siw: Require non-zero 6-byte MACs for soft iWARP Chuck Lever
  2 siblings, 1 reply; 17+ messages in thread
From: Chuck Lever @ 2023-05-05 15:42 UTC (permalink / raw)
  To: netdev, linux-rdma; +Cc: BMT, tom

From: Chuck Lever <chuck.lever@oracle.com>

A non-zero MAC address enables a network device to be assigned as
the underlying device for a virtual RDMA device. Without a non-
zero MAC address, cma_acquire_dev_by_src_ip() is unable to find the
underlying egress device that corresponds to a source IP address,
and rdma_resolve_addr() fails.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 drivers/net/loopback.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
index f6d53e63ef4e..1ce4f19d8065 100644
--- a/drivers/net/loopback.c
+++ b/drivers/net/loopback.c
@@ -192,6 +192,8 @@ static void gen_lo_setup(struct net_device *dev,
 	dev->needs_free_netdev	= true;
 	dev->priv_destructor	= dev_destructor;
 
+	eth_hw_addr_random(dev);
+
 	netif_set_tso_max_size(dev, GSO_MAX_SIZE);
 }
 




* [PATCH RFC 3/3] RDMA/siw: Require non-zero 6-byte MACs for soft iWARP
  2023-05-05 15:41 [PATCH RFC 0/3] siw on tunnel devices Chuck Lever
  2023-05-05 15:42 ` [PATCH RFC 1/3] net/tun: Ensure tun devices have a MAC address Chuck Lever
  2023-05-05 15:42 ` [PATCH RFC 2/3] net/lo: Ensure lo " Chuck Lever
@ 2023-05-05 15:43 ` Chuck Lever
  2023-05-05 19:58   ` Jason Gunthorpe
  2 siblings, 1 reply; 17+ messages in thread
From: Chuck Lever @ 2023-05-05 15:43 UTC (permalink / raw)
  To: netdev, linux-rdma; +Cc: BMT, tom

From: Chuck Lever <chuck.lever@oracle.com>

In the past, LOOPBACK and NONE (tunnel) devices had all-zero MAC
addresses. siw_device_create() would fall back to copying the
device's name in those cases, because an all-zero MAC address breaks
the RDMA core IP-to-device lookup mechanism.

However, in some cases, the net_device::name field is also empty.
So we're back at square one.

Rather than checking the device type, look at the
net_device::addr_len field. If it's got the right number of octets
and it is not all zeroes, use that.

Then, to enable siw support for that device/address type, change
the device driver to ensure such devices have a valid 6-octet MAC
address. For virtual devices, using eth_hw_addr_random() is
sufficient.

Fixes: a2d36b02c15d ("RDMA/siw: Enable siw on tunnel devices")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 drivers/infiniband/sw/siw/siw_main.c |   22 +++++++---------------
 1 file changed, 7 insertions(+), 15 deletions(-)

diff --git a/drivers/infiniband/sw/siw/siw_main.c b/drivers/infiniband/sw/siw/siw_main.c
index 65b5cda5457b..2c31bf397993 100644
--- a/drivers/infiniband/sw/siw/siw_main.c
+++ b/drivers/infiniband/sw/siw/siw_main.c
@@ -304,10 +304,15 @@ static const struct ib_device_ops siw_device_ops = {
 
 static struct siw_device *siw_device_create(struct net_device *netdev)
 {
+	static const u8 zeromac[ETH_ALEN] = { 0 };
 	struct siw_device *sdev = NULL;
 	struct ib_device *base_dev;
 	int rv;
 
+	if ((netdev->addr_len != ETH_ALEN) ||
+	    (memcmp(netdev->dev_addr, zeromac, ETH_ALEN) == 0))
+		return NULL;
+
 	sdev = ib_alloc_device(siw_device, base_dev);
 	if (!sdev)
 		return NULL;
@@ -316,21 +321,8 @@ static struct siw_device *siw_device_create(struct net_device *netdev)
 
 	sdev->netdev = netdev;
 
-	if (netdev->type != ARPHRD_LOOPBACK && netdev->type != ARPHRD_NONE) {
-		addrconf_addr_eui48((unsigned char *)&base_dev->node_guid,
-				    netdev->dev_addr);
-	} else {
-		/*
-		 * This device does not have a HW address,
-		 * but connection mangagement lib expects gid != 0
-		 */
-		size_t len = min_t(size_t, strlen(base_dev->name), 6);
-		char addr[6] = { };
-
-		memcpy(addr, base_dev->name, len);
-		addrconf_addr_eui48((unsigned char *)&base_dev->node_guid,
-				    addr);
-	}
+	addrconf_addr_eui48((unsigned char *)&base_dev->node_guid,
+			    netdev->dev_addr);
 
 	base_dev->uverbs_cmd_mask |= BIT_ULL(IB_USER_VERBS_CMD_POST_SEND);
 

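The check added at the top of siw_device_create() can be tried outside the
kernel. The sketch below is a userspace approximation under stated
assumptions: sketch_usable_mac() is a hypothetical helper, and the address
and its length are passed as plain arguments instead of being read from a
struct net_device.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/*
 * Return true only when the address is exactly six octets long
 * and is not all zeroes.
 */
static bool sketch_usable_mac(const uint8_t *dev_addr, size_t addr_len)
{
	static const uint8_t zeromac[ETH_ALEN] = { 0 };

	if (addr_len != ETH_ALEN)
		return false;	/* the length is checked before the bytes are read */
	return memcmp(dev_addr, zeromac, ETH_ALEN) != 0;
}
```

An unpatched tun device (addr_len of 0) and a device with a six-octet
all-zero address are both rejected by this test.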



* Re: [PATCH RFC 2/3] net/lo: Ensure lo devices have a MAC address
  2023-05-05 15:42 ` [PATCH RFC 2/3] net/lo: Ensure lo " Chuck Lever
@ 2023-05-05 16:57   ` Stephen Hemminger
  0 siblings, 0 replies; 17+ messages in thread
From: Stephen Hemminger @ 2023-05-05 16:57 UTC (permalink / raw)
  To: Chuck Lever; +Cc: netdev, linux-rdma, BMT, tom

On Fri, 05 May 2023 11:42:44 -0400
Chuck Lever <cel@kernel.org> wrote:

> From: Chuck Lever <chuck.lever@oracle.com>
> 
> A non-zero MAC address enables a network device to be assigned as
> the underlying device for a virtual RDMA device. Without a non-
> zero MAC address, cma_acquire_dev_by_src_ip() is unable to find the
> underlying egress device that corresponds to a source IP address,
> and rdma_resolve_address() fails.
> 
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  drivers/net/loopback.c |    2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
> index f6d53e63ef4e..1ce4f19d8065 100644
> --- a/drivers/net/loopback.c
> +++ b/drivers/net/loopback.c
> @@ -192,6 +192,8 @@ static void gen_lo_setup(struct net_device *dev,
>  	dev->needs_free_netdev	= true;
>  	dev->priv_destructor	= dev_destructor;
>  
> +	eth_hw_addr_random(dev);
> +
>  	netif_set_tso_max_size(dev, GSO_MAX_SIZE);
>  }
>  
> 
> 
> 

This is enough of a change that it will probably break somebody.
If you need a dummy endpoint (i.e., multiple loopbacks), a common way
is to use dummy devices for that.


* Re: [PATCH RFC 1/3] net/tun: Ensure tun devices have a MAC address
  2023-05-05 15:42 ` [PATCH RFC 1/3] net/tun: Ensure tun devices have a MAC address Chuck Lever
@ 2023-05-05 16:59   ` Stephen Hemminger
  2023-05-05 17:09     ` Chuck Lever III
  0 siblings, 1 reply; 17+ messages in thread
From: Stephen Hemminger @ 2023-05-05 16:59 UTC (permalink / raw)
  To: Chuck Lever; +Cc: netdev, linux-rdma, BMT, tom

On Fri, 05 May 2023 11:42:17 -0400
Chuck Lever <cel@kernel.org> wrote:

> From: Chuck Lever <chuck.lever@oracle.com>
> 
> A non-zero MAC address enables a network device to be assigned as
> the underlying device for a virtual RDMA device. Without a non-
> zero MAC address, cma_acquire_dev_by_src_ip() is unable to find the
> underlying egress device that corresponds to a source IP address,
> and rdma_resolve_address() fails.
> 
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  drivers/net/tun.c |    6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index d4d0a41a905a..da85abfcd254 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -1384,7 +1384,7 @@ static void tun_net_initialize(struct net_device *dev)
>  
>  		/* Point-to-Point TUN Device */
>  		dev->hard_header_len = 0;
> -		dev->addr_len = 0;
> +		dev->addr_len = ETH_ALEN;
>  		dev->mtu = 1500;
>  
>  		/* Zero header length */

This is a bad idea.
TUN devices are L3 devices without any MAC address.
This patch will change the semantics and break users.

If you want an L2 address, you need to use a TAP device, not a TUN device.


* Re: [PATCH RFC 1/3] net/tun: Ensure tun devices have a MAC address
  2023-05-05 16:59   ` Stephen Hemminger
@ 2023-05-05 17:09     ` Chuck Lever III
  0 siblings, 0 replies; 17+ messages in thread
From: Chuck Lever III @ 2023-05-05 17:09 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Chuck Lever, open list:NETWORKING [GENERAL], linux-rdma, BMT, Tom Talpey



> On May 5, 2023, at 12:59 PM, Stephen Hemminger <stephen@networkplumber.org> wrote:
> 
> On Fri, 05 May 2023 11:42:17 -0400
> Chuck Lever <cel@kernel.org> wrote:
> 
>> From: Chuck Lever <chuck.lever@oracle.com>
>> 
>> A non-zero MAC address enables a network device to be assigned as
>> the underlying device for a virtual RDMA device. Without a non-
>> zero MAC address, cma_acquire_dev_by_src_ip() is unable to find the
>> underlying egress device that corresponds to a source IP address,
>> and rdma_resolve_address() fails.
>> 
>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>> ---
>> drivers/net/tun.c |    6 +++---
>> 1 file changed, 3 insertions(+), 3 deletions(-)
>> 
>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>> index d4d0a41a905a..da85abfcd254 100644
>> --- a/drivers/net/tun.c
>> +++ b/drivers/net/tun.c
>> @@ -1384,7 +1384,7 @@ static void tun_net_initialize(struct net_device *dev)
>> 
>> /* Point-to-Point TUN Device */
>> dev->hard_header_len = 0;
>> - dev->addr_len = 0;
>> + dev->addr_len = ETH_ALEN;
>> dev->mtu = 1500;
>> 
>> /* Zero header length */
> 
> This is a bad idea.
> TUN devices are L3 devices without any MAC address.
> This patch will change the semantics and break users.

I suspected this might be a problem, thanks for the quick
feedback.


> If you want an L2 address, you need to use TAP, not TUN device.

We can't assume how the VPN is implemented. In our case,
it's Tailscale, which creates a tun device. WireGuard
(in-kernel) is the same.

We would prefer a mechanism that can support tun. Having a
MAC is the easiest way forward, but is not a hard
requirement AFAICT.


--
Chuck Lever




* Re: [PATCH RFC 3/3] RDMA/siw: Require non-zero 6-byte MACs for soft iWARP
  2023-05-05 15:43 ` [PATCH RFC 3/3] RDMA/siw: Require non-zero 6-byte MACs for soft iWARP Chuck Lever
@ 2023-05-05 19:58   ` Jason Gunthorpe
  2023-05-05 20:03     ` Chuck Lever III
  2023-05-23 19:18     ` Chuck Lever III
  0 siblings, 2 replies; 17+ messages in thread
From: Jason Gunthorpe @ 2023-05-05 19:58 UTC (permalink / raw)
  To: Chuck Lever; +Cc: netdev, linux-rdma, BMT, tom

On Fri, May 05, 2023 at 11:43:11AM -0400, Chuck Lever wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
> 
> In the past, LOOPBACK and NONE (tunnel) devices had all-zero MAC
> addresses. siw_device_create() would fall back to copying the
> device's name in those cases, because an all-zero MAC address breaks
> the RDMA core IP-to-device lookup mechanism.

Why not just make up a dummy address in SIW? It shouldn't need to leak
out of it. It is just an artifact of how the iWARP stuff has been
designed.

Jason


* Re: [PATCH RFC 3/3] RDMA/siw: Require non-zero 6-byte MACs for soft iWARP
  2023-05-05 19:58   ` Jason Gunthorpe
@ 2023-05-05 20:03     ` Chuck Lever III
  2023-05-06 18:05       ` Chuck Lever III
  2023-05-23 19:18     ` Chuck Lever III
  1 sibling, 1 reply; 17+ messages in thread
From: Chuck Lever III @ 2023-05-05 20:03 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Chuck Lever, open list:NETWORKING [GENERAL],
	linux-rdma, Bernard Metzler, Tom Talpey



> On May 5, 2023, at 3:58 PM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> 
> On Fri, May 05, 2023 at 11:43:11AM -0400, Chuck Lever wrote:
>> From: Chuck Lever <chuck.lever@oracle.com>
>> 
>> In the past, LOOPBACK and NONE (tunnel) devices had all-zero MAC
>> addresses. siw_device_create() would fall back to copying the
>> device's name in those cases, because an all-zero MAC address breaks
>> the RDMA core IP-to-device lookup mechanism.
> 
> Why not just make up a dummy address in SIW? It shouldn't need to leak
> out of it.. It is just some artifact of how the iWarp stuff has been
> designed

I've been trying that.

Even though the siw0 device is now registered with a non-zero GID, 
cma_acquire_dev_by_src_ip() still comes up with a zero GID which
matches no device. Address resolution then fails.

I'm still looking into why.


--
Chuck Lever




* Re: [PATCH RFC 3/3] RDMA/siw: Require non-zero 6-byte MACs for soft iWARP
  2023-05-05 20:03     ` Chuck Lever III
@ 2023-05-06 18:05       ` Chuck Lever III
  0 siblings, 0 replies; 17+ messages in thread
From: Chuck Lever III @ 2023-05-06 18:05 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Chuck Lever, open list:NETWORKING [GENERAL],
	linux-rdma, Bernard Metzler, Tom Talpey, parav


> On May 5, 2023, at 4:03 PM, Chuck Lever III <chuck.lever@oracle.com> wrote:
> 
>> On May 5, 2023, at 3:58 PM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
>> 
>> On Fri, May 05, 2023 at 11:43:11AM -0400, Chuck Lever wrote:
>>> From: Chuck Lever <chuck.lever@oracle.com>
>>> 
>>> In the past, LOOPBACK and NONE (tunnel) devices had all-zero MAC
>>> addresses. siw_device_create() would fall back to copying the
>>> device's name in those cases, because an all-zero MAC address breaks
>>> the RDMA core IP-to-device lookup mechanism.
>> 
>> Why not just make up a dummy address in SIW? It shouldn't need to leak
>> out of it.. It is just some artifact of how the iWarp stuff has been
>> designed
> 
> I've been trying that.
> 
> Even though the siw0 device is now registered with a non-zero GID, 
> cma_acquire_dev_by_src_ip() still comes up with a zero GID which
> matches no device. Address resolution then fails.
> 
> I'm still looking into why.

The tun0 device's flags are:

   UP|POINTOPOINT|NOARP|MULTICAST

That flag combination turns addr_resolve_neigh() into a no-op, so
that the returned GIDs and addresses are uninitialized.

Cc'ing Parav because he's the last person who did significant work
on this code path. I can hack this to make it work, but I have no
idea what the proper solution would be.


--
Chuck Lever
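
A simplified model of the flag handling described above, written as a
userspace sketch. It reflects one reading of the neighbour-resolution step
and should be treated as an assumption, not as the actual code in the RDMA
core. The SKETCH_* names are hypothetical; the flag values mirror the IFF_*
bits in linux/if.h.

```c
/* Values mirror the IFF_* interface flag bits in linux/if.h. */
#define SKETCH_IFF_UP		0x0001u
#define SKETCH_IFF_BROADCAST	0x0002u
#define SKETCH_IFF_LOOPBACK	0x0008u
#define SKETCH_IFF_POINTOPOINT	0x0010u
#define SKETCH_IFF_NOARP	0x0080u
#define SKETCH_IFF_MULTICAST	0x1000u

enum sketch_l2_action {
	SKETCH_COPY_SRC,	/* loopback: reuse the source L2 address */
	SKETCH_NEIGH_LOOKUP,	/* ARP-capable: ask the neighbour table */
	SKETCH_NOTHING,		/* NOARP: no L2 address is filled in */
};

/* Decide how the destination L2 address would be obtained. */
static enum sketch_l2_action sketch_resolve_neigh(unsigned int ndev_flags)
{
	if (ndev_flags & SKETCH_IFF_LOOPBACK)
		return SKETCH_COPY_SRC;
	if (!(ndev_flags & SKETCH_IFF_NOARP))
		return SKETCH_NEIGH_LOOKUP;
	return SKETCH_NOTHING;
}
```

With the tun0 flags quoted above (UP|POINTOPOINT|NOARP|MULTICAST), this model
takes the last branch, which matches the no-op behaviour reported in the
message.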




* Re: [PATCH RFC 3/3] RDMA/siw: Require non-zero 6-byte MACs for soft iWARP
  2023-05-05 19:58   ` Jason Gunthorpe
  2023-05-05 20:03     ` Chuck Lever III
@ 2023-05-23 19:18     ` Chuck Lever III
  2023-05-23 19:44       ` Tom Talpey
  2023-05-31 19:04       ` Jason Gunthorpe
  1 sibling, 2 replies; 17+ messages in thread
From: Chuck Lever III @ 2023-05-23 19:18 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Chuck Lever, Netdev, linux-rdma, Bernard Metzler, Tom Talpey


> On May 5, 2023, at 3:58 PM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> 
> On Fri, May 05, 2023 at 11:43:11AM -0400, Chuck Lever wrote:
>> From: Chuck Lever <chuck.lever@oracle.com>
>> 
>> In the past, LOOPBACK and NONE (tunnel) devices had all-zero MAC
>> addresses. siw_device_create() would fall back to copying the
>> device's name in those cases, because an all-zero MAC address breaks
>> the RDMA core IP-to-device lookup mechanism.
> 
> Why not just make up a dummy address in SIW? It shouldn't need to leak
> out of it.. It is just some artifact of how the iWarp stuff has been
> designed

So that approach is already being done in siw_device_create(),
even though it is broken (the device name hasn't been initialized
when the phony MAC is created, so it is all zeroes). I've fixed
that and it still doesn't help.

siw cannot modify the underlying net_device to add a made-up
MAC address.

The core address resolution code wants to find an L2 address
for the egress device. The underlying ib_device, where a made-up
GID might be stored, is not involved with address resolution
AFAICT.

tun devices have no L2 address. Neither do loopback devices,
but address resolution makes an exception for LOOPBACK devices
by redirecting to a local physical Ethernet device.

Redirecting tun traffic to the local Ethernet device seems
dodgy at best.

I wasn't sure that an L2 address was required for siw before,
but now I'm pretty confident that it is required by our
implementation.

--
Chuck Lever
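
The name-based fallback that patch 3/3 removes can be reproduced in
userspace to show why an uninitialized (empty) device name yields an
all-zero pseudo-MAC. This is a sketch only: sketch_mac_from_name() is a
hypothetical stand-in for the removed lines, taking the name as a plain
string.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* Build a pseudo-MAC from up to the first six bytes of a device name. */
static void sketch_mac_from_name(uint8_t addr[ETH_ALEN], const char *name)
{
	size_t len = strlen(name);

	if (len > ETH_ALEN)
		len = ETH_ALEN;

	memset(addr, 0, ETH_ALEN);	/* unused trailing octets stay zero */
	memcpy(addr, name, len);	/* an empty name copies nothing */
}
```

Passing "" leaves all six octets zero, which is the failure mode described
in the message above; passing "siw0" gives the bytes 's', 'i', 'w', '0'
followed by two zero octets.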




* Re: [PATCH RFC 3/3] RDMA/siw: Require non-zero 6-byte MACs for soft iWARP
  2023-05-23 19:18     ` Chuck Lever III
@ 2023-05-23 19:44       ` Tom Talpey
  2023-05-23 22:50         ` Chuck Lever III
  2023-05-31 19:04       ` Jason Gunthorpe
  1 sibling, 1 reply; 17+ messages in thread
From: Tom Talpey @ 2023-05-23 19:44 UTC (permalink / raw)
  To: Chuck Lever III, Jason Gunthorpe
  Cc: Chuck Lever, Netdev, linux-rdma, Bernard Metzler

On 5/23/2023 3:18 PM, Chuck Lever III wrote:
> 
>> On May 5, 2023, at 3:58 PM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
>>
>> On Fri, May 05, 2023 at 11:43:11AM -0400, Chuck Lever wrote:
>>> From: Chuck Lever <chuck.lever@oracle.com>
>>>
>>> In the past, LOOPBACK and NONE (tunnel) devices had all-zero MAC
>>> addresses. siw_device_create() would fall back to copying the
>>> device's name in those cases, because an all-zero MAC address breaks
>>> the RDMA core IP-to-device lookup mechanism.
>>
>> Why not just make up a dummy address in SIW? It shouldn't need to leak
>> out of it.. It is just some artifact of how the iWarp stuff has been
>> designed
> 
> So that approach is already being done in siw_device_create(),
> even though it is broken (the device name hasn't been initialized
> when the phony MAC is created, so it is all zeroes). I've fixed
> that and it still doesn't help.
> 
> siw cannot modify the underlying net_device to add a made-up
> MAC address.
> 
> The core address resolution code wants to find an L2 address
> for the egress device. The underlying ib_device, where a made-up
> GID might be stored, is not involved with address resolution
> AFAICT.
> 
> tun devices have no L2 address. Neither do loopback devices,
> but address resolution makes an exception for LOOPBACK devices
> by redirecting to a local physical Ethernet device.
> 
> Redirecting tun traffic to the local Ethernet device seems
> dodgy at best.
> 
> I wasn't sure that an L2 address was required for siw before,
> but now I'm pretty confident that it is required by our
> implementation.

Does rxe work over tunnels? Seems like it would have the same issue.

int rxe_register_device(struct rxe_dev *rxe, const char *ibdev_name)
{
...
         addrconf_addr_eui48((unsigned char *)&dev->node_guid,
                             rxe->ndev->dev_addr);

static struct siw_device *siw_device_create(struct net_device *netdev)
{
...
         addrconf_addr_eui48((unsigned char *)&base_dev->node_guid,
                                     netdev->dev_addr);

Tom.
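
Both snippets above call addrconf_addr_eui48() to turn the netdev's 6-octet
address into the 8-octet node GUID. A userspace sketch of that conversion
follows; sketch_eui48_to_eui64() is a hypothetical name, and the body is
written from the standard EUI-48 to modified EUI-64 mapping, not copied from
the kernel header.

```c
#include <stdint.h>
#include <string.h>

/* Expand a 6-octet MAC address into an 8-octet modified EUI-64. */
static void sketch_eui48_to_eui64(uint8_t eui[8], const uint8_t mac[6])
{
	memcpy(eui, mac, 3);		/* first three octets of the MAC */
	eui[3] = 0xff;			/* fixed filler octets */
	eui[4] = 0xfe;
	memcpy(eui + 5, mac + 3, 3);	/* last three octets of the MAC */
	eui[0] ^= 0x02;			/* invert the universal/local bit */
}
```

For example, the MAC 02:11:22:33:44:55 maps to 00:11:22:ff:fe:33:44:55.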


* Re: [PATCH RFC 3/3] RDMA/siw: Require non-zero 6-byte MACs for soft iWARP
  2023-05-23 19:44       ` Tom Talpey
@ 2023-05-23 22:50         ` Chuck Lever III
  0 siblings, 0 replies; 17+ messages in thread
From: Chuck Lever III @ 2023-05-23 22:50 UTC (permalink / raw)
  To: Tom Talpey
  Cc: Jason Gunthorpe, Chuck Lever, Netdev, linux-rdma, Bernard Metzler



> On May 23, 2023, at 3:44 PM, Tom Talpey <tom@talpey.com> wrote:
> 
> On 5/23/2023 3:18 PM, Chuck Lever III wrote:
>>> On May 5, 2023, at 3:58 PM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
>>> 
>>> On Fri, May 05, 2023 at 11:43:11AM -0400, Chuck Lever wrote:
>>>> From: Chuck Lever <chuck.lever@oracle.com>
>>>> 
>>>> In the past, LOOPBACK and NONE (tunnel) devices had all-zero MAC
>>>> addresses. siw_device_create() would fall back to copying the
>>>> device's name in those cases, because an all-zero MAC address breaks
>>>> the RDMA core IP-to-device lookup mechanism.
>>> 
>>> Why not just make up a dummy address in SIW? It shouldn't need to leak
>>> out of it.. It is just some artifact of how the iWarp stuff has been
>>> designed
>> So that approach is already being done in siw_device_create(),
>> even though it is broken (the device name hasn't been initialized
>> when the phony MAC is created, so it is all zeroes). I've fixed
>> that and it still doesn't help.
>> siw cannot modify the underlying net_device to add a made-up
>> MAC address.
>> The core address resolution code wants to find an L2 address
>> for the egress device. The underlying ib_device, where a made-up
>> GID might be stored, is not involved with address resolution
>> AFAICT.
>> tun devices have no L2 address. Neither do loopback devices,
>> but address resolution makes an exception for LOOPBACK devices
>> by redirecting to a local physical Ethernet device.
>> Redirecting tun traffic to the local Ethernet device seems
>> dodgy at best.
>> I wasn't sure that an L2 address was required for siw before,
>> but now I'm pretty confident that it is required by our
>> implementation.
> 
> Does rxe work over tunnels?

(It's not tunnels per se, it's devices that don't have
L2 addresses... and tun happens to be one instance of
that class).

My (brief) reading of the source code is that the use of
devices that do not have L2 addresses is prohibited for
rxe.


> Seems like it would have the same issue.

Agreed, if rxe did not prohibit them, it would have the same
issue.

To be clear: siw itself and the family of iWARP protocols
shouldn't have any problem at all with such devices. The
issue seems to be with the Linux implementation of address
resolution.


> int rxe_register_device(struct rxe_dev *rxe, const char *ibdev_name)
> {
> ...
>        addrconf_addr_eui48((unsigned char *)&dev->node_guid,
>                            rxe->ndev->dev_addr);
> 
> static struct siw_device *siw_device_create(struct net_device *netdev)
> {
> ...
>        addrconf_addr_eui48((unsigned char *)&base_dev->node_guid,
>                                    netdev->dev_addr);
> 
> Tom.


--
Chuck Lever




* Re: [PATCH RFC 3/3] RDMA/siw: Require non-zero 6-byte MACs for soft iWARP
  2023-05-23 19:18     ` Chuck Lever III
  2023-05-23 19:44       ` Tom Talpey
@ 2023-05-31 19:04       ` Jason Gunthorpe
  2023-05-31 19:11         ` Chuck Lever III
  1 sibling, 1 reply; 17+ messages in thread
From: Jason Gunthorpe @ 2023-05-31 19:04 UTC (permalink / raw)
  To: Chuck Lever III
  Cc: Chuck Lever, Netdev, linux-rdma, Bernard Metzler, Tom Talpey

On Tue, May 23, 2023 at 07:18:18PM +0000, Chuck Lever III wrote:

> The core address resolution code wants to find an L2 address
> for the egress device. The underlying ib_device, where a made-up
> GID might be stored, is not involved with address resolution
> AFAICT.

Where are you hitting this?

Jason


* Re: [PATCH RFC 3/3] RDMA/siw: Require non-zero 6-byte MACs for soft iWARP
  2023-05-31 19:04       ` Jason Gunthorpe
@ 2023-05-31 19:11         ` Chuck Lever III
  2023-05-31 20:09           ` Jason Gunthorpe
  0 siblings, 1 reply; 17+ messages in thread
From: Chuck Lever III @ 2023-05-31 19:11 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Chuck Lever, Netdev, linux-rdma, Bernard Metzler, Tom Talpey



> On May 31, 2023, at 3:04 PM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> 
> On Tue, May 23, 2023 at 07:18:18PM +0000, Chuck Lever III wrote:
> 
>> The core address resolution code wants to find an L2 address
>> for the egress device. The underlying ib_device, where a made-up
>> GID might be stored, is not involved with address resolution
>> AFAICT.
> 
> Where are you hitting this?

     kworker/2:0-26    [002]   551.962874: funcgraph_entry:                   |  addr_resolve() {
     kworker/2:0-26    [002]   551.962874: bprint:               addr_resolve: resolve_neigh=true resolve_by_gid_attr=false
     kworker/2:0-26    [002]   551.962874: funcgraph_entry:                   |    addr4_resolve.constprop.0() {
     kworker/2:0-26    [002]   551.962875: bprint:               addr4_resolve.constprop.0: src_in=0.0.0.0:35173 dst_in=100.72.1.2:20049
     kworker/2:0-26    [002]   551.962875: funcgraph_entry:                   |      ip_route_output_flow() {
     kworker/2:0-26    [002]   551.962875: funcgraph_entry:                   |        ip_route_output_key_hash() {
     kworker/2:0-26    [002]   551.962876: funcgraph_entry:                   |          ip_route_output_key_hash_rcu() {
     kworker/2:0-26    [002]   551.962876: funcgraph_entry:        4.526 us   |            __fib_lookup();
     kworker/2:0-26    [002]   551.962881: funcgraph_entry:        0.264 us   |            fib_select_path();
     kworker/2:0-26    [002]   551.962881: funcgraph_entry:        1.022 us   |            __mkroute_output();
     kworker/2:0-26    [002]   551.962882: funcgraph_exit:         6.705 us   |          }
     kworker/2:0-26    [002]   551.962882: funcgraph_exit:         7.283 us   |        }
     kworker/2:0-26    [002]   551.962883: funcgraph_exit:         7.624 us   |      }
     kworker/2:0-26    [002]   551.962883: funcgraph_exit:         8.395 us   |    }
     kworker/2:0-26    [002]   551.962883: funcgraph_entry:                   |    rdma_set_src_addr_rcu.constprop.0() {
     kworker/2:0-26    [002]   551.962883: bprint:               rdma_set_src_addr_rcu.constprop.0: ndev=0xffff91f5135a4000 name=tailscale0
     kworker/2:0-26    [002]   551.962884: funcgraph_entry:                   |      copy_src_l2_addr() {
     kworker/2:0-26    [002]   551.962884: funcgraph_entry:        0.984 us   |        iff_flags2string();
     kworker/2:0-26    [002]   551.962885: bprint:               copy_src_l2_addr: ndev=0xffff91f5135a4000 dst_in=100.72.1.2:20049 flags=UP|POINTOPOINT|NOARP|MULTICAST
     kworker/2:0-26    [002]   551.962885: funcgraph_entry:                   |        rdma_copy_src_l2_addr() {
     kworker/2:0-26    [002]   551.962886: funcgraph_entry:        0.148 us   |          devtype2string();
     kworker/2:0-26    [002]   551.962887: bprint:               rdma_copy_src_l2_addr: name=tailscale0 type=NONE src_dev_addr=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 broadcast=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ifindex=3
     kworker/2:0-26    [002]   551.962887: funcgraph_exit:         1.488 us   |        }
     kworker/2:0-26    [002]   551.962887: bprint:               copy_src_l2_addr: network type=IB
     kworker/2:0-26    [002]   551.962887: funcgraph_exit:         3.636 us   |      }
     kworker/2:0-26    [002]   551.962887: funcgraph_exit:         4.275 us   |    }


Address resolution finds the right device, but there's
a zero-value L2 address. Thus it cannot form a unique
GID from that. Perhaps there needs to be a call to
query_gid in here?


--
Chuck Lever




* Re: [PATCH RFC 3/3] RDMA/siw: Require non-zero 6-byte MACs for soft iWARP
  2023-05-31 19:11         ` Chuck Lever III
@ 2023-05-31 20:09           ` Jason Gunthorpe
  2023-05-31 20:19             ` Chuck Lever III
  0 siblings, 1 reply; 17+ messages in thread
From: Jason Gunthorpe @ 2023-05-31 20:09 UTC (permalink / raw)
  To: Chuck Lever III
  Cc: Chuck Lever, Netdev, linux-rdma, Bernard Metzler, Tom Talpey

On Wed, May 31, 2023 at 07:11:52PM +0000, Chuck Lever III wrote:
> 
> 
> > On May 31, 2023, at 3:04 PM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > 
> > On Tue, May 23, 2023 at 07:18:18PM +0000, Chuck Lever III wrote:
> > 
> >> The core address resolution code wants to find an L2 address
> >> for the egress device. The underlying ib_device, where a made-up
> >> GID might be stored, is not involved with address resolution
> >> AFAICT.
> > 
> > Where are you hitting this?
> 
>      kworker/2:0-26    [002]   551.962874: funcgraph_entry:                   |  addr_resolve() {
>      kworker/2:0-26    [002]   551.962874: bprint:               addr_resolve: resolve_neigh=true resolve_by_gid_attr=false
>      kworker/2:0-26    [002]   551.962874: funcgraph_entry:                   |    addr4_resolve.constprop.0() {
>      kworker/2:0-26    [002]   551.962875: bprint:               addr4_resolve.constprop.0: src_in=0.0.0.0:35173 dst_in=100.72.1.2:20049
>      kworker/2:0-26    [002]   551.962875: funcgraph_entry:                   |      ip_route_output_flow() {
>      kworker/2:0-26    [002]   551.962875: funcgraph_entry:                   |        ip_route_output_key_hash() {
>      kworker/2:0-26    [002]   551.962876: funcgraph_entry:                   |          ip_route_output_key_hash_rcu() {
>      kworker/2:0-26    [002]   551.962876: funcgraph_entry:        4.526 us   |            __fib_lookup();
>      kworker/2:0-26    [002]   551.962881: funcgraph_entry:        0.264 us   |            fib_select_path();
>      kworker/2:0-26    [002]   551.962881: funcgraph_entry:        1.022 us   |            __mkroute_output();
>      kworker/2:0-26    [002]   551.962882: funcgraph_exit:         6.705 us   |          }
>      kworker/2:0-26    [002]   551.962882: funcgraph_exit:         7.283 us   |        }
>      kworker/2:0-26    [002]   551.962883: funcgraph_exit:         7.624 us   |      }
>      kworker/2:0-26    [002]   551.962883: funcgraph_exit:         8.395 us   |    }
>      kworker/2:0-26    [002]   551.962883: funcgraph_entry:                   |    rdma_set_src_addr_rcu.constprop.0() {
>      kworker/2:0-26    [002]   551.962883: bprint:               rdma_set_src_addr_rcu.constprop.0: ndev=0xffff91f5135a4000 name=tailscale0
>      kworker/2:0-26    [002]   551.962884: funcgraph_entry:                   |      copy_src_l2_addr() {
>      kworker/2:0-26    [002]   551.962884: funcgraph_entry:        0.984 us   |        iff_flags2string();
>      kworker/2:0-26    [002]   551.962885: bprint:               copy_src_l2_addr: ndev=0xffff91f5135a4000 dst_in=100.72.1.2:20049 flags=UP|POINTOPOINT|NOARP|MULTICAST
>      kworker/2:0-26    [002]   551.962885: funcgraph_entry:                   |        rdma_copy_src_l2_addr() {
>      kworker/2:0-26    [002]   551.962886: funcgraph_entry:        0.148 us   |          devtype2string();
>      kworker/2:0-26    [002]   551.962887: bprint:               rdma_copy_src_l2_addr: name=tailscale0 type=NONE src_dev_addr=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 broadcast=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ifindex=3
>      kworker/2:0-26    [002]   551.962887: funcgraph_exit:         1.488 us   |        }
>      kworker/2:0-26    [002]   551.962887: bprint:               copy_src_l2_addr: network type=IB
>      kworker/2:0-26    [002]   551.962887: funcgraph_exit:         3.636 us   |      }
>      kworker/2:0-26    [002]   551.962887: funcgraph_exit:         4.275 us   |    }
> 
> 
> Address resolution finds the right device, but there's
> a zero-value L2 address.

Sure, but why is that a problem?

This got to rdma_set_src_addr_rcu, so the resolution succeeded. Where
is the failure? From the above trace I think addr_resolve() succeeded?

> Thus it cannot form a unique GID from that. Perhaps there needs to
> be a call to query_gid in here?

So your issue is cma_iw_acquire_dev(), which looks like it is encoding
the MAC into the GID for some reason? We don't do that on RoCE; there
the GID encodes the IP address.
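
To make the contrast concrete, here is a small userspace sketch (not
kernel code) of the two encodings. It assumes the MAC-style GID is the
6-byte L2 address copied into the leading bytes of a zeroed 16-byte
GID, and that the IP-style GID for IPv4 is the IPv4-mapped IPv6 form
::ffff:a.b.c.d. The helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define GID_LEN 16
#define MAC_LEN 6

/* MAC-style GID (assumed layout): MAC in the first 6 bytes, rest zero. */
static void gid_from_mac(uint8_t gid[GID_LEN], const uint8_t mac[MAC_LEN])
{
	assert(gid && mac);
	memset(gid, 0, GID_LEN);
	memcpy(gid, mac, MAC_LEN);
}

/* IP-style GID: IPv4-mapped IPv6 address, ::ffff:a.b.c.d. */
static void gid_from_ipv4(uint8_t gid[GID_LEN], const uint8_t ip[4])
{
	assert(gid && ip);
	memset(gid, 0, GID_LEN);
	gid[10] = 0xff;
	gid[11] = 0xff;
	memcpy(&gid[12], ip, 4);
}

/* Returns 1 when every byte of the GID is zero, 0 otherwise. */
static int gid_is_zero(const uint8_t gid[GID_LEN])
{
	static const uint8_t zero[GID_LEN];

	return memcmp(gid, zero, GID_LEN) == 0;
}
```

With the all-zero src_dev_addr from the trace, gid_from_mac() yields an
all-zero GID, so any two tun devices would collide. gid_from_ipv4()
still yields a distinct value per address on the same device.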

I have no idea how iWARP works, but it is surprising that it puts a
MAC in the GID.

If the iWARP device only ever has one GID and it is always the "MAC",
then cma_iw_acquire_dev()'s logic is simply wrong: it should check that
the dev_addr's netdev matches the one and only GID and just use the
GID. No reason to search for GIDs.
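
As a rough model of that idea (userspace C, with hypothetical structs
standing in for the real kernel objects, so none of these names exist
in the RDMA core): compare the resolved netdev against the device's
netdev and, on a match, hand back the device's single GID without
comparing GID contents at all.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define GID_LEN 16

/* Hypothetical stand-in for an iWARP device with exactly one GID. */
struct model_iw_dev {
	int netdev_ifindex;        /* ifindex of the netdev it is bound to */
	uint8_t gid[GID_LEN];      /* its one and only GID */
};

/*
 * If address resolution landed on this device's netdev, use its GID
 * as-is. Otherwise report -ENODEV. The GID value is never matched
 * against a MAC, so a zero L2 address does not matter here.
 */
static int model_acquire_gid(const struct model_iw_dev *dev,
			     int resolved_ifindex, uint8_t out[GID_LEN])
{
	assert(dev && out);
	if (dev->netdev_ifindex != resolved_ifindex)
		return -ENODEV;
	memcpy(out, dev->gid, GID_LEN);
	return 0;
}
```

In the trace above the resolved device is ifindex=3 (tailscale0), so a
model device bound to ifindex 3 would be accepted regardless of what
its L2 address looks like.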

A small edit to cma_validate_port() might make sense, it is kind of
wrong to force the gid_type to IB_GID_TYPE_IB for whatever ARPHRD type
the tunnel is using.

Jason


* Re: [PATCH RFC 3/3] RDMA/siw: Require non-zero 6-byte MACs for soft iWARP
  2023-05-31 20:09           ` Jason Gunthorpe
@ 2023-05-31 20:19             ` Chuck Lever III
  0 siblings, 0 replies; 17+ messages in thread
From: Chuck Lever III @ 2023-05-31 20:19 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Chuck Lever, Netdev, linux-rdma, Bernard Metzler, Tom Talpey



> On May 31, 2023, at 4:09 PM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> 
> On Wed, May 31, 2023 at 07:11:52PM +0000, Chuck Lever III wrote:
>> 
>> 
>>> On May 31, 2023, at 3:04 PM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
>>> 
>>> On Tue, May 23, 2023 at 07:18:18PM +0000, Chuck Lever III wrote:
>>> 
>>>> The core address resolution code wants to find an L2 address
>>>> for the egress device. The underlying ib_device, where a made-up
>>>> GID might be stored, is not involved with address resolution
>>>> AFAICT.
>>> 
>>> Where are you hitting this?
>> 
>>     kworker/2:0-26    [002]   551.962874: funcgraph_entry:                   |  addr_resolve() {
>>     kworker/2:0-26    [002]   551.962874: bprint:               addr_resolve: resolve_neigh=true resolve_by_gid_attr=false
>>     kworker/2:0-26    [002]   551.962874: funcgraph_entry:                   |    addr4_resolve.constprop.0() {
>>     kworker/2:0-26    [002]   551.962875: bprint:               addr4_resolve.constprop.0: src_in=0.0.0.0:35173 dst_in=100.72.1.2:20049
>>     kworker/2:0-26    [002]   551.962875: funcgraph_entry:                   |      ip_route_output_flow() {
>>     kworker/2:0-26    [002]   551.962875: funcgraph_entry:                   |        ip_route_output_key_hash() {
>>     kworker/2:0-26    [002]   551.962876: funcgraph_entry:                   |          ip_route_output_key_hash_rcu() {
>>     kworker/2:0-26    [002]   551.962876: funcgraph_entry:        4.526 us   |            __fib_lookup();
>>     kworker/2:0-26    [002]   551.962881: funcgraph_entry:        0.264 us   |            fib_select_path();
>>     kworker/2:0-26    [002]   551.962881: funcgraph_entry:        1.022 us   |            __mkroute_output();
>>     kworker/2:0-26    [002]   551.962882: funcgraph_exit:         6.705 us   |          }
>>     kworker/2:0-26    [002]   551.962882: funcgraph_exit:         7.283 us   |        }
>>     kworker/2:0-26    [002]   551.962883: funcgraph_exit:         7.624 us   |      }
>>     kworker/2:0-26    [002]   551.962883: funcgraph_exit:         8.395 us   |    }
>>     kworker/2:0-26    [002]   551.962883: funcgraph_entry:                   |    rdma_set_src_addr_rcu.constprop.0() {
>>     kworker/2:0-26    [002]   551.962883: bprint:               rdma_set_src_addr_rcu.constprop.0: ndev=0xffff91f5135a4000 name=tailscale0
>>     kworker/2:0-26    [002]   551.962884: funcgraph_entry:                   |      copy_src_l2_addr() {
>>     kworker/2:0-26    [002]   551.962884: funcgraph_entry:        0.984 us   |        iff_flags2string();
>>     kworker/2:0-26    [002]   551.962885: bprint:               copy_src_l2_addr: ndev=0xffff91f5135a4000 dst_in=100.72.1.2:20049 flags=UP|POINTOPOINT|NOARP|MULTICAST
>>     kworker/2:0-26    [002]   551.962885: funcgraph_entry:                   |        rdma_copy_src_l2_addr() {
>>     kworker/2:0-26    [002]   551.962886: funcgraph_entry:        0.148 us   |          devtype2string();
>>     kworker/2:0-26    [002]   551.962887: bprint:               rdma_copy_src_l2_addr: name=tailscale0 type=NONE src_dev_addr=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 broadcast=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ifindex=3
>>     kworker/2:0-26    [002]   551.962887: funcgraph_exit:         1.488 us   |        }
>>     kworker/2:0-26    [002]   551.962887: bprint:               copy_src_l2_addr: network type=IB
>>     kworker/2:0-26    [002]   551.962887: funcgraph_exit:         3.636 us   |      }
>>     kworker/2:0-26    [002]   551.962887: funcgraph_exit:         4.275 us   |    }
>> 
>> 
>> Address resolution finds the right device, but there's
>> a zero-value L2 address.
> 
> Sure, but why is that a problem?
> 
> This got to rdma_set_src_addr_rcu, so the resolution succeeded. Where
> is the failure? From the above trace I think addr_resolve() succeeded?

Possibly it did succeed. But the ULP consumer sees
RDMA_CM_EVENT_ADDR_ERROR and does not proceed to route resolution.
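
For context, a simplified model of the consumer-side flow (userspace
C; the enum and function names are invented for illustration and are
not the rdma_cm API): route resolution is only attempted after an
address-resolved event, so an address error stops the connection
attempt before any route lookup happens.

```c
#include <assert.h>

/* Hypothetical, reduced set of CM events a consumer might see. */
enum model_cm_event {
	MODEL_ADDR_RESOLVED,
	MODEL_ADDR_ERROR,
	MODEL_ROUTE_RESOLVED,
	MODEL_ROUTE_ERROR,
};

/* What the consumer does next in response to an event. */
enum model_next_step {
	MODEL_RESOLVE_ROUTE,
	MODEL_CONNECT,
	MODEL_GIVE_UP,
};

static enum model_next_step model_on_cm_event(enum model_cm_event ev)
{
	switch (ev) {
	case MODEL_ADDR_RESOLVED:
		return MODEL_RESOLVE_ROUTE;   /* address ok: resolve route */
	case MODEL_ROUTE_RESOLVED:
		return MODEL_CONNECT;         /* route ok: connect */
	case MODEL_ADDR_ERROR:
	case MODEL_ROUTE_ERROR:
	default:
		return MODEL_GIVE_UP;         /* any error ends the attempt */
	}
}
```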


>> Thus it cannot form a unique GID from that. Perhaps there needs to
>> be a call to query_gid in here?
> 
> So your issue is cma_iw_acquire_dev(), which looks like it is encoding
> the MAC into the GID for some reason? We don't do that on RoCE; there
> the GID encodes the IP address.

Well, I'm not getting there at all on the initiator side.
cma_iw_acquire_dev() is called only for listeners, I thought.


> 
> I have no idea how iWARP works, but it is surprising that it puts a
> MAC in the GID.
> 
> If the iWARP device only ever has one GID and it is always the "MAC",
> then cma_iw_acquire_dev()'s logic is simply wrong: it should check that
> the dev_addr's netdev matches the one and only GID and just use the
> GID. No reason to search for GIDs.
> 
> A small edit to cma_validate_port() might make sense, it is kind of
> wrong to force the gid_type to IB_GID_TYPE_IB for whatever ARPHRD type
> the tunnel is using.

I will have a look.

--
Chuck Lever




end of thread, other threads:[~2023-05-31 20:20 UTC | newest]

Thread overview: 17+ messages
2023-05-05 15:41 [PATCH RFC 0/3] siw on tunnel devices Chuck Lever
2023-05-05 15:42 ` [PATCH RFC 1/3] net/tun: Ensure tun devices have a MAC address Chuck Lever
2023-05-05 16:59   ` Stephen Hemminger
2023-05-05 17:09     ` Chuck Lever III
2023-05-05 15:42 ` [PATCH RFC 2/3] net/lo: Ensure lo " Chuck Lever
2023-05-05 16:57   ` Stephen Hemminger
2023-05-05 15:43 ` [PATCH RFC 3/3] RDMA/siw: Require non-zero 6-byte MACs for soft iWARP Chuck Lever
2023-05-05 19:58   ` Jason Gunthorpe
2023-05-05 20:03     ` Chuck Lever III
2023-05-06 18:05       ` Chuck Lever III
2023-05-23 19:18     ` Chuck Lever III
2023-05-23 19:44       ` Tom Talpey
2023-05-23 22:50         ` Chuck Lever III
2023-05-31 19:04       ` Jason Gunthorpe
2023-05-31 19:11         ` Chuck Lever III
2023-05-31 20:09           ` Jason Gunthorpe
2023-05-31 20:19             ` Chuck Lever III
