* [PATCH] Fix: Remove racy Subnet Manager sendonly join checks
@ 2021-01-25 11:28 Christoph Lameter
  2021-01-25 11:44 ` Leon Romanovsky
  2021-01-28 14:03 ` Jason Gunthorpe
  0 siblings, 2 replies; 14+ messages in thread
From: Christoph Lameter @ 2021-01-25 11:28 UTC (permalink / raw)
  To: Leon Romanovsky; +Cc: Jason Gunthorpe, linux-rdma

On Sun, 24 Jan 2021, Leon Romanovsky wrote:

> > Since all SMs out there have had support for sendonly join for years now
> > we could just remove the check entirely. If there is an old grizzly SM out
> > there then it would not process that join request and would return an
> > error.
>
> I have no idea if it is possible; if yes, this will be the best solution.

OK, here is a new patch:

From: Christoph Lameter <cl@linux.com>
Subject: [PATCH] Fix: Remove racy Subnet Manager sendonly join checks

When a system receives a REREG event from the SM, then the SM information in
the kernel is marked as invalid and a request is sent to the SM to update
the information. The SM information is invalid in that time period.

However, receiving a REREG also occurs simultaneously in user space
applications that are now trying to rejoin the multicast groups. Some of those
may be sendonly multicast groups which are then failing.

If the SM information is invalid then ib_sa_sendonly_fullmem_support()
returns false. That is wrong because it just means that we do not know
yet if the potentially new SM supports sendonly joins.

Sendonly join was introduced in 2015 and all the Subnet managers have
supported it ever since. So there is no point in checking if a subnet
manager supports it.

Should an old opensm get a request for a sendonly join then the request
will fail. The code that is removed here accommodated that situation
and fell back to a full join.

Falling back to a full join is problematic in itself. The reason to
use the sendonly join was to reduce the traffic on the InfiniBand
fabric; otherwise one could have just stayed with the regular join.
So this patch may cause users of very old opensms to discover that
lots of traffic needlessly crosses their IB fabrics.

Signed-off-by: Christoph Lameter <cl@linux.com>

Index: linux/drivers/infiniband/core/cma.c
===================================================================
--- linux.orig/drivers/infiniband/core/cma.c	2020-12-17 14:51:15.301206041 +0000
+++ linux/drivers/infiniband/core/cma.c	2021-01-25 09:39:29.191032891 +0000
@@ -4542,17 +4542,6 @@ static int cma_join_ib_multicast(struct
 	rec.pkey = cpu_to_be16(ib_addr_get_pkey(dev_addr));
 	rec.join_state = mc->join_state;

-	if ((rec.join_state == BIT(SENDONLY_FULLMEMBER_JOIN)) &&
-	    (!ib_sa_sendonly_fullmem_support(&sa_client,
-					     id_priv->id.device,
-					     id_priv->id.port_num))) {
-		dev_warn(
-			&id_priv->id.device->dev,
-			"RDMA CM: port %u Unable to multicast join: SM doesn't support Send Only Full Member option\n",
-			id_priv->id.port_num);
-		return -EOPNOTSUPP;
-	}
-
 	comp_mask = IB_SA_MCMEMBER_REC_MGID | IB_SA_MCMEMBER_REC_PORT_GID |
 		    IB_SA_MCMEMBER_REC_PKEY | IB_SA_MCMEMBER_REC_JOIN_STATE |
 		    IB_SA_MCMEMBER_REC_QKEY | IB_SA_MCMEMBER_REC_SL |
Index: linux/drivers/infiniband/core/sa_query.c
===================================================================
--- linux.orig/drivers/infiniband/core/sa_query.c	2021-01-25 09:36:56.000000000 +0000
+++ linux/drivers/infiniband/core/sa_query.c	2021-01-25 09:38:09.818795183 +0000
@@ -1951,30 +1951,6 @@ err1:
 }
 EXPORT_SYMBOL(ib_sa_guid_info_rec_query);

-bool ib_sa_sendonly_fullmem_support(struct ib_sa_client *client,
-				    struct ib_device *device,
-				    u8 port_num)
-{
-	struct ib_sa_device *sa_dev = ib_get_client_data(device, &sa_client);
-	struct ib_sa_port *port;
-	bool ret = false;
-	unsigned long flags;
-
-	if (!sa_dev)
-		return ret;
-
-	port  = &sa_dev->port[port_num - sa_dev->start_port];
-
-	spin_lock_irqsave(&port->classport_lock, flags);
-	if ((port->classport_info.valid) &&
-	    (port->classport_info.data.type == RDMA_CLASS_PORT_INFO_IB))
-		ret = ib_get_cpi_capmask2(&port->classport_info.data.ib)
-			& IB_SA_CAP_MASK2_SENDONLY_FULL_MEM_SUPPORT;
-	spin_unlock_irqrestore(&port->classport_lock, flags);
-	return ret;
-}
-EXPORT_SYMBOL(ib_sa_sendonly_fullmem_support);
-
 struct ib_classport_info_context {
 	struct completion	done;
 	struct ib_sa_query	*sa_query;
Index: linux/drivers/infiniband/ulp/ipoib/ipoib.h
===================================================================
--- linux.orig/drivers/infiniband/ulp/ipoib/ipoib.h	2020-08-11 13:08:51.122523955 +0000
+++ linux/drivers/infiniband/ulp/ipoib/ipoib.h	2021-01-25 09:42:34.783587162 +0000
@@ -413,7 +413,6 @@ struct ipoib_dev_priv {
 	u64	hca_caps;
 	struct ipoib_ethtool_st ethtool;
 	unsigned int max_send_sge;
-	bool sm_fullmember_sendonly_support;
 	const struct net_device_ops	*rn_ops;
 };

Index: linux/drivers/infiniband/ulp/ipoib/ipoib_main.c
===================================================================
--- linux.orig/drivers/infiniband/ulp/ipoib/ipoib_main.c	2020-12-17 14:51:15.333206132 +0000
+++ linux/drivers/infiniband/ulp/ipoib/ipoib_main.c	2021-01-25 09:43:03.911673987 +0000
@@ -141,8 +141,6 @@ int ipoib_open(struct net_device *dev)

 	set_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags);

-	priv->sm_fullmember_sendonly_support = false;
-
 	if (ipoib_ib_dev_open(dev)) {
 		if (!test_bit(IPOIB_PKEY_ASSIGNED, &priv->flags))
 			return 0;
Index: linux/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
===================================================================
--- linux.orig/drivers/infiniband/ulp/ipoib/ipoib_multicast.c	2020-08-11 13:08:51.122523955 +0000
+++ linux/drivers/infiniband/ulp/ipoib/ipoib_multicast.c	2021-01-25 09:41:24.415377238 +0000
@@ -334,15 +334,6 @@ void ipoib_mcast_carrier_on_task(struct
 		return;
 	}
 	/*
-	 * Check if can send sendonly MCG's with sendonly-fullmember join state.
-	 * It done here after the successfully join to the broadcast group,
-	 * because the broadcast group must always be joined first and is always
-	 * re-joined if the SM changes substantially.
-	 */
-	priv->sm_fullmember_sendonly_support =
-		ib_sa_sendonly_fullmem_support(&ipoib_sa_client,
-					       priv->ca, priv->port);
-	/*
 	 * Take rtnl_lock to avoid racing with ipoib_stop() and
 	 * turning the carrier back on while a device is being
 	 * removed.  However, ipoib_stop() will attempt to flush
@@ -537,9 +528,7 @@ static int ipoib_mcast_join(struct net_d
 		 * most closely emulates the behavior, from a user space
 		 * application perspective, of Ethernet multicast operation.
 		 */
-		if (test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags) &&
-		    priv->sm_fullmember_sendonly_support)
-			/* SM supports sendonly-fullmember, otherwise fallback to full-member */
+		if (test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags))
 			rec.join_state = SENDONLY_FULLMEMBER_JOIN;
 	}
 	spin_unlock_irq(&priv->lock);

^ permalink raw reply	[flat|nested] 14+ messages in thread
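
The race described in the commit message above can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: `classport_cache`, `CAP_SENDONLY_FULLMEM`, `on_rereg()`, `on_classport_reply()` and `replay_rereg_cycle()` are hypothetical stand-ins, not kernel symbols, and locking is left out. It mirrors the shape of the removed check, where cached SM data that is merely invalid is reported as "sendonly not supported".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the capability bit the SM advertises. */
#define CAP_SENDONLY_FULLMEM (1u << 0)

/* Simplified model of the per-port cache of SM information. */
struct classport_cache {
	bool valid;        /* cleared on REREG, set again when the SM replies */
	uint32_t capmask2; /* capability bits from the last reply */
};

/* Same shape as the removed check: "invalid" is reported as "unsupported". */
static bool sendonly_supported(const struct classport_cache *c)
{
	return c->valid && (c->capmask2 & CAP_SENDONLY_FULLMEM) != 0;
}

/* REREG event: cached data is dropped while a new query is in flight. */
static void on_rereg(struct classport_cache *c)
{
	c->valid = false;
}

/* The SM's answer arrives some time later and refills the cache. */
static void on_classport_reply(struct classport_cache *c, uint32_t capmask2)
{
	c->capmask2 = capmask2;
	c->valid = true;
}

/* Replays one REREG cycle and checks what a caller sees at each step. */
static void replay_rereg_cycle(void)
{
	struct classport_cache cache = {
		.valid = true,
		.capmask2 = CAP_SENDONLY_FULLMEM,
	};

	assert(sendonly_supported(&cache));   /* steady state */
	on_rereg(&cache);
	assert(!sendonly_supported(&cache));  /* false negative in the window */
	on_classport_reply(&cache, CAP_SENDONLY_FULLMEM);
	assert(sendonly_supported(&cache));   /* correct again after the reply */
}
```

Calling `replay_rereg_cycle()` from a `main()` walks through the window: a join attempted between the REREG and the reply sees "unsupported" even though the SM's capabilities never changed. Removing the check means the request simply goes to the SM, which decides.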

* Re: [PATCH] Fix: Remove racy Subnet Manager sendonly join checks
  2021-01-25 11:28 [PATCH] Fix: Remove racy Subnet Manager sendonly join checks Christoph Lameter
@ 2021-01-25 11:44 ` Leon Romanovsky
  2021-01-28 14:03 ` Jason Gunthorpe
  1 sibling, 0 replies; 14+ messages in thread
From: Leon Romanovsky @ 2021-01-25 11:44 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Jason Gunthorpe, linux-rdma

On Mon, Jan 25, 2021 at 11:28:57AM +0000, Christoph Lameter wrote:
> On Sun, 24 Jan 2021, Leon Romanovsky wrote:
>
> > > Since all SMs out there have had support for sendonly join for years now
> > > we could just remove the check entirely. If there is an old grizzly SM out
> > > there then it would not process that join request and would return an
> > > error.
> >
> > I have no idea if it is possible; if yes, this will be the best solution.
>
> OK, here is a new patch:

I have taken it for testing. Thanks.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Fix: Remove racy Subnet Manager sendonly join checks
  2021-01-25 11:28 [PATCH] Fix: Remove racy Subnet Manager sendonly join checks Christoph Lameter
  2021-01-25 11:44 ` Leon Romanovsky
@ 2021-01-28 14:03 ` Jason Gunthorpe
  2021-01-28 14:21   ` Leon Romanovsky
  2021-01-28 14:34   ` Christoph Lameter
  1 sibling, 2 replies; 14+ messages in thread
From: Jason Gunthorpe @ 2021-01-28 14:03 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Leon Romanovsky, linux-rdma

On Mon, Jan 25, 2021 at 11:28:57AM +0000, Christoph Lameter wrote:
> On Sun, 24 Jan 2021, Leon Romanovsky wrote:
> 
> > > Since all SMs out there have had support for sendonly join for years now
> > > we could just remove the check entirely. If there is an old grizzly SM out
> > > there then it would not process that join request and would return an
> > > error.
> >
> > I have no idea if it is possible; if yes, this will be the best solution.
> 
> OK, here is a new patch:
> 
> From: Christoph Lameter <cl@linux.com>
> Subject: [PATCH] Fix: Remove racy Subnet Manager sendonly join checks

I need patches to be sent in a way that shows in patchworks to be
applied:

https://patchwork.kernel.org/project/linux-rdma/list/

> Index: linux/drivers/infiniband/core/cma.c
> ===================================================================
> +++ linux/drivers/infiniband/core/cma.c	2021-01-25 09:39:29.191032891 +0000
> @@ -4542,17 +4542,6 @@ static int cma_join_ib_multicast(struct

Also if patches aren't generated with 'git diff' then I won't fix any
minor conflicts :(

Thanks,
Jason

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Fix: Remove racy Subnet Manager sendonly join checks
  2021-01-28 14:03 ` Jason Gunthorpe
@ 2021-01-28 14:21   ` Leon Romanovsky
  2021-01-28 14:34   ` Christoph Lameter
  1 sibling, 0 replies; 14+ messages in thread
From: Leon Romanovsky @ 2021-01-28 14:21 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Christoph Lameter, linux-rdma

On Thu, Jan 28, 2021 at 10:03:35AM -0400, Jason Gunthorpe wrote:
> On Mon, Jan 25, 2021 at 11:28:57AM +0000, Christoph Lameter wrote:
> > On Sun, 24 Jan 2021, Leon Romanovsky wrote:
> >
> > > > Since all SMs out there have had support for sendonly join for years now
> > > > we could just remove the check entirely. If there is an old grizzly SM out
> > > > there then it would not process that join request and would return an
> > > > error.
> > >
> > > I have no idea if it is possible; if yes, this will be the best solution.
> >
> > OK, here is a new patch:
> >
> > From: Christoph Lameter <cl@linux.com>
> > Subject: [PATCH] Fix: Remove racy Subnet Manager sendonly join checks
>
> I need patches to be sent in a way that shows in patchworks to be
> applied:
>
> https://patchwork.kernel.org/project/linux-rdma/list/
>
> > Index: linux/drivers/infiniband/core/cma.c
> > ===================================================================
> > +++ linux/drivers/infiniband/core/cma.c	2021-01-25 09:39:29.191032891 +0000
> > @@ -4542,17 +4542,6 @@ static int cma_join_ib_multicast(struct
>
> Also if patches aren't generated with 'git diff' then I won't fix any
> minor conflicts :(

My mutt2git script picked this patch correctly and without conflicts :).
Anyway, from our (MLNX testing) perspective this patch is OK.

Thanks

>
> Thanks,
> Jason

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Fix: Remove racy Subnet Manager sendonly join checks
  2021-01-28 14:03 ` Jason Gunthorpe
  2021-01-28 14:21   ` Leon Romanovsky
@ 2021-01-28 14:34   ` Christoph Lameter
  2021-01-28 14:44     ` Jason Gunthorpe
  1 sibling, 1 reply; 14+ messages in thread
From: Christoph Lameter @ 2021-01-28 14:34 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Leon Romanovsky, linux-rdma

On Thu, 28 Jan 2021, Jason Gunthorpe wrote:

> I need patches to be sent in a way that shows in patchworks to be
> applied:
>
> https://patchwork.kernel.org/project/linux-rdma/list/


I see it in patchworks:

https://patchwork.kernel.org/project/linux-rdma/patch/alpine.DEB.2.22.394.2101251126090.344695@www.lameter.com/

> > Index: linux/drivers/infiniband/core/cma.c
> > ===================================================================
> > +++ linux/drivers/infiniband/core/cma.c	2021-01-25 09:39:29.191032891 +0000
> > @@ -4542,17 +4542,6 @@ static int cma_join_ib_multicast(struct
>
> Also if patches aren't generated with 'git diff' then I won't fix any
> minor conflicts :(

Well it was quilt ...... Do I need to put it into a git tree somewhere?


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Fix: Remove racy Subnet Manager sendonly join checks
  2021-01-28 14:34   ` Christoph Lameter
@ 2021-01-28 14:44     ` Jason Gunthorpe
  2021-01-28 14:58       ` Christoph Lameter
  0 siblings, 1 reply; 14+ messages in thread
From: Jason Gunthorpe @ 2021-01-28 14:44 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Leon Romanovsky, linux-rdma

On Thu, Jan 28, 2021 at 02:34:45PM +0000, Christoph Lameter wrote:
> On Thu, 28 Jan 2021, Jason Gunthorpe wrote:
> 
> > I need patches to be sent in a way that shows in patchworks to be
> > applied:
> >
> > https://patchwork.kernel.org/project/linux-rdma/list/
> 
> 
> I see it in patchworks:
> 
> https://patchwork.kernel.org/project/linux-rdma/patch/alpine.DEB.2.22.394.2101251126090.344695@www.lameter.com/

It is not in the right format in patchwork; I get this mess when
applying it:

commit 9215f573b2ce9233b6d99d7b9b45bbcf3b2d9d90 (HEAD -> k.o/for-next)
Author: Christoph Lameter <cl@linux.com>
Date:   Mon Jan 25 11:28:57 2021 +0000

    Fix: Remove racy Subnet Manager sendonly join checks
    
    On Sun, 24 Jan 2021, Leon Romanovsky wrote:
    
    > > Since all SMs out there have had support for sendonly join for years now
    > > we could just remove the check entirely. If there is an old grizzly SM out
    > > there then it would not process that join request and would return an
    > > error.
    >
    > I have no idea if it is possible; if yes, this will be the best solution.
    
    OK, here is a new patch:
    
    From: Christoph Lameter <cl@linux.com>
    Subject: [PATCH] Fix: Remove racy Subnet Manager sendonly join checks
    
    When a system receives a REREG event from the SM, then the SM information in
    the kernel is marked as invalid and a request is sent to the SM to update
    the information. The SM information is invalid in that time period.
    
    However, receiving a REREG also occurs simultaneously in user space
    applications that are now trying to rejoin the multicast groups. Some of those
    may be sendonly multicast groups which are then failing.
    
    If the SM information is invalid then ib_sa_sendonly_fullmem_support()
    returns false. That is wrong because it just means that we do not know
    yet if the potentially new SM supports sendonly joins.
    
    Sendonly join was introduced in 2015 and all the Subnet managers have
    supported it ever since. So there is no point in checking if a subnet
    manager supports it.
    
    Should an old opensm get a request for a sendonly join then the request
    will fail. The code that is removed here accommodated that situation
    and fell back to a full join.
    
    Falling back to a full join is problematic in itself. The reason to
    use the sendonly join was to reduce the traffic on the InfiniBand
    fabric; otherwise one could have just stayed with the regular join.
    So this patch may cause users of very old opensms to discover that
    lots of traffic needlessly crosses their IB fabrics.
    
    Signed-off-by: Christoph Lameter <cl@linux.com>
    
    Link: https://lore.kernel.org/r/alpine.DEB.2.22.394.2101251126090.344695@www.lameter.com
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

> > > Index: linux/drivers/infiniband/core/cma.c
> > > ===================================================================
> > > +++ linux/drivers/infiniband/core/cma.c	2021-01-25 09:39:29.191032891 +0000
> > > @@ -4542,17 +4542,6 @@ static int cma_join_ib_multicast(struct
> >
> > Also if patches aren't generated with 'git diff' then I won't fix any
> > minor conflicts :(
> 
> Well it was quilt ...... Do I need to put it into a git tree somewhere?

If you are doing this a lot get a quilt that can generate git diff
format output.

https://lists.gnu.org/archive/html/quilt-dev/2015-06/msg00002.html

Jason


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Fix: Remove racy Subnet Manager sendonly join checks
  2021-01-28 14:44     ` Jason Gunthorpe
@ 2021-01-28 14:58       ` Christoph Lameter
  2021-01-28 18:11         ` Jason Gunthorpe
  0 siblings, 1 reply; 14+ messages in thread
From: Christoph Lameter @ 2021-01-28 14:58 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Leon Romanovsky, linux-rdma

On Thu, 28 Jan 2021, Jason Gunthorpe wrote:

> > Well it was quilt ...... Do I need to put it into a git tree somewhere?
>
> If you are doing this a lot get a quilt that can generate git diff
> format output.
>
> https://lists.gnu.org/archive/html/quilt-dev/2015-06/msg00002.html

Sadly that patch was never merged.

Will this do it?


commit 64e734c38f509d591073fc1e1db3caa42be3b874
Author: Christoph Lameter <cl@linux.com>
Date:   Thu Jan 28 14:55:36 2021 +0000

    Fix: Remove racy Subnet Manager sendonly join checks

    When a system receives a REREG event from the SM, then the SM information in
    the kernel is marked as invalid and a request is sent to the SM to update
    the information. The SM information is invalid in that time period.

    However, receiving a REREG also occurs simultaneously in user space
    applications that are now trying to rejoin the multicast groups. Some of those
    may be sendonly multicast groups which are then failing.

    If the SM information is invalid then ib_sa_sendonly_fullmem_support()
    returns false. That is wrong because it just means that we do not know
    yet if the potentially new SM supports sendonly joins.

    Sendonly join was introduced in 2015 and all the Subnet managers have
    supported it ever since. So there is no point in checking if a subnet
    manager supports it.

    Should an old opensm get a request for a sendonly join then the request
    will fail. The code that is removed here accommodated that situation
    and fell back to a full join.

    Falling back to a full join is problematic in itself. The reason to
    use the sendonly join was to reduce the traffic on the InfiniBand
    fabric; otherwise one could have just stayed with the regular join.
    So this patch may cause users of very old opensms to discover that
    lots of traffic needlessly crosses their IB fabrics.

    Signed-off-by: Christoph Lameter <cl@linux.com>

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index c51b84b2d2f3..58ee7004c8d8 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -4542,17 +4542,6 @@ static int cma_join_ib_multicast(struct rdma_id_private *id_priv,
 	rec.pkey = cpu_to_be16(ib_addr_get_pkey(dev_addr));
 	rec.join_state = mc->join_state;

-	if ((rec.join_state == BIT(SENDONLY_FULLMEMBER_JOIN)) &&
-	    (!ib_sa_sendonly_fullmem_support(&sa_client,
-					     id_priv->id.device,
-					     id_priv->id.port_num))) {
-		dev_warn(
-			&id_priv->id.device->dev,
-			"RDMA CM: port %u Unable to multicast join: SM doesn't support Send Only Full Member option\n",
-			id_priv->id.port_num);
-		return -EOPNOTSUPP;
-	}
-
 	comp_mask = IB_SA_MCMEMBER_REC_MGID | IB_SA_MCMEMBER_REC_PORT_GID |
 		    IB_SA_MCMEMBER_REC_PKEY | IB_SA_MCMEMBER_REC_JOIN_STATE |
 		    IB_SA_MCMEMBER_REC_QKEY | IB_SA_MCMEMBER_REC_SL |
diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
index 89a831fa1885..921b097d6035 100644
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -1951,30 +1951,6 @@ int ib_sa_guid_info_rec_query(struct ib_sa_client *client,
 }
 EXPORT_SYMBOL(ib_sa_guid_info_rec_query);

-bool ib_sa_sendonly_fullmem_support(struct ib_sa_client *client,
-				    struct ib_device *device,
-				    u8 port_num)
-{
-	struct ib_sa_device *sa_dev = ib_get_client_data(device, &sa_client);
-	struct ib_sa_port *port;
-	bool ret = false;
-	unsigned long flags;
-
-	if (!sa_dev)
-		return ret;
-
-	port  = &sa_dev->port[port_num - sa_dev->start_port];
-
-	spin_lock_irqsave(&port->classport_lock, flags);
-	if ((port->classport_info.valid) &&
-	    (port->classport_info.data.type == RDMA_CLASS_PORT_INFO_IB))
-		ret = ib_get_cpi_capmask2(&port->classport_info.data.ib)
-			& IB_SA_CAP_MASK2_SENDONLY_FULL_MEM_SUPPORT;
-	spin_unlock_irqrestore(&port->classport_lock, flags);
-	return ret;
-}
-EXPORT_SYMBOL(ib_sa_sendonly_fullmem_support);
-
 struct ib_classport_info_context {
 	struct completion	done;
 	struct ib_sa_query	*sa_query;
diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index 3440dc48d02c..179ff1d068e5 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -413,7 +413,6 @@ struct ipoib_dev_priv {
 	u64	hca_caps;
 	struct ipoib_ethtool_st ethtool;
 	unsigned int max_send_sge;
-	bool sm_fullmember_sendonly_support;
 	const struct net_device_ops	*rn_ops;
 };

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index a6f413491321..e16b40c09f82 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -141,8 +141,6 @@ int ipoib_open(struct net_device *dev)

 	set_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags);

-	priv->sm_fullmember_sendonly_support = false;
-
 	if (ipoib_ib_dev_open(dev)) {
 		if (!test_bit(IPOIB_PKEY_ASSIGNED, &priv->flags))
 			return 0;
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
index 86e4ed64e4e2..0a444ed11818 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
@@ -333,15 +333,6 @@ void ipoib_mcast_carrier_on_task(struct work_struct *work)
 		ipoib_dbg(priv, "Keeping carrier off until IB port is active\n");
 		return;
 	}
-	/*
-	 * Check if can send sendonly MCG's with sendonly-fullmember join state.
-	 * It done here after the successfully join to the broadcast group,
-	 * because the broadcast group must always be joined first and is always
-	 * re-joined if the SM changes substantially.
-	 */
-	priv->sm_fullmember_sendonly_support =
-		ib_sa_sendonly_fullmem_support(&ipoib_sa_client,
-					       priv->ca, priv->port);
 	/*
 	 * Take rtnl_lock to avoid racing with ipoib_stop() and
 	 * turning the carrier back on while a device is being
@@ -537,9 +528,7 @@ static int ipoib_mcast_join(struct net_device *dev, struct ipoib_mcast *mcast)
 		 * most closely emulates the behavior, from a user space
 		 * application perspective, of Ethernet multicast operation.
 		 */
-		if (test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags) &&
-		    priv->sm_fullmember_sendonly_support)
-			/* SM supports sendonly-fullmember, otherwise fallback to full-member */
+		if (test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags))
 			rec.join_state = SENDONLY_FULLMEMBER_JOIN;
 	}
 	spin_unlock_irq(&priv->lock);

^ permalink raw reply related	[flat|nested] 14+ messages in thread
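
The behavioural change in the last ipoib hunk above can be sketched as two small decision functions. This is an illustrative userspace model, not the driver code: `enum join_state`, `pick_join_state_old()`, `pick_join_state_new()` and `check_stale_cache_case()` are hypothetical names, and the enum values are placeholders rather than the kernel's join-state encoding.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two join states discussed in the thread. */
enum join_state {
	JOIN_FULLMEMBER,
	JOIN_SENDONLY_FULLMEMBER,
};

/*
 * Modelled pre-patch behaviour: a send-only group gets a sendonly join
 * only if the cached SM capability flag says so; otherwise it quietly
 * becomes a full join.
 */
static enum join_state pick_join_state_old(bool sendonly_group,
					   bool cached_sm_support)
{
	if (sendonly_group && cached_sm_support)
		return JOIN_SENDONLY_FULLMEMBER;
	return JOIN_FULLMEMBER;
}

/*
 * Modelled post-patch behaviour: the group's own flag decides, and the
 * SM accepts or rejects the request.
 */
static enum join_state pick_join_state_new(bool sendonly_group)
{
	return sendonly_group ? JOIN_SENDONLY_FULLMEMBER : JOIN_FULLMEMBER;
}

/* Exercises the case where the cached flag is stale (false). */
static void check_stale_cache_case(void)
{
	/* Old logic: a stale flag turns a send-only group into a full join. */
	assert(pick_join_state_old(true, false) == JOIN_FULLMEMBER);
	/* New logic: the request stays sendonly regardless of the cache. */
	assert(pick_join_state_new(true) == JOIN_SENDONLY_FULLMEMBER);
}
```

The stale-flag case is the one the commit message calls problematic: the old fallback produced a full join, which brings the group's traffic to a port that only wanted to send.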

* Re: [PATCH] Fix: Remove racy Subnet Manager sendonly join checks
  2021-01-28 14:58       ` Christoph Lameter
@ 2021-01-28 18:11         ` Jason Gunthorpe
  0 siblings, 0 replies; 14+ messages in thread
From: Jason Gunthorpe @ 2021-01-28 18:11 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Leon Romanovsky, linux-rdma

On Thu, Jan 28, 2021 at 02:58:01PM +0000, Christoph Lameter wrote:
> On Thu, 28 Jan 2021, Jason Gunthorpe wrote:
> 
> > > Well it was quilt ...... Do I need to put it into a git tree somewhere?
> >
> > If you are doing this a lot get a quilt that can generate git diff
> > format output.
> >
> > https://lists.gnu.org/archive/html/quilt-dev/2015-06/msg00002.html
> 
> Sadly that patch was never merged.
> 
> Will this do it?

Patchworks ignored it

> 
> commit 64e734c38f509d591073fc1e1db3caa42be3b874
> Author: Christoph Lameter <cl@linux.com>
> Date:   Thu Jan 28 14:55:36 2021 +0000
> 
>     Fix: Remove racy Subnet Manager sendonly join checks
> 
>     When a system receives a REREG event from the SM, then the SM information in
>     the kernel is marked as invalid and a request is sent to the SM to update
>     the information. The SM information is invalid in that time period.
> 
>     However, receiving a REREG also occurs simultaneously in user space
>     applications that are now trying to rejoin the multicast groups. Some of those
>     may be sendonly multicast groups which are then failing.
> 
>     If the SM information is invalid then ib_sa_sendonly_fullmem_support()
>     returns false. That is wrong because it just means that we do not know
>     yet if the potentially new SM supports sendonly joins.
> 
>     Sendonly join was introduced in 2015 and all the Subnet managers have
>     supported it ever since. So there is no point in checking if a subnet
>     manager supports it.
> 
>     Should an old opensm get a request for a sendonly join then the request
>     will fail. The code that is removed here accommodated that situation
>     and fell back to a full join.
> 
>     Falling back to a full join is problematic in itself. The reason to
>     use the sendonly join was to reduce the traffic on the InfiniBand
>     fabric; otherwise one could have just stayed with the regular join.
>     So this patch may cause users of very old opensms to discover that
>     lots of traffic needlessly crosses their IB fabrics.
> 
>     Signed-off-by: Christoph Lameter <cl@linux.com>

This is 'git show' output, not 'git format-patch' output; the tooling
requires 'git format-patch' output, preferably in a clean new email so
that it is reliably captured by patchworks.

> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
> index c51b84b2d2f3..58ee7004c8d8 100644
> +++ b/drivers/infiniband/core/cma.c

But this part is OK now; the index line is what allows conflicts to be
resolved easily.

Jason

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Fix: Remove racy Subnet Manager sendonly join checks
  2021-02-10  9:31   ` Christoph Lameter
  2021-02-10 13:03     ` Jason Gunthorpe
@ 2021-02-12 22:16     ` Dennis Dalessandro
  1 sibling, 0 replies; 14+ messages in thread
From: Dennis Dalessandro @ 2021-02-12 22:16 UTC (permalink / raw)
  To: Christoph Lameter, Jason Gunthorpe; +Cc: linux-rdma, Leon Romanovsky



On 2/10/2021 4:31 AM, Christoph Lameter wrote:
> On Tue, 9 Feb 2021, Jason Gunthorpe wrote:
> 
>> This one got spam filtered and didn't make it to the list:
>>
>> Received-SPF: SoftFail (hqemgatev14.nvidia.com: domain of
>>          cl@linux.com is inclined to not designate 3.19.106.255 as
>>          permitted sender) identity=mailfrom; client-ip=3.19.106.255;
>>          receiver=hqemgatev14.nvidia.com;
>>          envelope-from="cl@linux.com"; x-sender="cl@linux.com";
>>          x-conformance=spf_only; x-record-type="v=spf1"
>>
>> Also the extra From/Date/Subject ended up in the commit message
> 
> Yes the Linux Foundation guys are not willing to address this issue in any
> way. I may have to give up my linux.com email address.
> 
>> I fixed it all up, applied to for-next
> 
> Thank you.
> 
>> It looks like OPA will also suffer this race (opa_pr_query_possible),
>> maybe it is a little less likely since it will be driven by PR queries
>> not broadcast joins.
>>
>> But the same logic is likely true there, I'd be surprised if OPA
>> fabrics are not running a capable OPA SM at this point.
> 
> There is also another potentially racy check in there for OPA in regards
> to the support of path records?
> 
> static bool ib_sa_opa_pathrecord_support(struct ib_sa_client *client,
>                                           struct ib_sa_device *sa_dev,
>                                           u8 port_num)
> {
>          struct ib_sa_port *port;
>          unsigned long flags;
>          bool ret = false;
> 
>          port = &sa_dev->port[port_num - sa_dev->start_port];
>          spin_lock_irqsave(&port->classport_lock, flags);
>          if (!port->classport_info.valid)
>                  goto ret;
> 
>          if (port->classport_info.data.type == RDMA_CLASS_PORT_INFO_OPA)
>                  ret = opa_get_cpi_capmask2(&port->classport_info.data.opa) &
>                          OPA_CLASS_PORT_INFO_PR_SUPPORT;
> ret:
>          spin_unlock_irqrestore(&port->classport_lock, flags);
>          return ret;
> }
> 

Thanks for pointing this out. We'll look into it.

-Denny

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Fix: Remove racy Subnet Manager sendonly join checks
  2021-02-09 19:15 ` Jason Gunthorpe
  2021-02-10  9:31   ` Christoph Lameter
@ 2021-02-12 22:13   ` Dennis Dalessandro
  1 sibling, 0 replies; 14+ messages in thread
From: Dennis Dalessandro @ 2021-02-12 22:13 UTC (permalink / raw)
  To: Jason Gunthorpe, Christoph Lameter; +Cc: linux-rdma, Leon Romanovsky

On 2/9/2021 2:15 PM, Jason Gunthorpe wrote:
> On Thu, Jan 28, 2021 at 06:46:47PM +0000, Christoph Lameter wrote:
>>  From 64e734c38f509d591073fc1e1db3caa42be3b874 Mon Sep 17 00:00:00 2001
>> From: Christoph Lameter <cl@linux.com>
>> Date: Thu, 28 Jan 2021 14:55:36 +0000
>> Subject: [PATCH] Fix: Remove racy Subnet Manager sendonly join checks
>>
>> When a system receives a REREG event from the SM, then the SM information in
>> the kernel is marked as invalid and a request is sent to the SM to update
>> the information. The SM information is invalid in that time period.
>>
>> However, receiving a REREG also occurs simultaneously in user space
>> applications that are now trying to rejoin the multicast groups. Some of those
>> may be sendonly multicast groups which are then failing.
>>
>> If the SM information is invalid then ib_sa_sendonly_fullmem_support()
>> returns false. That is wrong because it just means that we do not know
>> yet if the potentially new SM supports sendonly joins.
>>
>> Sendonly join was introduced in 2015 and all the Subnet managers have
>> supported it ever since. So there is no point in checking if a subnet
>> manager supports it.
>>
>> Should an old opensm get a request for a sendonly join then the request
>> will fail. The code that is removed here accommodated that situation
>> and fell back to a full join.
>>
>> Falling back to a full join is problematic in itself. The reason to
>> use the sendonly join was to reduce the traffic on the InfiniBand
>> fabric; otherwise one could have just stayed with the regular join.
>> So this patch may cause users of very old opensms to discover that
>> lots of traffic needlessly crosses their IB fabrics.
>>
>> Signed-off-by: Christoph Lameter <cl@linux.com>
>> ---
>>   drivers/infiniband/core/cma.c                 | 11 ---------
>>   drivers/infiniband/core/sa_query.c            | 24 -------------------
>>   drivers/infiniband/ulp/ipoib/ipoib.h          |  1 -
>>   drivers/infiniband/ulp/ipoib/ipoib_main.c     |  2 --
>>   .../infiniband/ulp/ipoib/ipoib_multicast.c    | 13 +---------
>>   5 files changed, 1 insertion(+), 50 deletions(-)
> 
> This one got spam filtered and didn't make it to the list:
> 
> Received-SPF: SoftFail (hqemgatev14.nvidia.com: domain of
>          cl@linux.com is inclined to not designate 3.19.106.255 as
>          permitted sender) identity=mailfrom; client-ip=3.19.106.255;
>          receiver=hqemgatev14.nvidia.com;
>          envelope-from="cl@linux.com"; x-sender="cl@linux.com";
>          x-conformance=spf_only; x-record-type="v=spf1"
> 
> Also the extra From/Date/Subject ended up in the commit message
> 
> I fixed it all up, applied to for-next
> 
> It looks like OPA will also suffer this race (opa_pr_query_possible),
> maybe it is a little less likely since it will be driven by PR queries
> not broadcast joins.
> 
> But the same logic is likely true there, I'd be surprised if OPA
> fabrics are not running a capable OPA SM at this point.

OPA supports SENDONLY joins.

-Denny


* Re: [PATCH] Fix: Remove racy Subnet Manager sendonly join checks
  2021-02-10 13:03     ` Jason Gunthorpe
@ 2021-02-10 18:51       ` Christoph Lameter
  0 siblings, 0 replies; 14+ messages in thread
From: Christoph Lameter @ 2021-02-10 18:51 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: linux-rdma, Leon Romanovsky

On Wed, 10 Feb 2021, Jason Gunthorpe wrote:

> > Yes the Linux Foundation guys are not willing to address this issue in any
> > way. I may have to give up my linux.com email address.
>
> It looks like you have to send linux.com emails through their SMTP relay,
> just like kernel.org?

No they do not offer an SMTP relay. That would actually fix the issue.

> I have an exim config that auto-routes to an authenticated smarthost
> based on the From email address if that would help you

I am running a mailer too, but that does not address the issue of not being
able to set up the SPF records for me on linux.com.




* Re: [PATCH] Fix: Remove racy Subnet Manager sendonly join checks
  2021-02-10  9:31   ` Christoph Lameter
@ 2021-02-10 13:03     ` Jason Gunthorpe
  2021-02-10 18:51       ` Christoph Lameter
  2021-02-12 22:16     ` Dennis Dalessandro
  1 sibling, 1 reply; 14+ messages in thread
From: Jason Gunthorpe @ 2021-02-10 13:03 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linux-rdma, Leon Romanovsky

On Wed, Feb 10, 2021 at 09:31:32AM +0000, Christoph Lameter wrote:
> On Tue, 9 Feb 2021, Jason Gunthorpe wrote:
> 
> > This one got spam filtered and didn't make it to the list:
> >
> > Received-SPF: SoftFail (hqemgatev14.nvidia.com: domain of
> >         cl@linux.com is inclined to not designate 3.19.106.255 as
> >         permitted sender) identity=mailfrom; client-ip=3.19.106.255;
> >         receiver=hqemgatev14.nvidia.com;
> >         envelope-from="cl@linux.com"; x-sender="cl@linux.com";
> >         x-conformance=spf_only; x-record-type="v=spf1"
> >
> > Also the extra From/Date/Subject ended up in the commit message
> 
> Yes the Linux Foundation guys are not willing to address this issue in any
> way. I may have to give up my linux.com email address.

It looks like you have to send linux.com emails through their SMTP relay,
just like kernel.org?

I have an exim config that auto-routes to an authenticated smarthost
based on the From email address if that would help you

> There is also another potentially racy check in there for OPA in regards
> to the support of path records?

Looks like

Jason


* Re: [PATCH] Fix: Remove racy Subnet Manager sendonly join checks
  2021-02-09 19:15 ` Jason Gunthorpe
@ 2021-02-10  9:31   ` Christoph Lameter
  2021-02-10 13:03     ` Jason Gunthorpe
  2021-02-12 22:16     ` Dennis Dalessandro
  2021-02-12 22:13   ` Dennis Dalessandro
  1 sibling, 2 replies; 14+ messages in thread
From: Christoph Lameter @ 2021-02-10  9:31 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: linux-rdma, Leon Romanovsky

On Tue, 9 Feb 2021, Jason Gunthorpe wrote:

> This one got spam filtered and didn't make it to the list:
>
> Received-SPF: SoftFail (hqemgatev14.nvidia.com: domain of
>         cl@linux.com is inclined to not designate 3.19.106.255 as
>         permitted sender) identity=mailfrom; client-ip=3.19.106.255;
>         receiver=hqemgatev14.nvidia.com;
>         envelope-from="cl@linux.com"; x-sender="cl@linux.com";
>         x-conformance=spf_only; x-record-type="v=spf1"
>
> Also the extra From/Date/Subject ended up in the commit message

Yes the Linux Foundation guys are not willing to address this issue in any
way. I may have to give up my linux.com email address.

> I fixed it all up, applied to for-next

Thank you.

> It looks like OPA will also suffer this race (opa_pr_query_possible),
> maybe it is a little less likely since it will be driven by PR queries
> not broadcast joins.
>
> But the same logic is likely true there, I'd be surprised if OPA
> fabrics are not running a capable OPA SM at this point.

There is also another potentially racy check in there for OPA in regards
to the support of path records?

static bool ib_sa_opa_pathrecord_support(struct ib_sa_client *client,
                                         struct ib_sa_device *sa_dev,
                                         u8 port_num)
{
        struct ib_sa_port *port;
        unsigned long flags;
        bool ret = false;

        port = &sa_dev->port[port_num - sa_dev->start_port];
        spin_lock_irqsave(&port->classport_lock, flags);
        if (!port->classport_info.valid)
                goto ret;

        if (port->classport_info.data.type == RDMA_CLASS_PORT_INFO_OPA)
                ret = opa_get_cpi_capmask2(&port->classport_info.data.opa) &
                        OPA_CLASS_PORT_INFO_PR_SUPPORT;
ret:
        spin_unlock_irqrestore(&port->classport_lock, flags);
        return ret;
}
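
To make the window concrete, here is a minimal userspace sketch of the same
pattern. It is not kernel code: every mock_* name and the MOCK_PR_SUPPORT bit
are hypothetical stand-ins invented for illustration, and locking is left out.
It only shows that a check of this shape reports "not supported" between a
REREG and the SM's reply, even when the SM does support the feature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability bit; not the real OPA_CLASS_PORT_INFO_PR_SUPPORT. */
#define MOCK_PR_SUPPORT (1u << 0)

/* Hypothetical stand-in for the cached ClassPortInfo of one port. */
struct mock_classport_info {
        bool valid;        /* cleared while the SM information is refreshed */
        uint32_t capmask2; /* capabilities last reported by the SM */
};

/*
 * Same shape as the check above: "information not valid yet" and
 * "feature not supported" both come back as false.
 */
static bool mock_pathrecord_support(const struct mock_classport_info *cpi)
{
        if (!cpi->valid)
                return false;
        return (cpi->capmask2 & MOCK_PR_SUPPORT) != 0;
}

/* Models a REREG event: the cached data is invalid until the SM answers. */
static void mock_rereg(struct mock_classport_info *cpi)
{
        assert(cpi != NULL);
        cpi->valid = false;
}

/* Models the arrival of the SM's reply with fresh capability data. */
static void mock_sm_reply(struct mock_classport_info *cpi, uint32_t capmask2)
{
        assert(cpi != NULL);
        cpi->capmask2 = capmask2;
        cpi->valid = true;
}
```

A caller that runs mock_pathrecord_support() after mock_rereg() but before
mock_sm_reply() sees false, which is the race being discussed.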



* Re: [PATCH] Fix: Remove racy Subnet Manager sendonly join checks
       [not found] <alpine.DEB.2.22.394.2101281845160.13303@www.lameter.com>
@ 2021-02-09 19:15 ` Jason Gunthorpe
  2021-02-10  9:31   ` Christoph Lameter
  2021-02-12 22:13   ` Dennis Dalessandro
  0 siblings, 2 replies; 14+ messages in thread
From: Jason Gunthorpe @ 2021-02-09 19:15 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: linux-rdma, Leon Romanovsky

On Thu, Jan 28, 2021 at 06:46:47PM +0000, Christoph Lameter wrote:
> From 64e734c38f509d591073fc1e1db3caa42be3b874 Mon Sep 17 00:00:00 2001
> From: Christoph Lameter <cl@linux.com>
> Date: Thu, 28 Jan 2021 14:55:36 +0000
> Subject: [PATCH] Fix: Remove racy Subnet Manager sendonly join checks
> 
> When a system receives a REREG event from the SM, then the SM information in
> the kernel is marked as invalid and a request is sent to the SM to update
> the information. The SM information is invalid in that time period.
> 
> However, receiving a REREG also occurs simultaneously in user space
> applications that are now trying to rejoin the multicast groups. Some of those
> may be sendonly multicast groups which are then failing.
> 
> If the SM information is invalid then ib_sa_sendonly_fullmem_support()
> returns false. That is wrong because it just means that we do not know
> yet if the potentially new SM supports sendonly joins.
> 
> Sendonly join was introduced in 2015 and all the Subnet managers have
> supported it ever since. So there is no point in checking if a subnet
> manager supports it.
> 
> Should an old opensm get a request for a sendonly join then the request
> will fail. The code that is removed here accommodated that situation
> and fell back to a full join.
> 
> Falling back to a full join is problematic in itself. The reason to
> use the sendonly join was to reduce the traffic on the Infiniband
> fabric otherwise one could have just stayed with the regular join.
> So this patch may cause users of very old opensms to discover that
> lots of traffic needlessly crosses their IB fabrics.
> 
> Signed-off-by: Christoph Lameter <cl@linux.com>
> ---
>  drivers/infiniband/core/cma.c                 | 11 ---------
>  drivers/infiniband/core/sa_query.c            | 24 -------------------
>  drivers/infiniband/ulp/ipoib/ipoib.h          |  1 -
>  drivers/infiniband/ulp/ipoib/ipoib_main.c     |  2 --
>  .../infiniband/ulp/ipoib/ipoib_multicast.c    | 13 +---------
>  5 files changed, 1 insertion(+), 50 deletions(-)
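
The behaviour change described in the commit message can be sketched roughly
as follows. This is only an illustration under assumptions: the mock_* names
and the enum are hypothetical and do not appear in the patch, and the "before"
function models only the fall back to a full join that the message describes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical join states, named for illustration only. */
enum mock_join_state {
        MOCK_JOIN_FULL_MEMBER,
        MOCK_JOIN_SENDONLY_FULL_MEMBER,
};

/*
 * Before: a false result from the support check downgraded the join.
 * The check also returned false while the SM information was invalid,
 * so a REREG could turn a sendonly join into a full join.
 */
static enum mock_join_state mock_join_state_before(bool want_sendonly,
                                                   bool sm_check_result)
{
        if (want_sendonly && sm_check_result)
                return MOCK_JOIN_SENDONLY_FULL_MEMBER;
        return MOCK_JOIN_FULL_MEMBER;
}

/*
 * After: the check is gone. A sendonly join is always requested as such,
 * and an SM that cannot handle it fails the request itself.
 */
static enum mock_join_state mock_join_state_after(bool want_sendonly)
{
        return want_sendonly ? MOCK_JOIN_SENDONLY_FULL_MEMBER
                             : MOCK_JOIN_FULL_MEMBER;
}
```

With sm_check_result false only because the cached information was invalid,
the "before" variant picks a full join while the "after" variant does not.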

This one got spam filtered and didn't make it to the list:

Received-SPF: SoftFail (hqemgatev14.nvidia.com: domain of
        cl@linux.com is inclined to not designate 3.19.106.255 as
        permitted sender) identity=mailfrom; client-ip=3.19.106.255;
        receiver=hqemgatev14.nvidia.com;
        envelope-from="cl@linux.com"; x-sender="cl@linux.com";
        x-conformance=spf_only; x-record-type="v=spf1"

Also the extra From/Date/Subject ended up in the commit message

I fixed it all up, applied to for-next

It looks like OPA will also suffer this race (opa_pr_query_possible),
maybe it is a little less likely since it will be driven by PR queries
not broadcast joins.

But the same logic is likely true there, I'd be surprised if OPA
fabrics are not running a capable OPA SM at this point.

Jason


Thread overview: 14+ messages
2021-01-25 11:28 [PATCH] Fix: Remove racy Subnet Manager sendonly join checks Christoph Lameter
2021-01-25 11:44 ` Leon Romanovsky
2021-01-28 14:03 ` Jason Gunthorpe
2021-01-28 14:21   ` Leon Romanovsky
2021-01-28 14:34   ` Christoph Lameter
2021-01-28 14:44     ` Jason Gunthorpe
2021-01-28 14:58       ` Christoph Lameter
2021-01-28 18:11         ` Jason Gunthorpe
     [not found] <alpine.DEB.2.22.394.2101281845160.13303@www.lameter.com>
2021-02-09 19:15 ` Jason Gunthorpe
2021-02-10  9:31   ` Christoph Lameter
2021-02-10 13:03     ` Jason Gunthorpe
2021-02-10 18:51       ` Christoph Lameter
2021-02-12 22:16     ` Dennis Dalessandro
2021-02-12 22:13   ` Dennis Dalessandro
