* [PATCH] Fix sendonly join going away after Reregister event
@ 2021-01-21 13:24 Christoph Lameter
  2021-01-21 16:11 ` Leon Romanovsky
  0 siblings, 1 reply; 4+ messages in thread
From: Christoph Lameter @ 2021-01-21 13:24 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: linux-rdma, Leon Romanovsky

From: Christoph Lameter <cl@linux.com>
Subject: [PATCH] Fix sendonly join going away after Reregister event

When a server receives a REREG event then the SM information in
the kernel is marked as invalid and a request is sent to the SM to update
the information.

However, user space applications also receive the REREG event and then
try to rejoin their multicast groups.

If the SM information is invalid then ib_sa_sendonly_fullmem_support()
returns false. That is wrong because it just means that we do not know
yet if the potentially new SM supports sendonly joins. It does not mean
that the SM does not support sendonly joins.

This patch simply attempts to wait until the SM information is updated
and the determination can be made.

The code has not been tested but compiles fine.
I am not sure if it is good to do an msleep here.

Signed-off-by: Christoph Lameter <cl@linux.com>

Index: linux/drivers/infiniband/core/sa_query.c
===================================================================
--- linux.orig/drivers/infiniband/core/sa_query.c	2020-12-17 14:51:15.301206041 +0000
+++ linux/drivers/infiniband/core/sa_query.c	2021-01-21 12:52:53.577943481 +0000
@@ -1963,11 +1963,19 @@ bool ib_sa_sendonly_fullmem_support(stru
 	if (!sa_dev)
 		return ret;

+redo:
 	port  = &sa_dev->port[port_num - sa_dev->start_port];

+	while (!port->classport_info.valid)
+		msleep(100);
+
 	spin_lock_irqsave(&port->classport_lock, flags);
-	if ((port->classport_info.valid) &&
-	    (port->classport_info.data.type == RDMA_CLASS_PORT_INFO_IB))
+	if (!port->classport_info.valid) {
+		/* Need to wait until the SM data is available */
+		spin_unlock_irqrestore(&port->classport_lock, flags);
+		goto redo;
+	}
+	if ((port->classport_info.data.type == RDMA_CLASS_PORT_INFO_IB))
 		ret = ib_get_cpi_capmask2(&port->classport_info.data.ib)
 			& IB_SA_CAP_MASK2_SENDONLY_FULL_MEM_SUPPORT;
 	spin_unlock_irqrestore(&port->classport_lock, flags);


* Re: [PATCH] Fix sendonly join going away after Reregister event
  2021-01-21 13:24 [PATCH] Fix sendonly join going away after Reregister event Christoph Lameter
@ 2021-01-21 16:11 ` Leon Romanovsky
  2021-01-22  8:24   ` Christoph Lameter
  0 siblings, 1 reply; 4+ messages in thread
From: Leon Romanovsky @ 2021-01-21 16:11 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Jason Gunthorpe, linux-rdma

On Thu, Jan 21, 2021 at 01:24:43PM +0000, Christoph Lameter wrote:
> From: Christoph Lameter <cl@linux.com>
> Subject: [PATCH] Fix sendonly join going away after Reregister event
>
> When a server receives a REREG event then the SM information in
> the kernel is marked as invalid and a request is sent to the SM to update
> the information.
>
> However, user space applications also receive the REREG event and then
> try to rejoin their multicast groups.
>
> If the SM information is invalid then ib_sa_sendonly_fullmem_support()
> returns false. That is wrong because it just means that we do not know
> yet if the potentially new SM supports sendonly joins. It does not mean
> that the SM does not support sendonly joins.
>
> This patch simply attempts to wait until the SM information is updated
> and the determination can be made.
>
> The code has not been tested but compiles fine.
> I am not sure if it is good to do an msleep here.
>
> Signed-off-by: Christoph Lameter <cl@linux.com>
>
> Index: linux/drivers/infiniband/core/sa_query.c
> ===================================================================
> --- linux.orig/drivers/infiniband/core/sa_query.c	2020-12-17 14:51:15.301206041 +0000
> +++ linux/drivers/infiniband/core/sa_query.c	2021-01-21 12:52:53.577943481 +0000
> @@ -1963,11 +1963,19 @@ bool ib_sa_sendonly_fullmem_support(stru
>  	if (!sa_dev)
>  		return ret;
>
> +redo:
>  	port  = &sa_dev->port[port_num - sa_dev->start_port];
>
> +	while (!port->classport_info.valid)
> +		msleep(100);
> +
>  	spin_lock_irqsave(&port->classport_lock, flags);
> -	if ((port->classport_info.valid) &&
> -	    (port->classport_info.data.type == RDMA_CLASS_PORT_INFO_IB))
> +	if (!port->classport_info.valid) {
> +		/* Need to wait until the SM data is available */
> +		spin_unlock_irqrestore(&port->classport_lock, flags);
> +		goto redo;

This has the potential to loop forever if valid never changes.

> +	}
> +	if ((port->classport_info.data.type == RDMA_CLASS_PORT_INFO_IB))
>  		ret = ib_get_cpi_capmask2(&port->classport_info.data.ib)
>  			& IB_SA_CAP_MASK2_SENDONLY_FULL_MEM_SUPPORT;
>  	spin_unlock_irqrestore(&port->classport_lock, flags);


* Re: [PATCH] Fix sendonly join going away after Reregister event
  2021-01-21 16:11 ` Leon Romanovsky
@ 2021-01-22  8:24   ` Christoph Lameter
  2021-01-24  6:57     ` Leon Romanovsky
  0 siblings, 1 reply; 4+ messages in thread
From: Christoph Lameter @ 2021-01-22  8:24 UTC (permalink / raw)
  To: Leon Romanovsky; +Cc: Jason Gunthorpe, linux-rdma

On Thu, 21 Jan 2021, Leon Romanovsky wrote:

> >  	spin_lock_irqsave(&port->classport_lock, flags);
> > -	if ((port->classport_info.valid) &&
> > -	    (port->classport_info.data.type == RDMA_CLASS_PORT_INFO_IB))
> > +	if (!port->classport_info.valid) {
> > +		/* Need to wait until the SM data is available */
> > +		spin_unlock_irqrestore(&port->classport_lock, flags);
> > +		goto redo;
>
> This has the potential to loop forever if valid never changes.
>

Right. So what is the right solution here? Could the sendonly check function
return an errno instead?

0           = Sendonly join is supported
-EAGAIN     = SM information is currently invalid
-EOPNOTSUPP = SM does not support sendonly join
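
As a rough illustration of those three return values (not part of the patch),
here is a user-space sketch. The struct cpi_cache type, the
sendonly_support_status() name and the SENDONLY_FULL_MEM_SUPPORT bit value are
all hypothetical stand-ins; the real function would also have to take
classport_lock and check the ClassPortInfo type, which this sketch leaves out.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the cached ClassPortInfo state of a port. */
struct cpi_cache {
	bool valid;        /* false between a REREG event and the SM's reply */
	uint32_t capmask2; /* capability bits reported by the SM */
};

/* Illustrative bit only; the kernel header defines the real constant. */
#define SENDONLY_FULL_MEM_SUPPORT (1u << 5)

/*
 * Three-way answer instead of a bool:
 *   0           sendonly join is supported
 *   -EAGAIN     SM information is currently invalid, ask again later
 *   -EOPNOTSUPP SM information is valid and the capability bit is clear
 */
static int sendonly_support_status(const struct cpi_cache *cpi)
{
	if (!cpi->valid)
		return -EAGAIN;
	if (!(cpi->capmask2 & SENDONLY_FULL_MEM_SUPPORT))
		return -EOPNOTSUPP;
	return 0;
}
```

With a return value like this, a caller can tell "not known yet" apart from
"not supported" and decide for itself whether to retry.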

Since all SMs out there have had support for sendonly join for years now
we could just remove the check entirely. If there is an old grizzly SM out
there then it would not process that join request and would return an
error.



* Re: [PATCH] Fix sendonly join going away after Reregister event
  2021-01-22  8:24   ` Christoph Lameter
@ 2021-01-24  6:57     ` Leon Romanovsky
  0 siblings, 0 replies; 4+ messages in thread
From: Leon Romanovsky @ 2021-01-24  6:57 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Jason Gunthorpe, linux-rdma

On Fri, Jan 22, 2021 at 08:24:57AM +0000, Christoph Lameter wrote:
> On Thu, 21 Jan 2021, Leon Romanovsky wrote:
>
> > >  	spin_lock_irqsave(&port->classport_lock, flags);
> > > -	if ((port->classport_info.valid) &&
> > > -	    (port->classport_info.data.type == RDMA_CLASS_PORT_INFO_IB))
> > > +	if (!port->classport_info.valid) {
> > > +		/* Need to wait until the SM data is available */
> > > +		spin_unlock_irqrestore(&port->classport_lock, flags);
> > > +		goto redo;
> >
> > This has the potential to loop forever if valid never changes.
> >
>
> Right. So what is the right solution here? Could the sendonly check function
> return an errno instead?
>
> 0           = Sendonly join is supported
> -EAGAIN     = SM information is currently invalid
> -EOPNOTSUPP = SM does not support sendonly join

I would follow the same flow as update_ib_cpi(): use a retry count and loop
with a delay, but without a workqueue.
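
A bounded wait of that kind might look roughly like the user-space sketch
below. It is only an illustration: wait_for_cpi(), the callback types, the
fake_port helper and the retry/delay values used with it are hypothetical and
not taken from update_ib_cpi(). In the kernel the delay callback would
presumably be something like msleep(), and the validity check would be done
under classport_lock.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical callbacks: poll the validity flag, and sleep between polls. */
typedef bool (*cpi_valid_fn)(void *ctx);
typedef void (*delay_fn)(unsigned int ms);

/*
 * Poll is_valid() at most 'retries' times, sleeping delay_ms between
 * attempts. Returns 0 once the information is valid, or -ETIMEDOUT if
 * it never becomes valid, so the caller cannot loop forever.
 */
static int wait_for_cpi(cpi_valid_fn is_valid, void *ctx, delay_fn delay,
			int retries, unsigned int delay_ms)
{
	for (int i = 0; i < retries; i++) {
		if (is_valid(ctx))
			return 0;
		if (i + 1 < retries)	/* no point sleeping after the last try */
			delay(delay_ms);
	}
	return -ETIMEDOUT;
}

/* Stand-in port whose SM information becomes valid after a few polls. */
struct fake_port {
	int polls_until_valid;
};

static bool fake_port_valid(void *ctx)
{
	struct fake_port *p = ctx;

	if (p->polls_until_valid > 0) {
		p->polls_until_valid--;
		return false;
	}
	return true;
}

/* Delay stub so the sketch can be exercised without really sleeping. */
static void no_delay(unsigned int ms)
{
	(void)ms;
}
```

For example, wait_for_cpi(fake_port_valid, &port, no_delay, 5, 100) gives up
after five polls if the flag never flips, instead of spinning indefinitely.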

>
> Since all SMs out there have had support for sendonly join for years now
> we could just remove the check entirely. If there is an old grizzly SM out
> there then it would not process that join request and would return an
> error.

I have no idea whether that is possible; if it is, it would be the best solution.

Thanks
