ROCE uses IGMP for Multicast instead of the native Infiniband system where
joins are required in order to post messages on the Multicast group.
On Ethernet one can send Multicast messages to arbitrary addresses
without the need to subscribe to a group.

So ROCE correctly does not send IGMP joins during rdma_join_multicast().

F.e. in cma_iboe_join_multicast() we see:

	if (addr->sa_family == AF_INET) {
		if (gid_type == IB_GID_TYPE_ROCE_UDP_ENCAP) {
			ib.rec.hop_limit = IPV6_DEFAULT_HOPLIMIT;
			if (!send_only) {
				err = cma_igmp_send(ndev, &ib.rec.mgid,
						    true);
			}
		}
	} else {

So the IGMP join is suppressed as it is unnecessary.

However no such check is done in destroy_mc(). And therefore leaving a
sendonly multicast group will send an IGMP leave.

This means that the following scenario can lead to a multicast receiver
unexpectedly being unsubscribed from a MC group:

1. Sender thread does a sendonly join on MC group X. No IGMP join
   is sent.

2. Receiver thread does a regular join on the same MC group X.
   IGMP join is sent and the receiver begins to get messages.

3. Sender thread terminates and destroys MC group X.
   IGMP leave is sent and the receiver no longer receives data.

This patch adds the same logic for sendonly joins to destroy_mc()
that is also used in cma_iboe_join_multicast().

Signed-off-by: Christoph Lameter <cl@linux.com>

Index: linux/drivers/infiniband/core/cma.c
===================================================================
--- linux.orig/drivers/infiniband/core/cma.c	2021-09-08 12:59:51.602754272 +0200
+++ linux/drivers/infiniband/core/cma.c	2021-09-08 13:05:20.269838488 +0200
@@ -1810,6 +1810,8 @@ static void cma_release_port(struct rdma
 static void destroy_mc(struct rdma_id_private *id_priv,
 		       struct cma_multicast *mc)
 {
+	bool send_only = mc->join_state == BIT(SENDONLY_FULLMEMBER_JOIN);
+
 	if (rdma_cap_ib_mcast(id_priv->id.device, id_priv->id.port_num))
 		ib_sa_free_multicast(mc->sa_mc);
 
@@ -1826,7 +1828,10 @@ static void destroy_mc(struct rdma_id_pr
 			cma_set_mgid(id_priv, (struct sockaddr *)&mc->addr,
 				     &mgid);
-			cma_igmp_send(ndev, &mgid, false);
+
+			if (!send_only)
+				cma_igmp_send(ndev, &mgid, false);
+
 			dev_put(ndev);
 		}
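
For illustration only, the three-step scenario from the description can be sketched as a small userspace model. This is not kernel code: struct host_model, model_join(), model_destroy() and receiver_survives() are hypothetical names, and the model assumes that a single IGMP leave from the host is enough to stop delivery of group X to it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one host's IGMP state for MC group X. */
struct host_model {
	bool igmp_member;	/* true while group X is delivered to the host */
	int igmp_joins;		/* number of IGMP joins sent */
	int igmp_leaves;	/* number of IGMP leaves sent */
};

/* Models the join path: a sendonly join sends no IGMP join. */
static void model_join(struct host_model *h, bool send_only)
{
	if (!send_only) {
		h->igmp_member = true;
		h->igmp_joins++;
	}
}

/*
 * Models destroy_mc(). With check_send_only == false the leave is always
 * sent (behaviour before the patch); with true a sendonly member skips
 * the leave (behaviour with the patch).
 */
static void model_destroy(struct host_model *h, bool send_only,
			  bool check_send_only)
{
	if (check_send_only && send_only)
		return;
	h->igmp_member = false;
	h->igmp_leaves++;
}

/* Runs steps 1-3 and reports whether the receiver is still subscribed. */
static bool receiver_survives(bool check_send_only)
{
	struct host_model h = { 0 };

	model_join(&h, true);	/* 1. sender: sendonly join, no IGMP join */
	model_join(&h, false);	/* 2. receiver: regular join, IGMP join sent */
	assert(h.igmp_member);
	model_destroy(&h, true, check_send_only);	/* 3. sender goes away */
	return h.igmp_member;
}
```

In this model receiver_survives(false) returns false, which corresponds to the receiver being unsubscribed, while receiver_survives(true) returns true.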
On Wed, Sep 08, 2021 at 01:43:28PM +0200, Christoph Lameter wrote:
> ROCE uses IGMP for Multicast instead of the native Infiniband system where
> joins are required in order to post messages on the Multicast group.
According to the IBTA v1.5, there is no need to join multicast group to
send messages.
10.5.2 MULTICAST WORK REQUESTS
10.5.2.1 IBA UNRELIABLE MULTICAST WORK REQUESTS
...
A QP is not required to be attached to a Multicast Group
in order to initiate an IBA Unreliable Multicast Work Request.
Did I look in wrong place?
Thanks
On Mon, Sep 13, 2021 at 01:38:43PM +0300, Leon Romanovsky wrote:
> On Wed, Sep 08, 2021 at 01:43:28PM +0200, Christoph Lameter wrote:
> > ROCE uses IGMP for Multicast instead of the native Infiniband system where
> > joins are required in order to post messages on the Multicast group.
>
> According to the IBTA v1.5, there is no need to join multicast group to
> send messages.
>
> 10.5.2 MULTICAST WORK REQUESTS
> 10.5.2.1 IBA UNRELIABLE MULTICAST WORK REQUESTS
> ...
> A QP is not required to be attached to a Multicast Group
> in order to initiate an IBA Unreliable Multicast Work Request.
>
> Did I look in wrong place?
This is talking about the ibv_attach_mcast() verb, which is different
from the SM notion of a node being joined to a multicast group or not.
In IBA a node that is not joined to a MGID will not be able to send to
that MGID; the network is allowed to drop the packet.
With ethernet all nodes can always send to all multicast addresses,
the IGMP stuff is only required to receive.
Jason
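
The distinction drawn above can be written down as a small truth-table style sketch. The enum and function names here are hypothetical and exist only for this illustration. The receive rule for IB (only a full member receives) is an assumption based on the sendonly join semantics mentioned in this thread, not something taken from the kernel sources.

```c
#include <stdbool.h>

/* Hypothetical types for illustration only. */
enum fabric { FABRIC_IB, FABRIC_ETHERNET };
enum membership { NOT_JOINED, SENDONLY_JOINED, FULL_JOINED };

/*
 * Can a node rely on its multicast sends being forwarded?
 * Ethernet: always. IB: only after a join with the SM, since the
 * network is otherwise allowed to drop the packet.
 */
static bool send_is_forwarded(enum fabric f, enum membership m)
{
	if (f == FABRIC_ETHERNET)
		return true;
	return m != NOT_JOINED;
}

/*
 * Does the node receive group traffic? On Ethernet this needs the IGMP
 * join, which is only sent for a regular (full) join. For IB the same
 * rule is assumed here.
 */
static bool receives_traffic(enum fabric f, enum membership m)
{
	(void)f;	/* same rule on both fabrics in this sketch */
	return m == FULL_JOINED;
}
```

Under these assumptions a sendonly member on Ethernet gains nothing from IGMP: send_is_forwarded() is already true without a join and receives_traffic() stays false.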
On Wed, Sep 08, 2021 at 01:43:28PM +0200, Christoph Lameter wrote:
> ROCE uses IGMP for Multicast instead of the native Infiniband system where
> joins are required in order to post messages on the Multicast group.
> On Ethernet one can send Multicast messages to arbitrary addresses
> without the need to subscribe to a group.
>
> So ROCE correctly does not send IGMP joins during rdma_join_multicast().
>
> F.e. in cma_iboe_join_multicast() we see:
>
> 	if (addr->sa_family == AF_INET) {
> 		if (gid_type == IB_GID_TYPE_ROCE_UDP_ENCAP) {
> 			ib.rec.hop_limit = IPV6_DEFAULT_HOPLIMIT;
> 			if (!send_only) {
> 				err = cma_igmp_send(ndev, &ib.rec.mgid,
> 						    true);
> 			}
> 		}
> 	} else {
>
> So the IGMP join is suppressed as it is unnecessary.
>
> However no such check is done in destroy_mc(). And therefore leaving a
> sendonly multicast group will send an IGMP leave.
>
> This means that the following scenario can lead to a multicast receiver
> unexpectedly being unsubscribed from a MC group:
>
>
> 1. Sender thread does a sendonly join on MC group X. No IGMP join
> is sent.
>
> 2. Receiver thread does a regular join on the same MC group X.
> IGMP join is sent and the receiver begins to get messages.
>
> 3. Sender thread terminates and destroys MC group X.
> IGMP leave is sent and the receiver no longer receives data.
>
> This patch adds the same logic for sendonly joins to destroy_mc()
> that is also used in cma_iboe_join_multicast().
>
> Signed-off-by: Christoph Lameter <cl@linux.com>
I added the missing fixes line:
Fixes: ab15c95a17b3 ("IB/core: Support for CMA multicast join flags")
Applied to for-rc, thanks
Jason
On Mon, 13 Sep 2021, Leon Romanovsky wrote:

> On Wed, Sep 08, 2021 at 01:43:28PM +0200, Christoph Lameter wrote:
> > ROCE uses IGMP for Multicast instead of the native Infiniband system where
> > joins are required in order to post messages on the Multicast group.
>
> According to the IBTA v1.5, there is no need to join multicast group to
> send messages.

This is ROCE where you do not need to do a join since it's Ethernet
Multicast.

On Infiniband (which this patch is not dealing with) you can only send if
you join a multicast group by sending a join request with the MGID to the
SM. The SM will reconfigure the IB switches so that your traffic can be
routed to the receivers of the multicast channel. See the sendonly join
description in the IBTA manuals.

> 10.5.2 MULTICAST WORK REQUESTS
> 10.5.2.1 IBA UNRELIABLE MULTICAST WORK REQUESTS
> ...
> A QP is not required to be attached to a Multicast Group
> in order to initiate an IBA Unreliable Multicast Work Request.
>
> Did I look in wrong place?

Work request? Does that mean it sends multicast?