All of lore.kernel.org
 help / color / mirror / Atom feed
* IPoIB: Broken IGMP processing
@ 2010-08-23 17:16 Christoph Lameter
       [not found] ` <alpine.DEB.2.00.1008231210010.9840-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org>
  0 siblings, 1 reply; 33+ messages in thread
From: Christoph Lameter @ 2010-08-23 17:16 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Or Gerlitz, Yossi Etigin, Roland Dreier

We see that IGMP timers are not properly deferred when hosts send IGMP
membership information. It looks as if the IPoIB layer does not properly
mark the multicast/broadcast packets with PACKET_MULTICAST or
PACKET_BROADCAST. As a results icmp_recv() ignores the IGMP membership
information from others. That in turn results in the IGMP timers
frequently expiring, thus the network becomes quite chatty.

The following is an untested patch: I am not sure how exact to access the
ipob Mac header. The IB header contains a special marker for IPoIB
multicast. So it should be simple to identify the multicast packets in the
receive path.


Subject: [IB] Make igmp processing work with IPOIB

IGMP processing is broken because the IPOIB does not set the
skb->pkt_type the right way for Multicast traffic. All incoming
packets are set to PACKET_HOST which means that the igmp_recv()
function will ignore the IGMP broadcasts/multicasts.

This in turn means that the IGMP timers are firing and are sending
information about multicast subscriptions unnecessarily. In a large
private network this can cause traffic spikes.

Signed-off-by: Christoph Lameter <cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org>

---
 drivers/infiniband/ulp/ipoib/ipoib.h    |   13 +++++++++++++
 drivers/infiniband/ulp/ipoib/ipoib_ib.c |    6 ++++--
 2 files changed, 17 insertions(+), 2 deletions(-)

Index: linux-2.6/drivers/infiniband/ulp/ipoib/ipoib.h
===================================================================
--- linux-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib.h	2010-08-20 19:44:13.000000000 -0500
+++ linux-2.6/drivers/infiniband/ulp/ipoib/ipoib.h	2010-08-20 19:58:21.000000000 -0500
@@ -114,6 +114,9 @@ enum {
 #define	IPOIB_OP_CM     (0)
 #endif

+#define IPOIB_MGID_IPV4_SIGNATURE 0x401B
+#define IPOIB_MGID_IPV6_SIGNATURE 0x601B
+
 /* structs */

 struct ipoib_header {
@@ -125,6 +128,16 @@ struct ipoib_pseudoheader {
 	u8  hwaddr[INFINIBAND_ALEN];
 };

+int ipoib_is_ipv4_multicast(u8 *p)
+{
+	return *((u16 *)(p + 2)) == htonl(IPOIB_MGID_IPV4_SIGNATURE);
+}
+
+int ipoib_is_ipv6_multicast(u8 *p)
+{
+	return *((u16 *)(p + 2)) == htonl(IPOIB_MGID_IPV6_SIGNATURE);
+}
+
 /* Used for all multicast joins (broadcast, IPv4 mcast and IPv6 mcast) */
 struct ipoib_mcast {
 	struct ib_sa_mcmember_rec mcmember;
Index: linux-2.6/drivers/infiniband/ulp/ipoib/ipoib_ib.c
===================================================================
--- linux-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib_ib.c	2010-08-20 18:43:44.000000000 -0500
+++ linux-2.6/drivers/infiniband/ulp/ipoib/ipoib_ib.c	2010-08-20 19:58:34.000000000 -0500
@@ -281,8 +281,10 @@ static void ipoib_ib_handle_rx_wc(struct
 	dev->stats.rx_bytes += skb->len;

 	skb->dev = dev;
-	/* XXX get correct PACKET_ type here */
-	skb->pkt_type = PACKET_HOST;
+	if (ipoib_is_ipv4_multicast(skb_mac_header(skb)))
+		skb->pkt_type = PACKET_MULTICAST;
+	else
+		skb->pkt_type = PACKET_HOST;

 	if (test_bit(IPOIB_FLAG_CSUM, &priv->flags) && likely(wc->csum_ok))
 		skb->ip_summed = CHECKSUM_UNNECESSARY;
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: IPoIB: Broken IGMP processing
       [not found] ` <alpine.DEB.2.00.1008231210010.9840-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org>
@ 2010-08-23 17:41   ` Jason Gunthorpe
       [not found]     ` <20100823174110.GK26549-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 33+ messages in thread
From: Jason Gunthorpe @ 2010-08-23 17:41 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz, Yossi Etigin,
	Roland Dreier

On Mon, Aug 23, 2010 at 12:16:40PM -0500, Christoph Lameter wrote:

> +int ipoib_is_ipv4_multicast(u8 *p)
> +{
> +	return *((u16 *)(p + 2)) == htonl(IPOIB_MGID_IPV4_SIGNATURE);
> +}
> +
> +int ipoib_is_ipv6_multicast(u8 *p)
> +{
> +	return *((u16 *)(p + 2)) == htonl(IPOIB_MGID_IPV6_SIGNATURE);
> +}

static inline for functions in headers?

> +
>  /* Used for all multicast joins (broadcast, IPv4 mcast and IPv6 mcast) */
>  struct ipoib_mcast {
>  	struct ib_sa_mcmember_rec mcmember;
> Index: linux-2.6/drivers/infiniband/ulp/ipoib/ipoib_ib.c
> ===================================================================
> +++ linux-2.6/drivers/infiniband/ulp/ipoib/ipoib_ib.c	2010-08-20 19:58:34.000000000 -0500
> @@ -281,8 +281,10 @@ static void ipoib_ib_handle_rx_wc(struct
>  	dev->stats.rx_bytes += skb->len;
> 
>  	skb->dev = dev;
> -	/* XXX get correct PACKET_ type here */
> -	skb->pkt_type = PACKET_HOST;
> +	if (ipoib_is_ipv4_multicast(skb_mac_header(skb)))
> +		skb->pkt_type = PACKET_MULTICAST;
> +	else
> +		skb->pkt_type = PACKET_HOST;
> 
>  	if (test_bit(IPOIB_FLAG_CSUM, &priv->flags) && likely(wc->csum_ok))
>  		skb->ip_summed = CHECKSUM_UNNECESSARY;

Hmmm... What are you trying to access here? I'm guessing it is the
DGID of the GRH?

        ipoib_ud_skb_put_frags(priv, skb, wc->byte_len);
        skb_pull(skb, IB_GRH_BYTES);  <-- These are the bytes you want
        skb_reset_mac_header(skb);  <-- Sets skb_mac_header to skb->head+40
        skb_pull(skb, IPOIB_ENCAP_LEN);

So, I think you are accessing byte 42, which doesn't seem right? The
DGID starts in byte 24 from skb->head.

Also, you need to check for IBV_WC_GRH, the 40 bytes are garbage if it
is not set.

Maybe checking for checking wc->qp_num == multicast QPN is a better
choice?

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: IPoIB: Broken IGMP processing
       [not found]     ` <20100823174110.GK26549-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2010-08-23 18:10       ` Christoph Lameter
       [not found]         ` <alpine.DEB.2.00.1008231254300.10719-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org>
  2010-08-23 18:19       ` Christoph Lameter
  1 sibling, 1 reply; 33+ messages in thread
From: Christoph Lameter @ 2010-08-23 18:10 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz, Yossi Etigin,
	Roland Dreier

On Mon, 23 Aug 2010, Jason Gunthorpe wrote:

> On Mon, Aug 23, 2010 at 12:16:40PM -0500, Christoph Lameter wrote:
>
> > +int ipoib_is_ipv4_multicast(u8 *p)
> > +{
> > +	return *((u16 *)(p + 2)) == htonl(IPOIB_MGID_IPV4_SIGNATURE);
> > +}
> > +
> > +int ipoib_is_ipv6_multicast(u8 *p)
> > +{
> > +	return *((u16 *)(p + 2)) == htonl(IPOIB_MGID_IPV6_SIGNATURE);
> > +}
>
> static inline for functions in headers?

Right.

> Maybe checking for checking wc->qp_num == multicast QPN is a better
> choice?

If that is properly set then yes of course that is much easier. Would this
work? The broadcast QP is used for multicast right?


Subject: [IB] Make igmp processing work with IPOIB

IGMP processing is broken because the IPOIB does not set the
skb->pkt_type the right way for Multicast traffic. All incoming
packets are set to PACKET_HOST which means that the igmp_recv()
function will ignore the IGMP broadcasts/multicasts.

This in turn means that the IGMP timers are firing and are sending
information about multicast subscriptions unnecessarily. In a large
private network this can cause traffic spikes.

Signed-off-by: Christoph Lameter <cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org>

---
 drivers/infiniband/ulp/ipoib/ipoib_ib.c |    8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

Index: linux-2.6/drivers/infiniband/ulp/ipoib/ipoib_ib.c
===================================================================
--- linux-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib_ib.c	2010-08-23 13:07:32.000000000 -0500
+++ linux-2.6/drivers/infiniband/ulp/ipoib/ipoib_ib.c	2010-08-23 13:09:06.000000000 -0500
@@ -223,6 +223,7 @@ static void ipoib_ib_handle_rx_wc(struct
 	unsigned int wr_id = wc->wr_id & ~IPOIB_OP_RECV;
 	struct sk_buff *skb;
 	u64 mapping[IPOIB_UD_RX_SG];
+	struct ipoib_dev_priv *multicast_priv = netdev_priv(priv->broadcast->dev);

 	ipoib_dbg_data(priv, "recv completion: id %d, status: %d\n",
 		       wr_id, wc->status);
@@ -281,8 +282,11 @@ static void ipoib_ib_handle_rx_wc(struct
 	dev->stats.rx_bytes += skb->len;

 	skb->dev = dev;
-	/* XXX get correct PACKET_ type here */
-	skb->pkt_type = PACKET_HOST;
+	if (wc->src_qp == multicast_priv->qp->qp_num)
+
+		skb->pkt_type = PACKET_MULTICAST;
+	else
+		skb->pkt_type = PACKET_HOST;

 	if (test_bit(IPOIB_FLAG_CSUM, &priv->flags) && likely(wc->csum_ok))
 		skb->ip_summed = CHECKSUM_UNNECESSARY;
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: IPoIB: Broken IGMP processing
       [not found]     ` <20100823174110.GK26549-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2010-08-23 18:10       ` Christoph Lameter
@ 2010-08-23 18:19       ` Christoph Lameter
       [not found]         ` <alpine.DEB.2.00.1008231317150.10719-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org>
  1 sibling, 1 reply; 33+ messages in thread
From: Christoph Lameter @ 2010-08-23 18:19 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz, Yossi Etigin,
	Roland Dreier

On Mon, 23 Aug 2010, Jason Gunthorpe wrote:

> Hmmm... What are you trying to access here? I'm guessing it is the
> DGID of the GRH?
>
>         ipoib_ud_skb_put_frags(priv, skb, wc->byte_len);
>         skb_pull(skb, IB_GRH_BYTES);  <-- These are the bytes you want
>         skb_reset_mac_header(skb);  <-- Sets skb_mac_header to skb->head+40
>         skb_pull(skb, IPOIB_ENCAP_LEN);
>
> So, I think you are accessing byte 42, which doesn't seem right? The
> DGID starts in byte 24 from skb->head.
>
> Also, you need to check for IBV_WC_GRH, the 40 bytes are garbage if it
> is not set.

Trying to get the MGID information:

>From http://tools.ietf.org/html/draft-ietf-ipoib-link-multicast-00


7. The IPoIB All-Node Multicast and Broadcast Group

   Once an IB partition is created with link attributes identified for
   an IPoIB link, the network administrator must create a special IB
   multicast group for every node on the IPoIB link to join. This is
   achieved through the creation of "MCGroupRecord" in each IB subnet
   that the IB partition encompasses, as described in section 4 above.

   The MGID will have the P_Key of the IB partition that defines the
   IPoIB link embedded in it. A special signature is also embedded to
   identify the MGID for IPoIB use only. For IPv4 over IB, the signature
   will be "0x401B". For IPv6 over IB, the signature will be "0x601B".

   For an IPv4 subnet, the MGID for this special IB multicast group
   SHALL have the following format:

   |   8    |  4 |  4 |     16 bits     | 16 bits | 48 bits  | 32 bits |
   +--------+----+----+-----------------+---------+----------+---------+
   |11111111|0001|scop|<IPoIB signature>|< P_Key >|00.......0|<all 1's>|
   +--------+----+----+-----------------+---------+----------+---------+



Chu & Kashyap                                                   [Page 7]

draft-ietf-ipoib-link-multicast-00.txt                      January 2002


   For an IPv6 subnet, the format of the MGID SHALL look like this:

   |   8    |  4 |  4 |     16 bits     | 16 bits |       80 bits      |
   +--------+----+----+-----------------+---------+--------------------+
   |11111111|0001|scop|<IPoIB signature>|< P_Key >|000.............0001|
   +--------+----+----+-----------------+---------+--------------------+

   As for the scop bits, if the IPoIB link is fully contained within a
   single IB subnet, the scop bits SHALL be set to 2 (link-local).
   Otherwise the scope will be set higher.

   A MCGroupRecord will be created with all the IPoIB link attributes
   described before, including the link MTU, Q_Key, TClass, FlowLabel,
   and HopLimit. When a node is attached to an IPoIB link identified by
   a P_Key, it must look for a special, all-node multicast/broadcast
   group to join. This is done by constructing the MGID with the link
   P_Key and the IPoIB signature. The node SHOULD always look for a MGID
   of a link-local scope first before attempting one with a greater
   scope.

   Once the right MGID and MCGroupRecord are identified, the local node
   SHOULD use the link MTU recorded in the MCGroupRecord. It MUST accept
   a smaller MTU if one is advertised through the link MTU option of a
   router advertisement [DISC].

   In case the link MTU is greater than the maximum payload size that
   the local HCA can support, the node can not join the IPoIB link and
   operate as an IP node.

   After the right MTU is determined, the local node must join the
   special all-node multicast/broadcast group by calling the SA to
   create a MCMemberRecord corresponding to the MGID. The SA will return
   all the link attributes for the local node to use. The node MUST use
   these attributes in all future multicast operations to the local
   IPoIB link.  The broadcast group for IPv4 will serve to provide a
   broadcast service for protocol like ARP to use.

   In addition to the all-node multicast/broadcast group, an all-router
   multicast group SHOULD be created at link configuration time if an IP
   router will be attached to the link. This is to facilitate IP
   multicast operations described later. A MCGroupRecord for the all-
   router MGID must be created in every IB subnet that the IPoIB link
   encompasses. The format of the all-router MGID will be covered in
   next section.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: IPoIB: Broken IGMP processing
       [not found]         ` <alpine.DEB.2.00.1008231254300.10719-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org>
@ 2010-08-23 18:28           ` Jason Gunthorpe
       [not found]             ` <20100823182859.GL26549-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 33+ messages in thread
From: Jason Gunthorpe @ 2010-08-23 18:28 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz, Yossi Etigin,
	Roland Dreier

On Mon, Aug 23, 2010 at 01:10:45PM -0500, Christoph Lameter wrote:

> +	if (wc->src_qp == multicast_priv->qp->qp_num)
> +
> +		skb->pkt_type = PACKET_MULTICAST;
> +	else
> +		skb->pkt_type = PACKET_HOST;

src_qp is just the send QPN, you need to look at qp_num (aka dest qp).
I'm not entirely sure what it will be, I didn't find anything too clear
in the spec. If it is 0xFFFFFF then the HCA is copying the dest QP
directly into the WC and this can work, if it is something else then
the HCA is setting it to the QPN of the RQ that recieved the packet,
which is not useful for this.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: IPoIB: Broken IGMP processing
       [not found]         ` <alpine.DEB.2.00.1008231317150.10719-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org>
@ 2010-08-23 18:30           ` Jason Gunthorpe
       [not found]             ` <20100823183044.GM26549-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 33+ messages in thread
From: Jason Gunthorpe @ 2010-08-23 18:30 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz, Yossi Etigin,
	Roland Dreier

On Mon, Aug 23, 2010 at 01:19:20PM -0500, Christoph Lameter wrote:
> On Mon, 23 Aug 2010, Jason Gunthorpe wrote:
> 
> > Hmmm... What are you trying to access here? I'm guessing it is the
> > DGID of the GRH?
> >
> >         ipoib_ud_skb_put_frags(priv, skb, wc->byte_len);
> >         skb_pull(skb, IB_GRH_BYTES);  <-- These are the bytes you want
> >         skb_reset_mac_header(skb);  <-- Sets skb_mac_header to skb->head+40
> >         skb_pull(skb, IPOIB_ENCAP_LEN);
> >
> > So, I think you are accessing byte 42, which doesn't seem right? The
> > DGID starts in byte 24 from skb->head.
> >
> > Also, you need to check for IBV_WC_GRH, the 40 bytes are garbage if it
> > is not set.
> 
> Trying to get the MGID information:
> 
> From http://tools.ietf.org/html/draft-ietf-ipoib-link-multicast-00

The MGID is stuffed into the DGID field of the GRH by the sender, so
you want to start at byte 24 of the wc data. The structure is 40 bytes
of GRH, 4 bytes of IPOIB_ENCAP with the protocol number, and then the
datagram. It looks to me like the MAC header pointer is set to the
IPOIB_ENCAP header.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: IPoIB: Broken IGMP processing
       [not found]             ` <20100823182859.GL26549-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2010-08-23 19:27               ` Christoph Lameter
       [not found]                 ` <alpine.DEB.2.00.1008231427130.12815-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org>
  0 siblings, 1 reply; 33+ messages in thread
From: Christoph Lameter @ 2010-08-23 19:27 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz, Yossi Etigin,
	Roland Dreier

On Mon, 23 Aug 2010, Jason Gunthorpe wrote:

> src_qp is just the send QPN, you need to look at qp_num (aka dest qp).
> I'm not entirely sure what it will be, I didn't find anything too clear
> in the spec. If it is 0xFFFFFF then the HCA is copying the dest QP
> directly into the WC and this can work, if it is something else then
> the HCA is setting it to the QPN of the RQ that recieved the packet,
> which is not useful for this.

AFAICT the spec has a QP for RC and one for UD packets. How do these
relate to multicast?

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: IPoIB: Broken IGMP processing
       [not found]                 ` <alpine.DEB.2.00.1008231427130.12815-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org>
@ 2010-08-23 19:34                   ` Yossi Etigin
       [not found]                     ` <7E95F01E94AB484F83061FCFA35B39F8013249-QfUkFaTmzUSUvQqKE/ONIwC/G2K4zDHf@public.gmane.org>
  0 siblings, 1 reply; 33+ messages in thread
From: Yossi Etigin @ 2010-08-23 19:34 UTC (permalink / raw)
  To: Christoph Lameter, Jason Gunthorpe
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz, Roland Dreier

Hello Christoph,

I think you'll first want to check the wc_flags field in the work
completion to see whether GRH is present (and it must be present in
multicast packets, normally not present in unicast). Then, extract the
destination gid from grh, and compare the first 6 bytes of it to the
dev->broadcast address bytes 4...9 (once with dev->broadcast which is
IPv4, second time with dev->broadcast with the IP version nibble changed
to 6). The first 4 bytes of dev->broadcast are QPN.
Of course, you'll be doing this before pulling any other headers from
the skb because GRH comes first.

The QPN you get in the completion is the QP from which it was sent -
src_qp. You don't get the QPN which was put in the UD header by the
sender, which is 0xFFFFFF for multicast. In userspace (aka struct
ibv_wc), you get the qp number from which the receive buffer was taken,
and in kernel space (aka struct ib_wc, no 'v') you get the pointer to
the QP.

You can do multicast on UC QP.
You can't do multicast on RC QP.

BTW In your first patch looks like you are doing htonl when you should
be doing htons (or cpu_to_be16 which is good with constants)

> -----Original Message-----
> From: Christoph Lameter [mailto:cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org]
> Sent: Monday, August 23, 2010 22:28
> To: Jason Gunthorpe
> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Or Gerlitz; Yossi Etigin; Roland
Dreier
> Subject: Re: IPoIB: Broken IGMP processing
> 
> On Mon, 23 Aug 2010, Jason Gunthorpe wrote:
> 
> > src_qp is just the send QPN, you need to look at qp_num (aka dest
qp).
> > I'm not entirely sure what it will be, I didn't find anything too
clear
> > in the spec. If it is 0xFFFFFF then the HCA is copying the dest QP
> > directly into the WC and this can work, if it is something else then
> > the HCA is setting it to the QPN of the RQ that recieved the packet,
> > which is not useful for this.
> 
> AFAICT the spec has a QP for RC and one for UD packets. How do these
> relate to multicast?

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: IPoIB: Broken IGMP processing
       [not found]                     ` <7E95F01E94AB484F83061FCFA35B39F8013249-QfUkFaTmzUSUvQqKE/ONIwC/G2K4zDHf@public.gmane.org>
@ 2010-08-23 19:45                       ` Jason Gunthorpe
       [not found]                         ` <20100823194555.GN26549-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 33+ messages in thread
From: Jason Gunthorpe @ 2010-08-23 19:45 UTC (permalink / raw)
  To: Yossi Etigin
  Cc: Christoph Lameter, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz,
	Roland Dreier

On Mon, Aug 23, 2010 at 10:34:38PM +0300, Yossi Etigin wrote:
> Hello Christoph,
> 
> I think you'll first want to check the wc_flags field in the work
> completion to see whether GRH is present (and it must be present in
> multicast packets, normally not present in unicast). Then, extract the
> destination gid from grh, and compare the first 6 bytes of it to the
> dev->broadcast address bytes 4...9 (once with dev->broadcast which is
> IPv4, second time with dev->broadcast with the IP version nibble changed
> to 6). The first 4 bytes of dev->broadcast are QPN.
> Of course, you'll be doing this before pulling any other headers from
> the skb because GRH comes first.

> The QPN you get in the completion is the QP from which it was sent -
> src_qp. You don't get the QPN which was put in the UD header by the
> sender, which is 0xFFFFFF for multicast. In userspace (aka struct
> ibv_wc), you get the qp number from which the receive buffer was taken,
> and in kernel space (aka struct ib_wc, no 'v') you get the pointer to
> the QP.

Ah, so that clears things up. Sorry to have confused things with this
suggestion.

Simplest then is to check if byte 24 of the packet is 0xff.
(ie IN6_IS_ADDR_MULTICAST)

No need to worry about if it is properly formed or anything, if it is
a multicast DGID then it is a multicast packet at the link level.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: IPoIB: Broken IGMP processing
       [not found]                         ` <20100823194555.GN26549-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2010-08-23 19:50                           ` Yossi Etigin
       [not found]                             ` <7E95F01E94AB484F83061FCFA35B39F801324A-QfUkFaTmzUSUvQqKE/ONIwC/G2K4zDHf@public.gmane.org>
  2010-08-23 19:59                           ` Christoph Lameter
  1 sibling, 1 reply; 33+ messages in thread
From: Yossi Etigin @ 2010-08-23 19:50 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christoph Lameter, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz,
	Roland Dreier

> 
> Simplest then is to check if byte 24 of the packet is 0xff.
> (ie IN6_IS_ADDR_MULTICAST)
> 
> No need to worry about if it is properly formed or anything, if it is
> a multicast DGID then it is a multicast packet at the link level.
> 

Sounds good to me

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: IPoIB: Broken IGMP processing
       [not found]                         ` <20100823194555.GN26549-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2010-08-23 19:50                           ` Yossi Etigin
@ 2010-08-23 19:59                           ` Christoph Lameter
       [not found]                             ` <alpine.DEB.2.00.1008231454230.12815-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org>
  1 sibling, 1 reply; 33+ messages in thread
From: Christoph Lameter @ 2010-08-23 19:59 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Yossi Etigin, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz,
	Roland Dreier

On Mon, 23 Aug 2010, Jason Gunthorpe wrote:

> Simplest then is to check if byte 24 of the packet is 0xff.
> (ie IN6_IS_ADDR_MULTICAST)

Dont see that function defined anywhere in the kernel.

> No need to worry about if it is properly formed or anything, if it is
> a multicast DGID then it is a multicast packet at the link level.

ok so skb->head[24] = 0xff ?

Looks raw and not kernel style. There need to be some skb_xxx function
that get to it.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: IPoIB: Broken IGMP processing
       [not found]                             ` <alpine.DEB.2.00.1008231454230.12815-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org>
@ 2010-08-24 18:49                               ` Jason Gunthorpe
  0 siblings, 0 replies; 33+ messages in thread
From: Jason Gunthorpe @ 2010-08-24 18:49 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Yossi Etigin, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz,
	Roland Dreier

On Mon, Aug 23, 2010 at 02:59:45PM -0500, Christoph Lameter wrote:
> On Mon, 23 Aug 2010, Jason Gunthorpe wrote:
> 
> > Simplest then is to check if byte 24 of the packet is 0xff.
> > (ie IN6_IS_ADDR_MULTICAST)
> 
> Dont see that function defined anywhere in the kernel.

That is what userspace calls it..

How about

ipv6_addr_type(((ipv6hdr *)skb->head)->daddr) & IPV6_ADDR_MULTICAST

There are 32 occurrences of 'skb->head' in drivers/net..

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: IPoIB: Broken IGMP processing
       [not found]                             ` <7E95F01E94AB484F83061FCFA35B39F801324A-QfUkFaTmzUSUvQqKE/ONIwC/G2K4zDHf@public.gmane.org>
@ 2010-08-25 14:43                               ` Christoph Lameter
  0 siblings, 0 replies; 33+ messages in thread
From: Christoph Lameter @ 2010-08-25 14:43 UTC (permalink / raw)
  To: Yossi Etigin
  Cc: Jason Gunthorpe, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz,
	Roland Dreier

On Mon, 23 Aug 2010, Yossi Etigin wrote:

> >
> > Simplest then is to check if byte 24 of the packet is 0xff.
> > (ie IN6_IS_ADDR_MULTICAST)
> >
> > No need to worry about if it is properly formed or anything, if it is
> > a multicast DGID then it is a multicast packet at the link level.
> >
>
> Sounds good to me

Therefore this patch?

Subject: [IB] Make igmp processing work with IPOIB

IGMP processing is broken because the IPOIB does not set the
skb->pkt_type the right way for Multicast traffic. All incoming
packets are set to PACKET_HOST which means that the igmp_recv()
function will ignore the IGMP broadcasts/multicasts.

This in turn means that the IGMP timers are firing and are sending
information about multicast subscriptions unnecessarily. In a large
private network this can cause traffic spikes.

Signed-off-by: Christoph Lameter <cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org>

---
 drivers/infiniband/ulp/ipoib/ipoib_ib.c |    7 +++++--
 include/linux/in6.h                     |    3 +++
 2 files changed, 8 insertions(+), 2 deletions(-)

Index: linux-2.6/drivers/infiniband/ulp/ipoib/ipoib_ib.c
===================================================================
--- linux-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib_ib.c	2010-08-23 16:04:38.000000000 -0500
+++ linux-2.6/drivers/infiniband/ulp/ipoib/ipoib_ib.c	2010-08-25 09:43:01.000000000 -0500
@@ -281,8 +281,11 @@ static void ipoib_ib_handle_rx_wc(struct
 	dev->stats.rx_bytes += skb->len;

 	skb->dev = dev;
-	/* XXX get correct PACKET_ type here */
-	skb->pkt_type = PACKET_HOST;
+	if (IN6_IS_ADDR_MULTICAST(skb->head + 24))
+
+		skb->pkt_type = PACKET_MULTICAST;
+	else
+		skb->pkt_type = PACKET_HOST;

 	if (test_bit(IPOIB_FLAG_CSUM, &priv->flags) && likely(wc->csum_ok))
 		skb->ip_summed = CHECKSUM_UNNECESSARY;
Index: linux-2.6/include/linux/in6.h
===================================================================
--- linux-2.6.orig/include/linux/in6.h	2010-08-25 09:39:40.000000000 -0500
+++ linux-2.6/include/linux/in6.h	2010-08-25 09:40:22.000000000 -0500
@@ -53,6 +53,9 @@ extern const struct in6_addr in6addr_lin
 extern const struct in6_addr in6addr_linklocal_allrouters;
 #define IN6ADDR_LINKLOCAL_ALLROUTERS_INIT \
 		{ { { 0xff,2,0,0,0,0,0,0,0,0,0,0,0,0,0,2 } } }
+
+#define IN6_IS_ADDR_MULTICAST(a) (((const __u8 *) (a))[0] == 0xff)
+
 #endif

 struct sockaddr_in6 {
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [IPoIB] Identify Multicast packets and fix IGMP breakage
       [not found]             ` <20100823183044.GM26549-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2010-08-26 19:55               ` Christoph Lameter
       [not found]                 ` <alpine.DEB.2.00.1008261453510.21466-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org>
  0 siblings, 1 reply; 33+ messages in thread
From: Christoph Lameter @ 2010-08-26 19:55 UTC (permalink / raw)
  To: Roland Dreier
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz, Jason Gunthorpe,
	Yossi Etigin

I had to move the check around a bit but the patch now passes our tests.
Please merge soon.


Subject: [IPoIB] Identify Multicast packets and fix IGMP breakage

IGMP processing is broken because the IPOIB does not set the
skb->pkt_type the right way for Multicast traffic. All incoming
packets are set to PACKET_HOST which means that the igmp_recv()
function will ignore the IGMP broadcasts/multicasts.

This in turn means that the IGMP timers are firing and are sending
information about multicast subscriptions unnecessarily. In a large
private network this can cause traffic spikes.

Signed-off-by: Christoph Lameter <cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org>

---
 drivers/infiniband/ulp/ipoib/ipoib_ib.c |    7 +++++--
 include/linux/in6.h                     |    3 +++
 2 files changed, 8 insertions(+), 2 deletions(-)

Index: linux-2.6/drivers/infiniband/ulp/ipoib/ipoib_ib.c
===================================================================
--- linux-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib_ib.c	2010-08-26 14:11:39.000000000 -0500
+++ linux-2.6/drivers/infiniband/ulp/ipoib/ipoib_ib.c	2010-08-26 14:51:31.000000000 -0500
@@ -271,6 +271,14 @@
 	ipoib_ud_dma_unmap_rx(priv, mapping);
 	ipoib_ud_skb_put_frags(priv, skb, wc->byte_len);

+	/* According to Jason Gunthorpe byte 24 in the GRH has the MGID */
+	if (IN6_IS_ADDR_MULTICAST(skb->data + 24))
+
+		skb->pkt_type = PACKET_MULTICAST;
+
+	else
+		skb->pkt_type = PACKET_HOST;
+
 	skb_pull(skb, IB_GRH_BYTES);

 	skb->protocol = ((struct ipoib_header *) skb->data)->proto;
@@ -281,9 +289,6 @@
 	dev->stats.rx_bytes += skb->len;

 	skb->dev = dev;
-	/* XXX get correct PACKET_ type here */
-	skb->pkt_type = PACKET_HOST;
-
 	if (test_bit(IPOIB_FLAG_CSUM, &priv->flags) && likely(wc->csum_ok))
 		skb->ip_summed = CHECKSUM_UNNECESSARY;

Index: linux-2.6/include/linux/in6.h
===================================================================
--- linux-2.6.orig/include/linux/in6.h	2010-08-26 14:11:39.000000000 -0500
+++ linux-2.6/include/linux/in6.h	2010-08-26 14:11:52.000000000 -0500
@@ -53,6 +53,9 @@
 extern const struct in6_addr in6addr_linklocal_allrouters;
 #define IN6ADDR_LINKLOCAL_ALLROUTERS_INIT \
 		{ { { 0xff,2,0,0,0,0,0,0,0,0,0,0,0,0,0,2 } } }
+
+#define IN6_IS_ADDR_MULTICAST(a) (((const __u8 *) (a))[0] == 0xff)
+
 #endif

 struct sockaddr_in6 {
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [IPoIB] Identify Multicast packets and fix IGMP breakage
       [not found]                 ` <alpine.DEB.2.00.1008261453510.21466-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org>
@ 2010-08-26 20:49                   ` Roland Dreier
       [not found]                     ` <adapqx5rtez.fsf-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 33+ messages in thread
From: Roland Dreier @ 2010-08-26 20:49 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz,
	Jason Gunthorpe, Yossi Etigin

 > +	/* According to Jason Gunthorpe byte 24 in the GRH has the MGID */

I think we could put a better explanation here than "Jason said so."  In
fact since the GRH format matches the IPv6 header format, maybe we do
something like "((struct ipv6hdr *) skb->data)->daddr" to get at the
DGID being used.

 > +	if (IN6_IS_ADDR_MULTICAST(skb->data + 24))
 > +

trivial but this blank line isn't needed here.

 > +		skb->pkt_type = PACKET_MULTICAST;
 > +

or here

 > +	else
 > +		skb->pkt_type = PACKET_HOST;

also it's not clear to me why it's OK to do this test of the DGID if the
packet didn't have a GRH -- presumably we are just looking at random
uninitialized memory so we might incorrectly say some packets are
multicast if that byte happens to be 0xff.  (or does that not matter?
if so why can't we just always make everything PACKET_MULTICAST?)

It seems the check should be something like

	if ((wc->wc_flags & IB_WC_GRH) &&
	    IN6_IS_ADDR_MULTICAST(((struct ipv6hdr *) skb->data)->daddr))

 > --- linux-2.6.orig/include/linux/in6.h	2010-08-26 14:11:39.000000000 -0500
 > +++ linux-2.6/include/linux/in6.h	2010-08-26 14:11:52.000000000 -0500
 > @@ -53,6 +53,9 @@
 >  extern const struct in6_addr in6addr_linklocal_allrouters;
 >  #define IN6ADDR_LINKLOCAL_ALLROUTERS_INIT \
 >  		{ { { 0xff,2,0,0,0,0,0,0,0,0,0,0,0,0,0,2 } } }
 > +
 > +#define IN6_IS_ADDR_MULTICAST(a) (((const __u8 *) (a))[0] == 0xff)

We probably want to run this addition past netdev.

 - R.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [IPoIB] Identify Multicast packets and fix IGMP breakage
       [not found]                     ` <adapqx5rtez.fsf-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
@ 2010-08-26 21:17                       ` Christoph Lameter
       [not found]                         ` <alpine.DEB.2.00.1008261613160.24174-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org>
  0 siblings, 1 reply; 33+ messages in thread
From: Christoph Lameter @ 2010-08-26 21:17 UTC (permalink / raw)
  To: Roland Dreier
  Cc: Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz,
	Jason Gunthorpe, Yossi Etigin

On Thu, 26 Aug 2010, Roland Dreier wrote:

> also it's not clear to me why it's OK to do this test of the DGID if the
> packet didn't have a GRH -- presumably we are just looking at random
> uninitialized memory so we might incorrectly say some packets are
> multicast if that byte happens to be 0xff.  (or does that not matter?
> if so why can't we just always make everything PACKET_MULTICAST?)

We will do an skb_pull to skip over the GRH next? The IP layer checks
PACKET_HOST etc in various places so we may risk breakage in odd places
like what already occurred here. It would be good to also have support
for PACKET_BROADCAST but that would require a full MGID comparison.

> It seems the check should be something like
>
> 	if ((wc->wc_flags & IB_WC_GRH) &&
> 	    IN6_IS_ADDR_MULTICAST(((struct ipv6hdr *) skb->data)->daddr))

Hmmm... More IPV6 stuff slipping in.

> We probably want to run this addition past netdev.

Ok.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [IPoIB] Identify Multicast packets and fix IGMP breakage
       [not found]                         ` <alpine.DEB.2.00.1008261613160.24174-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org>
@ 2010-08-26 21:29                           ` Jason Gunthorpe
  2010-08-26 21:31                           ` [IPoIB] Identify multicast packets and fix IGMP breakage V2 Christoph Lameter
  2010-08-26 21:32                           ` [IPoIB] Identify Multicast packets and fix IGMP breakage Roland Dreier
  2 siblings, 0 replies; 33+ messages in thread
From: Jason Gunthorpe @ 2010-08-26 21:29 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Roland Dreier, Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Or Gerlitz, Yossi Etigin

On Thu, Aug 26, 2010 at 04:17:08PM -0500, Christoph Lameter wrote:
> On Thu, 26 Aug 2010, Roland Dreier wrote:
> 
> > also it's not clear to me why it's OK to do this test of the DGID if the
> > packet didn't have a GRH -- presumably we are just looking at random
> > uninitialized memory so we might incorrectly say some packets are
> > multicast if that byte happens to be 0xff.  (or does that not matter?
> > if so why can't we just always make everything PACKET_MULTICAST?)
>
> We will do an skb_pull to skip over the GRH next? 

The GRH bytes are always present, but the HW may not fill them in if 
IB_WC_GRH is not set.

Checking for the broadcast address should be done by comparing the
MGID against dev->broadcast + 4, like Yossi was describing. This can
be done as a sub-case of the multicast check since dev->broadcast+4
will always start with 0xff.

IPv6 does not have a broadcast address, so the additional work
to check the v6 MGID format Yossi was discussing is not required.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [IPoIB] Identify multicast packets and fix IGMP breakage V2
       [not found]                         ` <alpine.DEB.2.00.1008261613160.24174-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org>
  2010-08-26 21:29                           ` Jason Gunthorpe
@ 2010-08-26 21:31                           ` Christoph Lameter
       [not found]                             ` <alpine.DEB.2.00.1008261629440.24174-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org>
  2010-08-26 21:32                           ` [IPoIB] Identify Multicast packets and fix IGMP breakage Roland Dreier
  2 siblings, 1 reply; 33+ messages in thread
From: Christoph Lameter @ 2010-08-26 21:31 UTC (permalink / raw)
  To: Roland Dreier
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz, Jason Gunthorpe,
	Yossi Etigin, netdev-u79uwXL29TY76Z2rM5mHXA

Is this better?


Subject: [IPoIB] Identify Multicast packets and fix IGMP breakage V2

IGMP processing is broken because the IPOIB does not set the
skb->pkt_type the right way for Multicast traffic. All incoming
packets are set to PACKET_HOST which means that the igmp_recv()
function will ignore the IGMP broadcasts/multicasts.

This in turn means that the IGMP timers are firing and are sending
information about multicast subscriptions unnecessarily. In a large
private network this can cause traffic spikes.

Signed-off-by: Christoph Lameter <cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org>

---
 drivers/infiniband/ulp/ipoib/ipoib_ib.c |   10 +++++++---
 include/linux/in6.h                     |    3 +++
 2 files changed, 10 insertions(+), 3 deletions(-)

Index: linux-2.6/drivers/infiniband/ulp/ipoib/ipoib_ib.c
===================================================================
--- linux-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib_ib.c	2010-08-26 15:11:34.000000000 -0500
+++ linux-2.6/drivers/infiniband/ulp/ipoib/ipoib_ib.c	2010-08-26 16:29:19.000000000 -0500
@@ -271,6 +271,13 @@ static void ipoib_ib_handle_rx_wc(struct
 	ipoib_ud_dma_unmap_rx(priv, mapping);
 	ipoib_ud_skb_put_frags(priv, skb, wc->byte_len);

+	if ((wc->wc_flags & IB_WC_GRH) &&
+		IN6_IS_ADDR_MULTICAST(&((struct ipv6hdr *)skb->data)->daddr))
+
+		skb->pkt_type = PACKET_MULTICAST;
+	else
+		skb->pkt_type = PACKET_HOST;
+
 	skb_pull(skb, IB_GRH_BYTES);

 	skb->protocol = ((struct ipoib_header *) skb->data)->proto;
@@ -281,9 +288,6 @@ static void ipoib_ib_handle_rx_wc(struct
 	dev->stats.rx_bytes += skb->len;

 	skb->dev = dev;
-	/* XXX get correct PACKET_ type here */
-	skb->pkt_type = PACKET_HOST;
-
 	if (test_bit(IPOIB_FLAG_CSUM, &priv->flags) && likely(wc->csum_ok))
 		skb->ip_summed = CHECKSUM_UNNECESSARY;

Index: linux-2.6/include/linux/in6.h
===================================================================
--- linux-2.6.orig/include/linux/in6.h	2010-08-26 15:11:34.000000000 -0500
+++ linux-2.6/include/linux/in6.h	2010-08-26 16:27:08.000000000 -0500
@@ -53,6 +53,9 @@ extern const struct in6_addr in6addr_lin
 extern const struct in6_addr in6addr_linklocal_allrouters;
 #define IN6ADDR_LINKLOCAL_ALLROUTERS_INIT \
 		{ { { 0xff,2,0,0,0,0,0,0,0,0,0,0,0,0,0,2 } } }
+
+#define IN6_IS_ADDR_MULTICAST(a) ((a)->in6_u.u6_addr8[0] == 0xff)
+
 #endif

 struct sockaddr_in6 {
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [IPoIB] Identify Multicast packets and fix IGMP breakage
       [not found]                         ` <alpine.DEB.2.00.1008261613160.24174-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org>
  2010-08-26 21:29                           ` Jason Gunthorpe
  2010-08-26 21:31                           ` [IPoIB] Identify multicast packets and fix IGMP breakage V2 Christoph Lameter
@ 2010-08-26 21:32                           ` Roland Dreier
  2 siblings, 0 replies; 33+ messages in thread
From: Roland Dreier @ 2010-08-26 21:32 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz,
	Jason Gunthorpe, Yossi Etigin

 > > also it's not clear to me why it's OK to do this test of the DGID if the
 > > packet didn't have a GRH -- presumably we are just looking at random
 > > uninitialized memory so we might incorrectly say some packets are
 > > multicast if that byte happens to be 0xff.  (or does that not matter?
 > > if so why can't we just always make everything PACKET_MULTICAST?)

 > We will do an skb_pull to skip over the GRH next? The IP layer checks
 > PACKET_HOST etc in various places so we may risk breakage in odd places
 > like what already occurred here. It would be good to also have support
 > for PACKET_BROADCAST but that would require a full MGID comparison.

I'm not sure I follow what you're saying.  An IB UD packet may or may
not include a GRH on the wire.  If it does, when it's received, the
adapter writes the GRH and then the packet data into the receive
buffer.  If there is no GRH on the wire, then the adapter skips 40 bytes
(size of a GRH/ipv6 header) and then writes the packet data into the
receive buffer.

So in all cases, the real packet data starts 40 bytes into the receive
buffer and we have to do an skb_pull, but if there was no GRH on the
wire, then the 40 bytes we skipped is just uninitialized memory.  The
wc_flags bit IB_WC_GRH tells you if there was a GRH on the wire or not.

It sounds like we want to get this right without random false positives,
so I think we need to test IB_WC_GRH.  (And doing a full compare against
the broadcast MGID for multicast packets wouldn't be *too* bad -- it's
not much worse than what ethernet does)

 > > It seems the check should be something like
 > >
 > > 	if ((wc->wc_flags & IB_WC_GRH) &&
 > > 	    IN6_IS_ADDR_MULTICAST(((struct ipv6hdr *) skb->data)->daddr))
 > 
 > Hmmm... More IPV6 stuff slipping in.

I guess it would be at least as correct to cast to struct ib_grh and
check that the dgid is a multicast GID (we can add an inline function to
check this and as a cleanup we could then get rid of the open-coded
tests in ib_attach_mcast etc)

 - R.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [IPoIB] Identify multicast packets and fix IGMP breakage V2
       [not found]                             ` <alpine.DEB.2.00.1008261629440.24174-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org>
@ 2010-08-26 22:15                               ` David Miller
       [not found]                                 ` <20100826.151553.242147157.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
  0 siblings, 1 reply; 33+ messages in thread
From: David Miller @ 2010-08-26 22:15 UTC (permalink / raw)
  To: cl-vYTEC60ixJUAvxtiuMwx3w
  Cc: rdreier-FYB4Gu1CFyUAvxtiuMwx3w,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	ogerlitz-smomgflXvOZWk0Htik3J/w,
	jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/,
	yosefe-smomgflXvOZWk0Htik3J/w, netdev-u79uwXL29TY76Z2rM5mHXA

From: Christoph Lameter <cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org>
Date: Thu, 26 Aug 2010 16:31:14 -0500 (CDT)

> @@ -271,6 +271,13 @@ static void ipoib_ib_handle_rx_wc(struct
>  	ipoib_ud_dma_unmap_rx(priv, mapping);
>  	ipoib_ud_skb_put_frags(priv, skb, wc->byte_len);
> 
> +	if ((wc->wc_flags & IB_WC_GRH) &&
> +		IN6_IS_ADDR_MULTICAST(&((struct ipv6hdr *)skb->data)->daddr))
> +
> +		skb->pkt_type = PACKET_MULTICAST;
> +	else
> +		skb->pkt_type = PACKET_HOST;

I really don't think you can assume there is an ipv6 header here
at all.

You'll need to parse the encapsulated protocol and process it as
ipv4 or ipv6 as needed.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [IPoIB] Identify multicast packets and fix IGMP breakage V2
       [not found]                                 ` <20100826.151553.242147157.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
@ 2010-08-26 22:21                                   ` Jason Gunthorpe
       [not found]                                     ` <20100826222146.GA23025-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 33+ messages in thread
From: Jason Gunthorpe @ 2010-08-26 22:21 UTC (permalink / raw)
  To: David Miller
  Cc: cl-vYTEC60ixJUAvxtiuMwx3w, rdreier-FYB4Gu1CFyUAvxtiuMwx3w,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	ogerlitz-smomgflXvOZWk0Htik3J/w, yosefe-smomgflXvOZWk0Htik3J/w,
	netdev-u79uwXL29TY76Z2rM5mHXA

On Thu, Aug 26, 2010 at 03:15:53PM -0700, David Miller wrote:
> From: Christoph Lameter <cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org>
> Date: Thu, 26 Aug 2010 16:31:14 -0500 (CDT)
> 
> > @@ -271,6 +271,13 @@ static void ipoib_ib_handle_rx_wc(struct
> >  	ipoib_ud_dma_unmap_rx(priv, mapping);
> >  	ipoib_ud_skb_put_frags(priv, skb, wc->byte_len);
> > 
> > +	if ((wc->wc_flags & IB_WC_GRH) &&
> > +		IN6_IS_ADDR_MULTICAST(&((struct ipv6hdr *)skb->data)->daddr))
> > +
> > +		skb->pkt_type = PACKET_MULTICAST;
> > +	else
> > +		skb->pkt_type = PACKET_HOST;
> 
> I really don't think you can assume there is an ipv6 header here
> at all.

The 40 bytes at this location are defined by the HW specification to
be an IB GRH which has an identical layout to an IPv6 header. Roland
is right, it would be clearer to use ib_grh ->dgid

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [IPoIB] Identify multicast packets and fix IGMP breakage V3
       [not found]                                     ` <20100826222146.GA23025-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2010-08-26 23:26                                       ` Christoph Lameter
       [not found]                                         ` <alpine.DEB.2.00.1008261825490.26351-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org>
  0 siblings, 1 reply; 33+ messages in thread
From: Christoph Lameter @ 2010-08-26 23:26 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: David Miller, rdreier-FYB4Gu1CFyUAvxtiuMwx3w,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	ogerlitz-smomgflXvOZWk0Htik3J/w, yosefe-smomgflXvOZWk0Htik3J/w,
	netdev-u79uwXL29TY76Z2rM5mHXA

On Thu, 26 Aug 2010, Jason Gunthorpe wrote:

> The 40 bytes at this location are defined by the HW specification to
> be an IB GRH which has an identical layout to an IPv6 header. Roland
> is right, it would be clearer to use ib_grh ->dgid

Ok but then we have no nice function that checks for multicast anymore.



Subject: [IPoIB] Identify multicast packets and fix IGMP breakage V3

IGMP processing is broken because the IPOIB does not set the
skb->pkt_type the right way for Multicast traffic. All incoming
packets are set to PACKET_HOST which means that the igmp_recv()
function will ignore the IGMP broadcasts/multicasts.

This in turn means that the IGMP timers are firing and are sending
information about multicast subscriptions unnecessarily. In a large
private network this can cause traffic spikes.

Signed-off-by: Christoph Lameter <cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org>

---

Index: linux-2.6/drivers/infiniband/ulp/ipoib/ipoib_ib.c
===================================================================
--- linux-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib_ib.c	2010-08-26 18:24:07.842079559 -0500
+++ linux-2.6/drivers/infiniband/ulp/ipoib/ipoib_ib.c	2010-08-26 18:25:33.859815544 -0500
@@ -271,6 +271,14 @@
 	ipoib_ud_dma_unmap_rx(priv, mapping);
 	ipoib_ud_skb_put_frags(priv, skb, wc->byte_len);

+	/* First byte of dgid signals multicast when 0xff */
+	if ((wc->wc_flags & IB_WC_GRH) &&
+		((struct ib_grh *)skb->data)->dgid.raw[0] == 0xff)
+
+		skb->pkt_type = PACKET_MULTICAST;
+	else
+		skb->pkt_type = PACKET_HOST;
+
 	skb_pull(skb, IB_GRH_BYTES);

 	skb->protocol = ((struct ipoib_header *) skb->data)->proto;
@@ -281,9 +289,6 @@
 	dev->stats.rx_bytes += skb->len;

 	skb->dev = dev;
-	/* XXX get correct PACKET_ type here */
-	skb->pkt_type = PACKET_HOST;
-
 	if (test_bit(IPOIB_FLAG_CSUM, &priv->flags) && likely(wc->csum_ok))
 		skb->ip_summed = CHECKSUM_UNNECESSARY;

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [IPoIB] Identify multicast packets and fix IGMP breakage V3
       [not found]                                         ` <alpine.DEB.2.00.1008261825490.26351-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org>
@ 2010-08-26 23:43                                           ` Jason Gunthorpe
  2010-08-26 23:57                                             ` Christoph Lameter
       [not found]                                             ` <20100826234342.GA24333-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 2 replies; 33+ messages in thread
From: Jason Gunthorpe @ 2010-08-26 23:43 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: rdreier-FYB4Gu1CFyUAvxtiuMwx3w,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	ogerlitz-smomgflXvOZWk0Htik3J/w, yosefe-smomgflXvOZWk0Htik3J/w,
	netdev-u79uwXL29TY76Z2rM5mHXA

On Thu, Aug 26, 2010 at 06:26:58PM -0500, Christoph Lameter wrote:
> On Thu, 26 Aug 2010, Jason Gunthorpe wrote:
>
> > The 40 bytes at this location are defined by the HW specification to
> > be an IB GRH which has an identical layout to an IPv6 header. Roland
> > is right, it would be clearer to use ib_grh ->dgid
>
> Ok but then we have no nice function that checks for multicast anymore.

Were you going to try it this way?

     /* First byte of dgid signals multicast/broadcast when 0xff */
     if ((wc->wc_flags & IB_WC_GRH) &&
         ((struct ib_grh *)skb->data)->dgid.raw[0] == 0xff) {
	     if (memcmp(((struct ib_grh *)skb->data)->dgid.raw,
                        dev->broadcast + 4, sizeof(union ib_gid)) == 0)
                     skb->pkt_type = PACKET_BROADCAST;
             else
                     skb->pkt_type = PACKET_MULTICAST;
     }
     else
             skb->pkt_type = PACKET_HOST;

I think doing the memcmp only in the multicast path should be
reasonable overhead wise.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [IPoIB] Identify multicast packets and fix IGMP breakage V3
  2010-08-26 23:43                                           ` Jason Gunthorpe
@ 2010-08-26 23:57                                             ` Christoph Lameter
       [not found]                                               ` <alpine.DEB.2.00.1008261856090.29657-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org>
       [not found]                                             ` <20100826234342.GA24333-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  1 sibling, 1 reply; 33+ messages in thread
From: Christoph Lameter @ 2010-08-26 23:57 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: rdreier, linux-rdma, ogerlitz, yosefe, netdev

On Thu, 26 Aug 2010, Jason Gunthorpe wrote:

> I think doing the memcmp only in the multicast path should be
> reasonable overhead wise.

Thats is not always possible. Here the multicast path is the
default path that is taken 99% of the time.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: [IPoIB] Identify multicast packets and fix IGMP breakage V3
       [not found]                                             ` <20100826234342.GA24333-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2010-08-27  0:02                                                 ` Yossi Etigin
  2010-08-27 13:29                                               ` Christoph Lameter
  1 sibling, 0 replies; 33+ messages in thread
From: Yossi Etigin @ 2010-08-27  0:02 UTC (permalink / raw)
  To: Jason Gunthorpe, Christoph Lameter
  Cc: rdreier-FYB4Gu1CFyUAvxtiuMwx3w,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz,
	netdev-u79uwXL29TY76Z2rM5mHXA

> 
> Were you going to try it this way?
> 
>      /* First byte of dgid signals multicast/broadcast when 0xff */
>      if ((wc->wc_flags & IB_WC_GRH) &&
>          ((struct ib_grh *)skb->data)->dgid.raw[0] == 0xff) {
> 	     if (memcmp(((struct ib_grh *)skb->data)->dgid.raw,
>                         dev->broadcast + 4, sizeof(union ib_gid)) ==
0)
>                      skb->pkt_type = PACKET_BROADCAST;
>              else
>                      skb->pkt_type = PACKET_MULTICAST;
>      }
>      else
>              skb->pkt_type = PACKET_HOST;
> 
> I think doing the memcmp only in the multicast path should be
> reasonable overhead wise.
> 

Shouldn't struct ib_grh be packed to make this really work?
The code looks a little messy to me anyway... 
How about using a local var which is a ptr to packed struct ib_grh? The
compiler will probably eliminate it anyway.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 33+ messages in thread

* RE: [IPoIB] Identify multicast packets and fix IGMP breakage V3
@ 2010-08-27  0:02                                                 ` Yossi Etigin
  0 siblings, 0 replies; 33+ messages in thread
From: Yossi Etigin @ 2010-08-27  0:02 UTC (permalink / raw)
  To: Jason Gunthorpe, Christoph Lameter
  Cc: rdreier-FYB4Gu1CFyUAvxtiuMwx3w,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz,
	netdev-u79uwXL29TY76Z2rM5mHXA

> 
> Were you going to try it this way?
> 
>      /* First byte of dgid signals multicast/broadcast when 0xff */
>      if ((wc->wc_flags & IB_WC_GRH) &&
>          ((struct ib_grh *)skb->data)->dgid.raw[0] == 0xff) {
> 	     if (memcmp(((struct ib_grh *)skb->data)->dgid.raw,
>                         dev->broadcast + 4, sizeof(union ib_gid)) ==
0)
>                      skb->pkt_type = PACKET_BROADCAST;
>              else
>                      skb->pkt_type = PACKET_MULTICAST;
>      }
>      else
>              skb->pkt_type = PACKET_HOST;
> 
> I think doing the memcmp only in the multicast path should be
> reasonable overhead wise.
> 

Shouldn't struct ib_grh be packed to make this really work?
The code looks a little messy to me anyway... 
How about using a local var which is a ptr to packed struct ib_grh? The
compiler will probably eliminate it anyway.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [IPoIB] Identify multicast packets and fix IGMP breakage V3
       [not found]                                               ` <alpine.DEB.2.00.1008261856090.29657-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org>
@ 2010-08-27  0:07                                                 ` David Miller
  2010-08-27  2:24                                                   ` Christoph Lameter
  0 siblings, 1 reply; 33+ messages in thread
From: David Miller @ 2010-08-27  0:07 UTC (permalink / raw)
  To: cl-vYTEC60ixJUAvxtiuMwx3w
  Cc: jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/,
	rdreier-FYB4Gu1CFyUAvxtiuMwx3w,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	ogerlitz-smomgflXvOZWk0Htik3J/w, yosefe-smomgflXvOZWk0Htik3J/w,
	netdev-u79uwXL29TY76Z2rM5mHXA

From: Christoph Lameter <cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org>
Date: Thu, 26 Aug 2010 18:57:09 -0500 (CDT)

> On Thu, 26 Aug 2010, Jason Gunthorpe wrote:
> 
>> I think doing the memcmp only in the multicast path should be
>> reasonable overhead wise.
> 
> Thats is not always possible. Here the multicast path is the
> default path that is taken 99% of the time.

The highest cost is bringing in that packet header's cache line, which
you've already done by reading the byte and checking for 0xff.

I doubt the memcmp() can possibly matter.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [IPoIB] Identify multicast packets and fix IGMP breakage V3
       [not found]                                                 ` <7E95F01E94AB484F83061FCFA35B39F89BE276-QfUkFaTmzUSUvQqKE/ONIwC/G2K4zDHf@public.gmane.org>
@ 2010-08-27  0:17                                                   ` Jason Gunthorpe
  0 siblings, 0 replies; 33+ messages in thread
From: Jason Gunthorpe @ 2010-08-27  0:17 UTC (permalink / raw)
  To: Yossi Etigin
  Cc: Christoph Lameter, rdreier-FYB4Gu1CFyUAvxtiuMwx3w,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz,
	netdev-u79uwXL29TY76Z2rM5mHXA

On Fri, Aug 27, 2010 at 03:02:54AM +0300, Yossi Etigin wrote:

> Shouldn't struct ib_grh be packed to make this really work?

No idea what the kernel convention for this is. It looks OK to me, in
that no arch I am familiar with will insert padding.

> The code looks a little messy to me anyway... 
> How about using a local var which is a ptr to packed struct ib_grh? The
> compiler will probably eliminate it anyway.

This bike shed is looking pretty well painted already :)

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [IPoIB] Identify multicast packets and fix IGMP breakage V3
  2010-08-27  0:07                                                 ` David Miller
@ 2010-08-27  2:24                                                   ` Christoph Lameter
  0 siblings, 0 replies; 33+ messages in thread
From: Christoph Lameter @ 2010-08-27  2:24 UTC (permalink / raw)
  To: David Miller; +Cc: jgunthorpe, rdreier, linux-rdma, ogerlitz, yosefe, netdev


On Thu, 26 Aug 2010, David Miller wrote:

> The highest cost is bringing in that packet header's cache line, which
> you've already done by reading the byte and checking for 0xff.

And then you need to bring in the cacheline of the broadcast address....
These are pretty long byte strings in the IB case.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [IPoIB] Identify multicast packets and fix IGMP breakage V3
       [not found]                                             ` <20100826234342.GA24333-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2010-08-27  0:02                                                 ` Yossi Etigin
@ 2010-08-27 13:29                                               ` Christoph Lameter
  2010-09-14  7:27                                                 ` Or Gerlitz
       [not found]                                                 ` <alpine.DEB.2.00.1008270827280.11792-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org>
  1 sibling, 2 replies; 33+ messages in thread
From: Christoph Lameter @ 2010-08-27 13:29 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	ogerlitz-smomgflXvOZWk0Htik3J/w, yosefe-smomgflXvOZWk0Htik3J/w,
	netdev-u79uwXL29TY76Z2rM5mHXA, David Miller

On Thu, 26 Aug 2010, Jason Gunthorpe wrote:

> I think doing the memcmp only in the multicast path should be
> reasonable overhead wise.

Ok the dgid is only 8 bytes not the whole 40 bytes.... Here is the patch
somewhat cleaned up with PACKET_BROADCAST.


Subject: [IPoIB] Identify multicast packets and fix IGMP breakage V3

IGMP processing is broken because the IPOIB does not set the
skb->pkt_type the right way for Multicast traffic. All incoming
packets are set to PACKET_HOST which means that the igmp_recv()
function will ignore the IGMP broadcasts/multicasts.

This in turn means that the IGMP timers are firing and are sending
information about multicast subscriptions unnecessarily. In a large
private network this can cause traffic spikes.

Signed-off-by: Christoph Lameter <cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org>

---

Index: linux-2.6/drivers/infiniband/ulp/ipoib/ipoib_ib.c
===================================================================
--- linux-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib_ib.c	2010-08-26 18:24:07.842079559 -0500
+++ linux-2.6/drivers/infiniband/ulp/ipoib/ipoib_ib.c	2010-08-27 08:26:37.929641162 -0500
@@ -223,6 +223,7 @@
 	unsigned int wr_id = wc->wr_id & ~IPOIB_OP_RECV;
 	struct sk_buff *skb;
 	u64 mapping[IPOIB_UD_RX_SG];
+	union ib_gid *dgid;

 	ipoib_dbg_data(priv, "recv completion: id %d, status: %d\n",
 		       wr_id, wc->status);
@@ -271,6 +272,21 @@
 	ipoib_ud_dma_unmap_rx(priv, mapping);
 	ipoib_ud_skb_put_frags(priv, skb, wc->byte_len);

+	/* First byte of dgid signals multicast when 0xff */
+	dgid = &((struct ib_grh *)skb->data)->dgid;
+
+	if (!(wc->wc_flags & IB_WC_GRH) || dgid->raw[0] != 0xff)
+
+		skb->pkt_type = PACKET_HOST;
+
+	else if (memcmp(dgid, dev->broadcast + 4, sizeof(union ib_gid)) == 0)
+
+		skb->pkt_type = PACKET_BROADCAST;
+
+	else
+
+		skb->pkt_type = PACKET_MULTICAST;
+
 	skb_pull(skb, IB_GRH_BYTES);

 	skb->protocol = ((struct ipoib_header *) skb->data)->proto;
@@ -281,9 +297,6 @@
 	dev->stats.rx_bytes += skb->len;

 	skb->dev = dev;
-	/* XXX get correct PACKET_ type here */
-	skb->pkt_type = PACKET_HOST;
-
 	if (test_bit(IPOIB_FLAG_CSUM, &priv->flags) && likely(wc->csum_ok))
 		skb->ip_summed = CHECKSUM_UNNECESSARY;

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [IPoIB] Identify multicast packets and fix IGMP breakage V3
  2010-08-27 13:29                                               ` Christoph Lameter
@ 2010-09-14  7:27                                                 ` Or Gerlitz
       [not found]                                                   ` <4C8F23DB.2040105-smomgflXvOZWk0Htik3J/w@public.gmane.org>
       [not found]                                                 ` <alpine.DEB.2.00.1008270827280.11792-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org>
  1 sibling, 1 reply; 33+ messages in thread
From: Or Gerlitz @ 2010-09-14  7:27 UTC (permalink / raw)
  To: Christoph Lameter, Roland Dreier
  Cc: Jason Gunthorpe, linux-rdma, yosefe, netdev, David Miller

Christoph Lameter wrote:
> Here is the patch somewhat cleaned up with PACKET_BROADCAST.
> Subject: [IPoIB] Identify multicast packets and fix IGMP breakage V3
>   

I don't see this patch in Roland's for-next branch nor Dave's 
net-next-2.6 tree, is anything else needed to merge that?

Or.

> IGMP processing is broken because the IPOIB does not set the
> skb->pkt_type the right way for Multicast traffic. All incoming
> packets are set to PACKET_HOST which means that the igmp_recv()
> function will ignore the IGMP broadcasts/multicasts.
>
> This in turn means that the IGMP timers are firing and are sending
> information about multicast subscriptions unnecessarily. In a large
> private network this can cause traffic spikes.
>
> Signed-off-by: Christoph Lameter <cl@linux.com>
>
> ---
>
> Index: linux-2.6/drivers/infiniband/ulp/ipoib/ipoib_ib.c
> ===================================================================
> --- linux-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib_ib.c	2010-08-26 18:24:07.842079559 -0500
> +++ linux-2.6/drivers/infiniband/ulp/ipoib/ipoib_ib.c	2010-08-27 08:26:37.929641162 -0500
> @@ -223,6 +223,7 @@
>  	unsigned int wr_id = wc->wr_id & ~IPOIB_OP_RECV;
>  	struct sk_buff *skb;
>  	u64 mapping[IPOIB_UD_RX_SG];
> +	union ib_gid *dgid;
>
>  	ipoib_dbg_data(priv, "recv completion: id %d, status: %d\n",
>  		       wr_id, wc->status);
> @@ -271,6 +272,21 @@
>  	ipoib_ud_dma_unmap_rx(priv, mapping);
>  	ipoib_ud_skb_put_frags(priv, skb, wc->byte_len);
>
> +	/* First byte of dgid signals multicast when 0xff */
> +	dgid = &((struct ib_grh *)skb->data)->dgid;
> +
> +	if (!(wc->wc_flags & IB_WC_GRH) || dgid->raw[0] != 0xff)
> +
> +		skb->pkt_type = PACKET_HOST;
> +
> +	else if (memcmp(dgid, dev->broadcast + 4, sizeof(union ib_gid)) == 0)
> +
> +		skb->pkt_type = PACKET_BROADCAST;
> +
> +	else
> +
> +		skb->pkt_type = PACKET_MULTICAST;
> +
>  	skb_pull(skb, IB_GRH_BYTES);
>
>  	skb->protocol = ((struct ipoib_header *) skb->data)->proto;
> @@ -281,9 +297,6 @@
>  	dev->stats.rx_bytes += skb->len;
>
>  	skb->dev = dev;
> -	/* XXX get correct PACKET_ type here */
> -	skb->pkt_type = PACKET_HOST;
> -
>  	if (test_bit(IPOIB_FLAG_CSUM, &priv->flags) && likely(wc->csum_ok))
>  		skb->ip_summed = CHECKSUM_UNNECESSARY;
>
>   


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [IPoIB] Identify multicast packets and fix IGMP breakage V3
       [not found]                                                   ` <4C8F23DB.2040105-smomgflXvOZWk0Htik3J/w@public.gmane.org>
@ 2010-09-14 14:02                                                     ` Christoph Lameter
  0 siblings, 0 replies; 33+ messages in thread
From: Christoph Lameter @ 2010-09-14 14:02 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Roland Dreier, Jason Gunthorpe,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, yosefe-smomgflXvOZWk0Htik3J/w,
	netdev-u79uwXL29TY76Z2rM5mHXA, David Miller

On Tue, 14 Sep 2010, Or Gerlitz wrote:

> I don't see this patch in Roland's for-next branch nor Dave's net-next-2.6
> tree, is anything else needed to merge that?

No there is nothing else needed.

Roland said he is going to merge it for 2.6.37.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [IPoIB] Identify multicast packets and fix IGMP breakage V3
       [not found]                                                 ` <alpine.DEB.2.00.1008270827280.11792-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org>
@ 2010-09-28 18:09                                                   ` Roland Dreier
  0 siblings, 0 replies; 33+ messages in thread
From: Roland Dreier @ 2010-09-28 18:09 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Jason Gunthorpe, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	ogerlitz-smomgflXvOZWk0Htik3J/w, yosefe-smomgflXvOZWk0Htik3J/w,
	netdev-u79uwXL29TY76Z2rM5mHXA, David Miller

thanks, applied.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2010-09-28 18:09 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-08-23 17:16 IPoIB: Broken IGMP processing Christoph Lameter
     [not found] ` <alpine.DEB.2.00.1008231210010.9840-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org>
2010-08-23 17:41   ` Jason Gunthorpe
     [not found]     ` <20100823174110.GK26549-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2010-08-23 18:10       ` Christoph Lameter
     [not found]         ` <alpine.DEB.2.00.1008231254300.10719-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org>
2010-08-23 18:28           ` Jason Gunthorpe
     [not found]             ` <20100823182859.GL26549-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2010-08-23 19:27               ` Christoph Lameter
     [not found]                 ` <alpine.DEB.2.00.1008231427130.12815-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org>
2010-08-23 19:34                   ` Yossi Etigin
     [not found]                     ` <7E95F01E94AB484F83061FCFA35B39F8013249-QfUkFaTmzUSUvQqKE/ONIwC/G2K4zDHf@public.gmane.org>
2010-08-23 19:45                       ` Jason Gunthorpe
     [not found]                         ` <20100823194555.GN26549-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2010-08-23 19:50                           ` Yossi Etigin
     [not found]                             ` <7E95F01E94AB484F83061FCFA35B39F801324A-QfUkFaTmzUSUvQqKE/ONIwC/G2K4zDHf@public.gmane.org>
2010-08-25 14:43                               ` Christoph Lameter
2010-08-23 19:59                           ` Christoph Lameter
     [not found]                             ` <alpine.DEB.2.00.1008231454230.12815-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org>
2010-08-24 18:49                               ` Jason Gunthorpe
2010-08-23 18:19       ` Christoph Lameter
     [not found]         ` <alpine.DEB.2.00.1008231317150.10719-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org>
2010-08-23 18:30           ` Jason Gunthorpe
     [not found]             ` <20100823183044.GM26549-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2010-08-26 19:55               ` [IPoIB] Identify Multicast packets and fix IGMP breakage Christoph Lameter
     [not found]                 ` <alpine.DEB.2.00.1008261453510.21466-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org>
2010-08-26 20:49                   ` Roland Dreier
     [not found]                     ` <adapqx5rtez.fsf-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
2010-08-26 21:17                       ` Christoph Lameter
     [not found]                         ` <alpine.DEB.2.00.1008261613160.24174-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org>
2010-08-26 21:29                           ` Jason Gunthorpe
2010-08-26 21:31                           ` [IPoIB] Identify multicast packets and fix IGMP breakage V2 Christoph Lameter
     [not found]                             ` <alpine.DEB.2.00.1008261629440.24174-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org>
2010-08-26 22:15                               ` David Miller
     [not found]                                 ` <20100826.151553.242147157.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
2010-08-26 22:21                                   ` Jason Gunthorpe
     [not found]                                     ` <20100826222146.GA23025-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2010-08-26 23:26                                       ` [IPoIB] Identify multicast packets and fix IGMP breakage V3 Christoph Lameter
     [not found]                                         ` <alpine.DEB.2.00.1008261825490.26351-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org>
2010-08-26 23:43                                           ` Jason Gunthorpe
2010-08-26 23:57                                             ` Christoph Lameter
     [not found]                                               ` <alpine.DEB.2.00.1008261856090.29657-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org>
2010-08-27  0:07                                                 ` David Miller
2010-08-27  2:24                                                   ` Christoph Lameter
     [not found]                                             ` <20100826234342.GA24333-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2010-08-27  0:02                                               ` Yossi Etigin
2010-08-27  0:02                                                 ` Yossi Etigin
     [not found]                                                 ` <7E95F01E94AB484F83061FCFA35B39F89BE276-QfUkFaTmzUSUvQqKE/ONIwC/G2K4zDHf@public.gmane.org>
2010-08-27  0:17                                                   ` Jason Gunthorpe
2010-08-27 13:29                                               ` Christoph Lameter
2010-09-14  7:27                                                 ` Or Gerlitz
     [not found]                                                   ` <4C8F23DB.2040105-smomgflXvOZWk0Htik3J/w@public.gmane.org>
2010-09-14 14:02                                                     ` Christoph Lameter
     [not found]                                                 ` <alpine.DEB.2.00.1008270827280.11792-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org>
2010-09-28 18:09                                                   ` Roland Dreier
2010-08-26 21:32                           ` [IPoIB] Identify Multicast packets and fix IGMP breakage Roland Dreier

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.