* IPoIB: Broken IGMP processing @ 2010-08-23 17:16 Christoph Lameter [not found] ` <alpine.DEB.2.00.1008231210010.9840-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org> 0 siblings, 1 reply; 33+ messages in thread From: Christoph Lameter @ 2010-08-23 17:16 UTC (permalink / raw) To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Or Gerlitz, Yossi Etigin, Roland Dreier We see that IGMP timers are not properly deferred when hosts send IGMP membership information. It looks as if the IPoIB layer does not properly mark the multicast/broadcast packets with PACKET_MULTICAST or PACKET_BROADCAST. As a results icmp_recv() ignores the IGMP membership information from others. That in turn results in the IGMP timers frequently expiring, thus the network becomes quite chatty. The following is an untested patch: I am not sure how exact to access the ipob Mac header. The IB header contains a special marker for IPoIB multicast. So it should be simple to identify the multicast packets in the receive path. Subject: [IB] Make igmp processing work with IPOIB IGMP processing is broken because the IPOIB does not set the skb->pkt_type the right way for Multicast traffic. All incoming packets are set to PACKET_HOST which means that the igmp_recv() function will ignore the IGMP broadcasts/multicasts. This in turn means that the IGMP timers are firing and are sending information about multicast subscriptions unnecessarily. In a large private network this can cause traffic spikes. Signed-off-by: Christoph Lameter <cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org> --- drivers/infiniband/ulp/ipoib/ipoib.h | 13 +++++++++++++ drivers/infiniband/ulp/ipoib/ipoib_ib.c | 6 ++++-- 2 files changed, 17 insertions(+), 2 deletions(-) Index: linux-2.6/drivers/infiniband/ulp/ipoib/ipoib.h =================================================================== --- linux-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib.h 2010-08-20 19:44:13.000000000 -0500 +++ linux-2.6/drivers/infiniband/ulp/ipoib/ipoib.h 2010-08-20 19:58:21.000000000 -0500 @@ -114,6 +114,9 @@ enum { #define IPOIB_OP_CM (0) #endif +#define IPOIB_MGID_IPV4_SIGNATURE 0x401B +#define IPOIB_MGID_IPV6_SIGNATURE 0x601B + /* structs */ struct ipoib_header { @@ -125,6 +128,16 @@ struct ipoib_pseudoheader { u8 hwaddr[INFINIBAND_ALEN]; }; +int ipoib_is_ipv4_multicast(u8 *p) +{ + return *((u16 *)(p + 2)) == htonl(IPOIB_MGID_IPV4_SIGNATURE); +} + +int ipoib_is_ipv6_multicast(u8 *p) +{ + return *((u16 *)(p + 2)) == htonl(IPOIB_MGID_IPV6_SIGNATURE); +} + /* Used for all multicast joins (broadcast, IPv4 mcast and IPv6 mcast) */ struct ipoib_mcast { struct ib_sa_mcmember_rec mcmember; Index: linux-2.6/drivers/infiniband/ulp/ipoib/ipoib_ib.c =================================================================== --- linux-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib_ib.c 2010-08-20 18:43:44.000000000 -0500 +++ linux-2.6/drivers/infiniband/ulp/ipoib/ipoib_ib.c 2010-08-20 19:58:34.000000000 -0500 @@ -281,8 +281,10 @@ static void ipoib_ib_handle_rx_wc(struct dev->stats.rx_bytes += skb->len; skb->dev = dev; - /* XXX get correct PACKET_ type here */ - skb->pkt_type = PACKET_HOST; + if (ipoib_is_ipv4_multicast(skb_mac_header(skb))) + skb->pkt_type = PACKET_MULTICAST; + else + skb->pkt_type = PACKET_HOST; if (test_bit(IPOIB_FLAG_CSUM, &priv->flags) && likely(wc->csum_ok)) skb->ip_summed = CHECKSUM_UNNECESSARY; -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 33+ messages in thread
[parent not found: <alpine.DEB.2.00.1008231210010.9840-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org>]
* Re: IPoIB: Broken IGMP processing [not found] ` <alpine.DEB.2.00.1008231210010.9840-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org> @ 2010-08-23 17:41 ` Jason Gunthorpe [not found] ` <20100823174110.GK26549-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 0 siblings, 1 reply; 33+ messages in thread From: Jason Gunthorpe @ 2010-08-23 17:41 UTC (permalink / raw) To: Christoph Lameter Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz, Yossi Etigin, Roland Dreier On Mon, Aug 23, 2010 at 12:16:40PM -0500, Christoph Lameter wrote: > +int ipoib_is_ipv4_multicast(u8 *p) > +{ > + return *((u16 *)(p + 2)) == htonl(IPOIB_MGID_IPV4_SIGNATURE); > +} > + > +int ipoib_is_ipv6_multicast(u8 *p) > +{ > + return *((u16 *)(p + 2)) == htonl(IPOIB_MGID_IPV6_SIGNATURE); > +} static inline for functions in headers? > + > /* Used for all multicast joins (broadcast, IPv4 mcast and IPv6 mcast) */ > struct ipoib_mcast { > struct ib_sa_mcmember_rec mcmember; > Index: linux-2.6/drivers/infiniband/ulp/ipoib/ipoib_ib.c > =================================================================== > +++ linux-2.6/drivers/infiniband/ulp/ipoib/ipoib_ib.c 2010-08-20 19:58:34.000000000 -0500 > @@ -281,8 +281,10 @@ static void ipoib_ib_handle_rx_wc(struct > dev->stats.rx_bytes += skb->len; > > skb->dev = dev; > - /* XXX get correct PACKET_ type here */ > - skb->pkt_type = PACKET_HOST; > + if (ipoib_is_ipv4_multicast(skb_mac_header(skb))) > + skb->pkt_type = PACKET_MULTICAST; > + else > + skb->pkt_type = PACKET_HOST; > > if (test_bit(IPOIB_FLAG_CSUM, &priv->flags) && likely(wc->csum_ok)) > skb->ip_summed = CHECKSUM_UNNECESSARY; Hmmm... What are you trying to access here? I'm guessing it is the DGID of the GRH? ipoib_ud_skb_put_frags(priv, skb, wc->byte_len); skb_pull(skb, IB_GRH_BYTES); <-- These are the bytes you want skb_reset_mac_header(skb); <-- Sets skb_mac_header to skb->head+40 skb_pull(skb, IPOIB_ENCAP_LEN); So, I think you are accessing byte 42, which doesn't seem right? The DGID starts in byte 24 from skb->head. Also, you need to check for IBV_WC_GRH, the 40 bytes are garbage if it is not set. Maybe checking for checking wc->qp_num == multicast QPN is a better choice? Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 33+ messages in thread
[parent not found: <20100823174110.GK26549-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>]
* Re: IPoIB: Broken IGMP processing [not found] ` <20100823174110.GK26549-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> @ 2010-08-23 18:10 ` Christoph Lameter [not found] ` <alpine.DEB.2.00.1008231254300.10719-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org> 2010-08-23 18:19 ` Christoph Lameter 1 sibling, 1 reply; 33+ messages in thread From: Christoph Lameter @ 2010-08-23 18:10 UTC (permalink / raw) To: Jason Gunthorpe Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz, Yossi Etigin, Roland Dreier On Mon, 23 Aug 2010, Jason Gunthorpe wrote: > On Mon, Aug 23, 2010 at 12:16:40PM -0500, Christoph Lameter wrote: > > > +int ipoib_is_ipv4_multicast(u8 *p) > > +{ > > + return *((u16 *)(p + 2)) == htonl(IPOIB_MGID_IPV4_SIGNATURE); > > +} > > + > > +int ipoib_is_ipv6_multicast(u8 *p) > > +{ > > + return *((u16 *)(p + 2)) == htonl(IPOIB_MGID_IPV6_SIGNATURE); > > +} > > static inline for functions in headers? Right. > Maybe checking for checking wc->qp_num == multicast QPN is a better > choice? If that is properly set then yes of course that is much easier. Would this work? The broadcast QP is used for multicast right? Subject: [IB] Make igmp processing work with IPOIB IGMP processing is broken because the IPOIB does not set the skb->pkt_type the right way for Multicast traffic. All incoming packets are set to PACKET_HOST which means that the igmp_recv() function will ignore the IGMP broadcasts/multicasts. This in turn means that the IGMP timers are firing and are sending information about multicast subscriptions unnecessarily. In a large private network this can cause traffic spikes. Signed-off-by: Christoph Lameter <cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org> --- drivers/infiniband/ulp/ipoib/ipoib_ib.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) Index: linux-2.6/drivers/infiniband/ulp/ipoib/ipoib_ib.c =================================================================== --- linux-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib_ib.c 2010-08-23 13:07:32.000000000 -0500 +++ linux-2.6/drivers/infiniband/ulp/ipoib/ipoib_ib.c 2010-08-23 13:09:06.000000000 -0500 @@ -223,6 +223,7 @@ static void ipoib_ib_handle_rx_wc(struct unsigned int wr_id = wc->wr_id & ~IPOIB_OP_RECV; struct sk_buff *skb; u64 mapping[IPOIB_UD_RX_SG]; + struct ipoib_dev_priv *multicast_priv = netdev_priv(priv->broadcast->dev); ipoib_dbg_data(priv, "recv completion: id %d, status: %d\n", wr_id, wc->status); @@ -281,8 +282,11 @@ static void ipoib_ib_handle_rx_wc(struct dev->stats.rx_bytes += skb->len; skb->dev = dev; - /* XXX get correct PACKET_ type here */ - skb->pkt_type = PACKET_HOST; + if (wc->src_qp == multicast_priv->qp->qp_num) + + skb->pkt_type = PACKET_MULTICAST; + else + skb->pkt_type = PACKET_HOST; if (test_bit(IPOIB_FLAG_CSUM, &priv->flags) && likely(wc->csum_ok)) skb->ip_summed = CHECKSUM_UNNECESSARY; -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 33+ messages in thread
[parent not found: <alpine.DEB.2.00.1008231254300.10719-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org>]
* Re: IPoIB: Broken IGMP processing [not found] ` <alpine.DEB.2.00.1008231254300.10719-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org> @ 2010-08-23 18:28 ` Jason Gunthorpe [not found] ` <20100823182859.GL26549-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 0 siblings, 1 reply; 33+ messages in thread From: Jason Gunthorpe @ 2010-08-23 18:28 UTC (permalink / raw) To: Christoph Lameter Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz, Yossi Etigin, Roland Dreier On Mon, Aug 23, 2010 at 01:10:45PM -0500, Christoph Lameter wrote: > + if (wc->src_qp == multicast_priv->qp->qp_num) > + > + skb->pkt_type = PACKET_MULTICAST; > + else > + skb->pkt_type = PACKET_HOST; src_qp is just the send QPN, you need to look at qp_num (aka dest qp). I'm not entirely sure what it will be, I didn't find anything too clear in the spec. If it is 0xFFFFFF then the HCA is copying the dest QP directly into the WC and this can work, if it is something else then the HCA is setting it to the QPN of the RQ that recieved the packet, which is not useful for this. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 33+ messages in thread
[parent not found: <20100823182859.GL26549-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>]
* Re: IPoIB: Broken IGMP processing [not found] ` <20100823182859.GL26549-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> @ 2010-08-23 19:27 ` Christoph Lameter [not found] ` <alpine.DEB.2.00.1008231427130.12815-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org> 0 siblings, 1 reply; 33+ messages in thread From: Christoph Lameter @ 2010-08-23 19:27 UTC (permalink / raw) To: Jason Gunthorpe Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz, Yossi Etigin, Roland Dreier On Mon, 23 Aug 2010, Jason Gunthorpe wrote: > src_qp is just the send QPN, you need to look at qp_num (aka dest qp). > I'm not entirely sure what it will be, I didn't find anything too clear > in the spec. If it is 0xFFFFFF then the HCA is copying the dest QP > directly into the WC and this can work, if it is something else then > the HCA is setting it to the QPN of the RQ that recieved the packet, > which is not useful for this. AFAICT the spec has a QP for RC and one for UD packets. How do these relate to multicast? -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 33+ messages in thread
[parent not found: <alpine.DEB.2.00.1008231427130.12815-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org>]
* RE: IPoIB: Broken IGMP processing [not found] ` <alpine.DEB.2.00.1008231427130.12815-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org> @ 2010-08-23 19:34 ` Yossi Etigin [not found] ` <7E95F01E94AB484F83061FCFA35B39F8013249-QfUkFaTmzUSUvQqKE/ONIwC/G2K4zDHf@public.gmane.org> 0 siblings, 1 reply; 33+ messages in thread From: Yossi Etigin @ 2010-08-23 19:34 UTC (permalink / raw) To: Christoph Lameter, Jason Gunthorpe Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz, Roland Dreier Hello Christoph, I think you'll first want to check the wc_flags field in the work completion to see whether GRH is present (and it must be present in multicast packets, normally not present in unicast). Then, extract the destination gid from grh, and compare the first 6 bytes of it to the dev->broadcast address bytes 4...9 (once with dev->broadcast which is IPv4, second time with dev->broadcast with the IP version nibble changed to 6). The first 4 bytes of dev->broadcast are QPN. Of course, you'll be doing this before pulling any other headers from the skb because GRH comes first. The QPN you get in the completion is the QP from which it was sent - src_qp. You don't get the QPN which was put in the UD header by the sender, which is 0xFFFFFF for multicast. In userspace (aka struct ibv_wc), you get the qp number from which the receive buffer was taken, and in kernel space (aka struct ib_wc, no 'v') you get the pointer to the QP. You can do multicast on UC QP. You can't do multicast on RC QP. BTW In your first patch looks like you are doing htonl when you should be doing htons (or cpu_to_be16 which is good with constants) > -----Original Message----- > From: Christoph Lameter [mailto:cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org] > Sent: Monday, August 23, 2010 22:28 > To: Jason Gunthorpe > Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Or Gerlitz; Yossi Etigin; Roland Dreier > Subject: Re: IPoIB: Broken IGMP processing > > On Mon, 23 Aug 2010, Jason Gunthorpe wrote: > > > src_qp is just the send QPN, you need to look at qp_num (aka dest qp). > > I'm not entirely sure what it will be, I didn't find anything too clear > > in the spec. If it is 0xFFFFFF then the HCA is copying the dest QP > > directly into the WC and this can work, if it is something else then > > the HCA is setting it to the QPN of the RQ that recieved the packet, > > which is not useful for this. > > AFAICT the spec has a QP for RC and one for UD packets. How do these > relate to multicast? -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 33+ messages in thread
[parent not found: <7E95F01E94AB484F83061FCFA35B39F8013249-QfUkFaTmzUSUvQqKE/ONIwC/G2K4zDHf@public.gmane.org>]
* Re: IPoIB: Broken IGMP processing [not found] ` <7E95F01E94AB484F83061FCFA35B39F8013249-QfUkFaTmzUSUvQqKE/ONIwC/G2K4zDHf@public.gmane.org> @ 2010-08-23 19:45 ` Jason Gunthorpe [not found] ` <20100823194555.GN26549-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 0 siblings, 1 reply; 33+ messages in thread From: Jason Gunthorpe @ 2010-08-23 19:45 UTC (permalink / raw) To: Yossi Etigin Cc: Christoph Lameter, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz, Roland Dreier On Mon, Aug 23, 2010 at 10:34:38PM +0300, Yossi Etigin wrote: > Hello Christoph, > > I think you'll first want to check the wc_flags field in the work > completion to see whether GRH is present (and it must be present in > multicast packets, normally not present in unicast). Then, extract the > destination gid from grh, and compare the first 6 bytes of it to the > dev->broadcast address bytes 4...9 (once with dev->broadcast which is > IPv4, second time with dev->broadcast with the IP version nibble changed > to 6). The first 4 bytes of dev->broadcast are QPN. > Of course, you'll be doing this before pulling any other headers from > the skb because GRH comes first. > The QPN you get in the completion is the QP from which it was sent - > src_qp. You don't get the QPN which was put in the UD header by the > sender, which is 0xFFFFFF for multicast. In userspace (aka struct > ibv_wc), you get the qp number from which the receive buffer was taken, > and in kernel space (aka struct ib_wc, no 'v') you get the pointer to > the QP. Ah, so that clears things up. Sorry to have confused things with this suggestion. Simplest then is to check if byte 24 of the packet is 0xff. (ie IN6_IS_ADDR_MULTICAST) No need to worry about if it is properly formed or anything, if it is a multicast DGID then it is a multicast packet at the link level. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 33+ messages in thread
[parent not found: <20100823194555.GN26549-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>]
* RE: IPoIB: Broken IGMP processing [not found] ` <20100823194555.GN26549-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> @ 2010-08-23 19:50 ` Yossi Etigin [not found] ` <7E95F01E94AB484F83061FCFA35B39F801324A-QfUkFaTmzUSUvQqKE/ONIwC/G2K4zDHf@public.gmane.org> 2010-08-23 19:59 ` Christoph Lameter 1 sibling, 1 reply; 33+ messages in thread From: Yossi Etigin @ 2010-08-23 19:50 UTC (permalink / raw) To: Jason Gunthorpe Cc: Christoph Lameter, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz, Roland Dreier > > Simplest then is to check if byte 24 of the packet is 0xff. > (ie IN6_IS_ADDR_MULTICAST) > > No need to worry about if it is properly formed or anything, if it is > a multicast DGID then it is a multicast packet at the link level. > Sounds good to me -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 33+ messages in thread
[parent not found: <7E95F01E94AB484F83061FCFA35B39F801324A-QfUkFaTmzUSUvQqKE/ONIwC/G2K4zDHf@public.gmane.org>]
* RE: IPoIB: Broken IGMP processing [not found] ` <7E95F01E94AB484F83061FCFA35B39F801324A-QfUkFaTmzUSUvQqKE/ONIwC/G2K4zDHf@public.gmane.org> @ 2010-08-25 14:43 ` Christoph Lameter 0 siblings, 0 replies; 33+ messages in thread From: Christoph Lameter @ 2010-08-25 14:43 UTC (permalink / raw) To: Yossi Etigin Cc: Jason Gunthorpe, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz, Roland Dreier On Mon, 23 Aug 2010, Yossi Etigin wrote: > > > > Simplest then is to check if byte 24 of the packet is 0xff. > > (ie IN6_IS_ADDR_MULTICAST) > > > > No need to worry about if it is properly formed or anything, if it is > > a multicast DGID then it is a multicast packet at the link level. > > > > Sounds good to me Therefore this patch? Subject: [IB] Make igmp processing work with IPOIB IGMP processing is broken because the IPOIB does not set the skb->pkt_type the right way for Multicast traffic. All incoming packets are set to PACKET_HOST which means that the igmp_recv() function will ignore the IGMP broadcasts/multicasts. This in turn means that the IGMP timers are firing and are sending information about multicast subscriptions unnecessarily. In a large private network this can cause traffic spikes. Signed-off-by: Christoph Lameter <cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org> --- drivers/infiniband/ulp/ipoib/ipoib_ib.c | 7 +++++-- include/linux/in6.h | 3 +++ 2 files changed, 8 insertions(+), 2 deletions(-) Index: linux-2.6/drivers/infiniband/ulp/ipoib/ipoib_ib.c =================================================================== --- linux-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib_ib.c 2010-08-23 16:04:38.000000000 -0500 +++ linux-2.6/drivers/infiniband/ulp/ipoib/ipoib_ib.c 2010-08-25 09:43:01.000000000 -0500 @@ -281,8 +281,11 @@ static void ipoib_ib_handle_rx_wc(struct dev->stats.rx_bytes += skb->len; skb->dev = dev; - /* XXX get correct PACKET_ type here */ - skb->pkt_type = PACKET_HOST; + if (IN6_IS_ADDR_MULTICAST(skb->head + 24)) + + skb->pkt_type = PACKET_MULTICAST; + else + skb->pkt_type = PACKET_HOST; if (test_bit(IPOIB_FLAG_CSUM, &priv->flags) && likely(wc->csum_ok)) skb->ip_summed = CHECKSUM_UNNECESSARY; Index: linux-2.6/include/linux/in6.h =================================================================== --- linux-2.6.orig/include/linux/in6.h 2010-08-25 09:39:40.000000000 -0500 +++ linux-2.6/include/linux/in6.h 2010-08-25 09:40:22.000000000 -0500 @@ -53,6 +53,9 @@ extern const struct in6_addr in6addr_lin extern const struct in6_addr in6addr_linklocal_allrouters; #define IN6ADDR_LINKLOCAL_ALLROUTERS_INIT \ { { { 0xff,2,0,0,0,0,0,0,0,0,0,0,0,0,0,2 } } } + +#define IN6_IS_ADDR_MULTICAST(a) (((const __u8 *) (a))[0] == 0xff) + #endif struct sockaddr_in6 { -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: IPoIB: Broken IGMP processing [not found] ` <20100823194555.GN26549-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 2010-08-23 19:50 ` Yossi Etigin @ 2010-08-23 19:59 ` Christoph Lameter [not found] ` <alpine.DEB.2.00.1008231454230.12815-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org> 1 sibling, 1 reply; 33+ messages in thread From: Christoph Lameter @ 2010-08-23 19:59 UTC (permalink / raw) To: Jason Gunthorpe Cc: Yossi Etigin, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz, Roland Dreier On Mon, 23 Aug 2010, Jason Gunthorpe wrote: > Simplest then is to check if byte 24 of the packet is 0xff. > (ie IN6_IS_ADDR_MULTICAST) Dont see that function defined anywhere in the kernel. > No need to worry about if it is properly formed or anything, if it is > a multicast DGID then it is a multicast packet at the link level. ok so skb->head[24] = 0xff ? Looks raw and not kernel style. There need to be some skb_xxx function that get to it. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 33+ messages in thread
[parent not found: <alpine.DEB.2.00.1008231454230.12815-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org>]
* Re: IPoIB: Broken IGMP processing [not found] ` <alpine.DEB.2.00.1008231454230.12815-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org> @ 2010-08-24 18:49 ` Jason Gunthorpe 0 siblings, 0 replies; 33+ messages in thread From: Jason Gunthorpe @ 2010-08-24 18:49 UTC (permalink / raw) To: Christoph Lameter Cc: Yossi Etigin, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz, Roland Dreier On Mon, Aug 23, 2010 at 02:59:45PM -0500, Christoph Lameter wrote: > On Mon, 23 Aug 2010, Jason Gunthorpe wrote: > > > Simplest then is to check if byte 24 of the packet is 0xff. > > (ie IN6_IS_ADDR_MULTICAST) > > Dont see that function defined anywhere in the kernel. That is what userspace calls it.. How about ipv6_addr_type(((ipv6hdr *)skb->head)->daddr) & IPV6_ADDR_MULTICAST There are 32 occurrences of 'skb->head' in drivers/net.. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: IPoIB: Broken IGMP processing [not found] ` <20100823174110.GK26549-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 2010-08-23 18:10 ` Christoph Lameter @ 2010-08-23 18:19 ` Christoph Lameter [not found] ` <alpine.DEB.2.00.1008231317150.10719-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org> 1 sibling, 1 reply; 33+ messages in thread From: Christoph Lameter @ 2010-08-23 18:19 UTC (permalink / raw) To: Jason Gunthorpe Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz, Yossi Etigin, Roland Dreier On Mon, 23 Aug 2010, Jason Gunthorpe wrote: > Hmmm... What are you trying to access here? I'm guessing it is the > DGID of the GRH? > > ipoib_ud_skb_put_frags(priv, skb, wc->byte_len); > skb_pull(skb, IB_GRH_BYTES); <-- These are the bytes you want > skb_reset_mac_header(skb); <-- Sets skb_mac_header to skb->head+40 > skb_pull(skb, IPOIB_ENCAP_LEN); > > So, I think you are accessing byte 42, which doesn't seem right? The > DGID starts in byte 24 from skb->head. > > Also, you need to check for IBV_WC_GRH, the 40 bytes are garbage if it > is not set. Trying to get the MGID information: >From http://tools.ietf.org/html/draft-ietf-ipoib-link-multicast-00 7. The IPoIB All-Node Multicast and Broadcast Group Once an IB partition is created with link attributes identified for an IPoIB link, the network administrator must create a special IB multicast group for every node on the IPoIB link to join. This is achieved through the creation of "MCGroupRecord" in each IB subnet that the IB partition encompasses, as described in section 4 above. The MGID will have the P_Key of the IB partition that defines the IPoIB link embedded in it. A special signature is also embedded to identify the MGID for IPoIB use only. For IPv4 over IB, the signature will be "0x401B". For IPv6 over IB, the signature will be "0x601B". For an IPv4 subnet, the MGID for this special IB multicast group SHALL have the following format: | 8 | 4 | 4 | 16 bits | 16 bits | 48 bits | 32 bits | +--------+----+----+-----------------+---------+----------+---------+ |11111111|0001|scop|<IPoIB signature>|< P_Key >|00.......0|<all 1's>| +--------+----+----+-----------------+---------+----------+---------+ Chu & Kashyap [Page 7] draft-ietf-ipoib-link-multicast-00.txt January 2002 For an IPv6 subnet, the format of the MGID SHALL look like this: | 8 | 4 | 4 | 16 bits | 16 bits | 80 bits | +--------+----+----+-----------------+---------+--------------------+ |11111111|0001|scop|<IPoIB signature>|< P_Key >|000.............0001| +--------+----+----+-----------------+---------+--------------------+ As for the scop bits, if the IPoIB link is fully contained within a single IB subnet, the scop bits SHALL be set to 2 (link-local). Otherwise the scope will be set higher. A MCGroupRecord will be created with all the IPoIB link attributes described before, including the link MTU, Q_Key, TClass, FlowLabel, and HopLimit. When a node is attached to an IPoIB link identified by a P_Key, it must look for a special, all-node multicast/broadcast group to join. This is done by constructing the MGID with the link P_Key and the IPoIB signature. The node SHOULD always look for a MGID of a link-local scope first before attempting one with a greater scope. Once the right MGID and MCGroupRecord are identified, the local node SHOULD use the link MTU recorded in the MCGroupRecord. It MUST accept a smaller MTU if one is advertised through the link MTU option of a router advertisement [DISC]. In case the link MTU is greater than the maximum payload size that the local HCA can support, the node can not join the IPoIB link and operate as an IP node. After the right MTU is determined, the local node must join the special all-node multicast/broadcast group by calling the SA to create a MCMemberRecord corresponding to the MGID. The SA will return all the link attributes for the local node to use. The node MUST use these attributes in all future multicast operations to the local IPoIB link. The broadcast group for IPv4 will serve to provide a broadcast service for protocol like ARP to use. In addition to the all-node multicast/broadcast group, an all-router multicast group SHOULD be created at link configuration time if an IP router will be attached to the link. This is to facilitate IP multicast operations described later. A MCGroupRecord for the all- router MGID must be created in every IB subnet that the IPoIB link encompasses. The format of the all-router MGID will be covered in next section. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 33+ messages in thread
[parent not found: <alpine.DEB.2.00.1008231317150.10719-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org>]
* Re: IPoIB: Broken IGMP processing [not found] ` <alpine.DEB.2.00.1008231317150.10719-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org> @ 2010-08-23 18:30 ` Jason Gunthorpe [not found] ` <20100823183044.GM26549-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 0 siblings, 1 reply; 33+ messages in thread From: Jason Gunthorpe @ 2010-08-23 18:30 UTC (permalink / raw) To: Christoph Lameter Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz, Yossi Etigin, Roland Dreier On Mon, Aug 23, 2010 at 01:19:20PM -0500, Christoph Lameter wrote: > On Mon, 23 Aug 2010, Jason Gunthorpe wrote: > > > Hmmm... What are you trying to access here? I'm guessing it is the > > DGID of the GRH? > > > > ipoib_ud_skb_put_frags(priv, skb, wc->byte_len); > > skb_pull(skb, IB_GRH_BYTES); <-- These are the bytes you want > > skb_reset_mac_header(skb); <-- Sets skb_mac_header to skb->head+40 > > skb_pull(skb, IPOIB_ENCAP_LEN); > > > > So, I think you are accessing byte 42, which doesn't seem right? The > > DGID starts in byte 24 from skb->head. > > > > Also, you need to check for IBV_WC_GRH, the 40 bytes are garbage if it > > is not set. > > Trying to get the MGID information: > > From http://tools.ietf.org/html/draft-ietf-ipoib-link-multicast-00 The MGID is stuffed into the DGID field of the GRH by the sender, so you want to start at byte 24 of the wc data. The structure is 40 bytes of GRH, 4 bytes of IPOIB_ENCAP with the protocol number, and then the datagram. It looks to me like the MAC header pointer is set to the IPOIB_ENCAP header. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 33+ messages in thread
[parent not found: <20100823183044.GM26549-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>]
* [IPoIB] Identify Multicast packets and fix IGMP breakage [not found] ` <20100823183044.GM26549-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> @ 2010-08-26 19:55 ` Christoph Lameter [not found] ` <alpine.DEB.2.00.1008261453510.21466-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org> 0 siblings, 1 reply; 33+ messages in thread From: Christoph Lameter @ 2010-08-26 19:55 UTC (permalink / raw) To: Roland Dreier Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz, Jason Gunthorpe, Yossi Etigin I had to move the check around a bit but the patch now passes our tests. Please merge soon. Subject: [IPoIB] Identify Multicast packets and fix IGMP breakage IGMP processing is broken because the IPOIB does not set the skb->pkt_type the right way for Multicast traffic. All incoming packets are set to PACKET_HOST which means that the igmp_recv() function will ignore the IGMP broadcasts/multicasts. This in turn means that the IGMP timers are firing and are sending information about multicast subscriptions unnecessarily. In a large private network this can cause traffic spikes. Signed-off-by: Christoph Lameter <cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org> --- drivers/infiniband/ulp/ipoib/ipoib_ib.c | 7 +++++-- include/linux/in6.h | 3 +++ 2 files changed, 8 insertions(+), 2 deletions(-) Index: linux-2.6/drivers/infiniband/ulp/ipoib/ipoib_ib.c =================================================================== --- linux-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib_ib.c 2010-08-26 14:11:39.000000000 -0500 +++ linux-2.6/drivers/infiniband/ulp/ipoib/ipoib_ib.c 2010-08-26 14:51:31.000000000 -0500 @@ -271,6 +271,14 @@ ipoib_ud_dma_unmap_rx(priv, mapping); ipoib_ud_skb_put_frags(priv, skb, wc->byte_len); + /* According to Jason Gunthorpe byte 24 in the GRH has the MGID */ + if (IN6_IS_ADDR_MULTICAST(skb->data + 24)) + + skb->pkt_type = PACKET_MULTICAST; + + else + skb->pkt_type = PACKET_HOST; + skb_pull(skb, IB_GRH_BYTES); skb->protocol = ((struct ipoib_header *) skb->data)->proto; @@ -281,9 +289,6 @@ dev->stats.rx_bytes += skb->len; skb->dev = dev; - /* XXX get correct PACKET_ type here */ - skb->pkt_type = PACKET_HOST; - if (test_bit(IPOIB_FLAG_CSUM, &priv->flags) && likely(wc->csum_ok)) skb->ip_summed = CHECKSUM_UNNECESSARY; Index: linux-2.6/include/linux/in6.h =================================================================== --- linux-2.6.orig/include/linux/in6.h 2010-08-26 14:11:39.000000000 -0500 +++ linux-2.6/include/linux/in6.h 2010-08-26 14:11:52.000000000 -0500 @@ -53,6 +53,9 @@ extern const struct in6_addr in6addr_linklocal_allrouters; #define IN6ADDR_LINKLOCAL_ALLROUTERS_INIT \ { { { 0xff,2,0,0,0,0,0,0,0,0,0,0,0,0,0,2 } } } + +#define IN6_IS_ADDR_MULTICAST(a) (((const __u8 *) (a))[0] == 0xff) + #endif struct sockaddr_in6 { -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 33+ messages in thread
[parent not found: <alpine.DEB.2.00.1008261453510.21466-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org>]
* Re: [IPoIB] Identify Multicast packets and fix IGMP breakage [not found] ` <alpine.DEB.2.00.1008261453510.21466-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org> @ 2010-08-26 20:49 ` Roland Dreier [not found] ` <adapqx5rtez.fsf-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org> 0 siblings, 1 reply; 33+ messages in thread From: Roland Dreier @ 2010-08-26 20:49 UTC (permalink / raw) To: Christoph Lameter Cc: Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz, Jason Gunthorpe, Yossi Etigin > + /* According to Jason Gunthorpe byte 24 in the GRH has the MGID */ I think we could put a better explanation here than "Jason said so." In fact since the GRH format matches the IPv6 header format, maybe we do something like "((struct ipv6hdr *) skb->data)->daddr" to get at the DGID being used. > + if (IN6_IS_ADDR_MULTICAST(skb->data + 24)) > + trivial but this blank line isn't needed here. > + skb->pkt_type = PACKET_MULTICAST; > + or here > + else > + skb->pkt_type = PACKET_HOST; also it's not clear to me why it's OK to do this test of the DGID if the packet didn't have a GRH -- presumably we are just looking at random uninitialized memory so we might incorrectly say some packets are multicast if that byte happens to be 0xff. (or does that not matter? if so why can't we just always make everything PACKET_MULTICAST?) It seems the check should be something like if ((wc->wc_flags & IB_WC_GRH) && IN6_IS_ADDR_MULTICAST(((struct ipv6hdr *) skb->data)->daddr)) > --- linux-2.6.orig/include/linux/in6.h 2010-08-26 14:11:39.000000000 -0500 > +++ linux-2.6/include/linux/in6.h 2010-08-26 14:11:52.000000000 -0500 > @@ -53,6 +53,9 @@ > extern const struct in6_addr in6addr_linklocal_allrouters; > #define IN6ADDR_LINKLOCAL_ALLROUTERS_INIT \ > { { { 0xff,2,0,0,0,0,0,0,0,0,0,0,0,0,0,2 } } } > + > +#define IN6_IS_ADDR_MULTICAST(a) (((const __u8 *) (a))[0] == 0xff) We probably want to run this addition past netdev. - R. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 33+ messages in thread
[parent not found: <adapqx5rtez.fsf-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>]
* Re: [IPoIB] Identify Multicast packets and fix IGMP breakage [not found] ` <adapqx5rtez.fsf-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org> @ 2010-08-26 21:17 ` Christoph Lameter [not found] ` <alpine.DEB.2.00.1008261613160.24174-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org> 0 siblings, 1 reply; 33+ messages in thread From: Christoph Lameter @ 2010-08-26 21:17 UTC (permalink / raw) To: Roland Dreier Cc: Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz, Jason Gunthorpe, Yossi Etigin On Thu, 26 Aug 2010, Roland Dreier wrote: > also it's not clear to me why it's OK to do this test of the DGID if the > packet didn't have a GRH -- presumably we are just looking at random > uninitialized memory so we might incorrectly say some packets are > multicast if that byte happens to be 0xff. (or does that not matter? > if so why can't we just always make everything PACKET_MULTICAST?) We will do an skb_pull to skip over the GRH next? The IP layer checks PACKET_HOST etc in various places so we may risk breakage in odd places like what already occurred here. It would be good to also have support for PACKET_BROADCAST but that would require a full MGID comparison. > It seems the check should be something like > > if ((wc->wc_flags & IB_WC_GRH) && > IN6_IS_ADDR_MULTICAST(((struct ipv6hdr *) skb->data)->daddr)) Hmmm... More IPV6 stuff slipping in. > We probably want to run this addition past netdev. Ok. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 33+ messages in thread
[parent not found: <alpine.DEB.2.00.1008261613160.24174-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org>]
* Re: [IPoIB] Identify Multicast packets and fix IGMP breakage [not found] ` <alpine.DEB.2.00.1008261613160.24174-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org> @ 2010-08-26 21:29 ` Jason Gunthorpe 2010-08-26 21:31 ` [IPoIB] Identify multicast packets and fix IGMP breakage V2 Christoph Lameter 2010-08-26 21:32 ` [IPoIB] Identify Multicast packets and fix IGMP breakage Roland Dreier 2 siblings, 0 replies; 33+ messages in thread From: Jason Gunthorpe @ 2010-08-26 21:29 UTC (permalink / raw) To: Christoph Lameter Cc: Roland Dreier, Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz, Yossi Etigin On Thu, Aug 26, 2010 at 04:17:08PM -0500, Christoph Lameter wrote: > On Thu, 26 Aug 2010, Roland Dreier wrote: > > > also it's not clear to me why it's OK to do this test of the DGID if the > > packet didn't have a GRH -- presumably we are just looking at random > > uninitialized memory so we might incorrectly say some packets are > > multicast if that byte happens to be 0xff. (or does that not matter? > > if so why can't we just always make everything PACKET_MULTICAST?) > > We will do an skb_pull to skip over the GRH next? The GRH bytes are always present, but the HW may not fill them in if IB_WC_GRH is not set. Checking for the broadcast address should be done by comparing the MGID against dev->broadcast + 4, like Yossi was describing. This can be done as a sub-case of the multicast check since dev->broadcast+4 will always start with 0xff. IPv6 does not have a broadcast address, so the additional work to check the v6 MGID format Yossi was discussing is not required. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 33+ messages in thread
* [IPoIB] Identify multicast packets and fix IGMP breakage V2 [not found] ` <alpine.DEB.2.00.1008261613160.24174-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org> 2010-08-26 21:29 ` Jason Gunthorpe @ 2010-08-26 21:31 ` Christoph Lameter [not found] ` <alpine.DEB.2.00.1008261629440.24174-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org> 2010-08-26 21:32 ` [IPoIB] Identify Multicast packets and fix IGMP breakage Roland Dreier 2 siblings, 1 reply; 33+ messages in thread From: Christoph Lameter @ 2010-08-26 21:31 UTC (permalink / raw) To: Roland Dreier Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz, Jason Gunthorpe, Yossi Etigin, netdev-u79uwXL29TY76Z2rM5mHXA Is this better? Subject: [IPoIB] Identify Multicast packets and fix IGMP breakage V2 IGMP processing is broken because the IPOIB does not set the skb->pkt_type the right way for Multicast traffic. All incoming packets are set to PACKET_HOST which means that the igmp_recv() function will ignore the IGMP broadcasts/multicasts. This in turn means that the IGMP timers are firing and are sending information about multicast subscriptions unnecessarily. In a large private network this can cause traffic spikes. Signed-off-by: Christoph Lameter <cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org> --- drivers/infiniband/ulp/ipoib/ipoib_ib.c | 10 +++++++--- include/linux/in6.h | 3 +++ 2 files changed, 10 insertions(+), 3 deletions(-) Index: linux-2.6/drivers/infiniband/ulp/ipoib/ipoib_ib.c =================================================================== --- linux-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib_ib.c 2010-08-26 15:11:34.000000000 -0500 +++ linux-2.6/drivers/infiniband/ulp/ipoib/ipoib_ib.c 2010-08-26 16:29:19.000000000 -0500 @@ -271,6 +271,13 @@ static void ipoib_ib_handle_rx_wc(struct ipoib_ud_dma_unmap_rx(priv, mapping); ipoib_ud_skb_put_frags(priv, skb, wc->byte_len); + if ((wc->wc_flags & IB_WC_GRH) && + IN6_IS_ADDR_MULTICAST(&((struct ipv6hdr *)skb->data)->daddr)) + + skb->pkt_type = PACKET_MULTICAST; + else + skb->pkt_type = PACKET_HOST; + skb_pull(skb, IB_GRH_BYTES); skb->protocol = ((struct ipoib_header *) skb->data)->proto; @@ -281,9 +288,6 @@ static void ipoib_ib_handle_rx_wc(struct dev->stats.rx_bytes += skb->len; skb->dev = dev; - /* XXX get correct PACKET_ type here */ - skb->pkt_type = PACKET_HOST; - if (test_bit(IPOIB_FLAG_CSUM, &priv->flags) && likely(wc->csum_ok)) skb->ip_summed = CHECKSUM_UNNECESSARY; Index: linux-2.6/include/linux/in6.h =================================================================== --- linux-2.6.orig/include/linux/in6.h 2010-08-26 15:11:34.000000000 -0500 +++ linux-2.6/include/linux/in6.h 2010-08-26 16:27:08.000000000 -0500 @@ -53,6 +53,9 @@ extern const struct in6_addr in6addr_lin extern const struct in6_addr in6addr_linklocal_allrouters; #define IN6ADDR_LINKLOCAL_ALLROUTERS_INIT \ { { { 0xff,2,0,0,0,0,0,0,0,0,0,0,0,0,0,2 } } } + +#define IN6_IS_ADDR_MULTICAST(a) ((a)->in6_u.u6_addr8[0] == 0xff) + #endif struct sockaddr_in6 { -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 33+ messages in thread
[parent not found: <alpine.DEB.2.00.1008261629440.24174-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org>]
* Re: [IPoIB] Identify multicast packets and fix IGMP breakage V2 [not found] ` <alpine.DEB.2.00.1008261629440.24174-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org> @ 2010-08-26 22:15 ` David Miller [not found] ` <20100826.151553.242147157.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org> 0 siblings, 1 reply; 33+ messages in thread From: David Miller @ 2010-08-26 22:15 UTC (permalink / raw) To: cl-vYTEC60ixJUAvxtiuMwx3w Cc: rdreier-FYB4Gu1CFyUAvxtiuMwx3w, linux-rdma-u79uwXL29TY76Z2rM5mHXA, ogerlitz-smomgflXvOZWk0Htik3J/w, jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/, yosefe-smomgflXvOZWk0Htik3J/w, netdev-u79uwXL29TY76Z2rM5mHXA From: Christoph Lameter <cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org> Date: Thu, 26 Aug 2010 16:31:14 -0500 (CDT) > @@ -271,6 +271,13 @@ static void ipoib_ib_handle_rx_wc(struct > ipoib_ud_dma_unmap_rx(priv, mapping); > ipoib_ud_skb_put_frags(priv, skb, wc->byte_len); > > + if ((wc->wc_flags & IB_WC_GRH) && > + IN6_IS_ADDR_MULTICAST(&((struct ipv6hdr *)skb->data)->daddr)) > + > + skb->pkt_type = PACKET_MULTICAST; > + else > + skb->pkt_type = PACKET_HOST; I really don't think you can assume there is an ipv6 header here at all. You'll need to parse the encapsulated protocol and process it as ipv4 or ipv6 as needed. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 33+ messages in thread
[parent not found: <20100826.151553.242147157.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>]
* Re: [IPoIB] Identify multicast packets and fix IGMP breakage V2 [not found] ` <20100826.151553.242147157.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org> @ 2010-08-26 22:21 ` Jason Gunthorpe [not found] ` <20100826222146.GA23025-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 0 siblings, 1 reply; 33+ messages in thread From: Jason Gunthorpe @ 2010-08-26 22:21 UTC (permalink / raw) To: David Miller Cc: cl-vYTEC60ixJUAvxtiuMwx3w, rdreier-FYB4Gu1CFyUAvxtiuMwx3w, linux-rdma-u79uwXL29TY76Z2rM5mHXA, ogerlitz-smomgflXvOZWk0Htik3J/w, yosefe-smomgflXvOZWk0Htik3J/w, netdev-u79uwXL29TY76Z2rM5mHXA On Thu, Aug 26, 2010 at 03:15:53PM -0700, David Miller wrote: > From: Christoph Lameter <cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org> > Date: Thu, 26 Aug 2010 16:31:14 -0500 (CDT) > > > @@ -271,6 +271,13 @@ static void ipoib_ib_handle_rx_wc(struct > > ipoib_ud_dma_unmap_rx(priv, mapping); > > ipoib_ud_skb_put_frags(priv, skb, wc->byte_len); > > > > + if ((wc->wc_flags & IB_WC_GRH) && > > + IN6_IS_ADDR_MULTICAST(&((struct ipv6hdr *)skb->data)->daddr)) > > + > > + skb->pkt_type = PACKET_MULTICAST; > > + else > > + skb->pkt_type = PACKET_HOST; > > I really don't think you can assume there is an ipv6 header here > at all. The 40 bytes at this location are defined by the HW specification to be an IB GRH which has an identical layout to an IPv6 header. Roland is right, it would be clearer to use ib_grh ->dgid Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 33+ messages in thread
[parent not found: <20100826222146.GA23025-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>]
* Re: [IPoIB] Identify multicast packets and fix IGMP breakage V3 [not found] ` <20100826222146.GA23025-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> @ 2010-08-26 23:26 ` Christoph Lameter [not found] ` <alpine.DEB.2.00.1008261825490.26351-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org> 0 siblings, 1 reply; 33+ messages in thread From: Christoph Lameter @ 2010-08-26 23:26 UTC (permalink / raw) To: Jason Gunthorpe Cc: David Miller, rdreier-FYB4Gu1CFyUAvxtiuMwx3w, linux-rdma-u79uwXL29TY76Z2rM5mHXA, ogerlitz-smomgflXvOZWk0Htik3J/w, yosefe-smomgflXvOZWk0Htik3J/w, netdev-u79uwXL29TY76Z2rM5mHXA On Thu, 26 Aug 2010, Jason Gunthorpe wrote: > The 40 bytes at this location are defined by the HW specification to > be an IB GRH which has an identical layout to an IPv6 header. Roland > is right, it would be clearer to use ib_grh ->dgid Ok but then we have no nice function that checks for multicast anymore. Subject: [IPoIB] Identify multicast packets and fix IGMP breakage V3 IGMP processing is broken because the IPOIB does not set the skb->pkt_type the right way for Multicast traffic. All incoming packets are set to PACKET_HOST which means that the igmp_recv() function will ignore the IGMP broadcasts/multicasts. This in turn means that the IGMP timers are firing and are sending information about multicast subscriptions unnecessarily. In a large private network this can cause traffic spikes. Signed-off-by: Christoph Lameter <cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org> --- Index: linux-2.6/drivers/infiniband/ulp/ipoib/ipoib_ib.c =================================================================== --- linux-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib_ib.c 2010-08-26 18:24:07.842079559 -0500 +++ linux-2.6/drivers/infiniband/ulp/ipoib/ipoib_ib.c 2010-08-26 18:25:33.859815544 -0500 @@ -271,6 +271,14 @@ ipoib_ud_dma_unmap_rx(priv, mapping); ipoib_ud_skb_put_frags(priv, skb, wc->byte_len); + /* First byte of dgid signals multicast when 0xff */ + if ((wc->wc_flags & IB_WC_GRH) && + ((struct ib_grh *)skb->data)->dgid.raw[0] == 0xff) + + skb->pkt_type = PACKET_MULTICAST; + else + skb->pkt_type = PACKET_HOST; + skb_pull(skb, IB_GRH_BYTES); skb->protocol = ((struct ipoib_header *) skb->data)->proto; @@ -281,9 +289,6 @@ dev->stats.rx_bytes += skb->len; skb->dev = dev; - /* XXX get correct PACKET_ type here */ - skb->pkt_type = PACKET_HOST; - if (test_bit(IPOIB_FLAG_CSUM, &priv->flags) && likely(wc->csum_ok)) skb->ip_summed = CHECKSUM_UNNECESSARY; -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 33+ messages in thread
[parent not found: <alpine.DEB.2.00.1008261825490.26351-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org>]
* Re: [IPoIB] Identify multicast packets and fix IGMP breakage V3 [not found] ` <alpine.DEB.2.00.1008261825490.26351-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org> @ 2010-08-26 23:43 ` Jason Gunthorpe 2010-08-26 23:57 ` Christoph Lameter [not found] ` <20100826234342.GA24333-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 0 siblings, 2 replies; 33+ messages in thread From: Jason Gunthorpe @ 2010-08-26 23:43 UTC (permalink / raw) To: Christoph Lameter Cc: rdreier-FYB4Gu1CFyUAvxtiuMwx3w, linux-rdma-u79uwXL29TY76Z2rM5mHXA, ogerlitz-smomgflXvOZWk0Htik3J/w, yosefe-smomgflXvOZWk0Htik3J/w, netdev-u79uwXL29TY76Z2rM5mHXA On Thu, Aug 26, 2010 at 06:26:58PM -0500, Christoph Lameter wrote: > On Thu, 26 Aug 2010, Jason Gunthorpe wrote: > > > The 40 bytes at this location are defined by the HW specification to > > be an IB GRH which has an identical layout to an IPv6 header. Roland > > is right, it would be clearer to use ib_grh ->dgid > > Ok but then we have no nice function that checks for multicast anymore. Were you going to try it this way? /* First byte of dgid signals multicast/broadcast when 0xff */ if ((wc->wc_flags & IB_WC_GRH) && ((struct ib_grh *)skb->data)->dgid.raw[0] == 0xff) { if (memcmp(((struct ib_grh *)skb->data)->dgid.raw, dev->broadcast + 4, sizeof(union ib_gid)) == 0) skb->pkt_type = PACKET_BROADCAST; else skb->pkt_type = PACKET_MULTICAST; } else skb->pkt_type = PACKET_HOST; I think doing the memcmp only in the multicast path should be reasonable overhead wise. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [IPoIB] Identify multicast packets and fix IGMP breakage V3 2010-08-26 23:43 ` Jason Gunthorpe @ 2010-08-26 23:57 ` Christoph Lameter [not found] ` <alpine.DEB.2.00.1008261856090.29657-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org> [not found] ` <20100826234342.GA24333-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 1 sibling, 1 reply; 33+ messages in thread From: Christoph Lameter @ 2010-08-26 23:57 UTC (permalink / raw) To: Jason Gunthorpe; +Cc: rdreier, linux-rdma, ogerlitz, yosefe, netdev On Thu, 26 Aug 2010, Jason Gunthorpe wrote: > I think doing the memcmp only in the multicast path should be > reasonable overhead wise. Thats is not always possible. Here the multicast path is the default path that is taken 99% of the time. ^ permalink raw reply [flat|nested] 33+ messages in thread
[parent not found: <alpine.DEB.2.00.1008261856090.29657-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org>]
* Re: [IPoIB] Identify multicast packets and fix IGMP breakage V3 [not found] ` <alpine.DEB.2.00.1008261856090.29657-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org> @ 2010-08-27 0:07 ` David Miller 2010-08-27 2:24 ` Christoph Lameter 0 siblings, 1 reply; 33+ messages in thread From: David Miller @ 2010-08-27 0:07 UTC (permalink / raw) To: cl-vYTEC60ixJUAvxtiuMwx3w Cc: jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/, rdreier-FYB4Gu1CFyUAvxtiuMwx3w, linux-rdma-u79uwXL29TY76Z2rM5mHXA, ogerlitz-smomgflXvOZWk0Htik3J/w, yosefe-smomgflXvOZWk0Htik3J/w, netdev-u79uwXL29TY76Z2rM5mHXA From: Christoph Lameter <cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org> Date: Thu, 26 Aug 2010 18:57:09 -0500 (CDT) > On Thu, 26 Aug 2010, Jason Gunthorpe wrote: > >> I think doing the memcmp only in the multicast path should be >> reasonable overhead wise. > > Thats is not always possible. Here the multicast path is the > default path that is taken 99% of the time. The highest cost is bringing in that packet header's cache line, which you've already done by reading the byte and checking for 0xff. I doubt the memcmp() can possibly matter. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [IPoIB] Identify multicast packets and fix IGMP breakage V3 2010-08-27 0:07 ` David Miller @ 2010-08-27 2:24 ` Christoph Lameter 0 siblings, 0 replies; 33+ messages in thread From: Christoph Lameter @ 2010-08-27 2:24 UTC (permalink / raw) To: David Miller; +Cc: jgunthorpe, rdreier, linux-rdma, ogerlitz, yosefe, netdev On Thu, 26 Aug 2010, David Miller wrote: > The highest cost is bringing in that packet header's cache line, which > you've already done by reading the byte and checking for 0xff. And then you need to bring in the cacheline of the broadcast address.... These are pretty long byte strings in the IB case. ^ permalink raw reply [flat|nested] 33+ messages in thread
[parent not found: <20100826234342.GA24333-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>]
* RE: [IPoIB] Identify multicast packets and fix IGMP breakage V3 [not found] ` <20100826234342.GA24333-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> @ 2010-08-27 0:02 ` Yossi Etigin 2010-08-27 13:29 ` Christoph Lameter 1 sibling, 0 replies; 33+ messages in thread From: Yossi Etigin @ 2010-08-27 0:02 UTC (permalink / raw) To: Jason Gunthorpe, Christoph Lameter Cc: rdreier-FYB4Gu1CFyUAvxtiuMwx3w, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz, netdev-u79uwXL29TY76Z2rM5mHXA > > Were you going to try it this way? > > /* First byte of dgid signals multicast/broadcast when 0xff */ > if ((wc->wc_flags & IB_WC_GRH) && > ((struct ib_grh *)skb->data)->dgid.raw[0] == 0xff) { > if (memcmp(((struct ib_grh *)skb->data)->dgid.raw, > dev->broadcast + 4, sizeof(union ib_gid)) == 0) > skb->pkt_type = PACKET_BROADCAST; > else > skb->pkt_type = PACKET_MULTICAST; > } > else > skb->pkt_type = PACKET_HOST; > > I think doing the memcmp only in the multicast path should be > reasonable overhead wise. > Shouldn't struct ib_grh be packed to make this really work? The code looks a little messy to me anyway... How about using a local var which is a ptr to packed struct ib_grh? The compiler will probably eliminate it anyway. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 33+ messages in thread
* RE: [IPoIB] Identify multicast packets and fix IGMP breakage V3 @ 2010-08-27 0:02 ` Yossi Etigin 0 siblings, 0 replies; 33+ messages in thread From: Yossi Etigin @ 2010-08-27 0:02 UTC (permalink / raw) To: Jason Gunthorpe, Christoph Lameter Cc: rdreier-FYB4Gu1CFyUAvxtiuMwx3w, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz, netdev-u79uwXL29TY76Z2rM5mHXA > > Were you going to try it this way? > > /* First byte of dgid signals multicast/broadcast when 0xff */ > if ((wc->wc_flags & IB_WC_GRH) && > ((struct ib_grh *)skb->data)->dgid.raw[0] == 0xff) { > if (memcmp(((struct ib_grh *)skb->data)->dgid.raw, > dev->broadcast + 4, sizeof(union ib_gid)) == 0) > skb->pkt_type = PACKET_BROADCAST; > else > skb->pkt_type = PACKET_MULTICAST; > } > else > skb->pkt_type = PACKET_HOST; > > I think doing the memcmp only in the multicast path should be > reasonable overhead wise. > Shouldn't struct ib_grh be packed to make this really work? The code looks a little messy to me anyway... How about using a local var which is a ptr to packed struct ib_grh? The compiler will probably eliminate it anyway. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 33+ messages in thread
[parent not found: <7E95F01E94AB484F83061FCFA35B39F89BE276-QfUkFaTmzUSUvQqKE/ONIwC/G2K4zDHf@public.gmane.org>]
* Re: [IPoIB] Identify multicast packets and fix IGMP breakage V3 [not found] ` <7E95F01E94AB484F83061FCFA35B39F89BE276-QfUkFaTmzUSUvQqKE/ONIwC/G2K4zDHf@public.gmane.org> @ 2010-08-27 0:17 ` Jason Gunthorpe 0 siblings, 0 replies; 33+ messages in thread From: Jason Gunthorpe @ 2010-08-27 0:17 UTC (permalink / raw) To: Yossi Etigin Cc: Christoph Lameter, rdreier-FYB4Gu1CFyUAvxtiuMwx3w, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz, netdev-u79uwXL29TY76Z2rM5mHXA On Fri, Aug 27, 2010 at 03:02:54AM +0300, Yossi Etigin wrote: > Shouldn't struct ib_grh be packed to make this really work? No idea what the kernel convention for this is. It looks OK to me, in that no arch I am familiar with will insert padding. > The code looks a little messy to me anyway... > How about using a local var which is a ptr to packed struct ib_grh? The > compiler will probably eliminate it anyway. This bike shed is looking pretty well painted already :) Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [IPoIB] Identify multicast packets and fix IGMP breakage V3 [not found] ` <20100826234342.GA24333-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 2010-08-27 0:02 ` Yossi Etigin @ 2010-08-27 13:29 ` Christoph Lameter 2010-09-14 7:27 ` Or Gerlitz [not found] ` <alpine.DEB.2.00.1008270827280.11792-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org> 1 sibling, 2 replies; 33+ messages in thread From: Christoph Lameter @ 2010-08-27 13:29 UTC (permalink / raw) To: Jason Gunthorpe Cc: Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA, ogerlitz-smomgflXvOZWk0Htik3J/w, yosefe-smomgflXvOZWk0Htik3J/w, netdev-u79uwXL29TY76Z2rM5mHXA, David Miller On Thu, 26 Aug 2010, Jason Gunthorpe wrote: > I think doing the memcmp only in the multicast path should be > reasonable overhead wise. Ok the dgid is only 8 bytes not the whole 40 bytes.... Here is the patch somewhat cleaned up with PACKET_BROADCAST. Subject: [IPoIB] Identify multicast packets and fix IGMP breakage V3 IGMP processing is broken because the IPOIB does not set the skb->pkt_type the right way for Multicast traffic. All incoming packets are set to PACKET_HOST which means that the igmp_recv() function will ignore the IGMP broadcasts/multicasts. This in turn means that the IGMP timers are firing and are sending information about multicast subscriptions unnecessarily. In a large private network this can cause traffic spikes. Signed-off-by: Christoph Lameter <cl-vYTEC60ixJUAvxtiuMwx3w@public.gmane.org> --- Index: linux-2.6/drivers/infiniband/ulp/ipoib/ipoib_ib.c =================================================================== --- linux-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib_ib.c 2010-08-26 18:24:07.842079559 -0500 +++ linux-2.6/drivers/infiniband/ulp/ipoib/ipoib_ib.c 2010-08-27 08:26:37.929641162 -0500 @@ -223,6 +223,7 @@ unsigned int wr_id = wc->wr_id & ~IPOIB_OP_RECV; struct sk_buff *skb; u64 mapping[IPOIB_UD_RX_SG]; + union ib_gid *dgid; ipoib_dbg_data(priv, "recv completion: id %d, status: %d\n", wr_id, wc->status); @@ -271,6 +272,21 @@ ipoib_ud_dma_unmap_rx(priv, mapping); ipoib_ud_skb_put_frags(priv, skb, wc->byte_len); + /* First byte of dgid signals multicast when 0xff */ + dgid = &((struct ib_grh *)skb->data)->dgid; + + if (!(wc->wc_flags & IB_WC_GRH) || dgid->raw[0] != 0xff) + + skb->pkt_type = PACKET_HOST; + + else if (memcmp(dgid, dev->broadcast + 4, sizeof(union ib_gid)) == 0) + + skb->pkt_type = PACKET_BROADCAST; + + else + + skb->pkt_type = PACKET_MULTICAST; + skb_pull(skb, IB_GRH_BYTES); skb->protocol = ((struct ipoib_header *) skb->data)->proto; @@ -281,9 +297,6 @@ dev->stats.rx_bytes += skb->len; skb->dev = dev; - /* XXX get correct PACKET_ type here */ - skb->pkt_type = PACKET_HOST; - if (test_bit(IPOIB_FLAG_CSUM, &priv->flags) && likely(wc->csum_ok)) skb->ip_summed = CHECKSUM_UNNECESSARY; -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [IPoIB] Identify multicast packets and fix IGMP breakage V3 2010-08-27 13:29 ` Christoph Lameter @ 2010-09-14 7:27 ` Or Gerlitz [not found] ` <4C8F23DB.2040105-smomgflXvOZWk0Htik3J/w@public.gmane.org> [not found] ` <alpine.DEB.2.00.1008270827280.11792-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org> 1 sibling, 1 reply; 33+ messages in thread From: Or Gerlitz @ 2010-09-14 7:27 UTC (permalink / raw) To: Christoph Lameter, Roland Dreier Cc: Jason Gunthorpe, linux-rdma, yosefe, netdev, David Miller Christoph Lameter wrote: > Here is the patch somewhat cleaned up with PACKET_BROADCAST. > Subject: [IPoIB] Identify multicast packets and fix IGMP breakage V3 > I don't see this patch in Roland's for-next branch nor Dave's net-next-2.6 tree, is anything else needed to merge that? Or. > IGMP processing is broken because the IPOIB does not set the > skb->pkt_type the right way for Multicast traffic. All incoming > packets are set to PACKET_HOST which means that the igmp_recv() > function will ignore the IGMP broadcasts/multicasts. > > This in turn means that the IGMP timers are firing and are sending > information about multicast subscriptions unnecessarily. In a large > private network this can cause traffic spikes. > > Signed-off-by: Christoph Lameter <cl@linux.com> > > --- > > Index: linux-2.6/drivers/infiniband/ulp/ipoib/ipoib_ib.c > =================================================================== > --- linux-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib_ib.c 2010-08-26 18:24:07.842079559 -0500 > +++ linux-2.6/drivers/infiniband/ulp/ipoib/ipoib_ib.c 2010-08-27 08:26:37.929641162 -0500 > @@ -223,6 +223,7 @@ > unsigned int wr_id = wc->wr_id & ~IPOIB_OP_RECV; > struct sk_buff *skb; > u64 mapping[IPOIB_UD_RX_SG]; > + union ib_gid *dgid; > > ipoib_dbg_data(priv, "recv completion: id %d, status: %d\n", > wr_id, wc->status); > @@ -271,6 +272,21 @@ > ipoib_ud_dma_unmap_rx(priv, mapping); > ipoib_ud_skb_put_frags(priv, skb, wc->byte_len); > > + /* First byte of dgid signals multicast when 0xff */ > + dgid = &((struct ib_grh *)skb->data)->dgid; > + > + if (!(wc->wc_flags & IB_WC_GRH) || dgid->raw[0] != 0xff) > + > + skb->pkt_type = PACKET_HOST; > + > + else if (memcmp(dgid, dev->broadcast + 4, sizeof(union ib_gid)) == 0) > + > + skb->pkt_type = PACKET_BROADCAST; > + > + else > + > + skb->pkt_type = PACKET_MULTICAST; > + > skb_pull(skb, IB_GRH_BYTES); > > skb->protocol = ((struct ipoib_header *) skb->data)->proto; > @@ -281,9 +297,6 @@ > dev->stats.rx_bytes += skb->len; > > skb->dev = dev; > - /* XXX get correct PACKET_ type here */ > - skb->pkt_type = PACKET_HOST; > - > if (test_bit(IPOIB_FLAG_CSUM, &priv->flags) && likely(wc->csum_ok)) > skb->ip_summed = CHECKSUM_UNNECESSARY; > > ^ permalink raw reply [flat|nested] 33+ messages in thread
[parent not found: <4C8F23DB.2040105-smomgflXvOZWk0Htik3J/w@public.gmane.org>]
* Re: [IPoIB] Identify multicast packets and fix IGMP breakage V3 [not found] ` <4C8F23DB.2040105-smomgflXvOZWk0Htik3J/w@public.gmane.org> @ 2010-09-14 14:02 ` Christoph Lameter 0 siblings, 0 replies; 33+ messages in thread From: Christoph Lameter @ 2010-09-14 14:02 UTC (permalink / raw) To: Or Gerlitz Cc: Roland Dreier, Jason Gunthorpe, linux-rdma-u79uwXL29TY76Z2rM5mHXA, yosefe-smomgflXvOZWk0Htik3J/w, netdev-u79uwXL29TY76Z2rM5mHXA, David Miller On Tue, 14 Sep 2010, Or Gerlitz wrote: > I don't see this patch in Roland's for-next branch nor Dave's net-next-2.6 > tree, is anything else needed to merge that? No there is nothing else needed. Roland said he is going to merge it for 2.6.37. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 33+ messages in thread
[parent not found: <alpine.DEB.2.00.1008270827280.11792-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org>]
* Re: [IPoIB] Identify multicast packets and fix IGMP breakage V3 [not found] ` <alpine.DEB.2.00.1008270827280.11792-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org> @ 2010-09-28 18:09 ` Roland Dreier 0 siblings, 0 replies; 33+ messages in thread From: Roland Dreier @ 2010-09-28 18:09 UTC (permalink / raw) To: Christoph Lameter Cc: Jason Gunthorpe, linux-rdma-u79uwXL29TY76Z2rM5mHXA, ogerlitz-smomgflXvOZWk0Htik3J/w, yosefe-smomgflXvOZWk0Htik3J/w, netdev-u79uwXL29TY76Z2rM5mHXA, David Miller thanks, applied. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [IPoIB] Identify Multicast packets and fix IGMP breakage [not found] ` <alpine.DEB.2.00.1008261613160.24174-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org> 2010-08-26 21:29 ` Jason Gunthorpe 2010-08-26 21:31 ` [IPoIB] Identify multicast packets and fix IGMP breakage V2 Christoph Lameter @ 2010-08-26 21:32 ` Roland Dreier 2 siblings, 0 replies; 33+ messages in thread From: Roland Dreier @ 2010-08-26 21:32 UTC (permalink / raw) To: Christoph Lameter Cc: Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Or Gerlitz, Jason Gunthorpe, Yossi Etigin > > also it's not clear to me why it's OK to do this test of the DGID if the > > packet didn't have a GRH -- presumably we are just looking at random > > uninitialized memory so we might incorrectly say some packets are > > multicast if that byte happens to be 0xff. (or does that not matter? > > if so why can't we just always make everything PACKET_MULTICAST?) > We will do an skb_pull to skip over the GRH next? The IP layer checks > PACKET_HOST etc in various places so we may risk breakage in odd places > like what already occurred here. It would be good to also have support > for PACKET_BROADCAST but that would require a full MGID comparison. I'm not sure I follow what you're saying. An IB UD packet may or may not include a GRH on the wire. If it does, when it's received, the adapter writes the GRH and then the packet data into the receive buffer. If there is no GRH on the wire, then the adapter skips 40 bytes (size of a GRH/ipv6 header) and then writes the packet data into the receive buffer. So in all cases, the real packet data starts 40 bytes into the receive buffer and we have to do an skb_pull, but if there was no GRH on the wire, then the 40 bytes we skipped is just uninitialized memory. The wc_flags bit IB_WC_GRH tells you if there was a GRH on the wire or not. It sounds like we want to get this right without random false positives, so I think we need to test IB_WC_GRH. (And doing a full compare against the broadcast MGID for multicast packets wouldn't be *too* bad -- it's not much worse than what ethernet does) > > It seems the check should be something like > > > > if ((wc->wc_flags & IB_WC_GRH) && > > IN6_IS_ADDR_MULTICAST(((struct ipv6hdr *) skb->data)->daddr)) > > Hmmm... More IPV6 stuff slipping in. I guess it would be at least as correct to cast to struct ib_grh and check that the dgid is a multicast GID (we can add an inline function to check this and as a cleanup we could then get rid of the open-coded tests in ib_attach_mcast etc) - R. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 33+ messages in thread
end of thread, other threads:[~2010-09-28 18:09 UTC | newest] Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2010-08-23 17:16 IPoIB: Broken IGMP processing Christoph Lameter [not found] ` <alpine.DEB.2.00.1008231210010.9840-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org> 2010-08-23 17:41 ` Jason Gunthorpe [not found] ` <20100823174110.GK26549-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 2010-08-23 18:10 ` Christoph Lameter [not found] ` <alpine.DEB.2.00.1008231254300.10719-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org> 2010-08-23 18:28 ` Jason Gunthorpe [not found] ` <20100823182859.GL26549-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 2010-08-23 19:27 ` Christoph Lameter [not found] ` <alpine.DEB.2.00.1008231427130.12815-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org> 2010-08-23 19:34 ` Yossi Etigin [not found] ` <7E95F01E94AB484F83061FCFA35B39F8013249-QfUkFaTmzUSUvQqKE/ONIwC/G2K4zDHf@public.gmane.org> 2010-08-23 19:45 ` Jason Gunthorpe [not found] ` <20100823194555.GN26549-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 2010-08-23 19:50 ` Yossi Etigin [not found] ` <7E95F01E94AB484F83061FCFA35B39F801324A-QfUkFaTmzUSUvQqKE/ONIwC/G2K4zDHf@public.gmane.org> 2010-08-25 14:43 ` Christoph Lameter 2010-08-23 19:59 ` Christoph Lameter [not found] ` <alpine.DEB.2.00.1008231454230.12815-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org> 2010-08-24 18:49 ` Jason Gunthorpe 2010-08-23 18:19 ` Christoph Lameter [not found] ` <alpine.DEB.2.00.1008231317150.10719-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org> 2010-08-23 18:30 ` Jason Gunthorpe [not found] ` <20100823183044.GM26549-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 2010-08-26 19:55 ` [IPoIB] Identify Multicast packets and fix IGMP breakage Christoph Lameter [not found] ` <alpine.DEB.2.00.1008261453510.21466-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org> 2010-08-26 20:49 ` Roland Dreier [not found] ` <adapqx5rtez.fsf-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org> 2010-08-26 21:17 ` Christoph Lameter [not found] ` <alpine.DEB.2.00.1008261613160.24174-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org> 2010-08-26 21:29 ` Jason Gunthorpe 2010-08-26 21:31 ` [IPoIB] Identify multicast packets and fix IGMP breakage V2 Christoph Lameter [not found] ` <alpine.DEB.2.00.1008261629440.24174-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org> 2010-08-26 22:15 ` David Miller [not found] ` <20100826.151553.242147157.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org> 2010-08-26 22:21 ` Jason Gunthorpe [not found] ` <20100826222146.GA23025-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 2010-08-26 23:26 ` [IPoIB] Identify multicast packets and fix IGMP breakage V3 Christoph Lameter [not found] ` <alpine.DEB.2.00.1008261825490.26351-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org> 2010-08-26 23:43 ` Jason Gunthorpe 2010-08-26 23:57 ` Christoph Lameter [not found] ` <alpine.DEB.2.00.1008261856090.29657-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org> 2010-08-27 0:07 ` David Miller 2010-08-27 2:24 ` Christoph Lameter [not found] ` <20100826234342.GA24333-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 2010-08-27 0:02 ` Yossi Etigin 2010-08-27 0:02 ` Yossi Etigin [not found] ` <7E95F01E94AB484F83061FCFA35B39F89BE276-QfUkFaTmzUSUvQqKE/ONIwC/G2K4zDHf@public.gmane.org> 2010-08-27 0:17 ` Jason Gunthorpe 2010-08-27 13:29 ` Christoph Lameter 2010-09-14 7:27 ` Or Gerlitz [not found] ` <4C8F23DB.2040105-smomgflXvOZWk0Htik3J/w@public.gmane.org> 2010-09-14 14:02 ` Christoph Lameter [not found] ` <alpine.DEB.2.00.1008270827280.11792-sBS69tsa9Uj/9pzu0YdTqQ@public.gmane.org> 2010-09-28 18:09 ` Roland Dreier 2010-08-26 21:32 ` [IPoIB] Identify Multicast packets and fix IGMP breakage Roland Dreier
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.