netdev.vger.kernel.org archive mirror
* [PATCH net-next v2 0/5] xen-netback: IPv6 offload support
@ 2013-10-08 10:58 Paul Durrant
  2013-10-08 10:58 ` [PATCH net-next v2 1/5] xen-netback: add IPv6 checksum offload to guest Paul Durrant
                   ` (4 more replies)
  0 siblings, 5 replies; 20+ messages in thread
From: Paul Durrant @ 2013-10-08 10:58 UTC (permalink / raw)
  To: xen-devel, netdev

This patch series adds support for checksum and large packet offloads into
xen-netback.
Testing has mainly been done using the Microsoft network hardware
certification suite running in Server 2008R2 VMs with Citrix PV frontends.
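
The offloads negotiated by this series rely on partial checksums: the sender fills in only the pseudo-header sum and leaves completion to the backend or the physical NIC. The underlying one's-complement arithmetic (RFC 1071) can be sketched in plain userspace C; this is illustrative code, not part of the series:

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a buffer into a 16-bit one's-complement sum (RFC 1071 style),
 * starting from an initial partial sum. */
static uint16_t csum_fold(const uint8_t *buf, size_t len, uint32_t sum)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)buf[i] << 8 | buf[i + 1];
	if (len & 1)			/* trailing odd byte is padded with zero */
		sum += (uint32_t)buf[len - 1] << 8;
	while (sum >> 16)		/* fold carries back into 16 bits */
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)sum;
}

/* Internet checksum of a buffer: the complement of the folded sum. */
static uint16_t inet_csum(const uint8_t *buf, size_t len)
{
	return (uint16_t)~csum_fold(buf, len, 0);
}
```

A useful sanity property is that folding a buffer together with its own checksum always yields 0xffff; this is what lets hardware or the backend finish a checksum the frontend only started.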

v2:
- Fixed Wei's email address in Cc lines

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH net-next v2 1/5] xen-netback: add IPv6 checksum offload to guest
  2013-10-08 10:58 [PATCH net-next v2 0/5] xen-netback: IPv6 offload support Paul Durrant
@ 2013-10-08 10:58 ` Paul Durrant
  2013-10-08 13:10   ` Wei Liu
  2013-10-08 10:58 ` [PATCH net-next v2 2/5] xen-netback: Add support for IPv6 checksum offload from guest Paul Durrant
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 20+ messages in thread
From: Paul Durrant @ 2013-10-08 10:58 UTC (permalink / raw)
  To: xen-devel, netdev; +Cc: Paul Durrant, Wei Liu, David Vrabel, Ian Campbell

Check xenstore flag feature-ipv6-csum-offload to determine if a
guest is happy to accept IPv6 packets with only partial checksum.
Also check the analogous feature-ip-csum-offload to determine if a
guest is happy to accept IPv4 packets with only partial checksum,
as a replacement for a negated feature-no-csum-offload value, and
add a comment deprecating use of feature-no-csum-offload.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
---
 drivers/net/xen-netback/common.h    |    3 ++-
 drivers/net/xen-netback/interface.c |   10 +++++++---
 drivers/net/xen-netback/xenbus.c    |   24 +++++++++++++++++++++++-
 3 files changed, 32 insertions(+), 5 deletions(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 5715318..b4a9a3c 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -153,7 +153,8 @@ struct xenvif {
 	u8 can_sg:1;
 	u8 gso:1;
 	u8 gso_prefix:1;
-	u8 csum:1;
+	u8 ip_csum:1;
+	u8 ipv6_csum:1;
 
 	/* Internal feature information. */
 	u8 can_queue:1;	    /* can queue packets for receiver? */
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index 01bb854..8e92783 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -216,8 +216,10 @@ static netdev_features_t xenvif_fix_features(struct net_device *dev,
 		features &= ~NETIF_F_SG;
 	if (!vif->gso && !vif->gso_prefix)
 		features &= ~NETIF_F_TSO;
-	if (!vif->csum)
+	if (!vif->ip_csum)
 		features &= ~NETIF_F_IP_CSUM;
+	if (!vif->ipv6_csum)
+		features &= ~NETIF_F_IPV6_CSUM;
 
 	return features;
 }
@@ -306,7 +308,7 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
 	vif->domid  = domid;
 	vif->handle = handle;
 	vif->can_sg = 1;
-	vif->csum = 1;
+	vif->ip_csum = 1;
 	vif->dev = dev;
 
 	vif->credit_bytes = vif->remaining_credit = ~0UL;
@@ -316,7 +318,9 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
 	vif->credit_timeout.expires = jiffies;
 
 	dev->netdev_ops	= &xenvif_netdev_ops;
-	dev->hw_features = NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_TSO;
+	dev->hw_features = NETIF_F_SG |
+		NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM |
+		NETIF_F_TSO;
 	dev->features = dev->hw_features;
 	SET_ETHTOOL_OPS(dev, &xenvif_ethtool_ops);
 
diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
index b45bce2..5e45b00 100644
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@@ -567,10 +567,32 @@ static int connect_rings(struct backend_info *be)
 		val = 0;
 	vif->gso_prefix = !!val;
 
+	/* Before feature-ipv6-csum-offload was introduced, checksum offload
+	 * was turned on by a zero value in (or lack of)
+	 * feature-no-csum-offload. Therefore, for compatibility, assume
+	 * that if feature-ip-csum-offload is missing that IP (v4) offload
+	 * is turned on. If this is not the case then the flag will be
+	 * cleared when we subsequently sample feature-no-csum-offload.
+	 */
+	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-ip-csum-offload",
+			 "%d", &val) < 0)
+		val = 1;
+	vif->ip_csum = !!val;
+
+	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-ipv6-csum-offload",
+			 "%d", &val) < 0)
+		val = 0;
+	vif->ipv6_csum = !!val;
+
+	/* This is deprecated. Frontends should set IP v4 and v6 checksum
+	 * offload using feature-ip-csum-offload and
+	 * feature-ipv6-csum-offload respectively.
+	 */
 	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-no-csum-offload",
 			 "%d", &val) < 0)
 		val = 0;
-	vif->csum = !val;
+	if (val)
+		vif->ip_csum = 0;
 
 	/* Map the shared frame, irq etc. */
 	err = xenvif_connect(vif, tx_ring_ref, rx_ring_ref,
-- 
1.7.10.4
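
The negotiation rules in this patch can be modelled as a small userspace sketch; the function names, and the encoding of a missing xenstore key as a 'present' flag, are illustrative assumptions rather than driver code:

```c
/*
 * Hypothetical model of the negotiation in connect_rings():
 *  - feature-ip-csum-offload defaults to 1 when absent (legacy behaviour),
 *  - feature-ipv6-csum-offload defaults to 0 when absent,
 *  - a deprecated feature-no-csum-offload of 1 still clears the IPv4 flag.
 * The '*_present' parameters stand in for xenbus_scanf() succeeding.
 */
static int negotiate_ip_csum(int ip_present, int ip_val,
			     int no_csum_present, int no_csum_val)
{
	int ip_csum = ip_present ? !!ip_val : 1;

	if (no_csum_present && no_csum_val)
		ip_csum = 0;

	return ip_csum;
}

static int negotiate_ipv6_csum(int ipv6_present, int ipv6_val)
{
	return ipv6_present ? !!ipv6_val : 0;
}
```

With these rules a legacy frontend that writes nothing keeps IPv4 offload on and IPv6 offload off, matching the compatibility comment in the patch.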


* [PATCH net-next v2 2/5] xen-netback: Add support for IPv6 checksum offload from guest
  2013-10-08 10:58 [PATCH net-next v2 0/5] xen-netback: IPv6 offload support Paul Durrant
  2013-10-08 10:58 ` [PATCH net-next v2 1/5] xen-netback: add IPv6 checksum offload to guest Paul Durrant
@ 2013-10-08 10:58 ` Paul Durrant
  2013-10-08 13:30   ` Wei Liu
  2013-10-08 10:58 ` [PATCH net-next v2 3/5] xen-netback: Unconditionally set NETIF_F_RXCSUM Paul Durrant
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 20+ messages in thread
From: Paul Durrant @ 2013-10-08 10:58 UTC (permalink / raw)
  To: xen-devel, netdev; +Cc: Paul Durrant, Wei Liu, David Vrabel, Ian Campbell

For performance of VM to VM traffic on a single host it is better to avoid
calculation of the TCP/UDP checksum in the sending frontend. To allow this,
this patch adds the code necessary to set up partial checksums for IPv6
packets and a xenstore flag, feature-ipv6-csum-offload, to advertise that
fact to frontends.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
---
 drivers/net/xen-netback/netback.c |  224 ++++++++++++++++++++++++++++++-------
 drivers/net/xen-netback/xenbus.c  |    9 ++
 2 files changed, 191 insertions(+), 42 deletions(-)

diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index f3e591c..35b1fa6 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -109,15 +109,10 @@ static inline unsigned long idx_to_kaddr(struct xenvif *vif,
 	return (unsigned long)pfn_to_kaddr(idx_to_pfn(vif, idx));
 }
 
-/*
- * This is the amount of packet we copy rather than map, so that the
- * guest can't fiddle with the contents of the headers while we do
- * packet processing on them (netfilter, routing, etc).
+/* This is a minimum size for the linear area to avoid lots of
+ * calls to __pskb_pull_tail() as we set up checksum offsets.
  */
-#define PKT_PROT_LEN    (ETH_HLEN + \
-			 VLAN_HLEN + \
-			 sizeof(struct iphdr) + MAX_IPOPTLEN + \
-			 sizeof(struct tcphdr) + MAX_TCP_OPTION_SPACE)
+#define PKT_PROT_LEN 128
 
 static u16 frag_get_pending_idx(skb_frag_t *frag)
 {
@@ -1118,61 +1113,72 @@ static int xenvif_set_skb_gso(struct xenvif *vif,
 	return 0;
 }
 
-static int checksum_setup(struct xenvif *vif, struct sk_buff *skb)
+static inline void maybe_pull_tail(struct sk_buff *skb, unsigned int len)
 {
-	struct iphdr *iph;
+	if (skb_is_nonlinear(skb) && skb_headlen(skb) < len) {
+		int target = min_t(int, skb->len, len);
+		__pskb_pull_tail(skb, target - skb_headlen(skb));
+	}
+}
+
+static int checksum_setup_ip(struct xenvif *vif, struct sk_buff *skb,
+			     int recalculate_partial_csum)
+{
+	struct iphdr *iph = (void *)skb->data;
+	unsigned int header_size;
+	unsigned int off;
 	int err = -EPROTO;
-	int recalculate_partial_csum = 0;
 
-	/*
-	 * A GSO SKB must be CHECKSUM_PARTIAL. However some buggy
-	 * peers can fail to set NETRXF_csum_blank when sending a GSO
-	 * frame. In this case force the SKB to CHECKSUM_PARTIAL and
-	 * recalculate the partial checksum.
-	 */
-	if (skb->ip_summed != CHECKSUM_PARTIAL && skb_is_gso(skb)) {
-		vif->rx_gso_checksum_fixup++;
-		skb->ip_summed = CHECKSUM_PARTIAL;
-		recalculate_partial_csum = 1;
-	}
+	off = sizeof(struct iphdr);
 
-	/* A non-CHECKSUM_PARTIAL SKB does not require setup. */
-	if (skb->ip_summed != CHECKSUM_PARTIAL)
-		return 0;
+	header_size = skb->network_header + off + MAX_IPOPTLEN;
+	maybe_pull_tail(skb, header_size);
 
-	if (skb->protocol != htons(ETH_P_IP))
-		goto out;
+	off = iph->ihl * 4;
 
-	iph = (void *)skb->data;
 	switch (iph->protocol) {
 	case IPPROTO_TCP:
-		if (!skb_partial_csum_set(skb, 4 * iph->ihl,
+		if (!skb_partial_csum_set(skb, off,
 					  offsetof(struct tcphdr, check)))
 			goto out;
 
 		if (recalculate_partial_csum) {
 			struct tcphdr *tcph = tcp_hdr(skb);
+
+			header_size = skb->network_header +
+				off +
+				sizeof(struct tcphdr) +
+				MAX_TCP_OPTION_SPACE;
+			maybe_pull_tail(skb, header_size);
+
 			tcph->check = ~csum_tcpudp_magic(iph->saddr, iph->daddr,
-							 skb->len - iph->ihl*4,
+							 skb->len - off,
 							 IPPROTO_TCP, 0);
 		}
 		break;
 	case IPPROTO_UDP:
-		if (!skb_partial_csum_set(skb, 4 * iph->ihl,
+		if (!skb_partial_csum_set(skb, off,
 					  offsetof(struct udphdr, check)))
 			goto out;
 
 		if (recalculate_partial_csum) {
 			struct udphdr *udph = udp_hdr(skb);
+
+			header_size = skb->network_header +
+				off +
+				sizeof(struct udphdr);
+			maybe_pull_tail(skb, header_size);
+
 			udph->check = ~csum_tcpudp_magic(iph->saddr, iph->daddr,
-							 skb->len - iph->ihl*4,
+							 skb->len - off,
 							 IPPROTO_UDP, 0);
 		}
 		break;
 	default:
 		if (net_ratelimit())
 			netdev_err(vif->dev,
-				   "Attempting to checksum a non-TCP/UDP packet, dropping a protocol %d packet\n",
+				   "Attempting to checksum a non-TCP/UDP packet, "
+				   "dropping a protocol %d packet\n",
 				   iph->protocol);
 		goto out;
 	}
@@ -1183,6 +1189,148 @@ out:
 	return err;
 }
 
+static int checksum_setup_ipv6(struct xenvif *vif, struct sk_buff *skb,
+			       int recalculate_partial_csum)
+{
+	int err = -EPROTO;
+	struct ipv6hdr *ipv6h = (void *)skb->data;
+	u8 nexthdr;
+	unsigned int header_size;
+	unsigned int off;
+	bool done;
+
+	done = false;
+	off = sizeof(struct ipv6hdr);
+
+	header_size = skb->network_header + off;
+	maybe_pull_tail(skb, header_size);
+
+	nexthdr = ipv6h->nexthdr;
+
+	while ((off <= sizeof(struct ipv6hdr) + ntohs(ipv6h->payload_len)) &&
+	       !done) {
+		/* We only access at most the first 2 bytes of any option header
+		 * to determine the next one.
+		 */
+		header_size = skb->network_header + off + 2;
+		maybe_pull_tail(skb, header_size);
+
+		switch (nexthdr) {
+		case IPPROTO_FRAGMENT: {
+			struct frag_hdr *hp = (void *)(skb->data + off);
+
+			nexthdr = hp->nexthdr;
+			off += 8;
+			break;
+		}
+		case IPPROTO_DSTOPTS:
+		case IPPROTO_HOPOPTS:
+		case IPPROTO_ROUTING: {
+			struct ipv6_opt_hdr *hp = (void *)(skb->data + off);
+
+			nexthdr = hp->nexthdr;
+			off += ipv6_optlen(hp);
+			break;
+		}
+		case IPPROTO_AH: {
+			struct ip_auth_hdr *hp = (void *)(skb->data + off);
+
+			nexthdr = hp->nexthdr;
+			off += (hp->hdrlen+2)<<2;
+			break;
+		}
+		default:
+			done = true;
+			break;
+		}
+	}
+
+	if (!done)
+		goto out;
+
+	switch (nexthdr) {
+	case IPPROTO_TCP:
+		if (!skb_partial_csum_set(skb, off,
+					  offsetof(struct tcphdr, check)))
+			goto out;
+
+		if (recalculate_partial_csum) {
+			struct tcphdr *tcph = tcp_hdr(skb);
+
+			header_size = skb->network_header +
+				off +
+				sizeof(struct tcphdr) +
+				MAX_TCP_OPTION_SPACE;
+			maybe_pull_tail(skb, header_size);
+
+			tcph->check = ~csum_ipv6_magic(&ipv6h->saddr,
+						       &ipv6h->daddr,
+						       skb->len - off,
+						       IPPROTO_TCP, 0);
+		}
+		break;
+	case IPPROTO_UDP:
+		if (!skb_partial_csum_set(skb, off,
+					  offsetof(struct udphdr, check)))
+			goto out;
+
+		if (recalculate_partial_csum) {
+			struct udphdr *udph = udp_hdr(skb);
+
+			header_size = skb->network_header +
+				off +
+				sizeof(struct udphdr);
+			maybe_pull_tail(skb, header_size);
+
+			udph->check = ~csum_ipv6_magic(&ipv6h->saddr,
+						       &ipv6h->daddr,
+						       skb->len - off,
+						       IPPROTO_UDP, 0);
+		}
+		break;
+	default:
+		if (net_ratelimit())
+			netdev_err(vif->dev,
+				   "Attempting to checksum a non-TCP/UDP packet, "
+				   "dropping a protocol %d packet\n",
+				   nexthdr);
+		goto out;
+	}
+
+	err = 0;
+
+out:
+	return err;
+}
+
+static int checksum_setup(struct xenvif *vif, struct sk_buff *skb)
+{
+	int err = -EPROTO;
+	int recalculate_partial_csum = 0;
+
+	/* A GSO SKB must be CHECKSUM_PARTIAL. However some buggy
+	 * peers can fail to set NETRXF_csum_blank when sending a GSO
+	 * frame. In this case force the SKB to CHECKSUM_PARTIAL and
+	 * recalculate the partial checksum.
+	 */
+	if (skb->ip_summed != CHECKSUM_PARTIAL && skb_is_gso(skb)) {
+		vif->rx_gso_checksum_fixup++;
+		skb->ip_summed = CHECKSUM_PARTIAL;
+		recalculate_partial_csum = 1;
+	}
+
+	/* A non-CHECKSUM_PARTIAL SKB does not require setup. */
+	if (skb->ip_summed != CHECKSUM_PARTIAL)
+		return 0;
+
+	if (skb->protocol == htons(ETH_P_IP))
+		err = checksum_setup_ip(vif, skb, recalculate_partial_csum);
+	else if (skb->protocol == htons(ETH_P_IPV6))
+		err = checksum_setup_ipv6(vif, skb, recalculate_partial_csum);
+
+	return err;
+}
+
 static bool tx_credit_exceeded(struct xenvif *vif, unsigned size)
 {
 	unsigned long now = jiffies;
@@ -1428,15 +1576,7 @@ static int xenvif_tx_submit(struct xenvif *vif, int budget)
 
 		xenvif_fill_frags(vif, skb);
 
-		/*
-		 * If the initial fragment was < PKT_PROT_LEN then
-		 * pull through some bytes from the other fragments to
-		 * increase the linear region to PKT_PROT_LEN bytes.
-		 */
-		if (skb_headlen(skb) < PKT_PROT_LEN && skb_is_nonlinear(skb)) {
-			int target = min_t(int, skb->len, PKT_PROT_LEN);
-			__pskb_pull_tail(skb, target - skb_headlen(skb));
-		}
+		maybe_pull_tail(skb, PKT_PROT_LEN);
 
 		skb->dev      = vif->dev;
 		skb->protocol = eth_type_trans(skb, skb->dev);
diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
index 5e45b00..dff423b 100644
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@@ -101,6 +101,15 @@ static int netback_probe(struct xenbus_device *dev,
 			goto abort_transaction;
 		}
 
+		/* We support partial checksum setup for IPv6 packets */
+		err = xenbus_printf(xbt, dev->nodename,
+				    "feature-ipv6-csum-offload",
+				    "%d", 1);
+		if (err) {
+			message = "writing feature-ipv6-csum-offload";
+			goto abort_transaction;
+		}
+
 		/* We support rx-copy path. */
 		err = xenbus_printf(xbt, dev->nodename,
 				    "feature-rx-copy", "%d", 1);
-- 
1.7.10.4
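
The extension-header walk in checksum_setup_ipv6() above can be illustrated with a standalone userspace sketch; the protocol numbers are the IANA values, but the function itself is hypothetical and omits the skb pulling that the driver must do as it goes:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical userspace model of the extension-header walk in
 * checksum_setup_ipv6(): start just past the fixed 40-byte IPv6
 * header and skip Hop-by-Hop, Routing, Destination Options,
 * Fragment and AH headers until a transport header (TCP or UDP)
 * is reached. Returns the transport offset and protocol, or -1 on
 * an unrecognized or truncated header chain.
 */
static int ipv6_transport_offset(const uint8_t *pkt, size_t len,
				 uint8_t *proto)
{
	uint8_t nexthdr = pkt[6];	/* Next Header field of the IPv6 header */
	size_t off = 40;		/* sizeof(struct ipv6hdr) */

	for (;;) {
		if (nexthdr == 6 || nexthdr == 17) {	/* TCP or UDP */
			*proto = nexthdr;
			return (int)off;
		}
		if (off + 2 > len)	/* need at least nexthdr + length bytes */
			return -1;
		switch (nexthdr) {
		case 44:		/* Fragment: fixed 8 bytes */
			nexthdr = pkt[off];
			off += 8;
			break;
		case 0:			/* Hop-by-Hop */
		case 43:		/* Routing */
		case 60:		/* Destination Options */
			nexthdr = pkt[off];
			off += ((size_t)pkt[off + 1] + 1) * 8;
			break;
		case 51:		/* AH: length in 4-byte units, minus 2 */
			nexthdr = pkt[off];
			off += ((size_t)pkt[off + 1] + 2) * 4;
			break;
		default:
			return -1;
		}
	}
}
```

A caller would follow this with skb_partial_csum_set() at the returned offset, which is what the patch does once its loop terminates.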


* [PATCH net-next v2 3/5] xen-netback: Unconditionally set NETIF_F_RXCSUM
  2013-10-08 10:58 [PATCH net-next v2 0/5] xen-netback: IPv6 offload support Paul Durrant
  2013-10-08 10:58 ` [PATCH net-next v2 1/5] xen-netback: add IPv6 checksum offload to guest Paul Durrant
  2013-10-08 10:58 ` [PATCH net-next v2 2/5] xen-netback: Add support for IPv6 checksum offload from guest Paul Durrant
@ 2013-10-08 10:58 ` Paul Durrant
  2013-10-08 10:58 ` [PATCH net-next v2 4/5] xen-netback: handle IPv6 TCP GSO packets from the guest Paul Durrant
  2013-10-08 10:58 ` [PATCH net-next v2 5/5] xen-netback: enable IPv6 TCP GSO to " Paul Durrant
  4 siblings, 0 replies; 20+ messages in thread
From: Paul Durrant @ 2013-10-08 10:58 UTC (permalink / raw)
  To: xen-devel, netdev; +Cc: Paul Durrant, Wei Liu, David Vrabel, Ian Campbell

There is no mechanism to insist that a guest always generates packets
with good checksums (at least for IPv4), so we must handle checksum
offloading from the guest and hence should set NETIF_F_RXCSUM.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
---
 drivers/net/xen-netback/interface.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index 8e92783..cb0d8ea 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -321,7 +321,7 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
 	dev->hw_features = NETIF_F_SG |
 		NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM |
 		NETIF_F_TSO;
-	dev->features = dev->hw_features;
+	dev->features = dev->hw_features | NETIF_F_RXCSUM;
 	SET_ETHTOOL_OPS(dev, &xenvif_ethtool_ops);
 
 	dev->tx_queue_len = XENVIF_QUEUE_LENGTH;
-- 
1.7.10.4


* [PATCH net-next v2 4/5] xen-netback: handle IPv6 TCP GSO packets from the guest
  2013-10-08 10:58 [PATCH net-next v2 0/5] xen-netback: IPv6 offload support Paul Durrant
                   ` (2 preceding siblings ...)
  2013-10-08 10:58 ` [PATCH net-next v2 3/5] xen-netback: Unconditionally set NETIF_F_RXCSUM Paul Durrant
@ 2013-10-08 10:58 ` Paul Durrant
  2013-10-08 13:31   ` Wei Liu
  2013-10-08 10:58 ` [PATCH net-next v2 5/5] xen-netback: enable IPv6 TCP GSO to " Paul Durrant
  4 siblings, 1 reply; 20+ messages in thread
From: Paul Durrant @ 2013-10-08 10:58 UTC (permalink / raw)
  To: xen-devel, netdev; +Cc: Paul Durrant, Wei Liu, David Vrabel, Ian Campbell

This patch adds a xenstore flag, feature-gso-tcpv6, to advertise that
netback can handle IPv6 TCP GSO packets, and creates SKB_GSO_TCPV6 skbs
if the frontend passes an extra segment with the new type
XEN_NETIF_GSO_TYPE_TCPV6 added to netif.h.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
---
 drivers/net/xen-netback/netback.c |   11 ++++++++---
 drivers/net/xen-netback/xenbus.c  |    7 +++++++
 include/xen/interface/io/netif.h  |    3 ++-
 3 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 35b1fa6..ac42f73 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -1096,15 +1096,20 @@ static int xenvif_set_skb_gso(struct xenvif *vif,
 		return -EINVAL;
 	}
 
-	/* Currently only TCPv4 S.O. is supported. */
-	if (gso->u.gso.type != XEN_NETIF_GSO_TYPE_TCPV4) {
+	switch (gso->u.gso.type) {
+	case XEN_NETIF_GSO_TYPE_TCPV4:
+		skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4;
+		break;
+	case XEN_NETIF_GSO_TYPE_TCPV6:
+		skb_shinfo(skb)->gso_type = SKB_GSO_TCPV6;
+		break;
+	default:
 		netdev_err(vif->dev, "Bad GSO type %d.\n", gso->u.gso.type);
 		xenvif_fatal_tx_err(vif);
 		return -EINVAL;
 	}
 
 	skb_shinfo(skb)->gso_size = gso->u.gso.size;
-	skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4;
 
 	/* Header must be checked, and gso_segs computed. */
 	skb_shinfo(skb)->gso_type |= SKB_GSO_DODGY;
diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
index dff423b..389fa72 100644
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@@ -101,6 +101,13 @@ static int netback_probe(struct xenbus_device *dev,
 			goto abort_transaction;
 		}
 
+		err = xenbus_printf(xbt, dev->nodename, "feature-gso-tcpv6",
+				    "%d", sg);
+		if (err) {
+			message = "writing feature-gso-tcpv6";
+			goto abort_transaction;
+		}
+
 		/* We support partial checksum setup for IPv6 packets */
 		err = xenbus_printf(xbt, dev->nodename,
 				    "feature-ipv6-csum-offload",
diff --git a/include/xen/interface/io/netif.h b/include/xen/interface/io/netif.h
index eb262e3..c19cc06 100644
--- a/include/xen/interface/io/netif.h
+++ b/include/xen/interface/io/netif.h
@@ -95,8 +95,9 @@ struct xen_netif_tx_request {
 #define _XEN_NETIF_EXTRA_FLAG_MORE	(0)
 #define  XEN_NETIF_EXTRA_FLAG_MORE	(1U<<_XEN_NETIF_EXTRA_FLAG_MORE)
 
-/* GSO types - only TCPv4 currently supported. */
+/* GSO types */
 #define XEN_NETIF_GSO_TYPE_TCPV4	(1)
+#define XEN_NETIF_GSO_TYPE_TCPV6	(2)
 
 /*
  * This structure needs to fit within both netif_tx_request and
-- 
1.7.10.4
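
The type translation this patch introduces in xenvif_set_skb_gso() amounts to a small switch; in this sketch the SKB_GSO_* bit values are stand-ins (the kernel's real values live in skbuff.h), so treat it as illustrative only:

```c
/* Wire values from netif.h. */
#define XEN_NETIF_GSO_TYPE_TCPV4	1
#define XEN_NETIF_GSO_TYPE_TCPV6	2

/* Illustrative stand-ins for the kernel's SKB_GSO_* flags. */
enum { SKB_GSO_TCPV4_BIT = 1 << 0, SKB_GSO_TCPV6_BIT = 1 << 1 };

/*
 * Translate the ring's GSO type into an skb GSO flag, rejecting
 * anything else; the -1 return models the fatal-error path.
 */
static int xen_gso_to_skb(int xen_type)
{
	switch (xen_type) {
	case XEN_NETIF_GSO_TYPE_TCPV4:
		return SKB_GSO_TCPV4_BIT;
	case XEN_NETIF_GSO_TYPE_TCPV6:
		return SKB_GSO_TCPV6_BIT;
	default:
		return -1;	/* caller must treat this as a fatal tx error */
	}
}
```

The default branch corresponds to the xenvif_fatal_tx_err() path in the patch: an unknown GSO type from the frontend tears down the interface rather than guessing.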


* [PATCH net-next v2 5/5] xen-netback: enable IPv6 TCP GSO to the guest
  2013-10-08 10:58 [PATCH net-next v2 0/5] xen-netback: IPv6 offload support Paul Durrant
                   ` (3 preceding siblings ...)
  2013-10-08 10:58 ` [PATCH net-next v2 4/5] xen-netback: handle IPv6 TCP GSO packets from the guest Paul Durrant
@ 2013-10-08 10:58 ` Paul Durrant
  2013-10-08 13:34   ` Wei Liu
  2013-10-09  4:41   ` [Xen-devel] " annie li
  4 siblings, 2 replies; 20+ messages in thread
From: Paul Durrant @ 2013-10-08 10:58 UTC (permalink / raw)
  To: xen-devel, netdev; +Cc: Paul Durrant, Wei Liu, David Vrabel, Ian Campbell

This patch adds code to handle SKB_GSO_TCPV6 skbs and construct appropriate
extra or prefix segments to pass the large packet to the frontend. New
xenstore flags, feature-gso-tcpv6 and feature-gso-tcpv6-prefix, are sampled
to determine if the frontend is capable of handling such packets.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
---
 drivers/net/xen-netback/common.h    |    6 +++--
 drivers/net/xen-netback/interface.c |    8 ++++--
 drivers/net/xen-netback/netback.c   |   47 +++++++++++++++++++++++++++--------
 drivers/net/xen-netback/xenbus.c    |   29 +++++++++++++++++++--
 4 files changed, 74 insertions(+), 16 deletions(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index b4a9a3c..720b1ca 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -87,6 +87,7 @@ struct pending_tx_info {
 struct xenvif_rx_meta {
 	int id;
 	int size;
+	int gso_type;
 	int gso_size;
 };
 
@@ -150,9 +151,10 @@ struct xenvif {
 	u8               fe_dev_addr[6];
 
 	/* Frontend feature information. */
+	int gso_mask;
+	int gso_prefix_mask;
+
 	u8 can_sg:1;
-	u8 gso:1;
-	u8 gso_prefix:1;
 	u8 ip_csum:1;
 	u8 ipv6_csum:1;
 
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index cb0d8ea..3d11387 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -214,8 +214,12 @@ static netdev_features_t xenvif_fix_features(struct net_device *dev,
 
 	if (!vif->can_sg)
 		features &= ~NETIF_F_SG;
-	if (!vif->gso && !vif->gso_prefix)
+	if (~(vif->gso_mask | vif->gso_prefix_mask) &
+	    (1 << XEN_NETIF_GSO_TYPE_TCPV4))
 		features &= ~NETIF_F_TSO;
+	if (~(vif->gso_mask | vif->gso_prefix_mask) &
+	    (1 << XEN_NETIF_GSO_TYPE_TCPV6))
+		features &= ~NETIF_F_TSO6;
 	if (!vif->ip_csum)
 		features &= ~NETIF_F_IP_CSUM;
 	if (!vif->ipv6_csum)
@@ -320,7 +324,7 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
 	dev->netdev_ops	= &xenvif_netdev_ops;
 	dev->hw_features = NETIF_F_SG |
 		NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM |
-		NETIF_F_TSO;
+		NETIF_F_TSO | NETIF_F_TSO6;
 	dev->features = dev->hw_features | NETIF_F_RXCSUM;
 	SET_ETHTOOL_OPS(dev, &xenvif_ethtool_ops);
 
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index ac42f73..ee0d55c 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -140,7 +140,7 @@ static int max_required_rx_slots(struct xenvif *vif)
 	int max = DIV_ROUND_UP(vif->dev->mtu, PAGE_SIZE);
 
 	/* XXX FIXME: RX path dependent on MAX_SKB_FRAGS */
-	if (vif->can_sg || vif->gso || vif->gso_prefix)
+	if (vif->can_sg || vif->gso_mask || vif->gso_prefix_mask)
 		max += MAX_SKB_FRAGS + 1; /* extra_info + frags */
 
 	return max;
@@ -334,6 +334,7 @@ static void xenvif_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
 	struct gnttab_copy *copy_gop;
 	struct xenvif_rx_meta *meta;
 	unsigned long bytes;
+	int gso_type;
 
 	/* Data must not cross a page boundary. */
 	BUG_ON(size + offset > PAGE_SIZE<<compound_order(page));
@@ -392,7 +393,14 @@ static void xenvif_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
 		}
 
 		/* Leave a gap for the GSO descriptor. */
-		if (*head && skb_shinfo(skb)->gso_size && !vif->gso_prefix)
+		if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4)
+			gso_type = XEN_NETIF_GSO_TYPE_TCPV4;
+		else if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6)
+			gso_type = XEN_NETIF_GSO_TYPE_TCPV6;
+		else
+			gso_type = 0;
+
+		if (*head && ((1 << gso_type) & vif->gso_prefix_mask))
 			vif->rx.req_cons++;
 
 		*head = 0; /* There must be something in this buffer now. */
@@ -423,14 +431,28 @@ static int xenvif_gop_skb(struct sk_buff *skb,
 	unsigned char *data;
 	int head = 1;
 	int old_meta_prod;
+	int gso_type;
+	int gso_size;
 
 	old_meta_prod = npo->meta_prod;
 
+	if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4) {
+		gso_type = XEN_NETIF_GSO_TYPE_TCPV4;
+		gso_size = skb_shinfo(skb)->gso_size;
+	} else if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6) {
+		gso_type = XEN_NETIF_GSO_TYPE_TCPV6;
+		gso_size = skb_shinfo(skb)->gso_size;
+	} else {
+		gso_type = 0;
+		gso_size = 0;
+	}
+
 	/* Set up a GSO prefix descriptor, if necessary */
-	if (skb_shinfo(skb)->gso_size && vif->gso_prefix) {
+	if ((1 << gso_type) & vif->gso_prefix_mask) {
 		req = RING_GET_REQUEST(&vif->rx, vif->rx.req_cons++);
 		meta = npo->meta + npo->meta_prod++;
-		meta->gso_size = skb_shinfo(skb)->gso_size;
+		meta->gso_type = gso_type;
+		meta->gso_size = gso_size;
 		meta->size = 0;
 		meta->id = req->id;
 	}
@@ -438,10 +460,13 @@ static int xenvif_gop_skb(struct sk_buff *skb,
 	req = RING_GET_REQUEST(&vif->rx, vif->rx.req_cons++);
 	meta = npo->meta + npo->meta_prod++;
 
-	if (!vif->gso_prefix)
-		meta->gso_size = skb_shinfo(skb)->gso_size;
-	else
+	if ((1 << gso_type) & vif->gso_mask) {
+		meta->gso_type = gso_type;
+		meta->gso_size = gso_size;
+	} else {
+		meta->gso_type = 0;
 		meta->gso_size = 0;
+	}
 
 	meta->size = 0;
 	meta->id = req->id;
@@ -587,7 +612,8 @@ void xenvif_rx_action(struct xenvif *vif)
 
 		vif = netdev_priv(skb->dev);
 
-		if (vif->meta[npo.meta_cons].gso_size && vif->gso_prefix) {
+		if ((1 << vif->meta[npo.meta_cons].gso_type) &
+		    vif->gso_prefix_mask) {
 			resp = RING_GET_RESPONSE(&vif->rx,
 						 vif->rx.rsp_prod_pvt++);
 
@@ -624,7 +650,8 @@ void xenvif_rx_action(struct xenvif *vif)
 					vif->meta[npo.meta_cons].size,
 					flags);
 
-		if (vif->meta[npo.meta_cons].gso_size && !vif->gso_prefix) {
+		if ((1 << vif->meta[npo.meta_cons].gso_type) &
+		    vif->gso_mask) {
 			struct xen_netif_extra_info *gso =
 				(struct xen_netif_extra_info *)
 				RING_GET_RESPONSE(&vif->rx,
@@ -632,8 +659,8 @@ void xenvif_rx_action(struct xenvif *vif)
 
 			resp->flags |= XEN_NETRXF_extra_info;
 
+			gso->u.gso.type = vif->meta[npo.meta_cons].gso_type;
 			gso->u.gso.size = vif->meta[npo.meta_cons].gso_size;
-			gso->u.gso.type = XEN_NETIF_GSO_TYPE_TCPV4;
 			gso->u.gso.pad = 0;
 			gso->u.gso.features = 0;
 
diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
index 389fa72..4894256 100644
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@@ -573,15 +573,40 @@ static int connect_rings(struct backend_info *be)
 		val = 0;
 	vif->can_sg = !!val;
 
+	vif->gso_mask = 0;
+	vif->gso_prefix_mask = 0;
+
 	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-gso-tcpv4",
 			 "%d", &val) < 0)
 		val = 0;
-	vif->gso = !!val;
+	if (val)
+		vif->gso_mask = 1 << XEN_NETIF_GSO_TYPE_TCPV4;
 
 	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-gso-tcpv4-prefix",
 			 "%d", &val) < 0)
 		val = 0;
-	vif->gso_prefix = !!val;
+	if (val)
+		vif->gso_prefix_mask = 1 << XEN_NETIF_GSO_TYPE_TCPV4;
+
+	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-gso-tcpv6",
+			 "%d", &val) < 0)
+		val = 0;
+	if (val)
+		vif->gso_mask |= 1 << XEN_NETIF_GSO_TYPE_TCPV6;
+
+	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-gso-tcpv6-prefix",
+			 "%d", &val) < 0)
+		val = 0;
+	if (val)
+		vif->gso_prefix_mask |= 1 << XEN_NETIF_GSO_TYPE_TCPV6;
+
+	if (vif->gso_mask & vif->gso_prefix_mask) {
+		xenbus_dev_fatal(dev, err,
+				 "%s: gso and gso prefix flags are not "
+				 "mutually exclusive",
+				 dev->otherend);
+		return -EOPNOTSUPP;
+	}
 
 	/* Before feature-ipv6-csum-offload was introduced, checksum offload
 	 * was turned on by a zero value in (or lack of)
-- 
1.7.10.4
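
The mask negotiation added to connect_rings() can be modelled in userspace as follows; note that each feature bit must be OR'd into its mask so that the TCPv6 flags do not clobber the TCPv4 ones, and that -1 here stands in for -EOPNOTSUPP:

```c
#define XEN_NETIF_GSO_TYPE_TCPV4	1
#define XEN_NETIF_GSO_TYPE_TCPV6	2

/*
 * Hypothetical model of the gso_mask / gso_prefix_mask negotiation:
 * each advertised feature sets one bit, and a frontend may not
 * request both the extra-segment and prefix variants for the same
 * protocol. Returns 0 on success, -1 on conflict.
 */
static int build_gso_masks(int gso_v4, int prefix_v4,
			   int gso_v6, int prefix_v6,
			   unsigned int *gso_mask,
			   unsigned int *gso_prefix_mask)
{
	*gso_mask = 0;
	*gso_prefix_mask = 0;

	if (gso_v4)
		*gso_mask |= 1 << XEN_NETIF_GSO_TYPE_TCPV4;
	if (prefix_v4)
		*gso_prefix_mask |= 1 << XEN_NETIF_GSO_TYPE_TCPV4;
	if (gso_v6)
		*gso_mask |= 1 << XEN_NETIF_GSO_TYPE_TCPV6;
	if (prefix_v6)
		*gso_prefix_mask |= 1 << XEN_NETIF_GSO_TYPE_TCPV6;

	if (*gso_mask & *gso_prefix_mask)	/* variants are exclusive */
		return -1;

	return 0;
}
```

On conflict the driver reports the error via xenbus_dev_fatal() and returns -EOPNOTSUPP, as in the hunk above.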


* Re: [PATCH net-next v2 1/5] xen-netback: add IPv6 checksum offload to guest
  2013-10-08 10:58 ` [PATCH net-next v2 1/5] xen-netback: add IPv6 checksum offload to guest Paul Durrant
@ 2013-10-08 13:10   ` Wei Liu
  2013-10-08 13:14     ` Paul Durrant
  0 siblings, 1 reply; 20+ messages in thread
From: Wei Liu @ 2013-10-08 13:10 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel, netdev, Wei Liu, David Vrabel, Ian Campbell

On Tue, Oct 08, 2013 at 11:58:12AM +0100, Paul Durrant wrote:
> Check xenstore flag feature-ipv6-csum-offload to determine if a
> guest is happy to accept IPv6 packets with only partial checksum.
> Also check analogous feature-ip-csum-offload to determone if a

determone -> determine

> guest is happy to accept IPv4 packets with only partial checksum
> as a replacement for a negated feature-no-csum-offload value and
> add a comment to deprecate use of feature-no-csum-offload.
> 
[...]
> @@ -306,7 +308,7 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
>  	vif->domid  = domid;
>  	vif->handle = handle;
>  	vif->can_sg = 1;
> -	vif->csum = 1;
> +	vif->ip_csum = 1;

Not setting a default value for vif->ipv6_csum?

>  	vif->dev = dev;
>  
[...]
> +	/* Before feature-ipv6-csum-offload was introduced, checksum offload
> +	 * was turned on by a zero value in (or lack of)
> +	 * feature-no-csum-offload. Therefore, for compatibility, assume
> +	 * that if feature-ip-csum-offload is missing that IP (v4) offload
> +	 * is turned on. If this is not the case then the flag will be
> +	 * cleared when we subsequently sample feature-no-csum-offload.
> +	 */
> +	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-ip-csum-offload",
> +			 "%d", &val) < 0)
> +		val = 1;
> +	vif->ip_csum = !!val;
> +
> +	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-ipv6-csum-offload",
> +			 "%d", &val) < 0)
> +		val = 0;
> +	vif->ipv6_csum = !!val;
> +
> +	/* This is deprecated. Frontends should set IP v4 and v6 checksum
> +	 * offload using feature-ip-csum-offload and
> +	 * feature-ipv6-csum-offload respectively.
> +	 */

These comments on feature flags should also be synchronized to the master
netif.h in Xen and Linux's header. We've been trying to document things
along the way for quite some time and the long term benefit is big.

>  	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-no-csum-offload",
>  			 "%d", &val) < 0)
>  		val = 0;
> -	vif->csum = !val;
> +	if (val)
> +		vif->ip_csum = 0;

Do you also need to set vif->ipv6_csum to 0? I don't think a frontend
which cannot deal with V4 csum offload has the ability to deal with V6
csum offload.

Wei.


* RE: [PATCH net-next v2 1/5] xen-netback: add IPv6 checksum offload to guest
  2013-10-08 13:10   ` Wei Liu
@ 2013-10-08 13:14     ` Paul Durrant
  0 siblings, 0 replies; 20+ messages in thread
From: Paul Durrant @ 2013-10-08 13:14 UTC (permalink / raw)
  To: Wei Liu; +Cc: xen-devel, netdev, Wei Liu, David Vrabel, Ian Campbell

> -----Original Message-----
> From: Wei Liu [mailto:wei.liu2@citrix.com]
> Sent: 08 October 2013 14:10
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org; netdev@vger.kernel.org; Wei Liu; David Vrabel;
> Ian Campbell
> Subject: Re: [PATCH net-next v2 1/5] xen-netback: add IPv6 checksum
> offload to guest
> 
> On Tue, Oct 08, 2013 at 11:58:12AM +0100, Paul Durrant wrote:
> > Check xenstore flag feature-ipv6-csum-offload to determine if a
> > guest is happy to accept IPv6 packets with only partial checksum.
> > Also check analogous feature-ip-csum-offload to determone if a
> 
> determone -> determine
> 

Ok.

> > guest is happy to accept IPv4 packets with only partial checksum
> > as a replacement for a negated feature-no-csum-offload value and
> > add a comment to deprecate use of feature-no-csum-offload.
> >
> [...]
> > @@ -306,7 +308,7 @@ struct xenvif *xenvif_alloc(struct device *parent,
> domid_t domid,
> >  	vif->domid  = domid;
> >  	vif->handle = handle;
> >  	vif->can_sg = 1;
> > -	vif->csum = 1;
> > +	vif->ip_csum = 1;
> 
> Not setting a default value for vif->ipv6_csum?
> 

The default is 0 so no need.

> >  	vif->dev = dev;
> >
> [...]
> > +	/* Before feature-ipv6-csum-offload was introduced, checksum
> offload
> > +	 * was turned on by a zero value in (or lack of)
> > +	 * feature-no-csum-offload. Therefore, for compatibility, assume
> > +	 * that if feature-ip-csum-offload is missing that IP (v4) offload
> > +	 * is turned on. If this is not the case then the flag will be
> > +	 * cleared when we subsequently sample feature-no-csum-offload.
> > +	 */
> > +	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-ip-csum-
> offload",
> > +			 "%d", &val) < 0)
> > +		val = 1;
> > +	vif->ip_csum = !!val;
> > +
> > +	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-ipv6-csum-
> offload",
> > +			 "%d", &val) < 0)
> > +		val = 0;
> > +	vif->ipv6_csum = !!val;
> > +
> > +	/* This is deprecated. Frontends should set IP v4 and v6 checksum
> > +	 * offload using feature-ip-csum-offload and
> > +	 * feature-ipv6-csum-offload respectively.
> > +	 */
> 
> These comments on feature flags should also be synchronized to the master
> netif.h in Xen and to Linux's copy of the header. We've been trying to
> document things along the way for quite some time and the long-term benefit is big.
> 

Ok. That sounds like a good idea. I'll add a patch to do that.

> >  	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-no-csum-
> offload",
> >  			 "%d", &val) < 0)
> >  		val = 0;
> > -	vif->csum = !val;
> > +	if (val)
> > +		vif->ip_csum = 0;
> 
> Do you also need to set vif->ipv6_csum to 0? I don't think a frontend
> which cannot deal with V4 csum offload has the ability to deal with V6
> csum offload.
> 

I was just trying to preserve the existing semantics of feature-no-csum-offload, which only affect v4. Since the flag is deprecated, should we add new semantics?

  Paul


* Re: [PATCH net-next v2 2/5] xen-netback: Add support for IPv6 checksum offload from guest
  2013-10-08 10:58 ` [PATCH net-next v2 2/5] xen-netback: Add support for IPv6 checksum offload from guest Paul Durrant
@ 2013-10-08 13:30   ` Wei Liu
  2013-10-08 13:50     ` Paul Durrant
  0 siblings, 1 reply; 20+ messages in thread
From: Wei Liu @ 2013-10-08 13:30 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel, netdev, Wei Liu, David Vrabel, Ian Campbell

On Tue, Oct 08, 2013 at 11:58:13AM +0100, Paul Durrant wrote:
[...]
> -/*
> - * This is the amount of packet we copy rather than map, so that the
> - * guest can't fiddle with the contents of the headers while we do
> - * packet processing on them (netfilter, routing, etc).
> +/* This is a miniumum size for the linear area to avoid lots of
> + * calls to __pskb_pull_tail() as we set up checksum offsets.
>   */
> -#define PKT_PROT_LEN    (ETH_HLEN + \
> -			 VLAN_HLEN + \
> -			 sizeof(struct iphdr) + MAX_IPOPTLEN + \
> -			 sizeof(struct tcphdr) + MAX_TCP_OPTION_SPACE)
> +#define PKT_PROT_LEN 128
>  

Where does 128 come from?

>  static u16 frag_get_pending_idx(skb_frag_t *frag)
>  {
> @@ -1118,61 +1113,72 @@ static int xenvif_set_skb_gso(struct xenvif *vif,
>  	return 0;
>  }
>  
> -static int checksum_setup(struct xenvif *vif, struct sk_buff *skb)
> +static inline void maybe_pull_tail(struct sk_buff *skb, unsigned int len)
>  {
> -	struct iphdr *iph;
> +	if (skb_is_nonlinear(skb) && skb_headlen(skb) < len) {
> +		int target = min_t(int, skb->len, len);
> +		__pskb_pull_tail(skb, target - skb_headlen(skb));
> +	}
> +}
> +
> +static int checksum_setup_ip(struct xenvif *vif, struct sk_buff *skb,
> +			     int recalculate_partial_csum)
> +{
[...]
>  
>  		if (recalculate_partial_csum) {
>  			struct tcphdr *tcph = tcp_hdr(skb);
> +
> +			header_size = skb->network_header +
> +				off +
> +				sizeof(struct tcphdr) +
> +				MAX_TCP_OPTION_SPACE;
> +			maybe_pull_tail(skb, header_size);
> +

I presume this function is checksum_setup stripped down to handle IPv4
packets. What's the purpose of changing its behaviour? Why the pull_tail
here?

> +static int checksum_setup_ipv6(struct xenvif *vif, struct sk_buff *skb,
> +			       int recalculate_partial_csum)
> +{
> +	int err = -EPROTO;
> +	struct ipv6hdr *ipv6h = (void *)skb->data;
> +	u8 nexthdr;
> +	unsigned int header_size;
> +	unsigned int off;
> +	bool done;
> +
> +	done = false;
> +	off = sizeof(struct ipv6hdr);
> +
> +	header_size = skb->network_header + off;
> +	maybe_pull_tail(skb, header_size);
> +
> +	nexthdr = ipv6h->nexthdr;
> +
> +	while ((off <= sizeof(struct ipv6hdr) + ntohs(ipv6h->payload_len)) &&
> +	       !done) {
> +		/* We only access at most the first 2 bytes of any option header
> +		 * to determine the next one.
> +		 */
> +		header_size = skb->network_header + off + 2;
> +		maybe_pull_tail(skb, header_size);
> +

Will this cause a performance problem? Is it possible that you pull too
many times?

> +		switch (nexthdr) {
> +		case IPPROTO_FRAGMENT: {
> +			struct frag_hdr *hp = (void *)(skb->data + off);
> +
> +			nexthdr = hp->nexthdr;
> +			off += 8;
> +			break;
> +		}
> +		case IPPROTO_DSTOPTS:
> +		case IPPROTO_HOPOPTS:
> +		case IPPROTO_ROUTING: {
> +			struct ipv6_opt_hdr *hp = (void *)(skb->data + off);
> +
> +			nexthdr = hp->nexthdr;
> +			off += ipv6_optlen(hp);
> +			break;
> +		}
> +		case IPPROTO_AH: {
> +			struct ip_auth_hdr *hp = (void *)(skb->data + off);
> +
> +			nexthdr = hp->nexthdr;
> +			off += (hp->hdrlen+2)<<2;
> +			break;
> +		}
> +		default:
> +			done = true;
> +			break;

Can you make the logic explicit here?

   case IPPROTO_TCP:
   case IPPROTO_UDP:
        done = true;
	break;
   default:
        break;

Another minor suggestion: change "done" to "found", because you're
trying to find those two types of headers.

> +	switch (nexthdr) {
> +	case IPPROTO_TCP:
> +		if (!skb_partial_csum_set(skb, off,
> +					  offsetof(struct tcphdr, check)))
> +			goto out;
> +
> +		if (recalculate_partial_csum) {
> +			struct tcphdr *tcph = tcp_hdr(skb);
> +
> +			header_size = skb->network_header +
> +				off +
> +				sizeof(struct tcphdr) +
> +				MAX_TCP_OPTION_SPACE;
> +			maybe_pull_tail(skb, header_size);
> +

Same question as IPv4 counterpart, why do you need to pull here?

Wei.


* Re: [PATCH net-next v2 4/5] xen-netback: handle IPv6 TCP GSO packets from the guest
  2013-10-08 10:58 ` [PATCH net-next v2 4/5] xen-netback: handle IPv6 TCP GSO packets from the guest Paul Durrant
@ 2013-10-08 13:31   ` Wei Liu
  2013-10-08 13:42     ` Paul Durrant
  0 siblings, 1 reply; 20+ messages in thread
From: Wei Liu @ 2013-10-08 13:31 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel, netdev, Wei Liu, David Vrabel, Ian Campbell

On Tue, Oct 08, 2013 at 11:58:15AM +0100, Paul Durrant wrote:
> This patch a xenstore flag, festure-gso-tcpv6, to adevrtise netback can
            ^ adds            feature               advertise

> handle IPv6 TCP GSO packets and creates SKB_GSO_TCPV6 skbs if the frontend
> passes an extra segment with the new type XEN_NETIF_GSO_TYPE_TCPV6 added to
> netif.h.

Note the new type should be synced to Xen's netif.h tree with separate
patch for Xen.

Wei.


* Re: [PATCH net-next v2 5/5] xen-netback: enable IPv6 TCP GSO to the guest
  2013-10-08 10:58 ` [PATCH net-next v2 5/5] xen-netback: enable IPv6 TCP GSO to " Paul Durrant
@ 2013-10-08 13:34   ` Wei Liu
  2013-10-08 13:40     ` Paul Durrant
  2013-10-09  4:41   ` [Xen-devel] " annie li
  1 sibling, 1 reply; 20+ messages in thread
From: Wei Liu @ 2013-10-08 13:34 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel, netdev, Wei Liu, David Vrabel, Ian Campbell

On Tue, Oct 08, 2013 at 11:58:16AM +0100, Paul Durrant wrote:
[...]
>  	/* Data must not cross a page boundary. */
>  	BUG_ON(size + offset > PAGE_SIZE<<compound_order(page));
> @@ -392,7 +393,14 @@ static void xenvif_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
>  		}
>  
>  		/* Leave a gap for the GSO descriptor. */
> -		if (*head && skb_shinfo(skb)->gso_size && !vif->gso_prefix)
> +		if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4)
> +			gso_type = XEN_NETIF_GSO_TYPE_TCPV4;
> +		else if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6)
> +			gso_type = XEN_NETIF_GSO_TYPE_TCPV6;
> +		else
> +			gso_type = 0;

Should probably #define XEN_NETIF_GSO_TYPE_NONE 0 instead of using plain
0?

> +	if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4) {
> +		gso_type = XEN_NETIF_GSO_TYPE_TCPV4;
> +		gso_size = skb_shinfo(skb)->gso_size;
> +	} else if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6) {
> +		gso_type = XEN_NETIF_GSO_TYPE_TCPV6;
> +		gso_size = skb_shinfo(skb)->gso_size;
> +	} else {
> +		gso_type = 0;

Ditto.

> -	if (!vif->gso_prefix)
> -		meta->gso_size = skb_shinfo(skb)->gso_size;
> -	else
> +	if ((1 << gso_type) & vif->gso_mask) {
> +		meta->gso_type = gso_type;
> +		meta->gso_size = gso_size;
> +	} else {
> +		meta->gso_type = 0;

Ditto.

>  		meta->gso_size = 0;
[...]
> @@ -573,15 +573,40 @@ static int connect_rings(struct backend_info *be)
>  		val = 0;
>  	vif->can_sg = !!val;
>  
> +	vif->gso_mask = 0;
> +	vif->gso_prefix_mask = 0;
> +
>  	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-gso-tcpv4",
>  			 "%d", &val) < 0)
>  		val = 0;
> -	vif->gso = !!val;
> +	if (val)
> +		vif->gso_mask = 1 << XEN_NETIF_GSO_TYPE_TCPV4;

Not using "|=" ?

>  
>  	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-gso-tcpv4-prefix",
>  			 "%d", &val) < 0)
>  		val = 0;
> -	vif->gso_prefix = !!val;
> +	if (val)
> +		vif->gso_prefix_mask = 1 << XEN_NETIF_GSO_TYPE_TCPV4;
> +
> +	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-gso-tcpv6",
> +			 "%d", &val) < 0)
> +		val = 0;
> +	if (val)
> +		vif->gso_mask = 1 << XEN_NETIF_GSO_TYPE_TCPV6;
> +

Not using "|="? Are feature-gso_tcpv{4,6} mutually exclusive?

Same question goes to the setting of gso_prefix_mask.

Wei.


* RE: [PATCH net-next v2 5/5] xen-netback: enable IPv6 TCP GSO to the guest
  2013-10-08 13:34   ` Wei Liu
@ 2013-10-08 13:40     ` Paul Durrant
  0 siblings, 0 replies; 20+ messages in thread
From: Paul Durrant @ 2013-10-08 13:40 UTC (permalink / raw)
  To: Wei Liu; +Cc: xen-devel, netdev, Wei Liu, David Vrabel, Ian Campbell

> -----Original Message-----
> From: Wei Liu [mailto:wei.liu2@citrix.com]
> Sent: 08 October 2013 14:34
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org; netdev@vger.kernel.org; Wei Liu; David Vrabel;
> Ian Campbell
> Subject: Re: [PATCH net-next v2 5/5] xen-netback: enable IPv6 TCP GSO to
> the guest
> 
> On Tue, Oct 08, 2013 at 11:58:16AM +0100, Paul Durrant wrote:
> [...]
> >  	/* Data must not cross a page boundary. */
> >  	BUG_ON(size + offset > PAGE_SIZE<<compound_order(page));
> > @@ -392,7 +393,14 @@ static void xenvif_gop_frag_copy(struct xenvif
> *vif, struct sk_buff *skb,
> >  		}
> >
> >  		/* Leave a gap for the GSO descriptor. */
> > -		if (*head && skb_shinfo(skb)->gso_size && !vif->gso_prefix)
> > +		if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4)
> > +			gso_type = XEN_NETIF_GSO_TYPE_TCPV4;
> > +		else if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6)
> > +			gso_type = XEN_NETIF_GSO_TYPE_TCPV6;
> > +		else
> > +			gso_type = 0;
> 
> Should probably #define XEN_NETIF_GSO_TYPE_NONE 0 instead of using
> plain
> 0?

All I need is a bit shift that's not going to hit anything in the mask, but it's probably worth reserving the value in the header.

> 
> > +	if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4) {
> > +		gso_type = XEN_NETIF_GSO_TYPE_TCPV4;
> > +		gso_size = skb_shinfo(skb)->gso_size;
> > +	} else if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6) {
> > +		gso_type = XEN_NETIF_GSO_TYPE_TCPV6;
> > +		gso_size = skb_shinfo(skb)->gso_size;
> > +	} else {
> > +		gso_type = 0;
> 
> Ditto.
> 
> > -	if (!vif->gso_prefix)
> > -		meta->gso_size = skb_shinfo(skb)->gso_size;
> > -	else
> > +	if ((1 << gso_type) & vif->gso_mask) {
> > +		meta->gso_type = gso_type;
> > +		meta->gso_size = gso_size;
> > +	} else {
> > +		meta->gso_type = 0;
> 
> Ditto.
> 
> >  		meta->gso_size = 0;
> [...]
> > @@ -573,15 +573,40 @@ static int connect_rings(struct backend_info *be)
> >  		val = 0;
> >  	vif->can_sg = !!val;
> >
> > +	vif->gso_mask = 0;
> > +	vif->gso_prefix_mask = 0;
> > +
> >  	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-gso-tcpv4",
> >  			 "%d", &val) < 0)
> >  		val = 0;
> > -	vif->gso = !!val;
> > +	if (val)
> > +		vif->gso_mask = 1 << XEN_NETIF_GSO_TYPE_TCPV4;
> 
> Not using "|=" ?
> 
> >
> >  	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-gso-tcpv4-
> prefix",
> >  			 "%d", &val) < 0)
> >  		val = 0;
> > -	vif->gso_prefix = !!val;
> > +	if (val)
> > +		vif->gso_prefix_mask = 1 << XEN_NETIF_GSO_TYPE_TCPV4;
> > +
> > +	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-gso-tcpv6",
> > +			 "%d", &val) < 0)
> > +		val = 0;
> > +	if (val)
> > +		vif->gso_mask = 1 << XEN_NETIF_GSO_TYPE_TCPV6;
> > +
> 
> Not using "|="? Are feature-gso_tcpv{4,6} mutually exclusive?
> 
> Same question goes to the setting of gso_prefix_mask.
> 

That's very odd. You're correct and I could have sworn the code did have |= in these places; thanks for catching that.

  Paul


* RE: [PATCH net-next v2 4/5] xen-netback: handle IPv6 TCP GSO packets from the guest
  2013-10-08 13:31   ` Wei Liu
@ 2013-10-08 13:42     ` Paul Durrant
  2013-10-08 13:50       ` [Xen-devel] " Jan Beulich
  0 siblings, 1 reply; 20+ messages in thread
From: Paul Durrant @ 2013-10-08 13:42 UTC (permalink / raw)
  To: Wei Liu; +Cc: xen-devel, netdev, Wei Liu, David Vrabel, Ian Campbell

> -----Original Message-----
> From: Wei Liu [mailto:wei.liu2@citrix.com]
> Sent: 08 October 2013 14:32
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org; netdev@vger.kernel.org; Wei Liu; David Vrabel;
> Ian Campbell
> Subject: Re: [PATCH net-next v2 4/5] xen-netback: handle IPv6 TCP GSO
> packets from the guest
> 
> On Tue, Oct 08, 2013 at 11:58:15AM +0100, Paul Durrant wrote:
> > This patch a xenstore flag, festure-gso-tcpv6, to adevrtise netback can
>             ^ adds            feature               advertise
> 
> > handle IPv6 TCP GSO packets and creates SKB_GSO_TCPV6 skbs if the
> frontend
> > passes an extra segment with the new type XEN_NETIF_GSO_TYPE_TCPV6
> added to
> > netif.h.
> 
> Note the new type should be synced to Xen's netif.h tree with separate
> patch for Xen.
> 

Ok. Do you think it is reasonable to make incremental changes to netif.h in this series and then submit a patch to xen-devel with the complete set once this series is accepted?

  Paul


* Re: [Xen-devel] [PATCH net-next v2 4/5] xen-netback: handle IPv6 TCP GSO packets from the guest
  2013-10-08 13:42     ` Paul Durrant
@ 2013-10-08 13:50       ` Jan Beulich
  0 siblings, 0 replies; 20+ messages in thread
From: Jan Beulich @ 2013-10-08 13:50 UTC (permalink / raw)
  To: Paul Durrant; +Cc: David Vrabel, Ian Campbell, Wei Liu, xen-devel, netdev

>>> On 08.10.13 at 15:42, Paul Durrant <Paul.Durrant@citrix.com> wrote:
>>  -----Original Message-----
>> From: Wei Liu [mailto:wei.liu2@citrix.com]
>> Sent: 08 October 2013 14:32
>> To: Paul Durrant
>> Cc: xen-devel@lists.xen.org; netdev@vger.kernel.org; Wei Liu; David Vrabel;
>> Ian Campbell
>> Subject: Re: [PATCH net-next v2 4/5] xen-netback: handle IPv6 TCP GSO
>> packets from the guest
>> 
>> On Tue, Oct 08, 2013 at 11:58:15AM +0100, Paul Durrant wrote:
>> > This patch a xenstore flag, festure-gso-tcpv6, to adevrtise netback can
>>             ^ adds            feature               advertise
>> 
>> > handle IPv6 TCP GSO packets and creates SKB_GSO_TCPV6 skbs if the
>> frontend
>> > passes an extra segment with the new type XEN_NETIF_GSO_TYPE_TCPV6
>> added to
>> > netif.h.
>> 
>> Note the new type should be synced to Xen's netif.h tree with separate
>> patch for Xen.
>> 
> 
> Ok. Do you think it is reasonable to make incremental changes to netif.h in 
> this series and then submit a patch to xen-devel with the complete set once 
> this series is accepted?

I would think so.

Jan


* RE: [PATCH net-next v2 2/5] xen-netback: Add support for IPv6 checksum offload from guest
  2013-10-08 13:30   ` Wei Liu
@ 2013-10-08 13:50     ` Paul Durrant
  2013-10-08 16:19       ` Wei Liu
  0 siblings, 1 reply; 20+ messages in thread
From: Paul Durrant @ 2013-10-08 13:50 UTC (permalink / raw)
  To: Wei Liu; +Cc: xen-devel, netdev, Wei Liu, David Vrabel, Ian Campbell

> -----Original Message-----
> From: Wei Liu [mailto:wei.liu2@citrix.com]
> Sent: 08 October 2013 14:31
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org; netdev@vger.kernel.org; Wei Liu; David Vrabel;
> Ian Campbell
> Subject: Re: [PATCH net-next v2 2/5] xen-netback: Add support for IPv6
> checksum offload from guest
> 
> On Tue, Oct 08, 2013 at 11:58:13AM +0100, Paul Durrant wrote:
> [...]
> > -/*
> > - * This is the amount of packet we copy rather than map, so that the
> > - * guest can't fiddle with the contents of the headers while we do
> > - * packet processing on them (netfilter, routing, etc).
> > +/* This is a miniumum size for the linear area to avoid lots of
> > + * calls to __pskb_pull_tail() as we set up checksum offsets.
> >   */
> > -#define PKT_PROT_LEN    (ETH_HLEN + \
> > -			 VLAN_HLEN + \
> > -			 sizeof(struct iphdr) + MAX_IPOPTLEN + \
> > -			 sizeof(struct tcphdr) + MAX_TCP_OPTION_SPACE)
> > +#define PKT_PROT_LEN 128
> >
> 
> Where does 128 come from?
> 

It's just an arbitrary power of 2 that was chosen because it seems to cover most likely v6 headers and all v4 headers.

> >  static u16 frag_get_pending_idx(skb_frag_t *frag)
> >  {
> > @@ -1118,61 +1113,72 @@ static int xenvif_set_skb_gso(struct xenvif *vif,
> >  	return 0;
> >  }
> >
> > -static int checksum_setup(struct xenvif *vif, struct sk_buff *skb)
> > +static inline void maybe_pull_tail(struct sk_buff *skb, unsigned int len)
> >  {
> > -	struct iphdr *iph;
> > +	if (skb_is_nonlinear(skb) && skb_headlen(skb) < len) {
> > +		int target = min_t(int, skb->len, len);
> > +		__pskb_pull_tail(skb, target - skb_headlen(skb));
> > +	}
> > +}
> > +
> > +static int checksum_setup_ip(struct xenvif *vif, struct sk_buff *skb,
> > +			     int recalculate_partial_csum)
> > +{
> [...]
> >
> >  		if (recalculate_partial_csum) {
> >  			struct tcphdr *tcph = tcp_hdr(skb);
> > +
> > +			header_size = skb->network_header +
> > +				off +
> > +				sizeof(struct tcphdr) +
> > +				MAX_TCP_OPTION_SPACE;
> > +			maybe_pull_tail(skb, header_size);
> > +
> 
> I presume this function is checksum_setup stripped down to handle IPv4
> packet. What's the purpose of changing its behaviour? Why the pull_tail
> here?
> 

We have to make sure that the TCP header is in the linear area as we are about to write to the checksum field. In practice, the 128 byte pull should guarantee this, but in case that is varied later I wanted to make sure things did not start to fail in an odd way.

> > +static int checksum_setup_ipv6(struct xenvif *vif, struct sk_buff *skb,
> > +			       int recalculate_partial_csum)
> > +{
> > +	int err = -EPROTO;
> > +	struct ipv6hdr *ipv6h = (void *)skb->data;
> > +	u8 nexthdr;
> > +	unsigned int header_size;
> > +	unsigned int off;
> > +	bool done;
> > +
> > +	done = false;
> > +	off = sizeof(struct ipv6hdr);
> > +
> > +	header_size = skb->network_header + off;
> > +	maybe_pull_tail(skb, header_size);
> > +
> > +	nexthdr = ipv6h->nexthdr;
> > +
> > +	while ((off <= sizeof(struct ipv6hdr) + ntohs(ipv6h->payload_len))
> &&
> > +	       !done) {
> > +		/* We only access at most the first 2 bytes of any option
> header
> > +		 * to determine the next one.
> > +		 */
> > +		header_size = skb->network_header + off + 2;
> > +		maybe_pull_tail(skb, header_size);
> > +
> 
> Will this cause performance problem? Is it possible that you pull too
> many times?
> 

I guess it means we may get two pulls for the TCP/UDP headers rather than one, so I could push the pulls into the individual cases if you think it will affect performance that badly.

> > +		switch (nexthdr) {
> > +		case IPPROTO_FRAGMENT: {
> > +			struct frag_hdr *hp = (void *)(skb->data + off);
> > +
> > +			nexthdr = hp->nexthdr;
> > +			off += 8;
> > +			break;
> > +		}
> > +		case IPPROTO_DSTOPTS:
> > +		case IPPROTO_HOPOPTS:
> > +		case IPPROTO_ROUTING: {
> > +			struct ipv6_opt_hdr *hp = (void *)(skb->data + off);
> > +
> > +			nexthdr = hp->nexthdr;
> > +			off += ipv6_optlen(hp);
> > +			break;
> > +		}
> > +		case IPPROTO_AH: {
> > +			struct ip_auth_hdr *hp = (void *)(skb->data + off);
> > +
> > +			nexthdr = hp->nexthdr;
> > +			off += (hp->hdrlen+2)<<2;
> > +			break;
> > +		}
> > +		default:
> > +			done = true;
> > +			break;
> 
> Can you make the logic explicit here?
> 
>    case IPPROTO_TCP:
>    case IPPROTO_UDP:
>         done = true;
> 	break;
>    default:
>         break;
> 
> Another minor suggestion: change "done" to "found", because you're
> trying to find those two types of headers.
> 

Yes, I could code it that way if you prefer.

> > +	switch (nexthdr) {
> > +	case IPPROTO_TCP:
> > +		if (!skb_partial_csum_set(skb, off,
> > +					  offsetof(struct tcphdr, check)))
> > +			goto out;
> > +
> > +		if (recalculate_partial_csum) {
> > +			struct tcphdr *tcph = tcp_hdr(skb);
> > +
> > +			header_size = skb->network_header +
> > +				off +
> > +				sizeof(struct tcphdr) +
> > +				MAX_TCP_OPTION_SPACE;
> > +			maybe_pull_tail(skb, header_size);
> > +
> 
> Same question as IPv4 counterpart, why do you need to pull here?
> 

Same answer as before ;-)

  Paul


* Re: [PATCH net-next v2 2/5] xen-netback: Add support for IPv6 checksum offload from guest
  2013-10-08 13:50     ` Paul Durrant
@ 2013-10-08 16:19       ` Wei Liu
  2013-10-10  9:37         ` Paul Durrant
  0 siblings, 1 reply; 20+ messages in thread
From: Wei Liu @ 2013-10-08 16:19 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Wei Liu, xen-devel, netdev, David Vrabel, Ian Campbell

On Tue, Oct 08, 2013 at 02:50:27PM +0100, Paul Durrant wrote:
[...]
> > > -#define PKT_PROT_LEN    (ETH_HLEN + \
> > > -			 VLAN_HLEN + \
> > > -			 sizeof(struct iphdr) + MAX_IPOPTLEN + \
> > > -			 sizeof(struct tcphdr) + MAX_TCP_OPTION_SPACE)
> > > +#define PKT_PROT_LEN 128
> > >
> > 
> > Where does 128 come from?
> > 
> 
> It's just an arbitrary power of 2 that was chosen because it seems to
> cover most likely v6 headers and all v4 headers.
> 

Hmm... How about using the value of MAX_TCP_HEADER? I guess that can
cover all V4 / V6 headers.

MAX_TCP_HEADER varies, depending on configuration. To make sure we can
accommodate all guest packets we might need to use its maximum value,
which can be as big as 128 + 128 + 48.

> > >  		if (recalculate_partial_csum) {
> > >  			struct tcphdr *tcph = tcp_hdr(skb);
> > > +
> > > +			header_size = skb->network_header +
> > > +				off +
> > > +				sizeof(struct tcphdr) +
> > > +				MAX_TCP_OPTION_SPACE;
> > > +			maybe_pull_tail(skb, header_size);
> > > +
> > 
> > I presume this function is checksum_setup stripped down to handle IPv4
> > packet. What's the purpose of changing its behaviour? Why the pull_tail
> > here?
> > 
> 
> We have to make sure that the TCP header is in the linear area as we
> are about to write to the checksum field. In practice, the 128 byte
> pull should guarantee this but in case that is varied later I wanted
> to make sure things did not start to fail in an add way.
> 

If you already set the pull size to the maximum possible value then this
will not be necessary anymore, right?

> > > +	while ((off <= sizeof(struct ipv6hdr) + ntohs(ipv6h->payload_len))
> > &&
> > > +	       !done) {
> > > +		/* We only access at most the first 2 bytes of any option
> > header
> > > +		 * to determine the next one.
> > > +		 */
> > > +		header_size = skb->network_header + off + 2;
> > > +		maybe_pull_tail(skb, header_size);
> > > +
> > 
> > Will this cause performance problem? Is it possible that you pull too
> > many times?
> > 
> 
> I guess it means we may get two pulls for the TCP/UDP headers rather
> than one so could push the pulls into the individual cases if you
> think it will affect performance that badly.

Hmm... Not sure I get what you mean here. The main problem I'm seeing is
that maybe_pull_tail is called on every loop iteration.

I would like to see as few pulls as possible because __pskb_pull_tail
can be expensive and only expected to use in "exceptional cases" (quoted
from the comment above that function).

Is it possible to pull MAX_TCP_HEADER bytes once to eliminate all other
pulls in checksum_setup{,_ipv4,_ipv6}?

> 
[...]
> > 
> > Can you make the logic explicit here?
> > 
> >    case IPPROTO_TCP:
> >    case IPPROTO_UDP:
> >         done = true;
> > 	break;
> >    default:
> >         break;
> > 
> > Another minor suggestion is that change "done" to "found" because you're
> > trying to find the two type of headers.
> > 
> 
> Yes, I could code it that way if you prefer.
> 

Explicit is better than implicit IMHO. After this change, could you
also move the default branch (the netdev_err) in the following "switch"
into the first "switch"?

Wei.


* Re: [Xen-devel] [PATCH net-next v2 5/5] xen-netback: enable IPv6 TCP GSO to the guest
  2013-10-08 10:58 ` [PATCH net-next v2 5/5] xen-netback: enable IPv6 TCP GSO to " Paul Durrant
  2013-10-08 13:34   ` Wei Liu
@ 2013-10-09  4:41   ` annie li
  2013-10-09 10:26     ` Paul Durrant
  1 sibling, 1 reply; 20+ messages in thread
From: annie li @ 2013-10-09  4:41 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel, netdev, Wei Liu, David Vrabel, Ian Campbell


On 2013-10-8 18:58, Paul Durrant wrote:
> This patch adds code to handle SKB_GSO_TCPV6 skbs and construct appropriate
> extra or prefix segments to pass the large packet to the frontend. New
> xenstore flags, feature-gso-tcpv6 and feature-gso-tcpv6-prefix, are sampled
> to determine if the frontend is capable of handling such packets.
>
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> Cc: Wei Liu <wei.liu2@citrix.com>
> Cc: David Vrabel <david.vrabel@citrix.com>
> Cc: Ian Campbell <ian.campbell@citrix.com>
> ---
>   drivers/net/xen-netback/common.h    |    6 +++--
>   drivers/net/xen-netback/interface.c |    8 ++++--
>   drivers/net/xen-netback/netback.c   |   47 +++++++++++++++++++++++++++--------
>   drivers/net/xen-netback/xenbus.c    |   29 +++++++++++++++++++--
>   4 files changed, 74 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
> index b4a9a3c..720b1ca 100644
> --- a/drivers/net/xen-netback/common.h
> +++ b/drivers/net/xen-netback/common.h
> @@ -87,6 +87,7 @@ struct pending_tx_info {
>   struct xenvif_rx_meta {
>   	int id;
>   	int size;
> +	int gso_type;
>   	int gso_size;
>   };
>   
> @@ -150,9 +151,10 @@ struct xenvif {
>   	u8               fe_dev_addr[6];
>   
>   	/* Frontend feature information. */
> +	int gso_mask;
> +	int gso_prefix_mask;
I assume this is a flag rather than a mask here, right? If it is a mask,
then 1 means disabling GSO.
> +
>   	u8 can_sg:1;
> -	u8 gso:1;
> -	u8 gso_prefix:1;
>   	u8 ip_csum:1;
>   	u8 ipv6_csum:1;
>   
> diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
> index cb0d8ea..3d11387 100644
> --- a/drivers/net/xen-netback/interface.c
> +++ b/drivers/net/xen-netback/interface.c
> @@ -214,8 +214,12 @@ static netdev_features_t xenvif_fix_features(struct net_device *dev,
>   
>   	if (!vif->can_sg)
>   		features &= ~NETIF_F_SG;
> -	if (!vif->gso && !vif->gso_prefix)
> +	if (~(vif->gso_mask | vif->gso_prefix_mask) &
> +	    (1 << XEN_NETIF_GSO_TYPE_TCPV4))

Is it better to use XEN_NETIF_GSO_TYPE_TCPV4 directly, setting
gso_mask (or gso_prefix_mask) with "|= XEN_NETIF_GSO_TYPE_TCPV4" or
"|= XEN_NETIF_GSO_TYPE_TCPV6", instead of using "1 <<"?

>   		features &= ~NETIF_F_TSO;
> +	if (~(vif->gso_mask | vif->gso_prefix_mask) &
> +	    (1 << XEN_NETIF_GSO_TYPE_TCPV6))
Same as above.
> +		features &= ~NETIF_F_TSO6;
>   	if (!vif->ip_csum)
>   		features &= ~NETIF_F_IP_CSUM;
>   	if (!vif->ipv6_csum)
> @@ -320,7 +324,7 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
>   	dev->netdev_ops	= &xenvif_netdev_ops;
>   	dev->hw_features = NETIF_F_SG |
>   		NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM |
> -		NETIF_F_TSO;
> +		NETIF_F_TSO | NETIF_F_TSO6;
>   	dev->features = dev->hw_features | NETIF_F_RXCSUM;
>   	SET_ETHTOOL_OPS(dev, &xenvif_ethtool_ops);
>   
> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> index ac42f73..ee0d55c 100644
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -140,7 +140,7 @@ static int max_required_rx_slots(struct xenvif *vif)
>   	int max = DIV_ROUND_UP(vif->dev->mtu, PAGE_SIZE);
>   
>   	/* XXX FIXME: RX path dependent on MAX_SKB_FRAGS */
> -	if (vif->can_sg || vif->gso || vif->gso_prefix)
> +	if (vif->can_sg || vif->gso_mask || vif->gso_prefix_mask)
>   		max += MAX_SKB_FRAGS + 1; /* extra_info + frags */
>   
>   	return max;
> @@ -334,6 +334,7 @@ static void xenvif_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
>   	struct gnttab_copy *copy_gop;
>   	struct xenvif_rx_meta *meta;
>   	unsigned long bytes;
> +	int gso_type;
>   
>   	/* Data must not cross a page boundary. */
>   	BUG_ON(size + offset > PAGE_SIZE<<compound_order(page));
> @@ -392,7 +393,14 @@ static void xenvif_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
>   		}
>   
>   		/* Leave a gap for the GSO descriptor. */
> -		if (*head && skb_shinfo(skb)->gso_size && !vif->gso_prefix)
> +		if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4)
> +			gso_type = XEN_NETIF_GSO_TYPE_TCPV4;
> +		else if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6)
> +			gso_type = XEN_NETIF_GSO_TYPE_TCPV6;
> +		else
> +			gso_type = 0;
> +
> +		if (*head && ((1 << gso_type) & vif->gso_prefix_mask))
Same
>   			vif->rx.req_cons++;
>   
>   		*head = 0; /* There must be something in this buffer now. */
> @@ -423,14 +431,28 @@ static int xenvif_gop_skb(struct sk_buff *skb,
>   	unsigned char *data;
>   	int head = 1;
>   	int old_meta_prod;
> +	int gso_type;
> +	int gso_size;
>   
>   	old_meta_prod = npo->meta_prod;
>   
> +	if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4) {
> +		gso_type = XEN_NETIF_GSO_TYPE_TCPV4;
> +		gso_size = skb_shinfo(skb)->gso_size;
> +	} else if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6) {
> +		gso_type = XEN_NETIF_GSO_TYPE_TCPV6;
> +		gso_size = skb_shinfo(skb)->gso_size;
> +	} else {
> +		gso_type = 0;
> +		gso_size = 0;
> +	}
> +
>   	/* Set up a GSO prefix descriptor, if necessary */
> -	if (skb_shinfo(skb)->gso_size && vif->gso_prefix) {
> +	if ((1 << gso_type) & vif->gso_prefix_mask) {
Same
>   		req = RING_GET_REQUEST(&vif->rx, vif->rx.req_cons++);
>   		meta = npo->meta + npo->meta_prod++;
> -		meta->gso_size = skb_shinfo(skb)->gso_size;
> +		meta->gso_type = gso_type;
> +		meta->gso_size = gso_size;
>   		meta->size = 0;
>   		meta->id = req->id;
>   	}
> @@ -438,10 +460,13 @@ static int xenvif_gop_skb(struct sk_buff *skb,
>   	req = RING_GET_REQUEST(&vif->rx, vif->rx.req_cons++);
>   	meta = npo->meta + npo->meta_prod++;
>   
> -	if (!vif->gso_prefix)
> -		meta->gso_size = skb_shinfo(skb)->gso_size;
> -	else
> +	if ((1 << gso_type) & vif->gso_mask) {
Same
> +		meta->gso_type = gso_type;
> +		meta->gso_size = gso_size;
> +	} else {
> +		meta->gso_type = 0;
>   		meta->gso_size = 0;
> +	}
>   
>   	meta->size = 0;
>   	meta->id = req->id;
> @@ -587,7 +612,8 @@ void xenvif_rx_action(struct xenvif *vif)
>   
>   		vif = netdev_priv(skb->dev);
>   
> -		if (vif->meta[npo.meta_cons].gso_size && vif->gso_prefix) {
> +		if ((1 << vif->meta[npo.meta_cons].gso_type) &
> +		    vif->gso_prefix_mask) {
>   			resp = RING_GET_RESPONSE(&vif->rx,
>   						 vif->rx.rsp_prod_pvt++);
>   
> @@ -624,7 +650,8 @@ void xenvif_rx_action(struct xenvif *vif)
>   					vif->meta[npo.meta_cons].size,
>   					flags);
>   
> -		if (vif->meta[npo.meta_cons].gso_size && !vif->gso_prefix) {
> +		if ((1 << vif->meta[npo.meta_cons].gso_type) &
Same
> +		    vif->gso_mask) {
>   			struct xen_netif_extra_info *gso =
>   				(struct xen_netif_extra_info *)
>   				RING_GET_RESPONSE(&vif->rx,
> @@ -632,8 +659,8 @@ void xenvif_rx_action(struct xenvif *vif)
>   
>   			resp->flags |= XEN_NETRXF_extra_info;
>   
> +			gso->u.gso.type = vif->meta[npo.meta_cons].gso_type;
>   			gso->u.gso.size = vif->meta[npo.meta_cons].gso_size;
> -			gso->u.gso.type = XEN_NETIF_GSO_TYPE_TCPV4;
>   			gso->u.gso.pad = 0;
>   			gso->u.gso.features = 0;
>   
> diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
> index 389fa72..4894256 100644
> --- a/drivers/net/xen-netback/xenbus.c
> +++ b/drivers/net/xen-netback/xenbus.c
> @@ -573,15 +573,40 @@ static int connect_rings(struct backend_info *be)
>   		val = 0;
>   	vif->can_sg = !!val;
>   
> +	vif->gso_mask = 0;
> +	vif->gso_prefix_mask = 0;
> +
>   	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-gso-tcpv4",
>   			 "%d", &val) < 0)
>   		val = 0;
> -	vif->gso = !!val;
> +	if (val)
> +		vif->gso_mask = 1 << XEN_NETIF_GSO_TYPE_TCPV4;
Same: |= XEN_NETIF_GSO_TYPE_TCPV4;
>   
>   	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-gso-tcpv4-prefix",
>   			 "%d", &val) < 0)
>   		val = 0;
> -	vif->gso_prefix = !!val;
> +	if (val)
> +		vif->gso_prefix_mask = 1 << XEN_NETIF_GSO_TYPE_TCPV4;
Same as above.

Thanks
Annie
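For readers following the mask discussion above, a minimal userspace sketch shows why the two forms differ. The XEN_NETIF_GSO_TYPE_* values here are assumptions mirroring the Xen public headers, where the GSO types are an enumeration of small integers rather than bit flags:

```c
#include <assert.h>

/* Assumed values, mirroring xen/interface/io/netif.h, where the GSO
 * types are small integers rather than bit flags. */
#define XEN_NETIF_GSO_TYPE_TCPV4 1
#define XEN_NETIF_GSO_TYPE_TCPV6 2

/* Paul's form: each type gets its own dedicated bit in the mask. */
static int mask_shifted(void)
{
	int mask = 0;
	mask |= 1 << XEN_NETIF_GSO_TYPE_TCPV4;	/* bit 1 -> 0x2 */
	mask |= 1 << XEN_NETIF_GSO_TYPE_TCPV6;	/* bit 2 -> 0x4 */
	return mask;				/* 0x6 */
}

/* The suggested form: OR the raw type values in directly. This only
 * stays unambiguous while every type value happens to be a distinct
 * power of two; the test sites must then also change, since
 * "(1 << gso_type) & mask" no longer matches. */
static int mask_raw(void)
{
	int mask = 0;
	mask |= XEN_NETIF_GSO_TYPE_TCPV4;	/* 0x1 */
	mask |= XEN_NETIF_GSO_TYPE_TCPV6;	/* 0x2 */
	return mask;				/* 0x3 */
}
```

Either convention can work, but the setters and the testers have to agree on it throughout the driver.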

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [Xen-devel] [PATCH net-next v2 5/5] xen-netback: enable IPv6 TCP GSO to the guest
  2013-10-09  4:41   ` [Xen-devel] " annie li
@ 2013-10-09 10:26     ` Paul Durrant
  2013-10-10  2:19       ` annie li
  0 siblings, 1 reply; 20+ messages in thread
From: Paul Durrant @ 2013-10-09 10:26 UTC (permalink / raw)
  To: annie li; +Cc: xen-devel, netdev, Wei Liu, David Vrabel, Ian Campbell

> -----Original Message-----
> From: annie li [mailto:annie.li@oracle.com]
> Sent: 09 October 2013 05:42
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org; netdev@vger.kernel.org; Wei Liu; David Vrabel;
> Ian Campbell
> Subject: Re: [Xen-devel] [PATCH net-next v2 5/5] xen-netback: enable IPv6
> TCP GSO to the guest
> 
> 
> On 2013-10-8 18:58, Paul Durrant wrote:
> > This patch adds code to handle SKB_GSO_TCPV6 skbs and construct
> appropriate
> > extra or prefix segments to pass the large packet to the frontend. New
> > xenstore flags, feature-gso-tcpv6 and feature-gso-tcpv6-prefix, are
> sampled
> > to determine if the frontend is capable of handling such packets.
> >
> > Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> > Cc: Wei Liu <wei.liu2@citrix.com>
> > Cc: David Vrabel <david.vrabel@citrix.com>
> > Cc: Ian Campbell <ian.campbell@citrix.com>
> > ---
> >   drivers/net/xen-netback/common.h    |    6 +++--
> >   drivers/net/xen-netback/interface.c |    8 ++++--
> >   drivers/net/xen-netback/netback.c   |   47
> +++++++++++++++++++++++++++--------
> >   drivers/net/xen-netback/xenbus.c    |   29 +++++++++++++++++++--
> >   4 files changed, 74 insertions(+), 16 deletions(-)
> >
> > diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-
> netback/common.h
> > index b4a9a3c..720b1ca 100644
> > --- a/drivers/net/xen-netback/common.h
> > +++ b/drivers/net/xen-netback/common.h
> > @@ -87,6 +87,7 @@ struct pending_tx_info {
> >   struct xenvif_rx_meta {
> >   	int id;
> >   	int size;
> > +	int gso_type;
> >   	int gso_size;
> >   };
> >
> > @@ -150,9 +151,10 @@ struct xenvif {
> >   	u8               fe_dev_addr[6];
> >
> >   	/* Frontend feature information. */
> > +	int gso_mask;
> > +	int gso_prefix_mask;
> I assume it is a flag instead of mask here, right? If it is mask, then 1
> means disabling the gso.

I don't understand what you're saying here. I'm just swapping from bit flags to a couple of masks. Masks without either of the requisite bits for v4 or v6 gso mean it is disabled.
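The semantics being described can be sketched in userspace as follows (the XEN_NETIF_GSO_TYPE_* values are assumptions mirroring the Xen public headers):

```c
#include <assert.h>

#define XEN_NETIF_GSO_TYPE_TCPV4 1	/* assumed value */
#define XEN_NETIF_GSO_TYPE_TCPV6 2	/* assumed value */

/* A set bit in the mask means the frontend advertised support for that
 * GSO type, i.e. it is ENABLED; a mask with neither bit set means GSO
 * is disabled for that vif. */
static int gso_type_enabled(int gso_mask, int gso_type)
{
	return (1 << gso_type) & gso_mask;
}
```

So a mask of 0 disables GSO entirely, and adding `1 << XEN_NETIF_GSO_TYPE_TCPV6` to the mask is what enables the new IPv6 path.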

> > +
> >   	u8 can_sg:1;
> > -	u8 gso:1;
> > -	u8 gso_prefix:1;
> >   	u8 ip_csum:1;
> >   	u8 ipv6_csum:1;
> >
> > diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-
> netback/interface.c
> > index cb0d8ea..3d11387 100644
> > --- a/drivers/net/xen-netback/interface.c
> > +++ b/drivers/net/xen-netback/interface.c
> > @@ -214,8 +214,12 @@ static netdev_features_t
> xenvif_fix_features(struct net_device *dev,
> >
> >   	if (!vif->can_sg)
> >   		features &= ~NETIF_F_SG;
> > -	if (!vif->gso && !vif->gso_prefix)
> > +	if (~(vif->gso_mask | vif->gso_prefix_mask) &
> > +	    (1 << XEN_NETIF_GSO_TYPE_TCPV4))
> 
> Is it better to use XEN_NETIF_GSO_TYPE_TCPV4 directly and setting
> gso_mask(gso_prefix_mask) with "|= XEN_NETIF_GSO_TYPE_TCPV4" or "|=
> XEN_NETIF_GSO_TYPE_TCPV6" instead of "1 <<"?
> 

I thought about it but decided it was best to leave XEN_NETIF_GSO_TYPE_xxx as a list of types rather than bits in a mask, as there's no intrinsic reason why you'd ever want to OR them together (unlike the tx or rx flags). The fact that I use them as bit shifts in netback is purely a coding convenience - I guess I could define macros to make it a little tidier though.
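A macro of the kind alluded to here might look like the following. This is purely hypothetical - no such macro existed in the tree at the time - and the type value is assumed:

```c
#include <assert.h>

#define XEN_NETIF_GSO_TYPE_TCPV4 1	/* assumed value */

/* Hypothetical helper: keeps XEN_NETIF_GSO_TYPE_* as plain type
 * numbers while tidying the "1 <<" at the use sites. */
#define GSO_BIT(type) (1 << (type))

static int example_mask(void)
{
	int gso_mask = 0;
	gso_mask |= GSO_BIT(XEN_NETIF_GSO_TYPE_TCPV4);
	return gso_mask;
}
```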

  Paul

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Xen-devel] [PATCH net-next v2 5/5] xen-netback: enable IPv6 TCP GSO to the guest
  2013-10-09 10:26     ` Paul Durrant
@ 2013-10-10  2:19       ` annie li
  0 siblings, 0 replies; 20+ messages in thread
From: annie li @ 2013-10-10  2:19 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel, netdev, Wei Liu, David Vrabel, Ian Campbell


On 2013-10-9 18:26, Paul Durrant wrote:
>> -----Original Message-----
>> From: annie li [mailto:annie.li@oracle.com]
>> Sent: 09 October 2013 05:42
>> To: Paul Durrant
>> Cc: xen-devel@lists.xen.org; netdev@vger.kernel.org; Wei Liu; David Vrabel;
>> Ian Campbell
>> Subject: Re: [Xen-devel] [PATCH net-next v2 5/5] xen-netback: enable IPv6
>> TCP GSO to the guest
>>
>>
>> On 2013-10-8 18:58, Paul Durrant wrote:
>>> This patch adds code to handle SKB_GSO_TCPV6 skbs and construct
>> appropriate
>>> extra or prefix segments to pass the large packet to the frontend. New
>>> xenstore flags, feature-gso-tcpv6 and feature-gso-tcpv6-prefix, are
>> sampled
>>> to determine if the frontend is capable of handling such packets.
>>>
>>> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
>>> Cc: Wei Liu <wei.liu2@citrix.com>
>>> Cc: David Vrabel <david.vrabel@citrix.com>
>>> Cc: Ian Campbell <ian.campbell@citrix.com>
>>> ---
>>>    drivers/net/xen-netback/common.h    |    6 +++--
>>>    drivers/net/xen-netback/interface.c |    8 ++++--
>>>    drivers/net/xen-netback/netback.c   |   47
>> +++++++++++++++++++++++++++--------
>>>    drivers/net/xen-netback/xenbus.c    |   29 +++++++++++++++++++--
>>>    4 files changed, 74 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-
>> netback/common.h
>>> index b4a9a3c..720b1ca 100644
>>> --- a/drivers/net/xen-netback/common.h
>>> +++ b/drivers/net/xen-netback/common.h
>>> @@ -87,6 +87,7 @@ struct pending_tx_info {
>>>    struct xenvif_rx_meta {
>>>    	int id;
>>>    	int size;
>>> +	int gso_type;
>>>    	int gso_size;
>>>    };
>>>
>>> @@ -150,9 +151,10 @@ struct xenvif {
>>>    	u8               fe_dev_addr[6];
>>>
>>>    	/* Frontend feature information. */
>>> +	int gso_mask;
>>> +	int gso_prefix_mask;
>> I assume it is a flag instead of mask here, right? If it is mask, then 1
>> means disabling the gso.
> I don't understand what you're saying here. I'm just swapping from bit flags to a couple of masks. Masks without either of the requisite bits for v4 or v6 gso mean it is disabled.

It is just about semantics; my understanding was that a mask WITH bits set
for v4 or v6 would mean disabling.

>
>>> +
>>>    	u8 can_sg:1;
>>> -	u8 gso:1;
>>> -	u8 gso_prefix:1;
>>>    	u8 ip_csum:1;
>>>    	u8 ipv6_csum:1;
>>>
>>> diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-
>> netback/interface.c
>>> index cb0d8ea..3d11387 100644
>>> --- a/drivers/net/xen-netback/interface.c
>>> +++ b/drivers/net/xen-netback/interface.c
>>> @@ -214,8 +214,12 @@ static netdev_features_t
>> xenvif_fix_features(struct net_device *dev,
>>>    	if (!vif->can_sg)
>>>    		features &= ~NETIF_F_SG;
>>> -	if (!vif->gso && !vif->gso_prefix)
>>> +	if (~(vif->gso_mask | vif->gso_prefix_mask) &
>>> +	    (1 << XEN_NETIF_GSO_TYPE_TCPV4))
>> Is it better to use XEN_NETIF_GSO_TYPE_TCPV4 directly and setting
>> gso_mask(gso_prefix_mask) with "|= XEN_NETIF_GSO_TYPE_TCPV4" or "|=
>> XEN_NETIF_GSO_TYPE_TCPV6" instead of "1 <<"?
>>
> I thought about it but decided it was best to leave XEN_NETIF_GSO_TYPE_xxx as a list of types rather than bits in a mask as there's no intrinsic reason why you'd ever want to OR them together (unlike the tx or rx flags). That fact I use them as bit shifts in netback is purely for convenience of coding - I guess I could define macros to make it a little tidier though.

Macros would be fine.

Thanks
Annie

^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [PATCH net-next v2 2/5] xen-netback: Add support for IPv6 checksum offload from guest
  2013-10-08 16:19       ` Wei Liu
@ 2013-10-10  9:37         ` Paul Durrant
  0 siblings, 0 replies; 20+ messages in thread
From: Paul Durrant @ 2013-10-10  9:37 UTC (permalink / raw)
  To: Wei Liu; +Cc: Wei Liu, xen-devel, netdev, David Vrabel, Ian Campbell

> -----Original Message-----
> From: Wei Liu [mailto:wei.liu2@citrix.com]
> Sent: 08 October 2013 17:19
> To: Paul Durrant
> Cc: Wei Liu; xen-devel@lists.xen.org; netdev@vger.kernel.org; David Vrabel;
> Ian Campbell
> Subject: Re: [PATCH net-next v2 2/5] xen-netback: Add support for IPv6
> checksum offload from guest
> 
> On Tue, Oct 08, 2013 at 02:50:27PM +0100, Paul Durrant wrote:
> [...]
> > > > -#define PKT_PROT_LEN    (ETH_HLEN + \
> > > > -			 VLAN_HLEN + \
> > > > -			 sizeof(struct iphdr) + MAX_IPOPTLEN + \
> > > > -			 sizeof(struct tcphdr) + MAX_TCP_OPTION_SPACE)
> > > > +#define PKT_PROT_LEN 128
> > > >
> > >
> > > Where does 128 come from?
> > >
> >
> > It's just an arbitrary power of 2 that was chosen because it seems to
> > cover most likely v6 headers and all v4 headers.
> >
> 
> Hmm... How about using the value of MAX_TCP_HEADER? I guess that can
> cover all V4 / V6 headers.
> 
> MAX_TCP_HEADER varies, depending on configuration. To make sure we can
> accommodate all guests packet we might need to use its maximum value
> which can be as big as 128 + 128 + 48.
> 

Because we always double-copy (as the grant copy doesn't copy direct to the linear area) I was concerned about making the pullup too big. I'd rather optimize around a smaller header, so how about we stick with 128, but if maybe_pull_tail() finds it needs to pull up then it pulls all the way up to MAX_TCP_HEADER, limiting us to a maximum of 2 pullups?
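The scheme proposed here - pull nothing when the bytes are already linear, and when a pull is unavoidable, over-pull once to the maximum header size so at most two pulls ever happen - can be modelled outside the kernel with a toy skb. The struct, the helper name, and the 304-byte cap (the 128 + 128 + 48 figure quoted above) are all illustrative assumptions, not the real sk_buff API:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TCP_HEADER_SKETCH 304	/* illustrative: 128 + 128 + 48 */

struct fake_skb {
	size_t headlen;	/* bytes currently in the linear area */
	size_t len;	/* total packet length */
	int pulls;	/* how many pull operations happened */
};

/* Pull only if 'target' bytes are not already linear (the common case
 * costs nothing); when a pull is needed, go straight to the maximum
 * header size so at most one further pull can ever occur. */
static void maybe_pull_tail_sketch(struct fake_skb *skb, size_t target)
{
	if (skb->headlen >= target)
		return;				/* already linear: no pull */
	target = MAX_TCP_HEADER_SKETCH;		/* over-pull once */
	if (target > skb->len)
		target = skb->len;
	skb->headlen = target;
	skb->pulls++;
}
```

With a 128-byte initial linear area, most packets never pull at all; a deep IPv6 option chain triggers exactly one pull to the cap, after which every subsequent maybe_pull call is free.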

> > > >  		if (recalculate_partial_csum) {
> > > >  			struct tcphdr *tcph = tcp_hdr(skb);
> > > > +
> > > > +			header_size = skb->network_header +
> > > > +				off +
> > > > +				sizeof(struct tcphdr) +
> > > > +				MAX_TCP_OPTION_SPACE;
> > > > +			maybe_pull_tail(skb, header_size);
> > > > +
> > >
> > > I presume this function is checksum_setup stripped down to handle IPv4
> > > packet. What's the purpose of changing its behaviour? Why the pull_tail
> > > here?
> > >
> >
> > We have to make sure that the TCP header is in the linear area as we
> > are about to write to the checksum field. In practice, the 128 byte
> > pull should guarantee this but, in case that is changed later, I wanted
> > to make sure things did not start to fail in an odd way.
> >
> 
> If you already set the pull size to maximum possible value then this
> will not be necessary anymore, right?
> 
> > > > +	while ((off <= sizeof(struct ipv6hdr) + ntohs(ipv6h->payload_len))
> > > &&
> > > > +	       !done) {
> > > > +		/* We only access at most the first 2 bytes of any option
> > > header
> > > > +		 * to determine the next one.
> > > > +		 */
> > > > +		header_size = skb->network_header + off + 2;
> > > > +		maybe_pull_tail(skb, header_size);
> > > > +
> > >
> > > Will this cause performance problem? Is it possible that you pull too
> > > many times?
> > >
> >
> > I guess it means we may get two pulls for the TCP/UDP headers rather
> > than one so could push the pulls into the individual cases if you
> > think it will affect performance that badly.
> 
> Hmm... Not sure I get what you mean here. The main problem I'm seeing is
> that maybe_pull_tail is called in every loop.
> 
> I would like to see as few pulls as possible because __pskb_pull_tail
> can be expensive and only expected to use in "exceptional cases" (quoted
> from the comment above that function).
> 
> Is it possible to pull TCP_MAX_HEADER bytes once to eliminate all other
> pulls in checksum_setup{,_ipv4,_ipv6}?
> 

Note that the function is called *maybe*_pull_tail(). It only pulls if it needs to :-)

  Paul

^ permalink raw reply	[flat|nested] 20+ messages in thread

Thread overview: 20+ messages (newest: 2013-10-10  9:37 UTC)
2013-10-08 10:58 [PATCH net-next v2 0/5] xen-netback: IPv6 offload support Paul Durrant
2013-10-08 10:58 ` [PATCH net-next v2 1/5] xen-netback: add IPv6 checksum offload to guest Paul Durrant
2013-10-08 13:10   ` Wei Liu
2013-10-08 13:14     ` Paul Durrant
2013-10-08 10:58 ` [PATCH net-next v2 2/5] xen-netback: Add support for IPv6 checksum offload from guest Paul Durrant
2013-10-08 13:30   ` Wei Liu
2013-10-08 13:50     ` Paul Durrant
2013-10-08 16:19       ` Wei Liu
2013-10-10  9:37         ` Paul Durrant
2013-10-08 10:58 ` [PATCH net-next v2 3/5] xen-netback: Unconditionally set NETIF_F_RXCSUM Paul Durrant
2013-10-08 10:58 ` [PATCH net-next v2 4/5] xen-netback: handle IPv6 TCP GSO packets from the guest Paul Durrant
2013-10-08 13:31   ` Wei Liu
2013-10-08 13:42     ` Paul Durrant
2013-10-08 13:50       ` [Xen-devel] " Jan Beulich
2013-10-08 10:58 ` [PATCH net-next v2 5/5] xen-netback: enable IPv6 TCP GSO to " Paul Durrant
2013-10-08 13:34   ` Wei Liu
2013-10-08 13:40     ` Paul Durrant
2013-10-09  4:41   ` [Xen-devel] " annie li
2013-10-09 10:26     ` Paul Durrant
2013-10-10  2:19       ` annie li
