netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next 0/8 v6] Open vSwitch upcall optimiziations
@ 2013-11-21 18:13 Thomas Graf
       [not found] ` <cover.1385057355.git.tgraf-G/eBtMaohhA@public.gmane.org>
                   ` (7 more replies)
  0 siblings, 8 replies; 13+ messages in thread
From: Thomas Graf @ 2013-11-21 18:13 UTC (permalink / raw)
  To: jesse, davem
  Cc: dev, netdev, dborkman, ffusco, fleitner, eric.dumazet, bhutchings

Reposting this before the merge window as it will go through Jesse's
tree.

Given jumbo frames, the capacity of the slow path is improved by 
a factor of 2.5x.

V6: - Added memory mapped netlink i/o support
    - Drop user_features if old user space not aware of user features
      reuses an existing datapath
V5: - Removed padding requirement in user space
    - Added OVS_DP_F_UNALIGNED flag let user space signal ability to
      receive unaligned Netlink messages, fall back to linear copy
      otherwise.
V4: - Daniel Borkmann pointed out that the style in skbuff.h has changed
      in net-next, adapted to no longer using extern in headers.
V3: - Removed unneeded alignment of nlmsg_len after padding
V2: - Added skb_zerocopy_headlen() to calculate headroom of destination
      buffer. This also takes care of the from->head_frag issue.
    - Attribute alignment for frag_list case
    - API documentation
    - performance data for CHECKSUM_PARTIAL tx case

Thomas Graf (8):
  genl: Add genlmsg_new_unicast() for unicast message allocation
  netlink: Avoid netlink mmap alloc if msg size exceeds frame size
  openvswitch: Enable memory mapped Netlink i/o
  net: Export skb_zerocopy() to zerocopy from one skb to another
  openvswitch: Allow user space to announce ability to accept unaligned
    Netlink messages
  openvswitch: Allow update of dp with OVS_DP_CMD_NEW if NLM_F_REPLACE
    is set
  openvswitch: Drop user features if old user space attempted to create
    datapath
  openvswitch: Use skb_zerocopy() for upcall

 include/linux/skbuff.h               |   3 +
 include/net/genetlink.h              |   4 +
 include/uapi/linux/openvswitch.h     |  15 ++-
 net/core/skbuff.c                    |  85 +++++++++++++++
 net/netfilter/nfnetlink_queue_core.c |  59 +----------
 net/netlink/af_netlink.c             |   4 +
 net/netlink/genetlink.c              |  21 ++++
 net/openvswitch/datapath.c           | 193 ++++++++++++++++++++++-------------
 net/openvswitch/datapath.h           |   2 +
 9 files changed, 257 insertions(+), 129 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH net-next 1/8] genl: Add genlmsg_new_unicast() for unicast message allocation
       [not found] ` <cover.1385057355.git.tgraf-G/eBtMaohhA@public.gmane.org>
@ 2013-11-21 18:13   ` Thomas Graf
  0 siblings, 0 replies; 13+ messages in thread
From: Thomas Graf @ 2013-11-21 18:13 UTC (permalink / raw)
  To: jesse-l0M0P4e3n4LQT0dZR+AlfA, davem-fT/PcQaiUtIeIZ0/mPfg9Q
  Cc: dev-yBygre7rU0TnMu66kgdUjQ, fleitner-H+wXaHxf7aLQT0dZR+AlfA,
	eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w,
	netdev-u79uwXL29TY76Z2rM5mHXA, bhutchings-s/n/eUQHGBpZroRs9YW3xA

Allocates a new sk_buff large enough to cover the specified payload
plus required Netlink headers. Will check receiving socket for
memory mapped i/o capability and use it if enabled. Will fall back
to non-mapped skb if message size exceeds the frame size of the ring.

Signed-of-by: Thomas Graf <tgraf-G/eBtMaohhA@public.gmane.org>
Reviewed-by: Daniel Borkmann <dborkman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 include/net/genetlink.h |  4 ++++
 net/netlink/genetlink.c | 21 +++++++++++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/include/net/genetlink.h b/include/net/genetlink.h
index ace4abf..1d50c6f 100644
--- a/include/net/genetlink.h
+++ b/include/net/genetlink.h
@@ -73,6 +73,7 @@ struct genl_family {
  * @attrs: netlink attributes
  * @_net: network namespace
  * @user_ptr: user pointers
+ * @dst_sk: destination socket
  */
 struct genl_info {
 	u32			snd_seq;
@@ -85,6 +86,7 @@ struct genl_info {
 	struct net *		_net;
 #endif
 	void *			user_ptr[2];
+	struct sock *		dst_sk;
 };
 
 static inline struct net *genl_info_net(struct genl_info *info)
@@ -177,6 +179,8 @@ void genl_notify(struct genl_family *family,
 		 struct sk_buff *skb, struct net *net, u32 portid,
 		 u32 group, struct nlmsghdr *nlh, gfp_t flags);
 
+struct sk_buff *genlmsg_new_unicast(size_t payload, struct genl_info *info,
+				    gfp_t flags);
 void *genlmsg_put(struct sk_buff *skb, u32 portid, u32 seq,
 		  struct genl_family *family, int flags, u8 cmd);
 
diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
index 7dbc4f7..51bdc12 100644
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@ -454,6 +454,26 @@ int genl_unregister_family(struct genl_family *family)
 EXPORT_SYMBOL(genl_unregister_family);
 
 /**
+ * genlmsg_new_unicast - Allocate generic netlink message for unicast
+ * @payload: size of the message payload
+ * @info: information on destination
+ * @flags: the type of memory to allocate
+ *
+ * Allocates a new sk_buff large enough to cover the specified payload
+ * plus required Netlink headers. Will check receiving socket for
+ * memory mapped i/o capability and use it if enabled. Will fall back
+ * to non-mapped skb if message size exceeds the frame size of the ring.
+ */
+struct sk_buff *genlmsg_new_unicast(size_t payload, struct genl_info *info,
+				    gfp_t flags)
+{
+	size_t len = nlmsg_total_size(genlmsg_total_size(payload));
+
+	return netlink_alloc_skb(info->dst_sk, len, info->snd_portid, flags);
+}
+EXPORT_SYMBOL_GPL(genlmsg_new_unicast);
+
+/**
  * genlmsg_put - Add generic netlink header to netlink message
  * @skb: socket buffer holding the message
  * @portid: netlink portid the message is addressed to
@@ -593,6 +613,7 @@ static int genl_family_rcv_msg(struct genl_family *family,
 	info.genlhdr = nlmsg_data(nlh);
 	info.userhdr = nlmsg_data(nlh) + GENL_HDRLEN;
 	info.attrs = attrbuf;
+	info.dst_sk = skb->sk;
 	genl_info_net_set(&info, net);
 	memset(&info.user_ptr, 0, sizeof(info.user_ptr));
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH net-next 2/8] netlink: Avoid netlink mmap alloc if msg size exceeds frame size
  2013-11-21 18:13 [PATCH net-next 0/8 v6] Open vSwitch upcall optimiziations Thomas Graf
       [not found] ` <cover.1385057355.git.tgraf-G/eBtMaohhA@public.gmane.org>
@ 2013-11-21 18:13 ` Thomas Graf
  2013-11-21 18:13 ` [PATCH net-next 3/8] openvswitch: Enable memory mapped Netlink i/o Thomas Graf
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Thomas Graf @ 2013-11-21 18:13 UTC (permalink / raw)
  To: jesse, davem
  Cc: dev, netdev, dborkman, ffusco, fleitner, eric.dumazet, bhutchings

An insufficent ring frame size configuration can lead to an
unnecessary skb allocation for every Netlink message. Check frame
size before taking the queue lock and allocating the skb and
re-check with lock to be safe.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
---
 net/netlink/af_netlink.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index bca50b9..6433489 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1769,6 +1769,9 @@ struct sk_buff *netlink_alloc_skb(struct sock *ssk, unsigned int size,
 	if (ring->pg_vec == NULL)
 		goto out_put;
 
+	if (ring->frame_size - NL_MMAP_HDRLEN < size)
+		goto out_put;
+
 	skb = alloc_skb_head(gfp_mask);
 	if (skb == NULL)
 		goto err1;
@@ -1778,6 +1781,7 @@ struct sk_buff *netlink_alloc_skb(struct sock *ssk, unsigned int size,
 	if (ring->pg_vec == NULL)
 		goto out_free;
 
+	/* check again under lock */
 	maxlen = ring->frame_size - NL_MMAP_HDRLEN;
 	if (maxlen < size)
 		goto out_free;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH net-next 3/8] openvswitch: Enable memory mapped Netlink i/o
  2013-11-21 18:13 [PATCH net-next 0/8 v6] Open vSwitch upcall optimiziations Thomas Graf
       [not found] ` <cover.1385057355.git.tgraf-G/eBtMaohhA@public.gmane.org>
  2013-11-21 18:13 ` [PATCH net-next 2/8] netlink: Avoid netlink mmap alloc if msg size exceeds frame size Thomas Graf
@ 2013-11-21 18:13 ` Thomas Graf
  2013-11-21 18:13 ` [PATCH net-next 4/8] net: Export skb_zerocopy() to zerocopy from one skb to another Thomas Graf
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Thomas Graf @ 2013-11-21 18:13 UTC (permalink / raw)
  To: jesse, davem
  Cc: dev, netdev, dborkman, ffusco, fleitner, eric.dumazet, bhutchings

Use memory mapped Netlink i/o for all unicast openvswitch
communication if a ring has been set up.

Benchmark
  * pktgen -> ovs internal port
  * 5M pkts, 5M flows
  * 4 threads, 8 cores

Before:
Result: OK: 67418743(c67108212+d310530) usec, 5000000 (9000byte,0frags)
  74163pps 5339Mb/sec (5339736000bps) errors: 0
	+   2.98%     ovs-vswitchd  [k] copy_user_generic_string
	+   2.49%     ovs-vswitchd  [k] memcpy
	+   1.84%       kpktgend_2  [k] memcpy
	+   1.81%       kpktgend_1  [k] memcpy
	+   1.81%       kpktgend_3  [k] memcpy
	+   1.78%       kpktgend_0  [k] memcpy

After:
Result: OK: 24229690(c24127165+d102524) usec, 5000000 (9000byte,0frags)
  206358pps 14857Mb/sec (14857776000bps) errors: 0
	+   2.80%     ovs-vswitchd  [k] memcpy
	+   1.31%       kpktgend_2  [k] memcpy
	+   1.23%       kpktgend_0  [k] memcpy
	+   1.09%       kpktgend_1  [k] memcpy
	+   1.04%       kpktgend_3  [k] memcpy
	+   0.96%     ovs-vswitchd  [k] copy_user_generic_string

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
---
 net/openvswitch/datapath.c | 56 ++++++++++++++++++++++++----------------------
 1 file changed, 29 insertions(+), 27 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 6f5e1dd..0ac9cde 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -402,6 +402,11 @@ static int queue_userspace_packet(struct net *net, int dp_ifindex,
 	struct sk_buff *nskb = NULL;
 	struct sk_buff *user_skb; /* to be queued to userspace */
 	struct nlattr *nla;
+	struct genl_info info = {
+		.dst_sk = net->genl_sock,
+		.snd_portid = upcall_info->portid,
+	};
+	size_t len;
 	int err;
 
 	if (vlan_tx_tag_present(skb)) {
@@ -422,7 +427,8 @@ static int queue_userspace_packet(struct net *net, int dp_ifindex,
 		goto out;
 	}
 
-	user_skb = genlmsg_new(upcall_msg_size(skb, upcall_info->userdata), GFP_ATOMIC);
+	len = upcall_msg_size(skb, upcall_info->userdata);
+	user_skb = genlmsg_new_unicast(len, &info, GFP_ATOMIC);
 	if (!user_skb) {
 		err = -ENOMEM;
 		goto out;
@@ -726,27 +732,30 @@ error:
 	return err;
 }
 
-static struct sk_buff *ovs_flow_cmd_alloc_info(struct sw_flow *flow)
+static struct sk_buff *ovs_flow_cmd_alloc_info(struct sw_flow *flow,
+					       struct genl_info *info)
 {
-	const struct sw_flow_actions *sf_acts;
+	size_t len;
 
-	sf_acts = ovsl_dereference(flow->sf_acts);
+	len = ovs_flow_cmd_msg_size(ovsl_dereference(flow->sf_acts));
 
-	return genlmsg_new(ovs_flow_cmd_msg_size(sf_acts), GFP_KERNEL);
+	return genlmsg_new_unicast(len, info, GFP_KERNEL);
 }
 
 static struct sk_buff *ovs_flow_cmd_build_info(struct sw_flow *flow,
 					       struct datapath *dp,
-					       u32 portid, u32 seq, u8 cmd)
+					       struct genl_info *info,
+					       u8 cmd)
 {
 	struct sk_buff *skb;
 	int retval;
 
-	skb = ovs_flow_cmd_alloc_info(flow);
+	skb = ovs_flow_cmd_alloc_info(flow, info);
 	if (!skb)
 		return ERR_PTR(-ENOMEM);
 
-	retval = ovs_flow_cmd_fill_info(flow, dp, skb, portid, seq, 0, cmd);
+	retval = ovs_flow_cmd_fill_info(flow, dp, skb, info->snd_portid,
+					info->snd_seq, 0, cmd);
 	BUG_ON(retval < 0);
 	return skb;
 }
@@ -835,8 +844,7 @@ static int ovs_flow_cmd_new_or_set(struct sk_buff *skb, struct genl_info *info)
 			goto err_flow_free;
 		}
 
-		reply = ovs_flow_cmd_build_info(flow, dp, info->snd_portid,
-						info->snd_seq, OVS_FLOW_CMD_NEW);
+		reply = ovs_flow_cmd_build_info(flow, dp, info, OVS_FLOW_CMD_NEW);
 	} else {
 		/* We found a matching flow. */
 		struct sw_flow_actions *old_acts;
@@ -864,8 +872,7 @@ static int ovs_flow_cmd_new_or_set(struct sk_buff *skb, struct genl_info *info)
 		rcu_assign_pointer(flow->sf_acts, acts);
 		ovs_nla_free_flow_actions(old_acts);
 
-		reply = ovs_flow_cmd_build_info(flow, dp, info->snd_portid,
-					       info->snd_seq, OVS_FLOW_CMD_NEW);
+		reply = ovs_flow_cmd_build_info(flow, dp, info, OVS_FLOW_CMD_NEW);
 
 		/* Clear stats. */
 		if (a[OVS_FLOW_ATTR_CLEAR]) {
@@ -927,8 +934,7 @@ static int ovs_flow_cmd_get(struct sk_buff *skb, struct genl_info *info)
 		goto unlock;
 	}
 
-	reply = ovs_flow_cmd_build_info(flow, dp, info->snd_portid,
-					info->snd_seq, OVS_FLOW_CMD_NEW);
+	reply = ovs_flow_cmd_build_info(flow, dp, info, OVS_FLOW_CMD_NEW);
 	if (IS_ERR(reply)) {
 		err = PTR_ERR(reply);
 		goto unlock;
@@ -975,7 +981,7 @@ static int ovs_flow_cmd_del(struct sk_buff *skb, struct genl_info *info)
 		goto unlock;
 	}
 
-	reply = ovs_flow_cmd_alloc_info(flow);
+	reply = ovs_flow_cmd_alloc_info(flow, info);
 	if (!reply) {
 		err = -ENOMEM;
 		goto unlock;
@@ -1127,17 +1133,17 @@ error:
 	return -EMSGSIZE;
 }
 
-static struct sk_buff *ovs_dp_cmd_build_info(struct datapath *dp, u32 portid,
-					     u32 seq, u8 cmd)
+static struct sk_buff *ovs_dp_cmd_build_info(struct datapath *dp,
+					     struct genl_info *info, u8 cmd)
 {
 	struct sk_buff *skb;
 	int retval;
 
-	skb = genlmsg_new(ovs_dp_cmd_msg_size(), GFP_KERNEL);
+	skb = genlmsg_new_unicast(ovs_dp_cmd_msg_size(), info, GFP_KERNEL);
 	if (!skb)
 		return ERR_PTR(-ENOMEM);
 
-	retval = ovs_dp_cmd_fill_info(dp, skb, portid, seq, 0, cmd);
+	retval = ovs_dp_cmd_fill_info(dp, skb, info->snd_portid, info->snd_seq, 0, cmd);
 	if (retval < 0) {
 		kfree_skb(skb);
 		return ERR_PTR(retval);
@@ -1232,8 +1238,7 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
 		goto err_destroy_ports_array;
 	}
 
-	reply = ovs_dp_cmd_build_info(dp, info->snd_portid,
-				      info->snd_seq, OVS_DP_CMD_NEW);
+	reply = ovs_dp_cmd_build_info(dp, info, OVS_DP_CMD_NEW);
 	err = PTR_ERR(reply);
 	if (IS_ERR(reply))
 		goto err_destroy_local_port;
@@ -1299,8 +1304,7 @@ static int ovs_dp_cmd_del(struct sk_buff *skb, struct genl_info *info)
 	if (IS_ERR(dp))
 		goto unlock;
 
-	reply = ovs_dp_cmd_build_info(dp, info->snd_portid,
-				      info->snd_seq, OVS_DP_CMD_DEL);
+	reply = ovs_dp_cmd_build_info(dp, info, OVS_DP_CMD_DEL);
 	err = PTR_ERR(reply);
 	if (IS_ERR(reply))
 		goto unlock;
@@ -1328,8 +1332,7 @@ static int ovs_dp_cmd_set(struct sk_buff *skb, struct genl_info *info)
 	if (IS_ERR(dp))
 		goto unlock;
 
-	reply = ovs_dp_cmd_build_info(dp, info->snd_portid,
-				      info->snd_seq, OVS_DP_CMD_NEW);
+	reply = ovs_dp_cmd_build_info(dp, info, OVS_DP_CMD_NEW);
 	if (IS_ERR(reply)) {
 		err = PTR_ERR(reply);
 		genl_set_err(&dp_datapath_genl_family, sock_net(skb->sk), 0,
@@ -1360,8 +1363,7 @@ static int ovs_dp_cmd_get(struct sk_buff *skb, struct genl_info *info)
 		goto unlock;
 	}
 
-	reply = ovs_dp_cmd_build_info(dp, info->snd_portid,
-				      info->snd_seq, OVS_DP_CMD_NEW);
+	reply = ovs_dp_cmd_build_info(dp, info, OVS_DP_CMD_NEW);
 	if (IS_ERR(reply)) {
 		err = PTR_ERR(reply);
 		goto unlock;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH net-next 4/8] net: Export skb_zerocopy() to zerocopy from one skb to another
  2013-11-21 18:13 [PATCH net-next 0/8 v6] Open vSwitch upcall optimiziations Thomas Graf
                   ` (2 preceding siblings ...)
  2013-11-21 18:13 ` [PATCH net-next 3/8] openvswitch: Enable memory mapped Netlink i/o Thomas Graf
@ 2013-11-21 18:13 ` Thomas Graf
  2013-11-21 18:13 ` [PATCH net-next 5/8] openvswitch: Allow user space to announce ability to accept unaligned Netlink messages Thomas Graf
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Thomas Graf @ 2013-11-21 18:13 UTC (permalink / raw)
  To: jesse, davem
  Cc: dev, netdev, dborkman, ffusco, fleitner, eric.dumazet, bhutchings

Make the skb zerocopy logic written for nfnetlink queue available for
use by other modules.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
---
 include/linux/skbuff.h               |  3 ++
 net/core/skbuff.c                    | 85 ++++++++++++++++++++++++++++++++++++
 net/netfilter/nfnetlink_queue_core.c | 59 ++-----------------------
 3 files changed, 92 insertions(+), 55 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index bec1cc7..7c48e2d 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2345,6 +2345,9 @@ int skb_splice_bits(struct sk_buff *skb, unsigned int offset,
 		    struct pipe_inode_info *pipe, unsigned int len,
 		    unsigned int flags);
 void skb_copy_and_csum_dev(const struct sk_buff *skb, u8 *to);
+unsigned int skb_zerocopy_headlen(const struct sk_buff *from);
+void skb_zerocopy(struct sk_buff *to, const struct sk_buff *from,
+		  int len, int hlen);
 void skb_split(struct sk_buff *skb, struct sk_buff *skb1, const u32 len);
 int skb_shift(struct sk_buff *tgt, struct sk_buff *skb, int shiftlen);
 void skb_scrub_packet(struct sk_buff *skb, bool xnet);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 8cec1e6..b448fc0 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -2122,6 +2122,91 @@ __wsum skb_copy_and_csum_bits(const struct sk_buff *skb, int offset,
 }
 EXPORT_SYMBOL(skb_copy_and_csum_bits);
 
+ /**
+ *	skb_zerocopy_headlen - Calculate headroom needed for skb_zerocopy()
+ *	@from: source buffer
+ *
+ *	Calculates the amount of linear headroom needed in the 'to' skb passed
+ *	into skb_zerocopy().
+ */
+unsigned int
+skb_zerocopy_headlen(const struct sk_buff *from)
+{
+	unsigned int hlen = 0;
+
+	if (!from->head_frag ||
+	    skb_headlen(from) < L1_CACHE_BYTES ||
+	    skb_shinfo(from)->nr_frags >= MAX_SKB_FRAGS)
+		hlen = skb_headlen(from);
+
+	if (skb_has_frag_list(from))
+		hlen = from->len;
+
+	return hlen;
+}
+EXPORT_SYMBOL_GPL(skb_zerocopy_headlen);
+
+/**
+ *	skb_zerocopy - Zero copy skb to skb
+ *	@to: destination buffer
+ *	@source: source buffer
+ *	@len: number of bytes to copy from source buffer
+ *	@hlen: size of linear headroom in destination buffer
+ *
+ *	Copies up to `len` bytes from `from` to `to` by creating references
+ *	to the frags in the source buffer.
+ *
+ *	The `hlen` as calculated by skb_zerocopy_headlen() specifies the
+ *	headroom in the `to` buffer.
+ */
+void
+skb_zerocopy(struct sk_buff *to, const struct sk_buff *from, int len, int hlen)
+{
+	int i, j = 0;
+	int plen = 0; /* length of skb->head fragment */
+	struct page *page;
+	unsigned int offset;
+
+	BUG_ON(!from->head_frag && !hlen);
+
+	/* dont bother with small payloads */
+	if (len <= skb_tailroom(to)) {
+		skb_copy_bits(from, 0, skb_put(to, len), len);
+		return;
+	}
+
+	if (hlen) {
+		skb_copy_bits(from, 0, skb_put(to, hlen), hlen);
+		len -= hlen;
+	} else {
+		plen = min_t(int, skb_headlen(from), len);
+		if (plen) {
+			page = virt_to_head_page(from->head);
+			offset = from->data - (unsigned char *)page_address(page);
+			__skb_fill_page_desc(to, 0, page, offset, plen);
+			get_page(page);
+			j = 1;
+			len -= plen;
+		}
+	}
+
+	to->truesize += len + plen;
+	to->len += len + plen;
+	to->data_len += len + plen;
+
+	for (i = 0; i < skb_shinfo(from)->nr_frags; i++) {
+		if (!len)
+			break;
+		skb_shinfo(to)->frags[j] = skb_shinfo(from)->frags[i];
+		skb_shinfo(to)->frags[j].size = min_t(int, skb_shinfo(to)->frags[j].size, len);
+		len -= skb_shinfo(to)->frags[j].size;
+		skb_frag_ref(to, j);
+		j++;
+	}
+	skb_shinfo(to)->nr_frags = j;
+}
+EXPORT_SYMBOL_GPL(skb_zerocopy);
+
 void skb_copy_and_csum_dev(const struct sk_buff *skb, u8 *to)
 {
 	__wsum csum;
diff --git a/net/netfilter/nfnetlink_queue_core.c b/net/netfilter/nfnetlink_queue_core.c
index 21258cf..615ee12 100644
--- a/net/netfilter/nfnetlink_queue_core.c
+++ b/net/netfilter/nfnetlink_queue_core.c
@@ -235,51 +235,6 @@ nfqnl_flush(struct nfqnl_instance *queue, nfqnl_cmpfn cmpfn, unsigned long data)
 	spin_unlock_bh(&queue->lock);
 }
 
-static void
-nfqnl_zcopy(struct sk_buff *to, const struct sk_buff *from, int len, int hlen)
-{
-	int i, j = 0;
-	int plen = 0; /* length of skb->head fragment */
-	struct page *page;
-	unsigned int offset;
-
-	/* dont bother with small payloads */
-	if (len <= skb_tailroom(to)) {
-		skb_copy_bits(from, 0, skb_put(to, len), len);
-		return;
-	}
-
-	if (hlen) {
-		skb_copy_bits(from, 0, skb_put(to, hlen), hlen);
-		len -= hlen;
-	} else {
-		plen = min_t(int, skb_headlen(from), len);
-		if (plen) {
-			page = virt_to_head_page(from->head);
-			offset = from->data - (unsigned char *)page_address(page);
-			__skb_fill_page_desc(to, 0, page, offset, plen);
-			get_page(page);
-			j = 1;
-			len -= plen;
-		}
-	}
-
-	to->truesize += len + plen;
-	to->len += len + plen;
-	to->data_len += len + plen;
-
-	for (i = 0; i < skb_shinfo(from)->nr_frags; i++) {
-		if (!len)
-			break;
-		skb_shinfo(to)->frags[j] = skb_shinfo(from)->frags[i];
-		skb_shinfo(to)->frags[j].size = min_t(int, skb_shinfo(to)->frags[j].size, len);
-		len -= skb_shinfo(to)->frags[j].size;
-		skb_frag_ref(to, j);
-		j++;
-	}
-	skb_shinfo(to)->nr_frags = j;
-}
-
 static int
 nfqnl_put_packet_info(struct sk_buff *nlskb, struct sk_buff *packet,
 		      bool csum_verify)
@@ -304,7 +259,7 @@ nfqnl_build_packet_message(struct net *net, struct nfqnl_instance *queue,
 {
 	size_t size;
 	size_t data_len = 0, cap_len = 0;
-	int hlen = 0;
+	unsigned int hlen = 0;
 	struct sk_buff *skb;
 	struct nlattr *nla;
 	struct nfqnl_msg_packet_hdr *pmsg;
@@ -356,14 +311,8 @@ nfqnl_build_packet_message(struct net *net, struct nfqnl_instance *queue,
 		if (data_len > entskb->len)
 			data_len = entskb->len;
 
-		if (!entskb->head_frag ||
-		    skb_headlen(entskb) < L1_CACHE_BYTES ||
-		    skb_shinfo(entskb)->nr_frags >= MAX_SKB_FRAGS)
-			hlen = skb_headlen(entskb);
-
-		if (skb_has_frag_list(entskb))
-			hlen = entskb->len;
-		hlen = min_t(int, data_len, hlen);
+		hlen = skb_zerocopy_headlen(entskb);
+		hlen = min_t(unsigned int, hlen, data_len);
 		size += sizeof(struct nlattr) + hlen;
 		cap_len = entskb->len;
 		break;
@@ -504,7 +453,7 @@ nfqnl_build_packet_message(struct net *net, struct nfqnl_instance *queue,
 		nla->nla_type = NFQA_PAYLOAD;
 		nla->nla_len = nla_attr_size(data_len);
 
-		nfqnl_zcopy(skb, entskb, data_len, hlen);
+		skb_zerocopy(skb, entskb, data_len, hlen);
 	}
 
 	nlh->nlmsg_len = skb->len;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH net-next 5/8] openvswitch: Allow user space to announce ability to accept unaligned Netlink messages
  2013-11-21 18:13 [PATCH net-next 0/8 v6] Open vSwitch upcall optimiziations Thomas Graf
                   ` (3 preceding siblings ...)
  2013-11-21 18:13 ` [PATCH net-next 4/8] net: Export skb_zerocopy() to zerocopy from one skb to another Thomas Graf
@ 2013-11-21 18:13 ` Thomas Graf
  2013-11-21 18:13 ` [PATCH net-next 6/8] openvswitch: Allow update of dp with OVS_DP_CMD_NEW if NLM_F_REPLACE is set Thomas Graf
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Thomas Graf @ 2013-11-21 18:13 UTC (permalink / raw)
  To: jesse, davem
  Cc: dev, netdev, dborkman, ffusco, fleitner, eric.dumazet, bhutchings

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
---
 include/uapi/linux/openvswitch.h |  4 ++++
 net/openvswitch/datapath.c       | 14 ++++++++++++++
 net/openvswitch/datapath.h       |  2 ++
 3 files changed, 20 insertions(+)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index d120f9f..07ef2c3 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -75,6 +75,7 @@ enum ovs_datapath_attr {
 	OVS_DP_ATTR_UPCALL_PID,		/* Netlink PID to receive upcalls */
 	OVS_DP_ATTR_STATS,		/* struct ovs_dp_stats */
 	OVS_DP_ATTR_MEGAFLOW_STATS,	/* struct ovs_dp_megaflow_stats */
+	OVS_DP_ATTR_USER_FEATURES,	/* OVS_DP_F_*  */
 	__OVS_DP_ATTR_MAX
 };
 
@@ -106,6 +107,9 @@ struct ovs_vport_stats {
 	__u64   tx_dropped;		/* no space available in linux  */
 };
 
+/* Allow last Netlink attribute to be unaligned */
+#define OVS_DP_F_UNALIGNED	(1 << 0)
+
 /* Fixed logical ports. */
 #define OVSP_LOCAL      ((__u32)0)
 
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 0ac9cde..95d4424 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -1067,6 +1067,7 @@ static const struct genl_ops dp_flow_genl_ops[] = {
 static const struct nla_policy datapath_policy[OVS_DP_ATTR_MAX + 1] = {
 	[OVS_DP_ATTR_NAME] = { .type = NLA_NUL_STRING, .len = IFNAMSIZ - 1 },
 	[OVS_DP_ATTR_UPCALL_PID] = { .type = NLA_U32 },
+	[OVS_DP_ATTR_USER_FEATURES] = { .type = NLA_U32 },
 };
 
 static struct genl_family dp_datapath_genl_family = {
@@ -1125,6 +1126,9 @@ static int ovs_dp_cmd_fill_info(struct datapath *dp, struct sk_buff *skb,
 			&dp_megaflow_stats))
 		goto nla_put_failure;
 
+	if (nla_put_u32(skb, OVS_DP_ATTR_USER_FEATURES, dp->user_features))
+		goto nla_put_failure;
+
 	return genlmsg_end(skb, ovs_header);
 
 nla_put_failure:
@@ -1171,6 +1175,12 @@ static struct datapath *lookup_datapath(struct net *net,
 	return dp ? dp : ERR_PTR(-ENODEV);
 }
 
+static void ovs_dp_change(struct datapath *dp, struct nlattr **a)
+{
+	if (a[OVS_DP_ATTR_USER_FEATURES])
+		dp->user_features = nla_get_u32(a[OVS_DP_ATTR_USER_FEATURES]);
+}
+
 static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
 {
 	struct nlattr **a = info->attrs;
@@ -1229,6 +1239,8 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
 	parms.port_no = OVSP_LOCAL;
 	parms.upcall_portid = nla_get_u32(a[OVS_DP_ATTR_UPCALL_PID]);
 
+	ovs_dp_change(dp, a);
+
 	vport = new_vport(&parms);
 	if (IS_ERR(vport)) {
 		err = PTR_ERR(vport);
@@ -1332,6 +1344,8 @@ static int ovs_dp_cmd_set(struct sk_buff *skb, struct genl_info *info)
 	if (IS_ERR(dp))
 		goto unlock;
 
+	ovs_dp_change(dp, info->attrs);
+
 	reply = ovs_dp_cmd_build_info(dp, info, OVS_DP_CMD_NEW);
 	if (IS_ERR(reply)) {
 		err = PTR_ERR(reply);
diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h
index 4067ea4..193e2e0 100644
--- a/net/openvswitch/datapath.h
+++ b/net/openvswitch/datapath.h
@@ -88,6 +88,8 @@ struct datapath {
 	/* Network namespace ref. */
 	struct net *net;
 #endif
+
+	u32 user_features;
 };
 
 /**
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH net-next 6/8] openvswitch: Allow update of dp with OVS_DP_CMD_NEW if NLM_F_REPLACE is set
  2013-11-21 18:13 [PATCH net-next 0/8 v6] Open vSwitch upcall optimiziations Thomas Graf
                   ` (4 preceding siblings ...)
  2013-11-21 18:13 ` [PATCH net-next 5/8] openvswitch: Allow user space to announce ability to accept unaligned Netlink messages Thomas Graf
@ 2013-11-21 18:13 ` Thomas Graf
  2013-11-21 18:13 ` [PATCH net-next 7/8] openvswitch: Drop user features if old user space attempted to create datapath Thomas Graf
  2013-11-21 18:13 ` [PATCH net-next 8/8] openvswitch: Use skb_zerocopy() for upcall Thomas Graf
  7 siblings, 0 replies; 13+ messages in thread
From: Thomas Graf @ 2013-11-21 18:13 UTC (permalink / raw)
  To: jesse, davem
  Cc: dev, netdev, dborkman, ffusco, fleitner, eric.dumazet, bhutchings

Consolidates ovs_dp_cmd_new() and ovs_dp_cmd_set() to simplify
handling and avoid code duplication.

Allows user space to specify NLM_F_REPLACE with OVS_DP_CMD_NEW and
overwrite the settings such as the user features of an existing
datapath.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
---
 net/openvswitch/datapath.c | 83 +++++++++++++++++++++++-----------------------
 1 file changed, 42 insertions(+), 41 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 95d4424..3f1fb87 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -1175,13 +1175,8 @@ static struct datapath *lookup_datapath(struct net *net,
 	return dp ? dp : ERR_PTR(-ENODEV);
 }
 
-static void ovs_dp_change(struct datapath *dp, struct nlattr **a)
-{
-	if (a[OVS_DP_ATTR_USER_FEATURES])
-		dp->user_features = nla_get_u32(a[OVS_DP_ATTR_USER_FEATURES]);
-}
-
-static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
+static int ovs_dp_create_or_update(struct sk_buff *skb, struct genl_info *info,
+				   bool create)
 {
 	struct nlattr **a = info->attrs;
 	struct vport_parms parms;
@@ -1190,6 +1185,7 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
 	struct vport *vport;
 	struct ovs_net *ovs_net;
 	int err, i;
+	bool allocated = false;
 
 	err = -EINVAL;
 	if (!a[OVS_DP_ATTR_NAME] || !a[OVS_DP_ATTR_UPCALL_PID])
@@ -1197,11 +1193,26 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
 
 	ovs_lock();
 
+	dp = lookup_datapath(sock_net(skb->sk), info->userhdr, info->attrs);
+	if (!IS_ERR(dp)) {
+		if (info->nlhdr->nlmsg_flags & NLM_F_REPLACE)
+			goto update;
+
+		err = -EEXIST;
+		goto err_unlock_ovs;
+	}
+
+	if (!create) {
+		err = PTR_ERR(dp);
+		goto err_unlock_ovs;
+	}
+
 	err = -ENOMEM;
 	dp = kzalloc(sizeof(*dp), GFP_KERNEL);
 	if (dp == NULL)
 		goto err_unlock_ovs;
 
+	allocated = true;
 	ovs_dp_set_net(dp, hold_net(sock_net(skb->sk)));
 
 	/* Allocate table. */
@@ -1239,8 +1250,6 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
 	parms.port_no = OVSP_LOCAL;
 	parms.upcall_portid = nla_get_u32(a[OVS_DP_ATTR_UPCALL_PID]);
 
-	ovs_dp_change(dp, a);
-
 	vport = new_vport(&parms);
 	if (IS_ERR(vport)) {
 		err = PTR_ERR(vport);
@@ -1250,13 +1259,27 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
 		goto err_destroy_ports_array;
 	}
 
+update:
+	if (a[OVS_DP_ATTR_USER_FEATURES])
+		dp->user_features = nla_get_u32(a[OVS_DP_ATTR_USER_FEATURES]);
+
 	reply = ovs_dp_cmd_build_info(dp, info, OVS_DP_CMD_NEW);
 	err = PTR_ERR(reply);
-	if (IS_ERR(reply))
-		goto err_destroy_local_port;
+	if (IS_ERR(reply)) {
+		if (allocated)
+			goto err_destroy_local_port;
+		else {
+			err = 0;
+			genl_set_err(&dp_datapath_genl_family, sock_net(skb->sk), 0,
+				     0, err);
+			goto err_unlock_ovs;
+		}
+	}
 
-	ovs_net = net_generic(ovs_dp_get_net(dp), ovs_net_id);
-	list_add_tail_rcu(&dp->list_node, &ovs_net->dps);
+	if (allocated) {
+		ovs_net = net_generic(ovs_dp_get_net(dp), ovs_net_id);
+		list_add_tail_rcu(&dp->list_node, &ovs_net->dps);
+	}
 
 	ovs_unlock();
 
@@ -1280,6 +1303,11 @@ err:
 	return err;
 }
 
+static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
+{
+	return ovs_dp_create_or_update(skb, info, true);
+}
+
 /* Called with ovs_mutex. */
 static void __dp_destroy(struct datapath *dp)
 {
@@ -1334,34 +1362,7 @@ unlock:
 
 static int ovs_dp_cmd_set(struct sk_buff *skb, struct genl_info *info)
 {
-	struct sk_buff *reply;
-	struct datapath *dp;
-	int err;
-
-	ovs_lock();
-	dp = lookup_datapath(sock_net(skb->sk), info->userhdr, info->attrs);
-	err = PTR_ERR(dp);
-	if (IS_ERR(dp))
-		goto unlock;
-
-	ovs_dp_change(dp, info->attrs);
-
-	reply = ovs_dp_cmd_build_info(dp, info, OVS_DP_CMD_NEW);
-	if (IS_ERR(reply)) {
-		err = PTR_ERR(reply);
-		genl_set_err(&dp_datapath_genl_family, sock_net(skb->sk), 0,
-			     0, err);
-		err = 0;
-		goto unlock;
-	}
-
-	ovs_unlock();
-	ovs_notify(&dp_datapath_genl_family, reply, info);
-
-	return 0;
-unlock:
-	ovs_unlock();
-	return err;
+	return ovs_dp_create_or_update(skb, info, false);
 }
 
 static int ovs_dp_cmd_get(struct sk_buff *skb, struct genl_info *info)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH net-next 7/8] openvswitch: Drop user features if old user space attempted to create datapath
  2013-11-21 18:13 [PATCH net-next 0/8 v6] Open vSwitch upcall optimiziations Thomas Graf
                   ` (5 preceding siblings ...)
  2013-11-21 18:13 ` [PATCH net-next 6/8] openvswitch: Allow update of dp with OVS_DP_CMD_NEW if NLM_F_REPLACE is set Thomas Graf
@ 2013-11-21 18:13 ` Thomas Graf
  2013-11-21 18:23   ` Ben Hutchings
  2013-11-21 18:13 ` [PATCH net-next 8/8] openvswitch: Use skb_zerocopy() for upcall Thomas Graf
  7 siblings, 1 reply; 13+ messages in thread
From: Thomas Graf @ 2013-11-21 18:13 UTC (permalink / raw)
  To: jesse, davem
  Cc: dev, netdev, dborkman, ffusco, fleitner, eric.dumazet, bhutchings

Drop user features if an outdated user space instance that does not
understand the concept of user_features attempted to create a new
datapath.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
---
 include/uapi/linux/openvswitch.h | 11 ++++++++++-
 net/openvswitch/datapath.c       | 10 ++++++++++
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 07ef2c3..4f74c2c 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -40,7 +40,16 @@ struct ovs_header {
 
 #define OVS_DATAPATH_FAMILY  "ovs_datapath"
 #define OVS_DATAPATH_MCGROUP "ovs_datapath"
-#define OVS_DATAPATH_VERSION 0x1
+
+/**
+ * V2:
+ *   - API users are expected to provide OVS_DP_ATTR_USER_FEATURES
+ *     with NLM_F_REPLACE when creating the datapath.
+ */
+#define OVS_DATAPATH_VERSION 2
+
+/* First OVS datapath version to support features */
+#define OVS_DP_VER_FEATURES 2
 
 enum ovs_datapath_cmd {
 	OVS_DP_CMD_UNSPEC,
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 3f1fb87..72cdffb 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -1198,6 +1198,16 @@ static int ovs_dp_create_or_update(struct sk_buff *skb, struct genl_info *info,
 		if (info->nlhdr->nlmsg_flags & NLM_F_REPLACE)
 			goto update;
 
+		/* An outdated user space instance that does not understand
+		 * the concept of user_features has attempted to create a new
+		 * datapath. Drop all user features.
+		 */
+		if (info->genlhdr->version < OVS_DP_VER_FEATURES) {
+			WARN_ONCE(dp->user_features, "Dropping previously "
+				  "announced user features");
+			dp->user_features = 0;
+		}
+
 		err = -EEXIST;
 		goto err_unlock_ovs;
 	}
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH net-next 8/8] openvswitch: Use skb_zerocopy() for upcall
  2013-11-21 18:13 [PATCH net-next 0/8 v6] Open vSwitch upcall optimiziations Thomas Graf
                   ` (6 preceding siblings ...)
  2013-11-21 18:13 ` [PATCH net-next 7/8] openvswitch: Drop user features if old user space attempted to create datapath Thomas Graf
@ 2013-11-21 18:13 ` Thomas Graf
  7 siblings, 0 replies; 13+ messages in thread
From: Thomas Graf @ 2013-11-21 18:13 UTC (permalink / raw)
  To: jesse, davem
  Cc: dev, netdev, dborkman, ffusco, fleitner, eric.dumazet, bhutchings

Use of skb_zerocopy() can avoids the expensive call to memcpy()
when copying the packet data into the Netlink skb. Completes
checksum through skb_checksum_help() if needed.

Zerocopy is only performed if user space supported unaligned
Netlink messages. memory mapped netlink i/o is preferred over
zerocopy if it is set up.

Cost of upcall is significantly reduced from:
+   7.48%       vhost-8471  [k] memcpy
+   5.57%     ovs-vswitchd  [k] memcpy
+   2.81%       vhost-8471  [k] csum_partial_copy_generic

to:
+   5.72%     ovs-vswitchd  [k] memcpy
+   3.32%       vhost-5153  [k] memcpy
+   0.68%       vhost-5153  [k] skb_zerocopy

(megaflows disabled)

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
---
 net/openvswitch/datapath.c | 54 +++++++++++++++++++++++++++++++---------------
 1 file changed, 37 insertions(+), 17 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 72cdffb..237f7d7 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -108,10 +108,10 @@ int lockdep_ovsl_is_held(void)
 #endif
 
 static struct vport *new_vport(const struct vport_parms *);
-static int queue_gso_packets(struct net *, int dp_ifindex, struct sk_buff *,
-			     const struct dp_upcall_info *);
-static int queue_userspace_packet(struct net *, int dp_ifindex,
-				  struct sk_buff *,
+static int queue_gso_packets(struct datapath *, struct net *, int dp_ifindex,
+			     struct sk_buff *, const struct dp_upcall_info *);
+static int queue_userspace_packet(struct datapath *, struct net *,
+				  int dp_ifindex, struct sk_buff *,
 				  const struct dp_upcall_info *);
 
 /* Must be called with rcu_read_lock or ovs_mutex. */
@@ -292,9 +292,9 @@ int ovs_dp_upcall(struct datapath *dp, struct sk_buff *skb,
 	}
 
 	if (!skb_is_gso(skb))
-		err = queue_userspace_packet(ovs_dp_get_net(dp), dp_ifindex, skb, upcall_info);
+		err = queue_userspace_packet(dp, ovs_dp_get_net(dp), dp_ifindex, skb, upcall_info);
 	else
-		err = queue_gso_packets(ovs_dp_get_net(dp), dp_ifindex, skb, upcall_info);
+		err = queue_gso_packets(dp, ovs_dp_get_net(dp), dp_ifindex, skb, upcall_info);
 	if (err)
 		goto err;
 
@@ -310,7 +310,7 @@ err:
 	return err;
 }
 
-static int queue_gso_packets(struct net *net, int dp_ifindex,
+static int queue_gso_packets(struct datapath *dp, struct net *net, int dp_ifindex,
 			     struct sk_buff *skb,
 			     const struct dp_upcall_info *upcall_info)
 {
@@ -327,7 +327,7 @@ static int queue_gso_packets(struct net *net, int dp_ifindex,
 	/* Queue all of the segments. */
 	skb = segs;
 	do {
-		err = queue_userspace_packet(net, dp_ifindex, skb, upcall_info);
+		err = queue_userspace_packet(dp, net, dp_ifindex, skb, upcall_info);
 		if (err)
 			break;
 
@@ -381,10 +381,11 @@ static size_t key_attr_size(void)
 }
 
 static size_t upcall_msg_size(const struct sk_buff *skb,
-			      const struct nlattr *userdata)
+			      const struct nlattr *userdata,
+			      unsigned int hdrlen)
 {
 	size_t size = NLMSG_ALIGN(sizeof(struct ovs_header))
-		+ nla_total_size(skb->len) /* OVS_PACKET_ATTR_PACKET */
+		+ nla_total_size(hdrlen) /* OVS_PACKET_ATTR_PACKET */
 		+ nla_total_size(key_attr_size()); /* OVS_PACKET_ATTR_KEY */
 
 	/* OVS_PACKET_ATTR_USERDATA */
@@ -394,8 +395,8 @@ static size_t upcall_msg_size(const struct sk_buff *skb,
 	return size;
 }
 
-static int queue_userspace_packet(struct net *net, int dp_ifindex,
-				  struct sk_buff *skb,
+static int queue_userspace_packet(struct datapath *dp, struct net *net,
+				  int dp_ifindex, struct sk_buff *skb,
 				  const struct dp_upcall_info *upcall_info)
 {
 	struct ovs_header *upcall;
@@ -407,6 +408,7 @@ static int queue_userspace_packet(struct net *net, int dp_ifindex,
 		.snd_portid = upcall_info->portid,
 	};
 	size_t len;
+	unsigned int hlen;
 	int err;
 
 	if (vlan_tx_tag_present(skb)) {
@@ -427,7 +429,21 @@ static int queue_userspace_packet(struct net *net, int dp_ifindex,
 		goto out;
 	}
 
-	len = upcall_msg_size(skb, upcall_info->userdata);
+	/* Complete checksum if needed */
+	if (skb->ip_summed == CHECKSUM_PARTIAL &&
+	    (err = skb_checksum_help(skb)))
+		goto out;
+
+	/* Older versions of OVS user space enforce alignment of the last
+	 * Netlink attribute to NLA_ALIGNTO which would require extensive
+	 * padding logic. Only perform zerocopy if padding is not required.
+	 */
+	if (dp->user_features & OVS_DP_F_UNALIGNED)
+		hlen = skb_zerocopy_headlen(skb);
+	else
+		hlen = skb->len;
+
+	len = upcall_msg_size(skb, upcall_info->userdata, hlen);
 	user_skb = genlmsg_new_unicast(len, &info, GFP_ATOMIC);
 	if (!user_skb) {
 		err = -ENOMEM;
@@ -447,13 +463,17 @@ static int queue_userspace_packet(struct net *net, int dp_ifindex,
 			  nla_len(upcall_info->userdata),
 			  nla_data(upcall_info->userdata));
 
-	nla = __nla_reserve(user_skb, OVS_PACKET_ATTR_PACKET, skb->len);
+	/* Only reserve room for attribute header, packet data is added
+	 * in skb_zerocopy() */
+	if (!(nla = nla_reserve(user_skb, OVS_PACKET_ATTR_PACKET, 0)))
+		goto out;
+	nla->nla_len = nla_attr_size(skb->len);
+
+	skb_zerocopy(user_skb, skb, skb->len, hlen);
 
-	skb_copy_and_csum_dev(skb, nla_data(nla));
+	((struct nlmsghdr *) user_skb->data)->nlmsg_len = user_skb->len;
 
-	genlmsg_end(user_skb, upcall);
 	err = genlmsg_unicast(net, user_skb, upcall_info->portid);
-
 out:
 	kfree_skb(nskb);
 	return err;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next 7/8] openvswitch: Drop user features if old user space attempted to create datapath
  2013-11-21 18:13 ` [PATCH net-next 7/8] openvswitch: Drop user features if old user space attempted to create datapath Thomas Graf
@ 2013-11-21 18:23   ` Ben Hutchings
  2013-11-21 22:20     ` Thomas Graf
  0 siblings, 1 reply; 13+ messages in thread
From: Ben Hutchings @ 2013-11-21 18:23 UTC (permalink / raw)
  To: Thomas Graf
  Cc: jesse, davem, dev, netdev, dborkman, ffusco, fleitner, eric.dumazet

On Thu, 2013-11-21 at 19:13 +0100, Thomas Graf wrote:
> Drop user features if an outdated user space instance that does not
> understand the concept of user_features attempted to create a new
> datapath.
> 
> Signed-off-by: Thomas Graf <tgraf@suug.ch>
> Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
> ---
>  include/uapi/linux/openvswitch.h | 11 ++++++++++-
>  net/openvswitch/datapath.c       | 10 ++++++++++
>  2 files changed, 20 insertions(+), 1 deletion(-)
> 
> diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
> index 07ef2c3..4f74c2c 100644
> --- a/include/uapi/linux/openvswitch.h
> +++ b/include/uapi/linux/openvswitch.h
> @@ -40,7 +40,16 @@ struct ovs_header {
>  
>  #define OVS_DATAPATH_FAMILY  "ovs_datapath"
>  #define OVS_DATAPATH_MCGROUP "ovs_datapath"
> -#define OVS_DATAPATH_VERSION 0x1
> +
> +/**
> + * V2:

This is not kernel-doc format so don't use '/**'.

> + *   - API users are expected to provide OVS_DP_ATTR_USER_FEATURES
> + *     with NLM_F_REPLACE when creating the datapath.
> + */
> +#define OVS_DATAPATH_VERSION 2
> +
> +/* First OVS datapath version to support features */
> +#define OVS_DP_VER_FEATURES 2
>  
>  enum ovs_datapath_cmd {
>  	OVS_DP_CMD_UNSPEC,
> diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
> index 3f1fb87..72cdffb 100644
> --- a/net/openvswitch/datapath.c
> +++ b/net/openvswitch/datapath.c
> @@ -1198,6 +1198,16 @@ static int ovs_dp_create_or_update(struct sk_buff *skb, struct genl_info *info,
>  		if (info->nlhdr->nlmsg_flags & NLM_F_REPLACE)
>  			goto update;
>  
> +		/* An outdated user space instance that does not understand
> +		 * the concept of user_features has attempted to create a new
> +		 * datapath. Drop all user features.
> +		 */
> +		if (info->genlhdr->version < OVS_DP_VER_FEATURES) {
> +			WARN_ONCE(dp->user_features, "Dropping previously "
> +				  "announced user features");

Log messages shouldn't be split like this as it makes them harder to
find.  There should also be a newline at the end of the message.

Ben.

> +			dp->user_features = 0;
> +		}
> +
>  		err = -EEXIST;
>  		goto err_unlock_ovs;
>  	}

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next 7/8] openvswitch: Drop user features if old user space attempted to create datapath
  2013-11-21 18:23   ` Ben Hutchings
@ 2013-11-21 22:20     ` Thomas Graf
  2013-11-21 23:50       ` Jesse Gross
  0 siblings, 1 reply; 13+ messages in thread
From: Thomas Graf @ 2013-11-21 22:20 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: jesse, davem, dev, netdev, dborkman, ffusco, fleitner, eric.dumazet

On 11/21/13 at 06:23pm, Ben Hutchings wrote:
> On Thu, 2013-11-21 at 19:13 +0100, Thomas Graf wrote:
> > +
> > +/**
> > + * V2:
> 
> This is not kernel-doc format so don't use '/**'.

I was hoping kernel-doc would pick it up but it doesn't.
I'll convert it.

> > +		if (info->genlhdr->version < OVS_DP_VER_FEATURES) {
> > +			WARN_ONCE(dp->user_features, "Dropping previously "
> > +				  "announced user features");
> 
> Log messages shouldn't be split like this as it makes them harder to
> find.  There should also be a newline at the end of the message.

Right, I'll fix this up. We seem to have many of these unfixed.

Jesse, do you want a full respin or just a v2 of this patch?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH net-next 7/8] openvswitch: Drop user features if old user space attempted to create datapath
  2013-11-21 22:20     ` Thomas Graf
@ 2013-11-21 23:50       ` Jesse Gross
  0 siblings, 0 replies; 13+ messages in thread
From: Jesse Gross @ 2013-11-21 23:50 UTC (permalink / raw)
  To: Thomas Graf
  Cc: Ben Hutchings, David Miller, dev, netdev, Daniel Borkmann,
	ffusco, fleitner, Eric Dumazet

On Thu, Nov 21, 2013 at 2:20 PM, Thomas Graf <tgraf@suug.ch> wrote:
> On 11/21/13 at 06:23pm, Ben Hutchings wrote:
>> On Thu, 2013-11-21 at 19:13 +0100, Thomas Graf wrote:
>> > +
>> > +/**
>> > + * V2:
>>
>> This is not kernel-doc format so don't use '/**'.
>
> I was hoping kernel-doc would pick it up but it doesn't.
> I'll convert it.
>
>> > +           if (info->genlhdr->version < OVS_DP_VER_FEATURES) {
>> > +                   WARN_ONCE(dp->user_features, "Dropping previously "
>> > +                             "announced user features");
>>
>> Log messages shouldn't be split like this as it makes them harder to
>> find.  There should also be a newline at the end of the message.
>
> Right, I'll fix this up. We seem to have many of these unfixed.
>
> Jesse, do you want a full respin or just a v2 of this patch?

Whichever is easier is fine.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH net-next 4/8] net: Export skb_zerocopy() to zerocopy from one skb to another
  2013-11-22 16:56 [PATCH net-next 0/8 v7] Open vSwitch upcall optimiziations Thomas Graf
@ 2013-11-22 16:56 ` Thomas Graf
  0 siblings, 0 replies; 13+ messages in thread
From: Thomas Graf @ 2013-11-22 16:56 UTC (permalink / raw)
  To: jesse, davem
  Cc: dev, netdev, dborkman, ffusco, fleitner, eric.dumazet, bhutchings

Make the skb zerocopy logic written for nfnetlink queue available for
use by other modules.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
---
 include/linux/skbuff.h               |  3 ++
 net/core/skbuff.c                    | 85 ++++++++++++++++++++++++++++++++++++
 net/netfilter/nfnetlink_queue_core.c | 59 ++-----------------------
 3 files changed, 92 insertions(+), 55 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index bec1cc7..7c48e2d 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2345,6 +2345,9 @@ int skb_splice_bits(struct sk_buff *skb, unsigned int offset,
 		    struct pipe_inode_info *pipe, unsigned int len,
 		    unsigned int flags);
 void skb_copy_and_csum_dev(const struct sk_buff *skb, u8 *to);
+unsigned int skb_zerocopy_headlen(const struct sk_buff *from);
+void skb_zerocopy(struct sk_buff *to, const struct sk_buff *from,
+		  int len, int hlen);
 void skb_split(struct sk_buff *skb, struct sk_buff *skb1, const u32 len);
 int skb_shift(struct sk_buff *tgt, struct sk_buff *skb, int shiftlen);
 void skb_scrub_packet(struct sk_buff *skb, bool xnet);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 2718fed..55859cb 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -2122,6 +2122,91 @@ __wsum skb_copy_and_csum_bits(const struct sk_buff *skb, int offset,
 }
 EXPORT_SYMBOL(skb_copy_and_csum_bits);
 
+ /**
+ *	skb_zerocopy_headlen - Calculate headroom needed for skb_zerocopy()
+ *	@from: source buffer
+ *
+ *	Calculates the amount of linear headroom needed in the 'to' skb passed
+ *	into skb_zerocopy().
+ */
+unsigned int
+skb_zerocopy_headlen(const struct sk_buff *from)
+{
+	unsigned int hlen = 0;
+
+	if (!from->head_frag ||
+	    skb_headlen(from) < L1_CACHE_BYTES ||
+	    skb_shinfo(from)->nr_frags >= MAX_SKB_FRAGS)
+		hlen = skb_headlen(from);
+
+	if (skb_has_frag_list(from))
+		hlen = from->len;
+
+	return hlen;
+}
+EXPORT_SYMBOL_GPL(skb_zerocopy_headlen);
+
+/**
+ *	skb_zerocopy - Zero copy skb to skb
+ *	@to: destination buffer
+ *	@source: source buffer
+ *	@len: number of bytes to copy from source buffer
+ *	@hlen: size of linear headroom in destination buffer
+ *
+ *	Copies up to `len` bytes from `from` to `to` by creating references
+ *	to the frags in the source buffer.
+ *
+ *	The `hlen` as calculated by skb_zerocopy_headlen() specifies the
+ *	headroom in the `to` buffer.
+ */
+void
+skb_zerocopy(struct sk_buff *to, const struct sk_buff *from, int len, int hlen)
+{
+	int i, j = 0;
+	int plen = 0; /* length of skb->head fragment */
+	struct page *page;
+	unsigned int offset;
+
+	BUG_ON(!from->head_frag && !hlen);
+
+	/* dont bother with small payloads */
+	if (len <= skb_tailroom(to)) {
+		skb_copy_bits(from, 0, skb_put(to, len), len);
+		return;
+	}
+
+	if (hlen) {
+		skb_copy_bits(from, 0, skb_put(to, hlen), hlen);
+		len -= hlen;
+	} else {
+		plen = min_t(int, skb_headlen(from), len);
+		if (plen) {
+			page = virt_to_head_page(from->head);
+			offset = from->data - (unsigned char *)page_address(page);
+			__skb_fill_page_desc(to, 0, page, offset, plen);
+			get_page(page);
+			j = 1;
+			len -= plen;
+		}
+	}
+
+	to->truesize += len + plen;
+	to->len += len + plen;
+	to->data_len += len + plen;
+
+	for (i = 0; i < skb_shinfo(from)->nr_frags; i++) {
+		if (!len)
+			break;
+		skb_shinfo(to)->frags[j] = skb_shinfo(from)->frags[i];
+		skb_shinfo(to)->frags[j].size = min_t(int, skb_shinfo(to)->frags[j].size, len);
+		len -= skb_shinfo(to)->frags[j].size;
+		skb_frag_ref(to, j);
+		j++;
+	}
+	skb_shinfo(to)->nr_frags = j;
+}
+EXPORT_SYMBOL_GPL(skb_zerocopy);
+
 void skb_copy_and_csum_dev(const struct sk_buff *skb, u8 *to)
 {
 	__wsum csum;
diff --git a/net/netfilter/nfnetlink_queue_core.c b/net/netfilter/nfnetlink_queue_core.c
index 21258cf..615ee12 100644
--- a/net/netfilter/nfnetlink_queue_core.c
+++ b/net/netfilter/nfnetlink_queue_core.c
@@ -235,51 +235,6 @@ nfqnl_flush(struct nfqnl_instance *queue, nfqnl_cmpfn cmpfn, unsigned long data)
 	spin_unlock_bh(&queue->lock);
 }
 
-static void
-nfqnl_zcopy(struct sk_buff *to, const struct sk_buff *from, int len, int hlen)
-{
-	int i, j = 0;
-	int plen = 0; /* length of skb->head fragment */
-	struct page *page;
-	unsigned int offset;
-
-	/* dont bother with small payloads */
-	if (len <= skb_tailroom(to)) {
-		skb_copy_bits(from, 0, skb_put(to, len), len);
-		return;
-	}
-
-	if (hlen) {
-		skb_copy_bits(from, 0, skb_put(to, hlen), hlen);
-		len -= hlen;
-	} else {
-		plen = min_t(int, skb_headlen(from), len);
-		if (plen) {
-			page = virt_to_head_page(from->head);
-			offset = from->data - (unsigned char *)page_address(page);
-			__skb_fill_page_desc(to, 0, page, offset, plen);
-			get_page(page);
-			j = 1;
-			len -= plen;
-		}
-	}
-
-	to->truesize += len + plen;
-	to->len += len + plen;
-	to->data_len += len + plen;
-
-	for (i = 0; i < skb_shinfo(from)->nr_frags; i++) {
-		if (!len)
-			break;
-		skb_shinfo(to)->frags[j] = skb_shinfo(from)->frags[i];
-		skb_shinfo(to)->frags[j].size = min_t(int, skb_shinfo(to)->frags[j].size, len);
-		len -= skb_shinfo(to)->frags[j].size;
-		skb_frag_ref(to, j);
-		j++;
-	}
-	skb_shinfo(to)->nr_frags = j;
-}
-
 static int
 nfqnl_put_packet_info(struct sk_buff *nlskb, struct sk_buff *packet,
 		      bool csum_verify)
@@ -304,7 +259,7 @@ nfqnl_build_packet_message(struct net *net, struct nfqnl_instance *queue,
 {
 	size_t size;
 	size_t data_len = 0, cap_len = 0;
-	int hlen = 0;
+	unsigned int hlen = 0;
 	struct sk_buff *skb;
 	struct nlattr *nla;
 	struct nfqnl_msg_packet_hdr *pmsg;
@@ -356,14 +311,8 @@ nfqnl_build_packet_message(struct net *net, struct nfqnl_instance *queue,
 		if (data_len > entskb->len)
 			data_len = entskb->len;
 
-		if (!entskb->head_frag ||
-		    skb_headlen(entskb) < L1_CACHE_BYTES ||
-		    skb_shinfo(entskb)->nr_frags >= MAX_SKB_FRAGS)
-			hlen = skb_headlen(entskb);
-
-		if (skb_has_frag_list(entskb))
-			hlen = entskb->len;
-		hlen = min_t(int, data_len, hlen);
+		hlen = skb_zerocopy_headlen(entskb);
+		hlen = min_t(unsigned int, hlen, data_len);
 		size += sizeof(struct nlattr) + hlen;
 		cap_len = entskb->len;
 		break;
@@ -504,7 +453,7 @@ nfqnl_build_packet_message(struct net *net, struct nfqnl_instance *queue,
 		nla->nla_type = NFQA_PAYLOAD;
 		nla->nla_len = nla_attr_size(data_len);
 
-		nfqnl_zcopy(skb, entskb, data_len, hlen);
+		skb_zerocopy(skb, entskb, data_len, hlen);
 	}
 
 	nlh->nlmsg_len = skb->len;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2013-11-22 16:56 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-11-21 18:13 [PATCH net-next 0/8 v6] Open vSwitch upcall optimiziations Thomas Graf
     [not found] ` <cover.1385057355.git.tgraf-G/eBtMaohhA@public.gmane.org>
2013-11-21 18:13   ` [PATCH net-next 1/8] genl: Add genlmsg_new_unicast() for unicast message allocation Thomas Graf
2013-11-21 18:13 ` [PATCH net-next 2/8] netlink: Avoid netlink mmap alloc if msg size exceeds frame size Thomas Graf
2013-11-21 18:13 ` [PATCH net-next 3/8] openvswitch: Enable memory mapped Netlink i/o Thomas Graf
2013-11-21 18:13 ` [PATCH net-next 4/8] net: Export skb_zerocopy() to zerocopy from one skb to another Thomas Graf
2013-11-21 18:13 ` [PATCH net-next 5/8] openvswitch: Allow user space to announce ability to accept unaligned Netlink messages Thomas Graf
2013-11-21 18:13 ` [PATCH net-next 6/8] openvswitch: Allow update of dp with OVS_DP_CMD_NEW if NLM_F_REPLACE is set Thomas Graf
2013-11-21 18:13 ` [PATCH net-next 7/8] openvswitch: Drop user features if old user space attempted to create datapath Thomas Graf
2013-11-21 18:23   ` Ben Hutchings
2013-11-21 22:20     ` Thomas Graf
2013-11-21 23:50       ` Jesse Gross
2013-11-21 18:13 ` [PATCH net-next 8/8] openvswitch: Use skb_zerocopy() for upcall Thomas Graf
2013-11-22 16:56 [PATCH net-next 0/8 v7] Open vSwitch upcall optimiziations Thomas Graf
2013-11-22 16:56 ` [PATCH net-next 4/8] net: Export skb_zerocopy() to zerocopy from one skb to another Thomas Graf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).