All of lore.kernel.org
 help / color / mirror / Atom feed
* [GIT net-next] Open vSwitch
@ 2014-01-07  0:15 Jesse Gross
       [not found] ` <1389053776-62865-1-git-send-email-jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
  2014-01-08 14:49 ` [ovs-dev] " Zoltan Kiss
  0 siblings, 2 replies; 33+ messages in thread
From: Jesse Gross @ 2014-01-07  0:15 UTC (permalink / raw)
  To: David Miller; +Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA

Open vSwitch changes for net-next/3.14. Highlights are:
 * Performance improvements in the mechanism to get packets to userspace
   using memory mapped netlink and skb zero copy where appropriate.
 * Per-cpu flow stats in situations where flows are likely to be shared
   across CPUs. Standard flow stats are used in other situations to save
   memory and allocation time.
 * A handful of code cleanups and rationalization.

The following changes since commit 6ce4eac1f600b34f2f7f58f9cd8f0503d79e42ae:

  Linux 3.13-rc1 (2013-11-22 11:30:55 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch.git master

for you to fetch changes up to 443cd88c8a31379e95326428bbbd40af25c1d440:

  ovs: make functions local (2014-01-06 15:54:39 -0800)

----------------------------------------------------------------
Andy Zhou (1):
      openvswitch: Change ovs_flow_tbl_lookup_xx() APIs

Ben Pfaff (2):
      openvswitch: Correct comment.
      openvswitch: Shrink sw_flow_mask by 8 bytes (64-bit) or 4 bytes (32-bit).

Daniel Borkmann (1):
      net: ovs: use kfree_rcu instead of rcu_free_{sw_flow_mask_cb,acts_callback}

Jesse Gross (1):
      openvswitch: Silence RCU lockdep checks from flow lookup.

Pravin B Shelar (1):
      openvswitch: Per cpu flow stats.

Stephen Hemminger (1):
      ovs: make functions local

Thomas Graf (9):
      genl: Add genlmsg_new_unicast() for unicast message allocation
      netlink: Avoid netlink mmap alloc if msg size exceeds frame size
      openvswitch: Enable memory mapped Netlink i/o
      net: Export skb_zerocopy() to zerocopy from one skb to another
      openvswitch: Allow user space to announce ability to accept unaligned Netlink messages
      openvswitch: Drop user features if old user space attempted to create datapath
      openvswitch: Pass datapath into userspace queue functions
      openvswitch: Use skb_zerocopy() for upcall
      openvswitch: Compute checksum in skb_gso_segment() if needed

Wei Yongjun (1):
      openvswitch: remove duplicated include from flow_table.c

 include/linux/skbuff.h               |   3 +
 include/net/genetlink.h              |   4 +
 include/uapi/linux/openvswitch.h     |  14 ++-
 net/core/skbuff.c                    |  85 +++++++++++++
 net/netfilter/nfnetlink_queue_core.c |  59 +--------
 net/netlink/af_netlink.c             |   4 +
 net/netlink/genetlink.c              |  21 ++++
 net/openvswitch/datapath.c           | 231 +++++++++++++++++++----------------
 net/openvswitch/datapath.h           |   6 +-
 net/openvswitch/flow.c               |  96 +++++++++++++--
 net/openvswitch/flow.h               |  33 +++--
 net/openvswitch/flow_netlink.c       |  66 ++++++++--
 net/openvswitch/flow_netlink.h       |   1 +
 net/openvswitch/flow_table.c         |  60 ++++++---
 net/openvswitch/flow_table.h         |   6 +-
 net/openvswitch/vport.c              |   6 +-
 net/openvswitch/vport.h              |   1 -
 17 files changed, 483 insertions(+), 213 deletions(-)

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH net-next 01/17] openvswitch: Correct comment.
       [not found] ` <1389053776-62865-1-git-send-email-jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
@ 2014-01-07  0:16   ` Jesse Gross
  2014-01-07  0:16   ` [PATCH net-next 02/17] openvswitch: Shrink sw_flow_mask by 8 bytes (64-bit) or 4 bytes (32-bit) Jesse Gross
                     ` (16 subsequent siblings)
  17 siblings, 0 replies; 33+ messages in thread
From: Jesse Gross @ 2014-01-07  0:16 UTC (permalink / raw)
  To: David Miller; +Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA

From: Ben Pfaff <blp-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>

Signed-off-by: Ben Pfaff <blp-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Jesse Gross <jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
---
 net/openvswitch/flow_table.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
index e425427..f96ebd5 100644
--- a/net/openvswitch/flow_table.c
+++ b/net/openvswitch/flow_table.c
@@ -514,11 +514,7 @@ static struct sw_flow_mask *flow_mask_find(const struct flow_table *tbl,
 	return NULL;
 }
 
-/**
- * add a new mask into the mask list.
- * The caller needs to make sure that 'mask' is not the same
- * as any masks that are already on the list.
- */
+/* Add 'mask' into the mask list, if it is not already there. */
 static int flow_mask_insert(struct flow_table *tbl, struct sw_flow *flow,
 			    struct sw_flow_mask *new)
 {
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH net-next 02/17] openvswitch: Shrink sw_flow_mask by 8 bytes (64-bit) or 4 bytes (32-bit).
       [not found] ` <1389053776-62865-1-git-send-email-jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
  2014-01-07  0:16   ` [PATCH net-next 01/17] openvswitch: Correct comment Jesse Gross
@ 2014-01-07  0:16   ` Jesse Gross
  2014-01-07 21:36     ` Sergei Shtylyov
  2014-01-07  0:16   ` [PATCH net-next 03/17] openvswitch: Change ovs_flow_tbl_lookup_xx() APIs Jesse Gross
                     ` (15 subsequent siblings)
  17 siblings, 1 reply; 33+ messages in thread
From: Jesse Gross @ 2014-01-07  0:16 UTC (permalink / raw)
  To: David Miller; +Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA

From: Ben Pfaff <blp-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>

We won't normally have a ton of flow masks but using a size_t to store
values no bigger than sizeof(struct sw_flow_key) seems excessive.

This reduces sw_flow_key_range and sw_flow_mask by 4 bytes on 32-bit
systems.  On 64-bit systems it shrinks sw_flow_key_range by 12 bytes but
sw_flow_mask only by 8 bytes due to padding.

Compile tested only.

Signed-off-by: Ben Pfaff <blp-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
Acked-by: Andy Zhou <azhou-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Jesse Gross <jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
---
 net/openvswitch/flow.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/openvswitch/flow.h b/net/openvswitch/flow.h
index 1510f51..176406d 100644
--- a/net/openvswitch/flow.h
+++ b/net/openvswitch/flow.h
@@ -122,8 +122,8 @@ struct sw_flow_key {
 } __aligned(BITS_PER_LONG/8); /* Ensure that we can do comparisons as longs. */
 
 struct sw_flow_key_range {
-	size_t start;
-	size_t end;
+	unsigned short int start;
+	unsigned short int end;
 };
 
 struct sw_flow_mask {
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH net-next 03/17] openvswitch: Change ovs_flow_tbl_lookup_xx() APIs
       [not found] ` <1389053776-62865-1-git-send-email-jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
  2014-01-07  0:16   ` [PATCH net-next 01/17] openvswitch: Correct comment Jesse Gross
  2014-01-07  0:16   ` [PATCH net-next 02/17] openvswitch: Shrink sw_flow_mask by 8 bytes (64-bit) or 4 bytes (32-bit) Jesse Gross
@ 2014-01-07  0:16   ` Jesse Gross
  2014-01-07  0:16   ` [PATCH net-next 04/17] openvswitch: Silence RCU lockdep checks from flow lookup Jesse Gross
                     ` (14 subsequent siblings)
  17 siblings, 0 replies; 33+ messages in thread
From: Jesse Gross @ 2014-01-07  0:16 UTC (permalink / raw)
  To: David Miller; +Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA

From: Andy Zhou <azhou-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>

API changes only for code readability. No functional chnages.

This patch removes the underscored version. Added a new API
ovs_flow_tbl_lookup_stats() that returns the n_mask_hits.

Reported by: Ben Pfaff <blp-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
Reviewed-by: Thomas Graf <tgraf-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Andy Zhou <azhou-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Jesse Gross <jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
---
 net/openvswitch/datapath.c   | 16 ++++------------
 net/openvswitch/flow_table.c | 10 +++++++++-
 net/openvswitch/flow_table.h |  4 +++-
 3 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 6f5e1dd..fcaed98 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -234,7 +234,7 @@ void ovs_dp_process_received_packet(struct vport *p, struct sk_buff *skb)
 	}
 
 	/* Look up flow. */
-	flow = ovs_flow_tbl_lookup(&dp->table, &key, &n_mask_hit);
+	flow = ovs_flow_tbl_lookup_stats(&dp->table, &key, &n_mask_hit);
 	if (unlikely(!flow)) {
 		struct dp_upcall_info upcall;
 
@@ -751,14 +751,6 @@ static struct sk_buff *ovs_flow_cmd_build_info(struct sw_flow *flow,
 	return skb;
 }
 
-static struct sw_flow *__ovs_flow_tbl_lookup(struct flow_table *tbl,
-					      const struct sw_flow_key *key)
-{
-	u32 __always_unused n_mask_hit;
-
-	return ovs_flow_tbl_lookup(tbl, key, &n_mask_hit);
-}
-
 static int ovs_flow_cmd_new_or_set(struct sk_buff *skb, struct genl_info *info)
 {
 	struct nlattr **a = info->attrs;
@@ -809,7 +801,7 @@ static int ovs_flow_cmd_new_or_set(struct sk_buff *skb, struct genl_info *info)
 		goto err_unlock_ovs;
 
 	/* Check if this is a duplicate flow */
-	flow = __ovs_flow_tbl_lookup(&dp->table, &key);
+	flow = ovs_flow_tbl_lookup(&dp->table, &key);
 	if (!flow) {
 		/* Bail out if we're not allowed to create a new flow. */
 		error = -ENOENT;
@@ -921,7 +913,7 @@ static int ovs_flow_cmd_get(struct sk_buff *skb, struct genl_info *info)
 		goto unlock;
 	}
 
-	flow = __ovs_flow_tbl_lookup(&dp->table, &key);
+	flow = ovs_flow_tbl_lookup(&dp->table, &key);
 	if (!flow || !ovs_flow_cmp_unmasked_key(flow, &match)) {
 		err = -ENOENT;
 		goto unlock;
@@ -969,7 +961,7 @@ static int ovs_flow_cmd_del(struct sk_buff *skb, struct genl_info *info)
 	if (err)
 		goto unlock;
 
-	flow = __ovs_flow_tbl_lookup(&dp->table, &key);
+	flow = ovs_flow_tbl_lookup(&dp->table, &key);
 	if (!flow || !ovs_flow_cmp_unmasked_key(flow, &match)) {
 		err = -ENOENT;
 		goto unlock;
diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
index f96ebd5..261a54e 100644
--- a/net/openvswitch/flow_table.c
+++ b/net/openvswitch/flow_table.c
@@ -429,7 +429,7 @@ static struct sw_flow *masked_flow_lookup(struct table_instance *ti,
 	return NULL;
 }
 
-struct sw_flow *ovs_flow_tbl_lookup(struct flow_table *tbl,
+struct sw_flow *ovs_flow_tbl_lookup_stats(struct flow_table *tbl,
 				    const struct sw_flow_key *key,
 				    u32 *n_mask_hit)
 {
@@ -447,6 +447,14 @@ struct sw_flow *ovs_flow_tbl_lookup(struct flow_table *tbl,
 	return NULL;
 }
 
+struct sw_flow *ovs_flow_tbl_lookup(struct flow_table *tbl,
+				    const struct sw_flow_key *key)
+{
+	u32 __always_unused n_mask_hit;
+
+	return ovs_flow_tbl_lookup_stats(tbl, key, &n_mask_hit);
+}
+
 int ovs_flow_tbl_num_masks(const struct flow_table *table)
 {
 	struct sw_flow_mask *mask;
diff --git a/net/openvswitch/flow_table.h b/net/openvswitch/flow_table.h
index fbe45d5..f54aa82 100644
--- a/net/openvswitch/flow_table.h
+++ b/net/openvswitch/flow_table.h
@@ -69,9 +69,11 @@ void ovs_flow_tbl_remove(struct flow_table *table, struct sw_flow *flow);
 int  ovs_flow_tbl_num_masks(const struct flow_table *table);
 struct sw_flow *ovs_flow_tbl_dump_next(struct table_instance *table,
 				       u32 *bucket, u32 *idx);
-struct sw_flow *ovs_flow_tbl_lookup(struct flow_table *,
+struct sw_flow *ovs_flow_tbl_lookup_stats(struct flow_table *,
 				    const struct sw_flow_key *,
 				    u32 *n_mask_hit);
+struct sw_flow *ovs_flow_tbl_lookup(struct flow_table *,
+				    const struct sw_flow_key *);
 
 bool ovs_flow_cmp_unmasked_key(const struct sw_flow *flow,
 			       struct sw_flow_match *match);
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH net-next 04/17] openvswitch: Silence RCU lockdep checks from flow lookup.
       [not found] ` <1389053776-62865-1-git-send-email-jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
                     ` (2 preceding siblings ...)
  2014-01-07  0:16   ` [PATCH net-next 03/17] openvswitch: Change ovs_flow_tbl_lookup_xx() APIs Jesse Gross
@ 2014-01-07  0:16   ` Jesse Gross
  2014-01-07  0:16   ` [PATCH net-next 05/17] genl: Add genlmsg_new_unicast() for unicast message allocation Jesse Gross
                     ` (13 subsequent siblings)
  17 siblings, 0 replies; 33+ messages in thread
From: Jesse Gross @ 2014-01-07  0:16 UTC (permalink / raw)
  To: David Miller; +Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA

Flow lookup can happen either in packet processing context or userspace
context but it was annotated as requiring RCU read lock to be held. This
also allows OVS mutex to be held without causing warnings.

Reported-by: Justin Pettit <jpettit-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Jesse Gross <jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
Reviewed-by: Thomas Graf <tgraf-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 net/openvswitch/datapath.c   | 3 +--
 net/openvswitch/datapath.h   | 2 ++
 net/openvswitch/flow_table.c | 2 +-
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index fcaed98..0727aaa 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -701,8 +701,7 @@ static int ovs_flow_cmd_fill_info(struct sw_flow *flow, struct datapath *dp,
 	if (start) {
 		const struct sw_flow_actions *sf_acts;
 
-		sf_acts = rcu_dereference_check(flow->sf_acts,
-						lockdep_ovsl_is_held());
+		sf_acts = rcu_dereference_ovsl(flow->sf_acts);
 
 		err = ovs_nla_put_actions(sf_acts->actions,
 					  sf_acts->actions_len, skb);
diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h
index 4067ea4..ba13be4 100644
--- a/net/openvswitch/datapath.h
+++ b/net/openvswitch/datapath.h
@@ -145,6 +145,8 @@ int lockdep_ovsl_is_held(void);
 #define ASSERT_OVSL()		WARN_ON(unlikely(!lockdep_ovsl_is_held()))
 #define ovsl_dereference(p)					\
 	rcu_dereference_protected(p, lockdep_ovsl_is_held())
+#define rcu_dereference_ovsl(p)					\
+	rcu_dereference_check(p, lockdep_ovsl_is_held())
 
 static inline struct net *ovs_dp_get_net(struct datapath *dp)
 {
diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
index 261a54e..7b9cf2c 100644
--- a/net/openvswitch/flow_table.c
+++ b/net/openvswitch/flow_table.c
@@ -433,7 +433,7 @@ struct sw_flow *ovs_flow_tbl_lookup_stats(struct flow_table *tbl,
 				    const struct sw_flow_key *key,
 				    u32 *n_mask_hit)
 {
-	struct table_instance *ti = rcu_dereference(tbl->ti);
+	struct table_instance *ti = rcu_dereference_ovsl(tbl->ti);
 	struct sw_flow_mask *mask;
 	struct sw_flow *flow;
 
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH net-next 05/17] genl: Add genlmsg_new_unicast() for unicast message allocation
       [not found] ` <1389053776-62865-1-git-send-email-jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
                     ` (3 preceding siblings ...)
  2014-01-07  0:16   ` [PATCH net-next 04/17] openvswitch: Silence RCU lockdep checks from flow lookup Jesse Gross
@ 2014-01-07  0:16   ` Jesse Gross
  2014-01-07  0:16   ` [PATCH net-next 06/17] netlink: Avoid netlink mmap alloc if msg size exceeds frame size Jesse Gross
                     ` (12 subsequent siblings)
  17 siblings, 0 replies; 33+ messages in thread
From: Jesse Gross @ 2014-01-07  0:16 UTC (permalink / raw)
  To: David Miller; +Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA

From: Thomas Graf <tgraf-G/eBtMaohhA@public.gmane.org>

Allocates a new sk_buff large enough to cover the specified payload
plus required Netlink headers. Will check receiving socket for
memory mapped i/o capability and use it if enabled. Will fall back
to non-mapped skb if message size exceeds the frame size of the ring.

Signed-of-by: Thomas Graf <tgraf-G/eBtMaohhA@public.gmane.org>
Reviewed-by: Daniel Borkmann <dborkman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Jesse Gross <jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
---
 include/net/genetlink.h |  4 ++++
 net/netlink/genetlink.c | 21 +++++++++++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/include/net/genetlink.h b/include/net/genetlink.h
index 1b177ed..93695f0 100644
--- a/include/net/genetlink.h
+++ b/include/net/genetlink.h
@@ -73,6 +73,7 @@ struct genl_family {
  * @attrs: netlink attributes
  * @_net: network namespace
  * @user_ptr: user pointers
+ * @dst_sk: destination socket
  */
 struct genl_info {
 	u32			snd_seq;
@@ -85,6 +86,7 @@ struct genl_info {
 	struct net *		_net;
 #endif
 	void *			user_ptr[2];
+	struct sock *		dst_sk;
 };
 
 static inline struct net *genl_info_net(struct genl_info *info)
@@ -177,6 +179,8 @@ void genl_notify(struct genl_family *family,
 		 struct sk_buff *skb, struct net *net, u32 portid,
 		 u32 group, struct nlmsghdr *nlh, gfp_t flags);
 
+struct sk_buff *genlmsg_new_unicast(size_t payload, struct genl_info *info,
+				    gfp_t flags);
 void *genlmsg_put(struct sk_buff *skb, u32 portid, u32 seq,
 		  struct genl_family *family, int flags, u8 cmd);
 
diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
index 4518a57..85bf42e 100644
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@ -454,6 +454,26 @@ int genl_unregister_family(struct genl_family *family)
 EXPORT_SYMBOL(genl_unregister_family);
 
 /**
+ * genlmsg_new_unicast - Allocate generic netlink message for unicast
+ * @payload: size of the message payload
+ * @info: information on destination
+ * @flags: the type of memory to allocate
+ *
+ * Allocates a new sk_buff large enough to cover the specified payload
+ * plus required Netlink headers. Will check receiving socket for
+ * memory mapped i/o capability and use it if enabled. Will fall back
+ * to non-mapped skb if message size exceeds the frame size of the ring.
+ */
+struct sk_buff *genlmsg_new_unicast(size_t payload, struct genl_info *info,
+				    gfp_t flags)
+{
+	size_t len = nlmsg_total_size(genlmsg_total_size(payload));
+
+	return netlink_alloc_skb(info->dst_sk, len, info->snd_portid, flags);
+}
+EXPORT_SYMBOL_GPL(genlmsg_new_unicast);
+
+/**
  * genlmsg_put - Add generic netlink header to netlink message
  * @skb: socket buffer holding the message
  * @portid: netlink portid the message is addressed to
@@ -593,6 +613,7 @@ static int genl_family_rcv_msg(struct genl_family *family,
 	info.genlhdr = nlmsg_data(nlh);
 	info.userhdr = nlmsg_data(nlh) + GENL_HDRLEN;
 	info.attrs = attrbuf;
+	info.dst_sk = skb->sk;
 	genl_info_net_set(&info, net);
 	memset(&info.user_ptr, 0, sizeof(info.user_ptr));
 
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH net-next 06/17] netlink: Avoid netlink mmap alloc if msg size exceeds frame size
       [not found] ` <1389053776-62865-1-git-send-email-jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
                     ` (4 preceding siblings ...)
  2014-01-07  0:16   ` [PATCH net-next 05/17] genl: Add genlmsg_new_unicast() for unicast message allocation Jesse Gross
@ 2014-01-07  0:16   ` Jesse Gross
  2014-01-07  0:16   ` [PATCH net-next 07/17] openvswitch: Enable memory mapped Netlink i/o Jesse Gross
                     ` (11 subsequent siblings)
  17 siblings, 0 replies; 33+ messages in thread
From: Jesse Gross @ 2014-01-07  0:16 UTC (permalink / raw)
  To: David Miller; +Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA

From: Thomas Graf <tgraf-G/eBtMaohhA@public.gmane.org>

An insufficent ring frame size configuration can lead to an
unnecessary skb allocation for every Netlink message. Check frame
size before taking the queue lock and allocating the skb and
re-check with lock to be safe.

Signed-off-by: Thomas Graf <tgraf-G/eBtMaohhA@public.gmane.org>
Reviewed-by: Daniel Borkmann <dborkman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Jesse Gross <jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
---
 net/netlink/af_netlink.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index bca50b9..6433489 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1769,6 +1769,9 @@ struct sk_buff *netlink_alloc_skb(struct sock *ssk, unsigned int size,
 	if (ring->pg_vec == NULL)
 		goto out_put;
 
+	if (ring->frame_size - NL_MMAP_HDRLEN < size)
+		goto out_put;
+
 	skb = alloc_skb_head(gfp_mask);
 	if (skb == NULL)
 		goto err1;
@@ -1778,6 +1781,7 @@ struct sk_buff *netlink_alloc_skb(struct sock *ssk, unsigned int size,
 	if (ring->pg_vec == NULL)
 		goto out_free;
 
+	/* check again under lock */
 	maxlen = ring->frame_size - NL_MMAP_HDRLEN;
 	if (maxlen < size)
 		goto out_free;
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH net-next 07/17] openvswitch: Enable memory mapped Netlink i/o
       [not found] ` <1389053776-62865-1-git-send-email-jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
                     ` (5 preceding siblings ...)
  2014-01-07  0:16   ` [PATCH net-next 06/17] netlink: Avoid netlink mmap alloc if msg size exceeds frame size Jesse Gross
@ 2014-01-07  0:16   ` Jesse Gross
  2014-01-07  0:16   ` [PATCH net-next 08/17] openvswitch: Per cpu flow stats Jesse Gross
                     ` (10 subsequent siblings)
  17 siblings, 0 replies; 33+ messages in thread
From: Jesse Gross @ 2014-01-07  0:16 UTC (permalink / raw)
  To: David Miller; +Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA

From: Thomas Graf <tgraf-G/eBtMaohhA@public.gmane.org>

Use memory mapped Netlink i/o for all unicast openvswitch
communication if a ring has been set up.

Benchmark
  * pktgen -> ovs internal port
  * 5M pkts, 5M flows
  * 4 threads, 8 cores

Before:
Result: OK: 67418743(c67108212+d310530) usec, 5000000 (9000byte,0frags)
  74163pps 5339Mb/sec (5339736000bps) errors: 0
	+   2.98%     ovs-vswitchd  [k] copy_user_generic_string
	+   2.49%     ovs-vswitchd  [k] memcpy
	+   1.84%       kpktgend_2  [k] memcpy
	+   1.81%       kpktgend_1  [k] memcpy
	+   1.81%       kpktgend_3  [k] memcpy
	+   1.78%       kpktgend_0  [k] memcpy

After:
Result: OK: 24229690(c24127165+d102524) usec, 5000000 (9000byte,0frags)
  206358pps 14857Mb/sec (14857776000bps) errors: 0
	+   2.80%     ovs-vswitchd  [k] memcpy
	+   1.31%       kpktgend_2  [k] memcpy
	+   1.23%       kpktgend_0  [k] memcpy
	+   1.09%       kpktgend_1  [k] memcpy
	+   1.04%       kpktgend_3  [k] memcpy
	+   0.96%     ovs-vswitchd  [k] copy_user_generic_string

Signed-off-by: Thomas Graf <tgraf-G/eBtMaohhA@public.gmane.org>
Reviewed-by: Daniel Borkmann <dborkman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Jesse Gross <jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
---
 net/openvswitch/datapath.c | 56 ++++++++++++++++++++++++----------------------
 1 file changed, 29 insertions(+), 27 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 0727aaa..5da2534 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -402,6 +402,11 @@ static int queue_userspace_packet(struct net *net, int dp_ifindex,
 	struct sk_buff *nskb = NULL;
 	struct sk_buff *user_skb; /* to be queued to userspace */
 	struct nlattr *nla;
+	struct genl_info info = {
+		.dst_sk = net->genl_sock,
+		.snd_portid = upcall_info->portid,
+	};
+	size_t len;
 	int err;
 
 	if (vlan_tx_tag_present(skb)) {
@@ -422,7 +427,8 @@ static int queue_userspace_packet(struct net *net, int dp_ifindex,
 		goto out;
 	}
 
-	user_skb = genlmsg_new(upcall_msg_size(skb, upcall_info->userdata), GFP_ATOMIC);
+	len = upcall_msg_size(skb, upcall_info->userdata);
+	user_skb = genlmsg_new_unicast(len, &info, GFP_ATOMIC);
 	if (!user_skb) {
 		err = -ENOMEM;
 		goto out;
@@ -725,27 +731,30 @@ error:
 	return err;
 }
 
-static struct sk_buff *ovs_flow_cmd_alloc_info(struct sw_flow *flow)
+static struct sk_buff *ovs_flow_cmd_alloc_info(struct sw_flow *flow,
+					       struct genl_info *info)
 {
-	const struct sw_flow_actions *sf_acts;
+	size_t len;
 
-	sf_acts = ovsl_dereference(flow->sf_acts);
+	len = ovs_flow_cmd_msg_size(ovsl_dereference(flow->sf_acts));
 
-	return genlmsg_new(ovs_flow_cmd_msg_size(sf_acts), GFP_KERNEL);
+	return genlmsg_new_unicast(len, info, GFP_KERNEL);
 }
 
 static struct sk_buff *ovs_flow_cmd_build_info(struct sw_flow *flow,
 					       struct datapath *dp,
-					       u32 portid, u32 seq, u8 cmd)
+					       struct genl_info *info,
+					       u8 cmd)
 {
 	struct sk_buff *skb;
 	int retval;
 
-	skb = ovs_flow_cmd_alloc_info(flow);
+	skb = ovs_flow_cmd_alloc_info(flow, info);
 	if (!skb)
 		return ERR_PTR(-ENOMEM);
 
-	retval = ovs_flow_cmd_fill_info(flow, dp, skb, portid, seq, 0, cmd);
+	retval = ovs_flow_cmd_fill_info(flow, dp, skb, info->snd_portid,
+					info->snd_seq, 0, cmd);
 	BUG_ON(retval < 0);
 	return skb;
 }
@@ -826,8 +835,7 @@ static int ovs_flow_cmd_new_or_set(struct sk_buff *skb, struct genl_info *info)
 			goto err_flow_free;
 		}
 
-		reply = ovs_flow_cmd_build_info(flow, dp, info->snd_portid,
-						info->snd_seq, OVS_FLOW_CMD_NEW);
+		reply = ovs_flow_cmd_build_info(flow, dp, info, OVS_FLOW_CMD_NEW);
 	} else {
 		/* We found a matching flow. */
 		struct sw_flow_actions *old_acts;
@@ -855,8 +863,7 @@ static int ovs_flow_cmd_new_or_set(struct sk_buff *skb, struct genl_info *info)
 		rcu_assign_pointer(flow->sf_acts, acts);
 		ovs_nla_free_flow_actions(old_acts);
 
-		reply = ovs_flow_cmd_build_info(flow, dp, info->snd_portid,
-					       info->snd_seq, OVS_FLOW_CMD_NEW);
+		reply = ovs_flow_cmd_build_info(flow, dp, info, OVS_FLOW_CMD_NEW);
 
 		/* Clear stats. */
 		if (a[OVS_FLOW_ATTR_CLEAR]) {
@@ -918,8 +925,7 @@ static int ovs_flow_cmd_get(struct sk_buff *skb, struct genl_info *info)
 		goto unlock;
 	}
 
-	reply = ovs_flow_cmd_build_info(flow, dp, info->snd_portid,
-					info->snd_seq, OVS_FLOW_CMD_NEW);
+	reply = ovs_flow_cmd_build_info(flow, dp, info, OVS_FLOW_CMD_NEW);
 	if (IS_ERR(reply)) {
 		err = PTR_ERR(reply);
 		goto unlock;
@@ -966,7 +972,7 @@ static int ovs_flow_cmd_del(struct sk_buff *skb, struct genl_info *info)
 		goto unlock;
 	}
 
-	reply = ovs_flow_cmd_alloc_info(flow);
+	reply = ovs_flow_cmd_alloc_info(flow, info);
 	if (!reply) {
 		err = -ENOMEM;
 		goto unlock;
@@ -1118,17 +1124,17 @@ error:
 	return -EMSGSIZE;
 }
 
-static struct sk_buff *ovs_dp_cmd_build_info(struct datapath *dp, u32 portid,
-					     u32 seq, u8 cmd)
+static struct sk_buff *ovs_dp_cmd_build_info(struct datapath *dp,
+					     struct genl_info *info, u8 cmd)
 {
 	struct sk_buff *skb;
 	int retval;
 
-	skb = genlmsg_new(ovs_dp_cmd_msg_size(), GFP_KERNEL);
+	skb = genlmsg_new_unicast(ovs_dp_cmd_msg_size(), info, GFP_KERNEL);
 	if (!skb)
 		return ERR_PTR(-ENOMEM);
 
-	retval = ovs_dp_cmd_fill_info(dp, skb, portid, seq, 0, cmd);
+	retval = ovs_dp_cmd_fill_info(dp, skb, info->snd_portid, info->snd_seq, 0, cmd);
 	if (retval < 0) {
 		kfree_skb(skb);
 		return ERR_PTR(retval);
@@ -1223,8 +1229,7 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
 		goto err_destroy_ports_array;
 	}
 
-	reply = ovs_dp_cmd_build_info(dp, info->snd_portid,
-				      info->snd_seq, OVS_DP_CMD_NEW);
+	reply = ovs_dp_cmd_build_info(dp, info, OVS_DP_CMD_NEW);
 	err = PTR_ERR(reply);
 	if (IS_ERR(reply))
 		goto err_destroy_local_port;
@@ -1290,8 +1295,7 @@ static int ovs_dp_cmd_del(struct sk_buff *skb, struct genl_info *info)
 	if (IS_ERR(dp))
 		goto unlock;
 
-	reply = ovs_dp_cmd_build_info(dp, info->snd_portid,
-				      info->snd_seq, OVS_DP_CMD_DEL);
+	reply = ovs_dp_cmd_build_info(dp, info, OVS_DP_CMD_DEL);
 	err = PTR_ERR(reply);
 	if (IS_ERR(reply))
 		goto unlock;
@@ -1319,8 +1323,7 @@ static int ovs_dp_cmd_set(struct sk_buff *skb, struct genl_info *info)
 	if (IS_ERR(dp))
 		goto unlock;
 
-	reply = ovs_dp_cmd_build_info(dp, info->snd_portid,
-				      info->snd_seq, OVS_DP_CMD_NEW);
+	reply = ovs_dp_cmd_build_info(dp, info, OVS_DP_CMD_NEW);
 	if (IS_ERR(reply)) {
 		err = PTR_ERR(reply);
 		genl_set_err(&dp_datapath_genl_family, sock_net(skb->sk), 0,
@@ -1351,8 +1354,7 @@ static int ovs_dp_cmd_get(struct sk_buff *skb, struct genl_info *info)
 		goto unlock;
 	}
 
-	reply = ovs_dp_cmd_build_info(dp, info->snd_portid,
-				      info->snd_seq, OVS_DP_CMD_NEW);
+	reply = ovs_dp_cmd_build_info(dp, info, OVS_DP_CMD_NEW);
 	if (IS_ERR(reply)) {
 		err = PTR_ERR(reply);
 		goto unlock;
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH net-next 08/17] openvswitch: Per cpu flow stats.
       [not found] ` <1389053776-62865-1-git-send-email-jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
                     ` (6 preceding siblings ...)
  2014-01-07  0:16   ` [PATCH net-next 07/17] openvswitch: Enable memory mapped Netlink i/o Jesse Gross
@ 2014-01-07  0:16   ` Jesse Gross
  2014-01-07  0:16   ` [PATCH net-next 09/17] net: ovs: use kfree_rcu instead of rcu_free_{sw_flow_mask_cb, acts_callback} Jesse Gross
                     ` (9 subsequent siblings)
  17 siblings, 0 replies; 33+ messages in thread
From: Jesse Gross @ 2014-01-07  0:16 UTC (permalink / raw)
  To: David Miller; +Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA

From: Pravin B Shelar <pshelar-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>

With mega flow implementation ovs flow can be shared between
multiple CPUs which makes stats updates highly contended
operation. This patch uses per-CPU stats in cases where a flow
is likely to be shared (if there is a wildcard in the 5-tuple
and therefore likely to be spread by RSS). In other situations,
it uses the current strategy, saving memory and allocation time.

Signed-off-by: Pravin B Shelar <pshelar-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Jesse Gross <jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
---
 net/openvswitch/datapath.c     | 50 +++++++---------------
 net/openvswitch/flow.c         | 96 +++++++++++++++++++++++++++++++++++++++---
 net/openvswitch/flow.h         | 29 ++++++++++---
 net/openvswitch/flow_netlink.c | 56 ++++++++++++++++++++++--
 net/openvswitch/flow_netlink.h |  1 +
 net/openvswitch/flow_table.c   | 31 +++++++++++++-
 net/openvswitch/flow_table.h   |  2 +-
 7 files changed, 210 insertions(+), 55 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 5da2534..50d7782 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -251,9 +251,9 @@ void ovs_dp_process_received_packet(struct vport *p, struct sk_buff *skb)
 	OVS_CB(skb)->flow = flow;
 	OVS_CB(skb)->pkt_key = &key;
 
-	stats_counter = &stats->n_hit;
-	ovs_flow_used(OVS_CB(skb)->flow, skb);
+	ovs_flow_stats_update(OVS_CB(skb)->flow, skb);
 	ovs_execute_actions(dp, skb);
+	stats_counter = &stats->n_hit;
 
 out:
 	/* Update datapath statistics. */
@@ -459,14 +459,6 @@ out:
 	return err;
 }
 
-static void clear_stats(struct sw_flow *flow)
-{
-	flow->used = 0;
-	flow->tcp_flags = 0;
-	flow->packet_count = 0;
-	flow->byte_count = 0;
-}
-
 static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
 {
 	struct ovs_header *ovs_header = info->userhdr;
@@ -505,7 +497,7 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
 		packet->protocol = htons(ETH_P_802_2);
 
 	/* Build an sw_flow for sending this packet. */
-	flow = ovs_flow_alloc();
+	flow = ovs_flow_alloc(false);
 	err = PTR_ERR(flow);
 	if (IS_ERR(flow))
 		goto err_kfree_skb;
@@ -641,10 +633,10 @@ static int ovs_flow_cmd_fill_info(struct sw_flow *flow, struct datapath *dp,
 	const int skb_orig_len = skb->len;
 	struct nlattr *start;
 	struct ovs_flow_stats stats;
+	__be16 tcp_flags;
+	unsigned long used;
 	struct ovs_header *ovs_header;
 	struct nlattr *nla;
-	unsigned long used;
-	u8 tcp_flags;
 	int err;
 
 	ovs_header = genlmsg_put(skb, portid, seq, &dp_flow_genl_family, flags, cmd);
@@ -673,24 +665,17 @@ static int ovs_flow_cmd_fill_info(struct sw_flow *flow, struct datapath *dp,
 
 	nla_nest_end(skb, nla);
 
-	spin_lock_bh(&flow->lock);
-	used = flow->used;
-	stats.n_packets = flow->packet_count;
-	stats.n_bytes = flow->byte_count;
-	tcp_flags = (u8)ntohs(flow->tcp_flags);
-	spin_unlock_bh(&flow->lock);
-
+	ovs_flow_stats_get(flow, &stats, &used, &tcp_flags);
 	if (used &&
 	    nla_put_u64(skb, OVS_FLOW_ATTR_USED, ovs_flow_used_time(used)))
 		goto nla_put_failure;
 
 	if (stats.n_packets &&
-	    nla_put(skb, OVS_FLOW_ATTR_STATS,
-		    sizeof(struct ovs_flow_stats), &stats))
+	    nla_put(skb, OVS_FLOW_ATTR_STATS, sizeof(struct ovs_flow_stats), &stats))
 		goto nla_put_failure;
 
-	if (tcp_flags &&
-	    nla_put_u8(skb, OVS_FLOW_ATTR_TCP_FLAGS, tcp_flags))
+	if ((u8)ntohs(tcp_flags) &&
+	     nla_put_u8(skb, OVS_FLOW_ATTR_TCP_FLAGS, (u8)ntohs(tcp_flags)))
 		goto nla_put_failure;
 
 	/* If OVS_FLOW_ATTR_ACTIONS doesn't fit, skip dumping the actions if
@@ -770,6 +755,7 @@ static int ovs_flow_cmd_new_or_set(struct sk_buff *skb, struct genl_info *info)
 	struct datapath *dp;
 	struct sw_flow_actions *acts = NULL;
 	struct sw_flow_match match;
+	bool exact_5tuple;
 	int error;
 
 	/* Extract key. */
@@ -778,7 +764,7 @@ static int ovs_flow_cmd_new_or_set(struct sk_buff *skb, struct genl_info *info)
 		goto error;
 
 	ovs_match_init(&match, &key, &mask);
-	error = ovs_nla_get_match(&match,
+	error = ovs_nla_get_match(&match, &exact_5tuple,
 				  a[OVS_FLOW_ATTR_KEY], a[OVS_FLOW_ATTR_MASK]);
 	if (error)
 		goto error;
@@ -817,12 +803,11 @@ static int ovs_flow_cmd_new_or_set(struct sk_buff *skb, struct genl_info *info)
 			goto err_unlock_ovs;
 
 		/* Allocate flow. */
-		flow = ovs_flow_alloc();
+		flow = ovs_flow_alloc(!exact_5tuple);
 		if (IS_ERR(flow)) {
 			error = PTR_ERR(flow);
 			goto err_unlock_ovs;
 		}
-		clear_stats(flow);
 
 		flow->key = masked_key;
 		flow->unmasked_key = key;
@@ -866,11 +851,8 @@ static int ovs_flow_cmd_new_or_set(struct sk_buff *skb, struct genl_info *info)
 		reply = ovs_flow_cmd_build_info(flow, dp, info, OVS_FLOW_CMD_NEW);
 
 		/* Clear stats. */
-		if (a[OVS_FLOW_ATTR_CLEAR]) {
-			spin_lock_bh(&flow->lock);
-			clear_stats(flow);
-			spin_unlock_bh(&flow->lock);
-		}
+		if (a[OVS_FLOW_ATTR_CLEAR])
+			ovs_flow_stats_clear(flow);
 	}
 	ovs_unlock();
 
@@ -908,7 +890,7 @@ static int ovs_flow_cmd_get(struct sk_buff *skb, struct genl_info *info)
 	}
 
 	ovs_match_init(&match, &key, NULL);
-	err = ovs_nla_get_match(&match, a[OVS_FLOW_ATTR_KEY], NULL);
+	err = ovs_nla_get_match(&match, NULL, a[OVS_FLOW_ATTR_KEY], NULL);
 	if (err)
 		return err;
 
@@ -962,7 +944,7 @@ static int ovs_flow_cmd_del(struct sk_buff *skb, struct genl_info *info)
 	}
 
 	ovs_match_init(&match, &key, NULL);
-	err = ovs_nla_get_match(&match, a[OVS_FLOW_ATTR_KEY], NULL);
+	err = ovs_nla_get_match(&match, NULL, a[OVS_FLOW_ATTR_KEY], NULL);
 	if (err)
 		goto unlock;
 
diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index b409f52..16f4b46 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -35,6 +35,7 @@
 #include <linux/ip.h>
 #include <linux/ipv6.h>
 #include <linux/sctp.h>
+#include <linux/smp.h>
 #include <linux/tcp.h>
 #include <linux/udp.h>
 #include <linux/icmp.h>
@@ -60,10 +61,16 @@ u64 ovs_flow_used_time(unsigned long flow_jiffies)
 
 #define TCP_FLAGS_BE16(tp) (*(__be16 *)&tcp_flag_word(tp) & htons(0x0FFF))
 
-void ovs_flow_used(struct sw_flow *flow, struct sk_buff *skb)
+void ovs_flow_stats_update(struct sw_flow *flow, struct sk_buff *skb)
 {
+	struct flow_stats *stats;
 	__be16 tcp_flags = 0;
 
+	if (!flow->stats.is_percpu)
+		stats = flow->stats.stat;
+	else
+		stats = this_cpu_ptr(flow->stats.cpu_stats);
+
 	if ((flow->key.eth.type == htons(ETH_P_IP) ||
 	     flow->key.eth.type == htons(ETH_P_IPV6)) &&
 	    flow->key.ip.proto == IPPROTO_TCP &&
@@ -71,12 +78,87 @@ void ovs_flow_used(struct sw_flow *flow, struct sk_buff *skb)
 		tcp_flags = TCP_FLAGS_BE16(tcp_hdr(skb));
 	}
 
-	spin_lock(&flow->lock);
-	flow->used = jiffies;
-	flow->packet_count++;
-	flow->byte_count += skb->len;
-	flow->tcp_flags |= tcp_flags;
-	spin_unlock(&flow->lock);
+	spin_lock(&stats->lock);
+	stats->used = jiffies;
+	stats->packet_count++;
+	stats->byte_count += skb->len;
+	stats->tcp_flags |= tcp_flags;
+	spin_unlock(&stats->lock);
+}
+
+static void stats_read(struct flow_stats *stats,
+		       struct ovs_flow_stats *ovs_stats,
+		       unsigned long *used, __be16 *tcp_flags)
+{
+	spin_lock(&stats->lock);
+	if (time_after(stats->used, *used))
+		*used = stats->used;
+	*tcp_flags |= stats->tcp_flags;
+	ovs_stats->n_packets += stats->packet_count;
+	ovs_stats->n_bytes += stats->byte_count;
+	spin_unlock(&stats->lock);
+}
+
+void ovs_flow_stats_get(struct sw_flow *flow, struct ovs_flow_stats *ovs_stats,
+			unsigned long *used, __be16 *tcp_flags)
+{
+	int cpu, cur_cpu;
+
+	*used = 0;
+	*tcp_flags = 0;
+	memset(ovs_stats, 0, sizeof(*ovs_stats));
+
+	if (!flow->stats.is_percpu) {
+		stats_read(flow->stats.stat, ovs_stats, used, tcp_flags);
+	} else {
+		cur_cpu = get_cpu();
+		for_each_possible_cpu(cpu) {
+			struct flow_stats *stats;
+
+			if (cpu == cur_cpu)
+				local_bh_disable();
+
+			stats = per_cpu_ptr(flow->stats.cpu_stats, cpu);
+			stats_read(stats, ovs_stats, used, tcp_flags);
+
+			if (cpu == cur_cpu)
+				local_bh_enable();
+		}
+		put_cpu();
+	}
+}
+
+static void stats_reset(struct flow_stats *stats)
+{
+	spin_lock(&stats->lock);
+	stats->used = 0;
+	stats->packet_count = 0;
+	stats->byte_count = 0;
+	stats->tcp_flags = 0;
+	spin_unlock(&stats->lock);
+}
+
+void ovs_flow_stats_clear(struct sw_flow *flow)
+{
+	int cpu, cur_cpu;
+
+	if (!flow->stats.is_percpu) {
+		stats_reset(flow->stats.stat);
+	} else {
+		cur_cpu = get_cpu();
+
+		for_each_possible_cpu(cpu) {
+
+			if (cpu == cur_cpu)
+				local_bh_disable();
+
+			stats_reset(per_cpu_ptr(flow->stats.cpu_stats, cpu));
+
+			if (cpu == cur_cpu)
+				local_bh_enable();
+		}
+		put_cpu();
+	}
 }
 
 static int check_header(struct sk_buff *skb, int len)
diff --git a/net/openvswitch/flow.h b/net/openvswitch/flow.h
index 176406d..2d770e2 100644
--- a/net/openvswitch/flow.h
+++ b/net/openvswitch/flow.h
@@ -19,6 +19,7 @@
 #ifndef FLOW_H
 #define FLOW_H 1
 
+#include <linux/cache.h>
 #include <linux/kernel.h>
 #include <linux/netlink.h>
 #include <linux/openvswitch.h>
@@ -146,6 +147,22 @@ struct sw_flow_actions {
 	struct nlattr actions[];
 };
 
+struct flow_stats {
+	u64 packet_count;		/* Number of packets matched. */
+	u64 byte_count;			/* Number of bytes matched. */
+	unsigned long used;		/* Last used time (in jiffies). */
+	spinlock_t lock;		/* Lock for atomic stats update. */
+	__be16 tcp_flags;		/* Union of seen TCP flags. */
+};
+
+struct sw_flow_stats {
+	bool is_percpu;
+	union {
+		struct flow_stats *stat;
+		struct flow_stats __percpu *cpu_stats;
+	};
+};
+
 struct sw_flow {
 	struct rcu_head rcu;
 	struct hlist_node hash_node[2];
@@ -155,12 +172,7 @@ struct sw_flow {
 	struct sw_flow_key unmasked_key;
 	struct sw_flow_mask *mask;
 	struct sw_flow_actions __rcu *sf_acts;
-
-	spinlock_t lock;	/* Lock for values below. */
-	unsigned long used;	/* Last used time (in jiffies). */
-	u64 packet_count;	/* Number of packets matched. */
-	u64 byte_count;		/* Number of bytes matched. */
-	__be16 tcp_flags;	/* Union of seen TCP flags. */
+	struct sw_flow_stats stats;
 };
 
 struct arp_eth_header {
@@ -177,7 +189,10 @@ struct arp_eth_header {
 	unsigned char       ar_tip[4];		/* target IP address        */
 } __packed;
 
-void ovs_flow_used(struct sw_flow *, struct sk_buff *);
+void ovs_flow_stats_update(struct sw_flow *flow, struct sk_buff *skb);
+void ovs_flow_stats_get(struct sw_flow *flow, struct ovs_flow_stats *stats,
+			unsigned long *used, __be16 *tcp_flags);
+void ovs_flow_stats_clear(struct sw_flow *flow);
 u64 ovs_flow_used_time(unsigned long flow_jiffies);
 
 int ovs_flow_extract(struct sk_buff *, u16 in_port, struct sw_flow_key *);
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index 2bc1bc1..3ccb92f 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -266,6 +266,20 @@ static bool is_all_zero(const u8 *fp, size_t size)
 	return true;
 }
 
+static bool is_all_set(const u8 *fp, size_t size)
+{
+	int i;
+
+	if (!fp)
+		return false;
+
+	for (i = 0; i < size; i++)
+		if (fp[i] != 0xff)
+			return false;
+
+	return true;
+}
+
 static int __parse_flow_nlattrs(const struct nlattr *attr,
 				const struct nlattr *a[],
 				u64 *attrsp, bool nz)
@@ -487,8 +501,9 @@ static int metadata_from_nlattrs(struct sw_flow_match *match,  u64 *attrs,
 	return 0;
 }
 
-static int ovs_key_from_nlattrs(struct sw_flow_match *match,  u64 attrs,
-				const struct nlattr **a, bool is_mask)
+static int ovs_key_from_nlattrs(struct sw_flow_match *match,  bool *exact_5tuple,
+				u64 attrs, const struct nlattr **a,
+				bool is_mask)
 {
 	int err;
 	u64 orig_attrs = attrs;
@@ -545,6 +560,11 @@ static int ovs_key_from_nlattrs(struct sw_flow_match *match,  u64 attrs,
 		SW_FLOW_KEY_PUT(match, eth.type, htons(ETH_P_802_2), is_mask);
 	}
 
+	if (is_mask && exact_5tuple) {
+		if (match->mask->key.eth.type != htons(0xffff))
+			*exact_5tuple = false;
+	}
+
 	if (attrs & (1 << OVS_KEY_ATTR_IPV4)) {
 		const struct ovs_key_ipv4 *ipv4_key;
 
@@ -567,6 +587,13 @@ static int ovs_key_from_nlattrs(struct sw_flow_match *match,  u64 attrs,
 		SW_FLOW_KEY_PUT(match, ipv4.addr.dst,
 				ipv4_key->ipv4_dst, is_mask);
 		attrs &= ~(1 << OVS_KEY_ATTR_IPV4);
+
+		if (is_mask && exact_5tuple && *exact_5tuple) {
+			if (ipv4_key->ipv4_proto != 0xff ||
+			    ipv4_key->ipv4_src != htonl(0xffffffff) ||
+			    ipv4_key->ipv4_dst != htonl(0xffffffff))
+				*exact_5tuple = false;
+		}
 	}
 
 	if (attrs & (1 << OVS_KEY_ATTR_IPV6)) {
@@ -598,6 +625,13 @@ static int ovs_key_from_nlattrs(struct sw_flow_match *match,  u64 attrs,
 				is_mask);
 
 		attrs &= ~(1 << OVS_KEY_ATTR_IPV6);
+
+		if (is_mask && exact_5tuple && *exact_5tuple) {
+			if (ipv6_key->ipv6_proto != 0xff ||
+			    !is_all_set((u8 *)ipv6_key->ipv6_src, sizeof(match->key->ipv6.addr.src)) ||
+			    !is_all_set((u8 *)ipv6_key->ipv6_dst, sizeof(match->key->ipv6.addr.dst)))
+				*exact_5tuple = false;
+		}
 	}
 
 	if (attrs & (1 << OVS_KEY_ATTR_ARP)) {
@@ -640,6 +674,11 @@ static int ovs_key_from_nlattrs(struct sw_flow_match *match,  u64 attrs,
 					tcp_key->tcp_dst, is_mask);
 		}
 		attrs &= ~(1 << OVS_KEY_ATTR_TCP);
+
+		if (is_mask && exact_5tuple && *exact_5tuple &&
+		    (tcp_key->tcp_src != htons(0xffff) ||
+		     tcp_key->tcp_dst != htons(0xffff)))
+			*exact_5tuple = false;
 	}
 
 	if (attrs & (1 << OVS_KEY_ATTR_TCP_FLAGS)) {
@@ -671,6 +710,11 @@ static int ovs_key_from_nlattrs(struct sw_flow_match *match,  u64 attrs,
 					udp_key->udp_dst, is_mask);
 		}
 		attrs &= ~(1 << OVS_KEY_ATTR_UDP);
+
+		if (is_mask && exact_5tuple && *exact_5tuple &&
+		    (udp_key->udp_src != htons(0xffff) ||
+		     udp_key->udp_dst != htons(0xffff)))
+			*exact_5tuple = false;
 	}
 
 	if (attrs & (1 << OVS_KEY_ATTR_SCTP)) {
@@ -756,6 +800,7 @@ static void sw_flow_mask_set(struct sw_flow_mask *mask,
  * attribute specifies the mask field of the wildcarded flow.
  */
 int ovs_nla_get_match(struct sw_flow_match *match,
+		      bool *exact_5tuple,
 		      const struct nlattr *key,
 		      const struct nlattr *mask)
 {
@@ -803,10 +848,13 @@ int ovs_nla_get_match(struct sw_flow_match *match,
 		}
 	}
 
-	err = ovs_key_from_nlattrs(match, key_attrs, a, false);
+	err = ovs_key_from_nlattrs(match, NULL, key_attrs, a, false);
 	if (err)
 		return err;
 
+	if (exact_5tuple)
+		*exact_5tuple = true;
+
 	if (mask) {
 		err = parse_flow_mask_nlattrs(mask, a, &mask_attrs);
 		if (err)
@@ -844,7 +892,7 @@ int ovs_nla_get_match(struct sw_flow_match *match,
 			}
 		}
 
-		err = ovs_key_from_nlattrs(match, mask_attrs, a, true);
+		err = ovs_key_from_nlattrs(match, exact_5tuple, mask_attrs, a, true);
 		if (err)
 			return err;
 	} else {
diff --git a/net/openvswitch/flow_netlink.h b/net/openvswitch/flow_netlink.h
index 4401510..b31fbe2 100644
--- a/net/openvswitch/flow_netlink.h
+++ b/net/openvswitch/flow_netlink.h
@@ -45,6 +45,7 @@ int ovs_nla_put_flow(const struct sw_flow_key *,
 int ovs_nla_get_flow_metadata(struct sw_flow *flow,
 			      const struct nlattr *attr);
 int ovs_nla_get_match(struct sw_flow_match *match,
+		      bool *exact_5tuple,
 		      const struct nlattr *,
 		      const struct nlattr *);
 
diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
index 7b9cf2c..299ea8b 100644
--- a/net/openvswitch/flow_table.c
+++ b/net/openvswitch/flow_table.c
@@ -72,19 +72,42 @@ void ovs_flow_mask_key(struct sw_flow_key *dst, const struct sw_flow_key *src,
 		*d++ = *s++ & *m++;
 }
 
-struct sw_flow *ovs_flow_alloc(void)
+struct sw_flow *ovs_flow_alloc(bool percpu_stats)
 {
 	struct sw_flow *flow;
+	int cpu;
 
 	flow = kmem_cache_alloc(flow_cache, GFP_KERNEL);
 	if (!flow)
 		return ERR_PTR(-ENOMEM);
 
-	spin_lock_init(&flow->lock);
 	flow->sf_acts = NULL;
 	flow->mask = NULL;
 
+	flow->stats.is_percpu = percpu_stats;
+
+	if (!percpu_stats) {
+		flow->stats.stat = kzalloc(sizeof(*flow->stats.stat), GFP_KERNEL);
+		if (!flow->stats.stat)
+			goto err;
+
+		spin_lock_init(&flow->stats.stat->lock);
+	} else {
+		flow->stats.cpu_stats = alloc_percpu(struct flow_stats);
+		if (!flow->stats.cpu_stats)
+			goto err;
+
+		for_each_possible_cpu(cpu) {
+			struct flow_stats *cpu_stats;
+
+			cpu_stats = per_cpu_ptr(flow->stats.cpu_stats, cpu);
+			spin_lock_init(&cpu_stats->lock);
+		}
+	}
 	return flow;
+err:
+	kfree(flow);
+	return ERR_PTR(-ENOMEM);
 }
 
 int ovs_flow_tbl_count(struct flow_table *table)
@@ -118,6 +141,10 @@ static struct flex_array *alloc_buckets(unsigned int n_buckets)
 static void flow_free(struct sw_flow *flow)
 {
 	kfree((struct sf_flow_acts __force *)flow->sf_acts);
+	if (flow->stats.is_percpu)
+		free_percpu(flow->stats.cpu_stats);
+	else
+		kfree(flow->stats.stat);
 	kmem_cache_free(flow_cache, flow);
 }
 
diff --git a/net/openvswitch/flow_table.h b/net/openvswitch/flow_table.h
index f54aa82..1996e34 100644
--- a/net/openvswitch/flow_table.h
+++ b/net/openvswitch/flow_table.h
@@ -55,7 +55,7 @@ struct flow_table {
 int ovs_flow_init(void);
 void ovs_flow_exit(void);
 
-struct sw_flow *ovs_flow_alloc(void);
+struct sw_flow *ovs_flow_alloc(bool percpu_stats);
 void ovs_flow_free(struct sw_flow *, bool deferred);
 
 int ovs_flow_tbl_init(struct flow_table *);
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH net-next 09/17] net: ovs: use kfree_rcu instead of rcu_free_{sw_flow_mask_cb, acts_callback}
       [not found] ` <1389053776-62865-1-git-send-email-jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
                     ` (7 preceding siblings ...)
  2014-01-07  0:16   ` [PATCH net-next 08/17] openvswitch: Per cpu flow stats Jesse Gross
@ 2014-01-07  0:16   ` Jesse Gross
  2014-01-07  0:16   ` [PATCH net-next 10/17] openvswitch: remove duplicated include from flow_table.c Jesse Gross
                     ` (8 subsequent siblings)
  17 siblings, 0 replies; 33+ messages in thread
From: Jesse Gross @ 2014-01-07  0:16 UTC (permalink / raw)
  To: David Miller; +Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA

From: Daniel Borkmann <dborkman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

As we're only doing a kfree() anyway in the RCU callback, we can
simply use kfree_rcu, which does the same job, and remove the
function rcu_free_sw_flow_mask_cb() and rcu_free_acts_callback().

Signed-off-by: Daniel Borkmann <dborkman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Jesse Gross <jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
---
 net/openvswitch/flow_netlink.c | 10 +---------
 net/openvswitch/flow_table.c   |  9 +--------
 2 files changed, 2 insertions(+), 17 deletions(-)

diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index 3ccb92f..4d000ac 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -1176,19 +1176,11 @@ struct sw_flow_actions *ovs_nla_alloc_flow_actions(int size)
 	return sfa;
 }
 
-/* RCU callback used by ovs_nla_free_flow_actions. */
-static void rcu_free_acts_callback(struct rcu_head *rcu)
-{
-	struct sw_flow_actions *sf_acts = container_of(rcu,
-			struct sw_flow_actions, rcu);
-	kfree(sf_acts);
-}
-
 /* Schedules 'sf_acts' to be freed after the next RCU grace period.
  * The caller must hold rcu_read_lock for this to be sensible. */
 void ovs_nla_free_flow_actions(struct sw_flow_actions *sf_acts)
 {
-	call_rcu(&sf_acts->rcu, rcu_free_acts_callback);
+	kfree_rcu(sf_acts, rcu);
 }
 
 static struct nlattr *reserve_sfa_size(struct sw_flow_actions **sfa,
diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
index 299ea8b..099a1a9 100644
--- a/net/openvswitch/flow_table.c
+++ b/net/openvswitch/flow_table.c
@@ -155,13 +155,6 @@ static void rcu_free_flow_callback(struct rcu_head *rcu)
 	flow_free(flow);
 }
 
-static void rcu_free_sw_flow_mask_cb(struct rcu_head *rcu)
-{
-	struct sw_flow_mask *mask = container_of(rcu, struct sw_flow_mask, rcu);
-
-	kfree(mask);
-}
-
 static void flow_mask_del_ref(struct sw_flow_mask *mask, bool deferred)
 {
 	if (!mask)
@@ -173,7 +166,7 @@ static void flow_mask_del_ref(struct sw_flow_mask *mask, bool deferred)
 	if (!mask->ref_count) {
 		list_del_rcu(&mask->list);
 		if (deferred)
-			call_rcu(&mask->rcu, rcu_free_sw_flow_mask_cb);
+			kfree_rcu(mask, rcu);
 		else
 			kfree(mask);
 	}
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH net-next 10/17] openvswitch: remove duplicated include from flow_table.c
       [not found] ` <1389053776-62865-1-git-send-email-jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
                     ` (8 preceding siblings ...)
  2014-01-07  0:16   ` [PATCH net-next 09/17] net: ovs: use kfree_rcu instead of rcu_free_{sw_flow_mask_cb, acts_callback} Jesse Gross
@ 2014-01-07  0:16   ` Jesse Gross
  2014-01-07  0:16   ` [PATCH net-next 11/17] net: Export skb_zerocopy() to zerocopy from one skb to another Jesse Gross
                     ` (7 subsequent siblings)
  17 siblings, 0 replies; 33+ messages in thread
From: Jesse Gross @ 2014-01-07  0:16 UTC (permalink / raw)
  To: David Miller
  Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA, Wei Yongjun

From: Wei Yongjun <yongjun_wei-zrsr2BFq86L20UzCJQGyNP8+0UxHXcjY@public.gmane.org>

Remove duplicated include.

Signed-off-by: Wei Yongjun <yongjun_wei-zrsr2BFq86L20UzCJQGyNP8+0UxHXcjY@public.gmane.org>
Signed-off-by: Jesse Gross <jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
---
 net/openvswitch/flow_table.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
index 099a1a9..51f6d7f 100644
--- a/net/openvswitch/flow_table.c
+++ b/net/openvswitch/flow_table.c
@@ -44,8 +44,6 @@
 #include <net/ipv6.h>
 #include <net/ndisc.h>
 
-#include "datapath.h"
-
 #define TBL_MIN_BUCKETS		1024
 #define REHASH_INTERVAL		(10 * 60 * HZ)
 
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH net-next 11/17] net: Export skb_zerocopy() to zerocopy from one skb to another
       [not found] ` <1389053776-62865-1-git-send-email-jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
                     ` (9 preceding siblings ...)
  2014-01-07  0:16   ` [PATCH net-next 10/17] openvswitch: remove duplicated include from flow_table.c Jesse Gross
@ 2014-01-07  0:16   ` Jesse Gross
  2014-01-07  0:16   ` [PATCH net-next 12/17] openvswitch: Allow user space to announce ability to accept unaligned Netlink messages Jesse Gross
                     ` (6 subsequent siblings)
  17 siblings, 0 replies; 33+ messages in thread
From: Jesse Gross @ 2014-01-07  0:16 UTC (permalink / raw)
  To: David Miller; +Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA

From: Thomas Graf <tgraf-G/eBtMaohhA@public.gmane.org>

Make the skb zerocopy logic written for nfnetlink queue available for
use by other modules.

Signed-off-by: Thomas Graf <tgraf-G/eBtMaohhA@public.gmane.org>
Reviewed-by: Daniel Borkmann <dborkman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Acked-by: David S. Miller <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
Signed-off-by: Jesse Gross <jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
---
 include/linux/skbuff.h               |  3 ++
 net/core/skbuff.c                    | 85 ++++++++++++++++++++++++++++++++++++
 net/netfilter/nfnetlink_queue_core.c | 59 ++-----------------------
 3 files changed, 92 insertions(+), 55 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index bec1cc7..7c48e2d 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2345,6 +2345,9 @@ int skb_splice_bits(struct sk_buff *skb, unsigned int offset,
 		    struct pipe_inode_info *pipe, unsigned int len,
 		    unsigned int flags);
 void skb_copy_and_csum_dev(const struct sk_buff *skb, u8 *to);
+unsigned int skb_zerocopy_headlen(const struct sk_buff *from);
+void skb_zerocopy(struct sk_buff *to, const struct sk_buff *from,
+		  int len, int hlen);
 void skb_split(struct sk_buff *skb, struct sk_buff *skb1, const u32 len);
 int skb_shift(struct sk_buff *tgt, struct sk_buff *skb, int shiftlen);
 void skb_scrub_packet(struct sk_buff *skb, bool xnet);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 2718fed..55859cb 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -2122,6 +2122,91 @@ __wsum skb_copy_and_csum_bits(const struct sk_buff *skb, int offset,
 }
 EXPORT_SYMBOL(skb_copy_and_csum_bits);
 
+ /**
+ *	skb_zerocopy_headlen - Calculate headroom needed for skb_zerocopy()
+ *	@from: source buffer
+ *
+ *	Calculates the amount of linear headroom needed in the 'to' skb passed
+ *	into skb_zerocopy().
+ */
+unsigned int
+skb_zerocopy_headlen(const struct sk_buff *from)
+{
+	unsigned int hlen = 0;
+
+	if (!from->head_frag ||
+	    skb_headlen(from) < L1_CACHE_BYTES ||
+	    skb_shinfo(from)->nr_frags >= MAX_SKB_FRAGS)
+		hlen = skb_headlen(from);
+
+	if (skb_has_frag_list(from))
+		hlen = from->len;
+
+	return hlen;
+}
+EXPORT_SYMBOL_GPL(skb_zerocopy_headlen);
+
+/**
+ *	skb_zerocopy - Zero copy skb to skb
+ *	@to: destination buffer
+ *	@source: source buffer
+ *	@len: number of bytes to copy from source buffer
+ *	@hlen: size of linear headroom in destination buffer
+ *
+ *	Copies up to `len` bytes from `from` to `to` by creating references
+ *	to the frags in the source buffer.
+ *
+ *	The `hlen` as calculated by skb_zerocopy_headlen() specifies the
+ *	headroom in the `to` buffer.
+ */
+void
+skb_zerocopy(struct sk_buff *to, const struct sk_buff *from, int len, int hlen)
+{
+	int i, j = 0;
+	int plen = 0; /* length of skb->head fragment */
+	struct page *page;
+	unsigned int offset;
+
+	BUG_ON(!from->head_frag && !hlen);
+
+	/* dont bother with small payloads */
+	if (len <= skb_tailroom(to)) {
+		skb_copy_bits(from, 0, skb_put(to, len), len);
+		return;
+	}
+
+	if (hlen) {
+		skb_copy_bits(from, 0, skb_put(to, hlen), hlen);
+		len -= hlen;
+	} else {
+		plen = min_t(int, skb_headlen(from), len);
+		if (plen) {
+			page = virt_to_head_page(from->head);
+			offset = from->data - (unsigned char *)page_address(page);
+			__skb_fill_page_desc(to, 0, page, offset, plen);
+			get_page(page);
+			j = 1;
+			len -= plen;
+		}
+	}
+
+	to->truesize += len + plen;
+	to->len += len + plen;
+	to->data_len += len + plen;
+
+	for (i = 0; i < skb_shinfo(from)->nr_frags; i++) {
+		if (!len)
+			break;
+		skb_shinfo(to)->frags[j] = skb_shinfo(from)->frags[i];
+		skb_shinfo(to)->frags[j].size = min_t(int, skb_shinfo(to)->frags[j].size, len);
+		len -= skb_shinfo(to)->frags[j].size;
+		skb_frag_ref(to, j);
+		j++;
+	}
+	skb_shinfo(to)->nr_frags = j;
+}
+EXPORT_SYMBOL_GPL(skb_zerocopy);
+
 void skb_copy_and_csum_dev(const struct sk_buff *skb, u8 *to)
 {
 	__wsum csum;
diff --git a/net/netfilter/nfnetlink_queue_core.c b/net/netfilter/nfnetlink_queue_core.c
index 21258cf..615ee12 100644
--- a/net/netfilter/nfnetlink_queue_core.c
+++ b/net/netfilter/nfnetlink_queue_core.c
@@ -235,51 +235,6 @@ nfqnl_flush(struct nfqnl_instance *queue, nfqnl_cmpfn cmpfn, unsigned long data)
 	spin_unlock_bh(&queue->lock);
 }
 
-static void
-nfqnl_zcopy(struct sk_buff *to, const struct sk_buff *from, int len, int hlen)
-{
-	int i, j = 0;
-	int plen = 0; /* length of skb->head fragment */
-	struct page *page;
-	unsigned int offset;
-
-	/* dont bother with small payloads */
-	if (len <= skb_tailroom(to)) {
-		skb_copy_bits(from, 0, skb_put(to, len), len);
-		return;
-	}
-
-	if (hlen) {
-		skb_copy_bits(from, 0, skb_put(to, hlen), hlen);
-		len -= hlen;
-	} else {
-		plen = min_t(int, skb_headlen(from), len);
-		if (plen) {
-			page = virt_to_head_page(from->head);
-			offset = from->data - (unsigned char *)page_address(page);
-			__skb_fill_page_desc(to, 0, page, offset, plen);
-			get_page(page);
-			j = 1;
-			len -= plen;
-		}
-	}
-
-	to->truesize += len + plen;
-	to->len += len + plen;
-	to->data_len += len + plen;
-
-	for (i = 0; i < skb_shinfo(from)->nr_frags; i++) {
-		if (!len)
-			break;
-		skb_shinfo(to)->frags[j] = skb_shinfo(from)->frags[i];
-		skb_shinfo(to)->frags[j].size = min_t(int, skb_shinfo(to)->frags[j].size, len);
-		len -= skb_shinfo(to)->frags[j].size;
-		skb_frag_ref(to, j);
-		j++;
-	}
-	skb_shinfo(to)->nr_frags = j;
-}
-
 static int
 nfqnl_put_packet_info(struct sk_buff *nlskb, struct sk_buff *packet,
 		      bool csum_verify)
@@ -304,7 +259,7 @@ nfqnl_build_packet_message(struct net *net, struct nfqnl_instance *queue,
 {
 	size_t size;
 	size_t data_len = 0, cap_len = 0;
-	int hlen = 0;
+	unsigned int hlen = 0;
 	struct sk_buff *skb;
 	struct nlattr *nla;
 	struct nfqnl_msg_packet_hdr *pmsg;
@@ -356,14 +311,8 @@ nfqnl_build_packet_message(struct net *net, struct nfqnl_instance *queue,
 		if (data_len > entskb->len)
 			data_len = entskb->len;
 
-		if (!entskb->head_frag ||
-		    skb_headlen(entskb) < L1_CACHE_BYTES ||
-		    skb_shinfo(entskb)->nr_frags >= MAX_SKB_FRAGS)
-			hlen = skb_headlen(entskb);
-
-		if (skb_has_frag_list(entskb))
-			hlen = entskb->len;
-		hlen = min_t(int, data_len, hlen);
+		hlen = skb_zerocopy_headlen(entskb);
+		hlen = min_t(unsigned int, hlen, data_len);
 		size += sizeof(struct nlattr) + hlen;
 		cap_len = entskb->len;
 		break;
@@ -504,7 +453,7 @@ nfqnl_build_packet_message(struct net *net, struct nfqnl_instance *queue,
 		nla->nla_type = NFQA_PAYLOAD;
 		nla->nla_len = nla_attr_size(data_len);
 
-		nfqnl_zcopy(skb, entskb, data_len, hlen);
+		skb_zerocopy(skb, entskb, data_len, hlen);
 	}
 
 	nlh->nlmsg_len = skb->len;
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH net-next 12/17] openvswitch: Allow user space to announce ability to accept unaligned Netlink messages
       [not found] ` <1389053776-62865-1-git-send-email-jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
                     ` (10 preceding siblings ...)
  2014-01-07  0:16   ` [PATCH net-next 11/17] net: Export skb_zerocopy() to zerocopy from one skb to another Jesse Gross
@ 2014-01-07  0:16   ` Jesse Gross
  2014-01-07  0:16   ` [PATCH net-next 13/17] openvswitch: Drop user features if old user space attempted to create datapath Jesse Gross
                     ` (5 subsequent siblings)
  17 siblings, 0 replies; 33+ messages in thread
From: Jesse Gross @ 2014-01-07  0:16 UTC (permalink / raw)
  To: David Miller; +Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA

From: Thomas Graf <tgraf-G/eBtMaohhA@public.gmane.org>

Signed-off-by: Thomas Graf <tgraf-G/eBtMaohhA@public.gmane.org>
Reviewed-by: Daniel Borkmann <dborkman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Jesse Gross <jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
---
 include/uapi/linux/openvswitch.h |  4 ++++
 net/openvswitch/datapath.c       | 14 ++++++++++++++
 net/openvswitch/datapath.h       |  2 ++
 3 files changed, 20 insertions(+)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index d120f9f..07ef2c3 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -75,6 +75,7 @@ enum ovs_datapath_attr {
 	OVS_DP_ATTR_UPCALL_PID,		/* Netlink PID to receive upcalls */
 	OVS_DP_ATTR_STATS,		/* struct ovs_dp_stats */
 	OVS_DP_ATTR_MEGAFLOW_STATS,	/* struct ovs_dp_megaflow_stats */
+	OVS_DP_ATTR_USER_FEATURES,	/* OVS_DP_F_*  */
 	__OVS_DP_ATTR_MAX
 };
 
@@ -106,6 +107,9 @@ struct ovs_vport_stats {
 	__u64   tx_dropped;		/* no space available in linux  */
 };
 
+/* Allow last Netlink attribute to be unaligned */
+#define OVS_DP_F_UNALIGNED	(1 << 0)
+
 /* Fixed logical ports. */
 #define OVSP_LOCAL      ((__u32)0)
 
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 50d7782..6a9b0cb 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -1040,6 +1040,7 @@ static const struct genl_ops dp_flow_genl_ops[] = {
 static const struct nla_policy datapath_policy[OVS_DP_ATTR_MAX + 1] = {
 	[OVS_DP_ATTR_NAME] = { .type = NLA_NUL_STRING, .len = IFNAMSIZ - 1 },
 	[OVS_DP_ATTR_UPCALL_PID] = { .type = NLA_U32 },
+	[OVS_DP_ATTR_USER_FEATURES] = { .type = NLA_U32 },
 };
 
 static struct genl_family dp_datapath_genl_family = {
@@ -1098,6 +1099,9 @@ static int ovs_dp_cmd_fill_info(struct datapath *dp, struct sk_buff *skb,
 			&dp_megaflow_stats))
 		goto nla_put_failure;
 
+	if (nla_put_u32(skb, OVS_DP_ATTR_USER_FEATURES, dp->user_features))
+		goto nla_put_failure;
+
 	return genlmsg_end(skb, ovs_header);
 
 nla_put_failure:
@@ -1144,6 +1148,12 @@ static struct datapath *lookup_datapath(struct net *net,
 	return dp ? dp : ERR_PTR(-ENODEV);
 }
 
+static void ovs_dp_change(struct datapath *dp, struct nlattr **a)
+{
+	if (a[OVS_DP_ATTR_USER_FEATURES])
+		dp->user_features = nla_get_u32(a[OVS_DP_ATTR_USER_FEATURES]);
+}
+
 static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
 {
 	struct nlattr **a = info->attrs;
@@ -1202,6 +1212,8 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
 	parms.port_no = OVSP_LOCAL;
 	parms.upcall_portid = nla_get_u32(a[OVS_DP_ATTR_UPCALL_PID]);
 
+	ovs_dp_change(dp, a);
+
 	vport = new_vport(&parms);
 	if (IS_ERR(vport)) {
 		err = PTR_ERR(vport);
@@ -1305,6 +1317,8 @@ static int ovs_dp_cmd_set(struct sk_buff *skb, struct genl_info *info)
 	if (IS_ERR(dp))
 		goto unlock;
 
+	ovs_dp_change(dp, info->attrs);
+
 	reply = ovs_dp_cmd_build_info(dp, info, OVS_DP_CMD_NEW);
 	if (IS_ERR(reply)) {
 		err = PTR_ERR(reply);
diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h
index ba13be4..62619a4 100644
--- a/net/openvswitch/datapath.h
+++ b/net/openvswitch/datapath.h
@@ -88,6 +88,8 @@ struct datapath {
 	/* Network namespace ref. */
 	struct net *net;
 #endif
+
+	u32 user_features;
 };
 
 /**
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH net-next 13/17] openvswitch: Drop user features if old user space attempted to create datapath
       [not found] ` <1389053776-62865-1-git-send-email-jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
                     ` (11 preceding siblings ...)
  2014-01-07  0:16   ` [PATCH net-next 12/17] openvswitch: Allow user space to announce ability to accept unaligned Netlink messages Jesse Gross
@ 2014-01-07  0:16   ` Jesse Gross
  2014-01-07  0:16   ` [PATCH net-next 14/17] openvswitch: Pass datapath into userspace queue functions Jesse Gross
                     ` (4 subsequent siblings)
  17 siblings, 0 replies; 33+ messages in thread
From: Jesse Gross @ 2014-01-07  0:16 UTC (permalink / raw)
  To: David Miller; +Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA

From: Thomas Graf <tgraf-G/eBtMaohhA@public.gmane.org>

Drop user features if an outdated user space instance that does not
understand the concept of user_features attempted to create a new
datapath.

Signed-off-by: Thomas Graf <tgraf-G/eBtMaohhA@public.gmane.org>
Signed-off-by: Jesse Gross <jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
---
 include/uapi/linux/openvswitch.h | 10 +++++++++-
 net/openvswitch/datapath.c       | 21 +++++++++++++++++++++
 2 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 07ef2c3..970553c 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -40,7 +40,15 @@ struct ovs_header {
 
 #define OVS_DATAPATH_FAMILY  "ovs_datapath"
 #define OVS_DATAPATH_MCGROUP "ovs_datapath"
-#define OVS_DATAPATH_VERSION 0x1
+
+/* V2:
+ *   - API users are expected to provide OVS_DP_ATTR_USER_FEATURES
+ *     when creating the datapath.
+ */
+#define OVS_DATAPATH_VERSION 2
+
+/* First OVS datapath version to support features */
+#define OVS_DP_VER_FEATURES 2
 
 enum ovs_datapath_cmd {
 	OVS_DP_CMD_UNSPEC,
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 6a9b0cb..497b2fd 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -1148,6 +1148,18 @@ static struct datapath *lookup_datapath(struct net *net,
 	return dp ? dp : ERR_PTR(-ENODEV);
 }
 
+static void ovs_dp_reset_user_features(struct sk_buff *skb, struct genl_info *info)
+{
+	struct datapath *dp;
+
+	dp = lookup_datapath(sock_net(skb->sk), info->userhdr, info->attrs);
+	if (!dp)
+		return;
+
+	WARN(dp->user_features, "Dropping previously announced user features\n");
+	dp->user_features = 0;
+}
+
 static void ovs_dp_change(struct datapath *dp, struct nlattr **a)
 {
 	if (a[OVS_DP_ATTR_USER_FEATURES])
@@ -1220,6 +1232,15 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
 		if (err == -EBUSY)
 			err = -EEXIST;
 
+		if (err == -EEXIST) {
+			/* An outdated user space instance that does not understand
+			 * the concept of user_features has attempted to create a new
+			 * datapath and is likely to reuse it. Drop all user features.
+			 */
+			if (info->genlhdr->version < OVS_DP_VER_FEATURES)
+				ovs_dp_reset_user_features(skb, info);
+		}
+
 		goto err_destroy_ports_array;
 	}
 
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH net-next 14/17] openvswitch: Pass datapath into userspace queue functions
       [not found] ` <1389053776-62865-1-git-send-email-jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
                     ` (12 preceding siblings ...)
  2014-01-07  0:16   ` [PATCH net-next 13/17] openvswitch: Drop user features if old user space attempted to create datapath Jesse Gross
@ 2014-01-07  0:16   ` Jesse Gross
  2014-01-07  0:16   ` [PATCH net-next 15/17] openvswitch: Use skb_zerocopy() for upcall Jesse Gross
                     ` (3 subsequent siblings)
  17 siblings, 0 replies; 33+ messages in thread
From: Jesse Gross @ 2014-01-07  0:16 UTC (permalink / raw)
  To: David Miller; +Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA

From: Thomas Graf <tgraf-G/eBtMaohhA@public.gmane.org>

Allows removing the net and dp_ifindex argument and simplify the
code.

Signed-off-by: Thomas Graf <tgraf-G/eBtMaohhA@public.gmane.org>
Signed-off-by: Jesse Gross <jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
---
 net/openvswitch/datapath.c | 34 ++++++++++++++--------------------
 1 file changed, 14 insertions(+), 20 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 497b2fd..235acae 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -108,10 +108,9 @@ int lockdep_ovsl_is_held(void)
 #endif
 
 static struct vport *new_vport(const struct vport_parms *);
-static int queue_gso_packets(struct net *, int dp_ifindex, struct sk_buff *,
+static int queue_gso_packets(struct datapath *dp, struct sk_buff *,
 			     const struct dp_upcall_info *);
-static int queue_userspace_packet(struct net *, int dp_ifindex,
-				  struct sk_buff *,
+static int queue_userspace_packet(struct datapath *dp, struct sk_buff *,
 				  const struct dp_upcall_info *);
 
 /* Must be called with rcu_read_lock or ovs_mutex. */
@@ -277,7 +276,6 @@ int ovs_dp_upcall(struct datapath *dp, struct sk_buff *skb,
 		  const struct dp_upcall_info *upcall_info)
 {
 	struct dp_stats_percpu *stats;
-	int dp_ifindex;
 	int err;
 
 	if (upcall_info->portid == 0) {
@@ -285,16 +283,10 @@ int ovs_dp_upcall(struct datapath *dp, struct sk_buff *skb,
 		goto err;
 	}
 
-	dp_ifindex = get_dpifindex(dp);
-	if (!dp_ifindex) {
-		err = -ENODEV;
-		goto err;
-	}
-
 	if (!skb_is_gso(skb))
-		err = queue_userspace_packet(ovs_dp_get_net(dp), dp_ifindex, skb, upcall_info);
+		err = queue_userspace_packet(dp, skb, upcall_info);
 	else
-		err = queue_gso_packets(ovs_dp_get_net(dp), dp_ifindex, skb, upcall_info);
+		err = queue_gso_packets(dp, skb, upcall_info);
 	if (err)
 		goto err;
 
@@ -310,8 +302,7 @@ err:
 	return err;
 }
 
-static int queue_gso_packets(struct net *net, int dp_ifindex,
-			     struct sk_buff *skb,
+static int queue_gso_packets(struct datapath *dp, struct sk_buff *skb,
 			     const struct dp_upcall_info *upcall_info)
 {
 	unsigned short gso_type = skb_shinfo(skb)->gso_type;
@@ -327,7 +318,7 @@ static int queue_gso_packets(struct net *net, int dp_ifindex,
 	/* Queue all of the segments. */
 	skb = segs;
 	do {
-		err = queue_userspace_packet(net, dp_ifindex, skb, upcall_info);
+		err = queue_userspace_packet(dp, skb, upcall_info);
 		if (err)
 			break;
 
@@ -394,8 +385,7 @@ static size_t upcall_msg_size(const struct sk_buff *skb,
 	return size;
 }
 
-static int queue_userspace_packet(struct net *net, int dp_ifindex,
-				  struct sk_buff *skb,
+static int queue_userspace_packet(struct datapath *dp, struct sk_buff *skb,
 				  const struct dp_upcall_info *upcall_info)
 {
 	struct ovs_header *upcall;
@@ -403,11 +393,15 @@ static int queue_userspace_packet(struct net *net, int dp_ifindex,
 	struct sk_buff *user_skb; /* to be queued to userspace */
 	struct nlattr *nla;
 	struct genl_info info = {
-		.dst_sk = net->genl_sock,
+		.dst_sk = ovs_dp_get_net(dp)->genl_sock,
 		.snd_portid = upcall_info->portid,
 	};
 	size_t len;
-	int err;
+	int err, dp_ifindex;
+
+	dp_ifindex = get_dpifindex(dp);
+	if (!dp_ifindex)
+		return -ENODEV;
 
 	if (vlan_tx_tag_present(skb)) {
 		nskb = skb_clone(skb, GFP_ATOMIC);
@@ -452,7 +446,7 @@ static int queue_userspace_packet(struct net *net, int dp_ifindex,
 	skb_copy_and_csum_dev(skb, nla_data(nla));
 
 	genlmsg_end(user_skb, upcall);
-	err = genlmsg_unicast(net, user_skb, upcall_info->portid);
+	err = genlmsg_unicast(ovs_dp_get_net(dp), user_skb, upcall_info->portid);
 
 out:
 	kfree_skb(nskb);
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH net-next 15/17] openvswitch: Use skb_zerocopy() for upcall
       [not found] ` <1389053776-62865-1-git-send-email-jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
                     ` (13 preceding siblings ...)
  2014-01-07  0:16   ` [PATCH net-next 14/17] openvswitch: Pass datapath into userspace queue functions Jesse Gross
@ 2014-01-07  0:16   ` Jesse Gross
  2014-01-07  0:16   ` [PATCH net-next 16/17] openvswitch: Compute checksum in skb_gso_segment() if needed Jesse Gross
                     ` (2 subsequent siblings)
  17 siblings, 0 replies; 33+ messages in thread
From: Jesse Gross @ 2014-01-07  0:16 UTC (permalink / raw)
  To: David Miller; +Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA

From: Thomas Graf <tgraf-G/eBtMaohhA@public.gmane.org>

Use of skb_zerocopy() can avoid the expensive call to memcpy()
when copying the packet data into the Netlink skb. Completes
checksum through skb_checksum_help() if not already done in
GSO segmentation.

Zerocopy is only performed if user space supported unaligned
Netlink messages. memory mapped netlink i/o is preferred over
zerocopy if it is set up.

Cost of upcall is significantly reduced from:
+   7.48%       vhost-8471  [k] memcpy
+   5.57%     ovs-vswitchd  [k] memcpy
+   2.81%       vhost-8471  [k] csum_partial_copy_generic

to:
+   5.72%     ovs-vswitchd  [k] memcpy
+   3.32%       vhost-5153  [k] memcpy
+   0.68%       vhost-5153  [k] skb_zerocopy

(megaflows disabled)

Signed-off-by: Thomas Graf <tgraf-G/eBtMaohhA@public.gmane.org>
Signed-off-by: Jesse Gross <jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
---
 net/openvswitch/datapath.c | 37 +++++++++++++++++++++++++++++--------
 1 file changed, 29 insertions(+), 8 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 235acae..8557834 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -371,11 +371,11 @@ static size_t key_attr_size(void)
 		+ nla_total_size(28); /* OVS_KEY_ATTR_ND */
 }
 
-static size_t upcall_msg_size(const struct sk_buff *skb,
-			      const struct nlattr *userdata)
+static size_t upcall_msg_size(const struct nlattr *userdata,
+			      unsigned int hdrlen)
 {
 	size_t size = NLMSG_ALIGN(sizeof(struct ovs_header))
-		+ nla_total_size(skb->len) /* OVS_PACKET_ATTR_PACKET */
+		+ nla_total_size(hdrlen) /* OVS_PACKET_ATTR_PACKET */
 		+ nla_total_size(key_attr_size()); /* OVS_PACKET_ATTR_KEY */
 
 	/* OVS_PACKET_ATTR_USERDATA */
@@ -397,6 +397,7 @@ static int queue_userspace_packet(struct datapath *dp, struct sk_buff *skb,
 		.snd_portid = upcall_info->portid,
 	};
 	size_t len;
+	unsigned int hlen;
 	int err, dp_ifindex;
 
 	dp_ifindex = get_dpifindex(dp);
@@ -421,7 +422,21 @@ static int queue_userspace_packet(struct datapath *dp, struct sk_buff *skb,
 		goto out;
 	}
 
-	len = upcall_msg_size(skb, upcall_info->userdata);
+	/* Complete checksum if needed */
+	if (skb->ip_summed == CHECKSUM_PARTIAL &&
+	    (err = skb_checksum_help(skb)))
+		goto out;
+
+	/* Older versions of OVS user space enforce alignment of the last
+	 * Netlink attribute to NLA_ALIGNTO which would require extensive
+	 * padding logic. Only perform zerocopy if padding is not required.
+	 */
+	if (dp->user_features & OVS_DP_F_UNALIGNED)
+		hlen = skb_zerocopy_headlen(skb);
+	else
+		hlen = skb->len;
+
+	len = upcall_msg_size(upcall_info->userdata, hlen);
 	user_skb = genlmsg_new_unicast(len, &info, GFP_ATOMIC);
 	if (!user_skb) {
 		err = -ENOMEM;
@@ -441,13 +456,19 @@ static int queue_userspace_packet(struct datapath *dp, struct sk_buff *skb,
 			  nla_len(upcall_info->userdata),
 			  nla_data(upcall_info->userdata));
 
-	nla = __nla_reserve(user_skb, OVS_PACKET_ATTR_PACKET, skb->len);
+	/* Only reserve room for attribute header, packet data is added
+	 * in skb_zerocopy() */
+	if (!(nla = nla_reserve(user_skb, OVS_PACKET_ATTR_PACKET, 0))) {
+		err = -ENOBUFS;
+		goto out;
+	}
+	nla->nla_len = nla_attr_size(skb->len);
 
-	skb_copy_and_csum_dev(skb, nla_data(nla));
+	skb_zerocopy(user_skb, skb, skb->len, hlen);
 
-	genlmsg_end(user_skb, upcall);
-	err = genlmsg_unicast(ovs_dp_get_net(dp), user_skb, upcall_info->portid);
+	((struct nlmsghdr *) user_skb->data)->nlmsg_len = user_skb->len;
 
+	err = genlmsg_unicast(ovs_dp_get_net(dp), user_skb, upcall_info->portid);
 out:
 	kfree_skb(nskb);
 	return err;
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH net-next 16/17] openvswitch: Compute checksum in skb_gso_segment() if needed
       [not found] ` <1389053776-62865-1-git-send-email-jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
                     ` (14 preceding siblings ...)
  2014-01-07  0:16   ` [PATCH net-next 15/17] openvswitch: Use skb_zerocopy() for upcall Jesse Gross
@ 2014-01-07  0:16   ` Jesse Gross
  2014-01-07  0:16   ` [PATCH net-next 17/17] ovs: make functions local Jesse Gross
  2014-01-07  0:49   ` [GIT net-next] Open vSwitch David Miller
  17 siblings, 0 replies; 33+ messages in thread
From: Jesse Gross @ 2014-01-07  0:16 UTC (permalink / raw)
  To: David Miller; +Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA

From: Thomas Graf <tgraf-G/eBtMaohhA@public.gmane.org>

The copy & csum optimization is no longer present with zerocopy
enabled. Compute the checksum in skb_gso_segment() directly by
dropping the HW CSUM capability from the features passed in.

Signed-off-by: Thomas Graf <tgraf-G/eBtMaohhA@public.gmane.org>
Signed-off-by: Jesse Gross <jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
---
 net/openvswitch/datapath.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 8557834..61ae3b8 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -311,7 +311,7 @@ static int queue_gso_packets(struct datapath *dp, struct sk_buff *skb,
 	struct sk_buff *segs, *nskb;
 	int err;
 
-	segs = __skb_gso_segment(skb, NETIF_F_SG | NETIF_F_HW_CSUM, false);
+	segs = __skb_gso_segment(skb, NETIF_F_SG, false);
 	if (IS_ERR(segs))
 		return PTR_ERR(segs);
 
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH net-next 17/17] ovs: make functions local
       [not found] ` <1389053776-62865-1-git-send-email-jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
                     ` (15 preceding siblings ...)
  2014-01-07  0:16   ` [PATCH net-next 16/17] openvswitch: Compute checksum in skb_gso_segment() if needed Jesse Gross
@ 2014-01-07  0:16   ` Jesse Gross
  2014-01-07  0:49   ` [GIT net-next] Open vSwitch David Miller
  17 siblings, 0 replies; 33+ messages in thread
From: Jesse Gross @ 2014-01-07  0:16 UTC (permalink / raw)
  To: David Miller
  Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA,
	Stephen Hemminger

From: Stephen Hemminger <stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ@public.gmane.org>

Several functions and datastructures could be local
Found with 'make namespacecheck'

Signed-off-by: Stephen Hemminger <stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ@public.gmane.org>
Signed-off-by: Jesse Gross <jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
---
 net/openvswitch/datapath.c | 4 ++--
 net/openvswitch/datapath.h | 2 --
 net/openvswitch/vport.c    | 6 +++++-
 net/openvswitch/vport.h    | 1 -
 4 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 61ae3b8..df46928 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -132,7 +132,7 @@ static struct datapath *get_dp(struct net *net, int dp_ifindex)
 }
 
 /* Must be called with rcu_read_lock or ovs_mutex. */
-const char *ovs_dp_name(const struct datapath *dp)
+static const char *ovs_dp_name(const struct datapath *dp)
 {
 	struct vport *vport = ovs_vport_ovsl_rcu(dp, OVSP_LOCAL);
 	return vport->ops->get_name(vport);
@@ -1466,7 +1466,7 @@ struct genl_family dp_vport_genl_family = {
 	.parallel_ops = true,
 };
 
-struct genl_multicast_group ovs_dp_vport_multicast_group = {
+static struct genl_multicast_group ovs_dp_vport_multicast_group = {
 	.name = OVS_VPORT_MCGROUP
 };
 
diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h
index 62619a4..6be9fbb 100644
--- a/net/openvswitch/datapath.h
+++ b/net/openvswitch/datapath.h
@@ -182,14 +182,12 @@ static inline struct vport *ovs_vport_ovsl(const struct datapath *dp, int port_n
 
 extern struct notifier_block ovs_dp_device_notifier;
 extern struct genl_family dp_vport_genl_family;
-extern struct genl_multicast_group ovs_dp_vport_multicast_group;
 
 void ovs_dp_process_received_packet(struct vport *, struct sk_buff *);
 void ovs_dp_detach_port(struct vport *);
 int ovs_dp_upcall(struct datapath *, struct sk_buff *,
 		  const struct dp_upcall_info *);
 
-const char *ovs_dp_name(const struct datapath *dp);
 struct sk_buff *ovs_vport_cmd_build_info(struct vport *, u32 pid, u32 seq,
 					 u8 cmd);
 
diff --git a/net/openvswitch/vport.c b/net/openvswitch/vport.c
index d830a95f..f28ff7e 100644
--- a/net/openvswitch/vport.c
+++ b/net/openvswitch/vport.c
@@ -33,6 +33,9 @@
 #include "vport.h"
 #include "vport-internal_dev.h"
 
+static void ovs_vport_record_error(struct vport *,
+				   enum vport_err_type err_type);
+
 /* List of statically compiled vport implementations.  Don't forget to also
  * add yours to the list at the bottom of vport.h. */
 static const struct vport_ops *vport_ops_list[] = {
@@ -396,7 +399,8 @@ int ovs_vport_send(struct vport *vport, struct sk_buff *skb)
  * If using the vport generic stats layer indicate that an error of the given
  * type has occurred.
  */
-void ovs_vport_record_error(struct vport *vport, enum vport_err_type err_type)
+static void ovs_vport_record_error(struct vport *vport,
+				   enum vport_err_type err_type)
 {
 	spin_lock(&vport->stats_lock);
 
diff --git a/net/openvswitch/vport.h b/net/openvswitch/vport.h
index 1a9fbce..92137dd 100644
--- a/net/openvswitch/vport.h
+++ b/net/openvswitch/vport.h
@@ -192,7 +192,6 @@ static inline struct vport *vport_from_priv(const void *priv)
 
 void ovs_vport_receive(struct vport *, struct sk_buff *,
 		       struct ovs_key_ipv4_tunnel *);
-void ovs_vport_record_error(struct vport *, enum vport_err_type err_type);
 
 /* List of statically compiled vport implementations.  Don't forget to also
  * add yours to the list at the top of vport.c. */
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [GIT net-next] Open vSwitch
       [not found] ` <1389053776-62865-1-git-send-email-jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
                     ` (16 preceding siblings ...)
  2014-01-07  0:16   ` [PATCH net-next 17/17] ovs: make functions local Jesse Gross
@ 2014-01-07  0:49   ` David Miller
  17 siblings, 0 replies; 33+ messages in thread
From: David Miller @ 2014-01-07  0:49 UTC (permalink / raw)
  To: jesse-l0M0P4e3n4LQT0dZR+AlfA
  Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA

From: Jesse Gross <jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
Date: Mon,  6 Jan 2014 16:15:59 -0800

> Open vSwitch changes for net-next/3.14. Highlights are:
>  * Performance improvements in the mechanism to get packets to userspace
>    using memory mapped netlink and skb zero copy where appropriate.
>  * Per-cpu flow stats in situations where flows are likely to be shared
>    across CPUs. Standard flow stats are used in other situations to save
>    memory and allocation time.
>  * A handful of code cleanups and rationalization.

Lots of good stuff in here, pulled, thanks Jesse.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next 02/17] openvswitch: Shrink sw_flow_mask by 8 bytes (64-bit) or 4 bytes (32-bit).
  2014-01-07  0:16   ` [PATCH net-next 02/17] openvswitch: Shrink sw_flow_mask by 8 bytes (64-bit) or 4 bytes (32-bit) Jesse Gross
@ 2014-01-07 21:36     ` Sergei Shtylyov
  0 siblings, 0 replies; 33+ messages in thread
From: Sergei Shtylyov @ 2014-01-07 21:36 UTC (permalink / raw)
  To: Jesse Gross, David Miller, Ben Pfaff; +Cc: netdev, dev, Andy Zhou

On 07.01.2014 4:16, Jesse Gross wrote:

> From: Ben Pfaff <blp@nicira.com>

> We won't normally have a ton of flow masks but using a size_t to store
> values no bigger than sizeof(struct sw_flow_key) seems excessive.

> This reduces sw_flow_key_range and sw_flow_mask by 4 bytes on 32-bit
> systems.  On 64-bit systems it shrinks sw_flow_key_range by 12 bytes but
> sw_flow_mask only by 8 bytes due to padding.

> Compile tested only.

> Signed-off-by: Ben Pfaff <blp@nicira.com>
> Acked-by: Andy Zhou <azhou@nicira.com>
> Signed-off-by: Jesse Gross <jesse@nicira.com>
> ---
>   net/openvswitch/flow.h | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)

> diff --git a/net/openvswitch/flow.h b/net/openvswitch/flow.h
> index 1510f51..176406d 100644
> --- a/net/openvswitch/flow.h
> +++ b/net/openvswitch/flow.h
> @@ -122,8 +122,8 @@ struct sw_flow_key {
>   } __aligned(BITS_PER_LONG/8); /* Ensure that we can do comparisons as longs. */
>
>   struct sw_flow_key_range {
> -	size_t start;
> -	size_t end;
> +	unsigned short int start;
> +	unsigned short int end;

    *short int* seems somewhat ambiguous, no?

WBR, Sergei

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [ovs-dev] [GIT net-next] Open vSwitch
  2014-01-07  0:15 [GIT net-next] Open vSwitch Jesse Gross
       [not found] ` <1389053776-62865-1-git-send-email-jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
@ 2014-01-08 14:49 ` Zoltan Kiss
       [not found]   ` <52CD657F.7080806-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org>
  1 sibling, 1 reply; 33+ messages in thread
From: Zoltan Kiss @ 2014-01-08 14:49 UTC (permalink / raw)
  To: Jesse Gross, David Miller; +Cc: dev, netdev

Hi,

I've tried the latest net-next on a Xenserver install with 1.9.3 
userspace, and it seems this patch series broke it (at least after 
reverting that locally it works now). I haven't went too far yet 
checking what's the problem, but it seems the xenbrX device doesn't 
really receive too much of the traffic coming through the NIC. Is it 
expected?

Regards,

Zoli

On 07/01/14 00:15, Jesse Gross wrote:
> Open vSwitch changes for net-next/3.14. Highlights are:
>   * Performance improvements in the mechanism to get packets to userspace
>     using memory mapped netlink and skb zero copy where appropriate.
>   * Per-cpu flow stats in situations where flows are likely to be shared
>     across CPUs. Standard flow stats are used in other situations to save
>     memory and allocation time.
>   * A handful of code cleanups and rationalization.
>
> The following changes since commit 6ce4eac1f600b34f2f7f58f9cd8f0503d79e42ae:
>
>    Linux 3.13-rc1 (2013-11-22 11:30:55 -0800)
>
> are available in the git repository at:
>
>    git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch.git master
>
> for you to fetch changes up to 443cd88c8a31379e95326428bbbd40af25c1d440:
>
>    ovs: make functions local (2014-01-06 15:54:39 -0800)
>
> ----------------------------------------------------------------
> Andy Zhou (1):
>        openvswitch: Change ovs_flow_tbl_lookup_xx() APIs
>
> Ben Pfaff (2):
>        openvswitch: Correct comment.
>        openvswitch: Shrink sw_flow_mask by 8 bytes (64-bit) or 4 bytes (32-bit).
>
> Daniel Borkmann (1):
>        net: ovs: use kfree_rcu instead of rcu_free_{sw_flow_mask_cb,acts_callback}
>
> Jesse Gross (1):
>        openvswitch: Silence RCU lockdep checks from flow lookup.
>
> Pravin B Shelar (1):
>        openvswitch: Per cpu flow stats.
>
> Stephen Hemminger (1):
>        ovs: make functions local
>
> Thomas Graf (9):
>        genl: Add genlmsg_new_unicast() for unicast message allocation
>        netlink: Avoid netlink mmap alloc if msg size exceeds frame size
>        openvswitch: Enable memory mapped Netlink i/o
>        net: Export skb_zerocopy() to zerocopy from one skb to another
>        openvswitch: Allow user space to announce ability to accept unaligned Netlink messages
>        openvswitch: Drop user features if old user space attempted to create datapath
>        openvswitch: Pass datapath into userspace queue functions
>        openvswitch: Use skb_zerocopy() for upcall
>        openvswitch: Compute checksum in skb_gso_segment() if needed
>
> Wei Yongjun (1):
>        openvswitch: remove duplicated include from flow_table.c
>
>   include/linux/skbuff.h               |   3 +
>   include/net/genetlink.h              |   4 +
>   include/uapi/linux/openvswitch.h     |  14 ++-
>   net/core/skbuff.c                    |  85 +++++++++++++
>   net/netfilter/nfnetlink_queue_core.c |  59 +--------
>   net/netlink/af_netlink.c             |   4 +
>   net/netlink/genetlink.c              |  21 ++++
>   net/openvswitch/datapath.c           | 231 +++++++++++++++++++----------------
>   net/openvswitch/datapath.h           |   6 +-
>   net/openvswitch/flow.c               |  96 +++++++++++++--
>   net/openvswitch/flow.h               |  33 +++--
>   net/openvswitch/flow_netlink.c       |  66 ++++++++--
>   net/openvswitch/flow_netlink.h       |   1 +
>   net/openvswitch/flow_table.c         |  60 ++++++---
>   net/openvswitch/flow_table.h         |   6 +-
>   net/openvswitch/vport.c              |   6 +-
>   net/openvswitch/vport.h              |   1 -
>   17 files changed, 483 insertions(+), 213 deletions(-)
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [GIT net-next] Open vSwitch
       [not found]   ` <52CD657F.7080806-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org>
@ 2014-01-08 15:10     ` Jesse Gross
  2014-01-13 18:04       ` [ovs-dev] " Zoltan Kiss
  0 siblings, 1 reply; 33+ messages in thread
From: Jesse Gross @ 2014-01-08 15:10 UTC (permalink / raw)
  To: Zoltan Kiss; +Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev, David Miller

On Wed, Jan 8, 2014 at 9:49 AM, Zoltan Kiss <zoltan.kiss-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org> wrote:
> Hi,
>
> I've tried the latest net-next on a Xenserver install with 1.9.3 userspace,
> and it seems this patch series broke it (at least after reverting that
> locally it works now). I haven't went too far yet checking what's the
> problem, but it seems the xenbrX device doesn't really receive too much of
> the traffic coming through the NIC. Is it expected?

What do you mean by doesn't receive too much traffic? What does it get?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [ovs-dev] [GIT net-next] Open vSwitch
  2014-01-08 15:10     ` Jesse Gross
@ 2014-01-13 18:04       ` Zoltan Kiss
       [not found]         ` <52D42A9E.1030805-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 33+ messages in thread
From: Zoltan Kiss @ 2014-01-13 18:04 UTC (permalink / raw)
  To: Jesse Gross; +Cc: David Miller, dev, netdev

On 08/01/14 15:10, Jesse Gross wrote:
> On Wed, Jan 8, 2014 at 9:49 AM, Zoltan Kiss <zoltan.kiss@citrix.com> wrote:
>> Hi,
>>
>> I've tried the latest net-next on a Xenserver install with 1.9.3 userspace,
>> and it seems this patch series broke it (at least after reverting that
>> locally it works now). I haven't went too far yet checking what's the
>> problem, but it seems the xenbrX device doesn't really receive too much of
>> the traffic coming through the NIC. Is it expected?
>
> What do you mean by doesn't receive too much traffic? What does it get?
>

Sorry for the vague error description, now I had more time to look into 
this. I think the problem boils down to this:

Jan 13 17:55:07 localhost ovs-vswitchd: 
07890|netlink_socket|DBG|nl_sock_recv__ (Success): nl(len:274, 
type=29(ovs_packet), flags=0, seq=0, pid=0,genl(cmd=1,version=1)
Jan 13 17:55:07 localhost ovs-vswitchd: 07891|netlink|DBG|attributes 
followed by garbage
Jan 13 17:55:07 localhost ovs-vswitchd: 07892|dpif|WARN|system@xenbr0: 
recv failed (Invalid argument)

That's just keep repeating. I'm keep looking.

Zoli

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [GIT net-next] Open vSwitch
       [not found]         ` <52D42A9E.1030805-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org>
@ 2014-01-14  0:31           ` Zoltan Kiss
       [not found]             ` <52D4857C.7020902-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 33+ messages in thread
From: Zoltan Kiss @ 2014-01-14  0:31 UTC (permalink / raw)
  To: Jesse Gross; +Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev, David Miller

On 13/01/14 18:04, Zoltan Kiss wrote:
> On 08/01/14 15:10, Jesse Gross wrote:
>> On Wed, Jan 8, 2014 at 9:49 AM, Zoltan Kiss <zoltan.kiss-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org> 
>> wrote:
>>> Hi,
>>>
>>> I've tried the latest net-next on a Xenserver install with 1.9.3 
>>> userspace,
>>> and it seems this patch series broke it (at least after reverting that
>>> locally it works now). I haven't went too far yet checking what's the
>>> problem, but it seems the xenbrX device doesn't really receive too 
>>> much of
>>> the traffic coming through the NIC. Is it expected?
>>
>> What do you mean by doesn't receive too much traffic? What does it get?
>>
>
> Sorry for the vague error description, now I had more time to look 
> into this. I think the problem boils down to this:
>
> Jan 13 17:55:07 localhost ovs-vswitchd: 
> 07890|netlink_socket|DBG|nl_sock_recv__ (Success): nl(len:274, 
> type=29(ovs_packet), flags=0, seq=0, pid=0,genl(cmd=1,version=1)
> Jan 13 17:55:07 localhost ovs-vswitchd: 07891|netlink|DBG|attributes 
> followed by garbage
> Jan 13 17:55:07 localhost ovs-vswitchd: 07892|dpif|WARN|system@xenbr0: 
> recv failed (Invalid argument)
>
> That's just keep repeating. I'm keep looking.

Now I reverted these top 3 commits:

ovs: make functions local
openvswitch: Compute checksum in skb_gso_segment() if needed
openvswitch: Use skb_zerocopy() for upcall

And it works. I guess the last one causing the problem. Might be an 
important factor, I'm using 32 bit Dom0.

Zoli

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [GIT net-next] Open vSwitch
       [not found]             ` <52D4857C.7020902-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org>
@ 2014-01-14  1:30               ` Jesse Gross
       [not found]                 ` <CAEP_g=8nG6AHV9Y+5=48nPhkf5Oe=mG8EiyaKSqN4omnmGhv4A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 33+ messages in thread
From: Jesse Gross @ 2014-01-14  1:30 UTC (permalink / raw)
  To: Zoltan Kiss; +Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev, David Miller

On Mon, Jan 13, 2014 at 4:31 PM, Zoltan Kiss <zoltan.kiss-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org> wrote:
> On 13/01/14 18:04, Zoltan Kiss wrote:
>>
>> On 08/01/14 15:10, Jesse Gross wrote:
>>>
>>> On Wed, Jan 8, 2014 at 9:49 AM, Zoltan Kiss <zoltan.kiss-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org>
>>> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I've tried the latest net-next on a Xenserver install with 1.9.3
>>>> userspace,
>>>> and it seems this patch series broke it (at least after reverting that
>>>> locally it works now). I haven't went too far yet checking what's the
>>>> problem, but it seems the xenbrX device doesn't really receive too much
>>>> of
>>>> the traffic coming through the NIC. Is it expected?
>>>
>>>
>>> What do you mean by doesn't receive too much traffic? What does it get?
>>>
>>
>> Sorry for the vague error description, now I had more time to look into
>> this. I think the problem boils down to this:
>>
>> Jan 13 17:55:07 localhost ovs-vswitchd:
>> 07890|netlink_socket|DBG|nl_sock_recv__ (Success): nl(len:274,
>> type=29(ovs_packet), flags=0, seq=0, pid=0,genl(cmd=1,version=1)
>> Jan 13 17:55:07 localhost ovs-vswitchd: 07891|netlink|DBG|attributes
>> followed by garbage
>> Jan 13 17:55:07 localhost ovs-vswitchd: 07892|dpif|WARN|system@xenbr0:
>> recv failed (Invalid argument)
>>
>> That's just keep repeating. I'm keep looking.
>
>
> Now I reverted these top 3 commits:
>
> ovs: make functions local
>
> openvswitch: Compute checksum in skb_gso_segment() if needed
> openvswitch: Use skb_zerocopy() for upcall
>
> And it works. I guess the last one causing the problem. Might be an
> important factor, I'm using 32 bit Dom0.

I think you're probably right. Thomas - can you take a look?

We shouldn't be doing any zerocopy in this situation but it looks to
me like we don't do any padding at all, even in situations where we
are copying the data.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [GIT net-next] Open vSwitch
       [not found]                 ` <CAEP_g=8nG6AHV9Y+5=48nPhkf5Oe=mG8EiyaKSqN4omnmGhv4A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2014-01-14  9:46                   ` Thomas Graf
  2014-01-14 12:31                     ` [ovs-dev] " Zoltan Kiss
  0 siblings, 1 reply; 33+ messages in thread
From: Thomas Graf @ 2014-01-14  9:46 UTC (permalink / raw)
  To: Jesse Gross, Zoltan Kiss; +Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev, David Miller

On 01/14/2014 02:30 AM, Jesse Gross wrote:
>> And it works. I guess the last one causing the problem. Might be an
>> important factor, I'm using 32 bit Dom0.
>
> I think you're probably right. Thomas - can you take a look?
>
> We shouldn't be doing any zerocopy in this situation but it looks to
> me like we don't do any padding at all, even in situations where we
> are copying the data.

I'm looking into this now. The zerocopy method should only be attempted
if user space has announced the ability to received unaligned messages.

@Zoltan: I assume you are using an unmodified OVS 1.9.3?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [ovs-dev] [GIT net-next] Open vSwitch
  2014-01-14  9:46                   ` Thomas Graf
@ 2014-01-14 12:31                     ` Zoltan Kiss
  2014-01-14 16:16                       ` [PATCH net-next] openvswitch: Pad OVS_PACKET_ATTR_PACKET if linear copy was performed Thomas Graf
  0 siblings, 1 reply; 33+ messages in thread
From: Zoltan Kiss @ 2014-01-14 12:31 UTC (permalink / raw)
  To: Thomas Graf, Jesse Gross; +Cc: David Miller, dev, netdev

On 14/01/14 09:46, Thomas Graf wrote:
> On 01/14/2014 02:30 AM, Jesse Gross wrote:
>>> And it works. I guess the last one causing the problem. Might be an
>>> important factor, I'm using 32 bit Dom0.
>>
>> I think you're probably right. Thomas - can you take a look?
>>
>> We shouldn't be doing any zerocopy in this situation but it looks to
>> me like we don't do any padding at all, even in situations where we
>> are copying the data.
>
> I'm looking into this now. The zerocopy method should only be attempted
> if user space has announced the ability to received unaligned messages.
>
> @Zoltan: I assume you are using an unmodified OVS 1.9.3?
>
Well, there are a few changes, but none of them seems to be significant. 
Most of them are just making OVS work with our build system. Here is the 
patchqueue:

https://github.com/xenserver/openvswitch.pg/tree/master/master

See the series file for the order of the patches.

Zoli

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH net-next] openvswitch: Pad OVS_PACKET_ATTR_PACKET if linear copy was performed
  2014-01-14 12:31                     ` [ovs-dev] " Zoltan Kiss
@ 2014-01-14 16:16                       ` Thomas Graf
  2014-01-14 16:19                         ` Thomas Graf
  0 siblings, 1 reply; 33+ messages in thread
From: Thomas Graf @ 2014-01-14 16:16 UTC (permalink / raw)
  To: Zoltan Kiss; +Cc: Thomas Graf, Jesse Gross, David Miller, dev, netdev

While the zerocopy method is correctly omitted if user space
does not support unaligned Netlink messages. The attribute is
still not padded correctly as skb_zerocopy() will not ensure
padding and the attribute size is no longer pre calculated
though nla_reserve() which ensured padding previously.

This patch applies appropriate padding if a linear data copy
was performed in skb_zerocopy().

Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
 net/openvswitch/datapath.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index df46928..24e93f5 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -396,7 +396,7 @@ static int queue_userspace_packet(struct datapath *dp, struct sk_buff *skb,
 		.dst_sk = ovs_dp_get_net(dp)->genl_sock,
 		.snd_portid = upcall_info->portid,
 	};
-	size_t len;
+	size_t len, plen;
 	unsigned int hlen;
 	int err, dp_ifindex;
 
@@ -466,6 +466,11 @@ static int queue_userspace_packet(struct datapath *dp, struct sk_buff *skb,
 
 	skb_zerocopy(user_skb, skb, skb->len, hlen);
 
+	/* Pad OVS_PACKET_ATTR_PACKET if linear copy was performed */
+	if (!(dp->user_features & OVS_DP_F_UNALIGNED) &&
+	    (plen = (ALIGN(user_skb->len, NLA_ALIGNTO) - user_skb->len)) > 0)
+		skb_put(user_skb, plen);
+
 	((struct nlmsghdr *) user_skb->data)->nlmsg_len = user_skb->len;
 
 	err = genlmsg_unicast(ovs_dp_get_net(dp), user_skb, upcall_info->portid);

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next] openvswitch: Pad OVS_PACKET_ATTR_PACKET if linear copy was performed
  2014-01-14 16:16                       ` [PATCH net-next] openvswitch: Pad OVS_PACKET_ATTR_PACKET if linear copy was performed Thomas Graf
@ 2014-01-14 16:19                         ` Thomas Graf
  2014-01-14 16:27                           ` [PATCH net-next v2] " Thomas Graf
  0 siblings, 1 reply; 33+ messages in thread
From: Thomas Graf @ 2014-01-14 16:19 UTC (permalink / raw)
  To: Zoltan Kiss; +Cc: Thomas Graf, Jesse Gross, David Miller, dev, netdev

On 01/14/14 at 04:16pm, Thomas Graf wrote:
> While the zerocopy method is correctly omitted if user space
> does not support unaligned Netlink messages. The attribute is
> still not padded correctly as skb_zerocopy() will not ensure
> padding and the attribute size is no longer pre calculated
> though nla_reserve() which ensured padding previously.
> 
> This patch applies appropriate padding if a linear data copy
> was performed in skb_zerocopy().
> 
> Signed-off-by: Thomas Graf <tgraf@suug.ch>
> ---
>  net/openvswitch/datapath.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
> index df46928..24e93f5 100644
> --- a/net/openvswitch/datapath.c
> +++ b/net/openvswitch/datapath.c
> @@ -396,7 +396,7 @@ static int queue_userspace_packet(struct datapath *dp, struct sk_buff *skb,
>  		.dst_sk = ovs_dp_get_net(dp)->genl_sock,
>  		.snd_portid = upcall_info->portid,
>  	};
> -	size_t len;
> +	size_t len, plen;
>  	unsigned int hlen;
>  	int err, dp_ifindex;
>  
> @@ -466,6 +466,11 @@ static int queue_userspace_packet(struct datapath *dp, struct sk_buff *skb,
>  
>  	skb_zerocopy(user_skb, skb, skb->len, hlen);
>  
> +	/* Pad OVS_PACKET_ATTR_PACKET if linear copy was performed */
> +	if (!(dp->user_features & OVS_DP_F_UNALIGNED) &&
> +	    (plen = (ALIGN(user_skb->len, NLA_ALIGNTO) - user_skb->len)) > 0)
> +		skb_put(user_skb, plen);

While this fixes the padding issue, it leaves the padding
uninitialized, I will send v2 with an additional memset().

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH net-next v2] openvswitch: Pad OVS_PACKET_ATTR_PACKET if linear copy was performed
  2014-01-14 16:19                         ` Thomas Graf
@ 2014-01-14 16:27                           ` Thomas Graf
  2014-01-14 18:56                             ` Zoltan Kiss
  2014-01-15  3:03                             ` David Miller
  0 siblings, 2 replies; 33+ messages in thread
From: Thomas Graf @ 2014-01-14 16:27 UTC (permalink / raw)
  To: Zoltan Kiss; +Cc: Thomas Graf, Jesse Gross, David Miller, dev, netdev

While the zerocopy method is correctly omitted if user space
does not support unaligned Netlink messages. The attribute is
still not padded correctly as skb_zerocopy() will not ensure
padding and the attribute size is no longer pre calculated
though nla_reserve() which ensured padding previously.

This patch applies appropriate padding if a linear data copy
was performed in skb_zerocopy().

Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
v2: initialize padding to 0's

 net/openvswitch/datapath.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index df46928..3ca9121 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -396,7 +396,7 @@ static int queue_userspace_packet(struct datapath *dp, struct sk_buff *skb,
 		.dst_sk = ovs_dp_get_net(dp)->genl_sock,
 		.snd_portid = upcall_info->portid,
 	};
-	size_t len;
+	size_t len, plen;
 	unsigned int hlen;
 	int err, dp_ifindex;
 
@@ -466,6 +466,11 @@ static int queue_userspace_packet(struct datapath *dp, struct sk_buff *skb,
 
 	skb_zerocopy(user_skb, skb, skb->len, hlen);
 
+	/* Pad OVS_PACKET_ATTR_PACKET if linear copy was performed */
+	if (!(dp->user_features & OVS_DP_F_UNALIGNED) &&
+	    (plen = (ALIGN(user_skb->len, NLA_ALIGNTO) - user_skb->len)) > 0)
+		memset(skb_put(user_skb, plen), 0, plen);
+
 	((struct nlmsghdr *) user_skb->data)->nlmsg_len = user_skb->len;
 
 	err = genlmsg_unicast(ovs_dp_get_net(dp), user_skb, upcall_info->portid);

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next v2] openvswitch: Pad OVS_PACKET_ATTR_PACKET if linear copy was performed
  2014-01-14 16:27                           ` [PATCH net-next v2] " Thomas Graf
@ 2014-01-14 18:56                             ` Zoltan Kiss
  2014-01-15  3:03                             ` David Miller
  1 sibling, 0 replies; 33+ messages in thread
From: Zoltan Kiss @ 2014-01-14 18:56 UTC (permalink / raw)
  To: Thomas Graf; +Cc: Thomas Graf, Jesse Gross, David Miller, dev, netdev

It works for me, thanks!

Acked-by: Zoltan Kiss <zoltan.kiss@citrix.com>

On 14/01/14 16:27, Thomas Graf wrote:
> While the zerocopy method is correctly omitted if user space
> does not support unaligned Netlink messages. The attribute is
> still not padded correctly as skb_zerocopy() will not ensure
> padding and the attribute size is no longer pre calculated
> though nla_reserve() which ensured padding previously.
>
> This patch applies appropriate padding if a linear data copy
> was performed in skb_zerocopy().
>
> Signed-off-by: Thomas Graf <tgraf@suug.ch>
> ---
> v2: initialize padding to 0's
>
>   net/openvswitch/datapath.c | 7 ++++++-
>   1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
> index df46928..3ca9121 100644
> --- a/net/openvswitch/datapath.c
> +++ b/net/openvswitch/datapath.c
> @@ -396,7 +396,7 @@ static int queue_userspace_packet(struct datapath *dp, struct sk_buff *skb,
>   		.dst_sk = ovs_dp_get_net(dp)->genl_sock,
>   		.snd_portid = upcall_info->portid,
>   	};
> -	size_t len;
> +	size_t len, plen;
>   	unsigned int hlen;
>   	int err, dp_ifindex;
>
> @@ -466,6 +466,11 @@ static int queue_userspace_packet(struct datapath *dp, struct sk_buff *skb,
>
>   	skb_zerocopy(user_skb, skb, skb->len, hlen);
>
> +	/* Pad OVS_PACKET_ATTR_PACKET if linear copy was performed */
> +	if (!(dp->user_features & OVS_DP_F_UNALIGNED) &&
> +	    (plen = (ALIGN(user_skb->len, NLA_ALIGNTO) - user_skb->len)) > 0)
> +		memset(skb_put(user_skb, plen), 0, plen);
> +
>   	((struct nlmsghdr *) user_skb->data)->nlmsg_len = user_skb->len;
>
>   	err = genlmsg_unicast(ovs_dp_get_net(dp), user_skb, upcall_info->portid);
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next v2] openvswitch: Pad OVS_PACKET_ATTR_PACKET if linear copy was performed
  2014-01-14 16:27                           ` [PATCH net-next v2] " Thomas Graf
  2014-01-14 18:56                             ` Zoltan Kiss
@ 2014-01-15  3:03                             ` David Miller
  2014-01-15 22:53                               ` Jesse Gross
  1 sibling, 1 reply; 33+ messages in thread
From: David Miller @ 2014-01-15  3:03 UTC (permalink / raw)
  To: tgraf; +Cc: zoltan.kiss, tgraf, jesse, dev, netdev

From: Thomas Graf <tgraf@suug.ch>
Date: Tue, 14 Jan 2014 16:27:49 +0000

> While the zerocopy method is correctly omitted if user space
> does not support unaligned Netlink messages. The attribute is
> still not padded correctly as skb_zerocopy() will not ensure
> padding and the attribute size is no longer pre calculated
> though nla_reserve() which ensured padding previously.
> 
> This patch applies appropriate padding if a linear data copy
> was performed in skb_zerocopy().
> 
> Signed-off-by: Thomas Graf <tgraf@suug.ch>
> ---
> v2: initialize padding to 0's

Jesse do you want to queue this up or should I apply it directly?

Thanks.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next v2] openvswitch: Pad OVS_PACKET_ATTR_PACKET if linear copy was performed
  2014-01-15  3:03                             ` David Miller
@ 2014-01-15 22:53                               ` Jesse Gross
  0 siblings, 0 replies; 33+ messages in thread
From: Jesse Gross @ 2014-01-15 22:53 UTC (permalink / raw)
  To: David Miller; +Cc: Thomas Graf, Zoltan Kiss, Thomas Graf, dev, netdev

On Tue, Jan 14, 2014 at 7:03 PM, David Miller <davem@davemloft.net> wrote:
> From: Thomas Graf <tgraf@suug.ch>
> Date: Tue, 14 Jan 2014 16:27:49 +0000
>
>> While the zerocopy method is correctly omitted if user space
>> does not support unaligned Netlink messages. The attribute is
>> still not padded correctly as skb_zerocopy() will not ensure
>> padding and the attribute size is no longer pre calculated
>> though nla_reserve() which ensured padding previously.
>>
>> This patch applies appropriate padding if a linear data copy
>> was performed in skb_zerocopy().
>>
>> Signed-off-by: Thomas Graf <tgraf@suug.ch>
>> ---
>> v2: initialize padding to 0's
>
> Jesse do you want to queue this up or should I apply it directly?

I queued it up, thanks.

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2014-01-15 22:54 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-01-07  0:15 [GIT net-next] Open vSwitch Jesse Gross
     [not found] ` <1389053776-62865-1-git-send-email-jesse-l0M0P4e3n4LQT0dZR+AlfA@public.gmane.org>
2014-01-07  0:16   ` [PATCH net-next 01/17] openvswitch: Correct comment Jesse Gross
2014-01-07  0:16   ` [PATCH net-next 02/17] openvswitch: Shrink sw_flow_mask by 8 bytes (64-bit) or 4 bytes (32-bit) Jesse Gross
2014-01-07 21:36     ` Sergei Shtylyov
2014-01-07  0:16   ` [PATCH net-next 03/17] openvswitch: Change ovs_flow_tbl_lookup_xx() APIs Jesse Gross
2014-01-07  0:16   ` [PATCH net-next 04/17] openvswitch: Silence RCU lockdep checks from flow lookup Jesse Gross
2014-01-07  0:16   ` [PATCH net-next 05/17] genl: Add genlmsg_new_unicast() for unicast message allocation Jesse Gross
2014-01-07  0:16   ` [PATCH net-next 06/17] netlink: Avoid netlink mmap alloc if msg size exceeds frame size Jesse Gross
2014-01-07  0:16   ` [PATCH net-next 07/17] openvswitch: Enable memory mapped Netlink i/o Jesse Gross
2014-01-07  0:16   ` [PATCH net-next 08/17] openvswitch: Per cpu flow stats Jesse Gross
2014-01-07  0:16   ` [PATCH net-next 09/17] net: ovs: use kfree_rcu instead of rcu_free_{sw_flow_mask_cb, acts_callback} Jesse Gross
2014-01-07  0:16   ` [PATCH net-next 10/17] openvswitch: remove duplicated include from flow_table.c Jesse Gross
2014-01-07  0:16   ` [PATCH net-next 11/17] net: Export skb_zerocopy() to zerocopy from one skb to another Jesse Gross
2014-01-07  0:16   ` [PATCH net-next 12/17] openvswitch: Allow user space to announce ability to accept unaligned Netlink messages Jesse Gross
2014-01-07  0:16   ` [PATCH net-next 13/17] openvswitch: Drop user features if old user space attempted to create datapath Jesse Gross
2014-01-07  0:16   ` [PATCH net-next 14/17] openvswitch: Pass datapath into userspace queue functions Jesse Gross
2014-01-07  0:16   ` [PATCH net-next 15/17] openvswitch: Use skb_zerocopy() for upcall Jesse Gross
2014-01-07  0:16   ` [PATCH net-next 16/17] openvswitch: Compute checksum in skb_gso_segment() if needed Jesse Gross
2014-01-07  0:16   ` [PATCH net-next 17/17] ovs: make functions local Jesse Gross
2014-01-07  0:49   ` [GIT net-next] Open vSwitch David Miller
2014-01-08 14:49 ` [ovs-dev] " Zoltan Kiss
     [not found]   ` <52CD657F.7080806-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org>
2014-01-08 15:10     ` Jesse Gross
2014-01-13 18:04       ` [ovs-dev] " Zoltan Kiss
     [not found]         ` <52D42A9E.1030805-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org>
2014-01-14  0:31           ` Zoltan Kiss
     [not found]             ` <52D4857C.7020902-Sxgqhf6Nn4DQT0dZR+AlfA@public.gmane.org>
2014-01-14  1:30               ` Jesse Gross
     [not found]                 ` <CAEP_g=8nG6AHV9Y+5=48nPhkf5Oe=mG8EiyaKSqN4omnmGhv4A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-01-14  9:46                   ` Thomas Graf
2014-01-14 12:31                     ` [ovs-dev] " Zoltan Kiss
2014-01-14 16:16                       ` [PATCH net-next] openvswitch: Pad OVS_PACKET_ATTR_PACKET if linear copy was performed Thomas Graf
2014-01-14 16:19                         ` Thomas Graf
2014-01-14 16:27                           ` [PATCH net-next v2] " Thomas Graf
2014-01-14 18:56                             ` Zoltan Kiss
2014-01-15  3:03                             ` David Miller
2014-01-15 22:53                               ` Jesse Gross

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.