All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC net-next sample action optimization 0/3]
@ 2017-03-08  0:15 Andy Zhou
  2017-03-08  0:15 ` [RFC net-next sample action optimization 1/3] openvswitch: deferred fifo api change Andy Zhou
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Andy Zhou @ 2017-03-08  0:15 UTC (permalink / raw)
  To: netdev; +Cc: joe, pshelar, Andy Zhou

The sample action can be used for translating Openflow 'clone' action.
However its implementation has not been sufficiently optimized for this
use case. This series attempts to close the gap.

Patch 3 commit message has more details on the specific optimizations
implemented.


Andy Zhou (3):
  openvswitch: deferred fifo api change
  openvswitch: Refactor recirc key allocation.
  openvswitch: Optimize sample action for the clone use cases

 net/openvswitch/actions.c      | 190 ++++++++++++++++++++++-------------------
 net/openvswitch/datapath.h     |   7 ++
 net/openvswitch/flow_netlink.c | 118 +++++++++++++++++--------
 3 files changed, 192 insertions(+), 123 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [RFC net-next sample action optimization 1/3] openvswitch: deferred fifo api change
  2017-03-08  0:15 [RFC net-next sample action optimization 0/3] Andy Zhou
@ 2017-03-08  0:15 ` Andy Zhou
  2017-03-09 19:08   ` Joe Stringer
  2017-03-08  0:15 ` [RFC net-next sample action optimization 2/3] openvswitch: Refactor recirc key allocation Andy Zhou
  2017-03-08  0:15 ` [RFC net-next sample action optimization 3/3] openvswitch: Optimize sample action for the clone use cases Andy Zhou
  2 siblings, 1 reply; 10+ messages in thread
From: Andy Zhou @ 2017-03-08  0:15 UTC (permalink / raw)
  To: netdev; +Cc: joe, pshelar, Andy Zhou

add_deferred_actions() API currently requires actions to be passed in
as a fully encoded netlink message. So far both 'sample' and 'recirc'
actions happens to carry actions as fully encoded netlink messages.
However, this requirement is more restrictive than necessary, future
patch will need to pass in action lists that are not fully encoded
by themselves.

Signed-off-by: Andy Zhou <azhou@ovn.org>
---
 net/openvswitch/actions.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index c82301c..75182e9 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -51,6 +51,7 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
 struct deferred_action {
 	struct sk_buff *skb;
 	const struct nlattr *actions;
+	int actions_len;
 
 	/* Store pkt_key clone when creating deferred action. */
 	struct sw_flow_key pkt_key;
@@ -119,8 +120,9 @@ static struct deferred_action *action_fifo_put(struct action_fifo *fifo)
 
 /* Return true if fifo is not full */
 static struct deferred_action *add_deferred_actions(struct sk_buff *skb,
-						    const struct sw_flow_key *key,
-						    const struct nlattr *attr)
+				    const struct sw_flow_key *key,
+				    const struct nlattr *actions,
+				    const int actions_len)
 {
 	struct action_fifo *fifo;
 	struct deferred_action *da;
@@ -129,7 +131,8 @@ static struct deferred_action *add_deferred_actions(struct sk_buff *skb,
 	da = action_fifo_put(fifo);
 	if (da) {
 		da->skb = skb;
-		da->actions = attr;
+		da->actions = actions;
+		da->actions_len = actions_len;
 		da->pkt_key = *key;
 	}
 
@@ -966,7 +969,8 @@ static int sample(struct datapath *dp, struct sk_buff *skb,
 		/* Skip the sample action when out of memory. */
 		return 0;
 
-	if (!add_deferred_actions(skb, key, a)) {
+	if (!add_deferred_actions(skb, key, nla_data(acts_list),
+				  nla_len(acts_list))) {
 		if (net_ratelimit())
 			pr_warn("%s: deferred actions limit reached, dropping sample action\n",
 				ovs_dp_name(dp));
@@ -1123,7 +1127,7 @@ static int execute_recirc(struct datapath *dp, struct sk_buff *skb,
 		return 0;
 	}
 
-	da = add_deferred_actions(skb, key, NULL);
+	da = add_deferred_actions(skb, key, NULL, 0);
 	if (da) {
 		da->pkt_key.recirc_id = nla_get_u32(a);
 	} else {
@@ -1278,10 +1282,10 @@ static void process_deferred_actions(struct datapath *dp)
 		struct sk_buff *skb = da->skb;
 		struct sw_flow_key *key = &da->pkt_key;
 		const struct nlattr *actions = da->actions;
+		int actions_len = da->actions_len;
 
 		if (actions)
-			do_execute_actions(dp, skb, key, actions,
-					   nla_len(actions));
+			do_execute_actions(dp, skb, key, actions, actions_len);
 		else
 			ovs_dp_process_packet(skb, key);
 	} while (!action_fifo_is_empty(fifo));
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [RFC net-next sample action optimization 2/3] openvswitch: Refactor recirc key allocation.
  2017-03-08  0:15 [RFC net-next sample action optimization 0/3] Andy Zhou
  2017-03-08  0:15 ` [RFC net-next sample action optimization 1/3] openvswitch: deferred fifo api change Andy Zhou
@ 2017-03-08  0:15 ` Andy Zhou
  2017-03-09 19:11   ` Joe Stringer
  2017-03-08  0:15 ` [RFC net-next sample action optimization 3/3] openvswitch: Optimize sample action for the clone use cases Andy Zhou
  2 siblings, 1 reply; 10+ messages in thread
From: Andy Zhou @ 2017-03-08  0:15 UTC (permalink / raw)
  To: netdev; +Cc: joe, pshelar, Andy Zhou

The logic of allocating and copy key for each 'exec_actions_level'
was specific to execute_recirc(). However, future patches will reuse
as well.  Refactor the logic into its own function clone_key().

Signed-off-by: Andy Zhou <azhou@ovn.org>
---
 net/openvswitch/actions.c | 72 +++++++++++++++++++++++++++++------------------
 1 file changed, 44 insertions(+), 28 deletions(-)

diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 75182e9..259aea9 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2007-2014 Nicira, Inc.
+ * Copyright (c) 2007-2017 Nicira, Inc.
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of version 2 of the GNU General Public
@@ -75,7 +75,7 @@ struct ovs_frag_data {
 
 #define DEFERRED_ACTION_FIFO_SIZE 10
 #define OVS_RECURSION_LIMIT 5
-#define OVS_DEFERRED_ACTION_THRESHOLD (OVS_RECURSION_LIMIT - 2)
+#define OVS_ACTION_RECURSION_THRESHOLD (OVS_RECURSION_LIMIT - 2)
 struct action_fifo {
 	int head;
 	int tail;
@@ -83,14 +83,32 @@ struct action_fifo {
 	struct deferred_action fifo[DEFERRED_ACTION_FIFO_SIZE];
 };
 
-struct recirc_keys {
-	struct sw_flow_key key[OVS_DEFERRED_ACTION_THRESHOLD];
+struct action_flow_keys {
+	struct sw_flow_key key[OVS_ACTION_RECURSION_THRESHOLD];
 };
 
 static struct action_fifo __percpu *action_fifos;
-static struct recirc_keys __percpu *recirc_keys;
+static struct action_flow_keys __percpu *flow_keys;
 static DEFINE_PER_CPU(int, exec_actions_level);
 
+/* Make a clone of the 'key', using the pre-allocated percpu 'flow_keys'
+ * space. Since the storage is pre-allocated, the caller does not
+ * need to check for NULL return pointer.
+ */
+static struct sw_flow_key *clone_key(const struct sw_flow_key *key_)
+{
+	struct action_flow_keys *keys = this_cpu_ptr(flow_keys);
+	int level = this_cpu_read(exec_actions_level);
+	struct sw_flow_key *key = NULL;
+
+	if (level <= OVS_ACTION_RECURSION_THRESHOLD) {
+		key = &keys->key[level - 1];
+		*key = *key_;
+	}
+
+	return key;
+}
+
 static void action_fifo_init(struct action_fifo *fifo)
 {
 	fifo->head = 0;
@@ -1090,8 +1108,8 @@ static int execute_recirc(struct datapath *dp, struct sk_buff *skb,
 			  struct sw_flow_key *key,
 			  const struct nlattr *a, int rem)
 {
+	struct sw_flow_key *recirc_key;
 	struct deferred_action *da;
-	int level;
 
 	if (!is_flow_key_valid(key)) {
 		int err;
@@ -1115,29 +1133,27 @@ static int execute_recirc(struct datapath *dp, struct sk_buff *skb,
 			return 0;
 	}
 
-	level = this_cpu_read(exec_actions_level);
-	if (level <= OVS_DEFERRED_ACTION_THRESHOLD) {
-		struct recirc_keys *rks = this_cpu_ptr(recirc_keys);
-		struct sw_flow_key *recirc_key = &rks->key[level - 1];
-
-		*recirc_key = *key;
+	/* If we are within the limit of 'OVS_ACTION_RECURSION_THRESHOLD',
+	 * recirc immediately, otherwise, defer it for later execution.
+	 */
+	recirc_key = clone_key(key);
+	if (recirc_key) {
 		recirc_key->recirc_id = nla_get_u32(a);
 		ovs_dp_process_packet(skb, recirc_key);
-
-		return 0;
-	}
-
-	da = add_deferred_actions(skb, key, NULL, 0);
-	if (da) {
-		da->pkt_key.recirc_id = nla_get_u32(a);
 	} else {
-		kfree_skb(skb);
-
-		if (net_ratelimit())
-			pr_warn("%s: deferred action limit reached, drop recirc action\n",
-				ovs_dp_name(dp));
+		da = add_deferred_actions(skb, key, NULL, 0);
+		if (da) {
+			recirc_key = &da->pkt_key;
+			recirc_key->recirc_id = nla_get_u32(a);
+		} else {
+			/* Log an error in case action fifo is full.
+			 */
+			kfree_skb(skb);
+			if (net_ratelimit())
+				pr_warn("%s: deferred action limit reached, drop recirc action\n",
+					ovs_dp_name(dp));
+		}
 	}
-
 	return 0;
 }
 
@@ -1327,8 +1343,8 @@ int action_fifos_init(void)
 	if (!action_fifos)
 		return -ENOMEM;
 
-	recirc_keys = alloc_percpu(struct recirc_keys);
-	if (!recirc_keys) {
+	flow_keys = alloc_percpu(struct action_flow_keys);
+	if (!flow_keys) {
 		free_percpu(action_fifos);
 		return -ENOMEM;
 	}
@@ -1339,5 +1355,5 @@ int action_fifos_init(void)
 void action_fifos_exit(void)
 {
 	free_percpu(action_fifos);
-	free_percpu(recirc_keys);
+	free_percpu(flow_keys);
 }
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [RFC net-next sample action optimization 3/3] openvswitch: Optimize sample action for the clone use cases
  2017-03-08  0:15 [RFC net-next sample action optimization 0/3] Andy Zhou
  2017-03-08  0:15 ` [RFC net-next sample action optimization 1/3] openvswitch: deferred fifo api change Andy Zhou
  2017-03-08  0:15 ` [RFC net-next sample action optimization 2/3] openvswitch: Refactor recirc key allocation Andy Zhou
@ 2017-03-08  0:15 ` Andy Zhou
  2017-03-09 19:46   ` Joe Stringer
  2 siblings, 1 reply; 10+ messages in thread
From: Andy Zhou @ 2017-03-08  0:15 UTC (permalink / raw)
  To: netdev; +Cc: joe, pshelar, Andy Zhou

With the introduction of open flow 'clone' action, the OVS user space
can now translate the 'clone' action into kernel datapath 'sample'
action, with 100% probability, to ensure that the clone semantics,
which is that the packet seen by the clone action is the same as the
packet seen by the action after clone, is faithfully carried out
in the datapath.

While the sample action in the datpath has the matching semantics,
its implementation is only optimized for its original use.
Specifically, there are two limitation: First, there is a 3 level of
nesting restriction, enforced at the flow downloading time. This
limit turns out to be too restrictive for the 'clone' use case.
Second, the implementation avoid recursive call only if the sample
action list has a single userspace action.

The optimization implemented in this series removes the static
nesting limit check, instead, implement the run time recursion limit
check, and recursion avoidance similar to that of the 'recirc' action.
This optimization solve both #1 and #2 issues above.

Another optimization implemented is to avoid coping flow key as
long as the actions enclosed does not change the flow key. The
detection is performed only once at the flow downloading time.

The third optimization implemented is to rewrite the action list
at flow downloading time in order to save the fast path from parsing
the sample action list in its original form repeatedly.

Signed-off-by: Andy Zhou <azhou@ovn.org>
---
 net/openvswitch/actions.c      | 106 ++++++++++++++++++------------------
 net/openvswitch/datapath.h     |   7 +++
 net/openvswitch/flow_netlink.c | 118 ++++++++++++++++++++++++++++-------------
 3 files changed, 140 insertions(+), 91 deletions(-)

diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 259aea9..2e8c372 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -930,71 +930,52 @@ static int output_userspace(struct datapath *dp, struct sk_buff *skb,
 }
 
 static int sample(struct datapath *dp, struct sk_buff *skb,
-		  struct sw_flow_key *key, const struct nlattr *attr,
-		  const struct nlattr *actions, int actions_len)
+		  struct sw_flow_key *key, const struct nlattr *attr)
 {
-	const struct nlattr *acts_list = NULL;
-	const struct nlattr *a;
-	int rem;
-	u32 cutlen = 0;
-
-	for (a = nla_data(attr), rem = nla_len(attr); rem > 0;
-		 a = nla_next(a, &rem)) {
-		u32 probability;
-
-		switch (nla_type(a)) {
-		case OVS_SAMPLE_ATTR_PROBABILITY:
-			probability = nla_get_u32(a);
-			if (!probability || prandom_u32() > probability)
-				return 0;
-			break;
-
-		case OVS_SAMPLE_ATTR_ACTIONS:
-			acts_list = a;
-			break;
-		}
-	}
+	struct nlattr *actions;
+	struct nlattr *sample_arg;
+	struct sw_flow_key *orig = key;
+	int rem = nla_len(attr);
+	int err = 0;
+	const struct sample_arg *arg;
 
-	rem = nla_len(acts_list);
-	a = nla_data(acts_list);
+	/* The first action is always 'OVS_SAMPLE_ATTR_AUX'. */
+	sample_arg = nla_data(attr);
+	arg = nla_data(sample_arg);
+	actions = nla_next(sample_arg, &rem);
 
-	/* Actions list is empty, do nothing */
-	if (unlikely(!rem))
-		return 0;
+	if (arg->probability != U32_MAX)
+		if (!arg->probability || prandom_u32() > arg->probability)
+			return 0;
 
-	/* The only known usage of sample action is having a single user-space
-	 * action, or having a truncate action followed by a single user-space
-	 * action. Treat this usage as a special case.
-	 * The output_userspace() should clone the skb to be sent to the
-	 * user space. This skb will be consumed by its caller.
+	/* In case the sample actions won't change 'key',
+	 * we can use key for the clone execution.
+	 * Otherwise, try to allocate a key from the
+	 * next recursion level of 'flow_keys'. If
+	 * successful, we can still execute the clone
+	 * actions  without deferring.
+	 *
+	 * Defer the clone action if the action recursion
+	 * limit has been reached.
 	 */
-	if (unlikely(nla_type(a) == OVS_ACTION_ATTR_TRUNC)) {
-		struct ovs_action_trunc *trunc = nla_data(a);
-
-		if (skb->len > trunc->max_len)
-			cutlen = skb->len - trunc->max_len;
-
-		a = nla_next(a, &rem);
+	if (!arg->exec) {
+		__this_cpu_inc(exec_actions_level);
+		key = clone_key(key);
 	}
 
-	if (likely(nla_type(a) == OVS_ACTION_ATTR_USERSPACE &&
-		   nla_is_last(a, rem)))
-		return output_userspace(dp, skb, key, a, actions,
-					actions_len, cutlen);
-
-	skb = skb_clone(skb, GFP_ATOMIC);
-	if (!skb)
-		/* Skip the sample action when out of memory. */
-		return 0;
+	if (key) {
+		err = do_execute_actions(dp, skb, key, actions, rem);
+	} else if (!add_deferred_actions(skb, orig, actions, rem)) {
 
-	if (!add_deferred_actions(skb, key, nla_data(acts_list),
-				  nla_len(acts_list))) {
 		if (net_ratelimit())
-			pr_warn("%s: deferred actions limit reached, dropping sample action\n",
+			pr_warn("%s: deferred action limit reached, drop clone action\n",
 				ovs_dp_name(dp));
-
 		kfree_skb(skb);
 	}
+
+	if (!arg->exec)
+		__this_cpu_dec(exec_actions_level);
+
 	return 0;
 }
 
@@ -1246,9 +1227,24 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
 			err = execute_masked_set_action(skb, key, nla_data(a));
 			break;
 
-		case OVS_ACTION_ATTR_SAMPLE:
-			err = sample(dp, skb, key, a, attr, len);
+		case OVS_ACTION_ATTR_SAMPLE: {
+			bool last = nla_is_last(a, rem);
+			struct sk_buff *clone_skb;
+
+			clone_skb = last ? skb : skb_clone(skb, GFP_ATOMIC);
+
+			if (!clone_skb)
+				/* Out of memory, skip this sample action.
+				 */
+				break;
+
+			err = sample(dp, clone_skb, key, a);
+			if (last)
+				return err;
+
+			pr_err("executing sample");
 			break;
+			}
 
 		case OVS_ACTION_ATTR_CT:
 			if (!is_flow_key_valid(key)) {
diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h
index 1c6e937..d7a6e803 100644
--- a/net/openvswitch/datapath.h
+++ b/net/openvswitch/datapath.h
@@ -220,4 +220,11 @@ int ovs_execute_actions(struct datapath *dp, struct sk_buff *skb,
 	if (logging_allowed && net_ratelimit())			\
 		pr_info("netlink: " fmt "\n", ##__VA_ARGS__);	\
 } while (0)
+
+#define OVS_SAMPLE_ATTR_ARG  (OVS_ACTION_ATTR_MAX + 1)
+struct sample_arg {
+	bool exec;
+	u32  probability;
+};
+
 #endif /* datapath.h */
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index 6f5fa50..0e7cf13 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2007-2014 Nicira, Inc.
+ * Copyright (c) 2007-2017 Nicira, Inc.
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of version 2 of the GNU General Public
@@ -59,6 +59,39 @@ struct ovs_len_tbl {
 #define OVS_ATTR_NESTED -1
 #define OVS_ATTR_VARIABLE -2
 
+static bool actions_may_change_flow(const struct nlattr *actions)
+{
+	struct nlattr *nla;
+	int rem;
+
+	nla_for_each_nested(nla, actions, rem) {
+		u16 action = nla_type(nla);
+
+		switch (action) {
+		case OVS_ACTION_ATTR_OUTPUT:
+		case OVS_ACTION_ATTR_RECIRC:
+		case OVS_ACTION_ATTR_USERSPACE:
+		case OVS_ACTION_ATTR_SAMPLE:
+		case OVS_ACTION_ATTR_TRUNC:
+			break;
+
+		case OVS_ACTION_ATTR_PUSH_MPLS:
+		case OVS_ACTION_ATTR_POP_MPLS:
+		case OVS_ACTION_ATTR_PUSH_VLAN:
+		case OVS_ACTION_ATTR_POP_VLAN:
+		case OVS_ACTION_ATTR_SET:
+		case OVS_ACTION_ATTR_SET_MASKED:
+		case OVS_ACTION_ATTR_HASH:
+		case OVS_ACTION_ATTR_CT:
+		case OVS_ACTION_ATTR_PUSH_ETH:
+		case OVS_ACTION_ATTR_POP_ETH:
+		default:
+			return true;
+		}
+	}
+	return false;
+}
+
 static void update_range(struct sw_flow_match *match,
 			 size_t offset, size_t size, bool is_mask)
 {
@@ -2027,12 +2060,14 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
 static int validate_and_copy_sample(struct net *net, const struct nlattr *attr,
 				    const struct sw_flow_key *key, int depth,
 				    struct sw_flow_actions **sfa,
-				    __be16 eth_type, __be16 vlan_tci, bool log)
+				    __be16 eth_type, __be16 vlan_tci,
+				    bool log, bool last)
 {
 	const struct nlattr *attrs[OVS_SAMPLE_ATTR_MAX + 1];
 	const struct nlattr *probability, *actions;
 	const struct nlattr *a;
-	int rem, start, err, st_acts;
+	int rem, start, err;
+	struct sample_arg arg;
 
 	memset(attrs, 0, sizeof(attrs));
 	nla_for_each_nested(a, attr, rem) {
@@ -2056,20 +2091,32 @@ static int validate_and_copy_sample(struct net *net, const struct nlattr *attr,
 	start = add_nested_action_start(sfa, OVS_ACTION_ATTR_SAMPLE, log);
 	if (start < 0)
 		return start;
-	err = ovs_nla_add_action(sfa, OVS_SAMPLE_ATTR_PROBABILITY,
-				 nla_data(probability), sizeof(u32), log);
+
+	/* When both skb and flow may be changed, put the sample
+	 * into a deferred fifo. On the other hand, if only skb
+	 * may be modified, the actions can be executed in place.
+	 *
+	 * Do this analysis at the flow installation time.
+	 * Set 'clone_action->exec' to true if the actions can be
+	 * executed without being deferred.
+	 *
+	 * If the sample is the last action, it can always be excuted
+	 * rather than deferred.
+	 */
+	arg.exec = last || !actions_may_change_flow(actions);
+	arg.probability = nla_get_u32(probability);
+
+	err = ovs_nla_add_action(sfa, OVS_SAMPLE_ATTR_ARG, &arg, sizeof(arg),
+				 log);
 	if (err)
 		return err;
-	st_acts = add_nested_action_start(sfa, OVS_SAMPLE_ATTR_ACTIONS, log);
-	if (st_acts < 0)
-		return st_acts;
 
-	err = __ovs_nla_copy_actions(net, actions, key, depth + 1, sfa,
+	err = __ovs_nla_copy_actions(net, actions, key, depth, sfa,
 				     eth_type, vlan_tci, log);
+
 	if (err)
 		return err;
 
-	add_nested_action_end(*sfa, st_acts);
 	add_nested_action_end(*sfa, start);
 
 	return 0;
@@ -2554,8 +2601,10 @@ static int __ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
 			break;
 
 		case OVS_ACTION_ATTR_SAMPLE:
-			err = validate_and_copy_sample(net, a, key, depth, sfa,
-						       eth_type, vlan_tci, log);
+			err = validate_and_copy_sample(net, a, key, depth,
+						       sfa, eth_type,
+						       vlan_tci, log,
+						       nla_is_last(a, rem));
 			if (err)
 				return err;
 			skip_copy = true;
@@ -2621,39 +2670,36 @@ int ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
 	return err;
 }
 
-static int sample_action_to_attr(const struct nlattr *attr, struct sk_buff *skb)
+static int sample_action_to_attr(const struct nlattr *attr,
+				 struct sk_buff *skb)
 {
-	const struct nlattr *a;
-	struct nlattr *start;
-	int err = 0, rem;
+	struct nlattr *start, *ac_start, *sample_arg;
+	int err = 0, rem = nla_len(attr);
+	const struct sample_arg *arg;
+	struct nlattr *actions;
 
 	start = nla_nest_start(skb, OVS_ACTION_ATTR_SAMPLE);
 	if (!start)
 		return -EMSGSIZE;
 
-	nla_for_each_nested(a, attr, rem) {
-		int type = nla_type(a);
-		struct nlattr *st_sample;
+	sample_arg = nla_data(attr);
+	arg = nla_data(sample_arg);
+	actions = nla_next(sample_arg, &rem);
 
-		switch (type) {
-		case OVS_SAMPLE_ATTR_PROBABILITY:
-			if (nla_put(skb, OVS_SAMPLE_ATTR_PROBABILITY,
-				    sizeof(u32), nla_data(a)))
-				return -EMSGSIZE;
-			break;
-		case OVS_SAMPLE_ATTR_ACTIONS:
-			st_sample = nla_nest_start(skb, OVS_SAMPLE_ATTR_ACTIONS);
-			if (!st_sample)
-				return -EMSGSIZE;
-			err = ovs_nla_put_actions(nla_data(a), nla_len(a), skb);
-			if (err)
-				return err;
-			nla_nest_end(skb, st_sample);
-			break;
-		}
-	}
+	if (nla_put_u32(skb, OVS_SAMPLE_ATTR_PROBABILITY, arg->probability))
+		return -EMSGSIZE;
 
+	ac_start = nla_nest_start(skb, OVS_SAMPLE_ATTR_ACTIONS);
+	if (!ac_start)
+		return -EMSGSIZE;
+
+	err = ovs_nla_put_actions(actions, rem, skb);
+	if (err)
+		return err;
+
+	nla_nest_end(skb, ac_start);
 	nla_nest_end(skb, start);
+
 	return err;
 }
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [RFC net-next sample action optimization 1/3] openvswitch: deferred fifo api change
  2017-03-08  0:15 ` [RFC net-next sample action optimization 1/3] openvswitch: deferred fifo api change Andy Zhou
@ 2017-03-09 19:08   ` Joe Stringer
  0 siblings, 0 replies; 10+ messages in thread
From: Joe Stringer @ 2017-03-09 19:08 UTC (permalink / raw)
  To: Andy Zhou; +Cc: netdev, pravin shelar

On 7 March 2017 at 16:15, Andy Zhou <azhou@ovn.org> wrote:
> add_deferred_actions() API currently requires actions to be passed in
> as a fully encoded netlink message. So far both 'sample' and 'recirc'
> actions happens to carry actions as fully encoded netlink messages.
> However, this requirement is more restrictive than necessary, future
> patch will need to pass in action lists that are not fully encoded
> by themselves.
>
> Signed-off-by: Andy Zhou <azhou@ovn.org>

Acked-by: Joe Stringer <joe@ovn.org>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC net-next sample action optimization 2/3] openvswitch: Refactor recirc key allocation.
  2017-03-08  0:15 ` [RFC net-next sample action optimization 2/3] openvswitch: Refactor recirc key allocation Andy Zhou
@ 2017-03-09 19:11   ` Joe Stringer
  2017-03-10 21:44     ` Andy Zhou
  0 siblings, 1 reply; 10+ messages in thread
From: Joe Stringer @ 2017-03-09 19:11 UTC (permalink / raw)
  To: Andy Zhou; +Cc: netdev, pravin shelar

On 7 March 2017 at 16:15, Andy Zhou <azhou@ovn.org> wrote:
> The logic of allocating and copy key for each 'exec_actions_level'
> was specific to execute_recirc(). However, future patches will reuse
> as well.  Refactor the logic into its own function clone_key().
>
> Signed-off-by: Andy Zhou <azhou@ovn.org>
> ---

<snip>

> @@ -83,14 +83,32 @@ struct action_fifo {
>         struct deferred_action fifo[DEFERRED_ACTION_FIFO_SIZE];
>  };
>
> -struct recirc_keys {
> -       struct sw_flow_key key[OVS_DEFERRED_ACTION_THRESHOLD];
> +struct action_flow_keys {
> +       struct sw_flow_key key[OVS_ACTION_RECURSION_THRESHOLD];
>  };

I thought the old struct name was clearer on how it would be used -
for when actions are deferred.

>
>  static struct action_fifo __percpu *action_fifos;
> -static struct recirc_keys __percpu *recirc_keys;
> +static struct action_flow_keys __percpu *flow_keys;
>  static DEFINE_PER_CPU(int, exec_actions_level);
>
> +/* Make a clone of the 'key', using the pre-allocated percpu 'flow_keys'
> + * space. Since the storage is pre-allocated, the caller does not
> + * need to check for NULL return pointer.
> + */

Hmm? if level > OVS_ACTION_RECURSION_THRESHOLD, this function returns NULL.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC net-next sample action optimization 3/3] openvswitch: Optimize sample action for the clone use cases
  2017-03-08  0:15 ` [RFC net-next sample action optimization 3/3] openvswitch: Optimize sample action for the clone use cases Andy Zhou
@ 2017-03-09 19:46   ` Joe Stringer
  2017-03-10 22:07     ` Andy Zhou
  0 siblings, 1 reply; 10+ messages in thread
From: Joe Stringer @ 2017-03-09 19:46 UTC (permalink / raw)
  To: Andy Zhou; +Cc: netdev, pravin shelar

On 7 March 2017 at 16:15, Andy Zhou <azhou@ovn.org> wrote:
> With the introduction of open flow 'clone' action, the OVS user space
> can now translate the 'clone' action into kernel datapath 'sample'
> action, with 100% probability, to ensure that the clone semantics,
> which is that the packet seen by the clone action is the same as the
> packet seen by the action after clone, is faithfully carried out
> in the datapath.
>
> While the sample action in the datpath has the matching semantics,
> its implementation is only optimized for its original use.
> Specifically, there are two limitation: First, there is a 3 level of
> nesting restriction, enforced at the flow downloading time. This
> limit turns out to be too restrictive for the 'clone' use case.
> Second, the implementation avoid recursive call only if the sample
> action list has a single userspace action.
>
> The optimization implemented in this series removes the static
> nesting limit check, instead, implement the run time recursion limit
> check, and recursion avoidance similar to that of the 'recirc' action.
> This optimization solve both #1 and #2 issues above.
>
> Another optimization implemented is to avoid coping flow key as

*copying

> long as the actions enclosed does not change the flow key. The
> detection is performed only once at the flow downloading time.
>
> The third optimization implemented is to rewrite the action list
> at flow downloading time in order to save the fast path from parsing
> the sample action list in its original form repeatedly.

Whenever there is an enumeration (1, 2, 3; ..another..., third thing
implemented) in a commit message, I have to ask whether each "another
change..." should be a separate patch. It certainly makes it easier to
review.

I ran this through the OVS kernel tests and it's working correctly
from that point of view, but I didn't get a chance to dig in and
ensure for example, runtime behaviour of several nested
sample(actions(sample(actions(sample(actions(output)))))) handles
reasonably when it runs out of stack and deferred actions space. At a
high level though, this seems pretty straightforward.

Several comments below, thanks.

>
> Signed-off-by: Andy Zhou <azhou@ovn.org>
> ---
>  net/openvswitch/actions.c      | 106 ++++++++++++++++++------------------
>  net/openvswitch/datapath.h     |   7 +++
>  net/openvswitch/flow_netlink.c | 118 ++++++++++++++++++++++++++++-------------
>  3 files changed, 140 insertions(+), 91 deletions(-)
>
> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> index 259aea9..2e8c372 100644
> --- a/net/openvswitch/actions.c
> +++ b/net/openvswitch/actions.c
> @@ -930,71 +930,52 @@ static int output_userspace(struct datapath *dp, struct sk_buff *skb,
>  }
>
>  static int sample(struct datapath *dp, struct sk_buff *skb,
> -                 struct sw_flow_key *key, const struct nlattr *attr,
> -                 const struct nlattr *actions, int actions_len)
> +                 struct sw_flow_key *key, const struct nlattr *attr)
>  {
> -       const struct nlattr *acts_list = NULL;
> -       const struct nlattr *a;
> -       int rem;
> -       u32 cutlen = 0;
> -
> -       for (a = nla_data(attr), rem = nla_len(attr); rem > 0;
> -                a = nla_next(a, &rem)) {
> -               u32 probability;
> -
> -               switch (nla_type(a)) {
> -               case OVS_SAMPLE_ATTR_PROBABILITY:
> -                       probability = nla_get_u32(a);
> -                       if (!probability || prandom_u32() > probability)
> -                               return 0;
> -                       break;
> -
> -               case OVS_SAMPLE_ATTR_ACTIONS:
> -                       acts_list = a;
> -                       break;
> -               }
> -       }
> +       struct nlattr *actions;
> +       struct nlattr *sample_arg;
> +       struct sw_flow_key *orig = key;
> +       int rem = nla_len(attr);
> +       int err = 0;
> +       const struct sample_arg *arg;
>
> -       rem = nla_len(acts_list);
> -       a = nla_data(acts_list);
> +       /* The first action is always 'OVS_SAMPLE_ATTR_AUX'. */

Is it? This is the only reference to OVS_SAMPLE_ATTR_AUX I can see.

> +       sample_arg = nla_data(attr);

We could do this in the parent call, like several other actions do.

<snip>

> @@ -1246,9 +1227,24 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
>                         err = execute_masked_set_action(skb, key, nla_data(a));
>                         break;
>
> -               case OVS_ACTION_ATTR_SAMPLE:
> -                       err = sample(dp, skb, key, a, attr, len);
> +               case OVS_ACTION_ATTR_SAMPLE: {
> +                       bool last = nla_is_last(a, rem);
> +                       struct sk_buff *clone_skb;
> +
> +                       clone_skb = last ? skb : skb_clone(skb, GFP_ATOMIC);
> +
> +                       if (!clone_skb)
> +                               /* Out of memory, skip this sample action.
> +                                */

Should we ratelimited complain in this case?

> +                               break;
> +
> +                       err = sample(dp, clone_skb, key, a);
> +                       if (last)
> +                               return err;

Given that sample() doesn't guarantee it will free 'clone_skb', I
don't think this is safe.

> +
> +                       pr_err("executing sample");

Useful for debugging, but don't forget to drop for next version.

>                         break;
> +                       }

This bracket is typically aligned with the character of the beginning
of the 'case' that it applies to.

>
>                 case OVS_ACTION_ATTR_CT:
>                         if (!is_flow_key_valid(key)) {
> diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h
> index 1c6e937..d7a6e803 100644
> --- a/net/openvswitch/datapath.h
> +++ b/net/openvswitch/datapath.h
> @@ -220,4 +220,11 @@ int ovs_execute_actions(struct datapath *dp, struct sk_buff *skb,
>         if (logging_allowed && net_ratelimit())                 \
>                 pr_info("netlink: " fmt "\n", ##__VA_ARGS__);   \
>  } while (0)
> +
> +#define OVS_SAMPLE_ATTR_ARG  (OVS_ACTION_ATTR_MAX + 1)

This should be defined in include/uapi/linux/openvswitch.h, protected
by #ifdef __KERNEL__ like the other internal formats. While I don't
think it actually conflicts with any other internal action format,
these should be in one place so we don't end up accidentally using the
same enum twice.

> +struct sample_arg {
> +       bool exec;
> +       u32  probability;
> +};

Perhaps we should move this to either net/openvswitch/flow_netlink.h
or include/linux/openvswitch.h since it's only for consumption
internally by the kernel module?

> +
>  #endif /* datapath.h */
> diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
> index 6f5fa50..0e7cf13 100644
> --- a/net/openvswitch/flow_netlink.c
> +++ b/net/openvswitch/flow_netlink.c
> @@ -1,5 +1,5 @@
>  /*
> - * Copyright (c) 2007-2014 Nicira, Inc.
> + * Copyright (c) 2007-2017 Nicira, Inc.
>   *
>   * This program is free software; you can redistribute it and/or
>   * modify it under the terms of version 2 of the GNU General Public
> @@ -59,6 +59,39 @@ struct ovs_len_tbl {
>  #define OVS_ATTR_NESTED -1
>  #define OVS_ATTR_VARIABLE -2
>
> +static bool actions_may_change_flow(const struct nlattr *actions)
> +{
> +       struct nlattr *nla;
> +       int rem;
> +
> +       nla_for_each_nested(nla, actions, rem) {
> +               u16 action = nla_type(nla);
> +
> +               switch (action) {
> +               case OVS_ACTION_ATTR_OUTPUT:
> +               case OVS_ACTION_ATTR_RECIRC:
> +               case OVS_ACTION_ATTR_USERSPACE:
> +               case OVS_ACTION_ATTR_SAMPLE:
> +               case OVS_ACTION_ATTR_TRUNC:

Trunc doesn't change the flow, but if there's something like
sample(output,trunc),output() then I wouldn't expect the final output
to be truncated.

<snip>

> @@ -2621,39 +2670,36 @@ int ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
>         return err;
>  }
>
> -static int sample_action_to_attr(const struct nlattr *attr, struct sk_buff *skb)
> +static int sample_action_to_attr(const struct nlattr *attr,
> +                                struct sk_buff *skb)
>  {
> -       const struct nlattr *a;
> -       struct nlattr *start;
> -       int err = 0, rem;
> +       struct nlattr *start, *ac_start, *sample_arg;
> +       int err = 0, rem = nla_len(attr);
> +       const struct sample_arg *arg;
> +       struct nlattr *actions;
>
>         start = nla_nest_start(skb, OVS_ACTION_ATTR_SAMPLE);
>         if (!start)
>                 return -EMSGSIZE;
>
> -       nla_for_each_nested(a, attr, rem) {
> -               int type = nla_type(a);
> -               struct nlattr *st_sample;
> +       sample_arg = nla_data(attr);
> +       arg = nla_data(sample_arg);
> +       actions = nla_next(sample_arg, &rem);
>
> -               switch (type) {
> -               case OVS_SAMPLE_ATTR_PROBABILITY:
> -                       if (nla_put(skb, OVS_SAMPLE_ATTR_PROBABILITY,
> -                                   sizeof(u32), nla_data(a)))
> -                               return -EMSGSIZE;
> -                       break;
> -               case OVS_SAMPLE_ATTR_ACTIONS:
> -                       st_sample = nla_nest_start(skb, OVS_SAMPLE_ATTR_ACTIONS);
> -                       if (!st_sample)
> -                               return -EMSGSIZE;
> -                       err = ovs_nla_put_actions(nla_data(a), nla_len(a), skb);
> -                       if (err)
> -                               return err;
> -                       nla_nest_end(skb, st_sample);
> -                       break;
> -               }
> -       }
> +       if (nla_put_u32(skb, OVS_SAMPLE_ATTR_PROBABILITY, arg->probability))
> +               return -EMSGSIZE;
>
> +       ac_start = nla_nest_start(skb, OVS_SAMPLE_ATTR_ACTIONS);
> +       if (!ac_start)
> +               return -EMSGSIZE;
> +
> +       err = ovs_nla_put_actions(actions, rem, skb);
> +       if (err)
> +               return err;

Shouldn't we nla_nest_cancel() when something fails in the middle of
nesting nla attributes? For example I believe that upcall will attempt
to serialize actions, but if actions don't fit in the buffer then
it'll just skip the actions and forward everything else.

> +
> +       nla_nest_end(skb, ac_start);
>         nla_nest_end(skb, start);
> +
>         return err;
>  }
>
> --
> 1.8.3.1
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC net-next sample action optimization 2/3] openvswitch: Refactor recirc key allocation.
  2017-03-09 19:11   ` Joe Stringer
@ 2017-03-10 21:44     ` Andy Zhou
  0 siblings, 0 replies; 10+ messages in thread
From: Andy Zhou @ 2017-03-10 21:44 UTC (permalink / raw)
  To: Joe Stringer; +Cc: netdev, pravin shelar

On Thu, Mar 9, 2017 at 11:11 AM, Joe Stringer <joe@ovn.org> wrote:
> On 7 March 2017 at 16:15, Andy Zhou <azhou@ovn.org> wrote:
>> The logic of allocating and copy key for each 'exec_actions_level'
>> was specific to execute_recirc(). However, future patches will reuse
>> as well.  Refactor the logic into its own function clone_key().
>>
>> Signed-off-by: Andy Zhou <azhou@ovn.org>
>> ---
>
> <snip>
>
>> @@ -83,14 +83,32 @@ struct action_fifo {
>>         struct deferred_action fifo[DEFERRED_ACTION_FIFO_SIZE];
>>  };
>>
>> -struct recirc_keys {
>> -       struct sw_flow_key key[OVS_DEFERRED_ACTION_THRESHOLD];
>> +struct action_flow_keys {
>> +       struct sw_flow_key key[OVS_ACTION_RECURSION_THRESHOLD];
>>  };
>
> I thought the old struct name was clearer on how it would be used -
> for when actions are deferred.

O.K. I will revert it.
>
>>
>>  static struct action_fifo __percpu *action_fifos;
>> -static struct recirc_keys __percpu *recirc_keys;
>> +static struct action_flow_keys __percpu *flow_keys;
>>  static DEFINE_PER_CPU(int, exec_actions_level);
>>
>> +/* Make a clone of the 'key', using the pre-allocated percpu 'flow_keys'
>> + * space. Since the storage is pre-allocated, the caller does not
>> + * need to check for NULL return pointer.
>> + */
>
> Hmm? if level > OVS_ACTION_RECURSION_THRESHOLD, this function returns NULL.
Thanks for catching this. I will update the comment.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC net-next sample action optimization 3/3] openvswitch: Optimize sample action for the clone use cases
  2017-03-09 19:46   ` Joe Stringer
@ 2017-03-10 22:07     ` Andy Zhou
  2017-03-10 23:12       ` Joe Stringer
  0 siblings, 1 reply; 10+ messages in thread
From: Andy Zhou @ 2017-03-10 22:07 UTC (permalink / raw)
  To: Joe Stringer; +Cc: netdev, pravin shelar

On Thu, Mar 9, 2017 at 11:46 AM, Joe Stringer <joe@ovn.org> wrote:
> On 7 March 2017 at 16:15, Andy Zhou <azhou@ovn.org> wrote:
>> With the introduction of open flow 'clone' action, the OVS user space
>> can now translate the 'clone' action into kernel datapath 'sample'
>> action, with 100% probability, to ensure that the clone semantics,
>> which is that the packet seen by the clone action is the same as the
>> packet seen by the action after clone, is faithfully carried out
>> in the datapath.
>>
>> While the sample action in the datpath has the matching semantics,
>> its implementation is only optimized for its original use.
>> Specifically, there are two limitation: First, there is a 3 level of
>> nesting restriction, enforced at the flow downloading time. This
>> limit turns out to be too restrictive for the 'clone' use case.
>> Second, the implementation avoid recursive call only if the sample
>> action list has a single userspace action.
>>
>> The optimization implemented in this series removes the static
>> nesting limit check, instead, implement the run time recursion limit
>> check, and recursion avoidance similar to that of the 'recirc' action.
>> This optimization solve both #1 and #2 issues above.
>>
>> Another optimization implemented is to avoid coping flow key as
>
> *copying
>
>> long as the actions enclosed does not change the flow key. The
>> detection is performed only once at the flow downloading time.
>>
>> The third optimization implemented is to rewrite the action list
>> at flow downloading time in order to save the fast path from parsing
>> the sample action list in its original form repeatedly.
>
> Whenever there is an enumeration (1, 2, 3; ..another..., third thing
> implemented) in a commit message, I have to ask whether each "another
> change..." should be a separate patch. It certainly makes it easier to
> review.
>
They are all part of the same implementation. Spliting them probably won't
make much sense. I think I will drop #2 and #3 in the commit message since
#1 is the main optimization.

> I ran this through the OVS kernel tests and it's working correctly
> from that point of view, but I didn't get a chance to dig in and
> ensure for example, runtime behaviour of several nested
> sample(actions(sample(actions(sample(actions(output)))))) handles
> reasonably when it runs out of stack and deferred actions space. At a
> high level though, this seems pretty straightforward.
>
> Several comments below, thanks.
>
>>
>> Signed-off-by: Andy Zhou <azhou@ovn.org>
>> ---
>>  net/openvswitch/actions.c      | 106 ++++++++++++++++++------------------
>>  net/openvswitch/datapath.h     |   7 +++
>>  net/openvswitch/flow_netlink.c | 118 ++++++++++++++++++++++++++++-------------
>>  3 files changed, 140 insertions(+), 91 deletions(-)
>>
>> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
>> index 259aea9..2e8c372 100644
>> --- a/net/openvswitch/actions.c
>> +++ b/net/openvswitch/actions.c
>> @@ -930,71 +930,52 @@ static int output_userspace(struct datapath *dp, struct sk_buff *skb,
>>  }
>>
>>  static int sample(struct datapath *dp, struct sk_buff *skb,
>> -                 struct sw_flow_key *key, const struct nlattr *attr,
>> -                 const struct nlattr *actions, int actions_len)
>> +                 struct sw_flow_key *key, const struct nlattr *attr)
>>  {
>> -       const struct nlattr *acts_list = NULL;
>> -       const struct nlattr *a;
>> -       int rem;
>> -       u32 cutlen = 0;
>> -
>> -       for (a = nla_data(attr), rem = nla_len(attr); rem > 0;
>> -                a = nla_next(a, &rem)) {
>> -               u32 probability;
>> -
>> -               switch (nla_type(a)) {
>> -               case OVS_SAMPLE_ATTR_PROBABILITY:
>> -                       probability = nla_get_u32(a);
>> -                       if (!probability || prandom_u32() > probability)
>> -                               return 0;
>> -                       break;
>> -
>> -               case OVS_SAMPLE_ATTR_ACTIONS:
>> -                       acts_list = a;
>> -                       break;
>> -               }
>> -       }
>> +       struct nlattr *actions;
>> +       struct nlattr *sample_arg;
>> +       struct sw_flow_key *orig = key;
>> +       int rem = nla_len(attr);
>> +       int err = 0;
>> +       const struct sample_arg *arg;
>>
>> -       rem = nla_len(acts_list);
>> -       a = nla_data(acts_list);
>> +       /* The first action is always 'OVS_SAMPLE_ATTR_AUX'. */
>
> Is it? This is the only reference to OVS_SAMPLE_ATTR_AUX I can see.
>
>> +       sample_arg = nla_data(attr);
>
> We could do this in the parent call, like several other actions do.

What do you mean?

>
> <snip>
>
>> @@ -1246,9 +1227,24 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
>>                         err = execute_masked_set_action(skb, key, nla_data(a));
>>                         break;
>>
>> -               case OVS_ACTION_ATTR_SAMPLE:
>> -                       err = sample(dp, skb, key, a, attr, len);
>> +               case OVS_ACTION_ATTR_SAMPLE: {
>> +                       bool last = nla_is_last(a, rem);
>> +                       struct sk_buff *clone_skb;
>> +
>> +                       clone_skb = last ? skb : skb_clone(skb, GFP_ATOMIC);
>> +
>> +                       if (!clone_skb)
>> +                               /* Out of memory, skip this sample action.
>> +                                */
>
> Should we ratelimited complain in this case?

The DP code has never logged against skb_clone() failures.

>
>> +                               break;
>> +
>> +                       err = sample(dp, clone_skb, key, a);
>> +                       if (last)
>> +                               return err;
>
> Given that sample() doesn't guarantee it will free 'clone_skb', I
> don't think this is safe.

Good point, I will move to clone into the sample() function, and only clone
after the probability test passes.

>
>> +
>> +                       pr_err("executing sample");
>
> Useful for debugging, but don't forget to drop for next version.
Sure
>
>>                         break;
>> +                       }
>
> This bracket is typically aligned with the character of the beginning
> of the 'case' that it applies to.
>
>>
>>                 case OVS_ACTION_ATTR_CT:
>>                         if (!is_flow_key_valid(key)) {
>> diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h
>> index 1c6e937..d7a6e803 100644
>> --- a/net/openvswitch/datapath.h
>> +++ b/net/openvswitch/datapath.h
>> @@ -220,4 +220,11 @@ int ovs_execute_actions(struct datapath *dp, struct sk_buff *skb,
>>         if (logging_allowed && net_ratelimit())                 \
>>                 pr_info("netlink: " fmt "\n", ##__VA_ARGS__);   \
>>  } while (0)
>> +
>> +#define OVS_SAMPLE_ATTR_ARG  (OVS_ACTION_ATTR_MAX + 1)
>
> This should be defined in include/uapi/linux/openvswitch.h, protected
> by #ifdef __KERNEL__ like the other internal formats. While I don't
> think it actually conflicts with any other internal action format,
> these should be in one place so we don't end up accidentally using the
> same enum twice.
>
O.K.

>> +struct sample_arg {
>> +       bool exec;
>> +       u32  probability;
>> +};
>
> Perhaps we should move this to either net/openvswitch/flow_netlink.h
> or include/linux/openvswitch.h since it's only for consumption
> internally by the kernel module?

O.K.  I will move it into openvswitch.h
>
>> +
>>  #endif /* datapath.h */
>> diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
>> index 6f5fa50..0e7cf13 100644
>> --- a/net/openvswitch/flow_netlink.c
>> +++ b/net/openvswitch/flow_netlink.c
>> @@ -1,5 +1,5 @@
>>  /*
>> - * Copyright (c) 2007-2014 Nicira, Inc.
>> + * Copyright (c) 2007-2017 Nicira, Inc.
>>   *
>>   * This program is free software; you can redistribute it and/or
>>   * modify it under the terms of version 2 of the GNU General Public
>> @@ -59,6 +59,39 @@ struct ovs_len_tbl {
>>  #define OVS_ATTR_NESTED -1
>>  #define OVS_ATTR_VARIABLE -2
>>
>> +static bool actions_may_change_flow(const struct nlattr *actions)
>> +{
>> +       struct nlattr *nla;
>> +       int rem;
>> +
>> +       nla_for_each_nested(nla, actions, rem) {
>> +               u16 action = nla_type(nla);
>> +
>> +               switch (action) {
>> +               case OVS_ACTION_ATTR_OUTPUT:
>> +               case OVS_ACTION_ATTR_RECIRC:
>> +               case OVS_ACTION_ATTR_USERSPACE:
>> +               case OVS_ACTION_ATTR_SAMPLE:
>> +               case OVS_ACTION_ATTR_TRUNC:
>
> Trunc doesn't change the flow, but if there's something like
> sample(output,trunc),output() then I wouldn't expect the final output
> to be truncated.

It should not, since skb is not seen by the 2nd output().
>
> <snip>
>
>> @@ -2621,39 +2670,36 @@ int ovs_nla_copy_actions(struct net *net, const struct nlattr *attr,
>>         return err;
>>  }
>>
>> -static int sample_action_to_attr(const struct nlattr *attr, struct sk_buff *skb)
>> +static int sample_action_to_attr(const struct nlattr *attr,
>> +                                struct sk_buff *skb)
>>  {
>> -       const struct nlattr *a;
>> -       struct nlattr *start;
>> -       int err = 0, rem;
>> +       struct nlattr *start, *ac_start, *sample_arg;
>> +       int err = 0, rem = nla_len(attr);
>> +       const struct sample_arg *arg;
>> +       struct nlattr *actions;
>>
>>         start = nla_nest_start(skb, OVS_ACTION_ATTR_SAMPLE);
>>         if (!start)
>>                 return -EMSGSIZE;
>>
>> -       nla_for_each_nested(a, attr, rem) {
>> -               int type = nla_type(a);
>> -               struct nlattr *st_sample;
>> +       sample_arg = nla_data(attr);
>> +       arg = nla_data(sample_arg);
>> +       actions = nla_next(sample_arg, &rem);
>>
>> -               switch (type) {
>> -               case OVS_SAMPLE_ATTR_PROBABILITY:
>> -                       if (nla_put(skb, OVS_SAMPLE_ATTR_PROBABILITY,
>> -                                   sizeof(u32), nla_data(a)))
>> -                               return -EMSGSIZE;
>> -                       break;
>> -               case OVS_SAMPLE_ATTR_ACTIONS:
>> -                       st_sample = nla_nest_start(skb, OVS_SAMPLE_ATTR_ACTIONS);
>> -                       if (!st_sample)
>> -                               return -EMSGSIZE;
>> -                       err = ovs_nla_put_actions(nla_data(a), nla_len(a), skb);
>> -                       if (err)
>> -                               return err;
>> -                       nla_nest_end(skb, st_sample);
>> -                       break;
>> -               }
>> -       }
>> +       if (nla_put_u32(skb, OVS_SAMPLE_ATTR_PROBABILITY, arg->probability))
>> +               return -EMSGSIZE;
>>
>> +       ac_start = nla_nest_start(skb, OVS_SAMPLE_ATTR_ACTIONS);
>> +       if (!ac_start)
>> +               return -EMSGSIZE;
>> +
>> +       err = ovs_nla_put_actions(actions, rem, skb);
>> +       if (err)
>> +               return err;
>
> Shouldn't we nla_nest_cancel() when something fails in the middle of
> nesting nla attributes? For example I believe that upcall will attempt
> to serialize actions, but if actions don't fit in the buffer then
> it'll just skip the actions and forward everything else.

Good catch.  Will fix.
>
>> +
>> +       nla_nest_end(skb, ac_start);
>>         nla_nest_end(skb, start);
>> +
>>         return err;
>>  }
>>
>> --
>> 1.8.3.1
>>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC net-next sample action optimization 3/3] openvswitch: Optimize sample action for the clone use cases
  2017-03-10 22:07     ` Andy Zhou
@ 2017-03-10 23:12       ` Joe Stringer
  0 siblings, 0 replies; 10+ messages in thread
From: Joe Stringer @ 2017-03-10 23:12 UTC (permalink / raw)
  To: Andy Zhou; +Cc: netdev, pravin shelar

On 10 March 2017 at 14:07, Andy Zhou <azhou@ovn.org> wrote:
> On Thu, Mar 9, 2017 at 11:46 AM, Joe Stringer <joe@ovn.org> wrote:
>> On 7 March 2017 at 16:15, Andy Zhou <azhou@ovn.org> wrote:
>>> With the introduction of open flow 'clone' action, the OVS user space
>>> can now translate the 'clone' action into kernel datapath 'sample'
>>> action, with 100% probability, to ensure that the clone semantics,
>>> which is that the packet seen by the clone action is the same as the
>>> packet seen by the action after clone, is faithfully carried out
>>> in the datapath.
>>>
>>> While the sample action in the datpath has the matching semantics,
>>> its implementation is only optimized for its original use.
>>> Specifically, there are two limitation: First, there is a 3 level of
>>> nesting restriction, enforced at the flow downloading time. This
>>> limit turns out to be too restrictive for the 'clone' use case.
>>> Second, the implementation avoid recursive call only if the sample
>>> action list has a single userspace action.
>>>
>>> The optimization implemented in this series removes the static
>>> nesting limit check, instead, implement the run time recursion limit
>>> check, and recursion avoidance similar to that of the 'recirc' action.
>>> This optimization solve both #1 and #2 issues above.
>>>
>>> Another optimization implemented is to avoid coping flow key as
>>
>> *copying
>>
>>> long as the actions enclosed does not change the flow key. The
>>> detection is performed only once at the flow downloading time.
>>>
>>> The third optimization implemented is to rewrite the action list
>>> at flow downloading time in order to save the fast path from parsing
>>> the sample action list in its original form repeatedly.
>>
>> Whenever there is an enumeration (1, 2, 3; ..another..., third thing
>> implemented) in a commit message, I have to ask whether each "another
>> change..." should be a separate patch. It certainly makes it easier to
>> review.
>>
> They are all part of the same implementation. Spliting them probably won't
> make much sense. I think I will drop #2 and #3 in the commit message since
> #1 is the main optimization.

Fair enough. You don't have to drop them from the commit message, it
makes the intention of all of the changes more clear.

>> I ran this through the OVS kernel tests and it's working correctly
>> from that point of view, but I didn't get a chance to dig in and
>> ensure for example, runtime behaviour of several nested
>> sample(actions(sample(actions(sample(actions(output)))))) handles
>> reasonably when it runs out of stack and deferred actions space. At a
>> high level though, this seems pretty straightforward.
>>
>> Several comments below, thanks.
>>
>>>
>>> Signed-off-by: Andy Zhou <azhou@ovn.org>
>>> ---
>>>  net/openvswitch/actions.c      | 106 ++++++++++++++++++------------------
>>>  net/openvswitch/datapath.h     |   7 +++
>>>  net/openvswitch/flow_netlink.c | 118 ++++++++++++++++++++++++++++-------------
>>>  3 files changed, 140 insertions(+), 91 deletions(-)
>>>
>>> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
>>> index 259aea9..2e8c372 100644
>>> --- a/net/openvswitch/actions.c
>>> +++ b/net/openvswitch/actions.c
>>> @@ -930,71 +930,52 @@ static int output_userspace(struct datapath *dp, struct sk_buff *skb,
>>>  }
>>>
>>>  static int sample(struct datapath *dp, struct sk_buff *skb,
>>> -                 struct sw_flow_key *key, const struct nlattr *attr,
>>> -                 const struct nlattr *actions, int actions_len)
>>> +                 struct sw_flow_key *key, const struct nlattr *attr)
>>>  {
>>> -       const struct nlattr *acts_list = NULL;
>>> -       const struct nlattr *a;
>>> -       int rem;
>>> -       u32 cutlen = 0;
>>> -
>>> -       for (a = nla_data(attr), rem = nla_len(attr); rem > 0;
>>> -                a = nla_next(a, &rem)) {
>>> -               u32 probability;
>>> -
>>> -               switch (nla_type(a)) {
>>> -               case OVS_SAMPLE_ATTR_PROBABILITY:
>>> -                       probability = nla_get_u32(a);
>>> -                       if (!probability || prandom_u32() > probability)
>>> -                               return 0;
>>> -                       break;
>>> -
>>> -               case OVS_SAMPLE_ATTR_ACTIONS:
>>> -                       acts_list = a;
>>> -                       break;
>>> -               }
>>> -       }
>>> +       struct nlattr *actions;
>>> +       struct nlattr *sample_arg;
>>> +       struct sw_flow_key *orig = key;
>>> +       int rem = nla_len(attr);
>>> +       int err = 0;
>>> +       const struct sample_arg *arg;
>>>
>>> -       rem = nla_len(acts_list);
>>> -       a = nla_data(acts_list);
>>> +       /* The first action is always 'OVS_SAMPLE_ATTR_AUX'. */
>>
>> Is it? This is the only reference to OVS_SAMPLE_ATTR_AUX I can see.
>>
>>> +       sample_arg = nla_data(attr);
>>
>> We could do this in the parent call, like several other actions do.
>
> What do you mean?

In do_execute_actions(), call like this:

err = sample(dp, clone_skb, key, nla_data(a));

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2017-03-10 23:13 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-08  0:15 [RFC net-next sample action optimization 0/3] Andy Zhou
2017-03-08  0:15 ` [RFC net-next sample action optimization 1/3] openvswitch: deferred fifo api change Andy Zhou
2017-03-09 19:08   ` Joe Stringer
2017-03-08  0:15 ` [RFC net-next sample action optimization 2/3] openvswitch: Refactor recirc key allocation Andy Zhou
2017-03-09 19:11   ` Joe Stringer
2017-03-10 21:44     ` Andy Zhou
2017-03-08  0:15 ` [RFC net-next sample action optimization 3/3] openvswitch: Optimize sample action for the clone use cases Andy Zhou
2017-03-09 19:46   ` Joe Stringer
2017-03-10 22:07     ` Andy Zhou
2017-03-10 23:12       ` Joe Stringer

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.