netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next v2 00/10] optimize openvswitch flow looking up
@ 2019-10-08  1:00 xiangxia.m.yue
  2019-10-08  1:00 ` [PATCH net-next v2 01/10] net: openvswitch: add flow-mask cache for performance xiangxia.m.yue
                   ` (10 more replies)
  0 siblings, 11 replies; 16+ messages in thread
From: xiangxia.m.yue @ 2019-10-08  1:00 UTC (permalink / raw)
  To: gvrose8192, pshelar; +Cc: netdev, Tonghao Zhang

From: Tonghao Zhang <xiangxia.m.yue@gmail.com>

This series patch optimize openvswitch for performance or simplify
codes.

Patch 1, 2, 4: Port Pravin B Shelar patches to
linux upstream with little changes.

Patch 5, 6, 7: Optimize the flow looking up and
simplify the flow hash.

Patch 8, 9: are bugfix.

The performance test is on Intel Xeon E5-2630 v4.
The test topology is show as below:

+-----------------------------------+
|   +---------------------------+   |
|   | eth0   ovs-switch    eth1 |   | Host0
|   +---------------------------+   |
+-----------------------------------+
      ^                       |
      |                       |
      |                       |
      |                       |
      |                       v
+-----+----+             +----+-----+
| netperf  | Host1       | netserver| Host2
+----------+             +----------+

We use netperf send the 64B packets, and insert 255+ flow-mask:
$ ovs-dpctl add-flow ovs-switch "in_port(1),eth(dst=00:01:00:00:00:00/ff:ff:ff:ff:ff:01),eth_type(0x0800),ipv4(frag=no)" 2
...
$ ovs-dpctl add-flow ovs-switch "in_port(1),eth(dst=00:ff:00:00:00:00/ff:ff:ff:ff:ff:ff),eth_type(0x0800),ipv4(frag=no)" 2
$
$ netperf -t UDP_STREAM -H 2.2.2.200 -l 40 -- -m 18

* Without series patch, throughput 8.28Mbps
* With series patch, throughput 46.05Mbps

v1 -> v2:
1. use kfree_rcu instead of call_rcu.
2. add barrier when changing the ma->count.
3. change the ma->max to ma->count in flow_lookup. 

Tonghao Zhang (10):
  net: openvswitch: add flow-mask cache for performance
  net: openvswitch: convert mask list in mask array
  net: openvswitch: shrink the mask array if necessary
  net: openvswitch: optimize flow-mask cache hash collision
  net: openvswitch: optimize flow-mask looking up
  net: openvswitch: simplify the flow_hash
  net: openvswitch: add likely in flow_lookup
  net: openvswitch: fix possible memleak on destroy flow-table
  net: openvswitch: don't unlock mutex when changing the user_features
    fails
  net: openvswitch: simplify the ovs_dp_cmd_new

 net/openvswitch/datapath.c   |  65 +++++----
 net/openvswitch/flow.h       |   1 -
 net/openvswitch/flow_table.c | 315 +++++++++++++++++++++++++++++++++++++------
 net/openvswitch/flow_table.h |  19 ++-
 4 files changed, 328 insertions(+), 72 deletions(-)

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH net-next v2 01/10] net: openvswitch: add flow-mask cache for performance
  2019-10-08  1:00 [PATCH net-next v2 00/10] optimize openvswitch flow looking up xiangxia.m.yue
@ 2019-10-08  1:00 ` xiangxia.m.yue
  2019-10-08  1:00 ` [PATCH net-next v2 02/10] net: openvswitch: convert mask list in mask array xiangxia.m.yue
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 16+ messages in thread
From: xiangxia.m.yue @ 2019-10-08  1:00 UTC (permalink / raw)
  To: gvrose8192, pshelar; +Cc: netdev, Tonghao Zhang

From: Tonghao Zhang <xiangxia.m.yue@gmail.com>

The idea of this optimization comes from a patch which
is committed in 2014, openvswitch community. The author
is Pravin B Shelar. In order to get high performance, I
implement it again. Later patches will use it.

Pravin B Shelar, says:
| On every packet OVS needs to lookup flow-table with every
| mask until it finds a match. The packet flow-key is first
| masked with mask in the list and then the masked key is
| looked up in flow-table. Therefore number of masks can
| affect packet processing performance.

Link: https://github.com/openvswitch/ovs/commit/5604935e4e1cbc16611d2d97f50b717aa31e8ec5
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
---
 net/openvswitch/datapath.c   |   3 +-
 net/openvswitch/flow_table.c | 109 +++++++++++++++++++++++++++++++++++++------
 net/openvswitch/flow_table.h |  11 ++++-
 3 files changed, 107 insertions(+), 16 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index f30e406..9fea7e1 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -227,7 +227,8 @@ void ovs_dp_process_packet(struct sk_buff *skb, struct sw_flow_key *key)
 	stats = this_cpu_ptr(dp->stats_percpu);
 
 	/* Look up flow. */
-	flow = ovs_flow_tbl_lookup_stats(&dp->table, key, &n_mask_hit);
+	flow = ovs_flow_tbl_lookup_stats(&dp->table, key, skb_get_hash(skb),
+					 &n_mask_hit);
 	if (unlikely(!flow)) {
 		struct dp_upcall_info upcall;
 
diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
index cf3582c..3d515c0 100644
--- a/net/openvswitch/flow_table.c
+++ b/net/openvswitch/flow_table.c
@@ -36,6 +36,10 @@
 #define TBL_MIN_BUCKETS		1024
 #define REHASH_INTERVAL		(10 * 60 * HZ)
 
+#define MC_HASH_SHIFT		8
+#define MC_HASH_ENTRIES		(1u << MC_HASH_SHIFT)
+#define MC_HASH_SEGS		((sizeof(uint32_t) * 8) / MC_HASH_SHIFT)
+
 static struct kmem_cache *flow_cache;
 struct kmem_cache *flow_stats_cache __read_mostly;
 
@@ -168,10 +172,15 @@ int ovs_flow_tbl_init(struct flow_table *table)
 {
 	struct table_instance *ti, *ufid_ti;
 
-	ti = table_instance_alloc(TBL_MIN_BUCKETS);
+	table->mask_cache = __alloc_percpu(sizeof(struct mask_cache_entry) *
+					   MC_HASH_ENTRIES,
+					   __alignof__(struct mask_cache_entry));
+	if (!table->mask_cache)
+		return -ENOMEM;
 
+	ti = table_instance_alloc(TBL_MIN_BUCKETS);
 	if (!ti)
-		return -ENOMEM;
+		goto free_mask_cache;
 
 	ufid_ti = table_instance_alloc(TBL_MIN_BUCKETS);
 	if (!ufid_ti)
@@ -187,6 +196,8 @@ int ovs_flow_tbl_init(struct flow_table *table)
 
 free_ti:
 	__table_instance_destroy(ti);
+free_mask_cache:
+	free_percpu(table->mask_cache);
 	return -ENOMEM;
 }
 
@@ -243,6 +254,7 @@ void ovs_flow_tbl_destroy(struct flow_table *table)
 	struct table_instance *ti = rcu_dereference_raw(table->ti);
 	struct table_instance *ufid_ti = rcu_dereference_raw(table->ufid_ti);
 
+	free_percpu(table->mask_cache);
 	table_instance_destroy(ti, ufid_ti, false);
 }
 
@@ -425,7 +437,8 @@ static bool ovs_flow_cmp_unmasked_key(const struct sw_flow *flow,
 
 static struct sw_flow *masked_flow_lookup(struct table_instance *ti,
 					  const struct sw_flow_key *unmasked,
-					  const struct sw_flow_mask *mask)
+					  const struct sw_flow_mask *mask,
+					  u32 *n_mask_hit)
 {
 	struct sw_flow *flow;
 	struct hlist_head *head;
@@ -435,6 +448,8 @@ static struct sw_flow *masked_flow_lookup(struct table_instance *ti,
 	ovs_flow_mask_key(&masked_key, unmasked, false, mask);
 	hash = flow_hash(&masked_key, &mask->range);
 	head = find_bucket(ti, hash);
+	(*n_mask_hit)++;
+
 	hlist_for_each_entry_rcu(flow, head, flow_table.node[ti->node_ver]) {
 		if (flow->mask == mask && flow->flow_table.hash == hash &&
 		    flow_cmp_masked_key(flow, &masked_key, &mask->range))
@@ -443,30 +458,97 @@ static struct sw_flow *masked_flow_lookup(struct table_instance *ti,
 	return NULL;
 }
 
-struct sw_flow *ovs_flow_tbl_lookup_stats(struct flow_table *tbl,
-				    const struct sw_flow_key *key,
-				    u32 *n_mask_hit)
+static struct sw_flow *flow_lookup(struct flow_table *tbl,
+				   struct table_instance *ti,
+				   const struct sw_flow_key *key,
+				   u32 *n_mask_hit)
 {
-	struct table_instance *ti = rcu_dereference_ovsl(tbl->ti);
 	struct sw_flow_mask *mask;
 	struct sw_flow *flow;
 
-	*n_mask_hit = 0;
 	list_for_each_entry_rcu(mask, &tbl->mask_list, list) {
-		(*n_mask_hit)++;
-		flow = masked_flow_lookup(ti, key, mask);
+		flow = masked_flow_lookup(ti, key, mask, n_mask_hit);
 		if (flow)  /* Found */
 			return flow;
 	}
 	return NULL;
 }
 
+/*
+ * mask_cache maps flow to probable mask. This cache is not tightly
+ * coupled cache, It means updates to  mask list can result in inconsistent
+ * cache entry in mask cache.
+ * This is per cpu cache and is divided in MC_HASH_SEGS segments.
+ * In case of a hash collision the entry is hashed in next segment.
+ * */
+struct sw_flow *ovs_flow_tbl_lookup_stats(struct flow_table *tbl,
+					  const struct sw_flow_key *key,
+					  u32 skb_hash,
+					  u32 *n_mask_hit)
+{
+	struct table_instance *ti = rcu_dereference_ovsl(tbl->ti);
+	struct mask_cache_entry  *entries, *ce, *del;
+	struct sw_flow *flow;
+	u32 hash = skb_hash;
+	int seg;
+
+	*n_mask_hit = 0;
+	if (unlikely(!skb_hash))
+		return flow_lookup(tbl, ti, key, n_mask_hit);
+
+	del = NULL;
+	entries = this_cpu_ptr(tbl->mask_cache);
+
+	for (seg = 0; seg < MC_HASH_SEGS; seg++) {
+		int index;
+
+		index = hash & (MC_HASH_ENTRIES - 1);
+		ce = &entries[index];
+
+		if (ce->skb_hash == skb_hash) {
+			struct sw_flow_mask *mask;
+			int i;
+
+			i = 0;
+			list_for_each_entry_rcu(mask, &tbl->mask_list, list) {
+				if (ce->mask_index == i++) {
+					flow = masked_flow_lookup(ti, key, mask,
+								  n_mask_hit);
+					if (flow)  /* Found */
+						return flow;
+
+					break;
+				}
+			}
+
+			del = ce;
+			break;
+		}
+
+		if (!del || (del->skb_hash && !ce->skb_hash)) {
+			del = ce;
+		}
+
+		hash >>= MC_HASH_SHIFT;
+	}
+
+	flow = flow_lookup(tbl, ti, key, n_mask_hit);
+
+	if (flow) {
+		del->skb_hash = skb_hash;
+		del->mask_index = (*n_mask_hit - 1);
+	}
+
+	return flow;
+}
+
 struct sw_flow *ovs_flow_tbl_lookup(struct flow_table *tbl,
 				    const struct sw_flow_key *key)
 {
+	struct table_instance *ti = rcu_dereference_ovsl(tbl->ti);
 	u32 __always_unused n_mask_hit;
 
-	return ovs_flow_tbl_lookup_stats(tbl, key, &n_mask_hit);
+	return flow_lookup(tbl, ti, key, &n_mask_hit);
 }
 
 struct sw_flow *ovs_flow_tbl_lookup_exact(struct flow_table *tbl,
@@ -475,10 +557,11 @@ struct sw_flow *ovs_flow_tbl_lookup_exact(struct flow_table *tbl,
 	struct table_instance *ti = rcu_dereference_ovsl(tbl->ti);
 	struct sw_flow_mask *mask;
 	struct sw_flow *flow;
+	u32 __always_unused n_mask_hit;
 
 	/* Always called under ovs-mutex. */
 	list_for_each_entry(mask, &tbl->mask_list, list) {
-		flow = masked_flow_lookup(ti, match->key, mask);
+		flow = masked_flow_lookup(ti, match->key, mask, &n_mask_hit);
 		if (flow && ovs_identifier_is_key(&flow->id) &&
 		    ovs_flow_cmp_unmasked_key(flow, match))
 			return flow;
@@ -631,7 +714,7 @@ static int flow_mask_insert(struct flow_table *tbl, struct sw_flow *flow,
 			return -ENOMEM;
 		mask->key = new->key;
 		mask->range = new->range;
-		list_add_rcu(&mask->list, &tbl->mask_list);
+		list_add_tail_rcu(&mask->list, &tbl->mask_list);
 	} else {
 		BUG_ON(!mask->ref_count);
 		mask->ref_count++;
diff --git a/net/openvswitch/flow_table.h b/net/openvswitch/flow_table.h
index bc52045..04b6b1c 100644
--- a/net/openvswitch/flow_table.h
+++ b/net/openvswitch/flow_table.h
@@ -22,6 +22,11 @@
 
 #include "flow.h"
 
+struct mask_cache_entry {
+	u32 skb_hash;
+	u32 mask_index;
+};
+
 struct table_instance {
 	struct hlist_head *buckets;
 	unsigned int n_buckets;
@@ -34,6 +39,7 @@ struct table_instance {
 struct flow_table {
 	struct table_instance __rcu *ti;
 	struct table_instance __rcu *ufid_ti;
+	struct mask_cache_entry __percpu *mask_cache;
 	struct list_head mask_list;
 	unsigned long last_rehash;
 	unsigned int count;
@@ -60,8 +66,9 @@ int ovs_flow_tbl_insert(struct flow_table *table, struct sw_flow *flow,
 struct sw_flow *ovs_flow_tbl_dump_next(struct table_instance *table,
 				       u32 *bucket, u32 *idx);
 struct sw_flow *ovs_flow_tbl_lookup_stats(struct flow_table *,
-				    const struct sw_flow_key *,
-				    u32 *n_mask_hit);
+					  const struct sw_flow_key *,
+					  u32 skb_hash,
+					  u32 *n_mask_hit);
 struct sw_flow *ovs_flow_tbl_lookup(struct flow_table *,
 				    const struct sw_flow_key *);
 struct sw_flow *ovs_flow_tbl_lookup_exact(struct flow_table *tbl,
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH net-next v2 02/10] net: openvswitch: convert mask list in mask array
  2019-10-08  1:00 [PATCH net-next v2 00/10] optimize openvswitch flow looking up xiangxia.m.yue
  2019-10-08  1:00 ` [PATCH net-next v2 01/10] net: openvswitch: add flow-mask cache for performance xiangxia.m.yue
@ 2019-10-08  1:00 ` xiangxia.m.yue
  2019-10-08  1:00 ` [PATCH net-next v2 03/10] net: openvswitch: shrink the mask array if necessary xiangxia.m.yue
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 16+ messages in thread
From: xiangxia.m.yue @ 2019-10-08  1:00 UTC (permalink / raw)
  To: gvrose8192, pshelar; +Cc: netdev, Tonghao Zhang

From: Tonghao Zhang <xiangxia.m.yue@gmail.com>

Port the codes to linux upstream and with little changes.

Pravin B Shelar, says:
| mask caches index of mask in mask_list. On packet recv OVS
| need to traverse mask-list to get cached mask. Therefore array
| is better for retrieving cached mask. This also allows better
| cache replacement algorithm by directly checking mask's existence.

Link: https://github.com/openvswitch/ovs/commit/d49fc3ff53c65e4eca9cabd52ac63396746a7ef5
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
---
 net/openvswitch/flow.h       |   1 -
 net/openvswitch/flow_table.c | 210 ++++++++++++++++++++++++++++++++-----------
 net/openvswitch/flow_table.h |   8 +-
 3 files changed, 167 insertions(+), 52 deletions(-)

diff --git a/net/openvswitch/flow.h b/net/openvswitch/flow.h
index b830d5f..8080518 100644
--- a/net/openvswitch/flow.h
+++ b/net/openvswitch/flow.h
@@ -166,7 +166,6 @@ struct sw_flow_key_range {
 struct sw_flow_mask {
 	int ref_count;
 	struct rcu_head rcu;
-	struct list_head list;
 	struct sw_flow_key_range range;
 	struct sw_flow_key key;
 };
diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
index 3d515c0..aab7a27 100644
--- a/net/openvswitch/flow_table.c
+++ b/net/openvswitch/flow_table.c
@@ -34,6 +34,7 @@
 #include <net/ndisc.h>
 
 #define TBL_MIN_BUCKETS		1024
+#define MASK_ARRAY_SIZE_MIN	16
 #define REHASH_INTERVAL		(10 * 60 * HZ)
 
 #define MC_HASH_SHIFT		8
@@ -168,9 +169,51 @@ static struct table_instance *table_instance_alloc(int new_size)
 	return ti;
 }
 
+static struct mask_array *tbl_mask_array_alloc(int size)
+{
+	struct mask_array *new;
+
+	size = max(MASK_ARRAY_SIZE_MIN, size);
+	new = kzalloc(sizeof(struct mask_array) +
+		      sizeof(struct sw_flow_mask *) * size, GFP_KERNEL);
+	if (!new)
+		return NULL;
+
+	new->count = 0;
+	new->max = size;
+
+	return new;
+}
+
+static int tbl_mask_array_realloc(struct flow_table *tbl, int size)
+{
+	struct mask_array *old;
+	struct mask_array *new;
+
+	new = tbl_mask_array_alloc(size);
+	if (!new)
+		return -ENOMEM;
+
+	old = ovsl_dereference(tbl->mask_array);
+	if (old) {
+		int i;
+
+		for (i = 0; i < old->max; i++) {
+			if (ovsl_dereference(old->masks[i]))
+				new->masks[new->count++] = old->masks[i];
+		}
+	}
+
+	rcu_assign_pointer(tbl->mask_array, new);
+	kfree_rcu(old, rcu);
+
+	return 0;
+}
+
 int ovs_flow_tbl_init(struct flow_table *table)
 {
 	struct table_instance *ti, *ufid_ti;
+	struct mask_array *ma;
 
 	table->mask_cache = __alloc_percpu(sizeof(struct mask_cache_entry) *
 					   MC_HASH_ENTRIES,
@@ -178,9 +221,13 @@ int ovs_flow_tbl_init(struct flow_table *table)
 	if (!table->mask_cache)
 		return -ENOMEM;
 
+	ma = tbl_mask_array_alloc(MASK_ARRAY_SIZE_MIN);
+	if (!ma)
+		goto free_mask_cache;
+
 	ti = table_instance_alloc(TBL_MIN_BUCKETS);
 	if (!ti)
-		goto free_mask_cache;
+		goto free_mask_array;
 
 	ufid_ti = table_instance_alloc(TBL_MIN_BUCKETS);
 	if (!ufid_ti)
@@ -188,7 +235,7 @@ int ovs_flow_tbl_init(struct flow_table *table)
 
 	rcu_assign_pointer(table->ti, ti);
 	rcu_assign_pointer(table->ufid_ti, ufid_ti);
-	INIT_LIST_HEAD(&table->mask_list);
+	rcu_assign_pointer(table->mask_array, ma);
 	table->last_rehash = jiffies;
 	table->count = 0;
 	table->ufid_count = 0;
@@ -196,6 +243,8 @@ int ovs_flow_tbl_init(struct flow_table *table)
 
 free_ti:
 	__table_instance_destroy(ti);
+free_mask_array:
+	kfree(ma);
 free_mask_cache:
 	free_percpu(table->mask_cache);
 	return -ENOMEM;
@@ -255,6 +304,7 @@ void ovs_flow_tbl_destroy(struct flow_table *table)
 	struct table_instance *ufid_ti = rcu_dereference_raw(table->ufid_ti);
 
 	free_percpu(table->mask_cache);
+	kfree_rcu(table->mask_array, rcu);
 	table_instance_destroy(ti, ufid_ti, false);
 }
 
@@ -460,17 +510,27 @@ static struct sw_flow *masked_flow_lookup(struct table_instance *ti,
 
 static struct sw_flow *flow_lookup(struct flow_table *tbl,
 				   struct table_instance *ti,
+				   struct mask_array *ma,
 				   const struct sw_flow_key *key,
-				   u32 *n_mask_hit)
+				   u32 *n_mask_hit,
+				   u32 *index)
 {
-	struct sw_flow_mask *mask;
 	struct sw_flow *flow;
+	int i;
 
-	list_for_each_entry_rcu(mask, &tbl->mask_list, list) {
-		flow = masked_flow_lookup(ti, key, mask, n_mask_hit);
-		if (flow)  /* Found */
-			return flow;
+	for (i = 0; i < ma->max; i++)  {
+		struct sw_flow_mask *mask;
+
+		mask = rcu_dereference_ovsl(ma->masks[i]);
+		if (mask) {
+			flow = masked_flow_lookup(ti, key, mask, n_mask_hit);
+			if (flow) { /* Found */
+				*index = i;
+				return flow;
+			}
+		}
 	}
+
 	return NULL;
 }
 
@@ -486,6 +546,7 @@ struct sw_flow *ovs_flow_tbl_lookup_stats(struct flow_table *tbl,
 					  u32 skb_hash,
 					  u32 *n_mask_hit)
 {
+	struct mask_array *ma = rcu_dereference_ovsl(tbl->mask_array);
 	struct table_instance *ti = rcu_dereference_ovsl(tbl->ti);
 	struct mask_cache_entry  *entries, *ce, *del;
 	struct sw_flow *flow;
@@ -493,8 +554,11 @@ struct sw_flow *ovs_flow_tbl_lookup_stats(struct flow_table *tbl,
 	int seg;
 
 	*n_mask_hit = 0;
-	if (unlikely(!skb_hash))
-		return flow_lookup(tbl, ti, key, n_mask_hit);
+	if (unlikely(!skb_hash)) {
+		u32 __always_unused mask_index;
+
+		return flow_lookup(tbl, ti, ma, key, n_mask_hit, &mask_index);
+	}
 
 	del = NULL;
 	entries = this_cpu_ptr(tbl->mask_cache);
@@ -507,37 +571,33 @@ struct sw_flow *ovs_flow_tbl_lookup_stats(struct flow_table *tbl,
 
 		if (ce->skb_hash == skb_hash) {
 			struct sw_flow_mask *mask;
-			int i;
-
-			i = 0;
-			list_for_each_entry_rcu(mask, &tbl->mask_list, list) {
-				if (ce->mask_index == i++) {
-					flow = masked_flow_lookup(ti, key, mask,
-								  n_mask_hit);
-					if (flow)  /* Found */
-						return flow;
-
-					break;
-				}
+			struct sw_flow *flow;
+
+			mask = rcu_dereference_ovsl(ma->masks[ce->mask_index]);
+			if (mask) {
+				flow = masked_flow_lookup(ti, key, mask,
+							  n_mask_hit);
+				if (flow)  /* Found */
+					return flow;
 			}
 
 			del = ce;
 			break;
 		}
 
-		if (!del || (del->skb_hash && !ce->skb_hash)) {
+		if (!del || (del->skb_hash && !ce->skb_hash) ||
+		    (rcu_dereference_ovsl(ma->masks[del->mask_index]) &&
+		     !rcu_dereference_ovsl(ma->masks[ce->mask_index]))) {
 			del = ce;
 		}
 
 		hash >>= MC_HASH_SHIFT;
 	}
 
-	flow = flow_lookup(tbl, ti, key, n_mask_hit);
+	flow = flow_lookup(tbl, ti, ma, key, n_mask_hit, &del->mask_index);
 
-	if (flow) {
+	if (flow)
 		del->skb_hash = skb_hash;
-		del->mask_index = (*n_mask_hit - 1);
-	}
 
 	return flow;
 }
@@ -546,26 +606,38 @@ struct sw_flow *ovs_flow_tbl_lookup(struct flow_table *tbl,
 				    const struct sw_flow_key *key)
 {
 	struct table_instance *ti = rcu_dereference_ovsl(tbl->ti);
+	struct mask_array *ma = rcu_dereference_ovsl(tbl->mask_array);
+
 	u32 __always_unused n_mask_hit;
+	u32 __always_unused index;
 
-	return flow_lookup(tbl, ti, key, &n_mask_hit);
+	return flow_lookup(tbl, ti, ma, key, &n_mask_hit, &index);
 }
 
 struct sw_flow *ovs_flow_tbl_lookup_exact(struct flow_table *tbl,
 					  const struct sw_flow_match *match)
 {
-	struct table_instance *ti = rcu_dereference_ovsl(tbl->ti);
-	struct sw_flow_mask *mask;
-	struct sw_flow *flow;
-	u32 __always_unused n_mask_hit;
+	struct mask_array *ma = ovsl_dereference(tbl->mask_array);
+	int i;
 
 	/* Always called under ovs-mutex. */
-	list_for_each_entry(mask, &tbl->mask_list, list) {
+	for (i = 0; i < ma->max; i++) {
+		struct table_instance *ti = rcu_dereference_ovsl(tbl->ti);
+		u32 __always_unused n_mask_hit;
+		struct sw_flow_mask *mask;
+		struct sw_flow *flow;
+
+		mask = ovsl_dereference(ma->masks[i]);
+		if (!mask)
+			continue;
+
 		flow = masked_flow_lookup(ti, match->key, mask, &n_mask_hit);
 		if (flow && ovs_identifier_is_key(&flow->id) &&
-		    ovs_flow_cmp_unmasked_key(flow, match))
+		    ovs_flow_cmp_unmasked_key(flow, match)) {
 			return flow;
+		}
 	}
+
 	return NULL;
 }
 
@@ -611,13 +683,8 @@ struct sw_flow *ovs_flow_tbl_lookup_ufid(struct flow_table *tbl,
 
 int ovs_flow_tbl_num_masks(const struct flow_table *table)
 {
-	struct sw_flow_mask *mask;
-	int num = 0;
-
-	list_for_each_entry(mask, &table->mask_list, list)
-		num++;
-
-	return num;
+	struct mask_array *ma = rcu_dereference_ovsl(table->mask_array);
+	return ma->count;
 }
 
 static struct table_instance *table_instance_expand(struct table_instance *ti,
@@ -638,8 +705,19 @@ static void flow_mask_remove(struct flow_table *tbl, struct sw_flow_mask *mask)
 		mask->ref_count--;
 
 		if (!mask->ref_count) {
-			list_del_rcu(&mask->list);
-			kfree_rcu(mask, rcu);
+			struct mask_array *ma;
+			int i;
+
+			ma = ovsl_dereference(tbl->mask_array);
+			for (i = 0; i < ma->max; i++) {
+				if (mask == ovsl_dereference(ma->masks[i])) {
+					RCU_INIT_POINTER(ma->masks[i], NULL);
+					ma->count--;
+					kfree_rcu(mask, rcu);
+					return;
+				}
+			}
+			BUG();
 		}
 	}
 }
@@ -689,13 +767,16 @@ static bool mask_equal(const struct sw_flow_mask *a,
 static struct sw_flow_mask *flow_mask_find(const struct flow_table *tbl,
 					   const struct sw_flow_mask *mask)
 {
-	struct list_head *ml;
+	struct mask_array *ma;
+	int i;
 
-	list_for_each(ml, &tbl->mask_list) {
-		struct sw_flow_mask *m;
-		m = container_of(ml, struct sw_flow_mask, list);
-		if (mask_equal(mask, m))
-			return m;
+	ma = ovsl_dereference(tbl->mask_array);
+	for (i = 0; i < ma->max; i++) {
+		struct sw_flow_mask *t;
+		t = ovsl_dereference(ma->masks[i]);
+
+		if (t && mask_equal(mask, t))
+			return t;
 	}
 
 	return NULL;
@@ -706,15 +787,44 @@ static int flow_mask_insert(struct flow_table *tbl, struct sw_flow *flow,
 			    const struct sw_flow_mask *new)
 {
 	struct sw_flow_mask *mask;
+
 	mask = flow_mask_find(tbl, new);
 	if (!mask) {
+		struct mask_array *ma;
+		int i;
+
 		/* Allocate a new mask if none exsits. */
 		mask = mask_alloc();
 		if (!mask)
 			return -ENOMEM;
 		mask->key = new->key;
 		mask->range = new->range;
-		list_add_tail_rcu(&mask->list, &tbl->mask_list);
+
+		/* Add mask to mask-list. */
+		ma = ovsl_dereference(tbl->mask_array);
+		if (ma->count >= ma->max) {
+			int err;
+
+			err = tbl_mask_array_realloc(tbl, ma->max +
+						     MASK_ARRAY_SIZE_MIN);
+			if (err) {
+				kfree(mask);
+				return err;
+			}
+
+			ma = ovsl_dereference(tbl->mask_array);
+		}
+
+		for (i = 0; i < ma->max; i++) {
+			const struct sw_flow_mask *t;
+
+			t = ovsl_dereference(ma->masks[i]);
+			if (!t) {
+				rcu_assign_pointer(ma->masks[i], mask);
+				ma->count++;
+				break;
+			}
+		}
 	} else {
 		BUG_ON(!mask->ref_count);
 		mask->ref_count++;
diff --git a/net/openvswitch/flow_table.h b/net/openvswitch/flow_table.h
index 04b6b1c..8a5cea6 100644
--- a/net/openvswitch/flow_table.h
+++ b/net/openvswitch/flow_table.h
@@ -27,6 +27,12 @@ struct mask_cache_entry {
 	u32 mask_index;
 };
 
+struct mask_array {
+	struct rcu_head rcu;
+	int count, max;
+	struct sw_flow_mask __rcu *masks[];
+};
+
 struct table_instance {
 	struct hlist_head *buckets;
 	unsigned int n_buckets;
@@ -40,7 +46,7 @@ struct flow_table {
 	struct table_instance __rcu *ti;
 	struct table_instance __rcu *ufid_ti;
 	struct mask_cache_entry __percpu *mask_cache;
-	struct list_head mask_list;
+	struct mask_array __rcu *mask_array;
 	unsigned long last_rehash;
 	unsigned int count;
 	unsigned int ufid_count;
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH net-next v2 03/10] net: openvswitch: shrink the mask array if necessary
  2019-10-08  1:00 [PATCH net-next v2 00/10] optimize openvswitch flow looking up xiangxia.m.yue
  2019-10-08  1:00 ` [PATCH net-next v2 01/10] net: openvswitch: add flow-mask cache for performance xiangxia.m.yue
  2019-10-08  1:00 ` [PATCH net-next v2 02/10] net: openvswitch: convert mask list in mask array xiangxia.m.yue
@ 2019-10-08  1:00 ` xiangxia.m.yue
  2019-10-08  1:00 ` [PATCH net-next v2 04/10] net: openvswitch: optimize flow-mask cache hash collision xiangxia.m.yue
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 16+ messages in thread
From: xiangxia.m.yue @ 2019-10-08  1:00 UTC (permalink / raw)
  To: gvrose8192, pshelar; +Cc: netdev, Tonghao Zhang

From: Tonghao Zhang <xiangxia.m.yue@gmail.com>

When creating and inserting flow-mask, if there is no available
flow-mask, we realloc the mask array. When removing flow-mask,
if necessary, we shrink mask array.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
---
 net/openvswitch/flow_table.c | 33 +++++++++++++++++++++++----------
 1 file changed, 23 insertions(+), 10 deletions(-)

diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
index aab7a27..ff4c4b049 100644
--- a/net/openvswitch/flow_table.c
+++ b/net/openvswitch/flow_table.c
@@ -693,6 +693,23 @@ static struct table_instance *table_instance_expand(struct table_instance *ti,
 	return table_instance_rehash(ti, ti->n_buckets * 2, ufid);
 }
 
+static void tbl_mask_array_delete_mask(struct mask_array *ma,
+				       struct sw_flow_mask *mask)
+{
+	int i;
+
+	/* Remove the deleted mask pointers from the array */
+	for (i = 0; i < ma->max; i++) {
+		if (mask == ovsl_dereference(ma->masks[i])) {
+			RCU_INIT_POINTER(ma->masks[i], NULL);
+			ma->count--;
+			kfree_rcu(mask, rcu);
+			return;
+		}
+	}
+	BUG();
+}
+
 /* Remove 'mask' from the mask list, if it is not needed any more. */
 static void flow_mask_remove(struct flow_table *tbl, struct sw_flow_mask *mask)
 {
@@ -706,18 +723,14 @@ static void flow_mask_remove(struct flow_table *tbl, struct sw_flow_mask *mask)
 
 		if (!mask->ref_count) {
 			struct mask_array *ma;
-			int i;
 
 			ma = ovsl_dereference(tbl->mask_array);
-			for (i = 0; i < ma->max; i++) {
-				if (mask == ovsl_dereference(ma->masks[i])) {
-					RCU_INIT_POINTER(ma->masks[i], NULL);
-					ma->count--;
-					kfree_rcu(mask, rcu);
-					return;
-				}
-			}
-			BUG();
+			tbl_mask_array_delete_mask(ma, mask);
+
+			/* Shrink the mask array if necessary. */
+			if (ma->max >= (MASK_ARRAY_SIZE_MIN * 2) &&
+			    ma->count <= (ma->max / 3))
+				tbl_mask_array_realloc(tbl, ma->max / 2);
 		}
 	}
 }
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH net-next v2 04/10] net: openvswitch: optimize flow-mask cache hash collision
  2019-10-08  1:00 [PATCH net-next v2 00/10] optimize openvswitch flow looking up xiangxia.m.yue
                   ` (2 preceding siblings ...)
  2019-10-08  1:00 ` [PATCH net-next v2 03/10] net: openvswitch: shrink the mask array if necessary xiangxia.m.yue
@ 2019-10-08  1:00 ` xiangxia.m.yue
  2019-10-08  1:00 ` [PATCH net-next v2 05/10] net: openvswitch: optimize flow-mask looking up xiangxia.m.yue
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 16+ messages in thread
From: xiangxia.m.yue @ 2019-10-08  1:00 UTC (permalink / raw)
  To: gvrose8192, pshelar; +Cc: netdev, Tonghao Zhang

From: Tonghao Zhang <xiangxia.m.yue@gmail.com>

Port the codes to linux upstream and with little changes.

Pravin B Shelar, says:
| In case hash collision on mask cache, OVS does extra flow
| lookup. Following patch avoid it.

Link: https://github.com/openvswitch/ovs/commit/0e6efbe2712da03522532dc5e84806a96f6a0dd1
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
---
 net/openvswitch/flow_table.c | 95 ++++++++++++++++++++++++--------------------
 1 file changed, 53 insertions(+), 42 deletions(-)

diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
index ff4c4b049..4c82960 100644
--- a/net/openvswitch/flow_table.c
+++ b/net/openvswitch/flow_table.c
@@ -508,6 +508,9 @@ static struct sw_flow *masked_flow_lookup(struct table_instance *ti,
 	return NULL;
 }
 
+/* Flow lookup does full lookup on flow table. It starts with
+ * mask from index passed in *index.
+ */
 static struct sw_flow *flow_lookup(struct flow_table *tbl,
 				   struct table_instance *ti,
 				   struct mask_array *ma,
@@ -516,18 +519,31 @@ static struct sw_flow *flow_lookup(struct flow_table *tbl,
 				   u32 *index)
 {
 	struct sw_flow *flow;
+	struct sw_flow_mask *mask;
 	int i;
 
-	for (i = 0; i < ma->max; i++)  {
-		struct sw_flow_mask *mask;
-
-		mask = rcu_dereference_ovsl(ma->masks[i]);
+	if (*index < ma->max) {
+		mask = rcu_dereference_ovsl(ma->masks[*index]);
 		if (mask) {
 			flow = masked_flow_lookup(ti, key, mask, n_mask_hit);
-			if (flow) { /* Found */
-				*index = i;
+			if (flow)
 				return flow;
-			}
+		}
+	}
+
+	for (i = 0; i < ma->max; i++)  {
+
+		if (i == *index)
+			continue;
+
+		mask = rcu_dereference_ovsl(ma->masks[i]);
+		if (!mask)
+			continue;
+
+		flow = masked_flow_lookup(ti, key, mask, n_mask_hit);
+		if (flow) { /* Found */
+			*index = i;
+			return flow;
 		}
 	}
 
@@ -546,58 +562,54 @@ struct sw_flow *ovs_flow_tbl_lookup_stats(struct flow_table *tbl,
 					  u32 skb_hash,
 					  u32 *n_mask_hit)
 {
-	struct mask_array *ma = rcu_dereference_ovsl(tbl->mask_array);
-	struct table_instance *ti = rcu_dereference_ovsl(tbl->ti);
-	struct mask_cache_entry  *entries, *ce, *del;
+	struct mask_array *ma = rcu_dereference(tbl->mask_array);
+	struct table_instance *ti = rcu_dereference(tbl->ti);
+	struct mask_cache_entry *entries, *ce;
 	struct sw_flow *flow;
-	u32 hash = skb_hash;
+	u32 hash;
 	int seg;
 
 	*n_mask_hit = 0;
 	if (unlikely(!skb_hash)) {
-		u32 __always_unused mask_index;
+		u32 mask_index = 0;
 
 		return flow_lookup(tbl, ti, ma, key, n_mask_hit, &mask_index);
 	}
 
-	del = NULL;
+	/* Pre and post recirulation flows usually have the same skb_hash
+	 * value. To avoid hash collisions, rehash the 'skb_hash' with
+	 * 'recirc_id'.  */
+	if (key->recirc_id)
+		skb_hash = jhash_1word(skb_hash, key->recirc_id);
+
+	ce = NULL;
+	hash = skb_hash;
 	entries = this_cpu_ptr(tbl->mask_cache);
 
+	/* Find the cache entry 'ce' to operate on. */
 	for (seg = 0; seg < MC_HASH_SEGS; seg++) {
-		int index;
-
-		index = hash & (MC_HASH_ENTRIES - 1);
-		ce = &entries[index];
-
-		if (ce->skb_hash == skb_hash) {
-			struct sw_flow_mask *mask;
-			struct sw_flow *flow;
-
-			mask = rcu_dereference_ovsl(ma->masks[ce->mask_index]);
-			if (mask) {
-				flow = masked_flow_lookup(ti, key, mask,
-							  n_mask_hit);
-				if (flow)  /* Found */
-					return flow;
-			}
-
-			del = ce;
-			break;
+		int index = hash & (MC_HASH_ENTRIES - 1);
+		struct mask_cache_entry *e;
+
+		e = &entries[index];
+		if (e->skb_hash == skb_hash) {
+			flow = flow_lookup(tbl, ti, ma, key, n_mask_hit,
+					   &e->mask_index);
+			if (!flow)
+				e->skb_hash = 0;
+			return flow;
 		}
 
-		if (!del || (del->skb_hash && !ce->skb_hash) ||
-		    (rcu_dereference_ovsl(ma->masks[del->mask_index]) &&
-		     !rcu_dereference_ovsl(ma->masks[ce->mask_index]))) {
-			del = ce;
-		}
+		if (!ce || e->skb_hash < ce->skb_hash)
+			ce = e;  /* A better replacement cache candidate. */
 
 		hash >>= MC_HASH_SHIFT;
 	}
 
-	flow = flow_lookup(tbl, ti, ma, key, n_mask_hit, &del->mask_index);
-
+	/* Cache miss, do full lookup. */
+	flow = flow_lookup(tbl, ti, ma, key, n_mask_hit, &ce->mask_index);
 	if (flow)
-		del->skb_hash = skb_hash;
+		ce->skb_hash = skb_hash;
 
 	return flow;
 }
@@ -607,9 +619,8 @@ struct sw_flow *ovs_flow_tbl_lookup(struct flow_table *tbl,
 {
 	struct table_instance *ti = rcu_dereference_ovsl(tbl->ti);
 	struct mask_array *ma = rcu_dereference_ovsl(tbl->mask_array);
-
 	u32 __always_unused n_mask_hit;
-	u32 __always_unused index;
+	u32 index = 0;
 
 	return flow_lookup(tbl, ti, ma, key, &n_mask_hit, &index);
 }
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH net-next v2 05/10] net: openvswitch: optimize flow-mask looking up
  2019-10-08  1:00 [PATCH net-next v2 00/10] optimize openvswitch flow looking up xiangxia.m.yue
                   ` (3 preceding siblings ...)
  2019-10-08  1:00 ` [PATCH net-next v2 04/10] net: openvswitch: optimize flow-mask cache hash collision xiangxia.m.yue
@ 2019-10-08  1:00 ` xiangxia.m.yue
  2019-10-08  1:00 ` [PATCH net-next v2 06/10] net: openvswitch: simplify the flow_hash xiangxia.m.yue
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 16+ messages in thread
From: xiangxia.m.yue @ 2019-10-08  1:00 UTC (permalink / raw)
  To: gvrose8192, pshelar; +Cc: netdev, Tonghao Zhang

From: Tonghao Zhang <xiangxia.m.yue@gmail.com>

The full looking up on flow table traverses all mask array.
If mask-array is too large, the number of invalid flow-mask
increase, performance will be drop.

This patch optimizes mask-array operation:

* Inserting, insert it [ma->count- 1] directly.
* Removing, only change last and current mask point, and free current mask.
* Looking up, full looking up will break if mask is NULL.

The function which changes or gets *count* of struct mask_array,
is protected by ovs_lock, but flow_lookup (not protected) should use *max* of
struct mask_array.

Functions protected by ovs_lock:
* tbl_mask_array_del_mask
* tbl_mask_array_add_mask
* flow_mask_find
* ovs_flow_tbl_lookup_exact
* ovs_flow_tbl_num_masks

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
---
 net/openvswitch/flow_table.c | 106 ++++++++++++++++++++++---------------------
 1 file changed, 54 insertions(+), 52 deletions(-)

diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
index 4c82960..7edddd9 100644
--- a/net/openvswitch/flow_table.c
+++ b/net/openvswitch/flow_table.c
@@ -538,7 +538,7 @@ static struct sw_flow *flow_lookup(struct flow_table *tbl,
 
 		mask = rcu_dereference_ovsl(ma->masks[i]);
 		if (!mask)
-			continue;
+			break;
 
 		flow = masked_flow_lookup(ti, key, mask, n_mask_hit);
 		if (flow) { /* Found */
@@ -632,15 +632,13 @@ struct sw_flow *ovs_flow_tbl_lookup_exact(struct flow_table *tbl,
 	int i;
 
 	/* Always called under ovs-mutex. */
-	for (i = 0; i < ma->max; i++) {
+	for (i = 0; i < ma->count; i++) {
 		struct table_instance *ti = rcu_dereference_ovsl(tbl->ti);
 		u32 __always_unused n_mask_hit;
 		struct sw_flow_mask *mask;
 		struct sw_flow *flow;
 
 		mask = ovsl_dereference(ma->masks[i]);
-		if (!mask)
-			continue;
 
 		flow = masked_flow_lookup(ti, match->key, mask, &n_mask_hit);
 		if (flow && ovs_identifier_is_key(&flow->id) &&
@@ -704,21 +702,34 @@ static struct table_instance *table_instance_expand(struct table_instance *ti,
 	return table_instance_rehash(ti, ti->n_buckets * 2, ufid);
 }
 
-static void tbl_mask_array_delete_mask(struct mask_array *ma,
-				       struct sw_flow_mask *mask)
+static void tbl_mask_array_del_mask(struct flow_table *tbl,
+				    struct sw_flow_mask *mask)
 {
+	struct mask_array *ma = ovsl_dereference(tbl->mask_array);
 	int i;
 
 	/* Remove the deleted mask pointers from the array */
-	for (i = 0; i < ma->max; i++) {
-		if (mask == ovsl_dereference(ma->masks[i])) {
-			RCU_INIT_POINTER(ma->masks[i], NULL);
-			ma->count--;
-			kfree_rcu(mask, rcu);
-			return;
-		}
+	for (i = 0; i < ma->count; i++) {
+		if (mask == ovsl_dereference(ma->masks[i]))
+			goto found;
 	}
+
 	BUG();
+	return;
+
+found:
+	ma->count--;
+	smp_wmb();
+
+	rcu_assign_pointer(ma->masks[i], ma->masks[ma->count]);
+	RCU_INIT_POINTER(ma->masks[ma->count], NULL);
+
+	kfree_rcu(mask, rcu);
+
+	/* Shrink the mask array if necessary. */
+	if (ma->max >= (MASK_ARRAY_SIZE_MIN * 2) &&
+	    ma->count <= (ma->max / 3))
+		tbl_mask_array_realloc(tbl, ma->max / 2);
 }
 
 /* Remove 'mask' from the mask list, if it is not needed any more. */
@@ -732,17 +743,8 @@ static void flow_mask_remove(struct flow_table *tbl, struct sw_flow_mask *mask)
 		BUG_ON(!mask->ref_count);
 		mask->ref_count--;
 
-		if (!mask->ref_count) {
-			struct mask_array *ma;
-
-			ma = ovsl_dereference(tbl->mask_array);
-			tbl_mask_array_delete_mask(ma, mask);
-
-			/* Shrink the mask array if necessary. */
-			if (ma->max >= (MASK_ARRAY_SIZE_MIN * 2) &&
-			    ma->count <= (ma->max / 3))
-				tbl_mask_array_realloc(tbl, ma->max / 2);
-		}
+		if (!mask->ref_count)
+			tbl_mask_array_del_mask(tbl, mask);
 	}
 }
 
@@ -795,17 +797,40 @@ static struct sw_flow_mask *flow_mask_find(const struct flow_table *tbl,
 	int i;
 
 	ma = ovsl_dereference(tbl->mask_array);
-	for (i = 0; i < ma->max; i++) {
+	for (i = 0; i < ma->count; i++) {
 		struct sw_flow_mask *t;
 		t = ovsl_dereference(ma->masks[i]);
 
-		if (t && mask_equal(mask, t))
+		if (mask_equal(mask, t))
 			return t;
 	}
 
 	return NULL;
 }
 
+static int tbl_mask_array_add_mask(struct flow_table *tbl,
+				   struct sw_flow_mask *new)
+{
+	struct mask_array *ma = ovsl_dereference(tbl->mask_array);
+	int err;
+
+	if (ma->count >= ma->max) {
+		err = tbl_mask_array_realloc(tbl, ma->max +
+					      MASK_ARRAY_SIZE_MIN);
+		if (err)
+			return err;
+	}
+
+	BUG_ON(ovsl_dereference(ma->masks[ma->count]));
+
+	rcu_assign_pointer(ma->masks[ma->count], new);
+
+	smp_wmb();
+	ma->count++;
+
+	return 0;
+}
+
 /* Add 'mask' into the mask list, if it is not already there. */
 static int flow_mask_insert(struct flow_table *tbl, struct sw_flow *flow,
 			    const struct sw_flow_mask *new)
@@ -814,9 +839,6 @@ static int flow_mask_insert(struct flow_table *tbl, struct sw_flow *flow,
 
 	mask = flow_mask_find(tbl, new);
 	if (!mask) {
-		struct mask_array *ma;
-		int i;
-
 		/* Allocate a new mask if none exsits. */
 		mask = mask_alloc();
 		if (!mask)
@@ -825,29 +847,9 @@ static int flow_mask_insert(struct flow_table *tbl, struct sw_flow *flow,
 		mask->range = new->range;
 
 		/* Add mask to mask-list. */
-		ma = ovsl_dereference(tbl->mask_array);
-		if (ma->count >= ma->max) {
-			int err;
-
-			err = tbl_mask_array_realloc(tbl, ma->max +
-						     MASK_ARRAY_SIZE_MIN);
-			if (err) {
-				kfree(mask);
-				return err;
-			}
-
-			ma = ovsl_dereference(tbl->mask_array);
-		}
-
-		for (i = 0; i < ma->max; i++) {
-			const struct sw_flow_mask *t;
-
-			t = ovsl_dereference(ma->masks[i]);
-			if (!t) {
-				rcu_assign_pointer(ma->masks[i], mask);
-				ma->count++;
-				break;
-			}
+		if (tbl_mask_array_add_mask(tbl, mask)) {
+			kfree(mask);
+			return -ENOMEM;
 		}
 	} else {
 		BUG_ON(!mask->ref_count);
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH net-next v2 06/10] net: openvswitch: simplify the flow_hash
  2019-10-08  1:00 [PATCH net-next v2 00/10] optimize openvswitch flow looking up xiangxia.m.yue
                   ` (4 preceding siblings ...)
  2019-10-08  1:00 ` [PATCH net-next v2 05/10] net: openvswitch: optimize flow-mask looking up xiangxia.m.yue
@ 2019-10-08  1:00 ` xiangxia.m.yue
  2019-10-08  1:00 ` [PATCH net-next v2 07/10] net: openvswitch: add likely in flow_lookup xiangxia.m.yue
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 16+ messages in thread
From: xiangxia.m.yue @ 2019-10-08  1:00 UTC (permalink / raw)
  To: gvrose8192, pshelar; +Cc: netdev, Tonghao Zhang

From: Tonghao Zhang <xiangxia.m.yue@gmail.com>

Simplify the code and remove the unnecessary BUILD_BUG_ON.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
---
 net/openvswitch/flow_table.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
index 7edddd9..667f474 100644
--- a/net/openvswitch/flow_table.c
+++ b/net/openvswitch/flow_table.c
@@ -432,13 +432,9 @@ int ovs_flow_tbl_flush(struct flow_table *flow_table)
 static u32 flow_hash(const struct sw_flow_key *key,
 		     const struct sw_flow_key_range *range)
 {
-	int key_start = range->start;
-	int key_end = range->end;
-	const u32 *hash_key = (const u32 *)((const u8 *)key + key_start);
-	int hash_u32s = (key_end - key_start) >> 2;
-
+	const u32 *hash_key = (const u32 *)((const u8 *)key + range->start);
 	/* Make sure number of hash bytes are multiple of u32. */
-	BUILD_BUG_ON(sizeof(long) % sizeof(u32));
+	int hash_u32s = range_n_bytes(range) >> 2;
 
 	return jhash2(hash_key, hash_u32s, 0);
 }
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH net-next v2 07/10] net: openvswitch: add likely in flow_lookup
  2019-10-08  1:00 [PATCH net-next v2 00/10] optimize openvswitch flow looking up xiangxia.m.yue
                   ` (5 preceding siblings ...)
  2019-10-08  1:00 ` [PATCH net-next v2 06/10] net: openvswitch: simplify the flow_hash xiangxia.m.yue
@ 2019-10-08  1:00 ` xiangxia.m.yue
  2019-10-08  1:00 ` [PATCH net-next v2 08/10] net: openvswitch: fix possible memleak on destroy flow-table xiangxia.m.yue
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 16+ messages in thread
From: xiangxia.m.yue @ 2019-10-08  1:00 UTC (permalink / raw)
  To: gvrose8192, pshelar; +Cc: netdev, Tonghao Zhang

From: Tonghao Zhang <xiangxia.m.yue@gmail.com>

The most case *index < ma->count, and flow-mask is not NULL.
We add un/likely for performance.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
---
 net/openvswitch/flow_table.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
index 667f474..007f7cd 100644
--- a/net/openvswitch/flow_table.c
+++ b/net/openvswitch/flow_table.c
@@ -518,7 +518,7 @@ static struct sw_flow *flow_lookup(struct flow_table *tbl,
 	struct sw_flow_mask *mask;
 	int i;
 
-	if (*index < ma->max) {
+	if (likely(*index < ma->count)) {
 		mask = rcu_dereference_ovsl(ma->masks[*index]);
 		if (mask) {
 			flow = masked_flow_lookup(ti, key, mask, n_mask_hit);
@@ -527,13 +527,13 @@ static struct sw_flow *flow_lookup(struct flow_table *tbl,
 		}
 	}
 
-	for (i = 0; i < ma->max; i++)  {
+	for (i = 0; i < ma->count; i++)  {
 
 		if (i == *index)
 			continue;
 
 		mask = rcu_dereference_ovsl(ma->masks[i]);
-		if (!mask)
+		if (unlikely(!mask))
 			break;
 
 		flow = masked_flow_lookup(ti, key, mask, n_mask_hit);
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH net-next v2 08/10] net: openvswitch: fix possible memleak on destroy flow-table
  2019-10-08  1:00 [PATCH net-next v2 00/10] optimize openvswitch flow looking up xiangxia.m.yue
                   ` (6 preceding siblings ...)
  2019-10-08  1:00 ` [PATCH net-next v2 07/10] net: openvswitch: add likely in flow_lookup xiangxia.m.yue
@ 2019-10-08  1:00 ` xiangxia.m.yue
  2019-10-08  1:00 ` [PATCH net-next v2 09/10] net: openvswitch: don't unlock mutex when changing the user_features fails xiangxia.m.yue
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 16+ messages in thread
From: xiangxia.m.yue @ 2019-10-08  1:00 UTC (permalink / raw)
  To: gvrose8192, pshelar; +Cc: netdev, Tonghao Zhang

From: Tonghao Zhang <xiangxia.m.yue@gmail.com>

When we destroy the flow tables which may contain the flow_mask,
so release the flow mask struct.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
---
 net/openvswitch/flow_table.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
index 007f7cd..bc14b12 100644
--- a/net/openvswitch/flow_table.c
+++ b/net/openvswitch/flow_table.c
@@ -295,6 +295,18 @@ static void table_instance_destroy(struct table_instance *ti,
 	}
 }
 
+static void tbl_mask_array_destroy(struct flow_table *tbl)
+{
+	struct mask_array *ma = ovsl_dereference(tbl->mask_array);
+	int i;
+
+	/* Free the flow-mask and kfree_rcu the NULL is allowed. */
+	for (i = 0; i < ma->count; i++)
+		kfree_rcu(ma->masks[i], rcu);
+
+	kfree_rcu(tbl->mask_array, rcu);
+}
+
 /* No need for locking this function is called from RCU callback or
  * error path.
  */
@@ -304,7 +316,7 @@ void ovs_flow_tbl_destroy(struct flow_table *table)
 	struct table_instance *ufid_ti = rcu_dereference_raw(table->ufid_ti);
 
 	free_percpu(table->mask_cache);
-	kfree_rcu(table->mask_array, rcu);
+	tbl_mask_array_destroy(table);
 	table_instance_destroy(ti, ufid_ti, false);
 }
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH net-next v2 09/10] net: openvswitch: don't unlock mutex when changing the user_features fails
  2019-10-08  1:00 [PATCH net-next v2 00/10] optimize openvswitch flow looking up xiangxia.m.yue
                   ` (7 preceding siblings ...)
  2019-10-08  1:00 ` [PATCH net-next v2 08/10] net: openvswitch: fix possible memleak on destroy flow-table xiangxia.m.yue
@ 2019-10-08  1:00 ` xiangxia.m.yue
  2019-10-08  1:00 ` [PATCH net-next v2 10/10] net: openvswitch: simplify the ovs_dp_cmd_new xiangxia.m.yue
  2019-10-08 17:33 ` [PATCH net-next v2 00/10] optimize openvswitch flow looking up Gregory Rose
  10 siblings, 0 replies; 16+ messages in thread
From: xiangxia.m.yue @ 2019-10-08  1:00 UTC (permalink / raw)
  To: gvrose8192, pshelar; +Cc: netdev, Tonghao Zhang, Paul Blakey

From: Tonghao Zhang <xiangxia.m.yue@gmail.com>

Unlocking of a not locked mutex is not allowed.
Other kernel thread may be in critical section while
we unlock it because of setting user_feature fail.

Fixes: 95a7233c4 ("net: openvswitch: Set OvS recirc_id from tc chain index")
Cc: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
---
 net/openvswitch/datapath.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 9fea7e1..aeb76e4 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -1657,6 +1657,7 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
 				ovs_dp_reset_user_features(skb, info);
 		}
 
+		ovs_unlock();
 		goto err_destroy_meters;
 	}
 
@@ -1673,7 +1674,6 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
 	return 0;
 
 err_destroy_meters:
-	ovs_unlock();
 	ovs_meters_exit(dp);
 err_destroy_ports_array:
 	kfree(dp->ports);
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH net-next v2 10/10] net: openvswitch: simplify the ovs_dp_cmd_new
  2019-10-08  1:00 [PATCH net-next v2 00/10] optimize openvswitch flow looking up xiangxia.m.yue
                   ` (8 preceding siblings ...)
  2019-10-08  1:00 ` [PATCH net-next v2 09/10] net: openvswitch: don't unlock mutex when changing the user_features fails xiangxia.m.yue
@ 2019-10-08  1:00 ` xiangxia.m.yue
  2019-10-08 17:33 ` [PATCH net-next v2 00/10] optimize openvswitch flow looking up Gregory Rose
  10 siblings, 0 replies; 16+ messages in thread
From: xiangxia.m.yue @ 2019-10-08  1:00 UTC (permalink / raw)
  To: gvrose8192, pshelar; +Cc: netdev, Tonghao Zhang

From: Tonghao Zhang <xiangxia.m.yue@gmail.com>

use the specified functions to init resource.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
---
 net/openvswitch/datapath.c | 60 +++++++++++++++++++++++++++++-----------------
 1 file changed, 38 insertions(+), 22 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index aeb76e4..4d48e48 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -1576,6 +1576,31 @@ static int ovs_dp_change(struct datapath *dp, struct nlattr *a[])
 	return 0;
 }
 
+static int ovs_dp_stats_init(struct datapath *dp)
+{
+	dp->stats_percpu = netdev_alloc_pcpu_stats(struct dp_stats_percpu);
+	if (!dp->stats_percpu)
+		return -ENOMEM;
+
+	return 0;
+}
+
+static int ovs_dp_vport_init(struct datapath *dp)
+{
+	int i;
+
+	dp->ports = kmalloc_array(DP_VPORT_HASH_BUCKETS,
+				  sizeof(struct hlist_head),
+				  GFP_KERNEL);
+	if (!dp->ports)
+		return -ENOMEM;
+
+	for (i = 0; i < DP_VPORT_HASH_BUCKETS; i++)
+		INIT_HLIST_HEAD(&dp->ports[i]);
+
+	return 0;
+}
+
 static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
 {
 	struct nlattr **a = info->attrs;
@@ -1584,7 +1609,7 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
 	struct datapath *dp;
 	struct vport *vport;
 	struct ovs_net *ovs_net;
-	int err, i;
+	int err;
 
 	err = -EINVAL;
 	if (!a[OVS_DP_ATTR_NAME] || !a[OVS_DP_ATTR_UPCALL_PID])
@@ -1597,35 +1622,26 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
 	err = -ENOMEM;
 	dp = kzalloc(sizeof(*dp), GFP_KERNEL);
 	if (dp == NULL)
-		goto err_free_reply;
+		goto err_destroy_reply;
 
 	ovs_dp_set_net(dp, sock_net(skb->sk));
 
 	/* Allocate table. */
 	err = ovs_flow_tbl_init(&dp->table);
 	if (err)
-		goto err_free_dp;
+		goto err_destroy_dp;
 
-	dp->stats_percpu = netdev_alloc_pcpu_stats(struct dp_stats_percpu);
-	if (!dp->stats_percpu) {
-		err = -ENOMEM;
+	err = ovs_dp_stats_init(dp);
+	if (err)
 		goto err_destroy_table;
-	}
 
-	dp->ports = kmalloc_array(DP_VPORT_HASH_BUCKETS,
-				  sizeof(struct hlist_head),
-				  GFP_KERNEL);
-	if (!dp->ports) {
-		err = -ENOMEM;
-		goto err_destroy_percpu;
-	}
-
-	for (i = 0; i < DP_VPORT_HASH_BUCKETS; i++)
-		INIT_HLIST_HEAD(&dp->ports[i]);
+	err = ovs_dp_vport_init(dp);
+	if (err)
+		goto err_destroy_stats;
 
 	err = ovs_meters_init(dp);
 	if (err)
-		goto err_destroy_ports_array;
+		goto err_destroy_ports;
 
 	/* Set up our datapath device. */
 	parms.name = nla_data(a[OVS_DP_ATTR_NAME]);
@@ -1675,15 +1691,15 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
 
 err_destroy_meters:
 	ovs_meters_exit(dp);
-err_destroy_ports_array:
+err_destroy_ports:
 	kfree(dp->ports);
-err_destroy_percpu:
+err_destroy_stats:
 	free_percpu(dp->stats_percpu);
 err_destroy_table:
 	ovs_flow_tbl_destroy(&dp->table);
-err_free_dp:
+err_destroy_dp:
 	kfree(dp);
-err_free_reply:
+err_destroy_reply:
 	kfree_skb(reply);
 err:
 	return err;
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next v2 00/10] optimize openvswitch flow looking up
  2019-10-08  1:00 [PATCH net-next v2 00/10] optimize openvswitch flow looking up xiangxia.m.yue
                   ` (9 preceding siblings ...)
  2019-10-08  1:00 ` [PATCH net-next v2 10/10] net: openvswitch: simplify the ovs_dp_cmd_new xiangxia.m.yue
@ 2019-10-08 17:33 ` Gregory Rose
  2019-10-10  8:42   ` Tonghao Zhang
  10 siblings, 1 reply; 16+ messages in thread
From: Gregory Rose @ 2019-10-08 17:33 UTC (permalink / raw)
  To: xiangxia.m.yue, pshelar; +Cc: netdev


On 10/7/2019 6:00 PM, xiangxia.m.yue@gmail.com wrote:
> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
>
> This series patch optimize openvswitch for performance or simplify
> codes.
>
> Patch 1, 2, 4: Port Pravin B Shelar patches to
> linux upstream with little changes.
>
> Patch 5, 6, 7: Optimize the flow looking up and
> simplify the flow hash.
>
> Patch 8, 9: are bugfix.
>
> The performance test is on Intel Xeon E5-2630 v4.
> The test topology is show as below:
>
> +-----------------------------------+
> |   +---------------------------+   |
> |   | eth0   ovs-switch    eth1 |   | Host0
> |   +---------------------------+   |
> +-----------------------------------+
>        ^                       |
>        |                       |
>        |                       |
>        |                       |
>        |                       v
> +-----+----+             +----+-----+
> | netperf  | Host1       | netserver| Host2
> +----------+             +----------+
>
> We use netperf send the 64B packets, and insert 255+ flow-mask:
> $ ovs-dpctl add-flow ovs-switch "in_port(1),eth(dst=00:01:00:00:00:00/ff:ff:ff:ff:ff:01),eth_type(0x0800),ipv4(frag=no)" 2
> ...
> $ ovs-dpctl add-flow ovs-switch "in_port(1),eth(dst=00:ff:00:00:00:00/ff:ff:ff:ff:ff:ff),eth_type(0x0800),ipv4(frag=no)" 2
> $
> $ netperf -t UDP_STREAM -H 2.2.2.200 -l 40 -- -m 18
>
> * Without series patch, throughput 8.28Mbps
> * With series patch, throughput 46.05Mbps
>
> v1 -> v2:
> 1. use kfree_rcu instead of call_rcu.
> 2. add barrier when changing the ma->count.
> 3. change the ma->max to ma->count in flow_lookup.
>
> Tonghao Zhang (10):
>    net: openvswitch: add flow-mask cache for performance
>    net: openvswitch: convert mask list in mask array
>    net: openvswitch: shrink the mask array if necessary
>    net: openvswitch: optimize flow-mask cache hash collision
>    net: openvswitch: optimize flow-mask looking up
>    net: openvswitch: simplify the flow_hash
>    net: openvswitch: add likely in flow_lookup
>    net: openvswitch: fix possible memleak on destroy flow-table
>    net: openvswitch: don't unlock mutex when changing the user_features
>      fails
>    net: openvswitch: simplify the ovs_dp_cmd_new
>
>   net/openvswitch/datapath.c   |  65 +++++----
>   net/openvswitch/flow.h       |   1 -
>   net/openvswitch/flow_table.c | 315 +++++++++++++++++++++++++++++++++++++------
>   net/openvswitch/flow_table.h |  19 ++-
>   4 files changed, 328 insertions(+), 72 deletions(-)
>

Hi Tonghao,

I've applied your patch series and built a 5.4.0-rc1 kernel with them.

xxxxx@ubuntu-1604:~$ modinfo openvswitch
filename: /lib/modules/5.4.0-rc1+/kernel/net/openvswitch/openvswitch.ko
alias:          net-pf-16-proto-16-family-ovs_ct_limit
alias:          net-pf-16-proto-16-family-ovs_meter
alias:          net-pf-16-proto-16-family-ovs_packet
alias:          net-pf-16-proto-16-family-ovs_flow
alias:          net-pf-16-proto-16-family-ovs_vport
alias:          net-pf-16-proto-16-family-ovs_datapath
license:        GPL
description:    Open vSwitch switching datapath
srcversion:     F15EB8B4460D81BAA16216B
depends: nf_conntrack,nf_nat,nf_conncount,libcrc32c,nf_defrag_ipv6,nsh
retpoline:      Y
intree:         Y
name:           openvswitch
vermagic:       5.4.0-rc1+ SMP mod_unload modversions

I then built openvswitch master branch from github and ran 'make 
check-kernel'.

In doing so I ran into the following splat in this test:
63: conntrack - IPv6 fragmentation + vlan

Here is the splat:
[  480.024215] ------------[ cut here ]------------
[  480.024218] kernel BUG at net/openvswitch/flow_table.c:725!
[  480.024267] invalid opcode: 0000 [#1] SMP PTI
[  480.024297] CPU: 2 PID: 15717 Comm: ovs-vswitchd Tainted: G            E
5.4.0-rc1+ #131
[  480.024345] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[  480.024386] RIP: 0010:ovs_flow_tbl_remove+0x151/0x160 [openvswitch]
[  480.024424] Code: 55 f7 ea 89 f0 c1 f8 1f 29 c2 39 53 10 0f 8f 6a ff 
ff ff 48 89 ef d1 fe 5b 5d e9 8a ed ff ff 0f 0b 0f 0b b8 18 00 00 00 eb 
92 <0f> 0b 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 57
[  480.024527] RSP: 0018:ffffaf32c05e38c8 EFLAGS: 00010246
[  480.024560] RAX: 0000000000000010 RBX: ffff9e4f6cd5a000 RCX: 
ffff9e4f6c585000
[  480.024601] RDX: ffff9e4f6cd5a098 RSI: 0000000000000010 RDI: 
ffff9e4f6b2c6d20
[  480.024642] RBP: ffffaf32c05e3b70 R08: ffff9e4f6c1651c0 R09: 
ffff9e4f756a43c0
[  480.024684] R10: 0000000000000000 R11: ffffffffc06e5500 R12: 
ffff9e4f6baf7800
[  480.024742] R13: ffff9e4f6b2c6d20 R14: ffff9e4f724a4e14 R15: 
0000000000000007
[  480.024790] FS:  00007fdd76058980(0000) GS:ffff9e4f77b00000(0000) 
knlGS:0000000000000000
[  480.024836] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  480.024871] CR2: 00007ffd18a5ac60 CR3: 0000000230f3a002 CR4: 
00000000001606e0
[  480.024917] Call Trace:
[  480.024941]  action_fifos_exit+0x3240/0x37b0 [openvswitch]
[  480.024979]  ? __switch_to_asm+0x40/0x70
[  480.025005]  ? __switch_to_asm+0x34/0x70
[  480.025031]  ? __switch_to_asm+0x40/0x70
[  480.025056]  ? __switch_to_asm+0x40/0x70
[  480.025082]  ? __switch_to_asm+0x34/0x70
[  480.025108]  ? __switch_to_asm+0x40/0x70
[  480.025134]  ? __switch_to_asm+0x34/0x70
[  480.025159]  ? __switch_to_asm+0x40/0x70
[  480.025185]  ? __switch_to_asm+0x34/0x70
[  480.025210]  ? __switch_to_asm+0x40/0x70
[  480.025236]  ? __switch_to_asm+0x34/0x70
[  480.025262]  ? __switch_to_asm+0x40/0x70
[  480.025287]  ? __switch_to_asm+0x34/0x70
[  480.025312]  ? __switch_to_asm+0x40/0x70
[  480.025338]  ? __switch_to_asm+0x34/0x70
[  480.025364]  ? __switch_to_asm+0x40/0x70
[  480.025389]  ? __switch_to_asm+0x34/0x70
[  480.025415]  ? __switch_to_asm+0x40/0x70
[  480.025443]  ? __update_load_avg_se+0x11c/0x2e0
[  480.025472]  ? __update_load_avg_se+0x11c/0x2e0
[  480.025503]  ? update_load_avg+0x7e/0x600
[  480.025529]  ? update_load_avg+0x7e/0x600
[  480.025556]  ? update_curr+0x85/0x1d0
[  480.025582]  ? cred_has_capability+0x85/0x130
[  480.025611]  ? __nla_validate_parse+0x57/0x8a0
[  480.025640]  ? _cond_resched+0x15/0x40
[  480.025666]  ? genl_family_rcv_msg_attrs_parse.isra.14+0x93/0x100
[  480.026523]  genl_rcv_msg+0x1d9/0x490
[  480.027385]  ? __switch_to_asm+0x34/0x70
[  480.028230]  ? __switch_to_asm+0x40/0x70
[  480.029050]  ? __switch_to_asm+0x40/0x70
[  480.029874]  ? genl_family_rcv_msg_attrs_parse.isra.14+0x100/0x100
[  480.030673]  netlink_rcv_skb+0x4a/0x110
[  480.031465]  genl_rcv+0x24/0x40
[  480.032312]  netlink_unicast+0x1a0/0x250
[  480.033059]  netlink_sendmsg+0x2b4/0x3b0
[  480.033758]  sock_sendmsg+0x5b/0x60
[  480.034422]  ___sys_sendmsg+0x278/0x2f0
[  480.035083]  ? file_update_time+0x60/0x130
[  480.035680]  ? pipe_write+0x286/0x400
[  480.036290]  ? new_sync_write+0x12d/0x1d0
[  480.036882]  ? __sys_sendmsg+0x5e/0xa0
[  480.037452]  __sys_sendmsg+0x5e/0xa0
[  480.038013]  do_syscall_64+0x52/0x1a0
[  480.038546]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  480.039083] RIP: 0033:0x7fdd7537fa6d
[  480.039596] Code: b9 20 00 00 75 10 b8 2e 00 00 00 0f 05 48 3d 01 f0 
ff ff 73 31 c3 48 83 ec 08 e8 fe f6 ff ff 48 89 04 24 b8 2e 00 00 00 0f 
05 <48> 8b 3c 24 48 89 c2 e8 47 f7 ff ff 48 89 d0 48 83 c4 08 48 3d 01
[  480.040769] RSP: 002b:00007ffd18a6ad40 EFLAGS: 00000293 ORIG_RAX: 
000000000000002e
[  480.041391] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 
00007fdd7537fa6d
[  480.042045] RDX: 0000000000000000 RSI: 00007ffd18a6ada0 RDI: 
0000000000000014
[  480.042713] RBP: 0000000002300870 R08: 0000000000000000 R09: 
00007ffd18a6bd58
[  480.043438] R10: 0000000000000000 R11: 0000000000000293 R12: 
00007ffd18a6bb70
[  480.044138] R13: 00007ffd18a6bd00 R14: 00007ffd18a6bb78 R15: 
00007ffd18a6b230
[  480.044852] Modules linked in: vport_vxlan(E) vxlan(E) vport_gre(E) 
ip_gre(E) ip_tunnel(E) vport_geneve(E) geneve(E) ip6_udp_tunnel(E) 
udp_tunnel(E) openvswitch(E) nsh(E) nf_conncount(E) nf_nat_tftp(E) 
nf_conntrack_tftp(E) nf_nat_ftp(E) nf_conntrack_ftp(E) nf_nat(E) 
nf_conntrack_netlink(E) ip6table_filter(E) ip6_tables(E) 
iptable_filter(E) ip_tables(E) x_tables(E) ip6_gre(E) ip6_tunnel(E) 
tunnel6(E) gre(E) bonding(E) 8021q(E) garp(E) stp(E) mrp(E) llc(E) 
veth(E) nfnetlink_cttimeout(E) nfnetlink(E) nf_conntrack(E) 
nf_defrag_ipv6(E) nf_defrag_ipv4(E) binfmt_misc(E) intel_rapl_msr(E) 
snd_hda_codec_generic(E) ledtrig_audio(E) snd_hda_intel(E) 
snd_intel_nhlt(E) joydev(E) snd_hda_codec(E) input_leds(E) 
snd_hda_core(E) snd_hwdep(E) intel_rapl_common(E) snd_pcm(E) 
snd_timer(E) serio_raw(E) snd(E) soundcore(E) i2c_piix4(E) mac_hid(E) 
ib_iser(E) rdma_cm(E) iw_cm(E) ib_cm(E) ib_core(E) configfs(E) 
iscsi_tcp(E) libiscsi_tcp(E) libiscsi(E) scsi_transport_iscsi(E) 
autofs4(E) btrfs(E) zstd_decompress(E)
[  480.044888]  zstd_compress(E) raid10(E) raid456(E) 
async_raid6_recov(E) async_memcpy(E) async_pq(E) async_xor(E) 
async_tx(E) xor(E) raid6_pq(E) libcrc32c(E) raid1(E) raid0(E) 
multipath(E) linear(E) crct10dif_pclmul(E) crc32_pclmul(E) 
ghash_clmulni_intel(E) aesni_intel(E) qxl(E) crypto_simd(E) ttm(E) 
cryptd(E) glue_helper(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) 
sysimgblt(E) fb_sys_fops(E) psmouse(E) drm(E) floppy(E) pata_acpi(E) 
[last unloaded: nf_conntrack_ftp]
[  480.056765] ---[ end trace 4a8c4eceeb9f5dec ]---
[  480.057953] RIP: 0010:ovs_flow_tbl_remove+0x151/0x160 [openvswitch]
[  480.059134] Code: 55 f7 ea 89 f0 c1 f8 1f 29 c2 39 53 10 0f 8f 6a ff 
ff ff 48 89 ef d1 fe 5b 5d e9 8a ed ff ff 0f 0b 0f 0b b8 18 00 00 00 eb 
92 <0f> 0b 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 57
[  480.061623] RSP: 0018:ffffaf32c05e38c8 EFLAGS: 00010246
[  480.062959] RAX: 0000000000000010 RBX: ffff9e4f6cd5a000 RCX: 
ffff9e4f6c585000
[  480.064248] RDX: ffff9e4f6cd5a098 RSI: 0000000000000010 RDI: 
ffff9e4f6b2c6d20
[  480.065524] RBP: ffffaf32c05e3b70 R08: ffff9e4f6c1651c0 R09: 
ffff9e4f756a43c0
[  480.066830] R10: 0000000000000000 R11: ffffffffc06e5500 R12: 
ffff9e4f6baf7800
[  480.068870] R13: ffff9e4f6b2c6d20 R14: ffff9e4f724a4e14 R15: 
0000000000000007
[  480.070081] FS:  00007fdd76058980(0000) GS:ffff9e4f77b00000(0000) 
knlGS:0000000000000000
[  480.071340] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  480.072610] CR2: 00007ffd18a5ac60 CR3: 0000000230f3a002 CR4: 
00000000001606e0

You're hitting the BUG_ON here:

/* Must be called with OVS mutex held. */
void ovs_flow_tbl_remove(struct flow_table *table, struct sw_flow *flow)
{
         struct table_instance *ti = ovsl_dereference(table->ti);
         struct table_instance *ufid_ti = ovsl_dereference(table->ufid_ti);

         BUG_ON(table->count == 0); 
<------------------------------------------------ Here

Thanks,

- Greg

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next v2 00/10] optimize openvswitch flow looking up
  2019-10-08 17:33 ` [PATCH net-next v2 00/10] optimize openvswitch flow looking up Gregory Rose
@ 2019-10-10  8:42   ` Tonghao Zhang
  2019-10-14 22:26     ` Gregory Rose
  0 siblings, 1 reply; 16+ messages in thread
From: Tonghao Zhang @ 2019-10-10  8:42 UTC (permalink / raw)
  To: Gregory Rose; +Cc: Pravin Shelar, Linux Kernel Network Developers

On Wed, Oct 9, 2019 at 1:33 AM Gregory Rose <gvrose8192@gmail.com> wrote:
>
>
> On 10/7/2019 6:00 PM, xiangxia.m.yue@gmail.com wrote:
> > From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> >
> > This series patch optimize openvswitch for performance or simplify
> > codes.
> >
> > Patch 1, 2, 4: Port Pravin B Shelar patches to
> > linux upstream with little changes.
> >
> > Patch 5, 6, 7: Optimize the flow looking up and
> > simplify the flow hash.
> >
> > Patch 8, 9: are bugfix.
> >
> > The performance test is on Intel Xeon E5-2630 v4.
> > The test topology is show as below:
> >
> > +-----------------------------------+
> > |   +---------------------------+   |
> > |   | eth0   ovs-switch    eth1 |   | Host0
> > |   +---------------------------+   |
> > +-----------------------------------+
> >        ^                       |
> >        |                       |
> >        |                       |
> >        |                       |
> >        |                       v
> > +-----+----+             +----+-----+
> > | netperf  | Host1       | netserver| Host2
> > +----------+             +----------+
> >
> > We use netperf send the 64B packets, and insert 255+ flow-mask:
> > $ ovs-dpctl add-flow ovs-switch "in_port(1),eth(dst=00:01:00:00:00:00/ff:ff:ff:ff:ff:01),eth_type(0x0800),ipv4(frag=no)" 2
> > ...
> > $ ovs-dpctl add-flow ovs-switch "in_port(1),eth(dst=00:ff:00:00:00:00/ff:ff:ff:ff:ff:ff),eth_type(0x0800),ipv4(frag=no)" 2
> > $
> > $ netperf -t UDP_STREAM -H 2.2.2.200 -l 40 -- -m 18
> >
> > * Without series patch, throughput 8.28Mbps
> > * With series patch, throughput 46.05Mbps
> >
> > v1 -> v2:
> > 1. use kfree_rcu instead of call_rcu.
> > 2. add barrier when changing the ma->count.
> > 3. change the ma->max to ma->count in flow_lookup.
> >
> > Tonghao Zhang (10):
> >    net: openvswitch: add flow-mask cache for performance
> >    net: openvswitch: convert mask list in mask array
> >    net: openvswitch: shrink the mask array if necessary
> >    net: openvswitch: optimize flow-mask cache hash collision
> >    net: openvswitch: optimize flow-mask looking up
> >    net: openvswitch: simplify the flow_hash
> >    net: openvswitch: add likely in flow_lookup
> >    net: openvswitch: fix possible memleak on destroy flow-table
> >    net: openvswitch: don't unlock mutex when changing the user_features
> >      fails
> >    net: openvswitch: simplify the ovs_dp_cmd_new
> >
> >   net/openvswitch/datapath.c   |  65 +++++----
> >   net/openvswitch/flow.h       |   1 -
> >   net/openvswitch/flow_table.c | 315 +++++++++++++++++++++++++++++++++++++------
> >   net/openvswitch/flow_table.h |  19 ++-
> >   4 files changed, 328 insertions(+), 72 deletions(-)
> >
>
> Hi Tonghao,
>
> I've applied your patch series and built a 5.4.0-rc1 kernel with them.
>
> xxxxx@ubuntu-1604:~$ modinfo openvswitch
> filename: /lib/modules/5.4.0-rc1+/kernel/net/openvswitch/openvswitch.ko
> alias:          net-pf-16-proto-16-family-ovs_ct_limit
> alias:          net-pf-16-proto-16-family-ovs_meter
> alias:          net-pf-16-proto-16-family-ovs_packet
> alias:          net-pf-16-proto-16-family-ovs_flow
> alias:          net-pf-16-proto-16-family-ovs_vport
> alias:          net-pf-16-proto-16-family-ovs_datapath
> license:        GPL
> description:    Open vSwitch switching datapath
> srcversion:     F15EB8B4460D81BAA16216B
> depends: nf_conntrack,nf_nat,nf_conncount,libcrc32c,nf_defrag_ipv6,nsh
> retpoline:      Y
> intree:         Y
> name:           openvswitch
> vermagic:       5.4.0-rc1+ SMP mod_unload modversions
>
> I then built openvswitch master branch from github and ran 'make
> check-kernel'.
>
> In doing so I ran into the following splat in this test:
> 63: conntrack - IPv6 fragmentation + vlan
>
> Here is the splat:
> [  480.024215] ------------[ cut here ]------------
> [  480.024218] kernel BUG at net/openvswitch/flow_table.c:725!
> [  480.024267] invalid opcode: 0000 [#1] SMP PTI
> [  480.024297] CPU: 2 PID: 15717 Comm: ovs-vswitchd Tainted: G            E
> 5.4.0-rc1+ #131
> [  480.024345] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
> [  480.024386] RIP: 0010:ovs_flow_tbl_remove+0x151/0x160 [openvswitch]
> [  480.024424] Code: 55 f7 ea 89 f0 c1 f8 1f 29 c2 39 53 10 0f 8f 6a ff
> ff ff 48 89 ef d1 fe 5b 5d e9 8a ed ff ff 0f 0b 0f 0b b8 18 00 00 00 eb
> 92 <0f> 0b 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 57
> [  480.024527] RSP: 0018:ffffaf32c05e38c8 EFLAGS: 00010246
> [  480.024560] RAX: 0000000000000010 RBX: ffff9e4f6cd5a000 RCX:
> ffff9e4f6c585000
> [  480.024601] RDX: ffff9e4f6cd5a098 RSI: 0000000000000010 RDI:
> ffff9e4f6b2c6d20
> [  480.024642] RBP: ffffaf32c05e3b70 R08: ffff9e4f6c1651c0 R09:
> ffff9e4f756a43c0
> [  480.024684] R10: 0000000000000000 R11: ffffffffc06e5500 R12:
> ffff9e4f6baf7800
> [  480.024742] R13: ffff9e4f6b2c6d20 R14: ffff9e4f724a4e14 R15:
> 0000000000000007
> [  480.024790] FS:  00007fdd76058980(0000) GS:ffff9e4f77b00000(0000)
> knlGS:0000000000000000
> [  480.024836] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  480.024871] CR2: 00007ffd18a5ac60 CR3: 0000000230f3a002 CR4:
> 00000000001606e0
> [  480.024917] Call Trace:
> [  480.024941]  action_fifos_exit+0x3240/0x37b0 [openvswitch]
> [  480.024979]  ? __switch_to_asm+0x40/0x70
> [  480.025005]  ? __switch_to_asm+0x34/0x70
> [  480.025031]  ? __switch_to_asm+0x40/0x70
> [  480.025056]  ? __switch_to_asm+0x40/0x70
> [  480.025082]  ? __switch_to_asm+0x34/0x70
> [  480.025108]  ? __switch_to_asm+0x40/0x70
> [  480.025134]  ? __switch_to_asm+0x34/0x70
> [  480.025159]  ? __switch_to_asm+0x40/0x70
> [  480.025185]  ? __switch_to_asm+0x34/0x70
> [  480.025210]  ? __switch_to_asm+0x40/0x70
> [  480.025236]  ? __switch_to_asm+0x34/0x70
> [  480.025262]  ? __switch_to_asm+0x40/0x70
> [  480.025287]  ? __switch_to_asm+0x34/0x70
> [  480.025312]  ? __switch_to_asm+0x40/0x70
> [  480.025338]  ? __switch_to_asm+0x34/0x70
> [  480.025364]  ? __switch_to_asm+0x40/0x70
> [  480.025389]  ? __switch_to_asm+0x34/0x70
> [  480.025415]  ? __switch_to_asm+0x40/0x70
> [  480.025443]  ? __update_load_avg_se+0x11c/0x2e0
> [  480.025472]  ? __update_load_avg_se+0x11c/0x2e0
> [  480.025503]  ? update_load_avg+0x7e/0x600
> [  480.025529]  ? update_load_avg+0x7e/0x600
> [  480.025556]  ? update_curr+0x85/0x1d0
> [  480.025582]  ? cred_has_capability+0x85/0x130
> [  480.025611]  ? __nla_validate_parse+0x57/0x8a0
> [  480.025640]  ? _cond_resched+0x15/0x40
> [  480.025666]  ? genl_family_rcv_msg_attrs_parse.isra.14+0x93/0x100
> [  480.026523]  genl_rcv_msg+0x1d9/0x490
> [  480.027385]  ? __switch_to_asm+0x34/0x70
> [  480.028230]  ? __switch_to_asm+0x40/0x70
> [  480.029050]  ? __switch_to_asm+0x40/0x70
> [  480.029874]  ? genl_family_rcv_msg_attrs_parse.isra.14+0x100/0x100
> [  480.030673]  netlink_rcv_skb+0x4a/0x110
> [  480.031465]  genl_rcv+0x24/0x40
> [  480.032312]  netlink_unicast+0x1a0/0x250
> [  480.033059]  netlink_sendmsg+0x2b4/0x3b0
> [  480.033758]  sock_sendmsg+0x5b/0x60
> [  480.034422]  ___sys_sendmsg+0x278/0x2f0
> [  480.035083]  ? file_update_time+0x60/0x130
> [  480.035680]  ? pipe_write+0x286/0x400
> [  480.036290]  ? new_sync_write+0x12d/0x1d0
> [  480.036882]  ? __sys_sendmsg+0x5e/0xa0
> [  480.037452]  __sys_sendmsg+0x5e/0xa0
> [  480.038013]  do_syscall_64+0x52/0x1a0
> [  480.038546]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [  480.039083] RIP: 0033:0x7fdd7537fa6d
> [  480.039596] Code: b9 20 00 00 75 10 b8 2e 00 00 00 0f 05 48 3d 01 f0
> ff ff 73 31 c3 48 83 ec 08 e8 fe f6 ff ff 48 89 04 24 b8 2e 00 00 00 0f
> 05 <48> 8b 3c 24 48 89 c2 e8 47 f7 ff ff 48 89 d0 48 83 c4 08 48 3d 01
> [  480.040769] RSP: 002b:00007ffd18a6ad40 EFLAGS: 00000293 ORIG_RAX:
> 000000000000002e
> [  480.041391] RAX: ffffffffffffffda RBX: 0000000000000001 RCX:
> 00007fdd7537fa6d
> [  480.042045] RDX: 0000000000000000 RSI: 00007ffd18a6ada0 RDI:
> 0000000000000014
> [  480.042713] RBP: 0000000002300870 R08: 0000000000000000 R09:
> 00007ffd18a6bd58
> [  480.043438] R10: 0000000000000000 R11: 0000000000000293 R12:
> 00007ffd18a6bb70
> [  480.044138] R13: 00007ffd18a6bd00 R14: 00007ffd18a6bb78 R15:
> 00007ffd18a6b230
> [  480.044852] Modules linked in: vport_vxlan(E) vxlan(E) vport_gre(E)
> ip_gre(E) ip_tunnel(E) vport_geneve(E) geneve(E) ip6_udp_tunnel(E)
> udp_tunnel(E) openvswitch(E) nsh(E) nf_conncount(E) nf_nat_tftp(E)
> nf_conntrack_tftp(E) nf_nat_ftp(E) nf_conntrack_ftp(E) nf_nat(E)
> nf_conntrack_netlink(E) ip6table_filter(E) ip6_tables(E)
> iptable_filter(E) ip_tables(E) x_tables(E) ip6_gre(E) ip6_tunnel(E)
> tunnel6(E) gre(E) bonding(E) 8021q(E) garp(E) stp(E) mrp(E) llc(E)
> veth(E) nfnetlink_cttimeout(E) nfnetlink(E) nf_conntrack(E)
> nf_defrag_ipv6(E) nf_defrag_ipv4(E) binfmt_misc(E) intel_rapl_msr(E)
> snd_hda_codec_generic(E) ledtrig_audio(E) snd_hda_intel(E)
> snd_intel_nhlt(E) joydev(E) snd_hda_codec(E) input_leds(E)
> snd_hda_core(E) snd_hwdep(E) intel_rapl_common(E) snd_pcm(E)
> snd_timer(E) serio_raw(E) snd(E) soundcore(E) i2c_piix4(E) mac_hid(E)
> ib_iser(E) rdma_cm(E) iw_cm(E) ib_cm(E) ib_core(E) configfs(E)
> iscsi_tcp(E) libiscsi_tcp(E) libiscsi(E) scsi_transport_iscsi(E)
> autofs4(E) btrfs(E) zstd_decompress(E)
> [  480.044888]  zstd_compress(E) raid10(E) raid456(E)
> async_raid6_recov(E) async_memcpy(E) async_pq(E) async_xor(E)
> async_tx(E) xor(E) raid6_pq(E) libcrc32c(E) raid1(E) raid0(E)
> multipath(E) linear(E) crct10dif_pclmul(E) crc32_pclmul(E)
> ghash_clmulni_intel(E) aesni_intel(E) qxl(E) crypto_simd(E) ttm(E)
> cryptd(E) glue_helper(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E)
> sysimgblt(E) fb_sys_fops(E) psmouse(E) drm(E) floppy(E) pata_acpi(E)
> [last unloaded: nf_conntrack_ftp]
> [  480.056765] ---[ end trace 4a8c4eceeb9f5dec ]---
> [  480.057953] RIP: 0010:ovs_flow_tbl_remove+0x151/0x160 [openvswitch]
> [  480.059134] Code: 55 f7 ea 89 f0 c1 f8 1f 29 c2 39 53 10 0f 8f 6a ff
> ff ff 48 89 ef d1 fe 5b 5d e9 8a ed ff ff 0f 0b 0f 0b b8 18 00 00 00 eb
> 92 <0f> 0b 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 57
> [  480.061623] RSP: 0018:ffffaf32c05e38c8 EFLAGS: 00010246
> [  480.062959] RAX: 0000000000000010 RBX: ffff9e4f6cd5a000 RCX:
> ffff9e4f6c585000
> [  480.064248] RDX: ffff9e4f6cd5a098 RSI: 0000000000000010 RDI:
> ffff9e4f6b2c6d20
> [  480.065524] RBP: ffffaf32c05e3b70 R08: ffff9e4f6c1651c0 R09:
> ffff9e4f756a43c0
> [  480.066830] R10: 0000000000000000 R11: ffffffffc06e5500 R12:
> ffff9e4f6baf7800
> [  480.068870] R13: ffff9e4f6b2c6d20 R14: ffff9e4f724a4e14 R15:
> 0000000000000007
> [  480.070081] FS:  00007fdd76058980(0000) GS:ffff9e4f77b00000(0000)
> knlGS:0000000000000000
> [  480.071340] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  480.072610] CR2: 00007ffd18a5ac60 CR3: 0000000230f3a002 CR4:
> 00000000001606e0
>
> You're hitting the BUG_ON here:
>
> /* Must be called with OVS mutex held. */
> void ovs_flow_tbl_remove(struct flow_table *table, struct sw_flow *flow)
> {
>          struct table_instance *ti = ovsl_dereference(table->ti);
>          struct table_instance *ufid_ti = ovsl_dereference(table->ufid_ti);
>
>          BUG_ON(table->count == 0);
> <------------------------------------------------ Here
Hi Greg,
Thanks for your work, I fixed it, when relloac mask_array I don't
update ma point in patch 5.

diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
index bc14b12..210018a 100644
--- a/net/openvswitch/flow_table.c
+++ b/net/openvswitch/flow_table.c
@@ -827,6 +827,8 @@ static int tbl_mask_array_add_mask(struct flow_table *tbl,
                                              MASK_ARRAY_SIZE_MIN);
                if (err)
                        return err;
+
+               ma = ovsl_dereference(tbl->mask_array);
        }

        BUG_ON(ovsl_dereference(ma->masks[ma->count]));

> Thanks,
>
> - Greg

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next v2 00/10] optimize openvswitch flow looking up
  2019-10-10  8:42   ` Tonghao Zhang
@ 2019-10-14 22:26     ` Gregory Rose
  2019-10-15  8:25       ` Tonghao Zhang
  0 siblings, 1 reply; 16+ messages in thread
From: Gregory Rose @ 2019-10-14 22:26 UTC (permalink / raw)
  To: Tonghao Zhang; +Cc: Pravin Shelar, Linux Kernel Network Developers

On 10/10/2019 1:42 AM, Tonghao Zhang wrote:
> On Wed, Oct 9, 2019 at 1:33 AM Gregory Rose <gvrose8192@gmail.com> wrote:
>>

[snip]

>> Hi Tonghao,
>>
>> I've applied your patch series and built a 5.4.0-rc1 kernel with them.
>>
>> xxxxx@ubuntu-1604:~$ modinfo openvswitch
>> filename: /lib/modules/5.4.0-rc1+/kernel/net/openvswitch/openvswitch.ko
>> alias:          net-pf-16-proto-16-family-ovs_ct_limit
>> alias:          net-pf-16-proto-16-family-ovs_meter
>> alias:          net-pf-16-proto-16-family-ovs_packet
>> alias:          net-pf-16-proto-16-family-ovs_flow
>> alias:          net-pf-16-proto-16-family-ovs_vport
>> alias:          net-pf-16-proto-16-family-ovs_datapath
>> license:        GPL
>> description:    Open vSwitch switching datapath
>> srcversion:     F15EB8B4460D81BAA16216B
>> depends: nf_conntrack,nf_nat,nf_conncount,libcrc32c,nf_defrag_ipv6,nsh
>> retpoline:      Y
>> intree:         Y
>> name:           openvswitch
>> vermagic:       5.4.0-rc1+ SMP mod_unload modversions
>>
>> I then built openvswitch master branch from github and ran 'make
>> check-kernel'.
>>
>> In doing so I ran into the following splat in this test:
>> 63: conntrack - IPv6 fragmentation + vlan
>>
>> Here is the splat:
>> [  480.024215] ------------[ cut here ]------------
>> [  480.024218] kernel BUG at net/openvswitch/flow_table.c:725!
>> [  480.024267] invalid opcode: 0000 [#1] SMP PTI
>> [  480.024297] CPU: 2 PID: 15717 Comm: ovs-vswitchd Tainted: G            E
>> 5.4.0-rc1+ #131
>> [  480.024345] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
>> [  480.024386] RIP: 0010:ovs_flow_tbl_remove+0x151/0x160 [openvswitch]
>> [  480.024424] Code: 55 f7 ea 89 f0 c1 f8 1f 29 c2 39 53 10 0f 8f 6a ff
>> ff ff 48 89 ef d1 fe 5b 5d e9 8a ed ff ff 0f 0b 0f 0b b8 18 00 00 00 eb
>> 92 <0f> 0b 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 57
>> [  480.024527] RSP: 0018:ffffaf32c05e38c8 EFLAGS: 00010246
>> [  480.024560] RAX: 0000000000000010 RBX: ffff9e4f6cd5a000 RCX:
>> ffff9e4f6c585000
>> [  480.024601] RDX: ffff9e4f6cd5a098 RSI: 0000000000000010 RDI:
>> ffff9e4f6b2c6d20
>> [  480.024642] RBP: ffffaf32c05e3b70 R08: ffff9e4f6c1651c0 R09:
>> ffff9e4f756a43c0
>> [  480.024684] R10: 0000000000000000 R11: ffffffffc06e5500 R12:
>> ffff9e4f6baf7800
>> [  480.024742] R13: ffff9e4f6b2c6d20 R14: ffff9e4f724a4e14 R15:
>> 0000000000000007
>> [  480.024790] FS:  00007fdd76058980(0000) GS:ffff9e4f77b00000(0000)
>> knlGS:0000000000000000
>> [  480.024836] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [  480.024871] CR2: 00007ffd18a5ac60 CR3: 0000000230f3a002 CR4:
>> 00000000001606e0
>> [  480.024917] Call Trace:
>> [  480.024941]  action_fifos_exit+0x3240/0x37b0 [openvswitch]
>> [  480.024979]  ? __switch_to_asm+0x40/0x70
>> [  480.025005]  ? __switch_to_asm+0x34/0x70
>> [  480.025031]  ? __switch_to_asm+0x40/0x70
>> [  480.025056]  ? __switch_to_asm+0x40/0x70
>> [  480.025082]  ? __switch_to_asm+0x34/0x70
>> [  480.025108]  ? __switch_to_asm+0x40/0x70
>> [  480.025134]  ? __switch_to_asm+0x34/0x70
>> [  480.025159]  ? __switch_to_asm+0x40/0x70
>> [  480.025185]  ? __switch_to_asm+0x34/0x70
>> [  480.025210]  ? __switch_to_asm+0x40/0x70
>> [  480.025236]  ? __switch_to_asm+0x34/0x70
>> [  480.025262]  ? __switch_to_asm+0x40/0x70
>> [  480.025287]  ? __switch_to_asm+0x34/0x70
>> [  480.025312]  ? __switch_to_asm+0x40/0x70
>> [  480.025338]  ? __switch_to_asm+0x34/0x70
>> [  480.025364]  ? __switch_to_asm+0x40/0x70
>> [  480.025389]  ? __switch_to_asm+0x34/0x70
>> [  480.025415]  ? __switch_to_asm+0x40/0x70
>> [  480.025443]  ? __update_load_avg_se+0x11c/0x2e0
>> [  480.025472]  ? __update_load_avg_se+0x11c/0x2e0
>> [  480.025503]  ? update_load_avg+0x7e/0x600
>> [  480.025529]  ? update_load_avg+0x7e/0x600
>> [  480.025556]  ? update_curr+0x85/0x1d0
>> [  480.025582]  ? cred_has_capability+0x85/0x130
>> [  480.025611]  ? __nla_validate_parse+0x57/0x8a0
>> [  480.025640]  ? _cond_resched+0x15/0x40
>> [  480.025666]  ? genl_family_rcv_msg_attrs_parse.isra.14+0x93/0x100
>> [  480.026523]  genl_rcv_msg+0x1d9/0x490
>> [  480.027385]  ? __switch_to_asm+0x34/0x70
>> [  480.028230]  ? __switch_to_asm+0x40/0x70
>> [  480.029050]  ? __switch_to_asm+0x40/0x70
>> [  480.029874]  ? genl_family_rcv_msg_attrs_parse.isra.14+0x100/0x100
>> [  480.030673]  netlink_rcv_skb+0x4a/0x110
>> [  480.031465]  genl_rcv+0x24/0x40
>> [  480.032312]  netlink_unicast+0x1a0/0x250
>> [  480.033059]  netlink_sendmsg+0x2b4/0x3b0
>> [  480.033758]  sock_sendmsg+0x5b/0x60
>> [  480.034422]  ___sys_sendmsg+0x278/0x2f0
>> [  480.035083]  ? file_update_time+0x60/0x130
>> [  480.035680]  ? pipe_write+0x286/0x400
>> [  480.036290]  ? new_sync_write+0x12d/0x1d0
>> [  480.036882]  ? __sys_sendmsg+0x5e/0xa0
>> [  480.037452]  __sys_sendmsg+0x5e/0xa0
>> [  480.038013]  do_syscall_64+0x52/0x1a0
>> [  480.038546]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>> [  480.039083] RIP: 0033:0x7fdd7537fa6d
>> [  480.039596] Code: b9 20 00 00 75 10 b8 2e 00 00 00 0f 05 48 3d 01 f0
>> ff ff 73 31 c3 48 83 ec 08 e8 fe f6 ff ff 48 89 04 24 b8 2e 00 00 00 0f
>> 05 <48> 8b 3c 24 48 89 c2 e8 47 f7 ff ff 48 89 d0 48 83 c4 08 48 3d 01
>> [  480.040769] RSP: 002b:00007ffd18a6ad40 EFLAGS: 00000293 ORIG_RAX:
>> 000000000000002e
>> [  480.041391] RAX: ffffffffffffffda RBX: 0000000000000001 RCX:
>> 00007fdd7537fa6d
>> [  480.042045] RDX: 0000000000000000 RSI: 00007ffd18a6ada0 RDI:
>> 0000000000000014
>> [  480.042713] RBP: 0000000002300870 R08: 0000000000000000 R09:
>> 00007ffd18a6bd58
>> [  480.043438] R10: 0000000000000000 R11: 0000000000000293 R12:
>> 00007ffd18a6bb70
>> [  480.044138] R13: 00007ffd18a6bd00 R14: 00007ffd18a6bb78 R15:
>> 00007ffd18a6b230
>> [  480.044852] Modules linked in: vport_vxlan(E) vxlan(E) vport_gre(E)
>> ip_gre(E) ip_tunnel(E) vport_geneve(E) geneve(E) ip6_udp_tunnel(E)
>> udp_tunnel(E) openvswitch(E) nsh(E) nf_conncount(E) nf_nat_tftp(E)
>> nf_conntrack_tftp(E) nf_nat_ftp(E) nf_conntrack_ftp(E) nf_nat(E)
>> nf_conntrack_netlink(E) ip6table_filter(E) ip6_tables(E)
>> iptable_filter(E) ip_tables(E) x_tables(E) ip6_gre(E) ip6_tunnel(E)
>> tunnel6(E) gre(E) bonding(E) 8021q(E) garp(E) stp(E) mrp(E) llc(E)
>> veth(E) nfnetlink_cttimeout(E) nfnetlink(E) nf_conntrack(E)
>> nf_defrag_ipv6(E) nf_defrag_ipv4(E) binfmt_misc(E) intel_rapl_msr(E)
>> snd_hda_codec_generic(E) ledtrig_audio(E) snd_hda_intel(E)
>> snd_intel_nhlt(E) joydev(E) snd_hda_codec(E) input_leds(E)
>> snd_hda_core(E) snd_hwdep(E) intel_rapl_common(E) snd_pcm(E)
>> snd_timer(E) serio_raw(E) snd(E) soundcore(E) i2c_piix4(E) mac_hid(E)
>> ib_iser(E) rdma_cm(E) iw_cm(E) ib_cm(E) ib_core(E) configfs(E)
>> iscsi_tcp(E) libiscsi_tcp(E) libiscsi(E) scsi_transport_iscsi(E)
>> autofs4(E) btrfs(E) zstd_decompress(E)
>> [  480.044888]  zstd_compress(E) raid10(E) raid456(E)
>> async_raid6_recov(E) async_memcpy(E) async_pq(E) async_xor(E)
>> async_tx(E) xor(E) raid6_pq(E) libcrc32c(E) raid1(E) raid0(E)
>> multipath(E) linear(E) crct10dif_pclmul(E) crc32_pclmul(E)
>> ghash_clmulni_intel(E) aesni_intel(E) qxl(E) crypto_simd(E) ttm(E)
>> cryptd(E) glue_helper(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E)
>> sysimgblt(E) fb_sys_fops(E) psmouse(E) drm(E) floppy(E) pata_acpi(E)
>> [last unloaded: nf_conntrack_ftp]
>> [  480.056765] ---[ end trace 4a8c4eceeb9f5dec ]---
>> [  480.057953] RIP: 0010:ovs_flow_tbl_remove+0x151/0x160 [openvswitch]
>> [  480.059134] Code: 55 f7 ea 89 f0 c1 f8 1f 29 c2 39 53 10 0f 8f 6a ff
>> ff ff 48 89 ef d1 fe 5b 5d e9 8a ed ff ff 0f 0b 0f 0b b8 18 00 00 00 eb
>> 92 <0f> 0b 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 57
>> [  480.061623] RSP: 0018:ffffaf32c05e38c8 EFLAGS: 00010246
>> [  480.062959] RAX: 0000000000000010 RBX: ffff9e4f6cd5a000 RCX:
>> ffff9e4f6c585000
>> [  480.064248] RDX: ffff9e4f6cd5a098 RSI: 0000000000000010 RDI:
>> ffff9e4f6b2c6d20
>> [  480.065524] RBP: ffffaf32c05e3b70 R08: ffff9e4f6c1651c0 R09:
>> ffff9e4f756a43c0
>> [  480.066830] R10: 0000000000000000 R11: ffffffffc06e5500 R12:
>> ffff9e4f6baf7800
>> [  480.068870] R13: ffff9e4f6b2c6d20 R14: ffff9e4f724a4e14 R15:
>> 0000000000000007
>> [  480.070081] FS:  00007fdd76058980(0000) GS:ffff9e4f77b00000(0000)
>> knlGS:0000000000000000
>> [  480.071340] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [  480.072610] CR2: 00007ffd18a5ac60 CR3: 0000000230f3a002 CR4:
>> 00000000001606e0
>>
>> You're hitting the BUG_ON here:
>>
>> /* Must be called with OVS mutex held. */
>> void ovs_flow_tbl_remove(struct flow_table *table, struct sw_flow *flow)
>> {
>>           struct table_instance *ti = ovsl_dereference(table->ti);
>>           struct table_instance *ufid_ti = ovsl_dereference(table->ufid_ti);
>>
>>           BUG_ON(table->count == 0);
>> <------------------------------------------------ Here
> Hi Greg,
> Thanks for your work, I fixed it, when relloac mask_array I don't
> update ma point in patch 5.
>
> diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
> index bc14b12..210018a 100644
> --- a/net/openvswitch/flow_table.c
> +++ b/net/openvswitch/flow_table.c
> @@ -827,6 +827,8 @@ static int tbl_mask_array_add_mask(struct flow_table *tbl,
>                                                MASK_ARRAY_SIZE_MIN);
>                  if (err)
>                          return err;
> +
> +               ma = ovsl_dereference(tbl->mask_array);
>          }
>
>          BUG_ON(ovsl_dereference(ma->masks[ma->count]));

Hi Tonghao,

I did make the change you suggested:

git diff
diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
index bc14b12..210018a 100644
--- a/net/openvswitch/flow_table.c
+++ b/net/openvswitch/flow_table.c
@@ -827,6 +827,8 @@ static int tbl_mask_array_add_mask(struct flow_table 
*tbl,
                                               MASK_ARRAY_SIZE_MIN);
                 if (err)
                         return err;
+
+               ma = ovsl_dereference(tbl->mask_array);
         }

However, there is still an issue.  Apparently this change just moves the 
bug.  Now I'm getting this splat:

[  512.147478] ------------[ cut here ]------------
[  512.147481] kernel BUG at net/openvswitch/flow_table.c:725!
[  512.147526] invalid opcode: 0000 [#1] SMP PTI
[  512.147552] CPU: 1 PID: 14636 Comm: ovs-vswitchd Tainted: 
G            E     5.4.0-rc1+ #138
[  512.147595] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[  512.147630] RIP: 0010:ovs_flow_tbl_remove+0x151/0x160 [openvswitch]
[  512.147663] Code: 55 f7 ea 89 f0 c1 f8 1f 29 c2 39 53 10 0f 8f 6a ff 
ff ff 48 89 ef d1 fe 5b 5d e9 8a ed ff ff 0f 0b 0f 0b b8 18 00 00 00 eb 
92 <0f> 0b 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 57
[  512.147753] RSP: 0018:ffffb637002cf8c8 EFLAGS: 00010246
[  512.147781] RAX: 0000000000000009 RBX: ffff95ebf32d23c0 RCX: 
ffff95ebf00e5a00
[  512.147817] RDX: ffff95ebf32d2420 RSI: 0000000000000009 RDI: 
ffff95ebf0dffba0
[  512.147852] RBP: ffffb637002cfb70 R08: ffff95ebf6030240 R09: 
ffff95ebf1643180
[  512.147888] R10: ffff95ebf283b814 R11: ffffffffc0932500 R12: 
ffff95ebf040a300
[  512.147924] R13: ffff95ebf0dffba0 R14: ffff95ebf283b814 R15: 
0000000000000007
[  512.147961] FS:  00007fbbab3d2980(0000) GS:ffff95ebf7a80000(0000) 
knlGS:0000000000000000
[  512.148001] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  512.148031] CR2: 00007fffca190ff8 CR3: 0000000232810006 CR4: 
00000000001606e0
[  512.148071] Call Trace:
[  512.148092]  action_fifos_exit+0x3240/0x37b0 [openvswitch]
[  512.148125]  ? update_sd_lb_stats+0x613/0x760
[  512.148152]  ? find_busiest_group+0x3e/0x520
[  512.148177]  ? __nla_validate_parse+0x57/0x8a0
[  512.148203]  ? _cond_resched+0x15/0x40
[  512.148226]  ? genl_family_rcv_msg_attrs_parse+0xe4/0x110
[  512.148256]  genl_rcv_msg+0x1ed/0x430
[  512.148303]  ? __switch_to_asm+0x34/0x70
[  512.148326]  ? __switch_to_asm+0x40/0x70
[  512.148349]  ? __switch_to_asm+0x34/0x70
[  512.148371]  ? __switch_to_asm+0x40/0x70
[  512.148394]  ? __switch_to_asm+0x34/0x70
[  512.148416]  ? __switch_to_asm+0x40/0x70
[  512.148439]  ? genl_family_rcv_msg_attrs_parse+0x110/0x110
[  512.148470]  netlink_rcv_skb+0x4a/0x110
[  512.148492]  genl_rcv+0x24/0x40
[  512.148512]  netlink_unicast+0x1a0/0x250
[  512.148536]  netlink_sendmsg+0x2b4/0x3b0
[  512.148560]  sock_sendmsg+0x5b/0x60
[  512.148582]  ___sys_sendmsg+0x278/0x2f0
[  512.148607]  ? file_update_time+0x60/0x130
[  512.148630]  ? pipe_write+0x286/0x400
[  512.148653]  ? new_sync_write+0x12d/0x1d0
[  512.148676]  ? __sys_sendmsg+0x5e/0xa0
[  512.148697]  __sys_sendmsg+0x5e/0xa0
[  512.148720]  do_syscall_64+0x52/0x1a0
[  512.148742]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  512.148771] RIP: 0033:0x7fbbaa6f9a6d
[  512.149622] Code: b9 20 00 00 75 10 b8 2e 00 00 00 0f 05 48 3d 01 f0 
ff ff 73 31 c3 48 83 ec 08 e8 fe f6 ff ff 48 89 04 24 b8 2e 00 00 00 0f 
05 <48> 8b 3c 24 48 89 c2 e8 47 f7 ff ff 48 89 d0 48 83 c4 08 48 3d 01
[  512.151428] RSP: 002b:00007fffca1a1100 EFLAGS: 00000293 ORIG_RAX: 
000000000000002e
[  512.152349] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 
00007fbbaa6f9a6d
[  512.153266] RDX: 0000000000000000 RSI: 00007fffca1a1160 RDI: 
0000000000000010
[  512.154184] RBP: 0000000000f42680 R08: 0000000000000000 R09: 
00007fffca1a2118
[  512.155094] R10: 0000000000000008 R11: 0000000000000293 R12: 
00007fffca1a1f30
[  512.155992] R13: 00007fffca1a20c0 R14: 00007fffca1a1f38 R15: 
00007fffca1a15f0
[  512.156866] Modules linked in: vport_vxlan(E) vxlan(E) vport_gre(E) 
ip_gre(E) ip_tunnel(E) vport_geneve(E) geneve(E) ip6_udp_tunnel(E) 
udp_tunnel(E) openvswitch(E) nsh(E) nf_conncount(E) nf_nat_tftp(E) 
nf_conntrack_tftp(E) nf_nat_ftp(E) nf_conntrack_ftp(E) nf_nat(E) 
nf_conntrack_netlink(E) ip6table_filter(E) ip6_tables(E) 
iptable_filter(E) ip_tables(E) x_tables(E) ip6_gre(E) ip6_tunnel(E) 
tunnel6(E) gre(E) bonding(E) 8021q(E) garp(E) stp(E) mrp(E) llc(E) 
veth(E) nfnetlink_cttimeout(E) nfnetlink(E) nf_conntrack(E) 
nf_defrag_ipv6(E) nf_defrag_ipv4(E) binfmt_misc(E) intel_rapl_msr(E) 
snd_hda_codec_generic(E) ledtrig_audio(E) snd_hda_intel(E) 
snd_intel_nhlt(E) snd_hda_codec(E) intel_rapl_common(E) snd_hda_core(E) 
snd_hwdep(E) input_leds(E) snd_pcm(E) joydev(E) snd_timer(E) 
serio_raw(E) snd(E) soundcore(E) i2c_piix4(E) mac_hid(E) ib_iser(E) 
rdma_cm(E) iw_cm(E) ib_cm(E) ib_core(E) configfs(E) iscsi_tcp(E) 
libiscsi_tcp(E) libiscsi(E) scsi_transport_iscsi(E) autofs4(E) btrfs(E) 
zstd_decompress(E)
[  512.156899]  zstd_compress(E) raid10(E) raid456(E) 
async_raid6_recov(E) async_memcpy(E) async_pq(E) async_xor(E) 
async_tx(E) xor(E) raid6_pq(E) libcrc32c(E) raid1(E) raid0(E) 
multipath(E) linear(E) crct10dif_pclmul(E) crc32_pclmul(E) 
ghash_clmulni_intel(E) aesni_intel(E) crypto_simd(E) qxl(E) ttm(E) 
cryptd(E) glue_helper(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) 
sysimgblt(E) fb_sys_fops(E) psmouse(E) drm(E) pata_acpi(E) floppy(E) 
[last unloaded: nf_conntrack_ftp]
[  512.168488] ---[ end trace 26730810beeb11e1 ]---
[  512.169555] RIP: 0010:ovs_flow_tbl_remove+0x151/0x160 [openvswitch]
[  512.170638] Code: 55 f7 ea 89 f0 c1 f8 1f 29 c2 39 53 10 0f 8f 6a ff 
ff ff 48 89 ef d1 fe 5b 5d e9 8a ed ff ff 0f 0b 0f 0b b8 18 00 00 00 eb 
92 <0f> 0b 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 57
[  512.172813] RSP: 0018:ffffb637002cf8c8 EFLAGS: 00010246
[  512.173897] RAX: 0000000000000009 RBX: ffff95ebf32d23c0 RCX: 
ffff95ebf00e5a00
[  512.174991] RDX: ffff95ebf32d2420 RSI: 0000000000000009 RDI: 
ffff95ebf0dffba0
[  512.176109] RBP: ffffb637002cfb70 R08: ffff95ebf6030240 R09: 
ffff95ebf1643180
[  512.177229] R10: ffff95ebf283b814 R11: ffffffffc0932500 R12: 
ffff95ebf040a300
[  512.178364] R13: ffff95ebf0dffba0 R14: ffff95ebf283b814 R15: 
0000000000000007
[  512.179530] FS:  00007fbbab3d2980(0000) GS:ffff95ebf7a80000(0000) 
knlGS:0000000000000000
[  512.180700] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  512.181885] CR2: 00007fffca190ff8 CR3: 0000000232810006 CR4: 
00000000001606e0

The code is hitting this:

static void tbl_mask_array_del_mask(struct flow_table *tbl,
                                     struct sw_flow_mask *mask)
{
         struct mask_array *ma = ovsl_dereference(tbl->mask_array);
         int i;

         /* Remove the deleted mask pointers from the array */
         for (i = 0; i < ma->count; i++) {
                 if (mask == ovsl_dereference(ma->masks[i]))
                         goto found;
         }

         BUG();   <----------------------------   Here

Pravin mentioned memory barrier usage in one of his replies. Perhaps 
that is an avenue to explore.

Thanks,

- Greg


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next v2 00/10] optimize openvswitch flow looking up
  2019-10-14 22:26     ` Gregory Rose
@ 2019-10-15  8:25       ` Tonghao Zhang
  2019-10-15 17:37         ` Gregory Rose
  0 siblings, 1 reply; 16+ messages in thread
From: Tonghao Zhang @ 2019-10-15  8:25 UTC (permalink / raw)
  To: Gregory Rose; +Cc: Pravin Shelar, Linux Kernel Network Developers

On Tue, Oct 15, 2019 at 6:26 AM Gregory Rose <gvrose8192@gmail.com> wrote:
>
> On 10/10/2019 1:42 AM, Tonghao Zhang wrote:
> > On Wed, Oct 9, 2019 at 1:33 AM Gregory Rose <gvrose8192@gmail.com> wrote:
> >>
>
> [snip]
>
> >> Hi Tonghao,
> >>
> >> I've applied your patch series and built a 5.4.0-rc1 kernel with them.
> >>
> >> xxxxx@ubuntu-1604:~$ modinfo openvswitch
> >> filename: /lib/modules/5.4.0-rc1+/kernel/net/openvswitch/openvswitch.ko
> >> alias:          net-pf-16-proto-16-family-ovs_ct_limit
> >> alias:          net-pf-16-proto-16-family-ovs_meter
> >> alias:          net-pf-16-proto-16-family-ovs_packet
> >> alias:          net-pf-16-proto-16-family-ovs_flow
> >> alias:          net-pf-16-proto-16-family-ovs_vport
> >> alias:          net-pf-16-proto-16-family-ovs_datapath
> >> license:        GPL
> >> description:    Open vSwitch switching datapath
> >> srcversion:     F15EB8B4460D81BAA16216B
> >> depends: nf_conntrack,nf_nat,nf_conncount,libcrc32c,nf_defrag_ipv6,nsh
> >> retpoline:      Y
> >> intree:         Y
> >> name:           openvswitch
> >> vermagic:       5.4.0-rc1+ SMP mod_unload modversions
> >>
> >> I then built openvswitch master branch from github and ran 'make
> >> check-kernel'.
> >>
> >> In doing so I ran into the following splat in this test:
> >> 63: conntrack - IPv6 fragmentation + vlan
> >>
> >> Here is the splat:
> >> [  480.024215] ------------[ cut here ]------------
> >> [  480.024218] kernel BUG at net/openvswitch/flow_table.c:725!
> >> [  480.024267] invalid opcode: 0000 [#1] SMP PTI
> >> [  480.024297] CPU: 2 PID: 15717 Comm: ovs-vswitchd Tainted: G            E
> >> 5.4.0-rc1+ #131
> >> [  480.024345] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
> >> [  480.024386] RIP: 0010:ovs_flow_tbl_remove+0x151/0x160 [openvswitch]
> >> [  480.024424] Code: 55 f7 ea 89 f0 c1 f8 1f 29 c2 39 53 10 0f 8f 6a ff
> >> ff ff 48 89 ef d1 fe 5b 5d e9 8a ed ff ff 0f 0b 0f 0b b8 18 00 00 00 eb
> >> 92 <0f> 0b 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 57
> >> [  480.024527] RSP: 0018:ffffaf32c05e38c8 EFLAGS: 00010246
> >> [  480.024560] RAX: 0000000000000010 RBX: ffff9e4f6cd5a000 RCX:
> >> ffff9e4f6c585000
> >> [  480.024601] RDX: ffff9e4f6cd5a098 RSI: 0000000000000010 RDI:
> >> ffff9e4f6b2c6d20
> >> [  480.024642] RBP: ffffaf32c05e3b70 R08: ffff9e4f6c1651c0 R09:
> >> ffff9e4f756a43c0
> >> [  480.024684] R10: 0000000000000000 R11: ffffffffc06e5500 R12:
> >> ffff9e4f6baf7800
> >> [  480.024742] R13: ffff9e4f6b2c6d20 R14: ffff9e4f724a4e14 R15:
> >> 0000000000000007
> >> [  480.024790] FS:  00007fdd76058980(0000) GS:ffff9e4f77b00000(0000)
> >> knlGS:0000000000000000
> >> [  480.024836] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> [  480.024871] CR2: 00007ffd18a5ac60 CR3: 0000000230f3a002 CR4:
> >> 00000000001606e0
> >> [  480.024917] Call Trace:
> >> [  480.024941]  action_fifos_exit+0x3240/0x37b0 [openvswitch]
> >> [  480.024979]  ? __switch_to_asm+0x40/0x70
> >> [  480.025005]  ? __switch_to_asm+0x34/0x70
> >> [  480.025031]  ? __switch_to_asm+0x40/0x70
> >> [  480.025056]  ? __switch_to_asm+0x40/0x70
> >> [  480.025082]  ? __switch_to_asm+0x34/0x70
> >> [  480.025108]  ? __switch_to_asm+0x40/0x70
> >> [  480.025134]  ? __switch_to_asm+0x34/0x70
> >> [  480.025159]  ? __switch_to_asm+0x40/0x70
> >> [  480.025185]  ? __switch_to_asm+0x34/0x70
> >> [  480.025210]  ? __switch_to_asm+0x40/0x70
> >> [  480.025236]  ? __switch_to_asm+0x34/0x70
> >> [  480.025262]  ? __switch_to_asm+0x40/0x70
> >> [  480.025287]  ? __switch_to_asm+0x34/0x70
> >> [  480.025312]  ? __switch_to_asm+0x40/0x70
> >> [  480.025338]  ? __switch_to_asm+0x34/0x70
> >> [  480.025364]  ? __switch_to_asm+0x40/0x70
> >> [  480.025389]  ? __switch_to_asm+0x34/0x70
> >> [  480.025415]  ? __switch_to_asm+0x40/0x70
> >> [  480.025443]  ? __update_load_avg_se+0x11c/0x2e0
> >> [  480.025472]  ? __update_load_avg_se+0x11c/0x2e0
> >> [  480.025503]  ? update_load_avg+0x7e/0x600
> >> [  480.025529]  ? update_load_avg+0x7e/0x600
> >> [  480.025556]  ? update_curr+0x85/0x1d0
> >> [  480.025582]  ? cred_has_capability+0x85/0x130
> >> [  480.025611]  ? __nla_validate_parse+0x57/0x8a0
> >> [  480.025640]  ? _cond_resched+0x15/0x40
> >> [  480.025666]  ? genl_family_rcv_msg_attrs_parse.isra.14+0x93/0x100
> >> [  480.026523]  genl_rcv_msg+0x1d9/0x490
> >> [  480.027385]  ? __switch_to_asm+0x34/0x70
> >> [  480.028230]  ? __switch_to_asm+0x40/0x70
> >> [  480.029050]  ? __switch_to_asm+0x40/0x70
> >> [  480.029874]  ? genl_family_rcv_msg_attrs_parse.isra.14+0x100/0x100
> >> [  480.030673]  netlink_rcv_skb+0x4a/0x110
> >> [  480.031465]  genl_rcv+0x24/0x40
> >> [  480.032312]  netlink_unicast+0x1a0/0x250
> >> [  480.033059]  netlink_sendmsg+0x2b4/0x3b0
> >> [  480.033758]  sock_sendmsg+0x5b/0x60
> >> [  480.034422]  ___sys_sendmsg+0x278/0x2f0
> >> [  480.035083]  ? file_update_time+0x60/0x130
> >> [  480.035680]  ? pipe_write+0x286/0x400
> >> [  480.036290]  ? new_sync_write+0x12d/0x1d0
> >> [  480.036882]  ? __sys_sendmsg+0x5e/0xa0
> >> [  480.037452]  __sys_sendmsg+0x5e/0xa0
> >> [  480.038013]  do_syscall_64+0x52/0x1a0
> >> [  480.038546]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >> [  480.039083] RIP: 0033:0x7fdd7537fa6d
> >> [  480.039596] Code: b9 20 00 00 75 10 b8 2e 00 00 00 0f 05 48 3d 01 f0
> >> ff ff 73 31 c3 48 83 ec 08 e8 fe f6 ff ff 48 89 04 24 b8 2e 00 00 00 0f
> >> 05 <48> 8b 3c 24 48 89 c2 e8 47 f7 ff ff 48 89 d0 48 83 c4 08 48 3d 01
> >> [  480.040769] RSP: 002b:00007ffd18a6ad40 EFLAGS: 00000293 ORIG_RAX:
> >> 000000000000002e
> >> [  480.041391] RAX: ffffffffffffffda RBX: 0000000000000001 RCX:
> >> 00007fdd7537fa6d
> >> [  480.042045] RDX: 0000000000000000 RSI: 00007ffd18a6ada0 RDI:
> >> 0000000000000014
> >> [  480.042713] RBP: 0000000002300870 R08: 0000000000000000 R09:
> >> 00007ffd18a6bd58
> >> [  480.043438] R10: 0000000000000000 R11: 0000000000000293 R12:
> >> 00007ffd18a6bb70
> >> [  480.044138] R13: 00007ffd18a6bd00 R14: 00007ffd18a6bb78 R15:
> >> 00007ffd18a6b230
> >> [  480.044852] Modules linked in: vport_vxlan(E) vxlan(E) vport_gre(E)
> >> ip_gre(E) ip_tunnel(E) vport_geneve(E) geneve(E) ip6_udp_tunnel(E)
> >> udp_tunnel(E) openvswitch(E) nsh(E) nf_conncount(E) nf_nat_tftp(E)
> >> nf_conntrack_tftp(E) nf_nat_ftp(E) nf_conntrack_ftp(E) nf_nat(E)
> >> nf_conntrack_netlink(E) ip6table_filter(E) ip6_tables(E)
> >> iptable_filter(E) ip_tables(E) x_tables(E) ip6_gre(E) ip6_tunnel(E)
> >> tunnel6(E) gre(E) bonding(E) 8021q(E) garp(E) stp(E) mrp(E) llc(E)
> >> veth(E) nfnetlink_cttimeout(E) nfnetlink(E) nf_conntrack(E)
> >> nf_defrag_ipv6(E) nf_defrag_ipv4(E) binfmt_misc(E) intel_rapl_msr(E)
> >> snd_hda_codec_generic(E) ledtrig_audio(E) snd_hda_intel(E)
> >> snd_intel_nhlt(E) joydev(E) snd_hda_codec(E) input_leds(E)
> >> snd_hda_core(E) snd_hwdep(E) intel_rapl_common(E) snd_pcm(E)
> >> snd_timer(E) serio_raw(E) snd(E) soundcore(E) i2c_piix4(E) mac_hid(E)
> >> ib_iser(E) rdma_cm(E) iw_cm(E) ib_cm(E) ib_core(E) configfs(E)
> >> iscsi_tcp(E) libiscsi_tcp(E) libiscsi(E) scsi_transport_iscsi(E)
> >> autofs4(E) btrfs(E) zstd_decompress(E)
> >> [  480.044888]  zstd_compress(E) raid10(E) raid456(E)
> >> async_raid6_recov(E) async_memcpy(E) async_pq(E) async_xor(E)
> >> async_tx(E) xor(E) raid6_pq(E) libcrc32c(E) raid1(E) raid0(E)
> >> multipath(E) linear(E) crct10dif_pclmul(E) crc32_pclmul(E)
> >> ghash_clmulni_intel(E) aesni_intel(E) qxl(E) crypto_simd(E) ttm(E)
> >> cryptd(E) glue_helper(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E)
> >> sysimgblt(E) fb_sys_fops(E) psmouse(E) drm(E) floppy(E) pata_acpi(E)
> >> [last unloaded: nf_conntrack_ftp]
> >> [  480.056765] ---[ end trace 4a8c4eceeb9f5dec ]---
> >> [  480.057953] RIP: 0010:ovs_flow_tbl_remove+0x151/0x160 [openvswitch]
> >> [  480.059134] Code: 55 f7 ea 89 f0 c1 f8 1f 29 c2 39 53 10 0f 8f 6a ff
> >> ff ff 48 89 ef d1 fe 5b 5d e9 8a ed ff ff 0f 0b 0f 0b b8 18 00 00 00 eb
> >> 92 <0f> 0b 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 57
> >> [  480.061623] RSP: 0018:ffffaf32c05e38c8 EFLAGS: 00010246
> >> [  480.062959] RAX: 0000000000000010 RBX: ffff9e4f6cd5a000 RCX:
> >> ffff9e4f6c585000
> >> [  480.064248] RDX: ffff9e4f6cd5a098 RSI: 0000000000000010 RDI:
> >> ffff9e4f6b2c6d20
> >> [  480.065524] RBP: ffffaf32c05e3b70 R08: ffff9e4f6c1651c0 R09:
> >> ffff9e4f756a43c0
> >> [  480.066830] R10: 0000000000000000 R11: ffffffffc06e5500 R12:
> >> ffff9e4f6baf7800
> >> [  480.068870] R13: ffff9e4f6b2c6d20 R14: ffff9e4f724a4e14 R15:
> >> 0000000000000007
> >> [  480.070081] FS:  00007fdd76058980(0000) GS:ffff9e4f77b00000(0000)
> >> knlGS:0000000000000000
> >> [  480.071340] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> [  480.072610] CR2: 00007ffd18a5ac60 CR3: 0000000230f3a002 CR4:
> >> 00000000001606e0
> >>
> >> You're hitting the BUG_ON here:
> >>
> >> /* Must be called with OVS mutex held. */
> >> void ovs_flow_tbl_remove(struct flow_table *table, struct sw_flow *flow)
> >> {
> >>           struct table_instance *ti = ovsl_dereference(table->ti);
> >>           struct table_instance *ufid_ti = ovsl_dereference(table->ufid_ti);
> >>
> >>           BUG_ON(table->count == 0);
> >> <------------------------------------------------ Here
> > Hi Greg,
> > Thanks for your work, I fixed it, when relloac mask_array I don't
> > update ma point in patch 5.
> >
> > diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
> > index bc14b12..210018a 100644
> > --- a/net/openvswitch/flow_table.c
> > +++ b/net/openvswitch/flow_table.c
> > @@ -827,6 +827,8 @@ static int tbl_mask_array_add_mask(struct flow_table *tbl,
> >                                                MASK_ARRAY_SIZE_MIN);
> >                  if (err)
> >                          return err;
> > +
> > +               ma = ovsl_dereference(tbl->mask_array);
> >          }
> >
> >          BUG_ON(ovsl_dereference(ma->masks[ma->count]));
>
> Hi Tonghao,
>
> I did make the change you suggested:
>
> git diff
> diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
> index bc14b12..210018a 100644
> --- a/net/openvswitch/flow_table.c
> +++ b/net/openvswitch/flow_table.c
> @@ -827,6 +827,8 @@ static int tbl_mask_array_add_mask(struct flow_table
> *tbl,
>                                                MASK_ARRAY_SIZE_MIN);
>                  if (err)
>                          return err;
> +
> +               ma = ovsl_dereference(tbl->mask_array);
>          }
>
> However, there is still an issue.  Apparently this change just moves the
> bug.  Now I'm getting this splat:
>
> [  512.147478] ------------[ cut here ]------------
> [  512.147481] kernel BUG at net/openvswitch/flow_table.c:725!
> [  512.147526] invalid opcode: 0000 [#1] SMP PTI
> [  512.147552] CPU: 1 PID: 14636 Comm: ovs-vswitchd Tainted:
> G            E     5.4.0-rc1+ #138
> [  512.147595] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
> [  512.147630] RIP: 0010:ovs_flow_tbl_remove+0x151/0x160 [openvswitch]
> [  512.147663] Code: 55 f7 ea 89 f0 c1 f8 1f 29 c2 39 53 10 0f 8f 6a ff
> ff ff 48 89 ef d1 fe 5b 5d e9 8a ed ff ff 0f 0b 0f 0b b8 18 00 00 00 eb
> 92 <0f> 0b 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 57
> [  512.147753] RSP: 0018:ffffb637002cf8c8 EFLAGS: 00010246
> [  512.147781] RAX: 0000000000000009 RBX: ffff95ebf32d23c0 RCX:
> ffff95ebf00e5a00
> [  512.147817] RDX: ffff95ebf32d2420 RSI: 0000000000000009 RDI:
> ffff95ebf0dffba0
> [  512.147852] RBP: ffffb637002cfb70 R08: ffff95ebf6030240 R09:
> ffff95ebf1643180
> [  512.147888] R10: ffff95ebf283b814 R11: ffffffffc0932500 R12:
> ffff95ebf040a300
> [  512.147924] R13: ffff95ebf0dffba0 R14: ffff95ebf283b814 R15:
> 0000000000000007
> [  512.147961] FS:  00007fbbab3d2980(0000) GS:ffff95ebf7a80000(0000)
> knlGS:0000000000000000
> [  512.148001] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  512.148031] CR2: 00007fffca190ff8 CR3: 0000000232810006 CR4:
> 00000000001606e0
> [  512.148071] Call Trace:
> [  512.148092]  action_fifos_exit+0x3240/0x37b0 [openvswitch]
> [  512.148125]  ? update_sd_lb_stats+0x613/0x760
> [  512.148152]  ? find_busiest_group+0x3e/0x520
> [  512.148177]  ? __nla_validate_parse+0x57/0x8a0
> [  512.148203]  ? _cond_resched+0x15/0x40
> [  512.148226]  ? genl_family_rcv_msg_attrs_parse+0xe4/0x110
> [  512.148256]  genl_rcv_msg+0x1ed/0x430
> [  512.148303]  ? __switch_to_asm+0x34/0x70
> [  512.148326]  ? __switch_to_asm+0x40/0x70
> [  512.148349]  ? __switch_to_asm+0x34/0x70
> [  512.148371]  ? __switch_to_asm+0x40/0x70
> [  512.148394]  ? __switch_to_asm+0x34/0x70
> [  512.148416]  ? __switch_to_asm+0x40/0x70
> [  512.148439]  ? genl_family_rcv_msg_attrs_parse+0x110/0x110
> [  512.148470]  netlink_rcv_skb+0x4a/0x110
> [  512.148492]  genl_rcv+0x24/0x40
> [  512.148512]  netlink_unicast+0x1a0/0x250
> [  512.148536]  netlink_sendmsg+0x2b4/0x3b0
> [  512.148560]  sock_sendmsg+0x5b/0x60
> [  512.148582]  ___sys_sendmsg+0x278/0x2f0
> [  512.148607]  ? file_update_time+0x60/0x130
> [  512.148630]  ? pipe_write+0x286/0x400
> [  512.148653]  ? new_sync_write+0x12d/0x1d0
> [  512.148676]  ? __sys_sendmsg+0x5e/0xa0
> [  512.148697]  __sys_sendmsg+0x5e/0xa0
> [  512.148720]  do_syscall_64+0x52/0x1a0
> [  512.148742]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [  512.148771] RIP: 0033:0x7fbbaa6f9a6d
> [  512.149622] Code: b9 20 00 00 75 10 b8 2e 00 00 00 0f 05 48 3d 01 f0
> ff ff 73 31 c3 48 83 ec 08 e8 fe f6 ff ff 48 89 04 24 b8 2e 00 00 00 0f
> 05 <48> 8b 3c 24 48 89 c2 e8 47 f7 ff ff 48 89 d0 48 83 c4 08 48 3d 01
> [  512.151428] RSP: 002b:00007fffca1a1100 EFLAGS: 00000293 ORIG_RAX:
> 000000000000002e
> [  512.152349] RAX: ffffffffffffffda RBX: 0000000000000001 RCX:
> 00007fbbaa6f9a6d
> [  512.153266] RDX: 0000000000000000 RSI: 00007fffca1a1160 RDI:
> 0000000000000010
> [  512.154184] RBP: 0000000000f42680 R08: 0000000000000000 R09:
> 00007fffca1a2118
> [  512.155094] R10: 0000000000000008 R11: 0000000000000293 R12:
> 00007fffca1a1f30
> [  512.155992] R13: 00007fffca1a20c0 R14: 00007fffca1a1f38 R15:
> 00007fffca1a15f0
> [  512.156866] Modules linked in: vport_vxlan(E) vxlan(E) vport_gre(E)
> ip_gre(E) ip_tunnel(E) vport_geneve(E) geneve(E) ip6_udp_tunnel(E)
> udp_tunnel(E) openvswitch(E) nsh(E) nf_conncount(E) nf_nat_tftp(E)
> nf_conntrack_tftp(E) nf_nat_ftp(E) nf_conntrack_ftp(E) nf_nat(E)
> nf_conntrack_netlink(E) ip6table_filter(E) ip6_tables(E)
> iptable_filter(E) ip_tables(E) x_tables(E) ip6_gre(E) ip6_tunnel(E)
> tunnel6(E) gre(E) bonding(E) 8021q(E) garp(E) stp(E) mrp(E) llc(E)
> veth(E) nfnetlink_cttimeout(E) nfnetlink(E) nf_conntrack(E)
> nf_defrag_ipv6(E) nf_defrag_ipv4(E) binfmt_misc(E) intel_rapl_msr(E)
> snd_hda_codec_generic(E) ledtrig_audio(E) snd_hda_intel(E)
> snd_intel_nhlt(E) snd_hda_codec(E) intel_rapl_common(E) snd_hda_core(E)
> snd_hwdep(E) input_leds(E) snd_pcm(E) joydev(E) snd_timer(E)
> serio_raw(E) snd(E) soundcore(E) i2c_piix4(E) mac_hid(E) ib_iser(E)
> rdma_cm(E) iw_cm(E) ib_cm(E) ib_core(E) configfs(E) iscsi_tcp(E)
> libiscsi_tcp(E) libiscsi(E) scsi_transport_iscsi(E) autofs4(E) btrfs(E)
> zstd_decompress(E)
> [  512.156899]  zstd_compress(E) raid10(E) raid456(E)
> async_raid6_recov(E) async_memcpy(E) async_pq(E) async_xor(E)
> async_tx(E) xor(E) raid6_pq(E) libcrc32c(E) raid1(E) raid0(E)
> multipath(E) linear(E) crct10dif_pclmul(E) crc32_pclmul(E)
> ghash_clmulni_intel(E) aesni_intel(E) crypto_simd(E) qxl(E) ttm(E)
> cryptd(E) glue_helper(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E)
> sysimgblt(E) fb_sys_fops(E) psmouse(E) drm(E) pata_acpi(E) floppy(E)
> [last unloaded: nf_conntrack_ftp]
> [  512.168488] ---[ end trace 26730810beeb11e1 ]---
> [  512.169555] RIP: 0010:ovs_flow_tbl_remove+0x151/0x160 [openvswitch]
> [  512.170638] Code: 55 f7 ea 89 f0 c1 f8 1f 29 c2 39 53 10 0f 8f 6a ff
> ff ff 48 89 ef d1 fe 5b 5d e9 8a ed ff ff 0f 0b 0f 0b b8 18 00 00 00 eb
> 92 <0f> 0b 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 57
> [  512.172813] RSP: 0018:ffffb637002cf8c8 EFLAGS: 00010246
> [  512.173897] RAX: 0000000000000009 RBX: ffff95ebf32d23c0 RCX:
> ffff95ebf00e5a00
> [  512.174991] RDX: ffff95ebf32d2420 RSI: 0000000000000009 RDI:
> ffff95ebf0dffba0
> [  512.176109] RBP: ffffb637002cfb70 R08: ffff95ebf6030240 R09:
> ffff95ebf1643180
> [  512.177229] R10: ffff95ebf283b814 R11: ffffffffc0932500 R12:
> ffff95ebf040a300
> [  512.178364] R13: ffff95ebf0dffba0 R14: ffff95ebf283b814 R15:
> 0000000000000007
> [  512.179530] FS:  00007fbbab3d2980(0000) GS:ffff95ebf7a80000(0000)
> knlGS:0000000000000000
> [  512.180700] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  512.181885] CR2: 00007fffca190ff8 CR3: 0000000232810006 CR4:
> 00000000001606e0
>
> The code is hitting this:
>
> static void tbl_mask_array_del_mask(struct flow_table *tbl,
>                                      struct sw_flow_mask *mask)
> {
>          struct mask_array *ma = ovsl_dereference(tbl->mask_array);
>          int i;
>
>          /* Remove the deleted mask pointers from the array */
>          for (i = 0; i < ma->count; i++) {
>                  if (mask == ovsl_dereference(ma->masks[i]))
>                          goto found;
>          }
>
>          BUG();   <----------------------------   Here
>
> Pravin mentioned memory barrier usage in one of his replies. Perhaps
> that is an avenue to explore.
Hi Pravin, Greg
I run the make check-kernel for a long time, and don't reproduce it.
Greg, how did you reproduce it?

For using barrier, i should add READ_ONCE in flow_lookup in fast path
(read-side ), and use WRITE_ONCE in
tbl_mask_array_add/del_mask (write-side) protected by ovs_mutex. Other
read-side (protected by ovs_mutex),
e.g
* flow_mask_find
* ovs_flow_tbl_lookup_exact
* ovs_flow_tbl_num_masks

can access ma->count directly ?

> Thanks,
>
> - Greg
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next v2 00/10] optimize openvswitch flow looking up
  2019-10-15  8:25       ` Tonghao Zhang
@ 2019-10-15 17:37         ` Gregory Rose
  0 siblings, 0 replies; 16+ messages in thread
From: Gregory Rose @ 2019-10-15 17:37 UTC (permalink / raw)
  To: Tonghao Zhang; +Cc: Pravin Shelar, Linux Kernel Network Developers


On 10/15/2019 1:25 AM, Tonghao Zhang wrote:
> On Tue, Oct 15, 2019 at 6:26 AM Gregory Rose <gvrose8192@gmail.com> wrote:

[snip]

>> Hi Tonghao,
>> I did make the change you suggested:
>>
>> git diff
>> diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
>> index bc14b12..210018a 100644
>> --- a/net/openvswitch/flow_table.c
>> +++ b/net/openvswitch/flow_table.c
>> @@ -827,6 +827,8 @@ static int tbl_mask_array_add_mask(struct flow_table
>> *tbl,
>>                                                 MASK_ARRAY_SIZE_MIN);
>>                   if (err)
>>                           return err;
>> +
>> +               ma = ovsl_dereference(tbl->mask_array);
>>           }
>>
>> However, there is still an issue.  Apparently this change just moves the
>> bug.  Now I'm getting this splat:
>>
>> [  512.147478] ------------[ cut here ]------------
>> [  512.147481] kernel BUG at net/openvswitch/flow_table.c:725!
>> [  512.147526] invalid opcode: 0000 [#1] SMP PTI
>> [  512.147552] CPU: 1 PID: 14636 Comm: ovs-vswitchd Tainted:
>> G            E     5.4.0-rc1+ #138
>> [  512.147595] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
>> [  512.147630] RIP: 0010:ovs_flow_tbl_remove+0x151/0x160 [openvswitch]
>> [  512.147663] Code: 55 f7 ea 89 f0 c1 f8 1f 29 c2 39 53 10 0f 8f 6a ff
>> ff ff 48 89 ef d1 fe 5b 5d e9 8a ed ff ff 0f 0b 0f 0b b8 18 00 00 00 eb
>> 92 <0f> 0b 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 57
>> [  512.147753] RSP: 0018:ffffb637002cf8c8 EFLAGS: 00010246
>> [  512.147781] RAX: 0000000000000009 RBX: ffff95ebf32d23c0 RCX:
>> ffff95ebf00e5a00
>> [  512.147817] RDX: ffff95ebf32d2420 RSI: 0000000000000009 RDI:
>> ffff95ebf0dffba0
>> [  512.147852] RBP: ffffb637002cfb70 R08: ffff95ebf6030240 R09:
>> ffff95ebf1643180
>> [  512.147888] R10: ffff95ebf283b814 R11: ffffffffc0932500 R12:
>> ffff95ebf040a300
>> [  512.147924] R13: ffff95ebf0dffba0 R14: ffff95ebf283b814 R15:
>> 0000000000000007
>> [  512.147961] FS:  00007fbbab3d2980(0000) GS:ffff95ebf7a80000(0000)
>> knlGS:0000000000000000
>> [  512.148001] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [  512.148031] CR2: 00007fffca190ff8 CR3: 0000000232810006 CR4:
>> 00000000001606e0
>> [  512.148071] Call Trace:
>> [  512.148092]  action_fifos_exit+0x3240/0x37b0 [openvswitch]
>> [  512.148125]  ? update_sd_lb_stats+0x613/0x760
>> [  512.148152]  ? find_busiest_group+0x3e/0x520
>> [  512.148177]  ? __nla_validate_parse+0x57/0x8a0
>> [  512.148203]  ? _cond_resched+0x15/0x40
>> [  512.148226]  ? genl_family_rcv_msg_attrs_parse+0xe4/0x110
>> [  512.148256]  genl_rcv_msg+0x1ed/0x430
>> [  512.148303]  ? __switch_to_asm+0x34/0x70
>> [  512.148326]  ? __switch_to_asm+0x40/0x70
>> [  512.148349]  ? __switch_to_asm+0x34/0x70
>> [  512.148371]  ? __switch_to_asm+0x40/0x70
>> [  512.148394]  ? __switch_to_asm+0x34/0x70
>> [  512.148416]  ? __switch_to_asm+0x40/0x70
>> [  512.148439]  ? genl_family_rcv_msg_attrs_parse+0x110/0x110
>> [  512.148470]  netlink_rcv_skb+0x4a/0x110
>> [  512.148492]  genl_rcv+0x24/0x40
>> [  512.148512]  netlink_unicast+0x1a0/0x250
>> [  512.148536]  netlink_sendmsg+0x2b4/0x3b0
>> [  512.148560]  sock_sendmsg+0x5b/0x60
>> [  512.148582]  ___sys_sendmsg+0x278/0x2f0
>> [  512.148607]  ? file_update_time+0x60/0x130
>> [  512.148630]  ? pipe_write+0x286/0x400
>> [  512.148653]  ? new_sync_write+0x12d/0x1d0
>> [  512.148676]  ? __sys_sendmsg+0x5e/0xa0
>> [  512.148697]  __sys_sendmsg+0x5e/0xa0
>> [  512.148720]  do_syscall_64+0x52/0x1a0
>> [  512.148742]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>> [  512.148771] RIP: 0033:0x7fbbaa6f9a6d
>> [  512.149622] Code: b9 20 00 00 75 10 b8 2e 00 00 00 0f 05 48 3d 01 f0
>> ff ff 73 31 c3 48 83 ec 08 e8 fe f6 ff ff 48 89 04 24 b8 2e 00 00 00 0f
>> 05 <48> 8b 3c 24 48 89 c2 e8 47 f7 ff ff 48 89 d0 48 83 c4 08 48 3d 01
>> [  512.151428] RSP: 002b:00007fffca1a1100 EFLAGS: 00000293 ORIG_RAX:
>> 000000000000002e
>> [  512.152349] RAX: ffffffffffffffda RBX: 0000000000000001 RCX:
>> 00007fbbaa6f9a6d
>> [  512.153266] RDX: 0000000000000000 RSI: 00007fffca1a1160 RDI:
>> 0000000000000010
>> [  512.154184] RBP: 0000000000f42680 R08: 0000000000000000 R09:
>> 00007fffca1a2118
>> [  512.155094] R10: 0000000000000008 R11: 0000000000000293 R12:
>> 00007fffca1a1f30
>> [  512.155992] R13: 00007fffca1a20c0 R14: 00007fffca1a1f38 R15:
>> 00007fffca1a15f0
>> [  512.156866] Modules linked in: vport_vxlan(E) vxlan(E) vport_gre(E)
>> ip_gre(E) ip_tunnel(E) vport_geneve(E) geneve(E) ip6_udp_tunnel(E)
>> udp_tunnel(E) openvswitch(E) nsh(E) nf_conncount(E) nf_nat_tftp(E)
>> nf_conntrack_tftp(E) nf_nat_ftp(E) nf_conntrack_ftp(E) nf_nat(E)
>> nf_conntrack_netlink(E) ip6table_filter(E) ip6_tables(E)
>> iptable_filter(E) ip_tables(E) x_tables(E) ip6_gre(E) ip6_tunnel(E)
>> tunnel6(E) gre(E) bonding(E) 8021q(E) garp(E) stp(E) mrp(E) llc(E)
>> veth(E) nfnetlink_cttimeout(E) nfnetlink(E) nf_conntrack(E)
>> nf_defrag_ipv6(E) nf_defrag_ipv4(E) binfmt_misc(E) intel_rapl_msr(E)
>> snd_hda_codec_generic(E) ledtrig_audio(E) snd_hda_intel(E)
>> snd_intel_nhlt(E) snd_hda_codec(E) intel_rapl_common(E) snd_hda_core(E)
>> snd_hwdep(E) input_leds(E) snd_pcm(E) joydev(E) snd_timer(E)
>> serio_raw(E) snd(E) soundcore(E) i2c_piix4(E) mac_hid(E) ib_iser(E)
>> rdma_cm(E) iw_cm(E) ib_cm(E) ib_core(E) configfs(E) iscsi_tcp(E)
>> libiscsi_tcp(E) libiscsi(E) scsi_transport_iscsi(E) autofs4(E) btrfs(E)
>> zstd_decompress(E)
>> [  512.156899]  zstd_compress(E) raid10(E) raid456(E)
>> async_raid6_recov(E) async_memcpy(E) async_pq(E) async_xor(E)
>> async_tx(E) xor(E) raid6_pq(E) libcrc32c(E) raid1(E) raid0(E)
>> multipath(E) linear(E) crct10dif_pclmul(E) crc32_pclmul(E)
>> ghash_clmulni_intel(E) aesni_intel(E) crypto_simd(E) qxl(E) ttm(E)
>> cryptd(E) glue_helper(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E)
>> sysimgblt(E) fb_sys_fops(E) psmouse(E) drm(E) pata_acpi(E) floppy(E)
>> [last unloaded: nf_conntrack_ftp]
>> [  512.168488] ---[ end trace 26730810beeb11e1 ]---
>> [  512.169555] RIP: 0010:ovs_flow_tbl_remove+0x151/0x160 [openvswitch]
>> [  512.170638] Code: 55 f7 ea 89 f0 c1 f8 1f 29 c2 39 53 10 0f 8f 6a ff
>> ff ff 48 89 ef d1 fe 5b 5d e9 8a ed ff ff 0f 0b 0f 0b b8 18 00 00 00 eb
>> 92 <0f> 0b 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 57
>> [  512.172813] RSP: 0018:ffffb637002cf8c8 EFLAGS: 00010246
>> [  512.173897] RAX: 0000000000000009 RBX: ffff95ebf32d23c0 RCX:
>> ffff95ebf00e5a00
>> [  512.174991] RDX: ffff95ebf32d2420 RSI: 0000000000000009 RDI:
>> ffff95ebf0dffba0
>> [  512.176109] RBP: ffffb637002cfb70 R08: ffff95ebf6030240 R09:
>> ffff95ebf1643180
>> [  512.177229] R10: ffff95ebf283b814 R11: ffffffffc0932500 R12:
>> ffff95ebf040a300
>> [  512.178364] R13: ffff95ebf0dffba0 R14: ffff95ebf283b814 R15:
>> 0000000000000007
>> [  512.179530] FS:  00007fbbab3d2980(0000) GS:ffff95ebf7a80000(0000)
>> knlGS:0000000000000000
>> [  512.180700] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [  512.181885] CR2: 00007fffca190ff8 CR3: 0000000232810006 CR4:
>> 00000000001606e0
>>
>> The code is hitting this:
>>
>> static void tbl_mask_array_del_mask(struct flow_table *tbl,
>>                                       struct sw_flow_mask *mask)
>> {
>>           struct mask_array *ma = ovsl_dereference(tbl->mask_array);
>>           int i;
>>
>>           /* Remove the deleted mask pointers from the array */
>>           for (i = 0; i < ma->count; i++) {
>>                   if (mask == ovsl_dereference(ma->masks[i]))
>>                           goto found;
>>           }
>>
>>           BUG();   <----------------------------   Here
>>
>> Pravin mentioned memory barrier usage in one of his replies. Perhaps
>> that is an avenue to explore.
> Hi Pravin, Greg
> I run the make check-kernel for a long time, and don't reproduce it.
> Greg, how did you reproduce it?

Hi Tonghao,

I use a Ubuntu 16.04 base system with updates all applied and build the 
net-next kernel (with your
patches applied) and then install it.

I boot to the new kernel and then build the current master branch of OVS 
from github:
./boot.sh
mkdir _build
cd _build
../configure
make -j8

I then run 'sudo make check-kernel TESTSUITEFLAGS="63"'

Hope this helps with your repro.

Thanks,

- Greg

>
> For using barrier, i should add READ_ONCE in flow_lookup in fast path
> (read-side ), and use WRITE_ONCE in
> tbl_mask_array_add/del_mask (write-side) protected by ovs_mutex. Other
> read-side (protected by ovs_mutex),
> e.g
> * flow_mask_find
> * ovs_flow_tbl_lookup_exact
> * ovs_flow_tbl_num_masks
>
> can access ma->count directly ?
>
>> Thanks,
>>
>> - Greg
>>


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2019-10-15 17:37 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-08  1:00 [PATCH net-next v2 00/10] optimize openvswitch flow looking up xiangxia.m.yue
2019-10-08  1:00 ` [PATCH net-next v2 01/10] net: openvswitch: add flow-mask cache for performance xiangxia.m.yue
2019-10-08  1:00 ` [PATCH net-next v2 02/10] net: openvswitch: convert mask list in mask array xiangxia.m.yue
2019-10-08  1:00 ` [PATCH net-next v2 03/10] net: openvswitch: shrink the mask array if necessary xiangxia.m.yue
2019-10-08  1:00 ` [PATCH net-next v2 04/10] net: openvswitch: optimize flow-mask cache hash collision xiangxia.m.yue
2019-10-08  1:00 ` [PATCH net-next v2 05/10] net: openvswitch: optimize flow-mask looking up xiangxia.m.yue
2019-10-08  1:00 ` [PATCH net-next v2 06/10] net: openvswitch: simplify the flow_hash xiangxia.m.yue
2019-10-08  1:00 ` [PATCH net-next v2 07/10] net: openvswitch: add likely in flow_lookup xiangxia.m.yue
2019-10-08  1:00 ` [PATCH net-next v2 08/10] net: openvswitch: fix possible memleak on destroy flow-table xiangxia.m.yue
2019-10-08  1:00 ` [PATCH net-next v2 09/10] net: openvswitch: don't unlock mutex when changing the user_features fails xiangxia.m.yue
2019-10-08  1:00 ` [PATCH net-next v2 10/10] net: openvswitch: simplify the ovs_dp_cmd_new xiangxia.m.yue
2019-10-08 17:33 ` [PATCH net-next v2 00/10] optimize openvswitch flow looking up Gregory Rose
2019-10-10  8:42   ` Tonghao Zhang
2019-10-14 22:26     ` Gregory Rose
2019-10-15  8:25       ` Tonghao Zhang
2019-10-15 17:37         ` Gregory Rose

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).