* [PATCH net-next v2 00/10] optimize openvswitch flow looking up
@ 2019-10-08 1:00 xiangxia.m.yue
2019-10-08 1:00 ` [PATCH net-next v2 01/10] net: openvswitch: add flow-mask cache for performance xiangxia.m.yue
` (10 more replies)
0 siblings, 11 replies; 16+ messages in thread
From: xiangxia.m.yue @ 2019-10-08 1:00 UTC (permalink / raw)
To: gvrose8192, pshelar; +Cc: netdev, Tonghao Zhang
From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
This series patch optimize openvswitch for performance or simplify
codes.
Patch 1, 2, 4: Port Pravin B Shelar patches to
linux upstream with little changes.
Patch 5, 6, 7: Optimize the flow looking up and
simplify the flow hash.
Patch 8, 9: are bugfix.
The performance test is on Intel Xeon E5-2630 v4.
The test topology is show as below:
+-----------------------------------+
| +---------------------------+ |
| | eth0 ovs-switch eth1 | | Host0
| +---------------------------+ |
+-----------------------------------+
^ |
| |
| |
| |
| v
+-----+----+ +----+-----+
| netperf | Host1 | netserver| Host2
+----------+ +----------+
We use netperf send the 64B packets, and insert 255+ flow-mask:
$ ovs-dpctl add-flow ovs-switch "in_port(1),eth(dst=00:01:00:00:00:00/ff:ff:ff:ff:ff:01),eth_type(0x0800),ipv4(frag=no)" 2
...
$ ovs-dpctl add-flow ovs-switch "in_port(1),eth(dst=00:ff:00:00:00:00/ff:ff:ff:ff:ff:ff),eth_type(0x0800),ipv4(frag=no)" 2
$
$ netperf -t UDP_STREAM -H 2.2.2.200 -l 40 -- -m 18
* Without series patch, throughput 8.28Mbps
* With series patch, throughput 46.05Mbps
v1 -> v2:
1. use kfree_rcu instead of call_rcu.
2. add barrier when changing the ma->count.
3. change the ma->max to ma->count in flow_lookup.
Tonghao Zhang (10):
net: openvswitch: add flow-mask cache for performance
net: openvswitch: convert mask list in mask array
net: openvswitch: shrink the mask array if necessary
net: openvswitch: optimize flow-mask cache hash collision
net: openvswitch: optimize flow-mask looking up
net: openvswitch: simplify the flow_hash
net: openvswitch: add likely in flow_lookup
net: openvswitch: fix possible memleak on destroy flow-table
net: openvswitch: don't unlock mutex when changing the user_features
fails
net: openvswitch: simplify the ovs_dp_cmd_new
net/openvswitch/datapath.c | 65 +++++----
net/openvswitch/flow.h | 1 -
net/openvswitch/flow_table.c | 315 +++++++++++++++++++++++++++++++++++++------
net/openvswitch/flow_table.h | 19 ++-
4 files changed, 328 insertions(+), 72 deletions(-)
--
1.8.3.1
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH net-next v2 01/10] net: openvswitch: add flow-mask cache for performance
2019-10-08 1:00 [PATCH net-next v2 00/10] optimize openvswitch flow looking up xiangxia.m.yue
@ 2019-10-08 1:00 ` xiangxia.m.yue
2019-10-08 1:00 ` [PATCH net-next v2 02/10] net: openvswitch: convert mask list in mask array xiangxia.m.yue
` (9 subsequent siblings)
10 siblings, 0 replies; 16+ messages in thread
From: xiangxia.m.yue @ 2019-10-08 1:00 UTC (permalink / raw)
To: gvrose8192, pshelar; +Cc: netdev, Tonghao Zhang
From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
The idea of this optimization comes from a patch which
is committed in 2014, openvswitch community. The author
is Pravin B Shelar. In order to get high performance, I
implement it again. Later patches will use it.
Pravin B Shelar, says:
| On every packet OVS needs to lookup flow-table with every
| mask until it finds a match. The packet flow-key is first
| masked with mask in the list and then the masked key is
| looked up in flow-table. Therefore number of masks can
| affect packet processing performance.
Link: https://github.com/openvswitch/ovs/commit/5604935e4e1cbc16611d2d97f50b717aa31e8ec5
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
---
net/openvswitch/datapath.c | 3 +-
net/openvswitch/flow_table.c | 109 +++++++++++++++++++++++++++++++++++++------
net/openvswitch/flow_table.h | 11 ++++-
3 files changed, 107 insertions(+), 16 deletions(-)
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index f30e406..9fea7e1 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -227,7 +227,8 @@ void ovs_dp_process_packet(struct sk_buff *skb, struct sw_flow_key *key)
stats = this_cpu_ptr(dp->stats_percpu);
/* Look up flow. */
- flow = ovs_flow_tbl_lookup_stats(&dp->table, key, &n_mask_hit);
+ flow = ovs_flow_tbl_lookup_stats(&dp->table, key, skb_get_hash(skb),
+ &n_mask_hit);
if (unlikely(!flow)) {
struct dp_upcall_info upcall;
diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
index cf3582c..3d515c0 100644
--- a/net/openvswitch/flow_table.c
+++ b/net/openvswitch/flow_table.c
@@ -36,6 +36,10 @@
#define TBL_MIN_BUCKETS 1024
#define REHASH_INTERVAL (10 * 60 * HZ)
+#define MC_HASH_SHIFT 8
+#define MC_HASH_ENTRIES (1u << MC_HASH_SHIFT)
+#define MC_HASH_SEGS ((sizeof(uint32_t) * 8) / MC_HASH_SHIFT)
+
static struct kmem_cache *flow_cache;
struct kmem_cache *flow_stats_cache __read_mostly;
@@ -168,10 +172,15 @@ int ovs_flow_tbl_init(struct flow_table *table)
{
struct table_instance *ti, *ufid_ti;
- ti = table_instance_alloc(TBL_MIN_BUCKETS);
+ table->mask_cache = __alloc_percpu(sizeof(struct mask_cache_entry) *
+ MC_HASH_ENTRIES,
+ __alignof__(struct mask_cache_entry));
+ if (!table->mask_cache)
+ return -ENOMEM;
+ ti = table_instance_alloc(TBL_MIN_BUCKETS);
if (!ti)
- return -ENOMEM;
+ goto free_mask_cache;
ufid_ti = table_instance_alloc(TBL_MIN_BUCKETS);
if (!ufid_ti)
@@ -187,6 +196,8 @@ int ovs_flow_tbl_init(struct flow_table *table)
free_ti:
__table_instance_destroy(ti);
+free_mask_cache:
+ free_percpu(table->mask_cache);
return -ENOMEM;
}
@@ -243,6 +254,7 @@ void ovs_flow_tbl_destroy(struct flow_table *table)
struct table_instance *ti = rcu_dereference_raw(table->ti);
struct table_instance *ufid_ti = rcu_dereference_raw(table->ufid_ti);
+ free_percpu(table->mask_cache);
table_instance_destroy(ti, ufid_ti, false);
}
@@ -425,7 +437,8 @@ static bool ovs_flow_cmp_unmasked_key(const struct sw_flow *flow,
static struct sw_flow *masked_flow_lookup(struct table_instance *ti,
const struct sw_flow_key *unmasked,
- const struct sw_flow_mask *mask)
+ const struct sw_flow_mask *mask,
+ u32 *n_mask_hit)
{
struct sw_flow *flow;
struct hlist_head *head;
@@ -435,6 +448,8 @@ static struct sw_flow *masked_flow_lookup(struct table_instance *ti,
ovs_flow_mask_key(&masked_key, unmasked, false, mask);
hash = flow_hash(&masked_key, &mask->range);
head = find_bucket(ti, hash);
+ (*n_mask_hit)++;
+
hlist_for_each_entry_rcu(flow, head, flow_table.node[ti->node_ver]) {
if (flow->mask == mask && flow->flow_table.hash == hash &&
flow_cmp_masked_key(flow, &masked_key, &mask->range))
@@ -443,30 +458,97 @@ static struct sw_flow *masked_flow_lookup(struct table_instance *ti,
return NULL;
}
-struct sw_flow *ovs_flow_tbl_lookup_stats(struct flow_table *tbl,
- const struct sw_flow_key *key,
- u32 *n_mask_hit)
+static struct sw_flow *flow_lookup(struct flow_table *tbl,
+ struct table_instance *ti,
+ const struct sw_flow_key *key,
+ u32 *n_mask_hit)
{
- struct table_instance *ti = rcu_dereference_ovsl(tbl->ti);
struct sw_flow_mask *mask;
struct sw_flow *flow;
- *n_mask_hit = 0;
list_for_each_entry_rcu(mask, &tbl->mask_list, list) {
- (*n_mask_hit)++;
- flow = masked_flow_lookup(ti, key, mask);
+ flow = masked_flow_lookup(ti, key, mask, n_mask_hit);
if (flow) /* Found */
return flow;
}
return NULL;
}
+/*
+ * mask_cache maps flow to probable mask. This cache is not tightly
+ * coupled cache, It means updates to mask list can result in inconsistent
+ * cache entry in mask cache.
+ * This is per cpu cache and is divided in MC_HASH_SEGS segments.
+ * In case of a hash collision the entry is hashed in next segment.
+ * */
+struct sw_flow *ovs_flow_tbl_lookup_stats(struct flow_table *tbl,
+ const struct sw_flow_key *key,
+ u32 skb_hash,
+ u32 *n_mask_hit)
+{
+ struct table_instance *ti = rcu_dereference_ovsl(tbl->ti);
+ struct mask_cache_entry *entries, *ce, *del;
+ struct sw_flow *flow;
+ u32 hash = skb_hash;
+ int seg;
+
+ *n_mask_hit = 0;
+ if (unlikely(!skb_hash))
+ return flow_lookup(tbl, ti, key, n_mask_hit);
+
+ del = NULL;
+ entries = this_cpu_ptr(tbl->mask_cache);
+
+ for (seg = 0; seg < MC_HASH_SEGS; seg++) {
+ int index;
+
+ index = hash & (MC_HASH_ENTRIES - 1);
+ ce = &entries[index];
+
+ if (ce->skb_hash == skb_hash) {
+ struct sw_flow_mask *mask;
+ int i;
+
+ i = 0;
+ list_for_each_entry_rcu(mask, &tbl->mask_list, list) {
+ if (ce->mask_index == i++) {
+ flow = masked_flow_lookup(ti, key, mask,
+ n_mask_hit);
+ if (flow) /* Found */
+ return flow;
+
+ break;
+ }
+ }
+
+ del = ce;
+ break;
+ }
+
+ if (!del || (del->skb_hash && !ce->skb_hash)) {
+ del = ce;
+ }
+
+ hash >>= MC_HASH_SHIFT;
+ }
+
+ flow = flow_lookup(tbl, ti, key, n_mask_hit);
+
+ if (flow) {
+ del->skb_hash = skb_hash;
+ del->mask_index = (*n_mask_hit - 1);
+ }
+
+ return flow;
+}
+
struct sw_flow *ovs_flow_tbl_lookup(struct flow_table *tbl,
const struct sw_flow_key *key)
{
+ struct table_instance *ti = rcu_dereference_ovsl(tbl->ti);
u32 __always_unused n_mask_hit;
- return ovs_flow_tbl_lookup_stats(tbl, key, &n_mask_hit);
+ return flow_lookup(tbl, ti, key, &n_mask_hit);
}
struct sw_flow *ovs_flow_tbl_lookup_exact(struct flow_table *tbl,
@@ -475,10 +557,11 @@ struct sw_flow *ovs_flow_tbl_lookup_exact(struct flow_table *tbl,
struct table_instance *ti = rcu_dereference_ovsl(tbl->ti);
struct sw_flow_mask *mask;
struct sw_flow *flow;
+ u32 __always_unused n_mask_hit;
/* Always called under ovs-mutex. */
list_for_each_entry(mask, &tbl->mask_list, list) {
- flow = masked_flow_lookup(ti, match->key, mask);
+ flow = masked_flow_lookup(ti, match->key, mask, &n_mask_hit);
if (flow && ovs_identifier_is_key(&flow->id) &&
ovs_flow_cmp_unmasked_key(flow, match))
return flow;
@@ -631,7 +714,7 @@ static int flow_mask_insert(struct flow_table *tbl, struct sw_flow *flow,
return -ENOMEM;
mask->key = new->key;
mask->range = new->range;
- list_add_rcu(&mask->list, &tbl->mask_list);
+ list_add_tail_rcu(&mask->list, &tbl->mask_list);
} else {
BUG_ON(!mask->ref_count);
mask->ref_count++;
diff --git a/net/openvswitch/flow_table.h b/net/openvswitch/flow_table.h
index bc52045..04b6b1c 100644
--- a/net/openvswitch/flow_table.h
+++ b/net/openvswitch/flow_table.h
@@ -22,6 +22,11 @@
#include "flow.h"
+struct mask_cache_entry {
+ u32 skb_hash;
+ u32 mask_index;
+};
+
struct table_instance {
struct hlist_head *buckets;
unsigned int n_buckets;
@@ -34,6 +39,7 @@ struct table_instance {
struct flow_table {
struct table_instance __rcu *ti;
struct table_instance __rcu *ufid_ti;
+ struct mask_cache_entry __percpu *mask_cache;
struct list_head mask_list;
unsigned long last_rehash;
unsigned int count;
@@ -60,8 +66,9 @@ int ovs_flow_tbl_insert(struct flow_table *table, struct sw_flow *flow,
struct sw_flow *ovs_flow_tbl_dump_next(struct table_instance *table,
u32 *bucket, u32 *idx);
struct sw_flow *ovs_flow_tbl_lookup_stats(struct flow_table *,
- const struct sw_flow_key *,
- u32 *n_mask_hit);
+ const struct sw_flow_key *,
+ u32 skb_hash,
+ u32 *n_mask_hit);
struct sw_flow *ovs_flow_tbl_lookup(struct flow_table *,
const struct sw_flow_key *);
struct sw_flow *ovs_flow_tbl_lookup_exact(struct flow_table *tbl,
--
1.8.3.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH net-next v2 02/10] net: openvswitch: convert mask list in mask array
2019-10-08 1:00 [PATCH net-next v2 00/10] optimize openvswitch flow looking up xiangxia.m.yue
2019-10-08 1:00 ` [PATCH net-next v2 01/10] net: openvswitch: add flow-mask cache for performance xiangxia.m.yue
@ 2019-10-08 1:00 ` xiangxia.m.yue
2019-10-08 1:00 ` [PATCH net-next v2 03/10] net: openvswitch: shrink the mask array if necessary xiangxia.m.yue
` (8 subsequent siblings)
10 siblings, 0 replies; 16+ messages in thread
From: xiangxia.m.yue @ 2019-10-08 1:00 UTC (permalink / raw)
To: gvrose8192, pshelar; +Cc: netdev, Tonghao Zhang
From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Port the codes to linux upstream and with little changes.
Pravin B Shelar, says:
| mask caches index of mask in mask_list. On packet recv OVS
| need to traverse mask-list to get cached mask. Therefore array
| is better for retrieving cached mask. This also allows better
| cache replacement algorithm by directly checking mask's existence.
Link: https://github.com/openvswitch/ovs/commit/d49fc3ff53c65e4eca9cabd52ac63396746a7ef5
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
---
net/openvswitch/flow.h | 1 -
net/openvswitch/flow_table.c | 210 ++++++++++++++++++++++++++++++++-----------
net/openvswitch/flow_table.h | 8 +-
3 files changed, 167 insertions(+), 52 deletions(-)
diff --git a/net/openvswitch/flow.h b/net/openvswitch/flow.h
index b830d5f..8080518 100644
--- a/net/openvswitch/flow.h
+++ b/net/openvswitch/flow.h
@@ -166,7 +166,6 @@ struct sw_flow_key_range {
struct sw_flow_mask {
int ref_count;
struct rcu_head rcu;
- struct list_head list;
struct sw_flow_key_range range;
struct sw_flow_key key;
};
diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
index 3d515c0..aab7a27 100644
--- a/net/openvswitch/flow_table.c
+++ b/net/openvswitch/flow_table.c
@@ -34,6 +34,7 @@
#include <net/ndisc.h>
#define TBL_MIN_BUCKETS 1024
+#define MASK_ARRAY_SIZE_MIN 16
#define REHASH_INTERVAL (10 * 60 * HZ)
#define MC_HASH_SHIFT 8
@@ -168,9 +169,51 @@ static struct table_instance *table_instance_alloc(int new_size)
return ti;
}
+static struct mask_array *tbl_mask_array_alloc(int size)
+{
+ struct mask_array *new;
+
+ size = max(MASK_ARRAY_SIZE_MIN, size);
+ new = kzalloc(sizeof(struct mask_array) +
+ sizeof(struct sw_flow_mask *) * size, GFP_KERNEL);
+ if (!new)
+ return NULL;
+
+ new->count = 0;
+ new->max = size;
+
+ return new;
+}
+
+static int tbl_mask_array_realloc(struct flow_table *tbl, int size)
+{
+ struct mask_array *old;
+ struct mask_array *new;
+
+ new = tbl_mask_array_alloc(size);
+ if (!new)
+ return -ENOMEM;
+
+ old = ovsl_dereference(tbl->mask_array);
+ if (old) {
+ int i;
+
+ for (i = 0; i < old->max; i++) {
+ if (ovsl_dereference(old->masks[i]))
+ new->masks[new->count++] = old->masks[i];
+ }
+ }
+
+ rcu_assign_pointer(tbl->mask_array, new);
+ kfree_rcu(old, rcu);
+
+ return 0;
+}
+
int ovs_flow_tbl_init(struct flow_table *table)
{
struct table_instance *ti, *ufid_ti;
+ struct mask_array *ma;
table->mask_cache = __alloc_percpu(sizeof(struct mask_cache_entry) *
MC_HASH_ENTRIES,
@@ -178,9 +221,13 @@ int ovs_flow_tbl_init(struct flow_table *table)
if (!table->mask_cache)
return -ENOMEM;
+ ma = tbl_mask_array_alloc(MASK_ARRAY_SIZE_MIN);
+ if (!ma)
+ goto free_mask_cache;
+
ti = table_instance_alloc(TBL_MIN_BUCKETS);
if (!ti)
- goto free_mask_cache;
+ goto free_mask_array;
ufid_ti = table_instance_alloc(TBL_MIN_BUCKETS);
if (!ufid_ti)
@@ -188,7 +235,7 @@ int ovs_flow_tbl_init(struct flow_table *table)
rcu_assign_pointer(table->ti, ti);
rcu_assign_pointer(table->ufid_ti, ufid_ti);
- INIT_LIST_HEAD(&table->mask_list);
+ rcu_assign_pointer(table->mask_array, ma);
table->last_rehash = jiffies;
table->count = 0;
table->ufid_count = 0;
@@ -196,6 +243,8 @@ int ovs_flow_tbl_init(struct flow_table *table)
free_ti:
__table_instance_destroy(ti);
+free_mask_array:
+ kfree(ma);
free_mask_cache:
free_percpu(table->mask_cache);
return -ENOMEM;
@@ -255,6 +304,7 @@ void ovs_flow_tbl_destroy(struct flow_table *table)
struct table_instance *ufid_ti = rcu_dereference_raw(table->ufid_ti);
free_percpu(table->mask_cache);
+ kfree_rcu(table->mask_array, rcu);
table_instance_destroy(ti, ufid_ti, false);
}
@@ -460,17 +510,27 @@ static struct sw_flow *masked_flow_lookup(struct table_instance *ti,
static struct sw_flow *flow_lookup(struct flow_table *tbl,
struct table_instance *ti,
+ struct mask_array *ma,
const struct sw_flow_key *key,
- u32 *n_mask_hit)
+ u32 *n_mask_hit,
+ u32 *index)
{
- struct sw_flow_mask *mask;
struct sw_flow *flow;
+ int i;
- list_for_each_entry_rcu(mask, &tbl->mask_list, list) {
- flow = masked_flow_lookup(ti, key, mask, n_mask_hit);
- if (flow) /* Found */
- return flow;
+ for (i = 0; i < ma->max; i++) {
+ struct sw_flow_mask *mask;
+
+ mask = rcu_dereference_ovsl(ma->masks[i]);
+ if (mask) {
+ flow = masked_flow_lookup(ti, key, mask, n_mask_hit);
+ if (flow) { /* Found */
+ *index = i;
+ return flow;
+ }
+ }
}
+
return NULL;
}
@@ -486,6 +546,7 @@ struct sw_flow *ovs_flow_tbl_lookup_stats(struct flow_table *tbl,
u32 skb_hash,
u32 *n_mask_hit)
{
+ struct mask_array *ma = rcu_dereference_ovsl(tbl->mask_array);
struct table_instance *ti = rcu_dereference_ovsl(tbl->ti);
struct mask_cache_entry *entries, *ce, *del;
struct sw_flow *flow;
@@ -493,8 +554,11 @@ struct sw_flow *ovs_flow_tbl_lookup_stats(struct flow_table *tbl,
int seg;
*n_mask_hit = 0;
- if (unlikely(!skb_hash))
- return flow_lookup(tbl, ti, key, n_mask_hit);
+ if (unlikely(!skb_hash)) {
+ u32 __always_unused mask_index;
+
+ return flow_lookup(tbl, ti, ma, key, n_mask_hit, &mask_index);
+ }
del = NULL;
entries = this_cpu_ptr(tbl->mask_cache);
@@ -507,37 +571,33 @@ struct sw_flow *ovs_flow_tbl_lookup_stats(struct flow_table *tbl,
if (ce->skb_hash == skb_hash) {
struct sw_flow_mask *mask;
- int i;
-
- i = 0;
- list_for_each_entry_rcu(mask, &tbl->mask_list, list) {
- if (ce->mask_index == i++) {
- flow = masked_flow_lookup(ti, key, mask,
- n_mask_hit);
- if (flow) /* Found */
- return flow;
-
- break;
- }
+ struct sw_flow *flow;
+
+ mask = rcu_dereference_ovsl(ma->masks[ce->mask_index]);
+ if (mask) {
+ flow = masked_flow_lookup(ti, key, mask,
+ n_mask_hit);
+ if (flow) /* Found */
+ return flow;
}
del = ce;
break;
}
- if (!del || (del->skb_hash && !ce->skb_hash)) {
+ if (!del || (del->skb_hash && !ce->skb_hash) ||
+ (rcu_dereference_ovsl(ma->masks[del->mask_index]) &&
+ !rcu_dereference_ovsl(ma->masks[ce->mask_index]))) {
del = ce;
}
hash >>= MC_HASH_SHIFT;
}
- flow = flow_lookup(tbl, ti, key, n_mask_hit);
+ flow = flow_lookup(tbl, ti, ma, key, n_mask_hit, &del->mask_index);
- if (flow) {
+ if (flow)
del->skb_hash = skb_hash;
- del->mask_index = (*n_mask_hit - 1);
- }
return flow;
}
@@ -546,26 +606,38 @@ struct sw_flow *ovs_flow_tbl_lookup(struct flow_table *tbl,
const struct sw_flow_key *key)
{
struct table_instance *ti = rcu_dereference_ovsl(tbl->ti);
+ struct mask_array *ma = rcu_dereference_ovsl(tbl->mask_array);
+
u32 __always_unused n_mask_hit;
+ u32 __always_unused index;
- return flow_lookup(tbl, ti, key, &n_mask_hit);
+ return flow_lookup(tbl, ti, ma, key, &n_mask_hit, &index);
}
struct sw_flow *ovs_flow_tbl_lookup_exact(struct flow_table *tbl,
const struct sw_flow_match *match)
{
- struct table_instance *ti = rcu_dereference_ovsl(tbl->ti);
- struct sw_flow_mask *mask;
- struct sw_flow *flow;
- u32 __always_unused n_mask_hit;
+ struct mask_array *ma = ovsl_dereference(tbl->mask_array);
+ int i;
/* Always called under ovs-mutex. */
- list_for_each_entry(mask, &tbl->mask_list, list) {
+ for (i = 0; i < ma->max; i++) {
+ struct table_instance *ti = rcu_dereference_ovsl(tbl->ti);
+ u32 __always_unused n_mask_hit;
+ struct sw_flow_mask *mask;
+ struct sw_flow *flow;
+
+ mask = ovsl_dereference(ma->masks[i]);
+ if (!mask)
+ continue;
+
flow = masked_flow_lookup(ti, match->key, mask, &n_mask_hit);
if (flow && ovs_identifier_is_key(&flow->id) &&
- ovs_flow_cmp_unmasked_key(flow, match))
+ ovs_flow_cmp_unmasked_key(flow, match)) {
return flow;
+ }
}
+
return NULL;
}
@@ -611,13 +683,8 @@ struct sw_flow *ovs_flow_tbl_lookup_ufid(struct flow_table *tbl,
int ovs_flow_tbl_num_masks(const struct flow_table *table)
{
- struct sw_flow_mask *mask;
- int num = 0;
-
- list_for_each_entry(mask, &table->mask_list, list)
- num++;
-
- return num;
+ struct mask_array *ma = rcu_dereference_ovsl(table->mask_array);
+ return ma->count;
}
static struct table_instance *table_instance_expand(struct table_instance *ti,
@@ -638,8 +705,19 @@ static void flow_mask_remove(struct flow_table *tbl, struct sw_flow_mask *mask)
mask->ref_count--;
if (!mask->ref_count) {
- list_del_rcu(&mask->list);
- kfree_rcu(mask, rcu);
+ struct mask_array *ma;
+ int i;
+
+ ma = ovsl_dereference(tbl->mask_array);
+ for (i = 0; i < ma->max; i++) {
+ if (mask == ovsl_dereference(ma->masks[i])) {
+ RCU_INIT_POINTER(ma->masks[i], NULL);
+ ma->count--;
+ kfree_rcu(mask, rcu);
+ return;
+ }
+ }
+ BUG();
}
}
}
@@ -689,13 +767,16 @@ static bool mask_equal(const struct sw_flow_mask *a,
static struct sw_flow_mask *flow_mask_find(const struct flow_table *tbl,
const struct sw_flow_mask *mask)
{
- struct list_head *ml;
+ struct mask_array *ma;
+ int i;
- list_for_each(ml, &tbl->mask_list) {
- struct sw_flow_mask *m;
- m = container_of(ml, struct sw_flow_mask, list);
- if (mask_equal(mask, m))
- return m;
+ ma = ovsl_dereference(tbl->mask_array);
+ for (i = 0; i < ma->max; i++) {
+ struct sw_flow_mask *t;
+ t = ovsl_dereference(ma->masks[i]);
+
+ if (t && mask_equal(mask, t))
+ return t;
}
return NULL;
@@ -706,15 +787,44 @@ static int flow_mask_insert(struct flow_table *tbl, struct sw_flow *flow,
const struct sw_flow_mask *new)
{
struct sw_flow_mask *mask;
+
mask = flow_mask_find(tbl, new);
if (!mask) {
+ struct mask_array *ma;
+ int i;
+
/* Allocate a new mask if none exsits. */
mask = mask_alloc();
if (!mask)
return -ENOMEM;
mask->key = new->key;
mask->range = new->range;
- list_add_tail_rcu(&mask->list, &tbl->mask_list);
+
+ /* Add mask to mask-list. */
+ ma = ovsl_dereference(tbl->mask_array);
+ if (ma->count >= ma->max) {
+ int err;
+
+ err = tbl_mask_array_realloc(tbl, ma->max +
+ MASK_ARRAY_SIZE_MIN);
+ if (err) {
+ kfree(mask);
+ return err;
+ }
+
+ ma = ovsl_dereference(tbl->mask_array);
+ }
+
+ for (i = 0; i < ma->max; i++) {
+ const struct sw_flow_mask *t;
+
+ t = ovsl_dereference(ma->masks[i]);
+ if (!t) {
+ rcu_assign_pointer(ma->masks[i], mask);
+ ma->count++;
+ break;
+ }
+ }
} else {
BUG_ON(!mask->ref_count);
mask->ref_count++;
diff --git a/net/openvswitch/flow_table.h b/net/openvswitch/flow_table.h
index 04b6b1c..8a5cea6 100644
--- a/net/openvswitch/flow_table.h
+++ b/net/openvswitch/flow_table.h
@@ -27,6 +27,12 @@ struct mask_cache_entry {
u32 mask_index;
};
+struct mask_array {
+ struct rcu_head rcu;
+ int count, max;
+ struct sw_flow_mask __rcu *masks[];
+};
+
struct table_instance {
struct hlist_head *buckets;
unsigned int n_buckets;
@@ -40,7 +46,7 @@ struct flow_table {
struct table_instance __rcu *ti;
struct table_instance __rcu *ufid_ti;
struct mask_cache_entry __percpu *mask_cache;
- struct list_head mask_list;
+ struct mask_array __rcu *mask_array;
unsigned long last_rehash;
unsigned int count;
unsigned int ufid_count;
--
1.8.3.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH net-next v2 03/10] net: openvswitch: shrink the mask array if necessary
2019-10-08 1:00 [PATCH net-next v2 00/10] optimize openvswitch flow looking up xiangxia.m.yue
2019-10-08 1:00 ` [PATCH net-next v2 01/10] net: openvswitch: add flow-mask cache for performance xiangxia.m.yue
2019-10-08 1:00 ` [PATCH net-next v2 02/10] net: openvswitch: convert mask list in mask array xiangxia.m.yue
@ 2019-10-08 1:00 ` xiangxia.m.yue
2019-10-08 1:00 ` [PATCH net-next v2 04/10] net: openvswitch: optimize flow-mask cache hash collision xiangxia.m.yue
` (7 subsequent siblings)
10 siblings, 0 replies; 16+ messages in thread
From: xiangxia.m.yue @ 2019-10-08 1:00 UTC (permalink / raw)
To: gvrose8192, pshelar; +Cc: netdev, Tonghao Zhang
From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
When creating and inserting flow-mask, if there is no available
flow-mask, we realloc the mask array. When removing flow-mask,
if necessary, we shrink mask array.
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
---
net/openvswitch/flow_table.c | 33 +++++++++++++++++++++++----------
1 file changed, 23 insertions(+), 10 deletions(-)
diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
index aab7a27..ff4c4b049 100644
--- a/net/openvswitch/flow_table.c
+++ b/net/openvswitch/flow_table.c
@@ -693,6 +693,23 @@ static struct table_instance *table_instance_expand(struct table_instance *ti,
return table_instance_rehash(ti, ti->n_buckets * 2, ufid);
}
+static void tbl_mask_array_delete_mask(struct mask_array *ma,
+ struct sw_flow_mask *mask)
+{
+ int i;
+
+ /* Remove the deleted mask pointers from the array */
+ for (i = 0; i < ma->max; i++) {
+ if (mask == ovsl_dereference(ma->masks[i])) {
+ RCU_INIT_POINTER(ma->masks[i], NULL);
+ ma->count--;
+ kfree_rcu(mask, rcu);
+ return;
+ }
+ }
+ BUG();
+}
+
/* Remove 'mask' from the mask list, if it is not needed any more. */
static void flow_mask_remove(struct flow_table *tbl, struct sw_flow_mask *mask)
{
@@ -706,18 +723,14 @@ static void flow_mask_remove(struct flow_table *tbl, struct sw_flow_mask *mask)
if (!mask->ref_count) {
struct mask_array *ma;
- int i;
ma = ovsl_dereference(tbl->mask_array);
- for (i = 0; i < ma->max; i++) {
- if (mask == ovsl_dereference(ma->masks[i])) {
- RCU_INIT_POINTER(ma->masks[i], NULL);
- ma->count--;
- kfree_rcu(mask, rcu);
- return;
- }
- }
- BUG();
+ tbl_mask_array_delete_mask(ma, mask);
+
+ /* Shrink the mask array if necessary. */
+ if (ma->max >= (MASK_ARRAY_SIZE_MIN * 2) &&
+ ma->count <= (ma->max / 3))
+ tbl_mask_array_realloc(tbl, ma->max / 2);
}
}
}
--
1.8.3.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH net-next v2 04/10] net: openvswitch: optimize flow-mask cache hash collision
2019-10-08 1:00 [PATCH net-next v2 00/10] optimize openvswitch flow looking up xiangxia.m.yue
` (2 preceding siblings ...)
2019-10-08 1:00 ` [PATCH net-next v2 03/10] net: openvswitch: shrink the mask array if necessary xiangxia.m.yue
@ 2019-10-08 1:00 ` xiangxia.m.yue
2019-10-08 1:00 ` [PATCH net-next v2 05/10] net: openvswitch: optimize flow-mask looking up xiangxia.m.yue
` (6 subsequent siblings)
10 siblings, 0 replies; 16+ messages in thread
From: xiangxia.m.yue @ 2019-10-08 1:00 UTC (permalink / raw)
To: gvrose8192, pshelar; +Cc: netdev, Tonghao Zhang
From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Port the codes to linux upstream and with little changes.
Pravin B Shelar, says:
| In case hash collision on mask cache, OVS does extra flow
| lookup. Following patch avoid it.
Link: https://github.com/openvswitch/ovs/commit/0e6efbe2712da03522532dc5e84806a96f6a0dd1
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
---
net/openvswitch/flow_table.c | 95 ++++++++++++++++++++++++--------------------
1 file changed, 53 insertions(+), 42 deletions(-)
diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
index ff4c4b049..4c82960 100644
--- a/net/openvswitch/flow_table.c
+++ b/net/openvswitch/flow_table.c
@@ -508,6 +508,9 @@ static struct sw_flow *masked_flow_lookup(struct table_instance *ti,
return NULL;
}
+/* Flow lookup does full lookup on flow table. It starts with
+ * mask from index passed in *index.
+ */
static struct sw_flow *flow_lookup(struct flow_table *tbl,
struct table_instance *ti,
struct mask_array *ma,
@@ -516,18 +519,31 @@ static struct sw_flow *flow_lookup(struct flow_table *tbl,
u32 *index)
{
struct sw_flow *flow;
+ struct sw_flow_mask *mask;
int i;
- for (i = 0; i < ma->max; i++) {
- struct sw_flow_mask *mask;
-
- mask = rcu_dereference_ovsl(ma->masks[i]);
+ if (*index < ma->max) {
+ mask = rcu_dereference_ovsl(ma->masks[*index]);
if (mask) {
flow = masked_flow_lookup(ti, key, mask, n_mask_hit);
- if (flow) { /* Found */
- *index = i;
+ if (flow)
return flow;
- }
+ }
+ }
+
+ for (i = 0; i < ma->max; i++) {
+
+ if (i == *index)
+ continue;
+
+ mask = rcu_dereference_ovsl(ma->masks[i]);
+ if (!mask)
+ continue;
+
+ flow = masked_flow_lookup(ti, key, mask, n_mask_hit);
+ if (flow) { /* Found */
+ *index = i;
+ return flow;
}
}
@@ -546,58 +562,54 @@ struct sw_flow *ovs_flow_tbl_lookup_stats(struct flow_table *tbl,
u32 skb_hash,
u32 *n_mask_hit)
{
- struct mask_array *ma = rcu_dereference_ovsl(tbl->mask_array);
- struct table_instance *ti = rcu_dereference_ovsl(tbl->ti);
- struct mask_cache_entry *entries, *ce, *del;
+ struct mask_array *ma = rcu_dereference(tbl->mask_array);
+ struct table_instance *ti = rcu_dereference(tbl->ti);
+ struct mask_cache_entry *entries, *ce;
struct sw_flow *flow;
- u32 hash = skb_hash;
+ u32 hash;
int seg;
*n_mask_hit = 0;
if (unlikely(!skb_hash)) {
- u32 __always_unused mask_index;
+ u32 mask_index = 0;
return flow_lookup(tbl, ti, ma, key, n_mask_hit, &mask_index);
}
- del = NULL;
+ /* Pre and post recirulation flows usually have the same skb_hash
+ * value. To avoid hash collisions, rehash the 'skb_hash' with
+ * 'recirc_id'. */
+ if (key->recirc_id)
+ skb_hash = jhash_1word(skb_hash, key->recirc_id);
+
+ ce = NULL;
+ hash = skb_hash;
entries = this_cpu_ptr(tbl->mask_cache);
+ /* Find the cache entry 'ce' to operate on. */
for (seg = 0; seg < MC_HASH_SEGS; seg++) {
- int index;
-
- index = hash & (MC_HASH_ENTRIES - 1);
- ce = &entries[index];
-
- if (ce->skb_hash == skb_hash) {
- struct sw_flow_mask *mask;
- struct sw_flow *flow;
-
- mask = rcu_dereference_ovsl(ma->masks[ce->mask_index]);
- if (mask) {
- flow = masked_flow_lookup(ti, key, mask,
- n_mask_hit);
- if (flow) /* Found */
- return flow;
- }
-
- del = ce;
- break;
+ int index = hash & (MC_HASH_ENTRIES - 1);
+ struct mask_cache_entry *e;
+
+ e = &entries[index];
+ if (e->skb_hash == skb_hash) {
+ flow = flow_lookup(tbl, ti, ma, key, n_mask_hit,
+ &e->mask_index);
+ if (!flow)
+ e->skb_hash = 0;
+ return flow;
}
- if (!del || (del->skb_hash && !ce->skb_hash) ||
- (rcu_dereference_ovsl(ma->masks[del->mask_index]) &&
- !rcu_dereference_ovsl(ma->masks[ce->mask_index]))) {
- del = ce;
- }
+ if (!ce || e->skb_hash < ce->skb_hash)
+ ce = e; /* A better replacement cache candidate. */
hash >>= MC_HASH_SHIFT;
}
- flow = flow_lookup(tbl, ti, ma, key, n_mask_hit, &del->mask_index);
-
+ /* Cache miss, do full lookup. */
+ flow = flow_lookup(tbl, ti, ma, key, n_mask_hit, &ce->mask_index);
if (flow)
- del->skb_hash = skb_hash;
+ ce->skb_hash = skb_hash;
return flow;
}
@@ -607,9 +619,8 @@ struct sw_flow *ovs_flow_tbl_lookup(struct flow_table *tbl,
{
struct table_instance *ti = rcu_dereference_ovsl(tbl->ti);
struct mask_array *ma = rcu_dereference_ovsl(tbl->mask_array);
-
u32 __always_unused n_mask_hit;
- u32 __always_unused index;
+ u32 index = 0;
return flow_lookup(tbl, ti, ma, key, &n_mask_hit, &index);
}
--
1.8.3.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH net-next v2 05/10] net: openvswitch: optimize flow-mask looking up
2019-10-08 1:00 [PATCH net-next v2 00/10] optimize openvswitch flow looking up xiangxia.m.yue
` (3 preceding siblings ...)
2019-10-08 1:00 ` [PATCH net-next v2 04/10] net: openvswitch: optimize flow-mask cache hash collision xiangxia.m.yue
@ 2019-10-08 1:00 ` xiangxia.m.yue
2019-10-08 1:00 ` [PATCH net-next v2 06/10] net: openvswitch: simplify the flow_hash xiangxia.m.yue
` (5 subsequent siblings)
10 siblings, 0 replies; 16+ messages in thread
From: xiangxia.m.yue @ 2019-10-08 1:00 UTC (permalink / raw)
To: gvrose8192, pshelar; +Cc: netdev, Tonghao Zhang
From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
The full looking up on flow table traverses all mask array.
If mask-array is too large, the number of invalid flow-mask
increase, performance will be drop.
This patch optimizes mask-array operation:
* Inserting, insert it [ma->count- 1] directly.
* Removing, only change last and current mask point, and free current mask.
* Looking up, full looking up will break if mask is NULL.
The function which changes or gets *count* of struct mask_array,
is protected by ovs_lock, but flow_lookup (not protected) should use *max* of
struct mask_array.
Functions protected by ovs_lock:
* tbl_mask_array_del_mask
* tbl_mask_array_add_mask
* flow_mask_find
* ovs_flow_tbl_lookup_exact
* ovs_flow_tbl_num_masks
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
---
net/openvswitch/flow_table.c | 106 ++++++++++++++++++++++---------------------
1 file changed, 54 insertions(+), 52 deletions(-)
diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
index 4c82960..7edddd9 100644
--- a/net/openvswitch/flow_table.c
+++ b/net/openvswitch/flow_table.c
@@ -538,7 +538,7 @@ static struct sw_flow *flow_lookup(struct flow_table *tbl,
mask = rcu_dereference_ovsl(ma->masks[i]);
if (!mask)
- continue;
+ break;
flow = masked_flow_lookup(ti, key, mask, n_mask_hit);
if (flow) { /* Found */
@@ -632,15 +632,13 @@ struct sw_flow *ovs_flow_tbl_lookup_exact(struct flow_table *tbl,
int i;
/* Always called under ovs-mutex. */
- for (i = 0; i < ma->max; i++) {
+ for (i = 0; i < ma->count; i++) {
struct table_instance *ti = rcu_dereference_ovsl(tbl->ti);
u32 __always_unused n_mask_hit;
struct sw_flow_mask *mask;
struct sw_flow *flow;
mask = ovsl_dereference(ma->masks[i]);
- if (!mask)
- continue;
flow = masked_flow_lookup(ti, match->key, mask, &n_mask_hit);
if (flow && ovs_identifier_is_key(&flow->id) &&
@@ -704,21 +702,34 @@ static struct table_instance *table_instance_expand(struct table_instance *ti,
return table_instance_rehash(ti, ti->n_buckets * 2, ufid);
}
-static void tbl_mask_array_delete_mask(struct mask_array *ma,
- struct sw_flow_mask *mask)
+static void tbl_mask_array_del_mask(struct flow_table *tbl,
+ struct sw_flow_mask *mask)
{
+ struct mask_array *ma = ovsl_dereference(tbl->mask_array);
int i;
/* Remove the deleted mask pointers from the array */
- for (i = 0; i < ma->max; i++) {
- if (mask == ovsl_dereference(ma->masks[i])) {
- RCU_INIT_POINTER(ma->masks[i], NULL);
- ma->count--;
- kfree_rcu(mask, rcu);
- return;
- }
+ for (i = 0; i < ma->count; i++) {
+ if (mask == ovsl_dereference(ma->masks[i]))
+ goto found;
}
+
BUG();
+ return;
+
+found:
+ ma->count--;
+ smp_wmb();
+
+ rcu_assign_pointer(ma->masks[i], ma->masks[ma->count]);
+ RCU_INIT_POINTER(ma->masks[ma->count], NULL);
+
+ kfree_rcu(mask, rcu);
+
+ /* Shrink the mask array if necessary. */
+ if (ma->max >= (MASK_ARRAY_SIZE_MIN * 2) &&
+ ma->count <= (ma->max / 3))
+ tbl_mask_array_realloc(tbl, ma->max / 2);
}
/* Remove 'mask' from the mask list, if it is not needed any more. */
@@ -732,17 +743,8 @@ static void flow_mask_remove(struct flow_table *tbl, struct sw_flow_mask *mask)
BUG_ON(!mask->ref_count);
mask->ref_count--;
- if (!mask->ref_count) {
- struct mask_array *ma;
-
- ma = ovsl_dereference(tbl->mask_array);
- tbl_mask_array_delete_mask(ma, mask);
-
- /* Shrink the mask array if necessary. */
- if (ma->max >= (MASK_ARRAY_SIZE_MIN * 2) &&
- ma->count <= (ma->max / 3))
- tbl_mask_array_realloc(tbl, ma->max / 2);
- }
+ if (!mask->ref_count)
+ tbl_mask_array_del_mask(tbl, mask);
}
}
@@ -795,17 +797,40 @@ static struct sw_flow_mask *flow_mask_find(const struct flow_table *tbl,
int i;
ma = ovsl_dereference(tbl->mask_array);
- for (i = 0; i < ma->max; i++) {
+ for (i = 0; i < ma->count; i++) {
struct sw_flow_mask *t;
t = ovsl_dereference(ma->masks[i]);
- if (t && mask_equal(mask, t))
+ if (mask_equal(mask, t))
return t;
}
return NULL;
}
+static int tbl_mask_array_add_mask(struct flow_table *tbl,
+ struct sw_flow_mask *new)
+{
+ struct mask_array *ma = ovsl_dereference(tbl->mask_array);
+ int err;
+
+ if (ma->count >= ma->max) {
+ err = tbl_mask_array_realloc(tbl, ma->max +
+ MASK_ARRAY_SIZE_MIN);
+ if (err)
+ return err;
+ }
+
+ BUG_ON(ovsl_dereference(ma->masks[ma->count]));
+
+ rcu_assign_pointer(ma->masks[ma->count], new);
+
+ smp_wmb();
+ ma->count++;
+
+ return 0;
+}
+
/* Add 'mask' into the mask list, if it is not already there. */
static int flow_mask_insert(struct flow_table *tbl, struct sw_flow *flow,
const struct sw_flow_mask *new)
@@ -814,9 +839,6 @@ static int flow_mask_insert(struct flow_table *tbl, struct sw_flow *flow,
mask = flow_mask_find(tbl, new);
if (!mask) {
- struct mask_array *ma;
- int i;
-
/* Allocate a new mask if none exsits. */
mask = mask_alloc();
if (!mask)
@@ -825,29 +847,9 @@ static int flow_mask_insert(struct flow_table *tbl, struct sw_flow *flow,
mask->range = new->range;
/* Add mask to mask-list. */
- ma = ovsl_dereference(tbl->mask_array);
- if (ma->count >= ma->max) {
- int err;
-
- err = tbl_mask_array_realloc(tbl, ma->max +
- MASK_ARRAY_SIZE_MIN);
- if (err) {
- kfree(mask);
- return err;
- }
-
- ma = ovsl_dereference(tbl->mask_array);
- }
-
- for (i = 0; i < ma->max; i++) {
- const struct sw_flow_mask *t;
-
- t = ovsl_dereference(ma->masks[i]);
- if (!t) {
- rcu_assign_pointer(ma->masks[i], mask);
- ma->count++;
- break;
- }
+ if (tbl_mask_array_add_mask(tbl, mask)) {
+ kfree(mask);
+ return -ENOMEM;
}
} else {
BUG_ON(!mask->ref_count);
--
1.8.3.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH net-next v2 06/10] net: openvswitch: simplify the flow_hash
2019-10-08 1:00 [PATCH net-next v2 00/10] optimize openvswitch flow looking up xiangxia.m.yue
` (4 preceding siblings ...)
2019-10-08 1:00 ` [PATCH net-next v2 05/10] net: openvswitch: optimize flow-mask looking up xiangxia.m.yue
@ 2019-10-08 1:00 ` xiangxia.m.yue
2019-10-08 1:00 ` [PATCH net-next v2 07/10] net: openvswitch: add likely in flow_lookup xiangxia.m.yue
` (4 subsequent siblings)
10 siblings, 0 replies; 16+ messages in thread
From: xiangxia.m.yue @ 2019-10-08 1:00 UTC (permalink / raw)
To: gvrose8192, pshelar; +Cc: netdev, Tonghao Zhang
From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Simplify the code and remove the unnecessary BUILD_BUG_ON.
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
---
net/openvswitch/flow_table.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
index 7edddd9..667f474 100644
--- a/net/openvswitch/flow_table.c
+++ b/net/openvswitch/flow_table.c
@@ -432,13 +432,9 @@ int ovs_flow_tbl_flush(struct flow_table *flow_table)
static u32 flow_hash(const struct sw_flow_key *key,
const struct sw_flow_key_range *range)
{
- int key_start = range->start;
- int key_end = range->end;
- const u32 *hash_key = (const u32 *)((const u8 *)key + key_start);
- int hash_u32s = (key_end - key_start) >> 2;
-
+ const u32 *hash_key = (const u32 *)((const u8 *)key + range->start);
/* Make sure number of hash bytes are multiple of u32. */
- BUILD_BUG_ON(sizeof(long) % sizeof(u32));
+ int hash_u32s = range_n_bytes(range) >> 2;
return jhash2(hash_key, hash_u32s, 0);
}
--
1.8.3.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH net-next v2 07/10] net: openvswitch: add likely in flow_lookup
2019-10-08 1:00 [PATCH net-next v2 00/10] optimize openvswitch flow looking up xiangxia.m.yue
` (5 preceding siblings ...)
2019-10-08 1:00 ` [PATCH net-next v2 06/10] net: openvswitch: simplify the flow_hash xiangxia.m.yue
@ 2019-10-08 1:00 ` xiangxia.m.yue
2019-10-08 1:00 ` [PATCH net-next v2 08/10] net: openvswitch: fix possible memleak on destroy flow-table xiangxia.m.yue
` (3 subsequent siblings)
10 siblings, 0 replies; 16+ messages in thread
From: xiangxia.m.yue @ 2019-10-08 1:00 UTC (permalink / raw)
To: gvrose8192, pshelar; +Cc: netdev, Tonghao Zhang
From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
The most case *index < ma->count, and flow-mask is not NULL.
We add un/likely for performance.
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
---
net/openvswitch/flow_table.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
index 667f474..007f7cd 100644
--- a/net/openvswitch/flow_table.c
+++ b/net/openvswitch/flow_table.c
@@ -518,7 +518,7 @@ static struct sw_flow *flow_lookup(struct flow_table *tbl,
struct sw_flow_mask *mask;
int i;
- if (*index < ma->max) {
+ if (likely(*index < ma->count)) {
mask = rcu_dereference_ovsl(ma->masks[*index]);
if (mask) {
flow = masked_flow_lookup(ti, key, mask, n_mask_hit);
@@ -527,13 +527,13 @@ static struct sw_flow *flow_lookup(struct flow_table *tbl,
}
}
- for (i = 0; i < ma->max; i++) {
+ for (i = 0; i < ma->count; i++) {
if (i == *index)
continue;
mask = rcu_dereference_ovsl(ma->masks[i]);
- if (!mask)
+ if (unlikely(!mask))
break;
flow = masked_flow_lookup(ti, key, mask, n_mask_hit);
--
1.8.3.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH net-next v2 08/10] net: openvswitch: fix possible memleak on destroy flow-table
2019-10-08 1:00 [PATCH net-next v2 00/10] optimize openvswitch flow looking up xiangxia.m.yue
` (6 preceding siblings ...)
2019-10-08 1:00 ` [PATCH net-next v2 07/10] net: openvswitch: add likely in flow_lookup xiangxia.m.yue
@ 2019-10-08 1:00 ` xiangxia.m.yue
2019-10-08 1:00 ` [PATCH net-next v2 09/10] net: openvswitch: don't unlock mutex when changing the user_features fails xiangxia.m.yue
` (2 subsequent siblings)
10 siblings, 0 replies; 16+ messages in thread
From: xiangxia.m.yue @ 2019-10-08 1:00 UTC (permalink / raw)
To: gvrose8192, pshelar; +Cc: netdev, Tonghao Zhang
From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
When we destroy the flow tables which may contain the flow_mask,
so release the flow mask struct.
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
---
net/openvswitch/flow_table.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
index 007f7cd..bc14b12 100644
--- a/net/openvswitch/flow_table.c
+++ b/net/openvswitch/flow_table.c
@@ -295,6 +295,18 @@ static void table_instance_destroy(struct table_instance *ti,
}
}
+static void tbl_mask_array_destroy(struct flow_table *tbl)
+{
+ struct mask_array *ma = ovsl_dereference(tbl->mask_array);
+ int i;
+
+ /* Free the flow-mask and kfree_rcu the NULL is allowed. */
+ for (i = 0; i < ma->count; i++)
+ kfree_rcu(ma->masks[i], rcu);
+
+ kfree_rcu(tbl->mask_array, rcu);
+}
+
/* No need for locking this function is called from RCU callback or
* error path.
*/
@@ -304,7 +316,7 @@ void ovs_flow_tbl_destroy(struct flow_table *table)
struct table_instance *ufid_ti = rcu_dereference_raw(table->ufid_ti);
free_percpu(table->mask_cache);
- kfree_rcu(table->mask_array, rcu);
+ tbl_mask_array_destroy(table);
table_instance_destroy(ti, ufid_ti, false);
}
--
1.8.3.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH net-next v2 09/10] net: openvswitch: don't unlock mutex when changing the user_features fails
2019-10-08 1:00 [PATCH net-next v2 00/10] optimize openvswitch flow looking up xiangxia.m.yue
` (7 preceding siblings ...)
2019-10-08 1:00 ` [PATCH net-next v2 08/10] net: openvswitch: fix possible memleak on destroy flow-table xiangxia.m.yue
@ 2019-10-08 1:00 ` xiangxia.m.yue
2019-10-08 1:00 ` [PATCH net-next v2 10/10] net: openvswitch: simplify the ovs_dp_cmd_new xiangxia.m.yue
2019-10-08 17:33 ` [PATCH net-next v2 00/10] optimize openvswitch flow looking up Gregory Rose
10 siblings, 0 replies; 16+ messages in thread
From: xiangxia.m.yue @ 2019-10-08 1:00 UTC (permalink / raw)
To: gvrose8192, pshelar; +Cc: netdev, Tonghao Zhang, Paul Blakey
From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Unlocking of a not locked mutex is not allowed.
Other kernel thread may be in critical section while
we unlock it because of setting user_feature fail.
Fixes: 95a7233c4 ("net: openvswitch: Set OvS recirc_id from tc chain index")
Cc: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
---
net/openvswitch/datapath.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 9fea7e1..aeb76e4 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -1657,6 +1657,7 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
ovs_dp_reset_user_features(skb, info);
}
+ ovs_unlock();
goto err_destroy_meters;
}
@@ -1673,7 +1674,6 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
return 0;
err_destroy_meters:
- ovs_unlock();
ovs_meters_exit(dp);
err_destroy_ports_array:
kfree(dp->ports);
--
1.8.3.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH net-next v2 10/10] net: openvswitch: simplify the ovs_dp_cmd_new
2019-10-08 1:00 [PATCH net-next v2 00/10] optimize openvswitch flow looking up xiangxia.m.yue
` (8 preceding siblings ...)
2019-10-08 1:00 ` [PATCH net-next v2 09/10] net: openvswitch: don't unlock mutex when changing the user_features fails xiangxia.m.yue
@ 2019-10-08 1:00 ` xiangxia.m.yue
2019-10-08 17:33 ` [PATCH net-next v2 00/10] optimize openvswitch flow looking up Gregory Rose
10 siblings, 0 replies; 16+ messages in thread
From: xiangxia.m.yue @ 2019-10-08 1:00 UTC (permalink / raw)
To: gvrose8192, pshelar; +Cc: netdev, Tonghao Zhang
From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
use the specified functions to init resource.
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
---
net/openvswitch/datapath.c | 60 +++++++++++++++++++++++++++++-----------------
1 file changed, 38 insertions(+), 22 deletions(-)
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index aeb76e4..4d48e48 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -1576,6 +1576,31 @@ static int ovs_dp_change(struct datapath *dp, struct nlattr *a[])
return 0;
}
+static int ovs_dp_stats_init(struct datapath *dp)
+{
+ dp->stats_percpu = netdev_alloc_pcpu_stats(struct dp_stats_percpu);
+ if (!dp->stats_percpu)
+ return -ENOMEM;
+
+ return 0;
+}
+
+static int ovs_dp_vport_init(struct datapath *dp)
+{
+ int i;
+
+ dp->ports = kmalloc_array(DP_VPORT_HASH_BUCKETS,
+ sizeof(struct hlist_head),
+ GFP_KERNEL);
+ if (!dp->ports)
+ return -ENOMEM;
+
+ for (i = 0; i < DP_VPORT_HASH_BUCKETS; i++)
+ INIT_HLIST_HEAD(&dp->ports[i]);
+
+ return 0;
+}
+
static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
{
struct nlattr **a = info->attrs;
@@ -1584,7 +1609,7 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
struct datapath *dp;
struct vport *vport;
struct ovs_net *ovs_net;
- int err, i;
+ int err;
err = -EINVAL;
if (!a[OVS_DP_ATTR_NAME] || !a[OVS_DP_ATTR_UPCALL_PID])
@@ -1597,35 +1622,26 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
err = -ENOMEM;
dp = kzalloc(sizeof(*dp), GFP_KERNEL);
if (dp == NULL)
- goto err_free_reply;
+ goto err_destroy_reply;
ovs_dp_set_net(dp, sock_net(skb->sk));
/* Allocate table. */
err = ovs_flow_tbl_init(&dp->table);
if (err)
- goto err_free_dp;
+ goto err_destroy_dp;
- dp->stats_percpu = netdev_alloc_pcpu_stats(struct dp_stats_percpu);
- if (!dp->stats_percpu) {
- err = -ENOMEM;
+ err = ovs_dp_stats_init(dp);
+ if (err)
goto err_destroy_table;
- }
- dp->ports = kmalloc_array(DP_VPORT_HASH_BUCKETS,
- sizeof(struct hlist_head),
- GFP_KERNEL);
- if (!dp->ports) {
- err = -ENOMEM;
- goto err_destroy_percpu;
- }
-
- for (i = 0; i < DP_VPORT_HASH_BUCKETS; i++)
- INIT_HLIST_HEAD(&dp->ports[i]);
+ err = ovs_dp_vport_init(dp);
+ if (err)
+ goto err_destroy_stats;
err = ovs_meters_init(dp);
if (err)
- goto err_destroy_ports_array;
+ goto err_destroy_ports;
/* Set up our datapath device. */
parms.name = nla_data(a[OVS_DP_ATTR_NAME]);
@@ -1675,15 +1691,15 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
err_destroy_meters:
ovs_meters_exit(dp);
-err_destroy_ports_array:
+err_destroy_ports:
kfree(dp->ports);
-err_destroy_percpu:
+err_destroy_stats:
free_percpu(dp->stats_percpu);
err_destroy_table:
ovs_flow_tbl_destroy(&dp->table);
-err_free_dp:
+err_destroy_dp:
kfree(dp);
-err_free_reply:
+err_destroy_reply:
kfree_skb(reply);
err:
return err;
--
1.8.3.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH net-next v2 00/10] optimize openvswitch flow looking up
2019-10-08 1:00 [PATCH net-next v2 00/10] optimize openvswitch flow looking up xiangxia.m.yue
` (9 preceding siblings ...)
2019-10-08 1:00 ` [PATCH net-next v2 10/10] net: openvswitch: simplify the ovs_dp_cmd_new xiangxia.m.yue
@ 2019-10-08 17:33 ` Gregory Rose
2019-10-10 8:42 ` Tonghao Zhang
10 siblings, 1 reply; 16+ messages in thread
From: Gregory Rose @ 2019-10-08 17:33 UTC (permalink / raw)
To: xiangxia.m.yue, pshelar; +Cc: netdev
On 10/7/2019 6:00 PM, xiangxia.m.yue@gmail.com wrote:
> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
>
> This series patch optimize openvswitch for performance or simplify
> codes.
>
> Patch 1, 2, 4: Port Pravin B Shelar patches to
> linux upstream with little changes.
>
> Patch 5, 6, 7: Optimize the flow looking up and
> simplify the flow hash.
>
> Patch 8, 9: are bugfix.
>
> The performance test is on Intel Xeon E5-2630 v4.
> The test topology is show as below:
>
> +-----------------------------------+
> | +---------------------------+ |
> | | eth0 ovs-switch eth1 | | Host0
> | +---------------------------+ |
> +-----------------------------------+
> ^ |
> | |
> | |
> | |
> | v
> +-----+----+ +----+-----+
> | netperf | Host1 | netserver| Host2
> +----------+ +----------+
>
> We use netperf send the 64B packets, and insert 255+ flow-mask:
> $ ovs-dpctl add-flow ovs-switch "in_port(1),eth(dst=00:01:00:00:00:00/ff:ff:ff:ff:ff:01),eth_type(0x0800),ipv4(frag=no)" 2
> ...
> $ ovs-dpctl add-flow ovs-switch "in_port(1),eth(dst=00:ff:00:00:00:00/ff:ff:ff:ff:ff:ff),eth_type(0x0800),ipv4(frag=no)" 2
> $
> $ netperf -t UDP_STREAM -H 2.2.2.200 -l 40 -- -m 18
>
> * Without series patch, throughput 8.28Mbps
> * With series patch, throughput 46.05Mbps
>
> v1 -> v2:
> 1. use kfree_rcu instead of call_rcu.
> 2. add barrier when changing the ma->count.
> 3. change the ma->max to ma->count in flow_lookup.
>
> Tonghao Zhang (10):
> net: openvswitch: add flow-mask cache for performance
> net: openvswitch: convert mask list in mask array
> net: openvswitch: shrink the mask array if necessary
> net: openvswitch: optimize flow-mask cache hash collision
> net: openvswitch: optimize flow-mask looking up
> net: openvswitch: simplify the flow_hash
> net: openvswitch: add likely in flow_lookup
> net: openvswitch: fix possible memleak on destroy flow-table
> net: openvswitch: don't unlock mutex when changing the user_features
> fails
> net: openvswitch: simplify the ovs_dp_cmd_new
>
> net/openvswitch/datapath.c | 65 +++++----
> net/openvswitch/flow.h | 1 -
> net/openvswitch/flow_table.c | 315 +++++++++++++++++++++++++++++++++++++------
> net/openvswitch/flow_table.h | 19 ++-
> 4 files changed, 328 insertions(+), 72 deletions(-)
>
Hi Tonghao,
I've applied your patch series and built a 5.4.0-rc1 kernel with them.
xxxxx@ubuntu-1604:~$ modinfo openvswitch
filename: /lib/modules/5.4.0-rc1+/kernel/net/openvswitch/openvswitch.ko
alias: net-pf-16-proto-16-family-ovs_ct_limit
alias: net-pf-16-proto-16-family-ovs_meter
alias: net-pf-16-proto-16-family-ovs_packet
alias: net-pf-16-proto-16-family-ovs_flow
alias: net-pf-16-proto-16-family-ovs_vport
alias: net-pf-16-proto-16-family-ovs_datapath
license: GPL
description: Open vSwitch switching datapath
srcversion: F15EB8B4460D81BAA16216B
depends: nf_conntrack,nf_nat,nf_conncount,libcrc32c,nf_defrag_ipv6,nsh
retpoline: Y
intree: Y
name: openvswitch
vermagic: 5.4.0-rc1+ SMP mod_unload modversions
I then built openvswitch master branch from github and ran 'make
check-kernel'.
In doing so I ran into the following splat in this test:
63: conntrack - IPv6 fragmentation + vlan
Here is the splat:
[ 480.024215] ------------[ cut here ]------------
[ 480.024218] kernel BUG at net/openvswitch/flow_table.c:725!
[ 480.024267] invalid opcode: 0000 [#1] SMP PTI
[ 480.024297] CPU: 2 PID: 15717 Comm: ovs-vswitchd Tainted: G E
5.4.0-rc1+ #131
[ 480.024345] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 480.024386] RIP: 0010:ovs_flow_tbl_remove+0x151/0x160 [openvswitch]
[ 480.024424] Code: 55 f7 ea 89 f0 c1 f8 1f 29 c2 39 53 10 0f 8f 6a ff
ff ff 48 89 ef d1 fe 5b 5d e9 8a ed ff ff 0f 0b 0f 0b b8 18 00 00 00 eb
92 <0f> 0b 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 57
[ 480.024527] RSP: 0018:ffffaf32c05e38c8 EFLAGS: 00010246
[ 480.024560] RAX: 0000000000000010 RBX: ffff9e4f6cd5a000 RCX:
ffff9e4f6c585000
[ 480.024601] RDX: ffff9e4f6cd5a098 RSI: 0000000000000010 RDI:
ffff9e4f6b2c6d20
[ 480.024642] RBP: ffffaf32c05e3b70 R08: ffff9e4f6c1651c0 R09:
ffff9e4f756a43c0
[ 480.024684] R10: 0000000000000000 R11: ffffffffc06e5500 R12:
ffff9e4f6baf7800
[ 480.024742] R13: ffff9e4f6b2c6d20 R14: ffff9e4f724a4e14 R15:
0000000000000007
[ 480.024790] FS: 00007fdd76058980(0000) GS:ffff9e4f77b00000(0000)
knlGS:0000000000000000
[ 480.024836] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 480.024871] CR2: 00007ffd18a5ac60 CR3: 0000000230f3a002 CR4:
00000000001606e0
[ 480.024917] Call Trace:
[ 480.024941] action_fifos_exit+0x3240/0x37b0 [openvswitch]
[ 480.024979] ? __switch_to_asm+0x40/0x70
[ 480.025005] ? __switch_to_asm+0x34/0x70
[ 480.025031] ? __switch_to_asm+0x40/0x70
[ 480.025056] ? __switch_to_asm+0x40/0x70
[ 480.025082] ? __switch_to_asm+0x34/0x70
[ 480.025108] ? __switch_to_asm+0x40/0x70
[ 480.025134] ? __switch_to_asm+0x34/0x70
[ 480.025159] ? __switch_to_asm+0x40/0x70
[ 480.025185] ? __switch_to_asm+0x34/0x70
[ 480.025210] ? __switch_to_asm+0x40/0x70
[ 480.025236] ? __switch_to_asm+0x34/0x70
[ 480.025262] ? __switch_to_asm+0x40/0x70
[ 480.025287] ? __switch_to_asm+0x34/0x70
[ 480.025312] ? __switch_to_asm+0x40/0x70
[ 480.025338] ? __switch_to_asm+0x34/0x70
[ 480.025364] ? __switch_to_asm+0x40/0x70
[ 480.025389] ? __switch_to_asm+0x34/0x70
[ 480.025415] ? __switch_to_asm+0x40/0x70
[ 480.025443] ? __update_load_avg_se+0x11c/0x2e0
[ 480.025472] ? __update_load_avg_se+0x11c/0x2e0
[ 480.025503] ? update_load_avg+0x7e/0x600
[ 480.025529] ? update_load_avg+0x7e/0x600
[ 480.025556] ? update_curr+0x85/0x1d0
[ 480.025582] ? cred_has_capability+0x85/0x130
[ 480.025611] ? __nla_validate_parse+0x57/0x8a0
[ 480.025640] ? _cond_resched+0x15/0x40
[ 480.025666] ? genl_family_rcv_msg_attrs_parse.isra.14+0x93/0x100
[ 480.026523] genl_rcv_msg+0x1d9/0x490
[ 480.027385] ? __switch_to_asm+0x34/0x70
[ 480.028230] ? __switch_to_asm+0x40/0x70
[ 480.029050] ? __switch_to_asm+0x40/0x70
[ 480.029874] ? genl_family_rcv_msg_attrs_parse.isra.14+0x100/0x100
[ 480.030673] netlink_rcv_skb+0x4a/0x110
[ 480.031465] genl_rcv+0x24/0x40
[ 480.032312] netlink_unicast+0x1a0/0x250
[ 480.033059] netlink_sendmsg+0x2b4/0x3b0
[ 480.033758] sock_sendmsg+0x5b/0x60
[ 480.034422] ___sys_sendmsg+0x278/0x2f0
[ 480.035083] ? file_update_time+0x60/0x130
[ 480.035680] ? pipe_write+0x286/0x400
[ 480.036290] ? new_sync_write+0x12d/0x1d0
[ 480.036882] ? __sys_sendmsg+0x5e/0xa0
[ 480.037452] __sys_sendmsg+0x5e/0xa0
[ 480.038013] do_syscall_64+0x52/0x1a0
[ 480.038546] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 480.039083] RIP: 0033:0x7fdd7537fa6d
[ 480.039596] Code: b9 20 00 00 75 10 b8 2e 00 00 00 0f 05 48 3d 01 f0
ff ff 73 31 c3 48 83 ec 08 e8 fe f6 ff ff 48 89 04 24 b8 2e 00 00 00 0f
05 <48> 8b 3c 24 48 89 c2 e8 47 f7 ff ff 48 89 d0 48 83 c4 08 48 3d 01
[ 480.040769] RSP: 002b:00007ffd18a6ad40 EFLAGS: 00000293 ORIG_RAX:
000000000000002e
[ 480.041391] RAX: ffffffffffffffda RBX: 0000000000000001 RCX:
00007fdd7537fa6d
[ 480.042045] RDX: 0000000000000000 RSI: 00007ffd18a6ada0 RDI:
0000000000000014
[ 480.042713] RBP: 0000000002300870 R08: 0000000000000000 R09:
00007ffd18a6bd58
[ 480.043438] R10: 0000000000000000 R11: 0000000000000293 R12:
00007ffd18a6bb70
[ 480.044138] R13: 00007ffd18a6bd00 R14: 00007ffd18a6bb78 R15:
00007ffd18a6b230
[ 480.044852] Modules linked in: vport_vxlan(E) vxlan(E) vport_gre(E)
ip_gre(E) ip_tunnel(E) vport_geneve(E) geneve(E) ip6_udp_tunnel(E)
udp_tunnel(E) openvswitch(E) nsh(E) nf_conncount(E) nf_nat_tftp(E)
nf_conntrack_tftp(E) nf_nat_ftp(E) nf_conntrack_ftp(E) nf_nat(E)
nf_conntrack_netlink(E) ip6table_filter(E) ip6_tables(E)
iptable_filter(E) ip_tables(E) x_tables(E) ip6_gre(E) ip6_tunnel(E)
tunnel6(E) gre(E) bonding(E) 8021q(E) garp(E) stp(E) mrp(E) llc(E)
veth(E) nfnetlink_cttimeout(E) nfnetlink(E) nf_conntrack(E)
nf_defrag_ipv6(E) nf_defrag_ipv4(E) binfmt_misc(E) intel_rapl_msr(E)
snd_hda_codec_generic(E) ledtrig_audio(E) snd_hda_intel(E)
snd_intel_nhlt(E) joydev(E) snd_hda_codec(E) input_leds(E)
snd_hda_core(E) snd_hwdep(E) intel_rapl_common(E) snd_pcm(E)
snd_timer(E) serio_raw(E) snd(E) soundcore(E) i2c_piix4(E) mac_hid(E)
ib_iser(E) rdma_cm(E) iw_cm(E) ib_cm(E) ib_core(E) configfs(E)
iscsi_tcp(E) libiscsi_tcp(E) libiscsi(E) scsi_transport_iscsi(E)
autofs4(E) btrfs(E) zstd_decompress(E)
[ 480.044888] zstd_compress(E) raid10(E) raid456(E)
async_raid6_recov(E) async_memcpy(E) async_pq(E) async_xor(E)
async_tx(E) xor(E) raid6_pq(E) libcrc32c(E) raid1(E) raid0(E)
multipath(E) linear(E) crct10dif_pclmul(E) crc32_pclmul(E)
ghash_clmulni_intel(E) aesni_intel(E) qxl(E) crypto_simd(E) ttm(E)
cryptd(E) glue_helper(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E)
sysimgblt(E) fb_sys_fops(E) psmouse(E) drm(E) floppy(E) pata_acpi(E)
[last unloaded: nf_conntrack_ftp]
[ 480.056765] ---[ end trace 4a8c4eceeb9f5dec ]---
[ 480.057953] RIP: 0010:ovs_flow_tbl_remove+0x151/0x160 [openvswitch]
[ 480.059134] Code: 55 f7 ea 89 f0 c1 f8 1f 29 c2 39 53 10 0f 8f 6a ff
ff ff 48 89 ef d1 fe 5b 5d e9 8a ed ff ff 0f 0b 0f 0b b8 18 00 00 00 eb
92 <0f> 0b 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 57
[ 480.061623] RSP: 0018:ffffaf32c05e38c8 EFLAGS: 00010246
[ 480.062959] RAX: 0000000000000010 RBX: ffff9e4f6cd5a000 RCX:
ffff9e4f6c585000
[ 480.064248] RDX: ffff9e4f6cd5a098 RSI: 0000000000000010 RDI:
ffff9e4f6b2c6d20
[ 480.065524] RBP: ffffaf32c05e3b70 R08: ffff9e4f6c1651c0 R09:
ffff9e4f756a43c0
[ 480.066830] R10: 0000000000000000 R11: ffffffffc06e5500 R12:
ffff9e4f6baf7800
[ 480.068870] R13: ffff9e4f6b2c6d20 R14: ffff9e4f724a4e14 R15:
0000000000000007
[ 480.070081] FS: 00007fdd76058980(0000) GS:ffff9e4f77b00000(0000)
knlGS:0000000000000000
[ 480.071340] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 480.072610] CR2: 00007ffd18a5ac60 CR3: 0000000230f3a002 CR4:
00000000001606e0
You're hitting the BUG_ON here:
/* Must be called with OVS mutex held. */
void ovs_flow_tbl_remove(struct flow_table *table, struct sw_flow *flow)
{
struct table_instance *ti = ovsl_dereference(table->ti);
struct table_instance *ufid_ti = ovsl_dereference(table->ufid_ti);
BUG_ON(table->count == 0);
<------------------------------------------------ Here
Thanks,
- Greg
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH net-next v2 00/10] optimize openvswitch flow looking up
2019-10-08 17:33 ` [PATCH net-next v2 00/10] optimize openvswitch flow looking up Gregory Rose
@ 2019-10-10 8:42 ` Tonghao Zhang
2019-10-14 22:26 ` Gregory Rose
0 siblings, 1 reply; 16+ messages in thread
From: Tonghao Zhang @ 2019-10-10 8:42 UTC (permalink / raw)
To: Gregory Rose; +Cc: Pravin Shelar, Linux Kernel Network Developers
On Wed, Oct 9, 2019 at 1:33 AM Gregory Rose <gvrose8192@gmail.com> wrote:
>
>
> On 10/7/2019 6:00 PM, xiangxia.m.yue@gmail.com wrote:
> > From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> >
> > This series patch optimize openvswitch for performance or simplify
> > codes.
> >
> > Patch 1, 2, 4: Port Pravin B Shelar patches to
> > linux upstream with little changes.
> >
> > Patch 5, 6, 7: Optimize the flow looking up and
> > simplify the flow hash.
> >
> > Patch 8, 9: are bugfix.
> >
> > The performance test is on Intel Xeon E5-2630 v4.
> > The test topology is show as below:
> >
> > +-----------------------------------+
> > | +---------------------------+ |
> > | | eth0 ovs-switch eth1 | | Host0
> > | +---------------------------+ |
> > +-----------------------------------+
> > ^ |
> > | |
> > | |
> > | |
> > | v
> > +-----+----+ +----+-----+
> > | netperf | Host1 | netserver| Host2
> > +----------+ +----------+
> >
> > We use netperf send the 64B packets, and insert 255+ flow-mask:
> > $ ovs-dpctl add-flow ovs-switch "in_port(1),eth(dst=00:01:00:00:00:00/ff:ff:ff:ff:ff:01),eth_type(0x0800),ipv4(frag=no)" 2
> > ...
> > $ ovs-dpctl add-flow ovs-switch "in_port(1),eth(dst=00:ff:00:00:00:00/ff:ff:ff:ff:ff:ff),eth_type(0x0800),ipv4(frag=no)" 2
> > $
> > $ netperf -t UDP_STREAM -H 2.2.2.200 -l 40 -- -m 18
> >
> > * Without series patch, throughput 8.28Mbps
> > * With series patch, throughput 46.05Mbps
> >
> > v1 -> v2:
> > 1. use kfree_rcu instead of call_rcu.
> > 2. add barrier when changing the ma->count.
> > 3. change the ma->max to ma->count in flow_lookup.
> >
> > Tonghao Zhang (10):
> > net: openvswitch: add flow-mask cache for performance
> > net: openvswitch: convert mask list in mask array
> > net: openvswitch: shrink the mask array if necessary
> > net: openvswitch: optimize flow-mask cache hash collision
> > net: openvswitch: optimize flow-mask looking up
> > net: openvswitch: simplify the flow_hash
> > net: openvswitch: add likely in flow_lookup
> > net: openvswitch: fix possible memleak on destroy flow-table
> > net: openvswitch: don't unlock mutex when changing the user_features
> > fails
> > net: openvswitch: simplify the ovs_dp_cmd_new
> >
> > net/openvswitch/datapath.c | 65 +++++----
> > net/openvswitch/flow.h | 1 -
> > net/openvswitch/flow_table.c | 315 +++++++++++++++++++++++++++++++++++++------
> > net/openvswitch/flow_table.h | 19 ++-
> > 4 files changed, 328 insertions(+), 72 deletions(-)
> >
>
> Hi Tonghao,
>
> I've applied your patch series and built a 5.4.0-rc1 kernel with them.
>
> xxxxx@ubuntu-1604:~$ modinfo openvswitch
> filename: /lib/modules/5.4.0-rc1+/kernel/net/openvswitch/openvswitch.ko
> alias: net-pf-16-proto-16-family-ovs_ct_limit
> alias: net-pf-16-proto-16-family-ovs_meter
> alias: net-pf-16-proto-16-family-ovs_packet
> alias: net-pf-16-proto-16-family-ovs_flow
> alias: net-pf-16-proto-16-family-ovs_vport
> alias: net-pf-16-proto-16-family-ovs_datapath
> license: GPL
> description: Open vSwitch switching datapath
> srcversion: F15EB8B4460D81BAA16216B
> depends: nf_conntrack,nf_nat,nf_conncount,libcrc32c,nf_defrag_ipv6,nsh
> retpoline: Y
> intree: Y
> name: openvswitch
> vermagic: 5.4.0-rc1+ SMP mod_unload modversions
>
> I then built openvswitch master branch from github and ran 'make
> check-kernel'.
>
> In doing so I ran into the following splat in this test:
> 63: conntrack - IPv6 fragmentation + vlan
>
> Here is the splat:
> [ 480.024215] ------------[ cut here ]------------
> [ 480.024218] kernel BUG at net/openvswitch/flow_table.c:725!
> [ 480.024267] invalid opcode: 0000 [#1] SMP PTI
> [ 480.024297] CPU: 2 PID: 15717 Comm: ovs-vswitchd Tainted: G E
> 5.4.0-rc1+ #131
> [ 480.024345] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
> [ 480.024386] RIP: 0010:ovs_flow_tbl_remove+0x151/0x160 [openvswitch]
> [ 480.024424] Code: 55 f7 ea 89 f0 c1 f8 1f 29 c2 39 53 10 0f 8f 6a ff
> ff ff 48 89 ef d1 fe 5b 5d e9 8a ed ff ff 0f 0b 0f 0b b8 18 00 00 00 eb
> 92 <0f> 0b 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 57
> [ 480.024527] RSP: 0018:ffffaf32c05e38c8 EFLAGS: 00010246
> [ 480.024560] RAX: 0000000000000010 RBX: ffff9e4f6cd5a000 RCX:
> ffff9e4f6c585000
> [ 480.024601] RDX: ffff9e4f6cd5a098 RSI: 0000000000000010 RDI:
> ffff9e4f6b2c6d20
> [ 480.024642] RBP: ffffaf32c05e3b70 R08: ffff9e4f6c1651c0 R09:
> ffff9e4f756a43c0
> [ 480.024684] R10: 0000000000000000 R11: ffffffffc06e5500 R12:
> ffff9e4f6baf7800
> [ 480.024742] R13: ffff9e4f6b2c6d20 R14: ffff9e4f724a4e14 R15:
> 0000000000000007
> [ 480.024790] FS: 00007fdd76058980(0000) GS:ffff9e4f77b00000(0000)
> knlGS:0000000000000000
> [ 480.024836] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 480.024871] CR2: 00007ffd18a5ac60 CR3: 0000000230f3a002 CR4:
> 00000000001606e0
> [ 480.024917] Call Trace:
> [ 480.024941] action_fifos_exit+0x3240/0x37b0 [openvswitch]
> [ 480.024979] ? __switch_to_asm+0x40/0x70
> [ 480.025005] ? __switch_to_asm+0x34/0x70
> [ 480.025031] ? __switch_to_asm+0x40/0x70
> [ 480.025056] ? __switch_to_asm+0x40/0x70
> [ 480.025082] ? __switch_to_asm+0x34/0x70
> [ 480.025108] ? __switch_to_asm+0x40/0x70
> [ 480.025134] ? __switch_to_asm+0x34/0x70
> [ 480.025159] ? __switch_to_asm+0x40/0x70
> [ 480.025185] ? __switch_to_asm+0x34/0x70
> [ 480.025210] ? __switch_to_asm+0x40/0x70
> [ 480.025236] ? __switch_to_asm+0x34/0x70
> [ 480.025262] ? __switch_to_asm+0x40/0x70
> [ 480.025287] ? __switch_to_asm+0x34/0x70
> [ 480.025312] ? __switch_to_asm+0x40/0x70
> [ 480.025338] ? __switch_to_asm+0x34/0x70
> [ 480.025364] ? __switch_to_asm+0x40/0x70
> [ 480.025389] ? __switch_to_asm+0x34/0x70
> [ 480.025415] ? __switch_to_asm+0x40/0x70
> [ 480.025443] ? __update_load_avg_se+0x11c/0x2e0
> [ 480.025472] ? __update_load_avg_se+0x11c/0x2e0
> [ 480.025503] ? update_load_avg+0x7e/0x600
> [ 480.025529] ? update_load_avg+0x7e/0x600
> [ 480.025556] ? update_curr+0x85/0x1d0
> [ 480.025582] ? cred_has_capability+0x85/0x130
> [ 480.025611] ? __nla_validate_parse+0x57/0x8a0
> [ 480.025640] ? _cond_resched+0x15/0x40
> [ 480.025666] ? genl_family_rcv_msg_attrs_parse.isra.14+0x93/0x100
> [ 480.026523] genl_rcv_msg+0x1d9/0x490
> [ 480.027385] ? __switch_to_asm+0x34/0x70
> [ 480.028230] ? __switch_to_asm+0x40/0x70
> [ 480.029050] ? __switch_to_asm+0x40/0x70
> [ 480.029874] ? genl_family_rcv_msg_attrs_parse.isra.14+0x100/0x100
> [ 480.030673] netlink_rcv_skb+0x4a/0x110
> [ 480.031465] genl_rcv+0x24/0x40
> [ 480.032312] netlink_unicast+0x1a0/0x250
> [ 480.033059] netlink_sendmsg+0x2b4/0x3b0
> [ 480.033758] sock_sendmsg+0x5b/0x60
> [ 480.034422] ___sys_sendmsg+0x278/0x2f0
> [ 480.035083] ? file_update_time+0x60/0x130
> [ 480.035680] ? pipe_write+0x286/0x400
> [ 480.036290] ? new_sync_write+0x12d/0x1d0
> [ 480.036882] ? __sys_sendmsg+0x5e/0xa0
> [ 480.037452] __sys_sendmsg+0x5e/0xa0
> [ 480.038013] do_syscall_64+0x52/0x1a0
> [ 480.038546] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 480.039083] RIP: 0033:0x7fdd7537fa6d
> [ 480.039596] Code: b9 20 00 00 75 10 b8 2e 00 00 00 0f 05 48 3d 01 f0
> ff ff 73 31 c3 48 83 ec 08 e8 fe f6 ff ff 48 89 04 24 b8 2e 00 00 00 0f
> 05 <48> 8b 3c 24 48 89 c2 e8 47 f7 ff ff 48 89 d0 48 83 c4 08 48 3d 01
> [ 480.040769] RSP: 002b:00007ffd18a6ad40 EFLAGS: 00000293 ORIG_RAX:
> 000000000000002e
> [ 480.041391] RAX: ffffffffffffffda RBX: 0000000000000001 RCX:
> 00007fdd7537fa6d
> [ 480.042045] RDX: 0000000000000000 RSI: 00007ffd18a6ada0 RDI:
> 0000000000000014
> [ 480.042713] RBP: 0000000002300870 R08: 0000000000000000 R09:
> 00007ffd18a6bd58
> [ 480.043438] R10: 0000000000000000 R11: 0000000000000293 R12:
> 00007ffd18a6bb70
> [ 480.044138] R13: 00007ffd18a6bd00 R14: 00007ffd18a6bb78 R15:
> 00007ffd18a6b230
> [ 480.044852] Modules linked in: vport_vxlan(E) vxlan(E) vport_gre(E)
> ip_gre(E) ip_tunnel(E) vport_geneve(E) geneve(E) ip6_udp_tunnel(E)
> udp_tunnel(E) openvswitch(E) nsh(E) nf_conncount(E) nf_nat_tftp(E)
> nf_conntrack_tftp(E) nf_nat_ftp(E) nf_conntrack_ftp(E) nf_nat(E)
> nf_conntrack_netlink(E) ip6table_filter(E) ip6_tables(E)
> iptable_filter(E) ip_tables(E) x_tables(E) ip6_gre(E) ip6_tunnel(E)
> tunnel6(E) gre(E) bonding(E) 8021q(E) garp(E) stp(E) mrp(E) llc(E)
> veth(E) nfnetlink_cttimeout(E) nfnetlink(E) nf_conntrack(E)
> nf_defrag_ipv6(E) nf_defrag_ipv4(E) binfmt_misc(E) intel_rapl_msr(E)
> snd_hda_codec_generic(E) ledtrig_audio(E) snd_hda_intel(E)
> snd_intel_nhlt(E) joydev(E) snd_hda_codec(E) input_leds(E)
> snd_hda_core(E) snd_hwdep(E) intel_rapl_common(E) snd_pcm(E)
> snd_timer(E) serio_raw(E) snd(E) soundcore(E) i2c_piix4(E) mac_hid(E)
> ib_iser(E) rdma_cm(E) iw_cm(E) ib_cm(E) ib_core(E) configfs(E)
> iscsi_tcp(E) libiscsi_tcp(E) libiscsi(E) scsi_transport_iscsi(E)
> autofs4(E) btrfs(E) zstd_decompress(E)
> [ 480.044888] zstd_compress(E) raid10(E) raid456(E)
> async_raid6_recov(E) async_memcpy(E) async_pq(E) async_xor(E)
> async_tx(E) xor(E) raid6_pq(E) libcrc32c(E) raid1(E) raid0(E)
> multipath(E) linear(E) crct10dif_pclmul(E) crc32_pclmul(E)
> ghash_clmulni_intel(E) aesni_intel(E) qxl(E) crypto_simd(E) ttm(E)
> cryptd(E) glue_helper(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E)
> sysimgblt(E) fb_sys_fops(E) psmouse(E) drm(E) floppy(E) pata_acpi(E)
> [last unloaded: nf_conntrack_ftp]
> [ 480.056765] ---[ end trace 4a8c4eceeb9f5dec ]---
> [ 480.057953] RIP: 0010:ovs_flow_tbl_remove+0x151/0x160 [openvswitch]
> [ 480.059134] Code: 55 f7 ea 89 f0 c1 f8 1f 29 c2 39 53 10 0f 8f 6a ff
> ff ff 48 89 ef d1 fe 5b 5d e9 8a ed ff ff 0f 0b 0f 0b b8 18 00 00 00 eb
> 92 <0f> 0b 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 57
> [ 480.061623] RSP: 0018:ffffaf32c05e38c8 EFLAGS: 00010246
> [ 480.062959] RAX: 0000000000000010 RBX: ffff9e4f6cd5a000 RCX:
> ffff9e4f6c585000
> [ 480.064248] RDX: ffff9e4f6cd5a098 RSI: 0000000000000010 RDI:
> ffff9e4f6b2c6d20
> [ 480.065524] RBP: ffffaf32c05e3b70 R08: ffff9e4f6c1651c0 R09:
> ffff9e4f756a43c0
> [ 480.066830] R10: 0000000000000000 R11: ffffffffc06e5500 R12:
> ffff9e4f6baf7800
> [ 480.068870] R13: ffff9e4f6b2c6d20 R14: ffff9e4f724a4e14 R15:
> 0000000000000007
> [ 480.070081] FS: 00007fdd76058980(0000) GS:ffff9e4f77b00000(0000)
> knlGS:0000000000000000
> [ 480.071340] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 480.072610] CR2: 00007ffd18a5ac60 CR3: 0000000230f3a002 CR4:
> 00000000001606e0
>
> You're hitting the BUG_ON here:
>
> /* Must be called with OVS mutex held. */
> void ovs_flow_tbl_remove(struct flow_table *table, struct sw_flow *flow)
> {
> struct table_instance *ti = ovsl_dereference(table->ti);
> struct table_instance *ufid_ti = ovsl_dereference(table->ufid_ti);
>
> BUG_ON(table->count == 0);
> <------------------------------------------------ Here
Hi Greg,
Thanks for your work, I fixed it, when relloac mask_array I don't
update ma point in patch 5.
diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
index bc14b12..210018a 100644
--- a/net/openvswitch/flow_table.c
+++ b/net/openvswitch/flow_table.c
@@ -827,6 +827,8 @@ static int tbl_mask_array_add_mask(struct flow_table *tbl,
MASK_ARRAY_SIZE_MIN);
if (err)
return err;
+
+ ma = ovsl_dereference(tbl->mask_array);
}
BUG_ON(ovsl_dereference(ma->masks[ma->count]));
> Thanks,
>
> - Greg
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH net-next v2 00/10] optimize openvswitch flow looking up
2019-10-10 8:42 ` Tonghao Zhang
@ 2019-10-14 22:26 ` Gregory Rose
2019-10-15 8:25 ` Tonghao Zhang
0 siblings, 1 reply; 16+ messages in thread
From: Gregory Rose @ 2019-10-14 22:26 UTC (permalink / raw)
To: Tonghao Zhang; +Cc: Pravin Shelar, Linux Kernel Network Developers
On 10/10/2019 1:42 AM, Tonghao Zhang wrote:
> On Wed, Oct 9, 2019 at 1:33 AM Gregory Rose <gvrose8192@gmail.com> wrote:
>>
[snip]
>> Hi Tonghao,
>>
>> I've applied your patch series and built a 5.4.0-rc1 kernel with them.
>>
>> xxxxx@ubuntu-1604:~$ modinfo openvswitch
>> filename: /lib/modules/5.4.0-rc1+/kernel/net/openvswitch/openvswitch.ko
>> alias: net-pf-16-proto-16-family-ovs_ct_limit
>> alias: net-pf-16-proto-16-family-ovs_meter
>> alias: net-pf-16-proto-16-family-ovs_packet
>> alias: net-pf-16-proto-16-family-ovs_flow
>> alias: net-pf-16-proto-16-family-ovs_vport
>> alias: net-pf-16-proto-16-family-ovs_datapath
>> license: GPL
>> description: Open vSwitch switching datapath
>> srcversion: F15EB8B4460D81BAA16216B
>> depends: nf_conntrack,nf_nat,nf_conncount,libcrc32c,nf_defrag_ipv6,nsh
>> retpoline: Y
>> intree: Y
>> name: openvswitch
>> vermagic: 5.4.0-rc1+ SMP mod_unload modversions
>>
>> I then built openvswitch master branch from github and ran 'make
>> check-kernel'.
>>
>> In doing so I ran into the following splat in this test:
>> 63: conntrack - IPv6 fragmentation + vlan
>>
>> Here is the splat:
>> [ 480.024215] ------------[ cut here ]------------
>> [ 480.024218] kernel BUG at net/openvswitch/flow_table.c:725!
>> [ 480.024267] invalid opcode: 0000 [#1] SMP PTI
>> [ 480.024297] CPU: 2 PID: 15717 Comm: ovs-vswitchd Tainted: G E
>> 5.4.0-rc1+ #131
>> [ 480.024345] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
>> [ 480.024386] RIP: 0010:ovs_flow_tbl_remove+0x151/0x160 [openvswitch]
>> [ 480.024424] Code: 55 f7 ea 89 f0 c1 f8 1f 29 c2 39 53 10 0f 8f 6a ff
>> ff ff 48 89 ef d1 fe 5b 5d e9 8a ed ff ff 0f 0b 0f 0b b8 18 00 00 00 eb
>> 92 <0f> 0b 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 57
>> [ 480.024527] RSP: 0018:ffffaf32c05e38c8 EFLAGS: 00010246
>> [ 480.024560] RAX: 0000000000000010 RBX: ffff9e4f6cd5a000 RCX:
>> ffff9e4f6c585000
>> [ 480.024601] RDX: ffff9e4f6cd5a098 RSI: 0000000000000010 RDI:
>> ffff9e4f6b2c6d20
>> [ 480.024642] RBP: ffffaf32c05e3b70 R08: ffff9e4f6c1651c0 R09:
>> ffff9e4f756a43c0
>> [ 480.024684] R10: 0000000000000000 R11: ffffffffc06e5500 R12:
>> ffff9e4f6baf7800
>> [ 480.024742] R13: ffff9e4f6b2c6d20 R14: ffff9e4f724a4e14 R15:
>> 0000000000000007
>> [ 480.024790] FS: 00007fdd76058980(0000) GS:ffff9e4f77b00000(0000)
>> knlGS:0000000000000000
>> [ 480.024836] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 480.024871] CR2: 00007ffd18a5ac60 CR3: 0000000230f3a002 CR4:
>> 00000000001606e0
>> [ 480.024917] Call Trace:
>> [ 480.024941] action_fifos_exit+0x3240/0x37b0 [openvswitch]
>> [ 480.024979] ? __switch_to_asm+0x40/0x70
>> [ 480.025005] ? __switch_to_asm+0x34/0x70
>> [ 480.025031] ? __switch_to_asm+0x40/0x70
>> [ 480.025056] ? __switch_to_asm+0x40/0x70
>> [ 480.025082] ? __switch_to_asm+0x34/0x70
>> [ 480.025108] ? __switch_to_asm+0x40/0x70
>> [ 480.025134] ? __switch_to_asm+0x34/0x70
>> [ 480.025159] ? __switch_to_asm+0x40/0x70
>> [ 480.025185] ? __switch_to_asm+0x34/0x70
>> [ 480.025210] ? __switch_to_asm+0x40/0x70
>> [ 480.025236] ? __switch_to_asm+0x34/0x70
>> [ 480.025262] ? __switch_to_asm+0x40/0x70
>> [ 480.025287] ? __switch_to_asm+0x34/0x70
>> [ 480.025312] ? __switch_to_asm+0x40/0x70
>> [ 480.025338] ? __switch_to_asm+0x34/0x70
>> [ 480.025364] ? __switch_to_asm+0x40/0x70
>> [ 480.025389] ? __switch_to_asm+0x34/0x70
>> [ 480.025415] ? __switch_to_asm+0x40/0x70
>> [ 480.025443] ? __update_load_avg_se+0x11c/0x2e0
>> [ 480.025472] ? __update_load_avg_se+0x11c/0x2e0
>> [ 480.025503] ? update_load_avg+0x7e/0x600
>> [ 480.025529] ? update_load_avg+0x7e/0x600
>> [ 480.025556] ? update_curr+0x85/0x1d0
>> [ 480.025582] ? cred_has_capability+0x85/0x130
>> [ 480.025611] ? __nla_validate_parse+0x57/0x8a0
>> [ 480.025640] ? _cond_resched+0x15/0x40
>> [ 480.025666] ? genl_family_rcv_msg_attrs_parse.isra.14+0x93/0x100
>> [ 480.026523] genl_rcv_msg+0x1d9/0x490
>> [ 480.027385] ? __switch_to_asm+0x34/0x70
>> [ 480.028230] ? __switch_to_asm+0x40/0x70
>> [ 480.029050] ? __switch_to_asm+0x40/0x70
>> [ 480.029874] ? genl_family_rcv_msg_attrs_parse.isra.14+0x100/0x100
>> [ 480.030673] netlink_rcv_skb+0x4a/0x110
>> [ 480.031465] genl_rcv+0x24/0x40
>> [ 480.032312] netlink_unicast+0x1a0/0x250
>> [ 480.033059] netlink_sendmsg+0x2b4/0x3b0
>> [ 480.033758] sock_sendmsg+0x5b/0x60
>> [ 480.034422] ___sys_sendmsg+0x278/0x2f0
>> [ 480.035083] ? file_update_time+0x60/0x130
>> [ 480.035680] ? pipe_write+0x286/0x400
>> [ 480.036290] ? new_sync_write+0x12d/0x1d0
>> [ 480.036882] ? __sys_sendmsg+0x5e/0xa0
>> [ 480.037452] __sys_sendmsg+0x5e/0xa0
>> [ 480.038013] do_syscall_64+0x52/0x1a0
>> [ 480.038546] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>> [ 480.039083] RIP: 0033:0x7fdd7537fa6d
>> [ 480.039596] Code: b9 20 00 00 75 10 b8 2e 00 00 00 0f 05 48 3d 01 f0
>> ff ff 73 31 c3 48 83 ec 08 e8 fe f6 ff ff 48 89 04 24 b8 2e 00 00 00 0f
>> 05 <48> 8b 3c 24 48 89 c2 e8 47 f7 ff ff 48 89 d0 48 83 c4 08 48 3d 01
>> [ 480.040769] RSP: 002b:00007ffd18a6ad40 EFLAGS: 00000293 ORIG_RAX:
>> 000000000000002e
>> [ 480.041391] RAX: ffffffffffffffda RBX: 0000000000000001 RCX:
>> 00007fdd7537fa6d
>> [ 480.042045] RDX: 0000000000000000 RSI: 00007ffd18a6ada0 RDI:
>> 0000000000000014
>> [ 480.042713] RBP: 0000000002300870 R08: 0000000000000000 R09:
>> 00007ffd18a6bd58
>> [ 480.043438] R10: 0000000000000000 R11: 0000000000000293 R12:
>> 00007ffd18a6bb70
>> [ 480.044138] R13: 00007ffd18a6bd00 R14: 00007ffd18a6bb78 R15:
>> 00007ffd18a6b230
>> [ 480.044852] Modules linked in: vport_vxlan(E) vxlan(E) vport_gre(E)
>> ip_gre(E) ip_tunnel(E) vport_geneve(E) geneve(E) ip6_udp_tunnel(E)
>> udp_tunnel(E) openvswitch(E) nsh(E) nf_conncount(E) nf_nat_tftp(E)
>> nf_conntrack_tftp(E) nf_nat_ftp(E) nf_conntrack_ftp(E) nf_nat(E)
>> nf_conntrack_netlink(E) ip6table_filter(E) ip6_tables(E)
>> iptable_filter(E) ip_tables(E) x_tables(E) ip6_gre(E) ip6_tunnel(E)
>> tunnel6(E) gre(E) bonding(E) 8021q(E) garp(E) stp(E) mrp(E) llc(E)
>> veth(E) nfnetlink_cttimeout(E) nfnetlink(E) nf_conntrack(E)
>> nf_defrag_ipv6(E) nf_defrag_ipv4(E) binfmt_misc(E) intel_rapl_msr(E)
>> snd_hda_codec_generic(E) ledtrig_audio(E) snd_hda_intel(E)
>> snd_intel_nhlt(E) joydev(E) snd_hda_codec(E) input_leds(E)
>> snd_hda_core(E) snd_hwdep(E) intel_rapl_common(E) snd_pcm(E)
>> snd_timer(E) serio_raw(E) snd(E) soundcore(E) i2c_piix4(E) mac_hid(E)
>> ib_iser(E) rdma_cm(E) iw_cm(E) ib_cm(E) ib_core(E) configfs(E)
>> iscsi_tcp(E) libiscsi_tcp(E) libiscsi(E) scsi_transport_iscsi(E)
>> autofs4(E) btrfs(E) zstd_decompress(E)
>> [ 480.044888] zstd_compress(E) raid10(E) raid456(E)
>> async_raid6_recov(E) async_memcpy(E) async_pq(E) async_xor(E)
>> async_tx(E) xor(E) raid6_pq(E) libcrc32c(E) raid1(E) raid0(E)
>> multipath(E) linear(E) crct10dif_pclmul(E) crc32_pclmul(E)
>> ghash_clmulni_intel(E) aesni_intel(E) qxl(E) crypto_simd(E) ttm(E)
>> cryptd(E) glue_helper(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E)
>> sysimgblt(E) fb_sys_fops(E) psmouse(E) drm(E) floppy(E) pata_acpi(E)
>> [last unloaded: nf_conntrack_ftp]
>> [ 480.056765] ---[ end trace 4a8c4eceeb9f5dec ]---
>> [ 480.057953] RIP: 0010:ovs_flow_tbl_remove+0x151/0x160 [openvswitch]
>> [ 480.059134] Code: 55 f7 ea 89 f0 c1 f8 1f 29 c2 39 53 10 0f 8f 6a ff
>> ff ff 48 89 ef d1 fe 5b 5d e9 8a ed ff ff 0f 0b 0f 0b b8 18 00 00 00 eb
>> 92 <0f> 0b 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 57
>> [ 480.061623] RSP: 0018:ffffaf32c05e38c8 EFLAGS: 00010246
>> [ 480.062959] RAX: 0000000000000010 RBX: ffff9e4f6cd5a000 RCX:
>> ffff9e4f6c585000
>> [ 480.064248] RDX: ffff9e4f6cd5a098 RSI: 0000000000000010 RDI:
>> ffff9e4f6b2c6d20
>> [ 480.065524] RBP: ffffaf32c05e3b70 R08: ffff9e4f6c1651c0 R09:
>> ffff9e4f756a43c0
>> [ 480.066830] R10: 0000000000000000 R11: ffffffffc06e5500 R12:
>> ffff9e4f6baf7800
>> [ 480.068870] R13: ffff9e4f6b2c6d20 R14: ffff9e4f724a4e14 R15:
>> 0000000000000007
>> [ 480.070081] FS: 00007fdd76058980(0000) GS:ffff9e4f77b00000(0000)
>> knlGS:0000000000000000
>> [ 480.071340] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 480.072610] CR2: 00007ffd18a5ac60 CR3: 0000000230f3a002 CR4:
>> 00000000001606e0
>>
>> You're hitting the BUG_ON here:
>>
>> /* Must be called with OVS mutex held. */
>> void ovs_flow_tbl_remove(struct flow_table *table, struct sw_flow *flow)
>> {
>> struct table_instance *ti = ovsl_dereference(table->ti);
>> struct table_instance *ufid_ti = ovsl_dereference(table->ufid_ti);
>>
>> BUG_ON(table->count == 0);
>> <------------------------------------------------ Here
> Hi Greg,
> Thanks for your work, I fixed it, when relloac mask_array I don't
> update ma point in patch 5.
>
> diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
> index bc14b12..210018a 100644
> --- a/net/openvswitch/flow_table.c
> +++ b/net/openvswitch/flow_table.c
> @@ -827,6 +827,8 @@ static int tbl_mask_array_add_mask(struct flow_table *tbl,
> MASK_ARRAY_SIZE_MIN);
> if (err)
> return err;
> +
> + ma = ovsl_dereference(tbl->mask_array);
> }
>
> BUG_ON(ovsl_dereference(ma->masks[ma->count]));
Hi Tonghao,
I did make the change you suggested:
git diff
diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
index bc14b12..210018a 100644
--- a/net/openvswitch/flow_table.c
+++ b/net/openvswitch/flow_table.c
@@ -827,6 +827,8 @@ static int tbl_mask_array_add_mask(struct flow_table
*tbl,
MASK_ARRAY_SIZE_MIN);
if (err)
return err;
+
+ ma = ovsl_dereference(tbl->mask_array);
}
However, there is still an issue. Apparently this change just moves the
bug. Now I'm getting this splat:
[ 512.147478] ------------[ cut here ]------------
[ 512.147481] kernel BUG at net/openvswitch/flow_table.c:725!
[ 512.147526] invalid opcode: 0000 [#1] SMP PTI
[ 512.147552] CPU: 1 PID: 14636 Comm: ovs-vswitchd Tainted:
G E 5.4.0-rc1+ #138
[ 512.147595] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 512.147630] RIP: 0010:ovs_flow_tbl_remove+0x151/0x160 [openvswitch]
[ 512.147663] Code: 55 f7 ea 89 f0 c1 f8 1f 29 c2 39 53 10 0f 8f 6a ff
ff ff 48 89 ef d1 fe 5b 5d e9 8a ed ff ff 0f 0b 0f 0b b8 18 00 00 00 eb
92 <0f> 0b 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 57
[ 512.147753] RSP: 0018:ffffb637002cf8c8 EFLAGS: 00010246
[ 512.147781] RAX: 0000000000000009 RBX: ffff95ebf32d23c0 RCX:
ffff95ebf00e5a00
[ 512.147817] RDX: ffff95ebf32d2420 RSI: 0000000000000009 RDI:
ffff95ebf0dffba0
[ 512.147852] RBP: ffffb637002cfb70 R08: ffff95ebf6030240 R09:
ffff95ebf1643180
[ 512.147888] R10: ffff95ebf283b814 R11: ffffffffc0932500 R12:
ffff95ebf040a300
[ 512.147924] R13: ffff95ebf0dffba0 R14: ffff95ebf283b814 R15:
0000000000000007
[ 512.147961] FS: 00007fbbab3d2980(0000) GS:ffff95ebf7a80000(0000)
knlGS:0000000000000000
[ 512.148001] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 512.148031] CR2: 00007fffca190ff8 CR3: 0000000232810006 CR4:
00000000001606e0
[ 512.148071] Call Trace:
[ 512.148092] action_fifos_exit+0x3240/0x37b0 [openvswitch]
[ 512.148125] ? update_sd_lb_stats+0x613/0x760
[ 512.148152] ? find_busiest_group+0x3e/0x520
[ 512.148177] ? __nla_validate_parse+0x57/0x8a0
[ 512.148203] ? _cond_resched+0x15/0x40
[ 512.148226] ? genl_family_rcv_msg_attrs_parse+0xe4/0x110
[ 512.148256] genl_rcv_msg+0x1ed/0x430
[ 512.148303] ? __switch_to_asm+0x34/0x70
[ 512.148326] ? __switch_to_asm+0x40/0x70
[ 512.148349] ? __switch_to_asm+0x34/0x70
[ 512.148371] ? __switch_to_asm+0x40/0x70
[ 512.148394] ? __switch_to_asm+0x34/0x70
[ 512.148416] ? __switch_to_asm+0x40/0x70
[ 512.148439] ? genl_family_rcv_msg_attrs_parse+0x110/0x110
[ 512.148470] netlink_rcv_skb+0x4a/0x110
[ 512.148492] genl_rcv+0x24/0x40
[ 512.148512] netlink_unicast+0x1a0/0x250
[ 512.148536] netlink_sendmsg+0x2b4/0x3b0
[ 512.148560] sock_sendmsg+0x5b/0x60
[ 512.148582] ___sys_sendmsg+0x278/0x2f0
[ 512.148607] ? file_update_time+0x60/0x130
[ 512.148630] ? pipe_write+0x286/0x400
[ 512.148653] ? new_sync_write+0x12d/0x1d0
[ 512.148676] ? __sys_sendmsg+0x5e/0xa0
[ 512.148697] __sys_sendmsg+0x5e/0xa0
[ 512.148720] do_syscall_64+0x52/0x1a0
[ 512.148742] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 512.148771] RIP: 0033:0x7fbbaa6f9a6d
[ 512.149622] Code: b9 20 00 00 75 10 b8 2e 00 00 00 0f 05 48 3d 01 f0
ff ff 73 31 c3 48 83 ec 08 e8 fe f6 ff ff 48 89 04 24 b8 2e 00 00 00 0f
05 <48> 8b 3c 24 48 89 c2 e8 47 f7 ff ff 48 89 d0 48 83 c4 08 48 3d 01
[ 512.151428] RSP: 002b:00007fffca1a1100 EFLAGS: 00000293 ORIG_RAX:
000000000000002e
[ 512.152349] RAX: ffffffffffffffda RBX: 0000000000000001 RCX:
00007fbbaa6f9a6d
[ 512.153266] RDX: 0000000000000000 RSI: 00007fffca1a1160 RDI:
0000000000000010
[ 512.154184] RBP: 0000000000f42680 R08: 0000000000000000 R09:
00007fffca1a2118
[ 512.155094] R10: 0000000000000008 R11: 0000000000000293 R12:
00007fffca1a1f30
[ 512.155992] R13: 00007fffca1a20c0 R14: 00007fffca1a1f38 R15:
00007fffca1a15f0
[ 512.156866] Modules linked in: vport_vxlan(E) vxlan(E) vport_gre(E)
ip_gre(E) ip_tunnel(E) vport_geneve(E) geneve(E) ip6_udp_tunnel(E)
udp_tunnel(E) openvswitch(E) nsh(E) nf_conncount(E) nf_nat_tftp(E)
nf_conntrack_tftp(E) nf_nat_ftp(E) nf_conntrack_ftp(E) nf_nat(E)
nf_conntrack_netlink(E) ip6table_filter(E) ip6_tables(E)
iptable_filter(E) ip_tables(E) x_tables(E) ip6_gre(E) ip6_tunnel(E)
tunnel6(E) gre(E) bonding(E) 8021q(E) garp(E) stp(E) mrp(E) llc(E)
veth(E) nfnetlink_cttimeout(E) nfnetlink(E) nf_conntrack(E)
nf_defrag_ipv6(E) nf_defrag_ipv4(E) binfmt_misc(E) intel_rapl_msr(E)
snd_hda_codec_generic(E) ledtrig_audio(E) snd_hda_intel(E)
snd_intel_nhlt(E) snd_hda_codec(E) intel_rapl_common(E) snd_hda_core(E)
snd_hwdep(E) input_leds(E) snd_pcm(E) joydev(E) snd_timer(E)
serio_raw(E) snd(E) soundcore(E) i2c_piix4(E) mac_hid(E) ib_iser(E)
rdma_cm(E) iw_cm(E) ib_cm(E) ib_core(E) configfs(E) iscsi_tcp(E)
libiscsi_tcp(E) libiscsi(E) scsi_transport_iscsi(E) autofs4(E) btrfs(E)
zstd_decompress(E)
[ 512.156899] zstd_compress(E) raid10(E) raid456(E)
async_raid6_recov(E) async_memcpy(E) async_pq(E) async_xor(E)
async_tx(E) xor(E) raid6_pq(E) libcrc32c(E) raid1(E) raid0(E)
multipath(E) linear(E) crct10dif_pclmul(E) crc32_pclmul(E)
ghash_clmulni_intel(E) aesni_intel(E) crypto_simd(E) qxl(E) ttm(E)
cryptd(E) glue_helper(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E)
sysimgblt(E) fb_sys_fops(E) psmouse(E) drm(E) pata_acpi(E) floppy(E)
[last unloaded: nf_conntrack_ftp]
[ 512.168488] ---[ end trace 26730810beeb11e1 ]---
[ 512.169555] RIP: 0010:ovs_flow_tbl_remove+0x151/0x160 [openvswitch]
[ 512.170638] Code: 55 f7 ea 89 f0 c1 f8 1f 29 c2 39 53 10 0f 8f 6a ff
ff ff 48 89 ef d1 fe 5b 5d e9 8a ed ff ff 0f 0b 0f 0b b8 18 00 00 00 eb
92 <0f> 0b 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 57
[ 512.172813] RSP: 0018:ffffb637002cf8c8 EFLAGS: 00010246
[ 512.173897] RAX: 0000000000000009 RBX: ffff95ebf32d23c0 RCX:
ffff95ebf00e5a00
[ 512.174991] RDX: ffff95ebf32d2420 RSI: 0000000000000009 RDI:
ffff95ebf0dffba0
[ 512.176109] RBP: ffffb637002cfb70 R08: ffff95ebf6030240 R09:
ffff95ebf1643180
[ 512.177229] R10: ffff95ebf283b814 R11: ffffffffc0932500 R12:
ffff95ebf040a300
[ 512.178364] R13: ffff95ebf0dffba0 R14: ffff95ebf283b814 R15:
0000000000000007
[ 512.179530] FS: 00007fbbab3d2980(0000) GS:ffff95ebf7a80000(0000)
knlGS:0000000000000000
[ 512.180700] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 512.181885] CR2: 00007fffca190ff8 CR3: 0000000232810006 CR4:
00000000001606e0
The code is hitting this:
static void tbl_mask_array_del_mask(struct flow_table *tbl,
struct sw_flow_mask *mask)
{
struct mask_array *ma = ovsl_dereference(tbl->mask_array);
int i;
/* Remove the deleted mask pointers from the array */
for (i = 0; i < ma->count; i++) {
if (mask == ovsl_dereference(ma->masks[i]))
goto found;
}
BUG(); <---------------------------- Here
Pravin mentioned memory barrier usage in one of his replies. Perhaps
that is an avenue to explore.
Thanks,
- Greg
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH net-next v2 00/10] optimize openvswitch flow looking up
2019-10-14 22:26 ` Gregory Rose
@ 2019-10-15 8:25 ` Tonghao Zhang
2019-10-15 17:37 ` Gregory Rose
0 siblings, 1 reply; 16+ messages in thread
From: Tonghao Zhang @ 2019-10-15 8:25 UTC (permalink / raw)
To: Gregory Rose; +Cc: Pravin Shelar, Linux Kernel Network Developers
On Tue, Oct 15, 2019 at 6:26 AM Gregory Rose <gvrose8192@gmail.com> wrote:
>
> On 10/10/2019 1:42 AM, Tonghao Zhang wrote:
> > On Wed, Oct 9, 2019 at 1:33 AM Gregory Rose <gvrose8192@gmail.com> wrote:
> >>
>
> [snip]
>
> >> Hi Tonghao,
> >>
> >> I've applied your patch series and built a 5.4.0-rc1 kernel with them.
> >>
> >> xxxxx@ubuntu-1604:~$ modinfo openvswitch
> >> filename: /lib/modules/5.4.0-rc1+/kernel/net/openvswitch/openvswitch.ko
> >> alias: net-pf-16-proto-16-family-ovs_ct_limit
> >> alias: net-pf-16-proto-16-family-ovs_meter
> >> alias: net-pf-16-proto-16-family-ovs_packet
> >> alias: net-pf-16-proto-16-family-ovs_flow
> >> alias: net-pf-16-proto-16-family-ovs_vport
> >> alias: net-pf-16-proto-16-family-ovs_datapath
> >> license: GPL
> >> description: Open vSwitch switching datapath
> >> srcversion: F15EB8B4460D81BAA16216B
> >> depends: nf_conntrack,nf_nat,nf_conncount,libcrc32c,nf_defrag_ipv6,nsh
> >> retpoline: Y
> >> intree: Y
> >> name: openvswitch
> >> vermagic: 5.4.0-rc1+ SMP mod_unload modversions
> >>
> >> I then built openvswitch master branch from github and ran 'make
> >> check-kernel'.
> >>
> >> In doing so I ran into the following splat in this test:
> >> 63: conntrack - IPv6 fragmentation + vlan
> >>
> >> Here is the splat:
> >> [ 480.024215] ------------[ cut here ]------------
> >> [ 480.024218] kernel BUG at net/openvswitch/flow_table.c:725!
> >> [ 480.024267] invalid opcode: 0000 [#1] SMP PTI
> >> [ 480.024297] CPU: 2 PID: 15717 Comm: ovs-vswitchd Tainted: G E
> >> 5.4.0-rc1+ #131
> >> [ 480.024345] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
> >> [ 480.024386] RIP: 0010:ovs_flow_tbl_remove+0x151/0x160 [openvswitch]
> >> [ 480.024424] Code: 55 f7 ea 89 f0 c1 f8 1f 29 c2 39 53 10 0f 8f 6a ff
> >> ff ff 48 89 ef d1 fe 5b 5d e9 8a ed ff ff 0f 0b 0f 0b b8 18 00 00 00 eb
> >> 92 <0f> 0b 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 57
> >> [ 480.024527] RSP: 0018:ffffaf32c05e38c8 EFLAGS: 00010246
> >> [ 480.024560] RAX: 0000000000000010 RBX: ffff9e4f6cd5a000 RCX:
> >> ffff9e4f6c585000
> >> [ 480.024601] RDX: ffff9e4f6cd5a098 RSI: 0000000000000010 RDI:
> >> ffff9e4f6b2c6d20
> >> [ 480.024642] RBP: ffffaf32c05e3b70 R08: ffff9e4f6c1651c0 R09:
> >> ffff9e4f756a43c0
> >> [ 480.024684] R10: 0000000000000000 R11: ffffffffc06e5500 R12:
> >> ffff9e4f6baf7800
> >> [ 480.024742] R13: ffff9e4f6b2c6d20 R14: ffff9e4f724a4e14 R15:
> >> 0000000000000007
> >> [ 480.024790] FS: 00007fdd76058980(0000) GS:ffff9e4f77b00000(0000)
> >> knlGS:0000000000000000
> >> [ 480.024836] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> [ 480.024871] CR2: 00007ffd18a5ac60 CR3: 0000000230f3a002 CR4:
> >> 00000000001606e0
> >> [ 480.024917] Call Trace:
> >> [ 480.024941] action_fifos_exit+0x3240/0x37b0 [openvswitch]
> >> [ 480.024979] ? __switch_to_asm+0x40/0x70
> >> [ 480.025005] ? __switch_to_asm+0x34/0x70
> >> [ 480.025031] ? __switch_to_asm+0x40/0x70
> >> [ 480.025056] ? __switch_to_asm+0x40/0x70
> >> [ 480.025082] ? __switch_to_asm+0x34/0x70
> >> [ 480.025108] ? __switch_to_asm+0x40/0x70
> >> [ 480.025134] ? __switch_to_asm+0x34/0x70
> >> [ 480.025159] ? __switch_to_asm+0x40/0x70
> >> [ 480.025185] ? __switch_to_asm+0x34/0x70
> >> [ 480.025210] ? __switch_to_asm+0x40/0x70
> >> [ 480.025236] ? __switch_to_asm+0x34/0x70
> >> [ 480.025262] ? __switch_to_asm+0x40/0x70
> >> [ 480.025287] ? __switch_to_asm+0x34/0x70
> >> [ 480.025312] ? __switch_to_asm+0x40/0x70
> >> [ 480.025338] ? __switch_to_asm+0x34/0x70
> >> [ 480.025364] ? __switch_to_asm+0x40/0x70
> >> [ 480.025389] ? __switch_to_asm+0x34/0x70
> >> [ 480.025415] ? __switch_to_asm+0x40/0x70
> >> [ 480.025443] ? __update_load_avg_se+0x11c/0x2e0
> >> [ 480.025472] ? __update_load_avg_se+0x11c/0x2e0
> >> [ 480.025503] ? update_load_avg+0x7e/0x600
> >> [ 480.025529] ? update_load_avg+0x7e/0x600
> >> [ 480.025556] ? update_curr+0x85/0x1d0
> >> [ 480.025582] ? cred_has_capability+0x85/0x130
> >> [ 480.025611] ? __nla_validate_parse+0x57/0x8a0
> >> [ 480.025640] ? _cond_resched+0x15/0x40
> >> [ 480.025666] ? genl_family_rcv_msg_attrs_parse.isra.14+0x93/0x100
> >> [ 480.026523] genl_rcv_msg+0x1d9/0x490
> >> [ 480.027385] ? __switch_to_asm+0x34/0x70
> >> [ 480.028230] ? __switch_to_asm+0x40/0x70
> >> [ 480.029050] ? __switch_to_asm+0x40/0x70
> >> [ 480.029874] ? genl_family_rcv_msg_attrs_parse.isra.14+0x100/0x100
> >> [ 480.030673] netlink_rcv_skb+0x4a/0x110
> >> [ 480.031465] genl_rcv+0x24/0x40
> >> [ 480.032312] netlink_unicast+0x1a0/0x250
> >> [ 480.033059] netlink_sendmsg+0x2b4/0x3b0
> >> [ 480.033758] sock_sendmsg+0x5b/0x60
> >> [ 480.034422] ___sys_sendmsg+0x278/0x2f0
> >> [ 480.035083] ? file_update_time+0x60/0x130
> >> [ 480.035680] ? pipe_write+0x286/0x400
> >> [ 480.036290] ? new_sync_write+0x12d/0x1d0
> >> [ 480.036882] ? __sys_sendmsg+0x5e/0xa0
> >> [ 480.037452] __sys_sendmsg+0x5e/0xa0
> >> [ 480.038013] do_syscall_64+0x52/0x1a0
> >> [ 480.038546] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >> [ 480.039083] RIP: 0033:0x7fdd7537fa6d
> >> [ 480.039596] Code: b9 20 00 00 75 10 b8 2e 00 00 00 0f 05 48 3d 01 f0
> >> ff ff 73 31 c3 48 83 ec 08 e8 fe f6 ff ff 48 89 04 24 b8 2e 00 00 00 0f
> >> 05 <48> 8b 3c 24 48 89 c2 e8 47 f7 ff ff 48 89 d0 48 83 c4 08 48 3d 01
> >> [ 480.040769] RSP: 002b:00007ffd18a6ad40 EFLAGS: 00000293 ORIG_RAX:
> >> 000000000000002e
> >> [ 480.041391] RAX: ffffffffffffffda RBX: 0000000000000001 RCX:
> >> 00007fdd7537fa6d
> >> [ 480.042045] RDX: 0000000000000000 RSI: 00007ffd18a6ada0 RDI:
> >> 0000000000000014
> >> [ 480.042713] RBP: 0000000002300870 R08: 0000000000000000 R09:
> >> 00007ffd18a6bd58
> >> [ 480.043438] R10: 0000000000000000 R11: 0000000000000293 R12:
> >> 00007ffd18a6bb70
> >> [ 480.044138] R13: 00007ffd18a6bd00 R14: 00007ffd18a6bb78 R15:
> >> 00007ffd18a6b230
> >> [ 480.044852] Modules linked in: vport_vxlan(E) vxlan(E) vport_gre(E)
> >> ip_gre(E) ip_tunnel(E) vport_geneve(E) geneve(E) ip6_udp_tunnel(E)
> >> udp_tunnel(E) openvswitch(E) nsh(E) nf_conncount(E) nf_nat_tftp(E)
> >> nf_conntrack_tftp(E) nf_nat_ftp(E) nf_conntrack_ftp(E) nf_nat(E)
> >> nf_conntrack_netlink(E) ip6table_filter(E) ip6_tables(E)
> >> iptable_filter(E) ip_tables(E) x_tables(E) ip6_gre(E) ip6_tunnel(E)
> >> tunnel6(E) gre(E) bonding(E) 8021q(E) garp(E) stp(E) mrp(E) llc(E)
> >> veth(E) nfnetlink_cttimeout(E) nfnetlink(E) nf_conntrack(E)
> >> nf_defrag_ipv6(E) nf_defrag_ipv4(E) binfmt_misc(E) intel_rapl_msr(E)
> >> snd_hda_codec_generic(E) ledtrig_audio(E) snd_hda_intel(E)
> >> snd_intel_nhlt(E) joydev(E) snd_hda_codec(E) input_leds(E)
> >> snd_hda_core(E) snd_hwdep(E) intel_rapl_common(E) snd_pcm(E)
> >> snd_timer(E) serio_raw(E) snd(E) soundcore(E) i2c_piix4(E) mac_hid(E)
> >> ib_iser(E) rdma_cm(E) iw_cm(E) ib_cm(E) ib_core(E) configfs(E)
> >> iscsi_tcp(E) libiscsi_tcp(E) libiscsi(E) scsi_transport_iscsi(E)
> >> autofs4(E) btrfs(E) zstd_decompress(E)
> >> [ 480.044888] zstd_compress(E) raid10(E) raid456(E)
> >> async_raid6_recov(E) async_memcpy(E) async_pq(E) async_xor(E)
> >> async_tx(E) xor(E) raid6_pq(E) libcrc32c(E) raid1(E) raid0(E)
> >> multipath(E) linear(E) crct10dif_pclmul(E) crc32_pclmul(E)
> >> ghash_clmulni_intel(E) aesni_intel(E) qxl(E) crypto_simd(E) ttm(E)
> >> cryptd(E) glue_helper(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E)
> >> sysimgblt(E) fb_sys_fops(E) psmouse(E) drm(E) floppy(E) pata_acpi(E)
> >> [last unloaded: nf_conntrack_ftp]
> >> [ 480.056765] ---[ end trace 4a8c4eceeb9f5dec ]---
> >> [ 480.057953] RIP: 0010:ovs_flow_tbl_remove+0x151/0x160 [openvswitch]
> >> [ 480.059134] Code: 55 f7 ea 89 f0 c1 f8 1f 29 c2 39 53 10 0f 8f 6a ff
> >> ff ff 48 89 ef d1 fe 5b 5d e9 8a ed ff ff 0f 0b 0f 0b b8 18 00 00 00 eb
> >> 92 <0f> 0b 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 57
> >> [ 480.061623] RSP: 0018:ffffaf32c05e38c8 EFLAGS: 00010246
> >> [ 480.062959] RAX: 0000000000000010 RBX: ffff9e4f6cd5a000 RCX:
> >> ffff9e4f6c585000
> >> [ 480.064248] RDX: ffff9e4f6cd5a098 RSI: 0000000000000010 RDI:
> >> ffff9e4f6b2c6d20
> >> [ 480.065524] RBP: ffffaf32c05e3b70 R08: ffff9e4f6c1651c0 R09:
> >> ffff9e4f756a43c0
> >> [ 480.066830] R10: 0000000000000000 R11: ffffffffc06e5500 R12:
> >> ffff9e4f6baf7800
> >> [ 480.068870] R13: ffff9e4f6b2c6d20 R14: ffff9e4f724a4e14 R15:
> >> 0000000000000007
> >> [ 480.070081] FS: 00007fdd76058980(0000) GS:ffff9e4f77b00000(0000)
> >> knlGS:0000000000000000
> >> [ 480.071340] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> [ 480.072610] CR2: 00007ffd18a5ac60 CR3: 0000000230f3a002 CR4:
> >> 00000000001606e0
> >>
> >> You're hitting the BUG_ON here:
> >>
> >> /* Must be called with OVS mutex held. */
> >> void ovs_flow_tbl_remove(struct flow_table *table, struct sw_flow *flow)
> >> {
> >> struct table_instance *ti = ovsl_dereference(table->ti);
> >> struct table_instance *ufid_ti = ovsl_dereference(table->ufid_ti);
> >>
> >> BUG_ON(table->count == 0);
> >> <------------------------------------------------ Here
> > Hi Greg,
> > Thanks for your work, I fixed it, when relloac mask_array I don't
> > update ma point in patch 5.
> >
> > diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
> > index bc14b12..210018a 100644
> > --- a/net/openvswitch/flow_table.c
> > +++ b/net/openvswitch/flow_table.c
> > @@ -827,6 +827,8 @@ static int tbl_mask_array_add_mask(struct flow_table *tbl,
> > MASK_ARRAY_SIZE_MIN);
> > if (err)
> > return err;
> > +
> > + ma = ovsl_dereference(tbl->mask_array);
> > }
> >
> > BUG_ON(ovsl_dereference(ma->masks[ma->count]));
>
> Hi Tonghao,
>
> I did make the change you suggested:
>
> git diff
> diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
> index bc14b12..210018a 100644
> --- a/net/openvswitch/flow_table.c
> +++ b/net/openvswitch/flow_table.c
> @@ -827,6 +827,8 @@ static int tbl_mask_array_add_mask(struct flow_table
> *tbl,
> MASK_ARRAY_SIZE_MIN);
> if (err)
> return err;
> +
> + ma = ovsl_dereference(tbl->mask_array);
> }
>
> However, there is still an issue. Apparently this change just moves the
> bug. Now I'm getting this splat:
>
> [ 512.147478] ------------[ cut here ]------------
> [ 512.147481] kernel BUG at net/openvswitch/flow_table.c:725!
> [ 512.147526] invalid opcode: 0000 [#1] SMP PTI
> [ 512.147552] CPU: 1 PID: 14636 Comm: ovs-vswitchd Tainted:
> G E 5.4.0-rc1+ #138
> [ 512.147595] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
> [ 512.147630] RIP: 0010:ovs_flow_tbl_remove+0x151/0x160 [openvswitch]
> [ 512.147663] Code: 55 f7 ea 89 f0 c1 f8 1f 29 c2 39 53 10 0f 8f 6a ff
> ff ff 48 89 ef d1 fe 5b 5d e9 8a ed ff ff 0f 0b 0f 0b b8 18 00 00 00 eb
> 92 <0f> 0b 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 57
> [ 512.147753] RSP: 0018:ffffb637002cf8c8 EFLAGS: 00010246
> [ 512.147781] RAX: 0000000000000009 RBX: ffff95ebf32d23c0 RCX:
> ffff95ebf00e5a00
> [ 512.147817] RDX: ffff95ebf32d2420 RSI: 0000000000000009 RDI:
> ffff95ebf0dffba0
> [ 512.147852] RBP: ffffb637002cfb70 R08: ffff95ebf6030240 R09:
> ffff95ebf1643180
> [ 512.147888] R10: ffff95ebf283b814 R11: ffffffffc0932500 R12:
> ffff95ebf040a300
> [ 512.147924] R13: ffff95ebf0dffba0 R14: ffff95ebf283b814 R15:
> 0000000000000007
> [ 512.147961] FS: 00007fbbab3d2980(0000) GS:ffff95ebf7a80000(0000)
> knlGS:0000000000000000
> [ 512.148001] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 512.148031] CR2: 00007fffca190ff8 CR3: 0000000232810006 CR4:
> 00000000001606e0
> [ 512.148071] Call Trace:
> [ 512.148092] action_fifos_exit+0x3240/0x37b0 [openvswitch]
> [ 512.148125] ? update_sd_lb_stats+0x613/0x760
> [ 512.148152] ? find_busiest_group+0x3e/0x520
> [ 512.148177] ? __nla_validate_parse+0x57/0x8a0
> [ 512.148203] ? _cond_resched+0x15/0x40
> [ 512.148226] ? genl_family_rcv_msg_attrs_parse+0xe4/0x110
> [ 512.148256] genl_rcv_msg+0x1ed/0x430
> [ 512.148303] ? __switch_to_asm+0x34/0x70
> [ 512.148326] ? __switch_to_asm+0x40/0x70
> [ 512.148349] ? __switch_to_asm+0x34/0x70
> [ 512.148371] ? __switch_to_asm+0x40/0x70
> [ 512.148394] ? __switch_to_asm+0x34/0x70
> [ 512.148416] ? __switch_to_asm+0x40/0x70
> [ 512.148439] ? genl_family_rcv_msg_attrs_parse+0x110/0x110
> [ 512.148470] netlink_rcv_skb+0x4a/0x110
> [ 512.148492] genl_rcv+0x24/0x40
> [ 512.148512] netlink_unicast+0x1a0/0x250
> [ 512.148536] netlink_sendmsg+0x2b4/0x3b0
> [ 512.148560] sock_sendmsg+0x5b/0x60
> [ 512.148582] ___sys_sendmsg+0x278/0x2f0
> [ 512.148607] ? file_update_time+0x60/0x130
> [ 512.148630] ? pipe_write+0x286/0x400
> [ 512.148653] ? new_sync_write+0x12d/0x1d0
> [ 512.148676] ? __sys_sendmsg+0x5e/0xa0
> [ 512.148697] __sys_sendmsg+0x5e/0xa0
> [ 512.148720] do_syscall_64+0x52/0x1a0
> [ 512.148742] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 512.148771] RIP: 0033:0x7fbbaa6f9a6d
> [ 512.149622] Code: b9 20 00 00 75 10 b8 2e 00 00 00 0f 05 48 3d 01 f0
> ff ff 73 31 c3 48 83 ec 08 e8 fe f6 ff ff 48 89 04 24 b8 2e 00 00 00 0f
> 05 <48> 8b 3c 24 48 89 c2 e8 47 f7 ff ff 48 89 d0 48 83 c4 08 48 3d 01
> [ 512.151428] RSP: 002b:00007fffca1a1100 EFLAGS: 00000293 ORIG_RAX:
> 000000000000002e
> [ 512.152349] RAX: ffffffffffffffda RBX: 0000000000000001 RCX:
> 00007fbbaa6f9a6d
> [ 512.153266] RDX: 0000000000000000 RSI: 00007fffca1a1160 RDI:
> 0000000000000010
> [ 512.154184] RBP: 0000000000f42680 R08: 0000000000000000 R09:
> 00007fffca1a2118
> [ 512.155094] R10: 0000000000000008 R11: 0000000000000293 R12:
> 00007fffca1a1f30
> [ 512.155992] R13: 00007fffca1a20c0 R14: 00007fffca1a1f38 R15:
> 00007fffca1a15f0
> [ 512.156866] Modules linked in: vport_vxlan(E) vxlan(E) vport_gre(E)
> ip_gre(E) ip_tunnel(E) vport_geneve(E) geneve(E) ip6_udp_tunnel(E)
> udp_tunnel(E) openvswitch(E) nsh(E) nf_conncount(E) nf_nat_tftp(E)
> nf_conntrack_tftp(E) nf_nat_ftp(E) nf_conntrack_ftp(E) nf_nat(E)
> nf_conntrack_netlink(E) ip6table_filter(E) ip6_tables(E)
> iptable_filter(E) ip_tables(E) x_tables(E) ip6_gre(E) ip6_tunnel(E)
> tunnel6(E) gre(E) bonding(E) 8021q(E) garp(E) stp(E) mrp(E) llc(E)
> veth(E) nfnetlink_cttimeout(E) nfnetlink(E) nf_conntrack(E)
> nf_defrag_ipv6(E) nf_defrag_ipv4(E) binfmt_misc(E) intel_rapl_msr(E)
> snd_hda_codec_generic(E) ledtrig_audio(E) snd_hda_intel(E)
> snd_intel_nhlt(E) snd_hda_codec(E) intel_rapl_common(E) snd_hda_core(E)
> snd_hwdep(E) input_leds(E) snd_pcm(E) joydev(E) snd_timer(E)
> serio_raw(E) snd(E) soundcore(E) i2c_piix4(E) mac_hid(E) ib_iser(E)
> rdma_cm(E) iw_cm(E) ib_cm(E) ib_core(E) configfs(E) iscsi_tcp(E)
> libiscsi_tcp(E) libiscsi(E) scsi_transport_iscsi(E) autofs4(E) btrfs(E)
> zstd_decompress(E)
> [ 512.156899] zstd_compress(E) raid10(E) raid456(E)
> async_raid6_recov(E) async_memcpy(E) async_pq(E) async_xor(E)
> async_tx(E) xor(E) raid6_pq(E) libcrc32c(E) raid1(E) raid0(E)
> multipath(E) linear(E) crct10dif_pclmul(E) crc32_pclmul(E)
> ghash_clmulni_intel(E) aesni_intel(E) crypto_simd(E) qxl(E) ttm(E)
> cryptd(E) glue_helper(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E)
> sysimgblt(E) fb_sys_fops(E) psmouse(E) drm(E) pata_acpi(E) floppy(E)
> [last unloaded: nf_conntrack_ftp]
> [ 512.168488] ---[ end trace 26730810beeb11e1 ]---
> [ 512.169555] RIP: 0010:ovs_flow_tbl_remove+0x151/0x160 [openvswitch]
> [ 512.170638] Code: 55 f7 ea 89 f0 c1 f8 1f 29 c2 39 53 10 0f 8f 6a ff
> ff ff 48 89 ef d1 fe 5b 5d e9 8a ed ff ff 0f 0b 0f 0b b8 18 00 00 00 eb
> 92 <0f> 0b 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 57
> [ 512.172813] RSP: 0018:ffffb637002cf8c8 EFLAGS: 00010246
> [ 512.173897] RAX: 0000000000000009 RBX: ffff95ebf32d23c0 RCX:
> ffff95ebf00e5a00
> [ 512.174991] RDX: ffff95ebf32d2420 RSI: 0000000000000009 RDI:
> ffff95ebf0dffba0
> [ 512.176109] RBP: ffffb637002cfb70 R08: ffff95ebf6030240 R09:
> ffff95ebf1643180
> [ 512.177229] R10: ffff95ebf283b814 R11: ffffffffc0932500 R12:
> ffff95ebf040a300
> [ 512.178364] R13: ffff95ebf0dffba0 R14: ffff95ebf283b814 R15:
> 0000000000000007
> [ 512.179530] FS: 00007fbbab3d2980(0000) GS:ffff95ebf7a80000(0000)
> knlGS:0000000000000000
> [ 512.180700] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 512.181885] CR2: 00007fffca190ff8 CR3: 0000000232810006 CR4:
> 00000000001606e0
>
> The code is hitting this:
>
> static void tbl_mask_array_del_mask(struct flow_table *tbl,
> struct sw_flow_mask *mask)
> {
> struct mask_array *ma = ovsl_dereference(tbl->mask_array);
> int i;
>
> /* Remove the deleted mask pointers from the array */
> for (i = 0; i < ma->count; i++) {
> if (mask == ovsl_dereference(ma->masks[i]))
> goto found;
> }
>
> BUG(); <---------------------------- Here
>
> Pravin mentioned memory barrier usage in one of his replies. Perhaps
> that is an avenue to explore.
Hi Pravin, Greg
I run the make check-kernel for a long time, and don't reproduce it.
Greg, how did you reproduce it?
For using barrier, i should add READ_ONCE in flow_lookup in fast path
(read-side ), and use WRITE_ONCE in
tbl_mask_array_add/del_mask (write-side) protected by ovs_mutex. Other
read-side (protected by ovs_mutex),
e.g
* flow_mask_find
* ovs_flow_tbl_lookup_exact
* ovs_flow_tbl_num_masks
can access ma->count directly ?
> Thanks,
>
> - Greg
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH net-next v2 00/10] optimize openvswitch flow looking up
2019-10-15 8:25 ` Tonghao Zhang
@ 2019-10-15 17:37 ` Gregory Rose
0 siblings, 0 replies; 16+ messages in thread
From: Gregory Rose @ 2019-10-15 17:37 UTC (permalink / raw)
To: Tonghao Zhang; +Cc: Pravin Shelar, Linux Kernel Network Developers
On 10/15/2019 1:25 AM, Tonghao Zhang wrote:
> On Tue, Oct 15, 2019 at 6:26 AM Gregory Rose <gvrose8192@gmail.com> wrote:
[snip]
>> Hi Tonghao,
>> I did make the change you suggested:
>>
>> git diff
>> diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
>> index bc14b12..210018a 100644
>> --- a/net/openvswitch/flow_table.c
>> +++ b/net/openvswitch/flow_table.c
>> @@ -827,6 +827,8 @@ static int tbl_mask_array_add_mask(struct flow_table
>> *tbl,
>> MASK_ARRAY_SIZE_MIN);
>> if (err)
>> return err;
>> +
>> + ma = ovsl_dereference(tbl->mask_array);
>> }
>>
>> However, there is still an issue. Apparently this change just moves the
>> bug. Now I'm getting this splat:
>>
>> [ 512.147478] ------------[ cut here ]------------
>> [ 512.147481] kernel BUG at net/openvswitch/flow_table.c:725!
>> [ 512.147526] invalid opcode: 0000 [#1] SMP PTI
>> [ 512.147552] CPU: 1 PID: 14636 Comm: ovs-vswitchd Tainted:
>> G E 5.4.0-rc1+ #138
>> [ 512.147595] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
>> [ 512.147630] RIP: 0010:ovs_flow_tbl_remove+0x151/0x160 [openvswitch]
>> [ 512.147663] Code: 55 f7 ea 89 f0 c1 f8 1f 29 c2 39 53 10 0f 8f 6a ff
>> ff ff 48 89 ef d1 fe 5b 5d e9 8a ed ff ff 0f 0b 0f 0b b8 18 00 00 00 eb
>> 92 <0f> 0b 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 57
>> [ 512.147753] RSP: 0018:ffffb637002cf8c8 EFLAGS: 00010246
>> [ 512.147781] RAX: 0000000000000009 RBX: ffff95ebf32d23c0 RCX:
>> ffff95ebf00e5a00
>> [ 512.147817] RDX: ffff95ebf32d2420 RSI: 0000000000000009 RDI:
>> ffff95ebf0dffba0
>> [ 512.147852] RBP: ffffb637002cfb70 R08: ffff95ebf6030240 R09:
>> ffff95ebf1643180
>> [ 512.147888] R10: ffff95ebf283b814 R11: ffffffffc0932500 R12:
>> ffff95ebf040a300
>> [ 512.147924] R13: ffff95ebf0dffba0 R14: ffff95ebf283b814 R15:
>> 0000000000000007
>> [ 512.147961] FS: 00007fbbab3d2980(0000) GS:ffff95ebf7a80000(0000)
>> knlGS:0000000000000000
>> [ 512.148001] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 512.148031] CR2: 00007fffca190ff8 CR3: 0000000232810006 CR4:
>> 00000000001606e0
>> [ 512.148071] Call Trace:
>> [ 512.148092] action_fifos_exit+0x3240/0x37b0 [openvswitch]
>> [ 512.148125] ? update_sd_lb_stats+0x613/0x760
>> [ 512.148152] ? find_busiest_group+0x3e/0x520
>> [ 512.148177] ? __nla_validate_parse+0x57/0x8a0
>> [ 512.148203] ? _cond_resched+0x15/0x40
>> [ 512.148226] ? genl_family_rcv_msg_attrs_parse+0xe4/0x110
>> [ 512.148256] genl_rcv_msg+0x1ed/0x430
>> [ 512.148303] ? __switch_to_asm+0x34/0x70
>> [ 512.148326] ? __switch_to_asm+0x40/0x70
>> [ 512.148349] ? __switch_to_asm+0x34/0x70
>> [ 512.148371] ? __switch_to_asm+0x40/0x70
>> [ 512.148394] ? __switch_to_asm+0x34/0x70
>> [ 512.148416] ? __switch_to_asm+0x40/0x70
>> [ 512.148439] ? genl_family_rcv_msg_attrs_parse+0x110/0x110
>> [ 512.148470] netlink_rcv_skb+0x4a/0x110
>> [ 512.148492] genl_rcv+0x24/0x40
>> [ 512.148512] netlink_unicast+0x1a0/0x250
>> [ 512.148536] netlink_sendmsg+0x2b4/0x3b0
>> [ 512.148560] sock_sendmsg+0x5b/0x60
>> [ 512.148582] ___sys_sendmsg+0x278/0x2f0
>> [ 512.148607] ? file_update_time+0x60/0x130
>> [ 512.148630] ? pipe_write+0x286/0x400
>> [ 512.148653] ? new_sync_write+0x12d/0x1d0
>> [ 512.148676] ? __sys_sendmsg+0x5e/0xa0
>> [ 512.148697] __sys_sendmsg+0x5e/0xa0
>> [ 512.148720] do_syscall_64+0x52/0x1a0
>> [ 512.148742] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>> [ 512.148771] RIP: 0033:0x7fbbaa6f9a6d
>> [ 512.149622] Code: b9 20 00 00 75 10 b8 2e 00 00 00 0f 05 48 3d 01 f0
>> ff ff 73 31 c3 48 83 ec 08 e8 fe f6 ff ff 48 89 04 24 b8 2e 00 00 00 0f
>> 05 <48> 8b 3c 24 48 89 c2 e8 47 f7 ff ff 48 89 d0 48 83 c4 08 48 3d 01
>> [ 512.151428] RSP: 002b:00007fffca1a1100 EFLAGS: 00000293 ORIG_RAX:
>> 000000000000002e
>> [ 512.152349] RAX: ffffffffffffffda RBX: 0000000000000001 RCX:
>> 00007fbbaa6f9a6d
>> [ 512.153266] RDX: 0000000000000000 RSI: 00007fffca1a1160 RDI:
>> 0000000000000010
>> [ 512.154184] RBP: 0000000000f42680 R08: 0000000000000000 R09:
>> 00007fffca1a2118
>> [ 512.155094] R10: 0000000000000008 R11: 0000000000000293 R12:
>> 00007fffca1a1f30
>> [ 512.155992] R13: 00007fffca1a20c0 R14: 00007fffca1a1f38 R15:
>> 00007fffca1a15f0
>> [ 512.156866] Modules linked in: vport_vxlan(E) vxlan(E) vport_gre(E)
>> ip_gre(E) ip_tunnel(E) vport_geneve(E) geneve(E) ip6_udp_tunnel(E)
>> udp_tunnel(E) openvswitch(E) nsh(E) nf_conncount(E) nf_nat_tftp(E)
>> nf_conntrack_tftp(E) nf_nat_ftp(E) nf_conntrack_ftp(E) nf_nat(E)
>> nf_conntrack_netlink(E) ip6table_filter(E) ip6_tables(E)
>> iptable_filter(E) ip_tables(E) x_tables(E) ip6_gre(E) ip6_tunnel(E)
>> tunnel6(E) gre(E) bonding(E) 8021q(E) garp(E) stp(E) mrp(E) llc(E)
>> veth(E) nfnetlink_cttimeout(E) nfnetlink(E) nf_conntrack(E)
>> nf_defrag_ipv6(E) nf_defrag_ipv4(E) binfmt_misc(E) intel_rapl_msr(E)
>> snd_hda_codec_generic(E) ledtrig_audio(E) snd_hda_intel(E)
>> snd_intel_nhlt(E) snd_hda_codec(E) intel_rapl_common(E) snd_hda_core(E)
>> snd_hwdep(E) input_leds(E) snd_pcm(E) joydev(E) snd_timer(E)
>> serio_raw(E) snd(E) soundcore(E) i2c_piix4(E) mac_hid(E) ib_iser(E)
>> rdma_cm(E) iw_cm(E) ib_cm(E) ib_core(E) configfs(E) iscsi_tcp(E)
>> libiscsi_tcp(E) libiscsi(E) scsi_transport_iscsi(E) autofs4(E) btrfs(E)
>> zstd_decompress(E)
>> [ 512.156899] zstd_compress(E) raid10(E) raid456(E)
>> async_raid6_recov(E) async_memcpy(E) async_pq(E) async_xor(E)
>> async_tx(E) xor(E) raid6_pq(E) libcrc32c(E) raid1(E) raid0(E)
>> multipath(E) linear(E) crct10dif_pclmul(E) crc32_pclmul(E)
>> ghash_clmulni_intel(E) aesni_intel(E) crypto_simd(E) qxl(E) ttm(E)
>> cryptd(E) glue_helper(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E)
>> sysimgblt(E) fb_sys_fops(E) psmouse(E) drm(E) pata_acpi(E) floppy(E)
>> [last unloaded: nf_conntrack_ftp]
>> [ 512.168488] ---[ end trace 26730810beeb11e1 ]---
>> [ 512.169555] RIP: 0010:ovs_flow_tbl_remove+0x151/0x160 [openvswitch]
>> [ 512.170638] Code: 55 f7 ea 89 f0 c1 f8 1f 29 c2 39 53 10 0f 8f 6a ff
>> ff ff 48 89 ef d1 fe 5b 5d e9 8a ed ff ff 0f 0b 0f 0b b8 18 00 00 00 eb
>> 92 <0f> 0b 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 57
>> [ 512.172813] RSP: 0018:ffffb637002cf8c8 EFLAGS: 00010246
>> [ 512.173897] RAX: 0000000000000009 RBX: ffff95ebf32d23c0 RCX:
>> ffff95ebf00e5a00
>> [ 512.174991] RDX: ffff95ebf32d2420 RSI: 0000000000000009 RDI:
>> ffff95ebf0dffba0
>> [ 512.176109] RBP: ffffb637002cfb70 R08: ffff95ebf6030240 R09:
>> ffff95ebf1643180
>> [ 512.177229] R10: ffff95ebf283b814 R11: ffffffffc0932500 R12:
>> ffff95ebf040a300
>> [ 512.178364] R13: ffff95ebf0dffba0 R14: ffff95ebf283b814 R15:
>> 0000000000000007
>> [ 512.179530] FS: 00007fbbab3d2980(0000) GS:ffff95ebf7a80000(0000)
>> knlGS:0000000000000000
>> [ 512.180700] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 512.181885] CR2: 00007fffca190ff8 CR3: 0000000232810006 CR4:
>> 00000000001606e0
>>
>> The code is hitting this:
>>
>> static void tbl_mask_array_del_mask(struct flow_table *tbl,
>> struct sw_flow_mask *mask)
>> {
>> struct mask_array *ma = ovsl_dereference(tbl->mask_array);
>> int i;
>>
>> /* Remove the deleted mask pointers from the array */
>> for (i = 0; i < ma->count; i++) {
>> if (mask == ovsl_dereference(ma->masks[i]))
>> goto found;
>> }
>>
>> BUG(); <---------------------------- Here
>>
>> Pravin mentioned memory barrier usage in one of his replies. Perhaps
>> that is an avenue to explore.
> Hi Pravin, Greg
> I run the make check-kernel for a long time, and don't reproduce it.
> Greg, how did you reproduce it?
Hi Tonghao,
I use a Ubuntu 16.04 base system with updates all applied and build the
net-next kernel (with your
patches applied) and then install it.
I boot to the new kernel and then build the current master branch of OVS
from github:
./boot.sh
mkdir _build
cd _build
../configure
make -j8
I then run 'sudo make check-kernel TESTSUITEFLAGS="63"'
Hope this helps with your repro.
Thanks,
- Greg
>
> For using barrier, i should add READ_ONCE in flow_lookup in fast path
> (read-side ), and use WRITE_ONCE in
> tbl_mask_array_add/del_mask (write-side) protected by ovs_mutex. Other
> read-side (protected by ovs_mutex),
> e.g
> * flow_mask_find
> * ovs_flow_tbl_lookup_exact
> * ovs_flow_tbl_num_masks
>
> can access ma->count directly ?
>
>> Thanks,
>>
>> - Greg
>>
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2019-10-15 17:37 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-08 1:00 [PATCH net-next v2 00/10] optimize openvswitch flow looking up xiangxia.m.yue
2019-10-08 1:00 ` [PATCH net-next v2 01/10] net: openvswitch: add flow-mask cache for performance xiangxia.m.yue
2019-10-08 1:00 ` [PATCH net-next v2 02/10] net: openvswitch: convert mask list in mask array xiangxia.m.yue
2019-10-08 1:00 ` [PATCH net-next v2 03/10] net: openvswitch: shrink the mask array if necessary xiangxia.m.yue
2019-10-08 1:00 ` [PATCH net-next v2 04/10] net: openvswitch: optimize flow-mask cache hash collision xiangxia.m.yue
2019-10-08 1:00 ` [PATCH net-next v2 05/10] net: openvswitch: optimize flow-mask looking up xiangxia.m.yue
2019-10-08 1:00 ` [PATCH net-next v2 06/10] net: openvswitch: simplify the flow_hash xiangxia.m.yue
2019-10-08 1:00 ` [PATCH net-next v2 07/10] net: openvswitch: add likely in flow_lookup xiangxia.m.yue
2019-10-08 1:00 ` [PATCH net-next v2 08/10] net: openvswitch: fix possible memleak on destroy flow-table xiangxia.m.yue
2019-10-08 1:00 ` [PATCH net-next v2 09/10] net: openvswitch: don't unlock mutex when changing the user_features fails xiangxia.m.yue
2019-10-08 1:00 ` [PATCH net-next v2 10/10] net: openvswitch: simplify the ovs_dp_cmd_new xiangxia.m.yue
2019-10-08 17:33 ` [PATCH net-next v2 00/10] optimize openvswitch flow looking up Gregory Rose
2019-10-10 8:42 ` Tonghao Zhang
2019-10-14 22:26 ` Gregory Rose
2019-10-15 8:25 ` Tonghao Zhang
2019-10-15 17:37 ` Gregory Rose
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).