* [PATCH 0/4] COLO net and runstate bugfix/optimization
@ 2022-03-09 8:38 Zhang Chen
2022-03-09 8:38 ` [PATCH 1/4] softmmu/runstate.c: add RunStateTransition support form COLO to PRELAUNCH Zhang Chen
` (3 more replies)
0 siblings, 4 replies; 13+ messages in thread
From: Zhang Chen @ 2022-03-09 8:38 UTC (permalink / raw)
To: Jason Wang, Li Zhijian; +Cc: Zhang Chen, qemu-dev
This series fix some COLO related issues in internal stress testing.
Zhang Chen (4):
softmmu/runstate.c: add RunStateTransition support form COLO to
PRELAUNCH
net/colo: Fix a "double free" crash to clear the conn_list
net/colo.c: No need to track conn_list for filter-rewriter
net/colo.c: fix segmentation fault when packet is not parsed correctly
net/colo-compare.c | 2 +-
net/colo.c | 11 +++++++++--
net/filter-rewriter.c | 2 +-
net/trace-events | 1 +
softmmu/runstate.c | 1 +
5 files changed, 13 insertions(+), 4 deletions(-)
--
2.25.1
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 1/4] softmmu/runstate.c: add RunStateTransition support form COLO to PRELAUNCH
2022-03-09 8:38 [PATCH 0/4] COLO net and runstate bugfix/optimization Zhang Chen
@ 2022-03-09 8:38 ` Zhang Chen
2022-03-09 8:38 ` [PATCH 2/4] net/colo: Fix a "double free" crash to clear the conn_list Zhang Chen
` (2 subsequent siblings)
3 siblings, 0 replies; 13+ messages in thread
From: Zhang Chen @ 2022-03-09 8:38 UTC (permalink / raw)
To: Jason Wang, Li Zhijian; +Cc: Zhang Chen, qemu-dev, Like Xu
If the checkpoint occurs when the guest finishes restarting
but has not started running, the runstate_set() may reject
the transition from COLO to PRELAUNCH with the crash log:
{"timestamp": {"seconds": 1593484591, "microseconds": 26605},\
"event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
qemu-system-x86_64: invalid runstate transition: 'colo' -> 'prelaunch'
Long-term testing says that it's pretty safe.
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
---
softmmu/runstate.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/softmmu/runstate.c b/softmmu/runstate.c
index e0d869b21a..c021c56338 100644
--- a/softmmu/runstate.c
+++ b/softmmu/runstate.c
@@ -127,6 +127,7 @@ static const RunStateTransition runstate_transitions_def[] = {
{ RUN_STATE_RESTORE_VM, RUN_STATE_PRELAUNCH },
{ RUN_STATE_COLO, RUN_STATE_RUNNING },
+ { RUN_STATE_COLO, RUN_STATE_PRELAUNCH },
{ RUN_STATE_COLO, RUN_STATE_SHUTDOWN},
{ RUN_STATE_RUNNING, RUN_STATE_DEBUG },
--
2.25.1
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 2/4] net/colo: Fix a "double free" crash to clear the conn_list
2022-03-09 8:38 [PATCH 0/4] COLO net and runstate bugfix/optimization Zhang Chen
2022-03-09 8:38 ` [PATCH 1/4] softmmu/runstate.c: add RunStateTransition support form COLO to PRELAUNCH Zhang Chen
@ 2022-03-09 8:38 ` Zhang Chen
2022-03-21 3:05 ` lizhijian
2022-03-09 8:38 ` [PATCH 3/4] net/colo.c: No need to track conn_list for filter-rewriter Zhang Chen
2022-03-09 8:38 ` [PATCH 4/4] net/colo.c: fix segmentation fault when packet is not parsed correctly Zhang Chen
3 siblings, 1 reply; 13+ messages in thread
From: Zhang Chen @ 2022-03-09 8:38 UTC (permalink / raw)
To: Jason Wang, Li Zhijian; +Cc: Zhang Chen, qemu-dev, Like Xu
We notice the QEMU may crash when the guest has too many
incoming network connections with the following log:
15197@1593578622.668573:colo_proxy_main : colo proxy connection hashtable full, clear it
free(): invalid pointer
[1] 15195 abort (core dumped) qemu-system-x86_64 ....
This is because we create the s->connection_track_table with
g_hash_table_new_full() which is defined as:
GHashTable * g_hash_table_new_full (GHashFunc hash_func,
GEqualFunc key_equal_func,
GDestroyNotify key_destroy_func,
GDestroyNotify value_destroy_func);
The fourth parameter connection_destroy() will be called to free the
memory allocated for all 'Connection' values in the hashtable when
we call g_hash_table_remove_all() in the connection_hashtable_reset().
It's unnecessary because we clear the conn_list explicitly later,
and it's buggy when other agents try to call connection_get()
with the same connection_track_table.
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
---
net/colo-compare.c | 2 +-
net/filter-rewriter.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/colo-compare.c b/net/colo-compare.c
index 62554b5b3c..ab054cfd21 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -1324,7 +1324,7 @@ static void colo_compare_complete(UserCreatable *uc, Error **errp)
s->connection_track_table = g_hash_table_new_full(connection_key_hash,
connection_key_equal,
g_free,
- connection_destroy);
+ NULL);
colo_compare_iothread(s);
diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c
index bf05023dc3..c18c4c2019 100644
--- a/net/filter-rewriter.c
+++ b/net/filter-rewriter.c
@@ -383,7 +383,7 @@ static void colo_rewriter_setup(NetFilterState *nf, Error **errp)
s->connection_track_table = g_hash_table_new_full(connection_key_hash,
connection_key_equal,
g_free,
- connection_destroy);
+ NULL);
s->incoming_queue = qemu_new_net_queue(qemu_netfilter_pass_to_next, nf);
}
--
2.25.1
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 3/4] net/colo.c: No need to track conn_list for filter-rewriter
2022-03-09 8:38 [PATCH 0/4] COLO net and runstate bugfix/optimization Zhang Chen
2022-03-09 8:38 ` [PATCH 1/4] softmmu/runstate.c: add RunStateTransition support form COLO to PRELAUNCH Zhang Chen
2022-03-09 8:38 ` [PATCH 2/4] net/colo: Fix a "double free" crash to clear the conn_list Zhang Chen
@ 2022-03-09 8:38 ` Zhang Chen
2022-03-21 3:16 ` lizhijian
2022-03-09 8:38 ` [PATCH 4/4] net/colo.c: fix segmentation fault when packet is not parsed correctly Zhang Chen
3 siblings, 1 reply; 13+ messages in thread
From: Zhang Chen @ 2022-03-09 8:38 UTC (permalink / raw)
To: Jason Wang, Li Zhijian; +Cc: Zhang Chen, qemu-dev
Filter-rewriter no need to track connection in conn_list.
This patch fix the glib g_queue_is_empty assertion when COLO guest
keep a lot of network connection.
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
---
net/colo.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/colo.c b/net/colo.c
index 1f8162f59f..694f3c93ef 100644
--- a/net/colo.c
+++ b/net/colo.c
@@ -218,7 +218,7 @@ Connection *connection_get(GHashTable *connection_track_table,
/*
* clear the conn_list
*/
- while (!g_queue_is_empty(conn_list)) {
+ while (conn_list && !g_queue_is_empty(conn_list)) {
connection_destroy(g_queue_pop_head(conn_list));
}
}
--
2.25.1
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 4/4] net/colo.c: fix segmentation fault when packet is not parsed correctly
2022-03-09 8:38 [PATCH 0/4] COLO net and runstate bugfix/optimization Zhang Chen
` (2 preceding siblings ...)
2022-03-09 8:38 ` [PATCH 3/4] net/colo.c: No need to track conn_list for filter-rewriter Zhang Chen
@ 2022-03-09 8:38 ` Zhang Chen
2022-03-21 3:39 ` lizhijian
3 siblings, 1 reply; 13+ messages in thread
From: Zhang Chen @ 2022-03-09 8:38 UTC (permalink / raw)
To: Jason Wang, Li Zhijian; +Cc: Zhang Chen, Tao Xu, qemu-dev
When COLO use only one vnet_hdr_support parameter between
filter-redirector and filter-mirror(or colo-compare), COLO will crash
with segmentation fault. Back track as follow:
Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
0x0000555555cb200b in eth_get_l2_hdr_length (p=0x0)
at /home/tao/project/COLO/colo-qemu/include/net/eth.h:296
296 uint16_t proto = be16_to_cpu(PKT_GET_ETH_HDR(p)->h_proto);
(gdb) bt
0 0x0000555555cb200b in eth_get_l2_hdr_length (p=0x0)
at /home/tao/project/COLO/colo-qemu/include/net/eth.h:296
1 0x0000555555cb22b4 in parse_packet_early (pkt=0x555556a44840) at
net/colo.c:49
2 0x0000555555cb2b91 in is_tcp_packet (pkt=0x555556a44840) at
net/filter-rewriter.c:63
So wrong vnet_hdr_len will cause pkt->data become NULL. Add check to
raise error and add trace-events to track vnet_hdr_len.
Signed-off-by: Tao Xu <tao3.xu@intel.com>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
---
net/colo.c | 9 ++++++++-
net/trace-events | 1 +
2 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/net/colo.c b/net/colo.c
index 694f3c93ef..6b0ff562ad 100644
--- a/net/colo.c
+++ b/net/colo.c
@@ -46,7 +46,14 @@ int parse_packet_early(Packet *pkt)
static const uint8_t vlan[] = {0x81, 0x00};
uint8_t *data = pkt->data + pkt->vnet_hdr_len;
uint16_t l3_proto;
- ssize_t l2hdr_len = eth_get_l2_hdr_length(data);
+ ssize_t l2hdr_len;
+
+ if (data == NULL) {
+ trace_colo_proxy_main_vnet_info("This packet is not parsed correctly, "
+ "pkt->vnet_hdr_len", pkt->vnet_hdr_len);
+ return 1;
+ }
+ l2hdr_len = eth_get_l2_hdr_length(data);
if (pkt->size < ETH_HLEN + pkt->vnet_hdr_len) {
trace_colo_proxy_main("pkt->size < ETH_HLEN");
diff --git a/net/trace-events b/net/trace-events
index d7a17256cc..6af927b4b9 100644
--- a/net/trace-events
+++ b/net/trace-events
@@ -9,6 +9,7 @@ vhost_user_event(const char *chr, int event) "chr: %s got event: %d"
# colo.c
colo_proxy_main(const char *chr) ": %s"
+colo_proxy_main_vnet_info(const char *sta, int size) ": %s = %d"
# colo-compare.c
colo_compare_main(const char *chr) ": %s"
--
2.25.1
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 2/4] net/colo: Fix a "double free" crash to clear the conn_list
2022-03-09 8:38 ` [PATCH 2/4] net/colo: Fix a "double free" crash to clear the conn_list Zhang Chen
@ 2022-03-21 3:05 ` lizhijian
2022-03-28 9:13 ` Zhang, Chen
0 siblings, 1 reply; 13+ messages in thread
From: lizhijian @ 2022-03-21 3:05 UTC (permalink / raw)
To: Zhang Chen, Jason Wang, lizhijian; +Cc: qemu-dev, Like Xu
On 09/03/2022 16:38, Zhang Chen wrote:
> We notice the QEMU may crash when the guest has too many
> incoming network connections with the following log:
>
> 15197@1593578622.668573:colo_proxy_main : colo proxy connection hashtable full, clear it
> free(): invalid pointer
> [1] 15195 abort (core dumped) qemu-system-x86_64 ....
>
> This is because we create the s->connection_track_table with
> g_hash_table_new_full() which is defined as:
>
> GHashTable * g_hash_table_new_full (GHashFunc hash_func,
> GEqualFunc key_equal_func,
> GDestroyNotify key_destroy_func,
> GDestroyNotify value_destroy_func);
>
> The fourth parameter connection_destroy() will be called to free the
> memory allocated for all 'Connection' values in the hashtable when
> we call g_hash_table_remove_all() in the connection_hashtable_reset().
>
> It's unnecessary because we clear the conn_list explicitly later,
> and it's buggy when other agents try to call connection_get()
> with the same connection_track_table.
>
> Signed-off-by: Like Xu <like.xu@linux.intel.com>
> Signed-off-by: Zhang Chen <chen.zhang@intel.com>
> ---
> net/colo-compare.c | 2 +-
> net/filter-rewriter.c | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/net/colo-compare.c b/net/colo-compare.c
> index 62554b5b3c..ab054cfd21 100644
> --- a/net/colo-compare.c
> +++ b/net/colo-compare.c
> @@ -1324,7 +1324,7 @@ static void colo_compare_complete(UserCreatable *uc, Error **errp)
> s->connection_track_table = g_hash_table_new_full(connection_key_hash,
> connection_key_equal,
> g_free,
> - connection_destroy);
> + NULL);
202 /* if not found, create a new connection and add to hash table */
203 Connection *connection_get(GHashTable *connection_track_table,
204 ConnectionKey *key,
205 GQueue *conn_list)
206 {
207 Connection *conn = g_hash_table_lookup(connection_track_table, key);
208
209 if (conn == NULL) {
210 ConnectionKey *new_key = g_memdup(key, sizeof(*key));
211
212 conn = connection_new(key);
213
214 if (g_hash_table_size(connection_track_table) > HASHTABLE_MAX_SIZE) {
215 trace_colo_proxy_main("colo proxy connection hashtable full,"
216 " clear it");
217 connection_hashtable_reset(connection_track_table);
197 void connection_hashtable_reset(GHashTable *connection_track_table)
198 {
199 g_hash_table_remove_all(connection_track_table);
200 }
IIUC, above subroutine will do some cleanup explicitly. And before your patch, connection_hashtable_reset()
will release all keys and their values in this hashtable. But now, you remove all keys and just
one value(conn_list) instead. Does it means other values will be leaked ?
218 /*
219 * clear the conn_list
220 */
221 while (!g_queue_is_empty(conn_list)) {
222 connection_destroy(g_queue_pop_head(conn_list));
223 }
224 }
225
226 g_hash_table_insert(connection_track_table, new_key, conn);
227 }
228
229 return conn;
230 }
Thanks
Zhijian
>
> colo_compare_iothread(s);
>
> diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c
> index bf05023dc3..c18c4c2019 100644
> --- a/net/filter-rewriter.c
> +++ b/net/filter-rewriter.c
> @@ -383,7 +383,7 @@ static void colo_rewriter_setup(NetFilterState *nf, Error **errp)
> s->connection_track_table = g_hash_table_new_full(connection_key_hash,
> connection_key_equal,
> g_free,
> - connection_destroy);
> + NULL);
> s->incoming_queue = qemu_new_net_queue(qemu_netfilter_pass_to_next, nf);
> }
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 3/4] net/colo.c: No need to track conn_list for filter-rewriter
2022-03-09 8:38 ` [PATCH 3/4] net/colo.c: No need to track conn_list for filter-rewriter Zhang Chen
@ 2022-03-21 3:16 ` lizhijian
0 siblings, 0 replies; 13+ messages in thread
From: lizhijian @ 2022-03-21 3:16 UTC (permalink / raw)
To: Zhang Chen, Jason Wang, lizhijian; +Cc: qemu-dev
On 09/03/2022 16:38, Zhang Chen wrote:
> Filter-rewriter no need to track connection in conn_list.
> This patch fix the glib g_queue_is_empty assertion when COLO guest
> keep a lot of network connection.
>
> Signed-off-by: Zhang Chen <chen.zhang@intel.com>
LGTM.
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
> ---
> net/colo.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/colo.c b/net/colo.c
> index 1f8162f59f..694f3c93ef 100644
> --- a/net/colo.c
> +++ b/net/colo.c
> @@ -218,7 +218,7 @@ Connection *connection_get(GHashTable *connection_track_table,
> /*
> * clear the conn_list
> */
> - while (!g_queue_is_empty(conn_list)) {
> + while (conn_list && !g_queue_is_empty(conn_list)) {
> connection_destroy(g_queue_pop_head(conn_list));
> }
> }
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 4/4] net/colo.c: fix segmentation fault when packet is not parsed correctly
2022-03-09 8:38 ` [PATCH 4/4] net/colo.c: fix segmentation fault when packet is not parsed correctly Zhang Chen
@ 2022-03-21 3:39 ` lizhijian
0 siblings, 0 replies; 13+ messages in thread
From: lizhijian @ 2022-03-21 3:39 UTC (permalink / raw)
To: Zhang Chen, Jason Wang, lizhijian; +Cc: Tao Xu, qemu-dev
On 09/03/2022 16:38, Zhang Chen wrote:
> When COLO use only one vnet_hdr_support parameter between
> filter-redirector and filter-mirror(or colo-compare), COLO will crash
> with segmentation fault. Back track as follow:
>
> Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
> 0x0000555555cb200b in eth_get_l2_hdr_length (p=0x0)
> at /home/tao/project/COLO/colo-qemu/include/net/eth.h:296
> 296 uint16_t proto = be16_to_cpu(PKT_GET_ETH_HDR(p)->h_proto);
> (gdb) bt
> 0 0x0000555555cb200b in eth_get_l2_hdr_length (p=0x0)
> at /home/tao/project/COLO/colo-qemu/include/net/eth.h:296
> 1 0x0000555555cb22b4 in parse_packet_early (pkt=0x555556a44840) at
> net/colo.c:49
> 2 0x0000555555cb2b91 in is_tcp_packet (pkt=0x555556a44840) at
> net/filter-rewriter.c:63
>
> So wrong vnet_hdr_len will cause pkt->data become NULL.
Not sure if we can check this earlier, well
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
> Add check to
> raise error and add trace-events to track vnet_hdr_len.
>
> Signed-off-by: Tao Xu <tao3.xu@intel.com>
> Signed-off-by: Zhang Chen <chen.zhang@intel.com>
> ---
> net/colo.c | 9 ++++++++-
> net/trace-events | 1 +
> 2 files changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/net/colo.c b/net/colo.c
> index 694f3c93ef..6b0ff562ad 100644
> --- a/net/colo.c
> +++ b/net/colo.c
> @@ -46,7 +46,14 @@ int parse_packet_early(Packet *pkt)
> static const uint8_t vlan[] = {0x81, 0x00};
> uint8_t *data = pkt->data + pkt->vnet_hdr_len;
> uint16_t l3_proto;
> - ssize_t l2hdr_len = eth_get_l2_hdr_length(data);
> + ssize_t l2hdr_len;
> +
> + if (data == NULL) {
> + trace_colo_proxy_main_vnet_info("This packet is not parsed correctly, "
> + "pkt->vnet_hdr_len", pkt->vnet_hdr_len);
> + return 1;
> + }
> + l2hdr_len = eth_get_l2_hdr_length(data);
>
> if (pkt->size < ETH_HLEN + pkt->vnet_hdr_len) {
> trace_colo_proxy_main("pkt->size < ETH_HLEN");
> diff --git a/net/trace-events b/net/trace-events
> index d7a17256cc..6af927b4b9 100644
> --- a/net/trace-events
> +++ b/net/trace-events
> @@ -9,6 +9,7 @@ vhost_user_event(const char *chr, int event) "chr: %s got event: %d"
>
> # colo.c
> colo_proxy_main(const char *chr) ": %s"
> +colo_proxy_main_vnet_info(const char *sta, int size) ": %s = %d"
>
> # colo-compare.c
> colo_compare_main(const char *chr) ": %s"
^ permalink raw reply [flat|nested] 13+ messages in thread
* RE: [PATCH 2/4] net/colo: Fix a "double free" crash to clear the conn_list
2022-03-21 3:05 ` lizhijian
@ 2022-03-28 9:13 ` Zhang, Chen
2022-03-31 1:14 ` lizhijian
0 siblings, 1 reply; 13+ messages in thread
From: Zhang, Chen @ 2022-03-28 9:13 UTC (permalink / raw)
To: lizhijian, Jason Wang; +Cc: qemu-dev, Like Xu
> -----Original Message-----
> From: lizhijian@fujitsu.com <lizhijian@fujitsu.com>
> Sent: Monday, March 21, 2022 11:06 AM
> To: Zhang, Chen <chen.zhang@intel.com>; Jason Wang
> <jasowang@redhat.com>; lizhijian@fujitsu.com
> Cc: qemu-dev <qemu-devel@nongnu.org>; Like Xu <like.xu@linux.intel.com>
> Subject: Re: [PATCH 2/4] net/colo: Fix a "double free" crash to clear the
> conn_list
>
>
>
> On 09/03/2022 16:38, Zhang Chen wrote:
> > We notice the QEMU may crash when the guest has too many incoming
> > network connections with the following log:
> >
> > 15197@1593578622.668573:colo_proxy_main : colo proxy connection
> > hashtable full, clear it
> > free(): invalid pointer
> > [1] 15195 abort (core dumped) qemu-system-x86_64 ....
> >
> > This is because we create the s->connection_track_table with
> > g_hash_table_new_full() which is defined as:
> >
> > GHashTable * g_hash_table_new_full (GHashFunc hash_func,
> > GEqualFunc key_equal_func,
> > GDestroyNotify key_destroy_func,
> > GDestroyNotify value_destroy_func);
> >
> > The fourth parameter connection_destroy() will be called to free the
> > memory allocated for all 'Connection' values in the hashtable when we
> > call g_hash_table_remove_all() in the connection_hashtable_reset().
> >
> > It's unnecessary because we clear the conn_list explicitly later, and
> > it's buggy when other agents try to call connection_get() with the
> > same connection_track_table.
> >
> > Signed-off-by: Like Xu <like.xu@linux.intel.com>
> > Signed-off-by: Zhang Chen <chen.zhang@intel.com>
> > ---
> > net/colo-compare.c | 2 +-
> > net/filter-rewriter.c | 2 +-
> > 2 files changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/net/colo-compare.c b/net/colo-compare.c index
> > 62554b5b3c..ab054cfd21 100644
> > --- a/net/colo-compare.c
> > +++ b/net/colo-compare.c
> > @@ -1324,7 +1324,7 @@ static void
> colo_compare_complete(UserCreatable *uc, Error **errp)
> > s->connection_track_table =
> g_hash_table_new_full(connection_key_hash,
> > connection_key_equal,
> > g_free,
> > - connection_destroy);
> > + NULL);
>
>
> 202 /* if not found, create a new connection and add to hash table */
> 203 Connection *connection_get(GHashTable *connection_track_table,
> 204 ConnectionKey *key,
> 205 GQueue *conn_list)
> 206 {
> 207 Connection *conn = g_hash_table_lookup(connection_track_table,
> key);
> 208
> 209 if (conn == NULL) {
> 210 ConnectionKey *new_key = g_memdup(key, sizeof(*key));
> 211
> 212 conn = connection_new(key);
> 213
> 214 if (g_hash_table_size(connection_track_table) >
> HASHTABLE_MAX_SIZE) {
> 215 trace_colo_proxy_main("colo proxy connection hashtable full,"
> 216 " clear it");
> 217 connection_hashtable_reset(connection_track_table);
>
> 197 void connection_hashtable_reset(GHashTable *connection_track_table)
> 198 {
> 199 g_hash_table_remove_all(connection_track_table);
> 200 }
>
> IIUC, above subroutine will do some cleanup explicitly. And before your
> patch, connection_hashtable_reset() will release all keys and their values in
> this hashtable. But now, you remove all keys and just one value(conn_list)
> instead. Does it means other values will be leaked ?
Thanks Zhijian. Re-think about it. Yes, I think you are right.
It looks need a lock to avoid into connection_get() when triggered the clear hashtable operation.
What do you think?
Thanks
Chen
>
>
> 218 /*
> 219 * clear the conn_list
> 220 */
> 221 while (!g_queue_is_empty(conn_list)) {
> 222 connection_destroy(g_queue_pop_head(conn_list));
> 223 }
> 224 }
> 225
> 226 g_hash_table_insert(connection_track_table, new_key, conn);
> 227 }
> 228
> 229 return conn;
> 230 }
>
>
> Thanks
> Zhijian
>
> >
> > colo_compare_iothread(s);
> >
> > diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c index
> > bf05023dc3..c18c4c2019 100644
> > --- a/net/filter-rewriter.c
> > +++ b/net/filter-rewriter.c
> > @@ -383,7 +383,7 @@ static void colo_rewriter_setup(NetFilterState *nf,
> Error **errp)
> > s->connection_track_table =
> g_hash_table_new_full(connection_key_hash,
> > connection_key_equal,
> > g_free,
> > - connection_destroy);
> > + NULL);
> > s->incoming_queue =
> qemu_new_net_queue(qemu_netfilter_pass_to_next, nf);
> > }
> >
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/4] net/colo: Fix a "double free" crash to clear the conn_list
2022-03-28 9:13 ` Zhang, Chen
@ 2022-03-31 1:14 ` lizhijian
2022-03-31 2:25 ` Zhang, Chen
0 siblings, 1 reply; 13+ messages in thread
From: lizhijian @ 2022-03-31 1:14 UTC (permalink / raw)
To: Zhang, Chen, Jason Wang; +Cc: qemu-dev, Like Xu
connection_track_table
-----+----------
key1 | conn |-----------------------------------------------------------+
-----+---------- |
key2 | conn |----------------------------------+ |
-----+---------- | |
key3 | conn |---------+ | |
-----+---------- | | |
| | |
| | |
+ CompareState ++ | | |
| | V V V
+---------------+ +---------------+ +---------------+
|conn_list +--->conn +--------->conn | connx
+---------------+ +---------------+ +---------------+
| | | | | |
+---------------+ +---v----+ +---v----+ +---v----+ +---v----+
|primary | |secondary |primary | |secondary
|packet | |packet + |packet | |packet +
+--------+ +--------+ +--------+ +--------+
| | | |
+---v----+ +---v----+ +---v----+ +---v----+
|primary | |secondary |primary | |secondary
|packet | |packet + |packet | |packet +
+--------+ +--------+ +--------+ +--------+
| | | |
+---v----+ +---v----+ +---v----+ +---v----+
|primary | |secondary |primary | |secondary
|packet | |packet + |packet | |packet +
+--------+ +--------+ +--------+ +--------+
I recalled that we should above relationships between connection_track_table conn_list and conn.
That means both connection_track_table and conn_list reference to the same conn instance.
So before this patch, connection_get() is possible to use-after-free/double free conn. where 1st was in
connection_hashtable_reset() and 2nd was
221 while (!g_queue_is_empty(conn_list)) {
222 connection_destroy(g_queue_pop_head(conn_list));
223 }
I also doubt that your current abort was just due to above use-after-free/double free.
If so, looks it's enough we just update to g_queue_clear(conn_list) in the 2nd place.
Thanks
Zhijian
On 28/03/2022 17:13, Zhang, Chen wrote:
>
>> -----Original Message-----
>> From: lizhijian@fujitsu.com <lizhijian@fujitsu.com>
>> Sent: Monday, March 21, 2022 11:06 AM
>> To: Zhang, Chen <chen.zhang@intel.com>; Jason Wang
>> <jasowang@redhat.com>; lizhijian@fujitsu.com
>> Cc: qemu-dev <qemu-devel@nongnu.org>; Like Xu <like.xu@linux.intel.com>
>> Subject: Re: [PATCH 2/4] net/colo: Fix a "double free" crash to clear the
>> conn_list
>>
>>
>>
>> On 09/03/2022 16:38, Zhang Chen wrote:
>>> We notice the QEMU may crash when the guest has too many incoming
>>> network connections with the following log:
>>>
>>> 15197@1593578622.668573:colo_proxy_main : colo proxy connection
>>> hashtable full, clear it
>>> free(): invalid pointer
>>> [1] 15195 abort (core dumped) qemu-system-x86_64 ....
>>>
>>> This is because we create the s->connection_track_table with
>>> g_hash_table_new_full() which is defined as:
>>>
>>> GHashTable * g_hash_table_new_full (GHashFunc hash_func,
>>> GEqualFunc key_equal_func,
>>> GDestroyNotify key_destroy_func,
>>> GDestroyNotify value_destroy_func);
>>>
>>> The fourth parameter connection_destroy() will be called to free the
>>> memory allocated for all 'Connection' values in the hashtable when we
>>> call g_hash_table_remove_all() in the connection_hashtable_reset().
>>>
>>> It's unnecessary because we clear the conn_list explicitly later, and
>>> it's buggy when other agents try to call connection_get() with the
>>> same connection_track_table.
>>>
>>> Signed-off-by: Like Xu <like.xu@linux.intel.com>
>>> Signed-off-by: Zhang Chen <chen.zhang@intel.com>
>>> ---
>>> net/colo-compare.c | 2 +-
>>> net/filter-rewriter.c | 2 +-
>>> 2 files changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/net/colo-compare.c b/net/colo-compare.c index
>>> 62554b5b3c..ab054cfd21 100644
>>> --- a/net/colo-compare.c
>>> +++ b/net/colo-compare.c
>>> @@ -1324,7 +1324,7 @@ static void
>> colo_compare_complete(UserCreatable *uc, Error **errp)
>>> s->connection_track_table =
>> g_hash_table_new_full(connection_key_hash,
>>> connection_key_equal,
>>> g_free,
>>> - connection_destroy);
>>> + NULL);
>>
>> 202 /* if not found, create a new connection and add to hash table */
>> 203 Connection *connection_get(GHashTable *connection_track_table,
>> 204 ConnectionKey *key,
>> 205 GQueue *conn_list)
>> 206 {
>> 207 Connection *conn = g_hash_table_lookup(connection_track_table,
>> key);
>> 208
>> 209 if (conn == NULL) {
>> 210 ConnectionKey *new_key = g_memdup(key, sizeof(*key));
>> 211
>> 212 conn = connection_new(key);
>> 213
>> 214 if (g_hash_table_size(connection_track_table) >
>> HASHTABLE_MAX_SIZE) {
>> 215 trace_colo_proxy_main("colo proxy connection hashtable full,"
>> 216 " clear it");
>> 217 connection_hashtable_reset(connection_track_table);
>>
>> 197 void connection_hashtable_reset(GHashTable *connection_track_table)
>> 198 {
>> 199 g_hash_table_remove_all(connection_track_table);
>> 200 }
>>
>> IIUC, above subroutine will do some cleanup explicitly. And before your
>> patch, connection_hashtable_reset() will release all keys and their values in
>> this hashtable. But now, you remove all keys and just one value(conn_list)
>> instead. Does it means other values will be leaked ?
> Thanks Zhijian. Re-think about it. Yes, I think you are right.
> It looks need a lock to avoid into connection_get() when triggered the clear hashtable operation.
> What do you think?
>
> Thanks
> Chen
>
>
>>
>> 218 /*
>> 219 * clear the conn_list
>> 220 */
>> 221 while (!g_queue_is_empty(conn_list)) {
>> 222 connection_destroy(g_queue_pop_head(conn_list));
>> 223 }
>> 224 }
>> 225
>> 226 g_hash_table_insert(connection_track_table, new_key, conn);
>> 227 }
>> 228
>> 229 return conn;
>> 230 }
>>
>>
>> Thanks
>> Zhijian
>>
>>> colo_compare_iothread(s);
>>>
>>> diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c index
>>> bf05023dc3..c18c4c2019 100644
>>> --- a/net/filter-rewriter.c
>>> +++ b/net/filter-rewriter.c
>>> @@ -383,7 +383,7 @@ static void colo_rewriter_setup(NetFilterState *nf,
>> Error **errp)
>>> s->connection_track_table =
>> g_hash_table_new_full(connection_key_hash,
>>> connection_key_equal,
>>> g_free,
>>> - connection_destroy);
>>> + NULL);
>>> s->incoming_queue =
>> qemu_new_net_queue(qemu_netfilter_pass_to_next, nf);
>>> }
>>>
^ permalink raw reply [flat|nested] 13+ messages in thread
* RE: [PATCH 2/4] net/colo: Fix a "double free" crash to clear the conn_list
2022-03-31 1:14 ` lizhijian
@ 2022-03-31 2:25 ` Zhang, Chen
2022-04-01 1:46 ` lizhijian
0 siblings, 1 reply; 13+ messages in thread
From: Zhang, Chen @ 2022-03-31 2:25 UTC (permalink / raw)
To: lizhijian, Jason Wang; +Cc: qemu-dev, Like Xu
> -----Original Message-----
> From: lizhijian@fujitsu.com <lizhijian@fujitsu.com>
> Sent: Thursday, March 31, 2022 9:15 AM
> To: Zhang, Chen <chen.zhang@intel.com>; Jason Wang
> <jasowang@redhat.com>
> Cc: qemu-dev <qemu-devel@nongnu.org>; Like Xu <like.xu@linux.intel.com>
> Subject: Re: [PATCH 2/4] net/colo: Fix a "double free" crash to clear the
> conn_list
>
>
> connection_track_table
> -----+----------
> key1 | conn |-----------------------------------------------------------+
> -----+---------- |
> key2 | conn |----------------------------------+ |
> -----+---------- | |
> key3 | conn |---------+ | |
> -----+---------- | | |
> | | |
> | | |
> + CompareState ++ | | |
> | | V V V
> +---------------+ +---------------+ +---------------+
> |conn_list +--->conn +--------->conn | connx
> +---------------+ +---------------+ +---------------+
> | | | | | |
> +---------------+ +---v----+ +---v----+ +---v----+ +---v----+
> |primary | |secondary |primary | |secondary
> |packet | |packet + |packet | |packet +
> +--------+ +--------+ +--------+ +--------+
> | | | |
> +---v----+ +---v----+ +---v----+ +---v----+
> |primary | |secondary |primary | |secondary
> |packet | |packet + |packet | |packet +
> +--------+ +--------+ +--------+ +--------+
> | | | |
> +---v----+ +---v----+ +---v----+ +---v----+
> |primary | |secondary |primary | |secondary
> |packet | |packet + |packet | |packet +
> +--------+ +--------+ +--------+ +--------+
>
> I recalled that we should above relationships between
> connection_track_table conn_list and conn.
> That means both connection_track_table and conn_list reference to the
> same conn instance.
>
> So before this patch, connection_get() is possible to use-after-free/double
> free conn. where 1st was in
> connection_hashtable_reset() and 2nd was
> 221 while (!g_queue_is_empty(conn_list)) {
> 222 connection_destroy(g_queue_pop_head(conn_list));
> 223 }
>
> I also doubt that your current abort was just due to above use-after-
> free/double free.
> If so, looks it's enough we just update to g_queue_clear(conn_list) in the 2nd
> place.
Make sense, but It also means the original patch works here, skip free conn in connection_hashtable_reset() and do it in:
221 while (!g_queue_is_empty(conn_list)) {
222 connection_destroy(g_queue_pop_head(conn_list));
223 }.
It also avoid use-after-free/double free conn.
Maybe we can keep the original version to fix it?
Thanks
Chen
>
> Thanks
> Zhijian
>
>
> On 28/03/2022 17:13, Zhang, Chen wrote:
> >
> >> -----Original Message-----
> >> From: lizhijian@fujitsu.com <lizhijian@fujitsu.com>
> >> Sent: Monday, March 21, 2022 11:06 AM
> >> To: Zhang, Chen <chen.zhang@intel.com>; Jason Wang
> >> <jasowang@redhat.com>; lizhijian@fujitsu.com
> >> Cc: qemu-dev <qemu-devel@nongnu.org>; Like Xu
> >> <like.xu@linux.intel.com>
> >> Subject: Re: [PATCH 2/4] net/colo: Fix a "double free" crash to clear
> >> the conn_list
> >>
> >>
> >>
> >> On 09/03/2022 16:38, Zhang Chen wrote:
> >>> We notice the QEMU may crash when the guest has too many incoming
> >>> network connections with the following log:
> >>>
> >>> 15197@1593578622.668573:colo_proxy_main : colo proxy connection
> >>> hashtable full, clear it
> >>> free(): invalid pointer
> >>> [1] 15195 abort (core dumped) qemu-system-x86_64 ....
> >>>
> >>> This is because we create the s->connection_track_table with
> >>> g_hash_table_new_full() which is defined as:
> >>>
> >>> GHashTable * g_hash_table_new_full (GHashFunc hash_func,
> >>> GEqualFunc key_equal_func,
> >>> GDestroyNotify key_destroy_func,
> >>> GDestroyNotify value_destroy_func);
> >>>
> >>> The fourth parameter connection_destroy() will be called to free the
> >>> memory allocated for all 'Connection' values in the hashtable when
> >>> we call g_hash_table_remove_all() in the connection_hashtable_reset().
> >>>
> >>> It's unnecessary because we clear the conn_list explicitly later,
> >>> and it's buggy when other agents try to call connection_get() with
> >>> the same connection_track_table.
> >>>
> >>> Signed-off-by: Like Xu <like.xu@linux.intel.com>
> >>> Signed-off-by: Zhang Chen <chen.zhang@intel.com>
> >>> ---
> >>> net/colo-compare.c | 2 +-
> >>> net/filter-rewriter.c | 2 +-
> >>> 2 files changed, 2 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/net/colo-compare.c b/net/colo-compare.c index
> >>> 62554b5b3c..ab054cfd21 100644
> >>> --- a/net/colo-compare.c
> >>> +++ b/net/colo-compare.c
> >>> @@ -1324,7 +1324,7 @@ static void
> >> colo_compare_complete(UserCreatable *uc, Error **errp)
> >>> s->connection_track_table =
> >> g_hash_table_new_full(connection_key_hash,
> >>> connection_key_equal,
> >>> g_free,
> >>> - connection_destroy);
> >>> + NULL);
> >>
> >> 202 /* if not found, create a new connection and add to hash table */
> >> 203 Connection *connection_get(GHashTable *connection_track_table,
> >> 204 ConnectionKey *key,
> >> 205 GQueue *conn_list)
> >> 206 {
> >> 207 Connection *conn =
> >> g_hash_table_lookup(connection_track_table,
> >> key);
> >> 208
> >> 209 if (conn == NULL) {
> >> 210 ConnectionKey *new_key = g_memdup(key, sizeof(*key));
> >> 211
> >> 212 conn = connection_new(key);
> >> 213
> >> 214 if (g_hash_table_size(connection_track_table) >
> >> HASHTABLE_MAX_SIZE) {
> >> 215 trace_colo_proxy_main("colo proxy connection hashtable full,"
> >> 216 " clear it");
> >> 217 connection_hashtable_reset(connection_track_table);
> >>
> >> 197 void connection_hashtable_reset(GHashTable
> >> *connection_track_table)
> >> 198 {
> >> 199 g_hash_table_remove_all(connection_track_table);
> >> 200 }
> >>
> >> IIUC, above subroutine will do some cleanup explicitly. And before
> >> your patch, connection_hashtable_reset() will release all keys and
> >> their values in this hashtable. But now, you remove all keys and just
> >> one value(conn_list) instead. Does it means other values will be leaked ?
> > Thanks Zhijian. Re-think about it. Yes, I think you are right.
> > It looks need a lock to avoid into connection_get() when triggered the clear
> hashtable operation.
> > What do you think?
> >
> > Thanks
> > Chen
> >
> >
> >>
> >> 218 /*
> >> 219 * clear the conn_list
> >> 220 */
> >> 221 while (!g_queue_is_empty(conn_list)) {
> >> 222 connection_destroy(g_queue_pop_head(conn_list));
> >> 223 }
> >> 224 }
> >> 225
> >> 226 g_hash_table_insert(connection_track_table, new_key,
> >> conn);
> >> 227 }
> >> 228
> >> 229 return conn;
> >> 230 }
> >>
> >>
> >> Thanks
> >> Zhijian
> >>
> >>> colo_compare_iothread(s);
> >>>
> >>> diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c index
> >>> bf05023dc3..c18c4c2019 100644
> >>> --- a/net/filter-rewriter.c
> >>> +++ b/net/filter-rewriter.c
> >>> @@ -383,7 +383,7 @@ static void colo_rewriter_setup(NetFilterState
> >>> *nf,
> >> Error **errp)
> >>> s->connection_track_table =
> >> g_hash_table_new_full(connection_key_hash,
> >>> connection_key_equal,
> >>> g_free,
> >>> - connection_destroy);
> >>> + NULL);
> >>> s->incoming_queue =
> >> qemu_new_net_queue(qemu_netfilter_pass_to_next, nf);
> >>> }
> >>>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/4] net/colo: Fix a "double free" crash to clear the conn_list
2022-03-31 2:25 ` Zhang, Chen
@ 2022-04-01 1:46 ` lizhijian
2022-04-01 3:33 ` Zhang, Chen
0 siblings, 1 reply; 13+ messages in thread
From: lizhijian @ 2022-04-01 1:46 UTC (permalink / raw)
To: Zhang, Chen, Jason Wang; +Cc: qemu-dev, Like Xu
On 31/03/2022 10:25, Zhang, Chen wrote:
>
>> -----Original Message-----
>> From: lizhijian@fujitsu.com <lizhijian@fujitsu.com>
>> Sent: Thursday, March 31, 2022 9:15 AM
>> To: Zhang, Chen <chen.zhang@intel.com>; Jason Wang
>> <jasowang@redhat.com>
>> Cc: qemu-dev <qemu-devel@nongnu.org>; Like Xu <like.xu@linux.intel.com>
>> Subject: Re: [PATCH 2/4] net/colo: Fix a "double free" crash to clear the
>> conn_list
>>
>>
>> connection_track_table
>> -----+----------
>> key1 | conn |-----------------------------------------------------------+
>> -----+---------- |
>> key2 | conn |----------------------------------+ |
>> -----+---------- | |
>> key3 | conn |---------+ | |
>> -----+---------- | | |
>> | | |
>> | | |
>> + CompareState ++ | | |
>> | | V V V
>> +---------------+ +---------------+ +---------------+
>> |conn_list +--->conn +--------->conn | connx
>> +---------------+ +---------------+ +---------------+
>> | | | | | |
>> +---------------+ +---v----+ +---v----+ +---v----+ +---v----+
>> |primary | |secondary |primary | |secondary
>> |packet | |packet + |packet | |packet +
>> +--------+ +--------+ +--------+ +--------+
>> | | | |
>> +---v----+ +---v----+ +---v----+ +---v----+
>> |primary | |secondary |primary | |secondary
>> |packet | |packet + |packet | |packet +
>> +--------+ +--------+ +--------+ +--------+
>> | | | |
>> +---v----+ +---v----+ +---v----+ +---v----+
>> |primary | |secondary |primary | |secondary
>> |packet | |packet + |packet | |packet +
>> +--------+ +--------+ +--------+ +--------+
>>
>> I recalled that we should above relationships between
>> connection_track_table conn_list and conn.
>> That means both connection_track_table and conn_list reference to the
>> same conn instance.
>>
>> So before this patch, connection_get() is possible to use-after-free/double
>> free conn. where 1st was in
>> connection_hashtable_reset() and 2nd was
>> 221 while (!g_queue_is_empty(conn_list)) {
>> 222 connection_destroy(g_queue_pop_head(conn_list));
>> 223 }
>>
>> I also doubt that your current abort was just due to above use-after-
>> free/double free.
>> If so, looks it's enough we just update to g_queue_clear(conn_list) in the 2nd
>> place.
> Make sense, but It also means the original patch works here, skip free conn in connection_hashtable_reset() and do it in:
> 221 while (!g_queue_is_empty(conn_list)) {
> 222 connection_destroy(g_queue_pop_head(conn_list));
> 223 }.
> It also avoid use-after-free/double free conn.
Although you will not use-after-free here, you have to consider other situations carefully that
g_hash_table_remove_all() g_hash_table_destroy() were called where the conn_list should also be freed
with you approach.
> Maybe we can keep the original version to fix it?
And your commit log should be more clear.
Thanks
Zhijian
>
> Thanks
> Chen
>
>> Thanks
>> Zhijian
>>
>>
>> On 28/03/2022 17:13, Zhang, Chen wrote:
>>>> -----Original Message-----
>>>> From: lizhijian@fujitsu.com <lizhijian@fujitsu.com>
>>>> Sent: Monday, March 21, 2022 11:06 AM
>>>> To: Zhang, Chen <chen.zhang@intel.com>; Jason Wang
>>>> <jasowang@redhat.com>; lizhijian@fujitsu.com
>>>> Cc: qemu-dev <qemu-devel@nongnu.org>; Like Xu
>>>> <like.xu@linux.intel.com>
>>>> Subject: Re: [PATCH 2/4] net/colo: Fix a "double free" crash to clear
>>>> the conn_list
>>>>
>>>>
>>>>
>>>> On 09/03/2022 16:38, Zhang Chen wrote:
>>>>> We notice the QEMU may crash when the guest has too many incoming
>>>>> network connections with the following log:
>>>>>
>>>>> 15197@1593578622.668573:colo_proxy_main : colo proxy connection
>>>>> hashtable full, clear it
>>>>> free(): invalid pointer
>>>>> [1] 15195 abort (core dumped) qemu-system-x86_64 ....
>>>>>
>>>>> This is because we create the s->connection_track_table with
>>>>> g_hash_table_new_full() which is defined as:
>>>>>
>>>>> GHashTable * g_hash_table_new_full (GHashFunc hash_func,
>>>>> GEqualFunc key_equal_func,
>>>>> GDestroyNotify key_destroy_func,
>>>>> GDestroyNotify value_destroy_func);
>>>>>
>>>>> The fourth parameter connection_destroy() will be called to free the
>>>>> memory allocated for all 'Connection' values in the hashtable when
>>>>> we call g_hash_table_remove_all() in the connection_hashtable_reset().
>>>>>
>>>>> It's unnecessary because we clear the conn_list explicitly later,
>>>>> and it's buggy when other agents try to call connection_get() with
>>>>> the same connection_track_table.
>>>>>
>>>>> Signed-off-by: Like Xu <like.xu@linux.intel.com>
>>>>> Signed-off-by: Zhang Chen <chen.zhang@intel.com>
>>>>> ---
>>>>> net/colo-compare.c | 2 +-
>>>>> net/filter-rewriter.c | 2 +-
>>>>> 2 files changed, 2 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/net/colo-compare.c b/net/colo-compare.c index
>>>>> 62554b5b3c..ab054cfd21 100644
>>>>> --- a/net/colo-compare.c
>>>>> +++ b/net/colo-compare.c
>>>>> @@ -1324,7 +1324,7 @@ static void
>>>> colo_compare_complete(UserCreatable *uc, Error **errp)
>>>>> s->connection_track_table =
>>>> g_hash_table_new_full(connection_key_hash,
>>>>> connection_key_equal,
>>>>> g_free,
>>>>> - connection_destroy);
>>>>> + NULL);
>>>> 202 /* if not found, create a new connection and add to hash table */
>>>> 203 Connection *connection_get(GHashTable *connection_track_table,
>>>> 204 ConnectionKey *key,
>>>> 205 GQueue *conn_list)
>>>> 206 {
>>>> 207 Connection *conn =
>>>> g_hash_table_lookup(connection_track_table,
>>>> key);
>>>> 208
>>>> 209 if (conn == NULL) {
>>>> 210 ConnectionKey *new_key = g_memdup(key, sizeof(*key));
>>>> 211
>>>> 212 conn = connection_new(key);
>>>> 213
>>>> 214 if (g_hash_table_size(connection_track_table) >
>>>> HASHTABLE_MAX_SIZE) {
>>>> 215 trace_colo_proxy_main("colo proxy connection hashtable full,"
>>>> 216 " clear it");
>>>> 217 connection_hashtable_reset(connection_track_table);
>>>>
>>>> 197 void connection_hashtable_reset(GHashTable
>>>> *connection_track_table)
>>>> 198 {
>>>> 199 g_hash_table_remove_all(connection_track_table);
>>>> 200 }
>>>>
>>>> IIUC, above subroutine will do some cleanup explicitly. And before
>>>> your patch, connection_hashtable_reset() will release all keys and
>>>> their values in this hashtable. But now, you remove all keys and just
>>>> one value(conn_list) instead. Does it means other values will be leaked ?
>>> Thanks Zhijian. Re-think about it. Yes, I think you are right.
>>> It looks need a lock to avoid into connection_get() when triggered the clear
>> hashtable operation.
>>> What do you think?
>>>
>>> Thanks
>>> Chen
>>>
>>>
>>>> 218 /*
>>>> 219 * clear the conn_list
>>>> 220 */
>>>> 221 while (!g_queue_is_empty(conn_list)) {
>>>> 222 connection_destroy(g_queue_pop_head(conn_list));
>>>> 223 }
>>>> 224 }
>>>> 225
>>>> 226 g_hash_table_insert(connection_track_table, new_key,
>>>> conn);
>>>> 227 }
>>>> 228
>>>> 229 return conn;
>>>> 230 }
>>>>
>>>>
>>>> Thanks
>>>> Zhijian
>>>>
>>>>> colo_compare_iothread(s);
>>>>>
>>>>> diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c index
>>>>> bf05023dc3..c18c4c2019 100644
>>>>> --- a/net/filter-rewriter.c
>>>>> +++ b/net/filter-rewriter.c
>>>>> @@ -383,7 +383,7 @@ static void colo_rewriter_setup(NetFilterState
>>>>> *nf,
>>>> Error **errp)
>>>>> s->connection_track_table =
>>>> g_hash_table_new_full(connection_key_hash,
>>>>> connection_key_equal,
>>>>> g_free,
>>>>> - connection_destroy);
>>>>> + NULL);
>>>>> s->incoming_queue =
>>>> qemu_new_net_queue(qemu_netfilter_pass_to_next, nf);
>>>>> }
>>>>>
^ permalink raw reply [flat|nested] 13+ messages in thread
* RE: [PATCH 2/4] net/colo: Fix a "double free" crash to clear the conn_list
2022-04-01 1:46 ` lizhijian
@ 2022-04-01 3:33 ` Zhang, Chen
0 siblings, 0 replies; 13+ messages in thread
From: Zhang, Chen @ 2022-04-01 3:33 UTC (permalink / raw)
To: lizhijian, Jason Wang; +Cc: qemu-dev, Like Xu
> -----Original Message-----
> From: lizhijian@fujitsu.com <lizhijian@fujitsu.com>
> Sent: Friday, April 1, 2022 9:47 AM
> To: Zhang, Chen <chen.zhang@intel.com>; Jason Wang
> <jasowang@redhat.com>
> Cc: qemu-dev <qemu-devel@nongnu.org>; Like Xu <like.xu@linux.intel.com>
> Subject: Re: [PATCH 2/4] net/colo: Fix a "double free" crash to clear the
> conn_list
>
>
>
> On 31/03/2022 10:25, Zhang, Chen wrote:
> >
> >> -----Original Message-----
> >> From: lizhijian@fujitsu.com <lizhijian@fujitsu.com>
> >> Sent: Thursday, March 31, 2022 9:15 AM
> >> To: Zhang, Chen <chen.zhang@intel.com>; Jason Wang
> >> <jasowang@redhat.com>
> >> Cc: qemu-dev <qemu-devel@nongnu.org>; Like Xu
> >> <like.xu@linux.intel.com>
> >> Subject: Re: [PATCH 2/4] net/colo: Fix a "double free" crash to clear
> >> the conn_list
> >>
> >>
> >> connection_track_table
> >> -----+----------
> >> key1 | conn |-----------------------------------------------------------+
> >> -----+---------- |
> >> key2 | conn |----------------------------------+ |
> >> -----+---------- | |
> >> key3 | conn |---------+ | |
> >> -----+---------- | | |
> >> | | |
> >> | | |
> >> + CompareState ++ | | |
> >> | | V V V
> >> +---------------+ +---------------+ +---------------+
> >> |conn_list +--->conn +--------->conn | connx
> >> +---------------+ +---------------+ +---------------+
> >> | | | | | |
> >> +---------------+ +---v----+ +---v----+ +---v----+ +---v----+
> >> |primary | |secondary |primary | |secondary
> >> |packet | |packet + |packet | |packet +
> >> +--------+ +--------+ +--------+ +--------+
> >> | | | |
> >> +---v----+ +---v----+ +---v----+ +---v----+
> >> |primary | |secondary |primary | |secondary
> >> |packet | |packet + |packet | |packet +
> >> +--------+ +--------+ +--------+ +--------+
> >> | | | |
> >> +---v----+ +---v----+ +---v----+ +---v----+
> >> |primary | |secondary |primary | |secondary
> >> |packet | |packet + |packet | |packet +
> >> +--------+ +--------+ +--------+ +--------+
> >>
> >> I recalled that we should above relationships between
> >> connection_track_table conn_list and conn.
> >> That means both connection_track_table and conn_list reference to the
> >> same conn instance.
> >>
> >> So before this patch, connection_get() is possible to
> >> use-after-free/double free conn. where 1st was in
> >> connection_hashtable_reset() and 2nd was
> >> 221 while (!g_queue_is_empty(conn_list)) {
> >> 222 connection_destroy(g_queue_pop_head(conn_list));
> >> 223 }
> >>
> >> I also doubt that your current abort was just due to above use-after-
> >> free/double free.
> >> If so, looks it's enough we just update to g_queue_clear(conn_list)
> >> in the 2nd place.
> > Make sense, but It also means the original patch works here, skip free conn
> in connection_hashtable_reset() and do it in:
> > 221 while (!g_queue_is_empty(conn_list)) {
> > 222 connection_destroy(g_queue_pop_head(conn_list));
> > 223 }.
> > It also avoid use-after-free/double free conn.
> Although you will not use-after-free here, you have to consider other
> situations carefully that
> g_hash_table_remove_all() g_hash_table_destroy() were called where the
> conn_list should also be freed with you approach.
>
>
I re-checked the code, it looks fine to me.
>
>
> > Maybe we can keep the original version to fix it?
> And your commit log should be more clear.
OK, I will update V2 for commit log.
Thanks
Chen
>
> Thanks
> Zhijian
>
> >
> > Thanks
> > Chen
> >
> >> Thanks
> >> Zhijian
> >>
> >>
> >> On 28/03/2022 17:13, Zhang, Chen wrote:
> >>>> -----Original Message-----
> >>>> From: lizhijian@fujitsu.com <lizhijian@fujitsu.com>
> >>>> Sent: Monday, March 21, 2022 11:06 AM
> >>>> To: Zhang, Chen <chen.zhang@intel.com>; Jason Wang
> >>>> <jasowang@redhat.com>; lizhijian@fujitsu.com
> >>>> Cc: qemu-dev <qemu-devel@nongnu.org>; Like Xu
> >>>> <like.xu@linux.intel.com>
> >>>> Subject: Re: [PATCH 2/4] net/colo: Fix a "double free" crash to
> >>>> clear the conn_list
> >>>>
> >>>>
> >>>>
> >>>> On 09/03/2022 16:38, Zhang Chen wrote:
> >>>>> We notice the QEMU may crash when the guest has too many
> incoming
> >>>>> network connections with the following log:
> >>>>>
> >>>>> 15197@1593578622.668573:colo_proxy_main : colo proxy connection
> >>>>> hashtable full, clear it
> >>>>> free(): invalid pointer
> >>>>> [1] 15195 abort (core dumped) qemu-system-x86_64 ....
> >>>>>
> >>>>> This is because we create the s->connection_track_table with
> >>>>> g_hash_table_new_full() which is defined as:
> >>>>>
> >>>>> GHashTable * g_hash_table_new_full (GHashFunc hash_func,
> >>>>> GEqualFunc key_equal_func,
> >>>>> GDestroyNotify key_destroy_func,
> >>>>> GDestroyNotify value_destroy_func);
> >>>>>
> >>>>> The fourth parameter connection_destroy() will be called to free
> >>>>> the memory allocated for all 'Connection' values in the hashtable
> >>>>> when we call g_hash_table_remove_all() in the
> connection_hashtable_reset().
> >>>>>
> >>>>> It's unnecessary because we clear the conn_list explicitly later,
> >>>>> and it's buggy when other agents try to call connection_get() with
> >>>>> the same connection_track_table.
> >>>>>
> >>>>> Signed-off-by: Like Xu <like.xu@linux.intel.com>
> >>>>> Signed-off-by: Zhang Chen <chen.zhang@intel.com>
> >>>>> ---
> >>>>> net/colo-compare.c | 2 +-
> >>>>> net/filter-rewriter.c | 2 +-
> >>>>> 2 files changed, 2 insertions(+), 2 deletions(-)
> >>>>>
> >>>>> diff --git a/net/colo-compare.c b/net/colo-compare.c index
> >>>>> 62554b5b3c..ab054cfd21 100644
> >>>>> --- a/net/colo-compare.c
> >>>>> +++ b/net/colo-compare.c
> >>>>> @@ -1324,7 +1324,7 @@ static void
> >>>> colo_compare_complete(UserCreatable *uc, Error **errp)
> >>>>> s->connection_track_table =
> >>>> g_hash_table_new_full(connection_key_hash,
> >>>>> connection_key_equal,
> >>>>> g_free,
> >>>>> - connection_destroy);
> >>>>> + NULL);
> >>>> 202 /* if not found, create a new connection and add to hash table
> >>>> */
> >>>> 203 Connection *connection_get(GHashTable *connection_track_table,
> >>>> 204 ConnectionKey *key,
> >>>> 205 GQueue *conn_list)
> >>>> 206 {
> >>>> 207 Connection *conn =
> >>>> g_hash_table_lookup(connection_track_table,
> >>>> key);
> >>>> 208
> >>>> 209 if (conn == NULL) {
> >>>> 210 ConnectionKey *new_key = g_memdup(key, sizeof(*key));
> >>>> 211
> >>>> 212 conn = connection_new(key);
> >>>> 213
> >>>> 214 if (g_hash_table_size(connection_track_table) >
> >>>> HASHTABLE_MAX_SIZE) {
> >>>> 215 trace_colo_proxy_main("colo proxy connection hashtable
> full,"
> >>>> 216 " clear it");
> >>>> 217 connection_hashtable_reset(connection_track_table);
> >>>>
> >>>> 197 void connection_hashtable_reset(GHashTable
> >>>> *connection_track_table)
> >>>> 198 {
> >>>> 199 g_hash_table_remove_all(connection_track_table);
> >>>> 200 }
> >>>>
> >>>> IIUC, above subroutine will do some cleanup explicitly. And before
> >>>> your patch, connection_hashtable_reset() will release all keys and
> >>>> their values in this hashtable. But now, you remove all keys and
> >>>> just one value(conn_list) instead. Does it means other values will be
> leaked ?
> >>> Thanks Zhijian. Re-think about it. Yes, I think you are right.
> >>> It looks need a lock to avoid into connection_get() when triggered
> >>> the clear
> >> hashtable operation.
> >>> What do you think?
> >>>
> >>> Thanks
> >>> Chen
> >>>
> >>>
> >>>> 218 /*
> >>>> 219 * clear the conn_list
> >>>> 220 */
> >>>> 221 while (!g_queue_is_empty(conn_list)) {
> >>>> 222 connection_destroy(g_queue_pop_head(conn_list));
> >>>> 223 }
> >>>> 224 }
> >>>> 225
> >>>> 226 g_hash_table_insert(connection_track_table, new_key,
> >>>> conn);
> >>>> 227 }
> >>>> 228
> >>>> 229 return conn;
> >>>> 230 }
> >>>>
> >>>>
> >>>> Thanks
> >>>> Zhijian
> >>>>
> >>>>> colo_compare_iothread(s);
> >>>>>
> >>>>> diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c index
> >>>>> bf05023dc3..c18c4c2019 100644
> >>>>> --- a/net/filter-rewriter.c
> >>>>> +++ b/net/filter-rewriter.c
> >>>>> @@ -383,7 +383,7 @@ static void colo_rewriter_setup(NetFilterState
> >>>>> *nf,
> >>>> Error **errp)
> >>>>> s->connection_track_table =
> >>>> g_hash_table_new_full(connection_key_hash,
> >>>>> connection_key_equal,
> >>>>> g_free,
> >>>>> - connection_destroy);
> >>>>> + NULL);
> >>>>> s->incoming_queue =
> >>>> qemu_new_net_queue(qemu_netfilter_pass_to_next, nf);
> >>>>> }
> >>>>>
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2022-04-01 3:34 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-09 8:38 [PATCH 0/4] COLO net and runstate bugfix/optimization Zhang Chen
2022-03-09 8:38 ` [PATCH 1/4] softmmu/runstate.c: add RunStateTransition support form COLO to PRELAUNCH Zhang Chen
2022-03-09 8:38 ` [PATCH 2/4] net/colo: Fix a "double free" crash to clear the conn_list Zhang Chen
2022-03-21 3:05 ` lizhijian
2022-03-28 9:13 ` Zhang, Chen
2022-03-31 1:14 ` lizhijian
2022-03-31 2:25 ` Zhang, Chen
2022-04-01 1:46 ` lizhijian
2022-04-01 3:33 ` Zhang, Chen
2022-03-09 8:38 ` [PATCH 3/4] net/colo.c: No need to track conn_list for filter-rewriter Zhang Chen
2022-03-21 3:16 ` lizhijian
2022-03-09 8:38 ` [PATCH 4/4] net/colo.c: fix segmentation fault when packet is not parsed correctly Zhang Chen
2022-03-21 3:39 ` lizhijian
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.