All of lore.kernel.org
 help / color / mirror / Atom feed
* Memory leak when adding/removing vhost_user ports
@ 2016-04-18 17:18 Christian Ehrhardt
  2016-04-18 17:46 ` Yuanhan Liu
  2016-04-21 11:01 ` Ilya Maximets
  0 siblings, 2 replies; 22+ messages in thread
From: Christian Ehrhardt @ 2016-04-18 17:18 UTC (permalink / raw)
  To: dev; +Cc: Daniele Di Proietto

I assume there is a leak somewhere on adding/removing vhost_user ports.
Although it could also be "only" a fragmentation issue.

Reproduction is easy:
I set up a pair of nicely working OVS-DPDK connected KVM Guests.
Then in a loop I
   - add up to more 512 ports
   - test connectivity between the two guests
   - remove up to 512 ports

Depending on memory and the amount of multiqueue/rxq I use it seems to
slightly change when exactly it breaks. But for my default setup of 4
queues and 5G Hugepages initialized by DPDK it always breaks at the sixth
iteration.
Here a link to the stack trace indicating a memory shortage (TBC):
https://launchpadlibrarian.net/253916410/apport-retrace.log

Known Todos:
- I want to track it down more, and will try to come up with a non
openvswitch based looping testcase that might show it as well to simplify
debugging.
- in use were Openvswitch-dpdk 2.5 and DPDK 2.2; Retest with DPDK 16.04 and
Openvswitch master is planned.

I will go on debugging this and let you know, but I wanted to give a heads
up to everyone.
In case this is a known issue for some of you please let me know.

Kind Regards,
Christian Ehrhardt
Software Engineer, Ubuntu Server
Canonical Ltd

P.S. I think it is a dpdk issue, but adding Daniele on CC to represent
ovs-dpdk as well.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Memory leak when adding/removing vhost_user ports
  2016-04-18 17:18 Memory leak when adding/removing vhost_user ports Christian Ehrhardt
@ 2016-04-18 17:46 ` Yuanhan Liu
  2016-04-18 18:14   ` Yuanhan Liu
  2016-04-21 11:01 ` Ilya Maximets
  1 sibling, 1 reply; 22+ messages in thread
From: Yuanhan Liu @ 2016-04-18 17:46 UTC (permalink / raw)
  To: Christian Ehrhardt; +Cc: dev, Daniele Di Proietto

On Mon, Apr 18, 2016 at 07:18:05PM +0200, Christian Ehrhardt wrote:
> I assume there is a leak somewhere on adding/removing vhost_user ports.
> Although it could also be "only" a fragmentation issue.
> 
> Reproduction is easy:
> I set up a pair of nicely working OVS-DPDK connected KVM Guests.
> Then in a loop I
>    - add up to more 512 ports
>    - test connectivity between the two guests
>    - remove up to 512 ports
> 
> Depending on memory and the amount of multiqueue/rxq I use it seems to
> slightly change when exactly it breaks. But for my default setup of 4
> queues and 5G Hugepages initialized by DPDK it always breaks at the sixth
> iteration.
> Here a link to the stack trace indicating a memory shortage (TBC):
> https://launchpadlibrarian.net/253916410/apport-retrace.log
> 
> Known Todos:
> - I want to track it down more, and will try to come up with a non
> openvswitch based looping testcase that might show it as well to simplify
> debugging.
> - in use were Openvswitch-dpdk 2.5 and DPDK 2.2; Retest with DPDK 16.04 and
> Openvswitch master is planned.
> 
> I will go on debugging this and let you know, but I wanted to give a heads
> up to everyone.

Thanks for the report.

> In case this is a known issue for some of you please let me know.

Yeah, it might be. I'm wondering that virtio_net struct is not freed.
It will be freed only (if I'm not mistaken) when guest quits, by far.

BTW, could you dump the ovs-dpdk log?

	--yliu

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Memory leak when adding/removing vhost_user ports
  2016-04-18 17:46 ` Yuanhan Liu
@ 2016-04-18 18:14   ` Yuanhan Liu
  2016-04-19 16:33     ` Christian Ehrhardt
  0 siblings, 1 reply; 22+ messages in thread
From: Yuanhan Liu @ 2016-04-18 18:14 UTC (permalink / raw)
  To: Christian Ehrhardt; +Cc: dev, Daniele Di Proietto

On Mon, Apr 18, 2016 at 10:46:50AM -0700, Yuanhan Liu wrote:
> On Mon, Apr 18, 2016 at 07:18:05PM +0200, Christian Ehrhardt wrote:
> > I assume there is a leak somewhere on adding/removing vhost_user ports.
> > Although it could also be "only" a fragmentation issue.
> > 
> > Reproduction is easy:
> > I set up a pair of nicely working OVS-DPDK connected KVM Guests.
> > Then in a loop I
> >    - add up to more 512 ports
> >    - test connectivity between the two guests
> >    - remove up to 512 ports
> > 
> > Depending on memory and the amount of multiqueue/rxq I use it seems to
> > slightly change when exactly it breaks. But for my default setup of 4
> > queues and 5G Hugepages initialized by DPDK it always breaks at the sixth
> > iteration.
> > Here a link to the stack trace indicating a memory shortage (TBC):
> > https://launchpadlibrarian.net/253916410/apport-retrace.log
> > 
> > Known Todos:
> > - I want to track it down more, and will try to come up with a non
> > openvswitch based looping testcase that might show it as well to simplify
> > debugging.
> > - in use were Openvswitch-dpdk 2.5 and DPDK 2.2; Retest with DPDK 16.04 and
> > Openvswitch master is planned.
> > 
> > I will go on debugging this and let you know, but I wanted to give a heads
> > up to everyone.
> 
> Thanks for the report.
> 
> > In case this is a known issue for some of you please let me know.
> 
> Yeah, it might be. I'm wondering that virtio_net struct is not freed.
> It will be freed only (if I'm not mistaken) when guest quits, by far.

Would you try following diff and to see if it fix your issue?

	--yliu

---
 lib/librte_vhost/vhost_user/vhost-net-user.c | 6 ++++++
 lib/librte_vhost/vhost_user/vhost-net-user.h | 1 +
 2 files changed, 7 insertions(+)

diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.c b/lib/librte_vhost/vhost_user/vhost-net-user.c
index df2bd64..8f7ebd7 100644
--- a/lib/librte_vhost/vhost_user/vhost-net-user.c
+++ b/lib/librte_vhost/vhost_user/vhost-net-user.c
@@ -309,6 +309,7 @@ vserver_new_vq_conn(int fd, void *dat, __rte_unused int *remove)
 	}
 
 	vdev_ctx.fh = fh;
+	vserver->fh = fh;
 	size = strnlen(vserver->path, PATH_MAX);
 	vhost_set_ifname(vdev_ctx, vserver->path,
 		size);
@@ -501,6 +502,11 @@ rte_vhost_driver_unregister(const char *path)
 
 	for (i = 0; i < g_vhost_server.vserver_cnt; i++) {
 		if (!strcmp(g_vhost_server.server[i]->path, path)) {
+			struct vhost_device_ctx ctx;
+
+			ctx.fh = g_vhost_server.server[i]->fh;
+			vhost_destroy_device(ctx);
+
 			fdset_del(&g_vhost_server.fdset,
 				g_vhost_server.server[i]->listenfd);
 
diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.h b/lib/librte_vhost/vhost_user/vhost-net-user.h
index e3bb413..7cf21db 100644
--- a/lib/librte_vhost/vhost_user/vhost-net-user.h
+++ b/lib/librte_vhost/vhost_user/vhost-net-user.h
@@ -43,6 +43,7 @@
 struct vhost_server {
 	char *path; /**< The path the uds is bind to. */
 	int listenfd;     /**< The listener sockfd. */
+	uint32_t fh;
 };
 
 /* refer to hw/virtio/vhost-user.c */
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: Memory leak when adding/removing vhost_user ports
  2016-04-18 18:14   ` Yuanhan Liu
@ 2016-04-19 16:33     ` Christian Ehrhardt
  2016-04-20  5:04       ` Yuanhan Liu
  2016-07-06 12:24       ` [PATCH v2] " Christian Ehrhardt
  0 siblings, 2 replies; 22+ messages in thread
From: Christian Ehrhardt @ 2016-04-19 16:33 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: dev, Daniele Di Proietto

Hi,
thanks for the patch.
I backported it this way to my DPDK 2.2 based environment for now (see
below):

With that applied one (and only one) of my two guests looses connectivity
after removing the ports the first time.
No traffic seems to pass, setting the device in the guest down/up doesn't
get anything.
But It isn't totally broken - stopping/starting the guest gets it working
again.
So openvswitch/dpdk is still somewhat working - it just seems the guest
lost something, after tapping on that vhost_user interface again it works.

I will check tomorrow and let you know:
- if I'm more lucky with that patch on top of 16.04
- if it looses connectivity after the first or a certain amount of port
removes

If you find issues with my backport adaption let me know.


---

Backport and reasoning:

new fix relies on a lot of new code, vhost_destroy_device looks totally
different from the former destroy_device.
History on todays function content:
  4796ad63 - original code moved from examples to lib
  a90ca1a1 - this replaces ops->destroy_device with vhost_destroy_device
  71dc571e - simple check against null pointers
  45ca9c6f - this changed the code from linked list to arrays

  New code cleans with:
      notify_ops->destroy_device (callback into the parent)
      cleanup_device was existing before even in 2.2 code
      free_device as well existing before even in 2.2 code
  Old code cleans with:
      notify_ops->destroy_device - still there
      rm_config_ll_entry -> eventually calls cleanup_device and free_device
        (just in the more complex linked list way)

So the "only" adaption for backporting needed is to replace
vhost_destroy_device
with ops->destroy_device(ctx)

Index: dpdk/lib/librte_vhost/vhost_user/vhost-net-user.c
===================================================================
--- dpdk.orig/lib/librte_vhost/vhost_user/vhost-net-user.c
+++ dpdk/lib/librte_vhost/vhost_user/vhost-net-user.c
@@ -310,6 +310,7 @@ vserver_new_vq_conn(int fd, void *dat, _
  }

  vdev_ctx.fh = fh;
+ vserver->fh = fh;
  size = strnlen(vserver->path, PATH_MAX);
  ops->set_ifname(vdev_ctx, vserver->path,
  size);
@@ -516,6 +517,11 @@ rte_vhost_driver_unregister(const char *

  for (i = 0; i < g_vhost_server.vserver_cnt; i++) {
  if (!strcmp(g_vhost_server.server[i]->path, path)) {
+ struct vhost_device_ctx ctx;
+
+ ctx.fh = g_vhost_server.server[i]->fh;
+ ops->destroy_device(ctx);
+
  fdset_del(&g_vhost_server.fdset,
  g_vhost_server.server[i]->listenfd);

Index: dpdk/lib/librte_vhost/vhost_user/vhost-net-user.h
===================================================================
--- dpdk.orig/lib/librte_vhost/vhost_user/vhost-net-user.h
+++ dpdk/lib/librte_vhost/vhost_user/vhost-net-user.h
@@ -43,6 +43,7 @@
 struct vhost_server {
  char *path; /**< The path the uds is bind to. */
  int listenfd;     /**< The listener sockfd. */
+ uint32_t fh;
 };

 /* refer to hw/virtio/vhost-user.c */




Christian Ehrhardt
Software Engineer, Ubuntu Server
Canonical Ltd

On Mon, Apr 18, 2016 at 8:14 PM, Yuanhan Liu <yuanhan.liu@linux.intel.com>
wrote:

> On Mon, Apr 18, 2016 at 10:46:50AM -0700, Yuanhan Liu wrote:
> > On Mon, Apr 18, 2016 at 07:18:05PM +0200, Christian Ehrhardt wrote:
> > > I assume there is a leak somewhere on adding/removing vhost_user ports.
> > > Although it could also be "only" a fragmentation issue.
> > >
> > > Reproduction is easy:
> > > I set up a pair of nicely working OVS-DPDK connected KVM Guests.
> > > Then in a loop I
> > >    - add up to more 512 ports
> > >    - test connectivity between the two guests
> > >    - remove up to 512 ports
> > >
> > > Depending on memory and the amount of multiqueue/rxq I use it seems to
> > > slightly change when exactly it breaks. But for my default setup of 4
> > > queues and 5G Hugepages initialized by DPDK it always breaks at the
> sixth
> > > iteration.
> > > Here a link to the stack trace indicating a memory shortage (TBC):
> > > https://launchpadlibrarian.net/253916410/apport-retrace.log
> > >
> > > Known Todos:
> > > - I want to track it down more, and will try to come up with a non
> > > openvswitch based looping testcase that might show it as well to
> simplify
> > > debugging.
> > > - in use were Openvswitch-dpdk 2.5 and DPDK 2.2; Retest with DPDK
> 16.04 and
> > > Openvswitch master is planned.
> > >
> > > I will go on debugging this and let you know, but I wanted to give a
> heads
> > > up to everyone.
> >
> > Thanks for the report.
> >
> > > In case this is a known issue for some of you please let me know.
> >
> > Yeah, it might be. I'm wondering that virtio_net struct is not freed.
> > It will be freed only (if I'm not mistaken) when guest quits, by far.
>
> Would you try following diff and to see if it fix your issue?
>
>         --yliu
>
> ---
>  lib/librte_vhost/vhost_user/vhost-net-user.c | 6 ++++++
>  lib/librte_vhost/vhost_user/vhost-net-user.h | 1 +
>  2 files changed, 7 insertions(+)
>
> diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.c
> b/lib/librte_vhost/vhost_user/vhost-net-user.c
> index df2bd64..8f7ebd7 100644
> --- a/lib/librte_vhost/vhost_user/vhost-net-user.c
> +++ b/lib/librte_vhost/vhost_user/vhost-net-user.c
> @@ -309,6 +309,7 @@ vserver_new_vq_conn(int fd, void *dat, __rte_unused
> int *remove)
>         }
>
>         vdev_ctx.fh = fh;
> +       vserver->fh = fh;
>         size = strnlen(vserver->path, PATH_MAX);
>         vhost_set_ifname(vdev_ctx, vserver->path,
>                 size);
> @@ -501,6 +502,11 @@ rte_vhost_driver_unregister(const char *path)
>
>         for (i = 0; i < g_vhost_server.vserver_cnt; i++) {
>                 if (!strcmp(g_vhost_server.server[i]->path, path)) {
> +                       struct vhost_device_ctx ctx;
> +
> +                       ctx.fh = g_vhost_server.server[i]->fh;
> +                       vhost_destroy_device(ctx);
> +
>                         fdset_del(&g_vhost_server.fdset,
>                                 g_vhost_server.server[i]->listenfd);
>
> diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.h
> b/lib/librte_vhost/vhost_user/vhost-net-user.h
> index e3bb413..7cf21db 100644
> --- a/lib/librte_vhost/vhost_user/vhost-net-user.h
> +++ b/lib/librte_vhost/vhost_user/vhost-net-user.h
> @@ -43,6 +43,7 @@
>  struct vhost_server {
>         char *path; /**< The path the uds is bind to. */
>         int listenfd;     /**< The listener sockfd. */
> +       uint32_t fh;
>  };
>
>  /* refer to hw/virtio/vhost-user.c */
> --
> 1.9.3
>
>
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Memory leak when adding/removing vhost_user ports
  2016-04-19 16:33     ` Christian Ehrhardt
@ 2016-04-20  5:04       ` Yuanhan Liu
  2016-04-20  6:18         ` Christian Ehrhardt
  2016-07-06 12:24       ` [PATCH v2] " Christian Ehrhardt
  1 sibling, 1 reply; 22+ messages in thread
From: Yuanhan Liu @ 2016-04-20  5:04 UTC (permalink / raw)
  To: Christian Ehrhardt; +Cc: dev, Daniele Di Proietto

On Tue, Apr 19, 2016 at 06:33:50PM +0200, Christian Ehrhardt wrote:
> Hi,
> thanks for the patch.
> I backported it this way to my DPDK 2.2 based environment for now (see below):
> 
> With that applied one (and only one) of my two guests looses connectivity after
> removing the ports the first time.

Yeah, that's should be because I invoked the "->destroy_device()"
callback.

BTW, I'm curious how do you do the test? I saw you added 256 ports, but
with 2 guests only? So, 254 of them are idle, just for testing the
memory leak bug?

And then you remove all of them, without stopping the guest. How that's
gonna work? I mean, the vhost-user connection would be broken, and data
flow would not work.

	--yliu

> No traffic seems to pass, setting the device in the guest down/up doesn't get
> anything.
> But It isn't totally broken - stopping/starting the guest gets it working
> again.
> So openvswitch/dpdk is still somewhat working - it just seems the guest lost
> something, after tapping on that vhost_user interface again it works.
> 
> I will check tomorrow and let you know:
> - if I'm more lucky with that patch on top of 16.04
> - if it looses connectivity after the first or a certain amount of port removes
> 
> If you find issues with my backport adaption let me know.
> 
> 
> ---
> 
> Backport and reasoning:
> 
> new fix relies on a lot of new code, vhost_destroy_device looks totally
> different from the former destroy_device.
> History on todays function content:
>   4796ad63 - original code moved from examples to lib
>   a90ca1a1 - this replaces ops->destroy_device with vhost_destroy_device
>   71dc571e - simple check against null pointers
>   45ca9c6f - this changed the code from linked list to arrays
> 
>   New code cleans with:
>       notify_ops->destroy_device (callback into the parent)
>       cleanup_device was existing before even in 2.2 code
>       free_device as well existing before even in 2.2 code
>   Old code cleans with:
>       notify_ops->destroy_device - still there
>       rm_config_ll_entry -> eventually calls cleanup_device and free_device
>         (just in the more complex linked list way)
> 
> So the "only" adaption for backporting needed is to replace
> vhost_destroy_device
> with ops->destroy_device(ctx)
> 
> Index: dpdk/lib/librte_vhost/vhost_user/vhost-net-user.c
> ===================================================================
> --- dpdk.orig/lib/librte_vhost/vhost_user/vhost-net-user.c
> +++ dpdk/lib/librte_vhost/vhost_user/vhost-net-user.c
> @@ -310,6 +310,7 @@ vserver_new_vq_conn(int fd, void *dat, _
>   }
>  
>   vdev_ctx.fh = fh;
> + vserver->fh = fh;
>   size = strnlen(vserver->path, PATH_MAX);
>   ops->set_ifname(vdev_ctx, vserver->path,
>   size);
> @@ -516,6 +517,11 @@ rte_vhost_driver_unregister(const char *
>  
>   for (i = 0; i < g_vhost_server.vserver_cnt; i++) {
>   if (!strcmp(g_vhost_server.server[i]->path, path)) {
> + struct vhost_device_ctx ctx;
> +
> + ctx.fh = g_vhost_server.server[i]->fh;
> + ops->destroy_device(ctx);
> +
>   fdset_del(&g_vhost_server.fdset,
>   g_vhost_server.server[i]->listenfd);
>  
> Index: dpdk/lib/librte_vhost/vhost_user/vhost-net-user.h
> ===================================================================
> --- dpdk.orig/lib/librte_vhost/vhost_user/vhost-net-user.h
> +++ dpdk/lib/librte_vhost/vhost_user/vhost-net-user.h
> @@ -43,6 +43,7 @@
>  struct vhost_server {
>   char *path; /**< The path the uds is bind to. */
>   int listenfd;     /**< The listener sockfd. */
> + uint32_t fh;
>  };
>  
>  /* refer to hw/virtio/vhost-user.c */
> 
> 
> 
> 
> Christian Ehrhardt
> Software Engineer, Ubuntu Server
> Canonical Ltd
> 
> On Mon, Apr 18, 2016 at 8:14 PM, Yuanhan Liu <yuanhan.liu@linux.intel.com>
> wrote:
> 
>     On Mon, Apr 18, 2016 at 10:46:50AM -0700, Yuanhan Liu wrote:
>     > On Mon, Apr 18, 2016 at 07:18:05PM +0200, Christian Ehrhardt wrote:
>     > > I assume there is a leak somewhere on adding/removing vhost_user ports.
>     > > Although it could also be "only" a fragmentation issue.
>     > >
>     > > Reproduction is easy:
>     > > I set up a pair of nicely working OVS-DPDK connected KVM Guests.
>     > > Then in a loop I
>     > >    - add up to more 512 ports
>     > >    - test connectivity between the two guests
>     > >    - remove up to 512 ports
>     > >
>     > > Depending on memory and the amount of multiqueue/rxq I use it seems to
>     > > slightly change when exactly it breaks. But for my default setup of 4
>     > > queues and 5G Hugepages initialized by DPDK it always breaks at the
>     sixth
>     > > iteration.
>     > > Here a link to the stack trace indicating a memory shortage (TBC):
>     > > https://launchpadlibrarian.net/253916410/apport-retrace.log
>     > >
>     > > Known Todos:
>     > > - I want to track it down more, and will try to come up with a non
>     > > openvswitch based looping testcase that might show it as well to
>     simplify
>     > > debugging.
>     > > - in use were Openvswitch-dpdk 2.5 and DPDK 2.2; Retest with DPDK 16.04
>     and
>     > > Openvswitch master is planned.
>     > >
>     > > I will go on debugging this and let you know, but I wanted to give a
>     heads
>     > > up to everyone.
>     >
>     > Thanks for the report.
>     >
>     > > In case this is a known issue for some of you please let me know.
>     >
>     > Yeah, it might be. I'm wondering that virtio_net struct is not freed.
>     > It will be freed only (if I'm not mistaken) when guest quits, by far.
> 
>     Would you try following diff and to see if it fix your issue?
> 
>             --yliu
> 
>     ---
>      lib/librte_vhost/vhost_user/vhost-net-user.c | 6 ++++++
>      lib/librte_vhost/vhost_user/vhost-net-user.h | 1 +
>      2 files changed, 7 insertions(+)
> 
>     diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.c b/lib/
>     librte_vhost/vhost_user/vhost-net-user.c
>     index df2bd64..8f7ebd7 100644
>     --- a/lib/librte_vhost/vhost_user/vhost-net-user.c
>     +++ b/lib/librte_vhost/vhost_user/vhost-net-user.c
>     @@ -309,6 +309,7 @@ vserver_new_vq_conn(int fd, void *dat, __rte_unused int
>     *remove)
>             }
> 
>             vdev_ctx.fh = fh;
>     +       vserver->fh = fh;
>             size = strnlen(vserver->path, PATH_MAX);
>             vhost_set_ifname(vdev_ctx, vserver->path,
>                     size);
>     @@ -501,6 +502,11 @@ rte_vhost_driver_unregister(const char *path)
> 
>             for (i = 0; i < g_vhost_server.vserver_cnt; i++) {
>                     if (!strcmp(g_vhost_server.server[i]->path, path)) {
>     +                       struct vhost_device_ctx ctx;
>     +
>     +                       ctx.fh = g_vhost_server.server[i]->fh;
>     +                       vhost_destroy_device(ctx);
>     +
>                             fdset_del(&g_vhost_server.fdset,
>                                     g_vhost_server.server[i]->listenfd);
> 
>     diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.h b/lib/
>     librte_vhost/vhost_user/vhost-net-user.h
>     index e3bb413..7cf21db 100644
>     --- a/lib/librte_vhost/vhost_user/vhost-net-user.h
>     +++ b/lib/librte_vhost/vhost_user/vhost-net-user.h
>     @@ -43,6 +43,7 @@
>      struct vhost_server {
>             char *path; /**< The path the uds is bind to. */
>             int listenfd;     /**< The listener sockfd. */
>     +       uint32_t fh;
>      };
> 
>      /* refer to hw/virtio/vhost-user.c */
>     --
>     1.9.3
> 
> 
> 
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Memory leak when adding/removing vhost_user ports
  2016-04-20  5:04       ` Yuanhan Liu
@ 2016-04-20  6:18         ` Christian Ehrhardt
  2016-04-21  5:54           ` Yuanhan Liu
  0 siblings, 1 reply; 22+ messages in thread
From: Christian Ehrhardt @ 2016-04-20  6:18 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: dev, Daniele Di Proietto

On Wed, Apr 20, 2016 at 7:04 AM, Yuanhan Liu <yuanhan.liu@linux.intel.com>
wrote:

> On Tue, Apr 19, 2016 at 06:33:50PM +0200, Christian Ehrhardt wrote:
>
[...]

> > With that applied one (and only one) of my two guests looses
> connectivity after
> > removing the ports the first time.
>
> Yeah, that's should be because I invoked the "->destroy_device()"
> callback.
>

Shouldn't that not only destroy the particular vhost_user device I remove?
See below for some better details on the test to clarify that.

BTW, I'm curious how do you do the test? I saw you added 256 ports, but
> with 2 guests only? So, 254 of them are idle, just for testing the
> memory leak bug?
>

Maybe I should describe it better:
1. Spawn some vhost-user ports (40 in my case)
2. Spawn a pair of guests that connect via four of those ports per guest
3. Guests only intialize one of that vhost_user based NICs
4. check connectivity between guests via the vhost_user based connection
(working at this stage)
LOOP 5-7:
   5. add ports 41-512
   6. remove  ports 41-512
   7. check connectivity between guests via the vhost_user based connection

So the vhost_user ports the guests are using are never deleted.
Only some extra (not even used) ports are added&removed in the loop to
search for potential leaks over a longer lifetime of an openvswitch-dpdk
based solution.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Memory leak when adding/removing vhost_user ports
  2016-04-20  6:18         ` Christian Ehrhardt
@ 2016-04-21  5:54           ` Yuanhan Liu
  2016-04-21  9:07             ` Christian Ehrhardt
  0 siblings, 1 reply; 22+ messages in thread
From: Yuanhan Liu @ 2016-04-21  5:54 UTC (permalink / raw)
  To: Christian Ehrhardt; +Cc: dev, Daniele Di Proietto

On Wed, Apr 20, 2016 at 08:18:49AM +0200, Christian Ehrhardt wrote:
> On Wed, Apr 20, 2016 at 7:04 AM, Yuanhan Liu <yuanhan.liu@linux.intel.com>
> wrote:
> 
>     On Tue, Apr 19, 2016 at 06:33:50PM +0200, Christian Ehrhardt wrote:
> 
> [...] 
> 
>     > With that applied one (and only one) of my two guests looses connectivity
>     after
>     > removing the ports the first time.
> 
>     Yeah, that's should be because I invoked the "->destroy_device()"
>     callback.
> 
> 
> Shouldn't that not only destroy the particular vhost_user device I remove?

I assume the "not" is typed wrong here, then yes. Well, it turned
out that I accidentally destroyed the first guest (with id 0) with
following code:

	ctx.fh = g_vhost_server.server[i]->fh;
	vhost_destroy_device(ctx);

server[i]->fh is initialized with 0 when no connection is established
(check below for more info), and the first id is started with 0. Anyway,
this could be fixed easily.

> See below for some better details on the test to clarify that.
> 
> 
>     BTW, I'm curious how do you do the test? I saw you added 256 ports, but
>     with 2 guests only? So, 254 of them are idle, just for testing the
>     memory leak bug?
> 
> 
> Maybe I should describe it better:
> 1. Spawn some vhost-user ports (40 in my case)
> 2. Spawn a pair of guests that connect via four of those ports per guest
> 3. Guests only intialize one of that vhost_user based NICs
> 4. check connectivity between guests via the vhost_user based connection
> (working at this stage)
> LOOP 5-7:
>    5. add ports 41-512
>    6. remove  ports 41-512
>    7. check connectivity between guests via the vhost_user based connection

Yes, it's much clearer now. Thanks.

I then don't see it's a leak from DPDK vhost-user, at least not the leak
on "struct virtio_net" I have mentioned before. "struct virito_net" will
not even be allocated for those ports never used (ports 41-512 in your case),
as it will be allocated only when there is a connection established, aka,
a guest is connected.

BTW, will you be able to reproduce it without any connections? Say, all
512 ports are added, and then deleted.

Thanks.

	--yliu

> 
> So the vhost_user ports the guests are using are never deleted.
> Only some extra (not even used) ports are added&removed in the loop to search
> for potential leaks over a longer lifetime of an openvswitch-dpdk based
> solution.
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Memory leak when adding/removing vhost_user ports
  2016-04-21  5:54           ` Yuanhan Liu
@ 2016-04-21  9:07             ` Christian Ehrhardt
  0 siblings, 0 replies; 22+ messages in thread
From: Christian Ehrhardt @ 2016-04-21  9:07 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: dev, Daniele Di Proietto

Hi,
I can follow your argument that - and agree that in this case the leak
can't be solved your patch.
Still I found it useful to revise it along our discussion as eventually it
will still be a good patch to have.
I followed your suggestion and found:
- rte_vhost_driver_register callocs vserver (implies fh=0)
- later when on init when getting the callback to vserver_new_vq_conn it
would get set by ops->new_device(vdev_ctx);
- but as you pointed out that could be fh = 0 for the first
- so I initialized vserver->fh with -1 in rte_vhost_driver_register - that
won't ever be a real fh.
- later on get_config_ll_entry won't find a device with that then on the
call by destroy_device.
- so the revised patch currently in use (still for DPDK 2.2) can be found
here http://paste.ubuntu.com/15961394/

Also as you requested I tried with no guest attached at all - that way I
can still reproduce it.
Here is a new stacktrace, but to me it looks the same
http://paste.ubuntu.com/15961185/
Also as you asked before a log of the vswitch, but it is 895MB since a lot
of messages repeat on port add/remove.
Even compressed still 27MB - I need to do something about verbosity there.
Also the system journal of the same time.
Therefore I only added links to bz2 files.
The crash is at "2016-04-21T07:54:47.782Z" in the logs.
=>
http://people.canonical.com/~paelzer/ovs-dpdk-vhost-add-remove-leak/mem-leak-addremove.journal.bz2
=>
http://people.canonical.com/~paelzer/ovs-dpdk-vhost-add-remove-leak/ovs-vswitchd.log.bz2

Kind Regards,
Christian Ehrhardt


Christian Ehrhardt
Software Engineer, Ubuntu Server
Canonical Ltd

On Thu, Apr 21, 2016 at 7:54 AM, Yuanhan Liu <yuanhan.liu@linux.intel.com>
wrote:

> On Wed, Apr 20, 2016 at 08:18:49AM +0200, Christian Ehrhardt wrote:
> > On Wed, Apr 20, 2016 at 7:04 AM, Yuanhan Liu <
> yuanhan.liu@linux.intel.com>
> > wrote:
> >
> >     On Tue, Apr 19, 2016 at 06:33:50PM +0200, Christian Ehrhardt wrote:
> >
> > [...]
> >
> >     > With that applied one (and only one) of my two guests looses
> connectivity
> >     after
> >     > removing the ports the first time.
> >
> >     Yeah, that's should be because I invoked the "->destroy_device()"
> >     callback.
> >
> >
> > Shouldn't that not only destroy the particular vhost_user device I
> remove?
>
> I assume the "not" is typed wrong here, then yes. Well, it turned
> out that I accidentally destroyed the first guest (with id 0) with
> following code:
>
>         ctx.fh = g_vhost_server.server[i]->fh;
>         vhost_destroy_device(ctx);
>
> server[i]->fh is initialized with 0 when no connection is established
> (check below for more info), and the first id is started with 0. Anyway,
> this could be fixed easily.
>
> > See below for some better details on the test to clarify that.
> >
> >
> >     BTW, I'm curious how do you do the test? I saw you added 256 ports,
> but
> >     with 2 guests only? So, 254 of them are idle, just for testing the
> >     memory leak bug?
> >
> >
> > Maybe I should describe it better:
> > 1. Spawn some vhost-user ports (40 in my case)
> > 2. Spawn a pair of guests that connect via four of those ports per guest
> > 3. Guests only intialize one of that vhost_user based NICs
> > 4. check connectivity between guests via the vhost_user based connection
> > (working at this stage)
> > LOOP 5-7:
> >    5. add ports 41-512
> >    6. remove  ports 41-512
> >    7. check connectivity between guests via the vhost_user based
> connection
>
> Yes, it's much clearer now. Thanks.
>
> I then don't see it's a leak from DPDK vhost-user, at least not the leak
> on "struct virtio_net" I have mentioned before. "struct virito_net" will
> not even be allocated for those ports never used (ports 41-512 in your
> case),
> as it will be allocated only when there is a connection established, aka,
> a guest is connected.
>
> BTW, will you be able to reproduce it without any connections? Say, all
> 512 ports are added, and then deleted.
>
> Thanks.
>
>         --yliu
>
> >
> > So the vhost_user ports the guests are using are never deleted.
> > Only some extra (not even used) ports are added&removed in the loop to
> search
> > for potential leaks over a longer lifetime of an openvswitch-dpdk based
> > solution.
> >
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Memory leak when adding/removing vhost_user ports
  2016-04-18 17:18 Memory leak when adding/removing vhost_user ports Christian Ehrhardt
  2016-04-18 17:46 ` Yuanhan Liu
@ 2016-04-21 11:01 ` Ilya Maximets
       [not found]   ` <5718B306.5070801-Sze3O3UU22JBDgjK7y7TUQ@public.gmane.org>
  2016-04-21 16:54   ` Yuanhan Liu
  1 sibling, 2 replies; 22+ messages in thread
From: Ilya Maximets @ 2016-04-21 11:01 UTC (permalink / raw)
  To: dev, christian.ehrhardt
  Cc: Daniele Di Proietto, dev, yuanhan.liu, Dyasly Sergey

Hi, Christian.
You're, likely, using tar archive with openvswitch from openvswitch.org.
It doesn't contain many bug fixes from git/branch-2.5 unfortunately.

The problem that you are facing has been solved in branch-2.5 by

commit d9df7b9206831631ddbd90f9cbeef1b4fc5a8e89
Author: Ilya Maximets <i.maximets@samsung.com>
Date:   Thu Mar 3 11:30:06 2016 +0300

    netdev-dpdk: Fix memory leak in netdev_dpdk_vhost_destruct().
    
    Fixes: 4573fbd38fa1 ("netdev-dpdk: Add vhost-user multiqueue support")
    Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
    Acked-by: Flavio Leitner <fbl@sysclose.org>
    Acked-by: Daniele Di Proietto <diproiettod@vmware.com>

Best regards, Ilya Maximets.

> I assume there is a leak somewhere on adding/removing vhost_user ports.
> Although it could also be "only" a fragmentation issue.
> 
> Reproduction is easy:
> I set up a pair of nicely working OVS-DPDK connected KVM Guests.
> Then in a loop I
>    - add up to more 512 ports
>    - test connectivity between the two guests
>    - remove up to 512 ports
> 
> Depending on memory and the amount of multiqueue/rxq I use it seems to
> slightly change when exactly it breaks. But for my default setup of 4
> queues and 5G Hugepages initialized by DPDK it always breaks at the sixth
> iteration.
> Here a link to the stack trace indicating a memory shortage (TBC):
> https://launchpadlibrarian.net/253916410/apport-retrace.log
> 
> Known Todos:
> - I want to track it down more, and will try to come up with a non
> openvswitch based looping testcase that might show it as well to simplify
> debugging.
> - in use were Openvswitch-dpdk 2.5 and DPDK 2.2; Retest with DPDK 16.04 and
> Openvswitch master is planned.
> 
> I will go on debugging this and let you know, but I wanted to give a heads
> up to everyone.
> In case this is a known issue for some of you please let me know.
> 
> Kind Regards,
> Christian Ehrhardt
> Software Engineer, Ubuntu Server
> Canonical Ltd
> 
> P.S. I think it is a dpdk issue, but adding Daniele on CC to represent
> ovs-dpdk as well.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [dpdk-dev] Memory leak when adding/removing vhost_user ports
       [not found]   ` <5718B306.5070801-Sze3O3UU22JBDgjK7y7TUQ@public.gmane.org>
@ 2016-04-21 14:04     ` Christian Ehrhardt
  2016-04-21 16:56       ` Yuanhan Liu
  0 siblings, 1 reply; 22+ messages in thread
From: Christian Ehrhardt @ 2016-04-21 14:04 UTC (permalink / raw)
  To: Ilya Maximets; +Cc: dev, dev-yBygre7rU0TnMu66kgdUjQ, Dyasly Sergey

Thanks Ilya,
yeah we usually wait for the point releases as they undergo some extra
testing and verification.
.1 shouldn't be too much into the future I guess.
Thanks a lot for identifying.

That said, I'd still go on with Yuanhan to finalize the dpdk side leak fix
we identified, so we eventually get it committed.
So Yuanhan, what do you think of my last revised version of your patch for
upstream DPDK (there with the vhost_destroy_device then)?
I mean it is essentially your patch plus a bit of polishing, not mine so I
don't feel entitled to submit it as mine :-)

Kind Regards,
Christian


Christian Ehrhardt
Software Engineer, Ubuntu Server
Canonical Ltd

On Thu, Apr 21, 2016 at 1:01 PM, Ilya Maximets <i.maximets@samsung.com>
wrote:

> Hi, Christian.
> You're, likely, using tar archive with openvswitch from openvswitch.org.
> It doesn't contain many bug fixes from git/branch-2.5 unfortunately.
>
> The problem that you are facing has been solved in branch-2.5 by
>
> commit d9df7b9206831631ddbd90f9cbeef1b4fc5a8e89
> Author: Ilya Maximets <i.maximets@samsung.com>
> Date:   Thu Mar 3 11:30:06 2016 +0300
>
>     netdev-dpdk: Fix memory leak in netdev_dpdk_vhost_destruct().
>
>     Fixes: 4573fbd38fa1 ("netdev-dpdk: Add vhost-user multiqueue support")
>     Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
>     Acked-by: Flavio Leitner <fbl@sysclose.org>
>     Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
>
> Best regards, Ilya Maximets.
>
> > I assume there is a leak somewhere on adding/removing vhost_user ports.
> > Although it could also be "only" a fragmentation issue.
> >
> > Reproduction is easy:
> > I set up a pair of nicely working OVS-DPDK connected KVM Guests.
> > Then in a loop I
> >    - add up to more 512 ports
> >    - test connectivity between the two guests
> >    - remove up to 512 ports
> >
> > Depending on memory and the amount of multiqueue/rxq I use it seems to
> > slightly change when exactly it breaks. But for my default setup of 4
> > queues and 5G Hugepages initialized by DPDK it always breaks at the sixth
> > iteration.
> > Here a link to the stack trace indicating a memory shortage (TBC):
> > https://launchpadlibrarian.net/253916410/apport-retrace.log
> >
> > Known Todos:
> > - I want to track it down more, and will try to come up with a non
> > openvswitch based looping testcase that might show it as well to simplify
> > debugging.
> > - in use were Openvswitch-dpdk 2.5 and DPDK 2.2; Retest with DPDK 16.04
> and
> > Openvswitch master is planned.
> >
> > I will go on debugging this and let you know, but I wanted to give a
> heads
> > up to everyone.
> > In case this is a known issue for some of you please let me know.
> >
> > Kind Regards,
> > Christian Ehrhardt
> > Software Engineer, Ubuntu Server
> > Canonical Ltd
> >
> > P.S. I think it is a dpdk issue, but adding Daniele on CC to represent
> > ovs-dpdk as well.
>
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Memory leak when adding/removing vhost_user ports
  2016-04-21 11:01 ` Ilya Maximets
       [not found]   ` <5718B306.5070801-Sze3O3UU22JBDgjK7y7TUQ@public.gmane.org>
@ 2016-04-21 16:54   ` Yuanhan Liu
  1 sibling, 0 replies; 22+ messages in thread
From: Yuanhan Liu @ 2016-04-21 16:54 UTC (permalink / raw)
  To: Ilya Maximets
  Cc: dev, christian.ehrhardt, Daniele Di Proietto, dev, Dyasly Sergey

On Thu, Apr 21, 2016 at 02:01:26PM +0300, Ilya Maximets wrote:
> Hi, Christian.
> You're, likely, using tar archive with openvswitch from openvswitch.org.
> It doesn't contain many bug fixes from git/branch-2.5 unfortunately.
> 
> The problem that you are facing has been solved in branch-2.5 by
> 
> commit d9df7b9206831631ddbd90f9cbeef1b4fc5a8e89
> Author: Ilya Maximets <i.maximets@samsung.com>
> Date:   Thu Mar 3 11:30:06 2016 +0300
> 
>     netdev-dpdk: Fix memory leak in netdev_dpdk_vhost_destruct().
>     
>     Fixes: 4573fbd38fa1 ("netdev-dpdk: Add vhost-user multiqueue support")
>     Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
>     Acked-by: Flavio Leitner <fbl@sysclose.org>
>     Acked-by: Daniele Di Proietto <diproiettod@vmware.com>
Hi Ilya,

Thanks for the info. And, I actually checked this peice of code. I was
using new code, so, I didn't find anything wrong.

	--yliu

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Memory leak when adding/removing vhost_user ports
  2016-04-21 14:04     ` [dpdk-dev] " Christian Ehrhardt
@ 2016-04-21 16:56       ` Yuanhan Liu
  0 siblings, 0 replies; 22+ messages in thread
From: Yuanhan Liu @ 2016-04-21 16:56 UTC (permalink / raw)
  To: Christian Ehrhardt
  Cc: Ilya Maximets, dev, Daniele Di Proietto, dev, Dyasly Sergey

On Thu, Apr 21, 2016 at 04:04:03PM +0200, Christian Ehrhardt wrote:
> Thanks Ilya,
> yeah we usually wait for the point releases as they undergo some extra testing
> and verification.
> .1 shouldn't be too much into the future I guess.
> Thanks a lot for identifying.
> 
> That said, I'd still go on with Yuanhan to finalize the dpdk side leak fix we
> identified, so we eventually get it committed.
> So Yuanhan, what do you think of my last revised version of your patch for
> upstream DPDK (there with the vhost_destroy_device then)?

That's good.

> I mean it is essentially your patch plus a bit of polishing, not mine so I
> don't feel entitled to submit it as mine :-)

Thanks. I will make and send out a formal patch later.

	--yliu

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v2] Memory leak when adding/removing vhost_user ports
  2016-04-19 16:33     ` Christian Ehrhardt
  2016-04-20  5:04       ` Yuanhan Liu
@ 2016-07-06 12:24       ` Christian Ehrhardt
  2016-07-06 12:24         ` [PATCH v2] vhost_user: avoid crash when exeeding file descriptors Christian Ehrhardt
                           ` (2 more replies)
  1 sibling, 3 replies; 22+ messages in thread
From: Christian Ehrhardt @ 2016-07-06 12:24 UTC (permalink / raw)
  To: christian.ehrhardt, patrik.r.andersson, thomas.monjalon, dev,
	yuanhan.liu, huawei.xie

Hi,
while checking for dpdk 16.07 what backports are accepted in the meantime so I
can drop them I found this particular discussion has been silently forgotten by
all of us.

Back then we had the patch and discussion first in
http://dpdk.org/dev/patchwork/patch/12103/
and then
http://dpdk.org/dev/patchwork/patch/12118/

Things worked fine as I reported and I integrated the patch in our packaging as
it fixed a severe issue. Since it was reported by someone else I do not seem to
be the only one :-)

So today I rebased the patch including my updates I made based on our discussion
and I think it would make as much sense as it made back then to fix this.

Christian Ehrhardt (1):
  vhost_user: avoid crash when exeeding file descriptors

 lib/librte_vhost/vhost_user/fd_man.c         | 11 ++++++-----
 lib/librte_vhost/vhost_user/vhost-net-user.c | 19 +++++++++++++++++--
 2 files changed, 23 insertions(+), 7 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v2] vhost_user: avoid crash when exeeding file descriptors
  2016-07-06 12:24       ` [PATCH v2] " Christian Ehrhardt
@ 2016-07-06 12:24         ` Christian Ehrhardt
  2016-07-12  8:37           ` Yuanhan Liu
  2016-07-06 12:26         ` [PATCH v2] Memory leak when adding/removing vhost_user ports Christian Ehrhardt
  2016-07-06 13:08         ` Yuanhan Liu
  2 siblings, 1 reply; 22+ messages in thread
From: Christian Ehrhardt @ 2016-07-06 12:24 UTC (permalink / raw)
  To: christian.ehrhardt, patrik.r.andersson, thomas.monjalon, dev,
	yuanhan.liu, huawei.xie

*update in v2*
- refreshing for DPDK 16.07
- Close fd on vserver->listenfd as suggested in discussion

Original From:
From: Patrik Andersson <patrik.r.andersson@ericsson.com>

Protect against DPDK crash when allocation of listen fd >= 1023.
For events on fd:s >1023, the current implementation will trigger
an abort due to access outside of allocated bit mask.

Corrections would include:

  * Match fdset_add() signature in fd_man.c to fd_man.h
  * Handling of return codes from fdset_add()
  * Addition of check of fd number in fdset_add_fd()

The rationale behind the suggested code change is that,
fdset_event_dispatch() could attempt access outside of the FD_SET
bitmask if there is an event on a file descriptor that in turn
looks up a virtio file descriptor with a value > 1023.
Such an attempt will lead to an abort() and a restart of any
vswitch using DPDK.

A discussion topic exist in the ovs-discuss mailing list that can
provide a little more background:
http://openvswitch.org/pipermail/discuss/2016-February/020243.html

Fixes: 8f972312 ("vhost: support vhost-user")

Signed-off-by: Patrik Andersson <patrik.r.andersson@ericsson.com>
Signed-off-by: Christian Ehrhardt <christian.ehrhardt@canonical.com>
---
 lib/librte_vhost/vhost_user/fd_man.c         | 11 ++++++-----
 lib/librte_vhost/vhost_user/vhost-net-user.c | 19 +++++++++++++++++--
 2 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/lib/librte_vhost/vhost_user/fd_man.c b/lib/librte_vhost/vhost_user/fd_man.c
index 087aaed..c691339 100644
--- a/lib/librte_vhost/vhost_user/fd_man.c
+++ b/lib/librte_vhost/vhost_user/fd_man.c
@@ -71,20 +71,22 @@ fdset_find_free_slot(struct fdset *pfdset)
 	return fdset_find_fd(pfdset, -1);
 }
 
-static void
+static int
 fdset_add_fd(struct fdset  *pfdset, int idx, int fd,
 	fd_cb rcb, fd_cb wcb, void *dat)
 {
 	struct fdentry *pfdentry;
 
-	if (pfdset == NULL || idx >= MAX_FDS)
-		return;
+	if (pfdset == NULL || idx >= MAX_FDS || fd >= FD_SETSIZE)
+		return -1;
 
 	pfdentry = &pfdset->fd[idx];
 	pfdentry->fd = fd;
 	pfdentry->rcb = rcb;
 	pfdentry->wcb = wcb;
 	pfdentry->dat = dat;
+
+	return 0;
 }
 
 /**
@@ -150,12 +152,11 @@ fdset_add(struct fdset *pfdset, int fd, fd_cb rcb, fd_cb wcb, void *dat)
 
 	/* Find a free slot in the list. */
 	i = fdset_find_free_slot(pfdset);
-	if (i == -1) {
+	if (i == -1 || fdset_add_fd(pfdset, i, fd, rcb, wcb, dat) < 0) {
 		pthread_mutex_unlock(&pfdset->fd_mutex);
 		return -2;
 	}
 
-	fdset_add_fd(pfdset, i, fd, rcb, wcb, dat);
 	pfdset->num++;
 
 	pthread_mutex_unlock(&pfdset->fd_mutex);
diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.c b/lib/librte_vhost/vhost_user/vhost-net-user.c
index 94f1b92..5743f52 100644
--- a/lib/librte_vhost/vhost_user/vhost-net-user.c
+++ b/lib/librte_vhost/vhost_user/vhost-net-user.c
@@ -257,6 +257,7 @@ vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket)
 	int vid;
 	size_t size;
 	struct vhost_user_connection *conn;
+	int ret;
 
 	conn = malloc(sizeof(*conn));
 	if (conn == NULL) {
@@ -278,7 +279,15 @@ vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket)
 
 	conn->vsocket = vsocket;
 	conn->vid = vid;
-	fdset_add(&vhost_user.fdset, fd, vhost_user_msg_handler, NULL, conn);
+	ret = fdset_add(&vhost_user.fdset, fd, vhost_user_msg_handler,
+			NULL, conn);
+	if (ret < 0) {
+		free(conn);
+		close(fd);
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"failed to add fd %d into vhost server fdset\n",
+			fd);
+	}
 }
 
 /* call back when there is new vhost-user connection from client  */
@@ -469,8 +478,14 @@ vhost_user_create_server(struct vhost_user_socket *vsocket)
 		goto err;
 
 	vsocket->listenfd = fd;
-	fdset_add(&vhost_user.fdset, fd, vhost_user_server_new_connection,
+	ret = fdset_add(&vhost_user.fdset, fd, vhost_user_server_new_connection,
 		  NULL, vsocket);
+	if (ret < 0) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"failed to add listen fd %d to vhost server fdset\n",
+			fd);
+		goto err;
+	}
 
 	return 0;
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v2] Memory leak when adding/removing vhost_user ports
  2016-07-06 12:24       ` [PATCH v2] " Christian Ehrhardt
  2016-07-06 12:24         ` [PATCH v2] vhost_user: avoid crash when exeeding file descriptors Christian Ehrhardt
@ 2016-07-06 12:26         ` Christian Ehrhardt
  2016-07-06 12:30           ` Christian Ehrhardt
  2016-07-06 13:08         ` Yuanhan Liu
  2 siblings, 1 reply; 22+ messages in thread
From: Christian Ehrhardt @ 2016-07-06 12:26 UTC (permalink / raw)
  To: Christian Ehrhardt, Patrik Andersson R, Thomas Monjalon, dev,
	Yuanhan Liu, Xie, Huawei

Sorry,
please ignore the two links, the cover letter has - they belong to a
different issue I have to bring up again.
Everything else still applies.

Christian Ehrhardt
Software Engineer, Ubuntu Server
Canonical Ltd

On Wed, Jul 6, 2016 at 2:24 PM, Christian Ehrhardt <
christian.ehrhardt@canonical.com> wrote:

> Hi,
> while checking for dpdk 16.07 what backports are accepted in the meantime
> so I
> can drop them I found this particular discussion has been silently
> forgotten by
> all of us.
>
> Back then we had the patch and discussion first in
> http://dpdk.org/dev/patchwork/patch/12103/
> and then
> http://dpdk.org/dev/patchwork/patch/12118/
>
> Things worked fine as I reported and I integrated the patch in our
> packaging as
> it fixed a severe issue. Since it was reported by someone else I do not
> seem to
> be the only one :-)
>
> So today I rebased the patch including my updates I made based on our
> discussion
> and I think it would make as much sense as it made back then to fix this.
>
> Christian Ehrhardt (1):
>   vhost_user: avoid crash when exeeding file descriptors
>
>  lib/librte_vhost/vhost_user/fd_man.c         | 11 ++++++-----
>  lib/librte_vhost/vhost_user/vhost-net-user.c | 19 +++++++++++++++++--
>  2 files changed, 23 insertions(+), 7 deletions(-)
>
> --
> 2.7.4
>
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2] Memory leak when adding/removing vhost_user ports
  2016-07-06 12:26         ` [PATCH v2] Memory leak when adding/removing vhost_user ports Christian Ehrhardt
@ 2016-07-06 12:30           ` Christian Ehrhardt
  2016-07-06 12:37             ` Christian Ehrhardt
  0 siblings, 1 reply; 22+ messages in thread
From: Christian Ehrhardt @ 2016-07-06 12:30 UTC (permalink / raw)
  To: Christian Ehrhardt, Patrik Andersson R, Thomas Monjalon, dev,
	Yuanhan Liu, Xie, Huawei

This is the series this really belongs to
http://dpdk.org/dev/patchwork/patch/11581/
Message ID <1458292380-9258-1-git-send-email-patrik.r.andersson@ericsson.com
>

Should I wait for a v2 to point the patch at the right ID or do you prefer
a fixed resubmit right away?

Christian Ehrhardt
Software Engineer, Ubuntu Server
Canonical Ltd

On Wed, Jul 6, 2016 at 2:26 PM, Christian Ehrhardt <
christian.ehrhardt@canonical.com> wrote:

> Sorry,
> please ignore the two links, the cover letter has - they belong to a
> different issue I have to bring up again.
> Everything else still applies.
>
> Christian Ehrhardt
> Software Engineer, Ubuntu Server
> Canonical Ltd
>
> On Wed, Jul 6, 2016 at 2:24 PM, Christian Ehrhardt <
> christian.ehrhardt@canonical.com> wrote:
>
>> Hi,
>> while checking for dpdk 16.07 what backports are accepted in the meantime
>> so I
>> can drop them I found this particular discussion has been silently
>> forgotten by
>> all of us.
>>
>> Back then we had the patch and discussion first in
>> http://dpdk.org/dev/patchwork/patch/12103/
>> and then
>> http://dpdk.org/dev/patchwork/patch/12118/
>>
>> Things worked fine as I reported and I integrated the patch in our
>> packaging as
>> it fixed a severe issue. Since it was reported by someone else I do not
>> seem to
>> be the only one :-)
>>
>> So today I rebased the patch including my updates I made based on our
>> discussion
>> and I think it would make as much sense as it made back then to fix this.
>>
>> Christian Ehrhardt (1):
>>   vhost_user: avoid crash when exeeding file descriptors
>>
>>  lib/librte_vhost/vhost_user/fd_man.c         | 11 ++++++-----
>>  lib/librte_vhost/vhost_user/vhost-net-user.c | 19 +++++++++++++++++--
>>  2 files changed, 23 insertions(+), 7 deletions(-)
>>
>> --
>> 2.7.4
>>
>>
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2] Memory leak when adding/removing vhost_user ports
  2016-07-06 12:30           ` Christian Ehrhardt
@ 2016-07-06 12:37             ` Christian Ehrhardt
  0 siblings, 0 replies; 22+ messages in thread
From: Christian Ehrhardt @ 2016-07-06 12:37 UTC (permalink / raw)
  To: Christian Ehrhardt, Patrik Andersson R, Thomas Monjalon, dev,
	Yuanhan Liu, Xie, Huawei

Now let us really get to this old discussion.
Once again - sorry for mixing this up :-/
Found too much at once today.

## Ok - refocus ##

Back then in
http://dpdk.org/dev/patchwork/patch/12103/
http://dpdk.org/dev/patchwork/patch/12118/

We came up with a nice solution for the leak.
But it was never picked up upstream.

There were a lot of changes to all of that code, especially vhost
client/server.
Lacking an openvswitch for DPDK 16.07 I can't test if the issue still shows
up, but looking at the code suggests it still does.

Unfortunately the internal structures and scopes are slightly different so
that I couldn't come up with a working forward port of the old patch.
Attached is a patch as close as I got (obviously not working).
I'm convinced that Yuanhan Liu and Xie Huawei with their deep context
knowledge of dpdk's vhost_user will quickly know:
- if the root cause still applies
- what the best new way of fixing this would be

As this makes long term usage of dpdk aborting by a leak I hope we have a
chance to get this into 16.07 still.

Kind Regards,
Christian


Christian Ehrhardt
Software Engineer, Ubuntu Server
Canonical Ltd

On Wed, Jul 6, 2016 at 2:30 PM, Christian Ehrhardt <
christian.ehrhardt@canonical.com> wrote:

> This is the series this really belongs to
> http://dpdk.org/dev/patchwork/patch/11581/
> Message ID <
> 1458292380-9258-1-git-send-email-patrik.r.andersson@ericsson.com>
>
> Should I wait for a v2 to point the patch at the right ID or do you prefer
> a fixed resubmit right away?
>
> Christian Ehrhardt
> Software Engineer, Ubuntu Server
> Canonical Ltd
>
> On Wed, Jul 6, 2016 at 2:26 PM, Christian Ehrhardt <
> christian.ehrhardt@canonical.com> wrote:
>
>> Sorry,
>> please ignore the two links, the cover letter has - they belong to a
>> different issue I have to bring up again.
>> Everything else still applies.
>>
>> Christian Ehrhardt
>> Software Engineer, Ubuntu Server
>> Canonical Ltd
>>
>> On Wed, Jul 6, 2016 at 2:24 PM, Christian Ehrhardt <
>> christian.ehrhardt@canonical.com> wrote:
>>
>>> Hi,
>>> while checking for dpdk 16.07 what backports are accepted in the
>>> meantime so I
>>> can drop them I found this particular discussion has been silently
>>> forgotten by
>>> all of us.
>>>
>>> Back then we had the patch and discussion first in
>>> http://dpdk.org/dev/patchwork/patch/12103/
>>> and then
>>> http://dpdk.org/dev/patchwork/patch/12118/
>>>
>>> Things worked fine as I reported and I integrated the patch in our
>>> packaging as
>>> it fixed a severe issue. Since it was reported by someone else I do not
>>> seem to
>>> be the only one :-)
>>>
>>> So today I rebased the patch including my updates I made based on our
>>> discussion
>>> and I think it would make as much sense as it made back then to fix this.
>>>
>>> Christian Ehrhardt (1):
>>>   vhost_user: avoid crash when exeeding file descriptors
>>>
>>>  lib/librte_vhost/vhost_user/fd_man.c         | 11 ++++++-----
>>>  lib/librte_vhost/vhost_user/vhost-net-user.c | 19 +++++++++++++++++--
>>>  2 files changed, 23 insertions(+), 7 deletions(-)
>>>
>>> --
>>> 2.7.4
>>>
>>>
>>
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2] Memory leak when adding/removing vhost_user ports
  2016-07-06 12:24       ` [PATCH v2] " Christian Ehrhardt
  2016-07-06 12:24         ` [PATCH v2] vhost_user: avoid crash when exeeding file descriptors Christian Ehrhardt
  2016-07-06 12:26         ` [PATCH v2] Memory leak when adding/removing vhost_user ports Christian Ehrhardt
@ 2016-07-06 13:08         ` Yuanhan Liu
  2016-07-12 12:08           ` Yuanhan Liu
  2 siblings, 1 reply; 22+ messages in thread
From: Yuanhan Liu @ 2016-07-06 13:08 UTC (permalink / raw)
  To: Christian Ehrhardt; +Cc: patrik.r.andersson, thomas.monjalon, dev, huawei.xie

On Wed, Jul 06, 2016 at 02:24:57PM +0200, Christian Ehrhardt wrote:
> Hi,
> while checking for dpdk 16.07 what backports are accepted in the meantime so I
> can drop them I found this particular discussion has been silently forgotten by
> all of us.

Not really. As later I found that my patch was actually wrong (besides
the issue you already found). I will revisit this issue thoroughly when
get time, hopefully, next week.

	--yliu

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2] vhost_user: avoid crash when exeeding file descriptors
  2016-07-06 12:24         ` [PATCH v2] vhost_user: avoid crash when exeeding file descriptors Christian Ehrhardt
@ 2016-07-12  8:37           ` Yuanhan Liu
  2016-07-15 19:46             ` Thomas Monjalon
  0 siblings, 1 reply; 22+ messages in thread
From: Yuanhan Liu @ 2016-07-12  8:37 UTC (permalink / raw)
  To: Christian Ehrhardt; +Cc: patrik.r.andersson, thomas.monjalon, dev, huawei.xie

On Wed, Jul 06, 2016 at 02:24:58PM +0200, Christian Ehrhardt wrote:
> *update in v2*
> - refreshing for DPDK 16.07
> - Close fd on vserver->listenfd as suggested in discussion
> 
> Original From:
> From: Patrik Andersson <patrik.r.andersson@ericsson.com>
> 
> Protect against DPDK crash when allocation of listen fd >= 1023.
> For events on fd:s >1023, the current implementation will trigger
> an abort due to access outside of allocated bit mask.

Hmmm, I have no idea why I missed this email in the beginning,
otherwise, it would have been in rc2 release.

Thanks for the re-posting, and we should have had it in last
release.

Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>

	--yliu

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2] Memory leak when adding/removing vhost_user ports
  2016-07-06 13:08         ` Yuanhan Liu
@ 2016-07-12 12:08           ` Yuanhan Liu
  2016-07-19 13:50             ` Christian Ehrhardt
  0 siblings, 1 reply; 22+ messages in thread
From: Yuanhan Liu @ 2016-07-12 12:08 UTC (permalink / raw)
  To: Christian Ehrhardt; +Cc: patrik.r.andersson, thomas.monjalon, dev

On Wed, Jul 06, 2016 at 09:08:12PM +0800, Yuanhan Liu wrote:
> On Wed, Jul 06, 2016 at 02:24:57PM +0200, Christian Ehrhardt wrote:
> > Hi,
> > while checking for dpdk 16.07 what backports are accepted in the meantime so I
> > can drop them I found this particular discussion has been silently forgotten by
> > all of us.
> 
> Not really. As later I found that my patch was actually wrong (besides
> the issue you already found). I will revisit this issue thoroughly when
> get time, hopefully, next week.

Here it is: vhost_destroy() will be invoked when:

- QEMU quits gracefully 
- QEMU terminates unexpectedly

Meaning, there should be no memory leak. I think we are fine here (I
maybe wrong though, feeling a bit dizzy now :(

	--yliu

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2] vhost_user: avoid crash when exeeding file descriptors
  2016-07-12  8:37           ` Yuanhan Liu
@ 2016-07-15 19:46             ` Thomas Monjalon
  0 siblings, 0 replies; 22+ messages in thread
From: Thomas Monjalon @ 2016-07-15 19:46 UTC (permalink / raw)
  To: Christian Ehrhardt, patrik.r.andersson; +Cc: Yuanhan Liu, dev, huawei.xie

2016-07-12 16:37, Yuanhan Liu:
> On Wed, Jul 06, 2016 at 02:24:58PM +0200, Christian Ehrhardt wrote:
> > *update in v2*
> > - refreshing for DPDK 16.07
> > - Close fd on vserver->listenfd as suggested in discussion
> > 
> > Original From:
> > From: Patrik Andersson <patrik.r.andersson@ericsson.com>
> > 
> > Protect against DPDK crash when allocation of listen fd >= 1023.
> > For events on fd:s >1023, the current implementation will trigger
> > an abort due to access outside of allocated bit mask.
> 
> Hmmm, I have no idea why I missed this email in the beginning,
> otherwise, it would have been in rc2 release.
> 
> Thanks for the re-posting, and we should have had it in last
> release.
> 
> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>

Applied, thanks

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2] Memory leak when adding/removing vhost_user ports
  2016-07-12 12:08           ` Yuanhan Liu
@ 2016-07-19 13:50             ` Christian Ehrhardt
  0 siblings, 0 replies; 22+ messages in thread
From: Christian Ehrhardt @ 2016-07-19 13:50 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: Patrik Andersson R, Thomas Monjalon, dev

Hi,
thanks for evaluating.
I'll need a few days until I'm ready for OVS again and until OVS is ready
for DPDK 16.07.
Then I can run an adapted version of my old testcase to be sure.

In the worst cases it will be in 16.11 and I'll backport to 16.07 as
distributed.
I'll let you know then.


Christian Ehrhardt
Software Engineer, Ubuntu Server
Canonical Ltd

On Tue, Jul 12, 2016 at 2:08 PM, Yuanhan Liu <yuanhan.liu@linux.intel.com>
wrote:

> On Wed, Jul 06, 2016 at 09:08:12PM +0800, Yuanhan Liu wrote:
> > On Wed, Jul 06, 2016 at 02:24:57PM +0200, Christian Ehrhardt wrote:
> > > Hi,
> > > while checking for dpdk 16.07 what backports are accepted in the
> meantime so I
> > > can drop them I found this particular discussion has been silently
> forgotten by
> > > all of us.
> >
> > Not really. As later I found that my patch was actually wrong (besides
> > the issue you already found). I will revisit this issue thoroughly when
> > get time, hopefully, next week.
>
> Here it is: vhost_destroy() will be invoked when:
>
> - QEMU quits gracefully
> - QEMU terminates unexpectedly
>
> Meaning, there should be no memory leak. I think we are fine here (I
> maybe wrong though, feeling a bit dizzy now :(
>
>         --yliu
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2016-07-19 13:50 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-04-18 17:18 Memory leak when adding/removing vhost_user ports Christian Ehrhardt
2016-04-18 17:46 ` Yuanhan Liu
2016-04-18 18:14   ` Yuanhan Liu
2016-04-19 16:33     ` Christian Ehrhardt
2016-04-20  5:04       ` Yuanhan Liu
2016-04-20  6:18         ` Christian Ehrhardt
2016-04-21  5:54           ` Yuanhan Liu
2016-04-21  9:07             ` Christian Ehrhardt
2016-07-06 12:24       ` [PATCH v2] " Christian Ehrhardt
2016-07-06 12:24         ` [PATCH v2] vhost_user: avoid crash when exeeding file descriptors Christian Ehrhardt
2016-07-12  8:37           ` Yuanhan Liu
2016-07-15 19:46             ` Thomas Monjalon
2016-07-06 12:26         ` [PATCH v2] Memory leak when adding/removing vhost_user ports Christian Ehrhardt
2016-07-06 12:30           ` Christian Ehrhardt
2016-07-06 12:37             ` Christian Ehrhardt
2016-07-06 13:08         ` Yuanhan Liu
2016-07-12 12:08           ` Yuanhan Liu
2016-07-19 13:50             ` Christian Ehrhardt
2016-04-21 11:01 ` Ilya Maximets
     [not found]   ` <5718B306.5070801-Sze3O3UU22JBDgjK7y7TUQ@public.gmane.org>
2016-04-21 14:04     ` [dpdk-dev] " Christian Ehrhardt
2016-04-21 16:56       ` Yuanhan Liu
2016-04-21 16:54   ` Yuanhan Liu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.