All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] net_namespace: fixed net_device reference leak
@ 2016-09-08  1:55 Jojy Varghese
  2016-09-08 19:08 ` Jojy Varghese
  0 siblings, 1 reply; 16+ messages in thread
From: Jojy Varghese @ 2016-09-08  1:55 UTC (permalink / raw)
  To: netdev; +Cc: davem

During namespace cleanup, if ‘dst’ subsystem is holding
a reference to the loopback interface in the namespace, it does not get
released. This is because in the case where the net_device held by 'dst'
is same as the namespace's loopback net_device, current code first
does a ’dev_hold’ on the same device followed by a ‘dev_put’ on the
same device resulting in a no-op.

This change fixes this leak by assigning the initial namespace’s loopback
device to the ‘dst’’ before releasing the reference to the network device
being unregistered.

Additional reference: https://github.com/docker/docker/issues/5618

Signed-off-by: Jojy G Varghese <jojy.varghese@gmail.com>
---
 net/core/dst.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/dst.c b/net/core/dst.c
index a1656e3..a18e8ea 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -433,7 +433,7 @@ static void dst_ifdown(struct dst_entry *dst,
struct net_device *dev,
                dst->input = dst_discard;
                dst->output = dst_discard_out;
        } else {
-               dst->dev = dev_net(dst->dev)->loopback_dev;
+               dst->dev = init_net.loopback_dev;
                dev_hold(dst->dev);
                dev_put(dev);
        }
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] net_namespace: fixed net_device reference leak
  2016-09-08  1:55 [PATCH] net_namespace: fixed net_device reference leak Jojy Varghese
@ 2016-09-08 19:08 ` Jojy Varghese
  2016-09-08 19:32   ` Lance Richardson
  2016-09-08 21:00   ` Eric Dumazet
  0 siblings, 2 replies; 16+ messages in thread
From: Jojy Varghese @ 2016-09-08 19:08 UTC (permalink / raw)
  To: netdev; +Cc: davem

(Updating the patch for the case when a non-loopback dev could be unregistered)

Currently during namespace cleanup, if ‘dst’ subsystem is holding
a reference to the interface in the namespace, it does not get released.
Current code first does a ’dev_hold’ on the same device followed by a
‘dev_put’ on the same device resulting in a no-op.

This change fixes this leak by assigning the initial namespace’s loopback
device to the ‘dst’ before releasing the reference to the network
device being released.

Additional reference: https://github.com/docker/docker/issues/5618

Signed-off-by: Jojy G Varghese <jojy.varghese@gmail.com>
---
 net/core/dst.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/net/core/dst.c b/net/core/dst.c
index a1656e3..7e45593 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -433,7 +433,11 @@ static void dst_ifdown(struct dst_entry *dst,
struct net_device *dev,
                dst->input = dst_discard;
                dst->output = dst_discard_out;
        } else {
-               dst->dev = dev_net(dst->dev)->loopback_dev;
+               if (dst->dev == dev_net(dst->dev)->loopback_dev)
+                       dst->dev = init_net.loopback_dev;
+               else
+                       dst->dev = dev_net(dst->dev)->loopback_dev;
+
                dev_hold(dst->dev);
                dev_put(dev);
        }
--
1.8.3.1

On Wed, Sep 7, 2016 at 6:55 PM, Jojy Varghese <jojy.varghese@gmail.com> wrote:
> During namespace cleanup, if ‘dst’ subsystem is holding
> a reference to the loopback interface in the namespace, it does not get
> released. This is because in the case where the net_device held by 'dst'
> is same as the namespace's loopback net_device, current code first
> does a ’dev_hold’ on the same device followed by a ‘dev_put’ on the
> same device resulting in a no-op.
>
> This change fixes this leak by assigning the initial namespace’s loopback
> device to the ‘dst’’ before releasing the reference to the network device
> being unregistered.
>
> Additional reference: https://github.com/docker/docker/issues/5618
>
> Signed-off-by: Jojy G Varghese <jojy.varghese@gmail.com>
> ---
>  net/core/dst.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/core/dst.c b/net/core/dst.c
> index a1656e3..a18e8ea 100644
> --- a/net/core/dst.c
> +++ b/net/core/dst.c
> @@ -433,7 +433,7 @@ static void dst_ifdown(struct dst_entry *dst,
> struct net_device *dev,
>                 dst->input = dst_discard;
>                 dst->output = dst_discard_out;
>         } else {
> -               dst->dev = dev_net(dst->dev)->loopback_dev;
> +               dst->dev = init_net.loopback_dev;
>                 dev_hold(dst->dev);
>                 dev_put(dev);
>         }
> --
> 1.8.3.1



-- 
Jojy G Varghese

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] net_namespace: fixed net_device reference leak
  2016-09-08 19:08 ` Jojy Varghese
@ 2016-09-08 19:32   ` Lance Richardson
  2016-09-08 21:00   ` Eric Dumazet
  1 sibling, 0 replies; 16+ messages in thread
From: Lance Richardson @ 2016-09-08 19:32 UTC (permalink / raw)
  To: Jojy Varghese; +Cc: netdev, davem

> From: "Jojy Varghese" <jojy.varghese@gmail.com>
> To: netdev@vger.kernel.org
> Cc: davem@davemloft.net
> Sent: Thursday, September 8, 2016 3:08:29 PM
> Subject: Re: [PATCH] net_namespace: fixed net_device reference leak
> 
> (Updating the patch for the case when a non-loopback dev could be
> unregistered)
> 
> Currently during namespace cleanup, if ‘dst’ subsystem is holding
> a reference to the interface in the namespace, it does not get released.
> Current code first does a ’dev_hold’ on the same device followed by a
> ‘dev_put’ on the same device resulting in a no-op.
> 
> This change fixes this leak by assigning the initial namespace’s loopback
> device to the ‘dst’ before releasing the reference to the network
> device being released.
> 
> Additional reference: https://github.com/docker/docker/issues/5618
> 
> Signed-off-by: Jojy G Varghese <jojy.varghese@gmail.com>
> ---

If the dst wasn't being cleaned up in the namespace that's being deleted,
it is unlikely it will be cleaned up after moving it to the init namespace.

I think it would be better to figure out why this dst is being leaked
in the first place, otherwise this is probably still a memory leak.

Is this easily reproducible with current net or net-next kernels?

Regards,

    Lance

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] net_namespace: fixed net_device reference leak
  2016-09-08 19:08 ` Jojy Varghese
  2016-09-08 19:32   ` Lance Richardson
@ 2016-09-08 21:00   ` Eric Dumazet
  2016-09-08 21:05     ` Eric Dumazet
  1 sibling, 1 reply; 16+ messages in thread
From: Eric Dumazet @ 2016-09-08 21:00 UTC (permalink / raw)
  To: Jojy Varghese; +Cc: netdev, davem

On Thu, 2016-09-08 at 12:08 -0700, Jojy Varghese wrote:
> (Updating the patch for the case when a non-loopback dev could be unregistered)
> 
> Currently during namespace cleanup, if ‘dst’ subsystem is holding
> a reference to the interface in the namespace, it does not get released.
> Current code first does a ’dev_hold’ on the same device followed by a
> ‘dev_put’ on the same device resulting in a no-op.
> 
> This change fixes this leak by assigning the initial namespace’s loopback
> device to the ‘dst’ before releasing the reference to the network
> device being released.
> 
> Additional reference: https://github.com/docker/docker/issues/5618
> 
> Signed-off-by: Jojy G Varghese <jojy.varghese@gmail.com>
> ---
>  net/core/dst.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/net/core/dst.c b/net/core/dst.c
> index a1656e3..7e45593 100644
> --- a/net/core/dst.c
> +++ b/net/core/dst.c
> @@ -433,7 +433,11 @@ static void dst_ifdown(struct dst_entry *dst,
> struct net_device *dev,
>                 dst->input = dst_discard;
>                 dst->output = dst_discard_out;
>         } else {
> -               dst->dev = dev_net(dst->dev)->loopback_dev;
> +               if (dst->dev == dev_net(dst->dev)->loopback_dev)
> +                       dst->dev = init_net.loopback_dev;
> +               else
> +                       dst->dev = dev_net(dst->dev)->loopback_dev;
> +
>                 dev_hold(dst->dev);
>                 dev_put(dev);
>         }

I understand these recurring problems are really a hassle
(unregister_netdevice: waiting for xxx to become free. Usage count = 1),
but this change simply hides the real bug ?

Pretending this dst now belongs to init_net is a bit like avoiding a
kfree(mem) : Better not free memory than risk use-after-free.

Now, try this one billion time, and I am sure host will have no memory
left.

At net namespace dismantle, all dst should be freed.
We have numerous callbacks to ensure this.
But maybe we have _one_ dst that is not properly purged.

Latest real fix was :
https://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=751eb6b6042a596b0080967c1a529a9fe98dac1d

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] net_namespace: fixed net_device reference leak
  2016-09-08 21:00   ` Eric Dumazet
@ 2016-09-08 21:05     ` Eric Dumazet
  2016-09-08 22:12       ` Jojy Varghese
  0 siblings, 1 reply; 16+ messages in thread
From: Eric Dumazet @ 2016-09-08 21:05 UTC (permalink / raw)
  To: Jojy Varghese; +Cc: netdev, davem

On Thu, 2016-09-08 at 14:00 -0700, Eric Dumazet wrote:

> Latest real fix was :
> https://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=751eb6b6042a596b0080967c1a529a9fe98dac1d

Another interesting fix is 

https://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=a5d0dc810abf3d6b241777467ee1d6efb02575fc

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] net_namespace: fixed net_device reference leak
  2016-09-08 21:05     ` Eric Dumazet
@ 2016-09-08 22:12       ` Jojy Varghese
  2016-09-08 22:16         ` Eric Dumazet
  0 siblings, 1 reply; 16+ messages in thread
From: Jojy Varghese @ 2016-09-08 22:12 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, davem

Hi Eric/Lance

 Thanks for taking time to look at the patch. The problem in dst.c's
"dst_ifdown" is that it will never do an actual "dev_put" on the
loopback net device. This results in the loopback net device to be
alive even when the dst goes away.

Thanks
Jojy

On Thu, Sep 8, 2016 at 2:05 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2016-09-08 at 14:00 -0700, Eric Dumazet wrote:
>
>> Latest real fix was :
>> https://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=751eb6b6042a596b0080967c1a529a9fe98dac1d
>
> Another interesting fix is
>
> https://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=a5d0dc810abf3d6b241777467ee1d6efb02575fc
>
>
>



-- 
Jojy G Varghese

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] net_namespace: fixed net_device reference leak
  2016-09-08 22:12       ` Jojy Varghese
@ 2016-09-08 22:16         ` Eric Dumazet
  2016-09-08 23:16           ` Jojy Varghese
  0 siblings, 1 reply; 16+ messages in thread
From: Eric Dumazet @ 2016-09-08 22:16 UTC (permalink / raw)
  To: Jojy Varghese; +Cc: netdev, davem

On Thu, 2016-09-08 at 15:12 -0700, Jojy Varghese wrote:
> Hi Eric/Lance
> 
>  Thanks for taking time to look at the patch. The problem in dst.c's
> "dst_ifdown" is that it will never do an actual "dev_put" on the
> loopback net device. This results in the loopback net device to be
> alive even when the dst goes away.

Sure, but the dst themselves should disappear, and thus release their
dst->dev reference.

Something is missing.

loopback device is only used when we remove one device from the system,
but namespace (and loopback device) is still around.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] net_namespace: fixed net_device reference leak
  2016-09-08 22:16         ` Eric Dumazet
@ 2016-09-08 23:16           ` Jojy Varghese
  2016-09-08 23:33             ` Jojy Varghese
  2016-09-08 23:37             ` Eric Dumazet
  0 siblings, 2 replies; 16+ messages in thread
From: Jojy Varghese @ 2016-09-08 23:16 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, davem

The dst's do disappear but the net device does not. A better solution
is to only hold reference to dst's net device ONLY if it is not a
loopback device. In the case of the loopback device, we want to do
only a "put" on it and skip the "hold".

Updating the patch below:

---
 net/core/dst.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/net/core/dst.c b/net/core/dst.c
index 7e45593..f63027e 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -433,12 +433,11 @@ static void dst_ifdown(struct dst_entry *dst,
struct net_device *dev,
                dst->input = dst_discard;
                dst->output = dst_discard_out;
        } else {
-               if (dst->dev == dev_net(dst->dev)->loopback_dev)
-                       dst->dev = init_net.loopback_dev;
-               else
+               if (dst->dev != dev_net(dst->dev)->loopback_dev) {
                        dst->dev = dev_net(dst->dev)->loopback_dev;
+                       dev_hold(dst->dev);
+               }

-               dev_hold(dst->dev);
                dev_put(dev);
        }
 }

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] net_namespace: fixed net_device reference leak
  2016-09-08 23:16           ` Jojy Varghese
@ 2016-09-08 23:33             ` Jojy Varghese
  2016-09-08 23:51               ` Eric Dumazet
  2016-09-09  0:04               ` David Miller
  2016-09-08 23:37             ` Eric Dumazet
  1 sibling, 2 replies; 16+ messages in thread
From: Jojy Varghese @ 2016-09-08 23:33 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, davem

Sorry last patch was just the delta over my original patch. Here is
the complete patch:

---
 net/core/dst.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/core/dst.c b/net/core/dst.c
index a1656e3..f63027e 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -433,8 +433,11 @@ static void dst_ifdown(struct dst_entry *dst,
struct net_device *dev,
                dst->input = dst_discard;
                dst->output = dst_discard_out;
        } else {
-               dst->dev = dev_net(dst->dev)->loopback_dev;
-               dev_hold(dst->dev);
+               if (dst->dev != dev_net(dst->dev)->loopback_dev) {
+                       dst->dev = dev_net(dst->dev)->loopback_dev;
+                       dev_hold(dst->dev);
+               }
+
                dev_put(dev);
        }
 }
--
1.8.3.1

On Thu, Sep 8, 2016 at 4:16 PM, Jojy Varghese <jojy.varghese@gmail.com> wrote:
> The dst's do disappear but the net device does not. A better solution
> is to only hold reference to dst's net device ONLY if it is not a
> loopback device. In the case of the loopback device, we want to do
> only a "put" on it and skip the "hold".
>
> Updating the patch below:
>
> ---
>  net/core/dst.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/net/core/dst.c b/net/core/dst.c
> index 7e45593..f63027e 100644
> --- a/net/core/dst.c
> +++ b/net/core/dst.c
> @@ -433,12 +433,11 @@ static void dst_ifdown(struct dst_entry *dst,
> struct net_device *dev,
>                 dst->input = dst_discard;
>                 dst->output = dst_discard_out;
>         } else {
> -               if (dst->dev == dev_net(dst->dev)->loopback_dev)
> -                       dst->dev = init_net.loopback_dev;
> -               else
> +               if (dst->dev != dev_net(dst->dev)->loopback_dev) {
>                         dst->dev = dev_net(dst->dev)->loopback_dev;
> +                       dev_hold(dst->dev);
> +               }
>
> -               dev_hold(dst->dev);
>                 dev_put(dev);
>         }
>  }
> --
> 1.8.3.1
>
>
>
> On Thu, Sep 8, 2016 at 3:16 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> On Thu, 2016-09-08 at 15:12 -0700, Jojy Varghese wrote:
>>> Hi Eric/Lance
>>>
>>>  Thanks for taking time to look at the patch. The problem in dst.c's
>>> "dst_ifdown" is that it will never do an actual "dev_put" on the
>>> loopback net device. This results in the loopback net device to be
>>> alive even when the dst goes away.
>>
>> Sure, but the dst themselves should disappear, and thus release their
>> dst->dev reference.
>>
>> Something is missing.
>>
>> loopback device is only used when we remove one device from the system,
>> but namespace (and loopback device) is still around.
>>
>>
>>
>
>
>
> --
> Jojy G Varghese



-- 
Jojy G Varghese

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] net_namespace: fixed net_device reference leak
  2016-09-08 23:16           ` Jojy Varghese
  2016-09-08 23:33             ` Jojy Varghese
@ 2016-09-08 23:37             ` Eric Dumazet
  1 sibling, 0 replies; 16+ messages in thread
From: Eric Dumazet @ 2016-09-08 23:37 UTC (permalink / raw)
  To: Jojy Varghese; +Cc: netdev, davem

On Thu, 2016-09-08 at 16:16 -0700, Jojy Varghese wrote:
> The dst's do disappear but the net device does not. A better solution
> is to only hold reference to dst's net device ONLY if it is not a
> loopback device. In the case of the loopback device, we want to do
> only a "put" on it and skip the "hold".
> 
> Updating the patch below:
> 
> ---
>  net/core/dst.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/net/core/dst.c b/net/core/dst.c
> index 7e45593..f63027e 100644
> --- a/net/core/dst.c
> +++ b/net/core/dst.c
> @@ -433,12 +433,11 @@ static void dst_ifdown(struct dst_entry *dst,
> struct net_device *dev,
>                 dst->input = dst_discard;
>                 dst->output = dst_discard_out;
>         } else {
> -               if (dst->dev == dev_net(dst->dev)->loopback_dev)

On which tree are you working exactly ?
Current code is not looking like that.

> -                       dst->dev = init_net.loopback_dev;
> -               else
> +               if (dst->dev != dev_net(dst->dev)->loopback_dev) {
>                         dst->dev = dev_net(dst->dev)->loopback_dev;
> +                       dev_hold(dst->dev);
> +               }
> 
> -               dev_hold(dst->dev);
>                 dev_put(dev);
>         }
>  }


I appreciate your desperate efforts, but I am telling you the bug is
elsewhere.

Do you have a reproducer of the bug, on latest David Miller tree
( https://git.kernel.org/cgit/linux/kernel/git/davem/net.git )

Thanks.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] net_namespace: fixed net_device reference leak
  2016-09-08 23:33             ` Jojy Varghese
@ 2016-09-08 23:51               ` Eric Dumazet
  2016-09-09  0:04               ` David Miller
  1 sibling, 0 replies; 16+ messages in thread
From: Eric Dumazet @ 2016-09-08 23:51 UTC (permalink / raw)
  To: Jojy Varghese; +Cc: netdev, davem

On Thu, 2016-09-08 at 16:33 -0700, Jojy Varghese wrote:
> Sorry last patch was just the delta over my original patch. Here is
> the complete patch:
> 
> ---
>  net/core/dst.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/net/core/dst.c b/net/core/dst.c
> index a1656e3..f63027e 100644
> --- a/net/core/dst.c
> +++ b/net/core/dst.c
> @@ -433,8 +433,11 @@ static void dst_ifdown(struct dst_entry *dst,
> struct net_device *dev,
>                 dst->input = dst_discard;
>                 dst->output = dst_discard_out;
>         } else {
> -               dst->dev = dev_net(dst->dev)->loopback_dev;
> -               dev_hold(dst->dev);
> +               if (dst->dev != dev_net(dst->dev)->loopback_dev) {
> +                       dst->dev = dev_net(dst->dev)->loopback_dev;
> +                       dev_hold(dst->dev);
> +               }
> +
>                 dev_put(dev);
>         }
>  }

This is complete garbage, now we have a dst somewhere in the system,
pointing to a device that might have been freed.

Crash might happen when dst->dev is used later.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] net_namespace: fixed net_device reference leak
  2016-09-08 23:33             ` Jojy Varghese
  2016-09-08 23:51               ` Eric Dumazet
@ 2016-09-09  0:04               ` David Miller
  2016-09-09  0:35                 ` Jojy Varghese
  1 sibling, 1 reply; 16+ messages in thread
From: David Miller @ 2016-09-09  0:04 UTC (permalink / raw)
  To: jojy.varghese; +Cc: eric.dumazet, netdev

From: Jojy Varghese <jojy.varghese@gmail.com>
Date: Thu, 8 Sep 2016 16:33:38 -0700

> Sorry last patch was just the delta over my original patch. Here is
> the complete patch:

It looks like you are grasping at straws.

Please slow down and take the time to debug this properly.

All of the mechanisms exist to make these references go away
properly, but you are seeing a leak for some reason.

Therefore, please work with Eric to discover where the real bug is
instead of posting knee-jerk reaction patches every 15 minutes.

Thank you.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] net_namespace: fixed net_device reference leak
  2016-09-09  0:04               ` David Miller
@ 2016-09-09  0:35                 ` Jojy Varghese
  2016-09-09  0:39                   ` David Miller
                                     ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Jojy Varghese @ 2016-09-09  0:35 UTC (permalink / raw)
  To: David Miller; +Cc: Eric Dumazet, netdev

Thanks for the feedback guys. I appreciate it.

I am working with 4.4.3 kernel.  I will try to reproduce the issue
with the kernel tree at
https://git.kernel.org/cgit/linux/kernel/git/davem/net.git.

In the meantime, the only data I have now is :
-  "dst-ifdown"  does not look correct since it will never release
reference to loopback device.
-  One way to fix might be to assign the 'dst''s net device to the
init net device. I thought adding reference to init net device's
loopback might be safe.

Any suggestion on debugging further is much appreciated.

-Jojy


On Thu, Sep 8, 2016 at 5:04 PM, David Miller <davem@davemloft.net> wrote:
> From: Jojy Varghese <jojy.varghese@gmail.com>
> Date: Thu, 8 Sep 2016 16:33:38 -0700
>
>> Sorry last patch was just the delta over my original patch. Here is
>> the complete patch:
>
> It looks like you are grasping at straws.
>
> Please slow down and take the time to debug this properly.
>
> All of the mechanisms exist to make these references go away
> properly, but you are seeing a leak for some reason.
>
> Therefore, please work with Eric to discover where the real bug is
> instead of posting knee-jerk reaction patches every 15 minutes.
>
> Thank you.



-- 
Jojy G Varghese

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] net_namespace: fixed net_device reference leak
  2016-09-09  0:35                 ` Jojy Varghese
@ 2016-09-09  0:39                   ` David Miller
  2016-09-09  2:35                   ` Eric Dumazet
  2016-09-10 16:12                   ` David Ahern
  2 siblings, 0 replies; 16+ messages in thread
From: David Miller @ 2016-09-09  0:39 UTC (permalink / raw)
  To: jojy.varghese; +Cc: eric.dumazet, netdev

From: Jojy Varghese <jojy.varghese@gmail.com>
Date: Thu, 8 Sep 2016 17:35:08 -0700

> -  "dst-ifdown"  does not look correct since it will never release
> reference to loopback device.

The routes and packets in the namespace are purged and disappear when
the namespace goes down, and that's how all of the references to the
loopback device are decremented.

Nothing dst_ifdown() does looks wrong, we are quite sure the problem
is elsewhere.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] net_namespace: fixed net_device reference leak
  2016-09-09  0:35                 ` Jojy Varghese
  2016-09-09  0:39                   ` David Miller
@ 2016-09-09  2:35                   ` Eric Dumazet
  2016-09-10 16:12                   ` David Ahern
  2 siblings, 0 replies; 16+ messages in thread
From: Eric Dumazet @ 2016-09-09  2:35 UTC (permalink / raw)
  To: Jojy Varghese; +Cc: David Miller, netdev

On Thu, 2016-09-08 at 17:35 -0700, Jojy Varghese wrote:
> Thanks for the feedback guys. I appreciate it.
> 
> I am working with 4.4.3 kernel.  I will try to reproduce the issue
> with the kernel tree at
> https://git.kernel.org/cgit/linux/kernel/git/davem/net.git.
> 

Good.

Please do not top post on netdev.

> In the meantime, the only data I have now is :
> -  "dst-ifdown"  does not look correct since it will never release
> reference to loopback device.

Normally at netns dismantle, all sockets and devices holding references
to dst in this netns are closed and dst freed. Details are in various
notifiers.

At that time the dst->dev references will be released, including the
ones having dst->dev == loopback (on the soon to be freed netns)

We do not want to force in dst_ifdown() a dst->dev pointing nowhere, as
it might crash say when dst->dev is used later during a sendmsg() or
something like that.


> -  One way to fix might be to assign the 'dst''s net device to the
> init net device. I thought adding reference to init net device's
> loopback might be safe.
> 


You only make _rebooting_ the host not possible, since the problem of
having some reference on loopback device will still be there.

We really have to find the root cause, not simply working around it.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] net_namespace: fixed net_device reference leak
  2016-09-09  0:35                 ` Jojy Varghese
  2016-09-09  0:39                   ` David Miller
  2016-09-09  2:35                   ` Eric Dumazet
@ 2016-09-10 16:12                   ` David Ahern
  2 siblings, 0 replies; 16+ messages in thread
From: David Ahern @ 2016-09-10 16:12 UTC (permalink / raw)
  To: Jojy Varghese, David Miller; +Cc: Eric Dumazet, netdev

[-- Attachment #1: Type: text/plain, Size: 1209 bytes --]

On 9/8/16 6:35 PM, Jojy Varghese wrote:
> Thanks for the feedback guys. I appreciate it.
> 
> I am working with 4.4.3 kernel.  I will try to reproduce the issue
> with the kernel tree at
> https://git.kernel.org/cgit/linux/kernel/git/davem/net.git.
> 
> In the meantime, the only data I have now is :
> -  "dst-ifdown"  does not look correct since it will never release
> reference to loopback device.
> -  One way to fix might be to assign the 'dst''s net device to the
> init net device. I thought adding reference to init net device's
> loopback might be safe.
> 
> Any suggestion on debugging further is much appreciated.
> 

Last 2 times I chased down netdev refcnts I used the attach patch. It adds tracepoints to dev_hold and dev_put.

perf record -e net:dev_hold,net:dev_put -a -g -- <run your test>

You probably will need to increase memory for perf buffers (e.g., -m 8192).

It will dump a lot of data but you will find the missing put; you just have to be patient wading through the data and matching up hold and put references. python/perl script with perf script can make that easier.

---

Have you looked at whether there are open sockets? e.g., perhaps a dst reference is cached on a socket


[-- Attachment #2: 0001-Add-tracepoints-to-dev_hold-and-dev_put.patch --]
[-- Type: text/plain, Size: 2875 bytes --]

From 068b1b8362ec5fd1b9dffdbd6e84474ada2eb829 Mon Sep 17 00:00:00 2001
From: David Ahern <dsa@cumulusnetworks.com>
Date: Thu, 11 Feb 2016 02:40:12 -0800
Subject: [PATCH] Add tracepoints to dev_hold and dev_put

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
---
 include/linux/netdevice.h  |  6 ++++++
 include/trace/events/net.h | 38 ++++++++++++++++++++++++++++++++++++++
 net/core/dev.c             | 21 +++++++++++++++++++++
 3 files changed, 65 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 219f53c30cb3..7ef6fc672dfb 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3193,6 +3193,7 @@ extern int		netdev_budget;
 /* Called by rtnetlink.c:rtnl_unlock() */
 void netdev_run_todo(void);
 
+#if 0
 /**
  *	dev_put - release reference to device
  *	@dev: network device
@@ -3214,6 +3215,11 @@ static inline void dev_hold(struct net_device *dev)
 {
 	this_cpu_inc(*dev->pcpu_refcnt);
 }
+#else
+void dev_put(struct net_device *dev);
+void dev_hold(struct net_device *dev);
+
+#endif
 
 /* Carrier loss detection, dial on demand. The functions netif_carrier_on
  * and _off may be called from IRQ context, but it is caller
diff --git a/include/trace/events/net.h b/include/trace/events/net.h
index 49cc7c3de252..9ed73dfe9d09 100644
--- a/include/trace/events/net.h
+++ b/include/trace/events/net.h
@@ -236,6 +236,44 @@ DEFINE_EVENT(net_dev_rx_verbose_template, netif_rx_ni_entry,
 	TP_ARGS(skb)
 );
 
+TRACE_EVENT(dev_put,
+
+	TP_PROTO(struct net_device *dev),
+
+	TP_ARGS(dev),
+
+	TP_STRUCT__entry(
+		__string(	name,		dev->name	)
+		__field(	int,		refcnt )
+	),
+
+	TP_fast_assign(
+		__assign_str(name, dev->name);
+		__entry->refcnt = netdev_refcnt_read(dev);
+	),
+
+	TP_printk("dev=%s refcnt %d", __get_str(name), __entry->refcnt)
+);
+
+TRACE_EVENT(dev_hold,
+
+	TP_PROTO(struct net_device *dev),
+
+	TP_ARGS(dev),
+
+	TP_STRUCT__entry(
+		__string(	name,		dev->name	)
+		__field(	int,		refcnt )
+	),
+
+	TP_fast_assign(
+		__assign_str(name, dev->name);
+		__entry->refcnt = netdev_refcnt_read(dev);
+	),
+
+	TP_printk("dev=%s refcnt %d", __get_str(name), __entry->refcnt)
+);
+
 #endif /* _TRACE_NET_H */
 
 /* This part must be outside protection */
diff --git a/net/core/dev.c b/net/core/dev.c
index f1284835b8c9..99ac067afd18 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -8117,3 +8117,24 @@ static int __init net_dev_init(void)
 }
 
 subsys_initcall(net_dev_init);
+
+
+void dev_put(struct net_device *dev)
+{
+	this_cpu_dec(*dev->pcpu_refcnt);
+	trace_dev_put(dev);
+}
+EXPORT_SYMBOL(dev_put);
+
+/**
+ *      dev_hold - get reference to device
+ *      @dev: network device
+ *
+ * Hold reference to device to keep it from being freed.
+ */
+void dev_hold(struct net_device *dev)
+{
+	this_cpu_inc(*dev->pcpu_refcnt);
+	trace_dev_hold(dev);
+}
+EXPORT_SYMBOL(dev_hold);
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2016-09-10 16:12 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-08  1:55 [PATCH] net_namespace: fixed net_device reference leak Jojy Varghese
2016-09-08 19:08 ` Jojy Varghese
2016-09-08 19:32   ` Lance Richardson
2016-09-08 21:00   ` Eric Dumazet
2016-09-08 21:05     ` Eric Dumazet
2016-09-08 22:12       ` Jojy Varghese
2016-09-08 22:16         ` Eric Dumazet
2016-09-08 23:16           ` Jojy Varghese
2016-09-08 23:33             ` Jojy Varghese
2016-09-08 23:51               ` Eric Dumazet
2016-09-09  0:04               ` David Miller
2016-09-09  0:35                 ` Jojy Varghese
2016-09-09  0:39                   ` David Miller
2016-09-09  2:35                   ` Eric Dumazet
2016-09-10 16:12                   ` David Ahern
2016-09-08 23:37             ` Eric Dumazet

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.