All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] [PATCH net] net: Do not hold the reference for the same sk_rx_dst.
@ 2017-03-16  8:08 Kevin Xu
  2017-03-16 10:01 ` Jakub Sitnicki
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Kevin Xu @ 2017-03-16  8:08 UTC (permalink / raw)
  To: davem; +Cc: netdev, Kevin Xu

In some rare cases, inet_sk_rx_dst_set() may be called multiple times
on the same dst, causing reference count leakage. Eventually, it
prevents net_device to be destroyed. The bug then manifested as

unregister_netdevice: waiting for lo to become free. Usage count = 1

in the kernel log, preventing new network namespace creation.

The patch works around the issue by checking whether the socket already
has the same dst set.

Signed-off-by: Kevin Xu <kaiwen.xu@hulu.com>
---
 net/ipv4/tcp_ipv4.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 575e19d..f425c14 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1807,9 +1807,14 @@ void inet_sk_rx_dst_set(struct sock *sk, const struct sk_buff *skb)
 {
 	struct dst_entry *dst = skb_dst(skb);
 
-	if (dst && dst_hold_safe(dst)) {
-		sk->sk_rx_dst = dst;
-		inet_sk(sk)->rx_dst_ifindex = skb->skb_iif;
+	if (dst) {
+		if (unlikely(dst == sk->sk_rx_dst))
+			return;
+
+		if (dst_hold_safe(dst)) {
+			sk->sk_rx_dst = dst;
+			inet_sk(sk)->rx_dst_ifindex = skb->skb_iif;
+		}
 	}
 }
 EXPORT_SYMBOL(inet_sk_rx_dst_set);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] [PATCH net] net: Do not hold the reference for the same sk_rx_dst.
  2017-03-16  8:08 [PATCH] [PATCH net] net: Do not hold the reference for the same sk_rx_dst Kevin Xu
@ 2017-03-16 10:01 ` Jakub Sitnicki
  2017-03-16 10:12   ` Kevin Xu
  2017-03-16 18:16 ` David Miller
  2017-03-16 21:33 ` [PATCH] [PATCH net] " Eric Dumazet
  2 siblings, 1 reply; 15+ messages in thread
From: Jakub Sitnicki @ 2017-03-16 10:01 UTC (permalink / raw)
  To: Kevin Xu; +Cc: davem, netdev

On Thu, Mar 16, 2017 at 08:08 AM GMT, Kevin Xu wrote:
> In some rare cases, inet_sk_rx_dst_set() may be called multiple times
> on the same dst, causing reference count leakage. Eventually, it
> prevents net_device to be destroyed. The bug then manifested as
>
> unregister_netdevice: waiting for lo to become free. Usage count = 1
>
> in the kernel log, preventing new network namespace creation.
>
> The patch works around the issue by checking whether the socket already
> has the same dst set.
>
> Signed-off-by: Kevin Xu <kaiwen.xu@hulu.com>
> ---

FWIW, with this patch applied I'm still sometimes seeing:

[  125.928095] unregister_netdevice: waiting for lo to become free. Usage count = 1

-Jakub

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] [PATCH net] net: Do not hold the reference for the same sk_rx_dst.
  2017-03-16 10:01 ` Jakub Sitnicki
@ 2017-03-16 10:12   ` Kevin Xu
  2017-03-16 10:45     ` Jakub Sitnicki
  0 siblings, 1 reply; 15+ messages in thread
From: Kevin Xu @ 2017-03-16 10:12 UTC (permalink / raw)
  To: Jakub Sitnicki; +Cc: davem, netdev

Do you mean the message looping endlessly?

If so, then I suppose it's a different bug.

Kevin

> On Mar 16, 2017, at 3:01 AM, Jakub Sitnicki <jkbs@redhat.com> wrote:
> 
>> On Thu, Mar 16, 2017 at 08:08 AM GMT, Kevin Xu wrote:
>> In some rare cases, inet_sk_rx_dst_set() may be called multiple times
>> on the same dst, causing reference count leakage. Eventually, it
>> prevents net_device to be destroyed. The bug then manifested as
>> 
>> unregister_netdevice: waiting for lo to become free. Usage count = 1
>> 
>> in the kernel log, preventing new network namespace creation.
>> 
>> The patch works around the issue by checking whether the socket already
>> has the same dst set.
>> 
>> Signed-off-by: Kevin Xu <kaiwen.xu@hulu.com>
>> ---
> 
> FWIW, with this patch applied I'm still sometimes seeing:
> 
> [  125.928095] unregister_netdevice: waiting for lo to become free. Usage count = 1
> 
> -Jakub

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] [PATCH net] net: Do not hold the reference for the same sk_rx_dst.
  2017-03-16 10:12   ` Kevin Xu
@ 2017-03-16 10:45     ` Jakub Sitnicki
  2017-03-16 18:18       ` Kaiwen Xu
  0 siblings, 1 reply; 15+ messages in thread
From: Jakub Sitnicki @ 2017-03-16 10:45 UTC (permalink / raw)
  To: Kevin Xu; +Cc: davem, netdev

On Thu, Mar 16, 2017 at 10:12 AM GMT, Kevin Xu wrote:
> Do you mean the message looping endlessly?

No, the message is emitted just once. Around 100 seconds after
destroying a few namespaces. Occurs not so often, maybe once per ten
runs.

-Jakub

> If so, then I suppose it's a different bug.
>
> Kevin
>
>> On Mar 16, 2017, at 3:01 AM, Jakub Sitnicki <jkbs@redhat.com> wrote:
>>
>>> On Thu, Mar 16, 2017 at 08:08 AM GMT, Kevin Xu wrote:
>>> In some rare cases, inet_sk_rx_dst_set() may be called multiple times
>>> on the same dst, causing reference count leakage. Eventually, it
>>> prevents net_device to be destroyed. The bug then manifested as
>>>
>>> unregister_netdevice: waiting for lo to become free. Usage count = 1
>>>
>>> in the kernel log, preventing new network namespace creation.
>>>
>>> The patch works around the issue by checking whether the socket already
>>> has the same dst set.
>>>
>>> Signed-off-by: Kevin Xu <kaiwen.xu@hulu.com>
>>> ---
>>
>> FWIW, with this patch applied I'm still sometimes seeing:
>>
>> [  125.928095] unregister_netdevice: waiting for lo to become free. Usage count = 1
>>
>> -Jakub

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] [PATCH net] net: Do not hold the reference for the same sk_rx_dst.
  2017-03-16  8:08 [PATCH] [PATCH net] net: Do not hold the reference for the same sk_rx_dst Kevin Xu
  2017-03-16 10:01 ` Jakub Sitnicki
@ 2017-03-16 18:16 ` David Miller
  2017-03-19  1:48   ` [PATCH net v2] " Kevin Xu
  2017-03-16 21:33 ` [PATCH] [PATCH net] " Eric Dumazet
  2 siblings, 1 reply; 15+ messages in thread
From: David Miller @ 2017-03-16 18:16 UTC (permalink / raw)
  To: kaiwen.xu; +Cc: netdev

From: Kevin Xu <kaiwen.xu@hulu.com>
Date: Thu, 16 Mar 2017 01:08:30 -0700

> In some rare cases, inet_sk_rx_dst_set() may be called multiple times
> on the same dst, causing reference count leakage. Eventually, it
> prevents net_device to be destroyed. The bug then manifested as
> 
> unregister_netdevice: waiting for lo to become free. Usage count = 1
> 
> in the kernel log, preventing new network namespace creation.
> 
> The patch works around the issue by checking whether the socket already
> has the same dst set.
> 
> Signed-off-by: Kevin Xu <kaiwen.xu@hulu.com>

You need to prevent this parallel execution of this function or use
atomic compare-and-exchange to set the rx_dst in order to prevent
the double refcounting.

This patch by itself is even worse than a workaround, because depending
upon how the compiler reloads values from memory, the problem can still
occur.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] [PATCH net] net: Do not hold the reference for the same sk_rx_dst.
  2017-03-16 10:45     ` Jakub Sitnicki
@ 2017-03-16 18:18       ` Kaiwen Xu
  2017-03-16 18:42         ` Cong Wang
  0 siblings, 1 reply; 15+ messages in thread
From: Kaiwen Xu @ 2017-03-16 18:18 UTC (permalink / raw)
  To: Jakub Sitnicki; +Cc: davem, netdev

On Thu, Mar 16, 2017 at 11:45:03AM +0100, Jakub Sitnicki wrote:
> On Thu, Mar 16, 2017 at 10:12 AM GMT, Kevin Xu wrote:
> > Do you mean the message looping endlessly?
> 
> No, the message is emitted just once. Around 100 seconds after
> destroying a few namespaces. Occurs not so often, maybe once per ten
> runs.
> 
> -Jakub
> 

I saw that happening from time to time during my test as well, I suspect
it was because some TCP sockets stays in either TCP_TIME_WAIT or
TCP_FIN_WAIT1. But eventually those sockets get destroyed and lo gets
deleted as well.

The patch was fixing an issue I am seeing, that the message gets looped
forever, and causing a deadlock on new network ns creation.

Kevin

> > If so, then I suppose it's a different bug.
> >
> > Kevin
> >
> >> On Mar 16, 2017, at 3:01 AM, Jakub Sitnicki <jkbs@redhat.com> wrote:
> >>
> >>> On Thu, Mar 16, 2017 at 08:08 AM GMT, Kevin Xu wrote:
> >>> In some rare cases, inet_sk_rx_dst_set() may be called multiple times
> >>> on the same dst, causing reference count leakage. Eventually, it
> >>> prevents net_device to be destroyed. The bug then manifested as
> >>>
> >>> unregister_netdevice: waiting for lo to become free. Usage count = 1
> >>>
> >>> in the kernel log, preventing new network namespace creation.
> >>>
> >>> The patch works around the issue by checking whether the socket already
> >>> has the same dst set.
> >>>
> >>> Signed-off-by: Kevin Xu <kaiwen.xu@hulu.com>
> >>> ---
> >>
> >> FWIW, with this patch applied I'm still sometimes seeing:
> >>
> >> [  125.928095] unregister_netdevice: waiting for lo to become free. Usage count = 1
> >>
> >> -Jakub

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] [PATCH net] net: Do not hold the reference for the same sk_rx_dst.
  2017-03-16 18:18       ` Kaiwen Xu
@ 2017-03-16 18:42         ` Cong Wang
  0 siblings, 0 replies; 15+ messages in thread
From: Cong Wang @ 2017-03-16 18:42 UTC (permalink / raw)
  To: Kaiwen Xu; +Cc: Jakub Sitnicki, davem, netdev

On Thu, Mar 16, 2017 at 11:18 AM, Kaiwen Xu <kaiwen.xu@hulu.com> wrote:
> On Thu, Mar 16, 2017 at 11:45:03AM +0100, Jakub Sitnicki wrote:
>> On Thu, Mar 16, 2017 at 10:12 AM GMT, Kevin Xu wrote:
>> > Do you mean the message looping endlessly?
>>
>> No, the message is emitted just once. Around 100 seconds after
>> destroying a few namespaces. Occurs not so often, maybe once per ten
>> runs.
>>
>> -Jakub
>>
>
> I saw that happening from time to time during my test as well, I suspect
> it was because some TCP sockets stays in either TCP_TIME_WAIT or
> TCP_FIN_WAIT1. But eventually those sockets get destroyed and lo gets
> deleted as well.

But TIMEWAIT sockets are purged by inet_twsk_purge() during netns
destroy, apparently before lo is destroyed.

>
> The patch was fixing an issue I am seeing, that the message gets looped
> forever, and causing a deadlock on new network ns creation.
>

But as DaveM said, the race still could happen after your patch,
the if check you add is not atomic at all.

Also, we should already lock the sock at the time we call
inet_sk_rx_dst_set(), but perhaps not for TCP_LISTEN case...

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] [PATCH net] net: Do not hold the reference for the same sk_rx_dst.
  2017-03-16  8:08 [PATCH] [PATCH net] net: Do not hold the reference for the same sk_rx_dst Kevin Xu
  2017-03-16 10:01 ` Jakub Sitnicki
  2017-03-16 18:16 ` David Miller
@ 2017-03-16 21:33 ` Eric Dumazet
  2017-03-16 23:13   ` Cong Wang
  2 siblings, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2017-03-16 21:33 UTC (permalink / raw)
  To: Kevin Xu; +Cc: davem, netdev

On Thu, 2017-03-16 at 01:08 -0700, Kevin Xu wrote:
> In some rare cases, inet_sk_rx_dst_set() may be called multiple times
> on the same dst, causing reference count leakage. Eventually, it
> prevents net_device to be destroyed. The bug then manifested as
> 
> unregister_netdevice: waiting for lo to become free. Usage count = 1
> 
> in the kernel log, preventing new network namespace creation.
> 
> The patch works around the issue by checking whether the socket already
> has the same dst set.
> 
> Signed-off-by: Kevin Xu <kaiwen.xu@hulu.com>
> ---
>  net/ipv4/tcp_ipv4.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index 575e19d..f425c14 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -1807,9 +1807,14 @@ void inet_sk_rx_dst_set(struct sock *sk, const struct sk_buff *skb)
>  {
>  	struct dst_entry *dst = skb_dst(skb);
>  
> -	if (dst && dst_hold_safe(dst)) {
> -		sk->sk_rx_dst = dst;
> -		inet_sk(sk)->rx_dst_ifindex = skb->skb_iif;
> +	if (dst) {
> +		if (unlikely(dst == sk->sk_rx_dst))
> +			return;
> +
> +		if (dst_hold_safe(dst)) {
> +			sk->sk_rx_dst = dst;
> +			inet_sk(sk)->rx_dst_ifindex = skb->skb_iif;
> +		}


That should not be needed.

Have you backported the redirect fix ?

commit 45caeaa5ac0b4b11784ac6f932c0ad4c6b67cda0

Or other fixes that went very recently (pick David Miller net tree)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] [PATCH net] net: Do not hold the reference for the same sk_rx_dst.
  2017-03-16 21:33 ` [PATCH] [PATCH net] " Eric Dumazet
@ 2017-03-16 23:13   ` Cong Wang
  2017-03-16 23:29     ` Eric Dumazet
  0 siblings, 1 reply; 15+ messages in thread
From: Cong Wang @ 2017-03-16 23:13 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Kevin Xu, David Miller, Linux Kernel Network Developers

On Thu, Mar 16, 2017 at 2:33 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Have you backported the redirect fix ?
>
> commit 45caeaa5ac0b4b11784ac6f932c0ad4c6b67cda0
>
> Or other fixes that went very recently (pick David Miller net tree)

Why the commit above is relevant here? It fixes a double-release,
while Kevin's case is a double-hold... Not to mention it sk_rx_dst
instead of sk_dst_cache, according to Kevin.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] [PATCH net] net: Do not hold the reference for the same sk_rx_dst.
  2017-03-16 23:13   ` Cong Wang
@ 2017-03-16 23:29     ` Eric Dumazet
  0 siblings, 0 replies; 15+ messages in thread
From: Eric Dumazet @ 2017-03-16 23:29 UTC (permalink / raw)
  To: Cong Wang; +Cc: Kevin Xu, David Miller, Linux Kernel Network Developers

On Thu, 2017-03-16 at 16:13 -0700, Cong Wang wrote:
> On Thu, Mar 16, 2017 at 2:33 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > Have you backported the redirect fix ?
> >
> > commit 45caeaa5ac0b4b11784ac6f932c0ad4c6b67cda0
> >
> > Or other fixes that went very recently (pick David Miller net tree)
> 
> Why the commit above is relevant here? It fixes a double-release,
> while Kevin's case is a double-hold... Not to mention it sk_rx_dst
> instead of sk_dst_cache, according to Kevin.

Yes you are right.

I was lazy, the other fix I wanted to mention was 
02b2faaf0af1d85585f

In any case I wanted to make sure the reported issue is not for some old
kernels.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH net v2] net: Do not hold the reference for the same sk_rx_dst
  2017-03-16 18:16 ` David Miller
@ 2017-03-19  1:48   ` Kevin Xu
  2017-03-19  3:49     ` Cong Wang
  0 siblings, 1 reply; 15+ messages in thread
From: Kevin Xu @ 2017-03-19  1:48 UTC (permalink / raw)
  To: davem; +Cc: netdev, Kevin Xu

In some rare cases, inet_sk_rx_dst_set() may be called multiple times
on the same dst, causing double refcounting. Eventually, it
prevents net_device to be destroyed. The bug manifested as

unregister_netdevice: waiting for lo to become free. Usage count = 1

in the kernel log, preventing new network namespace creation.

Signed-off-by: Kevin Xu <kaiwen.xu@hulu.com>
---
 net/ipv4/tcp_ipv4.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 575e19d..ca588d1 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1806,9 +1806,11 @@ int tcp_v4_rcv(struct sk_buff *skb)
 void inet_sk_rx_dst_set(struct sock *sk, const struct sk_buff *skb)
 {
 	struct dst_entry *dst = skb_dst(skb);
+	struct dst_entry *old;
 
 	if (dst && dst_hold_safe(dst)) {
-		sk->sk_rx_dst = dst;
+		old = xchg(&sk->sk_rx_dst, dst);
+		dst_release(old);
 		inet_sk(sk)->rx_dst_ifindex = skb->skb_iif;
 	}
 }
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH net v2] net: Do not hold the reference for the same sk_rx_dst
  2017-03-19  1:48   ` [PATCH net v2] " Kevin Xu
@ 2017-03-19  3:49     ` Cong Wang
  2017-03-19  4:03       ` Kaiwen Xu
  0 siblings, 1 reply; 15+ messages in thread
From: Cong Wang @ 2017-03-19  3:49 UTC (permalink / raw)
  To: Kevin Xu; +Cc: David Miller, Linux Kernel Network Developers

On Sat, Mar 18, 2017 at 6:48 PM, Kevin Xu <kaiwen.xu@hulu.com> wrote:
> In some rare cases, inet_sk_rx_dst_set() may be called multiple times
> on the same dst, causing double refcounting. Eventually, it
> prevents net_device to be destroyed. The bug manifested as
>
> unregister_netdevice: waiting for lo to become free. Usage count = 1
>
> in the kernel log, preventing new network namespace creation.
>
> Signed-off-by: Kevin Xu <kaiwen.xu@hulu.com>

Don't know why you don't follow the discussion on your v1...

It is protected by bh_lock_sock(), so your patch is not needed
at all.

Read net/ipv4/udp.c:

1762 /* For TCP sockets, sk_rx_dst is protected by socket lock
1763  * For UDP, we use xchg() to guard against concurrent changes.
1764  */

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net v2] net: Do not hold the reference for the same sk_rx_dst
  2017-03-19  3:49     ` Cong Wang
@ 2017-03-19  4:03       ` Kaiwen Xu
  2017-03-20  4:09         ` Cong Wang
  0 siblings, 1 reply; 15+ messages in thread
From: Kaiwen Xu @ 2017-03-19  4:03 UTC (permalink / raw)
  To: Cong Wang; +Cc: David Miller, Linux Kernel Network Developers

On Sat, Mar 18, 2017 at 08:49:43PM -0700, Cong Wang wrote:
> On Sat, Mar 18, 2017 at 6:48 PM, Kevin Xu <kaiwen.xu@hulu.com> wrote:
> > In some rare cases, inet_sk_rx_dst_set() may be called multiple times
> > on the same dst, causing double refcounting. Eventually, it
> > prevents net_device to be destroyed. The bug manifested as
> >
> > unregister_netdevice: waiting for lo to become free. Usage count = 1
> >
> > in the kernel log, preventing new network namespace creation.
> >
> > Signed-off-by: Kevin Xu <kaiwen.xu@hulu.com>
> 
> Don't know why you don't follow the discussion on your v1...
> 
> It is protected by bh_lock_sock(), so your patch is not needed
> at all.
> 
> Read net/ipv4/udp.c:
> 
> 1762 /* For TCP sockets, sk_rx_dst is protected by socket lock
> 1763  * For UDP, we use xchg() to guard against concurrent changes.
> 1764  */

I probably misunderstood. Do you mean v2 patch is actually not needed or
the whole workaround is not necessary?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net v2] net: Do not hold the reference for the same sk_rx_dst
  2017-03-19  4:03       ` Kaiwen Xu
@ 2017-03-20  4:09         ` Cong Wang
  2017-03-20 21:23           ` Kaiwen Xu
  0 siblings, 1 reply; 15+ messages in thread
From: Cong Wang @ 2017-03-20  4:09 UTC (permalink / raw)
  To: Kaiwen Xu; +Cc: David Miller, Linux Kernel Network Developers

On Sat, Mar 18, 2017 at 9:03 PM, Kaiwen Xu <kaiwen.xu@hulu.com> wrote:
> On Sat, Mar 18, 2017 at 08:49:43PM -0700, Cong Wang wrote:
>> On Sat, Mar 18, 2017 at 6:48 PM, Kevin Xu <kaiwen.xu@hulu.com> wrote:
>> > In some rare cases, inet_sk_rx_dst_set() may be called multiple times
>> > on the same dst, causing double refcounting. Eventually, it
>> > prevents net_device to be destroyed. The bug manifested as
>> >
>> > unregister_netdevice: waiting for lo to become free. Usage count = 1
>> >
>> > in the kernel log, preventing new network namespace creation.
>> >
>> > Signed-off-by: Kevin Xu <kaiwen.xu@hulu.com>
>>
>> Don't know why you don't follow the discussion on your v1...
>>
>> It is protected by bh_lock_sock(), so your patch is not needed
>> at all.
>>
>> Read net/ipv4/udp.c:
>>
>> 1762 /* For TCP sockets, sk_rx_dst is protected by socket lock
>> 1763  * For UDP, we use xchg() to guard against concurrent changes.
>> 1764  */
>
> I probably misunderstood. Do you mean v2 patch is actually not needed or
> the whole workaround is not necessary?

Your patch, no matter v1 or v2, is not needed because we use
bh_lock_sock() to serialize inet_sk_rx_dst_set(), unless you find
a case where we miss the bh_lock_sock(), but you don't say it in
your changelog. "some rare cases" is not enough to justify this bug.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net v2] net: Do not hold the reference for the same sk_rx_dst
  2017-03-20  4:09         ` Cong Wang
@ 2017-03-20 21:23           ` Kaiwen Xu
  0 siblings, 0 replies; 15+ messages in thread
From: Kaiwen Xu @ 2017-03-20 21:23 UTC (permalink / raw)
  To: Cong Wang; +Cc: David Miller, Linux Kernel Network Developers

On Sun, Mar 19, 2017 at 09:09:38PM -0700, Cong Wang wrote:
> On Sat, Mar 18, 2017 at 9:03 PM, Kaiwen Xu <kaiwen.xu@hulu.com> wrote:
> > On Sat, Mar 18, 2017 at 08:49:43PM -0700, Cong Wang wrote:
> >> On Sat, Mar 18, 2017 at 6:48 PM, Kevin Xu <kaiwen.xu@hulu.com> wrote:
> >> > In some rare cases, inet_sk_rx_dst_set() may be called multiple times
> >> > on the same dst, causing double refcounting. Eventually, it
> >> > prevents net_device to be destroyed. The bug manifested as
> >> >
> >> > unregister_netdevice: waiting for lo to become free. Usage count = 1
> >> >
> >> > in the kernel log, preventing new network namespace creation.
> >> >
> >> > Signed-off-by: Kevin Xu <kaiwen.xu@hulu.com>
> >>
> >> Don't know why you don't follow the discussion on your v1...
> >>
> >> It is protected by bh_lock_sock(), so your patch is not needed
> >> at all.
> >>
> >> Read net/ipv4/udp.c:
> >>
> >> 1762 /* For TCP sockets, sk_rx_dst is protected by socket lock
> >> 1763  * For UDP, we use xchg() to guard against concurrent changes.
> >> 1764  */
> >
> > I probably misunderstood. Do you mean v2 patch is actually not needed or
> > the whole workaround is not necessary?
> 
> Your patch, no matter v1 or v2, is not needed because we use
> bh_lock_sock() to serialize inet_sk_rx_dst_set(), unless you find
> a case where we miss the bh_lock_sock(), but you don't say it in
> your changelog. "some rare cases" is not enough to justify this bug.

I see, thanks for your explanation! I will try to dig in more to see if
I can find the root cause.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2017-03-20 21:58 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-16  8:08 [PATCH] [PATCH net] net: Do not hold the reference for the same sk_rx_dst Kevin Xu
2017-03-16 10:01 ` Jakub Sitnicki
2017-03-16 10:12   ` Kevin Xu
2017-03-16 10:45     ` Jakub Sitnicki
2017-03-16 18:18       ` Kaiwen Xu
2017-03-16 18:42         ` Cong Wang
2017-03-16 18:16 ` David Miller
2017-03-19  1:48   ` [PATCH net v2] " Kevin Xu
2017-03-19  3:49     ` Cong Wang
2017-03-19  4:03       ` Kaiwen Xu
2017-03-20  4:09         ` Cong Wang
2017-03-20 21:23           ` Kaiwen Xu
2017-03-16 21:33 ` [PATCH] [PATCH net] " Eric Dumazet
2017-03-16 23:13   ` Cong Wang
2017-03-16 23:29     ` Eric Dumazet

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.