All of lore.kernel.org
 help / color / mirror / Atom feed
* Panic in udp4_lib_lookup2
@ 2021-05-12 18:21 kapandey
  2021-05-13  8:05 ` Paolo Abeni
  0 siblings, 1 reply; 7+ messages in thread
From: kapandey @ 2021-05-12 18:21 UTC (permalink / raw)
  To: willemb, pabeni, netdev, eric.dumazet, sharathv, subashab

Hi All,

We observed panic in udp_lib_lookup with below call trace:
[136523.743271]  (7) Call trace:
[136523.743275]  (7)  udp4_lib_lookup2+0x88/0x1d8
[136523.743277]  (7)  __udp4_lib_lookup+0x168/0x194
[136523.743280]  (7)  udp4_lib_lookup+0x28/0x54
[136523.743285]  (7)  nf_sk_lookup_slow_v4+0x2b4/0x384
[136523.743289]  (7)  owner_mt+0xb8/0x248
[136523.743292]  (7)  ipt_do_table+0x28c/0x6a8
[136523.743295]  (7) iptable_filter_hook+0x24/0x30
[136523.743299]  (7)  nf_hook_slow+0xa8/0x148
[136523.743303]  (7)  ip_local_deliver+0xa8/0x14c
[136523.743305]  (7)  ip_rcv+0xe0/0x134


We suspect this might happen due to below sequence:
Time                                                   CPU X             
                                                                          
                                    CPU Y
t0                                inet_release -> udp_lib_close -> 
sk_common_release -> udp_lib_unhash                            
inet_diag_handler_cmd -> udp_diag_destroy -> __udp_diag_destroy -> 
udp_lib_rehash

t1                                if(sk hashed)                          
                                                                          
                         if(sk hashed)
                                          sk_del_node_init_rcu
                                          
hlist_del_init_rcu(udp_portaddr_node)

t2                                                                       
                                                                          
                                        if(hslot2 != nhslot2)
                                                                          
                                                                          
                                                   
hlist_del_init_rcu(udp_portaddr_node)
                                                                          
                                                                          
                                                   
hlist_add_head_rcu(udp_portaddr_node)

t3                                sk is freed here

After t3 any use of that added udp_portaddr_node  will cause panic.
Same happens in udp4_lib_lookup2/udp6_lib_lookup2.

We dont have any reproducer for this, getting hit during random testing.

Pls let us know how this be resolved.

Thanks,
KP

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Panic in udp4_lib_lookup2
  2021-05-12 18:21 Panic in udp4_lib_lookup2 kapandey
@ 2021-05-13  8:05 ` Paolo Abeni
  2021-05-13 19:48   ` Kaustubh Pandey
  0 siblings, 1 reply; 7+ messages in thread
From: Paolo Abeni @ 2021-05-13  8:05 UTC (permalink / raw)
  To: kapandey, willemb, netdev, eric.dumazet, sharathv, subashab

Hello,

On Wed, 2021-05-12 at 23:51 +0530, kapandey@codeaurora.org wrote:
> We observed panic in udp_lib_lookup with below call trace:
> [136523.743271]  (7) Call trace:
> [136523.743275]  (7)  udp4_lib_lookup2+0x88/0x1d8
> [136523.743277]  (7)  __udp4_lib_lookup+0x168/0x194
> [136523.743280]  (7)  udp4_lib_lookup+0x28/0x54
> [136523.743285]  (7)  nf_sk_lookup_slow_v4+0x2b4/0x384
> [136523.743289]  (7)  owner_mt+0xb8/0x248
> [136523.743292]  (7)  ipt_do_table+0x28c/0x6a8
> [136523.743295]  (7) iptable_filter_hook+0x24/0x30
> [136523.743299]  (7)  nf_hook_slow+0xa8/0x148
> [136523.743303]  (7)  ip_local_deliver+0xa8/0x14c
> [136523.743305]  (7)  ip_rcv+0xe0/0x134

It would be helpful if you could provide a full, decoded, stack trace
and the relevant kernel version.

> We suspect this might happen due to below sequence:

Some email formatting made the "graph" very hard to read...

> Time                                                   CPU X             
>                                                                           
>                                     CPU Y
> t0                                inet_release -> udp_lib_close -> 
> sk_common_release -> udp_lib_unhash                            
> inet_diag_handler_cmd -> udp_diag_destroy -> __udp_diag_destroy -> 
> udp_lib_rehash

... but it looks like udp_lib_close() is missing a
lock_sock()/release_sock() pair. Something alike:
---
diff --git a/include/net/udp.h b/include/net/udp.h
index 360df454356c..06586b42db3f 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -209,7 +209,9 @@ void udp_lib_rehash(struct sock *sk, u16 new_hash);
 
 static inline void udp_lib_close(struct sock *sk, long timeout)
 {
+	lock_sock(sk);
 	sk_common_release(sk);
+	release_sock(sk);
 }
 
 int udp_lib_get_port(struct sock *sk, unsigned short snum,
---

could u please give the above a spin in your testbed?

Thanks!

Paolo



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: Panic in udp4_lib_lookup2
  2021-05-13  8:05 ` Paolo Abeni
@ 2021-05-13 19:48   ` Kaustubh Pandey
  2021-05-13 21:18     ` Paolo Abeni
  0 siblings, 1 reply; 7+ messages in thread
From: Kaustubh Pandey @ 2021-05-13 19:48 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: willemb, netdev, eric.dumazet, sharathv, subashab

On Thu, May 13, 2021 at 10:05:37AM +0200, Paolo Abeni wrote:
> Hello,
>
> On Wed, 2021-05-12 at 23:51 +0530, kapandey@codeaurora.org wrote:
> > We observed panic in udp_lib_lookup with below call trace:
> > [136523.743271]  (7) Call trace:
> > [136523.743275]  (7)  udp4_lib_lookup2+0x88/0x1d8
> > [136523.743277]  (7)  __udp4_lib_lookup+0x168/0x194
> > [136523.743280]  (7)  udp4_lib_lookup+0x28/0x54
> > [136523.743285]  (7)  nf_sk_lookup_slow_v4+0x2b4/0x384
> > [136523.743289]  (7)  owner_mt+0xb8/0x248
> > [136523.743292]  (7)  ipt_do_table+0x28c/0x6a8
> > [136523.743295]  (7) iptable_filter_hook+0x24/0x30
> > [136523.743299]  (7)  nf_hook_slow+0xa8/0x148
> > [136523.743303]  (7)  ip_local_deliver+0xa8/0x14c
> > [136523.743305]  (7)  ip_rcv+0xe0/0x134
>
> It would be helpful if you could provide a full, decoded, stack trace
> and the relevant kernel version.
>
> > We suspect this might happen due to below sequence:
>
> Some email formatting made the "graph" very hard to read...
>
> > Time                                                   CPU X
> >
> >                                     CPU Y
> > t0                                inet_release -> udp_lib_close ->
> > sk_common_release -> udp_lib_unhash
> > inet_diag_handler_cmd -> udp_diag_destroy -> __udp_diag_destroy ->
> > udp_lib_rehash
>
> ... but it looks like udp_lib_close() is missing a
> lock_sock()/release_sock() pair. Something alike:
> ---
> diff --git a/include/net/udp.h b/include/net/udp.h
> index 360df454356c..06586b42db3f 100644
> --- a/include/net/udp.h
> +++ b/include/net/udp.h
> @@ -209,7 +209,9 @@ void udp_lib_rehash(struct sock *sk, u16 new_hash);
>
>  static inline void udp_lib_close(struct sock *sk, long timeout)
>  {
> +	lock_sock(sk);
>  	sk_common_release(sk);
> +	release_sock(sk);
>  }
>
>  int udp_lib_get_port(struct sock *sk, unsigned short snum,
> ---
>
> could u please give the above a spin in your testbed?
>
> Thanks!
>
> Paolo
>
>
Hi Paolo,

Pls find full backtrace:
[136523.743141]Unable to handle kernel paging request at virtual address
00000c08ff000820

[136523.743147]Mem abort info:
[136523.743150]  ESR = 0x96000004
[136523.743154]  EC = 0x25: DABT (current EL), IL = 32 bits
[136523.743157]  SET = 0, FnV = 0
[136523.743159]  EA = 0, S1PTW = 0
[136523.743162]Data abort info:
[136523.743164]  ISV = 0, ISS = 0x00000004
[136523.743167]  CM = 0, WnR = 0
[136523.743170][00000c08ff000820] address between user and kernel
address ranges
[136523.743174]Internal error: Oops: 96000004 [#1] PREEMPT SMP
[136523.743218]pstate: a0c00005 (NzCv daif +PAN +UAO)

[136523.743226]pc : udp4_lib_lookup2+0x88/0x1d8
[136523.743229]lr : __udp4_lib_lookup+0x168/0x194

[136523.743232]sp : ffffffc039a9b600
[136523.743234]x29: ffffffc039a9b610 x28: 0000000000000000
[136523.743237]x27: 0000000000000000 x26: ff000c08ff0007f0
[136523.743239]x25: ffffffe1e2319b40 x24: 0000000073737272
[136523.743242]x23: 0000000000003500 x22: 0000000000000000
[136523.743244]x21: 000000000000a622 x20: 000000000000001a
[136523.743246]x19: 00000000ffffffff x18: ffffffc02b6d7098
[136523.743249]x17: 0000000000000000 x16: 0000000000000000
[136523.743251]x15: 0000000000000000 x14: 0000000000000000
[136523.743254]x13: 0000000000000000 x12: ffffff81d8e30000
[136523.743257]x11: 0000000000000fff x10: ff000c08ff0007f0
[136523.743259]x9 : ff000c08ff000808 x8 : 0000000000000000
[136523.743261]x7 : ffffff81d8e3e6a0 x6 : 0000000000000000
[136523.743263]x5 : 000000000000001a x4 : 000000000000a622
[136523.743266]x3 : 0000000000000000 x2 : 0000000000003500
[136523.743268]x1 : 0000000073737272 x0 : ffffffe1e2319b40
[136523.743271]Call trace:
[136523.743275] udp4_lib_lookup2+0x88/0x1d8
[136523.743277] __udp4_lib_lookup+0x168/0x194
[136523.743280] udp4_lib_lookup+0x28/0x54
[136523.743285] nf_sk_lookup_slow_v4+0x2b4/0x384
[136523.743289] owner_mt+0xb8/0x248
[136523.743292] ipt_do_table+0x28c/0x6a8
[136523.743295] iptable_filter_hook+0x24/0x30
[136523.743299] nf_hook_slow+0xa8/0x148
[136523.743303] ip_local_deliver+0xa8/0x14c
[136523.743305] ip_rcv+0xe0/0x134
[136523.743309] __netif_receive_skb_core+0x9d0/0xd74
[136523.743313] __netif_receive_skb+0x50/0x17c
[136523.743316] netif_receive_skb_internal+0xa0/0xd8
[136523.743318] netif_receive_skb+0xec/0x1e4

Kernel version used is 5.4

Derived from your suggestion I think below change might be more suited
here:
diff --git a/net/core/sock.c b/net/core/sock.c
index f87c9f7..a5dd0e5 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -3210,7 +3210,9 @@ void sk_common_release(struct sock *sk)
         * A. Remove from hash tables.
         */
+       lock_sock(sk);
       sk->sk_prot->unhash(sk);
+       release_sock(sk);

--

Previously I was trying to share this anlaysis:

Time |		CPU[X]		               |		CPU[Y]
____________________________________________________________________________
     |				               |
 t0  | inet_release                            | inet_diag_handler_cmd
     |	udp_lib_close                          |  udp_diag_destroy
     |	 sk_common_release                     |   __udp_diag_destroy
     |     udp_lib_unhash                      |    udp_lib_rehash
____________________________________________________________________________

 t1  | if(sk hashed)                           |  if(sk hashed)
     |   sk_del_node_init_rcu                  |
     |   hlist_del_init_rcu(udp_portaddr_node) |
____________________________________________________________________________

 t2  |                                         |    if(hslot2 != nhslot2)
     |                                         |       hlist_del_init_rcu(udp_portaddr_node)
     |                                         |       hlist_add_head_rcu(udp_portaddr_node)
____________________________________________________________________________

 t3  | sk is freed here                        |

 Pls let me know your views on this.

 Thanks
 KP

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: Panic in udp4_lib_lookup2
  2021-05-13 19:48   ` Kaustubh Pandey
@ 2021-05-13 21:18     ` Paolo Abeni
  2021-05-18 14:20       ` Kaustubh Pandey
  0 siblings, 1 reply; 7+ messages in thread
From: Paolo Abeni @ 2021-05-13 21:18 UTC (permalink / raw)
  To: Kaustubh Pandey; +Cc: willemb, netdev, eric.dumazet, sharathv, subashab

On Fri, 2021-05-14 at 01:18 +0530, Kaustubh Pandey wrote:
> On Thu, May 13, 2021 at 10:05:37AM +0200, Paolo Abeni wrote:
> > Hello,
> > 
> > On Wed, 2021-05-12 at 23:51 +0530, kapandey@codeaurora.org wrote:
> > > We observed panic in udp_lib_lookup with below call trace:
> > > [136523.743271]  (7) Call trace:
> > > [136523.743275]  (7)  udp4_lib_lookup2+0x88/0x1d8
> > > [136523.743277]  (7)  __udp4_lib_lookup+0x168/0x194
> > > [136523.743280]  (7)  udp4_lib_lookup+0x28/0x54
> > > [136523.743285]  (7)  nf_sk_lookup_slow_v4+0x2b4/0x384
> > > [136523.743289]  (7)  owner_mt+0xb8/0x248
> > > [136523.743292]  (7)  ipt_do_table+0x28c/0x6a8
> > > [136523.743295]  (7) iptable_filter_hook+0x24/0x30
> > > [136523.743299]  (7)  nf_hook_slow+0xa8/0x148
> > > [136523.743303]  (7)  ip_local_deliver+0xa8/0x14c
> > > [136523.743305]  (7)  ip_rcv+0xe0/0x134
> > 
> > It would be helpful if you could provide a full, decoded, stack trace
> > and the relevant kernel version.
> > 
> > > We suspect this might happen due to below sequence:
> > 
> > Some email formatting made the "graph" very hard to read...
> > 
> > > Time                                                   CPU X
> > > 
> > >                                     CPU Y
> > > t0                                inet_release -> udp_lib_close ->
> > > sk_common_release -> udp_lib_unhash
> > > inet_diag_handler_cmd -> udp_diag_destroy -> __udp_diag_destroy ->
> > > udp_lib_rehash
> > 
> > ... but it looks like udp_lib_close() is missing a
> > lock_sock()/release_sock() pair. Something alike:
> > ---
> > diff --git a/include/net/udp.h b/include/net/udp.h
> > index 360df454356c..06586b42db3f 100644
> > --- a/include/net/udp.h
> > +++ b/include/net/udp.h
> > @@ -209,7 +209,9 @@ void udp_lib_rehash(struct sock *sk, u16 new_hash);
> > 
> >  static inline void udp_lib_close(struct sock *sk, long timeout)
> >  {
> > +	lock_sock(sk);
> >  	sk_common_release(sk);
> > +	release_sock(sk);
> >  }
> > 
> >  int udp_lib_get_port(struct sock *sk, unsigned short snum,
> > ---
> > 
> > could u please give the above a spin in your testbed?
> > 
> > Thanks!
> > 
> > Paolo
> > 
> > 
> Hi Paolo,
> 
> Pls find full backtrace:
> [136523.743141]Unable to handle kernel paging request at virtual address
> 00000c08ff000820
> 
> [136523.743147]Mem abort info:
> [136523.743150]  ESR = 0x96000004
> [136523.743154]  EC = 0x25: DABT (current EL), IL = 32 bits
> [136523.743157]  SET = 0, FnV = 0
> [136523.743159]  EA = 0, S1PTW = 0
> [136523.743162]Data abort info:
> [136523.743164]  ISV = 0, ISS = 0x00000004
> [136523.743167]  CM = 0, WnR = 0
> [136523.743170][00000c08ff000820] address between user and kernel
> address ranges
> [136523.743174]Internal error: Oops: 96000004 [#1] PREEMPT SMP
> [136523.743218]pstate: a0c00005 (NzCv daif +PAN +UAO)
> 
> [136523.743226]pc : udp4_lib_lookup2+0x88/0x1d8
> [136523.743229]lr : __udp4_lib_lookup+0x168/0x194
> 
> [136523.743232]sp : ffffffc039a9b600
> [136523.743234]x29: ffffffc039a9b610 x28: 0000000000000000
> [136523.743237]x27: 0000000000000000 x26: ff000c08ff0007f0
> [136523.743239]x25: ffffffe1e2319b40 x24: 0000000073737272
> [136523.743242]x23: 0000000000003500 x22: 0000000000000000
> [136523.743244]x21: 000000000000a622 x20: 000000000000001a
> [136523.743246]x19: 00000000ffffffff x18: ffffffc02b6d7098
> [136523.743249]x17: 0000000000000000 x16: 0000000000000000
> [136523.743251]x15: 0000000000000000 x14: 0000000000000000
> [136523.743254]x13: 0000000000000000 x12: ffffff81d8e30000
> [136523.743257]x11: 0000000000000fff x10: ff000c08ff0007f0
> [136523.743259]x9 : ff000c08ff000808 x8 : 0000000000000000
> [136523.743261]x7 : ffffff81d8e3e6a0 x6 : 0000000000000000
> [136523.743263]x5 : 000000000000001a x4 : 000000000000a622
> [136523.743266]x3 : 0000000000000000 x2 : 0000000000003500
> [136523.743268]x1 : 0000000073737272 x0 : ffffffe1e2319b40
> [136523.743271]Call trace:
> [136523.743275] udp4_lib_lookup2+0x88/0x1d8
> [136523.743277] __udp4_lib_lookup+0x168/0x194
> [136523.743280] udp4_lib_lookup+0x28/0x54
> [136523.743285] nf_sk_lookup_slow_v4+0x2b4/0x384
> [136523.743289] owner_mt+0xb8/0x248
> [136523.743292] ipt_do_table+0x28c/0x6a8
> [136523.743295] iptable_filter_hook+0x24/0x30
> [136523.743299] nf_hook_slow+0xa8/0x148
> [136523.743303] ip_local_deliver+0xa8/0x14c
> [136523.743305] ip_rcv+0xe0/0x134
> [136523.743309] __netif_receive_skb_core+0x9d0/0xd74
> [136523.743313] __netif_receive_skb+0x50/0x17c
> [136523.743316] netif_receive_skb_internal+0xa0/0xd8
> [136523.743318] netif_receive_skb+0xec/0x1e4
> 
> Kernel version used is 5.4
> 
> Derived from your suggestion I think below change might be more suited
> here:
> diff --git a/net/core/sock.c b/net/core/sock.c
> index f87c9f7..a5dd0e5 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -3210,7 +3210,9 @@ void sk_common_release(struct sock *sk)
>          * A. Remove from hash tables.
>          */
> +       lock_sock(sk);
>        sk->sk_prot->unhash(sk);
> +       release_sock(sk);

The above will cause deadlock on protocol rightfully calling
sk_common_release under the socket lock (e.g. sctp).

Could you please test the change I suggested in your testbed?

Thanks

Paolo


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Panic in udp4_lib_lookup2
  2021-05-13 21:18     ` Paolo Abeni
@ 2021-05-18 14:20       ` Kaustubh Pandey
  2021-05-21 18:18         ` Paolo Abeni
  0 siblings, 1 reply; 7+ messages in thread
From: Kaustubh Pandey @ 2021-05-18 14:20 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: willemb, netdev, eric.dumazet, sharathv, subashab

On Thu, May 13, 2021 at 11:18:53PM +0200, Paolo Abeni wrote:
> On Fri, 2021-05-14 at 01:18 +0530, Kaustubh Pandey wrote:
> > On Thu, May 13, 2021 at 10:05:37AM +0200, Paolo Abeni wrote:
> > > Hello,
> > > 
> > > On Wed, 2021-05-12 at 23:51 +0530, kapandey@codeaurora.org wrote:
> > > > We observed panic in udp_lib_lookup with below call trace:
> > > > [136523.743271]  (7) Call trace:
> > > > [136523.743275]  (7)  udp4_lib_lookup2+0x88/0x1d8
> > > > [136523.743277]  (7)  __udp4_lib_lookup+0x168/0x194
> > > > [136523.743280]  (7)  udp4_lib_lookup+0x28/0x54
> > > > [136523.743285]  (7)  nf_sk_lookup_slow_v4+0x2b4/0x384
> > > > [136523.743289]  (7)  owner_mt+0xb8/0x248
> > > > [136523.743292]  (7)  ipt_do_table+0x28c/0x6a8
> > > > [136523.743295]  (7) iptable_filter_hook+0x24/0x30
> > > > [136523.743299]  (7)  nf_hook_slow+0xa8/0x148
> > > > [136523.743303]  (7)  ip_local_deliver+0xa8/0x14c
> > > > [136523.743305]  (7)  ip_rcv+0xe0/0x134
> > > 
> > > It would be helpful if you could provide a full, decoded, stack trace
> > > and the relevant kernel version.
> > > 
> > > > We suspect this might happen due to below sequence:
> > > 
> > > Some email formatting made the "graph" very hard to read...
> > > 
> > > > Time                                                   CPU X
> > > > 
> > > >                                     CPU Y
> > > > t0                                inet_release -> udp_lib_close ->
> > > > sk_common_release -> udp_lib_unhash
> > > > inet_diag_handler_cmd -> udp_diag_destroy -> __udp_diag_destroy ->
> > > > udp_lib_rehash
> > > 
> > > ... but it looks like udp_lib_close() is missing a
> > > lock_sock()/release_sock() pair. Something alike:
> > > ---
> > > diff --git a/include/net/udp.h b/include/net/udp.h
> > > index 360df454356c..06586b42db3f 100644
> > > --- a/include/net/udp.h
> > > +++ b/include/net/udp.h
> > > @@ -209,7 +209,9 @@ void udp_lib_rehash(struct sock *sk, u16 new_hash);
> > > 
> > >  static inline void udp_lib_close(struct sock *sk, long timeout)
> > >  {
> > > +	lock_sock(sk);
> > >  	sk_common_release(sk);
> > > +	release_sock(sk);
> > >  }
> > > 
> > >  int udp_lib_get_port(struct sock *sk, unsigned short snum,
> > > ---
> > > 
> > > could u please give the above a spin in your testbed?
> > > 
> > > Thanks!
> > > 
> > > Paolo
> > > 
> > > 
> > Hi Paolo,
> > 
> > Pls find full backtrace:
> > [136523.743141]Unable to handle kernel paging request at virtual address
> > 00000c08ff000820
> > 
> > [136523.743147]Mem abort info:
> > [136523.743150]  ESR = 0x96000004
> > [136523.743154]  EC = 0x25: DABT (current EL), IL = 32 bits
> > [136523.743157]  SET = 0, FnV = 0
> > [136523.743159]  EA = 0, S1PTW = 0
> > [136523.743162]Data abort info:
> > [136523.743164]  ISV = 0, ISS = 0x00000004
> > [136523.743167]  CM = 0, WnR = 0
> > [136523.743170][00000c08ff000820] address between user and kernel
> > address ranges
> > [136523.743174]Internal error: Oops: 96000004 [#1] PREEMPT SMP
> > [136523.743218]pstate: a0c00005 (NzCv daif +PAN +UAO)
> > 
> > [136523.743226]pc : udp4_lib_lookup2+0x88/0x1d8
> > [136523.743229]lr : __udp4_lib_lookup+0x168/0x194
> > 
> > [136523.743232]sp : ffffffc039a9b600
> > [136523.743234]x29: ffffffc039a9b610 x28: 0000000000000000
> > [136523.743237]x27: 0000000000000000 x26: ff000c08ff0007f0
> > [136523.743239]x25: ffffffe1e2319b40 x24: 0000000073737272
> > [136523.743242]x23: 0000000000003500 x22: 0000000000000000
> > [136523.743244]x21: 000000000000a622 x20: 000000000000001a
> > [136523.743246]x19: 00000000ffffffff x18: ffffffc02b6d7098
> > [136523.743249]x17: 0000000000000000 x16: 0000000000000000
> > [136523.743251]x15: 0000000000000000 x14: 0000000000000000
> > [136523.743254]x13: 0000000000000000 x12: ffffff81d8e30000
> > [136523.743257]x11: 0000000000000fff x10: ff000c08ff0007f0
> > [136523.743259]x9 : ff000c08ff000808 x8 : 0000000000000000
> > [136523.743261]x7 : ffffff81d8e3e6a0 x6 : 0000000000000000
> > [136523.743263]x5 : 000000000000001a x4 : 000000000000a622
> > [136523.743266]x3 : 0000000000000000 x2 : 0000000000003500
> > [136523.743268]x1 : 0000000073737272 x0 : ffffffe1e2319b40
> > [136523.743271]Call trace:
> > [136523.743275] udp4_lib_lookup2+0x88/0x1d8
> > [136523.743277] __udp4_lib_lookup+0x168/0x194
> > [136523.743280] udp4_lib_lookup+0x28/0x54
> > [136523.743285] nf_sk_lookup_slow_v4+0x2b4/0x384
> > [136523.743289] owner_mt+0xb8/0x248
> > [136523.743292] ipt_do_table+0x28c/0x6a8
> > [136523.743295] iptable_filter_hook+0x24/0x30
> > [136523.743299] nf_hook_slow+0xa8/0x148
> > [136523.743303] ip_local_deliver+0xa8/0x14c
> > [136523.743305] ip_rcv+0xe0/0x134
> > [136523.743309] __netif_receive_skb_core+0x9d0/0xd74
> > [136523.743313] __netif_receive_skb+0x50/0x17c
> > [136523.743316] netif_receive_skb_internal+0xa0/0xd8
> > [136523.743318] netif_receive_skb+0xec/0x1e4
> > 
> > Kernel version used is 5.4
> > 
> > Derived from your suggestion I think below change might be more suited
> > here:
> > diff --git a/net/core/sock.c b/net/core/sock.c
> > index f87c9f7..a5dd0e5 100644
> > --- a/net/core/sock.c
> > +++ b/net/core/sock.c
> > @@ -3210,7 +3210,9 @@ void sk_common_release(struct sock *sk)
> >          * A. Remove from hash tables.
> >          */
> > +       lock_sock(sk);
> >        sk->sk_prot->unhash(sk);
> > +       release_sock(sk);
> 
> The above will cause deadlock on protocol rightfully calling
> sk_common_release under the socket lock (e.g. sctp).
> 
> Could you please test the change I suggested in your testbed?
> 
> Thanks
> 
> Paolo
> 
Hi Paolo,

I picked up your change and with that it get stuck in udp_destroy_sock
trying to do lock_sock_fast.

Can you guide for next steps here?

Thanks,
KP

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Panic in udp4_lib_lookup2
  2021-05-18 14:20       ` Kaustubh Pandey
@ 2021-05-21 18:18         ` Paolo Abeni
  2021-06-04 14:26           ` Kaustubh Pandey
  0 siblings, 1 reply; 7+ messages in thread
From: Paolo Abeni @ 2021-05-21 18:18 UTC (permalink / raw)
  To: Kaustubh Pandey; +Cc: willemb, netdev, eric.dumazet, sharathv, subashab

On Tue, 2021-05-18 at 19:50 +0530, Kaustubh Pandey wrote:
> On Thu, May 13, 2021 at 11:18:53PM +0200, Paolo Abeni wrote:
> > On Fri, 2021-05-14 at 01:18 +0530, Kaustubh Pandey wrote:
> > > On Thu, May 13, 2021 at 10:05:37AM +0200, Paolo Abeni wrote:
> > > > Hello,
> > > > 
> > > > On Wed, 2021-05-12 at 23:51 +0530, kapandey@codeaurora.org wrote:
> > > > > We observed panic in udp_lib_lookup with below call trace:
> > > > > [136523.743271]  (7) Call trace:
> > > > > [136523.743275]  (7)  udp4_lib_lookup2+0x88/0x1d8
> > > > > [136523.743277]  (7)  __udp4_lib_lookup+0x168/0x194
> > > > > [136523.743280]  (7)  udp4_lib_lookup+0x28/0x54
> > > > > [136523.743285]  (7)  nf_sk_lookup_slow_v4+0x2b4/0x384
> > > > > [136523.743289]  (7)  owner_mt+0xb8/0x248
> > > > > [136523.743292]  (7)  ipt_do_table+0x28c/0x6a8
> > > > > [136523.743295]  (7) iptable_filter_hook+0x24/0x30
> > > > > [136523.743299]  (7)  nf_hook_slow+0xa8/0x148
> > > > > [136523.743303]  (7)  ip_local_deliver+0xa8/0x14c
> > > > > [136523.743305]  (7)  ip_rcv+0xe0/0x134
> > > > 
> > > > It would be helpful if you could provide a full, decoded, stack trace
> > > > and the relevant kernel version.
> > > > 
> > > > > We suspect this might happen due to below sequence:
> > > > 
> > > > Some email formatting made the "graph" very hard to read...
> > > > 
> > > > > Time                                                   CPU X
> > > > > 
> > > > >                                     CPU Y
> > > > > t0                                inet_release -> udp_lib_close ->
> > > > > sk_common_release -> udp_lib_unhash
> > > > > inet_diag_handler_cmd -> udp_diag_destroy -> __udp_diag_destroy ->
> > > > > udp_lib_rehash
> > > > 
> > > > ... but it looks like udp_lib_close() is missing a
> > > > lock_sock()/release_sock() pair. Something alike:
> > > > ---
> > > > diff --git a/include/net/udp.h b/include/net/udp.h
> > > > index 360df454356c..06586b42db3f 100644
> > > > --- a/include/net/udp.h
> > > > +++ b/include/net/udp.h
> > > > @@ -209,7 +209,9 @@ void udp_lib_rehash(struct sock *sk, u16 new_hash);
> > > > 
> > > >  static inline void udp_lib_close(struct sock *sk, long timeout)
> > > >  {
> > > > +	lock_sock(sk);
> > > >  	sk_common_release(sk);
> > > > +	release_sock(sk);
> > > >  }
> > > > 
> > > >  int udp_lib_get_port(struct sock *sk, unsigned short snum,
> > > > ---
> > > > 
> > > > could u please give the above a spin in your testbed?
> > > > 
> > > > Thanks!
> > > > 
> > > > Paolo
> > > > 
> > > > 
> > > Hi Paolo,
> > > 
> > > Pls find full backtrace:
> > > [136523.743141]Unable to handle kernel paging request at virtual address
> > > 00000c08ff000820
> > > 
> > > [136523.743147]Mem abort info:
> > > [136523.743150]  ESR = 0x96000004
> > > [136523.743154]  EC = 0x25: DABT (current EL), IL = 32 bits
> > > [136523.743157]  SET = 0, FnV = 0
> > > [136523.743159]  EA = 0, S1PTW = 0
> > > [136523.743162]Data abort info:
> > > [136523.743164]  ISV = 0, ISS = 0x00000004
> > > [136523.743167]  CM = 0, WnR = 0
> > > [136523.743170][00000c08ff000820] address between user and kernel
> > > address ranges
> > > [136523.743174]Internal error: Oops: 96000004 [#1] PREEMPT SMP
> > > [136523.743218]pstate: a0c00005 (NzCv daif +PAN +UAO)
> > > 
> > > [136523.743226]pc : udp4_lib_lookup2+0x88/0x1d8
> > > [136523.743229]lr : __udp4_lib_lookup+0x168/0x194
> > > 
> > > [136523.743232]sp : ffffffc039a9b600
> > > [136523.743234]x29: ffffffc039a9b610 x28: 0000000000000000
> > > [136523.743237]x27: 0000000000000000 x26: ff000c08ff0007f0
> > > [136523.743239]x25: ffffffe1e2319b40 x24: 0000000073737272
> > > [136523.743242]x23: 0000000000003500 x22: 0000000000000000
> > > [136523.743244]x21: 000000000000a622 x20: 000000000000001a
> > > [136523.743246]x19: 00000000ffffffff x18: ffffffc02b6d7098
> > > [136523.743249]x17: 0000000000000000 x16: 0000000000000000
> > > [136523.743251]x15: 0000000000000000 x14: 0000000000000000
> > > [136523.743254]x13: 0000000000000000 x12: ffffff81d8e30000
> > > [136523.743257]x11: 0000000000000fff x10: ff000c08ff0007f0
> > > [136523.743259]x9 : ff000c08ff000808 x8 : 0000000000000000
> > > [136523.743261]x7 : ffffff81d8e3e6a0 x6 : 0000000000000000
> > > [136523.743263]x5 : 000000000000001a x4 : 000000000000a622
> > > [136523.743266]x3 : 0000000000000000 x2 : 0000000000003500
> > > [136523.743268]x1 : 0000000073737272 x0 : ffffffe1e2319b40
> > > [136523.743271]Call trace:
> > > [136523.743275] udp4_lib_lookup2+0x88/0x1d8
> > > [136523.743277] __udp4_lib_lookup+0x168/0x194
> > > [136523.743280] udp4_lib_lookup+0x28/0x54
> > > [136523.743285] nf_sk_lookup_slow_v4+0x2b4/0x384
> > > [136523.743289] owner_mt+0xb8/0x248
> > > [136523.743292] ipt_do_table+0x28c/0x6a8
> > > [136523.743295] iptable_filter_hook+0x24/0x30
> > > [136523.743299] nf_hook_slow+0xa8/0x148
> > > [136523.743303] ip_local_deliver+0xa8/0x14c
> > > [136523.743305] ip_rcv+0xe0/0x134
> > > [136523.743309] __netif_receive_skb_core+0x9d0/0xd74
> > > [136523.743313] __netif_receive_skb+0x50/0x17c
> > > [136523.743316] netif_receive_skb_internal+0xa0/0xd8
> > > [136523.743318] netif_receive_skb+0xec/0x1e4
> > > 
> > > Kernel version used is 5.4
> > > 
> > > Derived from your suggestion I think below change might be more suited
> > > here:
> > > diff --git a/net/core/sock.c b/net/core/sock.c
> > > index f87c9f7..a5dd0e5 100644
> > > --- a/net/core/sock.c
> > > +++ b/net/core/sock.c
> > > @@ -3210,7 +3210,9 @@ void sk_common_release(struct sock *sk)
> > >          * A. Remove from hash tables.
> > >          */
> > > +       lock_sock(sk);
> > >        sk->sk_prot->unhash(sk);
> > > +       release_sock(sk);
> > 
> > The above will cause deadlock on protocol rightfully calling
> > sk_common_release under the socket lock (e.g. sctp).
> > 
> > Could you please test the change I suggested in your testbed?
> > 
> > Thanks
> > 
> > Paolo
> > 
> Hi Paolo,
> 
> I picked up your change and with that it get stuck in udp_destroy_sock
> trying to do lock_sock_fast.
> 
> Can you guide for next steps here?

I'm sorry for the high latency. 

It's a bit tricky as we can't easily acquire the socket lock on close
with a wider scope.

Can you please try this one?

Thanks!

Paolo
---
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 055fceb18bea..0d6205726abf 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2607,6 +2607,7 @@ void udp_destroy_sock(struct sock *sk)
 {
 	struct udp_sock *up = udp_sk(sk);
 	bool slow = lock_sock_fast(sk);
+	sock_set_flag(sk, SOCK_DEAD);
 	udp_flush_pending_frames(sk);
 	unlock_sock_fast(sk, slow);
 	if (static_branch_unlikely(&udp_encap_needed_key)) {
@@ -2857,10 +2858,17 @@ int udp_abort(struct sock *sk, int err)
 {
 	lock_sock(sk);
 
+	/* udp{v6}_destroy_sock() sets it under the sk lock, avoid racing
+	 * with close()
+	 */
+	if (sock_flag(sk, SOCK_DEAD))
+		goto out;
+
 	sk->sk_err = err;
 	sk->sk_error_report(sk);
 	__udp_disconnect(sk, 0);
 
+out:
 	release_sock(sk);
 
 	return 0;
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 199b080d418a..2984ec43b9aa 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1598,6 +1598,7 @@ void udpv6_destroy_sock(struct sock *sk)
 {
 	struct udp_sock *up = udp_sk(sk);
 	lock_sock(sk);
+	sock_set_flag(sk, SOCK_DEAD);
 	udp_v6_flush_pending_frames(sk);
 	release_sock(sk);
 


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: Panic in udp4_lib_lookup2
  2021-05-21 18:18         ` Paolo Abeni
@ 2021-06-04 14:26           ` Kaustubh Pandey
  0 siblings, 0 replies; 7+ messages in thread
From: Kaustubh Pandey @ 2021-06-04 14:26 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: willemb, netdev, eric.dumazet, sharathv, subashab

On Fri, May 21, 2021 at 08:18:56PM +0200, Paolo Abeni wrote:
> On Tue, 2021-05-18 at 19:50 +0530, Kaustubh Pandey wrote:
> > On Thu, May 13, 2021 at 11:18:53PM +0200, Paolo Abeni wrote:
> > > On Fri, 2021-05-14 at 01:18 +0530, Kaustubh Pandey wrote:
> > > > On Thu, May 13, 2021 at 10:05:37AM +0200, Paolo Abeni wrote:
> > > > > Hello,
> > > > > 
> > > > > On Wed, 2021-05-12 at 23:51 +0530, kapandey@codeaurora.org wrote:
> > > > > > We observed panic in udp_lib_lookup with below call trace:
> > > > > > [136523.743271]  (7) Call trace:
> > > > > > [136523.743275]  (7)  udp4_lib_lookup2+0x88/0x1d8
> > > > > > [136523.743277]  (7)  __udp4_lib_lookup+0x168/0x194
> > > > > > [136523.743280]  (7)  udp4_lib_lookup+0x28/0x54
> > > > > > [136523.743285]  (7)  nf_sk_lookup_slow_v4+0x2b4/0x384
> > > > > > [136523.743289]  (7)  owner_mt+0xb8/0x248
> > > > > > [136523.743292]  (7)  ipt_do_table+0x28c/0x6a8
> > > > > > [136523.743295]  (7) iptable_filter_hook+0x24/0x30
> > > > > > [136523.743299]  (7)  nf_hook_slow+0xa8/0x148
> > > > > > [136523.743303]  (7)  ip_local_deliver+0xa8/0x14c
> > > > > > [136523.743305]  (7)  ip_rcv+0xe0/0x134
> > > > > 
> > > > > It would be helpful if you could provide a full, decoded, stack trace
> > > > > and the relevant kernel version.
> > > > > 
> > > > > > We suspect this might happen due to below sequence:
> > > > > 
> > > > > Some email formatting made the "graph" very hard to read...
> > > > > 
> > > > > > Time                                                   CPU X
> > > > > > 
> > > > > >                                     CPU Y
> > > > > > t0                                inet_release -> udp_lib_close ->
> > > > > > sk_common_release -> udp_lib_unhash
> > > > > > inet_diag_handler_cmd -> udp_diag_destroy -> __udp_diag_destroy ->
> > > > > > udp_lib_rehash
> > > > > 
> > > > > ... but it looks like udp_lib_close() is missing a
> > > > > lock_sock()/release_sock() pair. Something alike:
> > > > > ---
> > > > > diff --git a/include/net/udp.h b/include/net/udp.h
> > > > > index 360df454356c..06586b42db3f 100644
> > > > > --- a/include/net/udp.h
> > > > > +++ b/include/net/udp.h
> > > > > @@ -209,7 +209,9 @@ void udp_lib_rehash(struct sock *sk, u16 new_hash);
> > > > > 
> > > > >  static inline void udp_lib_close(struct sock *sk, long timeout)
> > > > >  {
> > > > > +	lock_sock(sk);
> > > > >  	sk_common_release(sk);
> > > > > +	release_sock(sk);
> > > > >  }
> > > > > 
> > > > >  int udp_lib_get_port(struct sock *sk, unsigned short snum,
> > > > > ---
> > > > > 
> > > > > could u please give the above a spin in your testbed?
> > > > > 
> > > > > Thanks!
> > > > > 
> > > > > Paolo
> > > > > 
> > > > > 
> > > > Hi Paolo,
> > > > 
> > > > Pls find full backtrace:
> > > > [136523.743141]Unable to handle kernel paging request at virtual address
> > > > 00000c08ff000820
> > > > 
> > > > [136523.743147]Mem abort info:
> > > > [136523.743150]  ESR = 0x96000004
> > > > [136523.743154]  EC = 0x25: DABT (current EL), IL = 32 bits
> > > > [136523.743157]  SET = 0, FnV = 0
> > > > [136523.743159]  EA = 0, S1PTW = 0
> > > > [136523.743162]Data abort info:
> > > > [136523.743164]  ISV = 0, ISS = 0x00000004
> > > > [136523.743167]  CM = 0, WnR = 0
> > > > [136523.743170][00000c08ff000820] address between user and kernel
> > > > address ranges
> > > > [136523.743174]Internal error: Oops: 96000004 [#1] PREEMPT SMP
> > > > [136523.743218]pstate: a0c00005 (NzCv daif +PAN +UAO)
> > > > 
> > > > [136523.743226]pc : udp4_lib_lookup2+0x88/0x1d8
> > > > [136523.743229]lr : __udp4_lib_lookup+0x168/0x194
> > > > 
> > > > [136523.743232]sp : ffffffc039a9b600
> > > > [136523.743234]x29: ffffffc039a9b610 x28: 0000000000000000
> > > > [136523.743237]x27: 0000000000000000 x26: ff000c08ff0007f0
> > > > [136523.743239]x25: ffffffe1e2319b40 x24: 0000000073737272
> > > > [136523.743242]x23: 0000000000003500 x22: 0000000000000000
> > > > [136523.743244]x21: 000000000000a622 x20: 000000000000001a
> > > > [136523.743246]x19: 00000000ffffffff x18: ffffffc02b6d7098
> > > > [136523.743249]x17: 0000000000000000 x16: 0000000000000000
> > > > [136523.743251]x15: 0000000000000000 x14: 0000000000000000
> > > > [136523.743254]x13: 0000000000000000 x12: ffffff81d8e30000
> > > > [136523.743257]x11: 0000000000000fff x10: ff000c08ff0007f0
> > > > [136523.743259]x9 : ff000c08ff000808 x8 : 0000000000000000
> > > > [136523.743261]x7 : ffffff81d8e3e6a0 x6 : 0000000000000000
> > > > [136523.743263]x5 : 000000000000001a x4 : 000000000000a622
> > > > [136523.743266]x3 : 0000000000000000 x2 : 0000000000003500
> > > > [136523.743268]x1 : 0000000073737272 x0 : ffffffe1e2319b40
> > > > [136523.743271]Call trace:
> > > > [136523.743275] udp4_lib_lookup2+0x88/0x1d8
> > > > [136523.743277] __udp4_lib_lookup+0x168/0x194
> > > > [136523.743280] udp4_lib_lookup+0x28/0x54
> > > > [136523.743285] nf_sk_lookup_slow_v4+0x2b4/0x384
> > > > [136523.743289] owner_mt+0xb8/0x248
> > > > [136523.743292] ipt_do_table+0x28c/0x6a8
> > > > [136523.743295] iptable_filter_hook+0x24/0x30
> > > > [136523.743299] nf_hook_slow+0xa8/0x148
> > > > [136523.743303] ip_local_deliver+0xa8/0x14c
> > > > [136523.743305] ip_rcv+0xe0/0x134
> > > > [136523.743309] __netif_receive_skb_core+0x9d0/0xd74
> > > > [136523.743313] __netif_receive_skb+0x50/0x17c
> > > > [136523.743316] netif_receive_skb_internal+0xa0/0xd8
> > > > [136523.743318] netif_receive_skb+0xec/0x1e4
> > > > 
> > > > Kernel version used is 5.4
> > > > 
> > > > Derived from your suggestion I think below change might be more suited
> > > > here:
> > > > diff --git a/net/core/sock.c b/net/core/sock.c
> > > > index f87c9f7..a5dd0e5 100644
> > > > --- a/net/core/sock.c
> > > > +++ b/net/core/sock.c
> > > > @@ -3210,7 +3210,9 @@ void sk_common_release(struct sock *sk)
> > > >          * A. Remove from hash tables.
> > > >          */
> > > > +       lock_sock(sk);
> > > >        sk->sk_prot->unhash(sk);
> > > > +       release_sock(sk);
> > > 
> > > The above will cause deadlock on protocol rightfully calling
> > > sk_common_release under the socket lock (e.g. sctp).
> > > 
> > > Could you please test the change I suggested in your testbed?
> > > 
> > > Thanks
> > > 
> > > Paolo
> > > 
> > Hi Paolo,
> > 
> > I picked up your change and with that it get stuck in udp_destroy_sock
> > trying to do lock_sock_fast.
> > 
> > Can you guide for next steps here?
> 
> I'm sorry for the high latency. 
> 
> It's a bit tricky as we can't easily acquire the socket lock on close
> with a wider scope.
> 
> Can you please try this one?
> 
> Thanks!
> 
> Paolo
> ---
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index 055fceb18bea..0d6205726abf 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -2607,6 +2607,7 @@ void udp_destroy_sock(struct sock *sk)
>  {
>  	struct udp_sock *up = udp_sk(sk);
>  	bool slow = lock_sock_fast(sk);
> +	sock_set_flag(sk, SOCK_DEAD);
>  	udp_flush_pending_frames(sk);
>  	unlock_sock_fast(sk, slow);
>  	if (static_branch_unlikely(&udp_encap_needed_key)) {
> @@ -2857,10 +2858,17 @@ int udp_abort(struct sock *sk, int err)
>  {
>  	lock_sock(sk);
>  
> +	/* udp{v6}_destroy_sock() sets it under the sk lock, avoid racing
> +	 * with close()
> +	 */
> +	if (sock_flag(sk, SOCK_DEAD))
> +		goto out;
> +
>  	sk->sk_err = err;
>  	sk->sk_error_report(sk);
>  	__udp_disconnect(sk, 0);
>  
> +out:
>  	release_sock(sk);
>  
>  	return 0;
> diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
> index 199b080d418a..2984ec43b9aa 100644
> --- a/net/ipv6/udp.c
> +++ b/net/ipv6/udp.c
> @@ -1598,6 +1598,7 @@ void udpv6_destroy_sock(struct sock *sk)
>  {
>  	struct udp_sock *up = udp_sk(sk);
>  	lock_sock(sk);
> +	sock_set_flag(sk, SOCK_DEAD);
>  	udp_v6_flush_pending_frames(sk);
>  	release_sock(sk);
>  
> 
Hi Paolo,
Apologies for delay.

I picked up this patch and tried reproducing the panic.
I never observed the panic with this patch. Seems to work perfectly
fine.

Can you pls raise a formal patch for this?

Thanks,
KP

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2021-06-04 14:27 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-12 18:21 Panic in udp4_lib_lookup2 kapandey
2021-05-13  8:05 ` Paolo Abeni
2021-05-13 19:48   ` Kaustubh Pandey
2021-05-13 21:18     ` Paolo Abeni
2021-05-18 14:20       ` Kaustubh Pandey
2021-05-21 18:18         ` Paolo Abeni
2021-06-04 14:26           ` Kaustubh Pandey

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.