* Panic in udp4_lib_lookup2 @ 2021-05-12 18:21 kapandey 2021-05-13 8:05 ` Paolo Abeni 0 siblings, 1 reply; 7+ messages in thread From: kapandey @ 2021-05-12 18:21 UTC (permalink / raw) To: willemb, pabeni, netdev, eric.dumazet, sharathv, subashab Hi All, We observed panic in udp_lib_lookup with below call trace: [136523.743271] (7) Call trace: [136523.743275] (7) udp4_lib_lookup2+0x88/0x1d8 [136523.743277] (7) __udp4_lib_lookup+0x168/0x194 [136523.743280] (7) udp4_lib_lookup+0x28/0x54 [136523.743285] (7) nf_sk_lookup_slow_v4+0x2b4/0x384 [136523.743289] (7) owner_mt+0xb8/0x248 [136523.743292] (7) ipt_do_table+0x28c/0x6a8 [136523.743295] (7) iptable_filter_hook+0x24/0x30 [136523.743299] (7) nf_hook_slow+0xa8/0x148 [136523.743303] (7) ip_local_deliver+0xa8/0x14c [136523.743305] (7) ip_rcv+0xe0/0x134 We suspect this might happen due to below sequence: Time CPU X CPU Y t0 inet_release -> udp_lib_close -> sk_common_release -> udp_lib_unhash inet_diag_handler_cmd -> udp_diag_destroy -> __udp_diag_destroy -> udp_lib_rehash t1 if(sk hashed) if(sk hashed) sk_del_node_init_rcu hlist_del_init_rcu(udp_portaddr_node) t2 if(hslot2 != nhslot2) hlist_del_init_rcu(udp_portaddr_node) hlist_add_head_rcu(udp_portaddr_node) t3 sk is freed here After t3 any use of that added udp_portaddr_node will cause panic. Same happens in udp4_lib_lookup2/udp6_lib_lookup2. We dont have any reproducer for this, getting hit during random testing. Pls let us know how this be resolved. Thanks, KP ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Panic in udp4_lib_lookup2 2021-05-12 18:21 Panic in udp4_lib_lookup2 kapandey @ 2021-05-13 8:05 ` Paolo Abeni 2021-05-13 19:48 ` Kaustubh Pandey 0 siblings, 1 reply; 7+ messages in thread From: Paolo Abeni @ 2021-05-13 8:05 UTC (permalink / raw) To: kapandey, willemb, netdev, eric.dumazet, sharathv, subashab Hello, On Wed, 2021-05-12 at 23:51 +0530, kapandey@codeaurora.org wrote: > We observed panic in udp_lib_lookup with below call trace: > [136523.743271] (7) Call trace: > [136523.743275] (7) udp4_lib_lookup2+0x88/0x1d8 > [136523.743277] (7) __udp4_lib_lookup+0x168/0x194 > [136523.743280] (7) udp4_lib_lookup+0x28/0x54 > [136523.743285] (7) nf_sk_lookup_slow_v4+0x2b4/0x384 > [136523.743289] (7) owner_mt+0xb8/0x248 > [136523.743292] (7) ipt_do_table+0x28c/0x6a8 > [136523.743295] (7) iptable_filter_hook+0x24/0x30 > [136523.743299] (7) nf_hook_slow+0xa8/0x148 > [136523.743303] (7) ip_local_deliver+0xa8/0x14c > [136523.743305] (7) ip_rcv+0xe0/0x134 It would be helpful if you could provide a full, decoded, stack trace and the relevant kernel version. > We suspect this might happen due to below sequence: Some email formatting made the "graph" very hard to read... > Time CPU X > > CPU Y > t0 inet_release -> udp_lib_close -> > sk_common_release -> udp_lib_unhash > inet_diag_handler_cmd -> udp_diag_destroy -> __udp_diag_destroy -> > udp_lib_rehash ... but it looks like udp_lib_close() is missing a lock_sock()/release_sock() pair. Something alike: --- diff --git a/include/net/udp.h b/include/net/udp.h index 360df454356c..06586b42db3f 100644 --- a/include/net/udp.h +++ b/include/net/udp.h @@ -209,7 +209,9 @@ void udp_lib_rehash(struct sock *sk, u16 new_hash); static inline void udp_lib_close(struct sock *sk, long timeout) { + lock_sock(sk); sk_common_release(sk); + release_sock(sk); } int udp_lib_get_port(struct sock *sk, unsigned short snum, --- could u please give the above a spin in your testbed? Thanks! Paolo ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: Panic in udp4_lib_lookup2 2021-05-13 8:05 ` Paolo Abeni @ 2021-05-13 19:48 ` Kaustubh Pandey 2021-05-13 21:18 ` Paolo Abeni 0 siblings, 1 reply; 7+ messages in thread From: Kaustubh Pandey @ 2021-05-13 19:48 UTC (permalink / raw) To: Paolo Abeni; +Cc: willemb, netdev, eric.dumazet, sharathv, subashab On Thu, May 13, 2021 at 10:05:37AM +0200, Paolo Abeni wrote: > Hello, > > On Wed, 2021-05-12 at 23:51 +0530, kapandey@codeaurora.org wrote: > > We observed panic in udp_lib_lookup with below call trace: > > [136523.743271] (7) Call trace: > > [136523.743275] (7) udp4_lib_lookup2+0x88/0x1d8 > > [136523.743277] (7) __udp4_lib_lookup+0x168/0x194 > > [136523.743280] (7) udp4_lib_lookup+0x28/0x54 > > [136523.743285] (7) nf_sk_lookup_slow_v4+0x2b4/0x384 > > [136523.743289] (7) owner_mt+0xb8/0x248 > > [136523.743292] (7) ipt_do_table+0x28c/0x6a8 > > [136523.743295] (7) iptable_filter_hook+0x24/0x30 > > [136523.743299] (7) nf_hook_slow+0xa8/0x148 > > [136523.743303] (7) ip_local_deliver+0xa8/0x14c > > [136523.743305] (7) ip_rcv+0xe0/0x134 > > It would be helpful if you could provide a full, decoded, stack trace > and the relevant kernel version. > > > We suspect this might happen due to below sequence: > > Some email formatting made the "graph" very hard to read... > > > Time CPU X > > > > CPU Y > > t0 inet_release -> udp_lib_close -> > > sk_common_release -> udp_lib_unhash > > inet_diag_handler_cmd -> udp_diag_destroy -> __udp_diag_destroy -> > > udp_lib_rehash > > ... but it looks like udp_lib_close() is missing a > lock_sock()/release_sock() pair. Something alike: > --- > diff --git a/include/net/udp.h b/include/net/udp.h > index 360df454356c..06586b42db3f 100644 > --- a/include/net/udp.h > +++ b/include/net/udp.h > @@ -209,7 +209,9 @@ void udp_lib_rehash(struct sock *sk, u16 new_hash); > > static inline void udp_lib_close(struct sock *sk, long timeout) > { > + lock_sock(sk); > sk_common_release(sk); > + release_sock(sk); > } > > int udp_lib_get_port(struct sock *sk, unsigned short snum, > --- > > could u please give the above a spin in your testbed? > > Thanks! > > Paolo > > Hi Paolo, Pls find full backtrace: [136523.743141]Unable to handle kernel paging request at virtual address 00000c08ff000820 [136523.743147]Mem abort info: [136523.743150] ESR = 0x96000004 [136523.743154] EC = 0x25: DABT (current EL), IL = 32 bits [136523.743157] SET = 0, FnV = 0 [136523.743159] EA = 0, S1PTW = 0 [136523.743162]Data abort info: [136523.743164] ISV = 0, ISS = 0x00000004 [136523.743167] CM = 0, WnR = 0 [136523.743170][00000c08ff000820] address between user and kernel address ranges [136523.743174]Internal error: Oops: 96000004 [#1] PREEMPT SMP [136523.743218]pstate: a0c00005 (NzCv daif +PAN +UAO) [136523.743226]pc : udp4_lib_lookup2+0x88/0x1d8 [136523.743229]lr : __udp4_lib_lookup+0x168/0x194 [136523.743232]sp : ffffffc039a9b600 [136523.743234]x29: ffffffc039a9b610 x28: 0000000000000000 [136523.743237]x27: 0000000000000000 x26: ff000c08ff0007f0 [136523.743239]x25: ffffffe1e2319b40 x24: 0000000073737272 [136523.743242]x23: 0000000000003500 x22: 0000000000000000 [136523.743244]x21: 000000000000a622 x20: 000000000000001a [136523.743246]x19: 00000000ffffffff x18: ffffffc02b6d7098 [136523.743249]x17: 0000000000000000 x16: 0000000000000000 [136523.743251]x15: 0000000000000000 x14: 0000000000000000 [136523.743254]x13: 0000000000000000 x12: ffffff81d8e30000 [136523.743257]x11: 0000000000000fff x10: ff000c08ff0007f0 [136523.743259]x9 : ff000c08ff000808 x8 : 0000000000000000 [136523.743261]x7 : ffffff81d8e3e6a0 x6 : 0000000000000000 [136523.743263]x5 : 000000000000001a x4 : 000000000000a622 [136523.743266]x3 : 0000000000000000 x2 : 0000000000003500 [136523.743268]x1 : 0000000073737272 x0 : ffffffe1e2319b40 [136523.743271]Call trace: [136523.743275] udp4_lib_lookup2+0x88/0x1d8 [136523.743277] __udp4_lib_lookup+0x168/0x194 [136523.743280] udp4_lib_lookup+0x28/0x54 [136523.743285] nf_sk_lookup_slow_v4+0x2b4/0x384 [136523.743289] owner_mt+0xb8/0x248 [136523.743292] ipt_do_table+0x28c/0x6a8 [136523.743295] iptable_filter_hook+0x24/0x30 [136523.743299] nf_hook_slow+0xa8/0x148 [136523.743303] ip_local_deliver+0xa8/0x14c [136523.743305] ip_rcv+0xe0/0x134 [136523.743309] __netif_receive_skb_core+0x9d0/0xd74 [136523.743313] __netif_receive_skb+0x50/0x17c [136523.743316] netif_receive_skb_internal+0xa0/0xd8 [136523.743318] netif_receive_skb+0xec/0x1e4 Kernel version used is 5.4 Derived from your suggestion I think below change might be more suited here: diff --git a/net/core/sock.c b/net/core/sock.c index f87c9f7..a5dd0e5 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -3210,7 +3210,9 @@ void sk_common_release(struct sock *sk) * A. Remove from hash tables. */ + lock_sock(sk); sk->sk_prot->unhash(sk); + release_sock(sk); -- Previously I was trying to share this anlaysis: Time | CPU[X] | CPU[Y] ____________________________________________________________________________ | | t0 | inet_release | inet_diag_handler_cmd | udp_lib_close | udp_diag_destroy | sk_common_release | __udp_diag_destroy | udp_lib_unhash | udp_lib_rehash ____________________________________________________________________________ t1 | if(sk hashed) | if(sk hashed) | sk_del_node_init_rcu | | hlist_del_init_rcu(udp_portaddr_node) | ____________________________________________________________________________ t2 | | if(hslot2 != nhslot2) | | hlist_del_init_rcu(udp_portaddr_node) | | hlist_add_head_rcu(udp_portaddr_node) ____________________________________________________________________________ t3 | sk is freed here | Pls let me know your views on this. Thanks KP ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: Panic in udp4_lib_lookup2 2021-05-13 19:48 ` Kaustubh Pandey @ 2021-05-13 21:18 ` Paolo Abeni 2021-05-18 14:20 ` Kaustubh Pandey 0 siblings, 1 reply; 7+ messages in thread From: Paolo Abeni @ 2021-05-13 21:18 UTC (permalink / raw) To: Kaustubh Pandey; +Cc: willemb, netdev, eric.dumazet, sharathv, subashab On Fri, 2021-05-14 at 01:18 +0530, Kaustubh Pandey wrote: > On Thu, May 13, 2021 at 10:05:37AM +0200, Paolo Abeni wrote: > > Hello, > > > > On Wed, 2021-05-12 at 23:51 +0530, kapandey@codeaurora.org wrote: > > > We observed panic in udp_lib_lookup with below call trace: > > > [136523.743271] (7) Call trace: > > > [136523.743275] (7) udp4_lib_lookup2+0x88/0x1d8 > > > [136523.743277] (7) __udp4_lib_lookup+0x168/0x194 > > > [136523.743280] (7) udp4_lib_lookup+0x28/0x54 > > > [136523.743285] (7) nf_sk_lookup_slow_v4+0x2b4/0x384 > > > [136523.743289] (7) owner_mt+0xb8/0x248 > > > [136523.743292] (7) ipt_do_table+0x28c/0x6a8 > > > [136523.743295] (7) iptable_filter_hook+0x24/0x30 > > > [136523.743299] (7) nf_hook_slow+0xa8/0x148 > > > [136523.743303] (7) ip_local_deliver+0xa8/0x14c > > > [136523.743305] (7) ip_rcv+0xe0/0x134 > > > > It would be helpful if you could provide a full, decoded, stack trace > > and the relevant kernel version. > > > > > We suspect this might happen due to below sequence: > > > > Some email formatting made the "graph" very hard to read... > > > > > Time CPU X > > > > > > CPU Y > > > t0 inet_release -> udp_lib_close -> > > > sk_common_release -> udp_lib_unhash > > > inet_diag_handler_cmd -> udp_diag_destroy -> __udp_diag_destroy -> > > > udp_lib_rehash > > > > ... but it looks like udp_lib_close() is missing a > > lock_sock()/release_sock() pair. Something alike: > > --- > > diff --git a/include/net/udp.h b/include/net/udp.h > > index 360df454356c..06586b42db3f 100644 > > --- a/include/net/udp.h > > +++ b/include/net/udp.h > > @@ -209,7 +209,9 @@ void udp_lib_rehash(struct sock *sk, u16 new_hash); > > > > static inline void udp_lib_close(struct sock *sk, long timeout) > > { > > + lock_sock(sk); > > sk_common_release(sk); > > + release_sock(sk); > > } > > > > int udp_lib_get_port(struct sock *sk, unsigned short snum, > > --- > > > > could u please give the above a spin in your testbed? > > > > Thanks! > > > > Paolo > > > > > Hi Paolo, > > Pls find full backtrace: > [136523.743141]Unable to handle kernel paging request at virtual address > 00000c08ff000820 > > [136523.743147]Mem abort info: > [136523.743150] ESR = 0x96000004 > [136523.743154] EC = 0x25: DABT (current EL), IL = 32 bits > [136523.743157] SET = 0, FnV = 0 > [136523.743159] EA = 0, S1PTW = 0 > [136523.743162]Data abort info: > [136523.743164] ISV = 0, ISS = 0x00000004 > [136523.743167] CM = 0, WnR = 0 > [136523.743170][00000c08ff000820] address between user and kernel > address ranges > [136523.743174]Internal error: Oops: 96000004 [#1] PREEMPT SMP > [136523.743218]pstate: a0c00005 (NzCv daif +PAN +UAO) > > [136523.743226]pc : udp4_lib_lookup2+0x88/0x1d8 > [136523.743229]lr : __udp4_lib_lookup+0x168/0x194 > > [136523.743232]sp : ffffffc039a9b600 > [136523.743234]x29: ffffffc039a9b610 x28: 0000000000000000 > [136523.743237]x27: 0000000000000000 x26: ff000c08ff0007f0 > [136523.743239]x25: ffffffe1e2319b40 x24: 0000000073737272 > [136523.743242]x23: 0000000000003500 x22: 0000000000000000 > [136523.743244]x21: 000000000000a622 x20: 000000000000001a > [136523.743246]x19: 00000000ffffffff x18: ffffffc02b6d7098 > [136523.743249]x17: 0000000000000000 x16: 0000000000000000 > [136523.743251]x15: 0000000000000000 x14: 0000000000000000 > [136523.743254]x13: 0000000000000000 x12: ffffff81d8e30000 > [136523.743257]x11: 0000000000000fff x10: ff000c08ff0007f0 > [136523.743259]x9 : ff000c08ff000808 x8 : 0000000000000000 > [136523.743261]x7 : ffffff81d8e3e6a0 x6 : 0000000000000000 > [136523.743263]x5 : 000000000000001a x4 : 000000000000a622 > [136523.743266]x3 : 0000000000000000 x2 : 0000000000003500 > [136523.743268]x1 : 0000000073737272 x0 : ffffffe1e2319b40 > [136523.743271]Call trace: > [136523.743275] udp4_lib_lookup2+0x88/0x1d8 > [136523.743277] __udp4_lib_lookup+0x168/0x194 > [136523.743280] udp4_lib_lookup+0x28/0x54 > [136523.743285] nf_sk_lookup_slow_v4+0x2b4/0x384 > [136523.743289] owner_mt+0xb8/0x248 > [136523.743292] ipt_do_table+0x28c/0x6a8 > [136523.743295] iptable_filter_hook+0x24/0x30 > [136523.743299] nf_hook_slow+0xa8/0x148 > [136523.743303] ip_local_deliver+0xa8/0x14c > [136523.743305] ip_rcv+0xe0/0x134 > [136523.743309] __netif_receive_skb_core+0x9d0/0xd74 > [136523.743313] __netif_receive_skb+0x50/0x17c > [136523.743316] netif_receive_skb_internal+0xa0/0xd8 > [136523.743318] netif_receive_skb+0xec/0x1e4 > > Kernel version used is 5.4 > > Derived from your suggestion I think below change might be more suited > here: > diff --git a/net/core/sock.c b/net/core/sock.c > index f87c9f7..a5dd0e5 100644 > --- a/net/core/sock.c > +++ b/net/core/sock.c > @@ -3210,7 +3210,9 @@ void sk_common_release(struct sock *sk) > * A. Remove from hash tables. > */ > + lock_sock(sk); > sk->sk_prot->unhash(sk); > + release_sock(sk); The above will cause deadlock on protocol rightfully calling sk_common_release under the socket lock (e.g. sctp). Could you please test the change I suggested in your testbed? Thanks Paolo ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Panic in udp4_lib_lookup2 2021-05-13 21:18 ` Paolo Abeni @ 2021-05-18 14:20 ` Kaustubh Pandey 2021-05-21 18:18 ` Paolo Abeni 0 siblings, 1 reply; 7+ messages in thread From: Kaustubh Pandey @ 2021-05-18 14:20 UTC (permalink / raw) To: Paolo Abeni; +Cc: willemb, netdev, eric.dumazet, sharathv, subashab On Thu, May 13, 2021 at 11:18:53PM +0200, Paolo Abeni wrote: > On Fri, 2021-05-14 at 01:18 +0530, Kaustubh Pandey wrote: > > On Thu, May 13, 2021 at 10:05:37AM +0200, Paolo Abeni wrote: > > > Hello, > > > > > > On Wed, 2021-05-12 at 23:51 +0530, kapandey@codeaurora.org wrote: > > > > We observed panic in udp_lib_lookup with below call trace: > > > > [136523.743271] (7) Call trace: > > > > [136523.743275] (7) udp4_lib_lookup2+0x88/0x1d8 > > > > [136523.743277] (7) __udp4_lib_lookup+0x168/0x194 > > > > [136523.743280] (7) udp4_lib_lookup+0x28/0x54 > > > > [136523.743285] (7) nf_sk_lookup_slow_v4+0x2b4/0x384 > > > > [136523.743289] (7) owner_mt+0xb8/0x248 > > > > [136523.743292] (7) ipt_do_table+0x28c/0x6a8 > > > > [136523.743295] (7) iptable_filter_hook+0x24/0x30 > > > > [136523.743299] (7) nf_hook_slow+0xa8/0x148 > > > > [136523.743303] (7) ip_local_deliver+0xa8/0x14c > > > > [136523.743305] (7) ip_rcv+0xe0/0x134 > > > > > > It would be helpful if you could provide a full, decoded, stack trace > > > and the relevant kernel version. > > > > > > > We suspect this might happen due to below sequence: > > > > > > Some email formatting made the "graph" very hard to read... > > > > > > > Time CPU X > > > > > > > > CPU Y > > > > t0 inet_release -> udp_lib_close -> > > > > sk_common_release -> udp_lib_unhash > > > > inet_diag_handler_cmd -> udp_diag_destroy -> __udp_diag_destroy -> > > > > udp_lib_rehash > > > > > > ... but it looks like udp_lib_close() is missing a > > > lock_sock()/release_sock() pair. Something alike: > > > --- > > > diff --git a/include/net/udp.h b/include/net/udp.h > > > index 360df454356c..06586b42db3f 100644 > > > --- a/include/net/udp.h > > > +++ b/include/net/udp.h > > > @@ -209,7 +209,9 @@ void udp_lib_rehash(struct sock *sk, u16 new_hash); > > > > > > static inline void udp_lib_close(struct sock *sk, long timeout) > > > { > > > + lock_sock(sk); > > > sk_common_release(sk); > > > + release_sock(sk); > > > } > > > > > > int udp_lib_get_port(struct sock *sk, unsigned short snum, > > > --- > > > > > > could u please give the above a spin in your testbed? > > > > > > Thanks! > > > > > > Paolo > > > > > > > > Hi Paolo, > > > > Pls find full backtrace: > > [136523.743141]Unable to handle kernel paging request at virtual address > > 00000c08ff000820 > > > > [136523.743147]Mem abort info: > > [136523.743150] ESR = 0x96000004 > > [136523.743154] EC = 0x25: DABT (current EL), IL = 32 bits > > [136523.743157] SET = 0, FnV = 0 > > [136523.743159] EA = 0, S1PTW = 0 > > [136523.743162]Data abort info: > > [136523.743164] ISV = 0, ISS = 0x00000004 > > [136523.743167] CM = 0, WnR = 0 > > [136523.743170][00000c08ff000820] address between user and kernel > > address ranges > > [136523.743174]Internal error: Oops: 96000004 [#1] PREEMPT SMP > > [136523.743218]pstate: a0c00005 (NzCv daif +PAN +UAO) > > > > [136523.743226]pc : udp4_lib_lookup2+0x88/0x1d8 > > [136523.743229]lr : __udp4_lib_lookup+0x168/0x194 > > > > [136523.743232]sp : ffffffc039a9b600 > > [136523.743234]x29: ffffffc039a9b610 x28: 0000000000000000 > > [136523.743237]x27: 0000000000000000 x26: ff000c08ff0007f0 > > [136523.743239]x25: ffffffe1e2319b40 x24: 0000000073737272 > > [136523.743242]x23: 0000000000003500 x22: 0000000000000000 > > [136523.743244]x21: 000000000000a622 x20: 000000000000001a > > [136523.743246]x19: 00000000ffffffff x18: ffffffc02b6d7098 > > [136523.743249]x17: 0000000000000000 x16: 0000000000000000 > > [136523.743251]x15: 0000000000000000 x14: 0000000000000000 > > [136523.743254]x13: 0000000000000000 x12: ffffff81d8e30000 > > [136523.743257]x11: 0000000000000fff x10: ff000c08ff0007f0 > > [136523.743259]x9 : ff000c08ff000808 x8 : 0000000000000000 > > [136523.743261]x7 : ffffff81d8e3e6a0 x6 : 0000000000000000 > > [136523.743263]x5 : 000000000000001a x4 : 000000000000a622 > > [136523.743266]x3 : 0000000000000000 x2 : 0000000000003500 > > [136523.743268]x1 : 0000000073737272 x0 : ffffffe1e2319b40 > > [136523.743271]Call trace: > > [136523.743275] udp4_lib_lookup2+0x88/0x1d8 > > [136523.743277] __udp4_lib_lookup+0x168/0x194 > > [136523.743280] udp4_lib_lookup+0x28/0x54 > > [136523.743285] nf_sk_lookup_slow_v4+0x2b4/0x384 > > [136523.743289] owner_mt+0xb8/0x248 > > [136523.743292] ipt_do_table+0x28c/0x6a8 > > [136523.743295] iptable_filter_hook+0x24/0x30 > > [136523.743299] nf_hook_slow+0xa8/0x148 > > [136523.743303] ip_local_deliver+0xa8/0x14c > > [136523.743305] ip_rcv+0xe0/0x134 > > [136523.743309] __netif_receive_skb_core+0x9d0/0xd74 > > [136523.743313] __netif_receive_skb+0x50/0x17c > > [136523.743316] netif_receive_skb_internal+0xa0/0xd8 > > [136523.743318] netif_receive_skb+0xec/0x1e4 > > > > Kernel version used is 5.4 > > > > Derived from your suggestion I think below change might be more suited > > here: > > diff --git a/net/core/sock.c b/net/core/sock.c > > index f87c9f7..a5dd0e5 100644 > > --- a/net/core/sock.c > > +++ b/net/core/sock.c > > @@ -3210,7 +3210,9 @@ void sk_common_release(struct sock *sk) > > * A. Remove from hash tables. > > */ > > + lock_sock(sk); > > sk->sk_prot->unhash(sk); > > + release_sock(sk); > > The above will cause deadlock on protocol rightfully calling > sk_common_release under the socket lock (e.g. sctp). > > Could you please test the change I suggested in your testbed? > > Thanks > > Paolo > Hi Paolo, I picked up your change and with that it get stuck in udp_destroy_sock trying to do lock_sock_fast. Can you guide for next steps here? Thanks, KP ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Panic in udp4_lib_lookup2 2021-05-18 14:20 ` Kaustubh Pandey @ 2021-05-21 18:18 ` Paolo Abeni 2021-06-04 14:26 ` Kaustubh Pandey 0 siblings, 1 reply; 7+ messages in thread From: Paolo Abeni @ 2021-05-21 18:18 UTC (permalink / raw) To: Kaustubh Pandey; +Cc: willemb, netdev, eric.dumazet, sharathv, subashab On Tue, 2021-05-18 at 19:50 +0530, Kaustubh Pandey wrote: > On Thu, May 13, 2021 at 11:18:53PM +0200, Paolo Abeni wrote: > > On Fri, 2021-05-14 at 01:18 +0530, Kaustubh Pandey wrote: > > > On Thu, May 13, 2021 at 10:05:37AM +0200, Paolo Abeni wrote: > > > > Hello, > > > > > > > > On Wed, 2021-05-12 at 23:51 +0530, kapandey@codeaurora.org wrote: > > > > > We observed panic in udp_lib_lookup with below call trace: > > > > > [136523.743271] (7) Call trace: > > > > > [136523.743275] (7) udp4_lib_lookup2+0x88/0x1d8 > > > > > [136523.743277] (7) __udp4_lib_lookup+0x168/0x194 > > > > > [136523.743280] (7) udp4_lib_lookup+0x28/0x54 > > > > > [136523.743285] (7) nf_sk_lookup_slow_v4+0x2b4/0x384 > > > > > [136523.743289] (7) owner_mt+0xb8/0x248 > > > > > [136523.743292] (7) ipt_do_table+0x28c/0x6a8 > > > > > [136523.743295] (7) iptable_filter_hook+0x24/0x30 > > > > > [136523.743299] (7) nf_hook_slow+0xa8/0x148 > > > > > [136523.743303] (7) ip_local_deliver+0xa8/0x14c > > > > > [136523.743305] (7) ip_rcv+0xe0/0x134 > > > > > > > > It would be helpful if you could provide a full, decoded, stack trace > > > > and the relevant kernel version. > > > > > > > > > We suspect this might happen due to below sequence: > > > > > > > > Some email formatting made the "graph" very hard to read... > > > > > > > > > Time CPU X > > > > > > > > > > CPU Y > > > > > t0 inet_release -> udp_lib_close -> > > > > > sk_common_release -> udp_lib_unhash > > > > > inet_diag_handler_cmd -> udp_diag_destroy -> __udp_diag_destroy -> > > > > > udp_lib_rehash > > > > > > > > ... but it looks like udp_lib_close() is missing a > > > > lock_sock()/release_sock() pair. Something alike: > > > > --- > > > > diff --git a/include/net/udp.h b/include/net/udp.h > > > > index 360df454356c..06586b42db3f 100644 > > > > --- a/include/net/udp.h > > > > +++ b/include/net/udp.h > > > > @@ -209,7 +209,9 @@ void udp_lib_rehash(struct sock *sk, u16 new_hash); > > > > > > > > static inline void udp_lib_close(struct sock *sk, long timeout) > > > > { > > > > + lock_sock(sk); > > > > sk_common_release(sk); > > > > + release_sock(sk); > > > > } > > > > > > > > int udp_lib_get_port(struct sock *sk, unsigned short snum, > > > > --- > > > > > > > > could u please give the above a spin in your testbed? > > > > > > > > Thanks! > > > > > > > > Paolo > > > > > > > > > > > Hi Paolo, > > > > > > Pls find full backtrace: > > > [136523.743141]Unable to handle kernel paging request at virtual address > > > 00000c08ff000820 > > > > > > [136523.743147]Mem abort info: > > > [136523.743150] ESR = 0x96000004 > > > [136523.743154] EC = 0x25: DABT (current EL), IL = 32 bits > > > [136523.743157] SET = 0, FnV = 0 > > > [136523.743159] EA = 0, S1PTW = 0 > > > [136523.743162]Data abort info: > > > [136523.743164] ISV = 0, ISS = 0x00000004 > > > [136523.743167] CM = 0, WnR = 0 > > > [136523.743170][00000c08ff000820] address between user and kernel > > > address ranges > > > [136523.743174]Internal error: Oops: 96000004 [#1] PREEMPT SMP > > > [136523.743218]pstate: a0c00005 (NzCv daif +PAN +UAO) > > > > > > [136523.743226]pc : udp4_lib_lookup2+0x88/0x1d8 > > > [136523.743229]lr : __udp4_lib_lookup+0x168/0x194 > > > > > > [136523.743232]sp : ffffffc039a9b600 > > > [136523.743234]x29: ffffffc039a9b610 x28: 0000000000000000 > > > [136523.743237]x27: 0000000000000000 x26: ff000c08ff0007f0 > > > [136523.743239]x25: ffffffe1e2319b40 x24: 0000000073737272 > > > [136523.743242]x23: 0000000000003500 x22: 0000000000000000 > > > [136523.743244]x21: 000000000000a622 x20: 000000000000001a > > > [136523.743246]x19: 00000000ffffffff x18: ffffffc02b6d7098 > > > [136523.743249]x17: 0000000000000000 x16: 0000000000000000 > > > [136523.743251]x15: 0000000000000000 x14: 0000000000000000 > > > [136523.743254]x13: 0000000000000000 x12: ffffff81d8e30000 > > > [136523.743257]x11: 0000000000000fff x10: ff000c08ff0007f0 > > > [136523.743259]x9 : ff000c08ff000808 x8 : 0000000000000000 > > > [136523.743261]x7 : ffffff81d8e3e6a0 x6 : 0000000000000000 > > > [136523.743263]x5 : 000000000000001a x4 : 000000000000a622 > > > [136523.743266]x3 : 0000000000000000 x2 : 0000000000003500 > > > [136523.743268]x1 : 0000000073737272 x0 : ffffffe1e2319b40 > > > [136523.743271]Call trace: > > > [136523.743275] udp4_lib_lookup2+0x88/0x1d8 > > > [136523.743277] __udp4_lib_lookup+0x168/0x194 > > > [136523.743280] udp4_lib_lookup+0x28/0x54 > > > [136523.743285] nf_sk_lookup_slow_v4+0x2b4/0x384 > > > [136523.743289] owner_mt+0xb8/0x248 > > > [136523.743292] ipt_do_table+0x28c/0x6a8 > > > [136523.743295] iptable_filter_hook+0x24/0x30 > > > [136523.743299] nf_hook_slow+0xa8/0x148 > > > [136523.743303] ip_local_deliver+0xa8/0x14c > > > [136523.743305] ip_rcv+0xe0/0x134 > > > [136523.743309] __netif_receive_skb_core+0x9d0/0xd74 > > > [136523.743313] __netif_receive_skb+0x50/0x17c > > > [136523.743316] netif_receive_skb_internal+0xa0/0xd8 > > > [136523.743318] netif_receive_skb+0xec/0x1e4 > > > > > > Kernel version used is 5.4 > > > > > > Derived from your suggestion I think below change might be more suited > > > here: > > > diff --git a/net/core/sock.c b/net/core/sock.c > > > index f87c9f7..a5dd0e5 100644 > > > --- a/net/core/sock.c > > > +++ b/net/core/sock.c > > > @@ -3210,7 +3210,9 @@ void sk_common_release(struct sock *sk) > > > * A. Remove from hash tables. > > > */ > > > + lock_sock(sk); > > > sk->sk_prot->unhash(sk); > > > + release_sock(sk); > > > > The above will cause deadlock on protocol rightfully calling > > sk_common_release under the socket lock (e.g. sctp). > > > > Could you please test the change I suggested in your testbed? > > > > Thanks > > > > Paolo > > > Hi Paolo, > > I picked up your change and with that it get stuck in udp_destroy_sock > trying to do lock_sock_fast. > > Can you guide for next steps here? I'm sorry for the high latency. It's a bit tricky as we can't easily acquire the socket lock on close with a wider scope. Can you please try this one? Thanks! Paolo --- diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index 055fceb18bea..0d6205726abf 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -2607,6 +2607,7 @@ void udp_destroy_sock(struct sock *sk) { struct udp_sock *up = udp_sk(sk); bool slow = lock_sock_fast(sk); + sock_set_flag(sk, SOCK_DEAD); udp_flush_pending_frames(sk); unlock_sock_fast(sk, slow); if (static_branch_unlikely(&udp_encap_needed_key)) { @@ -2857,10 +2858,17 @@ int udp_abort(struct sock *sk, int err) { lock_sock(sk); + /* udp{v6}_destroy_sock() sets it under the sk lock, avoid racing + * with close() + */ + if (sock_flag(sk, SOCK_DEAD)) + goto out; + sk->sk_err = err; sk->sk_error_report(sk); __udp_disconnect(sk, 0); +out: release_sock(sk); return 0; diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c index 199b080d418a..2984ec43b9aa 100644 --- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -1598,6 +1598,7 @@ void udpv6_destroy_sock(struct sock *sk) { struct udp_sock *up = udp_sk(sk); lock_sock(sk); + sock_set_flag(sk, SOCK_DEAD); udp_v6_flush_pending_frames(sk); release_sock(sk); ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: Panic in udp4_lib_lookup2 2021-05-21 18:18 ` Paolo Abeni @ 2021-06-04 14:26 ` Kaustubh Pandey 0 siblings, 0 replies; 7+ messages in thread From: Kaustubh Pandey @ 2021-06-04 14:26 UTC (permalink / raw) To: Paolo Abeni; +Cc: willemb, netdev, eric.dumazet, sharathv, subashab On Fri, May 21, 2021 at 08:18:56PM +0200, Paolo Abeni wrote: > On Tue, 2021-05-18 at 19:50 +0530, Kaustubh Pandey wrote: > > On Thu, May 13, 2021 at 11:18:53PM +0200, Paolo Abeni wrote: > > > On Fri, 2021-05-14 at 01:18 +0530, Kaustubh Pandey wrote: > > > > On Thu, May 13, 2021 at 10:05:37AM +0200, Paolo Abeni wrote: > > > > > Hello, > > > > > > > > > > On Wed, 2021-05-12 at 23:51 +0530, kapandey@codeaurora.org wrote: > > > > > > We observed panic in udp_lib_lookup with below call trace: > > > > > > [136523.743271] (7) Call trace: > > > > > > [136523.743275] (7) udp4_lib_lookup2+0x88/0x1d8 > > > > > > [136523.743277] (7) __udp4_lib_lookup+0x168/0x194 > > > > > > [136523.743280] (7) udp4_lib_lookup+0x28/0x54 > > > > > > [136523.743285] (7) nf_sk_lookup_slow_v4+0x2b4/0x384 > > > > > > [136523.743289] (7) owner_mt+0xb8/0x248 > > > > > > [136523.743292] (7) ipt_do_table+0x28c/0x6a8 > > > > > > [136523.743295] (7) iptable_filter_hook+0x24/0x30 > > > > > > [136523.743299] (7) nf_hook_slow+0xa8/0x148 > > > > > > [136523.743303] (7) ip_local_deliver+0xa8/0x14c > > > > > > [136523.743305] (7) ip_rcv+0xe0/0x134 > > > > > > > > > > It would be helpful if you could provide a full, decoded, stack trace > > > > > and the relevant kernel version. > > > > > > > > > > > We suspect this might happen due to below sequence: > > > > > > > > > > Some email formatting made the "graph" very hard to read... > > > > > > > > > > > Time CPU X > > > > > > > > > > > > CPU Y > > > > > > t0 inet_release -> udp_lib_close -> > > > > > > sk_common_release -> udp_lib_unhash > > > > > > inet_diag_handler_cmd -> udp_diag_destroy -> __udp_diag_destroy -> > > > > > > udp_lib_rehash > > > > > > > > > > ... but it looks like udp_lib_close() is missing a > > > > > lock_sock()/release_sock() pair. Something alike: > > > > > --- > > > > > diff --git a/include/net/udp.h b/include/net/udp.h > > > > > index 360df454356c..06586b42db3f 100644 > > > > > --- a/include/net/udp.h > > > > > +++ b/include/net/udp.h > > > > > @@ -209,7 +209,9 @@ void udp_lib_rehash(struct sock *sk, u16 new_hash); > > > > > > > > > > static inline void udp_lib_close(struct sock *sk, long timeout) > > > > > { > > > > > + lock_sock(sk); > > > > > sk_common_release(sk); > > > > > + release_sock(sk); > > > > > } > > > > > > > > > > int udp_lib_get_port(struct sock *sk, unsigned short snum, > > > > > --- > > > > > > > > > > could u please give the above a spin in your testbed? > > > > > > > > > > Thanks! > > > > > > > > > > Paolo > > > > > > > > > > > > > > Hi Paolo, > > > > > > > > Pls find full backtrace: > > > > [136523.743141]Unable to handle kernel paging request at virtual address > > > > 00000c08ff000820 > > > > > > > > [136523.743147]Mem abort info: > > > > [136523.743150] ESR = 0x96000004 > > > > [136523.743154] EC = 0x25: DABT (current EL), IL = 32 bits > > > > [136523.743157] SET = 0, FnV = 0 > > > > [136523.743159] EA = 0, S1PTW = 0 > > > > [136523.743162]Data abort info: > > > > [136523.743164] ISV = 0, ISS = 0x00000004 > > > > [136523.743167] CM = 0, WnR = 0 > > > > [136523.743170][00000c08ff000820] address between user and kernel > > > > address ranges > > > > [136523.743174]Internal error: Oops: 96000004 [#1] PREEMPT SMP > > > > [136523.743218]pstate: a0c00005 (NzCv daif +PAN +UAO) > > > > > > > > [136523.743226]pc : udp4_lib_lookup2+0x88/0x1d8 > > > > [136523.743229]lr : __udp4_lib_lookup+0x168/0x194 > > > > > > > > [136523.743232]sp : ffffffc039a9b600 > > > > [136523.743234]x29: ffffffc039a9b610 x28: 0000000000000000 > > > > [136523.743237]x27: 0000000000000000 x26: ff000c08ff0007f0 > > > > [136523.743239]x25: ffffffe1e2319b40 x24: 0000000073737272 > > > > [136523.743242]x23: 0000000000003500 x22: 0000000000000000 > > > > [136523.743244]x21: 000000000000a622 x20: 000000000000001a > > > > [136523.743246]x19: 00000000ffffffff x18: ffffffc02b6d7098 > > > > [136523.743249]x17: 0000000000000000 x16: 0000000000000000 > > > > [136523.743251]x15: 0000000000000000 x14: 0000000000000000 > > > > [136523.743254]x13: 0000000000000000 x12: ffffff81d8e30000 > > > > [136523.743257]x11: 0000000000000fff x10: ff000c08ff0007f0 > > > > [136523.743259]x9 : ff000c08ff000808 x8 : 0000000000000000 > > > > [136523.743261]x7 : ffffff81d8e3e6a0 x6 : 0000000000000000 > > > > [136523.743263]x5 : 000000000000001a x4 : 000000000000a622 > > > > [136523.743266]x3 : 0000000000000000 x2 : 0000000000003500 > > > > [136523.743268]x1 : 0000000073737272 x0 : ffffffe1e2319b40 > > > > [136523.743271]Call trace: > > > > [136523.743275] udp4_lib_lookup2+0x88/0x1d8 > > > > [136523.743277] __udp4_lib_lookup+0x168/0x194 > > > > [136523.743280] udp4_lib_lookup+0x28/0x54 > > > > [136523.743285] nf_sk_lookup_slow_v4+0x2b4/0x384 > > > > [136523.743289] owner_mt+0xb8/0x248 > > > > [136523.743292] ipt_do_table+0x28c/0x6a8 > > > > [136523.743295] iptable_filter_hook+0x24/0x30 > > > > [136523.743299] nf_hook_slow+0xa8/0x148 > > > > [136523.743303] ip_local_deliver+0xa8/0x14c > > > > [136523.743305] ip_rcv+0xe0/0x134 > > > > [136523.743309] __netif_receive_skb_core+0x9d0/0xd74 > > > > [136523.743313] __netif_receive_skb+0x50/0x17c > > > > [136523.743316] netif_receive_skb_internal+0xa0/0xd8 > > > > [136523.743318] netif_receive_skb+0xec/0x1e4 > > > > > > > > Kernel version used is 5.4 > > > > > > > > Derived from your suggestion I think below change might be more suited > > > > here: > > > > diff --git a/net/core/sock.c b/net/core/sock.c > > > > index f87c9f7..a5dd0e5 100644 > > > > --- a/net/core/sock.c > > > > +++ b/net/core/sock.c > > > > @@ -3210,7 +3210,9 @@ void sk_common_release(struct sock *sk) > > > > * A. Remove from hash tables. > > > > */ > > > > + lock_sock(sk); > > > > sk->sk_prot->unhash(sk); > > > > + release_sock(sk); > > > > > > The above will cause deadlock on protocol rightfully calling > > > sk_common_release under the socket lock (e.g. sctp). > > > > > > Could you please test the change I suggested in your testbed? > > > > > > Thanks > > > > > > Paolo > > > > > Hi Paolo, > > > > I picked up your change and with that it get stuck in udp_destroy_sock > > trying to do lock_sock_fast. > > > > Can you guide for next steps here? > > I'm sorry for the high latency. > > It's a bit tricky as we can't easily acquire the socket lock on close > with a wider scope. > > Can you please try this one? > > Thanks! > > Paolo > --- > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c > index 055fceb18bea..0d6205726abf 100644 > --- a/net/ipv4/udp.c > +++ b/net/ipv4/udp.c > @@ -2607,6 +2607,7 @@ void udp_destroy_sock(struct sock *sk) > { > struct udp_sock *up = udp_sk(sk); > bool slow = lock_sock_fast(sk); > + sock_set_flag(sk, SOCK_DEAD); > udp_flush_pending_frames(sk); > unlock_sock_fast(sk, slow); > if (static_branch_unlikely(&udp_encap_needed_key)) { > @@ -2857,10 +2858,17 @@ int udp_abort(struct sock *sk, int err) > { > lock_sock(sk); > > + /* udp{v6}_destroy_sock() sets it under the sk lock, avoid racing > + * with close() > + */ > + if (sock_flag(sk, SOCK_DEAD)) > + goto out; > + > sk->sk_err = err; > sk->sk_error_report(sk); > __udp_disconnect(sk, 0); > > +out: > release_sock(sk); > > return 0; > diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c > index 199b080d418a..2984ec43b9aa 100644 > --- a/net/ipv6/udp.c > +++ b/net/ipv6/udp.c > @@ -1598,6 +1598,7 @@ void udpv6_destroy_sock(struct sock *sk) > { > struct udp_sock *up = udp_sk(sk); > lock_sock(sk); > + sock_set_flag(sk, SOCK_DEAD); > udp_v6_flush_pending_frames(sk); > release_sock(sk); > > Hi Paolo, Apologies for delay. I picked up this patch and tried reproducing the panic. I never observed the panic with this patch. Seems to work perfectly fine. Can you pls raise a formal patch for this? Thanks, KP ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2021-06-04 14:27 UTC | newest] Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-05-12 18:21 Panic in udp4_lib_lookup2 kapandey 2021-05-13 8:05 ` Paolo Abeni 2021-05-13 19:48 ` Kaustubh Pandey 2021-05-13 21:18 ` Paolo Abeni 2021-05-18 14:20 ` Kaustubh Pandey 2021-05-21 18:18 ` Paolo Abeni 2021-06-04 14:26 ` Kaustubh Pandey
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.