All of lore.kernel.org
 help / color / mirror / Atom feed
From: Paolo Abeni <pabeni@redhat.com>
To: Kaustubh Pandey <kapandey@codeaurora.org>
Cc: willemb@google.com, netdev@vger.kernel.org,
	eric.dumazet@gmail.com, sharathv@codeaurora.org,
	subashab@codeaurora.org
Subject: Re: Panic in udp4_lib_lookup2
Date: Fri, 21 May 2021 20:18:56 +0200	[thread overview]
Message-ID: <da0d68ea162be29d489ad9dec575f8a8558e8e57.camel@redhat.com> (raw)
In-Reply-To: <20210518142023.GA16017@kapandey-linux.qualcomm.com>

On Tue, 2021-05-18 at 19:50 +0530, Kaustubh Pandey wrote:
> On Thu, May 13, 2021 at 11:18:53PM +0200, Paolo Abeni wrote:
> > On Fri, 2021-05-14 at 01:18 +0530, Kaustubh Pandey wrote:
> > > On Thu, May 13, 2021 at 10:05:37AM +0200, Paolo Abeni wrote:
> > > > Hello,
> > > > 
> > > > On Wed, 2021-05-12 at 23:51 +0530, kapandey@codeaurora.org wrote:
> > > > > We observed panic in udp_lib_lookup with below call trace:
> > > > > [136523.743271]  (7) Call trace:
> > > > > [136523.743275]  (7)  udp4_lib_lookup2+0x88/0x1d8
> > > > > [136523.743277]  (7)  __udp4_lib_lookup+0x168/0x194
> > > > > [136523.743280]  (7)  udp4_lib_lookup+0x28/0x54
> > > > > [136523.743285]  (7)  nf_sk_lookup_slow_v4+0x2b4/0x384
> > > > > [136523.743289]  (7)  owner_mt+0xb8/0x248
> > > > > [136523.743292]  (7)  ipt_do_table+0x28c/0x6a8
> > > > > [136523.743295]  (7) iptable_filter_hook+0x24/0x30
> > > > > [136523.743299]  (7)  nf_hook_slow+0xa8/0x148
> > > > > [136523.743303]  (7)  ip_local_deliver+0xa8/0x14c
> > > > > [136523.743305]  (7)  ip_rcv+0xe0/0x134
> > > > 
> > > > It would be helpful if you could provide a full, decoded, stack trace
> > > > and the relevant kernel version.
> > > > 
> > > > > We suspect this might happen due to below sequence:
> > > > 
> > > > Some email formatting made the "graph" very hard to read...
> > > > 
> > > > > Time                                                   CPU X
> > > > > 
> > > > >                                     CPU Y
> > > > > t0                                inet_release -> udp_lib_close ->
> > > > > sk_common_release -> udp_lib_unhash
> > > > > inet_diag_handler_cmd -> udp_diag_destroy -> __udp_diag_destroy ->
> > > > > udp_lib_rehash
> > > > 
> > > > ... but it looks like udp_lib_close() is missing a
> > > > lock_sock()/release_sock() pair. Something alike:
> > > > ---
> > > > diff --git a/include/net/udp.h b/include/net/udp.h
> > > > index 360df454356c..06586b42db3f 100644
> > > > --- a/include/net/udp.h
> > > > +++ b/include/net/udp.h
> > > > @@ -209,7 +209,9 @@ void udp_lib_rehash(struct sock *sk, u16 new_hash);
> > > > 
> > > >  static inline void udp_lib_close(struct sock *sk, long timeout)
> > > >  {
> > > > +	lock_sock(sk);
> > > >  	sk_common_release(sk);
> > > > +	release_sock(sk);
> > > >  }
> > > > 
> > > >  int udp_lib_get_port(struct sock *sk, unsigned short snum,
> > > > ---
> > > > 
> > > > could u please give the above a spin in your testbed?
> > > > 
> > > > Thanks!
> > > > 
> > > > Paolo
> > > > 
> > > > 
> > > Hi Paolo,
> > > 
> > > Pls find full backtrace:
> > > [136523.743141]Unable to handle kernel paging request at virtual address
> > > 00000c08ff000820
> > > 
> > > [136523.743147]Mem abort info:
> > > [136523.743150]  ESR = 0x96000004
> > > [136523.743154]  EC = 0x25: DABT (current EL), IL = 32 bits
> > > [136523.743157]  SET = 0, FnV = 0
> > > [136523.743159]  EA = 0, S1PTW = 0
> > > [136523.743162]Data abort info:
> > > [136523.743164]  ISV = 0, ISS = 0x00000004
> > > [136523.743167]  CM = 0, WnR = 0
> > > [136523.743170][00000c08ff000820] address between user and kernel
> > > address ranges
> > > [136523.743174]Internal error: Oops: 96000004 [#1] PREEMPT SMP
> > > [136523.743218]pstate: a0c00005 (NzCv daif +PAN +UAO)
> > > 
> > > [136523.743226]pc : udp4_lib_lookup2+0x88/0x1d8
> > > [136523.743229]lr : __udp4_lib_lookup+0x168/0x194
> > > 
> > > [136523.743232]sp : ffffffc039a9b600
> > > [136523.743234]x29: ffffffc039a9b610 x28: 0000000000000000
> > > [136523.743237]x27: 0000000000000000 x26: ff000c08ff0007f0
> > > [136523.743239]x25: ffffffe1e2319b40 x24: 0000000073737272
> > > [136523.743242]x23: 0000000000003500 x22: 0000000000000000
> > > [136523.743244]x21: 000000000000a622 x20: 000000000000001a
> > > [136523.743246]x19: 00000000ffffffff x18: ffffffc02b6d7098
> > > [136523.743249]x17: 0000000000000000 x16: 0000000000000000
> > > [136523.743251]x15: 0000000000000000 x14: 0000000000000000
> > > [136523.743254]x13: 0000000000000000 x12: ffffff81d8e30000
> > > [136523.743257]x11: 0000000000000fff x10: ff000c08ff0007f0
> > > [136523.743259]x9 : ff000c08ff000808 x8 : 0000000000000000
> > > [136523.743261]x7 : ffffff81d8e3e6a0 x6 : 0000000000000000
> > > [136523.743263]x5 : 000000000000001a x4 : 000000000000a622
> > > [136523.743266]x3 : 0000000000000000 x2 : 0000000000003500
> > > [136523.743268]x1 : 0000000073737272 x0 : ffffffe1e2319b40
> > > [136523.743271]Call trace:
> > > [136523.743275] udp4_lib_lookup2+0x88/0x1d8
> > > [136523.743277] __udp4_lib_lookup+0x168/0x194
> > > [136523.743280] udp4_lib_lookup+0x28/0x54
> > > [136523.743285] nf_sk_lookup_slow_v4+0x2b4/0x384
> > > [136523.743289] owner_mt+0xb8/0x248
> > > [136523.743292] ipt_do_table+0x28c/0x6a8
> > > [136523.743295] iptable_filter_hook+0x24/0x30
> > > [136523.743299] nf_hook_slow+0xa8/0x148
> > > [136523.743303] ip_local_deliver+0xa8/0x14c
> > > [136523.743305] ip_rcv+0xe0/0x134
> > > [136523.743309] __netif_receive_skb_core+0x9d0/0xd74
> > > [136523.743313] __netif_receive_skb+0x50/0x17c
> > > [136523.743316] netif_receive_skb_internal+0xa0/0xd8
> > > [136523.743318] netif_receive_skb+0xec/0x1e4
> > > 
> > > Kernel version used is 5.4
> > > 
> > > Derived from your suggestion I think below change might be more suited
> > > here:
> > > diff --git a/net/core/sock.c b/net/core/sock.c
> > > index f87c9f7..a5dd0e5 100644
> > > --- a/net/core/sock.c
> > > +++ b/net/core/sock.c
> > > @@ -3210,7 +3210,9 @@ void sk_common_release(struct sock *sk)
> > >          * A. Remove from hash tables.
> > >          */
> > > +       lock_sock(sk);
> > >        sk->sk_prot->unhash(sk);
> > > +       release_sock(sk);
> > 
> > The above will cause deadlock on protocol rightfully calling
> > sk_common_release under the socket lock (e.g. sctp).
> > 
> > Could you please test the change I suggested in your testbed?
> > 
> > Thanks
> > 
> > Paolo
> > 
> Hi Paolo,
> 
> I picked up your change and with that it get stuck in udp_destroy_sock
> trying to do lock_sock_fast.
> 
> Can you guide for next steps here?

I'm sorry for the high latency. 

It's a bit tricky as we can't easily acquire the socket lock on close
with a wider scope.

Can you please try this one?

Thanks!

Paolo
---
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 055fceb18bea..0d6205726abf 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2607,6 +2607,7 @@ void udp_destroy_sock(struct sock *sk)
 {
 	struct udp_sock *up = udp_sk(sk);
 	bool slow = lock_sock_fast(sk);
+	sock_set_flag(sk, SOCK_DEAD);
 	udp_flush_pending_frames(sk);
 	unlock_sock_fast(sk, slow);
 	if (static_branch_unlikely(&udp_encap_needed_key)) {
@@ -2857,10 +2858,17 @@ int udp_abort(struct sock *sk, int err)
 {
 	lock_sock(sk);
 
+	/* udp{v6}_destroy_sock() sets it under the sk lock, avoid racing
+	 * with close()
+	 */
+	if (sock_flag(sk, SOCK_DEAD))
+		goto out;
+
 	sk->sk_err = err;
 	sk->sk_error_report(sk);
 	__udp_disconnect(sk, 0);
 
+out:
 	release_sock(sk);
 
 	return 0;
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 199b080d418a..2984ec43b9aa 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1598,6 +1598,7 @@ void udpv6_destroy_sock(struct sock *sk)
 {
 	struct udp_sock *up = udp_sk(sk);
 	lock_sock(sk);
+	sock_set_flag(sk, SOCK_DEAD);
 	udp_v6_flush_pending_frames(sk);
 	release_sock(sk);
 


  reply	other threads:[~2021-05-21 18:19 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-12 18:21 Panic in udp4_lib_lookup2 kapandey
2021-05-13  8:05 ` Paolo Abeni
2021-05-13 19:48   ` Kaustubh Pandey
2021-05-13 21:18     ` Paolo Abeni
2021-05-18 14:20       ` Kaustubh Pandey
2021-05-21 18:18         ` Paolo Abeni [this message]
2021-06-04 14:26           ` Kaustubh Pandey

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=da0d68ea162be29d489ad9dec575f8a8558e8e57.camel@redhat.com \
    --to=pabeni@redhat.com \
    --cc=eric.dumazet@gmail.com \
    --cc=kapandey@codeaurora.org \
    --cc=netdev@vger.kernel.org \
    --cc=sharathv@codeaurora.org \
    --cc=subashab@codeaurora.org \
    --cc=willemb@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.