linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] datagram: return from __skb_recv_datagram() as soon as possible
@ 2018-07-15 11:13 Baoyou Xie
  2018-07-16 22:17 ` Willem de Bruijn
  0 siblings, 1 reply; 3+ messages in thread
From: Baoyou Xie @ 2018-07-15 11:13 UTC (permalink / raw)
  To: davem, willemb, viro, gregkh, pombredanne, tklauser, matthew
  Cc: netdev, linux-kernel, Baoyou Xie

We got a soft lockup in a heavy busy cloud server where RIP is
at _raw_spin_unlock_irqrestore+0x1b/0x40:
        [] finish_wait+0x56/0x70
        [] __skb_recv_datagram+0x3fb/0x5a0
        [] ? datagram_poll+0x100/0x100
        [] skb_recv_datagram+0x41/0x60
        [] netlink_recvmsg+0x62/0x450
        [] sock_recvmsg+0xbf/0x100
        [] ? futex_wait+0x193/0x280
        [] ? finish_task_switch+0x108/0x170
        [] SYSC_recvfrom+0xe8/0x160
        [] ? __schedule+0x3c8/0x990
        [] SyS_recvfrom+0xe/0x10
        [] system_call_fastpath+0x16/0x1b

In fact, a mistake exists in __skb_recv_datagram(). For example,
if a datagram come in persistently after go through the socket
queue, then __skb_wait_for_more_packets() will find out that the
last peeked skb is not the real last one, so it return 0. this
results in long time outer loop, and can trigger soft lockup.

So this patch changes the loop condition to prevent soft lockup.

Signed-off-by: Baoyou Xie <baoyou.xie@gmail.com>
---
 net/core/datagram.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/core/datagram.c b/net/core/datagram.c
index 9938952..76c1001 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -295,9 +295,11 @@ struct sk_buff *__skb_recv_datagram(struct sock *sk, unsigned int flags,
 				    int *peeked, int *off, int *err)
 {
 	struct sk_buff *skb, *last;
+	unsigned long expire;
 	long timeo;
 
 	timeo = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);
+	expire = jiffies + timeo;
 
 	do {
 		skb = __skb_try_recv_datagram(sk, flags, destructor, peeked,
@@ -307,7 +309,7 @@ struct sk_buff *__skb_recv_datagram(struct sock *sk, unsigned int flags,
 
 		if (*err != -EAGAIN)
 			break;
-	} while (timeo &&
+	} while (time_before(jiffies, expire) &&
 		!__skb_wait_for_more_packets(sk, err, &timeo, last));
 
 	return NULL;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH v2] datagram: return from __skb_recv_datagram() as soon as possible
  2018-07-15 11:13 [PATCH v2] datagram: return from __skb_recv_datagram() as soon as possible Baoyou Xie
@ 2018-07-16 22:17 ` Willem de Bruijn
  2018-07-18  5:15   ` David Miller
  0 siblings, 1 reply; 3+ messages in thread
From: Willem de Bruijn @ 2018-07-16 22:17 UTC (permalink / raw)
  To: baoyou.xie
  Cc: David Miller, Willem de Bruijn, Al Viro, Greg Kroah-Hartman,
	pombredanne, Tobias Klauser, Matthew Dawson, Network Development,
	LKML

On Sun, Jul 15, 2018 at 4:17 AM Baoyou Xie <baoyou.xie@gmail.com> wrote:
>
> We got a soft lockup in a heavy busy cloud server where RIP is
> at _raw_spin_unlock_irqrestore+0x1b/0x40:
>         [] finish_wait+0x56/0x70
>         [] __skb_recv_datagram+0x3fb/0x5a0
>         [] ? datagram_poll+0x100/0x100
>         [] skb_recv_datagram+0x41/0x60
>         [] netlink_recvmsg+0x62/0x450
>         [] sock_recvmsg+0xbf/0x100
>         [] ? futex_wait+0x193/0x280
>         [] ? finish_task_switch+0x108/0x170
>         [] SYSC_recvfrom+0xe8/0x160
>         [] ? __schedule+0x3c8/0x990
>         [] SyS_recvfrom+0xe/0x10
>         [] system_call_fastpath+0x16/0x1b
>
> In fact, a mistake exists in __skb_recv_datagram(). For example,
> if a datagram come in persistently after go through the socket
> queue, then __skb_wait_for_more_packets() will find out that the
> last peeked skb is not the real last one, so it return 0. this
> results in long time outer loop, and can trigger soft lockup.

Is this with MSG_PEEK?

If the above occurs, that implies that the queue is not empty so the
next iteration of the loop in __skb_recv_datagram should return
the oldest packet on the queue.

This is a netlink socket. Those do not support SO_PEEK_OFF,
simplifying the problem somewhat.

I do not yet see how this can loop until timeout if data is queued.

> So this patch changes the loop condition to prevent soft lockup.

Bounding waiting time in this manner should not be needed, as
__skb_wait_for_more_packets reduces remaining timeo on wake.

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH v2] datagram: return from __skb_recv_datagram() as soon as possible
  2018-07-16 22:17 ` Willem de Bruijn
@ 2018-07-18  5:15   ` David Miller
  0 siblings, 0 replies; 3+ messages in thread
From: David Miller @ 2018-07-18  5:15 UTC (permalink / raw)
  To: willemdebruijn.kernel
  Cc: baoyou.xie, willemb, viro, gregkh, pombredanne, tklauser,
	matthew, netdev, linux-kernel

From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Date: Mon, 16 Jul 2018 15:17:54 -0700

> If the above occurs, that implies that the queue is not empty so the
> next iteration of the loop in __skb_recv_datagram should return
> the oldest packet on the queue.

Isn't it possible, with two threads pulling from the socket in
parallel, for one of them to be constantly unable to pass that
test:

	if (sk->sk_receive_queue.prev != skb)
		goto out;

because the other one empties the queue too quickly every time?

We sample 'last' with the queue lock held, but the above test is done
without that lock.

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2018-07-18  5:16 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-15 11:13 [PATCH v2] datagram: return from __skb_recv_datagram() as soon as possible Baoyou Xie
2018-07-16 22:17 ` Willem de Bruijn
2018-07-18  5:15   ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).