netdev.vger.kernel.org archive mirror
* [PATCH] tcp: Replace infinite loop on recvmsg bug with proper crash
@ 2012-11-07  0:15 Julius Werner
  2012-11-07  1:39 ` Dave Jones
  0 siblings, 1 reply; 19+ messages in thread
From: Julius Werner @ 2012-11-07  0:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: netdev, Patrick McHardy, Hideaki YOSHIFUJI, James Morris,
	Alexey Kuznetsov, David S. Miller, Sameer Nanda,
	Mandeep Singh Baines, Eric Dumazet, Julius Werner

tcp_recvmsg contains a sanity check that WARNs when there is a gap
between the socket's copied_seq and the first buffer in the
sk_receive_queue. In theory, the TCP stack makes sure that This Should
Never Happen (TM)... however, practice shows that there are still a few
bug reports from it out there (and one in my inbox).

Unfortunately, when it does happen for whatever reason, the situation
is not handled very well: the kernel logs a warning and breaks out of
the loop that walks the receive queue. It proceeds to find nothing else
to do on the socket and hits sk_wait_data, which cannot block because
the receive queue is not empty. As no data was read, the outer while
loop repeats (logging the same warning again) ad infinitum until the
system's syslog exhausts all available hard drive capacity.

This patch improves that behavior by going straight to a proper kernel
crash. The cause of the error can be identified right away and the
system's hard drive is not unnecessarily strained.

Signed-off-by: Julius Werner <jwerner@chromium.org>
---
 net/ipv4/tcp.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 197c000..fcb0927 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1628,7 +1628,7 @@ int tcp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 				 "recvmsg bug: copied %X seq %X rcvnxt %X fl %X\n",
 				 *seq, TCP_SKB_CB(skb)->seq, tp->rcv_nxt,
 				 flags))
-				break;
+				BUG();
 
 			offset = *seq - TCP_SKB_CB(skb)->seq;
 			if (tcp_hdr(skb)->syn)
-- 
1.7.8.6

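The livelock described in the commit message above can be modelled in user space. This is a minimal sketch under stated assumptions, not the kernel code: `struct segment`, `simulate_recv()` and `one_segment_run()` are hypothetical names, the pass limit exists only so that the demonstration terminates, and `seq_before()` mirrors the wraparound-safe comparison performed by the kernel's `before()` helper.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one buffer on the receive queue. */
struct segment {
	uint32_t seq;	/* sequence number of the first byte */
	uint32_t len;	/* payload length in bytes */
};

struct sim_result {
	unsigned warnings;	/* how often "recvmsg bug" would be logged */
	uint32_t bytes_copied;
};

/* Wraparound-safe "a comes before b"; relies on the usual two's-complement
 * conversion, as the kernel's before() does. */
static bool seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* Simplified model of the outer loop. max_passes is an assumption of this
 * sketch only; the loop described in the commit message has no such limit. */
static struct sim_result simulate_recv(uint32_t copied_seq,
				       const struct segment *queue, size_t n,
				       unsigned max_passes)
{
	struct sim_result res = { 0, 0 };

	for (unsigned pass = 0; pass < max_passes; pass++) {
		for (size_t i = 0; i < n; i++) {
			if (seq_before(copied_seq, queue[i].seq)) {
				res.warnings++;	/* gap: warn, then break */
				break;
			}
			uint32_t offset = copied_seq - queue[i].seq;
			if (offset < queue[i].len) {
				uint32_t chunk = queue[i].len - offset;
				copied_seq += chunk;
				res.bytes_copied += chunk;
			}
		}
		if (res.bytes_copied > 0 || n == 0)
			break;	/* data delivered, or empty queue: caller could sleep */
		/* Queue is not empty, so the wait returns at once and we retry. */
	}
	return res;
}

/* Convenience wrapper: a queue holding exactly one segment. */
static struct sim_result one_segment_run(uint32_t copied_seq, uint32_t seg_seq,
					 uint32_t seg_len, unsigned max_passes)
{
	struct segment seg = { seg_seq, seg_len };

	return simulate_recv(copied_seq, &seg, 1, max_passes);
}
```

With one queued segment at sequence 1000 and copied_seq at 900, `one_segment_run(900, 1000, 100, 5)` reports one warning per pass and zero bytes copied. With copied_seq at 1000, the same call copies 100 bytes and logs nothing.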

* Re: [PATCH] tcp: Replace infinite loop on recvmsg bug with proper crash
  2012-11-07  0:15 [PATCH] tcp: Replace infinite loop on recvmsg bug with proper crash Julius Werner
@ 2012-11-07  1:39 ` Dave Jones
  2012-11-07  1:51   ` Julius Werner
  2012-11-07  1:51   ` [PATCH] tcp: Replace infinite loop on recvmsg bug with proper crash Eric Dumazet
  0 siblings, 2 replies; 19+ messages in thread
From: Dave Jones @ 2012-11-07  1:39 UTC (permalink / raw)
  To: Julius Werner
  Cc: linux-kernel, netdev, Patrick McHardy, Hideaki YOSHIFUJI,
	James Morris, Alexey Kuznetsov, David S. Miller, Sameer Nanda,
	Mandeep Singh Baines, Eric Dumazet

On Tue, Nov 06, 2012 at 04:15:35PM -0800, Julius Werner wrote:
 > tcp_recvmsg contains a sanity check that WARNs when there is a gap
 > between the socket's copied_seq and the first buffer in the
 > sk_receive_queue. In theory, the TCP stack makes sure that This Should
 > Never Happen (TM)... however, practice shows that there are still a few
 > bug reports from it out there (and one in my inbox).
 > 
 > Unfortunately, when it does happen for whatever reason, the situation
 > is not handled very well: the kernel logs a warning and breaks out of
 > the loop that walks the receive queue. It proceeds to find nothing else
 > to do on the socket and hits sk_wait_data, which cannot block because
 > the receive queue is not empty. As no data was read, the outer while
 > loop repeats (logging the same warning again) ad infinitum until the
 > system's syslog exhausts all available hard drive capacity.
 > 
 > This patch improves that behavior by going straight to a proper kernel
 > crash. The cause of the error can be identified right away and the
 > system's hard drive is not unnecessarily strained.
 > 
 > Signed-off-by: Julius Werner <jwerner@chromium.org>
 > ---
 >  net/ipv4/tcp.c |    2 +-
 >  1 files changed, 1 insertions(+), 1 deletions(-)
 > 
 > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
 > index 197c000..fcb0927 100644
 > --- a/net/ipv4/tcp.c
 > +++ b/net/ipv4/tcp.c
 > @@ -1628,7 +1628,7 @@ int tcp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 >  				 "recvmsg bug: copied %X seq %X rcvnxt %X fl %X\n",
 >  				 *seq, TCP_SKB_CB(skb)->seq, tp->rcv_nxt,
 >  				 flags))
 > -				break;
 > +				BUG();
 >  
 >  			offset = *seq - TCP_SKB_CB(skb)->seq;
 >  			if (tcp_hdr(skb)->syn)

We've had reports of this WARN against the Fedora kernel for a while.
Had this been immediately followed by a BUG(), we'd have never seen those traces at all,
and just got "my machine just locked up" reports instead.

The proper fix here is to find out why we're getting into this state.

	Dave


* Re: [PATCH] tcp: Replace infinite loop on recvmsg bug with proper crash
  2012-11-07  1:39 ` Dave Jones
@ 2012-11-07  1:51   ` Julius Werner
  2012-11-07 15:54     ` Dave Jones
  2012-11-07  1:51   ` [PATCH] tcp: Replace infinite loop on recvmsg bug with proper crash Eric Dumazet
  1 sibling, 1 reply; 19+ messages in thread
From: Julius Werner @ 2012-11-07  1:51 UTC (permalink / raw)
  To: Dave Jones, Julius Werner, linux-kernel, netdev, Patrick McHardy,
	Hideaki YOSHIFUJI, James Morris, Alexey Kuznetsov,
	David S. Miller, Sameer Nanda, Mandeep Singh Baines,
	Eric Dumazet

> We've had reports of this WARN against the Fedora kernel for a while.
> Had this been immediately followed by a BUG(), we'd have never seen those traces at all,
> and just got "my machine just locked up" reports instead.
>
> The proper fix here is to find out why we're getting into this state.

Are you sure you don't mean the WARN below that ("recvmsg bug 2")
instead? I don't think this one can happen without eventually running
into the syslog overflow issue I described.

I agree that the underlying cause must be fixed too, but as we will
always have bugs in the kernel I think proper handling when it does
happen is also important (and filling the hard disk with junk is
obviously not the best approach). If you think a full panic is too
extreme, I have an alternative version of this patch that logs the
WARN once, closes the socket, and returns EBADFD from the syscall...
would you think that is more appropriate?


* Re: [PATCH] tcp: Replace infinite loop on recvmsg bug with proper crash
  2012-11-07  1:39 ` Dave Jones
  2012-11-07  1:51   ` Julius Werner
@ 2012-11-07  1:51   ` Eric Dumazet
  1 sibling, 0 replies; 19+ messages in thread
From: Eric Dumazet @ 2012-11-07  1:51 UTC (permalink / raw)
  To: Dave Jones
  Cc: Julius Werner, linux-kernel, netdev, Patrick McHardy,
	Hideaki YOSHIFUJI, James Morris, Alexey Kuznetsov,
	David S. Miller, Sameer Nanda, Mandeep Singh Baines,
	Eric Dumazet

On Tue, 2012-11-06 at 20:39 -0500, Dave Jones wrote:
> On Tue, Nov 06, 2012 at 04:15:35PM -0800, Julius Werner wrote:
>  > tcp_recvmsg contains a sanity check that WARNs when there is a gap
>  > between the socket's copied_seq and the first buffer in the
>  > sk_receive_queue. In theory, the TCP stack makes sure that This Should
>  > Never Happen (TM)... however, practice shows that there are still a few
>  > bug reports from it out there (and one in my inbox).
>  > 
>  > Unfortunately, when it does happen for whatever reason, the situation
>  > is not handled very well: the kernel logs a warning and breaks out of
>  > the loop that walks the receive queue. It proceeds to find nothing else
>  > to do on the socket and hits sk_wait_data, which cannot block because
>  > the receive queue is not empty. As no data was read, the outer while
>  > loop repeats (logging the same warning again) ad infinitum until the
>  > system's syslog exhausts all available hard drive capacity.
>  > 
>  > This patch improves that behavior by going straight to a proper kernel
>  > crash. The cause of the error can be identified right away and the
>  > system's hard drive is not unnecessarily strained.
>  > 
>  > Signed-off-by: Julius Werner <jwerner@chromium.org>
>  > ---
>  >  net/ipv4/tcp.c |    2 +-
>  >  1 files changed, 1 insertions(+), 1 deletions(-)
>  > 
>  > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
>  > index 197c000..fcb0927 100644
>  > --- a/net/ipv4/tcp.c
>  > +++ b/net/ipv4/tcp.c
>  > @@ -1628,7 +1628,7 @@ int tcp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
>  >  				 "recvmsg bug: copied %X seq %X rcvnxt %X fl %X\n",
>  >  				 *seq, TCP_SKB_CB(skb)->seq, tp->rcv_nxt,
>  >  				 flags))
>  > -				break;
>  > +				BUG();
>  >  
>  >  			offset = *seq - TCP_SKB_CB(skb)->seq;
>  >  			if (tcp_hdr(skb)->syn)
> 
> We've had reports of this WARN against the Fedora kernel for a while.
> Had this been immediately followed by a BUG(), we'd have never seen those traces at all,
> and just got "my machine just locked up" reports instead.
> 
> The proper fix here is to find out why we're getting into this state.

Yes, but there is no need to fill syslog over and over.

In fact, some drivers are buggy and can overwrite skbs.

That's also a security issue, as the payload can be changed without notice
(unless SSL or application checksums are used; see commit
abf02cfc179bb4bd for an example).

Quite frankly, BUG_ON() here is the only way we can fix bugs instead of
being lazy.


* Re: [PATCH] tcp: Replace infinite loop on recvmsg bug with proper crash
  2012-11-07  1:51   ` Julius Werner
@ 2012-11-07 15:54     ` Dave Jones
  2012-11-07 16:29       ` [PATCH] tcp: Replace infinite loop on recvmsg bug with proper crashusers Eric Dumazet
  0 siblings, 1 reply; 19+ messages in thread
From: Dave Jones @ 2012-11-07 15:54 UTC (permalink / raw)
  To: Julius Werner
  Cc: linux-kernel, netdev, Patrick McHardy, Hideaki YOSHIFUJI,
	James Morris, Alexey Kuznetsov, David S. Miller, Sameer Nanda,
	Mandeep Singh Baines, Eric Dumazet

On Tue, Nov 06, 2012 at 05:51:19PM -0800, Julius Werner wrote:
 > > We've had reports of this WARN against the Fedora kernel for a while.
 > > Had this been immediately followed by a BUG(), we'd have never seen those traces at all,
 > > and just got "my machine just locked up" reports instead.
 > >
 > > The proper fix here is to find out why we're getting into this state.
 > 
 > Are you sure you don't mean the WARN below that ("recvmsg bug 2")
 > instead? I don't think this one can happen without eventually running
 > into the syslog overflow issue I described.

bug2 is more common (and is usually accompanied by mangled traces),
but we have reports of the first WARN too:

https://bugzilla.redhat.com/show_bug.cgi?id=841769
https://bugzilla.redhat.com/show_bug.cgi?id=845853
https://bugzilla.redhat.com/show_bug.cgi?id=846991
https://bugzilla.redhat.com/show_bug.cgi?id=860039

(I note that none of these reports mention "also, my hard disk is now full")

 > I agree that the underlying cause must be fixed too, but as we will
 > always have bugs in the kernel I think proper handling when it does
 > happen is also important (and filling the hard disk with junk is
 > obviously not the best approach). If you think a full panic is too
 > extreme, I have an alternative version of this patch that logs the
 > WARN once, closes the socket, and returns EBADFD from the syscall...
 > would you think that is more appropriate?

It sounds more appropriate to me, instead of silently wedging the box.
At least with that approach we have a chance of finding out what happened.

	Dave


* Re: [PATCH] tcp: Replace infinite loop on recvmsg bug with proper crashusers
  2012-11-07 15:54     ` Dave Jones
@ 2012-11-07 16:29       ` Eric Dumazet
  2012-11-07 16:43         ` Dave Jones
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2012-11-07 16:29 UTC (permalink / raw)
  To: Dave Jones
  Cc: Julius Werner, linux-kernel, netdev, Patrick McHardy,
	Hideaki YOSHIFUJI, James Morris, Alexey Kuznetsov,
	David S. Miller, Sameer Nanda, Mandeep Singh Baines,
	Eric Dumazet

On Wed, 2012-11-07 at 10:54 -0500, Dave Jones wrote:

> It sounds more appropriate to me, instead of silently wedging the box.
> At least with that approach we have a chance of finding out what happened.

It's quite the opposite.

If the bug is still there 6 months after the commits that broke the drivers
(making an old bug visible), that means that people never realized the
bug was there.

I understand a distro maintainer has its own choices, but for the upstream
kernel we want to have early reports.

This bug is fatal and a security issue. BUG() is appropriate.

If the driver can't be fixed, it should be marked broken.

So I personally NACKed patches that hide the bug in an attempt to be
friendly to the user.


* Re: [PATCH] tcp: Replace infinite loop on recvmsg bug with proper crashusers
  2012-11-07 16:29       ` [PATCH] tcp: Replace infinite loop on recvmsg bug with proper crashusers Eric Dumazet
@ 2012-11-07 16:43         ` Dave Jones
  2012-11-07 17:05           ` Eric Dumazet
  0 siblings, 1 reply; 19+ messages in thread
From: Dave Jones @ 2012-11-07 16:43 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Julius Werner, linux-kernel, netdev, Patrick McHardy,
	Hideaki YOSHIFUJI, James Morris, Alexey Kuznetsov,
	David S. Miller, Sameer Nanda, Mandeep Singh Baines,
	Eric Dumazet

On Wed, Nov 07, 2012 at 08:29:12AM -0800, Eric Dumazet wrote:
 > On Wed, 2012-11-07 at 10:54 -0500, Dave Jones wrote:
 > 
 > > It sounds more appropriate to me, instead of silently wedging the box.
 > > At least with that approach we have a chance of finding out what happened.
 > 
 > Its quite the opposite.
 > 
 > If bug is still there 6 months after the commits that broke the drivers,
 > (making an old bug visible) that means that people never realized the
 > bug was there.

dude, look at the bug reports I just pointed you at.
People _are_ aware there are bugs there.

If you turn that into a BUG() those reports would never have been filed.
How is that increasing awareness ?  People are going to see wedged computers,
and hit the reset button. If we're lucky, we'll get photos of someone lucky
enough to have hit it while at the console, not in X. But this is a huge
step backwards for debuggability.

 > I understand a distro maintainer has its own choices, but for upstream
 > kernel we want to have early reports.

I'm running out of ways to word this, but I'll try again.
You won't get those early reports if you turn this into a BUG().

 > This bug is fatal and a security issue. BUG() is appropriate.

turning a bug into a remote DoS is also a security issue.

	Dave


* Re: [PATCH] tcp: Replace infinite loop on recvmsg bug with proper crashusers
  2012-11-07 16:43         ` Dave Jones
@ 2012-11-07 17:05           ` Eric Dumazet
  2012-11-07 17:15             ` Dave Jones
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2012-11-07 17:05 UTC (permalink / raw)
  To: Dave Jones
  Cc: Julius Werner, linux-kernel, netdev, Patrick McHardy,
	Hideaki YOSHIFUJI, James Morris, Alexey Kuznetsov,
	David S. Miller, Sameer Nanda, Mandeep Singh Baines,
	Eric Dumazet

On Wed, 2012-11-07 at 11:43 -0500, Dave Jones wrote:

> dude, look at the bug reports I just pointed you at.
> People _are_ aware there are bugs there.
> 

If I remember well, I helped to fix some of them.

> If you turn that into a BUG() those reports would never have been filed.
> How is that increasing awareness ?  People are going to see wedged computers,
> and hit the reset button. If we're lucky, we'll get photos of someone lucky
> enough to have hit it while at the console, not in X. But this is a huge
> step backwards for debugability.
> 
>  > I understand a distro maintainer has its own choices, but for upstream
>  > kernel we want to have early reports.
> 
> I'm running out of ways to word this, but I'll try again.
> You won't get those early reports if you turn this into a BUG().
> 
>  > This bug is fatal and a security issue. BUG() is appropriate.
> 
> turning a bug into a remote DoS is also a security issue.
> 

Apparently in some cases we can loop and fill the syslog, or
else Julius wouldn't have sent a patch.

So the proper fix is to emit this message only once, and to find
a way to alert the user that security is compromised.

So if BUG() isn't good, just use WARN_ON_ONCE().

I feel that WARN_ON_ONCE() won't be clear enough to the user, especially
if we recover from this by closing the tcp session, exactly as if we
had received a proper FIN.

Really, if you object to a BUG() here, I can't understand why you didn't
object to the other BUG() uses in the kernel.

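The "emit this message only once" behaviour can be sketched in user space. This is an illustration of the idea behind WARN_ON_ONCE(), not the kernel macro itself: `warn_once()` and `count_warnings()` are hypothetical names, and the per-call-site flag is passed explicitly instead of being hidden in a macro.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: reports a condition at most once per call site.
 * Like the kernel macro, it returns the condition, so callers can still
 * branch on it every time. */
static bool warn_once(bool cond, bool *already_warned, const char *msg)
{
	if (cond && !*already_warned) {
		*already_warned = true;
		fprintf(stderr, "WARNING: %s\n", msg);
	}
	return cond;
}

/* Trigger the same bad condition `hits` times and return how many
 * lines were actually logged. */
static unsigned count_warnings(unsigned hits)
{
	bool already_warned = false;
	unsigned logged = 0;

	for (unsigned i = 0; i < hits; i++) {
		bool warned_before = already_warned;

		if (warn_once(true, &already_warned, "recvmsg bug (simulated)"))
			logged += (warned_before != already_warned);
	}
	return logged;
}
```

However often the condition fires, `count_warnings()` logs a single line, which bounds the log growth. It does not by itself stop the loop from spinning, which is why the thread also discusses what to do with the socket.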

* Re: [PATCH] tcp: Replace infinite loop on recvmsg bug with proper crashusers
  2012-11-07 17:05           ` Eric Dumazet
@ 2012-11-07 17:15             ` Dave Jones
  2012-11-07 19:32               ` Julius Werner
  0 siblings, 1 reply; 19+ messages in thread
From: Dave Jones @ 2012-11-07 17:15 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Julius Werner, linux-kernel, netdev, Patrick McHardy,
	Hideaki YOSHIFUJI, James Morris, Alexey Kuznetsov,
	David S. Miller, Sameer Nanda, Mandeep Singh Baines,
	Eric Dumazet

On Wed, Nov 07, 2012 at 09:05:02AM -0800, Eric Dumazet wrote:
 > On Wed, 2012-11-07 at 11:43 -0500, Dave Jones wrote:
 > 
 > > dude, look at the bug reports I just pointed you at.
 > > People _are_ aware there are bugs there.
 > > 
 > If I remember well, I helped to fix some of them.

indeed, and I commend you for it. I want to help you fix more ;)

 > >  > I understand a distro maintainer has its own choices, but for upstream
 > >  > kernel we want to have early reports.
 > > 
 > > I'm running out of ways to word this, but I'll try again.
 > > You won't get those early reports if you turn this into a BUG().
 > > 
 > >  > This bug is fatal and a security issue. BUG() is appropriate.
 > > 
 > > turning a bug into a remote DoS is also a security issue.
 > 
 > Apparently in some cases we can loop and fill the syslog, or
 > else Julius wouldnt have sent a patch.
 > 
 > So the proper fix is to emit this message only once, and to find
 > a way to alert the user security is compromised.
 > 
 > So if BUG() isnt good, just use WARN_ON_ONCE()
 > 
 > I feel that WARN_ON_ONCE() wont be clear enough to the user, especially
 > if we recover from this by closing the tcp session, exactly as if we
 > received a proper FIN.

Judging by the mangled traces we've seen, further reports after the initial
one aren't too useful anyway.  Automated detectors like abrt should be
able to pick up these traces from the logs on the next reboot.
(Which would probably be better than it trying to file them immediately over
 the network when the tcp layer is so confused)

Side note: if the integrity of the tcp layer is in question, maybe some kind of
localised version of BUG() that just shuts down that subsystem might
be something worth pursuing.

 > Really if you object a BUG() here, I cant understand you didnt shout to
 > other BUG() uses in the kernel.

When I see them, I call them. But I am just one person, and usage of that
macro is like a disease.

	Dave


* Re: [PATCH] tcp: Replace infinite loop on recvmsg bug with proper crashusers
  2012-11-07 17:15             ` Dave Jones
@ 2012-11-07 19:32               ` Julius Werner
  2012-11-07 19:33                 ` [PATCH] tcp: Avoid infinite loop on recvmsg bug Julius Werner
  0 siblings, 1 reply; 19+ messages in thread
From: Julius Werner @ 2012-11-07 19:32 UTC (permalink / raw)
  To: Dave Jones, Eric Dumazet, Julius Werner, linux-kernel, netdev,
	Patrick McHardy, Hideaki YOSHIFUJI, James Morris,
	Alexey Kuznetsov, David S. Miller, Sameer Nanda,
	Mandeep Singh Baines, Eric Dumazet

I tend to agree with Dave that it's not in the user's best interest to
have a full-on BUG() here, and that we can get our reports just as
well by fishing them from the log through abrt or something similar. I
will just submit my alternative patch too and let you decide which one
you prefer.

This version shuts down the socket, so the broken receive queue will
not be used again and eventually freed. Other sockets and the system
as a whole will stay usable and probably still work if the bug is a
very rare coincidence. Of course, the driver will still be buggy, but
the same would stay true after a reboot (which is what most people do
after a panic). The userland caller gets an unexpected error code,
which is not the same as receiving a proper FIN and is the only thing
we can do to communicate this.

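From the caller's side, the difference between "a proper FIN" and the unexpected error code could be handled roughly as follows. This is a sketch that assumes a kernel carrying the proposed patch; `classify_recv()` and the `recv_outcome` values are hypothetical names, and the fallback definition of EBADFD is an assumption for platforms whose <errno.h> lacks it.

```c
#include <errno.h>
#include <sys/types.h>

#ifndef EBADFD
#define EBADFD 77	/* assumption: common Linux value, used only if missing */
#endif

enum recv_outcome {
	RECV_DATA,		/* n > 0: payload was delivered */
	RECV_PEER_CLOSED,	/* n == 0: orderly shutdown, peer sent FIN */
	RECV_RETRY,		/* transient condition, call recv() again */
	RECV_KERNEL_ABORT,	/* EBADFD: socket torn down by the sanity check */
	RECV_ERROR		/* any other failure */
};

/* Classify the return value of recv() together with the errno value
 * observed right after the call. */
static enum recv_outcome classify_recv(ssize_t n, int err)
{
	if (n > 0)
		return RECV_DATA;
	if (n == 0)
		return RECV_PEER_CLOSED;
	if (err == EINTR || err == EAGAIN || err == EWOULDBLOCK)
		return RECV_RETRY;
	if (err == EBADFD)
		return RECV_KERNEL_ABORT;
	return RECV_ERROR;
}
```

A caller would pass the result of `recv()` and the saved `errno`, and treat RECV_KERNEL_ABORT as a reason to drop the connection and log the event rather than as a normal end of stream.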

* [PATCH] tcp: Avoid infinite loop on recvmsg bug
  2012-11-07 19:32               ` Julius Werner
@ 2012-11-07 19:33                 ` Julius Werner
  2012-11-07 19:40                   ` Eric Dumazet
  0 siblings, 1 reply; 19+ messages in thread
From: Julius Werner @ 2012-11-07 19:33 UTC (permalink / raw)
  To: linux-kernel
  Cc: netdev, Patrick McHardy, Hideaki YOSHIFUJI, James Morris,
	Alexey Kuznetsov, David S. Miller, Eric Dumazet, Dave Jones,
	Sameer Nanda, Mandeep Singh Baines, Julius Werner

tcp_recvmsg contains a sanity check that WARNs when there is a gap
between the socket's copied_seq and the first buffer in the
sk_receive_queue. In theory, the TCP stack makes sure that This Should
Never Happen (TM)... however, practice shows that there are still a few
bug reports from it out there (and one in my inbox).

Unfortunately, when it does happen for whatever reason, the situation
is not handled very well: the kernel logs a warning and breaks out of
the loop that walks the receive queue. It proceeds to find nothing else
to do on the socket and hits sk_wait_data, which cannot block because
the receive queue is not empty. As no data was read, the outer while
loop repeats (logging the same warning again) ad infinitum until the
system's syslog exhausts all available hard drive capacity.

This patch addresses that issue by closing the socket outright and
throwing EBADFD to userspace (which seems most appropriate to me at this
point). As the underlying bug condition is "impossible" and therefore by
definition unrecoverable, this is the only sensible action other than a
full panic.

Signed-off-by: Julius Werner <jwerner@chromium.org>
---
 net/ipv4/tcp.c |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 197c000..d612308 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1628,7 +1628,7 @@ int tcp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 				 "recvmsg bug: copied %X seq %X rcvnxt %X fl %X\n",
 				 *seq, TCP_SKB_CB(skb)->seq, tp->rcv_nxt,
 				 flags))
-				break;
+				goto selfdestruct;
 
 			offset = *seq - TCP_SKB_CB(skb)->seq;
 			if (tcp_hdr(skb)->syn)
@@ -1936,6 +1936,11 @@ recv_urg:
 recv_sndq:
 	err = tcp_peek_sndq(sk, msg, len);
 	goto out;
+
+selfdestruct:
+	err = -EBADFD;
+	tcp_done(sk);
+	goto out;
 }
 EXPORT_SYMBOL(tcp_recvmsg);
 
-- 
1.7.8.6


* Re: [PATCH] tcp: Avoid infinite loop on recvmsg bug
  2012-11-07 19:33                 ` [PATCH] tcp: Avoid infinite loop on recvmsg bug Julius Werner
@ 2012-11-07 19:40                   ` Eric Dumazet
  2012-11-07 21:14                     ` Julius Werner
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2012-11-07 19:40 UTC (permalink / raw)
  To: Julius Werner
  Cc: linux-kernel, netdev, Patrick McHardy, Hideaki YOSHIFUJI,
	James Morris, Alexey Kuznetsov, David S. Miller, Dave Jones,
	Sameer Nanda, Mandeep Singh Baines

On Wed, 2012-11-07 at 11:33 -0800, Julius Werner wrote:
> tcp_recvmsg contains a sanity check that WARNs when there is a gap
> between the socket's copied_seq and the first buffer in the
> sk_receive_queue. In theory, the TCP stack makes sure that This Should
> Never Happen (TM)... however, practice shows that there are still a few
> bug reports from it out there (and one in my inbox).
> 
> Unfortunately, when it does happen for whatever reason, the situation
> is not handled very well: the kernel logs a warning and breaks out of
> the loop that walks the receive queue. It proceeds to find nothing else
> to do on the socket and hits sk_wait_data, which cannot block because
> the receive queue is not empty. As no data was read, the outer while
> loop repeats (logging the same warning again) ad infinitum until the
> system's syslog exhausts all available hard drive capacity.
> 
> This patch addresses that issue by closing the socket outright and
> throwing EBADFD to userspace (which seems most appropriate to me at this
> point). As the underlying bug condition is "impossible" and therefore by
> definition unrecoverable, this is the only sensible action other than a
> full panic.
> 
> Signed-off-by: Julius Werner <jwerner@chromium.org>
> ---
>  net/ipv4/tcp.c |    7 ++++++-
>  1 files changed, 6 insertions(+), 1 deletions(-)
> 
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 197c000..d612308 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -1628,7 +1628,7 @@ int tcp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
>  				 "recvmsg bug: copied %X seq %X rcvnxt %X fl %X\n",
>  				 *seq, TCP_SKB_CB(skb)->seq, tp->rcv_nxt,
>  				 flags))
> -				break;
> +				goto selfdestruct;
>  
>  			offset = *seq - TCP_SKB_CB(skb)->seq;
>  			if (tcp_hdr(skb)->syn)
> @@ -1936,6 +1936,11 @@ recv_urg:
>  recv_sndq:
>  	err = tcp_peek_sndq(sk, msg, len);
>  	goto out;
> +
> +selfdestruct:
> +	err = -EBADFD;
> +	tcp_done(sk);
> +	goto out;
>  }
>  EXPORT_SYMBOL(tcp_recvmsg);
>  


What I find very sad in all this is that you didn't mention the driver
that was triggering this bug.

So instead of making real progress, we are discussing some dubious
'fixes'.


* Re: [PATCH] tcp: Avoid infinite loop on recvmsg bug
  2012-11-07 19:40                   ` Eric Dumazet
@ 2012-11-07 21:14                     ` Julius Werner
  2012-11-07 23:33                       ` Eric Dumazet
  0 siblings, 1 reply; 19+ messages in thread
From: Julius Werner @ 2012-11-07 21:14 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: linux-kernel, netdev, Patrick McHardy, Hideaki YOSHIFUJI,
	James Morris, Alexey Kuznetsov, David S. Miller, Dave Jones,
	Sameer Nanda, Mandeep Singh Baines

> What I find very sad in all this is that you didnt mention the driver
> that was triggering this bug.

Sorry, I was just trying to keep this thread focussed on one patch.
The bug report that led me to this is publicly accessible at
http://crosbug.com/35827. We have encountered the problem only once,
on an Acer AC700 Chromebook that ran automated tests. The ethernet
interface for the offending socket was provided by a USB-to-Ethernet
dongle using the smsc95xx/usbnet module (v1.0.4).

Don't get me wrong, I do understand the importance of finding the
underlying cause of this... I just don't think I have much of a chance
with one report. I can go through the above-mentioned module and see
if something looks suspicious in the skb handling code if I can find
the time. But on the other hand the fact remains that this condition
is not handled well... not just for this particular case, but for all
future kernel and driver bugs that may trigger it again. I am not
trying to "hide" any issues, I am all for making them as visible as
possible... but as Dave pointed out, kernel panics may not be the best
way to do that either, and I think damage mitigation also has some
value. The current code clearly does the worst of both worlds, so
please let's just improve it one way or the other.


* Re: [PATCH] tcp: Avoid infinite loop on recvmsg bug
  2012-11-07 21:14                     ` Julius Werner
@ 2012-11-07 23:33                       ` Eric Dumazet
  2012-11-07 23:42                         ` Eric Dumazet
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2012-11-07 23:33 UTC (permalink / raw)
  To: Julius Werner
  Cc: linux-kernel, netdev, Patrick McHardy, Hideaki YOSHIFUJI,
	James Morris, Alexey Kuznetsov, David S. Miller, Dave Jones,
	Sameer Nanda, Mandeep Singh Baines

On Wed, 2012-11-07 at 13:14 -0800, Julius Werner wrote:
> > What I find very sad in all this is that you didnt mention the driver
> > that was triggering this bug.
> 
> Sorry, I was just trying to keep this thread focussed on one patch.
> The bug report that led me to this is publicly accessible at
> http://crosbug.com/35827. We have encountered the problem only once,
> on an Acer AC700 Chromebook that ran automated tests. The ethernet
> interface for the offending socket was provided by a USB-to-Ethernet
> dongle using the smsc95xx/usbnet module (v1.0.4).

This driver plays interesting skb_clone() games and lies about skb->truesize:

skb->truesize = size + sizeof(struct sk_buff);

So you are probably fighting a bug we already fixed in the upstream kernel.

(commit c8628155ece363 "tcp: reduce out_of_order memory use" did not
play well with cloned skbs.)

This issue was already discussed on netdev in the past.


* Re: [PATCH] tcp: Avoid infinite loop on recvmsg bug
  2012-11-07 23:33                       ` Eric Dumazet
@ 2012-11-07 23:42                         ` Eric Dumazet
  2012-11-08  2:25                           ` Julius Werner
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2012-11-07 23:42 UTC (permalink / raw)
  To: Julius Werner
  Cc: linux-kernel, netdev, Patrick McHardy, Hideaki YOSHIFUJI,
	James Morris, Alexey Kuznetsov, David S. Miller, Dave Jones,
	Sameer Nanda, Mandeep Singh Baines

On Wed, 2012-11-07 at 15:33 -0800, Eric Dumazet wrote:

> So you probably are fighting a bug we already fixed in upstream kernel.
> 
> (commit c8628155ece363 "tcp: reduce out_of_order memory use" did not
> played well with cloned skbs.)
> 
> This issue was already discussed on netdev in the past.

If you use a 3.4 kernel, you want the following patch.

(I guess you could easily reproduce the crash by running tcpdump in parallel.)


diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 257b617..9f8f68c 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4496,7 +4496,9 @@ static void tcp_data_queue_ofo(struct sock *sk, struct sk_buff *skb)
 		 * to avoid future tcp_collapse_ofo_queue(),
 		 * probably the most expensive function in tcp stack.
 		 */
-		if (skb->len <= skb_tailroom(skb1) && !tcp_hdr(skb)->fin) {
+		if (skb->len <= skb_tailroom(skb1) &&
+		    !tcp_hdr(skb)->fin &&
+		    !skb_cloned(skb1)) {
 			NET_INC_STATS_BH(sock_net(sk),
 					 LINUX_MIB_TCPRCVCOALESCE);
 			BUG_ON(skb_copy_bits(skb, 0,

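Why the extra !skb_cloned() test matters can be illustrated with a user-space model. This is a sketch under assumptions, not kernel code: `struct backing`, `struct view`, `try_coalesce()` and `second_packet_intact()` are hypothetical names, and the model assumes two packets that share one data area back to back, so that the "tailroom" behind the first packet is really the second packet's payload.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical shared data area, standing in for skb data shared by clones. */
struct backing {
	unsigned char data[64];
	int users;	/* number of views that share this area */
};

/* Hypothetical view of one packet inside the shared area. */
struct view {
	struct backing *buf;
	size_t start;
	size_t len;
};

static bool view_cloned(const struct view *v)
{
	return v->buf->users > 1;
}

/* Bytes between the end of this view and the end of the data area. */
static size_t view_tailroom(const struct view *v)
{
	return sizeof(v->buf->data) - (v->start + v->len);
}

/* Append n bytes behind dst if they fit. With check_cloned set, refuse to
 * write into an area that another view also uses. */
static bool try_coalesce(struct view *dst, const unsigned char *src, size_t n,
			 bool check_cloned)
{
	if (n > view_tailroom(dst))
		return false;
	if (check_cloned && view_cloned(dst))
		return false;
	memcpy(dst->buf->data + dst->start + dst->len, src, n);
	dst->len += n;
	return true;
}

/* Two packets, "AAAA" and "BBBB", share one area. Try to append "XX" to the
 * first one and report whether the second one survived. */
static bool second_packet_intact(bool check_cloned)
{
	struct backing buf = { .users = 2 };

	memcpy(buf.data, "AAAABBBB", 8);
	struct view first = { &buf, 0, 4 };

	try_coalesce(&first, (const unsigned char *)"XX", 2, check_cloned);
	return memcmp(buf.data + 4, "BBBB", 4) == 0;
}
```

Without the check, the append succeeds and silently overwrites the start of the second packet. With the check, the append is refused and the data has to be queued separately.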

* Re: [PATCH] tcp: Avoid infinite loop on recvmsg bug
  2012-11-07 23:42                         ` Eric Dumazet
@ 2012-11-08  2:25                           ` Julius Werner
  2012-11-09  3:29                             ` Eric Dumazet
  0 siblings, 1 reply; 19+ messages in thread
From: Julius Werner @ 2012-11-08  2:25 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: linux-kernel, netdev, Patrick McHardy, Hideaki YOSHIFUJI,
	James Morris, Alexey Kuznetsov, David S. Miller, Dave Jones,
	Sameer Nanda, Mandeep Singh Baines

> So you probably are fighting a bug we already fixed in upstream kernel.
>
> (commit c8628155ece363 "tcp: reduce out_of_order memory use" did not
> played well with cloned skbs.)
>
> This issue was already discussed on netdev in the past.

Thanks for the hint. Unfortunately, we have not pulled c8628 into our
tree yet, so that's not it. Is there another point where the cloned
skb or the faked truesize might make it break? We have been running
this test with that hardware some 30 times in the last months and only
seen it once, so it cannot be that common.

I have noticed that you have already proposed a patch to repair
smsc95xx (replacing the clone with a copy) on this list a few times...
what's the status on that? Will it be committed eventually or did you
abandon that approach?

Regardless of that, I still think that the bug handling in tcp_recvmsg
should be updated in one way or the other.


* Re: [PATCH] tcp: Avoid infinite loop on recvmsg bug
  2012-11-08  2:25                           ` Julius Werner
@ 2012-11-09  3:29                             ` Eric Dumazet
  2012-12-10 19:33                               ` Julius Werner
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2012-11-09  3:29 UTC (permalink / raw)
  To: Julius Werner
  Cc: linux-kernel, netdev, Patrick McHardy, Hideaki YOSHIFUJI,
	James Morris, Alexey Kuznetsov, David S. Miller, Dave Jones,
	Sameer Nanda, Mandeep Singh Baines

On Wed, 2012-11-07 at 18:25 -0800, Julius Werner wrote:
> > So you probably are fighting a bug we already fixed in upstream kernel.
> >
> > (commit c8628155ece363 "tcp: reduce out_of_order memory use" did not
> > play well with cloned skbs.)
> >
> > This issue was already discussed on netdev in the past.
> 
> Thanks for the hint. Unfortunately, we have not pulled c8628 into our
> tree yet, so that's not it. Is there another point where the cloned
> skb or the faked truesize might make it break? We have been running
> this test with that hardware some 30 times in the last months and only
> seen it once, so it cannot be that common.

Update: the current Chrome OS tree is based on 3.4 and really did need the
patch:

https://gerrit.chromium.org/gerrit/#/c/37666/


* Re: [PATCH] tcp: Avoid infinite loop on recvmsg bug
  2012-11-09  3:29                             ` Eric Dumazet
@ 2012-12-10 19:33                               ` Julius Werner
  2012-12-10 20:23                                 ` David Miller
  0 siblings, 1 reply; 19+ messages in thread
From: Julius Werner @ 2012-12-10 19:33 UTC (permalink / raw)
  To: Dave Jones
  Cc: linux-kernel, netdev, Patrick McHardy, Hideaki YOSHIFUJI,
	James Morris, Alexey Kuznetsov, Eric Dumazet, David S. Miller,
	Sameer Nanda, Mandeep Singh Baines

Hi Dave,

Have you thought about picking up one of the patches to tcp_recvmsg I
proposed in this thread? We consider the underlying bug in Chromium OS
that led me here to be fixed now, but I bet this will not be the
last time someone hits this code path and has to deal with the bad
error handling.

I understand that not everyone here agrees on what the best solution
is, but I think both of them are far better than the inconsistent and
potentially hard-disk-filling way that the current kernel does it.

On Wed, 2012-11-07 at 11:33 -0800, Julius Werner wrote:
> tcp_recvmsg contains a sanity check that WARNs when there is a gap
> between the socket's copied_seq and the first buffer in the
> sk_receive_queue. In theory, the TCP stack makes sure that This Should
> Never Happen (TM)... however, practice shows that there are still a few
> bug reports from it out there (and one in my inbox).
>
> Unfortunately, when it does happen for whatever reason, the situation
> is not handled very well: the kernel logs a warning and breaks out of
> the loop that walks the receive queue. It proceeds to find nothing else
> to do on the socket and hits sk_wait_data, which cannot block because
> the receive queue is not empty. As no data was read, the outer while
> loop repeats (logging the same warning again) ad infinitum until the
> system's syslog exhausts all available hard drive capacity.
>
> This patch addresses that issue by closing the socket outright and
> throwing EBADFD to userspace (which seems most appropriate to me at this
> point). As the underlying bug condition is "impossible" and therefore by
> definition unrecoverable, this is the only sensible action other than a
> full panic.
>
> Signed-off-by: Julius Werner <jwerner@chromium.org>
> ---
>  net/ipv4/tcp.c |    7 ++++++-
>  1 files changed, 6 insertions(+), 1 deletions(-)
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 197c000..d612308 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -1628,7 +1628,7 @@ int tcp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
>                                "recvmsg bug: copied %X seq %X rcvnxt %X fl %X\n",
>                                *seq, TCP_SKB_CB(skb)->seq, tp->rcv_nxt,
>                                flags))
> -                             break;
> +                             goto selfdestruct;
>
>                       offset = *seq - TCP_SKB_CB(skb)->seq;
>                       if (tcp_hdr(skb)->syn)
> @@ -1936,6 +1936,11 @@ recv_urg:
>  recv_sndq:
>       err = tcp_peek_sndq(sk, msg, len);
>       goto out;
> +
> +selfdestruct:
> +     err = -EBADFD;
> +     tcp_done(sk);
> +     goto out;
>  }
>  EXPORT_SYMBOL(tcp_recvmsg);
>
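
The difference between the old "break" and the proposed "goto selfdestruct"
can be sketched in user space as follows. This is a toy model under loud
assumptions, not the kernel code: struct toy_sock, enum toy_policy and
toy_recvmsg() are hypothetical names, the receive queue is reduced to a
single head buffer, and max_spins is an artificial bound added only so the
"break" case terminates (returning -ELOOP) instead of spinning forever.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#ifndef EBADFD
#define EBADFD 77	/* fallback for non-Linux hosts; value as on Linux */
#endif

/* Toy socket: one queued buffer plus the reader's position (hypothetical). */
struct toy_sock {
	uint32_t copied_seq;	/* next sequence number the reader expects */
	uint32_t head_seq;	/* sequence number of the first queued buffer */
	uint32_t head_len;	/* payload length of that buffer */
	bool queue_empty;
	bool closed;
	unsigned int warnings;	/* how many times the sanity check fired */
};

enum toy_policy { TOY_BREAK, TOY_SELFDESTRUCT };

/* Returns bytes copied, or a negative errno value. */
static int toy_recvmsg(struct toy_sock *sk, enum toy_policy policy,
		       unsigned int max_spins)
{
	unsigned int spins = 0;

	assert(max_spins > 0);

	for (;;) {
		if (!sk->queue_empty) {
			/* Gap check: reader position is before the head buffer. */
			if ((int32_t)(sk->copied_seq - sk->head_seq) < 0) {
				sk->warnings++;
				if (policy == TOY_SELFDESTRUCT) {
					sk->closed = true;	/* models tcp_done() */
					return -EBADFD;
				}
				/* TOY_BREAK: nothing copied, fall to the wait step. */
			} else {
				uint32_t offset = sk->copied_seq - sk->head_seq;

				sk->queue_empty = true;	/* buffer consumed */
				if (offset < sk->head_len) {
					uint32_t n = sk->head_len - offset;

					sk->copied_seq += n;
					return (int)n;
				}
			}
		}

		/* Wait step: it can only sleep when the queue is empty. */
		if (sk->queue_empty)
			return -EAGAIN;	/* models blocking for new data */
		if (++spins >= max_spins)
			return -ELOOP;	/* artificial bound, not in the kernel */
	}
}
```

In this model a socket with copied_seq behind head_seq logs one warning per
spin under TOY_BREAK until the artificial bound is hit, while
TOY_SELFDESTRUCT logs a single warning, marks the socket closed and returns
-EBADFD.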

On Tue, Nov 06, 2012 at 04:15:35PM -0800, Julius Werner wrote:
> tcp_recvmsg contains a sanity check that WARNs when there is a gap
> between the socket's copied_seq and the first buffer in the
> sk_receive_queue. In theory, the TCP stack makes sure that This Should
> Never Happen (TM)... however, practice shows that there are still a few
> bug reports from it out there (and one in my inbox).
>
> Unfortunately, when it does happen for whatever reason, the situation
> is not handled very well: the kernel logs a warning and breaks out of
> the loop that walks the receive queue. It proceeds to find nothing else
> to do on the socket and hits sk_wait_data, which cannot block because
> the receive queue is not empty. As no data was read, the outer while
> loop repeats (logging the same warning again) ad infinitum until the
> system's syslog exhausts all available hard drive capacity.
>
> This patch improves that behavior by going straight to a proper kernel
> crash. The cause of the error can be identified right away and the
> system's hard drive is not unnecessarily strained.
>
> Signed-off-by: Julius Werner <jwerner@chromium.org>
> ---
>  net/ipv4/tcp.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 197c000..fcb0927 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -1628,7 +1628,7 @@ int tcp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
>                               "recvmsg bug: copied %X seq %X rcvnxt %X fl %X\n",
>                               *seq, TCP_SKB_CB(skb)->seq, tp->rcv_nxt,
>                               flags))
> -                            break;
> +                            BUG();
>
>                      offset = *seq - TCP_SKB_CB(skb)->seq;
>                      if (tcp_hdr(skb)->syn)


* Re: [PATCH] tcp: Avoid infinite loop on recvmsg bug
  2012-12-10 19:33                               ` Julius Werner
@ 2012-12-10 20:23                                 ` David Miller
  0 siblings, 0 replies; 19+ messages in thread
From: David Miller @ 2012-12-10 20:23 UTC (permalink / raw)
  To: jwerner
  Cc: davej, linux-kernel, netdev, kaber, yoshfuji, jmorris, kuznet,
	eric.dumazet, snanda, msb


I've tossed these two patches under the carpet, so you'll need to
repost whichever one you want me to consider.

Basically, discussing old patches is pretty useless without a resend
to get it back into the forefront of the patchwork queue.  So please
don't reference old stale patches without an associated repost like
this.

Thanks.


Thread overview: 19+ messages
2012-11-07  0:15 [PATCH] tcp: Replace infinite loop on recvmsg bug with proper crash Julius Werner
2012-11-07  1:39 ` Dave Jones
2012-11-07  1:51   ` Julius Werner
2012-11-07 15:54     ` Dave Jones
2012-11-07 16:29       ` [PATCH] tcp: Replace infinite loop on recvmsg bug with proper crashusers Eric Dumazet
2012-11-07 16:43         ` Dave Jones
2012-11-07 17:05           ` Eric Dumazet
2012-11-07 17:15             ` Dave Jones
2012-11-07 19:32               ` Julius Werner
2012-11-07 19:33                 ` [PATCH] tcp: Avoid infinite loop on recvmsg bug Julius Werner
2012-11-07 19:40                   ` Eric Dumazet
2012-11-07 21:14                     ` Julius Werner
2012-11-07 23:33                       ` Eric Dumazet
2012-11-07 23:42                         ` Eric Dumazet
2012-11-08  2:25                           ` Julius Werner
2012-11-09  3:29                             ` Eric Dumazet
2012-12-10 19:33                               ` Julius Werner
2012-12-10 20:23                                 ` David Miller
2012-11-07  1:51   ` [PATCH] tcp: Replace infinite loop on recvmsg bug with proper crash Eric Dumazet
