All of lore.kernel.org
 help / color / mirror / Atom feed
* TCP-MD5 checksum failure on x86_64 SMP
       [not found] <i2h571fb4001005031027y4a58c4dtfd28ddcdc08d8401@mail.gmail.com>
@ 2010-05-04  3:30 ` Bhaskar Dutta
  2010-05-04 11:32   ` Ben Hutchings
  0 siblings, 1 reply; 48+ messages in thread
From: Bhaskar Dutta @ 2010-05-04  3:30 UTC (permalink / raw)
  To: netdev

Hi,

I am observing intermittent TCP-MD5 checksum failures
(CONFIG_TCP_MD5SIG)  on kernel 2.6.31 while talking to a BGP router.

The problem is only seen in multi-core 64 bit machines.
Is there any known bug in the per_cpu_ptr implementation (I am aware
that the percpu allocator has been re-implemented in 2.6.33) that
might cause a corruption in 64 bit SMP machines?

Any pointers would be appreciated.

Thanks,
Bhaskar

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: TCP-MD5 checksum failure on x86_64 SMP
  2010-05-04  3:30 ` TCP-MD5 checksum failure on x86_64 SMP Bhaskar Dutta
@ 2010-05-04 11:32   ` Ben Hutchings
  2010-05-04 14:28     ` Bhaskar Dutta
  0 siblings, 1 reply; 48+ messages in thread
From: Ben Hutchings @ 2010-05-04 11:32 UTC (permalink / raw)
  To: Bhaskar Dutta; +Cc: netdev

On Tue, 2010-05-04 at 09:00 +0530, Bhaskar Dutta wrote:
> Hi,
> 
> I am observing intermittent TCP-MD5 checksum failures
> (CONFIG_TCP_MD5SIG)  on kernel 2.6.31 while talking to a BGP router.
> 
> The problem is only seen in multi-core 64 bit machines.
> Is there any known bug in the per_cpu_ptr implementation (I am aware
> that the percpu allocator has been re-implemented in 2.6.33) that
> might cause a corruption in 64 bit SMP machines?
> 
> Any pointers would be appreciated.

There was another recent report of incorrect MD5 signatures in
<http://thread.gmane.org/gmane.linux.network/159556>, but without any
response.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: TCP-MD5 checksum failure on x86_64 SMP
  2010-05-04 11:32   ` Ben Hutchings
@ 2010-05-04 14:28     ` Bhaskar Dutta
  2010-05-04 16:12       ` Stephen Hemminger
  0 siblings, 1 reply; 48+ messages in thread
From: Bhaskar Dutta @ 2010-05-04 14:28 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: netdev

On Tue, May 4, 2010 at 5:02 PM, Ben Hutchings <bhutchings@solarflare.com> wrote:
> On Tue, 2010-05-04 at 09:00 +0530, Bhaskar Dutta wrote:
>> Hi,
>>
>> I am observing intermittent TCP-MD5 checksum failures
>> (CONFIG_TCP_MD5SIG)  on kernel 2.6.31 while talking to a BGP router.
>>
>> The problem is only seen in multi-core 64 bit machines.
>> Is there any known bug in the per_cpu_ptr implementation (I am aware
>> that the percpu allocator has been re-implemented in 2.6.33) that
>> might cause a corruption in 64 bit SMP machines?
>>
>> Any pointers would be appreciated.
>
> There was another recent report of incorrect MD5 signatures in
> <http://thread.gmane.org/gmane.linux.network/159556>, but without any
> response.
>
> Ben.
>

I found another thread posted back in Jan 2007 with a similar bug
(x86_64 on 2.6.20) but no replies to that as well.
http://lkml.org/lkml/2007/1/20/56

Bhaskar

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: TCP-MD5 checksum failure on x86_64 SMP
  2010-05-04 14:28     ` Bhaskar Dutta
@ 2010-05-04 16:12       ` Stephen Hemminger
  2010-05-04 17:08         ` Bhaskar Dutta
  0 siblings, 1 reply; 48+ messages in thread
From: Stephen Hemminger @ 2010-05-04 16:12 UTC (permalink / raw)
  To: Bhaskar Dutta; +Cc: Ben Hutchings, netdev

On Tue, 4 May 2010 19:58:32 +0530
Bhaskar Dutta <bhaskie@gmail.com> wrote:

> On Tue, May 4, 2010 at 5:02 PM, Ben Hutchings <bhutchings@solarflare.com> wrote:
> > On Tue, 2010-05-04 at 09:00 +0530, Bhaskar Dutta wrote:
> >> Hi,
> >>
> >> I am observing intermittent TCP-MD5 checksum failures
> >> (CONFIG_TCP_MD5SIG)  on kernel 2.6.31 while talking to a BGP router.
> >>
> >> The problem is only seen in multi-core 64 bit machines.
> >> Is there any known bug in the per_cpu_ptr implementation (I am aware
> >> that the percpu allocator has been re-implemented in 2.6.33) that
> >> might cause a corruption in 64 bit SMP machines?
> >>
> >> Any pointers would be appreciated.
> >
> > There was another recent report of incorrect MD5 signatures in
> > <http://thread.gmane.org/gmane.linux.network/159556>, but without any
> > response.
> >
> > Ben.
> >
> 
> I found another thread posted back in Jan 2007 with a similar bug
> (x86_64 on 2.6.20) but no replies to that as well.
> http://lkml.org/lkml/2007/1/20/56

2.6.20 had lots of other MD5 bugs. Your problem might be related to
GRO.  MD5 may not handle multi-fragment packets.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: TCP-MD5 checksum failure on x86_64 SMP
  2010-05-04 16:12       ` Stephen Hemminger
@ 2010-05-04 17:08         ` Bhaskar Dutta
  2010-05-04 17:13           ` Stephen Hemminger
  0 siblings, 1 reply; 48+ messages in thread
From: Bhaskar Dutta @ 2010-05-04 17:08 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Ben Hutchings, netdev

On Tue, May 4, 2010 at 9:42 PM, Stephen Hemminger <shemminger@vyatta.com> wrote:
> On Tue, 4 May 2010 19:58:32 +0530
> Bhaskar Dutta <bhaskie@gmail.com> wrote:
>
>> On Tue, May 4, 2010 at 5:02 PM, Ben Hutchings <bhutchings@solarflare.com> wrote:
>> > On Tue, 2010-05-04 at 09:00 +0530, Bhaskar Dutta wrote:
>> >> Hi,
>> >>
>> >> I am observing intermittent TCP-MD5 checksum failures
>> >> (CONFIG_TCP_MD5SIG)  on kernel 2.6.31 while talking to a BGP router.
>> >>
>> >> The problem is only seen in multi-core 64 bit machines.
>> >> Is there any known bug in the per_cpu_ptr implementation (I am aware
>> >> that the percpu allocator has been re-implemented in 2.6.33) that
>> >> might cause a corruption in 64 bit SMP machines?
>> >>
>> >> Any pointers would be appreciated.
>> >
>> > There was another recent report of incorrect MD5 signatures in
>> > <http://thread.gmane.org/gmane.linux.network/159556>, but without any
>> > response.
>> >
>> > Ben.
>> >
>>
>> I found another thread posted back in Jan 2007 with a similar bug
>> (x86_64 on 2.6.20) but no replies to that as well.
>> http://lkml.org/lkml/2007/1/20/56
>
> 2.6.20 had lots of other MD5 bugs. Your problem might be related to
> GRO.  MD5 may not handle multi-fragment packets.
> --

I am getting the issue on 2.6.31 and 2.6.28 (gro infrastructure was
added in 2.6.29).
Also, both segmentation offloading as well as receive offloading
(gso/gro) are turned off.

Moreover outgoing TCP packets are the ones with the corrupt checksums.
Both tcpdump on my local machine and the BGP router on the other side
complain of the bad checksums with the same packet.

I am trying to figure out if there is something in the per-cpu
implementation that might be causing a corruption (SMP and x86_64) but
I am not really getting anywhere.

I am trying to reproduce the bad checksums with the latest kernel
sources since it has a new implementation of the percpu allocator.

Any pointers would be highly appreciated!

Thanks,
Bhaskar

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: TCP-MD5 checksum failure on x86_64 SMP
  2010-05-04 17:08         ` Bhaskar Dutta
@ 2010-05-04 17:13           ` Stephen Hemminger
  2010-05-05 18:03             ` Bhaskar Dutta
  0 siblings, 1 reply; 48+ messages in thread
From: Stephen Hemminger @ 2010-05-04 17:13 UTC (permalink / raw)
  To: Bhaskar Dutta; +Cc: Ben Hutchings, netdev

On Tue, 4 May 2010 22:38:49 +0530
Bhaskar Dutta <bhaskie@gmail.com> wrote:

> On Tue, May 4, 2010 at 9:42 PM, Stephen Hemminger <shemminger@vyatta.com> wrote:
> > On Tue, 4 May 2010 19:58:32 +0530
> > Bhaskar Dutta <bhaskie@gmail.com> wrote:
> >
> >> On Tue, May 4, 2010 at 5:02 PM, Ben Hutchings <bhutchings@solarflare.com> wrote:
> >> > On Tue, 2010-05-04 at 09:00 +0530, Bhaskar Dutta wrote:
> >> >> Hi,
> >> >>
> >> >> I am observing intermittent TCP-MD5 checksum failures
> >> >> (CONFIG_TCP_MD5SIG)  on kernel 2.6.31 while talking to a BGP router.
> >> >>
> >> >> The problem is only seen in multi-core 64 bit machines.
> >> >> Is there any known bug in the per_cpu_ptr implementation (I am aware
> >> >> that the percpu allocator has been re-implemented in 2.6.33) that
> >> >> might cause a corruption in 64 bit SMP machines?
> >> >>
> >> >> Any pointers would be appreciated.
> >> >
> >> > There was another recent report of incorrect MD5 signatures in
> >> > <http://thread.gmane.org/gmane.linux.network/159556>, but without any
> >> > response.
> >> >
> >> > Ben.
> >> >
> >>
> >> I found another thread posted back in Jan 2007 with a similar bug
> >> (x86_64 on 2.6.20) but no replies to that as well.
> >> http://lkml.org/lkml/2007/1/20/56
> >
> > 2.6.20 had lots of other MD5 bugs. Your problem might be related to
> > GRO.  MD5 may not handle multi-fragment packets.
> > --
> 
> I am getting the issue on 2.6.31 and 2.6.28 (gro infrastructure was
> added in 2.6.29).
> Also, both segmentation offloading as well as receive offloading
> (gso/gro) are turned off.
> 
> Moreover outgoing TCP packets are the ones with the corrupt checksums.
> Both tcpdump on my local machine and the BGP router on the other side
> complain of the bad checksums with the same packet.
> 
> I am trying to figure out if there is something in the per-cpu
> implementation that might be causing a corruption (SMP and x86_64) but
> I am not really getting anywhere.

I seriously doubt the per-cpu stuff is the issue.

> I am trying to reproduce the bad checksums with the latest kernel
> sources since it has a new implementation of the percpu allocator.

First turn off all offload settings on the device (TSO,GSO,SG,CSUM)
then check that size of the bad packets. Are they fragmented or
just simple linear packets?



-- 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: TCP-MD5 checksum failure on x86_64 SMP
  2010-05-04 17:13           ` Stephen Hemminger
@ 2010-05-05 18:03             ` Bhaskar Dutta
  2010-05-05 18:53               ` Eric Dumazet
  0 siblings, 1 reply; 48+ messages in thread
From: Bhaskar Dutta @ 2010-05-05 18:03 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Ben Hutchings, netdev

On Tue, May 4, 2010 at 10:43 PM, Stephen Hemminger
<shemminger@vyatta.com> wrote:
>
> On Tue, 4 May 2010 22:38:49 +0530
> Bhaskar Dutta <bhaskie@gmail.com> wrote:
>
> > On Tue, May 4, 2010 at 9:42 PM, Stephen Hemminger <shemminger@vyatta.com> wrote:
> > > On Tue, 4 May 2010 19:58:32 +0530
> > > Bhaskar Dutta <bhaskie@gmail.com> wrote:
> > >
> > >> On Tue, May 4, 2010 at 5:02 PM, Ben Hutchings <bhutchings@solarflare.com> wrote:
> > >> > On Tue, 2010-05-04 at 09:00 +0530, Bhaskar Dutta wrote:
> > >> >> Hi,
> > >> >>
> > >> >> I am observing intermittent TCP-MD5 checksum failures
> > >> >> (CONFIG_TCP_MD5SIG)  on kernel 2.6.31 while talking to a BGP router.
> > >> >>
> > >> >> The problem is only seen in multi-core 64 bit machines.
> > >> >> Is there any known bug in the per_cpu_ptr implementation (I am aware
> > >> >> that the percpu allocator has been re-implemented in 2.6.33) that
> > >> >> might cause a corruption in 64 bit SMP machines?
> > >> >>
> > >> >> Any pointers would be appreciated.
> > >> >
> > >> > There was another recent report of incorrect MD5 signatures in
> > >> > <http://thread.gmane.org/gmane.linux.network/159556>, but without any
> > >> > response.
> > >> >
> > >> > Ben.
> > >> >
> > >>
> > >> I found another thread posted back in Jan 2007 with a similar bug
> > >> (x86_64 on 2.6.20) but no replies to that as well.
> > >> http://lkml.org/lkml/2007/1/20/56
> > >
> > > 2.6.20 had lots of other MD5 bugs. Your problem might be related to
> > > GRO.  MD5 may not handle multi-fragment packets.
> > > --
> >
> > I am getting the issue on 2.6.31 and 2.6.28 (gro infrastructure was
> > added in 2.6.29).
> > Also, both segmentation offloading as well as receive offloading
> > (gso/gro) are turned off.
> >
> > Moreover outgoing TCP packets are the ones with the corrupt checksums.
> > Both tcpdump on my local machine and the BGP router on the other side
> > complain of the bad checksums with the same packet.
> >
> > I am trying to figure out if there is something in the per-cpu
> > implementation that might be causing a corruption (SMP and x86_64) but
> > I am not really getting anywhere.
>
> I seriously doubt the per-cpu stuff is the issue.
>
> > I am trying to reproduce the bad checksums with the latest kernel
> > sources since it has a new implementation of the percpu allocator.
>
> First turn off all offload settings on the device (TSO,GSO,SG,CSUM)
> then check that size of the bad packets. Are they fragmented or
> just simple linear packets?
>
> --

Hi,

TSO, GSO and SG are already turned off.
rx/tx checksumming is on, but that shouldn't matter, right?

# ethtool -k eth0
Offload parameters for eth0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: off
tcp segmentation offload: off
udp fragmentation offload: off
generic segmentation offload: off

The bad packets are very small in size, most have no data at all (<300 bytes).

After adding some logs to kernel 2.6.31-12, it seems that
tcp_v4_md5_hash_skb (function that calculates the md5 hash) is
(might?) getting corrupt.

The tcp4_pseudohdr (bp = &hp->md5_blk.ip4) structure's saddr, daddr
and len fields get modified to different values towards the end of the
tcp_v4_md5_hash_skb function whenever there is a checksum error.

The tcp4_pseudohdr (bp) is within the tcp_md5sig_pool (hp), which is
filled up by tcp_get_md5sig_pool (which calls per_cpu_ptr).

Using a local copy of the tcp4_pseudohdr in the same function
tcp_v4_md5_hash_skb (copied all fields from the original
tcp4_pseudohdr within the tcp_md5sig_pool) and calculating the md5
checksum with the local  tcp4_pseudohdr seems to solve the issue
(don't see bad packets for a hours in load tests, and without the
change I can see them instantaneously in the load tests).

I am still unable to figure out how this is happening. Please let me
know if you have any pointers.

Thanks a lot!
Bhaskar

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: TCP-MD5 checksum failure on x86_64 SMP
  2010-05-05 18:03             ` Bhaskar Dutta
@ 2010-05-05 18:53               ` Eric Dumazet
  2010-05-06 11:55                 ` Bhaskar Dutta
  0 siblings, 1 reply; 48+ messages in thread
From: Eric Dumazet @ 2010-05-05 18:53 UTC (permalink / raw)
  To: Bhaskar Dutta; +Cc: Stephen Hemminger, Ben Hutchings, netdev

Le mercredi 05 mai 2010 à 23:33 +0530, Bhaskar Dutta a écrit :

> Hi,
> 
> TSO, GSO and SG are already turned off.
> rx/tx checksumming is on, but that shouldn't matter, right?
> 
> # ethtool -k eth0
> Offload parameters for eth0:
> rx-checksumming: on
> tx-checksumming: on
> scatter-gather: off
> tcp segmentation offload: off
> udp fragmentation offload: off
> generic segmentation offload: off
> 
> The bad packets are very small in size, most have no data at all (<300 bytes).
> 
> After adding some logs to kernel 2.6.31-12, it seems that
> tcp_v4_md5_hash_skb (function that calculates the md5 hash) is
> (might?) getting corrupt.
> 
> The tcp4_pseudohdr (bp = &hp->md5_blk.ip4) structure's saddr, daddr
> and len fields get modified to different values towards the end of the
> tcp_v4_md5_hash_skb function whenever there is a checksum error.
> 
> The tcp4_pseudohdr (bp) is within the tcp_md5sig_pool (hp), which is
> filled up by tcp_get_md5sig_pool (which calls per_cpu_ptr).
> 
> Using a local copy of the tcp4_pseudohdr in the same function
> tcp_v4_md5_hash_skb (copied all fields from the original
> tcp4_pseudohdr within the tcp_md5sig_pool) and calculating the md5
> checksum with the local  tcp4_pseudohdr seems to solve the issue
> (don't see bad packets for a hours in load tests, and without the
> change I can see them instantaneously in the load tests).
> 
> I am still unable to figure out how this is happening. Please let me
> know if you have any pointers.

I am not familiar with this code, but I suspect same per_cpu data can be
used at both time by a sender (process context) and by a receiver
(softirq context).

To trigger this, you need at least two active md5 sockets.

tcp_get_md5sig_pool() should probably disable bh to make sure current
cpu wont be preempted by softirq processing


Something like :

diff --git a/include/net/tcp.h b/include/net/tcp.h
index fb5c66b..e232123 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1221,12 +1221,15 @@ struct tcp_md5sig_pool          *tcp_get_md5sig_pool(void)
 	struct tcp_md5sig_pool *ret = __tcp_get_md5sig_pool(cpu);
 	if (!ret)
 		put_cpu();
+	else
+		local_bh_disable();
 	return ret;
 }
 
 static inline void             tcp_put_md5sig_pool(void)
 {
 	__tcp_put_md5sig_pool();
+	local_bh_enable();
 	put_cpu();
 }



^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: TCP-MD5 checksum failure on x86_64 SMP
  2010-05-05 18:53               ` Eric Dumazet
@ 2010-05-06 11:55                 ` Bhaskar Dutta
  2010-05-06 12:06                   ` Eric Dumazet
  2010-05-07  5:39                   ` Eric Dumazet
  0 siblings, 2 replies; 48+ messages in thread
From: Bhaskar Dutta @ 2010-05-06 11:55 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Stephen Hemminger, Ben Hutchings, netdev

On Thu, May 6, 2010 at 12:23 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le mercredi 05 mai 2010 à 23:33 +0530, Bhaskar Dutta a écrit :
>
>> Hi,
>>
>> TSO, GSO and SG are already turned off.
>> rx/tx checksumming is on, but that shouldn't matter, right?
>>
>> # ethtool -k eth0
>> Offload parameters for eth0:
>> rx-checksumming: on
>> tx-checksumming: on
>> scatter-gather: off
>> tcp segmentation offload: off
>> udp fragmentation offload: off
>> generic segmentation offload: off
>>
>> The bad packets are very small in size, most have no data at all (<300 bytes).
>>
>> After adding some logs to kernel 2.6.31-12, it seems that
>> tcp_v4_md5_hash_skb (function that calculates the md5 hash) is
>> (might?) getting corrupt.
>>
>> The tcp4_pseudohdr (bp = &hp->md5_blk.ip4) structure's saddr, daddr
>> and len fields get modified to different values towards the end of the
>> tcp_v4_md5_hash_skb function whenever there is a checksum error.
>>
>> The tcp4_pseudohdr (bp) is within the tcp_md5sig_pool (hp), which is
>> filled up by tcp_get_md5sig_pool (which calls per_cpu_ptr).
>>
>> Using a local copy of the tcp4_pseudohdr in the same function
>> tcp_v4_md5_hash_skb (copied all fields from the original
>> tcp4_pseudohdr within the tcp_md5sig_pool) and calculating the md5
>> checksum with the local  tcp4_pseudohdr seems to solve the issue
>> (don't see bad packets for a hours in load tests, and without the
>> change I can see them instantaneously in the load tests).
>>
>> I am still unable to figure out how this is happening. Please let me
>> know if you have any pointers.
>
> I am not familiar with this code, but I suspect same per_cpu data can be
> used at both time by a sender (process context) and by a receiver
> (softirq context).
>
> To trigger this, you need at least two active md5 sockets.
>
> tcp_get_md5sig_pool() should probably disable bh to make sure current
> cpu wont be preempted by softirq processing
>
>
> Something like :
>
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index fb5c66b..e232123 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -1221,12 +1221,15 @@ struct tcp_md5sig_pool          *tcp_get_md5sig_pool(void)
>        struct tcp_md5sig_pool *ret = __tcp_get_md5sig_pool(cpu);
>        if (!ret)
>                put_cpu();
> +       else
> +               local_bh_disable();
>        return ret;
>  }
>
>  static inline void             tcp_put_md5sig_pool(void)
>  {
>        __tcp_put_md5sig_pool();
> +       local_bh_enable();
>        put_cpu();
>  }
>
>
>

I put in the above change and ran some load tests with around 50
active TCP connections doing MD5.
I could see only 1 bad packet in 30 min (earlier the problem used to
occur instantaneously and repeatedly).

I think there is another possibility of being preempted when calling
tcp_alloc_md5sig_pool()
this function releases the spinlock when calling __tcp_alloc_md5sig_pool().

I will run some more tests after changing the  tcp_alloc_md5sig_pool
and see if the problem is completely resolved.

Thanks a lot for your help!
Bhaskar

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: TCP-MD5 checksum failure on x86_64 SMP
  2010-05-06 11:55                 ` Bhaskar Dutta
@ 2010-05-06 12:06                   ` Eric Dumazet
  2010-05-07  5:04                     ` David Miller
  2010-05-07  8:46                     ` Lars Eggert
  2010-05-07  5:39                   ` Eric Dumazet
  1 sibling, 2 replies; 48+ messages in thread
From: Eric Dumazet @ 2010-05-06 12:06 UTC (permalink / raw)
  To: Bhaskar Dutta; +Cc: Stephen Hemminger, Ben Hutchings, netdev

Le jeudi 06 mai 2010 à 17:25 +0530, Bhaskar Dutta a écrit :

> I put in the above change and ran some load tests with around 50
> active TCP connections doing MD5.
> I could see only 1 bad packet in 30 min (earlier the problem used to
> occur instantaneously and repeatedly).
> 
> I think there is another possibility of being preempted when calling
> tcp_alloc_md5sig_pool()
> this function releases the spinlock when calling __tcp_alloc_md5sig_pool().
> 
> I will run some more tests after changing the  tcp_alloc_md5sig_pool
> and see if the problem is completely resolved.
> 

This code should be completely rewritten for linux-2.6.35, its very ugly
and over complex, yet it is not scalable.

It could use true percpu data, with no central lock or refcount.




^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: TCP-MD5 checksum failure on x86_64 SMP
  2010-05-06 12:06                   ` Eric Dumazet
@ 2010-05-07  5:04                     ` David Miller
  2010-05-07  5:32                       ` Eric Dumazet
  2010-05-07  8:46                     ` Lars Eggert
  1 sibling, 1 reply; 48+ messages in thread
From: David Miller @ 2010-05-07  5:04 UTC (permalink / raw)
  To: eric.dumazet; +Cc: bhaskie, shemminger, bhutchings, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 06 May 2010 14:06:26 +0200

> Le jeudi 06 mai 2010 à 17:25 +0530, Bhaskar Dutta a écrit :
> 
>> I put in the above change and ran some load tests with around 50
>> active TCP connections doing MD5.
>> I could see only 1 bad packet in 30 min (earlier the problem used to
>> occur instantaneously and repeatedly).
>> 
>> I think there is another possibility of being preempted when calling
>> tcp_alloc_md5sig_pool()
>> this function releases the spinlock when calling __tcp_alloc_md5sig_pool().
>> 
>> I will run some more tests after changing the  tcp_alloc_md5sig_pool
>> and see if the problem is completely resolved.
>> 
> 
> This code should be completely rewritten for linux-2.6.35, its very ugly
> and over complex, yet it is not scalable.
> 
> It could use true percpu data, with no central lock or refcount.

Yes I've always disliked the way the TCP MD5 pool stuff is coded, it's
frankly crap and this is like the 5th major bug that had to get fixed
in it. :-)

But let's fix this as simply as possible in net-2.6 and -stable, so Eric
let me know when you have a tested patch for me to apply.

Thanks a lot!


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: TCP-MD5 checksum failure on x86_64 SMP
  2010-05-07  5:04                     ` David Miller
@ 2010-05-07  5:32                       ` Eric Dumazet
  2010-05-07 17:14                         ` Stephen Hemminger
  0 siblings, 1 reply; 48+ messages in thread
From: Eric Dumazet @ 2010-05-07  5:32 UTC (permalink / raw)
  To: David Miller; +Cc: bhaskie, shemminger, bhutchings, netdev

Le jeudi 06 mai 2010 à 22:04 -0700, David Miller a écrit :
> From: Eric Dumazet <eric.dumazet@gmail.com>

> > This code should be completely rewritten for linux-2.6.35, its very ugly
> > and over complex, yet it is not scalable.
> > 
> > It could use true percpu data, with no central lock or refcount.
> 
> Yes I've always disliked the way the TCP MD5 pool stuff is coded, it's
> frankly crap and this is like the 5th major bug that had to get fixed
> in it. :-)
> 
> But let's fix this as simply as possible in net-2.6 and -stable, so Eric
> let me know when you have a tested patch for me to apply.

There are so many things ...

I am comtemplating commit aa133076

__tcp_alloc_md5sig_pool() now do a : 

p = kzalloc(sizeof(*p), sk->sk_allocation);

But later call :

hash = crypto_alloc_hash("md5", 0, CRYPTO_ALG_ASYNC);

(GFP_KERNEL allocations for sure)


tcp_v4_parse_md5_keys() also use :

p = kzalloc(sizeof(*p), sk->sk_allocation);

right after a (possibly sleeping) copy_from_user()

Oh well...


I'll test the already suggested patch before official submission,
thanks.




^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: TCP-MD5 checksum failure on x86_64 SMP
  2010-05-06 11:55                 ` Bhaskar Dutta
  2010-05-06 12:06                   ` Eric Dumazet
@ 2010-05-07  5:39                   ` Eric Dumazet
  2010-05-07  8:00                     ` Eric Dumazet
  1 sibling, 1 reply; 48+ messages in thread
From: Eric Dumazet @ 2010-05-07  5:39 UTC (permalink / raw)
  To: Bhaskar Dutta; +Cc: Stephen Hemminger, Ben Hutchings, netdev

Le jeudi 06 mai 2010 à 17:25 +0530, Bhaskar Dutta a écrit :
> On Thu, May 6, 2010 at 12:23 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:

> > I am not familiar with this code, but I suspect same per_cpu data can be
> > used at both time by a sender (process context) and by a receiver
> > (softirq context).
> >
> > To trigger this, you need at least two active md5 sockets.
> >
> > tcp_get_md5sig_pool() should probably disable bh to make sure current
> > cpu wont be preempted by softirq processing
> >
> >
> > Something like :
> >
> > diff --git a/include/net/tcp.h b/include/net/tcp.h
> > index fb5c66b..e232123 100644
> > --- a/include/net/tcp.h
> > +++ b/include/net/tcp.h
> > @@ -1221,12 +1221,15 @@ struct tcp_md5sig_pool          *tcp_get_md5sig_pool(void)
> >        struct tcp_md5sig_pool *ret = __tcp_get_md5sig_pool(cpu);
> >        if (!ret)
> >                put_cpu();
> > +       else
> > +               local_bh_disable();
> >        return ret;
> >  }
> >
> >  static inline void             tcp_put_md5sig_pool(void)
> >  {
> >        __tcp_put_md5sig_pool();
> > +       local_bh_enable();
> >        put_cpu();
> >  }
> >
> >
> >
> 
> I put in the above change and ran some load tests with around 50
> active TCP connections doing MD5.
> I could see only 1 bad packet in 30 min (earlier the problem used to
> occur instantaneously and repeatedly).
> 


> I think there is another possibility of being preempted when calling
> tcp_alloc_md5sig_pool()
> this function releases the spinlock when calling __tcp_alloc_md5sig_pool().
> 
> I will run some more tests after changing the  tcp_alloc_md5sig_pool
> and see if the problem is completely resolved.

I cant see a race with spinlock in
tcp_alloc_md5sig_pool/__tcp_alloc_md5sig_pool().

We allocate structures for all cpus, so preemption.migration should be
OK

Could you elaborate please ?



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: TCP-MD5 checksum failure on x86_64 SMP
  2010-05-07  5:39                   ` Eric Dumazet
@ 2010-05-07  8:00                     ` Eric Dumazet
  2010-05-07  8:59                       ` Bhaskar Dutta
  2010-05-16  7:35                       ` David Miller
  0 siblings, 2 replies; 48+ messages in thread
From: Eric Dumazet @ 2010-05-07  8:00 UTC (permalink / raw)
  To: Bhaskar Dutta; +Cc: Stephen Hemminger, Ben Hutchings, netdev, David Miller

Le vendredi 07 mai 2010 à 07:39 +0200, Eric Dumazet a écrit :
> Le jeudi 06 mai 2010 à 17:25 +0530, Bhaskar Dutta a écrit :
> > On Thu, May 6, 2010 at 12:23 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> > > I am not familiar with this code, but I suspect same per_cpu data can be
> > > used at both time by a sender (process context) and by a receiver
> > > (softirq context).
> > >
> > > To trigger this, you need at least two active md5 sockets.
> > >
> > > tcp_get_md5sig_pool() should probably disable bh to make sure current
> > > cpu wont be preempted by softirq processing
> > >
> > >
> > > Something like :
> > >
> > > diff --git a/include/net/tcp.h b/include/net/tcp.h
> > > index fb5c66b..e232123 100644
> > > --- a/include/net/tcp.h
> > > +++ b/include/net/tcp.h
> > > @@ -1221,12 +1221,15 @@ struct tcp_md5sig_pool          *tcp_get_md5sig_pool(void)
> > >        struct tcp_md5sig_pool *ret = __tcp_get_md5sig_pool(cpu);
> > >        if (!ret)
> > >                put_cpu();
> > > +       else
> > > +               local_bh_disable();
> > >        return ret;
> > >  }
> > >
> > >  static inline void             tcp_put_md5sig_pool(void)
> > >  {
> > >        __tcp_put_md5sig_pool();
> > > +       local_bh_enable();
> > >        put_cpu();
> > >  }
> > >
> > >
> > >
> > 
> > I put in the above change and ran some load tests with around 50
> > active TCP connections doing MD5.
> > I could see only 1 bad packet in 30 min (earlier the problem used to
> > occur instantaneously and repeatedly).
> > 
> 
> 
> > I think there is another possibility of being preempted when calling
> > tcp_alloc_md5sig_pool()
> > this function releases the spinlock when calling __tcp_alloc_md5sig_pool().
> > 
> > I will run some more tests after changing the  tcp_alloc_md5sig_pool
> > and see if the problem is completely resolved.

Here is my official patch submission, could you please test it ?

Thanks

[PATCH] tcp: fix MD5 (RFC2385) support

TCP MD5 support uses percpu data for temporary storage. It currently
disables preemption so that same storage cannot be reclaimed by another
thread on same cpu.

We also have to make sure a softirq handler wont try to use also same
context. Various bug reports demonstrated corruptions.

Fix is to disable preemption and BH.

Reported-by: Bhaskar Dutta <bhaskie@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/net/tcp.h |   21 +++------------------
 net/ipv4/tcp.c    |   34 ++++++++++++++++++++++++----------
 2 files changed, 27 insertions(+), 28 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 75be5a2..aa04b9a 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1197,30 +1197,15 @@ extern int			tcp_v4_md5_do_del(struct sock *sk,
 extern struct tcp_md5sig_pool * __percpu *tcp_alloc_md5sig_pool(struct sock *);
 extern void			tcp_free_md5sig_pool(void);
 
-extern struct tcp_md5sig_pool	*__tcp_get_md5sig_pool(int cpu);
-extern void			__tcp_put_md5sig_pool(void);
+extern struct tcp_md5sig_pool	*tcp_get_md5sig_pool(void);
+extern void			tcp_put_md5sig_pool(void);
+
 extern int tcp_md5_hash_header(struct tcp_md5sig_pool *, struct tcphdr *);
 extern int tcp_md5_hash_skb_data(struct tcp_md5sig_pool *, struct sk_buff *,
 				 unsigned header_len);
 extern int tcp_md5_hash_key(struct tcp_md5sig_pool *hp,
 			    struct tcp_md5sig_key *key);
 
-static inline
-struct tcp_md5sig_pool		*tcp_get_md5sig_pool(void)
-{
-	int cpu = get_cpu();
-	struct tcp_md5sig_pool *ret = __tcp_get_md5sig_pool(cpu);
-	if (!ret)
-		put_cpu();
-	return ret;
-}
-
-static inline void		tcp_put_md5sig_pool(void)
-{
-	__tcp_put_md5sig_pool();
-	put_cpu();
-}
-
 /* write queue abstraction */
 static inline void tcp_write_queue_purge(struct sock *sk)
 {
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 0f8caf6..296150b 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2839,7 +2839,6 @@ static void __tcp_free_md5sig_pool(struct tcp_md5sig_pool * __percpu *pool)
 			if (p->md5_desc.tfm)
 				crypto_free_hash(p->md5_desc.tfm);
 			kfree(p);
-			p = NULL;
 		}
 	}
 	free_percpu(pool);
@@ -2937,25 +2936,40 @@ retry:
 
 EXPORT_SYMBOL(tcp_alloc_md5sig_pool);
 
-struct tcp_md5sig_pool *__tcp_get_md5sig_pool(int cpu)
+
+/**
+ *	tcp_get_md5sig_pool - get md5sig_pool for this user
+ *
+ *	We use percpu structure, so if we succeed, we exit with preemption
+ *	and BH disabled, to make sure another thread or softirq handling
+ *	wont try to get same context.
+ */
+struct tcp_md5sig_pool *tcp_get_md5sig_pool(void)
 {
 	struct tcp_md5sig_pool * __percpu *p;
-	spin_lock_bh(&tcp_md5sig_pool_lock);
+
+	local_bh_disable();
+
+	spin_lock(&tcp_md5sig_pool_lock);
 	p = tcp_md5sig_pool;
 	if (p)
 		tcp_md5sig_users++;
-	spin_unlock_bh(&tcp_md5sig_pool_lock);
-	return (p ? *per_cpu_ptr(p, cpu) : NULL);
-}
+	spin_unlock(&tcp_md5sig_pool_lock);
+
+	if (p)
+		return *per_cpu_ptr(p, smp_processor_id());
 
-EXPORT_SYMBOL(__tcp_get_md5sig_pool);
+	local_bh_enable();
+	return NULL;
+}
+EXPORT_SYMBOL(tcp_get_md5sig_pool);
 
-void __tcp_put_md5sig_pool(void)
+void tcp_put_md5sig_pool(void)
 {
+	local_bh_enable();
 	tcp_free_md5sig_pool();
 }
-
-EXPORT_SYMBOL(__tcp_put_md5sig_pool);
+EXPORT_SYMBOL(tcp_put_md5sig_pool);
 
 int tcp_md5_hash_header(struct tcp_md5sig_pool *hp,
 			struct tcphdr *th)



^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: TCP-MD5 checksum failure on x86_64 SMP
  2010-05-06 12:06                   ` Eric Dumazet
  2010-05-07  5:04                     ` David Miller
@ 2010-05-07  8:46                     ` Lars Eggert
  2010-05-07  8:55                       ` Eric Dumazet
  2010-05-07  9:12                       ` David Miller
  1 sibling, 2 replies; 48+ messages in thread
From: Lars Eggert @ 2010-05-07  8:46 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Bhaskar Dutta, Stephen Hemminger, Ben Hutchings, netdev

[-- Attachment #1: Type: text/plain, Size: 616 bytes --]

On 2010-5-6, at 15:06, Eric Dumazet wrote:
> This code should be completely rewritten for linux-2.6.35, its very ugly
> and over complex, yet it is not scalable.

Let me quickly point out that the IETF has recently replaced TCP-MD5 with TCP-AO (http://tools.ietf.org/html/draft-ietf-tcpm-tcp-auth-opt-11) which is days or weeks away from being published as an RFC. (The draft has been approved and it waiting for the RFC Editor to finish the final copy-editing.)

You'll obviously still want TCP-MD5 support for talking to existing equipment, but significant cycles would IMO be better spent on TCP-AO.

Lars

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 2490 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: TCP-MD5 checksum failure on x86_64 SMP
  2010-05-07  8:46                     ` Lars Eggert
@ 2010-05-07  8:55                       ` Eric Dumazet
  2010-05-07  9:12                       ` David Miller
  1 sibling, 0 replies; 48+ messages in thread
From: Eric Dumazet @ 2010-05-07  8:55 UTC (permalink / raw)
  To: Lars Eggert; +Cc: Bhaskar Dutta, Stephen Hemminger, Ben Hutchings, netdev

Le vendredi 07 mai 2010 à 11:46 +0300, Lars Eggert a écrit :
> On 2010-5-6, at 15:06, Eric Dumazet wrote:
> > This code should be completely rewritten for linux-2.6.35, its very ugly
> > and over complex, yet it is not scalable.
> 
> Let me quickly point out that the IETF has recently replaced TCP-MD5 with TCP-AO (http://tools.ietf.org/html/draft-ietf-tcpm-tcp-auth-opt-11) which is days or weeks away from being published as an RFC. (The draft has been approved and it waiting for the RFC Editor to finish the final copy-editing.)
> 
> You'll obviously still want TCP-MD5 support for talking to existing equipment, but significant cycles would IMO be better spent on TCP-AO.
> 

Thanks for this info !



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: TCP-MD5 checksum failure on x86_64 SMP
  2010-05-07  8:00                     ` Eric Dumazet
@ 2010-05-07  8:59                       ` Bhaskar Dutta
  2010-05-07  9:37                         ` Eric Dumazet
  2010-05-16  7:35                       ` David Miller
  1 sibling, 1 reply; 48+ messages in thread
From: Bhaskar Dutta @ 2010-05-07  8:59 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Stephen Hemminger, Ben Hutchings, netdev, David Miller

On Fri, May 7, 2010 at 1:30 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le vendredi 07 mai 2010 à 07:39 +0200, Eric Dumazet a écrit :
>> Le jeudi 06 mai 2010 à 17:25 +0530, Bhaskar Dutta a écrit :
>> > On Thu, May 6, 2010 at 12:23 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>
>> > > I am not familiar with this code, but I suspect same per_cpu data can be
>> > > used at both time by a sender (process context) and by a receiver
>> > > (softirq context).
>> > >
>> > > To trigger this, you need at least two active md5 sockets.
>> > >
>> > > tcp_get_md5sig_pool() should probably disable bh to make sure current
>> > > cpu wont be preempted by softirq processing
>> > >
>> > >
>> > > Something like :
>> > >
>> > > diff --git a/include/net/tcp.h b/include/net/tcp.h
>> > > index fb5c66b..e232123 100644
>> > > --- a/include/net/tcp.h
>> > > +++ b/include/net/tcp.h
>> > > @@ -1221,12 +1221,15 @@ struct tcp_md5sig_pool          *tcp_get_md5sig_pool(void)
>> > >        struct tcp_md5sig_pool *ret = __tcp_get_md5sig_pool(cpu);
>> > >        if (!ret)
>> > >                put_cpu();
>> > > +       else
>> > > +               local_bh_disable();
>> > >        return ret;
>> > >  }
>> > >
>> > >  static inline void             tcp_put_md5sig_pool(void)
>> > >  {
>> > >        __tcp_put_md5sig_pool();
>> > > +       local_bh_enable();
>> > >        put_cpu();
>> > >  }
>> > >
>> > >
>> > >
>> >
>> > I put in the above change and ran some load tests with around 50
>> > active TCP connections doing MD5.
>> > I could see only 1 bad packet in 30 min (earlier the problem used to
>> > occur instantaneously and repeatedly).
>> >
>>
>>
>> > I think there is another possibility of being preempted when calling
>> > tcp_alloc_md5sig_pool()
>> > this function releases the spinlock when calling __tcp_alloc_md5sig_pool().
>> >
>> > I will run some more tests after changing the  tcp_alloc_md5sig_pool
>> > and see if the problem is completely resolved.
>
> Here is my official patch submission, could you please test it ?
>


Eric,

Thanks a lot! I will test it out and let you know.
BTW this patch seems to essentially do the same as the earlier fix you
had posted (where you just do bh disable/enable).
Am I missing something?

With the earlier fix, I ran load tests with 80 TCP connections for
over 6 hrs and found 5 bad checksum packets.
So there is still a problem. Without the fix I see a bad packet every
minute or so.

Bhaskar

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: TCP-MD5 checksum failure on x86_64 SMP
  2010-05-07  8:46                     ` Lars Eggert
  2010-05-07  8:55                       ` Eric Dumazet
@ 2010-05-07  9:12                       ` David Miller
  1 sibling, 0 replies; 48+ messages in thread
From: David Miller @ 2010-05-07  9:12 UTC (permalink / raw)
  To: lars.eggert; +Cc: eric.dumazet, bhaskie, shemminger, bhutchings, netdev

From: Lars Eggert <lars.eggert@nokia.com>
Date: Fri, 7 May 2010 11:46:40 +0300

> You'll obviously still want TCP-MD5 support for talking to existing
> equipment, but significant cycles would IMO be better spent on
> TCP-AO.

Code we have and users use which is unstable and crashes is more
important to work on and fix than code we don't have which
user's therefore don't use.

You're wrong from just about every possible angle.

Whoever finds AO useful will work on it, just as was the case with MD5
support.  And right now, that "whoever" is definitely not us. :-)


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: TCP-MD5 checksum failure on x86_64 SMP
  2010-05-07  8:59                       ` Bhaskar Dutta
@ 2010-05-07  9:37                         ` Eric Dumazet
  2010-05-07 10:50                           ` Bhaskar Dutta
  0 siblings, 1 reply; 48+ messages in thread
From: Eric Dumazet @ 2010-05-07  9:37 UTC (permalink / raw)
  To: Bhaskar Dutta; +Cc: Stephen Hemminger, Ben Hutchings, netdev, David Miller

Le vendredi 07 mai 2010 à 14:29 +0530, Bhaskar Dutta a écrit :

> Eric,
> 
> Thanks a lot! I will test it out and let you know.
> BTW this patch seems to essentially do the same as the earlier fix you
> had posted (where you just do bh disable/enable).
> Am I missing something?
> 
> With the earlier fix, I ran load tests with 80 TCP connections for
> over 6 hrs and found 5 bad checksum packets.
> So there is still a problem. Without the fix I see a bad packet every
> minute or so.

My second patch is cleaner, using only out of line code (inline was not
necessary and made include file bigger than necessary). Inline is fine
if we can avoid a function call, but it was not the case.

If you notice another corruption, it may be because of another problem,
yet to be discovered.

To you have a userland suite to test/stress tcp md5 connections ?




^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: TCP-MD5 checksum failure on x86_64 SMP
  2010-05-07  9:37                         ` Eric Dumazet
@ 2010-05-07 10:50                           ` Bhaskar Dutta
  2010-05-07 15:18                             ` Eric Dumazet
  0 siblings, 1 reply; 48+ messages in thread
From: Bhaskar Dutta @ 2010-05-07 10:50 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Stephen Hemminger, Ben Hutchings, netdev, David Miller

On Fri, May 7, 2010 at 3:07 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le vendredi 07 mai 2010 à 14:29 +0530, Bhaskar Dutta a écrit :
>
>> Eric,
>>
>> Thanks a lot! I will test it out and let you know.
>> BTW this patch seems to essentially do the same as the earlier fix you
>> had posted (where you just do bh disable/enable).
>> Am I missing something?
>>
>> With the earlier fix, I ran load tests with 80 TCP connections for
>> over 6 hrs and found 5 bad checksum packets.
>> So there is still a problem. Without the fix I see a bad packet every
>> minute or so.
>
> My second patch is cleaner, using only out of line code (inline was not
> necessary and made include file bigger than necessary). Inline is fine
> if we can avoid a function call, but it was not the case.
>
> If you notice another corruption, it may be because of another problem,
> yet to be discovered.
>
> To you have a userland suite to test/stress tcp md5 connections ?
>
>

We are still trying to find the other corruption.

I will send you the tarball of a userland client/server suite to
stress test the TCP-MD5 that we've been using to reproduce the issue.

Thanks,
Bhaskar

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: TCP-MD5 checksum failure on x86_64 SMP
  2010-05-07 10:50                           ` Bhaskar Dutta
@ 2010-05-07 15:18                             ` Eric Dumazet
  2010-05-07 15:44                               ` Eric Dumazet
  0 siblings, 1 reply; 48+ messages in thread
From: Eric Dumazet @ 2010-05-07 15:18 UTC (permalink / raw)
  To: Bhaskar Dutta; +Cc: Stephen Hemminger, Ben Hutchings, netdev, David Miller

OK, I found the second problem.

if/when IP route cache is invalidated, ip_queue_xmit() has to refetch a
route and calls sk_setup_caps(sk, &rt->u.dst), destroying the 

sk->sk_route_caps &= ~NETIF_F_GSO_MASK

that MD5 desesperatly try to make all over its way (from
tcp_transmit_skb() for example)

So we send few bad packets, and everything is fine when
tcp_transmit_skb() is called again.

You get many errors on remote peer if you do

ip route flush cache




^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: TCP-MD5 checksum failure on x86_64 SMP
  2010-05-07 15:18                             ` Eric Dumazet
@ 2010-05-07 15:44                               ` Eric Dumazet
  2010-05-07 21:18                                 ` Eric Dumazet
  0 siblings, 1 reply; 48+ messages in thread
From: Eric Dumazet @ 2010-05-07 15:44 UTC (permalink / raw)
  To: Bhaskar Dutta; +Cc: Stephen Hemminger, Ben Hutchings, netdev, David Miller

Le vendredi 07 mai 2010 à 17:18 +0200, Eric Dumazet a écrit :
> OK, I found the second problem.
> 
> if/when IP route cache is invalidated, ip_queue_xmit() has to refetch a
> route and calls sk_setup_caps(sk, &rt->u.dst), destroying the 
> 
> sk->sk_route_caps &= ~NETIF_F_GSO_MASK
> 
> that MD5 desesperatly try to make all over its way (from
> tcp_transmit_skb() for example)
> 
> So we send few bad packets, and everything is fine when
> tcp_transmit_skb() is called again.
> 
> You get many errors on remote peer if you do
> 
> ip route flush cache
> 

I am testing following patch :

 include/net/sock.h    |    8 ++++++++
 net/core/sock.c       |    1 +
 net/ipv4/tcp_ipv4.c   |    6 +++---
 net/ipv4/tcp_output.c |    2 +-
 net/ipv6/tcp_ipv6.c   |    4 ++--
 5 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 1ad6435..abfadfe 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -177,6 +177,7 @@ struct sock_common {
   *		   %SO_OOBINLINE settings, %SO_TIMESTAMPING settings
   *	@sk_no_check: %SO_NO_CHECK setting, wether or not checkup packets
   *	@sk_route_caps: route capabilities (e.g. %NETIF_F_TSO)
+  *	@sk_route_nocaps: forbidden route capabilities (e.g NETIF_F_GSO_MASK)
   *	@sk_gso_type: GSO type (e.g. %SKB_GSO_TCPV4)
   *	@sk_gso_max_size: Maximum GSO segment size to build
   *	@sk_lingertime: %SO_LINGER l_linger setting
@@ -276,6 +277,7 @@ struct sock {
 	int			sk_forward_alloc;
 	gfp_t			sk_allocation;
 	int			sk_route_caps;
+	int			sk_route_nocaps;
 	int			sk_gso_type;
 	unsigned int		sk_gso_max_size;
 	int			sk_rcvlowat;
@@ -1257,6 +1259,12 @@ static inline int sk_can_gso(const struct sock *sk)
 
 extern void sk_setup_caps(struct sock *sk, struct dst_entry *dst);
 
+static inline void sk_nocaps_add(struct sock *sk, int flags)
+{
+	sk->sk_route_nocaps |= flags;
+	sk->sk_route_caps &= ~flags;
+}
+
 static inline int skb_copy_to_page(struct sock *sk, char __user *from,
 				   struct sk_buff *skb, struct page *page,
 				   int off, int copy)
diff --git a/net/core/sock.c b/net/core/sock.c
index c5812bb..5056a6a 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1227,6 +1227,7 @@ void sk_setup_caps(struct sock *sk, struct dst_entry *dst)
 	sk->sk_route_caps = dst->dev->features;
 	if (sk->sk_route_caps & NETIF_F_GSO)
 		sk->sk_route_caps |= NETIF_F_GSO_SOFTWARE;
+	sk->sk_route_caps &= ~sk->sk_route_nocaps;
 	if (sk_can_gso(sk)) {
 		if (dst->header_len) {
 			sk->sk_route_caps &= ~NETIF_F_GSO_MASK;
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 3c23e70..f1a1dd9 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -894,7 +894,7 @@ int tcp_v4_md5_do_add(struct sock *sk, __be32 addr,
 				kfree(newkey);
 				return -ENOMEM;
 			}
-			sk->sk_route_caps &= ~NETIF_F_GSO_MASK;
+			sk_nocaps_add(sk, NETIF_F_GSO_MASK);
 		}
 		if (tcp_alloc_md5sig_pool(sk) == NULL) {
 			kfree(newkey);
@@ -1024,7 +1024,7 @@ static int tcp_v4_parse_md5_keys(struct sock *sk, char __user *optval,
 			return -EINVAL;
 
 		tp->md5sig_info = p;
-		sk->sk_route_caps &= ~NETIF_F_GSO_MASK;
+		sk_nocaps_add(sk, NETIF_F_GSO_MASK);
 	}
 
 	newkey = kmemdup(cmd.tcpm_key, cmd.tcpm_keylen, sk->sk_allocation);
@@ -1465,7 +1465,7 @@ struct sock *tcp_v4_syn_recv_sock(struct sock *sk, struct sk_buff *skb,
 		if (newkey != NULL)
 			tcp_v4_md5_do_add(newsk, newinet->inet_daddr,
 					  newkey, key->keylen);
-		newsk->sk_route_caps &= ~NETIF_F_GSO_MASK;
+		sk_nocaps_add(newsk, NETIF_F_GSO_MASK);
 	}
 #endif
 
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 0dda86e..0193a39 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -872,7 +872,7 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
 #ifdef CONFIG_TCP_MD5SIG
 	/* Calculate the MD5 hash, as we have all we need now */
 	if (md5) {
-		sk->sk_route_caps &= ~NETIF_F_GSO_MASK;
+		sk_nocaps_add(sk, NETIF_F_GSO_MASK);
 		tp->af_specific->calc_md5_hash(opts.hash_location,
 					       md5, sk, NULL, skb);
 	}
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 075f540..bf34893 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -600,7 +600,7 @@ static int tcp_v6_md5_do_add(struct sock *sk, struct in6_addr *peer,
 				kfree(newkey);
 				return -ENOMEM;
 			}
-			sk->sk_route_caps &= ~NETIF_F_GSO_MASK;
+			sk_nocaps_add(sk, NETIF_F_GSO_MASK);
 		}
 		if (tcp_alloc_md5sig_pool(sk) == NULL) {
 			kfree(newkey);
@@ -737,7 +737,7 @@ static int tcp_v6_parse_md5_keys (struct sock *sk, char __user *optval,
 			return -ENOMEM;
 
 		tp->md5sig_info = p;
-		sk->sk_route_caps &= ~NETIF_F_GSO_MASK;
+		sk_nocaps_add(sk, NETIF_F_GSO_MASK);
 	}
 
 	newkey = kmemdup(cmd.tcpm_key, cmd.tcpm_keylen, GFP_KERNEL);

 


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: TCP-MD5 checksum failure on x86_64 SMP
  2010-05-07  5:32                       ` Eric Dumazet
@ 2010-05-07 17:14                         ` Stephen Hemminger
  2010-05-07 17:21                           ` Eric Dumazet
  0 siblings, 1 reply; 48+ messages in thread
From: Stephen Hemminger @ 2010-05-07 17:14 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, bhaskie, bhutchings, netdev

On Fri, 07 May 2010 07:32:09 +0200
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> Le jeudi 06 mai 2010 à 22:04 -0700, David Miller a écrit :
> > From: Eric Dumazet <eric.dumazet@gmail.com>
> 
> > > This code should be completely rewritten for linux-2.6.35, its very ugly
> > > and over complex, yet it is not scalable.
> > > 
> > > It could use true percpu data, with no central lock or refcount.
> > 
> > Yes I've always disliked the way the TCP MD5 pool stuff is coded, it's
> > frankly crap and this is like the 5th major bug that had to get fixed
> > in it. :-)
> > 
> > But let's fix this as simply as possible in net-2.6 and -stable, so Eric
> > let me know when you have a tested patch for me to apply.
> 
> There are so many things ...
> 
> I am comtemplating commit aa133076
> 
> __tcp_alloc_md5sig_pool() now do a : 
> 
> p = kzalloc(sizeof(*p), sk->sk_allocation);
> 
> But later call :
> 
> hash = crypto_alloc_hash("md5", 0, CRYPTO_ALG_ASYNC);
> 
> (GFP_KERNEL allocations for sure)
> 
> 
> tcp_v4_parse_md5_keys() also use :
> 
> p = kzalloc(sizeof(*p), sk->sk_allocation);
> 
> right after a (possibly sleeping) copy_from_user()
> 
> Oh well...
> 
> 
> I'll test the already suggested patch before official submission,
> thanks.

Forget the per cpu data; the pool should just be scrapped.

The only reason the pool exists is that the crypto hash state which
should just be moved into the md5_info (88 bytes).  The pseudo
header can just be generated on the stack before passing to the crypto
code.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: TCP-MD5 checksum failure on x86_64 SMP
  2010-05-07 17:14                         ` Stephen Hemminger
@ 2010-05-07 17:21                           ` Eric Dumazet
  2010-05-07 17:36                             ` Stephen Hemminger
  0 siblings, 1 reply; 48+ messages in thread
From: Eric Dumazet @ 2010-05-07 17:21 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Miller, bhaskie, bhutchings, netdev

Le vendredi 07 mai 2010 à 10:14 -0700, Stephen Hemminger a écrit :

> Forget the per cpu data; the pool should just be scrapped.
> 
> The only reason the pool exists is that the crypto hash state which
> should just be moved into the md5_info (88 bytes).  The pseudo
> header can just be generated on the stack before passing to the crypto
> code.


Sure, but I'm afraid there is no generic API do do that (if we want to
reuse crypto/md5.c code)




^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: TCP-MD5 checksum failure on x86_64 SMP
  2010-05-07 17:21                           ` Eric Dumazet
@ 2010-05-07 17:36                             ` Stephen Hemminger
  2010-05-07 21:40                               ` Eric Dumazet
  2010-05-16  7:30                               ` TCP-MD5 checksum failure on x86_64 SMP David Miller
  0 siblings, 2 replies; 48+ messages in thread
From: Stephen Hemminger @ 2010-05-07 17:36 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, bhaskie, bhutchings, netdev

On Fri, 07 May 2010 19:21:33 +0200
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> Le vendredi 07 mai 2010 à 10:14 -0700, Stephen Hemminger a écrit :
> 
> > Forget the per cpu data; the pool should just be scrapped.
> > 
> > The only reason the pool exists is that the crypto hash state which
> > should just be moved into the md5_info (88 bytes).  The pseudo
> > header can just be generated on the stack before passing to the crypto
> > code.
> 
> 
> Sure, but I'm afraid there is no generic API do do that (if we want to
> reuse crypto/md5.c code).

It looks like the pool is just an optimization to avoid opening too
many crypto API connections.  This should only be an issue if offloading
MD5.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: TCP-MD5 checksum failure on x86_64 SMP
  2010-05-07 15:44                               ` Eric Dumazet
@ 2010-05-07 21:18                                 ` Eric Dumazet
  2010-05-16  7:37                                   ` David Miller
  0 siblings, 1 reply; 48+ messages in thread
From: Eric Dumazet @ 2010-05-07 21:18 UTC (permalink / raw)
  To: Bhaskar Dutta; +Cc: Stephen Hemminger, Ben Hutchings, netdev, David Miller

Le vendredi 07 mai 2010 à 17:44 +0200, Eric Dumazet a écrit :
> Le vendredi 07 mai 2010 à 17:18 +0200, Eric Dumazet a écrit :
> > OK, I found the second problem.
> > 
> > if/when IP route cache is invalidated, ip_queue_xmit() has to refetch a
> > route and calls sk_setup_caps(sk, &rt->u.dst), destroying the 
> > 
> > sk->sk_route_caps &= ~NETIF_F_GSO_MASK
> > 
> > that MD5 desesperatly try to make all over its way (from
> > tcp_transmit_skb() for example)
> > 
> > So we send few bad packets, and everything is fine when
> > tcp_transmit_skb() is called again.
> > 
> > You get many errors on remote peer if you do
> > 
> > ip route flush cache
> > 


Patch solves the problem for me. I tested it with 200 MD5 sockets
established between two 16 cpus machine, with a multiqueue NIC. Trafic
of 100.000 pps per second, and "ip route flush cache" every minute on
both machines. After five hours, not a single frame had a bad hash
value.

Here is the official submission.

[PATCH] net: Introduce sk_route_nocaps

TCP-MD5 sessions have intermittent failures, when route cache is
invalidated. ip_queue_xmit() has to find a new route, calls
sk_setup_caps(sk, &rt->u.dst), destroying the 

sk->sk_route_caps &= ~NETIF_F_GSO_MASK

that MD5 desperately try to make all over its way (from
tcp_transmit_skb() for example)

So we send few bad packets, and everything is fine when
tcp_transmit_skb() is called again for this socket.

Since ip_queue_xmit() is at a lower level than TCP-MD5, I chose to use a
socket field, sk_route_nocaps, containing bits to mask on sk_route_caps.

Reported-by: Bhaskar Dutta <bhaskie@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/net/sock.h    |    8 ++++++++
 net/core/sock.c       |    1 +
 net/ipv4/tcp_ipv4.c   |    6 +++---
 net/ipv4/tcp_output.c |    2 +-
 net/ipv6/tcp_ipv6.c   |    4 ++--
 5 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 1ad6435..abfadfe 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -177,6 +177,7 @@ struct sock_common {
   *		   %SO_OOBINLINE settings, %SO_TIMESTAMPING settings
   *	@sk_no_check: %SO_NO_CHECK setting, wether or not checkup packets
   *	@sk_route_caps: route capabilities (e.g. %NETIF_F_TSO)
+  *	@sk_route_nocaps: forbidden route capabilities (e.g NETIF_F_GSO_MASK)
   *	@sk_gso_type: GSO type (e.g. %SKB_GSO_TCPV4)
   *	@sk_gso_max_size: Maximum GSO segment size to build
   *	@sk_lingertime: %SO_LINGER l_linger setting
@@ -276,6 +277,7 @@ struct sock {
 	int			sk_forward_alloc;
 	gfp_t			sk_allocation;
 	int			sk_route_caps;
+	int			sk_route_nocaps;
 	int			sk_gso_type;
 	unsigned int		sk_gso_max_size;
 	int			sk_rcvlowat;
@@ -1257,6 +1259,12 @@ static inline int sk_can_gso(const struct sock *sk)
 
 extern void sk_setup_caps(struct sock *sk, struct dst_entry *dst);
 
+static inline void sk_nocaps_add(struct sock *sk, int flags)
+{
+	sk->sk_route_nocaps |= flags;
+	sk->sk_route_caps &= ~flags;
+}
+
 static inline int skb_copy_to_page(struct sock *sk, char __user *from,
 				   struct sk_buff *skb, struct page *page,
 				   int off, int copy)
diff --git a/net/core/sock.c b/net/core/sock.c
index c5812bb..5056a6a 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1227,6 +1227,7 @@ void sk_setup_caps(struct sock *sk, struct dst_entry *dst)
 	sk->sk_route_caps = dst->dev->features;
 	if (sk->sk_route_caps & NETIF_F_GSO)
 		sk->sk_route_caps |= NETIF_F_GSO_SOFTWARE;
+	sk->sk_route_caps &= ~sk->sk_route_nocaps;
 	if (sk_can_gso(sk)) {
 		if (dst->header_len) {
 			sk->sk_route_caps &= ~NETIF_F_GSO_MASK;
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 3c23e70..f1a1dd9 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -894,7 +894,7 @@ int tcp_v4_md5_do_add(struct sock *sk, __be32 addr,
 				kfree(newkey);
 				return -ENOMEM;
 			}
-			sk->sk_route_caps &= ~NETIF_F_GSO_MASK;
+			sk_nocaps_add(sk, NETIF_F_GSO_MASK);
 		}
 		if (tcp_alloc_md5sig_pool(sk) == NULL) {
 			kfree(newkey);
@@ -1024,7 +1024,7 @@ static int tcp_v4_parse_md5_keys(struct sock *sk, char __user *optval,
 			return -EINVAL;
 
 		tp->md5sig_info = p;
-		sk->sk_route_caps &= ~NETIF_F_GSO_MASK;
+		sk_nocaps_add(sk, NETIF_F_GSO_MASK);
 	}
 
 	newkey = kmemdup(cmd.tcpm_key, cmd.tcpm_keylen, sk->sk_allocation);
@@ -1465,7 +1465,7 @@ struct sock *tcp_v4_syn_recv_sock(struct sock *sk, struct sk_buff *skb,
 		if (newkey != NULL)
 			tcp_v4_md5_do_add(newsk, newinet->inet_daddr,
 					  newkey, key->keylen);
-		newsk->sk_route_caps &= ~NETIF_F_GSO_MASK;
+		sk_nocaps_add(newsk, NETIF_F_GSO_MASK);
 	}
 #endif
 
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 0dda86e..0193a39 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -872,7 +872,7 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
 #ifdef CONFIG_TCP_MD5SIG
 	/* Calculate the MD5 hash, as we have all we need now */
 	if (md5) {
-		sk->sk_route_caps &= ~NETIF_F_GSO_MASK;
+		sk_nocaps_add(sk, NETIF_F_GSO_MASK);
 		tp->af_specific->calc_md5_hash(opts.hash_location,
 					       md5, sk, NULL, skb);
 	}
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 075f540..bf34893 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -600,7 +600,7 @@ static int tcp_v6_md5_do_add(struct sock *sk, struct in6_addr *peer,
 				kfree(newkey);
 				return -ENOMEM;
 			}
-			sk->sk_route_caps &= ~NETIF_F_GSO_MASK;
+			sk_nocaps_add(sk, NETIF_F_GSO_MASK);
 		}
 		if (tcp_alloc_md5sig_pool(sk) == NULL) {
 			kfree(newkey);
@@ -737,7 +737,7 @@ static int tcp_v6_parse_md5_keys (struct sock *sk, char __user *optval,
 			return -ENOMEM;
 
 		tp->md5sig_info = p;
-		sk->sk_route_caps &= ~NETIF_F_GSO_MASK;
+		sk_nocaps_add(sk, NETIF_F_GSO_MASK);
 	}
 
 	newkey = kmemdup(cmd.tcpm_key, cmd.tcpm_keylen, GFP_KERNEL);



^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: TCP-MD5 checksum failure on x86_64 SMP
  2010-05-07 17:36                             ` Stephen Hemminger
@ 2010-05-07 21:40                               ` Eric Dumazet
  2010-05-10 14:55                                 ` Bijay Singh
  2010-05-16  7:30                               ` TCP-MD5 checksum failure on x86_64 SMP David Miller
  1 sibling, 1 reply; 48+ messages in thread
From: Eric Dumazet @ 2010-05-07 21:40 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Miller, bhaskie, bhutchings, netdev

Le vendredi 07 mai 2010 à 10:36 -0700, Stephen Hemminger a écrit :
> On Fri, 07 May 2010 19:21:33 +0200
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> > Le vendredi 07 mai 2010 à 10:14 -0700, Stephen Hemminger a écrit :
> > 
> > > Forget the per cpu data; the pool should just be scrapped.
> > > 
> > > The only reason the pool exists is that the crypto hash state which
> > > should just be moved into the md5_info (88 bytes).  The pseudo
> > > header can just be generated on the stack before passing to the crypto
> > > code.
> > 
> > 
> > Sure, but I'm afraid there is no generic API do do that (if we want to
> > reuse crypto/md5.c code).
> 
> It looks like the pool is just an optimization to avoid opening too
> many crypto API connections.  This should only be an issue if offloading
> MD5.

You mean we could allocate two contexts per socket, one for tx path, one
for rx path, but TCP-MD5 implementors chose to use percpu allocations to
factorize them. They should have allocated two contexts per cpu (one for
process context, preemption disabled, one for BH context)

As you said, this could be allocated on stack, with some changes to
crypto API I guess. Since TCP is not a module, md5 is also static, so
there is no module loading involved.

struct crypto_tfm *__crypto_alloc_tfm(struct crypto_alg *alg, u32
type,u32 mask)

-->

struct crypto_tfm *__crypto_alloc_tfm_onstack(struct crypto_alg *alg,
u32 type, u32 mask, void *storage, size_t storage_max_length)


Or a direct plug to crypto/md5.c functions and hand crafted context.




^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: TCP-MD5 checksum failure on x86_64 SMP
  2010-05-07 21:40                               ` Eric Dumazet
@ 2010-05-10 14:55                                 ` Bijay Singh
  2010-05-10 15:18                                   ` Eric Dumazet
  0 siblings, 1 reply; 48+ messages in thread
From: Bijay Singh @ 2010-05-10 14:55 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Stephen Hemminger, David Miller, <bhaskie@gmail.com>,
	<bhutchings@solarflare.com>, <netdev@vger.kernel.org>

Hi,
I had noticed the corruption in the context and actually did what is mentioned.

I allocated the context on the stack and plugged in the md5.c functions. I was able to temporarily solve the problem, all this before I got a response on this thread.

But now I have seeing another problem, when i change the MTU on the interface from 1500 to 4470 none of the message from the peer get thru and I get hash failed message. I am wondering if this is another bug getting hit in this scenario.

Regards,
Bijay

On 08-May-2010, at 3:10 AM, Eric Dumazet wrote:

> Le vendredi 07 mai 2010 à 10:36 -0700, Stephen Hemminger a écrit :
>> On Fri, 07 May 2010 19:21:33 +0200
>> Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> 
>>> Le vendredi 07 mai 2010 à 10:14 -0700, Stephen Hemminger a écrit :
>>> 
>>>> Forget the per cpu data; the pool should just be scrapped.
>>>> 
>>>> The only reason the pool exists is that the crypto hash state which
>>>> should just be moved into the md5_info (88 bytes).  The pseudo
>>>> header can just be generated on the stack before passing to the crypto
>>>> code.
>>> 
>>> 
>>> Sure, but I'm afraid there is no generic API do do that (if we want to
>>> reuse crypto/md5.c code).
>> 
>> It looks like the pool is just an optimization to avoid opening too
>> many crypto API connections.  This should only be an issue if offloading
>> MD5.
> 
> You mean we could allocate two contexts per socket, one for tx path, one
> for rx path, but TCP-MD5 implementors chose to use percpu allocations to
> factorize them. They should have allocated two contexts per cpu (one for
> process context, preemption disabled, one for BH context)
> 
> As you said, this could be allocated on stack, with some changes to
> crypto API I guess. Since TCP is not a module, md5 is also static, so
> there is no module loading involved.
> 
> struct crypto_tfm *__crypto_alloc_tfm(struct crypto_alg *alg, u32
> type,u32 mask)
> 
> -->
> 
> struct crypto_tfm *__crypto_alloc_tfm_onstack(struct crypto_alg *alg,
> u32 type, u32 mask, void *storage, size_t storage_max_length)
> 
> 
> Or a direct plug to crypto/md5.c functions and hand crafted context.
> 
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: TCP-MD5 checksum failure on x86_64 SMP
  2010-05-10 14:55                                 ` Bijay Singh
@ 2010-05-10 15:18                                   ` Eric Dumazet
  2010-05-10 17:27                                     ` Bijay Singh
  2010-05-11  4:08                                     ` Bijay Singh
  0 siblings, 2 replies; 48+ messages in thread
From: Eric Dumazet @ 2010-05-10 15:18 UTC (permalink / raw)
  To: Bijay Singh
  Cc: Stephen Hemminger, David Miller, <bhaskie@gmail.com>,
	<bhutchings@solarflare.com>, <netdev@vger.kernel.org>

Le lundi 10 mai 2010 à 14:55 +0000, Bijay Singh a écrit :
> Hi,
> I had noticed the corruption in the context and actually did what is mentioned.
> 
> I allocated the context on the stack and plugged in the md5.c functions. I was able to temporarily solve the problem, all this before I got a response on this thread.
> 
> But now I have seeing another problem, when i change the MTU on the interface from 1500 to 4470 none of the message from the peer get thru and I get hash failed message. I am wondering if this is another bug getting hit in this scenario.

Thats very fine, but you mix very different problems.

Step by step resolution is required, and clean patches too, because
plugging md5.c functions is not an option for stable series :)

Obviously, nobody seriously used TCP-MD5 on linux, but you...




^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: TCP-MD5 checksum failure on x86_64 SMP
  2010-05-10 15:18                                   ` Eric Dumazet
@ 2010-05-10 17:27                                     ` Bijay Singh
  2010-05-11  4:08                                     ` Bijay Singh
  1 sibling, 0 replies; 48+ messages in thread
From: Bijay Singh @ 2010-05-10 17:27 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Stephen Hemminger, David Miller, <bhaskie@gmail.com>,
	<bhutchings@solarflare.com>, <netdev@vger.kernel.org>

Hi Eric,

Didn't intend to mix  the issues. It was a hack intended to calm the nerves. I am going to apply the proper patches asap.

About the latest problem of MD5 not working with MTU set to 4470. I noticed this when i needed to change the MTU for some other  purpose. 

Since it was a production box, i have to first set up my box with the right NIC card to reproduce this and try debugging it. In the meantime any ques will  help.

Thanks,
Bijay
 
On 10-May-2010, at 8:48 PM, Eric Dumazet wrote:

> Le lundi 10 mai 2010 à 14:55 +0000, Bijay Singh a écrit :
>> Hi,
>> I had noticed the corruption in the context and actually did what is mentioned.
>> 
>> I allocated the context on the stack and plugged in the md5.c functions. I was able to temporarily solve the problem, all this before I got a response on this thread.
>> 
>> But now I have seeing another problem, when i change the MTU on the interface from 1500 to 4470 none of the message from the peer get thru and I get hash failed message. I am wondering if this is another bug getting hit in this scenario.
> 
> Thats very fine, but you mix very different problems.
> 
> Step by step resolution is required, and clean patches too, because
> plugging md5.c functions is not an option for stable series :)
> 
> Obviously, nobody seriously used TCP-MD5 on linux, but you...
> 
> 
> 


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: TCP-MD5 checksum failure on x86_64 SMP
  2010-05-10 15:18                                   ` Eric Dumazet
  2010-05-10 17:27                                     ` Bijay Singh
@ 2010-05-11  4:08                                     ` Bijay Singh
  2010-05-11  6:27                                       ` Eric Dumazet
  2010-05-11 20:50                                       ` Eric Dumazet
  1 sibling, 2 replies; 48+ messages in thread
From: Bijay Singh @ 2010-05-11  4:08 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Stephen Hemminger, David Miller, <bhaskie@gmail.com>,
	<bhutchings@solarflare.com>, <netdev@vger.kernel.org>

Hi Eric,

I guess that makes me the enviable one. So I am keen to test out this feature completely, as long as I know what to do as a next step, directions, patches.

Thanks,
Bijay

On 10-May-2010, at 8:48 PM, Eric Dumazet wrote:

> Le lundi 10 mai 2010 à 14:55 +0000, Bijay Singh a écrit :
>> Hi,
>> I had noticed the corruption in the context and actually did what is mentioned.
>> 
>> I allocated the context on the stack and plugged in the md5.c functions. I was able to temporarily solve the problem, all this before I got a response on this thread.
>> 
>> But now I have seeing another problem, when i change the MTU on the interface from 1500 to 4470 none of the message from the peer get thru and I get hash failed message. I am wondering if this is another bug getting hit in this scenario.
> 
> Thats very fine, but you mix very different problems.
> 
> Step by step resolution is required, and clean patches too, because
> plugging md5.c functions is not an option for stable series :)
> 
> Obviously, nobody seriously used TCP-MD5 on linux, but you...
> 
> 
> 


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: TCP-MD5 checksum failure on x86_64 SMP
  2010-05-11  4:08                                     ` Bijay Singh
@ 2010-05-11  6:27                                       ` Eric Dumazet
  2010-05-11  8:23                                         ` Bijay Singh
  2010-05-11 20:50                                       ` Eric Dumazet
  1 sibling, 1 reply; 48+ messages in thread
From: Eric Dumazet @ 2010-05-11  6:27 UTC (permalink / raw)
  To: Bijay Singh
  Cc: Stephen Hemminger, David Miller, <bhaskie@gmail.com>,
	<bhutchings@solarflare.com>, <netdev@vger.kernel.org>

Le mardi 11 mai 2010 à 04:08 +0000, Bijay Singh a écrit :
> Hi Eric,
> 
> I guess that makes me the enviable one. So I am keen to test out this feature

>  completely, as long as I know what to do as a next step, directions, patches.
> 

MTU > 4000 is not reliable because of high order allocations on typical
NICS. I am afraid you need NIC able to deliver page fragments.

Its working here (32bit kernel) with a tg3 NIC, but I got following
message :

ifconfig eth3 mtu 9000
...
[51492.936500] 167731 total pagecache pages
[51492.936500] 0 pages in swap cache
[51492.936500] Swap cache stats: add 0, delete 0, find 0/0
[51492.936500] Free swap  = 4192928kB
[51492.936500] Total swap = 4192928kB
[51492.936500] 1114110 pages RAM
[51492.936500] 885761 pages HighMem
[51492.936500] 77073 pages reserved
[51492.936500] 134483 pages shared
[51492.936500] 159131 pages non-shared
[51492.953027] tg3 0000:14:04.1: eth3: Using a smaller RX standard ring.
Only 183 out of 511 buffers were allocated successfully

$ ethtool -g eth3
Ring parameters for eth3:
Pre-set maximums:
RX:		511
RX Mini:	0
RX Jumbo:	0
TX:		511
Current hardware settings:
RX:		183
RX Mini:	0
RX Jumbo:	0
TX:		511

$ cat /proc/buddyinfo
Node 0, zone      DMA      5      2      1      1      2      2      2      1      0      1      0 
Node 0, zone   Normal   4285   1823    248     73      9      5      0      0      0      0      0 
Node 0, zone  HighMem     97    199    921    583    383    261    155    117     69     41    649 

I know that if I try to stress RX path, I'll get failures.

Could you explain me why you need both big MTUS and TCP-MD5 ?




^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: TCP-MD5 checksum failure on x86_64 SMP
  2010-05-11  6:27                                       ` Eric Dumazet
@ 2010-05-11  8:23                                         ` Bijay Singh
  0 siblings, 0 replies; 48+ messages in thread
From: Bijay Singh @ 2010-05-11  8:23 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Stephen Hemminger, David Miller, <bhaskie@gmail.com>,
	<bhutchings@solarflare.com>, <netdev@vger.kernel.org>

I need MD5 for my BGP sessions and need the jumbo packets for the IS-IS peering. MTU of 1500 results in LSPs higher that 1500 getting dropped at the peering router. 

On 11-May-2010, at 11:57 AM, Eric Dumazet wrote:

> Le mardi 11 mai 2010 à 04:08 +0000, Bijay Singh a écrit :
>> Hi Eric,
>> 
>> I guess that makes me the enviable one. So I am keen to test out this feature
> 
>> completely, as long as I know what to do as a next step, directions, patches.
>> 
> 
> MTU > 4000 is not reliable because of high order allocations on typical
> NICS. I am afraid you need NIC able to deliver page fragments.
> 
> Its working here (32bit kernel) with a tg3 NIC, but I got following
> message :
> 
> ifconfig eth3 mtu 9000
> ...
> [51492.936500] 167731 total pagecache pages
> [51492.936500] 0 pages in swap cache
> [51492.936500] Swap cache stats: add 0, delete 0, find 0/0
> [51492.936500] Free swap  = 4192928kB
> [51492.936500] Total swap = 4192928kB
> [51492.936500] 1114110 pages RAM
> [51492.936500] 885761 pages HighMem
> [51492.936500] 77073 pages reserved
> [51492.936500] 134483 pages shared
> [51492.936500] 159131 pages non-shared
> [51492.953027] tg3 0000:14:04.1: eth3: Using a smaller RX standard ring.
> Only 183 out of 511 buffers were allocated successfully
> 
> $ ethtool -g eth3
> Ring parameters for eth3:
> Pre-set maximums:
> RX:		511
> RX Mini:	0
> RX Jumbo:	0
> TX:		511
> Current hardware settings:
> RX:		183
> RX Mini:	0
> RX Jumbo:	0
> TX:		511
> 
> $ cat /proc/buddyinfo
> Node 0, zone      DMA      5      2      1      1      2      2      2      1      0      1      0 
> Node 0, zone   Normal   4285   1823    248     73      9      5      0      0      0      0      0 
> Node 0, zone  HighMem     97    199    921    583    383    261    155    117     69     41    649 
> 
> I know that if I try to stress RX path, I'll get failures.
> 
> Could you explain me why you need both big MTUS and TCP-MD5 ?
> 
> 
> 


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: TCP-MD5 checksum failure on x86_64 SMP
  2010-05-11  4:08                                     ` Bijay Singh
  2010-05-11  6:27                                       ` Eric Dumazet
@ 2010-05-11 20:50                                       ` Eric Dumazet
  2010-05-12  3:20                                         ` Eric Dumazet
  2010-05-16 20:48                                         ` Eric Dumazet
  1 sibling, 2 replies; 48+ messages in thread
From: Eric Dumazet @ 2010-05-11 20:50 UTC (permalink / raw)
  To: Bijay Singh
  Cc: Stephen Hemminger, David Miller, <bhaskie@gmail.com>,
	<bhutchings@solarflare.com>,
	netdev, Ilpo Järvinen

Le mardi 11 mai 2010 à 04:08 +0000, Bijay Singh a écrit :
> Hi Eric,
> 
> I guess that makes me the enviable one. So I am keen to test out this feature completely, as long as I know what to do as a next step, directions, patches.
> 
> Thanks


I believe third problem comes from commit 4957faad
(TCPCT part 1g: Responder Cookie => Initiator), from William Allen
Simpson.

When a SYN-ACK packet is built (in tcp_synack_options()),
it specifically forbids a TIMESTAMP option to be included if SACK is
also selected :

doing_ts &= !ireq->sack_ok;

Problem is this mask is done on a local variable. socket is still marked
as being timestamp enabled.


Later, when we build tcp options for data packets, we _include_ a
timestamp, while our SYNACK didnt mention the option.  

So the following trafic can happen (and fails) :

18:38:29.041966 IP 192.168.0.33.58906 > 192.168.0.56.22226: Flags [S], seq 4014064674, win 8860, options [mss 4430,sackOK,TS val 519041 ecr 0,nop,wscale 7,nop,nop,md5can't check - 9b44126367effcf3247fcbf6da76b24d], length 0
18:38:29.042072 IP 192.168.0.56.22226 > 192.168.0.33.58906: Flags [S.], seq 586328714, ack 4014064675, win 5792, options [nop,nop,md5can't check - badd847799ded46f39642c341cc7e92b,mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
18:38:29.042093 IP 192.168.0.33.58906 > 192.168.0.56.22226: Flags [.], ack 1, win 70, options [nop,nop,md5can't check - 3994ef6987df02a592963fba04c5d313], length 0
18:38:29.043217 IP 192.168.0.33.58906 > 192.168.0.56.22226: Flags [.], seq 1:1441, ack 1, win 70, options [nop,nop,md5can't check - 8399f7ccab3a6b8c5a3027ed58bba314], length 1440
18:38:29.043226 IP 192.168.0.33.58906 > 192.168.0.56.22226: Flags [P.], seq 1441:2501, ack 1, win 70, options [nop,nop,md5can't check - 701ebf65b1894a6bed4cefbf7a56596a], length 1060
18:38:29.043374 IP 192.168.0.56.22226 > 192.168.0.33.58906: Flags [.], ack 1441, win 68, options [nop,nop,md5can't check - 1badb315ba436ab59bff5b37daa871be,nop,nop,TS val 113051377 ecr 519041], length 0
18:38:29.043383 IP 192.168.0.56.22226 > 192.168.0.33.58906: Flags [.], ack 2501, win 91, options [nop,nop,md5can't check - 120564dcb99f822f3b70910282a6ed9d,nop,nop,TS val 113051377 ecr 519041], length 0
18:38:29.043673 IP 192.168.0.56.22226 > 192.168.0.33.58906: Flags [.], seq 1:1429, ack 2501, win 91, options [nop,nop,md5can't check - fe5dfb438065373b52ba85bf800876a8,nop,nop,TS val 113051377 ecr 519041], length 1428
18:38:29.043681 IP 192.168.0.56.22226 > 192.168.0.33.58906: Flags [P.], seq 1429:2500, ack 2501, win 91, options [nop,nop,md5can't check - 7a910cd5ff357bf0e2c8d3489aafaa86,nop,nop,TS val 113051377 ecr 519041], length 1071
18:38:32.037786 IP 192.168.0.56.22226 > 192.168.0.33.58906: Flags [.], seq 1:1429, ack 2501, win 91, options [nop,nop,md5can't check - fe5dfb438065373b52ba85bf800876a8,nop,nop,TS val 113051677 ecr 519041], length 1428
18:38:38.037708 IP 192.168.0.56.22226 > 192.168.0.33.58906: Flags [.], seq 1:1429, ack 2501, win 91, options [nop,nop,md5can't check - fe5dfb438065373b52ba85bf800876a8,nop,nop,TS val 113052277 ecr 519041], length 1428
18:38:50.037524 IP 192.168.0.56.22226 > 192.168.0.33.58906: Flags [.], seq 1:1429, ack 2501, win 91, options [nop,nop,md5can't check - fe5dfb438065373b52ba85bf800876a8,nop,nop,TS val 113053477 ecr 519041], length 1428


Could you try following patch ?

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 5db3a2c..0be21cd 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -668,7 +668,7 @@ static unsigned tcp_synack_options(struct sock *sk,
 	u8 cookie_plus = (xvp != NULL && !xvp->cookie_out_never) ?
 			 xvp->cookie_plus :
 			 0;
-	bool doing_ts = ireq->tstamp_ok;
+	bool doing_ts;
 
 #ifdef CONFIG_TCP_MD5SIG
 	*md5 = tcp_rsk(req)->af_specific->md5_lookup(sk, req);
@@ -681,11 +681,12 @@ static unsigned tcp_synack_options(struct sock *sk,
 		 * rather than TS in order to fit in better with old,
 		 * buggy kernels, but that was deemed to be unnecessary.
 		 */
-		doing_ts &= !ireq->sack_ok;
+		ireq->tstamp_ok &= !ireq->sack_ok;
 	}
 #else
 	*md5 = NULL;
 #endif
+	doing_ts = ireq->tstamp_ok;
 
 	/* We always send an MSS option. */
 	opts->mss = mss;






^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: TCP-MD5 checksum failure on x86_64 SMP
  2010-05-11 20:50                                       ` Eric Dumazet
@ 2010-05-12  3:20                                         ` Eric Dumazet
  2010-05-12 22:22                                           ` Stephen Hemminger
  2010-05-16 20:48                                         ` Eric Dumazet
  1 sibling, 1 reply; 48+ messages in thread
From: Eric Dumazet @ 2010-05-12  3:20 UTC (permalink / raw)
  To: Bijay Singh
  Cc: Stephen Hemminger, David Miller, <bhaskie@gmail.com>,
	<bhutchings@solarflare.com>,
	netdev, Ilpo Järvinen

Le mardi 11 mai 2010 à 22:50 +0200, Eric Dumazet a écrit :
> Le mardi 11 mai 2010 à 04:08 +0000, Bijay Singh a écrit :
> > Hi Eric,
> > 
> > I guess that makes me the enviable one. So I am keen to test out this feature completely, as long as I know what to do as a next step, directions, patches.
> > 
> > Thanks
> 
> 
> I believe third problem comes from commit 4957faad
> (TCPCT part 1g: Responder Cookie => Initiator), from William Allen
> Simpson.

And a fourth problem might be that tcp_md5_hash_skb_data() is not
frag_list aware ?




diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 8ce2974..56ee0f2 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2985,6 +2985,7 @@ int tcp_md5_hash_skb_data(struct tcp_md5sig_pool *hp,
 	const unsigned head_data_len = skb_headlen(skb) > header_len ?
 				       skb_headlen(skb) - header_len : 0;
 	const struct skb_shared_info *shi = skb_shinfo(skb);
+	struct sk_buff *frag_iter;
 
 	sg_init_table(&sg, 1);
 
@@ -2999,6 +3000,10 @@ int tcp_md5_hash_skb_data(struct tcp_md5sig_pool *hp,
 			return 1;
 	}
 
+	skb_walk_frags(skb, frag_iter)
+		if (tcp_md5_hash_skb_data(hp, frag_iter, 0))
+			return 1;
+
 	return 0;
 }
 



^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: TCP-MD5 checksum failure on x86_64 SMP
  2010-05-12  3:20                                         ` Eric Dumazet
@ 2010-05-12 22:22                                           ` Stephen Hemminger
  2010-05-12 22:24                                             ` David Miller
  0 siblings, 1 reply; 48+ messages in thread
From: Stephen Hemminger @ 2010-05-12 22:22 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Bijay Singh, David Miller, <bhaskie@gmail.com>,
	<bhutchings@solarflare.com>,
	netdev, Ilpo Järvinen

On Wed, 12 May 2010 05:20:21 +0200
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> Le mardi 11 mai 2010 à 22:50 +0200, Eric Dumazet a écrit :
> > Le mardi 11 mai 2010 à 04:08 +0000, Bijay Singh a écrit :
> > > Hi Eric,
> > > 
> > > I guess that makes me the enviable one. So I am keen to test out this feature completely, as long as I know what to do as a next step, directions, patches.
> > > 
> > > Thanks
> > 
> > 
> > I believe third problem comes from commit 4957faad
> > (TCPCT part 1g: Responder Cookie => Initiator), from William Allen
> > Simpson.
> 
> And a fourth problem might be that tcp_md5_hash_skb_data() is not
> frag_list aware ?
> 
> 
> 
> 
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 8ce2974..56ee0f2 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -2985,6 +2985,7 @@ int tcp_md5_hash_skb_data(struct tcp_md5sig_pool *hp,
>  	const unsigned head_data_len = skb_headlen(skb) > header_len ?
>  				       skb_headlen(skb) - header_len : 0;
>  	const struct skb_shared_info *shi = skb_shinfo(skb);
> +	struct sk_buff *frag_iter;
>  
>  	sg_init_table(&sg, 1);
>  
> @@ -2999,6 +3000,10 @@ int tcp_md5_hash_skb_data(struct tcp_md5sig_pool *hp,
>  			return 1;
>  	}
>  
> +	skb_walk_frags(skb, frag_iter)
> +		if (tcp_md5_hash_skb_data(hp, frag_iter, 0))
> +			return 1;
> +
>  	return 0;
>  }

Yes, that looks like a possible bug, not sure what hardware
generates frag_list.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: TCP-MD5 checksum failure on x86_64 SMP
  2010-05-12 22:22                                           ` Stephen Hemminger
@ 2010-05-12 22:24                                             ` David Miller
  2010-05-16 19:53                                               ` Eric Dumazet
  0 siblings, 1 reply; 48+ messages in thread
From: David Miller @ 2010-05-12 22:24 UTC (permalink / raw)
  To: shemminger
  Cc: eric.dumazet, Bijay.Singh, bhaskie, bhutchings, netdev, ilpo.jarvinen

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Wed, 12 May 2010 15:22:07 -0700

> Yes, that looks like a possible bug, not sure what hardware
> generates frag_list.

GRO generates frag_list

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: TCP-MD5 checksum failure on x86_64 SMP
  2010-05-07 17:36                             ` Stephen Hemminger
  2010-05-07 21:40                               ` Eric Dumazet
@ 2010-05-16  7:30                               ` David Miller
  1 sibling, 0 replies; 48+ messages in thread
From: David Miller @ 2010-05-16  7:30 UTC (permalink / raw)
  To: shemminger; +Cc: eric.dumazet, bhaskie, bhutchings, netdev

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Fri, 7 May 2010 10:36:39 -0700

> On Fri, 07 May 2010 19:21:33 +0200
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
>> Le vendredi 07 mai 2010 à 10:14 -0700, Stephen Hemminger a écrit :
>> 
>> > Forget the per cpu data; the pool should just be scrapped.
>> > 
>> > The only reason the pool exists is that the crypto hash state which
>> > should just be moved into the md5_info (88 bytes).  The pseudo
>> > header can just be generated on the stack before passing to the crypto
>> > code.
>> 
>> 
>> Sure, but I'm afraid there is no generic API do do that (if we want to
>> reuse crypto/md5.c code).
> 
> It looks like the pool is just an optimization to avoid opening too
> many crypto API connections.  This should only be an issue if offloading
> MD5.

It's an issue because creating a crypto API context is expensive, so this
influences our connection rates with MD5.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: TCP-MD5 checksum failure on x86_64 SMP
  2010-05-07  8:00                     ` Eric Dumazet
  2010-05-07  8:59                       ` Bhaskar Dutta
@ 2010-05-16  7:35                       ` David Miller
  1 sibling, 0 replies; 48+ messages in thread
From: David Miller @ 2010-05-16  7:35 UTC (permalink / raw)
  To: eric.dumazet; +Cc: bhaskie, shemminger, bhutchings, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 07 May 2010 10:00:22 +0200

> [PATCH] tcp: fix MD5 (RFC2385) support
> 
> TCP MD5 support uses percpu data for temporary storage. It currently
> disables preemption so that same storage cannot be reclaimed by another
> thread on same cpu.
> 
> We also have to make sure a softirq handler wont try to use also same
> context. Various bug reports demonstrated corruptions.
> 
> Fix is to disable preemption and BH.
> 
> Reported-by: Bhaskar Dutta <bhaskie@gmail.com>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: TCP-MD5 checksum failure on x86_64 SMP
  2010-05-07 21:18                                 ` Eric Dumazet
@ 2010-05-16  7:37                                   ` David Miller
  0 siblings, 0 replies; 48+ messages in thread
From: David Miller @ 2010-05-16  7:37 UTC (permalink / raw)
  To: eric.dumazet; +Cc: bhaskie, shemminger, bhutchings, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 07 May 2010 23:18:57 +0200

> [PATCH] net: Introduce sk_route_nocaps
> 
> TCP-MD5 sessions have intermittent failures, when route cache is
> invalidated. ip_queue_xmit() has to find a new route, calls
> sk_setup_caps(sk, &rt->u.dst), destroying the 
> 
> sk->sk_route_caps &= ~NETIF_F_GSO_MASK
> 
> that MD5 desperately try to make all over its way (from
> tcp_transmit_skb() for example)
> 
> So we send few bad packets, and everything is fine when
> tcp_transmit_skb() is called again for this socket.
> 
> Since ip_queue_xmit() is at a lower level than TCP-MD5, I chose to use a
> socket field, sk_route_nocaps, containing bits to mask on sk_route_caps.
> 
> Reported-by: Bhaskar Dutta <bhaskie@gmail.com>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Since the connection does recover eventually, I'm stuffing this into
net-next-2.6 into net-2.6

After some time in net-next-2.6, we can submit it to -stable.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: TCP-MD5 checksum failure on x86_64 SMP
  2010-05-12 22:24                                             ` David Miller
@ 2010-05-16 19:53                                               ` Eric Dumazet
  0 siblings, 0 replies; 48+ messages in thread
From: Eric Dumazet @ 2010-05-16 19:53 UTC (permalink / raw)
  To: David Miller
  Cc: shemminger, Bijay.Singh, bhaskie, bhutchings, netdev, ilpo.jarvinen

Le mercredi 12 mai 2010 à 15:24 -0700, David Miller a écrit :
> From: Stephen Hemminger <shemminger@vyatta.com>
> Date: Wed, 12 May 2010 15:22:07 -0700
> 
> > Yes, that looks like a possible bug, not sure what hardware
> > generates frag_list.
> 
> GRO generates frag_list

ixgbe (82599) too, if I understand well this driver (TCP Receive Side
Coalescing RSC)




^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: TCP-MD5 checksum failure on x86_64 SMP
  2010-05-11 20:50                                       ` Eric Dumazet
  2010-05-12  3:20                                         ` Eric Dumazet
@ 2010-05-16 20:48                                         ` Eric Dumazet
  2010-05-17  3:49                                           ` Bijay Singh
  1 sibling, 1 reply; 48+ messages in thread
From: Eric Dumazet @ 2010-05-16 20:48 UTC (permalink / raw)
  To: Bijay Singh
  Cc: Stephen Hemminger, David Miller, <bhaskie@gmail.com>,
	<bhutchings@solarflare.com>,
	netdev, Ilpo Järvinen

Le mardi 11 mai 2010 à 22:50 +0200, Eric Dumazet a écrit :
> Le mardi 11 mai 2010 à 04:08 +0000, Bijay Singh a écrit :
> > Hi Eric,
> > 
> > I guess that makes me the enviable one. So I am keen to test out this feature completely, as long as I know what to do as a next step, directions, patches.
> > 
> > Thanks
> 
> 
> I believe third problem comes from commit 4957faad
> (TCPCT part 1g: Responder Cookie => Initiator), from William Allen
> Simpson.
> 
> When a SYN-ACK packet is built (in tcp_synack_options()),
> it specifically forbids a TIMESTAMP option to be included if SACK is
> also selected :
> 
> doing_ts &= !ireq->sack_ok;
> 
> Problem is this mask is done on a local variable. socket is still marked
> as being timestamp enabled.
> 
> 
> Later, when we build tcp options for data packets, we _include_ a
> timestamp, while our SYNACK didnt mention the option.  
> 
> So the following trafic can happen (and fails) :
> 
> 18:38:29.041966 IP 192.168.0.33.58906 > 192.168.0.56.22226: Flags [S], seq 4014064674, win 8860, options [mss 4430,sackOK,TS val 519041 ecr 0,nop,wscale 7,nop,nop,md5can't check - 9b44126367effcf3247fcbf6da76b24d], length 0
> 18:38:29.042072 IP 192.168.0.56.22226 > 192.168.0.33.58906: Flags [S.], seq 586328714, ack 4014064675, win 5792, options [nop,nop,md5can't check - badd847799ded46f39642c341cc7e92b,mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
> 18:38:29.042093 IP 192.168.0.33.58906 > 192.168.0.56.22226: Flags [.], ack 1, win 70, options [nop,nop,md5can't check - 3994ef6987df02a592963fba04c5d313], length 0
> 18:38:29.043217 IP 192.168.0.33.58906 > 192.168.0.56.22226: Flags [.], seq 1:1441, ack 1, win 70, options [nop,nop,md5can't check - 8399f7ccab3a6b8c5a3027ed58bba314], length 1440
> 18:38:29.043226 IP 192.168.0.33.58906 > 192.168.0.56.22226: Flags [P.], seq 1441:2501, ack 1, win 70, options [nop,nop,md5can't check - 701ebf65b1894a6bed4cefbf7a56596a], length 1060
> 18:38:29.043374 IP 192.168.0.56.22226 > 192.168.0.33.58906: Flags [.], ack 1441, win 68, options [nop,nop,md5can't check - 1badb315ba436ab59bff5b37daa871be,nop,nop,TS val 113051377 ecr 519041], length 0
> 18:38:29.043383 IP 192.168.0.56.22226 > 192.168.0.33.58906: Flags [.], ack 2501, win 91, options [nop,nop,md5can't check - 120564dcb99f822f3b70910282a6ed9d,nop,nop,TS val 113051377 ecr 519041], length 0
> 18:38:29.043673 IP 192.168.0.56.22226 > 192.168.0.33.58906: Flags [.], seq 1:1429, ack 2501, win 91, options [nop,nop,md5can't check - fe5dfb438065373b52ba85bf800876a8,nop,nop,TS val 113051377 ecr 519041], length 1428
> 18:38:29.043681 IP 192.168.0.56.22226 > 192.168.0.33.58906: Flags [P.], seq 1429:2500, ack 2501, win 91, options [nop,nop,md5can't check - 7a910cd5ff357bf0e2c8d3489aafaa86,nop,nop,TS val 113051377 ecr 519041], length 1071
> 18:38:32.037786 IP 192.168.0.56.22226 > 192.168.0.33.58906: Flags [.], seq 1:1429, ack 2501, win 91, options [nop,nop,md5can't check - fe5dfb438065373b52ba85bf800876a8,nop,nop,TS val 113051677 ecr 519041], length 1428
> 18:38:38.037708 IP 192.168.0.56.22226 > 192.168.0.33.58906: Flags [.], seq 1:1429, ack 2501, win 91, options [nop,nop,md5can't check - fe5dfb438065373b52ba85bf800876a8,nop,nop,TS val 113052277 ecr 519041], length 1428
> 18:38:50.037524 IP 192.168.0.56.22226 > 192.168.0.33.58906: Flags [.], seq 1:1429, ack 2501, win 91, options [nop,nop,md5can't check - fe5dfb438065373b52ba85bf800876a8,nop,nop,TS val 113053477 ecr 519041], length 1428
> 
> 
> Could you try following patch ?
> 
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 5db3a2c..0be21cd 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -668,7 +668,7 @@ static unsigned tcp_synack_options(struct sock *sk,
>  	u8 cookie_plus = (xvp != NULL && !xvp->cookie_out_never) ?
>  			 xvp->cookie_plus :
>  			 0;
> -	bool doing_ts = ireq->tstamp_ok;
> +	bool doing_ts;
>  
>  #ifdef CONFIG_TCP_MD5SIG
>  	*md5 = tcp_rsk(req)->af_specific->md5_lookup(sk, req);
> @@ -681,11 +681,12 @@ static unsigned tcp_synack_options(struct sock *sk,
>  		 * rather than TS in order to fit in better with old,
>  		 * buggy kernels, but that was deemed to be unnecessary.
>  		 */
> -		doing_ts &= !ireq->sack_ok;
> +		ireq->tstamp_ok &= !ireq->sack_ok;
>  	}
>  #else
>  	*md5 = NULL;
>  #endif
> +	doing_ts = ireq->tstamp_ok;
>  
>  	/* We always send an MSS option. */
>  	opts->mss = mss;
> 
> 
> 
> 

Bijay, had you tested this patch by any chance ?

Thanks



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: TCP-MD5 checksum failure on x86_64 SMP
  2010-05-16 20:48                                         ` Eric Dumazet
@ 2010-05-17  3:49                                           ` Bijay Singh
  2010-05-17  5:03                                             ` Eric Dumazet
  0 siblings, 1 reply; 48+ messages in thread
From: Bijay Singh @ 2010-05-17  3:49 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Stephen Hemminger, David Miller, <bhaskie@gmail.com>,
	<bhutchings@solarflare.com>,
	netdev, Ilpo Järvinen



On 17-May-2010, at 2:18 AM, Eric Dumazet wrote:

> Le mardi 11 mai 2010 à 22:50 +0200, Eric Dumazet a écrit :
>> Le mardi 11 mai 2010 à 04:08 +0000, Bijay Singh a écrit :
>>> Hi Eric,
>>> 
>>> I guess that makes me the enviable one. So I am keen to test out this feature completely, as long as I know what to do as a next step, directions, patches.
>>> 
>>> Thanks
>> 
>> 
>> I believe third problem comes from commit 4957faad
>> (TCPCT part 1g: Responder Cookie => Initiator), from William Allen
>> Simpson.
>> 
>> When a SYN-ACK packet is built (in tcp_synack_options()),
>> it specifically forbids a TIMESTAMP option to be included if SACK is
>> also selected :
>> 
>> doing_ts &= !ireq->sack_ok;
>> 
>> Problem is this mask is done on a local variable. socket is still marked
>> as being timestamp enabled.
>> 
>> 
>> Later, when we build tcp options for data packets, we _include_ a
>> timestamp, while our SYNACK didnt mention the option.  
>> 
>> So the following trafic can happen (and fails) :
>> 
>> 18:38:29.041966 IP 192.168.0.33.58906 > 192.168.0.56.22226: Flags [S], seq 4014064674, win 8860, options [mss 4430,sackOK,TS val 519041 ecr 0,nop,wscale 7,nop,nop,md5can't check - 9b44126367effcf3247fcbf6da76b24d], length 0
>> 18:38:29.042072 IP 192.168.0.56.22226 > 192.168.0.33.58906: Flags [S.], seq 586328714, ack 4014064675, win 5792, options [nop,nop,md5can't check - badd847799ded46f39642c341cc7e92b,mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
>> 18:38:29.042093 IP 192.168.0.33.58906 > 192.168.0.56.22226: Flags [.], ack 1, win 70, options [nop,nop,md5can't check - 3994ef6987df02a592963fba04c5d313], length 0
>> 18:38:29.043217 IP 192.168.0.33.58906 > 192.168.0.56.22226: Flags [.], seq 1:1441, ack 1, win 70, options [nop,nop,md5can't check - 8399f7ccab3a6b8c5a3027ed58bba314], length 1440
>> 18:38:29.043226 IP 192.168.0.33.58906 > 192.168.0.56.22226: Flags [P.], seq 1441:2501, ack 1, win 70, options [nop,nop,md5can't check - 701ebf65b1894a6bed4cefbf7a56596a], length 1060
>> 18:38:29.043374 IP 192.168.0.56.22226 > 192.168.0.33.58906: Flags [.], ack 1441, win 68, options [nop,nop,md5can't check - 1badb315ba436ab59bff5b37daa871be,nop,nop,TS val 113051377 ecr 519041], length 0
>> 18:38:29.043383 IP 192.168.0.56.22226 > 192.168.0.33.58906: Flags [.], ack 2501, win 91, options [nop,nop,md5can't check - 120564dcb99f822f3b70910282a6ed9d,nop,nop,TS val 113051377 ecr 519041], length 0
>> 18:38:29.043673 IP 192.168.0.56.22226 > 192.168.0.33.58906: Flags [.], seq 1:1429, ack 2501, win 91, options [nop,nop,md5can't check - fe5dfb438065373b52ba85bf800876a8,nop,nop,TS val 113051377 ecr 519041], length 1428
>> 18:38:29.043681 IP 192.168.0.56.22226 > 192.168.0.33.58906: Flags [P.], seq 1429:2500, ack 2501, win 91, options [nop,nop,md5can't check - 7a910cd5ff357bf0e2c8d3489aafaa86,nop,nop,TS val 113051377 ecr 519041], length 1071
>> 18:38:32.037786 IP 192.168.0.56.22226 > 192.168.0.33.58906: Flags [.], seq 1:1429, ack 2501, win 91, options [nop,nop,md5can't check - fe5dfb438065373b52ba85bf800876a8,nop,nop,TS val 113051677 ecr 519041], length 1428
>> 18:38:38.037708 IP 192.168.0.56.22226 > 192.168.0.33.58906: Flags [.], seq 1:1429, ack 2501, win 91, options [nop,nop,md5can't check - fe5dfb438065373b52ba85bf800876a8,nop,nop,TS val 113052277 ecr 519041], length 1428
>> 18:38:50.037524 IP 192.168.0.56.22226 > 192.168.0.33.58906: Flags [.], seq 1:1429, ack 2501, win 91, options [nop,nop,md5can't check - fe5dfb438065373b52ba85bf800876a8,nop,nop,TS val 113053477 ecr 519041], length 1428
>> 
>> 
>> Could you try following patch ?
>> 
>> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
>> index 5db3a2c..0be21cd 100644
>> --- a/net/ipv4/tcp_output.c
>> +++ b/net/ipv4/tcp_output.c
>> @@ -668,7 +668,7 @@ static unsigned tcp_synack_options(struct sock *sk,
>> 	u8 cookie_plus = (xvp != NULL && !xvp->cookie_out_never) ?
>> 			 xvp->cookie_plus :
>> 			 0;
>> -	bool doing_ts = ireq->tstamp_ok;
>> +	bool doing_ts;
>> 
>> #ifdef CONFIG_TCP_MD5SIG
>> 	*md5 = tcp_rsk(req)->af_specific->md5_lookup(sk, req);
>> @@ -681,11 +681,12 @@ static unsigned tcp_synack_options(struct sock *sk,
>> 		 * rather than TS in order to fit in better with old,
>> 		 * buggy kernels, but that was deemed to be unnecessary.
>> 		 */
>> -		doing_ts &= !ireq->sack_ok;
>> +		ireq->tstamp_ok &= !ireq->sack_ok;
>> 	}
>> #else
>> 	*md5 = NULL;
>> #endif
>> +	doing_ts = ireq->tstamp_ok;
>> 
>> 	/* We always send an MSS option. */
>> 	opts->mss = mss;
>> 
>> 
>> 
>> 
> 
> Bijay, had you tested this patch by any chance ?
> 


I am on quite an old kernel 2.6.27 and could not apply your patches.

Then i moved on to the kernel 2.6.32.11 however since then I have not been able to bring up my card, this is something i need to fix before i can test you fix. Working on that.

> Thanks
> 
> 


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: TCP-MD5 checksum failure on x86_64 SMP
  2010-05-17  3:49                                           ` Bijay Singh
@ 2010-05-17  5:03                                             ` Eric Dumazet
  2010-05-17 17:22                                               ` Stephen Hemminger
  2010-05-17 20:42                                               ` Stephen Hemminger
  0 siblings, 2 replies; 48+ messages in thread
From: Eric Dumazet @ 2010-05-17  5:03 UTC (permalink / raw)
  To: Bijay Singh
  Cc: Stephen Hemminger, David Miller, <bhaskie@gmail.com>,
	<bhutchings@solarflare.com>,
	netdev, Ilpo Järvinen

Le lundi 17 mai 2010 à 03:49 +0000, Bijay Singh a écrit :

> I am on quite an old kernel 2.6.27 and could not apply your patches.
> 
> Then i moved on to the kernel 2.6.32.11 however since then I have not been able to bring up my card, this is something i need to fix before i can test you fix. Working on that.
> 

Thanks again for the status report.

I see bug is older than what I stated in my previous mail

I could reproduce it in my lab and confirm following patch fixes it

This is a stable candidate (2.6.27 kernels)

Thanks

[PATCH] tcp: tcp_synack_options() fix 

Commit 33ad798c924b4a (tcp: options clean up) introduced a problem
if MD5+SACK+timestamps were used in initial SYN message.

Some stacks (old linux for example) try to negotiate MD5+SACK+TSTAMP
sessions, but since 20 bytes of tcp options space are not enough to
store all the bits needed, we chose to disable timestamps in this case.

We send a SYN-ACK _without_ timestamp option, but socket has timestamps
enabled and all further outgoing messages contain a TS block, all with
the initial timestamp of the remote peer.

Fix is to really disable timestamps option for the whole session.

Reported-by: Bijay Singh <Bijay.Singh@guavus.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/ipv4/tcp_output.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 0dda86e..b8bb226 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -667,7 +667,7 @@ static unsigned tcp_synack_options(struct sock *sk,
 	u8 cookie_plus = (xvp != NULL && !xvp->cookie_out_never) ?
 			 xvp->cookie_plus :
 			 0;
-	bool doing_ts = ireq->tstamp_ok;
+	bool doing_ts;
 
 #ifdef CONFIG_TCP_MD5SIG
 	*md5 = tcp_rsk(req)->af_specific->md5_lookup(sk, req);
@@ -680,11 +680,12 @@ static unsigned tcp_synack_options(struct sock *sk,
 		 * rather than TS in order to fit in better with old,
 		 * buggy kernels, but that was deemed to be unnecessary.
 		 */
-		doing_ts &= !ireq->sack_ok;
+		ireq->tstamp_ok &= !ireq->sack_ok;
 	}
 #else
 	*md5 = NULL;
 #endif
+	doing_ts = ireq->tstamp_ok;
 
 	/* We always send an MSS option. */
 	opts->mss = mss;



^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: TCP-MD5 checksum failure on x86_64 SMP
  2010-05-17  5:03                                             ` Eric Dumazet
@ 2010-05-17 17:22                                               ` Stephen Hemminger
  2010-05-17 20:42                                               ` Stephen Hemminger
  1 sibling, 0 replies; 48+ messages in thread
From: Stephen Hemminger @ 2010-05-17 17:22 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Bijay Singh, David Miller, <bhaskie@gmail.com>,
	<bhutchings@solarflare.com>,
	netdev, Ilpo Järvinen

On Mon, 17 May 2010 07:03:49 +0200
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> Le lundi 17 mai 2010 à 03:49 +0000, Bijay Singh a écrit :
> 
> > I am on quite an old kernel 2.6.27 and could not apply your patches.
> > 
> > Then i moved on to the kernel 2.6.32.11 however since then I have not been able to bring up my card, this is something i need to fix before i can test you fix. Working on that.
> > 
> 
> Thanks again for the status report.
> 
> I see bug is older than what I stated in my previous mail
> 
> I could reproduce it in my lab and confirm following patch fixes it
> 
> This is a stable candidate (2.6.27 kernels)
> 
> Thanks
> 
> [PATCH] tcp: tcp_synack_options() fix 
> 
> Commit 33ad798c924b4a (tcp: options clean up) introduced a problem
> if MD5+SACK+timestamps were used in initial SYN message.
> 
> Some stacks (old linux for example) try to negotiate MD5+SACK+TSTAMP
> sessions, but since 20 bytes of tcp options space are not enough to
> store all the bits needed, we chose to disable timestamps in this case.
> 
> We send a SYN-ACK _without_ timestamp option, but socket has timestamps
> enabled and all further outgoing messages contain a TS block, all with
> the initial timestamp of the remote peer.
> 
> Fix is to really disable timestamps option for the whole session.
> 
> Reported-by: Bijay Singh <Bijay.Singh@guavus.com>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---
>  net/ipv4/tcp_output.c |    5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 0dda86e..b8bb226 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -667,7 +667,7 @@ static unsigned tcp_synack_options(struct sock *sk,
>  	u8 cookie_plus = (xvp != NULL && !xvp->cookie_out_never) ?
>  			 xvp->cookie_plus :
>  			 0;
> -	bool doing_ts = ireq->tstamp_ok;
> +	bool doing_ts;
>  
>  #ifdef CONFIG_TCP_MD5SIG
>  	*md5 = tcp_rsk(req)->af_specific->md5_lookup(sk, req);
> @@ -680,11 +680,12 @@ static unsigned tcp_synack_options(struct sock *sk,
>  		 * rather than TS in order to fit in better with old,
>  		 * buggy kernels, but that was deemed to be unnecessary.
>  		 */
> -		doing_ts &= !ireq->sack_ok;
> +		ireq->tstamp_ok &= !ireq->sack_ok;
>  	}
>  #else
>  	*md5 = NULL;
>  #endif
> +	doing_ts = ireq->tstamp_ok;
>  
>  	/* We always send an MSS option. */
>  	opts->mss = mss;
> 
> 

Make this gets back to stable as well.

-- 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: TCP-MD5 checksum failure on x86_64 SMP
  2010-05-17  5:03                                             ` Eric Dumazet
  2010-05-17 17:22                                               ` Stephen Hemminger
@ 2010-05-17 20:42                                               ` Stephen Hemminger
  2010-05-17 21:04                                                 ` [PATCH] tcp: tcp_synack_options() fix Eric Dumazet
  1 sibling, 1 reply; 48+ messages in thread
From: Stephen Hemminger @ 2010-05-17 20:42 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Bijay Singh, David Miller, <bhaskie@gmail.com>,
	<bhutchings@solarflare.com>,
	netdev, Ilpo Järvinen

On Mon, 17 May 2010 07:03:49 +0200
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> Le lundi 17 mai 2010 à 03:49 +0000, Bijay Singh a écrit :
> 
> > I am on quite an old kernel 2.6.27 and could not apply your patches.
> > 
> > Then i moved on to the kernel 2.6.32.11 however since then I have not been able to bring up my card, this is something i need to fix before i can test you fix. Working on that.
> > 
> 
> Thanks again for the status report.
> 
> I see bug is older than what I stated in my previous mail
> 
> I could reproduce it in my lab and confirm following patch fixes it
> 
> This is a stable candidate (2.6.27 kernels)
> 
> Thanks
> 
> [PATCH] tcp: tcp_synack_options() fix 
> 
> Commit 33ad798c924b4a (tcp: options clean up) introduced a problem
> if MD5+SACK+timestamps were used in initial SYN message.
> 
> Some stacks (old linux for example) try to negotiate MD5+SACK+TSTAMP
> sessions, but since 20 bytes of tcp options space are not enough to
> store all the bits needed, we chose to disable timestamps in this case.
> 
> We send a SYN-ACK _without_ timestamp option, but socket has timestamps
> enabled and all further outgoing messages contain a TS block, all with
> the initial timestamp of the remote peer.
> 
> Fix is to really disable timestamps option for the whole session.
> 
> Reported-by: Bijay Singh <Bijay.Singh@guavus.com>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---
>  net/ipv4/tcp_output.c |    5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 0dda86e..b8bb226 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -667,7 +667,7 @@ static unsigned tcp_synack_options(struct sock *sk,
>  	u8 cookie_plus = (xvp != NULL && !xvp->cookie_out_never) ?
>  			 xvp->cookie_plus :
>  			 0;
> -	bool doing_ts = ireq->tstamp_ok;
> +	bool doing_ts;
>  
>  #ifdef CONFIG_TCP_MD5SIG
>  	*md5 = tcp_rsk(req)->af_specific->md5_lookup(sk, req);
> @@ -680,11 +680,12 @@ static unsigned tcp_synack_options(struct sock *sk,
>  		 * rather than TS in order to fit in better with old,
>  		 * buggy kernels, but that was deemed to be unnecessary.
>  		 */
> -		doing_ts &= !ireq->sack_ok;
> +		ireq->tstamp_ok &= !ireq->sack_ok;
>  	}
>  #else
>  	*md5 = NULL;
>  #endif
> +	doing_ts = ireq->tstamp_ok;
>  
>  	/* We always send an MSS option. */
>  	opts->mss = mss;
> 
> 

Since you are doing away with flag variable, why not this instead?


--- a/net/ipv4/tcp_output.c	2010-05-17 13:38:32.822025583 -0700
+++ b/net/ipv4/tcp_output.c	2010-05-17 13:41:47.321734775 -0700
@@ -668,7 +668,6 @@ static unsigned tcp_synack_options(struc
 	u8 cookie_plus = (xvp != NULL && !xvp->cookie_out_never) ?
 			 xvp->cookie_plus :
 			 0;
-	bool doing_ts = ireq->tstamp_ok;
 
 #ifdef CONFIG_TCP_MD5SIG
 	*md5 = tcp_rsk(req)->af_specific->md5_lookup(sk, req);
@@ -681,7 +680,7 @@ static unsigned tcp_synack_options(struc
 		 * rather than TS in order to fit in better with old,
 		 * buggy kernels, but that was deemed to be unnecessary.
 		 */
-		doing_ts &= !ireq->sack_ok;
+		ireq->tstamp_ok &= !ireq->sack_ok;
 	}
 #else
 	*md5 = NULL;
@@ -696,7 +695,7 @@ static unsigned tcp_synack_options(struc
 		opts->options |= OPTION_WSCALE;
 		remaining -= TCPOLEN_WSCALE_ALIGNED;
 	}
-	if (likely(doing_ts)) {
+	if (likely(ireq->tstamp_ok)) {
 		opts->options |= OPTION_TS;
 		opts->tsval = TCP_SKB_CB(skb)->when;
 		opts->tsecr = req->ts_recent;
@@ -704,7 +703,7 @@ static unsigned tcp_synack_options(struc
 	}
 	if (likely(ireq->sack_ok)) {
 		opts->options |= OPTION_SACK_ADVERTISE;
-		if (unlikely(!doing_ts))
+		if (unlikely(!ireq->tstamp_ok))
 			remaining -= TCPOLEN_SACKPERM_ALIGNED;
 	}
 
@@ -712,7 +711,7 @@ static unsigned tcp_synack_options(struc
 	 * If the <SYN> options fit, the same options should fit now!
 	 */
 	if (*md5 == NULL &&
-	    doing_ts &&
+	    ireq->tstamp_ok &&
 	    cookie_plus > TCPOLEN_COOKIE_BASE) {
 		int need = cookie_plus; /* has TCPOLEN_COOKIE_BASE */
 


 



-- 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH] tcp: tcp_synack_options() fix
  2010-05-17 20:42                                               ` Stephen Hemminger
@ 2010-05-17 21:04                                                 ` Eric Dumazet
  2010-05-18  5:35                                                   ` David Miller
  0 siblings, 1 reply; 48+ messages in thread
From: Eric Dumazet @ 2010-05-17 21:04 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Bijay Singh, David Miller, <bhaskie@gmail.com>,
	<bhutchings@solarflare.com>,
	netdev, Ilpo Järvinen

Le lundi 17 mai 2010 à 13:42 -0700, Stephen Hemminger a écrit :

> Since you are doing away with flag variable, why not this instead?
> 

Sure, we can eliminate this doing_ts variable and save few bytes

Thanks

[PATCH] tcp: tcp_synack_options() fix 

Commit 33ad798c924b4a (tcp: options clean up) introduced a problem
if MD5+SACK+timestamps were used in initial SYN message.

Some stacks (old linux for example) try to negotiate MD5+SACK+TSTAMP
sessions, but since 40 bytes of tcp options space are not enough to
store all the bits needed, we chose to disable timestamps in this case.

We send a SYN-ACK _without_ timestamp option, but socket has timestamps
enabled and all further outgoing messages contain a TS block, all with
the initial timestamp of the remote peer.

Fix is to really disable timestamps option for the whole session.

Reported-by: Bijay Singh <Bijay.Singh@guavus.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 0dda86e..44e98d9 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -667,7 +667,6 @@ static unsigned tcp_synack_options(struct sock *sk,
 	u8 cookie_plus = (xvp != NULL && !xvp->cookie_out_never) ?
 			 xvp->cookie_plus :
 			 0;
-	bool doing_ts = ireq->tstamp_ok;
 
 #ifdef CONFIG_TCP_MD5SIG
 	*md5 = tcp_rsk(req)->af_specific->md5_lookup(sk, req);
@@ -680,7 +679,7 @@ static unsigned tcp_synack_options(struct sock *sk,
 		 * rather than TS in order to fit in better with old,
 		 * buggy kernels, but that was deemed to be unnecessary.
 		 */
-		doing_ts &= !ireq->sack_ok;
+		ireq->tstamp_ok &= !ireq->sack_ok;
 	}
 #else
 	*md5 = NULL;
@@ -695,7 +694,7 @@ static unsigned tcp_synack_options(struct sock *sk,
 		opts->options |= OPTION_WSCALE;
 		remaining -= TCPOLEN_WSCALE_ALIGNED;
 	}
-	if (likely(doing_ts)) {
+	if (likely(ireq->tstamp_ok)) {
 		opts->options |= OPTION_TS;
 		opts->tsval = TCP_SKB_CB(skb)->when;
 		opts->tsecr = req->ts_recent;
@@ -703,7 +702,7 @@ static unsigned tcp_synack_options(struct sock *sk,
 	}
 	if (likely(ireq->sack_ok)) {
 		opts->options |= OPTION_SACK_ADVERTISE;
-		if (unlikely(!doing_ts))
+		if (unlikely(!ireq->tstamp_ok))
 			remaining -= TCPOLEN_SACKPERM_ALIGNED;
 	}
 
@@ -711,7 +710,7 @@ static unsigned tcp_synack_options(struct sock *sk,
 	 * If the <SYN> options fit, the same options should fit now!
 	 */
 	if (*md5 == NULL &&
-	    doing_ts &&
+	    ireq->tstamp_ok &&
 	    cookie_plus > TCPOLEN_COOKIE_BASE) {
 		int need = cookie_plus; /* has TCPOLEN_COOKIE_BASE */
 



^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH] tcp: tcp_synack_options() fix
  2010-05-17 21:04                                                 ` [PATCH] tcp: tcp_synack_options() fix Eric Dumazet
@ 2010-05-18  5:35                                                   ` David Miller
  0 siblings, 0 replies; 48+ messages in thread
From: David Miller @ 2010-05-18  5:35 UTC (permalink / raw)
  To: eric.dumazet
  Cc: shemminger, Bijay.Singh, bhaskie, bhutchings, netdev, ilpo.jarvinen

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 17 May 2010 23:04:38 +0200

> Le lundi 17 mai 2010 à 13:42 -0700, Stephen Hemminger a écrit :
> 
>> Since you are doing away with flag variable, why not this instead?
>> 
> 
> Sure, we can eliminate this doing_ts variable and save few bytes
> 
> Thanks
> 
> [PATCH] tcp: tcp_synack_options() fix 

Applied, thanks!

^ permalink raw reply	[flat|nested] 48+ messages in thread

end of thread, other threads:[~2010-05-18  5:35 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <i2h571fb4001005031027y4a58c4dtfd28ddcdc08d8401@mail.gmail.com>
2010-05-04  3:30 ` TCP-MD5 checksum failure on x86_64 SMP Bhaskar Dutta
2010-05-04 11:32   ` Ben Hutchings
2010-05-04 14:28     ` Bhaskar Dutta
2010-05-04 16:12       ` Stephen Hemminger
2010-05-04 17:08         ` Bhaskar Dutta
2010-05-04 17:13           ` Stephen Hemminger
2010-05-05 18:03             ` Bhaskar Dutta
2010-05-05 18:53               ` Eric Dumazet
2010-05-06 11:55                 ` Bhaskar Dutta
2010-05-06 12:06                   ` Eric Dumazet
2010-05-07  5:04                     ` David Miller
2010-05-07  5:32                       ` Eric Dumazet
2010-05-07 17:14                         ` Stephen Hemminger
2010-05-07 17:21                           ` Eric Dumazet
2010-05-07 17:36                             ` Stephen Hemminger
2010-05-07 21:40                               ` Eric Dumazet
2010-05-10 14:55                                 ` Bijay Singh
2010-05-10 15:18                                   ` Eric Dumazet
2010-05-10 17:27                                     ` Bijay Singh
2010-05-11  4:08                                     ` Bijay Singh
2010-05-11  6:27                                       ` Eric Dumazet
2010-05-11  8:23                                         ` Bijay Singh
2010-05-11 20:50                                       ` Eric Dumazet
2010-05-12  3:20                                         ` Eric Dumazet
2010-05-12 22:22                                           ` Stephen Hemminger
2010-05-12 22:24                                             ` David Miller
2010-05-16 19:53                                               ` Eric Dumazet
2010-05-16 20:48                                         ` Eric Dumazet
2010-05-17  3:49                                           ` Bijay Singh
2010-05-17  5:03                                             ` Eric Dumazet
2010-05-17 17:22                                               ` Stephen Hemminger
2010-05-17 20:42                                               ` Stephen Hemminger
2010-05-17 21:04                                                 ` [PATCH] tcp: tcp_synack_options() fix Eric Dumazet
2010-05-18  5:35                                                   ` David Miller
2010-05-16  7:30                               ` TCP-MD5 checksum failure on x86_64 SMP David Miller
2010-05-07  8:46                     ` Lars Eggert
2010-05-07  8:55                       ` Eric Dumazet
2010-05-07  9:12                       ` David Miller
2010-05-07  5:39                   ` Eric Dumazet
2010-05-07  8:00                     ` Eric Dumazet
2010-05-07  8:59                       ` Bhaskar Dutta
2010-05-07  9:37                         ` Eric Dumazet
2010-05-07 10:50                           ` Bhaskar Dutta
2010-05-07 15:18                             ` Eric Dumazet
2010-05-07 15:44                               ` Eric Dumazet
2010-05-07 21:18                                 ` Eric Dumazet
2010-05-16  7:37                                   ` David Miller
2010-05-16  7:35                       ` David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.