All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH] fix limited slow start bug
@ 2007-02-25  8:55 Roger While
  2007-02-25 23:43 ` David Miller
  0 siblings, 1 reply; 9+ messages in thread
From: Roger While @ 2007-02-25  8:55 UTC (permalink / raw)
  To: netdev; +Cc: davem


Dave M wrote :
>diff --git a/include/linux/tcp.h b/include/linux/tcp.h
>index 415193e..18a468d 100644
>--- a/include/linux/tcp.h
>+++ b/include/linux/tcp.h
>@@ -302,7 +302,7 @@ struct tcp_sock {
>         u32     snd_ssthresh;   /* Slow start size threshold            */
>         u32     snd_cwnd;       /* Sending congestion window            */
>         u16     snd_cwnd_cnt;   /* Linear increase counter              */
>-       u16     snd_cwnd_clamp; /* Do not allow snd_cwnd to grow above this */
>+       u32     snd_cwnd_clamp; /* Do not allow snd_cwnd to grow above this */
>         u32     snd_cwnd_used;
>         u32     snd_cwnd_stamp;

Was anything done about size/member alignment of struct tcp_sock per
mail from last year  -
http://marc.theaimsgroup.com/?l=linux-netdev&m=114318857102290&w=2

(I have no idea what current size is)



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] fix limited slow start bug
  2007-02-25  8:55 [PATCH] fix limited slow start bug Roger While
@ 2007-02-25 23:43 ` David Miller
  2007-02-26  2:19   ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 9+ messages in thread
From: David Miller @ 2007-02-25 23:43 UTC (permalink / raw)
  To: simrw; +Cc: netdev

From: Roger While <simrw@sim-basis.de>
Date: Sun, 25 Feb 2007 09:55:34 +0100

> Was anything done about size/member alignment of struct tcp_sock per
> mail from last year  -
> http://marc.theaimsgroup.com/?l=linux-netdev&m=114318857102290&w=2
> 
> (I have no idea what current size is)

Nothing has been done yet but I've been thinking about it a lot
over the past year and I've had some discussions with other
developers such as Arnaldo.

It's just a matter of me being backlogged, so I never get to
it as often as I would like :)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] fix limited slow start bug
  2007-02-25 23:43 ` David Miller
@ 2007-02-26  2:19   ` Arnaldo Carvalho de Melo
  2007-02-26  3:02     ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 9+ messages in thread
From: Arnaldo Carvalho de Melo @ 2007-02-26  2:19 UTC (permalink / raw)
  To: David Miller; +Cc: simrw, netdev

[-- Attachment #1: Type: text/plain, Size: 988 bytes --]

On 2/25/07, David Miller <davem@davemloft.net> wrote:
> From: Roger While <simrw@sim-basis.de>
> Date: Sun, 25 Feb 2007 09:55:34 +0100
>
> > Was anything done about size/member alignment of struct tcp_sock per
> > mail from last year  -
> > http://marc.theaimsgroup.com/?l=linux-netdev&m=114318857102290&w=2
> >
> > (I have no idea what current size is)
>
> Nothing has been done yet but I've been thinking about it a lot
> over the past year and I've had some discussions with other
> developers such as Arnaldo.
>
> It's just a matter of me being backlogged, so I never get to
> it as often as I would like :)

Attached goes a current (DaveM's net-2.6 git tree build) pahole
picture of tcp_sock on UP, 32bits, summary:

}; /* size: 1288, cachelines: 21 */
   /* last cacheline: 8 bytes */

and for the really curious, take a look at:

http://oops.ghostprotocols.net:81/acme/dwarves/tcp_sock.pahole.expand_types.txt

All the types are expanded, makes a pretty big picture :-)

- Arnaldo

[-- Attachment #2: tcp_sock.pahole.txt --]
[-- Type: text/plain, Size: 6870 bytes --]

[acme@filo net-2.6]$ pahole ../OUTPUT/qemu/net-2.6/vmlinux tcp_sock
/* <e40138> /home/acme/git/net-2.6/include/linux/tcp.h:227 */
struct tcp_sock {
	struct inet_connection_sock inet_conn;           /*     0   844 */
	/* --- cacheline 13 boundary (832 bytes) was 12 bytes ago --- */
	u16                        tcp_header_len;       /*   844     2 */
	u16                        xmit_size_goal;       /*   846     2 */
	__be32                     pred_flags;           /*   848     4 */
	u32                        rcv_nxt;              /*   852     4 */
	u32                        snd_nxt;              /*   856     4 */
	u32                        snd_una;              /*   860     4 */
	u32                        snd_sml;              /*   864     4 */
	u32                        rcv_tstamp;           /*   868     4 */
	u32                        lsndtime;             /*   872     4 */
	struct {
		struct sk_buff_head prequeue;            /*     0    28 */
		struct task_struct * task;               /*    28     4 */
		struct iovec *     iov;                  /*    32     4 */
		int                memory;               /*    36     4 */
		int                len;                  /*    40     4 */
	} ucopy;                                         /*   876    44 */
	/* --- cacheline 14 boundary (896 bytes) was 24 bytes ago --- */
	u32                        snd_wl1;              /*   920     4 */
	u32                        snd_wnd;              /*   924     4 */
	u32                        max_window;           /*   928     4 */
	u32                        mss_cache;            /*   932     4 */
	u32                        window_clamp;         /*   936     4 */
	u32                        rcv_ssthresh;         /*   940     4 */
	u32                        frto_highmark;        /*   944     4 */
	u8                         reordering;           /*   948     1 */
	u8                         frto_counter;         /*   949     1 */
	u8                         nonagle;              /*   950     1 */
	u8                         keepalive_probes;     /*   951     1 */
	u32                        srtt;                 /*   952     4 */
	u32                        mdev;                 /*   956     4 */
	/* --- cacheline 15 boundary (960 bytes) --- */
	u32                        mdev_max;             /*   960     4 */
	u32                        rttvar;               /*   964     4 */
	u32                        rtt_seq;              /*   968     4 */
	u32                        packets_out;          /*   972     4 */
	u32                        left_out;             /*   976     4 */
	u32                        retrans_out;          /*   980     4 */
	struct tcp_options_received rx_opt;              /*   984    24 */
	u32                        snd_ssthresh;         /*  1008     4 */
	u32                        snd_cwnd;             /*  1012     4 */
	u16                        snd_cwnd_cnt;         /*  1016     2 */
	u16                        snd_cwnd_clamp;       /*  1018     2 */
	u32                        snd_cwnd_used;        /*  1020     4 */
	/* --- cacheline 16 boundary (1024 bytes) --- */
	u32                        snd_cwnd_stamp;       /*  1024     4 */
	struct sk_buff_head        out_of_order_queue;   /*  1028    28 */
	u32                        rcv_wnd;              /*  1056     4 */
	u32                        rcv_wup;              /*  1060     4 */
	u32                        write_seq;            /*  1064     4 */
	u32                        pushed_seq;           /*  1068     4 */
	u32                        copied_seq;           /*  1072     4 */
	struct tcp_sack_block      duplicate_sack[1];    /*  1076     8 */
	struct tcp_sack_block      selective_acks[4];    /*  1084    32 */
	/* --- cacheline 17 boundary (1088 bytes) was 28 bytes ago --- */
	struct tcp_sack_block_wire recv_sack_cache[4];   /*  1116    32 */
	struct sk_buff *           lost_skb_hint;        /*  1148     4 */
	/* --- cacheline 18 boundary (1152 bytes) --- */
	struct sk_buff *           scoreboard_skb_hint;  /*  1152     4 */
	struct sk_buff *           retransmit_skb_hint;  /*  1156     4 */
	struct sk_buff *           forward_skb_hint;     /*  1160     4 */
	struct sk_buff *           fastpath_skb_hint;    /*  1164     4 */
	int                        fastpath_cnt_hint;    /*  1168     4 */
	int                        lost_cnt_hint;        /*  1172     4 */
	int                        retransmit_cnt_hint;  /*  1176     4 */
	int                        forward_cnt_hint;     /*  1180     4 */
	u16                        advmss;               /*  1184     2 */
	u16                        prior_ssthresh;       /*  1186     2 */
	u32                        lost_out;             /*  1188     4 */
	u32                        sacked_out;           /*  1192     4 */
	u32                        fackets_out;          /*  1196     4 */
	u32                        high_seq;             /*  1200     4 */
	u32                        retrans_stamp;        /*  1204     4 */
	u32                        undo_marker;          /*  1208     4 */
	int                        undo_retrans;         /*  1212     4 */
	/* --- cacheline 19 boundary (1216 bytes) --- */
	u32                        urg_seq;              /*  1216     4 */
	u16                        urg_data;             /*  1220     2 */
	u8                         urg_mode;             /*  1222     1 */
	u8                         ecn_flags;            /*  1223     1 */
	u32                        snd_up;               /*  1224     4 */
	u32                        total_retrans;        /*  1228     4 */
	u32                        bytes_acked;          /*  1232     4 */
	unsigned int               keepalive_time;       /*  1236     4 */
	unsigned int               keepalive_intvl;      /*  1240     4 */
	int                        linger2;              /*  1244     4 */
	long unsigned int          last_synq_overflow;   /*  1248     4 */
	u32                        tso_deferred;         /*  1252     4 */
	struct {
		u32                rtt;                  /*     0     4 */
		u32                seq;                  /*     4     4 */
		u32                time;                 /*     8     4 */
	} rcv_rtt_est;                                   /*  1256    12 */
	struct {
		int                space;                /*     0     4 */
		u32                seq;                  /*     4     4 */
		u32                time;                 /*     8     4 */
	} rcvq_space;                                    /*  1268    12 */
	/* --- cacheline 20 boundary (1280 bytes) --- */
	struct {
		u32                probe_seq_start;      /*     0     4 */
		u32                probe_seq_end;        /*     4     4 */
	} mtu_probe;                                     /*  1280     8 */
}; /* size: 1288, cachelines: 21 */
   /* last cacheline: 8 bytes */

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] fix limited slow start bug
  2007-02-26  2:19   ` Arnaldo Carvalho de Melo
@ 2007-02-26  3:02     ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 9+ messages in thread
From: Arnaldo Carvalho de Melo @ 2007-02-26  3:02 UTC (permalink / raw)
  To: David Miller; +Cc: simrw, netdev

On 2/25/07, Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com> wrote:
> On 2/25/07, David Miller <davem@davemloft.net> wrote:
> > From: Roger While <simrw@sim-basis.de>
> > Date: Sun, 25 Feb 2007 09:55:34 +0100
> >
> > > Was anything done about size/member alignment of struct tcp_sock per
> > > mail from last year  -
> > > http://marc.theaimsgroup.com/?l=linux-netdev&m=114318857102290&w=2
> > >
> > > (I have no idea what current size is)
> >
> > Nothing has been done yet but I've been thinking about it a lot
> > over the past year and I've had some discussions with other
> > developers such as Arnaldo.
> >
> > It's just a matter of me being backlogged, so I never get to
> > it as often as I would like :)
>
> Attached goes a current (DaveM's net-2.6 git tree build) pahole
> picture of tcp_sock on UP, 32bits, summary:
>
> }; /* size: 1288, cachelines: 21 */
>    /* last cacheline: 8 bytes */
>
> and for the really curious, take a look at:
>
> http://oops.ghostprotocols.net:81/acme/dwarves/tcp_sock.pahole.expand_types.txt
>
> All the types are expanded, makes a pretty big picture :-)
>

And looking at it I saw I have Ingo's timer debugging option enabled,
which makes struct timer_list a bit bigger:

struct timer_list {
        struct list_head {
                struct list_head * next;           /*     0     4 */
                struct list_head * prev;           /*     4     4 */
        } entry;                                   /*     0     8 */
        long unsigned int expires;                 /*     8     4 */
        void       (*function)(long unsigned int); /*    12     4 */
        long unsigned int data;                    /*    16     4 */
        struct tvec_t_base_s * base;               /*    20     4 */

        void *     start_site;                     /*    24     4 */
        char       start_comm[16];                 /*    28    16 */
        int        start_pid;                      /*    44     4 */
} icsk_delack_timer; /*   668    48 */

The three last members are related to debugging, so discount 24 bytes
times 3, as there are tree struct timer_list inside struct tcp_sock.

- Arnaldo

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] fix limited slow start bug
  2007-02-22 21:52   ` John Heffner
@ 2007-02-23  6:53     ` David Miller
  0 siblings, 0 replies; 9+ messages in thread
From: David Miller @ 2007-02-23  6:53 UTC (permalink / raw)
  To: jheffner; +Cc: ilpo.jarvinen, netdev

From: John Heffner <jheffner@psc.edu>
Date: Thu, 22 Feb 2007 16:52:03 -0500

> I think it's not unreasonable to change clamp to 32 bits now, since with 
> 1500 byte packets, this corresponds to a max cwnd of ~94MB.  This is 
> pretty big, but we are currently right at this limit with 10 GigE.

Agreed, and done in tcp-2.6.git as below.

What should we do about that 65535 assignment in hybla?

commit cedfa95566512730202bb4abed5d9118e74bab30
Author: David S. Miller <davem@sunset.davemloft.net>
Date:   Thu Feb 22 22:52:59 2007 -0800

    [TCP]: Make snd_cwnd_clamp a u32.
    
    Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 415193e..18a468d 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -302,7 +302,7 @@ struct tcp_sock {
  	u32	snd_ssthresh;	/* Slow start size threshold		*/
  	u32	snd_cwnd;	/* Sending congestion window		*/
  	u16	snd_cwnd_cnt;	/* Linear increase counter		*/
-	u16	snd_cwnd_clamp; /* Do not allow snd_cwnd to grow above this */
+	u32	snd_cwnd_clamp; /* Do not allow snd_cwnd to grow above this */
 	u32	snd_cwnd_used;
 	u32	snd_cwnd_stamp;
 

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] fix limited slow start bug
  2007-02-22 18:56 John Heffner
  2007-02-22 21:37 ` Ilpo Järvinen
@ 2007-02-23  6:42 ` David Miller
  1 sibling, 0 replies; 9+ messages in thread
From: David Miller @ 2007-02-23  6:42 UTC (permalink / raw)
  To: jheffner; +Cc: netdev


John, what tree did you diff this against?

I can tell you didn't create this patch against anything
actually in any of my trees, because:

diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c
index 7fd2910..a0c894f 100644
--- a/net/ipv4/tcp_cong.c
+++ b/net/ipv4/tcp_cong.c
@@ -303,9 +303,9 @@ void tcp_slow_start(struct tcp_sock *tp)
 	
 	tp->snd_cwnd_cnt += cnt;

See that line right before the snd_cwnd_cnt increment?

There is a tab there in your patch on that empty line.

Yoshifuji eliminated all extraneous trailing whitespaces,
and spaces that should be tabs, across the entire networking,
several weeks ago.  Including the tab on that empty line in
your patch.

This means that you applied the YeaH and your own patch to
another source tree, perhaps 2.6.20 or similar, and then
generated your fix against that.

Please don't do things like that.  Instead please produce patches
against the tree they will end up being applied to so that they will
apply cleanly for me.

I can't believe how much time I spend getting people to produce
correct patches :-/

Anyways, I applied this by hand, but next time I definitely
am not going to.

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] fix limited slow start bug
  2007-02-22 21:37 ` Ilpo Järvinen
@ 2007-02-22 21:52   ` John Heffner
  2007-02-23  6:53     ` David Miller
  0 siblings, 1 reply; 9+ messages in thread
From: John Heffner @ 2007-02-22 21:52 UTC (permalink / raw)
  To: Ilpo Järvinen; +Cc: David Miller, netdev

Ilpo Järvinen wrote:
> BTW, while looking this patch, I noticed that snd_cwnd_clamp is only u16 
> while snd_cwnd is u32, which seems rather strange since snd_cwnd is being 
> limited by the clamp value here and there?!?! And tcp_highspeed.c is 
> clearly assuming even more than this (but the problem is hidden as 
> snd_cwnd_clamp is feed back to the min_t and the used 32-bit constant 
> could be safely cut to 16-bits anyway):
> 
>   tp->snd_cwnd_clamp = min_t(u32, tp->snd_cwnd_clamp, 0xffffffff/128);
> 
> Has the type being changed somewhere in the past or why is this so?

It's been that way as long as I can remember.  It's always been a 
mystery to me as well.  I suspect the tcp_highspeed code is that way 
because this patch originally came out of the Web100-patched kernel, 
which at one point was using a 32 bit snd_cwnd_clamp IIRC.

I think it's not unreasonable to change clamp to 32 bits now, since with 
1500 byte packets, this corresponds to a max cwnd of ~94MB.  This is 
pretty big, but we are currently right at this limit with 10 GigE.

   -John

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] fix limited slow start bug
  2007-02-22 18:56 John Heffner
@ 2007-02-22 21:37 ` Ilpo Järvinen
  2007-02-22 21:52   ` John Heffner
  2007-02-23  6:42 ` David Miller
  1 sibling, 1 reply; 9+ messages in thread
From: Ilpo Järvinen @ 2007-02-22 21:37 UTC (permalink / raw)
  To: John Heffner; +Cc: David Miller, netdev

On Thu, 22 Feb 2007, John Heffner wrote:

> diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c
> index 7fd2910..a0c894f 100644
> --- a/net/ipv4/tcp_cong.c
> +++ b/net/ipv4/tcp_cong.c
> @@ -303,9 +303,9 @@ void tcp_slow_start(struct tcp_sock *tp)
> 	
> 	tp->snd_cwnd_cnt += cnt;
> 	while (tp->snd_cwnd_cnt >= tp->snd_cwnd) {
> +		tp->snd_cwnd_cnt -= tp->snd_cwnd;
> 		if (tp->snd_cwnd < tp->snd_cwnd_clamp)
>  			tp->snd_cwnd++;
> -		tp->snd_cwnd_cnt -= tp->snd_cwnd;
>  	}
> }
> EXPORT_SYMBOL_GPL(tcp_slow_start);


ACK. I was going to track this down tomorrow as I noticed strange
behavior during slow start while testing tcp-2.6 today but I don't have to 
anymore :-). The problem I saw is now obvious, whole congestion control 
was practically disabled due to underflow of the unsigned type that occurs 
without this patch, causing TCP to be limited only by the receiver 
advertized window during slow-start.


BTW, while looking this patch, I noticed that snd_cwnd_clamp is only u16 
while snd_cwnd is u32, which seems rather strange since snd_cwnd is being 
limited by the clamp value here and there?!?! And tcp_highspeed.c is 
clearly assuming even more than this (but the problem is hidden as 
snd_cwnd_clamp is feed back to the min_t and the used 32-bit constant 
could be safely cut to 16-bits anyway):

  tp->snd_cwnd_clamp = min_t(u32, tp->snd_cwnd_clamp, 0xffffffff/128);

Has the type being changed somewhere in the past or why is this so?


-- 
 i.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH] fix limited slow start bug
@ 2007-02-22 18:56 John Heffner
  2007-02-22 21:37 ` Ilpo Järvinen
  2007-02-23  6:42 ` David Miller
  0 siblings, 2 replies; 9+ messages in thread
From: John Heffner @ 2007-02-22 18:56 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

[-- Attachment #1: Type: text/plain, Size: 0 bytes --]



[-- Attachment #2: lss_order_bug.patch --]
[-- Type: text/plain, Size: 989 bytes --]

Fix arithmetic order bug in limited slow start.  The subtraction needs to be
done before snd_cwnd is incremented.

Signed-off-by: John Heffner <jheffner@psc.edu>

---
commit 244e7411d99443df7b7ae849ba6ebbec4c2342bc
tree e6d5985a22448f59f8bef393542e1d5497ee5684
parent 97033fa201705e6cfc68ce66f34ede3277c3d645
author John Heffner <jheffner@psc.edu> Thu, 22 Feb 2007 13:54:01 -0500
committer John Heffner <jheffner@psc.edu> Thu, 22 Feb 2007 13:54:01 -0500

 net/ipv4/tcp_cong.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c
index 7fd2910..a0c894f 100644
--- a/net/ipv4/tcp_cong.c
+++ b/net/ipv4/tcp_cong.c
@@ -303,9 +303,9 @@ void tcp_slow_start(struct tcp_sock *tp)
 	
 	tp->snd_cwnd_cnt += cnt;
 	while (tp->snd_cwnd_cnt >= tp->snd_cwnd) {
+		tp->snd_cwnd_cnt -= tp->snd_cwnd;
 		if (tp->snd_cwnd < tp->snd_cwnd_clamp)
 			tp->snd_cwnd++;
-		tp->snd_cwnd_cnt -= tp->snd_cwnd;
 	}
 }
 EXPORT_SYMBOL_GPL(tcp_slow_start);

^ permalink raw reply related	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2007-02-26  3:03 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-02-25  8:55 [PATCH] fix limited slow start bug Roger While
2007-02-25 23:43 ` David Miller
2007-02-26  2:19   ` Arnaldo Carvalho de Melo
2007-02-26  3:02     ` Arnaldo Carvalho de Melo
  -- strict thread matches above, loose matches on Subject: below --
2007-02-22 18:56 John Heffner
2007-02-22 21:37 ` Ilpo Järvinen
2007-02-22 21:52   ` John Heffner
2007-02-23  6:53     ` David Miller
2007-02-23  6:42 ` David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.