* [PATCH 2/25]: Avoid accumulation of large send credit
@ 2007-03-21 18:44 Gerrit Renker
  2007-03-26  2:33 ` Ian McDonald
                   ` (26 more replies)
  0 siblings, 27 replies; 28+ messages in thread
From: Gerrit Renker @ 2007-03-21 18:44 UTC (permalink / raw)
  To: dccp

[CCID 3]: Avoid accumulation of large send credit

Problem:
--------
 Large backlogs of packets which can be sent immediately currently accumulate
 when (i) the application idles, or (ii) the application emits at a rate slower
 than the allowed rate X/s, or (iii) due to scheduling inaccuracy (resolution
 only up to HZ). The consequence is that a huge burst of packets can be sent
 immediately, which violates the allowed sending rate and can (worst case)
 choke the network.
 NB: Corresponding paragraph on send credits recently added to rfc3448bis

Fix:
----
 Avoid any backlog of sending time which is greater than one whole t_ipi. This
 permits the coarse-granularity bursts mentioned in [RFC 3448, 4.6], but disallows
 the disproportionally large bursts.

 D e t a i l e d   J u s t i f i c a t i o n   [not commit message]
 ------------------------------------------------------------------
 Let t_nom < t_now be such that t_now = t_nom + n*t_ipi + t_r, where
 n is a natural number and t_r < t_ipi. Then 
 
 	t_nom - t_now = - (n*t_ipi + t_r)
 
 First consider n=0: the current packet is sent immediately, and for
 the next one the send time is
 	
 	t_nom'  =  t_nom + t_ipi  =  t_now + (t_ipi - t_r)
 
 Thus the next packet is sent t_r time units earlier. The result is
 burstier traffic, as the inter-packet spacing is reduced; this 
 burstiness is mentioned by [RFC 3448, 4.6]. 
 
 Now consider n=1. This case is illustrated below
 
 	|<----- t_ipi -------->|<-- t_r -->|
 
 	|----------------------|-----------|
 	t_nom                              t_now
 
 Not only can the next packet be sent t_r time units earlier, a third
 packet can additionally be sent at the same time. 
 
 This case can be generalised in that the packet scheduling mechanism
 now acts as a Token Bucket Filter whose bucket size equals n: when
 n=0, a packet can only be sent when the next token arrives. When n>0,
 a burst of n packets can be sent immediately in addition to the tokens
 which arrive with rate rho = 1/t_ipi.
 
 The aim of CCID 3 is traffic that is smooth on average, with allowed sending
 rate X. The following determines the required bucket size n for the 
 purpose of achieving, over the period of one RTT R, an average allowed
 sending rate X.
 The number of bytes sent during this period is X*R. Tokens arrive with
 rate rho at the bucket, whose size n shall be determined now. Over the
 period of R, the TBF allows s * (n + R * rho) bytes to be sent, since
 each token represents a packet of size s. Hence we have the equation
 
 		s * (n + R * rho) = X * R
 	<=>	n + R/t_ipi	  = X/s * R = R / t_ipi
 
 which shows that n must be 0. Hence we can not allow a `credit' of
 t_nom - t_now > t_ipi time units to accrue in the packet scheduling.
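
 For illustration only (not part of the patch): a minimal user-space sketch
 of the capping rule, with all times in microseconds and the [RFC 3448, 4.6]
 delta check of the real code omitted.

	#include <stdio.h>

	int main(void)
	{
		long t_ipi = 10000;	/* 10 ms inter-packet interval      */
		long t_nom = 0;		/* nominal send time of next packet */
		long t_now = 55000;	/* sender wakes up 55 ms late       */

		/* Lagging behind by more than one full t_ipi: clamp the
		 * accumulated send credit. */
		if (t_now - t_nom > t_ipi)
			t_nom = t_now;

		/* Without the clamp, the six packets with nominal times
		 * 0, 10, ..., 50 ms would all leave back-to-back now; with
		 * the clamp, only the current packet goes out and the next
		 * one is due a full t_ipi later. */
		printf("send now; next packet due at t = %ld us\n",
		       t_nom + t_ipi);
		return 0;
	}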


Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
---
 net/dccp/ccids/ccid3.c |   12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -362,7 +362,15 @@ static int ccid3_hc_tx_send_packet(struc
 	case TFRC_SSTATE_NO_FBACK:
 	case TFRC_SSTATE_FBACK:
 		delay = timeval_delta(&hctx->ccid3hctx_t_nom, &now);
-		ccid3_pr_debug("delay=%ld\n", (long)delay);
+		/*
+		 * Lagging behind for more than a full t_ipi: when this occurs,
+		 * a send credit accrues which causes packet storms, violating
+		 * even the average allowed sending rate. This case happens if
+		 * the application idles for some time, or if it emits packets
+		 * at a rate smaller than X/s. Avoid such accumulation.
+		 */
+		if (delay + (suseconds_t)hctx->ccid3hctx_t_ipi  <  0)
+			hctx->ccid3hctx_t_nom = now;
 		/*
 		 *	Scheduling of packet transmissions [RFC 3448, 4.6]
 		 *
@@ -371,7 +379,7 @@ static int ccid3_hc_tx_send_packet(struc
 		 * else
 		 *       // send the packet in (t_nom - t_now) milliseconds.
 		 */
-		if (delay - (suseconds_t)hctx->ccid3hctx_delta >= 0)
+		else if (delay - (suseconds_t)hctx->ccid3hctx_delta  >=  0)
 			return delay / 1000L;
 
 		ccid3_hc_tx_update_win_count(hctx, &now);

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/25]: Avoid accumulation of large send credit
  2007-03-21 18:44 [PATCH 2/25]: Avoid accumulation of large send credit Gerrit Renker
@ 2007-03-26  2:33 ` Ian McDonald
  2007-04-10 17:24 ` Eddie Kohler
                   ` (25 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Ian McDonald @ 2007-03-26  2:33 UTC (permalink / raw)
  To: dccp

On 3/22/07, Gerrit Renker <gerrit@erg.abdn.ac.uk> wrote:
> [CCID 3]: Avoid accumulation of large send credit
>
> Problem:
> --------
>  Large backlogs of packets which can be sent immediately currently accumulate
>  when (i) the application idles, or (ii) the application emits at a rate slower
>  than the allowed rate X/s, or (iii) due to scheduling inaccuracy (resolution
>  only up to HZ). The consequence is that a huge burst of packets can be sent
>  immediately, which violates the allowed sending rate and can (worst case)
>  choke the network.
>  NB: Corresponding paragraph on send credits recently added to rfc3448bis
>
> Fix:
> ----
>  Avoid any backlog of sending time which is greater than one whole t_ipi. This
>  permits the coarse-granularity bursts mentioned in [RFC 3448, 4.6], but disallows
>  the disproportionally large bursts.

I think we should be going with greater than max(t_ipi, t_gran) as per
discussion on IETF list and proposal by Sally Floyd
http://www1.ietf.org/mail-archive/web/dccp/current/msg02281.html

I do know that Gerrit has some reservations about the whole
granularity-of-scheduling issue, and I will try to address these at
some point, but Eddie and Sally seem to be siding with allowing packet
bursts, as per how we read the RFC.

Ian
-- 
Web: http://wand.net.nz/~iam4
Blog: http://iansblog.jandi.co.nz
WAND Network Research Group

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/25]: Avoid accumulation of large send credit
  2007-03-21 18:44 [PATCH 2/25]: Avoid accumulation of large send credit Gerrit Renker
  2007-03-26  2:33 ` Ian McDonald
@ 2007-04-10 17:24 ` Eddie Kohler
  2007-04-11 14:50 ` Gerrit Renker
                   ` (24 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Eddie Kohler @ 2007-04-10 17:24 UTC (permalink / raw)
  To: dccp

> Fix:
> ----
>  Avoid any backlog of sending time which is greater than one whole t_ipi. This
>  permits the coarse-granularity bursts mentioned in [RFC 3448, 4.6], but disallows
>  the disproportionally large bursts.

Actually this does not permit coarse granularity bursts, since it limits 
the maximum burst size to 2 packets.  That is not sufficient for high 
rates and medium-to-low granularities and it is far stricter than TCP.
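
(In the notation of the justification below: the clamp keeps t_now - t_nom <= t_ipi
whenever a packet is sent, so the next nominal time t_nom + t_ipi <= t_now may
already have passed - allowing a second back-to-back packet - but the one after
that, at t_nom + 2*t_ipi >= t_now + t_ipi, has to wait.)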

Eddie


>  D e t a i l e d   J u s t i f i c a t i o n   [not commit message]
>  ------------------------------------------------------------------
>  Let t_nom < t_now be such that t_now = t_nom + n*t_ipi + t_r, where
>  n is a natural number and t_r < t_ipi. Then 
>  
>  	t_nom - t_now = - (n*t_ipi + t_r)
>  
>  First consider n=0: the current packet is sent immediately, and for
>  the next one the send time is
>  	
>  	t_nom'  =  t_nom + t_ipi  =  t_now + (t_ipi - t_r)
>  
>  Thus the next packet is sent t_r time units earlier. The result is
>  burstier traffic, as the inter-packet spacing is reduced; this 
>  burstiness is mentioned by [RFC 3448, 4.6]. 
>  
>  Now consider n=1. This case is illustrated below
>  
>  	|<----- t_ipi -------->|<-- t_r -->|
>  
>  	|----------------------|-----------|
>  	t_nom                              t_now
>  
>  Not only can the next packet be sent t_r time units earlier, a third
>  packet can additionally be sent at the same time. 
>  
>  This case can be generalised in that the packet scheduling mechanism
>  now acts as a Token Bucket Filter whose bucket size equals n: when
>  n=0, a packet can only be sent when the next token arrives. When n>0,
>  a burst of n packets can be sent immediately in addition to the tokens
>  which arrive with rate rho = 1/t_ipi.
>  
>  The aim of CCID 3 is an on average smooth traffic with allowed sending
>  rate X. The following determines the required bucket size n for the 
>  purpose of achieving, over the period of one RTT R, an average allowed
>  sending rate X.
>  The number of bytes sent during this period is X*R. Tokens arrive with
>  rate rho at the bucket, whose size n shall be determined now. Over the
>  period of R, the TBF allows s * (n + R * rho) bytes to be sent, since
>  each token represents a packet of size s. Hence we have the equation
>  
>  		s * (n + R * rho) = X * R
>  	<=>	n + R/t_ipi	  = X/s * R = R / t_ipi
>  
>  which shows that n must be 0. Hence we can not allow a `credit' of
>  t_nom - t_now > t_ipi time units to accrue in the packet scheduling.
> 
> 
> Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
> ---
>  net/dccp/ccids/ccid3.c |   12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> --- a/net/dccp/ccids/ccid3.c
> +++ b/net/dccp/ccids/ccid3.c
> @@ -362,7 +362,15 @@ static int ccid3_hc_tx_send_packet(struc
>  	case TFRC_SSTATE_NO_FBACK:
>  	case TFRC_SSTATE_FBACK:
>  		delay = timeval_delta(&hctx->ccid3hctx_t_nom, &now);
> -		ccid3_pr_debug("delay=%ld\n", (long)delay);
> +		/*
> +		 * Lagging behind for more than a full t_ipi: when this occurs,
> +		 * a send credit accrues which causes packet storms, violating
> +		 * even the average allowed sending rate. This case happens if
> +		 * the application idles for some time, or if it emits packets
> +		 * at a rate smaller than X/s. Avoid such accumulation.
> +		 */
> +		if (delay + (suseconds_t)hctx->ccid3hctx_t_ipi  <  0)
> +			hctx->ccid3hctx_t_nom = now;
>  		/*
>  		 *	Scheduling of packet transmissions [RFC 3448, 4.6]
>  		 *
> @@ -371,7 +379,7 @@ static int ccid3_hc_tx_send_packet(struc
>  		 * else
>  		 *       // send the packet in (t_nom - t_now) milliseconds.
>  		 */
> -		if (delay - (suseconds_t)hctx->ccid3hctx_delta >= 0)
> +		else if (delay - (suseconds_t)hctx->ccid3hctx_delta  >=  0)
>  			return delay / 1000L;
>  
>  		ccid3_hc_tx_update_win_count(hctx, &now);
> -
> To unsubscribe from this list: send the line "unsubscribe dccp" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/25]: Avoid accumulation of large send credit
  2007-03-21 18:44 [PATCH 2/25]: Avoid accumulation of large send credit Gerrit Renker
  2007-03-26  2:33 ` Ian McDonald
  2007-04-10 17:24 ` Eddie Kohler
@ 2007-04-11 14:50 ` Gerrit Renker
  2007-04-11 15:43 ` Eddie Kohler
                   ` (23 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Gerrit Renker @ 2007-04-11 14:50 UTC (permalink / raw)
  To: dccp

Quoting Eddie Kohler:
|  > Fix:
|  > ----
|  >  Avoid any backlog of sending time which is greater than one whole t_ipi. This
|  >  permits the coarse-granularity bursts mentioned in [RFC 3448, 4.6], but disallows
|  >  the disproportionally large bursts.
|  
|  Actually this does not permit coarse granularity bursts, since it limits 
|  the maximum burst size to 2 packets.  That is not sufficient for high 
|  rates and medium-to-low granularities and it is far stricter than TCP.
|  
That comment concerns the commit message; I can change it if you like. With regard to the
remainder:

First is the issue with TCP. As shown below, increasing the allowed lag beyond one full 
t_ipi will effectively increase the sending rate beyond the allowed rate X; which 
means that the sender sends more per RTT than it is allowed by the throughput equation. 

With regard to being stricter: we do respect RFC 4340, 3.6,
 `DCCP implementations will follow TCP's "general principle of robustness": 
  "be conservative in what you do, be liberal in what you  accept from others" [RFC793].'

Finally, the main reason for using a tighter value on the maximum lag is to protect against
problems with high-speed hardware. Commodity PCs already have Gigabit ethernet cards and
the Linux stack nicely scales up to speed. Unfortunately, unless one implements real-time
extensions to pace the packets, there will always be slack and accumulation of send credits.

And these will accrue for the simple reason that a t_ipi of 1.6 milliseconds becomes 1 millisecond,
and a t_ipi of 0.9 milliseconds becomes 0 milliseconds. 

There is no way to stop a Linux CCID3 sender from ramping X up to the link bandwidth of 1 Gbit/sec;
but the scheduler can only control packet pacing up to a rate of s * HZ bytes per second.
Therefore, if we allow slack in the scheduling lag, the bursts on systems that use
Gbit or even 10-Gbit ethernet cards will become astronomically large. It is thus safer to choose the
more restrictive value. This is, of course, a regrettable compromise; but doing the scheduling right _and_
safely requires real-time extensions or busy-wait threads (and I am not sure those will find much favour). 
The same topic has been discussed several times over on this mailing list. 


C o n c l u s i o n :
=====================
The patch fixes a serious problem which will occur in any application using CCID3, due to
realistically possible conditions such as

 * a low sending rate and/or
 * silence periods and/or
 * scheduling inaccuracies (as described above).

I therefore still want it in!



|  
|  >  D e t a i l e d   J u s t i f i c a t i o n   [not commit message]
|  >  ------------------------------------------------------------------
|  >  Let t_nom < t_now be such that t_now = t_nom + n*t_ipi + t_r, where
|  >  n is a natural number and t_r < t_ipi. Then 
|  >  
|  >  	t_nom - t_now = - (n*t_ipi + t_r)
|  >  
|  >  First consider n=0: the current packet is sent immediately, and for
|  >  the next one the send time is
|  >  	
|  >  	t_nom'  =  t_nom + t_ipi  =  t_now + (t_ipi - t_r)
|  >  
|  >  Thus the next packet is sent t_r time units earlier. The result is
|  >  burstier traffic, as the inter-packet spacing is reduced; this 
|  >  burstiness is mentioned by [RFC 3448, 4.6]. 
|  >  
|  >  Now consider n=1. This case is illustrated below
|  >  
|  >  	|<----- t_ipi -------->|<-- t_r -->|
|  >  
|  >  	|----------------------|-----------|
|  >  	t_nom                              t_now
|  >  
|  >  Not only can the next packet be sent t_r time units earlier, a third
|  >  packet can additionally be sent at the same time. 
|  >  
|  >  This case can be generalised in that the packet scheduling mechanism
|  >  now acts as a Token Bucket Filter whose bucket size equals n: when
|  >  n=0, a packet can only be sent when the next token arrives. When n>0,
|  >  a burst of n packets can be sent immediately in addition to the tokens
|  >  which arrive with rate rho = 1/t_ipi.
|  >  
|  >  The aim of CCID 3 is an on average smooth traffic with allowed sending
|  >  rate X. The following determines the required bucket size n for the 
|  >  purpose of achieving, over the period of one RTT R, an average allowed
|  >  sending rate X.
|  >  The number of bytes sent during this period is X*R. Tokens arrive with
|  >  rate rho at the bucket, whose size n shall be determined now. Over the
|  >  period of R, the TBF allows s * (n + R * rho) bytes to be sent, since
|  >  each token represents a packet of size s. Hence we have the equation
|  >  
|  >  		s * (n + R * rho) = X * R
|  >  	<=>	n + R/t_ipi	  = X/s * R = R / t_ipi
|  >  
|  >  which shows that n must be 0. Hence we can not allow a `credit' of
|  >  t_nom - t_now > t_ipi time units to accrue in the packet scheduling.
|  > 
|  > 
|  > Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
|  > ---
|  >  net/dccp/ccids/ccid3.c |   12 ++++++++++--
|  >  1 file changed, 10 insertions(+), 2 deletions(-)
|  > 
|  > --- a/net/dccp/ccids/ccid3.c
|  > +++ b/net/dccp/ccids/ccid3.c
|  > @@ -362,7 +362,15 @@ static int ccid3_hc_tx_send_packet(struc
|  >  	case TFRC_SSTATE_NO_FBACK:
|  >  	case TFRC_SSTATE_FBACK:
|  >  		delay = timeval_delta(&hctx->ccid3hctx_t_nom, &now);
|  > -		ccid3_pr_debug("delay=%ld\n", (long)delay);
|  > +		/*
|  > +		 * Lagging behind for more than a full t_ipi: when this occurs,
|  > +		 * a send credit accrues which causes packet storms, violating
|  > +		 * even the average allowed sending rate. This case happens if
|  > +		 * the application idles for some time, or if it emits packets
|  > +		 * at a rate smaller than X/s. Avoid such accumulation.
|  > +		 */
|  > +		if (delay + (suseconds_t)hctx->ccid3hctx_t_ipi  <  0)
|  > +			hctx->ccid3hctx_t_nom = now;
|  >  		/*
|  >  		 *	Scheduling of packet transmissions [RFC 3448, 4.6]
|  >  		 *
|  > @@ -371,7 +379,7 @@ static int ccid3_hc_tx_send_packet(struc
|  >  		 * else
|  >  		 *       // send the packet in (t_nom - t_now) milliseconds.
|  >  		 */
|  > -		if (delay - (suseconds_t)hctx->ccid3hctx_delta >= 0)
|  > +		else if (delay - (suseconds_t)hctx->ccid3hctx_delta  >=  0)
|  >  			return delay / 1000L;
|  >  
|  >  		ccid3_hc_tx_update_win_count(hctx, &now);
|  > -
|  > To unsubscribe from this list: send the line "unsubscribe dccp" in
|  > the body of a message to majordomo@vger.kernel.org
|  > More majordomo info at  http://vger.kernel.org/majordomo-info.html
|  
|  

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/25]: Avoid accumulation of large send credit
  2007-03-21 18:44 [PATCH 2/25]: Avoid accumulation of large send credit Gerrit Renker
                   ` (2 preceding siblings ...)
  2007-04-11 14:50 ` Gerrit Renker
@ 2007-04-11 15:43 ` Eddie Kohler
  2007-04-11 22:45 ` Ian McDonald
                   ` (22 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Eddie Kohler @ 2007-04-11 15:43 UTC (permalink / raw)
  To: dccp

Gerrit Renker wrote:
> Quoting Eddie Kohler:
> |  > Fix:
> |  > ----
> |  >  Avoid any backlog of sending time which is greater than one whole t_ipi. This
> |  >  permits the coarse-granularity bursts mentioned in [RFC 3448, 4.6], but disallows
> |  >  the disproportionally large bursts.
> |  
> |  Actually this does not permit coarse granularity bursts, since it limits 
> |  the maximum burst size to 2 packets.  That is not sufficient for high 
> |  rates and medium-to-low granularities and it is far stricter than TCP.
> |  
> The comment affects the commit message. I can change that if you like.

Yes, that would be the right thing to do.

> With regard to the
> remainder:
> 
> First is the issue with TCP. As shown below, increasing the allowed lag beyond one full 
> t_ipi will effectively increase the sending rate beyond the allowed rate X; which 
> means that the sender sends more per RTT than it is allowed by the throughput equation. 

The RFC3448/RFC4342 credit mechanism simply does not allow long-term rates 
greater than X (once you fix the problem with idle periods).

> And these will accrue for the simple reason that a t_ipi of 1.6 milliseconds becomes 1 millisecond,
> and a t_ipi of 0.9 milliseconds becomes 0 milliseconds. 

You do not mean that DCCP calculates a t_ipi of 0 milliseconds, do you???  The 
RFCs assume that t_ipi is kept in *precise* units, not units of jiffies or 
what-have-you.  t_gran might be greater than t_ipi.
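
For reference, here is a paraphrase in C of the send-time check from [RFC 3448,
4.6] as I read it; the names and numbers below are illustrative only:

	#include <stdio.h>

	#define HZ 1000				/* assumed timer frequency */

	static long min_l(long a, long b) { return a < b ? a : b; }

	int main(void)
	{
		long t_gran = 1000000L / HZ;		/* OS granularity, in usec */
		long t_ipi  = 900;			/* 0.9 ms, i.e. < t_gran   */
		long delta  = min_l(t_ipi / 2, t_gran / 2);
		long t_nom  = 100400, t_now = 100000;	/* next send due in 0.4 ms */

		if (t_now >= t_nom - delta)
			printf("send now (coarse-grained burst allowed)\n");
		else
			printf("schedule send in %ld usec\n", t_nom - t_now);
		return 0;
	}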

> There is no way to stop a Linux CCID3 sender from ramping X up to the link bandwidth of 1 Gbit/sec;
> but the scheduler can only control packet pacing up to a rate of s * HZ bytes per second.
> Therefore, if we allow slack in the scheduling lag, the bursts on such systems as use
> Gbit or even 10-Gbit ethernet cards will become astronomically large. It is thus safer to choose the
> more restrictive value. Of course, a regrettable compromise. But to do the scheduling right _and_
> safe requires real-time extensions or busy-wait threads (not sure that they will find much favour). 
> The same topic has been discussed several times over on this mailing list. 

I agree that massive bursts are undesirable, but NO bursts is too restrictive.

Your token bucket math, incidentally, is wrong.  The easiest way to see this 
is to note that, according to your math, ANY token bucket filter attempting to 
limit the average output rate would have to have n = 0, making TBFs useless. 
The critical error is in assuming that a TBF allows "s * (n + R * rho)" bytes 
to be sent in a period R.  This is not right; a TBF allows a maximum of s * R 
* rho per longer-term period R; that's the point.  A token bucket filter 
allows only SHORT-term bursts to compensate for earlier slow periods.  Which 
is exactly what we need.

As part of the work on RFC3448bis we are trying to define the best maximum 
credit accumulation.  We are currently thinking R (one RTT), although feedback 
is welcome.  This is much better than t_gran, I think, and will limit 
burstiness a great deal in practice, where most fast connections have very 
short RTTs.  But the credit accumulation in RFC3448bis WILL be explicit, it 
WILL be a normative upper bound, and it certainly won't be zero.
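
Purely as an illustration (with made-up numbers) of how an RTT-sized credit
differs from the one-t_ipi cap in the patch:

	#include <stdio.h>

	int main(void)
	{
		long t_ipi = 1000;	/* 1 ms between packets         */
		long rtt   = 20000;	/* 20 ms round-trip time        */
		long lag   = 100000;	/* application idled for 100 ms */

		long credit_ipi = lag > t_ipi ? t_ipi : lag;	/* cap in the patch */
		long credit_rtt = lag > rtt   ? rtt   : lag;	/* cap of one RTT   */

		printf("burst with t_ipi cap: %ld packets\n", credit_ipi / t_ipi + 1);
		printf("burst with RTT cap:   %ld packets\n", credit_rtt / t_ipi + 1);
		return 0;
	}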

Eddie



> C o n c l u s i o n :
> =====================
> The patch fixes a serious problem which will occur in any application using CCID3, due to
> realistically possible conditions such as
> 
>  * a low sending rate and/or
>  * silence periods and/or
>  * scheduling inaccuracies (as described above).
> 
> I therefore still want it in!
> 
> 
> 
> |  
> |  >  D e t a i l e d   J u s t i f i c a t i o n   [not commit message]
> |  >  ------------------------------------------------------------------
> |  >  Let t_nom < t_now be such that t_now = t_nom + n*t_ipi + t_r, where
> |  >  n is a natural number and t_r < t_ipi. Then 
> |  >  
> |  >  	t_nom - t_now = - (n*t_ipi + t_r)
> |  >  
> |  >  First consider n=0: the current packet is sent immediately, and for
> |  >  the next one the send time is
> |  >  	
> |  >  	t_nom'  =  t_nom + t_ipi  =  t_now + (t_ipi - t_r)
> |  >  
> |  >  Thus the next packet is sent t_r time units earlier. The result is
> |  >  burstier traffic, as the inter-packet spacing is reduced; this 
> |  >  burstiness is mentioned by [RFC 3448, 4.6]. 
> |  >  
> |  >  Now consider n=1. This case is illustrated below
> |  >  
> |  >  	|<----- t_ipi -------->|<-- t_r -->|
> |  >  
> |  >  	|----------------------|-----------|
> |  >  	t_nom                              t_now
> |  >  
> |  >  Not only can the next packet be sent t_r time units earlier, a third
> |  >  packet can additionally be sent at the same time. 
> |  >  
> |  >  This case can be generalised in that the packet scheduling mechanism
> |  >  now acts as a Token Bucket Filter whose bucket size equals n: when
> |  >  n=0, a packet can only be sent when the next token arrives. When n>0,
> |  >  a burst of n packets can be sent immediately in addition to the tokens
> |  >  which arrive with rate rho = 1/t_ipi.
> |  >  
> |  >  The aim of CCID 3 is an on average smooth traffic with allowed sending
> |  >  rate X. The following determines the required bucket size n for the 
> |  >  purpose of achieving, over the period of one RTT R, an average allowed
> |  >  sending rate X.
> |  >  The number of bytes sent during this period is X*R. Tokens arrive with
> |  >  rate rho at the bucket, whose size n shall be determined now. Over the
> |  >  period of R, the TBF allows s * (n + R * rho) bytes to be sent, since
> |  >  each token represents a packet of size s. Hence we have the equation
> |  >  
> |  >  		s * (n + R * rho) = X * R
> |  >  	<=>	n + R/t_ipi	  = X/s * R = R / t_ipi
> |  >  
> |  >  which shows that n must be 0. Hence we can not allow a `credit' of
> |  >  t_nom - t_now > t_ipi time units to accrue in the packet scheduling.
> |  > 
> |  > 
> |  > Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
> |  > ---
> |  >  net/dccp/ccids/ccid3.c |   12 ++++++++++--
> |  >  1 file changed, 10 insertions(+), 2 deletions(-)
> |  > 
> |  > --- a/net/dccp/ccids/ccid3.c
> |  > +++ b/net/dccp/ccids/ccid3.c
> |  > @@ -362,7 +362,15 @@ static int ccid3_hc_tx_send_packet(struc
> |  >  	case TFRC_SSTATE_NO_FBACK:
> |  >  	case TFRC_SSTATE_FBACK:
> |  >  		delay = timeval_delta(&hctx->ccid3hctx_t_nom, &now);
> |  > -		ccid3_pr_debug("delay=%ld\n", (long)delay);
> |  > +		/*
> |  > +		 * Lagging behind for more than a full t_ipi: when this occurs,
> |  > +		 * a send credit accrues which causes packet storms, violating
> |  > +		 * even the average allowed sending rate. This case happens if
> |  > +		 * the application idles for some time, or if it emits packets
> |  > +		 * at a rate smaller than X/s. Avoid such accumulation.
> |  > +		 */
> |  > +		if (delay + (suseconds_t)hctx->ccid3hctx_t_ipi  <  0)
> |  > +			hctx->ccid3hctx_t_nom = now;
> |  >  		/*
> |  >  		 *	Scheduling of packet transmissions [RFC 3448, 4.6]
> |  >  		 *
> |  > @@ -371,7 +379,7 @@ static int ccid3_hc_tx_send_packet(struc
> |  >  		 * else
> |  >  		 *       // send the packet in (t_nom - t_now) milliseconds.
> |  >  		 */
> |  > -		if (delay - (suseconds_t)hctx->ccid3hctx_delta >= 0)
> |  > +		else if (delay - (suseconds_t)hctx->ccid3hctx_delta  >=  0)
> |  >  			return delay / 1000L;
> |  >  
> |  >  		ccid3_hc_tx_update_win_count(hctx, &now);
> |  > -
> |  > To unsubscribe from this list: send the line "unsubscribe dccp" in
> |  > the body of a message to majordomo@vger.kernel.org
> |  > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> |  
> |  
> -
> To unsubscribe from this list: send the line "unsubscribe dccp" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/25]: Avoid accumulation of large send credit
  2007-03-21 18:44 [PATCH 2/25]: Avoid accumulation of large send credit Gerrit Renker
                   ` (3 preceding siblings ...)
  2007-04-11 15:43 ` Eddie Kohler
@ 2007-04-11 22:45 ` Ian McDonald
  2007-04-12 11:40 ` Gerrit Renker
                   ` (21 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Ian McDonald @ 2007-04-11 22:45 UTC (permalink / raw)
  To: dccp

On 4/12/07, Gerrit Renker <gerrit@erg.abdn.ac.uk> wrote:
> There is no way to stop a Linux CCID3 sender from ramping X up to the link bandwidth of 1 Gbit/sec;
> but the scheduler can only control packet pacing up to a rate of s * HZ bytes per second.

Let's start to think laterally about this. Many of the problems around
the CCID3/TFRC implementation seem to occur on local LANs where the rtt is less
than t_gran. We get really badly affected by how we do x_recv etc and
the rate is basically all over the show. We get affected by send
credits and numerous other problems.

What got me thinking about this was a comment by Dave Miller recently
saying something like congestion control on a LAN is basically
meaningless. By the time you've detected any congestion (and I would
add, in my case, "perceived" congestion) it has gone, and you're better
off having no congestion control and using Ethernet flow control.

I'm not 100% sure I agree with him totally but there are some very
valid thoughts there. And he was referring to TCP which is window
based and less problematic in this regard than rate based congestion
control I think.

Do we need to do some sanity checks on the rtt and adapt based on
that? I don't actually have the answers yet as just started thinking
about this and thought worthwhile kicking off a discussion as a real
implementation issue for CCID3 in Linux.

Ian
-- 
Web: http://wand.net.nz/~iam4/
Blog: http://iansblog.jandi.co.nz
WAND Network Research Group

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/25]: Avoid accumulation of large send credit
  2007-03-21 18:44 [PATCH 2/25]: Avoid accumulation of large send credit Gerrit Renker
                   ` (4 preceding siblings ...)
  2007-04-11 22:45 ` Ian McDonald
@ 2007-04-12 11:40 ` Gerrit Renker
  2007-04-12 12:55 ` Gerrit Renker
                   ` (20 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Gerrit Renker @ 2007-04-12 11:40 UTC (permalink / raw)
  To: dccp

[not cc:ed to dccp@ietf]

There are some interesting points here, but please consider the other message
also. From an implementation point of view and in the current absence of a
normative or majority-supported view on how to deal with accumulation of send
credits which do in fact arise, I think it is safer and better, for the moment, to
stick with the default of one t_ipi as the maximum current credit.

Quoting Ian McDonald:
|  Do we need to do some sanity checks on the rtt and adapt based on
|  that? I don't actually have the answers yet as just started thinking
|  about this and thought worthwhile kicking off a discussion as a real
|  implementation issue for CCID3 in Linux.
This sounds to me like the oscillation prevention mechanism from RFC 3448, which 
is expressly meant for networks with low statistical multiplexing (e.g. LANs). 

I have it in the pipeline but didn't want to submit until the present patches are
through, since it requires changing the computation of X, for which I would like
to have a review and decision first.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/25]: Avoid accumulation of large send credit
  2007-03-21 18:44 [PATCH 2/25]: Avoid accumulation of large send credit Gerrit Renker
                   ` (5 preceding siblings ...)
  2007-04-12 11:40 ` Gerrit Renker
@ 2007-04-12 12:55 ` Gerrit Renker
  2007-04-12 14:39 ` Eddie Kohler
                   ` (19 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Gerrit Renker @ 2007-04-12 12:55 UTC (permalink / raw)
  To: dccp

|  > |  Actually this does not permit coarse granularity bursts, since it limits 
|  > |  the maximum burst size to 2 packets.  That is not sufficient for high 
|  > |  rates and medium-to-low granularities and it is far stricter than TCP.
|  > |  
|  > The comment affects the commit message. I can change that if you like.
|  
|  Yes, that would be the right thing to do.
Will send a revision with an updated commit message later, as requested. However, while doing that
I found that the commit message already has an explanation of what is meant above by
burstiness (copied from below):
	|  Let t_nom < t_now be such that t_now = t_nom + n*t_ipi + t_r, where
	|  n is a natural number and t_r < t_ipi. Then 
	|   
	|   	t_nom - t_now = - (n*t_ipi + t_r)
	|  
	| First consider n=0: the current packet is sent immediately, and for
	| the next one the send time is
	|  	
	|  	t_nom'  =  t_nom + t_ipi  =  t_now + (t_ipi - t_r)
	|  
==>	|  Thus the next packet is sent t_r time units earlier. The result is
==>	|  burstier traffic, as the inter-packet spacing is reduced; this 
==>	|  burstiness is mentioned by [RFC 3448, 4.6].


|  > First is the issue with TCP. As shown below, increasing the allowed lag beyond one full 
|  > t_ipi will effectively increase the sending rate beyond the allowed rate X; which 
|  > means that the sender sends more per RTT than it is allowed by the throughput equation. 
|  
|  The RFC3448/RFC4342 credit mechanism simply does not allow long-term rates 
|  greater than X (once you fix the problem with idle periods).
Don't understand "RFC3448/RFC4342 credit mechanism".

  
|  > And these will accrue for the simple reason that a t_ipi of 1.6 milliseconds becomes 1 millisecond,
|  > and a t_ipi of 0.9 milliseconds becomes 0 milliseconds. 
|  
|  You do not mean that DCCP calculates a t_ipi of 0 milliseconds, do you???  The 
|  RFCs assume that t_ipi is kept in *precise* units, not units of jiffies or 
|  what-have-you.  t_gran might be greater than t_ipi.
The calculations are accurate in units of microseconds. The computation of X has (since December) also been
precise, with a granularity of 1/64 bytes per second. 

The problem here is that the scheduler only knows discrete time slices of 1/HZ (here HZ = 1000) granularity. 
It is a quantisation problem: everything less than 1 becomes 0, everything from 1 up to (but excluding) 2 
becomes 1, etc.

That is one of the problems here - in the RFC such problems do not arise, but the implementation needs
to address these correctly.
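
To make the quantisation concrete, a small user-space illustration (assuming
HZ = 1000, i.e. the scheduler can only sleep for whole milliseconds):

	#include <stdio.h>

	int main(void)
	{
		long t_ipi_us[] = { 1600, 900 };	/* 1.6 ms and 0.9 ms */
		int  i;

		for (i = 0; i < 2; i++) {
			long sleep_ms = t_ipi_us[i] / 1000;	/* what the scheduler sees */
			long lost_us  = t_ipi_us[i] - sleep_ms * 1000;

			printf("t_ipi = %ld us -> sleep %ld ms, %ld us lost per packet\n",
			       t_ipi_us[i], sleep_ms, lost_us);
		}
		return 0;
	}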

  
|  Your token bucket math, incidentally, is wrong.  The easiest way to see this 
|  is to note that, according to your math, ANY token bucket filter attempting to 
|  limit the average output rate would have to have n = 0, making TBFs useless. 
|  The critical error is in assuming that a TBF allows "s * (n + R * rho)" bytes 
|  to be sent in a period R.  This is not right; a TBF allows a maximum of s * R 
|  * rho per longer-term period R; that's the point.  A token bucket filter 
|  allows only SHORT-term bursts to compensate for earlier slow periods.  Which 
|  is exactly what we need.
Please take another look. The formula is correct (you will find the same one e.g. in Andrew
Tanenbaum's book). 

Please also do not generalise my statement: I used the term token bucket filter in a very 
specific sense with concrete reference to problems with the implementation (cf. below).

I think (with regard to the paragraph below) that your perspective is an entirely different one,
namely to solve the question "which kind of token bucket do we need to obtain a rate which
is on average consistent with X". 

I have tried to close this research thread on dccp@ietf by saying that this is an implementation
problem: there is currently no guidance by the DCCP working group on this issue, or a normative
statement. I think it is not fair to shoulder a continuation of this discussion on a patch I sent
to fix an outstanding implementation problem. 

If there is a codified and documented agreement, fine, we can implement it.

But until this is truly resolved I want this patch in.


|  As part of the work on RFC3448bis we are trying to define the best maximum 
|  credit accumulation.  We are currently thinking R (one RTT), although feedback 
|  is welcome.  This is much better than t_gran, I think, and will limit 
|  burstiness a great deal in practice, where most fast connections have very 
|  short RTTs.  But the credit accumulation in RFC3448bis WILL be explicit, it 
|  WILL be a normative upper bound, and it certainly won't be zero.
Please see the [ANNOUNCE] message I sent.

When you use one RTT then the sender is allowed to send /two times/ its allowed share per RTT: 

 * one directly when the credit is cleared (could e.g. be cleared by draining the TX queue)
 * the second over the course of the RTT, the amount it is allocated by the throughput equation.
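
In the notation of the patch: a credit of one RTT corresponds to a bucket of
n = R * rho = R/t_ipi tokens, so within that RTT the sender may emit

	s * (n + R * rho)  =  s * 2 * R/t_ipi  =  2 * X * R

bytes, i.e. twice what the throughput equation allocates for that interval.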

I think what you are really looking for is a model to average over a longer term, not just between
one RTT and the next. For this the maths get more complicated and the simple model I used for the 
patch (for which it is sufficient) is no longer applicable.

As a result, it leads nowhere to argue about token bucket filters. I think you need a mathematician 
who can model this with a time series of different and changing values of X, RTT, s, and t_ipi; and 
who could give guidance on worst-case versus average-case conditions. For such an analysis a simple 
token-bucket filter consideration is indeed not sufficient; maybe this can with some care also be 
modelled in ns-2 (but this is the Linux list).

|  > C o n c l u s i o n :
|  > =====================
|  > The patch fixes a serious problem which will occur in any application using CCID3, due to
|  > realistically possible conditions such as
|  > 
|  >  * a low sending rate and/or
|  >  * silence periods and/or
|  >  * scheduling inaccuracies (as described above).
|  > 
|  > I therefore still want it in!
|  > 
|  > 
|  > 
|  > |  
|  > |  >  D e t a i l e d   J u s t i f i c a t i o n   [not commit message]
|  > |  >  ------------------------------------------------------------------
|  > |  >  Let t_nom < t_now be such that t_now = t_nom + n*t_ipi + t_r, where
|  > |  >  n is a natural number and t_r < t_ipi. Then 
|  > |  >  
|  > |  >  	t_nom - t_now = - (n*t_ipi + t_r)
|  > |  >  
|  > |  >  First consider n=0: the current packet is sent immediately, and for
|  > |  >  the next one the send time is
|  > |  >  	
|  > |  >  	t_nom'  =  t_nom + t_ipi  =  t_now + (t_ipi - t_r)
|  > |  >  
|  > |  >  Thus the next packet is sent t_r time units earlier. The result is
|  > |  >  burstier traffic, as the inter-packet spacing is reduced; this 
|  > |  >  burstiness is mentioned by [RFC 3448, 4.6]. 
|  > |  >  
|  > |  >  Now consider n=1. This case is illustrated below
|  > |  >  
|  > |  >  	|<----- t_ipi -------->|<-- t_r -->|
|  > |  >  
|  > |  >  	|----------------------|-----------|
|  > |  >  	t_nom                              t_now
|  > |  >  
|  > |  >  Not only can the next packet be sent t_r time units earlier, a third
|  > |  >  packet can additionally be sent at the same time. 
|  > |  >  
|  > |  >  This case can be generalised in that the packet scheduling mechanism
|  > |  >  now acts as a Token Bucket Filter whose bucket size equals n: when
|  > |  >  n=0, a packet can only be sent when the next token arrives. When n>0,
|  > |  >  a burst of n packets can be sent immediately in addition to the tokens
|  > |  >  which arrive with rate rho = 1/t_ipi.
|  > |  >  
|  > |  >  The aim of CCID 3 is an on average smooth traffic with allowed sending
|  > |  >  rate X. The following determines the required bucket size n for the 
|  > |  >  purpose of achieving, over the period of one RTT R, an average allowed
|  > |  >  sending rate X.
|  > |  >  The number of bytes sent during this period is X*R. Tokens arrive with
|  > |  >  rate rho at the bucket, whose size n shall be determined now. Over the
|  > |  >  period of R, the TBF allows s * (n + R * rho) bytes to be sent, since
|  > |  >  each token represents a packet of size s. Hence we have the equation
|  > |  >  
|  > |  >  		s * (n + R * rho) = X * R
|  > |  >  	<=>	n + R/t_ipi	  = X/s * R = R / t_ipi
|  > |  >  
|  > |  >  which shows that n must be 0. Hence we can not allow a `credit' of
|  > |  >  t_nom - t_now > t_ipi time units to accrue in the packet scheduling.
|  > |  > 
|  > |  > 
|  > |  > Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
|  > |  > ---
|  > |  >  net/dccp/ccids/ccid3.c |   12 ++++++++++--
|  > |  >  1 file changed, 10 insertions(+), 2 deletions(-)
|  > |  > 
|  > |  > --- a/net/dccp/ccids/ccid3.c
|  > |  > +++ b/net/dccp/ccids/ccid3.c
|  > |  > @@ -362,7 +362,15 @@ static int ccid3_hc_tx_send_packet(struc
|  > |  >  	case TFRC_SSTATE_NO_FBACK:
|  > |  >  	case TFRC_SSTATE_FBACK:
|  > |  >  		delay = timeval_delta(&hctx->ccid3hctx_t_nom, &now);
|  > |  > -		ccid3_pr_debug("delay=%ld\n", (long)delay);
|  > |  > +		/*
|  > |  > +		 * Lagging behind for more than a full t_ipi: when this occurs,
|  > |  > +		 * a send credit accrues which causes packet storms, violating
|  > |  > +		 * even the average allowed sending rate. This case happens if
|  > |  > +		 * the application idles for some time, or if it emits packets
|  > |  > +		 * at a rate smaller than X/s. Avoid such accumulation.
|  > |  > +		 */
|  > |  > +		if (delay + (suseconds_t)hctx->ccid3hctx_t_ipi  <  0)
|  > |  > +			hctx->ccid3hctx_t_nom = now;
|  > |  >  		/*
|  > |  >  		 *	Scheduling of packet transmissions [RFC 3448, 4.6]
|  > |  >  		 *
|  > |  > @@ -371,7 +379,7 @@ static int ccid3_hc_tx_send_packet(struc
|  > |  >  		 * else
|  > |  >  		 *       // send the packet in (t_nom - t_now) milliseconds.
|  > |  >  		 */
|  > |  > -		if (delay - (suseconds_t)hctx->ccid3hctx_delta >= 0)
|  > |  > +		else if (delay - (suseconds_t)hctx->ccid3hctx_delta  >=  0)
|  > |  >  			return delay / 1000L;
|  > |  >  
|  > |  >  		ccid3_hc_tx_update_win_count(hctx, &now);
|  > |  > -
|  > |  > To unsubscribe from this list: send the line "unsubscribe dccp" in
|  > |  > the body of a message to majordomo@vger.kernel.org
|  > |  > More majordomo info at  http://vger.kernel.org/majordomo-info.html
|  > |  
|  > |  
|  > -
|  > To unsubscribe from this list: send the line "unsubscribe dccp" in
|  > the body of a message to majordomo@vger.kernel.org
|  > More majordomo info at  http://vger.kernel.org/majordomo-info.html
|  
|  

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/25]: Avoid accumulation of large send credit
  2007-03-21 18:44 [PATCH 2/25]: Avoid accumulation of large send credit Gerrit Renker
                   ` (6 preceding siblings ...)
  2007-04-12 12:55 ` Gerrit Renker
@ 2007-04-12 14:39 ` Eddie Kohler
  2007-04-13 18:27 ` Gerrit Renker
                   ` (18 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Eddie Kohler @ 2007-04-12 14:39 UTC (permalink / raw)
  To: dccp

> That is one of the problems here - in the RFC such problems do not arise, but the implementation needs
> to address these correctly.

The RFC's solution to this problem, which involves t_gran, EXACTLY addresses this

> |  Your token bucket math, incidentally, is wrong.  The easiest way to see this 
> |  is to note that, according to your math, ANY token bucket filter attempting to 
> |  limit the average output rate would have to have n = 0, making TBFs useless. 
> |  The critical error is in assuming that a TBF allows "s * (n + R * rho)" bytes 
> |  to be sent in a period R.  This is not right; a TBF allows a maximum of s * R 
> |  * rho per longer-term period R; that's the point.  A token bucket filter 
> |  allows only SHORT-term bursts to compensate for earlier slow periods.  Which 
> |  is exactly what we need.
> Please take another look. The formula is correct (you will find the same one e.g in Andrew
> Tanenbaum's book). 

So I assume what you are referring to is the clause "average rate OVER ONE 
RTT"?  Sorry I missed that.  I missed it because it is not TFRC's goal.  Can 
you point to the section in RFC3448 or RFC4342 that prohibits a TFRC sender 
from ever sending a (transient) rate more than X over one RTT?  RFC3448 4.6 
allows burstiness much more than a single packet, and the intro allows 
fluctuations of up to a factor of 2 relative to the fair rate

> I think (with regard to the paragraph below) that your perspective is an entirely different one,
> namely to solve the question "which kind of token bucket do we need to obtain an a rate which
> is on average consistent with X". 

That is *TFRC's* perspective: finding packet sends that on average are 
consistent with X.  As demonstrated by 4.6 and elsewhere

How much above X may an application transiently send?  The intro would argue 2x.

> But until this is truly resolved I want this patch in.

Fine, I disagree, Ian disagrees (as far as I read his messages).  You are 
fixing one problem and creating another: artificially low send rates.

Eddie

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/25]: Avoid accumulation of large send credit
  2007-03-21 18:44 [PATCH 2/25]: Avoid accumulation of large send credit Gerrit Renker
                   ` (7 preceding siblings ...)
  2007-04-12 14:39 ` Eddie Kohler
@ 2007-04-13 18:27 ` Gerrit Renker
  2007-04-13 20:37 ` Eddie Kohler
                   ` (17 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Gerrit Renker @ 2007-04-13 18:27 UTC (permalink / raw)
  To: dccp

Your arguments consider only the specification. What you don't see, and Ian also doesn't seem
to see, is that this implementation conforms to the ideas of TFRC only up to a maximum speed
of s * HZ bytes per second; under benign conditions this is about 12..15 Mbits/sec.
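
(For example, assuming HZ = 1000 and a segment size of s = 1500 bytes, pacing at
most one packet per scheduler tick gives s * HZ = 1,500,000 bytes/sec, i.e.
roughly 12 Mbits/sec.)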

Once you are past that speed you effectively have a `raw socket' module whose only resemblance 
to TFRC/DCCP is the packet format; without even a hint of congestion control.

Here for instance is typical output, copied & pasted just a minute ago:

$ iperf -sd -t20
------------------------------------------------------------
Server listening on DCCP port 5001
DCCP datagram buffer size:   106 KByte (default)
------------------------------------------------------------
[  4] local 192.235.214.65 port 5001 connected with 192.235.214.75 port 40524
[  4]  0.0-20.4 sec  1.08 GBytes    454 Mbits/sec                              

If you ask the above sender to reduce its speed to 200 Mbits/sec in response to network congestion
reported via ECN or receiver feedback, it will _not_ do that - simply because it is unable to control
those speeds. It will continue to send at maximum speed (up to 80% link bandwidth is possible).

Only when you ask it to reduce below s * HZ will it be able to slow down, which here would mean
to reduce from 454 Mbits/sec to 12 Mbits/sec.

That said, without this patch you will get a stampede of packets for the other reason that
the scheduler is not as precise as required; it will always add up the lag arising from
interpreting e.g. 1.7 as 1 and 0.9 as 0 milliseconds. I still would like this patch in for exactly
these reasons.

Seriously, I think that Linux or any other scheduler-based OS is simply the wrong platform for CCID3: 
it cannot give you the precision and the controllability that your specification assumes
and requires. 

You are aware of Ian's aversion (and I doubt whether he is the only one) to high-res timers.
High-res timers would remove these silly accumulations and remove the need for patches such as this one.

The other case is the use of interface timestamps. With interface timestamps, I was able to accurately
sample the link RTT as it is reported e.g. by ping. With the present layer-4 timestamps, this goes
back up to very high values, simply because the inaccuracies all add up. 

Conversely, it very much seems that the specification needs some revision before it becomes implementable
on a non-realtime OS. Can you give us something which we can implement with the constraints we have
(i.e. no interface timestamps, no high-res timers, accumulation of inaccuracies)?

CCID2 works nicely since it does not have all these precision requirements.





Quoting Eddie Kohler:
|  > That is one of the problems here - in the RFC such problems do not arise, but the implementation needs
|  > to address these correctly.
|  
|  The RFC's solution to this problem, which involves t_gran, EXACTLY addresses this
|  
|  > |  Your token bucket math, incidentally, is wrong.  The easiest way to see this 
|  > |  is to note that, according to your math, ANY token bucket filter attempting to 
|  > |  limit the average output rate would have to have n = 0, making TBFs useless. 
|  > |  The critical error is in assuming that a TBF allows "s * (n + R * rho)" bytes 
|  > |  to be sent in a period R.  This is not right; a TBF allows a maximum of s * R 
|  > |  * rho per longer-term period R; that's the point.  A token bucket filter 
|  > |  allows only SHORT-term bursts to compensate for earlier slow periods.  Which 
|  > |  is exactly what we need.
|  > Please take another look. The formula is correct (you will find the same one e.g in Andrew
|  > Tanenbaum's book). 
|  
|  So I assume what you are referring to is the clause "average rate OVER ONE 
|  RTT"?  Sorry I missed that.  I missed it because it is not TFRC's goal.  Can 
|  you point to the section in RFC3448 or RFC4342 that prohibits a TFRC sender 
|  from ever sending a (transient) rate more than X over one RTT?  RFC3448 4.6 
|  allows burstiness much more than a single packet, and the intro allows 
|  fluctuations of up to a factor of 2 relative to the fair rate
|  
|  > I think (with regard to the paragraph below) that your perspective is an entirely different one,
|  > namely to solve the question "which kind of token bucket do we need to obtain an a rate which
|  > is on average consistent with X". 
|  
|  That is *TFRC's* perspective: finding packet sends that on average are 
|  consistent with X.  As demonstrated by 4.6 and elsewhere
|  
|  How much above X may an application transiently send?  The intro would argue 2x.
|  
|  > But until this is truly resolved I want this patch in.
|  
|  Fine, I disagree, Ian disagrees (as far as I read its messages).  You are 
|  fixing one problem and creating another: artificially low send rates
|  
|  Eddie
|  -
|  To unsubscribe from this list: send the line "unsubscribe dccp" in
|  the body of a message to majordomo@vger.kernel.org
|  More majordomo info at  http://vger.kernel.org/majordomo-info.html
|  
|  

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/25]: Avoid accumulation of large send credit
  2007-03-21 18:44 [PATCH 2/25]: Avoid accumulation of large send credit Gerrit Renker
                   ` (8 preceding siblings ...)
  2007-04-13 18:27 ` Gerrit Renker
@ 2007-04-13 20:37 ` Eddie Kohler
  2007-04-13 20:58 ` David Miller
                   ` (16 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Eddie Kohler @ 2007-04-13 20:37 UTC (permalink / raw)
  To: dccp

Gerrit.  I know the implementation is broken for high rates.  But you are 
saying that it is impossible to implement CCID3 congestion control at high 
rates.  I am not convinced.  Among other things, CCID3's t_gran section gives 
the implementation EXACTLY the flexibility required to smoothly transition 
from a purely rate-based, packet-at-a-time sending algorithm to a hybrid 
algorithm where periodic bursts provide a rate that is on average X.

Your examples repeatedly demonstrate that the current implementation is 
broken.  Cool.

If you were to just say this was an interim fix it would be easier, but I'd 
still be confused, since fixing this issue does not seem hard.  Just limit the 
accumulated send credit to something greater than 0, such as the RTT.  But you 
hate that for some reason that you are not articulating.

It's here that you go off the rails:

 > Seriously, I think that Linux or any other scheduler-based OS is simply the
 > wrong platform for CCID3, this here can not give you the precision and the
 > controllability that your specification assumes and requires.

The specification neither assumes nor requires this and in fact has an 
EXPLICIT section that EXACTLY addresses this problem, 4.6 of 3448.

Similarly:

 > CCID2 works nicely since it does not have all these precision requirements.

To put it mildly, you have not provided evidence that CCID3 does either.

Ian: do you want to collaborate on a patch for this?

Eddie


Gerrit Renker wrote:
> Your arguments consider only the specification. What you don't see, and Ian also doesn't seem
> to see, is that this implementation conforms to the ideas of TFRC only up to a maximum speed
> of s * HZ bytes per second; under benign conditions this is about 12..15 Mbits/sec.
> 
> Once you are past that speed you effectively have a `raw socket' module whose only resemblance 
> to TFRC/DCCP is the package format; without even a hint of congestion control.
> 
> Here for instance is typical output, copied & pasted just a minute ago:
> 
> $ iperf -sd -t20
> ------------------------------------------------------------
> Server listening on DCCP port 5001
> DCCP datagram buffer size:   106 KByte (default)
> ------------------------------------------------------------
> [  4] local 192.235.214.65 port 5001 connected with 192.235.214.75 port 40524
> [  4]  0.0-20.4 sec  1.08 GBytes    454 Mbits/sec                              
> 
> If you ask the above sender to reduce its speed to 200 Mbits/sec in response to network congestion
> reported via ECN receiver or feedback it will _not_ do that - simply because it is unable to control
> those speeds. It will continue to send at maximum speed (up to 80% link bandwidth is possible).
> 
> Only when you ask it to reduce below s * HZ will it be able to slow down, which here would mean
> to reduce from 454 Mbits/sec to 12 Mbits/sec.
> 
> That said, without this patch you will get a stampede of packets for the other reason that
> the scheduler is not as precise as required; it will always add up the lag arising from
> interpreting e.g. 1.7 as 1 and 0.9 as 0 milliseconds. I still would like this patch in for exactly
> these reasons.
> 
> Seriously, I think that Linux or any other scheduler-based OS is simply the wrong platform for CCID3, 
> this here can not give you the precision and the controllability that your specification assumes
> and requires. 
> 
> You are aware of Ian's (and I doubt whether he is the only one) aversions against high-res timers.
> This would remove these silly accumulations and remove the need for patches such as this one.
> 
> The other case is the use of interface timestamps. With interface timestamps, I was able to accurately
> sample the link RTT as it is reported e.g. by ping. With the present layer-4 timestamps, this goes up
> back again to very high values, simply because the inaccuracies add all up. 
> 
> Conversely, it very much seems that the specification needs some revision before it becomes implementable
> on a non-realtime OS. Can you give us something which we can implement with the constraints we have
> (i.e. no interface timestamps, no high-res timers, accumulation of inaccuracies)?
> 
> CCID2 works nicely since it does not have all these precision requirements.
> 
> 
> 
> 
> 
> Quoting Eddie Kohler:
> |  > That is one of the problems here - in the RFC such problems do not arise, but the implementation needs
> |  > to address these correctly.
> |  
> |  The RFC's solution to this problem, which involves t_gran, EXACTLY addresses this
> |  
> |  > |  Your token bucket math, incidentally, is wrong.  The easiest way to see this 
> |  > |  is to note that, according to your math, ANY token bucket filter attempting to 
> |  > |  limit the average output rate would have to have n = 0, making TBFs useless. 
> |  > |  The critical error is in assuming that a TBF allows "s * (n + R * rho)" bytes 
> |  > |  to be sent in a period R.  This is not right; a TBF allows a maximum of s * R 
> |  > |  * rho per longer-term period R; that's the point.  A token bucket filter 
> |  > |  allows only SHORT-term bursts to compensate for earlier slow periods.  Which 
> |  > |  is exactly what we need.
> |  > Please take another look. The formula is correct (you will find the same one e.g in Andrew
> |  > Tanenbaum's book). 
> |  
> |  So I assume what you are referring to is the clause "average rate OVER ONE 
> |  RTT"?  Sorry I missed that.  I missed it because it is not TFRC's goal.  Can 
> |  you point to the section in RFC3448 or RFC4342 that prohibits a TFRC sender 
> |  from ever sending a (transient) rate more than X over one RTT?  RFC3448 4.6 
> |  allows burstiness much more than a single packet, and the intro allows 
> |  fluctuations of up to a factor of 2 relative to the fair rate
> |  
> |  > I think (with regard to the paragraph below) that your perspective is an entirely different one,
> |  > namely to solve the question "which kind of token bucket do we need to obtain an a rate which
> |  > is on average consistent with X". 
> |  
> |  That is *TFRC's* perspective: finding packet sends that on average are 
> |  consistent with X.  As demonstrated by 4.6 and elsewhere
> |  
> |  How much above X may an application transiently send?  The intro would argue 2x.
> |  
> |  > But until this is truly resolved I want this patch in.
> |  
> |  Fine, I disagree, Ian disagrees (as far as I read its messages).  You are 
> |  fixing one problem and creating another: artificially low send rates
> |  
> |  Eddie
> |  -
> |  To unsubscribe from this list: send the line "unsubscribe dccp" in
> |  the body of a message to majordomo@vger.kernel.org
> |  More majordomo info at  http://vger.kernel.org/majordomo-info.html
> |  
> |  

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/25]: Avoid accumulation of large send credit
  2007-03-21 18:44 [PATCH 2/25]: Avoid accumulation of large send credit Gerrit Renker
                   ` (9 preceding siblings ...)
  2007-04-13 20:37 ` Eddie Kohler
@ 2007-04-13 20:58 ` David Miller
  2007-04-13 21:45 ` Ian McDonald
                   ` (15 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: David Miller @ 2007-04-13 20:58 UTC (permalink / raw)
  To: dccp

From: Eddie Kohler <kohler@cs.ucla.edu>
Date: Fri, 13 Apr 2007 13:37:57 -0700

> Gerrit.  I know the implementation is broken for high rates.  But you are 
> saying that it is impossible to implement CCID3 congestion control at high 
> rates.  I am not convinced.  Among other things, CCID3's t_gran section gives 
> the implementation EXACTLY the flexibility required to smoothly transition 
> from a purely rate-based, packet-at-a-time sending algorithm to a hybrid 
> algorithm where periodic bursts provide a rate that is on average X.
> 
> Your examples repeatedly demonstrate that the current implementation is 
> broken.  Cool.
> 
> If you were to just say this was an interim fix it would be easier, but I'd 
> still be confused, since fixing this issue does not seem hard.  Just limit the 
> accumulated send credit to something greater than 0, such as the RTT.

Eddie, this is an interesting idea, but would you be amenable to the
suggestion I made in another email?  Basically if RTT is extremely
low, don't do any of this limiting.

What sense is there to doing any of this for very low RTTs?  It is
a very honest question.

If we hit some congestion in a switch on the local network, responding
to that signal is pointless because the congestion event will pass
before we even get the feedback showing us that there was congestion
in the first place.
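
A minimal sketch of how the two suggestions could be combined (all names and the
threshold below are invented for illustration, nothing here is lifted from the
tree): cap the accumulated credit at one RTT, but skip the cap altogether when
the RTT is below some small cut-off.

#include <stdint.h>

#define MIN_LIMIT_RTT_US  100   /* hypothetical "extremely low RTT" cut-off */

/* t_nom, now and rtt are in microseconds; returns the possibly advanced t_nom */
int64_t cap_send_credit(int64_t t_nom, int64_t now, int64_t rtt)
{
        int64_t credit = now - t_nom;   /* > 0 means the sender is lagging behind */

        if (rtt < MIN_LIMIT_RTT_US)     /* very low RTT: no limiting at all */
                return t_nom;
        if (credit > rtt)               /* never carry more than one RTT of credit */
                t_nom = now - rtt;
        return t_nom;
}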

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/25]: Avoid accumulation of large send credit
  2007-03-21 18:44 [PATCH 2/25]: Avoid accumulation of large send credit Gerrit Renker
                   ` (10 preceding siblings ...)
  2007-04-13 20:58 ` David Miller
@ 2007-04-13 21:45 ` Ian McDonald
  2007-04-13 23:43 ` Eddie Kohler
                   ` (14 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Ian McDonald @ 2007-04-13 21:45 UTC (permalink / raw)
  To: dccp

On 4/14/07, David Miller <davem@davemloft.net> wrote:
> From: Eddie Kohler <kohler@cs.ucla.edu>
> Date: Fri, 13 Apr 2007 13:37:57 -0700
>
> > Gerrit.  I know the implementation is broken for high rates.  But you are
> > saying that it is impossible to implement CCID3 congestion control at high
> > rates.  I am not convinced.  Among other things, CCID3's t_gran section gives
> > the implementation EXACTLY the flexibility required to smoothly transition
> > from a purely rate-based, packet-at-a-time sending algorithm to a hybrid
> > algorithm where periodic bursts provide a rate that is on average X.
> >
> > Your examples repeatedly demonstrate that the current implementation is
> > broken.  Cool.
> >
> > If you were to just say this was an interim fix it would be easier, but I'd
> > still be confused, since fixing this issue does not seem hard.  Just limit the
> > accumulated send credit to something greater than 0, such as the RTT.
>
> Eddie, this is an interesting idea, but would you be amenable to the
> suggestion I made in another email?  Basically if RTT is extremely
> low, don't do any of this limiting.
>
> What sense is there to doing any of this for very low RTTs?  It is
> a very honest question.
>
> If we hit some congestion in a switch on the local network, responding
> to that signal is pointless because the congestion event will pass
> before we even get the feedback showing us that there was congestion
> in the first place.

It's not totally pointless, Dave, because it is a rate-based protocol,
not a window-based protocol, and you've got the real issue of slow
receivers, especially when we use a whole lot of CPU... It's not
network congestion but it still should be dealt with. There are probably
other scenarios too - e.g. I can think of a 10 Mbit radio link between
two buildings that run on 100 Mbit internally. TCP works fine there, since
the lack of acks will stop transmission, but a rate-based protocol will
keep on trying... Although this isn't a local switch as you mention, it
is still low RTT.

Eddie - I would like to work on this more in answer to your question.
I'll see what I can do over the next weeks once I get a paper out of
the way (or when I get bored with it!). Gerrit's work is nearly there.
What I'd like to do is work on this whole granularity/out of control
thing he keeps referring to. I am not convinced but I need to put up
or shut up by replicating some of his work and fixing the bugs or
admitting I'm wrong.

Ian
-- 
Web: http://wand.net.nz/~iam4/
Blog: http://iansblog.jandi.co.nz
WAND Network Research Group

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/25]: Avoid accumulation of large send credit
  2007-03-21 18:44 [PATCH 2/25]: Avoid accumulation of large send credit Gerrit Renker
                   ` (11 preceding siblings ...)
  2007-04-13 21:45 ` Ian McDonald
@ 2007-04-13 23:43 ` Eddie Kohler
  2007-04-14  5:51 ` Eddie Kohler
                   ` (13 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Eddie Kohler @ 2007-04-13 23:43 UTC (permalink / raw)
  To: dccp

Hi David,

This might work, but I'd need to work it through.

The fact is that ALL TCP-like algorithms have rates that are inversely 
proportional to the RTT.  But in TCP and windowed protocols this happens 
naturally due to ack clocking.  In CCID3, there's no ack clocking.  Acks 
arrive much more seldom -- down to once per RTT, rather than once per 2 
packets.  Thus the RTT measurement is EXPLICITLY fed in to the throughput 
equation.  In TCP the RTT measurement mostly just feeds in to the RTO, which 
is why the protocol's behavior is less sensitive to the measurement.
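
For reference, the dependence is direct: [RFC 3448, 3.1] computes the allowed
rate roughly as

   X = s / ( R*sqrt(2*b*p/3) + t_RTO*(3*sqrt(3*b*p/8))*p*(1 + 32*p^2) )

with b = 1 and t_RTO usually set to 4*R, so both terms of the denominator scale
linearly with R: an RTT measurement inflated by a factor k cuts the computed X
by roughly the same factor at a given loss event rate p.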

DCCP *MIGHT* work just fine with the inflated RTT measurements (i.e. the RTT 
including IP processing) but there is yet another gerrit missive to work 
through to see how real that complaint is.

A less aggressive version of "turn off RTT for LANs" would be simply to 
subtract an estimate of the IP<->card path's cost from the measured coarse 
RTT.  This would fix the problem.  If you used a stable minimum estimate, the 
RTT would naturally "inflate" when the host was busy, which as Ian points out 
is what we actually want.  How to obtain an estimate?  Probably anything would 
do, including something derived at boot time from BogoMIPS.
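
A minimal sketch of that idea, with made-up names (not taken from the tree):
keep a stable minimum estimate of the host-processing cost and subtract it from
every coarse RTT sample, so the effective RTT still "inflates" naturally
whenever the host is busy.

#include <stdint.h>

static uint32_t host_cost_min_us = UINT32_MAX;  /* stable minimum estimate */

/* raw_rtt_us: coarse RTT sample; cost_us: current IP<->card processing cost sample */
uint32_t effective_rtt_us(uint32_t raw_rtt_us, uint32_t cost_us)
{
        if (cost_us < host_cost_min_us)         /* only ever track the minimum */
                host_cost_min_us = cost_us;

        /* subtract the stable minimum, never going below zero */
        return raw_rtt_us > host_cost_min_us ? raw_rtt_us - host_cost_min_us : 0;
}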

-*-

As for coarse-grained timers, does DCCP CCID3 *only* send packets at timer 
granularity?  This would differ from TCP which sends packets as acks arrive. 
It should be relatively easy in CCID3 to likewise try to send packets as acks 
arrive.  There are fewer acks, of course, but still on LANs where RTT << 
timer_granularity this would reduce burstiness.  (All assuming CCID3 doesn't 
do this already.)

Eddie


David Miller wrote:
> From: Eddie Kohler <kohler@cs.ucla.edu>
> Date: Fri, 13 Apr 2007 13:37:57 -0700
> 
>> Gerrit.  I know the implementation is broken for high rates.  But you are 
>> saying that it is impossible to implement CCID3 congestion control at high 
>> rates.  I am not convinced.  Among other things, CCID3's t_gran section gives 
>> the implementation EXACTLY the flexibility required to smoothly transition 
>> from a purely rate-based, packet-at-a-time sending algorithm to a hybrid 
>> algorithm where periodic bursts provide a rate that is on average X.
>>
>> Your examples repeatedly demonstrate that the current implementation is 
>> broken.  Cool.
>>
>> If you were to just say this was an interim fix it would be easier, but I'd 
>> still be confused, since fixing this issue does not seem hard.  Just limit the 
>> accumulated send credit to something greater than 0, such as the RTT.
> 
> Eddie, this is an interesting idea, but would you be amenable to the
> suggestion I made in another email?  Basically if RTT is extremely
> low, don't do any of this limiting.
> 
> What sense is there to doing any of this for very low RTTs?  It is
> a very honest question.
> 
> If we hit some congestion in a switch on the local network, responding
> to that signal is pointless because the congestion event will pass
> before we even get the feedback showing us that there was congestion
> in the first place.
> -
> To unsubscribe from this list: send the line "unsubscribe dccp" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/25]: Avoid accumulation of large send credit
  2007-03-21 18:44 [PATCH 2/25]: Avoid accumulation of large send credit Gerrit Renker
                   ` (12 preceding siblings ...)
  2007-04-13 23:43 ` Eddie Kohler
@ 2007-04-14  5:51 ` Eddie Kohler
  2007-04-15 15:44 ` Gerrit Renker
                   ` (12 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Eddie Kohler @ 2007-04-14  5:51 UTC (permalink / raw)
  To: dccp

Putting on my Sally hat:

David Miller wrote:
> Eddie, this is an interesting idea, but would you be amenable to the
> suggestion I made in another email?  Basically if RTT is extremely
> low, don't do any of this limiting.
> 
> What sense is there to doing any of this for very low RTTs?  It is
> a very honest question.
> 
> If we hit some congestion in a switch on the local network, responding
> to that signal is pointless because the congestion event will pass
> before we even get the feedback showing us that there was congestion
> in the first place.

An idea like this is definitely worth exploring.  Of course it would be a 
change to congestion control and would have to be treated as such.  It 
wouldn't be CCID3, or TCP-friendly, since TCP is (according to research & 
such) responding to the RTT, due to ack clocking.  You'd have to worry about 
perhaps rare, but absolutely possible, cases such as persistent LAN 
congestion.  (Maybe a local wireless LAN?)

I wonder in Gerrit's RTT experiments what a TCP connection would achieve, and 
how that would correspond to the TCP throughput equation.

Eddie

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/25]: Avoid accumulation of large send credit
  2007-03-21 18:44 [PATCH 2/25]: Avoid accumulation of large send credit Gerrit Renker
                   ` (13 preceding siblings ...)
  2007-04-14  5:51 ` Eddie Kohler
@ 2007-04-15 15:44 ` Gerrit Renker
  2007-04-15 15:56 ` Gerrit Renker
                   ` (11 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Gerrit Renker @ 2007-04-15 15:44 UTC (permalink / raw)
  To: dccp

Hi Eddie,

this email is confused and angry so before even starting with the facts can I just
apologize for having asked you not to send any offline emails. That was probably
a bad thing to do, sorry.

With that out of the way, can we please take a cooler look at the facts.


|  Gerrit.  I know the implementation is broken for high rates.  But you are 
|  saying that it is impossible to implement CCID3 congestion control at high 
|  rates.  I am not convinced.  Among other things, CCID3's t_gran section gives 
|  the implementation EXACTLY the flexibility required to smoothly transition 
|  from a purely rate-based, packet-at-a-time sending algorithm to a hybrid 
|  algorithm where periodic bursts provide a rate that is on average X.
|  
|  Your examples repeatedly demonstrate that the current implementation is 
|  broken.  Cool.
Unfortunately it is, and I say this without any glee. It was before I started work
on it, and for that matter probably even before Ian converted the code.

The problem is that, due to the slow-start mechanism, the sender will always try
to ramp up to link speed, and thus invariably to packet spacings so small that it
can not control them.

I didn't say CCID3 is impossible, and I didn't say that your specification was bad.

What I mean to say is that trying to implement the algorithm "exactly and explicitly"
out of the book does not work: on the one hand it ignores the realities of the operating 
system (scheduling granularity, processing costs, inaccuracies, delays) and on the other
hand it ignores realities of networking - as per David's and Ian's answers. 

So the point is merely that the goals of TFRC need to somehow be rephrased in terms of
what can be done sensibly. 

I think that there are a lot of very valuable points to be learned from David's input,
and that listening carefully (or not) to such hints can make the key difference
as to whether or not CCID3 works in the real world too.

I really hope that the points raised at the end of last week will somehow be linked with
the TFRC/CCID3 specification.


|  If you were to just say this was an interim fix it would be easier, but I'd 
|  still be confused, since fixing this issue does not seem hard.  Just limit the 
|  accumulated send credit to something greater than 0, such as the RTT.  But you 
|  hate that for some reason that you are not articulating.
No sorry, it is not a quick fix. I think it requires some rethinking.  

|  It's here that you go off the rails:
|  
|   > Seriously, I think that Linux or any other scheduler-based OS is simply the
|   > wrong platform for CCID3, this here can not give you the precision and the
|   > controllability that your specification assumes and requires.
|  
|  The specification neither assumes nor requires this and in fact has an 
|  EXPLICIT section that EXACTLY addresses this problem, 4.6 of 3448.
I think that this is `exactly' the problem - we can not meet the goals of that
specification by implementing it `explicitly'. The explicit requirements constrain
the implementation, which itself is constrained by the realities of what can be 
implemented, and what works in a real network. 

By relaxing that explicitness, you would give implementers the freedom to meet the 
goals of your specification. And you would win tremendously from that - especially
when using the input from David or Arnaldo.


 
|   > CCID2 works nicely since it does not have all these precision requirements.
|  
|  To put it mildly, you have not provided evidence that CCID3 does either.
Oh I did several months ago, all posted to the list and mentioned several times.
The links are
	http://www.erg.abdn.ac.uk/users/gerrit/dccp/docs/packet_scheduling/
	http://www.erg.abdn.ac.uk/users/gerrit/dccp/docs/impact_of_tx_queue_lenghts/

  
|  Ian: do you want to collaborate on a patch for this?
A patch for a conceptual problem? Please do.


Thanks.


  
|  Gerrit Renker wrote:
|  > Your arguments consider only the specification. What you don't see, and Ian also doesn't seem
|  > to see, is that this implementation conforms to the ideas of TFRC only up to a maximum speed
|  > of s * HZ bytes per second; under benign conditions this is about 12..15 Mbits/sec.
|  > 
|  > Once you are past that speed you effectively have a `raw socket' module whose only resemblance 
|  > to TFRC/DCCP is the packet format; without even a hint of congestion control.
|  > 
|  > Here for instance is typical output, copied & pasted just a minute ago:
|  > 
|  > $ iperf -sd -t20
|  > ------------------------------------------------------------
|  > Server listening on DCCP port 5001
|  > DCCP datagram buffer size:   106 KByte (default)
|  > ------------------------------------------------------------
|  > [  4] local 192.235.214.65 port 5001 connected with 192.235.214.75 port 40524
|  > [  4]  0.0-20.4 sec  1.08 GBytes    454 Mbits/sec                              
|  > 
|  > If you ask the above sender to reduce its speed to 200 Mbits/sec in response to network congestion
|  > reported via ECN or receiver feedback it will _not_ do that - simply because it is unable to control
|  > those speeds. It will continue to send at maximum speed (up to 80% link bandwidth is possible).
|  > 
|  > Only when you ask it to reduce below s * HZ will it be able to slow down, which here would mean
|  > to reduce from 454 Mbits/sec to 12 Mbits/sec.
|  > 
|  > That said, without this patch you will get a stampede of packets for the other reason that
|  > the scheduler is not as precise as required; it will always add up the lag arising from
|  > interpreting e.g. 1.7 as 1 and 0.9 as 0 milliseconds. I still would like this patch in for exactly
|  > these reasons.
|  > 
|  > Seriously, I think that Linux or any other scheduler-based OS is simply the wrong platform for CCID3, 
|  > this here can not give you the precision and the controllability that your specification assumes
|  > and requires. 
|  > 
|  > You are aware of Ian's aversion (and I doubt whether he is the only one) to high-res timers.
|  > High-res timers would remove these silly accumulations and remove the need for patches such as this one.
|  > 
|  > The other case is the use of interface timestamps. With interface timestamps, I was able to accurately
|  > sample the link RTT as it is reported e.g. by ping. With the present layer-4 timestamps, this goes
|  > back up again to very high values, simply because the inaccuracies all add up. 
|  > 
|  > Conversely, it very much seems that the specification needs some revision before it becomes implementable
|  > on a non-realtime OS. Can you give us something which we can implement with the constraints we have
|  > (i.e. no interface timestamps, no high-res timers, accumulation of inaccuracies)?
|  > 
|  > CCID2 works nicely since it does not have all these precision requirements.
|  > 
|  > 
|  > 
|  > 
|  > 
|  > Quoting Eddie Kohler:
|  > |  > That is one of the problems here - in the RFC such problems do not arise, but the implementation needs
|  > |  > to address these correctly.
|  > |  
|  > |  The RFC's solution to this problem, which involves t_gran, EXACTLY addresses this
|  > |  
|  > |  > |  Your token bucket math, incidentally, is wrong.  The easiest way to see this 
|  > |  > |  is to note that, according to your math, ANY token bucket filter attempting to 
|  > |  > |  limit the average output rate would have to have n = 0, making TBFs useless. 
|  > |  > |  The critical error is in assuming that a TBF allows "s * (n + R * rho)" bytes 
|  > |  > |  to be sent in a period R.  This is not right; a TBF allows a maximum of s * R 
|  > |  > |  * rho per longer-term period R; that's the point.  A token bucket filter 
|  > |  > |  allows only SHORT-term bursts to compensate for earlier slow periods.  Which 
|  > |  > |  is exactly what we need.
|  > |  > Please take another look. The formula is correct (you will find the same one e.g in Andrew
|  > |  > Tanenbaum's book). 
|  > |  
|  > |  So I assume what you are referring to is the clause "average rate OVER ONE 
|  > |  RTT"?  Sorry I missed that.  I missed it because it is not TFRC's goal.  Can 
|  > |  you point to the section in RFC3448 or RFC4342 that prohibits a TFRC sender 
|  > |  from ever sending a (transient) rate more than X over one RTT?  RFC3448 4.6 
|  > |  allows burstiness much more than a single packet, and the intro allows 
|  > |  fluctuations of up to a factor of 2 relative to the fair rate
|  > |  
|  > |  > I think (with regard to the paragraph below) that your perspective is an entirely different one,
|  > |  > namely to solve the question "which kind of token bucket do we need to obtain a rate which
|  > |  > is on average consistent with X". 
|  > |  
|  > |  That is *TFRC's* perspective: finding packet sends that on average are 
|  > |  consistent with X.  As demonstrated by 4.6 and elsewhere
|  > |  
|  > |  How much above X may an application transiently send?  The intro would argue 2x.
|  > |  
|  > |  > But until this is truly resolved I want this patch in.
|  > |  
|  > |  Fine, I disagree, Ian disagrees (as far as I read his messages).  You are 
|  > |  fixing one problem and creating another: artificially low send rates
|  > |  
|  > |  Eddie
|  > |  -
|  > |  To unsubscribe from this list: send the line "unsubscribe dccp" in
|  > |  the body of a message to majordomo@vger.kernel.org
|  > |  More majordomo info at  http://vger.kernel.org/majordomo-info.html
|  > |  
|  > |  
|  -
|  To unsubscribe from this list: send the line "unsubscribe dccp" in
|  the body of a message to majordomo@vger.kernel.org
|  More majordomo info at  http://vger.kernel.org/majordomo-info.html
|  
|  

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/25]: Avoid accumulation of large send credit
  2007-03-21 18:44 [PATCH 2/25]: Avoid accumulation of large send credit Gerrit Renker
                   ` (14 preceding siblings ...)
  2007-04-15 15:44 ` Gerrit Renker
@ 2007-04-15 15:56 ` Gerrit Renker
  2007-04-15 16:23 ` Gerrit Renker
                   ` (10 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Gerrit Renker @ 2007-04-15 15:56 UTC (permalink / raw)
  To: dccp

Quoting Ian McDonald:
|  It's not totally pointless, Dave, because it is a rate-based protocol,
|  not a window-based protocol, and you've got the real issue of slow
|  receivers, especially when we use a whole lot of CPU... It's not
|  network congestion but it still should be dealt with. There are probably
|  other scenarios too - e.g. I can think of a 10 Mbit radio link between
|  two buildings that run on 100 Mbit internally. TCP works fine there, since
|  the lack of acks will stop transmission, but a rate-based protocol will
|  keep on trying... Although this isn't a local switch as you mention, it
|  is still low RTT.
I think that we can use quite a lot of that input, but it requires some
thinking - which IMO is what you are trying to say here. And it agrees with
what I am trying to say - we need some revision so that we don't end up
coding RFC 3448 blindly.


|  Eddie - I would like to work on this more in answer to your question.
|  I'll see what I can do over the next weeks once I get a paper out of
|  the way (or when I get bored with it!). Gerrit's work is nearly there.
|  What I'd like to do is work on this whole granularity/out of control
|  thing he keeps referring to. I am not convinced but I need to put up
|  or shut up by replicating some of his work and fixing the bugs or
|  admitting I'm wrong.
I think it is less a question of wrong/not wrong but rather "works/doesn't work".

Since the patches (apart from one minor improvement) do not touch the packet
scheduling engine in net/dccp/output.c, can I suggest working through the submitted
set of patches first? They fix other problems and I think that they make CCID3
quite a bit more lightweight. This would simplify working on that problem, too. Ok?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/25]: Avoid accumulation of large send credit
  2007-03-21 18:44 [PATCH 2/25]: Avoid accumulation of large send credit Gerrit Renker
                   ` (15 preceding siblings ...)
  2007-04-15 15:56 ` Gerrit Renker
@ 2007-04-15 16:23 ` Gerrit Renker
  2007-04-15 16:41 ` Gerrit Renker
                   ` (9 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Gerrit Renker @ 2007-04-15 16:23 UTC (permalink / raw)
  To: dccp

|  Thus the RTT measurement is EXPLICITLY fed in to the throughput 
|  equation.  In TCP the RTT measurement mostly just feeds in to the RTO, which 
|  is why the protocol's behavior is less sensitive to the measurement.
|  
|  DCCP *MIGHT* work just fine with the inflated RTT measurements (i.e. the RTT 
|  including IP processing) but there is yet another gerrit missive to work 
|  through to see how real that complaint is.
|  
|  A less aggressive version of "turn off RTT for LANs" would be simply to 
|  subtract an estimate of the IP<->card path's cost from the measured coarse 
|  RTT.  This would fix the problem.  If you used a stable minimum estimate, the 
|  RTT would naturally "inflate" when the host was busy, which as Ian points out 
|  is what we actually want.  How to obtain an estimate?  Probably anything would 
|  do, including something derived at boot time from BogoMIPS.
Such an estimate is difficult to obtain - I get completely different results on
different computers; depending on the card (ee100 with NAPI is fine, my cheapo RTL 8139
on the other hand makes a lot of noise). Some practical figures are:

 * when the timestamps get taken at arrival time, the RTT values are very close to what ping
   reports (e.g. 100 ... 1,100 usec on a LAN);

 * when the timestamps get taken in the CCID3 module, the RTT values are roughly up to 10
   times larger (5 milliseconds on a link with a 500usec link RTT was frequently the case)


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/25]: Avoid accumulation of large send credit
  2007-03-21 18:44 [PATCH 2/25]: Avoid accumulation of large send credit Gerrit Renker
                   ` (16 preceding siblings ...)
  2007-04-15 16:23 ` Gerrit Renker
@ 2007-04-15 16:41 ` Gerrit Renker
  2007-04-18 16:16 ` [dccp] " Colin Perkins
                   ` (8 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Gerrit Renker @ 2007-04-15 16:41 UTC (permalink / raw)
  To: dccp

|  Basically if RTT is extremely low, don't do any of this limiting.
I think we should find a way of implementing this hint in CCID3,
if Eddie disagrees, maybe as an experimental or Kconfig option.

This concept is different from the `preventing oscillations' idea
(which we should also do, I know); yet it will be something encountered
on almost any network that a CCID3 client runs on.


Coming back to the patch referred to in the subject line - I would
like to keep that in for the moment, for the very reason that the
rate-pacing is broken, so that this patch avoids pouring more kerosene
on the fire (remember, this patch is not about RTTs).

If you like I can change the comment above the two lines of code
that this patch introduces; but I would still like to keep it.

And it is something which can easily be changed once the other issues
have been resolved. Plus, it is a safe option.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [dccp] Re: [PATCH 2/25]: Avoid accumulation of large send credit
  2007-03-21 18:44 [PATCH 2/25]: Avoid accumulation of large send credit Gerrit Renker
                   ` (17 preceding siblings ...)
  2007-04-15 16:41 ` Gerrit Renker
@ 2007-04-18 16:16 ` Colin Perkins
  2007-04-18 16:48 ` Lars Eggert
                   ` (7 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Colin Perkins @ 2007-04-18 16:16 UTC (permalink / raw)
  To: dccp

On 11 Apr 2007, at 23:45, Ian McDonald wrote:
> On 4/12/07, Gerrit Renker <gerrit@erg.abdn.ac.uk> wrote:
>> There is no way to stop a Linux CCID3 sender from ramping X up to  
>> the link bandwidth of 1 Gbit/sec; but the scheduler can only  
>> control packet pacing up to a rate of s * HZ bytes per second.
>
> Let's start to think laterally about this. Many of the problems around
> CCID3/TFRC implementation seem to be on local LANs and rtt is less
> than t_gran. We get really badly affected by how we do x_recv etc and
> the rate is basically all over the show. We get affected by send
> credits and numerous other problems.

As a data point, we've seen similar stability issues with our user-space
TFRC implementation, although at somewhat larger RTTs (order of
a few milliseconds or less). We're still checking whether these are
bugs in our code, or issues with TFRC, but this may be a broader
issue than problems with the Linux DCCP implementation.

Colin

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [dccp] Re: [PATCH 2/25]: Avoid accumulation of large send credit
  2007-03-21 18:44 [PATCH 2/25]: Avoid accumulation of large send credit Gerrit Renker
                   ` (18 preceding siblings ...)
  2007-04-18 16:16 ` [dccp] " Colin Perkins
@ 2007-04-18 16:48 ` Lars Eggert
  2007-04-18 18:32 ` vlad.gm
                   ` (6 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Lars Eggert @ 2007-04-18 16:48 UTC (permalink / raw)
  To: dccp

On 2007-4-18, at 19:16, ext Colin Perkins wrote:
> On 11 Apr 2007, at 23:45, Ian McDonald wrote:
>> On 4/12/07, Gerrit Renker <gerrit@erg.abdn.ac.uk> wrote:
>>> There is no way to stop a Linux CCID3 sender from ramping X up to  
>>> the link bandwidth of 1 Gbit/sec; but the scheduler can only  
>>> control packet pacing up to a rate of s * HZ bytes per second.
>>
>> Let's start to think laterally about this. Many of the problems  
>> around
>> CCID3/TFRC implementation seem to be on local LANs and rtt is less
>> than t_gran. We get really badly affected by how we do x_recv etc and
>> the rate is basically all over the show. We get affected by send
>> credits and numerous other problems.
>
> As a data point, we've seen similar stability issues with our user- 
> space TFRC implementation, although at somewhat larger RTTs (order  
> of a few milliseconds or less). We're still checking whether these  
> are bugs in our code, or issues with TFRC, but this may be a  
> broader issue than problems with the Linux DCCP implementation.

I think Vlad saw similar issues with the KAME code when running over  
a local area network. (Vlad?)

Lars




^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [dccp] Re: [PATCH 2/25]: Avoid accumulation of large send credit
  2007-03-21 18:44 [PATCH 2/25]: Avoid accumulation of large send credit Gerrit Renker
                   ` (19 preceding siblings ...)
  2007-04-18 16:48 ` Lars Eggert
@ 2007-04-18 18:32 ` vlad.gm
  2007-04-18 18:34 ` vlad.gm
                   ` (5 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: vlad.gm @ 2007-04-18 18:32 UTC (permalink / raw)
  To: dccp

Exactly, we encountered some problems also with the KAME
implementation when running over links with very small delays (in the
millisecond range). The problem was that the connection would choke
and require a long time in order to recover. We suspected some
possible bugs in the implementation (arithmetic overflow?), however
later reports by the Linux kernel implementers seemed to be consistent
with our experience.
We must not forget that we all derive our code from the same codebase,
which means that some implementation decisions such as the way in
which the TFRC equation calculations are done are the same for both
implementations.
As a side note, concerning the way the scheduling of sent packets and
timing are done: we set our target rate around 50-100 packets/sec,
typical for voice applications. KAME is based on the FreeBSD 5.4
kernel which was one of the last kernels in which HZ=100 was used. In
order to have our implementation running reliably (also with larger
RTTs) we had to increase it to HZ=1000. I am not sure which part of
the implementation is responsible for this problem. In principle, it
should be possible to implement TFRC with a grainy timer interrupt
rate.

Regards,
Vlad

On 4/18/07, Lars Eggert <lars.eggert@nokia.com> wrote:
> On 2007-4-18, at 19:16, ext Colin Perkins wrote:
> > On 11 Apr 2007, at 23:45, Ian McDonald wrote:
> >> On 4/12/07, Gerrit Renker <gerrit@erg.abdn.ac.uk> wrote:
> >>> There is no way to stop a Linux CCID3 sender from ramping X up to
> >>> the link bandwidth of 1 Gbit/sec; but the scheduler can only
> >>> control packet pacing up to a rate of s * HZ bytes per second.
> >>
> >> Let's start to think laterally about this. Many of the problems
> >> around
> >> CCID3/TFRC implementation seem to be on local LANs and rtt is less
> >> than t_gran. We get really badly affected by how we do x_recv etc and
> >> the rate is basically all over the show. We get affected by send
> >> credits and numerous other problems.
> >
> > As a data point, we've seen similar stability issues with our user-
> > space TFRC implementation, although at somewhat larger RTTs (order
> > of a few milliseconds or less). We're still checking whether these
> > are bugs in our code, or issues with TFRC, but this may be a
> > broader issue than problems with the Linux DCCP implementation.
>
> I think Vlad saw similar issues with the KAME code when running over
> a local area network. (Vlad?)
>
> Lars
>
>
>
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [dccp] Re: [PATCH 2/25]: Avoid accumulation of large send credit
  2007-03-21 18:44 [PATCH 2/25]: Avoid accumulation of large send credit Gerrit Renker
                   ` (20 preceding siblings ...)
  2007-04-18 18:32 ` vlad.gm
@ 2007-04-18 18:34 ` vlad.gm
  2007-04-20  9:45 ` Gerrit Renker
                   ` (4 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: vlad.gm @ 2007-04-18 18:34 UTC (permalink / raw)
  To: dccp

Exactly, we encountered some problems also with the KAME
implementation when running over links with very small delays (in the
millisecond range). The problem was that the connection would choke
and require a long time in order to recover. We suspected some
possible bugs in the implementation (arithmetic overflow?), however
later reports by the Linux kernel implementers seemed to be consistent
with our experience.
We must not forget that we all derive our code from the same codebase,
which means that some implementation decisions such as the way in
which the TFRC equation calculations are done are the same for both
implementations.
As a side note, concerning the way the scheduling of sent packets and
timing are done: we set our target rate around 50-100 packets/sec,
typical for voice applications. KAME is based on the FreeBSD 5.4
kernel which was one of the last kernels in which HZ=100 was used. In
order to have our implementation running reliably (also with larger
RTTs) we had to increase it to HZ=1000. I am not sure which part of
the implementation is responsible for this problem. In principle, it
should be possible to implement TFRC with a grainy timer interrupt
rate.

Regards,
Vlad

PS: Sorry for the double posting; the mailing list only knows one of my
mail addresses.

On 4/18/07, Lars Eggert <lars.eggert@nokia.com> wrote:
> On 2007-4-18, at 19:16, ext Colin Perkins wrote:
> > On 11 Apr 2007, at 23:45, Ian McDonald wrote:
> >> On 4/12/07, Gerrit Renker <gerrit@erg.abdn.ac.uk> wrote:
> >>> There is no way to stop a Linux CCID3 sender from ramping X up to
> >>> the link bandwidth of 1 Gbit/sec; but the scheduler can only
> >>> control packet pacing up to a rate of s * HZ bytes per second.
> >>
> >> Let's start to think laterally about this. Many of the problems
> >> around
> >> CCID3/TFRC implementation seem to be on local LANs and rtt is less
> >> than t_gran. We get really badly affected by how we do x_recv etc and
> >> the rate is basically all over the show. We get affected by send
> >> credits and numerous other problems.
> >
> > As a data point, we've seen similar stability issues with our user-
> > space TFRC implementation, although at somewhat larger RTTs (order
> > of a few milliseconds or less). We're still checking whether these
> > are bugs in our code, or issues with TFRC, but this may be a
> > broader issue than problems with the Linux DCCP implementation.
>
> I think Vlad saw similar issues with the KAME code when running over
> a local area network. (Vlad?)
>
> Lars
>
>
>
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [dccp] Re: [PATCH 2/25]: Avoid accumulation of large send credit
  2007-03-21 18:44 [PATCH 2/25]: Avoid accumulation of large send credit Gerrit Renker
                   ` (21 preceding siblings ...)
  2007-04-18 18:34 ` vlad.gm
@ 2007-04-20  9:45 ` Gerrit Renker
  2007-04-20 10:20 ` Ian McDonald
                   ` (3 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Gerrit Renker @ 2007-04-20  9:45 UTC (permalink / raw)
  To: dccp

Ian, I would appreciate if in future you would not copy patch
descriptions over from dccp@vger to dccp@ietf. 

Apart from the fact that I don't like it, this creates the wrong idea among 
people who have little or nothing to do with actual protocol implementation 
- it produces an impression of "let's talk about some implementation bugs". 
(But competent implementation feedback is welcome and solicited on dccp@vger)

This is all the more regrettable since you are right in raising this point
as a general one: it is indeed a limitation of [RFC 3448, 4.6] with regard
to non-realtime OSes. To clarify, the two main issues of this limitation
are summarised below.


I. Uncontrollable speeds
------------------------
Non-realtime OSes schedule processes in discrete timeslices with a granularity
of t_gran = 1/HZ. When packets are scheduled using this mechanism, this
naturally limits the maximum number of packets per second to HZ.

There are two speeds involved here: the packet rate `A' of the application
(user-space), and the allowed sending rate `X' determined by the TFRC mechanism
(kernel-space). 

These speeds are not related to one another. The allowed sending rate X will, under
normal circumstances, approach the link bandwidth, following the principles of slow
start. The application sending rate A, in contrast, may be fixed or may vary.

No major problems arise when it is ensured that A is always below X. Numerical
example: A=32kbps, X=94Mbps (standard 100 Mb Ethernet link speed). When loss
occurs, X is reduced according to p. As long as X remains above A, the sender
can send as before; if X is reduced below A, the sender will be limited.

Now the problem: when the application rate A is above s * HZ, there is a range
of speeds where the TFRC mechanism is effectively out of control, i.e. requests
to reduce the sending rate in response to congestion events
(ECN-marked or lost packets) will not be followed.

Numerical example: HZ=1000/sec, X=94Mbps, A=59Mbps, s=1500 bytes. The controllable
limit is s * HZ = 1500 * 8 * 1000 bps = 12Mbps. Assume loss occurs in steady-state
such that X is to be reduced to X_reduced. Then, if
                      s * HZ  <  X_reduced  <=  A,
nothing will happen and the effective speed after computing X_reduced will remain at A.
This is even more problematic if A is not fixed but could increase above its current rate.
So, with regard to the numerical example, nothing will happen if X_reduced is between
12Mbps ... 59Mbps; the speed after the congestion occurs will remain at A=59Mbps.

The problem is even more serious when considering that Gigabit NICs are standard
in most laptops and desktop PCs, here X will ramp up even higher so that the range
for mayhem is even greater. (Standard Linux even comes with 10 Gbit ethernet drivers).

Again: the problem is that TFRC/CCID3 can not control speeds above s * HZ on a non-realtime
operating system. In car manufacturer terms, this is like a car whose accelerator is
functional up to a point but, somewhere in its range, switches straight to top speed. Obviously, they would not
be allowed to sell cars with such a deficiency. 

A safer solution, therefore, would be to insert a throttle to limit application speeds to
below s * HZ, to keep applications from stealing bandwidth which they are not supposed to use.
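
To make the numbers concrete, such a throttle boils down to a one-line clamp
(illustrative only, names invented): with s = 1500 bytes and HZ = 1000 the
controllable ceiling is exactly the 12 Mbit/s mentioned above.

#include <stdint.h>

#define HZ      1000                            /* example tick rate */

/* x and the return value are in bytes per second, s is the packet size in bytes */
uint64_t controllable_x(uint64_t x, uint32_t s)
{
        uint64_t limit = (uint64_t)s * HZ;      /* at most one packet per tick */

        return x > limit ? limit : x;           /* 1500 * 1000 B/s = 12 Mbit/s */
}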


II. Accumulation of send credits
--------------------------------
This second problem is also conceptual and is described as accumulation of send credits.
It has been discussed on this list before; please refer to those threads for a more
detailed description of how this comes about. The relevant point here is that accumulation
of send credits will also happen as a natural consequence of using [RFC 3448, 4.6] on
non-realtime operating systems. 

The reason is that the use of discrete time slices leads to a quantisation problem, where
t_nom is always set earlier than would be required by the exact formula: 0.9 msec becomes
0 msec, 1.7 msec becomes 1 msec, 2.8 msec becomes 2 msec and so forth (this assumes HZ=1000,
it is even worse with lower values of HZ). 

Thus, after a few packets, the sender will be "too early" by the sum total of quantisation 
errors that have so far occurred. In the given numerical example, the sender is skewed by 
(0.9 + 0.7 + 0.8) msec = 2.4 msec, which will be broken into a send credit of 2 msec plus a 
remainder of 0.4 msec; which might clear at a later stage. 
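
The arithmetic above can be reproduced in a few lines (an illustration only, not
kernel code): pacing with whole-millisecond sleeps while t_ipi has
sub-millisecond precision accumulates exactly the credit described.

#include <stdio.h>

int main(void)
{
        /* nominal inter-packet gaps: 0.9, 1.7 and 2.8 msec, in microseconds */
        const long t_ipi_us[] = { 900, 1700, 2800 };
        long credit_us = 0;
        int i;

        for (i = 0; i < 3; i++) {
                long slept_us = (t_ipi_us[i] / 1000) * 1000; /* truncated to whole msec (HZ=1000) */

                credit_us += t_ipi_us[i] - slept_us;         /* sender ends up early by the remainder */
        }
        printf("accumulated send credit: %ld usec\n", credit_us); /* prints 2400 */
        return 0;
}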

In addition, this will lead to speeds which are typically faster than allowed by the exact
value of t_nom: measurements have shown that in the ``linear'' range of speeds below s * HZ,
the real implementation is more than 3 times faster than allowed by the sending rate X = s/t_ipi.


III. Accumulation of inaccuracies
---------------------------------
Due to context switch latencies, interrupt handling, and processing overhead, a scheduling-based
packet pacing will not schedule packets at the exact time; they may be sent slightly earlier or
later. This is another source where send credits can accumulate, but it is not fully understood
yet. It would require measurements to see how far off on average the scheduling is. It does seem
however that this problem is less serious than I/II; scheduling inaccuracies might cancel each other
out over the long term.


NOTE: numerical examples serve to illustrate the principle only. Please do not interpret this as an 
      invitation for discussion of numerical examples.

Thanks.

  
|  On 4/18/07, Lars Eggert <lars.eggert@nokia.com> wrote:
|  > On 2007-4-18, at 19:16, ext Colin Perkins wrote:
|  > > On 11 Apr 2007, at 23:45, Ian McDonald wrote:
|  > >> On 4/12/07, Gerrit Renker <gerrit@erg.abdn.ac.uk> wrote:
|  > >>> There is no way to stop a Linux CCID3 sender from ramping X up to
|  > >>> the link bandwidth of 1 Gbit/sec; but the scheduler can only
|  > >>> control packet pacing up to a rate of s * HZ bytes per second.
|  > >>
|  > >> Let's start to think laterally about this. Many of the problems
|  > >> around
|  > >> CCID3/TFRC implementation seem to be on local LANs and rtt is less
|  > >> than t_gran. We get really badly affected by how we do x_recv etc and
|  > >> the rate is basically all over the show. We get affected by send
|  > >> credits and numerous other problems.
|  > >
|  > > As a data point, we've seen similar stability issues with our user-
|  > > space TFRC implementation, although at somewhat larger RTTs (order
|  > > of a few milliseconds or less). We're still checking whether these
|  > > are bugs in our code, or issues with TFRC, but this may be a
|  > > broader issue than problems with the Linux DCCP implementation.
|  >
|  > I think Vlad saw similar issues with the KAME code when running over
|  > a local area network. (Vlad?)
|  >
|  > Lars
|  >
|  >
|  >
|  >
|  -
|  To unsubscribe from this list: send the line "unsubscribe dccp" in
|  the body of a message to majordomo@vger.kernel.org
|  More majordomo info at  http://vger.kernel.org/majordomo-info.html
|  
|  

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [dccp] Re: [PATCH 2/25]: Avoid accumulation of large send credit
  2007-03-21 18:44 [PATCH 2/25]: Avoid accumulation of large send credit Gerrit Renker
                   ` (22 preceding siblings ...)
  2007-04-20  9:45 ` Gerrit Renker
@ 2007-04-20 10:20 ` Ian McDonald
  2007-04-20 10:56 ` Colin Perkins
                   ` (2 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Ian McDonald @ 2007-04-20 10:20 UTC (permalink / raw)
  To: dccp

On 4/20/07, Gerrit Renker <gerrit@erg.abdn.ac.uk> wrote:
> Ian, I would appreciate if in future you would not copy patch
> descriptions over from dccp@vger to dccp@ietf.
>
> Apart from the fact that I don't like it, this creates the wrong idea among
> people who have little or nothing to do with actual protocol implementation
> - it produces an impression of "let's talk about some implementation bugs".
> (But competent implementation feedback is welcome and solicited on dccp@vger)
>
> Which is the more regrettable since you are right in raising this point
> as a general one: it is indeed a limitation of [RFC 3448, 4.6] with regard
> to non-realtime OSes. To clarify, the two main issues of this limitation
> are summarised below.
>
Yes I was a bit lazy in replying without changing the subject etc. My apologies.

I'm refraining from replying further on these topics until I do some
experimentation which may be a while off.

Ian
-- 
Web: http://wand.net.nz/~iam4/
Blog: http://iansblog.jandi.co.nz
WAND Network Research Group

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [dccp] Re: [PATCH 2/25]: Avoid accumulation of large send credit
  2007-03-21 18:44 [PATCH 2/25]: Avoid accumulation of large send credit Gerrit Renker
                   ` (23 preceding siblings ...)
  2007-04-20 10:20 ` Ian McDonald
@ 2007-04-20 10:56 ` Colin Perkins
  2007-04-20 11:31 ` Gerrit Renker
  2007-04-24 22:50 ` Colin Perkins
  26 siblings, 0 replies; 28+ messages in thread
From: Colin Perkins @ 2007-04-20 10:56 UTC (permalink / raw)
  To: dccp

On 20 Apr 2007, at 11:20, Ian McDonald wrote:
> On 4/20/07, Gerrit Renker <gerrit@erg.abdn.ac.uk> wrote:
>> Ian, I would appreciate if in future you would not copy patch
>> descriptions over from dccp@vger to dccp@ietf.
>>
>> Apart from the fact that I don't like it, this creates the wrong  
>> idea among
>> people who have little or nothing to do with actual protocol  
>> implementation
>> - it produces an impression of "let's talk about some  
>> implementation bugs".
>> (But competent implementation feedback is welcome and solicited on  
>> dccp@vger)
>>
>> Which is the more regrettable since you are right in raising this  
>> point
>> as a general one: it is indeed a limitation of [RFC 3448, 4.6]  
>> with regard
>> to non-realtime OSes. To clarify, the two main issues of this  
>> limitation
>> are summarised below.
>>
> Yes I was a bit lazy in replying without changing the subject etc.  
> My apologies.
>
> I'm refraining on replying further on these topics until I do some
> experimentation which may be a while off.

Likewise - we have experiments in progress, and while we do have  
stability problems with our TFRC implementation, I'm unconvinced the  
behaviour is due to the reasons suggested. I'll report back when we  
have more results.

Colin

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [dccp] Re: [PATCH 2/25]: Avoid accumulation of large send credit
  2007-03-21 18:44 [PATCH 2/25]: Avoid accumulation of large send credit Gerrit Renker
                   ` (24 preceding siblings ...)
  2007-04-20 10:56 ` Colin Perkins
@ 2007-04-20 11:31 ` Gerrit Renker
  2007-04-24 22:50 ` Colin Perkins
  26 siblings, 0 replies; 28+ messages in thread
From: Gerrit Renker @ 2007-04-20 11:31 UTC (permalink / raw)
  To: dccp

|  Likewise - we have experiments in progress, and while we do have  
|  stability problems with our TFRC implementation, I'm unconvinced the  
|  behaviour is due to the reasons suggested. I'll report back when we  
|  have more results.
|  
I am not sure you mean the same problem. If you have data which relates
to the problem described - and the description is the result of experiments
over several months - please do post the results.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [dccp] Re: [PATCH 2/25]: Avoid accumulation of large send credit
  2007-03-21 18:44 [PATCH 2/25]: Avoid accumulation of large send credit Gerrit Renker
                   ` (25 preceding siblings ...)
  2007-04-20 11:31 ` Gerrit Renker
@ 2007-04-24 22:50 ` Colin Perkins
  26 siblings, 0 replies; 28+ messages in thread
From: Colin Perkins @ 2007-04-24 22:50 UTC (permalink / raw)
  To: dccp

On 20 Apr 2007, at 12:31, Gerrit Renker wrote:
> |  Likewise - we have experiments in progress, and while we do have
> |  stability problems with our TFRC implementation, I'm unconvinced  
> the
> |  behaviour is due to the reasons suggested. I'll report back when we
> |  have more results.
> |
> I am not sure you mean the same problem.

Maybe - it's certainly possible there are several issues here.

> If you have data which relates to the problem described - and the  
> description is the result of experiments over several months -  
> please do post the results.

We have initial data, collected as part of an MSc project over the  
course of a year, which gives some indications as to the cause of the  
issues we're seeing. As I said, I'll report back when we have more  
data to confirm (or otherwise) those findings.

Colin

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2007-04-24 22:50 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-03-21 18:44 [PATCH 2/25]: Avoid accumulation of large send credit Gerrit Renker
2007-03-26  2:33 ` Ian McDonald
2007-04-10 17:24 ` Eddie Kohler
2007-04-11 14:50 ` Gerrit Renker
2007-04-11 15:43 ` Eddie Kohler
2007-04-11 22:45 ` Ian McDonald
2007-04-12 11:40 ` Gerrit Renker
2007-04-12 12:55 ` Gerrit Renker
2007-04-12 14:39 ` Eddie Kohler
2007-04-13 18:27 ` Gerrit Renker
2007-04-13 20:37 ` Eddie Kohler
2007-04-13 20:58 ` David Miller
2007-04-13 21:45 ` Ian McDonald
2007-04-13 23:43 ` Eddie Kohler
2007-04-14  5:51 ` Eddie Kohler
2007-04-15 15:44 ` Gerrit Renker
2007-04-15 15:56 ` Gerrit Renker
2007-04-15 16:23 ` Gerrit Renker
2007-04-15 16:41 ` Gerrit Renker
2007-04-18 16:16 ` [dccp] " Colin Perkins
2007-04-18 16:48 ` Lars Eggert
2007-04-18 18:32 ` vlad.gm
2007-04-18 18:34 ` vlad.gm
2007-04-20  9:45 ` Gerrit Renker
2007-04-20 10:20 ` Ian McDonald
2007-04-20 10:56 ` Colin Perkins
2007-04-20 11:31 ` Gerrit Renker
2007-04-24 22:50 ` Colin Perkins
