* [RFC] fq_codel : interval servo on hosts
From: Eric Dumazet @ 2012-08-31 13:50 UTC
To: codel; +Cc: Tomas Hruby, Nandita Dukkipati, netdev

On Thu, 2012-08-30 at 23:55 -0700, Eric Dumazet wrote:
> On locally generated TCP traffic (host), we can override the 100 ms
> interval value using the more accurate RTT estimation maintained by TCP
> stack (tp->srtt)
>
> Datacenter workload benefits using shorter feedback (say if RTT is below
> 1 ms, we can react 100 times faster to a congestion)
>
> Idea from Yuchung Cheng.
>

Linux patch would be the following :

I'll do tests next week, but I am sending a raw patch right now if
anybody wants to try it.

Presumably we also want to adjust target as well.

To get more precise srtt values in the datacenter, we might avoid the
'one jiffie slack' on small values in tcp_rtt_estimator(), as we force
m to be 1 before the scaling by 8 :

if (m == 0)
	m = 1;

We only need to force the least significant bit of srtt to be set.

 net/sched/sch_fq_codel.c |   23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
index 9fc1c62..7d2fe35 100644
--- a/net/sched/sch_fq_codel.c
+++ b/net/sched/sch_fq_codel.c
@@ -25,6 +25,7 @@
 #include <net/pkt_sched.h>
 #include <net/flow_keys.h>
 #include <net/codel.h>
+#include <linux/tcp.h>
 
 /*	Fair Queue CoDel.
  *
@@ -59,6 +60,7 @@ struct fq_codel_sched_data {
 	u32		perturbation;	/* hash perturbation */
 	u32		quantum;	/* psched_mtu(qdisc_dev(sch)); */
 	struct codel_params cparams;
+	codel_time_t	default_interval;
 	struct codel_stats cstats;
 	u32		drop_overlimit;
 	u32		new_flow_count;
@@ -211,6 +213,14 @@ static int fq_codel_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	return NET_XMIT_SUCCESS;
 }
 
+/* Given TCP srtt evaluation, return codel interval.
+ * srtt is given in jiffies, scaled by 8.
+ */
+static codel_time_t tcp_srtt_to_codel(unsigned int srtt)
+{
+	return srtt * ((NSEC_PER_SEC >> (CODEL_SHIFT + 3)) / HZ);
+}
+
 /* This is the specific function called from codel_dequeue()
  * to dequeue a packet from queue. Note: backlog is handled in
  * codel, we dont need to reduce it here.
@@ -220,12 +230,21 @@ static struct sk_buff *dequeue(struct codel_vars *vars, struct Qdisc *sch)
 	struct fq_codel_sched_data *q = qdisc_priv(sch);
 	struct fq_codel_flow *flow;
 	struct sk_buff *skb = NULL;
+	struct sock *sk;
 
 	flow = container_of(vars, struct fq_codel_flow, cvars);
 	if (flow->head) {
 		skb = dequeue_head(flow);
 		q->backlogs[flow - q->flows] -= qdisc_pkt_len(skb);
 		sch->q.qlen--;
+		sk = skb->sk;
+		q->cparams.interval = q->default_interval;
+		if (sk && sk->sk_protocol == IPPROTO_TCP) {
+			u32 srtt = tcp_sk(sk)->srtt;
+
+			if (srtt)
+				q->cparams.interval = tcp_srtt_to_codel(srtt);
+		}
 	}
 	return skb;
 }
@@ -330,7 +349,7 @@ static int fq_codel_change(struct Qdisc *sch, struct nlattr *opt)
 	if (tb[TCA_FQ_CODEL_INTERVAL]) {
 		u64 interval = nla_get_u32(tb[TCA_FQ_CODEL_INTERVAL]);
 
-		q->cparams.interval = (interval * NSEC_PER_USEC) >> CODEL_SHIFT;
+		q->default_interval = (interval * NSEC_PER_USEC) >> CODEL_SHIFT;
 	}
 
 	if (tb[TCA_FQ_CODEL_LIMIT])
@@ -441,7 +460,7 @@ static int fq_codel_dump(struct Qdisc *sch, struct sk_buff *skb)
 	    nla_put_u32(skb, TCA_FQ_CODEL_LIMIT,
 			sch->limit) ||
 	    nla_put_u32(skb, TCA_FQ_CODEL_INTERVAL,
-			codel_time_to_us(q->cparams.interval)) ||
+			codel_time_to_us(q->default_interval)) ||
 	    nla_put_u32(skb, TCA_FQ_CODEL_ECN,
 			q->cparams.ecn) ||
 	    nla_put_u32(skb, TCA_FQ_CODEL_QUANTUM,

* [RFC v2] fq_codel : interval servo on hosts
From: Eric Dumazet @ 2012-08-31 13:57 UTC
To: codel; +Cc: Tomas Hruby, Nandita Dukkipati, netdev

On Fri, 2012-08-31 at 06:50 -0700, Eric Dumazet wrote:
> On Thu, 2012-08-30 at 23:55 -0700, Eric Dumazet wrote:
> > On locally generated TCP traffic (host), we can override the 100 ms
> > interval value using the more accurate RTT estimation maintained by TCP
> > stack (tp->srtt)
> >
> > Datacenter workload benefits using shorter feedback (say if RTT is below
> > 1 ms, we can react 100 times faster to a congestion)
> >
> > Idea from Yuchung Cheng.
> >
>
> Linux patch would be the following :
>
> I'll do tests next week, but I am sending a raw patch right now if
> anybody wants to try it.
>
> Presumably we also want to adjust target as well.
>
> To get more precise srtt values in the datacenter, we might avoid the
> 'one jiffie slack' on small values in tcp_rtt_estimator(), as we force
> m to be 1 before the scaling by 8 :
>
> if (m == 0)
> 	m = 1;
>
> We only need to force the least significant bit of srtt to be set.
>

Hmm, I also need to properly init default_interval after
codel_params_init(&q->cparams) :

 net/sched/sch_fq_codel.c |   24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
index 9fc1c62..f04ff6a 100644
--- a/net/sched/sch_fq_codel.c
+++ b/net/sched/sch_fq_codel.c
@@ -25,6 +25,7 @@
 #include <net/pkt_sched.h>
 #include <net/flow_keys.h>
 #include <net/codel.h>
+#include <linux/tcp.h>
 
 /*	Fair Queue CoDel.
  *
@@ -59,6 +60,7 @@ struct fq_codel_sched_data {
 	u32		perturbation;	/* hash perturbation */
 	u32		quantum;	/* psched_mtu(qdisc_dev(sch)); */
 	struct codel_params cparams;
+	codel_time_t	default_interval;
 	struct codel_stats cstats;
 	u32		drop_overlimit;
 	u32		new_flow_count;
@@ -211,6 +213,14 @@ static int fq_codel_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	return NET_XMIT_SUCCESS;
 }
 
+/* Given TCP srtt evaluation, return codel interval.
+ * srtt is given in jiffies, scaled by 8.
+ */
+static codel_time_t tcp_srtt_to_codel(unsigned int srtt)
+{
+	return srtt * ((NSEC_PER_SEC >> (CODEL_SHIFT + 3)) / HZ);
+}
+
 /* This is the specific function called from codel_dequeue()
  * to dequeue a packet from queue. Note: backlog is handled in
  * codel, we dont need to reduce it here.
@@ -220,12 +230,21 @@ static struct sk_buff *dequeue(struct codel_vars *vars, struct Qdisc *sch)
 	struct fq_codel_sched_data *q = qdisc_priv(sch);
 	struct fq_codel_flow *flow;
 	struct sk_buff *skb = NULL;
+	struct sock *sk;
 
 	flow = container_of(vars, struct fq_codel_flow, cvars);
 	if (flow->head) {
 		skb = dequeue_head(flow);
 		q->backlogs[flow - q->flows] -= qdisc_pkt_len(skb);
 		sch->q.qlen--;
+		sk = skb->sk;
+		q->cparams.interval = q->default_interval;
+		if (sk && sk->sk_protocol == IPPROTO_TCP) {
+			u32 srtt = tcp_sk(sk)->srtt;
+
+			if (srtt)
+				q->cparams.interval = tcp_srtt_to_codel(srtt);
+		}
 	}
 	return skb;
 }
@@ -330,7 +349,7 @@ static int fq_codel_change(struct Qdisc *sch, struct nlattr *opt)
 	if (tb[TCA_FQ_CODEL_INTERVAL]) {
 		u64 interval = nla_get_u32(tb[TCA_FQ_CODEL_INTERVAL]);
 
-		q->cparams.interval = (interval * NSEC_PER_USEC) >> CODEL_SHIFT;
+		q->default_interval = (interval * NSEC_PER_USEC) >> CODEL_SHIFT;
 	}
 
 	if (tb[TCA_FQ_CODEL_LIMIT])
@@ -395,6 +414,7 @@ static int fq_codel_init(struct Qdisc *sch, struct nlattr *opt)
 	INIT_LIST_HEAD(&q->new_flows);
 	INIT_LIST_HEAD(&q->old_flows);
 	codel_params_init(&q->cparams);
+	q->default_interval = q->cparams.interval;
 	codel_stats_init(&q->cstats);
 	q->cparams.ecn = true;
 
@@ -441,7 +461,7 @@ static int fq_codel_dump(struct Qdisc *sch, struct sk_buff *skb)
 	    nla_put_u32(skb, TCA_FQ_CODEL_LIMIT,
 			sch->limit) ||
 	    nla_put_u32(skb, TCA_FQ_CODEL_INTERVAL,
-			codel_time_to_us(q->cparams.interval)) ||
+			codel_time_to_us(q->default_interval)) ||
 	    nla_put_u32(skb, TCA_FQ_CODEL_ECN,
 			q->cparams.ecn) ||
 	    nla_put_u32(skb, TCA_FQ_CODEL_QUANTUM,

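As a worked example of the arithmetic in tcp_srtt_to_codel(), here is a
small userspace sketch. It assumes CODEL_SHIFT is 10 (codel time counted
in units of 2^10 ns) and passes the tick rate as a parameter; the names
srtt_to_codel(), codel_to_us() and hz are hypothetical and only serve the
illustration.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC  1000000000L
#define NSEC_PER_USEC 1000L
#define CODEL_SHIFT   10	/* assumption: codel time unit is 2^10 ns */

typedef uint32_t codel_time_t;

/* Same arithmetic as tcp_srtt_to_codel() in the patch, but with the tick
 * rate passed in (hypothetical parameter) instead of the kernel's HZ.
 * srtt is in jiffies, scaled by 8.
 */
static codel_time_t srtt_to_codel(unsigned int srtt, long hz)
{
	assert(hz > 0);
	/* ns per (jiffy / 8), expressed in codel units, truncated */
	return srtt * (uint32_t)((NSEC_PER_SEC >> (CODEL_SHIFT + 3)) / hz);
}

/* Convert codel time units back to microseconds. */
static uint32_t codel_to_us(codel_time_t val)
{
	uint64_t ns = (uint64_t)val << CODEL_SHIFT;

	return (uint32_t)(ns / NSEC_PER_USEC);
}
```

Under these assumptions, an srtt of 8 (one jiffy) at hz = 1000 maps to
976 codel units, which converts back to 999 us. The integer division
truncates the per-unit factor (122 instead of about 122.07), so the
result is slightly below the true value.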
* Re: [RFC v2] fq_codel : interval servo on hosts
From: Yuchung Cheng @ 2012-09-01 1:37 UTC
To: Eric Dumazet; +Cc: Tomas Hruby, Nandita Dukkipati, netdev, codel

On Fri, Aug 31, 2012 at 6:57 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Fri, 2012-08-31 at 06:50 -0700, Eric Dumazet wrote:
>> On Thu, 2012-08-30 at 23:55 -0700, Eric Dumazet wrote:
>> > On locally generated TCP traffic (host), we can override the 100 ms
>> > interval value using the more accurate RTT estimation maintained by TCP
>> > stack (tp->srtt)
>> >
>> > Datacenter workload benefits using shorter feedback (say if RTT is below
>> > 1 ms, we can react 100 times faster to a congestion)
>> >
>> > Idea from Yuchung Cheng.
>> >
>>
>> Linux patch would be the following :
>>
>> I'll do tests next week, but I am sending a raw patch right now if
>> anybody wants to try it.
>>
>> Presumably we also want to adjust target as well.
>>
>> To get more precise srtt values in the datacenter, we might avoid the
>> 'one jiffie slack' on small values in tcp_rtt_estimator(), as we force
>> m to be 1 before the scaling by 8 :
>>
>> if (m == 0)
>>         m = 1;
>>
>> We only need to force the least significant bit of srtt to be set.
>>

Just curious: tp->srtt is a very rough estimator, e.g., Delayed-ACks
can easily add 40 - 200ms fuzziness. Will this affect short flows?

* Re: [RFC v2] fq_codel : interval servo on hosts
From: Eric Dumazet @ 2012-09-01 12:51 UTC
To: Yuchung Cheng; +Cc: Tomas Hruby, Nandita Dukkipati, netdev, codel

On Fri, 2012-08-31 at 18:37 -0700, Yuchung Cheng wrote:

> Just curious: tp->srtt is a very rough estimator, e.g., Delayed-ACks
> can easily add 40 - 200ms fuzziness. Will this affect short flows?

Good point.

Delayed acks shouldn't matter, because they happen when the flow has
been idle for a while.

I guess we should clamp the srtt to the default interval :

if (srtt)
	q->cparams.interval = min(tcp_srtt_to_codel(srtt),
				  q->default_interval);

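The clamp suggested above can be sketched in userspace as follows. This
is only an illustration: HZ = 1000 and CODEL_SHIFT = 10 are assumptions,
pick_interval() is a hypothetical helper that does not exist in the
patch, and the default interval used in the note below (97656 units,
roughly 100 ms) is assumed as well.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000L
#define CODEL_SHIFT  10		/* assumption: codel time unit is 2^10 ns */
#define HZ           1000	/* assumption: tick rate, illustration only */

typedef uint32_t codel_time_t;

/* Same formula as in the patch; srtt is in jiffies, scaled by 8. */
static codel_time_t tcp_srtt_to_codel(unsigned int srtt)
{
	return srtt * ((NSEC_PER_SEC >> (CODEL_SHIFT + 3)) / HZ);
}

/* Hypothetical helper: use the srtt-derived interval when srtt is known,
 * but never exceed the configured default interval.
 */
static codel_time_t pick_interval(unsigned int srtt,
				  codel_time_t default_interval)
{
	codel_time_t interval;

	assert(default_interval > 0);
	if (!srtt)
		return default_interval;	/* no RTT sample yet */
	interval = tcp_srtt_to_codel(srtt);
	return interval < default_interval ? interval : default_interval;
}
```

With a default of 97656 units, an srtt of one jiffy (8) yields 976
units, while an srtt inflated to 200 ms (1600) is clamped back to the
default, so a delayed-ACK outlier can no longer stretch the interval
beyond what plain fq_codel would use.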
* Re: [RFC v2] fq_codel : interval servo on hosts
From: Nandita Dukkipati @ 2012-09-04 15:10 UTC
To: Eric Dumazet; +Cc: Tomas Hruby, netdev, codel

The idea of using srtt as interval makes sense to me if alongside we
also hash flows with similar RTTs into same bucket. But with just the
change in interval, I am not sure how codel is expected to behave.

My understanding is: the interval (usually set to worst case expected
RTT) is used to measure the standing queue or the "bad" queue. Suppose
1ms and 100ms RTT flows get hashed to same bucket, then the interval
with this patch will flip flop between 1ms and 100ms. How is this
expected to measure a standing queue? In fact I think the 1ms flow may
land up measuring the burstiness or the "good" queue created by the
long RTT flows, and this isn't desirable.


On Sat, Sep 1, 2012 at 5:51 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Fri, 2012-08-31 at 18:37 -0700, Yuchung Cheng wrote:
>
>> Just curious: tp->srtt is a very rough estimator, e.g., Delayed-ACks
>> can easily add 40 - 200ms fuzziness. Will this affect short flows?
>
> Good point.
>
> Delayed acks shouldn't matter, because they happen when the flow has
> been idle for a while.
>
> I guess we should clamp the srtt to the default interval :
>
> if (srtt)
>         q->cparams.interval = min(tcp_srtt_to_codel(srtt),
>                                   q->default_interval);
>

* Re: [RFC v2] fq_codel : interval servo on hosts
From: Jonathan Morton @ 2012-09-04 15:25 UTC
To: Nandita Dukkipati; +Cc: netdev, codel, Tomas Hruby

I think that in most cases, a long RTT flow and a short RTT flow on
the same interface means that the long RTT flow isn't bottlenecked
here, and therefore won't ever build up a significant queue - and that
means you would want to track over the shorter interval. Is that a
reasonable assumption?

The key to knowledge is not to rely on others to teach you it.

On 4 Sep 2012, at 18:10, Nandita Dukkipati <nanditad@google.com> wrote:

> The idea of using srtt as interval makes sense to me if alongside we
> also hash flows with similar RTTs into same bucket. But with just the
> change in interval, I am not sure how codel is expected to behave.
>
> My understanding is: the interval (usually set to worst case expected
> RTT) is used to measure the standing queue or the "bad" queue. Suppose
> 1ms and 100ms RTT flows get hashed to same bucket, then the interval
> with this patch will flip flop between 1ms and 100ms. How is this
> expected to measure a standing queue? In fact I think the 1ms flow may
> land up measuring the burstiness or the "good" queue created by the
> long RTT flows, and this isn't desirable.
>
>
> On Sat, Sep 1, 2012 at 5:51 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> On Fri, 2012-08-31 at 18:37 -0700, Yuchung Cheng wrote:
>>
>>> Just curious: tp->srtt is a very rough estimator, e.g., Delayed-ACks
>>> can easily add 40 - 200ms fuzziness. Will this affect short flows?
>>
>> Good point.
>>
>> Delayed acks shouldn't matter, because they happen when the flow has
>> been idle for a while.
>>
>> I guess we should clamp the srtt to the default interval :
>>
>> if (srtt)
>>         q->cparams.interval = min(tcp_srtt_to_codel(srtt),
>>                                   q->default_interval);
>>

* Re: [RFC v2] fq_codel : interval servo on hosts
From: Eric Dumazet @ 2012-09-04 15:39 UTC
To: Jonathan Morton; +Cc: Nandita Dukkipati, netdev, codel, Tomas Hruby

On Tue, 2012-09-04 at 18:25 +0300, Jonathan Morton wrote:
> I think that in most cases, a long RTT flow and a short RTT flow on
> the same interface means that the long RTT flow isn't bottlenecked
> here, and therefore won't ever build up a significant queue - and that
> means you would want to track over the shorter interval. Is that a
> reasonable assumption?
>

This would be reasonable, but if we have a shorter interval, this means
we could drop packets of the long RTT flow sooner than expected.

That's because the drop_next value is set up on the previous packet, and
is not based on the 'next packet'.

Re-evaluating drop_next at the right time would need more cpu cycles.

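The drop_next concern can be made concrete with CoDel's control law,
which schedules the next drop at t + interval / sqrt(count). The sketch
below is a simplified userspace model, not the kernel code: it uses a
floor integer square root (isqrt_floor() is a hypothetical helper),
whereas the kernel approximates the reciprocal square root instead.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t codel_time_t;

/* Floor of the integer square root; a simplification for illustration. */
static uint32_t isqrt_floor(uint32_t n)
{
	uint32_t r = 0;

	while ((uint64_t)(r + 1) * (r + 1) <= n)
		r++;
	return r;
}

/* Simplified CoDel control law: the next drop is scheduled
 * interval / sqrt(count) after time t.
 */
static codel_time_t control_law(codel_time_t t, codel_time_t interval,
				uint32_t count)
{
	assert(count > 0);
	return t + interval / isqrt_floor(count);
}
```

If drop_next is computed while dequeuing a packet from a 1 ms flow
(interval of about 976 units), the next drop is due 976 units later. Had
the 100 ms default (about 97656 units) been in effect, it would be due
97656 units later. A packet from a long RTT flow that reaches the head
of the same bucket in between is therefore judged against the deadline
set by the short RTT packet before it.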
* Re: [RFC v2] fq_codel : interval servo on hosts
From: Eric Dumazet @ 2012-09-04 15:34 UTC
To: Nandita Dukkipati; +Cc: Tomas Hruby, netdev, codel

On Tue, 2012-09-04 at 08:10 -0700, Nandita Dukkipati wrote:
> The idea of using srtt as interval makes sense to me if alongside we
> also hash flows with similar RTTs into same bucket. But with just the
> change in interval, I am not sure how codel is expected to behave.
>
> My understanding is: the interval (usually set to worst case expected
> RTT) is used to measure the standing queue or the "bad" queue. Suppose
> 1ms and 100ms RTT flows get hashed to same bucket, then the interval
> with this patch will flip flop between 1ms and 100ms. How is this
> expected to measure a standing queue? In fact I think the 1ms flow may
> land up measuring the burstiness or the "good" queue created by the
> long RTT flows, and this isn't desirable.
>

Well, how things settle with a pure codel, mixing flows of very
different RTT then ?

It seems there is a high resistance on SFQ/fq_codel model because of the
probabilities of flows sharing a bucket.

So what about removing the stochastic thing and switching to a hash with
collision resolution ?

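To illustrate the difference between the stochastic model and "a hash
with collision resolution", here is a minimal sketch using linear
probing. It is one possible scheme chosen for brevity, not something
specified in this thread, and every name in it (flow_slot, flow_lookup,
stochastic_bucket, NBUCKETS) is hypothetical. The modulo hash is also a
simplification.

```c
#include <assert.h>
#include <stdint.h>

#define NBUCKETS 8	/* tiny on purpose, to force collisions */

struct flow_slot {
	uint32_t key;	/* flow identifier (e.g. a hash of the flow keys) */
	int used;
};

/* Stochastic model: colliding flows simply share the bucket. */
static uint32_t stochastic_bucket(uint32_t key)
{
	return key % NBUCKETS;
}

/* Collision resolution by linear probing: each flow gets its own slot.
 * Returns the slot index, or -1 if the table is full.
 */
static int flow_lookup(struct flow_slot *tbl, uint32_t key)
{
	uint32_t i, idx = key % NBUCKETS;

	assert(tbl != 0);
	for (i = 0; i < NBUCKETS; i++) {
		uint32_t pos = (idx + i) % NBUCKETS;

		if (tbl[pos].used && tbl[pos].key == key)
			return (int)pos;	/* existing flow */
		if (!tbl[pos].used) {
			tbl[pos].used = 1;	/* claim a free slot */
			tbl[pos].key = key;
			return (int)pos;
		}
	}
	return -1;
}
```

In this model, keys 3 and 11 land in the same stochastic bucket (3), but
flow_lookup() places them in slots 3 and 4, so each flow would keep its
own queue state. The cost is the extra probing work on every lookup.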
* Re: [RFC v2] fq_codel : interval servo on hosts
From: Dave Taht @ 2012-09-04 16:40 UTC
To: Eric Dumazet
Cc: Nandita Dukkipati, Yuchung Cheng, codel, Kathleen Nichols, netdev, Tomas Hruby

On Tue, Sep 4, 2012 at 8:34 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2012-09-04 at 08:10 -0700, Nandita Dukkipati wrote:
>> The idea of using srtt as interval makes sense to me if alongside we
>> also hash flows with similar RTTs into same bucket. But with just the
>> change in interval, I am not sure how codel is expected to behave.
>>
>> My understanding is: the interval (usually set to worst case expected
>> RTT) is used to measure the standing queue or the "bad" queue. Suppose
>> 1ms and 100ms RTT flows get hashed to same bucket, then the interval
>> with this patch will flip flop between 1ms and 100ms. How is this
>> expected to measure a standing queue? In fact I think the 1ms flow may
>> land up measuring the burstiness or the "good" queue created by the
>> long RTT flows, and this isn't desirable.

Experiments would be good.

>
> Well, how things settle with a pure codel, mixing flows of very
> different RTT then ?

Elephants are shot statistically more often than mice.

> It seems there is a high resistance on SFQ/fq_codel model because of the
> probabilities of flows sharing a bucket.

I was going to do this in a separate email, because it is a little off-topic.

fq_codel has a standing queue problem, based on the fact that when a
queue empties, codel.h resets. This made sense for the single FIFO
codel but not multi-queued fq_codel. So after we hit X high rate
flows, target can never be achieved, even straining mightily, and we
end up with a standing queue again.

Easily seen with like 150 bidirectional flows at 10 or 100Mbit.

(as queues go, it's still a pretty good queue. And: I've fiddled with
various means of draining multi-queue behavior thus far, and they
ended up unstable/unfair)

> So what about removing the stochastic thing and switching to a hash with
> collision resolution ?

Was considered and discarded in the original SFQ paper as being too
computationally intensive (in 1993). Worth revisiting.

http://www2.rdrop.com/~paulmck/scalability/paper/sfq.2002.06.04.pdf

>
>

-- 
Dave Täht
http://www.bufferbloat.net/projects/cerowrt/wiki - "3.3.8-17 is out
with fq_codel!"

* Re: [RFC v2] fq_codel : interval servo on hosts
From: Eric Dumazet @ 2012-09-04 16:54 UTC
To: Dave Taht; +Cc: Tomas Hruby, Nandita Dukkipati, netdev, codel

On Tue, 2012-09-04 at 09:40 -0700, Dave Taht wrote:
> >
> > Well, how things settle with a pure codel, mixing flows of very
> > different RTT then ?
>
> Elephants are shot statistically more often than mice.

This doesn't answer the question.

Long/short RTT have nothing to do with elephants and mice.

* Re: [RFC v2] fq_codel : interval servo on hosts
From: Eric Dumazet @ 2012-09-04 16:57 UTC
To: Dave Taht; +Cc: Tomas Hruby, Nandita Dukkipati, netdev, codel

On Tue, 2012-09-04 at 09:40 -0700, Dave Taht wrote:

> fq_codel has a standing queue problem, based on the fact that when a
> queue empties, codel.h resets. This made sense for the single FIFO
> codel but not multi-queued fq_codel. So after we hit X high rate
> flows, target can never be achieved, even straining mightily, and we
> end up with a standing queue again.
>
> Easily seen with like 150 bidirectional flows at 10 or 100Mbit.
>
> (as queues go, it's still a pretty good queue. And: I've fiddled with
> various means of draining multi-queue behavior thus far, and they
> ended up unstable/unfair)

No idea what you mean by "codel.h resets".

Please use small mails, one idea per mail.
