From: Eric Dumazet <eric.dumazet@gmail.com>
To: codel@lists.bufferbloat.net
Cc: Tomas Hruby <thruby@google.com>,
Nandita Dukkipati <nanditad@google.com>,
netdev <netdev@vger.kernel.org>
Subject: [RFC] fq_codel : interval servo on hosts
Date: Fri, 31 Aug 2012 06:50:31 -0700
Message-ID: <1346421031.2591.34.camel@edumazet-glaptop>
In-Reply-To: <1346396137.2586.301.camel@edumazet-glaptop>
On Thu, 2012-08-30 at 23:55 -0700, Eric Dumazet wrote:
> On locally generated TCP traffic (host), we can override the 100 ms
> interval value using the more accurate RTT estimation maintained by TCP
> stack (tp->srtt)
>
> Datacenter workload benefits using shorter feedback (say if RTT is below
> 1 ms, we can react 100 times faster to a congestion)
>
> Idea from Yuchung Cheng.
>
The Linux patch would be the following.

I'll run tests next week, but I am sending a raw patch right now in case
anybody wants to try it.

Presumably we also want to adjust target.

To get more precise srtt values in the datacenter, we might avoid the
'one jiffy slack' on small values in tcp_rtt_estimator(), where we force
m to be 1 before the scaling by 8:

	if (m == 0)
		m = 1;

We only need to force the least significant bit of srtt to be set.
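
To make the difference concrete, here is a simplified userspace sketch of
the srtt update (mdev/rttvar handling left out). The helper names
srtt_update_clamp_m() and srtt_update_lsb() are hypothetical, and the second
variant is only one possible reading of "force the least significant bit":
it keeps srtt non-zero (zero meaning "no sample yet") without rounding a
sub-jiffy sample up to a full jiffy.

```c
#include <stdint.h>

/* srtt is kept in jiffies scaled by 8, as in the TCP stack.
 * m is one RTT sample in jiffies.
 */

/* Current behaviour (simplified): a 0-jiffy sample is rounded up to 1. */
static uint32_t srtt_update_clamp_m(uint32_t srtt, long m)
{
	if (m == 0)
		m = 1;
	if (srtt != 0) {
		m -= (srtt >> 3);	/* m is now the error in the estimate */
		srtt += m;		/* srtt = 7/8 srtt + 1/8 new sample */
	} else {
		srtt = m << 3;		/* first sample */
	}
	return srtt;
}

/* Assumed alternative: leave m alone and only keep srtt non-zero
 * by setting its least significant bit.
 */
static uint32_t srtt_update_lsb(uint32_t srtt, long m)
{
	if (srtt != 0) {
		m -= (srtt >> 3);
		srtt += m;
	} else {
		srtt = m << 3;
	}
	return srtt | 1;
}
```

Feeding only 0-jiffy samples, the first variant settles at srtt = 8 (one
full jiffy), while the second stays at srtt = 1 (one eighth of a jiffy).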
net/sched/sch_fq_codel.c | 23 +++++++++++++++++++++--
1 file changed, 21 insertions(+), 2 deletions(-)
diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
index 9fc1c62..7d2fe35 100644
--- a/net/sched/sch_fq_codel.c
+++ b/net/sched/sch_fq_codel.c
@@ -25,6 +25,7 @@
 #include <net/pkt_sched.h>
 #include <net/flow_keys.h>
 #include <net/codel.h>
+#include <linux/tcp.h>
 
 /*	Fair Queue CoDel.
  *
@@ -59,6 +60,7 @@ struct fq_codel_sched_data {
 	u32		perturbation;	/* hash perturbation */
 	u32		quantum;	/* psched_mtu(qdisc_dev(sch)); */
 	struct codel_params cparams;
+	codel_time_t	default_interval;
 	struct codel_stats cstats;
 	u32		drop_overlimit;
 	u32		new_flow_count;
@@ -211,6 +213,14 @@ static int fq_codel_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	return NET_XMIT_SUCCESS;
 }
 
+/* Given TCP srtt evaluation, return codel interval.
+ * srtt is given in jiffies, scaled by 8.
+ */
+static codel_time_t tcp_srtt_to_codel(unsigned int srtt)
+{
+	return srtt * ((NSEC_PER_SEC >> (CODEL_SHIFT + 3)) / HZ);
+}
+
 /* This is the specific function called from codel_dequeue()
  * to dequeue a packet from queue. Note: backlog is handled in
  * codel, we dont need to reduce it here.
@@ -220,12 +230,21 @@ static struct sk_buff *dequeue(struct codel_vars *vars, struct Qdisc *sch)
 	struct fq_codel_sched_data *q = qdisc_priv(sch);
 	struct fq_codel_flow *flow;
 	struct sk_buff *skb = NULL;
+	struct sock *sk;
 
 	flow = container_of(vars, struct fq_codel_flow, cvars);
 	if (flow->head) {
 		skb = dequeue_head(flow);
 		q->backlogs[flow - q->flows] -= qdisc_pkt_len(skb);
 		sch->q.qlen--;
+		sk = skb->sk;
+		q->cparams.interval = q->default_interval;
+		if (sk && sk->sk_protocol == IPPROTO_TCP) {
+			u32 srtt = tcp_sk(sk)->srtt;
+
+			if (srtt)
+				q->cparams.interval = tcp_srtt_to_codel(srtt);
+		}
 	}
 	return skb;
 }
@@ -330,7 +349,7 @@ static int fq_codel_change(struct Qdisc *sch, struct nlattr *opt)
 	if (tb[TCA_FQ_CODEL_INTERVAL]) {
 		u64 interval = nla_get_u32(tb[TCA_FQ_CODEL_INTERVAL]);
 
-		q->cparams.interval = (interval * NSEC_PER_USEC) >> CODEL_SHIFT;
+		q->default_interval = (interval * NSEC_PER_USEC) >> CODEL_SHIFT;
 	}
 
 	if (tb[TCA_FQ_CODEL_LIMIT])
@@ -441,7 +460,7 @@ static int fq_codel_dump(struct Qdisc *sch, struct sk_buff *skb)
 	    nla_put_u32(skb, TCA_FQ_CODEL_LIMIT,
 			sch->limit) ||
 	    nla_put_u32(skb, TCA_FQ_CODEL_INTERVAL,
-			codel_time_to_us(q->cparams.interval)) ||
+			codel_time_to_us(q->default_interval)) ||
 	    nla_put_u32(skb, TCA_FQ_CODEL_ECN,
 			q->cparams.ecn) ||
 	    nla_put_u32(skb, TCA_FQ_CODEL_QUANTUM,
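
For anyone wanting to sanity-check the unit conversion outside the kernel,
a small userspace sketch follows. It mirrors tcp_srtt_to_codel() and the
per-packet interval choice made in dequeue(), but takes HZ as a parameter.
The constants are redefined locally, CODEL_SHIFT = 10 is assumed to match
<net/codel.h>, and the names srtt_to_codel(), codel_to_us() and
pick_interval() are hypothetical stand-ins, not kernel functions.

```c
#include <stdint.h>

/* Local copies for illustration; the kernel takes these from its headers. */
#define NSEC_PER_SEC	1000000000ULL
#define NSEC_PER_USEC	1000ULL
#define CODEL_SHIFT	10	/* assumed: codel time unit is 1024 ns */

typedef uint32_t codel_time_t;

/* srtt is in jiffies scaled by 8; result is in codel time units.
 * (srtt / 8) jiffies * (NSEC_PER_SEC / hz) ns >> CODEL_SHIFT
 */
static codel_time_t srtt_to_codel(unsigned int srtt, unsigned int hz)
{
	return (codel_time_t)(srtt * ((NSEC_PER_SEC >> (CODEL_SHIFT + 3)) / hz));
}

/* Convert codel time units back to microseconds. */
static uint32_t codel_to_us(codel_time_t val)
{
	return (uint32_t)(((uint64_t)val << CODEL_SHIFT) / NSEC_PER_USEC);
}

/* Interval selection as done per dequeued packet: use the TCP srtt
 * when the packet belongs to a local TCP socket with a valid estimate,
 * otherwise fall back to the configured default.
 */
static codel_time_t pick_interval(codel_time_t default_interval,
				  int is_local_tcp, unsigned int srtt,
				  unsigned int hz)
{
	if (is_local_tcp && srtt)
		return srtt_to_codel(srtt, hz);
	return default_interval;
}
```

With hz = 1000, an srtt of 8 (one jiffy, i.e. 1 ms) maps to 976 codel units,
which converts back to 999 us; the integer division by HZ costs a little
precision. With hz = 250 the same srtt value represents 4 ms and maps to
3904 units.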