From: Eric Dumazet <eric.dumazet@gmail.com>
To: codel@lists.bufferbloat.net
Cc: Tomas Hruby <thruby@google.com>,
	Nandita Dukkipati <nanditad@google.com>,
	netdev <netdev@vger.kernel.org>
Subject: [RFC] fq_codel : interval servo on hosts
Date: Fri, 31 Aug 2012 06:50:31 -0700
Message-ID: <1346421031.2591.34.camel@edumazet-glaptop>
In-Reply-To: <1346396137.2586.301.camel@edumazet-glaptop>

On Thu, 2012-08-30 at 23:55 -0700, Eric Dumazet wrote:
> On locally generated TCP traffic (host), we can override the 100 ms
> interval value using the more accurate RTT estimation maintained by the
> TCP stack (tp->srtt).
> 
> Datacenter workloads benefit from shorter feedback (say, if RTT is below
> 1 ms, we can react 100 times faster to congestion).
> 
> Idea from Yuchung Cheng.
> 

The Linux patch would be the following:

I'll do tests next week, but I am sending a raw patch right now if
anybody wants to try it.

Presumably we want to adjust target as well.

To get more precise srtt values in the datacenter, we might avoid the
'one jiffy slack' on small values in tcp_rtt_estimator(), where we
currently force m to be 1 before the scaling by 8:

if (m == 0)
	m = 1;

We only need to force the least significant bit of srtt to be set.
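
To make the difference concrete, here is a rough userspace sketch of the
srtt update only. It is not the kernel function: the mdev/rttvar
bookkeeping of tcp_rtt_estimator() is left out, and both function names
are hypothetical. The second variant is just one possible reading of
"force the least significant bit of srtt".

```c
#include <assert.h>
#include <stdint.h>

/* srtt is kept in jiffies, scaled by 8 (srtt = 7/8 srtt + 1/8 sample).
 * A value of 0 means "no sample yet".
 */

/* Models the current behaviour: a 0-jiffy sample is rounded up to 1. */
static uint32_t srtt_update_force_m(uint32_t srtt, long m)
{
	assert(m >= 0);
	if (m == 0)
		m = 1;
	if (srtt != 0) {
		m -= (long)(srtt >> 3);	/* m is now the estimation error */
		srtt = (uint32_t)((long)srtt + m);
	} else {
		srtt = (uint32_t)m << 3;	/* first sample */
	}
	return srtt;
}

/* Assumed alternative: accept m == 0 and only keep srtt non-zero. */
static uint32_t srtt_update_force_lsb(uint32_t srtt, long m)
{
	assert(m >= 0);
	if (srtt != 0) {
		m -= (long)(srtt >> 3);
		srtt = (uint32_t)((long)srtt + m);
	} else {
		srtt = (uint32_t)m << 3;
	}
	return srtt | 1;
}
```

Feeding a stream of 0-jiffy samples, the first variant settles at
srtt == 8 (one full jiffy) and the second at srtt == 1 (1/8 jiffy), so
the floor on the estimate drops by a factor of 8 in this model.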


 net/sched/sch_fq_codel.c |   23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
index 9fc1c62..7d2fe35 100644
--- a/net/sched/sch_fq_codel.c
+++ b/net/sched/sch_fq_codel.c
@@ -25,6 +25,7 @@
 #include <net/pkt_sched.h>
 #include <net/flow_keys.h>
 #include <net/codel.h>
+#include <linux/tcp.h>
 
 /*	Fair Queue CoDel.
  *
@@ -59,6 +60,7 @@ struct fq_codel_sched_data {
 	u32		perturbation;	/* hash perturbation */
 	u32		quantum;	/* psched_mtu(qdisc_dev(sch)); */
 	struct codel_params cparams;
+	codel_time_t	default_interval;
 	struct codel_stats cstats;
 	u32		drop_overlimit;
 	u32		new_flow_count;
@@ -211,6 +213,14 @@ static int fq_codel_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	return NET_XMIT_SUCCESS;
 }
 
+/* Given TCP srtt evaluation, return codel interval.
+ * srtt is given in jiffies, scaled by 8.
+ */
+static codel_time_t tcp_srtt_to_codel(unsigned int srtt)
+{
+	return srtt * ((NSEC_PER_SEC >> (CODEL_SHIFT + 3)) / HZ);
+}
+
 /* This is the specific function called from codel_dequeue()
  * to dequeue a packet from queue. Note: backlog is handled in
  * codel, we dont need to reduce it here.
@@ -220,12 +230,21 @@ static struct sk_buff *dequeue(struct codel_vars *vars, struct Qdisc *sch)
 	struct fq_codel_sched_data *q = qdisc_priv(sch);
 	struct fq_codel_flow *flow;
 	struct sk_buff *skb = NULL;
+	struct sock *sk;
 
 	flow = container_of(vars, struct fq_codel_flow, cvars);
 	if (flow->head) {
 		skb = dequeue_head(flow);
 		q->backlogs[flow - q->flows] -= qdisc_pkt_len(skb);
 		sch->q.qlen--;
+		sk = skb->sk;
+		q->cparams.interval = q->default_interval;
+		if (sk && sk->sk_protocol == IPPROTO_TCP) {
+			u32 srtt = tcp_sk(sk)->srtt;
+
+			if (srtt)
+				q->cparams.interval = tcp_srtt_to_codel(srtt);
+		}
 	}
 	return skb;
 }
@@ -330,7 +349,7 @@ static int fq_codel_change(struct Qdisc *sch, struct nlattr *opt)
 	if (tb[TCA_FQ_CODEL_INTERVAL]) {
 		u64 interval = nla_get_u32(tb[TCA_FQ_CODEL_INTERVAL]);
 
-		q->cparams.interval = (interval * NSEC_PER_USEC) >> CODEL_SHIFT;
+		q->default_interval = (interval * NSEC_PER_USEC) >> CODEL_SHIFT;
 	}
 
 	if (tb[TCA_FQ_CODEL_LIMIT])
@@ -441,7 +460,7 @@ static int fq_codel_dump(struct Qdisc *sch, struct sk_buff *skb)
 	    nla_put_u32(skb, TCA_FQ_CODEL_LIMIT,
 			sch->limit) ||
 	    nla_put_u32(skb, TCA_FQ_CODEL_INTERVAL,
-			codel_time_to_us(q->cparams.interval)) ||
+			codel_time_to_us(q->default_interval)) ||
 	    nla_put_u32(skb, TCA_FQ_CODEL_ECN,
 			q->cparams.ecn) ||
 	    nla_put_u32(skb, TCA_FQ_CODEL_QUANTUM,
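

For anyone who wants to sanity-check the unit conversion done by
tcp_srtt_to_codel() without building a kernel, a userspace sketch
follows. It assumes CODEL_SHIFT is 10 (codel time units of 1024 ns) and
takes HZ as a parameter; the SKETCH_* macros and both function names are
made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NSEC_PER_SEC	1000000000u
#define SKETCH_CODEL_SHIFT	10	/* assumed: 1 codel unit = 1024 ns */

/* Same arithmetic as tcp_srtt_to_codel() in the patch: srtt is in
 * jiffies scaled by 8, hence the extra shift by 3. The per-unit factor
 * is truncated before the multiply.
 */
static uint32_t srtt_to_codel(uint32_t srtt, uint32_t hz)
{
	assert(hz > 0);
	return srtt * ((SKETCH_NSEC_PER_SEC >> (SKETCH_CODEL_SHIFT + 3)) / hz);
}

/* Convert codel time units back to microseconds for inspection. */
static uint32_t codel_to_us(uint32_t t)
{
	return (uint32_t)(((uint64_t)t << SKETCH_CODEL_SHIFT) / 1000u);
}
```

With hz = 1000, srtt = 8 (one jiffy) gives 976 codel units, which reads
back as 999 us, and srtt = 800 (100 ms) reads back as 99942 us. The
small shortfall comes from truncating the per-unit factor to 122.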

