* [PATCH net-next-2.6 v3] net_sched: SFB flow scheduler
       [not found]   ` <1298390536.2861.9.camel@edumazet-laptop>
@ 2011-02-23 15:14     ` Eric Dumazet
  2011-02-23 15:43       ` Stephen Hemminger
                         ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Eric Dumazet @ 2011-02-23 15:14 UTC (permalink / raw)
  To: David Miller, Juliusz Chroboczek
  Cc: John W. Linville, Stephen Hemminger, Patrick McHardy, netdev, Andi Kleen

Hi David & Juliusz

Here is v3 of SFB (previous versions were from Juliusz).

Thanks

[PATCH net-next-2.6 v3] net_sched: SFB flow scheduler

This is the Stochastic Fair Blue scheduler, based on work from:

W. Feng, D. Kandlur, D. Saha, K. Shin. Blue: A New Class of Active Queue
Management Algorithms. U. Michigan CSE-TR-387-99, April 1999.

http://www.thefengs.com/wuchang/blue/CSE-TR-387-99.pdf

This implementation is based on work done by Juliusz Chroboczek

The general SFB algorithm can be found in figure 14, page 15:

B[l][n] : L x N array of bins (L levels, N bins per level)
enqueue()
Calculate hash function values h{0}, h{1}, .. h{L-1}
Update bins at each level
for i = 0 to L - 1
   if (B[i][h{i}].qlen > bin_size)
      B[i][h{i}].pm += delta;
   else if (B[i][h{i}].qlen == 0)
      B[i][h{i}].pm -= delta;
pmin = min(B[0][h{0}].pm ... B[L-1][h{L-1}].pm);
if (pmin == 1.0)
    ratelimit();
else
    mark/drop with probability pmin;
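For readers outside the kernel tree, the enqueue-time update above can be modelled in plain userspace C. This is a toy sketch of figure 14 only; the array sizes, fixed-point deltas, and all names below are chosen for illustration and do not come from the patch:

```c
#include <stdint.h>

#define SFB_LEVELS	8
#define SFB_BUCKETS	16
#define SFB_MAX_PROB	0xFFFF	/* Q0.16 fixed point: 0xFFFF is ~1.0 */

struct toy_bin {
	uint16_t qlen;	/* virtual queue length */
	uint16_t pm;	/* marking probability, Q0.16 */
};

static struct toy_bin toy_bins[SFB_LEVELS][SFB_BUCKETS];

/* Apply the per-level Blue update from figure 14 to every bin this
 * hash maps to, and return the minimum marking probability pmin.
 * The caller would then mark/drop with probability pmin, or rate
 * limit the flow once pmin saturates near 1.0. */
static uint32_t toy_sfb_update(uint32_t hash, uint16_t bin_size,
			       uint16_t d_inc, uint16_t d_dec)
{
	uint32_t pmin = SFB_MAX_PROB;
	int i;

	for (i = 0; i < SFB_LEVELS; i++, hash >>= 4) {
		struct toy_bin *b = &toy_bins[i][hash & (SFB_BUCKETS - 1)];

		if (b->qlen > bin_size) {
			/* bin over target: saturating increment */
			uint32_t p = (uint32_t)b->pm + d_inc;

			b->pm = p > SFB_MAX_PROB ? SFB_MAX_PROB : p;
		} else if (b->qlen == 0) {
			/* bin idle: saturating decrement */
			b->pm = b->pm > d_dec ? b->pm - d_dec : 0;
		}
		if (b->pm < pmin)
			pmin = b->pm;
	}
	return pmin;
}
```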

I adapted Juliusz's code to meet current kernel standards and made
various changes to address previous comments:

http://thread.gmane.org/gmane.linux.network/90225
http://thread.gmane.org/gmane.linux.network/90375

The default flow classifier is the rxhash introduced by RPS in 2.6.35,
but an external flow classifier can be used if desired.

tc qdisc add dev $IFB parent 1:11 handle 11:  \
	est 0.5sec 2sec sfb limit 128

tc filter add dev $DEV protocol ip parent 11: handle 3 \
	flow hash keys dst divisor 1024

Notes:

1) The default SFB child qdisc is pfifo_fast. It can be replaced by
another qdisc, but a child qdisc MUST NOT drop a packet it previously
queued. This is because SFB needs to see each dequeued packet in order
to maintain its virtual queue states. pfifo_head_drop and CHOKe should
not be used.

2) I added one field to qdisc_skb_cb because SFB needs to remember the
hash/classid of an skb to decrement virtual queue lengths at dequeue()
time.

3) ECN is enabled by default, unlike RED/CHOKe/GRED

With help from Patrick McHardy & Andi Kleen

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Juliusz Chroboczek <Juliusz.Chroboczek@pps.jussieu.fr>
CC: Stephen Hemminger <shemminger@vyatta.com>
CC: Patrick McHardy <kaber@trash.net>
CC: Andi Kleen <andi@firstfloor.org>
CC: John W. Linville <linville@tuxdriver.com>
---
 include/linux/pkt_sched.h |   38 +
 include/net/sch_generic.h |    1 
 net/sched/Kconfig         |   11 
 net/sched/Makefile        |    1 
 net/sched/sch_sfb.c       |  696 ++++++++++++++++++++++++++++++++++++
 5 files changed, 747 insertions(+)

diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h
index d4bb6f5..629a8b0 100644
--- a/include/linux/pkt_sched.h
+++ b/include/linux/pkt_sched.h
@@ -522,4 +522,42 @@ struct tc_mqprio_qopt {
 	__u16	offset[TC_QOPT_MAX_QUEUE];
 };
 
+/* SFB */
+
+enum {
+	TCA_SFB_UNSPEC,
+	TCA_SFB_PARMS,
+	__TCA_SFB_MAX,
+};
+
+#define TCA_SFB_MAX (__TCA_SFB_MAX - 1)
+
+/*
+ * Note: increment, decrement are Q0.16 fixed-point values.
+ */
+struct tc_sfb_qopt {
+	__u32 rehash_interval;	/* delay between hash flip, in seconds */
+	__u32 db_interval;	/* double buffering interval in seconds (db_interval < rehash_interval) */
+	__u32 max;		/* max len of qlen_min */
+	__u32 target;		/* bin_size */
+	__u32 increment;	/* delta, (d1 in Blue) */
+	__u32 decrement;	/* delta, (d2 in Blue) */
+	__u32 limit;		/* max SFB queue length */
+	__u32 penalty_rate;
+	__u32 penalty_burst;
+};
+
+struct tc_sfb_xstats {
+	__u32 earlydrop;
+	__u32 penaltydrop;
+	__u32 bucketdrop;
+	__u32 queuedrop;
+	__u32 childdrop; /* drops in child qdisc */
+	__u32 marked;
+	__u32 maxqlen;
+	__u32 maxprob;
+};
+
+#define SFB_MAX_PROB 0xFFFF
+
 #endif
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 16626a0..f40d32e 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -218,6 +218,7 @@ struct tcf_proto {
 
 struct qdisc_skb_cb {
 	unsigned int		pkt_len;
+	unsigned int		sfb_classid;
 	char			data[];
 };
 
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index 8c19b6e..a7a5583 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -126,6 +126,17 @@ config NET_SCH_RED
 	  To compile this code as a module, choose M here: the
 	  module will be called sch_red.
 
+config NET_SCH_SFB
+	tristate "Stochastic Fair Blue (SFB)"
+	---help---
+	  Say Y here if you want to use the Stochastic Fair Blue (SFB)
+	  packet scheduling algorithm.
+
+	  See the top of <file:net/sched/sch_sfb.c> for more details.
+
+	  To compile this code as a module, choose M here: the
+	  module will be called sch_sfb.
+
 config NET_SCH_SFQ
 	tristate "Stochastic Fairness Queueing (SFQ)"
 	---help---
diff --git a/net/sched/Makefile b/net/sched/Makefile
index 06c6cdf..2e77b8d 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -24,6 +24,7 @@ obj-$(CONFIG_NET_SCH_RED)	+= sch_red.o
 obj-$(CONFIG_NET_SCH_GRED)	+= sch_gred.o
 obj-$(CONFIG_NET_SCH_INGRESS)	+= sch_ingress.o 
 obj-$(CONFIG_NET_SCH_DSMARK)	+= sch_dsmark.o
+obj-$(CONFIG_NET_SCH_SFB)	+= sch_sfb.o
 obj-$(CONFIG_NET_SCH_SFQ)	+= sch_sfq.o
 obj-$(CONFIG_NET_SCH_TBF)	+= sch_tbf.o
 obj-$(CONFIG_NET_SCH_TEQL)	+= sch_teql.o
diff --git a/net/sched/sch_sfb.c b/net/sched/sch_sfb.c
new file mode 100644
index 0000000..b7f1c6e
--- /dev/null
+++ b/net/sched/sch_sfb.c
@@ -0,0 +1,696 @@
+/*
+ * net/sched/sch_sfb.c	  Stochastic Fair Blue
+ *
+ * Copyright (c) 2008-2011 Juliusz Chroboczek <jch@pps.jussieu.fr>
+ * Copyright (c) 2011 Eric Dumazet <eric.dumazet@gmail.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * version 2 as published by the Free Software Foundation.
+ *
+ * W. Feng, D. Kandlur, D. Saha, K. Shin. Blue: 
+ * A New Class of Active Queue Management Algorithms. 
+ * U. Michigan CSE-TR-387-99, April 1999.
+ *
+ * http://www.thefengs.com/wuchang/blue/CSE-TR-387-99.pdf
+ *
+ */
+
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <linux/skbuff.h>
+#include <linux/random.h>
+#include <linux/jhash.h>
+#include <net/ip.h>
+#include <net/pkt_sched.h>
+#include <net/inet_ecn.h>
+
+/*
+ * SFB uses two B[l][n] : L x N arrays of bins (L levels, N bins per level)
+ * This implementation uses L = 8 and N = 16
+ * This permits us to split one 32bit hash (provided per packet by rxhash or
+ * external classifier) into 8 subhashes of 4 bits.
+ */
+#define SFB_BUCKET_SHIFT 4
+#define SFB_NUMBUCKETS	(1 << SFB_BUCKET_SHIFT) /* N bins per Level */
+#define SFB_BUCKET_MASK (SFB_NUMBUCKETS - 1)
+#define SFB_LEVELS	(32 / SFB_BUCKET_SHIFT) /* L */
+
+/* SFB algo uses a virtual queue, named "bin" */
+struct sfb_bucket {
+	u16		qlen; /* length of virtual queue */
+	u16		pm; /* marking probability */
+};
+
+/* We use a double buffering right before hash change
+ * (Section 4.4 of SFB reference : moving hash functions)
+ */
+struct sfb_bins {
+	u32		  perturbation; /* jhash perturbation */
+	struct sfb_bucket bins[SFB_LEVELS][SFB_NUMBUCKETS];
+};
+
+struct sfb_sched_data {
+	struct Qdisc	*qdisc;
+	struct tcf_proto *filter_list;
+	unsigned long	rehash_interval;
+	unsigned long	db_interval;	/* interval of double buffering */
+	u32		max;
+	u32		target;		/* bin_size */
+	u32		increment;	/* d1 */
+	u32		decrement;	/* d2 */
+	u32		limit;		/* HARD maximal queue length */
+	u32		penalty_rate;
+	u32		penalty_burst;
+	u32		tokens_avail;
+	unsigned long	rehash_time;
+	unsigned long	token_time;
+
+	u8		slot;		/* current active bins (0 or 1) */
+	bool		double_buffering;
+	struct sfb_bins bins[2];
+
+	struct {
+		u32	earlydrop;
+		u32	penaltydrop;
+		u32	bucketdrop;
+		u32	queuedrop;
+		u32	childdrop;	/* drops in child qdisc */
+		u32	marked;		/* ECN mark */
+	} stats;
+};
+
+/*
+ * If using 'internal' SFB flow classifier, sfb_classid is skb rxhash
+ * If using external classifier, sfb_classid contains the classid.
+ */
+static u32 sfb_hash(const struct sk_buff *skb, u32 slot,
+		    struct sfb_sched_data *q)
+{
+	return jhash_1word(qdisc_skb_cb(skb)->sfb_classid,
+			   q->bins[slot].perturbation);
+}
+
+/* Probabilities are coded as Q0.16 fixed-point values,
+ * with 0xFFFF representing 65535/65536 (almost 1.0)
+ * Addition and subtraction are saturating in [0, 65535]
+ */
+static u32 prob_plus(u32 p1, u32 p2)
+{
+	u32 res = p1 + p2;
+
+	return min_t(u32, res, SFB_MAX_PROB);
+}
+
+static u32 prob_minus(u32 p1, u32 p2)
+{
+	return p1 > p2 ? p1 - p2 : 0;
+}
+
+static void increment_one_qlen(u32 sfbhash, u32 slot, struct sfb_sched_data *q)
+{
+	int i;
+	struct sfb_bucket *b = &q->bins[slot].bins[0][0];
+
+	for (i = 0; i < SFB_LEVELS; i++) {
+		u32 hash = sfbhash & SFB_BUCKET_MASK;
+
+		sfbhash >>= SFB_BUCKET_SHIFT;
+		if (b[hash].qlen < 0xFFFF)
+			b[hash].qlen++;
+		b += SFB_NUMBUCKETS; /* next level */
+	}
+}
+
+static void increment_qlen(u32 hashes[2], struct sfb_sched_data *q)
+{
+	u32 slot = q->slot;
+
+	increment_one_qlen(hashes[slot], slot, q);
+	if (q->double_buffering) {
+		slot ^= 1;
+		increment_one_qlen(hashes[slot], slot, q);
+	}
+}
+
+static void decrement_one_qlen(u32 sfbhash, u32 slot,
+			       struct sfb_sched_data *q)
+{
+	int i;
+	struct sfb_bucket *b = &q->bins[slot].bins[0][0];
+
+	for (i = 0; i < SFB_LEVELS; i++) {
+		u32 hash = sfbhash & SFB_BUCKET_MASK;
+
+		sfbhash >>= SFB_BUCKET_SHIFT;
+		if (b[hash].qlen > 0)
+			b[hash].qlen--;
+		b += SFB_NUMBUCKETS; /* next level */
+	}
+}
+
+static void decrement_qlen(struct sk_buff *skb, struct sfb_sched_data *q)
+{
+	u32 slot = q->slot;
+	u32 sfbhash = sfb_hash(skb, slot, q);
+
+	decrement_one_qlen(sfbhash, slot, q);
+	if (q->double_buffering) {
+		slot ^= 1;
+		sfbhash = sfb_hash(skb, slot, q);
+		decrement_one_qlen(sfbhash, slot, q);
+	}
+}
+
+static void decrement_prob(struct sfb_bucket *b, struct sfb_sched_data *q)
+{
+	b->pm =	prob_minus(b->pm, q->decrement);
+}
+
+static void increment_prob(struct sfb_bucket *b, struct sfb_sched_data *q)
+{
+	b->pm = prob_plus(b->pm, q->increment);
+}
+
+static void sfb_zero_all_buckets(int slot, struct sfb_sched_data *q)
+{
+	memset(&q->bins[slot], 0, sizeof(q->bins[slot]));
+}
+
+/*
+ * compute max qlen and max pm
+ */
+static u32 sfb_compute_qlen(u32 *prob_r, const struct sfb_sched_data *q)
+{
+	int i;
+	u32 qlen = 0, prob = 0;
+	const struct sfb_bucket *b = &q->bins[q->slot].bins[0][0];
+
+	for (i = 0; i < SFB_LEVELS * SFB_NUMBUCKETS; i++) {
+		if (qlen < b->qlen)
+			qlen = b->qlen;
+		if (prob < b->pm)
+			prob = b->pm;
+		b++;
+	}
+	*prob_r = prob;
+	return qlen;
+}
+
+
+static void sfb_init_perturbation(u32 slot, struct sfb_sched_data *q)
+{
+	q->bins[slot].perturbation = net_random();
+}
+
+static void sfb_swap_buffers(struct sfb_sched_data *q)
+{
+	sfb_zero_all_buckets(q->slot, q);
+	sfb_init_perturbation(q->slot, q);
+	q->slot ^= 1;
+	q->double_buffering = false;
+}
+
+/* Non elastic flows are allowed to use part of the bandwidth, expressed
+ * in "penalty_rate" packets per second, with "penalty_burst" burst
+ */
+static bool sfb_rate_limit(struct sk_buff *skb, struct sfb_sched_data *q)
+{
+	if (q->penalty_rate == 0 || q->penalty_burst == 0)
+		return true;
+
+	if (q->tokens_avail < 1) {
+		unsigned long age = min(10UL * HZ, jiffies - q->token_time);
+
+		q->tokens_avail = (age * q->penalty_rate) / HZ;
+		if (q->tokens_avail > q->penalty_burst)
+			q->tokens_avail = q->penalty_burst;
+		q->token_time = jiffies;
+		if (q->tokens_avail < 1)
+			return true;
+	}
+
+	q->tokens_avail--;
+	return false;
+}
+
+static bool sfb_classify(struct sk_buff *skb, struct sfb_sched_data *q,
+			 int *qerr)
+{
+	struct tcf_result res;
+	int result;
+
+	result = tc_classify(skb, q->filter_list, &res);
+	if (result >= 0) {
+#ifdef CONFIG_NET_CLS_ACT
+		switch (result) {
+		case TC_ACT_STOLEN:
+		case TC_ACT_QUEUED:
+			*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
+		case TC_ACT_SHOT:
+			return false;
+		}
+#endif
+		qdisc_skb_cb(skb)->sfb_classid = TC_H_MIN(res.classid);
+		return true;
+	}
+	return false;
+}
+
+static int sfb_enqueue(struct sk_buff *skb, struct Qdisc *sch)
+{
+
+	struct sfb_sched_data *q = qdisc_priv(sch);
+	struct Qdisc *child = q->qdisc;
+	int i;
+	u32 minprob = SFB_MAX_PROB;
+	u32 minqlen = ~0;
+	u32 r, slot, hashes[2], sfbhash;
+	int ret = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
+
+	if (q->filter_list) {
+		/* If using external classifiers, get result and record it. */
+		if (!sfb_classify(skb, q, &ret))
+			goto other_drop;
+	} else {
+		qdisc_skb_cb(skb)->sfb_classid = skb_get_rxhash(skb);
+	}
+
+	if (q->rehash_interval > 0) {
+		unsigned long limit = q->rehash_time + q->rehash_interval;
+
+		if (unlikely(time_after(jiffies, limit))) {
+			sfb_swap_buffers(q);
+			q->rehash_time = jiffies;
+		} else if (unlikely(!q->double_buffering && q->db_interval > 0 &&
+				    time_after(jiffies, limit - q->db_interval))) {
+			q->double_buffering = true;
+		}
+	}
+
+	slot = q->slot;
+
+	hashes[slot] = sfbhash = sfb_hash(skb, slot, q);
+	for (i = 0; i < SFB_LEVELS; i++) {
+		u32 hash = sfbhash & SFB_BUCKET_MASK;
+		struct sfb_bucket *b = &q->bins[slot].bins[i][hash];
+
+		sfbhash >>= SFB_BUCKET_SHIFT;
+		if (b->qlen == 0)
+			decrement_prob(b, q);
+		else if (unlikely(b->qlen >= q->target))
+			increment_prob(b, q);
+		if (minqlen > b->qlen)
+			minqlen = b->qlen;
+		if (minprob > b->pm)
+			minprob = b->pm;
+	}
+
+	if (q->double_buffering) {
+		slot ^= 1;
+		hashes[slot] = sfbhash = sfb_hash(skb, slot, q);
+		for (i = 0; i < SFB_LEVELS; i++) {
+			u32 hash = sfbhash & SFB_BUCKET_MASK;
+			struct sfb_bucket *b = &q->bins[slot].bins[i][hash];
+
+			sfbhash >>= SFB_BUCKET_SHIFT;
+			if (b->qlen == 0)
+				decrement_prob(b, q);
+			else if (unlikely(b->qlen >= q->target))
+				increment_prob(b, q);
+		}
+	}
+
+	if (unlikely(minqlen >= q->max || sch->q.qlen >= q->limit)) {
+		sch->qstats.overlimits++;
+		if (minqlen >= q->max)
+			q->stats.bucketdrop++;
+		else
+			q->stats.queuedrop++;
+		goto drop;
+	}
+
+	if (unlikely(minprob >= SFB_MAX_PROB)) {
+		/* Inelastic flow */
+		if (sfb_rate_limit(skb, q)) {
+			sch->qstats.overlimits++;
+			q->stats.penaltydrop++;
+			goto drop;
+		}
+		goto enqueue;
+	}
+
+	r = net_random() & SFB_MAX_PROB;
+
+	if (unlikely(r < minprob)) {
+		if (unlikely(minprob > SFB_MAX_PROB / 2)) {
+			/* If we're marking that many packets, then either
+			 * this flow is unresponsive, or we're badly congested.
+			 * In either case, we want to start dropping packets.
+			 */
+			if (r < (minprob - SFB_MAX_PROB / 2) * 2) {
+				q->stats.earlydrop++;
+				goto drop;
+			}
+		}
+		if (INET_ECN_set_ce(skb)) {
+			q->stats.marked++;
+		} else {
+			q->stats.earlydrop++;
+			goto drop;
+		}
+	}
+
+enqueue:
+	ret = qdisc_enqueue(skb, child);
+	if (likely(ret == NET_XMIT_SUCCESS)) {
+		sch->q.qlen++;
+		increment_qlen(hashes, q);
+	} else if (net_xmit_drop_count(ret)) {
+		q->stats.childdrop++;
+		sch->qstats.drops++;
+	}
+	return ret;
+
+drop:
+	qdisc_drop(skb, sch);
+	return NET_XMIT_CN;
+other_drop:
+	if (ret & __NET_XMIT_BYPASS)
+		sch->qstats.drops++;
+	kfree_skb(skb);
+	return ret;
+}
+
+static struct sk_buff *sfb_dequeue(struct Qdisc *sch)
+{
+	struct sfb_sched_data *q = qdisc_priv(sch);
+	struct Qdisc *child = q->qdisc;
+	struct sk_buff *skb;
+
+	skb = child->dequeue(q->qdisc);
+
+	if (skb) {
+		qdisc_bstats_update(sch, skb);
+		sch->q.qlen--;
+		decrement_qlen(skb, q);
+	}
+
+	return skb;
+}
+
+static struct sk_buff *sfb_peek(struct Qdisc *sch)
+{
+	struct sfb_sched_data *q = qdisc_priv(sch);
+	struct Qdisc *child = q->qdisc;
+
+	return child->ops->peek(child);
+}
+
+/* No sfb_drop -- impossible since the child doesn't return the dropped skb. */
+
+static void sfb_reset(struct Qdisc *sch)
+{
+	struct sfb_sched_data *q = qdisc_priv(sch);
+
+	qdisc_reset(q->qdisc);
+	sch->q.qlen = 0;
+	q->slot = 0;
+	q->double_buffering = false;
+	sfb_zero_all_buckets(0, q);
+	sfb_zero_all_buckets(1, q);
+	sfb_init_perturbation(0, q);
+}
+
+static void sfb_destroy(struct Qdisc *sch)
+{
+	struct sfb_sched_data *q = qdisc_priv(sch);
+
+	tcf_destroy_chain(&q->filter_list);
+	qdisc_destroy(q->qdisc);
+}
+
+static const struct nla_policy sfb_policy[TCA_SFB_MAX + 1] = {
+	[TCA_SFB_PARMS]	= { .len = sizeof(struct tc_sfb_qopt) },
+};
+
+static int sfb_change(struct Qdisc *sch, struct nlattr *opt)
+{
+	struct sfb_sched_data *q = qdisc_priv(sch);
+	struct Qdisc *child = NULL;
+	struct nlattr *tb[TCA_SFB_MAX + 1];
+	struct tc_sfb_qopt *ctl;
+	u32 rehash_interval, db_interval;
+	u32 limit;
+	u32 max, target;
+	u32 increment, decrement;
+	u32 penalty_rate, penalty_burst;
+	int err;
+
+	if (opt == NULL) {
+		rehash_interval = 600;
+		db_interval = 60;
+		limit = 0;
+		max = 25;
+		target = 20;
+		increment = (SFB_MAX_PROB + 500) / 1000; /* 0.1 % */
+		decrement = (SFB_MAX_PROB + 3000) / 6000;
+		penalty_rate = 10;
+		penalty_burst = 20;
+	} else {
+		err = nla_parse_nested(tb, TCA_SFB_MAX, opt, sfb_policy);
+		if (err < 0)
+			return -EINVAL;
+
+		if (tb[TCA_SFB_PARMS] == NULL)
+			return -EINVAL;
+
+		ctl = nla_data(tb[TCA_SFB_PARMS]);
+
+		rehash_interval = ctl->rehash_interval;
+		db_interval = ctl->db_interval;
+		limit = ctl->limit;
+		max = ctl->max;
+		target = ctl->target;
+		increment = ctl->increment;
+		decrement = ctl->decrement;
+		penalty_rate = ctl->penalty_rate;
+		penalty_burst = ctl->penalty_burst;
+	}
+
+	if (limit == 0)
+		limit = qdisc_dev(sch)->tx_queue_len;
+	if (limit == 0)
+		limit = 1;
+
+	child = fifo_create_dflt(sch, &pfifo_qdisc_ops, limit);
+	if (IS_ERR(child))
+		return PTR_ERR(child);
+
+	sch_tree_lock(sch);
+
+	qdisc_tree_decrease_qlen(q->qdisc, q->qdisc->q.qlen);
+	qdisc_destroy(q->qdisc);
+	q->qdisc = child;
+
+	q->rehash_interval = (unsigned long)rehash_interval * HZ;
+	q->db_interval = (unsigned long)db_interval * HZ;
+	q->rehash_time = jiffies;
+	q->limit = limit;
+	q->increment = increment;
+	q->decrement = decrement;
+	q->max = max;
+	q->target = target;
+	q->penalty_rate = penalty_rate;
+	q->penalty_burst = penalty_burst;
+	q->tokens_avail = penalty_burst;
+	q->token_time = jiffies;
+
+	q->slot = 0;
+	q->double_buffering = false;
+	sfb_zero_all_buckets(0, q);
+	sfb_zero_all_buckets(1, q);
+	sfb_init_perturbation(0, q);
+
+	sch_tree_unlock(sch);
+
+	return 0;
+}
+
+static int sfb_init(struct Qdisc *sch, struct nlattr *opt)
+{
+	struct sfb_sched_data *q = qdisc_priv(sch);
+
+	q->qdisc = &noop_qdisc;
+	return sfb_change(sch, opt);
+}
+
+static int sfb_dump(struct Qdisc *sch, struct sk_buff *skb)
+{
+	struct sfb_sched_data *q = qdisc_priv(sch);
+	struct nlattr *opts;
+	struct tc_sfb_qopt opt = {
+		.rehash_interval = q->rehash_interval / HZ,
+		.db_interval = q->db_interval / HZ,
+		.limit = q->limit,
+		.max = q->max,
+		.target = q->target,
+		.increment = q->increment,
+		.decrement = q->decrement,
+		.penalty_rate = q->penalty_rate,
+		.penalty_burst = q->penalty_burst,
+	};
+
+	sch->qstats.backlog = q->qdisc->qstats.backlog;
+	opts = nla_nest_start(skb, TCA_OPTIONS);
+	NLA_PUT(skb, TCA_SFB_PARMS, sizeof(opt), &opt);
+	return nla_nest_end(skb, opts);
+
+nla_put_failure:
+	nla_nest_cancel(skb, opts);
+	return -EMSGSIZE;
+}
+
+static int sfb_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
+{
+	struct sfb_sched_data *q = qdisc_priv(sch);
+	struct tc_sfb_xstats st = {
+		.earlydrop = q->stats.earlydrop,
+		.penaltydrop = q->stats.penaltydrop,
+		.bucketdrop = q->stats.bucketdrop,
+		.queuedrop = q->stats.queuedrop,
+		.childdrop = q->stats.childdrop,
+		.marked = q->stats.marked,
+	};
+
+	st.maxqlen = sfb_compute_qlen(&st.maxprob, q);
+
+	return gnet_stats_copy_app(d, &st, sizeof(st));
+}
+
+static int sfb_dump_class(struct Qdisc *sch, unsigned long cl,
+			  struct sk_buff *skb, struct tcmsg *tcm)
+{
+	return -ENOSYS;
+}
+
+static int sfb_graft(struct Qdisc *sch, unsigned long arg, struct Qdisc *new,
+		     struct Qdisc **old)
+{
+	struct sfb_sched_data *q = qdisc_priv(sch);
+
+	if (new == NULL)
+		new = &noop_qdisc;
+
+	sch_tree_lock(sch);
+	*old = q->qdisc;
+	q->qdisc = new;
+	qdisc_tree_decrease_qlen(*old, (*old)->q.qlen);
+	qdisc_reset(*old);
+	sch_tree_unlock(sch);
+	return 0;
+}
+
+static struct Qdisc *sfb_leaf(struct Qdisc *sch, unsigned long arg)
+{
+	struct sfb_sched_data *q = qdisc_priv(sch);
+
+	return q->qdisc;
+}
+
+static unsigned long sfb_get(struct Qdisc *sch, u32 classid)
+{
+	return 1;
+}
+
+static void sfb_put(struct Qdisc *sch, unsigned long arg)
+{
+}
+
+static int sfb_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
+			    struct nlattr **tca, unsigned long *arg)
+{
+	return -ENOSYS;
+}
+
+static int sfb_delete(struct Qdisc *sch, unsigned long cl)
+{
+	return -ENOSYS;
+}
+
+static void sfb_walk(struct Qdisc *sch, struct qdisc_walker *walker)
+{
+	if (!walker->stop) {
+		if (walker->count >= walker->skip)
+			if (walker->fn(sch, 1, walker) < 0) {
+				walker->stop = 1;
+				return;
+			}
+		walker->count++;
+	}
+}
+
+static struct tcf_proto **sfb_find_tcf(struct Qdisc *sch, unsigned long cl)
+{
+	struct sfb_sched_data *q = qdisc_priv(sch);
+
+	if (cl)
+		return NULL;
+	return &q->filter_list;
+}
+
+static unsigned long sfb_bind(struct Qdisc *sch, unsigned long parent,
+			      u32 classid)
+{
+	return 0;
+}
+
+
+static const struct Qdisc_class_ops sfb_class_ops = {
+	.graft		=	sfb_graft,
+	.leaf		=	sfb_leaf,
+	.get		=	sfb_get,
+	.put		=	sfb_put,
+	.change		=	sfb_change_class,
+	.delete		=	sfb_delete,
+	.walk		=	sfb_walk,
+	.tcf_chain	=	sfb_find_tcf,
+	.bind_tcf	=	sfb_bind,
+	.unbind_tcf	=	sfb_put,
+	.dump		=	sfb_dump_class,
+};
+
+struct Qdisc_ops sfb_qdisc_ops __read_mostly = {
+	.id		=	"sfb",
+	.priv_size	=	sizeof(struct sfb_sched_data),
+	.cl_ops		=	&sfb_class_ops,
+	.enqueue	=	sfb_enqueue,
+	.dequeue	=	sfb_dequeue,
+	.peek		=	sfb_peek,
+	.init		=	sfb_init,
+	.reset		=	sfb_reset,
+	.destroy	=	sfb_destroy,
+	.change		=	sfb_change,
+	.dump		=	sfb_dump,
+	.dump_stats	=	sfb_dump_stats,
+	.owner		=	THIS_MODULE,
+};
+
+static int __init sfb_module_init(void)
+{
+	return register_qdisc(&sfb_qdisc_ops);
+}
+
+static void __exit sfb_module_exit(void)
+{
+	unregister_qdisc(&sfb_qdisc_ops);
+}
+
+module_init(sfb_module_init)
+module_exit(sfb_module_exit)
+
+MODULE_DESCRIPTION("Stochastic Fair Blue queue discipline");
+MODULE_AUTHOR("Juliusz Chroboczek");
+MODULE_LICENSE("GPL");



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next-2.6 v3] net_sched: SFB flow scheduler
  2011-02-23 15:14     ` [PATCH net-next-2.6 v3] net_sched: SFB flow scheduler Eric Dumazet
@ 2011-02-23 15:43       ` Stephen Hemminger
  2011-02-23 16:13         ` Eric Dumazet
  2011-02-23 16:24       ` Patrick McHardy
  2011-02-23 20:56       ` [PATCH net-next-2.6 v4] net_sched: SFB flow scheduler Eric Dumazet
  2 siblings, 1 reply; 22+ messages in thread
From: Stephen Hemminger @ 2011-02-23 15:43 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, Juliusz Chroboczek, John W. Linville,
	Patrick McHardy, netdev, Andi Kleen

On Wed, 23 Feb 2011 16:14:51 +0100
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> 1) SFB default child qdisc is pfifo_fast. It can be changed by another
> qdisc but a child qdisc MUST not drop a packet previously queued. This
> is because SFB needs to handle a dequeued packet in order to maintain
> its virtual queue states. pfifo_head_drop or CHOKe should not be used.

Why not add a flag field to Qdisc_ops and to mark qdisc's that
are (or not) work conserving?

-- 
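One way to read Stephen's suggestion, sketched here as userspace C: a capability bit on the ops structure that a parent like SFB could check at graft time. The flag name and structure layout are invented for illustration; no such field exists in the tree at this point, and Patrick argues below that work conservation is not quite the right property anyway.

```c
/* Hypothetical sketch only: a flag marking qdiscs that may drop a
 * packet after it was queued (e.g. pfifo_head_drop, CHOKe), so SFB
 * could refuse them as children. TCQ_F_CAN_DROP_QUEUED and this
 * struct are illustrative, not real kernel API. */
#define TCQ_F_CAN_DROP_QUEUED	0x1u

struct qdisc_ops_sketch {
	const char	*id;
	unsigned int	flags;
};

/* SFB would accept a child only if it never drops queued packets. */
static int sfb_child_ok(const struct qdisc_ops_sketch *ops)
{
	return !(ops->flags & TCQ_F_CAN_DROP_QUEUED);
}
```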


* Re: [PATCH net-next-2.6 v3] net_sched: SFB flow scheduler
  2011-02-23 15:43       ` Stephen Hemminger
@ 2011-02-23 16:13         ` Eric Dumazet
  2011-02-23 16:20           ` Patrick McHardy
  0 siblings, 1 reply; 22+ messages in thread
From: Eric Dumazet @ 2011-02-23 16:13 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: David Miller, Juliusz Chroboczek, John W. Linville,
	Patrick McHardy, netdev, Andi Kleen

On Wednesday, 23 February 2011 at 07:43 -0800, Stephen Hemminger wrote:
> On Wed, 23 Feb 2011 16:14:51 +0100
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> > 1) SFB default child qdisc is pfifo_fast. It can be changed by another
> > qdisc but a child qdisc MUST not drop a packet previously queued. This
> > is because SFB needs to handle a dequeued packet in order to maintain
> > its virtual queue states. pfifo_head_drop or CHOKe should not be used.
> 
> Why not add a flag field to Qdisc_ops and to mark qdisc's that
> are (or not) work conserving?
> 

That was my initial idea, but I have no idea how to implement it
(outside of the fast path, I mean...)





* Re: [PATCH net-next-2.6 v3] net_sched: SFB flow scheduler
  2011-02-23 16:13         ` Eric Dumazet
@ 2011-02-23 16:20           ` Patrick McHardy
  0 siblings, 0 replies; 22+ messages in thread
From: Patrick McHardy @ 2011-02-23 16:20 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Stephen Hemminger, David Miller, Juliusz Chroboczek,
	John W. Linville, netdev, Andi Kleen

On 23.02.2011 17:13, Eric Dumazet wrote:
> On Wednesday, 23 February 2011 at 07:43 -0800, Stephen Hemminger wrote:
>> On Wed, 23 Feb 2011 16:14:51 +0100
>> Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>
>>> 1) SFB default child qdisc is pfifo_fast. It can be changed by another
>>> qdisc but a child qdisc MUST not drop a packet previously queued. This
>>> is because SFB needs to handle a dequeued packet in order to maintain
>>> its virtual queue states. pfifo_head_drop or CHOKe should not be used.
>>
>> Why not add a flag field to Qdisc_ops and to mark qdisc's that
>> are (or not) work conserving?
>>
> 
> That was my initial idea, but have no idea how to implement it (outside
> of fast path, I mean...)

This also doesn't really have anything to do with work-conserving
qdiscs; SFB, for instance, is work conserving but might still drop
other packets. Actually, I don't think there's any qdisc besides the
*fifos that can reasonably be used with SFB, so we might as well
only support a built-in qdisc.



* Re: [PATCH net-next-2.6 v3] net_sched: SFB flow scheduler
  2011-02-23 15:14     ` [PATCH net-next-2.6 v3] net_sched: SFB flow scheduler Eric Dumazet
  2011-02-23 15:43       ` Stephen Hemminger
@ 2011-02-23 16:24       ` Patrick McHardy
  2011-02-23 16:48         ` Eric Dumazet
  2011-02-23 17:05         ` [PATCH] net_sched: long word align struct qdisc_skb_cb data Eric Dumazet
  2011-02-23 20:56       ` [PATCH net-next-2.6 v4] net_sched: SFB flow scheduler Eric Dumazet
  2 siblings, 2 replies; 22+ messages in thread
From: Patrick McHardy @ 2011-02-23 16:24 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, Juliusz Chroboczek, John W. Linville,
	Stephen Hemminger, netdev, Andi Kleen

On 23.02.2011 16:14, Eric Dumazet wrote:
> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
> index 16626a0..f40d32e 100644
> --- a/include/net/sch_generic.h
> +++ b/include/net/sch_generic.h
> @@ -218,6 +218,7 @@ struct tcf_proto {
>  
>  struct qdisc_skb_cb {
>  	unsigned int		pkt_len;
> +	unsigned int		sfb_classid;
>  	char			data[];
>  };

This could be moved into a SFB specific cb, similar to what netem
does.

> diff --git a/net/sched/sch_sfb.c b/net/sched/sch_sfb.c
> new file mode 100644
> index 0000000..b7f1c6e
> --- /dev/null
> +++ b/net/sched/sch_sfb.c
> @@ -0,0 +1,696 @@
> +/*
> + * net/sched/sch_sfb.c	  Stochastic Fair Blue
> + *
> + * Copyright (c) 2008-2011 Juliusz Chroboczek <jch@pps.jussieu.fr>
> + * Copyright (c) 2011 Eric Dumazet <eric.dumazet@gmail.com>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * version 2 as published by the Free Software Foundation.
> + *
> + * W. Feng, D. Kandlur, D. Saha, K. Shin. Blue: 
> + * A New Class of Active Queue Management Algorithms. 
> + * U. Michigan CSE-TR-387-99, April 1999.
> + *
> + * http://www.thefengs.com/wuchang/blue/CSE-TR-387-99.pdf
> + *
> + */
> +
> +#include <linux/module.h>
> +#include <linux/types.h>
> +#include <linux/kernel.h>
> +#include <linux/errno.h>
> +#include <linux/skbuff.h>
> +#include <linux/random.h>
> +#include <linux/jhash.h>
> +#include <net/ip.h>
> +#include <net/pkt_sched.h>
> +#include <net/inet_ecn.h>
> +
> +/*
> + * SFB uses two B[l][n] : L x N arrays of bins (L levels, N bins per level)
> + * This implementation uses L = 8 and N = 16
> + * This permits us to split one 32bit hash (provided per packet by rxhash or
> + * external classifier) into 8 subhashes of 4 bits.
> + */
> +#define SFB_BUCKET_SHIFT 4

If you want to make this dynamic, there are a couple of papers analyzing
combined hash functions for bloom filters, f.i.
"Less Hashing, Same Performance: Building a Better Bloom Filter".

> +/*
> + * If using 'internal' SFB flow classifier, sfb_classid is skb rxhash
> + * If using external classifier, sfb_classid contains the classid.
> + */
> +static u32 sfb_hash(const struct sk_buff *skb, u32 slot,
> +		    struct sfb_sched_data *q)
> +{
> +	return jhash_1word(qdisc_skb_cb(skb)->sfb_classid,
> +			   q->bins[slot].perturbation);
> +}
> +
> +/* Probabilities are coded as Q0.16 fixed-point values,
> + * with 0xFFFF representing 65535/65536 (almost 1.0)
> + * Addition and subtraction are saturating in [0, 65535]
> + */
> +static u32 prob_plus(u32 p1, u32 p2)
> +{
> +	u32 res = p1 + p2;
> +
> +	return min_t(u32, res, SFB_MAX_PROB);
> +}
> +
> +static u32 prob_minus(u32 p1, u32 p2)
> +{
> +	return p1 > p2 ? p1 - p2 : 0;
> +}
> +
> +static void increment_one_qlen(u32 sfbhash, u32 slot, struct sfb_sched_data *q)
> +{
> +	int i;
> +	struct sfb_bucket *b = &q->bins[slot].bins[0][0];
> +
> +	for (i = 0; i < SFB_LEVELS; i++) {
> +		u32 hash = sfbhash & SFB_BUCKET_MASK;
> +
> +		sfbhash >>= SFB_BUCKET_SHIFT;
> +		if (b[hash].qlen < 0xFFFF)
> +			b[hash].qlen++;
> +		b += SFB_NUMBUCKETS; /* next level */
> +	}
> +}
> +
> +static void increment_qlen(u32 hashes[2], struct sfb_sched_data *q)
> +{
> +	u32 slot = q->slot;
> +
> +	increment_one_qlen(hashes[slot], slot, q);
> +	if (q->double_buffering) {
> +		slot ^= 1;
> +		increment_one_qlen(hashes[slot], slot, q);
> +	}
> +}
> +
> +static void decrement_one_qlen(u32 sfbhash, u32 slot,
> +			       struct sfb_sched_data *q)
> +{
> +	int i;
> +	struct sfb_bucket *b = &q->bins[slot].bins[0][0];
> +
> +	for (i = 0; i < SFB_LEVELS; i++) {
> +		u32 hash = sfbhash & SFB_BUCKET_MASK;
> +
> +		sfbhash >>= SFB_BUCKET_SHIFT;
> +		if (b[hash].qlen > 0)
> +			b[hash].qlen--;
> +		b += SFB_NUMBUCKETS; /* next level */
> +	}
> +}
> +
> +static void decrement_qlen(struct sk_buff *skb, struct sfb_sched_data *q)
> +{
> +	u32 slot = q->slot;
> +	u32 sfbhash = sfb_hash(skb, slot, q);
> +
> +	decrement_one_qlen(sfbhash, slot, q);
> +	if (q->double_buffering) {

This needs to be a per-skb property, otherwise you could have the
situation:

- enqueue skb, double_buffering=0, increment buffer 0
- enable double buffering
- swap buffers
- dequeue same skb, decrement buffer 1

after which the qlen values of buffer 1 will be incorrect.


> +		slot ^= 1;
> +		sfbhash = sfb_hash(skb, slot, q);

Isn't there room in the cb to store both hash values?

> +		decrement_one_qlen(sfbhash, slot, q);
> +	}
> +}
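Patrick's suggestion can be sketched in userspace C: compute both slots' hashes once at enqueue and carry them in the per-skb control block, so dequeue decrements exactly the bins that were incremented, even if the active slot flipped in between. The struct and function names are illustrative, not from the patch:

```c
#include <stdint.h>

/* Per-skb record of both slots' hashes, filled at enqueue time. */
struct sfb_cb_sketch {
	uint32_t hashes[2];	/* one hash per double-buffer slot */
};

static void sfb_record_hashes(struct sfb_cb_sketch *cb,
			      uint32_t h0, uint32_t h1)
{
	cb->hashes[0] = h0;
	cb->hashes[1] = h1;
}

/* At dequeue, use the stored hash rather than rehashing with a
 * perturbation key that may have changed since enqueue. */
static uint32_t sfb_dequeue_hash(const struct sfb_cb_sketch *cb,
				 unsigned int slot)
{
	return cb->hashes[slot & 1];
}
```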
> +


* Re: [PATCH net-next-2.6 v3] net_sched: SFB flow scheduler
  2011-02-23 16:24       ` Patrick McHardy
@ 2011-02-23 16:48         ` Eric Dumazet
  2011-02-23 16:58           ` Patrick McHardy
  2011-02-23 17:05         ` [PATCH] net_sched: long word align struct qdisc_skb_cb data Eric Dumazet
  1 sibling, 1 reply; 22+ messages in thread
From: Eric Dumazet @ 2011-02-23 16:48 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: David Miller, Juliusz Chroboczek, John W. Linville,
	Stephen Hemminger, netdev, Andi Kleen

Le mercredi 23 février 2011 à 17:24 +0100, Patrick McHardy a écrit :

> This needs to be a per-skb property, otherwise you could have the
> situation:
> 
> - enqueue skb, double_buffering=0, increment buffer 0
> - enable double buffering
> - swap buffers
> - dequeue same skb, decrement buffer 1
> 
> after which the qlen values of buffer 1 will be incorrect.
> 

Normally it's OK, because we bzero() the zone, and the "decrement" is
0-bounded.

I had this idea (of storing two bits per skb), but :

- It means that swap_buffer() should not touch (bzero) the 'old' bins

- Since hash perturbator is changed, we have to store the two hash
values per skb (instead of one u32 / classid).


> 
> > +		slot ^= 1;
> > +		sfbhash = sfb_hash(skb, slot, q);
> 
> Isn't there room in the cb to store both hash values?

Yes, I am going to implement your idea; it's probably OK to use two u32s
in the skb_cb for this.

Thanks !



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next-2.6 v3] net_sched: SFB flow scheduler
  2011-02-23 16:48         ` Eric Dumazet
@ 2011-02-23 16:58           ` Patrick McHardy
  2011-02-23 17:16             ` Eric Dumazet
  0 siblings, 1 reply; 22+ messages in thread
From: Patrick McHardy @ 2011-02-23 16:58 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, Juliusz Chroboczek, John W. Linville,
	Stephen Hemminger, netdev, Andi Kleen

Am 23.02.2011 17:48, schrieb Eric Dumazet:
> Le mercredi 23 février 2011 à 17:24 +0100, Patrick McHardy a écrit :
> 
>> This needs to be a per-skb property, otherwise you could have the
>> situation:
>>
>> - enqueue skb, double_buffering=0, increment buffer 0
>> - enable double buffering
>> - swap buffers
>> - dequeue same skb, decrement buffer 1
>>
>> after which the qlen values of buffer 1 will be incorrect.
>>
> 
> Normally its OK, because we bzero() the zone, and the "decrement" is
> 0-bounded.

Yeah, but we might decrement buckets of different flows which
are non-zero. Probably not too bad, but still not correct.

> I had this idea (of storing two bits per skb), but :
> 
> - It means that swap_buffer() should not touch (bzero) the 'old' bins

Yes, it means we have to properly decrement the old buffer
until all bins reached zero.

> - Since hash perturbator is changed, we have to store the two hash
> values per skb (instead of one u32 / classid).

Indeed.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH] net_sched: long word align struct qdisc_skb_cb data
  2011-02-23 16:24       ` Patrick McHardy
  2011-02-23 16:48         ` Eric Dumazet
@ 2011-02-23 17:05         ` Eric Dumazet
  2011-02-23 17:30           ` Stephen Hemminger
  1 sibling, 1 reply; 22+ messages in thread
From: Eric Dumazet @ 2011-02-23 17:05 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: David Miller, Juliusz Chroboczek, John W. Linville,
	Stephen Hemminger, netdev, Andi Kleen

Le mercredi 23 février 2011 à 17:24 +0100, Patrick McHardy a écrit :
> Am 23.02.2011 16:14, schrieb Eric Dumazet:
> > diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
> > index 16626a0..f40d32e 100644
> > --- a/include/net/sch_generic.h
> > +++ b/include/net/sch_generic.h
> > @@ -218,6 +218,7 @@ struct tcf_proto {
> >  
> >  struct qdisc_skb_cb {
> >  	unsigned int		pkt_len;
> > +	unsigned int		sfb_classid;
> >  	char			data[];
> >  };
> 
> This could be moved into a SFB specific cb, similar to what netem
> does.

Hmm... well... I want to be sure no other sch will destroy my values.

netem seems buggy then.

Probably following patch is needed ?

Thanks

[PATCH] net_sched: long word align struct qdisc_skb_cb data

netem_skb_cb() does :

return (struct netem_skb_cb *)qdisc_skb_cb(skb)->data;

Unfortunately struct qdisc_skb_cb data is not long-word aligned, so
access to psched_time_t time_to_send uses a non-aligned access.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/net/sch_generic.h |    2 +-
 1 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 16626a0..dbdf2b5 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -218,7 +218,7 @@ struct tcf_proto {
 
 struct qdisc_skb_cb {
 	unsigned int		pkt_len;
-	char			data[];
+	long			data[];
 };
 
 static inline int qdisc_qlen(struct Qdisc *q)



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next-2.6 v3] net_sched: SFB flow scheduler
  2011-02-23 16:58           ` Patrick McHardy
@ 2011-02-23 17:16             ` Eric Dumazet
  0 siblings, 0 replies; 22+ messages in thread
From: Eric Dumazet @ 2011-02-23 17:16 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: David Miller, Juliusz Chroboczek, John W. Linville,
	Stephen Hemminger, netdev, Andi Kleen

Le mercredi 23 février 2011 à 17:58 +0100, Patrick McHardy a écrit :
> Am 23.02.2011 17:48, schrieb Eric Dumazet:
> > Le mercredi 23 février 2011 à 17:24 +0100, Patrick McHardy a écrit :
> > 
> >> This needs to be a per-skb property, otherwise you could have the
> >> situation:
> >>
> >> - enqueue skb, double_buffering=0, increment buffer 0
> >> - enable double buffering
> >> - swap buffers
> >> - dequeue same skb, decrement buffer 1
> >>
> >> after which the qlen values of buffer 1 will be incorrect.
> >>
> > 
> > Normally its OK, because we bzero() the zone, and the "decrement" is
> > 0-bounded.
> 
> Yeah, but we might decrement buckets of different flows which
> are non-zero. Probably not too bad, but still not correct.
> 
> > I had this idea (of storing two bits per skb), but :
> > 
> > - It means that swap_buffer() should not touch (bzero) the 'old' bins
> 
> Yes, it means we have to properly decrement the old buffer
> until all bins reached zero.
> 
> > - Since hash perturbator is changed, we have to store the two hash
> > values per skb (instead of one u32 / classid).
> 
> Indeed.

BTW, I had this idea of storing the double_buffer state per skb while
reading the SFB paper, because the paper says double buffering is really
needed only for inelastic flows, not for all packets.

Paper quote:

As one set of bins is being used
for queue management, a second set of bins using the next set of hash
functions can be warmed up. In this
case, any time a flow is classified as non-responsive, it is hashed
using the second set of hash functions and
the marking probabilities of the corresponding bins in the warmup set
are updated.

So using two 'hash' values per skb is the way to go, with a special 0
value meaning: skb was not 'inserted' into the virtual queues.




^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] net_sched: long word align struct qdisc_skb_cb data
  2011-02-23 17:05         ` [PATCH] net_sched: long word align struct qdisc_skb_cb data Eric Dumazet
@ 2011-02-23 17:30           ` Stephen Hemminger
  2011-02-23 22:17             ` David Miller
  0 siblings, 1 reply; 22+ messages in thread
From: Stephen Hemminger @ 2011-02-23 17:30 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Patrick McHardy, David Miller, Juliusz Chroboczek,
	John W. Linville, netdev, Andi Kleen

On Wed, 23 Feb 2011 18:05:07 +0100
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> Le mercredi 23 février 2011 à 17:24 +0100, Patrick McHardy a écrit :
> > Am 23.02.2011 16:14, schrieb Eric Dumazet:
> > > diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
> > > index 16626a0..f40d32e 100644
> > > --- a/include/net/sch_generic.h
> > > +++ b/include/net/sch_generic.h
> > > @@ -218,6 +218,7 @@ struct tcf_proto {
> > >  
> > >  struct qdisc_skb_cb {
> > >  	unsigned int		pkt_len;
> > > +	unsigned int		sfb_classid;
> > >  	char			data[];
> > >  };
> > 
> > This could be moved into a SFB specific cb, similar to what netem
> > does.
> 
> Hmm... well... I want to be sure no other sch will destroy my values.
> 
> netem seems buggy then.
> 
> Probably following patch is needed ?

Yes, it was long word aligned when netem was written but
we seem to have bit creep.



-- 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH net-next-2.6 v4] net_sched: SFB flow scheduler
  2011-02-23 15:14     ` [PATCH net-next-2.6 v3] net_sched: SFB flow scheduler Eric Dumazet
  2011-02-23 15:43       ` Stephen Hemminger
  2011-02-23 16:24       ` Patrick McHardy
@ 2011-02-23 20:56       ` Eric Dumazet
  2011-02-23 21:11         ` Stephen Hemminger
  2011-02-23 22:06         ` David Miller
  2 siblings, 2 replies; 22+ messages in thread
From: Eric Dumazet @ 2011-02-23 20:56 UTC (permalink / raw)
  To: David Miller
  Cc: Juliusz Chroboczek, John W. Linville, Stephen Hemminger,
	Patrick McHardy, netdev, Andi Kleen

This is the Stochastic Fair Blue scheduler, based on work from :

W. Feng, D. Kandlur, D. Saha, K. Shin. Blue: A New Class of Active Queue
Management Algorithms. U. Michigan CSE-TR-387-99, April 1999.

http://www.thefengs.com/wuchang/blue/CSE-TR-387-99.pdf

This implementation is based on work done by Juliusz Chroboczek

General SFB algorithm can be found in figure 14, page 15:

B[l][n] : L x N array of bins (L levels, N bins per level)
enqueue()
Calculate hash function values h{0}, h{1}, .. h{L-1}
Update bins at each level
for i = 0 to L - 1
   if (B[i][h{i}].qlen > bin_size)
      B[i][h{i}].p_mark += p_increment;
   else if (B[i][h{i}].qlen == 0)
      B[i][h{i}].p_mark -= p_decrement;
p_min = min(B[0][h{0}].p_mark ... B[L-1][h{L-1}].p_mark);
if (p_min == 1.0)
    ratelimit();
else
    mark/drop with probability p_min;

I did the adaptation of Juliusz code to meet current kernel standards,
and various changes to address previous comments :

http://thread.gmane.org/gmane.linux.network/90225
http://thread.gmane.org/gmane.linux.network/90375

Default flow classifier is the rxhash introduced by RPS in 2.6.35, but
we can use an external flow classifier if wanted.

tc qdisc add dev $DEV parent 1:11 handle 11:  \
        est 0.5sec 2sec sfb limit 128

tc filter add dev $DEV protocol ip parent 11: handle 3 \
        flow hash keys dst divisor 1024

Notes:

1) SFB default child qdisc is pfifo_fast. It can be changed by another
qdisc but a child qdisc MUST not drop a packet previously queued. This
is because SFB needs to handle a dequeued packet in order to maintain
its virtual queue states. pfifo_head_drop or CHOKe should not be used.

2) ECN is enabled by default, unlike RED/CHOKe/GRED

With help from Patrick McHardy & Andi Kleen

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Juliusz Chroboczek <Juliusz.Chroboczek@pps.jussieu.fr>
CC: Stephen Hemminger <shemminger@vyatta.com>
CC: Patrick McHardy <kaber@trash.net>
CC: Andi Kleen <andi@firstfloor.org>
CC: John W. Linville <linville@tuxdriver.com>
---
V4: added sfb_skb_cb to hold two hash values per skb
    double buffering takes place only for inelastic flows
    some fields renamed.
    timeouts are now given in ms instead of seconds

 include/linux/pkt_sched.h |   39 +
 net/sched/Kconfig         |   11 
 net/sched/Makefile        |    1 
 net/sched/sch_sfb.c       |  709 ++++++++++++++++++++++++++++++++++++
 4 files changed, 760 insertions(+)

diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h
index d4bb6f5..5afee2b 100644
--- a/include/linux/pkt_sched.h
+++ b/include/linux/pkt_sched.h
@@ -522,4 +522,43 @@ struct tc_mqprio_qopt {
 	__u16	offset[TC_QOPT_MAX_QUEUE];
 };
 
+/* SFB */
+
+enum {
+	TCA_SFB_UNSPEC,
+	TCA_SFB_PARMS,
+	__TCA_SFB_MAX,
+};
+
+#define TCA_SFB_MAX (__TCA_SFB_MAX - 1)
+
+/*
+ * Note: increment, decrement are Q0.16 fixed-point values.
+ */
+struct tc_sfb_qopt {
+	__u32 rehash_interval;	/* delay between hash move, in ms */
+	__u32 warmup_time;	/* double buffering warmup time in ms (warmup_time < rehash_interval) */
+	__u32 max;		/* max len of qlen_min */
+	__u32 bin_size;		/* maximum queue length per bin */
+	__u32 increment;	/* probability increment, (d1 in Blue) */
+	__u32 decrement;	/* probability decrement, (d2 in Blue) */
+	__u32 limit;		/* max SFB queue length */
+	__u32 penalty_rate;	/* inelastic flows are rate limited to 'rate' pps */
+	__u32 penalty_burst;
+};
+
+struct tc_sfb_xstats {
+	__u32 earlydrop;
+	__u32 penaltydrop;
+	__u32 bucketdrop;
+	__u32 queuedrop;
+	__u32 childdrop; /* drops in child qdisc */
+	__u32 marked;
+	__u32 maxqlen;
+	__u32 maxprob;
+	__u32 avgprob;
+};
+
+#define SFB_MAX_PROB 0xFFFF
+
 #endif
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index 8c19b6e..a7a5583 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -126,6 +126,17 @@ config NET_SCH_RED
 	  To compile this code as a module, choose M here: the
 	  module will be called sch_red.
 
+config NET_SCH_SFB
+	tristate "Stochastic Fair Blue (SFB)"
+	---help---
+	  Say Y here if you want to use the Stochastic Fair Blue (SFB)
+	  packet scheduling algorithm.
+
+	  See the top of <file:net/sched/sch_sfb.c> for more details.
+
+	  To compile this code as a module, choose M here: the
+	  module will be called sch_sfb.
+
 config NET_SCH_SFQ
 	tristate "Stochastic Fairness Queueing (SFQ)"
 	---help---
diff --git a/net/sched/Makefile b/net/sched/Makefile
index 06c6cdf..2e77b8d 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -24,6 +24,7 @@ obj-$(CONFIG_NET_SCH_RED)	+= sch_red.o
 obj-$(CONFIG_NET_SCH_GRED)	+= sch_gred.o
 obj-$(CONFIG_NET_SCH_INGRESS)	+= sch_ingress.o 
 obj-$(CONFIG_NET_SCH_DSMARK)	+= sch_dsmark.o
+obj-$(CONFIG_NET_SCH_SFB)	+= sch_sfb.o
 obj-$(CONFIG_NET_SCH_SFQ)	+= sch_sfq.o
 obj-$(CONFIG_NET_SCH_TBF)	+= sch_tbf.o
 obj-$(CONFIG_NET_SCH_TEQL)	+= sch_teql.o
diff --git a/net/sched/sch_sfb.c b/net/sched/sch_sfb.c
new file mode 100644
index 0000000..7b2e0cc
--- /dev/null
+++ b/net/sched/sch_sfb.c
@@ -0,0 +1,709 @@
+/*
+ * net/sched/sch_sfb.c	  Stochastic Fair Blue
+ *
+ * Copyright (c) 2008-2011 Juliusz Chroboczek <jch@pps.jussieu.fr>
+ * Copyright (c) 2011 Eric Dumazet <eric.dumazet@gmail.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * version 2 as published by the Free Software Foundation.
+ *
+ * W. Feng, D. Kandlur, D. Saha, K. Shin. Blue:
+ * A New Class of Active Queue Management Algorithms.
+ * U. Michigan CSE-TR-387-99, April 1999.
+ *
+ * http://www.thefengs.com/wuchang/blue/CSE-TR-387-99.pdf
+ *
+ */
+
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <linux/skbuff.h>
+#include <linux/random.h>
+#include <linux/jhash.h>
+#include <net/ip.h>
+#include <net/pkt_sched.h>
+#include <net/inet_ecn.h>
+
+/*
+ * SFB uses two B[l][n] : L x N arrays of bins (L levels, N bins per level)
+ * This implementation uses L = 8 and N = 16
+ * This permits us to split one 32bit hash (provided per packet by rxhash or
+ * external classifier) into 8 subhashes of 4 bits.
+ */
+#define SFB_BUCKET_SHIFT 4
+#define SFB_NUMBUCKETS	(1 << SFB_BUCKET_SHIFT) /* N bins per Level */
+#define SFB_BUCKET_MASK (SFB_NUMBUCKETS - 1)
+#define SFB_LEVELS	(32 / SFB_BUCKET_SHIFT) /* L */
+
+/* SFB algo uses a virtual queue, named "bin" */
+struct sfb_bucket {
+	u16		qlen; /* length of virtual queue */
+	u16		p_mark; /* marking probability */
+};
+
+/* We use a double buffering right before hash change
+ * (Section 4.4 of SFB reference : moving hash functions)
+ */
+struct sfb_bins {
+	u32		  perturbation; /* jhash perturbation */
+	struct sfb_bucket bins[SFB_LEVELS][SFB_NUMBUCKETS];
+};
+
+struct sfb_sched_data {
+	struct Qdisc	*qdisc;
+	struct tcf_proto *filter_list;
+	unsigned long	rehash_interval;
+	unsigned long	warmup_time;	/* double buffering warmup time in jiffies */
+	u32		max;
+	u32		bin_size;	/* maximum queue length per bin */
+	u32		increment;	/* d1 */
+	u32		decrement;	/* d2 */
+	u32		limit;		/* HARD maximal queue length */
+	u32		penalty_rate;
+	u32		penalty_burst;
+	u32		tokens_avail;
+	unsigned long	rehash_time;
+	unsigned long	token_time;
+
+	u8		slot;		/* current active bins (0 or 1) */
+	bool		double_buffering;
+	struct sfb_bins bins[2];
+
+	struct {
+		u32	earlydrop;
+		u32	penaltydrop;
+		u32	bucketdrop;
+		u32	queuedrop;
+		u32	childdrop;	/* drops in child qdisc */
+		u32	marked;		/* ECN mark */
+	} stats;
+};
+
+/*
+ * Each queued skb might be hashed on one or two bins
+ * We store in skb_cb the two hash values.
+ * (A zero value means double buffering was not used)
+ */
+struct sfb_skb_cb {
+	u32 hashes[2];
+};
+
+static inline struct sfb_skb_cb *sfb_skb_cb(const struct sk_buff *skb)
+{
+	BUILD_BUG_ON(sizeof(skb->cb) <
+		sizeof(struct qdisc_skb_cb) + sizeof(struct sfb_skb_cb));
+	return (struct sfb_skb_cb *)qdisc_skb_cb(skb)->data;
+}
+
+/*
+ * If using 'internal' SFB flow classifier, hash comes from skb rxhash
+ * If using external classifier, hash comes from the classid.
+ */
+static u32 sfb_hash(const struct sk_buff *skb, u32 slot)
+{
+	return sfb_skb_cb(skb)->hashes[slot];
+}
+
+/* Probabilities are coded as Q0.16 fixed-point values,
+ * with 0xFFFF representing 65535/65536 (almost 1.0)
+ * Addition and subtraction are saturating in [0, 65535]
+ */
+static u32 prob_plus(u32 p1, u32 p2)
+{
+	u32 res = p1 + p2;
+
+	return min_t(u32, res, SFB_MAX_PROB);
+}
+
+static u32 prob_minus(u32 p1, u32 p2)
+{
+	return p1 > p2 ? p1 - p2 : 0;
+}
+
+static void increment_one_qlen(u32 sfbhash, u32 slot, struct sfb_sched_data *q)
+{
+	int i;
+	struct sfb_bucket *b = &q->bins[slot].bins[0][0];
+
+	for (i = 0; i < SFB_LEVELS; i++) {
+		u32 hash = sfbhash & SFB_BUCKET_MASK;
+
+		sfbhash >>= SFB_BUCKET_SHIFT;
+		if (b[hash].qlen < 0xFFFF)
+			b[hash].qlen++;
+		b += SFB_NUMBUCKETS; /* next level */
+	}
+}
+
+static void increment_qlen(const struct sk_buff *skb, struct sfb_sched_data *q)
+{
+	u32 sfbhash;
+
+	sfbhash = sfb_hash(skb, 0);
+	if (sfbhash)
+		increment_one_qlen(sfbhash, 0, q);
+
+	sfbhash = sfb_hash(skb, 1);
+	if (sfbhash)
+		increment_one_qlen(sfbhash, 1, q);
+}
+
+static void decrement_one_qlen(u32 sfbhash, u32 slot,
+			       struct sfb_sched_data *q)
+{
+	int i;
+	struct sfb_bucket *b = &q->bins[slot].bins[0][0];
+
+	for (i = 0; i < SFB_LEVELS; i++) {
+		u32 hash = sfbhash & SFB_BUCKET_MASK;
+
+		sfbhash >>= SFB_BUCKET_SHIFT;
+		if (b[hash].qlen > 0)
+			b[hash].qlen--;
+		b += SFB_NUMBUCKETS; /* next level */
+	}
+}
+
+static void decrement_qlen(const struct sk_buff *skb, struct sfb_sched_data *q)
+{
+	u32 sfbhash;
+
+	sfbhash = sfb_hash(skb, 0);
+	if (sfbhash)
+		decrement_one_qlen(sfbhash, 0, q);
+
+	sfbhash = sfb_hash(skb, 1);
+	if (sfbhash)
+		decrement_one_qlen(sfbhash, 1, q);
+}
+
+static void decrement_prob(struct sfb_bucket *b, struct sfb_sched_data *q)
+{
+	b->p_mark = prob_minus(b->p_mark, q->decrement);
+}
+
+static void increment_prob(struct sfb_bucket *b, struct sfb_sched_data *q)
+{
+	b->p_mark = prob_plus(b->p_mark, q->increment);
+}
+
+static void sfb_zero_all_buckets(struct sfb_sched_data *q)
+{
+	memset(&q->bins, 0, sizeof(q->bins));
+}
+
+/*
+ * compute max qlen, max p_mark, and avg p_mark
+ */
+static u32 sfb_compute_qlen(u32 *prob_r, u32 *avgpm_r, const struct sfb_sched_data *q)
+{
+	int i;
+	u32 qlen = 0, prob = 0, totalpm = 0;
+	const struct sfb_bucket *b = &q->bins[q->slot].bins[0][0];
+
+	for (i = 0; i < SFB_LEVELS * SFB_NUMBUCKETS; i++) {
+		if (qlen < b->qlen)
+			qlen = b->qlen;
+		totalpm += b->p_mark;
+		if (prob < b->p_mark)
+			prob = b->p_mark;
+		b++;
+	}
+	*prob_r = prob;
+	*avgpm_r = totalpm / (SFB_LEVELS * SFB_NUMBUCKETS);
+	return qlen;
+}
+
+
+static void sfb_init_perturbation(u32 slot, struct sfb_sched_data *q)
+{
+	q->bins[slot].perturbation = net_random();
+}
+
+static void sfb_swap_slot(struct sfb_sched_data *q)
+{
+	sfb_init_perturbation(q->slot, q);
+	q->slot ^= 1;
+	q->double_buffering = false;
+}
+
+/* Non elastic flows are allowed to use part of the bandwidth, expressed
+ * in "penalty_rate" packets per second, with "penalty_burst" burst
+ */
+static bool sfb_rate_limit(struct sk_buff *skb, struct sfb_sched_data *q)
+{
+	if (q->penalty_rate == 0 || q->penalty_burst == 0)
+		return true;
+
+	if (q->tokens_avail < 1) {
+		unsigned long age = min(10UL * HZ, jiffies - q->token_time);
+
+		q->tokens_avail = (age * q->penalty_rate) / HZ;
+		if (q->tokens_avail > q->penalty_burst)
+			q->tokens_avail = q->penalty_burst;
+		q->token_time = jiffies;
+		if (q->tokens_avail < 1)
+			return true;
+	}
+
+	q->tokens_avail--;
+	return false;
+}
+
+static bool sfb_classify(struct sk_buff *skb, struct sfb_sched_data *q,
+			 int *qerr, u32 *salt)
+{
+	struct tcf_result res;
+	int result;
+
+	result = tc_classify(skb, q->filter_list, &res);
+	if (result >= 0) {
+#ifdef CONFIG_NET_CLS_ACT
+		switch (result) {
+		case TC_ACT_STOLEN:
+		case TC_ACT_QUEUED:
+			*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
+		case TC_ACT_SHOT:
+			return false;
+		}
+#endif
+		*salt = TC_H_MIN(res.classid);
+		return true;
+	}
+	return false;
+}
+
+static int sfb_enqueue(struct sk_buff *skb, struct Qdisc *sch)
+{
+
+	struct sfb_sched_data *q = qdisc_priv(sch);
+	struct Qdisc *child = q->qdisc;
+	int i;
+	u32 p_min = ~0;
+	u32 minqlen = ~0;
+	u32 r, slot, salt, sfbhash;
+	int ret = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
+
+	if (q->rehash_interval > 0) {
+		unsigned long limit = q->rehash_time + q->rehash_interval;
+
+		if (unlikely(time_after(jiffies, limit))) {
+			sfb_swap_slot(q);
+			q->rehash_time = jiffies;
+		} else if (unlikely(!q->double_buffering && q->warmup_time > 0 &&
+				    time_after(jiffies, limit - q->warmup_time))) {
+			q->double_buffering = true;
+		}
+	}
+
+	if (q->filter_list) {
+		/* If using external classifiers, get result and record it. */
+		if (!sfb_classify(skb, q, &ret, &salt))
+			goto other_drop;
+	} else {
+		salt = skb_get_rxhash(skb);
+	}
+
+	slot = q->slot;
+
+	sfbhash = jhash_1word(salt, q->bins[slot].perturbation);
+	if (!sfbhash)
+		sfbhash = 1;
+	sfb_skb_cb(skb)->hashes[slot] = sfbhash;
+
+	for (i = 0; i < SFB_LEVELS; i++) {
+		u32 hash = sfbhash & SFB_BUCKET_MASK;
+		struct sfb_bucket *b = &q->bins[slot].bins[i][hash];
+
+		sfbhash >>= SFB_BUCKET_SHIFT;
+		if (b->qlen == 0)
+			decrement_prob(b, q);
+		else if (b->qlen >= q->bin_size)
+			increment_prob(b, q);
+		if (minqlen > b->qlen)
+			minqlen = b->qlen;
+		if (p_min > b->p_mark)
+			p_min = b->p_mark;
+	}
+
+	slot ^= 1;
+	sfb_skb_cb(skb)->hashes[slot] = 0;
+
+	if (unlikely(minqlen >= q->max || sch->q.qlen >= q->limit)) {
+		sch->qstats.overlimits++;
+		if (minqlen >= q->max)
+			q->stats.bucketdrop++;
+		else
+			q->stats.queuedrop++;
+		goto drop;
+	}
+
+	if (unlikely(p_min >= SFB_MAX_PROB)) {
+		/* Inelastic flow */
+		if (q->double_buffering) {
+			sfbhash = jhash_1word(salt, q->bins[slot].perturbation);
+			if (!sfbhash)
+				sfbhash = 1;
+			sfb_skb_cb(skb)->hashes[slot] = sfbhash;
+
+			for (i = 0; i < SFB_LEVELS; i++) {
+				u32 hash = sfbhash & SFB_BUCKET_MASK;
+				struct sfb_bucket *b = &q->bins[slot].bins[i][hash];
+
+				sfbhash >>= SFB_BUCKET_SHIFT;
+				if (b->qlen == 0)
+					decrement_prob(b, q);
+				else if (b->qlen >= q->bin_size)
+					increment_prob(b, q);
+			}
+		}
+		if (sfb_rate_limit(skb, q)) {
+			sch->qstats.overlimits++;
+			q->stats.penaltydrop++;
+			goto drop;
+		}
+		goto enqueue;
+	}
+
+	r = net_random() & SFB_MAX_PROB;
+
+	if (unlikely(r < p_min)) {
+		if (unlikely(p_min > SFB_MAX_PROB / 2)) {
+			/* If we're marking that many packets, then either
+			 * this flow is unresponsive, or we're badly congested.
+			 * In either case, we want to start dropping packets.
+			 */
+			if (r < (p_min - SFB_MAX_PROB / 2) * 2) {
+				q->stats.earlydrop++;
+				goto drop;
+			}
+		}
+		if (INET_ECN_set_ce(skb)) {
+			q->stats.marked++;
+		} else {
+			q->stats.earlydrop++;
+			goto drop;
+		}
+	}
+
+enqueue:
+	ret = qdisc_enqueue(skb, child);
+	if (likely(ret == NET_XMIT_SUCCESS)) {
+		sch->q.qlen++;
+		increment_qlen(skb, q);
+	} else if (net_xmit_drop_count(ret)) {
+		q->stats.childdrop++;
+		sch->qstats.drops++;
+	}
+	return ret;
+
+drop:
+	qdisc_drop(skb, sch);
+	return NET_XMIT_CN;
+other_drop:
+	if (ret & __NET_XMIT_BYPASS)
+		sch->qstats.drops++;
+	kfree_skb(skb);
+	return ret;
+}
+
+static struct sk_buff *sfb_dequeue(struct Qdisc *sch)
+{
+	struct sfb_sched_data *q = qdisc_priv(sch);
+	struct Qdisc *child = q->qdisc;
+	struct sk_buff *skb;
+
+	skb = child->dequeue(q->qdisc);
+
+	if (skb) {
+		qdisc_bstats_update(sch, skb);
+		sch->q.qlen--;
+		decrement_qlen(skb, q);
+	}
+
+	return skb;
+}
+
+static struct sk_buff *sfb_peek(struct Qdisc *sch)
+{
+	struct sfb_sched_data *q = qdisc_priv(sch);
+	struct Qdisc *child = q->qdisc;
+
+	return child->ops->peek(child);
+}
+
+/* No sfb_drop -- impossible since the child doesn't return the dropped skb. */
+
+static void sfb_reset(struct Qdisc *sch)
+{
+	struct sfb_sched_data *q = qdisc_priv(sch);
+
+	qdisc_reset(q->qdisc);
+	sch->q.qlen = 0;
+	q->slot = 0;
+	q->double_buffering = false;
+	sfb_zero_all_buckets(q);
+	sfb_init_perturbation(0, q);
+}
+
+static void sfb_destroy(struct Qdisc *sch)
+{
+	struct sfb_sched_data *q = qdisc_priv(sch);
+
+	tcf_destroy_chain(&q->filter_list);
+	qdisc_destroy(q->qdisc);
+}
+
+static const struct nla_policy sfb_policy[TCA_SFB_MAX + 1] = {
+	[TCA_SFB_PARMS]	= { .len = sizeof(struct tc_sfb_qopt) },
+};
+
+static const struct tc_sfb_qopt sfb_default_ops = {
+	.rehash_interval = 600 * MSEC_PER_SEC,
+	.warmup_time = 60 * MSEC_PER_SEC,
+	.limit = 0,
+	.max = 25,
+	.bin_size = 20,
+	.increment = (SFB_MAX_PROB + 500) / 1000, /* 0.1 % */
+	.decrement = (SFB_MAX_PROB + 3000) / 6000,
+	.penalty_rate = 10,
+	.penalty_burst = 20,
+};
+
+static int sfb_change(struct Qdisc *sch, struct nlattr *opt)
+{
+	struct sfb_sched_data *q = qdisc_priv(sch);
+	struct Qdisc *child;
+	struct nlattr *tb[TCA_SFB_MAX + 1];
+	const struct tc_sfb_qopt *ctl = &sfb_default_ops;
+	u32 limit;
+	int err;
+
+	if (opt) {
+		err = nla_parse_nested(tb, TCA_SFB_MAX, opt, sfb_policy);
+		if (err < 0)
+			return -EINVAL;
+
+		if (tb[TCA_SFB_PARMS] == NULL)
+			return -EINVAL;
+
+		ctl = nla_data(tb[TCA_SFB_PARMS]);
+	}
+
+	limit = ctl->limit;
+	if (limit == 0)
+		limit = max_t(u32, qdisc_dev(sch)->tx_queue_len, 1);
+
+	child = fifo_create_dflt(sch, &pfifo_qdisc_ops, limit);
+	if (IS_ERR(child))
+		return PTR_ERR(child);
+
+	sch_tree_lock(sch);
+
+	qdisc_tree_decrease_qlen(q->qdisc, q->qdisc->q.qlen);
+	qdisc_destroy(q->qdisc);
+	q->qdisc = child;
+
+	q->rehash_interval = msecs_to_jiffies(ctl->rehash_interval);
+	q->warmup_time = msecs_to_jiffies(ctl->warmup_time);
+	q->rehash_time = jiffies;
+	q->limit = limit;
+	q->increment = ctl->increment;
+	q->decrement = ctl->decrement;
+	q->max = ctl->max;
+	q->bin_size = ctl->bin_size;
+	q->penalty_rate = ctl->penalty_rate;
+	q->penalty_burst = ctl->penalty_burst;
+	q->tokens_avail = ctl->penalty_burst;
+	q->token_time = jiffies;
+
+	q->slot = 0;
+	q->double_buffering = false;
+	sfb_zero_all_buckets(q);
+	sfb_init_perturbation(0, q);
+	sfb_init_perturbation(1, q);
+
+	sch_tree_unlock(sch);
+
+	return 0;
+}
+
+static int sfb_init(struct Qdisc *sch, struct nlattr *opt)
+{
+	struct sfb_sched_data *q = qdisc_priv(sch);
+
+	q->qdisc = &noop_qdisc;
+	return sfb_change(sch, opt);
+}
+
+static int sfb_dump(struct Qdisc *sch, struct sk_buff *skb)
+{
+	struct sfb_sched_data *q = qdisc_priv(sch);
+	struct nlattr *opts;
+	struct tc_sfb_qopt opt = {
+		.rehash_interval = jiffies_to_msecs(q->rehash_interval),
+		.warmup_time = jiffies_to_msecs(q->warmup_time),
+		.limit = q->limit,
+		.max = q->max,
+		.bin_size = q->bin_size,
+		.increment = q->increment,
+		.decrement = q->decrement,
+		.penalty_rate = q->penalty_rate,
+		.penalty_burst = q->penalty_burst,
+	};
+
+	sch->qstats.backlog = q->qdisc->qstats.backlog;
+	opts = nla_nest_start(skb, TCA_OPTIONS);
+	NLA_PUT(skb, TCA_SFB_PARMS, sizeof(opt), &opt);
+	return nla_nest_end(skb, opts);
+
+nla_put_failure:
+	nla_nest_cancel(skb, opts);
+	return -EMSGSIZE;
+}
+
+static int sfb_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
+{
+	struct sfb_sched_data *q = qdisc_priv(sch);
+	struct tc_sfb_xstats st = {
+		.earlydrop = q->stats.earlydrop,
+		.penaltydrop = q->stats.penaltydrop,
+		.bucketdrop = q->stats.bucketdrop,
+		.queuedrop = q->stats.queuedrop,
+		.childdrop = q->stats.childdrop,
+		.marked = q->stats.marked,
+	};
+
+	st.maxqlen = sfb_compute_qlen(&st.maxprob, &st.avgprob, q);
+
+	return gnet_stats_copy_app(d, &st, sizeof(st));
+}
+
+static int sfb_dump_class(struct Qdisc *sch, unsigned long cl,
+			  struct sk_buff *skb, struct tcmsg *tcm)
+{
+	return -ENOSYS;
+}
+
+static int sfb_graft(struct Qdisc *sch, unsigned long arg, struct Qdisc *new,
+		     struct Qdisc **old)
+{
+	struct sfb_sched_data *q = qdisc_priv(sch);
+
+	if (new == NULL)
+		new = &noop_qdisc;
+
+	sch_tree_lock(sch);
+	*old = q->qdisc;
+	q->qdisc = new;
+	qdisc_tree_decrease_qlen(*old, (*old)->q.qlen);
+	qdisc_reset(*old);
+	sch_tree_unlock(sch);
+	return 0;
+}
+
+static struct Qdisc *sfb_leaf(struct Qdisc *sch, unsigned long arg)
+{
+	struct sfb_sched_data *q = qdisc_priv(sch);
+
+	return q->qdisc;
+}
+
+static unsigned long sfb_get(struct Qdisc *sch, u32 classid)
+{
+	return 1;
+}
+
+static void sfb_put(struct Qdisc *sch, unsigned long arg)
+{
+}
+
+static int sfb_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
+			    struct nlattr **tca, unsigned long *arg)
+{
+	return -ENOSYS;
+}
+
+static int sfb_delete(struct Qdisc *sch, unsigned long cl)
+{
+	return -ENOSYS;
+}
+
+static void sfb_walk(struct Qdisc *sch, struct qdisc_walker *walker)
+{
+	if (!walker->stop) {
+		if (walker->count >= walker->skip)
+			if (walker->fn(sch, 1, walker) < 0) {
+				walker->stop = 1;
+				return;
+			}
+		walker->count++;
+	}
+}
+
+static struct tcf_proto **sfb_find_tcf(struct Qdisc *sch, unsigned long cl)
+{
+	struct sfb_sched_data *q = qdisc_priv(sch);
+
+	if (cl)
+		return NULL;
+	return &q->filter_list;
+}
+
+static unsigned long sfb_bind(struct Qdisc *sch, unsigned long parent,
+			      u32 classid)
+{
+	return 0;
+}
+
+
+static const struct Qdisc_class_ops sfb_class_ops = {
+	.graft		=	sfb_graft,
+	.leaf		=	sfb_leaf,
+	.get		=	sfb_get,
+	.put		=	sfb_put,
+	.change		=	sfb_change_class,
+	.delete		=	sfb_delete,
+	.walk		=	sfb_walk,
+	.tcf_chain	=	sfb_find_tcf,
+	.bind_tcf	=	sfb_bind,
+	.unbind_tcf	=	sfb_put,
+	.dump		=	sfb_dump_class,
+};
+
+struct Qdisc_ops sfb_qdisc_ops __read_mostly = {
+	.id		=	"sfb",
+	.priv_size	=	sizeof(struct sfb_sched_data),
+	.cl_ops		=	&sfb_class_ops,
+	.enqueue	=	sfb_enqueue,
+	.dequeue	=	sfb_dequeue,
+	.peek		=	sfb_peek,
+	.init		=	sfb_init,
+	.reset		=	sfb_reset,
+	.destroy	=	sfb_destroy,
+	.change		=	sfb_change,
+	.dump		=	sfb_dump,
+	.dump_stats	=	sfb_dump_stats,
+	.owner		=	THIS_MODULE,
+};
+
+static int __init sfb_module_init(void)
+{
+	return register_qdisc(&sfb_qdisc_ops);
+}
+
+static void __exit sfb_module_exit(void)
+{
+	unregister_qdisc(&sfb_qdisc_ops);
+}
+
+module_init(sfb_module_init)
+module_exit(sfb_module_exit)
+
+MODULE_DESCRIPTION("Stochastic Fair Blue queue discipline");
+MODULE_AUTHOR("Juliusz Chroboczek");
+MODULE_AUTHOR("Eric Dumazet");
+MODULE_LICENSE("GPL");



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next-2.6 v4] net_sched: SFB flow scheduler
  2011-02-23 20:56       ` [PATCH net-next-2.6 v4] net_sched: SFB flow scheduler Eric Dumazet
@ 2011-02-23 21:11         ` Stephen Hemminger
  2011-02-23 21:28           ` Eric Dumazet
  2011-02-23 22:06         ` David Miller
  1 sibling, 1 reply; 22+ messages in thread
From: Stephen Hemminger @ 2011-02-23 21:11 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, Juliusz Chroboczek, John W. Linville,
	Patrick McHardy, netdev, Andi Kleen

On Wed, 23 Feb 2011 21:56:17 +0100
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> +struct Qdisc_ops sfb_qdisc_ops __read_mostly = {

Should be static?


-- 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next-2.6 v4] net_sched: SFB flow scheduler
  2011-02-23 21:11         ` Stephen Hemminger
@ 2011-02-23 21:28           ` Eric Dumazet
  2011-02-23 21:30             ` David Miller
  0 siblings, 1 reply; 22+ messages in thread
From: Eric Dumazet @ 2011-02-23 21:28 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: David Miller, Juliusz Chroboczek, John W. Linville,
	Patrick McHardy, netdev, Andi Kleen

On Wednesday, 23 February 2011 at 13:11 -0800, Stephen Hemminger wrote:
> On Wed, 23 Feb 2011 21:56:17 +0100
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> > +struct Qdisc_ops sfb_qdisc_ops __read_mostly = {
> 
> Should be static?
> 
> 

Sure! ;)




* Re: [PATCH net-next-2.6 v4] net_sched: SFB flow scheduler
  2011-02-23 21:28           ` Eric Dumazet
@ 2011-02-23 21:30             ` David Miller
  0 siblings, 0 replies; 22+ messages in thread
From: David Miller @ 2011-02-23 21:30 UTC (permalink / raw)
  To: eric.dumazet
  Cc: shemminger, Juliusz.Chroboczek, linville, kaber, netdev, andi

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 23 Feb 2011 22:28:36 +0100

> On Wednesday, 23 February 2011 at 13:11 -0800, Stephen Hemminger wrote:
>> On Wed, 23 Feb 2011 21:56:17 +0100
>> Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> 
>> > +struct Qdisc_ops sfb_qdisc_ops __read_mostly = {
>> 
>> Should be static?
>> 
>> 
> 
> Sure ! ;)

I can make this simple fixup when I apply the patch to net-next-2.6,
Eric you don't have to respin it just for this.


* Re: [PATCH net-next-2.6 v4] net_sched: SFB flow scheduler
  2011-02-23 20:56       ` [PATCH net-next-2.6 v4] net_sched: SFB flow scheduler Eric Dumazet
  2011-02-23 21:11         ` Stephen Hemminger
@ 2011-02-23 22:06         ` David Miller
  2011-02-24  5:40           ` Eric Dumazet
  1 sibling, 1 reply; 22+ messages in thread
From: David Miller @ 2011-02-23 22:06 UTC (permalink / raw)
  To: eric.dumazet
  Cc: Juliusz.Chroboczek, linville, shemminger, kaber, netdev, andi

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 23 Feb 2011 21:56:17 +0100

> This is the Stochastic Fair Blue scheduler, based on work from :

Applied with the 'static' fix, thanks Eric!


* Re: [PATCH] net_sched: long word align struct qdisc_skb_cb data
  2011-02-23 17:30           ` Stephen Hemminger
@ 2011-02-23 22:17             ` David Miller
  0 siblings, 0 replies; 22+ messages in thread
From: David Miller @ 2011-02-23 22:17 UTC (permalink / raw)
  To: shemminger
  Cc: eric.dumazet, kaber, Juliusz.Chroboczek, linville, netdev, andi

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Wed, 23 Feb 2011 09:30:01 -0800

> On Wed, 23 Feb 2011 18:05:07 +0100
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
>> On Wednesday, 23 February 2011 at 17:24 +0100, Patrick McHardy wrote:
>> > On 23.02.2011 16:14, Eric Dumazet wrote:
>> > > diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
>> > > index 16626a0..f40d32e 100644
>> > > --- a/include/net/sch_generic.h
>> > > +++ b/include/net/sch_generic.h
>> > > @@ -218,6 +218,7 @@ struct tcf_proto {
>> > >  
>> > >  struct qdisc_skb_cb {
>> > >  	unsigned int		pkt_len;
>> > > +	unsigned int		sfb_classid;
>> > >  	char			data[];
>> > >  };
>> > 
>> > This could be moved into a SFB specific cb, similar to what netem
>> > does.
>> 
>> Hmm... well... I want to be sure no other sch will destroy my values.
>> 
>> netem seems buggy then.
>> 
>> Probably the following patch is needed?
> 
> Yes, it was long word aligned when netem was written but
> we seem to have bit creep.

Applied to net-2.6, thanks!
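The alignment problem discussed above can be demonstrated standalone: a qdisc that casts cb->data to a private struct containing longs needs data[] to start on a long-word boundary. The layouts below are illustrative mirrors of the discussion, not the exact kernel definitions:

```c
#include <stddef.h>

/* With a char flexible member, data[] lands right after the 4-byte
 * pkt_len, so casting it to a struct holding longs/u64s is misaligned
 * on 64-bit. Declaring the member as long forces the needed padding. */
struct cb_char {
	unsigned int pkt_len;
	char data[];		/* offset 4: no alignment guarantee */
};

struct cb_long {
	unsigned int pkt_len;
	long data[];		/* padded up to a long-word boundary */
};

size_t cb_char_data_offset(void) { return offsetof(struct cb_char, data); }
size_t cb_long_data_offset(void) { return offsetof(struct cb_long, data); }
```

On a 64-bit build the char variant puts data[] at offset 4 while the long variant pads it to offset 8, which is why "bit creep" in the cb layout silently breaks qdiscs that store longs there.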


* Re: [PATCH net-next-2.6 v4] net_sched: SFB flow scheduler
  2011-02-23 22:06         ` David Miller
@ 2011-02-24  5:40           ` Eric Dumazet
  2011-02-24  6:51             ` Stephen Hemminger
  0 siblings, 1 reply; 22+ messages in thread
From: Eric Dumazet @ 2011-02-24  5:40 UTC (permalink / raw)
  To: David Miller
  Cc: Juliusz.Chroboczek, linville, shemminger, kaber, netdev, andi

On Wednesday, 23 February 2011 at 14:06 -0800, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Wed, 23 Feb 2011 21:56:17 +0100
> 
> > This is the Stochastic Fair Blue scheduler, based on work from :
> 
> Applied with the 'static' fix, thanks Eric!

Thanks, David!

My next step is to expand an idea I had some time ago (during my SFQ
work in December):

Add a generic (core) service:

- Timestamp the skb when it enters the qdisc (might use skb->tstamp?)
- At dequeue time, compute the delay.

1) Be able to mark/drop the packet right before giving it to the device
if the delay is above a threshold, or use an array of thresholds
depending on TOS.

2) Integrate the delay into an EWMA.

3) For SFB: use the EWMA to eventually replace the inconvenient
penalty_box with an auto-adaptive mechanism: allow non-elastic flows to
take a share of the bandwidth, using a drop/mark probability that
depends on this EWMA.
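Point 2 above can be sketched with the fixed-point arithmetic kernel code typically uses for moving averages. The 1/8 weight, the microsecond unit, and the function names are illustrative, not from any existing patch:

```c
#include <stddef.h>

/* Sketch of an EWMA of per-packet queueing delay (point 2 above).
 * avg = avg - avg/8 + sample/8, i.e. a weight of 1/8 in fixed point. */
static unsigned long ewma_delay_us;

void ewma_delay_update(unsigned long delay_us)
{
	if (ewma_delay_us == 0)
		ewma_delay_us = delay_us;	/* seed on first sample */
	else
		ewma_delay_us = ewma_delay_us - (ewma_delay_us >> 3)
			      + (delay_us >> 3);
}

unsigned long ewma_delay_read(void)
{
	return ewma_delay_us;
}
```

A drop/mark decision (point 3) would then compare ewma_delay_read() against a threshold, or map it to a probability.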





* Re: [PATCH net-next-2.6 v4] net_sched: SFB flow scheduler
  2011-02-24  5:40           ` Eric Dumazet
@ 2011-02-24  6:51             ` Stephen Hemminger
  2011-02-24  7:14               ` Eric Dumazet
  2011-03-24 17:44               ` [PATCH iproute2] tc : " Eric Dumazet
  0 siblings, 2 replies; 22+ messages in thread
From: Stephen Hemminger @ 2011-02-24  6:51 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, Juliusz.Chroboczek, linville, kaber, netdev, andi

Where is the iproute (q_sfb) piece?


-- 


* Re: [PATCH net-next-2.6 v4] net_sched: SFB flow scheduler
  2011-02-24  6:51             ` Stephen Hemminger
@ 2011-02-24  7:14               ` Eric Dumazet
  2011-02-24  7:18                 ` Eric Dumazet
  2011-03-24 17:44               ` [PATCH iproute2] tc : " Eric Dumazet
  1 sibling, 1 reply; 22+ messages in thread
From: Eric Dumazet @ 2011-02-24  7:14 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: David Miller, Juliusz.Chroboczek, linville, kaber, netdev, andi

On Wednesday, 23 February 2011 at 22:51 -0800, Stephen Hemminger wrote:
> Where is the iproute (q_sfb) piece?
> 
> 

I'll send it this morning after polishing.

As a matter of fact, I might add Patrick's idea to use nested
attributes before the official SFB release.

Thanks




* Re: [PATCH net-next-2.6 v4] net_sched: SFB flow scheduler
  2011-02-24  7:14               ` Eric Dumazet
@ 2011-02-24  7:18                 ` Eric Dumazet
  0 siblings, 0 replies; 22+ messages in thread
From: Eric Dumazet @ 2011-02-24  7:18 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: David Miller, Juliusz.Chroboczek, linville, kaber, netdev, andi

On Thursday, 24 February 2011 at 08:14 +0100, Eric Dumazet wrote:
> On Wednesday, 23 February 2011 at 22:51 -0800, Stephen Hemminger wrote:
> > Where is the iproute (q_sfb) piece?
> > 
> > 
> 
> I'll send it this morning after polishing.
> 
> As a matter of fact, I might add Patrick idea to use nested attributes,
> before official SFB release.
> 

BTW, one can use an old iproute2 tc binary for SFB with default params.
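A minimal setup along those lines (the interface name is illustrative; requires a kernel with sch_sfb, i.e. 2.6.39+ and root privileges):

```shell
# No SFB-specific options are passed, so even a tc binary without the
# q_sfb module can install the qdisc with its defaults.
tc qdisc add dev eth0 root sfb

# Show the configured parameters and the xstats quoted in this thread.
tc -s -d qdisc show dev eth0
```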






* [PATCH iproute2] tc : SFB flow scheduler
  2011-02-24  6:51             ` Stephen Hemminger
  2011-02-24  7:14               ` Eric Dumazet
@ 2011-03-24 17:44               ` Eric Dumazet
  2011-03-24 21:59                 ` Stephen Hemminger
  1 sibling, 1 reply; 22+ messages in thread
From: Eric Dumazet @ 2011-03-24 17:44 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: David Miller, Juliusz.Chroboczek, netdev, Jonathan Morton

From: Juliusz Chroboczek <Juliusz.Chroboczek@pps.jussieu.fr>

Supports SFB qdisc (included in linux-2.6.39)

1) Setup phase: accept non-default parameters

2) Dump information

qdisc sfb 11: parent 1:11 limit 1 max 25 target 20
  increment 0.00050 decrement 0.00005 penalty rate 10 burst 20 (600000ms 60000ms)
 Sent 47991616 bytes 521648 pkt (dropped 549245, overlimits 549245 requeues 0) 
 rate 7193Kbit 9774pps backlog 0b 0p requeues 0 
  earlydrop 0 penaltydrop 0 bucketdrop 0 queuedrop 549245 childdrop 0 marked 0
  maxqlen 0 maxprob 0.00000 avgprob 0.00000 

Signed-off-by: Juliusz Chroboczek <Juliusz.Chroboczek@pps.jussieu.fr>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 tc/Makefile |    1 
 tc/q_sfb.c  |  200 ++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 201 insertions(+)

diff --git a/tc/Makefile b/tc/Makefile
index 2cbd5d5..24584f1 100644
--- a/tc/Makefile
+++ b/tc/Makefile
@@ -16,6 +16,7 @@ TCMODULES += q_rr.o
 TCMODULES += q_multiq.o
 TCMODULES += q_netem.o
 TCMODULES += q_choke.o
+TCMODULES += q_sfb.o
 TCMODULES += f_rsvp.o
 TCMODULES += f_u32.o
 TCMODULES += f_route.o
diff --git a/tc/q_sfb.c b/tc/q_sfb.c
new file mode 100644
index 0000000..f11c33a
--- /dev/null
+++ b/tc/q_sfb.c
@@ -0,0 +1,200 @@
+/*
+ * q_sfb.c	Stochastic Fair Blue.
+ *
+ *		This program is free software; you can redistribute it and/or
+ *		modify it under the terms of the GNU General Public License
+ *		as published by the Free Software Foundation; either version
+ *		2 of the License, or (at your option) any later version.
+ *
+ * Authors:	Juliusz Chroboczek <jch@pps.jussieu.fr>
+ *
+ */
+
+
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <syslog.h>
+#include <fcntl.h>
+#include <sys/socket.h>
+#include <netinet/in.h>
+#include <arpa/inet.h>
+#include <string.h>
+
+#include "utils.h"
+#include "tc_util.h"
+
+static void explain(void)
+{
+	fprintf(stderr,
+		"Usage: ... sfb [ rehash SECS ] [ db SECS ]\n"
+		"	    [ limit PACKETS ] [ max PACKETS ] [ target PACKETS ]\n"
+		"	    [ increment FLOAT ] [ decrement FLOAT ]\n"
+		"	    [ penalty_rate PPS ] [ penalty_burst PACKETS ]\n");
+}
+
+static int get_prob(__u32 *val, const char *arg)
+{
+	double d;
+	char *ptr;
+
+	if (!arg || !*arg)
+		return -1;
+	d = strtod(arg, &ptr);
+	if (!ptr || ptr == arg || d < 0.0 || d > 1.0)
+		return -1;
+	*val = (__u32)(d * SFB_MAX_PROB + 0.5);
+	return 0;
+}
+
+static int sfb_parse_opt(struct qdisc_util *qu, int argc, char **argv,
+			 struct nlmsghdr *n)
+{
+	struct tc_sfb_qopt opt;
+	struct rtattr *tail;
+
+	memset(&opt, 0, sizeof(opt));
+	opt.rehash_interval = 600*1000;
+	opt.warmup_time = 60*1000;
+	opt.penalty_rate = 10;
+	opt.penalty_burst = 20;
+	opt.increment = (SFB_MAX_PROB + 1000) / 2000;
+	opt.decrement = (SFB_MAX_PROB + 10000) / 20000;
+
+	while (argc > 0) {
+		if (strcmp(*argv, "rehash") == 0) {
+			NEXT_ARG();
+			if (get_u32(&opt.rehash_interval, *argv, 0)) {
+				fprintf(stderr, "Illegal \"rehash\"\n");
+				return -1;
+			}
+		} else if (strcmp(*argv, "db") == 0) {
+			NEXT_ARG();
+			if (get_u32(&opt.warmup_time, *argv, 0)) {
+				fprintf(stderr, "Illegal \"db\"\n");
+				return -1;
+			}
+		} else if (strcmp(*argv, "limit") == 0) {
+			NEXT_ARG();
+			if (get_u32(&opt.limit, *argv, 0)) {
+				fprintf(stderr, "Illegal \"limit\"\n");
+				return -1;
+			}
+		} else if (strcmp(*argv, "max") == 0) {
+			NEXT_ARG();
+			if (get_u32(&opt.max, *argv, 0)) {
+				fprintf(stderr, "Illegal \"max\"\n");
+				return -1;
+			}
+		} else if (strcmp(*argv, "target") == 0) {
+			NEXT_ARG();
+			if (get_u32(&opt.bin_size, *argv, 0)) {
+				fprintf(stderr, "Illegal \"target\"\n");
+				return -1;
+			}
+		} else if (strcmp(*argv, "increment") == 0) {
+			NEXT_ARG();
+			if (get_prob(&opt.increment, *argv)) {
+				fprintf(stderr, "Illegal \"increment\"\n");
+				return -1;
+			}
+		} else if (strcmp(*argv, "decrement") == 0) {
+			NEXT_ARG();
+			if (get_prob(&opt.decrement, *argv)) {
+				fprintf(stderr, "Illegal \"decrement\"\n");
+				return -1;
+			}
+		} else if (strcmp(*argv, "penalty_rate") == 0) {
+			NEXT_ARG();
+			if (get_u32(&opt.penalty_rate, *argv, 0)) {
+				fprintf(stderr, "Illegal \"penalty_rate\"\n");
+				return -1;
+			}
+		} else if (strcmp(*argv, "penalty_burst") == 0) {
+			NEXT_ARG();
+			if (get_u32(&opt.penalty_burst, *argv, 0)) {
+				fprintf(stderr, "Illegal \"penalty_burst\"\n");
+				return -1;
+			}
+		} else {
+			fprintf(stderr, "What is \"%s\"?\n", *argv);
+			explain();
+			return -1;
+		}
+		argc--; argv++;
+	}
+
+	if (opt.max == 0) {
+		if (opt.bin_size >= 1)
+			opt.max = (opt.bin_size * 5 + 1) / 4;
+		else
+			opt.max = 25;
+	}
+	if (opt.bin_size == 0)
+		opt.bin_size = (opt.max * 4 + 3) / 5;
+
+	tail = NLMSG_TAIL(n);
+	addattr_l(n, 1024, TCA_OPTIONS, NULL, 0);
+	addattr_l(n, 1024, TCA_SFB_PARMS, &opt, sizeof(opt));
+	tail->rta_len = (void *) NLMSG_TAIL(n) - (void *) tail;
+	return 0;
+}
+
+static int sfb_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt)
+{
+	struct rtattr *tb[__TCA_SFB_MAX];
+	struct tc_sfb_qopt *qopt;
+
+	if (opt == NULL)
+		return 0;
+
+	parse_rtattr_nested(tb, TCA_SFB_MAX, opt);
+	if (tb[TCA_SFB_PARMS] == NULL)
+		return -1;
+	qopt = RTA_DATA(tb[TCA_SFB_PARMS]);
+	if (RTA_PAYLOAD(tb[TCA_SFB_PARMS]) < sizeof(*qopt))
+		return -1;
+
+	fprintf(f,
+		"limit %d max %d target %d\n"
+		"  increment %.5f decrement %.5f penalty rate %d burst %d "
+		"(%ums %ums)",
+		qopt->limit, qopt->max, qopt->bin_size,
+		(double)qopt->increment / SFB_MAX_PROB,
+		(double)qopt->decrement / SFB_MAX_PROB,
+		qopt->penalty_rate, qopt->penalty_burst,
+		qopt->rehash_interval, qopt->warmup_time);
+
+	return 0;
+}
+
+static int sfb_print_xstats(struct qdisc_util *qu, FILE *f,
+			    struct rtattr *xstats)
+{
+	struct tc_sfb_xstats *st;
+
+	if (xstats == NULL)
+		return 0;
+
+	if (RTA_PAYLOAD(xstats) < sizeof(*st))
+		return -1;
+
+	st = RTA_DATA(xstats);
+	fprintf(f,
+		"  earlydrop %u penaltydrop %u bucketdrop %u queuedrop %u childdrop %u marked %u\n"
+		"  maxqlen %u maxprob %.5f avgprob %.5f ",
+		st->earlydrop, st->penaltydrop, st->bucketdrop, st->queuedrop, st->childdrop,
+		st->marked,
+		st->maxqlen, (double)st->maxprob / SFB_MAX_PROB,
+		(double)st->avgprob / SFB_MAX_PROB);
+
+	return 0;
+}
+
+struct qdisc_util sfb_qdisc_util = {
+	.id		= "sfb",
+	.parse_qopt	= sfb_parse_opt,
+	.print_qopt	= sfb_print_opt,
+	.print_xstats	= sfb_print_xstats,
+};




* Re: [PATCH iproute2] tc : SFB flow scheduler
  2011-03-24 17:44               ` [PATCH iproute2] tc : " Eric Dumazet
@ 2011-03-24 21:59                 ` Stephen Hemminger
  0 siblings, 0 replies; 22+ messages in thread
From: Stephen Hemminger @ 2011-03-24 21:59 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, Juliusz.Chroboczek, netdev, Jonathan Morton

On Thu, 24 Mar 2011 18:44:09 +0100
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> From: Juliusz Chroboczek <Juliusz.Chroboczek@pps.jussieu.fr>
> 
> Supports SFB qdisc (included in linux-2.6.39)
> 
> 1) Setup phase: accept non-default parameters
> 
> 2) Dump information
> 
> qdisc sfb 11: parent 1:11 limit 1 max 25 target 20
>   increment 0.00050 decrement 0.00005 penalty rate 10 burst 20 (600000ms 60000ms)
>  Sent 47991616 bytes 521648 pkt (dropped 549245, overlimits 549245 requeues 0) 
>  rate 7193Kbit 9774pps backlog 0b 0p requeues 0 
>   earlydrop 0 penaltydrop 0 bucketdrop 0 queuedrop 549245 childdrop 0 marked 0
>   maxqlen 0 maxprob 0.00000 avgprob 0.00000 
> 
> Signed-off-by: Juliusz Chroboczek <Juliusz.Chroboczek@pps.jussieu.fr>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied (to net-next branch)


-- 


end of thread, other threads:[~2011-03-24 21:59 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20110221160306.GA9650@tuxdriver.com>
     [not found] ` <1298308283.2849.5.camel@edumazet-laptop>
     [not found]   ` <1298390536.2861.9.camel@edumazet-laptop>
2011-02-23 15:14     ` [PATCH net-next-2.6 v3] net_sched: SFB flow scheduler Eric Dumazet
2011-02-23 15:43       ` Stephen Hemminger
2011-02-23 16:13         ` Eric Dumazet
2011-02-23 16:20           ` Patrick McHardy
2011-02-23 16:24       ` Patrick McHardy
2011-02-23 16:48         ` Eric Dumazet
2011-02-23 16:58           ` Patrick McHardy
2011-02-23 17:16             ` Eric Dumazet
2011-02-23 17:05         ` [PATCH] net_sched: long word align struct qdisc_skb_cb data Eric Dumazet
2011-02-23 17:30           ` Stephen Hemminger
2011-02-23 22:17             ` David Miller
2011-02-23 20:56       ` [PATCH net-next-2.6 v4] net_sched: SFB flow scheduler Eric Dumazet
2011-02-23 21:11         ` Stephen Hemminger
2011-02-23 21:28           ` Eric Dumazet
2011-02-23 21:30             ` David Miller
2011-02-23 22:06         ` David Miller
2011-02-24  5:40           ` Eric Dumazet
2011-02-24  6:51             ` Stephen Hemminger
2011-02-24  7:14               ` Eric Dumazet
2011-02-24  7:18                 ` Eric Dumazet
2011-03-24 17:44               ` [PATCH iproute2] tc : " Eric Dumazet
2011-03-24 21:59                 ` Stephen Hemminger
