* [PATCH] CHOKe flow scheduler (0.7)
@ 2011-01-13 17:27 Stephen Hemminger
  2011-01-13 17:48 ` [PATCH] CHOKe flow scheduler (iproute) Stephen Hemminger
                   ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Stephen Hemminger @ 2011-01-13 17:27 UTC (permalink / raw)
  To: David Miller, Eric Dumazet; +Cc: netdev

This implements the CHOKe packet scheduler. It builds on the existing
Linux RED scheduler and follows the algorithm described in the paper below.

The core idea is:
  For every packet arrival:
  	Calculate Qave
	if (Qave < minth) 
	     Queue the new packet
	else 
	     Select randomly a packet from the queue 
	     if (both packets from same flow)
	     then Drop both the packets
	     else if (Qave > maxth)
	          Drop packet
	     else
	       	  Admit packet with probability p (same as RED)
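
A stand-alone user-space sketch of the same decision logic (hypothetical and
heavily simplified: fixed thresholds, a floating-point EWMA, a constant drop
probability and no real hole handling; it is not the kernel implementation
that follows):

	#include <stdlib.h>

	#define QSZ 64u

	static unsigned int slot[QSZ];	/* flow id per slot, 0 = empty/hole */
	static unsigned int head, tail;	/* free-running ring indices */
	static double qave;		/* averaged queue length (Qave) */

	static unsigned int qlen(void)
	{
		return tail - head;
	}

	/* flow must be non-zero; returns how many packets were dropped (0, 1 or 2) */
	static int choke_sketch_enqueue(unsigned int flow)
	{
		qave = 0.998 * qave + 0.002 * qlen();

		if (qave >= 5.0 && qlen() > 0) {	/* over min threshold */
			unsigned int idx = (head + rand() % qlen()) % QSZ;

			if (slot[idx] == flow) {	/* same flow: drop both */
				slot[idx] = 0;		/* leave a hole */
				return 2;
			}
			if (qave > 15.0)		/* over max threshold */
				return 1;
			if (rand() < RAND_MAX / 50)	/* RED-like early drop */
				return 1;
		}
		if (qlen() >= QSZ)			/* hard queue limit */
			return 1;
		slot[tail++ % QSZ] = flow;		/* admit the new packet */
		return 0;
	}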

See also:
  Rong Pan, Balaji Prabhakar, Konstantinos Psounis, "CHOKe: a stateless active
   queue management scheme for approximating fair bandwidth allocation",
  Proceedings of IEEE INFOCOM, March 2000.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

---
Patch versions
0.3 (Eric) uses table for queue.
0.4 allows classification with TC filters
    fixes crash when peek_random() finds a hole
0.5 (Eric) fixes qlen with holes and peek
0.7 changed to use separate params / stats from RED
    account for drops in backlog

Almost ready; still need to make sure the netlink API is right


 net/sched/Kconfig     |   11 +
 net/sched/Makefile    |    1 
 net/sched/sch_choke.c |  536 ++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 548 insertions(+)

--- a/net/sched/Kconfig	2011-01-12 17:44:05.747500044 -0800
+++ b/net/sched/Kconfig	2011-01-12 17:44:53.167735188 -0800
@@ -205,6 +205,17 @@ config NET_SCH_DRR
 
 	  If unsure, say N.
 
+config NET_SCH_CHOKE
+	tristate "CHOose and Keep responsive flow scheduler (CHOKE)"
+	help
+	  Say Y here if you want to use the CHOKe packet scheduler (CHOose
+	  and Keep for responsive flows, CHOose and Kill for unresponsive
+	  flows). This is a variation of RED which tries to penalize flows
+	  that monopolize the queue.
+
+	  To compile this code as a module, choose M here: the
+	  module will be called sch_choke.
+
 config NET_SCH_INGRESS
 	tristate "Ingress Qdisc"
 	depends on NET_CLS_ACT
--- a/net/sched/Makefile	2011-01-12 17:44:05.767500135 -0800
+++ b/net/sched/Makefile	2011-01-12 17:44:53.167735188 -0800
@@ -32,6 +32,7 @@ obj-$(CONFIG_NET_SCH_MULTIQ)	+= sch_mult
 obj-$(CONFIG_NET_SCH_ATM)	+= sch_atm.o
 obj-$(CONFIG_NET_SCH_NETEM)	+= sch_netem.o
 obj-$(CONFIG_NET_SCH_DRR)	+= sch_drr.o
+obj-$(CONFIG_NET_SCH_CHOKE)	+= sch_choke.o
 obj-$(CONFIG_NET_CLS_U32)	+= cls_u32.o
 obj-$(CONFIG_NET_CLS_ROUTE4)	+= cls_route.o
 obj-$(CONFIG_NET_CLS_FW)	+= cls_fw.o
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ b/net/sched/sch_choke.c	2011-01-12 17:45:07.227806180 -0800
@@ -0,0 +1,556 @@
+/*
+ * net/sched/sch_choke.c	CHOKE scheduler
+ *
+ * Copyright (c) 2011 Stephen Hemminger <shemminger@vyatta.com>
+ * Copyright (c) 2011 Eric Dumazet <eric.dumazet@gmail.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * version 2 as published by the Free Software Foundation.
+ *
+ */
+
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/skbuff.h>
+#include <linux/reciprocal_div.h>
+#include <net/pkt_sched.h>
+#include <net/inet_ecn.h>
+#include <net/red.h>
+
+/*	CHOKe stateless AQM for fair bandwidth allocation
+        =================================================
+
+   CHOKe (CHOose and Keep for responsive flows, CHOose and Kill for
+   unresponsive flows) is a variant of RED that penalizes misbehaving flows but
+   maintains no flow state. The difference from RED is an additional step
+   during the enqueuing process. If average queue size is over the
+   low threshold (qmin), a packet is chosen at random from the queue.
+   If both the new and chosen packet are from the same flow, both
+   are dropped. Unlike RED, CHOKe is not really a "classful" qdisc because it
+   needs to access packets in queue randomly. It has a minimal class
+   interface to allow overriding the builtin flow classifier with
+   filters.
+
+   Source:
+   R. Pan, B. Prabhakar, and K. Psounis, "CHOKe, A Stateless
+   Active Queue Management Scheme for Approximating Fair Bandwidth Allocation",
+   IEEE INFOCOM, 2000.
+
+   A. Tang, J. Wang, S. Low, "Understanding CHOKe: Throughput and Spatial
+   Characteristics", IEEE/ACM Transactions on Networking, 2004
+
+ */
+
+/* Upper bound on size of sk_buff table */
+#define CHOKE_MAX_QUEUE	(128*1024 - 1)
+
+struct choke_sched_data {
+/* Parameters */
+	u32		 limit;
+	unsigned char	 flags;
+
+	struct red_parms parms;
+
+/* Variables */
+	struct tcf_proto *filter_list;
+	struct {
+		u32	prob_drop;	/* Early probability drops */
+		u32	prob_mark;	/* Early probability marks */
+		u32	forced_drop;	/* Forced drops, qavg > max_thresh */
+		u32	forced_mark;	/* Forced marks, qavg > max_thresh */
+		u32	pdrop;          /* Drops due to queue limits */
+		u32	other;          /* Drops due to drop() calls */
+		u32	matched;	/* Drops due to flow match */
+	} stats;
+
+	unsigned int	 head;
+	unsigned int	 tail;
+	unsigned int	 holes;
+	unsigned int	 tab_mask; /* size - 1 */
+
+	struct sk_buff **tab;
+};
+
+static inline unsigned int choke_len(const struct choke_sched_data *q)
+{
+	return (q->tail - q->head) & q->tab_mask;
+}
+
+/* deliver a random number between 0 and N - 1 */
+static inline u32 random_N(unsigned int N)
+{
+	return reciprocal_divide(random32(), N);
+}
+
+/* Select a packet at random from the queue in O(1) and handle holes */
+static struct sk_buff *choke_peek_random(struct choke_sched_data *q,
+					 unsigned int *pidx)
+{
+	struct sk_buff *skb;
+	int retrys = 3;
+
+	do {
+		*pidx = (q->head + random_N(choke_len(q))) & q->tab_mask;
+		skb = q->tab[*pidx];
+		if (skb)
+			return skb;
+	} while (--retrys > 0);
+
+	/* queue has lots of holes, use the head which is known to exist */
+	return q->tab[*pidx = q->head];
+}
+
+/* Is ECN parameter configured */
+static inline int use_ecn(const struct choke_sched_data *q)
+{
+	return q->flags & TC_RED_ECN;
+}
+
+/* Should packets over max just be dropped (versus marked) */
+static inline int use_harddrop(const struct choke_sched_data *q)
+{
+	return q->flags & TC_RED_HARDDROP;
+}
+
+/* Move head pointer forward to skip over holes */
+static void choke_zap_head_holes(struct choke_sched_data *q)
+{
+	while (q->holes && q->tab[q->head] == NULL) {
+		q->head = (q->head + 1) & q->tab_mask;
+		q->holes--;
+	}
+}
+
+/* Move tail pointer backwards to reuse holes */
+static void choke_zap_tail_holes(struct choke_sched_data *q)
+{
+	while (q->holes && q->tab[q->tail - 1] == NULL) {
+		q->tail = (q->tail - 1) & q->tab_mask;
+		q->holes--;
+	}
+}
+
+/* Drop packet from queue array by creating a "hole" */
+static void choke_drop_by_idx(struct choke_sched_data *q, unsigned int idx)
+{
+	q->tab[idx] = NULL;
+	q->holes++;
+
+	if (idx == q->head)
+		choke_zap_head_holes(q);
+	if (idx == q->tail)
+		choke_zap_tail_holes(q);
+}
+
+/* Classify flow using either:
+   1. pre-existing classification result in skb
+   2. fast internal classification
+   3. use TC filter based classification
+*/
+static inline unsigned int choke_classify(struct sk_buff *skb,
+					  struct Qdisc *sch, int *qerr)
+
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct tcf_result res;
+	int result;
+
+	*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
+
+	if (TC_H_MAJ(skb->priority) == sch->handle &&
+	    TC_H_MIN(skb->priority) > 0)
+		return TC_H_MIN(skb->priority);
+
+	if (!q->filter_list)
+		return skb_get_rxhash(skb);
+
+	result = tc_classify(skb, q->filter_list, &res);
+	if (result >= 0) {
+#ifdef CONFIG_NET_CLS_ACT
+		switch (result) {
+		case TC_ACT_STOLEN:
+		case TC_ACT_QUEUED:
+			*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
+		case TC_ACT_SHOT:
+			return 0;
+		}
+#endif
+		return TC_H_MIN(res.classid);
+	}
+
+	return 0;
+}
+
+static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct red_parms *p = &q->parms;
+	unsigned int hash;
+	int uninitialized_var(ret);
+
+	hash = choke_classify(skb, sch, &ret);
+	if (!hash) {
+		/* Packet was eaten by filter */
+		if (ret & __NET_XMIT_BYPASS)
+			sch->qstats.drops++;
+		kfree_skb(skb);
+		return ret;
+	}
+
+	/* Maybe add hash as field in struct qdisc_skb_cb? */
+	*(unsigned int *)(qdisc_skb_cb(skb)->data) = hash;
+
+	/* Compute average queue usage (see RED) */
+	p->qavg = red_calc_qavg(p, choke_len(q) - q->holes);
+	if (red_is_idling(p))
+		red_end_of_idle_period(p);
+
+	/* Is queue small? */
+	if (p->qavg <= p->qth_min)
+		p->qcount = -1;
+	else {
+		struct sk_buff *oskb;
+		unsigned int idx;
+
+		/* Draw a packet at random from queue */
+		oskb = choke_peek_random(q, &idx);
+
+		/* Both packets from same flow ? */
+		if (*(unsigned int *)(qdisc_skb_cb(oskb)->data) == hash) {
+			/* Drop both packets */
+			q->stats.matched++;
+			choke_drop_by_idx(q, idx);
+			qdisc_drop(oskb, sch);
+			goto congestion_drop;
+		}
+
+		/* Queue is large, always mark/drop */
+		if (p->qavg > p->qth_max) {
+			p->qcount = -1;
+
+			sch->qstats.overlimits++;
+			if (use_harddrop(q) || !use_ecn(q) ||
+			    !INET_ECN_set_ce(skb)) {
+				q->stats.forced_drop++;
+				goto congestion_drop;
+			}
+
+			q->stats.forced_mark++;
+		} else if (++p->qcount) {
+			if (red_mark_probability(p, p->qavg)) {
+				p->qcount = 0;
+				p->qR = red_random(p);
+
+				sch->qstats.overlimits++;
+				if (!use_ecn(q) || !INET_ECN_set_ce(skb)) {
+					q->stats.prob_drop++;
+					goto congestion_drop;
+				}
+
+				q->stats.prob_mark++;
+			}
+		} else
+			p->qR = red_random(p);
+	}
+
+	/* Admit new packet */
+	if (likely(choke_len(q) < q->limit)) {
+
+		q->tab[q->tail] = skb;
+		q->tail = (q->tail + 1) & q->tab_mask;
+
+		sch->qstats.backlog += qdisc_pkt_len(skb);
+		qdisc_update_bstats(sch, skb);
+		sch->q.qlen = choke_len(q) - q->holes;
+		return NET_XMIT_SUCCESS;
+	}
+
+	q->stats.pdrop++;
+	sch->qstats.drops++;
+	kfree_skb(skb);
+	return NET_XMIT_DROP;
+
+ congestion_drop:
+	qdisc_drop(skb, sch);
+	return NET_XMIT_CN;
+}
+
+static struct sk_buff *choke_dequeue(struct Qdisc *sch)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct sk_buff *skb;
+
+	if (q->head == q->tail) {
+		if (!red_is_idling(&q->parms))
+			red_start_of_idle_period(&q->parms);
+		return NULL;
+	}
+	skb = q->tab[q->head];
+	q->tab[q->head] = NULL; /* not really needed */
+	q->head = (q->head + 1) & q->tab_mask;
+	choke_zap_head_holes(q);
+	sch->qstats.backlog -= qdisc_pkt_len(skb);
+	sch->q.qlen = choke_len(q) - q->holes;
+
+	return skb;
+}
+
+static unsigned int choke_drop(struct Qdisc *sch)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	unsigned int len;
+
+	len = qdisc_queue_drop(sch);
+	if (len > 0)
+		q->stats.other++;
+	else {
+		if (!red_is_idling(&q->parms))
+			red_start_of_idle_period(&q->parms);
+	}
+
+	return len;
+}
+
+static void choke_reset(struct Qdisc* sch)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+
+	red_restart(&q->parms);
+}
+
+static const struct nla_policy choke_policy[TCA_CHOKE_MAX + 1] = {
+	[TCA_CHOKE_PARMS]	= { .len = sizeof(struct tc_red_qopt) },
+	[TCA_CHOKE_STAB]	= { .len = 256 },
+};
+
+
+static void choke_free(void *addr)
+{
+	if (addr) {
+		if (is_vmalloc_addr(addr))
+			vfree(addr);
+		else
+			kfree(addr);
+	}
+}
+
+static int choke_change(struct Qdisc *sch, struct nlattr *opt)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct nlattr *tb[TCA_CHOKE_MAX + 1];
+	struct tc_red_qopt *ctl;
+	int err;
+	struct sk_buff **old = NULL;
+	unsigned int mask;
+
+	if (opt == NULL)
+		return -EINVAL;
+
+	err = nla_parse_nested(tb, TCA_CHOKE_MAX, opt, choke_policy);
+	if (err < 0)
+		return err;
+
+	if (tb[TCA_CHOKE_PARMS] == NULL ||
+	    tb[TCA_CHOKE_STAB] == NULL)
+		return -EINVAL;
+
+	ctl = nla_data(tb[TCA_CHOKE_PARMS]);
+
+	if (ctl->limit > CHOKE_MAX_QUEUE)
+		return -EINVAL;
+
+	mask = roundup_pow_of_two(ctl->limit + 1) - 1;
+	if (mask != q->tab_mask) {
+		struct sk_buff **ntab;
+
+		ntab = kcalloc(mask + 1, sizeof(struct sk_buff *), GFP_KERNEL);
+		if (!ntab)
+			ntab = vzalloc((mask + 1) * sizeof(struct sk_buff *));
+		if (!ntab)
+			return -ENOMEM;
+
+		sch_tree_lock(sch);
+		old = q->tab;
+		if (old) {
+			unsigned int tail = 0;
+
+			while (q->head != q->tail) {
+				ntab[tail++] = q->tab[q->head];
+				q->head = (q->head + 1) & q->tab_mask;
+			}
+			q->head = 0;
+			q->tail = tail;
+		}
+
+		q->tab_mask = mask;
+		q->tab = ntab;
+		q->holes = 0;
+	} else
+		sch_tree_lock(sch);
+
+	q->flags = ctl->flags;
+	q->limit = ctl->limit;
+
+	red_set_parms(&q->parms, ctl->qth_min, ctl->qth_max, ctl->Wlog,
+		      ctl->Plog, ctl->Scell_log,
+		      nla_data(tb[TCA_CHOKE_STAB]));
+
+	if (q->head == q->tail)
+		red_end_of_idle_period(&q->parms);
+
+	sch_tree_unlock(sch);
+	choke_free(old);
+	return 0;
+}
+
+static int choke_init(struct Qdisc* sch, struct nlattr *opt)
+{
+	return choke_change(sch, opt);
+}
+
+static int choke_dump(struct Qdisc *sch, struct sk_buff *skb)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct nlattr *opts = NULL;
+	struct tc_red_qopt opt = {
+		.limit		= q->limit,
+		.flags		= q->flags,
+		.qth_min	= q->parms.qth_min >> q->parms.Wlog,
+		.qth_max	= q->parms.qth_max >> q->parms.Wlog,
+		.Wlog		= q->parms.Wlog,
+		.Plog		= q->parms.Plog,
+		.Scell_log	= q->parms.Scell_log,
+	};
+
+	opts = nla_nest_start(skb, TCA_OPTIONS);
+	if (opts == NULL)
+		goto nla_put_failure;
+
+	NLA_PUT(skb, TCA_CHOKE_PARMS, sizeof(opt), &opt);
+	return nla_nest_end(skb, opts);
+
+nla_put_failure:
+	nla_nest_cancel(skb, opts);
+	return -EMSGSIZE;
+}
+
+static int choke_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct tc_choke_xstats st = {
+		.early	= q->stats.prob_drop + q->stats.forced_drop,
+		.marked	= q->stats.prob_mark + q->stats.forced_mark,
+		.pdrop	= q->stats.pdrop,
+		.other	= q->stats.other,
+		.matched = q->stats.matched,
+	};
+
+	return gnet_stats_copy_app(d, &st, sizeof(st));
+}
+
+static void choke_destroy(struct Qdisc *sch)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+
+	tcf_destroy_chain(&q->filter_list);
+	choke_free(q->tab);
+}
+
+static struct Qdisc *choke_leaf(struct Qdisc *sch, unsigned long arg)
+{
+	return NULL;
+}
+
+static unsigned long choke_get(struct Qdisc *sch, u32 classid)
+{
+	return 0;
+}
+
+static void choke_put(struct Qdisc *q, unsigned long cl)
+{
+}
+
+static unsigned long choke_bind(struct Qdisc *sch, unsigned long parent,
+				u32 classid)
+{
+	return 0;
+}
+
+static struct tcf_proto **choke_find_tcf(struct Qdisc *sch, unsigned long cl)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+
+	if (cl)
+		return NULL;
+	return &q->filter_list;
+}
+
+static int choke_dump_class(struct Qdisc *sch, unsigned long cl,
+			  struct sk_buff *skb, struct tcmsg *tcm)
+{
+	tcm->tcm_handle |= TC_H_MIN(cl);
+	return 0;
+}
+
+static void choke_walk(struct Qdisc *sch, struct qdisc_walker *arg)
+{
+	if (!arg->stop) {
+		if (arg->fn(sch, 1, arg) < 0) {
+			arg->stop = 1;
+			return;
+		}
+		arg->count++;
+	}
+}
+
+static const struct Qdisc_class_ops choke_class_ops = {
+	.leaf		=	choke_leaf,
+	.get		=	choke_get,
+	.put		=	choke_put,
+	.tcf_chain	=	choke_find_tcf,
+	.bind_tcf	=	choke_bind,
+	.unbind_tcf	=	choke_put,
+	.dump		=	choke_dump_class,
+	.walk		=	choke_walk,
+};
+
+static struct sk_buff *choke_peek_head(struct Qdisc *sch)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+
+	return (q->head != q->tail) ? q->tab[q->head] : NULL;
+}
+
+static struct Qdisc_ops choke_qdisc_ops __read_mostly = {
+	.id		=	"choke",
+	.priv_size	=	sizeof(struct choke_sched_data),
+
+	.enqueue	=	choke_enqueue,
+	.dequeue	=	choke_dequeue,
+	.peek		=	choke_peek_head,
+	.drop		=	choke_drop,
+	.init		=	choke_init,
+	.destroy	=	choke_destroy,
+	.reset		=	choke_reset,
+	.change		=	choke_change,
+	.dump		=	choke_dump,
+	.dump_stats	=	choke_dump_stats,
+	.owner		=	THIS_MODULE,
+};
+
+static int __init choke_module_init(void)
+{
+	return register_qdisc(&choke_qdisc_ops);
+}
+
+static void __exit choke_module_exit(void)
+{
+	unregister_qdisc(&choke_qdisc_ops);
+}
+
+module_init(choke_module_init)
+module_exit(choke_module_exit)
+
+MODULE_LICENSE("GPL");
--- a/include/linux/pkt_sched.h	2011-01-12 17:44:05.823500415 -0800
+++ b/include/linux/pkt_sched.h	2011-01-12 17:44:53.175735219 -0800
@@ -247,6 +247,35 @@ struct tc_gred_sopt {
 	__u16		pad1;
 };
 
+/* CHOKe section */
+
+enum {
+	TCA_CHOKE_UNSPEC,
+	TCA_CHOKE_PARMS,
+	TCA_CHOKE_STAB,
+	__TCA_CHOKE_MAX,
+};
+
+#define TCA_CHOKE_MAX (__TCA_CHOKE_MAX - 1)
+
+struct tc_choke_qopt {
+	__u32		limit;		/* HARD maximal queue length (packets)	*/
+	__u32		qth_min;	/* Min average length threshold (packets) */
+	__u32		qth_max;	/* Max average length threshold (packets) */
+	unsigned char   Wlog;		/* log(W)		*/
+	unsigned char   Plog;		/* log(P_max/(qth_max-qth_min))	*/
+	unsigned char   Scell_log;	/* cell size for idle damping */
+	unsigned char	flags;		/* see RED flags */
+};
+
+struct tc_choke_xstats {
+	__u32           early;          /* Early drops */
+	__u32           pdrop;          /* Drops due to queue limits */
+	__u32           other;          /* Drops due to drop() calls */
+	__u32           marked;         /* Marked packets */
+	__u32		matched;	/* Drops due to flow match */
+};
+
 /* HTB section */
 #define TC_HTB_NUMPRIO		8
 #define TC_HTB_MAXDEPTH		8


* [PATCH] CHOKe flow scheduler (iproute)
  2011-01-13 17:27 [PATCH] CHOKe flow scheduler (0.7) Stephen Hemminger
@ 2011-01-13 17:48 ` Stephen Hemminger
  2011-01-13 21:01   ` Eric Dumazet
  2011-01-13 18:00 ` [PATCH] CHOKe flow scheduler (0.7) Eric Dumazet
  2011-01-13 20:37 ` Eric Dumazet
  2 siblings, 1 reply; 26+ messages in thread
From: Stephen Hemminger @ 2011-01-13 17:48 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev

Preliminary interface for CHOKe scheduler in iproute
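
For reference, the usage string in explain() below implies invocations along
these lines (illustrative only; the device name and numbers are made up, and
with limit 1000 the defaults computed in choke_parse_opt() would come out to
max = 1000/4 = 250 and min = 250/3 = 83 packets):

	tc qdisc add dev eth0 root choke limit 1000 bandwidth 10mbit
	tc -s qdisc show dev eth0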

---
 include/linux/pkt_sched.h |   29 ++++++
 tc/Makefile               |    1 +
 tc/q_choke.c              |  221 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 251 insertions(+), 0 deletions(-)
 create mode 100644 tc/q_choke.c

diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h
index 2cfa4bc..83bac92 100644
--- a/include/linux/pkt_sched.h
+++ b/include/linux/pkt_sched.h
@@ -247,6 +247,35 @@ struct tc_gred_sopt {
 	__u16		pad1;
 };
 
+/* CHOKe section */
+
+enum {
+	TCA_CHOKE_UNSPEC,
+	TCA_CHOKE_PARMS,
+	TCA_CHOKE_STAB,
+	__TCA_CHOKE_MAX,
+};
+
+#define TCA_CHOKE_MAX (__TCA_CHOKE_MAX - 1)
+
+struct tc_choke_qopt {
+	__u32		limit;		/* HARD maximal queue length (packets)	*/
+	__u32		qth_min;	/* Min average length threshold (packets) */
+	__u32		qth_max;	/* Max average length threshold (packets) */
+	unsigned char   Wlog;		/* log(W)		*/
+	unsigned char   Plog;		/* log(P_max/(qth_max-qth_min))	*/
+	unsigned char   Scell_log;	/* cell size for idle damping */
+	unsigned char	flags;		/* see RED flags */
+};
+
+struct tc_choke_xstats {
+	__u32           early;          /* Early drops */
+	__u32           pdrop;          /* Drops due to queue limits */
+	__u32           other;          /* Drops due to drop() calls */
+	__u32           marked;         /* Marked packets */
+	__u32		matched;	/* Drops due to flow match */
+};
+
 /* HTB section */
 #define TC_HTB_NUMPRIO		8
 #define TC_HTB_MAXDEPTH		8
diff --git a/tc/Makefile b/tc/Makefile
index 101cc83..2cbd5d5 100644
--- a/tc/Makefile
+++ b/tc/Makefile
@@ -15,6 +15,7 @@ TCMODULES += q_cbq.o
 TCMODULES += q_rr.o
 TCMODULES += q_multiq.o
 TCMODULES += q_netem.o
+TCMODULES += q_choke.o
 TCMODULES += f_rsvp.o
 TCMODULES += f_u32.o
 TCMODULES += f_route.o
diff --git a/tc/q_choke.c b/tc/q_choke.c
new file mode 100644
index 0000000..044ae9a
--- /dev/null
+++ b/tc/q_choke.c
@@ -0,0 +1,221 @@
+/*
+ * q_choke.c		CHOKE.
+ *
+ *		This program is free software; you can redistribute it and/or
+ *		modify it under the terms of the GNU General Public License
+ *		as published by the Free Software Foundation; either version
+ *		2 of the License, or (at your option) any later version.
+ *
+ * Authors:	Stephen Hemminger <shemminger@vyatta.com>
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <syslog.h>
+#include <fcntl.h>
+#include <sys/socket.h>
+#include <netinet/in.h>
+#include <arpa/inet.h>
+#include <string.h>
+
+#include "utils.h"
+#include "tc_util.h"
+
+#include "tc_red.h"
+
+static void explain(void)
+{
+	fprintf(stderr, "Usage: ... choke limit PACKETS bandwidth KBPS [ecn]\n");
+	fprintf(stderr, "                 [ min PACKETS ] [ max PACKETS ] [ burst PACKETS ]\n");
+}
+
+static int choke_parse_opt(struct qdisc_util *qu, int argc, char **argv,
+			   struct nlmsghdr *n)
+{
+	struct tc_red_qopt opt;
+	unsigned burst = 0;
+	unsigned avpkt = 1000;
+	double probability = 0.02;
+	unsigned rate = 0;
+	int ecn_ok = 0;
+	int wlog;
+	__u8 sbuf[256];
+	struct rtattr *tail;
+
+	memset(&opt, 0, sizeof(opt));
+
+	while (argc > 0) {
+		if (strcmp(*argv, "limit") == 0) {
+			NEXT_ARG();
+			if (get_unsigned(&opt.limit, *argv, 0)) {
+				fprintf(stderr, "Illegal \"limit\"\n");
+				return -1;
+			}
+		} else if (strcmp(*argv, "bandwidth") == 0) {
+			NEXT_ARG();
+			if (get_rate(&rate, *argv)) {
+				fprintf(stderr, "Illegal \"bandwidth\"\n");
+				return -1;
+			}
+		} else if (strcmp(*argv, "ecn") == 0) {
+			ecn_ok = 1;
+		} else if (strcmp(*argv, "min") == 0) {
+			NEXT_ARG();
+			if (get_unsigned(&opt.qth_min, *argv, 0)) {
+				fprintf(stderr, "Illegal \"min\"\n");
+				return -1;
+			}
+		} else if (strcmp(*argv, "max") == 0) {
+			NEXT_ARG();
+			if (get_unsigned(&opt.qth_max, *argv, 0)) {
+				fprintf(stderr, "Illegal \"max\"\n");
+				return -1;
+			}
+		} else if (strcmp(*argv, "burst") == 0) {
+			NEXT_ARG();
+			if (get_unsigned(&burst, *argv, 0)) {
+				fprintf(stderr, "Illegal \"burst\"\n");
+				return -1;
+			}
+		} else if (strcmp(*argv, "avpkt") == 0) {
+			NEXT_ARG();
+			if (get_size(&avpkt, *argv)) {
+				fprintf(stderr, "Illegal \"avpkt\"\n");
+				return -1;
+			}
+		} else if (strcmp(*argv, "probability") == 0) {
+			NEXT_ARG();
+			if (sscanf(*argv, "%lg", &probability) != 1) {
+				fprintf(stderr, "Illegal \"probability\"\n");
+				return -1;
+			}
+		} else if (strcmp(*argv, "help") == 0) {
+			explain();
+			return -1;
+		} else {
+			fprintf(stderr, "What is \"%s\"?\n", *argv);
+			explain();
+			return -1;
+		}
+		argc--; argv++;
+	}
+
+	if (!rate || !opt.limit) {
+		fprintf(stderr, "Required parameter (bandwidth, limit) is missing\n");
+		return -1;
+	}
+
+	/* Compute default min/max thresholds based on 
+	   Sally Floyd's recommendations:
+	   http://www.icir.org/floyd/REDparameters.txt
+	*/
+	if (!opt.qth_max) 
+		opt.qth_max = opt.limit / 4;
+	if (!opt.qth_min)
+		opt.qth_min = opt.qth_max / 3;
+	if (!burst)
+		burst = (2 * opt.qth_min + opt.qth_max) / 3;
+
+	if (opt.qth_max > opt.limit) {
+		fprintf(stderr, "\"max\" is larger than \"limit\"\n");
+		return -1;
+	}
+
+	if (opt.qth_min > opt.qth_max) {
+		fprintf(stderr, "\"min\" is not smaller than \"max\"\n");
+		return -1;
+	}
+
+	printf("min=%u max=%u burst=%u limit=%u\n",
+	       opt.qth_min, opt.qth_max, burst, opt.limit);
+	wlog = tc_red_eval_ewma(opt.qth_min, burst, 1);
+	if (wlog < 0) {
+		fprintf(stderr, "CHOKE: failed to calculate EWMA constant.\n");
+		return -1;
+	}
+	if (wlog >= 10)
+		fprintf(stderr, "CHOKE: WARNING. Burst %d seems to be too large.\n", burst);
+	opt.Wlog = wlog;
+
+	wlog = tc_red_eval_P(opt.qth_min, opt.qth_max, probability);
+	if (wlog < 0) {
+		fprintf(stderr, "CHOKE: failed to calculate probability.\n");
+		return -1;
+	}
+	opt.Plog = wlog;
+
+	wlog = tc_red_eval_idle_damping(opt.Wlog, avpkt, rate, sbuf);
+	if (wlog < 0) {
+		fprintf(stderr, "CHOKE: failed to calculate idle damping table.\n");
+		return -1;
+	}
+	opt.Scell_log = wlog;
+	if (ecn_ok)
+		opt.flags |= TC_RED_ECN;
+
+	tail = NLMSG_TAIL(n);
+	addattr_l(n, 1024, TCA_OPTIONS, NULL, 0);
+	addattr_l(n, 1024, TCA_CHOKE_PARMS, &opt, sizeof(opt));
+	addattr_l(n, 1024, TCA_CHOKE_STAB, sbuf, 256);
+	tail->rta_len = (void *) NLMSG_TAIL(n) - (void *) tail;
+	return 0;
+}
+
+static int choke_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt)
+{
+	struct rtattr *tb[TCA_CHOKE_STAB+1];
+	struct tc_red_qopt *qopt;
+	SPRINT_BUF(b1);
+	SPRINT_BUF(b2);
+	SPRINT_BUF(b3);
+
+	if (opt == NULL)
+		return 0;
+
+	parse_rtattr_nested(tb, TCA_CHOKE_STAB, opt);
+
+	if (tb[TCA_CHOKE_PARMS] == NULL)
+		return -1;
+	qopt = RTA_DATA(tb[TCA_CHOKE_PARMS]);
+	if (RTA_PAYLOAD(tb[TCA_CHOKE_PARMS])  < sizeof(*qopt))
+		return -1;
+	fprintf(f, "limit %s min %s max %s ",
+		sprint_size(qopt->limit, b1),
+		sprint_size(qopt->qth_min, b2),
+		sprint_size(qopt->qth_max, b3));
+
+	if (qopt->flags & TC_RED_ECN)
+		fprintf(f, "ecn ");
+
+	if (show_details) {
+		fprintf(f, "ewma %u Plog %u Scell_log %u",
+			qopt->Wlog, qopt->Plog, qopt->Scell_log);
+	}
+	return 0;
+}
+
+static int choke_print_xstats(struct qdisc_util *qu, FILE *f,
+			      struct rtattr *xstats)
+{
+	struct tc_choke_xstats *st;
+
+	if (xstats == NULL)
+		return 0;
+
+	if (RTA_PAYLOAD(xstats) < sizeof(*st))
+		return -1;
+
+	st = RTA_DATA(xstats);
+	fprintf(f, "  marked %u early %u pdrop %u other %u matched %u",
+		st->marked, st->early, st->pdrop, st->other, st->matched);
+	return 0;
+
+}
+
+struct qdisc_util choke_qdisc_util = {
+	.id		= "choke",
+	.parse_qopt	= choke_parse_opt,
+	.print_qopt	= choke_print_opt,
+	.print_xstats	= choke_print_xstats,
+};
-- 
1.7.1



* Re: [PATCH] CHOKe flow scheduler (0.7)
  2011-01-13 17:27 [PATCH] CHOKe flow scheduler (0.7) Stephen Hemminger
  2011-01-13 17:48 ` [PATCH] CHOKe flow scheduler (iproute) Stephen Hemminger
@ 2011-01-13 18:00 ` Eric Dumazet
  2011-01-13 18:02   ` Eric Dumazet
  2011-01-13 20:37 ` Eric Dumazet
  2 siblings, 1 reply; 26+ messages in thread
From: Eric Dumazet @ 2011-01-13 18:00 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Miller, netdev

On Thursday, 13 January 2011 at 09:27 -0800, Stephen Hemminger wrote:
> This implements the CHOKe packet scheduler based on the existing
> Linux RED scheduler based on the algorithm described in the paper.
> 
> The core idea is:
>   For every packet arrival:
>   	Calculate Qave
> 	if (Qave < minth) 
> 	     Queue the new packet
> 	else 
> 	     Select randomly a packet from the queue 
> 	     if (both packets from same flow)
> 	     then Drop both the packets
> 	     else if (Qave > maxth)
> 	          Drop packet
> 	     else
> 	       	  Admit packet with proability p (same as RED)
> 
> See also:
>   Rong Pan, Balaji Prabhakar, Konstantinos Psounis, "CHOKe: a stateless active
>    queue management scheme for approximating fair bandwidth allocation", 
>   Proceeding of INFOCOM'2000, March 2000.
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
> 
> ---
> Patch versions
> 0.3 (Eric) uses table for queue.
> 0.4 allows classification with TC filters
>     fixes crash when peek_random() finds a hole
> 0.5 (Eric) that fixes qlen with holes and peek
> 0.7 change to use separate params / stats than RED
>     account for drops in backlog
> 
> Almost ready, still need to make sure API (netlink) is right
> 
> 
>  net/sched/Kconfig     |   11 +
>  net/sched/Makefile    |    1 
>  net/sched/sch_choke.c |  536 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 548 insertions(+)
> 
> --- a/net/sched/Kconfig	2011-01-12 17:44:05.747500044 -0800
> +++ b/net/sched/Kconfig	2011-01-12 17:44:53.167735188 -0800
> @@ -205,6 +205,17 @@ config NET_SCH_DRR
>  
>  	  If unsure, say N.
>  
> +config NET_SCH_CHOKE
> +	tristate "CHOose and Keep responsive flow scheduler (CHOKE)"
> +	help
> +	  Say Y here if you want to use the CHOKe packet scheduler (CHOose
> +	  and Keep for responsive flows, CHOose and Kill for unresponsive
> +	  flows). This is a variation of RED which trys to penalize flows
> +	  that monopolize the queue.
> +
> +	  To compile this code as a module, choose M here: the
> +	  module will be called sch_choke.
> +
>  config NET_SCH_INGRESS
>  	tristate "Ingress Qdisc"
>  	depends on NET_CLS_ACT
> --- a/net/sched/Makefile	2011-01-12 17:44:05.767500135 -0800
> +++ b/net/sched/Makefile	2011-01-12 17:44:53.167735188 -0800
> @@ -32,6 +32,7 @@ obj-$(CONFIG_NET_SCH_MULTIQ)	+= sch_mult
>  obj-$(CONFIG_NET_SCH_ATM)	+= sch_atm.o
>  obj-$(CONFIG_NET_SCH_NETEM)	+= sch_netem.o
>  obj-$(CONFIG_NET_SCH_DRR)	+= sch_drr.o
> +obj-$(CONFIG_NET_SCH_CHOKE)	+= sch_choke.o
>  obj-$(CONFIG_NET_CLS_U32)	+= cls_u32.o
>  obj-$(CONFIG_NET_CLS_ROUTE4)	+= cls_route.o
>  obj-$(CONFIG_NET_CLS_FW)	+= cls_fw.o
> --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> +++ b/net/sched/sch_choke.c	2011-01-12 17:45:07.227806180 -0800
> @@ -0,0 +1,556 @@
> +/*
> + * net/sched/sch_choke.c	CHOKE scheduler
> + *
> + * Copyright (c) 2011 Stephen Hemminger <shemminger@vyatta.com>
> + * Copyright (c) 2011 Eric Dumazet <eric.dumazet@gmail.com>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * version 2 as published by the Free Software Foundation.
> + *
> + */
> +
> +#include <linux/module.h>
> +#include <linux/types.h>
> +#include <linux/kernel.h>
> +#include <linux/skbuff.h>
> +#include <linux/reciprocal_div.h>
> +#include <net/pkt_sched.h>
> +#include <net/inet_ecn.h>
> +#include <net/red.h>
> +
> +/*	CHOKe stateless AQM for fair bandwidth allocation
> +        =================================================
> +
> +   CHOKe (CHOose and Keep for responsive flows, CHOose and Kill for
> +   unresponsive flows) is a variant of RED that penalizes misbehaving flows but
> +   maintains no flow state. The difference from RED is an additional step
> +   during the enqueuing process. If average queue size is over the
> +   low threshold (qmin), a packet is chosen at random from the queue.
> +   If both the new and chosen packet are from the same flow, both
> +   are dropped. Unlike RED, CHOKe is not really a "classful" qdisc because it
> +   needs to access packets in queue randomly. It has a minimal class
> +   interface to allow overriding the builtin flow classifier with
> +   filters.
> +
> +   Source:
> +   R. Pan, B. Prabhakar, and K. Psounis, "CHOKe, A Stateless
> +   Active Queue Management Scheme for Approximating Fair Bandwidth Allocation",
> +   IEEE INFOCOM, 2000.
> +
> +   A. Tang, J. Wang, S. Low, "Understanding CHOKe: Throughput and Spatial
> +   Characteristics", IEEE/ACM Transactions on Networking, 2004
> +
> + */
> +
> +/* Upper bound on size of sk_buff table */
> +#define CHOKE_MAX_QUEUE	(128*1024 - 1)
> +
> +struct choke_sched_data {
> +/* Parameters */
> +	u32		 limit;
> +	unsigned char	 flags;
> +
> +	struct red_parms parms;
> +
> +/* Variables */
> +	struct tcf_proto *filter_list;
> +	struct {
> +		u32	prob_drop;	/* Early probability drops */
> +		u32	prob_mark;	/* Early probability marks */
> +		u32	forced_drop;	/* Forced drops, qavg > max_thresh */
> +		u32	forced_mark;	/* Forced marks, qavg > max_thresh */
> +		u32	pdrop;          /* Drops due to queue limits */
> +		u32	other;          /* Drops due to drop() calls */
> +		u32	matched;	/* Drops to flow match */
> +	} stats;
> +
> +	unsigned int	 head;
> +	unsigned int	 tail;
> +	unsigned int	 holes;
> +	unsigned int	 tab_mask; /* size - 1 */
> +
> +	struct sk_buff **tab;
> +};
> +
> +static inline unsigned int choke_len(const struct choke_sched_data *q)
> +{
> +	return (q->tail - q->head) & q->tab_mask;
> +}
> +
> +/* deliver a random number between 0 and N - 1 */
> +static inline u32 random_N(unsigned int N)
> +{
> +	return reciprocal_divide(random32(), N);
> +}
> +
> +/* Select a packet at random from the queue in O(1) and handle holes */
> +static struct sk_buff *choke_peek_random(struct choke_sched_data *q,
> +					 unsigned int *pidx)
> +{
> +	struct sk_buff *skb;
> +	int retrys = 3;
> +
> +	do {
> +		*pidx = (q->head + random_N(choke_len(q))) & q->tab_mask;
> +		skb = q->tab[*pidx];
> +		if (skb)
> +			return skb;
> +	} while (--retrys > 0);
> +
> +	/* queue is has lots of holes use the head which is known to exist */
> +	return q->tab[*pidx = q->head];
> +}
> +
> +/* Is ECN parameter configured */
> +static inline int use_ecn(const struct choke_sched_data *q)
> +{
> +	return q->flags & TC_RED_ECN;
> +}
> +
> +/* Should packets over max just be dropped (versus marked) */
> +static inline int use_harddrop(const struct choke_sched_data *q)
> +{
> +	return q->flags & TC_RED_HARDDROP;
> +}
> +
> +/* Move head pointer forward to skip over holes */
> +static void choke_zap_head_holes(struct choke_sched_data *q)
> +{
> +	while (q->holes && q->tab[q->head] == NULL) {
> +		q->head = (q->head + 1) & q->tab_mask;
> +		q->holes--;
> +	}
> +}
> +
> +/* Move tail pointer backwards to reuse holes */
> +static void choke_zap_tail_holes(struct choke_sched_data *q)
> +{
> +	while (q->holes && q->tab[q->tail - 1] == NULL) {
> +		q->tail = (q->tail - 1) & q->tab_mask;
> +		q->holes--;
> +	}
> +}
> +
> +/* Drop packet from queue array by creating a "hole" */
> +static void choke_drop_by_idx(struct choke_sched_data *q, unsigned int idx)
> +{
> +	q->tab[idx] = NULL;
> +	q->holes++;
> +
> +	if (idx == q->head)
> +		choke_zap_head_holes(q);
> +	if (idx == q->tail)
> +		choke_zap_tail_holes(q);
> +}
> +
> +/* Classify flow using either:
> +   1. pre-existing classification result in skb
> +   2. fast internal classification
> +   3. use TC filter based classification
> +*/
> +static inline unsigned int choke_classify(struct sk_buff *skb,
> +					  struct Qdisc *sch, int *qerr)
> +
> +{
> +	struct choke_sched_data *q = qdisc_priv(sch);
> +	struct tcf_result res;
> +	int result;
> +
> +	*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
> +
> +	if (TC_H_MAJ(skb->priority) == sch->handle &&
> +	    TC_H_MIN(skb->priority) > 0)
> +		return TC_H_MIN(skb->priority);
> +
> +	if (!q->filter_list)
> +		return skb_get_rxhash(skb);
> +
> +	result = tc_classify(skb, q->filter_list, &res);
> +	if (result >= 0) {
> +#ifdef CONFIG_NET_CLS_ACT
> +		switch (result) {
> +		case TC_ACT_STOLEN:
> +		case TC_ACT_QUEUED:
> +			*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
> +		case TC_ACT_SHOT:
> +			return 0;
> +		}
> +#endif
> +		return TC_H_MIN(res.classid);
> +	}
> +
> +	return 0;
> +}
> +
> +static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch)
> +{
> +	struct choke_sched_data *q = qdisc_priv(sch);
> +	struct red_parms *p = &q->parms;
> +	unsigned int hash;
> +	int uninitialized_var(ret);
> +
> +	hash = choke_classify(skb, sch, &ret);
> +	if (!hash) {
> +		/* Packet was eaten by filter */
> +		if (ret & __NET_XMIT_BYPASS)
> +			sch->qstats.drops++;
> +		kfree_skb(skb);
> +		return ret;
> +	}
> +
> +	/* Maybe add hash as field in struct qdisc_skb_cb? */
> +	*(unsigned int *)(qdisc_skb_cb(skb)->data) = hash;
> +
> +	/* Compute average queue usage (see RED) */
> +	p->qavg = red_calc_qavg(p, choke_len(q) - q->holes);
> +	if (red_is_idling(p))
> +		red_end_of_idle_period(p);
> +
> +	/* Is queue small? */
> +	if (p->qavg <= p->qth_min)
> +		p->qcount = -1;
> +	else {
> +		struct sk_buff *oskb;
> +		unsigned int idx;
> +
> +		/* Draw a packet at random from queue */
> +		oskb = choke_peek_random(q, &idx);
> +
> +		/* Both packets from same flow ? */
> +		if (*(unsigned int *)(qdisc_skb_cb(oskb)->data) == hash) {
> +			/* Drop both packets */
> +			q->stats.matched++;
> +			choke_drop_by_idx(q, idx);
> +			qdisc_drop(oskb, sch);

I feel we should add : sch->q.qlen--;

> +			goto congestion_drop;
> +		}
> +
> +		/* Queue is large, always mark/drop */
> +		if (p->qavg > p->qth_max) {
> +			p->qcount = -1;
> +
> +			sch->qstats.overlimits++;
> +			if (use_harddrop(q) || !use_ecn(q) ||
> +			    !INET_ECN_set_ce(skb)) {
> +				q->stats.forced_drop++;
> +				goto congestion_drop;
> +			}
> +
> +			q->stats.forced_mark++;
> +		} else if (++p->qcount) {
> +			if (red_mark_probability(p, p->qavg)) {
> +				p->qcount = 0;
> +				p->qR = red_random(p);
> +
> +				sch->qstats.overlimits++;
> +				if (!use_ecn(q) || !INET_ECN_set_ce(skb)) {
> +					q->stats.prob_drop++;
> +					goto congestion_drop;
> +				}
> +
> +				q->stats.prob_mark++;
> +			}
> +		} else
> +			p->qR = red_random(p);
> +	}
> +
> +	/* Admit new packet */
> +	if (likely(choke_len(q) < q->limit)) {
> +
> +		q->tab[q->tail] = skb;
> +		q->tail = (q->tail + 1) & q->tab_mask;
> +
> +		sch->qstats.backlog += qdisc_pkt_len(skb);
> +		qdisc_update_bstats(sch, skb);
> +		sch->q.qlen = choke_len(q) - q->holes;
	or : sch->q.qlen++;

(If sch->q.qlen is up to date with respect to the above comment)

> +		return NET_XMIT_SUCCESS;
> +	}
> +
> +	q->stats.pdrop++;
> +	sch->qstats.drops++;
> +	kfree_skb(skb);
> +	return NET_XMIT_DROP;
> +
> + congestion_drop:
> +	qdisc_drop(skb, sch);
> +	return NET_XMIT_CN;
> +}
> +
> +static struct sk_buff *choke_dequeue(struct Qdisc *sch)
> +{
> +	struct choke_sched_data *q = qdisc_priv(sch);
> +	struct sk_buff *skb;
> +
> +	if (q->head == q->tail) {
> +		if (!red_is_idling(&q->parms))
> +			red_start_of_idle_period(&q->parms);
> +		return NULL;
> +	}
> +	skb = q->tab[q->head];
> +	q->tab[q->head] = NULL; /* not really needed */
> +	q->head = (q->head + 1) & q->tab_mask;
> +	choke_zap_head_holes(q);
> +	sch->qstats.backlog -= qdisc_pkt_len(skb);
> +	sch->q.qlen = choke_len(q) - q->holes;

	sch->q.qlen--;

> +
> +	return skb;
> +}
> +
> +static unsigned int choke_drop(struct Qdisc *sch)
> +{
> +	struct choke_sched_data *q = qdisc_priv(sch);
> +	unsigned int len;
> +
> +	len = qdisc_queue_drop(sch);
> +	if (len > 0)
> +		q->stats.other++;
> +	else {
> +		if (!red_is_idling(&q->parms))
> +			red_start_of_idle_period(&q->parms);
> +	}
> +
> +	return len;
> +}
> +
> +static void choke_reset(struct Qdisc* sch)
> +{
> +	struct choke_sched_data *q = qdisc_priv(sch);
> +
> +	red_restart(&q->parms);
> +}
> +
> +static const struct nla_policy choke_policy[TCA_CHOKE_MAX + 1] = {
> +	[TCA_CHOKE_PARMS]	= { .len = sizeof(struct tc_red_qopt) },
> +	[TCA_CHOKE_STAB]	= { .len = 256 },

RED_STAB_SIZE ?
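
Presumably, assuming the existing RED_STAB_SIZE constant from
include/net/red.h, something like:

	[TCA_CHOKE_STAB]	= { .len = RED_STAB_SIZE },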


Thanks !




* Re: [PATCH] CHOKe flow scheduler (0.7)
  2011-01-13 18:00 ` [PATCH] CHOKe flow scheduler (0.7) Eric Dumazet
@ 2011-01-13 18:02   ` Eric Dumazet
  0 siblings, 0 replies; 26+ messages in thread
From: Eric Dumazet @ 2011-01-13 18:02 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Miller, netdev

On Thursday, 13 January 2011 at 19:00 +0100, Eric Dumazet wrote:
> On Thursday, 13 January 2011 at 09:27 -0800, Stephen Hemminger wrote:
> > This implements the CHOKe packet scheduler based on the existing
> > Linux RED scheduler based on the algorithm described in the paper.
> > 
> >

Sorry for the long reply; I hit the 'Send' button in the wrong window
before removing the hunks.





* Re: [PATCH] CHOKe flow scheduler (0.7)
  2011-01-13 17:27 [PATCH] CHOKe flow scheduler (0.7) Stephen Hemminger
  2011-01-13 17:48 ` [PATCH] CHOKe flow scheduler (iproute) Stephen Hemminger
  2011-01-13 18:00 ` [PATCH] CHOKe flow scheduler (0.7) Eric Dumazet
@ 2011-01-13 20:37 ` Eric Dumazet
  2011-01-13 23:34   ` [PATCH] CHOKe flow scheduler (0.8) Stephen Hemminger
  2 siblings, 1 reply; 26+ messages in thread
From: Eric Dumazet @ 2011-01-13 20:37 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Miller, netdev

On Thursday, 13 January 2011 at 09:27 -0800, Stephen Hemminger wrote:
> +	/* Admit new packet */
> +	if (likely(choke_len(q) < q->limit)) {
> +
> +		q->tab[q->tail] = skb;
> +		q->tail = (q->tail + 1) & q->tab_mask;
> +
> +		sch->qstats.backlog += qdisc_pkt_len(skb);
> +		qdisc_update_bstats(sch, skb);
> +		sch->q.qlen = choke_len(q) - q->holes;
> +		return NET_XMIT_SUCCESS;
> +	}
> +

You now must use qdisc_bstats_update() ;)
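
(That is, presumably the qdisc_update_bstats(sch, skb) call in the hunk above
becomes qdisc_bstats_update(sch, skb); the v0.8 patch later in the thread
does exactly that.)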





* Re: [PATCH] CHOKe flow scheduler (iproute)
  2011-01-13 17:48 ` [PATCH] CHOKe flow scheduler (iproute) Stephen Hemminger
@ 2011-01-13 21:01   ` Eric Dumazet
  0 siblings, 0 replies; 26+ messages in thread
From: Eric Dumazet @ 2011-01-13 21:01 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Miller, netdev

On Thursday, 13 January 2011 at 09:48 -0800, Stephen Hemminger wrote:
> Preliminary interface for CHOKe scheduler in iproute
> 
> +		return -1;
> +	fprintf(f, "limit %s min %s max %s ",
> +		sprint_size(qopt->limit, b1),
> +		sprint_size(qopt->qth_min, b2),
> +		sprint_size(qopt->qth_max, b3));
> +

sprint_size() adds 'b', not 'p' (packets)

fprintf(f, "limit %up min %up max %up ", qopt->limit, ...






* [PATCH] CHOKe flow scheduler (0.8)
  2011-01-13 20:37 ` Eric Dumazet
@ 2011-01-13 23:34   ` Stephen Hemminger
  2011-01-14  3:29     ` Eric Dumazet
  2011-01-14 13:54     ` Patrick McHardy
  0 siblings, 2 replies; 26+ messages in thread
From: Stephen Hemminger @ 2011-01-13 23:34 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev

CHOKe ("CHOose and Kill" or "CHOose and Keep") is an alternative
packet scheduler based on the Random Early Detection (RED) algorithm.

The core idea is:
  For every packet arrival:
  	Calculate Qave
	if (Qave < minth) 
	     Queue the new packet
	else 
	     Select randomly a packet from the queue 
	     if (both packets from same flow)
	     then Drop both the packets
	     else if (Qave > maxth)
	          Drop packet
	     else
	       	  Admit packet with probability P (same as RED)

See also:
  Rong Pan, Balaji Prabhakar, Konstantinos Psounis, "CHOKe: a stateless active
   queue management scheme for approximating fair bandwidth allocation",
  Proceedings of IEEE INFOCOM, March 2000.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

---
0.8 changed queue length and hole accounting:
    keep sch->q.qlen updated, so a separate holes counter is not needed.

---
 net/sched/Kconfig     |   11 +
 net/sched/Makefile    |    1 
 net/sched/sch_choke.c |  536 ++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 548 insertions(+)

--- a/net/sched/Kconfig	2011-01-13 15:19:41.542022830 -0800
+++ b/net/sched/Kconfig	2011-01-13 15:20:53.586380066 -0800
@@ -205,6 +205,17 @@ config NET_SCH_DRR
 
 	  If unsure, say N.
 
+config NET_SCH_CHOKE
+	tristate "CHOose and Keep responsive flow scheduler (CHOKE)"
+	help
+	  Say Y here if you want to use the CHOKe packet scheduler (CHOose
+	  and Keep for responsive flows, CHOose and Kill for unresponsive
+	  flows). This is a variation of RED which tries to penalize flows
+	  that monopolize the queue.
+
+	  To compile this code as a module, choose M here: the
+	  module will be called sch_choke.
+
 config NET_SCH_INGRESS
 	tristate "Ingress Qdisc"
 	depends on NET_CLS_ACT
--- a/net/sched/Makefile	2011-01-13 15:19:41.578022995 -0800
+++ b/net/sched/Makefile	2011-01-13 15:20:53.586380066 -0800
@@ -32,6 +32,7 @@ obj-$(CONFIG_NET_SCH_MULTIQ)	+= sch_mult
 obj-$(CONFIG_NET_SCH_ATM)	+= sch_atm.o
 obj-$(CONFIG_NET_SCH_NETEM)	+= sch_netem.o
 obj-$(CONFIG_NET_SCH_DRR)	+= sch_drr.o
+obj-$(CONFIG_NET_SCH_CHOKE)	+= sch_choke.o
 obj-$(CONFIG_NET_CLS_U32)	+= cls_u32.o
 obj-$(CONFIG_NET_CLS_ROUTE4)	+= cls_route.o
 obj-$(CONFIG_NET_CLS_FW)	+= cls_fw.o
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ b/net/sched/sch_choke.c	2011-01-13 15:26:18.771992614 -0800
@@ -0,0 +1,552 @@
+/*
+ * net/sched/sch_choke.c	CHOKE scheduler
+ *
+ * Copyright (c) 2011 Stephen Hemminger <shemminger@vyatta.com>
+ * Copyright (c) 2011 Eric Dumazet <eric.dumazet@gmail.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * version 2 as published by the Free Software Foundation.
+ *
+ */
+
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/skbuff.h>
+#include <linux/reciprocal_div.h>
+#include <net/pkt_sched.h>
+#include <net/inet_ecn.h>
+#include <net/red.h>
+
+/*	CHOKe stateless AQM for fair bandwidth allocation
+        =================================================
+
+   CHOKe (CHOose and Keep for responsive flows, CHOose and Kill for
+   unresponsive flows) is a variant of RED that penalizes misbehaving flows but
+   maintains no flow state. The difference from RED is an additional step
+   during the enqueuing process. If average queue size is over the
+   low threshold (qmin), a packet is chosen at random from the queue.
+   If both the new and chosen packet are from the same flow, both
+   are dropped. Unlike RED, CHOKe is not really a "classful" qdisc because it
+   needs to access packets in queue randomly. It has a minimal class
+   interface to allow overriding the builtin flow classifier with
+   filters.
+
+   Source:
+   R. Pan, B. Prabhakar, and K. Psounis, "CHOKe, A Stateless
+   Active Queue Management Scheme for Approximating Fair Bandwidth Allocation",
+   IEEE INFOCOM, 2000.
+
+   A. Tang, J. Wang, S. Low, "Understanding CHOKe: Throughput and Spatial
+   Characteristics", IEEE/ACM Transactions on Networking, 2004
+
+ */
+
+/* Upper bound on size of sk_buff table */
+#define CHOKE_MAX_QUEUE	(128*1024 - 1)
+
+struct choke_sched_data {
+/* Parameters */
+	u32		 limit;
+	unsigned char	 flags;
+
+	struct red_parms parms;
+
+/* Variables */
+	struct tcf_proto *filter_list;
+	struct {
+		u32	prob_drop;	/* Early probability drops */
+		u32	prob_mark;	/* Early probability marks */
+		u32	forced_drop;	/* Forced drops, qavg > max_thresh */
+		u32	forced_mark;	/* Forced marks, qavg > max_thresh */
+		u32	pdrop;          /* Drops due to queue limits */
+		u32	other;          /* Drops due to drop() calls */
+		u32	matched;	/* Drops due to flow match */
+	} stats;
+
+	unsigned int	 head;
+	unsigned int	 tail;
+
+	unsigned int	 tab_mask; /* size - 1 */
+
+	struct sk_buff **tab;
+};
+
+/* deliver a random number between 0 and N - 1 */
+static inline u32 random_N(unsigned int N)
+{
+	return reciprocal_divide(random32(), N);
+}
+
+/* Select a packet at random from the queue in O(1) and handle holes */
+static struct sk_buff *choke_peek_random(struct Qdisc *sch,
+					 unsigned int *pidx)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct sk_buff *skb;
+	int retrys = 3;
+
+	do {
+		*pidx = (q->head + random_N(sch->q.qlen)) & q->tab_mask;
+		skb = q->tab[*pidx];
+		if (skb)
+			return skb;
+	} while (--retrys > 0);
+
+	/* queue has lots of holes, use the head which is known to exist */
+	return q->tab[*pidx = q->head];
+}
+
+/* Is ECN parameter configured */
+static inline int use_ecn(const struct choke_sched_data *q)
+{
+	return q->flags & TC_RED_ECN;
+}
+
+/* Should packets over max just be dropped (versus marked) */
+static inline int use_harddrop(const struct choke_sched_data *q)
+{
+	return q->flags & TC_RED_HARDDROP;
+}
+
+/* Move head pointer forward to skip over holes */
+static void choke_zap_head_holes(struct choke_sched_data *q)
+{
+	while (q->tab[q->head] == NULL) {
+		q->head = (q->head + 1) & q->tab_mask;
+
+		BUG_ON(q->head == q->tail);
+	}
+}
+
+/* Move tail pointer backwards to reuse holes */
+static void choke_zap_tail_holes(struct choke_sched_data *q)
+{
+	while (q->tab[q->tail - 1] == NULL) {
+		q->tail = (q->tail - 1) & q->tab_mask;
+		BUG_ON(q->head == q->tail);
+	}
+}
+
+/* Drop packet from queue array by creating a "hole" */
+static void choke_drop_by_idx(struct choke_sched_data *q, unsigned int idx)
+{
+	q->tab[idx] = NULL;
+
+	if (idx == q->head)
+		choke_zap_head_holes(q);
+	if (idx == q->tail)
+		choke_zap_tail_holes(q);
+}
+
+/* Classify flow using either:
+   1. pre-existing classification result in skb
+   2. fast internal classification
+   3. use TC filter based classification
+*/
+static inline unsigned int choke_classify(struct sk_buff *skb,
+					  struct Qdisc *sch, int *qerr)
+
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct tcf_result res;
+	int result;
+
+	*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
+
+	if (TC_H_MAJ(skb->priority) == sch->handle &&
+	    TC_H_MIN(skb->priority) > 0)
+		return TC_H_MIN(skb->priority);
+
+	if (!q->filter_list)
+		return skb_get_rxhash(skb);
+
+	result = tc_classify(skb, q->filter_list, &res);
+	if (result >= 0) {
+#ifdef CONFIG_NET_CLS_ACT
+		switch (result) {
+		case TC_ACT_STOLEN:
+		case TC_ACT_QUEUED:
+			*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
+		case TC_ACT_SHOT:
+			return 0;
+		}
+#endif
+		return TC_H_MIN(res.classid);
+	}
+
+	return 0;
+}
+
+static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct red_parms *p = &q->parms;
+	unsigned int hash;
+	int uninitialized_var(ret);
+
+	hash = choke_classify(skb, sch, &ret);
+	if (!hash) {
+		/* Packet was eaten by filter */
+		if (ret & __NET_XMIT_BYPASS)
+			sch->qstats.drops++;
+		kfree_skb(skb);
+		return ret;
+	}
+
+	/* Maybe add hash as field in struct qdisc_skb_cb? */
+	*(unsigned int *)(qdisc_skb_cb(skb)->data) = hash;
+
+	/* Compute average queue usage (see RED) */
+	p->qavg = red_calc_qavg(p, sch->q.qlen);
+	if (red_is_idling(p))
+		red_end_of_idle_period(p);
+
+	/* Is queue small? */
+	if (p->qavg <= p->qth_min)
+		p->qcount = -1;
+	else {
+		struct sk_buff *oskb;
+		unsigned int idx;
+
+		/* Draw a packet at random from queue */
+		oskb = choke_peek_random(sch, &idx);
+
+		/* Both packets from same flow ? */
+		if (*(unsigned int *)(qdisc_skb_cb(oskb)->data) == hash) {
+			/* Drop both packets */
+			q->stats.matched++;
+			choke_drop_by_idx(q, idx);
+			sch->qstats.backlog -= qdisc_pkt_len(skb);
+			--sch->q.qlen;
+			qdisc_drop(oskb, sch);
+			goto congestion_drop;
+		}
+
+		/* Queue is large, always mark/drop */
+		if (p->qavg > p->qth_max) {
+			p->qcount = -1;
+
+			sch->qstats.overlimits++;
+			if (use_harddrop(q) || !use_ecn(q) ||
+			    !INET_ECN_set_ce(skb)) {
+				q->stats.forced_drop++;
+				goto congestion_drop;
+			}
+
+			q->stats.forced_mark++;
+		} else if (++p->qcount) {
+			if (red_mark_probability(p, p->qavg)) {
+				p->qcount = 0;
+				p->qR = red_random(p);
+
+				sch->qstats.overlimits++;
+				if (!use_ecn(q) || !INET_ECN_set_ce(skb)) {
+					q->stats.prob_drop++;
+					goto congestion_drop;
+				}
+
+				q->stats.prob_mark++;
+			}
+		} else
+			p->qR = red_random(p);
+	}
+
+	/* Admit new packet */
+	if (sch->q.qlen < q->limit) {
+		q->tab[q->tail] = skb;
+		q->tail = (q->tail + 1) & q->tab_mask;
+		++sch->q.qlen;
+		sch->qstats.backlog += qdisc_pkt_len(skb);
+		qdisc_bstats_update(sch, skb);
+		return NET_XMIT_SUCCESS;
+	}
+
+	q->stats.pdrop++;
+	sch->qstats.drops++;
+	kfree_skb(skb);
+	return NET_XMIT_DROP;
+
+ congestion_drop:
+	qdisc_drop(skb, sch);
+	return NET_XMIT_CN;
+}
+
+static struct sk_buff *choke_dequeue(struct Qdisc *sch)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct sk_buff *skb;
+
+	if (q->head == q->tail) {
+		if (!red_is_idling(&q->parms))
+			red_start_of_idle_period(&q->parms);
+		return NULL;
+	}
+
+	skb = q->tab[q->head];
+	q->tab[q->head] = NULL; /* not really needed */
+	q->head = (q->head + 1) & q->tab_mask;
+	choke_zap_head_holes(q);
+	--sch->q.qlen;
+	sch->qstats.backlog -= qdisc_pkt_len(skb);
+
+	return skb;
+}
+
+static unsigned int choke_drop(struct Qdisc *sch)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	unsigned int len;
+
+	len = qdisc_queue_drop(sch);
+	if (len > 0)
+		q->stats.other++;
+	else {
+		if (!red_is_idling(&q->parms))
+			red_start_of_idle_period(&q->parms);
+	}
+
+	return len;
+}
+
+static void choke_reset(struct Qdisc* sch)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+
+	red_restart(&q->parms);
+}
+
+static const struct nla_policy choke_policy[TCA_CHOKE_MAX + 1] = {
+	[TCA_CHOKE_PARMS]	= { .len = sizeof(struct tc_red_qopt) },
+	[TCA_CHOKE_STAB]	= { .len = 256 },
+};
+
+
+static void choke_free(void *addr)
+{
+	if (addr) {
+		if (is_vmalloc_addr(addr))
+			vfree(addr);
+		else
+			kfree(addr);
+	}
+}
+
+static int choke_change(struct Qdisc *sch, struct nlattr *opt)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct nlattr *tb[TCA_CHOKE_MAX + 1];
+	struct tc_red_qopt *ctl;
+	int err;
+	struct sk_buff **old = NULL;
+	unsigned int mask;
+
+	if (opt == NULL)
+		return -EINVAL;
+
+	err = nla_parse_nested(tb, TCA_CHOKE_MAX, opt, choke_policy);
+	if (err < 0)
+		return err;
+
+	if (tb[TCA_CHOKE_PARMS] == NULL ||
+	    tb[TCA_CHOKE_STAB] == NULL)
+		return -EINVAL;
+
+	ctl = nla_data(tb[TCA_CHOKE_PARMS]);
+
+	if (ctl->limit > CHOKE_MAX_QUEUE)
+		return -EINVAL;
+
+	mask = roundup_pow_of_two(ctl->limit + 1) - 1;
+	if (mask != q->tab_mask) {
+		struct sk_buff **ntab;
+
+		ntab = kcalloc(mask + 1, sizeof(struct sk_buff *), GFP_KERNEL);
+		if (!ntab)
+			ntab = vzalloc((mask + 1) * sizeof(struct sk_buff *));
+		if (!ntab)
+			return -ENOMEM;
+
+		sch_tree_lock(sch);
+		old = q->tab;
+		if (old) {
+			unsigned int tail = 0;
+
+			while (q->head != q->tail) {
+				ntab[tail++] = q->tab[q->head];
+				q->head = (q->head + 1) & q->tab_mask;
+			}
+			q->head = 0;
+			q->tail = tail;
+		}
+
+		q->tab_mask = mask;
+		q->tab = ntab;
+	} else
+		sch_tree_lock(sch);
+
+	q->flags = ctl->flags;
+	q->limit = ctl->limit;
+
+	red_set_parms(&q->parms, ctl->qth_min, ctl->qth_max, ctl->Wlog,
+		      ctl->Plog, ctl->Scell_log,
+		      nla_data(tb[TCA_CHOKE_STAB]));
+
+	if (q->head == q->tail)
+		red_end_of_idle_period(&q->parms);
+
+	sch_tree_unlock(sch);
+	choke_free(old);
+	return 0;
+}
+
+static int choke_init(struct Qdisc* sch, struct nlattr *opt)
+{
+	return choke_change(sch, opt);
+}
+
+static int choke_dump(struct Qdisc *sch, struct sk_buff *skb)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct nlattr *opts = NULL;
+	struct tc_red_qopt opt = {
+		.limit		= q->limit,
+		.flags		= q->flags,
+		.qth_min	= q->parms.qth_min >> q->parms.Wlog,
+		.qth_max	= q->parms.qth_max >> q->parms.Wlog,
+		.Wlog		= q->parms.Wlog,
+		.Plog		= q->parms.Plog,
+		.Scell_log	= q->parms.Scell_log,
+	};
+
+	opts = nla_nest_start(skb, TCA_OPTIONS);
+	if (opts == NULL)
+		goto nla_put_failure;
+
+	NLA_PUT(skb, TCA_CHOKE_PARMS, sizeof(opt), &opt);
+	return nla_nest_end(skb, opts);
+
+nla_put_failure:
+	nla_nest_cancel(skb, opts);
+	return -EMSGSIZE;
+}
+
+static int choke_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct tc_choke_xstats st = {
+		.early	= q->stats.prob_drop + q->stats.forced_drop,
+		.marked	= q->stats.prob_mark + q->stats.forced_mark,
+		.pdrop	= q->stats.pdrop,
+		.other	= q->stats.other,
+		.matched = q->stats.matched,
+	};
+
+	return gnet_stats_copy_app(d, &st, sizeof(st));
+}
+
+static void choke_destroy(struct Qdisc *sch)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+
+	tcf_destroy_chain(&q->filter_list);
+	choke_free(q->tab);
+}
+
+static struct Qdisc *choke_leaf(struct Qdisc *sch, unsigned long arg)
+{
+	return NULL;
+}
+
+static unsigned long choke_get(struct Qdisc *sch, u32 classid)
+{
+	return 0;
+}
+
+static void choke_put(struct Qdisc *q, unsigned long cl)
+{
+}
+
+static unsigned long choke_bind(struct Qdisc *sch, unsigned long parent,
+				u32 classid)
+{
+	return 0;
+}
+
+static struct tcf_proto **choke_find_tcf(struct Qdisc *sch, unsigned long cl)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+
+	if (cl)
+		return NULL;
+	return &q->filter_list;
+}
+
+static int choke_dump_class(struct Qdisc *sch, unsigned long cl,
+			  struct sk_buff *skb, struct tcmsg *tcm)
+{
+	tcm->tcm_handle |= TC_H_MIN(cl);
+	return 0;
+}
+
+static void choke_walk(struct Qdisc *sch, struct qdisc_walker *arg)
+{
+	if (!arg->stop) {
+		if (arg->fn(sch, 1, arg) < 0) {
+			arg->stop = 1;
+			return;
+		}
+		arg->count++;
+	}
+}
+
+static const struct Qdisc_class_ops choke_class_ops = {
+	.leaf		=	choke_leaf,
+	.get		=	choke_get,
+	.put		=	choke_put,
+	.tcf_chain	=	choke_find_tcf,
+	.bind_tcf	=	choke_bind,
+	.unbind_tcf	=	choke_put,
+	.dump		=	choke_dump_class,
+	.walk		=	choke_walk,
+};
+
+static struct sk_buff *choke_peek_head(struct Qdisc *sch)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+
+	return (q->head != q->tail) ? q->tab[q->head] : NULL;
+}
+
+static struct Qdisc_ops choke_qdisc_ops __read_mostly = {
+	.id		=	"choke",
+	.priv_size	=	sizeof(struct choke_sched_data),
+
+	.enqueue	=	choke_enqueue,
+	.dequeue	=	choke_dequeue,
+	.peek		=	choke_peek_head,
+	.drop		=	choke_drop,
+	.init		=	choke_init,
+	.destroy	=	choke_destroy,
+	.reset		=	choke_reset,
+	.change		=	choke_change,
+	.dump		=	choke_dump,
+	.dump_stats	=	choke_dump_stats,
+	.owner		=	THIS_MODULE,
+};
+
+static int __init choke_module_init(void)
+{
+	return register_qdisc(&choke_qdisc_ops);
+}
+
+static void __exit choke_module_exit(void)
+{
+	unregister_qdisc(&choke_qdisc_ops);
+}
+
+module_init(choke_module_init)
+module_exit(choke_module_exit)
+
+MODULE_LICENSE("GPL");
--- a/include/linux/pkt_sched.h	2011-01-13 15:19:41.726023725 -0800
+++ b/include/linux/pkt_sched.h	2011-01-13 15:20:53.590380094 -0800
@@ -247,6 +247,35 @@ struct tc_gred_sopt {
 	__u16		pad1;
 };
 
+/* CHOKe section */
+
+enum {
+	TCA_CHOKE_UNSPEC,
+	TCA_CHOKE_PARMS,
+	TCA_CHOKE_STAB,
+	__TCA_CHOKE_MAX,
+};
+
+#define TCA_CHOKE_MAX (__TCA_CHOKE_MAX - 1)
+
+struct tc_choke_qopt {
+	__u32		limit;		/* HARD maximal queue length (packets)	*/
+	__u32		qth_min;	/* Min average length threshold (packets) */
+	__u32		qth_max;	/* Max average length threshold (packets) */
+	unsigned char   Wlog;		/* log(W)		*/
+	unsigned char   Plog;		/* log(P_max/(qth_max-qth_min))	*/
+	unsigned char   Scell_log;	/* cell size for idle damping */
+	unsigned char	flags;		/* see RED flags */
+};
+
+struct tc_choke_xstats {
+	__u32           early;          /* Early drops */
+	__u32           pdrop;          /* Drops due to queue limits */
+	__u32           other;          /* Drops due to drop() calls */
+	__u32           marked;         /* Marked packets */
+	__u32		matched;	/* Drops due to flow match */
+};
+
 /* HTB section */
 #define TC_HTB_NUMPRIO		8
 #define TC_HTB_MAXDEPTH		8


* Re: [PATCH] CHOKe flow scheduler (0.8)
  2011-01-13 23:34   ` [PATCH] CHOKe flow scheduler (0.8) Stephen Hemminger
@ 2011-01-14  3:29     ` Eric Dumazet
  2011-01-14  3:34       ` Eric Dumazet
  2011-01-14 13:54     ` Patrick McHardy
  1 sibling, 1 reply; 26+ messages in thread
From: Eric Dumazet @ 2011-01-14  3:29 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Miller, netdev

On Thursday, 13 January 2011 at 15:34 -0800, Stephen Hemminger wrote:
> CHOKe ("CHOose and Kill" or "CHOose and Keep") is an alternative
> packet scheduler based on the Random Exponential Drop (RED) algorithm.
> 
> The core idea is:
>   For every packet arrival:
>   	Calculate Qave
> 	if (Qave < minth) 
> 	     Queue the new packet
> 	else 
> 	     Select randomly a packet from the queue 
> 	     if (both packets from same flow)
> 	     then Drop both the packets
> 	     else if (Qave > maxth)
> 	          Drop packet
> 	     else
> 	       	  Admit packet with probability P (same as RED)
> 
> See also:
>   Rong Pan, Balaji Prabhakar, Konstantinos Psounis, "CHOKe: a stateless active
>    queue management scheme for approximating fair bandwidth allocation", 
>   Proceeding of INFOCOM'2000, March 2000.
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
> 
> ---
> 0.8 change queue length and holes account.
>     keep sch->q.qlen updated, and holes counter not needed.
> 
> ---
>  net/sched/Kconfig     |   11 +
>  net/sched/Makefile    |    1 
>  net/sched/sch_choke.c |  536 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 548 insertions(+)
> 

Hi Stephen

Your diffstat was an old one; here is the right one.

 include/linux/pkt_sched.h |   29 +
 net/sched/Kconfig         |   11 
 net/sched/Makefile        |    1 
 net/sched/sch_choke.c     |  552 ++++++++++++++++++++++++++++++++++++


I tested v8 and found several serious problems; please find a diff of my
latest changes:

- wrong oskb/skb used in choke_enqueue()
- choke_zap_head_holes() is called from choke_dequeue() and crashes if we
dequeued the last packet (!!!)
- out-of-bounds access in choke_zap_tail_holes() (q->tab[q->tail - 1]
underflows when q->tail == 0)
- choke_dequeue() can be shorter
- choke_change() must dequeue/drop excess packets or risk overfilling the
new array (if we reduce the queue limit by tc qdisc change ...)
- inline is not needed; whitespace errors in the include file

Thanks !

 include/linux/pkt_sched.h |    8 +++---
 net/sched/sch_choke.c     |   43 ++++++++++++++++++++++--------------
 2 files changed, 31 insertions(+), 20 deletions(-)

diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h
index 83bac92..498c798 100644
--- a/include/linux/pkt_sched.h
+++ b/include/linux/pkt_sched.h
@@ -269,10 +269,10 @@ struct tc_choke_qopt {
 };
 
 struct tc_choke_xstats {
-	__u32           early;          /* Early drops */
-	__u32           pdrop;          /* Drops due to queue limits */
-	__u32           other;          /* Drops due to drop() calls */
-	__u32           marked;         /* Marked packets */
+	__u32		early;          /* Early drops */
+	__u32		pdrop;          /* Drops due to queue limits */
+	__u32		other;          /* Drops due to drop() calls */
+	__u32		marked;         /* Marked packets */
 	__u32		matched;	/* Drops due to flow match */
 };
 
diff --git a/net/sched/sch_choke.c b/net/sched/sch_choke.c
index 136d4e5..c710c57 100644
--- a/net/sched/sch_choke.c
+++ b/net/sched/sch_choke.c
@@ -74,7 +74,7 @@ struct choke_sched_data {
 };
 
 /* deliver a random number between 0 and N - 1 */
-static inline u32 random_N(unsigned int N)
+static u32 random_N(unsigned int N)
 {
 	return reciprocal_divide(random32(), N);
 }
@@ -99,13 +99,13 @@ static struct sk_buff *choke_peek_random(struct Qdisc *sch,
 }
 
 /* Is ECN parameter configured */
-static inline int use_ecn(const struct choke_sched_data *q)
+static int use_ecn(const struct choke_sched_data *q)
 {
 	return q->flags & TC_RED_ECN;
 }
 
 /* Should packets over max just be dropped (versus marked) */
-static inline int use_harddrop(const struct choke_sched_data *q)
+static int use_harddrop(const struct choke_sched_data *q)
 {
 	return q->flags & TC_RED_HARDDROP;
 }
@@ -113,20 +113,21 @@ static inline int use_harddrop(const struct choke_sched_data *q)
 /* Move head pointer forward to skip over holes */
 static void choke_zap_head_holes(struct choke_sched_data *q)
 {
-	while (q->tab[q->head] == NULL) {
+	do {
 		q->head = (q->head + 1) & q->tab_mask;
-
-		BUG_ON(q->head == q->tail);
-	}
+		if (q->head == q->tail)
+			break;
+	} while (q->tab[q->head] == NULL);
 }
 
 /* Move tail pointer backwards to reuse holes */
 static void choke_zap_tail_holes(struct choke_sched_data *q)
 {
-	while (q->tab[q->tail - 1] == NULL) {
+	do {
 		q->tail = (q->tail - 1) & q->tab_mask;
-		BUG_ON(q->head == q->tail);
-	}
+		if (q->head == q->tail)
+			break;
+	} while (q->tab[q->tail] == NULL);
 }
 
 /* Drop packet from queue array by creating a "hole" */
@@ -145,7 +146,7 @@ static void choke_drop_by_idx(struct choke_sched_data *q, unsigned int idx)
    2. fast internal classification
    3. use TC filter based classification
 */
-static inline unsigned int choke_classify(struct sk_buff *skb,
+static unsigned int choke_classify(struct sk_buff *skb,
 					  struct Qdisc *sch, int *qerr)
 
 {
@@ -218,7 +219,7 @@ static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 			/* Drop both packets */
 			q->stats.matched++;
 			choke_drop_by_idx(q, idx);
-			sch->qstats.backlog -= qdisc_pkt_len(skb);
+			sch->qstats.backlog -= qdisc_pkt_len(oskb);
 			--sch->q.qlen;
 			qdisc_drop(oskb, sch);
 			goto congestion_drop;
@@ -285,8 +286,7 @@ static struct sk_buff *choke_dequeue(struct Qdisc *sch)
 	}
 
 	skb = q->tab[q->head];
-	q->tab[q->head] = NULL; /* not really needed */
-	q->head = (q->head + 1) & q->tab_mask;
+	q->tab[q->head] = NULL;
 	choke_zap_head_holes(q);
 	--sch->q.qlen;
 	sch->qstats.backlog -= qdisc_pkt_len(skb);
@@ -371,12 +371,23 @@ static int choke_change(struct Qdisc *sch, struct nlattr *opt)
 		sch_tree_lock(sch);
 		old = q->tab;
 		if (old) {
-			unsigned int tail = 0;
+			unsigned int oqlen = sch->q.qlen, tail = 0;
 
 			while (q->head != q->tail) {
-				ntab[tail++] = q->tab[q->head];
+				struct sk_buff *skb = q->tab[q->head];
+
 				q->head = (q->head + 1) & q->tab_mask;
+				if (!skb)
+					continue;
+				if (tail < mask) {
+					ntab[tail++] = skb;
+					continue;
+				}
+				sch->qstats.backlog -= qdisc_pkt_len(skb);
+				--sch->q.qlen;
+				qdisc_drop(skb, sch);
 			}
+			qdisc_tree_decrease_qlen(sch, oqlen - sch->q.qlen);
 			q->head = 0;
 			q->tail = tail;
 		}




* Re: [PATCH] CHOKe flow scheduler (0.8)
  2011-01-14  3:29     ` Eric Dumazet
@ 2011-01-14  3:34       ` Eric Dumazet
  2011-01-14  3:58         ` Eric Dumazet
  0 siblings, 1 reply; 26+ messages in thread
From: Eric Dumazet @ 2011-01-14  3:34 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Miller, netdev

On Friday, 14 January 2011 at 04:29 +0100, Eric Dumazet wrote:
> On Thursday, 13 January 2011 at 15:34 -0800, Stephen Hemminger wrote:
> > CHOKe ("CHOose and Kill" or "CHOose and Keep") is an alternative
> > packet scheduler based on the Random Exponential Drop (RED) algorithm.
> > 
> > The core idea is:
> >   For every packet arrival:
> >   	Calculate Qave
> > 	if (Qave < minth) 
> > 	     Queue the new packet
> > 	else 
> > 	     Select randomly a packet from the queue 
> > 	     if (both packets from same flow)
> > 	     then Drop both the packets
> > 	     else if (Qave > maxth)
> > 	          Drop packet
> > 	     else
> > 	       	  Admit packet with probability P (same as RED)
> > 
> > See also:
> >   Rong Pan, Balaji Prabhakar, Konstantinos Psounis, "CHOKe: a stateless active
> >    queue management scheme for approximating fair bandwidth allocation", 
> >   Proceeding of INFOCOM'2000, March 2000.
> > 
> > Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
> > 
> > ---
> > 0.8 change queue length and holes account.
> >     keep sch->q.qlen updated, and holes counter not needed.
> > 
> > ---
> >  net/sched/Kconfig     |   11 +
> >  net/sched/Makefile    |    1 
> >  net/sched/sch_choke.c |  536 ++++++++++++++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 548 insertions(+)
> > 
> 
> Hi Stephen
> 
> Your diffstat was an old one, here the right one.
> 
>  include/linux/pkt_sched.h |   29 +
>  net/sched/Kconfig         |   11 
>  net/sched/Makefile        |    1 
>  net/sched/sch_choke.c     |  552 ++++++++++++++++++++++++++++++++++++
> 
> 
> I tested v8 and found several serious problems, please find a diff of my
> latest changes :
> 
> - wrong oskb/skb used in choke_enqueue()
> - choke_zap_head_holes() is called from choke_dequeue() and crash if we
> dequeued last packet. (!!!)
> - out of bound access in choke_zap_tail_holes()
> - choke_dequeue() can be shorter
> - choke_change() must dequeue/drop in excess packets or risk new array
> overfill (if we reduce queue limit by tc qdisc change ...)
> - inline is not needed, space errors in include file
> 
> Thanks !

Hmm, please wait a bit, I had another crash when I stopped my
bench/stress





* Re: [PATCH] CHOKe flow scheduler (0.8)
  2011-01-14  3:34       ` Eric Dumazet
@ 2011-01-14  3:58         ` Eric Dumazet
  2011-01-14 11:32           ` Eric Dumazet
  0 siblings, 1 reply; 26+ messages in thread
From: Eric Dumazet @ 2011-01-14  3:58 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Miller, netdev

On Friday, 14 January 2011 at 04:34 +0100, Eric Dumazet wrote:

> Hmm, please wait a bit, I had another crash when I stopped my
> bench/stress

I am not sure p->qavg is correctly computed.

The crash happened because choke_peek_random() was called while no packet
was in the queue.

With my params (min=10833 max=32500 burst=18055 limit=130000) this
implies qavg was very big while qlen == 0!

qdisc choke 11: dev ifb0 parent 1:11 limit 130000b min 10833b max 32500b ewma 13 Plog 21 Scell_log 30
 Sent 200857857 bytes 365183 pkt (dropped 1010937, overlimits 557577 requeues 0) 
 rate 32253Kbit 7330pps backlog 17875996b 32505p requeues 0 
  marked 0 early 557577 pdrop 0 other 0 matched 226680
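
For reference, here is a simplified sketch of RED's averaging in plain C
(floating point for clarity, not the kernel's fixed-point helpers); qavg
is only recomputed when a packet arrives, so once arrivals stop it keeps
its stale value unless the idle-period handling decays it:

	/* Simplified sketch, not the kernel implementation: RED keeps an
	 * EWMA of the instantaneous queue length with weight W = 2^-Wlog,
	 * updated only at enqueue time.
	 */
	static double red_qavg_step(double qavg, unsigned int qlen, int Wlog)
	{
		double W = 1.0 / (double)(1 << Wlog);

		return (1.0 - W) * qavg + W * qlen;
	}

With ewma 13 as in the output above, each arrival moves qavg by only
1/8192 of the gap to the instantaneous queue length, so after a burst the
average decays very slowly; the idle-period handling in <net/red.h> is
what is supposed to bring it back down while the queue sits empty.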


Here is the latest diff:

 include/linux/pkt_sched.h |    8 +++----
 net/sched/sch_choke.c     |   50 +++++++++++++++++++++++++++++-----------------
 2 files changed, 36 insertions(+), 22 deletions(-)

diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h
index 83bac92..498c798 100644
--- a/include/linux/pkt_sched.h
+++ b/include/linux/pkt_sched.h
@@ -269,10 +269,10 @@ struct tc_choke_qopt {
 };
 
 struct tc_choke_xstats {
-	__u32           early;          /* Early drops */
-	__u32           pdrop;          /* Drops due to queue limits */
-	__u32           other;          /* Drops due to drop() calls */
-	__u32           marked;         /* Marked packets */
+	__u32		early;          /* Early drops */
+	__u32		pdrop;          /* Drops due to queue limits */
+	__u32		other;          /* Drops due to drop() calls */
+	__u32		marked;         /* Marked packets */
 	__u32		matched;	/* Drops due to flow match */
 };
 
diff --git a/net/sched/sch_choke.c b/net/sched/sch_choke.c
index 136d4e5..2f94dad 100644
--- a/net/sched/sch_choke.c
+++ b/net/sched/sch_choke.c
@@ -74,7 +74,7 @@ struct choke_sched_data {
 };
 
 /* deliver a random number between 0 and N - 1 */
-static inline u32 random_N(unsigned int N)
+static u32 random_N(unsigned int N)
 {
 	return reciprocal_divide(random32(), N);
 }
@@ -94,18 +94,20 @@ static struct sk_buff *choke_peek_random(struct Qdisc *sch,
 			return skb;
 	} while (--retrys > 0);
 
-	/* queue is has lots of holes use the head which is known to exist */
+	/* queue has lots of holes, use the head which is known to exist
+	 * Note: result can still be NULL if q->head == q->tail
+	 */
 	return q->tab[*pidx = q->head];
 }
 
 /* Is ECN parameter configured */
-static inline int use_ecn(const struct choke_sched_data *q)
+static int use_ecn(const struct choke_sched_data *q)
 {
 	return q->flags & TC_RED_ECN;
 }
 
 /* Should packets over max just be dropped (versus marked) */
-static inline int use_harddrop(const struct choke_sched_data *q)
+static int use_harddrop(const struct choke_sched_data *q)
 {
 	return q->flags & TC_RED_HARDDROP;
 }
@@ -113,20 +115,21 @@ static inline int use_harddrop(const struct choke_sched_data *q)
 /* Move head pointer forward to skip over holes */
 static void choke_zap_head_holes(struct choke_sched_data *q)
 {
-	while (q->tab[q->head] == NULL) {
+	do {
 		q->head = (q->head + 1) & q->tab_mask;
-
-		BUG_ON(q->head == q->tail);
-	}
+		if (q->head == q->tail)
+			break;
+	} while (q->tab[q->head] == NULL);
 }
 
 /* Move tail pointer backwards to reuse holes */
 static void choke_zap_tail_holes(struct choke_sched_data *q)
 {
-	while (q->tab[q->tail - 1] == NULL) {
+	do {
 		q->tail = (q->tail - 1) & q->tab_mask;
-		BUG_ON(q->head == q->tail);
-	}
+		if (q->head == q->tail)
+			break;
+	} while (q->tab[q->tail] == NULL);
 }
 
 /* Drop packet from queue array by creating a "hole" */
@@ -145,7 +148,7 @@ static void choke_drop_by_idx(struct choke_sched_data *q, unsigned int idx)
    2. fast internal classification
    3. use TC filter based classification
 */
-static inline unsigned int choke_classify(struct sk_buff *skb,
+static unsigned int choke_classify(struct sk_buff *skb,
 					  struct Qdisc *sch, int *qerr)
 
 {
@@ -214,11 +217,12 @@ static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		oskb = choke_peek_random(sch, &idx);
 
 		/* Both packets from same flow ? */
-		if (*(unsigned int *)(qdisc_skb_cb(oskb)->data) == hash) {
+		if (oskb &&
+		    *(unsigned int *)(qdisc_skb_cb(oskb)->data) == hash) {
 			/* Drop both packets */
 			q->stats.matched++;
 			choke_drop_by_idx(q, idx);
-			sch->qstats.backlog -= qdisc_pkt_len(skb);
+			sch->qstats.backlog -= qdisc_pkt_len(oskb);
 			--sch->q.qlen;
 			qdisc_drop(oskb, sch);
 			goto congestion_drop;
@@ -285,8 +289,7 @@ static struct sk_buff *choke_dequeue(struct Qdisc *sch)
 	}
 
 	skb = q->tab[q->head];
-	q->tab[q->head] = NULL; /* not really needed */
-	q->head = (q->head + 1) & q->tab_mask;
+	q->tab[q->head] = NULL;
 	choke_zap_head_holes(q);
 	--sch->q.qlen;
 	sch->qstats.backlog -= qdisc_pkt_len(skb);
@@ -371,12 +374,23 @@ static int choke_change(struct Qdisc *sch, struct nlattr *opt)
 		sch_tree_lock(sch);
 		old = q->tab;
 		if (old) {
-			unsigned int tail = 0;
+			unsigned int oqlen = sch->q.qlen, tail = 0;
 
 			while (q->head != q->tail) {
-				ntab[tail++] = q->tab[q->head];
+				struct sk_buff *skb = q->tab[q->head];
+
 				q->head = (q->head + 1) & q->tab_mask;
+				if (!skb)
+					continue;
+				if (tail < mask) {
+					ntab[tail++] = skb;
+					continue;
+				}
+				sch->qstats.backlog -= qdisc_pkt_len(skb);
+				--sch->q.qlen;
+				qdisc_drop(skb, sch);
 			}
+			qdisc_tree_decrease_qlen(sch, oqlen - sch->q.qlen);
 			q->head = 0;
 			q->tail = tail;
 		}




* Re: [PATCH] CHOKe flow scheduler (0.8)
  2011-01-14  3:58         ` Eric Dumazet
@ 2011-01-14 11:32           ` Eric Dumazet
  0 siblings, 0 replies; 26+ messages in thread
From: Eric Dumazet @ 2011-01-14 11:32 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Miller, netdev

On Friday, 14 January 2011 at 04:58 +0100, Eric Dumazet wrote:
> On Friday, 14 January 2011 at 04:34 +0100, Eric Dumazet wrote:
> 
> > Hmm, please wait a bit, I had another crash when I stopped my
> > bench/stress
> 
> I am not sure p->qavg is correctly computed.
> 
> Crash happened because choke_peek_random() was called while no packet
> was in queue.
> 
> With my params (min=10833 max=32500 burst=18055 limit=130000) this
> implies qavg was very big while qlen==0 !
> 
> qdisc choke 11: dev ifb0 parent 1:11 limit 130000b min 10833b max 32500b ewma 13 Plog 21 Scell_log 30
>  Sent 200857857 bytes 365183 pkt (dropped 1010937, overlimits 557577 requeues 0) 
>  rate 32253Kbit 7330pps backlog 17875996b 32505p requeues 0 
>   marked 0 early 557577 pdrop 0 other 0 matched 226680

Moving the qdisc_bstats_update(sch, skb) call out of choke_enqueue() into
choke_dequeue(), I get nicer rate values (because packets that are
enqueued but CHOKed don't artificially raise the packet/byte rates).

Now the rate properly matches my 10Mbit CBQ bandwidth:

qdisc choke 11: parent 1:11 limit 130000b min 10833b max 32500b ewma 13 Plog 21 Scell_log 30
 Sent 86470970 bytes 157418 pkt (dropped 127451, overlimits 48275 requeues 0) 
 rate 9947Kbit 2264pps backlog 17759368b 32288p requeues 0 
  marked 0 early 48275 pdrop 0 other 0 matched 39588


For other qdiscs it is less easy, because the qdisc_bstats_update() call is
integrated in __qdisc_enqueue_tail() / qdisc_enqueue_tail(), so all
users would have to be updated at once.
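
For sch_choke itself the resulting dequeue path would look roughly like
this (a sketch combining the choke_dequeue() from this thread with the
moved accounting call, so only packets that actually leave the qdisc are
counted):

	static struct sk_buff *choke_dequeue(struct Qdisc *sch)
	{
		struct choke_sched_data *q = qdisc_priv(sch);
		struct sk_buff *skb;

		if (q->head == q->tail) {
			if (!red_is_idling(&q->parms))
				red_start_of_idle_period(&q->parms);
			return NULL;	/* queue is empty */
		}

		skb = q->tab[q->head];
		q->tab[q->head] = NULL;
		choke_zap_head_holes(q);
		--sch->q.qlen;
		sch->qstats.backlog -= qdisc_pkt_len(skb);
		qdisc_bstats_update(sch, skb);	/* moved here from choke_enqueue() */

		return skb;
	}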







* Re: [PATCH] CHOKe flow scheduler (0.8)
  2011-01-13 23:34   ` [PATCH] CHOKe flow scheduler (0.8) Stephen Hemminger
  2011-01-14  3:29     ` Eric Dumazet
@ 2011-01-14 13:54     ` Patrick McHardy
  2011-01-14 13:55       ` Patrick McHardy
  1 sibling, 1 reply; 26+ messages in thread
From: Patrick McHardy @ 2011-01-14 13:54 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Eric Dumazet, David Miller, netdev

On 14.01.2011 00:34, Stephen Hemminger wrote:
> +static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch)
> +{
> ...
> +	/* Is queue small? */
> +	if (p->qavg <= p->qth_min)
> +		p->qcount = -1;
> +	else {
> +		struct sk_buff *oskb;
> +		unsigned int idx;
> +
> +		/* Draw a packet at random from queue */
> +		oskb = choke_peek_random(sch, &idx);
> +
> +		/* Both packets from same flow ? */
> +		if (*(unsigned int *)(qdisc_skb_cb(oskb)->data) == hash) {
> +			/* Drop both packets */
> +			q->stats.matched++;
> +			choke_drop_by_idx(q, idx);
> +			sch->qstats.backlog -= qdisc_pkt_len(skb);
> +			--sch->q.qlen;
> +			qdisc_drop(oskb, sch);

You need to adjust the qlen values of parent qdiscs by calling
qdisc_tree_decrease_qlen(); they are not aware that a second
packet has been dropped.

> +			goto congestion_drop;
> +		}
> +


* Re: [PATCH] CHOKe flow scheduler (0.8)
  2011-01-14 13:54     ` Patrick McHardy
@ 2011-01-14 13:55       ` Patrick McHardy
  2011-01-14 14:24         ` Eric Dumazet
  0 siblings, 1 reply; 26+ messages in thread
From: Patrick McHardy @ 2011-01-14 13:55 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Eric Dumazet, David Miller, netdev

On 14.01.2011 14:54, Patrick McHardy wrote:
> On 14.01.2011 00:34, Stephen Hemminger wrote:
>> +static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch)
>> +{
>> ...
>> +	/* Is queue small? */
>> +	if (p->qavg <= p->qth_min)
>> +		p->qcount = -1;
>> +	else {
>> +		struct sk_buff *oskb;
>> +		unsigned int idx;
>> +
>> +		/* Draw a packet at random from queue */
>> +		oskb = choke_peek_random(sch, &idx);
>> +
>> +		/* Both packets from same flow ? */
>> +		if (*(unsigned int *)(qdisc_skb_cb(oskb)->data) == hash) {
>> +			/* Drop both packets */
>> +			q->stats.matched++;
>> +			choke_drop_by_idx(q, idx);
>> +			sch->qstats.backlog -= qdisc_pkt_len(skb);
>> +			--sch->q.qlen;
>> +			qdisc_drop(oskb, sch);
> 
> You need to adjust the qlen values of parent qdiscs by calling
> qdisc_tree_decrease_qlen(), they are not aware that a second
> packet has been dropped.

I just saw that Eric already fixed this :)


* Re: [PATCH] CHOKe flow scheduler (0.8)
  2011-01-14 13:55       ` Patrick McHardy
@ 2011-01-14 14:24         ` Eric Dumazet
  2011-01-14 23:45           ` [PATCH] CHOKe flow scheduler (0.9) Stephen Hemminger
  0 siblings, 1 reply; 26+ messages in thread
From: Eric Dumazet @ 2011-01-14 14:24 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Stephen Hemminger, David Miller, netdev

On Friday, 14 January 2011 at 14:55 +0100, Patrick McHardy wrote:
> On 14.01.2011 14:54, Patrick McHardy wrote:
> > On 14.01.2011 00:34, Stephen Hemminger wrote:
> >> +static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch)
> >> +{
> >> ...
> >> +	/* Is queue small? */
> >> +	if (p->qavg <= p->qth_min)
> >> +		p->qcount = -1;
> >> +	else {
> >> +		struct sk_buff *oskb;
> >> +		unsigned int idx;
> >> +
> >> +		/* Draw a packet at random from queue */
> >> +		oskb = choke_peek_random(sch, &idx);
> >> +
> >> +		/* Both packets from same flow ? */
> >> +		if (*(unsigned int *)(qdisc_skb_cb(oskb)->data) == hash) {
> >> +			/* Drop both packets */
> >> +			q->stats.matched++;
> >> +			choke_drop_by_idx(q, idx);
> >> +			sch->qstats.backlog -= qdisc_pkt_len(skb);
> >> +			--sch->q.qlen;
> >> +			qdisc_drop(oskb, sch);
> > 
> > You need to adjust the qlen values of parent qdiscs by calling
> > qdisc_tree_decrease_qlen(), they are not aware that a second
> > packet has been dropped.
> 
> I just saw that Eric already fixed this :)
> --

Thanks Patrick, I did that in choke_change() only, not in this part of
the code.


qdisc cbq 1: root refcnt 2 rate 1000Mbit cell 8b (bounded,isolated) prio no-transmit/8 weight 1000Mbit allot 1514b 
level 2 ewma 5 avpkt 1000b maxidle 0us 
 Sent 92016573 bytes 167811 pkt (dropped 615544, overlimits 982637 requeues 0) 
 rate 6248bit 9pps backlog 0b 191202p requeues 0 
  borrowed 0 overactions 0 avgidle 125 undertime 0
qdisc choke 11: parent 1:11 limit 130000b min 10833b max 32500b ewma 13 Plog 21 Scell_log 30
 Sent 91969929 bytes 167257 pkt (dropped 806746, overlimits 424342 requeues 0) 
 rate 56bit 0pps backlog 0b 0p requeues 0 
  marked 0 early 424342 pdrop 0 other 0 matched 191202


So yes: we see that 191202 packets were CHOKed, and the upper cbq leaks 191202 packets in 'backlog'

After the following patch:

diff --git a/net/sched/sch_choke.c b/net/sched/sch_choke.c
index 29a91d8..5479f7e 100644
--- a/net/sched/sch_choke.c
+++ b/net/sched/sch_choke.c
@@ -223,6 +223,7 @@ static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 			q->stats.matched++;
 			choke_drop_by_idx(q, idx);
 			sch->qstats.backlog -= qdisc_pkt_len(oskb);
+			qdisc_tree_decrease_qlen(sch, 1);
 			--sch->q.qlen;
 			qdisc_drop(oskb, sch);
 			goto congestion_drop;

Everything seems fine:

qdisc cbq 1: root refcnt 2 rate 1000Mbit cell 8b (bounded,isolated) prio no-transmit/8 weight 1000Mbit allot 1514b 
level 2 ewma 5 avpkt 1000b maxidle 0us 
 Sent 93962148 bytes 170940 pkt (dropped 633053, overlimits 1010490 requeues 0) 
 rate 568bit 1pps backlog 0b 0p requeues 0 
  borrowed 0 overactions 0 avgidle 125 undertime 0
qdisc choke 11: parent 1:11 limit 130000b min 10833b max 32500b ewma 13 Plog 21 Scell_log 30
 Sent 93957621 bytes 170888 pkt (dropped 829160, overlimits 436946 requeues 0) 
 rate 104bit 0pps backlog 0b 0p requeues 0 
  marked 0 early 436946 pdrop 0 other 0 matched 196107




* [PATCH] CHOKe flow scheduler (0.9)
  2011-01-14 14:24         ` Eric Dumazet
@ 2011-01-14 23:45           ` Stephen Hemminger
  2011-01-15  7:45             ` Eric Dumazet
  0 siblings, 1 reply; 26+ messages in thread
From: Stephen Hemminger @ 2011-01-14 23:45 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Patrick McHardy, David Miller, netdev

CHOKe ("CHOose and Kill" or "CHOose and Keep") is an alternative
packet scheduler based on the Random Early Detection (RED) algorithm.

The core idea is:
  For every packet arrival:
  	Calculate Qave
	if (Qave < minth) 
	     Queue the new packet
	else 
	     Select randomly a packet from the queue 
	     if (both packets from same flow)
	     then Drop both the packets
	     else if (Qave > maxth)
	          Drop packet
	     else
	       	  Admit packet with probability p (same as RED)

See also:
  Rong Pan, Balaji Prabhakar, Konstantinos Psounis, "CHOKe: a stateless active
   queue management scheme for approximating fair bandwidth allocation", 
  Proceeding of INFOCOM'2000, March 2000.

Help from:
     Eric Dumazet <eric.dumazet@gmail.com>
     Patrick McHardy <kaber@trash.net>

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

---
This version is based on net-next, and assumes Eric's patch for
corrected bstats is already applied.

0.9 incorporate patches from Patrick/Eric
    rework the peek_random and drop code to simplify, and fix a bug where
     random_N needs to be called with the full length (including holes).

 include/linux/pkt_sched.h |   29 ++
 net/sched/Kconfig         |   11 
 net/sched/Makefile        |    1 
 net/sched/sch_choke.c     |  579 ++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 620 insertions(+)

--- a/net/sched/Kconfig	2011-01-14 10:43:19.062537393 -0800
+++ b/net/sched/Kconfig	2011-01-14 10:43:50.915195785 -0800
@@ -205,6 +205,17 @@ config NET_SCH_DRR
 
 	  If unsure, say N.
 
+config NET_SCH_CHOKE
+	tristate "CHOose and Keep responsive flow scheduler (CHOKE)"
+	help
+	  Say Y here if you want to use the CHOKe packet scheduler (CHOose
+	  and Keep for responsive flows, CHOose and Kill for unresponsive
+	  flows). This is a variation of RED which tries to penalize flows
+	  that monopolize the queue.
+
+	  To compile this code as a module, choose M here: the
+	  module will be called sch_choke.
+
 config NET_SCH_INGRESS
 	tristate "Ingress Qdisc"
 	depends on NET_CLS_ACT
--- a/net/sched/Makefile	2011-01-14 10:43:19.072538228 -0800
+++ b/net/sched/Makefile	2011-01-14 10:43:50.915195785 -0800
@@ -32,6 +32,7 @@ obj-$(CONFIG_NET_SCH_MULTIQ)	+= sch_mult
 obj-$(CONFIG_NET_SCH_ATM)	+= sch_atm.o
 obj-$(CONFIG_NET_SCH_NETEM)	+= sch_netem.o
 obj-$(CONFIG_NET_SCH_DRR)	+= sch_drr.o
+obj-$(CONFIG_NET_SCH_CHOKE)	+= sch_choke.o
 obj-$(CONFIG_NET_CLS_U32)	+= cls_u32.o
 obj-$(CONFIG_NET_CLS_ROUTE4)	+= cls_route.o
 obj-$(CONFIG_NET_CLS_FW)	+= cls_fw.o
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ b/net/sched/sch_choke.c	2011-01-14 11:30:33.826240296 -0800
@@ -0,0 +1,579 @@
+/*
+ * net/sched/sch_choke.c	CHOKE scheduler
+ *
+ * Copyright (c) 2011 Stephen Hemminger <shemminger@vyatta.com>
+ * Copyright (c) 2011 Eric Dumazet <eric.dumazet@gmail.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * version 2 as published by the Free Software Foundation.
+ *
+ */
+
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/skbuff.h>
+#include <linux/reciprocal_div.h>
+#include <net/pkt_sched.h>
+#include <net/inet_ecn.h>
+#include <net/red.h>
+
+/*	CHOKe stateless AQM for fair bandwidth allocation
+        =================================================
+
+   CHOKe (CHOose and Keep for responsive flows, CHOose and Kill for
+   unresponsive flows) is a variant of RED that penalizes misbehaving flows but
+   maintains no flow state. The difference from RED is an additional step
+   during the enqueuing process. If average queue size is over the
+   low threshold (qmin), a packet is chosen at random from the queue.
+   If both the new and chosen packet are from the same flow, both
+   are dropped. Unlike RED, CHOKe is not really a "classful" qdisc because it
+   needs to access packets in queue randomly. It has a minimal class
+   interface to allow overriding the builtin flow classifier with
+   filters.
+
+   Source:
+   R. Pan, B. Prabhakar, and K. Psounis, "CHOKe, A Stateless
+   Active Queue Management Scheme for Approximating Fair Bandwidth Allocation",
+   IEEE INFOCOM, 2000.
+
+   A. Tang, J. Wang, S. Low, "Understanding CHOKe: Throughput and Spatial
+   Characteristics", IEEE/ACM Transactions on Networking, 2004
+
+ */
+
+/* Upper bound on size of sk_buff table */
+#define CHOKE_MAX_QUEUE	(128*1024 - 1)
+
+struct choke_sched_data {
+/* Parameters */
+	u32		 limit;
+	unsigned char	 flags;
+
+	struct red_parms parms;
+
+/* Variables */
+	struct tcf_proto *filter_list;
+	struct {
+		u32	prob_drop;	/* Early probability drops */
+		u32	prob_mark;	/* Early probability marks */
+		u32	forced_drop;	/* Forced drops, qavg > max_thresh */
+		u32	forced_mark;	/* Forced marks, qavg > max_thresh */
+		u32	pdrop;          /* Drops due to queue limits */
+		u32	other;          /* Drops due to drop() calls */
+		u32	matched;	/* Drops to flow match */
+	} stats;
+
+	unsigned int	 head;
+	unsigned int	 tail;
+
+	unsigned int	 tab_mask; /* size - 1 */
+
+	struct sk_buff **tab;
+};
+
+/* deliver a random number between 0 and N - 1 */
+static u32 random_N(unsigned int N)
+{
+	return reciprocal_divide(random32(), N);
+}
+
+/* number of elements in queue including holes */
+static unsigned int choke_len(const struct choke_sched_data *q)
+{
+	return (q->tail - q->head) & q->tab_mask;
+}
+
+/* Is ECN parameter configured */
+static int use_ecn(const struct choke_sched_data *q)
+{
+	return q->flags & TC_RED_ECN;
+}
+
+/* Should packets over max just be dropped (versus marked) */
+static int use_harddrop(const struct choke_sched_data *q)
+{
+	return q->flags & TC_RED_HARDDROP;
+}
+
+/* Move head pointer forward to skip over holes */
+static void choke_zap_head_holes(struct choke_sched_data *q)
+{
+	do {
+		q->head = (q->head + 1) & q->tab_mask;
+		if (q->head == q->tail)
+			break;
+	} while (q->tab[q->head] == NULL);
+}
+
+/* Move tail pointer backwards to reuse holes */
+static void choke_zap_tail_holes(struct choke_sched_data *q)
+{
+	do {
+		q->tail = (q->tail - 1) & q->tab_mask;
+		if (q->head == q->tail)
+			break;
+	} while (q->tab[q->tail] == NULL);
+}
+
+/* Drop packet from queue array by creating a "hole" */
+static void choke_drop_by_idx(struct Qdisc *sch, unsigned int idx)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct sk_buff *skb = q->tab[idx];
+
+	q->tab[idx] = NULL;
+
+	if (idx == q->head)
+		choke_zap_head_holes(q);
+	if (idx == q->tail)
+		choke_zap_tail_holes(q);
+
+	sch->qstats.backlog -= qdisc_pkt_len(skb);
+	qdisc_drop(skb, sch);
+	qdisc_tree_decrease_qlen(sch, 1);
+	--sch->q.qlen;
+}
+
+/* Classify flow using either:
+   1. pre-existing classification result in skb
+   2. fast internal classification
+   3. use TC filter based classification
+*/
+static unsigned int choke_classify(struct sk_buff *skb,
+				   struct Qdisc *sch, int *qerr)
+
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct tcf_result res;
+	int result;
+
+	*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
+
+	if (TC_H_MAJ(skb->priority) == sch->handle &&
+	    TC_H_MIN(skb->priority) > 0)
+		return TC_H_MIN(skb->priority);
+
+	if (!q->filter_list)
+		return skb_get_rxhash(skb);
+
+	result = tc_classify(skb, q->filter_list, &res);
+	if (result >= 0) {
+#ifdef CONFIG_NET_CLS_ACT
+		switch (result) {
+		case TC_ACT_STOLEN:
+		case TC_ACT_QUEUED:
+			*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
+		case TC_ACT_SHOT:
+			return 0;
+		}
+#endif
+		return TC_H_MIN(res.classid);
+	}
+
+	return 0;
+}
+
+/* Select a packet at random from the queue and return flow classification
+   returns 0 if queue empty
+ */
+static unsigned int choke_peek_random(struct Qdisc *sch,
+				      unsigned int *pidx)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	const struct sk_buff *skb;
+	int retrys = 3;
+
+	do {
+		*pidx = (q->head + random_N(choke_len(q))) & q->tab_mask;
+		skb = q->tab[*pidx];
+		if (skb)
+			break;
+	} while (--retrys > 0);
+
+	/* queue has lots of holes, use the head which is known to exist
+	 * Note: result can still be NULL if q->head == q->tail
+	 */
+	if (!skb) {
+		*pidx = q->head;
+		skb = q->tab[*pidx];
+	}
+
+	/* retrieve the flow classification saved during enqueue */
+	return skb ? *(unsigned int *)(qdisc_skb_cb(skb)->data) : 0;
+}
+
+static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct red_parms *p = &q->parms;
+	unsigned int hash;
+	int uninitialized_var(ret);
+
+	hash = choke_classify(skb, sch, &ret);
+	if (!hash) {
+		/* Packet was eaten by filter */
+		if (ret & __NET_XMIT_BYPASS)
+			sch->qstats.drops++;
+		kfree_skb(skb);
+		return ret;
+	}
+
+	/* Maybe add hash as field in struct qdisc_skb_cb? */
+	*(unsigned int *)(qdisc_skb_cb(skb)->data) = hash;
+
+	/* Compute average queue usage (see RED) */
+	p->qavg = red_calc_qavg(p, sch->q.qlen);
+	if (red_is_idling(p))
+		red_end_of_idle_period(p);
+
+	/* Is queue small? */
+	if (p->qavg <= p->qth_min)
+		p->qcount = -1;
+	else {
+		unsigned int idx;
+
+		/* Draw a packet at random from queue and compare flow */
+		if (choke_peek_random(sch, &idx) == hash) {
+			q->stats.matched++;
+			choke_drop_by_idx(sch, idx);
+			goto congestion_drop;
+		}
+
+		/* Queue is large, always mark/drop */
+		if (p->qavg > p->qth_max) {
+			p->qcount = -1;
+
+			sch->qstats.overlimits++;
+			if (use_harddrop(q) || !use_ecn(q) ||
+			    !INET_ECN_set_ce(skb)) {
+				q->stats.forced_drop++;
+				goto congestion_drop;
+			}
+
+			q->stats.forced_mark++;
+		} else if (++p->qcount) {
+			if (red_mark_probability(p, p->qavg)) {
+				p->qcount = 0;
+				p->qR = red_random(p);
+
+				sch->qstats.overlimits++;
+				if (!use_ecn(q) || !INET_ECN_set_ce(skb)) {
+					q->stats.prob_drop++;
+					goto congestion_drop;
+				}
+
+				q->stats.prob_mark++;
+			}
+		} else
+			p->qR = red_random(p);
+	}
+
+	/* Admit new packet */
+	if (sch->q.qlen < q->limit) {
+		q->tab[q->tail] = skb;
+		q->tail = (q->tail + 1) & q->tab_mask;
+		++sch->q.qlen;
+		sch->qstats.backlog += qdisc_pkt_len(skb);
+		qdisc_bstats_update(sch, skb);
+		return NET_XMIT_SUCCESS;
+	}
+
+	q->stats.pdrop++;
+	sch->qstats.drops++;
+	kfree_skb(skb);
+	return NET_XMIT_DROP;
+
+ congestion_drop:
+	qdisc_drop(skb, sch);
+	return NET_XMIT_CN;
+}
+
+static struct sk_buff *choke_dequeue(struct Qdisc *sch)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct sk_buff *skb;
+
+	if (q->head == q->tail) {
+		if (!red_is_idling(&q->parms))
+			red_start_of_idle_period(&q->parms);
+		return NULL;
+	}
+
+	skb = q->tab[q->head];
+	q->tab[q->head] = NULL;
+	choke_zap_head_holes(q);
+	--sch->q.qlen;
+	sch->qstats.backlog -= qdisc_pkt_len(skb);
+
+	return skb;
+}
+
+static unsigned int choke_drop(struct Qdisc *sch)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	unsigned int len;
+
+	len = qdisc_queue_drop(sch);
+	if (len > 0)
+		q->stats.other++;
+	else {
+		if (!red_is_idling(&q->parms))
+			red_start_of_idle_period(&q->parms);
+	}
+
+	return len;
+}
+
+static void choke_reset(struct Qdisc* sch)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+
+	red_restart(&q->parms);
+}
+
+static const struct nla_policy choke_policy[TCA_CHOKE_MAX + 1] = {
+	[TCA_CHOKE_PARMS]	= { .len = sizeof(struct tc_red_qopt) },
+	[TCA_CHOKE_STAB]	= { .len = RED_STAB_SIZE },
+};
+
+
+static void choke_free(void *addr)
+{
+	if (addr) {
+		if (is_vmalloc_addr(addr))
+			vfree(addr);
+		else
+			kfree(addr);
+	}
+}
+
+static int choke_change(struct Qdisc *sch, struct nlattr *opt)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct nlattr *tb[TCA_CHOKE_MAX + 1];
+	const struct tc_red_qopt *ctl;
+	int err;
+	struct sk_buff **old = NULL;
+	unsigned int mask;
+
+	if (opt == NULL)
+		return -EINVAL;
+
+	err = nla_parse_nested(tb, TCA_CHOKE_MAX, opt, choke_policy);
+	if (err < 0)
+		return err;
+
+	if (tb[TCA_CHOKE_PARMS] == NULL ||
+	    tb[TCA_CHOKE_STAB] == NULL)
+		return -EINVAL;
+
+	ctl = nla_data(tb[TCA_CHOKE_PARMS]);
+
+	if (ctl->limit > CHOKE_MAX_QUEUE)
+		return -EINVAL;
+
+	mask = roundup_pow_of_two(ctl->limit + 1) - 1;
+	if (mask != q->tab_mask) {
+		struct sk_buff **ntab;
+
+		ntab = kcalloc(mask + 1, sizeof(struct sk_buff *), GFP_KERNEL);
+		if (!ntab)
+			ntab = vzalloc((mask + 1) * sizeof(struct sk_buff *));
+		if (!ntab)
+			return -ENOMEM;
+
+		sch_tree_lock(sch);
+		old = q->tab;
+		if (old) {
+			unsigned int oqlen = sch->q.qlen, tail = 0;
+
+			while (q->head != q->tail) {
+				struct sk_buff *skb = q->tab[q->head];
+
+				q->head = (q->head + 1) & q->tab_mask;
+				if (!skb)
+					continue;
+				if (tail < mask) {
+					ntab[tail++] = skb;
+					continue;
+				}
+				sch->qstats.backlog -= qdisc_pkt_len(skb);
+				--sch->q.qlen;
+				qdisc_drop(skb, sch);
+			}
+			qdisc_tree_decrease_qlen(sch, oqlen - sch->q.qlen);
+			q->head = 0;
+			q->tail = tail;
+		}
+
+		q->tab_mask = mask;
+		q->tab = ntab;
+	} else
+		sch_tree_lock(sch);
+
+	q->flags = ctl->flags;
+	q->limit = ctl->limit;
+
+	red_set_parms(&q->parms, ctl->qth_min, ctl->qth_max, ctl->Wlog,
+		      ctl->Plog, ctl->Scell_log,
+		      nla_data(tb[TCA_CHOKE_STAB]));
+
+	if (q->head == q->tail)
+		red_end_of_idle_period(&q->parms);
+
+	sch_tree_unlock(sch);
+	choke_free(old);
+	return 0;
+}
+
+static int choke_init(struct Qdisc* sch, struct nlattr *opt)
+{
+	return choke_change(sch, opt);
+}
+
+static int choke_dump(struct Qdisc *sch, struct sk_buff *skb)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct nlattr *opts = NULL;
+	struct tc_red_qopt opt = {
+		.limit		= q->limit,
+		.flags		= q->flags,
+		.qth_min	= q->parms.qth_min >> q->parms.Wlog,
+		.qth_max	= q->parms.qth_max >> q->parms.Wlog,
+		.Wlog		= q->parms.Wlog,
+		.Plog		= q->parms.Plog,
+		.Scell_log	= q->parms.Scell_log,
+	};
+
+	opts = nla_nest_start(skb, TCA_OPTIONS);
+	if (opts == NULL)
+		goto nla_put_failure;
+
+	NLA_PUT(skb, TCA_CHOKE_PARMS, sizeof(opt), &opt);
+	return nla_nest_end(skb, opts);
+
+nla_put_failure:
+	nla_nest_cancel(skb, opts);
+	return -EMSGSIZE;
+}
+
+static int choke_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct tc_choke_xstats st = {
+		.early	= q->stats.prob_drop + q->stats.forced_drop,
+		.marked	= q->stats.prob_mark + q->stats.forced_mark,
+		.pdrop	= q->stats.pdrop,
+		.other	= q->stats.other,
+		.matched = q->stats.matched,
+	};
+
+	return gnet_stats_copy_app(d, &st, sizeof(st));
+}
+
+static void choke_destroy(struct Qdisc *sch)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+
+	tcf_destroy_chain(&q->filter_list);
+	choke_free(q->tab);
+}
+
+static struct Qdisc *choke_leaf(struct Qdisc *sch, unsigned long arg)
+{
+	return NULL;
+}
+
+static unsigned long choke_get(struct Qdisc *sch, u32 classid)
+{
+	return 0;
+}
+
+static void choke_put(struct Qdisc *q, unsigned long cl)
+{
+}
+
+static unsigned long choke_bind(struct Qdisc *sch, unsigned long parent,
+				u32 classid)
+{
+	return 0;
+}
+
+static struct tcf_proto **choke_find_tcf(struct Qdisc *sch, unsigned long cl)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+
+	if (cl)
+		return NULL;
+	return &q->filter_list;
+}
+
+static int choke_dump_class(struct Qdisc *sch, unsigned long cl,
+			  struct sk_buff *skb, struct tcmsg *tcm)
+{
+	tcm->tcm_handle |= TC_H_MIN(cl);
+	return 0;
+}
+
+static void choke_walk(struct Qdisc *sch, struct qdisc_walker *arg)
+{
+	if (!arg->stop) {
+		if (arg->fn(sch, 1, arg) < 0) {
+			arg->stop = 1;
+			return;
+		}
+		arg->count++;
+	}
+}
+
+static const struct Qdisc_class_ops choke_class_ops = {
+	.leaf		=	choke_leaf,
+	.get		=	choke_get,
+	.put		=	choke_put,
+	.tcf_chain	=	choke_find_tcf,
+	.bind_tcf	=	choke_bind,
+	.unbind_tcf	=	choke_put,
+	.dump		=	choke_dump_class,
+	.walk		=	choke_walk,
+};
+
+static struct sk_buff *choke_peek_head(struct Qdisc *sch)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+
+	return (q->head != q->tail) ? q->tab[q->head] : NULL;
+}
+
+static struct Qdisc_ops choke_qdisc_ops __read_mostly = {
+	.id		=	"choke",
+	.priv_size	=	sizeof(struct choke_sched_data),
+
+	.enqueue	=	choke_enqueue,
+	.dequeue	=	choke_dequeue,
+	.peek		=	choke_peek_head,
+	.drop		=	choke_drop,
+	.init		=	choke_init,
+	.destroy	=	choke_destroy,
+	.reset		=	choke_reset,
+	.change		=	choke_change,
+	.dump		=	choke_dump,
+	.dump_stats	=	choke_dump_stats,
+	.owner		=	THIS_MODULE,
+};
+
+static int __init choke_module_init(void)
+{
+	return register_qdisc(&choke_qdisc_ops);
+}
+
+static void __exit choke_module_exit(void)
+{
+	unregister_qdisc(&choke_qdisc_ops);
+}
+
+module_init(choke_module_init)
+module_exit(choke_module_exit)
+
+MODULE_LICENSE("GPL");
--- a/include/linux/pkt_sched.h	2011-01-14 10:43:19.092539898 -0800
+++ b/include/linux/pkt_sched.h	2011-01-14 10:43:50.915195785 -0800
@@ -247,6 +247,35 @@ struct tc_gred_sopt {
 	__u16		pad1;
 };
 
+/* CHOKe section */
+
+enum {
+	TCA_CHOKE_UNSPEC,
+	TCA_CHOKE_PARMS,
+	TCA_CHOKE_STAB,
+	__TCA_CHOKE_MAX,
+};
+
+#define TCA_CHOKE_MAX (__TCA_CHOKE_MAX - 1)
+
+struct tc_choke_qopt {
+	__u32		limit;		/* HARD maximal queue length (packets)	*/
+	__u32		qth_min;	/* Min average length threshold (packets) */
+	__u32		qth_max;	/* Max average length threshold (packets) */
+	unsigned char   Wlog;		/* log(W)		*/
+	unsigned char   Plog;		/* log(P_max/(qth_max-qth_min))	*/
+	unsigned char   Scell_log;	/* cell size for idle damping */
+	unsigned char	flags;		/* see RED flags */
+};
+
+struct tc_choke_xstats {
+	__u32		early;          /* Early drops */
+	__u32		pdrop;          /* Drops due to queue limits */
+	__u32		other;          /* Drops due to drop() calls */
+	__u32		marked;         /* Marked packets */
+	__u32		matched;	/* Drops due to flow match */
+};
+
 /* HTB section */
 #define TC_HTB_NUMPRIO		8
 #define TC_HTB_MAXDEPTH		8


-- 


* Re: [PATCH] CHOKe flow scheduler (0.9)
  2011-01-14 23:45           ` [PATCH] CHOKe flow scheduler (0.9) Stephen Hemminger
@ 2011-01-15  7:45             ` Eric Dumazet
  2011-01-17 17:25               ` Stephen Hemminger
  0 siblings, 1 reply; 26+ messages in thread
From: Eric Dumazet @ 2011-01-15  7:45 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Patrick McHardy, David Miller, netdev

On Friday, 14 January 2011 at 15:45 -0800, Stephen Hemminger wrote:
> CHOKe ("CHOose and Kill" or "CHOose and Keep") is an alternative
> packet scheduler based on the Random Exponential Drop (RED) algorithm.
> 
> The core idea is:
>   For every packet arrival:
>   	Calculate Qave
> 	if (Qave < minth) 
> 	     Queue the new packet
> 	else 
> 	     Select randomly a packet from the queue 
> 	     if (both packets from same flow)
> 	     then Drop both the packets
> 	     else if (Qave > maxth)
> 	          Drop packet
> 	     else
> 	       	  Admit packet with proability p (same as RED)
> 
> See also:
>   Rong Pan, Balaji Prabhakar, Konstantinos Psounis, "CHOKe: a stateless active
>    queue management scheme for approximating fair bandwidth allocation", 
>   Proceeding of INFOCOM'2000, March 2000.
> 
> Help from:
>      Eric Dumazet <eric.dumazet@gmail.com>
>      Patrick McHardy <kaber@trash.net>
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
> 
> ---
> This version is based on net-next, and assumes Eric's patch for
> corrected bstats is already applied.
> 
> 0.9 incorporate patches from Patrick/Eric
>     rework the peek_random and drop code to simplify and fix bug where
>     random_N needs to called with full length (including holes).

Nice catch, I now have more "matched" counts after my test:

qdisc choke 11: parent 1:11 limit 130000b min 10833b max 32500b ewma 13 Plog 21 Scell_log 30
 Sent 93944198 bytes 170889 pkt (dropped 829140, overlimits 436686 requeues 0) 
 rate 48bit 0pps backlog 0b 0p requeues 0 
  marked 0 early 436686 pdrop 0 other 0 matched 196227

You missed the qdisc_bstats_update() move from enqueue() to dequeue().

And some minor CodingStyle / checkpatch.pl changes; here is my
latest diff on top of 0.9.

I believe you can release v1 :)

Thanks !

 net/sched/sch_choke.c |   35 ++++++++++++++++++-----------------
 1 files changed, 18 insertions(+), 17 deletions(-)

diff --git a/net/sched/sch_choke.c b/net/sched/sch_choke.c
index 55eeb7d..a7605a0 100644
--- a/net/sched/sch_choke.c
+++ b/net/sched/sch_choke.c
@@ -20,7 +20,7 @@
 #include <net/red.h>
 
 /*	CHOKe stateless AQM for fair bandwidth allocation
-        =================================================
+	=================================================
 
    CHOKe (CHOose and Keep for responsive flows, CHOose and Kill for
    unresponsive flows) is a variant of RED that penalizes misbehaving flows but
@@ -137,10 +137,10 @@ static void choke_drop_by_idx(struct Qdisc *sch, unsigned int idx)
 }
 
 /* Classify flow using either:
-   1. pre-existing classification result in skb
-   2. fast internal classification
-   3. use TC filter based classification
-*/
+ * 1. pre-existing classification result in skb
+ * 2. fast internal classification (rxhash)
+ * 3. use TC filter based classification
+ */
 static unsigned int choke_classify(struct sk_buff *skb,
 				   struct Qdisc *sch, int *qerr)
 
@@ -176,10 +176,9 @@ static unsigned int choke_classify(struct sk_buff *skb,
 }
 
 /* Select a packet at random from the queue and return flow classification
-   returns 0 if queue empty
+ * returns 0 if queue empty
  */
-static unsigned int choke_peek_random(struct Qdisc *sch,
-				      unsigned int *pidx)
+static unsigned int choke_peek_random(struct Qdisc *sch, unsigned int *pidx)
 {
 	struct choke_sched_data *q = qdisc_priv(sch);
 	const struct sk_buff *skb;
@@ -229,9 +228,9 @@ static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		red_end_of_idle_period(p);
 
 	/* Is queue small? */
-	if (p->qavg <= p->qth_min)
+	if (p->qavg <= p->qth_min) {
 		p->qcount = -1;
-	else {
+	} else {
 		unsigned int idx;
 
 		/* Draw a packet at random from queue and compare flow */
@@ -266,8 +265,9 @@ static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 
 				q->stats.prob_mark++;
 			}
-		} else
+		} else {
 			p->qR = red_random(p);
+		}
 	}
 
 	/* Admit new packet */
@@ -276,7 +276,6 @@ static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		q->tail = (q->tail + 1) & q->tab_mask;
 		++sch->q.qlen;
 		sch->qstats.backlog += qdisc_pkt_len(skb);
-		qdisc_bstats_update(sch, skb);
 		return NET_XMIT_SUCCESS;
 	}
 
@@ -306,6 +305,7 @@ static struct sk_buff *choke_dequeue(struct Qdisc *sch)
 	choke_zap_head_holes(q);
 	--sch->q.qlen;
 	sch->qstats.backlog -= qdisc_pkt_len(skb);
+	qdisc_bstats_update(sch, skb);
 
 	return skb;
 }
@@ -316,9 +316,9 @@ static unsigned int choke_drop(struct Qdisc *sch)
 	unsigned int len;
 
 	len = qdisc_queue_drop(sch);
-	if (len > 0)
+	if (len > 0) {
 		q->stats.other++;
-	else {
+	} else {
 		if (!red_is_idling(&q->parms))
 			red_start_of_idle_period(&q->parms);
 	}
@@ -326,7 +326,7 @@ static unsigned int choke_drop(struct Qdisc *sch)
 	return len;
 }
 
-static void choke_reset(struct Qdisc* sch)
+static void choke_reset(struct Qdisc *sch)
 {
 	struct choke_sched_data *q = qdisc_priv(sch);
 
@@ -410,8 +410,9 @@ static int choke_change(struct Qdisc *sch, struct nlattr *opt)
 
 		q->tab_mask = mask;
 		q->tab = ntab;
-	} else
+	} else {
 		sch_tree_lock(sch);
+	}
 
 	q->flags = ctl->flags;
 	q->limit = ctl->limit;
@@ -428,7 +429,7 @@ static int choke_change(struct Qdisc *sch, struct nlattr *opt)
 	return 0;
 }
 
-static int choke_init(struct Qdisc* sch, struct nlattr *opt)
+static int choke_init(struct Qdisc *sch, struct nlattr *opt)
 {
 	return choke_change(sch, opt);
 }




* Re: [PATCH] CHOKe flow scheduler (0.9)
  2011-01-15  7:45             ` Eric Dumazet
@ 2011-01-17 17:25               ` Stephen Hemminger
  2011-01-17 17:54                 ` Eric Dumazet
  0 siblings, 1 reply; 26+ messages in thread
From: Stephen Hemminger @ 2011-01-17 17:25 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Patrick McHardy, David Miller, netdev

On Sat, 15 Jan 2011 08:45:42 +0100
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> On Friday, 14 January 2011 at 15:45 -0800, Stephen Hemminger wrote:
> > CHOKe ("CHOose and Kill" or "CHOose and Keep") is an alternative
> > packet scheduler based on the Random Exponential Drop (RED) algorithm.
> > 
> > The core idea is:
> >   For every packet arrival:
> >   	Calculate Qave
> > 	if (Qave < minth) 
> > 	     Queue the new packet
> > 	else 
> > 	     Select randomly a packet from the queue 
> > 	     if (both packets from same flow)
> > 	     then Drop both the packets
> > 	     else if (Qave > maxth)
> > 	          Drop packet
> > 	     else
> > 	       	  Admit packet with proability p (same as RED)
> > 
> > See also:
> >   Rong Pan, Balaji Prabhakar, Konstantinos Psounis, "CHOKe: a stateless active
> >    queue management scheme for approximating fair bandwidth allocation", 
> >   Proceeding of INFOCOM'2000, March 2000.
> > 
> > Help from:
> >      Eric Dumazet <eric.dumazet@gmail.com>
> >      Patrick McHardy <kaber@trash.net>
> > 
> > Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
> > 
> > ---
> > This version is based on net-next, and assumes Eric's patch for
> > corrected bstats is already applied.
> > 
> > 0.9 incorporate patches from Patrick/Eric
> >     rework the peek_random and drop code to simplify and fix bug where
> >     random_N needs to called with full length (including holes).
> 
> Nice catch, I now have more "matched" counts after my test :
> 
> qdisc choke 11: parent 1:11 limit 130000b min 10833b max 32500b ewma 13 Plog 21 Scell_log 30
>  Sent 93944198 bytes 170889 pkt (dropped 829140, overlimits 436686 requeues 0) 
>  rate 48bit 0pps backlog 0b 0p requeues 0 
>   marked 0 early 436686 pdrop 0 other 0 matched 196227
> 
> You missed the qdisc_bstats_update() move from enqueue() to dequeue()
> 
> And some minor CodingStyle / checkpatch.pl changes, here is my
> latest diff on top of 0.9
> 
> I believe you can release v1 :)
> 
> Thanks !

I rolled in your changes, but there is one more change I want to make.
The existing flow match based on the hash is vulnerable to a hash-collision
DoS attack: it is possible for a hostile flow to send packets that match the
same hash value, which would effectively kill a targeted flow.

The solution is to match on the full source and destination, not the hash
value. I am still coding that up.
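
A rough sketch of that direction (hypothetical helper, IPv4 only, not the
actual patch): keep the stored hash as a cheap first test, but only treat
the randomly chosen packet as a match when the real flow keys agree as
well, something like:

	/* Hypothetical sketch only: confirm a hash match by comparing real
	 * IPv4 flow keys, so that a crafted hash collision cannot kill an
	 * unrelated flow.  A complete version would also compare the
	 * transport ports and handle IPv6.
	 */
	static bool choke_match_flow(const struct sk_buff *a,
				     const struct sk_buff *b)
	{
		const struct iphdr *ipa, *ipb;

		if (a->protocol != htons(ETH_P_IP) ||
		    b->protocol != htons(ETH_P_IP))
			return false;

		ipa = ip_hdr(a);
		ipb = ip_hdr(b);

		return ipa->protocol == ipb->protocol &&
		       ipa->saddr == ipb->saddr &&
		       ipa->daddr == ipb->daddr;
	}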



-- 


* Re: [PATCH] CHOKe flow scheduler (0.9)
  2011-01-17 17:25               ` Stephen Hemminger
@ 2011-01-17 17:54                 ` Eric Dumazet
  2011-01-18 19:06                   ` Stephen Hemminger
  0 siblings, 1 reply; 26+ messages in thread
From: Eric Dumazet @ 2011-01-17 17:54 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Patrick McHardy, David Miller, netdev

On Monday, 17 January 2011 at 09:25 -0800, Stephen Hemminger wrote:

> I rolled in your changes. But there is one more change I want to make.
> The existing flow match based on hash is vulnerable to side-channel DoS attack.
> It is possible for a hostile flow to send packets that match the same
> hash value which would effectively kill a targeted flow.
> 
> The solution is to match based on full source and destination, not hash value.
> Still coding that up.

I see, but you only want to make this full test if (!q->filter_list)?

(or, more precisely, only if skb_get_rxhash() was used to get the cookie)
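
In other words (sketch only, reusing the hypothetical choke_match_flow()
helper from earlier in the thread), something along these lines:

	/* Sketch: only fall back to comparing real flow keys when the
	 * cookie came from skb_get_rxhash(); if an external tc filter
	 * (q->filter_list) classified the packet, trust its verdict.
	 */
	static bool choke_same_flow(const struct choke_sched_data *q,
				    struct sk_buff *nskb, struct sk_buff *oskb,
				    unsigned int nhash, unsigned int ohash)
	{
		if (nhash != ohash)
			return false;

		if (q->filter_list)
			return true;

		return choke_match_flow(nskb, oskb);
	}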







* Re: [PATCH] CHOKe flow scheduler (0.9)
  2011-01-17 17:54                 ` Eric Dumazet
@ 2011-01-18 19:06                   ` Stephen Hemminger
  2011-01-18 19:34                     ` Eric Dumazet
  0 siblings, 1 reply; 26+ messages in thread
From: Stephen Hemminger @ 2011-01-18 19:06 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Patrick McHardy, David Miller, netdev

On Mon, 17 Jan 2011 18:54:11 +0100
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> On Monday, 17 January 2011 at 09:25 -0800, Stephen Hemminger wrote:
> 
> > I rolled in your changes. But there is one more change I want to make.
> > The existing flow match based on hash is vulnerable to side-channel DoS attack.
> > It is possible for a hostile flow to send packets that match the same
> > hash value which would effectively kill a targeted flow.
> > 
> > The solution is to match based on full source and destination, not hash value.
> > Still coding that up.
> 
> I see, but you only want to make this full test if (!q->filter_list)  ?
> 
> (or precisely only if skb_get_rxhash() was used to get the cookie )

This is what I am starting to retest. The code can probably be simplified
to avoid the may_pull() on the packet already in the queue.

Subject: sched: CHOKe flow scheduler

CHOKe ("CHOose and Kill" or "CHOose and Keep") is an alternative
packet scheduler based on the Random Early Detection (RED) algorithm.

The core idea is:
  For every packet arrival:
  	Calculate Qave
	if (Qave < minth) 
	     Queue the new packet
	else 
	     Select randomly a packet from the queue 
	     if (both packets from same flow)
	     then Drop both the packets
	     else if (Qave > maxth)
	          Drop packet
	     else
	       	  Admit packet with probability p (same as RED)

See also:
  Rong Pan, Balaji Prabhakar, Konstantinos Psounis, "CHOKe: a stateless active
   queue management scheme for approximating fair bandwidth allocation", 
  Proceeding of INFOCOM'2000, March 2000.

Help from:
     Eric Dumazet <eric.dumazet@gmail.com>
     Patrick McHardy <kaber@trash.net>

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

---
This version is based on net-next, and assumes Eric's patch for
corrected bstats is already applied.

0.9 incorporate patches from Patrick/Eric
    rework the peek_random and drop code to simplify, and fix a bug where
     random_N needs to be called with the full length (including holes).

 include/linux/pkt_sched.h |   29 ++
 net/sched/Kconfig         |   11 
 net/sched/Makefile        |    1 
 net/sched/sch_choke.c     |  579 ++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 620 insertions(+)

--- a/net/sched/Kconfig	2011-01-14 10:43:19.062537393 -0800
+++ b/net/sched/Kconfig	2011-01-16 13:42:45.938919517 -0800
@@ -205,6 +205,17 @@ config NET_SCH_DRR
 
 	  If unsure, say N.
 
+config NET_SCH_CHOKE
+	tristate "CHOose and Keep responsive flow scheduler (CHOKE)"
+	help
+	  Say Y here if you want to use the CHOKe packet scheduler (CHOose
+	  and Keep for responsive flows, CHOose and Kill for unresponsive
+	  flows). This is a variation of RED which tries to penalize flows
+	  that monopolize the queue.
+
+	  To compile this code as a module, choose M here: the
+	  module will be called sch_choke.
+
 config NET_SCH_INGRESS
 	tristate "Ingress Qdisc"
 	depends on NET_CLS_ACT
--- a/net/sched/Makefile	2011-01-14 10:43:19.072538228 -0800
+++ b/net/sched/Makefile	2011-01-16 13:42:45.946919793 -0800
@@ -32,6 +32,7 @@ obj-$(CONFIG_NET_SCH_MULTIQ)	+= sch_mult
 obj-$(CONFIG_NET_SCH_ATM)	+= sch_atm.o
 obj-$(CONFIG_NET_SCH_NETEM)	+= sch_netem.o
 obj-$(CONFIG_NET_SCH_DRR)	+= sch_drr.o
+obj-$(CONFIG_NET_SCH_CHOKE)	+= sch_choke.o
 obj-$(CONFIG_NET_CLS_U32)	+= cls_u32.o
 obj-$(CONFIG_NET_CLS_ROUTE4)	+= cls_route.o
 obj-$(CONFIG_NET_CLS_FW)	+= cls_fw.o
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ b/net/sched/sch_choke.c	2011-01-17 09:18:42.271211633 -0800
@@ -0,0 +1,686 @@
+/*
+ * net/sched/sch_choke.c	CHOKE scheduler
+ *
+ * Copyright (c) 2011 Stephen Hemminger <shemminger@vyatta.com>
+ * Copyright (c) 2011 Eric Dumazet <eric.dumazet@gmail.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * version 2 as published by the Free Software Foundation.
+ *
+ */
+
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/skbuff.h>
+#include <linux/reciprocal_div.h>
+#include <net/pkt_sched.h>
+#include <net/inet_ecn.h>
+#include <net/red.h>
+#include <linux/ip.h>
+#include <net/ip.h>
+#include <linux/ipv6.h>
+#include <net/ipv6.h>
+
+/*
+   CHOKe stateless AQM for fair bandwidth allocation
+   =================================================
+
+   CHOKe (CHOose and Keep for responsive flows, CHOose and Kill for
+   unresponsive flows) is a variant of RED that penalizes misbehaving flows but
+   maintains no flow state. The difference from RED is an additional step
+   during the enqueuing process. If average queue size is over the
+   low threshold (qmin), a packet is chosen at random from the queue.
+   If both the new and chosen packet are from the same flow, both
+   are dropped. Unlike RED, CHOKe is not really a "classful" qdisc because it
+   needs to access packets in queue randomly. It has a minimal class
+   interface to allow overriding the builtin flow classifier with
+   filters.
+
+   Source:
+   R. Pan, B. Prabhakar, and K. Psounis, "CHOKe, A Stateless
+   Active Queue Management Scheme for Approximating Fair Bandwidth Allocation",
+   IEEE INFOCOM, 2000.
+
+   A. Tang, J. Wang, S. Low, "Understanding CHOKe: Throughput and Spatial
+   Characteristics", IEEE/ACM Transactions on Networking, 2004
+
+ */
+
+/* Upper bound on size of sk_buff table (packets) */
+#define CHOKE_MAX_QUEUE	(128*1024 - 1)
+
+struct choke_sched_data {
+/* Parameters */
+	u32		 limit;
+	unsigned char	 flags;
+
+	struct red_parms parms;
+
+/* Variables */
+	struct tcf_proto *filter_list;
+	struct {
+		u32	prob_drop;	/* Early probability drops */
+		u32	prob_mark;	/* Early probability marks */
+		u32	forced_drop;	/* Forced drops, qavg > max_thresh */
+		u32	forced_mark;	/* Forced marks, qavg > max_thresh */
+		u32	pdrop;          /* Drops due to queue limits */
+		u32	other;          /* Drops due to drop() calls */
+		u32	matched;	/* Drops to flow match */
+	} stats;
+
+	unsigned int	 head;
+	unsigned int	 tail;
+
+	unsigned int	 tab_mask; /* size - 1 */
+
+	struct sk_buff **tab;
+};
+
+/* deliver a random number between 0 and N - 1 */
+static u32 random_N(unsigned int N)
+{
+	return reciprocal_divide(random32(), N);
+}
+
+/* number of elements in queue including holes */
+static unsigned int choke_len(const struct choke_sched_data *q)
+{
+	return (q->tail - q->head) & q->tab_mask;
+}
+
+/* Is ECN parameter configured */
+static int use_ecn(const struct choke_sched_data *q)
+{
+	return q->flags & TC_RED_ECN;
+}
+
+/* Should packets over max just be dropped (versus marked) */
+static int use_harddrop(const struct choke_sched_data *q)
+{
+	return q->flags & TC_RED_HARDDROP;
+}
+
+/* Move head pointer forward to skip over holes */
+static void choke_zap_head_holes(struct choke_sched_data *q)
+{
+	do {
+		q->head = (q->head + 1) & q->tab_mask;
+		if (q->head == q->tail)
+			break;
+	} while (q->tab[q->head] == NULL);
+}
+
+/* Move tail pointer backwards to reuse holes */
+static void choke_zap_tail_holes(struct choke_sched_data *q)
+{
+	do {
+		q->tail = (q->tail - 1) & q->tab_mask;
+		if (q->head == q->tail)
+			break;
+	} while (q->tab[q->tail] == NULL);
+}
+
+/* Drop packet from queue array by creating a "hole" */
+static void choke_drop_by_idx(struct Qdisc *sch, unsigned int idx)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct sk_buff *skb = q->tab[idx];
+
+	q->tab[idx] = NULL;
+
+	if (idx == q->head)
+		choke_zap_head_holes(q);
+	if (idx == q->tail)
+		choke_zap_tail_holes(q);
+
+	sch->qstats.backlog -= qdisc_pkt_len(skb);
+	qdisc_drop(skb, sch);
+	qdisc_tree_decrease_qlen(sch, 1);
+	--sch->q.qlen;
+}
+
+/*
+ * Compare flow of two packets
+ *  Returns true only if source and destination address and port match.
+ *          false for special cases
+ */
+static bool choke_match_flow(struct sk_buff *skb1, struct sk_buff *skb2)
+{
+	int off1, off2, poff;
+	u8 ip_proto;
+	u32 ihl;
+
+	if (skb1->protocol != skb2->protocol)
+		return false;
+
+	off1 = skb_network_offset(skb1);
+	off2 = skb_network_offset(skb2);
+
+	switch (skb1->protocol) {
+	case __constant_htons(ETH_P_IP): {
+		struct iphdr *ip1, *ip2;
+
+		if (!pskb_may_pull(skb1, sizeof(struct iphdr) + off1))
+			return false;
+
+		ip1 = (struct iphdr *) (skb1->data + off1);
+		if (ip1->frag_off & htons(IP_MF | IP_OFFSET))
+			return false;	/* don't compare fragments */
+
+		if (!pskb_may_pull(skb2, sizeof(struct iphdr) + off2))
+			return false;
+
+		ip2 = (struct iphdr *) (skb2->data + off2);
+		if (ip2->frag_off & htons(IP_MF | IP_OFFSET))
+			return false;
+
+		if (ip1->protocol != ip2->protocol ||
+		    ip1->saddr != ip2->saddr || ip1->daddr != ip2->daddr)
+			return false;
+
+		ip_proto = ip1->protocol;
+		ihl = ip1->ihl;
+		break;
+	}
+
+	case __constant_htons(ETH_P_IPV6): {
+		struct ipv6hdr *ip1, *ip2;
+
+		if (!pskb_may_pull(skb1, sizeof(struct ipv6hdr *) + off1))
+			return false;
+
+		if (!pskb_may_pull(skb2, sizeof(struct ipv6hdr *) + off2))
+			return false;
+
+		ip1 = (struct ipv6hdr *) (skb1->data + off1);
+		ip2 = (struct ipv6hdr *) (skb2->data + off2);
+
+		if (ip1->nexthdr != ip2->nexthdr ||
+		    ipv6_addr_cmp(&ip1->saddr, &ip2->saddr) != 0 ||
+		    ipv6_addr_cmp(&ip1->daddr, &ip2->daddr))
+			return false;
+
+		ihl = (40 >> 2);
+		ip_proto = ip1->nexthdr;
+		break;
+	}
+
+	default:
+		return false;
+	}
+
+	poff = proto_ports_offset(ip_proto);
+	if (poff >= 0) {
+		u32 *ports1, *ports2;
+
+		off1 += ihl * 4 + poff;
+		if (!pskb_may_pull(skb1, off1 + 4))
+			return false;
+
+		off2 += ihl * 4 + poff;
+		if (!pskb_may_pull(skb2, off2 + 4))
+			return false;
+
+		ports1 = (__force u32 *) (skb1->data + off1);
+		ports2 = (__force u32 *) (skb2->data + off2);
+
+		return *ports1 == *ports2;
+	}
+
+	return true;
+}
+
+static inline void choke_set_classid(struct sk_buff *skb, u16 classid)
+{
+	*(unsigned int *)(qdisc_skb_cb(skb)->data) = classid;
+}
+
+static u16 choke_get_classid(const struct sk_buff *skb)
+{
+	return *(unsigned int *)(qdisc_skb_cb(skb)->data);
+}
+
+/*
+ * Classify flow using either:
+ *  1. pre-existing classification result in skb
+ *  2. fast internal classification
+ *  3. use TC filter based classification
+ */
+static bool choke_classify(struct sk_buff *skb,
+			   struct Qdisc *sch, int *qerr)
+
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct tcf_result res;
+	int result;
+
+	*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
+
+	result = tc_classify(skb, q->filter_list, &res);
+	if (result >= 0) {
+#ifdef CONFIG_NET_CLS_ACT
+		switch (result) {
+		case TC_ACT_STOLEN:
+		case TC_ACT_QUEUED:
+			*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
+		case TC_ACT_SHOT:
+			return false;
+		}
+#endif
+		choke_set_classid(skb, TC_H_MIN(res.classid));
+		return true;
+	}
+
+	return false;
+}
+
+/* Select packet a random from queue */
+static struct sk_buff *choke_peek_random(const struct choke_sched_data *q,
+					 unsigned int *pidx)
+{
+	struct sk_buff *skb;
+	int retrys = 3;
+
+	do {
+		*pidx = (q->head + random_N(choke_len(q))) & q->tab_mask;
+		skb = q->tab[*pidx];
+		if (skb)
+			return skb;
+	} while (--retrys > 0);
+
+	/* Queue has lots of holes; use the head, which is known to exist.
+	 * Note: result can still be NULL if q->head == q->tail
+	 */
+	return q->tab[*pidx = q->head];
+}
+
+/* Select a packet at random from the queue and compare flow */
+static bool choke_match_random(const struct choke_sched_data *q,
+			       struct sk_buff *nskb,
+			       unsigned int *pidx)
+{
+	struct sk_buff *oskb;
+
+	if (q->head == q->tail)
+		return false;
+
+	oskb = choke_peek_random(q, pidx);
+	if (q->filter_list)
+		return choke_get_classid(nskb) == choke_get_classid(oskb);
+
+
+	return choke_match_flow(oskb, nskb);
+}
+
+static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct red_parms *p = &q->parms;
+	int uninitialized_var(ret);
+
+	/* If using external classifiers, get result and record it. */
+	if (q->filter_list &&
+	    !choke_classify(skb, sch, &ret)) {
+		/* Packet was eaten by filter */
+		if (ret & __NET_XMIT_BYPASS)
+			sch->qstats.drops++;
+		kfree_skb(skb);
+		return ret;
+	}
+
+	/* Compute average queue usage (see RED) */
+	p->qavg = red_calc_qavg(p, sch->q.qlen);
+	if (red_is_idling(p))
+		red_end_of_idle_period(p);
+
+	/* Is queue small? */
+	if (p->qavg <= p->qth_min)
+		p->qcount = -1;
+	else {
+		unsigned int idx;
+
+		/* Draw a packet at random from queue and compare flow */
+		if (choke_match_random(q, skb, &idx)) {
+			q->stats.matched++;
+			choke_drop_by_idx(sch, idx);
+			goto congestion_drop;
+		}
+
+		/* Queue is large, always mark/drop */
+		if (p->qavg > p->qth_max) {
+			p->qcount = -1;
+
+			sch->qstats.overlimits++;
+			if (use_harddrop(q) || !use_ecn(q) ||
+			    !INET_ECN_set_ce(skb)) {
+				q->stats.forced_drop++;
+				goto congestion_drop;
+			}
+
+			q->stats.forced_mark++;
+		} else if (++p->qcount) {
+			if (red_mark_probability(p, p->qavg)) {
+				p->qcount = 0;
+				p->qR = red_random(p);
+
+				sch->qstats.overlimits++;
+				if (!use_ecn(q) || !INET_ECN_set_ce(skb)) {
+					q->stats.prob_drop++;
+					goto congestion_drop;
+				}
+
+				q->stats.prob_mark++;
+			}
+		} else
+			p->qR = red_random(p);
+	}
+
+	/* Admit new packet */
+	if (sch->q.qlen < q->limit) {
+		q->tab[q->tail] = skb;
+		q->tail = (q->tail + 1) & q->tab_mask;
+		++sch->q.qlen;
+		sch->qstats.backlog += qdisc_pkt_len(skb);
+		return NET_XMIT_SUCCESS;
+	}
+
+	q->stats.pdrop++;
+	sch->qstats.drops++;
+	kfree_skb(skb);
+	return NET_XMIT_DROP;
+
+ congestion_drop:
+	qdisc_drop(skb, sch);
+	return NET_XMIT_CN;
+}
+
+static struct sk_buff *choke_dequeue(struct Qdisc *sch)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct sk_buff *skb;
+
+	if (q->head == q->tail) {
+		if (!red_is_idling(&q->parms))
+			red_start_of_idle_period(&q->parms);
+		return NULL;
+	}
+
+	skb = q->tab[q->head];
+	q->tab[q->head] = NULL;
+	choke_zap_head_holes(q);
+	--sch->q.qlen;
+	sch->qstats.backlog -= qdisc_pkt_len(skb);
+	qdisc_bstats_update(sch, skb);
+
+	return skb;
+}
+
+static unsigned int choke_drop(struct Qdisc *sch)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	unsigned int len;
+
+	len = qdisc_queue_drop(sch);
+	if (len > 0)
+		q->stats.other++;
+	else {
+		if (!red_is_idling(&q->parms))
+			red_start_of_idle_period(&q->parms);
+	}
+
+	return len;
+}
+
+static void choke_reset(struct Qdisc *sch)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+
+	red_restart(&q->parms);
+}
+
+static const struct nla_policy choke_policy[TCA_CHOKE_MAX + 1] = {
+	[TCA_CHOKE_PARMS]	= { .len = sizeof(struct tc_red_qopt) },
+	[TCA_CHOKE_STAB]	= { .len = RED_STAB_SIZE },
+};
+
+
+static void choke_free(void *addr)
+{
+	if (addr) {
+		if (is_vmalloc_addr(addr))
+			vfree(addr);
+		else
+			kfree(addr);
+	}
+}
+
+static int choke_change(struct Qdisc *sch, struct nlattr *opt)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct nlattr *tb[TCA_CHOKE_MAX + 1];
+	const struct tc_red_qopt *ctl;
+	int err;
+	struct sk_buff **old = NULL;
+	unsigned int mask;
+
+	if (opt == NULL)
+		return -EINVAL;
+
+	err = nla_parse_nested(tb, TCA_CHOKE_MAX, opt, choke_policy);
+	if (err < 0)
+		return err;
+
+	if (tb[TCA_CHOKE_PARMS] == NULL ||
+	    tb[TCA_CHOKE_STAB] == NULL)
+		return -EINVAL;
+
+	ctl = nla_data(tb[TCA_CHOKE_PARMS]);
+
+	if (ctl->limit > CHOKE_MAX_QUEUE)
+		return -EINVAL;
+
+	mask = roundup_pow_of_two(ctl->limit + 1) - 1;
+	if (mask != q->tab_mask) {
+		struct sk_buff **ntab;
+
+		ntab = kcalloc(mask + 1, sizeof(struct sk_buff *), GFP_KERNEL);
+		if (!ntab)
+			ntab = vzalloc((mask + 1) * sizeof(struct sk_buff *));
+		if (!ntab)
+			return -ENOMEM;
+
+		sch_tree_lock(sch);
+		old = q->tab;
+		if (old) {
+			unsigned int oqlen = sch->q.qlen, tail = 0;
+
+			while (q->head != q->tail) {
+				struct sk_buff *skb = q->tab[q->head];
+
+				q->head = (q->head + 1) & q->tab_mask;
+				if (!skb)
+					continue;
+				if (tail < mask) {
+					ntab[tail++] = skb;
+					continue;
+				}
+				sch->qstats.backlog -= qdisc_pkt_len(skb);
+				--sch->q.qlen;
+				qdisc_drop(skb, sch);
+			}
+			qdisc_tree_decrease_qlen(sch, oqlen - sch->q.qlen);
+			q->head = 0;
+			q->tail = tail;
+		}
+
+		q->tab_mask = mask;
+		q->tab = ntab;
+	} else
+		sch_tree_lock(sch);
+
+	q->flags = ctl->flags;
+	q->limit = ctl->limit;
+
+	red_set_parms(&q->parms, ctl->qth_min, ctl->qth_max, ctl->Wlog,
+		      ctl->Plog, ctl->Scell_log,
+		      nla_data(tb[TCA_CHOKE_STAB]));
+
+	if (q->head == q->tail)
+		red_end_of_idle_period(&q->parms);
+
+	sch_tree_unlock(sch);
+	choke_free(old);
+	return 0;
+}
+
+static int choke_init(struct Qdisc *sch, struct nlattr *opt)
+{
+	return choke_change(sch, opt);
+}
+
+static int choke_dump(struct Qdisc *sch, struct sk_buff *skb)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct nlattr *opts = NULL;
+	struct tc_red_qopt opt = {
+		.limit		= q->limit,
+		.flags		= q->flags,
+		.qth_min	= q->parms.qth_min >> q->parms.Wlog,
+		.qth_max	= q->parms.qth_max >> q->parms.Wlog,
+		.Wlog		= q->parms.Wlog,
+		.Plog		= q->parms.Plog,
+		.Scell_log	= q->parms.Scell_log,
+	};
+
+	opts = nla_nest_start(skb, TCA_OPTIONS);
+	if (opts == NULL)
+		goto nla_put_failure;
+
+	NLA_PUT(skb, TCA_CHOKE_PARMS, sizeof(opt), &opt);
+	return nla_nest_end(skb, opts);
+
+nla_put_failure:
+	nla_nest_cancel(skb, opts);
+	return -EMSGSIZE;
+}
+
+static int choke_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct tc_choke_xstats st = {
+		.early	= q->stats.prob_drop + q->stats.forced_drop,
+		.marked	= q->stats.prob_mark + q->stats.forced_mark,
+		.pdrop	= q->stats.pdrop,
+		.other	= q->stats.other,
+		.matched = q->stats.matched,
+	};
+
+	return gnet_stats_copy_app(d, &st, sizeof(st));
+}
+
+static void choke_destroy(struct Qdisc *sch)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+
+	tcf_destroy_chain(&q->filter_list);
+	choke_free(q->tab);
+}
+
+static struct Qdisc *choke_leaf(struct Qdisc *sch, unsigned long arg)
+{
+	return NULL;
+}
+
+static unsigned long choke_get(struct Qdisc *sch, u32 classid)
+{
+	return 0;
+}
+
+static void choke_put(struct Qdisc *q, unsigned long cl)
+{
+}
+
+static unsigned long choke_bind(struct Qdisc *sch, unsigned long parent,
+				u32 classid)
+{
+	return 0;
+}
+
+static struct tcf_proto **choke_find_tcf(struct Qdisc *sch, unsigned long cl)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+
+	if (cl)
+		return NULL;
+	return &q->filter_list;
+}
+
+static int choke_dump_class(struct Qdisc *sch, unsigned long cl,
+			  struct sk_buff *skb, struct tcmsg *tcm)
+{
+	tcm->tcm_handle |= TC_H_MIN(cl);
+	return 0;
+}
+
+static void choke_walk(struct Qdisc *sch, struct qdisc_walker *arg)
+{
+	if (!arg->stop) {
+		if (arg->fn(sch, 1, arg) < 0) {
+			arg->stop = 1;
+			return;
+		}
+		arg->count++;
+	}
+}
+
+static const struct Qdisc_class_ops choke_class_ops = {
+	.leaf		=	choke_leaf,
+	.get		=	choke_get,
+	.put		=	choke_put,
+	.tcf_chain	=	choke_find_tcf,
+	.bind_tcf	=	choke_bind,
+	.unbind_tcf	=	choke_put,
+	.dump		=	choke_dump_class,
+	.walk		=	choke_walk,
+};
+
+static struct sk_buff *choke_peek_head(struct Qdisc *sch)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+
+	return (q->head != q->tail) ? q->tab[q->head] : NULL;
+}
+
+static struct Qdisc_ops choke_qdisc_ops __read_mostly = {
+	.id		=	"choke",
+	.priv_size	=	sizeof(struct choke_sched_data),
+
+	.enqueue	=	choke_enqueue,
+	.dequeue	=	choke_dequeue,
+	.peek		=	choke_peek_head,
+	.drop		=	choke_drop,
+	.init		=	choke_init,
+	.destroy	=	choke_destroy,
+	.reset		=	choke_reset,
+	.change		=	choke_change,
+	.dump		=	choke_dump,
+	.dump_stats	=	choke_dump_stats,
+	.owner		=	THIS_MODULE,
+};
+
+static int __init choke_module_init(void)
+{
+	return register_qdisc(&choke_qdisc_ops);
+}
+
+static void __exit choke_module_exit(void)
+{
+	unregister_qdisc(&choke_qdisc_ops);
+}
+
+module_init(choke_module_init)
+module_exit(choke_module_exit)
+
+MODULE_LICENSE("GPL");
--- a/include/linux/pkt_sched.h	2011-01-14 10:43:19.092539898 -0800
+++ b/include/linux/pkt_sched.h	2011-01-16 13:42:45.926919103 -0800
@@ -247,6 +247,35 @@ struct tc_gred_sopt {
 	__u16		pad1;
 };
 
+/* CHOKe section */
+
+enum {
+	TCA_CHOKE_UNSPEC,
+	TCA_CHOKE_PARMS,
+	TCA_CHOKE_STAB,
+	__TCA_CHOKE_MAX,
+};
+
+#define TCA_CHOKE_MAX (__TCA_CHOKE_MAX - 1)
+
+struct tc_choke_qopt {
+	__u32		limit;		/* Hard queue length (packets)	*/
+	__u32		qth_min;	/* Min average threshold (packets) */
+	__u32		qth_max;	/* Max average threshold (packets) */
+	unsigned char   Wlog;		/* log(W)		*/
+	unsigned char   Plog;		/* log(P_max/(qth_max-qth_min))	*/
+	unsigned char   Scell_log;	/* cell size for idle damping */
+	unsigned char	flags;		/* see RED flags */
+};
+
+struct tc_choke_xstats {
+	__u32		early;          /* Early drops */
+	__u32		pdrop;          /* Drops due to queue limits */
+	__u32		other;          /* Drops due to drop() calls */
+	__u32		marked;         /* Marked packets */
+	__u32		matched;	/* Drops due to flow match */
+};
+
 /* HTB section */
 #define TC_HTB_NUMPRIO		8
 #define TC_HTB_MAXDEPTH		8


* Re: [PATCH] CHOKe flow scheduler (0.9)
  2011-01-18 19:06                   ` Stephen Hemminger
@ 2011-01-18 19:34                     ` Eric Dumazet
  2011-01-20 17:38                       ` [PATCH] CHOKe flow scheduler (0.10) Stephen Hemminger
  0 siblings, 1 reply; 26+ messages in thread
From: Eric Dumazet @ 2011-01-18 19:34 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Patrick McHardy, David Miller, netdev

On Tuesday 18 January 2011 at 11:06 -0800, Stephen Hemminger wrote:

> +static bool choke_match_flow(struct sk_buff *skb1, struct sk_buff *skb2)
> +{
> +	int off1, off2, poff;
> +	u8 ip_proto;
> +	u32 ihl;
> +
> +	if (skb1->protocol != skb2->protocol)
> +		return false;
> +
> +	off1 = skb_network_offset(skb1);
> +	off2 = skb_network_offset(skb2);

> +
> +	switch (skb1->protocol) {
> +	case __constant_htons(ETH_P_IP): {
> +		struct iphdr *ip1, *ip2;
> +
> +		if (!pskb_may_pull(skb1, sizeof(struct iphdr) + off1))
> +			return false;
> +
	pskb_network_may_pull() might be cleaner


> +		ip1 = (struct iphdr *) (skb1->data + off1);
	ip1 = ip_hdr(skb);
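i.e. something along these lines (untested sketch, only to show the two
helpers; pskb_network_may_pull() already adds skb_network_offset()
itself):

	if (!pskb_network_may_pull(skb1, sizeof(struct iphdr)) ||
	    !pskb_network_may_pull(skb2, sizeof(struct iphdr)))
		return false;

	ip1 = ip_hdr(skb1);
	ip2 = ip_hdr(skb2);

so off1/off2 are then only needed for the ports.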

> +		if (ip1->frag_off & htons(IP_MF | IP_OFFSET))
> +			return false;	/* don't compare fragments */
> +

Hmm, we should compare fragments if possible.

saddr/daddr are available, not the ports.

> +		if (!pskb_may_pull(skb2, sizeof(struct iphdr) + off2))
> +			return false;
> +
> +		ip2 = (struct iphdr *) (skb2->data + off2);
> +		if (ip2->frag_off & htons(IP_MF | IP_OFFSET))
> +			return false;
> +
> +		if (ip1->protocol != ip2->protocol ||
> +		    ip1->saddr != ip2->saddr || ip1->daddr != ip2->daddr)
> +			return false;
> +


	What happens if ip1->ihl != ip2->ihl  here ?

Here I would add the fragment test :

	if ((ip1->frag_off | ip2->frag_off) & htons(IP_MF | IP_OFFSET))
		return true;

> +		ip_proto = ip1->protocol;
> +		ihl = ip1->ihl;
> +		break;
> +	}
> +
> +	case __constant_htons(ETH_P_IPV6): {
> +		struct ipv6hdr *ip1, *ip2;
> +
> +		if (!pskb_may_pull(skb1, sizeof(struct ipv6hdr *) + off1))
> +			return false;

ouch... sizeof(struct ipv6hdr *) is not what you want, but
sizeof(struct ipv6hdr) is.

So just use :

	pskb_network_may_pull(skb1, sizeof(*ip1))
> +
> +		if (!pskb_may_pull(skb2, sizeof(struct ipv6hdr *) + off2))
> +			return false;
> +
> +		ip1 = (struct ipv6hdr *) (skb1->data + off1);
	ip1 = ipv6_hdr(skb1);
> +		ip2 = (struct ipv6hdr *) (skb2->data + off2);
> +
> +		if (ip1->nexthdr != ip2->nexthdr ||
> +		    ipv6_addr_cmp(&ip1->saddr, &ip2->saddr) != 0 ||
> +		    ipv6_addr_cmp(&ip1->daddr, &ip2->daddr))
> +			return false;
> +
> +		ihl = (40 >> 2);
> +		ip_proto = ip1->nexthdr;
> +		break;
> +	}
> +
> +	default:
> +		return false;
> +	}
> +





* [PATCH] CHOKe flow scheduler (0.10)
  2011-01-18 19:34                     ` Eric Dumazet
@ 2011-01-20 17:38                       ` Stephen Hemminger
  2011-01-20 18:19                         ` Eric Dumazet
  0 siblings, 1 reply; 26+ messages in thread
From: Stephen Hemminger @ 2011-01-20 17:38 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Patrick McHardy, David Miller, netdev

CHOKe ("CHOose and Kill" or "CHOose and Keep") is an alternative
packet scheduler based on the Random Early Detection (RED) algorithm.

The core idea is:
  For every packet arrival:
  	Calculate Qave
	if (Qave < minth) 
	     Queue the new packet
	else 
	     Select randomly a packet from the queue 
	     if (both packets from same flow)
	     then Drop both the packets
	     else if (Qave > maxth)
	          Drop packet
	     else
	       	  Admit packet with probability p (same as RED)

See also:
  Rong Pan, Balaji Prabhakar, Konstantinos Psounis, "CHOKe: a stateless active
   queue management scheme for approximating fair bandwidth allocation", 
  Proceedings of INFOCOM 2000, March 2000.

Help from:
     Eric Dumazet <eric.dumazet@gmail.com>
     Patrick McHardy <kaber@trash.net>

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

---
0.10 The internal flow classifier now does a full comparison of IPv4/IPv6 headers
  to avoid a possible DoS attack via hash collisions. The internal flow match
  code still needs review/testing.

 include/linux/pkt_sched.h |   29 +
 net/sched/Kconfig         |   11 
 net/sched/Makefile        |    1 
 net/sched/sch_choke.c     |  713 ++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 754 insertions(+)

--- a/net/sched/Kconfig	2011-01-18 09:27:32.224666104 -0800
+++ b/net/sched/Kconfig	2011-01-18 09:27:45.522274178 -0800
@@ -205,6 +205,17 @@ config NET_SCH_DRR
 
 	  If unsure, say N.
 
+config NET_SCH_CHOKE
+	tristate "CHOose and Keep responsive flow scheduler (CHOKE)"
+	help
+	  Say Y here if you want to use the CHOKe packet scheduler (CHOose
+	  and Keep for responsive flows, CHOose and Kill for unresponsive
+	  flows). This is a variation of RED which tries to penalize flows
+	  that monopolize the queue.
+
+	  To compile this code as a module, choose M here: the
+	  module will be called sch_choke.
+
 config NET_SCH_INGRESS
 	tristate "Ingress Qdisc"
 	depends on NET_CLS_ACT
--- a/net/sched/Makefile	2011-01-18 09:27:32.232666939 -0800
+++ b/net/sched/Makefile	2011-01-18 09:27:45.526274731 -0800
@@ -32,6 +32,7 @@ obj-$(CONFIG_NET_SCH_MULTIQ)	+= sch_mult
 obj-$(CONFIG_NET_SCH_ATM)	+= sch_atm.o
 obj-$(CONFIG_NET_SCH_NETEM)	+= sch_netem.o
 obj-$(CONFIG_NET_SCH_DRR)	+= sch_drr.o
+obj-$(CONFIG_NET_SCH_CHOKE)	+= sch_choke.o
 obj-$(CONFIG_NET_CLS_U32)	+= cls_u32.o
 obj-$(CONFIG_NET_CLS_ROUTE4)	+= cls_route.o
 obj-$(CONFIG_NET_CLS_FW)	+= cls_fw.o
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ b/net/sched/sch_choke.c	2011-01-20 08:33:06.021631871 -0800
@@ -0,0 +1,713 @@
+/*
+ * net/sched/sch_choke.c	CHOKE scheduler
+ *
+ * Copyright (c) 2011 Stephen Hemminger <shemminger@vyatta.com>
+ * Copyright (c) 2011 Eric Dumazet <eric.dumazet@gmail.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * version 2 as published by the Free Software Foundation.
+ *
+ */
+
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/skbuff.h>
+#include <linux/reciprocal_div.h>
+#include <net/pkt_sched.h>
+#include <net/inet_ecn.h>
+#include <net/red.h>
+#include <linux/ip.h>
+#include <net/ip.h>
+#include <linux/ipv6.h>
+#include <net/ipv6.h>
+
+/*
+   CHOKe stateless AQM for fair bandwidth allocation
+   =================================================
+
+   CHOKe (CHOose and Keep for responsive flows, CHOose and Kill for
+   unresponsive flows) is a variant of RED that penalizes misbehaving flows but
+   maintains no flow state. The difference from RED is an additional step
+   during the enqueuing process. If average queue size is over the
+   low threshold (qmin), a packet is chosen at random from the queue.
+   If both the new and chosen packet are from the same flow, both
+   are dropped. Unlike RED, CHOKe is not really a "classful" qdisc because it
+   needs to access packets in queue randomly. It has a minimal class
+   interface to allow overriding the builtin flow classifier with
+   filters.
+
+   Source:
+   R. Pan, B. Prabhakar, and K. Psounis, "CHOKe, A Stateless
+   Active Queue Management Scheme for Approximating Fair Bandwidth Allocation",
+   IEEE INFOCOM, 2000.
+
+   A. Tang, J. Wang, S. Low, "Understanding CHOKe: Throughput and Spatial
+   Characteristics", IEEE/ACM Transactions on Networking, 2004
+
+ */
+
+/* Upper bound on size of sk_buff table (packets) */
+#define CHOKE_MAX_QUEUE	(128*1024 - 1)
+
+struct choke_sched_data {
+/* Parameters */
+	u32		 limit;
+	unsigned char	 flags;
+
+	struct red_parms parms;
+
+/* Variables */
+	struct tcf_proto *filter_list;
+	struct {
+		u32	prob_drop;	/* Early probability drops */
+		u32	prob_mark;	/* Early probability marks */
+		u32	forced_drop;	/* Forced drops, qavg > max_thresh */
+		u32	forced_mark;	/* Forced marks, qavg > max_thresh */
+		u32	pdrop;          /* Drops due to queue limits */
+		u32	other;          /* Drops due to drop() calls */
+		u32	matched;	/* Drops to flow match */
+	} stats;
+
+	unsigned int	 head;
+	unsigned int	 tail;
+
+	unsigned int	 tab_mask; /* size - 1 */
+
+	struct sk_buff **tab;
+};
+
+/* deliver a random number between 0 and N - 1 */
+static u32 random_N(unsigned int N)
+{
+	return reciprocal_divide(random32(), N);
+}
+
+/* number of elements in queue including holes */
+static unsigned int choke_len(const struct choke_sched_data *q)
+{
+	return (q->tail - q->head) & q->tab_mask;
+}
+
+/* Is ECN parameter configured */
+static int use_ecn(const struct choke_sched_data *q)
+{
+	return q->flags & TC_RED_ECN;
+}
+
+/* Should packets over max just be dropped (versus marked) */
+static int use_harddrop(const struct choke_sched_data *q)
+{
+	return q->flags & TC_RED_HARDDROP;
+}
+
+/* Move head pointer forward to skip over holes */
+static void choke_zap_head_holes(struct choke_sched_data *q)
+{
+	do {
+		q->head = (q->head + 1) & q->tab_mask;
+		if (q->head == q->tail)
+			break;
+	} while (q->tab[q->head] == NULL);
+}
+
+/* Move tail pointer backwards to reuse holes */
+static void choke_zap_tail_holes(struct choke_sched_data *q)
+{
+	do {
+		q->tail = (q->tail - 1) & q->tab_mask;
+		if (q->head == q->tail)
+			break;
+	} while (q->tab[q->tail] == NULL);
+}
+
+/* Drop packet from queue array by creating a "hole" */
+static void choke_drop_by_idx(struct Qdisc *sch, unsigned int idx)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct sk_buff *skb = q->tab[idx];
+
+	q->tab[idx] = NULL;
+
+	if (idx == q->head)
+		choke_zap_head_holes(q);
+	if (idx == q->tail)
+		choke_zap_tail_holes(q);
+
+	sch->qstats.backlog -= qdisc_pkt_len(skb);
+	qdisc_drop(skb, sch);
+	qdisc_tree_decrease_qlen(sch, 1);
+	--sch->q.qlen;
+}
+
+/* Make sure network and transport headers can be dereferenced */
+static bool choke_linearize_header(struct sk_buff *skb)
+{
+	int nhoff, poff;
+	struct ipv6hdr *ip6;
+	struct iphdr *ip;
+	u8 ip_proto;
+	u32 ihl;
+
+	nhoff = skb_network_offset(skb);
+
+	switch (skb->protocol) {
+	case __constant_htons(ETH_P_IP):
+		if (!pskb_may_pull(skb, sizeof(*ip) + nhoff))
+			return false;
+
+		ip = (struct iphdr *) (skb->data + nhoff);
+		if (ip->frag_off & htons(IP_MF | IP_OFFSET))
+			return false;
+
+		ip_proto = ip->protocol;
+		ihl = ip->ihl;
+		break;
+	case __constant_htons(ETH_P_IPV6):
+		if (!pskb_may_pull(skb, sizeof(*ip6) + nhoff))
+			return false;
+
+		ip6 = (struct ipv6hdr *) (skb->data + nhoff);
+		ip_proto = ip6->nexthdr;
+		ihl = (40 >> 2);
+		break;
+	default:
+		return true;
+	}
+
+	poff = proto_ports_offset(ip_proto);
+	if (poff >= 0) {
+		nhoff += ihl * 4 + poff;
+		if (!pskb_may_pull(skb, nhoff + 4))
+			return false;
+	}
+
+	return true;
+}
+
+/*
+ * Compare flow of two packets
+ *  Returns true only if source and destination address and port match.
+ *          false for special cases
+ */
+static bool choke_match_flow(const struct sk_buff *skb1,
+			     const struct sk_buff *skb2)
+{
+	int off1, off2, poff;
+
+	if (skb1->protocol != skb2->protocol)
+		return false;
+
+	off1 = skb_network_offset(skb1);
+	off2 = skb_network_offset(skb2);
+
+	switch (skb1->protocol) {
+	case __constant_htons(ETH_P_IP): {
+		const struct iphdr *ip1, *ip2;
+
+		ip1 = (struct iphdr *) (skb1->data + off1);
+		ip2 = (struct iphdr *) (skb2->data + off2);
+
+		if (ip1->protocol != ip2->protocol ||
+		    ip1->saddr != ip2->saddr || ip1->daddr != ip2->daddr)
+			return false;
+
+		if (ip1->frag_off & htons(IP_MF | IP_OFFSET) ||
+		    ip2->frag_off & htons(IP_MF | IP_OFFSET))
+			return ip1->id == ip2->id;
+
+		poff = proto_ports_offset(ip1->protocol);
+		if (poff < 0)
+			return true;
+		else {
+			const u32 *ports1, *ports2;
+
+			ports1 = (__force u32 *) (skb1->data + off1
+						  + ip1->ihl * 4 + poff);
+			ports2 = (__force u32 *) (skb2->data + off2
+						  + ip2->ihl * 4 + poff);
+
+			return *ports1 == *ports2;
+		}
+		break;
+	}
+
+	case __constant_htons(ETH_P_IPV6): {
+		const struct ipv6hdr *ip1, *ip2;
+
+		ip1 = (struct ipv6hdr *) (skb1->data + off1);
+		ip2 = (struct ipv6hdr *) (skb2->data + off2);
+
+		return (ip1->nexthdr == ip2->nexthdr &&
+			ipv6_addr_cmp(&ip1->saddr, &ip2->saddr) == 0 &&
+			ipv6_addr_cmp(&ip1->daddr, &ip2->daddr) == 0);
+	}
+
+	default: /* Maybe compare MAC header here? */
+		return false;
+	}
+	/* not reached */
+}
+
+static inline void choke_set_classid(struct sk_buff *skb, u16 classid)
+{
+	*(unsigned int *)(qdisc_skb_cb(skb)->data) = classid;
+}
+
+static u16 choke_get_classid(const struct sk_buff *skb)
+{
+	return *(unsigned int *)(qdisc_skb_cb(skb)->data);
+}
+
+/*
+ * Classify flow using either:
+ *  1. pre-existing classification result in skb
+ *  2. fast internal classification
+ *  3. use TC filter based classification
+ */
+static bool choke_classify(struct sk_buff *skb,
+			   struct Qdisc *sch, int *qerr)
+
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct tcf_result res;
+	int result;
+
+	result = tc_classify(skb, q->filter_list, &res);
+	if (result >= 0) {
+#ifdef CONFIG_NET_CLS_ACT
+		switch (result) {
+		case TC_ACT_STOLEN:
+		case TC_ACT_QUEUED:
+			*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
+		case TC_ACT_SHOT:
+			return false;
+		}
+#endif
+		choke_set_classid(skb, TC_H_MIN(res.classid));
+		return true;
+	}
+
+	return false;
+}
+
+/*
+ * Select a packet at random from the queue
+ * HACK: since the queue can have holes from previous deletions, retry
+ *   several times to find a random skb, then just give up and return the head
+ * Will return NULL if the queue is empty (q->head == q->tail)
+ */
+static struct sk_buff *choke_peek_random(const struct choke_sched_data *q,
+					 unsigned int *pidx)
+{
+	struct sk_buff *skb;
+	int retrys = 3;
+
+	do {
+		*pidx = (q->head + random_N(choke_len(q))) & q->tab_mask;
+		skb = q->tab[*pidx];
+		if (skb)
+			return skb;
+	} while (--retrys > 0);
+
+	return q->tab[*pidx = q->head];
+}
+
+/*
+ * Compare new packet with random packet in queue
+ * returns true if matched and sets *pidx
+ */
+static bool choke_match_random(const struct choke_sched_data *q,
+			       const struct sk_buff *nskb,
+			       unsigned int *pidx)
+{
+	const struct sk_buff *oskb;
+
+	if (q->head == q->tail)
+		return false;
+
+	oskb = choke_peek_random(q, pidx);
+	if (q->filter_list)
+		return choke_get_classid(nskb) == choke_get_classid(oskb);
+
+
+	return choke_match_flow(oskb, nskb);
+}
+
+static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct red_parms *p = &q->parms;
+	int ret = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
+
+	if (q->filter_list) {
+		/* If using external classifiers, get result and record it. */
+		if (!choke_classify(skb, sch, &ret))
+			goto other_drop;	/* Packet was eaten by filter */
+	} else {
+		/* for internal classifier, make header accessible */
+		if (!choke_linearize_header(skb))
+			goto other_drop;
+	}
+
+	/* Compute average queue usage (see RED) */
+	p->qavg = red_calc_qavg(p, sch->q.qlen);
+	if (red_is_idling(p))
+		red_end_of_idle_period(p);
+
+	/* Is queue small? */
+	if (p->qavg <= p->qth_min)
+		p->qcount = -1;
+	else {
+		unsigned int idx;
+
+		/* Draw a packet at random from queue and compare flow */
+		if (choke_match_random(q, skb, &idx)) {
+			q->stats.matched++;
+			choke_drop_by_idx(sch, idx);
+			goto congestion_drop;
+		}
+
+		/* Queue is large, always mark/drop */
+		if (p->qavg > p->qth_max) {
+			p->qcount = -1;
+
+			sch->qstats.overlimits++;
+			if (use_harddrop(q) || !use_ecn(q) ||
+			    !INET_ECN_set_ce(skb)) {
+				q->stats.forced_drop++;
+				goto congestion_drop;
+			}
+
+			q->stats.forced_mark++;
+		} else if (++p->qcount) {
+			if (red_mark_probability(p, p->qavg)) {
+				p->qcount = 0;
+				p->qR = red_random(p);
+
+				sch->qstats.overlimits++;
+				if (!use_ecn(q) || !INET_ECN_set_ce(skb)) {
+					q->stats.prob_drop++;
+					goto congestion_drop;
+				}
+
+				q->stats.prob_mark++;
+			}
+		} else
+			p->qR = red_random(p);
+	}
+
+	/* Admit new packet */
+	if (sch->q.qlen < q->limit) {
+		q->tab[q->tail] = skb;
+		q->tail = (q->tail + 1) & q->tab_mask;
+		++sch->q.qlen;
+		sch->qstats.backlog += qdisc_pkt_len(skb);
+		return NET_XMIT_SUCCESS;
+	}
+
+	q->stats.pdrop++;
+	sch->qstats.drops++;
+	kfree_skb(skb);
+	return NET_XMIT_DROP;
+
+ congestion_drop:
+	qdisc_drop(skb, sch);
+	return NET_XMIT_CN;
+
+ other_drop:
+	if (ret & __NET_XMIT_BYPASS)
+		sch->qstats.drops++;
+	kfree_skb(skb);
+	return ret;
+}
+
+static struct sk_buff *choke_dequeue(struct Qdisc *sch)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct sk_buff *skb;
+
+	if (q->head == q->tail) {
+		if (!red_is_idling(&q->parms))
+			red_start_of_idle_period(&q->parms);
+		return NULL;
+	}
+
+	skb = q->tab[q->head];
+	q->tab[q->head] = NULL;
+	choke_zap_head_holes(q);
+	--sch->q.qlen;
+	sch->qstats.backlog -= qdisc_pkt_len(skb);
+	qdisc_bstats_update(sch, skb);
+
+	return skb;
+}
+
+static unsigned int choke_drop(struct Qdisc *sch)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	unsigned int len;
+
+	len = qdisc_queue_drop(sch);
+	if (len > 0)
+		q->stats.other++;
+	else {
+		if (!red_is_idling(&q->parms))
+			red_start_of_idle_period(&q->parms);
+	}
+
+	return len;
+}
+
+static void choke_reset(struct Qdisc *sch)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+
+	red_restart(&q->parms);
+}
+
+static const struct nla_policy choke_policy[TCA_CHOKE_MAX + 1] = {
+	[TCA_CHOKE_PARMS]	= { .len = sizeof(struct tc_red_qopt) },
+	[TCA_CHOKE_STAB]	= { .len = RED_STAB_SIZE },
+};
+
+
+static void choke_free(void *addr)
+{
+	if (addr) {
+		if (is_vmalloc_addr(addr))
+			vfree(addr);
+		else
+			kfree(addr);
+	}
+}
+
+static int choke_change(struct Qdisc *sch, struct nlattr *opt)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct nlattr *tb[TCA_CHOKE_MAX + 1];
+	const struct tc_red_qopt *ctl;
+	int err;
+	struct sk_buff **old = NULL;
+	unsigned int mask;
+
+	if (opt == NULL)
+		return -EINVAL;
+
+	err = nla_parse_nested(tb, TCA_CHOKE_MAX, opt, choke_policy);
+	if (err < 0)
+		return err;
+
+	if (tb[TCA_CHOKE_PARMS] == NULL ||
+	    tb[TCA_CHOKE_STAB] == NULL)
+		return -EINVAL;
+
+	ctl = nla_data(tb[TCA_CHOKE_PARMS]);
+
+	if (ctl->limit > CHOKE_MAX_QUEUE)
+		return -EINVAL;
+
+	mask = roundup_pow_of_two(ctl->limit + 1) - 1;
+	if (mask != q->tab_mask) {
+		struct sk_buff **ntab;
+
+		ntab = kcalloc(mask + 1, sizeof(struct sk_buff *), GFP_KERNEL);
+		if (!ntab)
+			ntab = vzalloc((mask + 1) * sizeof(struct sk_buff *));
+		if (!ntab)
+			return -ENOMEM;
+
+		sch_tree_lock(sch);
+		old = q->tab;
+		if (old) {
+			unsigned int oqlen = sch->q.qlen, tail = 0;
+
+			while (q->head != q->tail) {
+				struct sk_buff *skb = q->tab[q->head];
+
+				q->head = (q->head + 1) & q->tab_mask;
+				if (!skb)
+					continue;
+				if (tail < mask) {
+					ntab[tail++] = skb;
+					continue;
+				}
+				sch->qstats.backlog -= qdisc_pkt_len(skb);
+				--sch->q.qlen;
+				qdisc_drop(skb, sch);
+			}
+			qdisc_tree_decrease_qlen(sch, oqlen - sch->q.qlen);
+			q->head = 0;
+			q->tail = tail;
+		}
+
+		q->tab_mask = mask;
+		q->tab = ntab;
+	} else
+		sch_tree_lock(sch);
+
+	q->flags = ctl->flags;
+	q->limit = ctl->limit;
+
+	red_set_parms(&q->parms, ctl->qth_min, ctl->qth_max, ctl->Wlog,
+		      ctl->Plog, ctl->Scell_log,
+		      nla_data(tb[TCA_CHOKE_STAB]));
+
+	if (q->head == q->tail)
+		red_end_of_idle_period(&q->parms);
+
+	sch_tree_unlock(sch);
+	choke_free(old);
+	return 0;
+}
+
+static int choke_init(struct Qdisc *sch, struct nlattr *opt)
+{
+	return choke_change(sch, opt);
+}
+
+static int choke_dump(struct Qdisc *sch, struct sk_buff *skb)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct nlattr *opts = NULL;
+	struct tc_red_qopt opt = {
+		.limit		= q->limit,
+		.flags		= q->flags,
+		.qth_min	= q->parms.qth_min >> q->parms.Wlog,
+		.qth_max	= q->parms.qth_max >> q->parms.Wlog,
+		.Wlog		= q->parms.Wlog,
+		.Plog		= q->parms.Plog,
+		.Scell_log	= q->parms.Scell_log,
+	};
+
+	opts = nla_nest_start(skb, TCA_OPTIONS);
+	if (opts == NULL)
+		goto nla_put_failure;
+
+	NLA_PUT(skb, TCA_CHOKE_PARMS, sizeof(opt), &opt);
+	return nla_nest_end(skb, opts);
+
+nla_put_failure:
+	nla_nest_cancel(skb, opts);
+	return -EMSGSIZE;
+}
+
+static int choke_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct tc_choke_xstats st = {
+		.early	= q->stats.prob_drop + q->stats.forced_drop,
+		.marked	= q->stats.prob_mark + q->stats.forced_mark,
+		.pdrop	= q->stats.pdrop,
+		.other	= q->stats.other,
+		.matched = q->stats.matched,
+	};
+
+	return gnet_stats_copy_app(d, &st, sizeof(st));
+}
+
+static void choke_destroy(struct Qdisc *sch)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+
+	tcf_destroy_chain(&q->filter_list);
+	choke_free(q->tab);
+}
+
+static struct Qdisc *choke_leaf(struct Qdisc *sch, unsigned long arg)
+{
+	return NULL;
+}
+
+static unsigned long choke_get(struct Qdisc *sch, u32 classid)
+{
+	return 0;
+}
+
+static void choke_put(struct Qdisc *q, unsigned long cl)
+{
+}
+
+static unsigned long choke_bind(struct Qdisc *sch, unsigned long parent,
+				u32 classid)
+{
+	return 0;
+}
+
+static struct tcf_proto **choke_find_tcf(struct Qdisc *sch, unsigned long cl)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+
+	if (cl)
+		return NULL;
+	return &q->filter_list;
+}
+
+static int choke_dump_class(struct Qdisc *sch, unsigned long cl,
+			  struct sk_buff *skb, struct tcmsg *tcm)
+{
+	tcm->tcm_handle |= TC_H_MIN(cl);
+	return 0;
+}
+
+static void choke_walk(struct Qdisc *sch, struct qdisc_walker *arg)
+{
+	if (!arg->stop) {
+		if (arg->fn(sch, 1, arg) < 0) {
+			arg->stop = 1;
+			return;
+		}
+		arg->count++;
+	}
+}
+
+static const struct Qdisc_class_ops choke_class_ops = {
+	.leaf		=	choke_leaf,
+	.get		=	choke_get,
+	.put		=	choke_put,
+	.tcf_chain	=	choke_find_tcf,
+	.bind_tcf	=	choke_bind,
+	.unbind_tcf	=	choke_put,
+	.dump		=	choke_dump_class,
+	.walk		=	choke_walk,
+};
+
+static struct sk_buff *choke_peek_head(struct Qdisc *sch)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+
+	return (q->head != q->tail) ? q->tab[q->head] : NULL;
+}
+
+static struct Qdisc_ops choke_qdisc_ops __read_mostly = {
+	.id		=	"choke",
+	.priv_size	=	sizeof(struct choke_sched_data),
+
+	.enqueue	=	choke_enqueue,
+	.dequeue	=	choke_dequeue,
+	.peek		=	choke_peek_head,
+	.drop		=	choke_drop,
+	.init		=	choke_init,
+	.destroy	=	choke_destroy,
+	.reset		=	choke_reset,
+	.change		=	choke_change,
+	.dump		=	choke_dump,
+	.dump_stats	=	choke_dump_stats,
+	.owner		=	THIS_MODULE,
+};
+
+static int __init choke_module_init(void)
+{
+	return register_qdisc(&choke_qdisc_ops);
+}
+
+static void __exit choke_module_exit(void)
+{
+	unregister_qdisc(&choke_qdisc_ops);
+}
+
+module_init(choke_module_init)
+module_exit(choke_module_exit)
+
+MODULE_LICENSE("GPL");
--- a/include/linux/pkt_sched.h	2011-01-18 09:27:32.252669307 -0800
+++ b/include/linux/pkt_sched.h	2011-01-18 09:27:45.526274731 -0800
@@ -247,6 +247,35 @@ struct tc_gred_sopt {
 	__u16		pad1;
 };
 
+/* CHOKe section */
+
+enum {
+	TCA_CHOKE_UNSPEC,
+	TCA_CHOKE_PARMS,
+	TCA_CHOKE_STAB,
+	__TCA_CHOKE_MAX,
+};
+
+#define TCA_CHOKE_MAX (__TCA_CHOKE_MAX - 1)
+
+struct tc_choke_qopt {
+	__u32		limit;		/* Hard queue length (packets)	*/
+	__u32		qth_min;	/* Min average threshold (packets) */
+	__u32		qth_max;	/* Max average threshold (packets) */
+	unsigned char   Wlog;		/* log(W)		*/
+	unsigned char   Plog;		/* log(P_max/(qth_max-qth_min))	*/
+	unsigned char   Scell_log;	/* cell size for idle damping */
+	unsigned char	flags;		/* see RED flags */
+};
+
+struct tc_choke_xstats {
+	__u32		early;          /* Early drops */
+	__u32		pdrop;          /* Drops due to queue limits */
+	__u32		other;          /* Drops due to drop() calls */
+	__u32		marked;         /* Marked packets */
+	__u32		matched;	/* Drops due to flow match */
+};
+
 /* HTB section */
 #define TC_HTB_NUMPRIO		8
 #define TC_HTB_MAXDEPTH		8


* Re: [PATCH] CHOKe flow scheduler (0.10)
  2011-01-20 17:38                       ` [PATCH] CHOKe flow scheduler (0.10) Stephen Hemminger
@ 2011-01-20 18:19                         ` Eric Dumazet
  2011-01-20 22:46                           ` Eric Dumazet
  0 siblings, 1 reply; 26+ messages in thread
From: Eric Dumazet @ 2011-01-20 18:19 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Patrick McHardy, David Miller, netdev

On Thursday 20 January 2011 at 09:38 -0800, Stephen Hemminger wrote:

...

Hi Stephen

I am contemplating choke_linearize_header() (yet another hash
function... oh well...) and would say:

Instead of this boolean return, just use skb_get_rxhash(). It performs
what you want, and returns 32 bits of information instead of one bit.

(if the result (rxhash) is zero, it means you can drop the packet because it is
not linearized)

So choke_match_flow() can be faster if rxhash differs (no cache miss on
skb->data on an old skb)
(Do the full check only if rxhash is equal on skb1 / skb2)

Moreover, the skb_get_rxhash() call is _really_ needed only if CHOKe
triggers:
if (p->qavg <= p->qth_min)
	we don't have to compute rxhash at all.
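Something like this at the top of choke_match_flow() (rough sketch,
counting on skb_get_rxhash() caching its result in skb->rxhash):

	u32 hash1 = skb_get_rxhash(skb1);

	if (!hash1 || hash1 != skb_get_rxhash(skb2))
		return false;
	/* hashes match: full header compare only to rule out collisions */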

> +/* Make sure network and transport headers can be dereferenced */
> +static bool choke_linearize_header(struct sk_buff *skb)
> +{
> +	int nhoff, poff;
> +	struct ipv6hdr *ip6;
> +	struct iphdr *ip;
> +	u8 ip_proto;
> +	u32 ihl;
> +
> +	nhoff = skb_network_offset(skb);
> +
> +	switch (skb->protocol) {
> +	case __constant_htons(ETH_P_IP):
> +		if (!pskb_may_pull(skb, sizeof(*ip) + nhoff))
> +			return false;
> +
> +		ip = (struct iphdr *) (skb->data + nhoff);
> +		if (ip->frag_off & htons(IP_MF | IP_OFFSET))
> +			return false;
> +
> +		ip_proto = ip->protocol;
> +		ihl = ip->ihl;
> +		break;
> +	case __constant_htons(ETH_P_IPV6):
> +		if (!pskb_may_pull(skb, sizeof(*ip6) + nhoff))
> +			return false;
> +
> +		ip6 = (struct ipv6hdr *) (skb->data + nhoff);
> +		ip_proto = ip6->nexthdr;
> +		ihl = (40 >> 2);
> +		break;
> +	default:
> +		return true;
> +	}
> +
> +	poff = proto_ports_offset(ip_proto);
> +	if (poff >= 0) {
> +		nhoff += ihl * 4 + poff;
> +		if (!pskb_may_pull(skb, nhoff + 4))
> +			return false;
> +	}
> +
> +	return true;
> +}
> +


> +/*
> + * Compare flow of two packets
> + *  Returns true only if source and destination address and port match.
> + *          false for special cases
> + */
> +static bool choke_match_flow(const struct sk_buff *skb1,
> +			     const struct sk_buff *skb2)
> +{
> +	int off1, off2, poff;
> +
> +	if (skb1->protocol != skb2->protocol)
> +		return false;
> +
> +	off1 = skb_network_offset(skb1);
> +	off2 = skb_network_offset(skb2);
> +
> +	switch (skb1->protocol) {
> +	case __constant_htons(ETH_P_IP): {
> +		const struct iphdr *ip1, *ip2;
> +
> +		ip1 = (struct iphdr *) (skb1->data + off1);
> +		ip2 = (struct iphdr *) (skb2->data + off2);
> +
> +		if (ip1->protocol != ip2->protocol ||
> +		    ip1->saddr != ip2->saddr || ip1->daddr != ip2->daddr)
> +			return false;
> +
> +		if (ip1->frag_off & htons(IP_MF | IP_OFFSET) ||
> +		    ip2->frag_off & htons(IP_MF | IP_OFFSET))
> +			return ip1->id == ip2->id;
> +
> +		poff = proto_ports_offset(ip1->protocol);
> +		if (poff < 0)
> +			return true;
> +		else {
> +			const u32 *ports1, *ports2;
> +
> +			ports1 = (__force u32 *) (skb1->data + off1
> +						  + ip1->ihl * 4 + poff);
> +			ports2 = (__force u32 *) (skb2->data + off2
> +						  + ip2->ihl * 4 + poff);
> +
> +			return *ports1 == *ports2;
> +		}
> +		break;
> +	}
> +
> +	case __constant_htons(ETH_P_IPV6): {
> +		const struct ipv6hdr *ip1, *ip2;
> +
> +		ip1 = (struct ipv6hdr *) (skb1->data + off1);
> +		ip2 = (struct ipv6hdr *) (skb2->data + off2);
> +
> +		return (ip1->nexthdr == ip2->nexthdr &&
> +			ipv6_addr_cmp(&ip1->saddr, &ip2->saddr) == 0 &&
> +			ipv6_addr_cmp(&ip1->daddr, &ip2->daddr) == 0);

Hmm, we also need to check ports, you might just move the check from
case __constant_htons(ETH_P_IP) after the switch(skb1->protocol)


> +	}
> +
> +	default: /* Maybe compare MAC header here? */
> +		return false;
> +	}
> +	/* not reached */
> +}
> +

If you want, I can send a diff on your v0.10, just tell me!

Thanks!




* Re: [PATCH] CHOKe flow scheduler (0.10)
  2011-01-20 18:19                         ` Eric Dumazet
@ 2011-01-20 22:46                           ` Eric Dumazet
  2011-02-03  1:21                             ` [PATCH net-next] CHOKe flow scheduler (0.11) Stephen Hemminger
  0 siblings, 1 reply; 26+ messages in thread
From: Eric Dumazet @ 2011-01-20 22:46 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Patrick McHardy, David Miller, netdev

On Thursday 20 January 2011 at 19:19 +0100, Eric Dumazet wrote:
> On Thursday 20 January 2011 at 09:38 -0800, Stephen Hemminger wrote:
> 
> ...
> 
> Hi Stephen
> 
> I am contemplating choke_linearize_header() (yet another hash
> function... oh well...) and would say:
> 
> Instead of this boolean return, just use skb_get_rxhash(). It performs
> what you want, and returns 32 bits of information instead of one bit.
> 
> (if the result (rxhash) is zero, it means you can drop the packet because it is
> not linearized)
> 
> So choke_match_flow() can be faster if rxhash differs (no cache miss on
> skb->data on an old skb)
> (Do the full check only if rxhash is equal on skb1 / skb2)
> 
> Moreover, the skb_get_rxhash() call is _really_ needed only if CHOKe
> triggers:
> if (p->qavg <= p->qth_min)
> 	we don't have to compute rxhash at all.
> 

> If you want, I can send a diff on your v0.10, just tell me!

Here is my diff against v 0.10

Seems to work well here

qdisc choke 11: parent 1:11 limit 130000b min 10833b max 32500b ewma 13 Plog 21 Scell_log 30
 Sent 459876256 bytes 836183 pkt (dropped 318428, overlimits 28 requeues 0) 
 rate 51823Kbit 11779pps backlog 5926323b 10781p requeues 0 
  marked 0 early 28 pdrop 0 other 0 matched 159200

Thanks!


 net/sched/sch_choke.c |  114 ++++++++++++----------------------------
 1 file changed, 37 insertions(+), 77 deletions(-)

diff --git a/net/sched/sch_choke.c b/net/sched/sch_choke.c
index e1ac560..0e898ed 100644
--- a/net/sched/sch_choke.c
+++ b/net/sched/sch_choke.c
@@ -141,63 +141,24 @@ static void choke_drop_by_idx(struct Qdisc *sch, unsigned int idx)
 	--sch->q.qlen;
 }
 
-/* Make sure network and transport headers can be dereferenced */
-static bool choke_linearize_header(struct sk_buff *skb)
-{
-	int nhoff, poff;
-	struct ipv6hdr *ip6;
-	struct iphdr *ip;
-	u8 ip_proto;
-	u32 ihl;
-
-	nhoff = skb_network_offset(skb);
-
-	switch (skb->protocol) {
-	case __constant_htons(ETH_P_IP):
-		if (!pskb_may_pull(skb, sizeof(*ip) + nhoff))
-			return false;
-
-		ip = (struct iphdr *) (skb->data + nhoff);
-		if (ip->frag_off & htons(IP_MF | IP_OFFSET))
-			return false;
-
-		ip_proto = ip->protocol;
-		ihl = ip->ihl;
-		break;
-	case __constant_htons(ETH_P_IPV6):
-		if (!pskb_may_pull(skb, sizeof(*ip6) + nhoff))
-			return false;
-
-		ip6 = (struct ipv6hdr *) (skb->data + nhoff);
-		ip_proto = ip6->nexthdr;
-		ihl = (40 >> 2);
-		break;
-	default:
-		return true;
-	}
-
-	poff = proto_ports_offset(ip_proto);
-	if (poff >= 0) {
-		nhoff += ihl * 4 + poff;
-		if (!pskb_may_pull(skb, nhoff + 4))
-			return false;
-	}
-
-	return true;
-}
-
 /*
  * Compare flow of two packets
  *  Returns true only if source and destination address and port match.
  *          false for special cases
  */
-static bool choke_match_flow(const struct sk_buff *skb1,
-			     const struct sk_buff *skb2)
+static bool choke_match_flow(struct sk_buff *skb1,
+			     struct sk_buff *skb2)
 {
 	int off1, off2, poff;
+	const u32 *ports1, *ports2;
+	u8 ip_proto;
 
 	if (skb1->protocol != skb2->protocol)
 		return false;
+	if (skb_get_rxhash(skb1) != skb_get_rxhash(skb2))
+		return false;
+	if (!skb_get_rxhash(skb1))
+		return true;
 
 	off1 = skb_network_offset(skb1);
 	off2 = skb_network_offset(skb2);
@@ -209,27 +170,14 @@ static bool choke_match_flow(const struct sk_buff *skb1,
 		ip1 = (struct iphdr *) (skb1->data + off1);
 		ip2 = (struct iphdr *) (skb2->data + off2);
 
-		if (ip1->protocol != ip2->protocol ||
+		ip_proto = ip1->protocol;
+		if (ip_proto != ip2->protocol ||
 		    ip1->saddr != ip2->saddr || ip1->daddr != ip2->daddr)
 			return false;
-
-		if (ip1->frag_off & htons(IP_MF | IP_OFFSET) ||
-		    ip2->frag_off & htons(IP_MF | IP_OFFSET))
-			return ip1->id == ip2->id;
-
-		poff = proto_ports_offset(ip1->protocol);
-		if (poff < 0)
-			return true;
-		else {
-			const u32 *ports1, *ports2;
-
-			ports1 = (__force u32 *) (skb1->data + off1
-						  + ip1->ihl * 4 + poff);
-			ports2 = (__force u32 *) (skb2->data + off2
-						  + ip2->ihl * 4 + poff);
-
-			return *ports1 == *ports2;
-		}
+		if ((ip1->frag_off | ip2->frag_off) & htons(IP_MF | IP_OFFSET))
+			ip_proto = 0;
+		off1 += ip1->ihl * 4;
+		off2 += ip2->ihl * 4;
 		break;
 	}
 
@@ -239,15 +187,32 @@ static bool choke_match_flow(const struct sk_buff *skb1,
 		ip1 = (struct ipv6hdr *) (skb1->data + off1);
 		ip2 = (struct ipv6hdr *) (skb2->data + off2);
 
-		return (ip1->nexthdr == ip2->nexthdr &&
-			ipv6_addr_cmp(&ip1->saddr, &ip2->saddr) == 0 &&
-			ipv6_addr_cmp(&ip1->daddr, &ip2->daddr) == 0);
+		ip_proto = ip1->nexthdr;
+		if (ip_proto != ip2->nexthdr ||
+		    ipv6_addr_cmp(&ip1->saddr, &ip2->saddr) ||
+		    ipv6_addr_cmp(&ip1->daddr, &ip2->daddr))
+			return false;
+		off1 += 40;
+		off2 += 40;
 	}
 
 	default: /* Maybe compare MAC header here? */
 		return false;
 	}
-	/* not reached */
+	poff = proto_ports_offset(ip_proto);
+	if (poff < 0)
+		return true;
+	off1 += poff;
+	off2 += poff;
+	if (!pskb_may_pull(skb1, off1 + 4))
+		return false;
+
+	if (!pskb_may_pull(skb2, off2 + 4))
+		return false;
+
+	ports1 = (__force u32 *)(skb1->data + off1);
+	ports2 = (__force u32 *)(skb2->data + off2);
+	return *ports1 == *ports2;
 }
 
 static inline void choke_set_classid(struct sk_buff *skb, u16 classid)
@@ -319,10 +284,10 @@ static struct sk_buff *choke_peek_random(const struct choke_sched_data *q,
  * returns true if matched and sets *pidx
  */
 static bool choke_match_random(const struct choke_sched_data *q,
-			       const struct sk_buff *nskb,
+			       struct sk_buff *nskb,
 			       unsigned int *pidx)
 {
-	const struct sk_buff *oskb;
+	struct sk_buff *oskb;
 
 	if (q->head == q->tail)
 		return false;
@@ -331,7 +296,6 @@ static bool choke_match_random(const struct choke_sched_data *q,
 	if (q->filter_list)
 		return choke_get_classid(nskb) == choke_get_classid(oskb);
 
-
 	return choke_match_flow(oskb, nskb);
 }
 
@@ -345,10 +309,6 @@ static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		/* If using external classifiers, get result and record it. */
 		if (!choke_classify(skb, sch, &ret))
 			goto other_drop;	/* Packet was eaten by filter */
-	} else {
-		/* for internal classifier, make header accessible */
-		if (!choke_linearize_header(skb))
-			goto other_drop;
 	}
 
 	/* Compute average queue usage (see RED) */




* [PATCH net-next] CHOKe flow scheduler (0.11)
  2011-01-20 22:46                           ` Eric Dumazet
@ 2011-02-03  1:21                             ` Stephen Hemminger
  2011-02-03  1:59                               ` Eric Dumazet
  0 siblings, 1 reply; 26+ messages in thread
From: Stephen Hemminger @ 2011-02-03  1:21 UTC (permalink / raw)
  To: Eric Dumazet, David Miller; +Cc: Patrick McHardy, netdev

Subject: sched: CHOKe flow scheduler

CHOKe ("CHOose and Kill" or "CHOose and Keep") is an alternative
packet scheduler based on the Random Early Detection (RED) algorithm.

The core idea is:
  For every packet arrival:
  	Calculate Qave
	if (Qave < minth) 
	     Queue the new packet
	else 
	     Select randomly a packet from the queue 
	     if (both packets from same flow)
	     then Drop both the packets
	     else if (Qave > maxth)
	          Drop packet
	     else
	       	  Admit packet with probability p (same as RED)

See also:
  Rong Pan, Balaji Prabhakar, Konstantinos Psounis, "CHOKe: a stateless active
   queue management scheme for approximating fair bandwidth allocation", 
  Proceedings of INFOCOM 2000, March 2000.

Help from:
     Eric Dumazet <eric.dumazet@gmail.com>
     Patrick McHardy <kaber@trash.net>

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

---
0.11 - incorporates Eric's change to use rxhash

 include/linux/pkt_sched.h |   29 +
 net/sched/Kconfig         |   11 
 net/sched/Makefile        |    2 
 net/sched/sch_choke.c     |  676 ++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 718 insertions(+)

--- a/net/sched/Kconfig	2011-01-31 09:01:35.000000000 -0800
+++ b/net/sched/Kconfig	2011-02-02 17:00:36.798764819 -0800
@@ -217,6 +217,17 @@ config NET_SCH_MQPRIO
 
 	  If unsure, say N.
 
+config NET_SCH_CHOKE
+	tristate "CHOose and Keep responsive flow scheduler (CHOKE)"
+	help
+	  Say Y here if you want to use the CHOKe packet scheduler (CHOose
+	  and Keep for responsive flows, CHOose and Kill for unresponsive
+	  flows). This is a variation of RED which tries to penalize flows
+	  that monopolize the queue.
+
+	  To compile this code as a module, choose M here: the
+	  module will be called sch_choke.
+
 config NET_SCH_INGRESS
 	tristate "Ingress Qdisc"
 	depends on NET_CLS_ACT
--- a/net/sched/Makefile	2011-01-31 09:01:35.000000000 -0800
+++ b/net/sched/Makefile	2011-02-02 17:01:00.987025820 -0800
@@ -33,6 +33,8 @@ obj-$(CONFIG_NET_SCH_ATM)	+= sch_atm.o
 obj-$(CONFIG_NET_SCH_NETEM)	+= sch_netem.o
 obj-$(CONFIG_NET_SCH_DRR)	+= sch_drr.o
 obj-$(CONFIG_NET_SCH_MQPRIO)	+= sch_mqprio.o
+obj-$(CONFIG_NET_SCH_CHOKE)	+= sch_choke.o
+
 obj-$(CONFIG_NET_CLS_U32)	+= cls_u32.o
 obj-$(CONFIG_NET_CLS_ROUTE4)	+= cls_route.o
 obj-$(CONFIG_NET_CLS_FW)	+= cls_fw.o
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ b/net/sched/sch_choke.c	2011-02-02 17:08:57.208163848 -0800
@@ -0,0 +1,676 @@
+/*
+ * net/sched/sch_choke.c	CHOKE scheduler
+ *
+ * Copyright (c) 2011 Stephen Hemminger <shemminger@vyatta.com>
+ * Copyright (c) 2011 Eric Dumazet <eric.dumazet@gmail.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * version 2 as published by the Free Software Foundation.
+ *
+ */
+
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/skbuff.h>
+#include <linux/reciprocal_div.h>
+#include <net/pkt_sched.h>
+#include <net/inet_ecn.h>
+#include <net/red.h>
+#include <linux/ip.h>
+#include <net/ip.h>
+#include <linux/ipv6.h>
+#include <net/ipv6.h>
+
+/*
+   CHOKe stateless AQM for fair bandwidth allocation
+   =================================================
+
+   CHOKe (CHOose and Keep for responsive flows, CHOose and Kill for
+   unresponsive flows) is a variant of RED that penalizes misbehaving flows but
+   maintains no flow state. The difference from RED is an additional step
+   during the enqueuing process. If average queue size is over the
+   low threshold (qmin), a packet is chosen at random from the queue.
+   If both the new and chosen packet are from the same flow, both
+   are dropped. Unlike RED, CHOKe is not really a "classful" qdisc because it
+   needs to access packets in queue randomly. It has a minimal class
+   interface to allow overriding the builtin flow classifier with
+   filters.
+
+   Source:
+   R. Pan, B. Prabhakar, and K. Psounis, "CHOKe, A Stateless
+   Active Queue Management Scheme for Approximating Fair Bandwidth Allocation",
+   IEEE INFOCOM, 2000.
+
+   A. Tang, J. Wang, S. Low, "Understanding CHOKe: Throughput and Spatial
+   Characteristics", IEEE/ACM Transactions on Networking, 2004
+
+ */
+
+/* Upper bound on size of sk_buff table (packets) */
+#define CHOKE_MAX_QUEUE	(128*1024 - 1)
+
+struct choke_sched_data {
+/* Parameters */
+	u32		 limit;
+	unsigned char	 flags;
+
+	struct red_parms parms;
+
+/* Variables */
+	struct tcf_proto *filter_list;
+	struct {
+		u32	prob_drop;	/* Early probability drops */
+		u32	prob_mark;	/* Early probability marks */
+		u32	forced_drop;	/* Forced drops, qavg > max_thresh */
+		u32	forced_mark;	/* Forced marks, qavg > max_thresh */
+		u32	pdrop;          /* Drops due to queue limits */
+		u32	other;          /* Drops due to drop() calls */
+		u32	matched;	/* Drops due to flow match */
+	} stats;
+
+	unsigned int	 head;
+	unsigned int	 tail;
+
+	unsigned int	 tab_mask; /* size - 1 */
+
+	struct sk_buff **tab;
+};
+
+/* deliver a random number between 0 and N - 1 */
+static u32 random_N(unsigned int N)
+{
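+	/* reciprocal_divide(x, N) computes ((u64)x * N) >> 32, scaling the
+	 * uniform 32-bit value from random32() into the range [0, N).
+	 */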
+	return reciprocal_divide(random32(), N);
+}
+
+/* number of elements in queue including holes */
+static unsigned int choke_len(const struct choke_sched_data *q)
+{
+	return (q->tail - q->head) & q->tab_mask;
+}
+
+/* Is ECN parameter configured */
+static int use_ecn(const struct choke_sched_data *q)
+{
+	return q->flags & TC_RED_ECN;
+}
+
+/* Should packets over max just be dropped (versus marked) */
+static int use_harddrop(const struct choke_sched_data *q)
+{
+	return q->flags & TC_RED_HARDDROP;
+}
+
+/* Move head pointer forward to skip over holes */
+static void choke_zap_head_holes(struct choke_sched_data *q)
+{
+	do {
+		q->head = (q->head + 1) & q->tab_mask;
+		if (q->head == q->tail)
+			break;
+	} while (q->tab[q->head] == NULL);
+}
+
+/* Move tail pointer backwards to reuse holes */
+static void choke_zap_tail_holes(struct choke_sched_data *q)
+{
+	do {
+		q->tail = (q->tail - 1) & q->tab_mask;
+		if (q->head == q->tail)
+			break;
+	} while (q->tab[q->tail] == NULL);
+}
+
+/* Drop packet from queue array by creating a "hole" */
+static void choke_drop_by_idx(struct Qdisc *sch, unsigned int idx)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct sk_buff *skb = q->tab[idx];
+
+	q->tab[idx] = NULL;
+
+	if (idx == q->head)
+		choke_zap_head_holes(q);
+	if (idx == q->tail)
+		choke_zap_tail_holes(q);
+
+	sch->qstats.backlog -= qdisc_pkt_len(skb);
+	qdisc_drop(skb, sch);
+	qdisc_tree_decrease_qlen(sch, 1);
+	--sch->q.qlen;
+}
+
+/*
+ * Compare the flows of two packets.
+ *  Returns true only if source and destination addresses and ports match;
+ *          false for all the special cases.
+ */
+static bool choke_match_flow(struct sk_buff *skb1,
+			     struct sk_buff *skb2)
+{
+	int off1, off2, poff;
+	const u32 *ports1, *ports2;
+	u8 ip_proto;
+	__u32 hash1;
+
+	if (skb1->protocol != skb2->protocol)
+		return false;
+
+	/* Use hash value as quick check
+	 * Assumes that __skb_get_rxhash makes IP header and ports linear
+	 */
+	hash1 = skb_get_rxhash(skb1);
+	if (!hash1 || hash1 != skb_get_rxhash(skb2))
+		return false;
+
+	/* Probably match, but be sure to avoid hash collisions */
+	off1 = skb_network_offset(skb1);
+	off2 = skb_network_offset(skb2);
+
+	switch (skb1->protocol) {
+	case __constant_htons(ETH_P_IP): {
+		const struct iphdr *ip1, *ip2;
+
+		ip1 = (const struct iphdr *) (skb1->data + off1);
+		ip2 = (const struct iphdr *) (skb2->data + off2);
+
+		ip_proto = ip1->protocol;
+		if (ip_proto != ip2->protocol ||
+		    ip1->saddr != ip2->saddr || ip1->daddr != ip2->daddr)
+			return false;
+
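+		/* Fragments may not carry the L4 header; clear ip_proto so
+		 * the port comparison below is skipped and only addresses
+		 * are compared.
+		 */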
+		if ((ip1->frag_off | ip2->frag_off) & htons(IP_MF | IP_OFFSET))
+			ip_proto = 0;
+		off1 += ip1->ihl * 4;
+		off2 += ip2->ihl * 4;
+		break;
+	}
+
+	case __constant_htons(ETH_P_IPV6): {
+		const struct ipv6hdr *ip1, *ip2;
+
+		ip1 = (const struct ipv6hdr *) (skb1->data + off1);
+		ip2 = (const struct ipv6hdr *) (skb2->data + off2);
+
+		ip_proto = ip1->nexthdr;
+		if (ip_proto != ip2->nexthdr ||
+		    ipv6_addr_cmp(&ip1->saddr, &ip2->saddr) ||
+		    ipv6_addr_cmp(&ip1->daddr, &ip2->daddr))
+			return false;
+		off1 += 40;
+		off2 += 40;
+		break;
+	}
+
+	default: /* Maybe compare MAC header here? */
+		return false;
+	}
+
+	poff = proto_ports_offset(ip_proto);
+	if (poff < 0)
+		return true;
+
+	off1 += poff;
+	off2 += poff;
+
+	ports1 = (__force u32 *)(skb1->data + off1);
+	ports2 = (__force u32 *)(skb2->data + off2);
+	return *ports1 == *ports2;
+}
+
+static inline void choke_set_classid(struct sk_buff *skb, u16 classid)
+{
+	*(unsigned int *)(qdisc_skb_cb(skb)->data) = classid;
+}
+
+static u16 choke_get_classid(const struct sk_buff *skb)
+{
+	return *(unsigned int *)(qdisc_skb_cb(skb)->data);
+}
+
+/*
+ * Classify flow using either:
+ *  1. pre-existing classification result in skb
+ *  2. fast internal classification
+ *  3. TC filter based classification
+ */
+static bool choke_classify(struct sk_buff *skb,
+			   struct Qdisc *sch, int *qerr)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct tcf_result res;
+	int result;
+
+	result = tc_classify(skb, q->filter_list, &res);
+	if (result >= 0) {
+#ifdef CONFIG_NET_CLS_ACT
+		switch (result) {
+		case TC_ACT_STOLEN:
+		case TC_ACT_QUEUED:
+			*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
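+			/* fall through */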
+		case TC_ACT_SHOT:
+			return false;
+		}
+#endif
+		choke_set_classid(skb, TC_H_MIN(res.classid));
+		return true;
+	}
+
+	return false;
+}
+
+/*
+ * Select a packet at random from the queue.
+ * HACK: since the queue can have holes from previous deletions, retry a few
+ *   times to find a random skb; if that fails, give up and return the head.
+ * Returns NULL if the queue is empty (q->head == q->tail).
+ */
+static struct sk_buff *choke_peek_random(const struct choke_sched_data *q,
+					 unsigned int *pidx)
+{
+	struct sk_buff *skb;
+	int retrys = 3;
+
+	do {
+		*pidx = (q->head + random_N(choke_len(q))) & q->tab_mask;
+		skb = q->tab[*pidx];
+		if (skb)
+			return skb;
+	} while (--retrys > 0);
+
+	return q->tab[*pidx = q->head];
+}
+
+/*
+ * Compare new packet with random packet in queue
+ * returns true if matched and sets *pidx
+ */
+static bool choke_match_random(const struct choke_sched_data *q,
+			       struct sk_buff *nskb,
+			       unsigned int *pidx)
+{
+	struct sk_buff *oskb;
+
+	if (q->head == q->tail)
+		return false;
+
+	oskb = choke_peek_random(q, pidx);
+	if (q->filter_list)
+		return choke_get_classid(nskb) == choke_get_classid(oskb);
+
+	return choke_match_flow(oskb, nskb);
+}
+
+static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct red_parms *p = &q->parms;
+	int ret = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
+
+	if (q->filter_list) {
+		/* If using external classifiers, get result and record it. */
+		if (!choke_classify(skb, sch, &ret))
+			goto other_drop;	/* Packet was eaten by filter */
+	}
+
+	/* Compute average queue usage (see RED) */
+	p->qavg = red_calc_qavg(p, sch->q.qlen);
+	if (red_is_idling(p))
+		red_end_of_idle_period(p);
+
+	/* Is queue small? */
+	if (p->qavg <= p->qth_min)
+		p->qcount = -1;
+	else {
+		unsigned int idx;
+
+		/* Draw a packet at random from queue and compare flow */
+		if (choke_match_random(q, skb, &idx)) {
+			q->stats.matched++;
+			choke_drop_by_idx(sch, idx);
+			goto congestion_drop;
+		}
+
+		/* Queue is large, always mark/drop */
+		if (p->qavg > p->qth_max) {
+			p->qcount = -1;
+
+			sch->qstats.overlimits++;
+			if (use_harddrop(q) || !use_ecn(q) ||
+			    !INET_ECN_set_ce(skb)) {
+				q->stats.forced_drop++;
+				goto congestion_drop;
+			}
+
+			q->stats.forced_mark++;
+		} else if (++p->qcount) {
+			if (red_mark_probability(p, p->qavg)) {
+				p->qcount = 0;
+				p->qR = red_random(p);
+
+				sch->qstats.overlimits++;
+				if (!use_ecn(q) || !INET_ECN_set_ce(skb)) {
+					q->stats.prob_drop++;
+					goto congestion_drop;
+				}
+
+				q->stats.prob_mark++;
+			}
+		} else
+			p->qR = red_random(p);
+	}
+
+	/* Admit new packet */
+	if (sch->q.qlen < q->limit) {
+		q->tab[q->tail] = skb;
+		q->tail = (q->tail + 1) & q->tab_mask;
+		++sch->q.qlen;
+		sch->qstats.backlog += qdisc_pkt_len(skb);
+		return NET_XMIT_SUCCESS;
+	}
+
+	q->stats.pdrop++;
+	sch->qstats.drops++;
+	kfree_skb(skb);
+	return NET_XMIT_DROP;
+
+ congestion_drop:
+	qdisc_drop(skb, sch);
+	return NET_XMIT_CN;
+
+ other_drop:
+	if (ret & __NET_XMIT_BYPASS)
+		sch->qstats.drops++;
+	kfree_skb(skb);
+	return ret;
+}
+
+static struct sk_buff *choke_dequeue(struct Qdisc *sch)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct sk_buff *skb;
+
+	if (q->head == q->tail) {
+		if (!red_is_idling(&q->parms))
+			red_start_of_idle_period(&q->parms);
+		return NULL;
+	}
+
+	skb = q->tab[q->head];
+	q->tab[q->head] = NULL;
+	choke_zap_head_holes(q);
+	--sch->q.qlen;
+	sch->qstats.backlog -= qdisc_pkt_len(skb);
+	qdisc_bstats_update(sch, skb);
+
+	return skb;
+}
+
+static unsigned int choke_drop(struct Qdisc *sch)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	unsigned int len;
+
+	len = qdisc_queue_drop(sch);
+	if (len > 0)
+		q->stats.other++;
+	else {
+		if (!red_is_idling(&q->parms))
+			red_start_of_idle_period(&q->parms);
+	}
+
+	return len;
+}
+
+static void choke_reset(struct Qdisc *sch)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+
+	red_restart(&q->parms);
+}
+
+static const struct nla_policy choke_policy[TCA_CHOKE_MAX + 1] = {
+	[TCA_CHOKE_PARMS]	= { .len = sizeof(struct tc_red_qopt) },
+	[TCA_CHOKE_STAB]	= { .len = RED_STAB_SIZE },
+};
+
+static void choke_free(void *addr)
+{
+	if (addr) {
+		if (is_vmalloc_addr(addr))
+			vfree(addr);
+		else
+			kfree(addr);
+	}
+}
+
+static int choke_change(struct Qdisc *sch, struct nlattr *opt)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct nlattr *tb[TCA_CHOKE_MAX + 1];
+	const struct tc_red_qopt *ctl;
+	int err;
+	struct sk_buff **old = NULL;
+	unsigned int mask;
+
+	if (opt == NULL)
+		return -EINVAL;
+
+	err = nla_parse_nested(tb, TCA_CHOKE_MAX, opt, choke_policy);
+	if (err < 0)
+		return err;
+
+	if (tb[TCA_CHOKE_PARMS] == NULL ||
+	    tb[TCA_CHOKE_STAB] == NULL)
+		return -EINVAL;
+
+	ctl = nla_data(tb[TCA_CHOKE_PARMS]);
+
+	if (ctl->limit > CHOKE_MAX_QUEUE)
+		return -EINVAL;
+
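+	/* Table needs at least limit + 1 slots, rounded up to a power of
+	 * two so head/tail indexes can be wrapped with a simple mask.
+	 */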
+	mask = roundup_pow_of_two(ctl->limit + 1) - 1;
+	if (mask != q->tab_mask) {
+		struct sk_buff **ntab;
+
+		ntab = kcalloc(mask + 1, sizeof(struct sk_buff *), GFP_KERNEL);
+		if (!ntab)
+			ntab = vzalloc((mask + 1) * sizeof(struct sk_buff *));
+		if (!ntab)
+			return -ENOMEM;
+
+		sch_tree_lock(sch);
+		old = q->tab;
+		if (old) {
+			unsigned int oqlen = sch->q.qlen, tail = 0;
+
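+			/* Repack the old table into the new one, skipping
+			 * holes; anything that does not fit is dropped.
+			 */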
+			while (q->head != q->tail) {
+				struct sk_buff *skb = q->tab[q->head];
+
+				q->head = (q->head + 1) & q->tab_mask;
+				if (!skb)
+					continue;
+				if (tail < mask) {
+					ntab[tail++] = skb;
+					continue;
+				}
+				sch->qstats.backlog -= qdisc_pkt_len(skb);
+				--sch->q.qlen;
+				qdisc_drop(skb, sch);
+			}
+			qdisc_tree_decrease_qlen(sch, oqlen - sch->q.qlen);
+			q->head = 0;
+			q->tail = tail;
+		}
+
+		q->tab_mask = mask;
+		q->tab = ntab;
+	} else
+		sch_tree_lock(sch);
+
+	q->flags = ctl->flags;
+	q->limit = ctl->limit;
+
+	red_set_parms(&q->parms, ctl->qth_min, ctl->qth_max, ctl->Wlog,
+		      ctl->Plog, ctl->Scell_log,
+		      nla_data(tb[TCA_CHOKE_STAB]));
+
+	if (q->head == q->tail)
+		red_end_of_idle_period(&q->parms);
+
+	sch_tree_unlock(sch);
+	choke_free(old);
+	return 0;
+}
+
+static int choke_init(struct Qdisc *sch, struct nlattr *opt)
+{
+	return choke_change(sch, opt);
+}
+
+static int choke_dump(struct Qdisc *sch, struct sk_buff *skb)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct nlattr *opts = NULL;
+	struct tc_red_qopt opt = {
+		.limit		= q->limit,
+		.flags		= q->flags,
+		.qth_min	= q->parms.qth_min >> q->parms.Wlog,
+		.qth_max	= q->parms.qth_max >> q->parms.Wlog,
+		.Wlog		= q->parms.Wlog,
+		.Plog		= q->parms.Plog,
+		.Scell_log	= q->parms.Scell_log,
+	};
+
+	opts = nla_nest_start(skb, TCA_OPTIONS);
+	if (opts == NULL)
+		goto nla_put_failure;
+
+	NLA_PUT(skb, TCA_CHOKE_PARMS, sizeof(opt), &opt);
+	return nla_nest_end(skb, opts);
+
+nla_put_failure:
+	nla_nest_cancel(skb, opts);
+	return -EMSGSIZE;
+}
+
+static int choke_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct tc_choke_xstats st = {
+		.early	= q->stats.prob_drop + q->stats.forced_drop,
+		.marked	= q->stats.prob_mark + q->stats.forced_mark,
+		.pdrop	= q->stats.pdrop,
+		.other	= q->stats.other,
+		.matched = q->stats.matched,
+	};
+
+	return gnet_stats_copy_app(d, &st, sizeof(st));
+}
+
+static void choke_destroy(struct Qdisc *sch)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+
+	tcf_destroy_chain(&q->filter_list);
+	choke_free(q->tab);
+}
+
+static struct Qdisc *choke_leaf(struct Qdisc *sch, unsigned long arg)
+{
+	return NULL;
+}
+
+static unsigned long choke_get(struct Qdisc *sch, u32 classid)
+{
+	return 0;
+}
+
+static void choke_put(struct Qdisc *q, unsigned long cl)
+{
+}
+
+static unsigned long choke_bind(struct Qdisc *sch, unsigned long parent,
+				u32 classid)
+{
+	return 0;
+}
+
+static struct tcf_proto **choke_find_tcf(struct Qdisc *sch, unsigned long cl)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+
+	if (cl)
+		return NULL;
+	return &q->filter_list;
+}
+
+static int choke_dump_class(struct Qdisc *sch, unsigned long cl,
+			  struct sk_buff *skb, struct tcmsg *tcm)
+{
+	tcm->tcm_handle |= TC_H_MIN(cl);
+	return 0;
+}
+
+static void choke_walk(struct Qdisc *sch, struct qdisc_walker *arg)
+{
+	if (!arg->stop) {
+		if (arg->fn(sch, 1, arg) < 0) {
+			arg->stop = 1;
+			return;
+		}
+		arg->count++;
+	}
+}
+
+static const struct Qdisc_class_ops choke_class_ops = {
+	.leaf		=	choke_leaf,
+	.get		=	choke_get,
+	.put		=	choke_put,
+	.tcf_chain	=	choke_find_tcf,
+	.bind_tcf	=	choke_bind,
+	.unbind_tcf	=	choke_put,
+	.dump		=	choke_dump_class,
+	.walk		=	choke_walk,
+};
+
+static struct sk_buff *choke_peek_head(struct Qdisc *sch)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+
+	return (q->head != q->tail) ? q->tab[q->head] : NULL;
+}
+
+static struct Qdisc_ops choke_qdisc_ops __read_mostly = {
+	.id		=	"choke",
+	.priv_size	=	sizeof(struct choke_sched_data),
+
+	.enqueue	=	choke_enqueue,
+	.dequeue	=	choke_dequeue,
+	.peek		=	choke_peek_head,
+	.drop		=	choke_drop,
+	.init		=	choke_init,
+	.destroy	=	choke_destroy,
+	.reset		=	choke_reset,
+	.change		=	choke_change,
+	.dump		=	choke_dump,
+	.dump_stats	=	choke_dump_stats,
+	.owner		=	THIS_MODULE,
+};
+
+static int __init choke_module_init(void)
+{
+	return register_qdisc(&choke_qdisc_ops);
+}
+
+static void __exit choke_module_exit(void)
+{
+	unregister_qdisc(&choke_qdisc_ops);
+}
+
+module_init(choke_module_init)
+module_exit(choke_module_exit)
+
+MODULE_LICENSE("GPL");
--- a/include/linux/pkt_sched.h	2011-01-31 09:01:32.000000000 -0800
+++ b/include/linux/pkt_sched.h	2011-02-02 17:00:36.802764862 -0800
@@ -247,6 +247,35 @@ struct tc_gred_sopt {
 	__u16		pad1;
 };
 
+/* CHOKe section */
+
+enum {
+	TCA_CHOKE_UNSPEC,
+	TCA_CHOKE_PARMS,
+	TCA_CHOKE_STAB,
+	__TCA_CHOKE_MAX,
+};
+
+#define TCA_CHOKE_MAX (__TCA_CHOKE_MAX - 1)
+
+struct tc_choke_qopt {
+	__u32		limit;		/* Hard queue length (packets)	*/
+	__u32		qth_min;	/* Min average threshold (packets) */
+	__u32		qth_max;	/* Max average threshold (packets) */
+	unsigned char   Wlog;		/* log(W)		*/
+	unsigned char   Plog;		/* log(P_max/(qth_max-qth_min))	*/
+	unsigned char   Scell_log;	/* cell size for idle damping */
+	unsigned char	flags;		/* see RED flags */
+};
+
+struct tc_choke_xstats {
+	__u32		early;          /* Early drops */
+	__u32		pdrop;          /* Drops due to queue limits */
+	__u32		other;          /* Drops due to drop() calls */
+	__u32		marked;         /* Marked packets */
+	__u32		matched;	/* Drops due to flow match */
+};
+
 /* HTB section */
 #define TC_HTB_NUMPRIO		8
 #define TC_HTB_MAXDEPTH		8

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next] CHOKe flow scheduler (0.11)
  2011-02-03  1:21                             ` [PATCH net-next] CHOKe flow scheduler (0.11) Stephen Hemminger
@ 2011-02-03  1:59                               ` Eric Dumazet
  2011-02-03  4:53                                 ` David Miller
  0 siblings, 1 reply; 26+ messages in thread
From: Eric Dumazet @ 2011-02-03  1:59 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Miller, Patrick McHardy, netdev

On Wednesday, 2 February 2011 at 17:21 -0800, Stephen Hemminger wrote:
> Subject: sched: CHOKe flow scheduler
> 
> CHOKe ("CHOose and Kill" or "CHOose and Keep") is an alternative
> packet scheduler based on the Random Exponential Drop (RED) algorithm.
> 
> The core idea is:
>   For every packet arrival:
>   	Calculate Qave
> 	if (Qave < minth) 
> 	     Queue the new packet
> 	else 
> 	     Select randomly a packet from the queue 
> 	     if (both packets from same flow)
> 	     then Drop both the packets
> 	     else if (Qave > maxth)
> 	          Drop packet
> 	     else
> 	       	  Admit packet with proability p (same as RED)
> 
> See also:
>   Rong Pan, Balaji Prabhakar, Konstantinos Psounis, "CHOKe: a stateless active
>    queue management scheme for approximating fair bandwidth allocation", 
>   Proceeding of INFOCOM'2000, March 2000.
> 
> Help from:
>      Eric Dumazet <eric.dumazet@gmail.com>
>      Patrick McHardy <kaber@trash.net>
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
> 
> ---
> 0.11 - incorporates Eric's change to use rxhash
> 
>  

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Thanks Stephen !



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next] CHOKe flow scheduler (0.11)
  2011-02-03  1:59                               ` Eric Dumazet
@ 2011-02-03  4:53                                 ` David Miller
  0 siblings, 0 replies; 26+ messages in thread
From: David Miller @ 2011-02-03  4:53 UTC (permalink / raw)
  To: eric.dumazet; +Cc: shemminger, kaber, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 03 Feb 2011 02:59:12 +0100

> On Wednesday, 2 February 2011 at 17:21 -0800, Stephen Hemminger wrote:
>> Subject: sched: CHOKe flow scheduler
>> 
>> CHOKe ("CHOose and Kill" or "CHOose and Keep") is an alternative
>> packet scheduler based on the Random Exponential Drop (RED) algorithm.
>> 
>> The core idea is:
>>   For every packet arrival:
>>   	Calculate Qave
>> 	if (Qave < minth) 
>> 	     Queue the new packet
>> 	else 
>> 	     Select randomly a packet from the queue 
>> 	     if (both packets from same flow)
>> 	     then Drop both the packets
>> 	     else if (Qave > maxth)
>> 	          Drop packet
>> 	     else
>> 	       	  Admit packet with proability p (same as RED)
>> 
>> See also:
>>   Rong Pan, Balaji Prabhakar, Konstantinos Psounis, "CHOKe: a stateless active
>>    queue management scheme for approximating fair bandwidth allocation", 
>>   Proceeding of INFOCOM'2000, March 2000.
>> 
>> Help from:
>>      Eric Dumazet <eric.dumazet@gmail.com>
>>      Patrick McHardy <kaber@trash.net>
>> 
>> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
>> 
>> ---
>> 0.11 - incorporates Eric's change to use rxhash
>> 
>>  
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Also applied, thanks guys!

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2011-02-03  4:52 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-01-13 17:27 [PATCH] CHOKe flow scheduler (0.7) Stephen Hemminger
2011-01-13 17:48 ` [PATCH] CHOKe flow scheduler (iproute) Stephen Hemminger
2011-01-13 21:01   ` Eric Dumazet
2011-01-13 18:00 ` [PATCH] CHOKe flow scheduler (0.7) Eric Dumazet
2011-01-13 18:02   ` Eric Dumazet
2011-01-13 20:37 ` Eric Dumazet
2011-01-13 23:34   ` [PATCH] CHOKe flow scheduler (0.8) Stephen Hemminger
2011-01-14  3:29     ` Eric Dumazet
2011-01-14  3:34       ` Eric Dumazet
2011-01-14  3:58         ` Eric Dumazet
2011-01-14 11:32           ` Eric Dumazet
2011-01-14 13:54     ` Patrick McHardy
2011-01-14 13:55       ` Patrick McHardy
2011-01-14 14:24         ` Eric Dumazet
2011-01-14 23:45           ` [PATCH] CHOKe flow scheduler (0.9) Stephen Hemminger
2011-01-15  7:45             ` Eric Dumazet
2011-01-17 17:25               ` Stephen Hemminger
2011-01-17 17:54                 ` Eric Dumazet
2011-01-18 19:06                   ` Stephen Hemminger
2011-01-18 19:34                     ` Eric Dumazet
2011-01-20 17:38                       ` [PATCH] CHOKe flow scheduler (0.10) Stephen Hemminger
2011-01-20 18:19                         ` Eric Dumazet
2011-01-20 22:46                           ` Eric Dumazet
2011-02-03  1:21                             ` [PATCH net-next] CHOKe flow scheduler (0.11) Stephen Hemminger
2011-02-03  1:59                               ` Eric Dumazet
2011-02-03  4:53                                 ` David Miller
