netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH BUGFIX] pkt_sched: fix little service anomalies and possible crashes of qfq+
@ 2012-12-19 17:31 Paolo valente
  2012-12-26 23:13 ` David Miller
  0 siblings, 1 reply; 17+ messages in thread
From: Paolo valente @ 2012-12-19 17:31 UTC (permalink / raw)
  To: davem, jhs, shemminger
  Cc: linux-kernel, netdev, rizzo, fchecconi, Paolo Valente

This patch contains five fixes that remove some little service
anomalies, plus an improvement that slightly reduces the execution
time of qfq+.

As for possible crashes, in a stress test in which a few hundreds of
classes randomly switched from backlogged to non-backlogged, and the
parameters of a few tens of randomly chosen classes were changed
repeatedly and in parallel, the problems solved by these fixes
happened to combine and cause corruption of the data structures
supporting the group bucket lists.

The portions of the code interested by each fix are small and do not
overlap with each other, so I decided to provide just one patch
(I hope that this was the right choice).

1. Fix the update of eligible-group sets

Between two invocations of make_eligible, the system virtual time may
happen to grow enough that, in its binary representation, a bit with
higher order than 31 flips. This happens especially with
TSO/GSO. Before this fix, the mask used in make_eligible was computed
as (1UL<<index_of_last_flipped_bit)-1, whose value is well defined on
a 64-bit architecture, because index_of_flipped_bit <= 63, but is in
general undefined on a 32-bit architecture if index_of_flipped_bit > 31.
The fix just replaces 1UL with 1ULL.

2. Properly cap timestamps in charge_actual_service

After decreasing the number of classes, and hence the maximum budget
budgetmax for an aggregate in service, it may happen that the service
it receives before a new aggregate is chosen for service becomes
larger than budgetmax. This may cause the aggregate to be assigned an
incorrect, too high virtual finish time. The cap introduced by this
fix solves this problem.

3. Do not allow virtual time to jump if an aggregate is in service

By definition of (the algorithm of) qfq+, the system virtual time must
be pushed up only if there is no eligible aggregate. But to decide
whether this condition holds correctly, qfq+ must also check whether
no aggregate is in service (and hence eligible).

4. Prevent the budget of an aggregate from becoming negative in
qfq_dequeue

If lmax is lowered, through qfq_change_class, for a class owning
pending packets with larger size than the new value of lmax, then even
the budget of a just selected-aggregate may happen to be lower than
the length of the next packet to serve. This fix prevents the budget
from becoming negative after the packet is dequeued.

5. Start serving a just-activated aggregate immediately if the
scheduler is empty

This avoids qfq+ to be occasionally non-work-conserving.

6. Remove useless invocation of qfq_update_eligible from qfq_deactivate_agg

There is no need to invoke qfq_update_eligible in qfq_deactivate_agg
to keep qfq+ work-conserving.

Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
---
 net/sched/sch_qfq.c |   72 +++++++++++++++++++++++++++++++++++----------------
 1 file changed, 49 insertions(+), 23 deletions(-)

diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
index 6ed3765..01f36a5 100644
--- a/net/sched/sch_qfq.c
+++ b/net/sched/sch_qfq.c
@@ -299,6 +299,11 @@ static void qfq_update_agg(struct qfq_sched *q, struct qfq_aggregate *agg,
 	    new_num_classes == q->max_agg_classes - 1) /* agg no more full */
 		hlist_add_head(&agg->nonfull_next, &q->nonfull_aggs);
 
+	/*
+	 * The next assignment may let
+	 * agg->initial_budget > agg->budgetmax
+	 * hold, but this does not cause any harm
+	 */
 	agg->budgetmax = new_num_classes * agg->lmax;
 	new_agg_weight = agg->class_weight * new_num_classes;
 	agg->inv_w = ONE_FP/new_agg_weight;
@@ -325,9 +330,10 @@ static void qfq_add_to_agg(struct qfq_sched *q,
 	qfq_update_agg(q, agg, agg->num_classes+1);
 	if (cl->qdisc->q.qlen > 0) { /* adding an active class */
 		list_add_tail(&cl->alist, &agg->active);
-		if (list_first_entry(&agg->active, struct qfq_class, alist) ==
-		    cl && q->in_serv_agg != agg) /* agg was inactive */
-			qfq_activate_agg(q, agg, enqueue); /* schedule agg */
+		if (list_first_entry(&agg->active, struct qfq_class, alist) !=
+		    cl || q->in_serv_agg == agg)
+			return;
+		qfq_activate_agg(q, agg, enqueue);
 	}
 }
 
@@ -819,7 +825,7 @@ static void qfq_make_eligible(struct qfq_sched *q)
 	unsigned long old_vslot = q->oldV >> q->min_slot_shift;
 
 	if (vslot != old_vslot) {
-		unsigned long mask = (1UL << fls(vslot ^ old_vslot)) - 1;
+		unsigned long mask = (1ULL << fls(vslot ^ old_vslot)) - 1;
 		qfq_move_groups(q, mask, IR, ER);
 		qfq_move_groups(q, mask, IB, EB);
 	}
@@ -993,9 +999,19 @@ static inline void charge_actual_service(struct qfq_aggregate *agg)
 	/* compute the service received by the aggregate */
 	u32 service_received = agg->initial_budget - agg->budget;
 
+	/* may happen after decreasing the number of classes in agg */
+	if (service_received > agg->budgetmax)
+		service_received = agg->budgetmax;
+
 	agg->F = agg->S + (u64)service_received * agg->inv_w;
 }
 
+static inline void qfq_update_agg_ts(struct qfq_sched *,
+				     struct qfq_aggregate *,
+				     enum update_reason);
+
+static void qfq_schedule_agg(struct qfq_sched *, struct qfq_aggregate *);
+
 static struct sk_buff *qfq_dequeue(struct Qdisc *sch)
 {
 	struct qfq_sched *q = qdisc_priv(sch);
@@ -1023,7 +1039,7 @@ static struct sk_buff *qfq_dequeue(struct Qdisc *sch)
 		in_serv_agg->initial_budget = in_serv_agg->budget =
 			in_serv_agg->budgetmax;
 
-		if (!list_empty(&in_serv_agg->active))
+		if (!list_empty(&in_serv_agg->active)) {
 			/*
 			 * Still active: reschedule for
 			 * service. Possible optimization: if no other
@@ -1034,8 +1050,9 @@ static struct sk_buff *qfq_dequeue(struct Qdisc *sch)
 			 * handle it, we would need to maintain an
 			 * extra num_active_aggs field.
 			*/
-			qfq_activate_agg(q, in_serv_agg, requeue);
-		else if (sch->q.qlen == 0) { /* no aggregate to serve */
+			qfq_update_agg_ts(q, in_serv_agg, requeue);
+			qfq_schedule_agg(q, in_serv_agg);
+		} else if (sch->q.qlen == 0) { /* no aggregate to serve */
 			q->in_serv_agg = NULL;
 			return NULL;
 		}
@@ -1054,7 +1071,16 @@ static struct sk_buff *qfq_dequeue(struct Qdisc *sch)
 	qdisc_bstats_update(sch, skb);
 
 	agg_dequeue(in_serv_agg, cl, len);
-	in_serv_agg->budget -= len;
+	/*
+	 * If lmax is lowered, through qfq_change_class, for a class
+	 * owning pending packets with larger size than the new value of lmax,
+	 * then the following condition may hold.
+	 */
+	if (unlikely(in_serv_agg->budget < len))
+		in_serv_agg->budget = 0;
+	else
+		in_serv_agg->budget -= len;
+
 	q->V += (u64)len * IWSUM;
 	pr_debug("qfq dequeue: len %u F %lld now %lld\n",
 		 len, (unsigned long long) in_serv_agg->F,
@@ -1154,7 +1180,7 @@ static void qfq_update_start(struct qfq_sched *q, struct qfq_aggregate *agg)
  */
 static inline void
 qfq_update_agg_ts(struct qfq_sched *q,
-		    struct qfq_aggregate *agg, enum update_reason reason)
+		  struct qfq_aggregate *agg, enum update_reason reason)
 {
 	if (reason != requeue)
 		qfq_update_start(q, agg);
@@ -1219,17 +1245,11 @@ static int qfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	cl->deficit = agg->lmax;
 	list_add_tail(&cl->alist, &agg->active);
 
-	if (list_first_entry(&agg->active, struct qfq_class, alist) != cl)
-		return err; /* aggregate was not empty, nothing else to do */
+	if (list_first_entry(&agg->active, struct qfq_class, alist) != cl ||
+	    q->in_serv_agg == agg)
+		return err; /* non-empty or in service, nothing else to do */
 
-	/* recharge budget */
-	agg->initial_budget = agg->budget = agg->budgetmax;
-
-	qfq_update_agg_ts(q, agg, enqueue);
-	if (q->in_serv_agg == NULL)
-		q->in_serv_agg = agg;
-	else if (agg != q->in_serv_agg)
-		qfq_schedule_agg(q, agg);
+	qfq_activate_agg(q, agg, enqueue);
 
 	return err;
 }
@@ -1263,7 +1283,8 @@ static void qfq_schedule_agg(struct qfq_sched *q, struct qfq_aggregate *agg)
 		/* group was surely ineligible, remove */
 		__clear_bit(grp->index, &q->bitmaps[IR]);
 		__clear_bit(grp->index, &q->bitmaps[IB]);
-	} else if (!q->bitmaps[ER] && qfq_gt(roundedS, q->V))
+	} else if (!q->bitmaps[ER] && qfq_gt(roundedS, q->V) &&
+		   q->in_serv_agg == NULL)
 		q->V = roundedS;
 
 	grp->S = roundedS;
@@ -1286,8 +1307,15 @@ skip_update:
 static void qfq_activate_agg(struct qfq_sched *q, struct qfq_aggregate *agg,
 			     enum update_reason reason)
 {
+	agg->initial_budget = agg->budget = agg->budgetmax; /* recharge budg */
+
 	qfq_update_agg_ts(q, agg, reason);
-	qfq_schedule_agg(q, agg);
+	if (q->in_serv_agg == NULL) { /* no aggr. in service or scheduled */
+		q->in_serv_agg = agg; /* start serving this aggregate */
+		 /* update V: to be in service, agg must be eligible */
+		q->oldV = q->V = agg->S;
+	} else if (agg != q->in_serv_agg)
+		qfq_schedule_agg(q, agg);
 }
 
 static void qfq_slot_remove(struct qfq_sched *q, struct qfq_group *grp,
@@ -1359,8 +1387,6 @@ static void qfq_deactivate_agg(struct qfq_sched *q, struct qfq_aggregate *agg)
 			__set_bit(grp->index, &q->bitmaps[s]);
 		}
 	}
-
-	qfq_update_eligible(q);
 }
 
 static void qfq_qlen_notify(struct Qdisc *sch, unsigned long arg)
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH BUGFIX] pkt_sched: fix little service anomalies and possible crashes of qfq+
  2012-12-19 17:31 [PATCH BUGFIX] pkt_sched: fix little service anomalies and possible crashes of qfq+ Paolo valente
@ 2012-12-26 23:13 ` David Miller
  2013-02-26 17:02   ` Paolo valente
  0 siblings, 1 reply; 17+ messages in thread
From: David Miller @ 2012-12-26 23:13 UTC (permalink / raw)
  To: paolo.valente; +Cc: jhs, shemminger, linux-kernel, netdev, rizzo, fchecconi

From: Paolo valente <paolo.valente@unimore.it>
Date: Wed, 19 Dec 2012 18:31:06 +0100

> +	/*
> +	 * The next assignment may let
> +	 * agg->initial_budget > agg->budgetmax
> +	 * hold, but this does not cause any harm
> +	 */

Please format comments in the networking:

	/* Like
	 * this.
	 */

and

	/*
	 * Never
	 * like this.
	 */

I know this file is full of exceptions, but that error is to be
corrected rather than expanded.

> +	/*
> +	 * If lmax is lowered, through qfq_change_class, for a class
> +	 * owning pending packets with larger size than the new value of lmax,
> +	 * then the following condition may hold.
> +	 */

Likewise.

And I'm not applying this until someone familiar with this code
does some review of this patch.  These are seriously non-trivial
changes.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH BUGFIX] pkt_sched: fix little service anomalies and possible crashes of qfq+
  2012-12-26 23:13 ` David Miller
@ 2013-02-26 17:02   ` Paolo valente
  2013-02-26 22:37     ` David Miller
  0 siblings, 1 reply; 17+ messages in thread
From: Paolo valente @ 2013-02-26 17:02 UTC (permalink / raw)
  To: Jamal Hadi Salim, David S. Miller, shemminger
  Cc: netdev, linux-kernel, fchecconi, rizzo, Paolo Valente

This is a new version of the patch, with comments reformatted as
requested, and reviewed by Fabio (one of the authors of QFQ). This
version also includes some little, non-functional code improvements
suggested by Fabio.

The patch contains five fixes that remove some little service
anomalies, plus an improvement that slightly reduces the execution
time of qfq+.

As for possible crashes, in a stress test in which a few hundreds of
classes randomly switched from backlogged to non-backlogged, and the
parameters of a few tens of randomly chosen classes were changed
repeatedly and in parallel, the problems solved by these fixes
happened to combine and cause corruption of the data structures
supporting the group bucket lists.

The portions of the code interested by each fix are small and do not
overlap with each other, so I decided to provide just one patch
(I hope that this was the right choice).

1. Fix the update of eligible-group sets

Between two invocations of make_eligible, the system virtual time may
happen to grow enough that, in its binary representation, a bit with
higher order than 31 flips. This happens especially with
TSO/GSO. Before this fix, the mask used in make_eligible was computed
as (1UL<<index_of_last_flipped_bit)-1, whose value is well defined on
a 64-bit architecture, because index_of_flipped_bit <= 63, but is in
general undefined on a 32-bit architecture if index_of_flipped_bit > 31.
The fix just replaces 1UL with 1ULL.

2. Properly cap timestamps in charge_actual_service

After decreasing the number of classes, and hence the maximum budget
budgetmax for an aggregate in service, it may happen that the service
it receives before a new aggregate is chosen for service becomes
larger than budgetmax. This may cause the aggregate to be assigned an
incorrect, too high virtual finish time. The cap introduced by this
fix solves this problem.

3. Do not allow virtual time to jump if an aggregate is in service

By definition of (the algorithm of) qfq+, the system virtual time must
be pushed up only if there is no eligible aggregate. But to decide
whether this condition holds correctly, qfq+ must also check whether
no aggregate is in service (and hence eligible).

4. Prevent the budget of an aggregate from becoming negative in
qfq_dequeue

If lmax is lowered, through qfq_change_class, for a class owning
pending packets with larger size than the new value of lmax, then even
the budget of a just selected-aggregate may happen to be lower than
the length of the next packet to serve. This fix prevents the budget
from becoming negative after the packet is dequeued.

5. Start serving a just-activated aggregate immediately if the
scheduler is empty

This avoids qfq+ to be occasionally non-work-conserving.

6. Remove useless invocation of qfq_update_eligible from qfq_deactivate_agg

There is no need to invoke qfq_update_eligible in qfq_deactivate_agg
to keep qfq+ work-conserving.

Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
Reviewed-by: Fabio Checconi <fchecconi@gmail.com>
---
 net/sched/sch_qfq.c |   66 +++++++++++++++++++++++++++++++++++----------------
 1 file changed, 45 insertions(+), 21 deletions(-)

diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
index 6ed3765..7cfa1f8 100644
--- a/net/sched/sch_qfq.c
+++ b/net/sched/sch_qfq.c
@@ -299,6 +299,10 @@ static void qfq_update_agg(struct qfq_sched *q, struct qfq_aggregate *agg,
 	    new_num_classes == q->max_agg_classes - 1) /* agg no more full */
 		hlist_add_head(&agg->nonfull_next, &q->nonfull_aggs);
 
+	/* The next assignment may let
+	 * agg->initial_budget > agg->budgetmax
+	 * hold, we will take it into account in charge_actual_service().
+	 */
 	agg->budgetmax = new_num_classes * agg->lmax;
 	new_agg_weight = agg->class_weight * new_num_classes;
 	agg->inv_w = ONE_FP/new_agg_weight;
@@ -819,7 +823,7 @@ static void qfq_make_eligible(struct qfq_sched *q)
 	unsigned long old_vslot = q->oldV >> q->min_slot_shift;
 
 	if (vslot != old_vslot) {
-		unsigned long mask = (1UL << fls(vslot ^ old_vslot)) - 1;
+		unsigned long mask = (1ULL << fls(vslot ^ old_vslot)) - 1;
 		qfq_move_groups(q, mask, IR, ER);
 		qfq_move_groups(q, mask, IB, EB);
 	}
@@ -990,12 +994,23 @@ static inline struct sk_buff *qfq_peek_skb(struct qfq_aggregate *agg,
 /* Update F according to the actual service received by the aggregate. */
 static inline void charge_actual_service(struct qfq_aggregate *agg)
 {
-	/* compute the service received by the aggregate */
-	u32 service_received = agg->initial_budget - agg->budget;
+	/* Compute the service received by the aggregate, taking into
+	 * account that, after decreasing the number of classes in
+	 * agg, it may happen that
+	 * agg->initial_budget - agg->budget > agg->bugdetmax
+	 */
+	u32 service_received = min(agg->budgetmax,
+				   agg->initial_budget - agg->budget);
 
 	agg->F = agg->S + (u64)service_received * agg->inv_w;
 }
 
+static inline void qfq_update_agg_ts(struct qfq_sched *q,
+				     struct qfq_aggregate *agg,
+				     enum update_reason reason);
+
+static void qfq_schedule_agg(struct qfq_sched *q, struct qfq_aggregate *agg);
+
 static struct sk_buff *qfq_dequeue(struct Qdisc *sch)
 {
 	struct qfq_sched *q = qdisc_priv(sch);
@@ -1023,7 +1038,7 @@ static struct sk_buff *qfq_dequeue(struct Qdisc *sch)
 		in_serv_agg->initial_budget = in_serv_agg->budget =
 			in_serv_agg->budgetmax;
 
-		if (!list_empty(&in_serv_agg->active))
+		if (!list_empty(&in_serv_agg->active)) {
 			/*
 			 * Still active: reschedule for
 			 * service. Possible optimization: if no other
@@ -1034,8 +1049,9 @@ static struct sk_buff *qfq_dequeue(struct Qdisc *sch)
 			 * handle it, we would need to maintain an
 			 * extra num_active_aggs field.
 			*/
-			qfq_activate_agg(q, in_serv_agg, requeue);
-		else if (sch->q.qlen == 0) { /* no aggregate to serve */
+			qfq_update_agg_ts(q, in_serv_agg, requeue);
+			qfq_schedule_agg(q, in_serv_agg);
+		} else if (sch->q.qlen == 0) { /* no aggregate to serve */
 			q->in_serv_agg = NULL;
 			return NULL;
 		}
@@ -1054,7 +1070,15 @@ static struct sk_buff *qfq_dequeue(struct Qdisc *sch)
 	qdisc_bstats_update(sch, skb);
 
 	agg_dequeue(in_serv_agg, cl, len);
-	in_serv_agg->budget -= len;
+	/* If lmax is lowered, through qfq_change_class, for a class
+	 * owning pending packets with larger size than the new value
+	 * of lmax, then the following condition may hold.
+	 */
+	if (unlikely(in_serv_agg->budget < len))
+		in_serv_agg->budget = 0;
+	else
+		in_serv_agg->budget -= len;
+
 	q->V += (u64)len * IWSUM;
 	pr_debug("qfq dequeue: len %u F %lld now %lld\n",
 		 len, (unsigned long long) in_serv_agg->F,
@@ -1219,17 +1243,11 @@ static int qfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	cl->deficit = agg->lmax;
 	list_add_tail(&cl->alist, &agg->active);
 
-	if (list_first_entry(&agg->active, struct qfq_class, alist) != cl)
-		return err; /* aggregate was not empty, nothing else to do */
+	if (list_first_entry(&agg->active, struct qfq_class, alist) != cl ||
+	    q->in_serv_agg == agg)
+		return err; /* non-empty or in service, nothing else to do */
 
-	/* recharge budget */
-	agg->initial_budget = agg->budget = agg->budgetmax;
-
-	qfq_update_agg_ts(q, agg, enqueue);
-	if (q->in_serv_agg == NULL)
-		q->in_serv_agg = agg;
-	else if (agg != q->in_serv_agg)
-		qfq_schedule_agg(q, agg);
+	qfq_activate_agg(q, agg, enqueue);
 
 	return err;
 }
@@ -1263,7 +1281,8 @@ static void qfq_schedule_agg(struct qfq_sched *q, struct qfq_aggregate *agg)
 		/* group was surely ineligible, remove */
 		__clear_bit(grp->index, &q->bitmaps[IR]);
 		__clear_bit(grp->index, &q->bitmaps[IB]);
-	} else if (!q->bitmaps[ER] && qfq_gt(roundedS, q->V))
+	} else if (!q->bitmaps[ER] && qfq_gt(roundedS, q->V) &&
+		   q->in_serv_agg == NULL)
 		q->V = roundedS;
 
 	grp->S = roundedS;
@@ -1286,8 +1305,15 @@ skip_update:
 static void qfq_activate_agg(struct qfq_sched *q, struct qfq_aggregate *agg,
 			     enum update_reason reason)
 {
+	agg->initial_budget = agg->budget = agg->budgetmax; /* recharge budg. */
+
 	qfq_update_agg_ts(q, agg, reason);
-	qfq_schedule_agg(q, agg);
+	if (q->in_serv_agg == NULL) { /* no aggr. in service or scheduled */
+		q->in_serv_agg = agg; /* start serving this aggregate */
+		 /* update V: to be in service, agg must be eligible */
+		q->oldV = q->V = agg->S;
+	} else if (agg != q->in_serv_agg)
+		qfq_schedule_agg(q, agg);
 }
 
 static void qfq_slot_remove(struct qfq_sched *q, struct qfq_group *grp,
@@ -1359,8 +1385,6 @@ static void qfq_deactivate_agg(struct qfq_sched *q, struct qfq_aggregate *agg)
 			__set_bit(grp->index, &q->bitmaps[s]);
 		}
 	}
-
-	qfq_update_eligible(q);
 }
 
 static void qfq_qlen_notify(struct Qdisc *sch, unsigned long arg)
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH BUGFIX] pkt_sched: fix little service anomalies and possible crashes of qfq+
  2013-02-26 17:02   ` Paolo valente
@ 2013-02-26 22:37     ` David Miller
  2013-03-05 18:04       ` [PATCH BUGFIX 0/6] " Paolo valente
  0 siblings, 1 reply; 17+ messages in thread
From: David Miller @ 2013-02-26 22:37 UTC (permalink / raw)
  To: paolo.valente; +Cc: jhs, shemminger, netdev, linux-kernel, fchecconi, rizzo

From: Paolo valente <paolo.valente@unimore.it>
Date: Tue, 26 Feb 2013 18:02:46 +0100

> The portions of the code interested by each fix are small and do not
> overlap with each other, so I decided to provide just one patch
> (I hope that this was the right choice).

Please split this up into 6 patches, each with an appropriately
verbose analysis and explanation of each bug being fixed, thanks.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH BUGFIX 0/6] pkt_sched: fix little service anomalies and possible crashes of qfq+
  2013-02-26 22:37     ` David Miller
@ 2013-03-05 18:04       ` Paolo valente
  2013-03-05 18:04         ` [PATCH BUGFIX 1/6] pkt_sched: properly cap timestamps in charge_actual_service Paolo valente
                           ` (6 more replies)
  0 siblings, 7 replies; 17+ messages in thread
From: Paolo valente @ 2013-03-05 18:04 UTC (permalink / raw)
  To: Jamal Hadi Salim, David S. Miller, shemminger
  Cc: netdev, linux-kernel, fchecconi, rizzo, Paolo Valente

Il 26/02/2013 23:37, David Miller ha scritto:
> From: Paolo valente <paolo.valente@unimore.it>
> Date: Tue, 26 Feb 2013 18:02:46 +0100
> 
>> The portions of the code interested by each fix are small and do not
>> overlap with each other, so I decided to provide just one patch
>> (I hope that this was the right choice).
> 
> Please split this up into 6 patches, each with an appropriately
> verbose analysis and explanation of each bug being fixed, thanks.
> 
> 

Split, and inserted a detailed description of both the problem and the fix
in each patch.

Paolo valente (6):
  pkt_sched: properly cap timestamps in charge_actual_service
  pkt_sched: fix the update of eligible-group sets
  pkt_sched: serve activated aggregates immediately if the scheduler is
    empty
  pkt_sched: prevent budget from wrapping around after a dequeue
  pkt_sched: do not allow virtual time to jump if an aggregate is in
    service
  pkt_sched: remove a useless invocation of qfq_update_eligible

 net/sched/sch_qfq.c |   66 +++++++++++++++++++++++++++++++++++----------------
 1 file changed, 45 insertions(+), 21 deletions(-)

-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH BUGFIX 1/6] pkt_sched: properly cap timestamps in charge_actual_service
  2013-03-05 18:04       ` [PATCH BUGFIX 0/6] " Paolo valente
@ 2013-03-05 18:04         ` Paolo valente
  2013-03-05 18:04         ` [PATCH BUGFIX 2/6] pkt_sched: fix the update of eligible-group sets Paolo valente
                           ` (5 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: Paolo valente @ 2013-03-05 18:04 UTC (permalink / raw)
  To: Jamal Hadi Salim, David S. Miller, shemminger
  Cc: netdev, linux-kernel, fchecconi, rizzo, Paolo Valente

QFQ+ schedules the active aggregates in a group using a bucket list
(one list per group). The bucket in which each aggregate is inserted
depends on the aggregate's timestamps, and the number
of buckets in a group is enough to accomodate the possible (range of)
values of the timestamps of all the aggregates in the group. For this
property to hold, timestamps must however be computed correctly.  One
necessary condition for computing timestamps correctly is that the
number of bits dequeued for each aggregate, while the aggregate is in
service, does not exceed the maximum budget budgetmax assigned to the
aggregate.

For each aggregate, budgetmax is proportional to the number of classes
in the aggregate. If the number of classes of the aggregate is
decreased through qfq_change_class(), then budgetmax is decreased
automatically as well.  Problems may occur if the aggregate is in
service when budgetmax is decreased, because the current remaining
budget of the aggregate and/or the service already received by the
aggregate may happen to be larger than the new value of budgetmax.  In
this case, when the aggregate is eventually deselected and its
timestamps are updated, the aggregate may happen to have received an
amount of service larger than budgetmax.  This may cause the aggregate
to be assigned a higher virtual finish time than the maximum
acceptable value for the last bucket in the bucket list of the group.

This fix introduces a cap that addresses this issue.

Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
Reviewed-by: Fabio Checconi <fchecconi@gmail.com>
---
 net/sched/sch_qfq.c |   13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
index 6ed3765..0f6e2db 100644
--- a/net/sched/sch_qfq.c
+++ b/net/sched/sch_qfq.c
@@ -299,6 +299,10 @@ static void qfq_update_agg(struct qfq_sched *q, struct qfq_aggregate *agg,
 	    new_num_classes == q->max_agg_classes - 1) /* agg no more full */
 		hlist_add_head(&agg->nonfull_next, &q->nonfull_aggs);
 
+	/* The next assignment may let
+	 * agg->initial_budget > agg->budgetmax
+	 * hold, we will take it into account in charge_actual_service().
+	 */
 	agg->budgetmax = new_num_classes * agg->lmax;
 	new_agg_weight = agg->class_weight * new_num_classes;
 	agg->inv_w = ONE_FP/new_agg_weight;
@@ -990,8 +994,13 @@ static inline struct sk_buff *qfq_peek_skb(struct qfq_aggregate *agg,
 /* Update F according to the actual service received by the aggregate. */
 static inline void charge_actual_service(struct qfq_aggregate *agg)
 {
-	/* compute the service received by the aggregate */
-	u32 service_received = agg->initial_budget - agg->budget;
+	/* Compute the service received by the aggregate, taking into
+	 * account that, after decreasing the number of classes in
+	 * agg, it may happen that
+	 * agg->initial_budget - agg->budget > agg->bugdetmax
+	 */
+	u32 service_received = min(agg->budgetmax,
+				   agg->initial_budget - agg->budget);
 
 	agg->F = agg->S + (u64)service_received * agg->inv_w;
 }
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH BUGFIX 2/6] pkt_sched: fix the update of eligible-group sets
  2013-03-05 18:04       ` [PATCH BUGFIX 0/6] " Paolo valente
  2013-03-05 18:04         ` [PATCH BUGFIX 1/6] pkt_sched: properly cap timestamps in charge_actual_service Paolo valente
@ 2013-03-05 18:04         ` Paolo valente
  2013-03-06 10:05           ` David Laight
  2013-03-05 18:04         ` [PATCH BUGFIX 3/6] pkt_sched: serve activated aggregates immediately if the scheduler is empty Paolo valente
                           ` (4 subsequent siblings)
  6 siblings, 1 reply; 17+ messages in thread
From: Paolo valente @ 2013-03-05 18:04 UTC (permalink / raw)
  To: Jamal Hadi Salim, David S. Miller, shemminger
  Cc: netdev, linux-kernel, fchecconi, rizzo, Paolo Valente

Between two invocations of make_eligible, the system virtual time may
happen to grow enough that, in its binary representation, a bit with
higher order than 31 flips. This happens especially with
TSO/GSO. Before this fix, the mask used in make_eligible was computed
as (1UL<<index_of_last_flipped_bit)-1, whose value is well defined on
a 64-bit architecture, because index_of_flipped_bit <= 63, but is in
general undefined on a 32-bit architecture if index_of_flipped_bit > 31.
The fix just replaces 1UL with 1ULL.

Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
Reviewed-by: Fabio Checconi <fchecconi@gmail.com>
---
 net/sched/sch_qfq.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
index 0f6e2db..4cbbf79 100644
--- a/net/sched/sch_qfq.c
+++ b/net/sched/sch_qfq.c
@@ -823,7 +823,7 @@ static void qfq_make_eligible(struct qfq_sched *q)
 	unsigned long old_vslot = q->oldV >> q->min_slot_shift;
 
 	if (vslot != old_vslot) {
-		unsigned long mask = (1UL << fls(vslot ^ old_vslot)) - 1;
+		unsigned long mask = (1ULL << fls(vslot ^ old_vslot)) - 1;
 		qfq_move_groups(q, mask, IR, ER);
 		qfq_move_groups(q, mask, IB, EB);
 	}
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH BUGFIX 3/6] pkt_sched: serve activated aggregates immediately if the scheduler is empty
  2013-03-05 18:04       ` [PATCH BUGFIX 0/6] " Paolo valente
  2013-03-05 18:04         ` [PATCH BUGFIX 1/6] pkt_sched: properly cap timestamps in charge_actual_service Paolo valente
  2013-03-05 18:04         ` [PATCH BUGFIX 2/6] pkt_sched: fix the update of eligible-group sets Paolo valente
@ 2013-03-05 18:04         ` Paolo valente
  2013-03-05 18:05         ` [PATCH BUGFIX 4/6] pkt_sched: prevent budget from wrapping around after a dequeue Paolo valente
                           ` (3 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: Paolo valente @ 2013-03-05 18:04 UTC (permalink / raw)
  To: Jamal Hadi Salim, David S. Miller, shemminger
  Cc: netdev, linux-kernel, fchecconi, rizzo, Paolo Valente

If no aggregate is in service, then the function qfq_dequeue() does
not dequeue any packet. For this reason, to guarantee QFQ+ to be work
conserving, a just-activated aggregate must be set as in service
immediately if it happens to be the only active aggregate.
This is done by the function qfq_enqueue().

Unfortunately, the function qfq_add_to_agg(), used to add a class to
an aggregate, does not perform this important additional operation.
In particular, if: 1) qfq_add_to_agg() is invoked to complete the move
of a class from a source aggregate, becoming, for this move, inactive,
to a destination aggregate, becoming instead active, and 2) the
destination aggregate becomes the only active aggregate, then this
aggregate is not however set as in service. QFQ+ remains then in a
non-work-conserving state until a new invocation of qfq_enqueue()
recovers the situation.

This fix solves the problem by moving the logic for setting an
aggregate as in service directly into the function qfq_activate_agg().
Hence, from whatever point qfq_activate_aggregate() is invoked, QFQ+
remains work conserving.  Since the more-complex logic of this new
version of activate_aggregate() is not necessary, in qfq_dequeue(), to
reschedule an aggregate that finishes its budget, then the aggregate
is now rescheduled by invoking directly the functions needed.

Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
Reviewed-by: Fabio Checconi <fchecconi@gmail.com>
---
 net/sched/sch_qfq.c |   36 ++++++++++++++++++++++--------------
 1 file changed, 22 insertions(+), 14 deletions(-)

diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
index 4cbbf79..0dbec31 100644
--- a/net/sched/sch_qfq.c
+++ b/net/sched/sch_qfq.c
@@ -1005,6 +1005,12 @@ static inline void charge_actual_service(struct qfq_aggregate *agg)
 	agg->F = agg->S + (u64)service_received * agg->inv_w;
 }
 
+static inline void qfq_update_agg_ts(struct qfq_sched *q,
+				     struct qfq_aggregate *agg,
+				     enum update_reason reason);
+
+static void qfq_schedule_agg(struct qfq_sched *q, struct qfq_aggregate *agg);
+
 static struct sk_buff *qfq_dequeue(struct Qdisc *sch)
 {
 	struct qfq_sched *q = qdisc_priv(sch);
@@ -1032,7 +1038,7 @@ static struct sk_buff *qfq_dequeue(struct Qdisc *sch)
 		in_serv_agg->initial_budget = in_serv_agg->budget =
 			in_serv_agg->budgetmax;
 
-		if (!list_empty(&in_serv_agg->active))
+		if (!list_empty(&in_serv_agg->active)) {
 			/*
 			 * Still active: reschedule for
 			 * service. Possible optimization: if no other
@@ -1043,8 +1049,9 @@ static struct sk_buff *qfq_dequeue(struct Qdisc *sch)
 			 * handle it, we would need to maintain an
 			 * extra num_active_aggs field.
 			*/
-			qfq_activate_agg(q, in_serv_agg, requeue);
-		else if (sch->q.qlen == 0) { /* no aggregate to serve */
+			qfq_update_agg_ts(q, in_serv_agg, requeue);
+			qfq_schedule_agg(q, in_serv_agg);
+		} else if (sch->q.qlen == 0) { /* no aggregate to serve */
 			q->in_serv_agg = NULL;
 			return NULL;
 		}
@@ -1228,17 +1235,11 @@ static int qfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	cl->deficit = agg->lmax;
 	list_add_tail(&cl->alist, &agg->active);
 
-	if (list_first_entry(&agg->active, struct qfq_class, alist) != cl)
-		return err; /* aggregate was not empty, nothing else to do */
-
-	/* recharge budget */
-	agg->initial_budget = agg->budget = agg->budgetmax;
+	if (list_first_entry(&agg->active, struct qfq_class, alist) != cl ||
+	    q->in_serv_agg == agg)
+		return err; /* non-empty or in service, nothing else to do */
 
-	qfq_update_agg_ts(q, agg, enqueue);
-	if (q->in_serv_agg == NULL)
-		q->in_serv_agg = agg;
-	else if (agg != q->in_serv_agg)
-		qfq_schedule_agg(q, agg);
+	qfq_activate_agg(q, agg, enqueue);
 
 	return err;
 }
@@ -1295,8 +1296,15 @@ skip_update:
 static void qfq_activate_agg(struct qfq_sched *q, struct qfq_aggregate *agg,
 			     enum update_reason reason)
 {
+	agg->initial_budget = agg->budget = agg->budgetmax; /* recharge budg. */
+
 	qfq_update_agg_ts(q, agg, reason);
-	qfq_schedule_agg(q, agg);
+	if (q->in_serv_agg == NULL) { /* no aggr. in service or scheduled */
+		q->in_serv_agg = agg; /* start serving this aggregate */
+		 /* update V: to be in service, agg must be eligible */
+		q->oldV = q->V = agg->S;
+	} else if (agg != q->in_serv_agg)
+		qfq_schedule_agg(q, agg);
 }
 
 static void qfq_slot_remove(struct qfq_sched *q, struct qfq_group *grp,
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH BUGFIX 4/6] pkt_sched: prevent budget from wrapping around after a dequeue
  2013-03-05 18:04       ` [PATCH BUGFIX 0/6] " Paolo valente
                           ` (2 preceding siblings ...)
  2013-03-05 18:04         ` [PATCH BUGFIX 3/6] pkt_sched: serve activated aggregates immediately if the scheduler is empty Paolo valente
@ 2013-03-05 18:05         ` Paolo valente
  2013-03-05 18:05         ` [PATCH BUGFIX 5/6] pkt_sched: do not allow virtual time to jump if an aggregate is in service Paolo valente
                           ` (2 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: Paolo valente @ 2013-03-05 18:05 UTC (permalink / raw)
  To: Jamal Hadi Salim, David S. Miller, shemminger
  Cc: netdev, linux-kernel, fchecconi, rizzo, Paolo Valente

Aggregate budgets are computed so as to guarantee that, after an
aggregate has been selected for service, that aggregate has enough
budget to serve at least one maximum-size packet for the classes it
contains. For this reason, after a new aggregate has been selected
for service, its next packet is immediately dequeued, without any
further control.

The maximum packet size for a class, lmax, can be changed through
qfq_change_class(). In case the user sets lmax to a lower value than
the the size of some of the still-to-arrive packets, QFQ+ will
automatically push up lmax as it enqueues these packets.  This
automatic push up is likely to happen with TSO/GSO.

In any case, if lmax is assigned a lower value than the size of some
of the packets already enqueued for the class, then the following
problem may occur: the size of the next packet to dequeue for the
class may happen to be larger than lmax, after the aggregate to which
the class belongs has been just selected for service. In this case,
even the budget of the aggregate, which is an unsigned value, may be
lower than the size of the next packet to dequeue. After dequeueing
this packet and subtracting its size from the budget, the latter would
wrap around.

This fix prevents the budget from wrapping around after any packet
dequeue.

Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
Reviewed-by: Fabio Checconi <fchecconi@gmail.com>
---
 net/sched/sch_qfq.c |   10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
index 0dbec31..26149e2 100644
--- a/net/sched/sch_qfq.c
+++ b/net/sched/sch_qfq.c
@@ -1070,7 +1070,15 @@ static struct sk_buff *qfq_dequeue(struct Qdisc *sch)
 	qdisc_bstats_update(sch, skb);
 
 	agg_dequeue(in_serv_agg, cl, len);
-	in_serv_agg->budget -= len;
+	/* If lmax is lowered, through qfq_change_class, for a class
+	 * owning pending packets with larger size than the new value
+	 * of lmax, then the following condition may hold.
+	 */
+	if (unlikely(in_serv_agg->budget < len))
+		in_serv_agg->budget = 0;
+	else
+		in_serv_agg->budget -= len;
+
 	q->V += (u64)len * IWSUM;
 	pr_debug("qfq dequeue: len %u F %lld now %lld\n",
 		 len, (unsigned long long) in_serv_agg->F,
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH BUGFIX 5/6] pkt_sched: do not allow virtual time to jump if an aggregate is in service
  2013-03-05 18:04       ` [PATCH BUGFIX 0/6] " Paolo valente
                           ` (3 preceding siblings ...)
  2013-03-05 18:05         ` [PATCH BUGFIX 4/6] pkt_sched: prevent budget from wrapping around after a dequeue Paolo valente
@ 2013-03-05 18:05         ` Paolo valente
  2013-03-05 18:05         ` [PATCH BUGFIX 6/6] pkt_sched: remove a useless invocation of qfq_update_eligible Paolo valente
  2013-03-06  4:50         ` [PATCH BUGFIX 0/6] pkt_sched: fix little service anomalies and possible crashes of qfq+ David Miller
  6 siblings, 0 replies; 17+ messages in thread
From: Paolo valente @ 2013-03-05 18:05 UTC (permalink / raw)
  To: Jamal Hadi Salim, David S. Miller, shemminger
  Cc: netdev, linux-kernel, fchecconi, rizzo, Paolo Valente

By definition of (the algorithm of) QFQ+, the system virtual time must
be pushed up only if there is no 'eligible' aggregate, i.e. no
aggregate that would have started to be served also in the ideal
system emulated by QFQ+.  QFQ+ serves only eligible aggregates, hence
the aggregate currently in service is eligible.  As a consequence, to
decide whether there is no eligible aggregate, QFQ+ must also check
whether there is no aggregate in service.

Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
Reviewed-by: Fabio Checconi <fchecconi@gmail.com>
---
 net/sched/sch_qfq.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
index 26149e2..5d327b3 100644
--- a/net/sched/sch_qfq.c
+++ b/net/sched/sch_qfq.c
@@ -1281,7 +1281,8 @@ static void qfq_schedule_agg(struct qfq_sched *q, struct qfq_aggregate *agg)
 		/* group was surely ineligible, remove */
 		__clear_bit(grp->index, &q->bitmaps[IR]);
 		__clear_bit(grp->index, &q->bitmaps[IB]);
-	} else if (!q->bitmaps[ER] && qfq_gt(roundedS, q->V))
+	} else if (!q->bitmaps[ER] && qfq_gt(roundedS, q->V) &&
+		   q->in_serv_agg == NULL)
 		q->V = roundedS;
 
 	grp->S = roundedS;
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH BUGFIX 6/6] pkt_sched: remove a useless invocation of qfq_update_eligible
  2013-03-05 18:04       ` [PATCH BUGFIX 0/6] " Paolo valente
                           ` (4 preceding siblings ...)
  2013-03-05 18:05         ` [PATCH BUGFIX 5/6] pkt_sched: do not allow virtual time to jump if an aggregate is in service Paolo valente
@ 2013-03-05 18:05         ` Paolo valente
  2013-03-06  4:50         ` [PATCH BUGFIX 0/6] pkt_sched: fix little service anomalies and possible crashes of qfq+ David Miller
  6 siblings, 0 replies; 17+ messages in thread
From: Paolo valente @ 2013-03-05 18:05 UTC (permalink / raw)
  To: Jamal Hadi Salim, David S. Miller, shemminger
  Cc: netdev, linux-kernel, fchecconi, rizzo, Paolo Valente

QFQ+ can select for service only 'eligible' aggregates, i.e.,
aggregates that would have started to be served also in the emulated
ideal system.  As a consequence, for QFQ+ to be work conserving, at
least one of the active aggregates must be eligible when it is time to
choose the next aggregate to serve.

The set of eligible aggregates is updated through the function
qfq_update_eligible(), which does guarantee that, after its
invocation, at least one of the active aggregates is eligible.
Because of this property, this function is invoked in
qfq_deactivate_agg() to guarantee that at least one of the active
aggregates is still eligible after an aggregate has been deactivated.
In particular, the critical case is when there are other active
aggregates, but the aggregate being deactivated happens to be the only
one eligible.

However, this precaution is not needed for QFQ+ to be work conserving,
because update_eligible() is always invoked also at the beginning of
qfq_choose_next_agg(). This patch removes the additional invocation of
update_eligible() in qfq_deactivate_agg().

Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
Reviewed-by: Fabio Checconi <fchecconi@gmail.com>
---
 net/sched/sch_qfq.c |    2 --
 1 file changed, 2 deletions(-)

diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
index 5d327b3..7cfa1f8 100644
--- a/net/sched/sch_qfq.c
+++ b/net/sched/sch_qfq.c
@@ -1385,8 +1385,6 @@ static void qfq_deactivate_agg(struct qfq_sched *q, struct qfq_aggregate *agg)
 			__set_bit(grp->index, &q->bitmaps[s]);
 		}
 	}
-
-	qfq_update_eligible(q);
 }
 
 static void qfq_qlen_notify(struct Qdisc *sch, unsigned long arg)
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH BUGFIX 0/6] pkt_sched: fix little service anomalies and possible crashes of qfq+
  2013-03-05 18:04       ` [PATCH BUGFIX 0/6] " Paolo valente
                           ` (5 preceding siblings ...)
  2013-03-05 18:05         ` [PATCH BUGFIX 6/6] pkt_sched: remove a useless invocation of qfq_update_eligible Paolo valente
@ 2013-03-06  4:50         ` David Miller
  2013-07-10 13:46           ` [PATCH 0/2] pkt_sched: sch_qfq: efficiency and codestyle improvements Paolo Valente
  6 siblings, 1 reply; 17+ messages in thread
From: David Miller @ 2013-03-06  4:50 UTC (permalink / raw)
  To: paolo.valente; +Cc: jhs, shemminger, netdev, linux-kernel, fchecconi, rizzo

From: Paolo valente <paolo.valente@unimore.it>
Date: Tue,  5 Mar 2013 19:04:56 +0100

> Split, and inserted a detailed description of both the problem and the fix
> in each patch.

Series applied, thanks.

Although two topics for possibly resolving later:

1) That 1ULL bit mask fix is quite expensive on 32-bit, it would
   probably be cheaper to test for that case using a helper function
   that nops out on 64-bit.  Although this is not so important.

2) That static inline forward declaration is ugly, better to remove
   the inline tag (let the compiler handle it) or move the function
   above all the call sites.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH BUGFIX 2/6] pkt_sched: fix the update of eligible-group sets
  2013-03-05 18:04         ` [PATCH BUGFIX 2/6] pkt_sched: fix the update of eligible-group sets Paolo valente
@ 2013-03-06 10:05           ` David Laight
  0 siblings, 0 replies; 17+ messages in thread
From: David Laight @ 2013-03-06 10:05 UTC (permalink / raw)
  To: Paolo valente, Jamal Hadi Salim, David S. Miller, shemminger
  Cc: netdev, linux-kernel, fchecconi, rizzo

> Between two invocations of make_eligible, the system virtual time may
> happen to grow enough that, in its binary representation, a bit with
> higher order than 31 flips. This happens especially with
> TSO/GSO. Before this fix, the mask used in make_eligible was computed
> as (1UL<<index_of_last_flipped_bit)-1, whose value is well defined on
> a 64-bit architecture, because index_of_flipped_bit <= 63, but is in
> general undefined on a 32-bit architecture if index_of_flipped_bit > 31.
> The fix just replaces 1UL with 1ULL.
...
> -		unsigned long mask = (1UL << fls(vslot ^ old_vslot)) - 1;
> +		unsigned long mask = (1ULL << fls(vslot ^ old_vslot)) - 1;

I'm not sure you really want to be doing 64bit shifts on 32 bit
systems just to generate ~0ul.
probably worth a conditional.

	David

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 0/2]  pkt_sched: sch_qfq: efficiency and codestyle improvements
  2013-03-06  4:50         ` [PATCH BUGFIX 0/6] pkt_sched: fix little service anomalies and possible crashes of qfq+ David Miller
@ 2013-07-10 13:46           ` Paolo Valente
  2013-07-10 13:46             ` [PATCH 1/2] pkt_sched: sch_qfq: improve efficiency of make_eligible Paolo Valente
                               ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Paolo Valente @ 2013-07-10 13:46 UTC (permalink / raw)
  To: Jamal Hadi Salim, David S. Miller, shemminger
  Cc: netdev, linux-kernel, fchecconi, rizzo, David Laight, Paolo Valente

Hi,
this patchset tries to address the two issues raised by Dave in
http://marc.info/?l=linux-kernel&m=136254542019113&w=2

The first of these issues has been also raised by
David Laight:
http://marc.info/?l=linux-kernel&m=136256446624557&w=2

> Although two topics for possibly resolving later:
> 
> 1) That 1ULL bit mask fix is quite expensive on 32-bit, it would
>     probably be cheaper to test for that case using a helper function
>     that nops out on 64-bit.  Although this is not so important.
> 
I have tried to find a simple solution, based on the properties of the mask.
It is in the first patch.

> 2) That static inline forward declaration is ugly, better to remove
>     the inline tag (let the compiler handle it) or move the function
>     above all the call sites.
> 
Done by the second patch.

Paolo Valente (2):
  pkt_sched: sch_qfq: improve efficiency of make_eligible
  pkt_sched: sch_qfq: remove forward declaration of qfq_update_agg_ts

 net/sched/sch_qfq.c | 127 ++++++++++++++++++++++++++--------------------------
 1 file changed, 63 insertions(+), 64 deletions(-)

-- 
1.8.1.2

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/2] pkt_sched: sch_qfq: improve efficiency of make_eligible
  2013-07-10 13:46           ` [PATCH 0/2] pkt_sched: sch_qfq: efficiency and codestyle improvements Paolo Valente
@ 2013-07-10 13:46             ` Paolo Valente
  2013-07-10 13:46             ` [PATCH 2/2] pkt_sched: sch_qfq: remove forward declaration of qfq_update_agg_ts Paolo Valente
  2013-07-11 20:01             ` [PATCH 0/2] pkt_sched: sch_qfq: efficiency and codestyle improvements David Miller
  2 siblings, 0 replies; 17+ messages in thread
From: Paolo Valente @ 2013-07-10 13:46 UTC (permalink / raw)
  To: Jamal Hadi Salim, David S. Miller, shemminger
  Cc: netdev, linux-kernel, fchecconi, rizzo, David Laight, Paolo Valente

In make_eligible, a mask is used to decide which groups must become eligible:
the i-th group becomes eligible only if the i-th bit of the mask (from the
right) is set. The mask is computed by left-shifting a 1 by a given number of
places, and decrementing the result.  The shift is performed on a ULL to avoid
problems in case the number of places to shift is higher than 31.  On a 32-bit
machine, this is more costly than working on an UL. This patch replaces such a
costly operation with two cheaper branches.

The trick is based on the following fact: in case of a shift of at least 32
places, the resulting mask has at least the 32 less significant bits set,
whereas the total number of groups is lower than 32.  As a consequence, in this
case it is enough to just set the 32 less significant bits of the mask with a
cheaper ~0UL. In the other case, the shift can be safely performed on a UL.

Reported-by: David S. Miller <davem@davemloft.net>
Reported-by: David Laight <David.Laight@ACULAB.COM>
Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
---
 net/sched/sch_qfq.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
index d51852b..2a0ed50 100644
--- a/net/sched/sch_qfq.c
+++ b/net/sched/sch_qfq.c
@@ -821,7 +821,14 @@ static void qfq_make_eligible(struct qfq_sched *q)
 	unsigned long old_vslot = q->oldV >> q->min_slot_shift;
 
 	if (vslot != old_vslot) {
-		unsigned long mask = (1ULL << fls(vslot ^ old_vslot)) - 1;
+		unsigned long mask;
+		int last_flip_pos = fls(vslot ^ old_vslot);
+
+		if (last_flip_pos > 31) /* higher than the number of groups */
+			mask = ~0UL;    /* make all groups eligible */
+		else
+			mask = (1UL << last_flip_pos) - 1;
+
 		qfq_move_groups(q, mask, IR, ER);
 		qfq_move_groups(q, mask, IB, EB);
 	}
-- 
1.8.1.2

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/2] pkt_sched: sch_qfq: remove forward declaration of qfq_update_agg_ts
  2013-07-10 13:46           ` [PATCH 0/2] pkt_sched: sch_qfq: efficiency and codestyle improvements Paolo Valente
  2013-07-10 13:46             ` [PATCH 1/2] pkt_sched: sch_qfq: improve efficiency of make_eligible Paolo Valente
@ 2013-07-10 13:46             ` Paolo Valente
  2013-07-11 20:01             ` [PATCH 0/2] pkt_sched: sch_qfq: efficiency and codestyle improvements David Miller
  2 siblings, 0 replies; 17+ messages in thread
From: Paolo Valente @ 2013-07-10 13:46 UTC (permalink / raw)
  To: Jamal Hadi Salim, David S. Miller, shemminger
  Cc: netdev, linux-kernel, fchecconi, rizzo, David Laight, Paolo Valente

This patch removes the forward declaration of qfq_update_agg_ts, by moving
the definition of the function above its first call. This patch also
removes a useless forward declaration of qfq_schedule_agg.

Reported-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
---
 net/sched/sch_qfq.c | 118 ++++++++++++++++++++++++----------------------------
 1 file changed, 55 insertions(+), 63 deletions(-)

diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
index 2a0ed50..0025ebe 100644
--- a/net/sched/sch_qfq.c
+++ b/net/sched/sch_qfq.c
@@ -1010,9 +1010,61 @@ static inline void charge_actual_service(struct qfq_aggregate *agg)
 	agg->F = agg->S + (u64)service_received * agg->inv_w;
 }
 
-static inline void qfq_update_agg_ts(struct qfq_sched *q,
-				     struct qfq_aggregate *agg,
-				     enum update_reason reason);
+/* Assign a reasonable start time for a new aggregate in group i.
+ * Admissible values for \hat(F) are multiples of \sigma_i
+ * no greater than V+\sigma_i . Larger values mean that
+ * we had a wraparound so we consider the timestamp to be stale.
+ *
+ * If F is not stale and F >= V then we set S = F.
+ * Otherwise we should assign S = V, but this may violate
+ * the ordering in EB (see [2]). So, if we have groups in ER,
+ * set S to the F_j of the first group j which would be blocking us.
+ * We are guaranteed not to move S backward because
+ * otherwise our group i would still be blocked.
+ */
+static void qfq_update_start(struct qfq_sched *q, struct qfq_aggregate *agg)
+{
+	unsigned long mask;
+	u64 limit, roundedF;
+	int slot_shift = agg->grp->slot_shift;
+
+	roundedF = qfq_round_down(agg->F, slot_shift);
+	limit = qfq_round_down(q->V, slot_shift) + (1ULL << slot_shift);
+
+	if (!qfq_gt(agg->F, q->V) || qfq_gt(roundedF, limit)) {
+		/* timestamp was stale */
+		mask = mask_from(q->bitmaps[ER], agg->grp->index);
+		if (mask) {
+			struct qfq_group *next = qfq_ffs(q, mask);
+			if (qfq_gt(roundedF, next->F)) {
+				if (qfq_gt(limit, next->F))
+					agg->S = next->F;
+				else /* preserve timestamp correctness */
+					agg->S = limit;
+				return;
+			}
+		}
+		agg->S = q->V;
+	} else  /* timestamp is not stale */
+		agg->S = agg->F;
+}
+
+/* Update the timestamps of agg before scheduling/rescheduling it for
+ * service.  In particular, assign to agg->F its maximum possible
+ * value, i.e., the virtual finish time with which the aggregate
+ * should be labeled if it used all its budget once in service.
+ */
+static inline void
+qfq_update_agg_ts(struct qfq_sched *q,
+		    struct qfq_aggregate *agg, enum update_reason reason)
+{
+	if (reason != requeue)
+		qfq_update_start(q, agg);
+	else /* just charge agg for the service received */
+		agg->S = agg->F;
+
+	agg->F = agg->S + (u64)agg->budgetmax * agg->inv_w;
+}
 
 static void qfq_schedule_agg(struct qfq_sched *q, struct qfq_aggregate *agg);
 
@@ -1135,66 +1187,6 @@ static struct qfq_aggregate *qfq_choose_next_agg(struct qfq_sched *q)
 	return agg;
 }
 
-/*
- * Assign a reasonable start time for a new aggregate in group i.
- * Admissible values for \hat(F) are multiples of \sigma_i
- * no greater than V+\sigma_i . Larger values mean that
- * we had a wraparound so we consider the timestamp to be stale.
- *
- * If F is not stale and F >= V then we set S = F.
- * Otherwise we should assign S = V, but this may violate
- * the ordering in EB (see [2]). So, if we have groups in ER,
- * set S to the F_j of the first group j which would be blocking us.
- * We are guaranteed not to move S backward because
- * otherwise our group i would still be blocked.
- */
-static void qfq_update_start(struct qfq_sched *q, struct qfq_aggregate *agg)
-{
-	unsigned long mask;
-	u64 limit, roundedF;
-	int slot_shift = agg->grp->slot_shift;
-
-	roundedF = qfq_round_down(agg->F, slot_shift);
-	limit = qfq_round_down(q->V, slot_shift) + (1ULL << slot_shift);
-
-	if (!qfq_gt(agg->F, q->V) || qfq_gt(roundedF, limit)) {
-		/* timestamp was stale */
-		mask = mask_from(q->bitmaps[ER], agg->grp->index);
-		if (mask) {
-			struct qfq_group *next = qfq_ffs(q, mask);
-			if (qfq_gt(roundedF, next->F)) {
-				if (qfq_gt(limit, next->F))
-					agg->S = next->F;
-				else /* preserve timestamp correctness */
-					agg->S = limit;
-				return;
-			}
-		}
-		agg->S = q->V;
-	} else  /* timestamp is not stale */
-		agg->S = agg->F;
-}
-
-/*
- * Update the timestamps of agg before scheduling/rescheduling it for
- * service.  In particular, assign to agg->F its maximum possible
- * value, i.e., the virtual finish time with which the aggregate
- * should be labeled if it used all its budget once in service.
- */
-static inline void
-qfq_update_agg_ts(struct qfq_sched *q,
-		    struct qfq_aggregate *agg, enum update_reason reason)
-{
-	if (reason != requeue)
-		qfq_update_start(q, agg);
-	else /* just charge agg for the service received */
-		agg->S = agg->F;
-
-	agg->F = agg->S + (u64)agg->budgetmax * agg->inv_w;
-}
-
-static void qfq_schedule_agg(struct qfq_sched *, struct qfq_aggregate *);
-
 static int qfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 {
 	struct qfq_sched *q = qdisc_priv(sch);
-- 
1.8.1.2

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/2] pkt_sched: sch_qfq: efficiency and codestyle improvements
  2013-07-10 13:46           ` [PATCH 0/2] pkt_sched: sch_qfq: efficiency and codestyle improvements Paolo Valente
  2013-07-10 13:46             ` [PATCH 1/2] pkt_sched: sch_qfq: improve efficiency of make_eligible Paolo Valente
  2013-07-10 13:46             ` [PATCH 2/2] pkt_sched: sch_qfq: remove forward declaration of qfq_update_agg_ts Paolo Valente
@ 2013-07-11 20:01             ` David Miller
  2 siblings, 0 replies; 17+ messages in thread
From: David Miller @ 2013-07-11 20:01 UTC (permalink / raw)
  To: paolo.valente
  Cc: jhs, shemminger, netdev, linux-kernel, fchecconi, rizzo, David.Laight

From: Paolo Valente <paolo.valente@unimore.it>
Date: Wed, 10 Jul 2013 15:46:07 +0200

> Hi,
> this patchset tries to address the two issues raised by Dave in
> http://marc.info/?l=linux-kernel&m=136254542019113&w=2
> 
> The first of these issues has been also raised by
> David Laight:
> http://marc.info/?l=linux-kernel&m=136256446624557&w=2
> 
>> Although two topics for possibly resolving later:
>> 
>> 1) That 1ULL bit mask fix is quite expensive on 32-bit, it would
>>     probably be cheaper to test for that case using a helper function
>>     that nops out on 64-bit.  Although this is not so important.
>> 
> I have tried to find a simple solution, based on the properties of the mask.
> It is in the first patch.
> 
>> 2) That static inline forward declaration is ugly, better to remove
>>     the inline tag (let the compiler handle it) or move the function
>>     above all the call sites.
>> 
> Done by the second patch.

Looks good, thanks for doing this, both applied.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2013-07-11 20:01 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-12-19 17:31 [PATCH BUGFIX] pkt_sched: fix little service anomalies and possible crashes of qfq+ Paolo valente
2012-12-26 23:13 ` David Miller
2013-02-26 17:02   ` Paolo valente
2013-02-26 22:37     ` David Miller
2013-03-05 18:04       ` [PATCH BUGFIX 0/6] " Paolo valente
2013-03-05 18:04         ` [PATCH BUGFIX 1/6] pkt_sched: properly cap timestamps in charge_actual_service Paolo valente
2013-03-05 18:04         ` [PATCH BUGFIX 2/6] pkt_sched: fix the update of eligible-group sets Paolo valente
2013-03-06 10:05           ` David Laight
2013-03-05 18:04         ` [PATCH BUGFIX 3/6] pkt_sched: serve activated aggregates immediately if the scheduler is empty Paolo valente
2013-03-05 18:05         ` [PATCH BUGFIX 4/6] pkt_sched: prevent budget from wrapping around after a dequeue Paolo valente
2013-03-05 18:05         ` [PATCH BUGFIX 5/6] pkt_sched: do not allow virtual time to jump if an aggregate is in service Paolo valente
2013-03-05 18:05         ` [PATCH BUGFIX 6/6] pkt_sched: remove a useless invocation of qfq_update_eligible Paolo valente
2013-03-06  4:50         ` [PATCH BUGFIX 0/6] pkt_sched: fix little service anomalies and possible crashes of qfq+ David Miller
2013-07-10 13:46           ` [PATCH 0/2] pkt_sched: sch_qfq: efficiency and codestyle improvements Paolo Valente
2013-07-10 13:46             ` [PATCH 1/2] pkt_sched: sch_qfq: improve efficiency of make_eligible Paolo Valente
2013-07-10 13:46             ` [PATCH 2/2] pkt_sched: sch_qfq: remove forward declaration of qfq_update_agg_ts Paolo Valente
2013-07-11 20:01             ` [PATCH 0/2] pkt_sched: sch_qfq: efficiency and codestyle improvements David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).