* [PATCH] net_sched: sch_sfq: fix allot handling
@ 2010-12-15 14:03 Eric Dumazet
  2010-12-15 16:03 ` Patrick McHardy
  0 siblings, 1 reply; 40+ messages in thread
From: Eric Dumazet @ 2010-12-15 14:03 UTC (permalink / raw)
  To: David Miller; +Cc: Patrick McHardy, netdev, Jarek Poplawski

When deploying SFQ/IFB here at work, I found the allot management in sfq
was pretty wrong; even changing allot from short to int did not help...

We should init allot for each new flow turn instead of reusing a previous
value, or else small packets can easily make allot overflow.
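
As a rough illustration of the arithmetic (a hypothetical userspace
sketch, not kernel code, assuming a simplified model where a slot keeps
receiving quantum credits while tiny packets debit only a few bytes):

	#include <stdio.h>

	int main(void)
	{
		short allot = 0;	/* sfq historically used a short */
		const int quantum = 1514;
		int round;

		for (round = 0; round < 30; round++) {
			allot += quantum;	/* old code: "+=", never reset */
			allot -= 64;		/* one small packet dequeued */
		}
		printf("allot = %d\n", allot);	/* wrapped to a negative value */
		return 0;
	}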

Before patch, I saw bursts of several packets per flow, apparently
defying the "allot 1514" limit I had on my SFQ class.

class sfq 11:1 parent 11: 
 (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 7p requeues 0 
 allot 11546 

class sfq 11:46 parent 11: 
 (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 1p requeues 0 
 allot -23873 

class sfq 11:78 parent 11: 
 (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 5p requeues 0 
 allot 11393 

After patch, better fairness among flows, the allot limit being
respected.

class sfq 11:52 parent 11: 
 (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 4p requeues 0 
 allot 1514 

class sfq 11:60 parent 11: 
 (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 3p requeues 0 
 allot -586 

class sfq 11:6b parent 11: 
 (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 3p requeues 0 
 allot -586 

class sfq 11:71 parent 11: 
 (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 3p requeues 0 
 allot 1514 


Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/sched/sch_sfq.c |   11 +++++------
 1 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index 3cf478d..8c8a190 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -270,7 +270,7 @@ static unsigned int sfq_drop(struct Qdisc *sch)
 		/* It is difficult to believe, but ALL THE SLOTS HAVE LENGTH 1. */
 		d = q->next[q->tail];
 		q->next[q->tail] = q->next[d];
-		q->allot[q->next[d]] += q->quantum;
+		q->allot[q->next[d]] = q->quantum;
 		skb = q->qs[d].prev;
 		len = qdisc_pkt_len(skb);
 		__skb_unlink(skb, &q->qs[d]);
@@ -321,14 +321,13 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	sfq_inc(q, x);
 	if (q->qs[x].qlen == 1) {		/* The flow is new */
 		if (q->tail == SFQ_DEPTH) {	/* It is the first flow */
-			q->tail = x;
 			q->next[x] = x;
-			q->allot[x] = q->quantum;
 		} else {
 			q->next[x] = q->next[q->tail];
 			q->next[q->tail] = x;
-			q->tail = x;
 		}
+		q->tail = x;
+		q->allot[x] = q->quantum;
 	}
 	if (++sch->q.qlen <= q->limit) {
 		sch->bstats.bytes += qdisc_pkt_len(skb);
@@ -382,11 +381,11 @@ sfq_dequeue(struct Qdisc *sch)
 			return skb;
 		}
 		q->next[q->tail] = a;
-		q->allot[a] += q->quantum;
+		q->allot[a] = q->quantum;
 	} else if ((q->allot[a] -= qdisc_pkt_len(skb)) <= 0) {
 		q->tail = a;
 		a = q->next[a];
-		q->allot[a] += q->quantum;
+		q->allot[a] = q->quantum;
 	}
 	return skb;
 }




* Re: [PATCH] net_sched: sch_sfq: fix allot handling
  2010-12-15 14:03 [PATCH] net_sched: sch_sfq: fix allot handling Eric Dumazet
@ 2010-12-15 16:03 ` Patrick McHardy
  2010-12-15 16:27   ` Eric Dumazet
  0 siblings, 1 reply; 40+ messages in thread
From: Patrick McHardy @ 2010-12-15 16:03 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, Jarek Poplawski

On 15.12.2010 15:03, Eric Dumazet wrote:
> When deploying SFQ/IFB here at work, I found the allot management in sfq
> was pretty wrong; even changing allot from short to int did not help...
> 
> We should init allot for each new flow turn instead of reusing a previous
> value, or else small packets can easily make allot overflow.
> 
> Before patch, I saw bursts of several packets per flow, apparently
> defying the "allot 1514" limit I had on my SFQ class.
> 
> class sfq 11:1 parent 11: 
>  (dropped 0, overlimits 0 requeues 0) 
>  backlog 0b 7p requeues 0 
>  allot 11546 
> 
> class sfq 11:46 parent 11: 
>  (dropped 0, overlimits 0 requeues 0) 
>  backlog 0b 1p requeues 0 
>  allot -23873 
> 
> class sfq 11:78 parent 11: 
>  (dropped 0, overlimits 0 requeues 0) 
>  backlog 0b 5p requeues 0 
>  allot 11393 

These values definitely look wrong.

> diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
> index 3cf478d..8c8a190 100644
> --- a/net/sched/sch_sfq.c
> +++ b/net/sched/sch_sfq.c
> @@ -270,7 +270,7 @@ static unsigned int sfq_drop(struct Qdisc *sch)
>  		/* It is difficult to believe, but ALL THE SLOTS HAVE LENGTH 1. */
>  		d = q->next[q->tail];
>  		q->next[q->tail] = q->next[d];
> -		q->allot[q->next[d]] += q->quantum;
> +		q->allot[q->next[d]] = q->quantum;
>  		skb = q->qs[d].prev;
>  		len = qdisc_pkt_len(skb);
>  		__skb_unlink(skb, &q->qs[d]);

I'm not sure about this part, but let's ignore that for now since it
shouldn't affect your testcase unless you're using CBQ.

> @@ -321,14 +321,13 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
>  	sfq_inc(q, x);
>  	if (q->qs[x].qlen == 1) {		/* The flow is new */
>  		if (q->tail == SFQ_DEPTH) {	/* It is the first flow */
> -			q->tail = x;
>  			q->next[x] = x;
> -			q->allot[x] = q->quantum;
>  		} else {
>  			q->next[x] = q->next[q->tail];
>  			q->next[q->tail] = x;
> -			q->tail = x;
>  		}
> +		q->tail = x;
> +		q->allot[x] = q->quantum;
>  	}

This looks correct; for new flows, allot should be initialized from
scratch.

>  	if (++sch->q.qlen <= q->limit) {
>  		sch->bstats.bytes += qdisc_pkt_len(skb);
> @@ -382,11 +381,11 @@ sfq_dequeue(struct Qdisc *sch)
>  			return skb;
>  		}
>  		q->next[q->tail] = a;
> -		q->allot[a] += q->quantum;
> +		q->allot[a] = q->quantum;

The allot initialization doesn't seem necessary at all anymore
now that you're unconditionally reinitializing allot for flows that
become active in sfq_enqueue().

>  	} else if ((q->allot[a] -= qdisc_pkt_len(skb)) <= 0) {
>  		q->tail = a;
>  		a = q->next[a];
> -		q->allot[a] += q->quantum;
> +		q->allot[a] = q->quantum;

This seems to break long-term fairness for active flows by no
longer accounting for overshooting the allotment in the next
round.

I think either the change in sfq_enqueue() or the first change
in sfq_dequeue() should be enough to fix the problem you're seeing.
Basically what needs to be done is initialize allot once from
scratch when the flow becomes active, then add one quantum per
round while it stays active.
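
In rough pseudocode, the intended accounting would be (a sketch of the
idea, not the actual patch):

	/* flow becomes active (sfq_enqueue) */
	allot[x] = quantum;		/* init from scratch */

	/* dequeue a packet from the current flow (sfq_dequeue) */
	allot[a] -= pkt_len;
	if (allot[a] <= 0) {
		allot[a] += quantum;	/* one quantum per round; "+=" keeps
					 * the overshoot accounted for */
		/* move on to the next active flow */
	}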


* Re: [PATCH] net_sched: sch_sfq: fix allot handling
  2010-12-15 16:03 ` Patrick McHardy
@ 2010-12-15 16:27   ` Eric Dumazet
  2010-12-15 16:40     ` [PATCH v2] " Eric Dumazet
  0 siblings, 1 reply; 40+ messages in thread
From: Eric Dumazet @ 2010-12-15 16:27 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: David Miller, netdev, Jarek Poplawski

On Wednesday 15 December 2010 at 17:03 +0100, Patrick McHardy wrote:
> On 15.12.2010 15:03, Eric Dumazet wrote:
> > When deploying SFQ/IFB here at work, I found the allot management in sfq
> > was pretty wrong; even changing allot from short to int did not help...
> > 
> > We should init allot for each new flow turn instead of reusing a previous
> > value, or else small packets can easily make allot overflow.
> > 
> > Before patch, I saw bursts of several packets per flow, apparently
> > defying the "allot 1514" limit I had on my SFQ class.
> > 
> > class sfq 11:1 parent 11: 
> >  (dropped 0, overlimits 0 requeues 0) 
> >  backlog 0b 7p requeues 0 
> >  allot 11546 
> > 
> > class sfq 11:46 parent 11: 
> >  (dropped 0, overlimits 0 requeues 0) 
> >  backlog 0b 1p requeues 0 
> >  allot -23873 
> > 
> > class sfq 11:78 parent 11: 
> >  (dropped 0, overlimits 0 requeues 0) 
> >  backlog 0b 5p requeues 0 
> >  allot 11393 
> 
> These values definitely look wrong.
> 
> > diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
> > index 3cf478d..8c8a190 100644
> > --- a/net/sched/sch_sfq.c
> > +++ b/net/sched/sch_sfq.c
> > @@ -270,7 +270,7 @@ static unsigned int sfq_drop(struct Qdisc *sch)
> >  		/* It is difficult to believe, but ALL THE SLOTS HAVE LENGTH 1. */
> >  		d = q->next[q->tail];
> >  		q->next[q->tail] = q->next[d];
> > -		q->allot[q->next[d]] += q->quantum;
> > +		q->allot[q->next[d]] = q->quantum;
> >  		skb = q->qs[d].prev;
> >  		len = qdisc_pkt_len(skb);
> >  		__skb_unlink(skb, &q->qs[d]);
> 
> I'm not sure about this part, but let's ignore that for now since it
> shouldn't affect your testcase unless you're using CBQ.
> 




> > @@ -321,14 +321,13 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
> >  	sfq_inc(q, x);
> >  	if (q->qs[x].qlen == 1) {		/* The flow is new */
> >  		if (q->tail == SFQ_DEPTH) {	/* It is the first flow */
> > -			q->tail = x;
> >  			q->next[x] = x;
> > -			q->allot[x] = q->quantum;
> >  		} else {
> >  			q->next[x] = q->next[q->tail];
> >  			q->next[q->tail] = x;
> > -			q->tail = x;
> >  		}
> > +		q->tail = x;
> > +		q->allot[x] = q->quantum;
> >  	}
> 
> This looks correct; for new flows, allot should be initialized from
> scratch.
> 
> >  	if (++sch->q.qlen <= q->limit) {
> >  		sch->bstats.bytes += qdisc_pkt_len(skb);
> > @@ -382,11 +381,11 @@ sfq_dequeue(struct Qdisc *sch)
> >  			return skb;
> >  		}
> >  		q->next[q->tail] = a;
> > -		q->allot[a] += q->quantum;
> > +		q->allot[a] = q->quantum;
> 
> The allot initialization doesn't seem necessary at all anymore
> now that you're unconditionally reinitializing allot for flows that
> become active in sfq_enqueue().
> 



> >  	} else if ((q->allot[a] -= qdisc_pkt_len(skb)) <= 0) {
> >  		q->tail = a;
> >  		a = q->next[a];
> > -		q->allot[a] += q->quantum;
> > +		q->allot[a] = q->quantum;
> 
> This seems to break long-term fairness for active flows by no
> longer accounting for overshooting the allotment in the next
> round.
> 
> I think either the change in sfq_enqueue() or the first change
> in sfq_dequeue() should be enough to fix the problem you're seeing.
> Basically what needs to be done is initialize allot once from
> scratch when the flow becomes active, then add one quantum per
> round while it stays active.

Hmm, you may be right, thanks a lot for reviewing!

I noticed that with a normal quantum (1514), my SFQ setup was sending two
full frames per flow after my patch, so I was about to prepare a new
version ;)

I'll post a v2 shortly.

Thanks




* [PATCH v2] net_sched: sch_sfq: fix allot handling
  2010-12-15 16:27   ` Eric Dumazet
@ 2010-12-15 16:40     ` Eric Dumazet
  2010-12-15 16:43       ` Patrick McHardy
  0 siblings, 1 reply; 40+ messages in thread
From: Eric Dumazet @ 2010-12-15 16:40 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: David Miller, netdev, Jarek Poplawski

On Wednesday 15 December 2010 at 17:27 +0100, Eric Dumazet wrote:

> Hmm, you may be right, thanks a lot for reviewing!
> 
> I noticed that with a normal quantum (1514), my SFQ setup was sending two
> full frames per flow after my patch, so I was about to prepare a new
> version ;)
> 
> I'll post a v2 shortly.

Indeed, the missing init in sfq_enqueue() is enough to solve the
problem:

[PATCH v2] net_sched: sch_sfq: fix allot handling

When deploying SFQ/IFB here at work, I found the allot management in sfq
was pretty wrong; even changing allot from short to int did not help...

We should init allot for each new flow instead of reusing a previous value
found in the slot.

Before patch, I saw bursts of several packets per flow, apparently
defying the default "quantum 1514" limit I had on my SFQ class.

class sfq 11:1 parent 11: 
 (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 7p requeues 0 
 allot 11546 

class sfq 11:46 parent 11: 
 (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 1p requeues 0 
 allot -23873 

class sfq 11:78 parent 11: 
 (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 5p requeues 0 
 allot 11393 

After patch, better fairness among flows, the allot limit being
respected.

class sfq 11:2e4 parent 11: 
 (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 4p requeues 0 
 allot -1166 

class sfq 11:2f7 parent 11: 
 (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 3p requeues 0 
 allot -1166 

class sfq 11:313 parent 11: 
 (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 3p requeues 0 
 allot -1220 

class sfq 11:335 parent 11: 
 (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 4p requeues 0 
 allot -1166 


Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/sched/sch_sfq.c |    5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index 3cf478d..065a2a5 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -321,14 +321,13 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	sfq_inc(q, x);
 	if (q->qs[x].qlen == 1) {		/* The flow is new */
 		if (q->tail == SFQ_DEPTH) {	/* It is the first flow */
-			q->tail = x;
 			q->next[x] = x;
-			q->allot[x] = q->quantum;
 		} else {
 			q->next[x] = q->next[q->tail];
 			q->next[q->tail] = x;
-			q->tail = x;
 		}
+		q->tail = x;
+		q->allot[x] = q->quantum;
 	}
 	if (++sch->q.qlen <= q->limit) {
 		sch->bstats.bytes += qdisc_pkt_len(skb);




* Re: [PATCH v2] net_sched: sch_sfq: fix allot handling
  2010-12-15 16:40     ` [PATCH v2] " Eric Dumazet
@ 2010-12-15 16:43       ` Patrick McHardy
  2010-12-15 16:55         ` Eric Dumazet
  0 siblings, 1 reply; 40+ messages in thread
From: Patrick McHardy @ 2010-12-15 16:43 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, Jarek Poplawski

On 15.12.2010 17:40, Eric Dumazet wrote:
> On Wednesday 15 December 2010 at 17:27 +0100, Eric Dumazet wrote:
> 
>> Hmm, you may be right, thanks a lot for reviewing!
>>
>> I noticed that with a normal quantum (1514), my SFQ setup was sending two
>> full frames per flow after my patch, so I was about to prepare a new
>> version ;)
>>
>> I'll post a v2 shortly.
> 
> Indeed, the missing init in sfq_enqueue() is enough to solve the
> problem:
> 
> [PATCH v2] net_sched: sch_sfq: fix allot handling
> 
> When deploying SFQ/IFB here at work, I found the allot management in sfq
> was pretty wrong; even changing allot from short to int did not help...
> 
> We should init allot for each new flow instead of reusing a previous value
> found in the slot.
> 
> Before patch, I saw bursts of several packets per flow, apparently
> defying the default "quantum 1514" limit I had on my SFQ class.
> 
> diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
> index 3cf478d..065a2a5 100644
> --- a/net/sched/sch_sfq.c
> +++ b/net/sched/sch_sfq.c
> @@ -321,14 +321,13 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
>  	sfq_inc(q, x);
>  	if (q->qs[x].qlen == 1) {		/* The flow is new */
>  		if (q->tail == SFQ_DEPTH) {	/* It is the first flow */
> -			q->tail = x;
>  			q->next[x] = x;
> -			q->allot[x] = q->quantum;
>  		} else {
>  			q->next[x] = q->next[q->tail];
>  			q->next[q->tail] = x;
> -			q->tail = x;
>  		}
> +		q->tail = x;
> +		q->allot[x] = q->quantum;
>  	}

Now we could remove the allot increase in sfq_dequeue for
the case where the flow becomes inactive. It is incorrect
anyway.


* Re: [PATCH v2] net_sched: sch_sfq: fix allot handling
  2010-12-15 16:43       ` Patrick McHardy
@ 2010-12-15 16:55         ` Eric Dumazet
  2010-12-15 17:03           ` Patrick McHardy
  0 siblings, 1 reply; 40+ messages in thread
From: Eric Dumazet @ 2010-12-15 16:55 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: David Miller, netdev, Jarek Poplawski

On Wednesday 15 December 2010 at 17:43 +0100, Patrick McHardy wrote:

> Now we could remove the allot increase in sfq_dequeue for
> the case where the flow becomes inactive. It is incorrect
> anyway.

Hmm, we increase the allot for the next slot, not for the slot that is
now empty.


        /* Is the slot empty? */
        if (q->qs[a].qlen == 0) {
                q->ht[q->hash[a]] = SFQ_DEPTH;
                a = q->next[a]; // a = next slot index
                if (a == old_a) {
                        q->tail = SFQ_DEPTH;
                        return skb;
                }
                q->next[q->tail] = a;
                q->allot[a] += q->quantum;
// HERE, q->allot[a] is for the next slot: we give it its quantum for
// being activated

        } else if ((q->allot[a] -= qdisc_pkt_len(skb)) <= 0) {

Maybe we should rename (a / old_a) to (a / next_a) to avoid confusion :)

I was thinking of allowing more packets per SFQ (but keeping the 126
active flows limit), what do you think?





* Re: [PATCH v2] net_sched: sch_sfq: fix allot handling
  2010-12-15 16:55         ` Eric Dumazet
@ 2010-12-15 17:03           ` Patrick McHardy
  2010-12-15 17:09             ` Eric Dumazet
  2010-12-16 13:08             ` [PATCH v2] " Eric Dumazet
  0 siblings, 2 replies; 40+ messages in thread
From: Patrick McHardy @ 2010-12-15 17:03 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, Jarek Poplawski

On 15.12.2010 17:55, Eric Dumazet wrote:
> On Wednesday 15 December 2010 at 17:43 +0100, Patrick McHardy wrote:
> 
>> Now we could remove the allot increase in sfq_dequeue for
>> the case where the flow becomes inactive. It is incorrect
>> anyway.
> 
> Hmm, we increase the allot for the next slot, not for the slot that is
> now empty.
> 
> 
>         /* Is the slot empty? */
>         if (q->qs[a].qlen == 0) {
>                 q->ht[q->hash[a]] = SFQ_DEPTH;
>                 a = q->next[a]; // a = next slot index
>                 if (a == old_a) {
>                         q->tail = SFQ_DEPTH;
>                         return skb;
>                 }
>                 q->next[q->tail] = a;
>                 q->allot[a] += q->quantum;
> // HERE, q->allot[a] is for the next slot: we give it its quantum for
> // being activated

Right, that's odd. It shouldn't be necessary anymore though since
now we initialize allot in sfq_enqueue() for all new flows and
increase allotment for all active flows once per round in sfq_dequeue().
The above code causes a second increase for the flow following a flow
which went inactive.
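
A hypothetical trace of the double credit, with quantum = 1514 (slot
labels are illustrative, not taken from the patch):

	/* sfq_enqueue(): flow B becomes active */
	allot[B] = 1514;	/* init from scratch */
	/* sfq_dequeue(): flow A empties, B is next in the round */
	allot[B] += 1514;	/* second credit: B may now send two
				 * full-size frames in a row */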

> 
>         } else if ((q->allot[a] -= qdisc_pkt_len(skb)) <= 0) {
> 
> Maybe we should rename (a / old_a) to (a / next_a) to avoid confusion :)

That might have made things clearer :)

> I was thinking of allowing more packets per SFQ (but keeping the 126
> active flows limit), what do you think?

I keep forgetting why this limit exists, let me try to figure
it out once more :)


* Re: [PATCH v2] net_sched: sch_sfq: fix allot handling
  2010-12-15 17:03           ` Patrick McHardy
@ 2010-12-15 17:09             ` Eric Dumazet
  2010-12-15 17:21               ` Patrick McHardy
  2010-12-16 13:08             ` [PATCH v2] " Eric Dumazet
  1 sibling, 1 reply; 40+ messages in thread
From: Eric Dumazet @ 2010-12-15 17:09 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: David Miller, netdev, Jarek Poplawski

On Wednesday 15 December 2010 at 18:03 +0100, Patrick McHardy wrote:

> Right, that's odd. It shouldn't be necessary anymore though since
> now we initialize allot in sfq_enqueue() for all new flows and
> increase allotment for all active flows once per round in sfq_dequeue().
> The above code causes a second increase for the flow following a flow
> which went inactive.

Well, we do this in three places. Each time we 'select' a flow as the
next packet provider, we increase its allot by quantum.

We could change this, adding quantum to the current slot when its allot
becomes negative (and selecting the next slot for the next round).

This is basically what my v1 was doing ;)
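
In code, the end-of-allotment branch of sfq_dequeue() would then look
something like this (a sketch of the idea):

	} else if ((q->allot[a] -= qdisc_pkt_len(skb)) <= 0) {
		q->allot[a] += q->quantum;	/* credit the flow that just
						 * used up its allotment */
		q->tail = a;			/* its successor starts the
						 * next round */
	}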





* Re: [PATCH v2] net_sched: sch_sfq: fix allot handling
  2010-12-15 17:09             ` Eric Dumazet
@ 2010-12-15 17:21               ` Patrick McHardy
  2010-12-15 17:30                 ` [PATCH v3] " Eric Dumazet
  0 siblings, 1 reply; 40+ messages in thread
From: Patrick McHardy @ 2010-12-15 17:21 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, Jarek Poplawski

On 15.12.2010 18:09, Eric Dumazet wrote:
> On Wednesday 15 December 2010 at 18:03 +0100, Patrick McHardy wrote:
> 
>> Right, that's odd. It shouldn't be necessary anymore though since
>> now we initialize allot in sfq_enqueue() for all new flows and
>> increase allotment for all active flows once per round in sfq_dequeue().
>> The above code causes a second increase for the flow following a flow
>> which went inactive.
> 
> Well, we do this in three places. Each time we 'select' a flow as the
> next packet provider, we increase its allot by quantum.
> 
> We could change this, adding quantum to the current slot when its allot
> becomes negative (and selecting the next slot for the next round).

Right, I again missed that 'a' refers to the next flow in the second
condition in sfq_dequeue().

I have to run now, will have another look at this tomorrow.


* [PATCH v3] net_sched: sch_sfq: fix allot handling
  2010-12-15 17:21               ` Patrick McHardy
@ 2010-12-15 17:30                 ` Eric Dumazet
  2010-12-15 18:18                   ` [PATCH net-next-2.6] net_sched: sch_sfq: add backlog info in sfq_dump_class_stats() Eric Dumazet
  2010-12-20 21:18                   ` [PATCH v3] net_sched: sch_sfq: fix allot handling David Miller
  0 siblings, 2 replies; 40+ messages in thread
From: Eric Dumazet @ 2010-12-15 17:30 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: David Miller, netdev, Jarek Poplawski

On Wednesday 15 December 2010 at 18:21 +0100, Patrick McHardy wrote:
> On 15.12.2010 18:09, Eric Dumazet wrote:
> > On Wednesday 15 December 2010 at 18:03 +0100, Patrick McHardy wrote:
> > 
> >> Right, that's odd. It shouldn't be necessary anymore though since
> >> now we initialize allot in sfq_enqueue() for all new flows and
> >> increase allotment for all active flows once per round in sfq_dequeue().
> >> The above code causes a second increase for the flow following a flow
> >> which went inactive.
> > 
> > Well, we do this in three places. Each time we 'select' a flow as the
> > next packet provider, we increase its allot by quantum.
> > 
> > We could change this, adding quantum to the current slot when its allot
> > becomes negative (and selecting the next slot for the next round).
> 
> Right, I again missed that 'a' refers to the next flow in the second
> condition in sfq_dequeue().
> 
> I have to run now, will have another look at this tomorrow.

Here is v3: now we add a quantum only when the current flow has consumed
all its allot, not "in advance for the next slot". This should use less
CPU too.

Thanks

[PATCH v3] net_sched: sch_sfq: fix allot handling

When deploying SFQ/IFB here at work, I found the allot management in sfq
was pretty wrong; even changing allot from short to int did not help...

We should init allot for each new flow instead of reusing a previous value
found in the slot.

Before patch, I saw bursts of several packets per flow, apparently
defying the default "quantum 1514" limit I had on my SFQ class.

class sfq 11:1 parent 11: 
 (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 7p requeues 0 
 allot 11546 

class sfq 11:46 parent 11: 
 (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 1p requeues 0 
 allot -23873 

class sfq 11:78 parent 11: 
 (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 5p requeues 0 
 allot 11393 

After patch, better fairness among flows, the allot limit being
respected and allot staying positive:

class sfq 11:e parent 11: 
 (dropped 0, overlimits 0 requeues 86) 
 backlog 0b 3p requeues 86 
 allot 596 

class sfq 11:94 parent 11: 
 (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 3p requeues 0 
 allot 1468 

class sfq 11:a4 parent 11: 
 (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 4p requeues 0 
 allot 650 

class sfq 11:bb parent 11: 
 (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 3p requeues 0 
 allot 596 

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/sched/sch_sfq.c |   20 ++++++++------------
 1 files changed, 8 insertions(+), 12 deletions(-)

diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index 3cf478d..7150705 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -270,7 +270,6 @@ static unsigned int sfq_drop(struct Qdisc *sch)
 		/* It is difficult to believe, but ALL THE SLOTS HAVE LENGTH 1. */
 		d = q->next[q->tail];
 		q->next[q->tail] = q->next[d];
-		q->allot[q->next[d]] += q->quantum;
 		skb = q->qs[d].prev;
 		len = qdisc_pkt_len(skb);
 		__skb_unlink(skb, &q->qs[d]);
@@ -321,14 +320,13 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	sfq_inc(q, x);
 	if (q->qs[x].qlen == 1) {		/* The flow is new */
 		if (q->tail == SFQ_DEPTH) {	/* It is the first flow */
-			q->tail = x;
 			q->next[x] = x;
-			q->allot[x] = q->quantum;
 		} else {
 			q->next[x] = q->next[q->tail];
 			q->next[q->tail] = x;
-			q->tail = x;
 		}
+		q->tail = x;
+		q->allot[x] = q->quantum;
 	}
 	if (++sch->q.qlen <= q->limit) {
 		sch->bstats.bytes += qdisc_pkt_len(skb);
@@ -359,13 +357,13 @@ sfq_dequeue(struct Qdisc *sch)
 {
 	struct sfq_sched_data *q = qdisc_priv(sch);
 	struct sk_buff *skb;
-	sfq_index a, old_a;
+	sfq_index a, next_a;
 
 	/* No active slots */
 	if (q->tail == SFQ_DEPTH)
 		return NULL;
 
-	a = old_a = q->next[q->tail];
+	a = q->next[q->tail];
 
 	/* Grab packet */
 	skb = __skb_dequeue(&q->qs[a]);
@@ -376,17 +374,15 @@ sfq_dequeue(struct Qdisc *sch)
 	/* Is the slot empty? */
 	if (q->qs[a].qlen == 0) {
 		q->ht[q->hash[a]] = SFQ_DEPTH;
-		a = q->next[a];
-		if (a == old_a) {
+		next_a = q->next[a];
+		if (a == next_a) {
 			q->tail = SFQ_DEPTH;
 			return skb;
 		}
-		q->next[q->tail] = a;
-		q->allot[a] += q->quantum;
+		q->next[q->tail] = next_a;
 	} else if ((q->allot[a] -= qdisc_pkt_len(skb)) <= 0) {
-		q->tail = a;
-		a = q->next[a];
 		q->allot[a] += q->quantum;
+		q->tail = a;
 	}
 	return skb;
 }




* [PATCH net-next-2.6] net_sched: sch_sfq: add backlog info in sfq_dump_class_stats()
  2010-12-15 17:30                 ` [PATCH v3] " Eric Dumazet
@ 2010-12-15 18:18                   ` Eric Dumazet
  2010-12-15 19:10                     ` Eric Dumazet
                                       ` (2 more replies)
  2010-12-20 21:18                   ` [PATCH v3] net_sched: sch_sfq: fix allot handling David Miller
  1 sibling, 3 replies; 40+ messages in thread
From: Eric Dumazet @ 2010-12-15 18:18 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Jarek Poplawski, Patrick McHardy

We currently return, for each active SFQ slot, the number of packets in
its queue. We can also report the number of bytes accounted for by these
packets.

tc -s class show dev ifb0

Before patch :

class sfq 11:3d9 parent 11: 
 (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 3p requeues 0 
 allot 1266 

After patch :

class sfq 11:3e4 parent 11: 
 (dropped 0, overlimits 0 requeues 0) 
 backlog 4380b 3p requeues 0 
 allot 1212 


Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/sched/sch_sfq.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index 3cf478d..cb331de 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -548,8 +548,13 @@ static int sfq_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 {
 	struct sfq_sched_data *q = qdisc_priv(sch);
 	sfq_index idx = q->ht[cl-1];
-	struct gnet_stats_queue qs = { .qlen = q->qs[idx].qlen };
+	struct sk_buff_head *list = &q->qs[idx];
+	struct gnet_stats_queue qs = { .qlen = list->qlen };
 	struct tc_sfq_xstats xstats = { .allot = q->allot[idx] };
+	struct sk_buff *skb;
+
+	skb_queue_walk(list, skb)
+		qs.backlog += qdisc_pkt_len(skb);
 
 	if (gnet_stats_copy_queue(d, &qs) < 0)
 		return -1;




* Re: [PATCH net-next-2.6] net_sched: sch_sfq: add backlog info in sfq_dump_class_stats()
  2010-12-15 18:18                   ` [PATCH net-next-2.6] net_sched: sch_sfq: add backlog info in sfq_dump_class_stats() Eric Dumazet
@ 2010-12-15 19:10                     ` Eric Dumazet
  2010-12-16  8:16                     ` Jarek Poplawski
  2010-12-20 21:14                     ` David Miller
  2 siblings, 0 replies; 40+ messages in thread
From: Eric Dumazet @ 2010-12-15 19:10 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Jarek Poplawski, Patrick McHardy

On Wednesday 15 December 2010 at 19:18 +0100, Eric Dumazet wrote:
> We currently return, for each active SFQ slot, the number of packets in
> its queue. We can also report the number of bytes accounted for by these
> packets.
> 
> tc -s class show dev ifb0
> 
> Before patch :
> 
> class sfq 11:3d9 parent 11: 
>  (dropped 0, overlimits 0 requeues 0) 
>  backlog 0b 3p requeues 0 
>  allot 1266 
> 
> After patch :
> 
> class sfq 11:3e4 parent 11: 
>  (dropped 0, overlimits 0 requeues 0) 
>  backlog 4380b 3p requeues 0 
>  allot 1212 
> 
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---
>  net/sched/sch_sfq.c |    7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
> index 3cf478d..cb331de 100644
> --- a/net/sched/sch_sfq.c
> +++ b/net/sched/sch_sfq.c
> @@ -548,8 +548,13 @@ static int sfq_dump_class_stats(struct Qdisc *sch, unsigned long cl,
>  {
>  	struct sfq_sched_data *q = qdisc_priv(sch);
>  	sfq_index idx = q->ht[cl-1];
> -	struct gnet_stats_queue qs = { .qlen = q->qs[idx].qlen };
> +	struct sk_buff_head *list = &q->qs[idx];
> +	struct gnet_stats_queue qs = { .qlen = list->qlen };
>  	struct tc_sfq_xstats xstats = { .allot = q->allot[idx] };
> +	struct sk_buff *skb;
> +
> +	skb_queue_walk(list, skb)
> +		qs.backlog += qdisc_pkt_len(skb);
>  
>  	if (gnet_stats_copy_queue(d, &qs) < 0)
>  		return -1;
> 


By the way, I could not find out how to make "tc -s class show dev ifb0"
report correct backlog information on the parent class itself:

Here we can see 126p, but 0b in backlog:

class cbq 1:11 parent 1:1 leaf 11: rate 50000Kbit cell 8b mpu 64b (bounded) prio 2/2 weight 50000Kbit allot 2250b 
level 0 ewma 5 avpkt 1500b maxidle 0us 
 Sent 330418440 bytes 226314 pkt (dropped 280142, overlimits 689534 requeues 0) 
 rate 52200Kbit 4469pps backlog 0b 126p requeues 0 
  borrowed 0 overactions 182814 avgidle -3363 undertime 2812

The qdisc itself is OK:

qdisc sfq 11: parent 1:11 limit 127p quantum 1514b flows 127/1024 perturb 60sec 
 Sent 1712372746 bytes 1172859 pkt (dropped 1451945, overlimits 0 requeues 0) 
 rate 52287Kbit 4477pps backlog 185420b 127p requeues 0 




* Re: [PATCH net-next-2.6] net_sched: sch_sfq: add backlog info in sfq_dump_class_stats()
  2010-12-15 18:18                   ` [PATCH net-next-2.6] net_sched: sch_sfq: add backlog info in sfq_dump_class_stats() Eric Dumazet
  2010-12-15 19:10                     ` Eric Dumazet
@ 2010-12-16  8:16                     ` Jarek Poplawski
  2010-12-16 10:18                       ` [PATCH v2 " Eric Dumazet
  2010-12-16 11:03                       ` [PATCH " Eric Dumazet
  2010-12-20 21:14                     ` David Miller
  2 siblings, 2 replies; 40+ messages in thread
From: Jarek Poplawski @ 2010-12-16  8:16 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, Patrick McHardy

On 2010-12-15 19:18, Eric Dumazet wrote:
> We currently return, for each active SFQ slot, the number of packets in
> its queue. We can also report the number of bytes accounted for by these
> packets.
> 
> tc -s class show dev ifb0
> 
> Before patch :
> 
> class sfq 11:3d9 parent 11: 
>  (dropped 0, overlimits 0 requeues 0) 
>  backlog 0b 3p requeues 0 
>  allot 1266 
> 
> After patch :
> 
> class sfq 11:3e4 parent 11: 
>  (dropped 0, overlimits 0 requeues 0) 
>  backlog 4380b 3p requeues 0 
>  allot 1212 
> 
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---
>  net/sched/sch_sfq.c |    7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
> index 3cf478d..cb331de 100644
> --- a/net/sched/sch_sfq.c
> +++ b/net/sched/sch_sfq.c
> @@ -548,8 +548,13 @@ static int sfq_dump_class_stats(struct Qdisc *sch, unsigned long cl,
>  {
>  	struct sfq_sched_data *q = qdisc_priv(sch);
>  	sfq_index idx = q->ht[cl-1];
> -	struct gnet_stats_queue qs = { .qlen = q->qs[idx].qlen };
> +	struct sk_buff_head *list = &q->qs[idx];
> +	struct gnet_stats_queue qs = { .qlen = list->qlen };
>  	struct tc_sfq_xstats xstats = { .allot = q->allot[idx] };
> +	struct sk_buff *skb;
> +
> +	skb_queue_walk(list, skb)
> +		qs.backlog += qdisc_pkt_len(skb);

I don't think you can walk this list without the qdisc lock.

Jarek P.


* [PATCH v2 net-next-2.6] net_sched: sch_sfq: add backlog info in sfq_dump_class_stats()
  2010-12-16  8:16                     ` Jarek Poplawski
@ 2010-12-16 10:18                       ` Eric Dumazet
  2010-12-16 11:03                       ` [PATCH " Eric Dumazet
  1 sibling, 0 replies; 40+ messages in thread
From: Eric Dumazet @ 2010-12-16 10:18 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: David Miller, netdev, Patrick McHardy

On Thursday 16 December 2010 at 08:16 +0000, Jarek Poplawski wrote:

> I don't think you can walk this list without the qdisc lock.
> 

I assumed that was already the case, but I did not check.


Me confused...

If I use the following patch, I get a recursive lock:

 net/sched/sch_sfq.c |   17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index 3cf478d..a2cde03 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -547,9 +547,20 @@ static int sfq_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 				struct gnet_dump *d)
 {
 	struct sfq_sched_data *q = qdisc_priv(sch);
-	sfq_index idx = q->ht[cl-1];
-	struct gnet_stats_queue qs = { .qlen = q->qs[idx].qlen };
-	struct tc_sfq_xstats xstats = { .allot = q->allot[idx] };
+	sfq_index idx;
+	struct gnet_stats_queue qs = { 0 };
+	struct tc_sfq_xstats xstats = { 0 };
+	struct sk_buff_head *list;
+	struct sk_buff *skb;
+
+	spin_lock_bh(qdisc_root_sleeping_lock(sch));
+	idx = q->ht[cl - 1];
+	list = &q->qs[idx];
+	xstats.allot = q->allot[idx];
+	qs.qlen = list->qlen;
+	skb_queue_walk(list, skb)
+		qs.backlog += qdisc_pkt_len(skb);
+	spin_unlock_bh(qdisc_root_sleeping_lock(sch));
 
 	if (gnet_stats_copy_queue(d, &qs) < 0)
 		return -1;


Dec 16 10:49:34 edumdev kernel: [  616.452080] sch->qstats.backlog=185420
Dec 16 10:49:34 edumdev kernel: [  616.452146] 
Dec 16 10:49:34 edumdev kernel: [  616.452147] =============================================
Dec 16 10:49:34 edumdev kernel: [  616.452265] [ INFO: possible recursive locking detected ]
Dec 16 10:49:34 edumdev kernel: [  616.452329] 2.6.37-rc1-01820-g4be8976-dirty #456
Dec 16 10:49:34 edumdev kernel: [  616.452425] ---------------------------------------------
Dec 16 10:49:34 edumdev kernel: [  616.452489] tc/8747 is trying to acquire lock:
Dec 16 10:49:34 edumdev kernel: [  616.452550]  (&qdisc_tx_lock){+.-...}, at: [<ffffffffa01331d5>] sfq_dump_class_stats+0x65/0x160 [sch_sfq]
Dec 16 10:49:34 edumdev kernel: [  616.452753] 
Dec 16 10:49:34 edumdev kernel: [  616.452754] but task is already holding lock:
Dec 16 10:49:34 edumdev kernel: [  616.452867]  (&qdisc_tx_lock){+.-...}, at: [<ffffffff8145474a>] gnet_stats_start_copy_compat+0x4a/0xc0
Dec 16 10:49:34 edumdev kernel: [  616.453068] 
Dec 16 10:49:34 edumdev kernel: [  616.453069] other info that might help us debug this:
Dec 16 10:49:34 edumdev kernel: [  616.453184] 2 locks held by tc/8747:
Dec 16 10:49:34 edumdev kernel: [  616.453243]  #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff8149dbc2>] netlink_dump+0x52/0x1e0
Dec 16 10:49:34 edumdev kernel: [  616.453510]  #1:  (&qdisc_tx_lock){+.-...}, at: [<ffffffff8145474a>] gnet_stats_start_copy_compat+0x4a/0xc0
Dec 16 10:49:34 edumdev kernel: [  616.453745] 
Dec 16 10:49:35 edumdev kernel: [  616.453746] stack backtrace:
Dec 16 10:49:35 edumdev kernel: [  616.453857] Pid: 8747, comm: tc Tainted: G        W   2.6.37-rc1-01820-g4be8976-dirty #456
Dec 16 10:49:35 edumdev kernel: [  616.453943] Call Trace:
Dec 16 10:49:35 edumdev kernel: [  616.454004]  [<ffffffff8107bd1e>] validate_chain+0x10be/0x1330
Dec 16 10:49:35 edumdev kernel: [  616.454072]  [<ffffffff8107b144>] ? validate_chain+0x4e4/0x1330
Dec 16 10:49:35 edumdev kernel: [  616.454142]  [<ffffffff810cd6cc>] ? get_page_from_freelist+0x2bc/0x730
Dec 16 10:49:35 edumdev kernel: [  616.454211]  [<ffffffff81079200>] ? trace_hardirqs_on_caller+0x110/0x190
Dec 16 10:49:35 edumdev kernel: [  616.454281]  [<ffffffff8107c3e9>] __lock_acquire+0x459/0xbe0
Dec 16 10:49:35 edumdev kernel: [  616.454379]  [<ffffffff8107cc10>] lock_acquire+0xa0/0x140
Dec 16 10:49:35 edumdev kernel: [  616.454446]  [<ffffffffa01331d5>] ? sfq_dump_class_stats+0x65/0x160 [sch_sfq]
Dec 16 10:49:35 edumdev kernel: [  616.454520]  [<ffffffff815aee46>] _raw_spin_lock_bh+0x36/0x50
Dec 16 10:49:35 edumdev kernel: [  616.454586]  [<ffffffffa01331d5>] ? sfq_dump_class_stats+0x65/0x160 [sch_sfq]
Dec 16 10:49:35 edumdev kernel: [  616.454657]  [<ffffffffa01331d5>] sfq_dump_class_stats+0x65/0x160 [sch_sfq]
Dec 16 10:49:35 edumdev kernel: [  616.454727]  [<ffffffff81202ef4>] ? nla_put+0x34/0x40
Dec 16 10:49:35 edumdev kernel: [  616.454794]  [<ffffffff8145478f>] ? gnet_stats_start_copy_compat+0x8f/0xc0
Dec 16 10:49:35 edumdev kernel: [  616.454865]  [<ffffffff8147a2f1>] tc_fill_tclass+0x1b1/0x250
Dec 16 10:49:35 edumdev kernel: [  616.454932]  [<ffffffff8147a3ce>] qdisc_class_dump+0x3e/0x40
Dec 16 10:49:35 edumdev kernel: [  616.454999]  [<ffffffff81483a68>] ? cbq_walk+0x78/0xc0
Dec 16 10:49:35 edumdev kernel: [  616.455064]  [<ffffffffa013228c>] sfq_walk+0x5c/0x90 [sch_sfq]
Dec 16 10:49:35 edumdev kernel: [  616.455131]  [<ffffffff81479f3a>] tc_dump_tclass_qdisc+0xba/0x110
Dec 16 10:49:35 edumdev kernel: [  616.455199]  [<ffffffff8147a390>] ? qdisc_class_dump+0x0/0x40
Dec 16 10:49:35 edumdev kernel: [  616.455266]  [<ffffffff8147a00f>] tc_dump_tclass_root+0x7f/0xa0
Dec 16 10:49:35 edumdev kernel: [  616.455332]  [<ffffffff8147a0bc>] tc_dump_tclass+0x8c/0x110
Dec 16 10:49:35 edumdev kernel: [  616.455426]  [<ffffffff8149dbdd>] netlink_dump+0x6d/0x1e0
Dec 16 10:49:35 edumdev kernel: [  616.455494]  [<ffffffff814a0c0c>] netlink_dump_start+0x19c/0x210
Dec 16 10:49:35 edumdev kernel: [  616.455562]  [<ffffffff8147a030>] ? tc_dump_tclass+0x0/0x110
Dec 16 10:49:35 edumdev kernel: [  616.455628]  [<ffffffff8147a030>] ? tc_dump_tclass+0x0/0x110
Dec 16 10:49:35 edumdev kernel: [  616.455694]  [<ffffffff8146bfa9>] rtnetlink_rcv_msg+0xb9/0x260
Dec 16 10:49:35 edumdev kernel: [  616.455763]  [<ffffffff8146bef0>] ? rtnetlink_rcv_msg+0x0/0x260
Dec 16 10:49:35 edumdev kernel: [  616.455832]  [<ffffffff8149ef29>] netlink_rcv_skb+0x99/0xc0
Dec 16 10:49:35 edumdev kernel: [  616.455898]  [<ffffffff8146bed5>] rtnetlink_rcv+0x25/0x40
Dec 16 10:49:35 edumdev kernel: [  616.455963]  [<ffffffff8149ea95>] ? netlink_unicast+0xf5/0x2d0
Dec 16 10:49:35 edumdev kernel: [  616.456030]  [<ffffffff8149ec42>] netlink_unicast+0x2a2/0x2d0
Dec 16 10:49:35 edumdev kernel: [  616.456098]  [<ffffffff810e8cb3>] ? might_fault+0x53/0xb0
Dec 16 10:49:35 edumdev kernel: [  616.456163]  [<ffffffff81451fed>] ? memcpy_fromiovec+0x6d/0x90
Dec 16 10:49:35 edumdev kernel: [  616.456231]  [<ffffffff8149fbdd>] netlink_sendmsg+0x24d/0x390
Dec 16 10:49:35 edumdev kernel: [  616.456299]  [<ffffffff814463d0>] sock_sendmsg+0xc0/0xf0
Dec 16 10:49:36 edumdev kernel: [  616.456390]  [<ffffffff810e8cb3>] ? might_fault+0x53/0xb0
Dec 16 10:49:36 edumdev kernel: [  616.456457]  [<ffffffff814767ce>] ? verify_compat_iovec+0x6e/0x110
Dec 16 10:49:36 edumdev kernel: [  616.456526]  [<ffffffff81447164>] sys_sendmsg+0x194/0x320
Dec 16 10:49:36 edumdev kernel: [  616.456593]  [<ffffffff815b2e02>] ? do_page_fault+0x102/0x4e0
Dec 16 10:49:36 edumdev kernel: [  616.456661]  [<ffffffff8107cd4d>] ? lock_release_non_nested+0x9d/0x2e0
Dec 16 10:49:36 edumdev kernel: [  616.456729]  [<ffffffff810e8cb3>] ? might_fault+0x53/0xb0
Dec 16 10:49:36 edumdev kernel: [  616.456796]  [<ffffffff810e8cb3>] ? might_fault+0x53/0xb0
Dec 16 10:49:36 edumdev kernel: [  616.456863]  [<ffffffff81476154>] compat_sys_sendmsg+0x14/0x20
Dec 16 10:49:36 edumdev kernel: [  616.456929]  [<ffffffff814770fe>] compat_sys_socketcall+0x1be/0x210
Dec 16 10:49:36 edumdev kernel: [  616.457000]  [<ffffffff8102f1d0>] sysenter_dispatch+0x7/0x33
Dec 16 10:49:36 edumdev kernel: [  616.457067]  [<ffffffff815ae9a9>] ? trace_hardirqs_on_thunk+0x3a/0x3f




* Re: [PATCH net-next-2.6] net_sched: sch_sfq: add backlog info in sfq_dump_class_stats()
  2010-12-16  8:16                     ` Jarek Poplawski
  2010-12-16 10:18                       ` [PATCH v2 " Eric Dumazet
@ 2010-12-16 11:03                       ` Eric Dumazet
  2010-12-16 13:09                         ` Jarek Poplawski
  1 sibling, 1 reply; 40+ messages in thread
From: Eric Dumazet @ 2010-12-16 11:03 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: David Miller, netdev, Patrick McHardy

On Thursday 16 December 2010 at 08:16 +0000, Jarek Poplawski wrote:
> On 2010-12-15 19:18, Eric Dumazet wrote:
> > We currently return, for each active SFQ slot, the number of packets in
> > its queue. We can also report the number of bytes accounted for by these
> > packets.
> > 
> > tc -s class show dev ifb0
> > 
> > Before patch :
> > 
> > class sfq 11:3d9 parent 11: 
> >  (dropped 0, overlimits 0 requeues 0) 
> >  backlog 0b 3p requeues 0 
> >  allot 1266 
> > 
> > After patch :
> > 
> > class sfq 11:3e4 parent 11: 
> >  (dropped 0, overlimits 0 requeues 0) 
> >  backlog 4380b 3p requeues 0 
> >  allot 1212 
> > 
> > 
> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> > ---
> >  net/sched/sch_sfq.c |    7 ++++++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> > 
> > diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
> > index 3cf478d..cb331de 100644
> > --- a/net/sched/sch_sfq.c
> > +++ b/net/sched/sch_sfq.c
> > @@ -548,8 +548,13 @@ static int sfq_dump_class_stats(struct Qdisc *sch, unsigned long cl,
> >  {
> >  	struct sfq_sched_data *q = qdisc_priv(sch);
> >  	sfq_index idx = q->ht[cl-1];
> > -	struct gnet_stats_queue qs = { .qlen = q->qs[idx].qlen };
> > +	struct sk_buff_head *list = &q->qs[idx];
> > +	struct gnet_stats_queue qs = { .qlen = list->qlen };
> >  	struct tc_sfq_xstats xstats = { .allot = q->allot[idx] };
> > +	struct sk_buff *skb;
> > +
> > +	skb_queue_walk(list, skb)
> > +		qs.backlog += qdisc_pkt_len(skb);
> 
> I don't think you can walk this list without the qdisc lock.

So after checking, I confirm the qdisc lock is held at this point; the
patch is valid.

tc_fill_tclass() calls gnet_stats_start_copy_compat() (which locks
qdisc_root_sleeping_lock()) before calling
 cl_ops->dump_stats(q, cl, &d)

Thanks !




* Re: [PATCH v2] net_sched: sch_sfq: fix allot handling
  2010-12-15 17:03           ` Patrick McHardy
  2010-12-15 17:09             ` Eric Dumazet
@ 2010-12-16 13:08             ` Eric Dumazet
  2010-12-17 16:52               ` [RFC PATCH] net_sched: sch_sfq: better struct layouts Eric Dumazet
  1 sibling, 1 reply; 40+ messages in thread
From: Eric Dumazet @ 2010-12-16 13:08 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: David Miller, netdev, Jarek Poplawski

On Wednesday 15 December 2010 at 18:03 +0100, Patrick McHardy wrote:
> On 15.12.2010 17:55, Eric Dumazet wrote:

> > I was thinking of allowing more packets per SFQ (but keeping the 126
> > active flows limit), what do you think?
> 
> I keep forgetting why this limit exists, let me try to figure
> it out once more :)

I took a look, and found we are already at the max, unless we change the
sfq_index type (from u8 to u16).

SFQ_SLOTS is 128 (it does not really exist yet, but this is the number of slots)
SFQ_DEPTH is 128 (max depth for one slot)
Constraints are:
 SFQ_SLOTS < 256
 SFQ_DEPTH < 256
 SFQ_SLOTS + SFQ_DEPTH < 256, because of the dep[] array.

We could avoid the "struct sk_buff_head" structure with its unneeded
spinlock, and possibly group the data for a given slot in a structure to
reduce the number of cache lines we touch...

struct sfq_slot {
	struct sk_buff *first;
	struct sk_buff *last;
	u8 qlen;
	sfq_index next; /* dequeue chain */
	u16 hash;
	short allot;
	/* 16bit hole */
};

This would save 768 bytes on x86_64 (and much more if LOCKDEP is used)





* Re: [PATCH net-next-2.6] net_sched: sch_sfq: add backlog info in sfq_dump_class_stats()
  2010-12-16 11:03                       ` [PATCH " Eric Dumazet
@ 2010-12-16 13:09                         ` Jarek Poplawski
  0 siblings, 0 replies; 40+ messages in thread
From: Jarek Poplawski @ 2010-12-16 13:09 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, Patrick McHardy

On Thu, Dec 16, 2010 at 12:03:04PM +0100, Eric Dumazet wrote:
> On Thursday 16 December 2010 at 08:16 +0000, Jarek Poplawski wrote:
> > On 2010-12-15 19:18, Eric Dumazet wrote:
> > > We currently return, for each active SFQ slot, the number of packets in
> > > its queue. We can also report the number of bytes accounted for by these
> > > packets.
...
> > > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> > > ---
> > >  net/sched/sch_sfq.c |    7 ++++++-
> > >  1 file changed, 6 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
> > > index 3cf478d..cb331de 100644
> > > --- a/net/sched/sch_sfq.c
> > > +++ b/net/sched/sch_sfq.c
> > > @@ -548,8 +548,13 @@ static int sfq_dump_class_stats(struct Qdisc *sch, unsigned long cl,
> > >  {
> > >  	struct sfq_sched_data *q = qdisc_priv(sch);
> > >  	sfq_index idx = q->ht[cl-1];
> > > -	struct gnet_stats_queue qs = { .qlen = q->qs[idx].qlen };
> > > +	struct sk_buff_head *list = &q->qs[idx];
> > > +	struct gnet_stats_queue qs = { .qlen = list->qlen };
> > >  	struct tc_sfq_xstats xstats = { .allot = q->allot[idx] };
> > > +	struct sk_buff *skb;
> > > +
> > > +	skb_queue_walk(list, skb)
> > > +		qs.backlog += qdisc_pkt_len(skb);
> > 
> > I don't think you can walk this list without the qdisc lock.
> 
> So after checks, I confirm qdisc lock is held at this point, patch is
> valid.
> 
> tc_fill_tclass() calls gnet_stats_start_copy_compat() (and locks
> qdisc_root_sleeping_lock()), before calling 
>  cl_ops->dump_stats(q, cl, &d)
> 
> Thanks !

You are right. Sorry for misleading you.

Jarek P.


* [RFC PATCH] net_sched: sch_sfq: better struct layouts
  2010-12-16 13:08             ` [PATCH v2] " Eric Dumazet
@ 2010-12-17 16:52               ` Eric Dumazet
  2010-12-19 21:22                 ` Jarek Poplawski
  0 siblings, 1 reply; 40+ messages in thread
From: Eric Dumazet @ 2010-12-17 16:52 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: David Miller, netdev, Jarek Poplawski

On Thursday 16 December 2010 at 14:08 +0100, Eric Dumazet wrote:
> On Wednesday 15 December 2010 at 18:03 +0100, Patrick McHardy wrote:
> > On 15.12.2010 17:55, Eric Dumazet wrote:
> 
> > > I was thinking of allowing more packets per SFQ (but keeping the 126
> > > active flows limit), what do you think?
> > 
> > I keep forgetting why this limit exists, let me try to figure
> > it out once more :)
> 
> I took a look, and found we are already at the max, unless we change the
> sfq_index type (from u8 to u16).
> 
> SFQ_SLOTS is 128 (it does not really exist yet, but this is the number of slots)
> SFQ_DEPTH is 128 (max depth for one slot)
> Constraints are:
>  SFQ_SLOTS < 256
>  SFQ_DEPTH < 256
>  SFQ_SLOTS + SFQ_DEPTH < 256, because of the dep[] array.
> 
> We could avoid the "struct sk_buff_head" structure with its unneeded
> spinlock, and possibly group the data for a given slot in a structure to
> reduce the number of cache lines we touch...
> 
> struct sfq_slot {
> 	struct sk_buff *first;
> 	struct sk_buff *last;
> 	u8 qlen;
> 	sfq_index next; /* dequeue chain */
> 	u16 hash;
> 	short allot;
> 	/* 16bit hole */
> };
> 
> This would save 768 bytes on x86_64 (and much more if LOCKDEP is used)
> 
> 

Here is a preliminary patch, shrinking sizeof(struct sfq_sched_data)
from 0x14f8 (or more if spinlocks are bigger) to 0x1180 bytes, and
reducing text size as well.

   text	   data	    bss	    dec	    hex	filename
   4821	    152	      0	   4973	   136d	old/net/sched/sch_sfq.o
   4651	    136	      0	   4787	   12b3	new/net/sched/sch_sfq.o


All data for a slot/flow is now grouped in a compact and cache-friendly
structure:

struct sfq_slot {
        struct sk_buff  *skblist_next;
        struct sk_buff  *skblist_prev;
        sfq_index       qlen; /* number of skbs in skblist */
        sfq_index       next; /* next slot in sfq chain */
        unsigned short  hash; /* hash value (index in ht[]) */
        short           allot; /* credit for this slot */
        struct sfq_head anchor; /* anchor in dep[] chains */
};



 net/sched/sch_sfq.c |  223 +++++++++++++++++++++++-------------------
 1 file changed, 125 insertions(+), 98 deletions(-)

diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index 3cf478d..28968eb 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -69,25 +69,40 @@
 	This implementation limits maximal queue length to 128;
 	maximal mtu to 2^15-1; number of hash buckets to 1024.
 	The only goal of this restrictions was that all data
-	fit into one 4K page :-). Struct sfq_sched_data is
-	organized in anti-cache manner: all the data for a bucket
-	are scattered over different locations. This is not good,
-	but it allowed me to put it into 4K.
+	fit into one 4K page on 32bit arches.
 
 	It is easy to increase these values, but not in flight.  */
 
-#define SFQ_DEPTH		128
+#define SFQ_DEPTH		128 /* max number of packets per slot (per flow) */
+#define SFQ_SLOTS		128 /* max number of flows */
+#define EMPTY_SLOT		255
 #define SFQ_HASH_DIVISOR	1024
 
-/* This type should contain at least SFQ_DEPTH*2 values */
+/* This type should contain at least SFQ_DEPTH + SFQ_SLOTS values */
 typedef unsigned char sfq_index;
 
+/*
+ * We don't use pointers, to save space.
+ * Small indexes [0 ... SFQ_SLOTS - 1] are 'pointers' to slots[] array
+ * while following values [SFQ_SLOTS ... SFQ_SLOTS + SFQ_DEPTH - 1]
+ * are 'pointers' to dep[] array
+ */
 struct sfq_head
 {
 	sfq_index	next;
 	sfq_index	prev;
 };
 
+struct sfq_slot {
+	struct sk_buff	*skblist_next;
+	struct sk_buff	*skblist_prev;
+	sfq_index	qlen; /* number of skbs in skblist */
+	sfq_index	next; /* next slot in sfq chain */
+	unsigned short	hash; /* hash value (index in ht[]) */
+	short		allot; /* credit for this slot */
+	struct sfq_head anchor; /* anchor in dep[] chains */
+};
+
 struct sfq_sched_data
 {
 /* Parameters */
@@ -99,17 +114,24 @@ struct sfq_sched_data
 	struct tcf_proto *filter_list;
 	struct timer_list perturb_timer;
 	u32		perturbation;
-	sfq_index	tail;		/* Index of current slot in round */
-	sfq_index	max_depth;	/* Maximal depth */
+	sfq_index	max_depth;	/* depth of longest slot */
 
+	struct sfq_slot *tail;		/* current slot in round */
 	sfq_index	ht[SFQ_HASH_DIVISOR];	/* Hash table */
-	sfq_index	next[SFQ_DEPTH];	/* Active slots link */
-	short		allot[SFQ_DEPTH];	/* Current allotment per slot */
-	unsigned short	hash[SFQ_DEPTH];	/* Hash value indexed by slots */
-	struct sk_buff_head	qs[SFQ_DEPTH];		/* Slot queue */
-	struct sfq_head	dep[SFQ_DEPTH*2];	/* Linked list of slots, indexed by depth */
+	struct sfq_slot	slots[SFQ_SLOTS];
+	struct sfq_head	dep[SFQ_DEPTH];	/* Linked list of slots, indexed by depth */
 };
 
+/*
+ * sfq_head are either in a sfq_slot or in dep[] array
+ */
+static inline struct sfq_head *get_head(struct sfq_sched_data *q, sfq_index val)
+{
+	if (val < SFQ_SLOTS)
+		return &q->slots[val].anchor;
+	return &q->dep[val - SFQ_SLOTS];
+}
+
 static __inline__ unsigned sfq_fold_hash(struct sfq_sched_data *q, u32 h, u32 h1)
 {
 	return jhash_2words(h, h1, q->perturbation) & (SFQ_HASH_DIVISOR - 1);
@@ -200,30 +222,37 @@ static unsigned int sfq_classify(struct sk_buff *skb, struct Qdisc *sch,
 	return 0;
 }
 
+/*
+ * x : slot number [0 .. SFQ_SLOTS - 1]
+ */
 static inline void sfq_link(struct sfq_sched_data *q, sfq_index x)
 {
 	sfq_index p, n;
-	int d = q->qs[x].qlen + SFQ_DEPTH;
+	int qlen = q->slots[x].qlen;
 
-	p = d;
-	n = q->dep[d].next;
-	q->dep[x].next = n;
-	q->dep[x].prev = p;
-	q->dep[p].next = q->dep[n].prev = x;
+	p = qlen + SFQ_SLOTS;
+	n = q->dep[qlen].next;
+
+	q->slots[x].anchor.next = n;
+	q->slots[x].anchor.prev = p;
+
+	q->dep[qlen].next = x;		/* get_head(q, p)->next = x */
+	get_head(q, n)->prev = x;
 }
 
 static inline void sfq_dec(struct sfq_sched_data *q, sfq_index x)
 {
 	sfq_index p, n;
+	int d;
 
-	n = q->dep[x].next;
-	p = q->dep[x].prev;
-	q->dep[p].next = n;
-	q->dep[n].prev = p;
+	n = q->slots[x].anchor.next;
+	p = q->slots[x].anchor.prev;
+	get_head(q, p)->next = n;
+	get_head(q, n)->prev = p;
 
-	if (n == p && q->max_depth == q->qs[x].qlen + 1)
+	d = q->slots[x].qlen--;
+	if (n == p && q->max_depth == d)
 		q->max_depth--;
-
 	sfq_link(q, x);
 }
 
@@ -232,34 +261,36 @@ static inline void sfq_inc(struct sfq_sched_data *q, sfq_index x)
 	sfq_index p, n;
 	int d;
 
-	n = q->dep[x].next;
-	p = q->dep[x].prev;
-	q->dep[p].next = n;
-	q->dep[n].prev = p;
-	d = q->qs[x].qlen;
+	n = q->slots[x].anchor.next;
+	p = q->slots[x].anchor.prev;
+	get_head(q, p)->next = n;
+	get_head(q, n)->prev = p;
+
+	d = ++q->slots[x].qlen;
 	if (q->max_depth < d)
 		q->max_depth = d;
-
 	sfq_link(q, x);
 }
 
 static unsigned int sfq_drop(struct Qdisc *sch)
 {
 	struct sfq_sched_data *q = qdisc_priv(sch);
-	sfq_index d = q->max_depth;
+	sfq_index x, d = q->max_depth;
 	struct sk_buff *skb;
 	unsigned int len;
+	struct sfq_slot *slot;
 
-	/* Queue is full! Find the longest slot and
-	   drop a packet from it */
-
+	/* Queue is full! Find the longest slot and drop tail packet from it */
 	if (d > 1) {
-		sfq_index x = q->dep[d + SFQ_DEPTH].next;
-		skb = q->qs[x].prev;
+		x = q->dep[d].next;
+		slot = &q->slots[x];
+drop:
+		skb = slot->skblist_prev;
+		slot->skblist_prev = skb->prev;
 		len = qdisc_pkt_len(skb);
-		__skb_unlink(skb, &q->qs[x]);
-		kfree_skb(skb);
 		sfq_dec(q, x);
+		skb->next = skb->prev = NULL;
+		kfree_skb(skb);
 		sch->q.qlen--;
 		sch->qstats.drops++;
 		sch->qstats.backlog -= len;
@@ -268,19 +299,11 @@ static unsigned int sfq_drop(struct Qdisc *sch)
 
 	if (d == 1) {
 		/* It is difficult to believe, but ALL THE SLOTS HAVE LENGTH 1. */
-		d = q->next[q->tail];
-		q->next[q->tail] = q->next[d];
-		q->allot[q->next[d]] += q->quantum;
-		skb = q->qs[d].prev;
-		len = qdisc_pkt_len(skb);
-		__skb_unlink(skb, &q->qs[d]);
-		kfree_skb(skb);
-		sfq_dec(q, d);
-		sch->q.qlen--;
-		q->ht[q->hash[d]] = SFQ_DEPTH;
-		sch->qstats.drops++;
-		sch->qstats.backlog -= len;
-		return len;
+		x = q->tail->next;
+		slot = &q->slots[x];
+		q->tail->next = slot->next;
+		q->ht[slot->hash] = EMPTY_SLOT;
+		goto drop;
 	}
 
 	return 0;
@@ -292,6 +315,7 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	struct sfq_sched_data *q = qdisc_priv(sch);
 	unsigned int hash;
 	sfq_index x;
+	struct sfq_slot *slot;
 	int uninitialized_var(ret);
 
 	hash = sfq_classify(skb, sch, &ret);
@@ -304,31 +328,36 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	hash--;
 
 	x = q->ht[hash];
-	if (x == SFQ_DEPTH) {
-		q->ht[hash] = x = q->dep[SFQ_DEPTH].next;
-		q->hash[x] = hash;
+	slot = &q->slots[x];
+	if (x == EMPTY_SLOT) {
+		x = q->dep[0].next; /* get a free slot */
+		q->ht[hash] = x;
+		slot = &q->slots[x];
+		slot->hash = hash;
+		slot->skblist_next = slot->skblist_prev = (struct sk_buff *)slot;
 	}
 
-	/* If selected queue has length q->limit, this means that
-	 * all another queues are empty and that we do simple tail drop,
+	/* If selected queue has length q->limit, do simple tail drop,
 	 * i.e. drop _this_ packet.
 	 */
-	if (q->qs[x].qlen >= q->limit)
+	if (slot->qlen >= q->limit)
 		return qdisc_drop(skb, sch);
 
 	sch->qstats.backlog += qdisc_pkt_len(skb);
-	__skb_queue_tail(&q->qs[x], skb);
+	skb->prev = slot->skblist_prev;
+	skb->next = (struct sk_buff *)slot;
+	slot->skblist_prev->next = skb;
+	slot->skblist_prev = skb;
 	sfq_inc(q, x);
-	if (q->qs[x].qlen == 1) {		/* The flow is new */
-		if (q->tail == SFQ_DEPTH) {	/* It is the first flow */
-			q->tail = x;
-			q->next[x] = x;
-			q->allot[x] = q->quantum;
+	if (slot->qlen == 1) {		/* The flow is new */
+		if (q->tail == NULL) {	/* It is the first flow */
+			slot->next = x;
 		} else {
-			q->next[x] = q->next[q->tail];
-			q->next[q->tail] = x;
-			q->tail = x;
+			slot->next = q->tail->next;
+			q->tail->next = x;
 		}
+		q->tail = slot;
+		slot->allot = q->quantum;
 	}
 	if (++sch->q.qlen <= q->limit) {
 		sch->bstats.bytes += qdisc_pkt_len(skb);
@@ -344,14 +373,12 @@ static struct sk_buff *
 sfq_peek(struct Qdisc *sch)
 {
 	struct sfq_sched_data *q = qdisc_priv(sch);
-	sfq_index a;
 
 	/* No active slots */
-	if (q->tail == SFQ_DEPTH)
+	if (q->tail == NULL)
 		return NULL;
 
-	a = q->next[q->tail];
-	return skb_peek(&q->qs[a]);
+	return q->slots[q->tail->next].skblist_next;
 }
 
 static struct sk_buff *
@@ -359,34 +386,35 @@ sfq_dequeue(struct Qdisc *sch)
 {
 	struct sfq_sched_data *q = qdisc_priv(sch);
 	struct sk_buff *skb;
-	sfq_index a, old_a;
+	sfq_index a, next_a;
+	struct sfq_slot *slot;
 
 	/* No active slots */
-	if (q->tail == SFQ_DEPTH)
+	if (q->tail == NULL)
 		return NULL;
 
-	a = old_a = q->next[q->tail];
-
+	a = q->tail->next;
+	slot = &q->slots[a];
 	/* Grab packet */
-	skb = __skb_dequeue(&q->qs[a]);
+	skb = slot->skblist_next;
+	slot->skblist_next = skb->next;
+	skb->next = skb->prev = NULL;
 	sfq_dec(q, a);
 	sch->q.qlen--;
 	sch->qstats.backlog -= qdisc_pkt_len(skb);
 
-	/* Is the slot empty? */
-	if (q->qs[a].qlen == 0) {
-		q->ht[q->hash[a]] = SFQ_DEPTH;
-		a = q->next[a];
-		if (a == old_a) {
-			q->tail = SFQ_DEPTH;
+	/* Is the slot now empty? */
+	if (slot->qlen == 0) {
+		q->ht[slot->hash] = EMPTY_SLOT;
+		next_a = slot->next;
+		if (a == next_a) {
+			q->tail = NULL; /* no more active slots */
 			return skb;
 		}
-		q->next[q->tail] = a;
-		q->allot[a] += q->quantum;
-	} else if ((q->allot[a] -= qdisc_pkt_len(skb)) <= 0) {
-		q->tail = a;
-		a = q->next[a];
-		q->allot[a] += q->quantum;
+		q->tail->next = next_a;
+	} else if ((slot->allot -= qdisc_pkt_len(skb)) <= 0) {
+		q->tail = slot;
+		slot->allot += q->quantum;
 	}
 	return skb;
 }
@@ -450,17 +478,16 @@ static int sfq_init(struct Qdisc *sch, struct nlattr *opt)
 	init_timer_deferrable(&q->perturb_timer);
 
 	for (i = 0; i < SFQ_HASH_DIVISOR; i++)
-		q->ht[i] = SFQ_DEPTH;
+		q->ht[i] = EMPTY_SLOT;
 
 	for (i = 0; i < SFQ_DEPTH; i++) {
-		skb_queue_head_init(&q->qs[i]);
-		q->dep[i + SFQ_DEPTH].next = i + SFQ_DEPTH;
-		q->dep[i + SFQ_DEPTH].prev = i + SFQ_DEPTH;
+		q->dep[i].next = i + SFQ_SLOTS;
+		q->dep[i].prev = i + SFQ_SLOTS;
 	}
 
 	q->limit = SFQ_DEPTH - 1;
 	q->max_depth = 0;
-	q->tail = SFQ_DEPTH;
+	q->tail = NULL;
 	if (opt == NULL) {
 		q->quantum = psched_mtu(qdisc_dev(sch));
 		q->perturb_period = 0;
@@ -471,7 +498,7 @@ static int sfq_init(struct Qdisc *sch, struct nlattr *opt)
 			return err;
 	}
 
-	for (i = 0; i < SFQ_DEPTH; i++)
+	for (i = 0; i < SFQ_SLOTS; i++)
 		sfq_link(q, i);
 	return 0;
 }
@@ -547,9 +574,9 @@ static int sfq_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 				struct gnet_dump *d)
 {
 	struct sfq_sched_data *q = qdisc_priv(sch);
-	sfq_index idx = q->ht[cl-1];
-	struct gnet_stats_queue qs = { .qlen = q->qs[idx].qlen };
-	struct tc_sfq_xstats xstats = { .allot = q->allot[idx] };
+	const struct sfq_slot *slot = &q->slots[q->ht[cl - 1]];
+	struct gnet_stats_queue qs = { .qlen = slot->qlen };
+	struct tc_sfq_xstats xstats = { .allot = slot->allot };
 
 	if (gnet_stats_copy_queue(d, &qs) < 0)
 		return -1;
@@ -565,7 +592,7 @@ static void sfq_walk(struct Qdisc *sch, struct qdisc_walker *arg)
 		return;
 
 	for (i = 0; i < SFQ_HASH_DIVISOR; i++) {
-		if (q->ht[i] == SFQ_DEPTH ||
+		if (q->ht[i] == EMPTY_SLOT ||
 		    arg->count < arg->skip) {
 			arg->count++;
 			continue;



^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [RFC PATCH] net_sched: sch_sfq: better struct layouts
  2010-12-17 16:52               ` [RFC PATCH] net_sched: sch_sfq: better struct layouts Eric Dumazet
@ 2010-12-19 21:22                 ` Jarek Poplawski
  2010-12-20 17:02                   ` [PATCH v2] " Eric Dumazet
  0 siblings, 1 reply; 40+ messages in thread
From: Jarek Poplawski @ 2010-12-19 21:22 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Patrick McHardy, David Miller, netdev

On Fri, Dec 17, 2010 at 05:52:46PM +0100, Eric Dumazet wrote:
> On Thursday 16 December 2010 at 14:08 +0100, Eric Dumazet wrote:
...
> > struct sfq_slot {
> > 	struct sk_buff *first;
> > 	struct sk_buff *last;
> > 	u8 qlen;
> > 	sfq_index next; /* dequeue chain */
> > 	u16 hash;
> > 	short allot;
> > 	/* 16bit hole */
> > };
> > 
> > This would save 768 bytes on x86_64 (and much more if LOCKDEP is used)

I think open-coding sk_buff_head is the wrong idea. Otherwise, this
patch looks OK to me; only a few cosmetic suggestions below.

> 
> Here is a preliminary patch, shrinking sizeof(struct sfq_sched_data)
> from 0x14f8 (or more if spinlocks are bigger) to 0x1180 bytes, and
> reducing text size as well.
> 
>    text	   data	    bss	    dec	    hex	filename
>    4821	    152	      0	   4973	   136d	old/net/sched/sch_sfq.o
>    4651	    136	      0	   4787	   12b3	new/net/sched/sch_sfq.o
> 
> 
> All data for a slot/flow is now grouped in a compact and cache-friendly
> structure:
> 
> struct sfq_slot {
>         struct sk_buff  *skblist_next;
>         struct sk_buff  *skblist_prev;
>         sfq_index       qlen; /* number of skbs in skblist */
>         sfq_index       next; /* next slot in sfq chain */
>         unsigned short  hash; /* hash value (index in ht[]) */
>         short           allot; /* credit for this slot */
>         struct sfq_head anchor; /* anchor in dep[] chains */
> };
> 
> 
> 
>  net/sched/sch_sfq.c |  223 +++++++++++++++++++++++-------------------
>  1 file changed, 125 insertions(+), 98 deletions(-)
> 
> diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
> index 3cf478d..28968eb 100644
> --- a/net/sched/sch_sfq.c
> +++ b/net/sched/sch_sfq.c
> @@ -69,25 +69,40 @@
>  	This implementation limits maximal queue length to 128;
>  	maximal mtu to 2^15-1; number of hash buckets to 1024.
>  	The only goal of this restrictions was that all data
> -	fit into one 4K page :-). Struct sfq_sched_data is
> -	organized in anti-cache manner: all the data for a bucket
> -	are scattered over different locations. This is not good,
> -	but it allowed me to put it into 4K.
> +	fit into one 4K page on 32bit arches.
>  
>  	It is easy to increase these values, but not in flight.  */
>  
> -#define SFQ_DEPTH		128
> +#define SFQ_DEPTH		128 /* max number of packets per slot (per flow) */
> +#define SFQ_SLOTS		128 /* max number of flows */
> +#define EMPTY_SLOT		255

SFQ_EMPTY_SLOT?

>  #define SFQ_HASH_DIVISOR	1024
>  
> -/* This type should contain at least SFQ_DEPTH*2 values */
> +/* This type should contain at least SFQ_DEPTH + SFQ_SLOTS values */
>  typedef unsigned char sfq_index;
>  
> +/*
> + * We dont use pointers to save space.
> + * Small indexes [0 ... SFQ_SLOTS - 1] are 'pointers' to slots[] array
> + * while following values [SFQ_SLOTS ... SFQ_SLOTS + SFQ_DEPTH - 1]
> + * are 'pointers' to dep[] array
> + */
>  struct sfq_head
>  {
>  	sfq_index	next;
>  	sfq_index	prev;
>  };
>  
> +struct sfq_slot {
> +	struct sk_buff	*skblist_next;
> +	struct sk_buff	*skblist_prev;
> +	sfq_index	qlen; /* number of skbs in skblist */
> +	sfq_index	next; /* next slot in sfq chain */
> +	unsigned short	hash; /* hash value (index in ht[]) */
> +	short		allot; /* credit for this slot */
> +	struct sfq_head anchor; /* anchor in dep[] chains */

struct sfq_head dep?

> +};
> +
>  struct sfq_sched_data
>  {
>  /* Parameters */
> @@ -99,17 +114,24 @@ struct sfq_sched_data
>  	struct tcf_proto *filter_list;
>  	struct timer_list perturb_timer;
>  	u32		perturbation;
> -	sfq_index	tail;		/* Index of current slot in round */
> -	sfq_index	max_depth;	/* Maximal depth */
> +	sfq_index	max_depth;	/* depth of longest slot */

depth and/or length? (One dimension should be enough.)

>  
> +	struct sfq_slot *tail;		/* current slot in round */
>  	sfq_index	ht[SFQ_HASH_DIVISOR];	/* Hash table */
> -	sfq_index	next[SFQ_DEPTH];	/* Active slots link */
> -	short		allot[SFQ_DEPTH];	/* Current allotment per slot */
> -	unsigned short	hash[SFQ_DEPTH];	/* Hash value indexed by slots */
> -	struct sk_buff_head	qs[SFQ_DEPTH];		/* Slot queue */
> -	struct sfq_head	dep[SFQ_DEPTH*2];	/* Linked list of slots, indexed by depth */
> +	struct sfq_slot	slots[SFQ_SLOTS];
> +	struct sfq_head	dep[SFQ_DEPTH];	/* Linked list of slots, indexed by depth */
>  };
>  
> +/*
> + * sfq_head are either in a sfq_slot or in dep[] array
> + */
> +static inline struct sfq_head *get_head(struct sfq_sched_data *q, sfq_index val)

static inline struct sfq_head *sfq_dep_head()?

...
> @@ -304,31 +328,36 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
>  	hash--;
>  
>  	x = q->ht[hash];
> -	if (x == SFQ_DEPTH) {
> -		q->ht[hash] = x = q->dep[SFQ_DEPTH].next;
> -		q->hash[x] = hash;
> +	slot = &q->slots[x];
> +	if (x == EMPTY_SLOT) {
> +		x = q->dep[0].next; /* get a free slot */
> +		q->ht[hash] = x;
> +		slot = &q->slots[x];
> +		slot->hash = hash;
> +		slot->skblist_next = slot->skblist_prev = (struct sk_buff *)slot;
>  	}
>  
> -	/* If selected queue has length q->limit, this means that
> -	 * all another queues are empty and that we do simple tail drop,

No reason to remove this line.

> +	/* If selected queue has length q->limit, do simple tail drop,
>  	 * i.e. drop _this_ packet.
>  	 */
> -	if (q->qs[x].qlen >= q->limit)
> +	if (slot->qlen >= q->limit)
>  		return qdisc_drop(skb, sch);
>  
>  	sch->qstats.backlog += qdisc_pkt_len(skb);
> -	__skb_queue_tail(&q->qs[x], skb);
> +	skb->prev = slot->skblist_prev;
> +	skb->next = (struct sk_buff *)slot;
> +	slot->skblist_prev->next = skb;
> +	slot->skblist_prev = skb;

If you really have to do this, all these: __skb_queue_tail(),
__skb_dequeue(), skb_queue_head_init(), skb_peek() etc. used here
should stay as (local) inline functions to retain readability.

Jarek P.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v2] net_sched: sch_sfq: better struct layouts
  2010-12-19 21:22                 ` Jarek Poplawski
@ 2010-12-20 17:02                   ` Eric Dumazet
  2010-12-20 21:33                     ` David Miller
                                       ` (2 more replies)
  0 siblings, 3 replies; 40+ messages in thread
From: Eric Dumazet @ 2010-12-20 17:02 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: Patrick McHardy, David Miller, netdev

On Sunday 19 December 2010 at 22:22 +0100, Jarek Poplawski wrote:

> I think open-coding sk_buff_head is the wrong idea. Otherwise, this
> patch looks OK to me; only a few cosmetic suggestions below.
> 

I completely agree with you, but this should be temporary: David really
wants to use list_head for skbs, and I believe this will be done ;)

I chose to name the list skblist to make clear where we want to plug a
real list_head once done.

Also, not using sk_buff_head saves at least 8 bytes per slot.
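
For reference, a rough sketch of where those bytes come from (an
illustration only; the first struct mirrors the kernel's definition,
the second name is made up, and sizes assume x86_64 without lockdep):

struct sk_buff_head {		/* what each slot used to embed */
	struct sk_buff	*next;
	struct sk_buff	*prev;
	__u32		qlen;
	spinlock_t	lock;	/* never used by SFQ, still paid for */
};				/* 24 bytes, much more with LOCKDEP */

struct open_coded_head {	/* what sfq_slot embeds instead */
	struct sk_buff	*skblist_next;
	struct sk_buff	*skblist_prev;
};				/* 16 bytes; qlen lives elsewhere in the slot */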

>   
> > -#define SFQ_DEPTH		128
> > +#define SFQ_DEPTH		128 /* max number of packets per slot (per flow) */
> > +#define SFQ_SLOTS		128 /* max number of flows */
> > +#define EMPTY_SLOT		255
> 
> SFQ_EMPTY_SLOT?

OK done


> >  
> > +struct sfq_slot {
> > +	struct sk_buff	*skblist_next;
> > +	struct sk_buff	*skblist_prev;
> > +	sfq_index	qlen; /* number of skbs in skblist */
> > +	sfq_index	next; /* next slot in sfq chain */
> > +	unsigned short	hash; /* hash value (index in ht[]) */
> > +	short		allot; /* credit for this slot */
> > +	struct sfq_head anchor; /* anchor in dep[] chains */
> 
> struct sfq_head dep?

OK

> 
> > +};
> > +
> >  struct sfq_sched_data
> >  {
> >  /* Parameters */
> > @@ -99,17 +114,24 @@ struct sfq_sched_data
> >  	struct tcf_proto *filter_list;
> >  	struct timer_list perturb_timer;
> >  	u32		perturbation;
> > -	sfq_index	tail;		/* Index of current slot in round */
> > -	sfq_index	max_depth;	/* Maximal depth */
> > +	sfq_index	max_depth;	/* depth of longest slot */
> 
> depth and/or length? (One dimension should be enough.)

maybe cur_depth? It's not the maximal possible depth, but the depth of
the longest slot, or the current max depth...

> 
> >  
> > +	struct sfq_slot *tail;		/* current slot in round */
> >  	sfq_index	ht[SFQ_HASH_DIVISOR];	/* Hash table */
> > -	sfq_index	next[SFQ_DEPTH];	/* Active slots link */
> > -	short		allot[SFQ_DEPTH];	/* Current allotment per slot */
> > -	unsigned short	hash[SFQ_DEPTH];	/* Hash value indexed by slots */
> > -	struct sk_buff_head	qs[SFQ_DEPTH];		/* Slot queue */
> > -	struct sfq_head	dep[SFQ_DEPTH*2];	/* Linked list of slots, indexed by depth */
> > +	struct sfq_slot	slots[SFQ_SLOTS];
> > +	struct sfq_head	dep[SFQ_DEPTH];	/* Linked list of slots, indexed by depth */
> >  };
> >  
> > +/*
> > + * sfq_head are either in a sfq_slot or in dep[] array
> > + */
> > +static inline struct sfq_head *get_head(struct sfq_sched_data *q, sfq_index val)
> 
> static inline struct sfq_head *sfq_dep_head()?
> 

OK


> > -	/* If selected queue has length q->limit, this means that
> > -	 * all another queues are empty and that we do simple tail drop,
> 
> No reason to remove this line.

Well, the reason we drop this packet is not because other queues are
empty, but because we reach the max depth for this queue. (I have the idea
to extend SFQ to allow more packets to be queued, still with a 127 limit
per queue, and 127 flows). With 10Gb/s links, a global limit of 127
packets is short.


> If you really have to do this, all these: __skb_queue_tail(),
> __skb_dequeue(), skb_queue_head_init(), skb_peek() etc. used here
> should stay as (local) inline functions to retain readability.
> 

OK, done. Thanks a lot for reviewing and the very useful comments!

We should address the problem of allot being 16bit: GRO makes allot
overflow so fast that SFQ is not fair at all...

allot could use 17 bits and hash only 15 (10 really needed for current
divisor)
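
To make the overflow concrete (a hypothetical userspace demo, not part
of any patch — struct and variable names are made up): with the default
quantum of 1514, a single 64KB GRO aggregate is enough to wrap a 16bit
allot, while the 17/15 split suggested above still packs into 32 bits:

	#include <stdio.h>

	struct sfq_slot_idea {		/* the 17/15 split, name made up */
		unsigned int	hash:15;  /* only 10 needed for divisor 1024 */
		unsigned int	allot:17; /* covers a full 64KB packet */
	};

	int main(void)
	{
		short allot = 1514;	/* at most one quantum of credit */

		allot -= 65226;		/* one 64KB GRO aggregate dequeued */
		/* 1514 - 65226 = -63712; on common ABIs the 16bit field
		 * wraps to +1824, credit the flow never earned */
		printf("allot = %d\n", allot);
		printf("sizeof = %zu\n", sizeof(struct sfq_slot_idea)); /* 4 */
		return 0;
	}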

[PATCH v2] net_sched: sch_sfq: better struct layouts

This patch shrinks sizeof(struct sfq_sched_data)
from 0x14f8 (or more if spinlocks are bigger) to 0x1180 bytes, and
reduces text size as well.

   text    data     bss     dec     hex filename
   4821     152       0    4973    136d old/net/sched/sch_sfq.o
   4627     136       0    4763    129b new/net/sched/sch_sfq.o


All data for a slot/flow is now grouped in a compact and cache-friendly
structure, instead of being spread across many different locations.

struct sfq_slot {
        struct sk_buff  *skblist_next;
        struct sk_buff  *skblist_prev;
        sfq_index       qlen; /* number of skbs in skblist */
        sfq_index       next; /* next slot in sfq chain */
        unsigned short  hash; /* hash value (index in ht[]) */
        short           allot; /* credit for this slot */
        struct sfq_head dep; /* anchor in dep[] chains */
};

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jarek Poplawski <jarkao2@gmail.com>
---
v2: address Jarek comments

 net/sched/sch_sfq.c |  263 +++++++++++++++++++++++++-----------------
 1 files changed, 160 insertions(+), 103 deletions(-)

diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index 3cf478d..ef94f3d 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -67,27 +67,42 @@
 
 	IMPLEMENTATION:
 	This implementation limits maximal queue length to 128;
-	maximal mtu to 2^15-1; number of hash buckets to 1024.
+	maximal mtu to 2^15-1; max 128 flows, number of hash buckets to 1024.
 	The only goal of this restrictions was that all data
-	fit into one 4K page :-). Struct sfq_sched_data is
-	organized in anti-cache manner: all the data for a bucket
-	are scattered over different locations. This is not good,
-	but it allowed me to put it into 4K.
+	fit into one 4K page on 32bit arches.
 
 	It is easy to increase these values, but not in flight.  */
 
-#define SFQ_DEPTH		128
+#define SFQ_DEPTH		128 /* max number of packets per flow */
+#define SFQ_SLOTS		128 /* max number of flows */
+#define SFQ_EMPTY_SLOT		255
 #define SFQ_HASH_DIVISOR	1024
 
-/* This type should contain at least SFQ_DEPTH*2 values */
+/* This type should contain at least SFQ_DEPTH + SFQ_SLOTS values */
 typedef unsigned char sfq_index;
 
+/*
+ * We dont use pointers to save space.
+ * Small indexes [0 ... SFQ_SLOTS - 1] are 'pointers' to slots[] array
+ * while following values [SFQ_SLOTS ... SFQ_SLOTS + SFQ_DEPTH - 1]
+ * are 'pointers' to dep[] array
+ */
 struct sfq_head
 {
 	sfq_index	next;
 	sfq_index	prev;
 };
 
+struct sfq_slot {
+	struct sk_buff	*skblist_next;
+	struct sk_buff	*skblist_prev;
+	sfq_index	qlen; /* number of skbs in skblist */
+	sfq_index	next; /* next slot in sfq chain */
+	unsigned short	hash; /* hash value (index in ht[]) */
+	short		allot; /* credit for this slot */
+	struct sfq_head dep; /* anchor in dep[] chains */
+};
+
 struct sfq_sched_data
 {
 /* Parameters */
@@ -99,17 +114,24 @@ struct sfq_sched_data
 	struct tcf_proto *filter_list;
 	struct timer_list perturb_timer;
 	u32		perturbation;
-	sfq_index	tail;		/* Index of current slot in round */
-	sfq_index	max_depth;	/* Maximal depth */
+	sfq_index	cur_depth;	/* depth of longest slot */
 
+	struct sfq_slot *tail;		/* current slot in round */
 	sfq_index	ht[SFQ_HASH_DIVISOR];	/* Hash table */
-	sfq_index	next[SFQ_DEPTH];	/* Active slots link */
-	short		allot[SFQ_DEPTH];	/* Current allotment per slot */
-	unsigned short	hash[SFQ_DEPTH];	/* Hash value indexed by slots */
-	struct sk_buff_head	qs[SFQ_DEPTH];		/* Slot queue */
-	struct sfq_head	dep[SFQ_DEPTH*2];	/* Linked list of slots, indexed by depth */
+	struct sfq_slot	slots[SFQ_SLOTS];
+	struct sfq_head	dep[SFQ_DEPTH];	/* Linked list of slots, indexed by depth */
 };
 
+/*
+ * sfq_head are either in a sfq_slot or in dep[] array
+ */
+static inline struct sfq_head *sfq_dep_head(struct sfq_sched_data *q, sfq_index val)
+{
+	if (val < SFQ_SLOTS)
+		return &q->slots[val].dep;
+	return &q->dep[val - SFQ_SLOTS];
+}
+
 static __inline__ unsigned sfq_fold_hash(struct sfq_sched_data *q, u32 h, u32 h1)
 {
 	return jhash_2words(h, h1, q->perturbation) & (SFQ_HASH_DIVISOR - 1);
@@ -200,30 +222,41 @@ static unsigned int sfq_classify(struct sk_buff *skb, struct Qdisc *sch,
 	return 0;
 }
 
+/*
+ * x : slot number [0 .. SFQ_SLOTS - 1]
+ */
 static inline void sfq_link(struct sfq_sched_data *q, sfq_index x)
 {
 	sfq_index p, n;
-	int d = q->qs[x].qlen + SFQ_DEPTH;
+	int qlen = q->slots[x].qlen;
 
-	p = d;
-	n = q->dep[d].next;
-	q->dep[x].next = n;
-	q->dep[x].prev = p;
-	q->dep[p].next = q->dep[n].prev = x;
+	p = qlen + SFQ_SLOTS;
+	n = q->dep[qlen].next;
+
+	q->slots[x].dep.next = n;
+	q->slots[x].dep.prev = p;
+
+	q->dep[qlen].next = x;		/* sfq_dep_head(q, p)->next = x */
+	sfq_dep_head(q, n)->prev = x;
 }
 
+#define sfq_unlink(q, x, n, p)			\
+	n = q->slots[x].dep.next;		\
+	p = q->slots[x].dep.prev;		\
+	sfq_dep_head(q, p)->next = n;		\
+	sfq_dep_head(q, n)->prev = p
+	
+
 static inline void sfq_dec(struct sfq_sched_data *q, sfq_index x)
 {
 	sfq_index p, n;
+	int d;
 
-	n = q->dep[x].next;
-	p = q->dep[x].prev;
-	q->dep[p].next = n;
-	q->dep[n].prev = p;
-
-	if (n == p && q->max_depth == q->qs[x].qlen + 1)
-		q->max_depth--;
+	sfq_unlink(q, x, n, p);
 
+	d = q->slots[x].qlen--;
+	if (n == p && q->cur_depth == d)
+		q->cur_depth--;
 	sfq_link(q, x);
 }
 
@@ -232,34 +265,68 @@ static inline void sfq_inc(struct sfq_sched_data *q, sfq_index x)
 	sfq_index p, n;
 	int d;
 
-	n = q->dep[x].next;
-	p = q->dep[x].prev;
-	q->dep[p].next = n;
-	q->dep[n].prev = p;
-	d = q->qs[x].qlen;
-	if (q->max_depth < d)
-		q->max_depth = d;
+	sfq_unlink(q, x, n, p);
 
+	d = ++q->slots[x].qlen;
+	if (q->cur_depth < d)
+		q->cur_depth = d;
 	sfq_link(q, x);
 }
 
+/* helper functions : might be changed when/if skb use a standard list_head */
+
+/* remove one skb from tail of slot queue */
+static inline struct sk_buff *slot_dequeue_tail(struct sfq_slot *slot)
+{
+	struct sk_buff *skb = slot->skblist_prev;
+
+	slot->skblist_prev = skb->prev;
+	skb->next = skb->prev = NULL;
+	return skb;
+}
+
+/* remove one skb from head of slot queue */
+static inline struct sk_buff *slot_dequeue_head(struct sfq_slot *slot)
+{
+	struct sk_buff *skb = slot->skblist_next;
+
+	slot->skblist_next = skb->next;
+	skb->next = skb->prev = NULL;
+	return skb;
+}
+
+static inline void slot_queue_init(struct sfq_slot *slot)
+{
+	slot->skblist_prev = slot->skblist_next = (struct sk_buff *)slot;
+}
+
+/* add skb to slot queue (tail add) */
+static inline void slot_queue_add(struct sfq_slot *slot, struct sk_buff *skb)
+{
+	skb->prev = slot->skblist_prev;
+	skb->next = (struct sk_buff *)slot;
+	slot->skblist_prev->next = skb;
+	slot->skblist_prev = skb;
+}
+
+
 static unsigned int sfq_drop(struct Qdisc *sch)
 {
 	struct sfq_sched_data *q = qdisc_priv(sch);
-	sfq_index d = q->max_depth;
+	sfq_index x, d = q->cur_depth;
 	struct sk_buff *skb;
 	unsigned int len;
+	struct sfq_slot *slot;
 
-	/* Queue is full! Find the longest slot and
-	   drop a packet from it */
-
+	/* Queue is full! Find the longest slot and drop tail packet from it */
 	if (d > 1) {
-		sfq_index x = q->dep[d + SFQ_DEPTH].next;
-		skb = q->qs[x].prev;
+		x = q->dep[d].next;
+		slot = &q->slots[x];
+drop:
+		skb = slot_dequeue_tail(slot);
 		len = qdisc_pkt_len(skb);
-		__skb_unlink(skb, &q->qs[x]);
-		kfree_skb(skb);
 		sfq_dec(q, x);
+		kfree_skb(skb);
 		sch->q.qlen--;
 		sch->qstats.drops++;
 		sch->qstats.backlog -= len;
@@ -268,19 +335,11 @@ static unsigned int sfq_drop(struct Qdisc *sch)
 
 	if (d == 1) {
 		/* It is difficult to believe, but ALL THE SLOTS HAVE LENGTH 1. */
-		d = q->next[q->tail];
-		q->next[q->tail] = q->next[d];
-		q->allot[q->next[d]] += q->quantum;
-		skb = q->qs[d].prev;
-		len = qdisc_pkt_len(skb);
-		__skb_unlink(skb, &q->qs[d]);
-		kfree_skb(skb);
-		sfq_dec(q, d);
-		sch->q.qlen--;
-		q->ht[q->hash[d]] = SFQ_DEPTH;
-		sch->qstats.drops++;
-		sch->qstats.backlog -= len;
-		return len;
+		x = q->tail->next;
+		slot = &q->slots[x];
+		q->tail->next = slot->next;
+		q->ht[slot->hash] = SFQ_EMPTY_SLOT;
+		goto drop;
 	}
 
 	return 0;
@@ -292,6 +351,7 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	struct sfq_sched_data *q = qdisc_priv(sch);
 	unsigned int hash;
 	sfq_index x;
+	struct sfq_slot *slot;
 	int uninitialized_var(ret);
 
 	hash = sfq_classify(skb, sch, &ret);
@@ -304,31 +364,33 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	hash--;
 
 	x = q->ht[hash];
-	if (x == SFQ_DEPTH) {
-		q->ht[hash] = x = q->dep[SFQ_DEPTH].next;
-		q->hash[x] = hash;
+	slot = &q->slots[x];
+	if (x == SFQ_EMPTY_SLOT) {
+		x = q->dep[0].next; /* get a free slot */
+		q->ht[hash] = x;
+		slot = &q->slots[x];
+		slot->hash = hash;
+		slot_queue_init(slot);
 	}
 
-	/* If selected queue has length q->limit, this means that
-	 * all another queues are empty and that we do simple tail drop,
+	/* If selected queue has length q->limit, do simple tail drop,
 	 * i.e. drop _this_ packet.
 	 */
-	if (q->qs[x].qlen >= q->limit)
+	if (slot->qlen >= q->limit)
 		return qdisc_drop(skb, sch);
 
 	sch->qstats.backlog += qdisc_pkt_len(skb);
-	__skb_queue_tail(&q->qs[x], skb);
+	slot_queue_add(slot, skb);
 	sfq_inc(q, x);
-	if (q->qs[x].qlen == 1) {		/* The flow is new */
-		if (q->tail == SFQ_DEPTH) {	/* It is the first flow */
-			q->tail = x;
-			q->next[x] = x;
-			q->allot[x] = q->quantum;
+	if (slot->qlen == 1) {		/* The flow is new */
+		if (q->tail == NULL) {	/* It is the first flow */
+			slot->next = x;
 		} else {
-			q->next[x] = q->next[q->tail];
-			q->next[q->tail] = x;
-			q->tail = x;
+			slot->next = q->tail->next;
+			q->tail->next = x;
 		}
+		q->tail = slot;
+		slot->allot = q->quantum;
 	}
 	if (++sch->q.qlen <= q->limit) {
 		sch->bstats.bytes += qdisc_pkt_len(skb);
@@ -344,14 +406,12 @@ static struct sk_buff *
 sfq_peek(struct Qdisc *sch)
 {
 	struct sfq_sched_data *q = qdisc_priv(sch);
-	sfq_index a;
 
 	/* No active slots */
-	if (q->tail == SFQ_DEPTH)
+	if (q->tail == NULL)
 		return NULL;
 
-	a = q->next[q->tail];
-	return skb_peek(&q->qs[a]);
+	return q->slots[q->tail->next].skblist_next;
 }
 
 static struct sk_buff *
@@ -359,34 +419,32 @@ sfq_dequeue(struct Qdisc *sch)
 {
 	struct sfq_sched_data *q = qdisc_priv(sch);
 	struct sk_buff *skb;
-	sfq_index a, old_a;
+	sfq_index a, next_a;
+	struct sfq_slot *slot;
 
 	/* No active slots */
-	if (q->tail == SFQ_DEPTH)
+	if (q->tail == NULL)
 		return NULL;
 
-	a = old_a = q->next[q->tail];
-
-	/* Grab packet */
-	skb = __skb_dequeue(&q->qs[a]);
+	a = q->tail->next;
+	slot = &q->slots[a];
+	skb = slot_dequeue_head(slot);
 	sfq_dec(q, a);
 	sch->q.qlen--;
 	sch->qstats.backlog -= qdisc_pkt_len(skb);
 
-	/* Is the slot empty? */
-	if (q->qs[a].qlen == 0) {
-		q->ht[q->hash[a]] = SFQ_DEPTH;
-		a = q->next[a];
-		if (a == old_a) {
-			q->tail = SFQ_DEPTH;
+	/* Is the slot now empty? */
+	if (slot->qlen == 0) {
+		q->ht[slot->hash] = SFQ_EMPTY_SLOT;
+		next_a = slot->next;
+		if (a == next_a) {
+			q->tail = NULL; /* no more active slots */
 			return skb;
 		}
-		q->next[q->tail] = a;
-		q->allot[a] += q->quantum;
-	} else if ((q->allot[a] -= qdisc_pkt_len(skb)) <= 0) {
-		q->tail = a;
-		a = q->next[a];
-		q->allot[a] += q->quantum;
+		q->tail->next = next_a;
+	} else if ((slot->allot -= qdisc_pkt_len(skb)) <= 0) {
+		q->tail = slot;
+		slot->allot += q->quantum;
 	}
 	return skb;
 }
@@ -450,17 +508,16 @@ static int sfq_init(struct Qdisc *sch, struct nlattr *opt)
 	init_timer_deferrable(&q->perturb_timer);
 
 	for (i = 0; i < SFQ_HASH_DIVISOR; i++)
-		q->ht[i] = SFQ_DEPTH;
+		q->ht[i] = SFQ_EMPTY_SLOT;
 
 	for (i = 0; i < SFQ_DEPTH; i++) {
-		skb_queue_head_init(&q->qs[i]);
-		q->dep[i + SFQ_DEPTH].next = i + SFQ_DEPTH;
-		q->dep[i + SFQ_DEPTH].prev = i + SFQ_DEPTH;
+		q->dep[i].next = i + SFQ_SLOTS;
+		q->dep[i].prev = i + SFQ_SLOTS;
 	}
 
 	q->limit = SFQ_DEPTH - 1;
-	q->max_depth = 0;
-	q->tail = SFQ_DEPTH;
+	q->cur_depth = 0;
+	q->tail = NULL;
 	if (opt == NULL) {
 		q->quantum = psched_mtu(qdisc_dev(sch));
 		q->perturb_period = 0;
@@ -471,7 +528,7 @@ static int sfq_init(struct Qdisc *sch, struct nlattr *opt)
 			return err;
 	}
 
-	for (i = 0; i < SFQ_DEPTH; i++)
+	for (i = 0; i < SFQ_SLOTS; i++)
 		sfq_link(q, i);
 	return 0;
 }
@@ -547,9 +604,9 @@ static int sfq_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 				struct gnet_dump *d)
 {
 	struct sfq_sched_data *q = qdisc_priv(sch);
-	sfq_index idx = q->ht[cl-1];
-	struct gnet_stats_queue qs = { .qlen = q->qs[idx].qlen };
-	struct tc_sfq_xstats xstats = { .allot = q->allot[idx] };
+	const struct sfq_slot *slot = &q->slots[q->ht[cl - 1]];
+	struct gnet_stats_queue qs = { .qlen = slot->qlen };
+	struct tc_sfq_xstats xstats = { .allot = slot->allot };
 
 	if (gnet_stats_copy_queue(d, &qs) < 0)
 		return -1;
@@ -565,7 +622,7 @@ static void sfq_walk(struct Qdisc *sch, struct qdisc_walker *arg)
 		return;
 
 	for (i = 0; i < SFQ_HASH_DIVISOR; i++) {
-		if (q->ht[i] == SFQ_DEPTH ||
+		if (q->ht[i] == SFQ_EMPTY_SLOT ||
 		    arg->count < arg->skip) {
 			arg->count++;
 			continue;



^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next-2.6] net_sched: sch_sfq: add backlog info in sfq_dump_class_stats()
  2010-12-15 18:18                   ` [PATCH net-next-2.6] net_sched: sch_sfq: add backlog info in sfq_dump_class_stats() Eric Dumazet
  2010-12-15 19:10                     ` Eric Dumazet
  2010-12-16  8:16                     ` Jarek Poplawski
@ 2010-12-20 21:14                     ` David Miller
  2 siblings, 0 replies; 40+ messages in thread
From: David Miller @ 2010-12-20 21:14 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, jarkao2, kaber

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 15 Dec 2010 19:18:36 +0100

> We currently return for each active SFQ slot the number of packets in
> queue. We can also give number of bytes accounted for these packets.
> 
> tc -s class show dev ifb0
> 
> Before patch :
> 
> class sfq 11:3d9 parent 11: 
>  (dropped 0, overlimits 0 requeues 0) 
>  backlog 0b 3p requeues 0 
>  allot 1266 
> 
> After patch :
> 
> class sfq 11:3e4 parent 11: 
>  (dropped 0, overlimits 0 requeues 0) 
>  backlog 4380b 3p requeues 0 
>  allot 1212 
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied, thanks Eric.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v3] net_sched: sch_sfq: fix allot handling
  2010-12-15 17:30                 ` [PATCH v3] " Eric Dumazet
  2010-12-15 18:18                   ` [PATCH net-next-2.6] net_sched: sch_sfq: add backlog info in sfq_dump_class_stats() Eric Dumazet
@ 2010-12-20 21:18                   ` David Miller
  1 sibling, 0 replies; 40+ messages in thread
From: David Miller @ 2010-12-20 21:18 UTC (permalink / raw)
  To: eric.dumazet; +Cc: kaber, netdev, jarkao2

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 15 Dec 2010 18:30:27 +0100

> [PATCH v3] net_sched: sch_sfq: fix allot handling
 ...
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied to net-2.6, thanks.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2] net_sched: sch_sfq: better struct layouts
  2010-12-20 17:02                   ` [PATCH v2] " Eric Dumazet
@ 2010-12-20 21:33                     ` David Miller
  2010-12-20 21:42                       ` Eric Dumazet
  2010-12-20 22:55                     ` [PATCH v2] " Jarek Poplawski
  2010-12-20 23:16                     ` [PATCH net-next-2.6] sch_sfq: allow big packets and be fair Eric Dumazet
  2 siblings, 1 reply; 40+ messages in thread
From: David Miller @ 2010-12-20 21:33 UTC (permalink / raw)
  To: eric.dumazet; +Cc: jarkao2, kaber, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 20 Dec 2010 18:02:05 +0100

> [PATCH v2] net_sched: sch_sfq: better struct layouts

Please respin this against current net-next-2.6; it doesn't
apply cleanly after your other two sfq changes.

Thanks!

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2] net_sched: sch_sfq: better struct layouts
  2010-12-20 21:33                     ` David Miller
@ 2010-12-20 21:42                       ` Eric Dumazet
  2010-12-20 22:54                         ` [PATCH v3 net-next-2.6] " Eric Dumazet
  0 siblings, 1 reply; 40+ messages in thread
From: Eric Dumazet @ 2010-12-20 21:42 UTC (permalink / raw)
  To: David Miller; +Cc: jarkao2, kaber, netdev

On Monday 20 December 2010 at 13:33 -0800, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Mon, 20 Dec 2010 18:02:05 +0100
> 
> > [PATCH v2] net_sched: sch_sfq: better struct layouts
> 
> Please respin this against current net-next-2.6; it doesn't
> apply cleanly after your other two sfq changes.
> 

Sure, will do, thanks David.




^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v3 net-next-2.6] net_sched: sch_sfq: better struct layouts
  2010-12-20 21:42                       ` Eric Dumazet
@ 2010-12-20 22:54                         ` Eric Dumazet
  2010-12-21  5:33                           ` David Miller
  0 siblings, 1 reply; 40+ messages in thread
From: Eric Dumazet @ 2010-12-20 22:54 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Patrick McHardy, Jarek Poplawski

Here is a respin of the patch.

I'll send a short patch to make SFQ more fair in the presence of large
packets as well.

Thanks

[PATCH v3 net-next-2.6] net_sched: sch_sfq: better struct layouts

This patch shrinks sizeof(struct sfq_sched_data)
from 0x14f8 (or more if spinlocks are bigger) to 0x1180 bytes, and
reduces text size as well.

   text    data     bss     dec     hex filename
   4821     152       0    4973    136d old/net/sched/sch_sfq.o
   4627     136       0    4763    129b new/net/sched/sch_sfq.o


All data for a slot/flow is now grouped in a compact and cache-friendly
structure, instead of being spread across many different locations.

struct sfq_slot {
        struct sk_buff  *skblist_next;
        struct sk_buff  *skblist_prev;
        sfq_index       qlen; /* number of skbs in skblist */
        sfq_index       next; /* next slot in sfq chain */
        struct sfq_head dep; /* anchor in dep[] chains */
        unsigned short  hash; /* hash value (index in ht[]) */
        short           allot; /* credit for this slot */
};

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jarek Poplawski <jarkao2@gmail.com>
Cc: Patrick McHardy <kaber@trash.net>
---
v3: respin after two prior patches applied

 net/sched/sch_sfq.c |  260 ++++++++++++++++++++++++++----------------
 1 file changed, 162 insertions(+), 98 deletions(-)

diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index 42396c9..c474b4b 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -67,27 +67,42 @@
 
 	IMPLEMENTATION:
 	This implementation limits maximal queue length to 128;
-	maximal mtu to 2^15-1; number of hash buckets to 1024.
+	maximal mtu to 2^15-1; max 128 flows, number of hash buckets to 1024.
 	The only goal of this restrictions was that all data
-	fit into one 4K page :-). Struct sfq_sched_data is
-	organized in anti-cache manner: all the data for a bucket
-	are scattered over different locations. This is not good,
-	but it allowed me to put it into 4K.
+	fit into one 4K page on 32bit arches.
 
 	It is easy to increase these values, but not in flight.  */
 
-#define SFQ_DEPTH		128
+#define SFQ_DEPTH		128 /* max number of packets per flow */
+#define SFQ_SLOTS		128 /* max number of flows */
+#define SFQ_EMPTY_SLOT		255
 #define SFQ_HASH_DIVISOR	1024
 
-/* This type should contain at least SFQ_DEPTH*2 values */
+/* This type should contain at least SFQ_DEPTH + SFQ_SLOTS values */
 typedef unsigned char sfq_index;
 
+/*
+ * We dont use pointers to save space.
+ * Small indexes [0 ... SFQ_SLOTS - 1] are 'pointers' to slots[] array
+ * while following values [SFQ_SLOTS ... SFQ_SLOTS + SFQ_DEPTH - 1]
+ * are 'pointers' to dep[] array
+ */
 struct sfq_head
 {
 	sfq_index	next;
 	sfq_index	prev;
 };
 
+struct sfq_slot {
+	struct sk_buff	*skblist_next;
+	struct sk_buff	*skblist_prev;
+	sfq_index	qlen; /* number of skbs in skblist */
+	sfq_index	next; /* next slot in sfq chain */
+	struct sfq_head dep; /* anchor in dep[] chains */
+	unsigned short	hash; /* hash value (index in ht[]) */
+	short		allot; /* credit for this slot */
+};
+
 struct sfq_sched_data
 {
 /* Parameters */
@@ -99,17 +114,24 @@ struct sfq_sched_data
 	struct tcf_proto *filter_list;
 	struct timer_list perturb_timer;
 	u32		perturbation;
-	sfq_index	tail;		/* Index of current slot in round */
-	sfq_index	max_depth;	/* Maximal depth */
+	sfq_index	cur_depth;	/* depth of longest slot */
 
+	struct sfq_slot *tail;		/* current slot in round */
 	sfq_index	ht[SFQ_HASH_DIVISOR];	/* Hash table */
-	sfq_index	next[SFQ_DEPTH];	/* Active slots link */
-	short		allot[SFQ_DEPTH];	/* Current allotment per slot */
-	unsigned short	hash[SFQ_DEPTH];	/* Hash value indexed by slots */
-	struct sk_buff_head	qs[SFQ_DEPTH];		/* Slot queue */
-	struct sfq_head	dep[SFQ_DEPTH*2];	/* Linked list of slots, indexed by depth */
+	struct sfq_slot	slots[SFQ_SLOTS];
+	struct sfq_head	dep[SFQ_DEPTH];	/* Linked list of slots, indexed by depth */
 };
 
+/*
+ * sfq_head are either in a sfq_slot or in dep[] array
+ */
+static inline struct sfq_head *sfq_dep_head(struct sfq_sched_data *q, sfq_index val)
+{
+	if (val < SFQ_SLOTS)
+		return &q->slots[val].dep;
+	return &q->dep[val - SFQ_SLOTS];
+}
+
 static __inline__ unsigned sfq_fold_hash(struct sfq_sched_data *q, u32 h, u32 h1)
 {
 	return jhash_2words(h, h1, q->perturbation) & (SFQ_HASH_DIVISOR - 1);
@@ -200,30 +222,41 @@ static unsigned int sfq_classify(struct sk_buff *skb, struct Qdisc *sch,
 	return 0;
 }
 
+/*
+ * x : slot number [0 .. SFQ_SLOTS - 1]
+ */
 static inline void sfq_link(struct sfq_sched_data *q, sfq_index x)
 {
 	sfq_index p, n;
-	int d = q->qs[x].qlen + SFQ_DEPTH;
+	int qlen = q->slots[x].qlen;
 
-	p = d;
-	n = q->dep[d].next;
-	q->dep[x].next = n;
-	q->dep[x].prev = p;
-	q->dep[p].next = q->dep[n].prev = x;
+	p = qlen + SFQ_SLOTS;
+	n = q->dep[qlen].next;
+
+	q->slots[x].dep.next = n;
+	q->slots[x].dep.prev = p;
+
+	q->dep[qlen].next = x;		/* sfq_dep_head(q, p)->next = x */
+	sfq_dep_head(q, n)->prev = x;
 }
 
+#define sfq_unlink(q, x, n, p)			\
+	n = q->slots[x].dep.next;		\
+	p = q->slots[x].dep.prev;		\
+	sfq_dep_head(q, p)->next = n;		\
+	sfq_dep_head(q, n)->prev = p
+	
+
 static inline void sfq_dec(struct sfq_sched_data *q, sfq_index x)
 {
 	sfq_index p, n;
+	int d;
 
-	n = q->dep[x].next;
-	p = q->dep[x].prev;
-	q->dep[p].next = n;
-	q->dep[n].prev = p;
-
-	if (n == p && q->max_depth == q->qs[x].qlen + 1)
-		q->max_depth--;
+	sfq_unlink(q, x, n, p);
 
+	d = q->slots[x].qlen--;
+	if (n == p && q->cur_depth == d)
+		q->cur_depth--;
 	sfq_link(q, x);
 }
 
@@ -232,34 +265,72 @@ static inline void sfq_inc(struct sfq_sched_data *q, sfq_index x)
 	sfq_index p, n;
 	int d;
 
-	n = q->dep[x].next;
-	p = q->dep[x].prev;
-	q->dep[p].next = n;
-	q->dep[n].prev = p;
-	d = q->qs[x].qlen;
-	if (q->max_depth < d)
-		q->max_depth = d;
+	sfq_unlink(q, x, n, p);
 
+	d = ++q->slots[x].qlen;
+	if (q->cur_depth < d)
+		q->cur_depth = d;
 	sfq_link(q, x);
 }
 
+/* helper functions : might be changed when/if skb use a standard list_head */
+
+/* remove one skb from tail of slot queue */
+static inline struct sk_buff *slot_dequeue_tail(struct sfq_slot *slot)
+{
+	struct sk_buff *skb = slot->skblist_prev;
+
+	slot->skblist_prev = skb->prev;
+	skb->next = skb->prev = NULL;
+	return skb;
+}
+
+/* remove one skb from head of slot queue */
+static inline struct sk_buff *slot_dequeue_head(struct sfq_slot *slot)
+{
+	struct sk_buff *skb = slot->skblist_next;
+
+	slot->skblist_next = skb->next;
+	skb->next = skb->prev = NULL;
+	return skb;
+}
+
+static inline void slot_queue_init(struct sfq_slot *slot)
+{
+	slot->skblist_prev = slot->skblist_next = (struct sk_buff *)slot;
+}
+
+/* add skb to slot queue (tail add) */
+static inline void slot_queue_add(struct sfq_slot *slot, struct sk_buff *skb)
+{
+	skb->prev = slot->skblist_prev;
+	skb->next = (struct sk_buff *)slot;
+	slot->skblist_prev->next = skb;
+	slot->skblist_prev = skb;
+}
+
+#define	slot_queue_walk(slot, skb)		\
+	for (skb = slot->skblist_next;		\
+	     skb != (struct sk_buff *)slot;	\
+	     skb = skb->next)
+
 static unsigned int sfq_drop(struct Qdisc *sch)
 {
 	struct sfq_sched_data *q = qdisc_priv(sch);
-	sfq_index d = q->max_depth;
+	sfq_index x, d = q->cur_depth;
 	struct sk_buff *skb;
 	unsigned int len;
+	struct sfq_slot *slot;
 
-	/* Queue is full! Find the longest slot and
-	   drop a packet from it */
-
+	/* Queue is full! Find the longest slot and drop tail packet from it */
 	if (d > 1) {
-		sfq_index x = q->dep[d + SFQ_DEPTH].next;
-		skb = q->qs[x].prev;
+		x = q->dep[d].next;
+		slot = &q->slots[x];
+drop:
+		skb = slot_dequeue_tail(slot);
 		len = qdisc_pkt_len(skb);
-		__skb_unlink(skb, &q->qs[x]);
-		kfree_skb(skb);
 		sfq_dec(q, x);
+		kfree_skb(skb);
 		sch->q.qlen--;
 		sch->qstats.drops++;
 		sch->qstats.backlog -= len;
@@ -268,18 +339,11 @@ static unsigned int sfq_drop(struct Qdisc *sch)
 
 	if (d == 1) {
 		/* It is difficult to believe, but ALL THE SLOTS HAVE LENGTH 1. */
-		d = q->next[q->tail];
-		q->next[q->tail] = q->next[d];
-		skb = q->qs[d].prev;
-		len = qdisc_pkt_len(skb);
-		__skb_unlink(skb, &q->qs[d]);
-		kfree_skb(skb);
-		sfq_dec(q, d);
-		sch->q.qlen--;
-		q->ht[q->hash[d]] = SFQ_DEPTH;
-		sch->qstats.drops++;
-		sch->qstats.backlog -= len;
-		return len;
+		x = q->tail->next;
+		slot = &q->slots[x];
+		q->tail->next = slot->next;
+		q->ht[slot->hash] = SFQ_EMPTY_SLOT;
+		goto drop;
 	}
 
 	return 0;
@@ -291,6 +355,7 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	struct sfq_sched_data *q = qdisc_priv(sch);
 	unsigned int hash;
 	sfq_index x;
+	struct sfq_slot *slot;
 	int uninitialized_var(ret);
 
 	hash = sfq_classify(skb, sch, &ret);
@@ -303,30 +368,33 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	hash--;
 
 	x = q->ht[hash];
-	if (x == SFQ_DEPTH) {
-		q->ht[hash] = x = q->dep[SFQ_DEPTH].next;
-		q->hash[x] = hash;
+	slot = &q->slots[x];
+	if (x == SFQ_EMPTY_SLOT) {
+		x = q->dep[0].next; /* get a free slot */
+		q->ht[hash] = x;
+		slot = &q->slots[x];
+		slot->hash = hash;
+		slot_queue_init(slot);
 	}
 
-	/* If selected queue has length q->limit, this means that
-	 * all another queues are empty and that we do simple tail drop,
+	/* If selected queue has length q->limit, do simple tail drop,
 	 * i.e. drop _this_ packet.
 	 */
-	if (q->qs[x].qlen >= q->limit)
+	if (slot->qlen >= q->limit)
 		return qdisc_drop(skb, sch);
 
 	sch->qstats.backlog += qdisc_pkt_len(skb);
-	__skb_queue_tail(&q->qs[x], skb);
+	slot_queue_add(slot, skb);
 	sfq_inc(q, x);
-	if (q->qs[x].qlen == 1) {		/* The flow is new */
-		if (q->tail == SFQ_DEPTH) {	/* It is the first flow */
-			q->next[x] = x;
+	if (slot->qlen == 1) {		/* The flow is new */
+		if (q->tail == NULL) {	/* It is the first flow */
+			slot->next = x;
 		} else {
-			q->next[x] = q->next[q->tail];
-			q->next[q->tail] = x;
+			slot->next = q->tail->next;
+			q->tail->next = x;
 		}
-		q->tail = x;
-		q->allot[x] = q->quantum;
+		q->tail = slot;
+		slot->allot = q->quantum;
 	}
 	if (++sch->q.qlen <= q->limit) {
 		sch->bstats.bytes += qdisc_pkt_len(skb);
@@ -342,14 +410,12 @@ static struct sk_buff *
 sfq_peek(struct Qdisc *sch)
 {
 	struct sfq_sched_data *q = qdisc_priv(sch);
-	sfq_index a;
 
 	/* No active slots */
-	if (q->tail == SFQ_DEPTH)
+	if (q->tail == NULL)
 		return NULL;
 
-	a = q->next[q->tail];
-	return skb_peek(&q->qs[a]);
+	return q->slots[q->tail->next].skblist_next;
 }
 
 static struct sk_buff *
@@ -358,31 +424,31 @@ sfq_dequeue(struct Qdisc *sch)
 	struct sfq_sched_data *q = qdisc_priv(sch);
 	struct sk_buff *skb;
 	sfq_index a, next_a;
+	struct sfq_slot *slot;
 
 	/* No active slots */
-	if (q->tail == SFQ_DEPTH)
+	if (q->tail == NULL)
 		return NULL;
 
-	a = q->next[q->tail];
-
-	/* Grab packet */
-	skb = __skb_dequeue(&q->qs[a]);
+	a = q->tail->next;
+	slot = &q->slots[a];
+	skb = slot_dequeue_head(slot);
 	sfq_dec(q, a);
 	sch->q.qlen--;
 	sch->qstats.backlog -= qdisc_pkt_len(skb);
 
 	/* Is the slot empty? */
-	if (q->qs[a].qlen == 0) {
-		q->ht[q->hash[a]] = SFQ_DEPTH;
-		next_a = q->next[a];
+	if (slot->qlen == 0) {
+		q->ht[slot->hash] = SFQ_EMPTY_SLOT;
+		next_a = slot->next;
 		if (a == next_a) {
-			q->tail = SFQ_DEPTH;
+			q->tail = NULL; /* no more active slots */
 			return skb;
 		}
-		q->next[q->tail] = next_a;
-	} else if ((q->allot[a] -= qdisc_pkt_len(skb)) <= 0) {
-		q->allot[a] += q->quantum;
-		q->tail = a;
+		q->tail->next = next_a;
+	} else if ((slot->allot -= qdisc_pkt_len(skb)) <= 0) {
+		q->tail = slot;
+		slot->allot += q->quantum;
 	}
 	return skb;
 }
@@ -446,17 +512,16 @@ static int sfq_init(struct Qdisc *sch, struct nlattr *opt)
 	init_timer_deferrable(&q->perturb_timer);
 
 	for (i = 0; i < SFQ_HASH_DIVISOR; i++)
-		q->ht[i] = SFQ_DEPTH;
+		q->ht[i] = SFQ_EMPTY_SLOT;
 
 	for (i = 0; i < SFQ_DEPTH; i++) {
-		skb_queue_head_init(&q->qs[i]);
-		q->dep[i + SFQ_DEPTH].next = i + SFQ_DEPTH;
-		q->dep[i + SFQ_DEPTH].prev = i + SFQ_DEPTH;
+		q->dep[i].next = i + SFQ_SLOTS;
+		q->dep[i].prev = i + SFQ_SLOTS;
 	}
 
 	q->limit = SFQ_DEPTH - 1;
-	q->max_depth = 0;
-	q->tail = SFQ_DEPTH;
+	q->cur_depth = 0;
+	q->tail = NULL;
 	if (opt == NULL) {
 		q->quantum = psched_mtu(qdisc_dev(sch));
 		q->perturb_period = 0;
@@ -467,7 +532,7 @@ static int sfq_init(struct Qdisc *sch, struct nlattr *opt)
 			return err;
 	}
 
-	for (i = 0; i < SFQ_DEPTH; i++)
+	for (i = 0; i < SFQ_SLOTS; i++)
 		sfq_link(q, i);
 	return 0;
 }
@@ -543,13 +608,12 @@ static int sfq_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 				struct gnet_dump *d)
 {
 	struct sfq_sched_data *q = qdisc_priv(sch);
-	sfq_index idx = q->ht[cl-1];
-	struct sk_buff_head *list = &q->qs[idx];
-	struct gnet_stats_queue qs = { .qlen = list->qlen };
-	struct tc_sfq_xstats xstats = { .allot = q->allot[idx] };
+	const struct sfq_slot *slot = &q->slots[q->ht[cl - 1]];
+	struct gnet_stats_queue qs = { .qlen = slot->qlen };
+	struct tc_sfq_xstats xstats = { .allot = slot->allot };
 	struct sk_buff *skb;
 
-	skb_queue_walk(list, skb)
+	slot_queue_walk(slot, skb)
 		qs.backlog += qdisc_pkt_len(skb);
 
 	if (gnet_stats_copy_queue(d, &qs) < 0)
@@ -566,7 +630,7 @@ static void sfq_walk(struct Qdisc *sch, struct qdisc_walker *arg)
 		return;
 
 	for (i = 0; i < SFQ_HASH_DIVISOR; i++) {
-		if (q->ht[i] == SFQ_DEPTH ||
+		if (q->ht[i] == SFQ_EMPTY_SLOT ||
 		    arg->count < arg->skip) {
 			arg->count++;
 			continue;



^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH v2] net_sched: sch_sfq: better struct layouts
  2010-12-20 17:02                   ` [PATCH v2] " Eric Dumazet
  2010-12-20 21:33                     ` David Miller
@ 2010-12-20 22:55                     ` Jarek Poplawski
  2010-12-20 23:16                     ` [PATCH net-next-2.6] sch_sfq: allow big packets and be fair Eric Dumazet
  2 siblings, 0 replies; 40+ messages in thread
From: Jarek Poplawski @ 2010-12-20 22:55 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Patrick McHardy, David Miller, netdev

On Mon, Dec 20, 2010 at 06:02:05PM +0100, Eric Dumazet wrote:
> On Sunday 19 December 2010 at 22:22 +0100, Jarek Poplawski wrote:
> 
> I think open-coding sk_buff_head is the wrong idea. Otherwise, this
> patch looks OK to me; only a few cosmetic suggestions below.
> > 
> 
> I completely agree with you, but this should be temporary: David really
> wants to use list_head for skbs, and I believe this will be done ;)
> 
> I chose to name the list skblist to make clear where we want to plug a
> real list_head once done.
> 
> Also, not using sk_buff_head saves at least 8 bytes per slot.

Alas, I dumped my 486sx already :-/

> > > -	sfq_index	max_depth;	/* Maximal depth */
> > > +	sfq_index	max_depth;	/* depth of longest slot */
> > 
> > depth and/or length? (One dimension should be enough.)
> 
> maybe cur_depth? It's not the maximal possible depth, but the depth of
> the longest slot, or the current max depth...

Hmm... or max_depth? I meant the comment only, sorry ;-)

> > > -	/* If selected queue has length q->limit, this means that
> > > -	 * all another queues are empty and that we do simple tail drop,
> > 
> > No reason to remove this line.
> 
> Well, the reason we drop this packet is not because other queues are
> empty, but because we reach the max depth for this queue. (I have the idea
> to extend SFQ to allow more packets to be queued, still with a 127 limit
> per queue, and 127 flows). With 10Gb/s links, a global limit of 127
> packets is short.

Right, but does this line say something else? Of course, you can find
it out by yourself too, but this comment makes reading a bit faster.

> > If you really have to do this, all these: __skb_queue_tail(),
> > __skb_dequeue(), skb_queue_head_init(), skb_peek() etc. used here
> should stay as (local) inline functions to retain readability.
> > 
> 
> OK, done. Thanks a lot for reviewing and the very useful comments!

Thanks for using them!
Jarek P.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH net-next-2.6] sch_sfq: allow big packets and be fair
  2010-12-20 17:02                   ` [PATCH v2] " Eric Dumazet
  2010-12-20 21:33                     ` David Miller
  2010-12-20 22:55                     ` [PATCH v2] " Jarek Poplawski
@ 2010-12-20 23:16                     ` Eric Dumazet
  2010-12-21 10:15                       ` Jarek Poplawski
  2 siblings, 1 reply; 40+ messages in thread
From: Eric Dumazet @ 2010-12-20 23:16 UTC (permalink / raw)
  To: David Miller; +Cc: Patrick McHardy, netdev, Jarek Poplawski

SFQ is currently 'limited' to small packets, because it uses a 16bit
allotment number per flow. Switch it to 18bit, and use appropriate
handling to make sure this allotment is in the [1 .. quantum] range before a
new packet is dequeued, so that fairness is respected.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jarek Poplawski <jarkao2@gmail.com>
Cc: Patrick McHardy <kaber@trash.net>
---
 net/sched/sch_sfq.c |   24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index c474b4b..878704a 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -67,7 +67,7 @@
 
 	IMPLEMENTATION:
 	This implementation limits maximal queue length to 128;
-	maximal mtu to 2^15-1; max 128 flows, number of hash buckets to 1024.
+	maximal mtu to 2^16-1; max 128 flows, number of hash buckets to 1024.
 	The only goal of this restrictions was that all data
 	fit into one 4K page on 32bit arches.
 
@@ -99,9 +99,10 @@ struct sfq_slot {
 	sfq_index	qlen; /* number of skbs in skblist */
 	sfq_index	next; /* next slot in sfq chain */
 	struct sfq_head dep; /* anchor in dep[] chains */
-	unsigned short	hash; /* hash value (index in ht[]) */
-	short		allot; /* credit for this slot */
+	unsigned int	hash:14; /* hash value (index in ht[]) */
+	unsigned int	allot:18; /* credit for this slot */
 };
+#define ALLOT_ZERO (1 << 16)
 
 struct sfq_sched_data
 {
@@ -394,7 +395,7 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 			q->tail->next = x;
 		}
 		q->tail = slot;
-		slot->allot = q->quantum;
+		slot->allot = ALLOT_ZERO + q->quantum;
 	}
 	if (++sch->q.qlen <= q->limit) {
 		sch->bstats.bytes += qdisc_pkt_len(skb);
@@ -430,8 +431,14 @@ sfq_dequeue(struct Qdisc *sch)
 	if (q->tail == NULL)
 		return NULL;
 
+next:
 	a = q->tail->next;
 	slot = &q->slots[a];
+	if (slot->allot <= ALLOT_ZERO) {
+		q->tail = slot;
+		slot->allot += q->quantum;
+		goto next;
+	}
 	skb = slot_dequeue_head(slot);
 	sfq_dec(q, a);
 	sch->q.qlen--;
@@ -446,9 +453,8 @@ sfq_dequeue(struct Qdisc *sch)
 			return skb;
 		}
 		q->tail->next = next_a;
-	} else if ((slot->allot -= qdisc_pkt_len(skb)) <= 0) {
-		q->tail = slot;
-		slot->allot += q->quantum;
+	} else {
+		slot->allot -= qdisc_pkt_len(skb);
 	}
 	return skb;
 }
@@ -610,7 +616,9 @@ static int sfq_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 	struct sfq_sched_data *q = qdisc_priv(sch);
 	const struct sfq_slot *slot = &q->slots[q->ht[cl - 1]];
 	struct gnet_stats_queue qs = { .qlen = slot->qlen };
-	struct tc_sfq_xstats xstats = { .allot = slot->allot };
+	struct tc_sfq_xstats xstats = {
+		.allot = slot->allot - ALLOT_ZERO
+	};
 	struct sk_buff *skb;
 
 	slot_queue_walk(slot, skb)



^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH v3 net-next-2.6] net_sched: sch_sfq: better struct layouts
  2010-12-20 22:54                         ` [PATCH v3 net-next-2.6] " Eric Dumazet
@ 2010-12-21  5:33                           ` David Miller
  0 siblings, 0 replies; 40+ messages in thread
From: David Miller @ 2010-12-21  5:33 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, kaber, jarkao2

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 20 Dec 2010 23:54:58 +0100

> [PATCH v3 net-next-2.6] net_sched: sch_sfq: better struct layouts
> 
> This patch shrinks sizeof(struct sfq_sched_data)
> from 0x14f8 (or more if spinlocks are bigger) to 0x1180 bytes, and
> reduces text size as well.
> 
>    text    data     bss     dec     hex filename
>    4821     152       0    4973    136d old/net/sched/sch_sfq.o
>    4627     136       0    4763    129b new/net/sched/sch_sfq.o
> 
> 
> All data for a slot/flow is now grouped in a compact and cache-friendly
> structure, instead of being spread across many different locations.
> 
> struct sfq_slot {
>         struct sk_buff  *skblist_next;
>         struct sk_buff  *skblist_prev;
>         sfq_index       qlen; /* number of skbs in skblist */
>         sfq_index       next; /* next slot in sfq chain */
>         struct sfq_head dep; /* anchor in dep[] chains */
>         unsigned short  hash; /* hash value (index in ht[]) */
>         short           allot; /* credit for this slot */
> };
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next-2.6] sch_sfq: allow big packets and be fair
  2010-12-20 23:16                     ` [PATCH net-next-2.6] sch_sfq: allow big packets and be fair Eric Dumazet
@ 2010-12-21 10:15                       ` Jarek Poplawski
  2010-12-21 10:30                         ` Jarek Poplawski
  2010-12-21 10:57                         ` Eric Dumazet
  0 siblings, 2 replies; 40+ messages in thread
From: Jarek Poplawski @ 2010-12-21 10:15 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, Patrick McHardy, netdev

On 2010-12-21 00:16, Eric Dumazet wrote:
> SFQ is currently 'limited' to small packets, because it uses a 16bit
> allotment number per flow. Switch it to 18bit, and use appropriate
> handling to make sure this allotment is in the [1 .. quantum] range before a
> new packet is dequeued, so that fairness is respected.

Well, two such important changes should be in separate patches.

The change of allotment limit looks OK (but I would try scaling, e.g.
in 16-byte chunks, btw).

The change in fair treatment looks dubious. A flow which uses exactly
its quantum in one round will be skipped in the next round. A flow
which uses a bit more than its quantum in one round will be skipped
too, while we should only give it less this time to keep the sum up to
2 quantums. (The usual algorithm is to check if a flow has enough
"tickets" for sending its next packet.)

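For comparison, that usual algorithm could be sketched roughly like
this (illustrative pseudocode over the patch's data structures;
slot_peek() is a hypothetical helper, not something from the patch):

	for (;;) {
		slot = &q->slots[q->tail->next];
		if (slot->deficit < qdisc_pkt_len(slot_peek(slot))) {
			slot->deficit += q->quantum;	/* top up... */
			q->tail = slot;			/* ...and move on */
			continue;
		}
		skb = slot_dequeue_head(slot);
		slot->deficit -= qdisc_pkt_len(skb);	/* charge real size */
		return skb;
	}
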
Jarek P.

> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> Cc: Jarek Poplawski <jarkao2@gmail.com>
> Cc: Patrick McHardy <kaber@trash.net>
> ---
>  net/sched/sch_sfq.c |   24 ++++++++++++++++--------
>  1 file changed, 16 insertions(+), 8 deletions(-)
> 
> diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
> index c474b4b..878704a 100644
> --- a/net/sched/sch_sfq.c
> +++ b/net/sched/sch_sfq.c
> @@ -67,7 +67,7 @@
>  
>  	IMPLEMENTATION:
>  	This implementation limits maximal queue length to 128;
> -	maximal mtu to 2^15-1; max 128 flows, number of hash buckets to 1024.
> +	maximal mtu to 2^16-1; max 128 flows, number of hash buckets to 1024.
>  	The only goal of this restrictions was that all data
>  	fit into one 4K page on 32bit arches.
>  
> @@ -99,9 +99,10 @@ struct sfq_slot {
>  	sfq_index	qlen; /* number of skbs in skblist */
>  	sfq_index	next; /* next slot in sfq chain */
>  	struct sfq_head dep; /* anchor in dep[] chains */
> -	unsigned short	hash; /* hash value (index in ht[]) */
> -	short		allot; /* credit for this slot */
> +	unsigned int	hash:14; /* hash value (index in ht[]) */
> +	unsigned int	allot:18; /* credit for this slot */
>  };
> +#define ALLOT_ZERO (1 << 16)
>  
>  struct sfq_sched_data
>  {
> @@ -394,7 +395,7 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
>  			q->tail->next = x;
>  		}
>  		q->tail = slot;
> -		slot->allot = q->quantum;
> +		slot->allot = ALLOT_ZERO + q->quantum;
>  	}
>  	if (++sch->q.qlen <= q->limit) {
>  		sch->bstats.bytes += qdisc_pkt_len(skb);
> @@ -430,8 +431,14 @@ sfq_dequeue(struct Qdisc *sch)
>  	if (q->tail == NULL)
>  		return NULL;
>  
> +next:
>  	a = q->tail->next;
>  	slot = &q->slots[a];
> +	if (slot->allot <= ALLOT_ZERO) {
> +		q->tail = slot;
> +		slot->allot += q->quantum;
> +		goto next;
> +	}
>  	skb = slot_dequeue_head(slot);
>  	sfq_dec(q, a);
>  	sch->q.qlen--;
> @@ -446,9 +453,8 @@ sfq_dequeue(struct Qdisc *sch)
>  			return skb;
>  		}
>  		q->tail->next = next_a;
> -	} else if ((slot->allot -= qdisc_pkt_len(skb)) <= 0) {
> -		q->tail = slot;
> -		slot->allot += q->quantum;
> +	} else {
> +		slot->allot -= qdisc_pkt_len(skb);
>  	}
>  	return skb;
>  }
> @@ -610,7 +616,9 @@ static int sfq_dump_class_stats(struct Qdisc *sch, unsigned long cl,
>  	struct sfq_sched_data *q = qdisc_priv(sch);
>  	const struct sfq_slot *slot = &q->slots[q->ht[cl - 1]];
>  	struct gnet_stats_queue qs = { .qlen = slot->qlen };
> -	struct tc_sfq_xstats xstats = { .allot = slot->allot };
> +	struct tc_sfq_xstats xstats = {
> +		.allot = slot->allot - ALLOT_ZERO
> +	};
>  	struct sk_buff *skb;
>  
>  	slot_queue_walk(slot, skb)
> 
> 
> --

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next-2.6] sch_sfq: allow big packets and be fair
  2010-12-21 10:15                       ` Jarek Poplawski
@ 2010-12-21 10:30                         ` Jarek Poplawski
  2010-12-21 10:44                           ` Eric Dumazet
  2010-12-21 10:57                         ` Eric Dumazet
  1 sibling, 1 reply; 40+ messages in thread
From: Jarek Poplawski @ 2010-12-21 10:30 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, Patrick McHardy, netdev

On Tue, Dec 21, 2010 at 10:15:06AM +0000, Jarek Poplawski wrote:
> The change of allotment limit looks OK [...]

Hmm... but maybe s/ALLOT_ZERO/SFQ_ALLOT_ZERO/? ;-)

Jarek P.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next-2.6] sch_sfq: allow big packets and be fair
  2010-12-21 10:30                         ` Jarek Poplawski
@ 2010-12-21 10:44                           ` Eric Dumazet
  2010-12-21 10:56                             ` Jarek Poplawski
  0 siblings, 1 reply; 40+ messages in thread
From: Eric Dumazet @ 2010-12-21 10:44 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: David Miller, Patrick McHardy, netdev

On Tuesday, 21 December 2010 at 10:30 +0000, Jarek Poplawski wrote:
> On Tue, Dec 21, 2010 at 10:15:06AM +0000, Jarek Poplawski wrote:
> > The change of allotment limit looks OK [...]
> 
> Hmm... but maybe s/ALLOT_ZERO/SFQ_ALLOT_ZERO/? ;-)

It's a local symbol; it's not as if it were in an include file ;)



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next-2.6] sch_sfq: allow big packets and be fair
  2010-12-21 10:44                           ` Eric Dumazet
@ 2010-12-21 10:56                             ` Jarek Poplawski
  0 siblings, 0 replies; 40+ messages in thread
From: Jarek Poplawski @ 2010-12-21 10:56 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, Patrick McHardy, netdev

On Tue, Dec 21, 2010 at 11:44:29AM +0100, Eric Dumazet wrote:
> On Tuesday, 21 December 2010 at 10:30 +0000, Jarek Poplawski wrote:
> > On Tue, Dec 21, 2010 at 10:15:06AM +0000, Jarek Poplawski wrote:
> > > The change of allotment limit looks OK [...]
> > 
> > Hmm... but maybe s/ALLOT_ZERO/SFQ_ALLOT_ZERO/? ;-)
> 
> It's a local symbol; it's not as if it were in an include file ;)

Sure, but why does this one have to be different from the others in this
file? (Plus, if you don't remember all this code, it's a good hint.)

Jarek P.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next-2.6] sch_sfq: allow big packets and be fair
  2010-12-21 10:15                       ` Jarek Poplawski
  2010-12-21 10:30                         ` Jarek Poplawski
@ 2010-12-21 10:57                         ` Eric Dumazet
  2010-12-21 11:39                           ` Jarek Poplawski
  1 sibling, 1 reply; 40+ messages in thread
From: Eric Dumazet @ 2010-12-21 10:57 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: David Miller, Patrick McHardy, netdev

On Tuesday, 21 December 2010 at 10:15 +0000, Jarek Poplawski wrote:
> On 2010-12-21 00:16, Eric Dumazet wrote:
> > SFQ is currently 'limited' to small packets, because it uses a 16bit
> > allotment number per flow. Switch it to 18bit, and use appropriate
> > handling to make sure this allotment is in [1 .. quantum] range before a
> > new packet is dequeued, so that fairness is respected.
> 
> Well, two such important changes should be in separate patches.
> 
> The change of allotment limit looks OK (but I would try scaling, e.g.
> in 16-byte chunks, btw).
> 

Hmm, we could scale by 2 or 3 and keep 16bit allot/hash (faster than
18/14 bit bitfields on x86). Not sure it's worth it (it adds two shifts
per packet).
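
Something like this, roughly (userspace sketch only, to show the
arithmetic; the macro names here are provisional):

/* count allot in 8-byte units so a 16bit field covers 64K packets */
#include <stdio.h>

#define SFQ_ALLOT_SHIFT   3
#define SFQ_ALLOT_SIZE(x) (((x) + (1 << SFQ_ALLOT_SHIFT) - 1) >> SFQ_ALLOT_SHIFT)

int main(void)
{
	short allot = SFQ_ALLOT_SIZE(1514);	/* one quantum of credit */

	allot -= SFQ_ALLOT_SIZE(65535);		/* charge a full 64K GRO packet */
	printf("allot = %d units = %d bytes\n",
	       allot, allot * (1 << SFQ_ALLOT_SHIFT));
	/* prints: allot = -8002 units = -64016 bytes, well within a short */
	return 0;
}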


> The change in fair treatment looks dubious. A flow which uses exactly
> its quantum in one round will be skipped in the next round. A flow
> which uses a bit more than its quantum in one round will be skipped
> too, while we should only give it less this time to keep the sum up to
> 2 quantums. (The usual algorithm is to check if a flow has enough
> "tickets" for sending its next packet.)

Hmm... 

A flow which uses exactly its quantum in one round won't be skipped in
the next round.

I only kept the "I pass my round to the next slot in the chain" handling
in one place instead of two; maybe you missed the removal at the end of
sfq_dequeue()?

-	} else if ((slot->allot -= qdisc_pkt_len(skb)) <= 0) {
-		q->tail = slot;
-		slot->allot += q->quantum;
+	} else {
+		slot->allot -= qdisc_pkt_len(skb);
	}

Now the check is performed at the beginning of sfq_dequeue(), to be able
to charge a previously sent 'big packet' multiple times (a faulty flow
won't send a packet before passing xx rounds).

I believe I just did the right thing. The "allot" is incremented when
the current flow "passes its round to the next slot", and decremented
when a packet is dequeued from this slot. Before being allowed to
dequeue a packet, "allot" must be 'positive'.




^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next-2.6] sch_sfq: allow big packets and be fair
  2010-12-21 10:57                         ` Eric Dumazet
@ 2010-12-21 11:39                           ` Jarek Poplawski
  2010-12-21 12:17                             ` Jarek Poplawski
  0 siblings, 1 reply; 40+ messages in thread
From: Jarek Poplawski @ 2010-12-21 11:39 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, Patrick McHardy, netdev

On Tue, Dec 21, 2010 at 11:57:17AM +0100, Eric Dumazet wrote:
> On Tuesday, 21 December 2010 at 10:15 +0000, Jarek Poplawski wrote:
> > On 2010-12-21 00:16, Eric Dumazet wrote:
> > > SFQ is currently 'limited' to small packets, because it uses a 16bit
> > > allotment number per flow. Switch it to 18bit, and use appropriate
> > > handling to make sure this allotment is in [1 .. quantum] range before a
> > > new packet is dequeued, so that fairness is respected.
> > 
> > Well, two such important changes should be in separate patches.
> > 
> > The change of allotment limit looks OK (but I would try scaling, e.g.
> > in 16-byte chunks, btw).
> > 
> 
> Hmm, we could scale by 2 or 3 and keep 16bit allot/hash (faster than
> 18/14 bit bitfields on x86). Not sure it's worth it (it adds two shifts
> per packet).

I'm OK with any of those methods.

> > The change in fair treatment looks dubious. A flow which uses exactly
> > its quantum in one round will be skipped in the next round. A flow
> > which uses a bit more than its quantum in one round will be skipped
> > too, while we should only give it less this time to keep the sum up to
> > 2 quantums. (The usual algorithm is to check if a flow has enough
> > "tickets" for sending its next packet.)
> 
> Hmm... 
> 
> A flow which uses exactly its quantum in one round won't be skipped in
> the next round.
> 
> I only kept the "I pass my round to the next slot in the chain" handling
> in one place instead of two; maybe you missed the removal at the end of
> sfq_dequeue()?
> 
> -	} else if ((slot->allot -= qdisc_pkt_len(skb)) <= 0) {
> -		q->tail = slot;
> -		slot->allot += q->quantum;
> +	} else {
> +		slot->allot -= qdisc_pkt_len(skb);
> 	}
> 
> Now the check is performed at the beginning of sfq_dequeue(), to be able
> to charge a previously sent 'big packet' multiple times (a faulty flow
> won't send a packet before passing xx rounds).
> 
> I believe I just did the right thing. The "allot" is incremented when
> the current flow "passes its round to the next slot", and decremented
> when a packet is dequeued from this slot. Before being allowed to
> dequeue a packet, "allot" must be 'positive'.

Simply try to check my examples before and after. There is no skipping
of a round now. It's a serious change; somebody tried hard to avoid it
in the current implementation. You should also think about fairness for
normal (but differently sized) packets.

Jarek P.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next-2.6] sch_sfq: allow big packets and be fair
  2010-12-21 11:39                           ` Jarek Poplawski
@ 2010-12-21 12:17                             ` Jarek Poplawski
  2010-12-21 13:04                               ` [PATCH v2 " Eric Dumazet
  0 siblings, 1 reply; 40+ messages in thread
From: Jarek Poplawski @ 2010-12-21 12:17 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, Patrick McHardy, netdev

On Tue, Dec 21, 2010 at 11:39:20AM +0000, Jarek Poplawski wrote:
> On Tue, Dec 21, 2010 at 11:57:17AM +0100, Eric Dumazet wrote:
> > Now the check is performed at the beginning of sfq_dequeue(), to be able
> > to charge a previously sent 'big packet' multiple times (a faulty flow
> > won't send a packet before passing xx rounds).
> > 
> > I believe I just did the right thing. The "allot" is incremented when
> > the current flow "passes its round to the next slot", and decremented
> > when a packet is dequeued from this slot. Before being allowed to
> > dequeue a packet, "allot" must be 'positive'.
> 
> Simply try to check my examples before and after. There is no skipping
> of a round now. It's a serious change; somebody tried hard to avoid it
> in the current implementation. You should also think about fairness for
> normal (but differently sized) packets.

Oops! You're right after all ;-) This skipping shouldn't happen with a
quantum bigger than the max packet size, so this patch is OK.
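
To double check the bound (userspace sketch; it assumes every packet
is at most one quantum, and 1514 is just an example):

/* with pkt <= quantum, the debt left when a turn ends is less than
 * quantum, so the refill granted at that moment already makes allot
 * positive again: the flow is served in the very next round */
#include <assert.h>

int main(void)
{
	int quantum = 1514;
	int pkt;

	for (pkt = 1; pkt <= quantum; pkt++) {
		int allot = 1;	/* smallest credit that still allows a dequeue */

		allot -= pkt;	/* worst-case charge for the last packet */
		assert(allot + quantum > 0);	/* one refill is enough */
	}
	return 0;
}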

Sorry,
Jarek P.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v2 net-next-2.6] sch_sfq: allow big packets and be fair
  2010-12-21 12:17                             ` Jarek Poplawski
@ 2010-12-21 13:04                               ` Eric Dumazet
  2010-12-21 13:47                                 ` Jarek Poplawski
  2010-12-28 21:46                                 ` David Miller
  0 siblings, 2 replies; 40+ messages in thread
From: Eric Dumazet @ 2010-12-21 13:04 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: David Miller, Patrick McHardy, netdev

On Tuesday, 21 December 2010 at 12:17 +0000, Jarek Poplawski wrote:

> Oops! You're right after all ;-) This skipping shouldn't happen with a
> quantum bigger than the max packet size, so this patch is OK.

Thanks Jarek, here is a v2 with the scale you suggested.

[PATCH v2 net-next-2.6] sch_sfq: allow big packets and be fair

SFQ is currently 'limited' to small packets, because it uses a 15bit
allotment number per flow. Introduce a scale by 8, so that we can handle
full size TSO/GRO packets.

Use appropriate handling to make sure allot is positive before a new
packet is dequeued, so that fairness is respected.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jarek Poplawski <jarkao2@gmail.com>
Cc: Patrick McHardy <kaber@trash.net>
---
v2: Use a scale of 8 as Jarek suggested, instead of 18bit fields

 net/sched/sch_sfq.c |   28 +++++++++++++++++++++-------
 1 file changed, 21 insertions(+), 7 deletions(-)

diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index c474b4b..f3a9fd7 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -67,7 +67,7 @@
 
 	IMPLEMENTATION:
 	This implementation limits maximal queue length to 128;
-	maximal mtu to 2^15-1; max 128 flows, number of hash buckets to 1024.
+	max mtu to 2^18-1; max 128 flows, number of hash buckets to 1024.
 	The only goal of this restrictions was that all data
 	fit into one 4K page on 32bit arches.
 
@@ -77,6 +77,11 @@
 #define SFQ_SLOTS		128 /* max number of flows */
 #define SFQ_EMPTY_SLOT		255
 #define SFQ_HASH_DIVISOR	1024
+/* We use 15+1 bits to store allot, and want to handle packets up to 64K
+ * Scale allot by 8 (1<<3) so that no overflow occurs.
+ */
+#define SFQ_ALLOT_SHIFT		3
+#define SFQ_ALLOT_SIZE(X)	DIV_ROUND_UP(X, 1 << SFQ_ALLOT_SHIFT)
 
 /* This type should contain at least SFQ_DEPTH + SFQ_SLOTS values */
 typedef unsigned char sfq_index;
@@ -115,7 +120,7 @@ struct sfq_sched_data
 	struct timer_list perturb_timer;
 	u32		perturbation;
 	sfq_index	cur_depth;	/* depth of longest slot */
-
+	unsigned short  scaled_quantum; /* SFQ_ALLOT_SIZE(quantum) */
 	struct sfq_slot *tail;		/* current slot in round */
 	sfq_index	ht[SFQ_HASH_DIVISOR];	/* Hash table */
 	struct sfq_slot	slots[SFQ_SLOTS];
@@ -394,7 +399,7 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 			q->tail->next = x;
 		}
 		q->tail = slot;
-		slot->allot = q->quantum;
+		slot->allot = q->scaled_quantum;
 	}
 	if (++sch->q.qlen <= q->limit) {
 		sch->bstats.bytes += qdisc_pkt_len(skb);
@@ -430,8 +435,14 @@ sfq_dequeue(struct Qdisc *sch)
 	if (q->tail == NULL)
 		return NULL;
 
+next_slot:
 	a = q->tail->next;
 	slot = &q->slots[a];
+	if (slot->allot <= 0) {
+		q->tail = slot;
+		slot->allot += q->scaled_quantum;
+		goto next_slot;
+	}
 	skb = slot_dequeue_head(slot);
 	sfq_dec(q, a);
 	sch->q.qlen--;
@@ -446,9 +457,8 @@ sfq_dequeue(struct Qdisc *sch)
 			return skb;
 		}
 		q->tail->next = next_a;
-	} else if ((slot->allot -= qdisc_pkt_len(skb)) <= 0) {
-		q->tail = slot;
-		slot->allot += q->quantum;
+	} else {
+		slot->allot -= SFQ_ALLOT_SIZE(qdisc_pkt_len(skb));
 	}
 	return skb;
 }
@@ -484,6 +494,7 @@ static int sfq_change(struct Qdisc *sch, struct nlattr *opt)
 
 	sch_tree_lock(sch);
 	q->quantum = ctl->quantum ? : psched_mtu(qdisc_dev(sch));
+	q->scaled_quantum = SFQ_ALLOT_SIZE(q->quantum);
 	q->perturb_period = ctl->perturb_period * HZ;
 	if (ctl->limit)
 		q->limit = min_t(u32, ctl->limit, SFQ_DEPTH - 1);
@@ -524,6 +535,7 @@ static int sfq_init(struct Qdisc *sch, struct nlattr *opt)
 	q->tail = NULL;
 	if (opt == NULL) {
 		q->quantum = psched_mtu(qdisc_dev(sch));
+		q->scaled_quantum = SFQ_ALLOT_SIZE(q->quantum);
 		q->perturb_period = 0;
 		q->perturbation = net_random();
 	} else {
@@ -610,7 +622,9 @@ static int sfq_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 	struct sfq_sched_data *q = qdisc_priv(sch);
 	const struct sfq_slot *slot = &q->slots[q->ht[cl - 1]];
 	struct gnet_stats_queue qs = { .qlen = slot->qlen };
-	struct tc_sfq_xstats xstats = { .allot = slot->allot };
+	struct tc_sfq_xstats xstats = {
+		.allot = slot->allot << SFQ_ALLOT_SHIFT
+	};
 	struct sk_buff *skb;
 
 	slot_queue_walk(slot, skb)



^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 net-next-2.6] sch_sfq: allow big packets and be fair
  2010-12-21 13:04                               ` [PATCH v2 " Eric Dumazet
@ 2010-12-21 13:47                                 ` Jarek Poplawski
  2010-12-28 21:46                                 ` David Miller
  1 sibling, 0 replies; 40+ messages in thread
From: Jarek Poplawski @ 2010-12-21 13:47 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, Patrick McHardy, netdev

On Tue, Dec 21, 2010 at 02:04:59PM +0100, Eric Dumazet wrote:
> On Tuesday, 21 December 2010 at 12:17 +0000, Jarek Poplawski wrote:
> 
> > Oops! You're right after all ;-) This skipping shouldn't happen with a
> > quantum bigger than the max packet size, so this patch is OK.
> 
> Thanks Jarek, here is a v2 with the scale you suggested.

Very nice! Thanks as well,
Jarek P.

> 
> [PATCH v2 net-next-2.6] sch_sfq: allow big packets and be fair
> 
> SFQ is currently 'limited' to small packets, because it uses a 15bit
> allotment number per flow. Introduce a scale by 8, so that we can handle
> full size TSO/GRO packets.
> 
> Use appropriate handling to make sure allot is positive before a new
> packet is dequeued, so that fairness is respected.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> Cc: Jarek Poplawski <jarkao2@gmail.com>
> Cc: Patrick McHardy <kaber@trash.net>
> ---
> v2: Use a scale of 8 as Jarek suggested, instead of 18bit fields
> 
>  net/sched/sch_sfq.c |   28 +++++++++++++++++++++-------
>  1 file changed, 21 insertions(+), 7 deletions(-)
> 
> diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
> index c474b4b..f3a9fd7 100644
> --- a/net/sched/sch_sfq.c
> +++ b/net/sched/sch_sfq.c
> @@ -67,7 +67,7 @@
>  
>  	IMPLEMENTATION:
>  	This implementation limits maximal queue length to 128;
> -	maximal mtu to 2^15-1; max 128 flows, number of hash buckets to 1024.
> +	max mtu to 2^18-1; max 128 flows, number of hash buckets to 1024.
>  	The only goal of this restrictions was that all data
>  	fit into one 4K page on 32bit arches.
>  
> @@ -77,6 +77,11 @@
>  #define SFQ_SLOTS		128 /* max number of flows */
>  #define SFQ_EMPTY_SLOT		255
>  #define SFQ_HASH_DIVISOR	1024
> +/* We use 15+1 bits to store allot, and want to handle packets up to 64K
> + * Scale allot by 8 (1<<3) so that no overflow occurs.
> + */
> +#define SFQ_ALLOT_SHIFT		3
> +#define SFQ_ALLOT_SIZE(X)	DIV_ROUND_UP(X, 1 << SFQ_ALLOT_SHIFT)
>  
>  /* This type should contain at least SFQ_DEPTH + SFQ_SLOTS values */
>  typedef unsigned char sfq_index;
> @@ -115,7 +120,7 @@ struct sfq_sched_data
>  	struct timer_list perturb_timer;
>  	u32		perturbation;
>  	sfq_index	cur_depth;	/* depth of longest slot */
> -
> +	unsigned short  scaled_quantum; /* SFQ_ALLOT_SIZE(quantum) */
>  	struct sfq_slot *tail;		/* current slot in round */
>  	sfq_index	ht[SFQ_HASH_DIVISOR];	/* Hash table */
>  	struct sfq_slot	slots[SFQ_SLOTS];
> @@ -394,7 +399,7 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
>  			q->tail->next = x;
>  		}
>  		q->tail = slot;
> -		slot->allot = q->quantum;
> +		slot->allot = q->scaled_quantum;
>  	}
>  	if (++sch->q.qlen <= q->limit) {
>  		sch->bstats.bytes += qdisc_pkt_len(skb);
> @@ -430,8 +435,14 @@ sfq_dequeue(struct Qdisc *sch)
>  	if (q->tail == NULL)
>  		return NULL;
>  
> +next_slot:
>  	a = q->tail->next;
>  	slot = &q->slots[a];
> +	if (slot->allot <= 0) {
> +		q->tail = slot;
> +		slot->allot += q->scaled_quantum;
> +		goto next_slot;
> +	}
>  	skb = slot_dequeue_head(slot);
>  	sfq_dec(q, a);
>  	sch->q.qlen--;
> @@ -446,9 +457,8 @@ sfq_dequeue(struct Qdisc *sch)
>  			return skb;
>  		}
>  		q->tail->next = next_a;
> -	} else if ((slot->allot -= qdisc_pkt_len(skb)) <= 0) {
> -		q->tail = slot;
> -		slot->allot += q->quantum;
> +	} else {
> +		slot->allot -= SFQ_ALLOT_SIZE(qdisc_pkt_len(skb));
>  	}
>  	return skb;
>  }
> @@ -484,6 +494,7 @@ static int sfq_change(struct Qdisc *sch, struct nlattr *opt)
>  
>  	sch_tree_lock(sch);
>  	q->quantum = ctl->quantum ? : psched_mtu(qdisc_dev(sch));
> +	q->scaled_quantum = SFQ_ALLOT_SIZE(q->quantum);
>  	q->perturb_period = ctl->perturb_period * HZ;
>  	if (ctl->limit)
>  		q->limit = min_t(u32, ctl->limit, SFQ_DEPTH - 1);
> @@ -524,6 +535,7 @@ static int sfq_init(struct Qdisc *sch, struct nlattr *opt)
>  	q->tail = NULL;
>  	if (opt == NULL) {
>  		q->quantum = psched_mtu(qdisc_dev(sch));
> +		q->scaled_quantum = SFQ_ALLOT_SIZE(q->quantum);
>  		q->perturb_period = 0;
>  		q->perturbation = net_random();
>  	} else {
> @@ -610,7 +622,9 @@ static int sfq_dump_class_stats(struct Qdisc *sch, unsigned long cl,
>  	struct sfq_sched_data *q = qdisc_priv(sch);
>  	const struct sfq_slot *slot = &q->slots[q->ht[cl - 1]];
>  	struct gnet_stats_queue qs = { .qlen = slot->qlen };
> -	struct tc_sfq_xstats xstats = { .allot = slot->allot };
> +	struct tc_sfq_xstats xstats = {
> +		.allot = slot->allot << SFQ_ALLOT_SHIFT
> +	};
>  	struct sk_buff *skb;
>  
>  	slot_queue_walk(slot, skb)
> 
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 net-next-2.6] sch_sfq: allow big packets and be fair
  2010-12-21 13:04                               ` [PATCH v2 " Eric Dumazet
  2010-12-21 13:47                                 ` Jarek Poplawski
@ 2010-12-28 21:46                                 ` David Miller
  2010-12-29  7:53                                   ` [PATCH v3 " Eric Dumazet
  1 sibling, 1 reply; 40+ messages in thread
From: David Miller @ 2010-12-28 21:46 UTC (permalink / raw)
  To: eric.dumazet; +Cc: jarkao2, kaber, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 21 Dec 2010 14:04:59 +0100

> On Tuesday, 21 December 2010 at 12:17 +0000, Jarek Poplawski wrote:
> 
>> Oops! You're right after all ;-) This skipping shouldn't happen with a
>> quantum bigger than the max packet size, so this patch is OK.
> 
> Thanks Jarek, here is a v2 with the scale you suggested.
> 
> [PATCH v2 net-next-2.6] sch_sfq: allow big packets and be fair
> 
> SFQ is currently 'limited' to small packets, because it uses a 15bit
> allotment number per flow. Introduce a scale by 8, so that we can handle
> full size TSO/GRO packets.
> 
> Use appropriate handling to make sure allot is positive before a new
> packet is dequeued, so that fairness is respected.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> Cc: Jarek Poplawski <jarkao2@gmail.com>
> Cc: Patrick McHardy <kaber@trash.net>
> ---
> v2: Use a scale of 8 as Jarek suggested, instead of 18bit fields

Eric, this doesn't apply cleanly, since the code in sfq_dump_class_stats()
is a bit different in net-next-2.6.

Please respin this and resubmit.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v3 net-next-2.6] sch_sfq: allow big packets and be fair
  2010-12-28 21:46                                 ` David Miller
@ 2010-12-29  7:53                                   ` Eric Dumazet
  2010-12-31 20:48                                     ` David Miller
  0 siblings, 1 reply; 40+ messages in thread
From: Eric Dumazet @ 2010-12-29  7:53 UTC (permalink / raw)
  To: David Miller; +Cc: jarkao2, kaber, netdev

On Tuesday, 28 December 2010 at 13:46 -0800, David Miller wrote:
> Eric this doesn't apply cleanly, since the code in sfq_dump_class_stats()
> is a bit different in net-next-2.6
> 
> Please respin this and resubmit.

Sure, here is the version for latest net-next

Thanks

[PATCH v3 net-next-2.6] sch_sfq: allow big packets and be fair

SFQ is currently 'limited' to small packets, because it uses a 15bit
allotment number per flow. Introduce a scale by 8, so that we can handle
full size TSO/GRO packets.

Use appropriate handling to make sure allot is positive before a new
packet is dequeued, so that fairness is respected.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Jarek Poplawski <jarkao2@gmail.com>
Cc: Patrick McHardy <kaber@trash.net>
---
v3: respin after commit ee09b3c1cf (fix sfq class stats handling)
v2: Use a scale of 8 as Jarek suggested, instead of 18bit fields

 net/sched/sch_sfq.c |   26 +++++++++++++++++++-------
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index 0ce421d..79c8967 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -67,7 +67,7 @@
 
 	IMPLEMENTATION:
 	This implementation limits maximal queue length to 128;
-	maximal mtu to 2^15-1; max 128 flows, number of hash buckets to 1024.
+	max mtu to 2^18-1; max 128 flows, number of hash buckets to 1024.
 	The only goal of this restrictions was that all data
 	fit into one 4K page on 32bit arches.
 
@@ -77,6 +77,11 @@
 #define SFQ_SLOTS		128 /* max number of flows */
 #define SFQ_EMPTY_SLOT		255
 #define SFQ_HASH_DIVISOR	1024
+/* We use 16 bits to store allot, and want to handle packets up to 64K
+ * Scale allot by 8 (1<<3) so that no overflow occurs.
+ */
+#define SFQ_ALLOT_SHIFT		3
+#define SFQ_ALLOT_SIZE(X)	DIV_ROUND_UP(X, 1 << SFQ_ALLOT_SHIFT)
 
 /* This type should contain at least SFQ_DEPTH + SFQ_SLOTS values */
 typedef unsigned char sfq_index;
@@ -115,7 +120,7 @@ struct sfq_sched_data
 	struct timer_list perturb_timer;
 	u32		perturbation;
 	sfq_index	cur_depth;	/* depth of longest slot */
-
+	unsigned short  scaled_quantum; /* SFQ_ALLOT_SIZE(quantum) */
 	struct sfq_slot *tail;		/* current slot in round */
 	sfq_index	ht[SFQ_HASH_DIVISOR];	/* Hash table */
 	struct sfq_slot	slots[SFQ_SLOTS];
@@ -395,7 +400,7 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 			q->tail->next = x;
 		}
 		q->tail = slot;
-		slot->allot = q->quantum;
+		slot->allot = q->scaled_quantum;
 	}
 	if (++sch->q.qlen <= q->limit) {
 		sch->bstats.bytes += qdisc_pkt_len(skb);
@@ -431,8 +436,14 @@ sfq_dequeue(struct Qdisc *sch)
 	if (q->tail == NULL)
 		return NULL;
 
+next_slot:
 	a = q->tail->next;
 	slot = &q->slots[a];
+	if (slot->allot <= 0) {
+		q->tail = slot;
+		slot->allot += q->scaled_quantum;
+		goto next_slot;
+	}
 	skb = slot_dequeue_head(slot);
 	sfq_dec(q, a);
 	sch->q.qlen--;
@@ -447,9 +458,8 @@ sfq_dequeue(struct Qdisc *sch)
 			return skb;
 		}
 		q->tail->next = next_a;
-	} else if ((slot->allot -= qdisc_pkt_len(skb)) <= 0) {
-		q->tail = slot;
-		slot->allot += q->quantum;
+	} else {
+		slot->allot -= SFQ_ALLOT_SIZE(qdisc_pkt_len(skb));
 	}
 	return skb;
 }
@@ -485,6 +495,7 @@ static int sfq_change(struct Qdisc *sch, struct nlattr *opt)
 
 	sch_tree_lock(sch);
 	q->quantum = ctl->quantum ? : psched_mtu(qdisc_dev(sch));
+	q->scaled_quantum = SFQ_ALLOT_SIZE(q->quantum);
 	q->perturb_period = ctl->perturb_period * HZ;
 	if (ctl->limit)
 		q->limit = min_t(u32, ctl->limit, SFQ_DEPTH - 1);
@@ -525,6 +536,7 @@ static int sfq_init(struct Qdisc *sch, struct nlattr *opt)
 	q->tail = NULL;
 	if (opt == NULL) {
 		q->quantum = psched_mtu(qdisc_dev(sch));
+		q->scaled_quantum = SFQ_ALLOT_SIZE(q->quantum);
 		q->perturb_period = 0;
 		q->perturbation = net_random();
 	} else {
@@ -617,7 +629,7 @@ static int sfq_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 	if (idx != SFQ_EMPTY_SLOT) {
 		const struct sfq_slot *slot = &q->slots[idx];
 
-		xstats.allot = slot->allot;
+		xstats.allot = slot->allot << SFQ_ALLOT_SHIFT;
 		qs.qlen = slot->qlen;
 		slot_queue_walk(slot, skb)
 			qs.backlog += qdisc_pkt_len(skb);



^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH v3 net-next-2.6] sch_sfq: allow big packets and be fair
  2010-12-29  7:53                                   ` [PATCH v3 " Eric Dumazet
@ 2010-12-31 20:48                                     ` David Miller
  0 siblings, 0 replies; 40+ messages in thread
From: David Miller @ 2010-12-31 20:48 UTC (permalink / raw)
  To: eric.dumazet; +Cc: jarkao2, kaber, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 29 Dec 2010 08:53:33 +0100

> [PATCH v3 net-next-2.6] sch_sfq: allow big packets and be fair
> 
> SFQ is currently 'limited' to small packets, because it uses a 15bit
> allotment number per flow. Introduce a scale by 8, so that we can handle
> full size TSO/GRO packets.
> 
> Use appropriate handling to make sure allot is positive before a new
> packet is dequeued, so that fairness is respected.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> Acked-by: Jarek Poplawski <jarkao2@gmail.com>
> Cc: Patrick McHardy <kaber@trash.net>

Applied, thanks Eric.

^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2010-12-31 20:47 UTC | newest]

Thread overview: 40+ messages
2010-12-15 14:03 [PATCH] net_sched: sch_sfq: fix allot handling Eric Dumazet
2010-12-15 16:03 ` Patrick McHardy
2010-12-15 16:27   ` Eric Dumazet
2010-12-15 16:40     ` [PATCH v2] " Eric Dumazet
2010-12-15 16:43       ` Patrick McHardy
2010-12-15 16:55         ` Eric Dumazet
2010-12-15 17:03           ` Patrick McHardy
2010-12-15 17:09             ` Eric Dumazet
2010-12-15 17:21               ` Patrick McHardy
2010-12-15 17:30                 ` [PATCH v3] " Eric Dumazet
2010-12-15 18:18                   ` [PATCH net-next-2.6] net_sched: sch_sfq: add backlog info in sfq_dump_class_stats() Eric Dumazet
2010-12-15 19:10                     ` Eric Dumazet
2010-12-16  8:16                     ` Jarek Poplawski
2010-12-16 10:18                       ` [PATCH v2 " Eric Dumazet
2010-12-16 11:03                       ` [PATCH " Eric Dumazet
2010-12-16 13:09                         ` Jarek Poplawski
2010-12-20 21:14                     ` David Miller
2010-12-20 21:18                   ` [PATCH v3] net_sched: sch_sfq: fix allot handling David Miller
2010-12-16 13:08             ` [PATCH v2] " Eric Dumazet
2010-12-17 16:52               ` [RFC PATCH] net_sched: sch_sfq: better struct layouts Eric Dumazet
2010-12-19 21:22                 ` Jarek Poplawski
2010-12-20 17:02                   ` [PATCH v2] " Eric Dumazet
2010-12-20 21:33                     ` David Miller
2010-12-20 21:42                       ` Eric Dumazet
2010-12-20 22:54                         ` [PATCH v3 net-next-2.6] " Eric Dumazet
2010-12-21  5:33                           ` David Miller
2010-12-20 22:55                     ` [PATCH v2] " Jarek Poplawski
2010-12-20 23:16                     ` [PATCH net-next-2.6] sch_sfq: allow big packets and be fair Eric Dumazet
2010-12-21 10:15                       ` Jarek Poplawski
2010-12-21 10:30                         ` Jarek Poplawski
2010-12-21 10:44                           ` Eric Dumazet
2010-12-21 10:56                             ` Jarek Poplawski
2010-12-21 10:57                         ` Eric Dumazet
2010-12-21 11:39                           ` Jarek Poplawski
2010-12-21 12:17                             ` Jarek Poplawski
2010-12-21 13:04                               ` [PATCH v2 " Eric Dumazet
2010-12-21 13:47                                 ` Jarek Poplawski
2010-12-28 21:46                                 ` David Miller
2010-12-29  7:53                                   ` [PATCH v3 " Eric Dumazet
2010-12-31 20:48                                     ` David Miller
