* [PATCH] net_sched: sch_sfq: fix allot handling
@ 2010-12-15 14:03 Eric Dumazet
  2010-12-15 16:03 ` Patrick McHardy
  0 siblings, 1 reply; 40+ messages in thread

From: Eric Dumazet @ 2010-12-15 14:03 UTC (permalink / raw)
  To: David Miller; +Cc: Patrick McHardy, netdev, Jarek Poplawski

When deploying SFQ/IFB here at work, I found the allot management was
pretty wrong in sfq, even after changing allot from short to int...

We should init allot for each new flow turn, not use a previous value,
or else small packets can easily make allot overflow.

Before patch, I saw bursts of several packets per flow, apparently
defeating the "allot 1514" limit I had on my SFQ class.

class sfq 11:1 parent 11:
 (dropped 0, overlimits 0 requeues 0)
 backlog 0b 7p requeues 0
 allot 11546

class sfq 11:46 parent 11:
 (dropped 0, overlimits 0 requeues 0)
 backlog 0b 1p requeues 0
 allot -23873

class sfq 11:78 parent 11:
 (dropped 0, overlimits 0 requeues 0)
 backlog 0b 5p requeues 0
 allot 11393

After patch, better fairness among each flow, the allot limit being
respected.

class sfq 11:52 parent 11:
 (dropped 0, overlimits 0 requeues 0)
 backlog 0b 4p requeues 0
 allot 1514

class sfq 11:60 parent 11:
 (dropped 0, overlimits 0 requeues 0)
 backlog 0b 3p requeues 0
 allot -586

class sfq 11:6b parent 11:
 (dropped 0, overlimits 0 requeues 0)
 backlog 0b 3p requeues 0
 allot -586

class sfq 11:71 parent 11:
 (dropped 0, overlimits 0 requeues 0)
 backlog 0b 3p requeues 0
 allot 1514

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/sched/sch_sfq.c |   11 +++++------
 1 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index 3cf478d..8c8a190 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -270,7 +270,7 @@ static unsigned int sfq_drop(struct Qdisc *sch)
 		/* It is difficult to believe, but ALL THE SLOTS HAVE LENGTH 1. */
 		d = q->next[q->tail];
 		q->next[q->tail] = q->next[d];
-		q->allot[q->next[d]] += q->quantum;
+		q->allot[q->next[d]] = q->quantum;
 		skb = q->qs[d].prev;
 		len = qdisc_pkt_len(skb);
 		__skb_unlink(skb, &q->qs[d]);
@@ -321,14 +321,13 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	sfq_inc(q, x);
 	if (q->qs[x].qlen == 1) {		/* The flow is new */
 		if (q->tail == SFQ_DEPTH) {	/* It is the first flow */
-			q->tail = x;
 			q->next[x] = x;
-			q->allot[x] = q->quantum;
 		} else {
 			q->next[x] = q->next[q->tail];
 			q->next[q->tail] = x;
-			q->tail = x;
 		}
+		q->tail = x;
+		q->allot[x] = q->quantum;
 	}
 	if (++sch->q.qlen <= q->limit) {
 		sch->bstats.bytes += qdisc_pkt_len(skb);
@@ -382,11 +381,11 @@ sfq_dequeue(struct Qdisc *sch)
 			return skb;
 		}
 		q->next[q->tail] = a;
-		q->allot[a] += q->quantum;
+		q->allot[a] = q->quantum;
 	} else if ((q->allot[a] -= qdisc_pkt_len(skb)) <= 0) {
 		q->tail = a;
 		a = q->next[a];
-		q->allot[a] += q->quantum;
+		q->allot[a] = q->quantum;
 	}
 	return skb;
 }

^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH] net_sched: sch_sfq: fix allot handling
  2010-12-15 16:03 ` Patrick McHardy
  2010-12-15 16:27   ` Eric Dumazet

From: Patrick McHardy @ 2010-12-15 16:03 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, Jarek Poplawski

On 15.12.2010 15:03, Eric Dumazet wrote:
> When deploying SFQ/IFB here at work, I found the allot management was
> pretty wrong in sfq, even after changing allot from short to int...
>
> We should init allot for each new flow turn, not use a previous value,
> or else small packets can easily make allot overflow.
>
> Before patch, I saw bursts of several packets per flow, apparently
> defeating the "allot 1514" limit I had on my SFQ class.
>
> class sfq 11:1 parent 11:
>  (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 7p requeues 0
>  allot 11546
>
> class sfq 11:46 parent 11:
>  (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 1p requeues 0
>  allot -23873
>
> class sfq 11:78 parent 11:
>  (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 5p requeues 0
>  allot 11393

These values definitely look wrong.

> diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
> index 3cf478d..8c8a190 100644
> --- a/net/sched/sch_sfq.c
> +++ b/net/sched/sch_sfq.c
> @@ -270,7 +270,7 @@ static unsigned int sfq_drop(struct Qdisc *sch)
>  		/* It is difficult to believe, but ALL THE SLOTS HAVE LENGTH 1. */
>  		d = q->next[q->tail];
>  		q->next[q->tail] = q->next[d];
> -		q->allot[q->next[d]] += q->quantum;
> +		q->allot[q->next[d]] = q->quantum;
>  		skb = q->qs[d].prev;
>  		len = qdisc_pkt_len(skb);
>  		__skb_unlink(skb, &q->qs[d]);

I'm not sure about this part, but let's ignore that for now since it
shouldn't affect your testcase unless you're using CBQ.

> @@ -321,14 +321,13 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
>  	sfq_inc(q, x);
>  	if (q->qs[x].qlen == 1) {		/* The flow is new */
>  		if (q->tail == SFQ_DEPTH) {	/* It is the first flow */
> -			q->tail = x;
>  			q->next[x] = x;
> -			q->allot[x] = q->quantum;
>  		} else {
>  			q->next[x] = q->next[q->tail];
>  			q->next[q->tail] = x;
> -			q->tail = x;
>  		}
> +		q->tail = x;
> +		q->allot[x] = q->quantum;
>  	}

This looks correct, for new flows allot should be initialized from
scratch.

>  	if (++sch->q.qlen <= q->limit) {
>  		sch->bstats.bytes += qdisc_pkt_len(skb);
> @@ -382,11 +381,11 @@ sfq_dequeue(struct Qdisc *sch)
>  			return skb;
>  		}
>  		q->next[q->tail] = a;
> -		q->allot[a] += q->quantum;
> +		q->allot[a] = q->quantum;

The allot initialization doesn't seem necessary anymore at all now
that you're reinitializing allot for flows that became active
unconditionally in sfq_enqueue().

>  	} else if ((q->allot[a] -= qdisc_pkt_len(skb)) <= 0) {
>  		q->tail = a;
>  		a = q->next[a];
> -		q->allot[a] += q->quantum;
> +		q->allot[a] = q->quantum;

This seems to break long-term fairness for active flows by not
accounting for overshooting the allotment in the next round
anymore.

I think either the change in sfq_enqueue() or the first change
in sfq_dequeue() should be enough to fix the problem you're seeing.
Basically what needs to be done is initialize allot once from
scratch when the flow becomes active, then add one quantum per
round while it stays active.
* Re: [PATCH] net_sched: sch_sfq: fix allot handling
  2010-12-15 16:27   ` Eric Dumazet
  2010-12-15 16:40     ` [PATCH v2] " Eric Dumazet

From: Eric Dumazet @ 2010-12-15 16:27 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: David Miller, netdev, Jarek Poplawski

On Wednesday 15 December 2010 at 17:03 +0100, Patrick McHardy wrote:
> On 15.12.2010 15:03, Eric Dumazet wrote:
> > When deploying SFQ/IFB here at work, I found the allot management was
> > pretty wrong in sfq, even after changing allot from short to int...
> >
> > We should init allot for each new flow turn, not use a previous value,
> > or else small packets can easily make allot overflow.
> >
> > Before patch, I saw bursts of several packets per flow, apparently
> > defeating the "allot 1514" limit I had on my SFQ class.
> >
> > class sfq 11:1 parent 11:
> >  (dropped 0, overlimits 0 requeues 0)
> >  backlog 0b 7p requeues 0
> >  allot 11546
> >
> > class sfq 11:46 parent 11:
> >  (dropped 0, overlimits 0 requeues 0)
> >  backlog 0b 1p requeues 0
> >  allot -23873
> >
> > class sfq 11:78 parent 11:
> >  (dropped 0, overlimits 0 requeues 0)
> >  backlog 0b 5p requeues 0
> >  allot 11393
>
> These values definitely look wrong.
>
> > diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
> > index 3cf478d..8c8a190 100644
> > --- a/net/sched/sch_sfq.c
> > +++ b/net/sched/sch_sfq.c
> > @@ -270,7 +270,7 @@ static unsigned int sfq_drop(struct Qdisc *sch)
> >  		/* It is difficult to believe, but ALL THE SLOTS HAVE LENGTH 1. */
> >  		d = q->next[q->tail];
> >  		q->next[q->tail] = q->next[d];
> > -		q->allot[q->next[d]] += q->quantum;
> > +		q->allot[q->next[d]] = q->quantum;
> >  		skb = q->qs[d].prev;
> >  		len = qdisc_pkt_len(skb);
> >  		__skb_unlink(skb, &q->qs[d]);
>
> I'm not sure about this part, but let's ignore that for now since it
> shouldn't affect your testcase unless you're using CBQ.
>
> > @@ -321,14 +321,13 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
> >  	sfq_inc(q, x);
> >  	if (q->qs[x].qlen == 1) {		/* The flow is new */
> >  		if (q->tail == SFQ_DEPTH) {	/* It is the first flow */
> > -			q->tail = x;
> >  			q->next[x] = x;
> > -			q->allot[x] = q->quantum;
> >  		} else {
> >  			q->next[x] = q->next[q->tail];
> >  			q->next[q->tail] = x;
> > -			q->tail = x;
> >  		}
> > +		q->tail = x;
> > +		q->allot[x] = q->quantum;
> >  	}
>
> This looks correct, for new flows allot should be initialized from
> scratch.
>
> >  	if (++sch->q.qlen <= q->limit) {
> >  		sch->bstats.bytes += qdisc_pkt_len(skb);
> > @@ -382,11 +381,11 @@ sfq_dequeue(struct Qdisc *sch)
> >  			return skb;
> >  		}
> >  		q->next[q->tail] = a;
> > -		q->allot[a] += q->quantum;
> > +		q->allot[a] = q->quantum;
>
> The allot initialization doesn't seem necessary anymore at all
> now that you're reinitializing allot for flows that became active
> unconditionally in sfq_enqueue().
>
> >  	} else if ((q->allot[a] -= qdisc_pkt_len(skb)) <= 0) {
> >  		q->tail = a;
> >  		a = q->next[a];
> > -		q->allot[a] += q->quantum;
> > +		q->allot[a] = q->quantum;
>
> This seems to break long-term fairness for active flows by not
> accounting for overshooting the allotment in the next round
> anymore.
>
> I think either the change in sfq_enqueue() or the first change
> in sfq_dequeue() should be enough to fix the problem you're seeing.
> Basically what needs to be done is initialize allot once from
> scratch when the flow becomes active, then add one quantum per
> round while it stays active.

Hmm, you may be right, thanks a lot for reviewing!

I noticed that with a normal quantum (1514), my SFQ setup was sending
two full frames per flow after my patch, so I was about to prepare a
new version ;)

I'll post a v2 shortly.

Thanks
* [PATCH v2] net_sched: sch_sfq: fix allot handling
  2010-12-15 16:40     ` Eric Dumazet
  2010-12-15 16:43       ` Patrick McHardy

From: Eric Dumazet @ 2010-12-15 16:40 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: David Miller, netdev, Jarek Poplawski

On Wednesday 15 December 2010 at 17:27 +0100, Eric Dumazet wrote:

> Hmm, you may be right, thanks a lot for reviewing!
>
> I noticed that with a normal quantum (1514), my SFQ setup was sending
> two full frames per flow after my patch, so I was about to prepare a
> new version ;)
>
> I'll post a v2 shortly.

Indeed, the missing init in sfq_enqueue() is enough to solve the
problem:

[PATCH v2] net_sched: sch_sfq: fix allot handling

When deploying SFQ/IFB here at work, I found the allot management was
pretty wrong in sfq, even after changing allot from short to int...

We should init allot for each new flow, not use a previous value found
in the slot.

Before patch, I saw bursts of several packets per flow, apparently
defeating the default "quantum 1514" limit I had on my SFQ class.

class sfq 11:1 parent 11:
 (dropped 0, overlimits 0 requeues 0)
 backlog 0b 7p requeues 0
 allot 11546

class sfq 11:46 parent 11:
 (dropped 0, overlimits 0 requeues 0)
 backlog 0b 1p requeues 0
 allot -23873

class sfq 11:78 parent 11:
 (dropped 0, overlimits 0 requeues 0)
 backlog 0b 5p requeues 0
 allot 11393

After patch, better fairness among each flow, the allot limit being
respected.

class sfq 11:2e4 parent 11:
 (dropped 0, overlimits 0 requeues 0)
 backlog 0b 4p requeues 0
 allot -1166

class sfq 11:2f7 parent 11:
 (dropped 0, overlimits 0 requeues 0)
 backlog 0b 3p requeues 0
 allot -1166

class sfq 11:313 parent 11:
 (dropped 0, overlimits 0 requeues 0)
 backlog 0b 3p requeues 0
 allot -1220

class sfq 11:335 parent 11:
 (dropped 0, overlimits 0 requeues 0)
 backlog 0b 4p requeues 0
 allot -1166

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/sched/sch_sfq.c |    5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index 3cf478d..065a2a5 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -321,14 +321,13 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	sfq_inc(q, x);
 	if (q->qs[x].qlen == 1) {		/* The flow is new */
 		if (q->tail == SFQ_DEPTH) {	/* It is the first flow */
-			q->tail = x;
 			q->next[x] = x;
-			q->allot[x] = q->quantum;
 		} else {
 			q->next[x] = q->next[q->tail];
 			q->next[q->tail] = x;
-			q->tail = x;
 		}
+		q->tail = x;
+		q->allot[x] = q->quantum;
 	}
 	if (++sch->q.qlen <= q->limit) {
 		sch->bstats.bytes += qdisc_pkt_len(skb);
* Re: [PATCH v2] net_sched: sch_sfq: fix allot handling
  2010-12-15 16:43       ` Patrick McHardy
  2010-12-15 16:55         ` Eric Dumazet

From: Patrick McHardy @ 2010-12-15 16:43 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, Jarek Poplawski

On 15.12.2010 17:40, Eric Dumazet wrote:
> On Wednesday 15 December 2010 at 17:27 +0100, Eric Dumazet wrote:
>
>> Hmm, you may be right, thanks a lot for reviewing!
>>
>> I noticed that with a normal quantum (1514), my SFQ setup was sending
>> two full frames per flow after my patch, so I was about to prepare a
>> new version ;)
>>
>> I'll post a v2 shortly.
>
> Indeed, the missing init in sfq_enqueue() is enough to solve the
> problem:
>
> [PATCH v2] net_sched: sch_sfq: fix allot handling
>
> When deploying SFQ/IFB here at work, I found the allot management was
> pretty wrong in sfq, even after changing allot from short to int...
>
> We should init allot for each new flow, not use a previous value found
> in the slot.
>
> Before patch, I saw bursts of several packets per flow, apparently
> defeating the default "quantum 1514" limit I had on my SFQ class.
>
> diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
> index 3cf478d..065a2a5 100644
> --- a/net/sched/sch_sfq.c
> +++ b/net/sched/sch_sfq.c
> @@ -321,14 +321,13 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
>  	sfq_inc(q, x);
>  	if (q->qs[x].qlen == 1) {		/* The flow is new */
>  		if (q->tail == SFQ_DEPTH) {	/* It is the first flow */
> -			q->tail = x;
>  			q->next[x] = x;
> -			q->allot[x] = q->quantum;
>  		} else {
>  			q->next[x] = q->next[q->tail];
>  			q->next[q->tail] = x;
> -			q->tail = x;
>  		}
> +		q->tail = x;
> +		q->allot[x] = q->quantum;
>  	}

Now we could remove the allot increase in sfq_dequeue for
the case that the flow becomes inactive. It is incorrect
anyway.
* Re: [PATCH v2] net_sched: sch_sfq: fix allot handling
  2010-12-15 16:55         ` Eric Dumazet
  2010-12-15 17:03           ` Patrick McHardy

From: Eric Dumazet @ 2010-12-15 16:55 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: David Miller, netdev, Jarek Poplawski

On Wednesday 15 December 2010 at 17:43 +0100, Patrick McHardy wrote:

> Now we could remove the allot increase in sfq_dequeue for
> the case that the flow becomes inactive. It is incorrect
> anyway.

Hmm, we increase the allot for the next slot, not for the slot that is
now empty.

	/* Is the slot empty? */
	if (q->qs[a].qlen == 0) {
		q->ht[q->hash[a]] = SFQ_DEPTH;
		a = q->next[a];			/* a = next slot index */
		if (a == old_a) {
			q->tail = SFQ_DEPTH;
			return skb;
		}
		q->next[q->tail] = a;
		q->allot[a] += q->quantum;	/* HERE, q->allot[a] is for the
						 * next slot: we give it its
						 * quantum for being activated */
	} else if ((q->allot[a] -= qdisc_pkt_len(skb)) <= 0) {

Maybe we should rename (a / old_a) to (a / next_a) to avoid confusion :)

I was thinking of allowing more packets per SFQ (but keeping the 126
active flows limit), what do you think?
* Re: [PATCH v2] net_sched: sch_sfq: fix allot handling
  2010-12-15 17:03           ` Patrick McHardy
  2010-12-15 17:09             ` Eric Dumazet
  2010-12-16 13:08             ` [PATCH v2] " Eric Dumazet

From: Patrick McHardy @ 2010-12-15 17:03 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, Jarek Poplawski

On 15.12.2010 17:55, Eric Dumazet wrote:
> On Wednesday 15 December 2010 at 17:43 +0100, Patrick McHardy wrote:
>
>> Now we could remove the allot increase in sfq_dequeue for
>> the case that the flow becomes inactive. It is incorrect
>> anyway.
>
> Hmm, we increase the allot for the next slot, not for the slot that is
> now empty.
>
> 	/* Is the slot empty? */
> 	if (q->qs[a].qlen == 0) {
> 		q->ht[q->hash[a]] = SFQ_DEPTH;
> 		a = q->next[a];			/* a = next slot index */
> 		if (a == old_a) {
> 			q->tail = SFQ_DEPTH;
> 			return skb;
> 		}
> 		q->next[q->tail] = a;
> 		q->allot[a] += q->quantum;	/* HERE, q->allot[a] is for the
> 						 * next slot: we give it its
> 						 * quantum for being activated */

Right, that's odd. It shouldn't be necessary anymore though since
now we initialize allot in sfq_enqueue() for all new flows and
increase the allotment for all active flows once per round in
sfq_dequeue(). The above code causes a second increase for the flow
following a flow which went inactive.

> 	} else if ((q->allot[a] -= qdisc_pkt_len(skb)) <= 0) {
>
> Maybe we should rename (a / old_a) to (a / next_a) to avoid confusion :)

That might have made things clearer :)

> I was thinking of allowing more packets per SFQ (but keeping the 126
> active flows limit), what do you think?

I keep forgetting why this limit exists, let me try to figure it out
once more :)
* Re: [PATCH v2] net_sched: sch_sfq: fix allot handling
  2010-12-15 17:09             ` Eric Dumazet
  2010-12-15 17:21               ` Patrick McHardy

From: Eric Dumazet @ 2010-12-15 17:09 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: David Miller, netdev, Jarek Poplawski

On Wednesday 15 December 2010 at 18:03 +0100, Patrick McHardy wrote:

> Right, that's odd. It shouldn't be necessary anymore though since
> now we initialize allot in sfq_enqueue() for all new flows and
> increase the allotment for all active flows once per round in
> sfq_dequeue(). The above code causes a second increase for the flow
> following a flow which went inactive.

Well, we do this in three places. Each time we 'select' a flow as the
next packet provider, we increase its allot by quantum.

We could change this, adding quantum to the current slot when its allot
becomes negative (and selecting the next slot for the next round).

This is basically what my v1 was doing ;)
* Re: [PATCH v2] net_sched: sch_sfq: fix allot handling
  2010-12-15 17:21               ` Patrick McHardy
  2010-12-15 17:30                 ` [PATCH v3] " Eric Dumazet

From: Patrick McHardy @ 2010-12-15 17:21 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, Jarek Poplawski

On 15.12.2010 18:09, Eric Dumazet wrote:
> On Wednesday 15 December 2010 at 18:03 +0100, Patrick McHardy wrote:
>
>> Right, that's odd. It shouldn't be necessary anymore though since
>> now we initialize allot in sfq_enqueue() for all new flows and
>> increase the allotment for all active flows once per round in
>> sfq_dequeue(). The above code causes a second increase for the flow
>> following a flow which went inactive.
>
> Well, we do this in three places. Each time we 'select' a flow as the
> next packet provider, we increase its allot by quantum.
>
> We could change this, adding quantum to the current slot when its allot
> becomes negative (and selecting the next slot for the next round).

Right, I again missed that 'a' refers to the next flow in the second
condition in sfq_dequeue().

I have to run now, will have another look at this tomorrow.
* [PATCH v3] net_sched: sch_sfq: fix allot handling
  2010-12-15 17:30                 ` Eric Dumazet
  2010-12-15 18:18                   ` [PATCH net-next-2.6] net_sched: sch_sfq: add backlog info in sfq_dump_class_stats() Eric Dumazet
  2010-12-20 21:18                   ` [PATCH v3] net_sched: sch_sfq: fix allot handling David Miller

From: Eric Dumazet @ 2010-12-15 17:30 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: David Miller, netdev, Jarek Poplawski

On Wednesday 15 December 2010 at 18:21 +0100, Patrick McHardy wrote:
> On 15.12.2010 18:09, Eric Dumazet wrote:
> > On Wednesday 15 December 2010 at 18:03 +0100, Patrick McHardy wrote:
> >
> >> Right, that's odd. It shouldn't be necessary anymore though since
> >> now we initialize allot in sfq_enqueue() for all new flows and
> >> increase the allotment for all active flows once per round in
> >> sfq_dequeue(). The above code causes a second increase for the flow
> >> following a flow which went inactive.
> >
> > Well, we do this in three places. Each time we 'select' a flow as the
> > next packet provider, we increase its allot by quantum.
> >
> > We could change this, adding quantum to the current slot when its allot
> > becomes negative (and selecting the next slot for the next round).
>
> Right, I again missed that 'a' refers to the next flow in the second
> condition in sfq_dequeue().
>
> I have to run now, will have another look at this tommorrow.

Here is v3: now we add a quantum only when the current flow has
consumed all its allot, not "in advance for the next slot". This
should use less CPU too.

Thanks

[PATCH v3] net_sched: sch_sfq: fix allot handling

When deploying SFQ/IFB here at work, I found the allot management was
pretty wrong in sfq, even after changing allot from short to int...

We should init allot for each new flow, not use a previous value found
in the slot.

Before patch, I saw bursts of several packets per flow, apparently
defeating the default "quantum 1514" limit I had on my SFQ class.

class sfq 11:1 parent 11:
 (dropped 0, overlimits 0 requeues 0)
 backlog 0b 7p requeues 0
 allot 11546

class sfq 11:46 parent 11:
 (dropped 0, overlimits 0 requeues 0)
 backlog 0b 1p requeues 0
 allot -23873

class sfq 11:78 parent 11:
 (dropped 0, overlimits 0 requeues 0)
 backlog 0b 5p requeues 0
 allot 11393

After patch, better fairness among each flow, the allot limit being
respected and allot staying positive:

class sfq 11:e parent 11:
 (dropped 0, overlimits 0 requeues 86)
 backlog 0b 3p requeues 86
 allot 596

class sfq 11:94 parent 11:
 (dropped 0, overlimits 0 requeues 0)
 backlog 0b 3p requeues 0
 allot 1468

class sfq 11:a4 parent 11:
 (dropped 0, overlimits 0 requeues 0)
 backlog 0b 4p requeues 0
 allot 650

class sfq 11:bb parent 11:
 (dropped 0, overlimits 0 requeues 0)
 backlog 0b 3p requeues 0
 allot 596

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/sched/sch_sfq.c |   20 ++++++++------------
 1 files changed, 8 insertions(+), 12 deletions(-)

diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index 3cf478d..7150705 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -270,7 +270,6 @@ static unsigned int sfq_drop(struct Qdisc *sch)
 		/* It is difficult to believe, but ALL THE SLOTS HAVE LENGTH 1. */
 		d = q->next[q->tail];
 		q->next[q->tail] = q->next[d];
-		q->allot[q->next[d]] += q->quantum;
 		skb = q->qs[d].prev;
 		len = qdisc_pkt_len(skb);
 		__skb_unlink(skb, &q->qs[d]);
@@ -321,14 +320,13 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	sfq_inc(q, x);
 	if (q->qs[x].qlen == 1) {		/* The flow is new */
 		if (q->tail == SFQ_DEPTH) {	/* It is the first flow */
-			q->tail = x;
 			q->next[x] = x;
-			q->allot[x] = q->quantum;
 		} else {
 			q->next[x] = q->next[q->tail];
 			q->next[q->tail] = x;
-			q->tail = x;
 		}
+		q->tail = x;
+		q->allot[x] = q->quantum;
 	}
 	if (++sch->q.qlen <= q->limit) {
 		sch->bstats.bytes += qdisc_pkt_len(skb);
@@ -359,13 +357,13 @@ sfq_dequeue(struct Qdisc *sch)
 {
 	struct sfq_sched_data *q = qdisc_priv(sch);
 	struct sk_buff *skb;
-	sfq_index a, old_a;
+	sfq_index a, next_a;
 
 	/* No active slots */
 	if (q->tail == SFQ_DEPTH)
 		return NULL;
 
-	a = old_a = q->next[q->tail];
+	a = q->next[q->tail];
 
 	/* Grab packet */
 	skb = __skb_dequeue(&q->qs[a]);
@@ -376,17 +374,15 @@ sfq_dequeue(struct Qdisc *sch)
 	/* Is the slot empty? */
 	if (q->qs[a].qlen == 0) {
 		q->ht[q->hash[a]] = SFQ_DEPTH;
-		a = q->next[a];
-		if (a == old_a) {
+		next_a = q->next[a];
+		if (a == next_a) {
 			q->tail = SFQ_DEPTH;
 			return skb;
 		}
-		q->next[q->tail] = a;
-		q->allot[a] += q->quantum;
+		q->next[q->tail] = next_a;
 	} else if ((q->allot[a] -= qdisc_pkt_len(skb)) <= 0) {
-		q->tail = a;
-		a = q->next[a];
 		q->allot[a] += q->quantum;
+		q->tail = a;
 	}
 	return skb;
 }
* [PATCH net-next-2.6] net_sched: sch_sfq: add backlog info in sfq_dump_class_stats()
  2010-12-15 18:18                   ` Eric Dumazet
  2010-12-15 19:10                     ` Eric Dumazet
  2010-12-16  8:16                     ` Jarek Poplawski
  2010-12-20 21:14                     ` David Miller

From: Eric Dumazet @ 2010-12-15 18:18 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Jarek Poplawski, Patrick McHardy

We currently return, for each active SFQ slot, the number of packets in
its queue. We can also report the number of bytes accounted for these
packets.

tc -s class show dev ifb0

Before patch:

class sfq 11:3d9 parent 11:
 (dropped 0, overlimits 0 requeues 0)
 backlog 0b 3p requeues 0
 allot 1266

After patch:

class sfq 11:3e4 parent 11:
 (dropped 0, overlimits 0 requeues 0)
 backlog 4380b 3p requeues 0
 allot 1212

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/sched/sch_sfq.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index 3cf478d..cb331de 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -548,8 +548,13 @@ static int sfq_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 {
 	struct sfq_sched_data *q = qdisc_priv(sch);
 	sfq_index idx = q->ht[cl-1];
-	struct gnet_stats_queue qs = { .qlen = q->qs[idx].qlen };
+	struct sk_buff_head *list = &q->qs[idx];
+	struct gnet_stats_queue qs = { .qlen = list->qlen };
 	struct tc_sfq_xstats xstats = { .allot = q->allot[idx] };
+	struct sk_buff *skb;
+
+	skb_queue_walk(list, skb)
+		qs.backlog += qdisc_pkt_len(skb);
 
 	if (gnet_stats_copy_queue(d, &qs) < 0)
 		return -1;
* Re: [PATCH net-next-2.6] net_sched: sch_sfq: add backlog info in sfq_dump_class_stats()
  2010-12-15 19:10                     ` Eric Dumazet

From: Eric Dumazet @ 2010-12-15 19:10 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Jarek Poplawski, Patrick McHardy

On Wednesday 15 December 2010 at 19:18 +0100, Eric Dumazet wrote:
> We currently return, for each active SFQ slot, the number of packets in
> its queue. We can also report the number of bytes accounted for these
> packets.
>
> tc -s class show dev ifb0
>
> Before patch:
>
> class sfq 11:3d9 parent 11:
>  (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 3p requeues 0
>  allot 1266
>
> After patch:
>
> class sfq 11:3e4 parent 11:
>  (dropped 0, overlimits 0 requeues 0)
>  backlog 4380b 3p requeues 0
>  allot 1212
>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---
>  net/sched/sch_sfq.c |    7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
> index 3cf478d..cb331de 100644
> --- a/net/sched/sch_sfq.c
> +++ b/net/sched/sch_sfq.c
> @@ -548,8 +548,13 @@ static int sfq_dump_class_stats(struct Qdisc *sch, unsigned long cl,
>  {
>  	struct sfq_sched_data *q = qdisc_priv(sch);
>  	sfq_index idx = q->ht[cl-1];
> -	struct gnet_stats_queue qs = { .qlen = q->qs[idx].qlen };
> +	struct sk_buff_head *list = &q->qs[idx];
> +	struct gnet_stats_queue qs = { .qlen = list->qlen };
>  	struct tc_sfq_xstats xstats = { .allot = q->allot[idx] };
> +	struct sk_buff *skb;
> +
> +	skb_queue_walk(list, skb)
> +		qs.backlog += qdisc_pkt_len(skb);
>
>  	if (gnet_stats_copy_queue(d, &qs) < 0)
>  		return -1;

By the way, I could not find out how to make "tc -s class show dev ifb0"
report correct backlog information on the parent class itself.

Here we can see 126p, but 0b in backlog:

class cbq 1:11 parent 1:1 leaf 11: rate 50000Kbit cell 8b mpu 64b (bounded)
    prio 2/2 weight 50000Kbit allot 2250b
    level 0 ewma 5 avpkt 1500b maxidle 0us
 Sent 330418440 bytes 226314 pkt (dropped 280142, overlimits 689534 requeues 0)
 rate 52200Kbit 4469pps backlog 0b 126p requeues 0
  borrowed 0 overactions 182814 avgidle -3363 undertime 2812

The qdisc itself is OK:

qdisc sfq 11: parent 1:11 limit 127p quantum 1514b flows 127/1024 perturb 60sec
 Sent 1712372746 bytes 1172859 pkt (dropped 1451945, overlimits 0 requeues 0)
 rate 52287Kbit 4477pps backlog 185420b 127p requeues 0
* Re: [PATCH net-next-2.6] net_sched: sch_sfq: add backlog info in sfq_dump_class_stats()
  2010-12-16  8:16                     ` Jarek Poplawski
  2010-12-16 10:18                       ` [PATCH v2 net-next-2.6] " Eric Dumazet
  2010-12-16 11:03                       ` [PATCH net-next-2.6] " Eric Dumazet

From: Jarek Poplawski @ 2010-12-16 8:16 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, Patrick McHardy

On 2010-12-15 19:18, Eric Dumazet wrote:
> We currently return, for each active SFQ slot, the number of packets in
> its queue. We can also report the number of bytes accounted for these
> packets.
>
> tc -s class show dev ifb0
>
> Before patch:
>
> class sfq 11:3d9 parent 11:
>  (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 3p requeues 0
>  allot 1266
>
> After patch:
>
> class sfq 11:3e4 parent 11:
>  (dropped 0, overlimits 0 requeues 0)
>  backlog 4380b 3p requeues 0
>  allot 1212
>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---
>  net/sched/sch_sfq.c |    7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
> index 3cf478d..cb331de 100644
> --- a/net/sched/sch_sfq.c
> +++ b/net/sched/sch_sfq.c
> @@ -548,8 +548,13 @@ static int sfq_dump_class_stats(struct Qdisc *sch, unsigned long cl,
>  {
>  	struct sfq_sched_data *q = qdisc_priv(sch);
>  	sfq_index idx = q->ht[cl-1];
> -	struct gnet_stats_queue qs = { .qlen = q->qs[idx].qlen };
> +	struct sk_buff_head *list = &q->qs[idx];
> +	struct gnet_stats_queue qs = { .qlen = list->qlen };
>  	struct tc_sfq_xstats xstats = { .allot = q->allot[idx] };
> +	struct sk_buff *skb;
> +
> +	skb_queue_walk(list, skb)
> +		qs.backlog += qdisc_pkt_len(skb);

I don't think you can walk this list without the qdisc lock.

Jarek P.
* [PATCH v2 net-next-2.6] net_sched: sch_sfq: add backlog info in sfq_dump_class_stats()
  2010-12-16 10:18                       ` Eric Dumazet

From: Eric Dumazet @ 2010-12-16 10:18 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: David Miller, netdev, Patrick McHardy

On Thursday 16 December 2010 at 08:16 +0000, Jarek Poplawski wrote:

> I don't think you can walk this list without the qdisc lock.

I assumed it was already the case, but I did not check. Me confused...

If I use the following patch, I get a recursive lock:

 net/sched/sch_sfq.c |   17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index 3cf478d..a2cde03 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -547,9 +547,20 @@ static int sfq_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 			       struct gnet_dump *d)
 {
 	struct sfq_sched_data *q = qdisc_priv(sch);
-	sfq_index idx = q->ht[cl-1];
-	struct gnet_stats_queue qs = { .qlen = q->qs[idx].qlen };
-	struct tc_sfq_xstats xstats = { .allot = q->allot[idx] };
+	sfq_index idx;
+	struct gnet_stats_queue qs = { 0 };
+	struct tc_sfq_xstats xstats = { 0 };
+	struct sk_buff_head *list;
+	struct sk_buff *skb;
+
+	spin_lock_bh(qdisc_root_sleeping_lock(sch));
+	idx = q->ht[cl - 1];
+	list = &q->qs[idx];
+	xstats.allot = q->allot[idx];
+	qs.qlen = list->qlen;
+	skb_queue_walk(list, skb)
+		qs.backlog += qdisc_pkt_len(skb);
+	spin_unlock_bh(qdisc_root_sleeping_lock(sch));
 
 	if (gnet_stats_copy_queue(d, &qs) < 0)
 		return -1;

Dec 16 10:49:34 edumdev kernel: [  616.452080] sch->qstats.backlog=185420
Dec 16 10:49:34 edumdev kernel: [  616.452146]
Dec 16 10:49:34 edumdev kernel: [  616.452147] =============================================
Dec 16 10:49:34 edumdev kernel: [  616.452265] [ INFO: possible recursive locking detected ]
Dec 16 10:49:34 edumdev kernel: [  616.452329] 2.6.37-rc1-01820-g4be8976-dirty #456
Dec 16 10:49:34 edumdev kernel: [  616.452425] ---------------------------------------------
Dec 16 10:49:34 edumdev kernel: [  616.452489] tc/8747 is trying to acquire lock:
Dec 16 10:49:34 edumdev kernel: [  616.452550]  (&qdisc_tx_lock){+.-...}, at: [<ffffffffa01331d5>] sfq_dump_class_stats+0x65/0x160 [sch_sfq]
Dec 16 10:49:34 edumdev kernel: [  616.452753]
Dec 16 10:49:34 edumdev kernel: [  616.452754] but task is already holding lock:
Dec 16 10:49:34 edumdev kernel: [  616.452867]  (&qdisc_tx_lock){+.-...}, at: [<ffffffff8145474a>] gnet_stats_start_copy_compat+0x4a/0xc0
Dec 16 10:49:34 edumdev kernel: [  616.453068]
Dec 16 10:49:34 edumdev kernel: [  616.453069] other info that might help us debug this:
Dec 16 10:49:34 edumdev kernel: [  616.453184] 2 locks held by tc/8747:
Dec 16 10:49:34 edumdev kernel: [  616.453243]  #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff8149dbc2>] netlink_dump+0x52/0x1e0
Dec 16 10:49:34 edumdev kernel: [  616.453510]  #1:  (&qdisc_tx_lock){+.-...}, at: [<ffffffff8145474a>] gnet_stats_start_copy_compat+0x4a/0xc0
Dec 16 10:49:34 edumdev kernel: [  616.453745]
Dec 16 10:49:35 edumdev kernel: [  616.453746] stack backtrace:
Dec 16 10:49:35 edumdev kernel: [  616.453857] Pid: 8747, comm: tc Tainted: G        W   2.6.37-rc1-01820-g4be8976-dirty #456
Dec 16 10:49:35 edumdev kernel: [  616.453943] Call Trace:
Dec 16 10:49:35 edumdev kernel: [  616.454004]  [<ffffffff8107bd1e>] validate_chain+0x10be/0x1330
Dec 16 10:49:35 edumdev kernel: [  616.454072]  [<ffffffff8107b144>] ? validate_chain+0x4e4/0x1330
Dec 16 10:49:35 edumdev kernel: [  616.454142]  [<ffffffff810cd6cc>] ? get_page_from_freelist+0x2bc/0x730
Dec 16 10:49:35 edumdev kernel: [  616.454211]  [<ffffffff81079200>] ? trace_hardirqs_on_caller+0x110/0x190
Dec 16 10:49:35 edumdev kernel: [  616.454281]  [<ffffffff8107c3e9>] __lock_acquire+0x459/0xbe0
Dec 16 10:49:35 edumdev kernel: [  616.454379]  [<ffffffff8107cc10>] lock_acquire+0xa0/0x140
Dec 16 10:49:35 edumdev kernel: [  616.454446]  [<ffffffffa01331d5>] ? sfq_dump_class_stats+0x65/0x160 [sch_sfq]
Dec 16 10:49:35 edumdev kernel: [  616.454520]  [<ffffffff815aee46>] _raw_spin_lock_bh+0x36/0x50
Dec 16 10:49:35 edumdev kernel: [  616.454586]  [<ffffffffa01331d5>] ? sfq_dump_class_stats+0x65/0x160 [sch_sfq]
Dec 16 10:49:35 edumdev kernel: [  616.454657]  [<ffffffffa01331d5>] sfq_dump_class_stats+0x65/0x160 [sch_sfq]
Dec 16 10:49:35 edumdev kernel: [  616.454727]  [<ffffffff81202ef4>] ? nla_put+0x34/0x40
Dec 16 10:49:35 edumdev kernel: [  616.454794]  [<ffffffff8145478f>] ? gnet_stats_start_copy_compat+0x8f/0xc0
Dec 16 10:49:35 edumdev kernel: [  616.454865]  [<ffffffff8147a2f1>] tc_fill_tclass+0x1b1/0x250
Dec 16 10:49:35 edumdev kernel: [  616.454932]  [<ffffffff8147a3ce>] qdisc_class_dump+0x3e/0x40
Dec 16 10:49:35 edumdev kernel: [  616.454999]  [<ffffffff81483a68>] ? cbq_walk+0x78/0xc0
Dec 16 10:49:35 edumdev kernel: [  616.455064]  [<ffffffffa013228c>] sfq_walk+0x5c/0x90 [sch_sfq]
Dec 16 10:49:35 edumdev kernel: [  616.455131]  [<ffffffff81479f3a>] tc_dump_tclass_qdisc+0xba/0x110
Dec 16 10:49:35 edumdev kernel: [  616.455199]  [<ffffffff8147a390>] ? qdisc_class_dump+0x0/0x40
Dec 16 10:49:35 edumdev kernel: [  616.455266]  [<ffffffff8147a00f>] tc_dump_tclass_root+0x7f/0xa0
Dec 16 10:49:35 edumdev kernel: [  616.455332]  [<ffffffff8147a0bc>] tc_dump_tclass+0x8c/0x110
Dec 16 10:49:35 edumdev kernel: [  616.455426]  [<ffffffff8149dbdd>] netlink_dump+0x6d/0x1e0
Dec 16 10:49:35 edumdev kernel: [  616.455494]  [<ffffffff814a0c0c>] netlink_dump_start+0x19c/0x210
Dec 16 10:49:35 edumdev kernel: [  616.455562]  [<ffffffff8147a030>] ? tc_dump_tclass+0x0/0x110
Dec 16 10:49:35 edumdev kernel: [  616.455628]  [<ffffffff8147a030>] ? tc_dump_tclass+0x0/0x110
Dec 16 10:49:35 edumdev kernel: [  616.455694]  [<ffffffff8146bfa9>] rtnetlink_rcv_msg+0xb9/0x260
Dec 16 10:49:35 edumdev kernel: [  616.455763]  [<ffffffff8146bef0>] ?
rtnetlink_rcv_msg+0x0/0x260 Dec 16 10:49:35 edumdev kernel: [ 616.455832] [<ffffffff8149ef29>] netlink_rcv_skb+0x99/0xc0 Dec 16 10:49:35 edumdev kernel: [ 616.455898] [<ffffffff8146bed5>] rtnetlink_rcv+0x25/0x40 Dec 16 10:49:35 edumdev kernel: [ 616.455963] [<ffffffff8149ea95>] ? netlink_unicast+0xf5/0x2d0 Dec 16 10:49:35 edumdev kernel: [ 616.456030] [<ffffffff8149ec42>] netlink_unicast+0x2a2/0x2d0 Dec 16 10:49:35 edumdev kernel: [ 616.456098] [<ffffffff810e8cb3>] ? might_fault+0x53/0xb0 Dec 16 10:49:35 edumdev kernel: [ 616.456163] [<ffffffff81451fed>] ? memcpy_fromiovec+0x6d/0x90 Dec 16 10:49:35 edumdev kernel: [ 616.456231] [<ffffffff8149fbdd>] netlink_sendmsg+0x24d/0x390 Dec 16 10:49:35 edumdev kernel: [ 616.456299] [<ffffffff814463d0>] sock_sendmsg+0xc0/0xf0 Dec 16 10:49:36 edumdev kernel: [ 616.456390] [<ffffffff810e8cb3>] ? might_fault+0x53/0xb0 Dec 16 10:49:36 edumdev kernel: [ 616.456457] [<ffffffff814767ce>] ? verify_compat_iovec+0x6e/0x110 Dec 16 10:49:36 edumdev kernel: [ 616.456526] [<ffffffff81447164>] sys_sendmsg+0x194/0x320 Dec 16 10:49:36 edumdev kernel: [ 616.456593] [<ffffffff815b2e02>] ? do_page_fault+0x102/0x4e0 Dec 16 10:49:36 edumdev kernel: [ 616.456661] [<ffffffff8107cd4d>] ? lock_release_non_nested+0x9d/0x2e0 Dec 16 10:49:36 edumdev kernel: [ 616.456729] [<ffffffff810e8cb3>] ? might_fault+0x53/0xb0 Dec 16 10:49:36 edumdev kernel: [ 616.456796] [<ffffffff810e8cb3>] ? might_fault+0x53/0xb0 Dec 16 10:49:36 edumdev kernel: [ 616.456863] [<ffffffff81476154>] compat_sys_sendmsg+0x14/0x20 Dec 16 10:49:36 edumdev kernel: [ 616.456929] [<ffffffff814770fe>] compat_sys_socketcall+0x1be/0x210 Dec 16 10:49:36 edumdev kernel: [ 616.457000] [<ffffffff8102f1d0>] sysenter_dispatch+0x7/0x33 Dec 16 10:49:36 edumdev kernel: [ 616.457067] [<ffffffff815ae9a9>] ? trace_hardirqs_on_thunk+0x3a/0x3f ^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH net-next-2.6] net_sched: sch_sfq: add backlog info in sfq_dump_class_stats() 2010-12-16 8:16 ` Jarek Poplawski 2010-12-16 10:18 ` [PATCH v2 " Eric Dumazet @ 2010-12-16 11:03 ` Eric Dumazet 2010-12-16 13:09 ` Jarek Poplawski 1 sibling, 1 reply; 40+ messages in thread From: Eric Dumazet @ 2010-12-16 11:03 UTC (permalink / raw) To: Jarek Poplawski; +Cc: David Miller, netdev, Patrick McHardy Le jeudi 16 décembre 2010 à 08:16 +0000, Jarek Poplawski a écrit : > On 2010-12-15 19:18, Eric Dumazet wrote: > > We currently return for each active SFQ slot the number of packets in > > queue. We can also give number of bytes accounted for these packets. > > > > tc -s class show dev ifb0 > > > > Before patch : > > > > class sfq 11:3d9 parent 11: > > (dropped 0, overlimits 0 requeues 0) > > backlog 0b 3p requeues 0 > > allot 1266 > > > > After patch : > > > > class sfq 11:3e4 parent 11: > > (dropped 0, overlimits 0 requeues 0) > > backlog 4380b 3p requeues 0 > > allot 1212 > > > > > > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> > > --- > > net/sched/sch_sfq.c | 7 ++++++- > > 1 file changed, 6 insertions(+), 1 deletion(-) > > > > diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c > > index 3cf478d..cb331de 100644 > > --- a/net/sched/sch_sfq.c > > +++ b/net/sched/sch_sfq.c > > @@ -548,8 +548,13 @@ static int sfq_dump_class_stats(struct Qdisc *sch, unsigned long cl, > > { > > struct sfq_sched_data *q = qdisc_priv(sch); > > sfq_index idx = q->ht[cl-1]; > > - struct gnet_stats_queue qs = { .qlen = q->qs[idx].qlen }; > > + struct sk_buff_head *list = &q->qs[idx]; > > + struct gnet_stats_queue qs = { .qlen = list->qlen }; > > struct tc_sfq_xstats xstats = { .allot = q->allot[idx] }; > > + struct sk_buff *skb; > > + > > + skb_queue_walk(list, skb) > > + qs.backlog += qdisc_pkt_len(skb); > > I don't think you can walk this list without the qdisc lock. So after checks, I confirm qdisc lock is held at this point, patch is valid. 
tc_fill_tclass() calls gnet_stats_start_copy_compat() (which locks qdisc_root_sleeping_lock()) before calling cl_ops->dump_stats(q, cl, &d). Thanks! ^ permalink raw reply [flat|nested] 40+ messages in thread

* Re: [PATCH net-next-2.6] net_sched: sch_sfq: add backlog info in sfq_dump_class_stats() 2010-12-16 11:03 ` [PATCH " Eric Dumazet @ 2010-12-16 13:09 ` Jarek Poplawski 0 siblings, 0 replies; 40+ messages in thread From: Jarek Poplawski @ 2010-12-16 13:09 UTC (permalink / raw) To: Eric Dumazet; +Cc: David Miller, netdev, Patrick McHardy On Thu, Dec 16, 2010 at 12:03:04PM +0100, Eric Dumazet wrote: > Le jeudi 16 décembre à 08:16 +0000, Jarek Poplawski a écrit : > > On 2010-12-15 19:18, Eric Dumazet wrote: > > > We currently return for each active SFQ slot the number of packets in > > > queue. We can also give number of bytes accounted for these packets. ... > > > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> > > > --- > > > net/sched/sch_sfq.c | 7 ++++++- > > > 1 file changed, 6 insertions(+), 1 deletion(-) > > > > > > diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c > > > index 3cf478d..cb331de 100644 > > > --- a/net/sched/sch_sfq.c > > > +++ b/net/sched/sch_sfq.c > > > @@ -548,8 +548,13 @@ static int sfq_dump_class_stats(struct Qdisc *sch, unsigned long cl, > > > { > > > struct sfq_sched_data *q = qdisc_priv(sch); > > > sfq_index idx = q->ht[cl-1]; > > > - struct gnet_stats_queue qs = { .qlen = q->qs[idx].qlen }; > > > + struct sk_buff_head *list = &q->qs[idx]; > > > + struct gnet_stats_queue qs = { .qlen = list->qlen }; > > > struct tc_sfq_xstats xstats = { .allot = q->allot[idx] }; > > > + struct sk_buff *skb; > > > + > > > + skb_queue_walk(list, skb) > > > + qs.backlog += qdisc_pkt_len(skb); > > > > I don't think you can walk this list without the qdisc lock. > > So after checks, I confirm qdisc lock is held at this point, patch is > valid. > > tc_fill_tclass() calls gnet_stats_start_copy_compat() (and locks > qdisc_root_sleeping_lock()), before calling > cl_ops->dump_stats(q, cl, &d) > > Thanks ! You are right. Sorry for misleading. Jarek P. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH net-next-2.6] net_sched: sch_sfq: add backlog info in sfq_dump_class_stats() 2010-12-15 18:18 ` [PATCH net-next-2.6] net_sched: sch_sfq: add backlog info in sfq_dump_class_stats() Eric Dumazet 2010-12-15 19:10 ` Eric Dumazet 2010-12-16 8:16 ` Jarek Poplawski @ 2010-12-20 21:14 ` David Miller 2 siblings, 0 replies; 40+ messages in thread From: David Miller @ 2010-12-20 21:14 UTC (permalink / raw) To: eric.dumazet; +Cc: netdev, jarkao2, kaber From: Eric Dumazet <eric.dumazet@gmail.com> Date: Wed, 15 Dec 2010 19:18:36 +0100 > We currently return for each active SFQ slot the number of packets in > queue. We can also give number of bytes accounted for these packets. > > tc -s class show dev ifb0 > > Before patch : > > class sfq 11:3d9 parent 11: > (dropped 0, overlimits 0 requeues 0) > backlog 0b 3p requeues 0 > allot 1266 > > After patch : > > class sfq 11:3e4 parent 11: > (dropped 0, overlimits 0 requeues 0) > backlog 4380b 3p requeues 0 > allot 1212 > > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Applied, thanks Eric. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v3] net_sched: sch_sfq: fix allot handling 2010-12-15 17:30 ` [PATCH v3] " Eric Dumazet 2010-12-15 18:18 ` [PATCH net-next-2.6] net_sched: sch_sfq: add backlog info in sfq_dump_class_stats() Eric Dumazet @ 2010-12-20 21:18 ` David Miller 1 sibling, 0 replies; 40+ messages in thread From: David Miller @ 2010-12-20 21:18 UTC (permalink / raw) To: eric.dumazet; +Cc: kaber, netdev, jarkao2 From: Eric Dumazet <eric.dumazet@gmail.com> Date: Wed, 15 Dec 2010 18:30:27 +0100 > [PATCH v3] net_sched: sch_sfq: fix allot handling ... > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Applied to net-2.6, thanks. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v2] net_sched: sch_sfq: fix allot handling 2010-12-15 17:03 ` Patrick McHardy 2010-12-15 17:09 ` Eric Dumazet @ 2010-12-16 13:08 ` Eric Dumazet 2010-12-17 16:52 ` [RFC PATCH] net_sched: sch_sfq: better struct layouts Eric Dumazet 1 sibling, 1 reply; 40+ messages in thread From: Eric Dumazet @ 2010-12-16 13:08 UTC (permalink / raw) To: Patrick McHardy; +Cc: David Miller, netdev, Jarek Poplawski Le mercredi 15 décembre 2010 à 18:03 +0100, Patrick McHardy a écrit : > On 15.12.2010 17:55, Eric Dumazet wrote: > > I was thinking in allowing more packets per SFQ (but keep the 126 active > > flows limit), what do you think ? > > I keep forgetting why this limit exists, let me try to figure > it out once more :) I took a look, and found we already are at max, unless we change sfq_index type (from u8 to u16) SFQ_SLOTS is 128 (not really exist, but this is the number of slots) SFQ_DEPTH is 128 (max depth for one slot) Constraints are : SFQ_SLOTS < 256 SFQ_DEPTH < 256 SFQ_SLOTS + SFQ_DEPTH < 256, because of the dep[] array. We could avoid the "struct sk_buff_head" structure with its un-needed spinlock, and eventually group data for a given slot in a structure to reduce number of cache lines we touch... struct sfq_slot { struct sk_buff *first; struct sk_buff *last; u8 qlen; sfq_index next; /* dequeue chain */ u16 hash; short allot; /* 16bit hole */ }; This would save 768 bytes on x86_64 (and much more if LOCKDEP is used) ^ permalink raw reply [flat|nested] 40+ messages in thread
* [RFC PATCH] net_sched: sch_sfq: better struct layouts 2010-12-16 13:08 ` [PATCH v2] " Eric Dumazet @ 2010-12-17 16:52 ` Eric Dumazet 2010-12-19 21:22 ` Jarek Poplawski 0 siblings, 1 reply; 40+ messages in thread From: Eric Dumazet @ 2010-12-17 16:52 UTC (permalink / raw) To: Patrick McHardy; +Cc: David Miller, netdev, Jarek Poplawski Le jeudi 16 décembre 2010 à 14:08 +0100, Eric Dumazet a écrit : > Le mercredi 15 décembre 2010 à 18:03 +0100, Patrick McHardy a écrit : > > On 15.12.2010 17:55, Eric Dumazet wrote: > > > > I was thinking in allowing more packets per SFQ (but keep the 126 active > > > flows limit), what do you think ? > > > > I keep forgetting why this limit exists, let me try to figure > > it out once more :) > > I took a look, and found we already are at max, unless we change > sfq_index type (from u8 to u16) > > SFQ_SLOTS is 128 (not really exist, but this is the number of slots) > SFQ_DEPTH is 128 (max depth for one slot) > Constraints are : > SFQ_SLOTS < 256 > SFQ_DEPTH < 256 > SFQ_SLOTS + SFQ_DEPTH < 256, because of the dep[] array. > > We could avoid the "struct sk_buff_head" structure with its un-needed > spinlock, and eventually group data for a given slot in a structure to > reduce number of cache lines we touch... > > struct sfq_slot { > struct sk_buff *first; > struct sk_buff *last; > u8 qlen; > sfq_index next; /* dequeue chain */ > u16 hash; > short allot; > /* 16bit hole */ > }; > > This would save 768 bytes on x86_64 (and much more if LOCKDEP is used) > > Here is a preliminary patch, shrinking sizeof(struct sfq_sched_data) from 0x14f8 (or more if spinlocks are bigger) to 0x1180 bytes, and reduce text size as well. 
text data bss dec hex filename 4821 152 0 4973 136d old/net/sched/sch_sfq.o 4651 136 0 4787 12b3 new/net/sched/sch_sfq.o All data for a slot/flow is now grouped in a compact and cache friendly structure : struct sfq_slot { struct sk_buff *skblist_next; struct sk_buff *skblist_prev; sfq_index qlen; /* number of skbs in skblist */ sfq_index next; /* next slot in sfq chain */ unsigned short hash; /* hash value (index in ht[]) */ short allot; /* credit for this slot */ struct sfq_head anchor; /* anchor in dep[] chains */ }; net/sched/sch_sfq.c | 223 +++++++++++++++++++++++------------------- 1 file changed, 125 insertions(+), 98 deletions(-) diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c index 3cf478d..28968eb 100644 --- a/net/sched/sch_sfq.c +++ b/net/sched/sch_sfq.c @@ -69,25 +69,40 @@ This implementation limits maximal queue length to 128; maximal mtu to 2^15-1; number of hash buckets to 1024. The only goal of this restrictions was that all data - fit into one 4K page :-). Struct sfq_sched_data is - organized in anti-cache manner: all the data for a bucket - are scattered over different locations. This is not good, - but it allowed me to put it into 4K. + fit into one 4K page on 32bit arches. It is easy to increase these values, but not in flight. */ -#define SFQ_DEPTH 128 +#define SFQ_DEPTH 128 /* max number of packets per slot (per flow) */ +#define SFQ_SLOTS 128 /* max number of flows */ +#define EMPTY_SLOT 255 #define SFQ_HASH_DIVISOR 1024 -/* This type should contain at least SFQ_DEPTH*2 values */ +/* This type should contain at least SFQ_DEPTH + SFQ_SLOTS values */ typedef unsigned char sfq_index; +/* + * We dont use pointers to save space. + * Small indexes [0 ... SFQ_SLOTS - 1] are 'pointers' to slots[] array + * while following values [SFQ_SLOTS ... 
SFQ_SLOTS + SFQ_DEPTH - 1] + * are 'pointers' to dep[] array + */ struct sfq_head { sfq_index next; sfq_index prev; }; +struct sfq_slot { + struct sk_buff *skblist_next; + struct sk_buff *skblist_prev; + sfq_index qlen; /* number of skbs in skblist */ + sfq_index next; /* next slot in sfq chain */ + unsigned short hash; /* hash value (index in ht[]) */ + short allot; /* credit for this slot */ + struct sfq_head anchor; /* anchor in dep[] chains */ +}; + struct sfq_sched_data { /* Parameters */ @@ -99,17 +114,24 @@ struct sfq_sched_data struct tcf_proto *filter_list; struct timer_list perturb_timer; u32 perturbation; - sfq_index tail; /* Index of current slot in round */ - sfq_index max_depth; /* Maximal depth */ + sfq_index max_depth; /* depth of longest slot */ + struct sfq_slot *tail; /* current slot in round */ sfq_index ht[SFQ_HASH_DIVISOR]; /* Hash table */ - sfq_index next[SFQ_DEPTH]; /* Active slots link */ - short allot[SFQ_DEPTH]; /* Current allotment per slot */ - unsigned short hash[SFQ_DEPTH]; /* Hash value indexed by slots */ - struct sk_buff_head qs[SFQ_DEPTH]; /* Slot queue */ - struct sfq_head dep[SFQ_DEPTH*2]; /* Linked list of slots, indexed by depth */ + struct sfq_slot slots[SFQ_SLOTS]; + struct sfq_head dep[SFQ_DEPTH]; /* Linked list of slots, indexed by depth */ }; +/* + * sfq_head are either in a sfq_slot or in dep[] array + */ +static inline struct sfq_head *get_head(struct sfq_sched_data *q, sfq_index val) +{ + if (val < SFQ_SLOTS) + return &q->slots[val].anchor; + return &q->dep[val - SFQ_SLOTS]; +} + static __inline__ unsigned sfq_fold_hash(struct sfq_sched_data *q, u32 h, u32 h1) { return jhash_2words(h, h1, q->perturbation) & (SFQ_HASH_DIVISOR - 1); @@ -200,30 +222,37 @@ static unsigned int sfq_classify(struct sk_buff *skb, struct Qdisc *sch, return 0; } +/* + * x : slot number [0 .. 
SFQ_SLOTS - 1] + */ static inline void sfq_link(struct sfq_sched_data *q, sfq_index x) { sfq_index p, n; - int d = q->qs[x].qlen + SFQ_DEPTH; + int qlen = q->slots[x].qlen; - p = d; - n = q->dep[d].next; - q->dep[x].next = n; - q->dep[x].prev = p; - q->dep[p].next = q->dep[n].prev = x; + p = qlen + SFQ_SLOTS; + n = q->dep[qlen].next; + + q->slots[x].anchor.next = n; + q->slots[x].anchor.prev = p; + + q->dep[qlen].next = x; /* get_head(q, p)->next = x */ + get_head(q, n)->prev = x; } static inline void sfq_dec(struct sfq_sched_data *q, sfq_index x) { sfq_index p, n; + int d; - n = q->dep[x].next; - p = q->dep[x].prev; - q->dep[p].next = n; - q->dep[n].prev = p; + n = q->slots[x].anchor.next; + p = q->slots[x].anchor.prev; + get_head(q, p)->next = n; + get_head(q, n)->prev = p; - if (n == p && q->max_depth == q->qs[x].qlen + 1) + d = q->slots[x].qlen--; + if (n == p && q->max_depth == d) q->max_depth--; - sfq_link(q, x); } @@ -232,34 +261,36 @@ static inline void sfq_inc(struct sfq_sched_data *q, sfq_index x) sfq_index p, n; int d; - n = q->dep[x].next; - p = q->dep[x].prev; - q->dep[p].next = n; - q->dep[n].prev = p; - d = q->qs[x].qlen; + n = q->slots[x].anchor.next; + p = q->slots[x].anchor.prev; + get_head(q, p)->next = n; + get_head(q, n)->prev = p; + + d = ++q->slots[x].qlen; if (q->max_depth < d) q->max_depth = d; - sfq_link(q, x); } static unsigned int sfq_drop(struct Qdisc *sch) { struct sfq_sched_data *q = qdisc_priv(sch); - sfq_index d = q->max_depth; + sfq_index x, d = q->max_depth; struct sk_buff *skb; unsigned int len; + struct sfq_slot *slot; - /* Queue is full! Find the longest slot and - drop a packet from it */ - + /* Queue is full! 
Find the longest slot and drop tail packet from it */ if (d > 1) { - sfq_index x = q->dep[d + SFQ_DEPTH].next; - skb = q->qs[x].prev; + x = q->dep[d].next; + slot = &q->slots[x]; +drop: + skb = slot->skblist_prev; + slot->skblist_prev = skb->prev; len = qdisc_pkt_len(skb); - __skb_unlink(skb, &q->qs[x]); - kfree_skb(skb); sfq_dec(q, x); + skb->next = skb->prev = NULL; + kfree_skb(skb); sch->q.qlen--; sch->qstats.drops++; sch->qstats.backlog -= len; @@ -268,19 +299,11 @@ static unsigned int sfq_drop(struct Qdisc *sch) if (d == 1) { /* It is difficult to believe, but ALL THE SLOTS HAVE LENGTH 1. */ - d = q->next[q->tail]; - q->next[q->tail] = q->next[d]; - q->allot[q->next[d]] += q->quantum; - skb = q->qs[d].prev; - len = qdisc_pkt_len(skb); - __skb_unlink(skb, &q->qs[d]); - kfree_skb(skb); - sfq_dec(q, d); - sch->q.qlen--; - q->ht[q->hash[d]] = SFQ_DEPTH; - sch->qstats.drops++; - sch->qstats.backlog -= len; - return len; + x = q->tail->next; + slot = &q->slots[x]; + q->tail->next = slot->next; + q->ht[slot->hash] = EMPTY_SLOT; + goto drop; } return 0; @@ -292,6 +315,7 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch) struct sfq_sched_data *q = qdisc_priv(sch); unsigned int hash; sfq_index x; + struct sfq_slot *slot; int uninitialized_var(ret); hash = sfq_classify(skb, sch, &ret); @@ -304,31 +328,36 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch) hash--; x = q->ht[hash]; - if (x == SFQ_DEPTH) { - q->ht[hash] = x = q->dep[SFQ_DEPTH].next; - q->hash[x] = hash; + slot = &q->slots[x]; + if (x == EMPTY_SLOT) { + x = q->dep[0].next; /* get a free slot */ + q->ht[hash] = x; + slot = &q->slots[x]; + slot->hash = hash; + slot->skblist_next = slot->skblist_prev = (struct sk_buff *)slot; } - /* If selected queue has length q->limit, this means that - * all another queues are empty and that we do simple tail drop, + /* If selected queue has length q->limit, do simple tail drop, * i.e. drop _this_ packet. 
*/ - if (q->qs[x].qlen >= q->limit) + if (slot->qlen >= q->limit) return qdisc_drop(skb, sch); sch->qstats.backlog += qdisc_pkt_len(skb); - __skb_queue_tail(&q->qs[x], skb); + skb->prev = slot->skblist_prev; + skb->next = (struct sk_buff *)slot; + slot->skblist_prev->next = skb; + slot->skblist_prev = skb; sfq_inc(q, x); - if (q->qs[x].qlen == 1) { /* The flow is new */ - if (q->tail == SFQ_DEPTH) { /* It is the first flow */ - q->tail = x; - q->next[x] = x; - q->allot[x] = q->quantum; + if (slot->qlen == 1) { /* The flow is new */ + if (q->tail == NULL) { /* It is the first flow */ + slot->next = x; } else { - q->next[x] = q->next[q->tail]; - q->next[q->tail] = x; - q->tail = x; + slot->next = q->tail->next; + q->tail->next = x; } + q->tail = slot; + slot->allot = q->quantum; } if (++sch->q.qlen <= q->limit) { sch->bstats.bytes += qdisc_pkt_len(skb); @@ -344,14 +373,12 @@ static struct sk_buff * sfq_peek(struct Qdisc *sch) { struct sfq_sched_data *q = qdisc_priv(sch); - sfq_index a; /* No active slots */ - if (q->tail == SFQ_DEPTH) + if (q->tail == NULL) return NULL; - a = q->next[q->tail]; - return skb_peek(&q->qs[a]); + return q->slots[q->tail->next].skblist_next; } static struct sk_buff * @@ -359,34 +386,35 @@ sfq_dequeue(struct Qdisc *sch) { struct sfq_sched_data *q = qdisc_priv(sch); struct sk_buff *skb; - sfq_index a, old_a; + sfq_index a, next_a; + struct sfq_slot *slot; /* No active slots */ - if (q->tail == SFQ_DEPTH) + if (q->tail == NULL) return NULL; - a = old_a = q->next[q->tail]; - + a = q->tail->next; + slot = &q->slots[a]; /* Grab packet */ - skb = __skb_dequeue(&q->qs[a]); + skb = slot->skblist_next; + slot->skblist_next = skb->next; + skb->next = skb->prev = NULL; sfq_dec(q, a); sch->q.qlen--; sch->qstats.backlog -= qdisc_pkt_len(skb); - /* Is the slot empty? */ - if (q->qs[a].qlen == 0) { - q->ht[q->hash[a]] = SFQ_DEPTH; - a = q->next[a]; - if (a == old_a) { - q->tail = SFQ_DEPTH; + /* Is the slot now empty? 
*/ + if (slot->qlen == 0) { + q->ht[slot->hash] = EMPTY_SLOT; + next_a = slot->next; + if (a == next_a) { + q->tail = NULL; /* no more active slots */ return skb; } - q->next[q->tail] = a; - q->allot[a] += q->quantum; - } else if ((q->allot[a] -= qdisc_pkt_len(skb)) <= 0) { - q->tail = a; - a = q->next[a]; - q->allot[a] += q->quantum; + q->tail->next = next_a; + } else if ((slot->allot -= qdisc_pkt_len(skb)) <= 0) { + q->tail = slot; + slot->allot += q->quantum; } return skb; } @@ -450,17 +478,16 @@ static int sfq_init(struct Qdisc *sch, struct nlattr *opt) init_timer_deferrable(&q->perturb_timer); for (i = 0; i < SFQ_HASH_DIVISOR; i++) - q->ht[i] = SFQ_DEPTH; + q->ht[i] = EMPTY_SLOT; for (i = 0; i < SFQ_DEPTH; i++) { - skb_queue_head_init(&q->qs[i]); - q->dep[i + SFQ_DEPTH].next = i + SFQ_DEPTH; - q->dep[i + SFQ_DEPTH].prev = i + SFQ_DEPTH; + q->dep[i].next = i + SFQ_SLOTS; + q->dep[i].prev = i + SFQ_SLOTS; } q->limit = SFQ_DEPTH - 1; q->max_depth = 0; - q->tail = SFQ_DEPTH; + q->tail = NULL; if (opt == NULL) { q->quantum = psched_mtu(qdisc_dev(sch)); q->perturb_period = 0; @@ -471,7 +498,7 @@ static int sfq_init(struct Qdisc *sch, struct nlattr *opt) return err; } - for (i = 0; i < SFQ_DEPTH; i++) + for (i = 0; i < SFQ_SLOTS; i++) sfq_link(q, i); return 0; } @@ -547,9 +574,9 @@ static int sfq_dump_class_stats(struct Qdisc *sch, unsigned long cl, struct gnet_dump *d) { struct sfq_sched_data *q = qdisc_priv(sch); - sfq_index idx = q->ht[cl-1]; - struct gnet_stats_queue qs = { .qlen = q->qs[idx].qlen }; - struct tc_sfq_xstats xstats = { .allot = q->allot[idx] }; + const struct sfq_slot *slot = &q->slots[q->ht[cl - 1]]; + struct gnet_stats_queue qs = { .qlen = slot->qlen }; + struct tc_sfq_xstats xstats = { .allot = slot->allot }; if (gnet_stats_copy_queue(d, &qs) < 0) return -1; @@ -565,7 +592,7 @@ static void sfq_walk(struct Qdisc *sch, struct qdisc_walker *arg) return; for (i = 0; i < SFQ_HASH_DIVISOR; i++) { - if (q->ht[i] == SFQ_DEPTH || + if (q->ht[i] == 
EMPTY_SLOT || arg->count < arg->skip) { arg->count++; continue; ^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [RFC PATCH] net_sched: sch_sfq: better struct layouts 2010-12-17 16:52 ` [RFC PATCH] net_sched: sch_sfq: better struct layouts Eric Dumazet @ 2010-12-19 21:22 ` Jarek Poplawski 2010-12-20 17:02 ` [PATCH v2] " Eric Dumazet 0 siblings, 1 reply; 40+ messages in thread From: Jarek Poplawski @ 2010-12-19 21:22 UTC (permalink / raw) To: Eric Dumazet; +Cc: Patrick McHardy, David Miller, netdev On Fri, Dec 17, 2010 at 05:52:46PM +0100, Eric Dumazet wrote: > Le jeudi 16 décembre à 14:08 +0100, Eric Dumazet a écrit : ... > > struct sfq_slot { > > struct sk_buff *first; > > struct sk_buff *last; > > u8 qlen; > > sfq_index next; /* dequeue chain */ > > u16 hash; > > short allot; > > /* 16bit hole */ > > }; > > > > This would save 768 bytes on x86_64 (and much more if LOCKDEP is used) I think open coding sk_buff_head is a wrong idea. Otherwise, this patch looks OK to me, only a few cosmetic suggestions below. > > Here is a preliminary patch, shrinking sizeof(struct sfq_sched_data) > from 0x14f8 (or more if spinlocks are bigger) to 0x1180 bytes, and > reduce text size as well.
> > text data bss dec hex filename > 4821 152 0 4973 136d old/net/sched/sch_sfq.o > 4651 136 0 4787 12b3 new/net/sched/sch_sfq.o > > > All data for a slot/flow is now grouped in a compact and cache friendly > structure : > > struct sfq_slot { > struct sk_buff *skblist_next; > struct sk_buff *skblist_prev; > sfq_index qlen; /* number of skbs in skblist */ > sfq_index next; /* next slot in sfq chain */ > unsigned short hash; /* hash value (index in ht[]) */ > short allot; /* credit for this slot */ > struct sfq_head anchor; /* anchor in dep[] chains */ > }; > > > > net/sched/sch_sfq.c | 223 +++++++++++++++++++++++------------------- > 1 file changed, 125 insertions(+), 98 deletions(-) > > diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c > index 3cf478d..28968eb 100644 > --- a/net/sched/sch_sfq.c > +++ b/net/sched/sch_sfq.c > @@ -69,25 +69,40 @@ > This implementation limits maximal queue length to 128; > maximal mtu to 2^15-1; number of hash buckets to 1024. > The only goal of this restrictions was that all data > - fit into one 4K page :-). Struct sfq_sched_data is > - organized in anti-cache manner: all the data for a bucket > - are scattered over different locations. This is not good, > - but it allowed me to put it into 4K. > + fit into one 4K page on 32bit arches. > > It is easy to increase these values, but not in flight. */ > > -#define SFQ_DEPTH 128 > +#define SFQ_DEPTH 128 /* max number of packets per slot (per flow) */ > +#define SFQ_SLOTS 128 /* max number of flows */ > +#define EMPTY_SLOT 255 SFQ_EMPTY_SLOT? > #define SFQ_HASH_DIVISOR 1024 > > -/* This type should contain at least SFQ_DEPTH*2 values */ > +/* This type should contain at least SFQ_DEPTH + SFQ_SLOTS values */ > typedef unsigned char sfq_index; > > +/* > + * We dont use pointers to save space. > + * Small indexes [0 ... SFQ_SLOTS - 1] are 'pointers' to slots[] array > + * while following values [SFQ_SLOTS ... 
SFQ_SLOTS + SFQ_DEPTH - 1] > + * are 'pointers' to dep[] array > + */ > struct sfq_head > { > sfq_index next; > sfq_index prev; > }; > > +struct sfq_slot { > + struct sk_buff *skblist_next; > + struct sk_buff *skblist_prev; > + sfq_index qlen; /* number of skbs in skblist */ > + sfq_index next; /* next slot in sfq chain */ > + unsigned short hash; /* hash value (index in ht[]) */ > + short allot; /* credit for this slot */ > + struct sfq_head anchor; /* anchor in dep[] chains */ struct sfq_head dep? > +}; > + > struct sfq_sched_data > { > /* Parameters */ > @@ -99,17 +114,24 @@ struct sfq_sched_data > struct tcf_proto *filter_list; > struct timer_list perturb_timer; > u32 perturbation; > - sfq_index tail; /* Index of current slot in round */ > - sfq_index max_depth; /* Maximal depth */ > + sfq_index max_depth; /* depth of longest slot */ depth and/or length? (One dimension should be enough.) > > + struct sfq_slot *tail; /* current slot in round */ > sfq_index ht[SFQ_HASH_DIVISOR]; /* Hash table */ > - sfq_index next[SFQ_DEPTH]; /* Active slots link */ > - short allot[SFQ_DEPTH]; /* Current allotment per slot */ > - unsigned short hash[SFQ_DEPTH]; /* Hash value indexed by slots */ > - struct sk_buff_head qs[SFQ_DEPTH]; /* Slot queue */ > - struct sfq_head dep[SFQ_DEPTH*2]; /* Linked list of slots, indexed by depth */ > + struct sfq_slot slots[SFQ_SLOTS]; > + struct sfq_head dep[SFQ_DEPTH]; /* Linked list of slots, indexed by depth */ > }; > > +/* > + * sfq_head are either in a sfq_slot or in dep[] array > + */ > +static inline struct sfq_head *get_head(struct sfq_sched_data *q, sfq_index val) static inline struct sfq_head *sfq_dep_head()? ... 
> @@ -304,31 +328,36 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch) > hash--; > > x = q->ht[hash]; > - if (x == SFQ_DEPTH) { > - q->ht[hash] = x = q->dep[SFQ_DEPTH].next; > - q->hash[x] = hash; > + slot = &q->slots[x]; > + if (x == EMPTY_SLOT) { > + x = q->dep[0].next; /* get a free slot */ > + q->ht[hash] = x; > + slot = &q->slots[x]; > + slot->hash = hash; > + slot->skblist_next = slot->skblist_prev = (struct sk_buff *)slot; > } > > - /* If selected queue has length q->limit, this means that > - * all another queues are empty and that we do simple tail drop, No reason to remove this line. > + /* If selected queue has length q->limit, do simple tail drop, > * i.e. drop _this_ packet. > */ > - if (q->qs[x].qlen >= q->limit) > + if (slot->qlen >= q->limit) > return qdisc_drop(skb, sch); > > sch->qstats.backlog += qdisc_pkt_len(skb); > - __skb_queue_tail(&q->qs[x], skb); > + skb->prev = slot->skblist_prev; > + skb->next = (struct sk_buff *)slot; > + slot->skblist_prev->next = skb; > + slot->skblist_prev = skb; If you really have to do this, all these: __skb_queue_tail(), __skb_dequeue(), skb_queue_head_init(), skb_peek() etc. used here should stay as (local) inline functions to remain readability. Jarek P. ^ permalink raw reply [flat|nested] 40+ messages in thread
* [PATCH v2] net_sched: sch_sfq: better struct layouts 2010-12-19 21:22 ` Jarek Poplawski @ 2010-12-20 17:02 ` Eric Dumazet 2010-12-20 21:33 ` David Miller ` (2 more replies) 0 siblings, 3 replies; 40+ messages in thread From: Eric Dumazet @ 2010-12-20 17:02 UTC (permalink / raw) To: Jarek Poplawski; +Cc: Patrick McHardy, David Miller, netdev On Sunday 19 December 2010 at 22:22 +0100, Jarek Poplawski wrote: > I think open coding sk_buff_head is a wrong idea. Otherwise, this > patch looks OK to me, only a few cosmetic suggestions below. > I completely agree with you, but this should be temporary: David really wants to use list_head for skbs, and I believe this will be done ;) I chose to name the list skblist to make clear where we want to plug a real list_head once done. Also, not using sk_buff_head saves at least 8 bytes per slot. > > > -#define SFQ_DEPTH 128 > > +#define SFQ_DEPTH 128 /* max number of packets per slot (per flow) */ > > +#define SFQ_SLOTS 128 /* max number of flows */ > > +#define EMPTY_SLOT 255 > > SFQ_EMPTY_SLOT? OK done > > > > +struct sfq_slot { > > + struct sk_buff *skblist_next; > > + struct sk_buff *skblist_prev; > > + sfq_index qlen; /* number of skbs in skblist */ > > + sfq_index next; /* next slot in sfq chain */ > > + unsigned short hash; /* hash value (index in ht[]) */ > > + short allot; /* credit for this slot */ > > + struct sfq_head anchor; /* anchor in dep[] chains */ > > struct sfq_head dep? OK > > > +}; > > + > > struct sfq_sched_data > > { > > /* Parameters */ > > @@ -99,17 +114,24 @@ struct sfq_sched_data > > struct tcf_proto *filter_list; > > struct timer_list perturb_timer; > > u32 perturbation; > > - sfq_index tail; /* Index of current slot in round */ > > - sfq_index max_depth; /* Maximal depth */ > > + sfq_index max_depth; /* depth of longest slot */ > > depth and/or length? (One dimension should be enough.) maybe cur_depth? It's not the maximal possible depth, but the depth of the longest slot, or the current max depth...

> > > > > + struct sfq_slot *tail; /* current slot in round */ > > sfq_index ht[SFQ_HASH_DIVISOR]; /* Hash table */ > > - sfq_index next[SFQ_DEPTH]; /* Active slots link */ > > - short allot[SFQ_DEPTH]; /* Current allotment per slot */ > > - unsigned short hash[SFQ_DEPTH]; /* Hash value indexed by slots */ > > - struct sk_buff_head qs[SFQ_DEPTH]; /* Slot queue */ > > - struct sfq_head dep[SFQ_DEPTH*2]; /* Linked list of slots, indexed by depth */ > > + struct sfq_slot slots[SFQ_SLOTS]; > > + struct sfq_head dep[SFQ_DEPTH]; /* Linked list of slots, indexed by depth */ > > }; > > > > +/* > > + * sfq_head are either in a sfq_slot or in dep[] array > > + */ > > +static inline struct sfq_head *get_head(struct sfq_sched_data *q, sfq_index val) > > static inline struct sfq_head *sfq_dep_head()? > OK > > - /* If selected queue has length q->limit, this means that > > - * all another queues are empty and that we do simple tail drop, > > No reason to remove this line. Well, the reason we drop this packet is not because other queues are empty, but because we reach max depth for this queue. (I have the idea to extend SFQ to allow more packets to be queued, still with a 127 limit per queue, and 127 flows). With 10Gbs links, a global limit of 127 packets is short. > If you really have to do this, all these: __skb_queue_tail(), > __skb_dequeue(), skb_queue_head_init(), skb_peek() etc. used here > should stay as (local) inline functions to remain readability. > OK done, thanks a lot for reviewing and very useful comments ! We should address the problem of allot being 16bit, GRO makes allot overflow so fast, that SFQ is not fair at all... allot could use 17 bits and hash only 15 (10 really needed for current divisor) [PATCH v2] net_sched: sch_sfq: better struct layouts This patch shrinks sizeof(struct sfq_sched_data) from 0x14f8 (or more if spinlocks are bigger) to 0x1180 bytes, and reduce text size as well. 
text data bss dec hex filename 4821 152 0 4973 136d old/net/sched/sch_sfq.o 4627 136 0 4763 129b new/net/sched/sch_sfq.o All data for a slot/flow is now grouped in a compact and cache friendly structure, instead of being spreaded in many different points. struct sfq_slot { struct sk_buff *skblist_next; struct sk_buff *skblist_prev; sfq_index qlen; /* number of skbs in skblist */ sfq_index next; /* next slot in sfq chain */ unsigned short hash; /* hash value (index in ht[]) */ short allot; /* credit for this slot */ struct sfq_head dep; /* anchor in dep[] chains */ }; Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Jarek Poplawski <jarkao2@gmail.com> --- v2: address Jarek comments net/sched/sch_sfq.c | 263 +++++++++++++++++++++++++----------------- 1 files changed, 160 insertions(+), 103 deletions(-) diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c index 3cf478d..ef94f3d 100644 --- a/net/sched/sch_sfq.c +++ b/net/sched/sch_sfq.c @@ -67,27 +67,42 @@ IMPLEMENTATION: This implementation limits maximal queue length to 128; - maximal mtu to 2^15-1; number of hash buckets to 1024. + maximal mtu to 2^15-1; max 128 flows, number of hash buckets to 1024. The only goal of this restrictions was that all data - fit into one 4K page :-). Struct sfq_sched_data is - organized in anti-cache manner: all the data for a bucket - are scattered over different locations. This is not good, - but it allowed me to put it into 4K. + fit into one 4K page on 32bit arches. It is easy to increase these values, but not in flight. */ -#define SFQ_DEPTH 128 +#define SFQ_DEPTH 128 /* max number of packets per flow */ +#define SFQ_SLOTS 128 /* max number of flows */ +#define SFQ_EMPTY_SLOT 255 #define SFQ_HASH_DIVISOR 1024 -/* This type should contain at least SFQ_DEPTH*2 values */ +/* This type should contain at least SFQ_DEPTH + SFQ_SLOTS values */ typedef unsigned char sfq_index; +/* + * We dont use pointers to save space. + * Small indexes [0 ... 
SFQ_SLOTS - 1] are 'pointers' to slots[] array + * while following values [SFQ_SLOTS ... SFQ_SLOTS + SFQ_DEPTH - 1] + * are 'pointers' to dep[] array + */ struct sfq_head { sfq_index next; sfq_index prev; }; +struct sfq_slot { + struct sk_buff *skblist_next; + struct sk_buff *skblist_prev; + sfq_index qlen; /* number of skbs in skblist */ + sfq_index next; /* next slot in sfq chain */ + unsigned short hash; /* hash value (index in ht[]) */ + short allot; /* credit for this slot */ + struct sfq_head dep; /* anchor in dep[] chains */ +}; + struct sfq_sched_data { /* Parameters */ @@ -99,17 +114,24 @@ struct sfq_sched_data struct tcf_proto *filter_list; struct timer_list perturb_timer; u32 perturbation; - sfq_index tail; /* Index of current slot in round */ - sfq_index max_depth; /* Maximal depth */ + sfq_index cur_depth; /* depth of longest slot */ + struct sfq_slot *tail; /* current slot in round */ sfq_index ht[SFQ_HASH_DIVISOR]; /* Hash table */ - sfq_index next[SFQ_DEPTH]; /* Active slots link */ - short allot[SFQ_DEPTH]; /* Current allotment per slot */ - unsigned short hash[SFQ_DEPTH]; /* Hash value indexed by slots */ - struct sk_buff_head qs[SFQ_DEPTH]; /* Slot queue */ - struct sfq_head dep[SFQ_DEPTH*2]; /* Linked list of slots, indexed by depth */ + struct sfq_slot slots[SFQ_SLOTS]; + struct sfq_head dep[SFQ_DEPTH]; /* Linked list of slots, indexed by depth */ }; +/* + * sfq_head are either in a sfq_slot or in dep[] array + */ +static inline struct sfq_head *sfq_dep_head(struct sfq_sched_data *q, sfq_index val) +{ + if (val < SFQ_SLOTS) + return &q->slots[val].dep; + return &q->dep[val - SFQ_SLOTS]; +} + static __inline__ unsigned sfq_fold_hash(struct sfq_sched_data *q, u32 h, u32 h1) { return jhash_2words(h, h1, q->perturbation) & (SFQ_HASH_DIVISOR - 1); @@ -200,30 +222,41 @@ static unsigned int sfq_classify(struct sk_buff *skb, struct Qdisc *sch, return 0; } +/* + * x : slot number [0 .. 
SFQ_SLOTS - 1] + */ static inline void sfq_link(struct sfq_sched_data *q, sfq_index x) { sfq_index p, n; - int d = q->qs[x].qlen + SFQ_DEPTH; + int qlen = q->slots[x].qlen; - p = d; - n = q->dep[d].next; - q->dep[x].next = n; - q->dep[x].prev = p; - q->dep[p].next = q->dep[n].prev = x; + p = qlen + SFQ_SLOTS; + n = q->dep[qlen].next; + + q->slots[x].dep.next = n; + q->slots[x].dep.prev = p; + + q->dep[qlen].next = x; /* sfq_dep_head(q, p)->next = x */ + sfq_dep_head(q, n)->prev = x; } +#define sfq_unlink(q, x, n, p) \ + n = q->slots[x].dep.next; \ + p = q->slots[x].dep.prev; \ + sfq_dep_head(q, p)->next = n; \ + sfq_dep_head(q, n)->prev = p + + static inline void sfq_dec(struct sfq_sched_data *q, sfq_index x) { sfq_index p, n; + int d; - n = q->dep[x].next; - p = q->dep[x].prev; - q->dep[p].next = n; - q->dep[n].prev = p; - - if (n == p && q->max_depth == q->qs[x].qlen + 1) - q->max_depth--; + sfq_unlink(q, x, n, p); + d = q->slots[x].qlen--; + if (n == p && q->cur_depth == d) + q->cur_depth--; sfq_link(q, x); } @@ -232,34 +265,68 @@ static inline void sfq_inc(struct sfq_sched_data *q, sfq_index x) sfq_index p, n; int d; - n = q->dep[x].next; - p = q->dep[x].prev; - q->dep[p].next = n; - q->dep[n].prev = p; - d = q->qs[x].qlen; - if (q->max_depth < d) - q->max_depth = d; + sfq_unlink(q, x, n, p); + d = ++q->slots[x].qlen; + if (q->cur_depth < d) + q->cur_depth = d; sfq_link(q, x); } +/* helper functions : might be changed when/if skb use a standard list_head */ + +/* remove one skb from tail of slot queue */ +static inline struct sk_buff *slot_dequeue_tail(struct sfq_slot *slot) +{ + struct sk_buff *skb = slot->skblist_prev; + + slot->skblist_prev = skb->prev; + skb->next = skb->prev = NULL; + return skb; +} + +/* remove one skb from head of slot queue */ +static inline struct sk_buff *slot_dequeue_head(struct sfq_slot *slot) +{ + struct sk_buff *skb = slot->skblist_next; + + slot->skblist_next = skb->next; + skb->next = skb->prev = NULL; + return skb; +} + +static 
inline void slot_queue_init(struct sfq_slot *slot) +{ + slot->skblist_prev = slot->skblist_next = (struct sk_buff *)slot; +} + +/* add skb to slot queue (tail add) */ +static inline void slot_queue_add(struct sfq_slot *slot, struct sk_buff *skb) +{ + skb->prev = slot->skblist_prev; + skb->next = (struct sk_buff *)slot; + slot->skblist_prev->next = skb; + slot->skblist_prev = skb; +} + + static unsigned int sfq_drop(struct Qdisc *sch) { struct sfq_sched_data *q = qdisc_priv(sch); - sfq_index d = q->max_depth; + sfq_index x, d = q->cur_depth; struct sk_buff *skb; unsigned int len; + struct sfq_slot *slot; - /* Queue is full! Find the longest slot and - drop a packet from it */ - + /* Queue is full! Find the longest slot and drop tail packet from it */ if (d > 1) { - sfq_index x = q->dep[d + SFQ_DEPTH].next; - skb = q->qs[x].prev; + x = q->dep[d].next; + slot = &q->slots[x]; +drop: + skb = slot_dequeue_tail(slot); len = qdisc_pkt_len(skb); - __skb_unlink(skb, &q->qs[x]); - kfree_skb(skb); sfq_dec(q, x); + kfree_skb(skb); sch->q.qlen--; sch->qstats.drops++; sch->qstats.backlog -= len; @@ -268,19 +335,11 @@ static unsigned int sfq_drop(struct Qdisc *sch) if (d == 1) { /* It is difficult to believe, but ALL THE SLOTS HAVE LENGTH 1. 
*/ - d = q->next[q->tail]; - q->next[q->tail] = q->next[d]; - q->allot[q->next[d]] += q->quantum; - skb = q->qs[d].prev; - len = qdisc_pkt_len(skb); - __skb_unlink(skb, &q->qs[d]); - kfree_skb(skb); - sfq_dec(q, d); - sch->q.qlen--; - q->ht[q->hash[d]] = SFQ_DEPTH; - sch->qstats.drops++; - sch->qstats.backlog -= len; - return len; + x = q->tail->next; + slot = &q->slots[x]; + q->tail->next = slot->next; + q->ht[slot->hash] = SFQ_EMPTY_SLOT; + goto drop; } return 0; @@ -292,6 +351,7 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch) struct sfq_sched_data *q = qdisc_priv(sch); unsigned int hash; sfq_index x; + struct sfq_slot *slot; int uninitialized_var(ret); hash = sfq_classify(skb, sch, &ret); @@ -304,31 +364,33 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch) hash--; x = q->ht[hash]; - if (x == SFQ_DEPTH) { - q->ht[hash] = x = q->dep[SFQ_DEPTH].next; - q->hash[x] = hash; + slot = &q->slots[x]; + if (x == SFQ_EMPTY_SLOT) { + x = q->dep[0].next; /* get a free slot */ + q->ht[hash] = x; + slot = &q->slots[x]; + slot->hash = hash; + slot_queue_init(slot); } - /* If selected queue has length q->limit, this means that - * all another queues are empty and that we do simple tail drop, + /* If selected queue has length q->limit, do simple tail drop, * i.e. drop _this_ packet. 
*/ - if (q->qs[x].qlen >= q->limit) + if (slot->qlen >= q->limit) return qdisc_drop(skb, sch); sch->qstats.backlog += qdisc_pkt_len(skb); - __skb_queue_tail(&q->qs[x], skb); + slot_queue_add(slot, skb); sfq_inc(q, x); - if (q->qs[x].qlen == 1) { /* The flow is new */ - if (q->tail == SFQ_DEPTH) { /* It is the first flow */ - q->tail = x; - q->next[x] = x; - q->allot[x] = q->quantum; + if (slot->qlen == 1) { /* The flow is new */ + if (q->tail == NULL) { /* It is the first flow */ + slot->next = x; } else { - q->next[x] = q->next[q->tail]; - q->next[q->tail] = x; - q->tail = x; + slot->next = q->tail->next; + q->tail->next = x; } + q->tail = slot; + slot->allot = q->quantum; } if (++sch->q.qlen <= q->limit) { sch->bstats.bytes += qdisc_pkt_len(skb); @@ -344,14 +406,12 @@ static struct sk_buff * sfq_peek(struct Qdisc *sch) { struct sfq_sched_data *q = qdisc_priv(sch); - sfq_index a; /* No active slots */ - if (q->tail == SFQ_DEPTH) + if (q->tail == NULL) return NULL; - a = q->next[q->tail]; - return skb_peek(&q->qs[a]); + return q->slots[q->tail->next].skblist_next; } static struct sk_buff * @@ -359,34 +419,32 @@ sfq_dequeue(struct Qdisc *sch) { struct sfq_sched_data *q = qdisc_priv(sch); struct sk_buff *skb; - sfq_index a, old_a; + sfq_index a, next_a; + struct sfq_slot *slot; /* No active slots */ - if (q->tail == SFQ_DEPTH) + if (q->tail == NULL) return NULL; - a = old_a = q->next[q->tail]; - - /* Grab packet */ - skb = __skb_dequeue(&q->qs[a]); + a = q->tail->next; + slot = &q->slots[a]; + skb = slot_dequeue_head(slot); sfq_dec(q, a); sch->q.qlen--; sch->qstats.backlog -= qdisc_pkt_len(skb); - /* Is the slot empty? */ - if (q->qs[a].qlen == 0) { - q->ht[q->hash[a]] = SFQ_DEPTH; - a = q->next[a]; - if (a == old_a) { - q->tail = SFQ_DEPTH; + /* Is the slot now empty? 
*/ + if (slot->qlen == 0) { + q->ht[slot->hash] = SFQ_EMPTY_SLOT; + next_a = slot->next; + if (a == next_a) { + q->tail = NULL; /* no more active slots */ return skb; } - q->next[q->tail] = a; - q->allot[a] += q->quantum; - } else if ((q->allot[a] -= qdisc_pkt_len(skb)) <= 0) { - q->tail = a; - a = q->next[a]; - q->allot[a] += q->quantum; + q->tail->next = next_a; + } else if ((slot->allot -= qdisc_pkt_len(skb)) <= 0) { + q->tail = slot; + slot->allot += q->quantum; } return skb; } @@ -450,17 +508,16 @@ static int sfq_init(struct Qdisc *sch, struct nlattr *opt) init_timer_deferrable(&q->perturb_timer); for (i = 0; i < SFQ_HASH_DIVISOR; i++) - q->ht[i] = SFQ_DEPTH; + q->ht[i] = SFQ_EMPTY_SLOT; for (i = 0; i < SFQ_DEPTH; i++) { - skb_queue_head_init(&q->qs[i]); - q->dep[i + SFQ_DEPTH].next = i + SFQ_DEPTH; - q->dep[i + SFQ_DEPTH].prev = i + SFQ_DEPTH; + q->dep[i].next = i + SFQ_SLOTS; + q->dep[i].prev = i + SFQ_SLOTS; } q->limit = SFQ_DEPTH - 1; - q->max_depth = 0; - q->tail = SFQ_DEPTH; + q->cur_depth = 0; + q->tail = NULL; if (opt == NULL) { q->quantum = psched_mtu(qdisc_dev(sch)); q->perturb_period = 0; @@ -471,7 +528,7 @@ static int sfq_init(struct Qdisc *sch, struct nlattr *opt) return err; } - for (i = 0; i < SFQ_DEPTH; i++) + for (i = 0; i < SFQ_SLOTS; i++) sfq_link(q, i); return 0; } @@ -547,9 +604,9 @@ static int sfq_dump_class_stats(struct Qdisc *sch, unsigned long cl, struct gnet_dump *d) { struct sfq_sched_data *q = qdisc_priv(sch); - sfq_index idx = q->ht[cl-1]; - struct gnet_stats_queue qs = { .qlen = q->qs[idx].qlen }; - struct tc_sfq_xstats xstats = { .allot = q->allot[idx] }; + const struct sfq_slot *slot = &q->slots[q->ht[cl - 1]]; + struct gnet_stats_queue qs = { .qlen = slot->qlen }; + struct tc_sfq_xstats xstats = { .allot = slot->allot }; if (gnet_stats_copy_queue(d, &qs) < 0) return -1; @@ -565,7 +622,7 @@ static void sfq_walk(struct Qdisc *sch, struct qdisc_walker *arg) return; for (i = 0; i < SFQ_HASH_DIVISOR; i++) { - if (q->ht[i] == 
SFQ_DEPTH || + if (q->ht[i] == SFQ_EMPTY_SLOT || arg->count < arg->skip) { arg->count++; continue; ^ permalink raw reply related [flat|nested] 40+ messages in thread
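The byte-sized "pointer" scheme the patch introduces can be shown in isolation: a sfq_index below SFQ_SLOTS names a slot, anything from SFQ_SLOTS upward names an entry of the per-depth chains, and 255 doubles as dep[127] in dep-chain context and as SFQ_EMPTY_SLOT in ht[], which only ever stores slot numbers. A minimal sketch using just the fields the decode touches (not the full kernel structures):

```c
#include <assert.h>

#define SFQ_DEPTH 128
#define SFQ_SLOTS 128
#define SFQ_EMPTY_SLOT 255

struct sfq_head { unsigned char next, prev; };

/* only the dep anchor of the real sfq_slot matters for the decode */
struct sfq_slot { struct sfq_head dep; };

struct sfq_sched_data {
	struct sfq_slot slots[SFQ_SLOTS];
	struct sfq_head dep[SFQ_DEPTH];
};

/* same decoding rule as sfq_dep_head() in the patch: one byte is
 * either a slot number (0 .. SFQ_SLOTS - 1) or, offset by
 * SFQ_SLOTS, an index into the per-depth chains */
static struct sfq_head *sfq_dep_head(struct sfq_sched_data *q, unsigned char val)
{
	if (val < SFQ_SLOTS)
		return &q->slots[val].dep;
	return &q->dep[val - SFQ_SLOTS];
}
```

This is why the old code sized dep[] as SFQ_DEPTH*2 and offset everything by SFQ_DEPTH: the same one-byte namespace covered queue slots and depth buckets, just laid out differently.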
* Re: [PATCH v2] net_sched: sch_sfq: better struct layouts 2010-12-20 17:02 ` [PATCH v2] " Eric Dumazet @ 2010-12-20 21:33 ` David Miller 2010-12-20 21:42 ` Eric Dumazet 2010-12-20 22:55 ` [PATCH v2] " Jarek Poplawski 2010-12-20 23:16 ` [PATCH net-next-2.6] sch_sfq: allow big packets and be fair Eric Dumazet 2 siblings, 1 reply; 40+ messages in thread From: David Miller @ 2010-12-20 21:33 UTC (permalink / raw) To: eric.dumazet; +Cc: jarkao2, kaber, netdev From: Eric Dumazet <eric.dumazet@gmail.com> Date: Mon, 20 Dec 2010 18:02:05 +0100 > [PATCH v2] net_sched: sch_sfq: better struct layouts Please respin this against current net-next-2.6, it doesn't apply cleanly after your other two sfq changes. Thanks!
* Re: [PATCH v2] net_sched: sch_sfq: better struct layouts 2010-12-20 21:33 ` David Miller @ 2010-12-20 21:42 ` Eric Dumazet 2010-12-20 22:54 ` [PATCH v3 net-next-2.6] " Eric Dumazet 0 siblings, 1 reply; 40+ messages in thread From: Eric Dumazet @ 2010-12-20 21:42 UTC (permalink / raw) To: David Miller; +Cc: jarkao2, kaber, netdev On Monday 20 December 2010 at 13:33 -0800, David Miller wrote: > From: Eric Dumazet <eric.dumazet@gmail.com> > Date: Mon, 20 Dec 2010 18:02:05 +0100 > > > [PATCH v2] net_sched: sch_sfq: better struct layouts > > Please respin this against current net-next-2.6, it doesn't > apply cleanly after your other two sfq changes. > Sure, will do, thanks David.
* [PATCH v3 net-next-2.6] net_sched: sch_sfq: better struct layouts 2010-12-20 21:42 ` Eric Dumazet @ 2010-12-20 22:54 ` Eric Dumazet 2010-12-21 5:33 ` David Miller 0 siblings, 1 reply; 40+ messages in thread From: Eric Dumazet @ 2010-12-20 22:54 UTC (permalink / raw) To: David Miller; +Cc: netdev, Patrick McHardy, Jarek Poplawski Here is a respin of patch. I'll send a short patch to make SFQ more fair in presence of large packets as well. Thanks [PATCH v3 net-next-2.6] net_sched: sch_sfq: better struct layouts This patch shrinks sizeof(struct sfq_sched_data) from 0x14f8 (or more if spinlocks are bigger) to 0x1180 bytes, and reduce text size as well. text data bss dec hex filename 4821 152 0 4973 136d old/net/sched/sch_sfq.o 4627 136 0 4763 129b new/net/sched/sch_sfq.o All data for a slot/flow is now grouped in a compact and cache friendly structure, instead of being spreaded in many different points. struct sfq_slot { struct sk_buff *skblist_next; struct sk_buff *skblist_prev; sfq_index qlen; /* number of skbs in skblist */ sfq_index next; /* next slot in sfq chain */ struct sfq_head dep; /* anchor in dep[] chains */ unsigned short hash; /* hash value (index in ht[]) */ short allot; /* credit for this slot */ }; Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Jarek Poplawski <jarkao2@gmail.com> Cc: Patrick McHardy <kaber@trash.net> --- v3: respin after two prior patches applied net/sched/sch_sfq.c | 260 ++++++++++++++++++++++++++---------------- 1 file changed, 162 insertions(+), 98 deletions(-) diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c index 42396c9..c474b4b 100644 --- a/net/sched/sch_sfq.c +++ b/net/sched/sch_sfq.c @@ -67,27 +67,42 @@ IMPLEMENTATION: This implementation limits maximal queue length to 128; - maximal mtu to 2^15-1; number of hash buckets to 1024. + maximal mtu to 2^15-1; max 128 flows, number of hash buckets to 1024. The only goal of this restrictions was that all data - fit into one 4K page :-). 
Struct sfq_sched_data is - organized in anti-cache manner: all the data for a bucket - are scattered over different locations. This is not good, - but it allowed me to put it into 4K. + fit into one 4K page on 32bit arches. It is easy to increase these values, but not in flight. */ -#define SFQ_DEPTH 128 +#define SFQ_DEPTH 128 /* max number of packets per flow */ +#define SFQ_SLOTS 128 /* max number of flows */ +#define SFQ_EMPTY_SLOT 255 #define SFQ_HASH_DIVISOR 1024 -/* This type should contain at least SFQ_DEPTH*2 values */ +/* This type should contain at least SFQ_DEPTH + SFQ_SLOTS values */ typedef unsigned char sfq_index; +/* + * We dont use pointers to save space. + * Small indexes [0 ... SFQ_SLOTS - 1] are 'pointers' to slots[] array + * while following values [SFQ_SLOTS ... SFQ_SLOTS + SFQ_DEPTH - 1] + * are 'pointers' to dep[] array + */ struct sfq_head { sfq_index next; sfq_index prev; }; +struct sfq_slot { + struct sk_buff *skblist_next; + struct sk_buff *skblist_prev; + sfq_index qlen; /* number of skbs in skblist */ + sfq_index next; /* next slot in sfq chain */ + struct sfq_head dep; /* anchor in dep[] chains */ + unsigned short hash; /* hash value (index in ht[]) */ + short allot; /* credit for this slot */ +}; + struct sfq_sched_data { /* Parameters */ @@ -99,17 +114,24 @@ struct sfq_sched_data struct tcf_proto *filter_list; struct timer_list perturb_timer; u32 perturbation; - sfq_index tail; /* Index of current slot in round */ - sfq_index max_depth; /* Maximal depth */ + sfq_index cur_depth; /* depth of longest slot */ + struct sfq_slot *tail; /* current slot in round */ sfq_index ht[SFQ_HASH_DIVISOR]; /* Hash table */ - sfq_index next[SFQ_DEPTH]; /* Active slots link */ - short allot[SFQ_DEPTH]; /* Current allotment per slot */ - unsigned short hash[SFQ_DEPTH]; /* Hash value indexed by slots */ - struct sk_buff_head qs[SFQ_DEPTH]; /* Slot queue */ - struct sfq_head dep[SFQ_DEPTH*2]; /* Linked list of slots, indexed by depth */ + struct sfq_slot 
slots[SFQ_SLOTS]; + struct sfq_head dep[SFQ_DEPTH]; /* Linked list of slots, indexed by depth */ }; +/* + * sfq_head are either in a sfq_slot or in dep[] array + */ +static inline struct sfq_head *sfq_dep_head(struct sfq_sched_data *q, sfq_index val) +{ + if (val < SFQ_SLOTS) + return &q->slots[val].dep; + return &q->dep[val - SFQ_SLOTS]; +} + static __inline__ unsigned sfq_fold_hash(struct sfq_sched_data *q, u32 h, u32 h1) { return jhash_2words(h, h1, q->perturbation) & (SFQ_HASH_DIVISOR - 1); @@ -200,30 +222,41 @@ static unsigned int sfq_classify(struct sk_buff *skb, struct Qdisc *sch, return 0; } +/* + * x : slot number [0 .. SFQ_SLOTS - 1] + */ static inline void sfq_link(struct sfq_sched_data *q, sfq_index x) { sfq_index p, n; - int d = q->qs[x].qlen + SFQ_DEPTH; + int qlen = q->slots[x].qlen; - p = d; - n = q->dep[d].next; - q->dep[x].next = n; - q->dep[x].prev = p; - q->dep[p].next = q->dep[n].prev = x; + p = qlen + SFQ_SLOTS; + n = q->dep[qlen].next; + + q->slots[x].dep.next = n; + q->slots[x].dep.prev = p; + + q->dep[qlen].next = x; /* sfq_dep_head(q, p)->next = x */ + sfq_dep_head(q, n)->prev = x; } +#define sfq_unlink(q, x, n, p) \ + n = q->slots[x].dep.next; \ + p = q->slots[x].dep.prev; \ + sfq_dep_head(q, p)->next = n; \ + sfq_dep_head(q, n)->prev = p + + static inline void sfq_dec(struct sfq_sched_data *q, sfq_index x) { sfq_index p, n; + int d; - n = q->dep[x].next; - p = q->dep[x].prev; - q->dep[p].next = n; - q->dep[n].prev = p; - - if (n == p && q->max_depth == q->qs[x].qlen + 1) - q->max_depth--; + sfq_unlink(q, x, n, p); + d = q->slots[x].qlen--; + if (n == p && q->cur_depth == d) + q->cur_depth--; sfq_link(q, x); } @@ -232,34 +265,72 @@ static inline void sfq_inc(struct sfq_sched_data *q, sfq_index x) sfq_index p, n; int d; - n = q->dep[x].next; - p = q->dep[x].prev; - q->dep[p].next = n; - q->dep[n].prev = p; - d = q->qs[x].qlen; - if (q->max_depth < d) - q->max_depth = d; + sfq_unlink(q, x, n, p); + d = ++q->slots[x].qlen; + if (q->cur_depth 
< d) + q->cur_depth = d; sfq_link(q, x); } +/* helper functions : might be changed when/if skb use a standard list_head */ + +/* remove one skb from tail of slot queue */ +static inline struct sk_buff *slot_dequeue_tail(struct sfq_slot *slot) +{ + struct sk_buff *skb = slot->skblist_prev; + + slot->skblist_prev = skb->prev; + skb->next = skb->prev = NULL; + return skb; +} + +/* remove one skb from head of slot queue */ +static inline struct sk_buff *slot_dequeue_head(struct sfq_slot *slot) +{ + struct sk_buff *skb = slot->skblist_next; + + slot->skblist_next = skb->next; + skb->next = skb->prev = NULL; + return skb; +} + +static inline void slot_queue_init(struct sfq_slot *slot) +{ + slot->skblist_prev = slot->skblist_next = (struct sk_buff *)slot; +} + +/* add skb to slot queue (tail add) */ +static inline void slot_queue_add(struct sfq_slot *slot, struct sk_buff *skb) +{ + skb->prev = slot->skblist_prev; + skb->next = (struct sk_buff *)slot; + slot->skblist_prev->next = skb; + slot->skblist_prev = skb; +} + +#define slot_queue_walk(slot, skb) \ + for (skb = slot->skblist_next; \ + skb != (struct sk_buff *)slot; \ + skb = skb->next) + static unsigned int sfq_drop(struct Qdisc *sch) { struct sfq_sched_data *q = qdisc_priv(sch); - sfq_index d = q->max_depth; + sfq_index x, d = q->cur_depth; struct sk_buff *skb; unsigned int len; + struct sfq_slot *slot; - /* Queue is full! Find the longest slot and - drop a packet from it */ - + /* Queue is full! 
Find the longest slot and drop tail packet from it */ if (d > 1) { - sfq_index x = q->dep[d + SFQ_DEPTH].next; - skb = q->qs[x].prev; + x = q->dep[d].next; + slot = &q->slots[x]; +drop: + skb = slot_dequeue_tail(slot); len = qdisc_pkt_len(skb); - __skb_unlink(skb, &q->qs[x]); - kfree_skb(skb); sfq_dec(q, x); + kfree_skb(skb); sch->q.qlen--; sch->qstats.drops++; sch->qstats.backlog -= len; @@ -268,18 +339,11 @@ static unsigned int sfq_drop(struct Qdisc *sch) if (d == 1) { /* It is difficult to believe, but ALL THE SLOTS HAVE LENGTH 1. */ - d = q->next[q->tail]; - q->next[q->tail] = q->next[d]; - skb = q->qs[d].prev; - len = qdisc_pkt_len(skb); - __skb_unlink(skb, &q->qs[d]); - kfree_skb(skb); - sfq_dec(q, d); - sch->q.qlen--; - q->ht[q->hash[d]] = SFQ_DEPTH; - sch->qstats.drops++; - sch->qstats.backlog -= len; - return len; + x = q->tail->next; + slot = &q->slots[x]; + q->tail->next = slot->next; + q->ht[slot->hash] = SFQ_EMPTY_SLOT; + goto drop; } return 0; @@ -291,6 +355,7 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch) struct sfq_sched_data *q = qdisc_priv(sch); unsigned int hash; sfq_index x; + struct sfq_slot *slot; int uninitialized_var(ret); hash = sfq_classify(skb, sch, &ret); @@ -303,30 +368,33 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch) hash--; x = q->ht[hash]; - if (x == SFQ_DEPTH) { - q->ht[hash] = x = q->dep[SFQ_DEPTH].next; - q->hash[x] = hash; + slot = &q->slots[x]; + if (x == SFQ_EMPTY_SLOT) { + x = q->dep[0].next; /* get a free slot */ + q->ht[hash] = x; + slot = &q->slots[x]; + slot->hash = hash; + slot_queue_init(slot); } - /* If selected queue has length q->limit, this means that - * all another queues are empty and that we do simple tail drop, + /* If selected queue has length q->limit, do simple tail drop, * i.e. drop _this_ packet. 
*/ - if (q->qs[x].qlen >= q->limit) + if (slot->qlen >= q->limit) return qdisc_drop(skb, sch); sch->qstats.backlog += qdisc_pkt_len(skb); - __skb_queue_tail(&q->qs[x], skb); + slot_queue_add(slot, skb); sfq_inc(q, x); - if (q->qs[x].qlen == 1) { /* The flow is new */ - if (q->tail == SFQ_DEPTH) { /* It is the first flow */ - q->next[x] = x; + if (slot->qlen == 1) { /* The flow is new */ + if (q->tail == NULL) { /* It is the first flow */ + slot->next = x; } else { - q->next[x] = q->next[q->tail]; - q->next[q->tail] = x; + slot->next = q->tail->next; + q->tail->next = x; } - q->tail = x; - q->allot[x] = q->quantum; + q->tail = slot; + slot->allot = q->quantum; } if (++sch->q.qlen <= q->limit) { sch->bstats.bytes += qdisc_pkt_len(skb); @@ -342,14 +410,12 @@ static struct sk_buff * sfq_peek(struct Qdisc *sch) { struct sfq_sched_data *q = qdisc_priv(sch); - sfq_index a; /* No active slots */ - if (q->tail == SFQ_DEPTH) + if (q->tail == NULL) return NULL; - a = q->next[q->tail]; - return skb_peek(&q->qs[a]); + return q->slots[q->tail->next].skblist_next; } static struct sk_buff * @@ -358,31 +424,31 @@ sfq_dequeue(struct Qdisc *sch) struct sfq_sched_data *q = qdisc_priv(sch); struct sk_buff *skb; sfq_index a, next_a; + struct sfq_slot *slot; /* No active slots */ - if (q->tail == SFQ_DEPTH) + if (q->tail == NULL) return NULL; - a = q->next[q->tail]; - - /* Grab packet */ - skb = __skb_dequeue(&q->qs[a]); + a = q->tail->next; + slot = &q->slots[a]; + skb = slot_dequeue_head(slot); sfq_dec(q, a); sch->q.qlen--; sch->qstats.backlog -= qdisc_pkt_len(skb); /* Is the slot empty? 
*/ - if (q->qs[a].qlen == 0) { - q->ht[q->hash[a]] = SFQ_DEPTH; - next_a = q->next[a]; + if (slot->qlen == 0) { + q->ht[slot->hash] = SFQ_EMPTY_SLOT; + next_a = slot->next; if (a == next_a) { - q->tail = SFQ_DEPTH; + q->tail = NULL; /* no more active slots */ return skb; } - q->next[q->tail] = next_a; - } else if ((q->allot[a] -= qdisc_pkt_len(skb)) <= 0) { - q->allot[a] += q->quantum; - q->tail = a; + q->tail->next = next_a; + } else if ((slot->allot -= qdisc_pkt_len(skb)) <= 0) { + q->tail = slot; + slot->allot += q->quantum; } return skb; } @@ -446,17 +512,16 @@ static int sfq_init(struct Qdisc *sch, struct nlattr *opt) init_timer_deferrable(&q->perturb_timer); for (i = 0; i < SFQ_HASH_DIVISOR; i++) - q->ht[i] = SFQ_DEPTH; + q->ht[i] = SFQ_EMPTY_SLOT; for (i = 0; i < SFQ_DEPTH; i++) { - skb_queue_head_init(&q->qs[i]); - q->dep[i + SFQ_DEPTH].next = i + SFQ_DEPTH; - q->dep[i + SFQ_DEPTH].prev = i + SFQ_DEPTH; + q->dep[i].next = i + SFQ_SLOTS; + q->dep[i].prev = i + SFQ_SLOTS; } q->limit = SFQ_DEPTH - 1; - q->max_depth = 0; - q->tail = SFQ_DEPTH; + q->cur_depth = 0; + q->tail = NULL; if (opt == NULL) { q->quantum = psched_mtu(qdisc_dev(sch)); q->perturb_period = 0; @@ -467,7 +532,7 @@ static int sfq_init(struct Qdisc *sch, struct nlattr *opt) return err; } - for (i = 0; i < SFQ_DEPTH; i++) + for (i = 0; i < SFQ_SLOTS; i++) sfq_link(q, i); return 0; } @@ -543,13 +608,12 @@ static int sfq_dump_class_stats(struct Qdisc *sch, unsigned long cl, struct gnet_dump *d) { struct sfq_sched_data *q = qdisc_priv(sch); - sfq_index idx = q->ht[cl-1]; - struct sk_buff_head *list = &q->qs[idx]; - struct gnet_stats_queue qs = { .qlen = list->qlen }; - struct tc_sfq_xstats xstats = { .allot = q->allot[idx] }; + const struct sfq_slot *slot = &q->slots[q->ht[cl - 1]]; + struct gnet_stats_queue qs = { .qlen = slot->qlen }; + struct tc_sfq_xstats xstats = { .allot = slot->allot }; struct sk_buff *skb; - skb_queue_walk(list, skb) + slot_queue_walk(slot, skb) qs.backlog += 
qdisc_pkt_len(skb); if (gnet_stats_copy_queue(d, &qs) < 0) @@ -566,7 +630,7 @@ static void sfq_walk(struct Qdisc *sch, struct qdisc_walker *arg) return; for (i = 0; i < SFQ_HASH_DIVISOR; i++) { - if (q->ht[i] == SFQ_DEPTH || + if (q->ht[i] == SFQ_EMPTY_SLOT || arg->count < arg->skip) { arg->count++; continue; ^ permalink raw reply related [flat|nested] 40+ messages in thread
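The turn accounting in sfq_dequeue() above, combined with the earlier "fix allot handling" patch (allot = quantum when a flow joins the round, += quantum when a turn ends), behaves like deficit round robin: a slot keeps sending while its credit is positive, and the dequeue that drives it to zero or below ends its turn, carrying the overshoot into the next one. A reduced sketch of just the credit arithmetic (plain int instead of the kernel's short, hypothetical names):

```c
#include <assert.h>

/* Deficit-style turn accounting as in sfq_dequeue(): a flow may
 * keep sending while allot > 0; the packet that drives allot to
 * <= 0 ends the turn, and the negative remainder is carried into
 * the next turn via allot += quantum. */
struct flow { int allot; };

enum { QUANTUM = 1514 };	/* typical: one MTU-sized frame */

/* charge one dequeued packet; returns 1 if the flow's turn ends */
static int flow_charge(struct flow *f, int pkt_len)
{
	f->allot -= pkt_len;
	if (f->allot > 0)
		return 0;
	f->allot += QUANTUM;
	return 1;
}
```

With allot initialized to quantum per turn rather than accumulated across turns, a flow of small packets can no longer bank credit, which is exactly the burst behaviour the first patch in this thread fixed.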
* Re: [PATCH v3 net-next-2.6] net_sched: sch_sfq: better struct layouts 2010-12-20 22:54 ` [PATCH v3 net-next-2.6] " Eric Dumazet @ 2010-12-21 5:33 ` David Miller 0 siblings, 0 replies; 40+ messages in thread From: David Miller @ 2010-12-21 5:33 UTC (permalink / raw) To: eric.dumazet; +Cc: netdev, kaber, jarkao2 From: Eric Dumazet <eric.dumazet@gmail.com> Date: Mon, 20 Dec 2010 23:54:58 +0100 > [PATCH v3 net-next-2.6] net_sched: sch_sfq: better struct layouts > > This patch shrinks sizeof(struct sfq_sched_data) > from 0x14f8 (or more if spinlocks are bigger) to 0x1180 bytes, and > reduce text size as well. > > text data bss dec hex filename > 4821 152 0 4973 136d old/net/sched/sch_sfq.o > 4627 136 0 4763 129b new/net/sched/sch_sfq.o > > > All data for a slot/flow is now grouped in a compact and cache friendly > structure, instead of being spreaded in many different points. > > struct sfq_slot { > struct sk_buff *skblist_next; > struct sk_buff *skblist_prev; > sfq_index qlen; /* number of skbs in skblist */ > sfq_index next; /* next slot in sfq chain */ > struct sfq_head dep; /* anchor in dep[] chains */ > unsigned short hash; /* hash value (index in ht[]) */ > short allot; /* credit for this slot */ > }; > > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Applied.
* Re: [PATCH v2] net_sched: sch_sfq: better struct layouts 2010-12-20 17:02 ` [PATCH v2] " Eric Dumazet 2010-12-20 21:33 ` David Miller @ 2010-12-20 22:55 ` Jarek Poplawski 2010-12-20 23:16 ` [PATCH net-next-2.6] sch_sfq: allow big packets and be fair Eric Dumazet 2 siblings, 0 replies; 40+ messages in thread From: Jarek Poplawski @ 2010-12-20 22:55 UTC (permalink / raw) To: Eric Dumazet; +Cc: Patrick McHardy, David Miller, netdev On Mon, Dec 20, 2010 at 06:02:05PM +0100, Eric Dumazet wrote: > Le dimanche 19 décembre 2010 à 22:22 +0100, Jarek Poplawski a écrit : > > > I think open coding sk_buff_head is a wrong idea. Otherwise, this > > patch looks OK to me, only a few cosmetic suggestions below. > > > > I completely agree with you but this should be temporary, because David > really wants to use list_head for skbs, I believe this will be done ;) > > I chose to name the list skblist to make clear where we want to plug a > real list_head once done. > > Also, not using sk_buff_head saves at least 8 bytes per slot. Alas I dumped my 486sx already :-/ > > > - sfq_index max_depth; /* Maximal depth */ > > > + sfq_index max_depth; /* depth of longest slot */ > > > > depth and/or length? (One dimension should be enough.) > > maybe cur_depth ? It's not the maximal possible depth, but depth of > longest slot, or current max depth... Hmm... or max_depth? I meant the comment only, sorry ;-) > > > - /* If selected queue has length q->limit, this means that > > > - * all another queues are empty and that we do simple tail drop, > > > > No reason to remove this line. > > Well, the reason we drop this packet is not because other queues are > empty, but because we reach max depth for this queue. (I have the idea > to extend SFQ to allow more packets to be queued, still with a 127 limit > per queue, and 127 flows). With 10Gbs links, a global limit of 127 > packets is short. Right, but does this line say something else?
Of course, you can find it out by yourself too, but this comment makes reading a bit faster. > > If you really have to do this, all these: __skb_queue_tail(), > > __skb_dequeue(), skb_queue_head_init(), skb_peek() etc. used here > > should stay as (local) inline functions to remain readability. > > > > OK done, thanks a lot for reviewing and very useful comments ! Thanks for using them! Jarek P. ^ permalink raw reply [flat|nested] 40+ messages in thread
* [PATCH net-next-2.6] sch_sfq: allow big packets and be fair 2010-12-20 17:02 ` [PATCH v2] " Eric Dumazet 2010-12-20 21:33 ` David Miller 2010-12-20 22:55 ` [PATCH v2] " Jarek Poplawski @ 2010-12-20 23:16 ` Eric Dumazet 2010-12-21 10:15 ` Jarek Poplawski 2 siblings, 1 reply; 40+ messages in thread From: Eric Dumazet @ 2010-12-20 23:16 UTC (permalink / raw) To: David Miller; +Cc: Patrick McHardy, netdev, Jarek Poplawski SFQ is currently 'limited' to small packets, because it uses a 16bit allotment number per flow. Switch it to 18bit, and use appropriate handling to make sure this allotment is in [1 .. quantum] range before a new packet is dequeued, so that fairness is respected. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Jarek Poplawski <jarkao2@gmail.com> Cc: Patrick McHardy <kaber@trash.net> --- net/sched/sch_sfq.c | 24 ++++++++++++++++-------- 1 file changed, 16 insertions(+), 8 deletions(-) diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c index c474b4b..878704a 100644 --- a/net/sched/sch_sfq.c +++ b/net/sched/sch_sfq.c @@ -67,7 +67,7 @@ IMPLEMENTATION: This implementation limits maximal queue length to 128; - maximal mtu to 2^15-1; max 128 flows, number of hash buckets to 1024. + maximal mtu to 2^16-1; max 128 flows, number of hash buckets to 1024. The only goal of this restrictions was that all data fit into one 4K page on 32bit arches. 
@@ -99,9 +99,10 @@ struct sfq_slot { sfq_index qlen; /* number of skbs in skblist */ sfq_index next; /* next slot in sfq chain */ struct sfq_head dep; /* anchor in dep[] chains */ - unsigned short hash; /* hash value (index in ht[]) */ - short allot; /* credit for this slot */ + unsigned int hash:14; /* hash value (index in ht[]) */ + unsigned int allot:18; /* credit for this slot */ }; +#define ALLOT_ZERO (1 << 16) struct sfq_sched_data { @@ -394,7 +395,7 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch) q->tail->next = x; } q->tail = slot; - slot->allot = q->quantum; + slot->allot = ALLOT_ZERO + q->quantum; } if (++sch->q.qlen <= q->limit) { sch->bstats.bytes += qdisc_pkt_len(skb); @@ -430,8 +431,14 @@ sfq_dequeue(struct Qdisc *sch) if (q->tail == NULL) return NULL; +next: a = q->tail->next; slot = &q->slots[a]; + if (slot->allot <= ALLOT_ZERO) { + q->tail = slot; + slot->allot += q->quantum; + goto next; + } skb = slot_dequeue_head(slot); sfq_dec(q, a); sch->q.qlen--; @@ -446,9 +453,8 @@ sfq_dequeue(struct Qdisc *sch) return skb; } q->tail->next = next_a; - } else if ((slot->allot -= qdisc_pkt_len(skb)) <= 0) { - q->tail = slot; - slot->allot += q->quantum; + } else { + slot->allot -= qdisc_pkt_len(skb); } return skb; } @@ -610,7 +616,9 @@ static int sfq_dump_class_stats(struct Qdisc *sch, unsigned long cl, struct sfq_sched_data *q = qdisc_priv(sch); const struct sfq_slot *slot = &q->slots[q->ht[cl - 1]]; struct gnet_stats_queue qs = { .qlen = slot->qlen }; - struct tc_sfq_xstats xstats = { .allot = slot->allot }; + struct tc_sfq_xstats xstats = { + .allot = slot->allot - ALLOT_ZERO + }; struct sk_buff *skb; slot_queue_walk(slot, skb) ^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH net-next-2.6] sch_sfq: allow big packets and be fair 2010-12-20 23:16 ` [PATCH net-next-2.6] sch_sfq: allow big packets and be fair Eric Dumazet @ 2010-12-21 10:15 ` Jarek Poplawski 2010-12-21 10:30 ` Jarek Poplawski 2010-12-21 10:57 ` Eric Dumazet 0 siblings, 2 replies; 40+ messages in thread From: Jarek Poplawski @ 2010-12-21 10:15 UTC (permalink / raw) To: Eric Dumazet; +Cc: David Miller, Patrick McHardy, netdev On 2010-12-21 00:16, Eric Dumazet wrote: > SFQ is currently 'limited' to small packets, because it uses a 16bit > allotment number per flow. Switch it to 18bit, and use appropriate > handling to make sure this allotment is in [1 .. quantum] range before a > new packet is dequeued, so that fairness is respected. Well, two such important changes should be in separate patches. The change of allotment limit looks OK (but I would try scaling, e.g. in 16-byte chunks, btw). The change in fair treatment looks dubious. A flow which uses exactly its quantum in one round will be skipped in the next round. A flow which uses a bit more than its quantum in one round will be skipped too, while we should only give it less this time to keep the sum up to 2 quantums. (The usual algorithm is to check if a flow has enough "tickets" for sending its next packet.) Jarek P. > > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> > Cc: Jarek Poplawski <jarkao2@gmail.com> > Cc: Patrick McHardy <kaber@trash.net> > --- > net/sched/sch_sfq.c | 24 ++++++++++++++++-------- > 1 file changed, 16 insertions(+), 8 deletions(-) > > diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c > index c474b4b..878704a 100644 > --- a/net/sched/sch_sfq.c > +++ b/net/sched/sch_sfq.c > @@ -67,7 +67,7 @@ > > IMPLEMENTATION: > This implementation limits maximal queue length to 128; > - maximal mtu to 2^15-1; max 128 flows, number of hash buckets to 1024. > + maximal mtu to 2^16-1; max 128 flows, number of hash buckets to 1024.
> The only goal of this restrictions was that all data > fit into one 4K page on 32bit arches. > > @@ -99,9 +99,10 @@ struct sfq_slot { > sfq_index qlen; /* number of skbs in skblist */ > sfq_index next; /* next slot in sfq chain */ > struct sfq_head dep; /* anchor in dep[] chains */ > - unsigned short hash; /* hash value (index in ht[]) */ > - short allot; /* credit for this slot */ > + unsigned int hash:14; /* hash value (index in ht[]) */ > + unsigned int allot:18; /* credit for this slot */ > }; > +#define ALLOT_ZERO (1 << 16) > > struct sfq_sched_data > { > @@ -394,7 +395,7 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch) > q->tail->next = x; > } > q->tail = slot; > - slot->allot = q->quantum; > + slot->allot = ALLOT_ZERO + q->quantum; > } > if (++sch->q.qlen <= q->limit) { > sch->bstats.bytes += qdisc_pkt_len(skb); > @@ -430,8 +431,14 @@ sfq_dequeue(struct Qdisc *sch) > if (q->tail == NULL) > return NULL; > > +next: > a = q->tail->next; > slot = &q->slots[a]; > + if (slot->allot <= ALLOT_ZERO) { > + q->tail = slot; > + slot->allot += q->quantum; > + goto next; > + } > skb = slot_dequeue_head(slot); > sfq_dec(q, a); > sch->q.qlen--; > @@ -446,9 +453,8 @@ sfq_dequeue(struct Qdisc *sch) > return skb; > } > q->tail->next = next_a; > - } else if ((slot->allot -= qdisc_pkt_len(skb)) <= 0) { > - q->tail = slot; > - slot->allot += q->quantum; > + } else { > + slot->allot -= qdisc_pkt_len(skb); > } > return skb; > } > @@ -610,7 +616,9 @@ static int sfq_dump_class_stats(struct Qdisc *sch, unsigned long cl, > struct sfq_sched_data *q = qdisc_priv(sch); > const struct sfq_slot *slot = &q->slots[q->ht[cl - 1]]; > struct gnet_stats_queue qs = { .qlen = slot->qlen }; > - struct tc_sfq_xstats xstats = { .allot = slot->allot }; > + struct tc_sfq_xstats xstats = { > + .allot = slot->allot - ALLOT_ZERO > + }; > struct sk_buff *skb; > > slot_queue_walk(slot, skb) > > > -- ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH net-next-2.6] sch_sfq: allow big packets and be fair 2010-12-21 10:15 ` Jarek Poplawski @ 2010-12-21 10:30 ` Jarek Poplawski 2010-12-21 10:44 ` Eric Dumazet 2010-12-21 10:57 ` Eric Dumazet 1 sibling, 1 reply; 40+ messages in thread From: Jarek Poplawski @ 2010-12-21 10:30 UTC (permalink / raw) To: Eric Dumazet; +Cc: David Miller, Patrick McHardy, netdev On Tue, Dec 21, 2010 at 10:15:06AM +0000, Jarek Poplawski wrote: > The change of allotment limit looks OK [...] Hmm... but maybe s/ALLOT_ZERO/SFQ_ALLOT_ZERO/? ;-) Jarek P. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH net-next-2.6] sch_sfq: allow big packets and be fair 2010-12-21 10:30 ` Jarek Poplawski @ 2010-12-21 10:44 ` Eric Dumazet 2010-12-21 10:56 ` Jarek Poplawski 0 siblings, 1 reply; 40+ messages in thread From: Eric Dumazet @ 2010-12-21 10:44 UTC (permalink / raw) To: Jarek Poplawski; +Cc: David Miller, Patrick McHardy, netdev Le mardi 21 décembre 2010 à 10:30 +0000, Jarek Poplawski a écrit : > On Tue, Dec 21, 2010 at 10:15:06AM +0000, Jarek Poplawski wrote: > > The change of allotment limit looks OK [...] > > Hmm... but maybe s/ALLOT_ZERO/SFQ_ALLOT_ZERO/? ;-) It's a local symbol, it's not like it's in an include file ;) ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH net-next-2.6] sch_sfq: allow big packets and be fair 2010-12-21 10:44 ` Eric Dumazet @ 2010-12-21 10:56 ` Jarek Poplawski 0 siblings, 0 replies; 40+ messages in thread From: Jarek Poplawski @ 2010-12-21 10:56 UTC (permalink / raw) To: Eric Dumazet; +Cc: David Miller, Patrick McHardy, netdev On Tue, Dec 21, 2010 at 11:44:29AM +0100, Eric Dumazet wrote: > Le mardi 21 décembre 2010 à 10:30 +0000, Jarek Poplawski a écrit : > > On Tue, Dec 21, 2010 at 10:15:06AM +0000, Jarek Poplawski wrote: > > > The change of allotment limit looks OK [...] > > > > Hmm... but maybe s/ALLOT_ZERO/SFQ_ALLOT_ZERO/? ;-) > > It's a local symbol, it's not like it's in an include file ;) Sure, but why does this one have to be different from the others in this file? (Plus, if you don't remember all this code it's a good hint.) Jarek P. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH net-next-2.6] sch_sfq: allow big packets and be fair 2010-12-21 10:15 ` Jarek Poplawski 2010-12-21 10:30 ` Jarek Poplawski @ 2010-12-21 10:57 ` Eric Dumazet 2010-12-21 11:39 ` Jarek Poplawski 1 sibling, 1 reply; 40+ messages in thread From: Eric Dumazet @ 2010-12-21 10:57 UTC (permalink / raw) To: Jarek Poplawski; +Cc: David Miller, Patrick McHardy, netdev Le mardi 21 décembre 2010 à 10:15 +0000, Jarek Poplawski a écrit : > On 2010-12-21 00:16, Eric Dumazet wrote: > > SFQ is currently 'limited' to small packets, because it uses a 16bit > > allotment number per flow. Switch it to 18bit, and use appropriate > > handling to make sure this allotment is in [1 .. quantum] range before a > > new packet is dequeued, so that fairness is respected. > > Well, two such important changes should be in separate patches. > > The change of allotment limit looks OK (but I would try scaling, e.g. > in 16-byte chunks, btw). > Hmm, we could scale by 2 or 3 and keep 16bit allot/hash (faster than 18/14 bit bitfields on x86). Not sure it's worth it (it adds two shifts per packet) > The change in fair treatment looks dubious. A flow which uses exactly > its quantum in one round will be skipped in the next round. A flow > which uses a bit more than its quantum in one round will be skipped > too, while we should only give it less this time to keep the sum up to > 2 quantums. (The usual algorithm is to check if a flow has enough > "tickets" for sending its next packet.) Hmm... A flow which uses exactly its quantum in one round won't be skipped in the next round. I only made the "I pass my round to next slot in chain" in one place instead of two, maybe you missed the removal at the end of sfq_dequeue() ?
- } else if ((slot->allot -= qdisc_pkt_len(skb)) <= 0) { - q->tail = slot; - slot->allot += q->quantum; + } else { + slot->allot -= qdisc_pkt_len(skb); } Now the check is performed at the beginning of sfq_dequeue(), to be able to charge a previously sent 'big packet' multiple times (a faulty flow won't send a packet before passing xx rounds) I believe I just did the right thing. The "allot" is incremented when the current flow "passes its round to next slot", and decremented when a packet is dequeued from this slot. Before being allowed to dequeue a packet, "allot" must be 'positive'. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH net-next-2.6] sch_sfq: allow big packets and be fair 2010-12-21 10:57 ` Eric Dumazet @ 2010-12-21 11:39 ` Jarek Poplawski 2010-12-21 12:17 ` Jarek Poplawski 0 siblings, 1 reply; 40+ messages in thread From: Jarek Poplawski @ 2010-12-21 11:39 UTC (permalink / raw) To: Eric Dumazet; +Cc: David Miller, Patrick McHardy, netdev On Tue, Dec 21, 2010 at 11:57:17AM +0100, Eric Dumazet wrote: > Le mardi 21 décembre 2010 à 10:15 +0000, Jarek Poplawski a écrit : > > On 2010-12-21 00:16, Eric Dumazet wrote: > > > SFQ is currently 'limited' to small packets, because it uses a 16bit > > > allotment number per flow. Switch it to 18bit, and use appropriate > > > handling to make sure this allotment is in [1 .. quantum] range before a > > > new packet is dequeued, so that fairness is respected. > > > > Well, two such important changes should be in separate patches. > > > > The change of allotment limit looks OK (but I would try scaling, e.g. > > in 16-byte chunks, btw). > > > > Hmm, we could scale by 2 or 3 and keep 16bit allot/hash (faster than > 18/14 bit bitfields on x86). Not sure it's worth it (it adds two shifts > per packet) I'm OK with any of those methods. > > The change in fair treatment looks dubious. A flow which uses exactly > > its quantum in one round will be skipped in the next round. A flow > > which uses a bit more than its quantum in one round will be skipped > > too, while we should only give it less this time to keep the sum up to > > 2 quantums. (The usual algorithm is to check if a flow has enough > > "tickets" for sending its next packet.) > > Hmm... > > A flow which uses exactly its quantum in one round won't be skipped in > the next round. > > I only made the "I pass my round to next slot in chain" in one place > instead of two, maybe you missed the removal at the end of > sfq_dequeue() ?
> > - } else if ((slot->allot -= qdisc_pkt_len(skb)) <= 0) { > - q->tail = slot; > - slot->allot += q->quantum; > + } else { > + slot->allot -= qdisc_pkt_len(skb); > } > > Now the check is performed at the beginning of sfq_dequeue(), to be able > to charge a previously sent 'big packet' multiple times (a faulty flow > won't send a packet before passing xx rounds) > > I believe I just did the right thing. The "allot" is incremented when > the current flow "passes its round to next slot", and decremented when a > packet is dequeued from this slot. Before being allowed to dequeue a > packet, "allot" must be 'positive'. Simply try to check my examples before and after. There is no skipping of a round now. It's a serious change. Somebody tried to avoid it at all in the current implementation. You should also think about fairness of normal (but different) size packets. Jarek P. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH net-next-2.6] sch_sfq: allow big packets and be fair 2010-12-21 11:39 ` Jarek Poplawski @ 2010-12-21 12:17 ` Jarek Poplawski 2010-12-21 13:04 ` [PATCH v2 " Eric Dumazet 0 siblings, 1 reply; 40+ messages in thread From: Jarek Poplawski @ 2010-12-21 12:17 UTC (permalink / raw) To: Eric Dumazet; +Cc: David Miller, Patrick McHardy, netdev On Tue, Dec 21, 2010 at 11:39:20AM +0000, Jarek Poplawski wrote: > On Tue, Dec 21, 2010 at 11:57:17AM +0100, Eric Dumazet wrote: > > Now the check is performed at the beginning of sfq_dequeue(), to be able > > to charge a previously sent 'big packet' multiple times (faulty flow > > wont send a packet before passing xx rounds) > > > > I believe I just did the right thing. The "allot" is incremented when > > current flow "pass its round to next slot", and decremented when a > > packet is dequeued from this slot. Before being allowed to dequeue a > > packet, "allot" must be 'positive'. > > Simply try to check my examples before and after. There is no skipping > of a round now. It's a serious change. Somebody tried to avoid it at > all in the current implementation. You should also think about fairness > of normal (but different) size packets. Oops! You're right yet ;-) This skipping shouldn't happen with quantum bigger than max packet size, so this patch is OK. Sorry, Jarek P. ^ permalink raw reply [flat|nested] 40+ messages in thread
* [PATCH v2 net-next-2.6] sch_sfq: allow big packets and be fair 2010-12-21 12:17 ` Jarek Poplawski @ 2010-12-21 13:04 ` Eric Dumazet 2010-12-21 13:47 ` Jarek Poplawski 2010-12-28 21:46 ` David Miller 0 siblings, 2 replies; 40+ messages in thread From: Eric Dumazet @ 2010-12-21 13:04 UTC (permalink / raw) To: Jarek Poplawski; +Cc: David Miller, Patrick McHardy, netdev Le mardi 21 décembre 2010 à 12:17 +0000, Jarek Poplawski a écrit : > Oops! You're right yet ;-) This skipping shouldn't happen with quantum > bigger than max packet size, so this patch is OK. Thanks Jarek, here is a v2 with the scale you suggested. [PATCH v2 net-next-2.6] sch_sfq: allow big packets and be fair SFQ is currently 'limited' to small packets, because it uses a 15bit allotment number per flow. Introduce a scale by 8, so that we can handle full size TSO/GRO packets. Use appropriate handling to make sure allot is positive before a new packet is dequeued, so that fairness is respected. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Jarek Poplawski <jarkao2@gmail.com> Cc: Patrick McHardy <kaber@trash.net> --- v2: Use a scale of 8 as Jarek suggested, instead of 18bit fields net/sched/sch_sfq.c | 28 +++++++++++++++++++++------- 1 file changed, 21 insertions(+), 7 deletions(-) diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c index c474b4b..f3a9fd7 100644 --- a/net/sched/sch_sfq.c +++ b/net/sched/sch_sfq.c @@ -67,7 +67,7 @@ IMPLEMENTATION: This implementation limits maximal queue length to 128; - maximal mtu to 2^15-1; max 128 flows, number of hash buckets to 1024. + max mtu to 2^18-1; max 128 flows, number of hash buckets to 1024. The only goal of this restrictions was that all data fit into one 4K page on 32bit arches. @@ -77,6 +77,11 @@ #define SFQ_SLOTS 128 /* max number of flows */ #define SFQ_EMPTY_SLOT 255 #define SFQ_HASH_DIVISOR 1024 +/* We use 15+1 bits to store allot, and want to handle packets up to 64K + * Scale allot by 8 (1<<3) so that no overflow occurs. 
+ */ +#define SFQ_ALLOT_SHIFT 3 +#define SFQ_ALLOT_SIZE(X) DIV_ROUND_UP(X, 1 << SFQ_ALLOT_SHIFT) /* This type should contain at least SFQ_DEPTH + SFQ_SLOTS values */ typedef unsigned char sfq_index; @@ -115,7 +120,7 @@ struct sfq_sched_data struct timer_list perturb_timer; u32 perturbation; sfq_index cur_depth; /* depth of longest slot */ - + unsigned short scaled_quantum; /* SFQ_ALLOT_SIZE(quantum) */ struct sfq_slot *tail; /* current slot in round */ sfq_index ht[SFQ_HASH_DIVISOR]; /* Hash table */ struct sfq_slot slots[SFQ_SLOTS]; @@ -394,7 +399,7 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch) q->tail->next = x; } q->tail = slot; - slot->allot = q->quantum; + slot->allot = q->scaled_quantum; } if (++sch->q.qlen <= q->limit) { sch->bstats.bytes += qdisc_pkt_len(skb); @@ -430,8 +435,14 @@ sfq_dequeue(struct Qdisc *sch) if (q->tail == NULL) return NULL; +next_slot: a = q->tail->next; slot = &q->slots[a]; + if (slot->allot <= 0) { + q->tail = slot; + slot->allot += q->scaled_quantum; + goto next_slot; + } skb = slot_dequeue_head(slot); sfq_dec(q, a); sch->q.qlen--; @@ -446,9 +457,8 @@ sfq_dequeue(struct Qdisc *sch) return skb; } q->tail->next = next_a; - } else if ((slot->allot -= qdisc_pkt_len(skb)) <= 0) { - q->tail = slot; - slot->allot += q->quantum; + } else { + slot->allot -= SFQ_ALLOT_SIZE(qdisc_pkt_len(skb)); } return skb; } @@ -484,6 +494,7 @@ static int sfq_change(struct Qdisc *sch, struct nlattr *opt) sch_tree_lock(sch); q->quantum = ctl->quantum ? 
: psched_mtu(qdisc_dev(sch)); + q->scaled_quantum = SFQ_ALLOT_SIZE(q->quantum); q->perturb_period = ctl->perturb_period * HZ; if (ctl->limit) q->limit = min_t(u32, ctl->limit, SFQ_DEPTH - 1); @@ -524,6 +535,7 @@ static int sfq_init(struct Qdisc *sch, struct nlattr *opt) q->tail = NULL; if (opt == NULL) { q->quantum = psched_mtu(qdisc_dev(sch)); + q->scaled_quantum = SFQ_ALLOT_SIZE(q->quantum); q->perturb_period = 0; q->perturbation = net_random(); } else { @@ -610,7 +622,9 @@ static int sfq_dump_class_stats(struct Qdisc *sch, unsigned long cl, struct sfq_sched_data *q = qdisc_priv(sch); const struct sfq_slot *slot = &q->slots[q->ht[cl - 1]]; struct gnet_stats_queue qs = { .qlen = slot->qlen }; - struct tc_sfq_xstats xstats = { .allot = slot->allot }; + struct tc_sfq_xstats xstats = { + .allot = slot->allot << SFQ_ALLOT_SHIFT + }; struct sk_buff *skb; slot_queue_walk(slot, skb) ^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH v2 net-next-2.6] sch_sfq: allow big packets and be fair 2010-12-21 13:04 ` [PATCH v2 " Eric Dumazet @ 2010-12-21 13:47 ` Jarek Poplawski 2010-12-28 21:46 ` David Miller 0 siblings, 0 replies; 40+ messages in thread From: Jarek Poplawski @ 2010-12-21 13:47 UTC (permalink / raw) To: Eric Dumazet; +Cc: David Miller, Patrick McHardy, netdev On Tue, Dec 21, 2010 at 02:04:59PM +0100, Eric Dumazet wrote: > Le mardi 21 décembre 2010 à 12:17 +0000, Jarek Poplawski a écrit : > > > Oops! You're right yet ;-) This skipping shouldn't happen with quantum > > bigger than max packet size, so this patch is OK. > > Thanks Jarek, here is a v2 with the scale you suggested. Very nice! Thanks as well, Jarek P. > > [PATCH v2 net-next-2.6] sch_sfq: allow big packets and be fair > > SFQ is currently 'limited' to small packets, because it uses a 15bit > allotment number per flow. Introduce a scale by 8, so that we can handle > full size TSO/GRO packets. > > Use appropriate handling to make sure allot is positive before a new > packet is dequeued, so that fairness is respected. > > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> > Cc: Jarek Poplawski <jarkao2@gmail.com> > Cc: Patrick McHardy <kaber@trash.net> > --- > v2: Use a scale of 8 as Jarek suggested, instead of 18bit fields > > net/sched/sch_sfq.c | 28 +++++++++++++++++++++------- > 1 file changed, 21 insertions(+), 7 deletions(-) > > diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c > index c474b4b..f3a9fd7 100644 > --- a/net/sched/sch_sfq.c > +++ b/net/sched/sch_sfq.c > @@ -67,7 +67,7 @@ > > IMPLEMENTATION: > This implementation limits maximal queue length to 128; > - maximal mtu to 2^15-1; max 128 flows, number of hash buckets to 1024. > + max mtu to 2^18-1; max 128 flows, number of hash buckets to 1024. > The only goal of this restrictions was that all data > fit into one 4K page on 32bit arches.
> > @@ -77,6 +77,11 @@ > #define SFQ_SLOTS 128 /* max number of flows */ > #define SFQ_EMPTY_SLOT 255 > #define SFQ_HASH_DIVISOR 1024 > +/* We use 15+1 bits to store allot, and want to handle packets up to 64K > + * Scale allot by 8 (1<<3) so that no overflow occurs. > + */ > +#define SFQ_ALLOT_SHIFT 3 > +#define SFQ_ALLOT_SIZE(X) DIV_ROUND_UP(X, 1 << SFQ_ALLOT_SHIFT) > > /* This type should contain at least SFQ_DEPTH + SFQ_SLOTS values */ > typedef unsigned char sfq_index; > @@ -115,7 +120,7 @@ struct sfq_sched_data > struct timer_list perturb_timer; > u32 perturbation; > sfq_index cur_depth; /* depth of longest slot */ > - > + unsigned short scaled_quantum; /* SFQ_ALLOT_SIZE(quantum) */ > struct sfq_slot *tail; /* current slot in round */ > sfq_index ht[SFQ_HASH_DIVISOR]; /* Hash table */ > struct sfq_slot slots[SFQ_SLOTS]; > @@ -394,7 +399,7 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch) > q->tail->next = x; > } > q->tail = slot; > - slot->allot = q->quantum; > + slot->allot = q->scaled_quantum; > } > if (++sch->q.qlen <= q->limit) { > sch->bstats.bytes += qdisc_pkt_len(skb); > @@ -430,8 +435,14 @@ sfq_dequeue(struct Qdisc *sch) > if (q->tail == NULL) > return NULL; > > +next_slot: > a = q->tail->next; > slot = &q->slots[a]; > + if (slot->allot <= 0) { > + q->tail = slot; > + slot->allot += q->scaled_quantum; > + goto next_slot; > + } > skb = slot_dequeue_head(slot); > sfq_dec(q, a); > sch->q.qlen--; > @@ -446,9 +457,8 @@ sfq_dequeue(struct Qdisc *sch) > return skb; > } > q->tail->next = next_a; > - } else if ((slot->allot -= qdisc_pkt_len(skb)) <= 0) { > - q->tail = slot; > - slot->allot += q->quantum; > + } else { > + slot->allot -= SFQ_ALLOT_SIZE(qdisc_pkt_len(skb)); > } > return skb; > } > @@ -484,6 +494,7 @@ static int sfq_change(struct Qdisc *sch, struct nlattr *opt) > > sch_tree_lock(sch); > q->quantum = ctl->quantum ? 
: psched_mtu(qdisc_dev(sch)); > + q->scaled_quantum = SFQ_ALLOT_SIZE(q->quantum); > q->perturb_period = ctl->perturb_period * HZ; > if (ctl->limit) > q->limit = min_t(u32, ctl->limit, SFQ_DEPTH - 1); > @@ -524,6 +535,7 @@ static int sfq_init(struct Qdisc *sch, struct nlattr *opt) > q->tail = NULL; > if (opt == NULL) { > q->quantum = psched_mtu(qdisc_dev(sch)); > + q->scaled_quantum = SFQ_ALLOT_SIZE(q->quantum); > q->perturb_period = 0; > q->perturbation = net_random(); > } else { > @@ -610,7 +622,9 @@ static int sfq_dump_class_stats(struct Qdisc *sch, unsigned long cl, > struct sfq_sched_data *q = qdisc_priv(sch); > const struct sfq_slot *slot = &q->slots[q->ht[cl - 1]]; > struct gnet_stats_queue qs = { .qlen = slot->qlen }; > - struct tc_sfq_xstats xstats = { .allot = slot->allot }; > + struct tc_sfq_xstats xstats = { > + .allot = slot->allot << SFQ_ALLOT_SHIFT > + }; > struct sk_buff *skb; > > slot_queue_walk(slot, skb) > > ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v2 net-next-2.6] sch_sfq: allow big packets and be fair 2010-12-21 13:04 ` [PATCH v2 " Eric Dumazet 2010-12-21 13:47 ` Jarek Poplawski @ 2010-12-28 21:46 ` David Miller 2010-12-29 7:53 ` [PATCH v3 " Eric Dumazet 1 sibling, 1 reply; 40+ messages in thread From: David Miller @ 2010-12-28 21:46 UTC (permalink / raw) To: eric.dumazet; +Cc: jarkao2, kaber, netdev From: Eric Dumazet <eric.dumazet@gmail.com> Date: Tue, 21 Dec 2010 14:04:59 +0100 > Le mardi 21 décembre 2010 à 12:17 +0000, Jarek Poplawski a écrit : > >> Oops! You're right yet ;-) This skipping shouldn't happen with quantum >> bigger than max packet size, so this patch is OK. > > Thanks Jarek, here is a v2 with the scale you suggested. > > [PATCH v2 net-next-2.6] sch_sfq: allow big packets and be fair > > SFQ is currently 'limited' to small packets, because it uses a 15bit > allotment number per flow. Introduce a scale by 8, so that we can handle > full size TSO/GRO packets. > > Use appropriate handling to make sure allot is positive before a new > packet is dequeued, so that fairness is respected. > > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> > Cc: Jarek Poplawski <jarkao2@gmail.com> > Cc: Patrick McHardy <kaber@trash.net> > --- > v2: Use a scale of 8 as Jarek suggested, instead of 18bit fields Eric this doesn't apply cleanly, since the code in sfq_dump_class_stats() is a bit different in net-next-2.6 Please respin this and resubmit. ^ permalink raw reply [flat|nested] 40+ messages in thread
* [PATCH v3 net-next-2.6] sch_sfq: allow big packets and be fair
  2010-12-28 21:46 ` David Miller
@ 2010-12-29  7:53 ` Eric Dumazet
  2010-12-31 20:48   ` David Miller
  0 siblings, 1 reply; 40+ messages in thread
From: Eric Dumazet @ 2010-12-29  7:53 UTC (permalink / raw)
To: David Miller; +Cc: jarkao2, kaber, netdev

On Tuesday, December 28, 2010 at 13:46 -0800, David Miller wrote:
> Eric this doesn't apply cleanly, since the code in sfq_dump_class_stats()
> is a bit different in net-next-2.6
>
> Please respin this and resubmit.

Sure, here is the version for latest net-next

Thanks

[PATCH v3 net-next-2.6] sch_sfq: allow big packets and be fair

SFQ is currently 'limited' to small packets, because it uses a 15bit
allotment number per flow. Introduce a scale by 8, so that we can handle
full size TSO/GRO packets.

Use appropriate handling to make sure allot is positive before a new
packet is dequeued, so that fairness is respected.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Jarek Poplawski <jarkao2@gmail.com>
Cc: Patrick McHardy <kaber@trash.net>
---
v3: respin after commit ee09b3c1cf (fix sfq class stats handling)

v2: Use a scale of 8 as Jarek suggested, instead of 18bit fields

 net/sched/sch_sfq.c |   26 +++++++++++++++++++-------
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index 0ce421d..79c8967 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -67,7 +67,7 @@
 	IMPLEMENTATION:
 	This implementation limits maximal queue length to 128;
-	maximal mtu to 2^15-1; max 128 flows, number of hash buckets to 1024.
+	max mtu to 2^18-1; max 128 flows, number of hash buckets to 1024.
 	The only goal of this restrictions was that all data
 	fit into one 4K page on 32bit arches.
@@ -77,6 +77,11 @@
 #define SFQ_SLOTS		128 /* max number of flows */
 #define SFQ_EMPTY_SLOT		255
 #define SFQ_HASH_DIVISOR	1024
+/* We use 16 bits to store allot, and want to handle packets up to 64K
+ * Scale allot by 8 (1<<3) so that no overflow occurs.
+ */
+#define SFQ_ALLOT_SHIFT		3
+#define SFQ_ALLOT_SIZE(X)	DIV_ROUND_UP(X, 1 << SFQ_ALLOT_SHIFT)

 /* This type should contain at least SFQ_DEPTH + SFQ_SLOTS values */
 typedef unsigned char sfq_index;
@@ -115,7 +120,7 @@ struct sfq_sched_data
 	struct timer_list perturb_timer;
 	u32		perturbation;
 	sfq_index	cur_depth;	/* depth of longest slot */
-
+	unsigned short	scaled_quantum; /* SFQ_ALLOT_SIZE(quantum) */
 	struct sfq_slot	*tail;		/* current slot in round */
 	sfq_index	ht[SFQ_HASH_DIVISOR];	/* Hash table */
 	struct sfq_slot	slots[SFQ_SLOTS];
@@ -395,7 +400,7 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 			q->tail->next = x;
 		}
 		q->tail = slot;
-		slot->allot = q->quantum;
+		slot->allot = q->scaled_quantum;
 	}
 	if (++sch->q.qlen <= q->limit) {
 		sch->bstats.bytes += qdisc_pkt_len(skb);
@@ -431,8 +436,14 @@ sfq_dequeue(struct Qdisc *sch)
 	if (q->tail == NULL)
 		return NULL;

+next_slot:
 	a = q->tail->next;
 	slot = &q->slots[a];
+	if (slot->allot <= 0) {
+		q->tail = slot;
+		slot->allot += q->scaled_quantum;
+		goto next_slot;
+	}
 	skb = slot_dequeue_head(slot);
 	sfq_dec(q, a);
 	sch->q.qlen--;
@@ -447,9 +458,8 @@ sfq_dequeue(struct Qdisc *sch)
 			return skb;
 		}
 		q->tail->next = next_a;
-	} else if ((slot->allot -= qdisc_pkt_len(skb)) <= 0) {
-		q->tail = slot;
-		slot->allot += q->quantum;
+	} else {
+		slot->allot -= SFQ_ALLOT_SIZE(qdisc_pkt_len(skb));
 	}
 	return skb;
 }
@@ -485,6 +495,7 @@ static int sfq_change(struct Qdisc *sch, struct nlattr *opt)
 	sch_tree_lock(sch);
 	q->quantum = ctl->quantum ? : psched_mtu(qdisc_dev(sch));
+	q->scaled_quantum = SFQ_ALLOT_SIZE(q->quantum);
 	q->perturb_period = ctl->perturb_period * HZ;
 	if (ctl->limit)
 		q->limit = min_t(u32, ctl->limit, SFQ_DEPTH - 1);
@@ -525,6 +536,7 @@ static int sfq_init(struct Qdisc *sch, struct nlattr *opt)
 	q->tail = NULL;
 	if (opt == NULL) {
 		q->quantum = psched_mtu(qdisc_dev(sch));
+		q->scaled_quantum = SFQ_ALLOT_SIZE(q->quantum);
 		q->perturb_period = 0;
 		q->perturbation = net_random();
 	} else {
@@ -617,7 +629,7 @@ static int sfq_dump_class_stats(struct Qdisc *sch, unsigned long cl,
 	if (idx != SFQ_EMPTY_SLOT) {
 		const struct sfq_slot *slot = &q->slots[idx];

-		xstats.allot = slot->allot;
+		xstats.allot = slot->allot << SFQ_ALLOT_SHIFT;
 		qs.qlen = slot->qlen;
 		slot_queue_walk(slot, skb)
 			qs.backlog += qdisc_pkt_len(skb);

^ permalink raw reply related	[flat|nested] 40+ messages in thread
* Re: [PATCH v3 net-next-2.6] sch_sfq: allow big packets and be fair
  2010-12-29  7:53 ` [PATCH v3 " Eric Dumazet
@ 2010-12-31 20:48   ` David Miller
  0 siblings, 0 replies; 40+ messages in thread
From: David Miller @ 2010-12-31 20:48 UTC (permalink / raw)
To: eric.dumazet; +Cc: jarkao2, kaber, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 29 Dec 2010 08:53:33 +0100

> [PATCH v3 net-next-2.6] sch_sfq: allow big packets and be fair
>
> SFQ is currently 'limited' to small packets, because it uses a 15bit
> allotment number per flow. Introduce a scale by 8, so that we can handle
> full size TSO/GRO packets.
>
> Use appropriate handling to make sure allot is positive before a new
> packet is dequeued, so that fairness is respected.
>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> Acked-by: Jarek Poplawski <jarkao2@gmail.com>
> Cc: Patrick McHardy <kaber@trash.net>

Applied, thanks Eric.

^ permalink raw reply	[flat|nested] 40+ messages in thread
end of thread, other threads:[~2010-12-31 20:47 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-12-15 14:03 [PATCH] net_sched: sch_sfq: fix allot handling Eric Dumazet
2010-12-15 16:03 ` Patrick McHardy
2010-12-15 16:27   ` Eric Dumazet
2010-12-15 16:40     ` [PATCH v2] " Eric Dumazet
2010-12-15 16:43       ` Patrick McHardy
2010-12-15 16:55         ` Eric Dumazet
2010-12-15 17:03           ` Patrick McHardy
2010-12-15 17:09             ` Eric Dumazet
2010-12-15 17:21               ` Patrick McHardy
2010-12-15 17:30                 ` [PATCH v3] " Eric Dumazet
2010-12-15 18:18                   ` [PATCH net-next-2.6] net_sched: sch_sfq: add backlog info in sfq_dump_class_stats() Eric Dumazet
2010-12-15 19:10                     ` Eric Dumazet
2010-12-16  8:16                       ` Jarek Poplawski
2010-12-16 10:18                         ` [PATCH v2 " Eric Dumazet
2010-12-16 11:03                           ` [PATCH " Eric Dumazet
2010-12-16 13:09                             ` Jarek Poplawski
2010-12-20 21:14                             ` David Miller
2010-12-20 21:18                   ` [PATCH v3] net_sched: sch_sfq: fix allot handling David Miller
2010-12-16 13:08                 ` [PATCH v2] " Eric Dumazet
2010-12-17 16:52                   ` [RFC PATCH] net_sched: sch_sfq: better struct layouts Eric Dumazet
2010-12-19 21:22                     ` Jarek Poplawski
2010-12-20 17:02                       ` [PATCH v2] " Eric Dumazet
2010-12-20 21:33                         ` David Miller
2010-12-20 21:42                           ` Eric Dumazet
2010-12-20 22:54                             ` [PATCH v3 net-next-2.6] " Eric Dumazet
2010-12-21  5:33                               ` David Miller
2010-12-20 22:55                           ` [PATCH v2] " Jarek Poplawski
2010-12-20 23:16                             ` [PATCH net-next-2.6] sch_sfq: allow big packets and be fair Eric Dumazet
2010-12-21 10:15                               ` Jarek Poplawski
2010-12-21 10:30                                 ` Jarek Poplawski
2010-12-21 10:44                                   ` Eric Dumazet
2010-12-21 10:56                                     ` Jarek Poplawski
2010-12-21 10:57                                       ` Eric Dumazet
2010-12-21 11:39                                         ` Jarek Poplawski
2010-12-21 12:17                                           ` Jarek Poplawski
2010-12-21 13:04                                             ` [PATCH v2 " Eric Dumazet
2010-12-21 13:47                                               ` Jarek Poplawski
2010-12-28 21:46                                               ` David Miller
2010-12-29  7:53                                                 ` [PATCH v3 " Eric Dumazet
2010-12-31 20:48                                                   ` David Miller