All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-2.6] net_sched: always clone skbs
@ 2010-12-20 14:35 Changli Gao
  2010-12-20 14:37 ` Eric Dumazet
  2010-12-20 23:20 ` Jarek Poplawski
  0 siblings, 2 replies; 35+ messages in thread
From: Changli Gao @ 2010-12-20 14:35 UTC (permalink / raw)
  To: David S. Miller
  Cc: Jarek Poplawski, Eric Dumazet, netdev, Jamal Hadi Salim,
	Pawel Staszewski, Changli Gao

Pawel reported a panic related to handling shared skbs in ixgbe
incorrectly. So we need to revert my previous patch to work around
this bug. Instead of reverting the patch completely, I just revert
the essential lines, so we can add the previous optimization
back more easily in future.

    commit 3511c9132f8b1e1b5634e41a3331c44b0c13be70
    Author: Changli Gao <xiaosuo@gmail.com>
    Date:   Sat Oct 16 13:04:08 2010 +0000
    
        net_sched: remove the unused parameter of qdisc_create_dflt()

Reported-by: Pawel Staszewski <pstaszewski@itcare.pl>
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
---
 include/net/sch_generic.h |    6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 786cc39..0af57eb 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -611,11 +611,7 @@ static inline struct sk_buff *skb_act_clone(struct sk_buff *skb, gfp_t gfp_mask,
 {
 	struct sk_buff *n;
 
-	if ((action == TC_ACT_STOLEN || action == TC_ACT_QUEUED) &&
-	    !skb_shared(skb))
-		n = skb_get(skb);
-	else
-		n = skb_clone(skb, gfp_mask);
+	n = skb_clone(skb, gfp_mask);
 
 	if (n) {
 		n->tc_verd = SET_TC_VERD(n->tc_verd, 0);

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH net-2.6] net_sched: always clone skbs
  2010-12-20 14:35 [PATCH net-2.6] net_sched: always clone skbs Changli Gao
@ 2010-12-20 14:37 ` Eric Dumazet
  2010-12-20 18:27   ` David Miller
  2010-12-20 23:20 ` Jarek Poplawski
  1 sibling, 1 reply; 35+ messages in thread
From: Eric Dumazet @ 2010-12-20 14:37 UTC (permalink / raw)
  To: Changli Gao
  Cc: David S. Miller, Jarek Poplawski, netdev, Jamal Hadi Salim,
	Pawel Staszewski

Le lundi 20 décembre 2010 à 22:35 +0800, Changli Gao a écrit :
> Pawel reported a panic related to handling shared skbs in ixgbe
> incorrectly. So we need to revert my previous patch to work around
> this bug. Instead of reverting the patch completely, I just revert
> the essential lines, so we can add the previous optimization
> back more easily in future.
> 
>     commit 3511c9132f8b1e1b5634e41a3331c44b0c13be70
>     Author: Changli Gao <xiaosuo@gmail.com>
>     Date:   Sat Oct 16 13:04:08 2010 +0000
>     
>         net_sched: remove the unused parameter of qdisc_create_dflt()
> 
> Reported-by: Pawel Staszewski <pstaszewski@itcare.pl>
> Signed-off-by: Changli Gao <xiaosuo@gmail.com>

Acked-by: Eric Dumazet <eric.dumazet@gmail.com>




^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-2.6] net_sched: always clone skbs
  2010-12-20 14:37 ` Eric Dumazet
@ 2010-12-20 18:27   ` David Miller
  0 siblings, 0 replies; 35+ messages in thread
From: David Miller @ 2010-12-20 18:27 UTC (permalink / raw)
  To: eric.dumazet; +Cc: xiaosuo, jarkao2, netdev, hadi, pstaszewski

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 20 Dec 2010 15:37:56 +0100

> Le lundi 20 décembre 2010 à 22:35 +0800, Changli Gao a écrit :
>> Pawel reported a panic related to handling shared skbs in ixgbe
>> incorrectly. So we need to revert my previous patch to work around
>> this bug. Instead of reverting the patch completely, I just revert
>> the essential lines, so we can add the previous optimization
>> back more easily in future.
>> 
>>     commit 3511c9132f8b1e1b5634e41a3331c44b0c13be70
>>     Author: Changli Gao <xiaosuo@gmail.com>
>>     Date:   Sat Oct 16 13:04:08 2010 +0000
>>     
>>         net_sched: remove the unused parameter of qdisc_create_dflt()
>> 
>> Reported-by: Pawel Staszewski <pstaszewski@itcare.pl>
>> Signed-off-by: Changli Gao <xiaosuo@gmail.com>
> 
> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied, thanks everyone.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-2.6] net_sched: always clone skbs
  2010-12-20 14:35 [PATCH net-2.6] net_sched: always clone skbs Changli Gao
  2010-12-20 14:37 ` Eric Dumazet
@ 2010-12-20 23:20 ` Jarek Poplawski
  2010-12-20 23:28   ` Eric Dumazet
  1 sibling, 1 reply; 35+ messages in thread
From: Jarek Poplawski @ 2010-12-20 23:20 UTC (permalink / raw)
  To: Changli Gao
  Cc: David S. Miller, Eric Dumazet, netdev, Jamal Hadi Salim,
	Pawel Staszewski

On Mon, Dec 20, 2010 at 10:35:30PM +0800, Changli Gao wrote:
> Pawel reported a panic related to handling shared skbs in ixgbe
> incorrectly. So we need to revert my previous patch to work around
> this bug. Instead of reverting the patch completely, I just revert
> the essential lines, so we can add the previous optimization
> back more easily in future.
> 
>     commit 3511c9132f8b1e1b5634e41a3331c44b0c13be70
>     Author: Changli Gao <xiaosuo@gmail.com>
>     Date:   Sat Oct 16 13:04:08 2010 +0000
>     
>         net_sched: remove the unused parameter of qdisc_create_dflt()
> 

Isn't there some mistake in the commit number? IMHO this changelog is
mostly wrong. The bug happens outside of ixgbe, probably in
dev_hard_start_xmit() ->  __skb_linearize(), and even if not, it can,
without any driver's fault. The only question is why it didn't triger
with 2.6.36.

Jarek P.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-2.6] net_sched: always clone skbs
  2010-12-20 23:20 ` Jarek Poplawski
@ 2010-12-20 23:28   ` Eric Dumazet
  2010-12-20 23:52     ` Jarek Poplawski
  2010-12-21  0:21     ` [PATCH net-2.6] net_sched: always clone skbs Changli Gao
  0 siblings, 2 replies; 35+ messages in thread
From: Eric Dumazet @ 2010-12-20 23:28 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: Changli Gao, David S. Miller, netdev, Jamal Hadi Salim, Pawel Staszewski

Le mardi 21 décembre 2010 à 00:20 +0100, Jarek Poplawski a écrit :
> On Mon, Dec 20, 2010 at 10:35:30PM +0800, Changli Gao wrote:
> > Pawel reported a panic related to handling shared skbs in ixgbe
> > incorrectly. So we need to revert my previous patch to work around
> > this bug. Instead of reverting the patch completely, I just revert
> > the essential lines, so we can add the previous optimization
> > back more easily in future.
> > 
> >     commit 3511c9132f8b1e1b5634e41a3331c44b0c13be70
> >     Author: Changli Gao <xiaosuo@gmail.com>
> >     Date:   Sat Oct 16 13:04:08 2010 +0000
> >     
> >         net_sched: remove the unused parameter of qdisc_create_dflt()
> > 
> 
> Isn't there some mistake in the commit number? IMHO this changelog is
> mostly wrong. The bug happens outside of ixgbe, probably in
> dev_hard_start_xmit() ->  __skb_linearize(), and even if not, it can,
> without any driver's fault. The only question is why it didn't triger
> with 2.6.36.
> 

Indeed, commit number is wrong. It was :

commit 210d6de78c5d7c785fc532556cea340e517955e1
Author: Changli Gao <xiaosuo@gmail.com>
Date:   Thu Jun 24 16:25:12 2010 +0000

    act_mirred: don't clone skb when skb isn't shared

It triggers now because GRO might be default to on now.

commit 6a08d194ee40806e0ccd5f36ed768e64cbfc979f
e1000: use GRO for receive 

(or other various GRO patches)




^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-2.6] net_sched: always clone skbs
  2010-12-20 23:28   ` Eric Dumazet
@ 2010-12-20 23:52     ` Jarek Poplawski
  2010-12-21 13:52       ` jamal
  2010-12-21  0:21     ` [PATCH net-2.6] net_sched: always clone skbs Changli Gao
  1 sibling, 1 reply; 35+ messages in thread
From: Jarek Poplawski @ 2010-12-20 23:52 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Changli Gao, David S. Miller, netdev, Jamal Hadi Salim, Pawel Staszewski

On Tue, Dec 21, 2010 at 12:28:09AM +0100, Eric Dumazet wrote:
> Le mardi 21 décembre 2010 ?? 00:20 +0100, Jarek Poplawski a écrit :
> > On Mon, Dec 20, 2010 at 10:35:30PM +0800, Changli Gao wrote:
> > > Pawel reported a panic related to handling shared skbs in ixgbe
> > > incorrectly. So we need to revert my previous patch to work around
> > > this bug. Instead of reverting the patch completely, I just revert
> > > the essential lines, so we can add the previous optimization
> > > back more easily in future.
> > > 
> > >     commit 3511c9132f8b1e1b5634e41a3331c44b0c13be70
> > >     Author: Changli Gao <xiaosuo@gmail.com>
> > >     Date:   Sat Oct 16 13:04:08 2010 +0000
> > >     
> > >         net_sched: remove the unused parameter of qdisc_create_dflt()
> > > 
> > 
> > Isn't there some mistake in the commit number? IMHO this changelog is
> > mostly wrong. The bug happens outside of ixgbe, probably in
> > dev_hard_start_xmit() ->  __skb_linearize(), and even if not, it can,
> > without any driver's fault. The only question is why it didn't triger
> > with 2.6.36.
> > 
> 
> Indeed, commit number is wrong. It was :
> 
> commit 210d6de78c5d7c785fc532556cea340e517955e1
> Author: Changli Gao <xiaosuo@gmail.com>
> Date:   Thu Jun 24 16:25:12 2010 +0000
> 
>     act_mirred: don't clone skb when skb isn't shared
> 
> It triggers now because GRO might be default to on now.
> 
> commit 6a08d194ee40806e0ccd5f36ed768e64cbfc979f
> e1000: use GRO for receive 
> 
> (or other various GRO patches)

Well, still some doubt after re-reading Pawel's 1st. message:

host1 (kernel 2.6.36.2)
netperf client -> eth3 (82598EB 10-Gigabit AT CX4) - directly connected to eth2 of host2
ethtool -k eth3
Offload parameters for eth3:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: on 
...

Of course, I know this was another box.

Jarek P.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-2.6] net_sched: always clone skbs
  2010-12-20 23:28   ` Eric Dumazet
  2010-12-20 23:52     ` Jarek Poplawski
@ 2010-12-21  0:21     ` Changli Gao
  2010-12-21  3:55       ` David Miller
  1 sibling, 1 reply; 35+ messages in thread
From: Changli Gao @ 2010-12-21  0:21 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Jarek Poplawski, David S. Miller, netdev, Jamal Hadi Salim,
	Pawel Staszewski

On Tue, Dec 21, 2010 at 7:28 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> Indeed, commit number is wrong. It was :
>
> commit 210d6de78c5d7c785fc532556cea340e517955e1
> Author: Changli Gao <xiaosuo@gmail.com>
> Date:   Thu Jun 24 16:25:12 2010 +0000
>
>    act_mirred: don't clone skb when skb isn't shared
>
> It triggers now because GRO might be default to on now.
>
> commit 6a08d194ee40806e0ccd5f36ed768e64cbfc979f
> e1000: use GRO for receive
>
> (or other various GRO patches)
>
>

Oops. It is all my fault. Sorry. What can I do to fix this after David
has applied it and pushed it out?


-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-2.6] net_sched: always clone skbs
  2010-12-21  0:21     ` [PATCH net-2.6] net_sched: always clone skbs Changli Gao
@ 2010-12-21  3:55       ` David Miller
  0 siblings, 0 replies; 35+ messages in thread
From: David Miller @ 2010-12-21  3:55 UTC (permalink / raw)
  To: xiaosuo; +Cc: eric.dumazet, jarkao2, netdev, hadi, pstaszewski

From: Changli Gao <xiaosuo@gmail.com>
Date: Tue, 21 Dec 2010 08:21:41 +0800

> Oops. It is all my fault. Sorry. What can I do to fix this after David
> has applied it and pushed it out?

Don't worry about it, mistakes happen.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-2.6] net_sched: always clone skbs
  2010-12-20 23:52     ` Jarek Poplawski
@ 2010-12-21 13:52       ` jamal
  2010-12-21 14:17         ` Changli Gao
  0 siblings, 1 reply; 35+ messages in thread
From: jamal @ 2010-12-21 13:52 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: Eric Dumazet, Changli Gao, David S. Miller, netdev, Pawel Staszewski

On Tue, 2010-12-21 at 00:52 +0100, Jarek Poplawski wrote:


> 
> host1 (kernel 2.6.36.2)
> netperf client -> eth3 (82598EB 10-Gigabit AT CX4) - directly connected to eth2 of host2
> ethtool -k eth3
> Offload parameters for eth3:
> rx-checksumming: on
> tx-checksumming: on
> scatter-gather: on
> tcp-segmentation-offload: on
> udp-fragmentation-offload: off
> generic-segmentation-offload: on
> generic-receive-offload: on 

Use to be we couldnt get the ifb+mirred combo to work
with TSO even without this change - i will have to dig old emails
to remember details. So we are making progress;-> I did write a set
of rules in: Documentation/networking/tc-actions-env-rules.txt
Changli optimized rule #1. When i looked at his patch it seems
to not harm that case. Sometimes dumb is a good principle ;->

cheers,
jamal


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-2.6] net_sched: always clone skbs
  2010-12-21 13:52       ` jamal
@ 2010-12-21 14:17         ` Changli Gao
  2010-12-21 22:37           ` Jarek Poplawski
  0 siblings, 1 reply; 35+ messages in thread
From: Changli Gao @ 2010-12-21 14:17 UTC (permalink / raw)
  To: hadi
  Cc: Jarek Poplawski, Eric Dumazet, David S. Miller, netdev, Pawel Staszewski

On Tue, Dec 21, 2010 at 9:52 PM, jamal <hadi@cyberus.ca> wrote:
>
> Use to be we couldnt get the ifb+mirred combo to work
> with TSO even without this change - i will have to dig old emails
> to remember details. So we are making progress;-> I did write a set
> of rules in: Documentation/networking/tc-actions-env-rules.txt
> Changli optimized rule #1. When i looked at his patch it seems
> to not harm that case. Sometimes dumb is a good principle ;->
>

In order to make my trick work. We need to assure dev_queue_xmit() and
dev_hard_start_xmit() accept shared skbs. As Eric pointed, pktgen also
need dev->netdev_ops->ndo_start_xmit() accept shared skbs. We need to
fix every ndo_start_xmit() one by one, then dev_hard_start_xmit(), and
when dev_queue_xmit() is also fixed, my trick can be added back.
:)

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-2.6] net_sched: always clone skbs
  2010-12-21 14:17         ` Changli Gao
@ 2010-12-21 22:37           ` Jarek Poplawski
  2010-12-23 13:35             ` jamal
  0 siblings, 1 reply; 35+ messages in thread
From: Jarek Poplawski @ 2010-12-21 22:37 UTC (permalink / raw)
  To: Changli Gao; +Cc: hadi, Eric Dumazet, David S. Miller, netdev, Pawel Staszewski

On Tue, Dec 21, 2010 at 10:17:35PM +0800, Changli Gao wrote:
> On Tue, Dec 21, 2010 at 9:52 PM, jamal <hadi@cyberus.ca> wrote:
> >
> > Use to be we couldnt get the ifb+mirred combo to work
> > with TSO even without this change - i will have to dig old emails
> > to remember details. So we are making progress;-> I did write a set
> > of rules in: Documentation/networking/tc-actions-env-rules.txt
> > Changli optimized rule #1. When i looked at his patch it seems
> > to not harm that case. Sometimes dumb is a good principle ;->

Actually, when dumb isn't a good principle? ;->

Speaking about wrong things like optimizing this 1st commandment: it
seems, if we're stealing not shared skb it's not unreasonable to tell
others not to touch it anymore, and skip kfree_skb() in handle_ing(),
unless I miss something?


> In order to make my trick work. We need to assure dev_queue_xmit() and
> dev_hard_start_xmit() accept shared skbs. As Eric pointed, pktgen also
> need dev->netdev_ops->ndo_start_xmit() accept shared skbs. We need to
> fix every ndo_start_xmit() one by one, then dev_hard_start_xmit(), and
> when dev_queue_xmit() is also fixed, my trick can be added back.
> :)

I'm not sure pktgen is right - even if it's safe in its case, it seems
to break some older rules, and drivers should really own the things.
So it needs reviewing first.

Cheers,
Jarek P.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-2.6] net_sched: always clone skbs
  2010-12-21 22:37           ` Jarek Poplawski
@ 2010-12-23 13:35             ` jamal
  2010-12-23 23:04               ` Jarek Poplawski
  2011-01-02 21:27               ` [RFC] net_sched: mark packet staying on queue too long Eric Dumazet
  0 siblings, 2 replies; 35+ messages in thread
From: jamal @ 2010-12-23 13:35 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: Changli Gao, Eric Dumazet, David S. Miller, netdev, Pawel Staszewski

On Tue, 2010-12-21 at 23:37 +0100, Jarek Poplawski wrote:

> Actually, when dumb isn't a good principle? ;->
> 
> Speaking about wrong things like optimizing this 1st commandment: it
> seems, if we're stealing not shared skb it's not unreasonable to tell
> others not to touch it anymore, and skip kfree_skb() in handle_ing(),
> unless I miss something?

I dont know how much cycles it will save you vs the code you add
but it is not unreasonable. Of course it would mean we are moving
beyond KISSes[1] now Jarek;->

> > In order to make my trick work. We need to assure dev_queue_xmit() and
> > dev_hard_start_xmit() accept shared skbs. As Eric pointed, pktgen also
> > need dev->netdev_ops->ndo_start_xmit() accept shared skbs. We need to
> > fix every ndo_start_xmit() one by one, then dev_hard_start_xmit(), and
> > when dev_queue_xmit() is also fixed, my trick can be added back.
> > :)
> 
> I'm not sure pktgen is right - even if it's safe in its case, it seems
> to break some older rules, and drivers should really own the things.
> So it needs reviewing first.

The idea we always try to keep is that the caller is responsible for the
fate of the skb it sent. The callee will do things like make a copy or
clone if it needed to make changes or keep the skb. Same should be
expected of a driver.


cheers,
jamal

[1] Jarek - so you dont misunderstand this as an overture - it means
Keep It Simple Silly (or the principle of dumb we talked about) ;->


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-2.6] net_sched: always clone skbs
  2010-12-23 13:35             ` jamal
@ 2010-12-23 23:04               ` Jarek Poplawski
  2011-01-02 21:27               ` [RFC] net_sched: mark packet staying on queue too long Eric Dumazet
  1 sibling, 0 replies; 35+ messages in thread
From: Jarek Poplawski @ 2010-12-23 23:04 UTC (permalink / raw)
  To: jamal
  Cc: Changli Gao, Eric Dumazet, David S. Miller, netdev, Pawel Staszewski

On Thu, Dec 23, 2010 at 08:35:33AM -0500, jamal wrote:
> On Tue, 2010-12-21 at 23:37 +0100, Jarek Poplawski wrote:
> 
> > Actually, when dumb isn't a good principle? ;->
> > 
> > Speaking about wrong things like optimizing this 1st commandment: it
> > seems, if we're stealing not shared skb it's not unreasonable to tell
> > others not to touch it anymore, and skip kfree_skb() in handle_ing(),
> > unless I miss something?
> 
> I dont know how much cycles it will save you vs the code you add
> but it is not unreasonable. Of course it would mean we are moving
> beyond KISSes[1] now Jarek;->

Hmm... seeing all those multiqued multiques I thought everybody was
beyond, but no problem, let's go back to KISSes[2] ;-)

> > > In order to make my trick work. We need to assure dev_queue_xmit() and
> > > dev_hard_start_xmit() accept shared skbs. As Eric pointed, pktgen also
> > > need dev->netdev_ops->ndo_start_xmit() accept shared skbs. We need to
> > > fix every ndo_start_xmit() one by one, then dev_hard_start_xmit(), and
> > > when dev_queue_xmit() is also fixed, my trick can be added back.
> > > :)
> > 
> > I'm not sure pktgen is right - even if it's safe in its case, it seems
> > to break some older rules, and drivers should really own the things.
> > So it needs reviewing first.
> 
> The idea we always try to keep is that the caller is responsible for the
> fate of the skb it sent. The callee will do things like make a copy or
> clone if it needed to make changes or keep the skb. Same should be
> expected of a driver.

Well, I'm not sure we talk about the same thing. I mean e.g. the
comment above dev_queue_xmit(); the caller passes all control to the
callee here.

Cheers,
Jarek P.

> [1] Jarek - so you dont misunderstand this as an overture - it means
> Keep It Simple Silly (or the principle of dumb we talked about) ;->

OK, I can remember the seventies too ;-)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RFC] net_sched: mark packet staying on queue too long
  2010-12-23 13:35             ` jamal
  2010-12-23 23:04               ` Jarek Poplawski
@ 2011-01-02 21:27               ` Eric Dumazet
  2011-01-03 13:52                 ` jamal
  2011-01-03 17:58                 ` [RFC] net_sched: mark packet staying on queue too long Stephen Hemminger
  1 sibling, 2 replies; 35+ messages in thread
From: Eric Dumazet @ 2011-01-02 21:27 UTC (permalink / raw)
  To: hadi, Jarek Poplawski, David Miller, Jesper Dangaard Brouer,
	Patrick McHardy
  Cc: netdev

While playing with SFQ and other AQM, I was bothered to see how easy it
was for a single tcp flow to 'fill the pipe' and consume lot of memory
buffers in queues. I know Jesper use more than 50.000 SFQ on his
routers, and with GRO packets this can consume a lot of memory.

I played a bit adding ECN in SFQ, first by marking packets for a
particular flow if this flow qlen was above a given threshold, and later
using another trick : ECN mark packet if it stayed longer than a given
delay in the queue. This of course could be done on other modules, what
do you think ?

The idea is to take into account the time packet stayed in the queue,
regardless of other class parameters.

Following quick and dirty patch to show the idea. Of course, the delay
should be configured on each SFQ/RED/XXXX class, so it would need an
iproute2 patch, and the delay unit should be "ms" (or even "us"), not
ticks, but as I said this is a quick and dirty patch (net-next-2.6
based)

Using jiffies allows only delays above 3 or 4 ticks...

Or maybe ECN is just a dream :(

 include/net/sch_generic.h |    1 +
 net/sched/sch_sfq.c       |   11 +++++++++++
 2 files changed, 12 insertions(+)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 0af57eb..1a7ac68 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -198,6 +198,7 @@ struct tcf_proto {
 };
 
 struct qdisc_skb_cb {
+	unsigned long		timestamp;
 	unsigned int		pkt_len;
 	char			data[];
 };
diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index d54ac94..baf9465 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -24,6 +24,8 @@
 #include <net/ip.h>
 #include <net/netlink.h>
 #include <net/pkt_sched.h>
+#include <net/inet_ecn.h>
+#include <linux/moduleparam.h>
 
 
 /*	Stochastic Fairness Queuing algorithm.
@@ -86,6 +88,10 @@
 /* This type should contain at least SFQ_DEPTH + SFQ_SLOTS values */
 typedef unsigned char sfq_index;
 
+static int ecn_delay; /* default : no ECN handling */
+module_param(ecn_delay, int, 0);
+MODULE_PARM_DESC(ecn_delay, "mark packets if they stay in queue longer than ecn_delay ticks");
+
 /*
  * We dont use pointers to save space.
  * Small indexes [0 ... SFQ_SLOTS - 1] are 'pointers' to slots[] array
@@ -391,6 +397,7 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 
 	sch->qstats.backlog += qdisc_pkt_len(skb);
 	slot_queue_add(slot, skb);
+	qdisc_skb_cb(skb)->timestamp = jiffies;
 	sfq_inc(q, x);
 	if (slot->qlen == 1) {		/* The flow is new */
 		if (q->tail == NULL) {	/* It is the first flow */
@@ -460,6 +467,10 @@ next_slot:
 		q->tail->next = next_a;
 	} else {
 		slot->allot -= SFQ_ALLOT_SIZE(qdisc_pkt_len(skb));
+		if (ecn_delay &&
+		    time_after(jiffies, qdisc_skb_cb(skb)->timestamp + ecn_delay) &&
+		    INET_ECN_set_ce(skb))
+			sch->qstats.overlimits++;
 	}
 	return skb;
 }



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [RFC] net_sched: mark packet staying on queue too long
  2011-01-02 21:27               ` [RFC] net_sched: mark packet staying on queue too long Eric Dumazet
@ 2011-01-03 13:52                 ` jamal
  2011-01-03 14:02                   ` Eric Dumazet
  2011-01-03 17:58                 ` [RFC] net_sched: mark packet staying on queue too long Stephen Hemminger
  1 sibling, 1 reply; 35+ messages in thread
From: jamal @ 2011-01-03 13:52 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Jarek Poplawski, David Miller, Jesper Dangaard Brouer,
	Patrick McHardy, netdev

On Sun, 2011-01-02 at 22:27 +0100, Eric Dumazet wrote:
> While playing with SFQ and other AQM, I was bothered to see how easy it
> was for a single tcp flow to 'fill the pipe' and consume lot of memory
> buffers in queues. I know Jesper use more than 50.000 SFQ on his
> routers, and with GRO packets this can consume a lot of memory.
> 
> I played a bit adding ECN in SFQ, first by marking packets for a
> particular flow if this flow qlen was above a given threshold, and later
> using another trick : ECN mark packet if it stayed longer than a given
> delay in the queue. This of course could be done on other modules, what
> do you think ?
> 

I think for this to be effective, it would require maintaining some
history of the effect (some form of moving window average)
and probably a randomness in marking instead of a deterministic one.
Something like what Stochastic Fair RED/BLUE Queueing does.
Otherwise you get a burst of marked packets then silence then a burst
etc (i.e the classical synchronization effect).

It would probably be more effective to provide feedback to the local tcp
since we can detect this locally instead of waiting to some round trip
(or half roundtrip) effect at the receiver with ECN i.e in the same
spirit as NET_XMIT_CN but for which local TCP does something useful with
that info (instead of "retransmit shortly"). But even that would require
maintaining some state on the scheduler per hash in this case....

cheers,
jamal



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC] net_sched: mark packet staying on queue too long
  2011-01-03 13:52                 ` jamal
@ 2011-01-03 14:02                   ` Eric Dumazet
  2011-01-03 15:02                     ` jamal
  2011-01-03 18:11                     ` [PATCH] sch_red: report backlog information Eric Dumazet
  0 siblings, 2 replies; 35+ messages in thread
From: Eric Dumazet @ 2011-01-03 14:02 UTC (permalink / raw)
  To: hadi
  Cc: Jarek Poplawski, David Miller, Jesper Dangaard Brouer,
	Patrick McHardy, netdev

Le lundi 03 janvier 2011 à 08:52 -0500, jamal a écrit :
> On Sun, 2011-01-02 at 22:27 +0100, Eric Dumazet wrote:
> > While playing with SFQ and other AQM, I was bothered to see how easy it
> > was for a single tcp flow to 'fill the pipe' and consume lot of memory
> > buffers in queues. I know Jesper use more than 50.000 SFQ on his
> > routers, and with GRO packets this can consume a lot of memory.
> > 
> > I played a bit adding ECN in SFQ, first by marking packets for a
> > particular flow if this flow qlen was above a given threshold, and later
> > using another trick : ECN mark packet if it stayed longer than a given
> > delay in the queue. This of course could be done on other modules, what
> > do you think ?
> > 
> 
> I think for this to be effective, it would require maintaining some
> history of the effect (some form of moving window average)
> and probably a randomness in marking instead of a deterministic one.
> Something like what Stochastic Fair RED/BLUE Queueing does.
> Otherwise you get a burst of marked packets then silence then a burst
> etc (i.e the classical synchronization effect).
> 

I got fairly good results here, but admit-idly on a LAN.

Yep, maybe adding RED on each SFQ slot ;) Should be fairly cheap, and
actually needed in case ECN is not possible and we must earlly drop
instead.

I found BLUE very expensive in term of cache line accesses. Especially
with double hashing.

> It would probably be more effective to provide feedback to the local tcp
> since we can detect this locally instead of waiting to some round trip
> (or half roundtrip) effect at the receiver with ECN i.e in the same
> spirit as NET_XMIT_CN but for which local TCP does something useful with
> that info (instead of "retransmit shortly"). But even that would require
> maintaining some state on the scheduler per hash in this case....
> 

local tcp, for a router ? Hmm... But yes I see your point.

Speaking of ECN marking, it seems we (in RED/GRED or tunnels) change skb
data even if it is shared (can happen on ingress path)

Probably harmless, but tcpdump can show ECN bit being marked even on skb
snapshot before ingress (and later, ECN marked) or tunnels, while it
came unset from the wire.

Is it worth fixing this ? maybe using skb_make_writable() [once moved to
core network from netfilter]




^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC] net_sched: mark packet staying on queue too long
  2011-01-03 14:02                   ` Eric Dumazet
@ 2011-01-03 15:02                     ` jamal
  2011-01-03 18:11                     ` [PATCH] sch_red: report backlog information Eric Dumazet
  1 sibling, 0 replies; 35+ messages in thread
From: jamal @ 2011-01-03 15:02 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Jarek Poplawski, David Miller, Jesper Dangaard Brouer,
	Patrick McHardy, netdev

On Mon, 2011-01-03 at 15:02 +0100, Eric Dumazet wrote:
> Le lundi 03 janvier 2011 à 08:52 -0500, jamal a écrit :

> I got fairly good results here, but admit-idly on a LAN.

Maybe just adding the randomness marking factor alone may help.
It all depends on RTT.

> Yep, maybe adding RED on each SFQ slot ;) Should be fairly cheap, and
> actually needed in case ECN is not possible and we must earlly drop
> instead.
> 

That would essentially be achieving the goal of SF Blue.

> I found BLUE very expensive in term of cache line accesses. Especially
> with double hashing.

If you can do it cheaply as you describe above, maybe should be
sufficient.

> local tcp, for a router ? Hmm... But yes I see your point.

Oh;-> thought you were talking host where my mumbling would make
more sense.

> Speaking of ECN marking, it seems we (in RED/GRED or tunnels) change skb
> data even if it is shared (can happen on ingress path)
> 
> Probably harmless, but tcpdump can show ECN bit being marked even on skb
> snapshot before ingress (and later, ECN marked) or tunnels, while it
> came unset from the wire.
> 
> Is it worth fixing this ? maybe using skb_make_writable() [once moved to
> core network from netfilter]

Typically the netdev owns the packet once it gets to that level and
it can do whatever it wants with it. But if you are seeing it on
ingress (probably using ifb?), then it makes sense to fix it.

cheers,
jamal



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC] net_sched: mark packet staying on queue too long
  2011-01-02 21:27               ` [RFC] net_sched: mark packet staying on queue too long Eric Dumazet
  2011-01-03 13:52                 ` jamal
@ 2011-01-03 17:58                 ` Stephen Hemminger
  2011-01-04 14:19                   ` Jesper Dangaard Brouer
  1 sibling, 1 reply; 35+ messages in thread
From: Stephen Hemminger @ 2011-01-03 17:58 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: hadi, Jarek Poplawski, David Miller, Jesper Dangaard Brouer,
	Patrick McHardy, netdev

On Sun, 02 Jan 2011 22:27:11 +0100
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> While playing with SFQ and other AQM, I was bothered to see how easy it
> was for a single tcp flow to 'fill the pipe' and consume lot of memory
> buffers in queues. I know Jesper use more than 50.000 SFQ on his
> routers, and with GRO packets this can consume a lot of memory.
> 
> I played a bit adding ECN in SFQ, first by marking packets for a
> particular flow if this flow qlen was above a given threshold, and later
> using another trick : ECN mark packet if it stayed longer than a given
> delay in the queue. This of course could be done on other modules, what
> do you think ?
> 
> The idea is to take into account the time packet stayed in the queue,
> regardless of other class parameters.
> 
> Following quick and dirty patch to show the idea. Of course, the delay
> should be configured on each SFQ/RED/XXXX class, so it would need an
> iproute2 patch, and the delay unit should be "ms" (or even "us"), not
> ticks, but as I said this is a quick and dirty patch (net-next-2.6
> based)
> 
> Using jiffies allows only delays above 3 or 4 ticks...
> 
> Or maybe ECN is just a dream :(

You might want to look into CHOKe and ECSFQ which are other AQM models
that have shown up in research.


-- 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH] sch_red: report backlog information
  2011-01-03 14:02                   ` Eric Dumazet
  2011-01-03 15:02                     ` jamal
@ 2011-01-03 18:11                     ` Eric Dumazet
  2011-01-03 20:13                       ` David Miller
  1 sibling, 1 reply; 35+ messages in thread
From: Eric Dumazet @ 2011-01-03 18:11 UTC (permalink / raw)
  To: hadi
  Cc: Jarek Poplawski, David Miller, Jesper Dangaard Brouer,
	Patrick McHardy, netdev

Provide child qdisc backlog (byte count) information so that "tc -s
qdisc" can report it to user.

packet count is already correctly provided.

qdisc red 11: parent 1:11 limit 60Kb min 15Kb max 45Kb ecn 
 Sent 3116427684 bytes 1415782 pkt (dropped 8, overlimits 7866 requeues 0) 
 rate 242385Kbit 13630pps backlog 13560b 8p requeues 0 
  marked 7865 early 1 pdrop 7 other 0

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/sched/sch_red.c |    1 +
 1 files changed, 1 insertion(+)

diff --git a/net/sched/sch_red.c b/net/sched/sch_red.c
index 8d42bb3..a67ba3c 100644
--- a/net/sched/sch_red.c
+++ b/net/sched/sch_red.c
@@ -239,6 +239,7 @@ static int red_dump(struct Qdisc *sch, struct sk_buff *skb)
 		.Scell_log	= q->parms.Scell_log,
 	};
 
+	sch->qstats.backlog = q->qdisc->qstats.backlog;
 	opts = nla_nest_start(skb, TCA_OPTIONS);
 	if (opts == NULL)
 		goto nla_put_failure;



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH] sch_red: report backlog information
  2011-01-03 18:11                     ` [PATCH] sch_red: report backlog information Eric Dumazet
@ 2011-01-03 20:13                       ` David Miller
  0 siblings, 0 replies; 35+ messages in thread
From: David Miller @ 2011-01-03 20:13 UTC (permalink / raw)
  To: eric.dumazet; +Cc: hadi, jarkao2, hawk, kaber, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 03 Jan 2011 19:11:38 +0100

> Provide child qdisc backlog (byte count) information so that "tc -s
> qdisc" can report it to user.
> 
> packet count is already correctly provided.
> 
> qdisc red 11: parent 1:11 limit 60Kb min 15Kb max 45Kb ecn 
>  Sent 3116427684 bytes 1415782 pkt (dropped 8, overlimits 7866 requeues 0) 
>  rate 242385Kbit 13630pps backlog 13560b 8p requeues 0 
>   marked 7865 early 1 pdrop 7 other 0
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied, thanks Eric.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC] net_sched: mark packet staying on queue too long
  2011-01-03 17:58                 ` [RFC] net_sched: mark packet staying on queue too long Stephen Hemminger
@ 2011-01-04 14:19                   ` Jesper Dangaard Brouer
  2011-01-04 15:02                     ` Eric Dumazet
  2011-01-13 11:50                     ` Patrick McHardy
  0 siblings, 2 replies; 35+ messages in thread
From: Jesper Dangaard Brouer @ 2011-01-04 14:19 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Eric Dumazet, hadi, Jarek Poplawski, David Miller,
	Patrick McHardy, netdev

On Mon, 3 Jan 2011, Stephen Hemminger wrote:

> On Sun, 02 Jan 2011 22:27:11 +0100
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>> While playing with SFQ and other AQM, I was bothered to see how easy it
>> was for a single tcp flow to 'fill the pipe' and consume lot of memory
>> buffers in queues. I know Jesper use more than 50.000 SFQ on his
>> routers, and with GRO packets this can consume a lot of memory.

That is true, operations department went kind of crazy when they started 
to add/move customers to our new Nehalem Xeon 5550 systems.  I have 
stopped them now ;-)

The use of an SFQ per customer, actually also solves the buffer bloat 
issue for our customers...


>> I played a bit adding ECN in SFQ, first by marking packets for a
>> particular flow if this flow qlen was above a given threshold, and later
>> using another trick : ECN mark packet if it stayed longer than a given
>> delay in the queue. This of course could be done on other modules, what
>> do you think ?

This is very interesting stuff! :-)

Are you inspired by Jim Gettys buffer bloat discussions?

> ...
> You might want to look into CHOKe and ECSFQ which are other AQM models
> that have shown up in research.

Have you looked at the SFB (Stochastic Fair Blue) implementation by 
Juliusz Chroboczek?

http://www.pps.jussieu.fr/~jch/software/sfb/

Cheers,
   Jesper Brouer

--
-------------------------------------------------------------------
MSc. Master of Computer Science
Dept. of Computer Science, University of Copenhagen
Author of http://www.adsl-optimizer.dk
-------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC] net_sched: mark packet staying on queue too long
  2011-01-04 14:19                   ` Jesper Dangaard Brouer
@ 2011-01-04 15:02                     ` Eric Dumazet
  2011-01-04 16:05                       ` Stephen Hemminger
  2011-01-04 18:20                       ` Eric Dumazet
  2011-01-13 11:50                     ` Patrick McHardy
  1 sibling, 2 replies; 35+ messages in thread
From: Eric Dumazet @ 2011-01-04 15:02 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Stephen Hemminger, hadi, Jarek Poplawski, David Miller,
	Patrick McHardy, netdev

Le mardi 04 janvier 2011 à 15:19 +0100, Jesper Dangaard Brouer a écrit :
> On Mon, 3 Jan 2011, Stephen Hemminger wrote:
> 
> > On Sun, 02 Jan 2011 22:27:11 +0100
> > Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >
> >> While playing with SFQ and other AQM, I was bothered to see how easy it
> >> was for a single tcp flow to 'fill the pipe' and consume lot of memory
> >> buffers in queues. I know Jesper use more than 50.000 SFQ on his
> >> routers, and with GRO packets this can consume a lot of memory.
> 
> That is true, operations department went kind of crazy when they started 
> to add/move customers to our new Nehalem Xeon 5550 systems.  I have 
> stopped them now ;-)
> 
> The use of an SFQ per customer, actually also solves the buffer bloat 
> issue for our customers...
> 
> 
> >> I played a bit adding ECN in SFQ, first by marking packets for a
> >> particular flow if this flow qlen was above a given threshold, and later
> >> using another trick : ECN mark packet if it stayed longer than a given
> >> delay in the queue. This of course could be done on other modules, what
> >> do you think ?
> 
> This is very interesting stuff! :-)
> 
> Are you inspired by Jim Gettys buffer bloat discussions?
> 

Not at all, I had to install an AQM here at work, I chose SFQ because
the machines only handle tcp flows (and limited number of flows)

> > ...
> > You might want to look into CHOKe and ECSFQ which are other AQM models
> > that have shown up in research.
> 
> Have you looked at the SFB (Stochastic Fair Blue) implementation by 
> Juliusz Chroboczek?
> 
> http://www.pps.jussieu.fr/~jch/software/sfb/

Yes I did, but after some reading, I think there is an issue with BLUE,
regarding number of cache misses and complexity because of Bloom filter
(and double hashing)
For workloads with many flows, all bits are marked very fast.

I'd like to try kind of a SFQRED implementation, ie :

classify flows, then instead of using plain pfifo queues (currently done
in SFQ), use N pseudo RED queues.

RED is a bit complex because it tries to make the probability estimation
given queue backlog average. It has to use expensive time services (on
some machines at least, if TSC not available)

My idea was to take into account the delay packets stay in its queue, so
that no extra state is needed : Only take a timestamp when packet is
enqueued, compute delta when dequeued, get 

Px = delta * Prob_per_time_unit;
and drop/mark packet with Px probability.

Ram usage of SFQRED would be the same than SFQ, and cost roughly the
same (because we could use jiffies based time sampling, (and HZ=1000 for
a ms unit)).




^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC] net_sched: mark packet staying on queue too long
  2011-01-04 15:02                     ` Eric Dumazet
@ 2011-01-04 16:05                       ` Stephen Hemminger
  2011-01-04 18:20                       ` Eric Dumazet
  1 sibling, 0 replies; 35+ messages in thread
From: Stephen Hemminger @ 2011-01-04 16:05 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Jesper Dangaard Brouer, hadi, Jarek Poplawski, David Miller,
	Patrick McHardy, netdev

I am playing with Quick Fair Queue (QFQ) from FreeBSD.
The authors started with Linux first and sent me their version,
it needs some work first.

http://info.iet.unipi.it/~luigi/qfq/

http://www.google.com/search?client=ubuntu&channel=fs&q=QFQ+rizzo&ie=utf-8&oe=utf-8

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC] net_sched: mark packet staying on queue too long
  2011-01-04 15:02                     ` Eric Dumazet
  2011-01-04 16:05                       ` Stephen Hemminger
@ 2011-01-04 18:20                       ` Eric Dumazet
  2011-01-04 18:29                         ` Eric Dumazet
  1 sibling, 1 reply; 35+ messages in thread
From: Eric Dumazet @ 2011-01-04 18:20 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Stephen Hemminger, hadi, Jarek Poplawski, David Miller,
	Patrick McHardy, netdev

Le mardi 04 janvier 2011 à 16:02 +0100, Eric Dumazet a écrit :

> I'd like to try kind of a SFQRED implementation, ie :
> 
> classify flows, then instead of using plain pfifo queues (currently done
> in SFQ), use N pseudo RED queues.
> 
> RED is a bit complex because it tries to make the probability estimation
> given queue backlog average. It has to use expensive time services (on
> some machines at least, if TSC not available)
> 
> My idea was to take into account the delay packets stay in its queue, so
> that no extra state is needed : Only take a timestamp when packet is
> enqueued, compute delta when dequeued, get 
> 
> Px = delta * Prob_per_time_unit;
> and drop/mark packet with Px probability.
> 
> Ram usage of SFQRED would be the same than SFQ, and cost roughly the
> same (because we could use jiffies based time sampling, (and HZ=1000 for
> a ms unit)).
> 
> 

Here is the POC patch I am currently testing, with a probability to
"early drop" a packet of one percent per ms (HZ=1000 here), only if
packet stayed at least 4 ms on queue.

Of course, this only apply where SFQ is used, with known SFQ limits :)

The term "early drop" is a lie. RED really early mark/drop a packet at
enqueue() time, while I do it at dequeue() time [since I need to compute
the delay]. But effect is the same on sent packets. This might use a bit
more memory, but no more than current SFQ [and only if flows dont react
to mark/drops]

insmod net/sched/sch_sfq.ko red_delay=4

By the way, I do think we should lower SFQ_DEPTH a bit and increase
SFQ_SLOTS by same amount. Allowing 127 packets per flow seems not
necessary in most situations SFQ might be used.

 net/sched/sch_sfq.c |   37 +++++++++++++++++++++++++++++++++----
 1 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index d54ac94..4f958e3 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -24,6 +24,8 @@
 #include <net/ip.h>
 #include <net/netlink.h>
 #include <net/pkt_sched.h>
+#include <net/inet_ecn.h>
+#include <linux/moduleparam.h>
 
 
 /*	Stochastic Fairness Queuing algorithm.
@@ -86,6 +88,10 @@
 /* This type should contain at least SFQ_DEPTH + SFQ_SLOTS values */
 typedef unsigned char sfq_index;
 
+static int red_delay; /* default : no RED handling */
+module_param(red_delay, int, 0);
+MODULE_PARM_DESC(red_delay, "mark/drop packets if they stay in queue longer than red_delay ticks");
+
 /*
  * We dont use pointers to save space.
  * Small indexes [0 ... SFQ_SLOTS - 1] are 'pointers' to slots[] array
@@ -391,6 +397,7 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 
 	sch->qstats.backlog += qdisc_pkt_len(skb);
 	slot_queue_add(slot, skb);
+	qdisc_skb_cb(skb)->timestamp = jiffies;
 	sfq_inc(q, x);
 	if (slot->qlen == 1) {		/* The flow is new */
 		if (q->tail == NULL) {	/* It is the first flow */
@@ -402,11 +409,8 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		q->tail = slot;
 		slot->allot = q->scaled_quantum;
 	}
-	if (++sch->q.qlen <= q->limit) {
-		sch->bstats.bytes += qdisc_pkt_len(skb);
-		sch->bstats.packets++;
+	if (++sch->q.qlen <= q->limit)
 		return NET_XMIT_SUCCESS;
-	}
 
 	sfq_drop(sch);
 	return NET_XMIT_CN;
@@ -432,6 +436,7 @@ sfq_dequeue(struct Qdisc *sch)
 	sfq_index a, next_a;
 	struct sfq_slot *slot;
 
+restart:
 	/* No active slots */
 	if (q->tail == NULL)
 		return NULL;
@@ -455,12 +460,36 @@ next_slot:
 		next_a = slot->next;
 		if (a == next_a) {
 			q->tail = NULL; /* no more active slots */
+			/* last packet queued, dont even try to apply RED */
 			return skb;
 		}
 		q->tail->next = next_a;
 	} else {
 		slot->allot -= SFQ_ALLOT_SIZE(qdisc_pkt_len(skb));
 	}
+	if (red_delay) {
+		long delay = jiffies - qdisc_skb_cb(skb)->timestamp;
+
+		if (delay >= red_delay) {
+			long Px = delay * (0xFFFFFF / 100); /* 1 percent per jiffy */
+			if ((net_random() & 0xFFFFFF) < Px) {
+				if (INET_ECN_set_ce(skb)) {
+					/* no ecnmark counter yet :) */
+					sch->qstats.overlimits++;
+				} else {
+					/* penalize this flow : we drop the 
+					 * packet while we changed slot->allot
+					 */
+					kfree_skb(skb);
+					/* no early_drop counter yet :) */
+					sch->qstats.drops++;
+					goto restart;
+				}
+			}
+		}
+	}
+	sch->bstats.bytes += qdisc_pkt_len(skb);
+	sch->bstats.packets++;
 	return skb;
 }
 



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [RFC] net_sched: mark packet staying on queue too long
  2011-01-04 18:20                       ` Eric Dumazet
@ 2011-01-04 18:29                         ` Eric Dumazet
  0 siblings, 0 replies; 35+ messages in thread
From: Eric Dumazet @ 2011-01-04 18:29 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Stephen Hemminger, hadi, Jarek Poplawski, David Miller,
	Patrick McHardy, netdev

Le mardi 04 janvier 2011 à 19:20 +0100, Eric Dumazet a écrit :

> diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
> index d54ac94..4f958e3 100644
> --- a/net/sched/sch_sfq.c
> +++ b/net/sched/sch_sfq.c
> @@ -24,6 +24,8 @@
>  #include <net/ip.h>
>  #include <net/netlink.h>
>  #include <net/pkt_sched.h>
> +#include <net/inet_ecn.h>
> +#include <linux/moduleparam.h>
>  
> 
>  /*	Stochastic Fairness Queuing algorithm.
> @@ -86,6 +88,10 @@
>  /* This type should contain at least SFQ_DEPTH + SFQ_SLOTS values */
>  typedef unsigned char sfq_index;
>  
> +static int red_delay; /* default : no RED handling */
> +module_param(red_delay, int, 0);
> +MODULE_PARM_DESC(red_delay, "mark/drop packets if they stay in queue longer than red_delay ticks");
> +
>  /*
>   * We dont use pointers to save space.
>   * Small indexes [0 ... SFQ_SLOTS - 1] are 'pointers' to slots[] array
> @@ -391,6 +397,7 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
>  
>  	sch->qstats.backlog += qdisc_pkt_len(skb);
>  	slot_queue_add(slot, skb);
> +	qdisc_skb_cb(skb)->timestamp = jiffies;
>  	sfq_inc(q, x);
>  	if (slot->qlen == 1) {		/* The flow is new */
>  		if (q->tail == NULL) {	/* It is the first flow */
> @@ -402,11 +409,8 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch)
>  		q->tail = slot;
>  		slot->allot = q->scaled_quantum;
>  	}
> -	if (++sch->q.qlen <= q->limit) {
> -		sch->bstats.bytes += qdisc_pkt_len(skb);
> -		sch->bstats.packets++;
> +	if (++sch->q.qlen <= q->limit)
>  		return NET_XMIT_SUCCESS;
> -	}
>  
>  	sfq_drop(sch);
>  	return NET_XMIT_CN;
> @@ -432,6 +436,7 @@ sfq_dequeue(struct Qdisc *sch)
>  	sfq_index a, next_a;
>  	struct sfq_slot *slot;
>  
> +restart:
>  	/* No active slots */
>  	if (q->tail == NULL)
>  		return NULL;
> @@ -455,12 +460,36 @@ next_slot:
>  		next_a = slot->next;
>  		if (a == next_a) {
>  			q->tail = NULL; /* no more active slots */
> +			/* last packet queued, dont even try to apply RED */

And of course I should do "goto end;" to properly do accounting,
or maybe do the RED thing after all ;)

>  			return skb;
>  		}
>  		q->tail->next = next_a;
>  	} else {
>  		slot->allot -= SFQ_ALLOT_SIZE(qdisc_pkt_len(skb));
>  	}
> +	if (red_delay) {
> +		long delay = jiffies - qdisc_skb_cb(skb)->timestamp;
> +
> +		if (delay >= red_delay) {
> +			long Px = delay * (0xFFFFFF / 100); /* 1 percent per jiffy */
> +			if ((net_random() & 0xFFFFFF) < Px) {
> +				if (INET_ECN_set_ce(skb)) {
> +					/* no ecnmark counter yet :) */
> +					sch->qstats.overlimits++;
> +				} else {
> +					/* penalize this flow : we drop the 
> +					 * packet while we changed slot->allot
> +					 */
> +					kfree_skb(skb);
> +					/* no early_drop counter yet :) */
> +					sch->qstats.drops++;
> +					goto restart;
> +				}
> +			}
> +		}
> +	}

end:

> +	sch->bstats.bytes += qdisc_pkt_len(skb);
> +	sch->bstats.packets++;
>  	return skb;
>  }
>  
> 



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC] net_sched: mark packet staying on queue too long
  2011-01-04 14:19                   ` Jesper Dangaard Brouer
  2011-01-04 15:02                     ` Eric Dumazet
@ 2011-01-13 11:50                     ` Patrick McHardy
  2011-01-13 15:54                       ` Eric Dumazet
  2011-01-13 16:04                       ` sch_sfb [was: net_sched: mark packet staying on queue too long] Juliusz Chroboczek
  1 sibling, 2 replies; 35+ messages in thread
From: Patrick McHardy @ 2011-01-13 11:50 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Stephen Hemminger, Eric Dumazet, hadi, Jarek Poplawski,
	David Miller, netdev

On 04.01.2011 15:19, Jesper Dangaard Brouer wrote:
>> ...
>> You might want to look into CHOKe and ECSFQ which are other AQM models
>> that have shown up in research.
> 
> Have you looked at the SFB (Stochastic Fair Blue) implementation by Juliusz Chroboczek?
> 
> http://www.pps.jussieu.fr/~jch/software/sfb/

I had a closer look at this some time ago and noticed a couple of bugs
(f.i. double buffering might be enabled or disabled or the buffers
switched while a packet is queued, so on dequeue the wrong buffer will
have its queue length decremented) and also found the hashing quite
inefficient, so I've implemented my own version. There's still a minor
bug somewhere, but if people are interested I could finish it some
time soon and post the patches.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC] net_sched: mark packet staying on queue too long
  2011-01-13 11:50                     ` Patrick McHardy
@ 2011-01-13 15:54                       ` Eric Dumazet
  2011-01-13 16:04                       ` sch_sfb [was: net_sched: mark packet staying on queue too long] Juliusz Chroboczek
  1 sibling, 0 replies; 35+ messages in thread
From: Eric Dumazet @ 2011-01-13 15:54 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: Jesper Dangaard Brouer, Stephen Hemminger, hadi, Jarek Poplawski,
	David Miller, netdev

Le jeudi 13 janvier 2011 à 12:50 +0100, Patrick McHardy a écrit :
> On 04.01.2011 15:19, Jesper Dangaard Brouer wrote:
> >> ...
> >> You might want to look into CHOKe and ECSFQ which are other AQM models
> >> that have shown up in research.
> > 
> > Have you looked at the SFB (Stochastic Fair Blue) implementation by Juliusz Chroboczek?
> > 
> > http://www.pps.jussieu.fr/~jch/software/sfb/
> 
> I had a closer look at this some time ago and noticed a couple of bugs
> (f.i. double buffering might be enabled or disabled or the buffers
> switched while a packet is queued, so on dequeue the wrong buffer will
> have its queue length decremented) and also found the hashing quite
> inefficient, so I've implemented my own version. There's still a minor
> bug somewhere, but if people are interested I could finish it some
> time soon and post the patches.

I am very interested Patrick, I also found SFB hashing inefficient, so
any new idea is welcomed. 

We should get more schedulers at hand.

CHOKe is about to be released, QFQ is also coming soon, thanks to
Stephen (and respective AQM authors)

Thanks



^ permalink raw reply	[flat|nested] 35+ messages in thread

* sch_sfb [was: net_sched: mark packet staying on queue too long]
  2011-01-13 11:50                     ` Patrick McHardy
  2011-01-13 15:54                       ` Eric Dumazet
@ 2011-01-13 16:04                       ` Juliusz Chroboczek
  2011-01-13 18:59                         ` Patrick McHardy
  1 sibling, 1 reply; 35+ messages in thread
From: Juliusz Chroboczek @ 2011-01-13 16:04 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netdev, Jesper Dangaard Brouer, David Miller

>> Have you looked at the SFB (Stochastic Fair Blue) implementation by
>> Juliusz Chroboczek?

>> http://www.pps.jussieu.fr/~jch/software/sfb/

> I had a closer look at this some time ago and noticed a couple of bugs
> (f.i. double buffering might be enabled or disabled or the buffers
> switched while a packet is queued, so on dequeue the wrong buffer will
> have its queue length decremented) and also found the hashing quite
> inefficient,

And you never found the time to drop me a mail on the subject?

> so I've implemented my own version.

I see.

                                        Juliusz

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: sch_sfb [was: net_sched: mark packet staying on queue too long]
  2011-01-13 16:04                       ` sch_sfb [was: net_sched: mark packet staying on queue too long] Juliusz Chroboczek
@ 2011-01-13 18:59                         ` Patrick McHardy
  2011-01-14  0:59                           ` sch_sfb Juliusz Chroboczek
  0 siblings, 1 reply; 35+ messages in thread
From: Patrick McHardy @ 2011-01-13 18:59 UTC (permalink / raw)
  To: Juliusz Chroboczek; +Cc: netdev, Jesper Dangaard Brouer, David Miller

Am 13.01.2011 17:04, schrieb Juliusz Chroboczek:
>>> Have you looked at the SFB (Stochastic Fair Blue) implementation by
>>> Juliusz Chroboczek?
> 
>>> http://www.pps.jussieu.fr/~jch/software/sfb/
> 
>> I had a closer look at this some time ago and noticed a couple of bugs
>> (f.i. double buffering might be enabled or disabled or the buffers
>> switched while a packet is queued, so on dequeue the wrong buffer will
>> have its queue length decremented) and also found the hashing quite
>> inefficient,
> 
> And you never found the time to drop me a mail on the subject?

Well, I just looked at it out of interest after already having started
my own version. Also I was riding the train without possibility of
communication :)

>> so I've implemented my own version.
> 
> I see.

I took a lot of ideas from your version, and this is also mentioned
in my version. It just seemed easier to start from scratch than to
fully analyze your version and base it on the original paper.
Since to my knowledge you've never attempted an upstream merge this
also seemed like the more polite way. No impoliteness intended.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: sch_sfb
  2011-01-13 18:59                         ` Patrick McHardy
@ 2011-01-14  0:59                           ` Juliusz Chroboczek
  2011-01-14  1:39                             ` sch_sfb Patrick McHardy
  0 siblings, 1 reply; 35+ messages in thread
From: Juliusz Chroboczek @ 2011-01-14  0:59 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netdev, Jesper Dangaard Brouer, David Miller

> Since to my knowledge you've never attempted an upstream merge

You're kidding, I hope.

  http://thread.gmane.org/gmane.linux.network/90225
  http://thread.gmane.org/gmane.linux.network/90375

It was reviewed in particular by one Patrick McHardy.

                                        Juliusz

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: sch_sfb
  2011-01-14  0:59                           ` sch_sfb Juliusz Chroboczek
@ 2011-01-14  1:39                             ` Patrick McHardy
  2011-01-14 13:34                               ` sch_sfb Juliusz Chroboczek
  0 siblings, 1 reply; 35+ messages in thread
From: Patrick McHardy @ 2011-01-14  1:39 UTC (permalink / raw)
  To: Juliusz Chroboczek; +Cc: netdev, Jesper Dangaard Brouer, David Miller

On 14.01.2011 01:59, Juliusz Chroboczek wrote:
>> Since to my knowledge you've never attempted an upstream merge
> 
> You're kidding, I hope.
> 
>   http://thread.gmane.org/gmane.linux.network/90225
>   http://thread.gmane.org/gmane.linux.network/90375
> 
> It was reviewed in particular by one Patrick McHardy.

Well, sorry, I don't remember every patch I've ever reviewed.

Just to state it again, I've actually started implementing this
years ago without every finishing it and was quite happy about
noticing your implementation for inspiration. There's no reason
to be pissed, I don't really mind which version is merged if
any, reimplementing just forced me to think from scratch, with
existing code at hand this makes it easier to avoid or fix
mistakes.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: sch_sfb
  2011-01-14  1:39                             ` sch_sfb Patrick McHardy
@ 2011-01-14 13:34                               ` Juliusz Chroboczek
  2011-01-14 13:37                                 ` sch_sfb Patrick McHardy
                                                   ` (2 more replies)
  0 siblings, 3 replies; 35+ messages in thread
From: Juliusz Chroboczek @ 2011-01-14 13:34 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netdev, David Miller

> I just looked at it out of interest after already having started my
> own version.

>>   http://thread.gmane.org/gmane.linux.network/90225
>>   http://thread.gmane.org/gmane.linux.network/90375

>> It was reviewed in particular by one Patrick McHardy.

> There's no reason to be pissed

Yes, there is.

First you object to my patch by making a bunch of unreasonable requests
(notably that I use the in-kernel classifiers, which are not usable with
Bloom filters).  Then it turns out you're implementing your own version
"from scratch".  And then you claim that you never saw my version in the
first place?

Patrick, what you're doing is not merely rude, it's actually unethical.

                                        Juliusz Chroboczek

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: sch_sfb
  2011-01-14 13:34                               ` sch_sfb Juliusz Chroboczek
@ 2011-01-14 13:37                                 ` Patrick McHardy
  2011-01-14 15:09                                 ` sch_sfb Eric Dumazet
  2011-01-14 21:06                                 ` sch_sfb Jarek Poplawski
  2 siblings, 0 replies; 35+ messages in thread
From: Patrick McHardy @ 2011-01-14 13:37 UTC (permalink / raw)
  To: Juliusz Chroboczek; +Cc: netdev, David Miller

On 14.01.2011 14:34, Juliusz Chroboczek wrote:
>> I just looked at it out of interest after already having started my
>> own version.
> 
>>>   http://thread.gmane.org/gmane.linux.network/90225
>>>   http://thread.gmane.org/gmane.linux.network/90375
> 
>>> It was reviewed in particular by one Patrick McHardy.
> 
>> There's no reason to be pissed
> 
> Yes, there is.
> 
> First you object to my patch by making a bunch of unreasonable requests
> (notably that I use the in-kernel classifiers, which are not usable with
> Bloom filters).

They are - at least the flow classifier and others returning numerical
classids. Which is the one most useful for SFB.

> Then it turns out you're implementing your own version
> "from scratch".  And then you claim that you never saw my version in the
> first place?

I said I didn't remember reviewing it. If you read my first email,
I specifically stated that I had a look at your version and noticed
a few bugs,

> Patrick, what you're doing is not merely rude, it's actually unethical.

I'm not continuing this discussion.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: sch_sfb
  2011-01-14 13:34                               ` sch_sfb Juliusz Chroboczek
  2011-01-14 13:37                                 ` sch_sfb Patrick McHardy
@ 2011-01-14 15:09                                 ` Eric Dumazet
  2011-01-14 21:06                                 ` sch_sfb Jarek Poplawski
  2 siblings, 0 replies; 35+ messages in thread
From: Eric Dumazet @ 2011-01-14 15:09 UTC (permalink / raw)
  To: Juliusz Chroboczek; +Cc: Patrick McHardy, David Miller, netdev

Le vendredi 14 janvier 2011 à 14:34 +0100, Juliusz Chroboczek a écrit :
> > I just looked at it out of interest after already having started my
> > own version.
> 
> >>   http://thread.gmane.org/gmane.linux.network/90225
> >>   http://thread.gmane.org/gmane.linux.network/90375
> 
> >> It was reviewed in particular by one Patrick McHardy.
> 
> > There's no reason to be pissed
> 
> Yes, there is.
> 
> First you object to my patch by making a bunch of unreasonable requests
> (notably that I use the in-kernel classifiers, which are not usable with
> Bloom filters).  Then it turns out you're implementing your own version
> "from scratch".  And then you claim that you never saw my version in the
> first place?
> 
> Patrick, what you're doing is not merely rude, it's actually unethical.
> 

Juliusz, I believe you overreact on this one.

Patrick reviews are really good and very reasonable. Yet, for a beginner
it might sounds difficult to understand and make the requested changes.

I sent you a private mail 2 weeks ago to offer my services to take SFB
and get it into kernel, because I read your previous attempts and
understood you had litle time to make the needed changes.

I got distracted by SFQ/CHOKe experiments but was about to work on SFB
when Patrick sent his mail. I really think we should work together, and
eventually offer SFB as a new qdisc to network admins :)

Thanks !



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: sch_sfb
  2011-01-14 13:34                               ` sch_sfb Juliusz Chroboczek
  2011-01-14 13:37                                 ` sch_sfb Patrick McHardy
  2011-01-14 15:09                                 ` sch_sfb Eric Dumazet
@ 2011-01-14 21:06                                 ` Jarek Poplawski
  2 siblings, 0 replies; 35+ messages in thread
From: Jarek Poplawski @ 2011-01-14 21:06 UTC (permalink / raw)
  To: Juliusz Chroboczek; +Cc: Patrick McHardy, netdev, David Miller

Juliusz Chroboczek wrote:
>> I just looked at it out of interest after already having started my
>> own version.
> 
>>>   http://thread.gmane.org/gmane.linux.network/90225
>>>   http://thread.gmane.org/gmane.linux.network/90375
> 
>>> It was reviewed in particular by one Patrick McHardy.
> 
>> There's no reason to be pissed
> 
> Yes, there is.
> 
> First you object to my patch by making a bunch of unreasonable requests
> (notably that I use the in-kernel classifiers, which are not usable with
> Bloom filters).  Then it turns out you're implementing your own version
> "from scratch".  And then you claim that you never saw my version in the
> first place?
> 
> Patrick, what you're doing is not merely rude, it's actually unethical.

Or Linux Classics ;-) Vide: Molnar vs Kolivas.

Jarek P.

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2011-01-14 21:06 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-12-20 14:35 [PATCH net-2.6] net_sched: always clone skbs Changli Gao
2010-12-20 14:37 ` Eric Dumazet
2010-12-20 18:27   ` David Miller
2010-12-20 23:20 ` Jarek Poplawski
2010-12-20 23:28   ` Eric Dumazet
2010-12-20 23:52     ` Jarek Poplawski
2010-12-21 13:52       ` jamal
2010-12-21 14:17         ` Changli Gao
2010-12-21 22:37           ` Jarek Poplawski
2010-12-23 13:35             ` jamal
2010-12-23 23:04               ` Jarek Poplawski
2011-01-02 21:27               ` [RFC] net_sched: mark packet staying on queue too long Eric Dumazet
2011-01-03 13:52                 ` jamal
2011-01-03 14:02                   ` Eric Dumazet
2011-01-03 15:02                     ` jamal
2011-01-03 18:11                     ` [PATCH] sch_red: report backlog information Eric Dumazet
2011-01-03 20:13                       ` David Miller
2011-01-03 17:58                 ` [RFC] net_sched: mark packet staying on queue too long Stephen Hemminger
2011-01-04 14:19                   ` Jesper Dangaard Brouer
2011-01-04 15:02                     ` Eric Dumazet
2011-01-04 16:05                       ` Stephen Hemminger
2011-01-04 18:20                       ` Eric Dumazet
2011-01-04 18:29                         ` Eric Dumazet
2011-01-13 11:50                     ` Patrick McHardy
2011-01-13 15:54                       ` Eric Dumazet
2011-01-13 16:04                       ` sch_sfb [was: net_sched: mark packet staying on queue too long] Juliusz Chroboczek
2011-01-13 18:59                         ` Patrick McHardy
2011-01-14  0:59                           ` sch_sfb Juliusz Chroboczek
2011-01-14  1:39                             ` sch_sfb Patrick McHardy
2011-01-14 13:34                               ` sch_sfb Juliusz Chroboczek
2011-01-14 13:37                                 ` sch_sfb Patrick McHardy
2011-01-14 15:09                                 ` sch_sfb Eric Dumazet
2011-01-14 21:06                                 ` sch_sfb Jarek Poplawski
2010-12-21  0:21     ` [PATCH net-2.6] net_sched: always clone skbs Changli Gao
2010-12-21  3:55       ` David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.