* [PATCH -next] net: core: set qdisc pkt len before tc_classify
@ 2015-05-12 13:37 Florian Westphal
2015-05-12 14:22 ` Eric Dumazet
0 siblings, 1 reply; 7+ messages in thread
From: Florian Westphal @ 2015-05-12 13:37 UTC (permalink / raw)
To: netdev; +Cc: Florian Westphal
commit d2788d34885d4ce5ba ("net: sched: further simplify handle_ing")
removed the call to qdisc_enqueue_root().
However, after this removal we no longer set qdisc pkt length.
This breaks traffic policing on ingress.
Daniel re-ran his test with this patch applied and reports that
while there is a small performance hit the total gain vs. before
d2788d34885d4ce5ba is still consistently higher than 1Mpps.
Fixes: d2788d34885d ("net: sched: further simplify handle_ing")
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
net/core/dev.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index e5f77c4..871181a 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3540,8 +3540,9 @@ static inline struct sk_buff *handle_ing(struct sk_buff *skb,
*pt_prev = NULL;
}
- qdisc_bstats_update_cpu(cl->q, skb);
+ qdisc_skb_cb(skb)->pkt_len = skb->len;
skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_INGRESS);
+ qdisc_bstats_update_cpu(cl->q, skb);
switch (tc_classify(skb, cl, &cl_res)) {
case TC_ACT_OK:
--
2.0.5
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH -next] net: core: set qdisc pkt len before tc_classify
2015-05-12 13:37 [PATCH -next] net: core: set qdisc pkt len before tc_classify Florian Westphal
@ 2015-05-12 14:22 ` Eric Dumazet
2015-05-12 15:16 ` Florian Westphal
0 siblings, 1 reply; 7+ messages in thread
From: Eric Dumazet @ 2015-05-12 14:22 UTC (permalink / raw)
To: Florian Westphal; +Cc: netdev
On Tue, 2015-05-12 at 15:37 +0200, Florian Westphal wrote:
>
> Fixes: d2788d34885d ("net: sched: further simplify handle_ing")
> Acked-by: Daniel Borkmann <daniel@iogearbox.net>
> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---
> net/core/dev.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index e5f77c4..871181a 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3540,8 +3540,9 @@ static inline struct sk_buff *handle_ing(struct sk_buff *skb,
> *pt_prev = NULL;
> }
>
> - qdisc_bstats_update_cpu(cl->q, skb);
> + qdisc_skb_cb(skb)->pkt_len = skb->len;
A qdisc might have a stab (cf qdisc_calculate_pkt_len() )
> skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_INGRESS);
> + qdisc_bstats_update_cpu(cl->q, skb);
>
> switch (tc_classify(skb, cl, &cl_res)) {
> case TC_ACT_OK:
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH -next] net: core: set qdisc pkt len before tc_classify
2015-05-12 14:22 ` Eric Dumazet
@ 2015-05-12 15:16 ` Florian Westphal
2015-05-12 17:03 ` Alexei Starovoitov
0 siblings, 1 reply; 7+ messages in thread
From: Florian Westphal @ 2015-05-12 15:16 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Florian Westphal, netdev
Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > Fixes: d2788d34885d ("net: sched: further simplify handle_ing")
> > Acked-by: Daniel Borkmann <daniel@iogearbox.net>
> > Signed-off-by: Florian Westphal <fw@strlen.de>
> > ---
> > net/core/dev.c | 3 ++-
> > 1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index e5f77c4..871181a 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -3540,8 +3540,9 @@ static inline struct sk_buff *handle_ing(struct sk_buff *skb,
> > *pt_prev = NULL;
> > }
> >
> > - qdisc_bstats_update_cpu(cl->q, skb);
> > + qdisc_skb_cb(skb)->pkt_len = skb->len;
>
>
> A qdisc might have a stab (cf qdisc_calculate_pkt_len() )
Thanks for pointing this out Eric.
I was under impression that stab was only useful for egress but in
fact tc did support ingress stab too.
Daniel will submit a v2.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH -next] net: core: set qdisc pkt len before tc_classify
2015-05-12 15:16 ` Florian Westphal
@ 2015-05-12 17:03 ` Alexei Starovoitov
2015-05-13 11:02 ` Jamal Hadi Salim
0 siblings, 1 reply; 7+ messages in thread
From: Alexei Starovoitov @ 2015-05-12 17:03 UTC (permalink / raw)
To: Florian Westphal; +Cc: Eric Dumazet, netdev, daniel, jhs
On Tue, May 12, 2015 at 05:16:46PM +0200, Florian Westphal wrote:
> >
> > A qdisc might have a stab (cf qdisc_calculate_pkt_len() )
>
> Thanks for pointing this out Eric.
Indeed. Thanks Eric!
> I was under impression that stab was only useful for egress but in
> fact tc did support ingress stab too.
That was my impression as well. Though it was allowed to add
qdisc_size_table to ingress, it's useless. Nothing takes advantage
of recomputed qdisc_pkt_len. It can only mess with stats, which
seems to be already broken:
- egress qdiscs do qdisc_bstats_update() only at dequeue, whereas
ingress double counts dropped packets
- qdisc_bstats_update() does:
bstats->bytes += qdisc_pkt_len(skb);
bstats->packets += skb_is_gso(skb) ? skb_shinfo(skb)->gso_segs : 1;
but nothing on ingress does qdisc_pkt_len_init().
So when we see gso packet on ingress the stats are very wrong.
I think we should fix ingress stats as a whole.
Option 1:
do qdisc_pkt_len_init() + optional call into size_table
to update stats only after classifers returned TC_ACT_OK.
Cons: extra overhead per packet only to update ingress qdisc stats.
Option 2:
use byte and packet counts from underlying netdev, when
reporting stats via ingress qdisc.
Pros: No arithmetic in the fast path.
I think option 2 is much preferred, since it's faster and
equally accurate.
Jamal, what's your take?
Thanks!
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH -next] net: core: set qdisc pkt len before tc_classify
2015-05-12 17:03 ` Alexei Starovoitov
@ 2015-05-13 11:02 ` Jamal Hadi Salim
2015-05-13 11:07 ` Jamal Hadi Salim
2015-05-13 13:10 ` Eric Dumazet
0 siblings, 2 replies; 7+ messages in thread
From: Jamal Hadi Salim @ 2015-05-13 11:02 UTC (permalink / raw)
To: Alexei Starovoitov, Florian Westphal; +Cc: Eric Dumazet, netdev, daniel
On 05/12/15 13:03, Alexei Starovoitov wrote:
> On Tue, May 12, 2015 at 05:16:46PM +0200, Florian Westphal wrote:
>>>
>>> A qdisc might have a stab (cf qdisc_calculate_pkt_len() )
>>
>> Thanks for pointing this out Eric.
>
> Indeed. Thanks Eric!
>
>> I was under impression that stab was only useful for egress but in
>> fact tc did support ingress stab too.
>
> That was my impression as well. Though it was allowed to add
> qdisc_size_table to ingress, it's useless. Nothing takes advantage
> of recomputed qdisc_pkt_len. It can only mess with stats, which
> seems to be already broken:
> - egress qdiscs do qdisc_bstats_update() only at dequeue, whereas
> ingress double counts dropped packets
> - qdisc_bstats_update() does:
> bstats->bytes += qdisc_pkt_len(skb);
> bstats->packets += skb_is_gso(skb) ? skb_shinfo(skb)->gso_segs : 1;
> but nothing on ingress does qdisc_pkt_len_init().
> So when we see gso packet on ingress the stats are very wrong.
>
> I think we should fix ingress stats as a whole.
> Option 1:
> do qdisc_pkt_len_init() + optional call into size_table
> to update stats only after classifers returned TC_ACT_OK.
> Cons: extra overhead per packet only to update ingress qdisc stats.
> Option 2:
> use byte and packet counts from underlying netdev, when
> reporting stats via ingress qdisc.
> Pros: No arithmetic in the fast path.
>
> I think option 2 is much preferred, since it's faster and
> equally accurate.
>
> Jamal, what's your take?
>
I dont think we need the stab on the ingress but we do need to account
for gso. So option #1 with qdisc_pkt_len_init() alone is the only thing
needed. i.e Florian's change becomes:
- qdisc_bstats_update_cpu(cl->q, skb);
+ qdisc_pkt_len_init(skb)
skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_INGRESS);
+ qdisc_bstats_update_cpu(cl->q, skb);
Alexei, why do you say this option will have overhead?
cheers,
jamal
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH -next] net: core: set qdisc pkt len before tc_classify
2015-05-13 11:02 ` Jamal Hadi Salim
@ 2015-05-13 11:07 ` Jamal Hadi Salim
2015-05-13 13:10 ` Eric Dumazet
1 sibling, 0 replies; 7+ messages in thread
From: Jamal Hadi Salim @ 2015-05-13 11:07 UTC (permalink / raw)
To: Alexei Starovoitov, Florian Westphal; +Cc: Eric Dumazet, netdev, daniel
On 05/13/15 07:02, Jamal Hadi Salim wrote:
>
>
> I dont think we need the stab on the ingress but we do need to account
> for gso. So option #1 with qdisc_pkt_len_init() alone is the only thing
> needed. i.e Florian's change becomes:
>
> - qdisc_bstats_update_cpu(cl->q, skb);
> + qdisc_pkt_len_init(skb)
> skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_INGRESS);
> + qdisc_bstats_update_cpu(cl->q, skb);
>
> Alexei, why do you say this option will have overhead?
>
and the comment in qdisc_pkt_len_init() needs fixing too.
It says:
/* To get more precise estimation of bytes sent on wire,
* we add to pkt_len the headers size of all segments
*/
It implies transmit direction only - modern nics do set this
on receive. Something like:
/* To get more precise estimation of bytes rx or to be sent
on the wire, we add to pkt_len the headers size of
all segments
*/
cheers,
jamal
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH -next] net: core: set qdisc pkt len before tc_classify
2015-05-13 11:02 ` Jamal Hadi Salim
2015-05-13 11:07 ` Jamal Hadi Salim
@ 2015-05-13 13:10 ` Eric Dumazet
1 sibling, 0 replies; 7+ messages in thread
From: Eric Dumazet @ 2015-05-13 13:10 UTC (permalink / raw)
To: Jamal Hadi Salim; +Cc: Alexei Starovoitov, Florian Westphal, netdev, daniel
On Wed, 2015-05-13 at 07:02 -0400, Jamal Hadi Salim wrote:
> Alexei, why do you say this option will have overhead?
Well, they want to be able to give nice numbers and ultimately have
a nice bogomips value ;)
About stab on ingress, I never used it, but maybe Jesper or someone else
did.
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2015-05-13 13:10 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-12 13:37 [PATCH -next] net: core: set qdisc pkt len before tc_classify Florian Westphal
2015-05-12 14:22 ` Eric Dumazet
2015-05-12 15:16 ` Florian Westphal
2015-05-12 17:03 ` Alexei Starovoitov
2015-05-13 11:02 ` Jamal Hadi Salim
2015-05-13 11:07 ` Jamal Hadi Salim
2015-05-13 13:10 ` Eric Dumazet
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).