On Sun, May 10, 2015 at 10:57:44PM -0700, Alexei Starovoitov wrote:
> On 5/10/15 4:43 PM, Pablo Neira Ayuso wrote:
> >
> > The noop is patched to an unconditional branch to skip that code, but
> > the code is still there in that path, even if it's dormant.
> >
> > What the numbers show is rather simple: The more code is in the path,
> > the less performance you get, and the qdisc ingress specific code
> > embedded there is reducing performance for people that are not using
> > qdisc ingress, hence it should go where it belongs. The static key
> > cannot save you from that.
>
> hmm, did I miss these numbers ?
>
> My numbers are showing the opposite. There is no degradation whatsoever.
> To recap, here are the numbers from my box:
> no ingress - 37.6
> ingress on other dev - 36.5
> ingress on this dev - 28.8
> ingress on this dev + u32 - 24.1
>
> after Daniel's two patches:
> no ingress - 37.6
> ingress on other dev - 36.5
> ingress on this dev - 36.5
> ingress on this dev + u32 - 25.2

I'm referring to the numbers and the scenario that I describe in:

http://patchwork.ozlabs.org/patch/470512/

i.e. no qdisc ingress before my patch vs. no qdisc ingress after my
patch. That comparison shows that moving this code to sch_ingress
results in a performance improvement.

Some more numbers: I intentionally added more code that depends on the
ingress_needed static key (see the attached patch), and the numbers
show a clear performance drop:

Result: OK: 5049126(c5049126+d0) usec, 100000000 (60byte,0frags)
  19805404pps 9506Mb/sec (9506593920bps) errors: 100000000

Remember that the baseline (the results with my patch applied) is:

Result: OK: 4747078(c4747078+d0) usec, 100000000 (60byte,0frags)
  21065587pps 10111Mb/sec (10111481760bps) errors: 100000000

That's 1Mpps less because of the extra code sitting under the static
key, so I am measuring an impact from the use of static keys.

Yes, I have jump_label support, as I told Daniel, and I can reproduce
these numbers over and over again. Perf also shows that I'm measuring
the right thing. I'm running exactly the same pktgen test with
netif_receive, using the script from the patch description, so it's
basically the same test here.

My old box is an Intel(R) Core(TM) i5-3320M CPU @ 2.60GHz; lscpu on
Debian shows:

L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              3072K

Please carefully read my patch description. I think you can rebuild
your work on top of this patch.

Thanks.
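
P.S. For readers not following the thread: the pattern being argued
about is, roughly, the jump label guarding the ingress hook in the
receive path. This is only a simplified sketch of that pattern (names
abridged, not the exact mainline code), using the static_key API from
include/linux/jump_label.h as it existed around these kernels:

#include <linux/jump_label.h>
#include <linux/skbuff.h>

static struct static_key ingress_needed;    /* defaults to false */

static int example_receive(struct sk_buff *skb)
{
	/*
	 * With the key disabled, static_key_false() is a nop that falls
	 * through and the branch body is never entered; when an ingress
	 * qdisc flips the key, the nop is live-patched into a jump into
	 * the body. Either way, the body's instructions stay in this
	 * function, which is the cost under discussion.
	 */
	if (static_key_false(&ingress_needed)) {
		/* qdisc ingress specific handling would sit here */
	}
	return 0;
}

/*
 * The key would be flipped when an ingress qdisc is attached or
 * removed, via static_key_slow_inc()/static_key_slow_dec().
 */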