Verification speed w/ KASAN builds

From: Lorenz Bauer @ 2020-07-17 10:46 UTC
  To: bpf, Alexei Starovoitov, Daniel Borkmann; +Cc: kernel-team

Hi list,

I'm not sure whether this is a bug report or just the way of life.
The problem: we have a couple of machines that run KASAN
kernels to weed out bugs. On those machines, loading our
cls-redirect TC classifier takes so long that our user space
program aborts.

I've reproduced this in a VM: loading cls-redirect on a VM
with a 5.4 kernel without KASAN takes around 4 seconds.
Doing the same on recent bpf-next with KASAN and other
shenanigans enabled takes more like a minute.

Is it expected that the overhead of KASAN is this large?
I went and collected a perf profile of loading the program
in the VM:

-   96.31%     1.00%  redirect.test  [kernel.kallsyms]  [k] do_check_common
   - 95.32% do_check_common
      - 69.24% states_equal.isra.0
         + 49.81% kmem_cache_alloc_trace
         + 16.77% kfree
         + 1.22% regsafe.part.0
      - 12.75% push_stack
         - 10.65% copy_verifier_state
            - 4.50% realloc_stack_state
               + 4.48% __kmalloc
            + 4.16% kmem_cache_alloc_trace
            + 1.82% __kmalloc
         + 2.07% kmem_cache_alloc_trace
      + 5.25% pop_stack
      + 2.84% push_jmp_history.isra.0
      + 2.46% copy_verifier_state
      + 1.00% free_verifier_state
        0.53% kmem_cache_alloc_trace
   + 1.00% runtime.goexit
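
A rough tally of the allocator entries visible in the profile above,
as a sketch rather than a precise accounting. It assumes the
percentages are absolute shares of all samples (perf's default
"graph" callchain mode) and only counts call paths that are expanded
in the output; collapsed subtrees such as pop_stack are left out, so
the real share is probably a bit higher. The names ALLOC_ENTRIES,
allocator_share and states_equal_share are made up for illustration.

```python
# Allocator-related entries copied from the perf profile, as
# (call path, percent of all samples). Collapsed ("+") subtrees like
# pop_stack or push_jmp_history are not expanded in the profile, so
# they are not counted here.
ALLOC_ENTRIES = [
    ("states_equal > kmem_cache_alloc_trace", 49.81),
    ("states_equal > kfree", 16.77),
    ("push_stack > copy_verifier_state > realloc_stack_state > __kmalloc", 4.48),
    ("push_stack > copy_verifier_state > kmem_cache_alloc_trace", 4.16),
    ("push_stack > copy_verifier_state > __kmalloc", 1.82),
    ("push_stack > kmem_cache_alloc_trace", 2.07),
    ("do_check_common > kmem_cache_alloc_trace", 0.53),
]


def allocator_share(entries):
    """Sum of all visible alloc/free percentages."""
    return round(sum(pct for _, pct in entries), 2)


def states_equal_share(entries):
    """Alloc/free percentage spent directly under states_equal."""
    return round(
        sum(pct for path, pct in entries if path.startswith("states_equal >")),
        2,
    )


if __name__ == "__main__":
    print(f"alloc/free, all visible paths: {allocator_share(ALLOC_ENTRIES)}%")
    print(f"alloc/free under states_equal: {states_equal_share(ALLOC_ENTRIES)}%")
```

With the numbers above this comes to about 79.6% of samples in
allocation and free paths, 66.6% of it under states_equal alone.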

Note that the version of cls-redirect in the tree and our internal version
have diverged a bit; the internal one is a bit more complicated.

Looking forward to your opinions,
Lorenz

-- 
Lorenz Bauer  |  Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK

www.cloudflare.com
