On 8/12/2019 7:06 PM, Catalin Marinas wrote:
> Following the discussions on v2 of this patch(set) [1], this series
> takes slightly different approach:
>
> - it implements its own simple memory pool that does not rely on the
>   slab allocator
>
> - drops the early log buffer logic entirely since it can now allocate
>   metadata from the memory pool directly before kmemleak is fully
>   initialised
>
> - CONFIG_DEBUG_KMEMLEAK_EARLY_LOG_SIZE option is renamed to
>   CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE
>
> - moves the kmemleak_init() call earlier (mm_init())
>
> - to avoid a separate memory pool for struct scan_area, it makes the
>   tool robust when such allocations fail as scan areas are rather an
>   optimisation
>
> [1] http://lkml.kernel.org/r/20190727132334.9184-1-catalin.marinas@arm.com
>
> Catalin Marinas (3):
>   mm: kmemleak: Make the tool tolerant to struct scan_area allocation
>     failures
>   mm: kmemleak: Simple memory allocation pool for kmemleak objects
>   mm: kmemleak: Use the memory pool for early allocations
>
>  init/main.c       |   2 +-
>  lib/Kconfig.debug |  11 +-
>  mm/kmemleak.c     | 325 ++++++++++++----------------------------------
>  3 files changed, 91 insertions(+), 247 deletions(-)

Hi Catalin,

We observe severe degradation in our network performance affecting all
of our NICs. The degradation is directly linked to this patch.

What we run:
Simple Iperf TCP loopback with 8 streams on ConnectX5-100GbE.
Since it's a loopback test, traffic goes from the socket through the IP
stack and back to the socket, without going through the NIC driver.

What we observe:

Throughput performance:
- Kernel 5.3GA - Throughput was 230Gbps
- Kernel 5.4-rc1 and later - Throughput is 50Gbps

CPU utilization:
Using perf, we see much higher CPU utilization in kmem-related
functions:

Function                  | Kernel 5.3GA | Kernel 5.4-rc1 and later
--------------------------|--------------|-------------------------
__kfree_skb               | 3.4%         | 11.0%
kmem_cache_free           | 0.3%         | 10.2%
__alloc_skb               | 2.2%         | 26.0%
queued_spin_lock_slowpath | 1.3%         | 26.3%
delete_object_full        | Not used     | 18.0%

delete_object_full() appears to be the function that starts the slower
flow. One of the conditions that causes this function to run is the
'kmemleak_free_enabled' flag, which your series changed to be enabled
by default.

Reverting the series restores the performance almost completely.

Can you help shed light on the subject?
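
To make the suspected flow concrete, here is a minimal userspace sketch
of the gating described above. It is not the code from mm/kmemleak.c:
every name ending in _model, and count_slow_path(), are hypothetical
stand-ins. The only behaviour taken from the report is that the free
path reaches delete_object_full() when 'kmemleak_free_enabled' is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel state; not the real kmemleak code. */
static bool kmemleak_free_enabled_model;
static unsigned long slow_path_calls;

/*
 * Stand-in for delete_object_full(). Assumption: in the kernel this is
 * the expensive step (the perf profile suggests lock contention).
 * Here it only counts how often it is reached.
 */
static void delete_object_full_model(const void *ptr)
{
	(void)ptr;
	slow_path_calls++;
}

/* Stand-in for the free hook: the flag gates the slow path. */
static void kmemleak_free_model(const void *ptr)
{
	if (kmemleak_free_enabled_model && ptr)
		delete_object_full_model(ptr);
}

/* Simulate nr_frees frees and report how many took the slow path. */
unsigned long count_slow_path(bool enabled, size_t nr_frees)
{
	static int dummy_object;
	size_t i;

	kmemleak_free_enabled_model = enabled;
	slow_path_calls = 0;
	for (i = 0; i < nr_frees; i++)
		kmemleak_free_model(&dummy_object);
	return slow_path_calls;
}
```

With the flag set, every simulated free reaches the slow path; with it
clear, none do. That would be consistent with delete_object_full going
from "Not used" to 18.0% in the profile, although the sketch says
nothing about the real cost of each call.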