linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Catalin Marinas <catalin.marinas@arm.com>
To: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Michal Hocko <mhocko@kernel.org>,
	Matthew Wilcox <willy@infradead.org>, Qian Cai <cai@lca.pw>
Subject: Re: [PATCH v3 1/3] mm: kmemleak: Make the tool tolerant to struct scan_area allocation failures
Date: Thu, 3 Oct 2019 09:41:35 +0100	[thread overview]
Message-ID: <20191003084135.GA21629@arrakis.emea.arm.com> (raw)
In-Reply-To: <2ac37341-097e-17a2-fb6b-7912da9fa38e@ozlabs.ru>

On Thu, Oct 03, 2019 at 04:13:07PM +1000, Alexey Kardashevskiy wrote:
> On 13/08/2019 02:06, Catalin Marinas wrote:
> > Object scan areas are an optimisation aimed to decrease the false
> > positives and slightly improve the scanning time of large objects known
> > to only have a few specific pointers. If a struct scan_area fails to
> > allocate, kmemleak can still function normally by scanning the full
> > object.
> > 
> > Introduce an OBJECT_FULL_SCAN flag and mark objects as such when
> > scan_area allocation fails.
> 
> I came across this one while bisecting sudden drop in throughput of a
> 100Gbit Mellanox CX4 ethernet card in a PPC POWER9 system, the speed
> dropped from 100Gbit to about 40Gbit. Bisect pointed at dba82d943177,
> this are the relevant config options:
> 
> [fstn1-p1 kernel]$ grep KMEMLEAK ~/pbuild/kernel-le-4g/.config
> CONFIG_HAVE_DEBUG_KMEMLEAK=y
> CONFIG_DEBUG_KMEMLEAK=y
> CONFIG_DEBUG_KMEMLEAK_EARLY_LOG_SIZE=16000
> # CONFIG_DEBUG_KMEMLEAK_TEST is not set
> # CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF is not set
> CONFIG_DEBUG_KMEMLEAK_AUTO_SCAN=y

The throughput drop is probably caused caused by kmemleak slowing down
all memory allocations (including skb). So that's expected. You may get
similar drop with other debug options like lock proving, kasan.

> Setting CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE=400 or even 4000 (this is
> what KMEMLEAK_EARLY_LOG_SIZE is now in the master) produces soft
> lockups on the recent upstream (sha1 a3c0e7b1fe1f):
> 
> [c000001fde64fb60] [c000000000c24ed4] _raw_write_unlock_irqrestore+0x54/0x70
> [c000001fde64fb90] [c0000000004117e4] find_and_remove_object+0xa4/0xd0
> [c000001fde64fbe0] [c000000000411c74] delete_object_full+0x24/0x50
> [c000001fde64fc00] [c000000000411d28] __kmemleak_do_cleanup+0x88/0xd0
> [c000001fde64fc40] [c00000000012a1a4] process_one_work+0x374/0x6a0
> [c000001fde64fd20] [c00000000012a548] worker_thread+0x78/0x5a0
> [c000001fde64fdb0] [c000000000135508] kthread+0x198/0x1a0
> [c000001fde64fe20] [c00000000000b980] ret_from_kernel_thread+0x5c/0x7c

That's the kmemleak disabling path. I don't have the full log but I
suspect by setting a small pool size, kmemleak failed to allocate memory
and went into disabling itself. The clean-up above tries to remove the
allocated metadata. It seems that it takes significant time on your
platform. Not sure how to avoid the soft lock-up but I wouldn't bother
too much about it, it's only triggered by a previous error condition
disabling kmemleak.

> KMEMLEAK_EARLY_LOG_SIZE=8000 works but slow.
> 
> Interestingly KMEMLEAK_EARLY_LOG_SIZE=400 on dba82d943177 still worked
> and I saw my 100Gbit. Disabling KMEMLEAK also fixes the speed
> (apparently).

A small memory pool for kmemleak just disables it shortly after boot, so
it's no longer in the way and you get your throughput back.

> Is that something expected? Thanks,

Yes for the throughput. Not sure about the soft lock-up. Do you have the
full log around the lock-up?

-- 
Catalin


  parent reply	other threads:[~2019-10-03  8:41 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-12 16:06 [PATCH v3 0/3] mm: kmemleak: Use a memory pool for kmemleak object allocations Catalin Marinas
2019-08-12 16:06 ` [PATCH v3 1/3] mm: kmemleak: Make the tool tolerant to struct scan_area allocation failures Catalin Marinas
     [not found]   ` <2ac37341-097e-17a2-fb6b-7912da9fa38e@ozlabs.ru>
2019-10-03  8:41     ` Catalin Marinas [this message]
     [not found]       ` <ba47fb68-f44c-04c9-7ea8-2705e799937b@ozlabs.ru>
2019-10-07  9:56         ` Catalin Marinas
2019-08-12 16:06 ` [PATCH v3 2/3] mm: kmemleak: Simple memory allocation pool for kmemleak objects Catalin Marinas
2019-08-13  9:37   ` Catalin Marinas
2019-08-12 16:06 ` [PATCH v3 3/3] mm: kmemleak: Use the memory pool for early allocations Catalin Marinas
2019-08-13  9:35   ` Catalin Marinas
2019-08-13 12:35   ` Qian Cai
2019-08-13 13:49     ` Catalin Marinas
2019-08-12 21:07 ` [PATCH v3 0/3] mm: kmemleak: Use a memory pool for kmemleak object allocations Andrew Morton
2019-08-13  9:40   ` Catalin Marinas
2019-12-03 15:51 ` Noam Stolero
2019-12-03 16:08   ` Catalin Marinas
2019-12-05 16:16     ` Noam Stolero

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191003084135.GA21629@arrakis.emea.arm.com \
    --to=catalin.marinas@arm.com \
    --cc=aik@ozlabs.ru \
    --cc=akpm@linux-foundation.org \
    --cc=cai@lca.pw \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).