All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vlastimil Babka <vbabka@suse.cz>
To: paulmck@kernel.org, linux-kernel@vger.kernel.org,
	rcu@vger.kernel.org, linux-mm@kvack.org
Cc: cl@linux.com, penberg@kernel.org, rientjes@google.com,
	iamjoonsoo.kim@lge.com, akpm@linux-foundation.org,
	ming.lei@redhat.com, axboe@kernel.dk, kernel-team@fb.com
Subject: Re: [PATCH mm,percpu_ref,rcu 1/6] mm: Add mem_dump_obj() to print source of memory block
Date: Fri, 8 Jan 2021 14:50:35 +0100	[thread overview]
Message-ID: <39e1bbd5-2766-d709-d932-bf66d11e244f@suse.cz> (raw)
In-Reply-To: <20210106011750.13709-1-paulmck@kernel.org>

On 1/6/21 2:17 AM, paulmck@kernel.org wrote:
> From: "Paul E. McKenney" <paulmck@kernel.org>
> 
> There are kernel facilities such as per-CPU reference counts that give
> error messages in generic handlers or callbacks, whose messages are
> unenlightening.  In the case of per-CPU reference-count underflow, this
> is not a problem when creating a new use of this facility because in that
> case the bug is almost certainly in the code implementing that new use.
> However, trouble arises when deploying across many systems, which might
> exercise corner cases that were not seen during development and testing.
> Here, it would be really nice to get some kind of hint as to which of
> several uses the underflow was caused by.
> 
> This commit therefore exposes a mem_dump_obj() function that takes
> a pointer to memory (which must still be allocated if it has been
> dynamically allocated) and prints available information on where that
> memory came from.  This pointer can reference the middle of the block as
> well as the beginning of the block, as needed by things like RCU callback
> functions and timer handlers that might not know where the beginning of
> the memory block is.  These functions and handlers can use mem_dump_obj()
> to print out better hints as to where the problem might lie.
> 
> The information printed can depend on kernel configuration.  For example,
> the allocation return address can be printed only for slab and slub,
> and even then only when the necessary debug has been enabled.  For slab,
> build with CONFIG_DEBUG_SLAB=y, and either use sizes with ample space
> to the next power of two or use the SLAB_STORE_USER when creating the
> kmem_cache structure.  For slub, build with CONFIG_SLUB_DEBUG=y and
> boot with slub_debug=U, or pass SLAB_STORE_USER to kmem_cache_create()
> if more focused use is desired.  Also for slub, use CONFIG_STACKTRACE
> to enable printing of the allocation-time stack trace.
> 
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Pekka Enberg <penberg@kernel.org>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: <linux-mm@kvack.org>
> Reported-by: Andrii Nakryiko <andrii@kernel.org>
> [ paulmck: Convert to printing and change names per Joonsoo Kim. ]
> [ paulmck: Move slab definition per Stephen Rothwell and kbuild test robot. ]
> [ paulmck: Handle CONFIG_MMU=n case where vmalloc() is kmalloc(). ]
> [ paulmck: Apply Vlastimil Babka feedback on slab.c kmem_provenance(). ]
> [ paulmck: Extract more info from !SLUB_DEBUG per Joonsoo Kim. ]
> Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

Some nits below:

> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -3635,6 +3635,26 @@ void *__kmalloc_node_track_caller(size_t size, gfp_t flags,
>  EXPORT_SYMBOL(__kmalloc_node_track_caller);
>  #endif /* CONFIG_NUMA */
>  
> +void kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct page *page)
> +{
> +	struct kmem_cache *cachep;
> +	unsigned int objnr;
> +	void *objp;
> +
> +	kpp->kp_ptr = object;
> +	kpp->kp_page = page;
> +	cachep = page->slab_cache;
> +	kpp->kp_slab_cache = cachep;
> +	objp = object - obj_offset(cachep);
> +	kpp->kp_data_offset = obj_offset(cachep);
> +	page = virt_to_head_page(objp);

Hm when can this page be different from the "page" we got as function parameter?
I guess only if "object" was so close to the beginning of page that "object -
obj_offset(cachep)" underflowed it. So either "object" pointed to the
padding/redzone, or even below page->s_mem. Both situations sounds like we
should handle them differently than continuing with an unrelated page that's
below our slab page?

> +	objnr = obj_to_index(cachep, page, objp);

Related, this will return bogus value for objp below page->s_mem.
And if our "object" pointer pointed beyond last valid object, this will give us
too large index.


> +	objp = index_to_obj(cachep, page, objnr);

Too large index can cause dbg_userword to be beyond our page.
In SLUB version you have the WARN_ON_ONCE that catches such invalid pointers
(before first valid object or after last valid object) and skips getting the
backtrace for those, so analogical thing should probably be done here?

> +	kpp->kp_objp = objp;
> +	if (DEBUG && cachep->flags & SLAB_STORE_USER)
> +		kpp->kp_ret = *dbg_userword(cachep, objp);
> +}
> +
> diff --git a/mm/slub.c b/mm/slub.c
> index 0c8b43a..3c1a843 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -3919,6 +3919,46 @@ int __kmem_cache_shutdown(struct kmem_cache *s)
>  	return 0;
>  }
>  
> +void kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct page *page)
> +{
> +	void *base;
> +	int __maybe_unused i;
> +	unsigned int objnr;
> +	void *objp;
> +	void *objp0;
> +	struct kmem_cache *s = page->slab_cache;
> +	struct track __maybe_unused *trackp;
> +
> +	kpp->kp_ptr = object;
> +	kpp->kp_page = page;
> +	kpp->kp_slab_cache = s;
> +	base = page_address(page);
> +	objp0 = kasan_reset_tag(object);
> +#ifdef CONFIG_SLUB_DEBUG
> +	objp = restore_red_left(s, objp0);
> +#else
> +	objp = objp0;
> +#endif
> +	objnr = obj_to_index(s, page, objp);

It would be safer to use objp0 instead of objp here I think. In case "object"
was pointer to the first object's left red zone, then we would not have "objp"
underflow "base" and get a bogus objnr. The WARN_ON_ONCE below could then be
less paranoid? Basically just the "objp >= base + page->objects * s->size"
should be possible if "object" points beyond the last valid object. But
otherwise we should get valid index and thus valid "objp = base + s->size *
objnr;" below, and "objp < base" and "(objp - base) % s->size)" should be
impossible?

Hmm but since it would then be possible to get a negative pointer offset (into
the left padding/redzone), kmem_dump_obj() should calculate and print it as signed?
But it's not obvious if a pointer to left red zone is a pointer that was an
overflow of object N-1 or underflow of object N, and which one to report (unless
it's the very first object). AFAICS your current code will report all as
overflows of object N-1, which is problematic with N=0 (as described above) so
changing it to report underflows of object N would make more sense?

Thanks,
Vlastimil

  reply	other threads:[~2021-01-08 13:51 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-06  1:16 [PATCH v4 sl-b 0/6] Export return addresses etc. for better diagnostics Paul E. McKenney
2021-01-06  1:17 ` [PATCH mm,percpu_ref,rcu 1/6] mm: Add mem_dump_obj() to print source of memory block paulmck
2021-01-08 13:50   ` Vlastimil Babka [this message]
2021-01-08 19:01     ` Paul E. McKenney
2021-01-08 19:41       ` Vlastimil Babka
2021-01-06  1:17 ` [PATCH mm,percpu_ref,rcu 2/6] mm: Make mem_dump_obj() handle NULL and zero-sized pointers paulmck
2021-01-06  1:17 ` [PATCH mm,percpu_ref,rcu 3/6] mm: Make mem_dump_obj() handle vmalloc() memory paulmck
2021-01-08 15:30   ` Vlastimil Babka
2021-01-06  1:17 ` [PATCH mm,percpu_ref,rcu 4/6] mm: Make mem_obj_dump() vmalloc() dumps include start and length paulmck
2021-01-08 15:30   ` Vlastimil Babka
2021-01-06  1:17 ` [PATCH mm,percpu_ref,rcu 5/6] rcu: Make call_rcu() print mem_dump_obj() info for double-freed callback paulmck
2021-01-06  1:17 ` [PATCH mm,percpu_ref,rcu 6/6] percpu_ref: Dump mem_dump_obj() info upon reference-count underflow paulmck
2021-01-06 21:48 ` [PATCH v4 sl-b 0/6] Export return addresses etc. for better diagnostics Andrew Morton
2021-01-06 23:42   ` Paul E. McKenney
2021-01-08  0:26     ` Paul E. McKenney
2021-01-08 15:35       ` Vlastimil Babka
2021-01-08 17:36         ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=39e1bbd5-2766-d709-d932-bf66d11e244f@suse.cz \
    --to=vbabka@suse.cz \
    --cc=akpm@linux-foundation.org \
    --cc=axboe@kernel.dk \
    --cc=cl@linux.com \
    --cc=iamjoonsoo.kim@lge.com \
    --cc=kernel-team@fb.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=ming.lei@redhat.com \
    --cc=paulmck@kernel.org \
    --cc=penberg@kernel.org \
    --cc=rcu@vger.kernel.org \
    --cc=rientjes@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.