Date: Tue, 19 Jul 2022 21:45:03 +0800
From: Feng Tang
To: Vlastimil Babka
Cc: Andrew Morton, Christoph Lameter, Pekka Enberg, David Rientjes,
 Joonsoo Kim, Roman Gushchin, Hyeonggon Yoo <42.hyeyoo@gmail.com>,
 "linux-mm@kvack.org", "linux-kernel@vger.kernel.org",
 "Hansen, Dave", Robin Murphy, John Garry
Subject: Re: [PATCH v1] mm/slub: enable debugging memory wasting of kmalloc
Message-ID: <20220719134503.GA56558@shbuild999.sh.intel.com>
References: <20220701135954.45045-1-feng.tang@intel.com>
 <41763154-f923-ae99-55c0-0f3717636779@suse.cz>
 <20220713073642.GA69088@shbuild999.sh.intel.com>
 <45906408-34ce-4b79-fbe4-768335ffbf96@suse.cz>
 <20220715082922.GA88035@shbuild999.sh.intel.com>
In-Reply-To: <20220715082922.GA88035@shbuild999.sh.intel.com>

Hi Vlastimil,

On Fri, Jul 15, 2022 at 04:29:22PM +0800, Tang, Feng wrote:
[...]
> > >> - the knowledge of actual size could be used to improve poisoning checks as
> > >> well, detect cases when there's buffer overrun over the orig_size but not
> > >> cache's size. e.g. if you kmalloc(48) and overrun up to 64 we won't detect
> > >> it now, but with orig_size stored we could?
> > >
> > > The above patch doesn't touch this. Also, I have a question: for the
> > > [orig_size, object_size) area, shall we fill it with POISON_XXX no matter
> > > whether the REDZONE flag is set or not?
> >
> > Ah, looks like we use redzoning, not poisoning, for padding from
> > s->object_size to word boundary. So it would be more consistent to use the
> > redzone pattern (RED_ACTIVE) and check with the dynamic orig_size. Probably
> > no change for RED_INACTIVE handling is needed though.
>
> Thanks for clarifying, will go this way and do more tests. Also I'd
> make it a separate patch, as it is logically different from the space
> wastage.

I made a draft patch to redzone the wasted space (pasted at the end of
this mail), and it basically works: it detects the corruption injected
by the test code below:

	size = 256;
	buf = kmalloc(size + 8, GFP_KERNEL);
	memset(buf + size + size/2, 0xff, size/4);
	print_section(KERN_ERR, "Corrupted-kmalloc-space", buf, size * 2);
	kfree(buf);

However, when it is enabled globally, many places report corruption.
I debugged one case and found that the networking (skb_buff) code
already knows about this "wasted" kmalloc space and makes use of it,
which my patch then flags as corruption. The allocation stack is:

[    0.933675] BUG kmalloc-2k (Not tainted): kmalloc unused part overwritten
[    0.933675] -----------------------------------------------------------------------------
[    0.933675]
[    0.933675] 0xffff888237d026c0-0xffff888237d026e3 @offset=9920. First byte 0x0 instead of 0xcc
[    0.933675] Allocated in __alloc_skb+0x8e/0x1d0 age=5 cpu=0 pid=1
[    0.933675]  __slab_alloc.constprop.0+0x52/0x90
[    0.933675]  __kmalloc_node_track_caller+0x129/0x380
[    0.933675]  kmalloc_reserve+0x2a/0x70
[    0.933675]  __alloc_skb+0x8e/0x1d0
[    0.933675]  audit_buffer_alloc+0x3a/0xc0
[    0.933675]  audit_log_start.part.0+0xa3/0x300
[    0.933675]  audit_log+0x62/0xc0
[    0.933675]  audit_init+0x15c/0x16f

The networking code that touches the [orig_size, object_size) area is
__build_skb_around(), which puts a 'struct skb_shared_info' at the end
of that area:

static void __build_skb_around(struct sk_buff *skb, void *data,
			       unsigned int frag_size)
{
	struct skb_shared_info *shinfo;
	unsigned int size = frag_size ? : ksize(data);

	size -= SKB_DATA_ALIGN(sizeof(struct skb_shared_info));	-----> XXX carve the space <-----
	...
	skb_set_end_offset(skb, size);
	...
	shinfo = skb_shinfo(skb);
	memset(shinfo, 0, offsetof(struct skb_shared_info, dataref));
	atomic_set(&shinfo->dataref, 1);	-----> these two lines change the memory <-----
	...
}

Then we end up seeing the corruption report:

[    0.933675] Object ffff888237d026c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[    0.933675] Object ffff888237d026d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[    0.933675] Object ffff888237d026e0: 01 00 00 00 cc cc cc cc cc cc cc cc cc cc cc cc  ................

I haven't had time to chase the other cases yet, so I'm sharing this
finding first.
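As an aside, the pattern that trips the new check can be boiled down to a
few lines. The sketch below is hypothetical (not code from the tree; the
function name is made up): any caller that sizes its buffer with ksize()
and then uses the slack beyond its original request, like
__build_skb_around() above, will overwrite the new redzone.

	/* Hypothetical example, not from the kernel tree */
	#include <linux/slab.h>
	#include <linux/string.h>

	static void ksize_slack_user(void)
	{
		/*
		 * Ask for 100 bytes; SLUB serves this from kmalloc-128,
		 * so object_size is 128 while orig_size is 100.
		 */
		char *buf = kmalloc(100, GFP_KERNEL);

		if (!buf)
			return;

		/*
		 * ksize() reports the usable 128 bytes, not the 100 that
		 * were requested, so writing bytes 100..127 is legal by
		 * the ksize() contract...
		 */
		memset(buf + 100, 0, ksize(buf) - 100);

		/*
		 * ...but it overwrites the 0xcc (SLUB_RED_ACTIVE) pattern
		 * that the draft patch puts over [orig_size, object_size),
		 * so with slub_debug enabled the free path would report
		 * "kmalloc unused part overwritten" here.
		 */
		kfree(buf);
	}

Every user of this ksize() idiom is a potential false positive for the
check, which would explain why so many corruption reports show up once
it is enabled globally.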
Below is the draft (not yet cleaned up) patch that redzones the
[orig_size, object_size) space.

Thanks,
Feng

---
diff --git a/mm/slab.c b/mm/slab.c
index 6474c515a664..2f1110b16463 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3229,7 +3229,7 @@ slab_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid, size_t orig_
 	init = slab_want_init_on_alloc(flags, cachep);
 
 out_hooks:
-	slab_post_alloc_hook(cachep, objcg, flags, 1, &ptr, init);
+	slab_post_alloc_hook(cachep, objcg, flags, 1, &ptr, init, 0);
 	return ptr;
 }
 
@@ -3291,7 +3291,7 @@ slab_alloc(struct kmem_cache *cachep, struct list_lru *lru, gfp_t flags,
 	init = slab_want_init_on_alloc(flags, cachep);
 
 out:
-	slab_post_alloc_hook(cachep, objcg, flags, 1, &objp, init);
+	slab_post_alloc_hook(cachep, objcg, flags, 1, &objp, init, 0);
 	return objp;
 }
 
@@ -3536,13 +3536,13 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 	 * Done outside of the IRQ disabled section.
 	 */
 	slab_post_alloc_hook(s, objcg, flags, size, p,
-				slab_want_init_on_alloc(flags, s));
+				slab_want_init_on_alloc(flags, s), 0);
 	/* FIXME: Trace call missing. Christoph would like a bulk variant */
 	return size;
 error:
 	local_irq_enable();
 	cache_alloc_debugcheck_after_bulk(s, flags, i, p, _RET_IP_);
-	slab_post_alloc_hook(s, objcg, flags, i, p, false);
+	slab_post_alloc_hook(s, objcg, flags, i, p, false, 0);
 	__kmem_cache_free_bulk(s, i, p);
 	return 0;
 }
diff --git a/mm/slab.h b/mm/slab.h
index a8d5eb1c323f..938ec6454dbc 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -719,12 +719,17 @@ static inline struct kmem_cache *slab_pre_alloc_hook(struct kmem_cache *s,
 
 static inline void slab_post_alloc_hook(struct kmem_cache *s,
 					struct obj_cgroup *objcg, gfp_t flags,
-					size_t size, void **p, bool init)
+					size_t size, void **p, bool init,
+					unsigned int orig_size)
 {
 	size_t i;
 
 	flags &= gfp_allowed_mask;
 
+	/* If original request size(kmalloc) is not set, use object_size */
+	if (!orig_size)
+		orig_size = s->object_size;
+
 	/*
 	 * As memory initialization might be integrated into KASAN,
 	 * kasan_slab_alloc and initialization memset must be
@@ -735,7 +740,7 @@ static inline void slab_post_alloc_hook(struct kmem_cache *s,
 	for (i = 0; i < size; i++) {
 		p[i] = kasan_slab_alloc(s, p[i], flags, init);
 		if (p[i] && init && !kasan_has_integrated_init())
-			memset(p[i], 0, s->object_size);
+			memset(p[i], 0, orig_size);
 		kmemleak_alloc_recursive(p[i], s->object_size, 1,
 					 s->flags, flags);
 	}
diff --git a/mm/slub.c b/mm/slub.c
index 1a806912b1a3..014513e0658f 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -45,6 +45,21 @@
 
 #include "internal.h"
 
+static inline void dump_slub(struct kmem_cache *s)
+{
+	printk("Dump slab[%s] info:\n", s->name);
+	printk("flags=0x%lx, size=%d, obj_size=%d, offset=%d\n"
+		"oo=0x%x, inuse=%d, align=%d, red_left_pad=%d\n",
+		s->flags, s->size, s->object_size, s->offset,
+		s->oo.x, s->inuse, s->align, s->red_left_pad
+		);
+#ifdef CONFIG_SLUB_CPU_PARTIAL
+	printk("cpu_partial=%d, cpu_partial_slabs=%d\n",
+		s->cpu_partial, s->cpu_partial_slabs);
+#endif
+	printk("\n");
+}
+
 /*
  * Lock order:
  *   1. slab_mutex (Global Mutex)
@@ -191,6 +206,12 @@ static inline bool kmem_cache_debug(struct kmem_cache *s)
 	return kmem_cache_debug_flags(s, SLAB_DEBUG_FLAGS);
 }
 
+static inline bool kmem_cache_debug_orig_size(struct kmem_cache *s)
+{
+	return (s->flags & SLAB_KMALLOC &&
+		s->flags & (SLAB_RED_ZONE | SLAB_STORE_USER));
+}
+
 void *fixup_red_left(struct kmem_cache *s, void *p)
 {
 	if (kmem_cache_debug_flags(s, SLAB_RED_ZONE))
@@ -833,7 +854,7 @@ static unsigned int get_orig_size(struct kmem_cache *s, void *object)
 {
 	void *p = kasan_reset_tag(object);
 
-	if (!(s->flags & SLAB_KMALLOC))
+	if (!kmem_cache_debug_orig_size(s))
 		return s->object_size;
 
 	p = object + get_info_end(s);
@@ -902,6 +923,9 @@ static void print_trailer(struct kmem_cache *s, struct slab *slab, u8 *p)
 	if (s->flags & SLAB_STORE_USER)
 		off += 2 * sizeof(struct track);
 
+	if (kmem_cache_debug_orig_size(s))
+		off += sizeof(unsigned int);
+
 	off += kasan_metadata_size(s);
 
 	if (off != size_from_object(s))
@@ -958,13 +982,21 @@ static __printf(3, 4) void slab_err(struct kmem_cache *s, struct slab *slab,
 static void init_object(struct kmem_cache *s, void *object, u8 val)
 {
 	u8 *p = kasan_reset_tag(object);
+	unsigned int orig_size = s->object_size;
 
 	if (s->flags & SLAB_RED_ZONE)
 		memset(p - s->red_left_pad, val, s->red_left_pad);
 
+	if (kmem_cache_debug_orig_size(s) && val == SLUB_RED_ACTIVE) {
+		/* Redzone the allocated by kmalloc but unused space */
+		orig_size = get_orig_size(s, object);
+		if (orig_size < s->object_size)
+			memset(p + orig_size, val, s->object_size - orig_size);
+	}
+
 	if (s->flags & __OBJECT_POISON) {
-		memset(p, POISON_FREE, s->object_size - 1);
-		p[s->object_size - 1] = POISON_END;
+		memset(p, POISON_FREE, orig_size - 1);
+		p[orig_size - 1] = POISON_END;
 	}
 
 	if (s->flags & SLAB_RED_ZONE)
@@ -1057,7 +1089,7 @@ static int check_pad_bytes(struct kmem_cache *s, struct slab *slab, u8 *p)
 		/* We also have user information there */
 		off += 2 * sizeof(struct track);
 
-	if (s->flags & SLAB_KMALLOC)
+	if (kmem_cache_debug_orig_size(s))
 		off += sizeof(unsigned int);
 
 	off += kasan_metadata_size(s);
@@ -1110,6 +1142,7 @@ static int check_object(struct kmem_cache *s, struct slab *slab,
 {
 	u8 *p = object;
 	u8 *endobject = object + s->object_size;
+	unsigned int orig_size;
 
 	if (s->flags & SLAB_RED_ZONE) {
 		if (!check_bytes_and_report(s, slab, object, "Left Redzone",
@@ -1119,6 +1152,8 @@ static int check_object(struct kmem_cache *s, struct slab *slab,
 		if (!check_bytes_and_report(s, slab, object, "Right Redzone",
 			endobject, val, s->inuse - s->object_size))
 			return 0;
+
+
 	} else {
 		if ((s->flags & SLAB_POISON) && s->object_size < s->inuse) {
 			check_bytes_and_report(s, slab, p, "Alignment padding",
@@ -1127,7 +1162,23 @@ static int check_object(struct kmem_cache *s, struct slab *slab,
 		}
 	}
 
+#if 1
+	if (kmem_cache_debug_orig_size(s) && val == SLUB_RED_ACTIVE) {
+
+		orig_size = get_orig_size(s, object);
+
+		if (s->object_size != orig_size &&
+		    !check_bytes_and_report(s, slab, object, "kmalloc unused part",
+				p + orig_size, val, s->object_size - orig_size)) {
+			dump_slub(s);
+//			while (1);
+			return 0;
+		}
+	}
+#endif
+
 	if (s->flags & SLAB_POISON) {
+
 		if (val != SLUB_RED_ACTIVE && (s->flags & __OBJECT_POISON) &&
 			(!check_bytes_and_report(s, slab, p, "Poison", p,
 					POISON_FREE, s->object_size - 1) ||
@@ -1367,7 +1418,7 @@ static noinline int alloc_debug_processing(struct kmem_cache *s,
 	if (s->flags & SLAB_STORE_USER)
 		set_track(s, object, TRACK_ALLOC, addr);
 
-	if (s->flags & SLAB_KMALLOC)
+	if (kmem_cache_debug_orig_size(s))
 		set_orig_size(s, object, orig_size);
 
 	trace(s, slab, object, 1);
 
@@ -3276,7 +3327,7 @@ static __always_inline void *slab_alloc_node(struct kmem_cache *s, struct list_l
 	init = slab_want_init_on_alloc(gfpflags, s);
 
 out:
-	slab_post_alloc_hook(s, objcg, gfpflags, 1, &object, init);
+	slab_post_alloc_hook(s, objcg, gfpflags, 1, &object, init, orig_size);
 
 	return object;
 }
@@ -3769,11 +3820,11 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 	 * Done outside of the IRQ disabled fastpath loop.
 	 */
 	slab_post_alloc_hook(s, objcg, flags, size, p,
-				slab_want_init_on_alloc(flags, s));
+				slab_want_init_on_alloc(flags, s), 0);
 	return i;
 error:
 	slub_put_cpu_ptr(s->cpu_slab);
-	slab_post_alloc_hook(s, objcg, flags, i, p, false);
+	slab_post_alloc_hook(s, objcg, flags, i, p, false, 0);
 	__kmem_cache_free_bulk(s, i, p);
 	return 0;
 }
@@ -4155,12 +4206,12 @@ static int calculate_sizes(struct kmem_cache *s)
 		 */
 		size += 2 * sizeof(struct track);
 
-	/* Save the original requsted kmalloc size */
-	if (flags & SLAB_KMALLOC)
+	/* Save the original kmalloc request size */
+	if (kmem_cache_debug_orig_size(s))
 		size += sizeof(unsigned int);
 #endif
 
-	kasan_cache_create(s, &size, &s->flags);
+
 #ifdef CONFIG_SLUB_DEBUG
 	if (flags & SLAB_RED_ZONE) {
 		/*