linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/2] guarantee natural alignment for kmalloc()
@ 2019-08-26 11:16 Vlastimil Babka
  2019-08-26 11:16 ` [PATCH v2 1/2] mm, sl[ou]b: improve memory accounting Vlastimil Babka
  2019-08-26 11:16 ` [PATCH v2 2/2] mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two) Vlastimil Babka
  0 siblings, 2 replies; 40+ messages in thread
From: Vlastimil Babka @ 2019-08-26 11:16 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: linux-kernel, Christoph Lameter, Pekka Enberg, David Rientjes,
	Ming Lei, Dave Chinner, Matthew Wilcox, Darrick J . Wong,
	Christoph Hellwig, linux-xfs, linux-fsdevel, linux-block,
	James Bottomley, linux-btrfs, Vlastimil Babka

After a while, here's v2 of the series, also discussed at LSF/MM [1].
I've updated the documentation bits and expanded changelog. Also measured
effect on SLOB, and found it to be within noise. That required first adding
some accounting for SLOB, which I believe is useful in general, so that
became Patch 1. All other details are in Patch 2 changelog.

[1] https://lwn.net/Articles/787740/

Vlastimil Babka (2):
  mm, sl[ou]b: improve memory accounting
  mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)

 Documentation/core-api/memory-allocation.rst |  4 ++
 include/linux/slab.h                         |  4 ++
 mm/slab_common.c                             | 19 +++++-
 mm/slob.c                                    | 62 +++++++++++++++-----
 mm/slub.c                                    | 14 ++++-
 5 files changed, 82 insertions(+), 21 deletions(-)

-- 
2.22.1


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v2 1/2] mm, sl[ou]b: improve memory accounting
  2019-08-26 11:16 [PATCH v2 0/2] guarantee natural alignment for kmalloc() Vlastimil Babka
@ 2019-08-26 11:16 ` Vlastimil Babka
  2019-08-26 11:16 ` [PATCH v2 2/2] mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two) Vlastimil Babka
  1 sibling, 0 replies; 40+ messages in thread
From: Vlastimil Babka @ 2019-08-26 11:16 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: linux-kernel, Christoph Lameter, Pekka Enberg, David Rientjes,
	Ming Lei, Dave Chinner, Matthew Wilcox, Darrick J . Wong,
	Christoph Hellwig, linux-xfs, linux-fsdevel, linux-block,
	James Bottomley, linux-btrfs, Vlastimil Babka

SLOB currently doesn't account its pages at all, so in /proc/meminfo the Slab
field shows zero. Modifying a counter on page allocation and freeing should be
acceptable even for the small system scenarios SLOB is intended for.
Since reclaimable caches are not separated in SLOB, account everything as
unreclaimable.

SLUB currently doesn't account kmalloc() and kmalloc_node() allocations larger
than order-1 page, that are passed directly to the page allocator. As they also
don't appear in /proc/slabinfo, it might look like a memory leak. For
consistency, account them as well. (SLAB doesn't actually use page allocator
directly, so no change there).

Ideally SLOB and SLUB would be handled in separate patches, but due to the
shared kmalloc_order() function and different kfree() implementations, it's
easier to patch both at once to prevent inconsistencies.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/slab_common.c |  8 ++++++--
 mm/slob.c        | 20 ++++++++++++++++----
 mm/slub.c        | 14 +++++++++++---
 3 files changed, 33 insertions(+), 9 deletions(-)

diff --git a/mm/slab_common.c b/mm/slab_common.c
index 807490fe217a..929c02a90fba 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1250,12 +1250,16 @@ void __init create_kmalloc_caches(slab_flags_t flags)
  */
 void *kmalloc_order(size_t size, gfp_t flags, unsigned int order)
 {
-	void *ret;
+	void *ret = NULL;
 	struct page *page;
 
 	flags |= __GFP_COMP;
 	page = alloc_pages(flags, order);
-	ret = page ? page_address(page) : NULL;
+	if (likely(page)) {
+		ret = page_address(page);
+		mod_node_page_state(page_pgdat(page), NR_SLAB_UNRECLAIMABLE,
+				    1 << order);
+	}
 	ret = kasan_kmalloc_large(ret, size, flags);
 	/* As ret might get tagged, call kmemleak hook after KASAN. */
 	kmemleak_alloc(ret, size, 1, flags);
diff --git a/mm/slob.c b/mm/slob.c
index 7f421d0ca9ab..3dcde9cf2b17 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -190,7 +190,7 @@ static int slob_last(slob_t *s)
 
 static void *slob_new_pages(gfp_t gfp, int order, int node)
 {
-	void *page;
+	struct page *page;
 
 #ifdef CONFIG_NUMA
 	if (node != NUMA_NO_NODE)
@@ -202,14 +202,21 @@ static void *slob_new_pages(gfp_t gfp, int order, int node)
 	if (!page)
 		return NULL;
 
+	mod_node_page_state(page_pgdat(page), NR_SLAB_UNRECLAIMABLE,
+			    1 << order);
 	return page_address(page);
 }
 
 static void slob_free_pages(void *b, int order)
 {
+	struct page *sp = virt_to_page(b);
+
 	if (current->reclaim_state)
 		current->reclaim_state->reclaimed_slab += 1 << order;
-	free_pages((unsigned long)b, order);
+
+	mod_node_page_state(page_pgdat(sp), NR_SLAB_UNRECLAIMABLE,
+			    -(1 << order));
+	__free_pages(sp, order);
 }
 
 /*
@@ -521,8 +528,13 @@ void kfree(const void *block)
 		int align = max_t(size_t, ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN);
 		unsigned int *m = (unsigned int *)(block - align);
 		slob_free(m, *m + align);
-	} else
-		__free_pages(sp, compound_order(sp));
+	} else {
+		unsigned int order = compound_order(sp);
+		mod_node_page_state(page_pgdat(sp), NR_SLAB_UNRECLAIMABLE,
+				    -(1 << order));
+		__free_pages(sp, order);
+
+	}
 }
 EXPORT_SYMBOL(kfree);
 
diff --git a/mm/slub.c b/mm/slub.c
index 8834563cdb4b..74365d083a1e 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3819,11 +3819,15 @@ static void *kmalloc_large_node(size_t size, gfp_t flags, int node)
 {
 	struct page *page;
 	void *ptr = NULL;
+	unsigned int order = get_order(size);
 
 	flags |= __GFP_COMP;
-	page = alloc_pages_node(node, flags, get_order(size));
-	if (page)
+	page = alloc_pages_node(node, flags, order);
+	if (page) {
 		ptr = page_address(page);
+		mod_node_page_state(page_pgdat(page), NR_SLAB_UNRECLAIMABLE,
+				    1 << order);
+	}
 
 	return kmalloc_large_node_hook(ptr, size, flags);
 }
@@ -3949,9 +3953,13 @@ void kfree(const void *x)
 
 	page = virt_to_head_page(x);
 	if (unlikely(!PageSlab(page))) {
+		unsigned int order = compound_order(page);
+
 		BUG_ON(!PageCompound(page));
 		kfree_hook(object);
-		__free_pages(page, compound_order(page));
+		mod_node_page_state(page_pgdat(page), NR_SLAB_UNRECLAIMABLE,
+				    -(1 << order));
+		__free_pages(page, order);
 		return;
 	}
 	slab_free(page->slab_cache, page, object, NULL, 1, _RET_IP_);
-- 
2.22.1


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH v2 2/2] mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)
  2019-08-26 11:16 [PATCH v2 0/2] guarantee natural alignment for kmalloc() Vlastimil Babka
  2019-08-26 11:16 ` [PATCH v2 1/2] mm, sl[ou]b: improve memory accounting Vlastimil Babka
@ 2019-08-26 11:16 ` Vlastimil Babka
  2019-08-28 18:45   ` Christopher Lameter
                     ` (2 more replies)
  1 sibling, 3 replies; 40+ messages in thread
From: Vlastimil Babka @ 2019-08-26 11:16 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: linux-kernel, Christoph Lameter, Pekka Enberg, David Rientjes,
	Ming Lei, Dave Chinner, Matthew Wilcox, Darrick J . Wong,
	Christoph Hellwig, linux-xfs, linux-fsdevel, linux-block,
	James Bottomley, linux-btrfs, Vlastimil Babka

In most configurations, kmalloc() happens to return naturally aligned (i.e.
aligned to the block size itself) blocks for power of two sizes. That means
some kmalloc() users might unknowingly rely on that alignment, until stuff
breaks when the kernel is built with e.g.  CONFIG_SLUB_DEBUG or CONFIG_SLOB,
and blocks stop being aligned. Then developers have to devise workaround such
as own kmem caches with specified alignment [1], which is not always practical,
as recently evidenced in [2].

The topic has been discussed at LSF/MM 2019 [3]. Adding a 'kmalloc_aligned()'
variant would not help with code unknowingly relying on the implicit alignment.
For slab implementations it would either require creating more kmalloc caches,
or allocate a larger size and only give back part of it. That would be
wasteful, especially with a generic alignment parameter (in contrast with a
fixed alignment to size).

Ideally we should provide to mm users what they need without difficult
workarounds or own reimplementations, so let's make the kmalloc() alignment to
size explicitly guaranteed for power-of-two sizes under all configurations.
What this means for the three available allocators?

* SLAB object layout happens to be mostly unchanged by the patch. The
  implicitly provided alignment could be compromised with CONFIG_DEBUG_SLAB due
  to redzoning, however SLAB disables redzoning for caches with alignment
  larger than unsigned long long. Practically on at least x86 this includes
  kmalloc caches as they use cache line alignment, which is larger than that.
  Still, this patch ensures alignment on all arches and cache sizes.

* SLUB layout is also unchanged unless redzoning is enabled through
  CONFIG_SLUB_DEBUG and boot parameter for the particular kmalloc cache. With
  this patch, explicit alignment is guaranteed with redzoning as well. This
  will result in more memory being wasted, but that should be acceptable in a
  debugging scenario.

* SLOB has no implicit alignment so this patch adds it explicitly for
  kmalloc(). The potential downside is increased fragmentation. While
  pathological allocation scenarios are certainly possible, in my testing,
  after booting a x86_64 kernel+userspace with virtme, around 16MB memory
  was consumed by slab pages both before and after the patch, with difference
  in the noise.

[1] https://lore.kernel.org/linux-btrfs/c3157c8e8e0e7588312b40c853f65c02fe6c957a.1566399731.git.christophe.leroy@c-s.fr/
[2] https://lore.kernel.org/linux-fsdevel/20190225040904.5557-1-ming.lei@redhat.com/
[3] https://lwn.net/Articles/787740/

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 Documentation/core-api/memory-allocation.rst |  4 ++
 include/linux/slab.h                         |  4 ++
 mm/slab_common.c                             | 11 ++++-
 mm/slob.c                                    | 42 +++++++++++++++-----
 4 files changed, 49 insertions(+), 12 deletions(-)

diff --git a/Documentation/core-api/memory-allocation.rst b/Documentation/core-api/memory-allocation.rst
index 7744aa3bf2e0..27c54854b508 100644
--- a/Documentation/core-api/memory-allocation.rst
+++ b/Documentation/core-api/memory-allocation.rst
@@ -98,6 +98,10 @@ limited. The actual limit depends on the hardware and the kernel
 configuration, but it is a good practice to use `kmalloc` for objects
 smaller than page size.
 
+The address of a chunk allocated with `kmalloc` is aligned to at least
+ARCH_KMALLOC_MINALIGN bytes. For sizes of power of two bytes, the
+alignment is also guaranteed to be at least to the respective size.
+
 For large allocations you can use :c:func:`vmalloc` and
 :c:func:`vzalloc`, or directly request pages from the page
 allocator. The memory allocated by `vmalloc` and related functions is
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 56c9c7eed34e..0d4c26395785 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -493,6 +493,10 @@ static __always_inline void *kmalloc_large(size_t size, gfp_t flags)
  * kmalloc is the normal method of allocating memory
  * for objects smaller than page size in the kernel.
  *
+ * The allocated object address is aligned to at least ARCH_KMALLOC_MINALIGN
+ * bytes. For @size of power of two bytes, the alignment is also guaranteed
+ * to be at least to the size.
+ *
  * The @flags argument may be one of the GFP flags defined at
  * include/linux/gfp.h and described at
  * :ref:`Documentation/core-api/mm-api.rst <mm-api-gfp-flags>`
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 929c02a90fba..b9ba93ad5c7f 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -993,10 +993,19 @@ void __init create_boot_cache(struct kmem_cache *s, const char *name,
 		unsigned int useroffset, unsigned int usersize)
 {
 	int err;
+	unsigned int align = ARCH_KMALLOC_MINALIGN;
 
 	s->name = name;
 	s->size = s->object_size = size;
-	s->align = calculate_alignment(flags, ARCH_KMALLOC_MINALIGN, size);
+
+	/*
+	 * For power of two sizes, guarantee natural alignment for kmalloc
+	 * caches, regardless of SL*B debugging options.
+	 */
+	if (is_power_of_2(size))
+		align = max(align, size);
+	s->align = calculate_alignment(flags, align, size);
+
 	s->useroffset = useroffset;
 	s->usersize = usersize;
 
diff --git a/mm/slob.c b/mm/slob.c
index 3dcde9cf2b17..07a39047aa54 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -224,6 +224,7 @@ static void slob_free_pages(void *b, int order)
  * @sp: Page to look in.
  * @size: Size of the allocation.
  * @align: Allocation alignment.
+ * @align_offset: Offset in the allocated block that will be aligned.
  * @page_removed_from_list: Return parameter.
  *
  * Tries to find a chunk of memory at least @size bytes big within @page.
@@ -234,7 +235,7 @@ static void slob_free_pages(void *b, int order)
  *         true (set to false otherwise).
  */
 static void *slob_page_alloc(struct page *sp, size_t size, int align,
-			     bool *page_removed_from_list)
+			      int align_offset, bool *page_removed_from_list)
 {
 	slob_t *prev, *cur, *aligned = NULL;
 	int delta = 0, units = SLOB_UNITS(size);
@@ -243,8 +244,17 @@ static void *slob_page_alloc(struct page *sp, size_t size, int align,
 	for (prev = NULL, cur = sp->freelist; ; prev = cur, cur = slob_next(cur)) {
 		slobidx_t avail = slob_units(cur);
 
+		/*
+		 * 'aligned' will hold the address of the slob block so that the
+		 * address 'aligned'+'align_offset' is aligned according to the
+		 * 'align' parameter. This is for kmalloc() which prepends the
+		 * allocated block with its size, so that the block itself is
+		 * aligned when needed.
+		 */
 		if (align) {
-			aligned = (slob_t *)ALIGN((unsigned long)cur, align);
+			aligned = (slob_t *)
+				(ALIGN((unsigned long)cur + align_offset, align)
+				 - align_offset);
 			delta = aligned - cur;
 		}
 		if (avail >= units + delta) { /* room enough? */
@@ -288,7 +298,8 @@ static void *slob_page_alloc(struct page *sp, size_t size, int align,
 /*
  * slob_alloc: entry point into the slob allocator.
  */
-static void *slob_alloc(size_t size, gfp_t gfp, int align, int node)
+static void *slob_alloc(size_t size, gfp_t gfp, int align, int node,
+							int align_offset)
 {
 	struct page *sp;
 	struct list_head *slob_list;
@@ -319,7 +330,7 @@ static void *slob_alloc(size_t size, gfp_t gfp, int align, int node)
 		if (sp->units < SLOB_UNITS(size))
 			continue;
 
-		b = slob_page_alloc(sp, size, align, &page_removed_from_list);
+		b = slob_page_alloc(sp, size, align, align_offset, &page_removed_from_list);
 		if (!b)
 			continue;
 
@@ -356,7 +367,7 @@ static void *slob_alloc(size_t size, gfp_t gfp, int align, int node)
 		INIT_LIST_HEAD(&sp->slab_list);
 		set_slob(b, SLOB_UNITS(PAGE_SIZE), b + SLOB_UNITS(PAGE_SIZE));
 		set_slob_page_free(sp, slob_list);
-		b = slob_page_alloc(sp, size, align, &_unused);
+		b = slob_page_alloc(sp, size, align, align_offset, &_unused);
 		BUG_ON(!b);
 		spin_unlock_irqrestore(&slob_lock, flags);
 	}
@@ -458,7 +469,7 @@ static __always_inline void *
 __do_kmalloc_node(size_t size, gfp_t gfp, int node, unsigned long caller)
 {
 	unsigned int *m;
-	int align = max_t(size_t, ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN);
+	int minalign = max_t(size_t, ARCH_KMALLOC_MINALIGN, ARCH_SLAB_MINALIGN);
 	void *ret;
 
 	gfp &= gfp_allowed_mask;
@@ -466,19 +477,28 @@ __do_kmalloc_node(size_t size, gfp_t gfp, int node, unsigned long caller)
 	fs_reclaim_acquire(gfp);
 	fs_reclaim_release(gfp);
 
-	if (size < PAGE_SIZE - align) {
+	if (size < PAGE_SIZE - minalign) {
+		int align = minalign;
+
+		/*
+		 * For power of two sizes, guarantee natural alignment for
+		 * kmalloc()'d objects.
+		 */
+		if (is_power_of_2(size))
+			align = max(minalign, (int) size);
+
 		if (!size)
 			return ZERO_SIZE_PTR;
 
-		m = slob_alloc(size + align, gfp, align, node);
+		m = slob_alloc(size + minalign, gfp, align, node, minalign);
 
 		if (!m)
 			return NULL;
 		*m = size;
-		ret = (void *)m + align;
+		ret = (void *)m + minalign;
 
 		trace_kmalloc_node(caller, ret,
-				   size, size + align, gfp, node);
+				   size, size + minalign, gfp, node);
 	} else {
 		unsigned int order = get_order(size);
 
@@ -579,7 +599,7 @@ static void *slob_alloc_node(struct kmem_cache *c, gfp_t flags, int node)
 	fs_reclaim_release(flags);
 
 	if (c->size < PAGE_SIZE) {
-		b = slob_alloc(c->size, flags, c->align, node);
+		b = slob_alloc(c->size, flags, c->align, node, 0);
 		trace_kmem_cache_alloc_node(_RET_IP_, b, c->object_size,
 					    SLOB_UNITS(c->size) * SLOB_UNIT,
 					    flags, node);
-- 
2.22.1


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 2/2] mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)
  2019-08-26 11:16 ` [PATCH v2 2/2] mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two) Vlastimil Babka
@ 2019-08-28 18:45   ` Christopher Lameter
  2019-08-28 19:46     ` Matthew Wilcox
  2019-09-23 16:36   ` Vlastimil Babka
  2019-09-23 18:57   ` Matthew Wilcox
  2 siblings, 1 reply; 40+ messages in thread
From: Christopher Lameter @ 2019-08-28 18:45 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-mm, linux-kernel, Pekka Enberg,
	David Rientjes, Ming Lei, Dave Chinner, Matthew Wilcox,
	Darrick J . Wong, Christoph Hellwig, linux-xfs, linux-fsdevel,
	linux-block, James Bottomley, linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 3428 bytes --]

On Mon, 26 Aug 2019, Vlastimil Babka wrote:

> The topic has been discussed at LSF/MM 2019 [3]. Adding a 'kmalloc_aligned()'
> variant would not help with code unknowingly relying on the implicit alignment.
> For slab implementations it would either require creating more kmalloc caches,
> or allocate a larger size and only give back part of it. That would be
> wasteful, especially with a generic alignment parameter (in contrast with a
> fixed alignment to size).

The additional caches will be detected if similar to existing ones and
merged into one. So the overhead is not that significant.

> Ideally we should provide to mm users what they need without difficult
> workarounds or own reimplementations, so let's make the kmalloc() alignment to
> size explicitly guaranteed for power-of-two sizes under all configurations.

The objection remains that this will create exceptions for the general
notion that all kmalloc caches are aligned to KMALLOC_MINALIGN which may
be suprising and it limits the optimizations that slab allocators may use
for optimizing data use. The SLOB allocator was designed in such a way
that data wastage is limited. The changes here sabotage that goal and show
that future slab allocators may be similarly constrained with the
exceptional alignents implemented. Additional debugging features etc etc
must all support the exceptional alignment requirements.

> * SLUB layout is also unchanged unless redzoning is enabled through
>   CONFIG_SLUB_DEBUG and boot parameter for the particular kmalloc cache. With
>   this patch, explicit alignment is guaranteed with redzoning as well. This
>   will result in more memory being wasted, but that should be acceptable in a
>   debugging scenario.

Well ok. That sounds fine (apart from breaking the rules for slab object
alignment).

> * SLOB has no implicit alignment so this patch adds it explicitly for
>   kmalloc(). The potential downside is increased fragmentation. While
>   pathological allocation scenarios are certainly possible, in my testing,
>   after booting a x86_64 kernel+userspace with virtme, around 16MB memory
>   was consumed by slab pages both before and after the patch, with difference
>   in the noise.

This change to slob will cause a significant additional use of memory. The
advertised advantage of SLOB is that *minimal* memory will be used since
it is targeted for embedded systems. Different types of slab objects of
varying sizes can be allocated in the same memory page to reduce
allocation overhead.

Having these exceptional rules for aligning power of two sizes caches
will significantly increase the memory wastage in SLOB.

The result of this patch is just to use more memory to be safe from
certain pathologies where one subsystem was relying on an alignment that
was not specified. That is why this approach should not be called
�natural" but "implicit alignment". The one using the slab cache is not
aware that the slab allocator provides objects aligned in a special way
(which is in general not needed. There seems to be a single pathological
case that needs to be addressed and I thought that was due to some
brokenness in the hardware?).

It is better to ensure that subsystems that require special alignment
explicitly tell the allocator about this.

I still think implicit exceptions to alignments are a bad idea. Those need
to be explicity specified and that is possible using kmem_cache_create().


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 2/2] mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)
  2019-08-28 18:45   ` Christopher Lameter
@ 2019-08-28 19:46     ` Matthew Wilcox
  2019-08-28 22:24       ` Dave Chinner
  2019-08-29  7:39       ` Michal Hocko
  0 siblings, 2 replies; 40+ messages in thread
From: Matthew Wilcox @ 2019-08-28 19:46 UTC (permalink / raw)
  To: Christopher Lameter
  Cc: Vlastimil Babka, Andrew Morton, linux-mm, linux-kernel,
	Pekka Enberg, David Rientjes, Ming Lei, Dave Chinner,
	Darrick J . Wong, Christoph Hellwig, linux-xfs, linux-fsdevel,
	linux-block, James Bottomley, linux-btrfs

On Wed, Aug 28, 2019 at 06:45:07PM +0000, Christopher Lameter wrote:
> > Ideally we should provide to mm users what they need without difficult
> > workarounds or own reimplementations, so let's make the kmalloc() alignment to
> > size explicitly guaranteed for power-of-two sizes under all configurations.
> 
> The objection remains that this will create exceptions for the general
> notion that all kmalloc caches are aligned to KMALLOC_MINALIGN which may

Hmm?  kmalloc caches will be aligned to both KMALLOC_MINALIGN and the
natural alignment of the object.

> be suprising and it limits the optimizations that slab allocators may use
> for optimizing data use. The SLOB allocator was designed in such a way
> that data wastage is limited. The changes here sabotage that goal and show
> that future slab allocators may be similarly constrained with the
> exceptional alignents implemented. Additional debugging features etc etc
> must all support the exceptional alignment requirements.

While I sympathise with the poor programmer who has to write the
fourth implementation of the sl*b interface, it's more for the pain of
picking a new letter than the pain of needing to honour the alignment
of allocations.

There are many places in the kernel which assume alignment.  They break
when it's not supplied.  I believe we have a better overall system if
the MM developers provide stronger guarantees than the MM consumers have
to work around only weak guarantees.

> > * SLOB has no implicit alignment so this patch adds it explicitly for
> >   kmalloc(). The potential downside is increased fragmentation. While
> >   pathological allocation scenarios are certainly possible, in my testing,
> >   after booting a x86_64 kernel+userspace with virtme, around 16MB memory
> >   was consumed by slab pages both before and after the patch, with difference
> >   in the noise.
> 
> This change to slob will cause a significant additional use of memory. The
> advertised advantage of SLOB is that *minimal* memory will be used since
> it is targeted for embedded systems. Different types of slab objects of
> varying sizes can be allocated in the same memory page to reduce
> allocation overhead.

Did you not read the part where he said the difference was in the noise?

> The result of this patch is just to use more memory to be safe from
> certain pathologies where one subsystem was relying on an alignment that
> was not specified. That is why this approach should not be called
> �natural" but "implicit alignment". The one using the slab cache is not
> aware that the slab allocator provides objects aligned in a special way
> (which is in general not needed. There seems to be a single pathological
> case that needs to be addressed and I thought that was due to some
> brokenness in the hardware?).

It turns out there are lots of places which assume this, including the
pmem driver, the ramdisk driver and a few other similar drivers.

> It is better to ensure that subsystems that require special alignment
> explicitly tell the allocator about this.

But it's not the subsystems which have this limitation which do the
allocation; it's the subsystems who allocate the memory that they then
pass to the subsystems.  So you're forcing communication of these limits
up & down the stack.

> I still think implicit exceptions to alignments are a bad idea. Those need
> to be explicity specified and that is possible using kmem_cache_create().

I swear we covered this last time the topic came up, but XFS would need
to create special slab caches for each size between 512 and PAGE_SIZE.
Potentially larger, depending on whether the MM developers are willing to
guarantee that kmalloc(PAGE_SIZE * 2, GFP_KERNEL) will return a PAGE_SIZE
aligned block of memory indefinitely.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 2/2] mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)
  2019-08-28 19:46     ` Matthew Wilcox
@ 2019-08-28 22:24       ` Dave Chinner
  2019-08-29  7:56         ` Vlastimil Babka
  2019-08-29  7:39       ` Michal Hocko
  1 sibling, 1 reply; 40+ messages in thread
From: Dave Chinner @ 2019-08-28 22:24 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Christopher Lameter, Vlastimil Babka, Andrew Morton, linux-mm,
	linux-kernel, Pekka Enberg, David Rientjes, Ming Lei,
	Darrick J . Wong, Christoph Hellwig, linux-xfs, linux-fsdevel,
	linux-block, James Bottomley, linux-btrfs

On Wed, Aug 28, 2019 at 12:46:08PM -0700, Matthew Wilcox wrote:
> On Wed, Aug 28, 2019 at 06:45:07PM +0000, Christopher Lameter wrote:
> > I still think implicit exceptions to alignments are a bad idea. Those need
> > to be explicity specified and that is possible using kmem_cache_create().
> 
> I swear we covered this last time the topic came up, but XFS would need
> to create special slab caches for each size between 512 and PAGE_SIZE.
> Potentially larger, depending on whether the MM developers are willing to
> guarantee that kmalloc(PAGE_SIZE * 2, GFP_KERNEL) will return a PAGE_SIZE
> aligned block of memory indefinitely.

Page size alignment of multi-page heap allocations is ncessary. The
current behaviour w/ KASAN is to offset so a 8KB allocation spans 3
pages and is not page aligned. That causes just as much in way
of alignment problems as unaligned objects in multi-object-per-page
slabs.

As I said in the lastest discussion of this problem on XFS (pmem
devices w/ KASAN enabled), all we -need- is a GFP flag that tells the
slab allocator to give us naturally aligned object or fail if it
can't. I don't care how that gets implemented (e.g. another set of
heap slabs like the -rcl slabs), I just don't want every high level
subsystem that allocates heap memory for IO buffers to have to
implement their own aligned slab caches.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 2/2] mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)
  2019-08-28 19:46     ` Matthew Wilcox
  2019-08-28 22:24       ` Dave Chinner
@ 2019-08-29  7:39       ` Michal Hocko
  2019-08-30 17:41         ` Christopher Lameter
  1 sibling, 1 reply; 40+ messages in thread
From: Michal Hocko @ 2019-08-29  7:39 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Christopher Lameter, Vlastimil Babka, Andrew Morton, linux-mm,
	linux-kernel, Pekka Enberg, David Rientjes, Ming Lei,
	Dave Chinner, Darrick J . Wong, Christoph Hellwig, linux-xfs,
	linux-fsdevel, linux-block, James Bottomley, linux-btrfs

On Wed 28-08-19 12:46:08, Matthew Wilcox wrote:
> On Wed, Aug 28, 2019 at 06:45:07PM +0000, Christopher Lameter wrote:
[...]
> > be suprising and it limits the optimizations that slab allocators may use
> > for optimizing data use. The SLOB allocator was designed in such a way
> > that data wastage is limited. The changes here sabotage that goal and show
> > that future slab allocators may be similarly constrained with the
> > exceptional alignents implemented. Additional debugging features etc etc
> > must all support the exceptional alignment requirements.
> 
> While I sympathise with the poor programmer who has to write the
> fourth implementation of the sl*b interface, it's more for the pain of
> picking a new letter than the pain of needing to honour the alignment
> of allocations.
> 
> There are many places in the kernel which assume alignment.  They break
> when it's not supplied.  I believe we have a better overall system if
> the MM developers provide stronger guarantees than the MM consumers have
> to work around only weak guarantees.

I absolutely agree. A hypothetical benefit of a new implementation
doesn't outweigh the complexity the existing code has to jump over or
worse is not aware of and it is broken silently. My general experience
is that the later is more likely with a large variety of drivers we have
in the tree and odd things they do in general.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 2/2] mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)
  2019-08-28 22:24       ` Dave Chinner
@ 2019-08-29  7:56         ` Vlastimil Babka
  2019-08-30  0:29           ` Dave Chinner
  0 siblings, 1 reply; 40+ messages in thread
From: Vlastimil Babka @ 2019-08-29  7:56 UTC (permalink / raw)
  To: Dave Chinner, Matthew Wilcox
  Cc: Christopher Lameter, Andrew Morton, linux-mm, linux-kernel,
	Pekka Enberg, David Rientjes, Ming Lei, Darrick J . Wong,
	Christoph Hellwig, linux-xfs, linux-fsdevel, linux-block,
	James Bottomley, linux-btrfs

On 8/29/19 12:24 AM, Dave Chinner wrote:
> On Wed, Aug 28, 2019 at 12:46:08PM -0700, Matthew Wilcox wrote:
>> On Wed, Aug 28, 2019 at 06:45:07PM +0000, Christopher Lameter wrote:
>>> I still think implicit exceptions to alignments are a bad idea. Those need
>>> to be explicity specified and that is possible using kmem_cache_create().
>>
>> I swear we covered this last time the topic came up, but XFS would need
>> to create special slab caches for each size between 512 and PAGE_SIZE.
>> Potentially larger, depending on whether the MM developers are willing to
>> guarantee that kmalloc(PAGE_SIZE * 2, GFP_KERNEL) will return a PAGE_SIZE
>> aligned block of memory indefinitely.
> 
> Page size alignment of multi-page heap allocations is ncessary. The
> current behaviour w/ KASAN is to offset so a 8KB allocation spans 3
> pages and is not page aligned. That causes just as much in way
> of alignment problems as unaligned objects in multi-object-per-page
> slabs.

Ugh, multi-page (power of two) allocations *at the page allocator level*
simply have to be aligned, as that's how the buddy allocator has always
worked, and it would be madness to try relax that guarantee and require
an explicit flag at this point. The kmalloc wrapper with SLUB will pass
everything above 8KB directly to the page allocator, so that's fine too.
4k and 8k are the only (multi-)page sizes still managed as SLUB objects.
I would say that these sizes are the most striking example that it's
wrong not to align them without extra flags or special API variant.

> As I said in the lastest discussion of this problem on XFS (pmem
> devices w/ KASAN enabled), all we -need- is a GFP flag that tells the
> slab allocator to give us naturally aligned object or fail if it
> can't. I don't care how that gets implemented (e.g. another set of
> heap slabs like the -rcl slabs), I just don't want every high level

Given alignment is orthogonal to -rcl and dma-, would that be another
three sets? Or we assume that dma- would want it always, and complicate
the rules further? Funilly enough, SLOB would be the simplest case here.

> subsystem that allocates heap memory for IO buffers to have to
> implement their own aligned slab caches.

Definitely agree. I still hope we can do that even without a new flag/API.

> Cheers,
> 
> Dave.
> 


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 2/2] mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)
  2019-08-29  7:56         ` Vlastimil Babka
@ 2019-08-30  0:29           ` Dave Chinner
  0 siblings, 0 replies; 40+ messages in thread
From: Dave Chinner @ 2019-08-30  0:29 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Matthew Wilcox, Christopher Lameter, Andrew Morton, linux-mm,
	linux-kernel, Pekka Enberg, David Rientjes, Ming Lei,
	Darrick J . Wong, Christoph Hellwig, linux-xfs, linux-fsdevel,
	linux-block, James Bottomley, linux-btrfs

On Thu, Aug 29, 2019 at 09:56:13AM +0200, Vlastimil Babka wrote:
> On 8/29/19 12:24 AM, Dave Chinner wrote:
> > On Wed, Aug 28, 2019 at 12:46:08PM -0700, Matthew Wilcox wrote:
> >> On Wed, Aug 28, 2019 at 06:45:07PM +0000, Christopher Lameter wrote:
> >>> I still think implicit exceptions to alignments are a bad idea. Those need
> >>> to be explicity specified and that is possible using kmem_cache_create().
> >>
> >> I swear we covered this last time the topic came up, but XFS would need
> >> to create special slab caches for each size between 512 and PAGE_SIZE.
> >> Potentially larger, depending on whether the MM developers are willing to
> >> guarantee that kmalloc(PAGE_SIZE * 2, GFP_KERNEL) will return a PAGE_SIZE
> >> aligned block of memory indefinitely.
> > 
> > Page size alignment of multi-page heap allocations is ncessary. The
> > current behaviour w/ KASAN is to offset so a 8KB allocation spans 3
> > pages and is not page aligned. That causes just as much in way
> > of alignment problems as unaligned objects in multi-object-per-page
> > slabs.
> 
> Ugh, multi-page (power of two) allocations *at the page allocator level*
> simply have to be aligned, as that's how the buddy allocator has always
> worked, and it would be madness to try relax that guarantee and require
> an explicit flag at this point. The kmalloc wrapper with SLUB will pass
> everything above 8KB directly to the page allocator, so that's fine too.
> 4k and 8k are the only (multi-)page sizes still managed as SLUB objects.

On a 4kB page size box, yes.

On a 64kB page size system, 4/8kB allocations are still sub-page
objects and will have alignment issues. Hence right now we can't
assume a 4/8/16/32kB allocation will be page size aligned
anywhere, because they are heap allocations on 64kB page sized
machines.

> I would say that these sizes are the most striking example that it's
> wrong not to align them without extra flags or special API variant.

Yup, just pointing out that they aren't guaranteed alignment right
now on x86-64.

> > As I said in the lastest discussion of this problem on XFS (pmem
> > devices w/ KASAN enabled), all we -need- is a GFP flag that tells the
> > slab allocator to give us naturally aligned object or fail if it
> > can't. I don't care how that gets implemented (e.g. another set of
> > heap slabs like the -rcl slabs), I just don't want every high level
> 
> Given alignment is orthogonal to -rcl and dma-, would that be another
> three sets? Or we assume that dma- would want it always, and complicate
> the rules further? Funilly enough, SLOB would be the simplest case here.

Not my problem. :) All I'm pointing out is that the minimum
functionality we require is specifying individual allocations as
needing alignment. I've just implemented that API in XFS, so
whatever happens in the allocation infrastructure from this point
onwards is really just implementation optimisation for us now....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 2/2] mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)
  2019-08-29  7:39       ` Michal Hocko
@ 2019-08-30 17:41         ` Christopher Lameter
  2019-09-01  0:52           ` Matthew Wilcox
  0 siblings, 1 reply; 40+ messages in thread
From: Christopher Lameter @ 2019-08-30 17:41 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Matthew Wilcox, Vlastimil Babka, Andrew Morton, linux-mm,
	linux-kernel, Pekka Enberg, David Rientjes, Ming Lei,
	Dave Chinner, Darrick J . Wong, Christoph Hellwig, linux-xfs,
	linux-fsdevel, linux-block, James Bottomley, linux-btrfs

On Thu, 29 Aug 2019, Michal Hocko wrote:

> > There are many places in the kernel which assume alignment.  They break
> > when it's not supplied.  I believe we have a better overall system if
> > the MM developers provide stronger guarantees than the MM consumers have
> > to work around only weak guarantees.
>
> I absolutely agree. A hypothetical benefit of a new implementation
> doesn't outweigh the complexity the existing code has to jump over or
> worse is not aware of and it is broken silently. My general experience
> is that the later is more likely with a large variety of drivers we have
> in the tree and odd things they do in general.


The current behavior without special alignment for these caches has been
in the wild for over a decade. And this is now coming up?

There is one case now it seems with a broken hardware that has issues and
we now move to an alignment requirement from the slabs with exceptions and
this and that?

If there is an exceptional alignment requirement then that needs to be
communicated to the allocator. A special flag or create a special
kmem_cache or something.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 2/2] mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)
  2019-08-30 17:41         ` Christopher Lameter
@ 2019-09-01  0:52           ` Matthew Wilcox
  2019-09-03 20:13             ` Christopher Lameter
  0 siblings, 1 reply; 40+ messages in thread
From: Matthew Wilcox @ 2019-09-01  0:52 UTC (permalink / raw)
  To: Christopher Lameter
  Cc: Michal Hocko, Vlastimil Babka, Andrew Morton, linux-mm,
	linux-kernel, Pekka Enberg, David Rientjes, Ming Lei,
	Dave Chinner, Darrick J . Wong, Christoph Hellwig, linux-xfs,
	linux-fsdevel, linux-block, James Bottomley, linux-btrfs

On Fri, Aug 30, 2019 at 05:41:46PM +0000, Christopher Lameter wrote:
> On Thu, 29 Aug 2019, Michal Hocko wrote:
> > > There are many places in the kernel which assume alignment.  They break
> > > when it's not supplied.  I believe we have a better overall system if
> > > the MM developers provide stronger guarantees than the MM consumers have
> > > to work around only weak guarantees.
> >
> > I absolutely agree. A hypothetical benefit of a new implementation
> > doesn't outweigh the complexity the existing code has to jump over or
> > worse is not aware of and it is broken silently. My general experience
> > is that the later is more likely with a large variety of drivers we have
> > in the tree and odd things they do in general.
> 
> The current behavior without special alignment for these caches has been
> in the wild for over a decade. And this is now coming up?

In the wild ... and rarely enabled.  When it is enabled, it may or may
not be noticed as data corruption, or tripping other debugging asserts.
Users then turn off the rare debugging option.

> There is one case now it seems with a broken hardware that has issues and
> we now move to an alignment requirement from the slabs with exceptions and
> this and that?

Perhaps you could try reading what hasa been written instead of sticking
to a strawman of your own invention?

> If there is an exceptional alignment requirement then that needs to be
> communicated to the allocator. A special flag or create a special
> kmem_cache or something.

The only way I'd agree to that is if we deliberately misalign every
allocation that doesn't have this special flag set.  Because right now,
breakage happens everywhere when these debug options are enabled, and
the very people who need to be helped are being hurt by the debugging.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 2/2] mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)
  2019-09-01  0:52           ` Matthew Wilcox
@ 2019-09-03 20:13             ` Christopher Lameter
  2019-09-03 20:53               ` Matthew Wilcox
  0 siblings, 1 reply; 40+ messages in thread
From: Christopher Lameter @ 2019-09-03 20:13 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Michal Hocko, Vlastimil Babka, Andrew Morton, linux-mm,
	linux-kernel, Pekka Enberg, David Rientjes, Ming Lei,
	Dave Chinner, Darrick J . Wong, Christoph Hellwig, linux-xfs,
	linux-fsdevel, linux-block, James Bottomley, linux-btrfs

On Sat, 31 Aug 2019, Matthew Wilcox wrote:

> > The current behavior without special alignment for these caches has been
> > in the wild for over a decade. And this is now coming up?
>
> In the wild ... and rarely enabled.  When it is enabled, it may or may
> not be noticed as data corruption, or tripping other debugging asserts.
> Users then turn off the rare debugging option.

Its enabled in all full debug session as far as I know. Fedora for
example has been running this for ages to find breakage in device drivers
etc etc.

> > If there is an exceptional alignment requirement then that needs to be
> > communicated to the allocator. A special flag or create a special
> > kmem_cache or something.
>
> The only way I'd agree to that is if we deliberately misalign every
> allocation that doesn't have this special flag set.  Because right now,
> breakage happens everywhere when these debug options are enabled, and
> the very people who need to be helped are being hurt by the debugging.

That is customarily occurring for testing by adding "slub_debug" to the
kernel commandline (or adding debug kernel options) and since my
information is that this is done frequently (and has been for over a
decade now) I am having a hard time believing the stories of great
breakage here. These drivers were not tested with debugging on before?
Never ran with a debug kernel?

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 2/2] mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)
  2019-09-03 20:13             ` Christopher Lameter
@ 2019-09-03 20:53               ` Matthew Wilcox
  2019-09-04  5:19                 ` Christoph Hellwig
  2019-09-04 19:31                 ` Christopher Lameter
  0 siblings, 2 replies; 40+ messages in thread
From: Matthew Wilcox @ 2019-09-03 20:53 UTC (permalink / raw)
  To: Christopher Lameter
  Cc: Michal Hocko, Vlastimil Babka, Andrew Morton, linux-mm,
	linux-kernel, Pekka Enberg, David Rientjes, Ming Lei,
	Dave Chinner, Darrick J . Wong, Christoph Hellwig, linux-xfs,
	linux-fsdevel, linux-block, James Bottomley, linux-btrfs

On Tue, Sep 03, 2019 at 08:13:45PM +0000, Christopher Lameter wrote:
> On Sat, 31 Aug 2019, Matthew Wilcox wrote:
> 
> > > The current behavior without special alignment for these caches has been
> > > in the wild for over a decade. And this is now coming up?
> >
> > In the wild ... and rarely enabled.  When it is enabled, it may or may
> > not be noticed as data corruption, or tripping other debugging asserts.
> > Users then turn off the rare debugging option.
> 
> Its enabled in all full debug session as far as I know. Fedora for
> example has been running this for ages to find breakage in device drivers
> etc etc.

Are you telling me nobody uses the ramdisk driver on fedora?  Because
that's one of the affected drivers.

> > > If there is an exceptional alignment requirement then that needs to be
> > > communicated to the allocator. A special flag or create a special
> > > kmem_cache or something.
> >
> > The only way I'd agree to that is if we deliberately misalign every
> > allocation that doesn't have this special flag set.  Because right now,
> > breakage happens everywhere when these debug options are enabled, and
> > the very people who need to be helped are being hurt by the debugging.
> 
> That is customarily occurring for testing by adding "slub_debug" to the
> kernel commandline (or adding debug kernel options) and since my
> information is that this is done frequently (and has been for over a
> decade now) I am having a hard time believing the stories of great
> breakage here. These drivers were not tested with debugging on before?
> Never ran with a debug kernel?

Whatever is being done is clearly not enough to trigger the bug.  So how
about it?  Create an option to slab/slub to always return misaligned
memory.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 2/2] mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)
  2019-09-03 20:53               ` Matthew Wilcox
@ 2019-09-04  5:19                 ` Christoph Hellwig
  2019-09-04  6:40                   ` Ming Lei
  2019-09-04 19:31                 ` Christopher Lameter
  1 sibling, 1 reply; 40+ messages in thread
From: Christoph Hellwig @ 2019-09-04  5:19 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Christopher Lameter, Michal Hocko, Vlastimil Babka,
	Andrew Morton, linux-mm, linux-kernel, Pekka Enberg,
	David Rientjes, Ming Lei, Dave Chinner, Darrick J . Wong,
	Christoph Hellwig, linux-xfs, linux-fsdevel, linux-block,
	James Bottomley, linux-btrfs

On Tue, Sep 03, 2019 at 01:53:12PM -0700, Matthew Wilcox wrote:
> > Its enabled in all full debug session as far as I know. Fedora for
> > example has been running this for ages to find breakage in device drivers
> > etc etc.
> 
> Are you telling me nobody uses the ramdisk driver on fedora?  Because
> that's one of the affected drivers.

For pmem/brd misaligned memory alone doesn't seem to be the problem.
Misaligned memory that cross a page barrier is.  And at least XFS
before my log recovery changes only used kmalloc for smaller than
page size allocation, so this case probably didn't hit.  But other
cases where alignment and not just not crossing a page boundary
occurred and we had problems with those before.  It just too a long
time for people to root cause them.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 2/2] mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)
  2019-09-04  5:19                 ` Christoph Hellwig
@ 2019-09-04  6:40                   ` Ming Lei
  2019-09-04  7:20                     ` Vlastimil Babka
  0 siblings, 1 reply; 40+ messages in thread
From: Ming Lei @ 2019-09-04  6:40 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Matthew Wilcox, Christopher Lameter, Michal Hocko,
	Vlastimil Babka, Andrew Morton, linux-mm, linux-kernel,
	Pekka Enberg, David Rientjes, Dave Chinner, Darrick J . Wong,
	linux-xfs, linux-fsdevel, linux-block, James Bottomley,
	linux-btrfs

On Wed, Sep 04, 2019 at 07:19:33AM +0200, Christoph Hellwig wrote:
> On Tue, Sep 03, 2019 at 01:53:12PM -0700, Matthew Wilcox wrote:
> > > Its enabled in all full debug session as far as I know. Fedora for
> > > example has been running this for ages to find breakage in device drivers
> > > etc etc.
> > 
> > Are you telling me nobody uses the ramdisk driver on fedora?  Because
> > that's one of the affected drivers.
> 
> For pmem/brd misaligned memory alone doesn't seem to be the problem.
> Misaligned memory that cross a page barrier is.  And at least XFS
> before my log recovery changes only used kmalloc for smaller than
> page size allocation, so this case probably didn't hit.

BTW, does sl[aou]b guarantee that smaller than page size allocation via kmalloc()
won't cross page boundary any time?

Thanks,
Ming

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 2/2] mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)
  2019-09-04  6:40                   ` Ming Lei
@ 2019-09-04  7:20                     ` Vlastimil Babka
  0 siblings, 0 replies; 40+ messages in thread
From: Vlastimil Babka @ 2019-09-04  7:20 UTC (permalink / raw)
  To: Ming Lei, Christoph Hellwig
  Cc: Matthew Wilcox, Christopher Lameter, Michal Hocko, Andrew Morton,
	linux-mm, linux-kernel, Pekka Enberg, David Rientjes,
	Dave Chinner, Darrick J . Wong, linux-xfs, linux-fsdevel,
	linux-block, James Bottomley, linux-btrfs

On 9/4/19 8:40 AM, Ming Lei wrote:
> On Wed, Sep 04, 2019 at 07:19:33AM +0200, Christoph Hellwig wrote:
>> On Tue, Sep 03, 2019 at 01:53:12PM -0700, Matthew Wilcox wrote:
>>>> Its enabled in all full debug session as far as I know. Fedora for
>>>> example has been running this for ages to find breakage in device drivers
>>>> etc etc.
>>>
>>> Are you telling me nobody uses the ramdisk driver on fedora?  Because
>>> that's one of the affected drivers.
>>
>> For pmem/brd misaligned memory alone doesn't seem to be the problem.
>> Misaligned memory that cross a page barrier is.  And at least XFS
>> before my log recovery changes only used kmalloc for smaller than
>> page size allocation, so this case probably didn't hit.
> 
> BTW, does sl[aou]b guarantee that smaller than page size allocation via kmalloc()
> won't cross page boundary any time?

AFAIK no, it's the same as with the alignment. After this patch, that
guarantee would follow from the alignment one for power of two sizes,
but sizes 96 and 192 could still cross page boundary.

> Thanks,
> Ming
> 


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 2/2] mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)
  2019-09-03 20:53               ` Matthew Wilcox
  2019-09-04  5:19                 ` Christoph Hellwig
@ 2019-09-04 19:31                 ` Christopher Lameter
  1 sibling, 0 replies; 40+ messages in thread
From: Christopher Lameter @ 2019-09-04 19:31 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Michal Hocko, Vlastimil Babka, Andrew Morton, linux-mm,
	linux-kernel, Pekka Enberg, David Rientjes, Ming Lei,
	Dave Chinner, Darrick J . Wong, Christoph Hellwig, linux-xfs,
	linux-fsdevel, linux-block, James Bottomley, linux-btrfs

On Tue, 3 Sep 2019, Matthew Wilcox wrote

> > Its enabled in all full debug session as far as I know. Fedora for
> > example has been running this for ages to find breakage in device drivers
> > etc etc.
>
> Are you telling me nobody uses the ramdisk driver on fedora?  Because
> that's one of the affected drivers.

How do I know? I dont run these tests.

> > decade now) I am having a hard time believing the stories of great
> > breakage here. These drivers were not tested with debugging on before?
> > Never ran with a debug kernel?
>
> Whatever is being done is clearly not enough to trigger the bug.  So how
> about it?  Create an option to slab/slub to always return misaligned
> memory.

It already exists

Add "slub_debug" on the kernel command line.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 2/2] mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)
  2019-08-26 11:16 ` [PATCH v2 2/2] mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two) Vlastimil Babka
  2019-08-28 18:45   ` Christopher Lameter
@ 2019-09-23 16:36   ` Vlastimil Babka
  2019-09-23 17:17     ` David Sterba
                       ` (3 more replies)
  2019-09-23 18:57   ` Matthew Wilcox
  2 siblings, 4 replies; 40+ messages in thread
From: Vlastimil Babka @ 2019-09-23 16:36 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: linux-kernel, Christoph Lameter, Pekka Enberg, David Rientjes,
	Ming Lei, Dave Chinner, Matthew Wilcox, Darrick J . Wong,
	Christoph Hellwig, linux-xfs, linux-fsdevel, linux-block,
	James Bottomley, linux-btrfs, Roman Gushchin, Johannes Weiner

On 8/26/19 1:16 PM, Vlastimil Babka wrote:
> In most configurations, kmalloc() happens to return naturally aligned (i.e.
> aligned to the block size itself) blocks for power of two sizes. That means
> some kmalloc() users might unknowingly rely on that alignment, until stuff
> breaks when the kernel is built with e.g.  CONFIG_SLUB_DEBUG or CONFIG_SLOB,
> and blocks stop being aligned. Then developers have to devise workaround such
> as own kmem caches with specified alignment [1], which is not always practical,
> as recently evidenced in [2].
> 
> The topic has been discussed at LSF/MM 2019 [3]. Adding a 'kmalloc_aligned()'
> variant would not help with code unknowingly relying on the implicit alignment.
> For slab implementations it would either require creating more kmalloc caches,
> or allocate a larger size and only give back part of it. That would be
> wasteful, especially with a generic alignment parameter (in contrast with a
> fixed alignment to size).
> 
> Ideally we should provide to mm users what they need without difficult
> workarounds or own reimplementations, so let's make the kmalloc() alignment to
> size explicitly guaranteed for power-of-two sizes under all configurations.
> What this means for the three available allocators?
> 
> * SLAB object layout happens to be mostly unchanged by the patch. The
>   implicitly provided alignment could be compromised with CONFIG_DEBUG_SLAB due
>   to redzoning, however SLAB disables redzoning for caches with alignment
>   larger than unsigned long long. Practically on at least x86 this includes
>   kmalloc caches as they use cache line alignment, which is larger than that.
>   Still, this patch ensures alignment on all arches and cache sizes.
> 
> * SLUB layout is also unchanged unless redzoning is enabled through
>   CONFIG_SLUB_DEBUG and boot parameter for the particular kmalloc cache. With
>   this patch, explicit alignment is guaranteed with redzoning as well. This
>   will result in more memory being wasted, but that should be acceptable in a
>   debugging scenario.
> 
> * SLOB has no implicit alignment so this patch adds it explicitly for
>   kmalloc(). The potential downside is increased fragmentation. While
>   pathological allocation scenarios are certainly possible, in my testing,
>   after booting a x86_64 kernel+userspace with virtme, around 16MB memory
>   was consumed by slab pages both before and after the patch, with difference
>   in the noise.
> 
> [1] https://lore.kernel.org/linux-btrfs/c3157c8e8e0e7588312b40c853f65c02fe6c957a.1566399731.git.christophe.leroy@c-s.fr/
> [2] https://lore.kernel.org/linux-fsdevel/20190225040904.5557-1-ming.lei@redhat.com/
> [3] https://lwn.net/Articles/787740/
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

So if anyone thinks this is a good idea, please express it (preferably
in a formal way such as Acked-by), otherwise it seems the patch will be
dropped (due to a private NACK, apparently).

Otherwise I don't think there can be an objective conclusion. On the one
hand we avoid further problems and workarounds due to misalignment (or
objects allocated beyond page boundary, which was only recently
mentioned), on the other hand we potentially make future changes to
SLAB/SLUB or hypotetical new implementation either more complicated, or
less effective due to extra fragmentation. Different people can have
different opinions on what's more important.

Let me however explain why I think we don't have to fear the future
implementation complications that much. There was an argument IIRC that
extra non-debug metadata could start to be prepended/appended to an
object in the future (i.e. RCU freeing head?).

1) Caches can be already created with explicit alignment, so a naive
pre/appending implementation would already waste memory on such caches.
2) Even without explicit alignment, a single slab cache for 512k objects
with few bytes added to each object would waste almost 512k as the
objects wouldn't fit precisely in an (order-X) page. The percentage
wasted depends on X.
3) Roman recently posted a patchset [1] that basically adds a cgroup
pointer to each object. The implementation doesn't append it to objects
naively however, but adds a separately allocated array. Alignment is
thus unchanged.

[1] https://lore.kernel.org/linux-mm/20190905214553.1643060-1-guro@fb.com/


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 2/2] mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)
  2019-09-23 16:36   ` Vlastimil Babka
@ 2019-09-23 17:17     ` David Sterba
  2019-09-23 17:51       ` Darrick J. Wong
  2019-09-24 20:52       ` cl
  2019-09-23 17:54     ` Roman Gushchin
                       ` (2 subsequent siblings)
  3 siblings, 2 replies; 40+ messages in thread
From: David Sterba @ 2019-09-23 17:17 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-mm, linux-kernel, Christoph Lameter,
	Pekka Enberg, David Rientjes, Ming Lei, Dave Chinner,
	Matthew Wilcox, Darrick J . Wong, Christoph Hellwig, linux-xfs,
	linux-fsdevel, linux-block, James Bottomley, linux-btrfs,
	Roman Gushchin, Johannes Weiner

On Mon, Sep 23, 2019 at 06:36:32PM +0200, Vlastimil Babka wrote:
> So if anyone thinks this is a good idea, please express it (preferably
> in a formal way such as Acked-by), otherwise it seems the patch will be
> dropped (due to a private NACK, apparently).

As a user of the allocator interface in filesystem, I'd like to see a
more generic way to address the alignment guarantees so we don't have to
apply workarounds like 3acd48507dc43eeeb each time we find that we
missed something. (Where 'missed' might be another sort of weird memory
corruption hard to trigger.)

The workaround got applied because I was not sure about the timeframe of
merge of this patch, also to remove pressure for merge in case there are
more private acks and nacks to be sent. In the end I'd be fine with
reverting the workaround in order to use the generic code again.

Thanks.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 2/2] mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)
  2019-09-23 17:17     ` David Sterba
@ 2019-09-23 17:51       ` Darrick J. Wong
  2019-09-24 20:47         ` cl
  2019-09-24 21:19         ` Vlastimil Babka
  2019-09-24 20:52       ` cl
  1 sibling, 2 replies; 40+ messages in thread
From: Darrick J. Wong @ 2019-09-23 17:51 UTC (permalink / raw)
  To: dsterba, Vlastimil Babka, Andrew Morton, linux-mm, linux-kernel,
	Christoph Lameter, Pekka Enberg, David Rientjes, Ming Lei,
	Dave Chinner, Matthew Wilcox, Christoph Hellwig, linux-xfs,
	linux-fsdevel, linux-block, James Bottomley, linux-btrfs,
	Roman Gushchin, Johannes Weiner

On Mon, Sep 23, 2019 at 07:17:10PM +0200, David Sterba wrote:
> On Mon, Sep 23, 2019 at 06:36:32PM +0200, Vlastimil Babka wrote:
> > So if anyone thinks this is a good idea, please express it (preferably
> > in a formal way such as Acked-by), otherwise it seems the patch will be
> > dropped (due to a private NACK, apparently).

Oh, I didn't realize  ^^^^^^^^^^^^ that *some* of us are allowed the
privilege of gutting a patch via private NAK without any of that open
development discussion incovenience. <grumble>

As far as XFS is concerned I merged Dave's series that checks the
alignment of io memory allocations and falls back to vmalloc if the
alignment won't work, because I got tired of scrolling past the endless
discussion and bug reports and inaction spanning months.

Now this private NAK stuff helps me feel vindicated for merging it
despite my misgivings because now I can declare that "XFS will just work
around all the stupid broken sh*t it finds in the rest of the kernel".

--D

> As a user of the allocator interface in filesystem, I'd like to see a
> more generic way to address the alignment guarantees so we don't have to
> apply workarounds like 3acd48507dc43eeeb each time we find that we
> missed something. (Where 'missed' might be another sort of weird memory
> corruption hard to trigger.)
> 
> The workaround got applied because I was not sure about the timeframe of
> merge of this patch, also to remove pressure for merge in case there are
> more private acks and nacks to be sent. In the end I'd be fine with
> reverting the workaround in order to use the generic code again.
> 
> Thanks.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 2/2] mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)
  2019-09-23 16:36   ` Vlastimil Babka
  2019-09-23 17:17     ` David Sterba
@ 2019-09-23 17:54     ` Roman Gushchin
  2019-09-30  8:06     ` Christoph Hellwig
  2019-09-30  9:23     ` Michal Hocko
  3 siblings, 0 replies; 40+ messages in thread
From: Roman Gushchin @ 2019-09-23 17:54 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-mm, linux-kernel, Christoph Lameter,
	Pekka Enberg, David Rientjes, Ming Lei, Dave Chinner,
	Matthew Wilcox, Darrick J . Wong, Christoph Hellwig, linux-xfs,
	linux-fsdevel, linux-block, James Bottomley, linux-btrfs,
	Johannes Weiner

On Mon, Sep 23, 2019 at 06:36:32PM +0200, Vlastimil Babka wrote:
> On 8/26/19 1:16 PM, Vlastimil Babka wrote:
> > In most configurations, kmalloc() happens to return naturally aligned (i.e.
> > aligned to the block size itself) blocks for power of two sizes. That means
> > some kmalloc() users might unknowingly rely on that alignment, until stuff
> > breaks when the kernel is built with e.g.  CONFIG_SLUB_DEBUG or CONFIG_SLOB,
> > and blocks stop being aligned. Then developers have to devise workaround such
> > as own kmem caches with specified alignment [1], which is not always practical,
> > as recently evidenced in [2].
> > 
> > The topic has been discussed at LSF/MM 2019 [3]. Adding a 'kmalloc_aligned()'
> > variant would not help with code unknowingly relying on the implicit alignment.
> > For slab implementations it would either require creating more kmalloc caches,
> > or allocate a larger size and only give back part of it. That would be
> > wasteful, especially with a generic alignment parameter (in contrast with a
> > fixed alignment to size).
> > 
> > Ideally we should provide to mm users what they need without difficult
> > workarounds or own reimplementations, so let's make the kmalloc() alignment to
> > size explicitly guaranteed for power-of-two sizes under all configurations.
> > What this means for the three available allocators?
> > 
> > * SLAB object layout happens to be mostly unchanged by the patch. The
> >   implicitly provided alignment could be compromised with CONFIG_DEBUG_SLAB due
> >   to redzoning, however SLAB disables redzoning for caches with alignment
> >   larger than unsigned long long. Practically on at least x86 this includes
> >   kmalloc caches as they use cache line alignment, which is larger than that.
> >   Still, this patch ensures alignment on all arches and cache sizes.
> > 
> > * SLUB layout is also unchanged unless redzoning is enabled through
> >   CONFIG_SLUB_DEBUG and boot parameter for the particular kmalloc cache. With
> >   this patch, explicit alignment is guaranteed with redzoning as well. This
> >   will result in more memory being wasted, but that should be acceptable in a
> >   debugging scenario.
> > 
> > * SLOB has no implicit alignment so this patch adds it explicitly for
> >   kmalloc(). The potential downside is increased fragmentation. While
> >   pathological allocation scenarios are certainly possible, in my testing,
> >   after booting a x86_64 kernel+userspace with virtme, around 16MB memory
> >   was consumed by slab pages both before and after the patch, with difference
> >   in the noise.
> > 
> > [1] https://lore.kernel.org/linux-btrfs/c3157c8e8e0e7588312b40c853f65c02fe6c957a.1566399731.git.christophe.leroy@c-s.fr/
> > [2] https://lore.kernel.org/linux-fsdevel/20190225040904.5557-1-ming.lei@redhat.com/
> > [3] https://urldefense.proofpoint.com/v2/url?u=https-3A__lwn.net_Articles_787740_&d=DwICaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=jJYgtDM7QT-W-Fz_d29HYQ&m=UUX1YoPTOOycNowHuRP2ZnqwSwZFjAFrkQFrqstidZ0&s=Kt_XTKlh2qxbC_7ME44MV3_QWFVRHlI1p2EQZYP0uqY&e= 
> > 
> > Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> 
> So if anyone thinks this is a good idea, please express it (preferably
> in a formal way such as Acked-by), otherwise it seems the patch will be
> dropped (due to a private NACK, apparently).
> 
> Otherwise I don't think there can be an objective conclusion. On the one
> hand we avoid further problems and workarounds due to misalignment (or
> objects allocated beyond page boundary, which was only recently
> mentioned), on the other hand we potentially make future changes to
> SLAB/SLUB or hypotetical new implementation either more complicated, or
> less effective due to extra fragmentation. Different people can have
> different opinions on what's more important.
> 
> Let me however explain why I think we don't have to fear the future
> implementation complications that much. There was an argument IIRC that
> extra non-debug metadata could start to be prepended/appended to an
> object in the future (i.e. RCU freeing head?).
> 
> 1) Caches can be already created with explicit alignment, so a naive
> pre/appending implementation would already waste memory on such caches.
> 2) Even without explicit alignment, a single slab cache for 512k objects
> with few bytes added to each object would waste almost 512k as the
> objects wouldn't fit precisely in an (order-X) page. The percentage
> wasted depends on X.
> 3) Roman recently posted a patchset [1] that basically adds a cgroup
> pointer to each object. The implementation doesn't append it to objects
> naively however, but adds a separately allocated array. Alignment is
> thus unchanged.

To be fair here, we *might* want to put this pointer just after/before
the object to reduce the number of cache misses. I've put it into
a separate place mostly for simplicity reasons.

It's not an objection though, just a note.

Thanks!

> 
> [1] https://lore.kernel.org/linux-mm/20190905214553.1643060-1-guro@fb.com/
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 2/2] mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)
  2019-08-26 11:16 ` [PATCH v2 2/2] mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two) Vlastimil Babka
  2019-08-28 18:45   ` Christopher Lameter
  2019-09-23 16:36   ` Vlastimil Babka
@ 2019-09-23 18:57   ` Matthew Wilcox
  2 siblings, 0 replies; 40+ messages in thread
From: Matthew Wilcox @ 2019-09-23 18:57 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-mm, linux-kernel, Christoph Lameter,
	Pekka Enberg, David Rientjes, Ming Lei, Dave Chinner,
	Darrick J . Wong, Christoph Hellwig, linux-xfs, linux-fsdevel,
	linux-block, James Bottomley, linux-btrfs

On Mon, Aug 26, 2019 at 01:16:27PM +0200, Vlastimil Babka wrote:
> @@ -98,6 +98,10 @@ limited. The actual limit depends on the hardware and the kernel
>  configuration, but it is a good practice to use `kmalloc` for objects
>  smaller than page size.
>  
> +The address of a chunk allocated with `kmalloc` is aligned to at least
> +ARCH_KMALLOC_MINALIGN bytes. For sizes of power of two bytes, the

"For sizes which are a power of two, the"

> +alignment is also guaranteed to be at least to the respective size.

s/to //

Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 2/2] mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)
  2019-09-23 17:51       ` Darrick J. Wong
@ 2019-09-24 20:47         ` cl
  2019-09-24 20:51           ` Matthew Wilcox
  2019-09-24 21:19         ` Vlastimil Babka
  1 sibling, 1 reply; 40+ messages in thread
From: cl @ 2019-09-24 20:47 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: dsterba, Vlastimil Babka, Andrew Morton, linux-mm, linux-kernel,
	Pekka Enberg, David Rientjes, Ming Lei, Dave Chinner,
	Matthew Wilcox, Christoph Hellwig, linux-xfs, linux-fsdevel,
	linux-block, James Bottomley, linux-btrfs, Roman Gushchin,
	Johannes Weiner

On Mon, 23 Sep 2019, Darrick J. Wong wrote:

> On Mon, Sep 23, 2019 at 07:17:10PM +0200, David Sterba wrote:
> > On Mon, Sep 23, 2019 at 06:36:32PM +0200, Vlastimil Babka wrote:
> > > So if anyone thinks this is a good idea, please express it (preferably
> > > in a formal way such as Acked-by), otherwise it seems the patch will be
> > > dropped (due to a private NACK, apparently).
>
> Oh, I didn't realize  ^^^^^^^^^^^^ that *some* of us are allowed the
> privilege of gutting a patch via private NAK without any of that open
> development discussion incovenience. <grumble>

There was a public discussion about this issue and from what I can tell
the outcome was that the allocator already provides what you want. Which
was a mechanism to misalign objects and detect these issues. This
mechanism has been in use for over a decade.



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 2/2] mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)
  2019-09-24 20:47         ` cl
@ 2019-09-24 20:51           ` Matthew Wilcox
  2019-09-24 20:55             ` cl
  0 siblings, 1 reply; 40+ messages in thread
From: Matthew Wilcox @ 2019-09-24 20:51 UTC (permalink / raw)
  To: cl
  Cc: Darrick J. Wong, dsterba, Vlastimil Babka, Andrew Morton,
	linux-mm, linux-kernel, Pekka Enberg, David Rientjes, Ming Lei,
	Dave Chinner, Christoph Hellwig, linux-xfs, linux-fsdevel,
	linux-block, James Bottomley, linux-btrfs, Roman Gushchin,
	Johannes Weiner

On Tue, Sep 24, 2019 at 08:47:52PM +0000, cl@linux.com wrote:
> On Mon, 23 Sep 2019, Darrick J. Wong wrote:
> 
> > On Mon, Sep 23, 2019 at 07:17:10PM +0200, David Sterba wrote:
> > > On Mon, Sep 23, 2019 at 06:36:32PM +0200, Vlastimil Babka wrote:
> > > > So if anyone thinks this is a good idea, please express it (preferably
> > > > in a formal way such as Acked-by), otherwise it seems the patch will be
> > > > dropped (due to a private NACK, apparently).
> >
> > Oh, I didn't realize  ^^^^^^^^^^^^ that *some* of us are allowed the
> > privilege of gutting a patch via private NAK without any of that open
> > development discussion incovenience. <grumble>
> 
> There was a public discussion about this issue and from what I can tell
> the outcome was that the allocator already provides what you want. Which
> was a mechanism to misalign objects and detect these issues. This
> mechanism has been in use for over a decade.

You missed the important part, which was *ENABLED BY DEFAULT*.  People
who are enabling a debugging option to debug their issues, should not
have to first debug all the other issues that enabling that debugging
option uncovers!

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 2/2] mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)
  2019-09-23 17:17     ` David Sterba
  2019-09-23 17:51       ` Darrick J. Wong
@ 2019-09-24 20:52       ` cl
  2019-09-24 23:54         ` Andrew Morton
  1 sibling, 1 reply; 40+ messages in thread
From: cl @ 2019-09-24 20:52 UTC (permalink / raw)
  To: David Sterba
  Cc: Vlastimil Babka, Andrew Morton, linux-mm, linux-kernel,
	Pekka Enberg, David Rientjes, Ming Lei, Dave Chinner,
	Matthew Wilcox, Darrick J . Wong, Christoph Hellwig, linux-xfs,
	linux-fsdevel, linux-block, James Bottomley, linux-btrfs,
	Roman Gushchin, Johannes Weiner

On Mon, 23 Sep 2019, David Sterba wrote:

> As a user of the allocator interface in filesystem, I'd like to see a
> more generic way to address the alignment guarantees so we don't have to
> apply workarounds like 3acd48507dc43eeeb each time we find that we
> missed something. (Where 'missed' might be another sort of weird memory
> corruption hard to trigger.)

The alignment guarantees are clearly documented and objects are misaligned
in debugging kernels.

Looking at 3acd48507dc43eeeb:Looks like no one tested that patch with a
debug kernel or full debugging on until it hit mainline. Not good.

The consequence for the lack of proper testing is to make the production
kernel contain the debug measures?

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 2/2] mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)
  2019-09-24 20:51           ` Matthew Wilcox
@ 2019-09-24 20:55             ` cl
  2019-09-26 13:02               ` David Sterba
  0 siblings, 1 reply; 40+ messages in thread
From: cl @ 2019-09-24 20:55 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Darrick J. Wong, dsterba, Vlastimil Babka, Andrew Morton,
	linux-mm, linux-kernel, Pekka Enberg, David Rientjes, Ming Lei,
	Dave Chinner, Christoph Hellwig, linux-xfs, linux-fsdevel,
	linux-block, James Bottomley, linux-btrfs, Roman Gushchin,
	Johannes Weiner

n Tue, 24 Sep 2019, Matthew Wilcox wrote:

> > There was a public discussion about this issue and from what I can tell
> > the outcome was that the allocator already provides what you want. Which
> > was a mechanism to misalign objects and detect these issues. This
> > mechanism has been in use for over a decade.
>
> You missed the important part, which was *ENABLED BY DEFAULT*.  People
> who are enabling a debugging option to debug their issues, should not
> have to first debug all the other issues that enabling that debugging
> option uncovers!

Why would you have to debug all other issues? You could put your patch on
top of the latest stable or distro kernel for testing.

And I thought the rc phase was there for everyone to work on the bugs of
each other?

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 2/2] mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)
  2019-09-23 17:51       ` Darrick J. Wong
  2019-09-24 20:47         ` cl
@ 2019-09-24 21:19         ` Vlastimil Babka
  2019-09-24 21:53           ` Dave Chinner
  1 sibling, 1 reply; 40+ messages in thread
From: Vlastimil Babka @ 2019-09-24 21:19 UTC (permalink / raw)
  To: Darrick J. Wong, dsterba, Andrew Morton, linux-mm, linux-kernel,
	Christoph Lameter, Pekka Enberg, David Rientjes, Ming Lei,
	Dave Chinner, Matthew Wilcox, Christoph Hellwig, linux-xfs,
	linux-fsdevel, linux-block, James Bottomley, linux-btrfs,
	Roman Gushchin, Johannes Weiner

On 9/23/19 7:51 PM, Darrick J. Wong wrote:
> On Mon, Sep 23, 2019 at 07:17:10PM +0200, David Sterba wrote:
>> On Mon, Sep 23, 2019 at 06:36:32PM +0200, Vlastimil Babka wrote:
>>> So if anyone thinks this is a good idea, please express it (preferably
>>> in a formal way such as Acked-by), otherwise it seems the patch will be
>>> dropped (due to a private NACK, apparently).
> 
> Oh, I didn't realize  ^^^^^^^^^^^^ that *some* of us are allowed the
> privilege of gutting a patch via private NAK without any of that open
> development discussion incovenience. <grumble>
> 
> As far as XFS is concerned I merged Dave's series that checks the
> alignment of io memory allocations and falls back to vmalloc if the
> alignment won't work, because I got tired of scrolling past the endless
> discussion and bug reports and inaction spanning months.

I think it's a big fail of kmalloc API that you have to do that, and
especially with vmalloc, which has the overhead of setting up page
tables, and it's a waste for allocation requests smaller than page size.
I wish we could have nice things.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 2/2] mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)
  2019-09-24 21:19         ` Vlastimil Babka
@ 2019-09-24 21:53           ` Dave Chinner
  2019-09-24 22:21             ` Darrick J. Wong
  0 siblings, 1 reply; 40+ messages in thread
From: Dave Chinner @ 2019-09-24 21:53 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Darrick J. Wong, dsterba, Andrew Morton, linux-mm, linux-kernel,
	Christoph Lameter, Pekka Enberg, David Rientjes, Ming Lei,
	Matthew Wilcox, Christoph Hellwig, linux-xfs, linux-fsdevel,
	linux-block, James Bottomley, linux-btrfs, Roman Gushchin,
	Johannes Weiner

On Tue, Sep 24, 2019 at 11:19:29PM +0200, Vlastimil Babka wrote:
> On 9/23/19 7:51 PM, Darrick J. Wong wrote:
> > On Mon, Sep 23, 2019 at 07:17:10PM +0200, David Sterba wrote:
> >> On Mon, Sep 23, 2019 at 06:36:32PM +0200, Vlastimil Babka wrote:
> >>> So if anyone thinks this is a good idea, please express it (preferably
> >>> in a formal way such as Acked-by), otherwise it seems the patch will be
> >>> dropped (due to a private NACK, apparently).
> > 
> > Oh, I didn't realize  ^^^^^^^^^^^^ that *some* of us are allowed the
> > privilege of gutting a patch via private NAK without any of that open
> > development discussion incovenience. <grumble>
> > 
> > As far as XFS is concerned I merged Dave's series that checks the
> > alignment of io memory allocations and falls back to vmalloc if the
> > alignment won't work, because I got tired of scrolling past the endless
> > discussion and bug reports and inaction spanning months.
> 
> I think it's a big fail of kmalloc API that you have to do that, and
> especially with vmalloc, which has the overhead of setting up page
> tables, and it's a waste for allocation requests smaller than page size.
> I wish we could have nice things.

I don't think the problem here is the code. The problem here is that
we have a dysfunctional development community and there are no
processes we can follow to ensure architectural problems in core
subsystems are addressed in a timely manner...

And this criticism isn't just of the mm/ here - this alignment
problem is exacerbated by exactly the same issue on the block layer
side. i.e. the block layer and drivers have -zero- bounds checking
to catch these sorts of things and the block layer maintainer will
not accept patches for runtime checks that would catch these issues
and make them instantly visible to us.

These are not code problems: we can fix the problems with code (and
I have done so to demonstrate "this is how we do what you say is
impossible").  The problem here is people in positions of
control/power are repeatedly demonstrating an inability to
compromise to reach a solution that works for everyone.

It's far better for us just to work around bullshit like this in XFS
now, then when the core subsystems get they act together years down
the track we can remove the workaround from XFS. Users don't care
how we fix the problem, they just want it fixed. If that means we
have to route around dysfunctional developer groups, then we'll just
have to do that....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 2/2] mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)
  2019-09-24 21:53           ` Dave Chinner
@ 2019-09-24 22:21             ` Darrick J. Wong
  0 siblings, 0 replies; 40+ messages in thread
From: Darrick J. Wong @ 2019-09-24 22:21 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Vlastimil Babka, dsterba, Andrew Morton, linux-mm, linux-kernel,
	Christoph Lameter, Pekka Enberg, David Rientjes, Ming Lei,
	Matthew Wilcox, Christoph Hellwig, linux-xfs, linux-fsdevel,
	linux-block, James Bottomley, linux-btrfs, Roman Gushchin,
	Johannes Weiner

On Wed, Sep 25, 2019 at 07:53:53AM +1000, Dave Chinner wrote:
> On Tue, Sep 24, 2019 at 11:19:29PM +0200, Vlastimil Babka wrote:
> > On 9/23/19 7:51 PM, Darrick J. Wong wrote:
> > > On Mon, Sep 23, 2019 at 07:17:10PM +0200, David Sterba wrote:
> > >> On Mon, Sep 23, 2019 at 06:36:32PM +0200, Vlastimil Babka wrote:
> > >>> So if anyone thinks this is a good idea, please express it (preferably
> > >>> in a formal way such as Acked-by), otherwise it seems the patch will be
> > >>> dropped (due to a private NACK, apparently).
> > > 
> > > Oh, I didn't realize  ^^^^^^^^^^^^ that *some* of us are allowed the
> > > privilege of gutting a patch via private NAK without any of that open
> > > development discussion incovenience. <grumble>
> > > 
> > > As far as XFS is concerned I merged Dave's series that checks the
> > > alignment of io memory allocations and falls back to vmalloc if the
> > > alignment won't work, because I got tired of scrolling past the endless
> > > discussion and bug reports and inaction spanning months.
> > 
> > I think it's a big fail of kmalloc API that you have to do that, and
> > especially with vmalloc, which has the overhead of setting up page
> > tables, and it's a waste for allocation requests smaller than page size.
> > I wish we could have nice things.
> 
> I don't think the problem here is the code. The problem here is that
> we have a dysfunctional development community and there are no
> processes we can follow to ensure architectural problems in core
> subsystems are addressed in a timely manner...
> 
> And this criticism isn't just of the mm/ here - this alignment
> problem is exacerbated by exactly the same issue on the block layer
> side. i.e. the block layer and drivers have -zero- bounds checking
> to catch these sorts of things and the block layer maintainer will
> not accept patches for runtime checks that would catch these issues
> and make them instantly visible to us.
> 
> These are not code problems: we can fix the problems with code (and
> I have done so to demonstrate "this is how we do what you say is
> impossible").  The problem here is people in positions of
> control/power are repeatedly demonstrating an inability to
> compromise to reach a solution that works for everyone.
> 
> It's far better for us just to work around bullshit like this in XFS
> now, then when the core subsystems get they act together years down
> the track we can remove the workaround from XFS. Users don't care
> how we fix the problem, they just want it fixed. If that means we
> have to route around dysfunctional developer groups, then we'll just
> have to do that....

Seconded.

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 2/2] mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)
  2019-09-24 20:52       ` cl
@ 2019-09-24 23:54         ` Andrew Morton
  2019-09-25  7:17           ` Vlastimil Babka
  2019-09-26  0:14           ` Christopher Lameter
  0 siblings, 2 replies; 40+ messages in thread
From: Andrew Morton @ 2019-09-24 23:54 UTC (permalink / raw)
  To: cl
  Cc: David Sterba, Vlastimil Babka, linux-mm, linux-kernel,
	Pekka Enberg, David Rientjes, Ming Lei, Dave Chinner,
	Matthew Wilcox, Darrick J . Wong, Christoph Hellwig, linux-xfs,
	linux-fsdevel, linux-block, James Bottomley, linux-btrfs,
	Roman Gushchin, Johannes Weiner

On Tue, 24 Sep 2019 20:52:52 +0000 (UTC) cl@linux.com wrote:

> On Mon, 23 Sep 2019, David Sterba wrote:
> 
> > As a user of the allocator interface in filesystem, I'd like to see a
> > more generic way to address the alignment guarantees so we don't have to
> > apply workarounds like 3acd48507dc43eeeb each time we find that we
> > missed something. (Where 'missed' might be another sort of weird memory
> > corruption hard to trigger.)
> 
> The alignment guarantees are clearly documented and objects are misaligned
> in debugging kernels.
> 
> Looking at 3acd48507dc43eeeb:Looks like no one tested that patch with a
> debug kernel or full debugging on until it hit mainline. Not good.
> 
> The consequence for the lack of proper testing is to make the production
> kernel contain the debug measures?

This isn't a debug measure - it's making the interface do that which
people evidently expect it to do.  Minor point.

I agree it's a bit regrettable to do this but it does appear that the
change will make the kernel overall a better place given the reality of
kernel development.

Given this, have you reviewed the patch for overall implementation
correctness?

I'm wondering if we can avoid at least some of the patch's overhead if
slab debugging is disabled - the allocators are already returning
suitably aligned memory, so why add the new code in that case?


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 2/2] mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)
  2019-09-24 23:54         ` Andrew Morton
@ 2019-09-25  7:17           ` Vlastimil Babka
  2019-09-26  0:16             ` Christopher Lameter
  2019-09-26  0:14           ` Christopher Lameter
  1 sibling, 1 reply; 40+ messages in thread
From: Vlastimil Babka @ 2019-09-25  7:17 UTC (permalink / raw)
  To: Andrew Morton, cl
  Cc: David Sterba, linux-mm, linux-kernel, Pekka Enberg,
	David Rientjes, Ming Lei, Dave Chinner, Matthew Wilcox,
	Darrick J . Wong, Christoph Hellwig, linux-xfs, linux-fsdevel,
	linux-block, James Bottomley, linux-btrfs, Roman Gushchin,
	Johannes Weiner

On 9/25/19 1:54 AM, Andrew Morton wrote:
> On Tue, 24 Sep 2019 20:52:52 +0000 (UTC) cl@linux.com wrote:
> 
>> On Mon, 23 Sep 2019, David Sterba wrote:
>>
>>> As a user of the allocator interface in filesystem, I'd like to see a
>>> more generic way to address the alignment guarantees so we don't have to
>>> apply workarounds like 3acd48507dc43eeeb each time we find that we
>>> missed something. (Where 'missed' might be another sort of weird memory
>>> corruption hard to trigger.)
>>
>> The alignment guarantees are clearly documented and objects are misaligned
>> in debugging kernels.
>>
>> Looking at 3acd48507dc43eeeb:Looks like no one tested that patch with a
>> debug kernel or full debugging on until it hit mainline. Not good.
>>
>> The consequence for the lack of proper testing is to make the production
>> kernel contain the debug measures?
> 
> This isn't a debug measure - it's making the interface do that which
> people evidently expect it to do.  Minor point.

Yes, detecting issues due to misalignment is one thing, but then there
are the workarounds necessary to achieve it (for multiple sizes, so no
single kmem_cache_create(..., alignment)), as XFS folks demonstrated.

> I agree it's a bit regrettable to do this but it does appear that the
> change will make the kernel overall a better place given the reality of
> kernel development.

Thanks.

> Given this, have you reviewed the patch for overall implementation
> correctness?
> 
> I'm wondering if we can avoid at least some of the patch's overhead if
> slab debugging is disabled - the allocators are already returning
> suitably aligned memory, so why add the new code in that case?

Most of the new code is for SLOB, which has no debugging and yet
misaligns. For SLUB and SLAB, it's just passing alignment argument to
kmem_cache_create() for kmalloc caches, which means just extra few
instructions during boot, and no extra code during kmalloc/kfree itself.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 2/2] mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)
  2019-09-24 23:54         ` Andrew Morton
  2019-09-25  7:17           ` Vlastimil Babka
@ 2019-09-26  0:14           ` Christopher Lameter
  2019-09-26  7:41             ` Vlastimil Babka
  1 sibling, 1 reply; 40+ messages in thread
From: Christopher Lameter @ 2019-09-26  0:14 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Sterba, Vlastimil Babka, linux-mm, linux-kernel,
	Pekka Enberg, David Rientjes, Ming Lei, Dave Chinner,
	Matthew Wilcox, Darrick J . Wong, Christoph Hellwig, linux-xfs,
	linux-fsdevel, linux-block, James Bottomley, linux-btrfs,
	Roman Gushchin, Johannes Weiner

On Tue, 24 Sep 2019, Andrew Morton wrote:

> I agree it's a bit regrettable to do this but it does appear that the
> change will make the kernel overall a better place given the reality of
> kernel development.

No it wont.

- It will only work for special cases like the kmalloc array
without extras like metadata at the end of objects.

- It will be an inconsistency in the alignments provided by the allocator.

- It will cause us in the future to constantly consider these exceptional
alignments in the maintenance of the allocators.

- These alignments are only needed in exceptional cases but with the patch
we will provide the alignment by default even if the allocating subsystem
does not need it.

- We have mechanisms to detect alignment problems using debug kernels and
debug options that have been available for years. These were not used for
testing in these cases it seems before the patches hit mainline. Once in
mainly someone ran a debug kernel and found the issue.

> Given this, have you reviewed the patch for overall implementation
> correctness?

Yes, the patch is fine.

> I'm wondering if we can avoid at least some of the patch's overhead if
> slab debugging is disabled - the allocators are already returning
> suitably aligned memory, so why add the new code in that case?

As far as I know this patch is not needed given that we have had the
standards for alignments for a long time now.

Why would the allocators provide specially aligned memory just based on
the size of an object? This is weird and unexpected behavior.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 2/2] mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)
  2019-09-25  7:17           ` Vlastimil Babka
@ 2019-09-26  0:16             ` Christopher Lameter
  0 siblings, 0 replies; 40+ messages in thread
From: Christopher Lameter @ 2019-09-26  0:16 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, David Sterba, linux-mm, linux-kernel,
	Pekka Enberg, David Rientjes, Ming Lei, Dave Chinner,
	Matthew Wilcox, Darrick J . Wong, Christoph Hellwig, linux-xfs,
	linux-fsdevel, linux-block, James Bottomley, linux-btrfs,
	Roman Gushchin, Johannes Weiner

On Wed, 25 Sep 2019, Vlastimil Babka wrote:

> Most of the new code is for SLOB, which has no debugging and yet
> misaligns. For SLUB and SLAB, it's just passing alignment argument to
> kmem_cache_create() for kmalloc caches, which means just extra few
> instructions during boot, and no extra code during kmalloc/kfree itself.

SLOB follows the standards for alignments in slab allocators and will
correctly align if you ask the allocator for a properly aligned object.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 2/2] mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)
  2019-09-26  0:14           ` Christopher Lameter
@ 2019-09-26  7:41             ` Vlastimil Babka
  2019-09-28  1:12               ` Christopher Lameter
  0 siblings, 1 reply; 40+ messages in thread
From: Vlastimil Babka @ 2019-09-26  7:41 UTC (permalink / raw)
  To: Christopher Lameter, Andrew Morton
  Cc: David Sterba, linux-mm, linux-kernel, Pekka Enberg,
	David Rientjes, Ming Lei, Dave Chinner, Matthew Wilcox,
	Darrick J . Wong, Christoph Hellwig, linux-xfs, linux-fsdevel,
	linux-block, James Bottomley, linux-btrfs, Roman Gushchin,
	Johannes Weiner

On 9/26/19 2:14 AM, Christopher Lameter wrote:
> On Tue, 24 Sep 2019, Andrew Morton wrote:
> 
>> I agree it's a bit regrettable to do this but it does appear that the
>> change will make the kernel overall a better place given the reality of
>> kernel development.
> 
> No it wont.
> 
> - It will only work for special cases like the kmalloc array
> without extras like metadata at the end of objects.

I don't understand what you mean here? The kmalloc caches are special
because they don't have metadata at the end of objects? Others do?

> - It will be an inconsistency in the alignments provided by the allocator.

I don't see a scenario where this will cause a kmalloc user problems.
Can you describe a scenario where a kmalloc users would have some
assumptions about alignment, but due to this change, those assumptions
will be incorrect, and how exactly would it break their code?

> - It will cause us in the future to constantly consider these exceptional
> alignments in the maintenance of the allocators.

Caches can be already created with explicit alignment. This patch just
means there are more of them.

> - These alignments are only needed in exceptional cases but with the patch
> we will provide the alignment by default even if the allocating subsystem
> does not need it.

True. This is where we have to make the decision whether to make things
simpler for those that don't realize they need the alignment, and
whether that's worth the cost. We have evidence of those cases, and the
cost is currently zero in the common cases (SLAB, SLUB without debug
runtime-enabled).

> - We have mechanisms to detect alignment problems using debug kernels and
> debug options that have been available for years. These were not used for
> testing in these cases it seems before the patches hit mainline. Once in
> mainly someone ran a debug kernel and found the issue.

Debugging options are useful if you know there's a bug and you want to
find it. AFAIK the various bots/CIs that do e.g. randconfig, or enable
debug options explicitly, run those kernels in a VM, so I guess that's
why potential breakage due to alignment can lurk in a hw-specific driver.

>> Given this, have you reviewed the patch for overall implementation
>> correctness?
> 
> Yes, the patch is fine.
> 
>> I'm wondering if we can avoid at least some of the patch's overhead if
>> slab debugging is disabled - the allocators are already returning
>> suitably aligned memory, so why add the new code in that case?
> 
> As far as I know this patch is not needed given that we have had the
> standards for alignments for a long time now.
> 
> Why would the allocators provide specially aligned memory just based on
> the size of an object? This is weird and unexpected behavior.

For some, it's expected.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 2/2] mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)
  2019-09-24 20:55             ` cl
@ 2019-09-26 13:02               ` David Sterba
  0 siblings, 0 replies; 40+ messages in thread
From: David Sterba @ 2019-09-26 13:02 UTC (permalink / raw)
  To: cl
  Cc: Matthew Wilcox, Darrick J. Wong, dsterba, Vlastimil Babka,
	Andrew Morton, linux-mm, linux-kernel, Pekka Enberg,
	David Rientjes, Ming Lei, Dave Chinner, Christoph Hellwig,
	linux-xfs, linux-fsdevel, linux-block, James Bottomley,
	linux-btrfs, Roman Gushchin, Johannes Weiner

On Tue, Sep 24, 2019 at 08:55:02PM +0000, cl@linux.com wrote:
> n Tue, 24 Sep 2019, Matthew Wilcox wrote:
> 
> > > There was a public discussion about this issue and from what I can tell
> > > the outcome was that the allocator already provides what you want. Which
> > > was a mechanism to misalign objects and detect these issues. This
> > > mechanism has been in use for over a decade.
> >
> > You missed the important part, which was *ENABLED BY DEFAULT*.  People
> > who are enabling a debugging option to debug their issues, should not
> > have to first debug all the other issues that enabling that debugging
> > option uncovers!
> 
> Why would you have to debug all other issues? You could put your patch on
> top of the latest stable or distro kernel for testing.

This does not work in development branches. They're based on some stable
point but otherwise there's a lot of new code that usually has bugs and
it's quite important be able to understand where the bug comes from.

And the debugging instrumentation is there to add more sanity checks and
canaries to catch overflows, assertions etc. If it's unreliable, then
there's no point using it during development, IOW fix all bugs first and
then see if there are more left after turning the debugging.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 2/2] mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)
  2019-09-26  7:41             ` Vlastimil Babka
@ 2019-09-28  1:12               ` Christopher Lameter
  2019-09-30 13:32                 ` Matthew Wilcox
  0 siblings, 1 reply; 40+ messages in thread
From: Christopher Lameter @ 2019-09-28  1:12 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, David Sterba, linux-mm, linux-kernel,
	Pekka Enberg, David Rientjes, Ming Lei, Dave Chinner,
	Matthew Wilcox, Darrick J . Wong, Christoph Hellwig, linux-xfs,
	linux-fsdevel, linux-block, James Bottomley, linux-btrfs,
	Roman Gushchin, Johannes Weiner

On Thu, 26 Sep 2019, Vlastimil Babka wrote:

> > - It will only work for special cases like the kmalloc array
> > without extras like metadata at the end of objects.
>
> I don't understand what you mean here? The kmalloc caches are special
> because they don't have metadata at the end of objects? Others do?

Yes.

> > - These alignments are only needed in exceptional cases but with the patch
> > we will provide the alignment by default even if the allocating subsystem
> > does not need it.
>
> True. This is where we have to make the decision whether to make things
> simpler for those that don't realize they need the alignment, and
> whether that's worth the cost. We have evidence of those cases, and the
> cost is currently zero in the common cases (SLAB, SLUB without debug
> runtime-enabled).

The cost is zero for a particular layout of the objects in a page using a
particular allocator and hardware configuration.

However, the layout may be different due to another allocator that prefers
to arrange things differently (SLOB puts multiple objects of different
types in the same page to save memory), if we need to add data to these
objects (debugging info, new metadata about the object, maybe the memcg
pointer, maybe other things that may come up), or other innovative
approaches (such as putting data of different kmem caches that are
commonly used together in the same page to improve locality).

The cost is an unnecessary petrification of the data layout of the memory
allocators.

> > - We have mechanisms to detect alignment problems using debug kernels and
> > debug options that have been available for years. These were not used for
> > testing in these cases it seems before the patches hit mainline. Once in
> > mainly someone ran a debug kernel and found the issue.
>
> Debugging options are useful if you know there's a bug and you want to
> find it. AFAIK the various bots/CIs that do e.g. randconfig, or enable
> debug options explicitly, run those kernels in a VM, so I guess that's
> why potential breakage due to alignment can lurk in a hw-specific driver.

That is not my experience. You need to run debugging to verify that a
patch does not cause locking problems, memory corruption etc etc. And
upstream code is tested by various people with debugging kernels so they
will locate the bugs that others introduce. This is usually not because
there was a focus on a particular bug. If you have a hw specific thing
that is not generally tested and skip the debugging tests well yes then we
have a problem.

What I have seen with developers is that they feel the debugging steps are
unnecessary for conveniences sake. I have seen build environments that had
proper steps for verification with a debug kernel. However, someone
disabled them "some months ago" and "nothing happened". Then strange
failures in production systems occur.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 2/2] mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)
  2019-09-23 16:36   ` Vlastimil Babka
  2019-09-23 17:17     ` David Sterba
  2019-09-23 17:54     ` Roman Gushchin
@ 2019-09-30  8:06     ` Christoph Hellwig
  2019-09-30  9:23     ` Michal Hocko
  3 siblings, 0 replies; 40+ messages in thread
From: Christoph Hellwig @ 2019-09-30  8:06 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-mm, linux-kernel, Christoph Lameter,
	Pekka Enberg, David Rientjes, Ming Lei, Dave Chinner,
	Matthew Wilcox, Darrick J . Wong, Christoph Hellwig, linux-xfs,
	linux-fsdevel, linux-block, James Bottomley, linux-btrfs,
	Roman Gushchin, Johannes Weiner

On Mon, Sep 23, 2019 at 06:36:32PM +0200, Vlastimil Babka wrote:
> So if anyone thinks this is a good idea, please express it (preferably
> in a formal way such as Acked-by), otherwise it seems the patch will be
> dropped (due to a private NACK, apparently).

I think we absolutely need something like this, and I'm sick and tired
of the people just claiming there is no problem.

From the user POV I don't care if aligned allocations need a new
GFP_ALIGNED flag or not, but as far as I can tell the latter will
probably cause more overhead in practice than not having it.

So unless someone comes up with a better counter proposal to provide
aligned kmalloc of some form that doesn't require a giant amount of
boilerplate code in the users:

Acked^2-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 2/2] mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)
  2019-09-23 16:36   ` Vlastimil Babka
                       ` (2 preceding siblings ...)
  2019-09-30  8:06     ` Christoph Hellwig
@ 2019-09-30  9:23     ` Michal Hocko
  2019-09-30  9:32       ` Kirill A. Shutemov
  3 siblings, 1 reply; 40+ messages in thread
From: Michal Hocko @ 2019-09-30  9:23 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-mm, linux-kernel, Christoph Lameter,
	Pekka Enberg, David Rientjes, Ming Lei, Dave Chinner,
	Matthew Wilcox, Darrick J . Wong, Christoph Hellwig, linux-xfs,
	linux-fsdevel, linux-block, James Bottomley, linux-btrfs,
	Roman Gushchin, Johannes Weiner

On Mon 23-09-19 18:36:32, Vlastimil Babka wrote:
> On 8/26/19 1:16 PM, Vlastimil Babka wrote:
> > In most configurations, kmalloc() happens to return naturally aligned (i.e.
> > aligned to the block size itself) blocks for power of two sizes. That means
> > some kmalloc() users might unknowingly rely on that alignment, until stuff
> > breaks when the kernel is built with e.g.  CONFIG_SLUB_DEBUG or CONFIG_SLOB,
> > and blocks stop being aligned. Then developers have to devise workaround such
> > as own kmem caches with specified alignment [1], which is not always practical,
> > as recently evidenced in [2].
> > 
> > The topic has been discussed at LSF/MM 2019 [3]. Adding a 'kmalloc_aligned()'
> > variant would not help with code unknowingly relying on the implicit alignment.
> > For slab implementations it would either require creating more kmalloc caches,
> > or allocate a larger size and only give back part of it. That would be
> > wasteful, especially with a generic alignment parameter (in contrast with a
> > fixed alignment to size).
> > 
> > Ideally we should provide to mm users what they need without difficult
> > workarounds or own reimplementations, so let's make the kmalloc() alignment to
> > size explicitly guaranteed for power-of-two sizes under all configurations.
> > What this means for the three available allocators?
> > 
> > * SLAB object layout happens to be mostly unchanged by the patch. The
> >   implicitly provided alignment could be compromised with CONFIG_DEBUG_SLAB due
> >   to redzoning, however SLAB disables redzoning for caches with alignment
> >   larger than unsigned long long. Practically on at least x86 this includes
> >   kmalloc caches as they use cache line alignment, which is larger than that.
> >   Still, this patch ensures alignment on all arches and cache sizes.
> > 
> > * SLUB layout is also unchanged unless redzoning is enabled through
> >   CONFIG_SLUB_DEBUG and boot parameter for the particular kmalloc cache. With
> >   this patch, explicit alignment is guaranteed with redzoning as well. This
> >   will result in more memory being wasted, but that should be acceptable in a
> >   debugging scenario.
> > 
> > * SLOB has no implicit alignment so this patch adds it explicitly for
> >   kmalloc(). The potential downside is increased fragmentation. While
> >   pathological allocation scenarios are certainly possible, in my testing,
> >   after booting a x86_64 kernel+userspace with virtme, around 16MB memory
> >   was consumed by slab pages both before and after the patch, with difference
> >   in the noise.
> > 
> > [1] https://lore.kernel.org/linux-btrfs/c3157c8e8e0e7588312b40c853f65c02fe6c957a.1566399731.git.christophe.leroy@c-s.fr/
> > [2] https://lore.kernel.org/linux-fsdevel/20190225040904.5557-1-ming.lei@redhat.com/
> > [3] https://lwn.net/Articles/787740/
> > 
> > Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> 
> So if anyone thinks this is a good idea, please express it (preferably
> in a formal way such as Acked-by), otherwise it seems the patch will be
> dropped (due to a private NACK, apparently).

Sigh.

An existing code to workaround the lack of alignment guarantee just show
that this is necessary. And there wasn't any real technical argument
against except for a highly theoretical optimizations/new allocator that
would be tight by the guarantee.

Therefore
Acked-by: Michal Hocko <mhocko@suse.com>

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 2/2] mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)
  2019-09-30  9:23     ` Michal Hocko
@ 2019-09-30  9:32       ` Kirill A. Shutemov
  0 siblings, 0 replies; 40+ messages in thread
From: Kirill A. Shutemov @ 2019-09-30  9:32 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Vlastimil Babka, Andrew Morton, linux-mm, linux-kernel,
	Christoph Lameter, Pekka Enberg, David Rientjes, Ming Lei,
	Dave Chinner, Matthew Wilcox, Darrick J . Wong,
	Christoph Hellwig, linux-xfs, linux-fsdevel, linux-block,
	James Bottomley, linux-btrfs, Roman Gushchin, Johannes Weiner

On Mon, Sep 30, 2019 at 11:23:34AM +0200, Michal Hocko wrote:
> On Mon 23-09-19 18:36:32, Vlastimil Babka wrote:
> > On 8/26/19 1:16 PM, Vlastimil Babka wrote:
> > > In most configurations, kmalloc() happens to return naturally aligned (i.e.
> > > aligned to the block size itself) blocks for power of two sizes. That means
> > > some kmalloc() users might unknowingly rely on that alignment, until stuff
> > > breaks when the kernel is built with e.g.  CONFIG_SLUB_DEBUG or CONFIG_SLOB,
> > > and blocks stop being aligned. Then developers have to devise workaround such
> > > as own kmem caches with specified alignment [1], which is not always practical,
> > > as recently evidenced in [2].
> > > 
> > > The topic has been discussed at LSF/MM 2019 [3]. Adding a 'kmalloc_aligned()'
> > > variant would not help with code unknowingly relying on the implicit alignment.
> > > For slab implementations it would either require creating more kmalloc caches,
> > > or allocate a larger size and only give back part of it. That would be
> > > wasteful, especially with a generic alignment parameter (in contrast with a
> > > fixed alignment to size).
> > > 
> > > Ideally we should provide to mm users what they need without difficult
> > > workarounds or own reimplementations, so let's make the kmalloc() alignment to
> > > size explicitly guaranteed for power-of-two sizes under all configurations.
> > > What this means for the three available allocators?
> > > 
> > > * SLAB object layout happens to be mostly unchanged by the patch. The
> > >   implicitly provided alignment could be compromised with CONFIG_DEBUG_SLAB due
> > >   to redzoning, however SLAB disables redzoning for caches with alignment
> > >   larger than unsigned long long. Practically on at least x86 this includes
> > >   kmalloc caches as they use cache line alignment, which is larger than that.
> > >   Still, this patch ensures alignment on all arches and cache sizes.
> > > 
> > > * SLUB layout is also unchanged unless redzoning is enabled through
> > >   CONFIG_SLUB_DEBUG and boot parameter for the particular kmalloc cache. With
> > >   this patch, explicit alignment is guaranteed with redzoning as well. This
> > >   will result in more memory being wasted, but that should be acceptable in a
> > >   debugging scenario.
> > > 
> > > * SLOB has no implicit alignment so this patch adds it explicitly for
> > >   kmalloc(). The potential downside is increased fragmentation. While
> > >   pathological allocation scenarios are certainly possible, in my testing,
> > >   after booting a x86_64 kernel+userspace with virtme, around 16MB memory
> > >   was consumed by slab pages both before and after the patch, with difference
> > >   in the noise.
> > > 
> > > [1] https://lore.kernel.org/linux-btrfs/c3157c8e8e0e7588312b40c853f65c02fe6c957a.1566399731.git.christophe.leroy@c-s.fr/
> > > [2] https://lore.kernel.org/linux-fsdevel/20190225040904.5557-1-ming.lei@redhat.com/
> > > [3] https://lwn.net/Articles/787740/
> > > 
> > > Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> > 
> > So if anyone thinks this is a good idea, please express it (preferably
> > in a formal way such as Acked-by), otherwise it seems the patch will be
> > dropped (due to a private NACK, apparently).
> 
> Sigh.
> 
> An existing code to workaround the lack of alignment guarantee just show
> that this is necessary. And there wasn't any real technical argument
> against except for a highly theoretical optimizations/new allocator that
> would be tight by the guarantee.
> 
> Therefore
> Acked-by: Michal Hocko <mhocko@suse.com>

Agreed.

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v2 2/2] mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)
  2019-09-28  1:12               ` Christopher Lameter
@ 2019-09-30 13:32                 ` Matthew Wilcox
  0 siblings, 0 replies; 40+ messages in thread
From: Matthew Wilcox @ 2019-09-30 13:32 UTC (permalink / raw)
  To: Christopher Lameter
  Cc: Vlastimil Babka, Andrew Morton, David Sterba, linux-mm,
	linux-kernel, Pekka Enberg, David Rientjes, Ming Lei,
	Dave Chinner, Darrick J . Wong, Christoph Hellwig, linux-xfs,
	linux-fsdevel, linux-block, James Bottomley, linux-btrfs,
	Roman Gushchin, Johannes Weiner

On Sat, Sep 28, 2019 at 01:12:49AM +0000, Christopher Lameter wrote:
> However, the layout may be different due to another allocator that prefers
> to arrange things differently (SLOB puts multiple objects of different
> types in the same page to save memory), if we need to add data to these
> objects (debugging info, new metadata about the object, maybe the memcg
> pointer, maybe other things that may come up), or other innovative
> approaches (such as putting data of different kmem caches that are
> commonly used together in the same page to improve locality).

If we ever do start putting objects of different sizes that are commonly
allocated together in the same page (eg inodes & dentries), then those
aren't going to be random kmalloc() allocation; they're going to be
special kmem caches that can specify "I don't care about alignment".

Also, we haven't done that.  We've had a slab allocator for twenty years,
and nobody's tried to do that.  Maybe the co-allocation would be a net
loss (I suspect).  Or the gain is too small for the added complexity.
Whatever way, this is a strawman.

> The cost is an unnecessary petrification of the data layout of the memory
> allocators.

Yes, it is.  And it's a cost I'm willing to pay in order to get the
guarantee of alignment.


^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2019-09-30 13:32 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-26 11:16 [PATCH v2 0/2] guarantee natural alignment for kmalloc() Vlastimil Babka
2019-08-26 11:16 ` [PATCH v2 1/2] mm, sl[ou]b: improve memory accounting Vlastimil Babka
2019-08-26 11:16 ` [PATCH v2 2/2] mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two) Vlastimil Babka
2019-08-28 18:45   ` Christopher Lameter
2019-08-28 19:46     ` Matthew Wilcox
2019-08-28 22:24       ` Dave Chinner
2019-08-29  7:56         ` Vlastimil Babka
2019-08-30  0:29           ` Dave Chinner
2019-08-29  7:39       ` Michal Hocko
2019-08-30 17:41         ` Christopher Lameter
2019-09-01  0:52           ` Matthew Wilcox
2019-09-03 20:13             ` Christopher Lameter
2019-09-03 20:53               ` Matthew Wilcox
2019-09-04  5:19                 ` Christoph Hellwig
2019-09-04  6:40                   ` Ming Lei
2019-09-04  7:20                     ` Vlastimil Babka
2019-09-04 19:31                 ` Christopher Lameter
2019-09-23 16:36   ` Vlastimil Babka
2019-09-23 17:17     ` David Sterba
2019-09-23 17:51       ` Darrick J. Wong
2019-09-24 20:47         ` cl
2019-09-24 20:51           ` Matthew Wilcox
2019-09-24 20:55             ` cl
2019-09-26 13:02               ` David Sterba
2019-09-24 21:19         ` Vlastimil Babka
2019-09-24 21:53           ` Dave Chinner
2019-09-24 22:21             ` Darrick J. Wong
2019-09-24 20:52       ` cl
2019-09-24 23:54         ` Andrew Morton
2019-09-25  7:17           ` Vlastimil Babka
2019-09-26  0:16             ` Christopher Lameter
2019-09-26  0:14           ` Christopher Lameter
2019-09-26  7:41             ` Vlastimil Babka
2019-09-28  1:12               ` Christopher Lameter
2019-09-30 13:32                 ` Matthew Wilcox
2019-09-23 17:54     ` Roman Gushchin
2019-09-30  8:06     ` Christoph Hellwig
2019-09-30  9:23     ` Michal Hocko
2019-09-30  9:32       ` Kirill A. Shutemov
2019-09-23 18:57   ` Matthew Wilcox

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).