linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Vlastimil Babka <vbabka@suse.cz>
To: Andrew Morton <akpm@linux-foundation.org>, linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Christoph Lameter <cl@linux.com>,
	Pekka Enberg <penberg@kernel.org>,
	David Rientjes <rientjes@google.com>,
	Ming Lei <ming.lei@redhat.com>,
	Dave Chinner <david@fromorbit.com>,
	Matthew Wilcox <willy@infradead.org>,
	"Darrick J . Wong" <darrick.wong@oracle.com>,
	Christoph Hellwig <hch@lst.de>,
	linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-block@vger.kernel.org,
	James Bottomley <James.Bottomley@HansenPartnership.com>,
	linux-btrfs@vger.kernel.org, Roman Gushchin <guro@fb.com>,
	Johannes Weiner <hannes@cmpxchg.org>
Subject: Re: [PATCH v2 2/2] mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)
Date: Mon, 23 Sep 2019 18:36:32 +0200	[thread overview]
Message-ID: <df8d1cf4-ff8f-1ee1-12fb-cfec39131b32@suse.cz> (raw)
In-Reply-To: <20190826111627.7505-3-vbabka@suse.cz>

On 8/26/19 1:16 PM, Vlastimil Babka wrote:
> In most configurations, kmalloc() happens to return naturally aligned (i.e.
> aligned to the block size itself) blocks for power of two sizes. That means
> some kmalloc() users might unknowingly rely on that alignment, until stuff
> breaks when the kernel is built with e.g.  CONFIG_SLUB_DEBUG or CONFIG_SLOB,
> and blocks stop being aligned. Then developers have to devise workaround such
> as own kmem caches with specified alignment [1], which is not always practical,
> as recently evidenced in [2].
> 
> The topic has been discussed at LSF/MM 2019 [3]. Adding a 'kmalloc_aligned()'
> variant would not help with code unknowingly relying on the implicit alignment.
> For slab implementations it would either require creating more kmalloc caches,
> or allocate a larger size and only give back part of it. That would be
> wasteful, especially with a generic alignment parameter (in contrast with a
> fixed alignment to size).
> 
> Ideally we should provide to mm users what they need without difficult
> workarounds or own reimplementations, so let's make the kmalloc() alignment to
> size explicitly guaranteed for power-of-two sizes under all configurations.
> What this means for the three available allocators?
> 
> * SLAB object layout happens to be mostly unchanged by the patch. The
>   implicitly provided alignment could be compromised with CONFIG_DEBUG_SLAB due
>   to redzoning, however SLAB disables redzoning for caches with alignment
>   larger than unsigned long long. Practically on at least x86 this includes
>   kmalloc caches as they use cache line alignment, which is larger than that.
>   Still, this patch ensures alignment on all arches and cache sizes.
> 
> * SLUB layout is also unchanged unless redzoning is enabled through
>   CONFIG_SLUB_DEBUG and boot parameter for the particular kmalloc cache. With
>   this patch, explicit alignment is guaranteed with redzoning as well. This
>   will result in more memory being wasted, but that should be acceptable in a
>   debugging scenario.
> 
> * SLOB has no implicit alignment so this patch adds it explicitly for
>   kmalloc(). The potential downside is increased fragmentation. While
>   pathological allocation scenarios are certainly possible, in my testing,
>   after booting a x86_64 kernel+userspace with virtme, around 16MB memory
>   was consumed by slab pages both before and after the patch, with difference
>   in the noise.
> 
> [1] https://lore.kernel.org/linux-btrfs/c3157c8e8e0e7588312b40c853f65c02fe6c957a.1566399731.git.christophe.leroy@c-s.fr/
> [2] https://lore.kernel.org/linux-fsdevel/20190225040904.5557-1-ming.lei@redhat.com/
> [3] https://lwn.net/Articles/787740/
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

So if anyone thinks this is a good idea, please express it (preferably
in a formal way such as Acked-by), otherwise it seems the patch will be
dropped (due to a private NACK, apparently).

Otherwise I don't think there can be an objective conclusion. On the one
hand we avoid further problems and workarounds due to misalignment (or
objects allocated beyond page boundary, which was only recently
mentioned), on the other hand we potentially make future changes to
SLAB/SLUB or hypotetical new implementation either more complicated, or
less effective due to extra fragmentation. Different people can have
different opinions on what's more important.

Let me however explain why I think we don't have to fear the future
implementation complications that much. There was an argument IIRC that
extra non-debug metadata could start to be prepended/appended to an
object in the future (i.e. RCU freeing head?).

1) Caches can be already created with explicit alignment, so a naive
pre/appending implementation would already waste memory on such caches.
2) Even without explicit alignment, a single slab cache for 512k objects
with few bytes added to each object would waste almost 512k as the
objects wouldn't fit precisely in an (order-X) page. The percentage
wasted depends on X.
3) Roman recently posted a patchset [1] that basically adds a cgroup
pointer to each object. The implementation doesn't append it to objects
naively however, but adds a separately allocated array. Alignment is
thus unchanged.

[1] https://lore.kernel.org/linux-mm/20190905214553.1643060-1-guro@fb.com/



  parent reply	other threads:[~2019-09-23 16:36 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-26 11:16 [PATCH v2 0/2] guarantee natural alignment for kmalloc() Vlastimil Babka
2019-08-26 11:16 ` [PATCH v2 1/2] mm, sl[ou]b: improve memory accounting Vlastimil Babka
2019-08-26 11:16 ` [PATCH v2 2/2] mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two) Vlastimil Babka
2019-08-28 18:45   ` Christopher Lameter
2019-08-28 19:46     ` Matthew Wilcox
2019-08-28 22:24       ` Dave Chinner
2019-08-29  7:56         ` Vlastimil Babka
2019-08-30  0:29           ` Dave Chinner
2019-08-29  7:39       ` Michal Hocko
2019-08-30 17:41         ` Christopher Lameter
2019-09-01  0:52           ` Matthew Wilcox
2019-09-03 20:13             ` Christopher Lameter
2019-09-03 20:53               ` Matthew Wilcox
2019-09-04  5:19                 ` Christoph Hellwig
2019-09-04  6:40                   ` Ming Lei
2019-09-04  7:20                     ` Vlastimil Babka
2019-09-04 19:31                 ` Christopher Lameter
2019-09-23 16:36   ` Vlastimil Babka [this message]
2019-09-23 17:17     ` David Sterba
2019-09-23 17:51       ` Darrick J. Wong
2019-09-24 20:47         ` cl
2019-09-24 20:51           ` Matthew Wilcox
2019-09-24 20:55             ` cl
2019-09-26 13:02               ` David Sterba
2019-09-24 21:19         ` Vlastimil Babka
2019-09-24 21:53           ` Dave Chinner
2019-09-24 22:21             ` Darrick J. Wong
2019-09-24 20:52       ` cl
2019-09-24 23:54         ` Andrew Morton
2019-09-25  7:17           ` Vlastimil Babka
2019-09-26  0:16             ` Christopher Lameter
2019-09-26  0:14           ` Christopher Lameter
2019-09-26  7:41             ` Vlastimil Babka
2019-09-28  1:12               ` Christopher Lameter
2019-09-30 13:32                 ` Matthew Wilcox
2019-09-23 17:54     ` Roman Gushchin
2019-09-30  8:06     ` Christoph Hellwig
2019-09-30  9:23     ` Michal Hocko
2019-09-30  9:32       ` Kirill A. Shutemov
2019-09-23 18:57   ` Matthew Wilcox

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=df8d1cf4-ff8f-1ee1-12fb-cfec39131b32@suse.cz \
    --to=vbabka@suse.cz \
    --cc=James.Bottomley@HansenPartnership.com \
    --cc=akpm@linux-foundation.org \
    --cc=cl@linux.com \
    --cc=darrick.wong@oracle.com \
    --cc=david@fromorbit.com \
    --cc=guro@fb.com \
    --cc=hannes@cmpxchg.org \
    --cc=hch@lst.de \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=ming.lei@redhat.com \
    --cc=penberg@kernel.org \
    --cc=rientjes@google.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).