From: Roman Gushchin <guro@fb.com>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Christoph Lameter <cl@linux.com>,
	"Pekka Enberg" <penberg@kernel.org>,
	David Rientjes <rientjes@google.com>,
	Ming Lei <ming.lei@redhat.com>,
	Dave Chinner <david@fromorbit.com>,
	Matthew Wilcox <willy@infradead.org>,
	"Darrick J . Wong" <darrick.wong@oracle.com>,
	Christoph Hellwig <hch@lst.de>,
	"linux-xfs@vger.kernel.org" <linux-xfs@vger.kernel.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	James Bottomley <James.Bottomley@hansenpartnership.com>,
	"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>,
	Johannes Weiner <hannes@cmpxchg.org>
Subject: Re: [PATCH v2 2/2] mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)
Date: Mon, 23 Sep 2019 17:54:01 +0000
Message-ID: <20190923175356.GB3644@tower.DHCP.thefacebook.com>
In-Reply-To: <df8d1cf4-ff8f-1ee1-12fb-cfec39131b32@suse.cz>

On Mon, Sep 23, 2019 at 06:36:32PM +0200, Vlastimil Babka wrote:
> On 8/26/19 1:16 PM, Vlastimil Babka wrote:
> > In most configurations, kmalloc() happens to return naturally aligned (i.e.
> > aligned to the block size itself) blocks for power-of-two sizes. That means
> > some kmalloc() users might unknowingly rely on that alignment, until stuff
> > breaks when the kernel is built with e.g. CONFIG_SLUB_DEBUG or CONFIG_SLOB,
> > and blocks stop being aligned. Developers then have to devise workarounds
> > such as their own kmem caches with specified alignment [1], which is not
> > always practical, as recently evidenced in [2].
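
For illustration, the workaround in [1] boils down to a dedicated cache
created with an explicit alignment equal to the object size. A minimal
sketch under that assumption, with hypothetical names (not the actual
btrfs patch):

	#include <linux/slab.h>

	static struct kmem_cache *buf_cache;	/* hypothetical name */

	static int __init buf_cache_init(void)
	{
		/* align == size keeps every allocation from this cache
		 * naturally aligned under all slab debug options */
		buf_cache = kmem_cache_create("buf-4k", 4096, 4096, 0, NULL);
		return buf_cache ? 0 : -ENOMEM;
	}

Allocations via kmem_cache_alloc(buf_cache, ...) are then 4096-aligned.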
> > 
> > The topic was discussed at LSF/MM 2019 [3]. Adding a 'kmalloc_aligned()'
> > variant would not help with code unknowingly relying on the implicit
> > alignment. For slab implementations it would require either creating more
> > kmalloc caches, or allocating a larger size and giving back only part of
> > it. That would be wasteful, especially with a generic alignment parameter
> > (in contrast with a fixed alignment to size).
> > 
> > Ideally we should provide mm users with what they need without difficult
> > workarounds or reimplementations of their own, so let's make kmalloc()
> > alignment to size explicitly guaranteed for power-of-two sizes under all
> > configurations. What does this mean for the three available allocators?
> > (A one-line statement of the guaranteed property is sketched in code
> > after the list below.)
> > 
> > * SLAB object layout happens to be mostly unchanged by the patch. The
> >   implicitly provided alignment could be compromised with CONFIG_DEBUG_SLAB
> >   due to redzoning; however, SLAB disables redzoning for caches with
> >   alignment larger than unsigned long long. In practice, on at least x86,
> >   this includes the kmalloc caches, as they use cache line alignment, which
> >   is larger than that. Still, this patch ensures alignment on all arches
> >   and cache sizes.
> > 
> > * SLUB layout is also unchanged unless redzoning is enabled through
> >   CONFIG_SLUB_DEBUG and a boot parameter for the particular kmalloc cache.
> >   With this patch, explicit alignment is guaranteed with redzoning as well.
> >   This will result in more memory being wasted, but that should be
> >   acceptable in a debugging scenario.
> > 
> > * SLOB has no implicit alignment, so this patch adds it explicitly for
> >   kmalloc(). The potential downside is increased fragmentation. While
> >   pathological allocation scenarios are certainly possible, in my testing,
> >   after booting an x86_64 kernel+userspace with virtme, around 16MB of
> >   memory was consumed by slab pages both before and after the patch, with
> >   the difference within the noise.
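
For illustration, the guaranteed property can be stated as a one-line
check; this is a sketch of the invariant, not code from the patch:

	#include <linux/kernel.h>	/* IS_ALIGNED() */
	#include <linux/slab.h>

	size_t size = 512;	/* any power-of-two kmalloc() size */
	void *p = kmalloc(size, GFP_KERNEL);

	/* with the patch this warning can never trigger, regardless of
	 * allocator (SLAB/SLUB/SLOB) or debug configuration */
	WARN_ON(p && !IS_ALIGNED((unsigned long)p, size));
	kfree(p);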
> > 
> > [1] https://lore.kernel.org/linux-btrfs/c3157c8e8e0e7588312b40c853f65c02fe6c957a.1566399731.git.christophe.leroy@c-s.fr/
> > [2] https://lore.kernel.org/linux-fsdevel/20190225040904.5557-1-ming.lei@redhat.com/
> > [3] https://lwn.net/Articles/787740/
> > 
> > Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> 
> So if anyone thinks this is a good idea, please express it (preferably
> in a formal way such as an Acked-by); otherwise it seems the patch will
> be dropped (due to a private NACK, apparently).
> 
> Otherwise I don't think there can be an objective conclusion. On the one
> hand, we avoid further problems and workarounds due to misalignment (or
> objects allocated beyond a page boundary, which was only recently
> mentioned); on the other hand, we potentially make future changes to
> SLAB/SLUB, or a hypothetical new implementation, either more complicated
> or less effective due to extra fragmentation. Different people can have
> different opinions on what's more important.
> 
> Let me however explain why I think we don't have to fear the future
> implementation complications that much. There was an argument, IIRC, that
> extra non-debug metadata could start being prepended/appended to an
> object in the future (e.g. an RCU freeing head).
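
To make that argument concrete, "prepending metadata" would mean a layout
roughly like the hypothetical sketch below (nobody's actual proposal):

	#include <linux/types.h>	/* struct rcu_head */

	/* naive per-object header: the caller-visible pointer becomes
	 * slot + sizeof(struct obj_header), so a power-of-two object is
	 * no longer size-aligned unless the header is padded out to the
	 * object's full alignment; that padding is the wasted space that
	 * points 1) and 2) below are about */
	struct obj_header {		/* hypothetical */
		struct rcu_head rcu;	/* e.g. deferred freeing */
	};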
> 
> 1) Caches can already be created with explicit alignment, so a naive
> pre/appending implementation would already waste memory on such caches.
> 2) Even without explicit alignment, a single slab cache for 512k objects
> with a few bytes added to each object would waste almost 512k, as the
> objects wouldn't fit precisely in an (order-X) page. The percentage
> wasted depends on X (a worked example follows the list).
> 3) Roman recently posted a patchset [1] that basically adds a cgroup
> pointer to each object. The implementation doesn't naively append it to
> objects, however, but uses a separately allocated array. Alignment is
> thus unchanged.
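
To put numbers on point 2 above (a worked example; the arithmetic is the
editor's): a 512k object fits exactly once in an order-7 (512k) slab. Add
even 8 bytes of metadata per object and the cache needs an order-8 (1M)
slab per object, wasting almost 512k, i.e. ~50%. At order-9 (2M), three
such padded objects fit and the waste drops to ~25%; the overhead shrinks
as X grows, but never reaches zero.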

To be fair here, we *might* want to put this pointer just after/before
the object to reduce the number of cache misses. I've put it in a
separate place mostly for simplicity reasons.

It's not an objection though, just a note.

Thanks!

> 
> [1] https://lore.kernel.org/linux-mm/20190905214553.1643060-1-guro@fb.com/
> 

Thread overview: 50+ messages

2019-08-26 11:16 [PATCH v2 0/2] guarantee natural alignment for kmalloc() Vlastimil Babka
2019-08-26 11:16 ` [PATCH v2 1/2] mm, sl[ou]b: improve memory accounting Vlastimil Babka
2019-08-26 11:16 ` [PATCH v2 2/2] mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two) Vlastimil Babka
2019-08-28 18:45   ` Christopher Lameter
2019-08-28 19:46     ` Matthew Wilcox
2019-08-28 22:24       ` Dave Chinner
2019-08-29  7:56         ` Vlastimil Babka
2019-08-30  0:29           ` Dave Chinner
2019-08-29  7:39       ` Michal Hocko
2019-08-30 17:41         ` Christopher Lameter
2019-09-01  0:52           ` Matthew Wilcox
2019-09-03 20:13             ` Christopher Lameter
2019-09-03 20:53               ` Matthew Wilcox
2019-09-04  5:19                 ` Christoph Hellwig
2019-09-04  6:40                   ` Ming Lei
2019-09-04  7:20                     ` Vlastimil Babka
2019-09-04 19:31                 ` Christopher Lameter
2019-09-23 16:36   ` Vlastimil Babka
2019-09-23 17:17     ` David Sterba
2019-09-23 17:51       ` Darrick J. Wong
2019-09-24 20:47         ` cl
2019-09-24 20:51           ` Matthew Wilcox
2019-09-24 20:55             ` cl
2019-09-26 13:02               ` David Sterba
2019-09-24 21:19         ` Vlastimil Babka
2019-09-24 21:53           ` Dave Chinner
2019-09-24 22:21             ` Darrick J. Wong
2019-09-24 20:52       ` cl
2019-09-24 23:54         ` Andrew Morton
2019-09-25  7:17           ` Vlastimil Babka
2019-09-26  0:16             ` Christopher Lameter
2019-09-26  0:14           ` Christopher Lameter
2019-09-26  7:41             ` Vlastimil Babka
2019-09-28  1:12               ` Christopher Lameter
2019-09-30 13:32                 ` Matthew Wilcox
2019-09-23 17:54     ` Roman Gushchin [this message]
2019-09-30  8:06     ` Christoph Hellwig
2019-09-30  9:23     ` Michal Hocko
2019-09-30  9:32       ` Kirill A. Shutemov
2019-09-23 18:57   ` Matthew Wilcox
