linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [GIT PULL] slab for 5.19
@ 2022-05-23  9:54 Vlastimil Babka
  2022-05-25 18:29 ` Linus Torvalds
  2022-05-25 18:37 ` pr-tracker-bot
  0 siblings, 2 replies; 8+ messages in thread
From: Vlastimil Babka @ 2022-05-23  9:54 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David Rientjes, Joonsoo Kim, Christoph Lameter, Pekka Enberg,
	Andrew Morton, linux-mm, LKML, patches, Roman Gushchin,
	Hyeonggon Yoo

Linus,

please pull the latest slab changes from

  git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab.git tags/slab-for-5.19

======================================

- Conversion of slub_debug stack traces to stackdepot, allowing
  more useful debugfs-based inspection for e.g. memory leak
  debugging.  Allocation and free debugfs info now includes full
  traces and is sorted by the unique trace frequency.

  The stackdepot conversion was already attempted last year but
  reverted by ae14c63a9f20. The memory overhead (while not actually
  enabled on boot) has been meanwhile solved by making the large
  stackdepot allocation dynamic. The xfstest issues haven't been
  reproduced on current kernel locally nor in -next, so the slab
  cache layout changes that originally made that bug manifest were
  probably not the root cause.

- Refactoring of dma-kmalloc caches creation.

- Trivial cleanups such as removal of unused parameters, fixes
  and clarifications of comments.

- Hyeonggon Yoo joins as a reviewer.

Thanks,
Vlastimil

----------------------------------------------------------------
Andrey Konovalov (2):
      mm: slab: fix comment for ARCH_KMALLOC_MINALIGN
      mm: slab: fix comment for __assume_kmalloc_alignment

Hyeonggon Yoo (2):
      mm/slub, kunit: Make slub_kunit unaffected by user specified flags
      MAINTAINERS: add myself as reviewer for slab

JaeSang Yoo (2):
      mm/slub: remove unused parameter in setup_object*()
      mm/slub: remove meaningless node check in ___slab_alloc()

Jiyoup Kim (1):
      mm/slub: remove duplicate flag in allocate_slab()

Miaohe Lin (3):
      mm/slab: remove some unused functions
      mm/slub: remove unneeded return value of slab_pad_check
      mm/slub: remove unused kmem_cache_order_objects max

Ohhoon Kwon (1):
      mm/slab_common: move dma-kmalloc caches creation into new_kmalloc_cache()

Oliver Glitta (4):
      mm/slub: use stackdepot to save stack trace in objects
      mm/slub: distinguish and print stack traces in debugfs files
      mm/slub: sort debugfs output by frequency of stack traces
      slab, documentation: add description of debugfs files for SLUB caches

Vlastimil Babka (3):
      lib/stackdepot: allow requesting early initialization dynamically
      mm/slub: move struct track init out of set_track()
      Merge branches 'slab/for-5.19/stackdepot' and 'slab/for-5.19/refactor' into slab/for-linus

Yixuan Cao (1):
      mm/slab.c: fix comments

 Documentation/vm/slub.rst  |  64 +++++++++++++++++
 MAINTAINERS                |   1 +
 include/linux/slab.h       |  15 ++--
 include/linux/slub_def.h   |   1 -
 include/linux/stackdepot.h |  26 +++++--
 init/Kconfig               |   1 +
 lib/Kconfig.debug          |   1 +
 lib/slub_kunit.c           |  10 +--
 lib/stackdepot.c           |  67 +++++++++++------
 mm/page_owner.c            |   9 ++-
 mm/slab.c                  |  29 +++-----
 mm/slab.h                  |   5 +-
 mm/slab_common.c           |  23 +++---
 mm/slub.c                  | 174 ++++++++++++++++++++++++++++-----------------
 14 files changed, 283 insertions(+), 143 deletions(-)


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [GIT PULL] slab for 5.19
  2022-05-23  9:54 [GIT PULL] slab for 5.19 Vlastimil Babka
@ 2022-05-25 18:29 ` Linus Torvalds
  2022-05-25 20:54   ` Vlastimil Babka
  2022-05-25 18:37 ` pr-tracker-bot
  1 sibling, 1 reply; 8+ messages in thread
From: Linus Torvalds @ 2022-05-25 18:29 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: David Rientjes, Joonsoo Kim, Christoph Lameter, Pekka Enberg,
	Andrew Morton, linux-mm, LKML, patches, Roman Gushchin,
	Hyeonggon Yoo

On Mon, May 23, 2022 at 2:54 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
>   The stackdepot conversion was already attempted last year but
>   reverted by ae14c63a9f20. The memory overhead (while not actually
>   enabled on boot) has been meanwhile solved by making the large
>   stackdepot allocation dynamic.

Why do I still see

  +config STACK_HASH_ORDER
  +       int "stack depot hash size (12 => 4KB, 20 => 1024KB)"
  +       range 12 20
  +       default 20

there then?

All that seems to have happened is that it's not a static allocation
any more, but it's still a big allocation very early at boot by
default.

The people who complained about this last time were on m68k machines
iirc, and 1MB there is not insignificant.

It's not at all clear to me why that allocation should be that kind of
fixed number, and if it's a fixed number, why it should be the maximum
one by default. That seems entirely broken.

I've pulled this, but considering that it got reverted once, I'm
really fed up with this kind of thing.  This needs to be fixed.

Because I'm _this_ close to just reverting it again, and saying "No,
you tried this crap already, didn't learn from the last time, and then
did the same thing all over again just in a different guise".

                    Linus

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [GIT PULL] slab for 5.19
  2022-05-23  9:54 [GIT PULL] slab for 5.19 Vlastimil Babka
  2022-05-25 18:29 ` Linus Torvalds
@ 2022-05-25 18:37 ` pr-tracker-bot
  1 sibling, 0 replies; 8+ messages in thread
From: pr-tracker-bot @ 2022-05-25 18:37 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Linus Torvalds, David Rientjes, Joonsoo Kim, Christoph Lameter,
	Pekka Enberg, Andrew Morton, linux-mm, LKML, patches,
	Roman Gushchin, Hyeonggon Yoo

The pull request you sent on Mon, 23 May 2022 11:54:08 +0200:

> git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab.git tags/slab-for-5.19

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/2e17ce1106e04a7f3a83796ec623881487f75dd3

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [GIT PULL] slab for 5.19
  2022-05-25 18:29 ` Linus Torvalds
@ 2022-05-25 20:54   ` Vlastimil Babka
  2022-05-25 21:36     ` Linus Torvalds
  0 siblings, 1 reply; 8+ messages in thread
From: Vlastimil Babka @ 2022-05-25 20:54 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David Rientjes, Joonsoo Kim, Christoph Lameter, Pekka Enberg,
	Andrew Morton, linux-mm, LKML, patches, Roman Gushchin,
	Hyeonggon Yoo, Geert Uytterhoeven, Alexander Potapenko

+Cc Geert and Alexander

On 5/25/22 20:29, Linus Torvalds wrote:
> On Mon, May 23, 2022 at 2:54 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>>
>>   The stackdepot conversion was already attempted last year but
>>   reverted by ae14c63a9f20. The memory overhead (while not actually
>>   enabled on boot) has been meanwhile solved by making the large
>>   stackdepot allocation dynamic.
> 
> Why do I still see
> 
>   +config STACK_HASH_ORDER
>   +       int "stack depot hash size (12 => 4KB, 20 => 1024KB)"
>   +       range 12 20
>   +       default 20
> 
> there then?
> 
> All that seems to have happened is that it's not a static allocation
> any more, but it's still a big allocation very early at boot by
> default.
> 
> The people who complained about this last time were on m68k machines
> iirc, and 1MB there is not insignificant.

My main concern was that configs that enable SLUB_DEBUG (quite common)
shouldn't pay the stackdepot memory overhead if people don't actually
enable the slub object tracking on boot because they are debugging
something. It's possible I misunderstood Geert's point though.

> It's not at all clear to me why that allocation should be that kind of
> fixed number, and if it's a fixed number, why it should be the maximum
> one by default. That seems entirely broken.

As I understand it's a tradeoff between memory overhead due to hash
table size and cpu overhead due to length of collision chains.

> I've pulled this, but considering that it got reverted once, I'm
> really fed up with this kind of thing.  This needs to be fixed.

Right, I'll try convert stackdepot to rhashtable. If it turns out
infeasible for some reason, we could at least have an "auto" default
that autosizes the table according to how much memory the system has.

> Because I'm _this_ close to just reverting it again, and saying "No,
> you tried this crap already, didn't learn from the last time, and then
> did the same thing all over again just in a different guise".
> 
>                     Linus


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [GIT PULL] slab for 5.19
  2022-05-25 20:54   ` Vlastimil Babka
@ 2022-05-25 21:36     ` Linus Torvalds
  2022-05-25 22:00       ` Vlastimil Babka
  0 siblings, 1 reply; 8+ messages in thread
From: Linus Torvalds @ 2022-05-25 21:36 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: David Rientjes, Joonsoo Kim, Christoph Lameter, Pekka Enberg,
	Andrew Morton, linux-mm, LKML, patches, Roman Gushchin,
	Hyeonggon Yoo, Geert Uytterhoeven, Alexander Potapenko

On Wed, May 25, 2022 at 1:56 PM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> As I understand it's a tradeoff between memory overhead due to hash
> table size and cpu overhead due to length of collision chains.

... but why is that a build-time kernel config question? And when does
it actually end up triggering?

I didn't much care - and clearly small machines like m68k didn't
either - when the whole stackdepot thing was a "odd configuration
issue".

Because it used to be that stack depot users were fairly easy to
figure out. It was all about pretty special debug options that enable
a lot of very special code (KASAN and PAGE_OWNER - is there anything
else?).

Very clearly just debug builds.

But this slab change means that now it's basically *always* enabled as
an option, and that means that now it had better do sane things
because it's enabled by default and not something that can easily be
ignored any more.

The kernel config phase is something I'm very sensitive to, because
honestly, it's just about the worst part of building a kernel.
Everything else is pretty easy. The kernel config is painful.

That means that we should not ask insane questions - and asking what
static size you want for a data structure that 99% of all people have
absolutely _zero_ idea about or interest in is wrong.

So the problem here is two-fold:

 (a) it introduces a new compile-time question that isn't sane to ask
a regular user, but is now exposed to regular users.

 (b) this by default uses 1MB of memory for a feature that didn't in
the past, so now if you have small machines you need to make sure you
make a special kernel config for them.

And (a) happens unconditionally, while (b) is conditional. And it's
now clear exactly what all triggers the actual allocation in (b).

I _hope_ it's never triggered unless you actively enable slub debug,
but at least from a quick grep it wasn't obvious.

For example, if you happen to have rcutorture enabled, do you get the
allocation then just because rcutorture uses

       kcp = kmem_cache_create("rcuscale", 136, 8, SLAB_STORE_USER, NULL);

even if you have nothing else that would enable slaub debugging? It
*looked* that way to me from a quick grep, but I could easily have
missed something.

             Linus

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [GIT PULL] slab for 5.19
  2022-05-25 21:36     ` Linus Torvalds
@ 2022-05-25 22:00       ` Vlastimil Babka
  2022-05-25 23:07         ` Linus Torvalds
  0 siblings, 1 reply; 8+ messages in thread
From: Vlastimil Babka @ 2022-05-25 22:00 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David Rientjes, Joonsoo Kim, Christoph Lameter, Pekka Enberg,
	Andrew Morton, linux-mm, LKML, patches, Roman Gushchin,
	Hyeonggon Yoo, Geert Uytterhoeven, Alexander Potapenko

On 5/25/22 23:36, Linus Torvalds wrote:
> Because it used to be that stack depot users were fairly easy to
> figure out. It was all about pretty special debug options that enable
> a lot of very special code (KASAN and PAGE_OWNER - is there anything
> else?).

Some parts of DRM debugging and looks like also recently networking
through ref_tracker, which also seems to be debugging.

> Very clearly just debug builds.
> 
> But this slab change means that now it's basically *always* enabled as
> an option, and that means that now it had better do sane things
> because it's enabled by default and not something that can easily be
> ignored any more.
> 
> The kernel config phase is something I'm very sensitive to, because
> honestly, it's just about the worst part of building a kernel.
> Everything else is pretty easy. The kernel config is painful.
> 
> That means that we should not ask insane questions - and asking what
> static size you want for a data structure that 99% of all people have
> absolutely _zero_ idea about or interest in is wrong.

Understood.

> So the problem here is two-fold:
> 
>  (a) it introduces a new compile-time question that isn't sane to ask
> a regular user, but is now exposed to regular users.
> 
>  (b) this by default uses 1MB of memory for a feature that didn't in
> the past, so now if you have small machines you need to make sure you
> make a special kernel config for them.
> 
> And (a) happens unconditionally, while (b) is conditional. And it's
> now clear exactly what all triggers the actual allocation in (b).
> 
> I _hope_ it's never triggered unless you actively enable slub debug,
> but at least from a quick grep it wasn't obvious.
> 
> For example, if you happen to have rcutorture enabled, do you get the
> allocation then just because rcutorture uses
> 
>        kcp = kmem_cache_create("rcuscale", 136, 8, SLAB_STORE_USER, NULL);
> 
> even if you have nothing else that would enable slaub debugging? It
> *looked* that way to me from a quick grep, but I could easily have
> missed something.

Yes, running rcutorture will trigger that stackdepot allocation, but as
that's in RCU debugging part of config, I considered it in the same
category as enabling slub debugging.
But yeah I realize it might become a problem if someone wants to run
rcutorture on m68k :/

>              Linus


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [GIT PULL] slab for 5.19
  2022-05-25 22:00       ` Vlastimil Babka
@ 2022-05-25 23:07         ` Linus Torvalds
  2022-05-26 10:50           ` Hyeonggon Yoo
  0 siblings, 1 reply; 8+ messages in thread
From: Linus Torvalds @ 2022-05-25 23:07 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: David Rientjes, Joonsoo Kim, Christoph Lameter, Pekka Enberg,
	Andrew Morton, linux-mm, LKML, patches, Roman Gushchin,
	Hyeonggon Yoo, Geert Uytterhoeven, Alexander Potapenko

On Wed, May 25, 2022 at 3:01 PM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> Yes, running rcutorture will trigger that stackdepot allocation, but as
> that's in RCU debugging part of config, I considered it in the same
> category as enabling slub debugging.

Yeah, I don't think rcutorture is a problem per se, it was more an
example of a random interaction that doesn't actually seem to make
much sense.

As far as I can tell, there is nothing in rcutorture that actually
wants that stack trace, and it seems to be doing just a test of the
object dumping working.

So it was more the oddity and randomness of it that made me go
"Whaah?" There might be others hiding elsewhere, that rcutorture use
just  happened to use the flag explicitly.

             Linus

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [GIT PULL] slab for 5.19
  2022-05-25 23:07         ` Linus Torvalds
@ 2022-05-26 10:50           ` Hyeonggon Yoo
  0 siblings, 0 replies; 8+ messages in thread
From: Hyeonggon Yoo @ 2022-05-26 10:50 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Vlastimil Babka, David Rientjes, Joonsoo Kim, Christoph Lameter,
	Pekka Enberg, Andrew Morton, linux-mm, LKML, patches,
	Roman Gushchin, Geert Uytterhoeven, Alexander Potapenko

On Wed, May 25, 2022 at 04:07:24PM -0700, Linus Torvalds wrote:
> On Wed, May 25, 2022 at 3:01 PM Vlastimil Babka <vbabka@suse.cz> wrote:
> >
> > Yes, running rcutorture will trigger that stackdepot allocation, but as
> > that's in RCU debugging part of config, I considered it in the same
> > category as enabling slub debugging.
> 
> Yeah, I don't think rcutorture is a problem per se, it was more an
> example of a random interaction that doesn't actually seem to make
> much sense.

Creating cache with SLAB_STORE_USER is (currently) done in rcutorture.
SLAB_STORE_USER means "we're going to track every user of this slab
cache" so IMO it should not be done if not debugging.

Otherwise stackdepot will be used only when user passes slub_debug
boot parameter. 

> As far as I can tell, there is nothing in rcutorture that actually
> wants that stack trace, and it seems to be doing just a test of the
> object dumping working.

dumping an arbitrary object may call kmem_dump_obj(), which will dump
slab objects and thus printing where it was allocated.

> 
> So it was more the oddity and randomness of it that made me go
> "Whaah?" There might be others hiding elsewhere, that rcutorture use
> just  happened to use the flag explicitly.
> 
>              Linus

AFAIK only 1) creating cache with SLAB_STORE_USER or 2) passing
slub_debug make kernel allocate hash table for stackdepot.

This do not look that odd too me.

But I agree not asking size of hash table to user and by 1) using
rhashtables or 2) deciding size of hash table at runtime would be
better :)

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2022-05-26 10:50 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-23  9:54 [GIT PULL] slab for 5.19 Vlastimil Babka
2022-05-25 18:29 ` Linus Torvalds
2022-05-25 20:54   ` Vlastimil Babka
2022-05-25 21:36     ` Linus Torvalds
2022-05-25 22:00       ` Vlastimil Babka
2022-05-25 23:07         ` Linus Torvalds
2022-05-26 10:50           ` Hyeonggon Yoo
2022-05-25 18:37 ` pr-tracker-bot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).