All of lore.kernel.org
 help / color / mirror / Atom feed
* slab-nomerge (was Re: [git pull] device mapper changes for 4.3)
@ 2015-09-02 23:13 Linus Torvalds
  2015-09-03  0:48 ` Andrew Morton
  2015-09-03  0:51   ` Mike Snitzer
  0 siblings, 2 replies; 42+ messages in thread
From: Linus Torvalds @ 2015-09-02 23:13 UTC (permalink / raw)
  To: Mike Snitzer, Dave Chinner, Christoph Lameter, Pekka Enberg,
	Andrew Morton, David Rientjes, Joonsoo Kim
  Cc: dm-devel, Alasdair G Kergon, Joe Thornber, Mikulas Patocka,
	Vivek Goyal, Sami Tolvanen, Viresh Kumar, Heinz Mauelshagen,
	linux-mm

On Wed, Sep 2, 2015 at 10:39 AM, Mike Snitzer <snitzer@redhat.com> wrote:
>
> - last but not least: add SLAB_NO_MERGE flag to mm/slab_common and
>   disable slab merging for all of DM's slabs (XFS will also use
>   SLAB_NO_MERGE once merged).

So I'm not at all convinced this is the right thing to do. In fact,
I'm pretty convinced it shouldn't be done this way. Since those
commits were at the top of your tree, I just didn't pull them, but
took the rest..

You are basically making this one-sided decision based on your notion
of convenience, and just forcing that thing unconditionally on people.

Your rationale seems _totally_ bogus: you say that it's to be able to
observe the sizes of the dm slabs without using slab debugging.

First off, you don't have to enable slab debugging. You can just
disable slab merging.  It's called "slab_nomerge". It does exactly
what you would think it does.

And what is it that makes dm slabs such a special little princess?
What makes you think that the fact that _you_ want to look at slab
statistics means that everybody else suddenly must have separate slabs
for dm, and dm only? Or xfs?

The other "rationale" was that not merging slabs limits
cross-subsystem memory corruption. Again, what the _hell_ is special
about device mapper that dm - and only dm - would make this a special
thing? That is just pure and utter garbage. Again, we already have
that "slab_nomerge" option, exactly so that when odd slab corruption
issues happen (they are rare, but they do occasionally happen), you
can try that to see if that pinpoints the problem more. And it is
*not* limited to some random set of subsystems. Which makes it clearly
superior to your broken approach, wouldn't you agree?

The only possible true rationale for why dm is special is "because dm
is such a buggy piece of sh*t that it's much more likely to have these
slab corruption bugs than anything else, so I'm just protecting the
rest of the system".

Is that really your rationale? Somehow I doubt it. But if it is, you
really should have said so. At least then it would make sense why this
thing came in through the dm tree, and why dm is so special than it -
and only it - would disable slab merging.

So I'm not pulling things like this from the device mapper tree. There
is just no excuse that I can see for something like SLAB_NO_MERGE to
go through the dm tree in the first place, but that's doubly true when
the rationale for these things were bogus and had nothing what-so-ever
to do with dm.

Things like this aren't supposed to come in through random irrelevant
trees like this, and with no discussion (at least judging by the
commits) with the maintainers of the other pieces of code.

If you have issues with slab merging, then those should be discussed
as such, not as some magical and bogus dm or xfs special case when
they damn well aren't, and damn well will never be.

Yes, I'm annoyed. This was not done well. I realize that everybody
thinks that _their_ code is so special and exceptional that
"obviously" they should be treated specially, but I don't see that
that is the case at all in this case.

If you want to argue that slab merging should be disabled by default,
then that is an argument that I'm willing to believe might be valid
("the downsides are bigger than the upsides").  Or if you are able to
explain why dm really _is_ special, that's an option too. But this
kind of "random subsystems decide unilaterally to not follow the
normal rules" is not acceptable. Not when the "arguments" for it have
absolutely nothing in particular to do with that subsystem.

                    Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: slab-nomerge (was Re: [git pull] device mapper changes for 4.3)
  2015-09-02 23:13 slab-nomerge (was Re: [git pull] device mapper changes for 4.3) Linus Torvalds
@ 2015-09-03  0:48 ` Andrew Morton
  2015-09-03  0:53   ` Mike Snitzer
  2015-09-03  0:51   ` Mike Snitzer
  1 sibling, 1 reply; 42+ messages in thread
From: Andrew Morton @ 2015-09-03  0:48 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Mike Snitzer, Dave Chinner, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, dm-devel, Alasdair G Kergon,
	Joe Thornber, Mikulas Patocka, Vivek Goyal, Sami Tolvanen,
	Viresh Kumar, Heinz Mauelshagen, linux-mm

On Wed, 2 Sep 2015 16:13:44 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Wed, Sep 2, 2015 at 10:39 AM, Mike Snitzer <snitzer@redhat.com> wrote:
> >
> > - last but not least: add SLAB_NO_MERGE flag to mm/slab_common and
> >   disable slab merging for all of DM's slabs (XFS will also use
> >   SLAB_NO_MERGE once merged).
> 
> So I'm not at all convinced this is the right thing to do. In fact,
> I'm pretty convinced it shouldn't be done this way. Since those
> commits were at the top of your tree, I just didn't pull them, but
> took the rest..

I don't have problems with the patch itself, really.  It only affects
callers who use SLAB_NO_MERGE and those developers can make
their own decisions.

It is a bit sad to de-optimise dm for all users for all time in order
to make life a bit easier for dm's developers, but maybe that's a
decent tradeoff.


What I do have a problem with is that afaict the patch appeared on
linux-mm for the first time just yesterday.  Didn't cc slab developers,
it isn't in linux-next, didn't cc linux-kernel or linux-mm or slab/mm
developers on the pull request.  Bad!

I'd like the slab developers to have time to understand and review this
change, please.  Partly so they have a chance to provide feedback for
the usual reasons, but also to help them understand the effect their
design choice had on client subystems.


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: slab-nomerge (was Re: [git pull] device mapper changes for 4.3)
  2015-09-02 23:13 slab-nomerge (was Re: [git pull] device mapper changes for 4.3) Linus Torvalds
@ 2015-09-03  0:51   ` Mike Snitzer
  2015-09-03  0:51   ` Mike Snitzer
  1 sibling, 0 replies; 42+ messages in thread
From: Mike Snitzer @ 2015-09-03  0:51 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Chinner, Christoph Lameter, Pekka Enberg, Andrew Morton,
	David Rientjes, Joonsoo Kim, dm-devel, Alasdair G Kergon,
	Joe Thornber, Mikulas Patocka, Vivek Goyal, Sami Tolvanen,
	Viresh Kumar, Heinz Mauelshagen, linux-mm

On Wed, Sep 02 2015 at  7:13pm -0400,
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Wed, Sep 2, 2015 at 10:39 AM, Mike Snitzer <snitzer@redhat.com> wrote:
> >
> > - last but not least: add SLAB_NO_MERGE flag to mm/slab_common and
> >   disable slab merging for all of DM's slabs (XFS will also use
> >   SLAB_NO_MERGE once merged).
> 
> So I'm not at all convinced this is the right thing to do. In fact,
> I'm pretty convinced it shouldn't be done this way. Since those
> commits were at the top of your tree, I just didn't pull them, but
> took the rest..

OK, thanks.
 
> You are basically making this one-sided decision based on your notion
> of convenience, and just forcing that thing unconditionally on people.

The switch to slab merging was forced on everyone without proper notice.

What I made possible with SLAB_NO_MERGE is for each subsystem to decide
if they would prefer to not allow slab merging.

> Your rationale seems _totally_ bogus: you say that it's to be able to
> observe the sizes of the dm slabs without using slab debugging.
> 
> First off, you don't have to enable slab debugging. You can just
> disable slab merging.  It's called "slab_nomerge". It does exactly
> what you would think it does.

I'm well aware of slab_nomerge.  I called it out in my commit message.

> And what is it that makes dm slabs such a special little princess?
> What makes you think that the fact that _you_ want to look at slab
> statistics means that everybody else suddenly must have separate slabs
> for dm, and dm only? Or xfs?

>From where I sit it is much more useful to have separate slabs.  Could
be if a case was actually made for slab merging I'd change my view.  But
as of now these trump the stated benefits of slab merging:
1) useful slab usage stats
2) fault isolation from other subsystems

> The other "rationale" was that not merging slabs limits
> cross-subsystem memory corruption. Again, what the _hell_ is special
> about device mapper that dm - and only dm - would make this a special
> thing? That is just pure and utter garbage. Again, we already have
> that "slab_nomerge" option, exactly so that when odd slab corruption
> issues happen (they are rare, but they do occasionally happen), you
> can try that to see if that pinpoints the problem more. And it is
> *not* limited to some random set of subsystems. Which makes it clearly
> superior to your broken approach, wouldn't you agree?

I'm not interested in deciding such things for everyone.

I added a flag that enables piecewise enablement of unshared slabs for
subsystems that really don't want shared slabs.

Aside from improved accounting, the point is to not allow other crap
code (e.g. staging or whatever) to impact other subsystems via shared
slabs.

> The only possible true rationale for why dm is special is "because dm
> is such a buggy piece of sh*t that it's much more likely to have these
> slab corruption bugs than anything else, so I'm just protecting the
> rest of the system".
>
> Is that really your rationale? Somehow I doubt it. But if it is, you
> really should have said so. At least then it would make sense why this
> thing came in through the dm tree, and why dm is so special than it -
> and only it - would disable slab merging.

The 3 lines that added SLAB_NO_MERGE were pretty damn clean.
SLAB_NO_MERGE gives subsystems a choice they didn't have before and they
frankly probably never knew they had to care about it because they didn't
know slabs were being merged.  I asked around enough to know I'm not an
idiot for having missed the memo on slab merging.

Lack of awareness aside, nobody ever _convincingly_ detailed why slab
merging was pushed on everyone.  Look at the header for commit 12220de
("mm/slab: support slab merge") -- now that is some seriously weak
justification!

I sought to get more insight on "why slab merging?" and all I found was
this in Documentation/vm/slub.txt:

"
Slab merging
------------

If no debug options are specified then SLUB may merge similar slabs together
in order to reduce overhead and increase cache hotness of objects.
slabinfo -a displays which slabs were merged together."
"

I couldn't even find which package provides slabinfo to run slabinfo -a!

And the hand-wavvy "reduce overhead and increase cache hotness of
objects" frankly sucks.

> So I'm not pulling things like this from the device mapper tree. There
> is just no excuse that I can see for something like SLAB_NO_MERGE to
> go through the dm tree in the first place, but that's doubly true when
> the rationale for these things were bogus and had nothing what-so-ever
> to do with dm.

As DM maintainer I do have a choice about how the subsystem is
architected.

> Things like this aren't supposed to come in through random irrelevant
> trees like this, and with no discussion (at least judging by the
> commits) with the maintainers of the other pieces of code.

DM is irrelevant now?  Because I pissed you off?  Or because you trully
think that?

This is the first and hopefully last time I get flamed by you.  I
shouldn't have pushed for this change so aggressively.  The lack of
feedback from mm people shouldn't have been taken by me as implied "we
forced it on you a year ago, fuck you".  But I'm genuinely _not_
appreciative of this change to shared slabs so I took action to restore
what I hold to be the right way to design system software.

> If you have issues with slab merging, then those should be discussed
> as such, not as some magical and bogus dm or xfs special case when
> they damn well aren't, and damn well will never be.
> 
> Yes, I'm annoyed. This was not done well. I realize that everybody
> thinks that _their_ code is so special and exceptional that
> "obviously" they should be treated specially, but I don't see that
> that is the case at all in this case.
> 
> If you want to argue that slab merging should be disabled by default,
> then that is an argument that I'm willing to believe might be valid
> ("the downsides are bigger than the upsides").  Or if you are able to
> explain why dm really _is_ special, that's an option too. But this
> kind of "random subsystems decide unilaterally to not follow the
> normal rules" is not acceptable. Not when the "arguments" for it have
> absolutely nothing in particular to do with that subsystem.

DM isn't special.  Never intended it to come off like it is.  I don't
want slab merging but as a middle ground I made it so it is left to each
subsystem to decide to use it or not.  I clearly was the first to take
issue with slab merging by calling it out with patches.  In doing so
Dave Chinner said he'd rather avoid using shared slabs in XFS.  Pretty
sure XFS isn't irrelvant yet.

I'd wager there would be a flood of other subsystems opting to use
SLAB_NO_MERGE.  I can appreciate that as something the pro-slab-merge
camp would like to avoid (the more that opt-out the more useless slab
merging becomes).

It is messed up that no _real_ justification was given for slab merging
yet it was pushed on everyone.  Thankfully it hasn't been unstable
(which backs up your point) but I'd still love to understand how it is
so beneficial.   Is it a significant win?  If so where?  Or is it a
microoptimization at the expense of both accounting and fault isolation?

Mike

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: slab-nomerge (was Re: [git pull] device mapper changes for 4.3)
@ 2015-09-03  0:51   ` Mike Snitzer
  0 siblings, 0 replies; 42+ messages in thread
From: Mike Snitzer @ 2015-09-03  0:51 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Chinner, Christoph Lameter, Pekka Enberg, Andrew Morton,
	David Rientjes, Joonsoo Kim, dm-devel, Alasdair G Kergon,
	Joe Thornber, Mikulas Patocka, Vivek Goyal, Sami Tolvanen,
	Viresh Kumar, Heinz Mauelshagen, linux-mm

On Wed, Sep 02 2015 at  7:13pm -0400,
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Wed, Sep 2, 2015 at 10:39 AM, Mike Snitzer <snitzer@redhat.com> wrote:
> >
> > - last but not least: add SLAB_NO_MERGE flag to mm/slab_common and
> >   disable slab merging for all of DM's slabs (XFS will also use
> >   SLAB_NO_MERGE once merged).
> 
> So I'm not at all convinced this is the right thing to do. In fact,
> I'm pretty convinced it shouldn't be done this way. Since those
> commits were at the top of your tree, I just didn't pull them, but
> took the rest..

OK, thanks.
 
> You are basically making this one-sided decision based on your notion
> of convenience, and just forcing that thing unconditionally on people.

The switch to slab merging was forced on everyone without proper notice.

What I made possible with SLAB_NO_MERGE is for each subsystem to decide
if they would prefer to not allow slab merging.

> Your rationale seems _totally_ bogus: you say that it's to be able to
> observe the sizes of the dm slabs without using slab debugging.
> 
> First off, you don't have to enable slab debugging. You can just
> disable slab merging.  It's called "slab_nomerge". It does exactly
> what you would think it does.

I'm well aware of slab_nomerge.  I called it out in my commit message.

> And what is it that makes dm slabs such a special little princess?
> What makes you think that the fact that _you_ want to look at slab
> statistics means that everybody else suddenly must have separate slabs
> for dm, and dm only? Or xfs?

From where I sit it is much more useful to have separate slabs.  Could
be if a case was actually made for slab merging I'd change my view.  But
as of now these trump the stated benefits of slab merging:
1) useful slab usage stats
2) fault isolation from other subsystems

> The other "rationale" was that not merging slabs limits
> cross-subsystem memory corruption. Again, what the _hell_ is special
> about device mapper that dm - and only dm - would make this a special
> thing? That is just pure and utter garbage. Again, we already have
> that "slab_nomerge" option, exactly so that when odd slab corruption
> issues happen (they are rare, but they do occasionally happen), you
> can try that to see if that pinpoints the problem more. And it is
> *not* limited to some random set of subsystems. Which makes it clearly
> superior to your broken approach, wouldn't you agree?

I'm not interested in deciding such things for everyone.

I added a flag that enables piecewise enablement of unshared slabs for
subsystems that really don't want shared slabs.

Aside from improved accounting, the point is to not allow other crap
code (e.g. staging or whatever) to impact other subsystems via shared
slabs.

> The only possible true rationale for why dm is special is "because dm
> is such a buggy piece of sh*t that it's much more likely to have these
> slab corruption bugs than anything else, so I'm just protecting the
> rest of the system".
>
> Is that really your rationale? Somehow I doubt it. But if it is, you
> really should have said so. At least then it would make sense why this
> thing came in through the dm tree, and why dm is so special than it -
> and only it - would disable slab merging.

The 3 lines that added SLAB_NO_MERGE were pretty damn clean.
SLAB_NO_MERGE gives subsystems a choice they didn't have before and they
frankly probably never knew they had to care about it because they didn't
know slabs were being merged.  I asked around enough to know I'm not an
idiot for having missed the memo on slab merging.

Lack of awareness aside, nobody ever _convincingly_ detailed why slab
merging was pushed on everyone.  Look at the header for commit 12220de
("mm/slab: support slab merge") -- now that is some seriously weak
justification!

I sought to get more insight on "why slab merging?" and all I found was
this in Documentation/vm/slub.txt:

"
Slab merging
------------

If no debug options are specified then SLUB may merge similar slabs together
in order to reduce overhead and increase cache hotness of objects.
slabinfo -a displays which slabs were merged together."
"

I couldn't even find which package provides slabinfo to run slabinfo -a!

And the hand-wavvy "reduce overhead and increase cache hotness of
objects" frankly sucks.

> So I'm not pulling things like this from the device mapper tree. There
> is just no excuse that I can see for something like SLAB_NO_MERGE to
> go through the dm tree in the first place, but that's doubly true when
> the rationale for these things were bogus and had nothing what-so-ever
> to do with dm.

As DM maintainer I do have a choice about how the subsystem is
architected.

> Things like this aren't supposed to come in through random irrelevant
> trees like this, and with no discussion (at least judging by the
> commits) with the maintainers of the other pieces of code.

DM is irrelevant now?  Because I pissed you off?  Or because you trully
think that?

This is the first and hopefully last time I get flamed by you.  I
shouldn't have pushed for this change so aggressively.  The lack of
feedback from mm people shouldn't have been taken by me as implied "we
forced it on you a year ago, fuck you".  But I'm genuinely _not_
appreciative of this change to shared slabs so I took action to restore
what I hold to be the right way to design system software.

> If you have issues with slab merging, then those should be discussed
> as such, not as some magical and bogus dm or xfs special case when
> they damn well aren't, and damn well will never be.
> 
> Yes, I'm annoyed. This was not done well. I realize that everybody
> thinks that _their_ code is so special and exceptional that
> "obviously" they should be treated specially, but I don't see that
> that is the case at all in this case.
> 
> If you want to argue that slab merging should be disabled by default,
> then that is an argument that I'm willing to believe might be valid
> ("the downsides are bigger than the upsides").  Or if you are able to
> explain why dm really _is_ special, that's an option too. But this
> kind of "random subsystems decide unilaterally to not follow the
> normal rules" is not acceptable. Not when the "arguments" for it have
> absolutely nothing in particular to do with that subsystem.

DM isn't special.  Never intended it to come off like it is.  I don't
want slab merging but as a middle ground I made it so it is left to each
subsystem to decide to use it or not.  I clearly was the first to take
issue with slab merging by calling it out with patches.  In doing so
Dave Chinner said he'd rather avoid using shared slabs in XFS.  Pretty
sure XFS isn't irrelvant yet.

I'd wager there would be a flood of other subsystems opting to use
SLAB_NO_MERGE.  I can appreciate that as something the pro-slab-merge
camp would like to avoid (the more that opt-out the more useless slab
merging becomes).

It is messed up that no _real_ justification was given for slab merging
yet it was pushed on everyone.  Thankfully it hasn't been unstable
(which backs up your point) but I'd still love to understand how it is
so beneficial.   Is it a significant win?  If so where?  Or is it a
microoptimization at the expense of both accounting and fault isolation?

Mike

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: slab-nomerge (was Re: [git pull] device mapper changes for 4.3)
  2015-09-03  0:48 ` Andrew Morton
@ 2015-09-03  0:53   ` Mike Snitzer
  0 siblings, 0 replies; 42+ messages in thread
From: Mike Snitzer @ 2015-09-03  0:53 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linus Torvalds, Dave Chinner, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, dm-devel, Alasdair G Kergon,
	Joe Thornber, Mikulas Patocka, Vivek Goyal, Sami Tolvanen,
	Viresh Kumar, Heinz Mauelshagen, linux-mm

On Wed, Sep 02 2015 at  8:48pm -0400,
Andrew Morton <akpm@linux-foundation.org> wrote:

> On Wed, 2 Sep 2015 16:13:44 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
> > On Wed, Sep 2, 2015 at 10:39 AM, Mike Snitzer <snitzer@redhat.com> wrote:
> > >
> > > - last but not least: add SLAB_NO_MERGE flag to mm/slab_common and
> > >   disable slab merging for all of DM's slabs (XFS will also use
> > >   SLAB_NO_MERGE once merged).
> > 
> > So I'm not at all convinced this is the right thing to do. In fact,
> > I'm pretty convinced it shouldn't be done this way. Since those
> > commits were at the top of your tree, I just didn't pull them, but
> > took the rest..
> 
> I don't have problems with the patch itself, really.  It only affects
> callers who use SLAB_NO_MERGE and those developers can make
> their own decisions.
> 
> It is a bit sad to de-optimise dm for all users for all time in order
> to make life a bit easier for dm's developers, but maybe that's a
> decent tradeoff.
> 
> 
> What I do have a problem with is that afaict the patch appeared on
> linux-mm for the first time just yesterday.  Didn't cc slab developers,
> it isn't in linux-next, didn't cc linux-kernel or linux-mm or slab/mm
> developers on the pull request.  Bad!

Yeap, noted.  Won't happen again.

> I'd like the slab developers to have time to understand and review this
> change, please.  Partly so they have a chance to provide feedback for
> the usual reasons, but also to help them understand the effect their
> design choice had on client subystems.

Sure, sorry to force the issue like I did.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: slab-nomerge (was Re: [git pull] device mapper changes for 4.3)
  2015-09-03  0:51   ` Mike Snitzer
  (?)
@ 2015-09-03  1:21   ` Linus Torvalds
  2015-09-03  2:31     ` Mike Snitzer
  2015-09-03  6:02     ` Dave Chinner
  -1 siblings, 2 replies; 42+ messages in thread
From: Linus Torvalds @ 2015-09-03  1:21 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Dave Chinner, Christoph Lameter, Pekka Enberg, Andrew Morton,
	David Rientjes, Joonsoo Kim, dm-devel, Alasdair G Kergon,
	Joe Thornber, Mikulas Patocka, Vivek Goyal, Sami Tolvanen,
	Viresh Kumar, Heinz Mauelshagen, linux-mm

On Wed, Sep 2, 2015 at 5:51 PM, Mike Snitzer <snitzer@redhat.com> wrote:
>
> What I made possible with SLAB_NO_MERGE is for each subsystem to decide
> if they would prefer to not allow slab merging.

.. and why is that a choice that even makes sense at that level?

Seriously.

THAT is the fundamental issue here.

There are absolutely zero reasons this is dm-specific, but it is
equally true that there are absolutely zero reasons that it is
xyzzy-specific, for any random value of 'xyzzy'.

And THAT is why I'm fairly convinced that the whole approach is bogus
and broken.

And note that that bogosity is separate from how this was done. It's a
broken approach, but it was also done wrong. Two totally separate
issues, but together it sure is annoying.

> From where I sit it is much more useful to have separate slabs.  Could
> be if a case was actually made for slab merging I'd change my view.  But
> as of now these trump the stated benefits of slab merging:
> 1) useful slab usage stats
> 2) fault isolation from other subsystems

.. and again, absolutely NEITHER of those have anything to do with
"subsystem X".

Can you really not see how *illogical* it is to make this a subsystem choice?

So explain to me why you made it so?

> The 3 lines that added SLAB_NO_MERGE were pretty damn clean.

No. It really seriously wasn't.

The code may be simple, but it sure isn't "pretty damn clean", exactly
because I think the whole concept is fundamentally illogical. See
above.

As I mentioned in my email: if your point is that "slab_nomerge" has
the wrong default value, then that is a different discussion, and one
that may well be valid.

But the whole concept of "random slabs can mark themselves no-merge
for no obvious reasons" is broken. That was my argument, and you don't
seem to get it.

And even if it turns out not to be broken (please explain), it still
should have been discussed.

> SLAB_NO_MERGE gives subsystems a choice they didn't have before and they
> frankly probably never knew they had to care about it because they didn't
> know slabs were being merged.  I asked around enough to know I'm not an
> idiot for having missed the memo on slab merging.

Put another way: things have been merged for years, and you didn't even notice.

Seriously. I'm not exaggerating about "for years". At least for slub,
it's been that way since it was initially  merged, back in 2007.
Yeah, it may have taken a while for slub to then become one of the
major allocators, but it's been the default in at least Fedora for
years and years too, afaik, so it's not like slub is something odd and
unusual.

You seem to argue that "not being aware of it" means that it's
surprising and should be disabled. But quite frankly, wouldn't you say
that "it hasn't caused any obvious problems" is at _least_ as likely
an explanation for you not being aware of it?

Because clearly, that lack of statistics and the possible
cross-subsystem corruption hasn't actually been a pressing concern in
reality.

But suddenly it became such a big issue that you just _had_ to fix it,
right? After seven years it's suddenly *so* important that dm
absolutely has to disable it. And it really had to be dm that did it
for its caches, rather than just use "slab_nomerge".

Despite there not being anything dm-specific about that choice.

Now tell me, what was the rationale for this all again?

Because really, I'm not seeing it. And I'm _particularly_ not seeing
why it then had to be sneaked in like this.

                Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: slab-nomerge (was Re: [git pull] device mapper changes for 4.3)
  2015-09-03  1:21   ` Linus Torvalds
@ 2015-09-03  2:31     ` Mike Snitzer
  2015-09-03  3:10       ` Christoph Lameter
  2015-09-03  3:11       ` Linus Torvalds
  2015-09-03  6:02     ` Dave Chinner
  1 sibling, 2 replies; 42+ messages in thread
From: Mike Snitzer @ 2015-09-03  2:31 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Heinz Mauelshagen, Andrew Morton, Viresh Kumar, Dave Chinner,
	Joe Thornber, Pekka Enberg, linux-mm, dm-devel, Mikulas Patocka,
	Vivek Goyal, Sami Tolvanen, David Rientjes, Joonsoo Kim,
	Christoph Lameter, Alasdair G Kergon

On Wed, Sep 02 2015 at  9:21pm -0400,
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Wed, Sep 2, 2015 at 5:51 PM, Mike Snitzer <snitzer@redhat.com> wrote:
> >
> > What I made possible with SLAB_NO_MERGE is for each subsystem to decide
> > if they would prefer to not allow slab merging.
> 
> .. and why is that a choice that even makes sense at that level?
> 
> Seriously.
> 
> THAT is the fundamental issue here.
> 
> There are absolutely zero reasons this is dm-specific, but it is
> equally true that there are absolutely zero reasons that it is
> xyzzy-specific, for any random value of 'xyzzy'.
> 
> And THAT is why I'm fairly convinced that the whole approach is bogus
> and broken.

Why do we even have slab creation flags?

Andrew seemed much more reasonable about this.

> And note that that bogosity is separate from how this was done. It's a
> broken approach, but it was also done wrong. Two totally separate
> issues, but together it sure is annoying.
> 
> > From where I sit it is much more useful to have separate slabs.  Could
> > be if a case was actually made for slab merging I'd change my view.  But
> > as of now these trump the stated benefits of slab merging:
> > 1) useful slab usage stats
> > 2) fault isolation from other subsystems
> 
> .. and again, absolutely NEITHER of those have anything to do with
> "subsystem X".

OK, I get that I'm unimportant.  You can stop beating me over my
irrelevant subsystem maintainer head now...

But when longstanding isolation and functionality is removed in the name
of microoptimizations its difficult to accept -- even if the realization
occurs years after the fact.

> Can you really not see how *illogical* it is to make this a subsystem choice?
> 
> So explain to me why you made it so?
> 
> > The 3 lines that added SLAB_NO_MERGE were pretty damn clean.
> 
> No. It really seriously wasn't.
> 
> The code may be simple, but it sure isn't "pretty damn clean", exactly
> because I think the whole concept is fundamentally illogical. See
> above.

Yeah, your circular logic doesn't help me.  You defined your argument in
terms of unsubstantiated claims of me being illogical.

What is illogical about wanting DM to:
1) have useful slab accounting
2) have fault isolation from other slab consumers
3) not impose 1+2 on all other subsystems

?

I guess I'm just supposed to accept that slab merging is or isn't.
There is no in-between (unless I create a slab with SLAB_DESTROY_BY_RCU)

> As I mentioned in my email: if your point is that "slab_nomerge" has
> the wrong default value, then that is a different discussion, and one
> that may well be valid.
> 
> But the whole concept of "random slabs can mark themselves no-merge
> for no obvious reasons" is broken. That was my argument, and you don't
> seem to get it.

I'm not getting it because I don't understand why you really care.  What
implied benefits come with slab merging that I'm painfully unaware of?

Andrew said DM would miss out on performance benefits.  I'd obviously
not want to do that; but said performance benefits haven't been made
apparent.
 
> And even if it turns out not to be broken (please explain), it still
> should have been discussed.

See above ;)

> > SLAB_NO_MERGE gives subsystems a choice they didn't have before and they
> > frankly probably never knew they had to care about it because they didn't
> > know slabs were being merged.  I asked around enough to know I'm not an
> > idiot for having missed the memo on slab merging.
> 
> Put another way: things have been merged for years, and you didn't even notice.
> 
> Seriously. I'm not exaggerating about "for years". At least for slub,
> it's been that way since it was initially  merged, back in 2007.
> Yeah, it may have taken a while for slub to then become one of the
> major allocators, but it's been the default in at least Fedora for
> years and years too, afaik, so it's not like slub is something odd and
> unusual.

You're also coming at this from a position that shared slabs are
automatically good because they have been around for years.

For those years I've not had a need to debug a leak in code I maintain;
so I didn't notice slabs were merged.  I also haven't observed slab
corruption being the cause of crashes in DM, block or SCSI.

> You seem to argue that "not being aware of it" means that it's
> surprising and should be disabled. But quite frankly, wouldn't you say
> that "it hasn't caused any obvious problems" is at _least_ as likely
> an explanation for you not being aware of it?

Sure.

> Because clearly, that lack of statistics and the possible
> cross-subsystem corruption hasn't actually been a pressing concern in
> reality.

Agreed.

> But suddenly it became such a big issue that you just _had_ to fix it,
> right? After seven years it's suddenly *so* important that dm
> absolutely has to disable it. And it really had to be dm that did it
> for its caches, rather than just use "slab_nomerge".

The ship sailed on disabling it for everyone.  It is the new norm.  I
cannot push RHEL to flip-flop slab characteristics (at least not until
the next major release).

> Despite there not being anything dm-specific about that choice.
> 
> Now tell me, what was the rationale for this all again?

I was the first to want the option to opt-out on a per slab basis.  And
you're shooting the messenger.  Calling me illogical.

> Because really, I'm not seeing it. And I'm _particularly_ not seeing
> why it then had to be sneaked in like this.

And I'm a sneaky too... Sneaking isn't what this was.  Apologies if
that's how it came off.  I can appreciate why you might think that.
But like I said to Andrew: won't happen again.

I'm off the next 5 days.  I don't think either of us care _that_
strongly about this particular issue.  I've noted my process flaws.
I'll calm down and this will just be some unfortunate thing that
happened.

But I'd still like some pointers/help on what makes slab merging so
beneficial.  I'm sure Christoph and others have justification.  But if
not then yes the default to slab merging probably should be revisited.

Mike

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: slab-nomerge (was Re: [git pull] device mapper changes for 4.3)
  2015-09-03  2:31     ` Mike Snitzer
@ 2015-09-03  3:10       ` Christoph Lameter
  2015-09-03  4:55         ` Andrew Morton
  2015-09-03  3:11       ` Linus Torvalds
  1 sibling, 1 reply; 42+ messages in thread
From: Christoph Lameter @ 2015-09-03  3:10 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Linus Torvalds, Heinz Mauelshagen, Andrew Morton, Viresh Kumar,
	Dave Chinner, Joe Thornber, Pekka Enberg, linux-mm, dm-devel,
	Mikulas Patocka, Vivek Goyal, Sami Tolvanen, David Rientjes,
	Joonsoo Kim, Alasdair G Kergon

On Wed, 2 Sep 2015, Mike Snitzer wrote:

> You're also coming at this from a position that shared slabs are
> automatically good because they have been around for years.
>
> For those years I've not had a need to debug a leak in code I maintain;
> so I didn't notice slabs were merged.  I also haven't observed slab
> corruption being the cause of crashes in DM, block or SCSI.

Hmmm... Thats unusual. I have seen numerous leaks and corruptions that
were debugged using the additional debug code in the slab allocators.
Merging and debugging can be switched on at runtime if necessary and then
you will have a clear separation to be able to track down the offending
code as well as detailed problem reports that help to figure out what was
wrong. It is then typically even possible to fix these bugs without
getting the subsystem specialists involved.

> > Because clearly, that lack of statistics and the possible
> > cross-subsystem corruption hasn't actually been a pressing concern in
> > reality.
>
> Agreed.

To the effect now that even SLAB has adopted cache merging.

> But I'd still like some pointers/help on what makes slab merging so
> beneficial.  I'm sure Christoph and others have justification.  But if
> not then yes the default to slab merging probably should be revisited.

Well, we have discussed the pros and cons for merging a couple of times
but the general consensus was that it is beneficial. Performance on modern
cpu is very sensitive to cache footprint and reducing the overhead of meta
data for object allocation is a worthwhile goal. Also objects are more
likely to be kept cache hot if they can be used by multiple subsystems.
Slab merging also helps with reducing fragmentation since the free
objects on one page can be used for other purposes.

Check out the linux-mm archives for these dissussions.

This has been such an advantage that the feature was ported to SLAB (to
much more signficant effect than SLUB since SLAB is a pig with metadata
per node, per cpu and per kmem_cache). And yes sorry the consequence is
now that you do no longer have a choice. Both slab allocators default to
merging. SLAB had some difficulty staying competitive in performance
without that. Joonsoo Kim made SLAB more competitive last year and one of
the optimizations was to also support merging.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: slab-nomerge (was Re: [git pull] device mapper changes for 4.3)
  2015-09-03  2:31     ` Mike Snitzer
  2015-09-03  3:10       ` Christoph Lameter
@ 2015-09-03  3:11       ` Linus Torvalds
  1 sibling, 0 replies; 42+ messages in thread
From: Linus Torvalds @ 2015-09-03  3:11 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Heinz Mauelshagen, Andrew Morton, Viresh Kumar, Dave Chinner,
	Joe Thornber, Pekka Enberg, linux-mm, dm-devel, Mikulas Patocka,
	Vivek Goyal, Sami Tolvanen, David Rientjes, Joonsoo Kim,
	Christoph Lameter, Alasdair G Kergon

On Wed, Sep 2, 2015 at 7:31 PM, Mike Snitzer <snitzer@redhat.com> wrote:
>
> Why do we even have slab creation flags?

Ehh? Because they are meaningful?

Things like SLAB_DESTROY_BY_RCU have real semantic meaning. The
subsystem that creates the slab *cares*, and it makes sense because
that kind of choice really fundamentally is a per-slab choice.

>> .. and again, absolutely NEITHER of those have anything to do
>> "subsystem X".
>
> OK, I get that I'm unimportant.  You can stop beating me over my
> irrelevant subsystem maintainer head now...

What the hell is your problem?

At no point did I state that you are any less important than anything
else. But this isssue is simply not in any way specific to dm. dm is
not any less important than anything else, but dm is also not
magically *more* important than everything else.

Really.

Then you seem to take it personally, but please realize that that is
*your* issue, not mine.

> But when longstanding isolation and functionality is removed in the name
> of microoptimizations its difficult to accept -- even if the realization
> occurs years after the fact.

Bullshit.

You didn't notice. For years. It just wasn't important. Just admit it.
Those things you now tout as so important are complete non-issues.

But more importantly, and this is what you seem to not really get at
all, is that it's STILL not dm-specific.

If you think that isolation is so important, then tell me why
isolation is only important for dm? Why isn't it important for
everything else? What makes dm so special?

Really. I've asked you three times now, and you seem to not get it,
you just think I'm trying to put you in your place. I'm not. I'm
asking a serious question: what makes dm so special that it has to
have different allocation logic from everything else.

And *THAT* is why SLAB_DESTROY_BY_RCU is different from your
SLAB_NO_MERGE. Because I can actually answer the question:

   "What makes sighand_cachep need SLAB_DESTROY_BY_RCU but not other users?"

with a real technical reason.

> I'm not getting it because I don't understand why you really care.  What
> implied benefits come with slab merging that I'm painfully unaware of?

It does actually have less overhead, for one thing. The separation of
slabs doesn't cost you just in the slab data structure itself, but in
the memory fragmentation. Having multiple slabs share the backing pool
of pages uses less memory.

> You're also coming at this from a position that shared slabs are
> automatically good because they have been around for years.

No, I'm really not.

Christ, have you read anything I wrote?

I'm ok with discussing the "the defaults should be turned around". But
at least we *have* an option to turn that default around, so when
people care (because they are trying to chase down a slab corruption
issue, for example), they can do so.

Your patch actually gets rid of that choice, and forces things the
other way around.

So I would argue that your patch actually makes things *worse*.  It
hardcodes an arbitrary choice, and it's not even a choice that makes
obvious sense.

And no, the memory fragmentation issue isn't just made up. One of the
downsides of slab was historically that it used a lot of memory, and
to be honest, I suspect the percpu queues have made things worse. At
least sharing the backing store minimizes the effect of that somewhat.
We used to have numbers for this all, but it's really approaching a
decade since the whole initial SLUB vs SLAB things, so I don't know
where to point you.

But the reason I say "it's not a choice that makes obvious sense"
isn't even because I'm convinced that the merging is always the best
option. I *am* convinced that it has real upsides, but I also agree
that it has downsides. But at least as it is right now, the system
admin can make a choice.

You arbitrarily wanted to take that choice away for dm, without
apparently even knowing what the upsides of merging might be.

But the *real* issue I have with it is the completely random "dm is
different from everything else" thing. Which is bogus. That's what I
wanted to know: what makes dm so special that it should be different
from everything else?

And apparently you don't have an answer to that. You just took my
repeated questioning to mean that you're worthless. That wasn't the
intent. It was very much a literal "what's so different about dm that
it would act differently from everything else"?

> The ship sailed on disabling it for everyone.  It is the new norm.  I
> cannot push RHEL to flip-flop slab characteristics (at least not until
> the next major release).

But you can. Today. Put "slab_nomerge" on the kernel command line.

Really. If you care, you can do that. And if you _don't_ care, then
clearly not doing that doesn't hurt either.

> I was the first to want the option to opt-out on a per slab basis.  And
> you're shooting the messenger.  Calling me illogical.

But the opt-in shouldn't be *you*, it should be the system maintainer
who can actually tune for his load, or cares about memory use, or
wants to debug, or any number of issues.

See?

Btw, I do agree that the "all or nothing" approach of "slab_nomerge"
isn't optimal. But you made things *worse*. You took a tunable, and
made it non-tunable, without apparently even knowing what it tuned
for. Sure, it was a damn coarse-grained tunable, but you made *that*
worse too, since with your code it's not tunable at all for dm. So
your version isn't actually any more "fine-grained".

Now, what might be interesting - *if* people actually want to tune
just one set of slabs and not another - migth be to extend the
"slab_nomerge" option to actually take a pattern of slab names, and
match that way.

So then you could say "slab_nomerge=dm_* slab_nomerge=xfs*", and you'd
not merge dm or xfs slabs. I wouldn't mind that kind of approach at
all.

But please understand _why_ I wouldn't mind it: I wouldn't mind it
exactly because you didn't take tuning choice away from people, but
because such a patch would actually give people control of it. And it
*wouldn't* be dm-specific, because other people might ask to not merge
ext4 slabs or whatever.

And for a similar reason, I actually wouldn't mind switching the
default around for merging. I'm *not* married to the "we have to merge
slab caches by default" model. It used to make sense, and I know I've
seen numbers (I'm pretty sure Christoph Lameter had several talks
about it back in the days), but things can change.

But what doesn't make sense is to make random willy-nilly decisions on
a basis that makes no sense. And I do claim that random subsystems
just unilaterally deciding that they don't care about system default
memory management falls under that "makes no sense" heading.

                          Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: slab-nomerge (was Re: [git pull] device mapper changes for 4.3)
  2015-09-03  3:10       ` Christoph Lameter
@ 2015-09-03  4:55         ` Andrew Morton
  2015-09-03  6:09           ` Pekka Enberg
  0 siblings, 1 reply; 42+ messages in thread
From: Andrew Morton @ 2015-09-03  4:55 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Mike Snitzer, Linus Torvalds, Heinz Mauelshagen, Viresh Kumar,
	Dave Chinner, Joe Thornber, Pekka Enberg, linux-mm, dm-devel,
	Mikulas Patocka, Vivek Goyal, Sami Tolvanen, David Rientjes,
	Joonsoo Kim, Alasdair G Kergon

On Wed, 2 Sep 2015 22:10:12 -0500 (CDT) Christoph Lameter <cl@linux.com> wrote:

> > But I'd still like some pointers/help on what makes slab merging so
> > beneficial.  I'm sure Christoph and others have justification.  But if
> > not then yes the default to slab merging probably should be revisited.
> 
> ...
>
> Check out the linux-mm archives for these dissussions.

Somewhat OT, but...  The question Mike asks should be comprehensively
answered right there in the switch-to-merging patch's changelog.

The fact that it is not answered in the appropriate place and that
we're reduced to vaguely waving at the list archives is a fail.  And a
lesson!

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: slab-nomerge (was Re: [git pull] device mapper changes for 4.3)
  2015-09-03  1:21   ` Linus Torvalds
  2015-09-03  2:31     ` Mike Snitzer
@ 2015-09-03  6:02     ` Dave Chinner
  2015-09-03  6:13       ` Pekka Enberg
                         ` (2 more replies)
  1 sibling, 3 replies; 42+ messages in thread
From: Dave Chinner @ 2015-09-03  6:02 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Mike Snitzer, Christoph Lameter, Pekka Enberg, Andrew Morton,
	David Rientjes, Joonsoo Kim, dm-devel, Alasdair G Kergon,
	Joe Thornber, Mikulas Patocka, Vivek Goyal, Sami Tolvanen,
	Viresh Kumar, Heinz Mauelshagen, linux-mm

On Wed, Sep 02, 2015 at 06:21:02PM -0700, Linus Torvalds wrote:
> On Wed, Sep 2, 2015 at 5:51 PM, Mike Snitzer <snitzer@redhat.com> wrote:
> >
> > What I made possible with SLAB_NO_MERGE is for each subsystem to decide
> > if they would prefer to not allow slab merging.
> 
> .. and why is that a choice that even makes sense at that level?
> 
> Seriously.
> 
> THAT is the fundamental issue here.

It makes a lot more sense than you think, Linus.

One of the reasons slab caches exist is to separate objects of
identical characteristics from the heap allocator so that they are
all grouped together in memory and so can be allocated/freed
efficiently.  This helps prevent heap fragmentation, allows objects
to pack as tightly together as possible, gives direct measurement of
the number of objects, the memory usage, the fragmentation factor,
etc. Containment of memory corruption is another historical reason
for slab separation (proof: current memory debugging options always
causes slab separation).

Slab merging is the exact opposite of this - we're taking homogenous
objects and mixing them with other homogneous containing different
objects with different life times. Indeed, we are even mixing them
back into the slabs used for the heap, despite the fact the original
purpose of named slabs was to separate allocation from the heap...

Don't get me wrong - this isn't necessarily bad - but I'm just
pointing out that slab merging is doing the opposite of what slabs
were originally intended for. Indeed, a lot of people use slab
caches just because it's anice encapsulation, not for any specific
performance, visibility or anti-fragmentation purposes.  I have no
problems with automatically merging slabs created like this.

However the fact that we are merging slabs automatically for all
slabs now has made me think a bit deeper about the problems that can
result from this.

> There are absolutely zero reasons this is dm-specific, but it is
> equally true that there are absolutely zero reasons that it is
> xyzzy-specific, for any random value of 'xyzzy'.

Right, it's not xyzzy-specific where 'xyzzy' is a subsystem. The
flag application is actually *object specific*. That is, the use of
the individual objects that determines whether it should be merged
or not.

e.g. Slab fragmentation levels are affected more than anything by
mixing objects with different life times in the same slab.  i.e. if
we free all the short lived objects from a page but there is one
long lived object on the page then that page is pinned and we free
no memory. Do that to enough pages in the slab, and we end up with a
badly fragmented slab.

With slab merging, we have no control over what slabs are merged. We
may be merging slabs with objects that have vastly different life
times. Hence merging may actually be making one of the underlying
cause of slab fragmentation worse rather than better. It really
depends on what slabs get merged together and that's largely random
chance - you don't get to pick the size of your structures....

Another contributor to slab fragmentation is when allocation order
is very different to object freeing order. Pages in the slab get
fill up using an algorithm that optimises for temporal locality.
i.e. it will fill a partial page before moving on to the next
partial page or allocating a new page.  If the freeing of objects
doesn't have the same temporal locality as allocation then when the
slab grows and shrinks we end up with fragmentation. Mixing
different object types into the same pages pretty much guarantees
that we'll be mixing objects of different alloc/freeing order.

Further, rapid growth and shrinking of a slab cache due to memory
demand can cause fragmentation. Caches that have this problem are
usually those that have a shrinker associated with them. The
shrinker causes objects to have a variable, unpredictable lifetime
and hence can break allocation/freeing locality (as per above, even
for single object slabs).

Minmising the effect of this reclaim fragmentation is often held up
as the example of why slab merging is good - the other object types
fill all the holes and hence reduces the overall fragmentation of
the slab. Further, the density of the reclaimable objects is lower,
so the slab doesn't fragment as much.

On the surface, this looks like a big win but it's not - it's
actually a major problem for slab reclaim and it manifests when
there are large bursts of allocation activity followed by sudden
reclaim activity.  When the slab grows rapidly, we get the majority
of objects on a page being of one type, but a couple will be of a
different type. Than under memory pressure, the shrinker can then
only free the majority of objects on a page, guaranteeing the slab
will remain fragmented under memory pressure.  Continuing to run the
shrinker won't result in any more memory being freed from the merged
slab and so we are stuck with unfixable slab fragmentation.

However, if the slab with a shrinker only contains one kind of
object, when it becomes fragmented due to variable object lifetime,
continued memory pressure will cause it to keep shrinking and hence
will eventually correct the fragmentation problem. This is a much
more robust configuration - the system will self correct without
user intervention being necessary.

IOWs, slab merging prevents us from implementing effective active
fragmentation management algorithms and hence prevents us  from
reducing slab fragmentation via improved shrinker reclaim
algorithms.  Simply put: slab merging reduces the effectiveness of
shrinker based slab reclaim.

A key observation I just made: we are extremely lucky that many of
the critical slab caches in the system are not affected by merging.
A slab cache with a constructor will not get merged and that means
inode caches do not get merged. Hence, despite slab merging being
enabled, one of the largest memory consuming slabs in the system
does not get merged and hence it means the shrinker has been able to
do it's job without interference. hence we've avoided the worst
outcome of merging slabs by default by luck rather than good
managment.

Moving on from fragmentation: Slab caches can also back mempools.
mempools ar eused to guarantee forwards progress under memory
pressure, so it's important to have visibility into their behaviour.

Hence it makes sense to ensure these don't get merged with other
slabs so they are accounted accurately and we can see exactly the
demand being placed on these critical slabs under heavy memory
pressure. I've made use of this several times over the past few
years to discover why a system is floundering under heavy memory
pressure (e.g. writeback way slower than it should have been because
the xfs_ioend mempool was operating in 1-in, 1-out mode)...

So, when I said that I could use the SLAB_NO_MERGE for some caches
in XFS and acked the patch, I was refering to exactly this sort of
usage - the slabs that back mempools and the slabs that have a
shrinker for reclaim should have this flag set. 4 of 17 named slabs
in XFS need this flag - the rest I don't really care about because
their memory usage can be inferred from the shrinkable slab cache
sizes.

Managing slab caches and fragmentation is anything but simple and
there is no one right solution. Slab merging in some cases makes
sense, but there are several very good reasons for not merging a
slab.  The right solution is often difficult for people without
object-specific expertise to understand, but that goes for just
about everything in the kernel these days.

BTW, it is trivial to achieve SLAB_NO_MERGE simply by supplying a
dummy constructor to the slab initialisation.  I'd much prefer
SLAB_NO_MERGE or some variant, though.

Cheers,

Dave.
-- 
Dave Chinner
dchinner@redhat.com

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: slab-nomerge (was Re: [git pull] device mapper changes for 4.3)
  2015-09-03  4:55         ` Andrew Morton
@ 2015-09-03  6:09           ` Pekka Enberg
  2015-09-03  8:53             ` Dave Chinner
  0 siblings, 1 reply; 42+ messages in thread
From: Pekka Enberg @ 2015-09-03  6:09 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Christoph Lameter, Mike Snitzer, Linus Torvalds,
	Heinz Mauelshagen, Viresh Kumar, Dave Chinner, Joe Thornber,
	linux-mm, dm-devel, Mikulas Patocka, Vivek Goyal, Sami Tolvanen,
	David Rientjes, Joonsoo Kim, Alasdair G Kergon

Hi Andrew,

On Wed, 2 Sep 2015 22:10:12 -0500 (CDT) Christoph Lameter <cl@linux.com> wrote:
>> > But I'd still like some pointers/help on what makes slab merging so
>> > beneficial.  I'm sure Christoph and others have justification.  But if
>> > not then yes the default to slab merging probably should be revisited.
>>
>> ...
>>
>> Check out the linux-mm archives for these dissussions.

On Thu, Sep 3, 2015 at 7:55 AM, Andrew Morton <akpm@linux-foundation.org> wrote:
> Somewhat OT, but...  The question Mike asks should be comprehensively
> answered right there in the switch-to-merging patch's changelog.
>
> The fact that it is not answered in the appropriate place and that
> we're reduced to vaguely waving at the list archives is a fail.  And a
> lesson!

Slab merging is a technique to reduce memory footprint and memory
fragmentation. Joonsoo reports 3% slab memory reduction after boot
when he added the feature to SLAB:

commit 12220dea07f1ac6ac717707104773d771c3f3077
Author: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Date:   Thu Oct 9 15:26:24 2014 -0700

    mm/slab: support slab merge

    Slab merge is good feature to reduce fragmentation.  If new creating slab
    have similar size and property with exsitent slab, this feature reuse it
    rather than creating new one.  As a result, objects are packed into fewer
    slabs so that fragmentation is reduced.

    Below is result of my testing.

    * After boot, sleep 20; cat /proc/meminfo | grep Slab

    <Before>
    Slab: 25136 kB

    <After>
    Slab: 24364 kB

    We can save 3% memory used by slab.

    For supporting this feature in SLAB, we need to implement SLAB specific
    kmem_cache_flag() and __kmem_cache_alias(), because SLUB implements some
    SLUB specific processing related to debug flag and object size change on
    these functions.

    Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: Christoph Lameter <cl@linux.com>
    Cc: Pekka Enberg <penberg@kernel.org>
    Cc: David Rientjes <rientjes@google.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

We don't have benchmarks to directly measure its performance impact
but you should see its effect via something like netperf that stresses
the allocator heavily. The assumed benefit is that you're able to
recycle cache hot objects much more efficiently as SKB cache and
friends are merged to regular kmalloc caches.

In any case, reducing kernel memory footprint already is a big win for
various use cases, so keeping slab merging on by default is desirable.

- Pekka

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: slab-nomerge (was Re: [git pull] device mapper changes for 4.3)
  2015-09-03  6:02     ` Dave Chinner
@ 2015-09-03  6:13       ` Pekka Enberg
  2015-09-03 10:29       ` Jesper Dangaard Brouer
  2015-09-03 15:02       ` Linus Torvalds
  2 siblings, 0 replies; 42+ messages in thread
From: Pekka Enberg @ 2015-09-03  6:13 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Linus Torvalds, Mike Snitzer, Christoph Lameter, Andrew Morton,
	David Rientjes, Joonsoo Kim, dm-devel, Alasdair G Kergon,
	Joe Thornber, Mikulas Patocka, Vivek Goyal, Sami Tolvanen,
	Viresh Kumar, Heinz Mauelshagen, linux-mm

On Thu, Sep 3, 2015 at 9:02 AM, Dave Chinner <dchinner@redhat.com> wrote:
> One of the reasons slab caches exist is to separate objects of
> identical characteristics from the heap allocator so that they are
> all grouped together in memory and so can be allocated/freed
> efficiently.  This helps prevent heap fragmentation, allows objects
> to pack as tightly together as possible, gives direct measurement of
> the number of objects, the memory usage, the fragmentation factor,
> etc. Containment of memory corruption is another historical reason
> for slab separation (proof: current memory debugging options always
> causes slab separation).
>
> Slab merging is the exact opposite of this - we're taking homogenous
> objects and mixing them with other homogneous containing different
> objects with different life times. Indeed, we are even mixing them
> back into the slabs used for the heap, despite the fact the original
> purpose of named slabs was to separate allocation from the heap...
>
> Don't get me wrong - this isn't necessarily bad - but I'm just
> pointing out that slab merging is doing the opposite of what slabs
> were originally intended for. Indeed, a lot of people use slab
> caches just because it's anice encapsulation, not for any specific
> performance, visibility or anti-fragmentation purposes.  I have no
> problems with automatically merging slabs created like this.

Yes, absolutely. Alternative to slab merging is to actually reduce the
number of caches we create in the first place and use kmalloc()
wherever possible.

- Pekka

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: slab-nomerge (was Re: [git pull] device mapper changes for 4.3)
  2015-09-03  6:09           ` Pekka Enberg
@ 2015-09-03  8:53             ` Dave Chinner
  0 siblings, 0 replies; 42+ messages in thread
From: Dave Chinner @ 2015-09-03  8:53 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Andrew Morton, Christoph Lameter, Mike Snitzer, Linus Torvalds,
	Heinz Mauelshagen, Viresh Kumar, Joe Thornber, linux-mm,
	dm-devel, Mikulas Patocka, Vivek Goyal, Sami Tolvanen,
	David Rientjes, Joonsoo Kim, Alasdair G Kergon

On Thu, Sep 03, 2015 at 09:09:24AM +0300, Pekka Enberg wrote:
> Hi Andrew,
> 
> On Wed, 2 Sep 2015 22:10:12 -0500 (CDT) Christoph Lameter <cl@linux.com> wrote:
> >> > But I'd still like some pointers/help on what makes slab merging so
> >> > beneficial.  I'm sure Christoph and others have justification.  But if
> >> > not then yes the default to slab merging probably should be revisited.
> >>
> >> ...
> >>
> >> Check out the linux-mm archives for these dissussions.
> 
> On Thu, Sep 3, 2015 at 7:55 AM, Andrew Morton <akpm@linux-foundation.org> wrote:
> > Somewhat OT, but...  The question Mike asks should be comprehensively
> > answered right there in the switch-to-merging patch's changelog.
> >
> > The fact that it is not answered in the appropriate place and that
> > we're reduced to vaguely waving at the list archives is a fail.  And a
> > lesson!
> 
> Slab merging is a technique to reduce memory footprint and memory
> fragmentation. Joonsoo reports 3% slab memory reduction after boot
> when he added the feature to SLAB:

I'm not sure whether you are trying to indicate that it was
justified inteh commit message or indicate how little justification
there was...

> commit 12220dea07f1ac6ac717707104773d771c3f3077
> Author: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Date:   Thu Oct 9 15:26:24 2014 -0700
> 
>     mm/slab: support slab merge
> 
>     Slab merge is good feature to reduce fragmentation.  If new creating slab
>     have similar size and property with exsitent slab, this feature reuse it
>     rather than creating new one.  As a result, objects are packed into fewer
>     slabs so that fragmentation is reduced.

A partial page or two in a newly allocated slab in not
"fragmentation". They are simply free objects in the cache that
haven't been allocated yet. Fragmentation occurs when large numbers
of objects are freed so the pages end up mostly empty but
cannot be freed because there is still 1 or 2 objects in use of
them. As such, if there was fragementation and slab merging fixed
it, I'd expect to be seeing a much larger reduction in memory
usage....

>     Below is result of my testing.
> 
>     * After boot, sleep 20; cat /proc/meminfo | grep Slab
> 
>     <Before>
>     Slab: 25136 kB
> 
>     <After>
>     Slab: 24364 kB
> 
>     We can save 3% memory used by slab.

The numbers don't support the conclusion. Memory used from boot to
boot always varies by a small amount - a slight difference in the
number of files accessed by the boot process can account for this.
Also, you can't 't measure slab fragmentation by measuring the
amount of memory used. You have to look at object counts in each
slab and work out the percentage of free vs allocated objects. So
there's no evidence that this 772kb difference in memory footprint
can even be attributed to slab merging.

What about the rest of the slab fragmentation problem space?  It's
not even mentioned in the commit, but that's really what is
important to long running machines.

IOWs, where's description of the problem that needs fixing? What's
the example workload that demonstrates the problem? What's the
before and after measurements of the workloads that generate
significant slab fragmentation?  What's the long term impact of the
change (e.g.  a busy server with a uptime of several weeks)? is the
fragmentation level reduced?  increased? not significant?  What
impact does this have on subsystems with shrinkers that are now
operating on shared slabs? Do the shrinkers still work as
effectively as they used to?  Do they now cause slab fragmentation,
and if they do, does it self correct under continued memory
pressure?

And with the patch being merged without a single reviewed-by or
acked-by, I'm sitting here wondering how we managed to fail software
engineering 101 so badly here?

Cheers,

Dave.
-- 
Dave Chinner
dchinner@redhat.com

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: slab-nomerge (was Re: [git pull] device mapper changes for 4.3)
  2015-09-03  6:02     ` Dave Chinner
  2015-09-03  6:13       ` Pekka Enberg
@ 2015-09-03 10:29       ` Jesper Dangaard Brouer
  2015-09-03 16:19         ` Christoph Lameter
  2015-09-04  6:35         ` Sergey Senozhatsky
  2015-09-03 15:02       ` Linus Torvalds
  2 siblings, 2 replies; 42+ messages in thread
From: Jesper Dangaard Brouer @ 2015-09-03 10:29 UTC (permalink / raw)
  To: Dave Chinner
  Cc: brouer, Linus Torvalds, Mike Snitzer, Christoph Lameter,
	Pekka Enberg, Andrew Morton, David Rientjes, Joonsoo Kim,
	dm-devel, Alasdair G Kergon, Joe Thornber, Mikulas Patocka,
	Vivek Goyal, Sami Tolvanen, Viresh Kumar, Heinz Mauelshagen,
	linux-mm


On Thu, 3 Sep 2015 16:02:47 +1000 Dave Chinner <dchinner@redhat.com> wrote:

> On Wed, Sep 02, 2015 at 06:21:02PM -0700, Linus Torvalds wrote:
> > On Wed, Sep 2, 2015 at 5:51 PM, Mike Snitzer <snitzer@redhat.com> wrote:
> > >
> > > What I made possible with SLAB_NO_MERGE is for each subsystem to decide
> > > if they would prefer to not allow slab merging.
> > 
> > .. and why is that a choice that even makes sense at that level?
> > 
> > Seriously.
> > 
> > THAT is the fundamental issue here.
> 
> It makes a lot more sense than you think, Linus.
> 
[...]
> 
> On the surface, this looks like a big win but it's not - it's
> actually a major problem for slab reclaim and it manifests when
> there are large bursts of allocation activity followed by sudden
> reclaim activity.  When the slab grows rapidly, we get the majority
> of objects on a page being of one type, but a couple will be of a
> different type. Than under memory pressure, the shrinker can then
> only free the majority of objects on a page, guaranteeing the slab
> will remain fragmented under memory pressure.  Continuing to run the
> shrinker won't result in any more memory being freed from the merged
> slab and so we are stuck with unfixable slab fragmentation.
> 
> However, if the slab with a shrinker only contains one kind of
> object, when it becomes fragmented due to variable object lifetime,
> continued memory pressure will cause it to keep shrinking and hence
> will eventually correct the fragmentation problem. This is a much
> more robust configuration - the system will self correct without
> user intervention being necessary.
> 
> IOWs, slab merging prevents us from implementing effective active
> fragmentation management algorithms and hence prevents us  from
> reducing slab fragmentation via improved shrinker reclaim
> algorithms.  Simply put: slab merging reduces the effectiveness of
> shrinker based slab reclaim.

I'm buying into the problem of variable object lifetime sharing the
same slub.

With the SLAB bulk free API I'm introducing, we can speedup slub
slowpath, by free several objects with a single cmpxchg_double, BUT
these objects need to belong to the same page.
 Thus, as Dave describe with merging, other users of the same size
objects might end up holding onto objects scattered across several
pages, which gives the bulk free less opportunities.

That would be a technical argument for introducing a SLAB_NO_MERGE flag
per slab.  But I want to do some measurement before making any
decision. And it might be hard to show for my use-case of SKB free,
because SKB allocs will likely be dominating 256 bytes slab anyhow.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: slab-nomerge (was Re: [git pull] device mapper changes for 4.3)
  2015-09-03  6:02     ` Dave Chinner
  2015-09-03  6:13       ` Pekka Enberg
  2015-09-03 10:29       ` Jesper Dangaard Brouer
@ 2015-09-03 15:02       ` Linus Torvalds
  2015-09-04  3:26         ` Dave Chinner
  2 siblings, 1 reply; 42+ messages in thread
From: Linus Torvalds @ 2015-09-03 15:02 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Mike Snitzer, Christoph Lameter, Pekka Enberg, Andrew Morton,
	David Rientjes, Joonsoo Kim, dm-devel, Alasdair G Kergon,
	Joe Thornber, Mikulas Patocka, Vivek Goyal, Sami Tolvanen,
	Viresh Kumar, Heinz Mauelshagen, linux-mm

On Wed, Sep 2, 2015 at 11:02 PM, Dave Chinner <dchinner@redhat.com> wrote:
> On Wed, Sep 02, 2015 at 06:21:02PM -0700, Linus Torvalds wrote:
>> On Wed, Sep 2, 2015 at 5:51 PM, Mike Snitzer <snitzer@redhat.com> wrote:
>> >
>> > What I made possible with SLAB_NO_MERGE is for each subsystem to decide
>> > if they would prefer to not allow slab merging.
>>
>> .. and why is that a choice that even makes sense at that level?
>>
>> Seriously.
>>
>> THAT is the fundamental issue here.
>
> It makes a lot more sense than you think, Linus.

Not really. Even your argument isn't at all arguing for doing things
at a per-subsystem level - it's an argument about the potential sanity
of marking _individual_ slab caches non-mergable, not an argument for
something clearly insane like "mark all slabs for subsystem X
unmergable".

Can you just admit that that was insane? There is *no* sense in that
kind of behavior.

> Right, it's not xyzzy-specific where 'xyzzy' is a subsystem. The
> flag application is actually *object specific*. That is, the use of
> the individual objects that determines whether it should be merged
> or not.

Yes.

I do agree that something like SLAB_NO_MERGE can make sense on an
actual object-specific level, if you have very specific allocation
pattern knowledge and can show that the merging actually hurts.

But making the subsystem decide that all its slab caches should be
"no-merge" is just BS. You know that. It makes no sense, just admit
it.

> e.g. Slab fragmentation levels are affected more than anything by
> mixing objects with different life times in the same slab.  i.e. if
> we free all the short lived objects from a page but there is one
> long lived object on the page then that page is pinned and we free
> no memory. Do that to enough pages in the slab, and we end up with a
> badly fragmented slab.

The thing is, *if* you can show that kind of behavior for a particular
slab, and have numbers for it, then mark that slab as no-merge, and
document why you did it.

Even then, I'd personally probably prefer to name the bit differently:
rather than talk about an internal implementation detail within slab
("don't merge") it would probably be better to try to frame it in the
semantic different you are looking for (ie in "I want a slab with
private allocation patterns").

But aside from that kind of naming issue, that's very obviously not
what the patch series discussed was doing.

And quite frankly, I don't actually think you have the numbers to show
that theoretical bad behavior.  In contrast, there really *are*
numbers to show the advantages of merging.

So the fragmentation argument has been shown to generally be in favor
of merging, _not_ in favor of that "no-merge" behavior. If you have an
actual real load where that isn't the case, and can show it, then that
would be interesting, but at no point is that "the subsystem just
decided to mark all its slabs no-merge".

               Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: slab-nomerge (was Re: [git pull] device mapper changes for 4.3)
  2015-09-03 10:29       ` Jesper Dangaard Brouer
@ 2015-09-03 16:19         ` Christoph Lameter
  2015-09-04  9:10           ` Jesper Dangaard Brouer
  2015-09-04  6:35         ` Sergey Senozhatsky
  1 sibling, 1 reply; 42+ messages in thread
From: Christoph Lameter @ 2015-09-03 16:19 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Dave Chinner, Linus Torvalds, Mike Snitzer, Pekka Enberg,
	Andrew Morton, David Rientjes, Joonsoo Kim, dm-devel,
	Alasdair G Kergon, Joe Thornber, Mikulas Patocka, Vivek Goyal,
	Sami Tolvanen, Viresh Kumar, Heinz Mauelshagen, linux-mm

On Thu, 3 Sep 2015, Jesper Dangaard Brouer wrote:

> > IOWs, slab merging prevents us from implementing effective active
> > fragmentation management algorithms and hence prevents us  from
> > reducing slab fragmentation via improved shrinker reclaim
> > algorithms.  Simply put: slab merging reduces the effectiveness of
> > shrinker based slab reclaim.
>
> I'm buying into the problem of variable object lifetime sharing the
> same slub.

Well yeah I see the logic of the argument but what I have seen in practice
is that the access to objects becomes rather random over time. inodes and
denties are used by multiple underlying volumes/mountpoints etc. They are
expired individually etc etc. The references to objects become garbled
over time anyways.

What I would be interested in is some means by which locality of objects
of different caches can be explicitly specified. This would allow the
placing together of multiple objects in the same page frame. F.e. dentries
and inodes and other metadata of a filesystem that is related. This would
enhance the locality of the data and allow better defragmentation. But we
are talking here about a totally different allocator design.

> With the SLAB bulk free API I'm introducing, we can speedup slub
> slowpath, by free several objects with a single cmpxchg_double, BUT
> these objects need to belong to the same page.
>  Thus, as Dave describe with merging, other users of the same size
> objects might end up holding onto objects scattered across several
> pages, which gives the bulk free less opportunities.

This happens regardless as far as I can tell. On boot up you may end up
for a time in special situations where that is true.

> That would be a technical argument for introducing a SLAB_NO_MERGE flag
> per slab.  But I want to do some measurement before making any
> decision. And it might be hard to show for my use-case of SKB free,
> because SKB allocs will likely be dominating 256 bytes slab anyhow.

With the skbs you would want to place the skb data together with the
packet data and other network related objects right? Maybe we can think
out an allocator that can store objects related to a specific action in a
page frame that can then be tossed as a whole.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: slab-nomerge (was Re: [git pull] device mapper changes for 4.3)
  2015-09-03 15:02       ` Linus Torvalds
@ 2015-09-04  3:26         ` Dave Chinner
  2015-09-04  3:51           ` Linus Torvalds
  2015-09-04 13:55           ` Christoph Lameter
  0 siblings, 2 replies; 42+ messages in thread
From: Dave Chinner @ 2015-09-04  3:26 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Mike Snitzer, Christoph Lameter, Pekka Enberg, Andrew Morton,
	David Rientjes, Joonsoo Kim, dm-devel, Alasdair G Kergon,
	Joe Thornber, Mikulas Patocka, Vivek Goyal, Sami Tolvanen,
	Viresh Kumar, Heinz Mauelshagen, linux-mm

On Thu, Sep 03, 2015 at 08:02:40AM -0700, Linus Torvalds wrote:
> On Wed, Sep 2, 2015 at 11:02 PM, Dave Chinner <dchinner@redhat.com> wrote:
> > On Wed, Sep 02, 2015 at 06:21:02PM -0700, Linus Torvalds wrote:
> > Right, it's not xyzzy-specific where 'xyzzy' is a subsystem. The
> > flag application is actually *object specific*. That is, the use of
> > the individual objects that determines whether it should be merged
> > or not.
> 
> Yes.
> 
> I do agree that something like SLAB_NO_MERGE can make sense on an
> actual object-specific level, if you have very specific allocation
> pattern knowledge and can show that the merging actually hurts.

There are generic cases where it hurts, so no justification should
be needed for those cases...

> > e.g. Slab fragmentation levels are affected more than anything by
> > mixing objects with different life times in the same slab.  i.e. if
> > we free all the short lived objects from a page but there is one
> > long lived object on the page then that page is pinned and we free
> > no memory. Do that to enough pages in the slab, and we end up with a
> > badly fragmented slab.
> 
> The thing is, *if* you can show that kind of behavior for a particular
> slab, and have numbers for it, then mark that slab as no-merge, and
> document why you did it.

The double standard is the problem here. No notification, proof,
discussion or review was needed to turn on slab merging for
everyone, but you're setting a very high bar to jump if anyone wants
to turn it off in their code.

> And quite frankly, I don't actually think you have the numbers to show
> that theoretical bad behavior.

I don't keep numbers close handy. I've been dealing with these
problems for ten years, to I just know what workloads demonstrate
this "theoretical bad behaviour" within specific slabs and test them
when relevant. I'll do a couple of quick "merging is better"
verification tests this afternoon, but other than that I don't have
time in the next couple of weeks...

But speaking of workloads, internal inode cache slab fragmentation
is simple to reproduce on any filesystem. XFS just happens to be the
only one that really actively manages it as a result of long term
developer awareness of the problem. I first tripped over it in early
2005 with SpecSFS, and then with other similar NFS benchmarks like
filebench.  That's where Christoph Lameter was introduced to the
problem, too:

https://lwn.net/Articles/371892/

" The problem is that sparse use of objects in slab caches can cause
large amounts of memory to become unusable. The first ideas to
address this were developed in 2005 by various people."

FYI, with appropriate manual "drop slab" hacks during the benchmark,
we could get 20-25% higher throughput from the NFS server because
dropping the entire slab cache before the measurement phase meant we
avoided the slab fragmentation issue and had ~50% more free memory
to use for the page cache during the measurement period...

Similar problems have been reported over the years by users with
backup programs or scripts that used find, rsync and/or 'cp -R' on
large filesystems. It used to be easy to cause these sorts of
problems in the XFS inode cache. There's quite a few other
workloads, but it easily to reproduce inode slab fragmetnation with
find, bulkstat and cp. Basically all you need to do is populate the
inode cache, randomise the LRU order, then trigger combined inode
cache and memory demand.  It's that simple.

The biggest problem with using a workload like this to "prove" that
slab merging degrades behaviour is that we don't know what slabs
have been merged. Hence it's extremely hard to generate a workload
definition that demonstrates it. Indeed, change kernel config
options, structures change size and the slab is merged with
different objects, so the workload that generates problems has to be
changed, too.  And it doesn't even need to be a kernel with a
different config - just a different set of modules loaded because
the hardware and software config is different will change what slabs
are merged.

IOWs, what produces a problem on one kernel on one machine will not
reproduce the same problem on a different kernel or machine. Numbers
are a crapshoot here, especially as the cause of the problem is
trivially easy to understand.

Linus, you always say that at some point you've just got to step
back, read the code and understand the underlying issue that is
being dealt with because some things are way too complex to
reproduce reliably. This is one of those cases - it's obvious that
slab merging does not fix or prevent internal slab cache
fragmentation and that it only serves to minimise the impact of
fragmentation by amortising it across multiple similar slabs.
Really, this is the best we can do with passive slab caches where
you can't control freeing patterns.

However, we also have actively managed slab caches, and they can and
do work to prevent fragmetnation and clear it quickly when it
happens. Merging these actively managed slabs with other passive
slab is just a bad idea because the passive slab objects can only
reduce the effectiveness of the active management algorithms. We
don't need numbers to understand this - it's clear and obvious from
an algorithmic point of view.

> In contrast, there really *are*
> numbers to show the advantages of merging.

I have never denied that. Please listen to what I'm saying.

> So the fragmentation argument has been shown to generally be in favor
> of merging, _not_ in favor of that "no-merge" behavior.

Yes, all the numbers and research I've seen has been on passive
slab cache behaviour. I *agree* that passive slab caches should be
merged, but I don't recall anyone documenting the behavioural
distinction between active/passive slabs before now, even though
it's been something I've had in my head for several years. Actively
managed slabs are very different in their behaviour to passive
slabs, and so what holds true for passive slabs is not necessarily
true for actively managed slabs.

Really, we don't need some stupidly high bar to jump over here -
whether merging should be allowed can easily be answered with a
simple question: "Does the slab have a shrinker or does it back a
mempool?" If the answer is yes then using SLAB_SHRINKER or
SLAB_MEMPOOL to trigger the no-merge case doesn't need any more
justification from subsystem maintainers at all.

Cheers,

Dave.
-- 
Dave Chinner
dchinner@redhat.com

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: slab-nomerge (was Re: [git pull] device mapper changes for 4.3)
  2015-09-04  3:26         ` Dave Chinner
@ 2015-09-04  3:51           ` Linus Torvalds
  2015-09-05  0:36               ` Dave Chinner
  2015-09-07  9:30             ` Jesper Dangaard Brouer
  2015-09-04 13:55           ` Christoph Lameter
  1 sibling, 2 replies; 42+ messages in thread
From: Linus Torvalds @ 2015-09-04  3:51 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Mike Snitzer, Christoph Lameter, Pekka Enberg, Andrew Morton,
	David Rientjes, Joonsoo Kim, dm-devel, Alasdair G Kergon,
	Joe Thornber, Mikulas Patocka, Vivek Goyal, Sami Tolvanen,
	Viresh Kumar, Heinz Mauelshagen, linux-mm

On Thu, Sep 3, 2015 at 8:26 PM, Dave Chinner <dchinner@redhat.com> wrote:
>
> The double standard is the problem here. No notification, proof,
> discussion or review was needed to turn on slab merging for
> everyone, but you're setting a very high bar to jump if anyone wants
> to turn it off in their code.

Ehh. You realize that almost the only load that is actually seriously
allocator-limited is networking?

And slub was beating slab on that? And slub has been doing the merging
since day one. Slab was just changed to try to keep up with the
winning strategy.

Really. You seem to think that this merging thing is new. It's really
not. Where did you miss the part that it's been done since 2007?

It's only new for slab, and the reason it was introduced for slab was
that it was losing most relevant benchmarks to slub.

So do you now want a "SLAB_NO_MERGE_IF_NOT_SLUB" flag, which keeps the
traditional behavior for slab and slub? Just because its' traditional?
One that says "if the allocator is slub, then merge, but if the
allocator is slab, then don't merge".

Really, Dave. You have absolutely nothing to back up your points with.
Merging is *not* some kind of "new" thing that was silently enabled
recently to take you by surprise.

That seems to be your *only* argument: that the behavior changed
behind your back. IT IS NOT TRUE. It's only true since you don't seem
to realize that a large portion of the world moved on to SLUB a long
time ago.

Do you seriously believe that a "SLAB_NO_MERGE_IF_NOT_SLUB" flag is a
good idea, just to justify your position of "let's keep the merging
behavior the way it has been"?

Or do you seriously think that it's a good idea to take the
non-merging behavior from the allocator that was falling behind?

So no. The switch to merging behavior was not some kind of "no
discussion" thing. It was very much part of the whole original _point_
of SLUB. And the point of having allocator choices was to see which
one worked best.

SLUB essentially won. We could have just deleted SLAB. I don't think
that would necessarily have been a bad idea. Instead, slab was taught
to try to do some of the same things that worked for slub.

At what point do you just admit that your arguments aren't holding water?

So the fact remains: if you can actually show that not merging is a
good idea for particular slabs, then that's real data. But right now
you are just ignoring the real data and the SLUB  we've had over the
years.

And if you continue to spout nonsense about "silent behavioral
changes", the only thing you show is that you don't know what the hell
you are talking about.

So your claim of "double standard" is pure and utter shit. Get over it.

                 Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: slab-nomerge (was Re: [git pull] device mapper changes for 4.3)
  2015-09-03 10:29       ` Jesper Dangaard Brouer
  2015-09-03 16:19         ` Christoph Lameter
@ 2015-09-04  6:35         ` Sergey Senozhatsky
  2015-09-04  7:01           ` Linus Torvalds
  1 sibling, 1 reply; 42+ messages in thread
From: Sergey Senozhatsky @ 2015-09-04  6:35 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Dave Chinner, Linus Torvalds, Mike Snitzer, Christoph Lameter,
	Pekka Enberg, Andrew Morton, David Rientjes, Joonsoo Kim,
	dm-devel, Alasdair G Kergon, Joe Thornber, Mikulas Patocka,
	Vivek Goyal, Sami Tolvanen, Viresh Kumar, Heinz Mauelshagen,
	linux-mm, Sergey Senozhatsky

[-- Attachment #1: Type: text/plain, Size: 1841 bytes --]

On (09/03/15 12:29), Jesper Dangaard Brouer wrote:
[..]
> I'm buying into the problem of variable object lifetime sharing the
> same slub.
> 
> With the SLAB bulk free API I'm introducing, we can speedup slub
> slowpath, by free several objects with a single cmpxchg_double, BUT
> these objects need to belong to the same page.
>  Thus, as Dave describe with merging, other users of the same size
> objects might end up holding onto objects scattered across several
> pages, which gives the bulk free less opportunities.
> 
> That would be a technical argument for introducing a SLAB_NO_MERGE flag
> per slab.  But I want to do some measurement before making any
> decision. And it might be hard to show for my use-case of SKB free,
> because SKB allocs will likely be dominating 256 bytes slab anyhow.

Out of curiosity, I did some quite simple-minded
"slab_nomerge = 0" vs. "slab_nomerge = 1" tests today on my old
x86_64 box (4gigs of RAM, ext4, 4.2.0-next-20150903):

 - git clone https://github.com/git/git; make -j8; package (archlinux);
 cleanup;
 - create a container; untar gcc-5-20150901.tar.bz2; make -j6; package;
 cleanup;


I modified /proc/slabinfo to show a total `unused` space size, accounted
in cache_show() (reset in slab_start()):

  __unused_objs_sz += (sinfo.num_objs - sinfo.active_objs) * s->size;

and captured that value every second
	tail -1 /proc/slabinfo >> slab_unused


FWIW, files with the numbers attached (+ gnuplot graph). Those numbers are
not really representative, but well, they are what they are -- on my small
x86_64 box with very limited resources and under pretty general tests
"slab_nomerge = 1" shows better numbers. (hm... can embedded benefit from
disabling merging?). I think (as Dave Chinner said), preventing active and
passive slabs merging does not sound so crazy. Just my 2 cents.

	-ss

[-- Attachment #2: slab_unused_nomerge --]
[-- Type: text/plain, Size: 14836 bytes --]

101440
135360
135360
135360
135360
135360
118976
127168
127168
112832
156032
88448
129408
102784
102784
102784
102784
102784
98816
176640
176640
145920
137728
137728
129536
84480
143872
88256
78016
78016
78016
74048
74048
74048
74048
96576
96576
96576
96576
96576
96576
96576
153920
153920
102720
104768
93248
105536
80960
80960
66624
66624
66624
66624
66624
66624
138752
142848
130560
130560
130560
130560
124928
100352
100352
100352
69632
69632
53248
53248
53248
53248
53248
53248
69632
69632
69632
69632
100352
112640
98304
110592
95232
86016
66560
66560
44032
117760
117760
117760
87040
87040
82944
82944
82944
82944
118784
118784
118784
104448
104448
104448
104448
104448
104448
65536
49152
49152
49152
49152
124928
124928
77824
160768
160768
214016
150528
150528
121856
89088
89088
89088
160768
160768
142336
142336
142336
142336
107520
107520
109568
109568
109568
109568
109568
109568
109568
109568
109568
109568
109568
109568
109568
109568
111616
111616
91136
91136
91136
91136
91136
91136
91136
91136
91136
76800
76800
66560
95808
95808
95808
95808
159296
147008
147008
147008
196160
196160
196160
196160
196160
196160
163392
163392
163392
163392
134144
197312
216384
211456
213888
179136
204480
125952
205824
205824
183936
145728
203328
199296
183296
150016
128128
128128
130048
141568
231288
223392
204984
204984
217272
231736
163480
189576
153408
209024
205760
218760
177344
222400
191680
157248
227136
201088
240640
213912
203392
123904
146432
146432
146432
205312
122944
268800
256768
222944
272000
260480
288128
220608
224704
246056
137040
133120
154496
234496
276352
276352
318976
364224
300608
306688
254464
244352
173440
217664
227456
209408
185408
164480
149504
149504
179392
240768
235648
182336
152448
201152
238504
168976
226088
188896
141312
184000
181824
174912
155264
139968
186368
302848
341824
275376
260480
338816
288640
308840
245952
269120
255808
248416
248416
306008
291672
230616
201984
142000
187568
181248
194240
188416
143936
204672
217472
166080
104448
155392
139776
150016
208256
178496
178752
183488
104448
189568
164488
157200
122048
167872
176640
163712
196992
182848
100352
149248
141568
210432
194176
214784
213056
237824
210304
210304
260976
237808
386960
340816
321360
243760
209536
212272
192352
282208
198648
213816
245880
211896
244280
166392
293552
331616
328448
231680
362624
202752
202752
310592
348736
291456
199936
209536
217992
202816
284032
259392
259392
259392
215360
229440
229888
241152
161792
161792
204608
183336
195944
173432
173376
173376
159552
208960
178304
178304
133120
170432
207104
201536
198784
187712
250952
195688
150312
142864
142864
142864
187344
154800
143984
170352
203056
180448
168288
162944
151808
235008
226112
226112
186880
183936
133120
133120
133120
110592
98304
98304
81920
81920
81920
237568
276480
276480
276480
276480
276480
276480
276480
276480
260096
260096
260096
227328
227328
227328
266112
266112
266112
236224
236224
245312
226048
223616
223616
218624
181504
174848
170752
170752
170752
170752
170752
219904
285824
230848
208320
208320
208320
205504
205504
205504
194560
194560
194560
194560
194560
194560
194560
169984
169984
169984
159744
185856
181504
179328
179328
179328
169088
169088
150528
150528
150528
219584
219584
219584
178624
164416
162240
152000
224448
224576
219072
211712
206272
208256
271744
271744
271744
216320
216320
171008
171008
168192
168192
168192
168192
157952
155648
155648
146304
146304
144128
144128
144128
144128
207616
207616
207616
207616
207616
180864
180864
138560
138560
138624
138624
138624
122240
122240
98112
98112
98112
98112
85888
85888
85888
85888
81920
81920
81920
81920
168512
162752
162752
230336
197888
179328
179328
179328
164288
164288
164288
164288
164288
164288
160704
118784
101056
81920
81920
81920
81920
81920
221184
221184
221184
221184
221184
221184
190464
190464
225280
225280
225280
225280
227328
229376
194560
244008
245200
227000
265080
202832
172032
174080
214680
222944
196744
226312
221776
153936
181120
171520
118784
118784
118784
170752
118784
180928
184640
184640
205808
152488
142696
122480
118784
202880
177600
147456
147456
225536
181440
204800
128512
194688
90432
149184
147720
238464
161096
223752
220112
213296
183504
154320
211728
187152
162048
162048
162176
162176
158336
154560
154560
154560
118016
101056
89408
177472
173696
164672
169728
169728
169728
169728
166400
166400
144960
144960
131008
165632
141248
111808
177472
189120
170624
93760
161920
80256
165888
271296
259264
145280
233600
230592
230912
229056
232960
238592
247616
213120
206720
223872
190528
233280
228800
259008
219584
242752
221312
210432
232000
248512
246656
247808
236352
244224
205504
252032
255936
226112
213312
204160
171136
226560
281408
204032
262720
272576
273088
274560
271872
271936
273920
269888
274880
240704
224192
223168
211456
224960
204352
219520
178944
165248
169664
181440
152512
205760
190336
163392
205760
163264
237824
236992
231104
228416
213064
189320
285064
256456
270920
257728
186624
196224
185792
150976
169600
180224
159232
189952
225152
140544
197504
151552
175168
185536
165696
226624
209792
264576
249216
248640
254912
266368
249664
278720
249984
281472
251520
254912
203136
242816
239872
269120
189568
253120
197184
186624
203456
177600
346528
348216
334768
322416
334880
303776
305376
307232
340640
386784
372824
397976
360216
366040
406424
333528
324696
282840
303448
331032
349336
288536
285656
281816
231576
245144
216984
239576
310040
315736
323480
362968
233624
344856
347864
237336
320088
258136
335576
254040
363544
311064
357080
285976
296152
264664
306520
374232
364760
407832
407640
403544
368024
376600
405336
303576
397400
376664
334240
279968
375072
277920
375072
329312
378592
320864
337184
337504
293536
293280
285464
268480
259336
279752
299016
270280
276040
313480
265224
297288
290440
259464
273224
277320
238216
262728
241544
275272
274312
252488
179080
183176
220168
196168
195720
171592
210248
178376
200200
177984
192192
163584
203968
179328
157376
181568
166272
181632
182016
174592
179136
242432
231104
226112
227712
185088
210944
224640
180608
177536
192192
249856
267584
262784
270272
266688
247104
282944
266176
247296
340096
312256
310080
251520
272768
230784
230080
230080
230080
230080
230080
222016
222016
157568
214720
154368
236992
185664
214208
214080
214080
214080
201216
245240
206784
309848
306136
251736
354904
337688
364752
399200
228000
314848
244768
223968
260768
223968
213728
217568
266208
266208
251104
251104
247072
290432
241344
248784
190472
278408
247688
223944
223944
223944
205320
205320
205320
197128
193416
193416
183240
182344
163024
163024
163024
163024
188368
175504
175504
142992
106336
100192
100192
184104
93136
142088
121288
187592
172744
87816
150920
153224
256072
188104
209352
195144
221768
184520
175432
184832
118656
230528
228224
201792
109056
232640
187456
187456
181056
201216
247296
241728
241728
238144
148608
148608
148608
148608
148608
148608
251456
244416
148800
148608
202176
231680
158080
198144
175104
185408
158656
121152
111360
172480
117760
219072
276608
204544
223872
308032
215680
215936
201856
142464
142464
134272
226688
150528
147456
128640
169600
142336
64640
64640
161792
156672
134016
134016
130368
123840
179136
176768
176768
166208
168128
164224
163648
153856
174752
168200
227912
235208
216072
246600
146384
196496
171344
202000
160152
169560
236416
232384
141632
263424
266368
244800
244608
247808
244536
265848
174328
276032
216040
272864
246432
218520
306008
201480
262088
252232
205768
224968
212296
218440
185480
195408
198288
184416
182816
136576
184768
149056
193280
181056
184576
200064
202048
203968
218240
212992
218112
253824
252864
284992
191360
234944
242624
181248
172096
241984
178880
233344
227904
176448
241280
164928
258112
149952
160128
116288
199168
242816
289024
207936
297856
162048
178880
146240
142400
166144
104448
234944
219264
214528
141440
144640
138432
144832
168512
177088
174208
126272
142144
137984
147136
137856
132800
122112
127936
218752
223360
217920
192256
226624
205248
191296
224128
222464
199936
232128
223680
248448
217920
206144
230272
214528
197248
246784
202240
198656
185728
219520
193792
172928
198016
211584
174144
220032
209664
175552
169216
166848
152128
160768
108736
136576
212096
144000
247296
207232
235904
227328
216960
205440
253696
200512
339200
219648
260928
320768
260480
262912
249240
213328
244648
216488
267560
190296
177496
191272
173152
183856
153840
233008
252992
237232
244848
226304
235352
241672
211856
233488
189512
253480
209656
276856
230648
287288
220352
268480
148672
146560
203136
248064
142528
304640
256800
270816
262944
375808
240592
351464
290168
350200
267128
305656
185976
185976
171896
222528
165824
244288
209872
217616
215992
229048
172472
182264
173984
163744
171808
172064
153568
216096
243616
131832
195368
140272
136240
140656
101744
30568
30568
59240
59240
78312
134056
90344
90344
70696
59296
59296
119840
130520
95448
89112
152728
140312
138912
203584
136512
113536
111552
107840
86592
89536
83968
69504
139584
78848
120064
123072
100416
78848
78848
194176
48128
68608
116608
176832
115456
89728
116480
76544
63232
135680
169024
132544
157568
105856
115200
87744
104320
90752
86400
86144
79872
80128
50112
66432
81344
40128
45184
203840
127872
174016
222976
151424
106240
107328
79808
107200
87232
74432
70400
201984
89088
147264
196864
153984
110592
108992
100736
81664
81664
112064
112448
112448
112448
112448
115200
81152
124800
124800
68288
37888
105472
102848
190888
69832
138080
133616
150296
27968
144512
9216
69440
93760
50176
68480
116992
143464
160872
126248
152424
116200
142568
81552
229672
216528
138240
129920
120192
109632
149376
181968
161680
206336
201952
137920
190720
127952
223632
175672
171072
114648
232920
205592
184408
237720
282112
236736
304960
253824
258176
212224
205440
159872
147456
100928
100800
111232
88704
88704
62144
82176
64320
50176
50176
50176
76096
115904
128768
140096
44032
40576
63360
63808
49536
9216
80448
110400
153280
176320
152896
225408
160192
207040
200064
190400
178176
253952
292992
226560
226560
192512
189632
208000
183424
212096
156672
125952
125952
125952
197952
195904
225304
301272
196296
188648
233384
203752
182192
189872
125640
247312
269024
306304
306304
236608
274120
230408
219320
241176
197328
199248
233616
233616
199248
210256
203600
181424
207624
139720
139720
171960
157176
129872
172080
140816
162832
113848
90680
52472
101576
112776
65160
165320
165320
183944
216120
200464
141432
171032
160632
147576
189544
186712
195792
202128
206368
217704
177640
197384
136656
96520
121776
121776
121776
166120
166888
163624
137128
171112
166440
166440
139560
130344
144616
187304
183208
123000
121720
119504
122448
112592
99984
120912
175440
119376
174992
264144
182480
194448
193808
215184
229264
242192
225552
333328
207696
218064
181392
227280
188048
201808
165960
181512
145544
141656
152080
136912
139856
120592
94672
112016
125440
121984
126784
103488
113344
105664
63616
111176
162696
182152
149896
185152
155776
227712
158848
180224
70720
98752
91520
101760
41984
99648
152832
192768
123392
133888
65984
46208
151680
116928
103552
87424
168960
77888
75264
183488
103680
117888
168320
156288
158144
133568
88448
130432
130432
109696
113344
116992
111936
114240
130176
102720
98688
168632
157120
148928
169344
154368
129856
113472
128320
136064
71104
64768
212032
215552
207360
222088
193608
156808
163008
160376
121112
154832
149200
113136
208240
175664
218192
89672
210312
212608
168256
155552
163936
163936
112736
112736
116480
71424
60352
93952
77568
71296
91456
91456
83968
69696
85248
85248
71552
80320
100096
69760
92032
79488
70720
142784
160000
296192
177152
286080
302760
283752
260112
227104
276552
251976
251976
251976
251976
256200
256200
223496
223496
106640
142928
143056
143056
143184
156752
179792
179792
179792
179792
160336
139544
140080
177472
169800
148040
168984
160088
126376
150632
148264
153448
143208
135848
128936
124584
117664
207392
157472
152320
132480
91072
134016
129152
56320
56320
136704
180224
184000
219072
160984
262104
262104
240216
195096
213528
186520
167672
167672
135024
89712
116144
244976
240944
240944
206576
203272
245104
210656
290016
293472
301752
153784
246272
150592
222464
222720
188928
188928
188928
189056
189056
299776
275200
350848
337216
373440
339520
310976
310504
281392
254664
253768
253768
253768
253768
249736
249736
249736
216648
313128
298472
296424
282728
270600
270600
254216
254216
254216
254216
254216
254216
254216
236104
236104
236104
236104
236104
236104
236104
236104
236104
236104
236104
236104
236104
236104
236104
225864
225864
227912
227912
227912
227912
242512
229624
231040
227872
120448
211584
195200
205568
117056
113152
129344
235072
202496
109952
282144
98496
190656
118656
72640
51456
145344
190176
305672
226544
239552
98304
180608
159680
94976
139008
145160
257880
163120
130048
59008
115176
67688
144488
178752
87680
123584
197312
297152
382016
326016
244224
327424
328576
200448
280320
148352
148352
187008
251584
153152
255488
122112
277056
302656
303552
274880
274880
274880
274880
274880
274880
274880
274880
274880
274880
274880
274880
274880
275008
275008
275008
336448
336448
336448
336448
336448
336448
336448
336448
311872
293440
293440
293440
293440
293440
275520
206400
256768
202304
180672
218560
271360
258048
229504
227008
227008
229056
229056
265792
265792
265792
265792
206400
206400
206400
206400
157568
151424
151424
151424
151424
151424
151424
186752
186752
186752
186752
186752
186752
186752
208320
166720
124608
124608
126784
174720
251264
237952
258688
274176
306432
182528
197888
125568
181376
181376
150336
185024
185024
198976
168256
132096
196992
116032
205248
253696
269504
348544
435904
411328
411328
411456
343872
343872
343872
343872
343872
343872
343872
335552
335552
335680
335680
335680
335680
335680
335680
335680
335808
332096
332096
332096
332096
332096
332096
332096
332096
270080
265280
255040
255040
255040
255040
255040
244800
244800
244800
244800
244800
244800
244800
244800
244800
244800
244800
244800
244800
244800
244800
244800
244800
244800
244800
244800
244800
244800
244800
244800
244800
244800
244800
244800
244800
244800
244800
244800
244800
244800
244800
244800
244800
244800
244800
134208
166080
141312
108544
108544
108544
108928
108928
108928
108928
108928
108928
219520
219520
219520
219520
219520
219520
219520
219520
219520
219520
219520
219520
219520
219520
219520
219520
215872
216256
228544
228544
228544
228544
224704
226496
189632
189632
191808
202688
202688
203264
254656
251072
238784
242112
242112
242112
195776
195776
185536
144576
144576
158912
158912
158912
134336
158912
158912
201920
191680
193728
203968
283200
252288
217600
188928
188928
147968
225792
225792
188928
172544
172544
221184
221184
221184
221184
221184
154816
154816
154816
154816
154816
154816
154816
154816
154816
154816
154816
154816
146624
146624
146624
146624
146624
146624
146624
146624
146624
200640

[-- Attachment #3: slab_unused_merge --]
[-- Type: text/plain, Size: 15652 bytes --]

321152
368448
357056
399552
371072
367104
367104
364800
364800
360256
346304
364480
395328
397632
390400
390400
390400
390400
390400
390400
390400
390400
390400
345088
352192
379328
388032
383040
350528
316352
341440
361664
347840
343744
343744
343744
367680
339776
344384
344384
339776
339776
375616
374208
341952
349248
357440
346432
342400
348800
351872
366208
352640
345984
290432
290944
293504
286080
279424
279424
295552
285056
279424
279424
279424
279424
279424
279424
302976
295168
295168
295168
288768
332544
332544
321536
306944
313856
318720
346368
318464
319488
326144
318464
282112
291584
281344
297216
293376
292352
291584
276736
280064
265216
281088
281088
289536
261120
307712
278784
262912
263424
275968
281344
279040
265216
265216
284416
282368
269248
291520
286656
259520
271552
266816
253504
250432
250432
235712
235712
235712
257472
235712
252608
247744
277440
273600
276928
276928
248512
273088
273088
258240
254272
256320
256320
254464
231424
231424
229120
226560
224256
224256
219904
244480
244480
258048
264448
264448
260608
260608
260608
260608
260608
257536
257536
235008
251392
233216
248320
244992
239360
262912
239360
243968
259072
271872
246144
248960
246912
261248
255616
246144
240512
185984
170368
126976
126976
166144
188544
146432
126976
130688
126976
155392
184832
168064
155008
184960
184960
184960
185216
185216
262208
262464
262720
272000
273024
497280
460096
374144
360576
352768
383168
393472
355136
291264
297920
275264
333120
338688
323072
271104
271104
271104
271360
271360
271360
271360
271360
271360
271360
271360
271360
259392
310784
259392
270976
259392
387328
437120
413312
458112
435264
432896
252032
252032
278784
278784
245504
295680
301952
295680
317440
430784
490688
488128
457408
472128
462080
529216
502656
468992
505792
351936
339136
313600
359616
335360
443008
457856
422336
471616
421696
487360
511104
530624
527488
522432
497408
506240
490752
496512
482304
498048
454016
484224
439360
525760
538048
472960
513088
493888
508224
200064
200064
284544
200064
282496
322496
340800
346176
291520
408960
371392
448448
425728
375168
477120
458752
528768
525056
503296
469568
437632
497984
535296
557504
471104
318080
312512
305856
342464
385280
318400
381504
342016
288448
219584
213696
269888
237056
235904
387520
258560
286208
293952
367680
346944
361984
324736
305216
314240
402816
422400
423168
384576
169408
339328
305024
338368
356352
275520
266752
212672
209024
282944
286848
255168
290432
228224
200832
322240
379520
395200
285568
359872
359808
274240
334784
290496
252416
336832
353920
353472
312704
368256
302208
316416
298112
318336
309760
304128
278272
379584
324864
234496
283008
221824
221760
236800
220160
414016
436096
399744
370944
245056
244480
311360
347264
276864
249920
250112
321088
213888
182912
303872
352320
339008
345280
370176
245504
232192
252608
275520
266880
258624
244224
231936
176064
236672
198848
206336
183616
209152
229504
258048
258240
243264
243264
231552
235840
239104
277568
242688
237696
314176
316416
316672
332544
368512
300544
286592
268864
268352
268352
282112
267264
262528
251072
263936
355968
324544
370880
371008
335872
439040
449792
440768
390848
407488
377920
402752
435904
373056
424448
350784
368000
370368
401920
361216
310080
460032
464128
451008
451008
451008
451008
452672
445248
445248
445248
444096
441024
443456
435776
207360
201472
246720
246720
246720
316992
321664
319744
316288
310464
292992
291328
308928
305088
305088
366784
366784
366784
243776
243776
280256
280256
280256
280256
280256
280256
280256
280256
251520
256256
249024
219072
245504
236800
258496
234368
292544
223808
265408
253312
244480
256832
230400
190272
180416
175040
116096
233216
197696
198528
198528
176704
171008
154240
182080
182080
247232
223232
234304
234304
297472
251328
251328
251328
243136
243136
273472
470272
314816
314944
281664
293760
409024
507712
508480
478080
527232
527232
529280
614720
508224
601536
495488
538816
339200
309696
989952
303744
303744
303744
303744
396096
398144
398144
364800
481664
356224
281984
255296
277888
215232
234304
241088
217792
245312
321536
322752
319936
342080
338368
352704
408128
464128
399808
504704
426368
92416
515712
616256
511552
556992
459456
60928
1958400
130560
2141120
189312
264704
651200
398656
470848
396544
431360
423616
375360
359680
384000
350272
400832
356736
354432
348800
271552
324288
321856
311616
350656
381312
322112
346624
352512
345216
313216
174272
256960
278208
266112
272384
350912
324352
399168
404416
350336
384768
366336
372288
367488
321408
365888
415872
392256
380032
408064
335232
298816
297408
267456
336960
339904
344192
371584
325120
382464
376896
399872
423936
370048
428352
372352
394432
332608
385152
380672
351104
367552
408832
430080
395264
437248
268032
296576
239872
285312
286464
310912
286656
311104
356416
381952
346944
327808
429888
432640
425600
419392
418816
444032
274368
287744
351424
368768
386880
398400
374016
365120
365120
382592
288512
310400
339136
349696
353536
374272
359360
381056
363072
355328
338176
323136
349696
262336
354432
344512
350592
318912
259008
311040
301184
317760
324416
354048
334912
346176
330560
341888
318848
323264
322816
323520
322304
305856
277376
305024
324544
324096
343232
334976
353088
336896
324736
346368
321856
327616
348416
341056
359936
345408
369408
364288
352960
336192
361408
348672
347392
279168
310400
156096
280960
248256
373440
300416
377856
343616
272320
350912
337152
355584
334144
325376
311424
345984
318080
354432
324736
348352
327616
316288
295872
283584
283520
277952
282112
302592
322368
290240
281984
268032
298816
356800
465344
488000
466816
436416
445440
445952
429440
465792
408064
459392
369088
454784
413632
445888
514688
478016
496192
434560
345600
411392
387584
370688
236608
427328
377856
425728
419136
365504
385856
392128
448256
449984
456448
456512
450944
384512
392960
381504
358784
342848
369408
372544
376320
349120
416832
353408
390528
433024
474112
464320
426048
437248
399488
417280
421120
416384
421184
421824
426240
493824
453504
390528
391616
393984
424000
395712
433088
449984
419008
379328
346880
481600
463680
499264
439424
412928
471936
471744
460096
364672
411456
393728
443584
396992
442816
446912
446912
388800
444352
351488
432576
371840
399552
365376
445888
462336
470016
397952
444032
449152
306752
377024
310464
370752
352768
442176
434176
337152
402560
447104
453376
413120
393920
324352
295680
275968
352256
150016
178688
370688
346112
159232
263488
444736
458624
454592
226624
246848
226688
296320
229632
255936
371392
333888
300736
270016
280512
327104
285504
301888
298688
282944
281216
295872
311616
345664
303744
308992
275840
278272
221760
248768
237696
214464
174144
192960
226880
226176
265728
201920
146816
213376
167360
157632
111488
168448
259712
276096
262400
347648
269632
402560
461696
384384
394688
369472
306560
299136
308160
277184
242880
441216
419712
441152
383872
421824
481792
396480
395392
501440
539264
381696
396864
346688
344448
288832
326272
380096
363776
373248
297024
318016
319424
262912
260352
318144
315200
315200
345088
345472
345728
346112
346112
241152
240000
249472
324800
220672
349568
308352
465536
506752
531392
477952
404608
521728
363520
404928
445504
359104
394432
284736
354176
359680
344576
338752
492288
420928
426368
393984
340288
387904
351936
289792
283776
305536
306304
331072
393728
401024
434176
336448
373184
420800
430336
417792
341312
370176
364416
366656
354368
302592
268160
346688
333248
347200
302208
311360
326720
347968
377152
373312
381760
363136
347648
351488
386624
410688
415104
453504
512576
488832
449600
455680
488512
375104
396416
382528
374656
308352
406784
387712
417024
385472
370432
371264
417152
396032
359872
326912
349696
343232
346944
342784
334272
335616
374720
366784
380032
420672
410944
424384
401280
448576
364864
457024
414912
413376
391744
391360
341376
338432
365056
355776
395968
330624
376960
386944
401088
403648
404160
405952
279232
387328
380160
360576
393664
372608
385024
350336
379520
367424
375744
385280
364864
371584
363392
359360
385472
361792
378176
353856
342016
326016
328832
348032
278592
354432
343232
348160
298496
336704
262976
301632
307584
330240
335744
360064
363392
332288
365312
327232
342656
337920
315264
323904
326080
322944
318784
322496
283776
287168
280576
266624
258176
320256
320896
323712
313472
318080
319040
322816
242752
298880
310912
309056
291776
299520
312448
290176
317568
311040
287104
302272
273920
308608
275392
247488
286656
234688
201920
195008
193728
178752
207104
273280
281408
302976
347072
318912
311680
358720
399616
363968
326528
361408
340672
364096
344384
327360
309888
283328
315648
283200
312256
353792
352192
358592
326464
319872
309696
325312
326336
348032
359168
363584
365120
293312
330368
295488
361408
356608
337984
300288
338816
269376
324672
309760
255040
347648
249088
285568
346496
376960
325120
381248
300480
362752
369984
400768
368640
368448
275392
368000
363648
377856
290240
336192
312064
353344
313280
240704
241472
241472
181312
222976
252672
266496
260160
238144
238144
202304
197952
197952
197952
197952
251520
259968
271872
293440
286208
291328
282176
281152
261376
261376
257280
257280
298432
298432
283136
299520
299520
299520
299520
327936
315392
315392
315392
308544
283648
256256
234560
230464
248128
242368
239808
239808
239808
240000
386496
367168
446528
431360
374720
392576
403456
367616
302592
319936
367680
371776
334848
338240
328192
317632
326912
375104
349248
311808
266432
258240
295680
253760
331904
266752
359936
334528
411904
481408
486976
432384
432384
491584
422080
326400
392000
344384
322496
324352
318848
336960
328832
306560
269568
311936
371584
379328
351360
377536
373888
385920
472000
450688
433536
446400
387584
430848
355584
358144
344256
424704
377664
438272
444992
460992
426048
426304
412352
384768
374208
384064
386496
383424
410304
406720
427776
413760
351680
449600
436480
437248
438720
421440
423232
325888
390848
417280
440704
381952
411392
393216
488256
397504
390912
346176
288896
278464
265792
363392
381376
345792
364544
366592
366400
313792
366080
340800
386496
366464
392000
410368
433664
433728
446080
423616
464192
435840
425088
448320
429184
377536
407552
322944
400320
354880
358848
353984
315008
377216
319808
329280
272320
341056
308800
294976
287040
374336
339712
278912
279872
276416
285056
256000
318336
401984
422656
379072
396800
355200
431424
425664
416064
413632
426304
363392
364992
345856
379776
368704
340928
283648
236736
313920
377792
358400
381376
381504
395008
397440
390144
363456
363456
460672
426624
418112
379136
375616
347776
311616
341824
400512
431168
391872
406528
388736
402112
392640
490496
459520
492416
449472
491392
440832
424000
450048
379712
416192
324608
448896
336576
425088
385920
385920
379264
372096
346304
275712
309632
305856
307328
320000
345472
346496
362368
362368
335104
363968
391424
368768
353344
301312
329536
313408
310208
354176
374400
371328
404672
299712
405184
445312
382848
369472
416576
425280
397376
414912
411776
415296
409408
426816
327872
367744
356032
329408
378048
360512
313920
388032
397888
406016
385792
422336
376384
320000
285376
376256
296320
334080
332288
328832
275520
283264
278656
385984
385984
266432
370304
334272
356352
403136
388160
381760
351680
349376
349248
320960
261952
389504
315328
311488
385408
302976
282560
297728
257088
309824
173952
351424
414336
414400
371648
285440
329088
284480
306752
329984
373760
376064
428032
411264
464448
460864
480256
504384
500992
491392
408896
406848
388224
399488
362240
401536
390720
339200
345344
247808
370752
381824
435328
463168
388032
410816
401600
397952
327744
333696
378432
349952
367552
468992
427200
454720
389632
484416
393664
504448
378304
503616
367936
472448
447680
432128
413696
420672
422592
509952
697152
701760
685696
735168
706880
654400
680384
659200
553216
557760
578048
624640
609472
621760
586688
618944
534208
609792
538752
508032
629120
530432
579072
535360
568640
608768
483648
624000
632000
600704
596672
584768
554240
589376
588224
519040
557248
614016
617088
589888
608192
608192
626688
628672
621120
589120
607872
684736
706496
682240
683200
631296
651456
627840
667648
656768
692416
607680
707072
572800
683648
540608
358208
513856
567104
580480
476608
318784
648960
628608
634624
679360
658304
461824
808512
758208
758208
768192
520960
542336
483200
663424
744064
739072
637184
760320
773120
672384
793728
691456
783360
742400
720128
683328
748992
660416
540352
506816
543808
552704
541248
481728
506816
548160
511552
542016
506304
510976
511232
452736
471488
467328
511744
485120
456576
435648
542912
651136
559936
595648
637504
596224
492352
554496
535232
497984
569152
562112
539264
556672
492480
376960
427968
429760
435648
497408
519168
502976
334272
463680
279616
404352
491776
486912
434624
508096
519488
474816
462272
503808
480768
468736
468736
529728
487872
487872
490240
490752
510080
449856
570112
582848
492096
479552
499648
536960
536960
536960
629440
658432
578368
549056
564736
524992
509440
540608
530688
526912
511936
503552
475456
470400
440768
431744
402304
393984
401152
416640
446720
448896
398208
483776
464320
376256
479040
370816
369792
466752
466432
368128
324480
390208
383040
447744
488064
367104
465920
297984
387392
291648
356352
450304
453760
458048
468544
370496
587264
589312
566400
584576
573248
663872
647360
644480
558784
507584
582656
611776
521984
439040
573312
596608
530176
465856
525184
505152
458048
396928
439104
667264
437440
493888
613632
597952
413056
590208
543104
374848
530432
491200
598400
607104
590784
548672
572352
572352
572352
491264
484096
582400
582400
582400
582400
582400
668096
613568
613568
613568
613568
629376
629376
629376
629376
629504
543680
539840
539840
533376
525312
568192
569536
569536
529856
539264
507136
381824
516672
395968
477056
434752
447360
455040
474752
474752
451264
449472
449472
449472
394944
333568
369472
348928
311360
346816
364032
177088
314368
390784
343552
338112
335680
332288
332288
332288
373824
373824
373824
373824
416960
430784
419264
407680
427392
408832
366976
438080
406400
357120
292736
281536
333888
360704
336064
423296
222848
250304
426688
429248
382464
362432
347392
453184
536704
398976
376256
443072
395008
396288
368704
393408
399872
366784
381376
415744
366336
346048
349632
329216
344384
283712
281600
205632
295552
345664
343488
304576
364864
361984
375040
344448
371840
356352
344192
367424
324352
295552
286400
281600
281600
281600
278080
278080
278080
278080
278080
235968
279936
278080
243520
244352
244352
244352
375360
370752
368448
368448
374208
374208
323904
376256
220736
247104
222016
222016
222464
220672
282560
262784
262784
241664
241664
290432
290432
290432
291264
286272
286272
280192
280192
329856
274624
275712
355072
288704
288704
288704
272448
260160
260160
206016
206272
197760
204416
204416
272768
283968
280064
336896
336896
336896
306688
306688
321728
316480
309184
309184
300608
300608
300608
298560
279168
279808
279808
267136
264832
166720
117568
117568
117568
230080
279040
307008
313152
311936
316288
380352
381824
342912
342912
395200
321088
305856
305856
305856
306048
309888
395776
398016
395008
353920
353920
322048
354624
354624
404928
401856
376320
376320
376320
368384
368384
368384
336192
314368
311104
311104
311104
302080
293888
322176
322304
313984
313984
313984
313984
287104
285568
285568
283072
315072
315072
339648
339648
339648
339648
331904
308800
308800
584640
563136
573440
537408
537408
526400
526400
523840
523840
523840
523840
523840
523840
523840
523840
520768
520768
520768
520768
520768
496128
496128
496128
557248
557248
557248
554304
548928
570944
570944
570944
570944
570944
570944
570944
568256
565952
561984
559104
555648
551296
551296
551296
551296
621888

[-- Attachment #4: merge_vs_nomerge.png --]
[-- Type: image/png, Size: 34956 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: slab-nomerge (was Re: [git pull] device mapper changes for 4.3)
  2015-09-04  6:35         ` Sergey Senozhatsky
@ 2015-09-04  7:01           ` Linus Torvalds
  2015-09-04  7:59             ` Sergey Senozhatsky
  0 siblings, 1 reply; 42+ messages in thread
From: Linus Torvalds @ 2015-09-04  7:01 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Jesper Dangaard Brouer, Dave Chinner, Mike Snitzer,
	Christoph Lameter, Pekka Enberg, Andrew Morton, David Rientjes,
	Joonsoo Kim, dm-devel, Alasdair G Kergon, Joe Thornber,
	Mikulas Patocka, Vivek Goyal, Sami Tolvanen, Viresh Kumar,
	Heinz Mauelshagen, linux-mm, Sergey Senozhatsky

On Thu, Sep 3, 2015 at 11:35 PM, Sergey Senozhatsky
<sergey.senozhatsky.work@gmail.com> wrote:
>
> Out of curiosity, I did some quite simple-minded
> "slab_nomerge = 0" vs. "slab_nomerge = 1" tests today on my old
> x86_64 box (4gigs of RAM, ext4, 4.2.0-next-20150903):

So out of interest, was this slab or slub? Also, how repeatable is
this? The memory usage between two boots tends to be rather fragile -
some of the bigger slab users are dentries and inodes, and various
filesystem scanning events will end up skewing things a _lot_.

But if it turns out that the numbers are pretty stable, and sharing
really doesn't save memory, then that is certainly a big failure. I
think Christoph did much of his work for bigger machines where one of
the SLAB issues was the NUMA overhead, and who knows - maybe it worked
well for the load and machine in question, but not necessarily
elsewhere.

Interesting.

                   Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: slab-nomerge (was Re: [git pull] device mapper changes for 4.3)
  2015-09-04  7:01           ` Linus Torvalds
@ 2015-09-04  7:59             ` Sergey Senozhatsky
  2015-09-04  9:56               ` Sergey Senozhatsky
                                 ` (2 more replies)
  0 siblings, 3 replies; 42+ messages in thread
From: Sergey Senozhatsky @ 2015-09-04  7:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Sergey Senozhatsky, Jesper Dangaard Brouer, Dave Chinner,
	Mike Snitzer, Christoph Lameter, Pekka Enberg, Andrew Morton,
	David Rientjes, Joonsoo Kim, dm-devel, Alasdair G Kergon,
	Joe Thornber, Mikulas Patocka, Vivek Goyal, Sami Tolvanen,
	Viresh Kumar, Heinz Mauelshagen, linux-mm, Sergey Senozhatsky

[-- Attachment #1: Type: text/plain, Size: 16835 bytes --]

On (09/04/15 00:01), Linus Torvalds wrote:
> On Thu, Sep 3, 2015 at 11:35 PM, Sergey Senozhatsky
> <sergey.senozhatsky.work@gmail.com> wrote:
> >
> > Out of curiosity, I did some quite simple-minded
> > "slab_nomerge = 0" vs. "slab_nomerge = 1" tests today on my old
> > x86_64 box (4gigs of RAM, ext4, 4.2.0-next-20150903):
> 
> So out of interest, was this slab or slub? Also, how repeatable is
> this? The memory usage between two boots tends to be rather fragile -
> some of the bigger slab users are dentries and inodes, and various
> filesystem scanning events will end up skewing things a _lot_.
> 
> But if it turns out that the numbers are pretty stable, and sharing
> really doesn't save memory, then that is certainly a big failure. I
> think Christoph did much of his work for bigger machines where one of
> the SLAB issues was the NUMA overhead, and who knows - maybe it worked
> well for the load and machine in question, but not necessarily
> elsewhere.
> 
> Interesting.
> 


grep SLAB .config
# CONFIG_SLAB is not set
CONFIG_SLABINFO=y

grep SLUB .config
CONFIG_SLUB_DEBUG=y
CONFIG_SLUB=y
CONFIG_SLUB_CPU_PARTIAL=y
# CONFIG_SLUB_DEBUG_ON is not set
# CONFIG_SLUB_STATS is not set


The numbers are stable on my box. Did another round of tests.
Please find attached (hope attachments are OK):
-- git clone glibc; make -j8; package; clean up


It differs on both busy and idle systems.


I was a bit surprised to see 0 unused memory
..
33472
56128
56128
0
0
0
0
0
0
0
0
0
0
0
59392
59392
59392
..


But I went through the corresponding slabinfo (I track slabinfo too); and yes,
zero unused objects.

slabinfo - version: 2.1
# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
ext4_groupinfo_1k     36     36    224   18    1 : tunables    0    0    0 : slabdata      2      2      0
ext4_groupinfo_4k   7412   7412    232   17    1 : tunables    0    0    0 : slabdata    436    436      0
sda2                 117    117    104   39    1 : tunables    0    0    0 : slabdata      3      3      0
sd_ext_cdb           128    128     32  128    1 : tunables    0    0    0 : slabdata      1      1      0
scsi_sense_cache     224    224    128   32    1 : tunables    0    0    0 : slabdata      7      7      0
scsi_cmd_cache       166    234    448   18    2 : tunables    0    0    0 : slabdata     13     13      0
sgpool-128            16     16   4096    8    8 : tunables    0    0    0 : slabdata      2      2      0
sgpool-64             48     48   2048   16    8 : tunables    0    0    0 : slabdata      3      3      0
sgpool-32             64     64   1024   16    4 : tunables    0    0    0 : slabdata      4      4      0
sgpool-16             64     64    512   16    2 : tunables    0    0    0 : slabdata      4      4      0
sgpool-8             176    176    256   16    1 : tunables    0    0    0 : slabdata     11     11      0
scsi_data_buffer       0      0     24  170    1 : tunables    0    0    0 : slabdata      0      0      0
ip6-frags              0      0    280   29    2 : tunables    0    0    0 : slabdata      0      0      0
fib6_nodes           128    128     64   64    1 : tunables    0    0    0 : slabdata      2      2      0
ip6_dst_cache         42     42    384   21    2 : tunables    0    0    0 : slabdata      2      2      0
PINGv6                 0      0   1472   22    8 : tunables    0    0    0 : slabdata      0      0      0
RAWv6                 22     22   1472   22    8 : tunables    0    0    0 : slabdata      1      1      0
UDPLITEv6              0      0   1472   22    8 : tunables    0    0    0 : slabdata      0      0      0
UDPv6                 44     44   1472   22    8 : tunables    0    0    0 : slabdata      2      2      0
tw_sock_TCPv6          0      0    272   30    2 : tunables    0    0    0 : slabdata      0      0      0
request_sock_TCPv6      0      0    312   26    2 : tunables    0    0    0 : slabdata      0      0      0
TCPv6                  0      0   2752   11    8 : tunables    0    0    0 : slabdata      0      0      0
bsg_cmd                0      0    312   26    2 : tunables    0    0    0 : slabdata      0      0      0
mqueue_inode_cache     25     25   1280   25    8 : tunables    0    0    0 : slabdata      1      1      0
hugetlbfs_inode_cache     18     18    872   18    4 : tunables    0    0    0 : slabdata      1      1      0
jbd2_transaction_s    100    100    320   25    2 : tunables    0    0    0 : slabdata      4      4      0
jbd2_inode           340    340     48   85    1 : tunables    0    0    0 : slabdata      4      4      0
jbd2_journal_handle    204    204     80   51    1 : tunables    0    0    0 : slabdata      4      4      0
jbd2_journal_head    136    136    120   34    1 : tunables    0    0    0 : slabdata      4      4      0
jbd2_revoke_table_s   1024   1024     16  256    1 : tunables    0    0    0 : slabdata      4      4      0
jbd2_revoke_record_s    128    128     32  128    1 : tunables    0    0    0 : slabdata      1      1      0
ext4_inode_cache    2178   2178   1744   18    8 : tunables    0    0    0 : slabdata    121    121      0
ext4_free_data       192    192     64   64    1 : tunables    0    0    0 : slabdata      3      3      0
ext4_allocation_context     64     64    128   32    1 : tunables    0    0    0 : slabdata      2      2      0
ext4_prealloc_space     52     52    152   26    1 : tunables    0    0    0 : slabdata      2      2      0
ext4_system_zone     816    816     40  102    1 : tunables    0    0    0 : slabdata      8      8      0
ext4_io_end          224    224     72   56    1 : tunables    0    0    0 : slabdata      4      4      0
ext4_extent_status   3876   3876     40  102    1 : tunables    0    0    0 : slabdata     38     38      0
kioctx                 0      0    896   18    4 : tunables    0    0    0 : slabdata      0      0      0
aio_kiocb              0      0    128   32    1 : tunables    0    0    0 : slabdata      0      0      0
dio                    0      0    704   23    4 : tunables    0    0    0 : slabdata      0      0      0
fasync_cache          42     42     96   42    1 : tunables    0    0    0 : slabdata      1      1      0
pid_namespace          0      0   2256   14    8 : tunables    0    0    0 : slabdata      0      0      0
posix_timers_cache      0      0    264   31    2 : tunables    0    0    0 : slabdata      0      0      0
UNIX                 110    110   1472   22    8 : tunables    0    0    0 : slabdata      5      5      0
ip4-frags              0      0    264   31    2 : tunables    0    0    0 : slabdata      0      0      0
ip_mrt_cache           0      0    192   21    1 : tunables    0    0    0 : slabdata      0      0      0
UDP-Lite               0      0   1344   24    8 : tunables    0    0    0 : slabdata      0      0      0
tcp_bind_bucket       64     64     64   64    1 : tunables    0    0    0 : slabdata      1      1      0
inet_peer_cache        0      0    192   21    1 : tunables    0    0    0 : slabdata      0      0      0
ip_fib_trie          340    340     48   85    1 : tunables    0    0    0 : slabdata      4      4      0
ip_fib_alias         292    292     56   73    1 : tunables    0    0    0 : slabdata      4      4      0
ip_dst_cache          64     64    256   16    1 : tunables    0    0    0 : slabdata      4      4      0
PING                   0      0   1280   25    8 : tunables    0    0    0 : slabdata      0      0      0
RAW                   25     25   1280   25    8 : tunables    0    0    0 : slabdata      1      1      0
UDP                   96     96   1344   24    8 : tunables    0    0    0 : slabdata      4      4      0
tw_sock_TCP            0      0    272   30    2 : tunables    0    0    0 : slabdata      0      0      0
request_sock_TCP       0      0    312   26    2 : tunables    0    0    0 : slabdata      0      0      0
TCP                   12     12   2560   12    8 : tunables    0    0    0 : slabdata      1      1      0
eventpoll_pwq        224    224     72   56    1 : tunables    0    0    0 : slabdata      4      4      0
eventpoll_epi        192    192    128   32    1 : tunables    0    0    0 : slabdata      6      6      0
inotify_inode_mark    120    120    136   30    1 : tunables    0    0    0 : slabdata      4      4      0
blkdev_queue          22     22   2816   11    8 : tunables    0    0    0 : slabdata      2      2      0
blkdev_requests      322    322    344   23    2 : tunables    0    0    0 : slabdata     14     14      0
blkdev_ioc            88     88    184   22    1 : tunables    0    0    0 : slabdata      4      4      0
bio-0                315    315    192   21    1 : tunables    0    0    0 : slabdata     15     15      0
biovec-256            56     96   4096    8    8 : tunables    0    0    0 : slabdata     12     12      0
biovec-128            16     16   2048   16    8 : tunables    0    0    0 : slabdata      1      1      0
biovec-64             64     64   1024   16    4 : tunables    0    0    0 : slabdata      4      4      0
biovec-16             64     64    256   16    1 : tunables    0    0    0 : slabdata      4      4      0
uid_cache             64     64    128   32    1 : tunables    0    0    0 : slabdata      2      2      0
sock_inode_cache     153    153    960   17    4 : tunables    0    0    0 : slabdata      9      9      0
skbuff_fclone_cache     90     90    448   18    2 : tunables    0    0    0 : slabdata      5      5      0
skbuff_head_cache    320    320    256   16    1 : tunables    0    0    0 : slabdata     20     20      0
configfs_dir_cache      0      0     96   42    1 : tunables    0    0    0 : slabdata      0      0      0
file_lock_cache       64     64    256   16    1 : tunables    0    0    0 : slabdata      4      4      0
file_lock_ctx        156    156    104   39    1 : tunables    0    0    0 : slabdata      4      4      0
net_namespace          0      0   4480    7    8 : tunables    0    0    0 : slabdata      0      0      0
shmem_inode_cache   1023   1023   1048   31    8 : tunables    0    0    0 : slabdata     33     33      0
pool_workqueue        64     64    256   16    1 : tunables    0    0    0 : slabdata      4      4      0
proc_inode_cache    1309   1309    928   17    4 : tunables    0    0    0 : slabdata     77     77      0
sigqueue             100    100    160   25    1 : tunables    0    0    0 : slabdata      4      4      0
bdev_cache            96     96   1344   24    8 : tunables    0    0    0 : slabdata      4      4      0
kernfs_node_cache  17836  17836    152   26    1 : tunables    0    0    0 : slabdata    686    686      0
mnt_cache            108    108    448   18    2 : tunables    0    0    0 : slabdata      6      6      0
filp                1757   1998    448   18    2 : tunables    0    0    0 : slabdata    111    111      0
inode_cache         9234   9234    872   18    4 : tunables    0    0    0 : slabdata    513    513      0
dentry             15036  15036    288   28    2 : tunables    0    0    0 : slabdata    537    537      0
names_cache           32     32   4096    8    8 : tunables    0    0    0 : slabdata      4      4      0
buffer_head        11427  11427    104   39    1 : tunables    0    0    0 : slabdata    293    293      0
nsproxy              170    170     48   85    1 : tunables    0    0    0 : slabdata      2      2      0
vm_area_struct      4462   4462    176   23    1 : tunables    0    0    0 : slabdata    194    194      0
mm_struct            112    112   1152   28    8 : tunables    0    0    0 : slabdata      4      4      0
fs_cache             105    105    192   21    1 : tunables    0    0    0 : slabdata      5      5      0
files_cache           95     95    832   19    4 : tunables    0    0    0 : slabdata      5      5      0
signal_cache         225    225   1280   25    8 : tunables    0    0    0 : slabdata      9      9      0
sighand_cache        182    182   2240   14    8 : tunables    0    0    0 : slabdata     13     13      0
task_struct          187    192   4928    6    8 : tunables    0    0    0 : slabdata     32     32      0
cred_jar            2179   2368    128   32    1 : tunables    0    0    0 : slabdata     74     74      0
Acpi-Operand        1680   1680     72   56    1 : tunables    0    0    0 : slabdata     30     30      0
Acpi-ParseExt        204    204     80   51    1 : tunables    0    0    0 : slabdata      4      4      0
Acpi-Parse           292    292     56   73    1 : tunables    0    0    0 : slabdata      4      4      0
Acpi-State           204    204     80   51    1 : tunables    0    0    0 : slabdata      4      4      0
Acpi-Namespace      1122   1122     40  102    1 : tunables    0    0    0 : slabdata     11     11      0
anon_vma_chain      4096   4096     64   64    1 : tunables    0    0    0 : slabdata     64     64      0
anon_vma            2472   2472    168   24    1 : tunables    0    0    0 : slabdata    103    103      0
pid                  256    256    128   32    1 : tunables    0    0    0 : slabdata      8      8      0
radix_tree_node     2016   2016    584   28    4 : tunables    0    0    0 : slabdata     72     72      0
trace_event_file    1058   1058     88   46    1 : tunables    0    0    0 : slabdata     23     23      0
ftrace_event_field   2550   2550     48   85    1 : tunables    0    0    0 : slabdata     30     30      0
idr_layer_cache      300    300   2096   15    8 : tunables    0    0    0 : slabdata     20     20      0
page->ptl           2117   2117     56   73    1 : tunables    0    0    0 : slabdata     29     29      0
dma-kmalloc-8192       0      0   8192    4    8 : tunables    0    0    0 : slabdata      0      0      0
dma-kmalloc-4096       0      0   4096    8    8 : tunables    0    0    0 : slabdata      0      0      0
dma-kmalloc-2048       0      0   2048   16    8 : tunables    0    0    0 : slabdata      0      0      0
dma-kmalloc-1024       0      0   1024   16    4 : tunables    0    0    0 : slabdata      0      0      0
dma-kmalloc-512        0      0    512   16    2 : tunables    0    0    0 : slabdata      0      0      0
dma-kmalloc-256        0      0    256   16    1 : tunables    0    0    0 : slabdata      0      0      0
dma-kmalloc-128        0      0    128   32    1 : tunables    0    0    0 : slabdata      0      0      0
dma-kmalloc-64         0      0     64   64    1 : tunables    0    0    0 : slabdata      0      0      0
dma-kmalloc-32         0      0     32  128    1 : tunables    0    0    0 : slabdata      0      0      0
dma-kmalloc-16         0      0     16  256    1 : tunables    0    0    0 : slabdata      0      0      0
dma-kmalloc-8          0      0      8  512    1 : tunables    0    0    0 : slabdata      0      0      0
dma-kmalloc-192        0      0    192   21    1 : tunables    0    0    0 : slabdata      0      0      0
dma-kmalloc-96         0      0     96   42    1 : tunables    0    0    0 : slabdata      0      0      0
kmalloc-8192          44     44   8192    4    8 : tunables    0    0    0 : slabdata     11     11      0
kmalloc-4096         200    200   4096    8    8 : tunables    0    0    0 : slabdata     25     25      0
kmalloc-2048         816    816   2048   16    8 : tunables    0    0    0 : slabdata     51     51      0
kmalloc-1024         672    672   1024   16    4 : tunables    0    0    0 : slabdata     42     42      0
kmalloc-512          544    544    512   16    2 : tunables    0    0    0 : slabdata     34     34      0
kmalloc-256         1344   1344    256   16    1 : tunables    0    0    0 : slabdata     84     84      0
kmalloc-192          903    903    192   21    1 : tunables    0    0    0 : slabdata     43     43      0
kmalloc-128         3168   3168    128   32    1 : tunables    0    0    0 : slabdata     99     99      0
kmalloc-96          1092   1092     96   42    1 : tunables    0    0    0 : slabdata     26     26      0
kmalloc-64          7424   7424     64   64    1 : tunables    0    0    0 : slabdata    116    116      0
kmalloc-32          1792   1792     32  128    1 : tunables    0    0    0 : slabdata     14     14      0
kmalloc-16          3584   3584     16  256    1 : tunables    0    0    0 : slabdata     14     14      0
kmalloc-8           5120   5120      8  512    1 : tunables    0    0    0 : slabdata     10     10      0
kmem_cache_node      224    224    128   32    1 : tunables    0    0    0 : slabdata      7      7      0
kmem_cache           189    189    192   21    1 : tunables    0    0    0 : slabdata      9      9      0


	-ss

[-- Attachment #2: slab_glibc_nomerge --]
[-- Type: text/plain, Size: 3491 bytes --]

46016
90688
36608
36608
36608
36608
36608
36608
36608
36608
36608
36608
36608
36608
36608
36608
36608
36608
36608
36608
36608
36608
36608
36608
36608
36608
36608
36608
36608
36608
36608
33472
33472
56128
56128
0
0
0
0
0
0
0
0
0
0
0
59392
59392
59392
43008
43008
43008
26624
26624
26624
26624
7168
7168
61440
61440
61440
57344
49152
29696
47104
47104
47104
19456
77824
77824
77824
77824
45056
45056
45056
45056
45056
45056
63488
63488
19456
19456
19456
19456
19456
19456
19456
47104
47104
19456
19456
19456
19456
19456
19456
19456
19456
19456
19456
3072
3072
3072
3072
3072
3072
3072
3072
3072
3072
3072
3072
3072
3072
3072
3072
63488
63488
63488
63488
63488
63488
63488
63488
63488
63488
63488
63488
63488
63488
63488
63488
63488
63488
63488
63488
63488
63488
63488
63488
63488
63488
63488
63488
63488
63488
63488
63488
67584
63488
63488
121280
121280
125120
90112
114368
63488
105728
142720
159488
155584
158400
195200
117952
208768
70272
72704
89664
134336
80512
109120
109120
109120
54016
30720
30720
67072
134912
134912
132928
132928
132928
117120
117120
74240
48576
115648
53248
173056
122496
129152
69440
127808
107456
51392
100544
50944
110976
103040
46336
56704
153216
91520
161472
161472
139264
167872
81920
96704
96704
96704
96704
96704
96704
96704
96704
142976
142976
142976
81920
122176
122176
139712
172480
170176
168192
143872
143872
168256
143552
188864
188864
188864
99328
81920
151616
81920
94016
94016
94016
94016
94016
94016
94016
157248
81920
150144
198464
174784
136640
121216
92608
153856
174144
174144
174144
127296
127296
127296
121792
121792
121792
121792
113280
113280
113280
113280
113280
113280
113280
113280
113280
113280
113280
113280
109504
109504
109504
109504
74176
111936
66560
125440
124352
128064
74368
123136
68160
118912
118912
135616
59904
139904
136320
188544
188544
169088
169088
76352
104128
102208
133440
109824
114560
101184
120064
114432
119936
108352
104128
177088
97984
112128
120320
106944
187328
85952
74624
74496
122944
112704
122816
48512
40448
40448
102720
104448
149824
101504
91008
245632
192384
206144
206592
173696
195648
113152
156416
106368
262144
186304
155712
114880
183296
107328
102592
161728
153728
80064
124672
110528
116288
141888
95104
40448
153920
138240
40448
40448
40448
40448
40448
168064
175488
115712
112768
94272
52672
103680
87040
122944
100416
36352
79360
79360
79360
83328
112192
112192
112192
112192
89088
86144
96896
94656
36352
114624
115712
106496
96128
96128
96128
72064
72064
72064
40064
40064
36352
36352
36352
134912
125248
56384
37504
36352
36352
36352
36352
36352
36352
36352
36352
82240
71616
59776
36352
36352
36352
38656
36352
103232
88896
36352
59840
36352
133440
109888
109888
109888
109888
109888
109888
109888
109888
109888
109888
109888
109888
109888
109888
109888
109888
109888
109888
109888
109888
109888
109888
109888
109888
109888
109888
109888
109888
91968
91968
91968
92096
92096
92096
92096
92096
92096
92096
92096
92096
92096
92096
92096
92096
92096
92096
92096
92096
92096
92096
92096
80640
36352
55936
78912
36352
74112
36352
135744
167808
193088
157120
157120
125376
174784
232128
232128
222144
206912
206912
206912
206912
198976
178816
154880
154880
167680
208128
151296
139520
113792
80640
76032
76032
76032
76032
76032
76032
76032
62592
62592
62592
62592
62592
62592
62592
62592
58624
48384
73856
73856
73856
73088
40832
40832
40832
40832
40832
40832
40832
40832
40832
40832
40832
40832
40832
36352
36352
36352
36352
36352
36352
36352
36352
64320
36352
36352
36352
36352
36352
36352

[-- Attachment #3: slab_glibc_merge --]
[-- Type: text/plain, Size: 3780 bytes --]

180032
119360
106176
200640
193472
156864
156864
156864
181440
181440
169152
169152
122048
179136
179392
203200
232832
232832
302848
303360
303872
304384
308608
326656
354240
327744
334272
347456
375808
326592
308672
304960
344768
324352
318464
340736
340736
351488
340736
340736
336768
356736
352640
336768
353856
352576
346560
343488
337600
340928
359616
340672
337600
340416
355008
360640
341504
365568
331456
353152
353600
330496
366464
358784
371072
318336
318336
304256
330880
327552
296832
295040
295040
297088
312192
298624
304256
296320
411392
431104
414720
416512
421888
429312
422656
411392
394240
415744
361728
366848
356864
372480
372480
390656
378624
382208
371968
381440
364544
364288
318336
318336
307328
307328
378688
358016
365824
372352
367488
346752
359168
339520
351168
334656
341568
330816
327488
335936
410368
388608
379648
373120
375680
426752
358464
409216
396864
417216
364480
433472
410240
472000
436800
373440
398080
398080
398080
398080
361984
361984
381184
381440
381440
381824
373632
457728
427968
436160
422016
463680
412096
416768
382976
347264
380416
332928
349568
285760
302912
248128
342720
338624
338624
335360
331456
331712
325184
302272
303104
303104
303104
329856
327424
483968
391232
368320
445312
395456
421376
344320
269824
256576
313600
304896
366464
355264
362112
456000
398208
526272
497280
518144
507200
511040
425536
349248
357824
389376
391616
372416
365760
422784
449664
538560
492992
537408
528192
524608
409280
468608
434304
398272
397568
468928
412160
381184
412224
372096
375296
390656
316352
307264
371648
486720
403776
400064
365120
389760
490944
355200
447168
428416
439680
501056
595200
591872
541312
549120
554048
646080
545344
572288
527872
561344
601472
595520
593728
665984
557248
683584
607808
553920
612032
603392
523264
589888
469504
690816
777728
677312
562240
407488
429440
500992
478592
405760
427456
534016
369472
369216
468352
396672
443392
418432
477504
445632
382720
282816
279424
303808
273408
250176
304256
277632
283136
297472
256320
244096
219328
267584
281216
259456
284160
341824
333248
315392
283008
263360
313280
401728
329344
291200
428032
284800
384960
371904
303232
342400
271168
292992
297408
363008
366336
401920
426176
362944
315840
337792
298752
306560
332352
323968
243520
329088
338432
310336
322112
319872
263552
395520
290240
278592
314304
254592
388416
342208
504640
501568
389248
430528
506112
465984
413952
474624
456896
377344
298048
308736
376768
381184
378432
406080
387840
401408
394560
299200
338880
293952
256576
282368
247872
276224
292032
360576
305920
364160
323584
416448
307904
289856
367040
322048
341120
397888
331072
306688
340928
319872
319872
328960
388352
382336
428224
380480
366720
313792
285120
321024
298304
285120
239168
388288
454464
430336
427904
520704
472576
525312
487296
591296
541760
523968
523072
548096
453056
509504
524800
499520
484992
440000
444160
440576
439360
420224
369344
372608
372608
372608
372608
372608
372608
372608
372608
372608
372608
356224
356224
356224
356224
356224
411584
399296
399488
399488
377472
541312
525760
525760
525760
525760
462784
462784
437248
431872
477120
563968
562048
559296
556224
547520
565888
582336
563968
552320
552320
543680
604928
601920
654720
633792
624704
575168
562112
545088
559872
594880
577024
585600
568128
559104
614400
576256
485504
529408
529408
524672
505664
497664
497664
495488
495488
483520
452032
454656
451136
451136
510592
514048
514048
514048
532032
487104
463040
426176
426176
426176
426176
403392
403392
401472
368832
368832
368832
404800
404800
404800
422336
422336
422336
422336
390400
361920
362048
359360
392704
390336
387968
388672
388672
426112
352704
356672
356800
356800
356800
357248
357248
357248
357248
357248
357248
357248
405952
405952
405952
405952
405952
444032
444032
442880

[-- Attachment #4: glibc-merge_vs_nomerge.png --]
[-- Type: image/png, Size: 30161 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: slab-nomerge (was Re: [git pull] device mapper changes for 4.3)
  2015-09-03 16:19         ` Christoph Lameter
@ 2015-09-04  9:10           ` Jesper Dangaard Brouer
  2015-09-04 14:13             ` Christoph Lameter
  0 siblings, 1 reply; 42+ messages in thread
From: Jesper Dangaard Brouer @ 2015-09-04  9:10 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Dave Chinner, Linus Torvalds, Mike Snitzer, Pekka Enberg,
	Andrew Morton, David Rientjes, Joonsoo Kim, dm-devel,
	Alasdair G Kergon, Joe Thornber, Mikulas Patocka, Vivek Goyal,
	Sami Tolvanen, Viresh Kumar, Heinz Mauelshagen, linux-mm, brouer

On Thu, 3 Sep 2015 11:19:53 -0500 (CDT) Christoph Lameter <cl@linux.com> wrote:

> On Thu, 3 Sep 2015, Jesper Dangaard Brouer wrote:
> 
> > I'm buying into the problem of variable object lifetime sharing the
> > same slub.
> 
[...]
>
> > With the SLAB bulk free API I'm introducing, we can speedup slub
> > slowpath, by free several objects with a single cmpxchg_double, BUT
> > these objects need to belong to the same page.
> >  Thus, as Dave describe with merging, other users of the same size
> > objects might end up holding onto objects scattered across several
> > pages, which gives the bulk free less opportunities.
> 
> This happens regardless as far as I can tell. On boot up you may end up
> for a time in special situations where that is true.

That is true, which is also why below measurements should be taken with
a grain of salt, as benchmarking is done within 10 min of boot up.


> > That would be a technical argument for introducing a SLAB_NO_MERGE flag
> > per slab.  But I want to do some measurement before making any
> > decision. And it might be hard to show for my use-case of SKB free,
> > because SKB allocs will likely be dominating 256 bytes slab anyhow.

I'll give you some preliminary measurements on my patchset which uses
the new SLAB bulk free API of SKBs in the TX completion on ixgbe NIC
driver (function ixgbe_clean_tx_irq() will bulk free max 32 SKBs).

Basic test-type is IPv4 forwarding, on a single CPU (i7-4790K CPU @
4.00GHz), with generator pktgen sending 14Mpps (using script
samples/pktgen/pktgen_sample03_burst_single_flow.sh). 

Test setup notes
 * Kernel: 4.1.0-mmotm-2015-08-24-16-12+ #261 SMP
  - with patches "detached freelist" and Christophs irqon/off fix.

Config /etc/sysctl.conf ::
 net/ipv4/conf/default/rp_filter = 0
 net/ipv4/conf/all/rp_filter = 0
 # Forwarding performance is affected by early demux
 net/ipv4/ip_early_demux = 0
 net.ipv4.ip_forward = 1

Setup::
 $ base_device_setup.sh ixgbe3
 $ base_device_setup.sh ixgbe4
 $ netfilter_unload_modules.sh ; netfilter_unload_modules.sh; rmmod nf_reject_ipv4
 $ ip neigh add 172.16.0.66 dev ixgbe4 lladdr 00:aa:aa:aa:aa:aa
 # GRO negatively affect forwarding performance (as least for UDP test)
 $ ethtool -K ixgbe4 gro off tso off gso off
 $ ethtool -K ixgbe3 gro off tso off gso off

First I tested a none patched kernel with/without "slab_nomerge".
 (Single CPU IP-forwarding of UDP packets)
 * Normal      : 2049166 pps
 * slab_nomerge: 2053440 pps
 * Diff: +4274pps and -1.02ns
 * Nanosec diff show we are below accuracy of system

Thus, results are the same.
Using bulking changes the picture:

Bulk free of max 32 SKBs in ixgbe TX-DMA-completion:
 * Bulk-free32: 2091218 pps
 * Diff to "Normal" case above: +42052 pps and 9.81ns
 * Nanosec diff is significant (enough above accuracy level of system)
 * Summary: Pretty nice improvement!

Same test with "slab_nomerge":
 * slab_nomerge: 2121703 pps
 * Diff to above: +30485 pps and -6.87 ns
 * Nanosec diff were upto 3ns in testrun, this 6ns is still valid
 * Summary: slab_nomerge did make a difference!

Total improvement is quite significant: +72537 pps and -16.68ns (+3.5%)

It is important to be critical about your own measurements.  What is
the real cause of this change.  Lets see that happens if we tune SLUB
per CPU structures to have more "room", instead of using "slab_nomerge".

Tuning::
  echo 256 > /sys/kernel/slab/skbuff_head_cache/cpu_partial
  echo 9   > /sys/kernel/slab/skbuff_head_cache/min_partial

Test with bulk-free32 and SLUB-tuning:
 * slub-tuned: 2110842 pps
 * Note this gets very close to "slab_nomerge"
  - 2121703 - 2110842 = 10861 pps
  - (1/2121703*10^9)-(1/2110842*10^9) = -2.42 ns
 * Nanosec diff around 2.5ns is not significant enough, call results the same

Thus, I could achieve the same performance results by tuning SLUB as I
could with "slab_nomerge".  Maybe the advantage from "slab_nomerge" was
just that I got my "own" per CPU structures, and this implicitly larger
per CPU memory for myself?

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: slab-nomerge (was Re: [git pull] device mapper changes for 4.3)
  2015-09-04  7:59             ` Sergey Senozhatsky
@ 2015-09-04  9:56               ` Sergey Senozhatsky
  2015-09-04 14:05               ` Christoph Lameter
  2015-09-04 14:11               ` Linus Torvalds
  2 siblings, 0 replies; 42+ messages in thread
From: Sergey Senozhatsky @ 2015-09-04  9:56 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jesper Dangaard Brouer, Dave Chinner, Mike Snitzer,
	Christoph Lameter, Pekka Enberg, Andrew Morton, David Rientjes,
	Joonsoo Kim, dm-devel, Alasdair G Kergon, Joe Thornber,
	Mikulas Patocka, Vivek Goyal, Sami Tolvanen, Viresh Kumar,
	Heinz Mauelshagen, linux-mm, Sergey Senozhatsky,
	Sergey Senozhatsky

[-- Attachment #1: Type: text/plain, Size: 354 bytes --]

On (09/04/15 16:59), Sergey Senozhatsky wrote:
> 
> It differs on both busy and idle systems.
> 

1) IDLE system right after reboot and 2) under `desktop machine` load
(ssh, firefox, vim, etc.). So yes, the behaviour seems to be stable on
my box.

Only gnuplot graphs attached this time (let me know if files with
the numbers are of any interest).

	-ss

[-- Attachment #2: idle-merge_vs_nomerge.png --]
[-- Type: image/png, Size: 17240 bytes --]

[-- Attachment #3: desktop-merge_vs_nomerge.png --]
[-- Type: image/png, Size: 20249 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: slab-nomerge (was Re: [git pull] device mapper changes for 4.3)
  2015-09-04  3:26         ` Dave Chinner
  2015-09-04  3:51           ` Linus Torvalds
@ 2015-09-04 13:55           ` Christoph Lameter
  2015-09-04 22:46             ` Dave Chinner
  1 sibling, 1 reply; 42+ messages in thread
From: Christoph Lameter @ 2015-09-04 13:55 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Linus Torvalds, Mike Snitzer, Pekka Enberg, Andrew Morton,
	David Rientjes, Joonsoo Kim, dm-devel, Alasdair G Kergon,
	Joe Thornber, Mikulas Patocka, Vivek Goyal, Sami Tolvanen,
	Viresh Kumar, Heinz Mauelshagen, linux-mm

On Fri, 4 Sep 2015, Dave Chinner wrote:

> There are generic cases where it hurts, so no justification should
> be needed for those cases...

Inodes and dentries have constructors. These slabs are not mergeable and
will never be because they have cache specific code to be executed on the
object.

> Really, we don't need some stupidly high bar to jump over here -
> whether merging should be allowed can easily be answered with a
> simple question: "Does the slab have a shrinker or does it back a
> mempool?" If the answer is yes then using SLAB_SHRINKER or
> SLAB_MEMPOOL to trigger the no-merge case doesn't need any more
> justification from subsystem maintainers at all.

The slab shrinkers do not use mergeable slab caches.


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: slab-nomerge (was Re: [git pull] device mapper changes for 4.3)
  2015-09-04  7:59             ` Sergey Senozhatsky
  2015-09-04  9:56               ` Sergey Senozhatsky
@ 2015-09-04 14:05               ` Christoph Lameter
  2015-09-04 14:11               ` Linus Torvalds
  2 siblings, 0 replies; 42+ messages in thread
From: Christoph Lameter @ 2015-09-04 14:05 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Linus Torvalds, Jesper Dangaard Brouer, Dave Chinner,
	Mike Snitzer, Pekka Enberg, Andrew Morton, David Rientjes,
	Joonsoo Kim, dm-devel, Alasdair G Kergon, Joe Thornber,
	Mikulas Patocka, Vivek Goyal, Sami Tolvanen, Viresh Kumar,
	Heinz Mauelshagen, linux-mm, Sergey Senozhatsky

[-- Attachment #1: Type: TEXT/PLAIN, Size: 532 bytes --]

On Fri, 4 Sep 2015, Sergey Senozhatsky wrote:

> But I went through the corresponding slabinfo (I track slabinfo too); and yes,
> zero unused objects.

Please use the slabinfo tool. What you see in /proc/slabinfo is generated
for slab compatibility and may not show useful numbers.

Run

	gcc -o slabinfo tools/vm/slabinfo.c

	slabinfo -T

to get an overview of the fragmentation etc state of the
slab caches.

Run

	slabinfo

to get individual cache statistics


It would be helpful to compare the output with and without merging.

[-- Attachment #2: Type: TEXT/PLAIN, Size: 3491 bytes --]

46016
90688
36608
36608
36608
36608
36608
36608
36608
36608
36608
36608
36608
36608
36608
36608
36608
36608
36608
36608
36608
36608
36608
36608
36608
36608
36608
36608
36608
36608
36608
33472
33472
56128
56128
0
0
0
0
0
0
0
0
0
0
0
59392
59392
59392
43008
43008
43008
26624
26624
26624
26624
7168
7168
61440
61440
61440
57344
49152
29696
47104
47104
47104
19456
77824
77824
77824
77824
45056
45056
45056
45056
45056
45056
63488
63488
19456
19456
19456
19456
19456
19456
19456
47104
47104
19456
19456
19456
19456
19456
19456
19456
19456
19456
19456
3072
3072
3072
3072
3072
3072
3072
3072
3072
3072
3072
3072
3072
3072
3072
3072
63488
63488
63488
63488
63488
63488
63488
63488
63488
63488
63488
63488
63488
63488
63488
63488
63488
63488
63488
63488
63488
63488
63488
63488
63488
63488
63488
63488
63488
63488
63488
63488
67584
63488
63488
121280
121280
125120
90112
114368
63488
105728
142720
159488
155584
158400
195200
117952
208768
70272
72704
89664
134336
80512
109120
109120
109120
54016
30720
30720
67072
134912
134912
132928
132928
132928
117120
117120
74240
48576
115648
53248
173056
122496
129152
69440
127808
107456
51392
100544
50944
110976
103040
46336
56704
153216
91520
161472
161472
139264
167872
81920
96704
96704
96704
96704
96704
96704
96704
96704
142976
142976
142976
81920
122176
122176
139712
172480
170176
168192
143872
143872
168256
143552
188864
188864
188864
99328
81920
151616
81920
94016
94016
94016
94016
94016
94016
94016
157248
81920
150144
198464
174784
136640
121216
92608
153856
174144
174144
174144
127296
127296
127296
121792
121792
121792
121792
113280
113280
113280
113280
113280
113280
113280
113280
113280
113280
113280
113280
109504
109504
109504
109504
74176
111936
66560
125440
124352
128064
74368
123136
68160
118912
118912
135616
59904
139904
136320
188544
188544
169088
169088
76352
104128
102208
133440
109824
114560
101184
120064
114432
119936
108352
104128
177088
97984
112128
120320
106944
187328
85952
74624
74496
122944
112704
122816
48512
40448
40448
102720
104448
149824
101504
91008
245632
192384
206144
206592
173696
195648
113152
156416
106368
262144
186304
155712
114880
183296
107328
102592
161728
153728
80064
124672
110528
116288
141888
95104
40448
153920
138240
40448
40448
40448
40448
40448
168064
175488
115712
112768
94272
52672
103680
87040
122944
100416
36352
79360
79360
79360
83328
112192
112192
112192
112192
89088
86144
96896
94656
36352
114624
115712
106496
96128
96128
96128
72064
72064
72064
40064
40064
36352
36352
36352
134912
125248
56384
37504
36352
36352
36352
36352
36352
36352
36352
36352
82240
71616
59776
36352
36352
36352
38656
36352
103232
88896
36352
59840
36352
133440
109888
109888
109888
109888
109888
109888
109888
109888
109888
109888
109888
109888
109888
109888
109888
109888
109888
109888
109888
109888
109888
109888
109888
109888
109888
109888
109888
109888
91968
91968
91968
92096
92096
92096
92096
92096
92096
92096
92096
92096
92096
92096
92096
92096
92096
92096
92096
92096
92096
92096
92096
80640
36352
55936
78912
36352
74112
36352
135744
167808
193088
157120
157120
125376
174784
232128
232128
222144
206912
206912
206912
206912
198976
178816
154880
154880
167680
208128
151296
139520
113792
80640
76032
76032
76032
76032
76032
76032
76032
62592
62592
62592
62592
62592
62592
62592
62592
58624
48384
73856
73856
73856
73088
40832
40832
40832
40832
40832
40832
40832
40832
40832
40832
40832
40832
40832
36352
36352
36352
36352
36352
36352
36352
36352
64320
36352
36352
36352
36352
36352
36352

[-- Attachment #3: Type: TEXT/PLAIN, Size: 3780 bytes --]

180032
119360
106176
200640
193472
156864
156864
156864
181440
181440
169152
169152
122048
179136
179392
203200
232832
232832
302848
303360
303872
304384
308608
326656
354240
327744
334272
347456
375808
326592
308672
304960
344768
324352
318464
340736
340736
351488
340736
340736
336768
356736
352640
336768
353856
352576
346560
343488
337600
340928
359616
340672
337600
340416
355008
360640
341504
365568
331456
353152
353600
330496
366464
358784
371072
318336
318336
304256
330880
327552
296832
295040
295040
297088
312192
298624
304256
296320
411392
431104
414720
416512
421888
429312
422656
411392
394240
415744
361728
366848
356864
372480
372480
390656
378624
382208
371968
381440
364544
364288
318336
318336
307328
307328
378688
358016
365824
372352
367488
346752
359168
339520
351168
334656
341568
330816
327488
335936
410368
388608
379648
373120
375680
426752
358464
409216
396864
417216
364480
433472
410240
472000
436800
373440
398080
398080
398080
398080
361984
361984
381184
381440
381440
381824
373632
457728
427968
436160
422016
463680
412096
416768
382976
347264
380416
332928
349568
285760
302912
248128
342720
338624
338624
335360
331456
331712
325184
302272
303104
303104
303104
329856
327424
483968
391232
368320
445312
395456
421376
344320
269824
256576
313600
304896
366464
355264
362112
456000
398208
526272
497280
518144
507200
511040
425536
349248
357824
389376
391616
372416
365760
422784
449664
538560
492992
537408
528192
524608
409280
468608
434304
398272
397568
468928
412160
381184
412224
372096
375296
390656
316352
307264
371648
486720
403776
400064
365120
389760
490944
355200
447168
428416
439680
501056
595200
591872
541312
549120
554048
646080
545344
572288
527872
561344
601472
595520
593728
665984
557248
683584
607808
553920
612032
603392
523264
589888
469504
690816
777728
677312
562240
407488
429440
500992
478592
405760
427456
534016
369472
369216
468352
396672
443392
418432
477504
445632
382720
282816
279424
303808
273408
250176
304256
277632
283136
297472
256320
244096
219328
267584
281216
259456
284160
341824
333248
315392
283008
263360
313280
401728
329344
291200
428032
284800
384960
371904
303232
342400
271168
292992
297408
363008
366336
401920
426176
362944
315840
337792
298752
306560
332352
323968
243520
329088
338432
310336
322112
319872
263552
395520
290240
278592
314304
254592
388416
342208
504640
501568
389248
430528
506112
465984
413952
474624
456896
377344
298048
308736
376768
381184
378432
406080
387840
401408
394560
299200
338880
293952
256576
282368
247872
276224
292032
360576
305920
364160
323584
416448
307904
289856
367040
322048
341120
397888
331072
306688
340928
319872
319872
328960
388352
382336
428224
380480
366720
313792
285120
321024
298304
285120
239168
388288
454464
430336
427904
520704
472576
525312
487296
591296
541760
523968
523072
548096
453056
509504
524800
499520
484992
440000
444160
440576
439360
420224
369344
372608
372608
372608
372608
372608
372608
372608
372608
372608
372608
356224
356224
356224
356224
356224
411584
399296
399488
399488
377472
541312
525760
525760
525760
525760
462784
462784
437248
431872
477120
563968
562048
559296
556224
547520
565888
582336
563968
552320
552320
543680
604928
601920
654720
633792
624704
575168
562112
545088
559872
594880
577024
585600
568128
559104
614400
576256
485504
529408
529408
524672
505664
497664
497664
495488
495488
483520
452032
454656
451136
451136
510592
514048
514048
514048
532032
487104
463040
426176
426176
426176
426176
403392
403392
401472
368832
368832
368832
404800
404800
404800
422336
422336
422336
422336
390400
361920
362048
359360
392704
390336
387968
388672
388672
426112
352704
356672
356800
356800
356800
357248
357248
357248
357248
357248
357248
357248
405952
405952
405952
405952
405952
444032
444032
442880

[-- Attachment #4: Type: IMAGE/PNG, Size: 30161 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: slab-nomerge (was Re: [git pull] device mapper changes for 4.3)
  2015-09-04  7:59             ` Sergey Senozhatsky
  2015-09-04  9:56               ` Sergey Senozhatsky
  2015-09-04 14:05               ` Christoph Lameter
@ 2015-09-04 14:11               ` Linus Torvalds
  2015-09-05  2:09                   ` Sergey Senozhatsky
  2 siblings, 1 reply; 42+ messages in thread
From: Linus Torvalds @ 2015-09-04 14:11 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Jesper Dangaard Brouer, Dave Chinner, Mike Snitzer,
	Christoph Lameter, Pekka Enberg, Andrew Morton, David Rientjes,
	Joonsoo Kim, dm-devel, Alasdair G Kergon, Joe Thornber,
	Mikulas Patocka, Vivek Goyal, Sami Tolvanen, Viresh Kumar,
	Heinz Mauelshagen, linux-mm, Sergey Senozhatsky

On Fri, Sep 4, 2015 at 12:59 AM, Sergey Senozhatsky
<sergey.senozhatsky.work@gmail.com> wrote:
>
> But I went through the corresponding slabinfo (I track slabinfo too); and yes,
> zero unused objects.

Ahh. I should have realized - the number you are actually tracking is
meaningless. The "unused objects" thing is not really tracked well.

/proc/slabinfo ends up not showing the percpu queue state, so things
look "used" when they are really just on the percpu queues for that
slab.So the "unused" number you are tracking is not really meaningful,
and the zeroes you are seeing is just a symptom of that: slabinfo
isn't "exact" enough.

So you should probably do the statistics on something that is more
meaningful: the actual number of pages that have been allocated (which
would be numslabs times pages-per-slab).

                     Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: slab-nomerge (was Re: [git pull] device mapper changes for 4.3)
  2015-09-04  9:10           ` Jesper Dangaard Brouer
@ 2015-09-04 14:13             ` Christoph Lameter
  0 siblings, 0 replies; 42+ messages in thread
From: Christoph Lameter @ 2015-09-04 14:13 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Dave Chinner, Linus Torvalds, Mike Snitzer, Pekka Enberg,
	Andrew Morton, David Rientjes, Joonsoo Kim, dm-devel,
	Alasdair G Kergon, Joe Thornber, Mikulas Patocka, Vivek Goyal,
	Sami Tolvanen, Viresh Kumar, Heinz Mauelshagen, linux-mm

On Fri, 4 Sep 2015, Jesper Dangaard Brouer wrote:

> Thus, I could achieve the same performance results by tuning SLUB as I
> could with "slab_nomerge".  Maybe the advantage from "slab_nomerge" was
> just that I got my "own" per CPU structures, and this implicitly larger
> per CPU memory for myself?

Well if multiple slabs are merged then there is potential pressure on the
per node locks if huge amounts of objects are concurrently retrieved from
the per node partial lists by two different subsystems. So cache merging
can increase contention and thereby reduce performance. What you did with
tuning is to reduce that contention by increasing the per cpu pages that
do not require locks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: slab-nomerge (was Re: [git pull] device mapper changes for 4.3)
  2015-09-04 13:55           ` Christoph Lameter
@ 2015-09-04 22:46             ` Dave Chinner
  2015-09-05  0:25               ` Christoph Lameter
  0 siblings, 1 reply; 42+ messages in thread
From: Dave Chinner @ 2015-09-04 22:46 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Linus Torvalds, Mike Snitzer, Pekka Enberg, Andrew Morton,
	David Rientjes, Joonsoo Kim, dm-devel, Alasdair G Kergon,
	Joe Thornber, Mikulas Patocka, Vivek Goyal, Sami Tolvanen,
	Viresh Kumar, Heinz Mauelshagen, linux-mm

On Fri, Sep 04, 2015 at 08:55:25AM -0500, Christoph Lameter wrote:
> On Fri, 4 Sep 2015, Dave Chinner wrote:
> 
> > There are generic cases where it hurts, so no justification should
> > be needed for those cases...
> 
> Inodes and dentries have constructors. These slabs are not mergeable and
> will never be because they have cache specific code to be executed on the
> object.

I know - I said as much early on in this discussion. That's one of
the generic cases I'm refering to.

I also said that the fact that they are not merged is really by
chance, not by good management. They are not being merged because of
the constructor, not because they have a shrinker. hell, I even said
that if it comes down to it, we don't even need SLAB_NO_MERGE
because we can create dummy constructors to prevent merging....

> > Really, we don't need some stupidly high bar to jump over here -
> > whether merging should be allowed can easily be answered with a
> > simple question: "Does the slab have a shrinker or does it back a
> > mempool?" If the answer is yes then using SLAB_SHRINKER or
> > SLAB_MEMPOOL to trigger the no-merge case doesn't need any more
> > justification from subsystem maintainers at all.
> 
> The slab shrinkers do not use mergeable slab caches.

Please, go back and read what i've already said.

*Some* shrinkers act on mergable slabs because they have no
constructor. e.g. the xfs_dquot and xfs_buf shrinkers.  I want to
keep them separate just like the inode cache is kept separate
because they have workload based demand peaks in the millions of
objects and LRU based shrinker reclaim, just like inode caches do.

That's what I want SLAB_SHRINKER for - to explicitly tell the slab
cache creation that I have a shrinker on this slab and so it should
not merge it with others. Every slab that has a shrinker should be
marked with this flag - we should not be relying on constructors to
prevent merging of critical slab caches with shrinkers....

I really don't see the issue here - explicitly encoding and
documenting the behaviour we've implicitly been relying on for years
is something we do all the time. Code clarity and documented
behaviour is a *good thing*.

Cheers,

Dave.
-- 
Dave Chinner
dchinner@redhat.com

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: slab-nomerge (was Re: [git pull] device mapper changes for 4.3)
  2015-09-04 22:46             ` Dave Chinner
@ 2015-09-05  0:25               ` Christoph Lameter
  2015-09-05  1:16                 ` Dave Chinner
  0 siblings, 1 reply; 42+ messages in thread
From: Christoph Lameter @ 2015-09-05  0:25 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Linus Torvalds, Mike Snitzer, Pekka Enberg, Andrew Morton,
	David Rientjes, Joonsoo Kim, dm-devel, Alasdair G Kergon,
	Joe Thornber, Mikulas Patocka, Vivek Goyal, Sami Tolvanen,
	Viresh Kumar, Heinz Mauelshagen, linux-mm

On Sat, 5 Sep 2015, Dave Chinner wrote:

> > Inodes and dentries have constructors. These slabs are not mergeable and
> > will never be because they have cache specific code to be executed on the
> > object.
>
> I also said that the fact that they are not merged is really by
> chance, not by good management. They are not being merged because of
> the constructor, not because they have a shrinker. hell, I even said
> that if it comes down to it, we don't even need SLAB_NO_MERGE
> because we can create dummy constructors to prevent merging....

Right. There is no chance here though. Its intentional to not merge slab
where we could get into issues.

Would be interested to see how performance changes if the inode/dentries
would become mergeable.

> *Some* shrinkers act on mergable slabs because they have no
> constructor. e.g. the xfs_dquot and xfs_buf shrinkers.  I want to
> keep them separate just like the inode cache is kept separate
> because they have workload based demand peaks in the millions of
> objects and LRU based shrinker reclaim, just like inode caches do.

But then we are not sure why we would do that. Certainly merging can
increases the stress on the per node locks for a slab cache as the example
by Jesper shows (and this can be dealt with by increasing per cpu
resources). On the other hand this also leads to rapid defragmentation
because the free objects from partial pages produced by the frees of
one of the merged slabs can get reused quickly for another purpose.

> I really don't see the issue here - explicitly encoding and
> documenting the behaviour we've implicitly been relying on for years
> is something we do all the time. Code clarity and documented
> behaviour is a *good thing*.

The question first has to be answered why keeping them separate is such a
good thing without also having an explicit way of telling the allocator to
keep certain objects in the same slab page if possible. Otherwise we get
this randomizing effect that nullifies the idea that sequential
freeing/allocation would avoid fragmentation.

I have in the past be in favor of adding such a flag to avoid merging but
I am slowly getting to the point that this may not be wise anymore. There
is too much arguing from gut reactions here and relying on assumptions
about internal operations of slabs (thinking to be able to exploit the
fact that linearly allocated objects come from the same slab page coming
from you is one of these).

Defragmentation IMHO requires a targeted approach were either objects that
are in the way can be moved out of the way or there is some type of
lifetime marker on objects that allows the memory allocators to know that
these objects can be freed all at once when a certain operation is
complete.


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: slab-nomerge (was Re: [git pull] device mapper changes for 4.3)
  2015-09-04  3:51           ` Linus Torvalds
@ 2015-09-05  0:36               ` Dave Chinner
  2015-09-07  9:30             ` Jesper Dangaard Brouer
  1 sibling, 0 replies; 42+ messages in thread
From: Dave Chinner @ 2015-09-05  0:36 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Mike Snitzer, Christoph Lameter, Pekka Enberg, Andrew Morton,
	David Rientjes, Joonsoo Kim, dm-devel, Alasdair G Kergon,
	Joe Thornber, Mikulas Patocka, Vivek Goyal, Sami Tolvanen,
	Viresh Kumar, Heinz Mauelshagen, linux-mm

On Thu, Sep 03, 2015 at 08:51:09PM -0700, Linus Torvalds wrote:
> On Thu, Sep 3, 2015 at 8:26 PM, Dave Chinner <dchinner@redhat.com> wrote:
> >
> > The double standard is the problem here. No notification, proof,
> > discussion or review was needed to turn on slab merging for
> > everyone, but you're setting a very high bar to jump if anyone wants
> > to turn it off in their code.
> 
> Ehh. You realize that almost the only load that is actually seriously
> allocator-limited is networking?

Of course I do - I've been following Jesper's work quite closely
because we might be able to make use of the batch allocation
mechanism in the XFS inode cache in certain workloads where we are
burning through a million inode slab allocations a second...

But again, you're bringing up justifications for a change that were
not documented in the commit message for the change. It didn't even
mention performance (just fragmentation and memory savings). If this
was such a critical factor in making this decision, then why weren't
such workloads and numbers provided with the commit? And why didn't
someone from netowkring actually review the change and ack/test that
it did actually do what it was supposed to?

If you are going to make an assertion, then you damn well better
provide numbers to go along with that assertion. What's you're
phrase, Linus? "Numbers talk and BS walks?" Where are the numbers,
Linus? Hmmmm?

Indeed, with network slabs that hot, mixing them with random other
slab caches could have a negative effect on performance by
increasing contention on the slab over what the network load already
brings. I learnt that lesson 12 years ago when optimisng the mbuf
slab allocator in the Irix network stack to scale to >1Mpps through
16 GbE cards: It worked just fine until we started doing something
with the data that the network was delivering and created more load
on the shared slab....

But, I digress. I've been trying to explain why we shouldn't be merging
slabs with shrinkers and you've shifted the goal posts rather
than addressing the discussion at hand.

> Really, Dave. You have absolutely nothing to back up your points with.
> Merging is *not* some kind of "new" thing that was silently enabled
> recently to take you by surprise.

The key slab tha I monitor for fragmentation behaviour (the XFS
inode slab) does not get merged. Ever. SLAB or SLUB. Because it has
a *constructor*.  Linus, if you bothered to read my previous
comments in this discussion then you'd know this.  I just want to
flag to extend that behaviour to all the slab caches I actively
manage with shrinkers, because slab merging does not benefit them
the same way it does passive slabs. That's not hard to understand,
nor is it a major issue for anyone.

>From my perspective, Linus, you're way out of line. You are not
engaging on a technical level - you're not even reading the
arguments I've been presenting. You're just cherry-picking something
mostly irrelelvant to the problem being discussed and going off at a
tangent ranting and swearing and trying your best to be abusive.
Your behaviour and bluster does not intimidate me, so please try to
be a bit more civil and polite and engage properly on a technical
level.


-Dave.
-- 
Dave Chinner
dchinner@redhat.com

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: slab-nomerge (was Re: [git pull] device mapper changes for 4.3)
@ 2015-09-05  0:36               ` Dave Chinner
  0 siblings, 0 replies; 42+ messages in thread
From: Dave Chinner @ 2015-09-05  0:36 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Mike Snitzer, Christoph Lameter, Pekka Enberg, Andrew Morton,
	David Rientjes, Joonsoo Kim, dm-devel, Alasdair G Kergon,
	Joe Thornber, Mikulas Patocka, Vivek Goyal, Sami Tolvanen,
	Viresh Kumar, Heinz Mauelshagen, linux-mm

On Thu, Sep 03, 2015 at 08:51:09PM -0700, Linus Torvalds wrote:
> On Thu, Sep 3, 2015 at 8:26 PM, Dave Chinner <dchinner@redhat.com> wrote:
> >
> > The double standard is the problem here. No notification, proof,
> > discussion or review was needed to turn on slab merging for
> > everyone, but you're setting a very high bar to jump if anyone wants
> > to turn it off in their code.
> 
> Ehh. You realize that almost the only load that is actually seriously
> allocator-limited is networking?

Of course I do - I've been following Jesper's work quite closely
because we might be able to make use of the batch allocation
mechanism in the XFS inode cache in certain workloads where we are
burning through a million inode slab allocations a second...

But again, you're bringing up justifications for a change that were
not documented in the commit message for the change. It didn't even
mention performance (just fragmentation and memory savings). If this
was such a critical factor in making this decision, then why weren't
such workloads and numbers provided with the commit? And why didn't
someone from netowkring actually review the change and ack/test that
it did actually do what it was supposed to?

If you are going to make an assertion, then you damn well better
provide numbers to go along with that assertion. What's you're
phrase, Linus? "Numbers talk and BS walks?" Where are the numbers,
Linus? Hmmmm?

Indeed, with network slabs that hot, mixing them with random other
slab caches could have a negative effect on performance by
increasing contention on the slab over what the network load already
brings. I learnt that lesson 12 years ago when optimisng the mbuf
slab allocator in the Irix network stack to scale to >1Mpps through
16 GbE cards: It worked just fine until we started doing something
with the data that the network was delivering and created more load
on the shared slab....

But, I digress. I've been trying to explain why we shouldn't be merging
slabs with shrinkers and you've shifted the goal posts rather
than addressing the discussion at hand.

> Really, Dave. You have absolutely nothing to back up your points with.
> Merging is *not* some kind of "new" thing that was silently enabled
> recently to take you by surprise.

The key slab tha I monitor for fragmentation behaviour (the XFS
inode slab) does not get merged. Ever. SLAB or SLUB. Because it has
a *constructor*.  Linus, if you bothered to read my previous
comments in this discussion then you'd know this.  I just want to
flag to extend that behaviour to all the slab caches I actively
manage with shrinkers, because slab merging does not benefit them
the same way it does passive slabs. That's not hard to understand,
nor is it a major issue for anyone.

From my perspective, Linus, you're way out of line. You are not
engaging on a technical level - you're not even reading the
arguments I've been presenting. You're just cherry-picking something
mostly irrelelvant to the problem being discussed and going off at a
tangent ranting and swearing and trying your best to be abusive.
Your behaviour and bluster does not intimidate me, so please try to
be a bit more civil and polite and engage properly on a technical
level.


-Dave.
-- 
Dave Chinner
dchinner@redhat.com

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: slab-nomerge (was Re: [git pull] device mapper changes for 4.3)
  2015-09-05  0:25               ` Christoph Lameter
@ 2015-09-05  1:16                 ` Dave Chinner
  0 siblings, 0 replies; 42+ messages in thread
From: Dave Chinner @ 2015-09-05  1:16 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Linus Torvalds, Mike Snitzer, Pekka Enberg, Andrew Morton,
	David Rientjes, Joonsoo Kim, dm-devel, Alasdair G Kergon,
	Joe Thornber, Mikulas Patocka, Vivek Goyal, Sami Tolvanen,
	Viresh Kumar, Heinz Mauelshagen, linux-mm

On Fri, Sep 04, 2015 at 07:25:48PM -0500, Christoph Lameter wrote:
> On Sat, 5 Sep 2015, Dave Chinner wrote:
> 
> > > Inodes and dentries have constructors. These slabs are not mergeable and
> > > will never be because they have cache specific code to be executed on the
> > > object.
> >
> > I also said that the fact that they are not merged is really by
> > chance, not by good management. They are not being merged because of
> > the constructor, not because they have a shrinker. hell, I even said
> > that if it comes down to it, we don't even need SLAB_NO_MERGE
> > because we can create dummy constructors to prevent merging....
> 
> Right. There is no chance here though. Its intentional to not merge slab
> where we could get into issues.

The dentry cache does not have a constructor:

       /* 
         * A constructor could be added for stable state like the lists,
         * but it is probably not worth it because of the cache nature
         * of the dcache. 
         */
        dentry_cache = KMEM_CACHE(dentry,
                SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|SLAB_MEM_SPREAD);

> Would be interested to see how performance changes if the inode/dentries
> would become mergeable.

On my machines the dentry slab  doesn't merge with any other slabs,
though, because there are no other slabs with the same size object.
That's one of the major crap shoots with slab merging that I want to
fix.


> > *Some* shrinkers act on mergable slabs because they have no
> > constructor. e.g. the xfs_dquot and xfs_buf shrinkers.  I want to
> > keep them separate just like the inode cache is kept separate
> > because they have workload based demand peaks in the millions of
> > objects and LRU based shrinker reclaim, just like inode caches do.
> 
> But then we are not sure why we would do that. Certainly merging can
> increases the stress on the per node locks for a slab cache as the example
> by Jesper shows (and this can be dealt with by increasing per cpu
> resources). On the other hand this also leads to rapid defragmentation
> because the free objects from partial pages produced by the frees of
> one of the merged slabs can get reused quickly for another purpose.

We can't control the freeing of objects from other merged slabs,
unless they are also actively managed by a shrinker. So that page is
pinned until the slab object is freed by whatever subsystem owns it,
and no amount of memory pressure can cause that to happen.

> > I really don't see the issue here - explicitly encoding and
> > documenting the behaviour we've implicitly been relying on for years
> > is something we do all the time. Code clarity and documented
> > behaviour is a *good thing*.
> 
> The question first has to be answered why keeping them separate is such a
> good thing without also having an explicit way of telling the allocator to
> keep certain objects in the same slab page if possible. Otherwise we get
> this randomizing effect that nullifies the idea that sequential
> freeing/allocation would avoid fragmentation.

I don't follow. Sequential alloc/free of objects from an unshared
slab does not alter fragmentation patterns of the slab. If it was
fragmented before the sequntial run, it will be fragmented after.

If you are talking about merging dentry/inode objects into the same
slab and doing sequential allocation of them, that just does not
work. the relationship between detries and inodes is an M:N
relationship, not a 1:1 relationship, so they will never have nice
neat aligned alloc/free patterns.

> I have in the past be in favor of adding such a flag to avoid merging but
> I am slowly getting to the point that this may not be wise anymore. There
> is too much arguing from gut reactions here and relying on assumptions
> about internal operations of slabs (thinking to be able to exploit the
> fact that linearly allocated objects come from the same slab page coming
> from you is one of these).

Wow. The only time I've ever mentioned that we could do some
interesting things if we knew certain objects were on the same
backing page was earlier this year at LCA when we were talking about
the design of the proposed batch allocation interface. You said that
it probably couldn't be guaranteed and so i haven't even thought
about that since.

That's not an argument for preventing us from saying "don't merge
this slab, we actively manage it's contents".

> Defragmentation IMHO requires a targeted approach were either objects that
> are in the way can be moved out of the way or there is some type of
> lifetime marker on objects that allows the memory allocators to know that
> these objects can be freed all at once when a certain operation is
> complete.

Which, if we know that there is only one type of object in the slab,
is relatively easy to do and can be controlled by the subsystem
shrinker.... :)

Cheers,

Dave.
-- 
Dave Chinner
dchinner@redhat.com

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: slab-nomerge (was Re: [git pull] device mapper changes for 4.3)
  2015-09-04 14:11               ` Linus Torvalds
@ 2015-09-05  2:09                   ` Sergey Senozhatsky
  0 siblings, 0 replies; 42+ messages in thread
From: Sergey Senozhatsky @ 2015-09-05  2:09 UTC (permalink / raw)
  To: Linus Torvalds, Christoph Lameter
  Cc: Sergey Senozhatsky, Jesper Dangaard Brouer, Dave Chinner,
	Mike Snitzer, Pekka Enberg, Andrew Morton, David Rientjes,
	Joonsoo Kim, dm-devel, Alasdair G Kergon, Joe Thornber,
	Mikulas Patocka, Vivek Goyal, Sami Tolvanen, Viresh Kumar,
	Heinz Mauelshagen, linux-mm, Sergey Senozhatsky

[-- Attachment #1: Type: text/plain, Size: 2566 bytes --]

On (09/04/15 07:11), Linus Torvalds wrote:
> >
> > But I went through the corresponding slabinfo (I track slabinfo too); and yes,
> > zero unused objects.
> 
> Ahh. I should have realized - the number you are actually tracking is
> meaningless. The "unused objects" thing is not really tracked well.
> 
> /proc/slabinfo ends up not showing the percpu queue state, so things
> look "used" when they are really just on the percpu queues for that
> slab.So the "unused" number you are tracking is not really meaningful,
> and the zeroes you are seeing is just a symptom of that: slabinfo
> isn't "exact" enough.
> 
> So you should probably do the statistics on something that is more
> meaningful: the actual number of pages that have been allocated (which
> would be numslabs times pages-per-slab).


Aha... Didn't know that, sorry.

Christoph Lameter wrote:
> Please use the slabinfo tool. What you see in /proc/slabinfo is generated
> for slab compatibility and may not show useful numbers.
> 

OK. I did another round of tests

 git clone git://sourceware.org/git/glibc.git
 make -j8
 package (xz)
 rm -fr glibc



>From slabinfo -T output

Slabcaches :  91      Aliases  : 118->69  Active:  65
Memory used:  60.0M   # Loss   :  13.2M   MRatio:    28%
# Objects  : 162.4K   # PartObj:  10.6K   ORatio:     6%

Per Cache    Average         Min         Max       Total
---------------------------------------------------------
#Objects        2.4K          11       19.0K      162.4K
#Slabs           108           1        1.8K        7.0K
#PartSlab         34           0        1.6K        2.2K
%PartSlab         7%          0%         86%         31%
PartObjs           6           0        4.7K       10.6K
% PartObj         3%          0%         33%          6%
Memory        923.9K        8.1K       10.2M       60.0M
Used          720.3K        8.0K        9.7M       46.8M
Loss          203.6K           0        6.1M       13.2M

Per Object   Average         Min         Max
---------------------------------------------
Memory           290           8        8.1K
User             288           8        8.1K
Loss               1           0          64


I took the
       "Memory used:  60.0M   # Loss   :  13.2M   MRatio:    28%"
line and generated 3 graphs:
-- "Memory used"	MM
-- "Loss"		LOSS
-- "MRatio"		RATION

for "slab_nomerge = 0" and "slab_nomerge = 1".

... And those are sort of interesting. I was expecting to see more
diverged behaviours.

Attached.

Please let me know if you want to see files with the numbers
(slabinfo -T only).

	-ss

[-- Attachment #2: glibc-RATIO-merge_vs_nomerge.png --]
[-- Type: image/png, Size: 15874 bytes --]

[-- Attachment #3: glibc-LOSS-merge_vs_nomerge.png --]
[-- Type: image/png, Size: 16482 bytes --]

[-- Attachment #4: glibc-MM-merge_vs_nomerge.png --]
[-- Type: image/png, Size: 16937 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: slab-nomerge (was Re: [git pull] device mapper changes for 4.3)
@ 2015-09-05  2:09                   ` Sergey Senozhatsky
  0 siblings, 0 replies; 42+ messages in thread
From: Sergey Senozhatsky @ 2015-09-05  2:09 UTC (permalink / raw)
  To: Linus Torvalds, Christoph Lameter
  Cc: Sergey Senozhatsky, Jesper Dangaard Brouer, Dave Chinner,
	Mike Snitzer, Pekka Enberg, Andrew Morton, David Rientjes,
	Joonsoo Kim, dm-devel, Alasdair G Kergon, Joe Thornber,
	Mikulas Patocka, Vivek Goyal, Sami Tolvanen, Viresh Kumar,
	Heinz Mauelshagen, linux-mm, Sergey Senozhatsky

[-- Attachment #1: Type: text/plain, Size: 2565 bytes --]

On (09/04/15 07:11), Linus Torvalds wrote:
> >
> > But I went through the corresponding slabinfo (I track slabinfo too); and yes,
> > zero unused objects.
> 
> Ahh. I should have realized - the number you are actually tracking is
> meaningless. The "unused objects" thing is not really tracked well.
> 
> /proc/slabinfo ends up not showing the percpu queue state, so things
> look "used" when they are really just on the percpu queues for that
> slab.So the "unused" number you are tracking is not really meaningful,
> and the zeroes you are seeing is just a symptom of that: slabinfo
> isn't "exact" enough.
> 
> So you should probably do the statistics on something that is more
> meaningful: the actual number of pages that have been allocated (which
> would be numslabs times pages-per-slab).


Aha... Didn't know that, sorry.

Christoph Lameter wrote:
> Please use the slabinfo tool. What you see in /proc/slabinfo is generated
> for slab compatibility and may not show useful numbers.
> 

OK. I did another round of tests

 git clone git://sourceware.org/git/glibc.git
 make -j8
 package (xz)
 rm -fr glibc



From slabinfo -T output

Slabcaches :  91      Aliases  : 118->69  Active:  65
Memory used:  60.0M   # Loss   :  13.2M   MRatio:    28%
# Objects  : 162.4K   # PartObj:  10.6K   ORatio:     6%

Per Cache    Average         Min         Max       Total
---------------------------------------------------------
#Objects        2.4K          11       19.0K      162.4K
#Slabs           108           1        1.8K        7.0K
#PartSlab         34           0        1.6K        2.2K
%PartSlab         7%          0%         86%         31%
PartObjs           6           0        4.7K       10.6K
% PartObj         3%          0%         33%          6%
Memory        923.9K        8.1K       10.2M       60.0M
Used          720.3K        8.0K        9.7M       46.8M
Loss          203.6K           0        6.1M       13.2M

Per Object   Average         Min         Max
---------------------------------------------
Memory           290           8        8.1K
User             288           8        8.1K
Loss               1           0          64


I took the
       "Memory used:  60.0M   # Loss   :  13.2M   MRatio:    28%"
line and generated 3 graphs:
-- "Memory used"	MM
-- "Loss"		LOSS
-- "MRatio"		RATION

for "slab_nomerge = 0" and "slab_nomerge = 1".

... And those are sort of interesting. I was expecting to see more
diverged behaviours.

Attached.

Please let me know if you want to see files with the numbers
(slabinfo -T only).

	-ss

[-- Attachment #2: glibc-RATIO-merge_vs_nomerge.png --]
[-- Type: image/png, Size: 15874 bytes --]

[-- Attachment #3: glibc-LOSS-merge_vs_nomerge.png --]
[-- Type: image/png, Size: 16482 bytes --]

[-- Attachment #4: glibc-MM-merge_vs_nomerge.png --]
[-- Type: image/png, Size: 16937 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: slab-nomerge (was Re: [git pull] device mapper changes for 4.3)
  2015-09-05  2:09                   ` Sergey Senozhatsky
  (?)
@ 2015-09-05 20:33                   ` Linus Torvalds
  2015-09-07  8:44                     ` Sergey Senozhatsky
  -1 siblings, 1 reply; 42+ messages in thread
From: Linus Torvalds @ 2015-09-05 20:33 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Christoph Lameter, Jesper Dangaard Brouer, Dave Chinner,
	Mike Snitzer, Pekka Enberg, Andrew Morton, David Rientjes,
	Joonsoo Kim, dm-devel, Alasdair G Kergon, Joe Thornber,
	Mikulas Patocka, Vivek Goyal, Sami Tolvanen, Viresh Kumar,
	Heinz Mauelshagen, linux-mm, Sergey Senozhatsky

On Fri, Sep 4, 2015 at 7:09 PM, Sergey Senozhatsky
<sergey.senozhatsky.work@gmail.com> wrote:
>
> Aha... Didn't know that, sorry.

Hey, I didn't react to it either. until you pointed out the oddity of
"no free slab memory" Very easy to overlook.

> ... And those are sort of interesting. I was expecting to see more
> diverged behaviours.
>
> Attached.

So I'm not sure how really conclusive these graphs are, but they are
certainly fun to look at. So I have a few reactions:

  - that 'nomerge' spike at roughly 780s is interesting. I wonder why
it does that.

 - it would be interesting to see - for example - which slabs are the
top memory users, and not _just_ the total (it could clarify the
spike, for example). That's obviously something that works much better
for the no-merge case, but could your script be changed to show (say)
the "top 5 slabs". Showing all of them would probably be too messy,
but "top 5" could be interesting.

 - assuming the times are comparable, it looks like 'merge' really is
noticeably faster. But that might just be noise too, so this may not
be real data.

 - regardless of how meaningful the graphs are, and whether they
really tell us anything, I do like the concept, and I'd love to see
people do things like this more often. Visualization to show behavior
is great.

That last point in particular means that if you scripted this and your
scripts aren't *too* ugly and not too tied to your particular setup, I
think it would perhaps not be a bad idea to encourage plots like this
by making those kinds of scripts available in the kernel tree.  That's
particularly true if you used something like the tools/testing/ktest/
scripts to run these things automatically (which can be a *big* issue
to show that something is actually stable across multiple boots, and
see the variance).

So maybe these graphs are meaningful, and maybe they aren't. But I'd
still like to see more of them ;)

                  Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: slab-nomerge (was Re: [git pull] device mapper changes for 4.3)
  2015-09-05 20:33                   ` Linus Torvalds
@ 2015-09-07  8:44                     ` Sergey Senozhatsky
  2015-09-08  0:22                       ` Sergey Senozhatsky
  0 siblings, 1 reply; 42+ messages in thread
From: Sergey Senozhatsky @ 2015-09-07  8:44 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Sergey Senozhatsky, Christoph Lameter, Jesper Dangaard Brouer,
	Dave Chinner, Mike Snitzer, Pekka Enberg, Andrew Morton,
	David Rientjes, Joonsoo Kim, dm-devel, Alasdair G Kergon,
	Joe Thornber, Mikulas Patocka, Vivek Goyal, Sami Tolvanen,
	Viresh Kumar, Heinz Mauelshagen, linux-mm, Sergey Senozhatsky

[-- Attachment #1: Type: text/plain, Size: 19729 bytes --]

On (09/05/15 13:33), Linus Torvalds wrote:
> > ... And those are sort of interesting. I was expecting to see more
> > diverged behaviours.
> >
> > Attached.


Hello, sorry for long reply.


> So I'm not sure how really conclusive these graphs are, but they are
> certainly fun to look at. So I have a few reactions:
> 
>   - that 'nomerge' spike at roughly 780s is interesting. I wonder why
> it does that.
> 

Please find some stats below (with TOP 5 slabs). ~780s looks like the
time when glibc build script begins to package glibc (gzip, xz...).


>  - it would be interesting to see - for example - which slabs are the
> top memory users, and not _just_ the total (it could clarify the
> spike, for example). That's obviously something that works much better
> for the no-merge case, but could your script be changed to show (say)
> the "top 5 slabs". Showing all of them would probably be too messy,
> but "top 5" could be interesting.


OFFTOP: Capturing is not a problem; visualizing -- is. With a huge number of samples
the graph quickly becomes impossible to read. We have different N `top' slabs
after every measurement, labeling them on a graph is a bit messy. So my script right
now just picks the first slab (most Memory Used or biggest Loss value) per sample
(e.g. every second) and does something like this (in png):

  20 +-+---+------------+------------+------------+---+-+
     |     +            +            +            +     |
     |            +------------+           SIZE +-----+ |
  18 +-+          |            |           LOSS +-----+-+
     |            |            |                        |
     |            |            |                        |
     |            |            |                        |
  16 +-+          |            |                      +-+
     |            |            |                        |
     |------------+            |                        |
  14 +-+          |            |                      +-+
     |            |            |                        |
     |            |            |           +------------|
     |            |            |           |            |
  12 +-+          |------------|           |          +-+
     |            |            |           |            |
     |            |            |           |            |
  10 +-+          |            |-----------+          +-+
     |            |            |           |            |
     |            |            |           |            |
     |            |            |           |            |
   8 +-+----------|            |           |          +-+
     |            |            |           |------------|
     |     +      |     +      |     +     |      +     |
   6 +-+---+------------+------------+------------+---+-+
         slab1        slab2        slab3        slab1
                           samples

          ^            ^            ^            ^
          1s           2s           3s           4s ... (<< not part of the graph)




BACK to spikes.

I modified `slabinfo' tool to report top N (5 in this case) slabs sorted by
Memory usage and by Loss, along with Slab totals (+report everything in bytes,
w/o the dynamic G/M/K scaling. well, techically Loss is `Space - Objects * Objsize'
and can be calculated from the existing output, but I'm lazy. Besides top N biggest
slabs and top N most fragmented ones do not necessarily overlap, so I print both
sets).


Some of the spikes. Samples are separated by "Sample #d".

Test
===============================================================================================
Sample -- 1 second. 98828288 -> 107409408 -> 100171776


Sample #408
Slabcache Totals
----------------
Slabcaches : 140      Aliases  :   0->0   Active: 105
Memory used: 98828288   # Loss   : 3872736   MRatio:     4%
# Objects  : 329484   # PartObj:    484   ORatio:     0%

Per Cache    Average         Min         Max       Total
---------------------------------------------------------
#Objects        3137          16       92313      329484
#Slabs            93           1        2367        9766
#PartSlab          0           0           8          57
%PartSlab         2%          0%         58%          0%
PartObjs           0           0         142         484
% PartObj         0%          0%         38%          0%
Memory        941221        4096    35258368    98828288
Used          904338        4096    33622848    94955552
Loss           36883           0     1635520     3872736

Per Object   Average         Min         Max
---------------------------------------------
Memory           289           8        8192
User             288           8        8192
Loss               1           0          64

Slabs sorted by size (5)
---------------------------------------------------------
Name                   Objects Objsize                Space Slabs/Part/Cpu  O/S O %Fr %Ef Flg
ext4_inode_cache         19368    1736             35258368       1072/0/4   18 3   0  95 a
dentry                   46200     288             13516800      1635/0/15   28 1   0  98 a
inode_cache              12150     864             11059200       665/0/10   18 2   0  94 a
buffer_head              92313     104              9695232       2363/0/4   39 0   0  99 a
radix_tree_node           6832     576              3997696        240/0/4   28 2   0  98 a

Slabs sorted by loss (5)
---------------------------------------------------------
ext4_inode_cache         19368    1736              1635520       1072/0/4   18 3   0  95 a
inode_cache              12150     864               561600       665/0/10   18 2   0  94 a
dentry                   46200     288               211200      1635/0/15   28 1   0  98 a
biovec-256                  46    4096               204800          7/7/5    8 3  58  47 A
task_struct                174    4928               125568        19/3/11    6 3  10  87 

Sample #409
Slabcache Totals
----------------
Slabcaches : 140      Aliases  :   0->0   Active: 105
Memory used: 107409408   # Loss   : 3782600   MRatio:     3%
# Objects  : 335908   # PartObj:    485   ORatio:     0%

Per Cache    Average         Min         Max       Total
---------------------------------------------------------
#Objects        3199          16       92742      335908
#Slabs            96           1        2378       10081
#PartSlab          0           0          39          67
%PartSlab         1%          0%         50%          0%
# Objects  : 335908   # PartObj:    485   ORatio:     0%
# Objects  : 335908   # PartObj:    485   ORatio:     0%

Per Cache    Average         Min         Max       Total
---------------------------------------------------------
#Objects        3199          16       92742      335908
#Slabs            96           1        2378       10081
#PartSlab          0           0          39          67
%PartSlab         1%          0%         50%          0%
PartObjs           0           0         274         485
% PartObj         0%          0%         38%          0%
Memory       1022946        4096    35422208   107409408
Used          986921        4096    33779088   103626808
Loss           36024           0     1643120     3782600

Per Object   Average         Min         Max
---------------------------------------------
Memory           310           8        8192
User             308           8        8192
Loss               1           0          64

Slabs sorted by size (5)
---------------------------------------------------------
Name                   Objects Objsize                Space Slabs/Part/Cpu  O/S O %Fr %Ef Flg
ext4_inode_cache         19458    1736             35422208       1077/0/4   18 3   0  95 a
dentry                   46620     288             13639680       1658/0/7   28 1   0  98 a
inode_cache              12150     864             11059200       665/0/10   18 2   0  94 a
buffer_head              92742     104              9740288      2367/0/11   39 0   0  99 a
biovec-256                2128    4096              8749056        263/0/4    8 3   0  99 A

Slabs sorted by loss (5)
---------------------------------------------------------
ext4_inode_cache         19458    1736              1643120       1077/0/4   18 3   0  95 a
inode_cache              12150     864               561600       665/0/10   18 2   0  94 a
filp                      2169     432               267216      134/39/13   18 1  26  77 A
dentry                   46620     288               213120       1658/0/7   28 1   0  98 a
task_struct                165    4928               104384        18/2/10    6 3   7  88 

Sample #410
Slabcache Totals
----------------
Slabcaches : 140      Aliases  :   0->0   Active: 105
Memory used: 100171776   # Loss   : 3975712   MRatio:     4%
# Objects  : 334759   # PartObj:    633   ORatio:     0%

Per Cache    Average         Min         Max       Total
---------------------------------------------------------
#Objects        3188          16       92859      334759
#Slabs            94           1        2381        9922
#PartSlab          0           0          12          74
%PartSlab         2%          0%         57%          0%
PartObjs           0           0         209         633
% PartObj         0%          0%         38%          0%
Memory        954016        4096    35618816   100171776
Used          916152        4096    33966576    96196064
Loss           37863           0     1652240     3975712

Per Object   Average         Min         Max
---------------------------------------------
Memory           289           8        8192
User             287           8        8192
Loss               1           0          64

Slabs sorted by size (5)
---------------------------------------------------------
Name                   Objects Objsize                Space Slabs/Part/Cpu  O/S O %Fr %Ef Flg
ext4_inode_cache         19566    1736             35618816       1083/0/4   18 3   0  95 a
dentry                   46788     288             13688832      1661/0/10   28 1   0  98 a
inode_cache              12150     864             11059200       665/0/10   18 2   0  94 a
buffer_head              92859     104              9752576      2371/0/10   39 0   0  99 a
radix_tree_node           6888     576              4030464        242/0/4   28 2   0  98 a

Slabs sorted by loss (5)
---------------------------------------------------------
ext4_inode_cache         19566    1736              1652240       1083/0/4   18 3   0  95 a
inode_cache              12150     864               561600       665/0/10   18 2   0  94 a
biovec-256                  54    4096               237568          8/8/6    8 3  57  48 A
dentry                   46788     288               213888      1661/0/10   28 1   0  98 a
task_struct                169    4928               182976        20/5/11    6 3  16  81 





Another test.
===============================================================================================

Sample -- 1 second.   251637760 -> 306782208 -> 252264448


Sample #426
Slabcache Totals
----------------
Slabcaches : 140      Aliases  :   0->0   Active: 107
Memory used: 251637760   # Loss   : 11002192   MRatio:     4%
# Objects  : 528119   # PartObj:   6437   ORatio:     1%

Per Cache    Average         Min         Max       Total
---------------------------------------------------------
#Objects        4935          11      114582      528119
#Slabs           164           1        4718       17594
#PartSlab          3           0         141         394
%PartSlab         4%          0%         65%          2%
PartObjs           1           0        2422        6437
% PartObj         2%          0%         42%          1%
Memory       2351754        4096   154599424   251637760
Used         2248930        3584   147428064   240635568
Loss          102824           0     7171360    11002192

Per Object   Average         Min         Max
---------------------------------------------
Memory           457           8        8192
User             455           8        8192
Loss               2           0          64

Slabs sorted by size (5)
---------------------------------------------------------
Name                   Objects Objsize                Space Slabs/Part/Cpu  O/S O %Fr %Ef Flg
ext4_inode_cache         84924    1736            154599424       4714/0/4   18 3   0  95 a
dentry                  114408     288             33472512       4080/0/6   28 1   0  98 a
buffer_head             114582     104             12034048       2934/0/4   39 0   0  99 a
inode_cache              12186     864             11091968       667/0/10   18 2   0  94 a
radix_tree_node          10388     576              6078464        367/0/4   28 2   0  98 a

Slabs sorted by loss (5)
---------------------------------------------------------
ext4_inode_cache         84924    1736              7171360       4714/0/4   18 3   0  95 a
inode_cache              12186     864               563264       667/0/10   18 2   0  94 a
dentry                  114408     288               523008       4080/0/6   28 1   0  98 a
kmalloc-128               4117     128               353664     160/141/55   32 0  65  59 
kmalloc-2048              1421    2048               202752       80/27/15   16 3  28  93 

Sample #427
Slabcache Totals
----------------
Slabcaches : 140      Aliases  :   0->0   Active: 107
Memory used: 306782208   # Loss   : 11304176   MRatio:     3%
# Objects  : 569050   # PartObj:   6538   ORatio:     1%

Per Cache    Average         Min         Max       Total
---------------------------------------------------------
#Objects        5318          11      114777      569050
#Slabs           187           1        4725       20096
#PartSlab          3           0         141         391
%PartSlab         3%          0%         65%          1%
PartObjs           1           0        2422        6538
% PartObj         1%          0%         42%          1%
Memory       2867123        4096   154828800   306782208
Used         2761476        3584   147646800   295478032
Loss          105646           0     7182000    11304176

Per Object   Average         Min         Max
---------------------------------------------
Memory           521           8        8192
User             519           8        8192
Loss               2           0          64

Slabs sorted by size (5)
---------------------------------------------------------
Name                   Objects Objsize                Space Slabs/Part/Cpu  O/S O %Fr %Ef Flg
ext4_inode_cache         85050    1736            154828800       4721/0/4   18 3   0  95 a
biovec-256               12416    4096             50954240       1550/3/5    8 3   0  99 A
dentry                  114548     288             33513472      4075/0/16   28 1   0  98 a
buffer_head             114777     104             12054528       2939/0/4   39 0   0  99 a
inode_cache              12186     864             11091968       667/0/10   18 2   0  94 a

Slabs sorted by loss (5)
---------------------------------------------------------
ext4_inode_cache         85050    1736              7182000       4721/0/4   18 3   0  95 a
inode_cache              12186     864               563264       667/0/10   18 2   0  94 a
dentry                  114548     288               523648      4075/0/16   28 1   0  98 a
kmalloc-128               4117     128               353664     160/141/55   32 0  65  59 
bio-0                    12852     176               244800       589/0/23   21 0   0  90 A

Sample #428
Slabcache Totals
----------------
Slabcaches : 140      Aliases  :   0->0   Active: 107
Memory used: 252264448   # Loss   : 11537008   MRatio:     4%
# Objects  : 529408   # PartObj:   8649   ORatio:     1%

Per Cache    Average         Min         Max       Total
---------------------------------------------------------
#Objects        4947          11      115947      529408
#Slabs           165           1        4725       17655
#PartSlab          5           0         141         566
%PartSlab         5%          0%         65%          3%
PartObjs           1           0        2422        8649
% PartObj         2%          0%         42%          1%
Memory       2357611        4096   154828800   252264448
Used         2249789        3584   147646800   240727440
Loss          107822           0     7182000    11537008

Per Object   Average         Min         Max
---------------------------------------------
Memory           456           8        8192
User             454           8        8192
Loss               2           0          64

Slabs sorted by size (5)
---------------------------------------------------------
Name                   Objects Objsize                Space Slabs/Part/Cpu  O/S O %Fr %Ef Flg
ext4_inode_cache         85050    1736            154828800       4721/0/4   18 3   0  95 a
dentry                  114660     288             33546240      4075/0/20   28 1   0  98 a
buffer_head             115947     104             12177408      2942/0/31   39 0   0  99 a
inode_cache              12186     864             11091968       667/0/10   18 2   0  94 a
radix_tree_node          10444     576              6111232        369/0/4   28 2   0  98 a

Slabs sorted by loss (5)
---------------------------------------------------------
ext4_inode_cache         85050    1736              7182000       4721/0/4   18 3   0  95 a
inode_cache              12186     864               563264       667/0/10   18 2   0  94 a
dentry                  114660     288               524160      4075/0/20   28 1   0  98 a
filp                      3572     432               447552     227/113/16   18 1  46  77 A
kmalloc-128               4117     128               353664     160/141/55   32 0  65  59 




Attached some graphs for NOMERGE kernel. So far, I haven't seen those spikes
for 'merge' kernel.


>  - assuming the times are comparable, it looks like 'merge' really is
> noticeably faster. But that might just be noise too, so this may not
> be real data.
>
>  - regardless of how meaningful the graphs are, and whether they
> really tell us anything, I do like the concept, and I'd love to see
> people do things like this more often. Visualization to show behavior
> is great.
>
> That last point in particular means that if you scripted this and your
> scripts aren't *too* ugly and not too tied to your particular setup, I
> think it would perhaps not be a bad idea to encourage plots like this
> by making those kinds of scripts available in the kernel tree.  That's
> particularly true if you used something like the tools/testing/ktest/
> scripts to run these things automatically (which can be a *big* issue
> to show that something is actually stable across multiple boots, and
> see the variance).

Oh, that's a good idea. I didn't use tools/testing/ktest/, it's a bit too
massive for my toy script. I have some modifications to slabinfo and a rather
ugly script to parse files and feed them to gnuplot (and yes, I use gnuplot
for plotting). slabinfo patches are not entirely dumb and close to being ready
(well.. except that I need to clean up all those %6s sprintfs that worked fine
for dynamically scalled sizes and do not work so nicely for sizes in bytes). I
can send them out later. Less sure about the script (bash) tho. In a nutshell
it's just a number of
     grep | awk > FOO; gnuplot ... FOO

So I'll finish some plotting improvements first (not ready yet) and then
I'll take a look how quickly I can land it (rewrite in perl) in
tools/testing/ktest/.

> So maybe these graphs are meaningful, and maybe they aren't. But I'd
> still like to see more of them ;)

Thanks.

	-ss

[-- Attachment #2: nomerge-mm-loss-usage-1.png --]
[-- Type: image/png, Size: 12580 bytes --]

[-- Attachment #3: nomerge-mm-loss-usage-2.png --]
[-- Type: image/png, Size: 12283 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: slab-nomerge (was Re: [git pull] device mapper changes for 4.3)
  2015-09-04  3:51           ` Linus Torvalds
  2015-09-05  0:36               ` Dave Chinner
@ 2015-09-07  9:30             ` Jesper Dangaard Brouer
  2015-09-07 20:22                 ` Linus Torvalds
  1 sibling, 1 reply; 42+ messages in thread
From: Jesper Dangaard Brouer @ 2015-09-07  9:30 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: brouer, Dave Chinner, Mike Snitzer, Christoph Lameter,
	Pekka Enberg, Andrew Morton, David Rientjes, Joonsoo Kim,
	dm-devel, Alasdair G Kergon, Joe Thornber, Mikulas Patocka,
	Vivek Goyal, Sami Tolvanen, Viresh Kumar, Heinz Mauelshagen,
	linux-mm, netdev


On Thu, 3 Sep 2015 20:51:09 -0700 Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Thu, Sep 3, 2015 at 8:26 PM, Dave Chinner <dchinner@redhat.com> wrote:
> >
> > The double standard is the problem here. No notification, proof,
> > discussion or review was needed to turn on slab merging for
> > everyone, but you're setting a very high bar to jump if anyone wants
> > to turn it off in their code.
> 
> Ehh. You realize that almost the only load that is actually seriously
> allocator-limited is networking?
> 
> And slub was beating slab on that? And slub has been doing the merging
> since day one. Slab was just changed to try to keep up with the
> winning strategy.

Sorry, I have to correct you on this.  The slub allocator is not as
fast as you might think.  The slab allocator is actually faster for
networking.

IP-forwarding, single CPU, single flow UDP (highly tuned):
 * Allocator slub: 2043575 pps
 * Allocator slab: 2088295 pps

Difference slab faster than slub:
 * +44720 pps and -10.48ns

The slub allocator have a faster "fastpath", if your workload is
fast-reusing within the same per-cpu page-slab, but once the workload
increases you hit the slowpath, and then slab catches up. Slub looks
great in micro-benchmarking.


As you can see in patchset:
 [PATCH 0/3] Network stack, first user of SLAB/kmem_cache bulk free API.
 http://thread.gmane.org/gmane.linux.kernel.mm/137469/focus=376625

I'm working on speeding up slub to the level of slab.  And it seems
like I have succeeded with half-a-nanosec 2090522 pps (+2227 pps or
0.51 ns).

And with "slab_nomerge" I get even high performance:
 * slub: bulk-free and slab_nomerge: 2121824 pps
 * Diff to slub: +78249 and -18.05ns

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: slab-nomerge (was Re: [git pull] device mapper changes for 4.3)
  2015-09-07  9:30             ` Jesper Dangaard Brouer
@ 2015-09-07 20:22                 ` Linus Torvalds
  0 siblings, 0 replies; 42+ messages in thread
From: Linus Torvalds @ 2015-09-07 20:22 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Dave Chinner, Mike Snitzer, Christoph Lameter, Pekka Enberg,
	Andrew Morton, David Rientjes, Joonsoo Kim, dm-devel,
	Alasdair G Kergon, Joe Thornber, Mikulas Patocka, Vivek Goyal,
	Sami Tolvanen, Viresh Kumar, Heinz Mauelshagen, linux-mm, netdev

On Mon, Sep 7, 2015 at 2:30 AM, Jesper Dangaard Brouer
<brouer@redhat.com> wrote:
>
> The slub allocator have a faster "fastpath", if your workload is
> fast-reusing within the same per-cpu page-slab, but once the workload
> increases you hit the slowpath, and then slab catches up. Slub looks
> great in micro-benchmarking.
>
> And with "slab_nomerge" I get even high performance:

I think those two are related.

Not merging means that effectively the percpu caches end up being
bigger (simply because there are more of them), and so it captures
more of the fastpath cases.

Obviously the percpu queue size is an easy tunable too, but there are
real downsides to that too. I suspect your IP forwarding case isn't so
different from some of the microbenchmarks, it just has more
outstanding work..

And yes, the slow path (ie not hitting in the percpu cache) of SLUB
could hopefully be optimizable too, although maybe the bulk patches
are the way to go (and unrelated to this thread - at least part of
your bulk patches actually got merged last Friday - they were part of
Andrew's patch-bomb).

            Linus

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: slab-nomerge (was Re: [git pull] device mapper changes for 4.3)
@ 2015-09-07 20:22                 ` Linus Torvalds
  0 siblings, 0 replies; 42+ messages in thread
From: Linus Torvalds @ 2015-09-07 20:22 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Dave Chinner, Mike Snitzer, Christoph Lameter, Pekka Enberg,
	Andrew Morton, David Rientjes, Joonsoo Kim, dm-devel,
	Alasdair G Kergon, Joe Thornber, Mikulas Patocka, Vivek Goyal,
	Sami Tolvanen, Viresh Kumar, Heinz Mauelshagen, linux-mm, netdev

On Mon, Sep 7, 2015 at 2:30 AM, Jesper Dangaard Brouer
<brouer@redhat.com> wrote:
>
> The slub allocator have a faster "fastpath", if your workload is
> fast-reusing within the same per-cpu page-slab, but once the workload
> increases you hit the slowpath, and then slab catches up. Slub looks
> great in micro-benchmarking.
>
> And with "slab_nomerge" I get even high performance:

I think those two are related.

Not merging means that effectively the percpu caches end up being
bigger (simply because there are more of them), and so it captures
more of the fastpath cases.

Obviously the percpu queue size is an easy tunable too, but there are
real downsides to that too. I suspect your IP forwarding case isn't so
different from some of the microbenchmarks, it just has more
outstanding work..

And yes, the slow path (ie not hitting in the percpu cache) of SLUB
could hopefully be optimizable too, although maybe the bulk patches
are the way to go (and unrelated to this thread - at least part of
your bulk patches actually got merged last Friday - they were part of
Andrew's patch-bomb).

            Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: slab-nomerge (was Re: [git pull] device mapper changes for 4.3)
  2015-09-07 20:22                 ` Linus Torvalds
  (?)
@ 2015-09-07 21:17                 ` Jesper Dangaard Brouer
  -1 siblings, 0 replies; 42+ messages in thread
From: Jesper Dangaard Brouer @ 2015-09-07 21:17 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Chinner, Mike Snitzer, Christoph Lameter, Pekka Enberg,
	Andrew Morton, David Rientjes, Joonsoo Kim, dm-devel,
	Alasdair G Kergon, Joe Thornber, Mikulas Patocka, Vivek Goyal,
	Sami Tolvanen, Viresh Kumar, Heinz Mauelshagen, linux-mm, netdev,
	brouer

On Mon, 7 Sep 2015 13:22:13 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Mon, Sep 7, 2015 at 2:30 AM, Jesper Dangaard Brouer
> <brouer@redhat.com> wrote:
> >
> > The slub allocator have a faster "fastpath", if your workload is
> > fast-reusing within the same per-cpu page-slab, but once the workload
> > increases you hit the slowpath, and then slab catches up. Slub looks
> > great in micro-benchmarking.
> >
> > And with "slab_nomerge" I get even high performance:
> 
> I think those two are related.
> 
> Not merging means that effectively the percpu caches end up being
> bigger (simply because there are more of them), and so it captures
> more of the fastpath cases.

Yes, that was also my theory.  As manually tuning the percpu sizes gave
me almost the same boost.


> Obviously the percpu queue size is an easy tunable too, but there are
> real downsides to that too. 

The easy fix is to introduce a subsystem specific percpu cache that is
large enough for our use-case.  That seems to be a trend. I'm hoping to
come up with something smarter that every subsystem can benefit from. 
E.g some heuristic that can dynamic adjust SLUB according to the usage
pattern. I can imagine something as simple as a counter for every
slowpath call, that is only valid as long as the jiffies count matches
(reset to zero, and store new jiffies cnt).  (But I have not thought
this through...)


> I suspect your IP forwarding case isn't so
> different from some of the microbenchmarks, it just has more
> outstanding work..

Yes, I will admit that my testing is very close to micro benchmarking,
and it is specifically designed to pressure the system to its limits[1].
Especially the minimum frame size is evil and unrealistic, but the real
purpose is preparing the stack for increasing speeds like 100Gbit/s.


> And yes, the slow path (ie not hitting in the percpu cache) of SLUB
> could hopefully be optimizable too, although maybe the bulk patches
> are the way to go (and unrelated to this thread - at least part of
> your bulk patches actually got merged last Friday - they were part of
> Andrew's patch-bomb).

Cool. Yes, it is only part of the bulk patches. The real performance
boosters are not in yet (but I need to make them work correctly with
memory debugging enabled before they can get merged).  At least the
main API is in, which allows me to implement use-case easier in other
subsystems :-)

[1] http://netoptimizer.blogspot.dk/2014/09/packet-per-sec-measurements-for.html
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: slab-nomerge (was Re: [git pull] device mapper changes for 4.3)
  2015-09-07  8:44                     ` Sergey Senozhatsky
@ 2015-09-08  0:22                       ` Sergey Senozhatsky
  0 siblings, 0 replies; 42+ messages in thread
From: Sergey Senozhatsky @ 2015-09-08  0:22 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Sergey Senozhatsky, Christoph Lameter, Jesper Dangaard Brouer,
	Dave Chinner, Mike Snitzer, Pekka Enberg, Andrew Morton,
	David Rientjes, Joonsoo Kim, dm-devel, Alasdair G Kergon,
	Joe Thornber, Mikulas Patocka, Vivek Goyal, Sami Tolvanen,
	Viresh Kumar, Heinz Mauelshagen, linux-mm, Sergey Senozhatsky

On (09/07/15 17:44), Sergey Senozhatsky wrote:
[...]
> Oh, that's a good idea. I didn't use tools/testing/ktest/, it's a bit too
> massive for my toy script. I have some modifications to slabinfo and a rather
> ugly script to parse files and feed them to gnuplot (and yes, I use gnuplot
> for plotting). slabinfo patches are not entirely dumb and close to being ready
> (well.. except that I need to clean up all those %6s sprintfs that worked fine
> for dynamically scalled sizes and do not work so nicely for sizes in bytes). I
> can send them out later. Less sure about the script (bash) tho. In a nutshell
> it's just a number of
>      grep | awk > FOO; gnuplot ... FOO
> 
> So I'll finish some plotting improvements first (not ready yet) and then
> I'll take a look how quickly I can land it (rewrite in perl) in
> tools/testing/ktest/.

Hi,

uploaded my scripts to
https://github.com/sergey-senozhatsky/slabinfo

A set of very simple bash scripts. The README file contains
some sort of documentation and a 'tutorial'.

==================================================================
To start collecting samples, record file name is NOMERGE, note sudo

sudo ./slabinfo-plotter.sh -r NOMERGE

#^C or reboot

pre-process records file for gnuplot

./slabinfo-plotter.sh -p NOMERGE -b gnuplot
File gnuplot_slabs-by-loss-NOMERGE
File gnuplot_slabs-by-size-NOMERGE
File gnuplot_totals-NOMERGE

generate grphs from 'slabinfo totals'

./gnuplot-totals.sh -f gnuplot_totals-NOMERGE


Graph file name -- gnuplot_totals-NOMERGE.png
...

==================================================================


Two things:
-- it wants a patched version of slabinfo (some sort of patches are in
   kernel_patches/ dir)
-- it wants slabinfo to be in PATH


For now on it does what it does -- captures numbers and picks only ones
that are interesting to me and generates plots.


I'm doing this in my spare time, but I'm surely accepting improvement
requests/ideas, pull requests, and everything that follows.


Will play around with the scripts for some time to make sure they
are usable and then we can decide if there is a place for something
like this in the kernel or it's better be done somehow differently.

	-ss

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2015-09-08  0:21 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-09-02 23:13 slab-nomerge (was Re: [git pull] device mapper changes for 4.3) Linus Torvalds
2015-09-03  0:48 ` Andrew Morton
2015-09-03  0:53   ` Mike Snitzer
2015-09-03  0:51 ` Mike Snitzer
2015-09-03  0:51   ` Mike Snitzer
2015-09-03  1:21   ` Linus Torvalds
2015-09-03  2:31     ` Mike Snitzer
2015-09-03  3:10       ` Christoph Lameter
2015-09-03  4:55         ` Andrew Morton
2015-09-03  6:09           ` Pekka Enberg
2015-09-03  8:53             ` Dave Chinner
2015-09-03  3:11       ` Linus Torvalds
2015-09-03  6:02     ` Dave Chinner
2015-09-03  6:13       ` Pekka Enberg
2015-09-03 10:29       ` Jesper Dangaard Brouer
2015-09-03 16:19         ` Christoph Lameter
2015-09-04  9:10           ` Jesper Dangaard Brouer
2015-09-04 14:13             ` Christoph Lameter
2015-09-04  6:35         ` Sergey Senozhatsky
2015-09-04  7:01           ` Linus Torvalds
2015-09-04  7:59             ` Sergey Senozhatsky
2015-09-04  9:56               ` Sergey Senozhatsky
2015-09-04 14:05               ` Christoph Lameter
2015-09-04 14:11               ` Linus Torvalds
2015-09-05  2:09                 ` Sergey Senozhatsky
2015-09-05  2:09                   ` Sergey Senozhatsky
2015-09-05 20:33                   ` Linus Torvalds
2015-09-07  8:44                     ` Sergey Senozhatsky
2015-09-08  0:22                       ` Sergey Senozhatsky
2015-09-03 15:02       ` Linus Torvalds
2015-09-04  3:26         ` Dave Chinner
2015-09-04  3:51           ` Linus Torvalds
2015-09-05  0:36             ` Dave Chinner
2015-09-05  0:36               ` Dave Chinner
2015-09-07  9:30             ` Jesper Dangaard Brouer
2015-09-07 20:22               ` Linus Torvalds
2015-09-07 20:22                 ` Linus Torvalds
2015-09-07 21:17                 ` Jesper Dangaard Brouer
2015-09-04 13:55           ` Christoph Lameter
2015-09-04 22:46             ` Dave Chinner
2015-09-05  0:25               ` Christoph Lameter
2015-09-05  1:16                 ` Dave Chinner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.