linux-kernel.vger.kernel.org archive mirror
* [PATCH] mm: Fix comment for NODEMASK_ALLOC
@ 2018-08-20  8:55 Oscar Salvador
  2018-08-20 21:24 ` Andrew Morton
  0 siblings, 1 reply; 8+ messages in thread
From: Oscar Salvador @ 2018-08-20  8:55 UTC (permalink / raw)
  To: akpm
  Cc: tglx, joe, arnd, mhocko, gregkh, linux-kernel, linux-mm, Oscar Salvador

From: Oscar Salvador <osalvador@suse.de>

Currently, NODEMASK_ALLOC allocates a nodemask_t with kmalloc when
NODES_SHIFT is higher than 8; otherwise it declares one on the stack.

The comment claims that nodemask_t is larger than 256 bytes when
NODES_SHIFT is higher than 8, but this is not true: NODES_SHIFT = 9
gives us a 64-byte nodemask_t. Let us fix up the comment accordingly.

Another thing is that it might make sense to let values lower than
128 bytes be allocated on the stack. Although this all depends on the
depth of the stack (and this changes from function to function), 64
bytes is something we can easily afford. So we could even bump the
limit by 1 (from > 8 to > 9).

Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
 include/linux/nodemask.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/nodemask.h b/include/linux/nodemask.h
index 1fbde8a880d9..5a30ad594ccc 100644
--- a/include/linux/nodemask.h
+++ b/include/linux/nodemask.h
@@ -518,7 +518,7 @@ static inline int node_random(const nodemask_t *mask)
  * NODEMASK_ALLOC(type, name) allocates an object with a specified type and
  * name.
  */
-#if NODES_SHIFT > 8 /* nodemask_t > 256 bytes */
+#if NODES_SHIFT > 8 /* nodemask_t > 32 bytes */
 #define NODEMASK_ALLOC(type, name, gfp_flags)	\
 			type *name = kmalloc(sizeof(*name), gfp_flags)
 #define NODEMASK_FREE(m)			kfree(m)
-- 
2.13.6



* Re: [PATCH] mm: Fix comment for NODEMASK_ALLOC
  2018-08-20  8:55 [PATCH] mm: Fix comment for NODEMASK_ALLOC Oscar Salvador
@ 2018-08-20 21:24 ` Andrew Morton
  2018-08-21 12:17   ` Michal Hocko
  0 siblings, 1 reply; 8+ messages in thread
From: Andrew Morton @ 2018-08-20 21:24 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: tglx, joe, arnd, mhocko, gregkh, linux-kernel, linux-mm, Oscar Salvador

On Mon, 20 Aug 2018 10:55:16 +0200 Oscar Salvador <osalvador@techadventures.net> wrote:

> From: Oscar Salvador <osalvador@suse.de>
> 
> Currently, NODEMASK_ALLOC allocates a nodemask_t with kmalloc when
> NODES_SHIFT is higher than 8; otherwise it declares one on the stack.
> 
> The comment claims that nodemask_t is larger than 256 bytes when
> NODES_SHIFT is higher than 8, but this is not true: NODES_SHIFT = 9
> gives us a 64-byte nodemask_t. Let us fix up the comment accordingly.
> 
> Another thing is that it might make sense to let values lower than
> 128 bytes be allocated on the stack. Although this all depends on the
> depth of the stack (and this changes from function to function), 64
> bytes is something we can easily afford. So we could even bump the
> limit by 1 (from > 8 to > 9).
> 

I agree.  Such a change will reduce the amount of testing which the
kmalloc version receives, but I assume there are enough people out
there testing with large NODES_SHIFT values.

And while we're looking at this, it would be nice to make NODES_SHIFT
go away.  Ensure that CONFIG_NODES_SHIFT always has a setting and use
that directly.




* Re: [PATCH] mm: Fix comment for NODEMASK_ALLOC
  2018-08-20 21:24 ` Andrew Morton
@ 2018-08-21 12:17   ` Michal Hocko
  2018-08-21 12:30     ` Oscar Salvador
  0 siblings, 1 reply; 8+ messages in thread
From: Michal Hocko @ 2018-08-21 12:17 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Oscar Salvador, tglx, joe, arnd, gregkh, linux-kernel, linux-mm,
	Oscar Salvador

On Mon 20-08-18 14:24:40, Andrew Morton wrote:
> On Mon, 20 Aug 2018 10:55:16 +0200 Oscar Salvador <osalvador@techadventures.net> wrote:
> 
> > From: Oscar Salvador <osalvador@suse.de>
> > 
> > Currently, NODEMASK_ALLOC allocates a nodemask_t with kmalloc when
> > NODES_SHIFT is higher than 8; otherwise it declares one on the stack.
> > 
> > The comment claims that nodemask_t is larger than 256 bytes when
> > NODES_SHIFT is higher than 8, but this is not true: NODES_SHIFT = 9
> > gives us a 64-byte nodemask_t. Let us fix up the comment accordingly.
> > 
> > Another thing is that it might make sense to let values lower than
> > 128 bytes be allocated on the stack. Although this all depends on the
> > depth of the stack (and this changes from function to function), 64
> > bytes is something we can easily afford. So we could even bump the
> > limit by 1 (from > 8 to > 9).
> > 
> 
> I agree.  Such a change will reduce the amount of testing which the
> kmalloc version receives, but I assume there are enough people out
> there testing with large NODES_SHIFT values.

We have had CONFIG_NODES_SHIFT=10 in our SLES kernels for quite some
time (since around SLE11-SP3, AFAICS).

Anyway, isn't NODEMASK_ALLOC over-engineered a bit? Do we actually
ever do larger than 1024 NUMA nodes? That would be 128B, and from a
quick glance it seems that none of those functions are called in deep
stacks. I haven't gone through all of them, but a patch which checks
them all and removes NODEMASK_ALLOC would be quite nice IMHO.

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH] mm: Fix comment for NODEMASK_ALLOC
  2018-08-21 12:17   ` Michal Hocko
@ 2018-08-21 12:30     ` Oscar Salvador
  2018-08-21 12:51       ` Michal Hocko
  2018-08-21 20:51       ` Andrew Morton
  0 siblings, 2 replies; 8+ messages in thread
From: Oscar Salvador @ 2018-08-21 12:30 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, tglx, joe, arnd, gregkh, linux-kernel, linux-mm,
	Oscar Salvador

On Tue, Aug 21, 2018 at 02:17:34PM +0200, Michal Hocko wrote:
> We have had CONFIG_NODES_SHIFT=10 in our SLES kernels for quite some
> time (since around SLE11-SP3, AFAICS).
> 
> Anyway, isn't NODEMASK_ALLOC over-engineered a bit? Do we actually
> ever do larger than 1024 NUMA nodes? That would be 128B, and from a
> quick glance it seems that none of those functions are called in deep
> stacks. I haven't gone through all of them, but a patch which checks
> them all and removes NODEMASK_ALLOC would be quite nice IMHO.

No, the maximum we can get is 1024 NUMA nodes.
I checked this when writing another patch [1]; having gone through
all archs' Kconfigs, CONFIG_NODES_SHIFT=10 is the limit.

NODEMASK_ALLOC only gets called from:

- unregister_mem_sect_under_nodes() (not anymore after [1])
- __nr_hugepages_store_common() (this does not seem to have a deep
  stack, so we could use a normal nodemask_t)

But it is also used for NODEMASK_SCRATCH (mainly used for mempolicy):

struct nodemask_scratch {
	nodemask_t	mask1;
	nodemask_t	mask2;
};

That would make 256 bytes in case CONFIG_NODES_SHIFT=10.
I am not familiar with the mempolicy code, so I am not sure whether
we can do without it and achieve the same some other way.

[1] https://patchwork.kernel.org/patch/10566673/#22179663 

-- 
Oscar Salvador
SUSE L3


* Re: [PATCH] mm: Fix comment for NODEMASK_ALLOC
  2018-08-21 12:30     ` Oscar Salvador
@ 2018-08-21 12:51       ` Michal Hocko
  2018-08-21 12:58         ` Oscar Salvador
  2018-08-21 20:51       ` Andrew Morton
  1 sibling, 1 reply; 8+ messages in thread
From: Michal Hocko @ 2018-08-21 12:51 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: Andrew Morton, tglx, joe, arnd, gregkh, linux-kernel, linux-mm,
	Oscar Salvador

On Tue 21-08-18 14:30:24, Oscar Salvador wrote:
> On Tue, Aug 21, 2018 at 02:17:34PM +0200, Michal Hocko wrote:
> > We have had CONFIG_NODES_SHIFT=10 in our SLES kernels for quite some
> > time (since around SLE11-SP3, AFAICS).
> > 
> > Anyway, isn't NODEMASK_ALLOC over-engineered a bit? Do we actually
> > ever do larger than 1024 NUMA nodes? That would be 128B, and from a
> > quick glance it seems that none of those functions are called in deep
> > stacks. I haven't gone through all of them, but a patch which checks
> > them all and removes NODEMASK_ALLOC would be quite nice IMHO.
> 
> No, the maximum we can get is 1024 NUMA nodes.
> I checked this when writing another patch [1]; having gone through
> all archs' Kconfigs, CONFIG_NODES_SHIFT=10 is the limit.
> 
> NODEMASK_ALLOC only gets called from:
> 
> - unregister_mem_sect_under_nodes() (not anymore after [1])
> - __nr_hugepages_store_common() (this does not seem to have a deep
>   stack, so we could use a normal nodemask_t)
> 
> But it is also used for NODEMASK_SCRATCH (mainly used for mempolicy):

The mempolicy code should have a shallow stack as well; it is mostly
just the syscall entry.

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH] mm: Fix comment for NODEMASK_ALLOC
  2018-08-21 12:51       ` Michal Hocko
@ 2018-08-21 12:58         ` Oscar Salvador
  0 siblings, 0 replies; 8+ messages in thread
From: Oscar Salvador @ 2018-08-21 12:58 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, tglx, joe, arnd, gregkh, linux-kernel, linux-mm,
	Oscar Salvador

On Tue, Aug 21, 2018 at 02:51:56PM +0200, Michal Hocko wrote:
> On Tue 21-08-18 14:30:24, Oscar Salvador wrote:
> > On Tue, Aug 21, 2018 at 02:17:34PM +0200, Michal Hocko wrote:
> > > We have had CONFIG_NODES_SHIFT=10 in our SLES kernels for quite some
> > > time (since around SLE11-SP3, AFAICS).
> > > 
> > > Anyway, isn't NODEMASK_ALLOC over-engineered a bit? Do we actually
> > > ever do larger than 1024 NUMA nodes? That would be 128B, and from a
> > > quick glance it seems that none of those functions are called in deep
> > > stacks. I haven't gone through all of them, but a patch which checks
> > > them all and removes NODEMASK_ALLOC would be quite nice IMHO.
> > 
> > No, the maximum we can get is 1024 NUMA nodes.
> > I checked this when writing another patch [1]; having gone through
> > all archs' Kconfigs, CONFIG_NODES_SHIFT=10 is the limit.
> > 
> > NODEMASK_ALLOC only gets called from:
> > 
> > - unregister_mem_sect_under_nodes() (not anymore after [1])
> > - __nr_hugepages_store_common() (this does not seem to have a deep
> >   stack, so we could use a normal nodemask_t)
> > 
> > But it is also used for NODEMASK_SCRATCH (mainly used for mempolicy):
> 
> The mempolicy code should have a shallow stack as well; it is mostly
> just the syscall entry.

Ok, then I could give it a try and see if we can get rid of NODEMASK_ALLOC in there
as well.

-- 
Oscar Salvador
SUSE L3


* Re: [PATCH] mm: Fix comment for NODEMASK_ALLOC
  2018-08-21 12:30     ` Oscar Salvador
  2018-08-21 12:51       ` Michal Hocko
@ 2018-08-21 20:51       ` Andrew Morton
  2018-08-23 10:51         ` Oscar Salvador
  1 sibling, 1 reply; 8+ messages in thread
From: Andrew Morton @ 2018-08-21 20:51 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: Michal Hocko, tglx, joe, arnd, gregkh, linux-kernel, linux-mm,
	Oscar Salvador

On Tue, 21 Aug 2018 14:30:24 +0200 Oscar Salvador <osalvador@techadventures.net> wrote:

> On Tue, Aug 21, 2018 at 02:17:34PM +0200, Michal Hocko wrote:
> > We have had CONFIG_NODES_SHIFT=10 in our SLES kernels for quite some
> > time (since around SLE11-SP3, AFAICS).
> > 
> > Anyway, isn't NODEMASK_ALLOC over-engineered a bit? Do we actually
> > ever do larger than 1024 NUMA nodes? That would be 128B, and from a
> > quick glance it seems that none of those functions are called in deep
> > stacks. I haven't gone through all of them, but a patch which checks
> > them all and removes NODEMASK_ALLOC would be quite nice IMHO.
> 
> No, the maximum we can get is 1024 NUMA nodes.
> I checked this when writing another patch [1]; having gone through
> all archs' Kconfigs, CONFIG_NODES_SHIFT=10 is the limit.
> 
> NODEMASK_ALLOC only gets called from:
> 
> - unregister_mem_sect_under_nodes() (not anymore after [1])
> - __nr_hugepages_store_common() (this does not seem to have a deep
>   stack, so we could use a normal nodemask_t)
> 
> But it is also used for NODEMASK_SCRATCH (mainly used for mempolicy):
> 
> struct nodemask_scratch {
> 	nodemask_t	mask1;
> 	nodemask_t	mask2;
> };
> 
> That would make 256 bytes in case CONFIG_NODES_SHIFT=10.

And that sole site could use an open-coded kmalloc.




* Re: [PATCH] mm: Fix comment for NODEMASK_ALLOC
  2018-08-21 20:51       ` Andrew Morton
@ 2018-08-23 10:51         ` Oscar Salvador
  0 siblings, 0 replies; 8+ messages in thread
From: Oscar Salvador @ 2018-08-23 10:51 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Michal Hocko, tglx, joe, arnd, gregkh, linux-kernel, linux-mm,
	Oscar Salvador

On Tue, Aug 21, 2018 at 01:51:59PM -0700, Andrew Morton wrote:
> On Tue, 21 Aug 2018 14:30:24 +0200 Oscar Salvador <osalvador@techadventures.net> wrote:
> 
> > On Tue, Aug 21, 2018 at 02:17:34PM +0200, Michal Hocko wrote:
> > > We have had CONFIG_NODES_SHIFT=10 in our SLES kernels for quite some
> > > time (since around SLE11-SP3, AFAICS).
> > > 
> > > Anyway, isn't NODEMASK_ALLOC over-engineered a bit? Do we actually
> > > ever do larger than 1024 NUMA nodes? That would be 128B, and from a
> > > quick glance it seems that none of those functions are called in deep
> > > stacks. I haven't gone through all of them, but a patch which checks
> > > them all and removes NODEMASK_ALLOC would be quite nice IMHO.
> > 
> > No, the maximum we can get is 1024 NUMA nodes.
> > I checked this when writing another patch [1]; having gone through
> > all archs' Kconfigs, CONFIG_NODES_SHIFT=10 is the limit.
> > 
> > NODEMASK_ALLOC only gets called from:
> > 
> > - unregister_mem_sect_under_nodes() (not anymore after [1])
> > - __nr_hugepages_store_common() (this does not seem to have a deep
> >   stack, so we could use a normal nodemask_t)
> > 
> > But it is also used for NODEMASK_SCRATCH (mainly used for mempolicy):
> > 
> > struct nodemask_scratch {
> > 	nodemask_t	mask1;
> > 	nodemask_t	mask2;
> > };
> > 
> > That would make 256 bytes in case CONFIG_NODES_SHIFT=10.
> 
> And that sole site could use an open-coded kmalloc.

It is not really a single place, but four:

- do_set_mempolicy()
- do_mbind()
- kernel_migrate_pages()
- mpol_shared_policy_init()

They get called from:

- do_set_mempolicy()
	- From set_mempolicy syscall
	- From numa_policy_init()
	- From numa_default_policy()

	* None of the above look like they have a deep stack, so it
	  should be possible to get rid of NODEMASK_SCRATCH there.

- do_mbind
	- From mbind syscall

	* Should be feasible here as well.

- kernel_migrate_pages()

	- From migrate_pages syscall
	
	* Again, this should be doable.

- mpol_shared_policy_init()

	- From hugetlbfs_alloc_inode()
	- From shmem_get_inode()
	
	* Seems doable for hugetlbfs_alloc_inode() as well.
	  I only got to check hugetlbfs_alloc_inode(), because
	  shmem_get_inode() needs a closer look.


So it seems that this can be done in most places.
The only tricky function might be mpol_shared_policy_init(), because of
shmem_get_inode(). But in that case, we could use an open-coded kmalloc there.

Thanks
-- 
Oscar Salvador
SUSE L3

