linux-kernel.vger.kernel.org archive mirror
* [PATCH] Expose sysctls for enabling slab/file_cache interleaving
@ 2013-11-19  0:50 Andi Kleen
  2013-11-19 10:42 ` Michal Hocko
  0 siblings, 1 reply; 8+ messages in thread
From: Andi Kleen @ 2013-11-19  0:50 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

cpusets have per-cpuset settings to enable NUMA node
interleaving for the slab cache or for the file cache.
These are quite useful, especially the setting for interleaving
the file page cache. This avoids the problem that some
program doing IO fills up a node completely and prevents
other programs from getting local memory. File IO
is often slow enough that the small NUMA differences do
not matter. In some cases doing the same for slab
is also useful.

This was always available using cpusets, but setting up
cpusets just for these two settings was always awkward
and complicated for many system administrators.

Add two sysctls that expose these settings directly.
When the sysctl is set it overrides the default choice
from cpusets. The default is still no interleaving,
so no defaults change.

One of the past SLES versions had a sysctl similar
to spread_file_cache (with a different name).

There is basically no new code; we just use the existing
cpuset hooks/code. Right now the sysctls are only
active when cpusets are compiled in, but that
could easily be relaxed by changing a few
ifdefs and moving the function that does it outside
cpuset.c.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 Documentation/sysctl/vm.txt | 16 ++++++++++++++++
 include/linux/cpuset.h      | 10 ++++++----
 include/linux/mm.h          |  2 ++
 kernel/sysctl.c             | 16 ++++++++++++++++
 mm/memory.c                 |  5 +++++
 5 files changed, 45 insertions(+), 4 deletions(-)

diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 1fbd4eb..4249fef 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -53,6 +53,8 @@ Currently, these files are in /proc/sys/vm:
 - panic_on_oom
 - percpu_pagelist_fraction
 - stat_interval
+- spread_file_cache
+- spread_slab
 - swappiness
 - user_reserve_kbytes
 - vfs_cache_pressure
@@ -680,6 +682,20 @@ is 1 second.
 
 ==============================================================
 
+spread_slab
+
+When not 0, interleave the slab cache over all NUMA nodes, instead of
+allocating on the current node. Only works for SLAB. Default 0.
+
+==============================================================
+
+spread_file_cache
+
+When not 0, interleave the file cache over all NUMA nodes, instead
+of following the policy of the current process. Default 0.
+
+==============================================================
+
 swappiness
 
 This control is used to define how aggressive the kernel will swap
diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
index cc1b01c..10966f5 100644
--- a/include/linux/cpuset.h
+++ b/include/linux/cpuset.h
@@ -72,12 +72,14 @@ extern int cpuset_slab_spread_node(void);
 
 static inline int cpuset_do_page_mem_spread(void)
 {
-	return current->flags & PF_SPREAD_PAGE;
+	return (current->flags & PF_SPREAD_PAGE) ||
+		sysctl_spread_file_cache;
 }
 
 static inline int cpuset_do_slab_mem_spread(void)
 {
-	return current->flags & PF_SPREAD_SLAB;
+	return (current->flags & PF_SPREAD_SLAB) || 
+		sysctl_spread_slab;
 }
 
 extern int current_cpuset_is_being_rebound(void);
@@ -195,12 +197,12 @@ static inline int cpuset_slab_spread_node(void)
 
 static inline int cpuset_do_page_mem_spread(void)
 {
-	return 0;
+	return sysctl_spread_file_cache;
 }
 
 static inline int cpuset_do_slab_mem_spread(void)
 {
-	return 0;
+	return sysctl_spread_slab;
 }
 
 static inline int current_cpuset_is_being_rebound(void)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 42a35d9..e26b26a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1921,5 +1921,7 @@ void __init setup_nr_node_ids(void);
 static inline void setup_nr_node_ids(void) {}
 #endif
 
+extern int sysctl_spread_slab, sysctl_spread_file_cache;
+
 #endif /* __KERNEL__ */
 #endif /* _LINUX_MM_H */
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index d37d9dd..7995ba6 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1375,6 +1375,22 @@ static struct ctl_table vm_table[] = {
 		.extra1		= &zero,
 		.extra2		= &one_hundred,
 	},
+#ifdef CONFIG_CPUSETS
+	{
+		.procname	= "spread_slab",
+		.data		= &sysctl_spread_slab,
+		.maxlen		= sizeof(sysctl_spread_slab),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+	},
+	{
+		.procname	= "spread_file_cache",
+		.data		= &sysctl_spread_file_cache,
+		.maxlen		= sizeof(sysctl_spread_file_cache),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+	},
+#endif
 #endif
 #ifdef CONFIG_SMP
 	{
diff --git a/mm/memory.c b/mm/memory.c
index bf86658..b6b8fcb 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -69,6 +69,11 @@
 
 #include "internal.h"
 
+#ifdef CONFIG_NUMA
+int __read_mostly sysctl_spread_file_cache;	/* Interleave file cache */ 
+int __read_mostly sysctl_spread_slab;		/* Interleave slab */ 
+#endif
+
 #ifdef LAST_CPUPID_NOT_IN_PAGE_FLAGS
 #warning Unfortunate NUMA and NUMA Balancing config, growing page-frame for last_cpupid.
 #endif
-- 
1.8.3.1



* Re: [PATCH] Expose sysctls for enabling slab/file_cache interleaving
  2013-11-19  0:50 [PATCH] Expose sysctls for enabling slab/file_cache interleaving Andi Kleen
@ 2013-11-19 10:42 ` Michal Hocko
  2013-11-19 18:42   ` Andi Kleen
  0 siblings, 1 reply; 8+ messages in thread
From: Michal Hocko @ 2013-11-19 10:42 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-mm, linux-kernel, Andi Kleen

On Mon 18-11-13 16:50:22, Andi Kleen wrote:
[...]
> diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
> index cc1b01c..10966f5 100644
> --- a/include/linux/cpuset.h
> +++ b/include/linux/cpuset.h
> @@ -72,12 +72,14 @@ extern int cpuset_slab_spread_node(void);
>  
>  static inline int cpuset_do_page_mem_spread(void)
>  {
> -	return current->flags & PF_SPREAD_PAGE;
> +	return (current->flags & PF_SPREAD_PAGE) ||
> +		sysctl_spread_file_cache;
>  }

But this might break applications that explicitly opt out from
spreading.

>  
>  static inline int cpuset_do_slab_mem_spread(void)
>  {
> -	return current->flags & PF_SPREAD_SLAB;
> +	return (current->flags & PF_SPREAD_SLAB) || 
> +		sysctl_spread_slab;
>  }
>  
>  extern int current_cpuset_is_being_rebound(void);
[...]
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH] Expose sysctls for enabling slab/file_cache interleaving
  2013-11-19 10:42 ` Michal Hocko
@ 2013-11-19 18:42   ` Andi Kleen
  2013-11-19 19:11     ` Michal Hocko
  0 siblings, 1 reply; 8+ messages in thread
From: Andi Kleen @ 2013-11-19 18:42 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Andi Kleen, linux-mm, linux-kernel, Andi Kleen

On Tue, Nov 19, 2013 at 11:42:03AM +0100, Michal Hocko wrote:
> On Mon 18-11-13 16:50:22, Andi Kleen wrote:
> [...]
> > diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
> > index cc1b01c..10966f5 100644
> > --- a/include/linux/cpuset.h
> > +++ b/include/linux/cpuset.h
> > @@ -72,12 +72,14 @@ extern int cpuset_slab_spread_node(void);
> >  
> >  static inline int cpuset_do_page_mem_spread(void)
> >  {
> > -	return current->flags & PF_SPREAD_PAGE;
> > +	return (current->flags & PF_SPREAD_PAGE) ||
> > +		sysctl_spread_file_cache;
> >  }
> 
> But this might break applications that explicitly opt out from
> spreading.

What do you mean? There's no such setting at the moment.

They can only enable it.

-Andi


* Re: [PATCH] Expose sysctls for enabling slab/file_cache interleaving
  2013-11-19 18:42   ` Andi Kleen
@ 2013-11-19 19:11     ` Michal Hocko
  2013-11-19 20:13       ` Andi Kleen
  0 siblings, 1 reply; 8+ messages in thread
From: Michal Hocko @ 2013-11-19 19:11 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-mm, linux-kernel, Andi Kleen

On Tue 19-11-13 19:42:00, Andi Kleen wrote:
> On Tue, Nov 19, 2013 at 11:42:03AM +0100, Michal Hocko wrote:
> > On Mon 18-11-13 16:50:22, Andi Kleen wrote:
> > [...]
> > > diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
> > > index cc1b01c..10966f5 100644
> > > --- a/include/linux/cpuset.h
> > > +++ b/include/linux/cpuset.h
> > > @@ -72,12 +72,14 @@ extern int cpuset_slab_spread_node(void);
> > >  
> > >  static inline int cpuset_do_page_mem_spread(void)
> > >  {
> > > -	return current->flags & PF_SPREAD_PAGE;
> > > +	return (current->flags & PF_SPREAD_PAGE) ||
> > > +		sysctl_spread_file_cache;
> > >  }
> > 
> > But this might break applications that explicitly opt out from
> > spreading.
> 
> What do you mean? There's no such setting at the moment.
> 
> They can only enable it.

cpuset_update_task_spread_flag allows disabling both flags. You can do
so, for example, via the cpuset cgroup controller.
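
The concern can be modelled in user space. The following is a minimal
sketch, not kernel code: model_task, MODEL_PF_SPREAD_PAGE,
model_update_spread_flag() and model_do_page_mem_spread() are hypothetical
stand-ins for the per-task flag handling and for the patched
cpuset_do_page_mem_spread(). It illustrates that once the flag is ORed
with the sysctl, clearing the per-task flag no longer turns spreading off.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the PF_SPREAD_PAGE bit in current->flags. */
#define MODEL_PF_SPREAD_PAGE 0x1u

/* Hypothetical stand-in for the task fields that matter here. */
struct model_task {
	unsigned int flags;
};

/* Models the global sysctl proposed in the patch (0 = off). */
static int model_sysctl_spread_file_cache;

/* Models a cpuset setting or clearing the per-task spread flag. */
static void model_update_spread_flag(struct model_task *t, bool spread)
{
	if (spread)
		t->flags |= MODEL_PF_SPREAD_PAGE;
	else
		t->flags &= ~MODEL_PF_SPREAD_PAGE;
}

/* Models the patched check: per-task flag OR global sysctl. */
static bool model_do_page_mem_spread(const struct model_task *t)
{
	return (t->flags & MODEL_PF_SPREAD_PAGE) ||
		model_sysctl_spread_file_cache;
}
```

With the model sysctl at 0, a task whose flag was cleared is not spread.
With the model sysctl at 1, the same task is spread regardless of the
flag, which is the opt-out case in question.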

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH] Expose sysctls for enabling slab/file_cache interleaving
  2013-11-19 19:11     ` Michal Hocko
@ 2013-11-19 20:13       ` Andi Kleen
  2013-11-19 21:21         ` Michal Hocko
  0 siblings, 1 reply; 8+ messages in thread
From: Andi Kleen @ 2013-11-19 20:13 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Andi Kleen, linux-mm, linux-kernel

On Tue, Nov 19, 2013 at 08:11:35PM +0100, Michal Hocko wrote:
> On Tue 19-11-13 19:42:00, Andi Kleen wrote:
> > On Tue, Nov 19, 2013 at 11:42:03AM +0100, Michal Hocko wrote:
> > > On Mon 18-11-13 16:50:22, Andi Kleen wrote:
> > > [...]
> > > > diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
> > > > index cc1b01c..10966f5 100644
> > > > --- a/include/linux/cpuset.h
> > > > +++ b/include/linux/cpuset.h
> > > > @@ -72,12 +72,14 @@ extern int cpuset_slab_spread_node(void);
> > > >  
> > > >  static inline int cpuset_do_page_mem_spread(void)
> > > >  {
> > > > -	return current->flags & PF_SPREAD_PAGE;
> > > > +	return (current->flags & PF_SPREAD_PAGE) ||
> > > > +		sysctl_spread_file_cache;
> > > >  }
> > > 
> > > But this might break applications that explicitly opt out from
> > > spreading.
> > 
> > What do you mean? There's no such setting at the moment.
> > 
> > They can only enable it.
> 
> cpuset_update_task_spread_flag allows disabling both flags. You can do
> so for example via cpuset cgroup controller.

Ok.

So you're saying it should look up the cpuset. I'm reluctant to do
that. It would make this path quite a bit more expensive.

Is it really a big problem to override that setting with
the global sysctl? Seems like sensible semantics to me.

-Andi


* Re: [PATCH] Expose sysctls for enabling slab/file_cache interleaving
  2013-11-19 20:13       ` Andi Kleen
@ 2013-11-19 21:21         ` Michal Hocko
  2013-11-19 21:49           ` Andi Kleen
  0 siblings, 1 reply; 8+ messages in thread
From: Michal Hocko @ 2013-11-19 21:21 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Andi Kleen, linux-mm, linux-kernel

On Tue 19-11-13 12:13:33, Andi Kleen wrote:
> On Tue, Nov 19, 2013 at 08:11:35PM +0100, Michal Hocko wrote:
> > On Tue 19-11-13 19:42:00, Andi Kleen wrote:
> > > On Tue, Nov 19, 2013 at 11:42:03AM +0100, Michal Hocko wrote:
> > > > On Mon 18-11-13 16:50:22, Andi Kleen wrote:
> > > > [...]
> > > > > diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
> > > > > index cc1b01c..10966f5 100644
> > > > > --- a/include/linux/cpuset.h
> > > > > +++ b/include/linux/cpuset.h
> > > > > @@ -72,12 +72,14 @@ extern int cpuset_slab_spread_node(void);
> > > > >  
> > > > >  static inline int cpuset_do_page_mem_spread(void)
> > > > >  {
> > > > > -	return current->flags & PF_SPREAD_PAGE;
> > > > > +	return (current->flags & PF_SPREAD_PAGE) ||
> > > > > +		sysctl_spread_file_cache;
> > > > >  }
> > > > 
> > > > But this might break applications that explicitly opt out from
> > > > spreading.
> > > 
> > > What do you mean? There's no such setting at the moment.
> > > 
> > > They can only enable it.
> > 
> > cpuset_update_task_spread_flag allows disabling both flags. You can do
> > so for example via cpuset cgroup controller.
> 
> Ok.
> 
> > So you're saying it should look up the cpuset. I'm reluctant to do
> > that. It would make this path quite a bit more expensive.

Another option would be to use sysctl values for the top cpuset as a
default. But then why not just do it manually without sysctl?
 
> Is it really a big problem to override that setting with
> the global sysctl? Seems like sensible semantics to me.

If you create a cpuset and explicitly disable spreading then you would
be quite surprised that your process gets pages from all nodes, no?
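
The "sysctl as default for the top cpuset" idea can be sketched as
follows. This is a user-space model under stated assumptions, not the
cpuset implementation: model_cpuset, model_top_cpuset,
model_sysctl_write(), model_cpuset_create() and model_cpuset_set_spread()
are hypothetical names, and the sketch assumes that a new cpuset starts
with its parent's setting.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced model of a cpuset. */
struct model_cpuset {
	struct model_cpuset *parent;	/* NULL for the top cpuset */
	bool spread_page;		/* models the file cache spread flag */
};

/* Models the top cpuset; zero-initialized, so spreading is off. */
static struct model_cpuset model_top_cpuset;

/* Models a sysctl write: it only changes the top cpuset's setting. */
static void model_sysctl_write(bool spread)
{
	model_top_cpuset.spread_page = spread;
}

/* Models cpuset creation: the child starts with its parent's setting. */
static void model_cpuset_create(struct model_cpuset *cs,
				struct model_cpuset *parent)
{
	cs->parent = parent;
	cs->spread_page = parent->spread_page;
}

/* Models an explicit per-cpuset write, which stays authoritative. */
static void model_cpuset_set_spread(struct model_cpuset *cs, bool spread)
{
	cs->spread_page = spread;
}
```

In this model an explicit disable in a child cpuset is never overridden,
and the allocation path does not need to consult a global variable. The
sketch does not propagate a later sysctl write to cpusets that already
exist.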

> 
> -Andi

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH] Expose sysctls for enabling slab/file_cache interleaving
  2013-11-19 21:21         ` Michal Hocko
@ 2013-11-19 21:49           ` Andi Kleen
  2013-11-20  5:49             ` KOSAKI Motohiro
  0 siblings, 1 reply; 8+ messages in thread
From: Andi Kleen @ 2013-11-19 21:49 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-mm, linux-kernel

Michal Hocko <mhocko@suse.cz> writes:
>
> Another option would be to use sysctl values for the top cpuset as a
> default. But then why not just do it manually without sysctl?

I want to provide an alternative to having to use cpusets to use this,
that is actually usable for normal people.

Also this is really a global setting in my mind.

> If you create a cpuset and explicitly disable spreading then you would
> be quite surprised that your process gets pages from all nodes, no?

If I enable it globally using a sysctl I would be quite surprised 
if some cpuset can override it.

That argument is equally valid :-)

The user configured an inconsistent configuration, and the kernel
has to make a decision somehow.

In the end it is arbitrary, but not having to check the cpuset
here is a lot cheaper, so I prefer the "sysctl has priority"
option.

We could return EINVAL for the cpuset setting when the sysctl is set,
though (but it's difficult to do it the other way round).
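
A minimal user-space sketch of that EINVAL idea, under stated
assumptions: model_cpuset and model_cpuset_write_spread() are
hypothetical names and this is not the cpuset write handler. It only
shows the rule "refuse to disable spreading in a cpuset while the global
sysctl is on".

```c
#include <errno.h>
#include <stdbool.h>

/* Models the global sysctl proposed in the patch (0 = off). */
static int model_sysctl_spread_file_cache;

/* Hypothetical, heavily reduced model of a cpuset. */
struct model_cpuset {
	bool spread_page;
};

/*
 * Models a write to the per-cpuset spread setting.
 * Returns 0 on success, or -EINVAL when the caller tries to disable
 * spreading while the global sysctl forces it on.
 */
static int model_cpuset_write_spread(struct model_cpuset *cs, bool spread)
{
	if (!spread && model_sysctl_spread_file_cache)
		return -EINVAL;

	cs->spread_page = spread;
	return 0;
}
```

A rejected write leaves the cpuset unchanged, so the inconsistent
configuration is reported to the administrator instead of being resolved
silently.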

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only


* Re: [PATCH] Expose sysctls for enabling slab/file_cache interleaving
  2013-11-19 21:49           ` Andi Kleen
@ 2013-11-20  5:49             ` KOSAKI Motohiro
  0 siblings, 0 replies; 8+ messages in thread
From: KOSAKI Motohiro @ 2013-11-20  5:49 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Michal Hocko, linux-mm, LKML

On Tue, Nov 19, 2013 at 4:49 PM, Andi Kleen <andi@firstfloor.org> wrote:
> Michal Hocko <mhocko@suse.cz> writes:
>>
>> Another option would be to use sysctl values for the top cpuset as a
>> default. But then why not just do it manually without sysctl?
>
> I want to provide an alternative to having to use cpusets to use this,
> that is actually usable for normal people.
>
> Also this is really a global setting in my mind.
>
>> If you create a cpuset and explicitly disable spreading then you would
>> be quite surprised that your process gets pages from all nodes, no?
>
> If I enable it globally using a sysctl I would be quite surprised
> if some cpuset can override it.
>
> That argument is equally valid :-)
>
> The user configured an inconsistent configuration, and the kernel
> has to make a decision somehow.
>
> In the end it is arbitrary, but not having to check the cpuset
> here is a lot cheaper, so I prefer the "sysctl has priority"
> option.

Sorry, I agree with Michal. If there is a large-scope knob and a
small-scope knob, the small-scope one should have higher priority. That is
one of the best practices of interface design.

However, I fully agree with the basic concept of this patch. A sysctl helps
a lot of admins.
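
The "small scope wins" rule can be sketched as a three-state setting.
This is a user-space model under stated assumptions: the existing cpuset
flags discussed in this thread are on/off only, so the
MODEL_SPREAD_INHERIT state, the enum and model_effective_spread() are
hypothetical and only illustrate the precedence rule.

```c
#include <stdbool.h>

/* Hypothetical three-state per-cpuset setting. */
enum model_spread {
	MODEL_SPREAD_INHERIT,	/* nothing set here: use the global sysctl */
	MODEL_SPREAD_OFF,	/* explicitly disabled for this cpuset */
	MODEL_SPREAD_ON,	/* explicitly enabled for this cpuset */
};

/* Models the global sysctl proposed in the patch (0 = off). */
static int model_sysctl_spread_file_cache;

/* The narrower (per-cpuset) knob wins whenever it was set explicitly. */
static bool model_effective_spread(enum model_spread cpuset_setting)
{
	switch (cpuset_setting) {
	case MODEL_SPREAD_ON:
		return true;
	case MODEL_SPREAD_OFF:
		return false;
	default:
		return model_sysctl_spread_file_cache != 0;
	}
}
```

If the result were computed when either knob changes and cached in the
per-task flag, the allocation path could stay a single flag test; that
caching step is not part of this sketch.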

