All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] mm/compaction: Disable compact_unevictable_allowed on RT
@ 2020-01-15 16:10 Sebastian Andrzej Siewior
       [not found] ` <20200115161035.893221-1-bigeasy-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Sebastian Andrzej Siewior @ 2020-01-15 16:10 UTC (permalink / raw)
  To: linux-mm
  Cc: Thomas Gleixner, Vlastimil Babka, Andrew Morton,
	Luis Chamberlain, Kees Cook, Iurii Zaikin,
	Sebastian Andrzej Siewior

Since commit
    5bbe3547aa3ba ("mm: allow compaction of unevictable pages")

it is allowed to examine mlocked pages and compact them by default.
On -RT even minor pagefaults are problematic because it may take a few
100us to resolve them and until then the task is blocked.

Make compact_unevictable_allowed = 0 default and remove it from /proc on
RT.

Link: https://lore.kernel.org/linux-mm/20190710144138.qyn4tuttdq6h7kqx@linutronix.de/
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 kernel/sysctl.c | 3 ++-
 mm/compaction.c | 4 ++++
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 70665934d53e2..d08bd51a0fbc3 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1488,6 +1488,7 @@ static struct ctl_table vm_table[] = {
 		.extra1		= &min_extfrag_threshold,
 		.extra2		= &max_extfrag_threshold,
 	},
+#ifndef CONFIG_PREEMPT_RT
 	{
 		.procname	= "compact_unevictable_allowed",
 		.data		= &sysctl_compact_unevictable_allowed,
@@ -1497,7 +1498,7 @@ static struct ctl_table vm_table[] = {
 		.extra1		= SYSCTL_ZERO,
 		.extra2		= SYSCTL_ONE,
 	},
-
+#endif
 #endif /* CONFIG_COMPACTION */
 	{
 		.procname	= "min_free_kbytes",
diff --git a/mm/compaction.c b/mm/compaction.c
index 672d3c78c6abf..b2c804c35ae56 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1590,7 +1590,11 @@ typedef enum {
  * Allow userspace to control policy on scanning the unevictable LRU for
  * compactable pages.
  */
+#ifdef CONFIG_PREEMPT_RT
+#define sysctl_compact_unevictable_allowed 0
+#else
 int sysctl_compact_unevictable_allowed __read_mostly = 1;
+#endif
 
 static inline void
 update_fast_start_pfn(struct compact_control *cc, unsigned long pfn)
-- 
2.25.0.rc2



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH] mm/compaction: Disable compact_unevictable_allowed on RT
  2020-01-15 16:10 [PATCH] mm/compaction: Disable compact_unevictable_allowed on RT Sebastian Andrzej Siewior
@ 2020-01-15 22:04     ` Vlastimil Babka
  0 siblings, 0 replies; 19+ messages in thread
From: Vlastimil Babka @ 2020-01-15 22:04 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, linux-mm-Bw31MaZKKs3YtjvyW6yDsg
  Cc: Thomas Gleixner, Andrew Morton, Luis Chamberlain, Kees Cook,
	Iurii Zaikin, Mel Gorman, Linux API

On 1/15/2020 5:10 PM, Sebastian Andrzej Siewior wrote:
> Since commit
>     5bbe3547aa3ba ("mm: allow compaction of unevictable pages")
> 
> it is allowed to examine mlocked pages and compact them by default.
> On -RT even minor pagefaults are problematic because it may take a few
> 100us to resolve them and until then the task is blocked.

Fine, this makes sense on RT I guess. There might be some trade-off for
high-order allocation latencies though. We could perhaps migrate such mlocked
pages to pages allocated without __GFP_MOVABLE during the mlock() to at least
somewhat prevent them being scattered all over the zones. For MCL_FUTURE,
allocate them as unmovable from the beginning. But that can wait until issues
are reported.
I assume you have similar solution for NUMA balancing and whatever else can
cause minor faults?

> Make compact_unevictable_allowed = 0 default and remove it from /proc on
> RT.

Removing it is maybe going too far in terms of RT kernel differences confusing
users? Change the default sure, perhaps making it read-only, but removing?

> Link: https://lore.kernel.org/linux-mm/20190710144138.qyn4tuttdq6h7kqx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org/

In any case the sysctl Documentation/ should be updated? And perhaps also the
mlock manpage as you noted in the older thread above?

Thanks,
Vlastimil

> Signed-off-by: Sebastian Andrzej Siewior <bigeasy-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org>
> ---
>  kernel/sysctl.c | 3 ++-
>  mm/compaction.c | 4 ++++
>  2 files changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 70665934d53e2..d08bd51a0fbc3 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -1488,6 +1488,7 @@ static struct ctl_table vm_table[] = {
>  		.extra1		= &min_extfrag_threshold,
>  		.extra2		= &max_extfrag_threshold,
>  	},
> +#ifndef CONFIG_PREEMPT_RT
>  	{
>  		.procname	= "compact_unevictable_allowed",
>  		.data		= &sysctl_compact_unevictable_allowed,
> @@ -1497,7 +1498,7 @@ static struct ctl_table vm_table[] = {
>  		.extra1		= SYSCTL_ZERO,
>  		.extra2		= SYSCTL_ONE,
>  	},
> -
> +#endif
>  #endif /* CONFIG_COMPACTION */
>  	{
>  		.procname	= "min_free_kbytes",
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 672d3c78c6abf..b2c804c35ae56 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1590,7 +1590,11 @@ typedef enum {
>   * Allow userspace to control policy on scanning the unevictable LRU for
>   * compactable pages.
>   */
> +#ifdef CONFIG_PREEMPT_RT
> +#define sysctl_compact_unevictable_allowed 0
> +#else
>  int sysctl_compact_unevictable_allowed __read_mostly = 1;
> +#endif
>  
>  static inline void
>  update_fast_start_pfn(struct compact_control *cc, unsigned long pfn)
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] mm/compaction: Disable compact_unevictable_allowed on RT
@ 2020-01-15 22:04     ` Vlastimil Babka
  0 siblings, 0 replies; 19+ messages in thread
From: Vlastimil Babka @ 2020-01-15 22:04 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, linux-mm
  Cc: Thomas Gleixner, Andrew Morton, Luis Chamberlain, Kees Cook,
	Iurii Zaikin, Mel Gorman, Linux API

On 1/15/2020 5:10 PM, Sebastian Andrzej Siewior wrote:
> Since commit
>     5bbe3547aa3ba ("mm: allow compaction of unevictable pages")
> 
> it is allowed to examine mlocked pages and compact them by default.
> On -RT even minor pagefaults are problematic because it may take a few
> 100us to resolve them and until then the task is blocked.

Fine, this makes sense on RT I guess. There might be some trade-off for
high-order allocation latencies though. We could perhaps migrate such mlocked
pages to pages allocated without __GFP_MOVABLE during the mlock() to at least
somewhat prevent them being scattered all over the zones. For MCL_FUTURE,
allocate them as unmovable from the beginning. But that can wait until issues
are reported.
I assume you have similar solution for NUMA balancing and whatever else can
cause minor faults?

> Make compact_unevictable_allowed = 0 default and remove it from /proc on
> RT.

Removing it is maybe going too far in terms of RT kernel differences confusing
users? Change the default sure, perhaps making it read-only, but removing?

> Link: https://lore.kernel.org/linux-mm/20190710144138.qyn4tuttdq6h7kqx@linutronix.de/

In any case the sysctl Documentation/ should be updated? And perhaps also the
mlock manpage as you noted in the older thread above?

Thanks,
Vlastimil

> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
>  kernel/sysctl.c | 3 ++-
>  mm/compaction.c | 4 ++++
>  2 files changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 70665934d53e2..d08bd51a0fbc3 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -1488,6 +1488,7 @@ static struct ctl_table vm_table[] = {
>  		.extra1		= &min_extfrag_threshold,
>  		.extra2		= &max_extfrag_threshold,
>  	},
> +#ifndef CONFIG_PREEMPT_RT
>  	{
>  		.procname	= "compact_unevictable_allowed",
>  		.data		= &sysctl_compact_unevictable_allowed,
> @@ -1497,7 +1498,7 @@ static struct ctl_table vm_table[] = {
>  		.extra1		= SYSCTL_ZERO,
>  		.extra2		= SYSCTL_ONE,
>  	},
> -
> +#endif
>  #endif /* CONFIG_COMPACTION */
>  	{
>  		.procname	= "min_free_kbytes",
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 672d3c78c6abf..b2c804c35ae56 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1590,7 +1590,11 @@ typedef enum {
>   * Allow userspace to control policy on scanning the unevictable LRU for
>   * compactable pages.
>   */
> +#ifdef CONFIG_PREEMPT_RT
> +#define sysctl_compact_unevictable_allowed 0
> +#else
>  int sysctl_compact_unevictable_allowed __read_mostly = 1;
> +#endif
>  
>  static inline void
>  update_fast_start_pfn(struct compact_control *cc, unsigned long pfn)
> 



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] mm/compaction: Disable compact_unevictable_allowed on RT
  2020-01-15 22:04     ` Vlastimil Babka
  (?)
@ 2020-01-16 10:22     ` Sebastian Andrzej Siewior
  -1 siblings, 0 replies; 19+ messages in thread
From: Sebastian Andrzej Siewior @ 2020-01-16 10:22 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Thomas Gleixner, Andrew Morton, Luis Chamberlain,
	Kees Cook, Iurii Zaikin, Mel Gorman, Linux API

On 2020-01-15 23:04:19 [+0100], Vlastimil Babka wrote:
> On 1/15/2020 5:10 PM, Sebastian Andrzej Siewior wrote:
> > Since commit
> >     5bbe3547aa3ba ("mm: allow compaction of unevictable pages")
> > 
> > it is allowed to examine mlocked pages and compact them by default.
> > On -RT even minor pagefaults are problematic because it may take a few
> > 100us to resolve them and until then the task is blocked.
> 
> Fine, this makes sense on RT I guess. There might be some trade-off for
> high-order allocation latencies though. We could perhaps migrate such mlocked
> pages to pages allocated without __GFP_MOVABLE during the mlock() to at least
> somewhat prevent them being scattered all over the zones. For MCL_FUTURE,
> allocate them as unmovable from the beginning. But that can wait until issues
> are reported.
> I assume you have similar solution for NUMA balancing and whatever else can
> cause minor faults?

I've found this one while testing. Could you please point to the NUMA
balancing that might be an issue?

> > Make compact_unevictable_allowed = 0 default and remove it from /proc on
> > RT.
> 
> Removing it is maybe going too far in terms of RT kernel differences confusing
> users? Change the default sure, perhaps making it read-only, but removing?

Okay. I will make it RO then. 

> > Link: https://lore.kernel.org/linux-mm/20190710144138.qyn4tuttdq6h7kqx@linutronix.de/
> 
> In any case the sysctl Documentation/ should be updated? And perhaps also the
> mlock manpage as you noted in the older thread above?

Sure. Let me add the sysctl documentation to this patch and then I will
look into the manpage.

> Thanks,
> Vlastimil

Sebastian

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v2] mm/compaction: Disable compact_unevictable_allowed on RT
  2020-01-15 22:04     ` Vlastimil Babka
  (?)
  (?)
@ 2020-03-02 17:35     ` Sebastian Andrzej Siewior
  2020-03-02 21:25       ` Andrew Morton
  -1 siblings, 1 reply; 19+ messages in thread
From: Sebastian Andrzej Siewior @ 2020-03-02 17:35 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Thomas Gleixner, Andrew Morton, Luis Chamberlain,
	Kees Cook, Iurii Zaikin, Mel Gorman, Linux API

Since commit
    5bbe3547aa3ba ("mm: allow compaction of unevictable pages")

it is allowed to examine mlocked pages and compact them by default.
On -RT even minor pagefaults are problematic because it may take a few
100us to resolve them and until then the task is blocked.

Make compact_unevictable_allowed = 0 default and RO on RT.

Link: https://lore.kernel.org/linux-mm/20190710144138.qyn4tuttdq6h7kqx@linutronix.de/
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
v1…v2: - Make the proc file RO instead removing it.
       - Mention this change in Documentation/…/vm.rst.

 Documentation/admin-guide/sysctl/vm.rst | 1 +
 kernel/sysctl.c                         | 4 ++++
 mm/compaction.c                         | 4 ++++
 3 files changed, 9 insertions(+)

diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index 64aeee1009cab..bbfa59d25eec3 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -128,6 +128,7 @@ allowed to examine the unevictable lru (mlocked pages) for pages to compact.
 This should be used on systems where stalls for minor page faults are an
 acceptable trade for large contiguous free memory.  Set to 0 to prevent
 compaction from moving pages that are unevictable.  Default value is 1.
+On CONFIG_PREEMPT_RT the default value is 0.
 
 
 dirty_background_bytes
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index ad5b88a53c5a8..f113e31d0b0b6 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1483,7 +1483,11 @@ static struct ctl_table vm_table[] = {
 		.procname	= "compact_unevictable_allowed",
 		.data		= &sysctl_compact_unevictable_allowed,
 		.maxlen		= sizeof(int),
+#ifdef CONFIG_PREEMPT_RT
+		.mode		= 0444,
+#else
 		.mode		= 0644,
+#endif
 		.proc_handler	= proc_dointvec,
 		.extra1		= SYSCTL_ZERO,
 		.extra2		= SYSCTL_ONE,
diff --git a/mm/compaction.c b/mm/compaction.c
index 672d3c78c6abf..ba77809a1666e 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1590,7 +1590,11 @@ typedef enum {
  * Allow userspace to control policy on scanning the unevictable LRU for
  * compactable pages.
  */
+#ifdef CONFIG_PREEMPT_RT
+int sysctl_compact_unevictable_allowed __read_mostly = 0;
+#else
 int sysctl_compact_unevictable_allowed __read_mostly = 1;
+#endif
 
 static inline void
 update_fast_start_pfn(struct compact_control *cc, unsigned long pfn)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v2] mm/compaction: Disable compact_unevictable_allowed on RT
  2020-03-02 17:35     ` [PATCH v2] " Sebastian Andrzej Siewior
@ 2020-03-02 21:25       ` Andrew Morton
  2020-03-03 17:59         ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 19+ messages in thread
From: Andrew Morton @ 2020-03-02 21:25 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Vlastimil Babka, linux-mm, Thomas Gleixner, Luis Chamberlain,
	Kees Cook, Iurii Zaikin, Mel Gorman, Linux API

On Mon, 2 Mar 2020 18:35:16 +0100 Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:

> Since commit
>     5bbe3547aa3ba ("mm: allow compaction of unevictable pages")
> 
> it is allowed to examine mlocked pages and compact them by default.
> On -RT even minor pagefaults are problematic because it may take a few
> 100us to resolve them and until then the task is blocked.
> 
> Make compact_unevictable_allowed = 0 default and RO on RT.

hm, that's a bit sad but I guess it's tolerable.

> ...
>
> index 64aeee1009cab..bbfa59d25eec3 100644
> --- a/Documentation/admin-guide/sysctl/vm.rst
> +++ b/Documentation/admin-guide/sysctl/vm.rst
> @@ -128,6 +128,7 @@ allowed to examine the unevictable lru (mlocked pages) for pages to compact.
>  This should be used on systems where stalls for minor page faults are an
>  acceptable trade for large contiguous free memory.  Set to 0 to prevent
>  compaction from moving pages that are unevictable.  Default value is 1.
> +On CONFIG_PREEMPT_RT the default value is 0.

This doesn't mention that the file is unwritable on -rt, and it doesn't
explain *why* -rt has different behaviour.

> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -1483,7 +1483,11 @@ static struct ctl_table vm_table[] = {
>  		.procname	= "compact_unevictable_allowed",
>  		.data		= &sysctl_compact_unevictable_allowed,
>  		.maxlen		= sizeof(int),
> +#ifdef CONFIG_PREEMPT_RT
> +		.mode		= 0444,
> +#else
>  		.mode		= 0644,
> +#endif

This is non-backward-compatible and introduces a possibility that
tested-on-non-rt userspace will fail on -rt kernels.  It might be
better to accept the writes, but to ignore them.  Probably with a
pr_warn_once() to let people know what we did.

But do we really need to take the option away from -rt users?  Perhaps
someone wants this feature and can accept the latency hit.  How about
switching the default and otherwise leaving the kernel behaviour as-is
and simply emitting a warning letting -rt users know that they might
not want to enable this?


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v2] mm/compaction: Disable compact_unevictable_allowed on RT
  2020-03-02 21:25       ` Andrew Morton
@ 2020-03-03 17:59         ` Sebastian Andrzej Siewior
  2020-03-03 20:20           ` [PATCH 1/2] =?UTF-8?q?mm/compaction:=20Really=20limit=20compact?= =?UTF-8?q?=5Funevictable=5Fallowed=20to=200=E2=80=A61?= Sebastian Andrzej Siewior
  0 siblings, 1 reply; 19+ messages in thread
From: Sebastian Andrzej Siewior @ 2020-03-03 17:59 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vlastimil Babka, linux-mm, Thomas Gleixner, Luis Chamberlain,
	Kees Cook, Iurii Zaikin, Mel Gorman, Linux API

On 2020-03-02 13:25:31 [-0800], Andrew Morton wrote:
> > index 64aeee1009cab..bbfa59d25eec3 100644
> > --- a/Documentation/admin-guide/sysctl/vm.rst
> > +++ b/Documentation/admin-guide/sysctl/vm.rst
> > @@ -128,6 +128,7 @@ allowed to examine the unevictable lru (mlocked pages) for pages to compact.
> >  This should be used on systems where stalls for minor page faults are an
> >  acceptable trade for large contiguous free memory.  Set to 0 to prevent
> >  compaction from moving pages that are unevictable.  Default value is 1.
> > +On CONFIG_PREEMPT_RT the default value is 0.
> 
> This doesn't mention that the file is unwritable on -rt, and it doesn't
> explain *why* -rt has different behaviour.

I updated this bit.

> > --- a/kernel/sysctl.c
> > +++ b/kernel/sysctl.c
> > @@ -1483,7 +1483,11 @@ static struct ctl_table vm_table[] = {
> >  		.procname	= "compact_unevictable_allowed",
> >  		.data		= &sysctl_compact_unevictable_allowed,
> >  		.maxlen		= sizeof(int),
> > +#ifdef CONFIG_PREEMPT_RT
> > +		.mode		= 0444,
> > +#else
> >  		.mode		= 0644,
> > +#endif
> 
> This is non-backward-compatible and introduces a possibility that
> tested-on-non-rt userspace will fail on -rt kernels.  It might be
> better to accept the writes, but to ignore them.  Probably with a
> pr_warn_once() to let people know what we did.

Hmm.

> But do we really need to take the option away from -rt users?  Perhaps
> someone wants this feature and can accept the latency hit.  How about
> switching the default and otherwise leaving the kernel behaviour as-is
> and simply emitting a warning letting -rt users know that they might
> not want to enable this?

I don't think that RT people can live with the latency spike. The
problem is that it is not deterministic in terms *when* it happens and
*how*long* does it need to complete. Also it is not visible so you end
up with additional 100us and you have no idea why.
compaction is "okay" in the setup / configuration phase when the mlock()
pages aren't around / the RT task is disabled. So it does not disturb
the RT load.

Allowing the user to change the knob and spitting a warning is probably
good. So we have a preferred default and the user is aware if it is
changed with or without his knowledge.
Let me send a patch in a bit…

Sebastian

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/2] =?UTF-8?q?mm/compaction:=20Really=20limit=20compact?= =?UTF-8?q?=5Funevictable=5Fallowed=20to=200=E2=80=A61?=
  2020-03-03 17:59         ` Sebastian Andrzej Siewior
@ 2020-03-03 20:20           ` Sebastian Andrzej Siewior
  2020-03-03 20:22             ` [PATCH 2/2 v3] mm/compaction: Disable compact_unevictable_allowed on RT Sebastian Andrzej Siewior
  2020-03-04  8:12             ` [PATCH 1/2] mm/compaction: Really limit compact_unevictable_allowed to 0…1 Vlastimil Babka
  0 siblings, 2 replies; 19+ messages in thread
From: Sebastian Andrzej Siewior @ 2020-03-03 20:20 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vlastimil Babka, linux-mm, Thomas Gleixner, Luis Chamberlain,
	Kees Cook, Iurii Zaikin, Mel Gorman, Linux API

The proc file `compact_unevictable_allowed' should allow 0 and 1 only,
the `extra*' attribues have been set properly but without
proc_dointvec_minmax() as the `proc_handler' the limit will not be
enforced.

Use proc_dointvec_minmax() as the `proc_handler' to enfoce the valid
specified range.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 kernel/sysctl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index ad5b88a53c5a8..982203101f961 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1484,7 +1484,7 @@ static struct ctl_table vm_table[] = {
 		.data		= &sysctl_compact_unevictable_allowed,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
-		.proc_handler	= proc_dointvec,
+		.proc_handler	= proc_dointvec_minmax,
 		.extra1		= SYSCTL_ZERO,
 		.extra2		= SYSCTL_ONE,
 	},
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/2 v3] mm/compaction: Disable compact_unevictable_allowed on RT
  2020-03-03 20:20           ` [PATCH 1/2] =?UTF-8?q?mm/compaction:=20Really=20limit=20compact?= =?UTF-8?q?=5Funevictable=5Fallowed=20to=200=E2=80=A61?= Sebastian Andrzej Siewior
@ 2020-03-03 20:22             ` Sebastian Andrzej Siewior
  2020-03-03 23:56               ` Andrew Morton
                                 ` (2 more replies)
  2020-03-04  8:12             ` [PATCH 1/2] mm/compaction: Really limit compact_unevictable_allowed to 0…1 Vlastimil Babka
  1 sibling, 3 replies; 19+ messages in thread
From: Sebastian Andrzej Siewior @ 2020-03-03 20:22 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vlastimil Babka, linux-mm, Thomas Gleixner, Luis Chamberlain,
	Kees Cook, Iurii Zaikin, Mel Gorman, Linux API

Since commit
    5bbe3547aa3ba ("mm: allow compaction of unevictable pages")

it is allowed to examine mlocked pages and compact them by default.
On -RT even minor pagefaults are problematic because it may take a few
100us to resolve them and until then the task is blocked.

Make compact_unevictable_allowed = 0 default and issue a warning on RT
if it is changed.

Link: https://lore.kernel.org/linux-mm/20190710144138.qyn4tuttdq6h7kqx@linutronix.de/
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
v2…v3: - Allow to modify the value but issue a warning if it is changed.

v1…v2: - Make the proc file RO instead removing it.
       - Mention this change in Documentation/…/vm.rst.

 Documentation/admin-guide/sysctl/vm.rst |  3 +++
 kernel/sysctl.c                         | 27 ++++++++++++++++++++++++-
 mm/compaction.c                         |  4 ++++
 3 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index 64aeee1009cab..0329a4d3fa9ec 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -128,6 +128,9 @@ allowed to examine the unevictable lru (mlocked pages) for pages to compact.
 This should be used on systems where stalls for minor page faults are an
 acceptable trade for large contiguous free memory.  Set to 0 to prevent
 compaction from moving pages that are unevictable.  Default value is 1.
+On CONFIG_PREEMPT_RT the default value is 0 in order to avoid a page fault, due
+to compaction, which would block the task from becomming active until the fault
+is resolved.
 
 
 dirty_background_bytes
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 982203101f961..3ace90b6ac57f 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -212,6 +212,11 @@ static int proc_do_cad_pid(struct ctl_table *table, int write,
 		  void __user *buffer, size_t *lenp, loff_t *ppos);
 static int proc_taint(struct ctl_table *table, int write,
 			       void __user *buffer, size_t *lenp, loff_t *ppos);
+#ifdef CONFIG_COMPACTION
+static int proc_dointvec_warn_RT_change(struct ctl_table *table, int write,
+					void __user *buffer, size_t *lenp,
+					loff_t *ppos);
+#endif
 #endif
 
 #ifdef CONFIG_PRINTK
@@ -1484,7 +1489,7 @@ static struct ctl_table vm_table[] = {
 		.data		= &sysctl_compact_unevictable_allowed,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
-		.proc_handler	= proc_dointvec_minmax,
+		.proc_handler	= proc_dointvec_warn_RT_change,
 		.extra1		= SYSCTL_ZERO,
 		.extra2		= SYSCTL_ONE,
 	},
@@ -2572,6 +2577,26 @@ int proc_dointvec(struct ctl_table *table, int write,
 	return do_proc_dointvec(table, write, buffer, lenp, ppos, NULL, NULL);
 }
 
+#ifdef CONFIG_COMPACTION
+static int proc_dointvec_warn_RT_change(struct ctl_table *table, int write,
+					void __user *buffer, size_t *lenp,
+					loff_t *ppos)
+{
+	int ret, old;
+
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT) || !write)
+		return proc_dointvec(table, write, buffer, lenp, ppos);
+
+	old = *(int *)table->data;
+	ret = proc_dointvec(table, write, buffer, lenp, ppos);
+	if (ret)
+		return ret;
+	WARN_ONCE(old != *(int *)table->data, "sysctl attribute %s changed.",
+		  table->procname);
+	return ret;
+}
+#endif
+
 /**
  * proc_douintvec - read a vector of unsigned integers
  * @table: the sysctl table
diff --git a/mm/compaction.c b/mm/compaction.c
index 672d3c78c6abf..ba77809a1666e 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1590,7 +1590,11 @@ typedef enum {
  * Allow userspace to control policy on scanning the unevictable LRU for
  * compactable pages.
  */
+#ifdef CONFIG_PREEMPT_RT
+int sysctl_compact_unevictable_allowed __read_mostly = 0;
+#else
 int sysctl_compact_unevictable_allowed __read_mostly = 1;
+#endif
 
 static inline void
 update_fast_start_pfn(struct compact_control *cc, unsigned long pfn)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/2 v3] mm/compaction: Disable compact_unevictable_allowed on RT
  2020-03-03 20:22             ` [PATCH 2/2 v3] mm/compaction: Disable compact_unevictable_allowed on RT Sebastian Andrzej Siewior
@ 2020-03-03 23:56               ` Andrew Morton
  2020-03-04  8:19                 ` Vlastimil Babka
  2020-03-19 16:39                 ` [PATCH 2/2 v4] " Sebastian Andrzej Siewior
  2020-03-04  8:18               ` [PATCH 2/2 v3] " Vlastimil Babka
  2020-03-04  9:11               ` Mel Gorman
  2 siblings, 2 replies; 19+ messages in thread
From: Andrew Morton @ 2020-03-03 23:56 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Vlastimil Babka, linux-mm, Thomas Gleixner, Luis Chamberlain,
	Kees Cook, Iurii Zaikin, Mel Gorman, Linux API

On Tue, 3 Mar 2020 21:22:25 +0100 Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:

> Since commit
>     5bbe3547aa3ba ("mm: allow compaction of unevictable pages")
> 
> it is allowed to examine mlocked pages and compact them by default.
> On -RT even minor pagefaults are problematic because it may take a few
> 100us to resolve them and until then the task is blocked.
> 
> Make compact_unevictable_allowed = 0 default and issue a warning on RT
> if it is changed.

Fair enough, I guess.

> @@ -2572,6 +2577,26 @@ int proc_dointvec(struct ctl_table *table, int write,
>  	return do_proc_dointvec(table, write, buffer, lenp, ppos, NULL, NULL);
>  }
>  
> +#ifdef CONFIG_COMPACTION
> +static int proc_dointvec_warn_RT_change(struct ctl_table *table, int write,
> +					void __user *buffer, size_t *lenp,
> +					loff_t *ppos)
> +{
> +	int ret, old;
> +
> +	if (!IS_ENABLED(CONFIG_PREEMPT_RT) || !write)
> +		return proc_dointvec(table, write, buffer, lenp, ppos);
> +
> +	old = *(int *)table->data;
> +	ret = proc_dointvec(table, write, buffer, lenp, ppos);
> +	if (ret)
> +		return ret;
> +	WARN_ONCE(old != *(int *)table->data, "sysctl attribute %s changed.",
> +		  table->procname);

The WARN will include a stack trace which just isn't interesting.  A
pr_warn() would be better?

> +	return ret;
> +}
> +#endif


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/2] mm/compaction: Really limit compact_unevictable_allowed to 0…1
  2020-03-03 20:20           ` [PATCH 1/2] =?UTF-8?q?mm/compaction:=20Really=20limit=20compact?= =?UTF-8?q?=5Funevictable=5Fallowed=20to=200=E2=80=A61?= Sebastian Andrzej Siewior
  2020-03-03 20:22             ` [PATCH 2/2 v3] mm/compaction: Disable compact_unevictable_allowed on RT Sebastian Andrzej Siewior
@ 2020-03-04  8:12             ` Vlastimil Babka
  1 sibling, 0 replies; 19+ messages in thread
From: Vlastimil Babka @ 2020-03-04  8:12 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, Andrew Morton
  Cc: linux-mm, Thomas Gleixner, Luis Chamberlain, Kees Cook,
	Iurii Zaikin, Mel Gorman, Linux API

On 3/3/20 9:20 PM, Sebastian Andrzej Siewior wrote:
> The proc file `compact_unevictable_allowed' should allow 0 and 1 only,
> the `extra*' attribues have been set properly but without
> proc_dointvec_minmax() as the `proc_handler' the limit will not be
> enforced.
> 
> Use proc_dointvec_minmax() as the `proc_handler' to enfoce the valid
> specified range.
> 
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  kernel/sysctl.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index ad5b88a53c5a8..982203101f961 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -1484,7 +1484,7 @@ static struct ctl_table vm_table[] = {
>  		.data		= &sysctl_compact_unevictable_allowed,
>  		.maxlen		= sizeof(int),
>  		.mode		= 0644,
> -		.proc_handler	= proc_dointvec,
> +		.proc_handler	= proc_dointvec_minmax,
>  		.extra1		= SYSCTL_ZERO,
>  		.extra2		= SYSCTL_ONE,
>  	},
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/2 v3] mm/compaction: Disable compact_unevictable_allowed on RT
  2020-03-03 20:22             ` [PATCH 2/2 v3] mm/compaction: Disable compact_unevictable_allowed on RT Sebastian Andrzej Siewior
  2020-03-03 23:56               ` Andrew Morton
@ 2020-03-04  8:18               ` Vlastimil Babka
  2020-03-04  9:25                 ` Sebastian Andrzej Siewior
  2020-03-04  9:11               ` Mel Gorman
  2 siblings, 1 reply; 19+ messages in thread
From: Vlastimil Babka @ 2020-03-04  8:18 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, Andrew Morton
  Cc: linux-mm, Thomas Gleixner, Luis Chamberlain, Kees Cook,
	Iurii Zaikin, Mel Gorman, Linux API

On 3/3/20 9:22 PM, Sebastian Andrzej Siewior wrote:
> Since commit
>     5bbe3547aa3ba ("mm: allow compaction of unevictable pages")
> 
> it is allowed to examine mlocked pages and compact them by default.
> On -RT even minor pagefaults are problematic because it may take a few
> 100us to resolve them and until then the task is blocked.
> 
> Make compact_unevictable_allowed = 0 default and issue a warning on RT
> if it is changed.
> 
> Link: https://lore.kernel.org/linux-mm/20190710144138.qyn4tuttdq6h7kqx@linutronix.de/
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
> v2…v3: - Allow to modify the value but issue a warning if it is changed.
> 
> v1…v2: - Make the proc file RO instead removing it.
>        - Mention this change in Documentation/…/vm.rst.
> 
>  Documentation/admin-guide/sysctl/vm.rst |  3 +++
>  kernel/sysctl.c                         | 27 ++++++++++++++++++++++++-
>  mm/compaction.c                         |  4 ++++
>  3 files changed, 33 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
> index 64aeee1009cab..0329a4d3fa9ec 100644
> --- a/Documentation/admin-guide/sysctl/vm.rst
> +++ b/Documentation/admin-guide/sysctl/vm.rst
> @@ -128,6 +128,9 @@ allowed to examine the unevictable lru (mlocked pages) for pages to compact.
>  This should be used on systems where stalls for minor page faults are an
>  acceptable trade for large contiguous free memory.  Set to 0 to prevent
>  compaction from moving pages that are unevictable.  Default value is 1.
> +On CONFIG_PREEMPT_RT the default value is 0 in order to avoid a page fault, due
> +to compaction, which would block the task from becomming active until the fault
> +is resolved.
>  
>  
>  dirty_background_bytes
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 982203101f961..3ace90b6ac57f 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -212,6 +212,11 @@ static int proc_do_cad_pid(struct ctl_table *table, int write,
>  		  void __user *buffer, size_t *lenp, loff_t *ppos);
>  static int proc_taint(struct ctl_table *table, int write,
>  			       void __user *buffer, size_t *lenp, loff_t *ppos);
> +#ifdef CONFIG_COMPACTION
> +static int proc_dointvec_warn_RT_change(struct ctl_table *table, int write,
> +					void __user *buffer, size_t *lenp,
> +					loff_t *ppos);
> +#endif
>  #endif
>  
>  #ifdef CONFIG_PRINTK
> @@ -1484,7 +1489,7 @@ static struct ctl_table vm_table[] = {
>  		.data		= &sysctl_compact_unevictable_allowed,
>  		.maxlen		= sizeof(int),
>  		.mode		= 0644,
> -		.proc_handler	= proc_dointvec_minmax,
> +		.proc_handler	= proc_dointvec_warn_RT_change,
>  		.extra1		= SYSCTL_ZERO,
>  		.extra2		= SYSCTL_ONE,
>  	},
> @@ -2572,6 +2577,26 @@ int proc_dointvec(struct ctl_table *table, int write,
>  	return do_proc_dointvec(table, write, buffer, lenp, ppos, NULL, NULL);
>  }
>  
> +#ifdef CONFIG_COMPACTION
> +static int proc_dointvec_warn_RT_change(struct ctl_table *table, int write,
> +					void __user *buffer, size_t *lenp,
> +					loff_t *ppos)
> +{
> +	int ret, old;
> +
> +	if (!IS_ENABLED(CONFIG_PREEMPT_RT) || !write)
> +		return proc_dointvec(table, write, buffer, lenp, ppos);

Shouldn't you use her proc_dointvec_minmax() per Patch 1/2 ?

> +
> +	old = *(int *)table->data;
> +	ret = proc_dointvec(table, write, buffer, lenp, ppos);

And here.

> +	if (ret)
> +		return ret;
> +	WARN_ONCE(old != *(int *)table->data, "sysctl attribute %s changed.",
> +		  table->procname);
> +	return ret;
> +}
> +#endif
> +
>  /**
>   * proc_douintvec - read a vector of unsigned integers
>   * @table: the sysctl table
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 672d3c78c6abf..ba77809a1666e 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1590,7 +1590,11 @@ typedef enum {
>   * Allow userspace to control policy on scanning the unevictable LRU for
>   * compactable pages.
>   */
> +#ifdef CONFIG_PREEMPT_RT
> +int sysctl_compact_unevictable_allowed __read_mostly = 0;
> +#else
>  int sysctl_compact_unevictable_allowed __read_mostly = 1;
> +#endif
>  
>  static inline void
>  update_fast_start_pfn(struct compact_control *cc, unsigned long pfn)
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/2 v3] mm/compaction: Disable compact_unevictable_allowed on RT
  2020-03-03 23:56               ` Andrew Morton
@ 2020-03-04  8:19                 ` Vlastimil Babka
  2020-03-04  9:27                   ` Sebastian Andrzej Siewior
  2020-03-19 16:39                 ` [PATCH 2/2 v4] " Sebastian Andrzej Siewior
  1 sibling, 1 reply; 19+ messages in thread
From: Vlastimil Babka @ 2020-03-04  8:19 UTC (permalink / raw)
  To: Andrew Morton, Sebastian Andrzej Siewior
  Cc: linux-mm, Thomas Gleixner, Luis Chamberlain, Kees Cook,
	Iurii Zaikin, Mel Gorman, Linux API

On 3/4/20 12:56 AM, Andrew Morton wrote:
>> @@ -2572,6 +2577,26 @@ int proc_dointvec(struct ctl_table *table, int write,
>>  	return do_proc_dointvec(table, write, buffer, lenp, ppos, NULL, NULL);
>>  }
>>  
>> +#ifdef CONFIG_COMPACTION
>> +static int proc_dointvec_warn_RT_change(struct ctl_table *table, int write,
>> +					void __user *buffer, size_t *lenp,
>> +					loff_t *ppos)
>> +{
>> +	int ret, old;
>> +
>> +	if (!IS_ENABLED(CONFIG_PREEMPT_RT) || !write)
>> +		return proc_dointvec(table, write, buffer, lenp, ppos);
>> +
>> +	old = *(int *)table->data;
>> +	ret = proc_dointvec(table, write, buffer, lenp, ppos);
>> +	if (ret)
>> +		return ret;
>> +	WARN_ONCE(old != *(int *)table->data, "sysctl attribute %s changed.",
>> +		  table->procname);
> 
> The WARN will include a stack trace which just isn't interesting.  A
> pr_warn() would be better?

Yeah, the only interesting part of full WARN would possibly be, which process
changed it. That might be useful to print.

>> +	return ret;
>> +}
>> +#endif
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/2 v3] mm/compaction: Disable compact_unevictable_allowed on RT
  2020-03-03 20:22             ` [PATCH 2/2 v3] mm/compaction: Disable compact_unevictable_allowed on RT Sebastian Andrzej Siewior
  2020-03-03 23:56               ` Andrew Morton
  2020-03-04  8:18               ` [PATCH 2/2 v3] " Vlastimil Babka
@ 2020-03-04  9:11               ` Mel Gorman
  2 siblings, 0 replies; 19+ messages in thread
From: Mel Gorman @ 2020-03-04  9:11 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Andrew Morton, Vlastimil Babka, linux-mm, Thomas Gleixner,
	Luis Chamberlain, Kees Cook, Iurii Zaikin, Linux API

On Tue, Mar 03, 2020 at 09:22:25PM +0100, Sebastian Andrzej Siewior wrote:
> Since commit
>     5bbe3547aa3ba ("mm: allow compaction of unevictable pages")
> 
> it is allowed to examine mlocked pages and compact them by default.
> On -RT even minor pagefaults are problematic because it may take a few
> 100us to resolve them and until then the task is blocked.
> 
> Make compact_unevictable_allowed = 0 default and issue a warning on RT
> if it is changed.
> 
> Link: https://lore.kernel.org/linux-mm/20190710144138.qyn4tuttdq6h7kqx@linutronix.de/
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Acked-by: Mel Gorman <mgorman@techsingularity.net>

(caveat: I do not spend very much some on RT-specific topics)

While I ack'd this, an RT application using THP is playing with fire,
I know the RT extension for SLE explicitly disables it from being enabled
at kernel config time. At minimum the critical regions should be mlocked
followed by prctl to disable future THP faults that are non-deterministic,
both from an allocation point of view, and a TLB access point of view. It's
still reasonable to expect a smaller TLB reach for huge pages than
base pages.

It's a similar hazard with NUMA balancing, an RT application should either
disable balancing globally or set a memory policy that forces it to be
ignored. They should be doing this anyway to avoid non-deterministic
memory access costs due to NUMA artifacts but it wouldn't surprise me
if some applications got it wrong.  In that case, the SLE RT extension
disables balancing by default and it probably should warn if it's enabled
like this patch does.

It wouldn't surprise me to see patches like this in the future (completely
untested, illustrative only).

diff --git a/init/Kconfig b/init/Kconfig
index 452bc1835cd4..7a406e2b5580 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -797,7 +797,7 @@ config NUMA_BALANCING
 config NUMA_BALANCING_DEFAULT_ENABLED
 	bool "Automatically enable NUMA aware memory/task placement"
 	default y
-	depends on NUMA_BALANCING
+	depends on NUMA_BALANCING && !PREEMPT_RT
 	help
 	  If set, automatic NUMA balancing will be enabled if running on a NUMA
 	  machine.
diff --git a/mm/Kconfig b/mm/Kconfig
index ab80933be65f..313a5d794491 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -385,7 +385,7 @@ config TRANSPARENT_HUGEPAGE
 
 choice
 	prompt "Transparent Hugepage Support sysfs defaults"
-	depends on TRANSPARENT_HUGEPAGE
+	depends on TRANSPARENT_HUGEPAGE && !PREEMPT_RT
 	default TRANSPARENT_HUGEPAGE_ALWAYS
 	help
 	  Selects the sysfs defaults for Transparent Hugepage Support.

I would hope that a user of an RT kernel with an RT-aware application
would be aware of this anyway but ..... uhhhh.

Point for Andrew is that I would not be too surprised if there were more
RT-specific checks in the future that sanity checked some configuration
options in response to RT-specific bugs that were down to insane
configurations (be they kernel configs or sysctls)

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/2 v3] mm/compaction: Disable compact_unevictable_allowed on RT
  2020-03-04  8:18               ` [PATCH 2/2 v3] " Vlastimil Babka
@ 2020-03-04  9:25                 ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 19+ messages in thread
From: Sebastian Andrzej Siewior @ 2020-03-04  9:25 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-mm, Thomas Gleixner, Luis Chamberlain,
	Kees Cook, Iurii Zaikin, Mel Gorman, Linux API

On 2020-03-04 09:18:21 [+0100], Vlastimil Babka wrote:
> > @@ -2572,6 +2577,26 @@ int proc_dointvec(struct ctl_table *table, int write,
> >  	return do_proc_dointvec(table, write, buffer, lenp, ppos, NULL, NULL);
> >  }
> >  
> > +#ifdef CONFIG_COMPACTION
> > +static int proc_dointvec_warn_RT_change(struct ctl_table *table, int write,
> > +					void __user *buffer, size_t *lenp,
> > +					loff_t *ppos)
> > +{
> > +	int ret, old;
> > +
> > +	if (!IS_ENABLED(CONFIG_PREEMPT_RT) || !write)
> > +		return proc_dointvec(table, write, buffer, lenp, ppos);
> 
> Shouldn't you use her proc_dointvec_minmax() per Patch 1/2 ?
> 
> > +
> > +	old = *(int *)table->data;
> > +	ret = proc_dointvec(table, write, buffer, lenp, ppos);
> 
> And here.

Yes, thank you for noticing. It didn't make from editor to disk after
rebasing…

Sebastian

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/2 v3] mm/compaction: Disable compact_unevictable_allowed on RT
  2020-03-04  8:19                 ` Vlastimil Babka
@ 2020-03-04  9:27                   ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 19+ messages in thread
From: Sebastian Andrzej Siewior @ 2020-03-04  9:27 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-mm, Thomas Gleixner, Luis Chamberlain,
	Kees Cook, Iurii Zaikin, Mel Gorman, Linux API

On 2020-03-04 09:19:21 [+0100], Vlastimil Babka wrote:
> >> +	WARN_ONCE(old != *(int *)table->data, "sysctl attribute %s changed.",
> >> +		  table->procname);
> > 
> > The WARN will include a stack trace which just isn't interesting.  A
> > pr_warn() would be better?
> 
> Yeah, the only interesting part of full WARN would possibly be, which process
> changed it. That might be useful to print.

Yes, the stack trace and register dump isn't interesting. But as
Vlastimil says, the task and pid are informative. So if that is too much
I could extract those two informations and include it in a pr_warn().

Sebastian

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 2/2 v4] mm/compaction: Disable compact_unevictable_allowed on RT
  2020-03-03 23:56               ` Andrew Morton
  2020-03-04  8:19                 ` Vlastimil Babka
@ 2020-03-19 16:39                 ` Sebastian Andrzej Siewior
  2020-03-19 16:49                   ` Vlastimil Babka
  1 sibling, 1 reply; 19+ messages in thread
From: Sebastian Andrzej Siewior @ 2020-03-19 16:39 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Vlastimil Babka, linux-mm, Thomas Gleixner, Luis Chamberlain,
	Kees Cook, Iurii Zaikin, Mel Gorman, Linux API

Since commit
    5bbe3547aa3ba ("mm: allow compaction of unevictable pages")

it is allowed to examine mlocked pages and compact them by default.
On -RT even minor pagefaults are problematic because it may take a few
100us to resolve them and until then the task is blocked.

Make compact_unevictable_allowed = 0 default and issue a warning on RT
if it is changed.

Link: https://lore.kernel.org/linux-mm/20190710144138.qyn4tuttdq6h7kqx@linutronix.de/
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
v3…v4: - Replace WARN_ONCE() with pr_warn_once() and extend the message
         by name of the task and its pid.
       - Use proc_dointvec_minmax() in the !RT case.
       - Added Mel's Ack as per
	 20200304091159.GN3818@techsingularity.net.

v2…v3: - Allow to modify the value but issue a warning if it is changed.

v1…v2: - Make the proc file RO instead removing it.
       - Mention this change in Documentation/…/vm.rst.

 Documentation/admin-guide/sysctl/vm.rst |  3 +++
 kernel/sysctl.c                         | 29 ++++++++++++++++++++++++-
 mm/compaction.c                         |  4 ++++
 3 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index 64aeee1009cab..0329a4d3fa9ec 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -128,6 +128,9 @@ allowed to examine the unevictable lru (mlocked pages) for pages to compact.
 This should be used on systems where stalls for minor page faults are an
 acceptable trade for large contiguous free memory.  Set to 0 to prevent
 compaction from moving pages that are unevictable.  Default value is 1.
+On CONFIG_PREEMPT_RT the default value is 0 in order to avoid a page fault, due
+to compaction, which would block the task from becomming active until the fault
+is resolved.
 
 
 dirty_background_bytes
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 982203101f961..43eb9da0ea86c 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -212,6 +212,11 @@ static int proc_do_cad_pid(struct ctl_table *table, int write,
 		  void __user *buffer, size_t *lenp, loff_t *ppos);
 static int proc_taint(struct ctl_table *table, int write,
 			       void __user *buffer, size_t *lenp, loff_t *ppos);
+#ifdef CONFIG_COMPACTION
+static int proc_dointvec_minmax_warn_RT_change(struct ctl_table *table,
+					       int write, void __user *buffer,
+					       size_t *lenp, loff_t *ppos);
+#endif
 #endif
 
 #ifdef CONFIG_PRINTK
@@ -1484,7 +1489,7 @@ static struct ctl_table vm_table[] = {
 		.data		= &sysctl_compact_unevictable_allowed,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
-		.proc_handler	= proc_dointvec_minmax,
+		.proc_handler	= proc_dointvec_minmax_warn_RT_change,
 		.extra1		= SYSCTL_ZERO,
 		.extra2		= SYSCTL_ONE,
 	},
@@ -2572,6 +2577,28 @@ int proc_dointvec(struct ctl_table *table, int write,
 	return do_proc_dointvec(table, write, buffer, lenp, ppos, NULL, NULL);
 }
 
+#ifdef CONFIG_COMPACTION
+static int proc_dointvec_minmax_warn_RT_change(struct ctl_table *table,
+					       int write, void __user *buffer,
+					       size_t *lenp, loff_t *ppos)
+{
+	int ret, old;
+
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT) || !write)
+		return proc_dointvec_minmax(table, write, buffer, lenp, ppos);
+
+	old = *(int *)table->data;
+	ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
+	if (ret)
+		return ret;
+	if (old != *(int *)table->data)
+		pr_warn_once("sysctl attribute %s changed changed by %s[%d]\n",
+			     table->procname, current->comm,
+			     task_pid_nr(current));
+	return ret;
+}
+#endif
+
 /**
  * proc_douintvec - read a vector of unsigned integers
  * @table: the sysctl table
diff --git a/mm/compaction.c b/mm/compaction.c
index 672d3c78c6abf..ba77809a1666e 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1590,7 +1590,11 @@ typedef enum {
  * Allow userspace to control policy on scanning the unevictable LRU for
  * compactable pages.
  */
+#ifdef CONFIG_PREEMPT_RT
+int sysctl_compact_unevictable_allowed __read_mostly = 0;
+#else
 int sysctl_compact_unevictable_allowed __read_mostly = 1;
+#endif
 
 static inline void
 update_fast_start_pfn(struct compact_control *cc, unsigned long pfn)
-- 
2.26.0.rc2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/2 v4] mm/compaction: Disable compact_unevictable_allowed on RT
  2020-03-19 16:39                 ` [PATCH 2/2 v4] " Sebastian Andrzej Siewior
@ 2020-03-19 16:49                   ` Vlastimil Babka
  2020-03-19 16:55                     ` [PATCH 2/2 v5] " Sebastian Andrzej Siewior
  0 siblings, 1 reply; 19+ messages in thread
From: Vlastimil Babka @ 2020-03-19 16:49 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, Andrew Morton
  Cc: linux-mm, Thomas Gleixner, Luis Chamberlain, Kees Cook,
	Iurii Zaikin, Mel Gorman, Linux API

On 3/19/20 5:39 PM, Sebastian Andrzej Siewior wrote:
> Since commit
>     5bbe3547aa3ba ("mm: allow compaction of unevictable pages")
> 
> it is allowed to examine mlocked pages and compact them by default.
> On -RT even minor pagefaults are problematic because it may take a few
> 100us to resolve them and until then the task is blocked.
> 
> Make compact_unevictable_allowed = 0 default and issue a warning on RT
> if it is changed.
> 
> Link: https://lore.kernel.org/linux-mm/20190710144138.qyn4tuttdq6h7kqx@linutronix.de/
> Acked-by: Mel Gorman <mgorman@techsingularity.net>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

Nit below:

> @@ -2572,6 +2577,28 @@ int proc_dointvec(struct ctl_table *table, int write,
>  	return do_proc_dointvec(table, write, buffer, lenp, ppos, NULL, NULL);
>  }
>  
> +#ifdef CONFIG_COMPACTION
> +static int proc_dointvec_minmax_warn_RT_change(struct ctl_table *table,
> +					       int write, void __user *buffer,
> +					       size_t *lenp, loff_t *ppos)
> +{
> +	int ret, old;
> +
> +	if (!IS_ENABLED(CONFIG_PREEMPT_RT) || !write)
> +		return proc_dointvec_minmax(table, write, buffer, lenp, ppos);
> +
> +	old = *(int *)table->data;
> +	ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
> +	if (ret)
> +		return ret;
> +	if (old != *(int *)table->data)
> +		pr_warn_once("sysctl attribute %s changed changed by %s[%d]\n",

						  ^ "changed" twice

> +			     table->procname, current->comm,
> +			     task_pid_nr(current));
> +	return ret;
> +}
> +#endif
> +
>  /**
>   * proc_douintvec - read a vector of unsigned integers
>   * @table: the sysctl table

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 2/2 v5] mm/compaction: Disable compact_unevictable_allowed on RT
  2020-03-19 16:49                   ` Vlastimil Babka
@ 2020-03-19 16:55                     ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 19+ messages in thread
From: Sebastian Andrzej Siewior @ 2020-03-19 16:55 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-mm, Thomas Gleixner, Luis Chamberlain,
	Kees Cook, Iurii Zaikin, Mel Gorman, Linux API

Since commit
    5bbe3547aa3ba ("mm: allow compaction of unevictable pages")

it is allowed to examine mlocked pages and compact them by default.
On -RT even minor pagefaults are problematic because it may take a few
100us to resolve them and until then the task is blocked.

Make compact_unevictable_allowed = 0 default and issue a warning on RT
if it is changed.

Link: https://lore.kernel.org/linux-mm/20190710144138.qyn4tuttdq6h7kqx@linutronix.de/
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
v4…v5: - Add Vlastimil's Acked-by
       - s/changed changed/changed/

v3…v4: - Replace WARN_ONCE() with pr_warn_once() and extend the message
         by name of the task and its pid.
       - Use proc_dointvec_minmax() in the !RT case.
       - Added Mel's Ack as per
         20200304091159.GN3818@techsingularity.net.

v2…v3: - Allow to modify the value but issue a warning if it is changed.

v1…v2: - Make the proc file RO instead removing it.
       - Mention this change in Documentation/…/vm.rst.

 Documentation/admin-guide/sysctl/vm.rst |  3 +++
 kernel/sysctl.c                         | 29 ++++++++++++++++++++++++-
 mm/compaction.c                         |  4 ++++
 3 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index 64aeee1009cab..0329a4d3fa9ec 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -128,6 +128,9 @@ allowed to examine the unevictable lru (mlocked pages) for pages to compact.
 This should be used on systems where stalls for minor page faults are an
 acceptable trade for large contiguous free memory.  Set to 0 to prevent
 compaction from moving pages that are unevictable.  Default value is 1.
+On CONFIG_PREEMPT_RT the default value is 0 in order to avoid a page fault, due
+to compaction, which would block the task from becomming active until the fault
+is resolved.
 
 
 dirty_background_bytes
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 982203101f961..69c820d031248 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -212,6 +212,11 @@ static int proc_do_cad_pid(struct ctl_table *table, int write,
 		  void __user *buffer, size_t *lenp, loff_t *ppos);
 static int proc_taint(struct ctl_table *table, int write,
 			       void __user *buffer, size_t *lenp, loff_t *ppos);
+#ifdef CONFIG_COMPACTION
+static int proc_dointvec_minmax_warn_RT_change(struct ctl_table *table,
+					       int write, void __user *buffer,
+					       size_t *lenp, loff_t *ppos);
+#endif
 #endif
 
 #ifdef CONFIG_PRINTK
@@ -1484,7 +1489,7 @@ static struct ctl_table vm_table[] = {
 		.data		= &sysctl_compact_unevictable_allowed,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
-		.proc_handler	= proc_dointvec_minmax,
+		.proc_handler	= proc_dointvec_minmax_warn_RT_change,
 		.extra1		= SYSCTL_ZERO,
 		.extra2		= SYSCTL_ONE,
 	},
@@ -2572,6 +2577,28 @@ int proc_dointvec(struct ctl_table *table, int write,
 	return do_proc_dointvec(table, write, buffer, lenp, ppos, NULL, NULL);
 }
 
+#ifdef CONFIG_COMPACTION
+static int proc_dointvec_minmax_warn_RT_change(struct ctl_table *table,
+					       int write, void __user *buffer,
+					       size_t *lenp, loff_t *ppos)
+{
+	int ret, old;
+
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT) || !write)
+		return proc_dointvec_minmax(table, write, buffer, lenp, ppos);
+
+	old = *(int *)table->data;
+	ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
+	if (ret)
+		return ret;
+	if (old != *(int *)table->data)
+		pr_warn_once("sysctl attribute %s changed by %s[%d]\n",
+			     table->procname, current->comm,
+			     task_pid_nr(current));
+	return ret;
+}
+#endif
+
 /**
  * proc_douintvec - read a vector of unsigned integers
  * @table: the sysctl table
diff --git a/mm/compaction.c b/mm/compaction.c
index 672d3c78c6abf..ba77809a1666e 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1590,7 +1590,11 @@ typedef enum {
  * Allow userspace to control policy on scanning the unevictable LRU for
  * compactable pages.
  */
+#ifdef CONFIG_PREEMPT_RT
+int sysctl_compact_unevictable_allowed __read_mostly = 0;
+#else
 int sysctl_compact_unevictable_allowed __read_mostly = 1;
+#endif
 
 static inline void
 update_fast_start_pfn(struct compact_control *cc, unsigned long pfn)
-- 
2.26.0.rc2


^ permalink raw reply related	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2020-03-19 16:55 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-15 16:10 [PATCH] mm/compaction: Disable compact_unevictable_allowed on RT Sebastian Andrzej Siewior
     [not found] ` <20200115161035.893221-1-bigeasy-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org>
2020-01-15 22:04   ` Vlastimil Babka
2020-01-15 22:04     ` Vlastimil Babka
2020-01-16 10:22     ` Sebastian Andrzej Siewior
2020-03-02 17:35     ` [PATCH v2] " Sebastian Andrzej Siewior
2020-03-02 21:25       ` Andrew Morton
2020-03-03 17:59         ` Sebastian Andrzej Siewior
2020-03-03 20:20           ` [PATCH 1/2] =?UTF-8?q?mm/compaction:=20Really=20limit=20compact?= =?UTF-8?q?=5Funevictable=5Fallowed=20to=200=E2=80=A61?= Sebastian Andrzej Siewior
2020-03-03 20:22             ` [PATCH 2/2 v3] mm/compaction: Disable compact_unevictable_allowed on RT Sebastian Andrzej Siewior
2020-03-03 23:56               ` Andrew Morton
2020-03-04  8:19                 ` Vlastimil Babka
2020-03-04  9:27                   ` Sebastian Andrzej Siewior
2020-03-19 16:39                 ` [PATCH 2/2 v4] " Sebastian Andrzej Siewior
2020-03-19 16:49                   ` Vlastimil Babka
2020-03-19 16:55                     ` [PATCH 2/2 v5] " Sebastian Andrzej Siewior
2020-03-04  8:18               ` [PATCH 2/2 v3] " Vlastimil Babka
2020-03-04  9:25                 ` Sebastian Andrzej Siewior
2020-03-04  9:11               ` Mel Gorman
2020-03-04  8:12             ` [PATCH 1/2] mm/compaction: Really limit compact_unevictable_allowed to 0…1 Vlastimil Babka

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.