* [PATCH] mm/compaction: Disable compact_unevictable_allowed on RT @ 2020-01-15 16:10 Sebastian Andrzej Siewior [not found] ` <20200115161035.893221-1-bigeasy-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org> 0 siblings, 1 reply; 19+ messages in thread From: Sebastian Andrzej Siewior @ 2020-01-15 16:10 UTC (permalink / raw) To: linux-mm Cc: Thomas Gleixner, Vlastimil Babka, Andrew Morton, Luis Chamberlain, Kees Cook, Iurii Zaikin, Sebastian Andrzej Siewior Since commit 5bbe3547aa3ba ("mm: allow compaction of unevictable pages") it is allowed to examine mlocked pages and compact them by default. On -RT even minor pagefaults are problematic because it may take a few 100us to resolve them and until then the task is blocked. Make compact_unevictable_allowed = 0 default and remove it from /proc on RT. Link: https://lore.kernel.org/linux-mm/20190710144138.qyn4tuttdq6h7kqx@linutronix.de/ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> --- kernel/sysctl.c | 3 ++- mm/compaction.c | 4 ++++ 2 files changed, 6 insertions(+), 1 deletion(-) diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 70665934d53e2..d08bd51a0fbc3 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -1488,6 +1488,7 @@ static struct ctl_table vm_table[] = { .extra1 = &min_extfrag_threshold, .extra2 = &max_extfrag_threshold, }, +#ifndef CONFIG_PREEMPT_RT { .procname = "compact_unevictable_allowed", .data = &sysctl_compact_unevictable_allowed, @@ -1497,7 +1498,7 @@ static struct ctl_table vm_table[] = { .extra1 = SYSCTL_ZERO, .extra2 = SYSCTL_ONE, }, - +#endif #endif /* CONFIG_COMPACTION */ { .procname = "min_free_kbytes", diff --git a/mm/compaction.c b/mm/compaction.c index 672d3c78c6abf..b2c804c35ae56 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -1590,7 +1590,11 @@ typedef enum { * Allow userspace to control policy on scanning the unevictable LRU for * compactable pages. */ +#ifdef CONFIG_PREEMPT_RT +#define sysctl_compact_unevictable_allowed 0 +#else int sysctl_compact_unevictable_allowed __read_mostly = 1; +#endif static inline void update_fast_start_pfn(struct compact_control *cc, unsigned long pfn) -- 2.25.0.rc2 ^ permalink raw reply related [flat|nested] 19+ messages in thread
[parent not found: <20200115161035.893221-1-bigeasy-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org>]
* Re: [PATCH] mm/compaction: Disable compact_unevictable_allowed on RT 2020-01-15 16:10 [PATCH] mm/compaction: Disable compact_unevictable_allowed on RT Sebastian Andrzej Siewior @ 2020-01-15 22:04 ` Vlastimil Babka 0 siblings, 0 replies; 19+ messages in thread From: Vlastimil Babka @ 2020-01-15 22:04 UTC (permalink / raw) To: Sebastian Andrzej Siewior, linux-mm-Bw31MaZKKs3YtjvyW6yDsg Cc: Thomas Gleixner, Andrew Morton, Luis Chamberlain, Kees Cook, Iurii Zaikin, Mel Gorman, Linux API On 1/15/2020 5:10 PM, Sebastian Andrzej Siewior wrote: > Since commit > 5bbe3547aa3ba ("mm: allow compaction of unevictable pages") > > it is allowed to examine mlocked pages and compact them by default. > On -RT even minor pagefaults are problematic because it may take a few > 100us to resolve them and until then the task is blocked. Fine, this makes sense on RT I guess. There might be some trade-off for high-order allocation latencies though. We could perhaps migrate such mlocked pages to pages allocated without __GFP_MOVABLE during the mlock() to at least somewhat prevent them being scattered all over the zones. For MCL_FUTURE, allocate them as unmovable from the beginning. But that can wait until issues are reported. I assume you have similar solution for NUMA balancing and whatever else can cause minor faults? > Make compact_unevictable_allowed = 0 default and remove it from /proc on > RT. Removing it is maybe going too far in terms of RT kernel differences confusing users? Change the default sure, perhaps making it read-only, but removing? > Link: https://lore.kernel.org/linux-mm/20190710144138.qyn4tuttdq6h7kqx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org/ In any case the sysctl Documentation/ should be updated? And perhaps also the mlock manpage as you noted in the older thread above? Thanks, Vlastimil > Signed-off-by: Sebastian Andrzej Siewior <bigeasy-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org> > --- > kernel/sysctl.c | 3 ++- > mm/compaction.c | 4 ++++ > 2 files changed, 6 insertions(+), 1 deletion(-) > > diff --git a/kernel/sysctl.c b/kernel/sysctl.c > index 70665934d53e2..d08bd51a0fbc3 100644 > --- a/kernel/sysctl.c > +++ b/kernel/sysctl.c > @@ -1488,6 +1488,7 @@ static struct ctl_table vm_table[] = { > .extra1 = &min_extfrag_threshold, > .extra2 = &max_extfrag_threshold, > }, > +#ifndef CONFIG_PREEMPT_RT > { > .procname = "compact_unevictable_allowed", > .data = &sysctl_compact_unevictable_allowed, > @@ -1497,7 +1498,7 @@ static struct ctl_table vm_table[] = { > .extra1 = SYSCTL_ZERO, > .extra2 = SYSCTL_ONE, > }, > - > +#endif > #endif /* CONFIG_COMPACTION */ > { > .procname = "min_free_kbytes", > diff --git a/mm/compaction.c b/mm/compaction.c > index 672d3c78c6abf..b2c804c35ae56 100644 > --- a/mm/compaction.c > +++ b/mm/compaction.c > @@ -1590,7 +1590,11 @@ typedef enum { > * Allow userspace to control policy on scanning the unevictable LRU for > * compactable pages. > */ > +#ifdef CONFIG_PREEMPT_RT > +#define sysctl_compact_unevictable_allowed 0 > +#else > int sysctl_compact_unevictable_allowed __read_mostly = 1; > +#endif > > static inline void > update_fast_start_pfn(struct compact_control *cc, unsigned long pfn) > ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] mm/compaction: Disable compact_unevictable_allowed on RT @ 2020-01-15 22:04 ` Vlastimil Babka 0 siblings, 0 replies; 19+ messages in thread From: Vlastimil Babka @ 2020-01-15 22:04 UTC (permalink / raw) To: Sebastian Andrzej Siewior, linux-mm Cc: Thomas Gleixner, Andrew Morton, Luis Chamberlain, Kees Cook, Iurii Zaikin, Mel Gorman, Linux API On 1/15/2020 5:10 PM, Sebastian Andrzej Siewior wrote: > Since commit > 5bbe3547aa3ba ("mm: allow compaction of unevictable pages") > > it is allowed to examine mlocked pages and compact them by default. > On -RT even minor pagefaults are problematic because it may take a few > 100us to resolve them and until then the task is blocked. Fine, this makes sense on RT I guess. There might be some trade-off for high-order allocation latencies though. We could perhaps migrate such mlocked pages to pages allocated without __GFP_MOVABLE during the mlock() to at least somewhat prevent them being scattered all over the zones. For MCL_FUTURE, allocate them as unmovable from the beginning. But that can wait until issues are reported. I assume you have similar solution for NUMA balancing and whatever else can cause minor faults? > Make compact_unevictable_allowed = 0 default and remove it from /proc on > RT. Removing it is maybe going too far in terms of RT kernel differences confusing users? Change the default sure, perhaps making it read-only, but removing? > Link: https://lore.kernel.org/linux-mm/20190710144138.qyn4tuttdq6h7kqx@linutronix.de/ In any case the sysctl Documentation/ should be updated? And perhaps also the mlock manpage as you noted in the older thread above? Thanks, Vlastimil > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> > --- > kernel/sysctl.c | 3 ++- > mm/compaction.c | 4 ++++ > 2 files changed, 6 insertions(+), 1 deletion(-) > > diff --git a/kernel/sysctl.c b/kernel/sysctl.c > index 70665934d53e2..d08bd51a0fbc3 100644 > --- a/kernel/sysctl.c > +++ b/kernel/sysctl.c > @@ -1488,6 +1488,7 @@ static struct ctl_table vm_table[] = { > .extra1 = &min_extfrag_threshold, > .extra2 = &max_extfrag_threshold, > }, > +#ifndef CONFIG_PREEMPT_RT > { > .procname = "compact_unevictable_allowed", > .data = &sysctl_compact_unevictable_allowed, > @@ -1497,7 +1498,7 @@ static struct ctl_table vm_table[] = { > .extra1 = SYSCTL_ZERO, > .extra2 = SYSCTL_ONE, > }, > - > +#endif > #endif /* CONFIG_COMPACTION */ > { > .procname = "min_free_kbytes", > diff --git a/mm/compaction.c b/mm/compaction.c > index 672d3c78c6abf..b2c804c35ae56 100644 > --- a/mm/compaction.c > +++ b/mm/compaction.c > @@ -1590,7 +1590,11 @@ typedef enum { > * Allow userspace to control policy on scanning the unevictable LRU for > * compactable pages. > */ > +#ifdef CONFIG_PREEMPT_RT > +#define sysctl_compact_unevictable_allowed 0 > +#else > int sysctl_compact_unevictable_allowed __read_mostly = 1; > +#endif > > static inline void > update_fast_start_pfn(struct compact_control *cc, unsigned long pfn) > ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] mm/compaction: Disable compact_unevictable_allowed on RT 2020-01-15 22:04 ` Vlastimil Babka (?) @ 2020-01-16 10:22 ` Sebastian Andrzej Siewior -1 siblings, 0 replies; 19+ messages in thread From: Sebastian Andrzej Siewior @ 2020-01-16 10:22 UTC (permalink / raw) To: Vlastimil Babka Cc: linux-mm, Thomas Gleixner, Andrew Morton, Luis Chamberlain, Kees Cook, Iurii Zaikin, Mel Gorman, Linux API On 2020-01-15 23:04:19 [+0100], Vlastimil Babka wrote: > On 1/15/2020 5:10 PM, Sebastian Andrzej Siewior wrote: > > Since commit > > 5bbe3547aa3ba ("mm: allow compaction of unevictable pages") > > > > it is allowed to examine mlocked pages and compact them by default. > > On -RT even minor pagefaults are problematic because it may take a few > > 100us to resolve them and until then the task is blocked. > > Fine, this makes sense on RT I guess. There might be some trade-off for > high-order allocation latencies though. We could perhaps migrate such mlocked > pages to pages allocated without __GFP_MOVABLE during the mlock() to at least > somewhat prevent them being scattered all over the zones. For MCL_FUTURE, > allocate them as unmovable from the beginning. But that can wait until issues > are reported. > I assume you have similar solution for NUMA balancing and whatever else can > cause minor faults? I've found this one while testing. Could you please point to the NUMA balancing that might be an issue? > > Make compact_unevictable_allowed = 0 default and remove it from /proc on > > RT. > > Removing it is maybe going too far in terms of RT kernel differences confusing > users? Change the default sure, perhaps making it read-only, but removing? Okay. I will make it RO then. > > Link: https://lore.kernel.org/linux-mm/20190710144138.qyn4tuttdq6h7kqx@linutronix.de/ > > In any case the sysctl Documentation/ should be updated? And perhaps also the > mlock manpage as you noted in the older thread above? Sure. Let me add the sysctl documentation to this patch and then I will look into the manpage. > Thanks, > Vlastimil Sebastian ^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH v2] mm/compaction: Disable compact_unevictable_allowed on RT 2020-01-15 22:04 ` Vlastimil Babka (?) (?) @ 2020-03-02 17:35 ` Sebastian Andrzej Siewior 2020-03-02 21:25 ` Andrew Morton -1 siblings, 1 reply; 19+ messages in thread From: Sebastian Andrzej Siewior @ 2020-03-02 17:35 UTC (permalink / raw) To: Vlastimil Babka Cc: linux-mm, Thomas Gleixner, Andrew Morton, Luis Chamberlain, Kees Cook, Iurii Zaikin, Mel Gorman, Linux API Since commit 5bbe3547aa3ba ("mm: allow compaction of unevictable pages") it is allowed to examine mlocked pages and compact them by default. On -RT even minor pagefaults are problematic because it may take a few 100us to resolve them and until then the task is blocked. Make compact_unevictable_allowed = 0 default and RO on RT. Link: https://lore.kernel.org/linux-mm/20190710144138.qyn4tuttdq6h7kqx@linutronix.de/ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> --- v1…v2: - Make the proc file RO instead removing it. - Mention this change in Documentation/…/vm.rst. Documentation/admin-guide/sysctl/vm.rst | 1 + kernel/sysctl.c | 4 ++++ mm/compaction.c | 4 ++++ 3 files changed, 9 insertions(+) diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst index 64aeee1009cab..bbfa59d25eec3 100644 --- a/Documentation/admin-guide/sysctl/vm.rst +++ b/Documentation/admin-guide/sysctl/vm.rst @@ -128,6 +128,7 @@ allowed to examine the unevictable lru (mlocked pages) for pages to compact. This should be used on systems where stalls for minor page faults are an acceptable trade for large contiguous free memory. Set to 0 to prevent compaction from moving pages that are unevictable. Default value is 1. +On CONFIG_PREEMPT_RT the default value is 0. dirty_background_bytes diff --git a/kernel/sysctl.c b/kernel/sysctl.c index ad5b88a53c5a8..f113e31d0b0b6 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -1483,7 +1483,11 @@ static struct ctl_table vm_table[] = { .procname = "compact_unevictable_allowed", .data = &sysctl_compact_unevictable_allowed, .maxlen = sizeof(int), +#ifdef CONFIG_PREEMPT_RT + .mode = 0444, +#else .mode = 0644, +#endif .proc_handler = proc_dointvec, .extra1 = SYSCTL_ZERO, .extra2 = SYSCTL_ONE, diff --git a/mm/compaction.c b/mm/compaction.c index 672d3c78c6abf..ba77809a1666e 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -1590,7 +1590,11 @@ typedef enum { * Allow userspace to control policy on scanning the unevictable LRU for * compactable pages. */ +#ifdef CONFIG_PREEMPT_RT +int sysctl_compact_unevictable_allowed __read_mostly = 0; +#else int sysctl_compact_unevictable_allowed __read_mostly = 1; +#endif static inline void update_fast_start_pfn(struct compact_control *cc, unsigned long pfn) -- 2.25.1 ^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH v2] mm/compaction: Disable compact_unevictable_allowed on RT 2020-03-02 17:35 ` [PATCH v2] " Sebastian Andrzej Siewior @ 2020-03-02 21:25 ` Andrew Morton 2020-03-03 17:59 ` Sebastian Andrzej Siewior 0 siblings, 1 reply; 19+ messages in thread From: Andrew Morton @ 2020-03-02 21:25 UTC (permalink / raw) To: Sebastian Andrzej Siewior Cc: Vlastimil Babka, linux-mm, Thomas Gleixner, Luis Chamberlain, Kees Cook, Iurii Zaikin, Mel Gorman, Linux API On Mon, 2 Mar 2020 18:35:16 +0100 Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote: > Since commit > 5bbe3547aa3ba ("mm: allow compaction of unevictable pages") > > it is allowed to examine mlocked pages and compact them by default. > On -RT even minor pagefaults are problematic because it may take a few > 100us to resolve them and until then the task is blocked. > > Make compact_unevictable_allowed = 0 default and RO on RT. hm, that's a bit sad but I guess it's tolerable. > ... > > index 64aeee1009cab..bbfa59d25eec3 100644 > --- a/Documentation/admin-guide/sysctl/vm.rst > +++ b/Documentation/admin-guide/sysctl/vm.rst > @@ -128,6 +128,7 @@ allowed to examine the unevictable lru (mlocked pages) for pages to compact. > This should be used on systems where stalls for minor page faults are an > acceptable trade for large contiguous free memory. Set to 0 to prevent > compaction from moving pages that are unevictable. Default value is 1. > +On CONFIG_PREEMPT_RT the default value is 0. This doesn't mention that the file is unwritable on -rt, and it doesn't explain *why* -rt has different behaviour. > --- a/kernel/sysctl.c > +++ b/kernel/sysctl.c > @@ -1483,7 +1483,11 @@ static struct ctl_table vm_table[] = { > .procname = "compact_unevictable_allowed", > .data = &sysctl_compact_unevictable_allowed, > .maxlen = sizeof(int), > +#ifdef CONFIG_PREEMPT_RT > + .mode = 0444, > +#else > .mode = 0644, > +#endif This is non-backward-compatible and introduces a possibility that tested-on-non-rt userspace will fail on -rt kernels. It might be better to accept the writes, but to ignore them. Probably with a pr_warn_once() to let people know what we did. But do we really need to take the option away from -rt users? Perhaps someone wants this feature and can accept the latency hit. How about switching the default and otherwise leaving the kernel behaviour as-is and simply emitting a warning letting -rt users know that they might not want to enable this? ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v2] mm/compaction: Disable compact_unevictable_allowed on RT 2020-03-02 21:25 ` Andrew Morton @ 2020-03-03 17:59 ` Sebastian Andrzej Siewior 2020-03-03 20:20 ` [PATCH 1/2] =?UTF-8?q?mm/compaction:=20Really=20limit=20compact?= =?UTF-8?q?=5Funevictable=5Fallowed=20to=200=E2=80=A61?= Sebastian Andrzej Siewior 0 siblings, 1 reply; 19+ messages in thread From: Sebastian Andrzej Siewior @ 2020-03-03 17:59 UTC (permalink / raw) To: Andrew Morton Cc: Vlastimil Babka, linux-mm, Thomas Gleixner, Luis Chamberlain, Kees Cook, Iurii Zaikin, Mel Gorman, Linux API On 2020-03-02 13:25:31 [-0800], Andrew Morton wrote: > > index 64aeee1009cab..bbfa59d25eec3 100644 > > --- a/Documentation/admin-guide/sysctl/vm.rst > > +++ b/Documentation/admin-guide/sysctl/vm.rst > > @@ -128,6 +128,7 @@ allowed to examine the unevictable lru (mlocked pages) for pages to compact. > > This should be used on systems where stalls for minor page faults are an > > acceptable trade for large contiguous free memory. Set to 0 to prevent > > compaction from moving pages that are unevictable. Default value is 1. > > +On CONFIG_PREEMPT_RT the default value is 0. > > This doesn't mention that the file is unwritable on -rt, and it doesn't > explain *why* -rt has different behaviour. I updated this bit. > > --- a/kernel/sysctl.c > > +++ b/kernel/sysctl.c > > @@ -1483,7 +1483,11 @@ static struct ctl_table vm_table[] = { > > .procname = "compact_unevictable_allowed", > > .data = &sysctl_compact_unevictable_allowed, > > .maxlen = sizeof(int), > > +#ifdef CONFIG_PREEMPT_RT > > + .mode = 0444, > > +#else > > .mode = 0644, > > +#endif > > This is non-backward-compatible and introduces a possibility that > tested-on-non-rt userspace will fail on -rt kernels. It might be > better to accept the writes, but to ignore them. Probably with a > pr_warn_once() to let people know what we did. Hmm. > But do we really need to take the option away from -rt users? Perhaps > someone wants this feature and can accept the latency hit. How about > switching the default and otherwise leaving the kernel behaviour as-is > and simply emitting a warning letting -rt users know that they might > not want to enable this? I don't think that RT people can live with the latency spike. The problem is that it is not deterministic in terms *when* it happens and *how*long* does it need to complete. Also it is not visible so you end up with additional 100us and you have no idea why. compaction is "okay" in the setup / configuration phase when the mlock() pages aren't around / the RT task is disabled. So it does not disturb the RT load. Allowing the user to change the knob and spitting a warning is probably good. So we have a preferred default and the user is aware if it is changed with or without his knowledge. Let me send a patch in a bit… Sebastian ^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH 1/2] =?UTF-8?q?mm/compaction:=20Really=20limit=20compact?= =?UTF-8?q?=5Funevictable=5Fallowed=20to=200=E2=80=A61?= 2020-03-03 17:59 ` Sebastian Andrzej Siewior @ 2020-03-03 20:20 ` Sebastian Andrzej Siewior 2020-03-03 20:22 ` [PATCH 2/2 v3] mm/compaction: Disable compact_unevictable_allowed on RT Sebastian Andrzej Siewior 2020-03-04 8:12 ` [PATCH 1/2] mm/compaction: Really limit compact_unevictable_allowed to 0…1 Vlastimil Babka 0 siblings, 2 replies; 19+ messages in thread From: Sebastian Andrzej Siewior @ 2020-03-03 20:20 UTC (permalink / raw) To: Andrew Morton Cc: Vlastimil Babka, linux-mm, Thomas Gleixner, Luis Chamberlain, Kees Cook, Iurii Zaikin, Mel Gorman, Linux API The proc file `compact_unevictable_allowed' should allow 0 and 1 only, the `extra*' attribues have been set properly but without proc_dointvec_minmax() as the `proc_handler' the limit will not be enforced. Use proc_dointvec_minmax() as the `proc_handler' to enfoce the valid specified range. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> --- kernel/sysctl.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/sysctl.c b/kernel/sysctl.c index ad5b88a53c5a8..982203101f961 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -1484,7 +1484,7 @@ static struct ctl_table vm_table[] = { .data = &sysctl_compact_unevictable_allowed, .maxlen = sizeof(int), .mode = 0644, - .proc_handler = proc_dointvec, + .proc_handler = proc_dointvec_minmax, .extra1 = SYSCTL_ZERO, .extra2 = SYSCTL_ONE, }, -- 2.25.1 ^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 2/2 v3] mm/compaction: Disable compact_unevictable_allowed on RT 2020-03-03 20:20 ` [PATCH 1/2] =?UTF-8?q?mm/compaction:=20Really=20limit=20compact?= =?UTF-8?q?=5Funevictable=5Fallowed=20to=200=E2=80=A61?= Sebastian Andrzej Siewior @ 2020-03-03 20:22 ` Sebastian Andrzej Siewior 2020-03-03 23:56 ` Andrew Morton ` (2 more replies) 2020-03-04 8:12 ` [PATCH 1/2] mm/compaction: Really limit compact_unevictable_allowed to 0…1 Vlastimil Babka 1 sibling, 3 replies; 19+ messages in thread From: Sebastian Andrzej Siewior @ 2020-03-03 20:22 UTC (permalink / raw) To: Andrew Morton Cc: Vlastimil Babka, linux-mm, Thomas Gleixner, Luis Chamberlain, Kees Cook, Iurii Zaikin, Mel Gorman, Linux API Since commit 5bbe3547aa3ba ("mm: allow compaction of unevictable pages") it is allowed to examine mlocked pages and compact them by default. On -RT even minor pagefaults are problematic because it may take a few 100us to resolve them and until then the task is blocked. Make compact_unevictable_allowed = 0 default and issue a warning on RT if it is changed. Link: https://lore.kernel.org/linux-mm/20190710144138.qyn4tuttdq6h7kqx@linutronix.de/ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> --- v2…v3: - Allow to modify the value but issue a warning if it is changed. v1…v2: - Make the proc file RO instead removing it. - Mention this change in Documentation/…/vm.rst. Documentation/admin-guide/sysctl/vm.rst | 3 +++ kernel/sysctl.c | 27 ++++++++++++++++++++++++- mm/compaction.c | 4 ++++ 3 files changed, 33 insertions(+), 1 deletion(-) diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst index 64aeee1009cab..0329a4d3fa9ec 100644 --- a/Documentation/admin-guide/sysctl/vm.rst +++ b/Documentation/admin-guide/sysctl/vm.rst @@ -128,6 +128,9 @@ allowed to examine the unevictable lru (mlocked pages) for pages to compact. This should be used on systems where stalls for minor page faults are an acceptable trade for large contiguous free memory. Set to 0 to prevent compaction from moving pages that are unevictable. Default value is 1. +On CONFIG_PREEMPT_RT the default value is 0 in order to avoid a page fault, due +to compaction, which would block the task from becomming active until the fault +is resolved. dirty_background_bytes diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 982203101f961..3ace90b6ac57f 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -212,6 +212,11 @@ static int proc_do_cad_pid(struct ctl_table *table, int write, void __user *buffer, size_t *lenp, loff_t *ppos); static int proc_taint(struct ctl_table *table, int write, void __user *buffer, size_t *lenp, loff_t *ppos); +#ifdef CONFIG_COMPACTION +static int proc_dointvec_warn_RT_change(struct ctl_table *table, int write, + void __user *buffer, size_t *lenp, + loff_t *ppos); +#endif #endif #ifdef CONFIG_PRINTK @@ -1484,7 +1489,7 @@ static struct ctl_table vm_table[] = { .data = &sysctl_compact_unevictable_allowed, .maxlen = sizeof(int), .mode = 0644, - .proc_handler = proc_dointvec_minmax, + .proc_handler = proc_dointvec_warn_RT_change, .extra1 = SYSCTL_ZERO, .extra2 = SYSCTL_ONE, }, @@ -2572,6 +2577,26 @@ int proc_dointvec(struct ctl_table *table, int write, return do_proc_dointvec(table, write, buffer, lenp, ppos, NULL, NULL); } +#ifdef CONFIG_COMPACTION +static int proc_dointvec_warn_RT_change(struct ctl_table *table, int write, + void __user *buffer, size_t *lenp, + loff_t *ppos) +{ + int ret, old; + + if (!IS_ENABLED(CONFIG_PREEMPT_RT) || !write) + return proc_dointvec(table, write, buffer, lenp, ppos); + + old = *(int *)table->data; + ret = proc_dointvec(table, write, buffer, lenp, ppos); + if (ret) + return ret; + WARN_ONCE(old != *(int *)table->data, "sysctl attribute %s changed.", + table->procname); + return ret; +} +#endif + /** * proc_douintvec - read a vector of unsigned integers * @table: the sysctl table diff --git a/mm/compaction.c b/mm/compaction.c index 672d3c78c6abf..ba77809a1666e 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -1590,7 +1590,11 @@ typedef enum { * Allow userspace to control policy on scanning the unevictable LRU for * compactable pages. */ +#ifdef CONFIG_PREEMPT_RT +int sysctl_compact_unevictable_allowed __read_mostly = 0; +#else int sysctl_compact_unevictable_allowed __read_mostly = 1; +#endif static inline void update_fast_start_pfn(struct compact_control *cc, unsigned long pfn) -- 2.25.1 ^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH 2/2 v3] mm/compaction: Disable compact_unevictable_allowed on RT 2020-03-03 20:22 ` [PATCH 2/2 v3] mm/compaction: Disable compact_unevictable_allowed on RT Sebastian Andrzej Siewior @ 2020-03-03 23:56 ` Andrew Morton 2020-03-04 8:19 ` Vlastimil Babka 2020-03-19 16:39 ` [PATCH 2/2 v4] " Sebastian Andrzej Siewior 2020-03-04 8:18 ` [PATCH 2/2 v3] " Vlastimil Babka 2020-03-04 9:11 ` Mel Gorman 2 siblings, 2 replies; 19+ messages in thread From: Andrew Morton @ 2020-03-03 23:56 UTC (permalink / raw) To: Sebastian Andrzej Siewior Cc: Vlastimil Babka, linux-mm, Thomas Gleixner, Luis Chamberlain, Kees Cook, Iurii Zaikin, Mel Gorman, Linux API On Tue, 3 Mar 2020 21:22:25 +0100 Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote: > Since commit > 5bbe3547aa3ba ("mm: allow compaction of unevictable pages") > > it is allowed to examine mlocked pages and compact them by default. > On -RT even minor pagefaults are problematic because it may take a few > 100us to resolve them and until then the task is blocked. > > Make compact_unevictable_allowed = 0 default and issue a warning on RT > if it is changed. Fair enough, I guess. > @@ -2572,6 +2577,26 @@ int proc_dointvec(struct ctl_table *table, int write, > return do_proc_dointvec(table, write, buffer, lenp, ppos, NULL, NULL); > } > > +#ifdef CONFIG_COMPACTION > +static int proc_dointvec_warn_RT_change(struct ctl_table *table, int write, > + void __user *buffer, size_t *lenp, > + loff_t *ppos) > +{ > + int ret, old; > + > + if (!IS_ENABLED(CONFIG_PREEMPT_RT) || !write) > + return proc_dointvec(table, write, buffer, lenp, ppos); > + > + old = *(int *)table->data; > + ret = proc_dointvec(table, write, buffer, lenp, ppos); > + if (ret) > + return ret; > + WARN_ONCE(old != *(int *)table->data, "sysctl attribute %s changed.", > + table->procname); The WARN will include a stack trace which just isn't interesting. A pr_warn() would be better? > + return ret; > +} > +#endif ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/2 v3] mm/compaction: Disable compact_unevictable_allowed on RT 2020-03-03 23:56 ` Andrew Morton @ 2020-03-04 8:19 ` Vlastimil Babka 2020-03-04 9:27 ` Sebastian Andrzej Siewior 2020-03-19 16:39 ` [PATCH 2/2 v4] " Sebastian Andrzej Siewior 1 sibling, 1 reply; 19+ messages in thread From: Vlastimil Babka @ 2020-03-04 8:19 UTC (permalink / raw) To: Andrew Morton, Sebastian Andrzej Siewior Cc: linux-mm, Thomas Gleixner, Luis Chamberlain, Kees Cook, Iurii Zaikin, Mel Gorman, Linux API On 3/4/20 12:56 AM, Andrew Morton wrote: >> @@ -2572,6 +2577,26 @@ int proc_dointvec(struct ctl_table *table, int write, >> return do_proc_dointvec(table, write, buffer, lenp, ppos, NULL, NULL); >> } >> >> +#ifdef CONFIG_COMPACTION >> +static int proc_dointvec_warn_RT_change(struct ctl_table *table, int write, >> + void __user *buffer, size_t *lenp, >> + loff_t *ppos) >> +{ >> + int ret, old; >> + >> + if (!IS_ENABLED(CONFIG_PREEMPT_RT) || !write) >> + return proc_dointvec(table, write, buffer, lenp, ppos); >> + >> + old = *(int *)table->data; >> + ret = proc_dointvec(table, write, buffer, lenp, ppos); >> + if (ret) >> + return ret; >> + WARN_ONCE(old != *(int *)table->data, "sysctl attribute %s changed.", >> + table->procname); > > The WARN will include a stack trace which just isn't interesting. A > pr_warn() would be better? Yeah, the only interesting part of full WARN would possibly be, which process changed it. That might be useful to print. >> + return ret; >> +} >> +#endif > ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/2 v3] mm/compaction: Disable compact_unevictable_allowed on RT 2020-03-04 8:19 ` Vlastimil Babka @ 2020-03-04 9:27 ` Sebastian Andrzej Siewior 0 siblings, 0 replies; 19+ messages in thread From: Sebastian Andrzej Siewior @ 2020-03-04 9:27 UTC (permalink / raw) To: Vlastimil Babka Cc: Andrew Morton, linux-mm, Thomas Gleixner, Luis Chamberlain, Kees Cook, Iurii Zaikin, Mel Gorman, Linux API On 2020-03-04 09:19:21 [+0100], Vlastimil Babka wrote: > >> + WARN_ONCE(old != *(int *)table->data, "sysctl attribute %s changed.", > >> + table->procname); > > > > The WARN will include a stack trace which just isn't interesting. A > > pr_warn() would be better? > > Yeah, the only interesting part of full WARN would possibly be, which process > changed it. That might be useful to print. Yes, the stack trace and register dump isn't interesting. But as Vlastimil says, the task and pid are informative. So if that is too much I could extract those two informations and include it in a pr_warn(). Sebastian ^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH 2/2 v4] mm/compaction: Disable compact_unevictable_allowed on RT 2020-03-03 23:56 ` Andrew Morton 2020-03-04 8:19 ` Vlastimil Babka @ 2020-03-19 16:39 ` Sebastian Andrzej Siewior 2020-03-19 16:49 ` Vlastimil Babka 1 sibling, 1 reply; 19+ messages in thread From: Sebastian Andrzej Siewior @ 2020-03-19 16:39 UTC (permalink / raw) To: Andrew Morton Cc: Vlastimil Babka, linux-mm, Thomas Gleixner, Luis Chamberlain, Kees Cook, Iurii Zaikin, Mel Gorman, Linux API Since commit 5bbe3547aa3ba ("mm: allow compaction of unevictable pages") it is allowed to examine mlocked pages and compact them by default. On -RT even minor pagefaults are problematic because it may take a few 100us to resolve them and until then the task is blocked. Make compact_unevictable_allowed = 0 default and issue a warning on RT if it is changed. Link: https://lore.kernel.org/linux-mm/20190710144138.qyn4tuttdq6h7kqx@linutronix.de/ Acked-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> --- v3…v4: - Replace WARN_ONCE() with pr_warn_once() and extend the message by name of the task and its pid. - Use proc_dointvec_minmax() in the !RT case. - Added Mel's Ack as per 20200304091159.GN3818@techsingularity.net. v2…v3: - Allow to modify the value but issue a warning if it is changed. v1…v2: - Make the proc file RO instead removing it. - Mention this change in Documentation/…/vm.rst. Documentation/admin-guide/sysctl/vm.rst | 3 +++ kernel/sysctl.c | 29 ++++++++++++++++++++++++- mm/compaction.c | 4 ++++ 3 files changed, 35 insertions(+), 1 deletion(-) diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst index 64aeee1009cab..0329a4d3fa9ec 100644 --- a/Documentation/admin-guide/sysctl/vm.rst +++ b/Documentation/admin-guide/sysctl/vm.rst @@ -128,6 +128,9 @@ allowed to examine the unevictable lru (mlocked pages) for pages to compact. This should be used on systems where stalls for minor page faults are an acceptable trade for large contiguous free memory. Set to 0 to prevent compaction from moving pages that are unevictable. Default value is 1. +On CONFIG_PREEMPT_RT the default value is 0 in order to avoid a page fault, due +to compaction, which would block the task from becomming active until the fault +is resolved. dirty_background_bytes diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 982203101f961..43eb9da0ea86c 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -212,6 +212,11 @@ static int proc_do_cad_pid(struct ctl_table *table, int write, void __user *buffer, size_t *lenp, loff_t *ppos); static int proc_taint(struct ctl_table *table, int write, void __user *buffer, size_t *lenp, loff_t *ppos); +#ifdef CONFIG_COMPACTION +static int proc_dointvec_minmax_warn_RT_change(struct ctl_table *table, + int write, void __user *buffer, + size_t *lenp, loff_t *ppos); +#endif #endif #ifdef CONFIG_PRINTK @@ -1484,7 +1489,7 @@ static struct ctl_table vm_table[] = { .data = &sysctl_compact_unevictable_allowed, .maxlen = sizeof(int), .mode = 0644, - .proc_handler = proc_dointvec_minmax, + .proc_handler = proc_dointvec_minmax_warn_RT_change, .extra1 = SYSCTL_ZERO, .extra2 = SYSCTL_ONE, }, @@ -2572,6 +2577,28 @@ int proc_dointvec(struct ctl_table *table, int write, return do_proc_dointvec(table, write, buffer, lenp, ppos, NULL, NULL); } +#ifdef CONFIG_COMPACTION +static int proc_dointvec_minmax_warn_RT_change(struct ctl_table *table, + int write, void __user *buffer, + size_t *lenp, loff_t *ppos) +{ + int ret, old; + + if (!IS_ENABLED(CONFIG_PREEMPT_RT) || !write) + return proc_dointvec_minmax(table, write, buffer, lenp, ppos); + + old = *(int *)table->data; + ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos); + if (ret) + return ret; + if (old != *(int *)table->data) + pr_warn_once("sysctl attribute %s changed changed by %s[%d]\n", + table->procname, current->comm, + task_pid_nr(current)); + return ret; +} +#endif + /** * proc_douintvec - read a vector of unsigned integers * @table: the sysctl table diff --git a/mm/compaction.c b/mm/compaction.c index 672d3c78c6abf..ba77809a1666e 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -1590,7 +1590,11 @@ typedef enum { * Allow userspace to control policy on scanning the unevictable LRU for * compactable pages. */ +#ifdef CONFIG_PREEMPT_RT +int sysctl_compact_unevictable_allowed __read_mostly = 0; +#else int sysctl_compact_unevictable_allowed __read_mostly = 1; +#endif static inline void update_fast_start_pfn(struct compact_control *cc, unsigned long pfn) -- 2.26.0.rc2 ^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH 2/2 v4] mm/compaction: Disable compact_unevictable_allowed on RT 2020-03-19 16:39 ` [PATCH 2/2 v4] " Sebastian Andrzej Siewior @ 2020-03-19 16:49 ` Vlastimil Babka 2020-03-19 16:55 ` [PATCH 2/2 v5] " Sebastian Andrzej Siewior 0 siblings, 1 reply; 19+ messages in thread From: Vlastimil Babka @ 2020-03-19 16:49 UTC (permalink / raw) To: Sebastian Andrzej Siewior, Andrew Morton Cc: linux-mm, Thomas Gleixner, Luis Chamberlain, Kees Cook, Iurii Zaikin, Mel Gorman, Linux API On 3/19/20 5:39 PM, Sebastian Andrzej Siewior wrote: > Since commit > 5bbe3547aa3ba ("mm: allow compaction of unevictable pages") > > it is allowed to examine mlocked pages and compact them by default. > On -RT even minor pagefaults are problematic because it may take a few > 100us to resolve them and until then the task is blocked. > > Make compact_unevictable_allowed = 0 default and issue a warning on RT > if it is changed. > > Link: https://lore.kernel.org/linux-mm/20190710144138.qyn4tuttdq6h7kqx@linutronix.de/ > Acked-by: Mel Gorman <mgorman@techsingularity.net> > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Vlastimil Babka <vbabka@suse.cz> Nit below: > @@ -2572,6 +2577,28 @@ int proc_dointvec(struct ctl_table *table, int write, > return do_proc_dointvec(table, write, buffer, lenp, ppos, NULL, NULL); > } > > +#ifdef CONFIG_COMPACTION > +static int proc_dointvec_minmax_warn_RT_change(struct ctl_table *table, > + int write, void __user *buffer, > + size_t *lenp, loff_t *ppos) > +{ > + int ret, old; > + > + if (!IS_ENABLED(CONFIG_PREEMPT_RT) || !write) > + return proc_dointvec_minmax(table, write, buffer, lenp, ppos); > + > + old = *(int *)table->data; > + ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos); > + if (ret) > + return ret; > + if (old != *(int *)table->data) > + pr_warn_once("sysctl attribute %s changed changed by %s[%d]\n", ^ "changed" twice > + table->procname, current->comm, > + task_pid_nr(current)); > + return ret; > +} > +#endif > + > /** > * proc_douintvec - read a vector of unsigned integers > * @table: the sysctl table ^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH 2/2 v5] mm/compaction: Disable compact_unevictable_allowed on RT 2020-03-19 16:49 ` Vlastimil Babka @ 2020-03-19 16:55 ` Sebastian Andrzej Siewior 0 siblings, 0 replies; 19+ messages in thread From: Sebastian Andrzej Siewior @ 2020-03-19 16:55 UTC (permalink / raw) To: Vlastimil Babka Cc: Andrew Morton, linux-mm, Thomas Gleixner, Luis Chamberlain, Kees Cook, Iurii Zaikin, Mel Gorman, Linux API Since commit 5bbe3547aa3ba ("mm: allow compaction of unevictable pages") it is allowed to examine mlocked pages and compact them by default. On -RT even minor pagefaults are problematic because it may take a few 100us to resolve them and until then the task is blocked. Make compact_unevictable_allowed = 0 default and issue a warning on RT if it is changed. Link: https://lore.kernel.org/linux-mm/20190710144138.qyn4tuttdq6h7kqx@linutronix.de/ Acked-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> --- v4…v5: - Add Vlastimil's Acked-by - s/changed changed/changed/ v3…v4: - Replace WARN_ONCE() with pr_warn_once() and extend the message by name of the task and its pid. - Use proc_dointvec_minmax() in the !RT case. - Added Mel's Ack as per 20200304091159.GN3818@techsingularity.net. v2…v3: - Allow to modify the value but issue a warning if it is changed. v1…v2: - Make the proc file RO instead removing it. - Mention this change in Documentation/…/vm.rst. Documentation/admin-guide/sysctl/vm.rst | 3 +++ kernel/sysctl.c | 29 ++++++++++++++++++++++++- mm/compaction.c | 4 ++++ 3 files changed, 35 insertions(+), 1 deletion(-) diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst index 64aeee1009cab..0329a4d3fa9ec 100644 --- a/Documentation/admin-guide/sysctl/vm.rst +++ b/Documentation/admin-guide/sysctl/vm.rst @@ -128,6 +128,9 @@ allowed to examine the unevictable lru (mlocked pages) for pages to compact. This should be used on systems where stalls for minor page faults are an acceptable trade for large contiguous free memory. Set to 0 to prevent compaction from moving pages that are unevictable. Default value is 1. +On CONFIG_PREEMPT_RT the default value is 0 in order to avoid a page fault, due +to compaction, which would block the task from becomming active until the fault +is resolved. dirty_background_bytes diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 982203101f961..69c820d031248 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -212,6 +212,11 @@ static int proc_do_cad_pid(struct ctl_table *table, int write, void __user *buffer, size_t *lenp, loff_t *ppos); static int proc_taint(struct ctl_table *table, int write, void __user *buffer, size_t *lenp, loff_t *ppos); +#ifdef CONFIG_COMPACTION +static int proc_dointvec_minmax_warn_RT_change(struct ctl_table *table, + int write, void __user *buffer, + size_t *lenp, loff_t *ppos); +#endif #endif #ifdef CONFIG_PRINTK @@ -1484,7 +1489,7 @@ static struct ctl_table vm_table[] = { .data = &sysctl_compact_unevictable_allowed, .maxlen = sizeof(int), .mode = 0644, - .proc_handler = proc_dointvec_minmax, + .proc_handler = proc_dointvec_minmax_warn_RT_change, .extra1 = SYSCTL_ZERO, .extra2 = SYSCTL_ONE, }, @@ -2572,6 +2577,28 @@ int proc_dointvec(struct ctl_table *table, int write, return do_proc_dointvec(table, write, buffer, lenp, ppos, NULL, NULL); } +#ifdef CONFIG_COMPACTION +static int proc_dointvec_minmax_warn_RT_change(struct ctl_table *table, + int write, void __user *buffer, + size_t *lenp, loff_t *ppos) +{ + int ret, old; + + if (!IS_ENABLED(CONFIG_PREEMPT_RT) || !write) + return proc_dointvec_minmax(table, write, buffer, lenp, ppos); + + old = *(int *)table->data; + ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos); + if (ret) + return ret; + if (old != *(int *)table->data) + pr_warn_once("sysctl attribute %s changed by %s[%d]\n", + table->procname, current->comm, + task_pid_nr(current)); + return ret; +} +#endif + /** * proc_douintvec - read a vector of unsigned integers * @table: the sysctl table diff --git a/mm/compaction.c b/mm/compaction.c index 672d3c78c6abf..ba77809a1666e 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -1590,7 +1590,11 @@ typedef enum { * Allow userspace to control policy on scanning the unevictable LRU for * compactable pages. */ +#ifdef CONFIG_PREEMPT_RT +int sysctl_compact_unevictable_allowed __read_mostly = 0; +#else int sysctl_compact_unevictable_allowed __read_mostly = 1; +#endif static inline void update_fast_start_pfn(struct compact_control *cc, unsigned long pfn) -- 2.26.0.rc2 ^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH 2/2 v3] mm/compaction: Disable compact_unevictable_allowed on RT 2020-03-03 20:22 ` [PATCH 2/2 v3] mm/compaction: Disable compact_unevictable_allowed on RT Sebastian Andrzej Siewior 2020-03-03 23:56 ` Andrew Morton @ 2020-03-04 8:18 ` Vlastimil Babka 2020-03-04 9:25 ` Sebastian Andrzej Siewior 2020-03-04 9:11 ` Mel Gorman 2 siblings, 1 reply; 19+ messages in thread From: Vlastimil Babka @ 2020-03-04 8:18 UTC (permalink / raw) To: Sebastian Andrzej Siewior, Andrew Morton Cc: linux-mm, Thomas Gleixner, Luis Chamberlain, Kees Cook, Iurii Zaikin, Mel Gorman, Linux API On 3/3/20 9:22 PM, Sebastian Andrzej Siewior wrote: > Since commit > 5bbe3547aa3ba ("mm: allow compaction of unevictable pages") > > it is allowed to examine mlocked pages and compact them by default. > On -RT even minor pagefaults are problematic because it may take a few > 100us to resolve them and until then the task is blocked. > > Make compact_unevictable_allowed = 0 default and issue a warning on RT > if it is changed. > > Link: https://lore.kernel.org/linux-mm/20190710144138.qyn4tuttdq6h7kqx@linutronix.de/ > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> > --- > v2…v3: - Allow to modify the value but issue a warning if it is changed. > > v1…v2: - Make the proc file RO instead removing it. > - Mention this change in Documentation/…/vm.rst. > > Documentation/admin-guide/sysctl/vm.rst | 3 +++ > kernel/sysctl.c | 27 ++++++++++++++++++++++++- > mm/compaction.c | 4 ++++ > 3 files changed, 33 insertions(+), 1 deletion(-) > > diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst > index 64aeee1009cab..0329a4d3fa9ec 100644 > --- a/Documentation/admin-guide/sysctl/vm.rst > +++ b/Documentation/admin-guide/sysctl/vm.rst > @@ -128,6 +128,9 @@ allowed to examine the unevictable lru (mlocked pages) for pages to compact. > This should be used on systems where stalls for minor page faults are an > acceptable trade for large contiguous free memory. Set to 0 to prevent > compaction from moving pages that are unevictable. Default value is 1. > +On CONFIG_PREEMPT_RT the default value is 0 in order to avoid a page fault, due > +to compaction, which would block the task from becomming active until the fault > +is resolved. > > > dirty_background_bytes > diff --git a/kernel/sysctl.c b/kernel/sysctl.c > index 982203101f961..3ace90b6ac57f 100644 > --- a/kernel/sysctl.c > +++ b/kernel/sysctl.c > @@ -212,6 +212,11 @@ static int proc_do_cad_pid(struct ctl_table *table, int write, > void __user *buffer, size_t *lenp, loff_t *ppos); > static int proc_taint(struct ctl_table *table, int write, > void __user *buffer, size_t *lenp, loff_t *ppos); > +#ifdef CONFIG_COMPACTION > +static int proc_dointvec_warn_RT_change(struct ctl_table *table, int write, > + void __user *buffer, size_t *lenp, > + loff_t *ppos); > +#endif > #endif > > #ifdef CONFIG_PRINTK > @@ -1484,7 +1489,7 @@ static struct ctl_table vm_table[] = { > .data = &sysctl_compact_unevictable_allowed, > .maxlen = sizeof(int), > .mode = 0644, > - .proc_handler = proc_dointvec_minmax, > + .proc_handler = proc_dointvec_warn_RT_change, > .extra1 = SYSCTL_ZERO, > .extra2 = SYSCTL_ONE, > }, > @@ -2572,6 +2577,26 @@ int proc_dointvec(struct ctl_table *table, int write, > return do_proc_dointvec(table, write, buffer, lenp, ppos, NULL, NULL); > } > > +#ifdef CONFIG_COMPACTION > +static int proc_dointvec_warn_RT_change(struct ctl_table *table, int write, > + void __user *buffer, size_t *lenp, > + loff_t *ppos) > +{ > + int ret, old; > + > + if (!IS_ENABLED(CONFIG_PREEMPT_RT) || !write) > + return proc_dointvec(table, write, buffer, lenp, ppos); Shouldn't you use her proc_dointvec_minmax() per Patch 1/2 ? > + > + old = *(int *)table->data; > + ret = proc_dointvec(table, write, buffer, lenp, ppos); And here. > + if (ret) > + return ret; > + WARN_ONCE(old != *(int *)table->data, "sysctl attribute %s changed.", > + table->procname); > + return ret; > +} > +#endif > + > /** > * proc_douintvec - read a vector of unsigned integers > * @table: the sysctl table > diff --git a/mm/compaction.c b/mm/compaction.c > index 672d3c78c6abf..ba77809a1666e 100644 > --- a/mm/compaction.c > +++ b/mm/compaction.c > @@ -1590,7 +1590,11 @@ typedef enum { > * Allow userspace to control policy on scanning the unevictable LRU for > * compactable pages. > */ > +#ifdef CONFIG_PREEMPT_RT > +int sysctl_compact_unevictable_allowed __read_mostly = 0; > +#else > int sysctl_compact_unevictable_allowed __read_mostly = 1; > +#endif > > static inline void > update_fast_start_pfn(struct compact_control *cc, unsigned long pfn) > ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/2 v3] mm/compaction: Disable compact_unevictable_allowed on RT 2020-03-04 8:18 ` [PATCH 2/2 v3] " Vlastimil Babka @ 2020-03-04 9:25 ` Sebastian Andrzej Siewior 0 siblings, 0 replies; 19+ messages in thread From: Sebastian Andrzej Siewior @ 2020-03-04 9:25 UTC (permalink / raw) To: Vlastimil Babka Cc: Andrew Morton, linux-mm, Thomas Gleixner, Luis Chamberlain, Kees Cook, Iurii Zaikin, Mel Gorman, Linux API On 2020-03-04 09:18:21 [+0100], Vlastimil Babka wrote: > > @@ -2572,6 +2577,26 @@ int proc_dointvec(struct ctl_table *table, int write, > > return do_proc_dointvec(table, write, buffer, lenp, ppos, NULL, NULL); > > } > > > > +#ifdef CONFIG_COMPACTION > > +static int proc_dointvec_warn_RT_change(struct ctl_table *table, int write, > > + void __user *buffer, size_t *lenp, > > + loff_t *ppos) > > +{ > > + int ret, old; > > + > > + if (!IS_ENABLED(CONFIG_PREEMPT_RT) || !write) > > + return proc_dointvec(table, write, buffer, lenp, ppos); > > Shouldn't you use her proc_dointvec_minmax() per Patch 1/2 ? > > > + > > + old = *(int *)table->data; > > + ret = proc_dointvec(table, write, buffer, lenp, ppos); > > And here. Yes, thank you for noticing. It didn't make from editor to disk after rebasing… Sebastian ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/2 v3] mm/compaction: Disable compact_unevictable_allowed on RT 2020-03-03 20:22 ` [PATCH 2/2 v3] mm/compaction: Disable compact_unevictable_allowed on RT Sebastian Andrzej Siewior 2020-03-03 23:56 ` Andrew Morton 2020-03-04 8:18 ` [PATCH 2/2 v3] " Vlastimil Babka @ 2020-03-04 9:11 ` Mel Gorman 2 siblings, 0 replies; 19+ messages in thread From: Mel Gorman @ 2020-03-04 9:11 UTC (permalink / raw) To: Sebastian Andrzej Siewior Cc: Andrew Morton, Vlastimil Babka, linux-mm, Thomas Gleixner, Luis Chamberlain, Kees Cook, Iurii Zaikin, Linux API On Tue, Mar 03, 2020 at 09:22:25PM +0100, Sebastian Andrzej Siewior wrote: > Since commit > 5bbe3547aa3ba ("mm: allow compaction of unevictable pages") > > it is allowed to examine mlocked pages and compact them by default. > On -RT even minor pagefaults are problematic because it may take a few > 100us to resolve them and until then the task is blocked. > > Make compact_unevictable_allowed = 0 default and issue a warning on RT > if it is changed. > > Link: https://lore.kernel.org/linux-mm/20190710144138.qyn4tuttdq6h7kqx@linutronix.de/ > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Mel Gorman <mgorman@techsingularity.net> (caveat: I do not spend very much some on RT-specific topics) While I ack'd this, an RT application using THP is playing with fire, I know the RT extension for SLE explicitly disables it from being enabled at kernel config time. At minimum the critical regions should be mlocked followed by prctl to disable future THP faults that are non-deterministic, both from an allocation point of view, and a TLB access point of view. It's still reasonable to expect a smaller TLB reach for huge pages than base pages. It's a similar hazard with NUMA balancing, an RT application should either disable balancing globally or set a memory policy that forces it to be ignored. They should be doing this anyway to avoid non-deterministic memory access costs due to NUMA artifacts but it wouldn't surprise me if some applications got it wrong. In that case, the SLE RT extension disables balancing by default and it probably should warn if it's enabled like this patch does. It wouldn't surprise me to see patches like this in the future (completely untested, illustrative only). diff --git a/init/Kconfig b/init/Kconfig index 452bc1835cd4..7a406e2b5580 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -797,7 +797,7 @@ config NUMA_BALANCING config NUMA_BALANCING_DEFAULT_ENABLED bool "Automatically enable NUMA aware memory/task placement" default y - depends on NUMA_BALANCING + depends on NUMA_BALANCING && !PREEMPT_RT help If set, automatic NUMA balancing will be enabled if running on a NUMA machine. diff --git a/mm/Kconfig b/mm/Kconfig index ab80933be65f..313a5d794491 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -385,7 +385,7 @@ config TRANSPARENT_HUGEPAGE choice prompt "Transparent Hugepage Support sysfs defaults" - depends on TRANSPARENT_HUGEPAGE + depends on TRANSPARENT_HUGEPAGE && !PREEMPT_RT default TRANSPARENT_HUGEPAGE_ALWAYS help Selects the sysfs defaults for Transparent Hugepage Support. I would hope that a user of an RT kernel with an RT-aware application would be aware of this anyway but ..... uhhhh. Point for Andrew is that I would not be too surprised if there were more RT-specific checks in the future that sanity checked some configuration options in response to RT-specific bugs that were down to insane configurations (be they kernel configs or sysctls) -- Mel Gorman SUSE Labs ^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH 1/2] mm/compaction: Really limit compact_unevictable_allowed to 0…1 2020-03-03 20:20 ` [PATCH 1/2] =?UTF-8?q?mm/compaction:=20Really=20limit=20compact?= =?UTF-8?q?=5Funevictable=5Fallowed=20to=200=E2=80=A61?= Sebastian Andrzej Siewior 2020-03-03 20:22 ` [PATCH 2/2 v3] mm/compaction: Disable compact_unevictable_allowed on RT Sebastian Andrzej Siewior @ 2020-03-04 8:12 ` Vlastimil Babka 1 sibling, 0 replies; 19+ messages in thread From: Vlastimil Babka @ 2020-03-04 8:12 UTC (permalink / raw) To: Sebastian Andrzej Siewior, Andrew Morton Cc: linux-mm, Thomas Gleixner, Luis Chamberlain, Kees Cook, Iurii Zaikin, Mel Gorman, Linux API On 3/3/20 9:20 PM, Sebastian Andrzej Siewior wrote: > The proc file `compact_unevictable_allowed' should allow 0 and 1 only, > the `extra*' attribues have been set properly but without > proc_dointvec_minmax() as the `proc_handler' the limit will not be > enforced. > > Use proc_dointvec_minmax() as the `proc_handler' to enfoce the valid > specified range. > > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Vlastimil Babka <vbabka@suse.cz> > --- > kernel/sysctl.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/kernel/sysctl.c b/kernel/sysctl.c > index ad5b88a53c5a8..982203101f961 100644 > --- a/kernel/sysctl.c > +++ b/kernel/sysctl.c > @@ -1484,7 +1484,7 @@ static struct ctl_table vm_table[] = { > .data = &sysctl_compact_unevictable_allowed, > .maxlen = sizeof(int), > .mode = 0644, > - .proc_handler = proc_dointvec, > + .proc_handler = proc_dointvec_minmax, > .extra1 = SYSCTL_ZERO, > .extra2 = SYSCTL_ONE, > }, > ^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2020-03-19 16:55 UTC | newest] Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-01-15 16:10 [PATCH] mm/compaction: Disable compact_unevictable_allowed on RT Sebastian Andrzej Siewior [not found] ` <20200115161035.893221-1-bigeasy-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org> 2020-01-15 22:04 ` Vlastimil Babka 2020-01-15 22:04 ` Vlastimil Babka 2020-01-16 10:22 ` Sebastian Andrzej Siewior 2020-03-02 17:35 ` [PATCH v2] " Sebastian Andrzej Siewior 2020-03-02 21:25 ` Andrew Morton 2020-03-03 17:59 ` Sebastian Andrzej Siewior 2020-03-03 20:20 ` [PATCH 1/2] =?UTF-8?q?mm/compaction:=20Really=20limit=20compact?= =?UTF-8?q?=5Funevictable=5Fallowed=20to=200=E2=80=A61?= Sebastian Andrzej Siewior 2020-03-03 20:22 ` [PATCH 2/2 v3] mm/compaction: Disable compact_unevictable_allowed on RT Sebastian Andrzej Siewior 2020-03-03 23:56 ` Andrew Morton 2020-03-04 8:19 ` Vlastimil Babka 2020-03-04 9:27 ` Sebastian Andrzej Siewior 2020-03-19 16:39 ` [PATCH 2/2 v4] " Sebastian Andrzej Siewior 2020-03-19 16:49 ` Vlastimil Babka 2020-03-19 16:55 ` [PATCH 2/2 v5] " Sebastian Andrzej Siewior 2020-03-04 8:18 ` [PATCH 2/2 v3] " Vlastimil Babka 2020-03-04 9:25 ` Sebastian Andrzej Siewior 2020-03-04 9:11 ` Mel Gorman 2020-03-04 8:12 ` [PATCH 1/2] mm/compaction: Really limit compact_unevictable_allowed to 0…1 Vlastimil Babka
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.