* [PATCH] mm/vmalloc: Fix regression caused by needless vmalloc_sync_all() @ 2019-11-13 9:55 Shile Zhang 2019-11-13 15:17 ` Qian Cai ` (3 more replies) 0 siblings, 4 replies; 13+ messages in thread From: Shile Zhang @ 2019-11-13 9:55 UTC (permalink / raw) To: Andrew Morton, Joerg Roedel; +Cc: linux-mm, linux-kbuild, Shile Zhang vmalloc_sync_all() was put in the common path in __purge_vmap_area_lazy(), for one sync issue only happened on X86_32 with PTI enabled. It is needless for X86_64, which caused a big regression in UnixBench Shell8 testing on X86_64. Similar regression also reported by 0-day kernel test robot in reaim benchmarking: https://lists.01.org/hyperkitty/list/lkp@lists.01.org/thread/4D3JPPHBNOSPFK2KEPC6KGKS6J25AIDB/ Fix it by adding more conditions. Fixes: 3f8fd02b1bf1 ("mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()") Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com> --- mm/vmalloc.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/mm/vmalloc.c b/mm/vmalloc.c index a3c70e275f4e..7b9fc7966da6 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -1255,11 +1255,17 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end) if (unlikely(valist == NULL)) return false; +#if defined(CONFIG_X86_32) && defined(CONFIG_X86_PAE) /* * First make sure the mappings are removed from all page-tables * before they are freed. + * + * This is only needed on x86-32 with !SHARED_KERNEL_PMD, which is + * the case on a PAE kernel with PTI enabled. */ - vmalloc_sync_all(); + if (!SHARED_KERNEL_PMD && boot_cpu_has(X86_FEATURE_PTI)) + vmalloc_sync_all(); +#endif /* * TODO: to calculate a flush range without looping. -- 2.24.0.rc2 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH] mm/vmalloc: Fix regression caused by needless vmalloc_sync_all() 2019-11-13 9:55 [PATCH] mm/vmalloc: Fix regression caused by needless vmalloc_sync_all() Shile Zhang @ 2019-11-13 15:17 ` Qian Cai 2019-11-13 21:12 ` Andrew Morton ` (2 subsequent siblings) 3 siblings, 0 replies; 13+ messages in thread From: Qian Cai @ 2019-11-13 15:17 UTC (permalink / raw) To: Shile Zhang, Andrew Morton, Joerg Roedel; +Cc: linux-mm, linux-kbuild On Wed, 2019-11-13 at 17:55 +0800, Shile Zhang wrote: > vmalloc_sync_all() was put in the common path in > __purge_vmap_area_lazy(), for one sync issue only happened on X86_32 > with PTI enabled. It is needless for X86_64, which caused a big regression > in UnixBench Shell8 testing on X86_64. > Similar regression also reported by 0-day kernel test robot in reaim > benchmarking: > https://lists.01.org/hyperkitty/list/lkp@lists.01.org/thread/4D3JPPHBNOSPFK2KEPC6KGKS6J25AIDB/ > > Fix it by adding more conditions. > > Fixes: 3f8fd02b1bf1 ("mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()") > > Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com> > --- > mm/vmalloc.c | 8 +++++++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c > index a3c70e275f4e..7b9fc7966da6 100644 > --- a/mm/vmalloc.c > +++ b/mm/vmalloc.c > @@ -1255,11 +1255,17 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end) > if (unlikely(valist == NULL)) > return false; > > +#if defined(CONFIG_X86_32) && defined(CONFIG_X86_PAE) Is it cleaner to use if (IS_ENABLED()...) ? > /* > * First make sure the mappings are removed from all page-tables > * before they are freed. > + * > + * This is only needed on x86-32 with !SHARED_KERNEL_PMD, which is > + * the case on a PAE kernel with PTI enabled. > */ > - vmalloc_sync_all(); > + if (!SHARED_KERNEL_PMD && boot_cpu_has(X86_FEATURE_PTI)) > + vmalloc_sync_all(); > +#endif > > /* > * TODO: to calculate a flush range without looping. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] mm/vmalloc: Fix regression caused by needless vmalloc_sync_all() 2019-11-13 9:55 [PATCH] mm/vmalloc: Fix regression caused by needless vmalloc_sync_all() Shile Zhang 2019-11-13 15:17 ` Qian Cai @ 2019-11-13 21:12 ` Andrew Morton 2019-11-14 0:54 ` Shile Zhang 2019-11-14 13:56 ` Michal Hocko 2019-11-14 17:12 ` Joerg Roedel 3 siblings, 1 reply; 13+ messages in thread From: Andrew Morton @ 2019-11-13 21:12 UTC (permalink / raw) To: Shile Zhang Cc: Joerg Roedel, linux-mm, linux-kbuild, Thomas Gleixner, Dave Hansen, Qian Cai On Wed, 13 Nov 2019 17:55:30 +0800 Shile Zhang <shile.zhang@linux.alibaba.com> wrote: > vmalloc_sync_all() was put in the common path in > __purge_vmap_area_lazy(), for one sync issue only happened on X86_32 > with PTI enabled. It is needless for X86_64, which caused a big regression > in UnixBench Shell8 testing on X86_64. > Similar regression also reported by 0-day kernel test robot in reaim > benchmarking: > https://lists.01.org/hyperkitty/list/lkp@lists.01.org/thread/4D3JPPHBNOSPFK2KEPC6KGKS6J25AIDB/ That is indeed a large performance regression. > Fix it by adding more conditions. > > Fixes: 3f8fd02b1bf1 ("mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()") > > ... > > --- a/mm/vmalloc.c > +++ b/mm/vmalloc.c > @@ -1255,11 +1255,17 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end) > if (unlikely(valist == NULL)) > return false; > > +#if defined(CONFIG_X86_32) && defined(CONFIG_X86_PAE) Are we sure that x86_32 is the only architecture whcih is (or ever will be) affected? > /* > * First make sure the mappings are removed from all page-tables > * before they are freed. > + * > + * This is only needed on x86-32 with !SHARED_KERNEL_PMD, which is > + * the case on a PAE kernel with PTI enabled. > */ > - vmalloc_sync_all(); > + if (!SHARED_KERNEL_PMD && boot_cpu_has(X86_FEATURE_PTI)) > + vmalloc_sync_all(); > +#endif > > /* > * TODO: to calculate a flush range without looping. CONFIG_X86_PAE depends on CONFIG_X86_32 so no need to check CONFIG_X86_32? From: Andrew Morton <akpm@linux-foundation.org> Subject: mm-vmalloc-fix-regression-caused-by-needless-vmalloc_sync_all-fix simplify config expression, use IS_ENABLED() Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Qian Cai <cai@lca.pw> Cc: Shile Zhang <shile.zhang@linux.alibaba.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- mm/vmalloc.c | 21 +++++++++++---------- 1 file changed, 11 insertions(+), 10 deletions(-) --- a/mm/vmalloc.c~mm-vmalloc-fix-regression-caused-by-needless-vmalloc_sync_all-fix +++ a/mm/vmalloc.c @@ -1255,16 +1255,17 @@ static bool __purge_vmap_area_lazy(unsig if (unlikely(valist == NULL)) return false; -#if defined(CONFIG_X86_32) && defined(CONFIG_X86_PAE) - /* - * First make sure the mappings are removed from all page-tables - * before they are freed. - * - * This is only needed on x86-32 with !SHARED_KERNEL_PMD, which is - * the case on a PAE kernel with PTI enabled. - */ - if (!SHARED_KERNEL_PMD && boot_cpu_has(X86_FEATURE_PTI)) - vmalloc_sync_all(); + if (IS_ENABLED(CONFIG_X86_PAE)) { + /* + * First make sure the mappings are removed from all page-tables + * before they are freed. + * + * This is only needed on x86-32 with !SHARED_KERNEL_PMD, which + * is the case on a PAE kernel with PTI enabled. + */ + if (!SHARED_KERNEL_PMD && boot_cpu_has(X86_FEATURE_PTI)) + vmalloc_sync_all(); + } #endif /* _ ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] mm/vmalloc: Fix regression caused by needless vmalloc_sync_all() 2019-11-13 21:12 ` Andrew Morton @ 2019-11-14 0:54 ` Shile Zhang 0 siblings, 0 replies; 13+ messages in thread From: Shile Zhang @ 2019-11-14 0:54 UTC (permalink / raw) To: Andrew Morton Cc: Joerg Roedel, linux-mm, linux-kernel, Thomas Gleixner, Dave Hansen, Qian Cai Fix wrong cc list. On 2019/11/14 05:12, Andrew Morton wrote: > On Wed, 13 Nov 2019 17:55:30 +0800 Shile Zhang <shile.zhang@linux.alibaba.com> wrote: > >> vmalloc_sync_all() was put in the common path in >> __purge_vmap_area_lazy(), for one sync issue only happened on X86_32 >> with PTI enabled. It is needless for X86_64, which caused a big regression >> in UnixBench Shell8 testing on X86_64. >> Similar regression also reported by 0-day kernel test robot in reaim >> benchmarking: >> https://lists.01.org/hyperkitty/list/lkp@lists.01.org/thread/4D3JPPHBNOSPFK2KEPC6KGKS6J25AIDB/ > That is indeed a large performance regression. > >> Fix it by adding more conditions. >> >> Fixes: 3f8fd02b1bf1 ("mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()") >> >> ... >> >> --- a/mm/vmalloc.c >> +++ b/mm/vmalloc.c >> @@ -1255,11 +1255,17 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end) >> if (unlikely(valist == NULL)) >> return false; >> >> +#if defined(CONFIG_X86_32) && defined(CONFIG_X86_PAE) > Are we sure that x86_32 is the only architecture whcih is (or ever will > be) affected? > >> /* >> * First make sure the mappings are removed from all page-tables >> * before they are freed. >> + * >> + * This is only needed on x86-32 with !SHARED_KERNEL_PMD, which is >> + * the case on a PAE kernel with PTI enabled. >> */ >> - vmalloc_sync_all(); >> + if (!SHARED_KERNEL_PMD && boot_cpu_has(X86_FEATURE_PTI)) >> + vmalloc_sync_all(); >> +#endif >> >> /* >> * TODO: to calculate a flush range without looping. > CONFIG_X86_PAE depends on CONFIG_X86_32 so no need to check > CONFIG_X86_32? Yes, Thanks for your review and kindly refactoring! > From: Andrew Morton <akpm@linux-foundation.org> > Subject: mm-vmalloc-fix-regression-caused-by-needless-vmalloc_sync_all-fix > > simplify config expression, use IS_ENABLED() > > Cc: Dave Hansen <dave.hansen@linux.intel.com> > Cc: Joerg Roedel <jroedel@suse.de> > Cc: Qian Cai <cai@lca.pw> > Cc: Shile Zhang <shile.zhang@linux.alibaba.com> > Cc: Thomas Gleixner <tglx@linutronix.de> > Signed-off-by: Andrew Morton <akpm@linux-foundation.org> > --- > > mm/vmalloc.c | 21 +++++++++++---------- > 1 file changed, 11 insertions(+), 10 deletions(-) > > --- a/mm/vmalloc.c~mm-vmalloc-fix-regression-caused-by-needless-vmalloc_sync_all-fix > +++ a/mm/vmalloc.c > @@ -1255,16 +1255,17 @@ static bool __purge_vmap_area_lazy(unsig > if (unlikely(valist == NULL)) > return false; > > -#if defined(CONFIG_X86_32) && defined(CONFIG_X86_PAE) > - /* > - * First make sure the mappings are removed from all page-tables > - * before they are freed. > - * > - * This is only needed on x86-32 with !SHARED_KERNEL_PMD, which is > - * the case on a PAE kernel with PTI enabled. > - */ > - if (!SHARED_KERNEL_PMD && boot_cpu_has(X86_FEATURE_PTI)) > - vmalloc_sync_all(); > + if (IS_ENABLED(CONFIG_X86_PAE)) { > + /* > + * First make sure the mappings are removed from all page-tables > + * before they are freed. > + * > + * This is only needed on x86-32 with !SHARED_KERNEL_PMD, which > + * is the case on a PAE kernel with PTI enabled. > + */ > + if (!SHARED_KERNEL_PMD && boot_cpu_has(X86_FEATURE_PTI)) > + vmalloc_sync_all(); > + } > #endif > > /* > _ ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] mm/vmalloc: Fix regression caused by needless vmalloc_sync_all() 2019-11-13 9:55 [PATCH] mm/vmalloc: Fix regression caused by needless vmalloc_sync_all() Shile Zhang 2019-11-13 15:17 ` Qian Cai 2019-11-13 21:12 ` Andrew Morton @ 2019-11-14 13:56 ` Michal Hocko 2019-11-14 14:40 ` Shile Zhang 2019-11-14 17:12 ` Joerg Roedel 3 siblings, 1 reply; 13+ messages in thread From: Michal Hocko @ 2019-11-14 13:56 UTC (permalink / raw) To: Shile Zhang; +Cc: Andrew Morton, Joerg Roedel, linux-mm, linux-kbuild On Wed 13-11-19 17:55:30, Shile Zhang wrote: > vmalloc_sync_all() was put in the common path in > __purge_vmap_area_lazy(), for one sync issue only happened on X86_32 > with PTI enabled. It is needless for X86_64, which caused a big regression > in UnixBench Shell8 testing on X86_64. > Similar regression also reported by 0-day kernel test robot in reaim > benchmarking: > https://lists.01.org/hyperkitty/list/lkp@lists.01.org/thread/4D3JPPHBNOSPFK2KEPC6KGKS6J25AIDB/ > > Fix it by adding more conditions. This doesn't really explain a lot. Could you explain why the syncing should be limited only to PAE and !SHARED_KERNEL_PMD? > Fixes: 3f8fd02b1bf1 ("mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()") > > Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com> > --- > mm/vmalloc.c | 8 +++++++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c > index a3c70e275f4e..7b9fc7966da6 100644 > --- a/mm/vmalloc.c > +++ b/mm/vmalloc.c > @@ -1255,11 +1255,17 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end) > if (unlikely(valist == NULL)) > return false; > > +#if defined(CONFIG_X86_32) && defined(CONFIG_X86_PAE) > /* > * First make sure the mappings are removed from all page-tables > * before they are freed. > + * > + * This is only needed on x86-32 with !SHARED_KERNEL_PMD, which is > + * the case on a PAE kernel with PTI enabled. > */ > - vmalloc_sync_all(); > + if (!SHARED_KERNEL_PMD && boot_cpu_has(X86_FEATURE_PTI)) > + vmalloc_sync_all(); > +#endif > > /* > * TODO: to calculate a flush range without looping. > -- > 2.24.0.rc2 > -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] mm/vmalloc: Fix regression caused by needless vmalloc_sync_all() 2019-11-14 13:56 ` Michal Hocko @ 2019-11-14 14:40 ` Shile Zhang 2019-11-14 17:14 ` Joerg Roedel 0 siblings, 1 reply; 13+ messages in thread From: Shile Zhang @ 2019-11-14 14:40 UTC (permalink / raw) To: Michal Hocko; +Cc: Andrew Morton, Joerg Roedel, linux-mm, linux-kernel On 2019/11/14 21:56, Michal Hocko wrote: > On Wed 13-11-19 17:55:30, Shile Zhang wrote: >> vmalloc_sync_all() was put in the common path in >> __purge_vmap_area_lazy(), for one sync issue only happened on X86_32 >> with PTI enabled. It is needless for X86_64, which caused a big regression >> in UnixBench Shell8 testing on X86_64. >> Similar regression also reported by 0-day kernel test robot in reaim >> benchmarking: >> https://lists.01.org/hyperkitty/list/lkp@lists.01.org/thread/4D3JPPHBNOSPFK2KEPC6KGKS6J25AIDB/ >> >> Fix it by adding more conditions. > This doesn't really explain a lot. Could you explain why the syncing > should be limited only to PAE and !SHARED_KERNEL_PMD? Thanks for your review! Sorry for that, I'm not clear about the original issue the patch 3f8fd02b1bf1 ("mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()") fixed. I just get that limitation info from the commit log as following: " This is only needed x86-32 with !SHARED_KERNEL_PMD, which is the case on a PAE kernel with PTI enabled. On affected systems the missing sync causes old mappings to persist in some page-tables, causing data corruption and other undefined behavior. " https://patchwork.kernel.org/cover/11050511/ Hi, Joerg Could you please help to recall the original issue you encountered before? Thanks! > >> Fixes: 3f8fd02b1bf1 ("mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()") >> >> Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com> >> --- >> mm/vmalloc.c | 8 +++++++- >> 1 file changed, 7 insertions(+), 1 deletion(-) >> >> diff --git a/mm/vmalloc.c b/mm/vmalloc.c >> index a3c70e275f4e..7b9fc7966da6 100644 >> --- a/mm/vmalloc.c >> +++ b/mm/vmalloc.c >> @@ -1255,11 +1255,17 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end) >> if (unlikely(valist == NULL)) >> return false; >> >> +#if defined(CONFIG_X86_32) && defined(CONFIG_X86_PAE) >> /* >> * First make sure the mappings are removed from all page-tables >> * before they are freed. >> + * >> + * This is only needed on x86-32 with !SHARED_KERNEL_PMD, which is >> + * the case on a PAE kernel with PTI enabled. >> */ >> - vmalloc_sync_all(); >> + if (!SHARED_KERNEL_PMD && boot_cpu_has(X86_FEATURE_PTI)) >> + vmalloc_sync_all(); >> +#endif >> >> /* >> * TODO: to calculate a flush range without looping. >> -- >> 2.24.0.rc2 >> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] mm/vmalloc: Fix regression caused by needless vmalloc_sync_all() 2019-11-14 14:40 ` Shile Zhang @ 2019-11-14 17:14 ` Joerg Roedel 0 siblings, 0 replies; 13+ messages in thread From: Joerg Roedel @ 2019-11-14 17:14 UTC (permalink / raw) To: Shile Zhang; +Cc: Michal Hocko, Andrew Morton, linux-mm, linux-kernel On Thu, Nov 14, 2019 at 10:40:22PM +0800, Shile Zhang wrote: > Could you please help to recall the original issue you encountered before? The original issue was data corruption because old mappings did not get removed from the vmalloc area, and thus also new mappings did not get faulted in. So depending on the page-table currently loaded one vmalloc/ioremap area pointed to different (often already freed or re-used) memory and caused the corruption. Regards, Joerg ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] mm/vmalloc: Fix regression caused by needless vmalloc_sync_all() 2019-11-13 9:55 [PATCH] mm/vmalloc: Fix regression caused by needless vmalloc_sync_all() Shile Zhang ` (2 preceding siblings ...) 2019-11-14 13:56 ` Michal Hocko @ 2019-11-14 17:12 ` Joerg Roedel 2019-11-15 0:01 ` Andrew Morton 2019-11-15 1:06 ` Shile Zhang 3 siblings, 2 replies; 13+ messages in thread From: Joerg Roedel @ 2019-11-14 17:12 UTC (permalink / raw) To: Shile Zhang; +Cc: Andrew Morton, linux-mm, linux-kbuild On Wed, Nov 13, 2019 at 05:55:30PM +0800, Shile Zhang wrote: > +#if defined(CONFIG_X86_32) && defined(CONFIG_X86_PAE) > /* > * First make sure the mappings are removed from all page-tables > * before they are freed. > + * > + * This is only needed on x86-32 with !SHARED_KERNEL_PMD, which is > + * the case on a PAE kernel with PTI enabled. > */ > - vmalloc_sync_all(); > + if (!SHARED_KERNEL_PMD && boot_cpu_has(X86_FEATURE_PTI)) > + vmalloc_sync_all(); > +#endif I already submitted another fix for this quite some time ago: https://lore.kernel.org/lkml/20191009124418.8286-1-joro@8bytes.org/ Regards, Joerg ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] mm/vmalloc: Fix regression caused by needless vmalloc_sync_all() 2019-11-14 17:12 ` Joerg Roedel @ 2019-11-15 0:01 ` Andrew Morton 2019-11-15 8:38 ` Borislav Petkov 2019-11-15 1:06 ` Shile Zhang 1 sibling, 1 reply; 13+ messages in thread From: Andrew Morton @ 2019-11-15 0:01 UTC (permalink / raw) To: Joerg Roedel Cc: Shile Zhang, linux-mm, linux-kbuild, Dave Hansen, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Ingo Molnar, Borislav Petkov, hpa On Thu, 14 Nov 2019 18:12:31 +0100 Joerg Roedel <jroedel@suse.de> wrote: > On Wed, Nov 13, 2019 at 05:55:30PM +0800, Shile Zhang wrote: > > +#if defined(CONFIG_X86_32) && defined(CONFIG_X86_PAE) > > /* > > * First make sure the mappings are removed from all page-tables > > * before they are freed. > > + * > > + * This is only needed on x86-32 with !SHARED_KERNEL_PMD, which is > > + * the case on a PAE kernel with PTI enabled. > > */ > > - vmalloc_sync_all(); > > + if (!SHARED_KERNEL_PMD && boot_cpu_has(X86_FEATURE_PTI)) > > + vmalloc_sync_all(); > > +#endif > > I already submitted another fix for this quite some time ago: > > https://lore.kernel.org/lkml/20191009124418.8286-1-joro@8bytes.org/ > Your patch is more verbose but looks quite a bit nicer than this patch (Link: http://lkml.kernel.org/r/20191113095530.228959-1-shile.zhang@linux.alibaba.com). The separation of sync-for-a-mapping versus sync-for-an-unmapping adds clarity. It's fairly urgent - I consider this to be -stable material. Thomas & co, was that a deliberate skip? ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] mm/vmalloc: Fix regression caused by needless vmalloc_sync_all() 2019-11-15 0:01 ` Andrew Morton @ 2019-11-15 8:38 ` Borislav Petkov 2019-11-15 23:37 ` Andrew Morton 0 siblings, 1 reply; 13+ messages in thread From: Borislav Petkov @ 2019-11-15 8:38 UTC (permalink / raw) To: Andrew Morton Cc: Joerg Roedel, Shile Zhang, linux-mm, linux-kbuild, Dave Hansen, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Ingo Molnar, hpa On Thu, Nov 14, 2019 at 04:01:08PM -0800, Andrew Morton wrote: > It's fairly urgent - I consider this to be -stable material. > > Thomas & co, was that a deliberate skip? More like lost in the mail flood. I'm assuming you're taking it? In any case, patch looks ok to me - the "unmappings" naming sounds like coming from someone with a very wild phantasy... :-) Thx. -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] mm/vmalloc: Fix regression caused by needless vmalloc_sync_all() 2019-11-15 8:38 ` Borislav Petkov @ 2019-11-15 23:37 ` Andrew Morton 2019-11-16 17:38 ` Borislav Petkov 0 siblings, 1 reply; 13+ messages in thread From: Andrew Morton @ 2019-11-15 23:37 UTC (permalink / raw) To: Borislav Petkov Cc: Joerg Roedel, Shile Zhang, linux-mm, linux-kbuild, Dave Hansen, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Ingo Molnar, hpa On Fri, 15 Nov 2019 09:38:47 +0100 Borislav Petkov <bp@alien8.de> wrote: > On Thu, Nov 14, 2019 at 04:01:08PM -0800, Andrew Morton wrote: > > It's fairly urgent - I consider this to be -stable material. > > > > Thomas & co, was that a deliberate skip? > > More like lost in the mail flood. I'm assuming you're taking it? > > In any case, patch looks ok to me - the "unmappings" naming sounds like > coming from someone with a very wild phantasy... :-) > OK, thanks, here's what I queued. Reviews, acks and testing, please?? From: Joerg Roedel <jroedel@suse.de> Subject: x86/mm: Split vmalloc_sync_all() Commit 3f8fd02b1bf1 ("mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()") introduced a call to vmalloc_sync_all() in the vunmap() code-path. While this change was necessary to maintain correctness on x86-32-pae kernels, it also adds additional cycles for architectures that don't need it. Specifically on x86-64 with CONFIG_VMAP_STACK=y some people reported severe performance regressions in micro-benchmarks because it now also calls the x86-64 implementation of vmalloc_sync_all() on vunmap(). But the vmalloc_sync_all() implementation on x86-64 is only needed for newly created mappings. To avoid the unnecessary work on x86-64 and to gain the performance back, split up vmalloc_sync_all() into two functions: * vmalloc_sync_mappings(), and * vmalloc_sync_unmappings() Most call-sites to vmalloc_sync_all() only care about new mappings being synchronized. The only exception is the new call-site added in the above mentioned commit. Shile Zhang directed us to a report of an 80% regression in reaim throughput. Link: http://lkml.kernel.org/r/20191009124418.8286-1-joro@8bytes.org Link: https://lists.01.org/hyperkitty/list/lkp@lists.01.org/thread/4D3JPPHBNOSPFK2KEPC6KGKS6J25AIDB/ Link: http://lkml.kernel.org/r/20191113095530.228959-1-shile.zhang@linux.alibaba.com Fixes: 3f8fd02b1bf1 ("mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()") Signed-off-by: Joerg Roedel <jroedel@suse.de> Reported-by: kernel test robot <oliver.sang@intel.com> Reported-by: Shile Zhang <shile.zhang@linux.alibaba.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> [GHES] Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- arch/x86/mm/fault.c | 26 ++++++++++++++++++++++++-- drivers/acpi/apei/ghes.c | 2 +- include/linux/vmalloc.h | 5 +++-- kernel/notifier.c | 2 +- mm/nommu.c | 10 +++++++--- mm/vmalloc.c | 11 +++++++---- 6 files changed, 43 insertions(+), 13 deletions(-) --- a/arch/x86/mm/fault.c~x86-mm-split-vmalloc_sync_all +++ a/arch/x86/mm/fault.c @@ -189,7 +189,7 @@ static inline pmd_t *vmalloc_sync_one(pg return pmd_k; } -void vmalloc_sync_all(void) +static void vmalloc_sync(void) { unsigned long address; @@ -216,6 +216,16 @@ void vmalloc_sync_all(void) } } +void vmalloc_sync_mappings(void) +{ + vmalloc_sync(); +} + +void vmalloc_sync_unmappings(void) +{ + vmalloc_sync(); +} + /* * 32-bit: * @@ -318,11 +328,23 @@ out: #else /* CONFIG_X86_64: */ -void vmalloc_sync_all(void) +void vmalloc_sync_mappings(void) { + /* + * 64-bit mappings might allocate new p4d/pud pages + * that need to be propagated to all tasks' PGDs. + */ sync_global_pgds(VMALLOC_START & PGDIR_MASK, VMALLOC_END); } +void vmalloc_sync_unmappings(void) +{ + /* + * Unmappings never allocate or free p4d/pud pages. + * No work is required here. + */ +} + /* * 64-bit: * --- a/drivers/acpi/apei/ghes.c~x86-mm-split-vmalloc_sync_all +++ a/drivers/acpi/apei/ghes.c @@ -171,7 +171,7 @@ int ghes_estatus_pool_init(int num_ghes) * New allocation must be visible in all pgd before it can be found by * an NMI allocating from the pool. */ - vmalloc_sync_all(); + vmalloc_sync_mappings(); rc = gen_pool_add(ghes_estatus_pool, addr, PAGE_ALIGN(len), -1); if (rc) --- a/include/linux/vmalloc.h~x86-mm-split-vmalloc_sync_all +++ a/include/linux/vmalloc.h @@ -126,8 +126,9 @@ extern int remap_vmalloc_range_partial(s extern int remap_vmalloc_range(struct vm_area_struct *vma, void *addr, unsigned long pgoff); -void vmalloc_sync_all(void); - +void vmalloc_sync_mappings(void); +void vmalloc_sync_unmappings(void); + /* * Lowlevel-APIs (not for driver use!) */ --- a/kernel/notifier.c~x86-mm-split-vmalloc_sync_all +++ a/kernel/notifier.c @@ -554,7 +554,7 @@ NOKPROBE_SYMBOL(notify_die); int register_die_notifier(struct notifier_block *nb) { - vmalloc_sync_all(); + vmalloc_sync_mappings(); return atomic_notifier_chain_register(&die_chain, nb); } EXPORT_SYMBOL_GPL(register_die_notifier); --- a/mm/nommu.c~x86-mm-split-vmalloc_sync_all +++ a/mm/nommu.c @@ -359,10 +359,14 @@ void vm_unmap_aliases(void) EXPORT_SYMBOL_GPL(vm_unmap_aliases); /* - * Implement a stub for vmalloc_sync_all() if the architecture chose not to - * have one. + * Implement a stub for vmalloc_sync_[un]mapping() if the architecture + * chose not to have one. */ -void __weak vmalloc_sync_all(void) +void __weak vmalloc_sync_mappings(void) +{ +} + +void __weak vmalloc_sync_unmappings(void) { } --- a/mm/vmalloc.c~x86-mm-split-vmalloc_sync_all +++ a/mm/vmalloc.c @@ -1259,7 +1259,7 @@ static bool __purge_vmap_area_lazy(unsig * First make sure the mappings are removed from all page-tables * before they are freed. */ - vmalloc_sync_all(); + vmalloc_sync_unmappings(); /* * TODO: to calculate a flush range without looping. @@ -3050,16 +3050,19 @@ int remap_vmalloc_range(struct vm_area_s EXPORT_SYMBOL(remap_vmalloc_range); /* - * Implement a stub for vmalloc_sync_all() if the architecture chose not to - * have one. + * Implement stubs for vmalloc_sync_[un]mappings () if the architecture chose + * not to have one. * * The purpose of this function is to make sure the vmalloc area * mappings are identical in all page-tables in the system. */ -void __weak vmalloc_sync_all(void) +void __weak vmalloc_sync_mappings(void) { } +void __weak vmalloc_sync_unmappings(void) +{ +} static int f(pte_t *pte, unsigned long addr, void *data) { _ ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] mm/vmalloc: Fix regression caused by needless vmalloc_sync_all() 2019-11-15 23:37 ` Andrew Morton @ 2019-11-16 17:38 ` Borislav Petkov 0 siblings, 0 replies; 13+ messages in thread From: Borislav Petkov @ 2019-11-16 17:38 UTC (permalink / raw) To: Andrew Morton Cc: Joerg Roedel, Shile Zhang, linux-mm, linux-kbuild, Dave Hansen, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Ingo Molnar, hpa On Fri, Nov 15, 2019 at 03:37:21PM -0800, Andrew Morton wrote: > OK, thanks, here's what I queued. Reviews, acks and testing, please?? > > From: Joerg Roedel <jroedel@suse.de> > Subject: x86/mm: Split vmalloc_sync_all() > > Commit 3f8fd02b1bf1 ("mm/vmalloc: Sync unmappings in > __purge_vmap_area_lazy()") introduced a call to vmalloc_sync_all() in the > vunmap() code-path. While this change was necessary to maintain > correctness on x86-32-pae kernels, it also adds additional cycles for > architectures that don't need it. ... > arch/x86/mm/fault.c | 26 ++++++++++++++++++++++++-- > drivers/acpi/apei/ghes.c | 2 +- > include/linux/vmalloc.h | 5 +++-- > kernel/notifier.c | 2 +- > mm/nommu.c | 10 +++++++--- > mm/vmalloc.c | 11 +++++++---- > 6 files changed, 43 insertions(+), 13 deletions(-) Booting it looks ok here: Tested-by: Borislav Petkov <bp@suse.de> -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] mm/vmalloc: Fix regression caused by needless vmalloc_sync_all() 2019-11-14 17:12 ` Joerg Roedel 2019-11-15 0:01 ` Andrew Morton @ 2019-11-15 1:06 ` Shile Zhang 1 sibling, 0 replies; 13+ messages in thread From: Shile Zhang @ 2019-11-15 1:06 UTC (permalink / raw) To: Joerg Roedel; +Cc: Andrew Morton, linux-mm, linux-kernel On 2019/11/15 01:12, Joerg Roedel wrote: > On Wed, Nov 13, 2019 at 05:55:30PM +0800, Shile Zhang wrote: >> +#if defined(CONFIG_X86_32) && defined(CONFIG_X86_PAE) >> /* >> * First make sure the mappings are removed from all page-tables >> * before they are freed. >> + * >> + * This is only needed on x86-32 with !SHARED_KERNEL_PMD, which is >> + * the case on a PAE kernel with PTI enabled. >> */ >> - vmalloc_sync_all(); >> + if (!SHARED_KERNEL_PMD && boot_cpu_has(X86_FEATURE_PTI)) >> + vmalloc_sync_all(); >> +#endif > I already submitted another fix for this quite some time ago: > > https://lore.kernel.org/lkml/20191009124418.8286-1-joro@8bytes.org/ > > Regards, > > Joerg Oh, sorry for I missed that, good job and thanks for your work! ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2019-11-16 17:38 UTC | newest] Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-11-13 9:55 [PATCH] mm/vmalloc: Fix regression caused by needless vmalloc_sync_all() Shile Zhang 2019-11-13 15:17 ` Qian Cai 2019-11-13 21:12 ` Andrew Morton 2019-11-14 0:54 ` Shile Zhang 2019-11-14 13:56 ` Michal Hocko 2019-11-14 14:40 ` Shile Zhang 2019-11-14 17:14 ` Joerg Roedel 2019-11-14 17:12 ` Joerg Roedel 2019-11-15 0:01 ` Andrew Morton 2019-11-15 8:38 ` Borislav Petkov 2019-11-15 23:37 ` Andrew Morton 2019-11-16 17:38 ` Borislav Petkov 2019-11-15 1:06 ` Shile Zhang
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).