linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] mm/vmalloc: Fix regression caused by needless vmalloc_sync_all()
@ 2019-11-13  9:55 Shile Zhang
  2019-11-13 15:17 ` Qian Cai
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Shile Zhang @ 2019-11-13  9:55 UTC (permalink / raw)
  To: Andrew Morton, Joerg Roedel; +Cc: linux-mm, linux-kbuild, Shile Zhang

vmalloc_sync_all() was put in the common path in
__purge_vmap_area_lazy(), for one sync issue only happened on X86_32
with PTI enabled. It is needless for X86_64, which caused a big regression
in UnixBench Shell8 testing on X86_64.
Similar regression also reported by 0-day kernel test robot in reaim
benchmarking:
https://lists.01.org/hyperkitty/list/lkp@lists.01.org/thread/4D3JPPHBNOSPFK2KEPC6KGKS6J25AIDB/

Fix it by adding more conditions.

Fixes: 3f8fd02b1bf1 ("mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()")

Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
---
 mm/vmalloc.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index a3c70e275f4e..7b9fc7966da6 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1255,11 +1255,17 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
 	if (unlikely(valist == NULL))
 		return false;
 
+#if defined(CONFIG_X86_32) && defined(CONFIG_X86_PAE)
 	/*
 	 * First make sure the mappings are removed from all page-tables
 	 * before they are freed.
+	 *
+	 * This is only needed on x86-32 with !SHARED_KERNEL_PMD, which is
+	 * the case on a PAE kernel with PTI enabled.
 	 */
-	vmalloc_sync_all();
+	if (!SHARED_KERNEL_PMD && boot_cpu_has(X86_FEATURE_PTI))
+		vmalloc_sync_all();
+#endif
 
 	/*
 	 * TODO: to calculate a flush range without looping.
-- 
2.24.0.rc2



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm/vmalloc: Fix regression caused by needless vmalloc_sync_all()
  2019-11-13  9:55 [PATCH] mm/vmalloc: Fix regression caused by needless vmalloc_sync_all() Shile Zhang
@ 2019-11-13 15:17 ` Qian Cai
  2019-11-13 21:12 ` Andrew Morton
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 13+ messages in thread
From: Qian Cai @ 2019-11-13 15:17 UTC (permalink / raw)
  To: Shile Zhang, Andrew Morton, Joerg Roedel; +Cc: linux-mm, linux-kbuild

On Wed, 2019-11-13 at 17:55 +0800, Shile Zhang wrote:
> vmalloc_sync_all() was put in the common path in
> __purge_vmap_area_lazy(), for one sync issue only happened on X86_32
> with PTI enabled. It is needless for X86_64, which caused a big regression
> in UnixBench Shell8 testing on X86_64.
> Similar regression also reported by 0-day kernel test robot in reaim
> benchmarking:
> https://lists.01.org/hyperkitty/list/lkp@lists.01.org/thread/4D3JPPHBNOSPFK2KEPC6KGKS6J25AIDB/
> 
> Fix it by adding more conditions.
> 
> Fixes: 3f8fd02b1bf1 ("mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()")
> 
> Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
> ---
>  mm/vmalloc.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index a3c70e275f4e..7b9fc7966da6 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1255,11 +1255,17 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
>  	if (unlikely(valist == NULL))
>  		return false;
>  
> +#if defined(CONFIG_X86_32) && defined(CONFIG_X86_PAE)

Is it cleaner to use if (IS_ENABLED()...) ?

>  	/*
>  	 * First make sure the mappings are removed from all page-tables
>  	 * before they are freed.
> +	 *
> +	 * This is only needed on x86-32 with !SHARED_KERNEL_PMD, which is
> +	 * the case on a PAE kernel with PTI enabled.
>  	 */
> -	vmalloc_sync_all();
> +	if (!SHARED_KERNEL_PMD && boot_cpu_has(X86_FEATURE_PTI))
> +		vmalloc_sync_all();
> +#endif
>  
>  	/*
>  	 * TODO: to calculate a flush range without looping.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm/vmalloc: Fix regression caused by needless vmalloc_sync_all()
  2019-11-13  9:55 [PATCH] mm/vmalloc: Fix regression caused by needless vmalloc_sync_all() Shile Zhang
  2019-11-13 15:17 ` Qian Cai
@ 2019-11-13 21:12 ` Andrew Morton
  2019-11-14  0:54   ` Shile Zhang
  2019-11-14 13:56 ` Michal Hocko
  2019-11-14 17:12 ` Joerg Roedel
  3 siblings, 1 reply; 13+ messages in thread
From: Andrew Morton @ 2019-11-13 21:12 UTC (permalink / raw)
  To: Shile Zhang
  Cc: Joerg Roedel, linux-mm, linux-kbuild, Thomas Gleixner,
	Dave Hansen, Qian Cai

On Wed, 13 Nov 2019 17:55:30 +0800 Shile Zhang <shile.zhang@linux.alibaba.com> wrote:

> vmalloc_sync_all() was put in the common path in
> __purge_vmap_area_lazy(), for one sync issue only happened on X86_32
> with PTI enabled. It is needless for X86_64, which caused a big regression
> in UnixBench Shell8 testing on X86_64.
> Similar regression also reported by 0-day kernel test robot in reaim
> benchmarking:
> https://lists.01.org/hyperkitty/list/lkp@lists.01.org/thread/4D3JPPHBNOSPFK2KEPC6KGKS6J25AIDB/

That is indeed a large performance regression.

> Fix it by adding more conditions.
> 
> Fixes: 3f8fd02b1bf1 ("mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()")
>
> ...
>
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1255,11 +1255,17 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
>  	if (unlikely(valist == NULL))
>  		return false;
>  
> +#if defined(CONFIG_X86_32) && defined(CONFIG_X86_PAE)

Are we sure that x86_32 is the only architecture whcih is (or ever will
be) affected?

>  	/*
>  	 * First make sure the mappings are removed from all page-tables
>  	 * before they are freed.
> +	 *
> +	 * This is only needed on x86-32 with !SHARED_KERNEL_PMD, which is
> +	 * the case on a PAE kernel with PTI enabled.
>  	 */
> -	vmalloc_sync_all();
> +	if (!SHARED_KERNEL_PMD && boot_cpu_has(X86_FEATURE_PTI))
> +		vmalloc_sync_all();
> +#endif
>  
>  	/*
>  	 * TODO: to calculate a flush range without looping.

CONFIG_X86_PAE depends on CONFIG_X86_32 so no need to check
CONFIG_X86_32?

From: Andrew Morton <akpm@linux-foundation.org>
Subject: mm-vmalloc-fix-regression-caused-by-needless-vmalloc_sync_all-fix

simplify config expression, use IS_ENABLED()

Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Qian Cai <cai@lca.pw>
Cc: Shile Zhang <shile.zhang@linux.alibaba.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/vmalloc.c |   21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

--- a/mm/vmalloc.c~mm-vmalloc-fix-regression-caused-by-needless-vmalloc_sync_all-fix
+++ a/mm/vmalloc.c
@@ -1255,16 +1255,17 @@ static bool __purge_vmap_area_lazy(unsig
 	if (unlikely(valist == NULL))
 		return false;
 
-#if defined(CONFIG_X86_32) && defined(CONFIG_X86_PAE)
-	/*
-	 * First make sure the mappings are removed from all page-tables
-	 * before they are freed.
-	 *
-	 * This is only needed on x86-32 with !SHARED_KERNEL_PMD, which is
-	 * the case on a PAE kernel with PTI enabled.
-	 */
-	if (!SHARED_KERNEL_PMD && boot_cpu_has(X86_FEATURE_PTI))
-		vmalloc_sync_all();
+	if (IS_ENABLED(CONFIG_X86_PAE)) {
+		/*
+		 * First make sure the mappings are removed from all page-tables
+		 * before they are freed.
+		 *
+		 * This is only needed on x86-32 with !SHARED_KERNEL_PMD, which
+		 * is the case on a PAE kernel with PTI enabled.
+		 */
+		if (!SHARED_KERNEL_PMD && boot_cpu_has(X86_FEATURE_PTI))
+			vmalloc_sync_all();
+	}
 #endif
 
 	/*
_



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm/vmalloc: Fix regression caused by needless vmalloc_sync_all()
  2019-11-13 21:12 ` Andrew Morton
@ 2019-11-14  0:54   ` Shile Zhang
  0 siblings, 0 replies; 13+ messages in thread
From: Shile Zhang @ 2019-11-14  0:54 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Joerg Roedel, linux-mm, linux-kernel, Thomas Gleixner,
	Dave Hansen, Qian Cai

Fix wrong cc list.

On 2019/11/14 05:12, Andrew Morton wrote:
> On Wed, 13 Nov 2019 17:55:30 +0800 Shile Zhang <shile.zhang@linux.alibaba.com> wrote:
>
>> vmalloc_sync_all() was put in the common path in
>> __purge_vmap_area_lazy(), for one sync issue only happened on X86_32
>> with PTI enabled. It is needless for X86_64, which caused a big regression
>> in UnixBench Shell8 testing on X86_64.
>> Similar regression also reported by 0-day kernel test robot in reaim
>> benchmarking:
>> https://lists.01.org/hyperkitty/list/lkp@lists.01.org/thread/4D3JPPHBNOSPFK2KEPC6KGKS6J25AIDB/
> That is indeed a large performance regression.
>
>> Fix it by adding more conditions.
>>
>> Fixes: 3f8fd02b1bf1 ("mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()")
>>
>> ...
>>
>> --- a/mm/vmalloc.c
>> +++ b/mm/vmalloc.c
>> @@ -1255,11 +1255,17 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
>>   	if (unlikely(valist == NULL))
>>   		return false;
>>   
>> +#if defined(CONFIG_X86_32) && defined(CONFIG_X86_PAE)
> Are we sure that x86_32 is the only architecture whcih is (or ever will
> be) affected?
>
>>   	/*
>>   	 * First make sure the mappings are removed from all page-tables
>>   	 * before they are freed.
>> +	 *
>> +	 * This is only needed on x86-32 with !SHARED_KERNEL_PMD, which is
>> +	 * the case on a PAE kernel with PTI enabled.
>>   	 */
>> -	vmalloc_sync_all();
>> +	if (!SHARED_KERNEL_PMD && boot_cpu_has(X86_FEATURE_PTI))
>> +		vmalloc_sync_all();
>> +#endif
>>   
>>   	/*
>>   	 * TODO: to calculate a flush range without looping.
> CONFIG_X86_PAE depends on CONFIG_X86_32 so no need to check
> CONFIG_X86_32?

Yes, Thanks for your review and kindly refactoring!

> From: Andrew Morton <akpm@linux-foundation.org>
> Subject: mm-vmalloc-fix-regression-caused-by-needless-vmalloc_sync_all-fix
>
> simplify config expression, use IS_ENABLED()
>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Joerg Roedel <jroedel@suse.de>
> Cc: Qian Cai <cai@lca.pw>
> Cc: Shile Zhang <shile.zhang@linux.alibaba.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
>
>   mm/vmalloc.c |   21 +++++++++++----------
>   1 file changed, 11 insertions(+), 10 deletions(-)
>
> --- a/mm/vmalloc.c~mm-vmalloc-fix-regression-caused-by-needless-vmalloc_sync_all-fix
> +++ a/mm/vmalloc.c
> @@ -1255,16 +1255,17 @@ static bool __purge_vmap_area_lazy(unsig
>   	if (unlikely(valist == NULL))
>   		return false;
>   
> -#if defined(CONFIG_X86_32) && defined(CONFIG_X86_PAE)
> -	/*
> -	 * First make sure the mappings are removed from all page-tables
> -	 * before they are freed.
> -	 *
> -	 * This is only needed on x86-32 with !SHARED_KERNEL_PMD, which is
> -	 * the case on a PAE kernel with PTI enabled.
> -	 */
> -	if (!SHARED_KERNEL_PMD && boot_cpu_has(X86_FEATURE_PTI))
> -		vmalloc_sync_all();
> +	if (IS_ENABLED(CONFIG_X86_PAE)) {
> +		/*
> +		 * First make sure the mappings are removed from all page-tables
> +		 * before they are freed.
> +		 *
> +		 * This is only needed on x86-32 with !SHARED_KERNEL_PMD, which
> +		 * is the case on a PAE kernel with PTI enabled.
> +		 */
> +		if (!SHARED_KERNEL_PMD && boot_cpu_has(X86_FEATURE_PTI))
> +			vmalloc_sync_all();
> +	}
>   #endif
>   
>   	/*
> _



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm/vmalloc: Fix regression caused by needless vmalloc_sync_all()
  2019-11-13  9:55 [PATCH] mm/vmalloc: Fix regression caused by needless vmalloc_sync_all() Shile Zhang
  2019-11-13 15:17 ` Qian Cai
  2019-11-13 21:12 ` Andrew Morton
@ 2019-11-14 13:56 ` Michal Hocko
  2019-11-14 14:40   ` Shile Zhang
  2019-11-14 17:12 ` Joerg Roedel
  3 siblings, 1 reply; 13+ messages in thread
From: Michal Hocko @ 2019-11-14 13:56 UTC (permalink / raw)
  To: Shile Zhang; +Cc: Andrew Morton, Joerg Roedel, linux-mm, linux-kbuild

On Wed 13-11-19 17:55:30, Shile Zhang wrote:
> vmalloc_sync_all() was put in the common path in
> __purge_vmap_area_lazy(), for one sync issue only happened on X86_32
> with PTI enabled. It is needless for X86_64, which caused a big regression
> in UnixBench Shell8 testing on X86_64.
> Similar regression also reported by 0-day kernel test robot in reaim
> benchmarking:
> https://lists.01.org/hyperkitty/list/lkp@lists.01.org/thread/4D3JPPHBNOSPFK2KEPC6KGKS6J25AIDB/
> 
> Fix it by adding more conditions.

This doesn't really explain a lot. Could you explain why the syncing
should be limited only to PAE and !SHARED_KERNEL_PMD? 

> Fixes: 3f8fd02b1bf1 ("mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()")
> 
> Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
> ---
>  mm/vmalloc.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index a3c70e275f4e..7b9fc7966da6 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1255,11 +1255,17 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
>  	if (unlikely(valist == NULL))
>  		return false;
>  
> +#if defined(CONFIG_X86_32) && defined(CONFIG_X86_PAE)
>  	/*
>  	 * First make sure the mappings are removed from all page-tables
>  	 * before they are freed.
> +	 *
> +	 * This is only needed on x86-32 with !SHARED_KERNEL_PMD, which is
> +	 * the case on a PAE kernel with PTI enabled.
>  	 */
> -	vmalloc_sync_all();
> +	if (!SHARED_KERNEL_PMD && boot_cpu_has(X86_FEATURE_PTI))
> +		vmalloc_sync_all();
> +#endif
>  
>  	/*
>  	 * TODO: to calculate a flush range without looping.
> -- 
> 2.24.0.rc2
> 

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm/vmalloc: Fix regression caused by needless vmalloc_sync_all()
  2019-11-14 13:56 ` Michal Hocko
@ 2019-11-14 14:40   ` Shile Zhang
  2019-11-14 17:14     ` Joerg Roedel
  0 siblings, 1 reply; 13+ messages in thread
From: Shile Zhang @ 2019-11-14 14:40 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Andrew Morton, Joerg Roedel, linux-mm, linux-kernel



On 2019/11/14 21:56, Michal Hocko wrote:
> On Wed 13-11-19 17:55:30, Shile Zhang wrote:
>> vmalloc_sync_all() was put in the common path in
>> __purge_vmap_area_lazy(), for one sync issue only happened on X86_32
>> with PTI enabled. It is needless for X86_64, which caused a big regression
>> in UnixBench Shell8 testing on X86_64.
>> Similar regression also reported by 0-day kernel test robot in reaim
>> benchmarking:
>> https://lists.01.org/hyperkitty/list/lkp@lists.01.org/thread/4D3JPPHBNOSPFK2KEPC6KGKS6J25AIDB/
>>
>> Fix it by adding more conditions.
> This doesn't really explain a lot. Could you explain why the syncing
> should be limited only to PAE and !SHARED_KERNEL_PMD?

Thanks for your review!
Sorry for that, I'm not clear about the original issue the patch
3f8fd02b1bf1 ("mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()")
fixed.

I just get that limitation info from the commit log as following:
"
This is only needed x86-32 with !SHARED_KERNEL_PMD, which is the case on 
a PAE kernel with PTI enabled. On affected systems the missing sync 
causes old mappings to persist in some page-tables, causing data 
corruption and other undefined behavior.
"
https://patchwork.kernel.org/cover/11050511/

Hi, Joerg
Could you please help to recall the original issue you encountered before?
Thanks!

>
>> Fixes: 3f8fd02b1bf1 ("mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()")
>>
>> Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
>> ---
>>   mm/vmalloc.c | 8 +++++++-
>>   1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>> index a3c70e275f4e..7b9fc7966da6 100644
>> --- a/mm/vmalloc.c
>> +++ b/mm/vmalloc.c
>> @@ -1255,11 +1255,17 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
>>   	if (unlikely(valist == NULL))
>>   		return false;
>>   
>> +#if defined(CONFIG_X86_32) && defined(CONFIG_X86_PAE)
>>   	/*
>>   	 * First make sure the mappings are removed from all page-tables
>>   	 * before they are freed.
>> +	 *
>> +	 * This is only needed on x86-32 with !SHARED_KERNEL_PMD, which is
>> +	 * the case on a PAE kernel with PTI enabled.
>>   	 */
>> -	vmalloc_sync_all();
>> +	if (!SHARED_KERNEL_PMD && boot_cpu_has(X86_FEATURE_PTI))
>> +		vmalloc_sync_all();
>> +#endif
>>   
>>   	/*
>>   	 * TODO: to calculate a flush range without looping.
>> -- 
>> 2.24.0.rc2
>>



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm/vmalloc: Fix regression caused by needless vmalloc_sync_all()
  2019-11-13  9:55 [PATCH] mm/vmalloc: Fix regression caused by needless vmalloc_sync_all() Shile Zhang
                   ` (2 preceding siblings ...)
  2019-11-14 13:56 ` Michal Hocko
@ 2019-11-14 17:12 ` Joerg Roedel
  2019-11-15  0:01   ` Andrew Morton
  2019-11-15  1:06   ` Shile Zhang
  3 siblings, 2 replies; 13+ messages in thread
From: Joerg Roedel @ 2019-11-14 17:12 UTC (permalink / raw)
  To: Shile Zhang; +Cc: Andrew Morton, linux-mm, linux-kbuild

On Wed, Nov 13, 2019 at 05:55:30PM +0800, Shile Zhang wrote:
> +#if defined(CONFIG_X86_32) && defined(CONFIG_X86_PAE)
>  	/*
>  	 * First make sure the mappings are removed from all page-tables
>  	 * before they are freed.
> +	 *
> +	 * This is only needed on x86-32 with !SHARED_KERNEL_PMD, which is
> +	 * the case on a PAE kernel with PTI enabled.
>  	 */
> -	vmalloc_sync_all();
> +	if (!SHARED_KERNEL_PMD && boot_cpu_has(X86_FEATURE_PTI))
> +		vmalloc_sync_all();
> +#endif

I already submitted another fix for this quite some time ago:

	https://lore.kernel.org/lkml/20191009124418.8286-1-joro@8bytes.org/

Regards,

	Joerg



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm/vmalloc: Fix regression caused by needless vmalloc_sync_all()
  2019-11-14 14:40   ` Shile Zhang
@ 2019-11-14 17:14     ` Joerg Roedel
  0 siblings, 0 replies; 13+ messages in thread
From: Joerg Roedel @ 2019-11-14 17:14 UTC (permalink / raw)
  To: Shile Zhang; +Cc: Michal Hocko, Andrew Morton, linux-mm, linux-kernel

On Thu, Nov 14, 2019 at 10:40:22PM +0800, Shile Zhang wrote:
> Could you please help to recall the original issue you encountered before?

The original issue was data corruption because old mappings did not get
removed from the vmalloc area, and thus also new mappings did not get
faulted in. So depending on the page-table currently loaded one
vmalloc/ioremap area pointed to different (often already freed or
re-used) memory and caused the corruption.

Regards,

	Joerg


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm/vmalloc: Fix regression caused by needless vmalloc_sync_all()
  2019-11-14 17:12 ` Joerg Roedel
@ 2019-11-15  0:01   ` Andrew Morton
  2019-11-15  8:38     ` Borislav Petkov
  2019-11-15  1:06   ` Shile Zhang
  1 sibling, 1 reply; 13+ messages in thread
From: Andrew Morton @ 2019-11-15  0:01 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Shile Zhang, linux-mm, linux-kbuild, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, hpa

On Thu, 14 Nov 2019 18:12:31 +0100 Joerg Roedel <jroedel@suse.de> wrote:

> On Wed, Nov 13, 2019 at 05:55:30PM +0800, Shile Zhang wrote:
> > +#if defined(CONFIG_X86_32) && defined(CONFIG_X86_PAE)
> >  	/*
> >  	 * First make sure the mappings are removed from all page-tables
> >  	 * before they are freed.
> > +	 *
> > +	 * This is only needed on x86-32 with !SHARED_KERNEL_PMD, which is
> > +	 * the case on a PAE kernel with PTI enabled.
> >  	 */
> > -	vmalloc_sync_all();
> > +	if (!SHARED_KERNEL_PMD && boot_cpu_has(X86_FEATURE_PTI))
> > +		vmalloc_sync_all();
> > +#endif
> 
> I already submitted another fix for this quite some time ago:
> 
> 	https://lore.kernel.org/lkml/20191009124418.8286-1-joro@8bytes.org/
> 

Your patch is more verbose but looks quite a bit nicer than this patch
(Link:
http://lkml.kernel.org/r/20191113095530.228959-1-shile.zhang@linux.alibaba.com).
The separation of sync-for-a-mapping versus sync-for-an-unmapping adds
clarity.


It's fairly urgent - I consider this to be -stable material.

Thomas & co, was that a deliberate skip?




^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm/vmalloc: Fix regression caused by needless vmalloc_sync_all()
  2019-11-14 17:12 ` Joerg Roedel
  2019-11-15  0:01   ` Andrew Morton
@ 2019-11-15  1:06   ` Shile Zhang
  1 sibling, 0 replies; 13+ messages in thread
From: Shile Zhang @ 2019-11-15  1:06 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: Andrew Morton, linux-mm, linux-kernel



On 2019/11/15 01:12, Joerg Roedel wrote:
> On Wed, Nov 13, 2019 at 05:55:30PM +0800, Shile Zhang wrote:
>> +#if defined(CONFIG_X86_32) && defined(CONFIG_X86_PAE)
>>   	/*
>>   	 * First make sure the mappings are removed from all page-tables
>>   	 * before they are freed.
>> +	 *
>> +	 * This is only needed on x86-32 with !SHARED_KERNEL_PMD, which is
>> +	 * the case on a PAE kernel with PTI enabled.
>>   	 */
>> -	vmalloc_sync_all();
>> +	if (!SHARED_KERNEL_PMD && boot_cpu_has(X86_FEATURE_PTI))
>> +		vmalloc_sync_all();
>> +#endif
> I already submitted another fix for this quite some time ago:
>
> 	https://lore.kernel.org/lkml/20191009124418.8286-1-joro@8bytes.org/
>
> Regards,
>
> 	Joerg
Oh, sorry for I missed that, good job and thanks for your work!


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm/vmalloc: Fix regression caused by needless vmalloc_sync_all()
  2019-11-15  0:01   ` Andrew Morton
@ 2019-11-15  8:38     ` Borislav Petkov
  2019-11-15 23:37       ` Andrew Morton
  0 siblings, 1 reply; 13+ messages in thread
From: Borislav Petkov @ 2019-11-15  8:38 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Joerg Roedel, Shile Zhang, linux-mm, linux-kbuild, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	hpa

On Thu, Nov 14, 2019 at 04:01:08PM -0800, Andrew Morton wrote:
> It's fairly urgent - I consider this to be -stable material.
> 
> Thomas & co, was that a deliberate skip?

More like lost in the mail flood. I'm assuming you're taking it?

In any case, patch looks ok to me - the "unmappings" naming sounds like
coming from someone with a very wild phantasy... :-)

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm/vmalloc: Fix regression caused by needless vmalloc_sync_all()
  2019-11-15  8:38     ` Borislav Petkov
@ 2019-11-15 23:37       ` Andrew Morton
  2019-11-16 17:38         ` Borislav Petkov
  0 siblings, 1 reply; 13+ messages in thread
From: Andrew Morton @ 2019-11-15 23:37 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Joerg Roedel, Shile Zhang, linux-mm, linux-kbuild, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	hpa

On Fri, 15 Nov 2019 09:38:47 +0100 Borislav Petkov <bp@alien8.de> wrote:

> On Thu, Nov 14, 2019 at 04:01:08PM -0800, Andrew Morton wrote:
> > It's fairly urgent - I consider this to be -stable material.
> > 
> > Thomas & co, was that a deliberate skip?
> 
> More like lost in the mail flood. I'm assuming you're taking it?
> 
> In any case, patch looks ok to me - the "unmappings" naming sounds like
> coming from someone with a very wild phantasy... :-)
> 

OK, thanks, here's what I queued.  Reviews, acks and testing, please??

From: Joerg Roedel <jroedel@suse.de>
Subject: x86/mm: Split vmalloc_sync_all()

Commit 3f8fd02b1bf1 ("mm/vmalloc: Sync unmappings in
__purge_vmap_area_lazy()") introduced a call to vmalloc_sync_all() in the
vunmap() code-path.  While this change was necessary to maintain
correctness on x86-32-pae kernels, it also adds additional cycles for
architectures that don't need it.

Specifically on x86-64 with CONFIG_VMAP_STACK=y some people reported
severe performance regressions in micro-benchmarks because it now also
calls the x86-64 implementation of vmalloc_sync_all() on vunmap().  But
the vmalloc_sync_all() implementation on x86-64 is only needed for newly
created mappings.

To avoid the unnecessary work on x86-64 and to gain the performance back,
split up vmalloc_sync_all() into two functions:

	* vmalloc_sync_mappings(), and
	* vmalloc_sync_unmappings()

Most call-sites to vmalloc_sync_all() only care about new mappings being
synchronized.  The only exception is the new call-site added in the above
mentioned commit.

Shile Zhang directed us to a report of an 80% regression in reaim
throughput.

Link: http://lkml.kernel.org/r/20191009124418.8286-1-joro@8bytes.org
Link: https://lists.01.org/hyperkitty/list/lkp@lists.01.org/thread/4D3JPPHBNOSPFK2KEPC6KGKS6J25AIDB/
Link: http://lkml.kernel.org/r/20191113095530.228959-1-shile.zhang@linux.alibaba.com
Fixes: 3f8fd02b1bf1 ("mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()")
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Reported-by: kernel test robot <oliver.sang@intel.com>
Reported-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	[GHES]
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/x86/mm/fault.c      |   26 ++++++++++++++++++++++++--
 drivers/acpi/apei/ghes.c |    2 +-
 include/linux/vmalloc.h  |    5 +++--
 kernel/notifier.c        |    2 +-
 mm/nommu.c               |   10 +++++++---
 mm/vmalloc.c             |   11 +++++++----
 6 files changed, 43 insertions(+), 13 deletions(-)

--- a/arch/x86/mm/fault.c~x86-mm-split-vmalloc_sync_all
+++ a/arch/x86/mm/fault.c
@@ -189,7 +189,7 @@ static inline pmd_t *vmalloc_sync_one(pg
 	return pmd_k;
 }
 
-void vmalloc_sync_all(void)
+static void vmalloc_sync(void)
 {
 	unsigned long address;
 
@@ -216,6 +216,16 @@ void vmalloc_sync_all(void)
 	}
 }
 
+void vmalloc_sync_mappings(void)
+{
+	vmalloc_sync();
+}
+
+void vmalloc_sync_unmappings(void)
+{
+	vmalloc_sync();
+}
+
 /*
  * 32-bit:
  *
@@ -318,11 +328,23 @@ out:
 
 #else /* CONFIG_X86_64: */
 
-void vmalloc_sync_all(void)
+void vmalloc_sync_mappings(void)
 {
+	/*
+	 * 64-bit mappings might allocate new p4d/pud pages
+	 * that need to be propagated to all tasks' PGDs.
+	 */
 	sync_global_pgds(VMALLOC_START & PGDIR_MASK, VMALLOC_END);
 }
 
+void vmalloc_sync_unmappings(void)
+{
+	/*
+	 * Unmappings never allocate or free p4d/pud pages.
+	 * No work is required here.
+	 */
+}
+
 /*
  * 64-bit:
  *
--- a/drivers/acpi/apei/ghes.c~x86-mm-split-vmalloc_sync_all
+++ a/drivers/acpi/apei/ghes.c
@@ -171,7 +171,7 @@ int ghes_estatus_pool_init(int num_ghes)
 	 * New allocation must be visible in all pgd before it can be found by
 	 * an NMI allocating from the pool.
 	 */
-	vmalloc_sync_all();
+	vmalloc_sync_mappings();
 
 	rc = gen_pool_add(ghes_estatus_pool, addr, PAGE_ALIGN(len), -1);
 	if (rc)
--- a/include/linux/vmalloc.h~x86-mm-split-vmalloc_sync_all
+++ a/include/linux/vmalloc.h
@@ -126,8 +126,9 @@ extern int remap_vmalloc_range_partial(s
 
 extern int remap_vmalloc_range(struct vm_area_struct *vma, void *addr,
 							unsigned long pgoff);
-void vmalloc_sync_all(void);
- 
+void vmalloc_sync_mappings(void);
+void vmalloc_sync_unmappings(void);
+
 /*
  *	Lowlevel-APIs (not for driver use!)
  */
--- a/kernel/notifier.c~x86-mm-split-vmalloc_sync_all
+++ a/kernel/notifier.c
@@ -554,7 +554,7 @@ NOKPROBE_SYMBOL(notify_die);
 
 int register_die_notifier(struct notifier_block *nb)
 {
-	vmalloc_sync_all();
+	vmalloc_sync_mappings();
 	return atomic_notifier_chain_register(&die_chain, nb);
 }
 EXPORT_SYMBOL_GPL(register_die_notifier);
--- a/mm/nommu.c~x86-mm-split-vmalloc_sync_all
+++ a/mm/nommu.c
@@ -359,10 +359,14 @@ void vm_unmap_aliases(void)
 EXPORT_SYMBOL_GPL(vm_unmap_aliases);
 
 /*
- * Implement a stub for vmalloc_sync_all() if the architecture chose not to
- * have one.
+ * Implement a stub for vmalloc_sync_[un]mapping() if the architecture
+ * chose not to have one.
  */
-void __weak vmalloc_sync_all(void)
+void __weak vmalloc_sync_mappings(void)
+{
+}
+
+void __weak vmalloc_sync_unmappings(void)
 {
 }
 
--- a/mm/vmalloc.c~x86-mm-split-vmalloc_sync_all
+++ a/mm/vmalloc.c
@@ -1259,7 +1259,7 @@ static bool __purge_vmap_area_lazy(unsig
 	 * First make sure the mappings are removed from all page-tables
 	 * before they are freed.
 	 */
-	vmalloc_sync_all();
+	vmalloc_sync_unmappings();
 
 	/*
 	 * TODO: to calculate a flush range without looping.
@@ -3050,16 +3050,19 @@ int remap_vmalloc_range(struct vm_area_s
 EXPORT_SYMBOL(remap_vmalloc_range);
 
 /*
- * Implement a stub for vmalloc_sync_all() if the architecture chose not to
- * have one.
+ * Implement stubs for vmalloc_sync_[un]mappings () if the architecture chose
+ * not to have one.
  *
  * The purpose of this function is to make sure the vmalloc area
  * mappings are identical in all page-tables in the system.
  */
-void __weak vmalloc_sync_all(void)
+void __weak vmalloc_sync_mappings(void)
 {
 }
 
+void __weak vmalloc_sync_unmappings(void)
+{
+}
 
 static int f(pte_t *pte, unsigned long addr, void *data)
 {
_



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm/vmalloc: Fix regression caused by needless vmalloc_sync_all()
  2019-11-15 23:37       ` Andrew Morton
@ 2019-11-16 17:38         ` Borislav Petkov
  0 siblings, 0 replies; 13+ messages in thread
From: Borislav Petkov @ 2019-11-16 17:38 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Joerg Roedel, Shile Zhang, linux-mm, linux-kbuild, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	hpa

On Fri, Nov 15, 2019 at 03:37:21PM -0800, Andrew Morton wrote:
> OK, thanks, here's what I queued.  Reviews, acks and testing, please??
> 
> From: Joerg Roedel <jroedel@suse.de>
> Subject: x86/mm: Split vmalloc_sync_all()
> 
> Commit 3f8fd02b1bf1 ("mm/vmalloc: Sync unmappings in
> __purge_vmap_area_lazy()") introduced a call to vmalloc_sync_all() in the
> vunmap() code-path.  While this change was necessary to maintain
> correctness on x86-32-pae kernels, it also adds additional cycles for
> architectures that don't need it.

...

>  arch/x86/mm/fault.c      |   26 ++++++++++++++++++++++++--
>  drivers/acpi/apei/ghes.c |    2 +-
>  include/linux/vmalloc.h  |    5 +++--
>  kernel/notifier.c        |    2 +-
>  mm/nommu.c               |   10 +++++++---
>  mm/vmalloc.c             |   11 +++++++----
>  6 files changed, 43 insertions(+), 13 deletions(-)

Booting it looks ok here:

Tested-by: Borislav Petkov <bp@suse.de>

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2019-11-16 17:38 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-13  9:55 [PATCH] mm/vmalloc: Fix regression caused by needless vmalloc_sync_all() Shile Zhang
2019-11-13 15:17 ` Qian Cai
2019-11-13 21:12 ` Andrew Morton
2019-11-14  0:54   ` Shile Zhang
2019-11-14 13:56 ` Michal Hocko
2019-11-14 14:40   ` Shile Zhang
2019-11-14 17:14     ` Joerg Roedel
2019-11-14 17:12 ` Joerg Roedel
2019-11-15  0:01   ` Andrew Morton
2019-11-15  8:38     ` Borislav Petkov
2019-11-15 23:37       ` Andrew Morton
2019-11-16 17:38         ` Borislav Petkov
2019-11-15  1:06   ` Shile Zhang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).