* [PATCH v3 1/3] mm/vmalloc: remove preempt_disable/enable when do preloading
@ 2019-10-16  9:54 Uladzislau Rezki (Sony)
  2019-10-16  9:54 ` [PATCH v3 2/3] mm/vmalloc: respect passed gfp_mask " Uladzislau Rezki (Sony)
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Uladzislau Rezki (Sony) @ 2019-10-16  9:54 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Daniel Wagner, Sebastian Andrzej Siewior, Thomas Gleixner,
	linux-mm, LKML, Peter Zijlstra, Uladzislau Rezki, Hillf Danton,
	Michal Hocko, Matthew Wilcox, Oleksiy Avramchenko,
	Steven Rostedt

Some background: preemption was disabled to guarantee that a
preloaded object was available for the CPU it was stored for.

The aim was to avoid allocating in atomic context when the spinlock
is taken later, for regular vmap allocations. But that approach
conflicts with the CONFIG_PREEMPT_RT philosophy: calling spin_lock()
with preemption disabled is forbidden in a CONFIG_PREEMPT_RT kernel.

Therefore, get rid of preempt_disable() and preempt_enable() when
the preload is done for split purposes. As a result we no longer
guarantee that a CPU is preloaded; instead, this change minimizes
the cases when it is not.

For example, I ran a special test case that follows the preload
pattern and path. 20 "unbind" threads ran it and each did 1000000
allocations. On average, only 3.5 out of 1000000 allocations found
the CPU not preloaded. So it can happen, but the number is negligible.
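
The test module itself is not included in this series; purely for
illustration, a minimal harness following the same pattern could look
like the sketch below. The thread and iteration counts match the
numbers above, while the allocation size, the names and the module
boilerplate are assumptions rather than the real test code:

<snip>
#include <linux/module.h>
#include <linux/kthread.h>
#include <linux/vmalloc.h>
#include <linux/mm.h>

#define NR_THREADS	20
#define NR_ALLOCS	1000000

static int stress_fn(void *arg)
{
	void *p;
	int i;

	/* Each iteration goes through the alloc_vmap_area() preload path. */
	for (i = 0; i < NR_ALLOCS && !kthread_should_stop(); i++) {
		p = vmalloc(PAGE_SIZE);
		if (p)
			vfree(p);
	}

	return 0;
}

static int __init vmap_preload_stress_init(void)
{
	int i;

	/* Threads are left unbound, so they are free to migrate between CPUs. */
	for (i = 0; i < NR_THREADS; i++)
		kthread_run(stress_fn, NULL, "vmap-stress/%d", i);

	return 0;
}
module_init(vmap_preload_stress_init);
MODULE_LICENSE("GPL");
<snip>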

V2 -> V3:
    - update the commit message

V1 -> V2:
  - move the __this_cpu_cmpxchg() check to after spin_lock() is taken,
    as proposed by Andrew Morton
  - add more explanation regarding preloading
  - adjust and move some comments

Fixes: 82dd23e84be3 ("mm/vmalloc.c: preload a CPU with one object for split purpose")
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 mm/vmalloc.c | 37 ++++++++++++++++++++-----------------
 1 file changed, 20 insertions(+), 17 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index e92ff5f7dd8b..b7b443bfdd92 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1078,31 +1078,34 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
 
 retry:
 	/*
-	 * Preload this CPU with one extra vmap_area object to ensure
-	 * that we have it available when fit type of free area is
-	 * NE_FIT_TYPE.
+	 * Preload this CPU with one extra vmap_area object. It is used
+	 * when fit type of free area is NE_FIT_TYPE. Please note, it
+	 * does not guarantee that an allocation occurs on a CPU that
+	 * is preloaded, instead we minimize the case when it is not.
+	 * It can happen because of cpu migration, because there is a
+	 * race until the below spinlock is taken.
 	 *
 	 * The preload is done in non-atomic context, thus it allows us
 	 * to use more permissive allocation masks to be more stable under
-	 * low memory condition and high memory pressure.
+	 * low memory condition and high memory pressure. In rare case,
+	 * if not preloaded, GFP_NOWAIT is used.
 	 *
-	 * Even if it fails we do not really care about that. Just proceed
-	 * as it is. "overflow" path will refill the cache we allocate from.
+	 * Set "pva" to NULL here, because of "retry" path.
 	 */
-	preempt_disable();
-	if (!__this_cpu_read(ne_fit_preload_node)) {
-		preempt_enable();
-		pva = kmem_cache_alloc_node(vmap_area_cachep, GFP_KERNEL, node);
-		preempt_disable();
+	pva = NULL;
 
-		if (__this_cpu_cmpxchg(ne_fit_preload_node, NULL, pva)) {
-			if (pva)
-				kmem_cache_free(vmap_area_cachep, pva);
-		}
-	}
+	if (!this_cpu_read(ne_fit_preload_node))
+		/*
+		 * Even if it fails we do not really care about that.
+		 * Just proceed as it is. If needed "overflow" path
+		 * will refill the cache we allocate from.
+		 */
+		pva = kmem_cache_alloc_node(vmap_area_cachep, GFP_KERNEL, node);
 
 	spin_lock(&vmap_area_lock);
-	preempt_enable();
+
+	if (pva && __this_cpu_cmpxchg(ne_fit_preload_node, NULL, pva))
+		kmem_cache_free(vmap_area_cachep, pva);
 
 	/*
 	 * If an allocation fails, the "vend" address is
-- 
2.20.1


* [PATCH v3 2/3] mm/vmalloc: respect passed gfp_mask when do preloading
  2019-10-16  9:54 [PATCH v3 1/3] mm/vmalloc: remove preempt_disable/enable when do preloading Uladzislau Rezki (Sony)
@ 2019-10-16  9:54 ` Uladzislau Rezki (Sony)
  2019-10-16 11:06   ` Michal Hocko
  2019-10-16  9:54 ` [PATCH v3 3/3] mm/vmalloc: add more comments to the adjust_va_to_fit_type() Uladzislau Rezki (Sony)
  2019-10-16 10:59 ` [PATCH v3 1/3] mm/vmalloc: remove preempt_disable/enable when do preloading Michal Hocko
  2 siblings, 1 reply; 11+ messages in thread
From: Uladzislau Rezki (Sony) @ 2019-10-16  9:54 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Daniel Wagner, Sebastian Andrzej Siewior, Thomas Gleixner,
	linux-mm, LKML, Peter Zijlstra, Uladzislau Rezki, Hillf Danton,
	Michal Hocko, Matthew Wilcox, Oleksiy Avramchenko,
	Steven Rostedt

alloc_vmap_area() is given a gfp_mask for the page allocator.
Let's respect that mask and consider it even in the case when
doing regular CPU preloading, i.e. where a context can sleep.
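
Purely as an illustration of what the GFP_RECLAIM_MASK filtering in
this patch does (the caller mask below is hypothetical; this is not
code taken from the patch itself):

<snip>
/* A caller may ask for a non-sleeping allocation: */
gfp_t gfp_mask = GFP_NOWAIT | __GFP_HIGHMEM;

/*
 * GFP_RECLAIM_MASK keeps the reclaim-related bits (GFP_NOWAIT is
 * __GFP_KSWAPD_RECLAIM, which is preserved) and drops placement
 * modifiers such as __GFP_HIGHMEM that make no sense for a slab
 * object.
 */
gfp_mask &= GFP_RECLAIM_MASK;

/* The preload now honours that mask instead of a hardcoded GFP_KERNEL: */
pva = kmem_cache_alloc_node(vmap_area_cachep, gfp_mask, node);
<snip>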

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 mm/vmalloc.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index b7b443bfdd92..593bf554518d 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1064,9 +1064,9 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
 		return ERR_PTR(-EBUSY);
 
 	might_sleep();
+	gfp_mask = gfp_mask & GFP_RECLAIM_MASK;
 
-	va = kmem_cache_alloc_node(vmap_area_cachep,
-			gfp_mask & GFP_RECLAIM_MASK, node);
+	va = kmem_cache_alloc_node(vmap_area_cachep, gfp_mask, node);
 	if (unlikely(!va))
 		return ERR_PTR(-ENOMEM);
 
@@ -1074,7 +1074,7 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
 	 * Only scan the relevant parts containing pointers to other objects
 	 * to avoid false negatives.
 	 */
-	kmemleak_scan_area(&va->rb_node, SIZE_MAX, gfp_mask & GFP_RECLAIM_MASK);
+	kmemleak_scan_area(&va->rb_node, SIZE_MAX, gfp_mask);
 
 retry:
 	/*
@@ -1100,7 +1100,7 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
 		 * Just proceed as it is. If needed "overflow" path
 		 * will refill the cache we allocate from.
 		 */
-		pva = kmem_cache_alloc_node(vmap_area_cachep, GFP_KERNEL, node);
+		pva = kmem_cache_alloc_node(vmap_area_cachep, gfp_mask, node);
 
 	spin_lock(&vmap_area_lock);
 
-- 
2.20.1


* [PATCH v3 3/3] mm/vmalloc: add more comments to the adjust_va_to_fit_type()
  2019-10-16  9:54 [PATCH v3 1/3] mm/vmalloc: remove preempt_disable/enable when do preloading Uladzislau Rezki (Sony)
  2019-10-16  9:54 ` [PATCH v3 2/3] mm/vmalloc: respect passed gfp_mask " Uladzislau Rezki (Sony)
@ 2019-10-16  9:54 ` Uladzislau Rezki (Sony)
  2019-10-16 11:07   ` Michal Hocko
  2019-10-16 10:59 ` [PATCH v3 1/3] mm/vmalloc: remove preempt_disable/enable when do preloading Michal Hocko
  2 siblings, 1 reply; 11+ messages in thread
From: Uladzislau Rezki (Sony) @ 2019-10-16  9:54 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Daniel Wagner, Sebastian Andrzej Siewior, Thomas Gleixner,
	linux-mm, LKML, Peter Zijlstra, Uladzislau Rezki, Hillf Danton,
	Michal Hocko, Matthew Wilcox, Oleksiy Avramchenko,
	Steven Rostedt

When the fit type is NE_FIT_TYPE, one extra vmap_area object is needed.
Usually the "ne_fit_preload_node" per-CPU variable holds it and there
is no need for a GFP_NOWAIT allocation, but there are exceptions.

This commit just adds more explanations, answering questions such as
when it can occur, how often, under which conditions, and what happens
if the GFP_NOWAIT allocation fails.
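
For reference, a simplified sketch of the NE_FIT_TYPE case follows.
This is not the actual adjust_va_to_fit_type() code; the helper name
is made up, and error handling and tree re-insertion are omitted:

<snip>
/*
 * No-edge fit: the request lands in the middle of a free area, so the
 * split produces two remainders and needs one extra vmap_area object.
 *
 *   |<--------------- free va --------------->|
 *   |<- lva ->|<---- requested ---->|<- va' ->|
 */
static int ne_fit_split_sketch(struct vmap_area *va,
		unsigned long nva_start_addr, unsigned long size)
{
	struct vmap_area *lva;

	/* Usually the preloaded per-CPU object is consumed here ... */
	lva = __this_cpu_xchg(ne_fit_preload_node, NULL);
	if (unlikely(!lva)) {
		/* ... and only in the rare exceptions GFP_NOWAIT is used. */
		lva = kmem_cache_alloc(vmap_area_cachep, GFP_NOWAIT);
		if (!lva)
			return -1; /* caller purges lazy areas and retries */
	}

	/* The extra object describes the left remainder ... */
	lva->va_start = va->va_start;
	lva->va_end = nva_start_addr;

	/* ... while the original area shrinks to the right remainder. */
	va->va_start = nva_start_addr + size;

	/* "lva" still has to be inserted back into the free tree/list. */
	return 0;
}
<snip>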

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 mm/vmalloc.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 593bf554518d..2290a0d270e4 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -969,6 +969,19 @@ adjust_va_to_fit_type(struct vmap_area *va,
 			 * There are a few exceptions though, as an example it is
 			 * a first allocation (early boot up) when we have "one"
 			 * big free space that has to be split.
+			 *
+			 * Also we can hit this path in case of regular "vmap"
+			 * allocations, if "this" current CPU was not preloaded.
+			 * See the comment in alloc_vmap_area() why. If so, then
+			 * GFP_NOWAIT is used instead to get an extra object for
+			 * split purpose. That is rare and most time does not
+			 * occur.
+			 *
+			 * What happens if an allocation gets failed. Basically,
+			 * an "overflow" path is triggered to purge lazily freed
+			 * areas to free some memory, then, the "retry" path is
+			 * triggered to repeat one more time. See more details
+			 * in alloc_vmap_area() function.
 			 */
 			lva = kmem_cache_alloc(vmap_area_cachep, GFP_NOWAIT);
 			if (!lva)
-- 
2.20.1


* Re: [PATCH v3 1/3] mm/vmalloc: remove preempt_disable/enable when do preloading
  2019-10-16  9:54 [PATCH v3 1/3] mm/vmalloc: remove preempt_disable/enable when do preloading Uladzislau Rezki (Sony)
  2019-10-16  9:54 ` [PATCH v3 2/3] mm/vmalloc: respect passed gfp_mask " Uladzislau Rezki (Sony)
  2019-10-16  9:54 ` [PATCH v3 3/3] mm/vmalloc: add more comments to the adjust_va_to_fit_type() Uladzislau Rezki (Sony)
@ 2019-10-16 10:59 ` Michal Hocko
  2019-10-18  9:37   ` Uladzislau Rezki
  2 siblings, 1 reply; 11+ messages in thread
From: Michal Hocko @ 2019-10-16 10:59 UTC (permalink / raw)
  To: Uladzislau Rezki (Sony)
  Cc: Andrew Morton, Daniel Wagner, Sebastian Andrzej Siewior,
	Thomas Gleixner, linux-mm, LKML, Peter Zijlstra, Hillf Danton,
	Matthew Wilcox, Oleksiy Avramchenko, Steven Rostedt

On Wed 16-10-19 11:54:36, Uladzislau Rezki (Sony) wrote:
> Some background. The preemption was disabled before to guarantee
> that a preloaded object is available for a CPU, it was stored for.

Probably good to be explicit that this has been achieved by combining
disabling the preemption with taking the spin lock while the
ne_fit_preload_node is checked resp. repopulated, right?

> The aim was to not allocate in atomic context when spinlock
> is taken later, for regular vmap allocations. But that approach
> conflicts with CONFIG_PREEMPT_RT philosophy. It means that
> calling spin_lock() with disabled preemption is forbidden
> in the CONFIG_PREEMPT_RT kernel.
> 
> Therefore, get rid of preempt_disable() and preempt_enable() when
> the preload is done for splitting purpose. As a result we do not
> guarantee now that a CPU is preloaded, instead we minimize the
> case when it is not, with this change.

by populating the per-CPU preload pointer under the vmap_area_lock.
This implies that at least each caller which has done the preallocation
will not fall back to an atomic allocation later. It is possible that
the preallocation would be pointless or that no preallocation is done
because of the race, but your data shows that this is really rare.

> For example i run the special test case that follows the preload
> pattern and path. 20 "unbind" threads run it and each does
> 1000000 allocations. Only 3.5 times among 1000000 a CPU was
> not preloaded. So it can happen but the number is negligible.
> 
> V2 - > V3:
>     - update the commit message
> 
> V1 -> V2:
>   - move __this_cpu_cmpxchg check when spin_lock is taken,
>     as proposed by Andrew Morton
>   - add more explanation in regard of preloading
>   - adjust and move some comments
> 
> Fixes: 82dd23e84be3 ("mm/vmalloc.c: preload a CPU with one object for split purpose")
> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Acked-by: Daniel Wagner <dwagner@suse.de>
> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>

Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  mm/vmalloc.c | 37 ++++++++++++++++++++-----------------
>  1 file changed, 20 insertions(+), 17 deletions(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index e92ff5f7dd8b..b7b443bfdd92 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1078,31 +1078,34 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
>  
>  retry:
>  	/*
> -	 * Preload this CPU with one extra vmap_area object to ensure
> -	 * that we have it available when fit type of free area is
> -	 * NE_FIT_TYPE.
> +	 * Preload this CPU with one extra vmap_area object. It is used
> +	 * when fit type of free area is NE_FIT_TYPE. Please note, it
> +	 * does not guarantee that an allocation occurs on a CPU that
> +	 * is preloaded, instead we minimize the case when it is not.
> +	 * It can happen because of cpu migration, because there is a
> +	 * race until the below spinlock is taken.
>  	 *
>  	 * The preload is done in non-atomic context, thus it allows us
>  	 * to use more permissive allocation masks to be more stable under
> -	 * low memory condition and high memory pressure.
> +	 * low memory condition and high memory pressure. In rare case,
> +	 * if not preloaded, GFP_NOWAIT is used.
>  	 *
> -	 * Even if it fails we do not really care about that. Just proceed
> -	 * as it is. "overflow" path will refill the cache we allocate from.
> +	 * Set "pva" to NULL here, because of "retry" path.
>  	 */
> -	preempt_disable();
> -	if (!__this_cpu_read(ne_fit_preload_node)) {
> -		preempt_enable();
> -		pva = kmem_cache_alloc_node(vmap_area_cachep, GFP_KERNEL, node);
> -		preempt_disable();
> +	pva = NULL;
>  
> -		if (__this_cpu_cmpxchg(ne_fit_preload_node, NULL, pva)) {
> -			if (pva)
> -				kmem_cache_free(vmap_area_cachep, pva);
> -		}
> -	}
> +	if (!this_cpu_read(ne_fit_preload_node))
> +		/*
> +		 * Even if it fails we do not really care about that.
> +		 * Just proceed as it is. If needed "overflow" path
> +		 * will refill the cache we allocate from.
> +		 */
> +		pva = kmem_cache_alloc_node(vmap_area_cachep, GFP_KERNEL, node);
>  
>  	spin_lock(&vmap_area_lock);
> -	preempt_enable();
> +
> +	if (pva && __this_cpu_cmpxchg(ne_fit_preload_node, NULL, pva))
> +		kmem_cache_free(vmap_area_cachep, pva);
>  
>  	/*
>  	 * If an allocation fails, the "vend" address is
> -- 
> 2.20.1
> 

-- 
Michal Hocko
SUSE Labs

* Re: [PATCH v3 2/3] mm/vmalloc: respect passed gfp_mask when do preloading
  2019-10-16  9:54 ` [PATCH v3 2/3] mm/vmalloc: respect passed gfp_mask " Uladzislau Rezki (Sony)
@ 2019-10-16 11:06   ` Michal Hocko
  2019-10-18  9:40     ` Uladzislau Rezki
  0 siblings, 1 reply; 11+ messages in thread
From: Michal Hocko @ 2019-10-16 11:06 UTC (permalink / raw)
  To: Uladzislau Rezki (Sony)
  Cc: Andrew Morton, Daniel Wagner, Sebastian Andrzej Siewior,
	Thomas Gleixner, linux-mm, LKML, Peter Zijlstra, Hillf Danton,
	Matthew Wilcox, Oleksiy Avramchenko, Steven Rostedt

On Wed 16-10-19 11:54:37, Uladzislau Rezki (Sony) wrote:
> alloc_vmap_area() is given a gfp_mask for the page allocator.
> Let's respect that mask and consider it even in the case when
> doing regular CPU preloading, i.e. where a context can sleep.

This is explaining what but it doesn't say why. I would go with
"
Allocation functions should comply with the given gfp_mask as much as
possible. The preallocation code in alloc_vmap_area doesn't follow that
pattern and it is using a hardcoded GFP_KERNEL. Although this doesn't
really make much difference because vmalloc is not GFP_NOWAIT compliant
in general (e.g. page table allocations are GFP_KERNEL) there is no
reason to spread that bad habit and it is good to fix the antipattern.
"
> 
> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>

Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  mm/vmalloc.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index b7b443bfdd92..593bf554518d 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1064,9 +1064,9 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
>  		return ERR_PTR(-EBUSY);
>  
>  	might_sleep();
> +	gfp_mask = gfp_mask & GFP_RECLAIM_MASK;
>  
> -	va = kmem_cache_alloc_node(vmap_area_cachep,
> -			gfp_mask & GFP_RECLAIM_MASK, node);
> +	va = kmem_cache_alloc_node(vmap_area_cachep, gfp_mask, node);
>  	if (unlikely(!va))
>  		return ERR_PTR(-ENOMEM);
>  
> @@ -1074,7 +1074,7 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
>  	 * Only scan the relevant parts containing pointers to other objects
>  	 * to avoid false negatives.
>  	 */
> -	kmemleak_scan_area(&va->rb_node, SIZE_MAX, gfp_mask & GFP_RECLAIM_MASK);
> +	kmemleak_scan_area(&va->rb_node, SIZE_MAX, gfp_mask);
>  
>  retry:
>  	/*
> @@ -1100,7 +1100,7 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
>  		 * Just proceed as it is. If needed "overflow" path
>  		 * will refill the cache we allocate from.
>  		 */
> -		pva = kmem_cache_alloc_node(vmap_area_cachep, GFP_KERNEL, node);
> +		pva = kmem_cache_alloc_node(vmap_area_cachep, gfp_mask, node);
>  
>  	spin_lock(&vmap_area_lock);
>  
> -- 
> 2.20.1
> 

-- 
Michal Hocko
SUSE Labs

* Re: [PATCH v3 3/3] mm/vmalloc: add more comments to the adjust_va_to_fit_type()
  2019-10-16  9:54 ` [PATCH v3 3/3] mm/vmalloc: add more comments to the adjust_va_to_fit_type() Uladzislau Rezki (Sony)
@ 2019-10-16 11:07   ` Michal Hocko
  2019-10-18  9:41     ` Uladzislau Rezki
  0 siblings, 1 reply; 11+ messages in thread
From: Michal Hocko @ 2019-10-16 11:07 UTC (permalink / raw)
  To: Uladzislau Rezki (Sony)
  Cc: Andrew Morton, Daniel Wagner, Sebastian Andrzej Siewior,
	Thomas Gleixner, linux-mm, LKML, Peter Zijlstra, Hillf Danton,
	Matthew Wilcox, Oleksiy Avramchenko, Steven Rostedt

On Wed 16-10-19 11:54:38, Uladzislau Rezki (Sony) wrote:
> When fit type is NE_FIT_TYPE there is a need in one extra object.
> Usually the "ne_fit_preload_node" per-CPU variable has it and
> there is no need in GFP_NOWAIT allocation, but there are exceptions.
> 
> This commit just adds more explanations, as a result giving
> answers on questions like when it can occur, how often, under
> which conditions and what happens if GFP_NOWAIT gets failed.
> 
> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>

Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  mm/vmalloc.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 593bf554518d..2290a0d270e4 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -969,6 +969,19 @@ adjust_va_to_fit_type(struct vmap_area *va,
>  			 * There are a few exceptions though, as an example it is
>  			 * a first allocation (early boot up) when we have "one"
>  			 * big free space that has to be split.
> +			 *
> +			 * Also we can hit this path in case of regular "vmap"
> +			 * allocations, if "this" current CPU was not preloaded.
> +			 * See the comment in alloc_vmap_area() why. If so, then
> +			 * GFP_NOWAIT is used instead to get an extra object for
> +			 * split purpose. That is rare and most time does not
> +			 * occur.
> +			 *
> +			 * What happens if an allocation gets failed. Basically,
> +			 * an "overflow" path is triggered to purge lazily freed
> +			 * areas to free some memory, then, the "retry" path is
> +			 * triggered to repeat one more time. See more details
> +			 * in alloc_vmap_area() function.
>  			 */
>  			lva = kmem_cache_alloc(vmap_area_cachep, GFP_NOWAIT);
>  			if (!lva)
> -- 
> 2.20.1
> 

-- 
Michal Hocko
SUSE Labs

* Re: [PATCH v3 1/3] mm/vmalloc: remove preempt_disable/enable when do preloading
  2019-10-16 10:59 ` [PATCH v3 1/3] mm/vmalloc: remove preempt_disable/enable when do preloading Michal Hocko
@ 2019-10-18  9:37   ` Uladzislau Rezki
  0 siblings, 0 replies; 11+ messages in thread
From: Uladzislau Rezki @ 2019-10-18  9:37 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Uladzislau Rezki (Sony),
	Andrew Morton, Daniel Wagner, Sebastian Andrzej Siewior,
	Thomas Gleixner, linux-mm, LKML, Peter Zijlstra, Hillf Danton,
	Matthew Wilcox, Oleksiy Avramchenko, Steven Rostedt

Hello, Michal.

Sorry for the late reply. See my comments enclosed below:

> On Wed 16-10-19 11:54:36, Uladzislau Rezki (Sony) wrote:
> > Some background. The preemption was disabled before to guarantee
> > that a preloaded object is available for a CPU, it was stored for.
> 
> Probably good to be explicit that this has been achieved by combining
> the disabling the preemption and taking the spin lock while the
> ne_fit_preload_node is checked resp. repopulated, right?
> 
Right, agree with your comment!


> > The aim was to not allocate in atomic context when spinlock
> > is taken later, for regular vmap allocations. But that approach
> > conflicts with CONFIG_PREEMPT_RT philosophy. It means that
> > calling spin_lock() with disabled preemption is forbidden
> > in the CONFIG_PREEMPT_RT kernel.
> > 
> > Therefore, get rid of preempt_disable() and preempt_enable() when
> > the preload is done for splitting purpose. As a result we do not
> > guarantee now that a CPU is preloaded, instead we minimize the
> > case when it is not, with this change.
> 
> by populating the per cpu preload pointer under the vmap_area_lock.
> This implies that at least each caller which has done the preallocation
> will not fallback to an atomic allocation later. It is possible that the
> preallocation would be pointless or that no preallocation is done
> because of the race but your data shows that this is really rare.
> 
That makes sense to add. Please find the updated commit message below:

<snip>
mm/vmalloc: remove preempt_disable/enable when do preloading

Some background: preemption was disabled to guarantee that a
preloaded object was available for the CPU it was stored for.
That was achieved by combining disabling the preemption with
taking the spin lock while the ne_fit_preload_node is checked.

The aim was to avoid allocating in atomic context when the spinlock
is taken later, for regular vmap allocations. But that approach
conflicts with the CONFIG_PREEMPT_RT philosophy: calling spin_lock()
with preemption disabled is forbidden in a CONFIG_PREEMPT_RT kernel.

Therefore, get rid of preempt_disable() and preempt_enable() when
the preload is done for split purposes. As a result we no longer
guarantee that a CPU is preloaded; instead, this change minimizes
the cases when it is not, by populating the per-CPU preload pointer
under the vmap_area_lock.

This implies that at least each caller that has done the preallocation
will not fall back to an atomic allocation later. It is possible that
the preallocation would be pointless, or that no preallocation is done
because of the race, but the data shows that this is really rare.

For example, I ran a special test case that follows the preload
pattern and path. 20 "unbind" threads ran it and each did 1000000
allocations. On average, only 3.5 out of 1000000 allocations found
the CPU not preloaded. So it can happen, but the number is negligible.

V2 -> V3:
    - update the commit message

V1 -> V2:
  - move the __this_cpu_cmpxchg() check to after spin_lock() is taken,
    as proposed by Andrew Morton
  - add more explanation regarding preloading
  - adjust and move some comments
<snip>

Do you agree on that?

Thank you!

--
Vlad Rezki

* Re: [PATCH v3 2/3] mm/vmalloc: respect passed gfp_mask when do preloading
  2019-10-16 11:06   ` Michal Hocko
@ 2019-10-18  9:40     ` Uladzislau Rezki
  2019-10-19  1:58       ` Andrew Morton
  0 siblings, 1 reply; 11+ messages in thread
From: Uladzislau Rezki @ 2019-10-18  9:40 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Uladzislau Rezki (Sony),
	Andrew Morton, Daniel Wagner, Sebastian Andrzej Siewior,
	Thomas Gleixner, linux-mm, LKML, Peter Zijlstra, Hillf Danton,
	Matthew Wilcox, Oleksiy Avramchenko, Steven Rostedt

> > alloc_vmap_area() is given a gfp_mask for the page allocator.
> > Let's respect that mask and consider it even in the case when
> > doing regular CPU preloading, i.e. where a context can sleep.
> 
> This is explaining what but it doesn't say why. I would go with
> "
> Allocation functions should comply with the given gfp_mask as much as
> possible. The preallocation code in alloc_vmap_area doesn't follow that
> pattern and it is using a hardcoded GFP_KERNEL. Although this doesn't
> really make much difference because vmalloc is not GFP_NOWAIT compliant
> in general (e.g. page table allocations are GFP_KERNEL) there is no
> reason to spread that bad habit and it is good to fix the antipattern.
> "
I can go with that, agreed. I am not sure if I need to update the patch
and send a v4, or maybe Andrew can directly update it in his tree.

Andrew, should I resend, or can you update it?

Thank you in advance!

--
Vlad Rezki

* Re: [PATCH v3 3/3] mm/vmalloc: add more comments to the adjust_va_to_fit_type()
  2019-10-16 11:07   ` Michal Hocko
@ 2019-10-18  9:41     ` Uladzislau Rezki
  0 siblings, 0 replies; 11+ messages in thread
From: Uladzislau Rezki @ 2019-10-18  9:41 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Uladzislau Rezki (Sony),
	Andrew Morton, Daniel Wagner, Sebastian Andrzej Siewior,
	Thomas Gleixner, linux-mm, LKML, Peter Zijlstra, Hillf Danton,
	Matthew Wilcox, Oleksiy Avramchenko, Steven Rostedt

On Wed, Oct 16, 2019 at 01:07:22PM +0200, Michal Hocko wrote:
> On Wed 16-10-19 11:54:38, Uladzislau Rezki (Sony) wrote:
> > When fit type is NE_FIT_TYPE there is a need in one extra object.
> > Usually the "ne_fit_preload_node" per-CPU variable has it and
> > there is no need in GFP_NOWAIT allocation, but there are exceptions.
> > 
> > This commit just adds more explanations, as a result giving
> > answers on questions like when it can occur, how often, under
> > which conditions and what happens if GFP_NOWAIT gets failed.
> > 
> > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> 
> Acked-by: Michal Hocko <mhocko@suse.com>
> 
Thanks!

--
Vlad Rezki

* Re: [PATCH v3 2/3] mm/vmalloc: respect passed gfp_mask when do preloading
  2019-10-18  9:40     ` Uladzislau Rezki
@ 2019-10-19  1:58       ` Andrew Morton
  2019-10-19 15:51         ` Uladzislau Rezki
  0 siblings, 1 reply; 11+ messages in thread
From: Andrew Morton @ 2019-10-19  1:58 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: Michal Hocko, Daniel Wagner, Sebastian Andrzej Siewior,
	Thomas Gleixner, linux-mm, LKML, Peter Zijlstra, Hillf Danton,
	Matthew Wilcox, Oleksiy Avramchenko, Steven Rostedt

On Fri, 18 Oct 2019 11:40:49 +0200 Uladzislau Rezki <urezki@gmail.com> wrote:

> > > alloc_vmap_area() is given a gfp_mask for the page allocator.
> > > Let's respect that mask and consider it even in the case when
> > > doing regular CPU preloading, i.e. where a context can sleep.
> > 
> > This is explaining what but it doesn't say why. I would go with
> > "
> > Allocation functions should comply with the given gfp_mask as much as
> > possible. The preallocation code in alloc_vmap_area doesn't follow that
> > pattern and it is using a hardcoded GFP_KERNEL. Although this doesn't
> > really make much difference because vmalloc is not GFP_NOWAIT compliant
> > in general (e.g. page table allocations are GFP_KERNEL) there is no
> > reason to spread that bad habit and it is good to fix the antipattern.
> > "
> I can go with that, agree. I am not sure if i need to update the patch
> and send v4. Or maybe Andrew can directly update it in his tree.
> 
> Andrew, should i send or can update?

I updated the changelog with Michal's words prior to committing.  You
were cc'ed :)


From: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Subject: mm/vmalloc: respect passed gfp_mask when doing preloading

Allocation functions should comply with the given gfp_mask as much as
possible.  The preallocation code in alloc_vmap_area doesn't follow that
pattern and it is using a hardcoded GFP_KERNEL.  Although this doesn't
really make much difference because vmalloc is not GFP_NOWAIT compliant in
general (e.g.  page table allocations are GFP_KERNEL) there is no reason
to spread that bad habit and it is good to fix the antipattern.

[mhocko@suse.com: rewrite changelog]
Link: http://lkml.kernel.org/r/20191016095438.12391-2-urezki@gmail.com
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Daniel Wagner <dwagner@suse.de>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/vmalloc.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

--- a/mm/vmalloc.c~mm-vmalloc-respect-passed-gfp_mask-when-do-preloading
+++ a/mm/vmalloc.c
@@ -1063,9 +1063,9 @@ static struct vmap_area *alloc_vmap_area
 		return ERR_PTR(-EBUSY);
 
 	might_sleep();
+	gfp_mask = gfp_mask & GFP_RECLAIM_MASK;
 
-	va = kmem_cache_alloc_node(vmap_area_cachep,
-			gfp_mask & GFP_RECLAIM_MASK, node);
+	va = kmem_cache_alloc_node(vmap_area_cachep, gfp_mask, node);
 	if (unlikely(!va))
 		return ERR_PTR(-ENOMEM);
 
@@ -1073,7 +1073,7 @@ static struct vmap_area *alloc_vmap_area
 	 * Only scan the relevant parts containing pointers to other objects
 	 * to avoid false negatives.
 	 */
-	kmemleak_scan_area(&va->rb_node, SIZE_MAX, gfp_mask & GFP_RECLAIM_MASK);
+	kmemleak_scan_area(&va->rb_node, SIZE_MAX, gfp_mask);
 
 retry:
 	/*
@@ -1099,7 +1099,7 @@ retry:
 		 * Just proceed as it is. If needed "overflow" path
 		 * will refill the cache we allocate from.
 		 */
-		pva = kmem_cache_alloc_node(vmap_area_cachep, GFP_KERNEL, node);
+		pva = kmem_cache_alloc_node(vmap_area_cachep, gfp_mask, node);
 
 	spin_lock(&vmap_area_lock);
 
_


* Re: [PATCH v3 2/3] mm/vmalloc: respect passed gfp_mask when do preloading
  2019-10-19  1:58       ` Andrew Morton
@ 2019-10-19 15:51         ` Uladzislau Rezki
  0 siblings, 0 replies; 11+ messages in thread
From: Uladzislau Rezki @ 2019-10-19 15:51 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Uladzislau Rezki, Michal Hocko, Daniel Wagner,
	Sebastian Andrzej Siewior, Thomas Gleixner, linux-mm, LKML,
	Peter Zijlstra, Hillf Danton, Matthew Wilcox,
	Oleksiy Avramchenko, Steven Rostedt

> > > 
> > > This is explaining what but it doesn't say why. I would go with
> > > "
> > > Allocation functions should comply with the given gfp_mask as much as
> > > possible. The preallocation code in alloc_vmap_area doesn't follow that
> > > pattern and it is using a hardcoded GFP_KERNEL. Although this doesn't
> > > really make much difference because vmalloc is not GFP_NOWAIT compliant
> > > in general (e.g. page table allocations are GFP_KERNEL) there is no
> > > reason to spread that bad habit and it is good to fix the antipattern.
> > > "
> > I can go with that, agree. I am not sure if i need to update the patch
> > and send v4. Or maybe Andrew can directly update it in his tree.
> > 
> > Andrew, should i send or can update?
> 
> I updated the changelog with Michal's words prior to committing.  You
> were cc'ed :)
> 
Ah, I saw the email stating that the patch has been added to the "mm"
tree, but I did not check the commit message. Now I see everything is
sorted out :)

Thank you!

--
Vlad Rezki
