linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 1/1] mm/vmalloc: Move draining areas out of caller context
@ 2022-01-25 16:39 Uladzislau Rezki (Sony)
  2022-01-25 16:50 ` Matthew Wilcox
  0 siblings, 1 reply; 5+ messages in thread
From: Uladzislau Rezki (Sony) @ 2022-01-25 16:39 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, LKML, Christoph Hellwig, Matthew Wilcox,
	Nicholas Piggin, Uladzislau Rezki, Oleksiy Avramchenko

A caller initiates the drain process from its context once the
drain threshold is reached or passed. There are at least two
drawbacks of doing so:

a) a caller can be a high-prio or RT task. In that case it can
   get stuck doing the actual drain of all lazily freed areas.
   This is not optimal because such tasks are usually latency
   sensitive and control should be returned to them as soon as
   possible in order to drive such workloads in time. See
   96e2db456135 ("mm/vmalloc: rework the drain logic")

b) It is not safe to call vfree() while holding a spinlock because
   of the vmap_purge_lock mutex. There was a report about this from
   Zeal Robot <zealci@zte.com.cn> here:
   https://lore.kernel.org/all/20211222081026.484058-1-chi.minghao@zte.com.cn

Moving the drain to a separate work context addresses both issues.
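
For illustration, a minimal sketch of the problematic pattern in (b);
the caller and the lock here are hypothetical:

	spin_lock(&some_lock);	/* hypothetical spinlock */
	vfree(ptr);		/* may cross the drain threshold and
				 * purge inline; the purge path can
				 * sleep, which is not allowed under
				 * a spinlock */
	spin_unlock(&some_lock);

With this patch the purge runs from the drain_vmap_work worker instead.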

v1->v2:
   - Added prefix "_work" to the drain worker function.

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 mm/vmalloc.c | 35 ++++++++++++++++++++++-------------
 1 file changed, 22 insertions(+), 13 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index bdc7222f87d4..e5285c9d2e2a 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -793,6 +793,9 @@ RB_DECLARE_CALLBACKS_MAX(static, free_vmap_area_rb_augment_cb,
 static void purge_vmap_area_lazy(void);
 static BLOCKING_NOTIFIER_HEAD(vmap_notify_list);
 static unsigned long lazy_max_pages(void);
+static void drain_vmap_area_work(struct work_struct *work);
+static DECLARE_WORK(drain_vmap_work, drain_vmap_area_work);
+static atomic_t drain_vmap_work_in_progress;
 
 static atomic_long_t nr_vmalloc_pages;
 
@@ -1719,18 +1722,6 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
 	return true;
 }
 
-/*
- * Kick off a purge of the outstanding lazy areas. Don't bother if somebody
- * is already purging.
- */
-static void try_purge_vmap_area_lazy(void)
-{
-	if (mutex_trylock(&vmap_purge_lock)) {
-		__purge_vmap_area_lazy(ULONG_MAX, 0);
-		mutex_unlock(&vmap_purge_lock);
-	}
-}
-
 /*
  * Kick off a purge of the outstanding lazy areas.
  */
@@ -1742,6 +1733,23 @@ static void purge_vmap_area_lazy(void)
 	mutex_unlock(&vmap_purge_lock);
 }
 
+static void drain_vmap_area_work(struct work_struct *work)
+{
+	unsigned long nr_lazy;
+
+	do {
+		mutex_lock(&vmap_purge_lock);
+		__purge_vmap_area_lazy(ULONG_MAX, 0);
+		mutex_unlock(&vmap_purge_lock);
+
+		/* Recheck if further work is required. */
+		nr_lazy = atomic_long_read(&vmap_lazy_nr);
+	} while (nr_lazy > lazy_max_pages());
+
+	/* We are done at this point. */
+	atomic_set(&drain_vmap_work_in_progress, 0);
+}
+
 /*
  * Free a vmap area, caller ensuring that the area has been unmapped
  * and flush_cache_vunmap had been called for the correct range
@@ -1768,7 +1776,8 @@ static void free_vmap_area_noflush(struct vmap_area *va)
 
 	/* After this point, we may free va at any time */
 	if (unlikely(nr_lazy > lazy_max_pages()))
-		try_purge_vmap_area_lazy();
+		if (!atomic_xchg(&drain_vmap_work_in_progress, 1))
+			schedule_work(&drain_vmap_work);
 }
 
 /*
-- 
2.30.2


* Re: [PATCH v2 1/1] mm/vmalloc: Move draining areas out of caller context
  2022-01-25 16:39 [PATCH v2 1/1] mm/vmalloc: Move draining areas out of caller context Uladzislau Rezki (Sony)
@ 2022-01-25 16:50 ` Matthew Wilcox
  2022-01-25 17:12   ` Uladzislau Rezki
  0 siblings, 1 reply; 5+ messages in thread
From: Matthew Wilcox @ 2022-01-25 16:50 UTC (permalink / raw)
  To: Uladzislau Rezki (Sony)
  Cc: Andrew Morton, linux-mm, LKML, Christoph Hellwig,
	Nicholas Piggin, Oleksiy Avramchenko

On Tue, Jan 25, 2022 at 05:39:12PM +0100, Uladzislau Rezki (Sony) wrote:
> @@ -1768,7 +1776,8 @@ static void free_vmap_area_noflush(struct vmap_area *va)
>  
>  	/* After this point, we may free va at any time */
>  	if (unlikely(nr_lazy > lazy_max_pages()))
> -		try_purge_vmap_area_lazy();
> +		if (!atomic_xchg(&drain_vmap_work_in_progress, 1))
> +			schedule_work(&drain_vmap_work);
>  }

Is it necessary to have drain_vmap_work_in_progress?  The documentation
says:

 * This puts a job in the kernel-global workqueue if it was not already
 * queued and leaves it in the same position on the kernel-global
 * workqueue otherwise.

and the implementation seems to use test_and_set_bit() to ensure this
is true.
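
Roughly (a simplified sketch of queue_work_on() from kernel/workqueue.c,
with tracing and debug checks omitted):

	bool queue_work_on(int cpu, struct workqueue_struct *wq,
			   struct work_struct *work)
	{
		bool ret = false;
		unsigned long flags;

		local_irq_save(flags);
		/* Queue only if the PENDING bit was not already set. */
		if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT,
				      work_data_bits(work))) {
			__queue_work(cpu, wq, work);
			ret = true;
		}
		local_irq_restore(flags);
		return ret;
	}

So a second schedule_work() while the work is still pending is a no-op.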

* Re: [PATCH v2 1/1] mm/vmalloc: Move draining areas out of caller context
  2022-01-25 16:50 ` Matthew Wilcox
@ 2022-01-25 17:12   ` Uladzislau Rezki
  2022-01-25 18:46     ` Matthew Wilcox
  0 siblings, 1 reply; 5+ messages in thread
From: Uladzislau Rezki @ 2022-01-25 17:12 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Uladzislau Rezki (Sony),
	Andrew Morton, linux-mm, LKML, Christoph Hellwig,
	Nicholas Piggin, Oleksiy Avramchenko

On Tue, Jan 25, 2022 at 04:50:14PM +0000, Matthew Wilcox wrote:
> On Tue, Jan 25, 2022 at 05:39:12PM +0100, Uladzislau Rezki (Sony) wrote:
> > @@ -1768,7 +1776,8 @@ static void free_vmap_area_noflush(struct vmap_area *va)
> >  
> >  	/* After this point, we may free va at any time */
> >  	if (unlikely(nr_lazy > lazy_max_pages()))
> > -		try_purge_vmap_area_lazy();
> > +		if (!atomic_xchg(&drain_vmap_work_in_progress, 1))
> > +			schedule_work(&drain_vmap_work);
> >  }
> 
> Is it necessary to have drain_vmap_work_in_progress?  The documentation
> says:
> 
>  * This puts a job in the kernel-global workqueue if it was not already
>  * queued and leaves it in the same position on the kernel-global
>  * workqueue otherwise.
> 
> and the implementation seems to use test_and_set_bit() to ensure this
> is true.
>
It checks the pending state; once the work is already running it is no
longer pending, so it can be queued one more time. The motivation for
having it is to prevent the drain work from being queued several times
at once, which is what I see in my stress testing.

CPU_1: invokes vfree() -> queues the drain work -> TASK_RUNNING
CPU_2: invokes vfree() -> queues the drain work one more time since it was not pending
...
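
The PENDING bit only covers the time the work sits on the queue; it is
cleared just before the work function starts running, so as a sketch of
the race:

	CPU_1: schedule_work()     /* PENDING set, work queued      */
	worker: callback starts    /* PENDING cleared               */
	CPU_2: schedule_work()     /* PENDING clear -> queued again */

drain_vmap_work_in_progress stays set until drain_vmap_area_work()
returns, which suppresses that second queuing.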

Instead of the drain_vmap_work_in_progress hack we could make use of the
work_busy() helper. The main concern with that is the comment around that
function:

/**
 * work_busy - test whether a work is currently pending or running
 * @work: the work to be tested
 *
 * Test whether @work is currently pending or running.  There is no
 * synchronization around this function and the test result is
 * unreliable and only useful as advisory hints or for debugging.
 *
 * Return:
 * OR'd bitmask of WORK_BUSY_* bits.
 */

I am not sure how reliable this is.
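
For concreteness, the work_busy() variant would look like this (a
sketch):

	if (unlikely(nr_lazy > lazy_max_pages()))
		if (!work_busy(&drain_vmap_work))
			schedule_work(&drain_vmap_work);

but per the comment above the result is only advisory, so two CPUs could
still both observe the work as idle and queue it twice.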

Thoughts?

--
Vlad Rezki

* Re: [PATCH v2 1/1] mm/vmalloc: Move draining areas out of caller context
  2022-01-25 17:12   ` Uladzislau Rezki
@ 2022-01-25 18:46     ` Matthew Wilcox
  2022-01-25 19:17       ` Uladzislau Rezki
  0 siblings, 1 reply; 5+ messages in thread
From: Matthew Wilcox @ 2022-01-25 18:46 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: Andrew Morton, linux-mm, LKML, Christoph Hellwig,
	Nicholas Piggin, Oleksiy Avramchenko

On Tue, Jan 25, 2022 at 06:12:48PM +0100, Uladzislau Rezki wrote:
> On Tue, Jan 25, 2022 at 04:50:14PM +0000, Matthew Wilcox wrote:
> > On Tue, Jan 25, 2022 at 05:39:12PM +0100, Uladzislau Rezki (Sony) wrote:
> > > @@ -1768,7 +1776,8 @@ static void free_vmap_area_noflush(struct vmap_area *va)
> > >  
> > >  	/* After this point, we may free va at any time */
> > >  	if (unlikely(nr_lazy > lazy_max_pages()))
> > > -		try_purge_vmap_area_lazy();
> > > +		if (!atomic_xchg(&drain_vmap_work_in_progress, 1))
> > > +			schedule_work(&drain_vmap_work);
> > >  }
> > 
> > Is it necessary to have drain_vmap_work_in_progress?  The documentation
> > says:
> > 
> >  * This puts a job in the kernel-global workqueue if it was not already
> >  * queued and leaves it in the same position on the kernel-global
> >  * workqueue otherwise.
> > 
> > and the implementation seems to use test_and_set_bit() to ensure this
> > is true.
> >
> It checks the pending state; once the work is already running it is no
> longer pending, so it can be queued one more time. The motivation for
> having it is to prevent the drain work from being queued several times
> at once, which is what I see in my stress testing.
> 
> CPU_1: invokes vfree() -> queues the drain work -> TASK_RUNNING
> CPU_2: invokes vfree() -> queues the drain work one more time since it was not pending

But why not unconditionally call schedule_work() here?

* Re: [PATCH v2 1/1] mm/vmalloc: Move draining areas out of caller context
  2022-01-25 18:46     ` Matthew Wilcox
@ 2022-01-25 19:17       ` Uladzislau Rezki
  0 siblings, 0 replies; 5+ messages in thread
From: Uladzislau Rezki @ 2022-01-25 19:17 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Uladzislau Rezki, Andrew Morton, linux-mm, LKML,
	Christoph Hellwig, Nicholas Piggin, Oleksiy Avramchenko

On Tue, Jan 25, 2022 at 06:46:35PM +0000, Matthew Wilcox wrote:
> On Tue, Jan 25, 2022 at 06:12:48PM +0100, Uladzislau Rezki wrote:
> > On Tue, Jan 25, 2022 at 04:50:14PM +0000, Matthew Wilcox wrote:
> > > On Tue, Jan 25, 2022 at 05:39:12PM +0100, Uladzislau Rezki (Sony) wrote:
> > > > @@ -1768,7 +1776,8 @@ static void free_vmap_area_noflush(struct vmap_area *va)
> > > >  
> > > >  	/* After this point, we may free va at any time */
> > > >  	if (unlikely(nr_lazy > lazy_max_pages()))
> > > > -		try_purge_vmap_area_lazy();
> > > > +		if (!atomic_xchg(&drain_vmap_work_in_progress, 1))
> > > > +			schedule_work(&drain_vmap_work);
> > > >  }
> > > 
> > > Is it necessary to have drain_vmap_work_in_progress?  The documentation
> > > says:
> > > 
> > >  * This puts a job in the kernel-global workqueue if it was not already
> > >  * queued and leaves it in the same position on the kernel-global
> > >  * workqueue otherwise.
> > > 
> > > and the implementation seems to use test_and_set_bit() to ensure this
> > > is true.
> > >
> > It checks the pending state; once the work is already running it is no
> > longer pending, so it can be queued one more time. The motivation for
> > having it is to prevent the drain work from being queued several times
> > at once, which is what I see in my stress testing.
> > 
> > CPU_1: invokes vfree() -> queues the drain work -> TASK_RUNNING
> > CPU_2: invokes vfree() -> queues the drain work one more time since it was not pending
> 
> But why not unconditionally call schedule_work() here?
>
We can :) The question is whether we agree that the extra queuing is
spurious, because CPU_1 will complete all the cleanup once it is physically
on a CPU and the other workers just bail out.
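
For reference, the unconditional variant would be just (a sketch against
this patch):

	/* After this point, we may free va at any time */
	if (unlikely(nr_lazy > lazy_max_pages()))
		schedule_work(&drain_vmap_work);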

We can certainly disregard those spurious wake-ups. If someone complains
about it in the future we can revisit it then.

Re-spin and do it unconditionally? I do not have a strong opinion about it.

--
Vlad Rezki
