* [PATCH] mm/vmalloc: fix spinning drain_vmap_work after reading from /proc/vmcore
@ 2022-04-05 19:40 ` Omar Sandoval
  0 siblings, 0 replies; 14+ messages in thread
From: Omar Sandoval @ 2022-04-05 19:40 UTC (permalink / raw)
  To: linux-mm, kexec, Andrew Morton
  Cc: Uladzislau Rezki, Christoph Hellwig, Cliff Wickman, x86, kernel-team

From: Omar Sandoval <osandov@fb.com>

Commit 3ee48b6af49c ("mm, x86: Saving vmcore with non-lazy freeing of
vmas") introduced set_iounmap_nonlazy(), which sets vmap_lazy_nr to
lazy_max_pages() + 1, ensuring that any future vunmaps() immediately
purges the vmap areas instead of doing it lazily.

Commit 690467c81b1a ("mm/vmalloc: Move draining areas out of caller
context") moved the purging from the vunmap() caller to a worker thread.
Unfortunately, set_iounmap_nonlazy() can cause the worker thread to spin
(possibly forever). For example, consider the following scenario:

1. Thread reads from /proc/vmcore. This eventually calls
   __copy_oldmem_page() -> set_iounmap_nonlazy(), which sets
   vmap_lazy_nr to lazy_max_pages() + 1.
2. Then it calls free_vmap_area_noflush() (via iounmap()), which adds 2
   pages (one page plus the guard page) to the purge list and
   vmap_lazy_nr. vmap_lazy_nr is now lazy_max_pages() + 3, so the
   drain_vmap_work is scheduled.
3. Thread returns from the kernel and is scheduled out.
4. Worker thread is scheduled in and calls drain_vmap_area_work(). It
   frees the 2 pages on the purge list. vmap_lazy_nr is now
   lazy_max_pages() + 1.
5. This is still over the threshold, so it tries to purge areas again,
   but doesn't find anything.
6. Repeat 5.

If the system is running with only one CPU (which is typical for kdump)
and preemption is disabled, then this will never make forward progress:
there aren't any more pages to purge, so it hangs. If there is more than
one CPU or preemption is enabled, then the worker thread will spin
forever in the background. (Note that if there were already pages to be
purged at the time that set_iounmap_nonlazy() was called, this bug is
avoided.)
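
For reference, the worker's retry loop that ends up spinning looks roughly
like this (paraphrased from drain_vmap_area_work() in mm/vmalloc.c as of
commit 690467c81b1a; the exact code may differ slightly):

  static void drain_vmap_area_work(struct work_struct *work)
  {
          unsigned long nr_lazy;

          do {
                  mutex_lock(&vmap_purge_lock);
                  __purge_vmap_area_lazy(ULONG_MAX, 0);
                  mutex_unlock(&vmap_purge_lock);

                  /*
                   * Recheck the threshold. In the scenario above there is
                   * nothing left to purge, so vmap_lazy_nr stays at
                   * lazy_max_pages() + 1 and the loop never exits.
                   */
                  nr_lazy = atomic_long_read(&vmap_lazy_nr);
          } while (nr_lazy > lazy_max_pages());
  }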

This can be reproduced with anything that reads from /proc/vmcore
multiple times. E.g., vmcore-dmesg /proc/vmcore.

A simple way to "fix" this would be to make set_iounmap_nonlazy() set
vmap_lazy_nr to lazy_max_pages() instead of lazy_max_pages() + 1. But, I
think it'd be better to get rid of this hack of clobbering vmap_lazy_nr.
Instead, this fix makes __copy_oldmem_page() explicitly drain the vmap
areas itself.

Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 arch/x86/include/asm/io.h       |  2 +-
 arch/x86/kernel/crash_dump_64.c |  2 +-
 mm/vmalloc.c                    | 21 ++++++++++-----------
 3 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index f6d91ecb8026..da466352f27c 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -210,7 +210,7 @@ void __iomem *ioremap(resource_size_t offset, unsigned long size);
 extern void iounmap(volatile void __iomem *addr);
 #define iounmap iounmap
 
-extern void set_iounmap_nonlazy(void);
+void iounmap_purge_vmap_area(void);
 
 #ifdef __KERNEL__
 
diff --git a/arch/x86/kernel/crash_dump_64.c b/arch/x86/kernel/crash_dump_64.c
index a7f617a3981d..075dd36c502d 100644
--- a/arch/x86/kernel/crash_dump_64.c
+++ b/arch/x86/kernel/crash_dump_64.c
@@ -37,8 +37,8 @@ static ssize_t __copy_oldmem_page(unsigned long pfn, char *buf, size_t csize,
 	} else
 		memcpy(buf, vaddr + offset, csize);
 
-	set_iounmap_nonlazy();
 	iounmap((void __iomem *)vaddr);
+	iounmap_purge_vmap_area();
 	return csize;
 }
 
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index e163372d3967..48084d742688 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1671,17 +1671,6 @@ static DEFINE_MUTEX(vmap_purge_lock);
 /* for per-CPU blocks */
 static void purge_fragmented_blocks_allcpus(void);
 
-#ifdef CONFIG_X86_64
-/*
- * called before a call to iounmap() if the caller wants vm_area_struct's
- * immediately freed.
- */
-void set_iounmap_nonlazy(void)
-{
-	atomic_long_set(&vmap_lazy_nr, lazy_max_pages()+1);
-}
-#endif /* CONFIG_X86_64 */
-
 /*
  * Purges all lazily-freed vmap areas.
  */
@@ -1753,6 +1742,16 @@ static void purge_vmap_area_lazy(void)
 	mutex_unlock(&vmap_purge_lock);
 }
 
+#ifdef CONFIG_X86_64
+/* Called after iounmap() to immediately free vm_area_struct's. */
+void iounmap_purge_vmap_area(void)
+{
+	mutex_lock(&vmap_purge_lock);
+	__purge_vmap_area_lazy(ULONG_MAX, 0);
+	mutex_unlock(&vmap_purge_lock);
+}
+#endif /* CONFIG_X86_64 */
+
 static void drain_vmap_area_work(struct work_struct *work)
 {
 	unsigned long nr_lazy;
-- 
2.35.1



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] mm/vmalloc: fix spinning drain_vmap_work after reading from /proc/vmcore
  2022-04-05 19:40 ` Omar Sandoval
@ 2022-04-05 20:28   ` Uladzislau Rezki
  -1 siblings, 0 replies; 14+ messages in thread
From: Uladzislau Rezki @ 2022-04-05 20:28 UTC (permalink / raw)
  To: Omar Sandoval
  Cc: linux-mm, kexec, Andrew Morton, Uladzislau Rezki,
	Christoph Hellwig, Cliff Wickman, x86, kernel-team

On Tue, Apr 05, 2022 at 12:40:31PM -0700, Omar Sandoval wrote:
> From: Omar Sandoval <osandov@fb.com>
> 
> Commit 3ee48b6af49c ("mm, x86: Saving vmcore with non-lazy freeing of
> vmas") introduced set_iounmap_nonlazy(), which sets vmap_lazy_nr to
> lazy_max_pages() + 1, ensuring that any future vunmaps() immediately
> purges the vmap areas instead of doing it lazily.
> 
> Commit 690467c81b1a ("mm/vmalloc: Move draining areas out of caller
> context") moved the purging from the vunmap() caller to a worker thread.
> Unfortunately, set_iounmap_nonlazy() can cause the worker thread to spin
> (possibly forever). For example, consider the following scenario:
> 
> 1. Thread reads from /proc/vmcore. This eventually calls
>    __copy_oldmem_page() -> set_iounmap_nonlazy(), which sets
>    vmap_lazy_nr to lazy_max_pages() + 1.
> 2. Then it calls free_vmap_area_noflush() (via iounmap()), which adds 2
>    pages (one page plus the guard page) to the purge list and
>    vmap_lazy_nr. vmap_lazy_nr is now lazy_max_pages() + 3, so the
>    drain_vmap_work is scheduled.
> 3. Thread returns from the kernel and is scheduled out.
> 4. Worker thread is scheduled in and calls drain_vmap_area_work(). It
>    frees the 2 pages on the purge list. vmap_lazy_nr is now
>    lazy_max_pages() + 1.
> 5. This is still over the threshold, so it tries to purge areas again,
>    but doesn't find anything.
> 6. Repeat 5.
> 
> If the system is running with only one CPU (which is typical for kdump)
> and preemption is disabled, then this will never make forward progress:
> there aren't any more pages to purge, so it hangs. If there is more than
> one CPU or preemption is enabled, then the worker thread will spin
> forever in the background. (Note that if there were already pages to be
> purged at the time that set_iounmap_nonlazy() was called, this bug is
> avoided.)
> 
> This can be reproduced with anything that reads from /proc/vmcore
> multiple times. E.g., vmcore-dmesg /proc/vmcore.
> 
> A simple way to "fix" this would be to make set_iounmap_nonlazy() set
> vmap_lazy_nr to lazy_max_pages() instead of lazy_max_pages() + 1. But, I
> think it'd be better to get rid of this hack of clobbering vmap_lazy_nr.
> Instead, this fix makes __copy_oldmem_page() explicitly drain the vmap
> areas itself.
> 
> Signed-off-by: Omar Sandoval <osandov@fb.com>
> ---
>  arch/x86/include/asm/io.h       |  2 +-
>  arch/x86/kernel/crash_dump_64.c |  2 +-
>  mm/vmalloc.c                    | 21 ++++++++++-----------
>  3 files changed, 12 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
> index f6d91ecb8026..da466352f27c 100644
> --- a/arch/x86/include/asm/io.h
> +++ b/arch/x86/include/asm/io.h
> @@ -210,7 +210,7 @@ void __iomem *ioremap(resource_size_t offset, unsigned long size);
>  extern void iounmap(volatile void __iomem *addr);
>  #define iounmap iounmap
>  
> -extern void set_iounmap_nonlazy(void);
> +void iounmap_purge_vmap_area(void);
>  
>  #ifdef __KERNEL__
>  
> diff --git a/arch/x86/kernel/crash_dump_64.c b/arch/x86/kernel/crash_dump_64.c
> index a7f617a3981d..075dd36c502d 100644
> --- a/arch/x86/kernel/crash_dump_64.c
> +++ b/arch/x86/kernel/crash_dump_64.c
> @@ -37,8 +37,8 @@ static ssize_t __copy_oldmem_page(unsigned long pfn, char *buf, size_t csize,
>  	} else
>  		memcpy(buf, vaddr + offset, csize);
>  
> -	set_iounmap_nonlazy();
>  	iounmap((void __iomem *)vaddr);
> +	iounmap_purge_vmap_area();
>  	return csize;
>  }
>  
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index e163372d3967..48084d742688 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1671,17 +1671,6 @@ static DEFINE_MUTEX(vmap_purge_lock);
>  /* for per-CPU blocks */
>  static void purge_fragmented_blocks_allcpus(void);
>  
> -#ifdef CONFIG_X86_64
> -/*
> - * called before a call to iounmap() if the caller wants vm_area_struct's
> - * immediately freed.
> - */
> -void set_iounmap_nonlazy(void)
> -{
> -	atomic_long_set(&vmap_lazy_nr, lazy_max_pages()+1);
> -}
> -#endif /* CONFIG_X86_64 */
> -
>  /*
>   * Purges all lazily-freed vmap areas.
>   */
> @@ -1753,6 +1742,16 @@ static void purge_vmap_area_lazy(void)
>  	mutex_unlock(&vmap_purge_lock);
>  }
>  
> +#ifdef CONFIG_X86_64
> +/* Called after iounmap() to immediately free vm_area_struct's. */
> +void iounmap_purge_vmap_area(void)
> +{
> +	mutex_lock(&vmap_purge_lock);
> +	__purge_vmap_area_lazy(ULONG_MAX, 0);
> +	mutex_unlock(&vmap_purge_lock);
> +}
> +#endif /* CONFIG_X86_64 */
> +
>  static void drain_vmap_area_work(struct work_struct *work)
>  {
>  	unsigned long nr_lazy;
> -- 
> 2.35.1
> 
Indeed, setting vmap_lazy_nr to lazy_max_pages()+1 in order to force
the drain sounds like a hack.

Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>

--
Uladzislau Rezki


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mm/vmalloc: fix spinning drain_vmap_work after reading from /proc/vmcore
  2022-04-05 19:40 ` Omar Sandoval
@ 2022-04-06  4:42   ` Christoph Hellwig
  -1 siblings, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2022-04-06  4:42 UTC (permalink / raw)
  To: Omar Sandoval
  Cc: linux-mm, kexec, Andrew Morton, Uladzislau Rezki,
	Christoph Hellwig, Cliff Wickman, x86, kernel-team

On Tue, Apr 05, 2022 at 12:40:31PM -0700, Omar Sandoval wrote:
> A simple way to "fix" this would be to make set_iounmap_nonlazy() set
> vmap_lazy_nr to lazy_max_pages() instead of lazy_max_pages() + 1. But, I
> think it'd be better to get rid of this hack of clobbering vmap_lazy_nr.
> Instead, this fix makes __copy_oldmem_page() explicitly drain the vmap
> areas itself.

This fixes the bug and the interface also is better than what we had
before.  But a vmap/iounmap_eager would seem even better.  But hey,
right now it has one caller in always-built-in x86 arch code, so maybe
it isn't worth spending more effort on this.
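
A purely hypothetical sketch of that idea, just for illustration (no such
helper exists; it only combines the two calls this patch already makes):

  /* Hypothetical only -- not an existing kernel interface. */
  static inline void iounmap_eager(volatile void __iomem *addr)
  {
          iounmap(addr);
          iounmap_purge_vmap_area();
  }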


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mm/vmalloc: fix spinning drain_vmap_work after reading from /proc/vmcore
  2022-04-06  4:42   ` Christoph Hellwig
@ 2022-04-06  9:13     ` Uladzislau Rezki
  -1 siblings, 0 replies; 14+ messages in thread
From: Uladzislau Rezki @ 2022-04-06  9:13 UTC (permalink / raw)
  To: Christoph Hellwig, Omar Sandoval
  Cc: Omar Sandoval, linux-mm, kexec, Andrew Morton, Uladzislau Rezki,
	Cliff Wickman, x86, kernel-team

> On Tue, Apr 05, 2022 at 12:40:31PM -0700, Omar Sandoval wrote:
> > A simple way to "fix" this would be to make set_iounmap_nonlazy() set
> > vmap_lazy_nr to lazy_max_pages() instead of lazy_max_pages() + 1. But, I
> > think it'd be better to get rid of this hack of clobbering vmap_lazy_nr.
> > Instead, this fix makes __copy_oldmem_page() explicitly drain the vmap
> > areas itself.
> 
> This fixes the bug and the interface also is better than what we had
> before.  But a vmap/iounmap_eager would seem even better.  But hey,
> right now it has one caller in always-built-in x86 arch code, so maybe
> it isn't worth spending more effort on this.
>
IMHO, it just makes sense to remove it. set_iounmap_nonlazy() was added
back in 2010:

<snip>
commit 3ee48b6af49cf534ca2f481ecc484b156a41451d
Author: Cliff Wickman <cpw@sgi.com>
Date:   Thu Sep 16 11:44:02 2010 -0500

    mm, x86: Saving vmcore with non-lazy freeing of vmas

    During the reading of /proc/vmcore the kernel is doing
    ioremap()/iounmap() repeatedly. And the buildup of un-flushed
    vm_area_struct's is causing a great deal of overhead. (rb_next()
    is chewing up most of that time).

    This solution is to provide function set_iounmap_nonlazy(). It
    causes a subsequent call to iounmap() to immediately purge the
    vma area (with try_purge_vmap_area_lazy()).

    With this patch we have seen the time for writing a 250MB
    compressed dump drop from 71 seconds to 44 seconds.

    Signed-off-by: Cliff Wickman <cpw@sgi.com>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: kexec@lists.infradead.org
    Cc: <stable@kernel.org>
    LKML-Reference: <E1OwHZ4-0005WK-Tw@eag09.americas.sgi.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
<snip>

and the reason was the "slow vmap" code of that time, i.e. due to poor
performance they decided to drop the lazy areas ASAP. Now the performance
picture of the vmalloc/vmap code is absolutely different.

Any thoughts?

--
Uladzislau Rezki


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mm/vmalloc: fix spinning drain_vmap_work after reading from /proc/vmcore
  2022-04-06  9:13     ` Uladzislau Rezki
@ 2022-04-06  9:59       ` Baoquan He
  -1 siblings, 0 replies; 14+ messages in thread
From: Baoquan He @ 2022-04-06  9:59 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: Christoph Hellwig, Omar Sandoval, linux-mm, kexec, Andrew Morton,
	Cliff Wickman, x86, kernel-team

On 04/06/22 at 11:13am, Uladzislau Rezki wrote:
> > On Tue, Apr 05, 2022 at 12:40:31PM -0700, Omar Sandoval wrote:
> > > A simple way to "fix" this would be to make set_iounmap_nonlazy() set
> > > vmap_lazy_nr to lazy_max_pages() instead of lazy_max_pages() + 1. But, I
> > > think it'd be better to get rid of this hack of clobbering vmap_lazy_nr.
> > > Instead, this fix makes __copy_oldmem_page() explicitly drain the vmap
> > > areas itself.
> > 
> > This fixes the bug and the interface also is better than what we had
> > before.  But a vmap/iounmap_eager would seem even better.  But hey,
> > right now it has one caller in always-built-in x86 arch code, so maybe
> > it isn't worth spending more effort on this.
> >
> IMHO, it just makes sense to remove it. The set_iounmap_nonlazy() was
> added in 2010 year:
> 
> <snip>
> commit 3ee48b6af49cf534ca2f481ecc484b156a41451d
> Author: Cliff Wickman <cpw@sgi.com>
> Date:   Thu Sep 16 11:44:02 2010 -0500
> 
>     mm, x86: Saving vmcore with non-lazy freeing of vmas
> 
>     During the reading of /proc/vmcore the kernel is doing
>     ioremap()/iounmap() repeatedly. And the buildup of un-flushed
>     vm_area_struct's is causing a great deal of overhead. (rb_next()
>     is chewing up most of that time).
> 
>     This solution is to provide function set_iounmap_nonlazy(). It
>     causes a subsequent call to iounmap() to immediately purge the
>     vma area (with try_purge_vmap_area_lazy()).
> 
>     With this patch we have seen the time for writing a 250MB
>     compressed dump drop from 71 seconds to 44 seconds.
> 
>     Signed-off-by: Cliff Wickman <cpw@sgi.com>
>     Cc: Andrew Morton <akpm@linux-foundation.org>
>     Cc: kexec@lists.infradead.org
>     Cc: <stable@kernel.org>
>     LKML-Reference: <E1OwHZ4-0005WK-Tw@eag09.americas.sgi.com>
>     Signed-off-by: Ingo Molnar <mingo@elte.hu>
> <snip>
> 
> and the reason was the "slow vmap" code, i.e. due to poor performance
> they decided to drop the lazily ASAP. Now we have absolutely different
> picture when it comes to performance and the vmalloc/vmap code.

I would vote for the current code change, removing it. As pointed out by
Christoph, it's only used by x86, so it may not be worth introducing a new
interface.



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mm/vmalloc: fix spinning drain_vmap_work after reading from /proc/vmcore
  2022-04-06  9:59       ` Baoquan He
@ 2022-04-06 20:16         ` Omar Sandoval
  -1 siblings, 0 replies; 14+ messages in thread
From: Omar Sandoval @ 2022-04-06 20:16 UTC (permalink / raw)
  To: Baoquan He
  Cc: Uladzislau Rezki, Christoph Hellwig, linux-mm, kexec,
	Andrew Morton, Cliff Wickman, x86, kernel-team

On Wed, Apr 06, 2022 at 05:59:53PM +0800, Baoquan He wrote:
> On 04/06/22 at 11:13am, Uladzislau Rezki wrote:
> > > On Tue, Apr 05, 2022 at 12:40:31PM -0700, Omar Sandoval wrote:
> > > > A simple way to "fix" this would be to make set_iounmap_nonlazy() set
> > > > vmap_lazy_nr to lazy_max_pages() instead of lazy_max_pages() + 1. But, I
> > > > think it'd be better to get rid of this hack of clobbering vmap_lazy_nr.
> > > > Instead, this fix makes __copy_oldmem_page() explicitly drain the vmap
> > > > areas itself.
> > > 
> > > This fixes the bug and the interface also is better than what we had
> > > before.  But a vmap/iounmap_eager would seem even better.  But hey,
> > > right now it has one caller in always-built-in x86 arch code, so maybe
> > > it isn't worth spending more effort on this.
> > >
> > IMHO, it just makes sense to remove it. The set_iounmap_nonlazy() was
> > added in 2010 year:
> > 
> > <snip>
> > commit 3ee48b6af49cf534ca2f481ecc484b156a41451d
> > Author: Cliff Wickman <cpw@sgi.com>
> > Date:   Thu Sep 16 11:44:02 2010 -0500
> > 
> >     mm, x86: Saving vmcore with non-lazy freeing of vmas
> > 
> >     During the reading of /proc/vmcore the kernel is doing
> >     ioremap()/iounmap() repeatedly. And the buildup of un-flushed
> >     vm_area_struct's is causing a great deal of overhead. (rb_next()
> >     is chewing up most of that time).
> > 
> >     This solution is to provide function set_iounmap_nonlazy(). It
> >     causes a subsequent call to iounmap() to immediately purge the
> >     vma area (with try_purge_vmap_area_lazy()).
> > 
> >     With this patch we have seen the time for writing a 250MB
> >     compressed dump drop from 71 seconds to 44 seconds.
> > 
> >     Signed-off-by: Cliff Wickman <cpw@sgi.com>
> >     Cc: Andrew Morton <akpm@linux-foundation.org>
> >     Cc: kexec@lists.infradead.org
> >     Cc: <stable@kernel.org>
> >     LKML-Reference: <E1OwHZ4-0005WK-Tw@eag09.americas.sgi.com>
> >     Signed-off-by: Ingo Molnar <mingo@elte.hu>
> > <snip>
> > 
> > and the reason was the "slow vmap" code, i.e. due to poor performance
> > they decided to drop the lazily ASAP. Now we have absolutely different
> > picture when it comes to performance and the vmalloc/vmap code.
> 
> I would vote for the current code change, removing it. As pointed out by
> Christoph, it's only used by x86, may not be so worth to introduce a new
> interface.

I did a quick benchmark to see if this optimization is still needed.
This is on a system with 32GB RAM. I timed
`dd if=/proc/vmcore of=/dev/null` with 4k and 1M block sizes on 5.17,
5.18 with this fix, and 5.18 with the non-lazy cleanup removed entirely.
It looks like Uladzislau has a point, and this "optimization" actually
slows things down now:

  |5.17  |5.18+fix|5.18+removal
4k|40.86s|  40.09s|      26.73s
1M|24.47s|  23.98s|      21.84s

I'll send a v2 which removes set_iounmap_nonlazy() entirely.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] mm/vmalloc: fix spinning drain_vmap_work after reading from /proc/vmcore
  2022-04-06 20:16         ` Omar Sandoval
@ 2022-04-07  2:32           ` Baoquan He
  -1 siblings, 0 replies; 14+ messages in thread
From: Baoquan He @ 2022-04-07  2:32 UTC (permalink / raw)
  To: Omar Sandoval
  Cc: Uladzislau Rezki, Christoph Hellwig, linux-mm, kexec,
	Andrew Morton, Cliff Wickman, x86, kernel-team

On 04/06/22 at 01:16pm, Omar Sandoval wrote:
> On Wed, Apr 06, 2022 at 05:59:53PM +0800, Baoquan He wrote:
> > On 04/06/22 at 11:13am, Uladzislau Rezki wrote:
> > > > On Tue, Apr 05, 2022 at 12:40:31PM -0700, Omar Sandoval wrote:
> > > > > A simple way to "fix" this would be to make set_iounmap_nonlazy() set
> > > > > vmap_lazy_nr to lazy_max_pages() instead of lazy_max_pages() + 1. But, I
> > > > > think it'd be better to get rid of this hack of clobbering vmap_lazy_nr.
> > > > > Instead, this fix makes __copy_oldmem_page() explicitly drain the vmap
> > > > > areas itself.
> > > > 
> > > > This fixes the bug and the interface also is better than what we had
> > > > before.  But a vmap/iounmap_eager would seem even better.  But hey,
> > > > right now it has one caller in always-built-in x86 arch code, so maybe
> > > > it isn't worth spending more effort on this.
> > > >
> > > IMHO, it just makes sense to remove it. The set_iounmap_nonlazy() was
> > > added in 2010 year:
> > > 
> > > <snip>
> > > commit 3ee48b6af49cf534ca2f481ecc484b156a41451d
> > > Author: Cliff Wickman <cpw@sgi.com>
> > > Date:   Thu Sep 16 11:44:02 2010 -0500
> > > 
> > >     mm, x86: Saving vmcore with non-lazy freeing of vmas
> > > 
> > >     During the reading of /proc/vmcore the kernel is doing
> > >     ioremap()/iounmap() repeatedly. And the buildup of un-flushed
> > >     vm_area_struct's is causing a great deal of overhead. (rb_next()
> > >     is chewing up most of that time).
> > > 
> > >     This solution is to provide function set_iounmap_nonlazy(). It
> > >     causes a subsequent call to iounmap() to immediately purge the
> > >     vma area (with try_purge_vmap_area_lazy()).
> > > 
> > >     With this patch we have seen the time for writing a 250MB
> > >     compressed dump drop from 71 seconds to 44 seconds.
> > > 
> > >     Signed-off-by: Cliff Wickman <cpw@sgi.com>
> > >     Cc: Andrew Morton <akpm@linux-foundation.org>
> > >     Cc: kexec@lists.infradead.org
> > >     Cc: <stable@kernel.org>
> > >     LKML-Reference: <E1OwHZ4-0005WK-Tw@eag09.americas.sgi.com>
> > >     Signed-off-by: Ingo Molnar <mingo@elte.hu>
> > > <snip>
> > > 
> > > and the reason was the "slow vmap" code, i.e. due to poor performance
> > > they decided to drop the lazily ASAP. Now we have absolutely different
> > > picture when it comes to performance and the vmalloc/vmap code.
> > 
> > I would vote for the current code change, removing it. As pointed out by
> > Christoph, it's only used by x86, may not be so worth to introduce a new
> > interface.
> 
> I did a quick benchmark to see if this optimization is still needed.
> This is on a system with 32GB RAM. I timed
> `dd if=/proc/vmcore of=/dev/null` with 4k and 1M block sizes on 5.17,
> 5.18 with this fix, and 5.18 with the non-lazy cleanup removed entirely.
> It looks like Uladzislau has a point, and this "optimization" actually
> slows things down now:
> 
>   |5.17  |5.18+fix|5.18+removal
> 4k|40.86s|  40.09s|      26.73s
> 1M|24.47s|  23.98s|      21.84s
> 
> I'll send a v2 which removes set_iounmap_nonlazy() entirely.

Hi Omar,

Thanks for the effort of posting a patch to fix this and for the further
benchmark testing.

The removal I mentioned means what you are doing in v1. But from your
testing results, it seems removing set_iounmap_nonlazy() entirely is
better. I agree that the old optimization was made too long ago and should
not be needed any more. E.g. the added purge_vmap_area_root tree will speed
up the searching, adding and removing of the purged vmap_area.

I am wondering if this is a real issue you hit, or if you just found it by
code inspection. If it breaks vmcore dumping, we should add a 'Fixes' tag
and cc stable.

I am also wondering how your vmcore dumping is handled. Asking because we
usually use the makedumpfile utility to filter and dump the kernel data;
the unneeded data, e.g. zeroed pages and unused pages, are all filtered
out. When using makedumpfile, we mmap 4M at a time by default and then
process the content. So copy_oldmem_page() may only be called when reading
the elfcorehdr and notes.

Thanks
Baoquan



^ permalink raw reply	[flat|nested] 14+ messages in thread
