linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] vmalloc: respect the GFP_NOIO and GFP_NOFS flags
@ 2017-06-30  2:25 Mikulas Patocka
  2017-06-30  6:20 ` John Hubbard
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Mikulas Patocka @ 2017-06-30  2:25 UTC (permalink / raw)
  To: Michal Hocko, Alexei Starovoitov, Daniel Borkmann
  Cc: Andrew Morton, Stephen Rothwell, Vlastimil Babka, Andreas Dilger,
	John Hubbard, David Miller, linux-kernel, linux-mm, netdev

The __vmalloc function has a parameter gfp_mask with the allocation flags,
however it doesn't fully respect the GFP_NOIO and GFP_NOFS flags. The
pages are allocated with the specified gfp flags, but the pagetables are
always allocated with GFP_KERNEL. This allocation can cause unexpected
recursion into the filesystem or I/O subsystem.

It is not practical to extend page table allocation routines with gfp
flags because it would require modification of architecture-specific code
in all architecturs. However, the process can temporarily request that all
allocations are done with GFP_NOFS or GFP_NOIO with with the functions
memalloc_nofs_save and memalloc_noio_save.

This patch makes the vmalloc code use memalloc_nofs_save or
memalloc_noio_save if the supplied gfp flags do not contain __GFP_FS or
__GFP_IO. It fixes some possible deadlocks in drivers/mtd/ubi/io.c,
fs/gfs2/, fs/btrfs/free-space-tree.c, fs/ubifs/,
fs/nfs/blocklayout/extent_tree.c where __vmalloc is used with the GFP_NOFS
flag.

The patch also simplifies code in dm-bufio.c, dm-ioctl.c and fs/xfs/kmem.c
by removing explicit calls to memalloc_nofs_save and memalloc_noio_save
before the call to __vmalloc.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

---
 drivers/md/dm-bufio.c |   24 +-----------------------
 drivers/md/dm-ioctl.c |    6 +-----
 fs/xfs/kmem.c         |   14 --------------
 mm/util.c             |    6 +++---
 mm/vmalloc.c          |   18 +++++++++++++++++-
 5 files changed, 22 insertions(+), 46 deletions(-)

Index: linux-2.6/mm/vmalloc.c
===================================================================
--- linux-2.6.orig/mm/vmalloc.c
+++ linux-2.6/mm/vmalloc.c
@@ -31,6 +31,7 @@
 #include <linux/compiler.h>
 #include <linux/llist.h>
 #include <linux/bitops.h>
+#include <linux/sched/mm.h>
 
 #include <linux/uaccess.h>
 #include <asm/tlbflush.h>
@@ -1670,6 +1671,8 @@ static void *__vmalloc_area_node(struct
 	unsigned int nr_pages, array_size, i;
 	const gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
 	const gfp_t alloc_mask = gfp_mask | __GFP_HIGHMEM | __GFP_NOWARN;
+	unsigned noio_flag;
+	int r;
 
 	nr_pages = get_vm_area_size(area) >> PAGE_SHIFT;
 	array_size = (nr_pages * sizeof(struct page *));
@@ -1712,8 +1715,21 @@ static void *__vmalloc_area_node(struct
 			cond_resched();
 	}
 
-	if (map_vm_area(area, prot, pages))
+	if (unlikely(!(gfp_mask & __GFP_IO)))
+		noio_flag = memalloc_noio_save();
+	else if (unlikely(!(gfp_mask & __GFP_FS)))
+		noio_flag = memalloc_nofs_save();
+
+	r = map_vm_area(area, prot, pages);
+
+	if (unlikely(!(gfp_mask & __GFP_IO)))
+		memalloc_noio_restore(noio_flag);
+	else if (unlikely(!(gfp_mask & __GFP_FS)))
+		memalloc_nofs_restore(noio_flag);
+
+	if (unlikely(r))
 		goto fail;
+
 	return area->addr;
 
 fail:
Index: linux-2.6/mm/util.c
===================================================================
--- linux-2.6.orig/mm/util.c
+++ linux-2.6/mm/util.c
@@ -351,10 +351,10 @@ void *kvmalloc_node(size_t size, gfp_t f
 	void *ret;
 
 	/*
-	 * vmalloc uses GFP_KERNEL for some internal allocations (e.g page tables)
-	 * so the given set of flags has to be compatible.
+	 * vmalloc uses blocking allocations for some internal allocations
+	 * (e.g page tables) so the given set of flags has to be compatible.
 	 */
-	WARN_ON_ONCE((flags & GFP_KERNEL) != GFP_KERNEL);
+	WARN_ON_ONCE(!gfpflags_allow_blocking(flags));
 
 	/*
 	 * We want to attempt a large physically contiguous block first because
Index: linux-2.6/drivers/md/dm-bufio.c
===================================================================
--- linux-2.6.orig/drivers/md/dm-bufio.c
+++ linux-2.6/drivers/md/dm-bufio.c
@@ -386,9 +386,6 @@ static void __cache_size_refresh(void)
 static void *alloc_buffer_data(struct dm_bufio_client *c, gfp_t gfp_mask,
 			       enum data_mode *data_mode)
 {
-	unsigned noio_flag;
-	void *ptr;
-
 	if (c->block_size <= DM_BUFIO_BLOCK_SIZE_SLAB_LIMIT) {
 		*data_mode = DATA_MODE_SLAB;
 		return kmem_cache_alloc(DM_BUFIO_CACHE(c), gfp_mask);
@@ -402,26 +399,7 @@ static void *alloc_buffer_data(struct dm
 	}
 
 	*data_mode = DATA_MODE_VMALLOC;
-
-	/*
-	 * __vmalloc allocates the data pages and auxiliary structures with
-	 * gfp_flags that were specified, but pagetables are always allocated
-	 * with GFP_KERNEL, no matter what was specified as gfp_mask.
-	 *
-	 * Consequently, we must set per-process flag PF_MEMALLOC_NOIO so that
-	 * all allocations done by this process (including pagetables) are done
-	 * as if GFP_NOIO was specified.
-	 */
-
-	if (gfp_mask & __GFP_NORETRY)
-		noio_flag = memalloc_noio_save();
-
-	ptr = __vmalloc(c->block_size, gfp_mask, PAGE_KERNEL);
-
-	if (gfp_mask & __GFP_NORETRY)
-		memalloc_noio_restore(noio_flag);
-
-	return ptr;
+	return __vmalloc(c->block_size, gfp_mask, PAGE_KERNEL);
 }
 
 /*
Index: linux-2.6/drivers/md/dm-ioctl.c
===================================================================
--- linux-2.6.orig/drivers/md/dm-ioctl.c
+++ linux-2.6/drivers/md/dm-ioctl.c
@@ -1691,7 +1691,6 @@ static int copy_params(struct dm_ioctl _
 	struct dm_ioctl *dmi;
 	int secure_data;
 	const size_t minimum_data_size = offsetof(struct dm_ioctl, data);
-	unsigned noio_flag;
 
 	if (copy_from_user(param_kernel, user, minimum_data_size))
 		return -EFAULT;
@@ -1714,10 +1713,7 @@ static int copy_params(struct dm_ioctl _
 	 * suspended and the ioctl is needed to resume it.
 	 * Use kmalloc() rather than vmalloc() when we can.
 	 */
-	dmi = NULL;
-	noio_flag = memalloc_noio_save();
-	dmi = kvmalloc(param_kernel->data_size, GFP_KERNEL | __GFP_HIGH);
-	memalloc_noio_restore(noio_flag);
+	dmi = kvmalloc(param_kernel->data_size, GFP_NOIO | __GFP_HIGH);
 
 	if (!dmi) {
 		if (secure_data && clear_user(user, param_kernel->data_size))
Index: linux-2.6/fs/xfs/kmem.c
===================================================================
--- linux-2.6.orig/fs/xfs/kmem.c
+++ linux-2.6/fs/xfs/kmem.c
@@ -48,7 +48,6 @@ kmem_alloc(size_t size, xfs_km_flags_t f
 void *
 kmem_zalloc_large(size_t size, xfs_km_flags_t flags)
 {
-	unsigned nofs_flag = 0;
 	void	*ptr;
 	gfp_t	lflags;
 
@@ -56,22 +55,9 @@ kmem_zalloc_large(size_t size, xfs_km_fl
 	if (ptr)
 		return ptr;
 
-	/*
-	 * __vmalloc() will allocate data pages and auxillary structures (e.g.
-	 * pagetables) with GFP_KERNEL, yet we may be under GFP_NOFS context
-	 * here. Hence we need to tell memory reclaim that we are in such a
-	 * context via PF_MEMALLOC_NOFS to prevent memory reclaim re-entering
-	 * the filesystem here and potentially deadlocking.
-	 */
-	if (flags & KM_NOFS)
-		nofs_flag = memalloc_nofs_save();
-
 	lflags = kmem_flags_convert(flags);
 	ptr = __vmalloc(size, lflags | __GFP_ZERO, PAGE_KERNEL);
 
-	if (flags & KM_NOFS)
-		memalloc_nofs_restore(nofs_flag);
-
 	return ptr;
 }
 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] vmalloc: respect the GFP_NOIO and GFP_NOFS flags
  2017-06-30  2:25 [PATCH] vmalloc: respect the GFP_NOIO and GFP_NOFS flags Mikulas Patocka
@ 2017-06-30  6:20 ` John Hubbard
  2017-06-30  8:12 ` Michal Hocko
  2017-07-01  3:25 ` Andreas Dilger
  2 siblings, 0 replies; 11+ messages in thread
From: John Hubbard @ 2017-06-30  6:20 UTC (permalink / raw)
  To: Mikulas Patocka, Michal Hocko, Alexei Starovoitov, Daniel Borkmann
  Cc: Andrew Morton, Stephen Rothwell, Vlastimil Babka, Andreas Dilger,
	David Miller, linux-kernel, linux-mm, netdev

On 06/29/2017 07:25 PM, Mikulas Patocka wrote:
> The __vmalloc function has a parameter gfp_mask with the allocation flags,
> however it doesn't fully respect the GFP_NOIO and GFP_NOFS flags. The
> pages are allocated with the specified gfp flags, but the pagetables are
> always allocated with GFP_KERNEL. This allocation can cause unexpected
> recursion into the filesystem or I/O subsystem.
> 
> It is not practical to extend page table allocation routines with gfp
> flags because it would require modification of architecture-specific code
> in all architecturs. However, the process can temporarily request that all
> allocations are done with GFP_NOFS or GFP_NOIO with with the functions
> memalloc_nofs_save and memalloc_noio_save.
> 
> This patch makes the vmalloc code use memalloc_nofs_save or
> memalloc_noio_save if the supplied gfp flags do not contain __GFP_FS or
> __GFP_IO. It fixes some possible deadlocks in drivers/mtd/ubi/io.c,
> fs/gfs2/, fs/btrfs/free-space-tree.c, fs/ubifs/,
> fs/nfs/blocklayout/extent_tree.c where __vmalloc is used with the GFP_NOFS
> flag.
> 
> The patch also simplifies code in dm-bufio.c, dm-ioctl.c and fs/xfs/kmem.c
> by removing explicit calls to memalloc_nofs_save and memalloc_noio_save
> before the call to __vmalloc.
> 
> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> 
> ---
>  drivers/md/dm-bufio.c |   24 +-----------------------
>  drivers/md/dm-ioctl.c |    6 +-----
>  fs/xfs/kmem.c         |   14 --------------
>  mm/util.c             |    6 +++---
>  mm/vmalloc.c          |   18 +++++++++++++++++-
>  5 files changed, 22 insertions(+), 46 deletions(-)
> 
> Index: linux-2.6/mm/vmalloc.c
> ===================================================================
> --- linux-2.6.orig/mm/vmalloc.c
> +++ linux-2.6/mm/vmalloc.c
> @@ -31,6 +31,7 @@
>  #include <linux/compiler.h>
>  #include <linux/llist.h>
>  #include <linux/bitops.h>
> +#include <linux/sched/mm.h>
>  
>  #include <linux/uaccess.h>
>  #include <asm/tlbflush.h>
> @@ -1670,6 +1671,8 @@ static void *__vmalloc_area_node(struct
>  	unsigned int nr_pages, array_size, i;
>  	const gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
>  	const gfp_t alloc_mask = gfp_mask | __GFP_HIGHMEM | __GFP_NOWARN;
> +	unsigned noio_flag;
> +	int r;
>  
>  	nr_pages = get_vm_area_size(area) >> PAGE_SHIFT;
>  	array_size = (nr_pages * sizeof(struct page *));
> @@ -1712,8 +1715,21 @@ static void *__vmalloc_area_node(struct
>  			cond_resched();
>  	}
>  
> -	if (map_vm_area(area, prot, pages))
> +	if (unlikely(!(gfp_mask & __GFP_IO)))
> +		noio_flag = memalloc_noio_save();
> +	else if (unlikely(!(gfp_mask & __GFP_FS)))
> +		noio_flag = memalloc_nofs_save();
> +
> +	r = map_vm_area(area, prot, pages);
> +
> +	if (unlikely(!(gfp_mask & __GFP_IO)))
> +		memalloc_noio_restore(noio_flag);
> +	else if (unlikely(!(gfp_mask & __GFP_FS)))
> +		memalloc_nofs_restore(noio_flag);
> +
> +	if (unlikely(r))
>  		goto fail;
> +
>  	return area->addr;
>  
>  fail:
> Index: linux-2.6/mm/util.c
> ===================================================================
> --- linux-2.6.orig/mm/util.c
> +++ linux-2.6/mm/util.c
> @@ -351,10 +351,10 @@ void *kvmalloc_node(size_t size, gfp_t f
>  	void *ret;
>  
>  	/*
> -	 * vmalloc uses GFP_KERNEL for some internal allocations (e.g page tables)
> -	 * so the given set of flags has to be compatible.
> +	 * vmalloc uses blocking allocations for some internal allocations
> +	 * (e.g page tables) so the given set of flags has to be compatible.
>  	 */
> -	WARN_ON_ONCE((flags & GFP_KERNEL) != GFP_KERNEL);
> +	WARN_ON_ONCE(!gfpflags_allow_blocking(flags));

Hi Mikulas,

OK, so given the new behavior in the underlying __vmalloc code, I think it's
appropriate to add this documentation change on top of what you have so far:

 mm/util.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/mm/util.c b/mm/util.c
index cdbc9022c021..39fe94530dd2 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -343,7 +343,8 @@ EXPORT_SYMBOL(vm_mmap);
  * __GFP_RETRY_MAYFAIL is supported, and it should be used only if kmalloc is
  * preferable to the vmalloc fallback, due to visible performance drawbacks.
  *
- * Any use of gfp flags outside of GFP_KERNEL should be consulted with mm people.
+ * Any use of gfp flags other than GFP_KERNEL, GFP_NOIO, or GFP_NOFS should
+ * be done only after consulting with mm people.
  */
 void *kvmalloc_node(size_t size, gfp_t flags, int node)
 {

>  
>  	/*
>  	 * We want to attempt a large physically contiguous block first because
> Index: linux-2.6/drivers/md/dm-bufio.c
> ===================================================================
> --- linux-2.6.orig/drivers/md/dm-bufio.c
> +++ linux-2.6/drivers/md/dm-bufio.c
> @@ -386,9 +386,6 @@ static void __cache_size_refresh(void)
>  static void *alloc_buffer_data(struct dm_bufio_client *c, gfp_t gfp_mask,
>  			       enum data_mode *data_mode)
>  {
> -	unsigned noio_flag;
> -	void *ptr;
> -
>  	if (c->block_size <= DM_BUFIO_BLOCK_SIZE_SLAB_LIMIT) {
>  		*data_mode = DATA_MODE_SLAB;
>  		return kmem_cache_alloc(DM_BUFIO_CACHE(c), gfp_mask);
> @@ -402,26 +399,7 @@ static void *alloc_buffer_data(struct dm
>  	}
>  
>  	*data_mode = DATA_MODE_VMALLOC;
> -
> -	/*
> -	 * __vmalloc allocates the data pages and auxiliary structures with
> -	 * gfp_flags that were specified, but pagetables are always allocated
> -	 * with GFP_KERNEL, no matter what was specified as gfp_mask.
> -	 *
> -	 * Consequently, we must set per-process flag PF_MEMALLOC_NOIO so that
> -	 * all allocations done by this process (including pagetables) are done
> -	 * as if GFP_NOIO was specified.
> -	 */
> -
> -	if (gfp_mask & __GFP_NORETRY)
> -		noio_flag = memalloc_noio_save();
> -
> -	ptr = __vmalloc(c->block_size, gfp_mask, PAGE_KERNEL);
> -
> -	if (gfp_mask & __GFP_NORETRY)
> -		memalloc_noio_restore(noio_flag);
> -
> -	return ptr;
> +	return __vmalloc(c->block_size, gfp_mask, PAGE_KERNEL);

Seems like a nice cleanup, and although I'm not an expert, it looks
correct to me.

>  }
>  
>  /*
> Index: linux-2.6/drivers/md/dm-ioctl.c
> ===================================================================
> --- linux-2.6.orig/drivers/md/dm-ioctl.c
> +++ linux-2.6/drivers/md/dm-ioctl.c
> @@ -1691,7 +1691,6 @@ static int copy_params(struct dm_ioctl _
>  	struct dm_ioctl *dmi;
>  	int secure_data;
>  	const size_t minimum_data_size = offsetof(struct dm_ioctl, data);
> -	unsigned noio_flag;
>  
>  	if (copy_from_user(param_kernel, user, minimum_data_size))
>  		return -EFAULT;
> @@ -1714,10 +1713,7 @@ static int copy_params(struct dm_ioctl _
>  	 * suspended and the ioctl is needed to resume it.
>  	 * Use kmalloc() rather than vmalloc() when we can.
>  	 */
> -	dmi = NULL;
> -	noio_flag = memalloc_noio_save();
> -	dmi = kvmalloc(param_kernel->data_size, GFP_KERNEL | __GFP_HIGH);
> -	memalloc_noio_restore(noio_flag);
> +	dmi = kvmalloc(param_kernel->data_size, GFP_NOIO | __GFP_HIGH);

Pre-existing, and apologies if I missed a conversation about it, but is
__GFP_HIGH necessary and appropriate for kvmalloc()? If so, maybe we
should add it to the list of approved GFP flags in the function's
documentation. But I tentatively think you can just pass in GFP_NOIO
instead, and get the right behavior.

thanks,
john h

>  
>  	if (!dmi) {
>  		if (secure_data && clear_user(user, param_kernel->data_size))
> Index: linux-2.6/fs/xfs/kmem.c
> ===================================================================
> --- linux-2.6.orig/fs/xfs/kmem.c
> +++ linux-2.6/fs/xfs/kmem.c
> @@ -48,7 +48,6 @@ kmem_alloc(size_t size, xfs_km_flags_t f
>  void *
>  kmem_zalloc_large(size_t size, xfs_km_flags_t flags)
>  {
> -	unsigned nofs_flag = 0;
>  	void	*ptr;
>  	gfp_t	lflags;
>  
> @@ -56,22 +55,9 @@ kmem_zalloc_large(size_t size, xfs_km_fl
>  	if (ptr)
>  		return ptr;
>  
> -	/*
> -	 * __vmalloc() will allocate data pages and auxillary structures (e.g.
> -	 * pagetables) with GFP_KERNEL, yet we may be under GFP_NOFS context
> -	 * here. Hence we need to tell memory reclaim that we are in such a
> -	 * context via PF_MEMALLOC_NOFS to prevent memory reclaim re-entering
> -	 * the filesystem here and potentially deadlocking.
> -	 */
> -	if (flags & KM_NOFS)
> -		nofs_flag = memalloc_nofs_save();
> -
>  	lflags = kmem_flags_convert(flags);
>  	ptr = __vmalloc(size, lflags | __GFP_ZERO, PAGE_KERNEL);
>  
> -	if (flags & KM_NOFS)
> -		memalloc_nofs_restore(nofs_flag);
> -
>  	return ptr;
>  }
>  
> 

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] vmalloc: respect the GFP_NOIO and GFP_NOFS flags
  2017-06-30  2:25 [PATCH] vmalloc: respect the GFP_NOIO and GFP_NOFS flags Mikulas Patocka
  2017-06-30  6:20 ` John Hubbard
@ 2017-06-30  8:12 ` Michal Hocko
  2017-06-30 18:11   ` Mikulas Patocka
  2017-07-01  3:25 ` Andreas Dilger
  2 siblings, 1 reply; 11+ messages in thread
From: Michal Hocko @ 2017-06-30  8:12 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrew Morton,
	Stephen Rothwell, Vlastimil Babka, Andreas Dilger, John Hubbard,
	David Miller, linux-kernel, linux-mm, netdev

On Thu 29-06-17 22:25:09, Mikulas Patocka wrote:
> The __vmalloc function has a parameter gfp_mask with the allocation flags,
> however it doesn't fully respect the GFP_NOIO and GFP_NOFS flags. The
> pages are allocated with the specified gfp flags, but the pagetables are
> always allocated with GFP_KERNEL. This allocation can cause unexpected
> recursion into the filesystem or I/O subsystem.
> 
> It is not practical to extend page table allocation routines with gfp
> flags because it would require modification of architecture-specific code
> in all architecturs. However, the process can temporarily request that all
> allocations are done with GFP_NOFS or GFP_NOIO with with the functions
> memalloc_nofs_save and memalloc_noio_save.
> 
> This patch makes the vmalloc code use memalloc_nofs_save or
> memalloc_noio_save if the supplied gfp flags do not contain __GFP_FS or
> __GFP_IO. It fixes some possible deadlocks in drivers/mtd/ubi/io.c,
> fs/gfs2/, fs/btrfs/free-space-tree.c, fs/ubifs/,
> fs/nfs/blocklayout/extent_tree.c where __vmalloc is used with the GFP_NOFS
> flag.

I strongly believe this is a step in the _wrong_ direction. Why? Because
the memalloc_no{io,fs}_save API is for the scope allocation context. We
want users of the scope to define it and document why it is needed.
GFP_NOFS (I haven't checked GFP_NOIO users) is overused a _lot_ mostly
based on the filesystem should rather use it to prevent deadlock cargo
cult. This should change longterm because heavy fs workloads can cause
troubles to the memory reclaim. So we really want to encourage those
users to define nofs scopes (e.g. on journal locked contexts etc.)
rather than have them use the GFP_NOFS explicitly and very often
mindlessly.

I am not going to nack this patch because it not incorrect but I would
really like to discourage you from it because while it saves 24 lines of
code it (ab)uses the scope allocation context at a wrong layer.

> The patch also simplifies code in dm-bufio.c, dm-ioctl.c and fs/xfs/kmem.c
> by removing explicit calls to memalloc_nofs_save and memalloc_noio_save
> before the call to __vmalloc.
> 
> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> 
> ---
>  drivers/md/dm-bufio.c |   24 +-----------------------
>  drivers/md/dm-ioctl.c |    6 +-----
>  fs/xfs/kmem.c         |   14 --------------
>  mm/util.c             |    6 +++---
>  mm/vmalloc.c          |   18 +++++++++++++++++-
>  5 files changed, 22 insertions(+), 46 deletions(-)
> 
> Index: linux-2.6/mm/vmalloc.c
> ===================================================================
> --- linux-2.6.orig/mm/vmalloc.c
> +++ linux-2.6/mm/vmalloc.c
> @@ -31,6 +31,7 @@
>  #include <linux/compiler.h>
>  #include <linux/llist.h>
>  #include <linux/bitops.h>
> +#include <linux/sched/mm.h>
>  
>  #include <linux/uaccess.h>
>  #include <asm/tlbflush.h>
> @@ -1670,6 +1671,8 @@ static void *__vmalloc_area_node(struct
>  	unsigned int nr_pages, array_size, i;
>  	const gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
>  	const gfp_t alloc_mask = gfp_mask | __GFP_HIGHMEM | __GFP_NOWARN;
> +	unsigned noio_flag;
> +	int r;
>  
>  	nr_pages = get_vm_area_size(area) >> PAGE_SHIFT;
>  	array_size = (nr_pages * sizeof(struct page *));
> @@ -1712,8 +1715,21 @@ static void *__vmalloc_area_node(struct
>  			cond_resched();
>  	}
>  
> -	if (map_vm_area(area, prot, pages))
> +	if (unlikely(!(gfp_mask & __GFP_IO)))
> +		noio_flag = memalloc_noio_save();
> +	else if (unlikely(!(gfp_mask & __GFP_FS)))
> +		noio_flag = memalloc_nofs_save();
> +
> +	r = map_vm_area(area, prot, pages);
> +
> +	if (unlikely(!(gfp_mask & __GFP_IO)))
> +		memalloc_noio_restore(noio_flag);
> +	else if (unlikely(!(gfp_mask & __GFP_FS)))
> +		memalloc_nofs_restore(noio_flag);
> +
> +	if (unlikely(r))
>  		goto fail;
> +
>  	return area->addr;
>  
>  fail:
> Index: linux-2.6/mm/util.c
> ===================================================================
> --- linux-2.6.orig/mm/util.c
> +++ linux-2.6/mm/util.c
> @@ -351,10 +351,10 @@ void *kvmalloc_node(size_t size, gfp_t f
>  	void *ret;
>  
>  	/*
> -	 * vmalloc uses GFP_KERNEL for some internal allocations (e.g page tables)
> -	 * so the given set of flags has to be compatible.
> +	 * vmalloc uses blocking allocations for some internal allocations
> +	 * (e.g page tables) so the given set of flags has to be compatible.
>  	 */
> -	WARN_ON_ONCE((flags & GFP_KERNEL) != GFP_KERNEL);
> +	WARN_ON_ONCE(!gfpflags_allow_blocking(flags));
>  
>  	/*
>  	 * We want to attempt a large physically contiguous block first because
> Index: linux-2.6/drivers/md/dm-bufio.c
> ===================================================================
> --- linux-2.6.orig/drivers/md/dm-bufio.c
> +++ linux-2.6/drivers/md/dm-bufio.c
> @@ -386,9 +386,6 @@ static void __cache_size_refresh(void)
>  static void *alloc_buffer_data(struct dm_bufio_client *c, gfp_t gfp_mask,
>  			       enum data_mode *data_mode)
>  {
> -	unsigned noio_flag;
> -	void *ptr;
> -
>  	if (c->block_size <= DM_BUFIO_BLOCK_SIZE_SLAB_LIMIT) {
>  		*data_mode = DATA_MODE_SLAB;
>  		return kmem_cache_alloc(DM_BUFIO_CACHE(c), gfp_mask);
> @@ -402,26 +399,7 @@ static void *alloc_buffer_data(struct dm
>  	}
>  
>  	*data_mode = DATA_MODE_VMALLOC;
> -
> -	/*
> -	 * __vmalloc allocates the data pages and auxiliary structures with
> -	 * gfp_flags that were specified, but pagetables are always allocated
> -	 * with GFP_KERNEL, no matter what was specified as gfp_mask.
> -	 *
> -	 * Consequently, we must set per-process flag PF_MEMALLOC_NOIO so that
> -	 * all allocations done by this process (including pagetables) are done
> -	 * as if GFP_NOIO was specified.
> -	 */
> -
> -	if (gfp_mask & __GFP_NORETRY)
> -		noio_flag = memalloc_noio_save();
> -
> -	ptr = __vmalloc(c->block_size, gfp_mask, PAGE_KERNEL);
> -
> -	if (gfp_mask & __GFP_NORETRY)
> -		memalloc_noio_restore(noio_flag);
> -
> -	return ptr;
> +	return __vmalloc(c->block_size, gfp_mask, PAGE_KERNEL);
>  }
>  
>  /*
> Index: linux-2.6/drivers/md/dm-ioctl.c
> ===================================================================
> --- linux-2.6.orig/drivers/md/dm-ioctl.c
> +++ linux-2.6/drivers/md/dm-ioctl.c
> @@ -1691,7 +1691,6 @@ static int copy_params(struct dm_ioctl _
>  	struct dm_ioctl *dmi;
>  	int secure_data;
>  	const size_t minimum_data_size = offsetof(struct dm_ioctl, data);
> -	unsigned noio_flag;
>  
>  	if (copy_from_user(param_kernel, user, minimum_data_size))
>  		return -EFAULT;
> @@ -1714,10 +1713,7 @@ static int copy_params(struct dm_ioctl _
>  	 * suspended and the ioctl is needed to resume it.
>  	 * Use kmalloc() rather than vmalloc() when we can.
>  	 */
> -	dmi = NULL;
> -	noio_flag = memalloc_noio_save();
> -	dmi = kvmalloc(param_kernel->data_size, GFP_KERNEL | __GFP_HIGH);
> -	memalloc_noio_restore(noio_flag);
> +	dmi = kvmalloc(param_kernel->data_size, GFP_NOIO | __GFP_HIGH);
>  
>  	if (!dmi) {
>  		if (secure_data && clear_user(user, param_kernel->data_size))
> Index: linux-2.6/fs/xfs/kmem.c
> ===================================================================
> --- linux-2.6.orig/fs/xfs/kmem.c
> +++ linux-2.6/fs/xfs/kmem.c
> @@ -48,7 +48,6 @@ kmem_alloc(size_t size, xfs_km_flags_t f
>  void *
>  kmem_zalloc_large(size_t size, xfs_km_flags_t flags)
>  {
> -	unsigned nofs_flag = 0;
>  	void	*ptr;
>  	gfp_t	lflags;
>  
> @@ -56,22 +55,9 @@ kmem_zalloc_large(size_t size, xfs_km_fl
>  	if (ptr)
>  		return ptr;
>  
> -	/*
> -	 * __vmalloc() will allocate data pages and auxillary structures (e.g.
> -	 * pagetables) with GFP_KERNEL, yet we may be under GFP_NOFS context
> -	 * here. Hence we need to tell memory reclaim that we are in such a
> -	 * context via PF_MEMALLOC_NOFS to prevent memory reclaim re-entering
> -	 * the filesystem here and potentially deadlocking.
> -	 */
> -	if (flags & KM_NOFS)
> -		nofs_flag = memalloc_nofs_save();
> -
>  	lflags = kmem_flags_convert(flags);
>  	ptr = __vmalloc(size, lflags | __GFP_ZERO, PAGE_KERNEL);
>  
> -	if (flags & KM_NOFS)
> -		memalloc_nofs_restore(nofs_flag);
> -
>  	return ptr;
>  }
>  

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] vmalloc: respect the GFP_NOIO and GFP_NOFS flags
  2017-06-30  8:12 ` Michal Hocko
@ 2017-06-30 18:11   ` Mikulas Patocka
  2017-06-30 20:41     ` Michal Hocko
  0 siblings, 1 reply; 11+ messages in thread
From: Mikulas Patocka @ 2017-06-30 18:11 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrew Morton,
	Stephen Rothwell, Vlastimil Babka, Andreas Dilger, John Hubbard,
	David Miller, linux-kernel, linux-mm, netdev



On Fri, 30 Jun 2017, Michal Hocko wrote:

> On Thu 29-06-17 22:25:09, Mikulas Patocka wrote:
> > The __vmalloc function has a parameter gfp_mask with the allocation flags,
> > however it doesn't fully respect the GFP_NOIO and GFP_NOFS flags. The
> > pages are allocated with the specified gfp flags, but the pagetables are
> > always allocated with GFP_KERNEL. This allocation can cause unexpected
> > recursion into the filesystem or I/O subsystem.
> > 
> > It is not practical to extend page table allocation routines with gfp
> > flags because it would require modification of architecture-specific code
> > in all architecturs. However, the process can temporarily request that all
> > allocations are done with GFP_NOFS or GFP_NOIO with with the functions
> > memalloc_nofs_save and memalloc_noio_save.
> > 
> > This patch makes the vmalloc code use memalloc_nofs_save or
> > memalloc_noio_save if the supplied gfp flags do not contain __GFP_FS or
> > __GFP_IO. It fixes some possible deadlocks in drivers/mtd/ubi/io.c,
> > fs/gfs2/, fs/btrfs/free-space-tree.c, fs/ubifs/,
> > fs/nfs/blocklayout/extent_tree.c where __vmalloc is used with the GFP_NOFS
> > flag.
> 
> I strongly believe this is a step in the _wrong_ direction. Why? Because

What do you think __vmalloc with GFP_NOIO should do? Print a warning? 
Silently ignore the GFP_NOIO flag?

Mikulas

> the memalloc_no{io,fs}_save API is for the scope allocation context. We
> want users of the scope to define it and document why it is needed.
> GFP_NOFS (I haven't checked GFP_NOIO users) is overused a _lot_ mostly
> based on the filesystem should rather use it to prevent deadlock cargo
> cult. This should change longterm because heavy fs workloads can cause
> troubles to the memory reclaim. So we really want to encourage those
> users to define nofs scopes (e.g. on journal locked contexts etc.)
> rather than have them use the GFP_NOFS explicitly and very often
> mindlessly.
> 
> I am not going to nack this patch because it not incorrect but I would
> really like to discourage you from it because while it saves 24 lines of
> code it (ab)uses the scope allocation context at a wrong layer.
> 
> > The patch also simplifies code in dm-bufio.c, dm-ioctl.c and fs/xfs/kmem.c
> > by removing explicit calls to memalloc_nofs_save and memalloc_noio_save
> > before the call to __vmalloc.
> > 
> > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> > 
> > ---
> >  drivers/md/dm-bufio.c |   24 +-----------------------
> >  drivers/md/dm-ioctl.c |    6 +-----
> >  fs/xfs/kmem.c         |   14 --------------
> >  mm/util.c             |    6 +++---
> >  mm/vmalloc.c          |   18 +++++++++++++++++-
> >  5 files changed, 22 insertions(+), 46 deletions(-)
> > 
> > Index: linux-2.6/mm/vmalloc.c
> > ===================================================================
> > --- linux-2.6.orig/mm/vmalloc.c
> > +++ linux-2.6/mm/vmalloc.c
> > @@ -31,6 +31,7 @@
> >  #include <linux/compiler.h>
> >  #include <linux/llist.h>
> >  #include <linux/bitops.h>
> > +#include <linux/sched/mm.h>
> >  
> >  #include <linux/uaccess.h>
> >  #include <asm/tlbflush.h>
> > @@ -1670,6 +1671,8 @@ static void *__vmalloc_area_node(struct
> >  	unsigned int nr_pages, array_size, i;
> >  	const gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
> >  	const gfp_t alloc_mask = gfp_mask | __GFP_HIGHMEM | __GFP_NOWARN;
> > +	unsigned noio_flag;
> > +	int r;
> >  
> >  	nr_pages = get_vm_area_size(area) >> PAGE_SHIFT;
> >  	array_size = (nr_pages * sizeof(struct page *));
> > @@ -1712,8 +1715,21 @@ static void *__vmalloc_area_node(struct
> >  			cond_resched();
> >  	}
> >  
> > -	if (map_vm_area(area, prot, pages))
> > +	if (unlikely(!(gfp_mask & __GFP_IO)))
> > +		noio_flag = memalloc_noio_save();
> > +	else if (unlikely(!(gfp_mask & __GFP_FS)))
> > +		noio_flag = memalloc_nofs_save();
> > +
> > +	r = map_vm_area(area, prot, pages);
> > +
> > +	if (unlikely(!(gfp_mask & __GFP_IO)))
> > +		memalloc_noio_restore(noio_flag);
> > +	else if (unlikely(!(gfp_mask & __GFP_FS)))
> > +		memalloc_nofs_restore(noio_flag);
> > +
> > +	if (unlikely(r))
> >  		goto fail;
> > +
> >  	return area->addr;
> >  
> >  fail:
> > Index: linux-2.6/mm/util.c
> > ===================================================================
> > --- linux-2.6.orig/mm/util.c
> > +++ linux-2.6/mm/util.c
> > @@ -351,10 +351,10 @@ void *kvmalloc_node(size_t size, gfp_t f
> >  	void *ret;
> >  
> >  	/*
> > -	 * vmalloc uses GFP_KERNEL for some internal allocations (e.g page tables)
> > -	 * so the given set of flags has to be compatible.
> > +	 * vmalloc uses blocking allocations for some internal allocations
> > +	 * (e.g page tables) so the given set of flags has to be compatible.
> >  	 */
> > -	WARN_ON_ONCE((flags & GFP_KERNEL) != GFP_KERNEL);
> > +	WARN_ON_ONCE(!gfpflags_allow_blocking(flags));
> >  
> >  	/*
> >  	 * We want to attempt a large physically contiguous block first because
> > Index: linux-2.6/drivers/md/dm-bufio.c
> > ===================================================================
> > --- linux-2.6.orig/drivers/md/dm-bufio.c
> > +++ linux-2.6/drivers/md/dm-bufio.c
> > @@ -386,9 +386,6 @@ static void __cache_size_refresh(void)
> >  static void *alloc_buffer_data(struct dm_bufio_client *c, gfp_t gfp_mask,
> >  			       enum data_mode *data_mode)
> >  {
> > -	unsigned noio_flag;
> > -	void *ptr;
> > -
> >  	if (c->block_size <= DM_BUFIO_BLOCK_SIZE_SLAB_LIMIT) {
> >  		*data_mode = DATA_MODE_SLAB;
> >  		return kmem_cache_alloc(DM_BUFIO_CACHE(c), gfp_mask);
> > @@ -402,26 +399,7 @@ static void *alloc_buffer_data(struct dm
> >  	}
> >  
> >  	*data_mode = DATA_MODE_VMALLOC;
> > -
> > -	/*
> > -	 * __vmalloc allocates the data pages and auxiliary structures with
> > -	 * gfp_flags that were specified, but pagetables are always allocated
> > -	 * with GFP_KERNEL, no matter what was specified as gfp_mask.
> > -	 *
> > -	 * Consequently, we must set per-process flag PF_MEMALLOC_NOIO so that
> > -	 * all allocations done by this process (including pagetables) are done
> > -	 * as if GFP_NOIO was specified.
> > -	 */
> > -
> > -	if (gfp_mask & __GFP_NORETRY)
> > -		noio_flag = memalloc_noio_save();
> > -
> > -	ptr = __vmalloc(c->block_size, gfp_mask, PAGE_KERNEL);
> > -
> > -	if (gfp_mask & __GFP_NORETRY)
> > -		memalloc_noio_restore(noio_flag);
> > -
> > -	return ptr;
> > +	return __vmalloc(c->block_size, gfp_mask, PAGE_KERNEL);
> >  }
> >  
> >  /*
> > Index: linux-2.6/drivers/md/dm-ioctl.c
> > ===================================================================
> > --- linux-2.6.orig/drivers/md/dm-ioctl.c
> > +++ linux-2.6/drivers/md/dm-ioctl.c
> > @@ -1691,7 +1691,6 @@ static int copy_params(struct dm_ioctl _
> >  	struct dm_ioctl *dmi;
> >  	int secure_data;
> >  	const size_t minimum_data_size = offsetof(struct dm_ioctl, data);
> > -	unsigned noio_flag;
> >  
> >  	if (copy_from_user(param_kernel, user, minimum_data_size))
> >  		return -EFAULT;
> > @@ -1714,10 +1713,7 @@ static int copy_params(struct dm_ioctl _
> >  	 * suspended and the ioctl is needed to resume it.
> >  	 * Use kmalloc() rather than vmalloc() when we can.
> >  	 */
> > -	dmi = NULL;
> > -	noio_flag = memalloc_noio_save();
> > -	dmi = kvmalloc(param_kernel->data_size, GFP_KERNEL | __GFP_HIGH);
> > -	memalloc_noio_restore(noio_flag);
> > +	dmi = kvmalloc(param_kernel->data_size, GFP_NOIO | __GFP_HIGH);
> >  
> >  	if (!dmi) {
> >  		if (secure_data && clear_user(user, param_kernel->data_size))
> > Index: linux-2.6/fs/xfs/kmem.c
> > ===================================================================
> > --- linux-2.6.orig/fs/xfs/kmem.c
> > +++ linux-2.6/fs/xfs/kmem.c
> > @@ -48,7 +48,6 @@ kmem_alloc(size_t size, xfs_km_flags_t f
> >  void *
> >  kmem_zalloc_large(size_t size, xfs_km_flags_t flags)
> >  {
> > -	unsigned nofs_flag = 0;
> >  	void	*ptr;
> >  	gfp_t	lflags;
> >  
> > @@ -56,22 +55,9 @@ kmem_zalloc_large(size_t size, xfs_km_fl
> >  	if (ptr)
> >  		return ptr;
> >  
> > -	/*
> > -	 * __vmalloc() will allocate data pages and auxillary structures (e.g.
> > -	 * pagetables) with GFP_KERNEL, yet we may be under GFP_NOFS context
> > -	 * here. Hence we need to tell memory reclaim that we are in such a
> > -	 * context via PF_MEMALLOC_NOFS to prevent memory reclaim re-entering
> > -	 * the filesystem here and potentially deadlocking.
> > -	 */
> > -	if (flags & KM_NOFS)
> > -		nofs_flag = memalloc_nofs_save();
> > -
> >  	lflags = kmem_flags_convert(flags);
> >  	ptr = __vmalloc(size, lflags | __GFP_ZERO, PAGE_KERNEL);
> >  
> > -	if (flags & KM_NOFS)
> > -		memalloc_nofs_restore(nofs_flag);
> > -
> >  	return ptr;
> >  }
> >  
> 
> -- 
> Michal Hocko
> SUSE Labs
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] vmalloc: respect the GFP_NOIO and GFP_NOFS flags
  2017-06-30 18:11   ` Mikulas Patocka
@ 2017-06-30 20:41     ` Michal Hocko
  2017-07-01  0:36       ` Mikulas Patocka
  0 siblings, 1 reply; 11+ messages in thread
From: Michal Hocko @ 2017-06-30 20:41 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrew Morton,
	Stephen Rothwell, Vlastimil Babka, Andreas Dilger, John Hubbard,
	David Miller, linux-kernel, linux-mm, netdev

On Fri 30-06-17 14:11:57, Mikulas Patocka wrote:
> 
> 
> On Fri, 30 Jun 2017, Michal Hocko wrote:
> 
> > On Thu 29-06-17 22:25:09, Mikulas Patocka wrote:
> > > The __vmalloc function has a parameter gfp_mask with the allocation flags,
> > > however it doesn't fully respect the GFP_NOIO and GFP_NOFS flags. The
> > > pages are allocated with the specified gfp flags, but the pagetables are
> > > always allocated with GFP_KERNEL. This allocation can cause unexpected
> > > recursion into the filesystem or I/O subsystem.
> > > 
> > > It is not practical to extend page table allocation routines with gfp
> > > flags because it would require modification of architecture-specific code
> > > in all architecturs. However, the process can temporarily request that all
> > > allocations are done with GFP_NOFS or GFP_NOIO with with the functions
> > > memalloc_nofs_save and memalloc_noio_save.
> > > 
> > > This patch makes the vmalloc code use memalloc_nofs_save or
> > > memalloc_noio_save if the supplied gfp flags do not contain __GFP_FS or
> > > __GFP_IO. It fixes some possible deadlocks in drivers/mtd/ubi/io.c,
> > > fs/gfs2/, fs/btrfs/free-space-tree.c, fs/ubifs/,
> > > fs/nfs/blocklayout/extent_tree.c where __vmalloc is used with the GFP_NOFS
> > > flag.
> > 
> > I strongly believe this is a step in the _wrong_ direction. Why? Because
> 
> What do you think __vmalloc with GFP_NOIO should do? Print a warning? 
> Silently ignore the GFP_NOIO flag?

I think noio users are not that much different from nofs users. Simply
use the scope API at the place where the scope starts and document why
it is needed. vmalloc calls do not have to be any special then and they
do not even have to think about proper gfp flags and they can use
whatever is the default.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] vmalloc: respect the GFP_NOIO and GFP_NOFS flags
  2017-06-30 20:41     ` Michal Hocko
@ 2017-07-01  0:36       ` Mikulas Patocka
  2017-07-03  6:31         ` Michal Hocko
  0 siblings, 1 reply; 11+ messages in thread
From: Mikulas Patocka @ 2017-07-01  0:36 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrew Morton,
	Stephen Rothwell, Vlastimil Babka, Andreas Dilger, John Hubbard,
	David Miller, linux-kernel, linux-mm, netdev



On Fri, 30 Jun 2017, Michal Hocko wrote:

> On Fri 30-06-17 14:11:57, Mikulas Patocka wrote:
> > 
> > 
> > On Fri, 30 Jun 2017, Michal Hocko wrote:
> > 
> > > On Thu 29-06-17 22:25:09, Mikulas Patocka wrote:
> > > > The __vmalloc function has a parameter gfp_mask with the allocation flags,
> > > > however it doesn't fully respect the GFP_NOIO and GFP_NOFS flags. The
> > > > pages are allocated with the specified gfp flags, but the pagetables are
> > > > always allocated with GFP_KERNEL. This allocation can cause unexpected
> > > > recursion into the filesystem or I/O subsystem.
> > > > 
> > > > It is not practical to extend page table allocation routines with gfp
> > > > flags because it would require modification of architecture-specific code
> > > > in all architecturs. However, the process can temporarily request that all
> > > > allocations are done with GFP_NOFS or GFP_NOIO with with the functions
> > > > memalloc_nofs_save and memalloc_noio_save.
> > > > 
> > > > This patch makes the vmalloc code use memalloc_nofs_save or
> > > > memalloc_noio_save if the supplied gfp flags do not contain __GFP_FS or
> > > > __GFP_IO. It fixes some possible deadlocks in drivers/mtd/ubi/io.c,
> > > > fs/gfs2/, fs/btrfs/free-space-tree.c, fs/ubifs/,
> > > > fs/nfs/blocklayout/extent_tree.c where __vmalloc is used with the GFP_NOFS
> > > > flag.
> > > 
> > > I strongly believe this is a step in the _wrong_ direction. Why? Because
> > 
> > What do you think __vmalloc with GFP_NOIO should do? Print a warning? 
> > Silently ignore the GFP_NOIO flag?
> 
> I think noio users are not that much different from nofs users. Simply
> use the scope API at the place where the scope starts and document why
> it is needed. vmalloc calls do not have to be any special then and they
> do not even have to think about proper gfp flags and they can use
> whatever is the default.
> -- 
> Michal Hocko
> SUSE Labs

But you didn't answer the question - what should __vmalloc with GFP_NOIO 
(or GFP_NOFS) do? Silently drop the flag? Print a warning? Or respect the 
flag?

Currently, it silently drops the GFP_NOIO or GFP_NOFS flag, but some 
programmers don't know it and use these flags. You can't blame those 
programmers for not knowing it.

Mikulas

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] vmalloc: respect the GFP_NOIO and GFP_NOFS flags
  2017-06-30  2:25 [PATCH] vmalloc: respect the GFP_NOIO and GFP_NOFS flags Mikulas Patocka
  2017-06-30  6:20 ` John Hubbard
  2017-06-30  8:12 ` Michal Hocko
@ 2017-07-01  3:25 ` Andreas Dilger
  2017-07-01  4:49   ` Mikulas Patocka
  2 siblings, 1 reply; 11+ messages in thread
From: Andreas Dilger @ 2017-07-01  3:25 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Michal Hocko, Alexei Starovoitov, Daniel Borkmann, Andrew Morton,
	Stephen Rothwell, Vlastimil Babka, John Hubbard, David Miller,
	linux-kernel, linux-mm, netdev

[-- Attachment #1: Type: text/plain, Size: 7494 bytes --]

On Jun 29, 2017, at 8:25 PM, Mikulas Patocka <mpatocka@redhat.com> wrote:
> 
> The __vmalloc function has a parameter gfp_mask with the allocation flags,
> however it doesn't fully respect the GFP_NOIO and GFP_NOFS flags. The
> pages are allocated with the specified gfp flags, but the pagetables are
> always allocated with GFP_KERNEL. This allocation can cause unexpected
> recursion into the filesystem or I/O subsystem.
> 
> It is not practical to extend page table allocation routines with gfp
> flags because it would require modification of architecture-specific code
> in all architecturs. However, the process can temporarily request that all
> allocations are done with GFP_NOFS or GFP_NOIO with with the functions
> memalloc_nofs_save and memalloc_noio_save.
> 
> This patch makes the vmalloc code use memalloc_nofs_save or
> memalloc_noio_save if the supplied gfp flags do not contain __GFP_FS or
> __GFP_IO. It fixes some possible deadlocks in drivers/mtd/ubi/io.c,
> fs/gfs2/, fs/btrfs/free-space-tree.c, fs/ubifs/,
> fs/nfs/blocklayout/extent_tree.c where __vmalloc is used with the GFP_NOFS
> flag.
> 
> The patch also simplifies code in dm-bufio.c, dm-ioctl.c and fs/xfs/kmem.c
> by removing explicit calls to memalloc_nofs_save and memalloc_noio_save
> before the call to __vmalloc.
> 
> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> 
> ---
> drivers/md/dm-bufio.c |   24 +-----------------------
> drivers/md/dm-ioctl.c |    6 +-----
> fs/xfs/kmem.c         |   14 --------------
> mm/util.c             |    6 +++---
> mm/vmalloc.c          |   18 +++++++++++++++++-
> 5 files changed, 22 insertions(+), 46 deletions(-)
> 
> Index: linux-2.6/mm/vmalloc.c
> ===================================================================
> --- linux-2.6.orig/mm/vmalloc.c
> +++ linux-2.6/mm/vmalloc.c
> @@ -31,6 +31,7 @@
> #include <linux/compiler.h>
> #include <linux/llist.h>
> #include <linux/bitops.h>
> +#include <linux/sched/mm.h>
> 
> #include <linux/uaccess.h>
> #include <asm/tlbflush.h>
> @@ -1670,6 +1671,8 @@ static void *__vmalloc_area_node(struct
> 	unsigned int nr_pages, array_size, i;
> 	const gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
> 	const gfp_t alloc_mask = gfp_mask | __GFP_HIGHMEM | __GFP_NOWARN;
> +	unsigned noio_flag;
> +	int r;
> 
> 	nr_pages = get_vm_area_size(area) >> PAGE_SHIFT;
> 	array_size = (nr_pages * sizeof(struct page *));
> @@ -1712,8 +1715,21 @@ static void *__vmalloc_area_node(struct
> 			cond_resched();
> 	}
> 
> -	if (map_vm_area(area, prot, pages))
> +	if (unlikely(!(gfp_mask & __GFP_IO)))
> +		noio_flag = memalloc_noio_save();
> +	else if (unlikely(!(gfp_mask & __GFP_FS)))
> +		noio_flag = memalloc_nofs_save();
> +
> +	r = map_vm_area(area, prot, pages);
> +
> +	if (unlikely(!(gfp_mask & __GFP_IO)))
> +		memalloc_noio_restore(noio_flag);
> +	else if (unlikely(!(gfp_mask & __GFP_FS)))
> +		memalloc_nofs_restore(noio_flag);

Is this really an "else if"?  I think it should just a separate "if".

Cheers, Andreas

> +
> +	if (unlikely(r))
> 		goto fail;
> +
> 	return area->addr;
> 
> fail:
> Index: linux-2.6/mm/util.c
> ===================================================================
> --- linux-2.6.orig/mm/util.c
> +++ linux-2.6/mm/util.c
> @@ -351,10 +351,10 @@ void *kvmalloc_node(size_t size, gfp_t f
> 	void *ret;
> 
> 	/*
> -	 * vmalloc uses GFP_KERNEL for some internal allocations (e.g page tables)
> -	 * so the given set of flags has to be compatible.
> +	 * vmalloc uses blocking allocations for some internal allocations
> +	 * (e.g page tables) so the given set of flags has to be compatible.
> 	 */
> -	WARN_ON_ONCE((flags & GFP_KERNEL) != GFP_KERNEL);
> +	WARN_ON_ONCE(!gfpflags_allow_blocking(flags));
> 
> 	/*
> 	 * We want to attempt a large physically contiguous block first because
> Index: linux-2.6/drivers/md/dm-bufio.c
> ===================================================================
> --- linux-2.6.orig/drivers/md/dm-bufio.c
> +++ linux-2.6/drivers/md/dm-bufio.c
> @@ -386,9 +386,6 @@ static void __cache_size_refresh(void)
> static void *alloc_buffer_data(struct dm_bufio_client *c, gfp_t gfp_mask,
> 			       enum data_mode *data_mode)
> {
> -	unsigned noio_flag;
> -	void *ptr;
> -
> 	if (c->block_size <= DM_BUFIO_BLOCK_SIZE_SLAB_LIMIT) {
> 		*data_mode = DATA_MODE_SLAB;
> 		return kmem_cache_alloc(DM_BUFIO_CACHE(c), gfp_mask);
> @@ -402,26 +399,7 @@ static void *alloc_buffer_data(struct dm
> 	}
> 
> 	*data_mode = DATA_MODE_VMALLOC;
> -
> -	/*
> -	 * __vmalloc allocates the data pages and auxiliary structures with
> -	 * gfp_flags that were specified, but pagetables are always allocated
> -	 * with GFP_KERNEL, no matter what was specified as gfp_mask.
> -	 *
> -	 * Consequently, we must set per-process flag PF_MEMALLOC_NOIO so that
> -	 * all allocations done by this process (including pagetables) are done
> -	 * as if GFP_NOIO was specified.
> -	 */
> -
> -	if (gfp_mask & __GFP_NORETRY)
> -		noio_flag = memalloc_noio_save();
> -
> -	ptr = __vmalloc(c->block_size, gfp_mask, PAGE_KERNEL);
> -
> -	if (gfp_mask & __GFP_NORETRY)
> -		memalloc_noio_restore(noio_flag);
> -
> -	return ptr;
> +	return __vmalloc(c->block_size, gfp_mask, PAGE_KERNEL);
> }
> 
> /*
> Index: linux-2.6/drivers/md/dm-ioctl.c
> ===================================================================
> --- linux-2.6.orig/drivers/md/dm-ioctl.c
> +++ linux-2.6/drivers/md/dm-ioctl.c
> @@ -1691,7 +1691,6 @@ static int copy_params(struct dm_ioctl _
> 	struct dm_ioctl *dmi;
> 	int secure_data;
> 	const size_t minimum_data_size = offsetof(struct dm_ioctl, data);
> -	unsigned noio_flag;
> 
> 	if (copy_from_user(param_kernel, user, minimum_data_size))
> 		return -EFAULT;
> @@ -1714,10 +1713,7 @@ static int copy_params(struct dm_ioctl _
> 	 * suspended and the ioctl is needed to resume it.
> 	 * Use kmalloc() rather than vmalloc() when we can.
> 	 */
> -	dmi = NULL;
> -	noio_flag = memalloc_noio_save();
> -	dmi = kvmalloc(param_kernel->data_size, GFP_KERNEL | __GFP_HIGH);
> -	memalloc_noio_restore(noio_flag);
> +	dmi = kvmalloc(param_kernel->data_size, GFP_NOIO | __GFP_HIGH);
> 
> 	if (!dmi) {
> 		if (secure_data && clear_user(user, param_kernel->data_size))
> Index: linux-2.6/fs/xfs/kmem.c
> ===================================================================
> --- linux-2.6.orig/fs/xfs/kmem.c
> +++ linux-2.6/fs/xfs/kmem.c
> @@ -48,7 +48,6 @@ kmem_alloc(size_t size, xfs_km_flags_t f
> void *
> kmem_zalloc_large(size_t size, xfs_km_flags_t flags)
> {
> -	unsigned nofs_flag = 0;
> 	void	*ptr;
> 	gfp_t	lflags;
> 
> @@ -56,22 +55,9 @@ kmem_zalloc_large(size_t size, xfs_km_fl
> 	if (ptr)
> 		return ptr;
> 
> -	/*
> -	 * __vmalloc() will allocate data pages and auxillary structures (e.g.
> -	 * pagetables) with GFP_KERNEL, yet we may be under GFP_NOFS context
> -	 * here. Hence we need to tell memory reclaim that we are in such a
> -	 * context via PF_MEMALLOC_NOFS to prevent memory reclaim re-entering
> -	 * the filesystem here and potentially deadlocking.
> -	 */
> -	if (flags & KM_NOFS)
> -		nofs_flag = memalloc_nofs_save();
> -
> 	lflags = kmem_flags_convert(flags);
> 	ptr = __vmalloc(size, lflags | __GFP_ZERO, PAGE_KERNEL);
> 
> -	if (flags & KM_NOFS)
> -		memalloc_nofs_restore(nofs_flag);
> -
> 	return ptr;
> }
> 


Cheers, Andreas






[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] vmalloc: respect the GFP_NOIO and GFP_NOFS flags
  2017-07-01  3:25 ` Andreas Dilger
@ 2017-07-01  4:49   ` Mikulas Patocka
  0 siblings, 0 replies; 11+ messages in thread
From: Mikulas Patocka @ 2017-07-01  4:49 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: Michal Hocko, Alexei Starovoitov, Daniel Borkmann, Andrew Morton,
	Stephen Rothwell, Vlastimil Babka, John Hubbard, David Miller,
	linux-kernel, linux-mm, netdev



On Fri, 30 Jun 2017, Andreas Dilger wrote:

> On Jun 29, 2017, at 8:25 PM, Mikulas Patocka <mpatocka@redhat.com> wrote:
> > 
> > The __vmalloc function has a parameter gfp_mask with the allocation flags,
> > however it doesn't fully respect the GFP_NOIO and GFP_NOFS flags. The
> > pages are allocated with the specified gfp flags, but the pagetables are
> > always allocated with GFP_KERNEL. This allocation can cause unexpected
> > recursion into the filesystem or I/O subsystem.
> > 
> > It is not practical to extend page table allocation routines with gfp
> > flags because it would require modification of architecture-specific code
> > in all architecturs. However, the process can temporarily request that all
> > allocations are done with GFP_NOFS or GFP_NOIO with with the functions
> > memalloc_nofs_save and memalloc_noio_save.
> > 
> > This patch makes the vmalloc code use memalloc_nofs_save or
> > memalloc_noio_save if the supplied gfp flags do not contain __GFP_FS or
> > __GFP_IO. It fixes some possible deadlocks in drivers/mtd/ubi/io.c,
> > fs/gfs2/, fs/btrfs/free-space-tree.c, fs/ubifs/,
> > fs/nfs/blocklayout/extent_tree.c where __vmalloc is used with the GFP_NOFS
> > flag.
> > 
> > The patch also simplifies code in dm-bufio.c, dm-ioctl.c and fs/xfs/kmem.c
> > by removing explicit calls to memalloc_nofs_save and memalloc_noio_save
> > before the call to __vmalloc.
> > 
> > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> > 
> > ---
> > drivers/md/dm-bufio.c |   24 +-----------------------
> > drivers/md/dm-ioctl.c |    6 +-----
> > fs/xfs/kmem.c         |   14 --------------
> > mm/util.c             |    6 +++---
> > mm/vmalloc.c          |   18 +++++++++++++++++-
> > 5 files changed, 22 insertions(+), 46 deletions(-)
> > 
> > Index: linux-2.6/mm/vmalloc.c
> > ===================================================================
> > --- linux-2.6.orig/mm/vmalloc.c
> > +++ linux-2.6/mm/vmalloc.c
> > @@ -31,6 +31,7 @@
> > #include <linux/compiler.h>
> > #include <linux/llist.h>
> > #include <linux/bitops.h>
> > +#include <linux/sched/mm.h>
> > 
> > #include <linux/uaccess.h>
> > #include <asm/tlbflush.h>
> > @@ -1670,6 +1671,8 @@ static void *__vmalloc_area_node(struct
> > 	unsigned int nr_pages, array_size, i;
> > 	const gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
> > 	const gfp_t alloc_mask = gfp_mask | __GFP_HIGHMEM | __GFP_NOWARN;
> > +	unsigned noio_flag;
> > +	int r;
> > 
> > 	nr_pages = get_vm_area_size(area) >> PAGE_SHIFT;
> > 	array_size = (nr_pages * sizeof(struct page *));
> > @@ -1712,8 +1715,21 @@ static void *__vmalloc_area_node(struct
> > 			cond_resched();
> > 	}
> > 
> > -	if (map_vm_area(area, prot, pages))
> > +	if (unlikely(!(gfp_mask & __GFP_IO)))
> > +		noio_flag = memalloc_noio_save();
> > +	else if (unlikely(!(gfp_mask & __GFP_FS)))
> > +		noio_flag = memalloc_nofs_save();
> > +
> > +	r = map_vm_area(area, prot, pages);
> > +
> > +	if (unlikely(!(gfp_mask & __GFP_IO)))
> > +		memalloc_noio_restore(noio_flag);
> > +	else if (unlikely(!(gfp_mask & __GFP_FS)))
> > +		memalloc_nofs_restore(noio_flag);
> 
> Is this really an "else if"?  I think it should just a separate "if".
> 
> Cheers, Andreas

It is meant to be "else if". memalloc_noio_save() implies 
memalloc_nofs_save(). If we call memalloc_noio_save(), there's no need to 
call memalloc_nofs_save().

Mikulas

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] vmalloc: respect the GFP_NOIO and GFP_NOFS flags
  2017-07-01  0:36       ` Mikulas Patocka
@ 2017-07-03  6:31         ` Michal Hocko
  2017-07-03 22:57           ` Mikulas Patocka
  0 siblings, 1 reply; 11+ messages in thread
From: Michal Hocko @ 2017-07-03  6:31 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrew Morton,
	Stephen Rothwell, Vlastimil Babka, Andreas Dilger, John Hubbard,
	David Miller, linux-kernel, linux-mm, netdev

On Fri 30-06-17 20:36:12, Mikulas Patocka wrote:
> 
> 
> On Fri, 30 Jun 2017, Michal Hocko wrote:
> 
> > On Fri 30-06-17 14:11:57, Mikulas Patocka wrote:
> > > 
> > > 
> > > On Fri, 30 Jun 2017, Michal Hocko wrote:
> > > 
> > > > On Thu 29-06-17 22:25:09, Mikulas Patocka wrote:
> > > > > The __vmalloc function has a parameter gfp_mask with the allocation flags,
> > > > > however it doesn't fully respect the GFP_NOIO and GFP_NOFS flags. The
> > > > > pages are allocated with the specified gfp flags, but the pagetables are
> > > > > always allocated with GFP_KERNEL. This allocation can cause unexpected
> > > > > recursion into the filesystem or I/O subsystem.
> > > > > 
> > > > > It is not practical to extend page table allocation routines with gfp
> > > > > flags because it would require modification of architecture-specific code
> > > > > in all architecturs. However, the process can temporarily request that all
> > > > > allocations are done with GFP_NOFS or GFP_NOIO with with the functions
> > > > > memalloc_nofs_save and memalloc_noio_save.
> > > > > 
> > > > > This patch makes the vmalloc code use memalloc_nofs_save or
> > > > > memalloc_noio_save if the supplied gfp flags do not contain __GFP_FS or
> > > > > __GFP_IO. It fixes some possible deadlocks in drivers/mtd/ubi/io.c,
> > > > > fs/gfs2/, fs/btrfs/free-space-tree.c, fs/ubifs/,
> > > > > fs/nfs/blocklayout/extent_tree.c where __vmalloc is used with the GFP_NOFS
> > > > > flag.
> > > > 
> > > > I strongly believe this is a step in the _wrong_ direction. Why? Because
> > > 
> > > What do you think __vmalloc with GFP_NOIO should do? Print a warning? 
> > > Silently ignore the GFP_NOIO flag?
> > 
> > I think noio users are not that much different from nofs users. Simply
> > use the scope API at the place where the scope starts and document why
> > it is needed. vmalloc calls do not have to be any special then and they
> > do not even have to think about proper gfp flags and they can use
> > whatever is the default.
> > -- 
> > Michal Hocko
> > SUSE Labs
> 
> But you didn't answer the question - what should __vmalloc with GFP_NOIO 
> (or GFP_NOFS) do? Silently drop the flag? Print a warning? Or respect the 
> flag?

We can add a warning (or move it from kvmalloc) and hope that the
respective maintainers will fix those places properly. The reason I
didn't add the warning to vmalloc and kept it in kvmalloc was to catch
only new users rather than suddenly splat on existing ones. Note that
there are users with panic_on_warn enabled.

Considering how many NOFS users we have in tree I would rather work with
maintainers to fix them.
 
> Currently, it silently drops the GFP_NOIO or GFP_NOFS flag, but some 
> programmers don't know it and use these flags. You can't blame those 
> programmers for not knowing it.

At least __vmalloc_node is documented to not support all gfp flags.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] vmalloc: respect the GFP_NOIO and GFP_NOFS flags
  2017-07-03  6:31         ` Michal Hocko
@ 2017-07-03 22:57           ` Mikulas Patocka
  2017-07-04  8:10             ` Michal Hocko
  0 siblings, 1 reply; 11+ messages in thread
From: Mikulas Patocka @ 2017-07-03 22:57 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrew Morton,
	Stephen Rothwell, Vlastimil Babka, Andreas Dilger, John Hubbard,
	David Miller, linux-kernel, linux-mm, netdev



On Mon, 3 Jul 2017, Michal Hocko wrote:

> We can add a warning (or move it from kvmalloc) and hope that the
> respective maintainers will fix those places properly. The reason I
> didn't add the warning to vmalloc and kept it in kvmalloc was to catch
> only new users rather than suddenly splat on existing ones. Note that
> there are users with panic_on_warn enabled.
> 
> Considering how many NOFS users we have in tree I would rather work with
> maintainers to fix them.

So - do you want this patch?

I still believe that the previous patch that pushes 
memalloc_noio/nofs_save into __vmalloc is better than this.

Currently there are 28 __vmalloc callers that use GFP_NOIO or GFP_NOFS, 
three of them already use memalloc_noio_save, 25 don't.

Mikulas

---
 drivers/block/drbd/drbd_bitmap.c        |    8 +++++---
 drivers/infiniband/hw/mlx4/qp.c         |   21 +++++++++++++++++----
 drivers/infiniband/sw/rdmavt/qp.c       |   19 +++++++++++++------
 drivers/infiniband/ulp/ipoib/ipoib_cm.c |    7 +++++--
 drivers/md/dm-bufio.c                   |    2 +-
 drivers/mtd/ubi/io.c                    |   11 +++++++++--
 fs/btrfs/free-space-tree.c              |    7 ++++++-
 fs/ext4/super.c                         |   21 +++++++++++++++++----
 fs/gfs2/dir.c                           |   29 +++++++++++++++++++++--------
 fs/gfs2/quota.c                         |    8 ++++++--
 fs/nfs/blocklayout/extent_tree.c        |    7 ++++++-
 fs/ntfs/malloc.h                        |   11 +++++++++--
 fs/ubifs/debug.c                        |    5 ++++-
 fs/ubifs/lprops.c                       |    5 ++++-
 fs/ubifs/lpt_commit.c                   |   10 ++++++++--
 fs/ubifs/orphan.c                       |    5 ++++-
 fs/ubifs/ubifs.h                        |    1 +
 fs/xfs/kmem.c                           |    2 +-
 mm/page_alloc.c                         |    2 +-
 mm/vmalloc.c                            |    6 ++++++
 net/ceph/ceph_common.c                  |   14 ++++++++++++--
 21 files changed, 156 insertions(+), 45 deletions(-)

Index: linux-2.6/drivers/block/drbd/drbd_bitmap.c
===================================================================
--- linux-2.6.orig/drivers/block/drbd/drbd_bitmap.c
+++ linux-2.6/drivers/block/drbd/drbd_bitmap.c
@@ -26,6 +26,7 @@
 
 #include <linux/bitmap.h>
 #include <linux/vmalloc.h>
+#include <linux/sched/mm.h>
 #include <linux/string.h>
 #include <linux/drbd.h>
 #include <linux/slab.h>
@@ -408,9 +409,10 @@ static struct page **bm_realloc_pages(st
 	bytes = sizeof(struct page *)*want;
 	new_pages = kzalloc(bytes, GFP_NOIO | __GFP_NOWARN);
 	if (!new_pages) {
-		new_pages = __vmalloc(bytes,
-				GFP_NOIO | __GFP_ZERO,
-				PAGE_KERNEL);
+		unsigned noio;
+		noio = memalloc_noio_save();
+		new_pages = vmalloc(bytes);
+		memalloc_noio_restore(noio);
 		if (!new_pages)
 			return NULL;
 	}
Index: linux-2.6/drivers/infiniband/hw/mlx4/qp.c
===================================================================
--- linux-2.6.orig/drivers/infiniband/hw/mlx4/qp.c
+++ linux-2.6/drivers/infiniband/hw/mlx4/qp.c
@@ -37,6 +37,7 @@
 #include <linux/slab.h>
 #include <linux/netdevice.h>
 #include <linux/vmalloc.h>
+#include <linux/sched/mm.h>
 
 #include <rdma/ib_cache.h>
 #include <rdma/ib_pack.h>
@@ -814,14 +815,26 @@ static int create_qp_common(struct mlx4_
 
 		qp->sq.wrid = kmalloc_array(qp->sq.wqe_cnt, sizeof(u64),
 					gfp | __GFP_NOWARN);
-		if (!qp->sq.wrid)
+		if (!qp->sq.wrid) {
+			unsigned noio;
+			if (!(gfp & __GFP_IO))
+				noio = memalloc_noio_save();
 			qp->sq.wrid = __vmalloc(qp->sq.wqe_cnt * sizeof(u64),
-						gfp, PAGE_KERNEL);
+						gfp | __GFP_FS | __GFP_IO, PAGE_KERNEL);
+			if (!(gfp & __GFP_IO))
+				memalloc_noio_restore(noio);
+		}
 		qp->rq.wrid = kmalloc_array(qp->rq.wqe_cnt, sizeof(u64),
 					gfp | __GFP_NOWARN);
-		if (!qp->rq.wrid)
+		if (!qp->rq.wrid) {
+			unsigned noio;
+			if (!(gfp & __GFP_IO))
+				noio = memalloc_noio_save();
 			qp->rq.wrid = __vmalloc(qp->rq.wqe_cnt * sizeof(u64),
-						gfp, PAGE_KERNEL);
+						gfp | __GFP_FS | __GFP_IO, PAGE_KERNEL);
+			if (!(gfp & __GFP_IO))
+				memalloc_noio_restore(noio);
+		}
 		if (!qp->sq.wrid || !qp->rq.wrid) {
 			err = -ENOMEM;
 			goto err_wrid;
Index: linux-2.6/drivers/infiniband/sw/rdmavt/qp.c
===================================================================
--- linux-2.6.orig/drivers/infiniband/sw/rdmavt/qp.c
+++ linux-2.6/drivers/infiniband/sw/rdmavt/qp.c
@@ -49,6 +49,7 @@
 #include <linux/bitops.h>
 #include <linux/lockdep.h>
 #include <linux/vmalloc.h>
+#include <linux/sched/mm.h>
 #include <linux/slab.h>
 #include <rdma/ib_verbs.h>
 #include <rdma/ib_hdrs.h>
@@ -719,11 +720,14 @@ struct ib_qp *rvt_create_qp(struct ib_pd
 		sz = sizeof(struct rvt_sge) *
 			init_attr->cap.max_send_sge +
 			sizeof(struct rvt_swqe);
-		if (gfp == GFP_NOIO)
+		if (gfp == GFP_NOIO) {
+			unsigned noio;
+			noio = memalloc_noio_save();
 			swq = __vmalloc(
 				sqsize * sz,
-				gfp | __GFP_ZERO, PAGE_KERNEL);
-		else
+				gfp | __GFP_FS | __GFP_IO | __GFP_ZERO, PAGE_KERNEL);
+			memalloc_noio_restore(noio);
+		} else
 			swq = vzalloc_node(
 				sqsize * sz,
 				rdi->dparms.node);
@@ -786,12 +790,15 @@ struct ib_qp *rvt_create_qp(struct ib_pd
 				qp->r_rq.wq = vmalloc_user(
 						sizeof(struct rvt_rwq) +
 						qp->r_rq.size * sz);
-			else if (gfp == GFP_NOIO)
+			else if (gfp == GFP_NOIO) {
+				unsigned noio;
+				noio = memalloc_noio_save();
 				qp->r_rq.wq = __vmalloc(
 						sizeof(struct rvt_rwq) +
 						qp->r_rq.size * sz,
-						gfp | __GFP_ZERO, PAGE_KERNEL);
-			else
+						GFP_KERNEL | __GFP_ZERO, PAGE_KERNEL);
+				memalloc_noio_restore(noio);
+			} else
 				qp->r_rq.wq = vzalloc_node(
 						sizeof(struct rvt_rwq) +
 						qp->r_rq.size * sz,
Index: linux-2.6/drivers/infiniband/ulp/ipoib/ipoib_cm.c
===================================================================
--- linux-2.6.orig/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ linux-2.6/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -37,6 +37,7 @@
 #include <linux/delay.h>
 #include <linux/slab.h>
 #include <linux/vmalloc.h>
+#include <linux/sched/mm.h>
 #include <linux/moduleparam.h>
 #include <linux/sched/signal.h>
 
@@ -1132,9 +1133,11 @@ static int ipoib_cm_tx_init(struct ipoib
 {
 	struct ipoib_dev_priv *priv = ipoib_priv(p->dev);
 	int ret;
+	unsigned noio;
 
-	p->tx_ring = __vmalloc(ipoib_sendq_size * sizeof *p->tx_ring,
-			       GFP_NOIO, PAGE_KERNEL);
+	noio = memalloc_noio_save();
+	p->tx_ring = vmalloc(ipoib_sendq_size * sizeof *p->tx_ring);
+	memalloc_noio_restore(noio);
 	if (!p->tx_ring) {
 		ret = -ENOMEM;
 		goto err_tx;
Index: linux-2.6/drivers/mtd/ubi/io.c
===================================================================
--- linux-2.6.orig/drivers/mtd/ubi/io.c
+++ linux-2.6/drivers/mtd/ubi/io.c
@@ -89,6 +89,7 @@
 #include <linux/crc32.h>
 #include <linux/err.h>
 #include <linux/slab.h>
+#include <linux/sched/mm.h>
 #include "ubi.h"
 
 static int self_check_not_bad(const struct ubi_device *ubi, int pnum);
@@ -1342,11 +1343,14 @@ static int self_check_write(struct ubi_d
 	size_t read;
 	void *buf1;
 	loff_t addr = (loff_t)pnum * ubi->peb_size + offset;
+	unsigned nofs;
 
 	if (!ubi_dbg_chk_io(ubi))
 		return 0;
 
-	buf1 = __vmalloc(len, GFP_NOFS, PAGE_KERNEL);
+	nofs = memalloc_nofs_save();
+	buf1 = vmalloc(len);
+	memalloc_nofs_restore(nofs);
 	if (!buf1) {
 		ubi_err(ubi, "cannot allocate memory to check writes");
 		return 0;
@@ -1406,11 +1410,14 @@ int ubi_self_check_all_ff(struct ubi_dev
 	int err;
 	void *buf;
 	loff_t addr = (loff_t)pnum * ubi->peb_size + offset;
+	unsigned nofs;
 
 	if (!ubi_dbg_chk_io(ubi))
 		return 0;
 
-	buf = __vmalloc(len, GFP_NOFS, PAGE_KERNEL);
+	nofs = memalloc_nofs_save();
+	buf = vmalloc(len);
+	memalloc_nofs_restore(nofs);
 	if (!buf) {
 		ubi_err(ubi, "cannot allocate memory to check for 0xFFs");
 		return 0;
Index: linux-2.6/fs/btrfs/free-space-tree.c
===================================================================
--- linux-2.6.orig/fs/btrfs/free-space-tree.c
+++ linux-2.6/fs/btrfs/free-space-tree.c
@@ -18,6 +18,7 @@
 
 #include <linux/kernel.h>
 #include <linux/vmalloc.h>
+#include <linux/sched/mm.h>
 #include "ctree.h"
 #include "disk-io.h"
 #include "locking.h"
@@ -154,6 +155,7 @@ static inline u32 free_space_bitmap_size
 static u8 *alloc_bitmap(u32 bitmap_size)
 {
 	void *mem;
+	unsigned nofs;
 
 	/*
 	 * The allocation size varies, observed numbers were < 4K up to 16K.
@@ -167,7 +169,10 @@ static u8 *alloc_bitmap(u32 bitmap_size)
 	if (mem)
 		return mem;
 
-	return __vmalloc(bitmap_size, GFP_NOFS | __GFP_ZERO, PAGE_KERNEL);
+	nofs = memalloc_nofs_save();
+	mem = __vmalloc(bitmap_size, GFP_KERNEL | __GFP_ZERO, PAGE_KERNEL);
+	memalloc_nofs_restore(nofs);
+	return mem;
 }
 
 int convert_free_space_to_bitmaps(struct btrfs_trans_handle *trans,
Index: linux-2.6/fs/ext4/super.c
===================================================================
--- linux-2.6.orig/fs/ext4/super.c
+++ linux-2.6/fs/ext4/super.c
@@ -40,6 +40,7 @@
 #include <linux/dax.h>
 #include <linux/cleancache.h>
 #include <linux/uaccess.h>
+#include <linux/sched/mm.h>
 
 #include <linux/kthread.h>
 #include <linux/freezer.h>
@@ -185,8 +186,14 @@ void *ext4_kvmalloc(size_t size, gfp_t f
 	void *ret;
 
 	ret = kmalloc(size, flags | __GFP_NOWARN);
-	if (!ret)
-		ret = __vmalloc(size, flags, PAGE_KERNEL);
+	if (!ret) {
+		unsigned nofs;
+		if (!(flags & __GFP_FS))
+			nofs = memalloc_nofs_save();
+		ret = __vmalloc(size, flags | __GFP_FS, PAGE_KERNEL);
+		if (!(flags & __GFP_FS))
+			memalloc_nofs_restore(nofs);
+	}
 	return ret;
 }
 
@@ -195,8 +202,14 @@ void *ext4_kvzalloc(size_t size, gfp_t f
 	void *ret;
 
 	ret = kzalloc(size, flags | __GFP_NOWARN);
-	if (!ret)
-		ret = __vmalloc(size, flags | __GFP_ZERO, PAGE_KERNEL);
+	if (!ret) {
+		unsigned nofs;
+		if (!(flags & __GFP_FS))
+			nofs = memalloc_nofs_save();
+		ret = __vmalloc(size, flags | __GFP_FS | __GFP_ZERO, PAGE_KERNEL);
+		if (!(flags & __GFP_FS))
+			memalloc_nofs_restore(nofs);
+	}
 	return ret;
 }
 
Index: linux-2.6/fs/gfs2/dir.c
===================================================================
--- linux-2.6.orig/fs/gfs2/dir.c
+++ linux-2.6/fs/gfs2/dir.c
@@ -62,6 +62,7 @@
 #include <linux/gfs2_ondisk.h>
 #include <linux/crc32.h>
 #include <linux/vmalloc.h>
+#include <linux/sched/mm.h>
 #include <linux/bio.h>
 
 #include "gfs2.h"
@@ -360,8 +361,11 @@ static __be64 *gfs2_dir_get_hash_table(s
 	}
 
 	hc = kmalloc(hsize, GFP_NOFS | __GFP_NOWARN);
-	if (hc == NULL)
-		hc = __vmalloc(hsize, GFP_NOFS, PAGE_KERNEL);
+	if (hc == NULL) {
+		unsigned nofs = memalloc_nofs_save();
+		hc = vmalloc(hsize);
+		memalloc_nofs_restore(nofs);
+	}
 
 	if (hc == NULL)
 		return ERR_PTR(-ENOMEM);
@@ -1172,8 +1176,11 @@ static int dir_double_exhash(struct gfs2
 		return PTR_ERR(hc);
 
 	hc2 = kmalloc(hsize_bytes * 2, GFP_NOFS | __GFP_NOWARN);
-	if (hc2 == NULL)
-		hc2 = __vmalloc(hsize_bytes * 2, GFP_NOFS, PAGE_KERNEL);
+	if (hc2 == NULL) {
+		unsigned nofs = memalloc_nofs_save();
+		hc2 = vmalloc(hsize_bytes * 2);
+		memalloc_nofs_restore(nofs);
+	}
 
 	if (!hc2)
 		return -ENOMEM;
@@ -1333,8 +1340,11 @@ static void *gfs2_alloc_sort_buffer(unsi
 
 	if (size < KMALLOC_MAX_SIZE)
 		ptr = kmalloc(size, GFP_NOFS | __GFP_NOWARN);
-	if (!ptr)
-		ptr = __vmalloc(size, GFP_NOFS, PAGE_KERNEL);
+	if (!ptr) {
+		unsigned nofs = memalloc_nofs_save();
+		ptr = vmalloc(size);
+		memalloc_nofs_restore(nofs);
+	}
 	return ptr;
 }
 
@@ -2000,9 +2010,12 @@ static int leaf_dealloc(struct gfs2_inod
 	memset(&rlist, 0, sizeof(struct gfs2_rgrp_list));
 
 	ht = kzalloc(size, GFP_NOFS | __GFP_NOWARN);
-	if (ht == NULL)
-		ht = __vmalloc(size, GFP_NOFS | __GFP_NOWARN | __GFP_ZERO,
+	if (ht == NULL) {
+		unsigned nofs = memalloc_nofs_save();
+		ht = __vmalloc(size, GFP_KERNEL | __GFP_NOWARN | __GFP_ZERO,
 			       PAGE_KERNEL);
+		memalloc_nofs_restore(nofs);
+	}
 	if (!ht)
 		return -ENOMEM;
 
Index: linux-2.6/fs/gfs2/quota.c
===================================================================
--- linux-2.6.orig/fs/gfs2/quota.c
+++ linux-2.6/fs/gfs2/quota.c
@@ -59,6 +59,7 @@
 #include <linux/bit_spinlock.h>
 #include <linux/jhash.h>
 #include <linux/vmalloc.h>
+#include <linux/sched/mm.h>
 
 #include "gfs2.h"
 #include "incore.h"
@@ -1353,9 +1354,12 @@ int gfs2_quota_init(struct gfs2_sbd *sdp
 	bm_size *= sizeof(unsigned long);
 	error = -ENOMEM;
 	sdp->sd_quota_bitmap = kzalloc(bm_size, GFP_NOFS | __GFP_NOWARN);
-	if (sdp->sd_quota_bitmap == NULL)
-		sdp->sd_quota_bitmap = __vmalloc(bm_size, GFP_NOFS |
+	if (sdp->sd_quota_bitmap == NULL) {
+		unsigned nofs = memalloc_nofs_save();
+		sdp->sd_quota_bitmap = __vmalloc(bm_size, GFP_KERNEL |
 						 __GFP_ZERO, PAGE_KERNEL);
+		memalloc_nofs_restore(nofs);
+	}
 	if (!sdp->sd_quota_bitmap)
 		return error;
 
Index: linux-2.6/fs/nfs/blocklayout/extent_tree.c
===================================================================
--- linux-2.6.orig/fs/nfs/blocklayout/extent_tree.c
+++ linux-2.6/fs/nfs/blocklayout/extent_tree.c
@@ -3,6 +3,7 @@
  */
 
 #include <linux/vmalloc.h>
+#include <linux/sched/mm.h>
 
 #include "blocklayout.h"
 
@@ -570,6 +571,8 @@ ext_tree_prepare_commit(struct nfs4_layo
 retry:
 	ret = ext_tree_encode_commit(bl, start_p + 1, buffer_size, &count, &arg->lastbytewritten);
 	if (unlikely(ret)) {
+		unsigned nofs;
+
 		ext_tree_free_commitdata(arg, buffer_size);
 
 		buffer_size = ext_tree_layoutupdate_size(bl, count);
@@ -581,7 +584,9 @@ retry:
 		if (!arg->layoutupdate_pages)
 			return -ENOMEM;
 
-		start_p = __vmalloc(buffer_size, GFP_NOFS, PAGE_KERNEL);
+		nofs = memalloc_nofs_save();
+		start_p = vmalloc(buffer_size);
+		memalloc_nofs_restore(nofs);
 		if (!start_p) {
 			kfree(arg->layoutupdate_pages);
 			return -ENOMEM;
Index: linux-2.6/fs/ntfs/malloc.h
===================================================================
--- linux-2.6.orig/fs/ntfs/malloc.h
+++ linux-2.6/fs/ntfs/malloc.h
@@ -23,6 +23,7 @@
 #define _LINUX_NTFS_MALLOC_H
 
 #include <linux/vmalloc.h>
+#include <linux/sched/mm.h>
 #include <linux/slab.h>
 #include <linux/highmem.h>
 
@@ -47,8 +48,14 @@ static inline void *__ntfs_malloc(unsign
 		return kmalloc(PAGE_SIZE, gfp_mask & ~__GFP_HIGHMEM);
 		/* return (void *)__get_free_page(gfp_mask); */
 	}
-	if (likely((size >> PAGE_SHIFT) < totalram_pages))
-		return __vmalloc(size, gfp_mask, PAGE_KERNEL);
+	if (likely((size >> PAGE_SHIFT) < totalram_pages)) {
+		unsigned nofs;
+		if (!(gfp_mask & __GFP_FS))
+			nofs = memalloc_nofs_save();
+		return __vmalloc(size, gfp_mask | __GFP_FS, PAGE_KERNEL);
+		if (!(gfp_mask & __GFP_FS))
+			memalloc_nofs_restore(nofs);
+	}
 	return NULL;
 }
 
Index: linux-2.6/fs/ubifs/debug.c
===================================================================
--- linux-2.6.orig/fs/ubifs/debug.c
+++ linux-2.6/fs/ubifs/debug.c
@@ -818,10 +818,13 @@ void ubifs_dump_leb(const struct ubifs_i
 	struct ubifs_scan_leb *sleb;
 	struct ubifs_scan_node *snod;
 	void *buf;
+	unsigned nofs;
 
 	pr_err("(pid %d) start dumping LEB %d\n", current->pid, lnum);
 
-	buf = __vmalloc(c->leb_size, GFP_NOFS, PAGE_KERNEL);
+	nofs = memalloc_nofs_save();
+	buf = vmalloc(c->leb_size);
+	memalloc_nofs_restore(nofs);
 	if (!buf) {
 		ubifs_err(c, "cannot allocate memory for dumping LEB %d", lnum);
 		return;
Index: linux-2.6/fs/ubifs/lprops.c
===================================================================
--- linux-2.6.orig/fs/ubifs/lprops.c
+++ linux-2.6/fs/ubifs/lprops.c
@@ -1034,6 +1034,7 @@ static int scan_check_cb(struct ubifs_in
 	struct ubifs_scan_node *snod;
 	int cat, lnum = lp->lnum, is_idx = 0, used = 0, free, dirty, ret;
 	void *buf = NULL;
+	unsigned nofs;
 
 	cat = lp->flags & LPROPS_CAT_MASK;
 	if (cat != LPROPS_UNCAT) {
@@ -1091,7 +1092,9 @@ static int scan_check_cb(struct ubifs_in
 		}
 	}
 
-	buf = __vmalloc(c->leb_size, GFP_NOFS, PAGE_KERNEL);
+	nofs = memalloc_nofs_save();
+	buf = vmalloc(c->leb_size);
+	memalloc_nofs_restore(nofs);
 	if (!buf)
 		return -ENOMEM;
 
Index: linux-2.6/fs/ubifs/lpt_commit.c
===================================================================
--- linux-2.6.orig/fs/ubifs/lpt_commit.c
+++ linux-2.6/fs/ubifs/lpt_commit.c
@@ -1630,11 +1630,14 @@ static int dbg_check_ltab_lnum(struct ub
 	int err, len = c->leb_size, dirty = 0, node_type, node_num, node_len;
 	int ret;
 	void *buf, *p;
+	unsigned nofs;
 
 	if (!dbg_is_chk_lprops(c))
 		return 0;
 
-	buf = p = __vmalloc(c->leb_size, GFP_NOFS, PAGE_KERNEL);
+	nofs = memalloc_nofs_save();
+	buf = p = vmalloc(c->leb_size);
+	memalloc_nofs_restore(nofs);
 	if (!buf) {
 		ubifs_err(c, "cannot allocate memory for ltab checking");
 		return 0;
@@ -1881,9 +1884,12 @@ static void dump_lpt_leb(const struct ub
 {
 	int err, len = c->leb_size, node_type, node_num, node_len, offs;
 	void *buf, *p;
+	unsigned nofs;
 
 	pr_err("(pid %d) start dumping LEB %d\n", current->pid, lnum);
-	buf = p = __vmalloc(c->leb_size, GFP_NOFS, PAGE_KERNEL);
+	nofs = memalloc_nofs_save();
+	buf = p = vmalloc(c->leb_size);
+	memalloc_nofs_restore(nofs);
 	if (!buf) {
 		ubifs_err(c, "cannot allocate memory to dump LPT");
 		return;
Index: linux-2.6/fs/ubifs/orphan.c
===================================================================
--- linux-2.6.orig/fs/ubifs/orphan.c
+++ linux-2.6/fs/ubifs/orphan.c
@@ -880,12 +880,15 @@ static int dbg_scan_orphans(struct ubifs
 {
 	int lnum, err = 0;
 	void *buf;
+	unsigned nofs;
 
 	/* Check no-orphans flag and skip this if no orphans */
 	if (c->no_orphs)
 		return 0;
 
-	buf = __vmalloc(c->leb_size, GFP_NOFS, PAGE_KERNEL);
+	nofs = memalloc_nofs_save();
+	buf = vmalloc(c->leb_size);
+	memalloc_nofs_restore(nofs);
 	if (!buf) {
 		ubifs_err(c, "cannot allocate memory to check orphans");
 		return 0;
Index: linux-2.6/mm/vmalloc.c
===================================================================
--- linux-2.6.orig/mm/vmalloc.c
+++ linux-2.6/mm/vmalloc.c
@@ -1671,6 +1671,12 @@ static void *__vmalloc_area_node(struct
 	const gfp_t nested_gfp = (gfp_mask & GFP_RECLAIM_MASK) | __GFP_ZERO;
 	const gfp_t alloc_mask = gfp_mask | __GFP_HIGHMEM | __GFP_NOWARN;
 
+	/*
+	 * vmalloc uses GFP_KERNEL for some internal allocations (e.g page tables)
+	 * so the given set of flags has to be compatible.
+	 */
+	WARN_ON_ONCE((gfp_mask & GFP_KERNEL) != GFP_KERNEL);
+
 	nr_pages = get_vm_area_size(area) >> PAGE_SHIFT;
 	array_size = (nr_pages * sizeof(struct page *));
 
Index: linux-2.6/net/ceph/ceph_common.c
===================================================================
--- linux-2.6.orig/net/ceph/ceph_common.c
+++ linux-2.6/net/ceph/ceph_common.c
@@ -17,6 +17,7 @@
 #include <linux/statfs.h>
 #include <linux/string.h>
 #include <linux/vmalloc.h>
+#include <linux/sched/mm.h>
 
 
 #include <linux/ceph/ceph_features.h>
@@ -179,13 +180,22 @@ EXPORT_SYMBOL(ceph_compare_options);
 
 void *ceph_kvmalloc(size_t size, gfp_t flags)
 {
+	void *ptr;
+	unsigned noio;
+
 	if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
-		void *ptr = kmalloc(size, flags | __GFP_NOWARN);
+		ptr = kmalloc(size, flags | __GFP_NOWARN);
 		if (ptr)
 			return ptr;
 	}
 
-	return __vmalloc(size, flags, PAGE_KERNEL);
+	if ((flags & (__GFP_FS | __GFP_IO)) != (__GFP_FS | __GFP_IO))
+		noio = memalloc_noio_save();
+	ptr = __vmalloc(size, flags | __GFP_FS | __GFP_IO, PAGE_KERNEL);
+	if ((flags & (__GFP_FS | __GFP_IO)) != (__GFP_FS | __GFP_IO))
+		memalloc_noio_restore(noio);
+
+	return ptr;
 }
 
 
Index: linux-2.6/mm/page_alloc.c
===================================================================
--- linux-2.6.orig/mm/page_alloc.c
+++ linux-2.6/mm/page_alloc.c
@@ -7249,7 +7249,7 @@ void *__init alloc_large_system_hash(con
 		if (flags & HASH_EARLY)
 			table = memblock_virt_alloc_nopanic(size, 0);
 		else if (hashdist)
-			table = __vmalloc(size, GFP_ATOMIC, PAGE_KERNEL);
+			table = vmalloc(size);
 		else {
 			/*
 			 * If bucketsize is not a power-of-two, we may free
Index: linux-2.6/drivers/md/dm-bufio.c
===================================================================
--- linux-2.6.orig/drivers/md/dm-bufio.c
+++ linux-2.6/drivers/md/dm-bufio.c
@@ -406,7 +406,7 @@ static void *alloc_buffer_data(struct dm
 	if (gfp_mask & __GFP_NORETRY)
 		noio_flag = memalloc_noio_save();
 
-	ptr = __vmalloc(c->block_size, gfp_mask, PAGE_KERNEL);
+	ptr = __vmalloc(c->block_size, gfp_mask | __GFP_FS | __GFP_IO, PAGE_KERNEL);
 
 	if (gfp_mask & __GFP_NORETRY)
 		memalloc_noio_restore(noio_flag);
Index: linux-2.6/fs/xfs/kmem.c
===================================================================
--- linux-2.6.orig/fs/xfs/kmem.c
+++ linux-2.6/fs/xfs/kmem.c
@@ -67,7 +67,7 @@ kmem_zalloc_large(size_t size, xfs_km_fl
 		nofs_flag = memalloc_nofs_save();
 
 	lflags = kmem_flags_convert(flags);
-	ptr = __vmalloc(size, lflags | __GFP_ZERO, PAGE_KERNEL);
+	ptr = __vmalloc(size, lflags | __GFP_FS | __GFP_IO | __GFP_ZERO, PAGE_KERNEL);
 
 	if (flags & KM_NOFS)
 		memalloc_nofs_restore(nofs_flag);
Index: linux-2.6/fs/ubifs/ubifs.h
===================================================================
--- linux-2.6.orig/fs/ubifs/ubifs.h
+++ linux-2.6/fs/ubifs/ubifs.h
@@ -30,6 +30,7 @@
 #include <linux/sched.h>
 #include <linux/slab.h>
 #include <linux/vmalloc.h>
+#include <linux/sched/mm.h>
 #include <linux/spinlock.h>
 #include <linux/mutex.h>
 #include <linux/rwsem.h>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] vmalloc: respect the GFP_NOIO and GFP_NOFS flags
  2017-07-03 22:57           ` Mikulas Patocka
@ 2017-07-04  8:10             ` Michal Hocko
  0 siblings, 0 replies; 11+ messages in thread
From: Michal Hocko @ 2017-07-04  8:10 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrew Morton,
	Stephen Rothwell, Vlastimil Babka, Andreas Dilger, John Hubbard,
	David Miller, linux-kernel, linux-mm, netdev

On Mon 03-07-17 18:57:14, Mikulas Patocka wrote:
> 
> 
> On Mon, 3 Jul 2017, Michal Hocko wrote:
> 
> > We can add a warning (or move it from kvmalloc) and hope that the
> > respective maintainers will fix those places properly. The reason I
> > didn't add the warning to vmalloc and kept it in kvmalloc was to catch
> > only new users rather than suddenly splat on existing ones. Note that
> > there are users with panic_on_warn enabled.
> > 
> > Considering how many NOFS users we have in tree I would rather work with
> > maintainers to fix them.
> 
> So - do you want this patch?

no, see below
 
> I still believe that the previous patch that pushes 
> memalloc_noio/nofs_save into __vmalloc is better than this.

It is, but both of them are actually wrong. Why? Because that would be
just a mindless application of the scope where the scope doesn't match
the actual reclaim recursion restricted scope. Really, the right way to
go is to simply talk to the respective maintainers. Find out whether
NOFS context is really needed and if so find the scope (e.g. a lock
which would be needed in the reclaim context) and document it. This is
not a trivial work to do but a) we do not seem to have any bug reports
complaining about these call sites so there is no need to hurry and b)
this will result in a cleaner and easier to maintain code.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2017-07-04  8:10 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-30  2:25 [PATCH] vmalloc: respect the GFP_NOIO and GFP_NOFS flags Mikulas Patocka
2017-06-30  6:20 ` John Hubbard
2017-06-30  8:12 ` Michal Hocko
2017-06-30 18:11   ` Mikulas Patocka
2017-06-30 20:41     ` Michal Hocko
2017-07-01  0:36       ` Mikulas Patocka
2017-07-03  6:31         ` Michal Hocko
2017-07-03 22:57           ` Mikulas Patocka
2017-07-04  8:10             ` Michal Hocko
2017-07-01  3:25 ` Andreas Dilger
2017-07-01  4:49   ` Mikulas Patocka

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).