* [RFC 0/3] extend vmalloc support for constrained allocations
@ 2021-10-18 11:47 Michal Hocko
  2021-10-18 11:47 ` [RFC 1/3] mm/vmalloc: alloc GFP_NO{FS,IO} for vmalloc Michal Hocko
                   ` (2 more replies)
  0 siblings, 3 replies; 39+ messages in thread
From: Michal Hocko @ 2021-10-18 11:47 UTC (permalink / raw)
  To: linux-mm
  Cc: Dave Chinner, Neil Brown, Andrew Morton, Christoph Hellwig,
	Uladzislau Rezki, linux-fsdevel, LKML, Ilya Dryomov, Jeff Layton

Hi,
based on a recent discussion with Dave and Neil [1] I have tried to
implement NOFS, NOIO and NOFAIL support for vmalloc to make the life
of kvmalloc users easier.

A requirement for NOFAIL support in kvmalloc was new to me, but it
seems to be really needed by the xfs code.

NOFS/NOIO has been a known long-term problem which was hoped to be
handled by the scope API. Those scopes should have been used at the
reclaim recursion boundaries, both to document them and to remove the
necessity of NOFS/NOIO constraints for all allocations within that
scope. Instead, workarounds were developed to wrap a single allocation
(like ceph_kvmalloc).
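
To illustrate, the scope API is meant to annotate a whole reclaim
recursion boundary (a rough sketch only; do_something_that_allocates()
is a made-up placeholder):

	/* we hold a lock which the fs reclaim path might need */
	unsigned int flags = memalloc_nofs_save();
	/* allocations in this scope implicitly behave as GFP_NOFS */
	do_something_that_allocates();
	memalloc_nofs_restore(flags);

whereas the existing workarounds wrap each single allocation instead:

	unsigned int flags = memalloc_nofs_save();
	p = kvmalloc(size, GFP_KERNEL);
	memalloc_nofs_restore(flags);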

The first patch implements NOFS/NOIO support for vmalloc. The second
one adds NOFAIL support and the third one bundles it all together into
kvmalloc and drops ceph_kvmalloc, which can now use kvmalloc directly.
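
With the series applied a constrained caller can then use kvmalloc
directly, e.g. (illustrative only, buf and len are placeholders):

	buf = kvmalloc(len, GFP_NOFS);
	if (!buf)
		return -ENOMEM;

and the vmalloc fallback will enforce the NOFS scope internally rather
than requiring a wrapper at the call site.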

Please note that this is an RFC and I haven't done any testing on it
yet. I hope I haven't missed anything in the vmalloc allocator. It
would be really great if Christoph and Uladzislau could have a look.

Thanks!

[1] http://lkml.kernel.org/r/163184741778.29351.16920832234899124642.stgit@noble.brown



^ permalink raw reply	[flat|nested] 39+ messages in thread

* [RFC 1/3] mm/vmalloc: alloc GFP_NO{FS,IO} for vmalloc
  2021-10-18 11:47 [RFC 0/3] extend vmalloc support for constrained allocations Michal Hocko
@ 2021-10-18 11:47 ` Michal Hocko
  2021-10-19  0:44   ` NeilBrown
  2021-10-18 11:47 ` [RFC 2/3] mm/vmalloc: add support for __GFP_NOFAIL Michal Hocko
  2021-10-18 11:47 ` [RFC 3/3] mm: allow !GFP_KERNEL allocations for kvmalloc Michal Hocko
  2 siblings, 1 reply; 39+ messages in thread
From: Michal Hocko @ 2021-10-18 11:47 UTC (permalink / raw)
  To: linux-mm
  Cc: Dave Chinner, Neil Brown, Andrew Morton, Christoph Hellwig,
	Uladzislau Rezki, linux-fsdevel, LKML, Ilya Dryomov, Jeff Layton,
	Michal Hocko

From: Michal Hocko <mhocko@suse.com>

vmalloc historically hasn't supported GFP_NO{FS,IO} requests because
page table allocations do not support an externally provided gfp mask
and have performed GFP_KERNEL-like allocations.

For a few years now we have had scope (memalloc_no{fs,io}_{save,restore})
APIs to enforce NOFS and NOIO constraints implicitly on all allocations
within the scope. There was a hope that those scopes would be defined
at a higher level, where the reclaim recursion boundary starts/stops
(e.g. when a lock needed during memory reclaim is taken, etc.). It seems
that not all NOFS/NOIO users have adopted this approach and instead
they have taken a workaround approach of wrapping a single [k]vmalloc
allocation in a scope API.

These workarounds do not serve the purpose of better reclaim recursion
documentation and reduction of explicit GFP_NO{FS,IO} usage, so let's
just provide the semantic they are asking for without a need for
workarounds.

Add support for GFP_NOFS and GFP_NOIO to vmalloc directly. All internal
allocations already comply with the given gfp_mask. The only current
exception is vmap_pages_range, which maps pages into the kernel page
tables. Infer the proper scope API based on the given gfp mask.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/vmalloc.c | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index d77830ff604c..7455c89598d3 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2889,6 +2889,8 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 	unsigned long array_size;
 	unsigned int nr_small_pages = size >> PAGE_SHIFT;
 	unsigned int page_order;
+	unsigned int flags;
+	int ret;
 
 	array_size = (unsigned long)nr_small_pages * sizeof(struct page *);
 	gfp_mask |= __GFP_NOWARN;
@@ -2930,8 +2932,24 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 		goto fail;
 	}
 
-	if (vmap_pages_range(addr, addr + size, prot, area->pages,
-			page_shift) < 0) {
+	/*
+	 * page table allocations ignore the external gfp mask, enforce it
+	 * by the scope API
+	 */
+	if ((gfp_mask & (__GFP_FS | __GFP_IO)) == __GFP_IO)
+		flags = memalloc_nofs_save();
+	else if (!(gfp_mask & (__GFP_FS | __GFP_IO)))
+		flags = memalloc_noio_save();
+
+	ret = vmap_pages_range(addr, addr + size, prot, area->pages,
+			page_shift);
+
+	if ((gfp_mask & (__GFP_FS | __GFP_IO)) == __GFP_IO)
+		memalloc_nofs_restore(flags);
+	else if (!(gfp_mask & (__GFP_FS | __GFP_IO)))
+		memalloc_noio_restore(flags);
+
+	if (ret < 0) {
 		warn_alloc(gfp_mask, NULL,
 			"vmalloc error: size %lu, failed to map pages",
 			area->nr_pages * PAGE_SIZE);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [RFC 2/3] mm/vmalloc: add support for __GFP_NOFAIL
  2021-10-18 11:47 [RFC 0/3] extend vmalloc support for constrained allocations Michal Hocko
  2021-10-18 11:47 ` [RFC 1/3] mm/vmalloc: alloc GFP_NO{FS,IO} for vmalloc Michal Hocko
@ 2021-10-18 11:47 ` Michal Hocko
  2021-10-18 16:24     ` kernel test robot
                     ` (4 more replies)
  2021-10-18 11:47 ` [RFC 3/3] mm: allow !GFP_KERNEL allocations for kvmalloc Michal Hocko
  2 siblings, 5 replies; 39+ messages in thread
From: Michal Hocko @ 2021-10-18 11:47 UTC (permalink / raw)
  To: linux-mm
  Cc: Dave Chinner, Neil Brown, Andrew Morton, Christoph Hellwig,
	Uladzislau Rezki, linux-fsdevel, LKML, Ilya Dryomov, Jeff Layton,
	Michal Hocko

From: Michal Hocko <mhocko@suse.com>

Dave Chinner has mentioned that some of the xfs code would benefit from
kvmalloc support for __GFP_NOFAIL because it has allocations that
cannot fail and that do not fit into a single page.

A large part of the vmalloc implementation already complies with the
given gfp flags so there is no work to be done for those. The area
and page table allocations are an exception to that. Implement a retry
loop for those.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/vmalloc.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 7455c89598d3..3a5a178295d1 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2941,8 +2941,10 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 	else if (!(gfp_mask & (__GFP_FS | __GFP_IO)))
 		flags = memalloc_noio_save();
 
-	ret = vmap_pages_range(addr, addr + size, prot, area->pages,
+	do {
+		ret = vmap_pages_range(addr, addr + size, prot, area->pages,
 			page_shift);
+	} while ((gfp_mask & __GFP_NOFAIL) && (ret < 0));
 
 	if ((gfp_mask & (__GFP_FS | __GFP_IO)) == __GFP_IO)
 		memalloc_nofs_restore(flags);
@@ -3032,6 +3034,8 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
 		warn_alloc(gfp_mask, NULL,
 			"vmalloc error: size %lu, vm_struct allocation failed",
 			real_size);
+		if (gfp_mask && __GFP_NOFAIL)
+			goto again;
 		goto fail;
 	}
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [RFC 3/3] mm: allow !GFP_KERNEL allocations for kvmalloc
  2021-10-18 11:47 [RFC 0/3] extend vmalloc support for constrained allocations Michal Hocko
  2021-10-18 11:47 ` [RFC 1/3] mm/vmalloc: alloc GFP_NO{FS,IO} for vmalloc Michal Hocko
  2021-10-18 11:47 ` [RFC 2/3] mm/vmalloc: add support for __GFP_NOFAIL Michal Hocko
@ 2021-10-18 11:47 ` Michal Hocko
  2 siblings, 0 replies; 39+ messages in thread
From: Michal Hocko @ 2021-10-18 11:47 UTC (permalink / raw)
  To: linux-mm
  Cc: Dave Chinner, Neil Brown, Andrew Morton, Christoph Hellwig,
	Uladzislau Rezki, linux-fsdevel, LKML, Ilya Dryomov, Jeff Layton,
	Michal Hocko

From: Michal Hocko <mhocko@suse.com>

Support for GFP_NO{FS,IO} and __GFP_NOFAIL has been implemented by
the previous patches so we can enable it for kvmalloc as well. This
will allow some external users to simplify or completely remove
their helpers.

The GFP_NOWAIT semantic hasn't been supported so far, nor has that
been explicitly documented, so let's add a note about it.

ceph_kvmalloc is the first helper to be dropped; its callers are
switched to kvmalloc.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 include/linux/ceph/libceph.h |  1 -
 mm/util.c                    | 15 ++++-----------
 net/ceph/buffer.c            |  4 ++--
 net/ceph/ceph_common.c       | 27 ---------------------------
 net/ceph/crypto.c            |  2 +-
 net/ceph/messenger.c         |  2 +-
 net/ceph/messenger_v2.c      |  2 +-
 net/ceph/osdmap.c            | 12 ++++++------
 8 files changed, 15 insertions(+), 50 deletions(-)

diff --git a/include/linux/ceph/libceph.h b/include/linux/ceph/libceph.h
index 409d8c29bc4f..309acbcb5a8a 100644
--- a/include/linux/ceph/libceph.h
+++ b/include/linux/ceph/libceph.h
@@ -295,7 +295,6 @@ extern bool libceph_compatible(void *data);
 
 extern const char *ceph_msg_type_name(int type);
 extern int ceph_check_fsid(struct ceph_client *client, struct ceph_fsid *fsid);
-extern void *ceph_kvmalloc(size_t size, gfp_t flags);
 
 struct fs_parameter;
 struct fc_log;
diff --git a/mm/util.c b/mm/util.c
index bacabe446906..fdec6b4b1267 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -549,13 +549,10 @@ EXPORT_SYMBOL(vm_mmap);
  * Uses kmalloc to get the memory but if the allocation fails then falls back
  * to the vmalloc allocator. Use kvfree for freeing the memory.
  *
- * Reclaim modifiers - __GFP_NORETRY and __GFP_NOFAIL are not supported.
+ * Reclaim modifiers - __GFP_NORETRY and GFP_NOWAIT are not supported.
  * __GFP_RETRY_MAYFAIL is supported, and it should be used only if kmalloc is
  * preferable to the vmalloc fallback, due to visible performance drawbacks.
  *
- * Please note that any use of gfp flags outside of GFP_KERNEL is careful to not
- * fall back to vmalloc.
- *
  * Return: pointer to the allocated memory of %NULL in case of failure
  */
 void *kvmalloc_node(size_t size, gfp_t flags, int node)
@@ -563,13 +560,6 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
 	gfp_t kmalloc_flags = flags;
 	void *ret;
 
-	/*
-	 * vmalloc uses GFP_KERNEL for some internal allocations (e.g page tables)
-	 * so the given set of flags has to be compatible.
-	 */
-	if ((flags & GFP_KERNEL) != GFP_KERNEL)
-		return kmalloc_node(size, flags, node);
-
 	/*
 	 * We want to attempt a large physically contiguous block first because
 	 * it is less likely to fragment multiple larger blocks and therefore
@@ -582,6 +572,9 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
 
 		if (!(kmalloc_flags & __GFP_RETRY_MAYFAIL))
 			kmalloc_flags |= __GFP_NORETRY;
+
+		/* nofail semantic is implemented by the vmalloc fallback */
+		kmalloc_flags &= ~__GFP_NOFAIL;
 	}
 
 	ret = kmalloc_node(size, kmalloc_flags, node);
diff --git a/net/ceph/buffer.c b/net/ceph/buffer.c
index 5622763ad402..7e51f128045d 100644
--- a/net/ceph/buffer.c
+++ b/net/ceph/buffer.c
@@ -7,7 +7,7 @@
 
 #include <linux/ceph/buffer.h>
 #include <linux/ceph/decode.h>
-#include <linux/ceph/libceph.h> /* for ceph_kvmalloc */
+#include <linux/ceph/libceph.h> /* for kvmalloc */
 
 struct ceph_buffer *ceph_buffer_new(size_t len, gfp_t gfp)
 {
@@ -17,7 +17,7 @@ struct ceph_buffer *ceph_buffer_new(size_t len, gfp_t gfp)
 	if (!b)
 		return NULL;
 
-	b->vec.iov_base = ceph_kvmalloc(len, gfp);
+	b->vec.iov_base = kvmalloc(len, gfp);
 	if (!b->vec.iov_base) {
 		kfree(b);
 		return NULL;
diff --git a/net/ceph/ceph_common.c b/net/ceph/ceph_common.c
index 97d6ea763e32..9441b4a4912b 100644
--- a/net/ceph/ceph_common.c
+++ b/net/ceph/ceph_common.c
@@ -190,33 +190,6 @@ int ceph_compare_options(struct ceph_options *new_opt,
 }
 EXPORT_SYMBOL(ceph_compare_options);
 
-/*
- * kvmalloc() doesn't fall back to the vmalloc allocator unless flags are
- * compatible with (a superset of) GFP_KERNEL.  This is because while the
- * actual pages are allocated with the specified flags, the page table pages
- * are always allocated with GFP_KERNEL.
- *
- * ceph_kvmalloc() may be called with GFP_KERNEL, GFP_NOFS or GFP_NOIO.
- */
-void *ceph_kvmalloc(size_t size, gfp_t flags)
-{
-	void *p;
-
-	if ((flags & (__GFP_IO | __GFP_FS)) == (__GFP_IO | __GFP_FS)) {
-		p = kvmalloc(size, flags);
-	} else if ((flags & (__GFP_IO | __GFP_FS)) == __GFP_IO) {
-		unsigned int nofs_flag = memalloc_nofs_save();
-		p = kvmalloc(size, GFP_KERNEL);
-		memalloc_nofs_restore(nofs_flag);
-	} else {
-		unsigned int noio_flag = memalloc_noio_save();
-		p = kvmalloc(size, GFP_KERNEL);
-		memalloc_noio_restore(noio_flag);
-	}
-
-	return p;
-}
-
 static int parse_fsid(const char *str, struct ceph_fsid *fsid)
 {
 	int i = 0;
diff --git a/net/ceph/crypto.c b/net/ceph/crypto.c
index 92d89b331645..051d22c0e4ad 100644
--- a/net/ceph/crypto.c
+++ b/net/ceph/crypto.c
@@ -147,7 +147,7 @@ void ceph_crypto_key_destroy(struct ceph_crypto_key *key)
 static const u8 *aes_iv = (u8 *)CEPH_AES_IV;
 
 /*
- * Should be used for buffers allocated with ceph_kvmalloc().
+ * Should be used for buffers allocated with kvmalloc().
  * Currently these are encrypt out-buffer (ceph_buffer) and decrypt
  * in-buffer (msg front).
  *
diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c
index 57d043b382ed..7b891be799d2 100644
--- a/net/ceph/messenger.c
+++ b/net/ceph/messenger.c
@@ -1920,7 +1920,7 @@ struct ceph_msg *ceph_msg_new2(int type, int front_len, int max_data_items,
 
 	/* front */
 	if (front_len) {
-		m->front.iov_base = ceph_kvmalloc(front_len, flags);
+		m->front.iov_base = kvmalloc(front_len, flags);
 		if (m->front.iov_base == NULL) {
 			dout("ceph_msg_new can't allocate %d bytes\n",
 			     front_len);
diff --git a/net/ceph/messenger_v2.c b/net/ceph/messenger_v2.c
index cc40ce4e02fb..c4099b641b38 100644
--- a/net/ceph/messenger_v2.c
+++ b/net/ceph/messenger_v2.c
@@ -308,7 +308,7 @@ static void *alloc_conn_buf(struct ceph_connection *con, int len)
 	if (WARN_ON(con->v2.conn_buf_cnt >= ARRAY_SIZE(con->v2.conn_bufs)))
 		return NULL;
 
-	buf = ceph_kvmalloc(len, GFP_NOIO);
+	buf = kvmalloc(len, GFP_NOIO);
 	if (!buf)
 		return NULL;
 
diff --git a/net/ceph/osdmap.c b/net/ceph/osdmap.c
index 75b738083523..2823bb3cff55 100644
--- a/net/ceph/osdmap.c
+++ b/net/ceph/osdmap.c
@@ -980,7 +980,7 @@ static struct crush_work *alloc_workspace(const struct crush_map *c)
 	work_size = crush_work_size(c, CEPH_PG_MAX_SIZE);
 	dout("%s work_size %zu bytes\n", __func__, work_size);
 
-	work = ceph_kvmalloc(work_size, GFP_NOIO);
+	work = kvmalloc(work_size, GFP_NOIO);
 	if (!work)
 		return NULL;
 
@@ -1190,9 +1190,9 @@ static int osdmap_set_max_osd(struct ceph_osdmap *map, u32 max)
 	if (max == map->max_osd)
 		return 0;
 
-	state = ceph_kvmalloc(array_size(max, sizeof(*state)), GFP_NOFS);
-	weight = ceph_kvmalloc(array_size(max, sizeof(*weight)), GFP_NOFS);
-	addr = ceph_kvmalloc(array_size(max, sizeof(*addr)), GFP_NOFS);
+	state = kvmalloc(array_size(max, sizeof(*state)), GFP_NOFS);
+	weight = kvmalloc(array_size(max, sizeof(*weight)), GFP_NOFS);
+	addr = kvmalloc(array_size(max, sizeof(*addr)), GFP_NOFS);
 	if (!state || !weight || !addr) {
 		kvfree(state);
 		kvfree(weight);
@@ -1222,7 +1222,7 @@ static int osdmap_set_max_osd(struct ceph_osdmap *map, u32 max)
 	if (map->osd_primary_affinity) {
 		u32 *affinity;
 
-		affinity = ceph_kvmalloc(array_size(max, sizeof(*affinity)),
+		affinity = kvmalloc(array_size(max, sizeof(*affinity)),
 					 GFP_NOFS);
 		if (!affinity)
 			return -ENOMEM;
@@ -1503,7 +1503,7 @@ static int set_primary_affinity(struct ceph_osdmap *map, int osd, u32 aff)
 	if (!map->osd_primary_affinity) {
 		int i;
 
-		map->osd_primary_affinity = ceph_kvmalloc(
+		map->osd_primary_affinity = kvmalloc(
 		    array_size(map->max_osd, sizeof(*map->osd_primary_affinity)),
 		    GFP_NOFS);
 		if (!map->osd_primary_affinity)
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [RFC 2/3] mm/vmalloc: add support for __GFP_NOFAIL
  2021-10-18 11:47 ` [RFC 2/3] mm/vmalloc: add support for __GFP_NOFAIL Michal Hocko
@ 2021-10-18 16:24     ` kernel test robot
  2021-10-18 16:48   ` Michal Hocko
                       ` (3 subsequent siblings)
  4 siblings, 0 replies; 39+ messages in thread
From: kernel test robot @ 2021-10-18 16:24 UTC (permalink / raw)
  To: Michal Hocko; +Cc: llvm, kbuild-all

[-- Attachment #1: Type: text/plain, Size: 5886 bytes --]

Hi Michal,

[FYI, it's a private test report for your RFC patch.]
[auto build test WARNING on ceph-client/for-linus]
[also build test WARNING on v5.15-rc6]
[cannot apply to hnaz-mm/master next-20211018]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Michal-Hocko/extend-vmalloc-support-for-constrained-allocations/20211018-194834
base:   https://github.com/ceph/ceph-client.git for-linus
config: hexagon-randconfig-r041-20211018 (attached as .config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project d245f2e8597bfb52c34810a328d42b990e4af1a4)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/ff0475c921959da4fd8e7d29e5a06739a1c52386
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Michal-Hocko/extend-vmalloc-support-for-constrained-allocations/20211018-194834
        git checkout ff0475c921959da4fd8e7d29e5a06739a1c52386
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 ARCH=hexagon 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> mm/vmalloc.c:3037:16: warning: use of logical '&&' with constant operand [-Wconstant-logical-operand]
                   if (gfp_mask && __GFP_NOFAIL)
                                ^  ~~~~~~~~~~~~
   mm/vmalloc.c:3037:16: note: use '&' for a bitwise operation
                   if (gfp_mask && __GFP_NOFAIL)
                                ^~
                                &
   mm/vmalloc.c:3037:16: note: remove constant to silence this warning
                   if (gfp_mask && __GFP_NOFAIL)
                               ~^~~~~~~~~~~~~~~
   mm/vmalloc.c:781:1: warning: unused function 'compute_subtree_max_size' [-Wunused-function]
   compute_subtree_max_size(struct vmap_area *va)
   ^
   2 warnings generated.


vim +3037 mm/vmalloc.c

  2967	
  2968	/**
  2969	 * __vmalloc_node_range - allocate virtually contiguous memory
  2970	 * @size:		  allocation size
  2971	 * @align:		  desired alignment
  2972	 * @start:		  vm area range start
  2973	 * @end:		  vm area range end
  2974	 * @gfp_mask:		  flags for the page level allocator
  2975	 * @prot:		  protection mask for the allocated pages
  2976	 * @vm_flags:		  additional vm area flags (e.g. %VM_NO_GUARD)
  2977	 * @node:		  node to use for allocation or NUMA_NO_NODE
  2978	 * @caller:		  caller's return address
  2979	 *
  2980	 * Allocate enough pages to cover @size from the page level
  2981	 * allocator with @gfp_mask flags.  Map them into contiguous
  2982	 * kernel virtual space, using a pagetable protection of @prot.
  2983	 *
  2984	 * Return: the address of the area or %NULL on failure
  2985	 */
  2986	void *__vmalloc_node_range(unsigned long size, unsigned long align,
  2987				unsigned long start, unsigned long end, gfp_t gfp_mask,
  2988				pgprot_t prot, unsigned long vm_flags, int node,
  2989				const void *caller)
  2990	{
  2991		struct vm_struct *area;
  2992		void *addr;
  2993		unsigned long real_size = size;
  2994		unsigned long real_align = align;
  2995		unsigned int shift = PAGE_SHIFT;
  2996	
  2997		if (WARN_ON_ONCE(!size))
  2998			return NULL;
  2999	
  3000		if ((size >> PAGE_SHIFT) > totalram_pages()) {
  3001			warn_alloc(gfp_mask, NULL,
  3002				"vmalloc error: size %lu, exceeds total pages",
  3003				real_size);
  3004			return NULL;
  3005		}
  3006	
  3007		if (vmap_allow_huge && !(vm_flags & VM_NO_HUGE_VMAP)) {
  3008			unsigned long size_per_node;
  3009	
  3010			/*
  3011			 * Try huge pages. Only try for PAGE_KERNEL allocations,
  3012			 * others like modules don't yet expect huge pages in
  3013			 * their allocations due to apply_to_page_range not
  3014			 * supporting them.
  3015			 */
  3016	
  3017			size_per_node = size;
  3018			if (node == NUMA_NO_NODE)
  3019				size_per_node /= num_online_nodes();
  3020			if (arch_vmap_pmd_supported(prot) && size_per_node >= PMD_SIZE)
  3021				shift = PMD_SHIFT;
  3022			else
  3023				shift = arch_vmap_pte_supported_shift(size_per_node);
  3024	
  3025			align = max(real_align, 1UL << shift);
  3026			size = ALIGN(real_size, 1UL << shift);
  3027		}
  3028	
  3029	again:
  3030		area = __get_vm_area_node(real_size, align, shift, VM_ALLOC |
  3031					  VM_UNINITIALIZED | vm_flags, start, end, node,
  3032					  gfp_mask, caller);
  3033		if (!area) {
  3034			warn_alloc(gfp_mask, NULL,
  3035				"vmalloc error: size %lu, vm_struct allocation failed",
  3036				real_size);
> 3037			if (gfp_mask && __GFP_NOFAIL)
  3038				goto again;
  3039			goto fail;
  3040		}
  3041	
  3042		addr = __vmalloc_area_node(area, gfp_mask, prot, shift, node);
  3043		if (!addr)
  3044			goto fail;
  3045	
  3046		/*
  3047		 * In this function, newly allocated vm_struct has VM_UNINITIALIZED
  3048		 * flag. It means that vm_struct is not fully initialized.
  3049		 * Now, it is fully initialized, so remove this flag here.
  3050		 */
  3051		clear_vm_uninitialized_flag(area);
  3052	
  3053		size = PAGE_ALIGN(size);
  3054		kmemleak_vmalloc(area, size, gfp_mask);
  3055	
  3056		return addr;
  3057	
  3058	fail:
  3059		if (shift > PAGE_SHIFT) {
  3060			shift = PAGE_SHIFT;
  3061			align = real_align;
  3062			size = real_size;
  3063			goto again;
  3064		}
  3065	
  3066		return NULL;
  3067	}
  3068	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 31826 bytes --]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC 2/3] mm/vmalloc: add support for __GFP_NOFAIL
  2021-10-18 11:47 ` [RFC 2/3] mm/vmalloc: add support for __GFP_NOFAIL Michal Hocko
  2021-10-18 16:24     ` kernel test robot
@ 2021-10-18 16:48   ` Michal Hocko
  2021-10-18 18:15     ` kernel test robot
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 39+ messages in thread
From: Michal Hocko @ 2021-10-18 16:48 UTC (permalink / raw)
  To: linux-mm
  Cc: Dave Chinner, Neil Brown, Andrew Morton, Christoph Hellwig,
	Uladzislau Rezki, linux-fsdevel, LKML, Ilya Dryomov, Jeff Layton

Fat fingers on my side. A follow-up to fold:
commit 63d1b80b9a298c9380e5175e2add7025b6bd2600
Author: Michal Hocko <mhocko@suse.com>
Date:   Mon Oct 18 18:47:04 2021 +0200

    fold me "mm/vmalloc: add support for __GFP_NOFAIL"

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 3a5a178295d1..4ce9ccc33e33 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3034,7 +3034,7 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
 		warn_alloc(gfp_mask, NULL,
 			"vmalloc error: size %lu, vm_struct allocation failed",
 			real_size);
-		if (gfp_mask && __GFP_NOFAIL)
+		if (gfp_mask & __GFP_NOFAIL)
 			goto again;
 		goto fail;
 	}
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [RFC 2/3] mm/vmalloc: add support for __GFP_NOFAIL
  2021-10-18 11:47 ` [RFC 2/3] mm/vmalloc: add support for __GFP_NOFAIL Michal Hocko
@ 2021-10-18 18:15     ` kernel test robot
  2021-10-18 16:48   ` Michal Hocko
                       ` (3 subsequent siblings)
  4 siblings, 0 replies; 39+ messages in thread
From: kernel test robot @ 2021-10-18 18:15 UTC (permalink / raw)
  To: Michal Hocko; +Cc: llvm, kbuild-all

[-- Attachment #1: Type: text/plain, Size: 5743 bytes --]

Hi Michal,

[FYI, it's a private test report for your RFC patch.]
[auto build test ERROR on ceph-client/for-linus]
[also build test ERROR on v5.15-rc6]
[cannot apply to hnaz-mm/master next-20211018]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Michal-Hocko/extend-vmalloc-support-for-constrained-allocations/20211018-194834
base:   https://github.com/ceph/ceph-client.git for-linus
config: hexagon-buildonly-randconfig-r005-20211018 (attached as .config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project d245f2e8597bfb52c34810a328d42b990e4af1a4)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/ff0475c921959da4fd8e7d29e5a06739a1c52386
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Michal-Hocko/extend-vmalloc-support-for-constrained-allocations/20211018-194834
        git checkout ff0475c921959da4fd8e7d29e5a06739a1c52386
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 ARCH=hexagon 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> mm/vmalloc.c:3037:16: error: use of logical '&&' with constant operand [-Werror,-Wconstant-logical-operand]
                   if (gfp_mask && __GFP_NOFAIL)
                                ^  ~~~~~~~~~~~~
   mm/vmalloc.c:3037:16: note: use '&' for a bitwise operation
                   if (gfp_mask && __GFP_NOFAIL)
                                ^~
                                &
   mm/vmalloc.c:3037:16: note: remove constant to silence this warning
                   if (gfp_mask && __GFP_NOFAIL)
                               ~^~~~~~~~~~~~~~~
   1 error generated.


vim +3037 mm/vmalloc.c

  2967	
  2968	/**
  2969	 * __vmalloc_node_range - allocate virtually contiguous memory
  2970	 * @size:		  allocation size
  2971	 * @align:		  desired alignment
  2972	 * @start:		  vm area range start
  2973	 * @end:		  vm area range end
  2974	 * @gfp_mask:		  flags for the page level allocator
  2975	 * @prot:		  protection mask for the allocated pages
  2976	 * @vm_flags:		  additional vm area flags (e.g. %VM_NO_GUARD)
  2977	 * @node:		  node to use for allocation or NUMA_NO_NODE
  2978	 * @caller:		  caller's return address
  2979	 *
  2980	 * Allocate enough pages to cover @size from the page level
  2981	 * allocator with @gfp_mask flags.  Map them into contiguous
  2982	 * kernel virtual space, using a pagetable protection of @prot.
  2983	 *
  2984	 * Return: the address of the area or %NULL on failure
  2985	 */
  2986	void *__vmalloc_node_range(unsigned long size, unsigned long align,
  2987				unsigned long start, unsigned long end, gfp_t gfp_mask,
  2988				pgprot_t prot, unsigned long vm_flags, int node,
  2989				const void *caller)
  2990	{
  2991		struct vm_struct *area;
  2992		void *addr;
  2993		unsigned long real_size = size;
  2994		unsigned long real_align = align;
  2995		unsigned int shift = PAGE_SHIFT;
  2996	
  2997		if (WARN_ON_ONCE(!size))
  2998			return NULL;
  2999	
  3000		if ((size >> PAGE_SHIFT) > totalram_pages()) {
  3001			warn_alloc(gfp_mask, NULL,
  3002				"vmalloc error: size %lu, exceeds total pages",
  3003				real_size);
  3004			return NULL;
  3005		}
  3006	
  3007		if (vmap_allow_huge && !(vm_flags & VM_NO_HUGE_VMAP)) {
  3008			unsigned long size_per_node;
  3009	
  3010			/*
  3011			 * Try huge pages. Only try for PAGE_KERNEL allocations,
  3012			 * others like modules don't yet expect huge pages in
  3013			 * their allocations due to apply_to_page_range not
  3014			 * supporting them.
  3015			 */
  3016	
  3017			size_per_node = size;
  3018			if (node == NUMA_NO_NODE)
  3019				size_per_node /= num_online_nodes();
  3020			if (arch_vmap_pmd_supported(prot) && size_per_node >= PMD_SIZE)
  3021				shift = PMD_SHIFT;
  3022			else
  3023				shift = arch_vmap_pte_supported_shift(size_per_node);
  3024	
  3025			align = max(real_align, 1UL << shift);
  3026			size = ALIGN(real_size, 1UL << shift);
  3027		}
  3028	
  3029	again:
  3030		area = __get_vm_area_node(real_size, align, shift, VM_ALLOC |
  3031					  VM_UNINITIALIZED | vm_flags, start, end, node,
  3032					  gfp_mask, caller);
  3033		if (!area) {
  3034			warn_alloc(gfp_mask, NULL,
  3035				"vmalloc error: size %lu, vm_struct allocation failed",
  3036				real_size);
> 3037			if (gfp_mask && __GFP_NOFAIL)
  3038				goto again;
  3039			goto fail;
  3040		}
  3041	
  3042		addr = __vmalloc_area_node(area, gfp_mask, prot, shift, node);
  3043		if (!addr)
  3044			goto fail;
  3045	
  3046		/*
  3047		 * In this function, newly allocated vm_struct has VM_UNINITIALIZED
  3048		 * flag. It means that vm_struct is not fully initialized.
  3049		 * Now, it is fully initialized, so remove this flag here.
  3050		 */
  3051		clear_vm_uninitialized_flag(area);
  3052	
  3053		size = PAGE_ALIGN(size);
  3054		kmemleak_vmalloc(area, size, gfp_mask);
  3055	
  3056		return addr;
  3057	
  3058	fail:
  3059		if (shift > PAGE_SHIFT) {
  3060			shift = PAGE_SHIFT;
  3061			align = real_align;
  3062			size = real_size;
  3063			goto again;
  3064		}
  3065	
  3066		return NULL;
  3067	}
  3068	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 21936 bytes --]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC 1/3] mm/vmalloc: alloc GFP_NO{FS,IO} for vmalloc
  2021-10-18 11:47 ` [RFC 1/3] mm/vmalloc: alloc GFP_NO{FS,IO} for vmalloc Michal Hocko
@ 2021-10-19  0:44   ` NeilBrown
  2021-10-19  6:59     ` Michal Hocko
  0 siblings, 1 reply; 39+ messages in thread
From: NeilBrown @ 2021-10-19  0:44 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Dave Chinner, Andrew Morton, Christoph Hellwig,
	Uladzislau Rezki, linux-fsdevel, LKML, Ilya Dryomov, Jeff Layton,
	Michal Hocko

On Mon, 18 Oct 2021, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> vmalloc historically hasn't supported GFP_NO{FS,IO} requests because
> page table allocations do not support an externally provided gfp mask
> and have performed GFP_KERNEL-like allocations.
> 
> For a few years now we have had scope (memalloc_no{fs,io}_{save,restore})
> APIs to enforce NOFS and NOIO constraints implicitly on all allocations
> within the scope. There was a hope that those scopes would be defined
> at a higher level, where the reclaim recursion boundary starts/stops
> (e.g. when a lock needed during memory reclaim is taken, etc.). It seems
> that not all NOFS/NOIO users have adopted this approach and instead
> they have taken a workaround approach of wrapping a single [k]vmalloc
> allocation in a scope API.
> 
> These workarounds do not serve the purpose of better reclaim recursion
> documentation and reduction of explicit GFP_NO{FS,IO} usage, so let's
> just provide the semantic they are asking for without a need for
> workarounds.
> 
> Add support for GFP_NOFS and GFP_NOIO to vmalloc directly. All internal
> allocations already comply with the given gfp_mask. The only current
> exception is vmap_pages_range, which maps pages into the kernel page
> tables. Infer the proper scope API based on the given gfp mask.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  mm/vmalloc.c | 22 ++++++++++++++++++++--
>  1 file changed, 20 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index d77830ff604c..7455c89598d3 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2889,6 +2889,8 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
>  	unsigned long array_size;
>  	unsigned int nr_small_pages = size >> PAGE_SHIFT;
>  	unsigned int page_order;
> +	unsigned int flags;
> +	int ret;
>  
>  	array_size = (unsigned long)nr_small_pages * sizeof(struct page *);
>  	gfp_mask |= __GFP_NOWARN;
> @@ -2930,8 +2932,24 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
>  		goto fail;
>  	}
>  
> -	if (vmap_pages_range(addr, addr + size, prot, area->pages,
> -			page_shift) < 0) {
> +	/*
> +	 * page table allocations ignore the external gfp mask, enforce it
> +	 * by the scope API
> +	 */
> +	if ((gfp_mask & (__GFP_FS | __GFP_IO)) == __GFP_IO)
> +		flags = memalloc_nofs_save();
> +	else if (!(gfp_mask & (__GFP_FS | __GFP_IO)))

I would *much* rather this were written

        else if ((gfp_mask & (__GFP_FS | __GFP_IO)) == 0)

so that the comparison with the previous test is more obvious.  Ditto
for similar code below.
It could even be

   switch (gfp_mask & (__GFP_FS | __GFP_IO)) {
   case __GFP_IO:  flags = memalloc_nofs_save(); break;
   case 0:         flags = memalloc_noio_save(); break;
   }

But I'm not completely convinced that is an improvement.

In terms of functionality this looks good.
Thanks,
NeilBrown


> +		flags = memalloc_noio_save();
> +
> +	ret = vmap_pages_range(addr, addr + size, prot, area->pages,
> +			page_shift);
> +
> +	if ((gfp_mask & (__GFP_FS | __GFP_IO)) == __GFP_IO)
> +		memalloc_nofs_restore(flags);
> +	else if (!(gfp_mask & (__GFP_FS | __GFP_IO)))
> +		memalloc_noio_restore(flags);
> +
> +	if (ret < 0) {
>  		warn_alloc(gfp_mask, NULL,
>  			"vmalloc error: size %lu, failed to map pages",
>  			area->nr_pages * PAGE_SIZE);
> -- 
> 2.30.2
> 
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC 1/3] mm/vmalloc: alloc GFP_NO{FS,IO} for vmalloc
  2021-10-19  0:44   ` NeilBrown
@ 2021-10-19  6:59     ` Michal Hocko
  0 siblings, 0 replies; 39+ messages in thread
From: Michal Hocko @ 2021-10-19  6:59 UTC (permalink / raw)
  To: NeilBrown
  Cc: linux-mm, Dave Chinner, Andrew Morton, Christoph Hellwig,
	Uladzislau Rezki, linux-fsdevel, LKML, Ilya Dryomov, Jeff Layton

On Tue 19-10-21 11:44:01, Neil Brown wrote:
> On Mon, 18 Oct 2021, Michal Hocko wrote:
[...]
> > @@ -2930,8 +2932,24 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
> >  		goto fail;
> >  	}
> >  
> > -	if (vmap_pages_range(addr, addr + size, prot, area->pages,
> > -			page_shift) < 0) {
> > +	/*
> > +	 * page table allocations ignore the external gfp mask, enforce it
> > +	 * by the scope API
> > +	 */
> > +	if ((gfp_mask & (__GFP_FS | __GFP_IO)) == __GFP_IO)
> > +		flags = memalloc_nofs_save();
> > +	else if (!(gfp_mask & (__GFP_FS | __GFP_IO)))
> 
> I would *much* rather this were written
> 
>         else if ((gfp_mask & (__GFP_FS | __GFP_IO)) == 0)

Sure, this looks better indeed.

> so that the comparison with the previous test is more obvious.  Ditto
> for similar code below.
> It could even be
> 
>    switch (gfp_mask & (__GFP_FS | __GFP_IO)) {
>    case __GFP_IO:  flags = memalloc_nofs_save(); break;
>    case 0:         flags = memalloc_noio_save(); break;
>    }
> 
> But I'm not completely convinced that is an improvement.

I am not a great fan of this though.

> In terms of functionality this looks good.

Thanks for the review!

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC 2/3] mm/vmalloc: add support for __GFP_NOFAIL
  2021-10-18 11:47 ` [RFC 2/3] mm/vmalloc: add support for __GFP_NOFAIL Michal Hocko
                     ` (2 preceding siblings ...)
  2021-10-18 18:15     ` kernel test robot
@ 2021-10-19 11:06   ` Uladzislau Rezki
  2021-10-19 11:52     ` Michal Hocko
  2021-10-20  8:25   ` [PATCH] mm/vmalloc: be more explicit about supported gfp flags Michal Hocko
  4 siblings, 1 reply; 39+ messages in thread
From: Uladzislau Rezki @ 2021-10-19 11:06 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Dave Chinner, Neil Brown, Andrew Morton,
	Christoph Hellwig, Uladzislau Rezki, linux-fsdevel, LKML,
	Ilya Dryomov, Jeff Layton, Michal Hocko

> From: Michal Hocko <mhocko@suse.com>
> 
> Dave Chinner has mentioned that some of the xfs code would benefit from
> kvmalloc support for __GFP_NOFAIL because it has allocations that
> cannot fail and that do not fit into a single page.
> 
> A large part of the vmalloc implementation already complies with the
> given gfp flags so there is no work to be done for those. The area
> and page table allocations are an exception to that. Implement a retry
> loop for those.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  mm/vmalloc.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 7455c89598d3..3a5a178295d1 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2941,8 +2941,10 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
>  	else if (!(gfp_mask & (__GFP_FS | __GFP_IO)))
>  		flags = memalloc_noio_save();
>  
> -	ret = vmap_pages_range(addr, addr + size, prot, area->pages,
> +	do {
> +		ret = vmap_pages_range(addr, addr + size, prot, area->pages,
>  			page_shift);
> +	} while ((gfp_mask & __GFP_NOFAIL) && (ret < 0));
>  
>  	if ((gfp_mask & (__GFP_FS | __GFP_IO)) == __GFP_IO)
>  		memalloc_nofs_restore(flags);
> @@ -3032,6 +3034,8 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
>  		warn_alloc(gfp_mask, NULL,
>  			"vmalloc error: size %lu, vm_struct allocation failed",
>  			real_size);
> +		if (gfp_mask && __GFP_NOFAIL)
> +			goto again;
>  		goto fail;
>  	}
>  
> -- 
> 2.30.2
> 
I have checked how the vmap code aligns with the __GFP_NOFAIL flag.
To me it looks correct from a functional point of view.

There is one place though: kasan_populate_vmalloc_pte(). It does not
use gfp_mask; instead it directly uses GFP_KERNEL for its internal
purposes. If that fails, the code will end up looping in
__vmalloc_node_range().

I am not sure how important it is to pass __GFP_NOFAIL into the KASAN
code.

Any thoughts about it?

--
Vlad Rezki

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC 2/3] mm/vmalloc: add support for __GFP_NOFAIL
  2021-10-19 11:06   ` Uladzislau Rezki
@ 2021-10-19 11:52     ` Michal Hocko
  2021-10-19 19:46       ` Uladzislau Rezki
  0 siblings, 1 reply; 39+ messages in thread
From: Michal Hocko @ 2021-10-19 11:52 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: linux-mm, Dave Chinner, Neil Brown, Andrew Morton,
	Christoph Hellwig, linux-fsdevel, LKML, Ilya Dryomov,
	Jeff Layton

On Tue 19-10-21 13:06:49, Uladzislau Rezki wrote:
> > From: Michal Hocko <mhocko@suse.com>
> > 
> > Dave Chinner has mentioned that some of the xfs code would benefit from
> > kvmalloc support for __GFP_NOFAIL because it has allocations that
> > cannot fail and that do not fit into a single page.
> > 
> > A large part of the vmalloc implementation already complies with the
> > given gfp flags so there is no work to be done for those. The area
> > and page table allocations are an exception to that. Implement a retry
> > loop for those.
> > 
> > Signed-off-by: Michal Hocko <mhocko@suse.com>
> > ---
> >  mm/vmalloc.c | 6 +++++-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> > 
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index 7455c89598d3..3a5a178295d1 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -2941,8 +2941,10 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
> >  	else if (!(gfp_mask & (__GFP_FS | __GFP_IO)))
> >  		flags = memalloc_noio_save();
> >  
> > -	ret = vmap_pages_range(addr, addr + size, prot, area->pages,
> > +	do {
> > +		ret = vmap_pages_range(addr, addr + size, prot, area->pages,
> >  			page_shift);
> > +	} while ((gfp_mask & __GFP_NOFAIL) && (ret < 0));
> >  
> >  	if ((gfp_mask & (__GFP_FS | __GFP_IO)) == __GFP_IO)
> >  		memalloc_nofs_restore(flags);
> > @@ -3032,6 +3034,8 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
> >  		warn_alloc(gfp_mask, NULL,
> >  			"vmalloc error: size %lu, vm_struct allocation failed",
> >  			real_size);
> > +		if (gfp_mask && __GFP_NOFAIL)
> > +			goto again;
> >  		goto fail;
> >  	}
> >  
> > -- 
> > 2.30.2
> > 
> I have checked how the vmap code aligns with the __GFP_NOFAIL flag.
> To me it looks correct from a functional point of view.
> 
> There is one place though: kasan_populate_vmalloc_pte(). It does not
> use gfp_mask; instead it directly uses GFP_KERNEL for its internal
> purposes. If that fails, the code will end up looping in
> __vmalloc_node_range().
> 
> I am not sure how important it is to pass __GFP_NOFAIL into the KASAN
> code.
> 
> Any thoughts about it?

The flag itself is not really necessary down there as long as we
guarantee that the high level logic doesn't fail. In this case we keep
retrying at the __vmalloc_node_range level, which should cover all
callers that can control the gfp mask. I was thinking of putting it
into __get_vm_area_node but that was slightly more hairy and we would
be losing the warning, which might turn out to be helpful in cases
where the failure is due to a lack of vmalloc space or a similar
constraint. Btw. do we want some throttling on a retry?

It would be better if the kasan part dealt with the gfp mask properly
though; that is something we can do on top.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC 2/3] mm/vmalloc: add support for __GFP_NOFAIL
  2021-10-19 11:52     ` Michal Hocko
@ 2021-10-19 19:46       ` Uladzislau Rezki
  2021-10-20  8:25         ` Michal Hocko
  0 siblings, 1 reply; 39+ messages in thread
From: Uladzislau Rezki @ 2021-10-19 19:46 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Uladzislau Rezki, linux-mm, Dave Chinner, Neil Brown,
	Andrew Morton, Christoph Hellwig, linux-fsdevel, LKML,
	Ilya Dryomov, Jeff Layton

On Tue, Oct 19, 2021 at 01:52:07PM +0200, Michal Hocko wrote:
> On Tue 19-10-21 13:06:49, Uladzislau Rezki wrote:
> > > From: Michal Hocko <mhocko@suse.com>
> > > 
> > > Dave Chinner has mentioned that some of the xfs code would benefit from
> > > kvmalloc support for __GFP_NOFAIL because it has allocations that
> > > cannot fail and that do not fit into a single page.
> > > 
> > > A large part of the vmalloc implementation already complies with the
> > > given gfp flags so there is no work to be done for those. The area
> > > and page table allocations are an exception to that. Implement a retry
> > > loop for those.
> > > 
> > > Signed-off-by: Michal Hocko <mhocko@suse.com>
> > > ---
> > >  mm/vmalloc.c | 6 +++++-
> > >  1 file changed, 5 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > index 7455c89598d3..3a5a178295d1 100644
> > > --- a/mm/vmalloc.c
> > > +++ b/mm/vmalloc.c
> > > @@ -2941,8 +2941,10 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
> > >  	else if (!(gfp_mask & (__GFP_FS | __GFP_IO)))
> > >  		flags = memalloc_noio_save();
> > >  
> > > -	ret = vmap_pages_range(addr, addr + size, prot, area->pages,
> > > +	do {
> > > +		ret = vmap_pages_range(addr, addr + size, prot, area->pages,
> > >  			page_shift);
> > > +	} while ((gfp_mask & __GFP_NOFAIL) && (ret < 0));
> > >  
> > >  	if ((gfp_mask & (__GFP_FS | __GFP_IO)) == __GFP_IO)
> > >  		memalloc_nofs_restore(flags);
> > > @@ -3032,6 +3034,8 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
> > >  		warn_alloc(gfp_mask, NULL,
> > >  			"vmalloc error: size %lu, vm_struct allocation failed",
> > >  			real_size);
> > > +		if (gfp_mask && __GFP_NOFAIL)
> > > +			goto again;
> > >  		goto fail;
> > >  	}
> > >  
> > > -- 
> > > 2.30.2
> > > 
> > I have checked the vmap code how it aligns with the __GFP_NOFAIL flag.
> > To me it looks correct from functional point of view.
> > 
> > There is one place though, it is kasan_populate_vmalloc_pte(). It does
> > not use gfp_mask, instead it directly deals with GFP_KERNEL for its
> > internal purpose. If it fails the code will end up looping in
> > __vmalloc_node_range().
> > 
> > I am not sure how important it is to pass __GFP_NOFAIL into the KASAN code.
> > 
> > Any thoughts about it?
> 
> The flag itself is not really necessary down there as long as we
> guarantee that the high level logic doesn't fail. In this case we keep
> retrying at the __vmalloc_node_range level, which should cover all
> callers that can control the gfp mask. I was thinking of putting it into
> __get_vm_area_node but that was slightly more hairy and we would be
> losing the warning, which might turn out to be helpful in cases where the
> failure is due to a lack of vmalloc space or a similar constraint. Btw. do we
> want some throttling on a retry?
> 
I think adding a kind of schedule() will not make things worse and in corner
cases could prevent a power drain by the CPU, which is important for mobile
devices.

As for vmap space, it can be that a user specifies a short range that does
not contain any free area. In that case we might never return to the caller.

Maybe add a good comment, something like: think about what you are doing
when you deal with __vmalloc_node_range() and __GFP_NOFAIL?

--
Vlad Rezki

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC 2/3] mm/vmalloc: add support for __GFP_NOFAIL
  2021-10-19 19:46       ` Uladzislau Rezki
@ 2021-10-20  8:25         ` Michal Hocko
  2021-10-20  9:18           ` Michal Hocko
  2021-10-20 13:54           ` Uladzislau Rezki
  0 siblings, 2 replies; 39+ messages in thread
From: Michal Hocko @ 2021-10-20  8:25 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: linux-mm, Dave Chinner, Neil Brown, Andrew Morton,
	Christoph Hellwig, linux-fsdevel, LKML, Ilya Dryomov,
	Jeff Layton

On Tue 19-10-21 21:46:58, Uladzislau Rezki wrote:
> On Tue, Oct 19, 2021 at 01:52:07PM +0200, Michal Hocko wrote:
> > On Tue 19-10-21 13:06:49, Uladzislau Rezki wrote:
> > > > From: Michal Hocko <mhocko@suse.com>
> > > > 
> > > > Dave Chinner has mentioned that some of the xfs code would benefit from
> > > > kvmalloc support for __GFP_NOFAIL because they have allocations that
> > > > cannot fail and they do not fit into a single page.
> > > > 
> > > > The large part of the vmalloc implementation already complies with the
> > > > given gfp flags so there is no work for those to be done. The area
> > > > and page table allocations are an exception to that. Implement a retry
> > > > loop for those.
> > > > 
> > > > Signed-off-by: Michal Hocko <mhocko@suse.com>
> > > > ---
> > > >  mm/vmalloc.c | 6 +++++-
> > > >  1 file changed, 5 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > > index 7455c89598d3..3a5a178295d1 100644
> > > > --- a/mm/vmalloc.c
> > > > +++ b/mm/vmalloc.c
> > > > @@ -2941,8 +2941,10 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
> > > >  	else if (!(gfp_mask & (__GFP_FS | __GFP_IO)))
> > > >  		flags = memalloc_noio_save();
> > > >  
> > > > -	ret = vmap_pages_range(addr, addr + size, prot, area->pages,
> > > > +	do {
> > > > +		ret = vmap_pages_range(addr, addr + size, prot, area->pages,
> > > >  			page_shift);
> > > > +	} while ((gfp_mask & __GFP_NOFAIL) && (ret < 0));
> > > >  
> > > >  	if ((gfp_mask & (__GFP_FS | __GFP_IO)) == __GFP_IO)
> > > >  		memalloc_nofs_restore(flags);
> > > > @@ -3032,6 +3034,8 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
> > > >  		warn_alloc(gfp_mask, NULL,
> > > >  			"vmalloc error: size %lu, vm_struct allocation failed",
> > > >  			real_size);
> > > > +		if (gfp_mask & __GFP_NOFAIL)
> > > > +			goto again;
> > > >  		goto fail;
> > > >  	}
> > > >  
> > > > -- 
> > > > 2.30.2
> > > > 
> > > I have checked how the vmap code aligns with the __GFP_NOFAIL flag.
> > > To me it looks correct from a functional point of view.
> > > 
> > > There is one place though, it is kasan_populate_vmalloc_pte(). It does
> > > not use gfp_mask, instead it directly deals with GFP_KERNEL for its
> > > internal purpose. If it fails the code will end up looping in
> > > __vmalloc_node_range().
> > > 
> > > I am not sure how important it is to pass __GFP_NOFAIL into the KASAN code.
> > > 
> > > Any thoughts about it?
> > 
> > The flag itself is not really necessary down there as long as we
> > guarantee that the high level logic doesn't fail. In this case we keep
> > retrying at the __vmalloc_node_range level, which should cover all
> > callers that can control the gfp mask. I was thinking of putting it into
> > __get_vm_area_node but that was slightly more hairy and we would be
> > losing the warning, which might turn out to be helpful in cases where the
> > failure is due to a lack of vmalloc space or a similar constraint. Btw. do we
> > want some throttling on a retry?
> > 
> I think adding a kind of schedule() will not make things worse and in corner
> cases could prevent a power drain by the CPU, which is important for mobile
> devices.

I suspect you mean schedule_timeout here? Or cond_resched? I went with the
latter for now; I do not have a good idea for how long to sleep here.
I am more than happy to change it to a sleep though.

> As for vmap space, it can be that a user specifies a short range that does
> not contain any free area. In that case we might never return to the caller.

This is to be expected. The caller cannot fail and if it were looping
around vmalloc it wouldn't return anyway.

> Maybe add a good comment, something like: think about what you are doing
> when you deal with __vmalloc_node_range() and __GFP_NOFAIL?

We have generic documentation for gfp flags and __GFP_NOFAIL is
documented as "The allocation could block indefinitely but will never
return with failure." We are discussing improvements for the generic
documentation in another thread [1] and we will likely extend it so I
suspect we do not have to repeat the drawbacks here again.

[1] http://lkml.kernel.org/r/163184741778.29351.16920832234899124642.stgit@noble.brown

Anyway, the gfp mask description and constraints for vmalloc are not
documented. I will add a new patch to fill that gap and send it as a
reply to this one.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH] mm/vmalloc: be more explicit about supported gfp flags.
  2021-10-18 11:47 ` [RFC 2/3] mm/vmalloc: add support for __GFP_NOFAIL Michal Hocko
                     ` (3 preceding siblings ...)
  2021-10-19 11:06   ` Uladzislau Rezki
@ 2021-10-20  8:25   ` Michal Hocko
  4 siblings, 0 replies; 39+ messages in thread
From: Michal Hocko @ 2021-10-20  8:25 UTC (permalink / raw)
  To: linux-mm
  Cc: Dave Chinner, Neil Brown, Andrew Morton, Christoph Hellwig,
	Uladzislau Rezki, linux-fsdevel, LKML, Ilya Dryomov, Jeff Layton,
	Michal Hocko

From: Michal Hocko <mhocko@suse.com>

The core of the vmalloc allocator, __vmalloc_area_node, doesn't say
anything about the gfp mask argument. Not all gfp flags are supported
though. Be more explicit about the constraints.
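
For example, a GFP_NOFS user that also needs __GFP_NOFAIL would then
simply do (an illustrative sketch, not part of this patch):

	ptr = __vmalloc(size, GFP_NOFS | __GFP_NOFAIL);

and rely on the documented semantic.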

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/vmalloc.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index f7098e616883..f57c09e98977 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2979,8 +2979,16 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
  * @caller:		  caller's return address
  *
  * Allocate enough pages to cover @size from the page level
- * allocator with @gfp_mask flags.  Map them into contiguous
- * kernel virtual space, using a pagetable protection of @prot.
+ * allocator with @gfp_mask flags. Please note that the full set of gfp
+ * flags is not supported. GFP_KERNEL would be a preferred allocation mode
+ * but GFP_NOFS and GFP_NOIO are supported as well. Zone modifiers are not
+ * supported. From the reclaim modifiers, __GFP_DIRECT_RECLAIM is required (aka
+ * GFP_NOWAIT is not supported) and only __GFP_NOFAIL is supported (aka
+ * __GFP_NORETRY and __GFP_RETRY_MAYFAIL are not supported).
+ * __GFP_NOWARN can be used to suppress error messages about failures.
+ * 
+ * Map them into contiguous kernel virtual space, using a pagetable
+ * protection of @prot.
  *
  * Return: the address of the area or %NULL on failure
  */
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [RFC 2/3] mm/vmalloc: add support for __GFP_NOFAIL
  2021-10-20  8:25         ` Michal Hocko
@ 2021-10-20  9:18           ` Michal Hocko
  2021-10-20 13:54           ` Uladzislau Rezki
  1 sibling, 0 replies; 39+ messages in thread
From: Michal Hocko @ 2021-10-20  9:18 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: linux-mm, Dave Chinner, Neil Brown, Andrew Morton,
	Christoph Hellwig, linux-fsdevel, LKML, Ilya Dryomov,
	Jeff Layton

On Wed 20-10-21 10:25:06, Michal Hocko wrote:
[...]
> > > The flag itself is not really necessary down there as long as we
> > > guarantee that the high level logic doesn't fail. In this case we keep
> > > retrying at the __vmalloc_node_range level, which should cover all
> > > callers that can control the gfp mask. I was thinking of putting it into
> > > __get_vm_area_node but that was slightly more hairy and we would be
> > > losing the warning, which might turn out to be helpful in cases where the
> > > failure is due to a lack of vmalloc space or a similar constraint. Btw. do we
> > > want some throttling on a retry?
> > > 
> > I think adding a kind of schedule() will not make things worse and in corner
> > cases could prevent a power drain by the CPU, which is important for mobile
> > devices.
> 
> I suspect you mean schedule_timeout here? Or cond_resched? I went with the
> latter for now; I do not have a good idea for how long to sleep here.
> I am more than happy to change it to a sleep though.

Forgot to paste the follow-up I have staged currently:
--- 
commit 66fea55e5543fa234692a70144cd63c7a1bca32f
Author: Michal Hocko <mhocko@suse.com>
Date:   Wed Oct 20 10:12:45 2021 +0200

    fold me "mm/vmalloc: add support for __GFP_NOFAIL"
    
    - add cond_resched

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 0fb5413d9239..f7098e616883 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2944,6 +2944,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 	do {
 		ret = vmap_pages_range(addr, addr + size, prot, area->pages,
 			page_shift);
+		cond_resched();
 	} while ((gfp_mask & __GFP_NOFAIL) && (ret < 0));
 
 	if ((gfp_mask & (__GFP_FS | __GFP_IO)) == __GFP_IO)
@@ -3034,8 +3035,10 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
 		warn_alloc(gfp_mask, NULL,
 			"vmalloc error: size %lu, vm_struct allocation failed",
 			real_size);
-		if (gfp_mask & __GFP_NOFAIL)
+		if (gfp_mask & __GFP_NOFAIL) {
+			cond_resched();
 			goto again;
+		}
 		goto fail;
 	}
 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [RFC 2/3] mm/vmalloc: add support for __GFP_NOFAIL
  2021-10-20  8:25         ` Michal Hocko
  2021-10-20  9:18           ` Michal Hocko
@ 2021-10-20 13:54           ` Uladzislau Rezki
  2021-10-20 14:06             ` Michal Hocko
  1 sibling, 1 reply; 39+ messages in thread
From: Uladzislau Rezki @ 2021-10-20 13:54 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Linux Memory Management List, Dave Chinner, Neil Brown,
	Andrew Morton, Christoph Hellwig, linux-fsdevel, LKML,
	Ilya Dryomov, Jeff Layton

> > >
> > I think adding a kind of schedule() will not make things worse and in corner
> > cases could prevent a power drain by the CPU, which is important for mobile
> > devices.
>
> I suspect you mean schedule_timeout here? Or cond_resched? I went with the
> latter for now; I do not have a good idea for how long to sleep here.
> I am more than happy to change it to a sleep though.
>
cond_resched() reschedules only if TIF_NEED_RESCHED is raised, which is not
good here, because in our case we know that we definitely would like to take
a breath. Therefore invoking schedule() is more suitable here. It will give
CPU time to another waiting process (if one exists) in any case, putting the
"current" one at the tail.
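
Roughly, the difference I mean - just a sketch, not the exact kernel
implementation:

	/*
	 * cond_resched(): yields the CPU only if the scheduler already
	 * wants it back, i.e. TIF_NEED_RESCHED is set; otherwise a no-op.
	 */
	cond_resched();

	/*
	 * schedule() from a runnable task: always goes through the
	 * scheduler and queues "current" behind other runnable tasks.
	 */
	schedule();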

As for adding a delay, I am not sure how long to delay; I would say I do
not see a good explanation of why we would delay for, say, 10 milliseconds
or so.

> > As for vmap space, it can be that a user specifies a short range that does
> > not contain any free area. In that case we might never return to the caller.
>
> This is to be expected. The caller cannot fail and if it were looping
> around vmalloc it wouldn't return anyway.
>
> > Maybe add a good comment, something like: think about what you are doing
> > when you deal with __vmalloc_node_range() and __GFP_NOFAIL?
>
> We have generic documentation for gfp flags and __GFP_NOFAIL is
> documented as "The allocation could block indefinitely but will never
> return with failure." We are discussing improvements for the generic
> documentation in another thread [1] and we will likely extend it so I
> suspect we do not have to repeat the drawbacks here again.
>
> [1] http://lkml.kernel.org/r/163184741778.29351.16920832234899124642.stgit@noble.brown
>
> Anyway, the gfp mask description and constraints for vmalloc are not
> documented. I will add a new patch to fill that gap and send it as a
> reply to this one.
>
This is really good. People should be prepared for a case when it never
returns to the caller :)

-- 
Uladzislau Rezki

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC 2/3] mm/vmalloc: add support for __GFP_NOFAIL
  2021-10-20 13:54           ` Uladzislau Rezki
@ 2021-10-20 14:06             ` Michal Hocko
  2021-10-20 14:29               ` Uladzislau Rezki
  0 siblings, 1 reply; 39+ messages in thread
From: Michal Hocko @ 2021-10-20 14:06 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: Linux Memory Management List, Dave Chinner, Neil Brown,
	Andrew Morton, Christoph Hellwig, linux-fsdevel, LKML,
	Ilya Dryomov, Jeff Layton

On Wed 20-10-21 15:54:23, Uladzislau Rezki wrote:
> > > >
> > > I think adding a kind of schedule() will not make things worse and in corner
> > > cases could prevent a power drain by the CPU, which is important for mobile
> > > devices.
> >
> > I suspect you mean schedule_timeout here? Or cond_resched? I went with the
> > latter for now; I do not have a good idea for how long to sleep here.
> > I am more than happy to change it to a sleep though.
> >
> cond_resched() reschedules only if TIF_NEED_RESCHED is raised, which is not
> good here, because in our case we know that we definitely would like to take
> a breath. Therefore invoking schedule() is more suitable here. It will give
> CPU time to another waiting process (if one exists) in any case, putting the
> "current" one at the tail.

Yes, but there is no explicit event to wait for currently.

> As for adding a delay, I am not sure how long to delay; I would say I do
> not see a good explanation of why we would delay for, say, 10 milliseconds
> or so.

As I've said, I am OK with either of the two. Do you or anybody have any
preference? Without any explicit event to wake up for, neither of the two
is more than just an optimistic retry.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC 2/3] mm/vmalloc: add support for __GFP_NOFAIL
  2021-10-20 14:06             ` Michal Hocko
@ 2021-10-20 14:29               ` Uladzislau Rezki
  2021-10-20 14:53                 ` Michal Hocko
  0 siblings, 1 reply; 39+ messages in thread
From: Uladzislau Rezki @ 2021-10-20 14:29 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Linux Memory Management List, Dave Chinner, Neil Brown,
	Andrew Morton, Christoph Hellwig, linux-fsdevel, LKML,
	Ilya Dryomov, Jeff Layton

On Wed, Oct 20, 2021 at 4:06 PM Michal Hocko <mhocko@suse.com> wrote:
>
> On Wed 20-10-21 15:54:23, Uladzislau Rezki wrote:
> > > > >
> > > > I think adding a kind of schedule() will not make things worse and in corner
> > > > cases could prevent a power drain by the CPU, which is important for mobile
> > > > devices.
> > >
> > > I suspect you mean schedule_timeout here? Or cond_resched? I went with the
> > > latter for now; I do not have a good idea for how long to sleep here.
> > > I am more than happy to change it to a sleep though.
> > >
> > cond_resched() reschedules only if TIF_NEED_RESCHED is raised, which is not
> > good here, because in our case we know that we definitely would like to take
> > a breath. Therefore invoking schedule() is more suitable here. It will give
> > CPU time to another waiting process (if one exists) in any case, putting the
> > "current" one at the tail.
>
> Yes, but there is no explicit event to wait for currently.
>
> > As for adding a delay, I am not sure how long to delay; I would say I do
> > not see a good explanation of why we would delay for, say, 10 milliseconds
> > or so.
>
> As I've said, I am OK with either of the two. Do you or anybody have any
> preference? Without any explicit event to wake up for, neither of the two
> is more than just an optimistic retry.
>
From a power perspective it is better to have a delay, so I tend to say
that a delay is better.

-- 
Uladzislau Rezki

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC 2/3] mm/vmalloc: add support for __GFP_NOFAIL
  2021-10-20 14:29               ` Uladzislau Rezki
@ 2021-10-20 14:53                 ` Michal Hocko
  2021-10-20 15:00                   ` Uladzislau Rezki
  0 siblings, 1 reply; 39+ messages in thread
From: Michal Hocko @ 2021-10-20 14:53 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: Linux Memory Management List, Dave Chinner, Neil Brown,
	Andrew Morton, Christoph Hellwig, linux-fsdevel, LKML,
	Ilya Dryomov, Jeff Layton

On Wed 20-10-21 16:29:14, Uladzislau Rezki wrote:
> On Wed, Oct 20, 2021 at 4:06 PM Michal Hocko <mhocko@suse.com> wrote:
[...]
> > As I've said, I am OK with either of the two. Do you or anybody have any
> > preference? Without any explicit event to wake up for, neither of the two
> > is more than just an optimistic retry.
> >
> From a power perspective it is better to have a delay, so I tend to say
> that a delay is better.

I am a terrible random number generator. Can you give me a number
please?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC 2/3] mm/vmalloc: add support for __GFP_NOFAIL
  2021-10-20 14:53                 ` Michal Hocko
@ 2021-10-20 15:00                   ` Uladzislau Rezki
  2021-10-20 19:24                     ` Uladzislau Rezki
  0 siblings, 1 reply; 39+ messages in thread
From: Uladzislau Rezki @ 2021-10-20 15:00 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Linux Memory Management List, Dave Chinner, Neil Brown,
	Andrew Morton, Christoph Hellwig, linux-fsdevel, LKML,
	Ilya Dryomov, Jeff Layton

>
> On Wed 20-10-21 16:29:14, Uladzislau Rezki wrote:
> > On Wed, Oct 20, 2021 at 4:06 PM Michal Hocko <mhocko@suse.com> wrote:
> [...]
> > > As I've said, I am OK with either of the two. Do you or anybody have any
> > > preference? Without any explicit event to wake up for, neither of the two
> > > is more than just an optimistic retry.
> > >
> > From a power perspective it is better to have a delay, so I tend to say
> > that a delay is better.
>
> I am a terrible random number generator. Can you give me a number
> please?
>
Well, we can start from one jiffy so it is one timer tick: schedule_timeout(1)
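
Note that a bare schedule_timeout(1) only really sleeps if the task state
has been set first; called with the task still in TASK_RUNNING it amounts
to little more than a yield. So in practice it would be something like:

	set_current_state(TASK_UNINTERRUPTIBLE);
	schedule_timeout(1);	/* sleep for one timer tick */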

-- 
Uladzislau Rezki

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC 2/3] mm/vmalloc: add support for __GFP_NOFAIL
  2021-10-20 15:00                   ` Uladzislau Rezki
@ 2021-10-20 19:24                     ` Uladzislau Rezki
  2021-10-21  8:56                       ` Michal Hocko
  2021-10-21 10:13                       ` NeilBrown
  0 siblings, 2 replies; 39+ messages in thread
From: Uladzislau Rezki @ 2021-10-20 19:24 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Michal Hocko, Linux Memory Management List, Dave Chinner,
	Neil Brown, Andrew Morton, Christoph Hellwig, linux-fsdevel,
	LKML, Ilya Dryomov, Jeff Layton

On Wed, Oct 20, 2021 at 05:00:28PM +0200, Uladzislau Rezki wrote:
> >
> > On Wed 20-10-21 16:29:14, Uladzislau Rezki wrote:
> > > On Wed, Oct 20, 2021 at 4:06 PM Michal Hocko <mhocko@suse.com> wrote:
> > [...]
> > > > As I've said, I am OK with either of the two. Do you or anybody have any
> > > > preference? Without any explicit event to wake up for, neither of the two
> > > > is more than just an optimistic retry.
> > > >
> > > From a power perspective it is better to have a delay, so I tend to say
> > > that a delay is better.
> >
> > I am a terrible random number generator. Can you give me a number
> > please?
> >
> Well, we can start from one jiffy so it is one timer tick: schedule_timeout(1)
> 
A small nit: it is better to replace it with a simple msleep() call: msleep(jiffies_to_msecs(1));

--
Vlad Rezki

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC 2/3] mm/vmalloc: add support for __GFP_NOFAIL
  2021-10-20 19:24                     ` Uladzislau Rezki
@ 2021-10-21  8:56                       ` Michal Hocko
  2021-10-21 10:13                       ` NeilBrown
  1 sibling, 0 replies; 39+ messages in thread
From: Michal Hocko @ 2021-10-21  8:56 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: Linux Memory Management List, Dave Chinner, Neil Brown,
	Andrew Morton, Christoph Hellwig, linux-fsdevel, LKML,
	Ilya Dryomov, Jeff Layton

On Wed 20-10-21 21:24:30, Uladzislau Rezki wrote:
> On Wed, Oct 20, 2021 at 05:00:28PM +0200, Uladzislau Rezki wrote:
> > >
> > > On Wed 20-10-21 16:29:14, Uladzislau Rezki wrote:
> > > > On Wed, Oct 20, 2021 at 4:06 PM Michal Hocko <mhocko@suse.com> wrote:
> > > [...]
> > > > > As I've said, I am OK with either of the two. Do you or anybody have any
> > > > > preference? Without any explicit event to wake up for, neither of the two
> > > > > is more than just an optimistic retry.
> > > > >
> > > > From a power perspective it is better to have a delay, so I tend to say
> > > > that a delay is better.
> > >
> > > I am a terrible random number generator. Can you give me a number
> > > please?
> > >
> > Well, we can start from one jiffy so it is one timer tick: schedule_timeout(1)

OK, I will go with 1 jiffy.

> A small nit: it is better to replace it with a simple msleep() call: msleep(jiffies_to_msecs(1));

I was planning to use schedule_timeout_uninterruptible. Why do you think
msleep is better?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC 2/3] mm/vmalloc: add support for __GFP_NOFAIL
  2021-10-20 19:24                     ` Uladzislau Rezki
  2021-10-21  8:56                       ` Michal Hocko
@ 2021-10-21 10:13                       ` NeilBrown
  2021-10-21 10:27                         ` Michal Hocko
  1 sibling, 1 reply; 39+ messages in thread
From: NeilBrown @ 2021-10-21 10:13 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: Michal Hocko, Michal Hocko, Linux Memory Management List,
	Dave Chinner, Andrew Morton, Christoph Hellwig, linux-fsdevel,
	LKML, Ilya Dryomov, Jeff Layton

On Thu, 21 Oct 2021, Uladzislau Rezki wrote:
> On Wed, Oct 20, 2021 at 05:00:28PM +0200, Uladzislau Rezki wrote:
> > >
> > > On Wed 20-10-21 16:29:14, Uladzislau Rezki wrote:
> > > > On Wed, Oct 20, 2021 at 4:06 PM Michal Hocko <mhocko@suse.com> wrote:
> > > [...]
> > > > > As I've said, I am OK with either of the two. Do you or anybody have any
> > > > > preference? Without any explicit event to wake up for, neither of the two
> > > > > is more than just an optimistic retry.
> > > > >
> > > > From a power perspective it is better to have a delay, so I tend to say
> > > > that a delay is better.
> > >
> > > I am a terrible random number generator. Can you give me a number
> > > please?
> > >
> > Well, we can start from one jiffy so it is one timer tick: schedule_timeout(1)
> > 
> A small nit: it is better to replace it with a simple msleep() call: msleep(jiffies_to_msecs(1));

I disagree.  I think schedule_timeout_uninterruptible(1) is the best
way to sleep for 1 tick.

msleep() contains
  timeout = msecs_to_jiffies(msecs) + 1;
and both jiffies_to_msecs and msecs_to_jiffies might round up too.
So you will sleep for at least twice as long as you asked for, possibly
more.
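
To make that concrete, with HZ=100 (one tick is 10ms):

	jiffies_to_msecs(1)      == 10
	msecs_to_jiffies(10) + 1 == 2 jiffies

so asking for one tick this way sleeps for at least two ticks.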

NeilBrown


> 
> --
> Vlad Rezki
> 
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC 2/3] mm/vmalloc: add support for __GFP_NOFAIL
  2021-10-21 10:13                       ` NeilBrown
@ 2021-10-21 10:27                         ` Michal Hocko
  2021-10-21 10:40                           ` Uladzislau Rezki
  0 siblings, 1 reply; 39+ messages in thread
From: Michal Hocko @ 2021-10-21 10:27 UTC (permalink / raw)
  To: NeilBrown
  Cc: Uladzislau Rezki, Linux Memory Management List, Dave Chinner,
	Andrew Morton, Christoph Hellwig, linux-fsdevel, LKML,
	Ilya Dryomov, Jeff Layton

On Thu 21-10-21 21:13:35, Neil Brown wrote:
> On Thu, 21 Oct 2021, Uladzislau Rezki wrote:
> > On Wed, Oct 20, 2021 at 05:00:28PM +0200, Uladzislau Rezki wrote:
> > > >
> > > > On Wed 20-10-21 16:29:14, Uladzislau Rezki wrote:
> > > > > On Wed, Oct 20, 2021 at 4:06 PM Michal Hocko <mhocko@suse.com> wrote:
> > > > [...]
> > > > > > As I've said, I am OK with either of the two. Do you or anybody have any
> > > > > > preference? Without any explicit event to wake up for, neither of the two
> > > > > > is more than just an optimistic retry.
> > > > > >
> > > > > From a power perspective it is better to have a delay, so I tend to say
> > > > > that a delay is better.
> > > >
> > > > I am a terrible random number generator. Can you give me a number
> > > > please?
> > > >
> > > Well, we can start from one jiffy so it is one timer tick: schedule_timeout(1)
> > > 
> > A small nit: it is better to replace it with a simple msleep() call: msleep(jiffies_to_msecs(1));
> 
> I disagree.  I think schedule_timeout_uninterruptible(1) is the best
> way to sleep for 1 tick.
> 
> msleep() contains
>   timeout = msecs_to_jiffies(msecs) + 1;
> and both jiffies_to_msecs and msecs_to_jiffies might round up too.
> So you will sleep for at least twice as long as you asked for, possibly
> more.

That was my thinking as well. Not to mention jiffies_to_msecs just to do
msecs_to_jiffies right after, which seems like a pointless waste of CPU
cycles. But maybe I was missing some other reason why msleep would be
superior.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC 2/3] mm/vmalloc: add support for __GFP_NOFAIL
  2021-10-21 10:27                         ` Michal Hocko
@ 2021-10-21 10:40                           ` Uladzislau Rezki
  2021-10-21 22:49                             ` NeilBrown
  0 siblings, 1 reply; 39+ messages in thread
From: Uladzislau Rezki @ 2021-10-21 10:40 UTC (permalink / raw)
  To: Michal Hocko, NeilBrown
  Cc: NeilBrown, Uladzislau Rezki, Linux Memory Management List,
	Dave Chinner, Andrew Morton, Christoph Hellwig, linux-fsdevel,
	LKML, Ilya Dryomov, Jeff Layton

> On Thu 21-10-21 21:13:35, Neil Brown wrote:
> > On Thu, 21 Oct 2021, Uladzislau Rezki wrote:
> > > On Wed, Oct 20, 2021 at 05:00:28PM +0200, Uladzislau Rezki wrote:
> > > > >
> > > > > On Wed 20-10-21 16:29:14, Uladzislau Rezki wrote:
> > > > > > On Wed, Oct 20, 2021 at 4:06 PM Michal Hocko <mhocko@suse.com> wrote:
> > > > > [...]
> > > > > > > As I've said, I am OK with either of the two. Do you or anybody have any
> > > > > > > preference? Without any explicit event to wake up for, neither of the two
> > > > > > > is more than just an optimistic retry.
> > > > > > >
> > > > > > From a power perspective it is better to have a delay, so I tend to say
> > > > > > that a delay is better.
> > > > >
> > > > > I am a terrible random number generator. Can you give me a number
> > > > > please?
> > > > >
> > > > Well, we can start from one jiffy so it is one timer tick: schedule_timeout(1)
> > > > 
> > > A small nit: it is better to replace it with a simple msleep() call: msleep(jiffies_to_msecs(1));
> > 
> > I disagree.  I think schedule_timeout_uninterruptible(1) is the best
> > way to sleep for 1 tick.
> > 
> > msleep() contains
> >   timeout = msecs_to_jiffies(msecs) + 1;
> > and both jiffies_to_msecs and msecs_to_jiffies might round up too.
> > So you will sleep for at least twice as long as you asked for, possibly
> > more.
> 
> That was my thinking as well. Not to mention jiffies_to_msecs just to do
> msecs_to_jiffies right after, which seems like a pointless waste of CPU
> cycles. But maybe I was missing some other reason why msleep would be
> superior.
>

To me, msleep is just simpler from a semantic point of view, i.e. it is
as straightforward as it can be, whereas an interruptible/uninterruptible
sleep can be more confusing for people.

When it comes to rounding and the possibility of sleeping for more than 1
tick, it really does not matter here; we do not need to guarantee an exact
sleeping time.

Therefore I proposed switching to msleep().

--
Vlad Rezki

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC 2/3] mm/vmalloc: add support for __GFP_NOFAIL
  2021-10-21 10:40                           ` Uladzislau Rezki
@ 2021-10-21 22:49                             ` NeilBrown
  2021-10-22  8:18                               ` Michal Hocko
  2021-10-25  9:48                               ` Uladzislau Rezki
  0 siblings, 2 replies; 39+ messages in thread
From: NeilBrown @ 2021-10-21 22:49 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: Michal Hocko, Uladzislau Rezki, Linux Memory Management List,
	Dave Chinner, Andrew Morton, Christoph Hellwig, linux-fsdevel,
	LKML, Ilya Dryomov, Jeff Layton

On Thu, 21 Oct 2021, Uladzislau Rezki wrote:
> > On Thu 21-10-21 21:13:35, Neil Brown wrote:
> > > On Thu, 21 Oct 2021, Uladzislau Rezki wrote:
> > > > On Wed, Oct 20, 2021 at 05:00:28PM +0200, Uladzislau Rezki wrote:
> > > > > >
> > > > > > On Wed 20-10-21 16:29:14, Uladzislau Rezki wrote:
> > > > > > > On Wed, Oct 20, 2021 at 4:06 PM Michal Hocko <mhocko@suse.com> wrote:
> > > > > > [...]
> > > > > > > > As I've said, I am OK with either of the two. Do you or anybody have any
> > > > > > > > preference? Without any explicit event to wake up for, neither of the two
> > > > > > > > is more than just an optimistic retry.
> > > > > > > >
> > > > > > > From a power perspective it is better to have a delay, so I tend to say
> > > > > > > that a delay is better.
> > > > > >
> > > > > > I am a terrible random number generator. Can you give me a number
> > > > > > please?
> > > > > >
> > > > > Well, we can start from one jiffy so it is one timer tick: schedule_timeout(1)
> > > > > 
> > > > A small nit: it is better to replace it with a simple msleep() call: msleep(jiffies_to_msecs(1));
> > > 
> > > I disagree.  I think schedule_timeout_uninterruptible(1) is the best
> > > way to sleep for 1 tick.
> > > 
> > > msleep() contains
> > >   timeout = msecs_to_jiffies(msecs) + 1;
> > > and both jiffies_to_msecs and msecs_to_jiffies might round up too.
> > > So you will sleep for at least twice as long as you asked for, possibly
> > > more.
> > 
> > That was my thinking as well. Not to mention jiffies_to_msecs just to do
> > msecs_to_jiffies right after, which seems like a pointless waste of CPU
> > cycles. But maybe I was missing some other reason why msleep would be
> > superior.
> >
> 
> To me, msleep is just simpler from a semantic point of view, i.e. it is
> as straightforward as it can be, whereas an interruptible/uninterruptible
> sleep can be more confusing for people.

I agree that msleep() is simpler.  I think adding the
jiffies_to_msec() substantially reduces that simplicity.

> 
> When it comes to rounding and the possibility of sleeping for more than 1
> tick, it really does not matter here; we do not need to guarantee an exact
> sleeping time.
> 
> Therefore I proposed switching to msleep().

If, as you say, the precision doesn't matter that much, then maybe
   msleep(0)
which would sleep to the start of the next jiffy.  Does that look a bit
weird?  If so, then msleep(1) would be ok.

However, now that I've thought about it some more, I'd much prefer we
introduce something like
    memalloc_retry_wait();

and use that everywhere a memory allocation is retried.
I'm not convinced that we need to wait at all - at least, not when
__GFP_DIRECT_RECLAIM is used, as in that case alloc_page will either
  - succeed
  - make some progress at reclaiming, or
  - sleep

However I'm not 100% certain, and the behaviour might change in the
future.  So having one place (the definition of memalloc_retry_wait())
where we can change the sleeping behaviour if the alloc_page behaviour
changes would be ideal.  Maybe memalloc_retry_wait() could take a
gfpflags arg.
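
Just to illustrate the shape I have in mind - a sketch only, where nothing
beyond the memalloc_retry_wait() name and the gfp arg is a concrete
proposal:

	/* one central place to tune how allocation retries wait */
	static inline void memalloc_retry_wait(gfp_t gfp_flags)
	{
		if (gfp_flags & __GFP_DIRECT_RECLAIM) {
			/* alloc_page has already reclaimed or slept */
			cond_resched();
		} else {
			schedule_timeout_uninterruptible(1);
		}
	}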

Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC 2/3] mm/vmalloc: add support for __GFP_NOFAIL
  2021-10-21 22:49                             ` NeilBrown
@ 2021-10-22  8:18                               ` Michal Hocko
  2021-10-25  9:48                               ` Uladzislau Rezki
  1 sibling, 0 replies; 39+ messages in thread
From: Michal Hocko @ 2021-10-22  8:18 UTC (permalink / raw)
  To: NeilBrown
  Cc: Uladzislau Rezki, Linux Memory Management List, Dave Chinner,
	Andrew Morton, Christoph Hellwig, linux-fsdevel, LKML,
	Ilya Dryomov, Jeff Layton

On Fri 22-10-21 09:49:08, Neil Brown wrote:
[...]
> However, now that I've thought about it some more, I'd much prefer we
> introduce something like
>     memalloc_retry_wait();
> 
> and use that everywhere a memory allocation is retried.
> I'm not convinced that we need to wait at all - at least, not when
> __GFP_DIRECT_RECLAIM is used, as in that case alloc_page will either
>   - succeed
>   - make some progress at reclaiming, or
>   - sleep

There are two waits that we have to do explicitly. The vmap_pages_range
one is due to implicit GFP_KERNEL allocations for page tables. Those
would likely be a good fit for something you suggest above. Then we have
the __get_vm_area_node retry loop, which can trigger either due to a
vmalloc space reservation failure or an implicit GFP_KERNEL allocation by
kasan. The first one is not really related to memory availability so it
doesn't sound like a good fit.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC 2/3] mm/vmalloc: add support for __GFP_NOFAIL
  2021-10-21 22:49                             ` NeilBrown
  2021-10-22  8:18                               ` Michal Hocko
@ 2021-10-25  9:48                               ` Uladzislau Rezki
  2021-10-25 11:20                                 ` Michal Hocko
  2021-10-25 23:50                                 ` NeilBrown
  1 sibling, 2 replies; 39+ messages in thread
From: Uladzislau Rezki @ 2021-10-25  9:48 UTC (permalink / raw)
  To: NeilBrown, Michal Hocko
  Cc: Uladzislau Rezki, Michal Hocko, Linux Memory Management List,
	Dave Chinner, Andrew Morton, Christoph Hellwig, linux-fsdevel,
	LKML, Ilya Dryomov, Jeff Layton

On Fri, Oct 22, 2021 at 09:49:08AM +1100, NeilBrown wrote:
> On Thu, 21 Oct 2021, Uladzislau Rezki wrote:
> > > On Thu 21-10-21 21:13:35, Neil Brown wrote:
> > > > On Thu, 21 Oct 2021, Uladzislau Rezki wrote:
> > > > > On Wed, Oct 20, 2021 at 05:00:28PM +0200, Uladzislau Rezki wrote:
> > > > > > >
> > > > > > > On Wed 20-10-21 16:29:14, Uladzislau Rezki wrote:
> > > > > > > > On Wed, Oct 20, 2021 at 4:06 PM Michal Hocko <mhocko@suse.com> wrote:
> > > > > > > [...]
> > > > > > > > > As I've said, I am OK with either of the two. Do you or anybody have any
> > > > > > > > > preference? Without any explicit event to wake up for, neither of the two
> > > > > > > > > is more than just an optimistic retry.
> > > > > > > > >
> > > > > > > > From a power perspective it is better to have a delay, so I tend to say
> > > > > > > > that a delay is better.
> > > > > > >
> > > > > > > I am a terrible random number generator. Can you give me a number
> > > > > > > please?
> > > > > > >
> > > > > > Well, we can start from one jiffy so it is one timer tick: schedule_timeout(1)
> > > > > > 
> > > > > A small nit: it is better to replace it with a simple msleep() call: msleep(jiffies_to_msecs(1));
> > > > 
> > > > I disagree.  I think schedule_timeout_uninterruptible(1) is the best
> > > > way to sleep for 1 tick.
> > > > 
> > > > msleep() contains
> > > >   timeout = msecs_to_jiffies(msecs) + 1;
> > > > and both jiffies_to_msecs and msecs_to_jiffies might round up too.
> > > > So you will sleep for at least twice as long as you asked for, possibly
> > > > more.
> > > 
> > > That was my thinking as well. Not to mention jiffies_to_msecs just to do
> > > msecs_to_jiffies right after, which seems like a pointless waste of CPU
> > > cycles. But maybe I was missing some other reason why msleep would be
> > > superior.
> > >
> > 
> > To me, msleep is just simpler from a semantic point of view, i.e. it is
> > as straightforward as it can be, whereas an interruptible/uninterruptible
> > sleep can be more confusing for people.
> 
> I agree that msleep() is simpler.  I think adding the
> jiffies_to_msec() substantially reduces that simplicity.
> 
> > 
> > When it comes to rounding and the possibility of sleeping for more than 1
> > tick, it really does not matter here; we do not need to guarantee an exact
> > sleeping time.
> > 
> > Therefore I proposed switching to msleep().
> 
> If, as you say, the precision doesn't matter that much, then maybe
>    msleep(0)
> which would sleep to the start of the next jiffy.  Does that look a bit
> weird?  If so, then msleep(1) would be ok.
> 
Agree, msleep(1) looks much better than converting 1 jiffy to
milliseconds. The result should be the same.

> However, now that I've thought about it some more, I'd much prefer we
> introduce something like
>     memalloc_retry_wait();
> 
> and use that everywhere a memory allocation is retried.
> I'm not convinced that we need to wait at all - at least, not when
> __GFP_DIRECT_RECLAIM is used, as in that case alloc_page will either
>   - succeed
>   - make some progress at reclaiming, or
>   - sleep
> 
> However I'm not 100% certain, and the behaviour might change in the
> future.  So having one place (the definition of memalloc_retry_wait())
> where we can change the sleeping behaviour if the alloc_page behaviour
> changes would be ideal.  Maybe memalloc_retry_wait() could take a
> gfpflags arg.
> 
Sleeping is required for __get_vm_area_node() because in case of a lack
of vmap space it would otherwise end up in a tight loop without sleeping,
which is really bad.

Thanks!

--
Vlad Rezki

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC 2/3] mm/vmalloc: add support for __GFP_NOFAIL
  2021-10-25  9:48                               ` Uladzislau Rezki
@ 2021-10-25 11:20                                 ` Michal Hocko
  2021-10-25 14:30                                   ` Uladzislau Rezki
  2021-10-25 23:50                                 ` NeilBrown
  1 sibling, 1 reply; 39+ messages in thread
From: Michal Hocko @ 2021-10-25 11:20 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: NeilBrown, Linux Memory Management List, Dave Chinner,
	Andrew Morton, Christoph Hellwig, linux-fsdevel, LKML,
	Ilya Dryomov, Jeff Layton

On Mon 25-10-21 11:48:41, Uladzislau Rezki wrote:
> On Fri, Oct 22, 2021 at 09:49:08AM +1100, NeilBrown wrote:
[...]
> > If, as you say, the precision doesn't matter that much, then maybe
> >    msleep(0)
> > which would sleep to the start of the next jiffy.  Does that look a bit
> > weird?  If so, then msleep(1) would be ok.
> > 
> Agree, msleep(1) looks much better than converting 1 jiffy to
> milliseconds. The result should be the same.

I would really prefer if this were not the main point of argument here.
Unless you feel strongly about msleep I would go with schedule_timeout
here, because it is a more widely used interface in the mm code and
also because I feel that relying on the rounding behavior is just
subtle. Here is what I have staged now.

Are there any other concerns you see with this or other patches in the
series?

Thanks!
--- 
commit c1a7e40e6b56fed5b9e716de7055b77ea29d89d0
Author: Michal Hocko <mhocko@suse.com>
Date:   Wed Oct 20 10:12:45 2021 +0200

    fold me "mm/vmalloc: add support for __GFP_NOFAIL"
    
    Add a short sleep before retrying. 1 jiffy is a completely random
    timeout. Ideally the retry would wait for an explicit event - e.g.
    a change to the vmalloc space if the failure was caused by
    space fragmentation or depletion. But there are multiple different
    reasons to retry and this could become much more complex. Keep the retry
    simple for now and just sleep to prevent hogging CPUs.

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 0fb5413d9239..a866db0c9c31 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2944,6 +2944,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 	do {
 		ret = vmap_pages_range(addr, addr + size, prot, area->pages,
 			page_shift);
+		schedule_timeout_uninterruptible(1);
 	} while ((gfp_mask & __GFP_NOFAIL) && (ret < 0));
 
 	if ((gfp_mask & (__GFP_FS | __GFP_IO)) == __GFP_IO)
@@ -3034,8 +3035,10 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
 		warn_alloc(gfp_mask, NULL,
 			"vmalloc error: size %lu, vm_struct allocation failed",
 			real_size);
-		if (gfp_mask & __GFP_NOFAIL)
+		if (gfp_mask & __GFP_NOFAIL) {
+			schedule_timeout_uninterruptible(1);
 			goto again;
+		}
 		goto fail;
 	}
 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [RFC 2/3] mm/vmalloc: add support for __GFP_NOFAIL
  2021-10-25 11:20                                 ` Michal Hocko
@ 2021-10-25 14:30                                   ` Uladzislau Rezki
  2021-10-25 14:56                                     ` Michal Hocko
  0 siblings, 1 reply; 39+ messages in thread
From: Uladzislau Rezki @ 2021-10-25 14:30 UTC (permalink / raw)
  To: Michal Hocko
  Cc: NeilBrown, Linux Memory Management List, Dave Chinner,
	Andrew Morton, Christoph Hellwig, linux-fsdevel, LKML,
	Ilya Dryomov, Jeff Layton

>
> I would really prefer if this was not the main point of arguing here.
> Unless you feel strongly about msleep I would go with schedule_timeout
> here because this is a more widely used interface in the mm code and
> also because I feel like that relying on the rounding behavior is just
> subtle. Here is what I have staged now.
>
I have a preference but do not have a strong opinion here. You can go
either way you want.

>
> Are there any other concerns you see with this or other patches in the
> series?
>
It would be better if you could send a new vX version because it is hard
to combine every "folded" patch into one solid commit. One comment below:

> ---
> commit c1a7e40e6b56fed5b9e716de7055b77ea29d89d0
> Author: Michal Hocko <mhocko@suse.com>
> Date:   Wed Oct 20 10:12:45 2021 +0200
>
>     fold me "mm/vmalloc: add support for __GFP_NOFAIL"
>
>     Add a short sleep before retrying. 1 jiffy is a completely random
>     timeout. Ideally the retry would wait for an explicit event - e.g.
>     a change to the vmalloc space if the failure was caused by
>     space fragmentation or depletion. But there are multiple different
>     reasons to retry and this could become much more complex. Keep the retry
>     simple for now and just sleep to prevent hogging CPUs.
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 0fb5413d9239..a866db0c9c31 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2944,6 +2944,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
>         do {
>                 ret = vmap_pages_range(addr, addr + size, prot, area->pages,
>                         page_shift);
> +               schedule_timeout_uninterruptible(1);
>
We do not want to call schedule_timeout_uninterruptible(1) every time,
only when an error is detected.
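
I mean something like:

	do {
		ret = vmap_pages_range(addr, addr + size, prot, area->pages,
			page_shift);
		if (ret < 0)
			schedule_timeout_uninterruptible(1);
	} while ((gfp_mask & __GFP_NOFAIL) && (ret < 0));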

-- 
Uladzislau Rezki

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC 2/3] mm/vmalloc: add support for __GFP_NOFAIL
  2021-10-25 14:30                                   ` Uladzislau Rezki
@ 2021-10-25 14:56                                     ` Michal Hocko
  0 siblings, 0 replies; 39+ messages in thread
From: Michal Hocko @ 2021-10-25 14:56 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: NeilBrown, Linux Memory Management List, Dave Chinner,
	Andrew Morton, Christoph Hellwig, linux-fsdevel, LKML,
	Ilya Dryomov, Jeff Layton

On Mon 25-10-21 16:30:23, Uladzislau Rezki wrote:
> >
> > I would really prefer if this was not the main point of arguing here.
> > Unless you feel strongly about msleep I would go with schedule_timeout
> > here because this is a more widely used interface in the mm code and
> > also because I feel like that relying on the rounding behavior is just
> > subtle. Here is what I have staged now.
> >
> I have a preference but do not have a strong opinion here. You can go
> either way you want.
> 
> >
> > Are there any other concerns you see with this or other patches in the
> > series?
> >
> It would be better if you could send a new vX version because it is hard
> to combine every "folded" patch

Yeah, I plan to soon. I just wanted to sort out most things before
spamming with a new version.

> into one solid commit. One comment below:
> 
> > ---
> > commit c1a7e40e6b56fed5b9e716de7055b77ea29d89d0
> > Author: Michal Hocko <mhocko@suse.com>
> > Date:   Wed Oct 20 10:12:45 2021 +0200
> >
> >     fold me "mm/vmalloc: add support for __GFP_NOFAIL"
> >
> >     Add a short sleep before retrying. 1 jiffy is a completely random
> >     timeout. Ideally the retry would wait for an explicit event - e.g.
> >     a change to the vmalloc space change if the failure was caused by
> >     the space fragmentation or depletion. But there are multiple different
> >     reasons to retry and this could become much more complex. Keep the retry
> >     simple for now and just sleep to prevent from hogging CPUs.
> >
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index 0fb5413d9239..a866db0c9c31 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -2944,6 +2944,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
> >         do {
> >                 ret = vmap_pages_range(addr, addr + size, prot, area->pages,
> >                         page_shift);
> > +               schedule_timeout_uninterruptible(1);
> >
> We do not want to call schedule_timeout_uninterruptible(1) every time,
> only when an error is detected.

Because I was obviously in a brainless mode when doing that one. Thanks
for pointing this out!
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC 2/3] mm/vmalloc: add support for __GFP_NOFAIL
  2021-10-25  9:48                               ` Uladzislau Rezki
  2021-10-25 11:20                                 ` Michal Hocko
@ 2021-10-25 23:50                                 ` NeilBrown
  2021-10-26  7:16                                   ` Michal Hocko
  1 sibling, 1 reply; 39+ messages in thread
From: NeilBrown @ 2021-10-25 23:50 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: Michal Hocko, Uladzislau Rezki, Michal Hocko,
	Linux Memory Management List, Dave Chinner, Andrew Morton,
	Christoph Hellwig, linux-fsdevel, LKML, Ilya Dryomov,
	Jeff Layton

On Mon, 25 Oct 2021, Uladzislau Rezki wrote:
> On Fri, Oct 22, 2021 at 09:49:08AM +1100, NeilBrown wrote:
> > However I'm not 100% certain, and the behaviour might change in the
> > future.  So having one place (the definition of memalloc_retry_wait())
> > where we can change the sleeping behaviour if the alloc_page behaviour
> > changes would be ideal.  Maybe memalloc_retry_wait() could take a
> > gfpflags arg.
> > 
> Sleeping is required for __get_vm_area_node() because in case of a lack
> of vmap space it would otherwise end up in a tight loop without sleeping,
> which is really bad.
> 
So vmalloc() has two failure modes: alloc_page() failure and
__alloc_vmap_area() failure.  The caller cannot tell which...

Actually, they can.  If we pass __GFP_NOFAIL to vmalloc(), and it fails,
then it must have been __alloc_vmap_area() which failed.
What do we do in that case?
Can we add a waitq which gets a wakeup when __purge_vmap_area_lazy()
finishes?
If we use the spinlock from that waitq in place of free_vmap_area_lock,
then the wakeup would be nearly free if no-one was waiting, and
worthwhile if someone was waiting.
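
Something along these lines - purely a sketch, and the new names are
invented:

	static DECLARE_WAIT_QUEUE_HEAD(vmap_purge_wq);
	static atomic_t vmap_purge_seq;

	/* in __purge_vmap_area_lazy(), once areas have been returned: */
	atomic_inc(&vmap_purge_seq);
	wake_up(&vmap_purge_wq);

	/* in the __GFP_NOFAIL retry path, instead of a blind timeout: */
	int seq = atomic_read(&vmap_purge_seq);
	if (the_allocation_failed_again)
		wait_event(vmap_purge_wq,
			   atomic_read(&vmap_purge_seq) != seq);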

Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC 2/3] mm/vmalloc: add support for __GFP_NOFAIL
  2021-10-25 23:50                                 ` NeilBrown
@ 2021-10-26  7:16                                   ` Michal Hocko
  2021-10-26 10:24                                     ` NeilBrown
  0 siblings, 1 reply; 39+ messages in thread
From: Michal Hocko @ 2021-10-26  7:16 UTC (permalink / raw)
  To: NeilBrown
  Cc: Uladzislau Rezki, Linux Memory Management List, Dave Chinner,
	Andrew Morton, Christoph Hellwig, linux-fsdevel, LKML,
	Ilya Dryomov, Jeff Layton

On Tue 26-10-21 10:50:21, Neil Brown wrote:
> On Mon, 25 Oct 2021, Uladzislau Rezki wrote:
> > On Fri, Oct 22, 2021 at 09:49:08AM +1100, NeilBrown wrote:
> > > However I'm not 100% certain, and the behaviour might change in the
> > > future.  So having one place (the definition of memalloc_retry_wait())
> > > where we can change the sleeping behaviour if the alloc_page behaviour
> > > changes would be ideal.  Maybe memalloc_retry_wait() could take a
> > > gfpflags arg.
> > > 
> > Sleeping is required for __get_vm_area_node() because in case of a lack
> > of vmap space it would otherwise end up in a tight loop without sleeping,
> > which is really bad.
> > 
> So vmalloc() has two failure modes.  alloc_page() failure and
> __alloc_vmap_area() failure.  The caller cannot tell which...
> 
> Actually, they can.  If we pass __GFP_NOFAIL to vmalloc(), and it fails,
> then it must have been __alloc_vmap_area() which failed.
> What do we do in that case?
> Can we add a waitq which gets a wakeup when __purge_vmap_area_lazy()
> finishes?
> If we use the spinlock from that waitq in place of free_vmap_area_lock,
> then the wakeup would be nearly free if no-one was waiting, and
> worthwhile if someone was waiting.

Is this really required to be part of the initial support?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC 2/3] mm/vmalloc: add support for __GFP_NOFAIL
  2021-10-26  7:16                                   ` Michal Hocko
@ 2021-10-26 10:24                                     ` NeilBrown
  2021-10-26 14:25                                       ` Uladzislau Rezki
  0 siblings, 1 reply; 39+ messages in thread
From: NeilBrown @ 2021-10-26 10:24 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Uladzislau Rezki, Linux Memory Management List, Dave Chinner,
	Andrew Morton, Christoph Hellwig, linux-fsdevel, LKML,
	Ilya Dryomov, Jeff Layton

On Tue, 26 Oct 2021, Michal Hocko wrote:
> On Tue 26-10-21 10:50:21, Neil Brown wrote:
> > On Mon, 25 Oct 2021, Uladzislau Rezki wrote:
> > > On Fri, Oct 22, 2021 at 09:49:08AM +1100, NeilBrown wrote:
> > > > However I'm not 100% certain, and the behaviour might change in the
> > > > future.  So having one place (the definition of memalloc_retry_wait())
> > > > where we can change the sleeping behaviour if the alloc_page behaviour
> > > > changes would be ideal.  Maybe memalloc_retry_wait() could take a
> > > > gfpflags arg.
> > > > 
> > > Sleeping is required for __get_vm_area_node() because in case of a lack
> > > of vmap space it would otherwise end up in a tight loop without sleeping,
> > > which is really bad.
> > > 
> > So vmalloc() has two failure modes.  alloc_page() failure and
> > __alloc_vmap_area() failure.  The caller cannot tell which...
> > 
> > Actually, they can.  If we pass __GFP_NOFAIL to vmalloc(), and it fails,
> > then it must have been __alloc_vmap_area() which failed.
> > What do we do in that case?
> > Can we add a waitq which gets a wakeup when __purge_vmap_area_lazy()
> > finishes?
> > If we use the spinlock from that waitq in place of free_vmap_area_lock,
> > then the wakeup would be nearly free if no-one was waiting, and
> > worthwhile if someone was waiting.
> 
> Is this really required to be part of the initial support?

No... I was just thinking out loud.

NeilBrown

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC 2/3] mm/vmalloc: add support for __GFP_NOFAIL
  2021-10-26 10:24                                     ` NeilBrown
@ 2021-10-26 14:25                                       ` Uladzislau Rezki
  2021-10-26 14:43                                         ` Michal Hocko
  0 siblings, 1 reply; 39+ messages in thread
From: Uladzislau Rezki @ 2021-10-26 14:25 UTC (permalink / raw)
  To: NeilBrown, Michal Hocko
  Cc: Linux Memory Management List, Dave Chinner, Andrew Morton,
	Christoph Hellwig, linux-fsdevel, LKML, Ilya Dryomov,
	Jeff Layton

On Tue, Oct 26, 2021 at 12:24 PM NeilBrown <neilb@suse.de> wrote:
>
> On Tue, 26 Oct 2021, Michal Hocko wrote:
> > On Tue 26-10-21 10:50:21, Neil Brown wrote:
> > > On Mon, 25 Oct 2021, Uladzislau Rezki wrote:
> > > > On Fri, Oct 22, 2021 at 09:49:08AM +1100, NeilBrown wrote:
> > > > > However I'm not 100% certain, and the behaviour might change in the
> > > > > future.  So having one place (the definition of memalloc_retry_wait())
> > > > > where we can change the sleeping behaviour if the alloc_page behaviour
> > > > > changes would be ideal.  Maybe memalloc_retry_wait() could take a
> > > > > gfpflags arg.
> > > > >
> > > > Sleeping is required for __get_vm_area_node() because in case of a lack
> > > > of vmap space it would otherwise end up in a tight loop without sleeping,
> > > > which is really bad.
> > > >
> > > So vmalloc() has two failure modes.  alloc_page() failure and
> > > __alloc_vmap_area() failure.  The caller cannot tell which...
> > >
> > > Actually, they can.  If we pass __GFP_NOFAIL to vmalloc(), and it fails,
> > > then it must have been __alloc_vmap_area() which failed.
> > > What do we do in that case?
> > > Can we add a waitq which gets a wakeup when __purge_vmap_area_lazy()
> > > finishes?
> > > If we use the spinlock from that waitq in place of free_vmap_area_lock,
> > > then the wakeup would be nearly free if no-one was waiting, and
> > > worthwhile if someone was waiting.
> >
> > Is this really required to be part of the initial support?
>
> No.... I was just thinking out-loud.
>
alloc_vmap_area() has a retry path: basically, if it fails the code will
try to "purge" areas and repeat one more time. So we do not need to purge
anywhere else outside of it.
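
i.e. roughly, paraphrasing the existing code:

	retry:
		addr = __alloc_vmap_area(size, align, vstart, vend);
		if (unlikely(addr == vend)) {
			if (!purged) {
				purge_vmap_area_lazy();
				purged = true;
				goto retry;
			}
			/* only now give up and report the failure */
		}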

-- 
Uladzislau Rezki

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC 2/3] mm/vmalloc: add support for __GFP_NOFAIL
  2021-10-26 14:25                                       ` Uladzislau Rezki
@ 2021-10-26 14:43                                         ` Michal Hocko
  2021-10-26 15:40                                           ` Uladzislau Rezki
  0 siblings, 1 reply; 39+ messages in thread
From: Michal Hocko @ 2021-10-26 14:43 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: NeilBrown, Linux Memory Management List, Dave Chinner,
	Andrew Morton, Christoph Hellwig, linux-fsdevel, LKML,
	Ilya Dryomov, Jeff Layton

On Tue 26-10-21 16:25:07, Uladzislau Rezki wrote:
> On Tue, Oct 26, 2021 at 12:24 PM NeilBrown <neilb@suse.de> wrote:
> >
> > On Tue, 26 Oct 2021, Michal Hocko wrote:
> > > On Tue 26-10-21 10:50:21, Neil Brown wrote:
> > > > On Mon, 25 Oct 2021, Uladzislau Rezki wrote:
> > > > > On Fri, Oct 22, 2021 at 09:49:08AM +1100, NeilBrown wrote:
> > > > > > However I'm not 100% certain, and the behaviour might change in the
> > > > > > future.  So having one place (the definition of memalloc_retry_wait())
> > > > > > where we can change the sleeping behaviour if the alloc_page behaviour
> > > > > > changes would be ideal.  Maybe memalloc_retry_wait() could take a
> > > > > > gfpflags arg.
> > > > > >
> > > > > Sleeping is required for __get_vm_area_node() because in case of a lack
> > > > > of vmap space it would otherwise end up in a tight loop without sleeping,
> > > > > which is really bad.
> > > > >
> > > > So vmalloc() has two failure modes.  alloc_page() failure and
> > > > __alloc_vmap_area() failure.  The caller cannot tell which...
> > > >
> > > > Actually, they can.  If we pass __GFP_NOFAIL to vmalloc(), and it fails,
> > > > then it must have been __alloc_vmap_area() which failed.
> > > > What do we do in that case?
> > > > Can we add a waitq which gets a wakeup when __purge_vmap_area_lazy()
> > > > finishes?
> > > > If we use the spinlock from that waitq in place of free_vmap_area_lock,
> > > > then the wakeup would be nearly free if no-one was waiting, and
> > > > worthwhile if someone was waiting.
> > >
> > > Is this really required to be part of the initial support?
> >
> > No.... I was just thinking out-loud.
> >
> alloc_vmap_area() has a retry path: basically, if it fails, the code
> will try to "purge" lazily freed areas and repeat the allocation one
> more time. So we do not need to purge anywhere else.

I think Neil was not concerned about the need for purging but rather
about a waiting event the retry loop could hook into, so that the sleep
wouldn't have to be a random timeout but something that is actually
actionable - like somebody freeing an area.
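
To make that concrete, a minimal sketch of such a hook (hypothetical
only - the waitqueue, the sequence counter and their placement are
invented for illustration, nothing like this exists in mm/vmalloc.c):

	static DECLARE_WAIT_QUEUE_HEAD(free_vmap_area_waitq);
	static atomic_long_t vmap_purge_seq;

	/* At the end of __purge_vmap_area_lazy(), once areas have been
	 * returned to the free tree: */
	atomic_long_inc(&vmap_purge_seq);
	wake_up_all(&free_vmap_area_waitq);

	/* In the __GFP_NOFAIL retry loop, instead of a blind timeout: */
	long seq = atomic_long_read(&vmap_purge_seq);

	/* ... allocation attempt fails ... */
	wait_event(free_vmap_area_waitq,
		   atomic_long_read(&vmap_purge_seq) != seq);
	/* ... then retry the allocation ... */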
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC 2/3] mm/vmalloc: add support for __GFP_NOFAIL
  2021-10-26 14:43                                         ` Michal Hocko
@ 2021-10-26 15:40                                           ` Uladzislau Rezki
  0 siblings, 0 replies; 39+ messages in thread
From: Uladzislau Rezki @ 2021-10-26 15:40 UTC (permalink / raw)
  To: Michal Hocko
  Cc: NeilBrown, Linux Memory Management List, Dave Chinner,
	Andrew Morton, Christoph Hellwig, linux-fsdevel, LKML,
	Ilya Dryomov, Jeff Layton

> > > > Is this really required to be part of the initial support?
> > >
> > > No.... I was just thinking out-loud.
> > >
> > alloc_vmap_area() has a retry path: basically, if it fails, the code
> > will try to "purge" lazily freed areas and repeat the allocation one
> > more time. So we do not need to purge anywhere else.
>
> I think Neil was not concerned about the need for purging but rather
> about a waiting event the retry loop could hook into, so that the sleep
> wouldn't have to be a random timeout but something that is actually
> actionable - like somebody freeing an area.
>
I see the point. But sometimes it is not as straightforward as it could
be. If we lack vmap space within a specific range, it might not be a
reclaim problem (nobody frees into that range and there are no
outstanding lazily freed areas), and thus we can do nothing.
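
For example, an allocation constrained to a fixed window such as the
module area (an illustration only, not taken from any caller in this
thread):

	struct vm_struct *area;

	area = __get_vm_area_caller(size, VM_ALLOC, MODULES_VADDR,
				    MODULES_END,
				    __builtin_return_address(0));
	/*
	 * If [MODULES_VADDR, MODULES_END) is simply full and none of it
	 * is pending a lazy free, purging reclaims nothing and there is
	 * no wakeup to wait for - retrying cannot succeed.
	 */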

-- 
Uladzislau Rezki

^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2021-10-26 15:43 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-18 11:47 [RFC 0/3] extend vmalloc support for constrained allocations Michal Hocko
2021-10-18 11:47 ` [RFC 1/3] mm/vmalloc: alloc GFP_NO{FS,IO} for vmalloc Michal Hocko
2021-10-19  0:44   ` NeilBrown
2021-10-19  6:59     ` Michal Hocko
2021-10-18 11:47 ` [RFC 2/3] mm/vmalloc: add support for __GFP_NOFAIL Michal Hocko
2021-10-18 16:24   ` kernel test robot
2021-10-18 16:24     ` kernel test robot
2021-10-18 16:48   ` Michal Hocko
2021-10-18 18:15   ` kernel test robot
2021-10-18 18:15     ` kernel test robot
2021-10-19 11:06   ` Uladzislau Rezki
2021-10-19 11:52     ` Michal Hocko
2021-10-19 19:46       ` Uladzislau Rezki
2021-10-20  8:25         ` Michal Hocko
2021-10-20  9:18           ` Michal Hocko
2021-10-20 13:54           ` Uladzislau Rezki
2021-10-20 14:06             ` Michal Hocko
2021-10-20 14:29               ` Uladzislau Rezki
2021-10-20 14:53                 ` Michal Hocko
2021-10-20 15:00                   ` Uladzislau Rezki
2021-10-20 19:24                     ` Uladzislau Rezki
2021-10-21  8:56                       ` Michal Hocko
2021-10-21 10:13                       ` NeilBrown
2021-10-21 10:27                         ` Michal Hocko
2021-10-21 10:40                           ` Uladzislau Rezki
2021-10-21 22:49                             ` NeilBrown
2021-10-22  8:18                               ` Michal Hocko
2021-10-25  9:48                               ` Uladzislau Rezki
2021-10-25 11:20                                 ` Michal Hocko
2021-10-25 14:30                                   ` Uladzislau Rezki
2021-10-25 14:56                                     ` Michal Hocko
2021-10-25 23:50                                 ` NeilBrown
2021-10-26  7:16                                   ` Michal Hocko
2021-10-26 10:24                                     ` NeilBrown
2021-10-26 14:25                                       ` Uladzislau Rezki
2021-10-26 14:43                                         ` Michal Hocko
2021-10-26 15:40                                           ` Uladzislau Rezki
2021-10-20  8:25   ` [PATCH] mm/vmalloc: be more explicit about supported gfp flags Michal Hocko
2021-10-18 11:47 ` [RFC 3/3] mm: allow !GFP_KERNEL allocations for kvmalloc Michal Hocko
