Linux-Fsdevel Archive on lore.kernel.org
From: Al Viro <viro@ZenIV.linux.org.uk>
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	Dave Chinner <david@fromorbit.com>,
	Ming Lei <ming.lei@canonical.com>
Subject: __vmalloc() vs. GFP_NOIO/GFP_NOFS
Date: Sun, 3 Jan 2016 07:12:47 +0000
Message-ID: <20160103071246.GK9938@ZenIV.linux.org.uk>

	While trying to write documentation on allocator choice, I've run
into something odd:
        /*
         * __vmalloc() will allocate data pages and auxillary structures (e.g.
         * pagetables) with GFP_KERNEL, yet we may be under GFP_NOFS context
         * here. Hence we need to tell memory reclaim that we are in such a
         * context via PF_MEMALLOC_NOIO to prevent memory reclaim re-entering
         * the filesystem here and potentially deadlocking.
         */
in XFS kmem_zalloc_large().  The comment is correct - __vmalloc() (actually,
map_vm_area() called from __vmalloc_area_node()) ignores gfp_flags; prior
to that point it does take care to pass __GFP_IO/__GFP_FS to the page allocator,
but once the data pages are allocated and we get around to inserting them
into page tables those flags are ignored.

Page table allocation doesn't take a gfp argument at all.  Propagating
the flags down there could be done, but it's not attractive.
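
To make the problem concrete, here is a minimal userspace sketch of the
call chain described above.  It is a model, not kernel code: every name
in it (model_vmalloc(), model_alloc_pagetable(), the M_GFP_* values) is
made up for illustration, and GFP_NOIO is simplified to "neither IO nor
FS bit set".  It only shows that the caller's mask reaches the data page
allocation but has no path to the page table allocation.

```c
#include <assert.h>

/* Userspace model; every name below is hypothetical, not the kernel's. */
typedef unsigned int gfp_t;
#define M_GFP_IO     0x1u
#define M_GFP_FS     0x2u
#define M_GFP_KERNEL (M_GFP_IO | M_GFP_FS)
#define M_GFP_NOIO   0x0u	/* simplified: neither IO nor FS allowed */

/* Records the mask used by the most recent page table allocation. */
static gfp_t last_pagetable_gfp;

/* Models page table allocation: no gfp parameter, so GFP_KERNEL is implied. */
static void model_alloc_pagetable(void)
{
	last_pagetable_gfp = M_GFP_KERNEL;
}

/* Models data page allocation, which does honour the caller's mask. */
static gfp_t model_alloc_data_page(gfp_t gfp_mask)
{
	return gfp_mask;
}

/* Models __vmalloc(): returns the mask that was used for the data pages. */
static gfp_t model_vmalloc(gfp_t gfp_mask)
{
	gfp_t data_gfp = model_alloc_data_page(gfp_mask);

	model_alloc_pagetable();	/* gfp_mask has nowhere to go here */
	return data_gfp;
}
```

Calling model_vmalloc(M_GFP_NOIO) leaves last_pagetable_gfp with the IO
and FS bits set, which is the mismatch the XFS comment warns about.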

Another approach is memalloc_noio_save(), actually used by XFS and some other
__vmalloc() callers that might be getting GFP_NOIO or GFP_NOFS.  That
works, but not all such callers are using that mechanism.  For example,
drbd bm_realloc_pages() has GFP_NOIO __vmalloc() with no memalloc_noio_...
in sight.  Either that GFP_NOIO is not needed there (quite possible) or
there's a deadlock in that code.  The same goes for ipoib.c ipoib_cm_tx_init();
again, either that GFP_NOIO is not needed, or it can deadlock.
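
For readers unfamiliar with the mechanism, the following is a rough
userspace sketch of how a scoped per-task flag of the memalloc_noio_save()
kind can work.  It is an assumption-laden model, not the kernel
implementation: the model_* names and MODEL_* constants are invented
here, a plain static stands in for current->flags, and the allocator is
assumed to strip both the IO and FS bits while the flag is set.

```c
#include <assert.h>

/* Userspace model; names mirror the kernel's idea but are hypothetical. */
typedef unsigned int gfp_t;
#define MODEL_GFP_IO           0x1u
#define MODEL_GFP_FS           0x2u
#define MODEL_GFP_KERNEL       (MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_PF_MEMALLOC_NOIO 0x1u

/* Stand-in for current->flags (a single task is modelled). */
static unsigned int model_task_flags;

/* Enter a no-IO scope; returns the previous state of the flag. */
static unsigned int model_noio_save(void)
{
	unsigned int old = model_task_flags & MODEL_PF_MEMALLOC_NOIO;

	model_task_flags |= MODEL_PF_MEMALLOC_NOIO;
	return old;
}

/* Leave the scope by putting the flag back to its saved state. */
static void model_noio_restore(unsigned int old)
{
	model_task_flags = (model_task_flags & ~MODEL_PF_MEMALLOC_NOIO) | old;
}

/* The mask the allocator would actually honour in the current scope. */
static gfp_t model_effective_gfp(gfp_t requested)
{
	if (model_task_flags & MODEL_PF_MEMALLOC_NOIO)
		requested &= ~(MODEL_GFP_IO | MODEL_GFP_FS);
	return requested;
}

/* Page table allocation: no gfp argument, always asks for GFP_KERNEL. */
static gfp_t model_pagetable_alloc(void)
{
	return model_effective_gfp(MODEL_GFP_KERNEL);
}

/* Nested scopes: the inner restore must not end the outer scope. */
static void model_demo(void)
{
	unsigned int outer = model_noio_save();
	unsigned int inner = model_noio_save();

	model_noio_restore(inner);
	assert(model_pagetable_alloc() == 0);	/* still inside outer scope */
	model_noio_restore(outer);
	assert(model_pagetable_alloc() == MODEL_GFP_KERNEL);
}
```

The point of saving and restoring the old value, rather than simply
clearing the flag, is that such scopes can nest; a callee that takes no
gfp argument still ends up restricted for as long as any enclosing scope
is active.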

Those, AFAICS, are all such callers with GFP_NOIO; however, there's a shitload
of GFP_NOFS ones.  XFS uses memalloc_noio_save(), but a _lot_ of other
callers do not.  For example, all call chains leading to ceph_kvmalloc()
pass GFP_NOFS and none of them is under memalloc_noio_save().  The same
goes for GFS2 __vmalloc() callers, etc.  Again, quite a few of those probably
do not need GFP_NOFS at all, but those that do would appear to have
hard-to-trigger deadlocks.

Why do we do that in callers, though?  I.e. why not do something like this:

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 8e3c9c5..412c5d6 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1622,6 +1622,16 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 			cond_resched();
 	}
 
+	if (unlikely(!(gfp_mask & __GFP_IO))) {
+		unsigned flags = memalloc_noio_save();
+		if (map_vm_area(area, prot, pages)) {
+			memalloc_noio_restore(flags);
+			goto fail;
+		}
+		memalloc_noio_restore(flags);
+		return area->addr;
+	}
+
 	if (map_vm_area(area, prot, pages))
 		goto fail;
 	return area->addr;

