From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org,
	Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
	Michal Hocko <mhocko@suse.com>, Wei Wang <wei.w.wang@intel.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Subject: [PATCH 4.14 43/45] virtio_balloon: fix deadlock on OOM
Date: Thu, 11 Oct 2018 17:40:10 +0200
Message-ID: <20181011152510.807351989@linuxfoundation.org>
In-Reply-To: <20181011152508.885515042@linuxfoundation.org>

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Michael S. Tsirkin <mst@redhat.com>

commit c7cdff0e864713a089d7cb3a2b1136ba9a54881a upstream.

fill_balloon() allocates memory while holding balloon_lock, which can
cause a deadlock when leak_balloon() is called from
virtballoon_oom_notify() and tries to take the same lock.

To fix this, split page allocation from enqueueing and do the allocations
outside the lock.
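
As a rough userspace illustration of that split (not the driver code), the
sketch below stages allocations on a private list with no lock held and
takes the lock only to move the staged entries into the shared list. The
names toy_page, toy_balloon, toy_alloc_batch(), toy_fill() and toy_drain()
are hypothetical stand-ins, and a pthread mutex is assumed in place of
vb->balloon_lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct page: a singly linked node. */
struct toy_page {
	struct toy_page *next;
};

/* Hypothetical stand-in for the balloon: a list guarded by a mutex. */
struct toy_balloon {
	pthread_mutex_t lock;
	struct toy_page *pages;
	size_t num_pages;
};

/* Phase 1: allocate up to num pages onto a private list, no lock held. */
static struct toy_page *toy_alloc_batch(size_t num)
{
	struct toy_page *staged = NULL;
	size_t i;

	for (i = 0; i < num; i++) {
		struct toy_page *page = malloc(sizeof(*page));

		if (!page)
			break;	/* a partial batch is acceptable */
		page->next = staged;
		staged = page;
	}
	return staged;
}

/* Phase 2: hold the lock only while moving staged pages into the balloon. */
static size_t toy_fill(struct toy_balloon *b, size_t num)
{
	struct toy_page *staged = toy_alloc_batch(num);
	size_t moved = 0;

	pthread_mutex_lock(&b->lock);
	while (staged) {
		struct toy_page *page = staged;

		staged = page->next;
		page->next = b->pages;
		b->pages = page;
		b->num_pages++;
		moved++;
	}
	pthread_mutex_unlock(&b->lock);
	return moved;
}

/* Release every page in the balloon; returns how many were freed. */
static size_t toy_drain(struct toy_balloon *b)
{
	size_t freed = 0;

	pthread_mutex_lock(&b->lock);
	while (b->pages) {
		struct toy_page *page = b->pages;

		b->pages = page->next;
		b->num_pages--;
		free(page);
		freed++;
	}
	pthread_mutex_unlock(&b->lock);
	return freed;
}
```

The point of the two phases is that nothing that can block on memory
pressure runs inside the critical section; the locked region only relinks
entries that already exist.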

Here's a detailed analysis of the deadlock by Tetsuo Handa:

In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to
serialize against fill_balloon(). But in fill_balloon(),
alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is
called with the vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE]
implies __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, this allocation
attempt might, even though __GFP_NORETRY is specified, indirectly depend
on somebody else's __GFP_DIRECT_RECLAIM memory allocation. Such an
indirect __GFP_DIRECT_RECLAIM memory allocation might call leak_balloon()
via virtballoon_oom_notify(), which runs as a
blocking_notifier_call_chain() callback from out_of_memory(), once it has
reached __alloc_pages_may_oom() and taken the oom_lock mutex. Since the
vb->balloon_lock mutex is already held by fill_balloon(), leak_balloon()
blocks on it and the result is an OOM lockup.

  Thread1                                       Thread2
    fill_balloon()
      takes a balloon_lock
      balloon_page_enqueue()
        alloc_page(GFP_HIGHUSER_MOVABLE)
          direct reclaim (__GFP_FS context)       takes a fs lock
            waits for that fs lock                  alloc_page(GFP_NOFS)
                                                      __alloc_pages_may_oom()
                                                        takes the oom_lock
                                                        out_of_memory()
                                                          blocking_notifier_call_chain()
                                                            leak_balloon()
                                                              tries to take that balloon_lock and deadlocks
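
The ordering problem in the diagram can be modelled in a few lines of
userspace C. This is a deliberately simplified, single-threaded sketch: the
real dependency crosses two threads and a filesystem lock, whereas here a
boolean flag stands in for vb->balloon_lock and the simulated allocator
calls the simulated OOM notifier directly. All names (toy_oom_notify(),
toy_alloc_page(), toy_fill_old(), toy_fill_new()) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy lock model: records whether the balloon lock is currently held. */
static bool balloon_lock_held;

/* How often the simulated OOM notifier found the lock already taken. */
static unsigned would_block;

/* Models leak_balloon() reached through the OOM notifier chain. */
static void toy_oom_notify(void)
{
	if (balloon_lock_held)
		would_block++;	/* a real mutex_lock() would sleep here */
}

/* Models an allocation that enters reclaim and ends up in the notifier. */
static void *toy_alloc_page(void)
{
	static char page[4096];

	toy_oom_notify();
	return page;
}

/* Old ordering: every allocation happens with the lock held. */
static unsigned toy_fill_old(size_t num)
{
	size_t i;

	would_block = 0;
	balloon_lock_held = true;
	for (i = 0; i < num; i++)
		(void)toy_alloc_page();
	balloon_lock_held = false;
	return would_block;
}

/* New ordering: allocate first, then take the lock only to enqueue. */
static unsigned toy_fill_new(size_t num)
{
	size_t i;

	would_block = 0;
	for (i = 0; i < num; i++)
		(void)toy_alloc_page();
	balloon_lock_held = true;
	/* enqueue step: no allocation happens in here */
	balloon_lock_held = false;
	return would_block;
}
```

In this model toy_fill_old() reports one would-be block per allocation,
while toy_fill_new() reports none, because the notifier can only ever run
while the lock is free.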

Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Wei Wang <wei.w.wang@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 drivers/virtio/virtio_balloon.c    |   24 +++++++++++++++++++-----
 include/linux/balloon_compaction.h |   35 ++++++++++++++++++++++++++++++++++-
 mm/balloon_compaction.c            |   28 +++++++++++++++++++++-------
 3 files changed, 74 insertions(+), 13 deletions(-)

--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -143,16 +143,17 @@ static void set_page_pfns(struct virtio_
 
 static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
 {
-	struct balloon_dev_info *vb_dev_info = &vb->vb_dev_info;
 	unsigned num_allocated_pages;
+	unsigned num_pfns;
+	struct page *page;
+	LIST_HEAD(pages);
 
 	/* We can only do one array worth at a time. */
 	num = min(num, ARRAY_SIZE(vb->pfns));
 
-	mutex_lock(&vb->balloon_lock);
-	for (vb->num_pfns = 0; vb->num_pfns < num;
-	     vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
-		struct page *page = balloon_page_enqueue(vb_dev_info);
+	for (num_pfns = 0; num_pfns < num;
+	     num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
+		struct page *page = balloon_page_alloc();
 
 		if (!page) {
 			dev_info_ratelimited(&vb->vdev->dev,
@@ -162,6 +163,19 @@ static unsigned fill_balloon(struct virt
 			msleep(200);
 			break;
 		}
+
+		balloon_page_push(&pages, page);
+	}
+
+	mutex_lock(&vb->balloon_lock);
+
+	vb->num_pfns = 0;
+
+	while ((page = balloon_page_pop(&pages))) {
+		balloon_page_enqueue(&vb->vb_dev_info, page);
+
+		vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE;
+
 		set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
 		vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE;
 		if (!virtio_has_feature(vb->vdev,
--- a/include/linux/balloon_compaction.h
+++ b/include/linux/balloon_compaction.h
@@ -50,6 +50,7 @@
 #include <linux/gfp.h>
 #include <linux/err.h>
 #include <linux/fs.h>
+#include <linux/list.h>
 
 /*
  * Balloon device information descriptor.
@@ -67,7 +68,9 @@ struct balloon_dev_info {
 	struct inode *inode;
 };
 
-extern struct page *balloon_page_enqueue(struct balloon_dev_info *b_dev_info);
+extern struct page *balloon_page_alloc(void);
+extern void balloon_page_enqueue(struct balloon_dev_info *b_dev_info,
+				 struct page *page);
 extern struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info);
 
 static inline void balloon_devinfo_init(struct balloon_dev_info *balloon)
@@ -193,4 +196,34 @@ static inline gfp_t balloon_mapping_gfp_
 }
 
 #endif /* CONFIG_BALLOON_COMPACTION */
+
+/*
+ * balloon_page_push - insert a page into a page list.
+ * @pages : pointer to list
+ * @page : page to be added
+ *
+ * Caller must ensure the page is private and protect the list.
+ */
+static inline void balloon_page_push(struct list_head *pages, struct page *page)
+{
+	list_add(&page->lru, pages);
+}
+
+/*
+ * balloon_page_pop - remove a page from a page list.
+ * @pages : pointer to list
+ * Returns the removed page, or NULL if the list is empty.
+ *
+ * Caller must ensure the page is private and protect the list.
+ */
+static inline struct page *balloon_page_pop(struct list_head *pages)
+{
+	struct page *page = list_first_entry_or_null(pages, struct page, lru);
+
+	if (!page)
+		return NULL;
+
+	list_del(&page->lru);
+	return page;
+}
 #endif /* _LINUX_BALLOON_COMPACTION_H */
--- a/mm/balloon_compaction.c
+++ b/mm/balloon_compaction.c
@@ -11,22 +11,37 @@
 #include <linux/balloon_compaction.h>
 
 /*
+ * balloon_page_alloc - allocates a new page for insertion into the balloon
+ *			  page list.
+ *
+ * Driver must call it to properly allocate a new enlisted balloon page.
+ * Driver must call balloon_page_enqueue before definitively removing it from
+ * the guest system.  This function returns the page address for the recently
+ * allocated page or NULL in the case we fail to allocate a new page this turn.
+ */
+struct page *balloon_page_alloc(void)
+{
+	struct page *page = alloc_page(balloon_mapping_gfp_mask() |
+				       __GFP_NOMEMALLOC | __GFP_NORETRY);
+	return page;
+}
+EXPORT_SYMBOL_GPL(balloon_page_alloc);
+
+/*
  * balloon_page_enqueue - allocates a new page and inserts it into the balloon
  *			  page list.
  * @b_dev_info: balloon device descriptor where we will insert a new page to
+ * @page: new page to enqueue - allocated using balloon_page_alloc.
  *
- * Driver must call it to properly allocate a new enlisted balloon page
+ * Driver must call it to properly enqueue a new allocated balloon page
  * before definitively removing it from the guest system.
  * This function returns the page address for the recently enqueued page or
  * NULL in the case we fail to allocate a new page this turn.
  */
-struct page *balloon_page_enqueue(struct balloon_dev_info *b_dev_info)
+void balloon_page_enqueue(struct balloon_dev_info *b_dev_info,
+			  struct page *page)
 {
 	unsigned long flags;
-	struct page *page = alloc_page(balloon_mapping_gfp_mask() |
-				       __GFP_NOMEMALLOC | __GFP_NORETRY);
-	if (!page)
-		return NULL;
 
 	/*
 	 * Block others from accessing the 'page' when we get around to
@@ -39,7 +54,6 @@ struct page *balloon_page_enqueue(struct
 	__count_vm_event(BALLOON_INFLATE);
 	spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
 	unlock_page(page);
-	return page;
 }
 EXPORT_SYMBOL_GPL(balloon_page_enqueue);
 