linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org,
	Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>,
	Brian King <brking@linux.vnet.ibm.com>,
	Douglas Miller <dougmill@linux.vnet.ibm.com>,
	linux-block@vger.kernel.org, linux-scsi@vger.kernel.org,
	Jens Axboe <axboe@fb.com>, Sumit Semwal <sumit.semwal@linaro.org>
Subject: [PATCH 4.9 26/31] blk-mq: Avoid memory reclaim when remapping queues
Date: Sun, 16 Apr 2017 10:04:17 +0200	[thread overview]
Message-ID: <20170416080223.164989771@linuxfoundation.org> (raw)
In-Reply-To: <20170416080221.808058771@linuxfoundation.org>

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>

commit 36e1f3d107867b25c616c2fd294f5a1c9d4e5d09 upstream.

While stressing memory and IO at the same time we changed SMT settings,
we were able to consistently trigger deadlocks in the mm system, which
froze the entire machine.

I think that under memory stress conditions, the large allocations
performed by blk_mq_init_rq_map may trigger a reclaim, which stalls
waiting on the block layer remmaping completion, thus deadlocking the
system.  The trace below was collected after the machine stalled,
waiting for the hotplug event completion.

The simplest fix for this is to make allocations in this path
non-reclaimable, with GFP_NOIO.  With this patch, We couldn't hit the
issue anymore.

This should apply on top of Jens's for-next branch cleanly.

Changes since v1:
  - Use GFP_NOIO instead of GFP_NOWAIT.

 Call Trace:
[c000000f0160aaf0] [c000000f0160ab50] 0xc000000f0160ab50 (unreliable)
[c000000f0160acc0] [c000000000016624] __switch_to+0x2e4/0x430
[c000000f0160ad20] [c000000000b1a880] __schedule+0x310/0x9b0
[c000000f0160ae00] [c000000000b1af68] schedule+0x48/0xc0
[c000000f0160ae30] [c000000000b1b4b0] schedule_preempt_disabled+0x20/0x30
[c000000f0160ae50] [c000000000b1d4fc] __mutex_lock_slowpath+0xec/0x1f0
[c000000f0160aed0] [c000000000b1d678] mutex_lock+0x78/0xa0
[c000000f0160af00] [d000000019413cac] xfs_reclaim_inodes_ag+0x33c/0x380 [xfs]
[c000000f0160b0b0] [d000000019415164] xfs_reclaim_inodes_nr+0x54/0x70 [xfs]
[c000000f0160b0f0] [d0000000194297f8] xfs_fs_free_cached_objects+0x38/0x60 [xfs]
[c000000f0160b120] [c0000000003172c8] super_cache_scan+0x1f8/0x210
[c000000f0160b190] [c00000000026301c] shrink_slab.part.13+0x21c/0x4c0
[c000000f0160b2d0] [c000000000268088] shrink_zone+0x2d8/0x3c0
[c000000f0160b380] [c00000000026834c] do_try_to_free_pages+0x1dc/0x520
[c000000f0160b450] [c00000000026876c] try_to_free_pages+0xdc/0x250
[c000000f0160b4e0] [c000000000251978] __alloc_pages_nodemask+0x868/0x10d0
[c000000f0160b6f0] [c000000000567030] blk_mq_init_rq_map+0x160/0x380
[c000000f0160b7a0] [c00000000056758c] blk_mq_map_swqueue+0x33c/0x360
[c000000f0160b820] [c000000000567904] blk_mq_queue_reinit+0x64/0xb0
[c000000f0160b850] [c00000000056a16c] blk_mq_queue_reinit_notify+0x19c/0x250
[c000000f0160b8a0] [c0000000000f5d38] notifier_call_chain+0x98/0x100
[c000000f0160b8f0] [c0000000000c5fb0] __cpu_notify+0x70/0xe0
[c000000f0160b930] [c0000000000c63c4] notify_prepare+0x44/0xb0
[c000000f0160b9b0] [c0000000000c52f4] cpuhp_invoke_callback+0x84/0x250
[c000000f0160ba10] [c0000000000c570c] cpuhp_up_callbacks+0x5c/0x120
[c000000f0160ba60] [c0000000000c7cb8] _cpu_up+0xf8/0x1d0
[c000000f0160bac0] [c0000000000c7eb0] do_cpu_up+0x120/0x150
[c000000f0160bb40] [c0000000006fe024] cpu_subsys_online+0x64/0xe0
[c000000f0160bb90] [c0000000006f5124] device_online+0xb4/0x120
[c000000f0160bbd0] [c0000000006f5244] online_store+0xb4/0xc0
[c000000f0160bc20] [c0000000006f0a68] dev_attr_store+0x68/0xa0
[c000000f0160bc60] [c0000000003ccc30] sysfs_kf_write+0x80/0xb0
[c000000f0160bca0] [c0000000003cbabc] kernfs_fop_write+0x17c/0x250
[c000000f0160bcf0] [c00000000030fe6c] __vfs_write+0x6c/0x1e0
[c000000f0160bd90] [c000000000311490] vfs_write+0xd0/0x270
[c000000f0160bde0] [c0000000003131fc] SyS_write+0x6c/0x110
[c000000f0160be30] [c000000000009204] system_call+0x38/0xec

Signed-off-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
Cc: Brian King <brking@linux.vnet.ibm.com>
Cc: Douglas Miller <dougmill@linux.vnet.ibm.com>
Cc: linux-block@vger.kernel.org
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 block/blk-mq.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1474,7 +1474,7 @@ static struct blk_mq_tags *blk_mq_init_r
 	INIT_LIST_HEAD(&tags->page_list);
 
 	tags->rqs = kzalloc_node(set->queue_depth * sizeof(struct request *),
-				 GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY,
+				 GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY,
 				 set->numa_node);
 	if (!tags->rqs) {
 		blk_mq_free_tags(tags);
@@ -1500,7 +1500,7 @@ static struct blk_mq_tags *blk_mq_init_r
 
 		do {
 			page = alloc_pages_node(set->numa_node,
-				GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY | __GFP_ZERO,
+				GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY | __GFP_ZERO,
 				this_order);
 			if (page)
 				break;
@@ -1521,7 +1521,7 @@ static struct blk_mq_tags *blk_mq_init_r
 		 * Allow kmemleak to scan these pages as they contain pointers
 		 * to additional allocations like via ops->init_request().
 		 */
-		kmemleak_alloc(p, order_to_size(this_order), 1, GFP_KERNEL);
+		kmemleak_alloc(p, order_to_size(this_order), 1, GFP_NOIO);
 		entries_per_page = order_to_size(this_order) / rq_size;
 		to_do = min(entries_per_page, set->queue_depth - i);
 		left -= to_do * rq_size;

  parent reply	other threads:[~2017-04-16  8:13 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-04-16  8:03 [PATCH 4.9 00/31] 4.9.23-stable review Greg Kroah-Hartman
2017-04-16  8:03 ` [PATCH 4.9 05/31] drm/i915: Drop support for I915_EXEC_CONSTANTS_* execbuf parameters Greg Kroah-Hartman
2017-04-16  8:03 ` [PATCH 4.9 06/31] drm/i915: Stop using RP_DOWN_EI on Baytrail Greg Kroah-Hartman
2017-04-16  8:03 ` [PATCH 4.9 07/31] drm/i915: Avoid rcu_barrier() from reclaim paths (shrinker) Greg Kroah-Hartman
2017-04-16  8:03 ` [PATCH 4.9 08/31] orangefs: fix memory leak of string new on exit path Greg Kroah-Hartman
2017-04-16  8:04 ` [PATCH 4.9 09/31] orangefs: Dan Carpenter influenced cleanups Greg Kroah-Hartman
2017-04-16  8:04 ` [PATCH 4.9 10/31] orangefs: fix buffer size mis-match between kernel space and user space Greg Kroah-Hartman
2017-04-16  8:04 ` [PATCH 4.9 12/31] rt2x00usb: fix anchor initialization Greg Kroah-Hartman
2017-04-16  8:04 ` [PATCH 4.9 13/31] rt2x00usb: do not anchor rx and tx urbs Greg Kroah-Hartman
2017-04-16  8:04 ` [PATCH 4.9 14/31] rt2x00: Fix incorrect usage of CONFIG_RT2X00_LIB_USB Greg Kroah-Hartman
2017-04-16  8:04 ` [PATCH 4.9 16/31] MIPS: Introduce irq_stack Greg Kroah-Hartman
2017-04-16  8:04 ` [PATCH 4.9 17/31] MIPS: Stack unwinding while on IRQ stack Greg Kroah-Hartman
2017-04-16  8:04 ` [PATCH 4.9 18/31] MIPS: Only change $28 to thread_info if coming from user mode Greg Kroah-Hartman
2017-04-16  8:04 ` [PATCH 4.9 19/31] MIPS: Switch to the irq_stack in interrupts Greg Kroah-Hartman
2017-04-16  8:04 ` [PATCH 4.9 20/31] MIPS: Select HAVE_IRQ_EXIT_ON_IRQ_STACK Greg Kroah-Hartman
2017-04-16  8:04 ` [PATCH 4.9 21/31] MIPS: IRQ Stack: Fix erroneous jal to plat_irq_dispatch Greg Kroah-Hartman
2017-04-16  8:04 ` [PATCH 4.9 24/31] [PATCH] Revert "drm/i915/execlists: Reset RING registers upon resume" Greg Kroah-Hartman
2017-04-16  8:04 ` [PATCH 4.9 25/31] net/packet: fix overflow in check for priv area size Greg Kroah-Hartman
2017-04-16  8:04 ` Greg Kroah-Hartman [this message]
2017-04-16  8:04 ` [PATCH 4.9 27/31] usb: hub: Wait for connection to be reestablished after port reset Greg Kroah-Hartman
2017-04-16  8:04 ` [PATCH 4.9 28/31] net/mlx4_en: Fix bad WQE issue Greg Kroah-Hartman
2017-04-16  8:04 ` [PATCH 4.9 29/31] net/mlx4_core: Fix racy CQ (Completion Queue) free Greg Kroah-Hartman
2017-04-16  8:04 ` [PATCH 4.9 30/31] net/mlx4_core: Fix when to save some qp context flags for dynamic VST to VGT transitions Greg Kroah-Hartman
2017-04-16 21:29 ` [PATCH 4.9 00/31] 4.9.23-stable review Guenter Roeck
2017-04-17 18:21 ` Shuah Khan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170416080223.164989771@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=axboe@fb.com \
    --cc=brking@linux.vnet.ibm.com \
    --cc=dougmill@linux.vnet.ibm.com \
    --cc=krisman@linux.vnet.ibm.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=sumit.semwal@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).