linux-kernel.vger.kernel.org archive mirror
From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Jens Axboe <axboe@kernel.dk>,
	Glauber Costa <glauber@scylladb.com>,
	Sasha Levin <sashal@kernel.org>,
	linux-fsdevel@vger.kernel.org, io-uring@vger.kernel.org
Subject: [PATCH AUTOSEL 5.5 48/58] io-wq: don't call kXalloc_node() with non-online node
Date: Sat, 22 Feb 2020 21:21:09 -0500
Message-ID: <20200223022119.707-48-sashal@kernel.org>
In-Reply-To: <20200223022119.707-1-sashal@kernel.org>

From: Jens Axboe <axboe@kernel.dk>

[ Upstream commit 7563439adfae153b20331f1567c8b5d0e5cbd8a7 ]

Glauber reports a crash on init on a box he has:

 RIP: 0010:__alloc_pages_nodemask+0x132/0x340
 Code: 18 01 75 04 41 80 ce 80 89 e8 48 8b 54 24 08 8b 74 24 1c c1 e8 0c 48 8b 3c 24 83 e0 01 88 44 24 20 48 85 d2 0f 85 74 01 00 00 <3b> 77 08 0f 82 6b 01 00 00 48 89 7c 24 10 89 ea 48 8b 07 b9 00 02
 RSP: 0018:ffffb8be4d0b7c28 EFLAGS: 00010246
 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000000000e8e8
 RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000002080
 RBP: 0000000000012cc0 R08: 0000000000000000 R09: 0000000000000002
 R10: 0000000000000dc0 R11: ffff995c60400100 R12: 0000000000000000
 R13: 0000000000012cc0 R14: 0000000000000001 R15: ffff995c60db00f0
 FS:  00007f4d115ca900(0000) GS:ffff995c60d80000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000002088 CR3: 00000017cca66002 CR4: 00000000007606e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 PKRU: 55555554
 Call Trace:
  alloc_slab_page+0x46/0x320
  new_slab+0x9d/0x4e0
  ___slab_alloc+0x507/0x6a0
  ? io_wq_create+0xb4/0x2a0
  __slab_alloc+0x1c/0x30
  kmem_cache_alloc_node_trace+0xa6/0x260
  io_wq_create+0xb4/0x2a0
  io_uring_setup+0x97f/0xaa0
  ? io_remove_personalities+0x30/0x30
  ? io_poll_trigger_evfd+0x30/0x30
  do_syscall_64+0x5b/0x1c0
  entry_SYSCALL_64_after_hwframe+0x44/0xa9
 RIP: 0033:0x7f4d116cb1ed

which is due to the 'wqe' and 'worker' allocation being node affine.
But it isn't valid to call the node affine allocation if the node isn't
online.

Set up structures for even offline nodes, as usual, but skip them in
terms of thread setup to not waste resources. If the node isn't online,
just alloc memory with NUMA_NO_NODE.

Reported-by: Glauber Costa <glauber@scylladb.com>
Tested-by: Glauber Costa <glauber@scylladb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/io-wq.c | 22 ++++++++++++++++++----
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/fs/io-wq.c b/fs/io-wq.c
index 0dc4bb6de6566..25ffb6685baea 100644
--- a/fs/io-wq.c
+++ b/fs/io-wq.c
@@ -666,11 +666,16 @@ static int io_wq_manager(void *data)
 	/* create fixed workers */
 	refcount_set(&wq->refs, workers_to_create);
 	for_each_node(node) {
+		if (!node_online(node))
+			continue;
 		if (!create_io_worker(wq, wq->wqes[node], IO_WQ_ACCT_BOUND))
 			goto err;
 		workers_to_create--;
 	}
 
+	while (workers_to_create--)
+		refcount_dec(&wq->refs);
+
 	complete(&wq->done);
 
 	while (!kthread_should_stop()) {
@@ -678,6 +683,9 @@ static int io_wq_manager(void *data)
 			struct io_wqe *wqe = wq->wqes[node];
 			bool fork_worker[2] = { false, false };
 
+			if (!node_online(node))
+				continue;
+
 			spin_lock_irq(&wqe->lock);
 			if (io_wqe_need_worker(wqe, IO_WQ_ACCT_BOUND))
 				fork_worker[IO_WQ_ACCT_BOUND] = true;
@@ -793,7 +801,9 @@ static bool io_wq_for_each_worker(struct io_wqe *wqe,
 
 	list_for_each_entry_rcu(worker, &wqe->all_list, all_list) {
 		if (io_worker_get(worker)) {
-			ret = func(worker, data);
+			/* no task if node is/was offline */
+			if (worker->task)
+				ret = func(worker, data);
 			io_worker_release(worker);
 			if (ret)
 				break;
@@ -1006,6 +1016,8 @@ void io_wq_flush(struct io_wq *wq)
 	for_each_node(node) {
 		struct io_wqe *wqe = wq->wqes[node];
 
+		if (!node_online(node))
+			continue;
 		init_completion(&data.done);
 		INIT_IO_WORK(&data.work, io_wq_flush_func);
 		data.work.flags |= IO_WQ_WORK_INTERNAL;
@@ -1038,12 +1050,15 @@ struct io_wq *io_wq_create(unsigned bounded, struct io_wq_data *data)
 
 	for_each_node(node) {
 		struct io_wqe *wqe;
+		int alloc_node = node;
 
-		wqe = kzalloc_node(sizeof(struct io_wqe), GFP_KERNEL, node);
+		if (!node_online(alloc_node))
+			alloc_node = NUMA_NO_NODE;
+		wqe = kzalloc_node(sizeof(struct io_wqe), GFP_KERNEL, alloc_node);
 		if (!wqe)
 			goto err;
 		wq->wqes[node] = wqe;
-		wqe->node = node;
+		wqe->node = alloc_node;
 		wqe->acct[IO_WQ_ACCT_BOUND].max_workers = bounded;
 		atomic_set(&wqe->acct[IO_WQ_ACCT_BOUND].nr_running, 0);
 		if (wq->user) {
@@ -1051,7 +1066,6 @@ struct io_wq *io_wq_create(unsigned bounded, struct io_wq_data *data)
 					task_rlimit(current, RLIMIT_NPROC);
 		}
 		atomic_set(&wqe->acct[IO_WQ_ACCT_UNBOUND].nr_running, 0);
-		wqe->node = node;
 		wqe->wq = wq;
 		spin_lock_init(&wqe->lock);
 		INIT_WQ_LIST(&wqe->work_list);
-- 
2.20.1


