From: Lai Jiangshan <laijs@cn.fujitsu.com>
To: <linux-kernel@vger.kernel.org>, Tejun Heo <tj@kernel.org>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>,
Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>,
"Gu, Zheng" <guz.fnst@cn.fujitsu.com>,
tangchen <tangchen@cn.fujitsu.com>,
Hiroyuki KAMEZAWA <kamezawa.hiroyu@jp.fujitsu.com>
Subject: [PATCH 0/5] workqueue: fix bug when numa mapping is changed
Date: Fri, 12 Dec 2014 18:19:50 +0800
Message-ID: <1418379595-6281-1-git-send-email-laijs@cn.fujitsu.com>
The workqueue code assumes that the NUMA mapping is stable after the
system has booted. This assumption is currently incorrect.
Yasuaki Ishimatsu hit an allocation failure bug when the NUMA mapping
between CPU and node changed. These were the last messages logged:
SLUB: Unable to allocate memory on node 2 (gfp=0x80d0)
cache: kmalloc-192, object size: 192, buffer size: 192, default order: 1, min order: 0
node 0: slabs: 6172, objs: 259224, free: 245741
node 1: slabs: 3261, objs: 136962, free: 127656
Yasuaki Ishimatsu found that it happens in the following situation:
1) Node/CPU mapping before offline/online:
       | CPU
------------------------
node 0 | 0-14, 60-74
node 1 | 15-29, 75-89
node 2 | 30-44, 90-104
node 3 | 45-59, 105-119
2) A system board (containing node 2 and node 3) is taken offline:
       | CPU
------------------------
node 0 | 0-14, 60-74
node 1 | 15-29, 75-89
3) A new system board (SB) is brought online. Two new node IDs are
allocated for the two nodes of the SB, but the old CPU IDs are reused
for the SB, so the NUMA mapping between node and CPU changes
(the node of CPU#30 changes from node#2 to node#4, for example):
       | CPU
------------------------
node 0 | 0-14, 60-74
node 1 | 15-29, 75-89
node 4 | 30-59
node 5 | 90-119
4) The NUMA mapping has now changed, but wq_numa_possible_cpumask,
the cache of the NUMA mapping kept for convenience in workqueue.c, is
outdated. Thus the pool->node calculated by get_unbound_pool() is
incorrect.
5) When create_worker() is called with the incorrect, offlined
pool->node, it fails and the pool can't make any progress.
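Steps 1-5 above can be replayed in a small userspace model. This is
only a sketch under stated assumptions, not the code in workqueue.c:
cached_mask stands in for wq_numa_possible_cpumask, and every name in
it (cpu_to_node_live, rebuild_cache(), cached_node_of(), and so on) is
hypothetical and chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NR_CPUS  120
#define NR_NODES 8
#define NO_NODE  (-1)

/* Live topology (hypothetical model): current node of each CPU. */
int cpu_to_node_live[NR_CPUS];
/* Whether each node is currently online. */
bool node_online[NR_NODES];
/* Snapshot of the topology; stands in for wq_numa_possible_cpumask. */
bool cached_mask[NR_NODES][NR_CPUS];

/* Assign CPUs first..last to an online node in the live topology. */
void map_cpus(int first, int last, int node)
{
    assert(first >= 0 && first <= last && last < NR_CPUS);
    assert(node >= 0 && node < NR_NODES);
    for (int cpu = first; cpu <= last; cpu++)
        cpu_to_node_live[cpu] = node;
    node_online[node] = true;
}

/* Rebuild the snapshot from the live topology. */
void rebuild_cache(void)
{
    memset(cached_mask, 0, sizeof(cached_mask));
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        cached_mask[cpu_to_node_live[cpu]][cpu] = true;
}

/* Look up a CPU's node through the snapshot only. */
int cached_node_of(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    for (int node = 0; node < NR_NODES; node++)
        if (cached_mask[node][cpu])
            return node;
    return NO_NODE;
}

/* Step 1: topology at boot; the snapshot is taken once here. */
void boot_topology(void)
{
    map_cpus(0, 14, 0);   map_cpus(60, 74, 0);
    map_cpus(15, 29, 1);  map_cpus(75, 89, 1);
    map_cpus(30, 44, 2);  map_cpus(90, 104, 2);
    map_cpus(45, 59, 3);  map_cpus(105, 119, 3);
    rebuild_cache();
}

/* Steps 2-3: nodes 2 and 3 go away, the same CPU IDs return on 4 and 5.
 * The snapshot is deliberately left untouched to reproduce the bug. */
void replace_system_board(void)
{
    node_online[2] = false;
    node_online[3] = false;
    map_cpus(30, 59, 4);
    map_cpus(90, 119, 5);
}
```

After boot_topology() and replace_system_board(), cached_node_of(30)
still answers node 2, which is offline, while the live mapping says
node 4. Calling rebuild_cache() afterwards makes the two agree again,
which is roughly the idea behind updating the cached mapping.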
To fix this bug, we need to fix up wq_numa_possible_cpumask and
pool->node; this is done in patch2 and patch3.
patch1 fixes a memory leak related to wq_numa_possible_cpumask.
patch4 kills another assumption about how the NUMA mapping changes.
patch5 reduces allocation failures when the node is offline or
lacks memory.
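The retry idea behind patch5 can be sketched in userspace as follows.
This is an illustrative model only, not the patch itself: NO_NODE
stands in for NUMA_NO_NODE, alloc_on_node() is a hypothetical stand-in
for a node-aware allocator, and struct worker and
create_worker_retry() are made-up names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define NR_NODES 8
#define NO_NODE  (-1)   /* stands in for NUMA_NO_NODE */

/* Hypothetical model: which nodes are online and have free memory. */
bool node_has_memory[NR_NODES];

/* Hypothetical node-aware allocator: fails if the requested node cannot
 * satisfy the request; NO_NODE accepts memory from any usable node. */
void *alloc_on_node(size_t size, int node)
{
    assert(node >= NO_NODE && node < NR_NODES);
    if (node != NO_NODE)
        return node_has_memory[node] ? malloc(size) : NULL;
    for (int n = 0; n < NR_NODES; n++)
        if (node_has_memory[n])
            return malloc(size);
    return NULL;
}

struct worker {
    int node;   /* node the allocation was finally requested on */
};

/* Try the preferred node first; on failure retry once without a node
 * preference instead of giving up. */
struct worker *create_worker_retry(int preferred_node)
{
    int node = preferred_node;
    struct worker *w = alloc_on_node(sizeof(*w), node);

    if (!w && node != NO_NODE) {
        node = NO_NODE;
        w = alloc_on_node(sizeof(*w), node);
    }
    if (w)
        w->node = node;
    return w;
}
```

With only nodes 0 and 1 marked usable, create_worker_retry(2) falls
back to NO_NODE and still returns a worker, whereas a single attempt
on node 2 would fail in the way the SLUB message above shows.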
The patchset is untested. It is sent for early review.
Thanks,
Lai.
Reported-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: "Gu, Zheng" <guz.fnst@cn.fujitsu.com>
Cc: tangchen <tangchen@cn.fujitsu.com>
Cc: Hiroyuki KAMEZAWA <kamezawa.hiroyu@jp.fujitsu.com>
Lai Jiangshan (5):
  workqueue: fix memory leak in wq_numa_init()
  workqueue: update wq_numa_possible_cpumask
  workqueue: fixup existing pool->node
  workqueue: update NUMA affinity for the node lost CPU
  workqueue: retry on NUMA_NO_NODE when create_worker() fails

 kernel/workqueue.c | 129 ++++++++++++++++++++++++++++++++++++++++++++--------
 1 files changed, 109 insertions(+), 20 deletions(-)
--
1.7.4.4