From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1161659AbcFGPPB (ORCPT ); Tue, 7 Jun 2016 11:15:01 -0400 Received: from e31.co.us.ibm.com ([32.97.110.149]:51115 "EHLO e31.co.us.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1161110AbcFGPOz (ORCPT ); Tue, 7 Jun 2016 11:14:55 -0400 X-IBM-Helo: d03dlp02.boulder.ibm.com X-IBM-MailFrom: ego@linux.vnet.ibm.com X-IBM-RcptTo: mpe@ellerman.id.au;htejun@gmail.com;peterz@infradead.org;tglx@linutronix.de;linuxppc-dev@lists.ozlabs.org;linux-kernel@vger.kernel.org From: "Gautham R. Shenoy" To: Peter Zijlstra , Thomas Gleixner , Tejun Heo , Michael Ellerman , Abdul Haleem , Aneesh Kumar Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org, "Gautham R. Shenoy" Subject: [PATCH 2/2] workqueue:Fix affinity of an unbound worker of a node with 1 online CPU Date: Tue, 7 Jun 2016 20:44:03 +0530 Message-Id: X-Mailer: git-send-email 1.9.3 In-Reply-To: References: <5756BE20.3040208@linux.vnet.ibm.com> In-Reply-To: References: X-TM-AS-GCONF: 00 X-Content-Scanned: Fidelis XPS MAILER x-cbid: 16060715-8236-0000-0000-000031FE9249 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org With commit e9d867a67fd03ccc ("sched: Allow per-cpu kernel threads to run on online && !active"), __set_cpus_allowed_ptr() expects that only strict per-cpu kernel threads can have affinity to an online CPU which is not yet active. This assumption is currently broken in the CPU_ONLINE notification handler for the workqueues where restore_unbound_workers_cpumask() calls set_cpus_allowed_ptr() when the first cpu in the unbound worker's pool->attr->cpumask comes online. Since set_cpus_allowed_ptr() is called with pool->attr->cpumask in which only one CPU is online which is not yet active, we get the following WARN_ON during an CPU online operation. ------------[ cut here ]------------ WARNING: CPU: 40 PID: 248 at kernel/sched/core.c:1166 __set_cpus_allowed_ptr+0x228/0x2e0 Modules linked in: CPU: 40 PID: 248 Comm: cpuhp/40 Not tainted 4.6.0-autotest+ #4 <..snip..> Call Trace: [c000000f273ff920] [c00000000010493c] __set_cpus_allowed_ptr+0x2cc/0x2e0 (unreliable) [c000000f273ffac0] [c0000000000ed4b0] workqueue_cpu_up_callback+0x2c0/0x470 [c000000f273ffb70] [c0000000000f5c58] notifier_call_chain+0x98/0x100 [c000000f273ffbc0] [c0000000000c5ed0] __cpu_notify+0x70/0xe0 [c000000f273ffc00] [c0000000000c6028] notify_online+0x38/0x50 [c000000f273ffc30] [c0000000000c5214] cpuhp_invoke_callback+0x84/0x250 [c000000f273ffc90] [c0000000000c562c] cpuhp_up_callbacks+0x5c/0x120 [c000000f273ffce0] [c0000000000c64d4] cpuhp_thread_fun+0x184/0x1c0 [c000000f273ffd20] [c0000000000fa050] smpboot_thread_fn+0x290/0x2a0 [c000000f273ffd80] [c0000000000f45b0] kthread+0x110/0x130 [c000000f273ffe30] [c000000000009570] ret_from_kernel_thread+0x5c/0x6c ---[ end trace 00f1456578b2a3b2 ]--- This patch sets the affinity of the worker to a) the only online CPU in the cpumask of the worker pool when it comes online. b) the cpumask of the worker pool when the second CPU in the pool's cpumask comes online. Reported-by: Abdul Haleem Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Tejun Heo Cc: Michael Ellerman Signed-off-by: Gautham R. Shenoy --- kernel/workqueue.c | 19 +++++++++++++++---- 1 file changed, 15 insertions(+), 4 deletions(-) diff --git a/kernel/workqueue.c b/kernel/workqueue.c index e412794..1199f73 100644 --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -4586,7 +4586,7 @@ static void rebind_workers(struct worker_pool *pool) * * An unbound pool may end up with a cpumask which doesn't have any online * CPUs. When a worker of such pool get scheduled, the scheduler resets - * its cpus_allowed. If @cpu is in @pool's cpumask which didn't have any + * its cpus_allowed. If @cpu is in @pool's cpumask which had at most one * online CPU before, cpus_allowed of all its workers should be restored. */ static void restore_unbound_workers_cpumask(struct worker_pool *pool, int cpu) @@ -4600,15 +4600,26 @@ static void restore_unbound_workers_cpumask(struct worker_pool *pool, int cpu) if (!cpumask_test_cpu(cpu, pool->attrs->cpumask)) return; - /* is @cpu the only online CPU? */ cpumask_and(&cpumask, pool->attrs->cpumask, cpu_online_mask); - if (cpumask_weight(&cpumask) != 1) + + /* + * The affinity needs to be set + * a) to @cpu when that is the only online CPU in + * pool->attrs->cpumask. + * b) to pool->attrs->cpumask when exactly two CPUs in + * pool->attrs->cpumask are online. This affinity will be + * retained when subsequent CPUs come online. + */ + if (cpumask_weight(&cpumask) > 2) return; + if (cpumask_weight(&cpumask) == 2) + cpumask_copy(&cpumask, pool->attrs->cpumask); + /* as we're called from CPU_ONLINE, the following shouldn't fail */ for_each_pool_worker(worker, pool) WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, - pool->attrs->cpumask) < 0); + &cpumask) < 0); } /* -- 1.9.3