From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: 
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S932566AbcFOLdC (ORCPT );
	Wed, 15 Jun 2016 07:33:02 -0400
Received: from merlin.infradead.org ([205.233.59.134]:37690 "EHLO
	merlin.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S932095AbcFOLc7 (ORCPT );
	Wed, 15 Jun 2016 07:32:59 -0400
Date: Wed, 15 Jun 2016 13:32:49 +0200
From: Peter Zijlstra 
To: Gautham R Shenoy 
Cc: Thomas Gleixner , Tejun Heo , Michael Ellerman ,
	Abdul Haleem , Aneesh Kumar ,
	linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/2] workqueue:Fix affinity of an unbound worker of a
	node with 1 online CPU
Message-ID: <20160615113249.GH30909@twins.programming.kicks-ass.net>
References: <20160614112234.GF30154@twins.programming.kicks-ass.net>
	<20160615101936.GA31671@in.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20160615101936.GA31671@in.ibm.com>
User-Agent: Mutt/1.5.23.1 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: 
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Jun 15, 2016 at 03:49:36PM +0530, Gautham R Shenoy wrote:
> Also, with the first patch in the series (which ensures that
> restore_unbound_workers are called *after* the new workers for the
> newly onlined CPUs are created) and without this one, you can
> reproduce this WARN_ON on both x86 and PPC by offlining all the CPUs
> of a node and bringing just one of them online.

Ah good.

> I am not sure about that. The workqueue creates unbound workers for a
> node via wq_update_unbound_numa() whenever the first CPU of every node
> comes online. So that seems legitimate. It then tries to affine these
> workers to the cpumask of that node. Again this seems right. As an
> optimization, it does this only when the first CPU of the node comes
> online.
> Since this online CPU is not yet active, and since
> nr_cpus_allowed > 1, we will hit the WARN_ON().

So I had another look and isn't the below a much simpler solution?

It seems to work on my x86 with:

  for i in /sys/devices/system/cpu/cpu*/online ; do echo 0 > $i ; done
  for i in /sys/devices/system/cpu/cpu*/online ; do echo 1 > $i ; done

without complaint.

---
 kernel/workqueue.c | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index e1c0e99..09c9160 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -4600,15 +4600,11 @@ static void restore_unbound_workers_cpumask(struct worker_pool *pool, int cpu)
 	if (!cpumask_test_cpu(cpu, pool->attrs->cpumask))
 		return;
 
-	/* is @cpu the only online CPU? */
 	cpumask_and(&cpumask, pool->attrs->cpumask, cpu_online_mask);
-	if (cpumask_weight(&cpumask) != 1)
-		return;
 
 	/* as we're called from CPU_ONLINE, the following shouldn't fail */
 	for_each_pool_worker(worker, pool)
-		WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task,
-						  pool->attrs->cpumask) < 0);
+		WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, &cpumask) < 0);
 }
 
 /*
@@ -4638,6 +4634,10 @@ static int workqueue_cpu_up_callback(struct notifier_block *nfb,
 	case CPU_ONLINE:
 		mutex_lock(&wq_pool_mutex);
 
+		/* update NUMA affinity of unbound workqueues */
+		list_for_each_entry(wq, &workqueues, list)
+			wq_update_unbound_numa(wq, cpu, true);
+
 		for_each_pool(pool, pi) {
 			mutex_lock(&pool->attach_mutex);
 
@@ -4649,10 +4649,6 @@ static int workqueue_cpu_up_callback(struct notifier_block *nfb,
 			mutex_unlock(&pool->attach_mutex);
 		}
 
-		/* update NUMA affinity of unbound workqueues */
-		list_for_each_entry(wq, &workqueues, list)
-			wq_update_unbound_numa(wq, cpu, true);
-
 		mutex_unlock(&wq_pool_mutex);
 		break;
 	}
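FWIW, the effect of the patch can be sketched as a toy userspace model (all
names below are hypothetical stand-ins for the kernel structures, not the
actual workqueue API): the NUMA affinity update, which may spawn new workers,
runs first, and the rebind then affines every worker of an affected pool to
the pool mask intersected with the online mask rather than bailing out unless
@cpu is the only online CPU:

```python
# Toy model of the patched CPU_ONLINE path; not kernel code.

def restore_unbound_workers_cpumask(pool, cpu, online):
    """Mimics the patched helper: rebind every worker of the pool to the
    intersection of the pool's cpumask with the online CPUs, instead of
    returning early unless @cpu is the only online CPU in the mask."""
    if cpu not in pool["attrs_cpumask"]:
        return
    cpumask = pool["attrs_cpumask"] & online      # cpumask_and(..., cpu_online_mask)
    for worker in pool["workers"]:
        worker["allowed"] = set(cpumask)          # set_cpus_allowed_ptr()

def cpu_online_callback(cpu, pools, online):
    online.add(cpu)
    # Patched order: wq_update_unbound_numa() (which may create workers
    # for the newly onlined node) would run here, before the rebind below.
    for pool in pools:
        restore_unbound_workers_cpumask(pool, cpu, online)

# Node spanning CPUs {0, 1}; only CPU 0 online, one worker pinned to it.
online = {0}
pool = {"attrs_cpumask": {0, 1}, "workers": [{"allowed": {0}}]}
cpu_online_callback(1, [pool], online)
assert pool["workers"][0]["allowed"] == {0, 1}
```

With the old early-return, bringing CPU 1 online here would have left the
worker's mask untouched; the model shows the rebind now always lands on an
online subset of the pool mask.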