From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753998AbdFMWbJ (ORCPT ); Tue, 13 Jun 2017 18:31:09 -0400
Received: from mx0b-001b2d01.pphosted.com ([148.163.158.5]:45167 "EHLO
	mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-FAIL)
	by vger.kernel.org with ESMTP id S1752259AbdFMWbH (ORCPT );
	Tue, 13 Jun 2017 18:31:07 -0400
Date: Tue, 13 Jun 2017 15:31:03 -0700
From: "Paul E. McKenney"
To: Tejun Heo
Cc: jiangshanlai@gmail.com, linux-kernel@vger.kernel.org
Subject: Re: WARN_ON_ONCE() in process_one_work()?
Reply-To: paulmck@linux.vnet.ibm.com
References: <20170501165747.GA993@linux.vnet.ibm.com>
 <20170501183807.GA7054@linux.vnet.ibm.com>
 <20170501184402.GB8921@htj.duckdns.org>
 <20170501185819.GJ3956@linux.vnet.ibm.com>
 <20170505171159.GA10296@linux.vnet.ibm.com>
 <20170613205837.GB7359@htj.duckdns.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170613205837.GB7359@htj.duckdns.org>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-TM-AS-GCONF: 00
x-cbid: 17061322-0048-0000-0000-000001A58777
X-IBM-SpamModules-Scores:
X-IBM-SpamModules-Versions: BY=3.00007227; HX=3.00000241; KW=3.00000007;
 PH=3.00000004; SC=3.00000212; SDB=6.00874367; UDB=6.00435227; IPR=6.00654455;
 BA=6.00005420; NDR=6.00000001; ZLA=6.00000005; ZF=6.00000009; ZB=6.00000000;
 ZP=6.00000000; ZH=6.00000000; ZU=6.00000002; MB=3.00015811; XFM=3.00000015;
 UTC=2017-06-13 22:31:05
X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused
x-cbparentid: 17061322-0049-0000-0000-0000418164B8
Message-Id: <20170613223103.GX3721@linux.vnet.ibm.com>
X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,,
 definitions=2017-06-13_10:,, signatures=0
X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0
 spamscore=0 suspectscore=0 malwarescore=0 phishscore=0 adultscore=0
 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1
 engine=8.0.1-1703280000
 definitions=main-1706130385
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jun 13, 2017 at 04:58:37PM -0400, Tejun Heo wrote:
> Hello, Paul.
>
> On Fri, May 05, 2017 at 10:11:59AM -0700, Paul E. McKenney wrote:
> > Just following up... I have hit this bug a couple of times over the
> > past few days. Anything I can do to help?
>
> My apologies for dropping the ball on this. I've gone over the hot
> plug code in workqueue several times but can't really find how this
> would happen. Can you please apply the following patch and see what
> it says when the problem happens?

I have fired it up, thank you!

Last time I saw one failure in 21 hours of test runs, so I have kicked
off 42 one-hour test runs. Will see what happens tomorrow morning,
Pacific Time.

	Thanx, Paul

> Thanks.
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index c74bf39ef764..bd2ce3cbfb41 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -1691,13 +1691,20 @@ static struct worker *alloc_worker(int node)
>  static void worker_attach_to_pool(struct worker *worker,
>  				   struct worker_pool *pool)
>  {
> +	int ret;
> +
>  	mutex_lock(&pool->attach_mutex);
>
>  	/*
>  	 * set_cpus_allowed_ptr() will fail if the cpumask doesn't have any
>  	 * online CPUs. It'll be re-applied when any of the CPUs come up.
>  	 */
> -	set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
> +	ret = set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
> +
> +	WARN(ret && !(pool->flags & POOL_DISASSOCIATED),
> +	     "set_cpus_allowed_ptr failed, ret=%d pool->cpu/flags=%d/0x%x cpumask=%*pbl online=%*pbl active=%*pbl\n",
> +	     ret, pool->cpu, pool->flags, cpumask_pr_args(pool->attrs->cpumask),
> +	     cpumask_pr_args(cpu_online_mask), cpumask_pr_args(cpu_active_mask));
>
>  	/*
>  	 * The pool->attach_mutex ensures %POOL_DISASSOCIATED remains
> @@ -2037,8 +2044,11 @@ __acquires(&pool->lock)
>  	lockdep_copy_map(&lockdep_map, &work->lockdep_map);
>  #endif
>  	/* ensure we're on the correct CPU */
> -	WARN_ON_ONCE(!(pool->flags & POOL_DISASSOCIATED) &&
> -		     raw_smp_processor_id() != pool->cpu);
> +	if (WARN_ON_ONCE(!(pool->flags & POOL_DISASSOCIATED) &&
> +			 raw_smp_processor_id() != pool->cpu))
> +		printk_once("XXX workfn=%pf pool->cpu/flags=%d/0x%x curcpu=%d online=%*pbl active=%*pbl\n",
> +			    work->func, pool->cpu, pool->flags, raw_smp_processor_id(),
> +			    cpumask_pr_args(cpu_online_mask), cpumask_pr_args(cpu_active_mask));
>
>  	/*
>  	 * A single work shouldn't be executed concurrently by
>