From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S1752731AbdFQRbN (ORCPT ); Sat, 17 Jun 2017 13:31:13 -0400
Received: from mx0b-001b2d01.pphosted.com ([148.163.158.5]:54016 "EHLO
 mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org
 with ESMTP id S1752613AbdFQRbL (ORCPT ); Sat, 17 Jun 2017 13:31:11 -0400
Date: Sat, 17 Jun 2017 10:31:05 -0700
From: "Paul E. McKenney"
To: Tejun Heo
Cc: jiangshanlai@gmail.com, linux-kernel@vger.kernel.org
Subject: Re: WARN_ON_ONCE() in process_one_work()?
Reply-To: paulmck@linux.vnet.ibm.com
References: <20170501183807.GA7054@linux.vnet.ibm.com>
 <20170501184402.GB8921@htj.duckdns.org>
 <20170501185819.GJ3956@linux.vnet.ibm.com>
 <20170505171159.GA10296@linux.vnet.ibm.com>
 <20170613205837.GB7359@htj.duckdns.org>
 <20170613223103.GX3721@linux.vnet.ibm.com>
 <20170614151548.GA14462@linux.vnet.ibm.com>
 <20170615153857.GA27788@linux.vnet.ibm.com>
 <20170616173658.GA451@linux.vnet.ibm.com>
 <20170617115314.GA20758@htj.duckdns.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170617115314.GA20758@htj.duckdns.org>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-TM-AS-GCONF: 00
x-cbid: 17061717-0052-0000-0000-000002261290
X-IBM-SpamModules-Scores:
X-IBM-SpamModules-Versions: BY=3.00007250; HX=3.00000241; KW=3.00000007;
 PH=3.00000004; SC=3.00000214; SDB=6.00876132; UDB=6.00436305; IPR=6.00656256;
 BA=6.00005425; NDR=6.00000001; ZLA=6.00000005; ZF=6.00000009; ZB=6.00000000;
 ZP=6.00000000; ZH=6.00000000; ZU=6.00000002; MB=3.00015865; XFM=3.00000015;
 UTC=2017-06-17 17:31:08
X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused
x-cbparentid: 17061717-0053-0000-0000-000050FFD260
Message-Id: <20170617173105.GI3721@linux.vnet.ibm.com>
X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,,
 definitions=2017-06-17_14:,, signatures=0
X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0
 spamscore=0 suspectscore=0 malwarescore=0 phishscore=0 adultscore=0
 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1
 engine=8.0.1-1703280000 definitions=main-1706170326
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, Jun 17, 2017 at 07:53:14AM -0400, Tejun Heo wrote:
> Hello,
>
> On Fri, Jun 16, 2017 at 10:36:58AM -0700, Paul E. McKenney wrote:
> > And no test failures from yesterday evening. So it looks like we get
> > somewhere on the order of one failure per 138 hours of TREE07 rcutorture
> > runtime with your printk() in the mix.
> >
> > Was the above output from your printk() output of any help?
>
> Yeah, if my suspicion is correct, it'd require new kworker creation
> racing against CPU offline, which would explain why it's so difficult
> to repro. Can you please see whether the following patch resolves the
> issue?

That could explain why only Steve Rostedt and I saw the issue. As far
as I know, we are the only ones who regularly run CPU-hotplug stress
tests. ;-)

I have a weekend-long run going, but will give this a shot overnight
on Monday, Pacific Time. Thank you for putting it together, looking
forward to seeing what it does!

							Thanx, Paul

> Thanks.
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 803c3bc274c4..1500217ce4b4 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -980,8 +980,13 @@ struct migration_arg {
>  static struct rq *__migrate_task(struct rq *rq, struct rq_flags *rf,
>  				 struct task_struct *p, int dest_cpu)
>  {
> -	if (unlikely(!cpu_active(dest_cpu)))
> -		return rq;
> +	if (p->flags & PF_KTHREAD) {
> +		if (unlikely(!cpu_online(dest_cpu)))
> +			return rq;
> +	} else {
> +		if (unlikely(!cpu_active(dest_cpu)))
> +			return rq;
> +	}
>
>  	/* Affinity changed (again). */
>  	if (!cpumask_test_cpu(dest_cpu, &p->cpus_allowed))
>
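
For readers following the quoted patch, here is a minimal userspace sketch of
the destination-CPU check it introduces. It is only a model under stated
assumptions: struct model_cpu, struct model_task, MODEL_PF_KTHREAD and
model_dest_cpu_ok() are hypothetical names invented for illustration, and the
flag value is a stand-in rather than the kernel's. The one thing it tries to
show is that, with the patch, a kernel thread is only refused when the
destination CPU is not online, while any other task is still refused as soon
as the destination CPU is not active.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the kernel's PF_KTHREAD task flag; the value is illustrative. */
#define MODEL_PF_KTHREAD 0x1u

/* Hypothetical model types, not kernel structures. */
struct model_cpu {
	bool online;	/* models cpu_online(dest_cpu) */
	bool active;	/* models cpu_active(dest_cpu) */
};

struct model_task {
	unsigned int flags;
};

/*
 * Returns true when the patched check would let the migration proceed,
 * false when __migrate_task() would bail out early and return rq.
 */
bool model_dest_cpu_ok(const struct model_task *p,
		       const struct model_cpu *dest)
{
	assert(p && dest);

	if (p->flags & MODEL_PF_KTHREAD)
		return dest->online;	/* kthreads: online is enough */

	return dest->active;		/* other tasks: must be active */
}
```

The interesting case is a destination CPU that is still online but no longer
active, which is the state a CPU passes through on its way offline. In this
model a task with MODEL_PF_KTHREAD set is accepted there and a task without
it is rejected, whereas the check before the patch would have rejected both.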