Date: Sat, 17 Jun 2017 07:53:14 -0400
From: Tejun Heo
To: "Paul E. McKenney"
Cc: jiangshanlai@gmail.com, linux-kernel@vger.kernel.org
Subject: Re: WARN_ON_ONCE() in process_one_work()?
Message-ID: <20170617115314.GA20758@htj.duckdns.org>
References: <20170501165747.GA993@linux.vnet.ibm.com>
 <20170501183807.GA7054@linux.vnet.ibm.com>
 <20170501184402.GB8921@htj.duckdns.org>
 <20170501185819.GJ3956@linux.vnet.ibm.com>
 <20170505171159.GA10296@linux.vnet.ibm.com>
 <20170613205837.GB7359@htj.duckdns.org>
 <20170613223103.GX3721@linux.vnet.ibm.com>
 <20170614151548.GA14462@linux.vnet.ibm.com>
 <20170615153857.GA27788@linux.vnet.ibm.com>
 <20170616173658.GA451@linux.vnet.ibm.com>
In-Reply-To: <20170616173658.GA451@linux.vnet.ibm.com>

Hello,

On Fri, Jun 16, 2017 at 10:36:58AM -0700, Paul E. McKenney wrote:
> And no test failures from yesterday evening. So it looks like we get
> somewhere on the order of one failure per 138 hours of TREE07 rcutorture
> runtime with your printk() in the mix.
>
> Was the above output from your printk() output of any help?

Yeah, if my suspicion is correct, it'd require new kworker creation
racing against CPU offline, which would explain why it's so difficult to
repro. Can you please see whether the following patch resolves the
issue?

Thanks.

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 803c3bc274c4..1500217ce4b4 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -980,8 +980,13 @@ struct migration_arg {
 static struct rq *__migrate_task(struct rq *rq, struct rq_flags *rf,
 				 struct task_struct *p, int dest_cpu)
 {
-	if (unlikely(!cpu_active(dest_cpu)))
-		return rq;
+	if (p->flags & PF_KTHREAD) {
+		if (unlikely(!cpu_online(dest_cpu)))
+			return rq;
+	} else {
+		if (unlikely(!cpu_active(dest_cpu)))
+			return rq;
+	}
 
 	/* Affinity changed (again). */
 	if (!cpumask_test_cpu(dest_cpu, &p->cpus_allowed))