Date: Wed, 21 Jun 2017 13:59:34 -0400
From: Tejun Heo
To: Steven Rostedt
Cc: Ingo Molnar, Peter Zijlstra, "Paul E. McKenney", linux-kernel@vger.kernel.org, Lai Jiangshan, kernel-team@fb.com
Subject: Re: simple repro case
Message-ID: <20170621175934.GB10139@htj.duckdns.org>
References: <20170617121008.GB20758@htj.duckdns.org> <20170617121149.GC20758@htj.duckdns.org> <20170621102457.088f28b8@gandalf.local.home>
In-Reply-To: <20170621102457.088f28b8@gandalf.local.home>
X-Mailing-List: linux-kernel@vger.kernel.org

Hello, Steven.

On Wed, Jun 21, 2017 at 10:24:57AM -0400, Steven Rostedt wrote:
> On Sat, 17 Jun 2017 08:11:49 -0400
> Tejun Heo wrote:
>
> > Here's a simple repro. The test code runs whenever a CPU goes
> > off/online. The test kthread is created on a different CPU and
> > migrated to the target CPU while running. Without the previous patch
> > applied, the kthread ends up running on the wrong CPU.
> >
>
> Hmm, I'm not able to trigger the warn_on, with this patch applied.
>
> Adding a trace_printk("here!\n") just above the warn_on in
> wq_worker_sleeping(), and doing the following:
>
>  cpuhp/2-20 [002] d..1 751.204894: console: [ 751.018261] TEST: cpu 2 inactive, starting on 0 and migrating (active/online=0-1,3/0-3)
>  cpuhp/2-20 [002] d..1 751.318375: console: [ 751.131745] TEST: test_last_cpu=0 cpus_allowed=0
>  cpuhp/2-20 [002] d..1 751.324249: console: [ 751.137621] TEST: migrating to inactve cpu 2
>  cpuhp/2-20 [002] d..1 751.438368: console: [ 751.251738] TEST: test_last_cpu=0 cpus_allowed=2

Ah, sorry for not being clear. The repro is that test_last_cpu isn't 2
on the last line. The test created a kthread on CPU 0 and tried to
migrate it to CPU 2, which was online but inactive. The kthread
couldn't get onto that CPU because the migration code doesn't allow a
kthread to move to an inactive CPU, so it stayed on CPU 0 even though
its cpus_allowed was already set to 2.

The same problem affects the workqueue rescuer. It tries to migrate to
an inactive CPU to service the workqueue there, silently fails, and
then ends up running the work item on the wrong CPU.

Thanks.

-- 
tejun