From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753387Ab3DKTsa (ORCPT ); Thu, 11 Apr 2013 15:48:30 -0400
Received: from www.linutronix.de ([62.245.132.108]:48051 "EHLO
	Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1753213Ab3DKTs3 (ORCPT ); Thu, 11 Apr 2013 15:48:29 -0400
Date: Thu, 11 Apr 2013 21:48:16 +0200 (CEST)
From: Thomas Gleixner
To: Dave Hansen
cc: Borislav Petkov, "Srivatsa S. Bhat", LKML, Dave Jones,
	dhillf@gmail.com, Peter Zijlstra, Ingo Molnar
Subject: Re: [PATCH] kthread: Prevent unpark race which puts threads on the
	wrong cpu
In-Reply-To: <5166FBFB.2030703@sr71.net>
Message-ID:
References: <515F457E.5050505@sr71.net> <515FCAC6.8090806@linux.vnet.ibm.com>
	<20130407095025.GA31307@pd.tnic> <20130408115553.GA4395@pd.tnic>
	<516439DF.3050901@sr71.net> <51647C30.3050109@sr71.net>
	<5165C087.4060404@sr71.net> <5166FBFB.2030703@sr71.net>
User-Agent: Alpine 2.02 (LFD 1266 2009-07-14)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-Linutronix-Spam-Score: -1.0
X-Linutronix-Spam-Level: -
X-Linutronix-Spam-Status: No , -1.0 points, 5.0 required, ALL_TRUSTED=-1,SHORTCIRCUIT=-0.0001
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, 11 Apr 2013, Dave Hansen wrote:

> On 04/11/2013 03:19 AM, Thomas Gleixner wrote:
> > --- linux-2.6.orig/kernel/smpboot.c
> > +++ linux-2.6/kernel/smpboot.c
> > @@ -185,8 +185,18 @@ __smpboot_create_thread(struct smp_hotpl
> >  	}
> >  	get_task_struct(tsk);
> >  	*per_cpu_ptr(ht->store, cpu) = tsk;
> > -	if (ht->create)
> > -		ht->create(cpu);
> > +	if (ht->create) {
> > +		/*
> > +		 * Make sure that the task has actually scheduled out
> > +		 * into park position, before calling the create
> > +		 * callback. At least the migration thread callback
> > +		 * requires that the task is off the runqueue.
> > +		 */
> > +		if (!wait_task_inactive(tsk, TASK_PARKED))
> > +			WARN_ON(1);
> > +		else
> > +			ht->create(cpu);
> > +	}
> >  	return 0;
> >  }
>
> This one appears to be doing the trick. I'll run the cpus in an
> online/offline loop for a bit and make sure it's stable. It's passed
> several round so far, which is way more than it's done up to this point,
> though.

I think the most critical part is bootup when the threads are created.
The online/offline one is an issue as well, which is covered by the
patches, but way harder to trigger.

Thanks,

	tglx
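
The ordering the quoted hunk enforces (wait until the new thread has really
reached park position, and only then run the create callback) can be
illustrated outside the kernel. What follows is a minimal userspace sketch
with pthreads, not kernel code. Every name in it (struct worker,
wait_worker_parked(), create_cb(), create_worker(), unpark_and_join()) is
hypothetical and chosen only for this illustration. Unlike
wait_task_inactive(), the wait here cannot fail, so the sketch has no
WARN_ON(1) branch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a parkable per-cpu thread. */
struct worker {
	pthread_t thread;
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool parked;		/* set by the worker once it sits in park position */
	bool unpark;		/* set by the creator to release the worker */
	int create_calls;	/* number of times the create callback ran */
};

static void *worker_fn(void *arg)
{
	struct worker *w = arg;

	pthread_mutex_lock(&w->lock);
	w->parked = true;			/* announce park position */
	pthread_cond_broadcast(&w->cond);
	while (!w->unpark)			/* stay parked until released */
		pthread_cond_wait(&w->cond, &w->lock);
	w->parked = false;
	pthread_mutex_unlock(&w->lock);
	return NULL;
}

/* Rough analogue of wait_task_inactive(tsk, TASK_PARKED): block until parked. */
static void wait_worker_parked(struct worker *w)
{
	pthread_mutex_lock(&w->lock);
	while (!w->parked)
		pthread_cond_wait(&w->cond, &w->lock);
	pthread_mutex_unlock(&w->lock);
}

/* Rough analogue of ht->create(cpu): it relies on the worker being parked. */
static void create_cb(struct worker *w)
{
	pthread_mutex_lock(&w->lock);
	assert(w->parked);
	w->create_calls++;
	pthread_mutex_unlock(&w->lock);
}

/* Create the worker, wait for it to park, then run the callback. */
static int create_worker(struct worker *w)
{
	pthread_mutex_init(&w->lock, NULL);
	pthread_cond_init(&w->cond, NULL);
	w->parked = false;
	w->unpark = false;
	w->create_calls = 0;
	if (pthread_create(&w->thread, NULL, worker_fn, w))
		return -1;
	wait_worker_parked(w);	/* without this, create_cb() could run too early */
	create_cb(w);
	return 0;
}

/* Release the parked worker and reap it. */
static void unpark_and_join(struct worker *w)
{
	pthread_mutex_lock(&w->lock);
	w->unpark = true;
	pthread_cond_broadcast(&w->cond);
	pthread_mutex_unlock(&w->lock);
	pthread_join(w->thread, NULL);
	pthread_cond_destroy(&w->cond);
	pthread_mutex_destroy(&w->lock);
}
```

A caller would run create_worker() on a struct worker and later
unpark_and_join() on it; build with -pthread. Dropping the
wait_worker_parked() call makes the assert in create_cb() able to fire,
depending on how the two threads happen to be scheduled, which mirrors why
the race is easiest to hit at bootup when the threads are first created.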