From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752165AbbBLV3o (ORCPT ); Thu, 12 Feb 2015 16:29:44 -0500 Received: from mx1.redhat.com ([209.132.183.28]:42679 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751571AbbBLV3n (ORCPT ); Thu, 12 Feb 2015 16:29:43 -0500 Date: Thu, 12 Feb 2015 22:27:46 +0100 From: Oleg Nesterov To: Nicholas Mc Guire Cc: Davidlohr Bueso , paulmck@linux.vnet.ibm.com, linux-kernel@vger.kernel.org, waiman.long@hp.com, peterz@infradead.org, raghavendra.kt@linux.vnet.ibm.com Subject: Re: BUG: spinlock bad magic on CPU#0, migration/0/9 Message-ID: <20150212212746.GB30430@redhat.com> References: <20150212003430.GA28656@linux.vnet.ibm.com> <1423710911.2046.50.camel@stgolabs.net> <20150212172805.GA20850@redhat.com> <20150212174144.GA21714@redhat.com> <20150212191009.GA26275@opentech.at> <20150212193734.GA28499@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20150212193734.GA28499@redhat.com> User-Agent: Mutt/1.5.18 (2008-05-17) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Nicholas, sorry, I sent the patch but forgot to CC you. See https://lkml.org/lkml/2015/2/12/587 And please note that "completion" was specially designed to guarantee that complete() can't play with this memory after wait_for_completion/etc returns. On 02/12, Oleg Nesterov wrote: > > On 02/12, Nicholas Mc Guire wrote: > > > > On Thu, 12 Feb 2015, Oleg Nesterov wrote: > > > > > No, sorry, only the 2nd one. > > > > > > > Unless at least document how > > > > you can use these helpers. > > > > > > > > Consider this code: > > > > > > > > void xxx(void) > > > > { > > > > struct completion c; > > > > > > > > init_completion(&c); > > > > > > > > expose_this_completion(&c); > > > > > > > > while (!completion_done(&c) > > > > schedule_timeout_uninterruptible(1); > > > > But that would not break due to the change - even if completion_done() had a > > problem - complete_done() is not consuming x->done it is only checking it? > > Nicholas, looks like you didn't read the text below: > > > > > Before that change this code was correct, now it is not. Hmm and note that > > > > this is what stop_machine_from_inactive_cpu() does although I do not know > > > > if this is related or not. > > > > > > > > Because completion_done() can now race with complete(), the final > > > > spin_unlock() can write to the memory after it was freed/reused. In this > > > > case it can write to the stack after return. > > Or I misunderstood you. > > > > bool completion_done(struct completion *x) > > > { > > > - return !!ACCESS_ONCE(x->done); > > > + if (!READ_ONCE(x->done)) > > > + return false; > > > + > > > + smp_rmb(); > > > + spin_unlock_wait(&x->wait.lock); > > > + return true; > > > > what would be the sense of the spin_unlock_wait here ? > > all you are interested in here is the state of x->done > > No. Please see above. > > > regarding the smp_rmb() where would the counterpart to that be ? > > to avoid the reordering, we should not read ->wait.lock before ->done. > > Oleg.