From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753665AbbBMSz2 (ORCPT ); Fri, 13 Feb 2015 13:55:28 -0500
Received: from mx1.redhat.com ([209.132.183.28]:35744 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1752924AbbBMSz1 (ORCPT ); Fri, 13 Feb 2015 13:55:27 -0500
Date: Fri, 13 Feb 2015 19:53:28 +0100
From: Oleg Nesterov
To: Nicholas Mc Guire
Cc: Davidlohr Bueso , paulmck@linux.vnet.ibm.com,
	linux-kernel@vger.kernel.org, waiman.long@hp.com,
	peterz@infradead.org, raghavendra.kt@linux.vnet.ibm.com
Subject: Re: BUG: spinlock bad magic on CPU#0, migration/0/9
Message-ID: <20150213185328.GA19746@redhat.com>
References: <20150212003430.GA28656@linux.vnet.ibm.com>
	<1423710911.2046.50.camel@stgolabs.net>
	<20150212172805.GA20850@redhat.com>
	<20150212174144.GA21714@redhat.com>
	<20150212191009.GA26275@opentech.at>
	<20150212193734.GA28499@redhat.com>
	<20150212212746.GB30430@redhat.com>
	<20150213181752.GB11953@opentech.at>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20150213181752.GB11953@opentech.at>
User-Agent: Mutt/1.5.18 (2008-05-17)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 02/13, Nicholas Mc Guire wrote:
>
> On Thu, 12 Feb 2015, Oleg Nesterov wrote:
>
> > Nicholas, sorry, I sent the patch but forgot to CC you.
> > See https://lkml.org/lkml/2015/2/12/587
> >
> > And please note that "completion" was specially designed to guarantee
> > that complete() can't play with this memory after wait_for_completion/etc
> > returns.
>
> hmmm....
> I guess that "falling out of context" can happen in a number of cases
> with completion - any of the timeout/interruptible variants e.g:
>
> void xxx(void)
> {
>     struct completion c;
>
>     init_completion(&c);
>
>     expose_this_completion(&c);
>
>     wait_for_completion_timeout(&c,A_FEW_JIFFIES);
> }
>
> and if the other side did not call complete() within A_FEW_JIFFIES then
> it would result in the same failure - I don't think the API can prevent
> this type of bug.

Yes sure, but in this case the user of wait_for_completion_timeout() should
blame itself, it is simply buggy.

> It has to be ensured by additional locking

Yes, but

> drivers/misc/tifm_7xx1.c:tifm_7xx1_resume() resolve this issue by resetting
> the completion to NULL and testing for !NULL before calling complete()
> with appropriate locking protection access.

I don't understand this code, I can easily be wrong, but at first glance it
doesn't need completion at all. Exactly because it relies on the additional
fm->lock. ->finish_me could be "task_struct *", the tifm_7xx1_resume() could
simply do schedule_timeout(), tifm_7xx1_isr() could do wake_up_process().

Nevermind, this is off-topic and most probably I misread this code.

> Nevertheless of course the proposed change in completion_done() was a bug -
> many thanks for catching that so quickly !

OK, perhaps you can ack the fix I sent?

Oleg.
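
For reference, the "reset the pointer to NULL and test for !NULL under a
lock" pattern discussed above can be modelled in userspace roughly like this.
It is only a sketch with pthreads, not the tifm_7xx1 code and not kernel code:
struct ucompletion, struct fake_dev, and every function name below are
hypothetical stand-ins, and fake_dev.lock merely plays the role of fm->lock.
The point is that the waiter unpublishes its on-stack object under the lock
before returning, so a late completer finds NULL and never touches dead
stack memory.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Userspace stand-in for the kernel's struct completion (hypothetical). */
struct ucompletion {
    pthread_mutex_t mtx;
    pthread_cond_t cond;
    bool done;
};

/* Hypothetical device; `lock` plays the role of fm->lock. */
struct fake_dev {
    pthread_mutex_t lock;
    struct ucompletion *finish_me; /* non-NULL only while a waiter is active */
};

static void uc_complete(struct ucompletion *c)
{
    pthread_mutex_lock(&c->mtx);
    c->done = true;
    pthread_cond_signal(&c->cond);
    pthread_mutex_unlock(&c->mtx);
}

/* Block until c->done is set or roughly `ms` milliseconds have passed. */
static void uc_wait_timeout(struct ucompletion *c, long ms)
{
    struct timespec ts;
    int rc = 0;

    clock_gettime(CLOCK_REALTIME, &ts);
    ts.tv_sec += ms / 1000;
    ts.tv_nsec += (ms % 1000) * 1000000L;
    if (ts.tv_nsec >= 1000000000L) {
        ts.tv_sec++;
        ts.tv_nsec -= 1000000000L;
    }
    pthread_mutex_lock(&c->mtx);
    while (!c->done && rc != ETIMEDOUT)
        rc = pthread_cond_timedwait(&c->cond, &c->mtx, &ts);
    pthread_mutex_unlock(&c->mtx);
}

/* Waiter side: returns true if the other side completed us in time. */
static bool fake_dev_wait(struct fake_dev *d, long timeout_ms)
{
    struct ucompletion c; /* lives on this stack frame only */
    bool done;

    pthread_mutex_init(&c.mtx, NULL);
    pthread_cond_init(&c.cond, NULL);
    c.done = false;

    pthread_mutex_lock(&d->lock);
    assert(d->finish_me == NULL); /* sketch supports one waiter at a time */
    d->finish_me = &c;            /* publish */
    pthread_mutex_unlock(&d->lock);

    uc_wait_timeout(&c, timeout_ms);

    pthread_mutex_lock(&d->lock);
    d->finish_me = NULL;          /* unpublish before c goes out of scope */
    done = c.done;                /* no completer can be running here */
    pthread_mutex_unlock(&d->lock);

    pthread_cond_destroy(&c.cond);
    pthread_mutex_destroy(&c.mtx);
    return done;
}

/* Completer side: only touches the object while it is still published. */
static bool fake_dev_complete(struct fake_dev *d)
{
    bool woke = false;

    pthread_mutex_lock(&d->lock);
    if (d->finish_me) {
        uc_complete(d->finish_me);
        woke = true;
    }
    pthread_mutex_unlock(&d->lock);
    return woke;
}

/* Helper thread: retry until a waiter has been seen and completed. */
static void *completer_thread(void *arg)
{
    while (!fake_dev_complete(arg))
        sched_yield();
    return NULL;
}
```

With this arrangement a timeout is harmless: fake_dev_wait() returns false,
and any later fake_dev_complete() sees finish_me == NULL and does nothing.
As noted above, once the extra lock is required anyway, the completion is
arguably redundant; this sketch shows only the locking discipline and does
not argue for keeping the completion.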