From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754574Ab2ANCx2 (ORCPT ); Fri, 13 Jan 2012 21:53:28 -0500
Received: from mail-ww0-f44.google.com ([74.125.82.44]:64279 "EHLO
	mail-ww0-f44.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1754185Ab2ANCx0 (ORCPT );
	Fri, 13 Jan 2012 21:53:26 -0500
MIME-Version: 1.0
In-Reply-To: <4F10EB5B.5060804@linux.vnet.ibm.com>
References: <20120111000051.GA28874@dztty>
	<4F10929E.8070007@linux.vnet.ibm.com>
	<4F10BDF7.8030306@linux.vnet.ibm.com>
	<4F10EB5B.5060804@linux.vnet.ibm.com>
From: Linus Torvalds
Date: Fri, 13 Jan 2012 18:53:04 -0800
X-Google-Sender-Auth: liMbDIyoLjO_bs2VT3d0JVpsLuY
Message-ID:
Subject: Re: x86/mce: machine check warning during poweroff
To: "Srivatsa S. Bhat"
Cc: Ming Lei , Djalal Harouni , Borislav Petkov , Tony Luck ,
	Hidetoshi Seto , Ingo Molnar , Andi Kleen ,
	linux-kernel@vger.kernel.org, Greg Kroah-Hartman , Kay Sievers ,
	gouders@et.bocholt.fh-gelsenkirchen.de, Marcos Souza ,
	Linux PM mailing list , "Rafael J. Wysocki" ,
	"tglx@linutronix.de" , prasad@linux.vnet.ibm.com,
	justinmattock@gmail.com, Jeff Chua , Suresh B Siddha ,
	Peter Zijlstra , Mel Gorman , Gilad Ben-Yossef
Content-Type: text/plain; charset=ISO-8859-1
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Jan 13, 2012 at 6:41 PM, Srivatsa S. Bhat wrote:
>
> YES!! Finally I have a fix for this whole MCE thing! :-)

Goodie.

> The patch below works perfectly for me - I tested multiple CPU hotplug
> operations as well as multiple pm_test runs at core level. Please let me
> know if this solves the suspend issue as well..

Ok, I'll try, and I bet it does.

HOWEVER. I'd be a whole lot happier knowing exactly which field in
"struct device" needed to be NULL before it gets registered. I don't
like how

    device_register() +
    device_create_file(dev)..

is not sufficiently undone by

    .. device_remove_file(dev) +
    device_unregister()

so that it can't be repeated. Exactly *what* state is stale and re-used
incorrectly if you do that device_register() a second time?

It smells like a misfeature of the device core handling. But that does
obviously explain why this started happening with a fairly
straightforward conversion from sysdev to struct device. It just makes
me worry about any *other* such conversions.

Of course, normal users will allocate and free the memory, so never see
this "re-use the same piece of memory" issue. But still..

                  Linus
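
The "re-use the same piece of memory" problem described above can be shown with a small user-space sketch. This is not the kernel's driver core: `toy_device`, `toy_core`, `toy_register()` and `toy_unregister()` are hypothetical stand-ins, and the `initialized` flag is an assumed example of stale state, not the field actually at fault in "struct device" (the message leaves that question open).

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical core-owned state embedded in the object. */
struct toy_core {
    bool initialized;   /* set on register, NOT cleared on unregister */
    int  refcount;
};

/* Hypothetical device object; stands in for a reused "struct device". */
struct toy_device {
    struct toy_core core;
    int id;
};

/* Returns 0 on success, -1 if the object looks already initialized. */
int toy_register(struct toy_device *dev)
{
    if (dev->core.initialized)
        return -1;              /* stale state from an earlier life */
    dev->core.initialized = true;
    dev->core.refcount = 1;
    return 0;
}

/* Drops the reference but leaves the flag behind (the assumed misfeature). */
void toy_unregister(struct toy_device *dev)
{
    dev->core.refcount = 0;
}

/* Register, unregister, then register the same memory again. */
int reuse_without_reset(void)
{
    struct toy_device dev = { .id = 0 };

    if (toy_register(&dev) != 0)
        return -2;              /* unexpected: first registration failed */
    toy_unregister(&dev);
    return toy_register(&dev);  /* fails: the flag is still set */
}

/* Same sequence, but the core state is zeroed before re-registering. */
int reuse_with_reset(void)
{
    struct toy_device dev = { .id = 0 };

    if (toy_register(&dev) != 0)
        return -2;
    toy_unregister(&dev);
    memset(&dev.core, 0, sizeof(dev.core));  /* clear the stale state */
    return toy_register(&dev);               /* succeeds */
}
```

In this model `reuse_without_reset()` returns -1 and `reuse_with_reset()` returns 0. A caller that allocates a fresh zeroed object for every registration and frees it afterwards never hits the first case, which matches the point that normal users do not see the issue.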