From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932756AbcGOLwH (ORCPT ); Fri, 15 Jul 2016 07:52:07 -0400 Received: from mail7.hitachi.co.jp ([133.145.228.42]:55751 "EHLO mail7.hitachi.co.jp" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932595AbcGOLwF (ORCPT ); Fri, 15 Jul 2016 07:52:05 -0400 From: =?utf-8?B?5rKz5ZCI6Iux5a6PIC8gS0FXQUnvvIxISURFSElSTw==?= To: "'Dave Young'" CC: Michal Hocko , Toshi Kani , "Peter Zijlstra (Intel)" , Vitaly Kuznetsov , Minfei Huang , "H. Peter Anvin" , Daniel Walker , Ingo Molnar , Takao Indoh , Baoquan He , "x86@kernel.org" , "Lee, Chun-Yi" , Borislav Petkov , Vivek Goyal , Masami Hiramatsu , Petr Mladek , Josh Poimboeuf , Thomas Gleixner , Ingo Molnar , "kexec@lists.infradead.org" , "linux-kernel@vger.kernel.org" , "HATAYAMA Daisuke" , "Eric W. Biederman" , Tejun Heo , Andrew Morton Subject: RE: Re: [V3 PATCH 1/2] x86/panic: Replace smp_send_stop() with kdump friendly version Thread-Topic: [!]Re: [V3 PATCH 1/2] x86/panic: Replace smp_send_stop() with kdump friendly version Thread-Index: AQHR208fDDrA8sImK0m4taBP2wA4v6AUAfpQgAEGDwCABEwTAA== Date: Fri, 15 Jul 2016 11:50:55 +0000 Deferred-Delivery: Fri, 15 Jul 2016 11:52:00 +0000 Message-ID: <04EAB7311EE43145B2D3536183D1A84454C7A1B1@GSjpTKYDCembx31.service.hitachi.net> References: <20160705113323.5864.66834.stgit@softrs> <20160705113325.5864.74840.stgit@softrs> <20160711083449.GA19320@dhcp-128-65.nay.redhat.com> <04EAB7311EE43145B2D3536183D1A84454C6F889@GSjpTKYDCembx31.service.hitachi.net> <20160713020336.GA3222@dhcp-128-65.nay.redhat.com> In-Reply-To: <20160713020336.GA3222@dhcp-128-65.nay.redhat.com> Accept-Language: ja-JP, en-US Content-Language: ja-JP X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [10.232.29.173] Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 X-TM-AS-GCONF: 00 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from base64 to 8bit by mail.home.local id u6FBqBA9017833 Hi Dave, Thanks for your reply. > From: 'Dave Young' [mailto:dyoung@redhat.com] > Sent: Wednesday, July 13, 2016 11:04 AM > > On 07/12/16 at 02:49am, 河合英宏 / KAWAI,HIDEHIRO wrote: > > Hi Dave, > > > > Thanks for the comments. > > > > > From: Dave Young [mailto:dyoung@redhat.com] > > > Sent: Monday, July 11, 2016 5:35 PM > > > > > > On 07/05/16 at 08:33pm, Hidehiro Kawai wrote: > > > > This patch fixes one of the problems reported by Daniel Walker > > > > (https://lkml.org/lkml/2015/6/24/44). > > > > > > > > If crash_kexec_post_notifiers boot option is specified, other CPUs > > > > are stopped by smp_send_stop() instead of machine_crash_shutdown() > > > > in crash_kexec() path. This behavior change leads two problems. > > > > > > > > Problem 1: > > > > octeon_generic_shutdown() for MIPS OCTEON assumes that other CPUs > > > > are still online and try to stop their watchdog timer. If > > > > smp_send_stop() is called before octeon_generic_shutdown(), > > > > stopping watchdog timer will fail because other CPUs have been > > > > offlined by smp_send_stop(). > > > > > > > > panic() > > > > if crash_kexec_post_notifiers == 1 > > > > smp_send_stop() > > > > atomic_notifier_call_chain() > > > > kmsg_dump() > > > > crash_kexec() > > > > machine_crash_shutdown() > > > > octeon_generic_shutdown() // shutdown watchdog for ONLINE > > > > CPUs > > > > > > > > Problem 2: > > > > Most of architectures stop other CPUs in machine_crash_shutdown() > > > > path, and they also do something needed for kdump. For example, > > > > they save registers, disable virtualization extensions, and so on. > > > > However, if smp_send_stop() stops other CPUs before > > > > machine_crash_shutdown(), we miss those operations. > > > > > > > > How do we fix these problems? In the first place, we should stop > > > > other CPUs as soon as possible when panic() was called, otherwise > > > > other CPUs may wipe out a clue to the cause of the failure. So, > > > > we replace smp_send_stop() with more suitable one for kdump. > > > > > > We have been avoiding extra things in panic path, but unfortunately > > > crash_kexec_post_notifiers were added. I tend to agree the best > > > place for this stuff is in 2nd kernel or purgatory instead of in 1st kernel. > > > > Several months ago, I posted a patch set which writes regs to SEL, > > generate an event to send SNMP message, and start/stop BMC's watchdog > > timer in purgatory. This feature requires BMC with KCS (Keyboard > > Controller Style) I/F, but the most of enterprise grade server would have it. > > (http://thread.gmane.org/gmane.linux.kernel.kexec/15382) > > > > Doing kmsg_dump things in purgatory wouldn't be suitable (should be > > done in the 2nd kernel before enabling devices and IRQs?) > > In theory it is doable maybe do something like oldmem_kmsg_dump while /proc/vmcore initializing? Hmm, I checked the case of using ACPI ERST as a persistent storage. The feature is initialized by device_initcall(): device_initcall erst_init pstore_register And vmcore is initialized by fs_initcall() which is called before device_initcall(). We may be able to change the sequence, but anyway, these are done in later phase of the kernel initialization. So, it may get less reliable although it would be doable. > > > As for this patch I'm not sure it is safe to replace the > > > smp_send_stop with the kdump friendly function. I'm also not sure if > > > the kdump friendly function is safe for kdump. Will glad to hear > > > opinions from other arch experts. > > > > This stuff depends on architectures, so I speak only about > > x86 (the logic doesn't change on other architectures at this time). > > > > kdump path with crash_kexec_post_notifiers disabled: > > panic() > > __crash_kexec() > > crash_setup_regs() > > crash_save_vmcoreinfo() > > machine_crash_shutdown() > > native_machine_crash_shutdown() > > panic_smp_send_stop() /* mostly same as original > > * kdump_nmi_shootdown_cpus() > > */ > > > > kdump path with crash_kexec_post_notifiers enabled: > > panic() > > panic_smp_send_stop() > > __crash_kexec() > > crash_setup_regs() > > crash_save_vmcoreinfo() > > machine_crash_shutdown() > > native_machine_crash_shutdown() > > panic_smp_send_stop() // do nothing > > > > The difference is that stopping other CPUs before crash_setup_regs() > > and crash_save_vmcoreinfo() or not. Since crash_setup_regs() and > > crash_save_vmcoreinfo() just save information to some memory area, > > they wouldn't be affected by panic_smp_send_stop(). This means > > placing panic_smp_send_stop before __crash_kexec is safe. > > > > BTW, I noticed my patch breaks Xen kernel. I'll fix it in the next > > version. > > But it does breaks stuff which depends on cpu not being disabled like problem 1 you mentioned in patch log. As I mentioned in the description of this patch, we should stop other CPUs ASAP to preserve current state either crash_kexec_post_notifiers is enabled or not. Then, all remaining procedures should work well after stopping other CPUs (but keep the CPU map online). Vivek also mentioned similar things: https://lkml.org/lkml/2015/7/14/433 > > > BTW, if one want to use crash_kexec_post_notifiers he should take > > > the risk of unreliable kdump. How about only call smp_send_stop in > > > case no crash_kexec_post_notifiers being used. > > > > Unlike panic_smp_send_stop()/kdump_nmi_shootdown_cpus(), > > smp_send_stop() for x86 tries to stop other CPUs with normal IPI before issuing NMI IPI. > > This would be because NMI IPI has a risk of deadlock. We checked if > > the kdump path has a risk of deadlock in the case of NMI panic and > > fixed it. But I'm not sure about normal panic path. I agree with > > that use smp_send_stop if crash_kexec_post_notifiers or kdump is disabled. > > What I mean is like below, problem 1 will not exist in this way, but kdump will be unreliable: > if (!crash_kexec_post_notifiers) > smp_send_stop() Remaining procedures including atomic_notifier_call_chain and kmsg_dump may assume that other CPUs have stopped. Actually, IPMI driver callback assumes so that. Also, other CPUs may change the current state while calling these callbacks and make the dump analysis difficult. [snip] Best regards, Hidehiro Kawai