From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S934498AbdBQQTb (ORCPT ); Fri, 17 Feb 2017 11:19:31 -0500
Received: from mx1.redhat.com ([209.132.183.28]:55210 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S934159AbdBQQT3 (ORCPT ); Fri, 17 Feb 2017 11:19:29 -0500
Reply-To: xlpang@redhat.com
Subject: Re: [PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic
References: <20170123145056.fyraeehjfnwmmfb6@pd.tnic>
	<5886AD91.10803@redhat.com>
	<20170124122212.3dpdex5wjallypis@pd.tnic>
	<5889976A.9020802@redhat.com>
	<20170126064400.wfsn5pzxnpi6gcuk@pd.tnic>
	<58A53A65.3000405@redhat.com>
	<20170216101845.vkmnde4v6v72dgzx@pd.tnic>
	<58A59269.3050706@redhat.com>
	<20170216122215.uvrckt25g2msfxhe@pd.tnic>
	<58A65791.4090600@redhat.com>
	<20170217090735.ls5pmtsfwkf3q5h6@pd.tnic>
To: Borislav Petkov , xlpang@redhat.com
Cc: Prarit Bhargava , Kiyoshi Ueda , Tony Luck , Peter Zijlstra ,
	x86@kernel.org, kexec@lists.infradead.org,
	linux-kernel@vger.kernel.org, Ingo Molnar , Junichi Nomura ,
	Naoya Horiguchi , Dave Young , Thomas Gleixner
From: Xunlei Pang
Message-ID: <58A72316.20204@redhat.com>
Date: Sat, 18 Feb 2017 00:21:42 +0800
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.2.0
MIME-Version: 1.0
In-Reply-To: <20170217090735.ls5pmtsfwkf3q5h6@pd.tnic>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16
	(mx1.redhat.com [10.5.110.39]); Fri, 17 Feb 2017 16:19:30 +0000 (UTC)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 02/17/2017 at 05:07 PM, Borislav Petkov wrote:
> On Fri, Feb 17, 2017 at 09:53:21AM +0800, Xunlei Pang wrote:
>> It changes the value of cpu_online_mask/etc which will cause confusion to vmcore analysis.
> Then export the crashing_cpu variable, initialize it to something
> invalid in the first kernel, -1 for example, and test it in the #MC
> handler like this:
>
> 	int cpu;
>
> 	...
>
> 	cpu = smp_processor_id();
>
> 	if (cpu_is_offline(cpu) ||
> 	    ((crashing_cpu != -1) && (crashing_cpu != cpu))) {
> 		u64 mcgstatus;
>
> 		mcgstatus = mce_rdmsrl(MSR_IA32_MCG_STATUS);
> 		if (mcgstatus & MCG_STATUS_RIPV) {
> 			mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
> 			return;
> 		}
> 	}

Yes, that is doable. I will run some tests later.

>> Moreover, for the code (see comment inlined)
>>
>> 	if (cpu_is_offline(smp_processor_id())) {
>> 		u64 mcgstatus;
>>
>> 		mcgstatus = mce_rdmsrl(MSR_IA32_MCG_STATUS);
>> 		if (mcgstatus & MCG_STATUS_RIPV) { // This condition may not be true: an mce triggered on the kdump cpu
>> 			// doesn't need to set this bit on the other cpus that remain in the 1st kernel.
> Is this on kvm or on a real hardware? Because for kvm I don't care. And
> don't say "theoretically".
>

That is my understanding; I could not find an explicit description of
this point in the Intel SDM. If a broadcast SRAO arrives on real hardware,
will MSR_IA32_MCG_STATUS on each cpu have the MCG_STATUS_RIPV bit set?

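To make the question concrete, here is a minimal userspace sketch of the
decision the proposed check makes. It is not kernel code and not the patch
itself: mce_should_bail_early() and NO_CRASHING_CPU are hypothetical names
used only for illustration, and the MSR value is passed in as a plain
argument instead of being read with mce_rdmsrl(). The only hardware fact
assumed is that RIPV is bit 0 of IA32_MCG_STATUS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 of IA32_MCG_STATUS: restart IP valid (RIPV). */
#define MCG_STATUS_RIPV (1ULL << 0)

/* Hypothetical sentinel: no CPU has entered the crash path yet. */
#define NO_CRASHING_CPU (-1)

/*
 * Hypothetical helper, not a kernel function. Returns true when the #MC
 * handler would take the early exit sketched in the thread: the CPU is
 * offline, or another CPU is the crashing one, AND RIPV is set.
 */
static bool mce_should_bail_early(int cpu, bool cpu_offline,
				  int crashing_cpu, uint64_t mcgstatus)
{
	bool bystander = cpu_offline ||
			 (crashing_cpu != NO_CRASHING_CPU &&
			  crashing_cpu != cpu);

	return bystander && (mcgstatus & MCG_STATUS_RIPV) != 0;
}

/* A few cases, including the one the question above is about. */
static void mce_filter_examples(void)
{
	/* CPU 1 is a bystander while CPU 0 crashes, RIPV set: bail out. */
	assert(mce_should_bail_early(1, false, 0, MCG_STATUS_RIPV));
	/* The crashing CPU itself goes through normal handling. */
	assert(!mce_should_bail_early(0, false, 0, MCG_STATUS_RIPV));
	/* Bystander with RIPV clear: the early exit is NOT taken. */
	assert(!mce_should_bail_early(1, false, 0, 0));
}
```

The last case is the one I am unsure about: if a bystander cpu can see
RIPV clear, it falls through to the normal handling path.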
Regards,
Xunlei