From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753844AbcJDQB2 (ORCPT ); Tue, 4 Oct 2016 12:01:28 -0400 Received: from mx1.redhat.com ([209.132.183.28]:41180 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752005AbcJDQB1 (ORCPT ); Tue, 4 Oct 2016 12:01:27 -0400 Message-ID: <57F3D254.20806@redhat.com> Date: Tue, 04 Oct 2016 12:01:24 -0400 From: Prarit Bhargava User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0 MIME-Version: 1.0 To: Thomas Gleixner CC: linux-kernel@vger.kernel.org, Ingo Molnar , "H. Peter Anvin" , x86@kernel.org, Peter Zijlstra , Len Brown , Borislav Petkov , Andi Kleen , Jiri Olsa , Juergen Gross , dyoung@redhat.com, Eric Biederman , kexec@lists.infradead.org Subject: Re: [PATCH] arch/x86: Fix kdump on x86 with physically hotadded CPUs References: <1475514432-27682-1-git-send-email-prarit@redhat.com> <57F39C09.10001@redhat.com> In-Reply-To: Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.29]); Tue, 04 Oct 2016 16:01:26 +0000 (UTC) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 10/04/2016 10:38 AM, Thomas Gleixner wrote: > On Tue, 4 Oct 2016, Prarit Bhargava wrote: >> On 10/04/2016 06:58 AM, Thomas Gleixner wrote: >>> While it is the right thing to initialize the package map in that case, it >>> still papers over a robustness issue in the uncore code, which needs to be >>> fixed first. >> >> I will include a separate patch with an error check for pkg == 0xffff in the >> uncore code. > > 0xffff? That won't help. The id returned is -1 if the entry is not > initialized. And aside of that just patching that particular place is not > helping as the uncore code and also rapl is relying on the package map > being populated. Yes, I noticed that after I started digging into it this morning. Not only what you pointed out but there's init that occurs in the uncore code that would have to be undone. There is a similar issue in the rapl code, but that code inadvertently protects itself with for loops that end up never running (and that's why the rapl code doesn't panic). > > So we need a sanity check in the initialization code which prevents any of > this being executed. Ok, should this be done only for logical_proc_id or for logical_proc_id, phys_proc_id, and cpu_core_id? What do you think of adding that to the end of smp_init_package_map() or smp_store_cpu_info()? > >>>> + if (!num_processors) { >>>> + pr_warn("CPU 0 not enumerated in mptable or ACPI MADT\n"); >>>> + num_processors = 1; >>> >>> And in this case we end up with the same problem, right? >> >> It occurs to me that I over thought this: I was thinking that there might exist >> a pre-ACPI (or at least a system without an MADT) x86 system that wold boot such >> that num_processors = 0. But in that case, the cpu should be listed in the >> mptables so the above should not happen. I'll change that to a BUG(). > > No. That's the wrong thing to do. Think SMP kernel on UP machines ... > Sorry Thomas, but my history with real UP hardware is limited. I think you might be saying I should call generic_processor_info(0, apic_version[0]) to populate cpu 0 but I'm not sure. P. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-path: Received: from mx1.redhat.com ([209.132.183.28]) by bombadil.infradead.org with esmtps (Exim 4.85_2 #1 (Red Hat Linux)) id 1brSAB-0001PM-0D for kexec@lists.infradead.org; Tue, 04 Oct 2016 16:01:52 +0000 Message-ID: <57F3D254.20806@redhat.com> Date: Tue, 04 Oct 2016 12:01:24 -0400 From: Prarit Bhargava MIME-Version: 1.0 Subject: Re: [PATCH] arch/x86: Fix kdump on x86 with physically hotadded CPUs References: <1475514432-27682-1-git-send-email-prarit@redhat.com> <57F39C09.10001@redhat.com> In-Reply-To: List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "kexec" Errors-To: kexec-bounces+dwmw2=infradead.org@lists.infradead.org To: Thomas Gleixner Cc: Juergen Gross , Len Brown , Andi Kleen , Peter Zijlstra , dyoung@redhat.com, x86@kernel.org, kexec@lists.infradead.org, linux-kernel@vger.kernel.org, Ingo Molnar , Eric Biederman , "H. Peter Anvin" , Borislav Petkov , Jiri Olsa On 10/04/2016 10:38 AM, Thomas Gleixner wrote: > On Tue, 4 Oct 2016, Prarit Bhargava wrote: >> On 10/04/2016 06:58 AM, Thomas Gleixner wrote: >>> While it is the right thing to initialize the package map in that case, it >>> still papers over a robustness issue in the uncore code, which needs to be >>> fixed first. >> >> I will include a separate patch with an error check for pkg == 0xffff in the >> uncore code. > > 0xffff? That won't help. The id returned is -1 if the entry is not > initialized. And aside of that just patching that particular place is not > helping as the uncore code and also rapl is relying on the package map > being populated. Yes, I noticed that after I started digging into it this morning. Not only what you pointed out but there's init that occurs in the uncore code that would have to be undone. There is a similar issue in the rapl code, but that code inadvertently protects itself with for loops that end up never running (and that's why the rapl code doesn't panic). > > So we need a sanity check in the initialization code which prevents any of > this being executed. Ok, should this be done only for logical_proc_id or for logical_proc_id, phys_proc_id, and cpu_core_id? What do you think of adding that to the end of smp_init_package_map() or smp_store_cpu_info()? > >>>> + if (!num_processors) { >>>> + pr_warn("CPU 0 not enumerated in mptable or ACPI MADT\n"); >>>> + num_processors = 1; >>> >>> And in this case we end up with the same problem, right? >> >> It occurs to me that I over thought this: I was thinking that there might exist >> a pre-ACPI (or at least a system without an MADT) x86 system that wold boot such >> that num_processors = 0. But in that case, the cpu should be listed in the >> mptables so the above should not happen. I'll change that to a BUG(). > > No. That's the wrong thing to do. Think SMP kernel on UP machines ... > Sorry Thomas, but my history with real UP hardware is limited. I think you might be saying I should call generic_processor_info(0, apic_version[0]) to populate cpu 0 but I'm not sure. P. _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec