[PATCH 1/2] x86/numa: carve node online semantics out of alloc_node_data()

linux-mm.kvack.org archive mirror

* [PATCH 1/2] x86/numa: carve node online semantics out of alloc_node_data()
@ 2019-07-05  4:15 Pingfan Liu
       [not found] ` <1562300143-11671-2-git-send-email-kernelfans@gmail.com>
  0 siblings, 1 reply; 7+ messages in thread
From: Pingfan Liu @ 2019-07-05  4:15 UTC (permalink / raw)
  To: x86
  Cc: Pingfan Liu, Michal Hocko, Dave Hansen, Mike Rapoport, Tony Luck,
	Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Andrew Morton, Vlastimil Babka,
	Oscar Salvador, Pavel Tatashin, Mel Gorman,
	Benjamin Herrenschmidt, Michael Ellerman, Stephen Rothwell,
	Qian Cai, Barret Rhoden, Bjorn Helgaas, David Rientjes, linux-mm,
	linux-kernel

A node being online means that either its memory or one of its CPUs is
online. However, [2/2] needs to instantiate a pglist_data for a node that
has neither a CPU nor memory online.

So carve the online semantics out of alloc_node_data(), and call
node_set_online() at the call sites where either memory or a CPU is online.

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Qian Cai <cai@lca.pw>
Cc: Barret Rhoden <brho@google.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 arch/x86/mm/numa.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index e6dad60..b48d507 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -213,8 +213,6 @@ static void __init alloc_node_data(int nid)
 
 	node_data[nid] = nd;
 	memset(NODE_DATA(nid), 0, sizeof(pg_data_t));
-
-	node_set_online(nid);
 }
 
 /**
@@ -589,6 +587,7 @@ static int __init numa_register_memblks(struct numa_meminfo *mi)
 			continue;
 
 		alloc_node_data(nid);
+		node_set_online(nid);
 	}
 
 	/* Dump memblock with node info and return. */
@@ -760,8 +759,10 @@ void __init init_cpu_to_node(void)
 		if (node == NUMA_NO_NODE)
 			continue;
 
-		if (!node_online(node))
+		if (!node_online(node)) {
 			init_memory_less_node(node);
+			node_set_online(node);
+		}
 
 		numa_set_node(cpu, node);
 	}
-- 
2.7.5
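
The split this patch makes can be modeled in user space. The sketch below
is not kernel code: every `model_*` name and `MODEL_MAX_NODES` is a
hypothetical stand-in, and a plain bitmask replaces the kernel's node state
masks. It only illustrates that, after the patch, allocating the per-node
data no longer implies that the node is online; the caller decides that.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_MAX_NODES 8	/* hypothetical limit, not MAX_NUMNODES */

struct model_pgdat {
	bool allocated;		/* stands in for a zeroed pg_data_t */
};

static struct model_pgdat model_node_data[MODEL_MAX_NODES];
static unsigned int model_online_mask;	/* one bit per online node */

/* After the patch: allocation only instantiates the structure. */
static void model_alloc_node_data(int nid)
{
	assert(nid >= 0 && nid < MODEL_MAX_NODES);
	memset(&model_node_data[nid], 0, sizeof(model_node_data[nid]));
	model_node_data[nid].allocated = true;
}

static void model_node_set_online(int nid)
{
	assert(nid >= 0 && nid < MODEL_MAX_NODES);
	model_online_mask |= 1u << nid;
}

static bool model_node_online(int nid)
{
	assert(nid >= 0 && nid < MODEL_MAX_NODES);
	return (model_online_mask & (1u << nid)) != 0;
}

/* A call site that knows the node has memory or a CPU: allocate, then online. */
static void model_register_node(int nid)
{
	model_alloc_node_data(nid);
	model_node_set_online(nid);
}
```

With this shape, a caller such as the one planned in [2/2] can call
model_alloc_node_data() alone and end up with an allocated but offline node.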



* Re: [PATCH 2/2] x86/numa: instance all parsed numa node
       [not found]   ` <alpine.DEB.2.21.1907072133310.3648@nanos.tec.linutronix.de>
@ 2019-07-08  8:36     ` Pingfan Liu
       [not found]       ` <alpine.DEB.2.21.1907081125300.3648@nanos.tec.linutronix.de>
  0 siblings, 1 reply; 7+ messages in thread
From: Pingfan Liu @ 2019-07-08  8:36 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: x86, Michal Hocko, Dave Hansen, Mike Rapoport, Tony Luck,
	Andy Lutomirski, Peter Zijlstra, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Andrew Morton, Vlastimil Babka, Oscar Salvador,
	Pavel Tatashin, Mel Gorman, Benjamin Herrenschmidt,
	Michael Ellerman, Stephen Rothwell, Qian Cai, Barret Rhoden,
	Bjorn Helgaas, David Rientjes, linux-mm, LKML

On Mon, Jul 8, 2019 at 3:44 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On Fri, 5 Jul 2019, Pingfan Liu wrote:
>
> > I hit a bug on an AMD machine, with kexec -l nr_cpus=4 option. nr_cpus option
> > is used to speed up kdump process, so it is not a rare case.
>
> But fundamentally wrong, really.
>
> The rest of the CPUs are in a half-baked state and any broadcast event,
> e.g. MCE or a stray IPI, will result in an undiagnosable crash.
I would much appreciate it if you could elaborate on this. I tried to
figure out your point, but failed.

By "a half-baked state", I think you are concerned about the LAPIC state,
and I expand on that point as follows:

For IPIs: when the capture kernel's BSP is up, the remaining CPUs are still
looping inside crash_nmi_callback(), so there is no way for these CPUs to
send a new IPI. Also, we call disable_local_APIC(), which effectively
prevents the LAPIC from responding to IPIs, except NMI/INIT/SIPI, which
will not occur in the crash case.

For MCE: I am not sure whether it can be broadcast between CPUs, but as
far as I understand, it cannot. So is it a problem?

From another point of view, is there any difference between nr_cpus=1 and
nr_cpus>1 in the crash case? If a stray IPI raises an issue for nr_cpus>1,
it does so for nr_cpus=1 as well.

Thanks,
  Pingfan


* Re: [PATCH 2/2] x86/numa: instance all parsed numa node
       [not found]       ` <alpine.DEB.2.21.1907081125300.3648@nanos.tec.linutronix.de>
@ 2019-07-08 17:53         ` Andy Lutomirski
  2019-07-09  4:26           ` Pingfan Liu
  2019-07-09  4:16         ` Pingfan Liu
  1 sibling, 1 reply; 7+ messages in thread
From: Andy Lutomirski @ 2019-07-08 17:53 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Pingfan Liu, x86, Michal Hocko, Dave Hansen, Mike Rapoport,
	Tony Luck, Andy Lutomirski, Peter Zijlstra, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Andrew Morton, Vlastimil Babka,
	Oscar Salvador, Pavel Tatashin, Mel Gorman,
	Benjamin Herrenschmidt, Michael Ellerman, Stephen Rothwell,
	Qian Cai, Barret Rhoden, Bjorn Helgaas, David Rientjes, linux-mm,
	LKML



> On Jul 8, 2019, at 3:35 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> 
>> On Mon, 8 Jul 2019, Pingfan Liu wrote:
>>> On Mon, Jul 8, 2019 at 3:44 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>>> 
>>>> On Fri, 5 Jul 2019, Pingfan Liu wrote:
>>>> 
>>>> I hit a bug on an AMD machine, with kexec -l nr_cpus=4 option. nr_cpus option
>>>> is used to speed up kdump process, so it is not a rare case.
>>> 
>>> But fundamentally wrong, really.
>>> 
>>> The rest of the CPUs are in a half-baked state and any broadcast event,
>>> e.g. MCE or a stray IPI, will result in an undiagnosable crash.
>> Very appreciate if you can pay more word on it? I tried to figure out
>> your point, but fail.
>> 
>> For "a half baked state", I think you concern about LAPIC state, and I
>> expand this point like the following:
> 
> It's not only the APIC state. It's the state of the CPUs in general.
> 
>> For IPI: when capture kernel BSP is up, the rest cpus are still loop
>> inside crash_nmi_callback(), so there is no way to eject new IPI from
>> these cpu. Also we disable_local_APIC(), which effectively prevent the
>> LAPIC from responding to IPI, except NMI/INIT/SIPI, which will not
>> occur in crash case.
> 
> Fair enough for the IPI case.
> 
>> For MCE, I am not sure whether it can broadcast or not between cpus,
>> but as my understanding, it can not. Then is it a problem?
> 
> It can and it does.
> 
> That's the whole point why we bring up all CPUs in the 'nosmt' case and
> shut the siblings down again after setting CR4.MCE. Actually that's in fact
> a 'let's hope no MCE hits before that happened' approach, but that's all we
> can do.
> 
> If we don't do that then the MCE broadcast can hit a CPU which has some
> firmware initialized state. The result can be a full system lockup, triple
> fault etc.
> 
> So when the MCE hits a CPU which is still in the crashed kernel lala state,
> then all hell breaks loose.
> 
>> From another view point, is there any difference between nr_cpus=1 and
>> nr_cpus> 1 in crashing case? If stray IPI raises issue to nr_cpus>1,
>> it does for nr_cpus=1.
> 
> Anything less than the actual number of present CPUs is problematic except
> you use the 'let's hope nothing happens' approach. We could add an option
> to stop the bringup at the early online state similar to what we do for
> 'nosmt'.
> 
> 

How about we change nr_cpus to do that instead so we never have to have this conversation again?
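
As a rough illustration of the idea quoted above (bring every present CPU
far enough up to set CR4.MCE, then leave only the requested number online),
here is a user-space model. It is a sketch under stated assumptions, not a
proposal for the real bringup code: `struct model_cpu`, `model_bringup()`
and `MODEL_CR4_MCE` are hypothetical names, and a plain field stands in for
the CR4 register. The only hardware fact used is that CR4.MCE is bit 6.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_CR4_MCE (1ul << 6)	/* CR4.MCE is bit 6 on x86 */

struct model_cpu {
	unsigned long cr4;	/* stands in for the CPU's CR4 register */
	bool online;
};

/*
 * Visit every present CPU once so that CR4.MCE gets set, then keep only
 * the first nr_cpus of them online. Returns the number of online CPUs.
 */
static int model_bringup(struct model_cpu *cpus, int present, int nr_cpus)
{
	int online = 0;

	assert(cpus && present >= 0 && nr_cpus >= 0);

	for (int i = 0; i < present; i++) {
		/* Early online state reached: machine checks are enabled. */
		cpus[i].cr4 |= MODEL_CR4_MCE;

		/* CPUs beyond the limit are shut down again. */
		cpus[i].online = (i < nr_cpus);
		if (cpus[i].online)
			online++;
	}

	return online;
}
```

In this model a broadcast MCE would find CR4.MCE set on every present CPU,
including the ones that are not kept online.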



* Re: [PATCH 2/2] x86/numa: instance all parsed numa node
       [not found]       ` <alpine.DEB.2.21.1907081125300.3648@nanos.tec.linutronix.de>
  2019-07-08 17:53         ` Andy Lutomirski
@ 2019-07-09  4:16         ` Pingfan Liu
       [not found]           ` <alpine.DEB.2.21.1907090810490.1961@nanos.tec.linutronix.de>
  1 sibling, 1 reply; 7+ messages in thread
From: Pingfan Liu @ 2019-07-09  4:16 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: x86, Michal Hocko, Dave Hansen, Mike Rapoport, Tony Luck,
	Andy Lutomirski, Peter Zijlstra, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Andrew Morton, Vlastimil Babka, Oscar Salvador,
	Pavel Tatashin, Mel Gorman, Benjamin Herrenschmidt,
	Michael Ellerman, Stephen Rothwell, Qian Cai, Barret Rhoden,
	Bjorn Helgaas, David Rientjes, linux-mm, LKML

On Mon, Jul 8, 2019 at 5:35 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On Mon, 8 Jul 2019, Pingfan Liu wrote:
> > On Mon, Jul 8, 2019 at 3:44 AM Thomas Gleixner <tglx@linutronix.de> wrote:
> > >
> > > On Fri, 5 Jul 2019, Pingfan Liu wrote:
> > >
> > > > I hit a bug on an AMD machine, with kexec -l nr_cpus=4 option. nr_cpus option
> > > > is used to speed up kdump process, so it is not a rare case.
> > >
> > > But fundamentally wrong, really.
> > >
> > > The rest of the CPUs are in a half-baked state and any broadcast event,
> > > e.g. MCE or a stray IPI, will result in an undiagnosable crash.
> > Very appreciate if you can pay more word on it? I tried to figure out
> > your point, but fail.
> >
> > For "a half baked state", I think you concern about LAPIC state, and I
> > expand this point like the following:
>
> It's not only the APIC state. It's the state of the CPUs in general.
As for the other state, "kexec -l" acts as a kind of boot loader, and the
boot CPU complies with the kernel boot protocol. The remaining APs are
pinned in a loop until they receive an INIT IPI. The rest is then the same
as a normal SMP boot.

>
> > For IPI: when capture kernel BSP is up, the rest cpus are still loop
> > inside crash_nmi_callback(), so there is no way to eject new IPI from
> > these cpu. Also we disable_local_APIC(), which effectively prevent the
> > LAPIC from responding to IPI, except NMI/INIT/SIPI, which will not
> > occur in crash case.
>
> Fair enough for the IPI case.
>
> > For MCE, I am not sure whether it can broadcast or not between cpus,
> > but as my understanding, it can not. Then is it a problem?
>
> It can and it does.
>
> That's the whole point why we bring up all CPUs in the 'nosmt' case and
> shut the siblings down again after setting CR4.MCE. Actually that's in fact
> a 'let's hope no MCE hits before that happened' approach, but that's all we
> can do.
>
> If we don't do that then the MCE broadcast can hit a CPU which has some
> firmware initialized state. The result can be a full system lockup, triple
> fault etc.
>
> So when the MCE hits a CPU which is still in the crashed kernel lala state,
> then all hell breaks loose.
Thank you for the comprehensive explanation. With your guidance, I now
have a full understanding of the issue.

But when I tried to add something to enable CR4.MCE in
crash_nmi_callback(), I realized that it is not doable in some cases (after
a crash, we will not ask an offline SMT CPU to come online), and it is also
unnecessary. "kexec -l/-p" takes advantage of the CPU state in the first
kernel, where all logical CPUs have CR4.MCE=1.

So kexec is exempt from this bug if the first kernel has already done it.
>
> > From another view point, is there any difference between nr_cpus=1 and
> > nr_cpus> 1 in crashing case? If stray IPI raises issue to nr_cpus>1,
> > it does for nr_cpus=1.
>
> Anything less than the actual number of present CPUs is problematic except
> you use the 'let's hope nothing happens' approach. We could add an option
> to stop the bringup at the early online state similar to what we do for
> 'nosmt'.
Yes, we should do something about the nr_cpus parameter for the first kernel.

Thanks,
  Pingfan



* Re: [PATCH 2/2] x86/numa: instance all parsed numa node
  2019-07-08 17:53         ` Andy Lutomirski
@ 2019-07-09  4:26           ` Pingfan Liu
  0 siblings, 0 replies; 7+ messages in thread
From: Pingfan Liu @ 2019-07-09  4:26 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, x86, Michal Hocko, Dave Hansen, Mike Rapoport,
	Tony Luck, Andy Lutomirski, Peter Zijlstra, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Andrew Morton, Vlastimil Babka,
	Oscar Salvador, Pavel Tatashin, Mel Gorman,
	Benjamin Herrenschmidt, Michael Ellerman, Stephen Rothwell,
	Qian Cai, Barret Rhoden, Bjorn Helgaas, David Rientjes, linux-mm,
	LKML

On Tue, Jul 9, 2019 at 1:53 AM Andy Lutomirski <luto@amacapital.net> wrote:
>
>
>
> > On Jul 8, 2019, at 3:35 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> >
> >> On Mon, 8 Jul 2019, Pingfan Liu wrote:
> >>> On Mon, Jul 8, 2019 at 3:44 AM Thomas Gleixner <tglx@linutronix.de> wrote:
> >>>
> >>>> On Fri, 5 Jul 2019, Pingfan Liu wrote:
> >>>>
> >>>> I hit a bug on an AMD machine, with kexec -l nr_cpus=4 option. nr_cpus option
> >>>> is used to speed up kdump process, so it is not a rare case.
> >>>
> >>> But fundamentally wrong, really.
> >>>
> >>> The rest of the CPUs are in a half-baked state and any broadcast event,
> >>> e.g. MCE or a stray IPI, will result in an undiagnosable crash.
> >> Very appreciate if you can pay more word on it? I tried to figure out
> >> your point, but fail.
> >>
> >> For "a half baked state", I think you concern about LAPIC state, and I
> >> expand this point like the following:
> >
> > It's not only the APIC state. It's the state of the CPUs in general.
> >
> >> For IPI: when capture kernel BSP is up, the rest cpus are still loop
> >> inside crash_nmi_callback(), so there is no way to eject new IPI from
> >> these cpu. Also we disable_local_APIC(), which effectively prevent the
> >> LAPIC from responding to IPI, except NMI/INIT/SIPI, which will not
> >> occur in crash case.
> >
> > Fair enough for the IPI case.
> >
> >> For MCE, I am not sure whether it can broadcast or not between cpus,
> >> but as my understanding, it can not. Then is it a problem?
> >
> > It can and it does.
> >
> > That's the whole point why we bring up all CPUs in the 'nosmt' case and
> > shut the siblings down again after setting CR4.MCE. Actually that's in fact
> > a 'let's hope no MCE hits before that happened' approach, but that's all we
> > can do.
> >
> > If we don't do that then the MCE broadcast can hit a CPU which has some
> > firmware initialized state. The result can be a full system lockup, triple
> > fault etc.
> >
> > So when the MCE hits a CPU which is still in the crashed kernel lala state,
> > then all hell breaks loose.
> >
> >> From another view point, is there any difference between nr_cpus=1 and
> >> nr_cpus> 1 in crashing case? If stray IPI raises issue to nr_cpus>1,
> >> it does for nr_cpus=1.
> >
> > Anything less than the actual number of present CPUs is problematic except
> > you use the 'let's hope nothing happens' approach. We could add an option
> > to stop the bringup at the early online state similar to what we do for
> > 'nosmt'.
> >
> >
>
> How about we change nr_cpus to do that instead so we never have to have this conversation again?
Are you interested in implementing this?

Thanks,
  Pingfan



* Re: [PATCH 2/2] x86/numa: instance all parsed numa node
       [not found]             ` <CAFgQCTui7D6_FQ_v_ijj6k_=+TQzQ3PaGvzxd6p+XEGjQ2S6jw@mail.gmail.com>
@ 2019-07-09 13:34               ` Andy Lutomirski
  2019-07-10  8:40                 ` Pingfan Liu
  0 siblings, 1 reply; 7+ messages in thread
From: Andy Lutomirski @ 2019-07-09 13:34 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: Thomas Gleixner, x86, Michal Hocko, Dave Hansen, Mike Rapoport,
	Tony Luck, Andy Lutomirski, Peter Zijlstra, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Andrew Morton, Vlastimil Babka,
	Oscar Salvador, Pavel Tatashin, Mel Gorman,
	Benjamin Herrenschmidt, Michael Ellerman, Stephen Rothwell,
	Qian Cai, Barret Rhoden, Bjorn Helgaas, David Rientjes, linux-mm,
	LKML



> On Jul 9, 2019, at 1:24 AM, Pingfan Liu <kernelfans@gmail.com> wrote:
> 
>> On Tue, Jul 9, 2019 at 2:12 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>> 
>>> On Tue, 9 Jul 2019, Pingfan Liu wrote:
>>>> On Mon, Jul 8, 2019 at 5:35 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>>>> It can and it does.
>>>> 
>>>> That's the whole point why we bring up all CPUs in the 'nosmt' case and
>>>> shut the siblings down again after setting CR4.MCE. Actually that's in fact
>>>> a 'let's hope no MCE hits before that happened' approach, but that's all we
>>>> can do.
>>>> 
>>>> If we don't do that then the MCE broadcast can hit a CPU which has some
>>>> firmware initialized state. The result can be a full system lockup, triple
>>>> fault etc.
>>>> 
>>>> So when the MCE hits a CPU which is still in the crashed kernel lala state,
>>>> then all hell breaks loose.
>>> Thank you for the comprehensive explain. With your guide, now, I have
>>> a full understanding of the issue.
>>> 
>>> But when I tried to add something to enable CR4.MCE in
>>> crash_nmi_callback(), I realized that it is undo-able in some case (if
>>> crashed, we will not ask an offline smt cpu to online), also it is
>>> needless. "kexec -l/-p" takes the advantage of the cpu state in the
>>> first kernel, where all logical cpu has CR4.MCE=1.
>>> 
>>> So kexec is exempt from this bug if the first kernel already do it.
>> 
>> No. If the MCE broadcast is handled by a CPU which is stuck in the old
>> kernel stop loop, then it will execute on the old kernel and eventually run
>> into the memory corruption which crashed the old one.
>> 
> Yes, you are right. A stuck CPU may execute the old do_machine_check()
> code. But I just found out that we have
> do_machine_check()->__mc_check_crashing_cpu() to guard against this case.
> 
> And I think the MCE issue with nr_cpus is not closely related to this
> series and can be handled as a separate issue. I had asked whether Andy
> would take it; if not, I am glad to do it.
> 
> 

Go for it. I’m not familiar enough with the SMP boot stuff that I would be able to do it any faster than you. I’ll gladly help review it.
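
For readers following the __mc_check_crashing_cpu() reference quoted above,
the guard can be modeled roughly as below. This is a user-space sketch from
my reading of the kernel helper around the time of this thread, not the
kernel source: the `model_*` names are hypothetical, the offline state is
passed in as a flag, and a struct field replaces the IA32_MCG_STATUS MSR.
The RIPV flag being bit 0 of that MSR is the only hardware detail relied on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MCG_STATUS_RIPV (1ull << 0)	/* restart IP valid */

struct model_mce_state {
	int crashing_cpu;	/* -1 while no crash is in progress */
	uint64_t mcg_status;	/* stands in for the IA32_MCG_STATUS MSR */
};

/*
 * Returns true when the machine check should be ignored on this CPU:
 * the CPU is offline, or some other CPU is already the crashing CPU,
 * and the hardware says execution can be restarted.
 */
static bool model_mc_check_crashing_cpu(struct model_mce_state *s,
					 int cpu, bool cpu_offline)
{
	assert(s);

	if (cpu_offline ||
	    (s->crashing_cpu != -1 && s->crashing_cpu != cpu)) {
		if (s->mcg_status & MODEL_MCG_STATUS_RIPV) {
			/* Acknowledge the event so the CPU can resume. */
			s->mcg_status = 0;
			return true;
		}
	}

	return false;
}
```

Note that the model returns false when RIPV is clear, so in that case the
event is not filtered out by this check.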


* Re: [PATCH 2/2] x86/numa: instance all parsed numa node
  2019-07-09 13:34               ` Andy Lutomirski
@ 2019-07-10  8:40                 ` Pingfan Liu
  0 siblings, 0 replies; 7+ messages in thread
From: Pingfan Liu @ 2019-07-10  8:40 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, x86, Michal Hocko, Dave Hansen, Mike Rapoport,
	Tony Luck, Andy Lutomirski, Peter Zijlstra, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Andrew Morton, Vlastimil Babka,
	Oscar Salvador, Pavel Tatashin, Mel Gorman,
	Benjamin Herrenschmidt, Michael Ellerman, Stephen Rothwell,
	Qian Cai, Barret Rhoden, Bjorn Helgaas, David Rientjes, linux-mm,
	LKML

On Tue, Jul 9, 2019 at 9:34 PM Andy Lutomirski <luto@amacapital.net> wrote:
>
>
>
> > On Jul 9, 2019, at 1:24 AM, Pingfan Liu <kernelfans@gmail.com> wrote:
> >
> >> On Tue, Jul 9, 2019 at 2:12 PM Thomas Gleixner <tglx@linutronix.de> wrote:
> >>
> >>> On Tue, 9 Jul 2019, Pingfan Liu wrote:
> >>>> On Mon, Jul 8, 2019 at 5:35 PM Thomas Gleixner <tglx@linutronix.de> wrote:
> >>>> It can and it does.
> >>>>
> >>>> That's the whole point why we bring up all CPUs in the 'nosmt' case and
> >>>> shut the siblings down again after setting CR4.MCE. Actually that's in fact
> >>>> a 'let's hope no MCE hits before that happened' approach, but that's all we
> >>>> can do.
> >>>>
> >>>> If we don't do that then the MCE broadcast can hit a CPU which has some
> >>>> firmware initialized state. The result can be a full system lockup, triple
> >>>> fault etc.
> >>>>
> >>>> So when the MCE hits a CPU which is still in the crashed kernel lala state,
> >>>> then all hell breaks loose.
> >>> Thank you for the comprehensive explain. With your guide, now, I have
> >>> a full understanding of the issue.
> >>>
> >>> But when I tried to add something to enable CR4.MCE in
> >>> crash_nmi_callback(), I realized that it is undo-able in some case (if
> >>> crashed, we will not ask an offline smt cpu to online), also it is
> >>> needless. "kexec -l/-p" takes the advantage of the cpu state in the
> >>> first kernel, where all logical cpu has CR4.MCE=1.
> >>>
> >>> So kexec is exempt from this bug if the first kernel already do it.
> >>
> >> No. If the MCE broadcast is handled by a CPU which is stuck in the old
> >> kernel stop loop, then it will execute on the old kernel and eventually run
> >> into the memory corruption which crashed the old one.
> >>
> > Yes, you are right. A stuck CPU may execute the old do_machine_check()
> > code. But I just found out that we have
> > do_machine_check()->__mc_check_crashing_cpu() to guard against this case.
> >
> > And I think the MCE issue with nr_cpus is not closely related to this
> > series and can be handled as a separate issue. I had asked whether Andy
> > would take it; if not, I am glad to do it.
> >
> >
>
> Go for it. I’m not familiar enough with the SMP boot stuff that I would be able to do it any faster than you. I’ll gladly help review it.
I have sent out a patch to fix maxcpus: "[PATCH] smp: force all cpu to
boot once under maxcpus option".
For nr_cpus, however, I think things will not be so easy because of the
percpu area, and it may need quite a different approach.

Thanks,
  Pingfan



Thread overview: 7+ messages
2019-07-05  4:15 [PATCH 1/2] x86/numa: carve node online semantics out of alloc_node_data() Pingfan Liu
     [not found] ` <1562300143-11671-2-git-send-email-kernelfans@gmail.com>
     [not found]   ` <alpine.DEB.2.21.1907072133310.3648@nanos.tec.linutronix.de>
2019-07-08  8:36     ` [PATCH 2/2] x86/numa: instance all parsed numa node Pingfan Liu
     [not found]       ` <alpine.DEB.2.21.1907081125300.3648@nanos.tec.linutronix.de>
2019-07-08 17:53         ` Andy Lutomirski
2019-07-09  4:26           ` Pingfan Liu
2019-07-09  4:16         ` Pingfan Liu
     [not found]           ` <alpine.DEB.2.21.1907090810490.1961@nanos.tec.linutronix.de>
     [not found]             ` <CAFgQCTui7D6_FQ_v_ijj6k_=+TQzQ3PaGvzxd6p+XEGjQ2S6jw@mail.gmail.com>
2019-07-09 13:34               ` Andy Lutomirski
2019-07-10  8:40                 ` Pingfan Liu
