All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH RFC] x86/smpboot: Set safer __max_logical_packages limit
@ 2017-04-20 13:24 Vitaly Kuznetsov
  2017-04-20 15:06 ` Peter Zijlstra
  2017-04-20 15:06 ` Peter Zijlstra
  0 siblings, 2 replies; 16+ messages in thread
From: Vitaly Kuznetsov @ 2017-04-20 13:24 UTC (permalink / raw)
  To: x86
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Borislav Petkov,
	Prarit Bhargava, linux-kernel, xen-devel

Recent changes in logical package management (Commit 9d85eb9119f4
("x86/smpboot: Make logical package management more robust") and its
predecessor) caused boot failures for some Xen guests. E.g. I'm trying to
boot 10 CPU guest on AMD Opteron 4284 system and I see the following crash:

[    0.116104] smpboot: Max logical packages: 1
...
[    0.590068]   #8
[    0.001000] smpboot: Package 1 of CPU 8 exceeds BIOS package data 1.
[    0.001000] ------------[ cut here ]------------
[    0.001000] kernel BUG at arch/x86/kernel/cpu/common.c:1020!

This is happening because total_cpus is 10 and x86_max_cores is 16(!).
Turns out, the number of CPUs (vCPUs in our case) in each logical package
doesn't have to be exactly x86_max_cores, we can have any number of CPUs
<= x86_max_cores and they also don't have to match for all logical
packages. This breaks the current concept of __max_logical_packages.

In this patch I suggest we set __max_logical_packages based on the
max_physical_pkg_id and total_cpus, this should be safe and cover all
possible cases. Alternatively, we may think about eliminating the concept
of __max_logical_packages completely and relying on max_physical_pkg_id/
total_cpus where we currently use topology_max_packages().

The issue could've been solved in Xen too I guess. CPUID returning
x86_max_cores can be tweaked to be the lowerest(?) possible number of
all logical packages of the guest.

Fixes: 9d85eb9119f4 ("x86/smpboot: Make logical package management more robust")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/kernel/smpboot.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index bd1f1ad..85f41cd 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -359,7 +359,6 @@ static void __init smp_init_package_map(struct cpuinfo_x86 *c, unsigned int cpu)
 		ncpus = 1;
 	}
 
-	__max_logical_packages = DIV_ROUND_UP(total_cpus, ncpus);
 	logical_packages = 0;
 
 	/*
@@ -367,6 +366,15 @@ static void __init smp_init_package_map(struct cpuinfo_x86 *c, unsigned int cpu)
 	 * package can be smaller than the actual used apic ids.
 	 */
 	max_physical_pkg_id = DIV_ROUND_UP(MAX_LOCAL_APIC, ncpus);
+
+	/*
+	 * Each logical package has not more than x86_max_cores CPUs but
+	 * it can happen that it has less, e.g. we may have 1 CPU per logical
+	 * package regardless of what's in x86_max_cores. This is seen on some
+	 * Xen setups with AMD processors.
+	 */
+	__max_logical_packages = min(max_physical_pkg_id, total_cpus);
+
 	size = max_physical_pkg_id * sizeof(unsigned int);
 	physical_to_logical_pkg = kmalloc(size, GFP_KERNEL);
 	memset(physical_to_logical_pkg, 0xff, size);
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC] x86/smpboot: Set safer __max_logical_packages limit
  2017-04-20 13:24 [PATCH RFC] x86/smpboot: Set safer __max_logical_packages limit Vitaly Kuznetsov
  2017-04-20 15:06 ` Peter Zijlstra
@ 2017-04-20 15:06 ` Peter Zijlstra
  2017-04-20 15:40   ` Vitaly Kuznetsov
                     ` (3 more replies)
  1 sibling, 4 replies; 16+ messages in thread
From: Peter Zijlstra @ 2017-04-20 15:06 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Prarit Bhargava, linux-kernel, xen-devel

On Thu, Apr 20, 2017 at 03:24:53PM +0200, Vitaly Kuznetsov wrote:
> In this patch I suggest we set __max_logical_packages based on the
> max_physical_pkg_id and total_cpus,

So my 4 socket 144 CPU system will then get max_physical_pkg_id=144,
instead of 4.

This wastes quite a bit of memory for the per-node arrays. Luckily most
are just pointer arrays, but still, wasting 140*8 bytes for each of
them.

> this should be safe and cover all
> possible cases. Alternatively, we may think about eliminating the concept
> of __max_logical_packages completely and relying on max_physical_pkg_id/
> total_cpus where we currently use topology_max_packages().
> 
> The issue could've been solved in Xen too I guess. CPUID returning
> x86_max_cores can be tweaked to be the lowerest(?) possible number of
> all logical packages of the guest.

This is getting ludicrous. Xen is plain broken, and instead of fixing
it, you propose to somehow deal with its obviously crack induced
behaviour :-(

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC] x86/smpboot: Set safer __max_logical_packages limit
  2017-04-20 13:24 [PATCH RFC] x86/smpboot: Set safer __max_logical_packages limit Vitaly Kuznetsov
@ 2017-04-20 15:06 ` Peter Zijlstra
  2017-04-20 15:06 ` Peter Zijlstra
  1 sibling, 0 replies; 16+ messages in thread
From: Peter Zijlstra @ 2017-04-20 15:06 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Prarit Bhargava, x86, linux-kernel, Ingo Molnar, H. Peter Anvin,
	xen-devel, Thomas Gleixner, Borislav Petkov

On Thu, Apr 20, 2017 at 03:24:53PM +0200, Vitaly Kuznetsov wrote:
> In this patch I suggest we set __max_logical_packages based on the
> max_physical_pkg_id and total_cpus,

So my 4 socket 144 CPU system will then get max_physical_pkg_id=144,
instead of 4.

This wastes quite a bit of memory for the per-node arrays. Luckily most
are just pointer arrays, but still, wasting 140*8 bytes for each of
them.

> this should be safe and cover all
> possible cases. Alternatively, we may think about eliminating the concept
> of __max_logical_packages completely and relying on max_physical_pkg_id/
> total_cpus where we currently use topology_max_packages().
> 
> The issue could've been solved in Xen too I guess. CPUID returning
> x86_max_cores can be tweaked to be the lowerest(?) possible number of
> all logical packages of the guest.

This is getting ludicrous. Xen is plain broken, and instead of fixing
it, you propose to somehow deal with its obviously crack induced
behaviour :-(


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC] x86/smpboot: Set safer __max_logical_packages limit
  2017-04-20 15:06 ` Peter Zijlstra
  2017-04-20 15:40   ` Vitaly Kuznetsov
@ 2017-04-20 15:40   ` Vitaly Kuznetsov
  2017-04-20 16:01     ` [Xen-devel] " Boris Ostrovsky
                       ` (3 more replies)
  2017-04-20 17:04   ` Andrew Cooper
  2017-04-20 17:04   ` [Xen-devel] " Andrew Cooper
  3 siblings, 4 replies; 16+ messages in thread
From: Vitaly Kuznetsov @ 2017-04-20 15:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Prarit Bhargava, linux-kernel, xen-devel

Peter Zijlstra <peterz@infradead.org> writes:

> On Thu, Apr 20, 2017 at 03:24:53PM +0200, Vitaly Kuznetsov wrote:
>> In this patch I suggest we set __max_logical_packages based on the
>> max_physical_pkg_id and total_cpus,
>
> So my 4 socket 144 CPU system will then get max_physical_pkg_id=144,
> instead of 4.
>
> This wastes quite a bit of memory for the per-node arrays. Luckily most
> are just pointer arrays, but still, wasting 140*8 bytes for each of
> them.
>
>> this should be safe and cover all
>> possible cases. Alternatively, we may think about eliminating the concept
>> of __max_logical_packages completely and relying on max_physical_pkg_id/
>> total_cpus where we currently use topology_max_packages().
>> 
>> The issue could've been solved in Xen too I guess. CPUID returning
>> x86_max_cores can be tweaked to be the lowerest(?) possible number of
>> all logical packages of the guest.
>
> This is getting ludicrous. Xen is plain broken, and instead of fixing
> it, you propose to somehow deal with its obviously crack induced
> behaviour :-(

Totally agree and I don't like the solution I propose (and that's why
this is RFC)... The problem is that there are such Xen setups in the
wild and with the recent changes some guests will BUG() :-(

Alternatively, we can just remove the BUG() and do something with CPUs
which have their pkg >= __max_logical_packages, e.g. assign them to the
last package. Far from ideal but will help to avoid the regression.

-- 
  Vitaly

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC] x86/smpboot: Set safer __max_logical_packages limit
  2017-04-20 15:06 ` Peter Zijlstra
@ 2017-04-20 15:40   ` Vitaly Kuznetsov
  2017-04-20 15:40   ` Vitaly Kuznetsov
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 16+ messages in thread
From: Vitaly Kuznetsov @ 2017-04-20 15:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Prarit Bhargava, x86, linux-kernel, Ingo Molnar, H. Peter Anvin,
	xen-devel, Thomas Gleixner, Borislav Petkov

Peter Zijlstra <peterz@infradead.org> writes:

> On Thu, Apr 20, 2017 at 03:24:53PM +0200, Vitaly Kuznetsov wrote:
>> In this patch I suggest we set __max_logical_packages based on the
>> max_physical_pkg_id and total_cpus,
>
> So my 4 socket 144 CPU system will then get max_physical_pkg_id=144,
> instead of 4.
>
> This wastes quite a bit of memory for the per-node arrays. Luckily most
> are just pointer arrays, but still, wasting 140*8 bytes for each of
> them.
>
>> this should be safe and cover all
>> possible cases. Alternatively, we may think about eliminating the concept
>> of __max_logical_packages completely and relying on max_physical_pkg_id/
>> total_cpus where we currently use topology_max_packages().
>> 
>> The issue could've been solved in Xen too I guess. CPUID returning
>> x86_max_cores can be tweaked to be the lowerest(?) possible number of
>> all logical packages of the guest.
>
> This is getting ludicrous. Xen is plain broken, and instead of fixing
> it, you propose to somehow deal with its obviously crack induced
> behaviour :-(

Totally agree and I don't like the solution I propose (and that's why
this is RFC)... The problem is that there are such Xen setups in the
wild and with the recent changes some guests will BUG() :-(

Alternatively, we can just remove the BUG() and do something with CPUs
which have their pkg >= __max_logical_packages, e.g. assign them to the
last package. Far from ideal but will help to avoid the regression.

-- 
  Vitaly

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Xen-devel] [PATCH RFC] x86/smpboot: Set safer __max_logical_packages limit
  2017-04-20 15:40   ` Vitaly Kuznetsov
@ 2017-04-20 16:01     ` Boris Ostrovsky
  2017-04-20 16:21       ` Vitaly Kuznetsov
  2017-04-20 16:21       ` Vitaly Kuznetsov
  2017-04-20 16:01     ` Boris Ostrovsky
                       ` (2 subsequent siblings)
  3 siblings, 2 replies; 16+ messages in thread
From: Boris Ostrovsky @ 2017-04-20 16:01 UTC (permalink / raw)
  To: Vitaly Kuznetsov, Peter Zijlstra
  Cc: Prarit Bhargava, x86, linux-kernel, Ingo Molnar, H. Peter Anvin,
	xen-devel, Thomas Gleixner, Borislav Petkov, Juergen Gross

On 04/20/2017 11:40 AM, Vitaly Kuznetsov wrote:
> Peter Zijlstra <peterz@infradead.org> writes:
>
>> On Thu, Apr 20, 2017 at 03:24:53PM +0200, Vitaly Kuznetsov wrote:
>>> In this patch I suggest we set __max_logical_packages based on the
>>> max_physical_pkg_id and total_cpus,
>> So my 4 socket 144 CPU system will then get max_physical_pkg_id=144,
>> instead of 4.
>>
>> This wastes quite a bit of memory for the per-node arrays. Luckily most
>> are just pointer arrays, but still, wasting 140*8 bytes for each of
>> them.
>>
>>> this should be safe and cover all
>>> possible cases. Alternatively, we may think about eliminating the concept
>>> of __max_logical_packages completely and relying on max_physical_pkg_id/
>>> total_cpus where we currently use topology_max_packages().
>>>
>>> The issue could've been solved in Xen too I guess. CPUID returning
>>> x86_max_cores can be tweaked to be the lowerest(?) possible number of
>>> all logical packages of the guest.
>> This is getting ludicrous. Xen is plain broken, and instead of fixing
>> it, you propose to somehow deal with its obviously crack induced
>> behaviour :-(
> Totally agree and I don't like the solution I propose (and that's why
> this is RFC)... The problem is that there are such Xen setups in the
> wild and with the recent changes some guests will BUG() :-(
>
> Alternatively, we can just remove the BUG() and do something with CPUs
> which have their pkg >= __max_logical_packages, e.g. assign them to the
> last package. Far from ideal but will help to avoid the regression.

Do you observe this failure on PV or HVM guest?

We've had a number of issues with topology discovery for PV guests but
AFAIK they have been addressed (so far). I wonder though whether it
would make sense to have some sort of a callback (or an smp_ops.op) to
override native topology info, if needed.

-boris

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC] x86/smpboot: Set safer __max_logical_packages limit
  2017-04-20 15:40   ` Vitaly Kuznetsov
  2017-04-20 16:01     ` [Xen-devel] " Boris Ostrovsky
@ 2017-04-20 16:01     ` Boris Ostrovsky
  2017-04-20 16:15     ` Peter Zijlstra
  2017-04-20 16:15     ` Peter Zijlstra
  3 siblings, 0 replies; 16+ messages in thread
From: Boris Ostrovsky @ 2017-04-20 16:01 UTC (permalink / raw)
  To: Vitaly Kuznetsov, Peter Zijlstra
  Cc: Prarit Bhargava, Juergen Gross, x86, linux-kernel, Ingo Molnar,
	H. Peter Anvin, xen-devel, Thomas Gleixner, Borislav Petkov

On 04/20/2017 11:40 AM, Vitaly Kuznetsov wrote:
> Peter Zijlstra <peterz@infradead.org> writes:
>
>> On Thu, Apr 20, 2017 at 03:24:53PM +0200, Vitaly Kuznetsov wrote:
>>> In this patch I suggest we set __max_logical_packages based on the
>>> max_physical_pkg_id and total_cpus,
>> So my 4 socket 144 CPU system will then get max_physical_pkg_id=144,
>> instead of 4.
>>
>> This wastes quite a bit of memory for the per-node arrays. Luckily most
>> are just pointer arrays, but still, wasting 140*8 bytes for each of
>> them.
>>
>>> this should be safe and cover all
>>> possible cases. Alternatively, we may think about eliminating the concept
>>> of __max_logical_packages completely and relying on max_physical_pkg_id/
>>> total_cpus where we currently use topology_max_packages().
>>>
>>> The issue could've been solved in Xen too I guess. CPUID returning
>>> x86_max_cores can be tweaked to be the lowerest(?) possible number of
>>> all logical packages of the guest.
>> This is getting ludicrous. Xen is plain broken, and instead of fixing
>> it, you propose to somehow deal with its obviously crack induced
>> behaviour :-(
> Totally agree and I don't like the solution I propose (and that's why
> this is RFC)... The problem is that there are such Xen setups in the
> wild and with the recent changes some guests will BUG() :-(
>
> Alternatively, we can just remove the BUG() and do something with CPUs
> which have their pkg >= __max_logical_packages, e.g. assign them to the
> last package. Far from ideal but will help to avoid the regression.

Do you observe this failure on PV or HVM guest?

We've had a number of issues with topology discovery for PV guests but
AFAIK they have been addressed (so far). I wonder though whether it
would make sense to have some sort of a callback (or an smp_ops.op) to
override native topology info, if needed.

-boris


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC] x86/smpboot: Set safer __max_logical_packages limit
  2017-04-20 15:40   ` Vitaly Kuznetsov
                       ` (2 preceding siblings ...)
  2017-04-20 16:15     ` Peter Zijlstra
@ 2017-04-20 16:15     ` Peter Zijlstra
  2017-04-20 17:09       ` Boris Ostrovsky
  2017-04-20 17:09       ` [Xen-devel] " Boris Ostrovsky
  3 siblings, 2 replies; 16+ messages in thread
From: Peter Zijlstra @ 2017-04-20 16:15 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: x86, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Borislav Petkov, Prarit Bhargava, linux-kernel, xen-devel

On Thu, Apr 20, 2017 at 05:40:37PM +0200, Vitaly Kuznetsov wrote:
> > This is getting ludicrous. Xen is plain broken, and instead of fixing
> > it, you propose to somehow deal with its obviously crack induced
> > behaviour :-(
> 
> Totally agree and I don't like the solution I propose (and that's why
> this is RFC)... The problem is that there are such Xen setups in the
> wild and with the recent changes some guests will BUG() :-(
> 
> Alternatively, we can just remove the BUG() and do something with CPUs
> which have their pkg >= __max_logical_packages, e.g. assign them to the
> last package. Far from ideal but will help to avoid the regression.

So currently none of the stuff that uses this should appear in Xen. Its
all drivers for hardware that isn't virtualized (afaik). So assigning to
the last package 'works'.

But the moment this ends up getting used that explodes, because we'll
need different object instances for each piece of hardware.

There just isn't a good solution; on the one hand the BIOS is prone to
providing crap numbers, on the other hand virt (esp. Xen as it turns
out) provides absolutely bonkers/inconsistent topology information.

Very frustrating :-/

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC] x86/smpboot: Set safer __max_logical_packages limit
  2017-04-20 15:40   ` Vitaly Kuznetsov
  2017-04-20 16:01     ` [Xen-devel] " Boris Ostrovsky
  2017-04-20 16:01     ` Boris Ostrovsky
@ 2017-04-20 16:15     ` Peter Zijlstra
  2017-04-20 16:15     ` Peter Zijlstra
  3 siblings, 0 replies; 16+ messages in thread
From: Peter Zijlstra @ 2017-04-20 16:15 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Prarit Bhargava, x86, linux-kernel, Ingo Molnar, H. Peter Anvin,
	xen-devel, Thomas Gleixner, Borislav Petkov

On Thu, Apr 20, 2017 at 05:40:37PM +0200, Vitaly Kuznetsov wrote:
> > This is getting ludicrous. Xen is plain broken, and instead of fixing
> > it, you propose to somehow deal with its obviously crack induced
> > behaviour :-(
> 
> Totally agree and I don't like the solution I propose (and that's why
> this is RFC)... The problem is that there are such Xen setups in the
> wild and with the recent changes some guests will BUG() :-(
> 
> Alternatively, we can just remove the BUG() and do something with CPUs
> which have their pkg >= __max_logical_packages, e.g. assign them to the
> last package. Far from ideal but will help to avoid the regression.

So currently none of the stuff that uses this should appear in Xen. Its
all drivers for hardware that isn't virtualized (afaik). So assigning to
the last package 'works'.

But the moment this ends up getting used that explodes, because we'll
need different object instances for each piece of hardware.

There just isn't a good solution; on the one hand the BIOS is prone to
providing crap numbers, on the other hand virt (esp. Xen as it turns
out) provides absolutely bonkers/inconsistent topology information.

Very frustrating :-/

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Xen-devel] [PATCH RFC] x86/smpboot: Set safer __max_logical_packages limit
  2017-04-20 16:01     ` [Xen-devel] " Boris Ostrovsky
@ 2017-04-20 16:21       ` Vitaly Kuznetsov
  2017-04-20 16:21       ` Vitaly Kuznetsov
  1 sibling, 0 replies; 16+ messages in thread
From: Vitaly Kuznetsov @ 2017-04-20 16:21 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: Peter Zijlstra, Prarit Bhargava, x86, linux-kernel, Ingo Molnar,
	H. Peter Anvin, xen-devel, Thomas Gleixner, Borislav Petkov,
	Juergen Gross

Boris Ostrovsky <boris.ostrovsky@oracle.com> writes:

> On 04/20/2017 11:40 AM, Vitaly Kuznetsov wrote:
>> Peter Zijlstra <peterz@infradead.org> writes:
>>
>>> On Thu, Apr 20, 2017 at 03:24:53PM +0200, Vitaly Kuznetsov wrote:
>>>> In this patch I suggest we set __max_logical_packages based on the
>>>> max_physical_pkg_id and total_cpus,
>>> So my 4 socket 144 CPU system will then get max_physical_pkg_id=144,
>>> instead of 4.
>>>
>>> This wastes quite a bit of memory for the per-node arrays. Luckily most
>>> are just pointer arrays, but still, wasting 140*8 bytes for each of
>>> them.
>>>
>>>> this should be safe and cover all
>>>> possible cases. Alternatively, we may think about eliminating the concept
>>>> of __max_logical_packages completely and relying on max_physical_pkg_id/
>>>> total_cpus where we currently use topology_max_packages().
>>>>
>>>> The issue could've been solved in Xen too I guess. CPUID returning
>>>> x86_max_cores can be tweaked to be the lowerest(?) possible number of
>>>> all logical packages of the guest.
>>> This is getting ludicrous. Xen is plain broken, and instead of fixing
>>> it, you propose to somehow deal with its obviously crack induced
>>> behaviour :-(
>> Totally agree and I don't like the solution I propose (and that's why
>> this is RFC)... The problem is that there are such Xen setups in the
>> wild and with the recent changes some guests will BUG() :-(
>>
>> Alternatively, we can just remove the BUG() and do something with CPUs
>> which have their pkg >= __max_logical_packages, e.g. assign them to the
>> last package. Far from ideal but will help to avoid the regression.
>
> Do you observe this failure on PV or HVM guest?
>
> We've had a number of issues with topology discovery for PV guests but
> AFAIK they have been addressed (so far). I wonder though whether it
> would make sense to have some sort of a callback (or an smp_ops.op) to
> override native topology info, if needed.
>

This is HVM.

I guess that CPUID handling for AMD processors in the hypervisor doesn't
adjust the core information and passes it from hardware as-is while it
should be tweaked.

-- 
  Vitaly

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC] x86/smpboot: Set safer __max_logical_packages limit
  2017-04-20 16:01     ` [Xen-devel] " Boris Ostrovsky
  2017-04-20 16:21       ` Vitaly Kuznetsov
@ 2017-04-20 16:21       ` Vitaly Kuznetsov
  1 sibling, 0 replies; 16+ messages in thread
From: Vitaly Kuznetsov @ 2017-04-20 16:21 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: Prarit Bhargava, Juergen Gross, Peter Zijlstra, x86,
	linux-kernel, Ingo Molnar, H. Peter Anvin, xen-devel,
	Thomas Gleixner, Borislav Petkov

Boris Ostrovsky <boris.ostrovsky@oracle.com> writes:

> On 04/20/2017 11:40 AM, Vitaly Kuznetsov wrote:
>> Peter Zijlstra <peterz@infradead.org> writes:
>>
>>> On Thu, Apr 20, 2017 at 03:24:53PM +0200, Vitaly Kuznetsov wrote:
>>>> In this patch I suggest we set __max_logical_packages based on the
>>>> max_physical_pkg_id and total_cpus,
>>> So my 4 socket 144 CPU system will then get max_physical_pkg_id=144,
>>> instead of 4.
>>>
>>> This wastes quite a bit of memory for the per-node arrays. Luckily most
>>> are just pointer arrays, but still, wasting 140*8 bytes for each of
>>> them.
>>>
>>>> this should be safe and cover all
>>>> possible cases. Alternatively, we may think about eliminating the concept
>>>> of __max_logical_packages completely and relying on max_physical_pkg_id/
>>>> total_cpus where we currently use topology_max_packages().
>>>>
>>>> The issue could've been solved in Xen too I guess. CPUID returning
>>>> x86_max_cores can be tweaked to be the lowerest(?) possible number of
>>>> all logical packages of the guest.
>>> This is getting ludicrous. Xen is plain broken, and instead of fixing
>>> it, you propose to somehow deal with its obviously crack induced
>>> behaviour :-(
>> Totally agree and I don't like the solution I propose (and that's why
>> this is RFC)... The problem is that there are such Xen setups in the
>> wild and with the recent changes some guests will BUG() :-(
>>
>> Alternatively, we can just remove the BUG() and do something with CPUs
>> which have their pkg >= __max_logical_packages, e.g. assign them to the
>> last package. Far from ideal but will help to avoid the regression.
>
> Do you observe this failure on PV or HVM guest?
>
> We've had a number of issues with topology discovery for PV guests but
> AFAIK they have been addressed (so far). I wonder though whether it
> would make sense to have some sort of a callback (or an smp_ops.op) to
> override native topology info, if needed.
>

This is HVM.

I guess that CPUID handling for AMD processors in the hypervisor doesn't
adjust the core information and passes it from hardware as-is while it
should be tweaked.

-- 
  Vitaly

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Xen-devel] [PATCH RFC] x86/smpboot: Set safer __max_logical_packages limit
  2017-04-20 15:06 ` Peter Zijlstra
                     ` (2 preceding siblings ...)
  2017-04-20 17:04   ` Andrew Cooper
@ 2017-04-20 17:04   ` Andrew Cooper
  3 siblings, 0 replies; 16+ messages in thread
From: Andrew Cooper @ 2017-04-20 17:04 UTC (permalink / raw)
  To: Peter Zijlstra, Vitaly Kuznetsov
  Cc: Prarit Bhargava, x86, linux-kernel, Ingo Molnar, H. Peter Anvin,
	xen-devel, Thomas Gleixner, Borislav Petkov

On 20/04/17 16:06, Peter Zijlstra wrote:
> On Thu, Apr 20, 2017 at 03:24:53PM +0200, Vitaly Kuznetsov wrote:
>> In this patch I suggest we set __max_logical_packages based on the
>> max_physical_pkg_id and total_cpus,
> So my 4 socket 144 CPU system will then get max_physical_pkg_id=144,
> instead of 4.
>
> This wastes quite a bit of memory for the per-node arrays. Luckily most
> are just pointer arrays, but still, wasting 140*8 bytes for each of
> them.
>
>> this should be safe and cover all
>> possible cases. Alternatively, we may think about eliminating the concept
>> of __max_logical_packages completely and relying on max_physical_pkg_id/
>> total_cpus where we currently use topology_max_packages().
>>
>> The issue could've been solved in Xen too I guess. CPUID returning
>> x86_max_cores can be tweaked to be the lowerest(?) possible number of
>> all logical packages of the guest.
> This is getting ludicrous. Xen is plain broken, and instead of fixing
> it, you propose to somehow deal with its obviously crack induced
> behaviour :-(

Xen is (and has been forever) plain broken in this specific regard. 
Fixing CPUID handling for guests has taken me 18 months and ~80 patches
so far, and it still isn't complete.

However, Linux needs to be able to function on existing production
systems (which is every instance of Xen currently running).

Linux shouldn't BUG() because something provided by the firmware looks
wonky.  This is conceptually no different from finding junk the APCI tables.

(I fully agree that we shouldn't have got to this state, but we are 12
years too late in this respect.)

~Andrew

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC] x86/smpboot: Set safer __max_logical_packages limit
  2017-04-20 15:06 ` Peter Zijlstra
  2017-04-20 15:40   ` Vitaly Kuznetsov
  2017-04-20 15:40   ` Vitaly Kuznetsov
@ 2017-04-20 17:04   ` Andrew Cooper
  2017-04-20 17:04   ` [Xen-devel] " Andrew Cooper
  3 siblings, 0 replies; 16+ messages in thread
From: Andrew Cooper @ 2017-04-20 17:04 UTC (permalink / raw)
  To: Peter Zijlstra, Vitaly Kuznetsov
  Cc: Prarit Bhargava, x86, linux-kernel, Ingo Molnar, H. Peter Anvin,
	xen-devel, Thomas Gleixner, Borislav Petkov

On 20/04/17 16:06, Peter Zijlstra wrote:
> On Thu, Apr 20, 2017 at 03:24:53PM +0200, Vitaly Kuznetsov wrote:
>> In this patch I suggest we set __max_logical_packages based on the
>> max_physical_pkg_id and total_cpus,
> So my 4 socket 144 CPU system will then get max_physical_pkg_id=144,
> instead of 4.
>
> This wastes quite a bit of memory for the per-node arrays. Luckily most
> are just pointer arrays, but still, wasting 140*8 bytes for each of
> them.
>
>> this should be safe and cover all
>> possible cases. Alternatively, we may think about eliminating the concept
>> of __max_logical_packages completely and relying on max_physical_pkg_id/
>> total_cpus where we currently use topology_max_packages().
>>
>> The issue could've been solved in Xen too I guess. CPUID returning
>> x86_max_cores can be tweaked to be the lowerest(?) possible number of
>> all logical packages of the guest.
> This is getting ludicrous. Xen is plain broken, and instead of fixing
> it, you propose to somehow deal with its obviously crack induced
> behaviour :-(

Xen is (and has been forever) plain broken in this specific regard. 
Fixing CPUID handling for guests has taken me 18 months and ~80 patches
so far, and it still isn't complete.

However, Linux needs to be able to function on existing production
systems (which is every instance of Xen currently running).

Linux shouldn't BUG() because something provided by the firmware looks
wonky.  This is conceptually no different from finding junk the APCI tables.

(I fully agree that we shouldn't have got to this state, but we are 12
years too late in this respect.)

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Xen-devel] [PATCH RFC] x86/smpboot: Set safer __max_logical_packages limit
  2017-04-20 16:15     ` Peter Zijlstra
  2017-04-20 17:09       ` Boris Ostrovsky
@ 2017-04-20 17:09       ` Boris Ostrovsky
  1 sibling, 0 replies; 16+ messages in thread
From: Boris Ostrovsky @ 2017-04-20 17:09 UTC (permalink / raw)
  To: Peter Zijlstra, Vitaly Kuznetsov
  Cc: Prarit Bhargava, x86, linux-kernel, Ingo Molnar, H. Peter Anvin,
	xen-devel, Thomas Gleixner, Borislav Petkov, Juergen Gross

On 04/20/2017 12:15 PM, Peter Zijlstra wrote:
> On Thu, Apr 20, 2017 at 05:40:37PM +0200, Vitaly Kuznetsov wrote:
>>> This is getting ludicrous. Xen is plain broken, and instead of fixing
>>> it, you propose to somehow deal with its obviously crack induced
>>> behaviour :-(
>> Totally agree and I don't like the solution I propose (and that's why
>> this is RFC)... The problem is that there are such Xen setups in the
>> wild and with the recent changes some guests will BUG() :-(
>>
>> Alternatively, we can just remove the BUG() and do something with CPUs
>> which have their pkg >= __max_logical_packages, e.g. assign them to the
>> last package. Far from ideal but will help to avoid the regression.
> So currently none of the stuff that uses this should appear in Xen. Its
> all drivers for hardware that isn't virtualized (afaik). So assigning to
> the last package 'works'.
>
> But the moment this ends up getting used that explodes, because we'll
> need different object instances for each piece of hardware.


This already gets used. I don't remember details but we had to fix
something due to RAPL code referencing topology info (and yes, there is
no reason for a guest to use RAPL, but we shouldn't crash neither)


>
> There just isn't a good solution; on the one hand the BIOS is prone to
> providing crap numbers, on the other hand virt (esp. Xen as it turns
> out) provides absolutely bonkers/inconsistent topology information.
>
> Very frustrating :-/
>

So we might need a way to bypass topology discovery and present some
sort of default topology (single package, or 1 CPU per package, for
example) and ignore APICID and all that.

-boris

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC] x86/smpboot: Set safer __max_logical_packages limit
  2017-04-20 16:15     ` Peter Zijlstra
@ 2017-04-20 17:09       ` Boris Ostrovsky
  2017-04-20 17:09       ` [Xen-devel] " Boris Ostrovsky
  1 sibling, 0 replies; 16+ messages in thread
From: Boris Ostrovsky @ 2017-04-20 17:09 UTC (permalink / raw)
  To: Peter Zijlstra, Vitaly Kuznetsov
  Cc: Prarit Bhargava, Juergen Gross, x86, linux-kernel, Ingo Molnar,
	H. Peter Anvin, xen-devel, Thomas Gleixner, Borislav Petkov

On 04/20/2017 12:15 PM, Peter Zijlstra wrote:
> On Thu, Apr 20, 2017 at 05:40:37PM +0200, Vitaly Kuznetsov wrote:
>>> This is getting ludicrous. Xen is plain broken, and instead of fixing
>>> it, you propose to somehow deal with its obviously crack induced
>>> behaviour :-(
>> Totally agree and I don't like the solution I propose (and that's why
>> this is RFC)... The problem is that there are such Xen setups in the
>> wild and with the recent changes some guests will BUG() :-(
>>
>> Alternatively, we can just remove the BUG() and do something with CPUs
>> which have their pkg >= __max_logical_packages, e.g. assign them to the
>> last package. Far from ideal but will help to avoid the regression.
> So currently none of the stuff that uses this should appear in Xen. Its
> all drivers for hardware that isn't virtualized (afaik). So assigning to
> the last package 'works'.
>
> But the moment this ends up getting used that explodes, because we'll
> need different object instances for each piece of hardware.


This already gets used. I don't remember details but we had to fix
something due to RAPL code referencing topology info (and yes, there is
no reason for a guest to use RAPL, but we shouldn't crash neither)


>
> There just isn't a good solution; on the one hand the BIOS is prone to
> providing crap numbers, on the other hand virt (esp. Xen as it turns
> out) provides absolutely bonkers/inconsistent topology information.
>
> Very frustrating :-/
>

So we might need a way to bypass topology discovery and present some
sort of default topology (single package, or 1 CPU per package, for
example) and ignore APICID and all that.

-boris

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH RFC] x86/smpboot: Set safer __max_logical_packages limit
@ 2017-04-20 13:24 Vitaly Kuznetsov
  0 siblings, 0 replies; 16+ messages in thread
From: Vitaly Kuznetsov @ 2017-04-20 13:24 UTC (permalink / raw)
  To: x86
  Cc: Prarit Bhargava, linux-kernel, Ingo Molnar, H. Peter Anvin,
	xen-devel, Thomas Gleixner, Borislav Petkov

Recent changes in logical package management (Commit 9d85eb9119f4
("x86/smpboot: Make logical package management more robust") and its
predecessor) caused boot failures for some Xen guests. E.g. I'm trying to
boot 10 CPU guest on AMD Opteron 4284 system and I see the following crash:

[    0.116104] smpboot: Max logical packages: 1
...
[    0.590068]   #8
[    0.001000] smpboot: Package 1 of CPU 8 exceeds BIOS package data 1.
[    0.001000] ------------[ cut here ]------------
[    0.001000] kernel BUG at arch/x86/kernel/cpu/common.c:1020!

This is happening because total_cpus is 10 and x86_max_cores is 16(!).
Turns out, the number of CPUs (vCPUs in our case) in each logical package
doesn't have to be exactly x86_max_cores, we can have any number of CPUs
<= x86_max_cores and they also don't have to match for all logical
packages. This breaks the current concept of __max_logical_packages.

In this patch I suggest we set __max_logical_packages based on the
max_physical_pkg_id and total_cpus, this should be safe and cover all
possible cases. Alternatively, we may think about eliminating the concept
of __max_logical_packages completely and relying on max_physical_pkg_id/
total_cpus where we currently use topology_max_packages().

The issue could've been solved in Xen too I guess. CPUID returning
x86_max_cores can be tweaked to be the lowerest(?) possible number of
all logical packages of the guest.

Fixes: 9d85eb9119f4 ("x86/smpboot: Make logical package management more robust")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/kernel/smpboot.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index bd1f1ad..85f41cd 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -359,7 +359,6 @@ static void __init smp_init_package_map(struct cpuinfo_x86 *c, unsigned int cpu)
 		ncpus = 1;
 	}
 
-	__max_logical_packages = DIV_ROUND_UP(total_cpus, ncpus);
 	logical_packages = 0;
 
 	/*
@@ -367,6 +366,15 @@ static void __init smp_init_package_map(struct cpuinfo_x86 *c, unsigned int cpu)
 	 * package can be smaller than the actual used apic ids.
 	 */
 	max_physical_pkg_id = DIV_ROUND_UP(MAX_LOCAL_APIC, ncpus);
+
+	/*
+	 * Each logical package has not more than x86_max_cores CPUs but
+	 * it can happen that it has less, e.g. we may have 1 CPU per logical
+	 * package regardless of what's in x86_max_cores. This is seen on some
+	 * Xen setups with AMD processors.
+	 */
+	__max_logical_packages = min(max_physical_pkg_id, total_cpus);
+
 	size = max_physical_pkg_id * sizeof(unsigned int);
 	physical_to_logical_pkg = kmalloc(size, GFP_KERNEL);
 	memset(physical_to_logical_pkg, 0xff, size);
-- 
2.9.3


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2017-04-20 17:11 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-20 13:24 [PATCH RFC] x86/smpboot: Set safer __max_logical_packages limit Vitaly Kuznetsov
2017-04-20 15:06 ` Peter Zijlstra
2017-04-20 15:06 ` Peter Zijlstra
2017-04-20 15:40   ` Vitaly Kuznetsov
2017-04-20 15:40   ` Vitaly Kuznetsov
2017-04-20 16:01     ` [Xen-devel] " Boris Ostrovsky
2017-04-20 16:21       ` Vitaly Kuznetsov
2017-04-20 16:21       ` Vitaly Kuznetsov
2017-04-20 16:01     ` Boris Ostrovsky
2017-04-20 16:15     ` Peter Zijlstra
2017-04-20 16:15     ` Peter Zijlstra
2017-04-20 17:09       ` Boris Ostrovsky
2017-04-20 17:09       ` [Xen-devel] " Boris Ostrovsky
2017-04-20 17:04   ` Andrew Cooper
2017-04-20 17:04   ` [Xen-devel] " Andrew Cooper
  -- strict thread matches above, loose matches on Subject: below --
2017-04-20 13:24 Vitaly Kuznetsov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.