* Crash when using cpupools
@ 2021-09-03 15:41 Bertrand Marquis
From: Bertrand Marquis @ 2021-09-03 15:41 UTC (permalink / raw)
  To: Xen-devel; +Cc: Julien Grall, Stefano Stabellini, Juergen Gross, Dario Faggioli

Hi,

While doing some investigation with cpupools I encountered a crash when trying to isolate a guest to its own physical cpu.

I am using current staging status.

I did the following (on FVP with 8 cores):
- start dom0 with dom0_max_vcpus=1
- remove core 1 from dom0 cpupool: xl cpupool-cpu-remove Pool-0 1
- create a new pool: xl cpupool-create name=\"NetPool\"
- add core 1 to the pool: xl cpupool-cpu-add NetPool 1
- create a guest in NetPool using the following in the guest config: pool="NetPool" (full fragment below)
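
For completeness, the full guest config fragment is essentially the
following (a minimal sketch: guest name and memory size are just from
my test setup and do not matter for the crash):

    name = "guest1"
    vcpus = 1
    memory = 128
    pool = "NetPool"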

I end with a crash with the following call trace during guest creation:
(XEN) Xen call trace:
(XEN)    [<0000000000234cb0>] credit2.c#csched2_alloc_udata+0x58/0xfc (PC)
(XEN)    [<0000000000234c80>] credit2.c#csched2_alloc_udata+0x28/0xfc (LR)
(XEN)    [<0000000000242d38>] sched_move_domain+0x144/0x6c0
(XEN)    [<000000000022dd18>] cpupool.c#cpupool_move_domain_locked+0x38/0x70
(XEN)    [<000000000022fadc>] cpupool_do_sysctl+0x73c/0x780
(XEN)    [<000000000022d8e0>] do_sysctl+0x788/0xa58
(XEN)    [<0000000000273b68>] traps.c#do_trap_hypercall+0x78/0x170
(XEN)    [<0000000000274f70>] do_trap_guest_sync+0x138/0x618
(XEN)    [<0000000000260458>] entry.o#guest_sync_slowpath+0xa4/0xd4

After some debugging I found out that unit->vcpu_list is NULL after unit->vcpu_list = d->vcpu[unit->unit_id]; (with unit_id 0) in common/sched/core.c:688.
This makes the call to is_idle_unit(unit) in csched2_alloc_udata trigger the crash.
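
For context, the relevant path looks roughly like this (paraphrased and
simplified from staging, so take it as a sketch rather than the exact
code):

    /* common/sched/core.c, sched_move_domain(), simplified: */
    for ( unit_idx = 0; unit_idx < n_units; unit_idx++ )
    {
        unit = sched_alloc_unit_mem();
        unit->domain = d;
        unit->unit_id = unit_idx * gran;
        /* d->vcpu[0] is still NULL in my case, so vcpu_list ends up NULL: */
        unit->vcpu_list = d->vcpu[unit->unit_id];
        /* csched2_alloc_udata() then calls is_idle_unit(unit), which
         * dereferences unit->vcpu_list -> data abort. */
        unit_priv[unit_idx] = sched_alloc_udata(new_ops, unit, domdata);
    }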

Am I doing something wrong?
Is it a known problem?
Does anybody have a pointer to where the source of the issue might be?

I have a working setup to debug this if any other info is required to understand the issue.

Cheers
Bertrand



* Re: Crash when using cpupools
From: Juergen Gross @ 2021-09-06  8:23 UTC (permalink / raw)
  To: Bertrand Marquis, Xen-devel
  Cc: Julien Grall, Stefano Stabellini, Dario Faggioli


On 03.09.21 17:41, Bertrand Marquis wrote:
> Hi,
> 
> While doing some investigation with cpupools I encountered a crash when trying to isolate a guest to its own physical cpu.
> 
> I am using current staging status.
> 
> I did the following (on FVP with 8 cores):
> - start dom0 with dom0_max_vcpus=1
> - remove core 1 from dom0 cpupool: xl cpupool-cpu-remove Pool-0 1
> - create a new pool: xl cpupool-create name=\"NetPool\"
> - add core 1 to the pool: xl cpupool-cpu-add NetPool 1
> - create a guest in NetPool using the following in the guest config: pool="NetPool"
> 
> I end with a crash with the following call trace during guest creation:
> (XEN) Xen call trace:
> (XEN)    [<0000000000234cb0>] credit2.c#csched2_alloc_udata+0x58/0xfc (PC)
> (XEN)    [<0000000000234c80>] credit2.c#csched2_alloc_udata+0x28/0xfc (LR)
> (XEN)    [<0000000000242d38>] sched_move_domain+0x144/0x6c0
> (XEN)    [<000000000022dd18>] cpupool.c#cpupool_move_domain_locked+0x38/0x70
> (XEN)    [<000000000022fadc>] cpupool_do_sysctl+0x73c/0x780
> (XEN)    [<000000000022d8e0>] do_sysctl+0x788/0xa58
> (XEN)    [<0000000000273b68>] traps.c#do_trap_hypercall+0x78/0x170
> (XEN)    [<0000000000274f70>] do_trap_guest_sync+0x138/0x618
> (XEN)    [<0000000000260458>] entry.o#guest_sync_slowpath+0xa4/0xd4
> 
> After some debugging I found out that unit->vcpu_list is NULL after unit->vcpu_list = d->vcpu[unit->unit_id]; with unit_id 0 in common/sched/core.c:688
> This makes the call to is_idle_unit(unit) in csched2_alloc_udata trigger the crash.

So there is no vcpu 0 in that domain? How is this possible?


Juergen



* Re: Crash when using cpupools
From: Bertrand Marquis @ 2021-09-06  8:30 UTC (permalink / raw)
  To: Juergen Gross; +Cc: Xen-devel, Julien Grall, Stefano Stabellini, Dario Faggioli

Hi Juergen,

> On 6 Sep 2021, at 09:23, Juergen Gross <jgross@suse.com> wrote:
> 
> On 03.09.21 17:41, Bertrand Marquis wrote:
>> Hi,
>> While doing some investigation with cpupools I encountered a crash when trying to isolate a guest to its own physical cpu.
>> I am using current staging status.
>> I did the following (on FVP with 8 cores):
>> - start dom0 with dom0_max_vcpus=1
>> - remove core 1 from dom0 cpupool: xl cpupool-cpu-remove Pool-0 1
>> - create a new pool: xl cpupool-create name=\"NetPool\"
>> - add core 1 to the pool: xl cpupool-cpu-add NetPool 1
>> - create a guest in NetPool using the following in the guest config: pool="NetPool"
>> I end with a crash with the following call trace during guest creation:
>> (XEN) Xen call trace:
>> (XEN)    [<0000000000234cb0>] credit2.c#csched2_alloc_udata+0x58/0xfc (PC)
>> (XEN)    [<0000000000234c80>] credit2.c#csched2_alloc_udata+0x28/0xfc (LR)
>> (XEN)    [<0000000000242d38>] sched_move_domain+0x144/0x6c0
>> (XEN)    [<000000000022dd18>] cpupool.c#cpupool_move_domain_locked+0x38/0x70
>> (XEN)    [<000000000022fadc>] cpupool_do_sysctl+0x73c/0x780
>> (XEN)    [<000000000022d8e0>] do_sysctl+0x788/0xa58
>> (XEN)    [<0000000000273b68>] traps.c#do_trap_hypercall+0x78/0x170
>> (XEN)    [<0000000000274f70>] do_trap_guest_sync+0x138/0x618
>> (XEN)    [<0000000000260458>] entry.o#guest_sync_slowpath+0xa4/0xd4
>> After some debugging I found out that unit->vcpu_list is NULL after unit->vcpu_list = d->vcpu[unit->unit_id]; with unit_id 0 in common/sched/core.c:688
>> This makes the call to is_idle_unit(unit) in csched2_alloc_udata trigger the crash.
> 
> So there is no vcpu 0 in that domain? How is this possible?

No idea; I will need to dig deeper, as the state I end up in does not make sense.

Could you just confirm that my operations are right and that this should work, before I start digging?

Cheers
Bertrand

> 
> 
> Juergen



* Re: Crash when using cpupools
From: Andrew Cooper @ 2021-09-06  8:46 UTC (permalink / raw)
  To: Juergen Gross, Bertrand Marquis, Xen-devel
  Cc: Julien Grall, Stefano Stabellini, Dario Faggioli

On 06/09/2021 09:23, Juergen Gross wrote:
> On 03.09.21 17:41, Bertrand Marquis wrote:
>> Hi,
>>
>> While doing some investigation with cpupools I encountered a crash
>> when trying to isolate a guest to its own physical cpu.
>>
>> I am using current staging status.
>>
>> I did the following (on FVP with 8 cores):
>> - start dom0 with dom0_max_vcpus=1
>> - remove core 1 from dom0 cpupool: xl cpupool-cpu-remove Pool-0 1
>> - create a new pool: xl cpupool-create name=\"NetPool\"
>> - add core 1 to the pool: xl cpupool-cpu-add NetPool 1
>> - create a guest in NetPool using the following in the guest config:
>> pool="NetPool"
>>
>> I end with a crash with the following call trace during guest creation:
>> (XEN) Xen call trace:
>> (XEN)    [<0000000000234cb0>] credit2.c#csched2_alloc_udata+0x58/0xfc
>> (PC)
>> (XEN)    [<0000000000234c80>] credit2.c#csched2_alloc_udata+0x28/0xfc
>> (LR)
>> (XEN)    [<0000000000242d38>] sched_move_domain+0x144/0x6c0
>> (XEN)    [<000000000022dd18>]
>> cpupool.c#cpupool_move_domain_locked+0x38/0x70
>> (XEN)    [<000000000022fadc>] cpupool_do_sysctl+0x73c/0x780
>> (XEN)    [<000000000022d8e0>] do_sysctl+0x788/0xa58
>> (XEN)    [<0000000000273b68>] traps.c#do_trap_hypercall+0x78/0x170
>> (XEN)    [<0000000000274f70>] do_trap_guest_sync+0x138/0x618
>> (XEN)    [<0000000000260458>] entry.o#guest_sync_slowpath+0xa4/0xd4
>>
>> After some debugging I found out that unit->vcpu_list is NULL after
>> unit->vcpu_list = d->vcpu[unit->unit_id]; with unit_id 0 in
>> common/sched/core.c:688
>> This makes the call to is_idle_unit(unit) in csched2_alloc_udata
>> trigger the crash.
>
> So there is no vcpu 0 in that domain? How is this possible?

Easy, depending on the order of hypercalls issued by the toolstack.

Between DOMCTL_createdomain and DOMCTL_max_vcpus, the domain exists but
the vcpus haven't been allocated.
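
Roughly, as a simplified sketch of the sequence (libxc call details
from memory, so illustrative only):

    xc_domain_create(xch, &domid, &config);    /* d->vcpu[] allocated, entries NULL */
    xc_cpupool_movedomain(xch, poolid, domid); /* -> sched_move_domain(), boom */
    xc_domain_max_vcpus(xch, domid, vcpus);    /* only now do the vcpus exist */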

~Andrew




* Re: Crash when using cpupools
From: Juergen Gross @ 2021-09-06  9:28 UTC (permalink / raw)
  To: Andrew Cooper, Bertrand Marquis, Xen-devel
  Cc: Julien Grall, Stefano Stabellini, Dario Faggioli


On 06.09.21 10:46, Andrew Cooper wrote:
> On 06/09/2021 09:23, Juergen Gross wrote:
>> On 03.09.21 17:41, Bertrand Marquis wrote:
>>> Hi,
>>>
>>> While doing some investigation with cpupools I encountered a crash
>>> when trying to isolate a guest to its own physical cpu.
>>>
>>> I am using current staging status.
>>>
>>> I did the following (on FVP with 8 cores):
>>> - start dom0 with dom0_max_vcpus=1
>>> - remove core 1 from dom0 cpupool: xl cpupool-cpu-remove Pool-0 1
>>> - create a new pool: xl cpupool-create name=\"NetPool\"
>>> - add core 1 to the pool: xl cpupool-cpu-add NetPool 1
>>> - create a guest in NetPool using the following in the guest config:
>>> pool="NetPool"
>>>
>>> I end with a crash with the following call trace during guest creation:
>>> (XEN) Xen call trace:
>>> (XEN)    [<0000000000234cb0>] credit2.c#csched2_alloc_udata+0x58/0xfc
>>> (PC)
>>> (XEN)    [<0000000000234c80>] credit2.c#csched2_alloc_udata+0x28/0xfc
>>> (LR)
>>> (XEN)    [<0000000000242d38>] sched_move_domain+0x144/0x6c0
>>> (XEN)    [<000000000022dd18>]
>>> cpupool.c#cpupool_move_domain_locked+0x38/0x70
>>> (XEN)    [<000000000022fadc>] cpupool_do_sysctl+0x73c/0x780
>>> (XEN)    [<000000000022d8e0>] do_sysctl+0x788/0xa58
>>> (XEN)    [<0000000000273b68>] traps.c#do_trap_hypercall+0x78/0x170
>>> (XEN)    [<0000000000274f70>] do_trap_guest_sync+0x138/0x618
>>> (XEN)    [<0000000000260458>] entry.o#guest_sync_slowpath+0xa4/0xd4
>>>
>>> After some debugging I found out that unit->vcpu_list is NULL after
>>> unit->vcpu_list = d->vcpu[unit->unit_id]; with unit_id 0 in
>>> common/sched/core.c:688
>>> This makes the call to is_idle_unit(unit) in csched2_alloc_udata
>>> trigger the crash.
>>
>> So there is no vcpu 0 in that domain? How is this possible?
> 
> Easy, depending on the order of hypercalls issued by the toolstack.
> 
> Between DOMCTL_createdomain and DOMCTL_max_vcpus, the domain exists but
> the vcpus haven't been allocated.

Oh yes, indeed.

Bertrand, does the attached patch fix the issue for you?


Juergen

[-- Attachment #1.1.2: 0001-xen-sched-fix-sched_move_domain-for-domain-without-v.patch --]
[-- Type: text/x-patch, Size: 1417 bytes --]

From 82af7d22a69a8cac633a6b2a40bc7d52dac5c5e8 Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
To: xen-devel@lists.xenproject.org
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Dario Faggioli <dfaggioli@suse.com>
Date: Mon, 6 Sep 2021 11:19:12 +0200
Subject: [PATCH] xen/sched: fix sched_move_domain() for domain without vcpus

In case a domain is created with a cpupool other than Pool-0 specified,
it will be moved to that cpupool before any vcpus are allocated.

This will lead to a NULL pointer dereference in sched_move_domain().

Fix that by tolerating vcpus not being allocated yet.

Fixes: 61649709421a5a7c1 ("xen/domain: Allocate d->vcpu[] in domain_create()")
Reported-by: Bertrand Marquis <Bertrand.Marquis@arm.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/sched/core.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
index 8d178baf3d..79c9100680 100644
--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -671,6 +671,10 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
 
     for ( unit_idx = 0; unit_idx < n_units; unit_idx++ )
     {
+        /* Special case for move at domain creation time. */
+        if ( !d->vcpu[unit_idx * gran] )
+            break;
+
         unit = sched_alloc_unit_mem();
         if ( unit )
         {
-- 
2.26.2




* Re: Crash when using cpupools
From: Juergen Gross @ 2021-09-06  9:46 UTC (permalink / raw)
  To: Andrew Cooper, Bertrand Marquis, Xen-devel
  Cc: Julien Grall, Stefano Stabellini, Dario Faggioli


On 06.09.21 11:28, Juergen Gross wrote:
> On 06.09.21 10:46, Andrew Cooper wrote:
>> On 06/09/2021 09:23, Juergen Gross wrote:
>>> On 03.09.21 17:41, Bertrand Marquis wrote:
>>>> Hi,
>>>>
>>>> While doing some investigation with cpupools I encountered a crash
>>>> when trying to isolate a guest to its own physical cpu.
>>>>
>>>> I am using current staging status.
>>>>
>>>> I did the following (on FVP with 8 cores):
>>>> - start dom0 with dom0_max_vcpus=1
>>>> - remove core 1 from dom0 cpupool: xl cpupool-cpu-remove Pool-0 1
>>>> - create a new pool: xl cpupool-create name=\"NetPool\"
>>>> - add core 1 to the pool: xl cpupool-cpu-add NetPool 1
>>>> - create a guest in NetPool using the following in the guest config:
>>>> pool="NetPool"
>>>>
>>>> I end with a crash with the following call trace during guest creation:
>>>> (XEN) Xen call trace:
>>>> (XEN)    [<0000000000234cb0>] credit2.c#csched2_alloc_udata+0x58/0xfc
>>>> (PC)
>>>> (XEN)    [<0000000000234c80>] credit2.c#csched2_alloc_udata+0x28/0xfc
>>>> (LR)
>>>> (XEN)    [<0000000000242d38>] sched_move_domain+0x144/0x6c0
>>>> (XEN)    [<000000000022dd18>]
>>>> cpupool.c#cpupool_move_domain_locked+0x38/0x70
>>>> (XEN)    [<000000000022fadc>] cpupool_do_sysctl+0x73c/0x780
>>>> (XEN)    [<000000000022d8e0>] do_sysctl+0x788/0xa58
>>>> (XEN)    [<0000000000273b68>] traps.c#do_trap_hypercall+0x78/0x170
>>>> (XEN)    [<0000000000274f70>] do_trap_guest_sync+0x138/0x618
>>>> (XEN)    [<0000000000260458>] entry.o#guest_sync_slowpath+0xa4/0xd4
>>>>
>>>> After some debugging I found out that unit->vcpu_list is NULL after
>>>> unit->vcpu_list = d->vcpu[unit->unit_id]; with unit_id 0 in
>>>> common/sched/core.c:688
>>>> This makes the call to is_idle_unit(unit) in csched2_alloc_udata
>>>> trigger the crash.
>>>
>>> So there is no vcpu 0 in that domain? How is this possible?
>>
>> Easy, depending on the order of hypercalls issued by the toolstack.
>>
>> Between DOMCTL_createdomain and DOMCTL_max_vcpus, the domain exists but
>> the vcpus haven't been allocated.
> 
> Oh yes, indeed.
> 
> Bertrand, does the attached patch fix the issue for you?

Oh, I just realized that the "Fixes:" tag references the wrong commit;
it should be 70fadc41635b9b62.


Juergen




* Re: Crash when using cpupools
From: Bertrand Marquis @ 2021-09-06 10:14 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Andrew Cooper, Xen-devel, Julien Grall, Stefano Stabellini,
	Dario Faggioli

Hi Juergen,

> On 6 Sep 2021, at 10:28, Juergen Gross <jgross@suse.com> wrote:
> 
> On 06.09.21 10:46, Andrew Cooper wrote:
>> On 06/09/2021 09:23, Juergen Gross wrote:
>>> On 03.09.21 17:41, Bertrand Marquis wrote:
>>>> Hi,
>>>> 
>>>> While doing some investigation with cpupools I encountered a crash
>>>> when trying to isolate a guest to its own physical cpu.
>>>> 
>>>> I am using current staging status.
>>>> 
>>>> I did the following (on FVP with 8 cores):
>>>> - start dom0 with dom0_max_vcpus=1
>>>> - remove core 1 from dom0 cpupool: xl cpupool-cpu-remove Pool-0 1
>>>> - create a new pool: xl cpupool-create name=\"NetPool\"
>>>> - add core 1 to the pool: xl cpupool-cpu-add NetPool 1
>>>> - create a guest in NetPool using the following in the guest config:
>>>> pool="NetPool"
>>>> 
>>>> I end with a crash with the following call trace during guest creation:
>>>> (XEN) Xen call trace:
>>>> (XEN)    [<0000000000234cb0>] credit2.c#csched2_alloc_udata+0x58/0xfc
>>>> (PC)
>>>> (XEN)    [<0000000000234c80>] credit2.c#csched2_alloc_udata+0x28/0xfc
>>>> (LR)
>>>> (XEN)    [<0000000000242d38>] sched_move_domain+0x144/0x6c0
>>>> (XEN)    [<000000000022dd18>]
>>>> cpupool.c#cpupool_move_domain_locked+0x38/0x70
>>>> (XEN)    [<000000000022fadc>] cpupool_do_sysctl+0x73c/0x780
>>>> (XEN)    [<000000000022d8e0>] do_sysctl+0x788/0xa58
>>>> (XEN)    [<0000000000273b68>] traps.c#do_trap_hypercall+0x78/0x170
>>>> (XEN)    [<0000000000274f70>] do_trap_guest_sync+0x138/0x618
>>>> (XEN)    [<0000000000260458>] entry.o#guest_sync_slowpath+0xa4/0xd4
>>>> 
>>>> After some debugging I found out that unit->vcpu_list is NULL after
>>>> unit->vcpu_list = d->vcpu[unit->unit_id]; with unit_id 0 in
>>>> common/sched/core.c:688
>>>> This makes the call to is_idle_unit(unit) in csched2_alloc_udata
>>>> trigger the crash.
>>> 
>>> So there is no vcpu 0 in that domain? How is this possible?
>> Easy, depending on the order of hypercalls issued by the toolstack.
>> Between DOMCTL_createdomain and DOMCTL_max_vcpus, the domain exists but
>> the vcpus haven't been allocated.
> 
> Oh yes, indeed.
> 
> Bertrand, does the attached patch fix the issue for you?

It does; my guest is now booting properly :-)
So this solves the issue on Arm (and probably on x86 as well if the problem exists there, though I have not tested that).

Feel free to add to your patch my:
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>

Thanks a lot for the quick fix
Cheers
Bertrand

> 
> 
> Juergen
> <0001-xen-sched-fix-sched_move_domain-for-domain-without-v.patch><OpenPGP_0xB0DE9DD628BF132F.asc>


